ci(deploy): auto-register Telegram webhook + commands after SAM deploy

Append two steps to .github/workflows/deploy.yml that POST setWebhook
and setMyCommands against the freshly-deployed Function URL, reading
credentials from SSM. Mirrors `make telegram-setup` but inlined to
avoid the Makefile's --profile admin assumption.

Token and webhook-secret are masked via ::add-mask:: before any echo.
Jobs fail loudly on Telegram API errors via `jq -e .ok`.

Mark the manual setWebhook snippets in docs/deploy-aws.md and
docs/deploy-aws-free-tier-guide.md as break-glass.
2026-05-16 10:55:43 +07:00
parent 39491d134e
commit 1f5f304041
6 changed files with 336 additions and 0 deletions
+46
@@ -63,3 +63,49 @@ jobs:
              --output text)
          echo "FunctionUrl=$URL"
          curl -fsSL --max-time 30 "$URL/" | tee /tmp/smoke.json | jq .
      - name: Register Telegram webhook
        env:
          STACK_ENV: prod
        run: |
          set -euo pipefail
          URL=$(aws cloudformation describe-stacks \
            --stack-name "$STACK_NAME" \
            --query "Stacks[0].Outputs[?OutputKey=='FunctionUrl'].OutputValue" \
            --output text)
          TOKEN=$(aws ssm get-parameter \
            --name "/miti99bot/${STACK_ENV}/telegram-bot-token" \
            --with-decryption --query Parameter.Value --output text)
          echo "::add-mask::$TOKEN"
          SECRET=$(aws ssm get-parameter \
            --name "/miti99bot/${STACK_ENV}/telegram-webhook-secret" \
            --with-decryption --query Parameter.Value --output text)
          echo "::add-mask::$SECRET"
          WEBHOOK_URL="${URL%/}/webhook"
          echo "Setting Telegram webhook to ${WEBHOOK_URL}"
          RESP=$(curl -fsS --max-time 30 -X POST \
            "https://api.telegram.org/bot${TOKEN}/setWebhook" \
            -d "url=${WEBHOOK_URL}" \
            -d "secret_token=${SECRET}" \
            -d 'allowed_updates=["message","callback_query"]')
          echo "$RESP" | jq -e '.ok == true' >/dev/null \
            || { echo "setWebhook failed: $RESP"; exit 1; }
          echo "$RESP" | jq '{ok, result, description}'
      - name: Register Telegram command menu
        env:
          STACK_ENV: prod
        run: |
          set -euo pipefail
          TOKEN=$(aws ssm get-parameter \
            --name "/miti99bot/${STACK_ENV}/telegram-bot-token" \
            --with-decryption --query Parameter.Value --output text)
          echo "::add-mask::$TOKEN"
          echo "Registering Telegram commands from aws/telegram-commands.json"
          RESP=$(curl -fsS --max-time 30 -X POST \
            "https://api.telegram.org/bot${TOKEN}/setMyCommands" \
            -H 'Content-Type: application/json' \
            --data-binary "@aws/telegram-commands.json")
          echo "$RESP" | jq -e '.ok == true' >/dev/null \
            || { echo "setMyCommands failed: $RESP"; exit 1; }
          echo "$RESP" | jq '{ok, result, description}'
+2
@@ -258,6 +258,8 @@ aws cloudformation describe-stacks --profile admin \
## Step 5 — Point Telegram at the webhook
> For first-time setup only. After `Step 6` wires the GitHub workflow, every push to `main` auto-runs `setWebhook` + `setMyCommands`; this manual block is the break-glass path.
```sh
URL=… # from previous command
TOKEN=$(aws ssm get-parameter --profile admin \
+2
@@ -53,6 +53,8 @@ curl -fsSL "$(...)/" | jq . # health JSON
## Set the Telegram webhook
> `.github/workflows/deploy.yml` auto-runs `setWebhook` + `setMyCommands` after every push to `main`. The snippet below is the break-glass equivalent for manual / out-of-band fixes (e.g. rerun from a workstation when CI is unavailable).
```sh
URL=$(aws cloudformation describe-stacks --stack-name miti99bot \
--query "Stacks[0].Outputs[?OutputKey=='FunctionUrl'].OutputValue" --output text)
@@ -0,0 +1,147 @@
# Phase 01 — Wire `telegram-setup` Into `deploy.yml`
**Status:** implemented (pending live verification on next push to main)
**Priority:** P1 (next deploy needs it for full automation)
**Estimate:** ~30 min implementation + 1 deploy cycle to verify
## Context links
- Parent plan: `../plan.md`
- Existing GH workflow: `.github/workflows/deploy.yml`
- Reference Makefile targets: `Makefile` lines 101-148 (`telegram-setup`, `telegram-webhook`, `telegram-commands`)
- Commands payload: `aws/telegram-commands.json`
- Deploy role IAM: `aws/README.md` § 4 (already has `AmazonSSMFullAccess`)
- Cutover docs: `docs/deploy-aws.md` line 64, `docs/deploy-aws-free-tier-guide.md` line 270 (current manual steps)
## Overview
Add two steps to `deploy.yml` after the existing **Smoke test** step:
1. **Register Telegram webhook** — read `FunctionUrl` from CFN, read token + secret from SSM, `POST` `setWebhook`.
2. **Register Telegram command menu** — read token from SSM, `POST` `setMyCommands` with `aws/telegram-commands.json`.
Mirror the existing inline-CLI pattern used by the Smoke step (no `make` indirection — Makefile uses `--profile admin` which is wrong for CI).
## Key insights
- Deploy role already has `AmazonSSMFullAccess` and `AWSCloudFormationFullAccess` → no IAM change.
- `setWebhook` / `setMyCommands` are idempotent → safe to run every push.
- CFN `Outputs.FunctionUrl` (template.yaml:208-210) → reused from smoke step.
- SSM param paths fixed at `/miti99bot/prod/{telegram-bot-token,telegram-webhook-secret}` (per `aws/README.md` § 3, `samconfig.toml`).
- Bot token must be **masked** before any shell echo (it is a credential in URL path).
- `aws/telegram-commands.json` is committed → `--data-binary "@aws/telegram-commands.json"` works directly.
## Requirements
### Functional
1. After a green SAM deploy + smoke test, the workflow registers the webhook + commands.
2. Webhook URL format: `${FunctionUrl%/}/webhook` (trim trailing slash; Function URLs sometimes include it).
3. `allowed_updates` must equal `["message","callback_query"]` (matches `Makefile:120`).
4. `secret_token` from SSM is sent in the `setWebhook` payload — bot validates this on every incoming update.
5. Job fails (non-zero exit) if either Telegram call returns non-2xx **or** `"ok": false`.
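The trailing-slash rule in requirement 2 can be sketched as a small shell helper. The function name `webhook_url` and the sample hostname are illustrative assumptions, not part of the workflow.

```sh
#!/usr/bin/env bash
# Hypothetical helper: build the webhook URL from a Function URL,
# tolerating an optional trailing slash.
webhook_url() {
  local base="$1"
  # ${base%/} removes at most one trailing "/" (shortest suffix match).
  printf '%s/webhook\n' "${base%/}"
}

# Placeholder hostname for illustration only.
with_slash=$(webhook_url "https://example.lambda-url.test/")
without_slash=$(webhook_url "https://example.lambda-url.test")
echo "$with_slash"
echo "$without_slash"
```

Both calls yield the same URL, which is why the workflow does not need to know whether CloudFormation reports the output with or without the slash.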
### Non-functional
- Token never appears in plaintext logs (`echo "::add-mask::$TOKEN"` immediately after the SSM read, before any other use).
- No new secrets in GitHub repo settings — everything still flows through SSM.
- No new third-party action — only `aws` CLI + `curl` + `jq` (all preinstalled on `ubuntu-latest`).
## Architecture
```
deploy.yml job: deploy (existing)
├─ checkout (existing)
├─ setup-go (existing)
├─ setup-sam (existing)
├─ configure-aws-creds (existing)
├─ build-lambda (existing)
├─ sam-deploy (existing)
├─ smoke-test (existing) — reads FunctionUrl from CFN
├─ register-webhook (NEW) — reads FunctionUrl + SSM token/secret, POST setWebhook
└─ register-commands (NEW) — reads SSM token, POST setMyCommands w/ aws/telegram-commands.json
```
The Function URL is fetched twice (smoke + register-webhook). Acceptable: CFN describe-stacks is fast and the steps stay independent / debuggable. Optimization (cache URL in a step output) is out of scope.
## Related code files
**Modify**
- `.github/workflows/deploy.yml` — append two steps after **Smoke test**
**Read (no change)**
- `Makefile` lines 101-148 — reference implementation
- `aws/telegram-commands.json` — payload body
**Possibly update**
- `docs/deploy-aws.md` line 64 — current "manual setWebhook" instructions become "automatic on push; manual command kept for emergencies"
- `docs/deploy-aws-free-tier-guide.md` line 270 — same note
## Implementation steps
1. **Open** `.github/workflows/deploy.yml`. Locate the `- name: Smoke test (Function URL responds)` step (last step today).
2. **Append step `Register Telegram webhook`** after smoke-test:
- Reuse `STACK_NAME` env (already at job level).
- `URL=$(aws cloudformation describe-stacks ... FunctionUrl ...)` — copy pattern from smoke step.
- `TOKEN=$(aws ssm get-parameter --name /miti99bot/prod/telegram-bot-token --with-decryption --query Parameter.Value --output text)`
- `echo "::add-mask::$TOKEN"` immediately after read.
- `SECRET=$(aws ssm get-parameter --name /miti99bot/prod/telegram-webhook-secret --with-decryption --query Parameter.Value --output text)`
- `echo "::add-mask::$SECRET"` immediately after read.
- `WEBHOOK_URL="${URL%/}/webhook"`
- `RESP=$(curl -fsS -X POST "https://api.telegram.org/bot${TOKEN}/setWebhook" -d "url=${WEBHOOK_URL}" -d "secret_token=${SECRET}" -d 'allowed_updates=["message","callback_query"]')`
- `echo "$RESP" | jq -e '.ok == true' >/dev/null || { echo "setWebhook failed: $RESP"; exit 1; }`
- `echo "$RESP" | jq '{ok, result}'`
3. **Append step `Register Telegram command menu`**:
- Read `TOKEN` from SSM (same path) and re-mask. (Step env doesn't persist across steps; re-read is fine — single SSM call is cheap.)
- `RESP=$(curl -fsS -X POST "https://api.telegram.org/bot${TOKEN}/setMyCommands" -H 'Content-Type: application/json' --data-binary "@aws/telegram-commands.json")`
- Same `jq -e '.ok == true'` validation + pretty-print.
4. **Lint locally** with `actionlint` if available (or just YAML parse): `python3 -c "import yaml; yaml.safe_load(open('.github/workflows/deploy.yml'))"`.
5. **Update docs**:
- `docs/deploy-aws.md` line 64: add note "as of <date>, push-to-main auto-runs `setWebhook` + `setMyCommands`; manual command below kept for break-glass".
- Same in `docs/deploy-aws-free-tier-guide.md` line 270.
6. **Commit** with conventional message: `ci(deploy): auto-register Telegram webhook + commands after SAM deploy`.
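The `jq -e` validation used in steps 2 and 3 can be factored into one function, sketched below. The name `check_telegram_ok` and the two sample response bodies are assumptions for illustration; the workflow itself inlines the check. The sketch assumes `jq` is on `PATH`.

```sh
#!/usr/bin/env bash
# Hypothetical helper: succeed only when a Telegram response body has "ok": true.
check_telegram_ok() {
  local method="$1" resp="$2"
  # jq -e exits non-zero when the last output is false or null,
  # and also when the body is not valid JSON.
  if ! printf '%s' "$resp" | jq -e '.ok == true' >/dev/null; then
    echo "${method} failed: ${resp}" >&2
    return 1
  fi
}

# Sample bodies for illustration (not captured from a real run).
good='{"ok":true,"result":true,"description":"Webhook was set"}'
bad='{"ok":false,"error_code":401,"description":"Unauthorized"}'

good_rc=0; check_telegram_ok setWebhook "$good" || good_rc=$?
bad_rc=0;  check_telegram_ok setWebhook "$bad" 2>/dev/null || bad_rc=$?
echo "good=$good_rc bad=$bad_rc"
```

This covers the case `curl -f` cannot: an HTTP 200 whose body still reports `"ok": false`.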
## Todo list
- [x] Read current `deploy.yml` end (smoke step) to confirm insertion point
- [x] Append `Register Telegram webhook` step (with token mask + `jq -e` validation)
- [x] Append `Register Telegram command menu` step (with token mask + `jq -e` validation)
- [x] Validate YAML parse locally (`yaml.safe_load` → OK)
- [x] Update `docs/deploy-aws.md` + `docs/deploy-aws-free-tier-guide.md` notes
- [ ] Commit + push, observe first run in GH Actions UI
- [ ] Verify `curl https://api.telegram.org/bot$TOKEN/getWebhookInfo` shows the Function URL post-deploy
## Success criteria
| Check | How to verify |
|-------|---------------|
| Step `Register Telegram webhook` shows green | GH Actions run UI |
| Step `Register Telegram command menu` shows green | GH Actions run UI |
| `setWebhook` response logged as `{ok:true, result:true, description:"Webhook was set"}` | step log |
| Token / secret not visible in logs | search step output for the first 4 chars of the token → no match expected; any occurrence of the full value renders as `***` |
| `make telegram-webhook-info` from local shows `url == <FunctionUrl>/webhook` | local `make` after pipeline finishes |
| `/help` works in Telegram after deploy | live bot smoke test |
## Risk assessment
| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|-----------|
| Telegram API blip → CI fails despite healthy deploy | Low | Low (Lambda still serving) | `-fsS` + manual workflow re-run; document break-glass `make telegram-setup` |
| SSM param missing on first-ever deploy | Low | High (deploy red) | Precondition documented in `aws/README.md` § 3 — params must exist before first push (already true today) |
| Bot token printed in `set -x`-style verbose log | Medium | High (token leak) | `::add-mask::` immediately after SSM read; never use `set -x` in these steps |
| `aws/telegram-commands.json` invalid JSON | Low | Low (single-step fail) | `--data-binary @file` + Telegram validates; `jq -e .ok` catches |
## Security considerations
- **Mask tokens / secrets**: `::add-mask::` after every SSM read. GH Actions then redacts that string from all subsequent log lines (including child commands).
- **Path-credential leak**: the `curl` URL contains `${TOKEN}`. Masking covers it, but additionally avoid `set -x`, `-v`, or echoing the full request URL in any temporary debug code.
- **No new IAM**: deploy role keeps existing scope; no expansion of privileges.
- **Webhook secret**: validates inbound requests at `internal/telegram/webhook.go:19-20` (`X-Telegram-Bot-Api-Secret-Token` header). Auto-registration enforces same secret across token rotations.
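The read-then-mask pattern described above can be sketched as follows. `mask_value` is a hypothetical helper name and the token is a made-up placeholder; outside a GitHub Actions runner the line is just printed, whereas on a runner it is interpreted as a workflow command.

```sh
#!/usr/bin/env bash
# Hypothetical helper: emit the GitHub Actions masking command for a value.
mask_value() {
  local value="$1"
  printf '::add-mask::%s\n' "$value"
}

# Placeholder value, not a real credential.
fake_token="123456:EXAMPLE-not-a-real-token"
masked_line=$(mask_value "$fake_token")
echo "$masked_line"
```

The important part is ordering: the mask command has to be the first thing that happens to the value after it is read, before any line that could print it.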
## Next steps
After this phase merges and one deploy cycle confirms green:
- (Optional follow-up, separate plan) Replace `AmazonSSMFullAccess` with a scoped policy granting only `ssm:GetParameter` on `/miti99bot/*` — pairs with the broader IAM tightening already deferred in `aws/README.md` § 4.
- (Optional) Cache `FunctionUrl` between smoke and register-webhook via `$GITHUB_OUTPUT` — micro-optimization only.
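A minimal sketch of the optional `$GITHUB_OUTPUT` caching idea, under the assumption that the smoke step would be given an `id` (the current steps do not declare one). The helper name `set_step_output`, the output key `function_url`, and the sample URL are all illustrative.

```sh
#!/usr/bin/env bash
# Hypothetical helper: publish a step output in key=value form.
set_step_output() {
  local key="$1" value="$2"
  printf '%s=%s\n' "$key" "$value" >> "$GITHUB_OUTPUT"
}

# Outside a runner GITHUB_OUTPUT is unset, so fall back to a temp file for the demo.
: "${GITHUB_OUTPUT:=$(mktemp)}"

# Placeholder URL for illustration only.
set_step_output function_url "https://example.lambda-url.test/"
recorded=$(grep '^function_url=' "$GITHUB_OUTPUT" | tail -n 1)
echo "$recorded"
```

A later step would then read the value through the `steps.<id>.outputs.function_url` expression instead of calling `describe-stacks` again. The trade-off noted in the plan still applies: the steps stop being independently re-runnable.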
## Unresolved questions
- None. All decisions locked: always-on registration, inline CLI (no `make` from CI), `jq -e` failure semantics, both docs files updated.
@@ -0,0 +1,60 @@
# Auto-Register Telegram Webhook + Commands After Deploy
**Date:** 2026-05-16
**Slug:** `260516-1035-auto-register-after-deploy`
**Status:** Implemented (pending commit + live verification on next push to main)
**Type:** CI/CD enhancement (single phase)
**Mode:** fast (no research needed — referenced code paths already exist)
## Goal
Every push to `main` already runs `.github/workflows/deploy.yml` → SAM deploy → smoke-test the Function URL. After a successful deploy, register the Telegram webhook + command menu **automatically** (currently done manually via `make telegram-setup`).
## Why
- Eliminates manual `make telegram-setup` step after first deploy / handler-path change / webhook secret rotation.
- Self-healing: if Telegram's registered webhook (URL or secret) ever drifts from the deployed state (e.g. secret rotated but `setWebhook` forgotten), the next deploy fixes it.
- `setWebhook` and `setMyCommands` are idempotent — running on every deploy is safe and cheap.
## Non-goals
- Do **not** introduce a new Go binary / Lambda hook / CloudFormation custom resource.
- Do **not** rewrite the existing Makefile targets — keep `make telegram-setup` working for local/manual use.
- Do **not** touch `aws/telegram-commands.json` content or module behavior.
## Phases
| # | Phase | File | Status |
|---|-------|------|--------|
| 01 | Wire telegram-setup into deploy.yml | `phase-01-wire-telegram-setup-into-deploy.md` | implemented |
## Key files
- `.github/workflows/deploy.yml` — add post-smoke-test registration step
- `Makefile` (lines 101-148) — reference impl (do not modify unless CI parity requires)
- `aws/telegram-commands.json` — read by `setMyCommands` step
- `aws/README.md` § 4 — deploy role already has `AmazonSSMFullAccess`, no IAM change needed
## Dependencies
- Deploy role `github-deploy-miti99bot` already has `AmazonSSMFullAccess` + `AWSCloudFormationFullAccess` (verified in `aws/README.md` § 4).
- SSM params `/miti99bot/prod/telegram-bot-token` and `/miti99bot/prod/telegram-webhook-secret` already populated (precondition of first deploy).
## Risks
- **Telegram API outage on deploy** → CI fails even though Lambda is healthy. Mitigation: use `curl -fsS` so HTTP errors (status 400 and above) abort the job; failure surface is loud, recoverable by re-running workflow.
- **Token exposed in logs** → use `::add-mask::` for TOKEN before any echo / curl line. The Bot API puts the token in the URL path, so it cannot be moved into a `-d` argument; masking is the safeguard.
- **Webhook secret rotation race** → SSM read happens after deploy, so newest secret wins. No race in practice.
## Success criteria
After merging this change, the next push to `main`:
1. SAM deploy succeeds.
2. Smoke-test passes.
3. New step: `curl /setWebhook` returns `{"ok":true,...}`.
4. New step: `curl /setMyCommands` returns `{"ok":true,...}`.
5. `getWebhookInfo` shows `url == <FunctionUrl>/webhook`.
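Criterion 5 can be checked mechanically once the `getWebhookInfo` response is in hand. The sketch below assumes `jq` is available; `webhook_matches` is a hypothetical helper and the sample body is hand-written for illustration, using field names from Telegram's `WebhookInfo` object.

```sh
#!/usr/bin/env bash
# Hypothetical helper: compare the registered webhook URL with the expected one.
webhook_matches() {
  local info="$1" expected="$2" actual
  actual=$(printf '%s' "$info" | jq -r '.result.url')
  [ "$actual" = "$expected" ]
}

# Hand-written sample body (not captured from a real bot).
sample_info='{"ok":true,"result":{"url":"https://example.lambda-url.test/webhook","has_custom_certificate":false,"pending_update_count":0}}'

match_rc=0
webhook_matches "$sample_info" "https://example.lambda-url.test/webhook" || match_rc=$?
drift_rc=0
webhook_matches "$sample_info" "https://other.lambda-url.test/webhook" || drift_rc=$?
echo "match=$match_rc drift=$drift_rc"
```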
## Unresolved questions
- None (locked decisions): always-on registration, inline CLI calls (not `make telegram-setup`), fail-loud on API errors.
@@ -0,0 +1,79 @@
# Code Review — Auto-Register Telegram Webhook + Commands After Deploy
**Date:** 2026-05-16
**Branch:** main (uncommitted)
**Scope:** `.github/workflows/deploy.yml` (+46 lines), `docs/deploy-aws.md` (+2), `docs/deploy-aws-free-tier-guide.md` (+2)
**Plan:** `plans/260516-1035-auto-register-after-deploy/`
## Status
**DONE — no must-fix issues.** Implementation faithfully mirrors `Makefile:101-148` reference, hardens it with `set -euo pipefail` + `jq -e` validation, and masks credentials before they can reach any log line. All 8 acceptance criteria from the plan are met by the diff.
## Critical findings
None. No blockers, no security regressions, no contract breaks.
## Verified safe
1. **Mask timing — no leak window.**
- `.github/workflows/deploy.yml:76-79`: `TOKEN=$(...)` then `echo "::add-mask::$TOKEN"` on the very next line. AWS CLI with `--output text --query Parameter.Value` writes nothing else to stdout/stderr, so the value never escapes between capture and mask.
- Same pattern at lines 80-83 (SECRET) and 100-103 (re-read TOKEN in commands step).
2. **No URL-leak via curl error.** Empirically verified `curl 8.5.0 -fsS` on 4xx prints `curl: (22) The requested URL returned error: <code>` — URL is **not** in the message. Even if it were, `::add-mask::` was issued at line 79 before line 86's `curl`, so GH Actions redacts the token from all subsequent log output (stdout + stderr) in the same job. No `set -x`, no `-v`, no `curl -w` interpolating the URL.
3. **Failure semantics — fail-loud, no silent swallows.**
- `set -euo pipefail` + `RESP=$(curl -fsS ...)`: empirically confirmed that curl's exit 22 becomes the exit status of the assignment, so `set -e` kills the shell before the next line. (Note: this works because a plain `VAR=$(...)` assignment returns the command substitution's exit status; `local VAR=$(...)` or `echo "$(...)"` would hide it.)
- `jq -e '.ok == true' >/dev/null || { echo ...; exit 1; }` — catches Telegram returning HTTP 200 with `{"ok":false}` body. The `|| { ... }` keeps `set -e` honest (no pipefail-with-tee anti-patterns).
- Both steps lack `continue-on-error: true` — failures bubble up to the job.
4. **Webhook contract matches code.**
- Path: workflow sends to `${URL%/}/webhook`; router accepts at `internal/server/router.go:51` (`mux.Handle("/webhook", ...)`).
- Secret: workflow forwards `secret_token=${SECRET}`; Telegram echoes back via `X-Telegram-Bot-Api-Secret-Token` header; `internal/telegram/webhook.go:22,48-52` validates with `subtle.ConstantTimeCompare`. Same SSM param (`/miti99bot/prod/telegram-webhook-secret`) feeds both setWebhook and Lambda env (via `template.yaml` `:1` resolver), so values stay in sync.
- `allowed_updates=["message","callback_query"]` — matches `Makefile:120` reference exactly.
5. **setMyCommands payload format.** `aws/telegram-commands.json` has top-level `{"commands": [...]}` which matches Telegram API spec for `--data-binary @file` with `Content-Type: application/json`. Verified via Telegram Bot API docs.
6. **IAM permissions present.** `aws/README.md:74` lists `AmazonSSMFullAccess` and `:69` lists `AWSCloudFormationFullAccess` attached to `github-deploy-miti99bot`. No new grants needed. Plan claim verified.
7. **Concurrency safe.** `.github/workflows/deploy.yml:12-14`: `concurrency.group: deploy-prod`, `cancel-in-progress: false` → serial queueing, no parallel setWebhook race. SSM secret values are versioned and read-after-deploy, so the newest value wins by ordering, not by collision.
8. **YAML structural validity.** 9 total steps (was 7, +2 new). Indentation consistent with existing steps; `env:`, `run:` blocks well-formed (read at `.github/workflows/deploy.yml:67-93,95-111`). No actionlint available locally to formally validate, but eye-parse is clean.
9. **Docs labeling is unambiguous.**
- `docs/deploy-aws.md:56` — "auto-runs `setWebhook` + `setMyCommands` after every push to `main`. The snippet below is the break-glass equivalent..."
- `docs/deploy-aws-free-tier-guide.md:261` — "For first-time setup only. After Step 6 wires the GitHub workflow, every push to `main` auto-runs `setWebhook` + `setMyCommands`; this manual block is the break-glass path."
- Both clearly tag the manual blocks as break-glass / first-time-only. Low risk of a reader running them every deploy.
10. **Idempotency.** `setWebhook` and `setMyCommands` are documented as idempotent by Telegram. Running on every push is safe and self-healing (per plan's stated goal).
11. **No YAGNI/scope creep.** Diff is exactly the two new steps + the two one-line doc notes. No defensive plumbing, no caching layer, no follow-up IAM tightening (correctly deferred to a separate plan).
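The failure-semantics point in item 3 depends on how `set -e` treats assignments from command substitution. The sketch below reproduces that behavior in child shells, with `exit 22` standing in for a failing `curl -f`; the helper name `run_child` is illustrative.

```sh
#!/usr/bin/env bash
# Run a snippet in a child bash and report its exit status on stdout.
run_child() {
  local rc=0
  bash -c "$1" >/dev/null 2>&1 || rc=$?
  echo "$rc"
}

# Plain assignment: the assignment's status is the substitution's status,
# so errexit stops the shell with status 22 before "echo reached".
plain_status=$(run_child 'set -euo pipefail; RESP=$(exit 22); echo reached')

# "local" assignment: the status of "local" itself (0) hides the failure,
# so the shell keeps going and exits 0.
local_status=$(run_child 'set -euo pipefail; f() { local RESP=$(exit 22); echo reached; }; f')

echo "plain assignment: $plain_status"
echo "local assignment: $local_status"
```

The workflow steps use only plain assignments at the top level of the `run:` script, which is the form that fails loudly.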
## Recommendations (defer — non-blocking)
| # | Location | Note |
|---|----------|------|
| R1 | `.github/workflows/deploy.yml:91-92,109-110` | `echo "setWebhook failed: $RESP"` prints the full Telegram response on failure. Telegram's `description` field is generic ("Bad Request: ..."), so token-in-body leak is implausible, but masking is already in place as belt-and-suspenders. Keep as-is — useful for diagnosing rare API rejections. |
| R2 | `docs/deploy-aws.md:67`, `docs/deploy-aws-free-tier-guide.md:273` | Break-glass manual blocks still use `${URL}webhook` (no `%/` trim). This relies on CFN's `FunctionUrl` always ending with `/`, which is the documented Lambda behavior. Not a regression (pre-existing). Tighten next time the file is touched. |
| R3 | `.github/workflows/deploy.yml:76-78,100-102` | Two SSM calls in two adjacent steps to read the same token. Plan already acknowledges this as acceptable (single SSM call is cheap, keeps steps independently re-runnable). Single-step consolidation or `$GITHUB_OUTPUT` is the documented follow-up. |
| R4 | `.github/workflows/deploy.yml:67-93` | No `if: success()` guard on the new steps. GH Actions defaults to running a step only when prior steps succeed, so an explicit guard would be redundant, but adding it would make the intent obvious to a reader. Optional. |
## Unresolved questions
None. All adversarial vectors closed by verification against codebase + empirical curl/bash tests.
---
## Citations
- Workflow diff: `.github/workflows/deploy.yml:67-111`
- Webhook handler validation: `internal/telegram/webhook.go:22,48-52`
- Router mount: `internal/server/router.go:51`
- IAM policy attachments: `aws/README.md:69,74`
- CFN output declaration: `template.yaml:207-210`
- Reference Makefile targets: `Makefile:101-148`
- Commands JSON: `aws/telegram-commands.json:1-96`
- Doc break-glass labels: `docs/deploy-aws.md:56`, `docs/deploy-aws-free-tier-guide.md:261`
- Concurrency control: `.github/workflows/deploy.yml:12-14`
**Status:** DONE
**Summary:** Implementation is correct, secure, and matches the plan. Token masking is in place before any line that could log the value; `curl -fsS` + `jq -e` + `set -euo pipefail` produce loud failures with no silent swallows; `/webhook` path and secret-token contract match `internal/server/router.go:51` and `internal/telegram/webhook.go:48-52`. Docs clearly label manual blocks as break-glass. No must-fix issues; safe to commit.