ci(deploy): auto-register Telegram webhook + commands after SAM deploy

Append two steps to .github/workflows/deploy.yml that POST setWebhook and setMyCommands against the freshly-deployed Function URL, reading credentials from SSM. Mirrors `make telegram-setup` but inlined to avoid the Makefile's --profile admin assumption. Token and webhook-secret are masked via ::add-mask:: before any echo. Jobs fail loudly on Telegram API errors via `jq -e .ok`. Mark the manual setWebhook snippets in docs/deploy-aws.md and docs/deploy-aws-free-tier-guide.md as break-glass.
2026-07-28 20:20:52 +00:00 · 2026-05-16 10:55:43 +07:00
parent 39491d134e
commit 1f5f304041
6 changed files with 336 additions and 0 deletions
@@ -63,3 +63,49 @@ jobs:
            --output text)
          echo "FunctionUrl=$URL"
          curl -fsSL --max-time 30 "$URL/" | tee /tmp/smoke.json | jq .
+
+      - name: Register Telegram webhook
+        env:
+          STACK_ENV: prod
+        run: |
+          set -euo pipefail
+          URL=$(aws cloudformation describe-stacks \
+            --stack-name "$STACK_NAME" \
+            --query "Stacks[0].Outputs[?OutputKey=='FunctionUrl'].OutputValue" \
+            --output text)
+          TOKEN=$(aws ssm get-parameter \
+            --name "/miti99bot/${STACK_ENV}/telegram-bot-token" \
+            --with-decryption --query Parameter.Value --output text)
+          echo "::add-mask::$TOKEN"
+          SECRET=$(aws ssm get-parameter \
+            --name "/miti99bot/${STACK_ENV}/telegram-webhook-secret" \
+            --with-decryption --query Parameter.Value --output text)
+          echo "::add-mask::$SECRET"
+          WEBHOOK_URL="${URL%/}/webhook"
+          echo "Setting Telegram webhook to ${WEBHOOK_URL}"
+          RESP=$(curl -fsS --max-time 30 -X POST \
+            "https://api.telegram.org/bot${TOKEN}/setWebhook" \
+            -d "url=${WEBHOOK_URL}" \
+            -d "secret_token=${SECRET}" \
+            -d 'allowed_updates=["message","callback_query"]')
+          echo "$RESP" | jq -e '.ok == true' >/dev/null \
+            || { echo "setWebhook failed: $RESP"; exit 1; }
+          echo "$RESP" | jq '{ok, result, description}'
+
+      - name: Register Telegram command menu
+        env:
+          STACK_ENV: prod
+        run: |
+          set -euo pipefail
+          TOKEN=$(aws ssm get-parameter \
+            --name "/miti99bot/${STACK_ENV}/telegram-bot-token" \
+            --with-decryption --query Parameter.Value --output text)
+          echo "::add-mask::$TOKEN"
+          echo "Registering Telegram commands from aws/telegram-commands.json"
+          RESP=$(curl -fsS --max-time 30 -X POST \
+            "https://api.telegram.org/bot${TOKEN}/setMyCommands" \
+            -H 'Content-Type: application/json' \
+            --data-binary "@aws/telegram-commands.json")
+          echo "$RESP" | jq -e '.ok == true' >/dev/null \
+            || { echo "setMyCommands failed: $RESP"; exit 1; }
+          echo "$RESP" | jq '{ok, result, description}'
@@ -258,6 +258,8 @@ aws cloudformation describe-stacks --profile admin \

 ## Step 5 — Point Telegram at the webhook

+> For first-time setup only. After `Step 6` wires the GitHub workflow, every push to `main` auto-runs `setWebhook` + `setMyCommands`; this manual block is the break-glass path.
+
 ```sh
 URL=…   # from previous command
 TOKEN=$(aws ssm get-parameter --profile admin \
@@ -53,6 +53,8 @@ curl -fsSL "$(...)/" | jq .                       # health JSON

 ## Set the Telegram webhook

+> `.github/workflows/deploy.yml` auto-runs `setWebhook` + `setMyCommands` after every push to `main`. The snippet below is the break-glass equivalent for manual / out-of-band fixes (e.g. rerun from a workstation when CI is unavailable).
+
 ```sh
 URL=$(aws cloudformation describe-stacks --stack-name miti99bot \
        --query "Stacks[0].Outputs[?OutputKey=='FunctionUrl'].OutputValue" --output text)
@@ -0,0 +1,147 @@
+# Phase 01 — Wire `telegram-setup` Into `deploy.yml`
+
+**Status:** implemented (pending live verification on next push to main)
+**Priority:** P1 (next deploy needs it for full automation)
+**Estimate:** ~30 min implementation + 1 deploy cycle to verify
+
+## Context links
+
+- Parent plan: `../plan.md`
+- Existing GH workflow: `.github/workflows/deploy.yml`
+- Reference Makefile targets: `Makefile` lines 101-148 (`telegram-setup`, `telegram-webhook`, `telegram-commands`)
+- Commands payload: `aws/telegram-commands.json`
+- Deploy role IAM: `aws/README.md` § 4 (already has `AmazonSSMFullAccess`)
+- Cutover docs: `docs/deploy-aws.md` line 64, `docs/deploy-aws-free-tier-guide.md` line 270 (current manual steps)
+
+## Overview
+
+Add two steps to `deploy.yml` after the existing **Smoke test** step:
+
+1. **Register Telegram webhook** — read `FunctionUrl` from CFN, read token + secret from SSM, `POST` `setWebhook`.
+2. **Register Telegram command menu** — read token from SSM, `POST` `setMyCommands` with `aws/telegram-commands.json`.
+
+Mirror the existing inline-CLI pattern used by the Smoke step (no `make` indirection — Makefile uses `--profile admin` which is wrong for CI).
+
+## Key insights
+
+- Deploy role already has `AmazonSSMFullAccess` and `AWSCloudFormationFullAccess` → no IAM change.
+- `setWebhook` / `setMyCommands` are idempotent → safe to run every push.
+- CFN `Outputs.FunctionUrl` (template.yaml:208-210) → reused from smoke step.
+- SSM param paths fixed at `/miti99bot/prod/{telegram-bot-token,telegram-webhook-secret}` (per `aws/README.md` § 3, `samconfig.toml`).
+- Bot token must be **masked** before any shell echo (it is a credential embedded in the request URL path).
+- `aws/telegram-commands.json` is committed → `--data-binary "@aws/telegram-commands.json"` works directly.
+
+## Requirements
+
+### Functional
+1. After a green SAM deploy + smoke test, the workflow registers the webhook + commands.
+2. Webhook URL format: `${FunctionUrl%/}/webhook` (trim trailing slash; Function URLs sometimes include it).
+3. `allowed_updates` must equal `["message","callback_query"]` (matches `Makefile:120`).
+4. `secret_token` from SSM is sent in the `setWebhook` payload — bot validates this on every incoming update.
+5. Job fails (non-zero exit) if either Telegram call returns non-2xx **or** `"ok": false`.
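
Requirement 2 can be checked in isolation. The following is a minimal sketch, assuming a hypothetical helper name `webhook_url` (it is not part of the workflow) and a placeholder host. It shows that `${var%/}` strips at most one trailing slash, so both URL shapes produce the same endpoint.

```shell
#!/usr/bin/env bash
set -euo pipefail

# webhook_url BASE: hypothetical helper mirroring WEBHOOK_URL="${URL%/}/webhook".
# ${1%/} removes one trailing "/" if present, so the result has exactly one
# slash before "webhook" whether or not BASE ends with "/".
webhook_url() {
  printf '%s/webhook\n' "${1%/}"
}

# The host below is a placeholder, not a real Function URL.
webhook_url "https://example.lambda-url.invalid/"
webhook_url "https://example.lambda-url.invalid"
```

Both calls print the same `/webhook` URL, which is why the workflow does not depend on whether CloudFormation reports the Function URL with a trailing slash.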
+
+### Non-functional
+- Token never appears in plaintext logs (`echo "::add-mask::$TOKEN"` before use).
+- No new secrets in GitHub repo settings — everything still flows through SSM.
+- No new third-party action — only `aws` CLI + `curl` + `jq` (all preinstalled on `ubuntu-latest`).
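
The read-then-mask pattern behind the first non-functional requirement could be factored as below. This is only a sketch: `ssm_get` and `load_secret` are hypothetical names that do not exist in the workflow, which inlines the same two commands instead.

```shell
#!/usr/bin/env bash
set -euo pipefail

# ssm_get NAME: thin wrapper over the same AWS CLI call the workflow uses.
ssm_get() {
  aws ssm get-parameter --name "$1" \
    --with-decryption --query Parameter.Value --output text
}

# load_secret VAR NAME: hypothetical helper. Reads parameter NAME, emits the
# GitHub Actions mask command for its value, then stores it in variable VAR.
load_secret() {
  local var=$1 name=$2 value
  # Assignment on its own line so a failing ssm_get is not hidden by `local`.
  value=$(ssm_get "$name")
  echo "::add-mask::$value"
  printf -v "$var" '%s' "$value"
}

# Intended use inside a step (needs AWS credentials, so not executed here):
#   load_secret TOKEN "/miti99bot/${STACK_ENV}/telegram-bot-token"
```

Because `load_secret` assigns through `printf -v`, it has to be called directly rather than inside `$(...)`; in a subshell the variable would be set and then lost.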
+
+## Architecture
+
+```
+deploy.yml job: deploy (existing)
+  ├─ checkout              (existing)
+  ├─ setup-go              (existing)
+  ├─ setup-sam             (existing)
+  ├─ configure-aws-creds   (existing)
+  ├─ build-lambda          (existing)
+  ├─ sam-deploy            (existing)
+  ├─ smoke-test            (existing) — reads FunctionUrl from CFN
+  ├─ register-webhook      (NEW)      — reads FunctionUrl + SSM token/secret, POST setWebhook
+  └─ register-commands     (NEW)      — reads SSM token, POST setMyCommands w/ aws/telegram-commands.json
+```
+
+The Function URL is fetched twice (smoke + register-webhook). Acceptable: CFN describe-stacks is fast and the steps stay independent / debuggable. Optimization (cache URL in a step output) is out of scope.
+
+## Related code files
+
+**Modify**
+- `.github/workflows/deploy.yml` — append two steps after **Smoke test**
+
+**Read (no change)**
+- `Makefile` lines 101-148 — reference implementation
+- `aws/telegram-commands.json` — payload body
+
+**Possibly update**
+- `docs/deploy-aws.md` line 64 — current "manual setWebhook" instructions become "automatic on push; manual command kept for emergencies"
+- `docs/deploy-aws-free-tier-guide.md` line 270 — same note
+
+## Implementation steps
+
+1. **Open** `.github/workflows/deploy.yml`. Locate the `- name: Smoke test (Function URL responds)` step (last step today).
+2. **Append step `Register Telegram webhook`** after smoke-test:
+   - Reuse `STACK_NAME` env (already at job level).
+   - `URL=$(aws cloudformation describe-stacks ... FunctionUrl ...)` — copy pattern from smoke step.
+   - `TOKEN=$(aws ssm get-parameter --name /miti99bot/prod/telegram-bot-token --with-decryption --query Parameter.Value --output text)`
+   - `echo "::add-mask::$TOKEN"` immediately after read.
+   - `SECRET=$(aws ssm get-parameter --name /miti99bot/prod/telegram-webhook-secret --with-decryption --query Parameter.Value --output text)`
+   - `echo "::add-mask::$SECRET"` immediately after read.
+   - `WEBHOOK_URL="${URL%/}/webhook"`
+   - `RESP=$(curl -fsS -X POST "https://api.telegram.org/bot${TOKEN}/setWebhook" -d "url=${WEBHOOK_URL}" -d "secret_token=${SECRET}" -d 'allowed_updates=["message","callback_query"]')`
+   - `echo "$RESP" | jq -e '.ok == true' >/dev/null || { echo "setWebhook failed: $RESP"; exit 1; }`
+   - `echo "$RESP" | jq '{ok, result, description}'`
+3. **Append step `Register Telegram command menu`**:
+   - Read `TOKEN` from SSM (same path) and re-mask. (Step env doesn't persist across steps; re-read is fine — single SSM call is cheap.)
+   - `RESP=$(curl -fsS -X POST "https://api.telegram.org/bot${TOKEN}/setMyCommands" -H 'Content-Type: application/json' --data-binary "@aws/telegram-commands.json")`
+   - Same `jq -e '.ok == true'` validation + pretty-print.
+4. **Lint locally** with `actionlint` if available (or just YAML parse): `python3 -c "import yaml; yaml.safe_load(open('.github/workflows/deploy.yml'))"`.
+5. **Update docs**:
+   - `docs/deploy-aws.md` line 64: add note "as of <date>, push-to-main auto-runs `setWebhook` + `setMyCommands`; manual command below kept for break-glass".
+   - Same in `docs/deploy-aws-free-tier-guide.md` line 270.
+6. **Commit** with conventional message: `ci(deploy): auto-register Telegram webhook + commands after SAM deploy`.
+
+## Todo list
+
+- [x] Read current `deploy.yml` end (smoke step) to confirm insertion point
+- [x] Append `Register Telegram webhook` step (with token mask + `jq -e` validation)
+- [x] Append `Register Telegram command menu` step (with token mask + `jq -e` validation)
+- [x] Validate YAML parse locally (`yaml.safe_load` → OK)
+- [x] Update `docs/deploy-aws.md` + `docs/deploy-aws-free-tier-guide.md` notes
+- [ ] Commit + push, observe first run in GH Actions UI
+- [ ] Verify `curl https://api.telegram.org/bot$TOKEN/getWebhookInfo` shows the Function URL post-deploy
+
+## Success criteria
+
+| Check | How to verify |
+|-------|---------------|
+| Step `Register Telegram webhook` shows green | GH Actions run UI |
+| Step `Register Telegram command menu` shows green | GH Actions run UI |
+| `setWebhook` response logged as `{ok:true, result:true, description:"Webhook was set"}` (or `"Webhook is already set"` when nothing changed) | step log |
+| Token / secret not visible in logs | search step output for the token value → no match; the spots where it was used show `***` |
+| `make telegram-webhook-info` from local shows `url == <FunctionUrl>/webhook` | local `make` after pipeline finishes |
+| `/help` works in Telegram after deploy | live bot smoke test |
+
+## Risk assessment
+
+| Risk | Likelihood | Impact | Mitigation |
+|------|-----------|--------|-----------|
+| Telegram API blip → CI fails despite healthy deploy | Low | Low (Lambda still serving) | `-fsS` + manual workflow re-run; document break-glass `make telegram-setup` |
+| SSM param missing on first-ever deploy | Low | High (deploy red) | Precondition documented in `aws/README.md` § 3 — params must exist before first push (already true today) |
+| Bot token printed in `set -x`-style verbose log | Medium | High (token leak) | `::add-mask::` immediately after SSM read; never use `set -x` in these steps |
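+| `aws/telegram-commands.json` invalid JSON | Low | Low (single-step fail) | Telegram validates the body and rejects it with an HTTP error, which `curl -f` turns into a failed step; `jq -e '.ok == true'` covers a 200 response carrying `ok:false` |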
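
The response check that several mitigations above rely on can be sketched as a small function. `require_ok` is a hypothetical name; the workflow inlines the same two `jq` calls in each step. The sample bodies in the usage note are illustrative, not captured API output.

```shell
#!/usr/bin/env bash
set -euo pipefail

# require_ok LABEL RESPONSE: hypothetical helper wrapping the workflow's check.
# jq -e exits non-zero when the filter yields false or null (and also when the
# input is not JSON), so a body such as {"ok":false,...} fails the check.
require_ok() {
  local label=$1 resp=$2
  if ! jq -e '.ok == true' >/dev/null <<<"$resp"; then
    echo "$label failed: $resp" >&2
    return 1
  fi
  # Pretty-print only the fields that are useful in the step log.
  jq '{ok, result, description}' <<<"$resp"
}
```

A call such as `require_ok setWebhook "$RESP"` would sit right after the `curl` line. HTTP-level errors are already handled by `curl -f`; this function only covers a response body that does not report `"ok": true`.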
+
+## Security considerations
+
+- **Mask tokens / secrets**: `::add-mask::` after every SSM read. GH Actions then redacts that string from all subsequent log lines (including child commands).
+- **Path-credential leak**: `curl` URL contains `${TOKEN}`. Masking covers it, but additionally avoid `set -x`, `curl -v`, or echoing the request URL in any temporary debug code.
+- **No new IAM**: deploy role keeps existing scope; no expansion of privileges.
+- **Webhook secret**: validates inbound requests at `internal/telegram/webhook.go:19-20` (`X-Telegram-Bot-Api-Secret-Token` header). Auto-registration enforces same secret across token rotations.
+
+## Next steps
+
+After this phase merges and one deploy cycle confirms green:
+- (Optional follow-up, separate plan) Replace `AmazonSSMFullAccess` with a scoped policy granting only `ssm:GetParameter` on `/miti99bot/*` — pairs with the broader IAM tightening already deferred in `aws/README.md` § 4.
+- (Optional) Cache `FunctionUrl` between smoke and register-webhook via `$GITHUB_OUTPUT` — micro-optimization only.
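
For the optional caching follow-up, a minimal sketch of the producer side is shown here. `publish_output` and the output key `function_url` are hypothetical names, and the smoke step would also need an `id` (assumed to be `smoke` below), which it may not have today.

```shell
#!/usr/bin/env bash
set -euo pipefail

# publish_output KEY VALUE: append "KEY=VALUE" to the file named by
# $GITHUB_OUTPUT, which GitHub Actions provides to every step.
publish_output() {
  printf '%s=%s\n' "$1" "$2" >> "$GITHUB_OUTPUT"
}

# In the smoke step (assumed to carry `id: smoke`):
#   publish_output function_url "$URL"
# A later step could then read ${{ steps.smoke.outputs.function_url }}
# instead of calling describe-stacks a second time.
```

The trade-off noted in the plan still applies: sharing the value couples the two steps, whereas the current duplicate `describe-stacks` call keeps each step independently debuggable.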
+
+## Unresolved questions
+
+- None. All decisions locked: always-on registration, inline CLI (no `make` from CI), `jq -e` failure semantics, both docs files updated.
@@ -0,0 +1,60 @@
+# Auto-Register Telegram Webhook + Commands After Deploy
+
+**Date:** 2026-05-16
+**Slug:** `260516-1035-auto-register-after-deploy`
+**Status:** Implemented (pending commit + live verification on next push to main)
+**Type:** CI/CD enhancement (single phase)
+**Mode:** fast (no research needed — referenced code paths already exist)
+
+## Goal
+
+Every push to `main` already runs `.github/workflows/deploy.yml` → SAM deploy → smoke-test the Function URL. After a successful deploy, register the Telegram webhook + command menu **automatically** (currently done manually via `make telegram-setup`).
+
+## Why
+
+- Eliminates manual `make telegram-setup` step after first deploy / handler-path change / webhook secret rotation.
+- Self-healing: if Telegram's registered webhook ever drifts from the deployed state (e.g. Function URL changed, or secret rotated but `setWebhook` forgotten), the next deploy fixes it.
+- `setWebhook` and `setMyCommands` are idempotent — running on every deploy is safe and cheap.
+
+## Non-goals
+
+- Do **not** introduce a new Go binary / Lambda hook / CloudFormation custom resource.
+- Do **not** rewrite the existing Makefile targets — keep `make telegram-setup` working for local/manual use.
+- Do **not** touch `aws/telegram-commands.json` content or module behavior.
+
+## Phases
+
+| # | Phase | File | Status |
+|---|-------|------|--------|
+| 01 | Wire telegram-setup into deploy.yml | `phase-01-wire-telegram-setup-into-deploy.md` | implemented |
+
+## Key files
+
+- `.github/workflows/deploy.yml` — add two post-smoke-test registration steps
+- `Makefile` (lines 101-148) — reference impl (do not modify unless CI parity requires)
+- `aws/telegram-commands.json` — read by `setMyCommands` step
+- `aws/README.md` § 4 — deploy role already has `AmazonSSMFullAccess`, no IAM change needed
+
+## Dependencies
+
+- Deploy role `github-deploy-miti99bot` already has `AmazonSSMFullAccess` + `AWSCloudFormationFullAccess` (verified in `aws/README.md` § 4).
+- SSM params `/miti99bot/prod/telegram-bot-token` and `/miti99bot/prod/telegram-webhook-secret` already populated (precondition of first deploy).
+
+## Risks
+
+- **Telegram API outage on deploy** → CI fails even though Lambda is healthy. Mitigation: use `curl -fsS` so non-2xx aborts the job; failure surface is loud, recoverable by re-running workflow.
+- **Token exposed in logs** → use `::add-mask::` for TOKEN before any echo / curl line. The token travels in the request URL path rather than in a `-d` field, so any tool that prints the URL would expose it; mask regardless.
+- **Webhook secret rotation race** → SSM read happens after deploy, so newest secret wins. No race in practice.
+
+## Success criteria
+
+After merging this change, the next push to `main`:
+1. SAM deploy succeeds.
+2. Smoke-test passes.
+3. New step: `curl /setWebhook` returns `{"ok":true,...}`.
+4. New step: `curl /setMyCommands` returns `{"ok":true,...}`.
+5. `getWebhookInfo` shows `url == <FunctionUrl>/webhook`.
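
Criterion 5 could be automated with a comparison like the one below. This is a sketch only: `webhook_registered` is a hypothetical helper, and it assumes the `getWebhookInfo` response carries the registered URL at `.result.url`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# webhook_registered EXPECTED_URL INFO_JSON: hypothetical helper.
# Succeeds only when the getWebhookInfo body reports EXPECTED_URL.
webhook_registered() {
  local expected=$1 info=$2 actual
  # "// empty" turns a missing or null url into an empty string.
  actual=$(jq -r '.result.url // empty' <<<"$info")
  [ "$actual" = "$expected" ]
}

# Intended use after a deploy (performs a real API call, so not run here):
#   INFO=$(curl -fsS --max-time 30 "https://api.telegram.org/bot${TOKEN}/getWebhookInfo")
#   webhook_registered "${URL%/}/webhook" "$INFO" \
#     || { echo "webhook URL drifted" >&2; exit 1; }
```

Call it with `||` or inside `if`, as in the commented usage, so that a mismatch produces a readable message rather than a bare `set -e` exit.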
+
+## Unresolved questions
+
+- None (locked decisions): always-on registration, inline CLI calls (not `make telegram-setup`), fail-loud on API errors.
@@ -0,0 +1,79 @@
+# Code Review — Auto-Register Telegram Webhook + Commands After Deploy
+
+**Date:** 2026-05-16
+**Branch:** main (uncommitted)
+**Scope:** `.github/workflows/deploy.yml` (+46 lines), `docs/deploy-aws.md` (+2), `docs/deploy-aws-free-tier-guide.md` (+2)
+**Plan:** `plans/260516-1035-auto-register-after-deploy/`
+
+## Status
+
+**DONE — no must-fix issues.** Implementation faithfully mirrors `Makefile:101-148` reference, hardens it with `set -euo pipefail` + `jq -e` validation, and masks credentials before they can reach any log line. All 8 acceptance criteria from the plan are met by the diff.
+
+## Critical findings
+
+None. No blockers, no security regressions, no contract breaks.
+
+## Verified safe
+
+1. **Mask timing — no leak window.**
+   - `.github/workflows/deploy.yml:76-79` — `TOKEN=$(...)` then `echo "::add-mask::$TOKEN"` on the very next line. AWS CLI with `--output text --query Parameter.Value` writes nothing else to stdout/stderr, so the value never escapes between capture and mask.
+   - Same pattern at lines 80-83 (SECRET) and 100-103 (re-read TOKEN in commands step).
+
+2. **No URL-leak via curl error.** Empirically verified `curl 8.5.0 -fsS` on 4xx prints `curl: (22) The requested URL returned error: <code>` — URL is **not** in the message. Even if it were, `::add-mask::` was issued at line 79 before line 86's `curl`, so GH Actions redacts the token from all subsequent log output (stdout + stderr) in the same job. No `set -x`, no `-v`, no `curl -w` interpolating the URL.
+
+3. **Failure semantics — fail-loud, no silent swallows.**
+   - `set -euo pipefail` + `RESP=$(curl -fsS ...)`: empirically confirmed that curl exit-22 propagates through command substitution and `set -e` kills the shell before the next line. (Note: a plain assignment takes the exit status of its command substitution, so `errexit` fires; `local RESP=$(...)` or `echo "$(...)"` would mask it.)
+   - `jq -e '.ok == true' >/dev/null || { echo ...; exit 1; }` catches Telegram returning HTTP 200 with a `{"ok":false}` body. The explicit `|| { ...; exit 1; }` prints the response before exiting instead of relying on `set -e` alone, and nothing downstream of the pipe (such as `tee`) can mask jq's status.
+   - Both steps lack `continue-on-error: true` — failures bubble up to the job.
+
+4. **Webhook contract matches code.**
+   - Path: workflow sends to `${URL%/}/webhook`; router accepts at `internal/server/router.go:51` (`mux.Handle("/webhook", ...)`).
+   - Secret: workflow forwards `secret_token=${SECRET}`; Telegram echoes back via `X-Telegram-Bot-Api-Secret-Token` header; `internal/telegram/webhook.go:22,48-52` validates with `subtle.ConstantTimeCompare`. Same SSM param (`/miti99bot/prod/telegram-webhook-secret`) feeds both setWebhook and Lambda env (via `template.yaml` `:1` resolver), so values stay in sync.
+   - `allowed_updates=["message","callback_query"]` — matches `Makefile:120` reference exactly.
+
+5. **setMyCommands payload format.** `aws/telegram-commands.json` has top-level `{"commands": [...]}`, which matches the JSON request body that `setMyCommands` accepts; the workflow sends it unchanged via `--data-binary @file` with `Content-Type: application/json`. Verified via Telegram Bot API docs.
+
+6. **IAM permissions present.** `aws/README.md:74` lists `AmazonSSMFullAccess` and `:69` lists `AWSCloudFormationFullAccess` attached to `github-deploy-miti99bot`. No new grants needed. Plan claim verified.
+
+7. **Concurrency safe.** `.github/workflows/deploy.yml:12-14` — `concurrency.group: deploy-prod`, `cancel-in-progress: false` → serial queueing, no parallel setWebhook race. SSM secret values are versioned and read-after-deploy, so the newest value wins by ordering, not by collision.
+
+8. **YAML structural validity.** 9 total steps (was 7, +2 new). Indentation consistent with existing steps; `env:`, `run:` blocks well-formed (read at `.github/workflows/deploy.yml:67-93,95-111`). No actionlint available locally to formally validate, but eye-parse is clean.
+
+9. **Docs labeling is unambiguous.**
+   - `docs/deploy-aws.md:56` — "auto-runs `setWebhook` + `setMyCommands` after every push to `main`. The snippet below is the break-glass equivalent..."
+   - `docs/deploy-aws-free-tier-guide.md:261` — "For first-time setup only. After Step 6 wires the GitHub workflow, every push to `main` auto-runs `setWebhook` + `setMyCommands`; this manual block is the break-glass path."
+   - Both clearly tag the manual blocks as break-glass / first-time-only. Low risk of a reader running them every deploy.
+
+10. **Idempotency.** `setWebhook` and `setMyCommands` overwrite the previous registration with the values sent, so repeating a call with identical parameters leaves the bot in the same state. Running on every push is safe and self-healing (per plan's stated goal).
+
+11. **No YAGNI/scope creep.** Diff is exactly the two new steps + the two one-line doc notes. No defensive plumbing, no caching layer, no follow-up IAM tightening (correctly deferred to a separate plan).
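
The failure semantics in item 3 depend on how `errexit` treats command substitution. A minimal demonstration that runs offline is sketched below, using `false` as a stand-in for a failing `curl -f`. Each case runs in its own `bash -c` so that a triggered `errexit` cannot abort the outer script.

```shell
#!/usr/bin/env bash

# Plain assignment: the assignment's status is the substitution's status,
# so errexit stops the inner shell before "reached" is printed.
plain=$(bash -c 'set -euo pipefail; RESP=$(false); echo reached' || true)

# `local` returns its own (successful) status and hides the failure,
# so the inner shell keeps going and prints "reached".
masked=$(bash -c 'set -euo pipefail; f() { local RESP=$(false); echo reached; }; f' || true)

echo "plain assignment: '${plain}'"   # prints: plain assignment: ''
echo "local assignment: '${masked}'"  # prints: local assignment: 'reached'
```

The workflow steps use the plain form (`RESP=$(curl -fsS ...)`) at top level, which is the case that aborts. Wrapping the same call in a function with `local RESP=$(...)` would lose that guarantee.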
+
+## Recommendations (defer — non-blocking)
+
+| # | Location | Note |
+|---|----------|------|
+| R1 | `.github/workflows/deploy.yml:91-92,109-110` | `echo "setWebhook failed: $RESP"` prints the full Telegram response on failure. Telegram's `description` field is generic ("Bad Request: ..."), so token-in-body leak is implausible, but masking is already in place as belt-and-suspenders. Keep as-is — useful for diagnosing rare API rejections. |
+| R2 | `docs/deploy-aws.md:67`, `docs/deploy-aws-free-tier-guide.md:273` | Break-glass manual blocks still use `${URL}webhook` (no `%/` trim). This relies on CFN's `FunctionUrl` always ending with `/`, which is the documented Lambda behavior. Not a regression (pre-existing). Tighten next time the file is touched. |
+| R3 | `.github/workflows/deploy.yml:76-78,100-102` | Two SSM calls in two adjacent steps to read the same token. Plan already acknowledges this as acceptable (single SSM call is cheap, keeps steps independently re-runnable). Single-step consolidation or `$GITHUB_OUTPUT` is the documented follow-up. |
+| R4 | `.github/workflows/deploy.yml:67-93` | No `if: success()` guard on the new steps. GH Actions already runs a step only when all prior steps succeeded, so the guard would be functionally redundant; adding it explicitly would only make the intent obvious to a reader. Optional. |
+
+## Unresolved questions
+
+None. All adversarial vectors closed by verification against codebase + empirical curl/bash tests.
+
+---
+
+## Citations
+
+- Workflow diff: `.github/workflows/deploy.yml:67-111`
+- Webhook handler validation: `internal/telegram/webhook.go:22,48-52`
+- Router mount: `internal/server/router.go:51`
+- IAM policy attachments: `aws/README.md:69,74`
+- CFN output declaration: `template.yaml:207-210`
+- Reference Makefile targets: `Makefile:101-148`
+- Commands JSON: `aws/telegram-commands.json:1-96`
+- Doc break-glass labels: `docs/deploy-aws.md:56`, `docs/deploy-aws-free-tier-guide.md:261`
+- Concurrency control: `.github/workflows/deploy.yml:12-14`
+
+**Status:** DONE
+**Summary:** Implementation is correct, secure, and matches the plan. Token masking is in place before any line that could log the value; `curl -fsS` + `jq -e` + `set -euo pipefail` produce loud failures with no silent swallows; `/webhook` path and secret-token contract match `internal/server/router.go:51` and `internal/telegram/webhook.go:48-52`. Docs clearly label manual blocks as break-glass. No must-fix issues; safe to commit.