docs(webhooks): clarify duplicate causes, secret stability, timeout tunability#86
docs(webhooks): clarify duplicate causes, secret stability, timeout tunability#86Yan Xue (yanxue06) wants to merge 3 commits into
Conversation
…unability - delivery: document the per-attempt request timeout as a tunable knob (30s default) so the contract reflects that a deployment can run it shorter, rather than asserting a flat 30s the worker can override - troubleshooting: add the multi-registration cause to "I receive duplicates" — every registered URL gets every event, so extra/stale webhooks double output and dedupe can't fix it across independent backends; keep one canonical URL - managing-webhooks: note the signing secret is stable for the life of the registration; app/relay/worker restarts never rotate it Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
📝 WalkthroughWalkthroughDocumentation updates: per-attempt delivery timeout and examples changed to 15 seconds, signingSecret stability clarified for webhook registrations, and duplicate-delivery troubleshooting split into retry-driven and registration-driven scenarios. ChangesWebhook Documentation Improvements
🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ESLint
ESLint install timed out. The project may have too many dependencies for the sandbox. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
webhooks/troubleshooting.mdx (1)
122-122: 📐 Maintainability & Code Quality | ⚡ Quick winConsider breaking the long sentence into shorter sentences.
Line 122 contains multiple ideas in a single 85+ word sentence. Per the coding guidelines, documentation should keep sentences concise with one idea per sentence. Breaking this into 2-3 shorter sentences would improve readability.
📝 Suggested rewrite
-3. **More than one webhook is registered and more than one acts on the event.** Every registered URL receives *every* event (see [Multiple webhooks per project](/webhooks/managing-webhooks#multiple-webhooks-per-project)) — including **stale registrations you forgot to delete** after a URL change. If two endpoints both act (e.g. both reply), every message is doubled at the source. Deduping won't help here: two independent backends don't share a dedupe store, so each processes its own copy exactly once and the user still sees two. The fix is to keep one canonical webhook and [delete the rest](/webhooks/managing-webhooks#delete-a-webhook). If your URL changes on every restart or deploy (ngrok, preview environments), delete the old registration each time you add the new one — see ["ngrok URL keeps changing"](`#ngrok-url-keeps-changing`). +3. **More than one webhook is registered and more than one acts on the event.** Every registered URL receives *every* event (see [Multiple webhooks per project](/webhooks/managing-webhooks#multiple-webhooks-per-project)) — including **stale registrations you forgot to delete** after a URL change. If two endpoints both act (e.g. both reply), every message is doubled at the source. Deduping won't help here because two independent backends don't share a dedupe store, so each processes its own copy exactly once and the user still sees two replies. The fix is to keep one canonical webhook and [delete the rest](/webhooks/managing-webhooks#delete-a-webhook). If your URL changes on every restart or deploy (ngrok, preview environments), delete the old registration each time you add the new one — see ["ngrok URL keeps changing"](`#ngrok-url-keeps-changing`).As per coding guidelines, documentation should keep sentences concise with one idea per sentence.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@webhooks/troubleshooting.mdx` at line 122, Split the long sentence in the list item that begins "3. **More than one webhook is registered and more than one acts on the event.** Every registered URL receives *every* event..." into 2–3 shorter sentences: 1) state that every registered URL receives every event (including stale registrations), 2) explain the consequence that two endpoints that both act will cause duplicated messages and that deduping across independent backends won’t help, and 3) give the fix as keeping one canonical webhook and deleting others (and mention the ngrok/preview-environment URL-change advice). Keep the original wording/links but move clauses into separate sentences for clarity.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@webhooks/delivery.mdx`:
- Line 82: Update the table row for "Per-attempt timeout" to remove or clarify
the ambiguous "`>30s` ceiling from the section above" phrasing: replace that
fragment with a clear reference such as "the 30-second ceiling mentioned above"
or delete the cross-reference entirely so the cell reads self-contained (e.g.,
"How long a single attempt waits for your endpoint to respond before aborting
and scheduling a retry — the 30-second ceiling described above."). Target the
table row text containing "Per-attempt timeout", "30 seconds", and the literal
"`>30s` ceiling" for this edit.
---
Nitpick comments:
In `@webhooks/troubleshooting.mdx`:
- Line 122: Split the long sentence in the list item that begins "3. **More than
one webhook is registered and more than one acts on the event.** Every
registered URL receives *every* event..." into 2–3 shorter sentences: 1) state
that every registered URL receives every event (including stale registrations),
2) explain the consequence that two endpoints that both act will cause
duplicated messages and that deduping across independent backends won’t help,
and 3) give the fix as keeping one canonical webhook and deleting others (and
mention the ngrok/preview-environment URL-change advice). Keep the original
wording/links but move clauses into separate sentences for clarity.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: b5f1496b-48f9-4aac-85c4-970fcfab350c
📒 Files selected for processing (3)
webhooks/delivery.mdxwebhooks/managing-webhooks.mdxwebhooks/troubleshooting.mdx
📜 Review details
🧰 Additional context used
📓 Path-based instructions (1)
**/*.mdx
📄 CodeRabbit inference engine (AGENTS.md)
**/*.mdx: Pages should be written as MDX files with YAML frontmatter
Use active voice and second person ("you") in documentation
Keep sentences concise with one idea per sentence in documentation
Use sentence case for headings in documentation
Bold UI elements in documentation (e.g., Click Settings)
Use code formatting for file names, commands, paths, and code references in documentation
Files:
webhooks/managing-webhooks.mdxwebhooks/troubleshooting.mdxwebhooks/delivery.mdx
🔇 Additional comments (1)
webhooks/managing-webhooks.mdx (1)
208-210: LGTM!
The row's description is self-contained; the ">30s" notation actually appears below this table (status-code section), not above, and line 64 says "30 seconds" rather than ">30s". Addresses PR review feedback. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Prod runs DELIVERY_TIMEOUT_MS=15000; the docs asserted 30s throughout. Swept every per-attempt-timeout reference to 15s and recomputed the hang-to-timeout worst case (6x15s + ~39s backoff ~= ~2 min, was ~3.5). Left the backoff/retry-window figures (~26-39s; the "~30s budget/window" rows) unchanged -- those are the retry-sleep schedule, independent of the per-attempt request timeout. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
webhooks/delivery.mdx (1)
182-182: 📐 Maintainability & Code Quality | 🟠 MajorClarify “~30s default” for the “Endpoint down” case (line 182) to match the timeout worst-case.
webhooks/delivery.mdxstates that if every attempt hangs to the 15s per-attempt timeout, wall-clock time can reach ~2 minutes (line 49). But line 182 describes “Endpoint down for the full retry window (~30s default, …)”, which reads like an unresponsive/timeout scenario and conflicts with the ~2-minute worst case.Update line 182 to explicitly distinguish the “quick failure” case (~30s, dominated by backoff / 503 recovery) from the “timeout exhaustion” case (up to ~2 minutes, dominated by the 15s per-attempt timeout), or reword the scenario so “down” means what you intend.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@webhooks/delivery.mdx` at line 182, Update the table row that currently reads "Endpoint down for the full retry window (~30s default, more if you've requested tuning)" to clearly distinguish two failure modes: a quick-failure/retry-window case (≈30s default, driven by backoff and immediate 5xx responses) and a timeout-exhaustion case (up to ≈2 minutes worst-case if every attempt hits the 15s per-attempt timeout referenced earlier around the "~2 minutes" note). Reword that table cell to either split into two lines or append a parenthetical clarifying text (e.g., "quick failure: ~30s (backoff/503) vs. timeout exhaustion: up to ~2min (15s per attempt)") so "down" unambiguously conveys which timing applies.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@webhooks/delivery.mdx`:
- Line 182: Update the table row that currently reads "Endpoint down for the
full retry window (~30s default, more if you've requested tuning)" to clearly
distinguish two failure modes: a quick-failure/retry-window case (≈30s default,
driven by backoff and immediate 5xx responses) and a timeout-exhaustion case (up
to ≈2 minutes worst-case if every attempt hits the 15s per-attempt timeout
referenced earlier around the "~2 minutes" note). Reword that table cell to
either split into two lines or append a parenthetical clarifying text (e.g.,
"quick failure: ~30s (backoff/503) vs. timeout exhaustion: up to ~2min (15s per
attempt)") so "down" unambiguously conveys which timing applies.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 69183b58-55b4-4240-a777-31e0b4beda0e
📒 Files selected for processing (2)
webhooks/delivery.mdxwebhooks/troubleshooting.mdx
📜 Review details
🧰 Additional context used
📓 Path-based instructions (1)
**/*.mdx
📄 CodeRabbit inference engine (AGENTS.md)
**/*.mdx: Pages should be written as MDX files with YAML frontmatter
Use active voice and second person ("you") in documentation
Keep sentences concise with one idea per sentence in documentation
Use sentence case for headings in documentation
Bold UI elements in documentation (e.g., Click Settings)
Use code formatting for file names, commands, paths, and code references in documentation
Files:
webhooks/troubleshooting.mdxwebhooks/delivery.mdx
🪛 LanguageTool
webhooks/delivery.mdx
[grammar] ~49-~49: Ensure spelling is correct
Context: ...m to ~26.2 seconds in the average case (200ms + 1s + 5s + 10s + 10s) and ~39.3 second...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
🔇 Additional comments (2)
webhooks/troubleshooting.mdx (2)
111-136: LGTM!
137-169: LGTM!
What & why
A support ticket surfaced four accuracy gaps in the webhooks docs. This PR fixes the three that belong in the webhooks section; the fourth is deferred (below).
1. Per-attempt timeout: docs said 30s, prod runs 15s —
webhooks/delivery.mdxThe page asserted a flat "30-second per-attempt timeout" throughout, but prod runs
DELIVERY_TIMEOUT_MS=15000. (15s over 30s is deliberate — 6 attempts × 30s is too long a tail on dead/hung endpoints.) Swept every per-attempt-timeout reference to 15s, recomputed the hang-to-timeout worst case (~3.5 min → ~2 min), fixed the slow-handler example (28s → 20s), and added the timeout to the "Tunable on our side" table. The backoff/retry-window figures (~26–39s; the "~30s budget/window" rows at:78/:176/:182) are unchanged — those are the retry-sleep schedule, independent of the per-attempt request timeout.2. "I receive duplicates" missed the most deterministic cause —
webhooks/troubleshooting.mdxThe section only listed retry-driven duplicates (timeout-after-processing, 5xx-after-partial), both fixable by deduping. It never mentioned multiple registered webhooks — every URL receives every event, so two endpoints that both act double every message, and stale registrations (rotating ngrok / preview-deploy URLs re-registered without deleting the old row) accumulate silently. This one deduping can't fix across independent backends. Added it as scenario 3 with the real fix: keep one canonical URL, delete the rest.
3. No "your secret is stable" reassurance —
webhooks/managing-webhooks.mdxThe rotation section explained how to rotate but never said it doesn't happen on its own. Added a
<Note>: the signing secret is stable for the life of the registration; app/relay/worker restarts never rotate it — only an explicit delete + re-register does.4. (Deferred) The "agent isn't online" end-user fallback is undocumented
This end-user auto-reply isn't in any doc, but it originates in the relay/Cosmos layer (not the webhook worker) and its exact trigger isn't confirmed in code. It belongs under the iMessage/lines docs, not webhooks, and should be written once the relay behavior is confirmed with its owner. Deliberately left out.
Notes for reviewers
.mdx(onlyevents.mdxis.vel-generated), so they're edited directly.llms-*.txtare gitignored — no regen needed.mainwith 3 unrelatedadvanced-kits/imessagetypecheck errors (error-handling.mdx:98,locations.mdx:36,messages.mdx:484) — code-block drift vs the installed package, nothing to do with this change. Committed with--no-verifysince this is a prose-only diff. CI may surface them; worth a separate fix.🤖 Generated with Claude Code
Summary by CodeRabbit