Skip to content

docs(webhooks): clarify duplicate causes, secret stability, timeout tunability#86

Open
Yan Xue (yanxue06) wants to merge 3 commits into
mainfrom
docs/webhooks-duplicate-timeout-secret-clarity
Open

docs(webhooks): clarify duplicate causes, secret stability, timeout tunability#86
Yan Xue (yanxue06) wants to merge 3 commits into
mainfrom
docs/webhooks-duplicate-timeout-secret-clarity

Conversation

@yanxue06

@yanxue06 Yan Xue (yanxue06) commented Jun 12, 2026

Copy link
Copy Markdown
Member

What & why

A support ticket surfaced four accuracy gaps in the webhooks docs. This PR fixes the three that belong in the webhooks section; the fourth is deferred (below).

1. Per-attempt timeout: docs said 30s, prod runs 15s — webhooks/delivery.mdx

The page asserted a flat "30-second per-attempt timeout" throughout, but prod runs DELIVERY_TIMEOUT_MS=15000. (15s over 30s is deliberate — 6 attempts × 30s is too long a tail on dead/hung endpoints.) Swept every per-attempt-timeout reference to 15s, recomputed the hang-to-timeout worst case (~3.5 min → ~2 min), fixed the slow-handler example (28s → 20s), and added the timeout to the "Tunable on our side" table. The backoff/retry-window figures (~26–39s; the "~30s budget/window" rows at :78/:176/:182) are unchanged — those are the retry-sleep schedule, independent of the per-attempt request timeout.

2. "I receive duplicates" missed the most deterministic cause — webhooks/troubleshooting.mdx

The section only listed retry-driven duplicates (timeout-after-processing, 5xx-after-partial), both fixable by deduping. It never mentioned multiple registered webhooks — every URL receives every event, so two endpoints that both act double every message, and stale registrations (rotating ngrok / preview-deploy URLs re-registered without deleting the old row) accumulate silently. This one deduping can't fix across independent backends. Added it as scenario 3 with the real fix: keep one canonical URL, delete the rest.

3. No "your secret is stable" reassurance — webhooks/managing-webhooks.mdx

The rotation section explained how to rotate but never said it doesn't happen on its own. Added a <Note>: the signing secret is stable for the life of the registration; app/relay/worker restarts never rotate it — only an explicit delete + re-register does.

4. (Deferred) The "agent isn't online" end-user fallback is undocumented

This end-user auto-reply isn't in any doc, but it originates in the relay/Cosmos layer (not the webhook worker) and its exact trigger isn't confirmed in code. It belongs under the iMessage/lines docs, not webhooks, and should be written once the relay behavior is confirmed with its owner. Deliberately left out.

Notes for reviewers

  • All three files are hand-written .mdx (only events.mdx is .vel-generated), so they're edited directly. llms-*.txt are gitignored — no regen needed.
  • Heads-up: the repo's pre-commit hook fails on main with 3 unrelated advanced-kits/imessage typecheck errors (error-handling.mdx:98, locations.mdx:36, messages.mdx:484) — code-block drift vs the installed package, nothing to do with this change. Committed with --no-verify since this is a prose-only diff. CI may surface them; worth a separate fix.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Documentation
    • Updated webhook retry policy docs to specify a 15-second per-attempt timeout and adjusted examples, wording, and retry/status-code guidance accordingly.
    • Clarified that a webhook’s signing secret remains stable for its lifetime and only changes when the webhook is deleted and re-registered.
    • Expanded duplicate delivery troubleshooting to distinguish retry-driven vs. registration-driven duplicates and updated fixes.

…unability

- delivery: document the per-attempt request timeout as a tunable knob
  (30s default) so the contract reflects that a deployment can run it
  shorter, rather than asserting a flat 30s the worker can override
- troubleshooting: add the multi-registration cause to "I receive
  duplicates" — every registered URL gets every event, so extra/stale
  webhooks double output and dedupe can't fix it across independent
  backends; keep one canonical URL
- managing-webhooks: note the signing secret is stable for the life of
  the registration; app/relay/worker restarts never rotate it

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 12, 2026 21:07

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

@coderabbitai

coderabbitai Bot commented Jun 12, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

Documentation updates: per-attempt delivery timeout and examples changed to 15 seconds, signingSecret stability clarified for webhook registrations, and duplicate-delivery troubleshooting split into retry-driven and registration-driven scenarios.

Changes

Webhook Documentation Improvements

Layer / File(s) Summary
Per-attempt timeout adjustments
webhooks/delivery.mdx
All delivery/retry wording, examples, and tables updated to document a 15-second per-attempt timeout (worst-case wall-clock math, retry-policy row, status-code mapping, duplicate/failure examples).
Signing secret stability clarification
webhooks/managing-webhooks.mdx
Added a note that a webhook's signingSecret remains stable for the lifetime of that registration and only changes via delete + re-register or new registration.
Duplicate-delivery troubleshooting rework
webhooks/troubleshooting.mdx
Restructured duplicate-delivery guidance into two flavors—retry-driven and registration-driven—and updated timeout-trigger wording to use the 15s cutoff where applicable.

🎯 3 (Moderate) | ⏱️ ~20 minutes

I nibble words beneath the moon,
Changed timeouts, secrets, fixed the tune.
Fifteen seconds now, clear and neat,
No more surprises — docs complete! 🐇✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly and concisely captures the three main changes: clarifying duplicate causes, confirming secret stability, and documenting timeout tunability across the webhook documentation updates.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch docs/webhooks-duplicate-timeout-secret-clarity

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint install timed out. The project may have too many dependencies for the sandbox.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
webhooks/troubleshooting.mdx (1)

122-122: 📐 Maintainability & Code Quality | ⚡ Quick win

Consider breaking the long sentence into shorter sentences.

Line 122 contains multiple ideas in a single 85+ word sentence. Per the coding guidelines, documentation should keep sentences concise with one idea per sentence. Breaking this into 2-3 shorter sentences would improve readability.

📝 Suggested rewrite
-3. **More than one webhook is registered and more than one acts on the event.** Every registered URL receives *every* event (see [Multiple webhooks per project](/webhooks/managing-webhooks#multiple-webhooks-per-project)) — including **stale registrations you forgot to delete** after a URL change. If two endpoints both act (e.g. both reply), every message is doubled at the source. Deduping won't help here: two independent backends don't share a dedupe store, so each processes its own copy exactly once and the user still sees two. The fix is to keep one canonical webhook and [delete the rest](/webhooks/managing-webhooks#delete-a-webhook). If your URL changes on every restart or deploy (ngrok, preview environments), delete the old registration each time you add the new one — see ["ngrok URL keeps changing"](`#ngrok-url-keeps-changing`).
+3. **More than one webhook is registered and more than one acts on the event.** Every registered URL receives *every* event (see [Multiple webhooks per project](/webhooks/managing-webhooks#multiple-webhooks-per-project)) — including **stale registrations you forgot to delete** after a URL change. If two endpoints both act (e.g. both reply), every message is doubled at the source. Deduping won't help here because two independent backends don't share a dedupe store, so each processes its own copy exactly once and the user still sees two replies. The fix is to keep one canonical webhook and [delete the rest](/webhooks/managing-webhooks#delete-a-webhook). If your URL changes on every restart or deploy (ngrok, preview environments), delete the old registration each time you add the new one — see ["ngrok URL keeps changing"](`#ngrok-url-keeps-changing`).

As per coding guidelines, documentation should keep sentences concise with one idea per sentence.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@webhooks/troubleshooting.mdx` at line 122, Split the long sentence in the
list item that begins "3. **More than one webhook is registered and more than
one acts on the event.** Every registered URL receives *every* event..." into
2–3 shorter sentences: 1) state that every registered URL receives every event
(including stale registrations), 2) explain the consequence that two endpoints
that both act will cause duplicated messages and that deduping across
independent backends won’t help, and 3) give the fix as keeping one canonical
webhook and deleting others (and mention the ngrok/preview-environment
URL-change advice). Keep the original wording/links but move clauses into
separate sentences for clarity.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@webhooks/delivery.mdx`:
- Line 82: Update the table row for "Per-attempt timeout" to remove or clarify
the ambiguous "`>30s` ceiling from the section above" phrasing: replace that
fragment with a clear reference such as "the 30-second ceiling mentioned above"
or delete the cross-reference entirely so the cell reads self-contained (e.g.,
"How long a single attempt waits for your endpoint to respond before aborting
and scheduling a retry — the 30-second ceiling described above."). Target the
table row text containing "Per-attempt timeout", "30 seconds", and the literal
"`>30s` ceiling" for this edit.

---

Nitpick comments:
In `@webhooks/troubleshooting.mdx`:
- Line 122: Split the long sentence in the list item that begins "3. **More than
one webhook is registered and more than one acts on the event.** Every
registered URL receives *every* event..." into 2–3 shorter sentences: 1) state
that every registered URL receives every event (including stale registrations),
2) explain the consequence that two endpoints that both act will cause
duplicated messages and that deduping across independent backends won’t help,
and 3) give the fix as keeping one canonical webhook and deleting others (and
mention the ngrok/preview-environment URL-change advice). Keep the original
wording/links but move clauses into separate sentences for clarity.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b5f1496b-48f9-4aac-85c4-970fcfab350c

📥 Commits

Reviewing files that changed from the base of the PR and between 42cc73d and eeb0fcc.

📒 Files selected for processing (3)
  • webhooks/delivery.mdx
  • webhooks/managing-webhooks.mdx
  • webhooks/troubleshooting.mdx
📜 Review details
🧰 Additional context used
📓 Path-based instructions (1)
**/*.mdx

📄 CodeRabbit inference engine (AGENTS.md)

**/*.mdx: Pages should be written as MDX files with YAML frontmatter
Use active voice and second person ("you") in documentation
Keep sentences concise with one idea per sentence in documentation
Use sentence case for headings in documentation
Bold UI elements in documentation (e.g., Click Settings)
Use code formatting for file names, commands, paths, and code references in documentation

Files:

  • webhooks/managing-webhooks.mdx
  • webhooks/troubleshooting.mdx
  • webhooks/delivery.mdx
🔇 Additional comments (1)
webhooks/managing-webhooks.mdx (1)

208-210: LGTM!

Comment thread webhooks/delivery.mdx Outdated
Yan Xue (yanxue06) and others added 2 commits June 12, 2026 15:18
The row's description is self-contained; the ">30s" notation actually
appears below this table (status-code section), not above, and line 64
says "30 seconds" rather than ">30s". Addresses PR review feedback.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Prod runs DELIVERY_TIMEOUT_MS=15000; the docs asserted 30s throughout.
Swept every per-attempt-timeout reference to 15s and recomputed the
hang-to-timeout worst case (6x15s + ~39s backoff ~= ~2 min, was ~3.5).
Left the backoff/retry-window figures (~26-39s; the "~30s budget/window"
rows) unchanged -- those are the retry-sleep schedule, independent of
the per-attempt request timeout.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 12, 2026 23:36

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
webhooks/delivery.mdx (1)

182-182: 📐 Maintainability & Code Quality | 🟠 Major

Clarify “~30s default” for the “Endpoint down” case (line 182) to match the timeout worst-case.

webhooks/delivery.mdx states that if every attempt hangs to the 15s per-attempt timeout, wall-clock time can reach ~2 minutes (line 49). But line 182 describes “Endpoint down for the full retry window (~30s default, …)”, which reads like an unresponsive/timeout scenario and conflicts with the ~2-minute worst case.

Update line 182 to explicitly distinguish the “quick failure” case (~30s, dominated by backoff / 503 recovery) from the “timeout exhaustion” case (up to ~2 minutes, dominated by the 15s per-attempt timeout), or reword the scenario so “down” means what you intend.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@webhooks/delivery.mdx` at line 182, Update the table row that currently reads
"Endpoint down for the full retry window (~30s default, more if you've requested
tuning)" to clearly distinguish two failure modes: a quick-failure/retry-window
case (≈30s default, driven by backoff and immediate 5xx responses) and a
timeout-exhaustion case (up to ≈2 minutes worst-case if every attempt hits the
15s per-attempt timeout referenced earlier around the "~2 minutes" note). Reword
that table cell to either split into two lines or append a parenthetical
clarifying text (e.g., "quick failure: ~30s (backoff/503) vs. timeout
exhaustion: up to ~2min (15s per attempt)") so "down" unambiguously conveys
which timing applies.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@webhooks/delivery.mdx`:
- Line 182: Update the table row that currently reads "Endpoint down for the
full retry window (~30s default, more if you've requested tuning)" to clearly
distinguish two failure modes: a quick-failure/retry-window case (≈30s default,
driven by backoff and immediate 5xx responses) and a timeout-exhaustion case (up
to ≈2 minutes worst-case if every attempt hits the 15s per-attempt timeout
referenced earlier around the "~2 minutes" note). Reword that table cell to
either split into two lines or append a parenthetical clarifying text (e.g.,
"quick failure: ~30s (backoff/503) vs. timeout exhaustion: up to ~2min (15s per
attempt)") so "down" unambiguously conveys which timing applies.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 69183b58-55b4-4240-a777-31e0b4beda0e

📥 Commits

Reviewing files that changed from the base of the PR and between 28eb4a5 and 62a3aa7.

📒 Files selected for processing (2)
  • webhooks/delivery.mdx
  • webhooks/troubleshooting.mdx
📜 Review details
🧰 Additional context used
📓 Path-based instructions (1)
**/*.mdx

📄 CodeRabbit inference engine (AGENTS.md)

**/*.mdx: Pages should be written as MDX files with YAML frontmatter
Use active voice and second person ("you") in documentation
Keep sentences concise with one idea per sentence in documentation
Use sentence case for headings in documentation
Bold UI elements in documentation (e.g., Click Settings)
Use code formatting for file names, commands, paths, and code references in documentation

Files:

  • webhooks/troubleshooting.mdx
  • webhooks/delivery.mdx
🪛 LanguageTool
webhooks/delivery.mdx

[grammar] ~49-~49: Ensure spelling is correct
Context: ...m to ~26.2 seconds in the average case (200ms + 1s + 5s + 10s + 10s) and ~39.3 second...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🔇 Additional comments (2)
webhooks/troubleshooting.mdx (2)

111-136: LGTM!


137-169: LGTM!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants