diff --git a/webhooks/delivery.mdx b/webhooks/delivery.mdx
index d15ef2e..d11e4e6 100644
--- a/webhooks/delivery.mdx
+++ b/webhooks/delivery.mdx
@@ -10,7 +10,7 @@ You know what arrives ([Events](/webhooks/events)) and how to prove it's real ([
- **Strong retry behaviour.** Up to 6 attempts per event by default, with exponential backoff plus jitter on `5xx`, `408`, `429`, network errors, and worker-side timeouts. The vast majority of deliveries land on attempt 1; the retries are there for the occasional bad minute on your side.
- **Fast acknowledgement.** Any `2xx` ends it — the worker stops as soon as your server says ok.
- **Fast permanent failure.** Other `4xx` codes (`400`/`401`/`404`/etc.) are treated as fatal — we don't waste your retry budget when the request will never succeed.
-- **Bounded budget.** 30-second per-attempt timeout, with up to ~39 seconds of backoff sleeps between attempts (jittered). If your server is still down after the final attempt, the event is logged and the worker moves on — there is no dead-letter queue today.
+- **Bounded budget.** 15-second per-attempt timeout, with up to ~39 seconds of backoff sleeps between attempts (jittered). If your server is still down after the final attempt, the event is logged and the worker moves on — there is no dead-letter queue today.
- **At-least-once delivery.** A retry after your server timed out can re-deliver an event you already processed — always dedupe in your handler (see [Be idempotent](#be-idempotent) below).
- **URL guard, fail-closed.** Before every attempt the worker validates the target URL: it must be `https://`, must resolve to a public address, and must not redirect. A URL that fails the check is dropped immediately — fatal, no retry — see [Where we won't deliver](#where-we-wont-deliver) below.
@@ -46,7 +46,7 @@ sequenceDiagram
Note over W: ✓ delivered after retry
```
-The backoff *sleeps* sum to ~26.2 seconds in the average case (200ms + 1s + 5s + 10s + 10s) and ~39.3 seconds in the worst case (jitter ceiling). Wall-clock time also includes per-attempt network time, bounded by the 30-second per-attempt timeout: a healthy delivery finishes in milliseconds, while a worst case where every attempt hangs to the timeout can run up to ~3.5 minutes before the worker gives up. It stops as soon as it gets a 2xx or determines further retries are pointless.
+The backoff *sleeps* sum to ~26.2 seconds in the average case (200ms + 1s + 5s + 10s + 10s) and ~39.3 seconds in the worst case (jitter ceiling). Wall-clock time also includes per-attempt network time, bounded by the 15-second per-attempt timeout: a healthy delivery finishes in milliseconds, while a worst case where every attempt hangs to the timeout can run up to ~2 minutes (6 × 15s of timeouts plus ~39s of sleeps) before the worker gives up. The worker stops as soon as it gets a `2xx` or determines further retries are pointless.
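
As a quick check on those numbers, the arithmetic can be sketched as follows. The constant names are illustrative only and are not taken from the worker's code; the values come from the paragraph above.

```typescript
// Values from the text above; identifiers are hypothetical, chosen for this sketch.
const baseDelaysMs: number[] = [200, 1_000, 5_000, 10_000, 10_000]; // sleeps before attempts 2..6
const jitterCeiling = 1.5; // upper bound of the ±50% jitter window
const attempts = 6;
const perAttemptTimeoutMs = 15_000;

// Sum of the un-jittered sleeps (the "average case").
const averageSleepMs = baseDelaysMs.reduce((sum, delay) => sum + delay, 0);

// Every sleep lands at the jitter ceiling (the "worst case").
const worstSleepMs = averageSleepMs * jitterCeiling;

// Every attempt hangs until the per-attempt timeout, plus worst-case sleeps.
const worstWallClockMs = attempts * perAttemptTimeoutMs + worstSleepMs;

console.log(averageSleepMs / 1000);   // 26.2
console.log(worstSleepMs / 1000);     // 39.3
console.log(worstWallClockMs / 1000); // 129.3
```

The last figure, roughly 129 seconds, is where the "~2 minutes" ceiling comes from.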
## Retry policy
@@ -61,7 +61,7 @@ Retries follow an exponential-backoff schedule with ±50% jitter applied to ever
| 5 | 10 seconds after attempt 4 ends (clamped from a formula value of 25s by the per-attempt cap) | `[5s, 15s)` |
| 6 | 10 seconds after attempt 5 ends (clamped from a formula value of 125s by the per-attempt cap) | `[5s, 15s)` |
-Per-attempt timeout: **30 seconds**. Treat it as a hard ceiling, not a target — acknowledge in well under a second and push slow work off the response path (see [Acknowledge fast](#acknowledge-fast-process-asynchronously) below).
+Per-attempt timeout: **15 seconds**. Treat it as a hard ceiling, not a target — acknowledge in well under a second and push slow work off the response path (see [Acknowledge fast](#acknowledge-fast-process-asynchronously) below).
After attempt 6 fails, the event is logged and dropped. There is no persistent queue and no dead-letter destination — both are out of scope for v1.
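
The schedule in the table above can be reproduced with a short sketch. This is an approximation under stated assumptions, not the worker's actual implementation: the `BackoffConfig` shape, the function names, and the use of uniform jitter are all hypothetical. Only the default values (200ms initial delay, 5× growth, 10-second cap, ±50% jitter) come from this page.

```typescript
// Hypothetical config shape; field names are not the worker's real env vars.
interface BackoffConfig {
  initialDelayMs: number;
  growthFactor: number;
  capMs: number;
  jitterFraction: number; // 0.5 means ±50%
}

const defaults: BackoffConfig = {
  initialDelayMs: 200,
  growthFactor: 5,
  capMs: 10_000,
  jitterFraction: 0.5,
};

// Delay before retry `retryIndex` (0-based), clamped to the cap before jitter.
function baseDelayMs(retryIndex: number, cfg: BackoffConfig = defaults): number {
  return Math.min(cfg.capMs, cfg.initialDelayMs * cfg.growthFactor ** retryIndex);
}

// Assumes uniform jitter: scales the clamped delay by a factor in [0.5, 1.5).
function jitteredDelayMs(
  retryIndex: number,
  cfg: BackoffConfig = defaults,
  random: () => number = Math.random,
): number {
  const factor = 1 - cfg.jitterFraction + random() * 2 * cfg.jitterFraction;
  return baseDelayMs(retryIndex, cfg) * factor;
}

const schedule = [0, 1, 2, 3, 4].map((i) => baseDelayMs(i));
console.log(schedule); // [ 200, 1000, 5000, 10000, 10000 ]
```

Passing a fixed `random` function (for example `() => 0`) makes the jittered delay deterministic, which is handy when testing a receiver against the shortest possible retry gaps.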
@@ -79,6 +79,7 @@ The retry schedule is operator-configurable. The Photon team can adjust these kn
| Knob | Default | Effect |
| --- | --- | --- |
+| Per-attempt timeout | 15 seconds | How long a single attempt waits for your endpoint to respond before aborting and scheduling a retry. A shorter deployed value means a slow endpoint times out (and gets retried) sooner. |
| Initial delay | 200ms | The `i = 0` term — delay before the first retry. |
| Growth factor | 5× | Multiplier applied per retry index (`200ms → 1s → 5s → ...`). |
| Per-attempt cap | 10 seconds | Ceiling applied to every computed delay before jitter, so the curve can't run away. |
@@ -87,7 +88,7 @@ The retry schedule is operator-configurable. The Photon team can adjust these kn
These are *internal* env vars on the spectrum-webhook worker — customers can't set them per-webhook today. If you have a use case that needs different retry behaviour (more retries, longer ceiling), reach out and we'll discuss tuning the deployment-wide defaults or adding a per-project override. Open an issue on the [docs repo](https://github.com/photon-hq/docs) or message us in the [Discord](https://discord.gg/4c3VJzDfNA).
-If you're seeing duplicates after long handler waits — say, attempt 1 takes 28 seconds and succeeds on your side, but our retry layer doesn't see the response in time — that's the per-attempt timeout, not the retry schedule. Tighten your handler (acknowledge first, process later) before asking us to widen our budget.
+If you're seeing duplicates after long handler waits — say, attempt 1 takes 20 seconds and succeeds on your side, but our retry layer doesn't see the response in time — that's the per-attempt timeout, not the retry schedule. Tighten your handler (acknowledge first, process later) before asking us to widen our budget.
## What your status codes mean to us
@@ -102,7 +103,7 @@ If you're seeing duplicates after long handler waits — say, attempt 1 takes 28
| Any other `4xx` (e.g. `400`, `401`, `403`, `404`, `422`) | Fatal | Don't retry. The assumption is that the request will never succeed (auth bug, schema mismatch, missing route). |
| Connection refused / TCP reset (after the URL guard passes) | Retriable | Wait, retry. |
| Hostname doesn't resolve (DNS failure) | Fatal | Caught by the URL guard *before* the request — fail-closed, no retry. |
-| Per-attempt timeout (>30s) | Retriable | Wait, retry. |
+| Per-attempt timeout (>15s) | Retriable | Wait, retry. |
**Return `4xx` deliberately.** Returning `400` or `401` from a real bug (e.g. signature verification failure) is correct — it tells us "stop retrying, this request will never work." Returning `500` for the same bug wastes our retry budget and your CPU cycles.
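
To sanity-check what your handler's responses will trigger, the rules in the table above can be sketched as a small classifier. The `Outcome` type and `classifyStatus` function are hypothetical names for illustration, not part of any Spectrum SDK, and the sketch covers only HTTP status codes, not network errors or URL-guard failures.

```typescript
// Hypothetical helper mirroring the status-code rules described on this page.
type Outcome = "delivered" | "retriable" | "fatal" | "unspecified";

function classifyStatus(status: number): Outcome {
  if (status >= 200 && status < 300) return "delivered"; // any 2xx ends delivery
  if (status === 408 || status === 429) return "retriable"; // the two retriable 4xx codes
  if (status >= 500 && status < 600) return "retriable"; // any 5xx
  if (status >= 400 && status < 500) return "fatal"; // every other 4xx
  return "unspecified"; // 1xx/3xx: not covered by the rows shown here
}

console.log(classifyStatus(204)); // delivered
console.log(classifyStatus(429)); // retriable
console.log(classifyStatus(401)); // fatal
console.log(classifyStatus(503)); // retriable
```

Running your error paths through a check like this in a unit test is a cheap way to catch a signature-verification failure that accidentally surfaces as `500` instead of `401`.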
@@ -139,7 +140,7 @@ app.post('/spectrum-webhook', async (c) => {
});
```
-If your handler takes >30 seconds, the worker will time out the connection, mark it retriable, and `POST` again. Now you'll process the same event twice.
+If your handler takes >15 seconds, the worker will time out the connection, mark it retriable, and `POST` again. Now you'll process the same event twice.
### Be idempotent
@@ -173,7 +174,7 @@ Returning `503` on overload is fine — we'll back off and retry. But it eats in
| --- | --- |
| Endpoint returns `2xx` on first try | Best case. One delivery, one process. |
| Endpoint returns `503`, recovers within ~30s | Retried, eventually delivered. One process (assuming no `2xx` on the failed attempt). |
-| Endpoint times out after 30s, then succeeds | Retried, eventually delivered. **Possibly processed twice** — your handler ran during the timeout and again on retry. Dedupe required. |
+| Endpoint times out after 15s, then succeeds | Retried, eventually delivered. **Possibly processed twice** — your handler ran during the timeout and again on retry. Dedupe required. |
| Endpoint returns `400` (signature bug, etc.) | Dropped immediately, no retry. Event lost. Logged on our side. |
| Webhook URL is `http://` (not HTTPS) | Dropped immediately by the URL guard, no retry. Every event lost until you re-register an `https://` URL. |
| Webhook URL resolves to a private/internal IP | Dropped immediately, no retry (SSRF guard). Logged. |
diff --git a/webhooks/managing-webhooks.mdx b/webhooks/managing-webhooks.mdx
index ed3ac22..ebcb17d 100644
--- a/webhooks/managing-webhooks.mdx
+++ b/webhooks/managing-webhooks.mdx
@@ -205,6 +205,10 @@ The delete is logical — the row is soft-deleted with a `deletedAt` timestamp o
## Rotating the signing secret
+
+Your signing secret is **stable for the life of the registration**. Restarting your app, your relay, or the Spectrum worker never rotates it — the only things that change a secret are an explicit delete + re-register (below) or registering a brand-new webhook. If you find yourself capturing a new secret on every restart, you're deleting and re-creating the webhook when you don't need to.
+
+
There is no dedicated rotation endpoint. To rotate, **delete and re-register**:
```sh
diff --git a/webhooks/troubleshooting.mdx b/webhooks/troubleshooting.mdx
index c028865..6797d9b 100644
--- a/webhooks/troubleshooting.mdx
+++ b/webhooks/troubleshooting.mdx
@@ -110,12 +110,18 @@ All of these drop the event as **fatal** — no retry. There's no update endpoin
## "I receive duplicates"
-This is expected behavior under at-least-once delivery. The two scenarios that cause it:
+Duplicates come in two flavors, with different fixes. Scenarios 1 and 2 are retry-driven and solved by deduping; scenario 3 is a registration problem that deduping **can't** fix.
+
+**Retry-driven (at-least-once delivery).** Expected under our delivery contract:
1. **Your handler succeeded but timed out before responding.** We retried, you processed twice.
2. **Your handler returned `5xx` after partially processing.** We retried, you re-ran the partial work.
-### Fix
+**Registration-driven.** Not a retry at all:
+
+3. **More than one webhook is registered and more than one acts on the event.** Every registered URL receives *every* event (see [Multiple webhooks per project](/webhooks/managing-webhooks#multiple-webhooks-per-project)) — including **stale registrations you forgot to delete** after a URL change. If two endpoints both act (e.g. both reply), every message is doubled at the source. Deduping won't help here: two independent backends don't share a dedupe store, so each processes its own copy exactly once and the user still sees two. The fix is to keep one canonical webhook and [delete the rest](/webhooks/managing-webhooks#delete-a-webhook). If your URL changes on every restart or deploy (ngrok, preview environments), delete the old registration each time you add the new one — see ["ngrok URL keeps changing"](#ngrok-url-keeps-changing).
+
+### Fix (scenarios 1 and 2)
Dedupe at the top of your handler using `X-Spectrum-Webhook-Id` plus `payload.message.id` as a composite key:
@@ -130,7 +136,7 @@ A 24-48 hour TTL is plenty — our retry budget is bounded to a few minutes at m
## "Deliveries time out"
-If you're seeing your endpoint logged as "took >30s," it triggers a retry on our side and a likely duplicate processing on yours.
+If your endpoint is logged as "took >15s," that attempt hit the per-attempt timeout: we retry, and you will likely process the event twice.
### Diagnosis
@@ -145,7 +151,7 @@ app.post('/spectrum-webhook', async (c) => {
});
```
-Anything network-dependent in the request path can blow past 30s.
+Anything network-dependent in the request path can blow past 15s.
### Fix