Background
Bulwark's current accounting model is read-before / write-after (documented in DESIGN.md → "Why post-hoc accounting, not reservations"). It accepts a bounded overspend window under concurrency in exchange for hot-path latency.
A real production failure mode that the current model doesn't handle cleanly: what happens to in-flight provider calls when the caller (or an orchestrator like an agent runtime) forcefully halts a call mid-flight?
Concretely:
- Client opens a request to Bulwark
- Bulwark forwards to provider (OpenAI / Anthropic / Gemini)
- Provider starts generating; tokens are being billed regardless of whether anyone reads the response
- Client disconnects, or an upstream orchestrator (Manager-Employee pattern, swarm coordinator, etc.) hits a budget breach and cancels the in-flight tool call
- The provider charge still lands. Bulwark currently has no way to:
  - Account for the spend that was committed-but-not-yet-recorded
  - Release a reservation (because we don't have reservations yet)
  - Avoid permitting the next call to oversubscribe the budget on the assumption that the cancelled call "didn't count"
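The gap shows up clearly in a minimal sketch of read-before / write-after accounting. PostHocBudget, canDispatch, recordUsage and the cent amounts are hypothetical. They model only the shape of the current model described in the Background, not Bulwark's code. The budget check reads recorded spend, and spend is written only after the provider reports usage, so a halted call never reaches the write.

```typescript
// Hypothetical sketch of read-before / write-after accounting; not Bulwark's code.
// Amounts are integer cents.
class PostHocBudget {
  private spent = 0;
  private readonly limit: number;

  constructor(limit: number) {
    this.limit = limit;
  }

  // Read-before: the only check is against spend that has already been recorded.
  canDispatch(): boolean {
    return this.spent < this.limit;
  }

  // Write-after: runs only once the provider has returned usage.
  recordUsage(cost: number): void {
    this.spent += cost;
  }

  remaining(): number {
    return this.limit - this.spent;
  }
}

const budget = new PostHocBudget(100); // hypothetical $1.00 budget
let providerBilled = 0;

// Call 1 completes: the provider bills 40 cents and Bulwark records it.
if (budget.canDispatch()) {
  providerBilled += 40;
  budget.recordUsage(40);
}

// Call 2 is halted mid-flight: the provider still bills 50 cents,
// but the write-after step never runs.
if (budget.canDispatch()) {
  providerBilled += 50;
}

// The next call is checked against 60 cents of apparent headroom;
// the real headroom is 10 cents.
console.log(budget.remaining(), providerBilled); // 60 90
```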
Why this matters
This becomes acute the moment Bulwark sits in front of an agent runtime that supports forced halts — and increasingly common architectures (Manager-Employee, swarm orchestration, hierarchical agents) explicitly halt sub-agents on budget breaches. See r/SideProject discussion with Nimind for a concrete example of this pattern in the wild.
Proposed design
When Phase 2 reservation work lands (see ROADMAP.md → v1.1), the reservation system should include a TTL'd hold:
- Reserve the worst-case cost (max_tokens × output_price + input_tokens × input_price) before dispatching to the provider.
- Hold the reservation in shared state with a TTL (default: provider's max response time + buffer, e.g. 60s).
- Convert the reservation to a confirmed debit of the actual cost when the provider returns usage data, and release the delta (reserved minus actual) back to the budget.
- Expire the reservation automatically if the call times out, the client disconnects, or no confirmation arrives within the TTL window. This frees the budget for legitimate subsequent calls without requiring explicit cancellation signalling from the caller.
- Optional: DELETE /v1/reservations/:id for orchestrators that want to release a hold explicitly on forced halt (rather than waiting for the TTL).
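The lifecycle above can be sketched as a small in-memory ledger. This is a minimal sketch under stated assumptions, not the planned implementation. ReservationLedger and its methods are hypothetical names. Amounts are integer micro-dollars and prices are micro-dollars per token (the example prices are made up). The clock is passed in explicitly so TTL expiry is deterministic, and expired holds are swept lazily on each read. An explicit DELETE /v1/reservations/:id would map onto release().

```typescript
// Hypothetical sketch of the TTL'd reservation lifecycle; names are illustrative.
// Amounts are integer micro-dollars; `now` is epoch ms supplied by the caller.
interface Reservation {
  amount: number;    // worst-case cost held
  expiresAt: number; // hold lapses at or after this time
}

// Worst case: max_tokens × output_price + input_tokens × input_price.
function worstCaseCost(maxTokens: number, outputPrice: number, inputTokens: number, inputPrice: number): number {
  return maxTokens * outputPrice + inputTokens * inputPrice;
}

class ReservationLedger {
  private spent = 0;
  private held = new Map<string, Reservation>();
  private nextId = 1;
  private readonly limit: number;
  private readonly ttlMs: number;

  constructor(limit: number, ttlMs = 60_000) {
    this.limit = limit;
    this.ttlMs = ttlMs;
  }

  // Budget that is neither debited nor held.
  available(now: number): number {
    this.sweepExpired(now);
    let heldTotal = 0;
    for (const r of this.held.values()) heldTotal += r.amount;
    return this.limit - this.spent - heldTotal;
  }

  // Reserve before dispatch; null means the call should be rejected.
  reserve(amount: number, now: number): string | null {
    if (amount > this.available(now)) return null;
    const id = `res_${this.nextId++}`;
    this.held.set(id, { amount, expiresAt: now + this.ttlMs });
    return id;
  }

  // Usage arrived: drop the hold and debit only the actual cost, so the
  // unused part of the worst case goes back to the budget.
  confirm(id: string, actualCost: number): void {
    this.held.delete(id);
    this.spent += actualCost;
  }

  // Explicit early release, e.g. on an orchestrator's forced halt.
  release(id: string): boolean {
    return this.held.delete(id);
  }

  // Drop holds whose TTL has passed; needs no signal from the caller.
  sweepExpired(now: number): void {
    for (const [id, r] of this.held) {
      if (r.expiresAt <= now) this.held.delete(id);
    }
  }
}

const ledger = new ReservationLedger(200_000); // hypothetical $0.20 budget, 60 s TTL
// Hypothetical prices: 15 µ$ per output token, 3 µ$ per input token.
const hold = worstCaseCost(4096, 15, 2000, 3); // 67_440 µ$
const a = ledger.reserve(hold, 0); // granted
const b = ledger.reserve(hold, 0); // granted; its client will disconnect
const c = ledger.reserve(hold, 0); // null: only 65_120 µ$ left unheld
if (a !== null) ledger.confirm(a, 800 * 15 + 2000 * 3); // actual cost 18_000 µ$

const midTtl = ledger.available(30_000);   // b still held
const afterTtl = ledger.available(60_000); // b expired at its TTL
console.log(midTtl, afterTtl); // 114560 182000
```

In this sketch confirm() debits the actual cost even if the hold has already expired, since the provider charge lands either way; expiry only frees the held worst-case amount.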
Implementation notes
- Reservations live in Durable Objects (one DO per keyId) for true single-writer semantics, with KV as a read-cache for the hot path.
- The TTL approach intentionally doesn't depend on the caller doing the right thing — a crashed client, a halted Employee agent, or a network blip all expire safely.
- The explicit DELETE endpoint is for well-behaved orchestrators that want to release the held budget faster than the TTL.
- The provider charge still lands regardless of what Bulwark does — this design ensures Bulwark's accounting of that charge is correct, but it doesn't (and can't) refund the spend.
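Single-writer semantics are what stop two concurrent reserve calls for the same key from both passing the budget check. The following is a minimal sketch that assumes an in-process promise chain as a stand-in for routing every request for a keyId to one Durable Object. KeySerializer, withKey, reserve and the cent amounts are all hypothetical, and none of them is part of any Cloudflare API.

```typescript
// Hypothetical stand-in for per-keyId single-writer semantics: operations on
// the same key run strictly one after another, in arrival order.
class KeySerializer {
  private tails = new Map<string, Promise<unknown>>();

  withKey<T>(keyId: string, op: () => Promise<T>): Promise<T> {
    const prev: Promise<unknown> = this.tails.get(keyId) ?? Promise.resolve();
    // Start op only after the previous operation on this key has settled.
    const next = prev.then(op, op);
    // Keep a never-rejecting tail so one failed op doesn't block the queue.
    const tail = next.catch(() => undefined);
    this.tails.set(keyId, tail);
    // Forget the key once its queue drains.
    tail.then(() => {
      if (this.tails.get(keyId) === tail) this.tails.delete(keyId);
    });
    return next;
  }
}

const serializer = new KeySerializer();
const remaining = new Map<string, number>([["key_a", 100]]); // hypothetical cents

async function reserve(keyId: string, amount: number): Promise<boolean> {
  return serializer.withKey(keyId, async () => {
    const balance = remaining.get(keyId) ?? 0; // read
    await Promise.resolve(); // yield, as a storage round-trip would
    if (amount > balance) return false; // check
    remaining.set(keyId, balance - amount); // write
    return true;
  });
}

// Two concurrent 60-cent reservations against a 100-cent budget. Serialized,
// the second reads the first one's write and is rejected; without withKey,
// both would read 100 and both would be granted.
const results = Promise.all([reserve("key_a", 60), reserve("key_a", 60)]);
results.then((granted) => console.log(granted, remaining.get("key_a")));
```

Because Workers KV is eventually consistent, it suits the read-cache role described above but not the check-and-write step, which is why that step sits behind the single writer.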
Acceptance criteria
- BulwarkKeyRecord extended with optional reservation state references
- Per-keyId Durable Object handles the reservation lifecycle
- DELETE /v1/reservations/:id endpoint for explicit release
Out of scope for this issue
- Refunding spend to the user (impossible: the provider has already charged)
- Replicating the cancellation upstream to the provider (would require provider-side cancellation APIs, which are spotty and inconsistent)
- Distributed reservations across multiple Bulwark deployments: a single-deployment DO is enough for v1.1; cross-region is a v2 concern
Related