
Reservation TTL on forced halt — handle in-flight call termination cleanly #1


@OpsToInnovator

Background

Bulwark's current accounting model is read-before / write-after (documented in DESIGN.md → "Why post-hoc accounting, not reservations"). It accepts a bounded overspend window under concurrency in exchange for lower hot-path latency.
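To illustrate the overspend window, here is a minimal sketch of read-before / write-after accounting, assuming a simple in-memory ledger. `PostHocLedger`, `canAdmit`, and `record` are hypothetical names, not Bulwark's actual types. Two concurrent calls both read the ledger before either write lands, so both are admitted.

```typescript
// Hypothetical ledger illustrating read-before / write-after accounting.
// Names are illustrative, not Bulwark's real types.
class PostHocLedger {
  private spent = 0;
  private limit: number;

  constructor(limit: number) {
    this.limit = limit;
  }

  // Read-before: admit while recorded spend is under the limit.
  canAdmit(): boolean {
    return this.spent < this.limit;
  }

  // Write-after: record the actual cost once the provider reports usage.
  record(cost: number): void {
    this.spent += cost;
  }

  get total(): number {
    return this.spent;
  }
}

// Two concurrent calls both read the ledger before either write lands.
const ledger = new PostHocLedger(100);
const admittedA = ledger.canAdmit(); // spent is 0
const admittedB = ledger.canAdmit(); // spent is still 0
if (admittedA) ledger.record(70);
if (admittedB) ledger.record(70);

console.log(ledger.total); // 140
```

The overspend (40 here) is bounded by the cost of the calls admitted concurrently, which is the trade-off the post-hoc model accepts.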

A real production failure mode that the current model doesn't handle cleanly: what happens to in-flight provider calls when the caller (or an orchestrator like an agent runtime) forcefully halts a call mid-flight?

Concretely:

  1. Client opens a request to Bulwark
  2. Bulwark forwards to provider (OpenAI / Anthropic / Gemini)
  3. Provider starts generating; tokens are being billed regardless of whether anyone reads the response
  4. Client disconnects, or an upstream orchestrator (Manager-Employee pattern, swarm coordinator, etc.) hits a budget breach and cancels the in-flight tool call
  5. The provider charge still lands. Bulwark currently has no way to:
    • Account for the spend that was committed-but-not-yet-recorded
    • Release a reservation (because we don't have reservations yet)
    • Avoid permitting the next call to oversubscribe the budget on the assumption the cancelled call "didn't count"
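A minimal sketch of why step 5 happens under the post-hoc model, assuming a hypothetical `handleCall` function and plain-object ledger (neither is Bulwark code). When the call is halted before usage data is recorded, the write-after step never runs, so the next call's read check sees a budget that excludes a charge the provider has already billed.

```typescript
// Hypothetical post-hoc ledger: spend is only recorded after the provider call completes.
interface Ledger {
  limit: number;
  spent: number;
}

// Simulates one proxied call. `providerCost` is what the provider bills regardless of
// cancellation; `cancelled` models a client disconnect or an orchestrator halt.
function handleCall(ledger: Ledger, providerCost: number, cancelled: boolean): boolean {
  // Read-before: admit only if the ledger shows remaining budget.
  if (ledger.spent >= ledger.limit) return false;
  // The provider starts generating (and billing) from here on.
  if (cancelled) {
    // The handler unwinds before usage data arrives, so the write-after step is skipped
    // even though the provider still charges `providerCost`.
    return true;
  }
  // Write-after: record actual usage.
  ledger.spent += providerCost;
  return true;
}

const ledger: Ledger = { limit: 100, spent: 0 };
let billedByProvider = 0;

billedByProvider += 90;
handleCall(ledger, 90, true); // halted mid-flight: billed, never recorded

billedByProvider += 90;
const admitted = handleCall(ledger, 90, false); // admitted because the ledger still shows 0

console.log(admitted, ledger.spent, billedByProvider); // true 90 180
```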

Why this matters

This becomes acute as soon as Bulwark sits in front of an agent runtime that supports forced halts, and increasingly common architectures (Manager-Employee, swarm orchestration, hierarchical agents) explicitly halt sub-agents on budget breaches. See the r/SideProject discussion with Nimind for a concrete example of this pattern in the wild.

Proposed design

When Phase 2 reservation work lands (see ROADMAP.md → v1.1), the reservation system should include a TTL'd hold:

  1. Reserve the worst-case cost (max_tokens × output_price + input_tokens × input_price, with both prices expressed per token) before dispatching to the provider.
  2. Hold the reservation in shared state with a TTL (default: provider's max response time + buffer, e.g. 60s).
  3. Convert the reservation to a confirmed debit when the provider returns usage data, releasing the delta (reserved minus actual cost) back to the budget.
  4. Expire the reservation automatically if the call times out, the client disconnects, or no confirmation arrives within the TTL window. This frees the budget for legitimate subsequent calls without requiring explicit cancellation signalling from the caller.
  5. Optional DELETE /v1/reservations/:id for orchestrators that want to release a hold explicitly on forced halt (rather than waiting for TTL).
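The five steps above can be sketched for a single key as follows. This is a minimal, hypothetical model (`ReservationBook`, `reserve`, `confirm`, `release` are illustrative names), assuming prices in integer micro-USD per token and an explicit `now` argument instead of real timers or storage alarms. One behaviour is an assumption beyond the proposal: usage that arrives after a hold has expired is still recorded, since the provider has billed it.

```typescript
// Prices are in micro-USD per token so the arithmetic stays in integers
// (e.g. $3 per 1M input tokens = 3 µUSD per token). Values below are illustrative.
interface Pricing {
  inputPerToken: number;
  outputPerToken: number;
}

// Step 1: worst-case cost = max_tokens × output_price + input_tokens × input_price.
function worstCaseCost(maxTokens: number, inputTokens: number, p: Pricing): number {
  return maxTokens * p.outputPerToken + inputTokens * p.inputPerToken;
}

interface Hold {
  amount: number;
  expiresAt: number; // ms timestamp after which the hold no longer counts
}

// Hypothetical TTL'd hold book for one key. `now` is passed in explicitly so expiry is
// deterministic; a real deployment would drive expiry from a timer or storage alarm.
class ReservationBook {
  private spent = 0;
  private holds = new Map<string, Hold>();
  private nextId = 1;
  private limit: number;
  private ttlMs: number;

  constructor(limit: number, ttlMs = 60_000) {
    this.limit = limit;
    this.ttlMs = ttlMs;
  }

  // Step 4: drop holds whose TTL has elapsed, returning their amount to the budget.
  private expire(now: number): void {
    for (const [id, hold] of this.holds) {
      if (hold.expiresAt <= now) this.holds.delete(id);
    }
  }

  // Budget left after confirmed spend and all live holds.
  available(now: number): number {
    this.expire(now);
    let held = 0;
    for (const hold of this.holds.values()) held += hold.amount;
    return this.limit - this.spent - held;
  }

  // Steps 1-2: hold the worst case before dispatch; null means the call is refused.
  reserve(amount: number, now: number): string | null {
    if (amount > this.available(now)) return null;
    const id = `r${this.nextId++}`;
    this.holds.set(id, { amount, expiresAt: now + this.ttlMs });
    return id;
  }

  // Step 3: replace the hold with a debit of the actual cost; the unused delta is freed.
  // Usage that arrives after expiry is still recorded, since the provider billed it.
  confirm(id: string, actualCost: number, now: number): boolean {
    this.expire(now);
    const wasLive = this.holds.delete(id);
    this.spent += actualCost;
    return wasLive;
  }

  // Step 5: explicit early release, e.g. behind DELETE /v1/reservations/:id.
  release(id: string): boolean {
    return this.holds.delete(id);
  }
}

const pricing: Pricing = { inputPerToken: 3, outputPerToken: 15 };
const book = new ReservationBook(100_000); // $0.10 budget, default 60s TTL

const cost = worstCaseCost(1_000, 2_000, pricing); // 15,000 + 6,000 = 21,000
const r1 = book.reserve(cost, 0);
if (r1 !== null) book.confirm(r1, 9_000, 5_000); // normal flow: 12,000 released

book.reserve(cost, 10_000); // halted caller never confirms
const whileHeld = book.available(20_000); // 100,000 - 9,000 - 21,000 = 70,000
const afterTtl = book.available(70_000); // hold expired at 70,000 ms: 91,000

const r3 = book.reserve(cost, 80_000);
const released = r3 !== null && book.release(r3); // explicit halt path
```

Passing `now` explicitly keeps the timeout flow testable without sleeping; the same calls map onto the normal, timeout, and explicit-halt flows listed in the acceptance criteria.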

Implementation notes

  • Reservations live in Durable Objects (one DO per keyId) for true single-writer semantics, with KV as a read-cache for the hot path.
  • The TTL approach intentionally doesn't depend on the caller doing the right thing — a crashed client, a halted Employee agent, or a network blip all expire safely.
  • The explicit DELETE endpoint is for well-behaved orchestrators that want to release the held budget faster than the TTL.
  • The provider charge still lands regardless of what Bulwark does — this design ensures Bulwark's accounting of that charge is correct, but it doesn't (and can't) refund the spend.
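A minimal sketch of the per-key routing, written against plain in-memory stand-ins rather than the Cloudflare Durable Objects and KV APIs (`KeyActor`, `ActorRegistry`, `readCache`, and `admit` are hypothetical). Each keyId maps to exactly one actor, which is the only place its balance changes, so the check and the hold happen in one step. TTLs and confirmation are left out to keep the focus on routing. The cache only short-circuits: a stale entry can cause a wrongly early reject or a wasted trip to the actor, but admission is always decided by the actor.

```typescript
// Stand-in for a per-keyId Durable Object: the single writer for that key's balance.
class KeyActor {
  private remaining: number;

  constructor(budget: number) {
    this.remaining = budget;
  }

  // Check and hold in one step; no other code path mutates `remaining`.
  tryReserve(amount: number): boolean {
    if (amount > this.remaining) return false;
    this.remaining -= amount;
    return true;
  }

  get balance(): number {
    return this.remaining;
  }
}

// Stand-in for the namespace lookup: one actor per keyId, created on first use.
class ActorRegistry {
  private actors = new Map<string, KeyActor>();
  private defaultBudget: number;

  constructor(defaultBudget: number) {
    this.defaultBudget = defaultBudget;
  }

  get(keyId: string): KeyActor {
    let actor = this.actors.get(keyId);
    if (!actor) {
      actor = new KeyActor(this.defaultBudget);
      this.actors.set(keyId, actor);
    }
    return actor;
  }
}

// Stand-in for the KV read cache; in production it may lag behind the actor.
const readCache = new Map<string, number>();

function admit(registry: ActorRegistry, keyId: string, worstCase: number): boolean {
  // Hot path: a cached balance below the worst case is a cheap early reject.
  const cached = readCache.get(keyId);
  if (cached !== undefined && cached < worstCase) return false;
  // The authoritative decision happens in the key's single writer.
  const actor = registry.get(keyId);
  const ok = actor.tryReserve(worstCase);
  readCache.set(keyId, actor.balance); // refresh the cache after every decision
  return ok;
}

const registry = new ActorRegistry(50);
const results = [
  admit(registry, "key-a", 30), // actor admits, balance 20
  admit(registry, "key-a", 30), // rejected from the cache without touching the actor
  admit(registry, "key-b", 30), // separate key, separate actor
];
console.log(results); // [ true, false, true ]
```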

Acceptance criteria

  • BulwarkKeyRecord extended with optional reservation state references
  • Per-keyId Durable Object handles reservation lifecycle
  • Worst-case cost calculation correct for OpenAI / Anthropic / Gemini
  • TTL'd auto-expiry, default and configurable
  • DELETE /v1/reservations/:id endpoint for explicit release
  • Test coverage for: normal flow (reserve → confirm), timeout flow (reserve → expire), explicit halt (reserve → DELETE), concurrent reservations on the same key
  • DESIGN.md updated to reflect new accounting model
  • Backward compatible: existing post-hoc model remains available behind a config flag during the transition
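For the worst-case cost criterion, a minimal sketch of reading the output-token cap from each provider's request body. The field names are the providers' documented request parameters (OpenAI Chat Completions `max_completion_tokens` / legacy `max_tokens`, Anthropic Messages `max_tokens`, Gemini `generationConfig.maxOutputTokens`); `defaultCap`, the integer micro-USD prices, and the pre-dispatch input-token estimate are assumptions for illustration.

```typescript
type Provider = "openai" | "anthropic" | "gemini";

// Illustrative prices in micro-USD per token (integers keep the arithmetic exact).
interface Pricing {
  inputPerToken: number;
  outputPerToken: number;
}

// Reads the output-token cap from a provider request body. `defaultCap` is a hypothetical
// per-deployment ceiling used when the request does not set a cap.
function maxOutputTokens(provider: Provider, body: Record<string, any>, defaultCap: number): number {
  switch (provider) {
    case "openai":
      // Chat Completions: `max_completion_tokens` supersedes the older `max_tokens`.
      return body.max_completion_tokens ?? body.max_tokens ?? defaultCap;
    case "anthropic":
      // Messages API: `max_tokens` is required, so the fallback should rarely apply.
      return body.max_tokens ?? defaultCap;
    case "gemini":
      // generateContent: the cap sits under `generationConfig.maxOutputTokens`.
      return body.generationConfig?.maxOutputTokens ?? defaultCap;
  }
}

// Worst case = max output tokens × output price + input tokens × input price.
// `inputTokens` must be estimated before dispatch (assumption: a local tokenizer or heuristic).
function worstCaseCost(
  provider: Provider,
  body: Record<string, any>,
  inputTokens: number,
  pricing: Pricing,
  defaultCap = 4_096,
): number {
  const outputCap = maxOutputTokens(provider, body, defaultCap);
  return outputCap * pricing.outputPerToken + inputTokens * pricing.inputPerToken;
}

const pricing: Pricing = { inputPerToken: 3, outputPerToken: 15 };
const costs = {
  openai: worstCaseCost("openai", { max_tokens: 500 }, 1_000, pricing),
  anthropic: worstCaseCost("anthropic", { max_tokens: 1_024 }, 1_000, pricing),
  gemini: worstCaseCost("gemini", { generationConfig: { maxOutputTokens: 2_048 } }, 1_000, pricing),
  geminiUncapped: worstCaseCost("gemini", {}, 1_000, pricing),
};
console.log(costs);
```

An uncapped request falls back to the configured ceiling, which keeps the reservation bounded at the cost of possibly over-holding budget until the call confirms.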

Out of scope for this issue

  • Refunding spend to the user (impossible — provider already charged)
  • Replicating the cancellation upstream to the provider (would require provider-side cancellation APIs, which are spotty and inconsistent)
  • Distributed reservations across multiple Bulwark deployments — single-deployment DO is enough for v1.1; cross-region is a v2 concern

Related
