Reservation TTL on forced halt — handle in-flight call termination cleanly

## Background

Bulwark's current accounting model is **read-before / write-after** (documented in [DESIGN.md → "Why post-hoc accounting, not reservations"](https://github.com/OpsToInnovator/bulwark/blob/main/DESIGN.md#why-post-hoc-accounting-not-reservations)). It accepts a bounded overspend window under concurrency in exchange for hot-path latency.
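To make the overspend window concrete, here is a minimal sketch of two concurrent calls under a read-before / write-after check. `PostHocLedger` and `simulateOverspend` are hypothetical names, and the exact admission rule (`spent < budget`) is an assumption for illustration, not necessarily the check Bulwark performs.

```typescript
// Hypothetical post-hoc ledger: the check reads recorded spend before the
// call, and the cost is only written after the provider responds.
class PostHocLedger {
  spent = 0;
  budget: number;

  constructor(budget: number) {
    this.budget = budget;
  }

  // Read-before: admit the call if recorded spend is still under budget.
  allows(): boolean {
    return this.spent < this.budget;
  }

  // Write-after: record the cost once usage data is known.
  record(cost: number): void {
    this.spent += cost;
  }
}

// Two calls are checked before either has written its cost.
function simulateOverspend(
  budget: number,
  costPerCall: number,
): { admitted: number; overspend: number } {
  const ledger = new PostHocLedger(budget);
  const admitted = [ledger.allows(), ledger.allows()].filter(Boolean).length;
  for (let i = 0; i < admitted; i++) {
    ledger.record(costPerCall);
  }
  return { admitted, overspend: Math.max(0, ledger.spent - ledger.budget) };
}
```

With a budget of 10 and a cost of 6 per call, both calls are admitted and the recorded spend ends at 12. The window is bounded by how many calls can be in flight at once and by the cost of each.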

The current model does not cleanly handle one real production failure mode: **what happens to in-flight provider calls when the caller (or an orchestrator such as an agent runtime) forcibly halts a call mid-flight?**

Concretely:

1. Client opens a request to Bulwark
2. Bulwark forwards to provider (OpenAI / Anthropic / Gemini)
3. Provider starts generating; tokens are being billed regardless of whether anyone reads the response
4. Client disconnects, or an upstream orchestrator (Manager-Employee pattern, swarm coordinator, etc.) hits a budget breach and cancels the in-flight tool call
5. **The provider charge still lands.** Bulwark currently has no way to:
   - Account for the spend that was committed-but-not-yet-recorded
   - Release a reservation (because we don't have reservations yet)
   - Stop the next call from oversubscribing the budget on the assumption that the cancelled call "didn't count"

## Why this matters

This becomes acute the moment Bulwark sits in front of an agent runtime that supports forced halts, and increasingly common architectures (Manager-Employee, swarm orchestration, hierarchical agents) explicitly halt sub-agents on budget breaches. See [r/SideProject discussion with Nimind](https://www.reddit.com/r/SideProject/comments/1u8wcer/i_got_tired_of_ai_agents_draining_my_api_budget/) for a concrete example of this pattern in the wild.

## Proposed design

When Phase 2 reservation work lands (see [ROADMAP.md → v1.1](https://github.com/OpsToInnovator/bulwark/blob/main/ROADMAP.md)), the reservation system should include a **TTL'd hold**:

1. **Reserve** worst-case cost (`max_tokens × output_price + input_tokens × input_price`) before dispatching to the provider.
2. **Hold** the reservation in shared state with a TTL (default: provider's max response time + buffer, e.g. 60s).
3. **Convert** the reservation to a confirmed debit when the provider returns usage data, and release the delta (reserved worst case minus actual cost) back to the budget.
4. **Expire** the reservation automatically if the call times out, the client disconnects, or no confirmation arrives within the TTL window. This frees the budget for legitimate subsequent calls without requiring explicit cancellation signalling from the caller.
5. **Optional `DELETE /v1/reservations/:id`** for orchestrators that want to release a hold explicitly on forced halt (rather than waiting for TTL).
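The five steps above can be sketched as a small in-memory ledger. This is a minimal sketch under stated assumptions, not Bulwark's implementation: `Pricing`, `worstCaseCost`, `ReservationLedger`, the integer micro-USD units, and the injected clock are all hypothetical, and a real version would keep this state in shared storage rather than in process memory.

```typescript
// Hypothetical names throughout. Amounts are integer micro-USD (an assumed
// unit) so that the arithmetic stays exact.
interface Pricing {
  inputPrice: number;  // micro-USD per input token
  outputPrice: number; // micro-USD per output token
}

// Step 1: the worst case assumes the provider generates all max_tokens.
function worstCaseCost(inputTokens: number, maxTokens: number, p: Pricing): number {
  return maxTokens * p.outputPrice + inputTokens * p.inputPrice;
}

interface Hold {
  amount: number;
  expiresAt: number; // epoch milliseconds
}

class ReservationLedger {
  private confirmed = 0;
  private holds = new Map<string, Hold>();
  private nextId = 1;
  private budget: number;
  private ttlMs: number;
  private now: () => number;

  constructor(budget: number, ttlMs: number = 60000, now: () => number = () => Date.now()) {
    this.budget = budget;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Step 4: drop every hold whose TTL has elapsed.
  private sweep(): void {
    const t = this.now();
    this.holds.forEach((hold, id) => {
      if (hold.expiresAt <= t) this.holds.delete(id);
    });
  }

  // Budget left after confirmed debits and live holds.
  available(): number {
    this.sweep();
    let held = 0;
    this.holds.forEach((hold) => {
      held += hold.amount;
    });
    return this.budget - this.confirmed - held;
  }

  // Steps 1-2: place a TTL'd hold, or refuse if it would oversubscribe.
  reserve(amount: number): string | null {
    if (amount > this.available()) return null;
    const id = `res_${this.nextId++}`;
    this.holds.set(id, { amount, expiresAt: this.now() + this.ttlMs });
    return id;
  }

  // Step 3: swap the hold for the actual cost; the delta is freed implicitly.
  confirm(id: string, actualCost: number): boolean {
    this.sweep();
    if (!this.holds.delete(id)) return false;
    this.confirmed += actualCost;
    return true;
  }

  // Step 5: explicit release, the in-process analogue of the DELETE endpoint.
  release(id: string): boolean {
    this.sweep();
    return this.holds.delete(id);
  }
}
```

In this sketch `confirm` returns `false` when usage data arrives after the hold has already expired. What Bulwark should do with such a late confirmation is a design decision this sketch does not settle.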

## Implementation notes

- Reservations live in Durable Objects (one DO per `keyId`) for true single-writer semantics, with KV as a read-cache for the hot path.
- The TTL approach intentionally doesn't depend on the caller doing the right thing — a crashed client, a halted Employee agent, or a network blip all expire safely.
- The explicit `DELETE` endpoint is for well-behaved orchestrators that want to release the held budget faster than the TTL.
- The provider charge still lands regardless of what Bulwark does — this design ensures Bulwark's *accounting* of that charge is correct, but it doesn't (and can't) refund the spend.
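From the orchestrator's side, the explicit release could look like the sketch below. It assumes the proposed `DELETE /v1/reservations/:id` route exists as written in this issue. `releaseOnHalt`, `FetchLike`, the base URL, and the return labels are hypothetical, and the endpoint's status codes are not specified here, so the sketch relies only on `ok`.

```typescript
// Minimal structural type so the sketch works with the global fetch or a stub.
type FetchLike = (
  url: string,
  init: { method: string },
) => Promise<{ ok: boolean; status: number }>;

// Best-effort release on forced halt. Any failure is swallowed on purpose:
// the TTL remains the safety net, so the caller never has to retry.
async function releaseOnHalt(
  baseUrl: string,
  reservationId: string,
  fetchImpl: FetchLike,
): Promise<"released" | "left-to-ttl"> {
  try {
    const res = await fetchImpl(
      `${baseUrl}/v1/reservations/${encodeURIComponent(reservationId)}`,
      { method: "DELETE" },
    );
    return res.ok ? "released" : "left-to-ttl";
  } catch {
    // Network error or crashed gateway: the hold expires on its own.
    return "left-to-ttl";
  }
}
```

Treating a failed `DELETE` as non-fatal keeps the explicit path an optimisation only, which matches the intent that correctness never depends on the caller doing the right thing.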

## Acceptance criteria

- [ ] `BulwarkKeyRecord` extended with optional reservation state references
- [ ] Per-`keyId` Durable Object handles reservation lifecycle
- [ ] Worst-case cost calculation correct for OpenAI / Anthropic / Gemini
- [ ] TTL'd auto-expiry, default and configurable
- [ ] `DELETE /v1/reservations/:id` endpoint for explicit release
- [ ] Test coverage for: normal flow (reserve → confirm), timeout flow (reserve → expire), explicit halt (reserve → DELETE), concurrent reservations on the same key
- [ ] DESIGN.md updated to reflect new accounting model
- [ ] Backward compatible: existing post-hoc model remains available behind a config flag during the transition

## Out of scope for this issue

- Refunding spend to the user (impossible — provider already charged)
- Replicating the cancellation upstream to the provider (would require provider-side cancellation APIs, which are spotty and inconsistent)
- Distributed reservations across multiple Bulwark deployments — single-deployment DO is enough for v1.1; cross-region is a v2 concern

## Related

- DESIGN.md: [Why post-hoc accounting, not reservations](https://github.com/OpsToInnovator/bulwark/blob/main/DESIGN.md#why-post-hoc-accounting-not-reservations)
- ROADMAP.md: [v1.1 — Reserved-vs-confirmed budget accounting](https://github.com/OpsToInnovator/bulwark/blob/main/ROADMAP.md)
- Velocity Discussion #24: [agent-runtime vs gateway split](https://github.com/ishandutta2007/Velocity/discussions/24)

