From 1b43961f1586fb93e1a9c03aedc4f04a11ec820c Mon Sep 17 00:00:00 2001 From: Kishore Kumar Date: Fri, 12 Jun 2026 03:49:49 +0530 Subject: [PATCH 1/2] docs(changelog): platform token spend now actually bills User-facing entry for the runner split-token billing fix: platform-model runs were charged run-fee-only because the runner never reported its token counts; it now streams cumulative input/output counts on every renewal and the final report. Documents the renew/report body shape, the zero-default wire compatibility (mixed fleet safe both directions), and the cached-input 0 limitation until the provider layer surfaces cache reads. Co-Authored-By: Claude Fable 5 --- changelog.mdx | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) diff --git a/changelog.mdx b/changelog.mdx index f90a492..c8da348 100644 --- a/changelog.mdx +++ b/changelog.mdx @@ -22,6 +22,26 @@ export const STAGE_SELF_MANAGED_M66 = "$0.0001"; usezombie is in **stealth-mode testing** and pre-production. APIs and agent behavior may change between releases without long deprecation windows. Email [usezombie@agentmail.to](mailto:usezombie@agentmail.to) if you want a hand calibrating an agent or to join as a design partner. + + ## Platform token spend now actually bills + + Runs on a platform-provided model were billed the run fee only — the tokens the agent consumed priced as zero. The server has always known how to charge for token usage, but the runner never told it how many tokens a run spent: the mid-run renewal posted an empty body, and the final report sent a single combined total into fields the server prices from per-side. Both defaulted to zero, so token cost came out to zero. The runner now streams its cumulative input and output token counts to the control plane on every renewal during a run and again on the final report, so a platform run's token spend is charged the way the rate card always described. + + ## Upgrading + + - **Update your runner fleet to bill token spend.** The change is on the runner side; the control plane already accepted these counts. A runner from before this release keeps working against the new server and an old server keeps working against a new runner — every token field defaults to zero on both ends, so a mixed fleet never errors and never charges a negative amount. But a run executed by a not-yet-upgraded runner still bills run-fee-only, exactly as before, until that host is updated. + + ## Bug fixes + + - **Platform-model runs under-billed** — a run that consumed model tokens on a platform-provided key was charged only the per-second run fee; the token component was zero. Such runs now bill the input and output token counts the agent actually consumed, metered incrementally across the run so a long run's spend is charged as it accrues rather than only at the end. + + ## API reference + + - **`POST /v1/runners/me/leases/{id}/renew`** now sends a JSON body carrying the run's cumulative token counts so far: `{"input_tokens": N, "cached_input_tokens": N, "output_tokens": N}` (cumulative for the run, not per-renewal deltas). An empty body is still accepted and meters run-fee-only. + - **The runner report** carries the same three cumulative fields alongside the existing combined `tokens` total. All three default to zero, so a report from an older runner settles run-fee-only with no error. + - Cached-input tokens ride the wire as zero until the model provider layer reports cache reads separately; cache-heavy runs bill those tokens at whatever class the provider currently folds them into. + + ## Agent memory loss is now measurable From fd21e6877dfd0df767ffead61e6296f00bc6b7dd Mon Sep 17 00:00:00 2001 From: Kishore Kumar Date: Fri, 12 Jun 2026 08:00:51 +0530 Subject: [PATCH 2/2] docs(changelog): keep token-billing entry at the operator level (greptile P1/P2) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The API reference block documented the /v1/runners/me/* renew/report wire shape — internal runner↔control-plane RPC, which AGENTS.md keeps out of operator docs. Removed it and softened the lead to drop the empty-body/field-shape specifics; the user-facing story (platform token spend now bills; update your runner fleet) and the cached-input caveat stay. Removing the API reference section also fixes the section-order inversion (Bug fixes had preceded API reference). Dropped the now-unused "API" tag. Co-Authored-By: Claude Fable 5 --- changelog.mdx | 15 +++++---------- 1 file changed, 5 insertions(+), 10 deletions(-) diff --git a/changelog.mdx b/changelog.mdx index c8da348..17e7c99 100644 --- a/changelog.mdx +++ b/changelog.mdx @@ -22,24 +22,19 @@ export const STAGE_SELF_MANAGED_M66 = "$0.0001"; usezombie is in **stealth-mode testing** and pre-production. APIs and agent behavior may change between releases without long deprecation windows. Email [usezombie@agentmail.to](mailto:usezombie@agentmail.to) if you want a hand calibrating an agent or to join as a design partner. - + ## Platform token spend now actually bills - Runs on a platform-provided model were billed the run fee only — the tokens the agent consumed priced as zero. The server has always known how to charge for token usage, but the runner never told it how many tokens a run spent: the mid-run renewal posted an empty body, and the final report sent a single combined total into fields the server prices from per-side. Both defaulted to zero, so token cost came out to zero. The runner now streams its cumulative input and output token counts to the control plane on every renewal during a run and again on the final report, so a platform run's token spend is charged the way the rate card always described. + Runs on a platform-provided model were billed the run fee only — the tokens the agent consumed priced as zero. The platform has always known how to charge for token usage, but the runner never reported how many tokens a run spent, so the token component of every platform run settled at zero. A run's input and output token usage is now reported as the run executes, so platform token spend is charged the way the rate card always described. ## Upgrading - - **Update your runner fleet to bill token spend.** The change is on the runner side; the control plane already accepted these counts. A runner from before this release keeps working against the new server and an old server keeps working against a new runner — every token field defaults to zero on both ends, so a mixed fleet never errors and never charges a negative amount. But a run executed by a not-yet-upgraded runner still bills run-fee-only, exactly as before, until that host is updated. + - **Update your runner fleet to bill token spend.** The fix is in the runner binary. A host running an older runner keeps working and is never charged a wrong amount — it simply continues to bill run-fee-only, exactly as before, until you update it. Update your runner hosts to start billing the token component on the runs they execute. ## Bug fixes - - **Platform-model runs under-billed** — a run that consumed model tokens on a platform-provided key was charged only the per-second run fee; the token component was zero. Such runs now bill the input and output token counts the agent actually consumed, metered incrementally across the run so a long run's spend is charged as it accrues rather than only at the end. - - ## API reference - - - **`POST /v1/runners/me/leases/{id}/renew`** now sends a JSON body carrying the run's cumulative token counts so far: `{"input_tokens": N, "cached_input_tokens": N, "output_tokens": N}` (cumulative for the run, not per-renewal deltas). An empty body is still accepted and meters run-fee-only. - - **The runner report** carries the same three cumulative fields alongside the existing combined `tokens` total. All three default to zero, so a report from an older runner settles run-fee-only with no error. - - Cached-input tokens ride the wire as zero until the model provider layer reports cache reads separately; cache-heavy runs bill those tokens at whatever class the provider currently folds them into. + - **Platform-model runs under-billed** — a run that consumed model tokens on a platform-provided key was charged only the per-second run fee; the token component was zero. Such runs now bill the input and output token counts the agent actually consumed, metered as the run accrues rather than only at the end. + - **Cached-input tokens** are not yet billed separately: until the model-provider layer reports cache reads on their own, cache-heavy runs bill those tokens at whatever class the provider folds them into. No run is over-charged for them.