From 1b43961f1586fb93e1a9c03aedc4f04a11ec820c Mon Sep 17 00:00:00 2001
From: Kishore Kumar <nkishore@megam.io>
Date: Fri, 12 Jun 2026 03:49:49 +0530
Subject: [PATCH 1/2] docs(changelog): platform token spend now actually bills

User-facing entry for the runner split-token billing fix: platform-model
runs were charged run-fee-only because the runner never reported its token
counts; it now streams cumulative input/output counts on every renewal and
the final report. Documents the renew/report body shape, the zero-default
wire compatibility (mixed fleet safe both directions), and the cached-input
0 limitation until the provider layer surfaces cache reads.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
---
 changelog.mdx | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)
diff --git a/changelog.mdx b/changelog.mdx
index f90a492..c8da348 100644
--- a/changelog.mdx
+++ b/changelog.mdx
@@ -22,6 +22,26 @@ export const STAGE_SELF_MANAGED_M66 = "$0.0001";
   usezombie is in **stealth-mode testing** and pre-production. APIs and agent behavior may change between releases without long deprecation windows. Email [usezombie@agentmail.to](mailto:usezombie@agentmail.to) if you want a hand calibrating an agent or to join as a design partner.
 </Tip>
 
+<Update label="Jun 12, 2026" tags={["Bug fixes", "API"]}>
+  ## Platform token spend now actually bills
+
+  Runs on a platform-provided model were billed the run fee only — the tokens the agent consumed priced as zero. The server has always known how to charge for token usage, but the runner never told it how many tokens a run spent: the mid-run renewal posted an empty body, and the final report sent a single combined total into fields the server prices from per-side. Both defaulted to zero, so token cost came out to zero. The runner now streams its cumulative input and output token counts to the control plane on every renewal during a run and again on the final report, so a platform run's token spend is charged the way the rate card always described.
+
+  ## Upgrading
+
+  - **Update your runner fleet to bill token spend.** The change is on the runner side; the control plane already accepted these counts. A runner from before this release keeps working against the new server and an old server keeps working against a new runner — every token field defaults to zero on both ends, so a mixed fleet never errors and never charges a negative amount. But a run executed by a not-yet-upgraded runner still bills run-fee-only, exactly as before, until that host is updated.
+
+  ## Bug fixes
+
+  - **Platform-model runs under-billed** — a run that consumed model tokens on a platform-provided key was charged only the per-second run fee; the token component was zero. Such runs now bill the input and output token counts the agent actually consumed, metered incrementally across the run so a long run's spend is charged as it accrues rather than only at the end.
+
+  ## API reference
+
+  - **`POST /v1/runners/me/leases/{id}/renew`** now sends a JSON body carrying the run's cumulative token counts so far: `{"input_tokens": N, "cached_input_tokens": N, "output_tokens": N}` (cumulative for the run, not per-renewal deltas). An empty body is still accepted and meters run-fee-only.
+  - **The runner report** carries the same three cumulative fields alongside the existing combined `tokens` total. All three default to zero, so a report from an older runner settles run-fee-only with no error.
+  - Cached-input tokens ride the wire as zero until the model provider layer reports cache reads separately; cache-heavy runs bill those tokens at whatever class the provider currently folds them into.
+</Update>
+
 <Update label="Jun 12, 2026" tags={["What's new", "Observability"]}>
   ## Agent memory loss is now measurable
 

From fd21e6877dfd0df767ffead61e6296f00bc6b7dd Mon Sep 17 00:00:00 2001
From: Kishore Kumar <nkishore@megam.io>
Date: Fri, 12 Jun 2026 08:00:51 +0530
Subject: [PATCH 2/2] docs(changelog): keep token-billing entry at the operator
 level (greptile P1/P2)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The API reference block documented the /v1/runners/me/* renew/report
wire shape — internal runner↔control-plane RPC, which AGENTS.md keeps
out of operator docs. Removed it and softened the lead to drop the
empty-body/field-shape specifics; the user-facing story (platform token
spend now bills; update your runner fleet) and the cached-input caveat
stay. Removing the API reference section also fixes the section-order
inversion (Bug fixes had preceded API reference). Dropped the now-unused
"API" tag.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
---
 changelog.mdx | 15 +++++----------
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/changelog.mdx b/changelog.mdx
index c8da348..17e7c99 100644
--- a/changelog.mdx
+++ b/changelog.mdx
@@ -22,24 +22,19 @@ export const STAGE_SELF_MANAGED_M66 = "$0.0001";
   usezombie is in **stealth-mode testing** and pre-production. APIs and agent behavior may change between releases without long deprecation windows. Email [usezombie@agentmail.to](mailto:usezombie@agentmail.to) if you want a hand calibrating an agent or to join as a design partner.
 </Tip>
 
-<Update label="Jun 12, 2026" tags={["Bug fixes", "API"]}>
+<Update label="Jun 12, 2026" tags={["Bug fixes"]}>
   ## Platform token spend now actually bills
 
-  Runs on a platform-provided model were billed the run fee only — the tokens the agent consumed priced as zero. The server has always known how to charge for token usage, but the runner never told it how many tokens a run spent: the mid-run renewal posted an empty body, and the final report sent a single combined total into fields the server prices from per-side. Both defaulted to zero, so token cost came out to zero. The runner now streams its cumulative input and output token counts to the control plane on every renewal during a run and again on the final report, so a platform run's token spend is charged the way the rate card always described.
+  Runs on a platform-provided model were billed the run fee only — the tokens the agent consumed priced as zero. The platform has always known how to charge for token usage, but the runner never reported how many tokens a run spent, so the token component of every platform run settled at zero. A run's input and output token usage is now reported as the run executes, so platform token spend is charged the way the rate card always described.
 
   ## Upgrading
 
-  - **Update your runner fleet to bill token spend.** The change is on the runner side; the control plane already accepted these counts. A runner from before this release keeps working against the new server and an old server keeps working against a new runner — every token field defaults to zero on both ends, so a mixed fleet never errors and never charges a negative amount. But a run executed by a not-yet-upgraded runner still bills run-fee-only, exactly as before, until that host is updated.
+  - **Update your runner fleet to bill token spend.** The fix is in the runner binary. A host running an older runner keeps working and is never charged a wrong amount — it simply continues to bill run-fee-only, exactly as before, until you update it. Update your runner hosts to start billing the token component on the runs they execute.
 
   ## Bug fixes
 
-  - **Platform-model runs under-billed** — a run that consumed model tokens on a platform-provided key was charged only the per-second run fee; the token component was zero. Such runs now bill the input and output token counts the agent actually consumed, metered incrementally across the run so a long run's spend is charged as it accrues rather than only at the end.
-
-  ## API reference
-
-  - **`POST /v1/runners/me/leases/{id}/renew`** now sends a JSON body carrying the run's cumulative token counts so far: `{"input_tokens": N, "cached_input_tokens": N, "output_tokens": N}` (cumulative for the run, not per-renewal deltas). An empty body is still accepted and meters run-fee-only.
-  - **The runner report** carries the same three cumulative fields alongside the existing combined `tokens` total. All three default to zero, so a report from an older runner settles run-fee-only with no error.
-  - Cached-input tokens ride the wire as zero until the model provider layer reports cache reads separately; cache-heavy runs bill those tokens at whatever class the provider currently folds them into.
+  - **Platform-model runs under-billed** — a run that consumed model tokens on a platform-provided key was charged only the per-second run fee; the token component was zero. Such runs now bill the input and output token counts the agent actually consumed, metered as the run accrues rather than only at the end.
+  - **Cached-input tokens** are not yet billed separately: until the model-provider layer reports cache reads on their own, cache-heavy runs bill those tokens at whatever class the provider folds them into. No run is over-charged for them.
 </Update>
 
 <Update label="Jun 12, 2026" tags={["What's new", "Observability"]}>