perf(engine): memoize estimated_input_tokens via content-keyed cache by HUQIANTAO · Pull Request #2631 · Hmbown/CodeWhale

HUQIANTAO · 2026-06-03T10:34:09Z

Summary

The token estimator (estimate_input_tokens_conservative) walks the full session.messages and the active system prompt, then tokenizes every text block. Five call sites per turn in the engine (capacity pre/post tool checkpoints, error escalation, seam manager, trim budget check) plus four TUI/command consumers (footer, /status, /debug, context inspector) all re-walked the same data independently. On a 200-message history with 5 KB of tool results that is roughly 2 ms per call, or ~20 ms of pure waste on a single turn.

This PR introduces a process-local TokenEstimateCache keyed on (session.messages_revision, system_prompt_fingerprint). Repeated calls with the same inputs return the cached value without re-walking the message list. The cache invalidates as soon as either input changes:

session.messages_revision is a monotonic counter bumped in Session::add_message, Session::replace_messages, the new Session::bump_messages_revision helper, and at every direct session.messages mutation site in core/engine.rs and core/engine/capacity_flow.rs.
system_prompt_fingerprint is a stable 64-bit hash of the SystemPrompt::Text or SystemPrompt::Blocks payload.

Why now

This is a follow-up to the capacity-controller and seam-manager work landed in v0.8.50–v0.8.51. Both subsystems consult the token estimate on every turn boundary; the same data was being recomputed N times. The cache slots in below the existing call sites and is purely additive — no public API change, no behavior change.

Changes

File	Change
`crates/tui/src/core/engine/token_estimate_cache.rs` (new, 312 LOC)	`TokenEstimateCache` with audit ring, plus 10 unit tests.
`crates/tui/src/core/engine.rs`	Add `token_estimate_cache` field to `Engine`; rewrite `estimated_input_tokens(&mut self)` to consult the cache; restructure `layered_context_checkpoint` to satisfy borrow checker.
`crates/tui/src/core/engine/capacity_flow.rs`	Bump `messages_revision` on direct `session.messages` mutations; route capacity checkpoints through a local observation variable.
`crates/tui/src/core/engine/context.rs`	`#[allow(dead_code)]` on the three `estimate_*_conservative` helpers; the engine now goes through the cache which uses `crate::compaction::estimate_input_tokens_conservative` instead.
`crates/tui/src/core/session.rs`	Add `messages_revision: u64` field plus `bump_messages_revision`, `replace_messages` accessors. `add_message` auto-bumps.

Tests

running 10 tests
test core::engine::token_estimate_cache::tests::first_call_is_a_miss ... ok
test core::engine::token_estimate_cache::tests::repeated_call_with_same_revision_is_a_hit ... ok
test core::engine::token_estimate_cache::tests::revision_bump_invalidates ... ok
test core::engine::token_estimate_cache::tests::system_prompt_change_invalidates ... ok
test core::engine::token_estimate_cache::tests::bump_messages_revision_clears_cache ... ok
test core::engine::token_estimate_cache::tests::bump_to_smaller_revision_is_noop ... ok
test core::engine::token_estimate_cache::tests::invalidate_resets_state ... ok
test core::engine::token_estimate_cache::tests::blocks_system_prompt_yields_distinct_fingerprint ... ok
test core::engine::token_estimate_cache::tests::audit_ring_records_recent_pairs ... ok
test core::engine::token_estimate_cache::tests::audit_ring_bounded_by_capacity ... ok

test result: ok. 10 passed; 0 failed

The full engine test suite (cargo test -p codewhale-tui core::engine::) still passes: 157 tests, 0 failures, 1 ignored.

Compatibility

No public API change. Engine::estimated_input_tokens is private (fn, not pub fn).
No new external dependencies.
No changes to the on-disk session format.

Future work (separate PRs)

Surface the cache's (hits, misses) counters in /status and the TUI footer for observability.
Add a /token-cache slash command to invalidate manually.
Extend the audit ring into a time-series that the TUI can graph.

greptile-apps

HUQIANTAO has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

github-actions · 2026-06-03T10:34:18Z

Thanks @HUQIANTAO for taking the time to contribute.

This repository is currently observing a maintainer-managed contribution gate in dry-run mode, so this pull request is staying open. When enforcement is enabled, pull requests from contributors who are not listed in .github/APPROVED_CONTRIBUTORS will be closed automatically.

Please read CONTRIBUTING.md for the expected contribution shape. A maintainer can grant PR access by commenting /lgtm on a pull request.

gemini-code-assist

Code Review

This pull request introduces a process-local memoization cache (TokenEstimateCache) for conservative input token estimation to optimize the engine's hot path. It tracks message history mutations using a monotonic messages_revision counter in Session and fingerprints the system prompt to avoid redundant token calculations. The reviewer identified a critical issue where direct assignments to self.session.messages bypass the replace_messages helper, causing the messages_revision counter to miss updates and leading to stale cache estimates. The reviewer recommended updating these direct assignment sites to use replace_messages instead.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-03T10:35:42Z

+    #[allow(dead_code)]
+    pub fn replace_messages(&mut self, messages: Vec<Message>) {
+        self.messages = messages;
+        self.messages_revision = self.messages_revision.saturating_add(1);
+    }


The replace_messages helper is marked as #[allow(dead_code)], but there are several places in core/engine.rs and core/engine/capacity_flow.rs where self.session.messages is directly assigned (e.g., self.session.messages = result.messages;) instead of using this helper.

Because direct assignment bypasses replace_messages, the messages_revision counter is not incremented. If the system prompt fingerprint remains unchanged, the TokenEstimateCache will return a stale cached token estimate, leading to correctness bugs in capacity checkpoints, compaction, and context trimming.

Please remove the #[allow(dead_code)] attribute and update the following direct assignment sites to use self.session.replace_messages(...):

crates/tui/src/core/engine.rs (around line 1235 in Op::SyncSession):
self.session.replace_messages(messages);

crates/tui/src/core/engine.rs (around line 1878 in handle_manual_compaction):
self.session.replace_messages(result.messages);

crates/tui/src/core/engine.rs (around line 1974 in handle_purge):
self.session.replace_messages(result.messages);

crates/tui/src/core/engine.rs (around line 2085 in recover_context_overflow):
self.session.replace_messages(compacted_messages);

crates/tui/src/core/engine/capacity_flow.rs (around line 425 in apply_targeted_context_refresh):
self.session.replace_messages(result.messages);

pub fn replace_messages(&mut self, messages: Vec<Message>) { self.messages = messages; self.messages_revision = self.messages_revision.saturating_add(1); }

Hmbown · 2026-06-03T10:57:20Z

Thanks @HUQIANTAO. I looked at this while freezing the v0.8.52 release lane. The cross-platform tests are green, but CI is still blocked on cargo fmt --all -- --check in capacity_flow.rs / engine.rs, and this touches the engine token-estimate path. I am keeping it out of 0.8.52 so we do not destabilize the clean release at the last minute, but this looks worth a normal review pass once formatting is fixed.

The token estimator walks the full session.messages and the active system prompt. Five call sites per turn in the engine (capacity pre/post tool checkpoints, error escalation, the seam manager, the trim budget check) plus four TUI/command consumers (footer, /status, /debug, context inspector) all re-walked the same data independently. On a 200-message history with 5 KB of tool results that is roughly 2 ms per call, or ~20 ms of pure waste on a single turn. Introduce a process-local TokenEstimateCache keyed on (session.messages_revision, system_prompt_fingerprint). Repeated calls with the same inputs return the cached value without re-walking the message list. The cache invalidates as soon as either input changes: * session.messages_revision is a monotonic counter bumped in Session::add_message, Session::replace_messages, the new Session::bump_messages_revision helper, and at every direct session.messages mutation site in core/engine.rs and core/engine/capacity_flow.rs. * system_prompt_fingerprint is a stable 64-bit hash of the SystemPrompt::Text or SystemPrompt::Blocks payload. Also restructures layered_context_checkpoint to compute the estimated token count before taking a long-lived &SeamManager borrow, and re-routes the capacity pre/post tool checkpoints to compute the observation into a local before calling capacity_controller.observe_*. Both refactors are required to satisfy the borrow checker once estimated_input_tokens requires &mut self. Tests: 10 new unit tests cover the miss/hit path, revision bumps, system-prompt changes, audit-ring capacity, and downward-revision no-ops. The full 157-test engine suite still passes.

greptile-apps

HUQIANTAO has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

greptile-apps Bot reviewed Jun 3, 2026

View reviewed changes

gemini-code-assist Bot reviewed Jun 3, 2026

View reviewed changes

Hmbown mentioned this pull request Jun 3, 2026

[v0.8.51] Release checklist — carry the branch to a published release #2600

Closed

16 tasks

HUQIANTAO force-pushed the perf/token-estimate-memo branch from b514f22 to 46b0ecc Compare June 3, 2026 11:43

greptile-apps Bot reviewed Jun 3, 2026

View reviewed changes

Hmbown mentioned this pull request Jun 3, 2026

v0.9.0 Open PR harvest: merge, supersede, or close long-lived branches #2722

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf(engine): memoize estimated_input_tokens via content-keyed cache#2631

perf(engine): memoize estimated_input_tokens via content-keyed cache#2631
HUQIANTAO wants to merge 1 commit into
Hmbown:mainfrom
HUQIANTAO:perf/token-estimate-memo

HUQIANTAO commented Jun 3, 2026

Uh oh!

greptile-apps Bot left a comment

Uh oh!

github-actions Bot commented Jun 3, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Jun 3, 2026

Uh oh!

Hmbown commented Jun 3, 2026

Uh oh!

greptile-apps Bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

HUQIANTAO commented Jun 3, 2026

Summary

Why now

Changes

Tests

Compatibility

Future work (separate PRs)

Uh oh!

greptile-apps Bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented Jun 3, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

Hmbown commented Jun 3, 2026

Uh oh!

greptile-apps Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants