perf(engine): memoize estimated_input_tokens via content-keyed cache#2631
perf(engine): memoize estimated_input_tokens via content-keyed cache#2631HUQIANTAO wants to merge 1 commit into
Conversation
There was a problem hiding this comment.
HUQIANTAO has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.
|
Thanks @HUQIANTAO for taking the time to contribute. This repository is currently observing a maintainer-managed contribution gate in dry-run mode, so this pull request is staying open. When enforcement is enabled, pull requests from contributors who are not listed in Please read |
There was a problem hiding this comment.
Code Review
This pull request introduces a process-local memoization cache (TokenEstimateCache) for conservative input token estimation to optimize the engine's hot path. It tracks message history mutations using a monotonic messages_revision counter in Session and fingerprints the system prompt to avoid redundant token calculations. The reviewer identified a critical issue where direct assignments to self.session.messages bypass the replace_messages helper, causing the messages_revision counter to miss updates and leading to stale cache estimates. The reviewer recommended updating these direct assignment sites to use replace_messages instead.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| #[allow(dead_code)] | ||
| pub fn replace_messages(&mut self, messages: Vec<Message>) { | ||
| self.messages = messages; | ||
| self.messages_revision = self.messages_revision.saturating_add(1); | ||
| } |
There was a problem hiding this comment.
The replace_messages helper is marked as #[allow(dead_code)], but there are several places in core/engine.rs and core/engine/capacity_flow.rs where self.session.messages is directly assigned (e.g., self.session.messages = result.messages;) instead of using this helper.
Because direct assignment bypasses replace_messages, the messages_revision counter is not incremented. If the system prompt fingerprint remains unchanged, the TokenEstimateCache will return a stale cached token estimate, leading to correctness bugs in capacity checkpoints, compaction, and context trimming.
Please remove the #[allow(dead_code)] attribute and update the following direct assignment sites to use self.session.replace_messages(...):
crates/tui/src/core/engine.rs(around line 1235 inOp::SyncSession):self.session.replace_messages(messages);
crates/tui/src/core/engine.rs(around line 1878 inhandle_manual_compaction):self.session.replace_messages(result.messages);
crates/tui/src/core/engine.rs(around line 1974 inhandle_purge):self.session.replace_messages(result.messages);
crates/tui/src/core/engine.rs(around line 2085 inrecover_context_overflow):self.session.replace_messages(compacted_messages);
crates/tui/src/core/engine/capacity_flow.rs(around line 425 inapply_targeted_context_refresh):self.session.replace_messages(result.messages);
pub fn replace_messages(&mut self, messages: Vec<Message>) {
self.messages = messages;
self.messages_revision = self.messages_revision.saturating_add(1);
}|
Thanks @HUQIANTAO. I looked at this while freezing the v0.8.52 release lane. The cross-platform tests are green, but CI is still blocked on |
The token estimator walks the full session.messages and the active system
prompt. Five call sites per turn in the engine (capacity pre/post tool
checkpoints, error escalation, the seam manager, the trim budget check)
plus four TUI/command consumers (footer, /status, /debug, context
inspector) all re-walked the same data independently. On a 200-message
history with 5 KB of tool results that is roughly 2 ms per call, or
~20 ms of pure waste on a single turn.
Introduce a process-local TokenEstimateCache keyed on
(session.messages_revision, system_prompt_fingerprint). Repeated calls
with the same inputs return the cached value without re-walking the
message list. The cache invalidates as soon as either input changes:
* session.messages_revision is a monotonic counter bumped in
Session::add_message, Session::replace_messages, the new
Session::bump_messages_revision helper, and at every direct
session.messages mutation site in core/engine.rs and
core/engine/capacity_flow.rs.
* system_prompt_fingerprint is a stable 64-bit hash of the
SystemPrompt::Text or SystemPrompt::Blocks payload.
Also restructures layered_context_checkpoint to compute the estimated
token count before taking a long-lived &SeamManager borrow, and
re-routes the capacity pre/post tool checkpoints to compute the
observation into a local before calling
capacity_controller.observe_*. Both refactors are required to satisfy
the borrow checker once estimated_input_tokens requires &mut self.
Tests: 10 new unit tests cover the miss/hit path, revision bumps,
system-prompt changes, audit-ring capacity, and downward-revision
no-ops. The full 157-test engine suite still passes.
b514f22 to
46b0ecc
Compare
There was a problem hiding this comment.
HUQIANTAO has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.
Summary
The token estimator (
estimate_input_tokens_conservative) walks the fullsession.messagesand the active system prompt, then tokenizes every text block. Five call sites per turn in the engine (capacity pre/post tool checkpoints, error escalation, seam manager, trim budget check) plus four TUI/command consumers (footer,/status,/debug, context inspector) all re-walked the same data independently. On a 200-message history with 5 KB of tool results that is roughly 2 ms per call, or ~20 ms of pure waste on a single turn.This PR introduces a process-local
TokenEstimateCachekeyed on(session.messages_revision, system_prompt_fingerprint). Repeated calls with the same inputs return the cached value without re-walking the message list. The cache invalidates as soon as either input changes:session.messages_revisionis a monotonic counter bumped inSession::add_message,Session::replace_messages, the newSession::bump_messages_revisionhelper, and at every directsession.messagesmutation site incore/engine.rsandcore/engine/capacity_flow.rs.system_prompt_fingerprintis a stable 64-bit hash of theSystemPrompt::TextorSystemPrompt::Blockspayload.Why now
This is a follow-up to the capacity-controller and seam-manager work landed in v0.8.50–v0.8.51. Both subsystems consult the token estimate on every turn boundary; the same data was being recomputed N times. The cache slots in below the existing call sites and is purely additive — no public API change, no behavior change.
Changes
crates/tui/src/core/engine/token_estimate_cache.rs(new, 312 LOC)TokenEstimateCachewith audit ring, plus 10 unit tests.crates/tui/src/core/engine.rstoken_estimate_cachefield toEngine; rewriteestimated_input_tokens(&mut self)to consult the cache; restructurelayered_context_checkpointto satisfy borrow checker.crates/tui/src/core/engine/capacity_flow.rsmessages_revisionon directsession.messagesmutations; route capacity checkpoints through a local observation variable.crates/tui/src/core/engine/context.rs#[allow(dead_code)]on the threeestimate_*_conservativehelpers; the engine now goes through the cache which usescrate::compaction::estimate_input_tokens_conservativeinstead.crates/tui/src/core/session.rsmessages_revision: u64field plusbump_messages_revision,replace_messagesaccessors.add_messageauto-bumps.Tests
The full engine test suite (
cargo test -p codewhale-tui core::engine::) still passes: 157 tests, 0 failures, 1 ignored.Compatibility
Engine::estimated_input_tokensis private (fn, notpub fn).Future work (separate PRs)
(hits, misses)counters in/statusand the TUI footer for observability./token-cacheslash command to invalidate manually.