Skip to content

perf(engine): memoize estimated_input_tokens via content-keyed cache#2631

Open
HUQIANTAO wants to merge 1 commit into
Hmbown:mainfrom
HUQIANTAO:perf/token-estimate-memo
Open

perf(engine): memoize estimated_input_tokens via content-keyed cache#2631
HUQIANTAO wants to merge 1 commit into
Hmbown:mainfrom
HUQIANTAO:perf/token-estimate-memo

Conversation

@HUQIANTAO
Copy link
Copy Markdown
Contributor

Summary

The token estimator (estimate_input_tokens_conservative) walks the full session.messages and the active system prompt, then tokenizes every text block. Five call sites per turn in the engine (capacity pre/post tool checkpoints, error escalation, seam manager, trim budget check) plus four TUI/command consumers (footer, /status, /debug, context inspector) all re-walked the same data independently. On a 200-message history with 5 KB of tool results that is roughly 2 ms per call, or ~20 ms of pure waste on a single turn.

This PR introduces a process-local TokenEstimateCache keyed on (session.messages_revision, system_prompt_fingerprint). Repeated calls with the same inputs return the cached value without re-walking the message list. The cache invalidates as soon as either input changes:

  • session.messages_revision is a monotonic counter bumped in Session::add_message, Session::replace_messages, the new Session::bump_messages_revision helper, and at every direct session.messages mutation site in core/engine.rs and core/engine/capacity_flow.rs.
  • system_prompt_fingerprint is a stable 64-bit hash of the SystemPrompt::Text or SystemPrompt::Blocks payload.

Why now

This is a follow-up to the capacity-controller and seam-manager work landed in v0.8.50–v0.8.51. Both subsystems consult the token estimate on every turn boundary; the same data was being recomputed N times. The cache slots in below the existing call sites and is purely additive — no public API change, no behavior change.

Changes

File Change
crates/tui/src/core/engine/token_estimate_cache.rs (new, 312 LOC) TokenEstimateCache with audit ring, plus 10 unit tests.
crates/tui/src/core/engine.rs Add token_estimate_cache field to Engine; rewrite estimated_input_tokens(&mut self) to consult the cache; restructure layered_context_checkpoint to satisfy borrow checker.
crates/tui/src/core/engine/capacity_flow.rs Bump messages_revision on direct session.messages mutations; route capacity checkpoints through a local observation variable.
crates/tui/src/core/engine/context.rs #[allow(dead_code)] on the three estimate_*_conservative helpers; the engine now goes through the cache which uses crate::compaction::estimate_input_tokens_conservative instead.
crates/tui/src/core/session.rs Add messages_revision: u64 field plus bump_messages_revision, replace_messages accessors. add_message auto-bumps.

Tests

running 10 tests
test core::engine::token_estimate_cache::tests::first_call_is_a_miss ... ok
test core::engine::token_estimate_cache::tests::repeated_call_with_same_revision_is_a_hit ... ok
test core::engine::token_estimate_cache::tests::revision_bump_invalidates ... ok
test core::engine::token_estimate_cache::tests::system_prompt_change_invalidates ... ok
test core::engine::token_estimate_cache::tests::bump_messages_revision_clears_cache ... ok
test core::engine::token_estimate_cache::tests::bump_to_smaller_revision_is_noop ... ok
test core::engine::token_estimate_cache::tests::invalidate_resets_state ... ok
test core::engine::token_estimate_cache::tests::blocks_system_prompt_yields_distinct_fingerprint ... ok
test core::engine::token_estimate_cache::tests::audit_ring_records_recent_pairs ... ok
test core::engine::token_estimate_cache::tests::audit_ring_bounded_by_capacity ... ok

test result: ok. 10 passed; 0 failed

The full engine test suite (cargo test -p codewhale-tui core::engine::) still passes: 157 tests, 0 failures, 1 ignored.

Compatibility

  • No public API change. Engine::estimated_input_tokens is private (fn, not pub fn).
  • No new external dependencies.
  • No changes to the on-disk session format.

Future work (separate PRs)

  • Surface the cache's (hits, misses) counters in /status and the TUI footer for observability.
  • Add a /token-cache slash command to invalidate manually.
  • Extend the audit ring into a time-series that the TUI can graph.

Copy link
Copy Markdown
Contributor

@greptile-apps greptile-apps Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HUQIANTAO has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 3, 2026

Thanks @HUQIANTAO for taking the time to contribute.

This repository is currently observing a maintainer-managed contribution gate in dry-run mode, so this pull request is staying open. When enforcement is enabled, pull requests from contributors who are not listed in .github/APPROVED_CONTRIBUTORS will be closed automatically.

Please read CONTRIBUTING.md for the expected contribution shape. A maintainer can grant PR access by commenting /lgtm on a pull request.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a process-local memoization cache (TokenEstimateCache) for conservative input token estimation to optimize the engine's hot path. It tracks message history mutations using a monotonic messages_revision counter in Session and fingerprints the system prompt to avoid redundant token calculations. The reviewer identified a critical issue where direct assignments to self.session.messages bypass the replace_messages helper, causing the messages_revision counter to miss updates and leading to stale cache estimates. The reviewer recommended updating these direct assignment sites to use replace_messages instead.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +180 to +184
#[allow(dead_code)]
pub fn replace_messages(&mut self, messages: Vec<Message>) {
self.messages = messages;
self.messages_revision = self.messages_revision.saturating_add(1);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The replace_messages helper is marked as #[allow(dead_code)], but there are several places in core/engine.rs and core/engine/capacity_flow.rs where self.session.messages is directly assigned (e.g., self.session.messages = result.messages;) instead of using this helper.

Because direct assignment bypasses replace_messages, the messages_revision counter is not incremented. If the system prompt fingerprint remains unchanged, the TokenEstimateCache will return a stale cached token estimate, leading to correctness bugs in capacity checkpoints, compaction, and context trimming.

Please remove the #[allow(dead_code)] attribute and update the following direct assignment sites to use self.session.replace_messages(...):

  1. crates/tui/src/core/engine.rs (around line 1235 in Op::SyncSession):
    self.session.replace_messages(messages);
  2. crates/tui/src/core/engine.rs (around line 1878 in handle_manual_compaction):
    self.session.replace_messages(result.messages);
  3. crates/tui/src/core/engine.rs (around line 1974 in handle_purge):
    self.session.replace_messages(result.messages);
  4. crates/tui/src/core/engine.rs (around line 2085 in recover_context_overflow):
    self.session.replace_messages(compacted_messages);
  5. crates/tui/src/core/engine/capacity_flow.rs (around line 425 in apply_targeted_context_refresh):
    self.session.replace_messages(result.messages);
    pub fn replace_messages(&mut self, messages: Vec<Message>) {
        self.messages = messages;
        self.messages_revision = self.messages_revision.saturating_add(1);
    }

@Hmbown
Copy link
Copy Markdown
Owner

Hmbown commented Jun 3, 2026

Thanks @HUQIANTAO. I looked at this while freezing the v0.8.52 release lane. The cross-platform tests are green, but CI is still blocked on cargo fmt --all -- --check in capacity_flow.rs / engine.rs, and this touches the engine token-estimate path. I am keeping it out of 0.8.52 so we do not destabilize the clean release at the last minute, but this looks worth a normal review pass once formatting is fixed.

The token estimator walks the full session.messages and the active system
prompt. Five call sites per turn in the engine (capacity pre/post tool
checkpoints, error escalation, the seam manager, the trim budget check)
plus four TUI/command consumers (footer, /status, /debug, context
inspector) all re-walked the same data independently. On a 200-message
history with 5 KB of tool results that is roughly 2 ms per call, or
~20 ms of pure waste on a single turn.

Introduce a process-local TokenEstimateCache keyed on
(session.messages_revision, system_prompt_fingerprint). Repeated calls
with the same inputs return the cached value without re-walking the
message list. The cache invalidates as soon as either input changes:
  * session.messages_revision is a monotonic counter bumped in
    Session::add_message, Session::replace_messages, the new
    Session::bump_messages_revision helper, and at every direct
    session.messages mutation site in core/engine.rs and
    core/engine/capacity_flow.rs.
  * system_prompt_fingerprint is a stable 64-bit hash of the
    SystemPrompt::Text or SystemPrompt::Blocks payload.

Also restructures layered_context_checkpoint to compute the estimated
token count before taking a long-lived &SeamManager borrow, and
re-routes the capacity pre/post tool checkpoints to compute the
observation into a local before calling
capacity_controller.observe_*. Both refactors are required to satisfy
the borrow checker once estimated_input_tokens requires &mut self.

Tests: 10 new unit tests cover the miss/hit path, revision bumps,
system-prompt changes, audit-ring capacity, and downward-revision
no-ops. The full 157-test engine suite still passes.
@HUQIANTAO HUQIANTAO force-pushed the perf/token-estimate-memo branch from b514f22 to 46b0ecc Compare June 3, 2026 11:43
Copy link
Copy Markdown
Contributor

@greptile-apps greptile-apps Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HUQIANTAO has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants