
server: enhance FIFO prompt cache eviction with second-chance algorithm #23666

Open
nonml wants to merge 1 commit into ggml-org:master from nonml:clock-eviction




nonml commented May 25, 2026

  • Pure FIFO eviction always removes the oldest cache entry, regardless of how often it is reused.
  • As a result, a single request from a small side session can immediately evict a large session's cached KV state. The large prompt then has to be fully reprocessed, which makes it impractical for users on small machines to switch back and forth between sessions.
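For contrast, the failure mode can be reproduced with a minimal sketch of pure FIFO eviction. This is not the server's code: `fifo_cache`, `add`, `contains`, and a `limit` counted in entries (rather than in KV memory) are hypothetical simplifications.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical pure-FIFO prompt cache: eviction order ignores reuse entirely.
struct fifo_cache {
    std::deque<std::string> states;  // oldest entry at the front
    std::size_t limit;               // assumption: limit counted in entries, not bytes

    explicit fifo_cache(std::size_t limit) : limit(limit) { assert(limit > 0); }

    // Insert a new entry, evicting the oldest entries until there is room.
    void add(const std::string & id) {
        while (!states.empty() && states.size() >= limit) {
            states.pop_front();  // always the oldest, however often it was reused
        }
        states.push_back(id);
    }

    bool contains(const std::string & id) const {
        for (const auto & s : states) {
            if (s == id) {
                return true;
            }
        }
        return false;
    }
};
```

With a limit of 2, adding "big", then "side1", then "side2" evicts "big", even if it is the session the user is about to switch back to.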

What the fix does:

  • New cache entries are assigned score = 1.
  • Scores increase only on a cache hit (the entry selected as the best match): score += 1, capped at 4, which I found to be the sweet spot.
  • When the checkpoint limit is reached, instead of always evicting the first entry in the queue:
    • If the front entry's score is <= 1, evict it.
    • Otherwise, decrement its score by 1 and rotate it to the back of the queue (second chance), then repeat.
  • To prevent infinite loops (e.g. when every entry has the max score), the number of iterations is capped at states.size() * 5; once the cap is reached, the front entry is evicted.
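The rules above could be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: `cache_entry`, `prompt_cache`, `hit`, `evict_one`, and a `limit` counted in entries are hypothetical; only `states`, the score range of 1 to 4, and the `states.size() * 5` cap come from the description.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical stand-in for a cached prompt state (the real server stores KV data).
struct cache_entry {
    std::string id;
    int score;
};

// Hypothetical second-chance (clock) cache following the rules described above.
struct prompt_cache {
    static constexpr int max_score = 4;
    std::deque<cache_entry> states;  // front = next eviction candidate
    std::size_t limit;               // assumption: limit counted in entries, not bytes

    explicit prompt_cache(std::size_t limit) : limit(limit) { assert(limit > 0); }

    // Called when an entry is chosen as the best match: bump its score, capped at max_score.
    void hit(const std::string & id) {
        for (auto & e : states) {
            if (e.id == id) {
                if (e.score < max_score) {
                    ++e.score;
                }
                return;
            }
        }
    }

    // Evict one entry using second chance and return its id.
    std::string evict_one() {
        assert(!states.empty());
        const std::size_t max_iter = states.size() * 5;  // safety cap against infinite loops
        for (std::size_t i = 0; i < max_iter; ++i) {
            if (states.front().score <= 1) {
                break;  // low-score front entry: evict it below
            }
            cache_entry e = states.front();
            states.pop_front();
            e.score -= 1;         // decay
            states.push_back(e);  // second chance: rotate to the back
        }
        // Either the front has score <= 1 or the cap was hit: evict the front.
        std::string evicted = states.front().id;
        states.pop_front();
        return evicted;
    }

    void add(const std::string & id) {
        while (!states.empty() && states.size() >= limit) {
            evict_one();
        }
        states.push_back(cache_entry{id, 1});  // new entries start at score 1
    }
};
```

For example, with a limit of 2, a "big" session hit twice (score 3) survives two side sessions: when "side2" arrives, "big" is decayed to 2 and rotated, and "side1" (score 1) is evicted instead. Because scores are capped at 4 and every rotation lowers the rotated entry's score by 1, at most 3 × states.size() rotations can happen before the front entry has score 1, so in this sketch the 5× cap acts only as a safety net.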

With this algorithm, I can switch back and forth between sessions without wasting compute on reprocessing their cached prompts.

Additional information

Related issues: #23030, #20510

Requirements

nonml requested a review from a team as a code owner May 25, 2026 13:18
jacekpoplawski (Contributor)

I initially tried this approach in #22826.

Switches to a second-chance policy with basic hit tracking and decay to prevent one-off requests from evicting heavily used system prompts. Uses a simple score-decay approach to track hits without adding O(N) scan overhead.
