fix(engine): drop disconnected sequences before the prefill pass #2207
Open
sergey-scherbina wants to merge 1 commit into
A dead receiver is only noticed at the first post-prefill streaming send, so an abandoned long prefill ran to completion and, with max_num_seqs=1, starved every following request. In the PagedAttention arm, retain only sequences whose responder is still open before stepping; mark the rest Done(Canceled) for the normal completed-sequence reaping.
This was referenced Jun 11, 2026
Code Metrics Report
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Language             Files     Lines      Code  Comments    Blanks
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
C Header                23      4454      3116       790       548
CSS                      3       281       252         5        24
CUDA                   119     23575     19136      1696      2743
Dockerfile               1        38        21         8         9
HTML                     2        27        27         0         0
JavaScript               3       392       387         2         3
Jinja2                   7       694       656         5        33
JSON                    26      9360      9357         0         3
Makefile                 1         6         5         0         1
MDX                      1       149         0       133        16
Metal Shading Lan|      37     14287     11284      1136      1867
PowerShell               1       357       276        33        48
Python                 131     10342      8515       460      1367
Shell                    2       549       379       101        69
Plain Text               3      3723         0      2413      1310
TOML                    29      1388      1211        41       136
TypeScript              11      1607      1371        66       170
YAML                     3        25        23         2         0
──────────────────────────────────────────────────────────────────
Jupyter Notebooks        3       122        83        23        16
|- Markdown              1        60        30        22         8
|- Python                1       122       113         1         8
(Total)                          304       226        46        32
──────────────────────────────────────────────────────────────────
Markdown               129      9703         0      6648      3055
|- BASH                 61       600       520        47        33
|- Dockerfile            2         5         5         0         0
|- JSON                 18       700       700         0         0
|- PowerShell            3         5         5         0         0
|- Python               25       830       722         5       103
|- Rust                 15       437       382         1        54
|- TOML                 10       124        98         3        23
|- YAML                  1        13        13         0         0
(Total)                        12417      2445      6704      3268
──────────────────────────────────────────────────────────────────
Rust                   625    270388    239956      5864     24568
|- Markdown            397      9504       452      7882      1170
(Total)                       279892    240408     13746     25738
──────────────────────────────────────────────────────────────────
Svelte                  18      1831      1696        50        85
|- CSS                   1         4         4         0         0
|- JavaScript           18       876       727        24       125
(Total)                         2711      2427        74       210
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Total                 1178    366578    301522     27461     37595
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
What
Reap sequences whose client has disconnected before running the prefill pass,
not after. A prompt prefill is the expensive step; if the client is already gone, the
sequence should be dropped before it consumes a forward pass.
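A rough illustration of the pre-step filter (retain sequences whose responder is still open, mark the rest `Done(Canceled)`). This is a standalone sketch, not the diff in this PR: `Sequence`, `SequenceState`, `StopReason`, and `drop_disconnected` are hypothetical names, and client liveness is modeled with a `Weak<()>` handle instead of the engine's real response channel.

```rust
use std::sync::{Arc, Weak};

// Hypothetical stand-ins for the engine's types; names are assumptions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum StopReason {
    Canceled,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SequenceState {
    Waiting,
    Done(StopReason),
}

struct Sequence {
    id: usize,
    state: SequenceState,
    // Assumption: the client owns the matching Arc; dropping it models a disconnect.
    client: Weak<()>,
}

impl Sequence {
    /// True once the client side has gone away.
    fn responder_closed(&self) -> bool {
        self.client.strong_count() == 0
    }
}

/// Split the scheduled batch before the forward pass: sequences with a live
/// client are kept for stepping, the rest are marked Done(Canceled) so the
/// usual completed-sequence cleanup can reap them.
fn drop_disconnected(scheduled: Vec<Sequence>) -> (Vec<Sequence>, Vec<Sequence>) {
    let mut live = Vec::new();
    let mut canceled = Vec::new();
    for mut seq in scheduled {
        if seq.responder_closed() {
            seq.state = SequenceState::Done(StopReason::Canceled);
            canceled.push(seq);
        } else {
            live.push(seq);
        }
    }
    (live, canceled)
}
```

Only the `live` half would be handed to the prefill step. In an async codebase the liveness check would more likely be a call such as tokio's `Sender::is_closed()` on the response channel; that is an assumption about the real code, not something taken from the diff.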
Why
With max_num_seqs = 1 (serialized scheduling, common on memory-constrained Metal),
a large-prompt prefill from a client that has already disconnected would still run to
completion and block the single slot, stalling every subsequent request. Dropping
disconnected sequences up front frees the queue immediately.
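To make the stall concrete, here is a toy model of a single-slot FIFO queue. Everything in it is illustrative: `Request`, `prefill_cost` (in arbitrary time units), and `wait_before_last` are hypothetical and do not correspond to engine code.

```rust
// Toy model of serialized scheduling (one sequence at a time).
struct Request {
    prefill_cost: u64, // arbitrary time units, hypothetical
    connected: bool,   // whether the client is still listening
}

/// Time spent on earlier requests before the last one in the queue can start.
/// With `drop_disconnected_first`, abandoned requests cost nothing.
fn wait_before_last(queue: &[Request], drop_disconnected_first: bool) -> u64 {
    match queue.split_last() {
        None => 0,
        Some((_last, ahead)) => ahead
            .iter()
            .filter(|r| r.connected || !drop_disconnected_first)
            .map(|r| r.prefill_cost)
            .sum(),
    }
}
```

For a queue holding an abandoned request with `prefill_cost` 5000 followed by a live one, the live request waits 5000 units without the up-front drop and 0 units with it.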
Scope
mistralrs-core/src/engine/mod.rs, +12. Self-contained and general (not tied to any model or backend). Independent of the other PRs in this series.
Part of splitting the Qwen3.6 work into focused, reviewable PRs:
Suggested merge order: #2206 + #2207 -> #2201 -> #2208.