fix(engine): drop disconnected sequences before the prefill pass #2207

Open
sergey-scherbina wants to merge 1 commit into EricLBuehler:master from sergey-scherbina:engine-reap-disconnected

sergey-scherbina commented Jun 11, 2026


What

Reap sequences whose client has disconnected before running the prefill pass,
not after. Prompt prefill is the expensive step; if the client is already gone, the
sequence should be dropped before it consumes a forward pass.

Why

With max_num_seqs = 1 (serialized scheduling, common on memory-constrained Metal),
a large-prompt prefill from a client that has already disconnected would still run to
completion and block the single slot, stalling every subsequent request. Dropping
disconnected sequences up front frees the queue immediately.
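
To make the stall concrete, here is a minimal sketch of the cost model, assuming strict FIFO scheduling with one sequence at a time. `Queued` and `wasted_prefill_tokens` are hypothetical names for illustration only and do not exist in mistral.rs.

```rust
/// A queued request: prompt length in tokens and whether its client is
/// still connected. Hypothetical type, not part of mistral.rs.
#[derive(Debug, Clone, Copy)]
struct Queued {
    prompt_tokens: usize,
    connected: bool,
}

/// Tokens prefilled on behalf of dead clients before the first connected
/// request gets the slot, assuming one sequence at a time in FIFO order.
fn wasted_prefill_tokens(queue: &[Queued], reap_first: bool) -> usize {
    queue
        .iter()
        // Only the disconnected requests ahead of the first live one matter.
        .take_while(|q| !q.connected)
        // Reaping up front means none of their prompts is ever prefilled.
        .map(|q| if reap_first { 0 } else { q.prompt_tokens })
        .sum()
}
```

Under this model, every abandoned prompt ahead of a live request adds its full length to the wait when nothing is reaped, and adds nothing when disconnected sequences are dropped first.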

Scope

mistralrs-core/src/engine/mod.rs, +12 lines. Self-contained and general (not tied to any
model or backend). Independent of the other PRs in this series.


Part of splitting the Qwen3.6 work into focused, reviewable PRs:

Suggested merge order: #2206 + #2207 -> #2201 -> #2208.

A dead receiver is only noticed at the first post-prefill streaming send, so an
abandoned long prefill ran to completion and, with max_num_seqs=1, starved every
following request. In the PagedAttention arm, retain only sequences whose
responder is still open before stepping; mark the rest Done(Canceled) for the
normal completed-sequence reaping.
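
The reaping step described in the commit message can be sketched roughly as follows. This is not the code from the diff: `Sequence`, `Responder`, `SeqState`, `StopReason`, `reap_disconnected`, and `new_sequence` are simplified stand-ins, and the responder is modeled with a `Weak` handle so the example needs only the standard library. The real engine would use its own channel type and closed check.

```rust
use std::sync::{Arc, Weak};

/// Why a sequence finished. Only the variant needed here is modeled.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum StopReason {
    Canceled,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SeqState {
    Waiting,
    Done(StopReason),
}

/// Hypothetical stand-in for the response channel: the client holds the
/// `Arc`, the engine holds a `Weak`. Dropping the `Arc` models a disconnect.
struct Responder(Weak<()>);

impl Responder {
    fn is_closed(&self) -> bool {
        self.0.strong_count() == 0
    }
}

struct Sequence {
    id: usize,
    state: SeqState,
    responder: Responder,
}

/// Mark sequences whose responder is closed as canceled, and return the ids
/// of the sequences that should still be stepped (prefilled or decoded).
fn reap_disconnected(seqs: &mut [Sequence]) -> Vec<usize> {
    let mut live = Vec::new();
    for seq in seqs.iter_mut() {
        if seq.responder.is_closed() {
            // Left in place so the usual completed-sequence cleanup finds it.
            seq.state = SeqState::Done(StopReason::Canceled);
        } else {
            live.push(seq.id);
        }
    }
    live
}

/// Example helper: builds a waiting sequence plus the client-side handle.
fn new_sequence(id: usize) -> (Sequence, Arc<()>) {
    let client = Arc::new(());
    let seq = Sequence {
        id,
        state: SeqState::Waiting,
        responder: Responder(Arc::downgrade(&client)),
    };
    (seq, client)
}
```

Calling `reap_disconnected` right before the step means a canceled sequence never reaches the forward pass, while the sequences that remain are scheduled exactly as before.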
@github-actions

Code Metrics Report
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 Language              Files        Lines         Code     Comments       Blanks
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 C Header                 23         4454         3116          790          548
 CSS                       3          281          252            5           24
 CUDA                    119        23575        19136         1696         2743
 Dockerfile                1           38           21            8            9
 HTML                      2           27           27            0            0
 JavaScript                3          392          387            2            3
 Jinja2                    7          694          656            5           33
 JSON                     26         9360         9357            0            3
 Makefile                  1            6            5            0            1
 MDX                       1          149            0          133           16
 Metal Shading Lan|       37        14287        11284         1136         1867
 PowerShell                1          357          276           33           48
 Python                  131        10342         8515          460         1367
 Shell                     2          549          379          101           69
 Plain Text                3         3723            0         2413         1310
 TOML                     29         1388         1211           41          136
 TypeScript               11         1607         1371           66          170
 YAML                      3           25           23            2            0
─────────────────────────────────────────────────────────────────────────────────
 Jupyter Notebooks         3          122           83           23           16
 |- Markdown               1           60           30           22            8
 |- Python                 1          122          113            1            8
 (Total)                              304          226           46           32
─────────────────────────────────────────────────────────────────────────────────
 Markdown                129         9703            0         6648         3055
 |- BASH                  61          600          520           47           33
 |- Dockerfile             2            5            5            0            0
 |- JSON                  18          700          700            0            0
 |- PowerShell             3            5            5            0            0
 |- Python                25          830          722            5          103
 |- Rust                  15          437          382            1           54
 |- TOML                  10          124           98            3           23
 |- YAML                   1           13           13            0            0
 (Total)                            12417         2445         6704         3268
─────────────────────────────────────────────────────────────────────────────────
 Rust                    625       270388       239956         5864        24568
 |- Markdown             397         9504          452         7882         1170
 (Total)                           279892       240408        13746        25738
─────────────────────────────────────────────────────────────────────────────────
 Svelte                   18         1831         1696           50           85
 |- CSS                    1            4            4            0            0
 |- JavaScript            18          876          727           24          125
 (Total)                             2711         2427           74          210
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 Total                  1178       366578       301522        27461        37595
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
