fix(responses): drain streaming chunks in background task handler#2114
glaziermag wants to merge 1 commit into
d6b3865 to 1c15df6
6e868fd to 763c42c
Agent 6 follow-up on existing A100 validation: this remains valid. Classification:
Wave 1 evidence bundle index:
PR: #2114
Exact server command recorded in the PR body:
cargo build --release --features cuda -p mistralrs-server
./target/release/mistralrs-server --port 18082 gguf \
  -m Qwen/Qwen2.5-0.5B-Instruct-GGUF \
  -f qwen2.5-0.5b-instruct-q4_k_m.gguf
The exact client path was
Environment: GCP Spot
Base result: exact
Ready for maintainer review. Evidence attached shows the original/base failure and the current-head pass under the scoped issue conditions. The PR may use the
4e61107 to c2545db
c2545db to e8cf506
Note
Agent 4 A100 validation update (2026-05-13 UTC): classification `ACTUAL`, feasibility `FEASIBLE_NOW`. On A100, base `2d4ba4f16f61e5e18be085d0dd137bc95cba038a`: the exact `/v1/responses` request with `background=true` and `stream=true` failed when polled with `unknown_error: Unexpected response type`. On PR head `61219e17256b67920e56d630a8c63b89b1e81be2`, the same request completed with `output_text:"4"` and `background:true`. Controls passed: `background=true, stream=false`; `background=false, stream=true`; and normal `/v1/chat/completions`. A server log scan for `error|panic|channel_closed|failed` had no matches. “Fixes #1945” is honest for this path.

Fix: Force Non-Streaming for Background Request Generation
This PR forces `stream=false` for background tasks inside the Responses API handler (`/v1/responses` with `background=true`) to prevent the generator from hanging indefinitely while waiting for an HTTP stream that does not exist.

Root Cause
When `background=true` is set in a Responses API request, the handler spawns a background task that expects a terminal `Response::Done`. If the upstream request includes `stream=true`, the engine can send chunk events instead of the non-streaming terminal shape expected by that background task.

Fix
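The mismatch described above, and a handler that tolerates it, can be sketched in a few lines. This is a simplified model under stated assumptions, not the PR's actual code: `Response`, `Request`, `normalize`, `await_done_strict`, `await_done_draining`, and `engine_stream` are hypothetical stand-ins, and the real mistral.rs types are richer than the two-variant enum used here.

```rust
use std::sync::mpsc;

// Hypothetical stand-in for the engine's response type.
#[derive(Debug)]
enum Response {
    Chunk(String),
    Done(String),
}

// Hypothetical stand-in for the parsed request.
#[derive(Debug)]
struct Request {
    background: bool,
    stream: Option<bool>,
}

// Background tasks have no HTTP stream to write to, so streaming is forced off.
fn normalize(mut req: Request) -> Request {
    if req.background {
        req.stream = Some(false);
    }
    req
}

// Models the failure mode: the first message must already be terminal.
fn await_done_strict(rx: &mpsc::Receiver<Response>) -> Result<String, String> {
    match rx.recv() {
        Ok(Response::Done(text)) => Ok(text),
        Ok(_) => Err("Unexpected response type".to_string()),
        Err(_) => Err("channel closed".to_string()),
    }
}

// Models the tolerant handler: skip chunks, complete only on Done.
fn await_done_draining(rx: &mpsc::Receiver<Response>) -> Result<String, String> {
    loop {
        match rx.recv() {
            Ok(Response::Chunk(_)) => continue,
            Ok(Response::Done(text)) => return Ok(text),
            Err(_) => return Err("channel closed before Done".to_string()),
        }
    }
}

// Simulates an engine that emits one chunk before the terminal message.
fn engine_stream() -> mpsc::Receiver<Response> {
    let (tx, rx) = mpsc::channel();
    tx.send(Response::Chunk("4".into())).unwrap();
    tx.send(Response::Done("4".into())).unwrap();
    rx
}

fn main() {
    let req = normalize(Request { background: true, stream: Some(true) });
    println!("stream after normalize: {:?}", req.stream);
    println!("strict:   {:?}", await_done_strict(&engine_stream()));
    println!("draining: {:?}", await_done_draining(&engine_stream()));
}
```

In this model either measure alone avoids the failure; applying both means the background task stays correct even if a streaming-shaped message still reaches it.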
Set `oairequest.stream = Some(false)` before `parse_openresponses_request` in the `if background {` branch.
Drain `Response::Chunk` events and only complete the task on `Response::Done`.

Runtime Validation (GCP g2-standard-8, L4 GPU, Qwen2.5-0.5B Q4_K_M GGUF)
Branch head tested: `bd8ad18517e8c1e10f8fe4d6dcd6528d8afa73bf`
Hardware: GCP `g2-standard-8`, 1x NVIDIA L4, driver `580.126.09`, CUDA 12.9
Build:
Result: build completed successfully on the L4 VM.
Server:
Server log included:
Exact bug-trigger request:
`background=true` + `stream=true` via `/v1/responses`:
Initial response:
{"id":"resp_ffa42766-f0cc-4935-8c72-8c3c355ab22d","object":"response","created_at":1777337731,"model":"default","status":"queued","output":[],"metadata":null}
Poll command:
Final poll response:
{"id":"resp_ffa42766-f0cc-4935-8c72-8c3c355ab22d","object":"response","created_at":1777337731,"completed_at":1777337731,"model":"Qwen/Qwen2.5-0.5B-Instruct-GGUF","status":"completed","output":[{"type":"message","id":"msg_d24e9970-4040-4fd9-b0b4-d2c16c986fa5","role":"assistant","content":[{"type":"output_text","text":"4"}],"status":"completed"}],"output_text":"4","usage":{"input_tokens":42,"output_tokens":2,"total_tokens":44},"background":true}
Server log scan:
grep -Ei "error|panic|channel_closed|failed" /tmp/mistral_responses_1777337716.log
Result: no matches.
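For anyone who wants the same check inside a test harness rather than a shell, the scan above can be approximated as follows. This is a rough sketch, not part of the PR: `matching_lines` is a hypothetical helper, and the sample log text in `main` is invented for illustration and is not real server output.

```rust
// The same four alternatives as the grep pattern, matched case-insensitively.
const PATTERNS: [&str; 4] = ["error", "panic", "channel_closed", "failed"];

// Returns (1-based line number, line) for every line containing a pattern.
fn matching_lines(log: &str) -> Vec<(usize, &str)> {
    log.lines()
        .enumerate()
        .filter(|(_, line)| {
            let lower = line.to_lowercase();
            PATTERNS.iter().any(|p| lower.contains(*p))
        })
        .map(|(i, line)| (i + 1, line))
        .collect()
}

fn main() {
    // Invented sample text, used only to exercise the helper.
    let log = "sample: listening\nsample: request FAILED\nsample: done";
    for (n, line) in matching_lines(log) {
        println!("{n}: {line}");
    }
}
```

An empty result corresponds to the "no matches" outcome reported above.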
This validates the intended `/v1/responses` background + streaming request path on the PR head.

A100 validation (2026-05-01)
Additional validation on GCP Spot `a2-highgpu-1g` in [redacted-region] with 1x NVIDIA A100-SXM4-40GB, driver `580.126.20`, branch head `61219e17256b67920e56d630a8c63b89b1e81be2`.
Result: build passed. The exact `/v1/responses` bug shape with `background=true` and `stream=true` returned an initial queued response, then completed when polled with `"output_text":"4"` and `"background":true`. A normal `/v1/chat/completions` smoke test also returned `OK`. A server log scan for `error|panic|channel_closed|failed` had no matches.
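The queue-then-poll flow exercised in both validation runs can be sketched as a small loop. This is an illustration under assumptions: `poll_until_completed` and the `fetch_status` closure are hypothetical, the HTTP call is replaced by a closure so the example stays self-contained, and only the two statuses seen in the responses above (`queued` and `completed`) are modeled. A real client would likely need to handle other statuses as well.

```rust
use std::{thread, time::Duration};

// Polls until the status is "completed"; returns the attempt number on success.
// `fetch_status` stands in for a GET on the response resource.
fn poll_until_completed<F>(
    mut fetch_status: F,
    max_attempts: u32,
    delay: Duration,
) -> Result<u32, String>
where
    F: FnMut() -> String,
{
    for attempt in 1..=max_attempts {
        match fetch_status().as_str() {
            "completed" => return Ok(attempt),
            "queued" => thread::sleep(delay),
            other => return Err(format!("unexpected status: {other}")),
        }
    }
    Err(format!("still queued after {max_attempts} attempts"))
}

fn main() {
    // Simulated status sequence: queued twice, then completed.
    let mut statuses = vec!["queued", "queued", "completed"].into_iter();
    let result = poll_until_completed(
        || statuses.next().unwrap().to_string(),
        10,
        Duration::from_millis(10),
    );
    println!("{result:?}");
}
```

Bounding the number of attempts matters for this bug in particular: in the hang described at the top of this PR a task might never reach a terminal status, and an unbounded loop would then wait forever.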