fix(qwen-vl): make chunked MRoPE slicing offset-aware #2083
Code Metrics Report
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Language              Files     Lines      Code  Comments    Blanks
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
C Header                  5       305       210        52        43
CSS                       2      1181      1036        34       111
CUDA                     59     17706     13869      1637      2200
Dockerfile                1        39        22         8         9
HTML                      2       235       197        14        24
JavaScript               16      3580      2702       486       392
Jinja2                    7       694       656         5        33
JSON                     21       409       406         0         3
Makefile                  1         6         5         0         1
Metal Shading Lan|       31     11647      9007      1064      1576
PowerShell                1       300       227        30        43
Python                  125      8316      6808       412      1096
Shell                     2       485       329        95        61
Plain Text                3      3723         0      2413      1310
TOML                     27      1290      1124        35       131
YAML                      3        25        23         2         0
───────────────────────────────────────────────────────────────────
Jupyter Notebooks         3       122        83        23        16
|- Markdown               1        60        30        22         8
|- Python                 1       122       113         1         8
(Total)                           304       226        46        32
───────────────────────────────────────────────────────────────────
Markdown                105     11197         0      8067      3130
|- BASH                  72       934       691       149        94
|- Dockerfile             1         1         1         0         0
|- JSON                  20       719       719         0         0
|- PowerShell             3         3         3         0         0
|- Python                23      1038       862        60       116
|- Rust                  51      2048      1718        54       276
|- TOML                   6       207       164         0        43
|- YAML                   2         9         8         1         0
(Total)                         16156      4166      8331      3659
───────────────────────────────────────────────────────────────────
Rust                    547    236072    207590      6565     21917
|- Markdown             361      8962       452      7385      1125
(Total)                        245034    208042     13950     23042
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Total                   961    311435    249055     28614     33766
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Fixes CI failures introduced by EricLBuehler#2109 (fast CUDA MMQ GGUF kernels) merged on 2026-04-15, combined with new lints in Rust 1.95 stable:

Typos:
- Exclude vendored mistralrs-quant/kernels/mmq_gguf/ directory
- Add CU_DEVICE_ATTRIBUTE_MAX_SHARED_MEMORY_PER_BLOCK_OPTIN to ignore

Rustfmt (1.95):
- Fix import ordering in openapi_doc.rs
- Reformat lines affected by other lint fixes

Clippy (1.95):
- useless_conversion: remove redundant .into_iter() in zip/extend calls across tool_dispatch, rag, llava, idefics3, gemma4, distributed/layers
- iter_kv_map: use .into_values().flatten() in default_scheduler
- manual_checked_ops: use .checked_div() in distributed/layers, video.rs, pyo3/util.rs
- let_unit_value: remove unit let binding in bench.rs
- dead_code: allow unused num_experts field in MoEExperts (set but unread)

Signed-off-by: glaziermag <glaziermag@users.noreply.github.com>
Fixes CI failures introduced by EricLBuehler#2109 (fast CUDA MMQ GGUF kernels) merged on 2026-04-15, combined with new lints in Rust 1.95 stable:

Typos:
- Exclude vendored mistralrs-quant/kernels/mmq_gguf/ directory
- Add CU_DEVICE_ATTRIBUTE_MAX_SHARED_MEMORY_PER_BLOCK_OPTIN to ignore

Rustfmt (1.95):
- Fix import ordering in openapi_doc.rs
- Reformat lines affected by other lint fixes

Clippy (1.95):
- useless_conversion: remove redundant .into_iter() in zip/extend calls across tool_dispatch, rag, llava, idefics3, gemma4, distributed/layers
- iter_kv_map: use .into_values().flatten() in default_scheduler
- manual_checked_ops: use .checked_div() in distributed/layers, video.rs, pyo3/util.rs
- let_unit_value: remove unit let binding in bench.rs
- dead_code: allow unused num_experts field in MoEExperts (set but unread)

Signed-off-by: glaziermag <glaziermag@users.noreply.github.com>
(cherry picked from commit 0a11bf2)
Agent 6 follow-up on existing A100 validation: this remains valid as a targeted Qwen-VL fix. Classification:
Marked ready for maintainer review under the narrowed claim in the PR body. Validation evidence and the narrowed claim wording are already attached in the PR discussion and body. This is a targeted/invariant fix and should not be read as full closure of the broader linked issue unless the PR body explicitly says so.
Track 5 open upstream PRs that affect our fork: EricLBuehler#2129 (GGUF Qwen3.5), EricLBuehler#2116 (KvCacheCodec, enables Polar/Turbo), EricLBuehler#2089 (same GEMV fix we ported), EricLBuehler#2083 (Qwen VL MRoPE), EricLBuehler#2054 (GDN TP).

Note Metal test results: 282 passed, 0 failures. Windows CUDA deferred (PC offline).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Concrete issue fixed
Qwen-VL chunked prompt forwards must slice MRoPE position IDs by the active chunk offset, not by taking the suffix of the full prompt position tensor.
The broken pattern was present in the Qwen3 family path:
That is wrong whenever prompt chunking is active and the current chunk is not the final suffix of the full prompt. It can silently assign the wrong 3D position IDs to a chunk.
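The difference between suffix slicing and offset-aware slicing can be sketched with plain vectors. This is a minimal illustration, not the code from the PR: the `PositionIds` alias, the `slice_suffix` and `slice_window` functions, and the position values are all hypothetical stand-ins for the real tensor-based path.

```rust
/// Hypothetical stand-in for MRoPE position IDs: three rows
/// (temporal, height, width) with one entry per prompt token.
type PositionIds = [Vec<i64>; 3];

/// Suffix slicing: always returns the LAST `chunk_len` positions,
/// regardless of which chunk is currently being processed.
fn slice_suffix(pos: &PositionIds, chunk_len: usize) -> PositionIds {
    std::array::from_fn(|axis| {
        let row = &pos[axis];
        row[row.len() - chunk_len..].to_vec()
    })
}

/// Offset-aware slicing: returns the window that starts at the
/// chunk's offset into the full prompt.
fn slice_window(pos: &PositionIds, offset: usize, chunk_len: usize) -> PositionIds {
    std::array::from_fn(|axis| pos[axis][offset..offset + chunk_len].to_vec())
}

fn main() {
    // Illustrative values only: one text token, a 2x2 image, one text token.
    let full: PositionIds = [
        vec![0, 1, 1, 1, 1, 3], // temporal
        vec![0, 1, 1, 2, 2, 3], // height
        vec![0, 1, 2, 1, 2, 3], // width
    ];

    // First chunk (offset 0, length 4): the two strategies disagree.
    let wrong = slice_suffix(&full, 4);
    let right = slice_window(&full, 0, 4);
    assert_eq!(wrong[0], vec![1, 1, 1, 3]);
    assert_eq!(right[0], vec![0, 1, 1, 1]);

    // Final chunk (offset 4, length 2): both agree, which is why the
    // suffix pattern only misbehaves on non-final chunks.
    assert_eq!(slice_suffix(&full, 2), slice_window(&full, 4, 2));
}
```

The suffix variant is correct only when the chunk happens to end at the end of the prompt, so the error does not show up when chunking is disabled or on the last chunk.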
Fix
Compute position IDs from the full input sequence, then slice each batch item by its `seqlen_offsets` entry and current chunk length.

This now uses the same checked helper for:
- `qwen3_vl` and related Qwen3-VL paths already covered by the original PR;
- `qwen2vl`;
- `qwen2_5_vl`.

The helper rejects mismatched batch/offset lengths and out-of-range chunk windows instead of silently slicing the wrong positions.
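A checked helper with the two rejection rules described above might look roughly like the following. This is a sketch under stated assumptions, not the helper from this PR: it uses plain `Vec<i64>` rows instead of tensors, and the `slice_chunk_positions` function and `SliceError` type are hypothetical names chosen for illustration.

```rust
/// Hypothetical stand-in for per-item MRoPE position IDs
/// (temporal, height, width rows).
type PositionIds = [Vec<i64>; 3];

/// Hypothetical error type covering the two rejected cases.
#[derive(Debug, PartialEq)]
enum SliceError {
    BatchMismatch { batch: usize, offsets: usize },
    OutOfRange { item: usize, offset: usize, chunk_len: usize, seq_len: usize },
}

/// Slice each batch item's positions to `[offset, offset + chunk_len)`,
/// returning an error instead of silently picking a different window.
fn slice_chunk_positions(
    full: &[PositionIds],
    seqlen_offsets: &[usize],
    chunk_len: usize,
) -> Result<Vec<PositionIds>, SliceError> {
    // One offset per batch item is required.
    if full.len() != seqlen_offsets.len() {
        return Err(SliceError::BatchMismatch {
            batch: full.len(),
            offsets: seqlen_offsets.len(),
        });
    }

    let mut out = Vec::with_capacity(full.len());
    for (item, (pos, &offset)) in full.iter().zip(seqlen_offsets).enumerate() {
        // Use the shortest row so a ragged input cannot cause a panic.
        let seq_len = pos.iter().map(Vec::len).min().unwrap_or(0);
        // checked_add guards against overflow before the range check.
        let end = match offset.checked_add(chunk_len) {
            Some(end) if end <= seq_len => end,
            _ => return Err(SliceError::OutOfRange { item, offset, chunk_len, seq_len }),
        };
        out.push(std::array::from_fn(|axis| pos[axis][offset..end].to_vec()));
    }
    Ok(out)
}

fn main() {
    let row: Vec<i64> = (0..6).collect();
    let full: Vec<PositionIds> = vec![[row.clone(), row.clone(), row]];

    // A middle chunk gets the window that starts at its own offset.
    let sliced = slice_chunk_positions(&full, &[2], 3).unwrap();
    assert_eq!(sliced[0][0], vec![2, 3, 4]);

    // Mismatched offsets and out-of-range windows are rejected.
    assert!(slice_chunk_positions(&full, &[], 3).is_err());
    assert!(slice_chunk_positions(&full, &[4], 3).is_err());
}
```

Returning an error for a bad window turns a silent position mismatch into a visible failure at the call site, which is the behaviour the description attributes to the real helper.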
Tests
Current branch head after Agent 5 follow-up: `8efa949d512a1356a59fe305b02dbfc1db600dab`.

`cargo test -q -p mistralrs-core qwen3_vl::tests::chunked_mrope --lib`

A100 result: passed, `3 passed; 0 failed`.

The focused tests cover:
A100 validation, Agent 5, 2026-05-13
Hardware/software:
- `a2-highgpu-1g`, 1x `NVIDIA A100-SXM4-40GB`, `40960 MiB`
- `580.126.09`; `nvidia-smi` CUDA `13.0`; CUDA toolkit `12.9.41`
- `1.95.0`; Cargo `1.95.0`

Recovered original #1815 conditions:
- `6c77f32d37563367e707ea945f698f74b3e9af4f`
- `VisionModelBuilder` / `VisionMessages`
- `Qwen/Qwen3-VL-4B-Instruct`
- `IsqType::Q4K`
- https://www.nhmagazine.com/content/uploads/2019/05/mtwashingtonFranconia-2-19-18-108-Edit-Edit.jpg
- `What is this?`
- `1024`
- `A weight is negative, too large or not a valid number`

Historical exact-repro command on A100:
Result on `6c77f32d...`: reproduced the original failure after model load and dummy run:

Current-API equivalent on previous PR head `ecd72e3ff6ca858226c6d8745a2589e75b25af7d`:

Result: passed; model loaded, dummy run completed, and generation returned text for the Mount Washington image.
Same current-API equivalent on current branch head `8efa949d512a1356a59fe305b02dbfc1db600dab`: passed with the same model/image/prompt setup.

Relationship to #1815
Classification: `TARGETED`, not `ACTUAL`.

The old #1815 sample was recovered and reproduced on its historical commit, and the closest current-API equivalent passes on this PR branch. However, the old `VisionModelBuilder` / `VisionMessages` API no longer exists on this branch line, so this is not a same-source before/after proof that the chunk-offset change fixes the original sampler error.

Safe wording: Fixes and tests Qwen-VL chunk-offset MRoPE slicing; does not conclusively close #1815.