fix(qwen-vl): make chunked MRoPE slicing offset-aware by glaziermag · Pull Request #2083 · EricLBuehler/mistral.rs

glaziermag · 2026-04-09T01:34:48Z

Concrete issue fixed

Qwen-VL chunked prompt forwards must slice MRoPE position IDs by the active chunk offset, not by taking the suffix of the full prompt position tensor.

The broken pattern was present in the Qwen3 family path:

let full_len = position_ids.dim(2)?;
let trimmed_len = input_ids.dim(1)?;
position_ids.narrow(2, full_len - trimmed_len, trimmed_len)?

That is wrong whenever prompt chunking is active and the current chunk is not the final suffix of the full prompt. It can silently assign the wrong 3D position IDs to a chunk.
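To make the failure mode concrete, here is a minimal sketch in plain Rust (no tensor library) that compares the start index chosen by the suffix pattern with the start index an offset-aware slice would use. The function names `suffix_start`, `offset_start`, and `compare_chunk_starts` are hypothetical and exist only for this illustration; they are not part of mistral.rs.

```rust
/// Start index chosen by the broken pattern: always the tail of the full prompt.
fn suffix_start(full_len: usize, chunk_len: usize) -> usize {
    full_len - chunk_len
}

/// Start index an offset-aware slice would use: where the chunk actually begins.
fn offset_start(seqlen_offset: usize) -> usize {
    seqlen_offset
}

/// Walks a prompt of `full_len` tokens in chunks of at most `chunk_size`
/// and returns `(suffix_start, offset_start)` for every chunk.
fn compare_chunk_starts(full_len: usize, chunk_size: usize) -> Vec<(usize, usize)> {
    assert!(chunk_size > 0, "chunk_size must be nonzero");
    let mut starts = Vec::new();
    let mut offset = 0;
    while offset < full_len {
        // The last chunk may be shorter than chunk_size.
        let chunk_len = chunk_size.min(full_len - offset);
        starts.push((suffix_start(full_len, chunk_len), offset_start(offset)));
        offset += chunk_len;
    }
    starts
}
```

For a 10-token prompt split into chunks of 4, `compare_chunk_starts(10, 4)` yields `[(6, 0), (6, 4), (8, 8)]`: the two start indices agree only for the final chunk, so every earlier chunk would receive position IDs from the wrong window.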

Fix

Compute position IDs from the full input sequence, then slice each batch item by its seqlen_offsets entry and current chunk length.

The same checked helper is now used for:

qwen3_vl and related Qwen3-VL paths already covered by the original PR;
qwen2vl;
qwen2_5_vl.

The helper rejects mismatched batch/offset lengths and out-of-range chunk windows instead of silently slicing the wrong positions.
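The slicing and validation described above could look roughly like the sketch below. It is an illustration under stated assumptions, not the helper from this PR: the real code operates on tensors, whereas this version models the `(3, batch, seq_len)` position IDs as nested `Vec`s, and the names `PositionIds` and `slice_chunk_positions` are hypothetical.

```rust
/// Hypothetical stand-in for a (3, batch, seq_len) MRoPE position tensor.
type PositionIds = Vec<Vec<Vec<i64>>>;

/// Slices each batch item's window `[offset, offset + chunk_len)` out of
/// position IDs that were computed for the full input sequence.
/// Returns an error instead of silently slicing the wrong positions.
fn slice_chunk_positions(
    position_ids: &PositionIds,
    seqlen_offsets: &[usize],
    chunk_len: usize,
) -> Result<PositionIds, String> {
    let mut out = Vec::with_capacity(position_ids.len());
    for axis in position_ids {
        // Reject a batch/offset length mismatch.
        if axis.len() != seqlen_offsets.len() {
            return Err(format!(
                "batch size {} does not match {} seqlen offsets",
                axis.len(),
                seqlen_offsets.len()
            ));
        }
        let mut rows = Vec::with_capacity(axis.len());
        for (row, &offset) in axis.iter().zip(seqlen_offsets) {
            // Reject windows that overflow or run past the full sequence.
            let end = offset
                .checked_add(chunk_len)
                .filter(|&end| end <= row.len())
                .ok_or_else(|| {
                    format!(
                        "chunk window at offset {} with length {} exceeds sequence length {}",
                        offset,
                        chunk_len,
                        row.len()
                    )
                })?;
            rows.push(row[offset..end].to_vec());
        }
        out.push(rows);
    }
    Ok(out)
}
```

With position IDs of sequence length 8 and a batch of two, calling `slice_chunk_positions(&ids, &[4, 2], 3)` returns positions 4..7 for the first item and 2..5 for the second, while offsets such as `[6, 0]` (window past the end) or a single-element offset slice (batch mismatch) produce an `Err`.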

Tests

Current branch head after Agent 5 follow-up: 8efa949d512a1356a59fe305b02dbfc1db600dab.

cargo test -q -p mistralrs-core qwen3_vl::tests::chunked_mrope --lib

A100 result: 3 passed; 0 failed.

The focused tests cover:

nonzero chunk offsets slice the expected full-position window;
out-of-range offsets error;
batch/offset length mismatch errors.

A100 validation, Agent 5, 2026-05-13

Hardware/software:

GCP a2-highgpu-1g, 1x NVIDIA A100-SXM4-40GB, 40960 MiB
Driver 580.126.09; nvidia-smi CUDA 13.0; CUDA toolkit 12.9.41
Rust 1.95.0; Cargo 1.95.0

Recovered original #1815 conditions:

historical issue commit: 6c77f32d37563367e707ea945f698f74b3e9af4f
old API: VisionModelBuilder / VisionMessages
model: Qwen/Qwen3-VL-4B-Instruct
ISQ: IsqType::Q4K
image: https://www.nhmagazine.com/content/uploads/2019/05/mtwashingtonFranconia-2-19-18-108-Edit-Edit.jpg
prompt: What is this?
observed prompt chunk size: 1024
original reported error: A weight is negative, too large or not a valid number

Historical exact-repro command on A100:

RUST_LOG=info cargo run --release --features cuda -p mistralrs --example issue1815

Result on 6c77f32d...: reproduced the original failure after model load and dummy run:

Error: A weight is negative, too large or not a valid number

Current-API equivalent on previous PR head ecd72e3ff6ca858226c6d8745a2589e75b25af7d:

RUST_LOG=info cargo run --release --features cuda -p mistralrs --example issue1815_current

Result: passed; model loaded, dummy run completed, and generation returned text for the Mount Washington image.

Same current-API equivalent on current branch head 8efa949d512a1356a59fe305b02dbfc1db600dab: passed with the same model/image/prompt setup.

Relationship to #1815

Classification: TARGETED, not ACTUAL.

The old #1815 sample was recovered and reproduced on its historical commit, and the closest current-API equivalent passes on this PR branch. However, the old VisionModelBuilder / VisionMessages API no longer exists on this branch line, so this is not a same-source before/after proof that the chunk-offset change fixes the original sampler error.

Safe wording: Fixes and tests Qwen-VL chunk-offset MRoPE slicing; does not conclusively close #1815.

github-actions · 2026-04-17T19:41:13Z

Code Metrics Report

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 Language              Files        Lines         Code     Comments       Blanks
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 C Header                  5          305          210           52           43
 CSS                       2         1181         1036           34          111
 CUDA                     59        17706        13869         1637         2200
 Dockerfile                1           39           22            8            9
 HTML                      2          235          197           14           24
 JavaScript               16         3580         2702          486          392
 Jinja2                    7          694          656            5           33
 JSON                     21          409          406            0            3
 Makefile                  1            6            5            0            1
 Metal Shading Lan|       31        11647         9007         1064         1576
 PowerShell                1          300          227           30           43
 Python                  125         8316         6808          412         1096
 Shell                     2          485          329           95           61
 Plain Text                3         3723            0         2413         1310
 TOML                     27         1290         1124           35          131
 YAML                      3           25           23            2            0
─────────────────────────────────────────────────────────────────────────────────
 Jupyter Notebooks         3          122           83           23           16
 |- Markdown               1           60           30           22            8
 |- Python                 1          122          113            1            8
 (Total)                              304          226           46           32
─────────────────────────────────────────────────────────────────────────────────
 Markdown                105        11197            0         8067         3130
 |- BASH                  72          934          691          149           94
 |- Dockerfile             1            1            1            0            0
 |- JSON                  20          719          719            0            0
 |- PowerShell             3            3            3            0            0
 |- Python                23         1038          862           60          116
 |- Rust                  51         2048         1718           54          276
 |- TOML                   6          207          164            0           43
 |- YAML                   2            9            8            1            0
 (Total)                            16156         4166         8331         3659
─────────────────────────────────────────────────────────────────────────────────
 Rust                    547       236072       207590         6565        21917
 |- Markdown             361         8962          452         7385         1125
 (Total)                           245034       208042        13950        23042
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 Total                   961       311435       249055        28614        33766
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

glaziermag · 2026-04-17T22:19:51Z

Housekeeping note: This branch currently bundles the CI fix from #2115 (.typos.toml, openapi_doc.rs, distributed/layers.rs). Once #2115 is merged, this branch will need a rebase onto updated master to drop the duplicate CI fix commit and resolve the resulting conflicts.

Fixes CI failures introduced by EricLBuehler#2109 (fast CUDA MMQ GGUF kernels) merged on 2026-04-15, combined with new lints in Rust 1.95 stable:

Typos:
- Exclude vendored mistralrs-quant/kernels/mmq_gguf/ directory
- Add CU_DEVICE_ATTRIBUTE_MAX_SHARED_MEMORY_PER_BLOCK_OPTIN to ignore

Rustfmt (1.95):
- Fix import ordering in openapi_doc.rs
- Reformat lines affected by other lint fixes

Clippy (1.95):
- useless_conversion: remove redundant .into_iter() in zip/extend calls across tool_dispatch, rag, llava, idefics3, gemma4, distributed/layers
- iter_kv_map: use .into_values().flatten() in default_scheduler
- manual_checked_ops: use .checked_div() in distributed/layers, video.rs, pyo3/util.rs
- let_unit_value: remove unit let binding in bench.rs
- dead_code: allow unused num_experts field in MoEExperts (set but unread)

Signed-off-by: glaziermag <glaziermag@users.noreply.github.com>

A second commit carries the identical message with the trailer: (cherry picked from commit 0a11bf2)

glaziermag · 2026-05-14T23:20:19Z

Agent 6 follow-up on existing A100 validation: this remains valid as a targeted Qwen-VL fix. Classification: TARGETED; feasibility: FEASIBLE_NOW. The historical API path could not be reproduced exactly, but the A100 validation covers the current equivalent chunked MRoPE slicing behavior. Safe wording should avoid claiming an exact reproduction of the old issue. Recommendation: keep open/draft for review.

glaziermag · 2026-05-18T23:39:04Z

Marked ready for review. Validation evidence and narrowed claim wording are already attached in the PR discussion/body. This PR is ready under the scoped claim described in the PR.

Ready for maintainer review under the narrowed claim in the PR body. This is a targeted/invariant fix and should not be read as full closure of the broader linked issue unless the PR body explicitly says so.

Track 5 open upstream PRs that affect our fork: EricLBuehler#2129 (GGUF Qwen3.5), EricLBuehler#2116 (KvCacheCodec — enables Polar/Turbo), EricLBuehler#2089 (same GEMV fix we ported), EricLBuehler#2083 (Qwen VL MRoPE), EricLBuehler#2054 (GDN TP). Note Metal test results: 282 passed, 0 failures. Windows CUDA deferred (PC offline). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

glaziermag marked this pull request as draft April 9, 2026 01:39

glaziermag marked this pull request as ready for review April 9, 2026 01:43

glaziermag force-pushed the fix-1815-qwen-chunk-index-bounds branch from 35b9b34 to d80776e Compare April 17, 2026 19:05

glaziermag force-pushed the fix-1815-qwen-chunk-index-bounds branch from d80776e to 7f7aca6 Compare April 18, 2026 02:31

fix(qwen): correct CUDA index bounds failure during chunked VLM prefill

0121e00

glaziermag force-pushed the fix-1815-qwen-chunk-index-bounds branch from 7f7aca6 to 0121e00 Compare April 27, 2026 16:46

glaziermag added 2 commits April 27, 2026 12:01

chore: remove unrelated files from PR

8d63e0e

glaziermag changed the title ~~fix: correct Qwen chunked CUDA index bounds (#1815)~~ fix(qwen-vl): slice MRoPE position IDs for chunked forwards Apr 28, 2026

fix(qwen-vl): slice Qwen3 MRoPE chunks by offset

ecd72e3

glaziermag changed the title ~~fix(qwen-vl): slice MRoPE position IDs for chunked forwards~~ fix(qwen-vl): slice chunked MRoPE position IDs by offset Apr 28, 2026

glaziermag mentioned this pull request Apr 28, 2026

Vision: A weight is negative, too large or not a valid number (cuda) #1815

Open

glaziermag changed the title ~~fix(qwen-vl): slice chunked MRoPE position IDs by offset~~ fix(qwen-vl): make chunked MRoPE slicing offset-aware Apr 28, 2026

glaziermag marked this pull request as draft May 5, 2026 19:16

fix(qwen-vl): reuse checked MRoPE chunk slicing

8efa949

glaziermag marked this pull request as ready for review May 18, 2026 23:39

glaziermag wants to merge 6 commits into EricLBuehler:master from glaziermag:fix-1815-qwen-chunk-index-bounds
