fix(qwen-vl): make chunked MRoPE slicing offset-aware #2083

Open
glaziermag wants to merge 6 commits into EricLBuehler:master from glaziermag:fix-1815-qwen-chunk-index-bounds

@glaziermag glaziermag commented Apr 9, 2026

Contributor

Concrete issue fixed

Qwen-VL chunked prompt forwards must slice MRoPE position IDs by the active chunk offset, not by taking the suffix of the full prompt position tensor.

The broken pattern was present in the Qwen3 family path:

let full_len = position_ids.dim(2)?;
let trimmed_len = input_ids.dim(1)?;
position_ids.narrow(2, full_len - trimmed_len, trimmed_len)?

That is wrong whenever prompt chunking is active and the current chunk is not the final suffix of the full prompt. It can silently assign the wrong 3D position IDs to a chunk.
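
To make the failure mode concrete, here is a small arithmetic sketch in plain Rust. It does not use the project's tensor types; `suffix_start` and `chunk_windows` are hypothetical names introduced only for illustration, and the prompt length in the usage note is made up.

```rust
/// Start index produced by the suffix-style slice quoted above
/// (`full_len - trimmed_len`).
fn suffix_start(full_len: usize, chunk_len: usize) -> usize {
    full_len - chunk_len
}

/// For each chunk of a prompt, return (offset, chunk_len, suffix_start).
/// A chunk is sliced correctly only when suffix_start == offset.
fn chunk_windows(full_len: usize, chunk_size: usize) -> Vec<(usize, usize, usize)> {
    assert!(chunk_size > 0, "chunk_size must be nonzero");
    let mut windows = Vec::new();
    let mut offset = 0;
    while offset < full_len {
        // The last chunk may be shorter than chunk_size.
        let chunk_len = chunk_size.min(full_len - offset);
        windows.push((offset, chunk_len, suffix_start(full_len, chunk_len)));
        offset += chunk_len;
    }
    windows
}
```

For an assumed 2500-token prompt with a chunk size of 1024, `chunk_windows(2500, 1024)` yields offsets 0, 1024, and 2048. The suffix-style start is 1476 for the first two chunks and 2048 for the last, so only the final chunk would read the positions that belong to it.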

Fix

Compute position IDs from the full input sequence, then slice each batch item by its seqlen_offsets entry and current chunk length.

This now uses the same checked helper for:

  • qwen3_vl and related Qwen3-VL paths already covered by the original PR;
  • qwen2vl;
  • qwen2_5_vl.

The helper rejects mismatched batch/offset lengths and out-of-range chunk windows instead of silently slicing the wrong positions.
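
The checks described above can be sketched roughly as follows. This is not the PR's helper: it is a minimal stand-in that assumes position IDs laid out as nested `Vec`s shaped `[3][batch][full_len]` instead of a real tensor, and the name `slice_chunk_positions`, its signature, and its error strings are all hypothetical.

```rust
/// Hypothetical stand-in for a checked chunk slicer.
/// `position_ids` is assumed to be shaped [3][batch][full_len].
/// Each batch row is sliced to `offset..offset + chunk_len`.
fn slice_chunk_positions(
    position_ids: &[Vec<Vec<i64>>],
    seqlen_offsets: &[usize],
    chunk_len: usize,
) -> Result<Vec<Vec<Vec<i64>>>, String> {
    let mut out = Vec::with_capacity(position_ids.len());
    for plane in position_ids {
        // Reject a batch size that does not match the number of offsets.
        if plane.len() != seqlen_offsets.len() {
            return Err(format!(
                "batch size {} does not match {} offsets",
                plane.len(),
                seqlen_offsets.len()
            ));
        }
        let mut rows = Vec::with_capacity(plane.len());
        for (row, &offset) in plane.iter().zip(seqlen_offsets) {
            // Reject windows that overflow or run past the full sequence.
            let end = offset
                .checked_add(chunk_len)
                .filter(|&end| end <= row.len())
                .ok_or_else(|| {
                    format!(
                        "chunk window {}+{} exceeds sequence length {}",
                        offset,
                        chunk_len,
                        row.len()
                    )
                })?;
            rows.push(row[offset..end].to_vec());
        }
        out.push(rows);
    }
    Ok(out)
}
```

Returning an error here, rather than clamping the window, mirrors the stated intent of failing loudly instead of silently slicing the wrong positions.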

Tests

Current branch head after Agent 5 follow-up: 8efa949d512a1356a59fe305b02dbfc1db600dab.

cargo test -q -p mistralrs-core qwen3_vl::tests::chunked_mrope --lib

A100 result: 3 passed; 0 failed.

The focused tests cover:

  • nonzero chunk offsets slice the expected full-position window;
  • out-of-range offsets error;
  • batch/offset length mismatch errors.

A100 validation, Agent 5, 2026-05-13

Hardware/software:

  • GCP a2-highgpu-1g, 1x NVIDIA A100-SXM4-40GB, 40960 MiB
  • Driver 580.126.09; nvidia-smi CUDA 13.0; CUDA toolkit 12.9.41
  • Rust 1.95.0; Cargo 1.95.0

Recovered original #1815 conditions:

  • historical issue commit: 6c77f32d37563367e707ea945f698f74b3e9af4f
  • old API: VisionModelBuilder / VisionMessages
  • model: Qwen/Qwen3-VL-4B-Instruct
  • ISQ: IsqType::Q4K
  • image: https://www.nhmagazine.com/content/uploads/2019/05/mtwashingtonFranconia-2-19-18-108-Edit-Edit.jpg
  • prompt: What is this?
  • observed prompt chunk size: 1024
  • original reported error: A weight is negative, too large or not a valid number

Historical exact-repro command on A100:

RUST_LOG=info cargo run --release --features cuda -p mistralrs --example issue1815

Result on 6c77f32d...: reproduced the original failure after model load and dummy run:

Error: A weight is negative, too large or not a valid number

Current-API equivalent on previous PR head ecd72e3ff6ca858226c6d8745a2589e75b25af7d:

RUST_LOG=info cargo run --release --features cuda -p mistralrs --example issue1815_current

Result: passed; model loaded, dummy run completed, and generation returned text for the Mount Washington image.

Same current-API equivalent on current branch head 8efa949d512a1356a59fe305b02dbfc1db600dab: passed with the same model/image/prompt setup.

Relationship to #1815

Classification: TARGETED, not ACTUAL.

The old #1815 sample was recovered and reproduced on its historical commit, and the closest current-API equivalent passes on this PR branch. However, the old VisionModelBuilder / VisionMessages API no longer exists on this branch line, so this is not a same-source before/after proof that the chunk-offset change fixes the original sampler error.

Safe wording: Fixes and tests Qwen-VL chunk-offset MRoPE slicing; does not conclusively close #1815.

@glaziermag glaziermag marked this pull request as draft April 9, 2026 01:39
@glaziermag glaziermag marked this pull request as ready for review April 9, 2026 01:43
@glaziermag glaziermag force-pushed the fix-1815-qwen-chunk-index-bounds branch from 35b9b34 to d80776e Compare April 17, 2026 19:05

github-actions Bot commented Apr 17, 2026

Code Metrics Report
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 Language              Files        Lines         Code     Comments       Blanks
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 C Header                  5          305          210           52           43
 CSS                       2         1181         1036           34          111
 CUDA                     59        17706        13869         1637         2200
 Dockerfile                1           39           22            8            9
 HTML                      2          235          197           14           24
 JavaScript               16         3580         2702          486          392
 Jinja2                    7          694          656            5           33
 JSON                     21          409          406            0            3
 Makefile                  1            6            5            0            1
 Metal Shading Lan|       31        11647         9007         1064         1576
 PowerShell                1          300          227           30           43
 Python                  125         8316         6808          412         1096
 Shell                     2          485          329           95           61
 Plain Text                3         3723            0         2413         1310
 TOML                     27         1290         1124           35          131
 YAML                      3           25           23            2            0
─────────────────────────────────────────────────────────────────────────────────
 Jupyter Notebooks         3          122           83           23           16
 |- Markdown               1           60           30           22            8
 |- Python                 1          122          113            1            8
 (Total)                              304          226           46           32
─────────────────────────────────────────────────────────────────────────────────
 Markdown                105        11197            0         8067         3130
 |- BASH                  72          934          691          149           94
 |- Dockerfile             1            1            1            0            0
 |- JSON                  20          719          719            0            0
 |- PowerShell             3            3            3            0            0
 |- Python                23         1038          862           60          116
 |- Rust                  51         2048         1718           54          276
 |- TOML                   6          207          164            0           43
 |- YAML                   2            9            8            1            0
 (Total)                            16156         4166         8331         3659
─────────────────────────────────────────────────────────────────────────────────
 Rust                    547       236072       207590         6565        21917
 |- Markdown             361         8962          452         7385         1125
 (Total)                           245034       208042        13950        23042
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 Total                   961       311435       249055        28614        33766
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

@glaziermag

Contributor Author

Housekeeping note: This branch currently bundles the CI fix from #2115 (.typos.toml, openapi_doc.rs, distributed/layers.rs). Once #2115 is merged, this branch will need a rebase onto updated master to drop the duplicate CI fix commit and resolve the resulting conflicts.

Fixes CI failures introduced by EricLBuehler#2109 (fast CUDA MMQ GGUF kernels) merged
on 2026-04-15, combined with new lints in Rust 1.95 stable:

Typos:
- Exclude vendored mistralrs-quant/kernels/mmq_gguf/ directory
- Add CU_DEVICE_ATTRIBUTE_MAX_SHARED_MEMORY_PER_BLOCK_OPTIN to ignore

Rustfmt (1.95):
- Fix import ordering in openapi_doc.rs
- Reformat lines affected by other lint fixes

Clippy (1.95):
- useless_conversion: remove redundant .into_iter() in zip/extend calls
  across tool_dispatch, rag, llava, idefics3, gemma4, distributed/layers
- iter_kv_map: use .into_values().flatten() in default_scheduler
- manual_checked_ops: use .checked_div() in distributed/layers, video.rs,
  pyo3/util.rs
- let_unit_value: remove unit let binding in bench.rs
- dead_code: allow unused num_experts field in MoEExperts (set but unread)

Signed-off-by: glaziermag <glaziermag@users.noreply.github.com>
@glaziermag glaziermag force-pushed the fix-1815-qwen-chunk-index-bounds branch from d80776e to 7f7aca6 Compare April 18, 2026 02:31
@glaziermag glaziermag force-pushed the fix-1815-qwen-chunk-index-bounds branch from 7f7aca6 to 0121e00 Compare April 27, 2026 16:46
Fixes CI failures introduced by EricLBuehler#2109 (fast CUDA MMQ GGUF kernels) merged
on 2026-04-15, combined with new lints in Rust 1.95 stable:

Typos:
- Exclude vendored mistralrs-quant/kernels/mmq_gguf/ directory
- Add CU_DEVICE_ATTRIBUTE_MAX_SHARED_MEMORY_PER_BLOCK_OPTIN to ignore

Rustfmt (1.95):
- Fix import ordering in openapi_doc.rs
- Reformat lines affected by other lint fixes

Clippy (1.95):
- useless_conversion: remove redundant .into_iter() in zip/extend calls
  across tool_dispatch, rag, llava, idefics3, gemma4, distributed/layers
- iter_kv_map: use .into_values().flatten() in default_scheduler
- manual_checked_ops: use .checked_div() in distributed/layers, video.rs,
  pyo3/util.rs
- let_unit_value: remove unit let binding in bench.rs
- dead_code: allow unused num_experts field in MoEExperts (set but unread)

Signed-off-by: glaziermag <glaziermag@users.noreply.github.com>
(cherry picked from commit 0a11bf2)
@glaziermag glaziermag changed the title fix: correct Qwen chunked CUDA index bounds (#1815) fix(qwen-vl): slice MRoPE position IDs for chunked forwards Apr 28, 2026
@glaziermag glaziermag changed the title fix(qwen-vl): slice MRoPE position IDs for chunked forwards fix(qwen-vl): slice chunked MRoPE position IDs by offset Apr 28, 2026
@glaziermag glaziermag changed the title fix(qwen-vl): slice chunked MRoPE position IDs by offset fix(qwen-vl): make chunked MRoPE slicing offset-aware Apr 28, 2026
@glaziermag glaziermag marked this pull request as draft May 5, 2026 19:16

Contributor Author

Agent 6 follow-up on existing A100 validation: this remains valid as a targeted Qwen-VL fix. Classification: TARGETED; feasibility: FEASIBLE_NOW. The historical API path could not be reproduced exactly, but the A100 validation covers the current equivalent chunked MRoPE slicing behavior. Safe wording should avoid claiming an exact reproduction of the old issue. Recommendation: keep open/draft for review.

@glaziermag glaziermag marked this pull request as ready for review May 18, 2026 23:39
@glaziermag

Contributor Author

Marked ready for maintainer review under the narrowed claim in the PR body. The validation evidence and the narrowed claim wording are already attached in the PR discussion and body.

This is a targeted/invariant fix and should not be read as full closure of the broader linked issue unless the PR body explicitly says so.

Jamesrobertsonldn added a commit to Jamesrobertsonldn/mistral.rs that referenced this pull request May 19, 2026
Track 5 open upstream PRs that affect our fork: EricLBuehler#2129 (GGUF Qwen3.5),
EricLBuehler#2116 (KvCacheCodec — enables Polar/Turbo), EricLBuehler#2089 (same GEMV fix we
ported), EricLBuehler#2083 (Qwen VL MRoPE), EricLBuehler#2054 (GDN TP). Note Metal test
results: 282 passed, 0 failures. Windows CUDA deferred (PC offline).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

Vision: A weight is negative, too large or not a valid number (cuda)

1 participant