
fix(metal): back zero-element KV cache buffers with a shared placeholder #2206

Open
sergey-scherbina wants to merge 1 commit into EricLBuehler:master from sergey-scherbina:metal-zero-kv-buffer



sergey-scherbina commented Jun 11, 2026


What

PagedAttention allocates a per-layer KV cache block buffer sized to that layer's
KV. Hybrid models have layers with no KV (the linear-attention / GatedDeltaNet
layers carry a recurrent state instead), so for those layers the size is 0. On
Metal, new_private_buffer(0) produces a zero-length buffer that later indexing
treats as invalid. This PR backs those zero-element buffers with a shared
1-element placeholder (elem_count.max(1)), so the no-KV layers allocate something
valid that is simply never read as KV.
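
To illustrate the sizing rule only, here is a minimal sketch, not the code in cache_engine.rs. `alloc_backing` and `backing_elem_count` are hypothetical names, and a `Vec<u8>` stands in for the Metal buffer; the only assumption carried over from the description is that a zero-length request does not yield a usable buffer.

```rust
/// Hypothetical stand-in for a device allocation. Like the Metal path
/// described above, it cannot hand back a usable zero-length buffer.
fn alloc_backing(len_elems: usize) -> Result<Vec<u8>, String> {
    if len_elems == 0 {
        return Err("zero-length buffer".to_string());
    }
    Ok(vec![0u8; len_elems])
}

/// Number of elements to actually back for a layer: always at least one.
/// For a layer that has KV (elem_count > 0) this returns elem_count unchanged.
fn backing_elem_count(elem_count: usize) -> usize {
    elem_count.max(1)
}
```

With this rule, `alloc_backing(backing_elem_count(0))` succeeds with a 1-element buffer, while a dense layer's request passes through unchanged.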

Why

Without this, any hybrid model (e.g. Qwen3.6 qwen3_5_moe) crashes on Metal as soon
as the cache engine sets up the no-KV layers. The change is a harmless general
hardening for the dense path (a layer with KV is unaffected; max(1) is a no-op
there).

Scope

mistralrs-core/src/paged_attention/cache_engine.rs, +22/-5. Self-contained.

This is a prerequisite for #2201 (Qwen3.6 on Metal). It is split out as its own small
PR for reviewability; suggested merge order: this + the engine-reap fix, then #2201,
then the chunked-prefill PR.


Part of splitting the Qwen3.6 work into focused, reviewable PRs:

Suggested merge order: #2206 + #2207 -> #2201 -> #2208.

Hybrid models (GDN linear-attention + sparse full-attention) produce a 0-element
KV cache tensor for no-KV layers. Metal rejects newBufferWithLength:0, so loading
Qwen3.6-35B-A3B failed with 'Failed to create metal resource: Buffer'. Route the
k/v allocations through a closure that hands all such layers a clone of one
lazily-created 1-element placeholder (never read); the tensor shape stays 0-dim.
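
The closure-based routing described in the commit message could look roughly like the sketch below. It is an illustration under stated assumptions, not the PR's diff: `Buffer`, `KvStorage`, `alloc_device_buffer`, and `allocate_layers` are hypothetical names, and the mock allocator simply rejects zero-length requests the way the commit message says Metal does.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a device buffer handle.
#[derive(Debug)]
struct Buffer {
    len_bytes: usize,
}

/// Hypothetical per-layer storage: a backing buffer plus the logical
/// element count, which stays 0 for no-KV layers.
#[derive(Debug)]
struct KvStorage {
    buffer: Arc<Buffer>,
    elem_count: usize,
}

/// Mock allocator that refuses zero-length requests.
fn alloc_device_buffer(len_bytes: usize) -> Result<Arc<Buffer>, String> {
    if len_bytes == 0 {
        return Err("Failed to create metal resource: Buffer".to_string());
    }
    Ok(Arc::new(Buffer { len_bytes }))
}

/// Allocate storage for every layer. All zero-element layers share one
/// placeholder, created only when the first such layer is seen.
fn allocate_layers(elem_counts: &[usize], elem_size: usize) -> Result<Vec<KvStorage>, String> {
    let mut placeholder: Option<Arc<Buffer>> = None;

    let mut alloc = |elem_count: usize| -> Result<KvStorage, String> {
        let buffer = if elem_count == 0 {
            // Reuse the placeholder if it exists, otherwise create it now.
            let p = match placeholder.take() {
                Some(p) => p,
                None => alloc_device_buffer(elem_size)?,
            };
            placeholder = Some(Arc::clone(&p));
            p
        } else {
            alloc_device_buffer(elem_count * elem_size)?
        };
        // The logical size is left untouched; only the backing changes.
        Ok(KvStorage { buffer, elem_count })
    };

    elem_counts.iter().map(|&n| alloc(n)).collect()
}
```

Calling `allocate_layers(&[0, 16, 0, 16], 2)` in this sketch gives the two no-KV layers the same 2-byte placeholder while each KV layer gets its own 32-byte buffer. Creating the placeholder lazily means a purely dense model never allocates it.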
@github-actions

Code Metrics Report
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 Language              Files        Lines         Code     Comments       Blanks
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 C Header                 23         4454         3116          790          548
 CSS                       3          281          252            5           24
 CUDA                    119        23575        19136         1696         2743
 Dockerfile                1           38           21            8            9
 HTML                      2           27           27            0            0
 JavaScript                3          392          387            2            3
 Jinja2                    7          694          656            5           33
 JSON                     26         9360         9357            0            3
 Makefile                  1            6            5            0            1
 MDX                       1          149            0          133           16
 Metal Shading Lan|       37        14287        11284         1136         1867
 PowerShell                1          357          276           33           48
 Python                  131        10342         8515          460         1367
 Shell                     2          549          379          101           69
 Plain Text                3         3723            0         2413         1310
 TOML                     29         1388         1211           41          136
 TypeScript               11         1607         1371           66          170
 YAML                      3           25           23            2            0
─────────────────────────────────────────────────────────────────────────────────
 Jupyter Notebooks         3          122           83           23           16
 |- Markdown               1           60           30           22            8
 |- Python                 1          122          113            1            8
 (Total)                              304          226           46           32
─────────────────────────────────────────────────────────────────────────────────
 Markdown                129         9703            0         6648         3055
 |- BASH                  61          600          520           47           33
 |- Dockerfile             2            5            5            0            0
 |- JSON                  18          700          700            0            0
 |- PowerShell             3            5            5            0            0
 |- Python                25          830          722            5          103
 |- Rust                  15          437          382            1           54
 |- TOML                  10          124           98            3           23
 |- YAML                   1           13           13            0            0
 (Total)                            12417         2445         6704         3268
─────────────────────────────────────────────────────────────────────────────────
 Rust                    625       270388       239956         5864        24568
 |- Markdown             397         9504          452         7882         1170
 (Total)                           279892       240408        13746        25738
─────────────────────────────────────────────────────────────────────────────────
 Svelte                   18         1831         1696           50           85
 |- CSS                    1            4            4            0            0
 |- JavaScript            18          876          727           24          125
 (Total)                             2711         2427           74          210
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 Total                  1178       366578       301522        27461        37595
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
