Runtime MPSGraph null-deref (mlir::FloatType::getWidth) executing 2+ GatedDeltaUpdate layers + an attention layer with dynamic KV context #2

@scndls

Runtime crash (MPSGraph mlir::FloatType::getWidth() null-deref) executing a hybrid model with 2+ GatedDeltaUpdate layers + an attention layer with dynamic KV context

Summary

A model that stacks 2 or more GatedDeltaUpdate (Gated DeltaNet) layers together with one
attention layer reading a dynamic-length KV context (the Qwen3.5 / Qwen3-Next hybrid shape)
exports and loads fine, then segfaults on the first execution:

thread #23, stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  frame #0: MetalPerformanceShadersGraph`mlir::FloatType::getWidth() + 16

A null FloatType is dereferenced during MPSGraph JIT compilation of the executed function.

The crash is governed precisely by the number of GatedDeltaUpdate layers:

| Model | Result |
| --- | --- |
| 1 GatedDeltaUpdate layer + 1 attention layer (dynamic context) | runs (exit 0) |
| 2 GatedDeltaUpdate layers + 1 attention layer (dynamic context) | EXC_BAD_ACCESS at execute |
| N stacked GatedDeltaUpdate layers, no attention, fully static export | runs (verified to N=3) |

So it is not the multi-layer DeltaNet state handling (that stacks fine on its own), and
not attention alone (a single attention layer with a static query + dynamic context runs).
It is the combination: 2+ DeltaNet scf.while scans plus a dynamic KV-context dimension in
the same exported function.

Environment

  • coreai-torch 0.4.0, coreai-models @ b1cb71b
  • torch 2.9.0, Python 3.11.15
  • macOS 27.0 (build 26A5353q), arm64 (Apple M1 Max, 32 GB)

Repro

Full self-contained script: toy_hybrid.py (≈180 lines; attached / available on request).
It builds embed → N×[RMSNorm + GatedDeltaNet(conv1d+GatedDeltaUpdate+gated RMSNorm) + MLP] → [RMSNorm + gpt_oss-style attention + MLP] → RMSNorm → lm_head, with a static query
(input_ids, len 12) and a dynamic KV context (position_ids len + KV-cache seq dim).

Shape policy (the crux): the query length is static so each DeltaNet GatedDeltaUpdate's
scf.while lowers; the attention reads a dynamic context the gpt_oss.py way
(sequence_length = position_ids.shape[-1], offset = sequence_length - query_len,
cache.update_and_fetch(..., seq_len=sequence_length, query_len=query_len)).

dynamic_shapes = {
    "input_ids": None,                                              # STATIC query chunk
    "position_ids": {1: torch.export.Dim("ctx",  min=13, max=64)},  # DYNAMIC context
    "k_cache":      {3: torch.export.Dim("kseq", min=13, max=64)},
    "v_cache":      {3: torch.export.Dim("vseq", min=13, max=64)},
    "conv_state": None, "ssm_state": None,
}
prog = export_to_coreai(model, inputs, input_names=("input_ids", "position_ids"),
                        output_names=("logits",),
                        state_names=("k_cache", "v_cache", "conv_state", "ssm_state"),
                        dynamic_shapes=dynamic_shapes)
prog.optimize()
prog.save_asset(path)
# The awaits below must run inside an async function.
m = await AIModel.load(path)
fn = m.load_function(m.function_names[0])
await fn(inputs=..., state=...)          # <-- EXC_BAD_ACCESS for NDELTA=2, OK for NDELTA=1
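The offset arithmetic in the shape policy above can be sketched in plain Python. This is only an illustration of the two formulas quoted from the gpt_oss.py-style attention path; the helper name `context_offset` is hypothetical and is not part of coreai-torch or the repro script. It assumes the static query length of 12 and the `min=13, max=64` bounds used in `dynamic_shapes`.

```python
QUERY_LEN = 12  # static query chunk length used in the repro


def context_offset(sequence_length: int, query_len: int = QUERY_LEN) -> int:
    """Return offset = sequence_length - query_len.

    sequence_length stands in for position_ids.shape[-1], the dynamic
    context length. The hypothetical check below mirrors the export
    bounds (min=13), which keep the context longer than the query.
    """
    if sequence_length <= query_len:
        raise ValueError("context must be longer than the static query")
    return sequence_length - query_len


# Offsets at the two ends of the exported dynamic range.
lowest = context_offset(13)
highest = context_offset(64)
```

With these bounds the offset is never zero, so the attention layer always sees at least one cached position ahead of the 12-token query.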

Run:

NDELTA=1 uv run python toy_hybrid.py    # RAN — exit 0
NDELTA=2 uv run python toy_hybrid.py    # EXC_BAD_ACCESS in MPSGraph FloatType::getWidth()
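To bisect the layer count without rerunning by hand, the two commands above can be driven from a small sweep script. This is a minimal sketch, not part of the repro: `describe_exit` and `run_sweep` are hypothetical helper names, and the only assumption about toy_hybrid.py is that it reads `NDELTA` from the environment as shown above. On POSIX, `subprocess` reports a child killed by a signal as a negative return code; a wrapper such as `uv run` may instead surface the crash as a positive exit code, so treat any non-zero result as a failure.

```python
import os
import signal
import subprocess
import sys


def describe_exit(returncode: int) -> str:
    """Turn a subprocess return code into a short human-readable label."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # Negative codes mean the child was terminated by a signal (POSIX).
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exit {returncode}"


def run_sweep(ndeltas, cmd):
    """Run cmd once per NDELTA value and collect a label for each run."""
    results = {}
    for n in ndeltas:
        env = dict(os.environ, NDELTA=str(n))
        proc = subprocess.run(cmd, env=env)
        results[n] = describe_exit(proc.returncode)
    return results
```

A call such as `run_sweep([1, 2, 3], ["uv", "run", "python", "toy_hybrid.py"])` would then return one label per layer count, which makes it easy to confirm that the failure starts at exactly two GatedDeltaUpdate layers.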

Notes

  • SDPA is de-externalized in the repro (removed from _EXTERNALIZE_SPECS so it decomposes
    to primitive ops) purely to work around a separate coreai-torch externalize bug that blocks
    exporting a static-query / dynamic-context model with externalized SDPA (filed separately —
    the d_20 unbounded-key-dim issue). If that export bug is fixed and SDPA stays fused, this
    MPSGraph crash should be re-checked with the fused attention kernel.
  • The model is tiny (hidden 256, 2 layers), so this is not a memory-pressure issue.
  • This blocks running native Qwen3.5 / Qwen3-Next style hybrids (e.g. a 27B with 48 DeltaNet +
    16 attention layers) on Core AI, since they necessarily stack many DeltaNet layers alongside
    attention with a dynamic context.
