Runtime crash (MPSGraph mlir::FloatType::getWidth() null-deref) executing a hybrid model with 2+ GatedDeltaUpdate layers + an attention layer with dynamic KV context
Summary
A model that stacks 2 or more GatedDeltaUpdate (Gated DeltaNet) layers together with one
attention layer reading a dynamic-length KV context (the Qwen3.5 / Qwen3-Next hybrid shape)
exports and loads fine, then segfaults on the first execution:
thread #23, stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
frame #0: MetalPerformanceShadersGraph`mlir::FloatType::getWidth() + 16
A null FloatType is dereferenced during MPSGraph JIT compilation of the executed function.
With a dynamic-context attention layer present, whether the crash occurs depends on the number of GatedDeltaUpdate layers:
| Model | Result |
|---|---|
| 1 GatedDeltaUpdate layer + 1 attention layer (dynamic context) | runs (exit 0) |
| 2 GatedDeltaUpdate layers + 1 attention layer (dynamic context) | EXC_BAD_ACCESS at execute |
| N stacked GatedDeltaUpdate layers, no attention, fully static export | runs (verified to N=3) |
So it is not the multi-layer DeltaNet state handling (that stacks fine on its own), and
not attention alone (a single attention layer with a static query + dynamic context runs).
It is the combination: 2+ DeltaNet scf.while scans plus a dynamic KV-context dimension in
the same exported function.
Environment
- coreai-torch 0.4.0, coreai-models @ b1cb71b
- torch 2.9.0, Python 3.11.15
- macOS 27.0 (build 26A5353q), arm64 (Apple M1 Max, 32 GB)
Repro
Full self-contained script: toy_hybrid.py (≈180 lines; attached / available on request).
It builds embed → N×[RMSNorm + GatedDeltaNet(conv1d+GatedDeltaUpdate+gated RMSNorm) + MLP] → [RMSNorm + gpt_oss-style attention + MLP] → RMSNorm → lm_head, with a static query
(input_ids, len 12) and a dynamic KV context (position_ids len + KV-cache seq dim).
Shape policy (the crux): the query length is static so each DeltaNet GatedDeltaUpdate's
scf.while lowers; the attention reads a dynamic context the gpt_oss.py way
(sequence_length = position_ids.shape[-1], offset = sequence_length - query_len,
cache.update_and_fetch(..., seq_len=sequence_length, query_len=query_len)).
dynamic_shapes = {
    "input_ids": None,  # STATIC query chunk
    "position_ids": {1: torch.export.Dim("ctx", min=13, max=64)},  # DYNAMIC context
    "k_cache": {3: torch.export.Dim("kseq", min=13, max=64)},
    "v_cache": {3: torch.export.Dim("vseq", min=13, max=64)},
    "conv_state": None,
    "ssm_state": None,
}
prog = export_to_coreai(
    model,
    inputs,
    input_names=("input_ids", "position_ids"),
    output_names=("logits",),
    state_names=("k_cache", "v_cache", "conv_state", "ssm_state"),
    dynamic_shapes=dynamic_shapes,
)
prog.optimize()
prog.save_asset(path)
m = await AIModel.load(path)
fn = m.load_function(m.function_names[0])
await fn(inputs=..., state=...)  # <-- EXC_BAD_ACCESS for NDELTA=2, OK for NDELTA=1
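To make the shape policy concrete, here is a minimal sketch of the offset arithmetic, using only the numbers quoted above (query length 12, dynamic dims bounded to 13..64). It is plain Python, not the coreai-models code; the helper name `context_offset` and its bounds check are assumptions made for illustration.

```python
QUERY_LEN = 12             # static query chunk length (input_ids), from the repro
CTX_MIN, CTX_MAX = 13, 64  # bounds given to torch.export.Dim in dynamic_shapes


def context_offset(sequence_length: int, query_len: int = QUERY_LEN) -> int:
    """Hypothetical helper: offset = sequence_length - query_len.

    sequence_length stands in for position_ids.shape[-1]; the result is the
    number of cached positions that precede the current query chunk.
    """
    if not CTX_MIN <= sequence_length <= CTX_MAX:
        raise ValueError(
            f"sequence_length {sequence_length} is outside the exported "
            f"range [{CTX_MIN}, {CTX_MAX}]"
        )
    return sequence_length - query_len


for ctx in (CTX_MIN, 32, CTX_MAX):
    print(f"context={ctx:2d} -> offset={context_offset(ctx)}")
```

With these bounds the offset ranges from 1 (context 13) to 52 (context 64), so the exported range always leaves at least one cached position ahead of the 12-token query. Whether that is the reason the lower bound was set to 13 is a guess.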
Run:
NDELTA=1 uv run python toy_hybrid.py # RAN — exit 0
NDELTA=2 uv run python toy_hybrid.py # EXC_BAD_ACCESS in MPSGraph FloatType::getWidth()
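The two runs above can also be driven from a small harness that records how each process ended, which helps when bisecting further layer counts. The following is a sketch only: the function names are hypothetical, and it assumes the repro script reads `NDELTA` from the environment as in the commands above. The 128 + N decoding is a common convention rather than something the launcher is known to guarantee.

```python
import os
import signal
import subprocess
from typing import Dict, Iterable, Sequence


def run_case(ndelta: int, cmd: Sequence[str]) -> int:
    """Run `cmd` with NDELTA set in the environment and return its exit status."""
    env = dict(os.environ, NDELTA=str(ndelta))
    return subprocess.run(list(cmd), env=env, check=False).returncode


def describe(returncode: int) -> str:
    """Turn an exit status into a short label for a results table."""
    if returncode == 0:
        return "ran (exit 0)"
    # subprocess reports death-by-signal as -N on POSIX; launchers and shells
    # that re-exit on the child's behalf commonly use 128 + N instead.
    signum = -returncode if returncode < 0 else returncode - 128
    if signum > 0:
        try:
            return f"killed by {signal.Signals(signum).name}"
        except ValueError:
            pass  # not a known signal number; report as an ordinary failure
    return f"failed (exit {returncode})"


def sweep(cmd: Sequence[str], ndeltas: Iterable[int] = (1, 2, 3)) -> Dict[int, str]:
    """Run the same command once per NDELTA value and collect the labels."""
    return {n: describe(run_case(n, cmd)) for n in ndeltas}
```

For example, `sweep(["uv", "run", "python", "toy_hybrid.py"], ndeltas=(1, 2))` would run both configurations. A signal-terminated entry for `NDELTA=2` would match the crash reported here.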
Notes
- SDPA is de-externalized in the repro (removed from _EXTERNALIZE_SPECS so it decomposes
to primitive ops) purely to work around a separate coreai-torch externalize bug that blocks
exporting a static-query / dynamic-context model with externalized SDPA (filed separately as
the d_20 unbounded-key-dim issue). If that export bug is fixed and SDPA stays fused, this
MPSGraph crash should be re-checked with the fused attention kernel.
- The model is tiny (hidden 256, 2 layers), so this is not a memory-pressure issue.
- This blocks running native Qwen3.5 / Qwen3-Next style hybrids (e.g. a 27B with 48 DeltaNet +
16 attention layers) on Core AI, since they necessarily stack many DeltaNet layers alongside
attention with a dynamic context.