Runtime MPSGraph null-deref (mlir::FloatType::getWidth) executing 2+ GatedDeltaUpdate layers + an attention layer with dynamic KV context

# Runtime crash (MPSGraph `mlir::FloatType::getWidth()` null-deref) executing a hybrid model with 2+ GatedDeltaUpdate layers + an attention layer with dynamic KV context

## Summary

A model that stacks **2 or more `GatedDeltaUpdate` (Gated DeltaNet) layers** together with **one
attention layer reading a dynamic-length KV context** (the Qwen3.5 / Qwen3-Next hybrid shape)
exports and loads fine, then **segfaults on the first execution**:

```
thread #23, stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  frame #0: MetalPerformanceShadersGraph`mlir::FloatType::getWidth() + 16
```

A null `FloatType` is dereferenced during MPSGraph JIT compilation of the executed function.

With the attention layer and its dynamic context present, the crash depends on the **number of `GatedDeltaUpdate` layers**:

| Model | Result |
|---|---|
| 1 GatedDeltaUpdate layer + 1 attention layer (dynamic context) | **runs** (exit 0) |
| **2** GatedDeltaUpdate layers + 1 attention layer (dynamic context) | **EXC_BAD_ACCESS** at execute |
| N stacked GatedDeltaUpdate layers, **no attention**, fully static export | runs (verified to N=3) |

So it is **not** the multi-layer DeltaNet state handling (that stacks fine on its own), and
**not** attention alone (a single attention layer with a static query + dynamic context runs).
It is the **combination**: 2+ DeltaNet `scf.while` scans plus a dynamic KV-context dimension in
the same exported function.

## Environment

- `coreai-torch` 0.4.0, `coreai-models` @ b1cb71b
- torch 2.9.0, Python 3.11.15
- macOS 27.0 (build 26A5353q), arm64 (Apple M1 Max, 32 GB)

## Repro

Full self-contained script: `toy_hybrid.py` (≈180 lines; attached / available on request).
It builds `embed → N×[RMSNorm + GatedDeltaNet(conv1d+GatedDeltaUpdate+gated RMSNorm) + MLP]
→ [RMSNorm + gpt_oss-style attention + MLP] → RMSNorm → lm_head`, with a **static** query
(`input_ids`, len 12) and a **dynamic** KV context (`position_ids` len + KV-cache seq dim).

Shape policy (the crux): the query length is static so that each `GatedDeltaUpdate`'s
`scf.while` lowers, while the attention layer reads a dynamic-length context following the
`gpt_oss.py` pattern (`sequence_length = position_ids.shape[-1]`,
`offset = sequence_length - query_len`,
`cache.update_and_fetch(..., seq_len=sequence_length, query_len=query_len)`).
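The context arithmetic in that shape policy can be sketched in plain Python. This is an illustration only, not code from `toy_hybrid.py`: the helper name `context_window` is hypothetical, and it assumes `position_ids` has shape `(batch, ctx)` with the query chunk occupying the last `query_len` positions of the context, which is what `offset = sequence_length - query_len` implies.

```python
def context_window(position_ids_shape, query_len):
    """Return (sequence_length, offset) for a static query chunk in a dynamic context.

    Hypothetical helper; it mirrors the two expressions quoted in the shape policy.
    """
    sequence_length = position_ids_shape[-1]   # dynamic: total context length
    offset = sequence_length - query_len       # index where the static query chunk starts
    if offset < 0:
        raise ValueError("context is shorter than the query chunk")
    return sequence_length, offset


# Static query length 12 at the smallest exported context length (13):
print(context_window((1, 13), 12))   # (13, 1)
# Same query length at the largest exported context length (64):
print(context_window((1, 64), 12))   # (64, 52)
```

Because the query length is fixed at 12 and the dynamic dims are bounded to 13..64, the offset stays in the range 1..52 across every shape the export admits.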

```python
dynamic_shapes = {
    "input_ids": None,                                              # STATIC query chunk
    "position_ids": {1: torch.export.Dim("ctx",  min=13, max=64)},  # DYNAMIC context
    "k_cache":      {3: torch.export.Dim("kseq", min=13, max=64)},
    "v_cache":      {3: torch.export.Dim("vseq", min=13, max=64)},
    "conv_state": None, "ssm_state": None,
}
prog = export_to_coreai(model, inputs, input_names=("input_ids","position_ids"),
                        output_names=("logits",),
                        state_names=("k_cache","v_cache","conv_state","ssm_state"),
                        dynamic_shapes=dynamic_shapes)
prog.optimize()
prog.save_asset(path)
m = await AIModel.load(path)
fn = m.load_function(m.function_names[0])
await fn(inputs=..., state=...)          # <-- EXC_BAD_ACCESS for NDELTA=2, OK for NDELTA=1
```

Run:
```
NDELTA=1 uv run python toy_hybrid.py    # RAN — exit 0
NDELTA=2 uv run python toy_hybrid.py    # EXC_BAD_ACCESS in MPSGraph FloatType::getWidth()
```
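To sweep several layer counts without one crash ending the whole session, each run can be launched in its own process and its exit status recorded. The following is a minimal sketch, not part of the attached repro: the names `describe_exit` and `sweep` are hypothetical, and it assumes `toy_hybrid.py` reads `NDELTA` from the environment (as in the commands above) and that the driver runs under the same Python environment as the repro.

```python
import os
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Turn a subprocess return code into a short label.

    On POSIX, subprocess reports death-by-signal as a negative return code.
    """
    if returncode == 0:
        return "ran (exit 0)"
    if returncode < 0:
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"crashed ({name})"
    return f"failed (exit {returncode})"


def sweep(ndelta_values, script="toy_hybrid.py"):
    """Run the repro once per NDELTA value, each in a fresh process."""
    results = {}
    for n in ndelta_values:
        env = dict(os.environ, NDELTA=str(n))   # pass the layer count via the environment
        proc = subprocess.run([sys.executable, script], env=env)
        results[n] = describe_exit(proc.returncode)
    return results


# Example (assumes toy_hybrid.py is in the current directory):
#   for n, outcome in sweep([1, 2, 3]).items():
#       print(f"NDELTA={n}: {outcome}")
```

A crash in the child then shows up as a `crashed (...)` label carrying the signal name, while the driver itself keeps running.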

## Notes

- SDPA is **de-externalized** in the repro (removed from `_EXTERNALIZE_SPECS` so it decomposes
  to primitive ops) purely to work around a separate coreai-torch externalize bug that blocks
  exporting a static-query / dynamic-context model with externalized SDPA (filed separately —
  the `d_20` unbounded-key-dim issue). If that export bug is fixed and SDPA stays fused, this
  MPSGraph crash should be re-checked with the fused attention kernel.
- The model is tiny (hidden 256, 2 layers), so this is not a memory-pressure issue.
- This blocks running native Qwen3.5 / Qwen3-Next style hybrids (e.g. a 27B with 48 DeltaNet +
  16 attention layers) on Core AI, since they necessarily stack many DeltaNet layers alongside
  attention with a dynamic context.


