Commit 9beea3c

fix: avoid non-parameter/buffer property scan (#40)

* fix: avoid moe property scan
* update
* fix: support moe tensor buffers
* Update pyproject.toml
* fix: ernie vl moe strict config tests
* fix: allow runtime defusion for standalone replacement-model experts
* modify: add comments to defuser helpers and tests
* modify: update readme

1 parent: 54966a9

20 files changed: +256 −45 lines

.github/dependabot.yml (5 additions, 1 deletion)

```diff
@@ -1,3 +1,7 @@
+# SPDX-FileCopyrightText: 2026 ModelCloud.ai
+# SPDX-FileCopyrightText: 2026 qubitium@modelcloud.ai
+# SPDX-License-Identifier: Apache-2.0
+# Contact: qubitium@modelcloud.ai, x.com/qubitium
 version: 2
 updates:
   - package-ecosystem: github-actions
@@ -16,4 +20,4 @@ updates:
     groups:
       python-dependencies:
         patterns:
-          - "*"
+          - "*"
```

.github/scripts/deps.yaml (4 additions, 0 deletions)

```diff
@@ -1,2 +1,6 @@
+# SPDX-FileCopyrightText: 2026 ModelCloud.ai
+# SPDX-FileCopyrightText: 2026 qubitium@modelcloud.ai
+# SPDX-License-Identifier: Apache-2.0
+# Contact: qubitium@modelcloud.ai, x.com/qubitium
 common:
   - transformers
```

.github/scripts/install_deps.py (4 additions, 0 deletions)

```diff
@@ -1,3 +1,7 @@
+# SPDX-FileCopyrightText: 2026 ModelCloud.ai
+# SPDX-FileCopyrightText: 2026 qubitium@modelcloud.ai
+# SPDX-License-Identifier: Apache-2.0
+# Contact: qubitium@modelcloud.ai, x.com/qubitium
 import os
 import subprocess
 import sys
```

.github/workflows/release.yml (4 additions, 1 deletion)

```diff
@@ -1,3 +1,7 @@
+# SPDX-FileCopyrightText: 2026 ModelCloud.ai
+# SPDX-FileCopyrightText: 2026 qubitium@modelcloud.ai
+# SPDX-License-Identifier: Apache-2.0
+# Contact: qubitium@modelcloud.ai, x.com/qubitium
 name: Release
 
 concurrency:
@@ -118,4 +122,3 @@ jobs:
         with:
           name: ${{ env.WHL_NAME }}
           path: dist/${{ env.WHL_NAME }}
-
```

.github/workflows/unit_tests.yml (4 additions, 1 deletion)

```diff
@@ -1,3 +1,7 @@
+# SPDX-FileCopyrightText: 2026 ModelCloud.ai
+# SPDX-FileCopyrightText: 2026 qubitium@modelcloud.ai
+# SPDX-License-Identifier: Apache-2.0
+# Contact: qubitium@modelcloud.ai, x.com/qubitium
 name: Unit Tests
 
 defaults:
@@ -155,4 +159,3 @@ jobs:
         run: |
           mkdir -p artifacts
           pytest --durations=0 tests/${{ matrix.test_script }}.py --junitxml=artifacts/${{ runner.os }}-${{ matrix.test_script }}.xml
-
```

.gitignore (4 additions, 1 deletion)

```diff
@@ -1,3 +1,7 @@
+# SPDX-FileCopyrightText: 2026 ModelCloud.ai
+# SPDX-FileCopyrightText: 2026 qubitium@modelcloud.ai
+# SPDX-License-Identifier: Apache-2.0
+# Contact: qubitium@modelcloud.ai, x.com/qubitium
 ### Python template
 # Byte-compiled / optimized / DLL files
 __pycache__/
@@ -156,4 +160,3 @@ dmypy.json
 cython_debug/
 
 .idea/
-
```

README.md (7 additions, 4 deletions)

````diff
@@ -23,7 +23,7 @@ Depending on the model family, Defuser can:
 
 - patch a supported model class before load so HF instantiates a defused block directly
 - split fused tensors such as `gate_up_proj` into `gate_proj` + `up_proj`
-- convert 3D expert tensors into numbered expert `nn.Linear` modules
+- convert 3D expert tensors, including registered expert buffers, into numbered expert `nn.Linear` modules
 - preserve the original fused math while presenting a naive module structure again
 
 Public API:
@@ -33,8 +33,9 @@ from defuser import convert_model, replace_fused_blocks
 ```
 
 - `replace_fused_blocks(model_type)` patches supported HF model classes before `from_pretrained()` or direct model construction.
-- `convert_model(model, cleanup_original=True, max_layers=None, filter=None)` converts an already loaded model in place. This is the runtime defusion path for supported post-load expert and MLP conversions, including `qwen3_5_moe` style checkpoints.
+- `convert_model(model, cleanup_original=False, max_layers=None, filter=None)` converts an already loaded model in place. This is the runtime defusion path for supported post-load expert and MLP conversions, including `qwen3_5_moe` style checkpoints.
 - Defuser is designed and CI-tested for `transformers>=5.3.0`, and support is only offered for that version range. Older versions log a warning on these public APIs and are skipped as unsupported.
+- Some model families appear in both support tables. Full models can be prepatched with `replace_fused_blocks(...)`, while standalone fused expert modules from those same families can still be runtime-defused with `convert_model(...)`.
 
 `filter` is an optional list of PCRE regex rules evaluated against full module paths such as `model.layers.0.mlp.experts`:
 
@@ -46,7 +47,7 @@ from defuser import convert_model, replace_fused_blocks
 
 ## Supported Models
 
-Defuser currently supports the following `transformers==5.3.0` `model_type` values.
+Defuser currently supports the following `transformers>=5.3.0` `model_type` values.
 
 ### `replace_fused_blocks(model_type)` before load
 
@@ -65,7 +66,7 @@ Defuser currently supports the following `transformers==5.3.0` `model_type` valu
 
 | Pattern | Supported model types | Defused op performed |
 | --- | --- | --- |
-| Standard routed expert tensors | `deepseek_v2`, `dots1`, `ernie4_5_moe`, `ernie4_5_vl_moe`, `exaone_moe`, `flex_olmo`, `glm4_moe_lite`, `glm4v_moe`, `hunyuan_v1_moe`, `jamba`, `lfm2_moe`, `minimax`, `minimax_m2`, `olmoe`, `qwen3_vl_moe`, `solar_open` | Splits fused expert tensors into numbered expert `nn.Linear` modules with per-expert `gate_proj`, `up_proj`, and `down_proj`. |
+| Standard routed expert tensors | `deepseek_v2`, `dots1`, `ernie4_5_moe`, `ernie4_5_vl_moe`, `exaone_moe`, `flex_olmo`, `glm4_moe_lite`, `glm4v_moe`, `hunyuan_v1_moe`, `jamba`, `lfm2_moe`, `minimax`, `minimax_m2`, `olmoe`, `qwen3_vl_moe`, `solar_open` | Splits fused expert tensors or registered expert buffers into numbered expert `nn.Linear` modules with per-expert `gate_proj`, `up_proj`, and `down_proj`. |
 | Mixed sparse and shared experts | `deepseek_v3`, `glm_moe_dsa`, `qwen3_5_moe`, `qwen3_5_moe_text` | Runtime expert tensor defusion for routed experts while preserving the model's shared-expert path. |
 | Transposed or packed expert tensors | `gpt_oss`, `phimoe` | Splits transposed fused expert `gate_up_proj` tensors into per-expert `gate_proj` + `up_proj`, preserves expert bias when present, and converts expert tensors into numbered expert `nn.Linear` modules. |
 | Flattened expert layout | `dbrx` | Rebuilds the flattened DBRX expert FFN weights into numbered expert `gate_proj`, `up_proj`, and `down_proj` `nn.Linear` modules. |
@@ -100,6 +101,8 @@ converted = convert_model(model)
 print(converted) # True when runtime defusion happened
 ```
 
+`convert_model(model)` also preserves meta-device construction for supported meta-initialized models, so structural validation can run without materializing weights.
+
 Use `filter` when only specific blocks should be defused:
 
 ```python
````

defuser/checkpoint_ops.py (4 additions, 0 deletions)

```diff
@@ -1,3 +1,7 @@
+# SPDX-FileCopyrightText: 2026 ModelCloud.ai
+# SPDX-FileCopyrightText: 2026 qubitium@modelcloud.ai
+# SPDX-License-Identifier: Apache-2.0
+# Contact: qubitium@modelcloud.ai, x.com/qubitium
 import torch
 from transformers.core_model_loading import Chunk, Concatenate, ConversionOps, MergeModulelist
```
37

defuser/defuser.py (21 additions, 3 deletions)

```diff
@@ -36,9 +36,26 @@ def get_checkpoint_conversion_mapping(model_type):
 
 
 class PatchError(Exception):
+    """Raised when Defuser cannot patch a registered Transformers class."""
+
     pass
 
 
+def _has_prebuilt_replacements(model: nn.Module, model_type: str) -> bool:
+    """Detect models that were already instantiated with registry-backed replacements."""
+    replacement_paths = MODEL_CONFIG[model_type].get(PATCH.REPLACE_MODULE, [])
+    replacement_class_paths = {custom_path for _, custom_path in replacement_paths}
+    if not replacement_class_paths:
+        return False
+
+    for module in model.modules():
+        class_path = f"{module.__class__.__module__}.{module.__class__.__name__}"
+        if class_path in replacement_class_paths:
+            return True
+
+    return False
+
+
 def replace_fused_blocks(model_type: str) -> bool:
     """Patch supported HF model classes so future loads instantiate defused blocks."""
     if warn_if_public_api_transformers_unsupported("replace_fused_blocks()", logger):
@@ -202,9 +219,10 @@ def convert_model(
 
     apply_model_patches(model, max_layers=max_layers, filter_rules=filter)
 
-    # If fused blocks have already been structurally replaced at load model before,
-    # there is no need to perform runtime defusing again
-    if MODEL_CONFIG[model.config.model_type].get(PATCH.REPLACE_MODULE):
+    # Full models patched at construction time already contain the defused
+    # replacement modules, but standalone experts from those model types can
+    # still use runtime tensor defusion.
+    if _has_prebuilt_replacements(model, model.config.model_type):
         return False
 
     # Perform runtime defusing of fused projections
```
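The core of the `defuser.py` change above is that `convert_model` now checks what modules the model actually contains, rather than only consulting the registry, so standalone expert modules from prepatchable families still get runtime defusion. The class-path matching can be sketched in isolation like this (the `FakeExperts` class and `has_prebuilt` helper are invented for illustration; the real helper walks `model.modules()` on a torch module):

```python
class FakeExperts:
    """Stand-in for a registry-backed replacement module class."""
    pass

def class_path(obj) -> str:
    # Fully qualified class path, e.g. "defuser.modeling.SomeExperts".
    cls = obj.__class__
    return f"{cls.__module__}.{cls.__name__}"

# Registry of replacement class paths, built from the stand-in class itself.
replacement_class_paths = {class_path(FakeExperts())}

def has_prebuilt(modules, paths) -> bool:
    # Mirrors _has_prebuilt_replacements: no registry entries means no match;
    # otherwise any module whose class path is registered counts as prebuilt.
    if not paths:
        return False
    return any(class_path(m) in paths for m in modules)

print(has_prebuilt([FakeExperts()], replacement_class_paths))  # True
print(has_prebuilt([object()], replacement_class_paths))       # False
```

Matching on the concrete class path, instead of the registry flag alone, is what lets a bare expert module (never prepatched) fall through to the runtime tensor-defusion path.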

defuser/modeling/__init__.py (4 additions, 0 deletions, new file)

```diff
@@ -0,0 +1,4 @@
+# SPDX-FileCopyrightText: 2026 ModelCloud.ai
+# SPDX-FileCopyrightText: 2026 qubitium@modelcloud.ai
+# SPDX-License-Identifier: Apache-2.0
+# Contact: qubitium@modelcloud.ai, x.com/qubitium
```
