Summary
When serving mlx-community/Qwen3.6-35B-A3B-4bit through afm mlx with the Qwen XML/adaptive XML tool parser, some edit/run tool calls are emitted by the model inside <tool_call>...</tool_call> blocks but are not parsed into OpenAI message.tool_calls. They leak back as assistant content with tool_calls: [], so agent harnesses such as VulcanBench stop executing the intended edit/run step.
This appears to be an afm parser/template compatibility bug, not a Qwen3.6 model capability limitation:
- The same Qwen3.6 weights served through Python
mlx_lm.server produced proper OpenAI tool_calls.
- Under
afm, Qwen3.6 successfully emitted valid tool calls for discovery operations like list_files and read_file.
- The failures occur on larger edit/run calls where the raw output is malformed but salvageable.
Reproduction Evidence
VulcanBench comparison used the same model family and local OpenAI-compatible serving:
afm Qwen3.6 slice: /private/tmp/VulcanBench/runs-afm-qwen36-slice
- Python MLX Qwen3.6 slice with exact same weights:
/private/tmp/VulcanBench/runs-mlxpy-qwen36-slice
Observed afm Qwen3.6 failures:
ts-querystring-bug: final edit returned raw content and no tool calls:
<tool_call>
{"name="edit_file", "arguments": {"path": "src/parse.ts", ...}}
</tool_call>
<tool_call>
{"function="edit_file", "path="dag/graph.py", "old_string="...", "new_string="..."}}
</tool_call>
<tool_call>
{"function>
<name>edit_file</name>
<parameter=new_string>...</parameter>
<parameter=old_string>...</parameter>
<parameter=path>stack/pop.go</parameter>
</function>
</tool_call>
<tool_call>
{"function="run_command", "arguments="cmd="find . -name '*.rs' ..."}}
</tool_call>
In each case the VulcanBench trace had tool_calls: [], so the command/edit was not executed.
Root Cause
MLXModelService.extractToolCallsFallback handles:
- standard XML:
<function=name><parameter=key>...</parameter></function>
- XML with embedded valid JSON arguments
- valid JSON:
{"name":"func","arguments":{...}}
It did not handle the malformed Qwen3.6 hybrids above. Since the vendor parser also missed them, they remained in assistant content.
Fix Branch
Local fix branch:
fix/qwen36-malformed-toolcall-parser
The branch adds a narrow fallback parser for these Qwen3.6 malformed hybrid forms, after the existing standard XML/JSON parsers, plus regression tests for all four trace shapes.
Changed areas:
Sources/MacLocalAPI/Models/MLXModelService.swift
Tests/MacLocalAPITests/XMLToolCallParsingTests.swift
Verification
Fresh parser-suite verification on the fix branch:
swift test --filter XMLToolCallParsingTests
Result:
Test run with 104 tests in 1 suite passed
Follow-up
After merging the parser fix, rerun the VulcanBench Qwen3.6 slice against the rebuilt afm binary to confirm the edit/run calls are executed instead of leaked as content.
Summary
When serving
mlx-community/Qwen3.6-35B-A3B-4bitthroughafm mlxwith the Qwen XML/adaptive XML tool parser, some edit/run tool calls are emitted by the model inside<tool_call>...</tool_call>blocks but are not parsed into OpenAImessage.tool_calls. They leak back as assistantcontentwithtool_calls: [], so agent harnesses such as VulcanBench stop executing the intended edit/run step.This appears to be an
afmparser/template compatibility bug, not a Qwen3.6 model capability limitation:mlx_lm.serverproduced proper OpenAItool_calls.afm, Qwen3.6 successfully emitted valid tool calls for discovery operations likelist_filesandread_file.Reproduction Evidence
VulcanBench comparison used the same model family and local OpenAI-compatible serving:
afmQwen3.6 slice:/private/tmp/VulcanBench/runs-afm-qwen36-slice/private/tmp/VulcanBench/runs-mlxpy-qwen36-sliceObserved
afmQwen3.6 failures:ts-querystring-bug: final edit returned raw content and no tool calls:py-topo-sort-cycle:go-stack-pop-bug:rs-borrow-split:In each case the VulcanBench trace had
tool_calls: [], so the command/edit was not executed.Root Cause
MLXModelService.extractToolCallsFallbackhandles:<function=name><parameter=key>...</parameter></function>{"name":"func","arguments":{...}}It did not handle the malformed Qwen3.6 hybrids above. Since the vendor parser also missed them, they remained in assistant content.
Fix Branch
Local fix branch:
The branch adds a narrow fallback parser for these Qwen3.6 malformed hybrid forms, after the existing standard XML/JSON parsers, plus regression tests for all four trace shapes.
Changed areas:
Sources/MacLocalAPI/Models/MLXModelService.swiftTests/MacLocalAPITests/XMLToolCallParsingTests.swiftVerification
Fresh parser-suite verification on the fix branch:
swift test --filter XMLToolCallParsingTestsResult:
Follow-up
After merging the parser fix, rerun the VulcanBench Qwen3.6 slice against the rebuilt
afmbinary to confirm the edit/run calls are executed instead of leaked as content.