
Qwen3.6 MoE malformed XML/JSON tool calls leak as content instead of parsed tool_calls #141

Description

@scouzi1966

Summary

When serving mlx-community/Qwen3.6-35B-A3B-4bit through afm mlx with the Qwen XML/adaptive XML tool parser, some edit/run tool calls are emitted by the model inside <tool_call>...</tool_call> blocks but are not parsed into OpenAI message.tool_calls. They leak back as assistant content with tool_calls: [], so agent harnesses such as VulcanBench stop executing the intended edit/run step.
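The symptom can be checked mechanically on any assistant message. The following is a minimal sketch, assuming the message is a plain dict in the OpenAI chat-completions shape (`content` as a string or null, `tool_calls` as a list); the helper name `leaked_tool_call_bodies` is hypothetical and not part of afm or VulcanBench.

```python
import re

# Matches each <tool_call>...</tool_call> block, including multi-line bodies.
_TOOL_CALL_BLOCK = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def leaked_tool_call_bodies(message):
    """Return raw <tool_call> bodies left in content when tool_calls is empty."""
    if message.get("tool_calls"):
        # The server already produced structured tool calls; nothing leaked.
        return []
    content = message.get("content") or ""
    return [body.strip() for body in _TOOL_CALL_BLOCK.findall(content)]
```

A non-empty return value for a message whose `tool_calls` is `[]` corresponds to the leak described above.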

This appears to be an afm parser/template compatibility bug, not a Qwen3.6 model capability limitation:

  • The same Qwen3.6 weights served through Python mlx_lm.server produced proper OpenAI tool_calls.
  • Under afm, Qwen3.6 successfully emitted valid tool calls for discovery operations like list_files and read_file.
  • The failures occur on larger edit/run calls where the raw output is malformed but salvageable.

Reproduction Evidence

The VulcanBench comparison used the same model family and local OpenAI-compatible serving for both runs:

  • afm Qwen3.6 slice: /private/tmp/VulcanBench/runs-afm-qwen36-slice
  • Python MLX Qwen3.6 slice with the exact same weights: /private/tmp/VulcanBench/runs-mlxpy-qwen36-slice

Observed afm Qwen3.6 failures:

  • ts-querystring-bug: final edit returned raw content and no tool calls:
<tool_call>
{"name="edit_file", "arguments": {"path": "src/parse.ts", ...}}
</tool_call>
  • py-topo-sort-cycle:
<tool_call>
{"function="edit_file", "path="dag/graph.py", "old_string="...", "new_string="..."}}
</tool_call>
  • go-stack-pop-bug:
<tool_call>
{"function>
<name>edit_file</name>
<parameter=new_string>...</parameter>
<parameter=old_string>...</parameter>
<parameter=path>stack/pop.go</parameter>
</function>
</tool_call>
  • rs-borrow-split:
<tool_call>
{"function="run_command", "arguments="cmd="find . -name '*.rs' ..."}}
</tool_call>

In each case the VulcanBench trace had tool_calls: [], so the command/edit was not executed.

Root Cause

MLXModelService.extractToolCallsFallback handles:

  1. standard XML: <function=name><parameter=key>...</parameter></function>
  2. XML with embedded valid JSON arguments
  3. valid JSON: {"name":"func","arguments":{...}}

It does not handle the malformed Qwen3.6 hybrids above, which mix JSON braces and quotes with XML-style key="value" or <parameter=...> syntax. Since the vendor parser also misses them, they remain in assistant content.
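To illustrate why the hybrids fall through, here is a rough Python model of the three handled forms. It is a sketch under assumptions, not a port of the Swift code: the function name `parse_known_forms` is hypothetical, and form 2 is assumed to mean a JSON object placed directly inside `<function=name>...</function>`.

```python
import json
import re

_FUNCTION = re.compile(r"<function=([\w.-]+)>(.*?)</function>", re.DOTALL)
_PARAM = re.compile(r"<parameter=([\w.-]+)>(.*?)</parameter>", re.DOTALL)


def parse_known_forms(body):
    """Return {"name": ..., "arguments": {...}} for forms 1-3, else None."""
    body = body.strip()
    fn = _FUNCTION.search(body)
    if fn:
        inner = fn.group(2)
        params = dict(_PARAM.findall(inner))
        if params or not inner.strip():
            # Form 1: <function=name><parameter=key>...</parameter></function>
            return {"name": fn.group(1), "arguments": params}
        try:
            # Form 2 (assumed shape): <function=name>{...valid JSON...}</function>
            args = json.loads(inner)
        except json.JSONDecodeError:
            return None
        return {"name": fn.group(1), "arguments": args} if isinstance(args, dict) else None
    try:
        # Form 3: {"name": "func", "arguments": {...}}
        obj = json.loads(body)
    except json.JSONDecodeError:
        return None
    if (
        isinstance(obj, dict)
        and isinstance(obj.get("name"), str)
        and isinstance(obj.get("arguments"), dict)
    ):
        return {"name": obj["name"], "arguments": obj["arguments"]}
    return None
```

A body such as `{"name="edit_file", "arguments": {...}}` has no `<function=...>` tag and is not valid JSON, so every branch of this model rejects it.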

Fix Branch

Local fix branch:

fix/qwen36-malformed-toolcall-parser

The branch adds a narrow fallback parser for these malformed Qwen3.6 hybrid forms, which runs after the existing standard XML/JSON parsers, plus regression tests for all four trace shapes.
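For readers without access to the branch, the idea of a narrow salvage pass can be sketched in Python. This is an illustration only and not the Swift implementation on the branch: the function name `salvage_tool_call`, the regexes, and the handling of escapes are all assumptions, and the sample inputs below use placeholder values where the traces are elided.

```python
import json
import re

# Opening of the key="value" shapes: {"name="edit_file" or {"function="edit_file"
_NAME_EQ = re.compile(r'\{\s*"(?:name|function)="([\w.-]+)"')
# Opening of the XML hybrid: {"function> followed by <name>edit_file</name>
_NAME_TAG = re.compile(r'\{\s*"function>\s*<name>\s*([\w.-]+)\s*</name>')
_XML_PARAM = re.compile(r"<parameter=([\w.-]+)>(.*?)</parameter>", re.DOTALL)
# key="value" pairs, with or without a stray leading quote on the key
_FLAT_PAIR = re.compile(r'"?([\w.-]+)="((?:[^"\\]|\\.)*)"', re.DOTALL)
_JSON_ARGS = re.compile(r'"arguments"\s*:\s*(\{.*\})', re.DOTALL)


def _loads_trimming_braces(text):
    """Parse a JSON object, dropping surplus trailing '}' characters."""
    while text.endswith("}"):
        try:
            value = json.loads(text)
        except json.JSONDecodeError:
            text = text[:-1].rstrip()
            continue
        return value if isinstance(value, dict) else None
    return None


def _unescape(raw):
    """Decode JSON string escapes when possible, else keep the raw text."""
    try:
        return json.loads(f'"{raw}"', strict=False)
    except json.JSONDecodeError:
        return raw


def salvage_tool_call(body):
    """Return {"name": ..., "arguments": {...}} or None if no shape matches."""
    body = body.strip()
    tagged = _NAME_TAG.match(body)
    if tagged:
        # XML hybrid: parameter values are kept verbatim.
        return {"name": tagged.group(1), "arguments": dict(_XML_PARAM.findall(body))}
    named = _NAME_EQ.match(body)
    if not named:
        return None
    rest = body[named.end():]
    json_args = _JSON_ARGS.search(rest)
    if json_args:
        # Name is malformed but "arguments": {...} is ordinary JSON.
        args = _loads_trimming_braces(json_args.group(1))
        return {"name": named.group(1), "arguments": args} if args is not None else None
    # Flat key="value" pairs, optionally wrapped in a stray "arguments=" prefix.
    rest = rest.replace('"arguments="', "", 1)
    args = {key: _unescape(value) for key, value in _FLAT_PAIR.findall(rest)}
    return {"name": named.group(1), "arguments": args} if args else None
```

Anchoring both openers at the start of the body keeps the pass narrow: anything that does not begin with one of the observed malformed prefixes is returned as `None` and left for other handling.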

Changed areas:

  • Sources/MacLocalAPI/Models/MLXModelService.swift
  • Tests/MacLocalAPITests/XMLToolCallParsingTests.swift

Verification

Fresh parser-suite verification on the fix branch:

swift test --filter XMLToolCallParsingTests

Result:

Test run with 104 tests in 1 suite passed

Follow-up

After merging the parser fix, rerun the VulcanBench Qwen3.6 slice against the rebuilt afm binary to confirm the edit/run calls are executed instead of leaked as content.

Labels

bug
