Qwen3.6 MoE malformed XML/JSON tool calls leak as content instead of parsed tool_calls

## Summary

When serving `mlx-community/Qwen3.6-35B-A3B-4bit` through `afm mlx` with the Qwen XML/adaptive XML tool parser, some edit/run tool calls are emitted by the model inside `<tool_call>...</tool_call>` blocks but are not parsed into OpenAI `message.tool_calls`. They leak back as assistant `content` with `tool_calls: []`, so agent harnesses such as VulcanBench stop executing the intended edit/run step.

This appears to be an `afm` parser/template compatibility bug, not a Qwen3.6 model capability limitation:

- The same Qwen3.6 weights served through Python `mlx_lm.server` produced proper OpenAI `tool_calls`.
- Under `afm`, Qwen3.6 successfully emitted valid tool calls for discovery operations like `list_files` and `read_file`.
- The failures occur on larger edit/run calls where the raw output is malformed but salvageable.

## Reproduction Evidence

VulcanBench comparison used the same model family and local OpenAI-compatible serving:

- `afm` Qwen3.6 slice: `/private/tmp/VulcanBench/runs-afm-qwen36-slice`
- Python MLX Qwen3.6 slice with exact same weights: `/private/tmp/VulcanBench/runs-mlxpy-qwen36-slice`

Observed `afm` Qwen3.6 failures:

- `ts-querystring-bug`: final edit returned raw content and no tool calls:

```text
<tool_call>
{"name="edit_file", "arguments": {"path": "src/parse.ts", ...}}
</tool_call>
```

- `py-topo-sort-cycle`:

```text
<tool_call>
{"function="edit_file", "path="dag/graph.py", "old_string="...", "new_string="..."}}
</tool_call>
```

- `go-stack-pop-bug`:

```text
<tool_call>
{"function>
<name>edit_file</name>
<parameter=new_string>...</parameter>
<parameter=old_string>...</parameter>
<parameter=path>stack/pop.go</parameter>
</function>
</tool_call>
```

- `rs-borrow-split`:

```text
<tool_call>
{"function="run_command", "arguments="cmd="find . -name '*.rs' ..."}}
</tool_call>
```

In each case the VulcanBench trace had `tool_calls: []`, so the command/edit was not executed.

## Root Cause

`MLXModelService.extractToolCallsFallback` handles:

1. standard XML: `<function=name><parameter=key>...</parameter></function>`
2. XML with embedded valid JSON arguments
3. valid JSON: `{"name":"func","arguments":{...}}`

It did not handle the malformed Qwen3.6 hybrids above. Since the vendor parser also missed them, they remained in assistant content.

## Fix Branch

Local fix branch:

```text
fix/qwen36-malformed-toolcall-parser
```

The branch adds a narrow fallback parser for these Qwen3.6 malformed hybrid forms, after the existing standard XML/JSON parsers, plus regression tests for all four trace shapes.

Changed areas:

- `Sources/MacLocalAPI/Models/MLXModelService.swift`
- `Tests/MacLocalAPITests/XMLToolCallParsingTests.swift`

## Verification

Fresh parser-suite verification on the fix branch:

```sh
swift test --filter XMLToolCallParsingTests
```

Result:

```text
Test run with 104 tests in 1 suite passed
```

## Follow-up

After merging the parser fix, rerun the VulcanBench Qwen3.6 slice against the rebuilt `afm` binary to confirm the edit/run calls are executed instead of leaked as content.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Qwen3.6 MoE malformed XML/JSON tool calls leak as content instead of parsed tool_calls #141

Summary

Reproduction Evidence

Root Cause

Fix Branch

Verification

Follow-up

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Qwen3.6 MoE malformed XML/JSON tool calls leak as content instead of parsed tool_calls #141

Description

Summary

Reproduction Evidence

Root Cause

Fix Branch

Verification

Follow-up

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions