[BUG] Claude Messages API: stop_sequences report stop_reason=end_turn / stop_sequence=null, and multi-token stop sequences leak into output #2122

Description

@morisil

Summary

The Claude Messages API (/v1/messages) mishandles stop_sequences in three distinct ways:

  1. stop_reason is never "stop_sequence" — it is always reported as "end_turn".
  2. The stop_sequence response field is never populated — it is always null.
  3. Multi-token stop sequences are not fully stripped: their leading characters leak into the output text.

(1) and (2) affect every stop sequence, including single-token ones. (3) additionally corrupts the output text whenever a stop sequence spans more than one decoded token.

Reproduction

curl http://localhost:52415/v1/messages \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<any mlx model>",
    "max_tokens": 1024,
    "messages": [{"role": "user",
      "content": [{"type": "text",
        "text": "Output the whole alphabet from A to Z, without spaces, then immediately output END"}]}],
    "stop_sequences": ["END"],
    "thinking": {"type": "disabled"}
  }'

Observed response:

{
  "content": [{"type": "text", "text": "ABCDEFGHIJKLMNOPQRSTUVWXYZ"}],
  "stop_reason": "end_turn",
  "stop_sequence": null,
  ...
}

Expected (per the Anthropic Messages API):

{
  "content": [{"type": "text", "text": "ABCDEFGHIJKLMNOPQRSTUVWXYZ"}],
  "stop_reason": "stop_sequence",
  "stop_sequence": "END",
  ...
}

The single-token "END" is correctly stripped, but stop_reason/stop_sequence are wrong. With a stop sequence that tokenizes into multiple tokens, the leading bytes of the sequence additionally leak into content[0].text.
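The difference between the observed and expected responses can be checked mechanically. The following is a minimal sketch, not part of exo: `stop_sequence_violations` is a hypothetical helper that takes an already-parsed response body, and it assumes the prompt is one where the model is known to reach a stop sequence. The leak check is a heuristic (a legitimate answer could also end with a prefix of a stop sequence), so it only makes sense for controlled prompts such as the alphabet example above.

```python
def stop_sequence_violations(response: dict, stop_sequences: list[str]) -> list[str]:
    """Return a list of deviations from the expected stop-sequence behaviour.

    Hypothetical helper for illustration; `response` is the decoded JSON body
    of a /v1/messages call that is expected to have hit a stop sequence.
    """
    problems: list[str] = []

    # Bug 1: stop_reason should be "stop_sequence".
    stop_reason = response.get("stop_reason")
    if stop_reason != "stop_sequence":
        problems.append(f"stop_reason is {stop_reason!r}, expected 'stop_sequence'")

    # Bug 2: stop_sequence should name the sequence that matched.
    matched = response.get("stop_sequence")
    if matched not in stop_sequences:
        problems.append(f"stop_sequence is {matched!r}, expected one of {stop_sequences!r}")

    # Bug 3 (heuristic): the text should not end with a stop sequence or with
    # a leading fragment of one.
    text = "".join(
        block.get("text", "")
        for block in response.get("content", [])
        if block.get("type") == "text"
    )
    for seq in stop_sequences:
        for k in range(len(seq), 0, -1):
            if text.endswith(seq[:k]):
                problems.append(f"text ends with {seq[:k]!r}, a prefix of stop sequence {seq!r}")
                break

    return problems
```

Applied to the observed response above, this reports the two reporting problems; applied to the expected response, it reports none.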

Root cause

Bugs 1 & 2 — reporting. When a stop sequence matches, the generator collapses it into the generic "stop" finish reason (src/exo/worker/engines/mlx/generator/generate.py, in the stop-sequence loop) and the matched string is discarded. The Claude adapter then maps it:

# src/exo/api/adapters/claude.py  (finish_reason_to_claude_stop_reason)
mapping: dict[FinishReason, ClaudeStopReason] = {
    "stop": "end_turn",   # <- always end_turn; "stop_sequence" is never produced
    ...
}

"stop_sequence" is a valid ClaudeStopReason but is unreachable, and neither collect_claude_response nor generate_claude_stream ever sets the stop_sequence field (it defaults to None). There is no way to distinguish a stop-sequence stop from a natural EOS because both arrive as finish_reason == "stop".

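One way to make the two cases distinguishable is to carry the matched string alongside the finish reason and let the adapter decide. The sketch below is self-contained and does not use exo's real types: `FinishReason`, `ClaudeStopReason`, and `to_claude_stop` are simplified stand-ins, and the `"length"` to `"max_tokens"` entry is an assumption about the part of the mapping elided above.

```python
from typing import Literal, Optional

# Simplified stand-ins for the real types (assumption: the real ones have more members).
FinishReason = Literal["stop", "length"]
ClaudeStopReason = Literal["end_turn", "max_tokens", "stop_sequence"]


def to_claude_stop(
    finish_reason: FinishReason,
    matched_stop_sequence: Optional[str] = None,
) -> tuple[ClaudeStopReason, Optional[str]]:
    """Return (stop_reason, stop_sequence) for a Claude Messages response.

    `matched_stop_sequence` is the string the generator matched, or None
    when generation ended for any other reason (for example a natural EOS).
    """
    # A "stop" caused by a stop sequence is reported as such, with the match.
    if finish_reason == "stop" and matched_stop_sequence is not None:
        return "stop_sequence", matched_stop_sequence

    # Otherwise fall back to the plain mapping; stop_sequence stays null.
    mapping: dict[FinishReason, ClaudeStopReason] = {
        "stop": "end_turn",
        "length": "max_tokens",  # assumed entry, not shown in the excerpt
    }
    return mapping[finish_reason], None
```

With this shape, `to_claude_stop("stop", "END")` yields `("stop_sequence", "END")`, while `to_claude_stop("stop")` still yields `("end_turn", None)`.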
Bug 3 — multi-token leak. Generated text is emitted one token at a time, and the stop check is a substring test over the accumulated text:

# generate.py (and the equivalent in batch_generate.py via potential_stop_sequence_text)
if stop_seq in accumulated_text:
    ...
    chunk_start = len(accumulated_text) - len(out.text)
    text = text_before_stop[chunk_start:]   # trims only the CURRENT chunk

If "END" arrives as "E" then "ND", the "E" token is emitted as ordinary text before "END" is ever present in accumulated_text. When the final token completes the match, the trim only affects the current chunk and cannot retract the already-emitted "E". So the leading bytes of any multi-token stop sequence leak into output. The same incremental-emit flaw exists in the batch path.
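The leak can be reproduced without a model. The following is a simplified model of the logic in the excerpt above, not the actual generator code: `naive_emit` is a hypothetical function that takes pre-decoded text chunks and a single stop sequence, and applies the same substring test and current-chunk trim.

```python
def naive_emit(chunks: list[str], stop_seq: str) -> str:
    """Emit chunks as they arrive, trimming only the chunk that completes a match."""
    accumulated_text = ""
    emitted: list[str] = []
    for chunk in chunks:
        accumulated_text += chunk
        if stop_seq in accumulated_text:
            text_before_stop = accumulated_text[: accumulated_text.index(stop_seq)]
            # Offset of the current chunk within the accumulated text.
            chunk_start = len(accumulated_text) - len(chunk)
            # Only the current chunk can be trimmed; earlier chunks are already out.
            emitted.append(text_before_stop[chunk_start:])
            break
        emitted.append(chunk)
    return "".join(emitted)


# Stop sequence arrives as one chunk: stripped cleanly.
single = naive_emit(["XYZ", "END"], "END")

# Stop sequence split across two chunks: the leading "E" has already been emitted.
split = naive_emit(["XYZ", "E", "ND"], "END")
```

Here `single` is `"XYZ"` and `split` is `"XYZE"`, which matches the behaviour described above.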

Test coverage gap

There is currently no test that drives a stop sequence end-to-end. test_claude_api.py::test_stop_maps_to_end_turn asserts the mapping in isolation, and no test exercises the generator's stop-sequence matching or asserts the response stop_sequence field. As a result, all three bugs go undetected by the test suite.

Affected scope

  • /v1/messages (Claude Messages API) — both streaming and non-streaming.
  • OpenAI (/v1/chat/completions) and Ollama endpoints are affected by Bug 3 only; for them "stop" is the correct finish reason, so their reporting is fine.

A fix is in progress (threading the matched sequence through, plus a streaming-safe hold-back scanner for the multi-token case) and will be linked as a PR.
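For readers unfamiliar with the hold-back idea, here is a minimal sketch of one possible scanner. It is not the pending fix: `StopSequenceScanner` and `stream_with_stops` are hypothetical names, and the sketch works on decoded text chunks rather than tokens. The scanner withholds any trailing text that could still be the beginning of a stop sequence and releases it only once the next chunk rules the match out.

```python
from typing import Optional


class StopSequenceScanner:
    """Streaming filter that never emits any part of a matched stop sequence."""

    def __init__(self, stop_sequences: list[str]) -> None:
        self.stop_sequences = [s for s in stop_sequences if s]
        self.pending = ""                   # held-back text, not yet emitted
        self.matched: Optional[str] = None  # set once a stop sequence is found

    def feed(self, chunk: str) -> str:
        """Accept a chunk and return the text that is now safe to emit."""
        if self.matched is not None:
            return ""
        self.pending += chunk

        # Find the earliest complete match, if any.
        best: Optional[tuple[int, str]] = None
        for seq in self.stop_sequences:
            idx = self.pending.find(seq)
            if idx != -1 and (best is None or idx < best[0]):
                best = (idx, seq)
        if best is not None:
            idx, self.matched = best
            safe, self.pending = self.pending[:idx], ""
            return safe

        # No match yet: hold back the longest suffix that is a proper prefix
        # of some stop sequence, since a later chunk might complete it.
        hold = 0
        for seq in self.stop_sequences:
            for k in range(min(len(seq) - 1, len(self.pending)), 0, -1):
                if self.pending.endswith(seq[:k]):
                    hold = max(hold, k)
                    break
        cut = len(self.pending) - hold
        safe, self.pending = self.pending[:cut], self.pending[cut:]
        return safe

    def flush(self) -> str:
        """Release held-back text when generation ends without a match."""
        safe, self.pending = self.pending, ""
        return safe


def stream_with_stops(chunks: list[str], stop_sequences: list[str]) -> tuple[str, Optional[str]]:
    """Run chunks through the scanner; return (emitted text, matched sequence)."""
    scanner = StopSequenceScanner(stop_sequences)
    out: list[str] = []
    for chunk in chunks:
        out.append(scanner.feed(chunk))
        if scanner.matched is not None:
            break
    if scanner.matched is None:
        out.append(scanner.flush())
    return "".join(out), scanner.matched
```

With this sketch, `stream_with_stops(["XYZ", "E", "ND"], ["END"])` returns `("XYZ", "END")`, and the matched string is available for the stop_sequence response field. The cost is a small emission delay: at most one character fewer than the longest stop sequence is held back at any time.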
