BatchScheduler: garbage output at high --concurrent with short/early-finishing generations (exposed by --no-think) #140

Description

@scouzi1966

Summary

With --concurrent at a high batch size (B=8), short or early-finishing generations produce corrupted output (repeated tokens, e.g. !!!!!!!). Discovered during v0.9.13 release validation on mlx-community/Qwen3.6-35B-A3B-4bit.

Repro

# GARBAGE at B=8:
afm mlx -m mlx-community/Qwen3.6-35B-A3B-4bit --port 9999 --no-think --concurrent 8
python3 Scripts/feature-mlx-concurrent-batch/validate_responses.py
# -> 22/32, B=8: 3/8, e.g. "capital of France" -> '!!!!!!!!!!!!...'

# CLEAN with thinking on (long generations):
afm mlx -m mlx-community/Qwen3.6-35B-A3B-4bit --port 9999 --concurrent 8
python3 Scripts/feature-mlx-concurrent-batch/validate_responses.py
# -> 30/32, B=8: 8/8
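
For triaging responses by hand, the repeated-token signature can be flagged with a simple heuristic. This is a minimal sketch, not the logic in validate_responses.py (whose contents are not shown here); the helper name looks_degenerate and the min_run threshold are assumptions made for illustration.

```python
def looks_degenerate(text: str, min_run: int = 8) -> bool:
    """Return True if text contains min_run or more identical
    non-whitespace characters in a row (e.g. '!!!!!!!!').

    Hypothetical helper: a character-level heuristic only, so it
    will not catch repetition of multi-character tokens.
    """
    run = 0
    prev = None
    for ch in text:
        if ch == prev and not ch.isspace():
            run += 1
        else:
            # New character (or whitespace): restart the run counter.
            run = 1
            prev = ch
        if run >= min_run:
            return True
    return False


if __name__ == "__main__":
    for sample in ["!" * 12, "The capital of France is Paris."]:
        print(repr(sample[:20]), looks_degenerate(sample))
```

A character-level check is enough for the '!!!!' case reported here; a token-level check would be needed to catch repeated words or phrases.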

Analysis

  • Not caused by --no-think per se. --no-think (fixed in this release so that it actually disables thinking) makes generations short and of varying length, which exposes a latent BatchScheduler bug. With thinking on, all sequences are long and finish together, so the bug doesn't trigger.
  • The !!!! repeated-token signature points at a slot/KV lifecycle bug: when one sequence in the batch finishes early, its slot's KV state appears to corrupt a still-running sequence (or the finished slot is re-decoded).
  • Same family as the long-standing concurrent issues: the Concurrent x8 shared-prefix assertion failures and "Concurrent x8 prefix cache + grammar returns empty responses" (#86).

Impact

  • Default behavior is unaffected: without --no-think, batched decode is clean (30/32 at B=8).
  • Narrow combination: opt-in --no-think plus high --concurrent. Lowering concurrency or omitting --no-think avoids it.

Suggested fix area

BatchScheduler slot retirement / KV-cache handling when a sequence hits EOS before others in the batch (eviction or masking of the finished slot's contribution).
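
The intended invariant can be illustrated with a toy model. This is a sketch under stated assumptions, not the BatchScheduler implementation: Slot, step, run, and the EOS value are all hypothetical names, and a Python list stands in for each slot's KV cache. It shows a finished slot being masked out of later decode steps and releasing only its own cache, so slots that finish at different times cannot affect each other.

```python
from dataclasses import dataclass, field

EOS = 0  # hypothetical end-of-sequence token id


@dataclass
class Slot:
    request_id: str
    tokens: list = field(default_factory=list)
    kv: list = field(default_factory=list)  # stand-in for a per-slot KV cache
    done: bool = False


def step(slots, next_token_fn):
    """Advance every active slot by one token; retire slots that emit EOS."""
    for slot in slots:
        if slot.done:
            continue  # finished slots are masked out and never decoded again
        tok = next_token_fn(slot)
        if tok == EOS:
            slot.done = True
            slot.kv.clear()  # release this slot's cache only
        else:
            slot.tokens.append(tok)
            slot.kv.append(tok)


def run(slots, next_token_fn, max_steps=64):
    """Decode until every slot has finished or max_steps is reached."""
    for _ in range(max_steps):
        if all(s.done for s in slots):
            break
        step(slots, next_token_fn)
    return {s.request_id: s.tokens for s in slots}


# Two scripted sequences of different lengths: "a" finishes early.
scripts = {"a": [5, EOS], "b": [7, 8, 9, EOS]}


def scripted(slot):
    return scripts[slot.request_id][len(slot.tokens)]


result = run([Slot("a"), Slot("b")], scripted)
print(result)  # {'a': [5], 'b': [7, 8, 9]}
```

In a real batched decoder the slots share tensors along a batch dimension, so the equivalent of the `done` check is an attention or position mask plus compaction or eviction of the finished row; the toy above only captures the lifecycle rule.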

Found during v0.9.13 validation; documented as a known limitation for that release.
