
Make prefill step size configurable via EXO_PREFILL_STEP_SIZE (#2141) #2155

Open
lollinng wants to merge 1 commit into exo-explore:main from lollinng:feat/configurable-prefill-step-size

lollinng commented Jun 5, 2026


Problem (#2141)

The prefill step size in the MLX generator is hardcoded:

prefill_step_size = 4096

As reported in #2141, on some setups (RDMA over Thunderbolt across Macs) a step size of 4096 can stall: activity collapses to a single GPU and a query can hang for hours, while 1024 keeps all devices busy and responds in under a minute. The reporter asked for this to be a "hardcoded default, env, or commandline option."
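For readers unfamiliar with the knob, here is a rough illustration of what a prefill step size generally controls: the prompt is processed in chunks of at most that many tokens before generation starts. This is a simplified sketch, not exo's or MLX's actual code; `prefill_chunks` and the 10,000-token prompt length are hypothetical names and values chosen for the example.

```python
def prefill_chunks(num_prompt_tokens, prefill_step_size):
    """Yield (start, end) token ranges, one per prefill step (hypothetical helper)."""
    if prefill_step_size <= 0:
        raise ValueError("prefill_step_size must be positive")
    for start in range(0, num_prompt_tokens, prefill_step_size):
        # The final chunk may be shorter than the step size.
        yield start, min(start + prefill_step_size, num_prompt_tokens)


# A 10,000-token prompt (made-up length) at the two step sizes discussed in #2141.
chunks_4096 = list(prefill_chunks(10_000, 4096))
chunks_1024 = list(prefill_chunks(10_000, 1024))
print(len(chunks_4096), len(chunks_1024))  # prints "3 10"
```

A smaller step means more, shorter steps. The sketch only shows the chunking arithmetic and says nothing about why larger chunks stall on the reporter's interconnect.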

Fix

Read the step size from EXO_PREFILL_STEP_SIZE, defaulting to 4096, matching exo's existing os.getenv("EXO_*", default) convention (e.g. EXO_OFFLINE, EXO_BOOTSTRAP_PEERS). Behavior is unchanged unless the env var is set, so users on affected interconnects can drop it to 1024 without a code change.
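The diff itself is not reproduced in this description, so the following is only a minimal sketch of the approach described above, assuming the value is parsed with `int()`. The helper name `read_prefill_step_size`, the injectable `env` argument, and the positive-value check are illustrative additions, not necessarily what the PR does.

```python
import os

DEFAULT_PREFILL_STEP_SIZE = 4096  # default stated in this PR


def read_prefill_step_size(env=None):
    """Return the step size from EXO_PREFILL_STEP_SIZE, or the default (hypothetical helper)."""
    env = os.environ if env is None else env
    raw = env.get("EXO_PREFILL_STEP_SIZE")
    if raw is None or raw.strip() == "":
        # Unset or empty: keep the existing behavior.
        return DEFAULT_PREFILL_STEP_SIZE
    value = int(raw)  # raises ValueError on non-numeric input
    if value <= 0:
        # Assumption: a non-positive step size is never meaningful.
        raise ValueError("EXO_PREFILL_STEP_SIZE must be a positive integer")
    return value


prefill_step_size = read_prefill_step_size()
```

Passing a plain dict as `env` makes the two cases listed under Verification easy to check without touching the real environment.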

Verification

default (unset)              -> 4096
EXO_PREFILL_STEP_SIZE=1024   -> 1024  (int)

ruff check (incl. import sort) clean on the file.

The prefill step size was hardcoded to 4096. On some interconnects (e.g. RDMA
over Thunderbolt across Macs) a smaller step keeps all devices busy and is
dramatically faster, while 4096 can stall (exo-explore#2141). Read it from
EXO_PREFILL_STEP_SIZE, defaulting to 4096 so behavior is unchanged unless set.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
