Balance burst requests by tracking pending master assignments by RWL-Dittrich · Pull Request #2143 · exo-explore/exo

RWL-Dittrich · 2026-06-03T12:17:54Z

Motivation

When a cluster runs multiple instances of the same model, the master should send each new request to the least busy instance. In practice, when several requests arrived at the same time, most of them were sent to one instance while the others stayed idle.

Root cause:
The master checks self.state.tasks to see how many tasks each instance has, then emits a TaskCreated event. However, self.state is only updated later by _event_processor.
Because of that delay, several requests can all see the same old state: node1=0, node2=0. Since ties always pick the first instance, the whole burst can go to one node.
This happens inside the master scheduler, so it does not matter which node receives the HTTP request.
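
The stale read can be reproduced in isolation. The sketch below is a simplified model, not exo's actual scheduler: `state_tasks`, `event_queue`, `pick_instance`, and `schedule_burst` are hypothetical stand-ins for self.state.tasks, the not-yet-applied TaskCreated events, and the master's selection logic.

```python
def pick_instance(counts: dict[str, int]) -> str:
    # min() returns the first minimal key, so ties go to the first instance.
    return min(counts, key=lambda name: counts[name])


def schedule_burst(instances: list[str], n_requests: int) -> list[str]:
    state_tasks = {name: 0 for name in instances}  # stand-in for self.state.tasks
    event_queue: list[str] = []                    # TaskCreated events not yet applied
    chosen: list[str] = []

    for _ in range(n_requests):
        # Every request in the burst reads the same, not yet updated, state.
        target = pick_instance(dict(state_tasks))
        event_queue.append(target)
        chosen.append(target)

    # The event processor only catches up after the burst was scheduled.
    for target in event_queue:
        state_tasks[target] += 1
    return chosen


burst = schedule_burst(["node1", "node2"], 4)
print(burst)  # ['node1', 'node1', 'node1', 'node1']
```

Because the counts are only updated after the loop, every request sees the 0/0 tie and lands on node1.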

Changes

Added Master._pending_assignments to track tasks that were assigned but are not yet visible in self.state.
Added _in_flight_counts(model_id, exclude) to count both tasks in self.state and pending assignments.
Used this helper in the TextGeneration, ImageGeneration, and ImageEdits schedulers.
Removed pending assignments when tasks are cancelled or finished.

Why It Works

The master now counts tasks immediately after assigning them, instead of waiting for the event to update self.state.
Example:

Request 1 sees node1=0, node2=0 and picks node 1.
Request 2 now sees node1=1, node2=0 because request 1 is pending, so it picks node 2.
This prevents bursts from all going to the same instance. Once the TaskCreated event is applied, the task is counted through self.state and the pending entry is removed.
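
A minimal sketch of this bookkeeping, under stated assumptions: the class and method names (`PendingTracker`, `assign`, `apply_task_created`, `finish`) are hypothetical and only illustrate the idea. The real change lives in Master._pending_assignments and _in_flight_counts(model_id, exclude), whose exact data structures are not shown in this description.

```python
class PendingTracker:
    """Toy model: applied tasks plus assignments not yet visible in state."""

    def __init__(self, instances: list[str]) -> None:
        self.instances = instances
        self.state_tasks: dict[str, str] = {}  # task_id -> instance (events applied)
        self.pending: dict[str, str] = {}      # task_id -> instance (events in flight)

    def in_flight_counts(self) -> dict[str, int]:
        counts = {name: 0 for name in self.instances}
        for instance in self.state_tasks.values():
            counts[instance] += 1
        for task_id, instance in self.pending.items():
            # Skip tasks already visible in state to avoid double counting.
            if task_id not in self.state_tasks:
                counts[instance] += 1
        return counts

    def assign(self, task_id: str) -> str:
        counts = self.in_flight_counts()
        target = min(counts, key=lambda name: counts[name])
        self.pending[task_id] = target  # counted immediately, before any event
        return target

    def apply_task_created(self, task_id: str) -> None:
        # The event processor catches up: move the task from pending to state.
        self.state_tasks[task_id] = self.pending.pop(task_id)

    def finish(self, task_id: str) -> None:
        # Cancelled or finished tasks are dropped from both maps.
        self.state_tasks.pop(task_id, None)
        self.pending.pop(task_id, None)


tracker = PendingTracker(["node1", "node2"])
picks = [tracker.assign(f"task-{n}") for n in range(4)]
print(picks)  # ['node1', 'node2', 'node1', 'node2']
```

In this model the burst alternates between the two instances even though no event has been applied yet, and the counts stay the same once the events arrive because each task moves from the pending map to the state map rather than being counted twice.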

Test Plan

Manual Testing

Hardware: 2 Mac minis, both running Gemma 4.
Start two instances of Gemma 4.
Send a few concurrent /v1/chat/completions requests.
Before: most requests go to instance 1.
After: requests are split roughly 2/2.
Repeat with HTTP requests sent to both nodes. The distribution should stay balanced.

Automated Testing

Existing src/exo/master/tests/test_master.py still covers the single-instance TextGeneration path.

feat: implement in-flight task counting and optimize instance assignment handling

df05dcc

RWL-Dittrich force-pushed the feature/round-robin-instances branch from 20300c1 to df05dcc on June 8, 2026 06:09

RWL-Dittrich wants to merge 1 commit into exo-explore:main from RWL-Dittrich:feature/round-robin-instances
