Skip to content

[slice] Check if partition ids are in use before creating slice#1022

Merged
gabesaba merged 10 commits intoAI-Hypercomputer:slice-mainfrom
pajakd:instersection
Feb 5, 2026
Merged

[slice] Check if partition ids are in use before creating slice#1022
gabesaba merged 10 commits intoAI-Hypercomputer:slice-mainfrom
pajakd:instersection

Conversation

@pajakd
Copy link
Collaborator

@pajakd pajakd commented Feb 4, 2026

Description

Issue

Testing

@pajakd pajakd marked this pull request as ready for review February 5, 2026 11:33
@pajakd pajakd changed the title [slice] [WIP] check if partition ids are in use before creating slice [slice] Check if partition ids are in use before creating slice Feb 5, 2026
}

if conflictingID, found := r.findConflictingPartitionID(existingSlicesByName, slicesToCreate); found {
msg := fmt.Sprintf("Partition ID %q is already used by an existing Slice", conflictingID)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's log this branch

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 641 to 643
if usedPartitionIDs.Has(id) {
return id, true
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we log each overlap? the oldSlice, the newSlice, and the partition ID? Then just return the first match (or return all the matches, and populate this in AC message)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Returning all the conflicts.

Comment on lines 109 to 114
equateErrors := cmp.Comparer(func(x, y error) bool {
if x == nil || y == nil {
return x == nil && y == nil
}
return errors.Is(x, y) || errors.Is(y, x) || x.Error() == y.Error()
})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you defined this recently in another PR, right? Should this be a test utility?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, moved

log := ctrl.LoggerFrom(ctx)
log.V(3).Info("Partition ID collisions detected, not creating slices, evicting workload")
ac.State = kueue.CheckStateRetry
ac.RequeueAfterSeconds = ptr.To(int32(10))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this retry be within the Kueue Slice Plugin, rather than going back to Kueue via admission check? Then we can make the requeue time much shorter (order of second or two). My only concern would be, could we get stuck in this state indefinitely?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only concern would be, could we get stuck in this state indefinitely?

and if so, do we need a timeout mechanism, like we have with the activation timeout

@gabesaba
Copy link
Collaborator

gabesaba commented Feb 5, 2026

@gabesaba gabesaba merged commit f548c1d into AI-Hypercomputer:slice-main Feb 5, 2026
12 of 17 checks passed
@pajakd pajakd deleted the instersection branch February 5, 2026 14:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants