[slice] Check if partition ids are in use before creating slice#1022
[slice] Check if partition ids are in use before creating slice#1022gabesaba merged 10 commits intoAI-Hypercomputer:slice-mainfrom
Conversation
| } | ||
|
|
||
| if conflictingID, found := r.findConflictingPartitionID(existingSlicesByName, slicesToCreate); found { | ||
| msg := fmt.Sprintf("Partition ID %q is already used by an existing Slice", conflictingID) |
| if usedPartitionIDs.Has(id) { | ||
| return id, true | ||
| } |
There was a problem hiding this comment.
Can we log each overlap? the oldSlice, the newSlice, and the partition ID? Then just return the first match (or return all the matches, and populate this in AC message)?
There was a problem hiding this comment.
Done. Returning all the conflicts.
| equateErrors := cmp.Comparer(func(x, y error) bool { | ||
| if x == nil || y == nil { | ||
| return x == nil && y == nil | ||
| } | ||
| return errors.Is(x, y) || errors.Is(y, x) || x.Error() == y.Error() | ||
| }) |
There was a problem hiding this comment.
you defined this recently in another PR, right? Should this be a test utility?
| log := ctrl.LoggerFrom(ctx) | ||
| log.V(3).Info("Partition ID collisions detected, not creating slices, evicting workload") | ||
| ac.State = kueue.CheckStateRetry | ||
| ac.RequeueAfterSeconds = ptr.To(int32(10)) |
There was a problem hiding this comment.
Should this retry be within the Kueue Slice Plugin, rather than going back to Kueue via admission check? Then we can make the requeue time much shorter (order of second or two). My only concern would be, could we get stuck in this state indefinitely?
There was a problem hiding this comment.
My only concern would be, could we get stuck in this state indefinitely?
and if so, do we need a timeout mechanism, like we have with the activation timeout
|
lgtm Leaving https://github.com/AI-Hypercomputer/xpk/pull/1022/changes#r2769489909 as a follow-up |
Description
Issue
Testing