Conversation
Add new interface for pod group, including scheduling request, interal status, and scheduled placement.
Add conversion, defaulting, and validation for v2 pod scheduling spec, update ExtractPodSchedulingSpec accordingly.
Convert AlgoAffinityGroup to PodGroupSchedulingStatus before intra VC scheduler.
db3d1ec to
e56fe3b
Compare
Add cell type to level mapping for intra-vc.
Fix pod group placement type in scheduling status.
* adjust virtual and physical placements for scheduling results * update pod binding info for tree structure * expose pod group in api through webserver
* pick best affinity cells * convert cells to leaf cells in placement
Fix legacy unit tests.
Codecov Report
@@ Coverage Diff @@
## master #34 +/- ##
==========================================
- Coverage 88.46% 80.68% -7.79%
==========================================
Files 8 10 +2
Lines 1890 2205 +315
==========================================
+ Hits 1672 1779 +107
- Misses 166 334 +168
- Partials 52 92 +40
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
Fix bugs in new v2 test cases, including: * calculate leaf cell numbers per pod when creating new PodGroupSchedulingStatus * get cell chain through within cell type (no higher than node) instead of leaf cell type * sort cells by virtual address in cluster view
9fd0560 to
ae26632
Compare
Fix comments.
pkg/api/v2/types.go
Outdated
|
|
||
| // PodGroupMemberCellSpec represents cell spec for each pod in pod group. | ||
| type PodGroupMemberCellSpec struct { | ||
| CellType api.CellType `yaml:"cellType"` // cell type to be used in pod, differnt group can have differnt cell typs in the same chain, TODO: not support multiple chains yet |
pkg/api/v2/types.go
Outdated
| Name string `yaml:"name"` // pod group name | ||
| WithinOneCell api.CellType `yaml:"withinOneCell"` // within cell for all cells in current group, e.g., two GPU-cell within one numa-cell, two node-cell within one rack-cell | ||
| Pods []PodGroupMemberSpec `yaml:"pods"` // pod list for current group, any two pods are gang with each other in the groupS | ||
| ChildGroups []*PodGroupSpec `yaml:"childGroups"` // child group in the hierarchical structure |
There was a problem hiding this comment.
Pods and ChildGroups are order sensitive? #Pending
|
Only backtrace to parent pod group may not enough to find the whole pod root group placement if it exist in theory. We can add more pod group sorting criterias, such as if it has So given pod group trees are generally small trees with few nodes, backtrace to previous bound pod group (i.e. parent or left sibling node) should not too time consuming, which is enough to find the whole pod root group placement if it exist in theory. In reply to: 918971150 |
pkg/algorithm/sku_scheduler.go
Outdated
| // virtual cluster view | ||
| type skuClusterView []*skuCell | ||
|
|
||
| // skuScheduler can schedule a pod group of pods with arbitrary cell types on a cluster view. |
There was a problem hiding this comment.
pkg/algorithm/sku_scheduler.go
Outdated
| // skuScheduler can schedule a pod group of pods with arbitrary cell types on a cluster view. | ||
| // It first tries to place pod group without preemption, then enable preemption if schedule failed. | ||
| // For each try, it will find within cells for each pod group, then find cells for each pods with better affinity. | ||
| type skuScheduler struct { |
pkg/algorithm/hived_algorithm.go
Outdated
| cellChains map[string][]CellChain | ||
| // map each level in a chain to the specific cell type name | ||
| cellTypes map[CellChain]map[CellLevel]api.CellType | ||
| // map each type name in a chain to the specific cell level |
| placement, failedReason = PodGroupPlacement{}, "no matched cells in vc" | ||
|
|
||
| cv := skuClusterView{nil} | ||
| if level, ok := s.cellLevels[podGroup.WithinOneCell]; ok { |
|
|
||
| func (s *skuScheduler) findCellsForPods( | ||
| pods []apiv2.PodGroupMemberSpec, | ||
| priority CellPriority, |
There was a problem hiding this comment.
early return if pods nil or empty? #Pending
| placement, failedReason = PodGroupPlacement{}, "no matched cells in vc" | ||
|
|
||
| cv := skuClusterView{nil} | ||
| if level, ok := s.cellLevels[podGroup.WithinOneCell]; ok { |
pkg/algorithm/sku_scheduler.go
Outdated
| } | ||
|
|
||
| func (s *skuScheduler) createSkuClusterView( | ||
| within *skuCell, |
There was a problem hiding this comment.
Comment the func return and its param relation #Pending
pkg/algorithm/sku_scheduler.go
Outdated
| } | ||
|
|
||
| func (s *skuScheduler) createSkuClusterView( | ||
| within *skuCell, |
pkg/algorithm/sku_scheduler.go
Outdated
|
|
||
| // skuCell type for selected level cell in virtual cluster view | ||
| type skuCell struct { | ||
| cell Cell // within cell level, maybe higher or lower than node level |
pkg/algorithm/sku_scheduler.go
Outdated
| return false | ||
| } | ||
|
|
||
| func ancestorNoHigherThanLevel(withinLevel CellLevel, cell Cell) Cell { |
| for _, c := range s.chainCellList[l] { | ||
| if (within != nil && !isAncestor(within.cell, c)) || | ||
| cv.contains(ancestorNoHigherThanLevel(withinLevel, c)) { | ||
| continue |
There was a problem hiding this comment.
Will "cv.contains() continue" cause some cells with more free gpus are excluded in the final cv? #Closed
| "have to use at least one bad cell %v", withinCell.address) | ||
| } | ||
| podPlacement := s.findCellsForSinglePod(pods[podIndex], priority, withinCell, allocatedCells) | ||
| if podPlacement == nil { |
There was a problem hiding this comment.
schedule pods also need backtrace to previous bound pod? otherwise, we need to prove it will not miss solution #Pending
There was a problem hiding this comment.
since we have changed podGroup.Pods to podGroup.Pod, all pods here with different podIndex are the same now
pkg/algorithm/sku_scheduler.go
Outdated
| func (s *skuScheduler) findCellsForSinglePod( | ||
| pod apiv2.PodGroupMemberSpec, | ||
| priority CellPriority, | ||
| within *skuCell, |
| } | ||
| podPlacement := s.findCellsForSinglePod(pods[podIndex], priority, withinCell, allocatedCells) | ||
| if podPlacement == nil { | ||
| withinCellIndex++ |
There was a problem hiding this comment.
since we have changed podGroup.Pods to podGroup.Pod, all pods here with different podIndex are the same, no need to reset withinCellIndex anymore
pkg/algorithm/sku_scheduler.go
Outdated
| freeCells, preemptibleCells = getFreeCellsAtLevel( | ||
| within.cell, currLevel, priority, allocatedCells, freeCells, preemptibleCells) | ||
| // free leaf cells will be used first (before preemptible leaf cells) | ||
| freeCells = append(freeCells, preemptibleCells...) |
| } | ||
|
|
||
| func (s *skuScheduler) findCellsForSinglePod( | ||
| pod apiv2.PodGroupMemberSpec, |
There was a problem hiding this comment.
Comment for single pod we still best effort find nearest cells for it (old behaiour) #Pending
pkg/algorithm/sku_scheduler.go
Outdated
| currLevel := s.cellLevels[pod.CellsPerPod.CellType] | ||
| freeCells, preemptibleCells := CellList{}, CellList{} | ||
| freeCells, preemptibleCells = getFreeCellsAtLevel( | ||
| within.cell, currLevel, priority, allocatedCells, freeCells, preemptibleCells) |
There was a problem hiding this comment.
First use cell that does not need to be got from higher level cell split? #Pending
There was a problem hiding this comment.
sort cells when splitting in getFreeCellsAtLevel so that cells need higher level split can be used later
pkg/algorithm/hived_algorithm.go
Outdated
| leafCellChains map[string][]CellChain | ||
| // map each within cell type to all chains that contain this type | ||
| cellChains map[string][]CellChain | ||
| // map each level in a chain to the specific cell type name |
| // 1. cell health (prefer healthy) | ||
| // 2. cell level (prefer lower) | ||
| // 3. usedLeafCellNumAtPriority (prefer higher) | ||
| // 4. usedLeafCellNumHigherPriority (prefer lower) |
There was a problem hiding this comment.
if greedy search, may need use same order to sort here? #Pending
There was a problem hiding this comment.
Comment usedLeafCellNumAtPriority different meaning in different crossPriorityPack case
pkg/api/v2/types.go
Outdated
| Name string `yaml:"name"` // pod group name | ||
| WithinOneCell api.CellType `yaml:"withinOneCell"` // within cell for all cells in current group, e.g., two GPU-cell within one numa-cell, two node-cell within one rack-cell | ||
| Pods []PodGroupMemberSpec `yaml:"pods"` // pod list for current group, any two pods are gang with each other in the groupS | ||
| ChildGroups []*PodGroupSpec `yaml:"childGroups"` // child group in the hierarchical structure |
There was a problem hiding this comment.
Seems no need Pod list, and then pods search no need back trace #Pending
There was a problem hiding this comment.
changed to PodGroup.Pod, reserve Pods field to flatten Pod for internal structure
Update according to comments.
Fix GitHub Action config.
Change `PodGroup.Pods` to `PodGroup.Pod` to resolve backtracing search issue.
* early return check for pods and withinOneCell * sortcells when splitting in getFreeCellsAtLevel * add comments for sorting usedLeafCellNumAtPriority
Update config in examples accordingly.
Skip build for legacy topology aware scheduler through build tag.

New cell as SKU feature in intra-vc scheduler, work items: