Releases: coldzerofear/vgpu-manager
v0.4.1
What's Changed
- Bump golang.org/x/net from 0.32.0 to 0.38.0 by @dependabot in #22
New Contributors
- @dependabot made their first contribution in #22
Full Changelog: v0.4.0...v0.4.1
v0.4.0
Feat
- Import Kubernetes client auth plugins
- support compatibility with open-gpu-kernel-modules
- webhook supports setting a default runtimeClassName for vGPU pods
Fix
- use more accurate cgroup driver recognition
- multiple detail adjustments and bug fixes
- CUDA API compatibility fix
Optimization
- container runtime compatibility optimization
- optimize command-line parameter help information
- use hidden directories to mount host proc and host cgroup paths
Full Changelog: v0.3.1...v0.4.0
v0.3.1
Feat
- memory scaling allows reporting more device memory
- enhanced compatibility of dynamic library API
- metrics server adds a rate limiter
Fix
- fix dynamic library API error
Optimization
- optimize the core monitoring thread of the dynamic library
Full Changelog: v0.3.0...v0.3.1
v0.3.0
Feat
- webhook component supports setting default scheduling policies for pods
- scheduler extender component adds the SerialBindNode feature gate
- support optimal allocation mode based on device link topology
- webhook component supports setting default device topology mode
Fix
- fix configurable host manager dir path
- fix index out of range when reading nvidia-smi topo NUMA info
- fix device allocation exceeding the actual limit when scheduling multi-container pods #4
- fix inaccurate calculation of scheduler node score #5
- fix container ID extraction for pods with shareProcessNamespace enabled
- fix the issue with the interception library in CUDA version 12.8
Optimization
- optimize the NUMA topology algorithm to keep allocations NUMA-aligned as far as possible when allocating multiple GPUs
- optimize the multi node score performance of the scheduler
- optimize scheduler node resource collection
- enhance the interception effect of the library
Full Changelog: v0.2.1...v0.3.0
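The NUMA optimization above is described only at a high level. The following is a minimal sketch of one way to prefer GPUs on a single NUMA node when several are requested. The `gpu` type, the `pickAligned` function, and the fallback order are assumptions made for illustration and do not reflect the project's actual algorithm.

```go
package main

import (
	"fmt"
	"sort"
)

// gpu is a hypothetical device record; the field names are illustrative only.
type gpu struct {
	ID   int
	NUMA int
}

// pickAligned returns n GPUs, preferring a single NUMA node that can satisfy
// the whole request. If no single node can, it fills from the nodes with the
// most free GPUs first so the allocation spans as few nodes as possible.
func pickAligned(free []gpu, n int) ([]gpu, bool) {
	if n <= 0 || n > len(free) {
		return nil, false
	}
	byNode := map[int][]gpu{}
	for _, g := range free {
		byNode[g.NUMA] = append(byNode[g.NUMA], g)
	}
	nodes := make([]int, 0, len(byNode))
	for node := range byNode {
		nodes = append(nodes, node)
	}
	// Sort by free count ascending (ties by node index) so the first node
	// that fits is the tightest fit.
	sort.Slice(nodes, func(i, j int) bool {
		a, b := len(byNode[nodes[i]]), len(byNode[nodes[j]])
		if a != b {
			return a < b
		}
		return nodes[i] < nodes[j]
	})
	for _, node := range nodes {
		if len(byNode[node]) >= n {
			return byNode[node][:n], true
		}
	}
	// Fallback: no single node fits, so take from the largest nodes first.
	picked := make([]gpu, 0, n)
	for i := len(nodes) - 1; i >= 0 && len(picked) < n; i-- {
		for _, g := range byNode[nodes[i]] {
			if len(picked) == n {
				break
			}
			picked = append(picked, g)
		}
	}
	return picked, true
}

func main() {
	free := []gpu{{0, 0}, {1, 0}, {2, 1}, {3, 1}, {4, 1}}
	for _, n := range []int{2, 3, 4} {
		got, ok := pickAligned(free, n)
		fmt.Println(n, ok, got)
	}
}
```

Choosing the tightest-fitting node leaves larger nodes intact for later multi-GPU requests; a production allocator would also weigh link topology and per-device load.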
v0.2.1
Feat
- Provide helm charts deployment method
- Provide webhook admission service component
Fix
- Fix task interruption caused by device plugin initialization and installation #1
Full Changelog: v0.2...v0.2.1
v0.2
This version includes multiple improvements over v0.1.
Feat
- Dynamically balance idle computing power across devices
- GPU device uses virtual memory after exceeding the memory limit
- Reschedule pods whose device allocation failed
- Include the memory used for graphic calculations in the memory limit
Fix
- Container parallel device allocation error #2
- Frequent calls to ListPodresource fail #3
- Fix warnings during compilation of CUDA interception library
- Fix some errors in the CUDA interception library
Optimization
- Optimize the efficiency of CUDA interception library
- Optimize the efficiency of core limiting across multiple GPU cards
- Adjust vGPU monitoring metrics
- Optimize GPU exclusive mode configuration
- Optimize the success rate of device allocation
- Move some feature switches to feature gate
- Refactoring to reduce some redundant code
Full Changelog: v0.1...v0.2
v0.1
Feat
- Support CUDA core and memory virtualization
- Support cgroupv1 and cgroupv2
- Support CUDA 12
- Support dual scheduling strategy
- Hard isolation of resources: resource sizes cannot be changed inside the container
- Support multiple vGPU resource monitoring metrics
Full Changelog: https://github.com/coldzerofear/vgpu-manager/commits/v0.1