Releases · coldzerofear/vgpu-manager · GitHub

21 Apr 05:41

coldzerofear

v0.4.1 Latest

What's Changed

Bump golang.org/x/net from 0.32.0 to 0.38.0 by @dependabot in #22

New Contributors

@dependabot made their first contribution in #22

Full Changelog: v0.4.0...v0.4.1


10 Apr 10:41

coldzerofear

v0.4.0

Feat

Import Kubernetes client auth plugins
support compatibility with open-gpu-kernel-modules
webhook supports setting a default runtimeClassName for vgpu pods
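
To illustrate the runtimeClassName default, here is a minimal sketch of how a mutating webhook might build its patch. It is not the project's implementation: the `podSpec` and `patchOp` types, the `defaultRuntimeClass` helper, and the `nvidia` class name are all hypothetical, and the check for whether a pod is a vgpu pod is left out. A real webhook would work with `corev1.PodSpec` from `k8s.io/api`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// podSpec holds only the field this sketch needs (hypothetical type).
type podSpec struct {
	RuntimeClassName *string `json:"runtimeClassName,omitempty"`
}

// patchOp is a single JSON Patch (RFC 6902) operation.
type patchOp struct {
	Op    string `json:"op"`
	Path  string `json:"path"`
	Value any    `json:"value,omitempty"`
}

// defaultRuntimeClass returns a JSON patch that sets spec.runtimeClassName
// when the pod does not specify one. It returns nil when no default is
// configured or the pod already names a runtime class.
func defaultRuntimeClass(spec podSpec, def string) ([]byte, error) {
	if def == "" || spec.RuntimeClassName != nil {
		return nil, nil
	}
	ops := []patchOp{{Op: "add", Path: "/spec/runtimeClassName", Value: def}}
	return json.Marshal(ops)
}

func main() {
	// "nvidia" is an assumed example value, not a documented default.
	patch, err := defaultRuntimeClass(podSpec{}, "nvidia")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(patch))
}
```

Leaving an explicit runtimeClassName untouched means the default only fills a gap and never overrides what the pod author chose.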

Fix

use more accurate cgroup driver recognition
multiple detail adjustments and bug fixes
cuda api compatibility fix

Optimization

container runtime compatibility optimization
optimize the help information for command-line parameters
use hidden directories to mount host proc and host cgroup paths

Full Changelog: v0.3.1...v0.4.0

20 Mar 08:15

coldzerofear

v0.3.1

Feat

memory scaling allows reporting more device memory
enhanced compatibility of dynamic library API
metrics server adds a rate limiter

Fix

fix dynamic library API error

Optimization

optimize the core monitoring thread of the dynamic library

Full Changelog: v0.3.0...v0.3.1

13 Mar 09:48

coldzerofear

v0.3.0

Feat

webhook component supports setting default scheduling policies for pods
scheduler extender component adds the SerialBindNode feature gate
support optimal allocation mode based on device link topology
webhook component supports setting default device topology mode

Fix

fix configurable host manager dir path
fix index out of range when reading NUMA info from nvidia-smi topo
fix device allocation exceeding the actual limit when scheduling multi-container pods #4
fix inaccurate calculation of scheduler node score #5
fix container ID extraction for pods with shareProcessNamespace enabled
fix the issue with the interception library in CUDA version 12.8

Optimization

optimize the NUMA topology algorithm to keep allocations NUMA-aligned as far as possible when allocating multiple GPUs
optimize the multi-node scoring performance of the scheduler
optimize scheduler node resource collection
improve the effectiveness of the interception library

Full Changelog: v0.2.1...v0.3.0

17 Feb 15:32

coldzerofear

v0.2.1

Feat

Provide helm charts deployment method
Provide webhook admission service component

Fix

Fix task interruption caused by device plugin initialization and installation #1

Full Changelog: v0.2...v0.2.1

13 Feb 08:21

coldzerofear

v0.2

This version includes multiple improvements over v0.1.

Feat

Dynamically balance the idle computing power of devices
GPU devices use virtual memory after exceeding the memory limit
Reschedule pods whose device allocation failed
Include the memory used for graphics computation in the memory limit

Fix

Container parallel device allocation error #2
Frequent calls to ListPodresource fail #3
Fix warnings during compilation of CUDA interception library
Fix some errors in the CUDA interception library

Optimization

Optimize the efficiency of CUDA interception library
Optimize the efficiency of core limiting across multiple GPU cards
Adjust vGPU monitoring metrics
Optimize GPU exclusive mode configuration
Optimize the success rate of device allocation
Move some feature switches to feature gate
Refactoring to reduce some redundant code

Full Changelog: v0.1...v0.2

20 Jan 13:25

coldzerofear

v0.1

Feat

Support CUDA core and memory virtualization
Support cgroupv1 and cgroupv2
Support CUDA 12
Support dual scheduling strategy
Hard isolation of resources: resource sizes cannot be changed inside the container
Support multiple vGPU resource monitoring metrics

Full Changelog: https://github.com/coldzerofear/vgpu-manager/commits/v0.1
