Releases: coldzerofear/vgpu-manager

v0.4.1

21 Apr 05:41

What's Changed

New Contributors

Full Changelog: v0.4.0...v0.4.1

v0.4.0

10 Apr 10:41

Feat

  • Import Kubernetes client auth plugins
  • support compatibility with open-gpu-kernel-modules
  • webhook supports setting a default runtimeClassName for vGPU pods
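A default `runtimeClassName` of this kind is typically applied by a mutating admission webhook that returns a JSON Patch (RFC 6902). The sketch below illustrates that idea and is not vgpu-manager's code. It assumes the pod has already been identified as a vGPU pod. The default class name `nvidia` and the helper `runtime_class_patch` are hypothetical. The patch only adds the field when the pod does not already set one.

```python
import json
from typing import Any, Dict, List

# Hypothetical default; the real value must match a RuntimeClass in your cluster.
DEFAULT_RUNTIME_CLASS = "nvidia"


def runtime_class_patch(pod: Dict[str, Any],
                        default: str = DEFAULT_RUNTIME_CLASS) -> List[Dict[str, Any]]:
    """Return JSON Patch (RFC 6902) operations that set spec.runtimeClassName
    only when the pod does not already specify one."""
    spec = pod.get("spec", {})
    if spec.get("runtimeClassName"):
        return []  # respect an explicit choice made by the pod author
    return [{"op": "add", "path": "/spec/runtimeClassName", "value": default}]


if __name__ == "__main__":
    pod = {"metadata": {"name": "demo"}, "spec": {"containers": [{"name": "c"}]}}
    print(json.dumps(runtime_class_patch(pod)))
```

A user-specified `runtimeClassName` is left untouched, so the default only affects pods that omit the field.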

Fix

  • use more accurate cgroup driver recognition
  • multiple detail adjustments and bug fixes
  • CUDA API compatibility fix
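Cgroup driver recognition usually starts from the layout of the cgroup hierarchy. The following sketch shows one common heuristic, stated as an assumption and not as the project's implementation. cgroup v2 is detected by the presence of `cgroup.controllers` at the cgroup root. The driver is guessed from whether pod cgroup paths use systemd `.slice`/`.scope` unit names (systemd driver) or plain directories such as `/kubepods/burstable/pod<uid>` (cgroupfs driver). Both function names are hypothetical.

```python
import os


def cgroup_version(root: str = "/sys/fs/cgroup") -> int:
    """cgroup v2 (unified hierarchy) exposes cgroup.controllers at its root;
    on v1 or hybrid setups that file is absent from the root mount."""
    return 2 if os.path.exists(os.path.join(root, "cgroup.controllers")) else 1


def guess_cgroup_driver(cgroup_path: str) -> str:
    """Heuristic: the systemd driver names pod cgroups as .slice/.scope units,
    while cgroupfs uses plain directory names like /kubepods/burstable/pod<uid>."""
    parts = [p for p in cgroup_path.split("/") if p]
    if any(p.endswith(".slice") or p.endswith(".scope") for p in parts):
        return "systemd"
    return "cgroupfs"


if __name__ == "__main__":
    print(cgroup_version(), guess_cgroup_driver("/kubepods/burstable/pod1234/abcd"))
```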

Optimization

  • container runtime compatibility optimization
  • optimize command-line parameter help information
  • use hidden directories to mount host proc and host cgroup paths

Full Changelog: v0.3.1...v0.4.0

v0.3.1

20 Mar 08:15

Feat

  • memory scaling allows reporting more device memory
  • enhance dynamic library API compatibility
  • metrics server adds a rate limiter
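A rate limiter in front of a metrics endpoint is often a token bucket. The sketch below is one minimal way to build it, assuming a single per-process limiter with hypothetical `rate` and `burst` parameters. These notes do not describe vgpu-manager's actual limiter or its settings. The clock is injectable so the behavior can be tested without sleeping.

```python
import time
from typing import Callable


class TokenBucket:
    """Minimal token-bucket limiter: `rate` tokens are added per second,
    up to `burst`; each allowed request consumes one token."""

    def __init__(self, rate: float, burst: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.burst = burst
        self.clock = clock
        self.tokens = float(burst)  # start full so an initial burst is allowed
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A metrics handler would call `allow()` once per scrape and answer with HTTP 429 when it returns `False`.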

Fix

  • fix dynamic library API error

Optimization

  • optimize the core monitoring thread of the dynamic library

Full Changelog: v0.3.0...v0.3.1

v0.3.0

13 Mar 09:48

Feat

  • webhook component supports setting default scheduling policies for pods
  • scheduler extender component adds the SerialBindNode feature gate
  • support optimal allocation mode based on device link topology
  • webhook component supports setting default device topology mode

Fix

  • fix configurable host manager dir path
  • fix index out of range when parsing NUMA info from nvidia-smi topo
  • fix device allocation exceeding the actual limit when scheduling pods with multiple containers #4
  • fix inaccurate calculation of scheduler node score #5
  • fix container ID extraction for pods with shareProcessNamespace enabled
  • fix an interception library issue with CUDA 12.8
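With `shareProcessNamespace: true`, every container in a pod sees the processes of its sibling containers (and the pause process). A component that walks `/proc` therefore cannot assume a PID belongs to the container it is inspecting; it has to resolve each PID's own container ID. The sketch below assumes container IDs are 64-character hex strings at the end of the cgroup path, as containerd and Docker commonly write them. The helper name is hypothetical and this is not the project's code.

```python
import re
from typing import Optional

# A 64-hex-character ID at the end of the path, optionally wrapped in a
# systemd unit name such as "cri-containerd-<id>.scope" or "docker-<id>.scope".
_CONTAINER_ID = re.compile(r"(?:^|[/-])([0-9a-f]{64})(?:\.scope)?$")


def container_id_from_cgroup(cgroup_file_text: str) -> Optional[str]:
    """Return the container ID found in the contents of /proc/<pid>/cgroup.

    Each line has the form "hierarchy-ID:controllers:path"; cgroup v2 has a
    single "0::<path>" line. Returns None if no line carries an ID.
    """
    for line in cgroup_file_text.splitlines():
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue
        match = _CONTAINER_ID.search(parts[2])
        if match:
            return match.group(1)
    return None
```

Applied per PID, this keeps processes from sibling containers from being attributed to the wrong container.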

Optimization

  • optimize the NUMA topology algorithm to keep allocated GPUs on the same NUMA node as much as possible when allocating multiple GPUs
  • optimize scheduler performance when scoring multiple nodes
  • optimize scheduler node resource collection
  • improve the effectiveness of the interception library
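NUMA-aware multi-GPU allocation can be sketched as: prefer a single NUMA node that can hold the whole request, and otherwise span as few nodes as possible. This is an illustrative best-fit heuristic that assumes each free GPU is already tagged with its NUMA node (for example from `nvidia-smi topo`). It is not vgpu-manager's algorithm, and it ignores the separate link-topology allocation mode listed above. The function `pick_gpus` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def pick_gpus(free: Dict[int, int], count: int) -> Optional[List[int]]:
    """Pick `count` GPUs from `free` (GPU index -> NUMA node), preferring a
    single NUMA node and otherwise spanning as few nodes as possible."""
    if count <= 0 or count > len(free):
        return None
    by_node: Dict[int, List[int]] = defaultdict(list)
    for gpu, node in sorted(free.items()):
        by_node[node].append(gpu)
    # Best fit: the smallest node that can hold the whole request, leaving
    # larger nodes available for bigger requests.
    fitting = [gpus for gpus in by_node.values() if len(gpus) >= count]
    if fitting:
        return min(fitting, key=len)[:count]
    # Otherwise fill greedily from the largest nodes to minimize node spread.
    chosen: List[int] = []
    for gpus in sorted(by_node.values(), key=len, reverse=True):
        chosen.extend(gpus[: count - len(chosen)])
        if len(chosen) == count:
            break
    return chosen
```

For example, with `free = {0: 0, 1: 0, 2: 1, 3: 1, 4: 1}`, a request for 3 GPUs returns `[2, 3, 4]`, all on NUMA node 1.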

Full Changelog: v0.2.1...v0.3.0

v0.2.1

17 Feb 15:32

Feat

  • Provide Helm chart deployment
  • Provide a webhook admission service component

Fix

  • Fix task interruption caused by device plugin initialization and installation #1

Full Changelog: v0.2...v0.2.1

v0.2

13 Feb 08:21

This version includes multiple improvements over v0.1.

Feat

  • Dynamically balance idle computing power across devices
  • GPU device uses virtual memory after exceeding memory limit
  • Reschedule pods whose device allocation failed
  • Include memory used for graphics computation in the memory limit

Fix

  • Fix device allocation errors when allocating devices to containers in parallel #2
  • Fix failures when ListPodresource is called frequently #3
  • Fix warnings during compilation of CUDA interception library
  • Fix some errors in the CUDA interception library

Optimization

  • Optimize the efficiency of the CUDA interception library
  • Optimize the efficiency of core limiting across multiple GPUs
  • Adjust vGPU monitoring metrics
  • Optimize GPU exclusive mode configuration
  • Improve the success rate of device allocation
  • Move some feature switches to feature gates
  • Refactor to reduce redundant code

Full Changelog: v0.1...v0.2

v0.1

20 Jan 13:25

Feat

  • Support CUDA core and memory virtualization
  • Support cgroupv1 and cgroupv2
  • Support CUDA 12
  • Support dual scheduling strategy
  • Hard resource isolation: resource limits cannot be changed from inside the container
  • Support multiple vGPU resource monitoring metrics

Full Changelog: https://github.com/coldzerofear/vgpu-manager/commits/v0.1