torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 40.42 GiB. GPU 0 has a total capacity of 44.40 GiB of which 33.26 GiB is free. Including non-PyTorch memory, this process has 11.13 GiB memory in use. Of the allocated memory 10.25 GiB is allocated by PyTorch, and 394.50 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
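A quick way to read the numbers in this message is to compare the requested size with what is free. The sketch below is a hypothetical helper, not part of PyTorch: the names `parse_oom`, `PATTERNS`, and `UNIT_TO_GIB` are made up for illustration, and the regular expressions assume the message wording shown above (abbreviated here to the sentences that carry sizes). It suggests that the single 40.42 GiB request exceeds the 33.26 GiB that is free, and that the 394.50 MiB of reserved-but-unallocated memory is far too small to close the gap, so `expandable_segments:True` alone is unlikely to help in this case.

```python
import re

# Size units that can appear in the message, expressed in GiB.
UNIT_TO_GIB = {"KiB": 1 / 1024**2, "MiB": 1 / 1024, "GiB": 1.0}
SIZE = r"([\d.]+) (KiB|MiB|GiB)"

# Hypothetical field names; the patterns follow the wording of the message above.
PATTERNS = {
    "requested": rf"Tried to allocate {SIZE}",
    "total": rf"total capacity of {SIZE}",
    "free": rf"of which {SIZE} is free",
    "reserved_unallocated": rf"{SIZE} is reserved by PyTorch but unallocated",
}


def parse_oom(message: str) -> dict[str, float]:
    """Return the sizes quoted in the OOM message, converted to GiB."""
    sizes = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, message)
        if match is None:
            raise ValueError(f"could not find {name!r} in message")
        sizes[name] = float(match.group(1)) * UNIT_TO_GIB[match.group(2)]
    return sizes


# Abbreviated copy of the error text: only the sentences that quote sizes.
MESSAGE = (
    "CUDA out of memory. Tried to allocate 40.42 GiB. GPU 0 has a total "
    "capacity of 44.40 GiB of which 33.26 GiB is free. Of the allocated "
    "memory 10.25 GiB is allocated by PyTorch, and 394.50 MiB is reserved "
    "by PyTorch but unallocated."
)

sizes = parse_oom(MESSAGE)
# How much more memory the request needs than is currently free.
shortfall = sizes["requested"] - sizes["free"]
# Could releasing the cached (reserved but unallocated) blocks cover the gap?
covered_by_cache = sizes["reserved_unallocated"] >= shortfall
print(f"shortfall: {shortfall:.2f} GiB, covered by cache: {covered_by_cache}")
```

If the shortfall is larger than the cached memory, as it appears to be here, the more promising direction is usually to shrink the allocation itself (smaller batch, shorter sequence or fewer frames, lower precision) instead of tuning the allocator.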
accelerate 1.13.0
certifi 2026.5.20
charset-normalizer 3.4.7
cuda-bindings 13.3.1
cuda-pathfinder 1.5.5
cuda-toolkit 12.8.0
decord 0.6.0
filelock 3.29.1
fsspec 2026.4.0
hf-xet 1.5.0
huggingface_hub 0.36.2
idna 3.18
Jinja2 3.1.6
lmdb 1.7.5
MarkupSafe 3.0.3
mpmath 1.3.0
networkx 3.6.1
numpy 1.25.0
nvidia-cublas 13.1.0.3
nvidia-cublas-cu12 12.8.4.1
nvidia-cuda-cupti 13.0.85
nvidia-cuda-cupti-cu12 12.8.90
nvidia-cuda-nvrtc 13.0.88
nvidia-cuda-nvrtc-cu12 12.8.93
nvidia-cuda-runtime 13.0.96
nvidia-cuda-runtime-cu12 12.8.90
nvidia-cudnn-cu12 9.10.2.21
nvidia-cudnn-cu13 9.19.0.56
nvidia-cufft 12.0.0.61
nvidia-cufft-cu12 11.3.3.83
nvidia-cufile 1.15.1.6
nvidia-cufile-cu12 1.13.1.3
nvidia-curand 10.4.0.35
nvidia-curand-cu12 10.3.9.90
nvidia-cusolver 12.0.4.66
nvidia-cusolver-cu12 11.7.3.90
nvidia-cusparse 12.6.3.3
nvidia-cusparse-cu12 12.5.8.93
nvidia-cusparselt-cu12 0.7.1
nvidia-cusparselt-cu13 0.8.0
nvidia-nccl-cu12 2.27.3
nvidia-nccl-cu13 2.28.9
nvidia-nvjitlink 13.0.88
nvidia-nvjitlink-cu12 12.8.93
nvidia-nvshmem-cu13 3.4.5
nvidia-nvtx 13.0.85
nvidia-nvtx-cu12 12.8.90
opencv-python-headless 4.11.0.86
packaging 26.2
peft 0.19.1
pillow 11.1.0
pip 25.2
psutil 7.2.2
PyYAML 6.0.3
regex 2026.5.9
requests 2.34.2
safetensors 0.7.0
setuptools 80.9.0
sympy 1.14.0
tokenizers 0.22.2
torch 2.8.0
torchaudio 2.8.0
torchvision 0.23.0
tqdm 4.67.3
transformers 4.57.1
triton 3.4.0
typing_extensions 4.15.0
urllib3 2.7.0
wheel 0.45.1