Expected Behavior
When multiple ComfyUI backends are configured with different GPU IDs on Linux with ROCm, each backend should start and run on its own GPU.
Actual Behavior
The first ComfyUI backend starts, but the second and all subsequent ComfyUI backends fail to start with the error:
RuntimeError: No HIP GPUs are available
Steps to Reproduce
Environment
- OS: Linux (Debian 12)
- GPU: AMD GPUs with ROCm
- SwarmUI Version: 0.9.7.4
- ComfyUI: Self-starting backend with ROCm support
Steps to Reproduce
- Configure a ComfyUI backend with GPU_ID: 0. It works fine.
- Configure a second ComfyUI backend with GPU_ID: 1. It fails to start.
- Check the logs, which show:
RuntimeError: No HIP GPUs are available
Debug Logs
16:58:50.608 [Warning] User local requested edit of backend 3.
16:58:50.616 [Info] ComfyUI backend 3 shutting down...
16:58:52.479 [Init] Initializing backend #3 - ComfyUI Self-Starting...
16:58:52.989 [Init] Self-Start ComfyUI-3 on port 7823 is loading...
16:58:54.641 [Warning] [ComfyUI-3/STDERR] Traceback (most recent call last):
16:58:54.641 [Warning] [ComfyUI-3/STDERR] File "/SwarmUI/dlbackend/ComfyUI/main.py", line 177, in <module>
16:58:54.641 [Warning] [ComfyUI-3/STDERR] import execution
16:58:54.641 [Warning] [ComfyUI-3/STDERR] File "/SwarmUI/dlbackend/ComfyUI/execution.py", line 15, in <module>
16:58:54.641 [Warning] [ComfyUI-3/STDERR] import comfy.model_management
16:58:54.641 [Warning] [ComfyUI-3/STDERR] File "/SwarmUI/dlbackend/ComfyUI/comfy/model_management.py", line 239, in <module>
16:58:54.641 [Warning] [ComfyUI-3/STDERR] total_vram = get_total_memory(get_torch_device()) / (1024 * 1024)
16:58:54.641 [Warning] [ComfyUI-3/STDERR] ^^^^^^^^^^^^^^^^^^
16:58:54.641 [Warning] [ComfyUI-3/STDERR] File "/SwarmUI/dlbackend/ComfyUI/comfy/model_management.py", line 189, in get_torch_device
16:58:54.641 [Warning] [ComfyUI-3/STDERR] return torch.device(torch.cuda.current_device())
16:58:54.641 [Warning] [ComfyUI-3/STDERR] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
16:58:54.641 [Warning] [ComfyUI-3/STDERR] File "/SwarmUI/dlbackend/ComfyUI/venv/lib/python3.11/site-packages/torch/cuda/__init__.py", line 1150, in current_device
16:58:54.641 [Warning] [ComfyUI-3/STDERR] _lazy_init()
16:58:54.641 [Warning] [ComfyUI-3/STDERR] File "/SwarmUI/dlbackend/ComfyUI/venv/lib/python3.11/site-packages/torch/cuda/__init__.py", line 480, in _lazy_init
16:58:54.641 [Warning] [ComfyUI-3/STDERR] torch._C._cuda_init()
16:58:54.641 [Warning] [ComfyUI-3/STDERR] RuntimeError: No HIP GPUs are available
16:58:55.048 [Info] Self-Start ComfyUI-3 unexpectedly exited (ExitCode=unknown) (if something failed, change setting LogLevel to Debug to see why!)
16:58:55.048 [Info] Self-Start ComfyUI-3 had errors before shutdown:
[ComfyUI-3/STDERR] Set cuda device to: 1
[ComfyUI-3/STDERR] Adding extra search path checkpoints /SwarmUI/Models/Stable-Diffusion
[ComfyUI-3/STDERR] Adding extra search path vae /SwarmUI/Models/VAE
[ComfyUI-3/STDERR] Adding extra search path loras /SwarmUI/Models/Lora
[ComfyUI-3/STDERR] Adding extra search path loras /SwarmUI/Models/LyCORIS
[ComfyUI-3/STDERR] Adding extra search path upscale_models /SwarmUI/Models/ESRGAN
[ComfyUI-3/STDERR] Adding extra search path upscale_models /SwarmUI/Models/RealESRGAN
[ComfyUI-3/STDERR] Adding extra search path upscale_models /SwarmUI/Models/SwinIR
[ComfyUI-3/STDERR] Adding extra search path upscale_models /SwarmUI/Models/upscale-models
[ComfyUI-3/STDERR] Adding extra search path upscale_models /SwarmUI/Models/upscale_models
[ComfyUI-3/STDERR] Adding extra search path embeddings /SwarmUI/Models/Embeddings
[ComfyUI-3/STDERR] Adding extra search path embeddings /SwarmUI/Models/embeddings
[ComfyUI-3/STDERR] Adding extra search path hypernetworks /SwarmUI/Models/hypernetworks
[ComfyUI-3/STDERR] Adding extra search path controlnet /SwarmUI/Models/controlnet
[ComfyUI-3/STDERR] Adding extra search path controlnet /SwarmUI/Models/model_patches
[ComfyUI-3/STDERR] Adding extra search path controlnet /SwarmUI/Models/ControlNet
[ComfyUI-3/STDERR] Adding extra search path model_patches /SwarmUI/Models/controlnet
[ComfyUI-3/STDERR] Adding extra search path model_patches /SwarmUI/Models/model_patches
[ComfyUI-3/STDERR] Adding extra search path model_patches /SwarmUI/Models/ControlNet
[ComfyUI-3/STDERR] Adding extra search path clip /SwarmUI/Models/text_encoders
[ComfyUI-3/STDERR] Adding extra search path clip /SwarmUI/Models/clip
[ComfyUI-3/STDERR] Adding extra search path clip /SwarmUI/Models/CLIP
[ComfyUI-3/STDERR] Adding extra search path clip_vision /SwarmUI/Models/clip_vision
[ComfyUI-3/STDERR] Adding extra search path unet /SwarmUI/Models/unet
[ComfyUI-3/STDERR] Adding extra search path diffusion_models /SwarmUI/Models/diffusion_models
[ComfyUI-3/STDERR] Adding extra search path gligen /SwarmUI/Models/gligen
[ComfyUI-3/STDERR] Adding extra search path ipadapter /SwarmUI/Models/ipadapter
[ComfyUI-3/STDERR] Adding extra search path yolov8 /SwarmUI/Models/yolov8
[ComfyUI-3/STDERR] Adding extra search path tensorrt /SwarmUI/Models/tensorrt
[ComfyUI-3/STDERR] Adding extra search path clipseg /SwarmUI/Models/clipseg
[ComfyUI-3/STDERR] Adding extra search path style_models /SwarmUI/Models/style_models
[ComfyUI-3/STDERR] Adding extra search path latent_upscale_models /SwarmUI/Models/latent_upscale_models
[ComfyUI-3/STDERR] Adding extra search path custom_nodes /SwarmUI/src/BuiltinExtensions/ComfyUIBackend/DLNodes
[ComfyUI-3/STDERR] Adding extra search path custom_nodes /SwarmUI/src/BuiltinExtensions/ComfyUIBackend/ExtraNodes
[ComfyUI-3/STDERR] Checkpoint files will always be loaded safely.
[ComfyUI-3/STDERR] /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
[ComfyUI-3/STDERR] /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
[ComfyUI-3/STDERR] /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
[ComfyUI-3/STDERR] /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
[ComfyUI-3/STDERR] Traceback (most recent call last):
[ComfyUI-3/STDERR] File "/SwarmUI/dlbackend/ComfyUI/main.py", line 177, in <module>
[ComfyUI-3/STDERR] import execution
[ComfyUI-3/STDERR] File "/SwarmUI/dlbackend/ComfyUI/execution.py", line 15, in <module>
[ComfyUI-3/STDERR] import comfy.model_management
[ComfyUI-3/STDERR] File "/SwarmUI/dlbackend/ComfyUI/comfy/model_management.py", line 239, in <module>
[ComfyUI-3/STDERR] total_vram = get_total_memory(get_torch_device()) / (1024 * 1024)
[ComfyUI-3/STDERR] ^^^^^^^^^^^^^^^^^^
[ComfyUI-3/STDERR] File "/SwarmUI/dlbackend/ComfyUI/comfy/model_management.py", line 189, in get_torch_device
[ComfyUI-3/STDERR] return torch.device(torch.cuda.current_device())
[ComfyUI-3/STDERR] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[ComfyUI-3/STDERR] File "/SwarmUI/dlbackend/ComfyUI/venv/lib/python3.11/site-packages/torch/cuda/__init__.py", line 1150, in current_device
[ComfyUI-3/STDERR] _lazy_init()
[ComfyUI-3/STDERR] File "/SwarmUI/dlbackend/ComfyUI/venv/lib/python3.11/site-packages/torch/cuda/__init__.py", line 480, in _lazy_init
[ComfyUI-3/STDERR] torch._C._cuda_init()
[ComfyUI-3/STDERR] RuntimeError: No HIP GPUs are available
Other
Root Cause
In NetworkBackendUtils.cs, the HIP_VISIBLE_DEVICES environment variable is set only on Windows (lines 374-377) and is left unset on Linux. However, ComfyUI with ROCm requires HIP_VISIBLE_DEVICES to be set on Linux as well.
Additionally, when ROCR_VISIBLE_DEVICES is set to a single GPU ID (e.g., ROCR_VISIBLE_DEVICES=1), that GPU becomes visible as device 0 to the process. Therefore, HIP_VISIBLE_DEVICES must be set to 0 (not the original GPU ID) to correctly access the restricted GPU.
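The renumbering described above can be illustrated with a small model. This is only a sketch of the behavior as reported in this issue, not ROCm code: the function name `resolve_gpu` and the GPU labels are hypothetical, and it assumes that ROCR_VISIBLE_DEVICES is applied first and that HIP_VISIBLE_DEVICES then indexes into whatever remains.

```python
def resolve_gpu(physical_gpus, rocr_index, hip_index):
    """Simplified model of single-GPU selection (hypothetical helper).

    Stage 1: ROCR_VISIBLE_DEVICES=<rocr_index> keeps at most one physical GPU.
    Stage 2: HIP_VISIBLE_DEVICES=<hip_index> indexes into the surviving list,
             which has been renumbered starting from 0.
    Returns the selected GPU label, or None when no GPU is visible.
    """
    if 0 <= rocr_index < len(physical_gpus):
        after_rocr = [physical_gpus[rocr_index]]
    else:
        after_rocr = []
    if 0 <= hip_index < len(after_rocr):
        return after_rocr[hip_index]
    return None


gpus = ["gpu0", "gpu1", "gpu2", "gpu3"]

# Reusing the original GPU ID in both variables selects nothing for GPU 1,
# which matches the "No HIP GPUs are available" symptom in this model.
same_id = resolve_gpu(gpus, rocr_index=1, hip_index=1)

# Using index 0 for the HIP stage reaches the GPU that ROCr left visible.
remapped = resolve_gpu(gpus, rocr_index=1, hip_index=0)
```

In this model GPU 0 works either way, because its original ID and its renumbered index are both 0, which would explain why only the first backend started.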
Current Code
// In NetworkBackendUtils.cs, lines ~372-378
PythonLaunchHelper.CleanEnvironmentOfPythonMess(start, $"({nameSimple} launch) ");
start.Environment["CUDA_VISIBLE_DEVICES"] = $"{gpuId}";
if (RuntimeInformation.IsOSPlatform(OSPlatform.Windows))
{
    start.Environment["HIP_VISIBLE_DEVICES"] = $"{gpuId}";
}
start.Environment["ROCR_VISIBLE_DEVICES"] = $"{gpuId}";
Proposed Solution
Set HIP_VISIBLE_DEVICES=0 on Linux when ROCR_VISIBLE_DEVICES restricts to a single GPU, since that GPU becomes device 0 in the process's view:
// In NetworkBackendUtils.cs, lines ~372-378
PythonLaunchHelper.CleanEnvironmentOfPythonMess(start, $"({nameSimple} launch) ");
start.Environment["CUDA_VISIBLE_DEVICES"] = $"{gpuId}";
// For ROCm/HIP, when ROCR_VISIBLE_DEVICES restricts to a single GPU, that GPU becomes device 0
// So we need to set HIP_VISIBLE_DEVICES=0 to access it
start.Environment["HIP_VISIBLE_DEVICES"] = RuntimeInformation.IsOSPlatform(OSPlatform.Windows) ? $"{gpuId}" : "0";
start.Environment["ROCR_VISIBLE_DEVICES"] = $"{gpuId}";
Testing
After applying this fix on the server:
- Backend with GPU_ID: 0 starts successfully ✓
- Backend with GPU_ID: 1 starts successfully ✓
- Backend with GPU_ID: 2 starts successfully ✓
- Backend with GPU_ID: 3 starts successfully ✓
All four backends can now run simultaneously, each using a different GPU.
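For anyone who wants to check the per-backend environment by hand, here is a minimal Python sketch. It is not SwarmUI code: `build_backend_env` and `child_sees` are hypothetical helpers that mirror the proposed C# change and then start a child Python process to confirm which values that process receives.

```python
import os
import subprocess
import sys


def build_backend_env(gpu_id, is_windows=False, base_env=None):
    """Hypothetical helper mirroring the proposed change for one backend."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    env["ROCR_VISIBLE_DEVICES"] = str(gpu_id)
    # On Linux the single GPU left visible by ROCR_VISIBLE_DEVICES is device 0.
    env["HIP_VISIBLE_DEVICES"] = str(gpu_id) if is_windows else "0"
    return env


def child_sees(env, name):
    """Run a child Python process with env and return the value it sees for name."""
    code = "import os, sys; sys.stdout.write(os.environ.get(sys.argv[1], ''))"
    result = subprocess.run(
        [sys.executable, "-c", code, name],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    for gpu_id in range(4):
        env = build_backend_env(gpu_id)
        print(
            gpu_id,
            child_sees(env, "ROCR_VISIBLE_DEVICES"),
            child_sees(env, "HIP_VISIBLE_DEVICES"),
        )
```

On a ROCm machine, the child command could instead import torch from the ComfyUI venv and print `torch.cuda.device_count()`; with the fix, each backend environment should presumably report exactly one device.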