
RuntimeError: GET was unable to find an engine to execute this computation from F.conv2d() #372

Description

@kcostilow

(This is not my wheelhouse as a developer, but here is what my AI and I came up with for a bug report; I hope it is useful.) I got this same error when this was still a branch (multitask-v2-detector), using Detectorv2 with both path and tensor inputs.

I have been testing the new DetectorV2 multitask branch on a System76 laptop with an NVIDIA RTX 2060 (6 GB VRAM). My environment is Python 3.12.3, PyTorch 2.12.1+cu130, CUDA runtime 13.0, cuDNN 9.2, NVIDIA proprietary driver 580.159.03. torch.cuda.is_available() is True, torch.backends.cudnn.is_available() is True, and the model parameters are on cuda:0 with torch.float32.
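For anyone comparing setups, the environment facts listed above can be collected with a short script along these lines. This is a sketch, not the exact commands used for this report; the helper name `cuda_env_report` is made up for illustration, and only standard `torch` introspection calls are used.

```python
import platform

import torch


def cuda_env_report() -> dict:
    """Collect version and device facts relevant to CUDA/cuDNN problems.

    `cuda_env_report` is a hypothetical helper name, not part of py-feat.
    """
    report = {
        "python": platform.python_version(),
        "torch": torch.__version__,
        "cuda_runtime": torch.version.cuda,  # None on CPU-only builds
        "cuda_available": torch.cuda.is_available(),
        "cudnn_available": torch.backends.cudnn.is_available(),
        "cudnn_version": torch.backends.cudnn.version(),  # None if cuDNN is absent
    }
    if report["cuda_available"]:
        # Device-specific facts only make sense when a GPU is visible.
        report["device_name"] = torch.cuda.get_device_name(0)
        report["compute_capability"] = torch.cuda.get_device_capability(0)
        report["arch_list"] = torch.cuda.get_arch_list()
    return report


if __name__ == "__main__":
    for key, value in cuda_env_report().items():
        print(f"{key}: {value}")
```

Pasting the printed output into the issue makes it easier to compare against a machine where the detector works.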

DetectorV2.detect() fails almost immediately during the first forward pass with:

RuntimeError: GET was unable to find an engine to execute this computation

The traceback points into the ConvNeXt backbone (timm) at the first depthwise convolution (conv_dw -> F.conv2d).

To isolate the problem, I created several standalone tests outside of py-feat:

  • Standalone CUDA Conv2d works.
  • Standalone depthwise Conv2d(groups=...) works.
  • Standalone timm ConvNeXt (features_only=True) on CUDA works correctly, including batch size 16 and 256×256 inputs.
  • cuDNN reports available (version=92000), and the PyTorch build reports USE_CUDA=ON, USE_CUDNN=ON.
  • The GPU architecture (SM 7.5) is included in the PyTorch wheel.
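A standalone depthwise-convolution check of the kind described in the list might look like the sketch below. It is an illustration rather than the actual test script: the function name and default sizes are assumptions, and the 7×7 kernel with `groups=channels` simply mirrors the usual ConvNeXt depthwise layer.

```python
import torch
import torch.nn as nn


def depthwise_conv_smoke_test(channels=96, size=64, batch=2, device=None):
    """Run one depthwise Conv2d forward pass and return the output shape.

    Hypothetical helper; falls back to CPU when no CUDA device is visible.
    """
    if device is None:
        device = "cuda" if torch.cuda.is_available() else "cpu"
    # groups == in_channels makes this a depthwise convolution.
    conv = nn.Conv2d(
        channels, channels, kernel_size=7, padding=3, groups=channels
    ).to(device)
    x = torch.randn(batch, channels, size, size, device=device)
    with torch.no_grad():
        y = conv(x)
    return tuple(y.shape)


if __name__ == "__main__":
    print(depthwise_conv_smoke_test())
```

If this passes on `cuda` while the detector fails, the basic depthwise kernel path is working and the difference probably lies in the tensor that py-feat hands to the backbone.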

This seems to rule out a general CUDA, cuDNN, driver, or PyTorch installation problem. The failure appears to be specific to the new DetectorV2 multitask inference path rather than the underlying ConvNeXt implementation itself.
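One further experiment that could narrow this down is to record what the grouped convolutions actually receive inside the failing path. The sketch below uses PyTorch forward pre-hooks; `log_conv_inputs` is a hypothetical helper, and which py-feat attribute holds the backbone module is an assumption that would need checking against the installed code.

```python
import torch
import torch.nn as nn


def log_conv_inputs(model: nn.Module, only_grouped: bool = True):
    """Attach pre-hooks that record input properties of Conv2d layers.

    Returns (records, handles). A record is appended before the layer runs,
    so if the layer raises, records[-1] describes the offending input.
    """
    records, handles = [], []

    def make_hook(name):
        def hook(module, args):
            x = args[0]
            records.append({
                "module": name,
                "shape": tuple(x.shape),
                "dtype": x.dtype,
                "device": str(x.device),
                "contiguous": x.is_contiguous(),
                "channels_last": x.dim() == 4
                and x.is_contiguous(memory_format=torch.channels_last),
                "weight_dtype": module.weight.dtype,
            })
        return hook

    for name, module in model.named_modules():
        if isinstance(module, nn.Conv2d) and (module.groups > 1 or not only_grouped):
            handles.append(module.register_forward_pre_hook(make_hook(name)))
    return records, handles
```

The hooks would be attached to whatever `nn.Module` wraps the ConvNeXt backbone before calling `detect()`, and the last record printed after the exception. An empty batch, an unexpected dtype, or an unusual memory layout would show up there; whether any of these is the cause here is not known.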

I'm happy to test patches, provide additional diagnostics, or run any further experiments that would help narrow this down.

The script below (genfail.py) reproduces the failure on my machine, using a Pexels video (which I couldn't upload):

import os
import argparse
from feat import Detectorv2
from feat.utils.io import video_to_tensor

parser = argparse.ArgumentParser()
parser.add_argument('--skip', dest='skip', type=int, default=24)
parser.add_argument('--batch_size', dest='batch_size', type=int, default=1)
parser.add_argument('--num_workers', dest='num_workers', type=int, default=1)
parser.add_argument('video_file')
args = parser.parse_args()

print(f"processing: {args}")

detector = Detectorv2(device="cuda")

print(detector.info)

tvf = video_to_tensor(args.video_file)
fex = detector.detect(tvf, data_type="tensor", face_identity_threshold=0.95, face_detection_threshold=0.95, skip=args.skip, batch_size=args.batch_size, num_workers=args.num_workers, verbose=True )

print(fex)

Output is:
Warning: You are sending unauthenticated requests to the HF Hub. Please set a HF_TOKEN to enable higher rate limits and faster downloads.
{'face_model': 'retinaface', 'multitask_model': 'face_multitask_v2', 'identity_model': 'arcface', 'facepose_model': 'multitask', 'gaze_model': 'multitask'}
0%| | 0/801 [00:01<?, ?it/s]
Traceback (most recent call last):
File "/home/kcostilow/venvs/genfail.py", line 21, in <module>
fex = detector.detect(tvf, data_type="tensor", face_identity_threshold=0.95, face_detection_threshold=0.95, skip=args.skip, batch_size=args.batch_size, num_workers=args.num_workers, verbose=True )
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/kcostilow/venvs/3.12-py-feat/lib/python3.12/site-packages/feat/detector_v2.py", line 534, in detect
batch_results = self.forward(faces_data, batch_data)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/kcostilow/venvs/3.12-py-feat/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/kcostilow/venvs/3.12-py-feat/lib/python3.12/site-packages/feat/detector_v2.py", line 363, in forward
out = self.multitask(faces) # MultitaskOutput; faces already [0,1] 256 crops
^^^^^^^^^^^^^^^^^^^^^
File "/home/kcostilow/venvs/3.12-py-feat/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/kcostilow/venvs/3.12-py-feat/lib/python3.12/site-packages/feat/multitask/inference.py", line 165, in __call__
out = self.model(x)
^^^^^^^^^^^^^
File "/home/kcostilow/venvs/3.12-py-feat/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1778, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/kcostilow/venvs/3.12-py-feat/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1789, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/kcostilow/venvs/3.12-py-feat/lib/python3.12/site-packages/feat/multitask/model_v2.py", line 695, in forward
feats = self.backbone(x)[-1] # [B, bb_ch, H, W]
^^^^^^^^^^^^^^^^
File "/home/kcostilow/venvs/3.12-py-feat/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1778, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/kcostilow/venvs/3.12-py-feat/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1789, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/kcostilow/venvs/3.12-py-feat/lib/python3.12/site-packages/timm/models/_features.py", line 345, in forward
return list(self._collect(x).values())
^^^^^^^^^^^^^^^^
File "/home/kcostilow/venvs/3.12-py-feat/lib/python3.12/site-packages/timm/models/_features.py", line 299, in _collect
x = module(x)
^^^^^^^^^
File "/home/kcostilow/venvs/3.12-py-feat/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1778, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/kcostilow/venvs/3.12-py-feat/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1789, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/kcostilow/venvs/3.12-py-feat/lib/python3.12/site-packages/timm/models/convnext.py", line 306, in forward
x = self.blocks(x)
^^^^^^^^^^^^^^
File "/home/kcostilow/venvs/3.12-py-feat/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1778, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/kcostilow/venvs/3.12-py-feat/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1789, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/kcostilow/venvs/3.12-py-feat/lib/python3.12/site-packages/torch/nn/modules/container.py", line 253, in forward
input = module(input)
^^^^^^^^^^^^^
File "/home/kcostilow/venvs/3.12-py-feat/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1778, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/kcostilow/venvs/3.12-py-feat/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1789, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/kcostilow/venvs/3.12-py-feat/lib/python3.12/site-packages/timm/models/convnext.py", line 200, in forward
x = self.conv_dw(x)
^^^^^^^^^^^^^^^
File "/home/kcostilow/venvs/3.12-py-feat/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1778, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/kcostilow/venvs/3.12-py-feat/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1789, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/kcostilow/venvs/3.12-py-feat/lib/python3.12/site-packages/torch/nn/modules/conv.py", line 565, in forward
return self._conv_forward(input, self.weight, self.bias)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/kcostilow/venvs/3.12-py-feat/lib/python3.12/site-packages/torch/nn/modules/conv.py", line 560, in _conv_forward
return F.conv2d(
^^^^^^^^^
RuntimeError: GET was unable to find an engine to execute this computation
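Since the traceback ends in cuDNN-backed `F.conv2d`, one way to check whether cuDNN engine selection is involved is to run the same call once with cuDNN enabled and once with it disabled. The following is a sketch under that assumption; `run_with_and_without_cudnn` is a hypothetical helper, and it relies only on the `torch.backends.cudnn.enabled` flag.

```python
import torch


def run_with_and_without_cudnn(fn):
    """Call fn() with cuDNN enabled and disabled; report the outcome of each.

    Hypothetical diagnostic helper. The previous flag value is always restored.
    """
    results = {}
    for enabled in (True, False):
        previous = torch.backends.cudnn.enabled
        try:
            torch.backends.cudnn.enabled = enabled
            fn()
            results[enabled] = "ok"
        except RuntimeError as exc:
            results[enabled] = f"RuntimeError: {exc}"
        finally:
            torch.backends.cudnn.enabled = previous
    return results
```

In genfail.py, `fn` could be a small lambda wrapping the `detector.detect(...)` call. If the call succeeds only with cuDNN disabled, that would point at engine selection for this particular input rather than at the model weights or the driver; if it fails both ways, the message from the non-cuDNN path may be more descriptive.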
