
Torchvision API infrastructure #6229

Open

mdabek-nvidia wants to merge 5 commits into NVIDIA:main from mdabek-nvidia:torchvision_api_infra

Conversation

@mdabek-nvidia
Collaborator

Torchvision API infrastructure

Category:

New feature

Description:

This is the first in a series of PRs implementing the Torchvision API.
The full PR has been trimmed to include only the infrastructure and the resize and flip operators required for unit testing.

Additional information:

Affected modules and functionalities:

Key points relevant for the review:

Tests:

  • Existing tests apply
  • New tests added
    • Python tests
    • GTests
    • Benchmark
    • Other
  • N/A

Checklist

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: N/A

JIRA TASK: N/A

Initial commit including infrastructure, resize and flip operators and unit tests

Signed-off-by: Marek Dabek <mdabek@nvidia.com>
@mdabek-nvidia
Collaborator Author

!build

@dali-automaton
Collaborator

CI MESSAGE: [44795470]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [44795470]: BUILD PASSED

@mdabek-nvidia mdabek-nvidia marked this pull request as ready for review February 26, 2026 11:50
@greptile-apps
Contributor

greptile-apps bot commented Feb 26, 2026

Greptile Summary

This PR introduces the foundational Torchvision-compatible API infrastructure for NVIDIA DALI, adding Compose, Resize, RandomHorizontalFlip, and RandomVerticalFlip operators under nvidia.dali.experimental.torchvision. The implementation provides two execution paths: a pipeline-mode Compose class that builds a persistent DALI pipeline accepting either PIL Images (HWC layout) or torch.Tensor inputs (CHW layout), and a functional API (fn_dali.resize, fn_dali.horizontal_flip, etc.) that uses the adjust_input decorator for single-call eager execution via nvidia.dali.experimental.dynamic.

Key findings from the review:

  • Functional API accepts unsupported interpolation modes silently: NEAREST_EXACT, BOX, and HAMMING pass functional.resize() verification and silently fall back to INTERP_NN, whereas the class-based Resize correctly raises NotImplementedError. These APIs should behave consistently.
  • Dead VerifSizeDescriptor code — defined in operator.py but never referenced in any arg_rules or import.

The core infrastructure is sound and the flip/compose operators are well-tested and pixel-exact against torchvision. The resize pipeline-mode path correctly passes interpolation and antialias parameters. However, the behavioral inconsistency between functional.resize() and Resize class for NEAREST_EXACT/BOX/HAMMING modes is a correctness issue that could mislead users.

Confidence Score: 3/5

  • Mergeable with caution — there is one behavioral inconsistency between the functional and class-based APIs for unsupported interpolation modes that should be addressed before merge.
  • The core infrastructure is sound and well-structured. The flip and compose operators are correct, and the resize pipeline-mode path properly forwards all parameters. However, the functional API silently accepts NEAREST_EXACT, BOX, and HAMMING interpolation modes while the class-based API rejects them — this inconsistency could mislead users and should be fixed. Additionally, there is unreferenced dead code (VerifSizeDescriptor) that should be cleaned up.
  • dali/python/nvidia/dali/experimental/torchvision/v2/functional/resize.py (fix interpolation mode handling) and dali/python/nvidia/dali/experimental/torchvision/v2/operator.py (remove unused VerifSizeDescriptor)

Last reviewed commit: b12f415

Contributor

@greptile-apps greptile-apps bot left a comment


13 files reviewed, 7 comments


Comment on lines +124 to +127
stream = torch.cuda.Stream(0)
with torch.cuda.stream(stream):
output = self.pipe.run(stream, input_data=data_input)

Contributor

Using a cuda stream unconditionally might break CPU-only environments. Please wrap it with torch.cuda.is_available() or similar

Collaborator

Do we have CPU-only tests that will cover torchvision? If not, we should definitely add some to catch things like this

Collaborator Author

There are CPU-only test cases, but no test suite. I will add it.

Comment on lines +58 to +60
if len(tensor_or_tl) == 1:
tensor_or_tl = tensor_or_tl[0]
return _to_torch_tensor(tensor_or_tl)
Contributor

@jantonguirao jantonguirao Mar 2, 2026

Why do we want to unpack 1-element tuples? How about something like:

def to_torch_tensor(
    x: Union[tuple, 'TensorListGPU', 'TensorListCPU']
) -> Union[torch.Tensor, tuple]:
    if isinstance(x, (TensorListGPU, TensorListCPU)):
        return to_torch_tensor(x.as_tensor())
    elif isinstance(x, tuple):
        return tuple(to_torch_tensor(elem) for elem in x)
    else:
        return torch.from_dlpack(x)

Collaborator Author

@mdabek-nvidia mdabek-nvidia Mar 3, 2026

We need to unpack it since the Torchvision Compose operator returns a tensor or PIL.Image, not a tuple. DALI.Torchvision is supposed to be a drop-in replacement for Torchvision, so maximum compatibility should be maintained.
I will use the above code as a base for a better implementation of to_torch_tensor.
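The unwrapping behavior described above can be sketched in isolation (illustrative only; `convert_result` is a hypothetical name and `convert_leaf` stands in for the real torch.from_dlpack conversion):

```python
# Illustrative sketch, not the PR's code: recursively convert a nested
# result, unpacking 1-element tuples so a single-transform Compose returns
# a bare value, matching torchvision's drop-in behavior.
def convert_result(x, convert_leaf=lambda v: v):
    if isinstance(x, tuple):
        converted = tuple(convert_result(e, convert_leaf) for e in x)
        # Torchvision's Compose returns a single tensor/image, not a tuple,
        # so a singleton tuple is unwrapped for compatibility.
        return converted[0] if len(converted) == 1 else converted
    return convert_leaf(x)
```

Multi-output pipelines still come back as tuples; only the single-output case is flattened.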

Comment on lines +283 to +285
# This is a WAR (workaround) for DLPack not supporting pinned memory
if output.device.device_type == "cpu":
output = np.asarray(output)
Contributor

we should at least document this, or add a warning message

VerificationTensorOrImage.verify(data_input)

if self.active_pipeline is None:
self._build_pipeline(data_input)
Contributor

@jantonguirao jantonguirao Mar 2, 2026

We should verify if we built a CHW or HWC pipeline, and rebuild in case the type changes. We can store the "pipeline kind" as a member, and initialize it during _build_pipeline.

Collaborator Author

Shouldn't changing the layout in the middle of the pipeline lifecycle be considered a bug?

[DEPRECATED but used]
"""

def __call__(self, data_input):
Contributor

This doesn't seem to fully cover the torchvision core semantics. Here's a suggestion (LLM generated):

class ToTensor:
    """
    Convert an image-like tensor to a float32 CHW tensor in [0, 1].

    Intended to be analogous to torchvision's ToTensor for the common image path:
    - input is uint8 in [0, 255] with layout "HWC" (e.g. PIL image converted to array)
    - output is float32 in [0, 1] with layout "CHW"

    Notes
    -----
    - This operates on DALI graph nodes (DataNode).
    - If the input is already floating point, it is only cast to float32 by default
      and not rescaled (to avoid double-scaling). Set `scale=True` if you want
      unconditional division by 255.
    """

    def __init__(
        self,
        *,
        output_layout: Literal["CHW", "HWC"] = "CHW",
        dtype=types.FLOAT,
        scale: bool = True,
        input_layout: Optional[Literal["HWC", "CHW"]] = None,
    ):
        """
        Parameters
        ----------
        output_layout:
            Layout of the output. torchvision uses CHW.
        dtype:
            Output dtype (default: FLOAT / float32).
        scale:
            If True, divide by 255 when input is integer type. If False, only cast/transpose.
        input_layout:
            If provided, forces how ToTensor interprets the input layout.
            If None, ToTensor assumes HWC (typical for PIL path in your Compose).
        """
        self.output_layout = output_layout
        self.dtype = dtype
        self.scale = scale
        self.input_layout = input_layout

    def __call__(self, data_input):
        # 1) Convert to float32 (or requested dtype)
        # If input is integer and scale=True, do scaling in float to get [0, 1].
        x = data_input

        # Cast first; DALI will handle conversion
        x = fn.cast(x, dtype=self.dtype)

        if self.scale:
            # This matches torchvision's ToTensor scaling for uint8-like images.
            # We do it unconditionally after cast; for non-uint8 integer inputs it still scales,
            # which is usually desired for "image" semantics. If you need stricter behavior,
            # you can make this conditional on original dtype (requires dtype introspection).
            x = x / 255.0

        # 2) Layout conversion
        in_layout = self.input_layout or "HWC"
        out_layout = self.output_layout

        if in_layout == out_layout:
            return x

        if in_layout == "HWC" and out_layout == "CHW":
            # permute HWC -> CHW
            x = fn.transpose(x, perm=[2, 0, 1])
            return x

        if in_layout == "CHW" and out_layout == "HWC":
            x = fn.transpose(x, perm=[1, 2, 0])
            return x

        raise ValueError(f"Unsupported layout conversion: {in_layout} -> {out_layout}")

Collaborator Author

ToTensor is a special case in this implementation, since DALI returns tensors from a pipeline. I agree that scaling should be optional and that casting is missing.
What is more, the DALI implementation is limited and only allows using ToTensor as the final operator of the pipeline - this will be documented.
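The semantics agreed on above (cast to float, scale only integer inputs to avoid double-scaling, permute HWC to CHW) can be sketched with plain nested lists, independent of DALI/torch; `to_tensor_hwc` is a hypothetical name:

```python
# Illustrative sketch of ToTensor semantics on a nested-list "image":
# uint8-like int values in [0, 255] become floats in [0, 1], floats are
# left unscaled, and the layout is permuted HWC -> CHW.
def to_tensor_hwc(image, scale=True):
    h, w, c = len(image), len(image[0]), len(image[0][0])
    is_int = isinstance(image[0][0][0], int)
    factor = 255.0 if (scale and is_int) else 1.0
    return [
        [[image[y][x][ch] / factor for x in range(w)] for y in range(h)]
        for ch in range(c)
    ]
```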

Comment on lines +127 to +131
if no_size:
if orig_h > orig_w:
target_w = (max_size * orig_w) / orig_h
else:
target_h = (max_size * orig_h) / orig_w
Contributor

@jantonguirao jantonguirao Mar 2, 2026

This works in float, which might be incompatible with torchvision (it can lead to off-by-one errors, etc.).

Here's a suggestion (LLM generated, to be double checked) following the torch rounding rules:

from __future__ import annotations

from typing import Optional, Sequence, Tuple, Union
import math

SizeArg = Union[int, Sequence[int], None]

def _check_size_arg(size: SizeArg) -> SizeArg:
    # torchvision allows int or sequence len 1 or 2; len 1 treated like int
    if isinstance(size, (list, tuple)):
        if len(size) == 0 or len(size) > 2:
            raise ValueError(f"size must have len 1 or 2, got {len(size)}")
        if len(size) == 1:
            return int(size[0])
        return (int(size[0]), int(size[1]))
    if size is None:
        return None
    return int(size)

def _round_to_int(x: float) -> int:
    # torchvision uses rounding-to-nearest-int for computed sizes.
    # (In practice they do `int(x + 0.5)` for positive x or `round(x)` depending on version.)
    return int(math.floor(x + 0.5))

def torchvision_resize_output_size(
    orig_h: int,
    orig_w: int,
    size: SizeArg,
    max_size: Optional[int] = None,
) -> Tuple[int, int]:
    """
    Compute output size for torchvision.transforms.Resize(size, max_size=max_size)
    for 2D images.

    Returns (new_h, new_w) as ints.
    """
    size = _check_size_arg(size)

    if size is None:
        if max_size is None:
            raise ValueError("If size is None, max_size must be provided.")
        # Equivalent to "not_larger": longest edge becomes max_size
        # and aspect ratio is preserved.
        # This mirrors torchvision v2 functional behavior.
        if orig_w >= orig_h:
            new_w = int(max_size)
            new_h = _round_to_int(max_size * orig_h / orig_w)
        else:
            new_h = int(max_size)
            new_w = _round_to_int(max_size * orig_w / orig_h)
        return new_h, new_w

    # size is (h, w): direct
    if isinstance(size, tuple):
        if max_size is not None:
            raise ValueError("max_size must be None when size is a sequence of length 2.")
        return int(size[0]), int(size[1])

    # size is int: match shorter edge
    if max_size is not None and max_size <= 0:
        raise ValueError("max_size must be positive.")

    short, long_ = (orig_w, orig_h) if orig_w <= orig_h else (orig_h, orig_w)

    if short == size:
        # torchvision returns original size (no-op)
        new_h, new_w = orig_h, orig_w
    else:
        scale = float(size) / float(short)
        new_h = _round_to_int(orig_h * scale)
        new_w = _round_to_int(orig_w * scale)

    if max_size is not None:
        new_short, new_long = (new_w, new_h) if new_w <= new_h else (new_h, new_w)
        if new_long > max_size:
            scale = float(max_size) / float(new_long)
            new_h = _round_to_int(new_h * scale)
            new_w = _round_to_int(new_w * scale)

    # safety: clamp to at least 1
    new_h = max(1, int(new_h))
    new_w = max(1, int(new_w))
    return new_h, new_w

Comment on lines +102 to +103
tv_shape_lower = torch.Size([out_tv.shape[1] - 1, out_tv.shape[2] - 1])
tv_shape_upper = torch.Size([out_tv.shape[1] + 1, out_tv.shape[2] + 1])
Contributor

Instead, you could follow torch's rounding rules and expect exact equality? At least check whether this is achievable with DALI.

Collaborator

Strongly agree, full equality would be the best.

If the rounding errors are unavoidable, we should probably document where discrepancy comes from and make sure that we know the error bound (is it always +/-1? or can it be +/-2 in some scenarios?). The documentation should warn the users that they won't see bit-exact results.

If equality is not possible, I think using something like isclose would make this test code more readable, as you'll have something along the lines of assert torch.isclose(out_fn.shape[0], out_tv.shape[0], rtol=0, atol=1)

Collaborator Author

I would like it to be equal as well, but DALI calculates the output size differently from Torchvision.
The following change in resize_attr_base.cc::AdjustOutputSize:

-          out_size[d] = in_size[d] * scale[d];
+          out_size[d] = std::floor(in_size[d] * scale[d]);

would align both libraries, but it would change DALI's behavior.
There is an option to recalculate the output size in Python, but that would add overhead to resize, so I took the liberty of not doing it; the documentation will note that the calculated size may be off by one.
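The off-by-one discussed here comes from the rounding step alone. A minimal sketch, assuming torchvision rounds half up while a plain float-to-int assignment truncates (whether DALI truncates exactly like this is an assumption; the point is that a single rounding step bounds the error at +/-1):

```python
import math

def tv_round(x):
    # torchvision-style: round half up for positive x
    return int(math.floor(x + 0.5))

def trunc(x):
    # truncation, as in a plain float-to-int assignment
    return int(x)

# Resize a 100x75 image with size=50 (match the shorter edge):
orig_h, orig_w, size = 100, 75, 50
scale = size / min(orig_h, orig_w)   # 50/75
h = orig_h * scale                   # 66.666...
assert tv_round(h) == 67
assert trunc(h) == 66                # off by one vs torchvision
```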

Collaborator

@szkarpinski szkarpinski left a comment

Leaving some comments after the first reading. My main concerns are:

  • There's a significant amount of hard-coded dependencies on layouts (HWC, CHW) or color formats (RGB, RGBA). I'd like to make sure that this is intentional and we don't anticipate any need to extend this soon.
  • The verification of arguments and argument handling seems to overlap with what DALI already does, and duplicated input validation means added maintenance cost and a risk of discrepancies

return torch.from_dlpack(dali_tensor)


def to_torch_tensor(tensor_or_tl: tuple | TensorListGPU | TensorListCPU) -> torch.Tensor:
Collaborator

This looks like a useful and generic thing, not strictly related to torchvision. Shouldn't this be a part of main DALI, or Torch plugin?

return self.convert_to_tensor


class PipelineHWC(PipelineLayouted):
Collaborator

Can you share some more context for this design? Why do we need a class per layout? This will be hard to generalize, so I believe that documenting the reasons for this decision is important

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Class per layout is needed due to how Torchvision works: it expects HWC for PIL.Images and CHW for torch.Tensors (currently no other inputs are supported in the object-oriented API). The main idea is to separate both cases to avoid if-else chains in input type verification and in input/output data conversion. The first release of the Torchvision DALI OO API is limited in what it supports and focuses on the most common types. Having separate classes for different layouts lets us extend support further, hopefully without touching the existing code.
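The dispatch idea described above can be sketched as follows (class bodies and the `pipeline_for` helper are illustrative, not the PR's code; strings stand in for the real isinstance checks against PIL.Image.Image and torch.Tensor):

```python
# Illustrative sketch: the input type, not an if-else chain inside each
# operator, selects the pipeline variant.
class PipelineHWC:
    layout = "HWC"   # PIL.Image path

class PipelineCHW:
    layout = "CHW"   # torch.Tensor path

def pipeline_for(input_kind):
    # In the real code this would be isinstance(data_input, PIL.Image.Image)
    # vs isinstance(data_input, torch.Tensor); strings stand in here.
    kinds = {"pil": PipelineHWC, "tensor": PipelineCHW}
    try:
        return kinds[input_kind]()
    except KeyError:
        raise TypeError(f"Unsupported input kind: {input_kind!r}")
```

Adding a new input type then means adding a new class, leaving the existing ones untouched.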

raise ValueError(f"Values {name} should be positive number, got {values}")


class VerifyIfOrderedPair(ArgumentVerificationRule):
Collaborator

Nitpick: ordered pair is something different

Comment on lines +178 to +179
arg_rules: Sequence[ArgumentVerificationRule] = []
input_rules: Sequence[DataVerificationRule] = []
Collaborator

I like the idea of formalizing and generalizing the verification rules, but do we know how much this overlaps with the existing verification that DALI does? Won't such an overlap affect maintainability?

Collaborator Author

The main reason behind the verification rules was to align with Torchvision's verification of arguments and input data, and to respond with the same error messages when these rules are violated. I agree that there may be an overlap with what DALI does, but the new verification serves a slightly different purpose.
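A minimal sketch of the rule pattern being discussed: each rule is a stateless classmethod that raises with a torchvision-style message, and an operator runs its whole rule list before building anything. The names ArgumentVerificationRule and VerifyIfPositive follow the PR; the bodies here are illustrative.

```python
class ArgumentVerificationRule:
    @classmethod
    def verify(cls, **kwargs):
        raise NotImplementedError

class VerifyIfPositive(ArgumentVerificationRule):
    @classmethod
    def verify(cls, *, values, name, **_):
        vals = values if isinstance(values, (list, tuple)) else [values]
        if any(v <= 0 for v in vals):
            # Error text mirrors the torchvision-style message in the PR
            raise ValueError(f"Values {name} should be positive number, got {values}")

def verify_args(rules, **kwargs):
    # Run every rule; each raises on violation, so success is silent
    for rule in rules:
        rule.verify(**kwargs)
```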

L, RGB, or RGBA mode
- torch.Tensor:
ndim==3 -> ndd.Tensor(layout = "CHW"),
ndim>3 -> ndd.Batch(layout="NCHW")
Collaborator

Is it ndim>3 or ndim==4? Do we support multi-dimensional batches?

Collaborator Author

Torchvision does not limit the number of supported dimensions:

The v2 transforms generally accept an arbitrary number of leading dimensions (..., C, H, W) and can handle batched images or batched videos.

Source

I am currently not sure whether there are use cases for multi-dimensional batches, but I decided to let the DALI backend be the point of failure, so that we can easily pinpoint future issues.
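One common way to honor the "(..., C, H, W)" contract quoted above is to collapse all leading dimensions into a single batch dimension before the backend call and restore them afterwards. This is a sketch of that idea under assumed helper names, not the PR's implementation:

```python
import math

def flatten_leading(shape):
    # (..., C, H, W) -> ((N, C, H, W), leading_dims), where N is the
    # product of the leading dims (1 if there are none)
    *lead, c, h, w = shape
    n = math.prod(lead)  # math.prod(()) == 1
    return (n, c, h, w), tuple(lead)

def restore_leading(shape_nchw, lead):
    # Inverse: re-expand the flattened batch dimension
    n, c, h, w = shape_nchw
    return (*lead, c, h, w)
```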

output = to_torch_tensor(output)
# ToTensor
if self.convert_to_tensor:
if output.shape[-4] > 1:
Collaborator

What if I pass a tensor of shape (100, 1, 2, 3, 4)? Will this 2D batch be caught somewhere?

Collaborator Author

If it is not supported by DALI it will be caught there, otherwise no.

Comment on lines +51 to +52
If the image is torch Tensor, it is expected to have […, H, W] shape, where … means a maximum
of two leading dimensions
Collaborator

I don't get this one. If I have 300x200 image, my tensor is expected to have shape [max(300, 200), 300, 200] = [300, 300, 200]?

(nitpick) Also, "leading dimension" is a term used in BLAS, and here I assume it means something different

@params("gpu", "cpu")
def test_horizontal_random_flip_probability(device):
img = make_test_tensor()
transform = Compose([RandomHorizontalFlip(p=1.0, device=device)]) # always flip
Collaborator

Is Compose required to instantiate an operator? I believe that in torchvision you can use Compose to compose multiple operators, but Compose on a single-element list is a no-op and is not required nor encouraged

Collaborator Author

Is Compose required to instantiate an operator?

Yes, since Compose is a wrapper for a pipeline. It does not make much sense to instantiate standalone operators (and build a pipeline around each of them), since it would promote an antipattern and probably hinder performance. It is better to use functional, which uses dynamic mode.

Torchvision allows standalone operators instantiation, since they have low overhead when executed on CPU.


Signed-off-by: Marek Dabek <mdabek@nvidia.com>
@mdabek-nvidia
Collaborator Author

!build

@dali-automaton
Collaborator

CI MESSAGE: [45326301]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [45326301]: BUILD FAILED

@mdabek-nvidia mdabek-nvidia force-pushed the torchvision_api_infra branch from 34bcd56 to 08bbd62 Compare March 5, 2026 14:50
Modified resize operator size calculation

Signed-off-by: Marek Dabek <mdabek@nvidia.com>
@mdabek-nvidia mdabek-nvidia force-pushed the torchvision_api_infra branch 2 times, most recently from 438737e to d15613b Compare March 5, 2026 15:03
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Signed-off-by: Marek Dabek <mdabek@nvidia.com>
@mdabek-nvidia mdabek-nvidia force-pushed the torchvision_api_infra branch from d15613b to 9ac174a Compare March 5, 2026 15:08
Comment on lines +110 to +115
assert torch.allclose(
torch.tensor(out_tv.shape[1:3]), torch.tensor(out_dali_tv.shape[1:3]), rtol=0, atol=1
), f"Should be:{out_tv.shape} is:{out_dali_tv.shape}"
assert torch.allclose(
torch.tensor(out_fn.shape[1:3]), torch.tensor(out_dali_fn.shape[1:3]), rtol=0, atol=1
), f"Should be:{out_fn.shape} is:{out_dali_fn.shape}"
Contributor

Shape comparison skips width for 4D batched tensors

shape[1:3] extracts different semantic dimensions depending on the tensor rank:

  • For a 3D CHW output (C, H, W), shape[1:3] = (H, W)
  • For a 4D NCHW output (N, C, H, W), shape[1:3] = (C, H)

build_tensors() deliberately creates 4D batched tensors (e.g., torch.ones((1, channels, max_size, max_size)) and torch.ones((10, channels, max_size, max_size))), so for those cases the assertion compares (C, H_dali) against (C, H_tv). Since C is never changed by resize, only height is effectively validated and width is never checked. A buggy resize that produced the correct height but the wrong width would silently pass this assertion for all 4D inputs.

# Spatial dims are the last two axes for both CHW and NCHW outputs,
# so compare shape[-2:] regardless of rank:
dali_spatial = torch.tensor(out_dali_tv.shape[-2:])
tv_spatial = torch.tensor(out_tv.shape[-2:])
assert torch.allclose(tv_spatial, dali_spatial, rtol=0, atol=1), ...

Comment on lines +43 to +54
layout = data_input.property("layout")[0]

# CHW
if layout == np.frombuffer(bytes("C", "utf-8"), dtype=np.uint8)[0]:
input_height = data_input.shape()[-2]
input_width = data_input.shape()[-1]
# HWC
else:
input_height = data_input.shape()[-3]
input_width = data_input.shape()[-2]

return input_height, input_width, data_input
Contributor

Fragile layout detection via single-character comparison

The layout is identified by comparing only the first byte (property("layout")[0]) against the ASCII code of "C". While this correctly distinguishes "CHW" from "HWC" in the current pipeline context (which only ever produces those two layouts), it is brittle:

  • Any layout starting with "C" (e.g., a hypothetical "CWH") would be silently treated as CHW with correct-looking but semantically wrong height/width extraction.
  • "NCHW" starts with "N", so it would fall through to the HWC branch, producing input_height = shape[-3] (=C) and input_width = shape[-2] (=H) — both wrong.

If this function is ever called outside the tightly-controlled pipeline context (e.g., from a custom pipeline using "NCHW" layout), it will silently produce incorrect aspect-ratio calculations. Consider comparing the full layout string or using a dictionary/explicit set:

layout = data_input.property("layout")
if layout in (b"CHW", b"NCHW"):
    input_height = data_input.shape()[-2]
    input_width  = data_input.shape()[-1]
elif layout in (b"HWC", b"NHWC"):
    input_height = data_input.shape()[-3]
    input_width  = data_input.shape()[-2]
else:
    raise ValueError(f"Unsupported layout: {layout!r}")

Comment on lines +138 to +139
if output is None:
return output
Contributor

Dead code: output is None check can never be true

output is initialised to None on line 129 and is unconditionally assigned by both branches of the if/else block above (self.pipe.run(...) is called in both the GPU and CPU path). pipe.run() never returns None, so this guard is unreachable dead code and may give a false sense of safety. Consider removing it or replacing it with an assertion:

Suggested change:

-if output is None:
-    return output
+assert output is not None, "pipe.run() returned None unexpectedly"

Comment on lines +57 to +75
class VerificationSize(ArgumentVerificationRule):
@classmethod
def verify(cls, *, size, max_size, interpolation, **_):
if size is not None and not isinstance(size, int) and not isinstance(size, (tuple, list)):
raise ValueError(
"Invalid combination: size must be int, None, or sequence of two ints. "
"max_size only applies when size is int or None."
)
if size is None and max_size is None:
raise ValueError("Must provide max_size if size is None.")
if size is not None and max_size is not None and np.min(size) > max_size:
raise ValueError("max_size should not be smaller than the actual size")
if isinstance(size, (tuple, list)) and len(size) == 2 and max_size is not None:
raise ValueError(
"max_size should only be passed if size specifies the length of the smaller \
edge, i.e. size should be an int"
)
if interpolation not in Resize.interpolation_modes.keys():
raise ValueError(f"Interpolation {type(interpolation)} is not supported")
Contributor

VerificationSize does not validate that max_size is a positive integer

max_size participates in arithmetic calculations ((max_size * orig_w) / orig_h) and is forwarded directly to DALI's fn.resize(..., max_size=self.max_size). However, the verifier only checks relational constraints against size — it never checks that max_size itself is a positive integer. Passing max_size=0 or max_size=-1 would silently produce nonsensical resize targets or a runtime DALI error with a confusing message.

Consider adding an explicit check:

if max_size is not None and max_size <= 0:
    raise ValueError(f"max_size must be a positive integer, got {max_size}")

Moved resize output size calculation to a Resize class

Signed-off-by: Marek Dabek <mdabek@nvidia.com>
@mdabek-nvidia
Collaborator Author

!build

return output

@abstractmethod
def get_layout(self) -> str: ...

Check notice (Code scanning / CodeQL): Statement has no effect.
def get_layout(self) -> str: ...

@abstractmethod
def get_channel_reverse_idx(self) -> int: ...

Check notice (Code scanning / CodeQL): Statement has no effect.
def get_channel_reverse_idx(self) -> int: ...

@abstractmethod
def verify_layout(self, data) -> None: ...

Check notice (Code scanning / CodeQL): Statement has no effect.
Comment on lines +38 to +43
Resize.verify_args(
size=size, max_size=max_size, interpolation=interpolation, antialias=antialias
)

size_normalized = Resize.infer_effective_size(size, max_size)
interpolation = Resize.interpolation_modes[interpolation]
Contributor

Unsupported interpolation modes silently accepted in functional API

Resize.verify_args() delegates to VerificationSize.verify(), which only checks that interpolation is a key in Resize.interpolation_modes. However, InterpolationMode.NEAREST_EXACT, BOX, and HAMMING are present in that dict (all mapped to INTERP_NN) specifically to avoid a ValueError during the check, while Resize.__init__ explicitly raises NotImplementedError for those same modes.

As a result, calling functional.resize(img, size=512, interpolation=InterpolationMode.NEAREST_EXACT) silently uses nearest-neighbour interpolation instead of raising an error, which is directly inconsistent with the class-based Resize API.

The fix is to add the same guard that Resize.__init__ uses:

if interpolation in Resize.not_supported_interpolation_modes:
    raise NotImplementedError(
        f"Interpolation mode {interpolation} is not supported in DALI."
    )

Comment on lines +142 to +162
class VerifSizeDescriptor(ArgumentVerificationRule):
"""
Verify if the value can describe a size argument, which is:
- an integer
- or a sequence of length of 1,
- or a sequence of length of 2

Parameters
----------
size : any
Value to verify. Should be an integer or a sequence of length 1 or 2.
"""

@classmethod
def verify(cls, *, size, **_) -> None:
if not isinstance(size, (int, list, tuple)):
raise TypeError(f"Size must be int, list, or tuple, got {type(size)}")
elif isinstance(size, (list, tuple)) and len(size) > 2:
raise ValueError(f"Size sequence must have length 1 or 2, got {len(size)}")
VerifyIfPositive.verify(values=size, name="size")

Contributor

VerifSizeDescriptor defined but never used

VerifSizeDescriptor is declared in this file but never appears in any operator's arg_rules, is not imported anywhere else in the package, and its validation is already subsumed by VerificationSize (which calls VerifyIfPositive internally). This is unreferenced dead code that adds noise to the public surface of the module and should be removed.
