
Torchvision API infrastructure #6229

Open

mdabek-nvidia wants to merge 5 commits into NVIDIA:main from mdabek-nvidia:torchvision_api_infra

Conversation

@mdabek-nvidia
Collaborator

Torchvision API infrastructure

Category:

New feature

Description:

This is the first in a series of PRs implementing the Torchvision API.
The full PR has been trimmed to include only the infrastructure and the resize and flip operators required for unit testing.

Additional information:

Affected modules and functionalities:

Key points relevant for the review:

Tests:

  • Existing tests apply
  • New tests added
    • Python tests
    • GTests
    • Benchmark
    • Other
  • N/A

Checklist

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: N/A

JIRA TASK: N/A

Initial commit including infrastructure, resize and flip operators and unit tests

Signed-off-by: Marek Dabek <mdabek@nvidia.com>
@mdabek-nvidia
Collaborator Author

!build

@dali-automaton
Collaborator

CI MESSAGE: [44795470]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [44795470]: BUILD PASSED

@mdabek-nvidia mdabek-nvidia marked this pull request as ready for review February 26, 2026 11:50
@greptile-apps
Contributor

greptile-apps bot commented Feb 26, 2026

Greptile Summary

This PR introduces the foundational Torchvision-compatible API infrastructure for NVIDIA DALI, adding Compose, Resize, RandomHorizontalFlip, and RandomVerticalFlip operators under nvidia.dali.experimental.torchvision. The implementation provides two execution paths: a pipeline-mode Compose class that builds a persistent DALI pipeline accepting either PIL Images (HWC layout) or torch.Tensor inputs (CHW layout), and a functional API (fn_dali.resize, fn_dali.horizontal_flip, etc.) that uses the adjust_input decorator for single-call eager execution via nvidia.dali.experimental.dynamic.

Key findings from the review:

  • Functional API accepts unsupported interpolation modes silently: NEAREST_EXACT, BOX, and HAMMING pass functional.resize() verification and silently fall back to INTERP_NN, whereas the class-based Resize correctly raises NotImplementedError. These APIs should behave consistently.
  • Dead VerifSizeDescriptor code — defined in operator.py but never referenced in any arg_rules or import.

The core infrastructure is sound and the flip/compose operators are well-tested and pixel-exact against torchvision. The resize pipeline-mode path correctly passes interpolation and antialias parameters. However, the behavioral inconsistency between functional.resize() and Resize class for NEAREST_EXACT/BOX/HAMMING modes is a correctness issue that could mislead users.

Confidence Score: 3/5

  • Mergeable with caution — there is one behavioral inconsistency between the functional and class-based APIs for unsupported interpolation modes that should be addressed before merge.
  • The core infrastructure is sound and well-structured. The flip and compose operators are correct, and the resize pipeline-mode path properly forwards all parameters. However, the functional API silently accepts NEAREST_EXACT, BOX, and HAMMING interpolation modes while the class-based API rejects them — this inconsistency could mislead users and should be fixed. Additionally, there is unreferenced dead code (VerifSizeDescriptor) that should be cleaned up.
  • dali/python/nvidia/dali/experimental/torchvision/v2/functional/resize.py (fix interpolation mode handling) and dali/python/nvidia/dali/experimental/torchvision/v2/operator.py (remove unused VerifSizeDescriptor)

Last reviewed commit: b12f415

Contributor

@greptile-apps greptile-apps bot left a comment


13 files reviewed, 7 comments


Comment on lines +124 to +127
stream = torch.cuda.Stream(0)
with torch.cuda.stream(stream):
output = self.pipe.run(stream, input_data=data_input)

Contributor

Using a cuda stream unconditionally might break CPU-only environments. Please wrap it with torch.cuda.is_available() or similar

Collaborator

Do we have CPU-only tests that will cover torchvision? If not, we should definitely add some to catch things like this

Collaborator Author

There are CPU-only test cases, but no test suite. I will add it.

Comment on lines +58 to +60
if len(tensor_or_tl) == 1:
tensor_or_tl = tensor_or_tl[0]
return _to_torch_tensor(tensor_or_tl)
Contributor

@jantonguirao jantonguirao Mar 2, 2026

Why do we want to unpack 1-element tuples? How about something like:

def to_torch_tensor(
    x: Union[tuple, 'TensorListGPU', 'TensorListCPU']
) -> Union[torch.Tensor, tuple]:
    if isinstance(x, (TensorListGPU, TensorListCPU)):
        return to_torch_tensor(x.as_tensor())
    elif isinstance(x, tuple):
        return tuple(to_torch_tensor(elem) for elem in x)
    else:
        return torch.from_dlpack(x)

Collaborator Author

@mdabek-nvidia mdabek-nvidia Mar 3, 2026

We need to unpack it since the Torchvision Compose operator returns a tensor or PIL.Image, not a tuple. DALI.Torchvision is supposed to be a drop-in replacement for Torchvision, so maximum compatibility should be maintained.
I will use the above code as a base for a better implementation of to_torch_tensor.
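The unwrapping behavior described above can be sketched in isolation (illustrative only; `convert_result` is a hypothetical name and `convert_leaf` stands in for the real torch.from_dlpack conversion):

```python
# Illustrative sketch, not the PR's code: recursively convert a nested
# result, unpacking 1-element tuples so a single-transform Compose returns
# a bare value, matching torchvision's drop-in behavior.
def convert_result(x, convert_leaf=lambda v: v):
    if isinstance(x, tuple):
        converted = tuple(convert_result(e, convert_leaf) for e in x)
        # Torchvision's Compose returns a single tensor/image, not a tuple,
        # so a singleton tuple is unwrapped for compatibility.
        return converted[0] if len(converted) == 1 else converted
    return convert_leaf(x)
```

Multi-output pipelines still come back as tuples; only the single-output case is flattened.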

Comment on lines +283 to +285
# This is a WAR (workaround) for DLPack not supporting pinned memory
if output.device.device_type == "cpu":
output = np.asarray(output)
Contributor

we should at least document this, or add a warning message

VerificationTensorOrImage.verify(data_input)

if self.active_pipeline is None:
self._build_pipeline(data_input)
Contributor

@jantonguirao jantonguirao Mar 2, 2026

We should verify if we built a CHW or HWC pipeline, and rebuild in case the type changes. We can store the "pipeline kind" as a member, and initialize it during _build_pipeline.

Collaborator Author

Shouldn't changing the layout in the middle of the pipeline lifecycle be considered a bug?

[DEPRECATED but used]
"""

def __call__(self, data_input):
Contributor

This doesn't seem to fully cover the torchvision core semantics. Here's a suggestion (LLM generated):

class ToTensor:
    """
    Convert an image-like tensor to a float32 CHW tensor in [0, 1].

    Intended to be analogous to torchvision's ToTensor for the common image path:
    - input is uint8 in [0, 255] with layout "HWC" (e.g. PIL image converted to array)
    - output is float32 in [0, 1] with layout "CHW"

    Notes
    -----
    - This operates on DALI graph nodes (DataNode).
    - If the input is already floating point, it is only cast to float32 by default
      and not rescaled (to avoid double-scaling). Set `scale=True` if you want
      unconditional division by 255.
    """

    def __init__(
        self,
        *,
        output_layout: Literal["CHW", "HWC"] = "CHW",
        dtype=types.FLOAT,
        scale: bool = True,
        input_layout: Optional[Literal["HWC", "CHW"]] = None,
    ):
        """
        Parameters
        ----------
        output_layout:
            Layout of the output. torchvision uses CHW.
        dtype:
            Output dtype (default: FLOAT / float32).
        scale:
            If True, divide by 255 when input is integer type. If False, only cast/transpose.
        input_layout:
            If provided, forces how ToTensor interprets the input layout.
            If None, ToTensor assumes HWC (typical for PIL path in your Compose).
        """
        self.output_layout = output_layout
        self.dtype = dtype
        self.scale = scale
        self.input_layout = input_layout

    def __call__(self, data_input):
        # 1) Convert to float32 (or requested dtype)
        # If input is integer and scale=True, do scaling in float to get [0, 1].
        x = data_input

        # Cast first; DALI will handle conversion
        x = fn.cast(x, dtype=self.dtype)

        if self.scale:
            # This matches torchvision's ToTensor scaling for uint8-like images.
            # We do it unconditionally after cast; for non-uint8 integer inputs it still scales,
            # which is usually desired for "image" semantics. If you need stricter behavior,
            # you can make this conditional on original dtype (requires dtype introspection).
            x = x / 255.0

        # 2) Layout conversion
        in_layout = self.input_layout or "HWC"
        out_layout = self.output_layout

        if in_layout == out_layout:
            return x

        if in_layout == "HWC" and out_layout == "CHW":
            # permute HWC -> CHW
            x = fn.transpose(x, perm=[2, 0, 1])
            return x

        if in_layout == "CHW" and out_layout == "HWC":
            x = fn.transpose(x, perm=[1, 2, 0])
            return x

        raise ValueError(f"Unsupported layout conversion: {in_layout} -> {out_layout}")

Collaborator Author

ToTensor is a special case in this implementation, since DALI returns tensors from a pipeline. I agree that scaling should be optional and that casting is missing.
What is more, the DALI implementation is limited and only allows using ToTensor as the final operator of the pipeline - this will be documented.
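The semantics agreed on above (cast to float, scale only integer inputs to avoid double-scaling, permute HWC to CHW) can be sketched with plain nested lists, independent of DALI/torch; `to_tensor_hwc` is a hypothetical name:

```python
# Illustrative sketch of ToTensor semantics on a nested-list "image":
# uint8-like int values in [0, 255] become floats in [0, 1], floats are
# left unscaled, and the layout is permuted HWC -> CHW.
def to_tensor_hwc(image, scale=True):
    h, w, c = len(image), len(image[0]), len(image[0][0])
    is_int = isinstance(image[0][0][0], int)
    factor = 255.0 if (scale and is_int) else 1.0
    return [
        [[image[y][x][ch] / factor for x in range(w)] for y in range(h)]
        for ch in range(c)
    ]
```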

Comment on lines +127 to +131
if no_size:
if orig_h > orig_w:
target_w = (max_size * orig_w) / orig_h
else:
target_h = (max_size * orig_h) / orig_w
Contributor

@jantonguirao jantonguirao Mar 2, 2026

This works in float, which might be incompatible with torchvision (it can lead to off-by-one errors, etc.).

Here's a suggestion (LLM generated, to be double checked) following the torch rounding rules:

from __future__ import annotations

from typing import Optional, Sequence, Tuple, Union
import math

SizeArg = Union[int, Sequence[int], None]

def _check_size_arg(size: SizeArg) -> SizeArg:
    # torchvision allows int or sequence len 1 or 2; len 1 treated like int
    if isinstance(size, (list, tuple)):
        if len(size) == 0 or len(size) > 2:
            raise ValueError(f"size must have len 1 or 2, got {len(size)}")
        if len(size) == 1:
            return int(size[0])
        return (int(size[0]), int(size[1]))
    if size is None:
        return None
    return int(size)

def _round_to_int(x: float) -> int:
    # torchvision uses rounding-to-nearest-int for computed sizes.
    # (In practice they do `int(x + 0.5)` for positive x or `round(x)` depending on version.)
    return int(math.floor(x + 0.5))

def torchvision_resize_output_size(
    orig_h: int,
    orig_w: int,
    size: SizeArg,
    max_size: Optional[int] = None,
) -> Tuple[int, int]:
    """
    Compute output size for torchvision.transforms.Resize(size, max_size=max_size)
    for 2D images.

    Returns (new_h, new_w) as ints.
    """
    size = _check_size_arg(size)

    if size is None:
        if max_size is None:
            raise ValueError("If size is None, max_size must be provided.")
        # Equivalent to "not_larger": longest edge becomes max_size
        # and aspect ratio is preserved.
        # This mirrors torchvision v2 functional behavior.
        if orig_w >= orig_h:
            new_w = int(max_size)
            new_h = _round_to_int(max_size * orig_h / orig_w)
        else:
            new_h = int(max_size)
            new_w = _round_to_int(max_size * orig_w / orig_h)
        return new_h, new_w

    # size is (h, w): direct
    if isinstance(size, tuple):
        if max_size is not None:
            raise ValueError("max_size must be None when size is a sequence of length 2.")
        return int(size[0]), int(size[1])

    # size is int: match shorter edge
    if max_size is not None and max_size <= 0:
        raise ValueError("max_size must be positive.")

    short, long_ = (orig_w, orig_h) if orig_w <= orig_h else (orig_h, orig_w)

    if short == size:
        # torchvision returns original size (no-op)
        new_h, new_w = orig_h, orig_w
    else:
        scale = float(size) / float(short)
        new_h = _round_to_int(orig_h * scale)
        new_w = _round_to_int(orig_w * scale)

    if max_size is not None:
        new_short, new_long = (new_w, new_h) if new_w <= new_h else (new_h, new_w)
        if new_long > max_size:
            scale = float(max_size) / float(new_long)
            new_h = _round_to_int(new_h * scale)
            new_w = _round_to_int(new_w * scale)

    # safety: clamp to at least 1
    new_h = max(1, int(new_h))
    new_w = max(1, int(new_w))
    return new_h, new_w

Comment on lines +102 to +103
tv_shape_lower = torch.Size([out_tv.shape[1] - 1, out_tv.shape[2] - 1])
tv_shape_upper = torch.Size([out_tv.shape[1] + 1, out_tv.shape[2] + 1])
Contributor

Instead, you could follow torch's rounding rules and expect exact equality? At least check whether this is achievable with DALI.

Collaborator

Strongly agree, full equality would be the best.

If the rounding errors are unavoidable, we should probably document where discrepancy comes from and make sure that we know the error bound (is it always +/-1? or can it be +/-2 in some scenarios?). The documentation should warn the users that they won't see bit-exact results.

If equality is not possible, I think using something like isclose would make this test code more readable, as you'll have something along the lines of assert torch.isclose(out_fn.shape[0], out_tv.shape[0], rtol=0, atol=1)

Collaborator Author

I would like it to be equal as well, but DALI calculates the output size differently from Torchvision.
The following change in resize_attr_base.cc::AdjustOutputSize:

-          out_size[d] = in_size[d] * scale[d];
+          out_size[d] = std::floor(in_size[d] * scale[d]);

would align both libraries, but it would change DALI's behavior.
There is an option to recalculate the output size in Python, but that would add overhead to resize, so I took the liberty of not doing it; the documentation will note that the calculated size may be off by one.
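The off-by-one discussed here comes from the rounding step alone. A minimal sketch, assuming torchvision rounds half up while a plain float-to-int assignment truncates (whether DALI truncates exactly like this is an assumption; the point is that a single rounding step bounds the error at +/-1):

```python
import math

def tv_round(x):
    # torchvision-style: round half up for positive x
    return int(math.floor(x + 0.5))

def trunc(x):
    # truncation, as in a plain float-to-int assignment
    return int(x)

# Resize a 100x75 image with size=50 (match the shorter edge):
orig_h, orig_w, size = 100, 75, 50
scale = size / min(orig_h, orig_w)   # 50/75
h = orig_h * scale                   # 66.666...
assert tv_round(h) == 67
assert trunc(h) == 66                # off by one vs torchvision
```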

Collaborator

@szkarpinski szkarpinski left a comment

Leaving some comments after the first reading. My main concerns are:

  • There's a significant amount of hard-coded dependencies on layouts (HWC, CHW) or color formats (RGB, RGBA). I'd like to make sure that this is intentional and we don't anticipate any need to extend this soon.
  • The verification of arguments and argument handling seems to overlap with what DALI already does, and duplicated input validation means added maintenance cost and a risk of discrepancies

return torch.from_dlpack(dali_tensor)


def to_torch_tensor(tensor_or_tl: tuple | TensorListGPU | TensorListCPU) -> torch.Tensor:
Collaborator

This looks like a useful and generic thing, not strictly related to torchvision. Shouldn't this be a part of main DALI, or Torch plugin?

return self.convert_to_tensor


class PipelineHWC(PipelineLayouted):
Collaborator

Can you share some more context for this design? Why do we need a class per layout? This will be hard to generalize, so I believe that documenting the reasons for this decision is important

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Class per layout is needed due to how Torchvision works: it expects HWC for PIL.Images and CHW for torch.Tensors (currently no other inputs are supported in the object-oriented API). The main idea is to separate both cases to avoid if-else chains in input type verification and in input/output data conversion. The first release of the Torchvision DALI OO API is limited in what it supports and focuses on the most common types. Having separate classes for different layouts lets us extend support further, hopefully without touching the existing code.
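The dispatch idea described above can be sketched as follows (class bodies and the `pipeline_for` helper are illustrative, not the PR's code; strings stand in for the real isinstance checks against PIL.Image.Image and torch.Tensor):

```python
# Illustrative sketch: the input type, not an if-else chain inside each
# operator, selects the pipeline variant.
class PipelineHWC:
    layout = "HWC"   # PIL.Image path

class PipelineCHW:
    layout = "CHW"   # torch.Tensor path

def pipeline_for(input_kind):
    # In the real code this would be isinstance(data_input, PIL.Image.Image)
    # vs isinstance(data_input, torch.Tensor); strings stand in here.
    kinds = {"pil": PipelineHWC, "tensor": PipelineCHW}
    try:
        return kinds[input_kind]()
    except KeyError:
        raise TypeError(f"Unsupported input kind: {input_kind!r}")
```

Adding a new input type then means adding a new class, leaving the existing ones untouched.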

raise ValueError(f"Values {name} should be positive number, got {values}")


class VerifyIfOrderedPair(ArgumentVerificationRule):
Collaborator

Nitpick: ordered pair is something different

Comment on lines +178 to +179
arg_rules: Sequence[ArgumentVerificationRule] = []
input_rules: Sequence[DataVerificationRule] = []
Collaborator

I like the idea of formalizing and generalizing the verification rules, but do we know how much this overlaps with the existing verification that DALI does? Won't such an overlap affect maintainability?

Collaborator Author

The main reason behind the verification rules was to align with Torchvision's verification of arguments and input data, and to respond with the same error messages when these rules are violated. I agree that there may be an overlap with what DALI does, but the new verification serves a slightly different purpose.
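A minimal sketch of the rule pattern being discussed: each rule is a stateless classmethod that raises with a torchvision-style message, and an operator runs its whole rule list before building anything. The names ArgumentVerificationRule and VerifyIfPositive follow the PR; the bodies here are illustrative.

```python
class ArgumentVerificationRule:
    @classmethod
    def verify(cls, **kwargs):
        raise NotImplementedError

class VerifyIfPositive(ArgumentVerificationRule):
    @classmethod
    def verify(cls, *, values, name, **_):
        vals = values if isinstance(values, (list, tuple)) else [values]
        if any(v <= 0 for v in vals):
            # Error text mirrors the torchvision-style message in the PR
            raise ValueError(f"Values {name} should be positive number, got {values}")

def verify_args(rules, **kwargs):
    # Run every rule; each raises on violation, so success is silent
    for rule in rules:
        rule.verify(**kwargs)
```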

L, RGB, or RGBA mode
- torch.Tensor:
ndim==3 -> ndd.Tensor(layout = "CHW"),
ndim>3 -> ndd.Batch(layout="NCHW")
Collaborator

Is it ndim>3 or ndim==4? Do we support multi-dimensional batches?

Collaborator Author

Torchvision does not limit the number of supported dimensions:

The v2 transforms generally accept an arbitrary number of leading dimensions (..., C, H, W) and can handle batched images or batched videos.

Source

I am currently not sure whether there are use cases for multi-dimensional batches, but I decided to let the DALI backend be the point of failure, so that we can easily pinpoint future issues.
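One common way to honor the "(..., C, H, W)" contract quoted above is to collapse all leading dimensions into a single batch dimension before the backend call and restore them afterwards. This is a sketch of that idea under assumed helper names, not the PR's implementation:

```python
import math

def flatten_leading(shape):
    # (..., C, H, W) -> ((N, C, H, W), leading_dims), where N is the
    # product of the leading dims (1 if there are none)
    *lead, c, h, w = shape
    n = math.prod(lead)  # math.prod(()) == 1
    return (n, c, h, w), tuple(lead)

def restore_leading(shape_nchw, lead):
    # Inverse: re-expand the flattened batch dimension
    n, c, h, w = shape_nchw
    return (*lead, c, h, w)
```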

output = to_torch_tensor(output)
# ToTensor
if self.convert_to_tensor:
if output.shape[-4] > 1:
Collaborator

What if I pass a tensor of shape (100, 1, 2, 3, 4)? Will this 2D batch be caught somewhere?

Collaborator Author

If it is not supported by DALI it will be caught there, otherwise no.

Comment on lines +51 to +52
If the image is torch Tensor, it is expected to have […, H, W] shape, where … means a maximum
of two leading dimensions
Collaborator

I don't get this one. If I have 300x200 image, my tensor is expected to have shape [max(300, 200), 300, 200] = [300, 300, 200]?

(nitpick) Also, "leading dimension" is a term used in BLAS, and here I assume it means something different

@params("gpu", "cpu")
def test_horizontal_random_flip_probability(device):
img = make_test_tensor()
transform = Compose([RandomHorizontalFlip(p=1.0, device=device)]) # always flip
Collaborator

Is Compose required to instantiate an operator? I believe that in torchvision you can use Compose to compose multiple operators, but Compose on a single-element list is a no-op and is not required nor encouraged

Collaborator Author

Is Compose required to instantiate an operator?

Yes, since Compose is a wrapper for a pipeline. It does not make much sense to instantiate standalone operators (and build a pipeline around each of them), since it would promote an antipattern and probably hinder performance. It is better to use functional, which uses dynamic mode.

Torchvision allows standalone operators instantiation, since they have low overhead when executed on CPU.


Signed-off-by: Marek Dabek <mdabek@nvidia.com>
@mdabek-nvidia
Collaborator Author

!build

@dali-automaton
Collaborator

CI MESSAGE: [45326301]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [45326301]: BUILD FAILED

@mdabek-nvidia mdabek-nvidia force-pushed the torchvision_api_infra branch from 34bcd56 to 08bbd62 Compare March 5, 2026 14:50
Modified resize operator size calculation

Signed-off-by: Marek Dabek <mdabek@nvidia.com>
@mdabek-nvidia mdabek-nvidia force-pushed the torchvision_api_infra branch 2 times, most recently from 438737e to d15613b Compare March 5, 2026 15:03
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Signed-off-by: Marek Dabek <mdabek@nvidia.com>
@mdabek-nvidia mdabek-nvidia force-pushed the torchvision_api_infra branch from d15613b to 9ac174a Compare March 5, 2026 15:08
Comment on lines +110 to +115
assert torch.allclose(
torch.tensor(out_tv.shape[1:3]), torch.tensor(out_dali_tv.shape[1:3]), rtol=0, atol=1
), f"Should be:{out_tv.shape} is:{out_dali_tv.shape}"
assert torch.allclose(
torch.tensor(out_fn.shape[1:3]), torch.tensor(out_dali_fn.shape[1:3]), rtol=0, atol=1
), f"Should be:{out_fn.shape} is:{out_dali_fn.shape}"
Contributor

Shape comparison skips width for 4D batched tensors

shape[1:3] extracts different semantic dimensions depending on the tensor rank:

  • For a 3D CHW output (C, H, W), shape[1:3] = (H, W)
  • For a 4D NCHW output (N, C, H, W), shape[1:3] = (C, H)

build_tensors() deliberately creates 4D batched tensors (e.g., torch.ones((1, channels, max_size, max_size)) and torch.ones((10, channels, max_size, max_size))), so for those cases the assertion compares (C, H_dali) against (C, H_tv). Since C is never changed by resize, only height is effectively validated and width is never checked. A buggy resize that produced the correct height but the wrong width would silently pass this assertion for all 4D inputs.

# Spatial dims are the last two axes for both CHW and NCHW outputs,
# so compare shape[-2:] regardless of rank:
dali_spatial = torch.tensor(out_dali_tv.shape[-2:])
tv_spatial = torch.tensor(out_tv.shape[-2:])
assert torch.allclose(tv_spatial, dali_spatial, rtol=0, atol=1), ...

Comment on lines +43 to +54
layout = data_input.property("layout")[0]

# CHW
if layout == np.frombuffer(bytes("C", "utf-8"), dtype=np.uint8)[0]:
input_height = data_input.shape()[-2]
input_width = data_input.shape()[-1]
# HWC
else:
input_height = data_input.shape()[-3]
input_width = data_input.shape()[-2]

return input_height, input_width, data_input
Contributor

Fragile layout detection via single-character comparison

The layout is identified by comparing only the first byte (property("layout")[0]) against the ASCII code of "C". While this correctly distinguishes "CHW" from "HWC" in the current pipeline context (which only ever produces those two layouts), it is brittle:

  • Any layout starting with "C" (e.g., a hypothetical "CWH") would be silently treated as CHW with correct-looking but semantically wrong height/width extraction.
  • "NCHW" starts with "N", so it would fall through to the HWC branch, producing input_height = shape[-3] (=C) and input_width = shape[-2] (=H) — both wrong.

If this function is ever called outside the tightly-controlled pipeline context (e.g., from a custom pipeline using "NCHW" layout), it will silently produce incorrect aspect-ratio calculations. Consider comparing the full layout string or using a dictionary/explicit set:

layout = data_input.property("layout")
if layout in (b"CHW", b"NCHW"):
    input_height = data_input.shape()[-2]
    input_width  = data_input.shape()[-1]
elif layout in (b"HWC", b"NHWC"):
    input_height = data_input.shape()[-3]
    input_width  = data_input.shape()[-2]
else:
    raise ValueError(f"Unsupported layout: {layout!r}")

Comment on lines +138 to +139
if output is None:
return output
Contributor

Dead code: output is None check can never be true

output is initialised to None on line 129 and is unconditionally assigned by both branches of the if/else block above (self.pipe.run(...) is called in both the GPU and CPU path). pipe.run() never returns None, so this guard is unreachable dead code and may give a false sense of safety. Consider removing it or replacing it with an assertion:

Suggested change:

-if output is None:
-    return output
+assert output is not None, "pipe.run() returned None unexpectedly"

Comment on lines +57 to +75
class VerificationSize(ArgumentVerificationRule):
@classmethod
def verify(cls, *, size, max_size, interpolation, **_):
if size is not None and not isinstance(size, int) and not isinstance(size, (tuple, list)):
raise ValueError(
"Invalid combination: size must be int, None, or sequence of two ints. "
"max_size only applies when size is int or None."
)
if size is None and max_size is None:
raise ValueError("Must provide max_size if size is None.")
if size is not None and max_size is not None and np.min(size) > max_size:
raise ValueError("max_size should not be smaller than the actual size")
if isinstance(size, (tuple, list)) and len(size) == 2 and max_size is not None:
raise ValueError(
"max_size should only be passed if size specifies the length of the smaller \
edge, i.e. size should be an int"
)
if interpolation not in Resize.interpolation_modes.keys():
raise ValueError(f"Interpolation {type(interpolation)} is not supported")
Contributor

VerificationSize does not validate that max_size is a positive integer

max_size participates in arithmetic calculations ((max_size * orig_w) / orig_h) and is forwarded directly to DALI's fn.resize(..., max_size=self.max_size). However, the verifier only checks relational constraints against size — it never checks that max_size itself is a positive integer. Passing max_size=0 or max_size=-1 would silently produce nonsensical resize targets or a runtime DALI error with a confusing message.

Consider adding an explicit check:

if max_size is not None and max_size <= 0:
    raise ValueError(f"max_size must be a positive integer, got {max_size}")

Moved resize output size calculation to a Resize class

Signed-off-by: Marek Dabek <mdabek@nvidia.com>
@mdabek-nvidia
Collaborator Author

!build

return output

@abstractmethod
def get_layout(self) -> str: ...

Check notice (Code scanning / CodeQL): Statement has no effect.
def get_layout(self) -> str: ...

@abstractmethod
def get_channel_reverse_idx(self) -> int: ...

Check notice (Code scanning / CodeQL): Statement has no effect.
def get_channel_reverse_idx(self) -> int: ...

@abstractmethod
def verify_layout(self, data) -> None: ...

Check notice (Code scanning / CodeQL): Statement has no effect.
Comment on lines +38 to +43
Resize.verify_args(
size=size, max_size=max_size, interpolation=interpolation, antialias=antialias
)

size_normalized = Resize.infer_effective_size(size, max_size)
interpolation = Resize.interpolation_modes[interpolation]
Contributor

Unsupported interpolation modes silently accepted in functional API

Resize.verify_args() delegates to VerificationSize.verify(), which only checks that interpolation is a key in Resize.interpolation_modes. However, InterpolationMode.NEAREST_EXACT, BOX, and HAMMING are present in that dict (all mapped to INTERP_NN) specifically to avoid a ValueError during the check, while Resize.__init__ explicitly raises NotImplementedError for those same modes.

As a result, calling functional.resize(img, size=512, interpolation=InterpolationMode.NEAREST_EXACT) silently uses nearest-neighbour interpolation instead of raising an error, which is directly inconsistent with the class-based Resize API.

The fix is to add the same guard that Resize.__init__ uses:

if interpolation in Resize.not_supported_interpolation_modes:
    raise NotImplementedError(
        f"Interpolation mode {interpolation} is not supported in DALI."
    )

Comment on lines +142 to +162
class VerifSizeDescriptor(ArgumentVerificationRule):
"""
Verify if the value can describe a size argument, which is:
- an integer
- or a sequence of length of 1,
- or a sequence of length of 2

Parameters
----------
size : any
Value to verify. Should be an integer or a sequence of length 1 or 2.
"""

@classmethod
def verify(cls, *, size, **_) -> None:
if not isinstance(size, (int, list, tuple)):
raise TypeError(f"Size must be int, list, or tuple, got {type(size)}")
elif isinstance(size, (list, tuple)) and len(size) > 2:
raise ValueError(f"Size sequence must have length 1 or 2, got {len(size)}")
VerifyIfPositive.verify(values=size, name="size")

Contributor

VerifSizeDescriptor defined but never used

VerifSizeDescriptor is declared in this file but never appears in any operator's arg_rules, is not imported anywhere else in the package, and its validation is already subsumed by VerificationSize (which calls VerifyIfPositive internally). This is unreferenced dead code that adds noise to the public surface of the module and should be removed.
