Add enable_frame_num='sequence' mode to video readers. by JanuszL · Pull Request #6237 · NVIDIA/DALI

JanuszL · 2026-02-27T10:30:34Z

extends the enable_frame_num argument in both the legacy
(fn.readers.video) and experimental (fn.experimental.readers.video)
video reader operators from a boolean to a string enum, following the
same convention as out_of_bounds_policy:
- "none" (default) - no frame number output (previous False)
- "scalar" - returns the index of the first frame in the decoded
  sequence as a scalar output with shape (1,) (previous True)
- "sequence" - returns the frame index of each decoded frame as an
  additional output with shape (F,); padded frames get index -1
the FrameNumPolicy enum and ParseFrameNumPolicy helper are added to
video_utils.h and shared by both readers.
tests are added for the sequence mode covering basic stride behaviour,
constant-padding (-1 sentinel), and consistency between "scalar"
and "sequence" outputs.

Category:

New feature (non-breaking change which adds functionality)
Relates to #4410

Description:

extends the enable_frame_num argument in both the legacy
(fn.readers.video) and experimental (fn.experimental.readers.video)
video reader operators from a boolean to a string enum, following the
same convention as out_of_bounds_policy:
- "none" (default) - no frame number output (previous False)
- "scalar" - returns the index of the first frame in the decoded
  sequence as a scalar output with shape (1,) (previous True)
- "sequence" - returns the frame index of each decoded frame as an
  additional output with shape (F,); padded frames get index -1
the FrameNumPolicy enum and ParseFrameNumPolicy helper are added to
video_utils.h and shared by both readers.
tests are added for the sequence mode covering basic stride behaviour,
constant-padding (-1 sentinel), and consistency between "scalar"
and "sequence" outputs.

Additional information:

Affected modules and functionalities:

video reader old and experimental one

Key points relevant for the review:

NA

Tests:

Checklist

Documentation

DALI team only

Requirements

Implements new requirements
Affects existing requirements
N/A

REQ IDs: RVID.28

JIRA TASK: N/A

greptile-apps · 2026-02-27T10:35:48Z

Greptile Summary

This PR extends the enable_frame_num argument in both fn.readers.video (legacy) and fn.experimental.readers.video (experimental) from a boolean flag to a three-value string enum ("none" / "scalar" / "sequence"), fitting naturally alongside other existing string-enum arguments like pad_mode. The shared FrameNumPolicy enum and ParseFrameNumPolicy helper centralise parsing in video_utils.h and are used by both readers. The new "sequence" mode outputs a per-frame index tensor of shape (F,) — the experimental reader computes these via HandleBoundary (correctly propagating pad-mode semantics including -1 for constant-padding and reflected/clamped indices for other modes), while the legacy reader uses a simpler linear formula. Backward compatibility with Python True/False booleans is maintained via "True"/"False" string checks.

Key observations:

Missing divisibility guard in test_enable_frame_num_sequence_with_padding: if the test video's frame count is exactly divisible by stride × sequence_length, no padded sequence is ever produced and the assertion always fails. The sibling test test_enable_frame_num_sequence_non_constant_padding already handles this with a SkipTest; the same guard is needed here.
Legacy reader Sequence mode accuracy: the index formula first_frame_idx + i * stride_ is used for all frames where i < prefetched_video.count. If the legacy loader pads sequences to max_count (i.e., prefetched_video.count == count_ for edge/repeat-padded videos), the -1 sentinel is never emitted and the computed indices will extend beyond the actual end of the file for the repeated frames — inconsistent with the experimental reader's behaviour.
ParseFrameNumPolicy is case-sensitive while GetBoundaryType (in the same file) normalises to lowercase. Passing "None" or "Scalar" would throw at runtime.
test_enable_frame_num_sequence_matches_scalar compares the first elements of two independently-constructed pipelines, which is fragile. The established compare_pipelines pattern (used in test_enable_frame_num_true_compat) is more robust.

Confidence Score: 3/5

Safe to merge after addressing the test divisibility guard and clarifying legacy-reader padding semantics
The core feature (FrameNumPolicy enum, ParseFrameNumPolicy, experimental-reader integration) is sound. The experimental reader correctly uses HandleBoundary for all padding modes. The main concerns are: (1) a test that will always fail when the video frame count is divisible by stride×sequence_length — a concrete and reproducible bug; (2) a potential semantic gap in the legacy reader's Sequence mode where edge/repeat-padded frames may report out-of-range indices instead of the actual source-frame index. Both are addressable before merge.
dali/test/python/decoder/test_video.py (missing divisibility guard), dali/operators/video/legacy/reader/video_reader_op.h (linear index formula for padded frames)

Important Files Changed

Filename	Overview
dali/operators/video/video_utils.h	Adds `FrameNumPolicy` enum and `ParseFrameNumPolicy` helper. Logic is correct; case-sensitive matching may surprise users who pass mixed-case strings.
dali/operators/video/legacy/reader/video_reader_op.h	Switches `enable_frame_num_` from `bool` to `FrameNumPolicy`; adds `Sequence` mode using a linear index formula that may produce incorrect results for edge/reflect-padded frames if the loader fills `prefetched_video.count` to `count_`.
dali/operators/video/reader/video_reader_decoder_op.cc	Correctly integrates `FrameNumPolicy` into the experimental reader; `frame_idx_` type narrowed from `int64_t` to `int32_t` (consistent with `DALI_INT32` output). `HandleBoundary` returns `-1` for CONSTANT and correct reflected/clamped indices otherwise — logic verified.
dali/test/python/decoder/test_video.py	Adds six new Python tests covering basic stride, constant-padding (-1 sentinel), non-constant padding, scalar/sequence consistency, and backward-compat for `True`/`False`/`None`. `test_enable_frame_num_sequence_with_padding` is missing the divisibility guard present in its sibling test, which can cause a spurious failure. `test_enable_frame_num_sequence_matches_scalar` relies on ordering equivalence between two independent pipelines rather than `compare_pipelines`.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Python: enable_frame_num=value] --> B[ParseFrameNumPolicy]
    B -->|'none' / False| C[FrameNumPolicy::None\nNo extra output]
    B -->|'scalar' / True| D[FrameNumPolicy::Scalar\nOutput shape 1]
    B -->|'sequence'| E[FrameNumPolicy::Sequence\nOutput shape F]
    B -->|unknown| F[DALI_FAIL]

    D --> G{Reader type}
    E --> G

    G -->|Legacy reader\nvideo_reader_op.h| H[Linear formula\nfirst_frame_idx + i * stride_\nelse -1 if i >= count]
    G -->|Experimental reader\nvideo_reader_decoder_op.cc| I[HandleBoundary\nper pad_mode]

    I -->|CONSTANT| J[Return -1\nfor out-of-bounds]
    I -->|CLAMP / REFLECT| K[Return clamped /\nreflected index]
    I -->|ISOLATED| L[DALI_FAIL\nif out-of-bounds]

    H --> M[GPU copy to output\nframe_num_output_]
    J --> N[frame_idx_ stored in\nVideoSample during Prefetch]
    K --> N
    N --> O[OutputMetadata in RunImpl]

Comments Outside Diff (4)

dali/test/python/decoder/test_video.py, line 499-512 (link)

Missing divisibility guard - test will always fail on exact-divisible video

This test asserts a padded sequence must be found, but there is no guard for the case where total_frames is exactly divisible by stride * sequence_length (i.e., total_frames % (stride * sequence_length) == 0). When the video divides evenly, no padded tail sequence is ever generated, so the loop never finds −1 in idxs, padded_idxs stays None, and the final assertion always fails.

test_enable_frame_num_sequence_non_constant_padding (line 574) already has the correct guard:
```
if total_frames % full_seq_stride == 0:
    raise SkipTest(
        f"Video has {total_frames} frames, exactly divisible by {full_seq_stride}; "
        f"no padding needed"
    )
```
The same check should be added here, right after computing total_frames:
```
full_seq_stride = stride * sequence_length
if total_frames % full_seq_stride == 0:
    raise SkipTest(
        f"Video has {total_frames} frames, exactly divisible by {full_seq_stride}; "
        f"no padding needed for pad_mode='constant'"
    )
full_sequences = total_frames // full_seq_stride
```

dali/operators/video/video_utils.h, line 68-76 (link)

Case-sensitive string matching may confuse users

ParseFrameNumPolicy only recognises exact-case strings: "none", "scalar", "sequence". Any variation ("None", "Scalar", "SEQUENCE", etc.) will throw a runtime DALI_FAIL. For comparison, GetBoundaryType (also in this file) uses a lowercase-normalised comparison via std::transform. Adopting the same pattern would provide a more user-friendly error surface, especially as the docstrings show mixed-case examples ("None", "False").

inline FrameNumPolicy ParseFrameNumPolicy(const std::string &s) {
  std::string lower(s);
  std::transform(lower.begin(), lower.end(), lower.begin(),
                 [](unsigned char c){ return std::tolower(c); });
  if (lower == "none" || lower == "false")   return FrameNumPolicy::None;
  if (lower == "scalar" || lower == "true")  return FrameNumPolicy::Scalar;
  if (lower == "sequence")                   return FrameNumPolicy::Sequence;
  DALI_FAIL(make_string(...));
}

dali/operators/video/legacy/reader/video_reader_op.h, line 175-193 (link)

Sequence mode frame indices may be inaccurate for non-constant-padded frames

The per-frame index is computed as a purely linear formula:
```
idxs[i] = (i < prefetched_video.count)
              ? (prefetched_video.first_frame_idx + i * stride_)
              : -1;
```
This is correct when prefetched_video.count < count_ (indicating constant/zero padding). However, if the legacy loader fills the full max_count slots via edge-repeat or reflect padding, prefetched_video.count == count_ and the -1 branch is never taken. The "padded" repeated/reflected frames would then report frame indices that extend beyond the end of the file (e.g., first_frame_idx + (count_-1) * stride_ for a video that was padded by repeating the last frame), which is incorrect.

The experimental reader avoids this by calling decoder_->HandleBoundary(boundary_type_, ...) which correctly maps each logical slot to its actual source frame index (or -1 for constant padding). Consider adding a comment here documenting the assumption that prefetched_video.count always equals the number of unique decoded frames (not the padded total), or align the logic with the experimental reader.
dali/test/python/decoder/test_video.py, line 652-663 (link)

Two independent pipelines may produce different sample ordering

This test creates two separate pipeline instances (pipe_s / pipe_q) and compares the i-th element from each, implicitly assuming both pipelines always emit exactly the same sample at position i. Although this works deterministically today (same filenames, no random_shuffle, same seed), it is fragile: any internal difference in prefetch depth, initial-fill behaviour, or future reader changes could cause them to diverge silently.

The existing test_enable_frame_num_true_compat test (line 697) already demonstrates the idiomatic solution — compare_pipelines — which explicitly guarantees the two pipelines are advanced in lock-step. Consider adopting the same approach here for robustness:
```
compare_pipelines(pipe_s, pipe_q, batch_size=batch_size, N_iterations=2,
                  compare_fn=lambda a, b: (a[0], a[1][0]) == (b[0], b[1][0]))
```
or compare the frame-index outputs after a manual pipe.run() in a single loop.

_{Last reviewed commit: f5f3664}

greptile-apps

_{7 files reviewed, no comments}

_{Edit Code Review Agent Settings | Greptile}

JanuszL · 2026-02-27T10:40:37Z

!build

dali-automaton · 2026-02-27T10:45:32Z

CI MESSAGE: [44961487]: BUILD STARTED

dali-automaton · 2026-02-27T12:47:18Z

CI MESSAGE: [44961487]: BUILD FAILED

mzient · 2026-02-27T13:01:47Z

This is a breaking change - old code that would pass a boolean value will cease to work after this change.

JanuszL · 2026-02-27T13:11:41Z

This is a breaking change - old code that would pass a boolean value will cease to work after this change.

Good catch. I updated logic to handle True and False, reworked native tests to pass strings, still sequence_processing/video/video_file_list_outputs and checkpointing.test_dali_checkpointing.test_video_reader_resize_reade should check the backward compatibility.

JanuszL · 2026-02-27T13:13:41Z

!build

dali-automaton · 2026-02-27T13:15:42Z

CI MESSAGE: [44968884]: BUILD STARTED

dali-automaton · 2026-02-27T18:11:14Z

CI MESSAGE: [44968884]: BUILD FAILED

JanuszL · 2026-02-27T18:17:09Z

!build

dali-automaton · 2026-02-27T18:20:47Z

CI MESSAGE: [44984063]: BUILD STARTED

greptile-apps · 2026-02-27T18:22:37Z

Additional Comments (2)

dali/test/python/decoder/test_video.py, line 980
Consider testing enable_frame_num="sequence" with non-constant padding modes (edge, reflect, etc.) to verify behavior. The experimental reader uses HandleBoundary() which returns reflected/clamped indices for non-constant modes, while the legacy reader always uses -1 for padded frames regardless of mode.

dali/operators/video/video_utils.h, line 70
Consider using || instead of or for consistency with typical C++ style, although both are valid

_{Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!}

JanuszL · 2026-02-27T18:49:09Z

!build

dali-automaton · 2026-02-27T18:56:05Z

CI MESSAGE: [44986246]: BUILD STARTED

dali-automaton · 2026-02-27T19:55:13Z

CI MESSAGE: [44984063]: BUILD PASSED

dali-automaton · 2026-02-27T20:47:18Z

CI MESSAGE: [44986246]: BUILD PASSED

jantonguirao · 2026-03-02T11:46:16Z

dali/operators/video/legacy/reader/video_reader_op.cc

+      R"code(Determines what frame number information is returned as an additional output.
+Only available when `file_list` or `filenames` with `labels` is passed.
+
+* ``"none"`` or ``False`` (default): No frame number output.


I wonder if we need the string "none", as opposed to accepting None

Fixed to None.

jantonguirao · 2026-03-02T11:52:28Z

dali/test/python/decoder/test_video.py

+            f"First element of 'sequence' output ({seq_idxs[0]}) should match "
+            f"'scalar' output ({scalar_idx})"
+        )
+


can we add tests with legacy syntax? =False and =True?

greptile-apps · 2026-03-02T14:05:52Z

Additional Comments (3)

dali/operators/video/reader/video_reader_decoder_op.cc, line 473
Non-uniform shape for variable-length sequences

The kSequence output shape is allocated as uniform_list_shape<1>(batch_size, {num_frames}) using only GetSample(0)'s frame count. However, timestamps — which face the exact same problem — correctly iterates over all samples:

// Correct pattern used by timestamps (lines 477-483):
TensorListShape<1> timestamps_shape(batch_size);
for (int sample_id = 0; sample_id < batch_size; ++sample_id) {
    auto &sample = GetSample(sample_id);
    auto num_frames = sample.data_.shape()[0];
    timestamps_shape.set_tensor_shape(sample_id, {num_frames});
}

With pad_mode='none' (ISOLATED boundary) sequences near the end of the video can be shorter than sequence_length, producing samples where data_.shape()[0] < GetSample(0).data_.shape()[0]. When OutputMetadata then iterates over s.frame_idx_ sequentially into the pre-allocated flat buffer it will write the shorter per-sample data into the wrong offsets for subsequent samples, corrupting all following entries. The fix is to use the same per-sample iteration pattern as timestamps:

} else if (frame_num_policy_ == FrameNumPolicy::kSequence) {
    TensorListShape<1> frame_idx_shape(batch_size);
    for (int sample_id = 0; sample_id < batch_size; ++sample_id) {
        auto &sample = GetSample(sample_id);
        auto num_frames = sample.data_.shape()[0];
        frame_idx_shape.set_tensor_shape(sample_id, {num_frames});
    }
    output_desc.push_back({frame_idx_shape, DALI_INT32});
}

dali/operators/video/video_utils.h, line 70
Case-sensitive Python bool backward-compat strings

The strings "True" and "False" rely on Python's str(True) and str(False) representations being title-cased. If DALI's Python-to-C++ argument coercion ever changes to lowercase ("true"/"false") these branches would silently fall through to DALI_FAIL. A brief comment explaining the convention and the source of these strings would make this intent clear to future readers:

  if (s == "none" || s == "False")   return FrameNumPolicy::kNone;    // "False"/"True" for Python bool back-compat
  if (s == "scalar" || s == "True")  return FrameNumPolicy::kScalar;  // Python: str(True) == "True"

_{Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!}

dali/operators/video/legacy/reader/video_reader_op.h, line 95
kSequence shape uses fixed count_ but frame index computation depends on actual prefetched_video.count

The shape is pre-allocated as {count_} (line 94 via label_shape_) for all samples, and the loop at line 180 fills indices up to count_ with -1 sentinels beyond prefetched_video.count. This is intentional — the output always has count_ elements.

However, the arithmetic prefetched_video.first_frame_idx + i * stride_ is a raw linear computation that bypasses the ROI handling performed by the VideoLoader. If first_frame_idx is a file-relative frame number and the clip starts mid-video, the reported indices are correct. But please verify that first_frame_idx is set to the absolute video frame index (not an ROI-relative one) to ensure consistency with the experimental reader's output, which uses the absolute frame index via HandleBoundary.

JanuszL · 2026-03-02T14:28:03Z

!build

dali-automaton · 2026-03-02T14:30:31Z

CI MESSAGE: [45141308]: BUILD STARTED

dali-automaton · 2026-03-03T05:14:07Z

CI MESSAGE: [45141308]: BUILD FAILED

dali-automaton · 2026-03-05T00:44:52Z

CI MESSAGE: [45141308]: BUILD PASSED

dali-automaton · 2026-03-05T06:05:27Z

CI MESSAGE: [45141308]: BUILD PASSED

mzient · 2026-03-05T09:00:54Z

dali/operators/video/video_utils.h

+  kNone,      // no frame number output
+  kScalar,    // first frame index as a scalar with shape (1,)
+  kSequence   // per-frame indices with shape (F,); padded frames get -1


Nit: we don't use kOnstant names for enum values, especially ones without an explicit numerical representation.

JanuszL · 2026-03-05T10:00:37Z

!build

dali-automaton · 2026-03-05T10:05:58Z

CI MESSAGE: [45416810]: BUILD STARTED

dali-automaton · 2026-03-05T10:36:37Z

CI MESSAGE: [45416810]: BUILD FAILED

- extends the `enable_frame_num` argument in both the legacy (`fn.readers.video`) and experimental (`fn.experimental.readers.video`) video reader operators from a boolean to a string enum, following the same convention as `out_of_bounds_policy`: * ``"none"`` (default) - no frame number output (previous `False`) * ``"scalar"`` - returns the index of the first frame in the decoded sequence as a scalar output with shape `(1,)` (previous `True`) * ``"sequence"`` - returns the frame index of each decoded frame as an additional output with shape `(F,)`; padded frames get index `-1` - the `FrameNumPolicy` enum and `ParseFrameNumPolicy` helper are added to `video_utils.h` and shared by both readers. - tests are added for the `sequence` mode covering basic stride behaviour, constant-padding (``-1`` sentinel), and consistency between `"scalar"` and `"sequence"` outputs. Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>

JanuszL · 2026-03-05T10:48:03Z

!build

dali-automaton · 2026-03-05T10:50:31Z

CI MESSAGE: [45419309]: BUILD STARTED

dali-automaton · 2026-03-05T12:10:31Z

CI MESSAGE: [45419309]: BUILD FAILED

greptile-apps bot reviewed Feb 27, 2026

View reviewed changes

dali-automaton assigned jantonguirao and mzient Feb 27, 2026

JanuszL force-pushed the enable_filled_frame_nums branch from 3774066 to 711d1bd Compare February 27, 2026 18:48

jantonguirao reviewed Mar 2, 2026

View reviewed changes

jantonguirao approved these changes Mar 5, 2026

View reviewed changes

mzient reviewed Mar 5, 2026

View reviewed changes

mzient approved these changes Mar 5, 2026

View reviewed changes

JanuszL added 8 commits March 5, 2026 11:39

Fixes

2ee0f99

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>

Fix

3d6a74c

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>

Fixes

80e167e

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>

More tests

37d8ac8

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>

Fix

1be832e

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>

Review fix

24c84e5

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>

Fix

f5f3664

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>

JanuszL force-pushed the enable_filled_frame_nums branch from ae299d5 to f5f3664 Compare March 5, 2026 10:42

Conversation

JanuszL commented Feb 27, 2026

Category:

Description:

Additional information:

Affected modules and functionalities:

Key points relevant for the review:

Tests:

Checklist

Documentation

DALI team only

Requirements

Uh oh!

greptile-apps bot commented Feb 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 3/5

Important Files Changed

Flowchart

Comments Outside Diff (4)

Uh oh!

greptile-apps bot left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

JanuszL commented Feb 27, 2026

Uh oh!

dali-automaton commented Feb 27, 2026

Uh oh!

dali-automaton commented Feb 27, 2026

Uh oh!

mzient commented Feb 27, 2026

Uh oh!

JanuszL commented Feb 27, 2026

Uh oh!

JanuszL commented Feb 27, 2026

Uh oh!

dali-automaton commented Feb 27, 2026

Uh oh!

dali-automaton commented Feb 27, 2026

Uh oh!

JanuszL commented Feb 27, 2026

Uh oh!

dali-automaton commented Feb 27, 2026

Uh oh!

greptile-apps bot commented Feb 27, 2026

Uh oh!

JanuszL commented Feb 27, 2026

Uh oh!

dali-automaton commented Feb 27, 2026

Uh oh!

dali-automaton commented Feb 27, 2026

Uh oh!

dali-automaton commented Feb 27, 2026

Uh oh!

jantonguirao Mar 2, 2026

Choose a reason for hiding this comment

Uh oh!

JanuszL Mar 2, 2026

Choose a reason for hiding this comment

Uh oh!

jantonguirao Mar 2, 2026

Choose a reason for hiding this comment

Uh oh!

JanuszL Mar 2, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot commented Mar 2, 2026

Uh oh!

JanuszL commented Mar 2, 2026

Uh oh!

dali-automaton commented Mar 2, 2026

Uh oh!

dali-automaton commented Mar 3, 2026

Uh oh!

dali-automaton commented Mar 5, 2026

Uh oh!

dali-automaton commented Mar 5, 2026

Uh oh!

mzient Mar 5, 2026

greptile-apps bot commented Feb 27, 2026 •

edited

Loading

greptile-apps bot left a comment •

edited

Loading