Skip to content

Add enable_frame_num='sequence' mode to video readers.#6237

Open
JanuszL wants to merge 8 commits intoNVIDIA:mainfrom
JanuszL:enable_filled_frame_nums
Open

Add enable_frame_num='sequence' mode to video readers.#6237
JanuszL wants to merge 8 commits intoNVIDIA:mainfrom
JanuszL:enable_filled_frame_nums

Conversation

@JanuszL
Copy link
Contributor

@JanuszL JanuszL commented Feb 27, 2026

  • extends the enable_frame_num argument in both the legacy
    (fn.readers.video) and experimental (fn.experimental.readers.video)
    video reader operators from a boolean to a string enum, following the
    same convention as out_of_bounds_policy:

    • "none" (default) - no frame number output (previous False)
    • "scalar" - returns the index of the first frame in the decoded
      sequence as a scalar output with shape (1,) (previous True)
    • "sequence" - returns the frame index of each decoded frame as an
      additional output with shape (F,); padded frames get index -1
  • the FrameNumPolicy enum and ParseFrameNumPolicy helper are added to
    video_utils.h and shared by both readers.

  • tests are added for the sequence mode covering basic stride behaviour,
    constant-padding (-1 sentinel), and consistency between "scalar"
    and "sequence" outputs.

Category:

New feature (non-breaking change which adds functionality)
Relates to #4410

Description:

  • extends the enable_frame_num argument in both the legacy
    (fn.readers.video) and experimental (fn.experimental.readers.video)
    video reader operators from a boolean to a string enum, following the
    same convention as out_of_bounds_policy:

    • "none" (default) - no frame number output (previous False)
    • "scalar" - returns the index of the first frame in the decoded
      sequence as a scalar output with shape (1,) (previous True)
    • "sequence" - returns the frame index of each decoded frame as an
      additional output with shape (F,); padded frames get index -1
  • the FrameNumPolicy enum and ParseFrameNumPolicy helper are added to
    video_utils.h and shared by both readers.

  • tests are added for the sequence mode covering basic stride behaviour,
    constant-padding (-1 sentinel), and consistency between "scalar"
    and "sequence" outputs.

Additional information:

Affected modules and functionalities:

  • video reader old and experimental one

Key points relevant for the review:

  • NA

Tests:

  • Existing tests apply
  • New tests added
    • Python tests
      • test_video
    • GTests
    • Benchmark
    • Other
  • N/A

Checklist

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: RVID.28

JIRA TASK: N/A

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Feb 27, 2026

Greptile Summary

This PR extends the enable_frame_num argument in both fn.readers.video (legacy) and fn.experimental.readers.video (experimental) from a boolean flag to a three-value string enum ("none" / "scalar" / "sequence"), fitting naturally alongside other existing string-enum arguments like pad_mode. The shared FrameNumPolicy enum and ParseFrameNumPolicy helper centralise parsing in video_utils.h and are used by both readers. The new "sequence" mode outputs a per-frame index tensor of shape (F,) — the experimental reader computes these via HandleBoundary (correctly propagating pad-mode semantics including -1 for constant-padding and reflected/clamped indices for other modes), while the legacy reader uses a simpler linear formula. Backward compatibility with Python True/False booleans is maintained via "True"/"False" string checks.

Key observations:

  • Missing divisibility guard in test_enable_frame_num_sequence_with_padding: if the test video's frame count is exactly divisible by stride × sequence_length, no padded sequence is ever produced and the assertion always fails. The sibling test test_enable_frame_num_sequence_non_constant_padding already handles this with a SkipTest; the same guard is needed here.
  • Legacy reader Sequence mode accuracy: the index formula first_frame_idx + i * stride_ is used for all frames where i < prefetched_video.count. If the legacy loader pads sequences to max_count (i.e., prefetched_video.count == count_ for edge/repeat-padded videos), the -1 sentinel is never emitted and the computed indices will extend beyond the actual end of the file for the repeated frames — inconsistent with the experimental reader's behaviour.
  • ParseFrameNumPolicy is case-sensitive while GetBoundaryType (in the same file) normalises to lowercase. Passing "None" or "Scalar" would throw at runtime.
  • test_enable_frame_num_sequence_matches_scalar compares the first elements of two independently-constructed pipelines, which is fragile. The established compare_pipelines pattern (used in test_enable_frame_num_true_compat) is more robust.

Confidence Score: 3/5

  • Safe to merge after addressing the test divisibility guard and clarifying legacy-reader padding semantics
  • The core feature (FrameNumPolicy enum, ParseFrameNumPolicy, experimental-reader integration) is sound. The experimental reader correctly uses HandleBoundary for all padding modes. The main concerns are: (1) a test that will always fail when the video frame count is divisible by stride×sequence_length — a concrete and reproducible bug; (2) a potential semantic gap in the legacy reader's Sequence mode where edge/repeat-padded frames may report out-of-range indices instead of the actual source-frame index. Both are addressable before merge.
  • dali/test/python/decoder/test_video.py (missing divisibility guard), dali/operators/video/legacy/reader/video_reader_op.h (linear index formula for padded frames)

Important Files Changed

Filename Overview
dali/operators/video/video_utils.h Adds FrameNumPolicy enum and ParseFrameNumPolicy helper. Logic is correct; case-sensitive matching may surprise users who pass mixed-case strings.
dali/operators/video/legacy/reader/video_reader_op.h Switches enable_frame_num_ from bool to FrameNumPolicy; adds Sequence mode using a linear index formula that may produce incorrect results for edge/reflect-padded frames if the loader fills prefetched_video.count to count_.
dali/operators/video/reader/video_reader_decoder_op.cc Correctly integrates FrameNumPolicy into the experimental reader; frame_idx_ type narrowed from int64_t to int32_t (consistent with DALI_INT32 output). HandleBoundary returns -1 for CONSTANT and correct reflected/clamped indices otherwise — logic verified.
dali/test/python/decoder/test_video.py Adds six new Python tests covering basic stride, constant-padding (-1 sentinel), non-constant padding, scalar/sequence consistency, and backward-compat for True/False/None. test_enable_frame_num_sequence_with_padding is missing the divisibility guard present in its sibling test, which can cause a spurious failure. test_enable_frame_num_sequence_matches_scalar relies on ordering equivalence between two independent pipelines rather than compare_pipelines.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Python: enable_frame_num=value] --> B[ParseFrameNumPolicy]
    B -->|'none' / False| C[FrameNumPolicy::None\nNo extra output]
    B -->|'scalar' / True| D[FrameNumPolicy::Scalar\nOutput shape 1]
    B -->|'sequence'| E[FrameNumPolicy::Sequence\nOutput shape F]
    B -->|unknown| F[DALI_FAIL]

    D --> G{Reader type}
    E --> G

    G -->|Legacy reader\nvideo_reader_op.h| H[Linear formula\nfirst_frame_idx + i * stride_\nelse -1 if i >= count]
    G -->|Experimental reader\nvideo_reader_decoder_op.cc| I[HandleBoundary\nper pad_mode]

    I -->|CONSTANT| J[Return -1\nfor out-of-bounds]
    I -->|CLAMP / REFLECT| K[Return clamped /\nreflected index]
    I -->|ISOLATED| L[DALI_FAIL\nif out-of-bounds]

    H --> M[GPU copy to output\nframe_num_output_]
    J --> N[frame_idx_ stored in\nVideoSample during Prefetch]
    K --> N
    N --> O[OutputMetadata in RunImpl]
Loading

Comments Outside Diff (4)

  1. dali/test/python/decoder/test_video.py, line 499-512 (link)

    Missing divisibility guard - test will always fail on exact-divisible video

    This test asserts a padded sequence must be found, but there is no guard for the case where total_frames is exactly divisible by stride * sequence_length (i.e., total_frames % (stride * sequence_length) == 0). When the video divides evenly, no padded tail sequence is ever generated, so the loop never finds −1 in idxs, padded_idxs stays None, and the final assertion always fails.

    test_enable_frame_num_sequence_non_constant_padding (line 574) already has the correct guard:

    if total_frames % full_seq_stride == 0:
        raise SkipTest(
            f"Video has {total_frames} frames, exactly divisible by {full_seq_stride}; "
            f"no padding needed"
        )

    The same check should be added here, right after computing total_frames:

    full_seq_stride = stride * sequence_length
    if total_frames % full_seq_stride == 0:
        raise SkipTest(
            f"Video has {total_frames} frames, exactly divisible by {full_seq_stride}; "
            f"no padding needed for pad_mode='constant'"
        )
    full_sequences = total_frames // full_seq_stride
  2. dali/operators/video/video_utils.h, line 68-76 (link)

    Case-sensitive string matching may confuse users

    ParseFrameNumPolicy only recognises exact-case strings: "none", "scalar", "sequence". Any variation ("None", "Scalar", "SEQUENCE", etc.) will throw a runtime DALI_FAIL. For comparison, GetBoundaryType (also in this file) uses a lowercase-normalised comparison via std::transform. Adopting the same pattern would provide a more user-friendly error surface, especially as the docstrings show mixed-case examples ("None", "False").

    inline FrameNumPolicy ParseFrameNumPolicy(const std::string &s) {
      std::string lower(s);
      std::transform(lower.begin(), lower.end(), lower.begin(),
                     [](unsigned char c){ return std::tolower(c); });
      if (lower == "none" || lower == "false")   return FrameNumPolicy::None;
      if (lower == "scalar" || lower == "true")  return FrameNumPolicy::Scalar;
      if (lower == "sequence")                   return FrameNumPolicy::Sequence;
      DALI_FAIL(make_string(...));
    }
  3. dali/operators/video/legacy/reader/video_reader_op.h, line 175-193 (link)

    Sequence mode frame indices may be inaccurate for non-constant-padded frames

    The per-frame index is computed as a purely linear formula:

    idxs[i] = (i < prefetched_video.count)
                  ? (prefetched_video.first_frame_idx + i * stride_)
                  : -1;

    This is correct when prefetched_video.count < count_ (indicating constant/zero padding). However, if the legacy loader fills the full max_count slots via edge-repeat or reflect padding, prefetched_video.count == count_ and the -1 branch is never taken. The "padded" repeated/reflected frames would then report frame indices that extend beyond the end of the file (e.g., first_frame_idx + (count_-1) * stride_ for a video that was padded by repeating the last frame), which is incorrect.

    The experimental reader avoids this by calling decoder_->HandleBoundary(boundary_type_, ...) which correctly maps each logical slot to its actual source frame index (or -1 for constant padding). Consider adding a comment here documenting the assumption that prefetched_video.count always equals the number of unique decoded frames (not the padded total), or align the logic with the experimental reader.

  4. dali/test/python/decoder/test_video.py, line 652-663 (link)

    Two independent pipelines may produce different sample ordering

    This test creates two separate pipeline instances (pipe_s / pipe_q) and compares the i-th element from each, implicitly assuming both pipelines always emit exactly the same sample at position i. Although this works deterministically today (same filenames, no random_shuffle, same seed), it is fragile: any internal difference in prefetch depth, initial-fill behaviour, or future reader changes could cause them to diverge silently.

    The existing test_enable_frame_num_true_compat test (line 697) already demonstrates the idiomatic solution — compare_pipelines — which explicitly guarantees the two pipelines are advanced in lock-step. Consider adopting the same approach here for robustness:

    compare_pipelines(pipe_s, pipe_q, batch_size=batch_size, N_iterations=2,
                      compare_fn=lambda a, b: (a[0], a[1][0]) == (b[0], b[1][0]))

    or compare the frame-index outputs after a manual pipe.run() in a single loop.

Last reviewed commit: f5f3664

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

7 files reviewed, no comments

Edit Code Review Agent Settings | Greptile

@JanuszL
Copy link
Contributor Author

JanuszL commented Feb 27, 2026

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [44961487]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [44961487]: BUILD FAILED

@mzient
Copy link
Contributor

mzient commented Feb 27, 2026

This is a breaking change - old code that would pass a boolean value will cease to work after this change.

@JanuszL
Copy link
Contributor Author

JanuszL commented Feb 27, 2026

This is a breaking change - old code that would pass a boolean value will cease to work after this change.

Good catch. I updated logic to handle True and False, reworked native tests to pass strings, still sequence_processing/video/video_file_list_outputs and checkpointing.test_dali_checkpointing.test_video_reader_resize_reade should check the backward compatibility.

@JanuszL
Copy link
Contributor Author

JanuszL commented Feb 27, 2026

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [44968884]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [44968884]: BUILD FAILED

@JanuszL
Copy link
Contributor Author

JanuszL commented Feb 27, 2026

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [44984063]: BUILD STARTED

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Feb 27, 2026

Additional Comments (2)

dali/test/python/decoder/test_video.py, line 980
Consider testing enable_frame_num="sequence" with non-constant padding modes (edge, reflect, etc.) to verify behavior. The experimental reader uses HandleBoundary() which returns reflected/clamped indices for non-constant modes, while the legacy reader always uses -1 for padded frames regardless of mode.


dali/operators/video/video_utils.h, line 70
Consider using || instead of or for consistency with typical C++ style, although both are valid

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

@JanuszL JanuszL force-pushed the enable_filled_frame_nums branch from 3774066 to 711d1bd Compare February 27, 2026 18:48
@JanuszL
Copy link
Contributor Author

JanuszL commented Feb 27, 2026

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [44986246]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [44984063]: BUILD PASSED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [44986246]: BUILD PASSED

R"code(Determines what frame number information is returned as an additional output.
Only available when `file_list` or `filenames` with `labels` is passed.

* ``"none"`` or ``False`` (default): No frame number output.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we need the string "none", as opposed to accepting None

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed to None.

f"First element of 'sequence' output ({seq_idxs[0]}) should match "
f"'scalar' output ({scalar_idx})"
)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add tests with legacy syntax? =False and =True?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 2, 2026

Additional Comments (3)

dali/operators/video/reader/video_reader_decoder_op.cc, line 473
Non-uniform shape for variable-length sequences

The kSequence output shape is allocated as uniform_list_shape<1>(batch_size, {num_frames}) using only GetSample(0)'s frame count. However, timestamps — which face the exact same problem — correctly iterates over all samples:

// Correct pattern used by timestamps (lines 477-483):
TensorListShape<1> timestamps_shape(batch_size);
for (int sample_id = 0; sample_id < batch_size; ++sample_id) {
    auto &sample = GetSample(sample_id);
    auto num_frames = sample.data_.shape()[0];
    timestamps_shape.set_tensor_shape(sample_id, {num_frames});
}

With pad_mode='none' (ISOLATED boundary) sequences near the end of the video can be shorter than sequence_length, producing samples where data_.shape()[0] < GetSample(0).data_.shape()[0]. When OutputMetadata then iterates over s.frame_idx_ sequentially into the pre-allocated flat buffer it will write the shorter per-sample data into the wrong offsets for subsequent samples, corrupting all following entries. The fix is to use the same per-sample iteration pattern as timestamps:

} else if (frame_num_policy_ == FrameNumPolicy::kSequence) {
    TensorListShape<1> frame_idx_shape(batch_size);
    for (int sample_id = 0; sample_id < batch_size; ++sample_id) {
        auto &sample = GetSample(sample_id);
        auto num_frames = sample.data_.shape()[0];
        frame_idx_shape.set_tensor_shape(sample_id, {num_frames});
    }
    output_desc.push_back({frame_idx_shape, DALI_INT32});
}

dali/operators/video/video_utils.h, line 70
Case-sensitive Python bool backward-compat strings

The strings "True" and "False" rely on Python's str(True) and str(False) representations being title-cased. If DALI's Python-to-C++ argument coercion ever changes to lowercase ("true"/"false") these branches would silently fall through to DALI_FAIL. A brief comment explaining the convention and the source of these strings would make this intent clear to future readers:

  if (s == "none" || s == "False")   return FrameNumPolicy::kNone;    // "False"/"True" for Python bool back-compat
  if (s == "scalar" || s == "True")  return FrameNumPolicy::kScalar;  // Python: str(True) == "True"

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!


dali/operators/video/legacy/reader/video_reader_op.h, line 95
kSequence shape uses fixed count_ but frame index computation depends on actual prefetched_video.count

The shape is pre-allocated as {count_} (line 94 via label_shape_) for all samples, and the loop at line 180 fills indices up to count_ with -1 sentinels beyond prefetched_video.count. This is intentional — the output always has count_ elements.

However, the arithmetic prefetched_video.first_frame_idx + i * stride_ is a raw linear computation that bypasses the ROI handling performed by the VideoLoader. If first_frame_idx is a file-relative frame number and the clip starts mid-video, the reported indices are correct. But please verify that first_frame_idx is set to the absolute video frame index (not an ROI-relative one) to ensure consistency with the experimental reader's output, which uses the absolute frame index via HandleBoundary.

@JanuszL
Copy link
Contributor Author

JanuszL commented Mar 2, 2026

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [45141308]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [45141308]: BUILD FAILED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [45141308]: BUILD PASSED

1 similar comment
@dali-automaton
Copy link
Collaborator

CI MESSAGE: [45141308]: BUILD PASSED

Comment on lines +63 to +65
kNone, // no frame number output
kScalar, // first frame index as a scalar with shape (1,)
kSequence // per-frame indices with shape (F,); padded frames get -1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: we don't use kOnstant names for enum values, especially ones without an explicit numerical representation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@JanuszL
Copy link
Contributor Author

JanuszL commented Mar 5, 2026

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [45416810]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [45416810]: BUILD FAILED

JanuszL added 8 commits March 5, 2026 11:39
- extends the `enable_frame_num` argument in both the legacy
  (`fn.readers.video`) and experimental (`fn.experimental.readers.video`)
  video reader operators from a boolean to a string enum, following the
  same convention as `out_of_bounds_policy`:
  * ``"none"`` (default) - no frame number output (previous `False`)
  * ``"scalar"`` - returns the index of the first frame in the decoded
    sequence as a scalar output with shape `(1,)` (previous `True`)
  * ``"sequence"`` - returns the frame index of each decoded frame as an
    additional output with shape `(F,)`; padded frames get index `-1`

- the `FrameNumPolicy` enum and `ParseFrameNumPolicy` helper are added to
  `video_utils.h` and shared by both readers.
- tests are added for the `sequence` mode covering basic stride behaviour,
  constant-padding (``-1`` sentinel), and consistency between `"scalar"`
  and `"sequence"` outputs.

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
@JanuszL JanuszL force-pushed the enable_filled_frame_nums branch from ae299d5 to f5f3664 Compare March 5, 2026 10:42
@JanuszL
Copy link
Contributor Author

JanuszL commented Mar 5, 2026

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [45419309]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [45419309]: BUILD FAILED

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants