Add enable_frame_num='sequence' mode to video readers.#6237
Add enable_frame_num='sequence' mode to video readers.#6237JanuszL wants to merge 8 commits intoNVIDIA:mainfrom
Conversation
Greptile SummaryThis PR extends the Key observations:
Confidence Score: 3/5
Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Python: enable_frame_num=value] --> B[ParseFrameNumPolicy]
B -->|'none' / False| C[FrameNumPolicy::None\nNo extra output]
B -->|'scalar' / True| D[FrameNumPolicy::Scalar\nOutput shape 1]
B -->|'sequence'| E[FrameNumPolicy::Sequence\nOutput shape F]
B -->|unknown| F[DALI_FAIL]
D --> G{Reader type}
E --> G
G -->|Legacy reader\nvideo_reader_op.h| H[Linear formula\nfirst_frame_idx + i * stride_\nelse -1 if i >= count]
G -->|Experimental reader\nvideo_reader_decoder_op.cc| I[HandleBoundary\nper pad_mode]
I -->|CONSTANT| J[Return -1\nfor out-of-bounds]
I -->|CLAMP / REFLECT| K[Return clamped /\nreflected index]
I -->|ISOLATED| L[DALI_FAIL\nif out-of-bounds]
H --> M[GPU copy to output\nframe_num_output_]
J --> N[frame_idx_ stored in\nVideoSample during Prefetch]
K --> N
N --> O[OutputMetadata in RunImpl]
|
|
!build |
|
CI MESSAGE: [44961487]: BUILD STARTED |
|
CI MESSAGE: [44961487]: BUILD FAILED |
|
This is a breaking change - old code that would pass a boolean value will cease to work after this change. |
Good catch. I updated logic to handle |
|
!build |
|
CI MESSAGE: [44968884]: BUILD STARTED |
|
CI MESSAGE: [44968884]: BUILD FAILED |
|
!build |
|
CI MESSAGE: [44984063]: BUILD STARTED |
Additional Comments (2)
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time! |
3774066 to
711d1bd
Compare
|
!build |
|
CI MESSAGE: [44986246]: BUILD STARTED |
|
CI MESSAGE: [44984063]: BUILD PASSED |
|
CI MESSAGE: [44986246]: BUILD PASSED |
| R"code(Determines what frame number information is returned as an additional output. | ||
| Only available when `file_list` or `filenames` with `labels` is passed. | ||
|
|
||
| * ``"none"`` or ``False`` (default): No frame number output. |
There was a problem hiding this comment.
I wonder if we need the string "none", as opposed to accepting None
| f"First element of 'sequence' output ({seq_idxs[0]}) should match " | ||
| f"'scalar' output ({scalar_idx})" | ||
| ) | ||
|
|
There was a problem hiding this comment.
can we add tests with legacy syntax? =False and =True?
Additional Comments (3)
The // Correct pattern used by timestamps (lines 477-483):
TensorListShape<1> timestamps_shape(batch_size);
for (int sample_id = 0; sample_id < batch_size; ++sample_id) {
auto &sample = GetSample(sample_id);
auto num_frames = sample.data_.shape()[0];
timestamps_shape.set_tensor_shape(sample_id, {num_frames});
}With } else if (frame_num_policy_ == FrameNumPolicy::kSequence) {
TensorListShape<1> frame_idx_shape(batch_size);
for (int sample_id = 0; sample_id < batch_size; ++sample_id) {
auto &sample = GetSample(sample_id);
auto num_frames = sample.data_.shape()[0];
frame_idx_shape.set_tensor_shape(sample_id, {num_frames});
}
output_desc.push_back({frame_idx_shape, DALI_INT32});
}
The strings Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
The shape is pre-allocated as However, the arithmetic |
|
!build |
|
CI MESSAGE: [45141308]: BUILD STARTED |
|
CI MESSAGE: [45141308]: BUILD FAILED |
|
CI MESSAGE: [45141308]: BUILD PASSED |
1 similar comment
|
CI MESSAGE: [45141308]: BUILD PASSED |
dali/operators/video/video_utils.h
Outdated
| kNone, // no frame number output | ||
| kScalar, // first frame index as a scalar with shape (1,) | ||
| kSequence // per-frame indices with shape (F,); padded frames get -1 |
There was a problem hiding this comment.
Nit: we don't use kOnstant names for enum values, especially ones without an explicit numerical representation.
|
!build |
|
CI MESSAGE: [45416810]: BUILD STARTED |
|
CI MESSAGE: [45416810]: BUILD FAILED |
- extends the `enable_frame_num` argument in both the legacy
(`fn.readers.video`) and experimental (`fn.experimental.readers.video`)
video reader operators from a boolean to a string enum, following the
same convention as `out_of_bounds_policy`:
* ``"none"`` (default) - no frame number output (previous `False`)
* ``"scalar"`` - returns the index of the first frame in the decoded
sequence as a scalar output with shape `(1,)` (previous `True`)
* ``"sequence"`` - returns the frame index of each decoded frame as an
additional output with shape `(F,)`; padded frames get index `-1`
- the `FrameNumPolicy` enum and `ParseFrameNumPolicy` helper are added to
`video_utils.h` and shared by both readers.
- tests are added for the `sequence` mode covering basic stride behaviour,
constant-padding (``-1`` sentinel), and consistency between `"scalar"`
and `"sequence"` outputs.
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
ae299d5 to
f5f3664
Compare
|
!build |
|
CI MESSAGE: [45419309]: BUILD STARTED |
|
CI MESSAGE: [45419309]: BUILD FAILED |
extends the
enable_frame_numargument in both the legacy(
fn.readers.video) and experimental (fn.experimental.readers.video)video reader operators from a boolean to a string enum, following the
same convention as
out_of_bounds_policy:"none"(default) - no frame number output (previousFalse)"scalar"- returns the index of the first frame in the decodedsequence as a scalar output with shape
(1,)(previousTrue)"sequence"- returns the frame index of each decoded frame as anadditional output with shape
(F,); padded frames get index-1the
FrameNumPolicyenum andParseFrameNumPolicyhelper are added tovideo_utils.hand shared by both readers.tests are added for the
sequencemode covering basic stride behaviour,constant-padding (
-1sentinel), and consistency between"scalar"and
"sequence"outputs.Category:
New feature (non-breaking change which adds functionality)
Relates to #4410
Description:
extends the
enable_frame_numargument in both the legacy(
fn.readers.video) and experimental (fn.experimental.readers.video)video reader operators from a boolean to a string enum, following the
same convention as
out_of_bounds_policy:"none"(default) - no frame number output (previousFalse)"scalar"- returns the index of the first frame in the decodedsequence as a scalar output with shape
(1,)(previousTrue)"sequence"- returns the frame index of each decoded frame as anadditional output with shape
(F,); padded frames get index-1the
FrameNumPolicyenum andParseFrameNumPolicyhelper are added tovideo_utils.hand shared by both readers.tests are added for the
sequencemode covering basic stride behaviour,constant-padding (
-1sentinel), and consistency between"scalar"and
"sequence"outputs.Additional information:
Affected modules and functionalities:
Key points relevant for the review:
Tests:
Checklist
Documentation
DALI team only
Requirements
REQ IDs: RVID.28
JIRA TASK: N/A