feat(parse): implement video key frame extraction with metadata by mvanhorn · Pull Request #943 · volcengine/OpenViking

mvanhorn · 2026-03-25T00:21:31Z

Description

Implement video processing in VideoParser using opencv-python-headless. Replaces three stubs:

_extract_metadata(): extracts duration, resolution, fps, frame count from video files via cv2.VideoCapture
_extract_keyframes(): captures frames at configurable intervals (default 10s), returns (timestamp, jpeg_bytes) tuples
_generate_video_description(): produces structured markdown with metadata and keyframe timeline

Also wires real metadata into parse() so ResourceNode gets actual video dimensions instead of placeholder zeros.
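As a rough illustration of the metadata step, the sketch below reads the properties named above from a capture object and returns zeros when OpenCV is missing or the file cannot be opened. The function name `extract_metadata`, the `open_capture` hook, and the dictionary keys are assumptions made for this example, not the PR's actual code; the numeric property ids mirror OpenCV's `CAP_PROP_*` constants.

```python
try:
    import cv2  # provided by opencv-python-headless
except ImportError:  # optional dependency not installed
    cv2 = None

# Numeric values of cv2.CAP_PROP_FRAME_WIDTH, _FRAME_HEIGHT, _FPS, _FRAME_COUNT.
PROP_WIDTH, PROP_HEIGHT, PROP_FPS, PROP_FRAME_COUNT = 3, 4, 5, 7

ZERO_METADATA = {"duration": 0.0, "width": 0, "height": 0, "fps": 0.0, "frame_count": 0}


def extract_metadata(path, open_capture=None):
    """Return basic video metadata, or zeros if it cannot be read.

    `open_capture` is a hypothetical hook (defaults to cv2.VideoCapture)
    that lets the function be exercised without a real video file.
    """
    if open_capture is None:
        if cv2 is None:
            return dict(ZERO_METADATA)  # graceful fallback without OpenCV
        open_capture = cv2.VideoCapture

    cap = open_capture(str(path))
    try:
        if not cap.isOpened():
            return dict(ZERO_METADATA)
        fps = float(cap.get(PROP_FPS) or 0.0)
        frame_count = int(cap.get(PROP_FRAME_COUNT) or 0)
        return {
            # Duration is derived, since containers do not always report it.
            "duration": frame_count / fps if fps > 0 else 0.0,
            "width": int(cap.get(PROP_WIDTH) or 0),
            "height": int(cap.get(PROP_HEIGHT) or 0),
            "fps": fps,
            "frame_count": frame_count,
        }
    finally:
        cap.release()  # always free the underlying handle
```

Returning the same zero-valued shape on every failure path is what would keep a caller such as parse() backward compatible with the earlier placeholder values.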

Related Issue

Relates to #372

Type of Change

Bug fix (non-breaking change that fixes an issue)
New feature (non-breaking change that adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation update
Refactoring (no functional changes)
Performance improvement
Test update

Changes Made

Add _extract_metadata() in video.py using cv2 for duration/resolution/fps
Add _extract_keyframes() in video.py for periodic frame capture with max_frames cap (30)
Replace _generate_video_description() stub with metadata + keyframe timeline output
Wire _extract_metadata() into parse() to populate real metadata values
Add [video] optional dependency group in pyproject.toml (pip install openviking[video])
Add 7 tests in tests/parse/test_video_keyframes.py
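The periodic capture with a max_frames cap listed above might be sketched as follows. This is an illustration under assumptions, not the PR's code: `plan_timestamps`, `encode_jpeg`, and the `encode` parameter are hypothetical names, and the sketch assumes the cap simply stops sampling after max_frames rather than spreading frames across the whole video.

```python
import math

try:
    import cv2  # provided by opencv-python-headless
except ImportError:
    cv2 = None

PROP_POS_MSEC = 0  # numeric value of cv2.CAP_PROP_POS_MSEC


def plan_timestamps(duration, interval=10.0, max_frames=30):
    """Timestamps in seconds at 0, interval, 2*interval, ... capped at max_frames."""
    if duration <= 0 or interval <= 0 or max_frames <= 0:
        return []
    count = min(max_frames, math.ceil(duration / interval))
    return [i * interval for i in range(count)]


def encode_jpeg(frame):
    """Encode a frame to JPEG bytes with cv2.imencode; None on failure."""
    ok, buf = cv2.imencode(".jpg", frame)
    return buf.tobytes() if ok else None


def extract_keyframes(cap, duration, interval=10.0, max_frames=30, encode=None):
    """Return [(timestamp, jpeg_bytes), ...] from an already-open capture."""
    if encode is None:
        if cv2 is None:
            return []  # no encoder available without OpenCV
        encode = encode_jpeg
    keyframes = []
    for ts in plan_timestamps(duration, interval, max_frames):
        cap.set(PROP_POS_MSEC, ts * 1000.0)  # seek by time in milliseconds
        ok, frame = cap.read()
        if not ok:
            break  # stop at the first unreadable frame
        data = encode(frame)
        if data is not None:
            keyframes.append((ts, data))
    return keyframes
```

With the defaults, a one-hour video would yield 30 frames covering roughly the first five minutes; a caller wanting even coverage would need to raise the interval accordingly.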

Testing

Tests mock cv2.VideoCapture to verify metadata extraction, keyframe timing, and graceful fallback
ruff format and ruff check pass
All cv2.VideoCapture instances are released in finally blocks to prevent resource leaks
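A minimal sketch of how such a release guarantee can be checked with a mock is shown here. The helper `read_first_frame` is hypothetical and stands in for the parser methods; the PR's tests patch cv2.VideoCapture, whereas this example passes a `MagicMock` in directly to stay self-contained.

```python
from unittest.mock import MagicMock


def read_first_frame(cap):
    """Read one frame and always release the capture afterwards."""
    try:
        ok, frame = cap.read()
        return frame if ok else None
    finally:
        cap.release()  # runs on success, on failure, and on exceptions


def test_release_called_when_read_raises():
    cap = MagicMock()
    cap.read.side_effect = RuntimeError("decode failed")
    try:
        read_first_frame(cap)
    except RuntimeError:
        pass  # the error propagates; only the cleanup is under test
    cap.release.assert_called_once()


def test_returns_none_for_unreadable_frame():
    cap = MagicMock()
    cap.read.return_value = (False, None)
    assert read_first_frame(cap) is None
    cap.release.assert_called_once()
```

Both functions follow the pytest naming convention, so a runner would collect them without further setup.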

Design decisions

opencv-python-headless over moviepy/ffmpeg: lighter dependency, pure pip install, no system binary required. Headless variant avoids pulling in GUI dependencies.
Max 30 keyframes: caps memory usage for long videos. Configurable via the method parameter.
Metadata in parse(): the existing parse() returned zeros for duration/width/height/fps. Now returns real values when cv2 is available, zeros otherwise (backward compatible).
No VLM calls yet: this PR adds frame extraction only. VLM scene description per keyframe can be added in a follow-up once the extraction pipeline is validated.
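Because no VLM is involved yet, the generated description can only be assembled from the metadata and the keyframe timestamps. The sketch below shows one possible shape for that markdown; the heading text, the field labels, and the names `format_timestamp` and `generate_video_description` as used here are assumptions for illustration, not the PR's actual output format.

```python
def format_timestamp(seconds):
    """Format seconds as MM:SS, or H:MM:SS for an hour or more."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes:02d}:{secs:02d}"


def generate_video_description(name, metadata, keyframes):
    """Build a markdown summary from metadata and (timestamp, jpeg_bytes) tuples."""
    lines = [
        f"# Video: {name}",
        "",
        "## Metadata",
        f"- Duration: {metadata['duration']:.1f}s",
        f"- Resolution: {metadata['width']}x{metadata['height']}",
        f"- FPS: {metadata['fps']:.2f}",
        f"- Frames: {metadata['frame_count']}",
        "",
        "## Keyframe timeline",
    ]
    if keyframes:
        for ts, jpeg in keyframes:
            # Only the timestamp and size are listed; scene text would come later.
            lines.append(f"- {format_timestamp(ts)} ({len(jpeg)} bytes)")
    else:
        lines.append("- No keyframes extracted")
    return "\n".join(lines)
```

A follow-up that adds per-keyframe scene descriptions could extend each timeline entry without changing the surrounding structure.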

This is the third part of the multimodal parsing trilogy: audio ASR (#805), image OCR (#942), and video processing.

This contribution was developed with AI assistance (Claude Code).

Replace the _generate_video_description() stub with a working OpenCV integration. Adds _extract_metadata() for duration/resolution/fps and _extract_keyframes() for periodic frame capture at configurable intervals. Wires real metadata into parse() so ResourceNode gets actual video dimensions instead of zeros. Degrades gracefully when opencv-python-headless is not installed.

Added as optional dependency: pip install openviking[video]

Relates to volcengine#372

github-actions · 2026-03-25T00:22:14Z

Failed to generate code suggestions for PR

mvanhorn · 2026-03-25T01:15:39Z

Same build CI issue as #942 - see #942 (comment). Adding the video extras to pyproject.toml triggers the build matrix, which has a pre-existing "No module named pip" failure. Lint, tests, and CLA all pass.

mvanhorn · 2026-04-10T15:42:29Z

Closing - CI is blocked by the build matrix and this hasn't gotten reviewer attention. Will revisit if there's maintainer interest in video parsing.

github-project-automation bot added this to OpenViking project Mar 25, 2026

github-project-automation bot moved this to Backlog in OpenViking project Mar 25, 2026

qin-ctx self-assigned this Mar 25, 2026

mvanhorn closed this Apr 10, 2026

github-project-automation bot moved this from Backlog to Done in OpenViking project Apr 10, 2026

mvanhorn wants to merge 1 commit into volcengine:main from mvanhorn:osc/372-video-keyframe-extraction

2 participants