feat(parse): implement video key frame extraction with metadata #943

Closed
mvanhorn wants to merge 1 commit into volcengine:main from mvanhorn:osc/372-video-keyframe-extraction

@mvanhorn
Contributor

Description

Implement video processing in VideoParser using opencv-python-headless. Replaces three stubs:

  • _extract_metadata(): extracts duration, resolution, fps, frame count from video files via cv2.VideoCapture
  • _extract_keyframes(): captures frames at configurable intervals (default 10s), returns (timestamp, jpeg_bytes) tuples
  • _generate_video_description(): produces structured markdown with metadata and keyframe timeline

Also wires real metadata into parse() so ResourceNode gets actual video dimensions instead of placeholder zeros.
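
As an illustration of the metadata step described above, here is a minimal sketch, not the code from this PR. The function name `extract_metadata`, the `ZERO_METADATA` constant, and the dictionary keys are assumptions made for the example; only `cv2.VideoCapture` and its `CAP_PROP_*` properties come from OpenCV itself. It returns zeros when OpenCV is missing or the file cannot be opened, which mirrors the backward-compatible fallback described in this PR.

```python
from typing import Any, Dict

try:
    import cv2  # provided by opencv-python-headless (optional dependency)
except ImportError:
    cv2 = None

# Hypothetical placeholder values, matching the "zeros" fallback described above.
ZERO_METADATA: Dict[str, Any] = {
    "duration": 0.0,
    "width": 0,
    "height": 0,
    "fps": 0.0,
    "frame_count": 0,
}


def extract_metadata(path: str) -> Dict[str, Any]:
    """Return basic video metadata, or zeros if it cannot be read."""
    if cv2 is None:
        return dict(ZERO_METADATA)

    cap = cv2.VideoCapture(path)
    try:
        if not cap.isOpened():
            return dict(ZERO_METADATA)
        fps = float(cap.get(cv2.CAP_PROP_FPS) or 0.0)
        frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT) or 0)
        width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH) or 0)
        height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT) or 0)
        # Duration is derived from frame count and fps; guard against fps == 0.
        duration = frame_count / fps if fps > 0 else 0.0
        return {
            "duration": duration,
            "width": width,
            "height": height,
            "fps": fps,
            "frame_count": frame_count,
        }
    finally:
        # Always release the capture handle, even on early return or error.
        cap.release()
```

Note that the frame count reported by a container can be approximate for some codecs, so a duration derived this way is best treated as an estimate.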

Related Issue

Relates to #372

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactoring (no functional changes)
  • Performance improvement
  • Test update

Changes Made

  • Add _extract_metadata() in video.py using cv2 for duration/resolution/fps
  • Add _extract_keyframes() in video.py for periodic frame capture with max_frames cap (30)
  • Replace _generate_video_description() stub with metadata + keyframe timeline output
  • Wire _extract_metadata() into parse() to populate real metadata values
  • Add [video] optional dependency group in pyproject.toml (pip install openviking[video])
  • Add 7 tests in tests/parse/test_video_keyframes.py
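
To make the "metadata + keyframe timeline" output more concrete, the sketch below shows one possible shape for the generated markdown. The function names, heading text, and timestamp format are assumptions for illustration; the actual layout produced by `_generate_video_description()` in this PR may differ.

```python
from typing import Any, Dict, List, Tuple


def format_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as MM:SS (fractions truncated)."""
    total = int(seconds)
    return f"{total // 60:02d}:{total % 60:02d}"


def generate_video_description(
    name: str,
    metadata: Dict[str, Any],
    keyframes: List[Tuple[float, bytes]],
) -> str:
    """Build a markdown summary from metadata and (timestamp, jpeg_bytes) tuples."""
    lines = [
        f"# Video: {name}",
        "",
        "## Metadata",
        "",
        f"- Duration: {metadata['duration']:.1f}s",
        f"- Resolution: {metadata['width']}x{metadata['height']}",
        f"- FPS: {metadata['fps']:.2f}",
        f"- Frame count: {metadata['frame_count']}",
    ]
    if keyframes:
        lines += ["", "## Keyframe timeline", ""]
        for timestamp, jpeg_bytes in keyframes:
            # Only the size is reported; the JPEG payload stays out of the text.
            lines.append(f"- {format_timestamp(timestamp)} ({len(jpeg_bytes)} bytes)")
    return "\n".join(lines)
```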

Testing

  • Tests mock cv2.VideoCapture to verify metadata extraction, keyframe timing, and graceful fallback
  • ruff format and ruff check pass
  • All cv2.VideoCapture instances are released in finally blocks to prevent resource leaks
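
The testing approach above can be sketched roughly as follows. This is not one of the 7 tests in the PR: `probe_frame_count` and `make_fake_cv2` are hypothetical helpers, and the `cv2` module is injected as a parameter so the example runs without OpenCV installed. It shows a `MagicMock` standing in for `cv2.VideoCapture` and a check that `release()` is called even when reading fails.

```python
from unittest.mock import MagicMock


def probe_frame_count(cv2_module, path: str) -> int:
    """Read the frame count, releasing the capture in a finally block."""
    cap = cv2_module.VideoCapture(path)
    try:
        if not cap.isOpened():
            return 0
        return int(cap.get(cv2_module.CAP_PROP_FRAME_COUNT))
    finally:
        cap.release()


def make_fake_cv2(opened: bool = True, frame_count: int = 300):
    """Build a stand-in for the cv2 module and return it with its capture mock."""
    fake_cv2 = MagicMock()
    cap = fake_cv2.VideoCapture.return_value
    cap.isOpened.return_value = opened
    cap.get.return_value = float(frame_count)
    return fake_cv2, cap


def test_release_called_on_success():
    fake_cv2, cap = make_fake_cv2()
    assert probe_frame_count(fake_cv2, "clip.mp4") == 300
    cap.release.assert_called_once()


def test_release_called_on_error():
    fake_cv2, cap = make_fake_cv2()
    cap.get.side_effect = RuntimeError("decode failure")
    try:
        probe_frame_count(fake_cv2, "clip.mp4")
    except RuntimeError:
        pass
    # The finally block must still have released the handle.
    cap.release.assert_called_once()
```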

Design decisions

  • opencv-python-headless over moviepy/ffmpeg: lighter dependency, pure pip install, no system binary required. Headless variant avoids pulling in GUI dependencies.
  • Max 30 keyframes: caps memory usage for long videos. Configurable via the method parameter.
  • Metadata in parse(): the existing parse() returned zeros for duration/width/height/fps. Now returns real values when cv2 is available, zeros otherwise (backward compatible).
  • No VLM calls yet: this PR adds frame extraction only. VLM scene description per keyframe can be added in a follow-up once the extraction pipeline is validated.
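
For reference, the interval and cap decisions above could be realized along these lines. This is a sketch under stated assumptions, not the PR's implementation: the helper `plan_keyframe_timestamps` is hypothetical, and keeping the first `max_frames` samples (rather than spreading them across the whole video) is one possible capping strategy. Seeking uses `CAP_PROP_POS_MSEC` and encoding uses `cv2.imencode`, both of which are standard OpenCV calls.

```python
import math
from typing import List, Tuple

try:
    import cv2  # provided by opencv-python-headless (optional dependency)
except ImportError:
    cv2 = None


def plan_keyframe_timestamps(
    duration: float, interval: float = 10.0, max_frames: int = 30
) -> List[float]:
    """Return sample times 0, interval, 2*interval, ... below duration, capped."""
    if duration <= 0 or interval <= 0 or max_frames <= 0:
        return []
    count = min(max_frames, math.ceil(duration / interval))
    return [i * interval for i in range(count)]


def extract_keyframes(
    path: str, interval: float = 10.0, max_frames: int = 30
) -> List[Tuple[float, bytes]]:
    """Return (timestamp, jpeg_bytes) tuples, or an empty list on any failure to open."""
    if cv2 is None:
        return []

    cap = cv2.VideoCapture(path)
    frames: List[Tuple[float, bytes]] = []
    try:
        if not cap.isOpened():
            return frames
        fps = cap.get(cv2.CAP_PROP_FPS)
        frame_count = cap.get(cv2.CAP_PROP_FRAME_COUNT)
        duration = frame_count / fps if fps > 0 else 0.0
        for timestamp in plan_keyframe_timestamps(duration, interval, max_frames):
            # Seek by time offset in milliseconds, then decode one frame.
            cap.set(cv2.CAP_PROP_POS_MSEC, timestamp * 1000.0)
            ok, frame = cap.read()
            if not ok:
                break
            encoded_ok, buffer = cv2.imencode(".jpg", frame)
            if encoded_ok:
                frames.append((timestamp, buffer.tobytes()))
        return frames
    finally:
        cap.release()
```

With the defaults, a 35-second clip yields samples at 0, 10, 20, and 30 seconds, and any video longer than 300 seconds stops at 30 frames.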

This is the third part of the multimodal parsing trilogy: audio ASR (#805), image OCR (#942), and video processing.

This contribution was developed with AI assistance (Claude Code).

Replace the _generate_video_description() stub with a working OpenCV
integration. Adds _extract_metadata() for duration/resolution/fps and
_extract_keyframes() for periodic frame capture at configurable intervals.

Wires real metadata into parse() so ResourceNode gets actual video
dimensions instead of zeros. Degrades gracefully when opencv-python-headless
is not installed. Added as optional dependency: pip install openviking[video]

Relates to volcengine#372
@github-actions

Failed to generate code suggestions for PR

@mvanhorn
Contributor Author

Same build CI issue as #942 - see #942 (comment). Adding the video extras to pyproject.toml triggers the build matrix, which has a pre-existing "No module named pip" failure. Lint, tests, and CLA all pass.

@qin-ctx qin-ctx self-assigned this Mar 25, 2026
@mvanhorn
Contributor Author

Closing - CI is blocked by the build matrix and this hasn't gotten reviewer attention. Will revisit if there's maintainer interest in video parsing.

@mvanhorn mvanhorn closed this Apr 10, 2026
@github-project-automation github-project-automation bot moved this from Backlog to Done in OpenViking project Apr 10, 2026