🚀 Feature Proposal: Interactive GUI for Video Object Segmentation (VOS) #722

@MDSALMANSHAMS

Description

Summary

I propose adding a new example notebook that implements an interactive, OpenCV-based GUI for Video Object Segmentation (VOS) using SAM2VideoPredictor. This improves the user experience for segmenting and tracking objects across video frames.

Motivation

Existing examples lack a direct, interactive demonstration of SAM-2's advanced VOS and object tracking capabilities. This new example will:

  1. Showcase Temporal Tracking: Directly utilize SAM2VideoPredictor's state-aware inference (inference_state, obj_id) for robust, continuous tracking.
  2. Enable User Refinement: Allow users to provide real-time positive/negative point prompts (mouse clicks) to correct segmentation errors in each frame.
  3. Provide a Practical Tool: Offer a ready-to-use boilerplate for interactive video annotation and qualitative model evaluation.

Core Changes

The implementation is contained within a custom Python class (InteractiveAnnotatorWithSAM) that manages the following:

  • GUI Setup: Uses cv2 (OpenCV) to handle frame display and mouse/key callbacks.
  • Prompting: Captures right-clicks (positive) and left-clicks (negative) to guide the segmentation.
  • On-Demand Prediction: Triggers predictor.add_new_points_or_box(...) on keypress ('v') and displays the mask overlay immediately for visual feedback.
  • Flow Control: Allows users to advance to the next frame ('n') while preserving the object's tracking state.
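To make the proposed design concrete, here is a minimal sketch of how such a class could be structured. This is an illustration under stated assumptions, not the submitted notebook: the class name InteractiveAnnotatorWithSAM comes from the proposal, but the method names (`add_click`, `predict`, `next_frame`, `run`) and the exact keyword signature assumed for `predictor.add_new_points_or_box(...)` are hypothetical. Any object exposing that method would work as the `predictor`.

```python
import numpy as np


class InteractiveAnnotatorWithSAM:
    """Sketch of the proposed interactive VOS annotator.

    Assumes a SAM2VideoPredictor-style object exposing
    add_new_points_or_box(inference_state=..., frame_idx=...,
    obj_id=..., points=..., labels=...); the keyword names here
    are an assumption for illustration.
    """

    def __init__(self, predictor, inference_state, obj_id=1):
        self.predictor = predictor
        self.state = inference_state   # tracking state preserved across frames
        self.obj_id = obj_id
        self.frame_idx = 0
        self.points = []               # (x, y) click coordinates, current frame
        self.labels = []               # 1 = positive prompt, 0 = negative prompt

    def add_click(self, x, y, positive):
        """Record one point prompt; the cv2 mouse callback delegates here."""
        self.points.append((x, y))
        self.labels.append(1 if positive else 0)

    def predict(self):
        """Send accumulated prompts to the predictor ('v' key in the GUI)."""
        if not self.points:
            return None
        return self.predictor.add_new_points_or_box(
            inference_state=self.state,
            frame_idx=self.frame_idx,
            obj_id=self.obj_id,
            points=np.array(self.points, dtype=np.float32),
            labels=np.array(self.labels, dtype=np.int32),
        )

    def next_frame(self):
        """Advance to the next frame ('n' key), keeping the tracking state."""
        self.frame_idx += 1
        self.points, self.labels = [], []

    def run(self, frames, window="SAM2 annotator"):
        """GUI loop: right-click = positive, left-click = negative,
        'v' = predict, 'n' = next frame, 'q' = quit."""
        import cv2  # imported here so the prompt logic above stays headless

        def on_mouse(event, x, y, flags, param):
            if event == cv2.EVENT_RBUTTONDOWN:
                self.add_click(x, y, positive=True)
            elif event == cv2.EVENT_LBUTTONDOWN:
                self.add_click(x, y, positive=False)

        cv2.namedWindow(window)
        cv2.setMouseCallback(window, on_mouse)
        while self.frame_idx < len(frames):
            cv2.imshow(window, frames[self.frame_idx])
            key = cv2.waitKey(20) & 0xFF
            if key == ord("v"):
                self.predict()  # mask-overlay drawing omitted in this sketch
            elif key == ord("n"):
                self.next_frame()
            elif key == ord("q"):
                break
        cv2.destroyAllWindows()
```

Separating the prompt bookkeeping (`add_click`, `predict`) from the cv2 event loop keeps the state-management logic testable without opening a window, and keeps the inference_state/obj_id handoff to the predictor in one place.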

I have a working notebook ready for submission. Please let me know if this feature aligns with the repository's goals so I can prepare a Pull Request.

@haithamkhedr @NielsRogge @ronghanghu
