Guidance on representative local evaluation setup for qualification phase #541

@jashshah999

Question

We're running local evaluation via docker compose -f docker/docker-compose.yaml up with the default sample_config.yaml. We'd like to understand how to make local testing more representative of the official portal evaluation.

Specifically:

  1. Does the official portal evaluation use randomized task board poses for each submission? The docs mention randomization, but it's unclear whether each submission sees a fresh random config or a fixed (but secret) one.

  2. Are the randomization ranges for the official eval the same as those documented in task_board_limits in sample_config.yaml? (NIC translation: [-0.0215, 0.0234] m, SC translation: [-0.06, 0.055] m, etc.)

  3. Is the task board yaw fully randomized (0-360°) or constrained to a range where insertion is kinematically feasible?

  4. Is there a recommended way to test locally with varied configurations? For example, should we rebuild the eval image with modified configs, or is there a launch parameter approach that works with the pre-built ghcr.io/intrinsic-dev/aic/aic_eval image?

We've noticed that policies perform very differently on the fixed sample_config.yaml vs slightly varied board poses, and want to ensure our local testing is representative before using limited daily submissions.
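
To make "slightly varied board poses" concrete, here is a minimal sketch of the kind of local sampling we have in mind. It only draws offsets uniformly within the two translation ranges quoted above. The function name, the dictionary keys, and the yaw range are our own placeholders, not names or values from sample_config.yaml or the eval image, and the uniform distribution is an assumption about how the official randomization works.

```python
import random

# Translation limits quoted in this issue from task_board_limits (metres).
NIC_TRANSLATION_RANGE = (-0.0215, 0.0234)
SC_TRANSLATION_RANGE = (-0.06, 0.055)


def sample_board_offsets(n, seed=0, yaw_range_deg=(-15.0, 15.0)):
    """Draw n hypothetical board perturbations for local testing.

    yaw_range_deg is a placeholder: the real yaw range is exactly what
    question 3 asks about. Uniform sampling is also an assumption.
    """
    rng = random.Random(seed)  # seeded so a failing case can be replayed
    samples = []
    for _ in range(n):
        samples.append({
            # Key names are illustrative, not the config schema.
            "nic_translation_m": rng.uniform(*NIC_TRANSLATION_RANGE),
            "sc_translation_m": rng.uniform(*SC_TRANSLATION_RANGE),
            "board_yaw_deg": rng.uniform(*yaw_range_deg),
        })
    return samples


if __name__ == "__main__":
    for offsets in sample_board_offsets(3):
        print(offsets)
```

How these sampled values would be fed into the evaluation (a rebuilt image with modified configs, or a launch parameter) is the open part of question 4.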

Thanks for any guidance!
