Guidance on representative local evaluation setup for qualification phase #541

## Question

We're running local evaluation via `docker compose -f docker/docker-compose.yaml up` with the default `sample_config.yaml`. We'd like to understand how to make local testing more representative of the official portal evaluation.

Specifically:

1. **Does the official portal evaluation use randomized task board poses for each submission?** The docs mention randomization, but it's unclear whether each submission sees a fresh random config or a fixed (but secret) one.

2. **Are the randomization ranges for the official eval the same as those documented in `task_board_limits` in `sample_config.yaml`?** (NIC translation: [-0.0215, 0.0234] m, SC translation: [-0.06, 0.055] m, etc.)

3. **Is the task board yaw fully randomized (0-360°) or constrained to a range where insertion is kinematically feasible?**

4. **Is there a recommended way to test locally with varied configurations?** For example, should we rebuild the eval image with modified configs, or is there a launch parameter approach that works with the pre-built `ghcr.io/intrinsic-dev/aic/aic_eval` image?

We've noticed that policies perform very differently on the fixed `sample_config.yaml` than on slightly varied board poses, and we want to make sure our local testing is representative before spending our limited daily submissions.
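
For context, here is a minimal sketch of how we might sample varied board offsets locally, using only the two translation ranges quoted in question 2. The names `LIMITS`, `sample_offsets`, and `sample_many` and the dictionary keys are our own placeholders, not the real `sample_config.yaml` schema, and uniform sampling is an assumption about how the official evaluation draws its poses. Yaw is left out because its range is exactly what question 3 asks about.

```python
import random

# Translation ranges in metres, copied from question 2 above.
# The keys are hypothetical labels, not actual keys from sample_config.yaml.
LIMITS = {
    "nic_translation": (-0.0215, 0.0234),
    "sc_translation": (-0.06, 0.055),
}


def sample_offsets(rng, limits=LIMITS):
    """Draw one uniformly distributed offset per range (assumed distribution)."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in limits.items()}


def sample_many(n, seed=0):
    """Return n reproducible offset sets so that local runs can be repeated."""
    rng = random.Random(seed)
    return [sample_offsets(rng) for _ in range(n)]


if __name__ == "__main__":
    for i, offsets in enumerate(sample_many(5)):
        print(i, {name: round(value, 4) for name, value in offsets.items()})
```

Each sampled set would still need to be written into whatever config format the eval container actually reads, which is the part we are unsure about.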

Thanks for any guidance!
