Add agent-type input to all build workflows by simonrosenberg · Pull Request #556 · OpenHands/benchmarks

simonrosenberg · 2026-03-23T12:12:24Z

Summary

Adds the agent-type workflow dispatch input to all benchmark build workflows that were missing it: gaia, commit0, swebenchmultimodal, multiswebench, swegym, swesmith
Adds extra_build_args=build_args_for_agent_type(args.agent_type) to all build_images.py scripts that were missing it: gaia, commit0, swebenchmultimodal, multiswebench, swegym, swesmith, swebenchmultilingual

Previously only swebench and swtbench supported the agent-type input. This meant ACP agent runs (acp-claude, acp-codex) on any other benchmark would build images with INSTALL_ACP=false, causing the claude-agent-acp binary to be missing from runtime images.
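As a rough illustration of the failure mode described above, the mapping from agent type to the `INSTALL_ACP` build argument could be sketched as follows. This is not the actual `build_args_for_agent_type` from `benchmarks.utils.build_utils`: the function name `sketch_build_args`, the dict return type, and the rejection of unknown agent types are assumptions made for illustration. Only the three agent-type values and the `INSTALL_ACP` true/false behavior come from this PR.

```python
# Hypothetical stand-in for build_args_for_agent_type; the real helper's
# signature and return type are not shown in this PR.
ACP_AGENT_TYPES = {"acp-claude", "acp-codex"}
KNOWN_AGENT_TYPES = ACP_AGENT_TYPES | {"default"}


def sketch_build_args(agent_type: str = "default") -> dict[str, str]:
    """Return Docker build args for the given agent type (illustrative only)."""
    if agent_type not in KNOWN_AGENT_TYPES:
        # Assumption: unknown values are rejected rather than silently ignored.
        raise ValueError(f"unknown agent type: {agent_type!r}")
    # ACP agents need the ACP binaries baked into the runtime image.
    install_acp = "true" if agent_type in ACP_AGENT_TYPES else "false"
    return {"INSTALL_ACP": install_acp}
```

Under this sketch, a build script that never forwards the agent type behaves as if `"default"` were passed, so the image is built with `INSTALL_ACP=false` even for an ACP run.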

Changes

Workflows (6 files)

Each gets the same 3 additions:

agent-type input definition (default: 'default')
CMD="$CMD --agent-type ${AGENT_TYPE}" in the build step
AGENT_TYPE: ${{ inputs.agent-type || 'default' }} env var
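The workflow changes themselves are YAML and shell, but their combined effect can be modeled in a few lines of Python. The sketch below assumes the input arrives as a plain mapping and that the base command is a string; the helper names and the example command are hypothetical, not taken from the workflows.

```python
def resolve_agent_type(inputs: dict) -> str:
    # Models `${{ inputs.agent-type || 'default' }}`: a missing or empty
    # input is falsy, so the expression falls back to 'default'.
    return inputs.get("agent-type") or "default"


def append_agent_type(cmd: str, env: dict) -> str:
    # Models the shell line CMD="$CMD --agent-type ${AGENT_TYPE}".
    return f"{cmd} --agent-type {env['AGENT_TYPE']}"


# Example: a run that does not set the input still passes an explicit flag.
env = {"AGENT_TYPE": resolve_agent_type({})}
command = append_agent_type("python build_images.py", env)  # hypothetical base command
```

The fallback means the build script always receives an explicit `--agent-type` value, whether or not the caller supplied one.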

Python scripts (7 files)

Each gets:

build_args_for_agent_type import from benchmarks.utils.build_utils
extra_build_args=build_args_for_agent_type(args.agent_type) in the build_all_images() call
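A minimal sketch of how this wiring might look inside one `build_images.py` is shown below. Both `build_args_for_agent_type` and `build_all_images` are replaced by local stubs so the example runs on its own; their real signatures live in the repository and may differ. The `--agent-type` flag and the `extra_build_args=` keyword are taken from this PR; everything else is an assumption.

```python
import argparse


def build_args_for_agent_type(agent_type: str) -> dict[str, str]:
    # Stub (assumption): the real helper is imported from
    # benchmarks.utils.build_utils and its return type is not shown here.
    acp = agent_type in {"acp-claude", "acp-codex"}
    return {"INSTALL_ACP": "true" if acp else "false"}


def build_all_images(*, extra_build_args, **kwargs):
    # Stub (assumption): echoes its arguments instead of building images.
    return {"extra_build_args": extra_build_args, **kwargs}


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Build benchmark images (sketch)")
    # argparse exposes --agent-type as args.agent_type.
    parser.add_argument("--agent-type", default="default")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    # The one-line change from this PR: forward the agent type as build args.
    return build_all_images(
        extra_build_args=build_args_for_agent_type(args.agent_type),
    )
```

Calling `main(["--agent-type", "acp-claude"])` in this sketch forwards `INSTALL_ACP=true` to the stubbed build call, while `main([])` keeps the default behavior.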

Test plan

Merge companion evaluation PR first
Trigger an ACP gaia eval to verify the previously-failing workflow now accepts agent-type
Default behavior (agent-type=default) unchanged — INSTALL_ACP=false as before

🤖 Generated with Claude Code

Add the agent-type workflow input and --agent-type CLI flag to all benchmark build workflows and build_images.py scripts. This allows ACP agent runs (acp-claude, acp-codex) to build images with INSTALL_ACP=true so the claude-agent-acp/codex-acp binaries are included in the runtime images. Previously only swebench and swtbench supported this. Now all benchmarks do: gaia, commit0, swebenchmultimodal, multiswebench, swegym, swesmith, and swebenchmultilingual.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

all-hands-bot

🟢 Good taste - Straightforward, pragmatic fix that solves a real production problem.

VERDICT: ✅ Worth merging

KEY INSIGHT: This is textbook mechanical refactoring - copying a working pattern (from swebench/swtbench) to fix a real production issue (missing ACP binary). The repetition across 6 workflows and 7 scripts is unavoidable given the current architecture.

all-hands-bot · 2026-03-23T12:16:27Z

.github/workflows/build-gaia-images.yml

        default: 'false'
        type: string
+      agent-type:
+        description: 'Agent type: default (skip ACP), acp-claude, acp-codex (keep ACP)'


🟢 Acceptable: Good - default value preserves backward compatibility. The || 'default' fallback at line 120 ensures existing behavior when input is not provided.

all-hands-bot · 2026-03-23T12:16:27Z

benchmarks/gaia/build_images.py

        force_build=args.force_build,
        max_retries=args.max_retries,
        base_image_to_custom_tag_fn=tag_fn,
+        extra_build_args=build_args_for_agent_type(args.agent_type),


🟢 Acceptable: Pattern is consistent with existing swebench implementation. The build_args_for_agent_type function is already well-tested in the codebase.

simonrosenberg requested a review from all-hands-bot March 23, 2026 12:13

all-hands-bot reviewed Mar 23, 2026

simonrosenberg wants to merge 1 commit into main from fix/add-agent-type-to-all-build-workflows
