Add agent-type input to all build workflows #556
Add the agent-type workflow input and --agent-type CLI flag to all benchmark build workflows and build_images.py scripts. This allows ACP agent runs (acp-claude, acp-codex) to build images with INSTALL_ACP=true so the claude-agent-acp/codex-acp binaries are included in the runtime images. Previously only swebench and swtbench supported this. Now all benchmarks do: gaia, commit0, swebenchmultimodal, multiswebench, swegym, swesmith, and swebenchmultilingual.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
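The mapping described above, from agent type to the INSTALL_ACP build arg, can be sketched roughly as follows. This is a minimal illustration, not the project's code: the real helper is build_args_for_agent_type in benchmarks.utils.build_utils, and its signature and return type are not shown in this PR. The name sketch_build_args, the dict return shape, and the ValueError on unknown values are all assumptions made for the example.

```python
# Hypothetical stand-in for build_args_for_agent_type; the real helper's
# signature and return type are not visible in this PR.
ACP_AGENT_TYPES = {"acp-claude", "acp-codex"}
KNOWN_AGENT_TYPES = {"default"} | ACP_AGENT_TYPES


def sketch_build_args(agent_type: str) -> dict:
    """Map an agent type to Docker build args (assumed shape: a plain dict)."""
    if agent_type not in KNOWN_AGENT_TYPES:
        # Assumption: unknown values are rejected rather than silently ignored.
        raise ValueError(f"unknown agent type: {agent_type!r}")
    # ACP agent types keep the ACP binaries; "default" skips them.
    install_acp = agent_type in ACP_AGENT_TYPES
    return {"INSTALL_ACP": "true" if install_acp else "false"}


if __name__ == "__main__":
    for agent_type in ("default", "acp-claude", "acp-codex"):
        print(agent_type, sketch_build_args(agent_type))
```

Under these assumptions, only the two ACP agent types switch INSTALL_ACP to true, so existing default builds are unaffected.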
all-hands-bot
left a comment
🟢 Good taste - Straightforward, pragmatic fix that solves a real production problem.
VERDICT: ✅ Worth merging
KEY INSIGHT: This is textbook mechanical refactoring - copying a working pattern (from swebench/swtbench) to fix a real production issue (missing ACP binary). The repetition across 6 workflows and 7 scripts is unavoidable given the current architecture.
    default: 'false'
    type: string
  agent-type:
    description: 'Agent type: default (skip ACP), acp-claude, acp-codex (keep ACP)'
🟢 Acceptable: Good - default value preserves backward compatibility. The || 'default' fallback at line 120 ensures existing behavior when input is not provided.
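The fallback behavior noted in this comment can be modeled in Python for illustration. The workflow itself uses a GitHub Actions expression and a shell step; the sketch below only mirrors that logic. The function names and the base command (python build_images.py) are hypothetical and not taken from the PR.

```python
import shlex
from typing import List, Optional


def resolve_agent_type(workflow_input: Optional[str]) -> str:
    """Mirror `inputs.agent-type || 'default'`.

    In GitHub Actions expressions, `||` falls through when the left side is
    empty or unset; Python's `or` behaves the same for "" and None.
    """
    return workflow_input or "default"


def build_command(base_cmd: List[str], agent_type: str) -> str:
    """Mirror CMD="$CMD --agent-type ${AGENT_TYPE}" from the build step."""
    return shlex.join([*base_cmd, "--agent-type", agent_type])


if __name__ == "__main__":
    # Hypothetical base command; the real one lives in each workflow file.
    base = ["python", "build_images.py"]
    print(build_command(base, resolve_agent_type("")))
    print(build_command(base, resolve_agent_type("acp-claude")))
```

With an empty or missing input the command always carries --agent-type default, which is why callers that never set the input see the same behavior as before.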
    force_build=args.force_build,
    max_retries=args.max_retries,
    base_image_to_custom_tag_fn=tag_fn,
    extra_build_args=build_args_for_agent_type(args.agent_type),
🟢 Acceptable: Pattern is consistent with existing swebench implementation. The build_args_for_agent_type function is already well-tested in the codebase.
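The pattern this comment refers to, a CLI flag feeding extra_build_args, might look roughly like this in a build_images.py script. It is a sketch under stated assumptions: stub_build_args and stub_build_all_images stand in for the real build_args_for_agent_type and build_all_images, and the flag spellings --force-build and --max-retries plus the retry default of 3 are guesses based on the attribute names in the diff.

```python
import argparse


def stub_build_args(agent_type: str) -> dict:
    # Stand-in for build_args_for_agent_type (real return shape unknown).
    return {"INSTALL_ACP": "false" if agent_type == "default" else "true"}


def stub_build_all_images(*, extra_build_args: dict, **kwargs) -> dict:
    # Stand-in for build_all_images(): records its arguments, builds nothing.
    return {"extra_build_args": extra_build_args, **kwargs}


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Sketch of a build script CLI")
    parser.add_argument(
        "--agent-type",
        choices=["default", "acp-claude", "acp-codex"],
        default="default",  # preserves existing behavior when the flag is omitted
    )
    parser.add_argument("--force-build", action="store_true")  # assumed spelling
    parser.add_argument("--max-retries", type=int, default=3)  # assumed default
    return parser.parse_args(argv)


def main(argv=None) -> dict:
    args = parse_args(argv)
    return stub_build_all_images(
        force_build=args.force_build,
        max_retries=args.max_retries,
        extra_build_args=stub_build_args(args.agent_type),
    )


if __name__ == "__main__":
    print(main(["--agent-type", "acp-claude"]))
```

Keeping the agent-type translation in one helper means each script only needs the extra keyword argument, which is consistent with the mechanical nature of the change.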
Summary
- Add the `agent-type` workflow dispatch input to all benchmark build workflows that were missing it: gaia, commit0, swebenchmultimodal, multiswebench, swegym, swesmith
- Add `extra_build_args=build_args_for_agent_type(args.agent_type)` to all `build_images.py` scripts that were missing it: gaia, commit0, swebenchmultimodal, multiswebench, swegym, swesmith, swebenchmultilingual
Previously only swebench and swtbench supported the `agent-type` input. This meant ACP agent runs (`acp-claude`, `acp-codex`) on any other benchmark would build images with `INSTALL_ACP=false`, causing the `claude-agent-acp` binary to be missing from runtime images.
Related
- [...] `AGENT_TYPE` env var to benchmark build dispatch)
- [...] `acp-claude`
Changes
Workflows (6 files)
Each gets the same 3 additions:
- `agent-type` input definition (default: `'default'`)
- `CMD="$CMD --agent-type ${AGENT_TYPE}"` in the build step
- `AGENT_TYPE: ${{ inputs.agent-type || 'default' }}` env var
Python scripts (7 files)
Each gets:
- `build_args_for_agent_type` import from `benchmarks.utils.build_utils`
- `extra_build_args=build_args_for_agent_type(args.agent_type)` in the `build_all_images()` call
Test plan
- [...] `agent-type`
- [...] `agent-type=default`) unchanged: `INSTALL_ACP=false` as before
🤖 Generated with Claude Code