Add agent-type input to all build workflows#556

Open
simonrosenberg wants to merge 1 commit into main from fix/add-agent-type-to-all-build-workflows

@simonrosenberg
Collaborator

Summary

  • Adds the agent-type workflow dispatch input to all benchmark build workflows that were missing it: gaia, commit0, swebenchmultimodal, multiswebench, swegym, swesmith
  • Adds extra_build_args=build_args_for_agent_type(args.agent_type) to all build_images.py scripts that were missing it: gaia, commit0, swebenchmultimodal, multiswebench, swegym, swesmith, swebenchmultilingual

Previously, only swebench and swtbench supported the agent-type input. As a result, ACP agent runs (acp-claude, acp-codex) on any other benchmark built images with INSTALL_ACP=false, so the claude-agent-acp binary was missing from the runtime images.

Related

Changes

Workflows (6 files)

Each gets the same 3 additions:

  1. agent-type input definition (default: 'default')
  2. CMD="$CMD --agent-type ${AGENT_TYPE}" in the build step
  3. AGENT_TYPE: ${{ inputs.agent-type || 'default' }} env var
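
For readers less familiar with the workflow expression syntax, the fallback and flag-appending logic in the three additions above can be modelled in Python. This is only an illustrative sketch: `resolve_agent_type`, `build_command`, and the base command are hypothetical names made up for this example, and the real logic lives in the workflow YAML and shell step, not in Python.

```python
import shlex


def resolve_agent_type(workflow_input):
    """Model `inputs.agent-type || 'default'`: a missing or empty input falls back to 'default'."""
    return workflow_input or "default"


def build_command(base_cmd, workflow_input):
    """Model `CMD="$CMD --agent-type ${AGENT_TYPE}"` by appending the flag to a base command."""
    agent_type = resolve_agent_type(workflow_input)
    return list(base_cmd) + ["--agent-type", agent_type]


# Hypothetical base command; the real one is assembled inside each workflow.
base = ["python", "build_images.py"]

# No input provided (e.g. an older caller): the flag is still passed, with the default value.
print(shlex.join(build_command(base, None)))

# Explicit ACP run.
print(shlex.join(build_command(base, "acp-claude")))
```

The point of the fallback is that callers that never set agent-type still produce a valid command line, which is what keeps existing runs unchanged.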

Python scripts (7 files)

Each gets:

  1. build_args_for_agent_type import from benchmarks.utils.build_utils
  2. extra_build_args=build_args_for_agent_type(args.agent_type) in the build_all_images() call
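
As a rough illustration of the script-side wiring, the sketch below shows how an `--agent-type` flag could be turned into extra build args. It is not the code from `benchmarks.utils.build_utils`: `sketch_build_args_for_agent_type`, its dict return shape, and the `choices` list are assumptions based only on the behavior described in this PR (default gives INSTALL_ACP=false, acp-claude and acp-codex give INSTALL_ACP=true).

```python
import argparse

# Assumption: these are the agent types that need the ACP binaries in the image.
ACP_AGENT_TYPES = {"acp-claude", "acp-codex"}


def sketch_build_args_for_agent_type(agent_type):
    """Hypothetical stand-in for build_args_for_agent_type; the real return type may differ."""
    install_acp = "true" if agent_type in ACP_AGENT_TYPES else "false"
    return {"INSTALL_ACP": install_acp}


def parse_args(argv=None):
    """Parse the --agent-type flag, defaulting to 'default' so existing callers are unaffected."""
    parser = argparse.ArgumentParser(description="Sketch of a build_images.py entry point")
    parser.add_argument(
        "--agent-type",
        default="default",
        choices=["default", "acp-claude", "acp-codex"],  # assumed value set
    )
    return parser.parse_args(argv)


# argparse exposes --agent-type as args.agent_type.
args = parse_args(["--agent-type", "acp-claude"])
extra_build_args = sketch_build_args_for_agent_type(args.agent_type)
print(extra_build_args)
```

In the actual scripts the resulting value is passed as `extra_build_args=...` to `build_all_images()`; that call is omitted here because its full signature is not shown in this PR.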

Test plan

  • Merge companion evaluation PR first
  • Trigger an ACP gaia eval to verify the previously-failing workflow now accepts agent-type
  • Default behavior (agent-type=default) is unchanged: INSTALL_ACP=false as before

🤖 Generated with Claude Code

Add the agent-type workflow input and --agent-type CLI flag to all
benchmark build workflows and build_images.py scripts. This allows
ACP agent runs (acp-claude, acp-codex) to build images with
INSTALL_ACP=true so the claude-agent-acp/codex-acp binaries are
included in the runtime images.

Previously only swebench and swtbench supported this. Now all
benchmarks do: gaia, commit0, swebenchmultimodal, multiswebench,
swegym, swesmith, and swebenchmultilingual.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Collaborator

@all-hands-bot left a comment

🟢 Good taste - Straightforward, pragmatic fix that solves a real production problem.

VERDICT: ✅ Worth merging

KEY INSIGHT: This is textbook mechanical refactoring - copying a working pattern (from swebench/swtbench) to fix a real production issue (missing ACP binary). The repetition across 6 workflows and 7 scripts is unavoidable given the current architecture.

    default: 'false'
    type: string
  agent-type:
    description: 'Agent type: default (skip ACP), acp-claude, acp-codex (keep ACP)'

🟢 Acceptable: Good - default value preserves backward compatibility. The || 'default' fallback at line 120 ensures existing behavior when input is not provided.

force_build=args.force_build,
max_retries=args.max_retries,
base_image_to_custom_tag_fn=tag_fn,
extra_build_args=build_args_for_agent_type(args.agent_type),

🟢 Acceptable: Pattern is consistent with existing swebench implementation. The build_args_for_agent_type function is already well-tested in the codebase.
