Problem
SWT-bench benchmark images (source-minimal target) bundle three heavy dependency groups that benchmarks never use:
| Dependency | Where declared | Install cost | Used by benchmarks? |
| --- | --- | --- | --- |
| @zed-industries/claude-agent-acp, @zed-industries/codex-acp | Dockerfile L88-96 (npm) | ~38s/image | No |
| boto3 → botocore | Dockerfile L34 (--extra boto3) | ~5-10s/image + large install size | No |
| browser-use → playwright | openhands-tools/pyproject.toml L14 (hard dep) | ~15-30s/image + large install size | No |
These add build time, disk footprint (3+ GiB/image), and push time to every benchmark image — for functionality benchmarks don't exercise.
However, these dependencies are critical for other OpenHands users (ACP for Claude Code/Codex agent support, boto3 for Bedrock model discovery, browser-use for browser automation). We cannot simply remove them.
Proposal
Add build-time flags with safe defaults that preserve current behavior for all existing users, while allowing benchmarks to opt out of unused dependencies:
# New build args (in base-image-minimal stage)
ARG INSTALL_ACP=true
ARG INSTALL_BOTO3=true
ARG INSTALL_BROWSER=true
- Default true = identical to today. No user sees any change.
- Benchmarks pass false = lighter images, faster builds.
Dependency-by-dependency analysis
1. npm ACP packages — trivial
The ACP npm packages are installed unconditionally in base-image-minimal (Dockerfile L88-96):
npm install -g @zed-industries/claude-agent-acp @zed-industries/codex-acp
ACP is architecturally isolated — only loaded when running in ACP server mode. The agent server and benchmark evaluation paths never import it.
Fix: Wrap in a conditional:
ARG INSTALL_ACP=true
RUN set -eux; \
    if ! command -v npm >/dev/null 2>&1; then \
        curl -fsSL https://deb.nodesource.com/setup_22.x | bash - && \
        apt-get install -y --no-install-recommends nodejs && \
        rm -rf /var/lib/apt/lists/*; \
    fi; \
    if [ "$INSTALL_ACP" = "true" ]; then \
        npm install -g @zed-industries/claude-agent-acp @zed-industries/codex-acp; \
    fi
2. boto3/botocore — trivial
boto3 is already an optional extra in openhands-sdk/pyproject.toml L29-30:
[project.optional-dependencies]
boto3 = ["boto3>=1.35.0"]
And the runtime already handles its absence gracefully via lazy import in unverified_models.py:
def _get_boto3():
    try:
        return importlib.import_module("boto3")
    except ModuleNotFoundError:
        return None
If boto3 isn't installed, Bedrock model listing is skipped with a warning. Everything else works fine.
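To illustrate how a caller might degrade when an optional module is absent, here is a minimal sketch. The names optional_import and list_models are hypothetical, and the fetch callback stands in for the real Bedrock discovery logic, which the source does not show.

```python
import importlib
import logging
from types import ModuleType
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def optional_import(name: str) -> Optional[ModuleType]:
    """Return the named module, or None if it is not installed."""
    try:
        return importlib.import_module(name)
    except ModuleNotFoundError:
        return None


def list_models(
    fetch: Callable[[ModuleType], list[str]],
    module_name: str = "boto3",
) -> list[str]:
    """Hypothetical caller: skip discovery with a warning when the SDK is missing."""
    sdk = optional_import(module_name)
    if sdk is None:
        logger.warning("%s not installed; skipping model listing", module_name)
        return []
    # Only reached when the optional dependency is importable.
    return fetch(sdk)
```

Because the absence is reported as an empty result rather than an exception, the rest of model discovery can continue unchanged.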
The only reason it's always installed is that the Dockerfile unconditionally passes --extra boto3 (Dockerfile L34):
uv sync --frozen --no-editable --managed-python --extra boto3
Fix: Conditionally include the extra:
ARG INSTALL_BOTO3=true
RUN ... uv sync --frozen --no-editable --managed-python $([ "$INSTALL_BOTO3" = "true" ] && echo "--extra boto3")
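The shell substitution above expands to either nothing or the two words --extra boto3. As a rough illustration of the resulting command lines, the sketch below builds the same argument list in Python. The function name uv_sync_command is hypothetical; the uv flags are the ones quoted from the Dockerfile.

```python
def uv_sync_command(extras: dict[str, bool]) -> list[str]:
    """Build the uv sync argv, adding --extra NAME for each enabled extra."""
    cmd = ["uv", "sync", "--frozen", "--no-editable", "--managed-python"]
    for name, enabled in extras.items():
        if enabled:
            cmd += ["--extra", name]
    return cmd


# Default build (INSTALL_BOTO3=true) keeps today's command line.
print(" ".join(uv_sync_command({"boto3": True})))
# Benchmark build (INSTALL_BOTO3=false) drops the extra.
print(" ".join(uv_sync_command({"boto3": False})))
```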
3. browser-use — moderate (but well-positioned)
browser-use>=0.8.0 is currently a hard dependency of openhands-tools (pyproject.toml L14):
dependencies = [
    ...
    "browser-use>=0.8.0",
    ...
]
However, the runtime already treats it as optional. Browser tools are conditionally loaded behind an enable_browser flag in preset/default.py:
if enable_browser:
    from openhands.tools.browser_use import BrowserToolSet
CLI mode explicitly disables browser tools (enable_browser=not cli_mode). Benchmarks also don't use them.
Fix (three parts):
- Move browser-use to an optional extra in openhands-tools/pyproject.toml:
dependencies = [
    "openhands-sdk",
    "bashlex>=0.18",
    "binaryornot>=0.4.4",
    "cachetools",
    "libtmux>=0.53.0",
    "pydantic>=2.11.7",
    "func-timeout>=4.3.5",
    "tom-swe>=1.0.3",
]

[project.optional-dependencies]
browser = ["browser-use>=0.8.0"]
- Add a try/except guard in preset/default.py for when the package isn't installed:
if enable_browser:
    try:
        from openhands.tools.browser_use import BrowserToolSet
        logger.debug(f"Tool: {BrowserToolSet.name} registered.")
    except ImportError:
        logger.warning("browser-use not installed; browser tools unavailable")
- Add a corresponding Dockerfile build arg and conditionally include --extra browser in uv sync.
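One detail the guard has to respect is that any later code must not reference BrowserToolSet when the import failed. A minimal sketch of one way to arrange that follows, assuming the tool list is built from names. The function browser_tool_names and its module_name parameter are hypothetical, introduced only so the behaviour can be exercised without the real package.

```python
import importlib
import logging

logger = logging.getLogger(__name__)


def browser_tool_names(
    enable_browser: bool,
    module_name: str = "openhands.tools.browser_use",
) -> list[str]:
    """Return the browser tool names to register, or [] when unavailable."""
    if not enable_browser:
        return []
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Optional extra not installed: degrade instead of crashing.
        logger.warning("browser-use not installed; browser tools unavailable")
        return []
    # Reached only when the import succeeded, so the class is safe to touch.
    return [module.BrowserToolSet.name]
```

Returning an empty list keeps the caller's registration loop unchanged whether the flag is off or the package is missing.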
Changes required
SDK repo (software-agent-sdk)
| File | Change | Effort |
| --- | --- | --- |
| openhands-agent-server/.../Dockerfile | Add INSTALL_ACP, INSTALL_BOTO3, INSTALL_BROWSER build args with true defaults; wrap npm ACP install in conditional; conditionally pass --extra boto3 and --extra browser to uv sync | Small |
| openhands-tools/pyproject.toml | Move browser-use from dependencies to [project.optional-dependencies] browser = [...] | Small |
| openhands-tools/.../preset/default.py | Add ImportError guard around BrowserToolSet import | Small |
| openhands-agent-server/.../docker/build.py | Accept and forward new build args | Small |
Benchmarks repo
| File | Change | Effort |
| --- | --- | --- |
| benchmarks/utils/build_utils.py | Pass --build-arg INSTALL_ACP=false --build-arg INSTALL_BOTO3=false --build-arg INSTALL_BROWSER=false for benchmark builds | Small |
| .github/workflows/build-swtbench-images.yml | Optionally expose the flags as workflow inputs | Small |
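As a rough sketch of the build_utils.py change, the helper below turns three booleans into the --build-arg flags listed in the table. The function name build_arg_flags and its signature are assumptions for illustration; the existing structure of build_utils.py is not shown in this issue.

```python
def build_arg_flags(
    install_acp: bool = True,
    install_boto3: bool = True,
    install_browser: bool = True,
) -> list[str]:
    """Translate install toggles into docker build --build-arg flags."""
    toggles = {
        "INSTALL_ACP": install_acp,
        "INSTALL_BOTO3": install_boto3,
        "INSTALL_BROWSER": install_browser,
    }
    flags: list[str] = []
    for name, enabled in toggles.items():
        # The Dockerfile compares against the literal string "true".
        flags += ["--build-arg", f"{name}={'true' if enabled else 'false'}"]
    return flags


# Benchmark builds opt out of all three dependency groups.
benchmark_flags = build_arg_flags(
    install_acp=False, install_boto3=False, install_browser=False
)
```

Keeping the defaults at True means a caller that passes nothing gets the same flags the Dockerfile would assume anyway.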
Expected impact
| Savings | Per image | At 433 images |
| --- | --- | --- |
| Skip npm ACP install | ~38s | ~4.5 hours |
| Skip browser-use + playwright | ~15-30s install + smaller image | ~2-3 hours |
| Skip boto3/botocore | ~5-10s | ~0.5-1 hour |
| Smaller image → faster export/push | ~10-20s | ~1-2 hours |
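The hour figures in the table are rounded products of per-image seconds and image count. The sketch below reproduces that arithmetic, assuming the builds run serially; with parallel builders the wall-clock savings would be proportionally smaller. The constant and function names are illustrative only.

```python
IMAGES = 433

# (low, high) seconds saved per image, taken from the table above.
SAVINGS_S = {
    "npm ACP install": (38, 38),
    "browser-use + playwright": (15, 30),
    "boto3/botocore": (5, 10),
    "export/push": (10, 20),
}


def hours(seconds_per_image: float, images: int = IMAGES) -> float:
    """Total hours saved across all images for a per-image saving."""
    return seconds_per_image * images / 3600


def total_range_hours() -> tuple[float, float]:
    """Sum the low and high per-image savings and convert each to hours."""
    low = sum(lo for lo, _ in SAVINGS_S.values())
    high = sum(hi for _, hi in SAVINGS_S.values())
    return hours(low), hours(high)


for label, (lo, hi) in SAVINGS_S.items():
    print(f"{label}: {hours(lo):.1f}-{hours(hi):.1f} h")
print("total: {:.1f}-{:.1f} h".format(*total_range_hours()))
```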
Combined with the ARG cache fix from #531 (SDK PR #2522), cold builds for 433 images could drop below 4 hours.
Non-breaking guarantee
- All build args default to true, so existing docker build invocations produce identical images
- pip install openhands-tools continues to work (browser-use becomes an extra, installable as openhands-tools[browser], and the Dockerfile includes it by default)
- Runtime code already handles missing boto3 gracefully, and the new ImportError guard does the same for missing browser tools
- Only benchmark builds explicitly opt out via --build-arg
Related