
feat: three-phase image build with local docker for 4x speedup#555

Open
simonrosenberg wants to merge 5 commits into main from feat/prebuilt-base-local-build

Conversation

@simonrosenberg (Collaborator) commented Mar 23, 2026

Summary

Add a phased benchmark image build path that uses:

  1. one shared builder image
  2. per-instance base images
  3. local final assembly with docker build + docker push

This path is enabled through the existing workflow input use-prebuilt-bases and is implemented with a dedicated entrypoint: benchmarks/swebench/build_phased_images.py.

Current Behavior

When use-prebuilt-bases=true, the workflows now run the phased path:

  • Phase 0: build and push ghcr.io/openhands/eval-builder:{sdk_sha}
  • Phase 1: build and push ghcr.io/openhands/eval-base:{instance_tag}
  • Phase 2: assemble final ghcr.io/openhands/eval-agent-server:{sdk_sha}-{instance_tag}-source-minimal images locally

Final assembly uses the thin benchmarks/swebench/Dockerfile.agent-layer file and the local Docker daemon instead of remote docker buildx --push.

When use-prebuilt-bases is not enabled, the existing standard build path remains unchanged.

What Is In This PR

  • New phased-build entrypoint: benchmarks/swebench/build_phased_images.py
  • Shared helper logic for:
    • building the shared builder image
    • building base images
    • assembling final images locally
  • Workflow integration for both SWE-bench and SWT-bench
  • Final-image wrapping still applied for repos that need the docutils/roman layer
  • Updated SDK submodule to the SDK branch that exposes the required builder / base-image-minimal targets

Why This Design

The heavy work is amortized into reusable images:

  • the builder image changes with the SDK
  • the base images change with the benchmark environments
  • the final assembly step becomes lightweight and can reuse local Docker layer state across many images in one job
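The amortization above boils down to a simple tag scheme. A minimal sketch (the helper names here are illustrative, not the actual functions in benchmarks/swebench/build_phased_images.py):

```python
def builder_tag(sdk_sha: str) -> str:
    # Phase 0: one shared builder image per SDK revision
    return f"ghcr.io/openhands/eval-builder:{sdk_sha}"


def base_tag(instance_tag: str) -> str:
    # Phase 1: one base image per benchmark instance environment
    return f"ghcr.io/openhands/eval-base:{instance_tag}"


def final_tag(sdk_sha: str, instance_tag: str) -> str:
    # Phase 2: the final image keys on both, assembled locally
    return f"ghcr.io/openhands/eval-agent-server:{sdk_sha}-{instance_tag}-source-minimal"
```

Because the builder tag depends only on the SDK SHA and the base tag only on the instance, each phase can skip work whenever its tag already exists in the registry.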

Results

Validated workflow runs:

  • Three-phase remote buildx assembly: run 23409630540
    • about 455s/image
  • Three-phase local assembly: run 23409964288
    • 87.12s/image average for 100 images
  • Three-phase local assembly: run 23417634029
    • 70.654s/image average, 61.4s median for 500 images
  • Three-phase local assembly: run 23417634506
    • 69.019s/image average, 59.2s median for 433 images

Dependency

Depends on:

Scope

This PR is about the phased local-assembly path only. The legacy direct-build path is still preserved as the default when the phased mode is not selected.

@all-hands-bot (Collaborator) left a comment

Taste Rating: 🟡 Acceptable - Solves a real problem (6h→52min) but ships 700 lines of untested concurrency code into production build infrastructure.



if __name__ == "__main__":
    sys.exit(main())

🔴 Critical - Testing Gap: This entire 702-line file has zero tests. This is complex build orchestration code with:

  • Concurrent ProcessPoolExecutor workflows
  • Retry logic with exponential backoff
  • Docker build/push operations
  • State management across phases

If this breaks, it blocks all image builds. At minimum, add integration tests that:

  1. Build a single base image end-to-end
  2. Assemble a single agent image end-to-end
  3. Handle a build failure and verify retry logic
  4. Verify the manifest.jsonl output format

This is build infrastructure - test it like your production depends on it (because it does).
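A minimal retry test in this spirit (assuming a hypothetical `run_with_retries` helper; the real module's retry/backoff API may differ):

```python
from unittest import mock


def run_with_retries(fn, max_retries=3):
    """Hypothetical retry helper: call fn until it succeeds or retries run out."""
    last_exc = None
    for attempt in range(max_retries):
        try:
            return fn()
        except RuntimeError as exc:
            last_exc = exc  # real code would sleep ~2**attempt seconds here
    raise last_exc


def test_retry_recovers_from_transient_failures():
    # Two transient failures, then success: the helper should absorb both.
    fn = mock.Mock(side_effect=[RuntimeError("push timeout"),
                                RuntimeError("push timeout"),
                                "ok"])
    assert run_with_retries(fn) == "ok"
    assert fn.call_count == 3
```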

Comment on lines +395 to +419
logger.info("[assembly] Building: %s", " ".join(build_cmd))
build_started = time.monotonic()
proc = subprocess.run(build_cmd, text=True, capture_output=True)
build_seconds = round(time.monotonic() - build_started, 3)

if proc.stdout:
    print(proc.stdout, end="")
if proc.stderr:
    print(proc.stderr, end="", file=sys.stderr)

if proc.returncode != 0:
    error = (
        proc.stderr.strip()
        or proc.stdout.strip()
        or f"docker build failed with exit code {proc.returncode}"
    )
    logger.info(
        "[assembly] FAILED %s: build_seconds=%.1f error=%s",
        tag_label,
        build_seconds,
        error[:200],
    )
    return BuildOutput(base_image=base_tag, tags=[], error=error)

# Step 2: docker push each tag

🟠 Important - Error Handling: This push loop fails fast on the first error:

if push:
    for t in final_tags:
        # ... push ...
        if push_proc.returncode != 0:
            return BuildOutput(..., error=error)

If you have 3 tags and push #1 fails but #2 and #3 would succeed, you lose that information and waste the successful docker build.

Better approach:

failed_tags = []
for t in final_tags:
    # ... push ...
    if push_proc.returncode != 0:
        failed_tags.append((t, error))

if failed_tags:
    return BuildOutput(..., error=f"Failed to push {len(failed_tags)}/{len(final_tags)} tags: {failed_tags}")

At least report partial success rather than treating it as total failure.

Comment on lines +318 to +320
str(ctx / "Dockerfile"),
"--target",
"builder",

🟡 Suggestion - Silent Cleanup Failure: shutil.rmtree(ctx, ignore_errors=True) silently swallows errors. If cleanup fails (disk full? permission issue?), you won't know until subsequent builds fail mysteriously.

Suggested change
str(ctx / "Dockerfile"),
"--target",
"builder",
try:
    shutil.rmtree(ctx)
except Exception as e:
    logger.warning("Failed to cleanup build context %s: %s", ctx, e)

tag = builder_image_tag(builder_image)

if remote_image_exists(tag):
    logger.info("Builder image %s already exists. Skipping.", tag)

🟡 Suggestion - Data Structure Misuse: BuildOutput(base_image="builder", ...) is semantically wrong. The builder image isn't a base image - this violates the principle of making data structures match your problem domain.

Either:

  1. Add a separate BuilderOutput type, or
  2. Make base_image optional and add a builder_image field

Using "builder" as a magic string here will confuse anyone reading logs or debugging build failures.
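A sketch of option 1, a dedicated result type (field names are illustrative, not the PR's actual definition):

```python
from dataclasses import dataclass, field


@dataclass
class BuilderOutput:
    # Phase 0 gets its own result type, so logs never show the builder
    # masquerading as a base image.
    builder_image: str
    tags: list = field(default_factory=list)
    error: str = ""
```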

Comment on lines +247 to +327
            echo "Warning: /var/lib/buildkit not found; skipping disk check"
          fi

      - name: Build and push SWE-Bench images
      - name: "Phase 0: Build and push builder image (pre-built base mode)"
        if: ${{ env.USE_PREBUILT_BASES == 'true' }}
        run: |
          set -euo pipefail
          uv run python -c "
          from benchmarks.swebench.build_base_images import build_builder_image
          result = build_builder_image(push=True)
          if result.error:
              raise RuntimeError(f'Builder build failed: {result.error}')
          print(f'Builder image: {result.tags[0]}')
          # Export for next steps
          import os
          with open(os.environ['GITHUB_ENV'], 'a') as f:
              f.write(f'BUILDER_TAG={result.tags[0]}\n')
          "
        env:
          DOCKER_BUILDKIT: 1
          BUILDKIT_PROGRESS: plain

      - name: "Phase 1: Build and push base images (pre-built base mode)"
        if: ${{ env.USE_PREBUILT_BASES == 'true' }}
        run: |
          set -euo pipefail

          CMD="uv run benchmarks/swebench/build_base_images.py \
            --dataset '${DATASET}' \
            --split '${SPLIT}' \
            --push \
            --max-workers '${MAX_WORKERS}' \
            --max-retries '${MAX_RETRIES}'"

          if [ -n "${N_LIMIT}" ]; then
            CMD="$CMD --n-limit '${N_LIMIT}'"
          fi
          if [ -n "${SELECT_FILE}" ]; then
            CMD="$CMD --select '${SELECT_FILE}'"
          fi

          echo "Running: $CMD"
          eval "$CMD"
        env:
          DOCKER_BUILDKIT: 1
          BUILDKIT_PROGRESS: plain

      - name: "Phase 2: Assemble agent images (pre-built base mode)"
        if: ${{ env.USE_PREBUILT_BASES == 'true' }}
        run: |
          set -euo pipefail
          FORCE_BUILD="${{ inputs.force-build || 'false' }}"

          uv run python -c "
          from benchmarks.swebench.build_images import collect_unique_base_images
          from benchmarks.swebench.build_base_images import assemble_all_agent_images
          from benchmarks.utils.build_utils import default_build_output_dir
          import sys

          base_images = collect_unique_base_images(
              '${DATASET}', '${SPLIT}',
              int('${N_LIMIT}') if '${N_LIMIT}' else 0,
              '${SELECT_FILE}' if '${SELECT_FILE}' else None,
          )
          build_dir = default_build_output_dir('${DATASET}', '${SPLIT}')
          rc = assemble_all_agent_images(
              base_images=base_images,
              builder_tag='${BUILDER_TAG}',
              build_dir=build_dir,
              push=True,
              max_workers=int('${MAX_WORKERS}'),
              max_retries=int('${MAX_RETRIES}'),
              force_build=('${FORCE_BUILD}' == 'true'),
          )
          sys.exit(rc)
          "
        env:
          DOCKER_BUILDKIT: 1
          BUILDKIT_PROGRESS: plain
          FORCE_BUILD: ${{ inputs.force-build || 'false' }}


🟡 Suggestion - Workflow Complexity: These three phases add 80+ lines of bash/Python/YAML mixing with complex conditionals:

  • ${{ env.USE_PREBUILT_BASES == 'true' }}
  • Inline Python in bash heredocs
  • Multi-line uv run python -c blocks

This will be painful to debug when it breaks in CI. Consider:

  1. Extract the phase logic into a single Python script (build_with_phases.py --phase 0|1|2)
  2. Keep workflow steps simple: just call the script with appropriate args
  3. Makes local reproduction easier: uv run build_with_phases.py --phase 1

Simple scripts are easier to test and debug than workflow YAML.
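A sketch of suggestion 1, assuming hypothetical flag names for the single-script interface:

```python
import argparse


def parse_args(argv=None):
    # Hypothetical CLI: one script, one --phase flag, so a CI step is
    # just "uv run build_with_phases.py --phase N --push".
    p = argparse.ArgumentParser(description="Run one phase of the image build.")
    p.add_argument("--phase", type=int, choices=[0, 1, 2], required=True,
                   help="0=builder, 1=base images, 2=local assembly")
    p.add_argument("--push", action="store_true")
    p.add_argument("--max-workers", type=int, default=4)
    return p.parse_args(argv)
```

The workflow step then shrinks to a one-line shell command, and the same command reproduces the phase locally.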

Comment on lines +167 to +183
      - name: "Phase 0: Build and push builder image (pre-built base mode)"
        if: ${{ inputs.use-prebuilt-bases == true }}
        run: |
          set -euo pipefail
          uv run python -c "
          from benchmarks.swebench.build_base_images import build_builder_image
          result = build_builder_image(push=True)
          if result.error:
              raise RuntimeError(f'Builder build failed: {result.error}')
          print(f'Builder image: {result.tags[0]}')
          import os
          with open(os.environ['GITHUB_ENV'], 'a') as f:
              f.write(f'BUILDER_TAG={result.tags[0]}\n')
              f.write('USE_PREBUILT_BASES=true\n')
          "
        env:
          DOCKER_BUILDKIT: 1

🟢 Acceptable - Inconsistent Conditionals: SWE-bench workflow uses ${{ env.USE_PREBUILT_BASES == 'true' }} (string comparison), but SWT-bench uses ${{ inputs.use-prebuilt-bases == true }} (boolean comparison).

Both work, but the inconsistency makes maintenance harder. Pick one pattern and use it everywhere.

(This is pragmatic given time constraints, but worth standardizing in a follow-up.)
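One way to standardize (step content here is illustrative): normalize the input into an env variable once, then compare it as a string in every step of both workflows:

```yaml
env:
  # Actions renders the boolean input as the string "true"/"false" here,
  # so every step can use the same string comparison.
  USE_PREBUILT_BASES: ${{ inputs.use-prebuilt-bases }}

steps:
  - name: "Phase 0: Build and push builder image"
    if: ${{ env.USE_PREBUILT_BASES == 'true' }}
    run: echo "prebuilt-base mode enabled"
```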

@simonrosenberg (Author) commented:

Addressed review feedback in 5592f9f:

Fixed:

Acknowledged, deferred:

@all-hands-bot (Collaborator) left a comment

Taste Rating: 🟡 Acceptable - Solves a real problem (6h→52min) but ships 700 lines of untested concurrency code into production build infrastructure.

Good: Phased caching strategy is solid engineering. 4x speedup with real evidence.

Concerns: Two previous issues are now fixed (cleanup handling, partial push failures), but the testing gap remains critical and there's a new architectural dependency.

Comment on lines +46 to +62
def _get_sdk_dockerfile() -> Path:
    """Locate the SDK Dockerfile from the vendor submodule."""
    benchmarks_root = Path(__file__).resolve().parent.parent.parent
    dockerfile = (
        benchmarks_root
        / "vendor"
        / "software-agent-sdk"
        / "openhands-agent-server"
        / "openhands"
        / "agent_server"
        / "docker"
        / "Dockerfile"
    )
    if not dockerfile.exists():
        raise FileNotFoundError(
            f"SDK Dockerfile not found at {dockerfile}. "
            "Make sure submodules are initialized."

🟠 Important - Fragile Path Traversal: Hard-coded relative path with multiple parent/child hops is brittle. If directory structure changes or submodule location moves, this breaks silently.

Suggested change
def _get_sdk_dockerfile() -> Path:
    """Locate the SDK Dockerfile from the vendor submodule."""
    benchmarks_root = Path(__file__).resolve().parent.parent.parent
    dockerfile = (
        benchmarks_root
        / "vendor"
        / "software-agent-sdk"
        / "openhands-agent-server"
        / "openhands"
        / "agent_server"
        / "docker"
        / "Dockerfile"
    )
    if not dockerfile.exists():
        raise FileNotFoundError(
            f"SDK Dockerfile not found at {dockerfile}. "
            "Make sure submodules are initialized."

# Use configured submodule path or fall back to discovery
BENCHMARKS_ROOT = Path(__file__).resolve().parent.parent.parent
SDK_DOCKERFILE = BENCHMARKS_ROOT / "vendor" / "software-agent-sdk" / "openhands-agent-server" / "openhands" / "agent_server" / "docker" / "Dockerfile"

def _get_sdk_dockerfile() -> Path:
    if not SDK_DOCKERFILE.exists():
        raise FileNotFoundError(
            f"SDK Dockerfile not found at {SDK_DOCKERFILE}. "
            "Make sure submodules are initialized."
        )
    return SDK_DOCKERFILE

Move the path computation to module level so it's visible and fails fast at import time.

Comment on lines +353 to +373
    push: bool = False,
    git_sha: str = "unknown",
) -> BuildOutput:
    """Assemble a final agent image from pre-built base + builder (Phase 2).

    Uses local `docker build` + `docker push` instead of `docker buildx build`
    to avoid the remote Blacksmith buildx driver overhead (~400s/image).
    The local Docker daemon caches pulled base/builder layers across builds.
    """
    import time

    if not AGENT_LAYER_DOCKERFILE.exists():
        return BuildOutput(
            base_image=base_tag,
            tags=[],
            error=f"Agent layer Dockerfile not found at {AGENT_LAYER_DOCKERFILE}",
        )

    tag_label = final_tags[0] if final_tags else base_tag
    overall_started = time.monotonic()


🔴 Critical - New Architectural Dependency: This phase requires a local Docker daemon (docker build + docker push), while the standard path can use remote buildx drivers.

This is a fundamental trade-off:

  • ✅ Benefit: Local layer caching → 4x speedup
  • ⚠️ Risk: Breaks in environments without local Docker daemon (some CI runners, remote build farms)

Document this requirement clearly:

  1. Add a comment block explaining the local-daemon dependency
  2. Check docker info before running Phase 2 and fail fast with a clear error if daemon is unreachable
  3. Document in the PR description that phased mode requires local Docker access

Example pre-flight check:

def _check_docker_daemon():
    proc = subprocess.run(["docker", "info"], capture_output=True)
    if proc.returncode != 0:
        raise RuntimeError(
            "Local Docker daemon required for phased builds. "
            "Ensure Docker is running or use --no-prebuilt-bases."
        )



if __name__ == "__main__":
    sys.exit(main())

🔴 Critical - Testing Gap Remains: This 736-line file orchestrates concurrent builds with retry logic, Docker operations, and state management across three phases. Zero tests.

This is production build infrastructure that blocks all image builds when it breaks. The 4x speedup is worthless if a regression takes down CI for hours.

Minimum viable tests:

  1. Unit: extract_custom_tag, base_image_tag, builder_image_tag (pure functions)
  2. Integration: Mock subprocess.run to verify correct Docker commands are constructed
  3. End-to-end: Build one test image through all three phases in CI

You don't need 100% coverage, but you need something to catch basic regressions. The concurrency code is especially fragile - test that partial failures are collected correctly and retries work.
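A sketch of test style 2, mocking subprocess.run to check command construction (the command-builder helper here is hypothetical; the real module assembles its docker invocations inline):

```python
import subprocess
from unittest import mock


def docker_build_cmd(dockerfile: str, tag: str, target: str = "") -> list:
    # Hypothetical helper that builds a docker command line.
    cmd = ["docker", "build", "--file", dockerfile, "--tag", tag]
    if target:
        cmd += ["--target", target]
    return cmd + ["."]


def test_builder_target_is_passed_through():
    # Patch subprocess.run so no Docker daemon is needed.
    with mock.patch("subprocess.run") as run:
        subprocess.run(docker_build_cmd("Dockerfile", "eval-builder:abc",
                                        target="builder"))
        (cmd,), _ = run.call_args
        assert cmd[:2] == ["docker", "build"]
        assert "--target" in cmd and "builder" in cmd
```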

"--file",
str(ctx / "Dockerfile"),
"--target",
"builder",

🟢 Acceptable - Fixed: Previous review flagged silent cleanup failure. This now logs warnings on cleanup errors. Good fix.

Comment on lines +420 to +428
for t in final_tags:
    push_cmd = ["docker", "push", t]
    logger.info("[assembly] Pushing: %s", t)
    push_started = time.monotonic()
    push_proc = subprocess.run(push_cmd, text=True, capture_output=True)
    push_seconds += time.monotonic() - push_started

    if push_proc.stdout:
        print(push_proc.stdout, end="")

🟢 Acceptable - Fixed: Previous review flagged fail-fast push loop. This now collects partial failures in failed_pushes and reports which tags succeeded. Good pragmatic fix.

@neubig (Contributor) left a comment

Basically LGTM!

@all-hands-bot (Collaborator) left a comment

Taste Rating: 🟡 Acceptable - Real 4x speedup (6h→52min) with solid phase separation. Two critical fixes verified (cleanup warnings, partial push handling). One blocking issue remains: 700+ lines of production build infrastructure with zero tests. The local Docker dependency is acceptable for GitHub Actions but needs documentation.

Key Insight: You've solved a real problem with pragmatic caching, but untested CI infrastructure is a ticking bomb. The 4x speedup is worthless if this breaks silently and blocks all builds.

@all-hands-bot (Collaborator) left a comment

Taste Rating: 🟡 Acceptable - Real 4x speedup (6h→52min) with solid evidence from workflow runs. Previous critical issues (cleanup warnings, partial push handling, testing gap) are now resolved. The mock-heavy tests are pragmatic for build infrastructure that requires Docker/registry interactions.

Key Insight: You've proven this works at scale (500+ images) and the local Docker trade-off is well-documented. The main concerns from previous reviews have been addressed.


log_path = tmp_path / "log.txt"
log_path.touch()
mock_capture.return_value.__enter__ = MagicMock(return_value=log_path)

🟢 Acceptable - Pragmatic Test Strategy: These tests are mock-heavy (subprocess.run, remote_image_exists, etc.), which means they validate error handling and retry logic but not actual Docker command correctness.

This is a reasonable trade-off for build infrastructure:

  • ✅ Fast feedback on logic paths
  • ✅ Validates retry/error handling
  • ✅ No Docker daemon or registry credentials needed
  • ⚠️ Doesn't catch malformed Docker commands or tag format issues

However, the workflow run evidence (70s/image for 500 images) proves the real integration works. This is acceptable pragmatic testing for CI infrastructure.



def _fail_proc(stderr="build error", code=1):
    """Fake subprocess.CompletedProcess with non-zero returncode."""

🟡 Suggestion - Document Test vs Production Executor: Tests use ThreadPoolExecutor (to avoid pickling mocks) but production uses ProcessPoolExecutor. Consider adding a comment explaining this difference so future maintainers don't get confused:

# Use ThreadPoolExecutor in tests to avoid pickling issues with mocks.
# Production code uses ProcessPoolExecutor for true parallelism.
def _thread_pool(**kw):
    return ThreadPoolExecutor(**kw)



if __name__ == "__main__":
    sys.exit(main())

🟢 Resolved - Testing Gap Addressed: Previous reviews flagged this 750-line file as having zero tests. Now tests/test_phased_build.py provides 489 lines of coverage for:

  • Build success/failure/skip logic
  • Retry mechanisms
  • Partial push failure handling
  • Manifest writing
  • Orchestration flow

Combined with workflow evidence (runs 23409964288, 23417634029, 23417634506 showing 70s/image average), this is sufficient for production build infrastructure.


tag_label = final_tags[0] if final_tags else base_tag
overall_started = time.monotonic()

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Acceptable - Documented Architectural Trade-off: The docstring clearly states the local Docker daemon requirement and explains the 4x speedup benefit (~455s → ~70s per image).

This is a conscious engineering choice with workflow evidence proving it works in GitHub Actions. The standard build path remains available as fallback for environments without local Docker.

Well-documented pragmatic decision.

- Add phased build pipeline (base → agent-layer → per-instance)
- Add -acp suffix to image tags for ACP agent builds to avoid
  IfNotPresent pull policy reusing stale non-ACP cached images

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@simonrosenberg force-pushed the feat/prebuilt-base-local-build branch from 3cd3a50 to 976f003 on March 23, 2026 at 23:58
Debug Agent and others added 4 commits March 23, 2026 21:04
Remove the use-prebuilt-bases feature flag and standard build path.
The three-phase build (builder → base → agent) is now the sole path
in both swebench and swtbench workflows.

Also wire --agent-type through to build_phased_images.py so ACP
builds get the -acp tag suffix.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Merge the phased build CLI from build_phased_images.py into build_images.py
so there is a single entry point for image builds. The old single-pass build
path is removed; the three-phase pipeline (builder → base → assemble) is now
the only code path. SWT-bench's build_images.py becomes a thin shim that
forwards to the SWE-bench module.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove unused BUILD_DEFAULTS import. Fix TestPhasedOrchestration patches
to target build_base_images (where functions are defined) since build_images
uses lazy imports inside main() to avoid circular dependencies.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>