Feature/unified multiarch by tdamir · Pull Request #752 · mostlygeek/llama-swap

tdamir · 2026-05-12T20:59:16Z

This pull request introduces multi-architecture (amd64 and arm64) Docker image builds.

Each backend/architecture pair produces its own image (e.g. unified-cuda-amd64, unified-cuda-arm64, unified-vulkan-amd64, unified-vulkan-arm64). Once these have been pushed successfully, they are merged into the unified-cuda and unified-vulkan tags.
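
The push-then-merge flow described above can be sketched roughly as follows. This is an illustration rather than the workflow's actual step: the helper name `merge_cmd` and the `IMAGE_REPO` default are assumptions, and the command is only printed so it can be inspected before anything is run. `docker buildx imagetools create` is the Buildx subcommand that assembles a multi-arch manifest from existing tags.

```sh
#!/bin/sh
# Assumed default repository; override IMAGE_REPO to point at a fork.
IMAGE_REPO="${IMAGE_REPO:-ghcr.io/mostlygeek/llama-swap}"

# Hypothetical helper: print the command that would merge the per-arch
# tags of one backend into a single multi-arch tag.
merge_cmd() {
    backend="$1"
    target="${IMAGE_REPO}:unified-${backend}"
    # Source tags follow the <target>-<arch> naming from the description.
    printf 'docker buildx imagetools create -t %s %s-amd64 %s-arm64\n' \
        "$target" "$target" "$target"
}

for backend in cuda vulkan; do
    merge_cmd "$backend"
done
```

Printing the command first keeps the sketch safe to run on a machine without registry credentials.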

I had to split the root and rootless builds because the rootless build used the newly pushed image as its base, and on the first run that image does not exist yet since it is only pushed afterwards.

I extracted the CUDA version into a build argument and updated it to 13, removed the unsupported CUDA_ARCHITECTURES, and added 121 for DGX Spark.

Separate cache images are now created per architecture, which could make the first build slow. It may be worth omitting ARCH from the cache image name on amd64 to stay compatible with the existing cache.

Old: CACHE_REF="ghcr.io/mostlygeek/llama-swap:unified-${BACKEND}-cache"
New: CACHE_REF="ghcr.io/${GITHUB_REPOSITORY}:unified-${BACKEND}-cache-${ARCH}"
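
The compatibility idea floated above (leave the architecture suffix off on amd64 so the existing cache image keeps being reused) could look roughly like this. It is only a sketch: `cache_ref` is a hypothetical helper, and the fallback repository name is an assumption so the snippet also runs outside GitHub Actions.

```sh
#!/bin/sh
# Hypothetical helper: build the BuildKit cache reference for a backend/arch.
# amd64 keeps the old, suffix-free name; other architectures get a suffix.
cache_ref() {
    backend="$1"
    arch="$2"
    # GITHUB_REPOSITORY is set by GitHub Actions; the fallback is an assumption.
    repo="${GITHUB_REPOSITORY:-mostlygeek/llama-swap}"
    ref="ghcr.io/${repo}:unified-${backend}-cache"
    if [ "$arch" != "amd64" ]; then
        ref="${ref}-${arch}"
    fi
    printf '%s\n' "$ref"
}

cache_ref cuda amd64
cache_ref cuda arm64
```

The trade-off is an asymmetric naming scheme in exchange for a warm cache on the first amd64 build.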

I had to set -march explicitly in the ik_llama arm64 build. I set it to armv9.2-a, which is optimized for DGX Spark. This can be reconsidered.

if [ "$ARCH" = "arm64" ]; then
    CMAKE_FLAGS+=(-DGGML_ARCH_FLAGS="-march=armv9.2-a+dotprod+fp16")
fi
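
For context, the `ARCH` value tested above has to be derived from the host. A minimal sketch of mapping `uname -m` output to Docker-style architecture names is shown below; the function name `map_arch` is an assumption and not necessarily what the PR's scripts use.

```sh
#!/bin/sh
# Hypothetical helper: map a machine name (as printed by `uname -m`)
# to the Docker-style architecture name used in the image tags.
map_arch() {
    case "$1" in
        x86_64|amd64)  echo amd64 ;;
        aarch64|arm64) echo arm64 ;;
        *)
            echo "unsupported architecture: $1" >&2
            return 1
            ;;
    esac
}

if ARCH="$(map_arch "$(uname -m)")"; then
    echo "Detected ARCH=${ARCH}"
fi
```

Failing loudly on unknown machines avoids producing a tag with an empty or misleading suffix.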

I modified install-sd.sh to build the frontend.

While working on this, I noticed that if both CUDA and Vulkan are selected for building in a single workflow run, the Vulkan amd64 job hangs for some reason. When started separately, the jobs run fine. I suspect it has something to do with GitHub runners.

coderabbitai · 2026-05-12T20:59:29Z

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 75dfc356-200b-4713-98b8-ea2291a0c6b4

Walkthrough

The pull request extends the Docker build infrastructure to support multi-platform (amd64 and arm64) image builds with parameterized CUDA/Ubuntu versions, introduces a standalone rootless image variant builder, and consolidates platform-specific images via multi-arch manifests in ghcr.io.

Changes

Multi-platform Docker Build System with Rootless Variants

| Layer / File(s) | Summary |
| --- | --- |
| Workflow and build matrix + manifest publishing `.github/workflows/unified-docker.yml` | CI matrix now builds per-backend and per-platform (amd64/arm64), sets `runs-on` conditionally, tags platform-specific images under `ghcr.io/${{ github.repository }}`, pushes date-stamped variants, and assembles multi-arch manifests with `docker buildx imagetools create`. |
| Host-arch aware build script and cache refs `docker/unified/build-image.sh` | Script detects host architecture, maps to `amd64`/`arm64`, includes `${ARCH}` in default `DOCKER_IMAGE_TAG`, accepts `--cuda-arch`, passes `CMAKE_CUDA_ARCHITECTURES` into the build, updates BuildKit cache ref to include backend+arch, and simplifies post-build output. |
| Rootless image build script `docker/unified/build-image-rootless.sh` | New script adds CLI flags (`--cuda`, `--vulkan`, `--no-cache`), detects arch, computes image tags, builds a rootless variant via `buildx` with an inline Dockerfile that creates `llama-swap` user (UID/GID 10001), and prints backend-specific run examples. |
| Dockerfile parameterization and builder tooling `docker/unified/Dockerfile` | Adds `CUDA_VERSION` and `UBUNTU_VERSION` build args, expands builder packages (curl, ccache, nodejs/npm, global `pnpm`), standardizes cache mount target to `${CCACHE_DIR}` with per-stage ids, and parameterizes runtime base images (CUDA runtime and Ubuntu versions). |
| Backend build flags `docker/unified/install-ik-llama.sh`, `docker/unified/install-sd.sh` | `install-ik-llama.sh` adds host arch detection and conditionally appends `-DGGML_ARCH_FLAGS` on arm64 when supported; `install-sd.sh` adds `-DSD_SERVER_BUILD_FRONTEND=ON` to `CMAKE_FLAGS`. |

🎯 3 (Moderate) | ⏱️ ~25 minutes

mostlygeek/llama-swap#630: Overlaps on -rootless Docker image variant creation and tagging.
mostlygeek/llama-swap#597: Related unified Docker workflow, arch-aware tagging, and cache logic.
mostlygeek/llama-swap#625: Related changes to CMAKE_CUDA_ARCHITECTURES handling.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'Feature/unified multiarch' directly describes the main change: introduction of multi-architecture Docker image builds for amd64 and arm64. |
| Description check | ✅ Passed | The description is directly related to the changeset, providing context about multi-arch Docker builds, the rationale for splitting root/rootless builds, CUDA version updates, and architectural optimizations. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

coderabbitai

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/unified-docker.yml:
- Around line 159-179: The merge-manifests job still runs during local act runs;
update its conditional so it is skipped when the ACT environment is set. Modify
the job "merge-manifests" if expression (the current if: ${{
needs.setup.outputs.matrix != '[]' && (github.event_name == 'schedule' ||
inputs.push_to_ghcr == true) }}) to also require that env.ACT is not true (for
example add && (env.ACT != 'true') or equivalent), so the GHCR login and
imagetools steps are not executed during local act runs.

In `@docker/unified/build-image-rootless.sh`:
- Around line 20-31: The rootless build script parses --no-cache into the
NO_CACHE variable but never forwards it to the docker buildx invocation; update
the rootless build path so that when NO_CACHE=true you add the --no-cache option
to the docker buildx build command (the invocation using docker buildx build in
build-image-rootless.sh) so cached layers are not used; locate the docker buildx
build command in the rootless branch and conditionally append "--no-cache" (or
include it via a build arguments array) based on the NO_CACHE variable.

In `@docker/unified/Dockerfile`:
- Around line 29-35: The Vulkan builder stage is missing the frontend toolchain
while install-sd.sh sets SD_SERVER_BUILD_FRONTEND=ON unconditionally; update the
Vulkan builder Docker stage in docker/unified/Dockerfile to install nodejs, npm
and pnpm (same packages/commands used in the other builder stage: include nodejs
and npm in the apt-get install list and run npm install -g pnpm@latest-10) so
the build can run when SD_SERVER_BUILD_FRONTEND is enabled, or alternatively
make install-sd.sh conditional on SD_SERVER_BUILD_FRONTEND only for backends
that have the toolchain.

In `@docker/unified/install-ik-llama.sh`:
- Around line 41-43: The script currently forces
-DGGML_ARCH_FLAGS="-march=armv9.2-a+dotprod+fp16" when ARCH is "arm64", which
raises the minimum CPU baseline and breaks generic linux/arm64 images; instead,
only append that flag to CMAKE_FLAGS if the host actually supports armv9.2+
features (detect via cpu feature checks like /proc/cpuinfo or lscpu for dotprod
and fp16) or remove the addition for the generic ARM64 build and produce a
separate performance-optimized image/tag for Armv9.2+ targets; update the
conditional around ARCH and the CMAKE_FLAGS modification (the ARCH check and the
-DGGML_ARCH_FLAGS addition) accordingly.
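
The CPU feature check suggested in the last comment could be sketched as below. Assumptions: the helper name `has_dotprod_fp16` is hypothetical, and the check relies on the Linux arm64 `/proc/cpuinfo` flag names `asimddp` (dot product) and `asimdhp` (half-precision SIMD). It inspects the build host, which is not necessarily the machine the image will run on, and having both flags does not by itself imply Armv9.2.

```sh
#!/bin/sh
# Hypothetical helper: succeed only if a /proc/cpuinfo "Features" value
# lists both the dot-product and half-precision SIMD flags.
has_dotprod_fp16() {
    features=" $1 "   # pad with spaces so whole-word matching works
    case "$features" in
        *" asimddp "*) ;;
        *) return 1 ;;
    esac
    case "$features" in
        *" asimdhp "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Read the first Features line, if any (absent on non-ARM or non-Linux hosts).
features_line="$(grep -m1 '^Features' /proc/cpuinfo 2>/dev/null | cut -d: -f2)"
if has_dotprod_fp16 "$features_line"; then
    echo "dotprod and fp16 available on this host"
else
    echo "using the generic arm64 baseline"
fi
```

Because the result depends on the CI runner's CPU, a separate performance-optimized tag (the other option raised in the comment) may be the more predictable choice.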

ℹ️ Review info

📥 Commits

Reviewing files that changed from the base of the PR and between 174e856 and 83788fa.

📒 Files selected for processing (6)

.github/workflows/unified-docker.yml
docker/unified/Dockerfile
docker/unified/build-image-rootless.sh
docker/unified/build-image.sh
docker/unified/install-ik-llama.sh
docker/unified/install-sd.sh

mostlygeek · 2026-05-12T21:07:25Z

Do you have this running in GHA on your own fork? I'd like to see the output of what the successful runs look like.

tdamir · 2026-05-12T21:10:07Z

Yes. You can take a look at it here: https://github.com/tdamir/llama-swap/actions/workflows/unified-docker.yml.

tdamir · 2026-05-12T21:47:38Z

This one looks promising: https://github.com/tdamir/llama-swap/actions/runs/25763537472. Still running...

juilpark · 2026-05-13T02:28:52Z

+FROM nvidia/cuda:${CUDA_VERSION}-devel-ubuntu${UBUNTU_VERSION} AS builder-base-cuda

-ARG CMAKE_CUDA_ARCHITECTURES="60;61;75;86;89"
+ARG CMAKE_CUDA_ARCHITECTURES="75;86;89;121"


What about adding 120 for the RTX 50 series and the RTX Pro Blackwell series?

Sure. We can add it. The build time will increase but that's the way it is I guess...

Removing 60 and 61 drops support for Pascal (P40) cards. I still have and use those.

Could we try PTX to support multiple architectures? I'm open to suggestions for broader hardware support.

Maybe we could split cuda into cuda and cuda13? Then there is also the issue of deciding which ARM architecture/instructions to support...

Creating separate cuda and cuda13 images makes a lot of sense. The people who are running P40s with RTX6000 will have to deal with the compatibility issues. I imagine hardware mixes like that will be pretty rare anyways.

"Architecture support for Maxwell, Pascal, and Volta is considered feature-complete. Offline compilation and library support for these architectures have been removed in CUDA Toolkit 13.0 major version release. The use of CUDA Toolkits through the 12.x series to build applications for these architectures will continue to be supported, but newer toolkits will be unable to target these architectures." (cuda 13 deprecation architectures notice)

I added the cuda13 option yesterday and started the unified workflow.

It looks like matrix expansion is now working, but the build exceeded 6 hours and was cancelled by GitHub. All CUDA architectures are enabled (both the old and the new ones for CUDA 12, only the new ones for CUDA 13). I guess that is what causes the long build times.

https://github.com/tdamir/llama-swap/actions/runs/25832659614

coderabbitai

🧹 Nitpick comments (2)

.github/workflows/unified-docker.yml (2)
71-78: ⚡ Quick win

Consolidate the platform→runner/arch mapping via matrix.include.

The ternary matrix.platform == 'linux/amd64' && '…' || matrix.platform == 'linux/arm64' && '…' is repeated five times (runner at 71, image tag suffix at 118/130/139/151). The expressions also silently evaluate to false if a new platform is ever added, which would produce an invalid runs-on or a malformed tag suffix. Adding a small include block lets you reference matrix.runner / matrix.arch directly and keeps the mapping in one place.
♻️ Proposed consolidation
   build:
     needs: setup
     if: ${{ needs.setup.outputs.matrix != '[]' }}
-    runs-on: ${{ matrix.platform == 'linux/amd64' && 'ubuntu-latest' || matrix.platform == 'linux/arm64' && 'ubuntu-24.04-arm' }}
+    runs-on: ${{ matrix.runner }}
     strategy:
       fail-fast: false
       matrix:
-        platform: 
-          - linux/amd64
-          - linux/arm64
         backend: ${{ fromJSON(needs.setup.outputs.matrix) }}
+        include:
+          - platform: linux/amd64
+            arch: amd64
+            runner: ubuntu-latest
+          - platform: linux/arm64
+            arch: arm64
+            runner: ubuntu-24.04-arm
Then replace each downstream occurrence (lines 118, 130, 139, 151) of the suffix expression with ${{ matrix.arch }}, e.g.:
-          DOCKER_IMAGE_TAG: ghcr.io/${{ github.repository }}:unified-${{ matrix.backend }}-${{ matrix.platform == 'linux/amd64' && 'amd64' || matrix.platform == 'linux/arm64' && 'arm64' }}
+          DOCKER_IMAGE_TAG: ghcr.io/${{ github.repository }}:unified-${{ matrix.backend }}-${{ matrix.arch }}
Note: when using include to add fields to existing matrix entries, the platform values must come from the include itself (as above) — not from a separate top-level platform: list — otherwise GHA will produce a Cartesian product plus include rows.
Also applies to: 118-118, 130-130, 139-139, 151-151
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/unified-docker.yml around lines 71 - 78, The workflow
repeats a fragile ternary mapping from matrix.platform to runner/arch; replace
it by defining matrix.include entries that set platform, runner, and arch for
each matrix row (e.g. include: - platform: linux/amd64 runner: ubuntu-latest
arch: amd64; - platform: linux/arm64 runner: ubuntu-24.04-arm arch: arm64) and
then use matrix.runner for runs-on and matrix.arch for tag suffixes instead of
the inline ternary expressions (update all places where matrix.platform == ...
&& '...' || ... is used to reference matrix.runner/matrix.arch).
159-168: ⚖️ Poor tradeoff

Per-backend manifest creation is gated on the whole build job succeeding.

merge-manifests declares needs: build, so if a single matrix cell fails (e.g. vulkan + linux/amd64 hangs as the PR description warns), GHA marks the whole build job failed and skips manifest creation for every backend — including the ones that fully succeeded. With fail-fast: false on the matrix, you may want manifest publishing to remain per-backend independent: e.g. add if: always() && needs.build.result != 'cancelled' && … and guard each imagetools create so it only runs when both arch tags for that backend were pushed (check the corresponding matrix outcomes via the needs context or a small dependency-graph adjustment).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/unified-docker.yml around lines 159 - 168, The
merge-manifests job currently depends on the entire build job (needs: build) so
a single failing/hanging matrix cell prevents manifest creation for all
backends; change merge-manifests to run regardless of build matrix failures and
then gate per-backend manifest publishing: replace the job-level if with
something like if: always() && needs.build.result != 'cancelled' &&
(github.event_name == 'schedule' || inputs.push_to_ghcr == true) and inside the
job, for each backend matrix entry (matrix.backend) conditionally run the
imagetools create/publish steps only when that backend's corresponding build
matrix cells succeeded (check the individual build outcomes via
needs.build.<jobName>.result or a small dependency graph/output from the build
matrix), so merge-manifests (job: merge-manifests) no longer gets skipped by
unrelated matrix failures and each imagetools create is guarded per backend.
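
One way to guard each `imagetools create` per backend, as the comment above suggests, is to check the registry for both per-arch tags before merging. The sketch below is an assumption about how that could be done, not the PR's implementation: `tag_exists` and `merge_if_complete` are hypothetical helpers built on `docker buildx imagetools inspect`, which exits non-zero when a reference cannot be resolved. Note that a stale tag from an earlier run would also pass this check.

```sh
#!/bin/sh
# Succeeds if the given image reference can be resolved in the registry.
tag_exists() {
    docker buildx imagetools inspect "$1" >/dev/null 2>&1
}

# Hypothetical helper: merge one backend's per-arch tags into a
# multi-arch tag, but only if both per-arch tags are present.
merge_if_complete() {
    repo="$1"
    backend="$2"
    target="${repo}:unified-${backend}"
    for arch in amd64 arm64; do
        if ! tag_exists "${target}-${arch}"; then
            echo "skip ${backend}: ${target}-${arch} not found"
            return 0
        fi
    done
    docker buildx imagetools create -t "$target" \
        "${target}-amd64" "${target}-arm64"
}

# Example call (repository name is a placeholder):
#   for b in cuda vulkan; do merge_if_complete ghcr.io/OWNER/REPO "$b"; done
```

Returning success on a skip keeps one incomplete backend from failing the whole merge job.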

ℹ️ Review info

📥 Commits

Reviewing files that changed from the base of the PR and between 83788fa and cd077a0.

📒 Files selected for processing (4)

.github/workflows/unified-docker.yml
docker/unified/Dockerfile
docker/unified/build-image-rootless.sh
docker/unified/install-ik-llama.sh

🚧 Files skipped from review as they are similar to previous changes (2)

docker/unified/install-ik-llama.sh
docker/unified/build-image-rootless.sh

coderabbitai

🧹 Nitpick comments (1)

docker/unified/install-ik-llama.sh (1)
41-43: ⚡ Quick win

Consider documenting the ARM64 CPU baseline.

The code sets -march=armv8.2-a+dotprod+fp16, which requires ARMv8.2-A with DotProd and FP16 extensions. This optimizes for modern ARM64 servers (Graviton2+, Neoverse N1+) but won't run on older ARM64 CPUs. Adding a brief comment would clarify the minimum CPU requirements for users deploying the resulting image.
📝 Example documentation comment
 if [ "$ARCH" = "arm64" ]; then
+    # Optimize for ARMv8.2-A+ (Graviton2+, Neoverse N1+); won't run on older ARM64 CPUs
     CMAKE_FLAGS+=(-DGGML_ARCH_FLAGS="-march=armv8.2-a+dotprod+fp16")
 fi
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docker/unified/install-ik-llama.sh` around lines 41 - 43, Add a brief comment
above the ARM64 conditional explaining the CPU baseline required by the CMake
flag: note that setting CMAKE_FLAGS with
-DGGML_ARCH_FLAGS="-march=armv8.2-a+dotprod+fp16" requires ARMv8.2-A with
DotProd and FP16 (e.g., Graviton2+, Neoverse N1+, modern Cortex-A cores) and
will not run on older ARM64 CPUs; reference the ARCH variable and the
CMAKE_FLAGS line so it's easy to find and update.

ℹ️ Review info

📥 Commits

Reviewing files that changed from the base of the PR and between 2fedd31 and a72f2cd.

📒 Files selected for processing (2)

.github/workflows/unified-docker.yml
docker/unified/install-ik-llama.sh

🚧 Files skipped from review as they are similar to previous changes (1)

.github/workflows/unified-docker.yml

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

docker/unified/build-image.sh (1)

38-58: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix help text to match actual default behavior and show proper quoting for copy/paste commands.

Line 48 omits the -<arch> suffix that the code actually appends to the default tag at runtime. Line 58 example should show quotes around 86;89 to demonstrate correct shell escaping when copying the command.

Proposed fix

-            echo "  DOCKER_IMAGE_TAG     Set custom image tag (default: llama-swap:unified-cuda or llama-swap:unified-vulkan)"
+            echo "  DOCKER_IMAGE_TAG     Set custom image tag (default: llama-swap:unified-cuda-<arch> or llama-swap:unified-vulkan-<arch>)"
@@
-            echo "  ./build-image.sh --cuda --cuda-arch=86;89   # Build for sm_86 and sm_89"
+            echo "  ./build-image.sh --cuda --cuda-arch='86;89' # Build for sm_86 and sm_89"

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docker/unified/build-image.sh` around lines 38 - 58, Update the help text in
build-image.sh so it accurately reflects the runtime tag behavior: modify the
DOCKER_IMAGE_TAG/default image tag description to show the appended -<arch>
suffix (e.g., "llama-swap:unified-cuda-<arch> or
llama-swap:unified-vulkan-<arch>") and correct the example usage for --cuda-arch
to show proper shell quoting (e.g., add quotes around 86;89) so copy/paste
works; adjust the echo lines that print the "DOCKER_IMAGE_TAG" help and the
"--cuda-arch" example accordingly.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docker/unified/build-image.sh`:
- Line 21: The script unconditionally clears CMAKE_CUDA_ARCHITECTURES and
contains an impossible test, so exported environment values are discarded;
change the logic in the CUDA-arch handling (the CMAKE_CUDA_ARCHITECTURES
variable and the block that checks CLI --cuda-arch) so that CLI flag overrides
env but an exported CMAKE_CUDA_ARCHITECTURES is honored if no flag is provided,
and only set a default value when neither is present; specifically, remove the
line that wipes CMAKE_CUDA_ARCHITECTURES, replace the impossible condition that
uses both -z and -n with a clear precedence check (if CLI flag present -> use
it; else if CMAKE_CUDA_ARCHITECTURES env is set -> keep it; else -> set
default), and ensure any downstream usage references the resolved variable.
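
The precedence described in this comment (CLI flag first, then an exported `CMAKE_CUDA_ARCHITECTURES`, then a default) can be sketched as follows. The helper name `resolve_cuda_archs` is hypothetical, and the default list is an assumption borrowed from the Dockerfile diff quoted earlier in the thread; the actual script may use a different value.

```sh
#!/bin/sh
# Assumed default; taken from the Dockerfile ARG shown in the review diff.
DEFAULT_CUDA_ARCHS="75;86;89;121"

# Hypothetical helper: $1 is the value passed via --cuda-arch
# (empty string if the flag was not given).
resolve_cuda_archs() {
    cli_value="$1"
    if [ -n "$cli_value" ]; then
        printf '%s\n' "$cli_value"                  # 1. CLI flag wins
    elif [ -n "${CMAKE_CUDA_ARCHITECTURES:-}" ]; then
        printf '%s\n' "$CMAKE_CUDA_ARCHITECTURES"   # 2. exported env value
    else
        printf '%s\n' "$DEFAULT_CUDA_ARCHS"         # 3. fallback default
    fi
}

# Quote the list: an unquoted ';' would end the command in the shell.
resolve_cuda_archs '86;89'
```

Resolving the value once and reusing the result downstream avoids the situation where an exported value is silently discarded.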

ℹ️ Review info

📥 Commits

Reviewing files that changed from the base of the PR and between a72f2cd and b4b45f8.

📒 Files selected for processing (2)

.github/workflows/unified-docker.yml
docker/unified/build-image.sh

🚧 Files skipped from review as they are similar to previous changes (1)

.github/workflows/unified-docker.yml

tdamir · 2026-05-13T15:05:31Z

I started fixing the issues and cleaning up the code. Currently the matrix expansion is not working as expected for some reason: only the vulkan and cuda builds on arm64 are created. It's probably due to my lack of GitHub Actions experience.

tdamir · 2026-05-15T09:18:36Z

I had to remove the newer architectures from the CUDA 12 build so that it doesn't exceed the 6-hour build time limit.

Workflow: https://github.com/tdamir/llama-swap/actions/runs/25893668016

Images: https://github.com/tdamir/llama-swap/pkgs/container/llama-swap/versions

Thoughts?

juilpark · 2026-05-21T06:12:41Z

> I needed to remove the newer architectures from the CUDA 12 so the build don't exceed 6 hours build time.
>
> Workflow: https://github.com/tdamir/llama-swap/actions/runs/25893668016
>
> Images: https://github.com/tdamir/llama-swap/pkgs/container/llama-swap/versions
>
> Thoughts?

I think dropping the native Blackwell SM targets from the CUDA 12 image is acceptable if that keeps the build under the GitHub Actions 6-hour limit.

CUDA 12.8+ does support SM_120, so I’d describe this as a build-time tradeoff rather than a CUDA 12 limitation.
For Blackwell users, the CUDA 13 image seems like the better target since recent Blackwell-focused performance work is mostly landing there.
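
The cuda/cuda13 split discussed in this thread amounts to choosing a different architecture list per backend. A rough sketch follows; the helper name `cuda_archs_for` is hypothetical and the lists are illustrative only, pieced together from the numbers mentioned in the thread (60/61 for Pascal, 120 for RTX 50 and RTX Pro Blackwell, 121 for DGX Spark). The PR's actual values may differ.

```sh
#!/bin/sh
# Hypothetical helper: pick an architecture list per backend.
# The lists are illustrative assumptions, not the PR's real values.
cuda_archs_for() {
    case "$1" in
        cuda)   echo "60;61;75;86;89" ;;     # CUDA 12 image: keeps Pascal
        cuda13) echo "75;86;89;120;121" ;;   # CUDA 13 image: adds 120 and 121
        *)
            echo "unknown backend: $1" >&2
            return 1
            ;;
    esac
}

for backend in cuda cuda13; do
    echo "${backend}: $(cuda_archs_for "$backend")"
done
```

Keeping each list short is what bounds the build time, since every additional target adds another compilation pass over the CUDA kernels.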

tdamir added 2 commits May 12, 2026 21:36

docker: update Dockerfile and build scripts for multi-architecture su…

0d9d6c1

…pport

ci: enhance multi-architecture support in Docker workflow

83788fa

coderabbitai Bot reviewed May 12, 2026

juilpark reviewed May 13, 2026

juilpark mentioned this pull request May 13, 2026

[Feature Request] Release ARM64 Container not only amd64 #709

Open

ci: add conditional for multi-arch manifest creation in workflow

cd077a0

docker: update CUDA architectures for improved compatibility; build: enhance build script to support no-cache option; install: conditionally apply architecture flags for arm64

coderabbitai Bot reviewed May 13, 2026

tdamir added 2 commits May 13, 2026 13:09

Consolidate the platform→runner/arch mapping via matrix.include

2fedd31

ci: limit parallel jobs to 2 in Docker build workflow

a72f2cd

docker: update architecture flags for arm64 in install script

coderabbitai Bot reviewed May 13, 2026

ci: update unified Docker workflow for multi-architecture support and…

b4b45f8

… enhance build script with CUDA architecture option

coderabbitai Bot reviewed May 13, 2026

tdamir added 3 commits May 13, 2026 15:31

ci: simplify conditions for merge-manifests job and enhance CUDA arch…

c7fd01e

…itecture resolution in build script

ci: remove max-parallel limit from build job strategy in unified Dock…

d7a420b

…er workflow

ci: remove platform inclusion for multi-architecture in Docker workflow

52ccb19

tdamir added 3 commits May 14, 2026 01:30

Add cuda13.

063e6f6

ci: update backend condition to support cuda13 in installation scripts

95de083

Remove newer architectures from CUDA 12

9dc0a6c

tdamir added 2 commits June 12, 2026 19:54

Merge branch 'mostlygeek:main' into feature/unified-multiarch

5538f1d

Merge branch 'mostlygeek:main' into feature/unified-multiarch

ea87c5a
