Feature/unified multiarch#752
Conversation
|
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughThe pull request extends the Docker build infrastructure to support multi-platform (amd64 and arm64) image builds with parameterized CUDA/Ubuntu versions, introduces a standalone rootless image variant builder, and consolidates platform-specific images via multi-arch manifests in ghcr.io. ChangesMulti-platform Docker Build System with Rootless Variants
🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/unified-docker.yml:
- Around line 159-179: The merge-manifests job still runs during local act runs;
update its conditional so it is skipped when the ACT environment is set. Modify
the job "merge-manifests" if expression (the current if: ${{
needs.setup.outputs.matrix != '[]' && (github.event_name == 'schedule' ||
inputs.push_to_ghcr == true) }}) to also require that env.ACT is not true (for
example add && (env.ACT != 'true') or equivalent), so the GHCR login and
imagetools steps are not executed during local act runs.
In `@docker/unified/build-image-rootless.sh`:
- Around line 20-31: The rootless build script parses --no-cache into the
NO_CACHE variable but never forwards it to the docker buildx invocation; update
the rootless build path so that when NO_CACHE=true you add the --no-cache option
to the docker buildx build command (the invocation using docker buildx build in
build-image-rootless.sh) so cached layers are not used; locate the docker buildx
build command in the rootless branch and conditionally append "--no-cache" (or
include it via a build arguments array) based on the NO_CACHE variable.
In `@docker/unified/Dockerfile`:
- Around line 29-35: The Vulkan builder stage is missing the frontend toolchain
while install-sd.sh sets SD_SERVER_BUILD_FRONTEND=ON unconditionally; update the
Vulkan builder Docker stage in docker/unified/Dockerfile to install nodejs, npm
and pnpm (same packages/commands used in the other builder stage: include nodejs
and npm in the apt-get install list and run npm install -g pnpm@latest-10) so
the build can run when SD_SERVER_BUILD_FRONTEND is enabled, or alternatively
make install-sd.sh conditional on SD_SERVER_BUILD_FRONTEND only for backends
that have the toolchain.
In `@docker/unified/install-ik-llama.sh`:
- Around line 41-43: The script currently forces
-DGGML_ARCH_FLAGS="-march=armv9.2-a+dotprod+fp16" when ARCH is "arm64", which
raises the minimum CPU baseline and breaks generic linux/arm64 images; instead,
only append that flag to CMAKE_FLAGS if the host actually supports armv9.2+
features (detect via cpu feature checks like /proc/cpuinfo or lscpu for dotprod
and fp16) or remove the addition for the generic ARM64 build and produce a
separate performance-optimized image/tag for Armv9.2+ targets; update the
conditional around ARCH and the CMAKE_FLAGS modification (the ARCH check and the
-DGGML_ARCH_FLAGS addition) accordingly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 135a72e1-f0d2-499f-8fe7-cb635ca919cf
📒 Files selected for processing (6)
.github/workflows/unified-docker.ymldocker/unified/Dockerfiledocker/unified/build-image-rootless.shdocker/unified/build-image.shdocker/unified/install-ik-llama.shdocker/unified/install-sd.sh
|
Do you have this running in GHA on your own fork? I'd like to see the output of what the successful runs look like. |
|
Yes. You can take a look at it here: https://github.com/tdamir/llama-swap/actions/workflows/unified-docker.yml. |
|
This one looks promising: https://github.com/tdamir/llama-swap/actions/runs/25763537472. Still running... |
| FROM nvidia/cuda:${CUDA_VERSION}-devel-ubuntu${UBUNTU_VERSION} AS builder-base-cuda | ||
|
|
||
| ARG CMAKE_CUDA_ARCHITECTURES="60;61;75;86;89" | ||
| ARG CMAKE_CUDA_ARCHITECTURES="75;86;89;121" |
There was a problem hiding this comment.
what about adding 120 for RTX 50 Series and RTX Pro Blackwell series?
There was a problem hiding this comment.
Sure. We can add it. The build time will increase but that's the way it is I guess...
There was a problem hiding this comment.
Removing 60 and 61 drops support for pascal (p40) cards. I actually have and still use those.
We could try PTX to support multiple architecture? I'm open to suggestions for broader hardware support.
There was a problem hiding this comment.
Maybe we could split cuda into the cuda and cuda13? Then there is also an issue with deciding what arm architecture/instructions to support...
There was a problem hiding this comment.
Creating separate cuda and cuda13 images makes a lot of sense. The people who are running P40s with RTX6000 will have to deal with the compatibility issues. I imagine hardware mixes like that will be pretty rare anyways.
Architecture support for Maxwell, Pascal, and Volta is considered feature-complete. Offline compilation and library support for these architectures have been removed in CUDA Toolkit 13.0 major version release. The use of CUDA Toolkits through the 12.x series to build applications for these architectures will continue to be supported, but newer toolkits will be unable to target these architectures. cuda 13 deprecation architectures notice
There was a problem hiding this comment.
I have added the cuda13 option yesterday and started the unified workflow.
It looks like matrix expansion is now working but the build exceeds 6 hours so it was cancelled by GitHub. All CUDA (for 12 the old ones and the new ones, for 13 the new ones) architectures are enabled. I guess that is causing a long build times.
https://github.com/tdamir/llama-swap/actions/runs/25832659614
docker: update CUDA architectures for improved compatibility build: enhance build script to support no-cache option install: conditionally apply architecture flags for arm64
There was a problem hiding this comment.
🧹 Nitpick comments (2)
.github/workflows/unified-docker.yml (2)
71-78: ⚡ Quick winConsolidate the platform→runner/arch mapping via
matrix.include.The ternary
matrix.platform == 'linux/amd64' && '…' || matrix.platform == 'linux/arm64' && '…'is repeated five times (runner at 71, image tag suffix at 118/130/139/151). The expressions also silently evaluate tofalseif a new platform is ever added, which would produce an invalidruns-onor a malformed tag suffix. Adding a smallincludeblock lets you referencematrix.runner/matrix.archdirectly and keeps the mapping in one place.♻️ Proposed consolidation
build: needs: setup if: ${{ needs.setup.outputs.matrix != '[]' }} - runs-on: ${{ matrix.platform == 'linux/amd64' && 'ubuntu-latest' || matrix.platform == 'linux/arm64' && 'ubuntu-24.04-arm' }} + runs-on: ${{ matrix.runner }} strategy: fail-fast: false matrix: - platform: - - linux/amd64 - - linux/arm64 backend: ${{ fromJSON(needs.setup.outputs.matrix) }} + include: + - platform: linux/amd64 + arch: amd64 + runner: ubuntu-latest + - platform: linux/arm64 + arch: arm64 + runner: ubuntu-24.04-armThen replace each downstream occurrence (lines 118, 130, 139, 151) of the suffix expression with
${{ matrix.arch }}, e.g.:- DOCKER_IMAGE_TAG: ghcr.io/${{ github.repository }}:unified-${{ matrix.backend }}-${{ matrix.platform == 'linux/amd64' && 'amd64' || matrix.platform == 'linux/arm64' && 'arm64' }} + DOCKER_IMAGE_TAG: ghcr.io/${{ github.repository }}:unified-${{ matrix.backend }}-${{ matrix.arch }}Note: when using
includeto add fields to existing matrix entries, theplatformvalues must come from the include itself (as above) — not from a separate top-levelplatform:list — otherwise GHA will produce a Cartesian product plus include rows.Also applies to: 118-118, 130-130, 139-139, 151-151
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/unified-docker.yml around lines 71 - 78, The workflow repeats a fragile ternary mapping from matrix.platform to runner/arch; replace it by defining matrix.include entries that set platform, runner, and arch for each matrix row (e.g. include: - platform: linux/amd64 runner: ubuntu-latest arch: amd64; - platform: linux/arm64 runner: ubuntu-24.04-arm arch: arm64) and then use matrix.runner for runs-on and matrix.arch for tag suffixes instead of the inline ternary expressions (update all places where matrix.platform == ... && '...' || ... is used to reference matrix.runner/matrix.arch).
159-168: ⚖️ Poor tradeoffPer-backend manifest creation is gated on the whole
buildjob succeeding.
merge-manifestsdeclaresneeds: build, so if a single matrix cell fails (e.g.vulkan+linux/amd64hangs as the PR description warns), GHA marks the wholebuildjob failed and skips manifest creation for every backend — including the ones that fully succeeded. Withfail-fast: falseon the matrix, you may want manifest publishing to remain per-backend independent: e.g. addif: always() && needs.build.result != 'cancelled' && …and guard eachimagetools createso it only runs when both arch tags for that backend were pushed (check the corresponding matrix outcomes via theneedscontext or a small dependency-graph adjustment).🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/unified-docker.yml around lines 159 - 168, The merge-manifests job currently depends on the entire build job (needs: build) so a single failing/hanging matrix cell prevents manifest creation for all backends; change merge-manifests to run regardless of build matrix failures and then gate per-backend manifest publishing: replace the job-level if with something like if: always() && needs.build.result != 'cancelled' && (github.event_name == 'schedule' || inputs.push_to_ghcr == true') and inside the job, for each backend matrix entry (matrix.backend) conditionally run the imagetools create/publish steps only when that backend's corresponding build matrix cells succeeded (check the individual build outcomes via needs.build.<jobName>.result or a small dependency graph/output from the build matrix), so merge-manifests (job: merge-manifests) no longer gets skipped by unrelated matrix failures and each imagetools create is guarded per backend.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In @.github/workflows/unified-docker.yml:
- Around line 71-78: The workflow repeats a fragile ternary mapping from
matrix.platform to runner/arch; replace it by defining matrix.include entries
that set platform, runner, and arch for each matrix row (e.g. include: -
platform: linux/amd64 runner: ubuntu-latest arch: amd64; - platform: linux/arm64
runner: ubuntu-24.04-arm arch: arm64) and then use matrix.runner for runs-on and
matrix.arch for tag suffixes instead of the inline ternary expressions (update
all places where matrix.platform == ... && '...' || ... is used to reference
matrix.runner/matrix.arch).
- Around line 159-168: The merge-manifests job currently depends on the entire
build job (needs: build) so a single failing/hanging matrix cell prevents
manifest creation for all backends; change merge-manifests to run regardless of
build matrix failures and then gate per-backend manifest publishing: replace the
job-level if with something like if: always() && needs.build.result !=
'cancelled' && (github.event_name == 'schedule' || inputs.push_to_ghcr == true')
and inside the job, for each backend matrix entry (matrix.backend) conditionally
run the imagetools create/publish steps only when that backend's corresponding
build matrix cells succeeded (check the individual build outcomes via
needs.build.<jobName>.result or a small dependency graph/output from the build
matrix), so merge-manifests (job: merge-manifests) no longer gets skipped by
unrelated matrix failures and each imagetools create is guarded per backend.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 070ec702-2bc1-45c2-b76e-82745904d9f2
📒 Files selected for processing (4)
.github/workflows/unified-docker.ymldocker/unified/Dockerfiledocker/unified/build-image-rootless.shdocker/unified/install-ik-llama.sh
🚧 Files skipped from review as they are similar to previous changes (2)
- docker/unified/install-ik-llama.sh
- docker/unified/build-image-rootless.sh
docker: update architecture flags for arm64 in install script
There was a problem hiding this comment.
🧹 Nitpick comments (1)
docker/unified/install-ik-llama.sh (1)
41-43: ⚡ Quick winConsider documenting the ARM64 CPU baseline.
The code sets
-march=armv8.2-a+dotprod+fp16, which requires ARMv8.2-A with DotProd and FP16 extensions. This optimizes for modern ARM64 servers (Graviton2+, Neoverse N1+) but won't run on older ARM64 CPUs. Adding a brief comment would clarify the minimum CPU requirements for users deploying the resulting image.📝 Example documentation comment
if [ "$ARCH" = "arm64" ]; then + # Optimize for ARMv8.2-A+ (Graviton2+, Neoverse N1+); won't run on older ARM64 CPUs CMAKE_FLAGS+=(-DGGML_ARCH_FLAGS="-march=armv8.2-a+dotprod+fp16") fi🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docker/unified/install-ik-llama.sh` around lines 41 - 43, Add a brief comment above the ARM64 conditional explaining the CPU baseline required by the CMake flag: note that setting CMAKE_FLAGS with -DGGML_ARCH_FLAGS="-march=armv8.2-a+dotprod+fp16" requires ARMv8.2-A with DotProd and FP16 (e.g., Graviton2+, Neoverse N1+, modern Cortex-A cores) and will not run on older ARM64 CPUs; reference the ARCH variable and the CMAKE_FLAGS line so it's easy to find and update.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@docker/unified/install-ik-llama.sh`:
- Around line 41-43: Add a brief comment above the ARM64 conditional explaining
the CPU baseline required by the CMake flag: note that setting CMAKE_FLAGS with
-DGGML_ARCH_FLAGS="-march=armv8.2-a+dotprod+fp16" requires ARMv8.2-A with
DotProd and FP16 (e.g., Graviton2+, Neoverse N1+, modern Cortex-A cores) and
will not run on older ARM64 CPUs; reference the ARCH variable and the
CMAKE_FLAGS line so it's easy to find and update.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 0e3fd81b-cb72-4b34-9334-4d17f3247aa2
📒 Files selected for processing (2)
.github/workflows/unified-docker.ymldocker/unified/install-ik-llama.sh
🚧 Files skipped from review as they are similar to previous changes (1)
- .github/workflows/unified-docker.yml
… enhance build script with CUDA architecture option
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docker/unified/build-image.sh (1)
38-58:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winFix help text to match actual default behavior and show proper quoting for copy/paste commands.
Line 48 omits the
-<arch>suffix that the code actually appends to the default tag at runtime. Line 58 example should show quotes around86;89to demonstrate correct shell escaping when copying the command.Proposed fix
- echo " DOCKER_IMAGE_TAG Set custom image tag (default: llama-swap:unified-cuda or llama-swap:unified-vulkan)" + echo " DOCKER_IMAGE_TAG Set custom image tag (default: llama-swap:unified-cuda-<arch> or llama-swap:unified-vulkan-<arch>)" @@ - echo " ./build-image.sh --cuda --cuda-arch=86;89 # Build for sm_86 and sm_89" + echo " ./build-image.sh --cuda --cuda-arch='86;89' # Build for sm_86 and sm_89"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docker/unified/build-image.sh` around lines 38 - 58, Update the help text in build-image.sh so it accurately reflects the runtime tag behavior: modify the DOCKER_IMAGE_TAG/default image tag description to show the appended -<arch> suffix (e.g., "llama-swap:unified-cuda-<arch> or llama-swap:unified-vulkan-<arch>") and correct the example usage for --cuda-arch to show proper shell quoting (e.g., add quotes around 86;89) so copy/paste works; adjust the echo lines that print the "DOCKER_IMAGE_TAG" help and the "--cuda-arch" example accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docker/unified/build-image.sh`:
- Line 21: The script unconditionally clears CMAKE_CUDA_ARCHITECTURES and
contains an impossible test, so exported environment values are discarded;
change the logic in the CUDA-arch handling (the CMAKE_CUDA_ARCHITECTURES
variable and the block that checks CLI --cuda-arch) so that CLI flag overrides
env but an exported CMAKE_CUDA_ARCHITECTURES is honored if no flag is provided,
and only set a default value when neither is present; specifically, remove the
line that wipes CMAKE_CUDA_ARCHITECTURES, replace the impossible condition that
uses both -z and -n with a clear precedence check (if CLI flag present -> use
it; else if CMAKE_CUDA_ARCHITECTURES env is set -> keep it; else -> set
default), and ensure any downstream usage references the resolved variable.
---
Outside diff comments:
In `@docker/unified/build-image.sh`:
- Around line 38-58: Update the help text in build-image.sh so it accurately
reflects the runtime tag behavior: modify the DOCKER_IMAGE_TAG/default image tag
description to show the appended -<arch> suffix (e.g.,
"llama-swap:unified-cuda-<arch> or llama-swap:unified-vulkan-<arch>") and
correct the example usage for --cuda-arch to show proper shell quoting (e.g.,
add quotes around 86;89) so copy/paste works; adjust the echo lines that print
the "DOCKER_IMAGE_TAG" help and the "--cuda-arch" example accordingly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 6ea8ae46-c00a-4fb3-a7ac-a2815e2f3690
📒 Files selected for processing (2)
.github/workflows/unified-docker.ymldocker/unified/build-image.sh
🚧 Files skipped from review as they are similar to previous changes (1)
- .github/workflows/unified-docker.yml
|
I started fixing the issues and cleaning the code. Currently the matrix expansion is not working for some reason as expected. Only vulkan and cuda on arm64 is created. It's probably my lack of GitHub actions experience. |
|
I needed to remove the newer architectures from the CUDA 12 so the build don't exceed 6 hours build time. Workflow: https://github.com/tdamir/llama-swap/actions/runs/25893668016 Images: https://github.com/tdamir/llama-swap/pkgs/container/llama-swap/versions Thoughts? |
I think dropping the native Blackwell SM targets from the CUDA 12 image is acceptable if that keeps the build under the GitHub Actions 6-hour limit. CUDA 12.8+ does support SM_120, so I’d describe this as a build-time tradeoff rather than a CUDA 12 limitation. |
This pull request introduces multi-architecture (amd64 and arm64) Docker image builds,
Every backend/arch will create own image (e.g. unified-cuda-amd64, unified-cuda-arm64, unified-vulkan-amd64, unified-vulkan-arm64) and after successfully pushed, images will be merged in unified-cuda and unified-vulkan tags.
I needed to split the root and rootless build because rootless was using the newly pushed image as a base and on first run, there is no such image since it is pushed only after.
I have extracted and updated the CUDA version to 13 and removed unsupported CUDA_ARCHITECTURES and added 121 for DGX Spark.
New separate cache images are created for architectures. This could cause slow building on first run. Maybe to consider not adding ARCH to cache image name if AMD64 to keep the compatibility.
Old: CACHE_REF="ghcr.io/mostlygeek/llama-swap:unified-${BACKEND}-cache"
New: CACHE_REF="ghcr.io/${GITHUB_REPOSITORY}:unified-${BACKEND}-cache-${ARCH}"
I needed to explicitly set -march in ik_llama arm build. I set it to the arm9.2 which is optimized for DGX Spark. This can be reconsidered.
if [ "$ARCH" = "arm64" ]; then
CMAKE_FLAGS+=(-DGGML_ARCH_FLAGS="-march=armv9.2-a+dotprod+fp16")
fi
I modified the install-sd.sh to include the frontend.
While working on this, I have noticed that if both CUDA and Vulkan are checked for build in one Workflow, then Vulkan amd64 hangs for some reason. When initiated separately, then jobs are running fine. I guess it has something to do with GitHub runners.