SPIKE: CI Optimization: Share built Docker images between GitHub Actions jobs #2347

@AlexSkrypnyk

Description

Goal

Build Docker images once in a dedicated build job, then reuse them in downstream parallel jobs (lint, test) without rebuilding. No container registry (ghcr.io, Docker Hub) involved.

Research findings

Option 1: docker save + upload-artifact@v4

Save image as tar, upload as artifact, download + docker load in next job.

```yaml
# Build job
- uses: docker/build-push-action@v7
  with:
    tags: myimage:latest
    outputs: type=docker,dest=${{ runner.temp }}/myimage.tar
- uses: actions/upload-artifact@v4
  with:
    name: myimage
    path: ${{ runner.temp }}/myimage.tar
    compression-level: 0  # critical: 98% faster for binary data

# Downstream job
- uses: actions/download-artifact@v4
  with:
    name: myimage
    path: ${{ runner.temp }}
- run: docker load --input ${{ runner.temp }}/myimage.tar
```

  • Transfer time for 1-3GB images: 2-8 min
  • Uses artifact storage quota (separate from cache quota)
  • Byte-identical image transfer
  • Simple but slow for large images

Option 2: actions/cache with tar

Same as option 1 but uses actions/cache instead of artifacts.

  • 10GB per-repo cache limit (tight with multiple images)
  • Slightly faster restore than artifacts in some cases
  • Cache persists across workflow runs (bonus)
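A minimal sketch of this variant, reusing the image name from option 1 (the cache key shown here is illustrative):

```yaml
# Build job: save the image as a tar and cache it
- uses: docker/build-push-action@v7
  with:
    tags: myimage:latest
    outputs: type=docker,dest=${{ runner.temp }}/myimage.tar
- uses: actions/cache/save@v4
  with:
    path: ${{ runner.temp }}/myimage.tar
    key: image-${{ github.sha }}

# Downstream job: restore the tar and load it
- uses: actions/cache/restore@v4
  with:
    path: ${{ runner.temp }}/myimage.tar
    key: image-${{ github.sha }}
    fail-on-cache-miss: true  # downstream job is useless without the image
- run: docker load --input ${{ runner.temp }}/myimage.tar
```

Keying on `github.sha` scopes the cache to a single commit; the cross-run persistence bonus only kicks in with a broader key plus `restore-keys`.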

Option 3: Buildx --cache-from type=gha (TO TEST)

BuildKit stores individual layers in GHA cache. Downstream jobs "rebuild" with full cache hit.

```yaml
# Build job
- uses: docker/setup-buildx-action@v3
- uses: docker/build-push-action@v7
  with:
    tags: myimage:latest
    load: true
    cache-from: type=gha,scope=myimage
    cache-to: type=gha,scope=myimage,mode=max

# Downstream job (near-instant rebuild from cache)
- uses: docker/setup-buildx-action@v3
- uses: docker/build-push-action@v7
  with:
    tags: myimage:latest
    load: true
    cache-from: type=gha,scope=myimage
```

  • Rebuild completes in seconds when fully cached
  • mode=max caches ALL intermediate layers (essential for full cache hits)
  • scope parameter required when building multiple images
  • Subject to 10GB per-repo cache limit
  • Not byte-identical (it's a rebuild), but functionally equivalent
  • Requires Buildx >= v0.21.0 and BuildKit >= v0.20.0

Option 4: Local cache exporter + actions/cache (TO TEST)

BuildKit writes layers to a local directory, cached via actions/cache. Similar to option 3 but uses actions/cache transport instead of GHA cache API.

```yaml
# Build job
- uses: actions/cache@v4
  with:
    path: ${{ runner.temp }}/.buildx-cache
    key: ${{ runner.os }}-buildx-${{ github.sha }}
    restore-keys: ${{ runner.os }}-buildx-
- uses: docker/build-push-action@v7
  with:
    tags: myimage:latest
    load: true
    cache-from: type=local,src=${{ runner.temp }}/.buildx-cache
    cache-to: type=local,dest=${{ runner.temp }}/.buildx-cache-new,mode=max
# Prevent cache from growing unbounded
- run: rm -rf ${{ runner.temp }}/.buildx-cache && mv ${{ runner.temp }}/.buildx-cache-new ${{ runner.temp }}/.buildx-cache

# Downstream job
- uses: actions/cache/restore@v4
  with:
    path: ${{ runner.temp }}/.buildx-cache
    key: ${{ runner.os }}-buildx-${{ github.sha }}
- uses: docker/build-push-action@v7
  with:
    tags: myimage:latest
    load: true
    cache-from: type=local,src=${{ runner.temp }}/.buildx-cache
```

  • Avoids GHA cache API throttling (goes through actions/cache instead)
  • Same 10GB cache limit
  • Cache directory can be large
  • Still requires a rebuild step in downstream jobs (though cached)

Option 5: docker save + zstd streaming

Apache Pulsar's approach: a custom CLI tool (gh-actions-artifact-client) streams the docker save output through zstd compression directly to artifact storage.

  • Achieves 180+ MiB/s upload, 100+ MiB/s download
  • Avoids writing full tar to disk
  • Requires third-party tool (lhotari/gh-actions-artifact-client)
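A rough sketch of the streaming pipeline. The setup-action reference and command names are assumptions based on the tool's README and should be verified before use; `myimage` is the image from the earlier options:

```yaml
# Build job: pipe the saved image through zstd into artifact storage
- uses: lhotari/gh-actions-artifact-client@v2
- run: docker save myimage:latest | zstd -T0 | gh-actions-artifact-client.js upload myimage.tar.zst

# Downstream job: stream it back and load without writing the full tar to disk
- uses: lhotari/gh-actions-artifact-client@v2
- run: gh-actions-artifact-client.js download myimage.tar.zst | zstd -d -T0 | docker load
```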

Comparison

| Approach | Transfer (1-3GB) | Complexity | Cache limit | Byte-identical |
|---|---|---|---|---|
| Artifact upload/download | 2-8 min | Low | Artifact quota | Yes |
| actions/cache with tar | 1-5 min | Low | 10GB/repo | Yes |
| Buildx type=gha | 10-60s rebuild | Low | 10GB/repo | No (rebuild) |
| Local cache exporter | 1-4 min | Medium | 10GB/repo | No (rebuild) |
| zstd streaming | 30s-2min | Medium | Artifact quota | Yes |

Next steps

  1. Test option 3 (Buildx GHA cache) with the project's CLI image to measure actual rebuild time with full cache hit
  2. Test option 4 (local cache exporter) as fallback if option 3 hits API throttling or cache size issues
  3. If viable, restructure workflow: build → lint + test (downstream jobs rebuild from cache)
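If option 3 tests out, the restructured workflow could look roughly like this (job names, image name, and the lint/test commands are placeholders):

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/build-push-action@v7
        with:
          tags: myimage:latest
          load: true
          cache-from: type=gha,scope=myimage
          cache-to: type=gha,scope=myimage,mode=max

  lint:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      # Near-instant "rebuild" served entirely from the GHA layer cache
      - uses: docker/build-push-action@v7
        with:
          tags: myimage:latest
          load: true
          cache-from: type=gha,scope=myimage
      - run: docker run --rm myimage:latest lint  # placeholder lint command

  test:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      # Same rebuild-from-cache pattern as lint
      - uses: docker/build-push-action@v7
        with:
          tags: myimage:latest
          load: true
          cache-from: type=gha,scope=myimage
      - run: docker run --rm myimage:latest test  # placeholder test command
```

`needs: build` ensures the cache is fully populated before the parallel jobs start; without it, downstream jobs could race the build and fall back to a cold rebuild.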
