Skip to content

docker: Add optional FFmpeg support for whisper.cpp#785

Open
ts200-G wants to merge 1 commit into
mostlygeek:mainfrom
ts200-G:whisper-ffmpeg-support
Open

docker: Add optional FFmpeg support for whisper.cpp#785
ts200-G wants to merge 1 commit into
mostlygeek:mainfrom
ts200-G:whisper-ffmpeg-support

Conversation

@ts200-G

@ts200-G ts200-G commented May 23, 2026

Copy link
Copy Markdown

Summary

Solves #783
Adds optional FFmpeg support for whisper.cpp builds as described here

Currently ffmpeg needs to be installed manually and it is launched as a sub-process when using it with whisper-server and --convert. Setting WHISPER_FFMPEG=yes at build time enables linking the ffmpeg libraries into the whisper binary at compile time.

The result is that the binary can natively decode audio formats like Opus, AAC, MP3, etc. without spawning any external process.

Changes:

  • Adds the build arg/env toggle
  • Installs required FFmpeg dev/runtime libraries in build and runtime images.
  • Adds -DWHISPER_FFMPEG=ON to CMAKE_FLAGS in install-whisper.sh if enabled.
  • Separates Docker build cache for FFmpeg/non-FFmpeg builds

WHISPER_FFMPEG Behavior:

  • yes / true => enabled (default)
  • no / false => disabled

Libraries are not installed conditionally, but image size difference is minimal:
Build: pkg-config libavcodec-dev libavformat-dev libavutil-dev libswresample-dev ~31.5 MB
Runtime: libavcodec60 libavformat60 libavutil58 libswresample4 ~20.4 MB

pkg-config not mentioned in whisper.cpp readme, but added to build image as whisper uses it to detect them. libswresample-dev/libswresample4` also not in readme, but is checked for at build time so added for consistency: src

@coderabbitai

coderabbitai Bot commented May 23, 2026

Copy link
Copy Markdown

Review Change Stack

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 3c2bc46d-f179-4255-aad9-9ee777169709

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review

Walkthrough

This PR introduces FFmpeg as an optional dependency in the unified Docker whisper.cpp build pipeline. The install script processes a new WHISPER_FFMPEG configuration flag to conditionally enable FFmpeg support during CMake compilation, while the Dockerfile provides the necessary build-time and runtime libraries across CUDA and Vulkan stages, and the build script exposes the feature to users as a configurable knob.

Changes

FFmpeg-optional whisper.cpp build configuration

Layer / File(s) Summary
Install script FFmpeg environment handling
docker/unified/install-whisper.sh
install-whisper.sh accepts WHISPER_FFMPEG (defaulting to yes), normalizes it to lowercase, sets an internal WHISPER_FFMPEG_ENABLED flag, and conditionally appends -DWHISPER_FFMPEG=ON to CMake configuration when enabled.
Dockerfile builder and runtime stage updates
docker/unified/Dockerfile
Both CUDA and Vulkan builder stages now install FFmpeg development libraries (pkg-config, libavcodec, libavformat, libavutil, libswresample). The whisper-build stage accepts WHISPER_FFMPEG and WHISPER_COMMIT_HASH build args, passes them to install-whisper.sh, and adds BuildKit cache mounts for ccache and the whisper build directory. Both CUDA and Vulkan runtime stages now install FFmpeg runtime libraries (libavcodec60, libavformat60, libavutil58, libswresample4).
Build script configuration and documentation
docker/unified/build-image.sh
Introduces WHISPER_FFMPEG environment variable (defaulting to yes), documents it in the --help output, and passes it to docker buildx build as an additional build argument.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

  • mostlygeek/llama-swap#597: Initial unified CUDA container infrastructure that this PR extends with FFmpeg support and improved build caching.
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the main change: adding optional FFmpeg support for whisper.cpp builds in Docker.
Description check ✅ Passed The description thoroughly explains the purpose, implementation details, and rationale for the changes, directly relating to the modified Docker files.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@mostlygeek

Copy link
Copy Markdown
Owner

@coderabbitai review

@coderabbitai

coderabbitai Bot commented May 24, 2026

Copy link
Copy Markdown
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
docker/unified/Dockerfile (1)

63-64: 💤 Low value

Consider the cache segregation strategy.

The cache mount ID includes ${WHISPER_FFMPEG}, which creates separate build caches for FFmpeg-enabled and FFmpeg-disabled builds. While this ensures clean builds, it means users toggling between WHISPER_FFMPEG=yes and WHISPER_FFMPEG=no will not benefit from shared cache and will trigger full rebuilds.

If toggling is expected to be rare, this is fine. If users frequently switch, consider whether a shared cache with proper CMake reconfiguration would be more efficient.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docker/unified/Dockerfile` around lines 63 - 64, The current Dockerfile RUN
uses a cache mount id "whisper-${BACKEND}-${WHISPER_FFMPEG}" which splits the
whisper build cache by the WHISPER_FFMPEG flag causing separate caches and full
rebuilds when toggling; to fix, change the cache id to omit ${WHISPER_FFMPEG}
(e.g., "whisper-${BACKEND}") so FFmpeg-enabled/disabled builds share the same
cache, and ensure the CMake/configure step in the whisper build (look for the
whisper.cpp build invocation) properly reconfigures on flag changes to avoid
stale artifacts; if you must keep separate caches, add a short comment
documenting the trade-off and rationale.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docker/unified/Dockerfile`:
- Around line 140-144: The Vulkan runtime Dockerfile RUN command currently
installs the build-time tool "pkg-config" alongside runtime libraries; remove
"pkg-config" from the apt-get install list in the Vulkan runtime stage (the RUN
that installs libgomp1, libvulkan1, mesa-vulkan-drivers, python3, curl,
ca-certificates, pkg-config, libavcodec60, libavformat60, libavutil58,
libswresample4) so only runtime packages remain (leave libgomp1, libvulkan1,
mesa-vulkan-drivers, python3, curl, ca-certificates, and the libav* packages)
mirroring the CUDA runtime stage which correctly omits pkg-config.

---

Nitpick comments:
In `@docker/unified/Dockerfile`:
- Around line 63-64: The current Dockerfile RUN uses a cache mount id
"whisper-${BACKEND}-${WHISPER_FFMPEG}" which splits the whisper build cache by
the WHISPER_FFMPEG flag causing separate caches and full rebuilds when toggling;
to fix, change the cache id to omit ${WHISPER_FFMPEG} (e.g.,
"whisper-${BACKEND}") so FFmpeg-enabled/disabled builds share the same cache,
and ensure the CMake/configure step in the whisper build (look for the
whisper.cpp build invocation) properly reconfigures on flag changes to avoid
stale artifacts; if you must keep separate caches, add a short comment
documenting the trade-off and rationale.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 4cfed62a-893f-4382-8ee4-89821949c763

📥 Commits

Reviewing files that changed from the base of the PR and between 636b53e and 45acd73.

📒 Files selected for processing (3)
  • docker/unified/Dockerfile
  • docker/unified/build-image.sh
  • docker/unified/install-whisper.sh

Comment thread docker/unified/Dockerfile
Comment on lines 140 to 144
RUN apt-get update && apt-get install -y --no-install-recommends \
libgomp1 libvulkan1 mesa-vulkan-drivers \
python3 curl ca-certificates \
pkg-config libavcodec60 libavformat60 libavutil58 libswresample4 \
&& rm -rf /var/lib/apt/lists/*

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Remove pkg-config from the Vulkan runtime stage.

pkg-config is a build-time tool used to detect FFmpeg during compilation (as noted in the PR description). It should not be included in the runtime image. The CUDA runtime stage (line 126) correctly omits it.

🔧 Proposed fix
 RUN apt-get update && apt-get install -y --no-install-recommends \
     libgomp1 libvulkan1 mesa-vulkan-drivers \
     python3 curl ca-certificates \
-    pkg-config libavcodec60 libavformat60 libavutil58 libswresample4 \
+    libavcodec60 libavformat60 libavutil58 libswresample4 \
     && rm -rf /var/lib/apt/lists/*
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
RUN apt-get update && apt-get install -y --no-install-recommends \
libgomp1 libvulkan1 mesa-vulkan-drivers \
python3 curl ca-certificates \
pkg-config libavcodec60 libavformat60 libavutil58 libswresample4 \
&& rm -rf /var/lib/apt/lists/*
RUN apt-get update && apt-get install -y --no-install-recommends \
libgomp1 libvulkan1 mesa-vulkan-drivers \
python3 curl ca-certificates \
libavcodec60 libavformat60 libavutil58 libswresample4 \
&& rm -rf /var/lib/apt/lists/*
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docker/unified/Dockerfile` around lines 140 - 144, The Vulkan runtime
Dockerfile RUN command currently installs the build-time tool "pkg-config"
alongside runtime libraries; remove "pkg-config" from the apt-get install list
in the Vulkan runtime stage (the RUN that installs libgomp1, libvulkan1,
mesa-vulkan-drivers, python3, curl, ca-certificates, pkg-config, libavcodec60,
libavformat60, libavutil58, libswresample4) so only runtime packages remain
(leave libgomp1, libvulkan1, mesa-vulkan-drivers, python3, curl,
ca-certificates, and the libav* packages) mirroring the CUDA runtime stage which
correctly omits pkg-config.

…tlygeek#783)

- Add FFmpeg dev libraries at build time and runtime libraries in CUDA/Vulkan runtime stages.
- Add `WHISPER_FFMPEG` build arg (default: yes) to build whisper.cpp with `-DWHISPER_FFMPEG=ON`.
@ts200-G ts200-G force-pushed the whisper-ffmpeg-support branch from 45acd73 to 79a2dba Compare June 5, 2026 15:00
@ts200-G

ts200-G commented Jun 5, 2026

Copy link
Copy Markdown
Author

Have removed pkg-config from vulkan runtime stage and also removed "whisper-${BACKEND}" from the cache bind mount id as I couldn't find anything that mentions this id option in the docker docs to confirm what it does, so left it alone since build args invalidate the cache anyways.

Build ran succesfully with:

docker buildx build \                                    
  --no-cache \
  --build-arg BACKEND=cuda \
  --build-arg CMAKE_CUDA_ARCHITECTURES=86 \
  --build-arg RUN_UID=10001 \
  --build-arg WHISPER_FFMPEG=yes \
  --build-arg LLAMA_COMMIT_HASH=86591c7536ced84cea49ee5b3e24096632a33c5a \
  --build-arg WHISPER_COMMIT_HASH=99613cb720b65036237d44b52f753b51f75c2797 \
  --build-arg SD_COMMIT_HASH=1f9ee88e09c258053fa59d5e05e23dfb10fa0b13 \
  --build-arg IK_LLAMA_COMMIT_HASH=6b9de3dbaa21ae95ea80638e5ee836795cc48c93 \
  --build-arg LS_VERSION=ccfba0df28ab4d5dcace9056469cbc929249696b \
  -t llama-swap:unified-cuda-rootless \
  ./docker/unified

And tested with this config:

macros:
  "WHISPER": >
    /usr/local/bin/whisper-server
    --host 127.0.0.1
    --port ${PORT}
    --flash-attn
    --inference-path ""
    --request-path /v1/audio/transcriptions
#....

models:
  "stt-large-turbo": &stt_def
    macros:
      "MODEL": /models/stt/ggml-large-v3-turbo-q8_0.bin
      "BACKEND": WHISPER
      "modality": audio->text
      "family": whisper
      "variant": large-v3-turbo
      "params": 0.8B
      "quant": Q8_0
      "dl_url": https://huggingface.co/ggerganov/whisper.cpp/blob/main/ggml-large-v3-turbo-q8_0.bin
    cmd:
      ${backend}
      --port ${PORT}
      -m ${MODEL}
    proxy: "http://127.0.0.1:${PORT}"
    description: "audio transcriptions"
    checkEndpoint: /v1/audio/transcriptions/
    metadata: &tts_meta
      "model-family": ${family}
      "model-params": ${params}
      "backend": ${backend}
      "modality": ${modality}
      "capabilities":
        audio:
          "transcription": true
          "translation": false # translation data excluded from finetune
          "language-detection": true
  • Correctly converts audio without using --convert and no additional ffmpeg install/sub-process.

Just let me know if you'd like any other changes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants