docker: Add optional FFmpeg support for whisper.cpp#785
Conversation
|
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
WalkthroughThis PR introduces FFmpeg as an optional dependency in the unified Docker whisper.cpp build pipeline. The install script processes a new ChangesFFmpeg-optional whisper.cpp build configuration
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
docker/unified/Dockerfile (1)
63-64: 💤 Low valueConsider the cache segregation strategy.
The cache mount ID includes
${WHISPER_FFMPEG}, which creates separate build caches for FFmpeg-enabled and FFmpeg-disabled builds. While this ensures clean builds, it means users toggling betweenWHISPER_FFMPEG=yesandWHISPER_FFMPEG=nowill not benefit from shared cache and will trigger full rebuilds.If toggling is expected to be rare, this is fine. If users frequently switch, consider whether a shared cache with proper CMake reconfiguration would be more efficient.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docker/unified/Dockerfile` around lines 63 - 64, The current Dockerfile RUN uses a cache mount id "whisper-${BACKEND}-${WHISPER_FFMPEG}" which splits the whisper build cache by the WHISPER_FFMPEG flag causing separate caches and full rebuilds when toggling; to fix, change the cache id to omit ${WHISPER_FFMPEG} (e.g., "whisper-${BACKEND}") so FFmpeg-enabled/disabled builds share the same cache, and ensure the CMake/configure step in the whisper build (look for the whisper.cpp build invocation) properly reconfigures on flag changes to avoid stale artifacts; if you must keep separate caches, add a short comment documenting the trade-off and rationale.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docker/unified/Dockerfile`:
- Around line 140-144: The Vulkan runtime Dockerfile RUN command currently
installs the build-time tool "pkg-config" alongside runtime libraries; remove
"pkg-config" from the apt-get install list in the Vulkan runtime stage (the RUN
that installs libgomp1, libvulkan1, mesa-vulkan-drivers, python3, curl,
ca-certificates, pkg-config, libavcodec60, libavformat60, libavutil58,
libswresample4) so only runtime packages remain (leave libgomp1, libvulkan1,
mesa-vulkan-drivers, python3, curl, ca-certificates, and the libav* packages)
mirroring the CUDA runtime stage which correctly omits pkg-config.
---
Nitpick comments:
In `@docker/unified/Dockerfile`:
- Around line 63-64: The current Dockerfile RUN uses a cache mount id
"whisper-${BACKEND}-${WHISPER_FFMPEG}" which splits the whisper build cache by
the WHISPER_FFMPEG flag causing separate caches and full rebuilds when toggling;
to fix, change the cache id to omit ${WHISPER_FFMPEG} (e.g.,
"whisper-${BACKEND}") so FFmpeg-enabled/disabled builds share the same cache,
and ensure the CMake/configure step in the whisper build (look for the
whisper.cpp build invocation) properly reconfigures on flag changes to avoid
stale artifacts; if you must keep separate caches, add a short comment
documenting the trade-off and rationale.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 4cfed62a-893f-4382-8ee4-89821949c763
📒 Files selected for processing (3)
docker/unified/Dockerfiledocker/unified/build-image.shdocker/unified/install-whisper.sh
| RUN apt-get update && apt-get install -y --no-install-recommends \ | ||
| libgomp1 libvulkan1 mesa-vulkan-drivers \ | ||
| python3 curl ca-certificates \ | ||
| pkg-config libavcodec60 libavformat60 libavutil58 libswresample4 \ | ||
| && rm -rf /var/lib/apt/lists/* |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win
Remove pkg-config from the Vulkan runtime stage.
pkg-config is a build-time tool used to detect FFmpeg during compilation (as noted in the PR description). It should not be included in the runtime image. The CUDA runtime stage (line 126) correctly omits it.
🔧 Proposed fix
RUN apt-get update && apt-get install -y --no-install-recommends \
libgomp1 libvulkan1 mesa-vulkan-drivers \
python3 curl ca-certificates \
- pkg-config libavcodec60 libavformat60 libavutil58 libswresample4 \
+ libavcodec60 libavformat60 libavutil58 libswresample4 \
&& rm -rf /var/lib/apt/lists/*📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| RUN apt-get update && apt-get install -y --no-install-recommends \ | |
| libgomp1 libvulkan1 mesa-vulkan-drivers \ | |
| python3 curl ca-certificates \ | |
| pkg-config libavcodec60 libavformat60 libavutil58 libswresample4 \ | |
| && rm -rf /var/lib/apt/lists/* | |
| RUN apt-get update && apt-get install -y --no-install-recommends \ | |
| libgomp1 libvulkan1 mesa-vulkan-drivers \ | |
| python3 curl ca-certificates \ | |
| libavcodec60 libavformat60 libavutil58 libswresample4 \ | |
| && rm -rf /var/lib/apt/lists/* |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docker/unified/Dockerfile` around lines 140 - 144, The Vulkan runtime
Dockerfile RUN command currently installs the build-time tool "pkg-config"
alongside runtime libraries; remove "pkg-config" from the apt-get install list
in the Vulkan runtime stage (the RUN that installs libgomp1, libvulkan1,
mesa-vulkan-drivers, python3, curl, ca-certificates, pkg-config, libavcodec60,
libavformat60, libavutil58, libswresample4) so only runtime packages remain
(leave libgomp1, libvulkan1, mesa-vulkan-drivers, python3, curl,
ca-certificates, and the libav* packages) mirroring the CUDA runtime stage which
correctly omits pkg-config.
…tlygeek#783) - Add FFmpeg dev libraries at build time and runtime libraries in CUDA/Vulkan runtime stages. - Add `WHISPER_FFMPEG` build arg (default: yes) to build whisper.cpp with `-DWHISPER_FFMPEG=ON`.
45acd73 to
79a2dba
Compare
|
Have removed pkg-config from vulkan runtime stage and also removed "whisper-${BACKEND}" from the cache bind mount id as I couldn't find anything that mentions this id option in the docker docs to confirm what it does, so left it alone since build args invalidate the cache anyways. Build ran succesfully with: docker buildx build \
--no-cache \
--build-arg BACKEND=cuda \
--build-arg CMAKE_CUDA_ARCHITECTURES=86 \
--build-arg RUN_UID=10001 \
--build-arg WHISPER_FFMPEG=yes \
--build-arg LLAMA_COMMIT_HASH=86591c7536ced84cea49ee5b3e24096632a33c5a \
--build-arg WHISPER_COMMIT_HASH=99613cb720b65036237d44b52f753b51f75c2797 \
--build-arg SD_COMMIT_HASH=1f9ee88e09c258053fa59d5e05e23dfb10fa0b13 \
--build-arg IK_LLAMA_COMMIT_HASH=6b9de3dbaa21ae95ea80638e5ee836795cc48c93 \
--build-arg LS_VERSION=ccfba0df28ab4d5dcace9056469cbc929249696b \
-t llama-swap:unified-cuda-rootless \
./docker/unifiedAnd tested with this config: macros:
"WHISPER": >
/usr/local/bin/whisper-server
--host 127.0.0.1
--port ${PORT}
--flash-attn
--inference-path ""
--request-path /v1/audio/transcriptions
#....
models:
"stt-large-turbo": &stt_def
macros:
"MODEL": /models/stt/ggml-large-v3-turbo-q8_0.bin
"BACKEND": WHISPER
"modality": audio->text
"family": whisper
"variant": large-v3-turbo
"params": 0.8B
"quant": Q8_0
"dl_url": https://huggingface.co/ggerganov/whisper.cpp/blob/main/ggml-large-v3-turbo-q8_0.bin
cmd:
${backend}
--port ${PORT}
-m ${MODEL}
proxy: "http://127.0.0.1:${PORT}"
description: "audio transcriptions"
checkEndpoint: /v1/audio/transcriptions/
metadata: &tts_meta
"model-family": ${family}
"model-params": ${params}
"backend": ${backend}
"modality": ${modality}
"capabilities":
audio:
"transcription": true
"translation": false # translation data excluded from finetune
"language-detection": true
Just let me know if you'd like any other changes |
Summary
Solves #783
Adds optional FFmpeg support for whisper.cpp builds as described here
Currently ffmpeg needs to be installed manually and it is launched as a sub-process when using it with whisper-server and
--convert. Setting WHISPER_FFMPEG=yes at build time enables linking the ffmpeg libraries into the whisper binary at compile time.The result is that the binary can natively decode audio formats like Opus, AAC, MP3, etc. without spawning any external process.
Changes:
-DWHISPER_FFMPEG=ONto CMAKE_FLAGS in install-whisper.sh if enabled.WHISPER_FFMPEG Behavior:
Libraries are not installed conditionally, but image size difference is minimal:
Build: pkg-config libavcodec-dev libavformat-dev libavutil-dev libswresample-dev ~31.5 MB
Runtime: libavcodec60 libavformat60 libavutil58 libswresample4 ~20.4 MB
pkg-config not mentioned in whisper.cpp readme, but added to build image as whisper uses it to detect them.libswresample-dev/libswresample4` also not in readme, but is checked for at build time so added for consistency: src