proxy: parallelize startup preload within process groups#20
Conversation
…k#523) ## Summary - Add `--provenance=false` to docker build commands in `build-container.sh` - BuildKit attestation manifests are stored as untagged images in GHCR, and the `delete-untagged-containers` cleanup job deletes them, breaking the manifest list and causing `manifest unknown` errors on pull - ref: actions/delete-package-versions#162
…ostlygeek#520) add incremental rendering to Playground > Chat
- Keep Playground component mounted when navigating away, preserving streaming/generating state - Add animated gradient effect on Playground nav link when activity is in progress
…ek#516) add a real time counter of pending (inflight) requests to the UI.
Pause auto-scroll when the user scrolls up to review logs, and resume when they scroll back to the bottom. - add `userScrolledUp` state variable - add `handleScroll` to detect scroll position with 40px threshold - guard the auto-scroll effect with `!userScrolledUp` closes mostlygeek#529
Add setParamsByID filter that applies different request parameters based
on the requested model ID, enabling per-alias behaviour for a single
loaded model.
- add SetParamsByID field to Filters struct and SanitizedSetParamsByID
method
- substitute ${MODEL_ID} and other macros in setParamsByID keys and
values
- validate no unknown macros remain in keys or values after substitution
- apply setParamsByID in proxyInferenceHandler after setParams (can
override it)
- update config-schema.json with setParamsByID definition
- update UI to show aliases and make them selectable in the Playground
closes mostlygeek#534
Add a new Rerank tab to the playground that lets users test /v1/rerank endpoints. Supports a visual table editor and a JSON editor mode that stay in sync when toggling between them. - add rerankApi.ts with typed wrapper for /v1/rerank - add RerankInterface.svelte with query input, sortable document table, color-coded scores, auto-add row, cancel/clear, and token usage - add rerankLoading store to playgroundActivity derived store - register Rerank tab in Playground.svelte Updates mostlygeek#481
Updated README to enhance the description of the web interface and added details about features like token metrics, request inspection, model management, and real-time log streaming.
Updated description to clarify compatibility and usage.
Added new images for model loading and real-time log streaming sections.
Add `cuda13` as a supported build architecture, targeting the `ghcr.io/ggml-org/llama.cpp:server-cuda13` upstream base image. The `server-cuda13` image ships with CUDA 13 libraries, providing improved performance on recent NVIDIA hardware compared to the existing `server-cuda` (CUDA 12) image. Users with newer GPUs (e.g., RTX 50-series) benefit from reduced model load latency and higher token throughput. - Add `cuda13` to the allowed architectures list in `docker/build-container.sh` - Add `cuda13` to the CI matrix in `.github/workflows/containers.yml` so the container is built and pushed automatically
Add a copy-to-clipboard button that appears on hover for each code block rendered in the chat interface assistant messages. - Svelte action `codeBlockCopy` injects a button into every `<pre>` element - MutationObserver reattaches buttons as streaming content arrives - Button shows a check icon for 2 seconds after a successful copy - Uses clipboard API with execCommand fallback for non-secure contexts - CSS hides button by default and reveals it on pre:hover https://claude.ai/code/session_01PTA5ao5YQuFAS6a9juLeZW --------- Co-authored-by: Claude <noreply@anthropic.com>
Add a new configuration parameter globalTTL that all models will inherit. The default value is 0 which matches the currently functionality to never automatically unload a model. The model.ttl's default has changed to -1, which means use the global TTL value. Any model.ttl >=0 is now value with 0 meaning never unload. This allows a model to override a globalTTL > 0 and be configured to never unload. Fixes mostlygeek#459 Closes mostlygeek#512
…#578) Extend macro substitution to the name and description fields of ModelConfig, matching the behavior already present for cmd, proxy, checkEndpoint, and filters. - substitute global/model macros (including MODEL_ID) in name and description - substitute PORT macro in name and description when allocated - validate no unknown macros remain in name and description after substitution - add tests for macro substitution, MODEL_ID, and unknown macro error
Use natural sorting for model names. Previously the model list was sorted lexicographically, which resulted in unintuitive ordering when numbers were included in the name. Example: Before qwen3.5:2B qwen3.5:35B-3AB qwen3.5:9B After qwen3.5:2B qwen3.5:9B qwen3.5:35B-3AB This change sorts models using natural order so numeric parts are compared numerically.
Upgrade vite and related dependencies to take advantage of Vite 8's improved build times via Rolldown and Oxc. - vite: ^6.3.5 → ^8.0.0 - @sveltejs/vite-plugin-svelte: ^5.0.3 → ^7.0.0 - svelte: ^5.19.0 → ^5.46.4 - vite-plugin-compression2: ^2.4.0 → ^2.5.1 - vitest: ^4.0.18 → ^4.1.0 --------- Co-authored-by: Claude <noreply@anthropic.com>
properly parse anthropic compatible usage data from streaming responses. closes: mostlygeek#577
Add proxy routes for stable-diffusion.cpp's /sdapi/v1/txt2img, /sdapi/v1/img2img, and /sdapi/v1/loras endpoints. POST endpoints use proxyInferenceHandler (model in JSON body), GET /loras uses proxyGETModelHandler (model in query param). Update the image playground with a dual-mode UI supporting both OpenAI and SDAPI backends. In SDAPI mode, loras are fetched first to prime the server-side cache, and all txt2img parameters are exposed (negative prompt, steps, cfg_scale, seed, batch_size, clip_skip, sampler, scheduler, lora selection with multipliers). - Add 3 sdapi route registrations in proxymanager.go - Add sdApi.ts client with generateSdImage and fetchSdLoras - Add SDAPI types (SdApiTxt2ImgRequest, SdApiResponse, etc.) - Add /sdapi to vite dev proxy config - Add backend tests for sdapi routing - Support batch image display in gallery grid https://claude.ai/code/session_0186MGX6NXdHVBTv2KH45fqn --------- Co-authored-by: Claude <noreply@anthropic.com>
…ostlygeek#597) Add Docker build scripts for a unified cuda docker container with llama-server, stable-diffusion.cpp, whisper.cpp.
- set up a GHA scheduled job to build the container nightly - enabling pushing a llama-swap:unified and a llama-swap:unified-Y-M-D image to ghcr.io - tidy up Dockerfile to use a non-root user and llama-swap as an entry point
Update docker/unified scripts to support building both cuda and vulkan unified images.
multiple fixes to vulkan build: - use ubuntu 26.04 to be compatible with AMD 395+ (Strix halo) hardware - add home directory in container - fix stable-diffusion install to actually enable vulkan --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…tlygeek#625) Expose CMAKE_CUDA_ARCHITECTURES as a Docker build ARG so users can customize CUDA architectures via --build-arg without editing the Dockerfile. - convert hardcoded ENV to ARG with default, feeding into ENV - replace silent fallback defaults (:-) in scripts with :? guards to fail fast if the env var is missing - add usage example to Dockerfile header Follow up to: mostlygeek#624 https://claude.ai/code/session_01EWiUe7jNABX7Uz95dUGJqK Co-authored-by: Claude <noreply@anthropic.com>
…ek#627) Extend the existing config-schema workflow to also validate config.example.yaml against config-schema.json using check-jsonschema. - add config.example.yaml to PR and push path triggers - install check-jsonschema via pip - run validation of config.example.yaml against schema https://claude.ai/code/session_01Y1oqwE6mwNs9UTJgZRgXtG --------- Co-authored-by: Claude <noreply@anthropic.com>
…k#619) Add configurable HTTP timeout settings to both models and peers to support installations that requires longer timeouts than the current hardcoded defaults. Closes mostlygeek#618
Keep request duration from being underreported when upstream timings only cover part of the full request lifecycle. - compare wall-clock and upstream timing durations - keep token and throughput values from timings - add regression coverage for underreported timings fixes mostlygeek#602
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
WalkthroughStartup preload groups models by process-group and starts each group concurrently; added a thread-safe InflightCounter and Gin Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
⚔️ Resolve merge conflicts
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Models in the same group now preload concurrently via goroutines instead of sequentially. Groups are still loaded in order to respect exclusivity. This significantly reduces startup time when a group has multiple models (e.g., an LLM + STT + TTS that share GPU resources). Each model's container starts and health-checks simultaneously. - Group preload models by ProcessGroup before starting - Use sync.WaitGroup to parallelize within each group - Call swapProcessGroup once per group (handles exclusivity) - Log group completion with model count Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@proxy/proxymanager.go`:
- Around line 227-230: The current preload logic ignores errors from
http.NewRequest and processGroup.ProxyRequest and always emits
ModelPreloadedEvent{Success: true}; fix by checking and handling both errors:
capture the error returned by http.NewRequest and by
processGroup.ProxyRequest(mid, &DiscardWriter{}, req), only emit
event.Emit(ModelPreloadedEvent{ModelName: mid, Success: true}) and call
proxyLogger.Infof when ProxyRequest returns nil, otherwise emit
ModelPreloadedEvent{ModelName: mid, Success: false} (include the error) and log
proxyLogger.Errorf with the error; ensure you reference ProxyRequest,
DiscardWriter, ModelPreloadedEvent, event.Emit and
proxyLogger.Infof/proxyLogger.Errorf when making the change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 0b7a1a88-3621-42f7-9618-df4413efb758
📒 Files selected for processing (1)
proxy/proxymanager.go
| req, _ := http.NewRequest("GET", "/", nil) | ||
| processGroup.ProxyRequest(mid, &DiscardWriter{}, req) | ||
| event.Emit(ModelPreloadedEvent{ModelName: mid, Success: true}) | ||
| proxyLogger.Infof("Preloaded model: %s", mid) |
There was a problem hiding this comment.
Handle ProxyRequest errors before emitting success events.
Line 228 ignores the preload error, but Line 229 always emits Success: true. This can incorrectly report failed preloads as successful.
Suggested fix
- req, _ := http.NewRequest("GET", "/", nil)
- processGroup.ProxyRequest(mid, &DiscardWriter{}, req)
- event.Emit(ModelPreloadedEvent{ModelName: mid, Success: true})
- proxyLogger.Infof("Preloaded model: %s", mid)
+ req, _ := http.NewRequest("GET", "/", nil)
+ if err := processGroup.ProxyRequest(mid, &DiscardWriter{}, req); err != nil {
+ event.Emit(ModelPreloadedEvent{ModelName: mid, Success: false})
+ proxyLogger.Errorf("Failed to preload model %s: %v", mid, err)
+ return
+ }
+ event.Emit(ModelPreloadedEvent{ModelName: mid, Success: true})
+ proxyLogger.Infof("Preloaded model: %s", mid)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| req, _ := http.NewRequest("GET", "/", nil) | |
| processGroup.ProxyRequest(mid, &DiscardWriter{}, req) | |
| event.Emit(ModelPreloadedEvent{ModelName: mid, Success: true}) | |
| proxyLogger.Infof("Preloaded model: %s", mid) | |
| req, _ := http.NewRequest("GET", "/", nil) | |
| if err := processGroup.ProxyRequest(mid, &DiscardWriter{}, req); err != nil { | |
| event.Emit(ModelPreloadedEvent{ModelName: mid, Success: false}) | |
| proxyLogger.Errorf("Failed to preload model %s: %v", mid, err) | |
| return | |
| } | |
| event.Emit(ModelPreloadedEvent{ModelName: mid, Success: true}) | |
| proxyLogger.Infof("Preloaded model: %s", mid) |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@proxy/proxymanager.go` around lines 227 - 230, The current preload logic
ignores errors from http.NewRequest and processGroup.ProxyRequest and always
emits ModelPreloadedEvent{Success: true}; fix by checking and handling both
errors: capture the error returned by http.NewRequest and by
processGroup.ProxyRequest(mid, &DiscardWriter{}, req), only emit
event.Emit(ModelPreloadedEvent{ModelName: mid, Success: true}) and call
proxyLogger.Infof when ProxyRequest returns nil, otherwise emit
ModelPreloadedEvent{ModelName: mid, Success: false} (include the error) and log
proxyLogger.Errorf with the error; ensure you reference ProxyRequest,
DiscardWriter, ModelPreloadedEvent, event.Emit and
proxyLogger.Infof/proxyLogger.Errorf when making the change.
78dbba7 to
0353306
Compare
There was a problem hiding this comment.
Actionable comments posted: 13
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
.github/workflows/go-ci.yml (2)
49-67:⚠️ Potential issue | 🟡 MinorCache key mismatch prevents cache reuse.
Line 54 uses
cmd/simple-responder/simple-responder.gofor the restore step, but line 67 still usesmisc/simple-responder/simple-responder.gofor the save step. Since the file exists atcmd/simple-responder/simple-responder.go, the cache key will always differ between restore and save, preventing any cache hit. Update line 67 to hashcmd/simple-responder/simple-responder.go.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/go-ci.yml around lines 49 - 67, The cache keys for the restore and save steps differ: the "restore-simple-responder" step hashes 'cmd/simple-responder/simple-responder.go' but the "save-simple-responder" step hashes 'misc/simple-responder/simple-responder.go', preventing cache reuse; update the key in the save step (save-simple-responder) to use the same hashFiles path 'cmd/simple-responder/simple-responder.go' so both steps compute identical cache keys.
49-58:⚠️ Potential issue | 🟠 MajorFix path filter to include simple-responder changes, and align cache keys.
This job now depends on
cmd/simple-responder/simple-responder.go(via the cache restore key), but the workflow excludes allcmd/**paths. A PR that only modifies this file will skip CI even though the job needs to rebuild it for tests.Additionally, the restore cache key hashes
cmd/simple-responder/simple-responder.gowhile the save step hashesmisc/simple-responder/simple-responder.go—these will never match, so the cache is non-functional and always rebuilds.Fix: Either whitelist
cmd/simple-responder/simple-responder.goin the path filters and align the cache keys to use the same source file, or if this helper truly needs no testing, document why the cache key should not follow the source.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/go-ci.yml around lines 49 - 58, The workflow excludes cmd/** changes but the restore/save cache keys and this job depend on cmd/simple-responder/simple-responder.go, causing CI to skip and cache misses; update the workflow path filter to include cmd/simple-responder/** (or specifically cmd/simple-responder/simple-responder.go) so PRs touching that file trigger the job, and make the cache key paths consistent by changing either the restore key or the save key so both reference the same source file (ensure the cache step identified as restore-simple-responder and the corresponding save step use identical hashFiles('cmd/simple-responder/simple-responder.go') or both point to misc/... if you intend that path).
♻️ Duplicate comments (1)
proxy/proxymanager.go (1)
265-268:⚠️ Potential issue | 🔴 CriticalH1: Emit preload success only after the preload call succeeds.
Line 265/266 still drops both request-construction and
ProxyRequesterrors, while Line 267 always emitsSuccess: true. A failed startup will still look preloaded to observers.Suggested fix
- req, _ := http.NewRequest("GET", "/", nil) - processGroup.ProxyRequest(mid, &DiscardWriter{}, req) - event.Emit(ModelPreloadedEvent{ModelName: mid, Success: true}) - proxyLogger.Infof("Preloaded model: %s", mid) + req, err := http.NewRequest("GET", "/", nil) + if err != nil { + event.Emit(ModelPreloadedEvent{ModelName: mid, Success: false}) + proxyLogger.Errorf("Failed to preload model %s: %v", mid, err) + return + } + if err := processGroup.ProxyRequest(mid, &DiscardWriter{}, req); err != nil { + event.Emit(ModelPreloadedEvent{ModelName: mid, Success: false}) + proxyLogger.Errorf("Failed to preload model %s: %v", mid, err) + return + } + event.Emit(ModelPreloadedEvent{ModelName: mid, Success: true}) + proxyLogger.Infof("Preloaded model: %s", mid)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@proxy/proxymanager.go` around lines 265 - 268, The code currently ignores errors from http.NewRequest and processGroup.ProxyRequest and always emits ModelPreloadedEvent{ModelName: mid, Success: true}; change this so you capture and check both errors: call http.NewRequest and handle its error, call processGroup.ProxyRequest and capture any returned error, only emit ModelPreloadedEvent with Success: true if both succeed; on failure, emit Success: false and log the error via proxyLogger.Errorf (include mid and error), and avoid treating the preload as successful when ProxyRequest or request creation fails.
🧹 Nitpick comments (7)
AGENTS.md (1)
30-40: Consider adding a language identifier to the fenced code block.The commit message example code block lacks a language specifier (e.g.,
textormarkdown), which triggers a markdown lint warning.Proposed fix
-``` +```text proxy: add new feature🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@AGENTS.md` around lines 30 - 40, The fenced code block showing the example commit message in AGENTS.md is missing a language identifier and triggers a markdown lint warning; update that block (the triple-backtick block containing "proxy: add new feature" and the list) to include a language tag such as text or markdown (e.g., change ``` to ```text) so the linter recognizes the block type and the warning is resolved.proxy/peerproxy.go (1)
40-54: Usehttp.DefaultTransport.Clone()to start the transport configuration.A fresh
http.Transportdrops every stdlib default you did not remember to re-specify. Cloning the default transport first preserves Go's baseline behavior (Proxy, ForceAttemptHTTP2, MaxIdleConns, etc.) and lets you override only the peer-specific timeouts and MaxIdleConnsPerHost.Suggested refactor
- peerTransport := &http.Transport{ - Proxy: http.ProxyFromEnvironment, - DialContext: (&net.Dialer{ - Timeout: time.Duration(peer.Timeouts.Connect) * time.Second, - KeepAlive: 30 * time.Second, - }).DialContext, - TLSHandshakeTimeout: time.Duration(peer.Timeouts.TLSHandshake) * time.Second, - ResponseHeaderTimeout: time.Duration(peer.Timeouts.ResponseHeader) * time.Second, - ExpectContinueTimeout: time.Duration(peer.Timeouts.ExpectContinue) * time.Second, - ForceAttemptHTTP2: true, - MaxIdleConns: 100, - MaxIdleConnsPerHost: 10, - IdleConnTimeout: time.Duration(peer.Timeouts.IdleConn) * time.Second, - } + peerTransport := http.DefaultTransport.(*http.Transport).Clone() + peerTransport.DialContext = (&net.Dialer{ + Timeout: time.Duration(peer.Timeouts.Connect) * time.Second, + KeepAlive: 30 * time.Second, + }).DialContext + peerTransport.TLSHandshakeTimeout = time.Duration(peer.Timeouts.TLSHandshake) * time.Second + peerTransport.ResponseHeaderTimeout = time.Duration(peer.Timeouts.ResponseHeader) * time.Second + peerTransport.ExpectContinueTimeout = time.Duration(peer.Timeouts.ExpectContinue) * time.Second + peerTransport.MaxIdleConnsPerHost = 10 + peerTransport.IdleConnTimeout = time.Duration(peer.Timeouts.IdleConn) * time.Second🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@proxy/peerproxy.go` around lines 40 - 54, The transport creation currently constructs a fresh http.Transport and loses stdlib defaults; change it to start from http.DefaultTransport.Clone() (type-assert to *http.Transport) and then override only peer-specific fields: set DialContext using a net.Dialer with Timeout and KeepAlive, and set TLSHandshakeTimeout, ResponseHeaderTimeout, ExpectContinueTimeout, IdleConnTimeout, and MaxIdleConnsPerHost as before; update the variable peerTransport in peerproxy.go to use the cloned transport so defaults like Proxy, ForceAttemptHTTP2, and MaxIdleConns are preserved.proxy/process_test.go (1)
588-588: Variable shadows module-leveldebugLogger.The local
debugLoggershadows the package-level variable. Consider using a different name liketestLoggerfor clarity.🔧 Suggested fix
- debugLogger := NewLogMonitorWriter(io.Discard) - process := NewProcess("test-model", 30, modelConfig, debugLogger, debugLogger) + testLogger := NewLogMonitorWriter(io.Discard) + process := NewProcess("test-model", 30, modelConfig, testLogger, testLogger)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@proxy/process_test.go` at line 588, The test creates a local variable named debugLogger that shadows the package-level debugLogger; rename the local variable (e.g., testLogger) where NewLogMonitorWriter(io.Discard) is called in process_test.go to avoid shadowing and potential confusion and update any subsequent references in the test to use the new name (reference: debugLogger, NewLogMonitorWriter).ui-svelte/src/stores/api.ts (1)
65-65: Minor formatting inconsistency.There's an extra space before the colon in the
localeCompareoptions.🔧 Suggested fix
- return (a.name + a.id).localeCompare(b.name + b.id, undefined, { numeric : true} ); + return (a.name + a.id).localeCompare(b.name + b.id, undefined, { numeric: true });🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ui-svelte/src/stores/api.ts` at line 65, Fix the spacing inside the options object passed to String.prototype.localeCompare: change the options from "{ numeric : true}" to "{ numeric: true }" in the expression "(a.name + a.id).localeCompare(b.name + b.id, undefined, { numeric : true} )" so the colon has no leading space and add the closing-space before the final parenthesis for consistent formatting.proxy/config/model_config.go (1)
8-10: Consider idiomatic Go constant naming.Go convention uses
MixedCapsormixedCapsfor constants rather thanSCREAMING_SNAKE_CASE. Consider renaming toModelConfigDefaultTTLordefaultModelTTL.💡 Suggested change
const ( - MODEL_CONFIG_DEFAULT_TTL = -1 + ModelConfigDefaultTTL = -1 )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@proxy/config/model_config.go` around lines 8 - 10, The constant MODEL_CONFIG_DEFAULT_TTL uses SCREAMING_SNAKE_CASE; rename it to follow Go idioms (e.g., ModelConfigDefaultTTL for exported or defaultModelTTL for package-private) and update all references, including any usages in functions, tests, and imports in this package so compilation succeeds; ensure documentation/comments and any JSON/tags that relied on the old name are adjusted accordingly.ui-svelte/src/components/playground/ChatInterface.svelte (1)
60-75: Consider persisting on component unmount.The throttled save works well during normal operation, but if the user navigates away before the 2-second timer fires, the latest messages may be lost. Consider adding an
onDestroyor ensuring a final save.💡 Suggested improvement
import { onDestroy } from "svelte"; // ... existing code ... onDestroy(() => { try { localStorage.setItem("playground-messages", JSON.stringify(messages)); } catch {} });🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ui-svelte/src/components/playground/ChatInterface.svelte` around lines 60 - 75, The throttled persistence currently using $effect, lastSaveTime, and the save() closure can miss the latest messages if the component unmounts before the timer fires; add a final flush in an onDestroy handler that calls the same save logic (writing JSON.stringify(messages) to "playground-messages" via localStorage) to guarantee persistence on teardown, or extract the save functionality into a shared function and invoke it both from the throttled timer and from onDestroy to avoid duplication.docker/unified/install-llama-swap.sh (1)
30-39: Consider GitHub API rate limiting.The unauthenticated GitHub API request on line 32-33 is subject to rate limiting (60 requests/hour per IP). In CI environments with shared IPs, this could cause intermittent failures.
Consider using
$GITHUB_TOKENif available:💡 Suggested improvement
if [ "$VERSION" = "latest" ]; then echo "=== Resolving latest llama-swap release ===" - VERSION=$(curl -fsSL "https://api.github.com/repos/${REPO}/releases/latest" \ + AUTH_HEADER="" + if [ -n "$GITHUB_TOKEN" ]; then + AUTH_HEADER="Authorization: token $GITHUB_TOKEN" + fi + VERSION=$(curl -fsSL ${AUTH_HEADER:+-H "$AUTH_HEADER"} "https://api.github.com/repos/${REPO}/releases/latest" \ | grep '"tag_name"' | head -1 | cut -d'"' -f4 | sed 's/^v//')🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docker/unified/install-llama-swap.sh` around lines 30 - 39, The unauthenticated curl call that resolves the latest release (using VERSION, REPO and the curl pipeline) can hit GitHub rate limits; update the script to use a GITHUB_TOKEN when present by adding an Authorization header to the curl request (e.g., set header only if $GITHUB_TOKEN is non-empty) and fall back to the unauthenticated request otherwise, keeping the same parsing of tag_name and the existing error handling for an empty VERSION.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/unified-docker.yml:
- Around line 52-61: The matrix serialization currently produces an empty-string
entry when the backends array is empty; modify the logic around backends and
matrix so that if backends is empty you write an explicit empty JSON array
("[]") to GITHUB_OUTPUT instead of serializing an empty element. Locate the
backends array and the matrix generation (variables backends and matrix, and the
echo "matrix=$matrix" >> $GITHUB_OUTPUT) and add a conditional: if backends has
no elements set matrix='[]' else build the JSON with printf | jq as before, then
echo matrix to GITHUB_OUTPUT.
In `@AGENTS.md`:
- Line 8: Update the typo "svelt5" to "svelte5" in the documentation string (the
phrase "typescript, vite and svelt5 for UI (located in ui/)") so the UI tech
list correctly reads "typescript, vite and svelte5 for UI"; ensure only the
spelling is changed and no other text is modified.
In `@docker/build-image.sh`:
- Around line 117-128: The get_default_branch function currently probes branches
with git ls-remote --heads which returns success even if the ref is absent;
update get_default_branch to perform reliable detection by either adding
--exit-code to each git ls-remote --heads probe (e.g., git ls-remote --exit-code
--heads ...) so the if/elif properly fails when the branch is missing, or
alternatively query the remote HEAD symref (git ls-remote --symref <repo_url>
HEAD) and parse the target to determine the default branch; ensure the function
still falls back predictably (used by get_latest_commit) when detection fails.
In `@docker/unified/build-image.sh`:
- Around line 177-179: The script is calling resolve_ref("${LLAMA_SWAP_REPO}"
"${LS_VERSION}") which rejects user-friendly selectors like "latest" or bare
release numbers, causing help/examples to fail; update build-image.sh so that
when LS_VERSION is set the script either (a) skips resolving and passes the raw
LS_VERSION through to the build (i.e. set LS_HASH="${LS_VERSION}" when
resolve_ref would reject it) or (b) validate/normalize LS_VERSION up front and
restrict the help text to only document real git refs; modify the logic around
LS_VERSION/LS_HASH and the resolve_ref invocation (and update any help text) so
LLAMA_SWAP_REPO uses a usable ref for docker buildx instead of aborting for
"latest"/numeric values.
In `@docker/unified/install-llama.sh`:
- Around line 32-45: The script currently only handles BACKEND="cuda" and
"vulkan" so add a fail-fast else branch after the existing if/elif to error out
for any other BACKEND; update install-llama.sh to check the BACKEND variable and
call echo >&2 with a clear message like "Unsupported BACKEND: $BACKEND" and exit
1 if it's not "cuda" or "vulkan", and apply the same guard change to
docker/unified/install-sd.sh; reference the BACKEND variable and the CMAKE_FLAGS
assignment to locate where to insert the else { echo ...; exit 1 } branch.
In `@Makefile`:
- Line 54: The Makefile currently has the linux/arm64 build command commented
out (the GOOS=linux GOARCH=arm64 go build ... line), which prevents generation
of the linux-arm64 artifact expected by .goreleaser.yaml and scripts/install.sh;
restore that artifact by uncommenting or re-adding the ARM64 build command that
uses GOOS=linux GOARCH=arm64 with the same -ldflags (-X main.commit=${GIT_HASH}
-X main.version=local_${GIT_HASH} -X main.date=${BUILD_DATE}) and output pattern
($(BUILD_DIR)/$(APP_NAME)-linux-arm64), and ensure the linux build target still
invokes both the amd64 and arm64 build commands so `make linux` produces both
artifacts.
In `@proxy/config/config.go`:
- Around line 315-329: The rebuild of modelConfig.Filters.SetParamsByID can
silently overwrite entries when two different source keys expand to the same
newKey; after computing newKey (using macroSlug/macroStr and
substituteMacroInValue with entry.Name/entry.Value) but before assigning into
newSetParamsByID, check whether newKey already exists in newSetParamsByID and if
so return a Config error referencing modelId and the colliding key(s); do this
check inside the loop that constructs newSetParamsByID so collisions are
rejected early rather than overwriting.
In `@proxy/process.go`:
- Around line 99-115: checkHealthEndpoint() currently uses hard-coded timeouts
causing startup health checks to fail despite transport using config.Timeouts;
modify checkHealthEndpoint() to create its http.Client with a Transport that
applies the same timeout settings from config.Timeouts (same DialContext
timeout, TLSHandshakeTimeout, ResponseHeaderTimeout, ExpectContinueTimeout,
IdleConnTimeout, etc.) or reuse the existing transport instance used for
reverseProxy (transport) so the health probe uses identical connection and
response timeouts as the proxied requests.
In `@ui-svelte/src/App.svelte`:
- Around line 51-56: The Playground subtree is still mounted because
class:hidden only hides it; change to conditionally render Playground so it only
mounts on the root route: replace the div wrapping Playground with an {`#if`
$currentRoute === "/"} block that renders <Playground /> (and remove the
class:hidden usage), while leaving <Router {routes}
on:routeLoaded={handleRouteLoaded} /> mounted as-is so routeLoaded still fires.
In `@ui-svelte/src/components/playground/AudioInterface.svelte`:
- Around line 26-28: The effect setting playgroundStores.audioTranscribing from
isTranscribing doesn't clear the flag on unmount, so add an unmount cleanup that
resets playgroundStores.audioTranscribing to false: either return a cleanup
function from the existing $effect that calls
playgroundStores.audioTranscribing.set(false) or add an onDestroy(() =>
playgroundStores.audioTranscribing.set(false)) next to the $effect to ensure the
global flag is cleared when the component is destroyed.
In `@ui-svelte/src/components/playground/ImageInterface.svelte`:
- Around line 33-38: When the user changes models, clear or re-key the SDAPI
LoRA state so LoRAs from the previous model aren’t sent to the new one: reset
availableLoras, selectedLoras, lorasLoaded and loraError whenever the selected
model (or its reactive store) changes; alternatively scope these stores by model
id (e.g., a Map keyed by model name). Update the reactive blocks or watchers
that load LoRAs (the code manipulating availableLoras, selectedLoras,
isLoadingLoras, lorasLoaded, loraError — and the similar logic repeated in the
other occurrences) to perform this reset before starting the new load and ensure
selectedLoras is always tied to the current model.
In `@ui-svelte/src/lib/markdown.ts`:
- Around line 239-245: normalizeLatexDelimiters currently does blind string
replacements and mutates fenced code/inline code; replace it with a remark-style
transform that walks the Markdown AST and performs the same regex substitutions
only on plain text nodes (e.g., visit/text nodes) while explicitly skipping
nodes of type "code", "inlineCode", "math", and "inlineMath". Update the
function (or add a new remark plugin referenced by normalizeLatexDelimiters) to
accept/transform an MDAST root, apply the two replacements on node.value for
eligible text nodes, and return the root so markdown parsing preserves literal
code blocks and existing math nodes.
- Around line 120-122: The current logic that treats a blank trimmed line (where
trimmed === "") as a safe split and sets lastCompleteBoundary = i can break
CommonMark containers (e.g., lists/blockquote that continue across blank lines)
and causes suffix-only rendering to diverge from renderMarkdown(text); change
the split strategy so that before accepting a boundary you check whether the
previous top-level block can legally continue (or whether the blank line is
still part of the same container) and if it can, fall back to re-rendering the
full complete section instead of only the suffix, or implement caching of parsed
top-level AST nodes and use those to determine safe split points; update the
code paths that compute lastCompleteBoundary, complete, and suffix and the code
that calls renderMarkdown to use this container-aware decision (or the cached
top-level AST) so streamed DOM matches full renderMarkdown output.
---
Outside diff comments:
In @.github/workflows/go-ci.yml:
- Around line 49-67: The cache keys for the restore and save steps differ: the
"restore-simple-responder" step hashes
'cmd/simple-responder/simple-responder.go' but the "save-simple-responder" step
hashes 'misc/simple-responder/simple-responder.go', preventing cache reuse;
update the key in the save step (save-simple-responder) to use the same
hashFiles path 'cmd/simple-responder/simple-responder.go' so both steps compute
identical cache keys.
- Around line 49-58: The workflow excludes cmd/** changes but the restore/save
cache keys and this job depend on cmd/simple-responder/simple-responder.go,
causing CI to skip and cache misses; update the workflow path filter to include
cmd/simple-responder/** (or specifically
cmd/simple-responder/simple-responder.go) so PRs touching that file trigger the
job, and make the cache key paths consistent by changing either the restore key
or the save key so both reference the same source file (ensure the cache step
identified as restore-simple-responder and the corresponding save step use
identical hashFiles('cmd/simple-responder/simple-responder.go') or both point to
misc/... if you intend that path).
---
Duplicate comments:
In `@proxy/proxymanager.go`:
- Around line 265-268: The code currently ignores errors from http.NewRequest
and processGroup.ProxyRequest and always emits ModelPreloadedEvent{ModelName:
mid, Success: true}; change this so you capture and check both errors: call
http.NewRequest and handle its error, call processGroup.ProxyRequest and capture
any returned error, only emit ModelPreloadedEvent with Success: true if both
succeed; on failure, emit Success: false and log the error via
proxyLogger.Errorf (include mid and error), and avoid treating the preload as
successful when ProxyRequest or request creation fails.
---
Nitpick comments:
In `@AGENTS.md`:
- Around line 30-40: The fenced code block showing the example commit message in
AGENTS.md is missing a language identifier and triggers a markdown lint warning;
update that block (the triple-backtick block containing "proxy: add new feature"
and the list) to include a language tag such as text or markdown (e.g., change
``` to ```text) so the linter recognizes the block type and the warning is
resolved.
In `@docker/unified/install-llama-swap.sh`:
- Around line 30-39: The unauthenticated curl call that resolves the latest
release (using VERSION, REPO and the curl pipeline) can hit GitHub rate limits;
update the script to use a GITHUB_TOKEN when present by adding an Authorization
header to the curl request (e.g., set header only if $GITHUB_TOKEN is non-empty)
and fall back to the unauthenticated request otherwise, keeping the same parsing
of tag_name and the existing error handling for an empty VERSION.
In `@proxy/config/model_config.go`:
- Around line 8-10: The constant MODEL_CONFIG_DEFAULT_TTL uses
SCREAMING_SNAKE_CASE; rename it to follow Go idioms (e.g., ModelConfigDefaultTTL
for exported or defaultModelTTL for package-private) and update all references,
including any usages in functions, tests, and imports in this package so
compilation succeeds; ensure documentation/comments and any JSON/tags that
relied on the old name are adjusted accordingly.
In `@proxy/peerproxy.go`:
- Around line 40-54: The transport creation currently constructs a fresh
http.Transport and loses stdlib defaults; change it to start from
http.DefaultTransport.Clone() (type-assert to *http.Transport) and then override
only peer-specific fields: set DialContext using a net.Dialer with Timeout and
KeepAlive, and set TLSHandshakeTimeout, ResponseHeaderTimeout,
ExpectContinueTimeout, IdleConnTimeout, and MaxIdleConnsPerHost as before;
update the variable peerTransport in peerproxy.go to use the cloned transport so
defaults like Proxy, ForceAttemptHTTP2, and MaxIdleConns are preserved.
In `@proxy/process_test.go`:
- Line 588: The test creates a local variable named debugLogger that shadows the
package-level debugLogger; rename the local variable (e.g., testLogger) where
NewLogMonitorWriter(io.Discard) is called in process_test.go to avoid shadowing
and potential confusion and update any subsequent references in the test to use
the new name (reference: debugLogger, NewLogMonitorWriter).
In `@ui-svelte/src/components/playground/ChatInterface.svelte`:
- Around line 60-75: The throttled persistence currently using $effect,
lastSaveTime, and the save() closure can miss the latest messages if the
component unmounts before the timer fires; add a final flush in an onDestroy
handler that calls the same save logic (writing JSON.stringify(messages) to
"playground-messages" via localStorage) to guarantee persistence on teardown, or
extract the save functionality into a shared function and invoke it both from
the throttled timer and from onDestroy to avoid duplication.
In `@ui-svelte/src/stores/api.ts`:
- Line 65: Fix the spacing inside the options object passed to
String.prototype.localeCompare: change the options from "{ numeric : true}" to
"{ numeric: true }" in the expression "(a.name + a.id).localeCompare(b.name +
b.id, undefined, { numeric : true} )" so the colon has no leading space and add
the closing-space before the final parenthesis for consistent formatting.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: b3e4dd0d-49c4-46ac-8ac3-97ca408fb785
⛔ Files ignored due to path filters (1)
ui-svelte/package-lock.jsonis excluded by!**/package-lock.json
📒 Files selected for processing (70)
.coderabbit.yaml.github/workflows/config-schema.yml.github/workflows/containers.yml.github/workflows/go-ci.yml.github/workflows/unified-docker.ymlAGENTS.mdCLAUDE.mdMakefileREADME.mdcmd/simple-responder/simple-responder.goconfig-schema.jsonconfig.example.yamldocker/build-container.shdocker/build-image.shdocker/unified/Dockerfiledocker/unified/README.mddocker/unified/build-image.shdocker/unified/config.example.yamldocker/unified/install-ik-llama.shdocker/unified/install-llama-swap.shdocker/unified/install-llama.shdocker/unified/install-sd.shdocker/unified/install-whisper.shdocs/configuration.mdgo.modproxy/config/config.goproxy/config/config_posix_test.goproxy/config/config_test.goproxy/config/config_windows_test.goproxy/config/filters.goproxy/config/filters_test.goproxy/config/macro_in_macro_test.goproxy/config/model_config.goproxy/config/model_config_test.goproxy/config/peer.goproxy/events.goproxy/metrics_monitor.goproxy/metrics_monitor_test.goproxy/peerproxy.goproxy/peerproxy_test.goproxy/process.goproxy/process_test.goproxy/proxymanager.goproxy/proxymanager_api.goproxy/proxymanager_test.goui-svelte/.npmrcui-svelte/package.jsonui-svelte/src/App.svelteui-svelte/src/components/Header.svelteui-svelte/src/components/LogPanel.svelteui-svelte/src/components/ModelsPanel.svelteui-svelte/src/components/StatsPanel.svelteui-svelte/src/components/playground/AudioInterface.svelteui-svelte/src/components/playground/ChatInterface.svelteui-svelte/src/components/playground/ChatMessage.svelteui-svelte/src/components/playground/ImageInterface.svelteui-svelte/src/components/playground/ModelSelector.svelteui-svelte/src/components/playground/RerankInterface.svelteui-svelte/src/components/playground/SpeechInterface.svelteui-svelte/src/lib/markdown.test.tsui-svelte/src/lib/markdown.tsui-svelte/src/lib/rerankApi.tsui-svelte/src/lib/sdApi.tsui-svelte/src/lib/types.tsui-svelte/src/routes/Playground.svelteui-svelte/src/routes/PlaygroundStub.svelteui-svelte/src/stores/api.tsui-svelte/src/stores/playgroundActivity.tsui-svelte/src/stores/route.tsui-svelte/vite.config.ts
✅ Files skipped from review due to trivial changes (11)
- .coderabbit.yaml
- CLAUDE.md
- docker/unified/README.md
- go.mod
- ui-svelte/.npmrc
- ui-svelte/src/routes/PlaygroundStub.svelte
- ui-svelte/src/components/ModelsPanel.svelte
- docker/unified/config.example.yaml
- ui-svelte/vite.config.ts
- ui-svelte/package.json
- proxy/events.go
| backends=() | ||
| # schedule uses defaults (build both); workflow_dispatch respects inputs | ||
| if [[ "${{ github.event_name }}" == "schedule" ]] || [[ "${{ inputs.build_cuda }}" == "true" ]]; then | ||
| backends+=("cuda") | ||
| fi | ||
| if [[ "${{ github.event_name }}" == "schedule" ]] || [[ "${{ inputs.build_vulkan }}" == "true" ]]; then | ||
| backends+=("vulkan") | ||
| fi | ||
| matrix=$(printf '%s\n' "${backends[@]}" | jq -R . | jq -sc .) | ||
| echo "matrix=$matrix" >> $GITHUB_OUTPUT |
There was a problem hiding this comment.
H2: Return [] when no backend is selected.
If both inputs are false, this still serializes one empty entry, so the build job runs once with matrix.backend == "" instead of being skipped.
Suggested fix
- matrix=$(printf '%s\n' "${backends[@]}" | jq -R . | jq -sc .)
- echo "matrix=$matrix" >> $GITHUB_OUTPUT
+ if [[ ${`#backends`[@]} -eq 0 ]]; then
+ matrix='[]'
+ else
+ matrix=$(printf '%s\n' "${backends[@]}" | jq -R . | jq -sc .)
+ fi
+ echo "matrix=$matrix" >> "$GITHUB_OUTPUT"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| backends=() | |
| # schedule uses defaults (build both); workflow_dispatch respects inputs | |
| if [[ "${{ github.event_name }}" == "schedule" ]] || [[ "${{ inputs.build_cuda }}" == "true" ]]; then | |
| backends+=("cuda") | |
| fi | |
| if [[ "${{ github.event_name }}" == "schedule" ]] || [[ "${{ inputs.build_vulkan }}" == "true" ]]; then | |
| backends+=("vulkan") | |
| fi | |
| matrix=$(printf '%s\n' "${backends[@]}" | jq -R . | jq -sc .) | |
| echo "matrix=$matrix" >> $GITHUB_OUTPUT | |
| backends=() | |
| # schedule uses defaults (build both); workflow_dispatch respects inputs | |
| if [[ "${{ github.event_name }}" == "schedule" ]] || [[ "${{ inputs.build_cuda }}" == "true" ]]; then | |
| backends+=("cuda") | |
| fi | |
| if [[ "${{ github.event_name }}" == "schedule" ]] || [[ "${{ inputs.build_vulkan }}" == "true" ]]; then | |
| backends+=("vulkan") | |
| fi | |
| if [[ ${`#backends`[@]} -eq 0 ]]; then | |
| matrix='[]' | |
| else | |
| matrix=$(printf '%s\n' "${backends[@]}" | jq -R . | jq -sc .) | |
| fi | |
| echo "matrix=$matrix" >> "$GITHUB_OUTPUT" |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/workflows/unified-docker.yml around lines 52 - 61, The matrix
serialization currently produces an empty-string entry when the backends array
is empty; modify the logic around backends and matrix so that if backends is
empty you write an explicit empty JSON array ("[]") to GITHUB_OUTPUT instead of
serializing an empty element. Locate the backends array and the matrix
generation (variables backends and matrix, and the echo "matrix=$matrix" >>
$GITHUB_OUTPUT) and add a conditional: if backends has no elements set
matrix='[]' else build the JSON with printf | jq as before, then echo matrix to
GITHUB_OUTPUT.
| ## Tech stack | ||
|
|
||
| - golang | ||
| - typescript, vite and svelt5 for UI (located in ui/) |
There was a problem hiding this comment.
Typo: "svelt5" should be "svelte5".
Proposed fix
-- typescript, vite and svelt5 for UI (located in ui/)
+- typescript, vite and svelte5 for UI (located in ui/)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| - typescript, vite and svelt5 for UI (located in ui/) | |
| - typescript, vite and svelte5 for UI (located in ui/) |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@AGENTS.md` at line 8, Update the typo "svelt5" to "svelte5" in the
documentation string (the phrase "typescript, vite and svelt5 for UI (located in
ui/)") so the UI tech list correctly reads "typescript, vite and svelte5 for
UI"; ensure only the spelling is changed and no other text is modified.
| get_default_branch() { | ||
| local repo_url="$1" | ||
|
|
||
| # Check for master first | ||
| if git ls-remote --heads "${repo_url}" master &>/dev/null; then | ||
| echo "master" | ||
| elif git ls-remote --heads "${repo_url}" main &>/dev/null; then | ||
| echo "main" | ||
| else | ||
| echo "master" # fallback | ||
| fi | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
repo="https://github.com/ggml-org/whisper.cpp.git"
set +e
git ls-remote --heads "$repo" branch-that-does-not-exist >/dev/null 2>&1
plain_status=$?
git ls-remote --exit-code --heads "$repo" branch-that-does-not-exist >/dev/null 2>&1
strict_status=$?
set -e
echo "plain ls-remote status: $plain_status"
echo "--exit-code status: $strict_status"
echo -n "remote HEAD branch: "
git ls-remote --symref "$repo" HEAD \
| awk '/^ref:/ {sub("refs/heads/","",$2); print $2; exit}'Repository: napmany/llmsnap
Length of output: 133
Fix get_default_branch to discover non-master defaults.
Line 121 uses git ls-remote --heads ... master without --exit-code. This command returns success (exit code 0) even when the branch does not exist, so the if-condition always succeeds for any reachable repository and the elif chain never reaches the main probe. Repositories with main as the default branch will then produce an empty hash in get_latest_commit and fail the build.
Use --exit-code to make the probes fail when a branch is absent, or read the remote HEAD symref directly.
Suggested fix
function get_default_branch() {
local repo_url="$1"
-
- # Check for master first
- if git ls-remote --heads "${repo_url}" master &>/dev/null; then
- echo "master"
- elif git ls-remote --heads "${repo_url}" main &>/dev/null; then
- echo "main"
- else
- echo "master" # fallback
- fi
+ local branch
+
+ branch="$(
+ git ls-remote --symref "${repo_url}" HEAD 2>/dev/null \
+ | awk '/^ref:/ {sub("refs/heads/","",$2); print $2; exit}'
+ )"
+
+ if [[ -n "${branch}" ]]; then
+ echo "${branch}"
+ elif git ls-remote --exit-code --heads "${repo_url}" main >/dev/null 2>&1; then
+ echo "main"
+ elif git ls-remote --exit-code --heads "${repo_url}" master >/dev/null 2>&1; then
+ echo "master"
+ else
+ echo "ERROR: Could not determine default branch for ${repo_url}" >&2
+ return 1
+ fi
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| get_default_branch() { | |
| local repo_url="$1" | |
| # Check for master first | |
| if git ls-remote --heads "${repo_url}" master &>/dev/null; then | |
| echo "master" | |
| elif git ls-remote --heads "${repo_url}" main &>/dev/null; then | |
| echo "main" | |
| else | |
| echo "master" # fallback | |
| fi | |
| } | |
| get_default_branch() { | |
| local repo_url="$1" | |
| local branch | |
| branch="$( | |
| git ls-remote --symref "${repo_url}" HEAD 2>/dev/null \ | |
| | awk '/^ref:/ {sub("refs/heads/","",$2); print $2; exit}' | |
| )" | |
| if [[ -n "${branch}" ]]; then | |
| echo "${branch}" | |
| elif git ls-remote --exit-code --heads "${repo_url}" main >/dev/null 2>&1; then | |
| echo "main" | |
| elif git ls-remote --exit-code --heads "${repo_url}" master >/dev/null 2>&1; then | |
| echo "master" | |
| else | |
| echo "ERROR: Could not determine default branch for ${repo_url}" >&2 | |
| return 1 | |
| fi | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docker/build-image.sh` around lines 117 - 128, The get_default_branch
function currently probes branches with git ls-remote --heads which returns
success even if the ref is absent; update get_default_branch to perform reliable
detection by either adding --exit-code to each git ls-remote --heads probe
(e.g., git ls-remote --exit-code --heads ...) so the if/elif properly fails when
the branch is missing, or alternatively query the remote HEAD symref (git
ls-remote --symref <repo_url> HEAD) and parse the target to determine the
default branch; ensure the function still falls back predictably (used by
get_latest_commit) when detection fails.
| if [[ -n "${LS_VERSION:-}" ]]; then | ||
| LS_HASH=$(resolve_ref "${LLAMA_SWAP_REPO}" "${LS_VERSION}") || exit 1 | ||
| echo "llama-swap: ${LS_VERSION} -> ${LS_HASH}" |
There was a problem hiding this comment.
H3: LS_VERSION no longer accepts the values this script advertises.
resolve_ref() rejects latest and bare release numbers like 170, so the documented examples/help text now terminate before docker buildx starts. Either pass llama-swap release selectors through unchanged, or tighten the docs/input descriptions so this path only accepts real git refs.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docker/unified/build-image.sh` around lines 177 - 179, The script is calling
resolve_ref("${LLAMA_SWAP_REPO}" "${LS_VERSION}") which rejects user-friendly
selectors like "latest" or bare release numbers, causing help/examples to fail;
update build-image.sh so that when LS_VERSION is set the script either (a) skips
resolving and passes the raw LS_VERSION through to the build (i.e. set
LS_HASH="${LS_VERSION}" when resolve_ref would reject it) or (b)
validate/normalize LS_VERSION up front and restrict the help text to only
document real git refs; modify the logic around LS_VERSION/LS_HASH and the
resolve_ref invocation (and update any help text) so LLAMA_SWAP_REPO uses a
usable ref for docker buildx instead of aborting for "latest"/numeric values.
| if [ "$BACKEND" = "cuda" ]; then | ||
| CMAKE_FLAGS+=( | ||
| -DGGML_CUDA=ON | ||
| -DGGML_VULKAN=OFF | ||
| "-DCMAKE_CUDA_ARCHITECTURES=${CMAKE_CUDA_ARCHITECTURES:?CMAKE_CUDA_ARCHITECTURES must be set}" | ||
| "-DCMAKE_CUDA_FLAGS=-allow-unsupported-compiler" | ||
| "-DCMAKE_EXE_LINKER_FLAGS=-Wl,-rpath-link,/usr/local/cuda/lib64/stubs -lcuda" | ||
| ) | ||
| elif [ "$BACKEND" = "vulkan" ]; then | ||
| CMAKE_FLAGS+=( | ||
| -DGGML_CUDA=OFF | ||
| -DGGML_VULKAN=ON | ||
| ) | ||
| fi |
There was a problem hiding this comment.
M2 (Medium): Fail fast on unsupported BACKEND values.
Any value other than cuda or vulkan falls through with only the common CMake flags, which can leave you with a “successful” image that lacks the requested accelerator support. Add an explicit failure here; the same guard is needed in docker/unified/install-sd.sh.
Suggested fix
if [ "$BACKEND" = "cuda" ]; then
CMAKE_FLAGS+=(
-DGGML_CUDA=ON
-DGGML_VULKAN=OFF
"-DCMAKE_CUDA_ARCHITECTURES=${CMAKE_CUDA_ARCHITECTURES:?CMAKE_CUDA_ARCHITECTURES must be set}"
"-DCMAKE_CUDA_FLAGS=-allow-unsupported-compiler"
"-DCMAKE_EXE_LINKER_FLAGS=-Wl,-rpath-link,/usr/local/cuda/lib64/stubs -lcuda"
)
elif [ "$BACKEND" = "vulkan" ]; then
CMAKE_FLAGS+=(
-DGGML_CUDA=OFF
-DGGML_VULKAN=ON
)
+else
+ echo "FATAL: unsupported BACKEND '$BACKEND' (expected cuda or vulkan)" >&2
+ exit 1
fi📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if [ "$BACKEND" = "cuda" ]; then | |
| CMAKE_FLAGS+=( | |
| -DGGML_CUDA=ON | |
| -DGGML_VULKAN=OFF | |
| "-DCMAKE_CUDA_ARCHITECTURES=${CMAKE_CUDA_ARCHITECTURES:?CMAKE_CUDA_ARCHITECTURES must be set}" | |
| "-DCMAKE_CUDA_FLAGS=-allow-unsupported-compiler" | |
| "-DCMAKE_EXE_LINKER_FLAGS=-Wl,-rpath-link,/usr/local/cuda/lib64/stubs -lcuda" | |
| ) | |
| elif [ "$BACKEND" = "vulkan" ]; then | |
| CMAKE_FLAGS+=( | |
| -DGGML_CUDA=OFF | |
| -DGGML_VULKAN=ON | |
| ) | |
| fi | |
| if [ "$BACKEND" = "cuda" ]; then | |
| CMAKE_FLAGS+=( | |
| -DGGML_CUDA=ON | |
| -DGGML_VULKAN=OFF | |
| "-DCMAKE_CUDA_ARCHITECTURES=${CMAKE_CUDA_ARCHITECTURES:?CMAKE_CUDA_ARCHITECTURES must be set}" | |
| "-DCMAKE_CUDA_FLAGS=-allow-unsupported-compiler" | |
| "-DCMAKE_EXE_LINKER_FLAGS=-Wl,-rpath-link,/usr/local/cuda/lib64/stubs -lcuda" | |
| ) | |
| elif [ "$BACKEND" = "vulkan" ]; then | |
| CMAKE_FLAGS+=( | |
| -DGGML_CUDA=OFF | |
| -DGGML_VULKAN=ON | |
| ) | |
| else | |
| echo "FATAL: unsupported BACKEND '$BACKEND' (expected cuda or vulkan)" >&2 | |
| exit 1 | |
| fi |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docker/unified/install-llama.sh` around lines 32 - 45, The script currently
only handles BACKEND="cuda" and "vulkan" so add a fail-fast else branch after
the existing if/elif to error out for any other BACKEND; update install-llama.sh
to check the BACKEND variable and call echo >&2 with a clear message like
"Unsupported BACKEND: $BACKEND" and exit 1 if it's not "cuda" or "vulkan", and
apply the same guard change to docker/unified/install-sd.sh; reference the
BACKEND variable and the CMAKE_FLAGS assignment to locate where to insert the
else { echo ...; exit 1 } branch.
| <div class="h-full" class:hidden={$currentRoute !== "/"}> | ||
| <Playground /> | ||
| </div> | ||
| <div class="h-full" class:hidden={$currentRoute === "/"}> | ||
| <Router {routes} on:routeLoaded={handleRouteLoaded} /> | ||
| </div> |
There was a problem hiding this comment.
H1: Only mount Playground on the root route.
class:hidden just toggles CSS. On non-root pages, the whole playground subtree still mounts, so its state and effects stay live in the background. If the router needs to stay mounted for routeLoaded, you can still keep <Router> mounted and gate only <Playground>.
Suggested fix
<main class="flex-1 overflow-auto p-4">
- <div class="h-full" class:hidden={$currentRoute !== "/"}>
- <Playground />
- </div>
+ {`#if` $currentRoute === "/"}
+ <div class="h-full">
+ <Playground />
+ </div>
+ {/if}
<div class="h-full" class:hidden={$currentRoute === "/"}>
<Router {routes} on:routeLoaded={handleRouteLoaded} />
</div>
</main>📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| <div class="h-full" class:hidden={$currentRoute !== "/"}> | |
| <Playground /> | |
| </div> | |
| <div class="h-full" class:hidden={$currentRoute === "/"}> | |
| <Router {routes} on:routeLoaded={handleRouteLoaded} /> | |
| </div> | |
| {`#if` $currentRoute === "/"} | |
| <div class="h-full"> | |
| <Playground /> | |
| </div> | |
| {/if} | |
| <div class="h-full" class:hidden={$currentRoute === "/"}> | |
| <Router {routes} on:routeLoaded={handleRouteLoaded} /> | |
| </div> |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ui-svelte/src/App.svelte` around lines 51 - 56, The Playground subtree is
still mounted because class:hidden only hides it; change to conditionally render
Playground so it only mounts on the root route: replace the div wrapping
Playground with an {`#if` $currentRoute === "/"} block that renders <Playground />
(and remove the class:hidden usage), while leaving <Router {routes}
on:routeLoaded={handleRouteLoaded} /> mounted as-is so routeLoaded still fires.
| $effect(() => { | ||
| playgroundStores.audioTranscribing.set(isTranscribing); | ||
| }); |
There was a problem hiding this comment.
L1: Reset audioTranscribing when this component unmounts.
This effect only writes the current value. If the component is destroyed while isTranscribing is true, the global activity flag stays stuck true.
Suggested fix
$effect(() => {
playgroundStores.audioTranscribing.set(isTranscribing);
+ return () => {
+ playgroundStores.audioTranscribing.set(false);
+ };
});📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| $effect(() => { | |
| playgroundStores.audioTranscribing.set(isTranscribing); | |
| }); | |
| $effect(() => { | |
| playgroundStores.audioTranscribing.set(isTranscribing); | |
| return () => { | |
| playgroundStores.audioTranscribing.set(false); | |
| }; | |
| }); |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ui-svelte/src/components/playground/AudioInterface.svelte` around lines 26 -
28, The effect setting playgroundStores.audioTranscribing from isTranscribing
doesn't clear the flag on unmount, so add an unmount cleanup that resets
playgroundStores.audioTranscribing to false: either return a cleanup function
from the existing $effect that calls
playgroundStores.audioTranscribing.set(false) or add an onDestroy(() =>
playgroundStores.audioTranscribing.set(false)) next to the $effect to ensure the
global flag is cleared when the component is destroyed.
| // SDAPI lora state | ||
| let availableLoras = $state<SdApiLora[]>([]); | ||
| let selectedLoras = $state<SdApiLoraRef[]>([]); | ||
| let isLoadingLoras = $state(false); | ||
| let lorasLoaded = $state(false); | ||
| let loraError = $state<string | null>(null); |
There was a problem hiding this comment.
M1 (Medium): Reset LoRA state when the selected model changes.
availableLoras, selectedLoras, and lorasLoaded persist across model switches, but selectedLoras is sent unchanged in the next SDAPI request. After loading LoRAs for model A, switching to model B can submit A’s LoRAs to B. Clear this state on model change, or key it by model.
Suggested fix
let availableLoras = $state<SdApiLora[]>([]);
let selectedLoras = $state<SdApiLoraRef[]>([]);
let isLoadingLoras = $state(false);
let lorasLoaded = $state(false);
let loraError = $state<string | null>(null);
+
+ $effect(() => {
+ $selectedModelStore;
+ availableLoras = [];
+ selectedLoras = [];
+ lorasLoaded = false;
+ loraError = null;
+ });Also applies to: 47-62, 99-114
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ui-svelte/src/components/playground/ImageInterface.svelte` around lines 33 -
38, When the user changes models, clear or re-key the SDAPI LoRA state so LoRAs
from the previous model aren’t sent to the new one: reset availableLoras,
selectedLoras, lorasLoaded and loraError whenever the selected model (or its
reactive store) changes; alternatively scope these stores by model id (e.g., a
Map keyed by model name). Update the reactive blocks or watchers that load LoRAs
(the code manipulating availableLoras, selectedLoras, isLoadingLoras,
lorasLoaded, loraError — and the similar logic repeated in the other
occurrences) to perform this reset before starting the new load and ensure
selectedLoras is always tied to the current model.
| // Outside fences/math: blank line marks a complete boundary | ||
| if (trimmed === "") { | ||
| lastCompleteBoundary = i; |
There was a problem hiding this comment.
H2 — Suffix-only rendering breaks container markdown.
A blank line is not a safe split point for all CommonMark containers. For example, - a\n\n- b and > a\n\n> b are still single list/blockquote structures when parsed together, but Line 218 renders the suffix in isolation, so the streamed final DOM can diverge from renderMarkdown(text). Please fall back to re-rendering the full complete section whenever the previous block can legally continue, or cache parsed top-level AST nodes instead of raw string slices.
Also applies to: 214-221
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ui-svelte/src/lib/markdown.ts` around lines 120 - 122, The current logic that
treats a blank trimmed line (where trimmed === "") as a safe split and sets
lastCompleteBoundary = i can break CommonMark containers (e.g., lists/blockquote
that continue across blank lines) and causes suffix-only rendering to diverge
from renderMarkdown(text); change the split strategy so that before accepting a
boundary you check whether the previous top-level block can legally continue (or
whether the blank line is still part of the same container) and if it can, fall
back to re-rendering the full complete section instead of only the suffix, or
implement caching of parsed top-level AST nodes and use those to determine safe
split points; update the code paths that compute lastCompleteBoundary, complete,
and suffix and the code that calls renderMarkdown to use this container-aware
decision (or the cached top-level AST) so streamed DOM matches full
renderMarkdown output.
| // Convert \[...\] to $$...$$ and \(...\) to $...$ | ||
| export function normalizeLatexDelimiters(text: string): string { | ||
| // Display math: \[...\] → $$...$$ (may span multiple lines) | ||
| text = text.replace(/\\\[([\s\S]*?)\\\]/g, (_match, inner) => `$$${inner}$$`); | ||
| // Inline math: \(...\) → $...$ | ||
| text = text.replace(/\\\(([\s\S]*?)\\\)/g, (_match, inner) => `$${inner}$`); | ||
| return text; |
There was a problem hiding this comment.
M1 — Raw-string LaTeX normalization mutates code examples.
Line 254 normalizes \(...\) and \[...\] before markdown parsing, so the same sequences inside fenced code blocks or inline code are rewritten into $...$/$$...$$ and no longer render as literal code. This should run as a remark transform over plain text nodes only, skipping code, inlineCode, and existing math nodes.
Also applies to: 254-254
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ui-svelte/src/lib/markdown.ts` around lines 239 - 245,
normalizeLatexDelimiters currently does blind string replacements and mutates
fenced code/inline code; replace it with a remark-style transform that walks the
Markdown AST and performs the same regex substitutions only on plain text nodes
(e.g., visit/text nodes) while explicitly skipping nodes of type "code",
"inlineCode", "math", and "inlineMath". Update the function (or add a new remark
plugin referenced by normalizeLatexDelimiters) to accept/transform an MDAST
root, apply the two replacements on node.value for eligible text nodes, and
return the root so markdown parsing preserves literal code blocks and existing
math nodes.
The previous approach used a seen map that collapsed non-contiguous same-group entries into a single batch, reordering the preload sequence. This change preserves the original preload order by only batching models that appear consecutively for the same group. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Models in the same group now preload concurrently via goroutines instead of sequentially. Groups are still loaded in order to respect exclusivity.
This significantly reduces startup time when a group has multiple models (e.g., an LLM + STT + TTS that share GPU resources). Each model's container starts and health-checks simultaneously.
Changes:
sync.WaitGroupto parallelize within each groupswapProcessGrouponce per group (handles exclusivity)Before: 3 models in same group → loaded sequentially (total = sum of load times)
After: 3 models in same group → loaded in parallel (total ≈ max load time)
Cross-ref: mostlygeek#637
Summary by CodeRabbit
Refactor
New Features
Configuration
Bug Fixes