proxy: parallelize startup preload within process groups by Quentin-M · Pull Request #20 · napmany/llmsnap

Quentin-M · 2026-04-08T18:35:18Z

Models in the same group now preload concurrently via goroutines instead of sequentially. Groups are still loaded in order to respect exclusivity.

This significantly reduces startup time when a group has multiple models (e.g., an LLM + STT + TTS that share GPU resources). Each model's container starts and health-checks simultaneously.

Changes:

Group preload models by ProcessGroup before starting
Use sync.WaitGroup to parallelize within each group
Call swapProcessGroup once per group (handles exclusivity)
Log group completion with model count

Before: 3 models in same group → loaded sequentially (total = sum of load times)
After: 3 models in same group → loaded in parallel (total ≈ max load time)

Cross-ref: mostlygeek#637

Summary by CodeRabbit

Refactor
- Faster, concurrent model preload by group and improved preload logging; new in‑flight request tracking surfaced to UI.
New Features
- SDAPI image endpoints and LoRA listing; multi-image gallery and SD settings in the UI.
- UI: in‑flight request counts, model aliases, rerank tool, improved chat persistence/streaming markdown, OGG audio uploads.
Configuration
- Per‑model and per‑peer timeouts, global TTL, and per‑request parameter overrides by requested model ID.
Bug Fixes
- More robust metrics parsing and proxy timeout handling; improved SSE inflight reporting.

…#522) Fixes: mostlygeek#517

…k#523) ## Summary - Add `--provenance=false` to docker build commands in `build-container.sh` - BuildKit attestation manifests are stored as untagged images in GHCR, and the `delete-untagged-containers` cleanup job deletes them, breaking the manifest list and causing `manifest unknown` errors on pull - ref: actions/delete-package-versions#162

…ostlygeek#520) add incremental rendering to Playground > Chat

- Keep Playground component mounted when navigating away, preserving streaming/generating state - Add animated gradient effect on Playground nav link when activity is in progress

…ek#516) add a real time counter of pending (inflight) requests to the UI.

Pause auto-scroll when the user scrolls up to review logs, and resume when they scroll back to the bottom. - add `userScrolledUp` state variable - add `handleScroll` to detect scroll position with 40px threshold - guard the auto-scroll effect with `!userScrolledUp` closes mostlygeek#529

Add setParamsByID filter that applies different request parameters based on the requested model ID, enabling per-alias behaviour for a single loaded model. - add SetParamsByID field to Filters struct and SanitizedSetParamsByID method - substitute ${MODEL_ID} and other macros in setParamsByID keys and values - validate no unknown macros remain in keys or values after substitution - apply setParamsByID in proxyInferenceHandler after setParams (can override it) - update config-schema.json with setParamsByID definition - update UI to show aliases and make them selectable in the Playground closes mostlygeek#534

Add a new Rerank tab to the playground that lets users test /v1/rerank endpoints. Supports a visual table editor and a JSON editor mode that stay in sync when toggling between them. - add rerankApi.ts with typed wrapper for /v1/rerank - add RerankInterface.svelte with query input, sortable document table, color-coded scores, auto-add row, cancel/clear, and token usage - add rerankLoading store to playgroundActivity derived store - register Rerank tab in Playground.svelte Updates mostlygeek#481

…eek#544)

Updated README to enhance the description of the web interface and added details about features like token metrics, request inspection, model management, and real-time log streaming.

Updated description to clarify compatibility and usage.

Added new images for model loading and real-time log streaming sections.

Add `cuda13` as a supported build architecture, targeting the `ghcr.io/ggml-org/llama.cpp:server-cuda13` upstream base image. The `server-cuda13` image ships with CUDA 13 libraries, providing improved performance on recent NVIDIA hardware compared to the existing `server-cuda` (CUDA 12) image. Users with newer GPUs (e.g., RTX 50-series) benefit from reduced model load latency and higher token throughput. - Add `cuda13` to the allowed architectures list in `docker/build-container.sh` - Add `cuda13` to the CI matrix in `.github/workflows/containers.yml` so the container is built and pushed automatically

Add a copy-to-clipboard button that appears on hover for each code block rendered in the chat interface assistant messages. - Svelte action `codeBlockCopy` injects a button into every `<pre>` element - MutationObserver reattaches buttons as streaming content arrives - Button shows a check icon for 2 seconds after a successful copy - Uses clipboard API with execCommand fallback for non-secure contexts - CSS hides button by default and reveals it on pre:hover https://claude.ai/code/session_01PTA5ao5YQuFAS6a9juLeZW --------- Co-authored-by: Claude <noreply@anthropic.com>

Add a new configuration parameter globalTTL that all models will inherit. The default value is 0 which matches the currently functionality to never automatically unload a model. The model.ttl's default has changed to -1, which means use the global TTL value. Any model.ttl >=0 is now value with 0 meaning never unload. This allows a model to override a globalTTL > 0 and be configured to never unload. Fixes mostlygeek#459 Closes mostlygeek#512

…#578) Extend macro substitution to the name and description fields of ModelConfig, matching the behavior already present for cmd, proxy, checkEndpoint, and filters. - substitute global/model macros (including MODEL_ID) in name and description - substitute PORT macro in name and description when allocated - validate no unknown macros remain in name and description after substitution - add tests for macro substitution, MODEL_ID, and unknown macro error

Use natural sorting for model names. Previously the model list was sorted lexicographically, which resulted in unintuitive ordering when numbers were included in the name. Example: Before qwen3.5:2B qwen3.5:35B-3AB qwen3.5:9B After qwen3.5:2B qwen3.5:9B qwen3.5:35B-3AB This change sorts models using natural order so numeric parts are compared numerically.

Upgrade vite and related dependencies to take advantage of Vite 8's improved build times via Rolldown and Oxc. - vite: ^6.3.5 → ^8.0.0 - @sveltejs/vite-plugin-svelte: ^5.0.3 → ^7.0.0 - svelte: ^5.19.0 → ^5.46.4 - vite-plugin-compression2: ^2.4.0 → ^2.5.1 - vitest: ^4.0.18 → ^4.1.0 --------- Co-authored-by: Claude <noreply@anthropic.com>

properly parse anthropic compatible usage data from streaming responses. closes: mostlygeek#577

Add proxy routes for stable-diffusion.cpp's /sdapi/v1/txt2img, /sdapi/v1/img2img, and /sdapi/v1/loras endpoints. POST endpoints use proxyInferenceHandler (model in JSON body), GET /loras uses proxyGETModelHandler (model in query param). Update the image playground with a dual-mode UI supporting both OpenAI and SDAPI backends. In SDAPI mode, loras are fetched first to prime the server-side cache, and all txt2img parameters are exposed (negative prompt, steps, cfg_scale, seed, batch_size, clip_skip, sampler, scheduler, lora selection with multipliers). - Add 3 sdapi route registrations in proxymanager.go - Add sdApi.ts client with generateSdImage and fetchSdLoras - Add SDAPI types (SdApiTxt2ImgRequest, SdApiResponse, etc.) - Add /sdapi to vite dev proxy config - Add backend tests for sdapi routing - Support batch image display in gallery grid https://claude.ai/code/session_0186MGX6NXdHVBTv2KH45fqn --------- Co-authored-by: Claude <noreply@anthropic.com>

…ostlygeek#597) Add Docker build scripts for a unified cuda docker container with llama-server, stable-diffusion.cpp, whisper.cpp.

- set up a GHA scheduled job to build the container nightly - enabling pushing a llama-swap:unified and a llama-swap:unified-Y-M-D image to ghcr.io - tidy up Dockerfile to use a non-root user and llama-swap as an entry point

Update docker/unified scripts to support building both cuda and vulkan unified images.

multiple fixes to vulkan build: - use ubuntu 26.04 to be compatible with AMD 395+ (Strix halo) hardware - add home directory in container - fix stable-diffusion install to actually enable vulkan --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

…geek#605)

…tlygeek#625) Expose CMAKE_CUDA_ARCHITECTURES as a Docker build ARG so users can customize CUDA architectures via --build-arg without editing the Dockerfile. - convert hardcoded ENV to ARG with default, feeding into ENV - replace silent fallback defaults (:-) in scripts with :? guards to fail fast if the env var is missing - add usage example to Dockerfile header Follow up to: mostlygeek#624 https://claude.ai/code/session_01EWiUe7jNABX7Uz95dUGJqK Co-authored-by: Claude <noreply@anthropic.com>

…ek#627) Extend the existing config-schema workflow to also validate config.example.yaml against config-schema.json using check-jsonschema. - add config.example.yaml to PR and push path triggers - install check-jsonschema via pip - run validation of config.example.yaml against schema https://claude.ai/code/session_01Y1oqwE6mwNs9UTJgZRgXtG --------- Co-authored-by: Claude <noreply@anthropic.com>

…k#619) Add configurable HTTP timeout settings to both models and peers to support installations that requires longer timeouts than the current hardcoded defaults. Closes mostlygeek#618

Keep request duration from being underreported when upstream timings only cover part of the full request lifecycle. - compare wall-clock and upstream timing durations - keep token and throughput values from timings - add regression coverage for underreported timings fixes mostlygeek#602

coderabbitai · 2026-04-08T18:35:34Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 850f0800-9b97-48ed-b3ca-03482ac0d62b

📥 Commits

Reviewing files that changed from the base of the PR and between 0353306 and 8e00a60.

📒 Files selected for processing (1)

proxy/proxymanager.go

🚧 Files skipped from review as they are similar to previous changes (1)

proxy/proxymanager.go

Walkthrough

Startup preload groups models by process-group and starts each group concurrently; added a thread-safe InflightCounter and Gin trackInflight middleware emitting SSE inflight events; added SDAPI routes; introduced per-model and per-peer timeouts, globalTTL, and filters.setParamsByID; multiple UI, Docker, CI, and test updates.

Changes

Cohort / File(s)	Summary
Proxy manager & runtime `proxy/proxymanager.go`, `proxy/proxymanager_api.go`, `proxy/proxymanager_test.go`, `proxy/process.go`, `proxy/process_test.go`	Added `InflightCounter`, `trackInflight()` middleware, SSE inflight emission, SDAPI route wiring, and reworked startup preload grouping/parallel start; updated API model `Aliases` and tests.
Config, schema & filters `proxy/config/...`, `config-schema.json`, `config.example.yaml`, `proxy/config/*_test.go`	Added `globalTTL` and `MODEL_CONFIG_DEFAULT_TTL` sentinel, `TimeoutsConfig` for models/peers, `filters.setParamsByID` (with macro/alias handling), validation and tests; schema/example updated.
Reverse proxy & transports `proxy/peerproxy.go`, `proxy/peerproxy_test.go`, `proxy/process.go`	Replaced shared transport with per-peer/per-process `http.Transport` configured from `TimeoutsConfig`, enabled HTTP/2 attempt, and adjusted tests.
Metrics, events `proxy/metrics_monitor.go`, `proxy/metrics_monitor_test.go`, `proxy/events.go`	Adjusted metrics parsing (v1/responses fallback) and duration calculation; added `InFlightRequestsEvent` and integrated SSE emission.
Routing / SD API / simple-responder `cmd/simple-responder/simple-responder.go`, `proxy/proxymanager.go`, `proxy/proxymanager_test.go`	Added `/sdapi/v1/*` endpoints, applied `SanitizedSetParamsByID` in inference flow, and added SDAPI routing tests and error handling.
UI: inflight, aliases, SD, streaming markdown, rerank `ui-svelte/...` (e.g. `src/stores/api.ts`, `src/lib/sdApi.ts`, `src/lib/markdown.ts`, `src/components/`, `src/routes/`, `vite.config.ts`)	Added inflight store/UI surfaces, model `aliases` support, SDAPI client/UI, streaming-markdown pipeline + tests, rerank UI and API client, route/route-store changes, and many UX improvements.
Docker, build scripts & CI `.github/workflows/`, `docker/`, `docker/unified/*`, `docker/build-image.sh`, `Makefile`	Added unified Dockerfiles and installers, `docker/build-image.sh`, `cuda13` target, CI workflow updates, and packaging / provenance tweaks.
Misc, docs & tests `AGENTS.md`, `README.md`, `CLAUDE.md`, `go.mod`, many tests across `proxy/*` and UI tests	Documentation updates, Go version bump, widespread test additions/adjustments to cover new TTLs/timeouts/filters/SDAPI/metrics/UI behavior.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Sync with llama-swap #7 — Overlaps changes to proxy/proxymanager.go (route/handler registration and middleware wiring).
Sync with llama-swap v178 #11 — Related ProxyManager internals and startup/route logic modifications.
Sync with llama-swap v190 #17 — Touches proxymanager.go and API/event behavior similar to inflight/event additions.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 17.74% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title 'proxy: parallelize startup preload within process groups' directly describes the main change in the changeset: parallelizing model preload within process groups while maintaining group-level exclusivity.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

⚔️ Resolve merge conflicts

Resolve merge conflict in branch parallel-preload

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Models in the same group now preload concurrently via goroutines instead of sequentially. Groups are still loaded in order to respect exclusivity. This significantly reduces startup time when a group has multiple models (e.g., an LLM + STT + TTS that share GPU resources). Each model's container starts and health-checks simultaneously. - Group preload models by ProcessGroup before starting - Use sync.WaitGroup to parallelize within each group - Call swapProcessGroup once per group (handles exclusivity) - Log group completion with model count Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@proxy/proxymanager.go`:
- Around line 227-230: The current preload logic ignores errors from
http.NewRequest and processGroup.ProxyRequest and always emits
ModelPreloadedEvent{Success: true}; fix by checking and handling both errors:
capture the error returned by http.NewRequest and by
processGroup.ProxyRequest(mid, &DiscardWriter{}, req), only emit
event.Emit(ModelPreloadedEvent{ModelName: mid, Success: true}) and call
proxyLogger.Infof when ProxyRequest returns nil, otherwise emit
ModelPreloadedEvent{ModelName: mid, Success: false} (include the error) and log
proxyLogger.Errorf with the error; ensure you reference ProxyRequest,
DiscardWriter, ModelPreloadedEvent, event.Emit and
proxyLogger.Infof/proxyLogger.Errorf when making the change.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 0b7a1a88-3621-42f7-9618-df4413efb758

📥 Commits

Reviewing files that changed from the base of the PR and between 1c24bb1 and 78dbba7.

📒 Files selected for processing (1)

proxy/proxymanager.go

coderabbitai · 2026-04-08T18:39:20Z

+						req, _ := http.NewRequest("GET", "/", nil)
+						processGroup.ProxyRequest(mid, &DiscardWriter{}, req)
+						event.Emit(ModelPreloadedEvent{ModelName: mid, Success: true})
+						proxyLogger.Infof("Preloaded model: %s", mid)


⚠️ Potential issue | 🔴 Critical

Handle ProxyRequest errors before emitting success events.

Line 228 ignores the preload error, but Line 229 always emits Success: true. This can incorrectly report failed preloads as successful.

Suggested fix

- req, _ := http.NewRequest("GET", "/", nil) - processGroup.ProxyRequest(mid, &DiscardWriter{}, req) - event.Emit(ModelPreloadedEvent{ModelName: mid, Success: true}) - proxyLogger.Infof("Preloaded model: %s", mid) + req, _ := http.NewRequest("GET", "/", nil) + if err := processGroup.ProxyRequest(mid, &DiscardWriter{}, req); err != nil { + event.Emit(ModelPreloadedEvent{ModelName: mid, Success: false}) + proxyLogger.Errorf("Failed to preload model %s: %v", mid, err) + return + } + event.Emit(ModelPreloadedEvent{ModelName: mid, Success: true}) + proxyLogger.Infof("Preloaded model: %s", mid)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

req, _ := http.NewRequest("GET", "/", nil)

processGroup.ProxyRequest(mid, &DiscardWriter{}, req)

event.Emit(ModelPreloadedEvent{ModelName: mid, Success: true})

proxyLogger.Infof("Preloaded model: %s", mid)

req, _ := http.NewRequest("GET", "/", nil)

if err := processGroup.ProxyRequest(mid, &DiscardWriter{}, req); err != nil {

event.Emit(ModelPreloadedEvent{ModelName: mid, Success: false})

proxyLogger.Errorf("Failed to preload model %s: %v", mid, err)

return

}

event.Emit(ModelPreloadedEvent{ModelName: mid, Success: true})

proxyLogger.Infof("Preloaded model: %s", mid)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@proxy/proxymanager.go` around lines 227 - 230, The current preload logic ignores errors from http.NewRequest and processGroup.ProxyRequest and always emits ModelPreloadedEvent{Success: true}; fix by checking and handling both errors: capture the error returned by http.NewRequest and by processGroup.ProxyRequest(mid, &DiscardWriter{}, req), only emit event.Emit(ModelPreloadedEvent{ModelName: mid, Success: true}) and call proxyLogger.Infof when ProxyRequest returns nil, otherwise emit ModelPreloadedEvent{ModelName: mid, Success: false} (include the error) and log proxyLogger.Errorf with the error; ensure you reference ProxyRequest, DiscardWriter, ModelPreloadedEvent, event.Emit and proxyLogger.Infof/proxyLogger.Errorf when making the change.

coderabbitai

Actionable comments posted: 13

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

.github/workflows/go-ci.yml (2)
49-67: ⚠️ Potential issue | 🟡 Minor

Cache key mismatch prevents cache reuse.

Line 54 uses cmd/simple-responder/simple-responder.go for the restore step, but line 67 still uses misc/simple-responder/simple-responder.go for the save step. Since the file exists at cmd/simple-responder/simple-responder.go, the cache key will always differ between restore and save, preventing any cache hit. Update line 67 to hash cmd/simple-responder/simple-responder.go.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/go-ci.yml around lines 49 - 67, The cache keys for the
restore and save steps differ: the "restore-simple-responder" step hashes
'cmd/simple-responder/simple-responder.go' but the "save-simple-responder" step
hashes 'misc/simple-responder/simple-responder.go', preventing cache reuse;
update the key in the save step (save-simple-responder) to use the same
hashFiles path 'cmd/simple-responder/simple-responder.go' so both steps compute
identical cache keys.
49-58: ⚠️ Potential issue | 🟠 Major

Fix path filter to include simple-responder changes, and align cache keys.

This job now depends on cmd/simple-responder/simple-responder.go (via the cache restore key), but the workflow excludes all cmd/** paths. A PR that only modifies this file will skip CI even though the job needs to rebuild it for tests.

Additionally, the restore cache key hashes cmd/simple-responder/simple-responder.go while the save step hashes misc/simple-responder/simple-responder.go—these will never match, so the cache is non-functional and always rebuilds.

Fix: Either whitelist cmd/simple-responder/simple-responder.go in the path filters and align the cache keys to use the same source file, or if this helper truly needs no testing, document why the cache key should not follow the source.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/go-ci.yml around lines 49 - 58, The workflow excludes
cmd/** changes but the restore/save cache keys and this job depend on
cmd/simple-responder/simple-responder.go, causing CI to skip and cache misses;
update the workflow path filter to include cmd/simple-responder/** (or
specifically cmd/simple-responder/simple-responder.go) so PRs touching that file
trigger the job, and make the cache key paths consistent by changing either the
restore key or the save key so both reference the same source file (ensure the
cache step identified as restore-simple-responder and the corresponding save
step use identical hashFiles('cmd/simple-responder/simple-responder.go') or both
point to misc/... if you intend that path).

♻️ Duplicate comments (1)

proxy/proxymanager.go (1)

265-268: ⚠️ Potential issue | 🔴 Critical

H1: Emit preload success only after the preload call succeeds.

Line 265/266 still drops both request-construction and ProxyRequest errors, while Line 267 always emits Success: true. A failed startup will still look preloaded to observers.

Suggested fix

-						req, _ := http.NewRequest("GET", "/", nil)
-						processGroup.ProxyRequest(mid, &DiscardWriter{}, req)
-						event.Emit(ModelPreloadedEvent{ModelName: mid, Success: true})
-						proxyLogger.Infof("Preloaded model: %s", mid)
+						req, err := http.NewRequest("GET", "/", nil)
+						if err != nil {
+							event.Emit(ModelPreloadedEvent{ModelName: mid, Success: false})
+							proxyLogger.Errorf("Failed to preload model %s: %v", mid, err)
+							return
+						}
+						if err := processGroup.ProxyRequest(mid, &DiscardWriter{}, req); err != nil {
+							event.Emit(ModelPreloadedEvent{ModelName: mid, Success: false})
+							proxyLogger.Errorf("Failed to preload model %s: %v", mid, err)
+							return
+						}
+						event.Emit(ModelPreloadedEvent{ModelName: mid, Success: true})
+						proxyLogger.Infof("Preloaded model: %s", mid)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@proxy/proxymanager.go` around lines 265 - 268, The code currently ignores
errors from http.NewRequest and processGroup.ProxyRequest and always emits
ModelPreloadedEvent{ModelName: mid, Success: true}; change this so you capture
and check both errors: call http.NewRequest and handle its error, call
processGroup.ProxyRequest and capture any returned error, only emit
ModelPreloadedEvent with Success: true if both succeed; on failure, emit
Success: false and log the error via proxyLogger.Errorf (include mid and error),
and avoid treating the preload as successful when ProxyRequest or request
creation fails.

🧹 Nitpick comments (7)

AGENTS.md (1)

30-40: Consider adding a language identifier to the fenced code block.

The commit message example code block lacks a language specifier (e.g., text or markdown), which triggers a markdown lint warning.
Proposed fix
-```
+```text
 proxy: add new feature
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@AGENTS.md` around lines 30 - 40, The fenced code block showing the example
commit message in AGENTS.md is missing a language identifier and triggers a
markdown lint warning; update that block (the triple-backtick block containing
"proxy: add new feature" and the list) to include a language tag such as text or
markdown (e.g., change ``` to ```text) so the linter recognizes the block type
and the warning is resolved.

proxy/peerproxy.go (1)

40-54: Use http.DefaultTransport.Clone() to start the transport configuration.

A fresh http.Transport drops every stdlib default you did not remember to re-specify. Cloning the default transport first preserves Go's baseline behavior (Proxy, ForceAttemptHTTP2, MaxIdleConns, etc.) and lets you override only the peer-specific timeouts and MaxIdleConnsPerHost.

Suggested refactor

-		peerTransport := &http.Transport{
-			Proxy: http.ProxyFromEnvironment,
-			DialContext: (&net.Dialer{
-				Timeout:   time.Duration(peer.Timeouts.Connect) * time.Second,
-				KeepAlive: 30 * time.Second,
-			}).DialContext,
-			TLSHandshakeTimeout:   time.Duration(peer.Timeouts.TLSHandshake) * time.Second,
-			ResponseHeaderTimeout: time.Duration(peer.Timeouts.ResponseHeader) * time.Second,
-			ExpectContinueTimeout: time.Duration(peer.Timeouts.ExpectContinue) * time.Second,
-			ForceAttemptHTTP2:     true,
-			MaxIdleConns:          100,
-			MaxIdleConnsPerHost:   10,
-			IdleConnTimeout:       time.Duration(peer.Timeouts.IdleConn) * time.Second,
-		}
+		peerTransport := http.DefaultTransport.(*http.Transport).Clone()
+		peerTransport.DialContext = (&net.Dialer{
+			Timeout:   time.Duration(peer.Timeouts.Connect) * time.Second,
+			KeepAlive: 30 * time.Second,
+		}).DialContext
+		peerTransport.TLSHandshakeTimeout = time.Duration(peer.Timeouts.TLSHandshake) * time.Second
+		peerTransport.ResponseHeaderTimeout = time.Duration(peer.Timeouts.ResponseHeader) * time.Second
+		peerTransport.ExpectContinueTimeout = time.Duration(peer.Timeouts.ExpectContinue) * time.Second
+		peerTransport.MaxIdleConnsPerHost = 10
+		peerTransport.IdleConnTimeout = time.Duration(peer.Timeouts.IdleConn) * time.Second

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@proxy/peerproxy.go` around lines 40 - 54, The transport creation currently
constructs a fresh http.Transport and loses stdlib defaults; change it to start
from http.DefaultTransport.Clone() (type-assert to *http.Transport) and then
override only peer-specific fields: set DialContext using a net.Dialer with
Timeout and KeepAlive, and set TLSHandshakeTimeout, ResponseHeaderTimeout,
ExpectContinueTimeout, IdleConnTimeout, and MaxIdleConnsPerHost as before;
update the variable peerTransport in peerproxy.go to use the cloned transport so
defaults like Proxy, ForceAttemptHTTP2, and MaxIdleConns are preserved.

proxy/process_test.go (1)

588-588: Variable shadows module-level debugLogger.

The local debugLogger shadows the package-level variable. Consider using a different name like testLogger for clarity.

🔧 Suggested fix

-	debugLogger := NewLogMonitorWriter(io.Discard)
-	process := NewProcess("test-model", 30, modelConfig, debugLogger, debugLogger)
+	testLogger := NewLogMonitorWriter(io.Discard)
+	process := NewProcess("test-model", 30, modelConfig, testLogger, testLogger)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@proxy/process_test.go` at line 588, The test creates a local variable named
debugLogger that shadows the package-level debugLogger; rename the local
variable (e.g., testLogger) where NewLogMonitorWriter(io.Discard) is called in
process_test.go to avoid shadowing and potential confusion and update any
subsequent references in the test to use the new name (reference: debugLogger,
NewLogMonitorWriter).

ui-svelte/src/stores/api.ts (1)

65-65: Minor formatting inconsistency.

There's an extra space before the colon in the localeCompare options.

🔧 Suggested fix

-              return (a.name + a.id).localeCompare(b.name + b.id, undefined, { numeric : true} );
+              return (a.name + a.id).localeCompare(b.name + b.id, undefined, { numeric: true });

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@ui-svelte/src/stores/api.ts` at line 65, Fix the spacing inside the options
object passed to String.prototype.localeCompare: change the options from "{
numeric : true}" to "{ numeric: true }" in the expression "(a.name +
a.id).localeCompare(b.name + b.id, undefined, { numeric : true} )" so the colon
has no leading space and add the closing-space before the final parenthesis for
consistent formatting.

proxy/config/model_config.go (1)

8-10: Consider idiomatic Go constant naming.

Go convention uses MixedCaps or mixedCaps for constants rather than SCREAMING_SNAKE_CASE. Consider renaming to ModelConfigDefaultTTL or defaultModelTTL.

💡 Suggested change

 const (
-	MODEL_CONFIG_DEFAULT_TTL = -1
+	ModelConfigDefaultTTL = -1
 )

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@proxy/config/model_config.go` around lines 8 - 10, The constant
MODEL_CONFIG_DEFAULT_TTL uses SCREAMING_SNAKE_CASE; rename it to follow Go
idioms (e.g., ModelConfigDefaultTTL for exported or defaultModelTTL for
package-private) and update all references, including any usages in functions,
tests, and imports in this package so compilation succeeds; ensure
documentation/comments and any JSON/tags that relied on the old name are
adjusted accordingly.

ui-svelte/src/components/playground/ChatInterface.svelte (1)

60-75: Consider persisting on component unmount.

The throttled save works well during normal operation, but if the user navigates away before the 2-second timer fires, the latest messages may be lost. Consider adding an onDestroy or ensuring a final save.

💡 Suggested improvement

import { onDestroy } from "svelte";

// ... existing code ...

onDestroy(() => {
  try { localStorage.setItem("playground-messages", JSON.stringify(messages)); } catch {}
});

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@ui-svelte/src/components/playground/ChatInterface.svelte` around lines 60 -
75, The throttled persistence currently using $effect, lastSaveTime, and the
save() closure can miss the latest messages if the component unmounts before the
timer fires; add a final flush in an onDestroy handler that calls the same save
logic (writing JSON.stringify(messages) to "playground-messages" via
localStorage) to guarantee persistence on teardown, or extract the save
functionality into a shared function and invoke it both from the throttled timer
and from onDestroy to avoid duplication.

docker/unified/install-llama-swap.sh (1)

30-39: Consider GitHub API rate limiting.

The unauthenticated GitHub API request on line 32-33 is subject to rate limiting (60 requests/hour per IP). In CI environments with shared IPs, this could cause intermittent failures.

Consider using $GITHUB_TOKEN if available:

💡 Suggested improvement

 if [ "$VERSION" = "latest" ]; then
     echo "=== Resolving latest llama-swap release ==="
-    VERSION=$(curl -fsSL "https://api.github.com/repos/${REPO}/releases/latest" \
+    AUTH_HEADER=""
+    if [ -n "$GITHUB_TOKEN" ]; then
+        AUTH_HEADER="Authorization: token $GITHUB_TOKEN"
+    fi
+    VERSION=$(curl -fsSL ${AUTH_HEADER:+-H "$AUTH_HEADER"} "https://api.github.com/repos/${REPO}/releases/latest" \
         | grep '"tag_name"' | head -1 | cut -d'"' -f4 | sed 's/^v//')

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@docker/unified/install-llama-swap.sh` around lines 30 - 39, The
unauthenticated curl call that resolves the latest release (using VERSION, REPO
and the curl pipeline) can hit GitHub rate limits; update the script to use a
GITHUB_TOKEN when present by adding an Authorization header to the curl request
(e.g., set header only if $GITHUB_TOKEN is non-empty) and fall back to the
unauthenticated request otherwise, keeping the same parsing of tag_name and the
existing error handling for an empty VERSION.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/unified-docker.yml:
- Around line 52-61: The matrix serialization currently produces an empty-string
entry when the backends array is empty; modify the logic around backends and
matrix so that if backends is empty you write an explicit empty JSON array
("[]") to GITHUB_OUTPUT instead of serializing an empty element. Locate the
backends array and the matrix generation (variables backends and matrix, and the
echo "matrix=$matrix" >> $GITHUB_OUTPUT) and add a conditional: if backends has
no elements set matrix='[]' else build the JSON with printf | jq as before, then
echo matrix to GITHUB_OUTPUT.

In `@AGENTS.md`:
- Line 8: Update the typo "svelt5" to "svelte5" in the documentation string (the
phrase "typescript, vite and svelt5 for UI (located in ui/)") so the UI tech
list correctly reads "typescript, vite and svelte5 for UI"; ensure only the
spelling is changed and no other text is modified.

In `@docker/build-image.sh`:
- Around line 117-128: The get_default_branch function currently probes branches
with git ls-remote --heads which returns success even if the ref is absent;
update get_default_branch to perform reliable detection by either adding
--exit-code to each git ls-remote --heads probe (e.g., git ls-remote --exit-code
--heads ...) so the if/elif properly fails when the branch is missing, or
alternatively query the remote HEAD symref (git ls-remote --symref <repo_url>
HEAD) and parse the target to determine the default branch; ensure the function
still falls back predictably (used by get_latest_commit) when detection fails.

In `@docker/unified/build-image.sh`:
- Around line 177-179: The script is calling resolve_ref("${LLAMA_SWAP_REPO}"
"${LS_VERSION}") which rejects user-friendly selectors like "latest" or bare
release numbers, causing help/examples to fail; update build-image.sh so that
when LS_VERSION is set the script either (a) skips resolving and passes the raw
LS_VERSION through to the build (i.e. set LS_HASH="${LS_VERSION}" when
resolve_ref would reject it) or (b) validate/normalize LS_VERSION up front and
restrict the help text to only document real git refs; modify the logic around
LS_VERSION/LS_HASH and the resolve_ref invocation (and update any help text) so
LLAMA_SWAP_REPO uses a usable ref for docker buildx instead of aborting for
"latest"/numeric values.

In `@docker/unified/install-llama.sh`:
- Around line 32-45: The script currently only handles BACKEND="cuda" and
"vulkan" so add a fail-fast else branch after the existing if/elif to error out
for any other BACKEND; update install-llama.sh to check the BACKEND variable and
call echo >&2 with a clear message like "Unsupported BACKEND: $BACKEND" and exit
1 if it's not "cuda" or "vulkan", and apply the same guard change to
docker/unified/install-sd.sh; reference the BACKEND variable and the CMAKE_FLAGS
assignment to locate where to insert the else { echo ...; exit 1 } branch.

In `@Makefile`:
- Line 54: The Makefile currently has the linux/arm64 build command commented
out (the GOOS=linux GOARCH=arm64 go build ... line), which prevents generation
of the linux-arm64 artifact expected by .goreleaser.yaml and scripts/install.sh;
restore that artifact by uncommenting or re-adding the ARM64 build command that
uses GOOS=linux GOARCH=arm64 with the same -ldflags (-X main.commit=${GIT_HASH}
-X main.version=local_${GIT_HASH} -X main.date=${BUILD_DATE}) and output pattern
($(BUILD_DIR)/$(APP_NAME)-linux-arm64), and ensure the linux build target still
invokes both the amd64 and arm64 build commands so `make linux` produces both
artifacts.

In `@proxy/config/config.go`:
- Around line 315-329: The rebuild of modelConfig.Filters.SetParamsByID can
silently overwrite entries when two different source keys expand to the same
newKey; after computing newKey (using macroSlug/macroStr and
substituteMacroInValue with entry.Name/entry.Value) but before assigning into
newSetParamsByID, check whether newKey already exists in newSetParamsByID and if
so return a Config error referencing modelId and the colliding key(s); do this
check inside the loop that constructs newSetParamsByID so collisions are
rejected early rather than overwriting.

In `@proxy/process.go`:
- Around line 99-115: checkHealthEndpoint() currently uses hard-coded timeouts
causing startup health checks to fail despite transport using config.Timeouts;
modify checkHealthEndpoint() to create its http.Client with a Transport that
applies the same timeout settings from config.Timeouts (same DialContext
timeout, TLSHandshakeTimeout, ResponseHeaderTimeout, ExpectContinueTimeout,
IdleConnTimeout, etc.) or reuse the existing transport instance used for
reverseProxy (transport) so the health probe uses identical connection and
response timeouts as the proxied requests.

In `@ui-svelte/src/App.svelte`:
- Around line 51-56: The Playground subtree is still mounted because
class:hidden only hides it; change to conditionally render Playground so it only
mounts on the root route: replace the div wrapping Playground with an {`#if`
$currentRoute === "/"} block that renders <Playground /> (and remove the
class:hidden usage), while leaving <Router {routes}
on:routeLoaded={handleRouteLoaded} /> mounted as-is so routeLoaded still fires.

In `@ui-svelte/src/components/playground/AudioInterface.svelte`:
- Around line 26-28: The effect setting playgroundStores.audioTranscribing from
isTranscribing doesn't clear the flag on unmount, so add an unmount cleanup that
resets playgroundStores.audioTranscribing to false: either return a cleanup
function from the existing $effect that calls
playgroundStores.audioTranscribing.set(false) or add an onDestroy(() =>
playgroundStores.audioTranscribing.set(false)) next to the $effect to ensure the
global flag is cleared when the component is destroyed.

In `@ui-svelte/src/components/playground/ImageInterface.svelte`:
- Around line 33-38: When the user changes models, clear or re-key the SDAPI
LoRA state so LoRAs from the previous model aren’t sent to the new one: reset
availableLoras, selectedLoras, lorasLoaded and loraError whenever the selected
model (or its reactive store) changes; alternatively scope these stores by model
id (e.g., a Map keyed by model name). Update the reactive blocks or watchers
that load LoRAs (the code manipulating availableLoras, selectedLoras,
isLoadingLoras, lorasLoaded, loraError — and the similar logic repeated in the
other occurrences) to perform this reset before starting the new load and ensure
selectedLoras is always tied to the current model.

In `@ui-svelte/src/lib/markdown.ts`:
- Around line 239-245: normalizeLatexDelimiters currently does blind string
replacements and mutates fenced code/inline code; replace it with a remark-style
transform that walks the Markdown AST and performs the same regex substitutions
only on plain text nodes (e.g., visit/text nodes) while explicitly skipping
nodes of type "code", "inlineCode", "math", and "inlineMath". Update the
function (or add a new remark plugin referenced by normalizeLatexDelimiters) to
accept/transform an MDAST root, apply the two replacements on node.value for
eligible text nodes, and return the root so markdown parsing preserves literal
code blocks and existing math nodes.
- Around line 120-122: The current logic that treats a blank trimmed line (where
trimmed === "") as a safe split and sets lastCompleteBoundary = i can break
CommonMark containers (e.g., lists/blockquote that continue across blank lines)
and causes suffix-only rendering to diverge from renderMarkdown(text); change
the split strategy so that before accepting a boundary you check whether the
previous top-level block can legally continue (or whether the blank line is
still part of the same container) and if it can, fall back to re-rendering the
full complete section instead of only the suffix, or implement caching of parsed
top-level AST nodes and use those to determine safe split points; update the
code paths that compute lastCompleteBoundary, complete, and suffix and the code
that calls renderMarkdown to use this container-aware decision (or the cached
top-level AST) so streamed DOM matches full renderMarkdown output.

---

Outside diff comments:
In @.github/workflows/go-ci.yml:
- Around line 49-67: The cache keys for the restore and save steps differ: the
"restore-simple-responder" step hashes
'cmd/simple-responder/simple-responder.go' but the "save-simple-responder" step
hashes 'misc/simple-responder/simple-responder.go', preventing cache reuse;
update the key in the save step (save-simple-responder) to use the same
hashFiles path 'cmd/simple-responder/simple-responder.go' so both steps compute
identical cache keys.
- Around line 49-58: The workflow excludes cmd/** changes but the restore/save
cache keys and this job depend on cmd/simple-responder/simple-responder.go,
causing CI to skip and cache misses; update the workflow path filter to include
cmd/simple-responder/** (or specifically
cmd/simple-responder/simple-responder.go) so PRs touching that file trigger the
job, and make the cache key paths consistent by changing either the restore key
or the save key so both reference the same source file (ensure the cache step
identified as restore-simple-responder and the corresponding save step use
identical hashFiles('cmd/simple-responder/simple-responder.go') or both point to
misc/... if you intend that path).

---

Duplicate comments:
In `@proxy/proxymanager.go`:
- Around line 265-268: The code currently ignores errors from http.NewRequest
and processGroup.ProxyRequest and always emits ModelPreloadedEvent{ModelName:
mid, Success: true}; change this so you capture and check both errors: call
http.NewRequest and handle its error, call processGroup.ProxyRequest and capture
any returned error, only emit ModelPreloadedEvent with Success: true if both
succeed; on failure, emit Success: false and log the error via
proxyLogger.Errorf (include mid and error), and avoid treating the preload as
successful when ProxyRequest or request creation fails.

---

Nitpick comments:
In `@AGENTS.md`:
- Around line 30-40: The fenced code block showing the example commit message in
AGENTS.md is missing a language identifier and triggers a markdown lint warning;
update that block (the triple-backtick block containing "proxy: add new feature"
and the list) to include a language tag such as text or markdown (e.g., change
``` to ```text) so the linter recognizes the block type and the warning is
resolved.

In `@docker/unified/install-llama-swap.sh`:
- Around line 30-39: The unauthenticated curl call that resolves the latest
release (using VERSION, REPO and the curl pipeline) can hit GitHub rate limits;
update the script to use a GITHUB_TOKEN when present by adding an Authorization
header to the curl request (e.g., set header only if $GITHUB_TOKEN is non-empty)
and fall back to the unauthenticated request otherwise, keeping the same parsing
of tag_name and the existing error handling for an empty VERSION.

In `@proxy/config/model_config.go`:
- Around line 8-10: The constant MODEL_CONFIG_DEFAULT_TTL uses
SCREAMING_SNAKE_CASE; rename it to follow Go idioms (e.g., ModelConfigDefaultTTL
for exported or defaultModelTTL for package-private) and update all references,
including any usages in functions, tests, and imports in this package so
compilation succeeds; ensure documentation/comments and any JSON/tags that
relied on the old name are adjusted accordingly.

In `@proxy/peerproxy.go`:
- Around line 40-54: The transport creation currently constructs a fresh
http.Transport and loses stdlib defaults; change it to start from
http.DefaultTransport.Clone() (type-assert to *http.Transport) and then override
only peer-specific fields: set DialContext using a net.Dialer with Timeout and
KeepAlive, and set TLSHandshakeTimeout, ResponseHeaderTimeout,
ExpectContinueTimeout, IdleConnTimeout, and MaxIdleConnsPerHost as before;
update the variable peerTransport in peerproxy.go to use the cloned transport so
defaults like Proxy, ForceAttemptHTTP2, and MaxIdleConns are preserved.

In `@proxy/process_test.go`:
- Line 588: The test creates a local variable named debugLogger that shadows the
package-level debugLogger; rename the local variable (e.g., testLogger) where
NewLogMonitorWriter(io.Discard) is called in process_test.go to avoid shadowing
and potential confusion and update any subsequent references in the test to use
the new name (reference: debugLogger, NewLogMonitorWriter).

In `@ui-svelte/src/components/playground/ChatInterface.svelte`:
- Around line 60-75: The throttled persistence currently using $effect,
lastSaveTime, and the save() closure can miss the latest messages if the
component unmounts before the timer fires; add a final flush in an onDestroy
handler that calls the same save logic (writing JSON.stringify(messages) to
"playground-messages" via localStorage) to guarantee persistence on teardown, or
extract the save functionality into a shared function and invoke it both from
the throttled timer and from onDestroy to avoid duplication.

In `@ui-svelte/src/stores/api.ts`:
- Line 65: Fix the spacing inside the options object passed to
String.prototype.localeCompare: change the options from "{ numeric : true}" to
"{ numeric: true }" in the expression "(a.name + a.id).localeCompare(b.name +
b.id, undefined, { numeric : true} )" so the colon has no leading space and add
the closing-space before the final parenthesis for consistent formatting.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b3e4dd0d-49c4-46ac-8ac3-97ca408fb785

📥 Commits

Reviewing files that changed from the base of the PR and between 78dbba7 and 0353306.

⛔ Files ignored due to path filters (1)

ui-svelte/package-lock.json is excluded by !**/package-lock.json

📒 Files selected for processing (70)

.coderabbit.yaml
.github/workflows/config-schema.yml
.github/workflows/containers.yml
.github/workflows/go-ci.yml
.github/workflows/unified-docker.yml
AGENTS.md
CLAUDE.md
Makefile
README.md
cmd/simple-responder/simple-responder.go
config-schema.json
config.example.yaml
docker/build-container.sh
docker/build-image.sh
docker/unified/Dockerfile
docker/unified/README.md
docker/unified/build-image.sh
docker/unified/config.example.yaml
docker/unified/install-ik-llama.sh
docker/unified/install-llama-swap.sh
docker/unified/install-llama.sh
docker/unified/install-sd.sh
docker/unified/install-whisper.sh
docs/configuration.md
go.mod
proxy/config/config.go
proxy/config/config_posix_test.go
proxy/config/config_test.go
proxy/config/config_windows_test.go
proxy/config/filters.go
proxy/config/filters_test.go
proxy/config/macro_in_macro_test.go
proxy/config/model_config.go
proxy/config/model_config_test.go
proxy/config/peer.go
proxy/events.go
proxy/metrics_monitor.go
proxy/metrics_monitor_test.go
proxy/peerproxy.go
proxy/peerproxy_test.go
proxy/process.go
proxy/process_test.go
proxy/proxymanager.go
proxy/proxymanager_api.go
proxy/proxymanager_test.go
ui-svelte/.npmrc
ui-svelte/package.json
ui-svelte/src/App.svelte
ui-svelte/src/components/Header.svelte
ui-svelte/src/components/LogPanel.svelte
ui-svelte/src/components/ModelsPanel.svelte
ui-svelte/src/components/StatsPanel.svelte
ui-svelte/src/components/playground/AudioInterface.svelte
ui-svelte/src/components/playground/ChatInterface.svelte
ui-svelte/src/components/playground/ChatMessage.svelte
ui-svelte/src/components/playground/ImageInterface.svelte
ui-svelte/src/components/playground/ModelSelector.svelte
ui-svelte/src/components/playground/RerankInterface.svelte
ui-svelte/src/components/playground/SpeechInterface.svelte
ui-svelte/src/lib/markdown.test.ts
ui-svelte/src/lib/markdown.ts
ui-svelte/src/lib/rerankApi.ts
ui-svelte/src/lib/sdApi.ts
ui-svelte/src/lib/types.ts
ui-svelte/src/routes/Playground.svelte
ui-svelte/src/routes/PlaygroundStub.svelte
ui-svelte/src/stores/api.ts
ui-svelte/src/stores/playgroundActivity.ts
ui-svelte/src/stores/route.ts
ui-svelte/vite.config.ts

✅ Files skipped from review due to trivial changes (11)

.coderabbit.yaml
CLAUDE.md
docker/unified/README.md
go.mod
ui-svelte/.npmrc
ui-svelte/src/routes/PlaygroundStub.svelte
ui-svelte/src/components/ModelsPanel.svelte
docker/unified/config.example.yaml
ui-svelte/vite.config.ts
ui-svelte/package.json
proxy/events.go

coderabbitai · 2026-04-08T18:59:42Z

+          backends=()
+          # schedule uses defaults (build both); workflow_dispatch respects inputs
+          if [[ "${{ github.event_name }}" == "schedule" ]] || [[ "${{ inputs.build_cuda }}" == "true" ]]; then
+            backends+=("cuda")
+          fi
+          if [[ "${{ github.event_name }}" == "schedule" ]] || [[ "${{ inputs.build_vulkan }}" == "true" ]]; then
+            backends+=("vulkan")
+          fi
+          matrix=$(printf '%s\n' "${backends[@]}" | jq -R . | jq -sc .)
+          echo "matrix=$matrix" >> $GITHUB_OUTPUT


⚠️ Potential issue | 🟠 Major

H2: Return [] when no backend is selected.

If both inputs are false, this still serializes one empty entry, so the build job runs once with matrix.backend == "" instead of being skipped.

Suggested fix

- matrix=$(printf '%s\n' "${backends[@]}" | jq -R . | jq -sc .) - echo "matrix=$matrix" >> $GITHUB_OUTPUT + if [[ ${`#backends`[@]} -eq 0 ]]; then + matrix='[]' + else + matrix=$(printf '%s\n' "${backends[@]}" | jq -R . | jq -sc .) + fi + echo "matrix=$matrix" >> "$GITHUB_OUTPUT"

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

backends=()

# schedule uses defaults (build both); workflow_dispatch respects inputs

if [[ "${{ github.event_name }}" == "schedule" ]] || [[ "${{ inputs.build_cuda }}" == "true" ]]; then

backends+=("cuda")

fi

if [[ "${{ github.event_name }}" == "schedule" ]] || [[ "${{ inputs.build_vulkan }}" == "true" ]]; then

backends+=("vulkan")

fi

matrix=$(printf '%s\n' "${backends[@]}" | jq -R . | jq -sc .)

echo "matrix=$matrix" >> $GITHUB_OUTPUT

backends=()

# schedule uses defaults (build both); workflow_dispatch respects inputs

if [[ "${{ github.event_name }}" == "schedule" ]] || [[ "${{ inputs.build_cuda }}" == "true" ]]; then

backends+=("cuda")

fi

if [[ "${{ github.event_name }}" == "schedule" ]] || [[ "${{ inputs.build_vulkan }}" == "true" ]]; then

backends+=("vulkan")

fi

if [[ ${`#backends`[@]} -eq 0 ]]; then

matrix='[]'

else

matrix=$(printf '%s\n' "${backends[@]}" | jq -R . | jq -sc .)

fi

echo "matrix=$matrix" >> "$GITHUB_OUTPUT"

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In @.github/workflows/unified-docker.yml around lines 52 - 61, The matrix serialization currently produces an empty-string entry when the backends array is empty; modify the logic around backends and matrix so that if backends is empty you write an explicit empty JSON array ("[]") to GITHUB_OUTPUT instead of serializing an empty element. Locate the backends array and the matrix generation (variables backends and matrix, and the echo "matrix=$matrix" >> $GITHUB_OUTPUT) and add a conditional: if backends has no elements set matrix='[]' else build the JSON with printf | jq as before, then echo matrix to GITHUB_OUTPUT.

coderabbitai · 2026-04-08T18:59:42Z

+## Tech stack
+
+- golang
+- typescript, vite and svelt5 for UI (located in ui/)


⚠️ Potential issue | 🟡 Minor

Typo: "svelt5" should be "svelte5".

Proposed fix

-- typescript, vite and svelt5 for UI (located in ui/) +- typescript, vite and svelte5 for UI (located in ui/)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

- typescript, vite and svelt5 for UI (located in ui/)

- typescript, vite and svelte5 for UI (located in ui/)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@AGENTS.md` at line 8, Update the typo "svelt5" to "svelte5" in the documentation string (the phrase "typescript, vite and svelt5 for UI (located in ui/)") so the UI tech list correctly reads "typescript, vite and svelte5 for UI"; ensure only the spelling is changed and no other text is modified.

coderabbitai · 2026-04-08T18:59:42Z

+get_default_branch() {
+    local repo_url="$1"
+
+    # Check for master first
+    if git ls-remote --heads "${repo_url}" master &>/dev/null; then
+        echo "master"
+    elif git ls-remote --heads "${repo_url}" main &>/dev/null; then
+        echo "main"
+    else
+        echo "master"  # fallback
+    fi
+}


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash set -euo pipefail repo="https://github.com/ggml-org/whisper.cpp.git" set +e git ls-remote --heads "$repo" branch-that-does-not-exist >/dev/null 2>&1 plain_status=$? git ls-remote --exit-code --heads "$repo" branch-that-does-not-exist >/dev/null 2>&1 strict_status=$? set -e echo "plain ls-remote status: $plain_status" echo "--exit-code status: $strict_status" echo -n "remote HEAD branch: " git ls-remote --symref "$repo" HEAD \ | awk '/^ref:/ {sub("refs/heads/","",$2); print $2; exit}'

Repository: napmany/llmsnap

Length of output: 133

Fix get_default_branch to discover non-master defaults.

Line 121 uses git ls-remote --heads ... master without --exit-code. This command returns success (exit code 0) even when the branch does not exist, so the if-condition always succeeds for any reachable repository and the elif chain never reaches the main probe. Repositories with main as the default branch will then produce an empty hash in get_latest_commit and fail the build.

Use --exit-code to make the probes fail when a branch is absent, or read the remote HEAD symref directly.

Suggested fix

function get_default_branch() { local repo_url="$1" - - # Check for master first - if git ls-remote --heads "${repo_url}" master &>/dev/null; then - echo "master" - elif git ls-remote --heads "${repo_url}" main &>/dev/null; then - echo "main" - else - echo "master" # fallback - fi + local branch + + branch="$( + git ls-remote --symref "${repo_url}" HEAD 2>/dev/null \ + | awk '/^ref:/ {sub("refs/heads/","",$2); print $2; exit}' + )" + + if [[ -n "${branch}" ]]; then + echo "${branch}" + elif git ls-remote --exit-code --heads "${repo_url}" main >/dev/null 2>&1; then + echo "main" + elif git ls-remote --exit-code --heads "${repo_url}" master >/dev/null 2>&1; then + echo "master" + else + echo "ERROR: Could not determine default branch for ${repo_url}" >&2 + return 1 + fi }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

get_default_branch() {

local repo_url="$1"

# Check for master first

if git ls-remote --heads "${repo_url}" master &>/dev/null; then

echo "master"

elif git ls-remote --heads "${repo_url}" main &>/dev/null; then

echo "main"

else

echo "master" # fallback

fi

}

get_default_branch() {

local repo_url="$1"

local branch

branch="$(

git ls-remote --symref "${repo_url}" HEAD 2>/dev/null \

| awk '/^ref:/ {sub("refs/heads/","",$2); print $2; exit}'

)"

if [[ -n "${branch}" ]]; then

echo "${branch}"

elif git ls-remote --exit-code --heads "${repo_url}" main >/dev/null 2>&1; then

echo "main"

elif git ls-remote --exit-code --heads "${repo_url}" master >/dev/null 2>&1; then

echo "master"

else

echo "ERROR: Could not determine default branch for ${repo_url}" >&2

return 1

fi

}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@docker/build-image.sh` around lines 117 - 128, The get_default_branch function currently probes branches with git ls-remote --heads which returns success even if the ref is absent; update get_default_branch to perform reliable detection by either adding --exit-code to each git ls-remote --heads probe (e.g., git ls-remote --exit-code --heads ...) so the if/elif properly fails when the branch is missing, or alternatively query the remote HEAD symref (git ls-remote --symref <repo_url> HEAD) and parse the target to determine the default branch; ensure the function still falls back predictably (used by get_latest_commit) when detection fails.

coderabbitai · 2026-04-08T18:59:42Z

+if [[ -n "${LS_VERSION:-}" ]]; then
+    LS_HASH=$(resolve_ref "${LLAMA_SWAP_REPO}" "${LS_VERSION}") || exit 1
+    echo "llama-swap: ${LS_VERSION} -> ${LS_HASH}"


⚠️ Potential issue | 🟠 Major

H3: LS_VERSION no longer accepts the values this script advertises.

resolve_ref() rejects latest and bare release numbers like 170, so the documented examples/help text now terminate before docker buildx starts. Either pass llama-swap release selectors through unchanged, or tighten the docs/input descriptions so this path only accepts real git refs.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@docker/unified/build-image.sh` around lines 177 - 179, The script is calling resolve_ref("${LLAMA_SWAP_REPO}" "${LS_VERSION}") which rejects user-friendly selectors like "latest" or bare release numbers, causing help/examples to fail; update build-image.sh so that when LS_VERSION is set the script either (a) skips resolving and passes the raw LS_VERSION through to the build (i.e. set LS_HASH="${LS_VERSION}" when resolve_ref would reject it) or (b) validate/normalize LS_VERSION up front and restrict the help text to only document real git refs; modify the logic around LS_VERSION/LS_HASH and the resolve_ref invocation (and update any help text) so LLAMA_SWAP_REPO uses a usable ref for docker buildx instead of aborting for "latest"/numeric values.

coderabbitai · 2026-04-08T18:59:42Z

+if [ "$BACKEND" = "cuda" ]; then
+    CMAKE_FLAGS+=(
+        -DGGML_CUDA=ON
+        -DGGML_VULKAN=OFF
+        "-DCMAKE_CUDA_ARCHITECTURES=${CMAKE_CUDA_ARCHITECTURES:?CMAKE_CUDA_ARCHITECTURES must be set}"
+        "-DCMAKE_CUDA_FLAGS=-allow-unsupported-compiler"
+        "-DCMAKE_EXE_LINKER_FLAGS=-Wl,-rpath-link,/usr/local/cuda/lib64/stubs -lcuda"
+    )
+elif [ "$BACKEND" = "vulkan" ]; then
+    CMAKE_FLAGS+=(
+        -DGGML_CUDA=OFF
+        -DGGML_VULKAN=ON
+    )
+fi


⚠️ Potential issue | 🟠 Major

M2 (Medium): Fail fast on unsupported BACKEND values.

Any value other than cuda or vulkan falls through with only the common CMake flags, which can leave you with a “successful” image that lacks the requested accelerator support. Add an explicit failure here; the same guard is needed in docker/unified/install-sd.sh.

Suggested fix

if [ "$BACKEND" = "cuda" ]; then CMAKE_FLAGS+=( -DGGML_CUDA=ON -DGGML_VULKAN=OFF "-DCMAKE_CUDA_ARCHITECTURES=${CMAKE_CUDA_ARCHITECTURES:?CMAKE_CUDA_ARCHITECTURES must be set}" "-DCMAKE_CUDA_FLAGS=-allow-unsupported-compiler" "-DCMAKE_EXE_LINKER_FLAGS=-Wl,-rpath-link,/usr/local/cuda/lib64/stubs -lcuda" ) elif [ "$BACKEND" = "vulkan" ]; then CMAKE_FLAGS+=( -DGGML_CUDA=OFF -DGGML_VULKAN=ON ) +else + echo "FATAL: unsupported BACKEND '$BACKEND' (expected cuda or vulkan)" >&2 + exit 1 fi

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if [ "$BACKEND" = "cuda" ]; then

CMAKE_FLAGS+=(

-DGGML_CUDA=ON

-DGGML_VULKAN=OFF

"-DCMAKE_CUDA_ARCHITECTURES=${CMAKE_CUDA_ARCHITECTURES:?CMAKE_CUDA_ARCHITECTURES must be set}"

"-DCMAKE_CUDA_FLAGS=-allow-unsupported-compiler"

"-DCMAKE_EXE_LINKER_FLAGS=-Wl,-rpath-link,/usr/local/cuda/lib64/stubs -lcuda"

)

elif [ "$BACKEND" = "vulkan" ]; then

CMAKE_FLAGS+=(

-DGGML_CUDA=OFF

-DGGML_VULKAN=ON

)

fi

if [ "$BACKEND" = "cuda" ]; then

CMAKE_FLAGS+=(

-DGGML_CUDA=ON

-DGGML_VULKAN=OFF

"-DCMAKE_CUDA_ARCHITECTURES=${CMAKE_CUDA_ARCHITECTURES:?CMAKE_CUDA_ARCHITECTURES must be set}"

"-DCMAKE_CUDA_FLAGS=-allow-unsupported-compiler"

"-DCMAKE_EXE_LINKER_FLAGS=-Wl,-rpath-link,/usr/local/cuda/lib64/stubs -lcuda"

)

elif [ "$BACKEND" = "vulkan" ]; then

CMAKE_FLAGS+=(

-DGGML_CUDA=OFF

-DGGML_VULKAN=ON

)

else

echo "FATAL: unsupported BACKEND '$BACKEND' (expected cuda or vulkan)" >&2

exit 1

fi

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@docker/unified/install-llama.sh` around lines 32 - 45, The script currently only handles BACKEND="cuda" and "vulkan" so add a fail-fast else branch after the existing if/elif to error out for any other BACKEND; update install-llama.sh to check the BACKEND variable and call echo >&2 with a clear message like "Unsupported BACKEND: $BACKEND" and exit 1 if it's not "cuda" or "vulkan", and apply the same guard change to docker/unified/install-sd.sh; reference the BACKEND variable and the CMAKE_FLAGS assignment to locate where to insert the else { echo ...; exit 1 } branch.

coderabbitai · 2026-04-08T18:59:43Z

+    <div class="h-full" class:hidden={$currentRoute !== "/"}>
+      <Playground />
+    </div>
+    <div class="h-full" class:hidden={$currentRoute === "/"}>
+      <Router {routes} on:routeLoaded={handleRouteLoaded} />
+    </div>


⚠️ Potential issue | 🟠 Major

H1: Only mount Playground on the root route.

class:hidden just toggles CSS. On non-root pages, the whole playground subtree still mounts, so its state and effects stay live in the background. If the router needs to stay mounted for routeLoaded, you can still keep <Router> mounted and gate only <Playground>.

Suggested fix

<main class="flex-1 overflow-auto p-4"> - <div class="h-full" class:hidden={$currentRoute !== "/"}> - <Playground /> - </div> + {`#if` $currentRoute === "/"} + <div class="h-full"> + <Playground /> + </div> + {/if} <div class="h-full" class:hidden={$currentRoute === "/"}> <Router {routes} on:routeLoaded={handleRouteLoaded} /> </div> </main>

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

<div class="h-full" class:hidden={$currentRoute !== "/"}>

<Playground />

</div>

<div class="h-full" class:hidden={$currentRoute === "/"}>

<Router {routes} on:routeLoaded={handleRouteLoaded} />

</div>

{`#if` $currentRoute === "/"}

<div class="h-full">

<Playground />

</div>

{/if}

<div class="h-full" class:hidden={$currentRoute === "/"}>

<Router {routes} on:routeLoaded={handleRouteLoaded} />

</div>

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@ui-svelte/src/App.svelte` around lines 51 - 56, The Playground subtree is still mounted because class:hidden only hides it; change to conditionally render Playground so it only mounts on the root route: replace the div wrapping Playground with an {`#if` $currentRoute === "/"} block that renders <Playground /> (and remove the class:hidden usage), while leaving <Router {routes} on:routeLoaded={handleRouteLoaded} /> mounted as-is so routeLoaded still fires.

coderabbitai · 2026-04-08T18:59:43Z

+  $effect(() => {
+    playgroundStores.audioTranscribing.set(isTranscribing);
+  });


⚠️ Potential issue | 🟡 Minor

L1: Reset audioTranscribing when this component unmounts.

This effect only writes the current value. If the component is destroyed while isTranscribing is true, the global activity flag stays stuck true.

Suggested fix

$effect(() => { playgroundStores.audioTranscribing.set(isTranscribing); + return () => { + playgroundStores.audioTranscribing.set(false); + }; });

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

$effect(() => {

playgroundStores.audioTranscribing.set(isTranscribing);

});

$effect(() => {

playgroundStores.audioTranscribing.set(isTranscribing);

return () => {

playgroundStores.audioTranscribing.set(false);

};

});

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@ui-svelte/src/components/playground/AudioInterface.svelte` around lines 26 - 28, The effect setting playgroundStores.audioTranscribing from isTranscribing doesn't clear the flag on unmount, so add an unmount cleanup that resets playgroundStores.audioTranscribing to false: either return a cleanup function from the existing $effect that calls playgroundStores.audioTranscribing.set(false) or add an onDestroy(() => playgroundStores.audioTranscribing.set(false)) next to the $effect to ensure the global flag is cleared when the component is destroyed.

coderabbitai · 2026-04-08T18:59:43Z

+  // SDAPI lora state
+  let availableLoras = $state<SdApiLora[]>([]);
+  let selectedLoras = $state<SdApiLoraRef[]>([]);
+  let isLoadingLoras = $state(false);
+  let lorasLoaded = $state(false);
+  let loraError = $state<string | null>(null);


⚠️ Potential issue | 🟠 Major

M1 (Medium): Reset LoRA state when the selected model changes.

availableLoras, selectedLoras, and lorasLoaded persist across model switches, but selectedLoras is sent unchanged in the next SDAPI request. After loading LoRAs for model A, switching to model B can submit A’s LoRAs to B. Clear this state on model change, or key it by model.

Suggested fix

let availableLoras = $state<SdApiLora[]>([]); let selectedLoras = $state<SdApiLoraRef[]>([]); let isLoadingLoras = $state(false); let lorasLoaded = $state(false); let loraError = $state<string | null>(null); + + $effect(() => { + $selectedModelStore; + availableLoras = []; + selectedLoras = []; + lorasLoaded = false; + loraError = null; + });

Also applies to: 47-62, 99-114

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@ui-svelte/src/components/playground/ImageInterface.svelte` around lines 33 - 38, When the user changes models, clear or re-key the SDAPI LoRA state so LoRAs from the previous model aren’t sent to the new one: reset availableLoras, selectedLoras, lorasLoaded and loraError whenever the selected model (or its reactive store) changes; alternatively scope these stores by model id (e.g., a Map keyed by model name). Update the reactive blocks or watchers that load LoRAs (the code manipulating availableLoras, selectedLoras, isLoadingLoras, lorasLoaded, loraError — and the similar logic repeated in the other occurrences) to perform this reset before starting the new load and ensure selectedLoras is always tied to the current model.

coderabbitai · 2026-04-08T18:59:43Z

+    // Outside fences/math: blank line marks a complete boundary
+    if (trimmed === "") {
+      lastCompleteBoundary = i;


⚠️ Potential issue | 🟠 Major

H2 — Suffix-only rendering breaks container markdown.

A blank line is not a safe split point for all CommonMark containers. For example, - a\n\n- b and > a\n\n> b are still single list/blockquote structures when parsed together, but Line 218 renders the suffix in isolation, so the streamed final DOM can diverge from renderMarkdown(text). Please fall back to re-rendering the full complete section whenever the previous block can legally continue, or cache parsed top-level AST nodes instead of raw string slices.

Also applies to: 214-221

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@ui-svelte/src/lib/markdown.ts` around lines 120 - 122, The current logic that treats a blank trimmed line (where trimmed === "") as a safe split and sets lastCompleteBoundary = i can break CommonMark containers (e.g., lists/blockquote that continue across blank lines) and causes suffix-only rendering to diverge from renderMarkdown(text); change the split strategy so that before accepting a boundary you check whether the previous top-level block can legally continue (or whether the blank line is still part of the same container) and if it can, fall back to re-rendering the full complete section instead of only the suffix, or implement caching of parsed top-level AST nodes and use those to determine safe split points; update the code paths that compute lastCompleteBoundary, complete, and suffix and the code that calls renderMarkdown to use this container-aware decision (or the cached top-level AST) so streamed DOM matches full renderMarkdown output.

coderabbitai · 2026-04-08T18:59:43Z

+// Convert \[...\] to $$...$$ and \(...\) to $...$
+export function normalizeLatexDelimiters(text: string): string {
+  // Display math: \[...\] → $$...$$  (may span multiple lines)
+  text = text.replace(/\\\[([\s\S]*?)\\\]/g, (_match, inner) => `$$${inner}$$`);
+  // Inline math: \(...\) → $...$
+  text = text.replace(/\\\(([\s\S]*?)\\\)/g, (_match, inner) => `$${inner}$`);
+  return text;


⚠️ Potential issue | 🟡 Minor

M1 — Raw-string LaTeX normalization mutates code examples.

Line 254 normalizes $...$ and \[...\] before markdown parsing, so the same sequences inside fenced code blocks or inline code are rewritten into $...$ /$$...$$ and no longer render as literal code. This should run as a remark transform over plain text nodes only, skipping code, inlineCode, and existing math nodes.

Also applies to: 254-254

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@ui-svelte/src/lib/markdown.ts` around lines 239 - 245, normalizeLatexDelimiters currently does blind string replacements and mutates fenced code/inline code; replace it with a remark-style transform that walks the Markdown AST and performs the same regex substitutions only on plain text nodes (e.g., visit/text nodes) while explicitly skipping nodes of type "code", "inlineCode", "math", and "inlineMath". Update the function (or add a new remark plugin referenced by normalizeLatexDelimiters) to accept/transform an MDAST root, apply the two replacements on node.value for eligible text nodes, and return the root so markdown parsing preserves literal code blocks and existing math nodes.

The previous approach used a seen map that collapsed non-contiguous same-group entries into a single batch, reordering the preload sequence. This change preserves the original preload order by only batching models that appear consecutively for the same group. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

mostlygeek and others added 30 commits February 14, 2026 10:06

.github/workflows: fix expired token in publishing images (mostlygeek…

17e5263

…#522) Fixes: mostlygeek#517

ui-svelte: incremental rendering of chat messages in the Playground (m…

3e52144

…ostlygeek#520) add incremental rendering to Playground > Chat

ui: persist playground state across route navigation (mostlygeek#525)

e3bf065

- Keep Playground component mounted when navigating away, preserving streaming/generating state - Add animated gradient effect on Playground nav link when activity is in progress

.coderabbit.yaml,AGENTS.md: small tweaks

d33d51f

proxy, ui: add pending requests count to the main dashboard (mostlyge…

1688bdd

…ek#516) add a real time counter of pending (inflight) requests to the UI.

ui: add OGG audio format support to transcription playground (mostlyg…

2f377f6

…eek#544)

Enhance web UI section in README

aedb640

Updated README to enhance the description of the web interface and added details about features like token metrics, request inspection, model management, and real-time log streaming.

Revise README description for clarity and detail

175bb36

Updated description to clarify compatibility and usage.

Update README with additional images

2c07896

Added new images for model loading and real-time log streaming sections.

ui: fix text size svg

49546e2

proxy: fix metrics capture for v1/responses (mostlygeek#586)

c3c258a

properly parse anthropic compatible usage data from streaming responses. closes: mostlygeek#577

Update go.mod to 1.26.1 (mostlygeek#593)

a3725e7

.github/workflows,docker/unified: add cuda based unified container (m…

916d13f

…ostlygeek#597) Add Docker build scripts for a unified cuda docker container with llama-server, stable-diffusion.cpp, whisper.cpp.

.github,docker/unified: include vulkan build (mostlygeek#599)

e5e7391

Update docker/unified scripts to support building both cuda and vulkan unified images.

build(deps): bump picomatch from 4.0.3 to 4.0.4 in /ui-svelte (mostly…

6574a52

…geek#605)

docker/unified,.github: fix unified build (mostlygeek#606)

c794273

ci: fix matrix exclude for scheduled docker workflow (mostlygeek#610)

1e44077

mostlygeek and others added 8 commits April 1, 2026 03:38

docker/unified: build llama.cpp with static libraries (mostlygeek#616)

c2c8cfa

add /sdapi to list of supported endpoints

9559009

docker/unified: add ik_llama.cpp to CUDA container (mostlygeek#620)

1dd1aad

proxy: add configurable HTTP timeouts for models and peers (mostlygee…

a37b486

…k#619) Add configurable HTTP timeout settings to both models and peers to support installations that requires longer timeouts than the current hardcoded defaults. Closes mostlygeek#618

docker/unified: publish rootless image variant (mostlygeek#630)

d87f0ce

coderabbitai Bot reviewed Apr 8, 2026

View reviewed changes

Quentin-M force-pushed the parallel-preload branch from 78dbba7 to 0353306 Compare April 8, 2026 18:43

coderabbitai Bot reviewed Apr 8, 2026

View reviewed changes

	- typescript, vite and svelt5 for UI (located in ui/)
	- typescript, vite and svelte5 for UI (located in ui/)

Conversation

Quentin-M commented Apr 8, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Apr 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

Quentin-M commented Apr 8, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Apr 8, 2026 •

edited

Loading