Skip to content

proxy: parallelize startup preload within process groups#20

Open
Quentin-M wants to merge 40 commits into
napmany:mainfrom
Quentin-M:parallel-preload
Open

proxy: parallelize startup preload within process groups#20
Quentin-M wants to merge 40 commits into
napmany:mainfrom
Quentin-M:parallel-preload

Conversation

@Quentin-M

@Quentin-M Quentin-M commented Apr 8, 2026

Copy link
Copy Markdown

Models in the same group now preload concurrently via goroutines instead of sequentially. Groups are still loaded in order to respect exclusivity.

This significantly reduces startup time when a group has multiple models (e.g., an LLM + STT + TTS that share GPU resources). Each model's container starts and health-checks simultaneously.

Changes:

  • Group preload models by ProcessGroup before starting
  • Use sync.WaitGroup to parallelize within each group
  • Call swapProcessGroup once per group (handles exclusivity)
  • Log group completion with model count

Before: 3 models in same group → loaded sequentially (total = sum of load times)
After: 3 models in same group → loaded in parallel (total ≈ max load time)

Cross-ref: mostlygeek#637

Summary by CodeRabbit

  • Refactor

    • Faster, concurrent model preload by group and improved preload logging; new in‑flight request tracking surfaced to UI.
  • New Features

    • SDAPI image endpoints and LoRA listing; multi-image gallery and SD settings in the UI.
    • UI: in‑flight request counts, model aliases, rerank tool, improved chat persistence/streaming markdown, OGG audio uploads.
  • Configuration

    • Per‑model and per‑peer timeouts, global TTL, and per‑request parameter overrides by requested model ID.
  • Bug Fixes

    • More robust metrics parsing and proxy timeout handling; improved SSE inflight reporting.

mostlygeek and others added 30 commits February 14, 2026 10:06
…k#523)

## Summary
- Add `--provenance=false` to docker build commands in
`build-container.sh`
- BuildKit attestation manifests are stored as untagged images in GHCR,
and the `delete-untagged-containers` cleanup job deletes them, breaking
the manifest list and causing `manifest unknown` errors on pull
- ref: actions/delete-package-versions#162
- Keep Playground component mounted when navigating away, preserving
streaming/generating state
- Add animated gradient effect on Playground nav link when activity is
in progress
…ek#516)

add a real time counter of pending (inflight) requests to the UI.
Pause auto-scroll when the user scrolls up to review logs, and resume
when they scroll back to the bottom.

- add `userScrolledUp` state variable
- add `handleScroll` to detect scroll position with 40px threshold
- guard the auto-scroll effect with `!userScrolledUp`

closes mostlygeek#529
Add setParamsByID filter that applies different request parameters based
on the requested model ID, enabling per-alias behaviour for a single
loaded model.

- add SetParamsByID field to Filters struct and SanitizedSetParamsByID
method
- substitute ${MODEL_ID} and other macros in setParamsByID keys and
values
- validate no unknown macros remain in keys or values after substitution
- apply setParamsByID in proxyInferenceHandler after setParams (can
override it)
- update config-schema.json with setParamsByID definition
- update UI to show aliases and make them selectable in the Playground

closes mostlygeek#534
Add a new Rerank tab to the playground that lets users test /v1/rerank
endpoints. Supports a visual table editor and a JSON editor mode that
stay in sync when toggling between them.

- add rerankApi.ts with typed wrapper for /v1/rerank
- add RerankInterface.svelte with query input, sortable document table,
color-coded scores, auto-add row, cancel/clear, and token usage
- add rerankLoading store to playgroundActivity derived store
- register Rerank tab in Playground.svelte

Updates mostlygeek#481
Updated README to enhance the description of the web interface and added details about features like token metrics, request inspection, model management, and real-time log streaming.
Updated description to clarify compatibility and usage.
Added new images for model loading and real-time log streaming sections.
Add `cuda13` as a supported build architecture, targeting the
`ghcr.io/ggml-org/llama.cpp:server-cuda13` upstream base image.

The `server-cuda13` image ships with CUDA 13 libraries, providing
improved performance on recent NVIDIA hardware compared to the existing
`server-cuda` (CUDA 12) image. Users with newer GPUs (e.g., RTX
50-series) benefit from reduced model load latency and higher token
throughput.

- Add `cuda13` to the allowed architectures list in
`docker/build-container.sh`
- Add `cuda13` to the CI matrix in `.github/workflows/containers.yml` so
the container is built and pushed automatically
Add a copy-to-clipboard button that appears on hover for each code block
rendered in the chat interface assistant messages.

- Svelte action `codeBlockCopy` injects a button into every `<pre>`
element
- MutationObserver reattaches buttons as streaming content arrives
- Button shows a check icon for 2 seconds after a successful copy
- Uses clipboard API with execCommand fallback for non-secure contexts
- CSS hides button by default and reveals it on pre:hover

https://claude.ai/code/session_01PTA5ao5YQuFAS6a9juLeZW

---------

Co-authored-by: Claude <noreply@anthropic.com>
Add a new configuration parameter globalTTL that all models will
inherit. The default value is 0 which matches the currently
functionality to never automatically unload a model.

The model.ttl's default has changed to -1, which means use the global
TTL value. Any model.ttl >=0 is now value with 0 meaning never unload.
This allows a model to override a globalTTL > 0 and be configured to
never unload.

Fixes mostlygeek#459
Closes mostlygeek#512
…#578)

Extend macro substitution to the name and description fields of
ModelConfig, matching the behavior already present for cmd, proxy,
checkEndpoint, and filters.

- substitute global/model macros (including MODEL_ID) in name and
description
- substitute PORT macro in name and description when allocated
- validate no unknown macros remain in name and description after
substitution
- add tests for macro substitution, MODEL_ID, and unknown macro error
Use natural sorting for model names.

Previously the model list was sorted lexicographically, which resulted
in unintuitive ordering when numbers were included in the name.

Example:

Before
qwen3.5:2B
qwen3.5:35B-3AB
qwen3.5:9B

After
qwen3.5:2B
qwen3.5:9B
qwen3.5:35B-3AB

This change sorts models using natural order so numeric parts are
compared numerically.
Upgrade vite and related dependencies to take advantage of Vite 8's
improved build times via Rolldown and Oxc.

- vite: ^6.3.5 → ^8.0.0
- @sveltejs/vite-plugin-svelte: ^5.0.3 → ^7.0.0
- svelte: ^5.19.0 → ^5.46.4
- vite-plugin-compression2: ^2.4.0 → ^2.5.1
- vitest: ^4.0.18 → ^4.1.0

---------

Co-authored-by: Claude <noreply@anthropic.com>
properly parse anthropic compatible usage data from streaming responses.

closes: mostlygeek#577
Add proxy routes for stable-diffusion.cpp's /sdapi/v1/txt2img,
/sdapi/v1/img2img, and /sdapi/v1/loras endpoints. POST endpoints
use proxyInferenceHandler (model in JSON body), GET /loras uses
proxyGETModelHandler (model in query param).

Update the image playground with a dual-mode UI supporting both
OpenAI and SDAPI backends. In SDAPI mode, loras are fetched first
to prime the server-side cache, and all txt2img parameters are
exposed (negative prompt, steps, cfg_scale, seed, batch_size,
clip_skip, sampler, scheduler, lora selection with multipliers).

- Add 3 sdapi route registrations in proxymanager.go
- Add sdApi.ts client with generateSdImage and fetchSdLoras
- Add SDAPI types (SdApiTxt2ImgRequest, SdApiResponse, etc.)
- Add /sdapi to vite dev proxy config
- Add backend tests for sdapi routing
- Support batch image display in gallery grid

https://claude.ai/code/session_0186MGX6NXdHVBTv2KH45fqn

---------

Co-authored-by: Claude <noreply@anthropic.com>
…ostlygeek#597)

Add Docker build scripts for a unified cuda docker container with llama-server, stable-diffusion.cpp, whisper.cpp.
- set up a GHA scheduled job to build the container nightly 
- enabling pushing a llama-swap:unified and a llama-swap:unified-Y-M-D
image to ghcr.io
- tidy up Dockerfile to use a non-root user and llama-swap as an entry
point
Update docker/unified scripts to support building both cuda and vulkan unified images.
multiple fixes to vulkan build: 

- use ubuntu 26.04 to be compatible with AMD 395+ (Strix halo) hardware
- add home directory in container 
- fix stable-diffusion install to actually enable vulkan

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
mostlygeek and others added 8 commits April 1, 2026 03:38
…tlygeek#625)

Expose CMAKE_CUDA_ARCHITECTURES as a Docker build ARG so users can
customize CUDA architectures via --build-arg without editing the
Dockerfile.

- convert hardcoded ENV to ARG with default, feeding into ENV
- replace silent fallback defaults (:-) in scripts with :? guards
  to fail fast if the env var is missing
- add usage example to Dockerfile header

Follow up to: mostlygeek#624

https://claude.ai/code/session_01EWiUe7jNABX7Uz95dUGJqK

Co-authored-by: Claude <noreply@anthropic.com>
…ek#627)

Extend the existing config-schema workflow to also validate
config.example.yaml against config-schema.json using check-jsonschema.

- add config.example.yaml to PR and push path triggers
- install check-jsonschema via pip
- run validation of config.example.yaml against schema

https://claude.ai/code/session_01Y1oqwE6mwNs9UTJgZRgXtG

---------

Co-authored-by: Claude <noreply@anthropic.com>
…k#619)

Add configurable HTTP timeout settings to both models and peers to support installations that requires longer timeouts than the current hardcoded defaults.

Closes mostlygeek#618
Keep request duration from being underreported when upstream timings
only cover part of the full request lifecycle.

- compare wall-clock and upstream timing durations
- keep token and throughput values from timings
- add regression coverage for underreported timings

fixes mostlygeek#602
@coderabbitai

coderabbitai Bot commented Apr 8, 2026

Copy link
Copy Markdown

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 850f0800-9b97-48ed-b3ca-03482ac0d62b

📥 Commits

Reviewing files that changed from the base of the PR and between 0353306 and 8e00a60.

📒 Files selected for processing (1)
  • proxy/proxymanager.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • proxy/proxymanager.go

Walkthrough

Startup preload groups models by process-group and starts each group concurrently; added a thread-safe InflightCounter and Gin trackInflight middleware emitting SSE inflight events; added SDAPI routes; introduced per-model and per-peer timeouts, globalTTL, and filters.setParamsByID; multiple UI, Docker, CI, and test updates.

Changes

Cohort / File(s) Summary
Proxy manager & runtime
proxy/proxymanager.go, proxy/proxymanager_api.go, proxy/proxymanager_test.go, proxy/process.go, proxy/process_test.go
Added InflightCounter, trackInflight() middleware, SSE inflight emission, SDAPI route wiring, and reworked startup preload grouping/parallel start; updated API model Aliases and tests.
Config, schema & filters
proxy/config/..., config-schema.json, config.example.yaml, proxy/config/*_test.go
Added globalTTL and MODEL_CONFIG_DEFAULT_TTL sentinel, TimeoutsConfig for models/peers, filters.setParamsByID (with macro/alias handling), validation and tests; schema/example updated.
Reverse proxy & transports
proxy/peerproxy.go, proxy/peerproxy_test.go, proxy/process.go
Replaced shared transport with per-peer/per-process http.Transport configured from TimeoutsConfig, enabled HTTP/2 attempt, and adjusted tests.
Metrics, events
proxy/metrics_monitor.go, proxy/metrics_monitor_test.go, proxy/events.go
Adjusted metrics parsing (v1/responses fallback) and duration calculation; added InFlightRequestsEvent and integrated SSE emission.
Routing / SD API / simple-responder
cmd/simple-responder/simple-responder.go, proxy/proxymanager.go, proxy/proxymanager_test.go
Added /sdapi/v1/* endpoints, applied SanitizedSetParamsByID in inference flow, and added SDAPI routing tests and error handling.
UI: inflight, aliases, SD, streaming markdown, rerank
ui-svelte/... (e.g. src/stores/api.ts, src/lib/sdApi.ts, src/lib/markdown.ts, src/components/*, src/routes/*, vite.config.ts)
Added inflight store/UI surfaces, model aliases support, SDAPI client/UI, streaming-markdown pipeline + tests, rerank UI and API client, route/route-store changes, and many UX improvements.
Docker, build scripts & CI
.github/workflows/*, docker/*, docker/unified/*, docker/build-image.sh, Makefile
Added unified Dockerfiles and installers, docker/build-image.sh, cuda13 target, CI workflow updates, and packaging / provenance tweaks.
Misc, docs & tests
AGENTS.md, README.md, CLAUDE.md, go.mod, many tests across proxy/* and UI tests
Documentation updates, Go version bump, widespread test additions/adjustments to cover new TTLs/timeouts/filters/SDAPI/metrics/UI behavior.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 17.74% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title 'proxy: parallelize startup preload within process groups' directly describes the main change in the changeset: parallelizing model preload within process groups while maintaining group-level exclusivity.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
⚔️ Resolve merge conflicts
  • Resolve merge conflict in branch parallel-preload

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Models in the same group now preload concurrently via goroutines instead
of sequentially. Groups are still loaded in order to respect exclusivity.

This significantly reduces startup time when a group has multiple models
(e.g., an LLM + STT + TTS that share GPU resources). Each model's
container starts and health-checks simultaneously.

- Group preload models by ProcessGroup before starting
- Use sync.WaitGroup to parallelize within each group
- Call swapProcessGroup once per group (handles exclusivity)
- Log group completion with model count

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@proxy/proxymanager.go`:
- Around line 227-230: The current preload logic ignores errors from
http.NewRequest and processGroup.ProxyRequest and always emits
ModelPreloadedEvent{Success: true}; fix by checking and handling both errors:
capture the error returned by http.NewRequest and by
processGroup.ProxyRequest(mid, &DiscardWriter{}, req), only emit
event.Emit(ModelPreloadedEvent{ModelName: mid, Success: true}) and call
proxyLogger.Infof when ProxyRequest returns nil, otherwise emit
ModelPreloadedEvent{ModelName: mid, Success: false} (include the error) and log
proxyLogger.Errorf with the error; ensure you reference ProxyRequest,
DiscardWriter, ModelPreloadedEvent, event.Emit and
proxyLogger.Infof/proxyLogger.Errorf when making the change.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 0b7a1a88-3621-42f7-9618-df4413efb758

📥 Commits

Reviewing files that changed from the base of the PR and between 1c24bb1 and 78dbba7.

📒 Files selected for processing (1)
  • proxy/proxymanager.go

Comment thread proxy/proxymanager.go
Comment on lines +227 to +230
req, _ := http.NewRequest("GET", "/", nil)
processGroup.ProxyRequest(mid, &DiscardWriter{}, req)
event.Emit(ModelPreloadedEvent{ModelName: mid, Success: true})
proxyLogger.Infof("Preloaded model: %s", mid)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Handle ProxyRequest errors before emitting success events.

Line 228 ignores the preload error, but Line 229 always emits Success: true. This can incorrectly report failed preloads as successful.

Suggested fix
-						req, _ := http.NewRequest("GET", "/", nil)
-						processGroup.ProxyRequest(mid, &DiscardWriter{}, req)
-						event.Emit(ModelPreloadedEvent{ModelName: mid, Success: true})
-						proxyLogger.Infof("Preloaded model: %s", mid)
+						req, _ := http.NewRequest("GET", "/", nil)
+						if err := processGroup.ProxyRequest(mid, &DiscardWriter{}, req); err != nil {
+							event.Emit(ModelPreloadedEvent{ModelName: mid, Success: false})
+							proxyLogger.Errorf("Failed to preload model %s: %v", mid, err)
+							return
+						}
+						event.Emit(ModelPreloadedEvent{ModelName: mid, Success: true})
+						proxyLogger.Infof("Preloaded model: %s", mid)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
req, _ := http.NewRequest("GET", "/", nil)
processGroup.ProxyRequest(mid, &DiscardWriter{}, req)
event.Emit(ModelPreloadedEvent{ModelName: mid, Success: true})
proxyLogger.Infof("Preloaded model: %s", mid)
req, _ := http.NewRequest("GET", "/", nil)
if err := processGroup.ProxyRequest(mid, &DiscardWriter{}, req); err != nil {
event.Emit(ModelPreloadedEvent{ModelName: mid, Success: false})
proxyLogger.Errorf("Failed to preload model %s: %v", mid, err)
return
}
event.Emit(ModelPreloadedEvent{ModelName: mid, Success: true})
proxyLogger.Infof("Preloaded model: %s", mid)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@proxy/proxymanager.go` around lines 227 - 230, The current preload logic
ignores errors from http.NewRequest and processGroup.ProxyRequest and always
emits ModelPreloadedEvent{Success: true}; fix by checking and handling both
errors: capture the error returned by http.NewRequest and by
processGroup.ProxyRequest(mid, &DiscardWriter{}, req), only emit
event.Emit(ModelPreloadedEvent{ModelName: mid, Success: true}) and call
proxyLogger.Infof when ProxyRequest returns nil, otherwise emit
ModelPreloadedEvent{ModelName: mid, Success: false} (include the error) and log
proxyLogger.Errorf with the error; ensure you reference ProxyRequest,
DiscardWriter, ModelPreloadedEvent, event.Emit and
proxyLogger.Infof/proxyLogger.Errorf when making the change.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 13

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
.github/workflows/go-ci.yml (2)

49-67: ⚠️ Potential issue | 🟡 Minor

Cache key mismatch prevents cache reuse.

Line 54 uses cmd/simple-responder/simple-responder.go for the restore step, but line 67 still uses misc/simple-responder/simple-responder.go for the save step. Since the file exists at cmd/simple-responder/simple-responder.go, the cache key will always differ between restore and save, preventing any cache hit. Update line 67 to hash cmd/simple-responder/simple-responder.go.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/go-ci.yml around lines 49 - 67, The cache keys for the
restore and save steps differ: the "restore-simple-responder" step hashes
'cmd/simple-responder/simple-responder.go' but the "save-simple-responder" step
hashes 'misc/simple-responder/simple-responder.go', preventing cache reuse;
update the key in the save step (save-simple-responder) to use the same
hashFiles path 'cmd/simple-responder/simple-responder.go' so both steps compute
identical cache keys.

49-58: ⚠️ Potential issue | 🟠 Major

Fix path filter to include simple-responder changes, and align cache keys.

This job now depends on cmd/simple-responder/simple-responder.go (via the cache restore key), but the workflow excludes all cmd/** paths. A PR that only modifies this file will skip CI even though the job needs to rebuild it for tests.

Additionally, the restore cache key hashes cmd/simple-responder/simple-responder.go while the save step hashes misc/simple-responder/simple-responder.go—these will never match, so the cache is non-functional and always rebuilds.

Fix: Either whitelist cmd/simple-responder/simple-responder.go in the path filters and align the cache keys to use the same source file, or if this helper truly needs no testing, document why the cache key should not follow the source.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/go-ci.yml around lines 49 - 58, The workflow excludes
cmd/** changes but the restore/save cache keys and this job depend on
cmd/simple-responder/simple-responder.go, causing CI to skip and cache misses;
update the workflow path filter to include cmd/simple-responder/** (or
specifically cmd/simple-responder/simple-responder.go) so PRs touching that file
trigger the job, and make the cache key paths consistent by changing either the
restore key or the save key so both reference the same source file (ensure the
cache step identified as restore-simple-responder and the corresponding save
step use identical hashFiles('cmd/simple-responder/simple-responder.go') or both
point to misc/... if you intend that path).
♻️ Duplicate comments (1)
proxy/proxymanager.go (1)

265-268: ⚠️ Potential issue | 🔴 Critical

H1: Emit preload success only after the preload call succeeds.

Line 265/266 still drops both request-construction and ProxyRequest errors, while Line 267 always emits Success: true. A failed startup will still look preloaded to observers.

Suggested fix
-						req, _ := http.NewRequest("GET", "/", nil)
-						processGroup.ProxyRequest(mid, &DiscardWriter{}, req)
-						event.Emit(ModelPreloadedEvent{ModelName: mid, Success: true})
-						proxyLogger.Infof("Preloaded model: %s", mid)
+						req, err := http.NewRequest("GET", "/", nil)
+						if err != nil {
+							event.Emit(ModelPreloadedEvent{ModelName: mid, Success: false})
+							proxyLogger.Errorf("Failed to preload model %s: %v", mid, err)
+							return
+						}
+						if err := processGroup.ProxyRequest(mid, &DiscardWriter{}, req); err != nil {
+							event.Emit(ModelPreloadedEvent{ModelName: mid, Success: false})
+							proxyLogger.Errorf("Failed to preload model %s: %v", mid, err)
+							return
+						}
+						event.Emit(ModelPreloadedEvent{ModelName: mid, Success: true})
+						proxyLogger.Infof("Preloaded model: %s", mid)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@proxy/proxymanager.go` around lines 265 - 268, The code currently ignores
errors from http.NewRequest and processGroup.ProxyRequest and always emits
ModelPreloadedEvent{ModelName: mid, Success: true}; change this so you capture
and check both errors: call http.NewRequest and handle its error, call
processGroup.ProxyRequest and capture any returned error, only emit
ModelPreloadedEvent with Success: true if both succeed; on failure, emit
Success: false and log the error via proxyLogger.Errorf (include mid and error),
and avoid treating the preload as successful when ProxyRequest or request
creation fails.
🧹 Nitpick comments (7)
AGENTS.md (1)

30-40: Consider adding a language identifier to the fenced code block.

The commit message example code block lacks a language specifier (e.g., text or markdown), which triggers a markdown lint warning.

Proposed fix
-```
+```text
 proxy: add new feature
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@AGENTS.md` around lines 30 - 40, The fenced code block showing the example
commit message in AGENTS.md is missing a language identifier and triggers a
markdown lint warning; update that block (the triple-backtick block containing
"proxy: add new feature" and the list) to include a language tag such as text or
markdown (e.g., change ``` to ```text) so the linter recognizes the block type
and the warning is resolved.
proxy/peerproxy.go (1)

40-54: Use http.DefaultTransport.Clone() to start the transport configuration.

A fresh http.Transport drops every stdlib default you did not remember to re-specify. Cloning the default transport first preserves Go's baseline behavior (Proxy, ForceAttemptHTTP2, MaxIdleConns, etc.) and lets you override only the peer-specific timeouts and MaxIdleConnsPerHost.

Suggested refactor
-		peerTransport := &http.Transport{
-			Proxy: http.ProxyFromEnvironment,
-			DialContext: (&net.Dialer{
-				Timeout:   time.Duration(peer.Timeouts.Connect) * time.Second,
-				KeepAlive: 30 * time.Second,
-			}).DialContext,
-			TLSHandshakeTimeout:   time.Duration(peer.Timeouts.TLSHandshake) * time.Second,
-			ResponseHeaderTimeout: time.Duration(peer.Timeouts.ResponseHeader) * time.Second,
-			ExpectContinueTimeout: time.Duration(peer.Timeouts.ExpectContinue) * time.Second,
-			ForceAttemptHTTP2:     true,
-			MaxIdleConns:          100,
-			MaxIdleConnsPerHost:   10,
-			IdleConnTimeout:       time.Duration(peer.Timeouts.IdleConn) * time.Second,
-		}
+		peerTransport := http.DefaultTransport.(*http.Transport).Clone()
+		peerTransport.DialContext = (&net.Dialer{
+			Timeout:   time.Duration(peer.Timeouts.Connect) * time.Second,
+			KeepAlive: 30 * time.Second,
+		}).DialContext
+		peerTransport.TLSHandshakeTimeout = time.Duration(peer.Timeouts.TLSHandshake) * time.Second
+		peerTransport.ResponseHeaderTimeout = time.Duration(peer.Timeouts.ResponseHeader) * time.Second
+		peerTransport.ExpectContinueTimeout = time.Duration(peer.Timeouts.ExpectContinue) * time.Second
+		peerTransport.MaxIdleConnsPerHost = 10
+		peerTransport.IdleConnTimeout = time.Duration(peer.Timeouts.IdleConn) * time.Second
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@proxy/peerproxy.go` around lines 40 - 54, The transport creation currently
constructs a fresh http.Transport and loses stdlib defaults; change it to start
from http.DefaultTransport.Clone() (type-assert to *http.Transport) and then
override only peer-specific fields: set DialContext using a net.Dialer with
Timeout and KeepAlive, and set TLSHandshakeTimeout, ResponseHeaderTimeout,
ExpectContinueTimeout, IdleConnTimeout, and MaxIdleConnsPerHost as before;
update the variable peerTransport in peerproxy.go to use the cloned transport so
defaults like Proxy, ForceAttemptHTTP2, and MaxIdleConns are preserved.
proxy/process_test.go (1)

588-588: Variable shadows module-level debugLogger.

The local debugLogger shadows the package-level variable. Consider using a different name like testLogger for clarity.

🔧 Suggested fix
-	debugLogger := NewLogMonitorWriter(io.Discard)
-	process := NewProcess("test-model", 30, modelConfig, debugLogger, debugLogger)
+	testLogger := NewLogMonitorWriter(io.Discard)
+	process := NewProcess("test-model", 30, modelConfig, testLogger, testLogger)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@proxy/process_test.go` at line 588, The test creates a local variable named
debugLogger that shadows the package-level debugLogger; rename the local
variable (e.g., testLogger) where NewLogMonitorWriter(io.Discard) is called in
process_test.go to avoid shadowing and potential confusion and update any
subsequent references in the test to use the new name (reference: debugLogger,
NewLogMonitorWriter).
ui-svelte/src/stores/api.ts (1)

65-65: Minor formatting inconsistency.

There's an extra space before the colon in the localeCompare options.

🔧 Suggested fix
-              return (a.name + a.id).localeCompare(b.name + b.id, undefined, { numeric : true} );
+              return (a.name + a.id).localeCompare(b.name + b.id, undefined, { numeric: true });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ui-svelte/src/stores/api.ts` at line 65, Fix the spacing inside the options
object passed to String.prototype.localeCompare: change the options from "{
numeric : true}" to "{ numeric: true }" in the expression "(a.name +
a.id).localeCompare(b.name + b.id, undefined, { numeric : true} )" so the colon
has no leading space and add the closing-space before the final parenthesis for
consistent formatting.
proxy/config/model_config.go (1)

8-10: Consider idiomatic Go constant naming.

Go convention uses MixedCaps or mixedCaps for constants rather than SCREAMING_SNAKE_CASE. Consider renaming to ModelConfigDefaultTTL or defaultModelTTL.

💡 Suggested change
 const (
-	MODEL_CONFIG_DEFAULT_TTL = -1
+	ModelConfigDefaultTTL = -1
 )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@proxy/config/model_config.go` around lines 8 - 10, The constant
MODEL_CONFIG_DEFAULT_TTL uses SCREAMING_SNAKE_CASE; rename it to follow Go
idioms (e.g., ModelConfigDefaultTTL for exported or defaultModelTTL for
package-private) and update all references, including any usages in functions,
tests, and imports in this package so compilation succeeds; ensure
documentation/comments and any JSON/tags that relied on the old name are
adjusted accordingly.
ui-svelte/src/components/playground/ChatInterface.svelte (1)

60-75: Consider persisting on component unmount.

The throttled save works well during normal operation, but if the user navigates away before the 2-second timer fires, the latest messages may be lost. Consider adding an onDestroy or ensuring a final save.

💡 Suggested improvement
import { onDestroy } from "svelte";

// ... existing code ...

onDestroy(() => {
  try { localStorage.setItem("playground-messages", JSON.stringify(messages)); } catch {}
});
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ui-svelte/src/components/playground/ChatInterface.svelte` around lines 60 -
75, The throttled persistence currently using $effect, lastSaveTime, and the
save() closure can miss the latest messages if the component unmounts before the
timer fires; add a final flush in an onDestroy handler that calls the same save
logic (writing JSON.stringify(messages) to "playground-messages" via
localStorage) to guarantee persistence on teardown, or extract the save
functionality into a shared function and invoke it both from the throttled timer
and from onDestroy to avoid duplication.
docker/unified/install-llama-swap.sh (1)

30-39: Consider GitHub API rate limiting.

The unauthenticated GitHub API request on line 32-33 is subject to rate limiting (60 requests/hour per IP). In CI environments with shared IPs, this could cause intermittent failures.

Consider using $GITHUB_TOKEN if available:

💡 Suggested improvement
 if [ "$VERSION" = "latest" ]; then
     echo "=== Resolving latest llama-swap release ==="
-    VERSION=$(curl -fsSL "https://api.github.com/repos/${REPO}/releases/latest" \
+    AUTH_HEADER=""
+    if [ -n "$GITHUB_TOKEN" ]; then
+        AUTH_HEADER="Authorization: token $GITHUB_TOKEN"
+    fi
+    VERSION=$(curl -fsSL ${AUTH_HEADER:+-H "$AUTH_HEADER"} "https://api.github.com/repos/${REPO}/releases/latest" \
         | grep '"tag_name"' | head -1 | cut -d'"' -f4 | sed 's/^v//')
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker/unified/install-llama-swap.sh` around lines 30 - 39, The
unauthenticated curl call that resolves the latest release (using VERSION, REPO
and the curl pipeline) can hit GitHub rate limits; update the script to use a
GITHUB_TOKEN when present by adding an Authorization header to the curl request
(e.g., set header only if $GITHUB_TOKEN is non-empty) and fall back to the
unauthenticated request otherwise, keeping the same parsing of tag_name and the
existing error handling for an empty VERSION.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/unified-docker.yml:
- Around line 52-61: The matrix serialization currently produces an empty-string
entry when the backends array is empty; modify the logic around backends and
matrix so that if backends is empty you write an explicit empty JSON array
("[]") to GITHUB_OUTPUT instead of serializing an empty element. Locate the
backends array and the matrix generation (variables backends and matrix, and the
echo "matrix=$matrix" >> $GITHUB_OUTPUT) and add a conditional: if backends has
no elements set matrix='[]' else build the JSON with printf | jq as before, then
echo matrix to GITHUB_OUTPUT.

In `@AGENTS.md`:
- Line 8: Update the typo "svelt5" to "svelte5" in the documentation string (the
phrase "typescript, vite and svelt5 for UI (located in ui/)") so the UI tech
list correctly reads "typescript, vite and svelte5 for UI"; ensure only the
spelling is changed and no other text is modified.

In `@docker/build-image.sh`:
- Around line 117-128: The get_default_branch function currently probes branches
with git ls-remote --heads which returns success even if the ref is absent;
update get_default_branch to perform reliable detection by either adding
--exit-code to each git ls-remote --heads probe (e.g., git ls-remote --exit-code
--heads ...) so the if/elif properly fails when the branch is missing, or
alternatively query the remote HEAD symref (git ls-remote --symref <repo_url>
HEAD) and parse the target to determine the default branch; ensure the function
still falls back predictably (used by get_latest_commit) when detection fails.

In `@docker/unified/build-image.sh`:
- Around line 177-179: The script is calling resolve_ref("${LLAMA_SWAP_REPO}"
"${LS_VERSION}") which rejects user-friendly selectors like "latest" or bare
release numbers, causing help/examples to fail; update build-image.sh so that
when LS_VERSION is set the script either (a) skips resolving and passes the raw
LS_VERSION through to the build (i.e. set LS_HASH="${LS_VERSION}" when
resolve_ref would reject it) or (b) validate/normalize LS_VERSION up front and
restrict the help text to only document real git refs; modify the logic around
LS_VERSION/LS_HASH and the resolve_ref invocation (and update any help text) so
LLAMA_SWAP_REPO uses a usable ref for docker buildx instead of aborting for
"latest"/numeric values.

In `@docker/unified/install-llama.sh`:
- Around line 32-45: The script currently only handles BACKEND="cuda" and
"vulkan" so add a fail-fast else branch after the existing if/elif to error out
for any other BACKEND; update install-llama.sh to check the BACKEND variable and
call echo >&2 with a clear message like "Unsupported BACKEND: $BACKEND" and exit
1 if it's not "cuda" or "vulkan", and apply the same guard change to
docker/unified/install-sd.sh; reference the BACKEND variable and the CMAKE_FLAGS
assignment to locate where to insert the else { echo ...; exit 1 } branch.

In `@Makefile`:
- Line 54: The Makefile currently has the linux/arm64 build command commented
out (the GOOS=linux GOARCH=arm64 go build ... line), which prevents generation
of the linux-arm64 artifact expected by .goreleaser.yaml and scripts/install.sh;
restore that artifact by uncommenting or re-adding the ARM64 build command that
uses GOOS=linux GOARCH=arm64 with the same -ldflags (-X main.commit=${GIT_HASH}
-X main.version=local_${GIT_HASH} -X main.date=${BUILD_DATE}) and output pattern
($(BUILD_DIR)/$(APP_NAME)-linux-arm64), and ensure the linux build target still
invokes both the amd64 and arm64 build commands so `make linux` produces both
artifacts.

In `@proxy/config/config.go`:
- Around line 315-329: The rebuild of modelConfig.Filters.SetParamsByID can
silently overwrite entries when two different source keys expand to the same
newKey; after computing newKey (using macroSlug/macroStr and
substituteMacroInValue with entry.Name/entry.Value) but before assigning into
newSetParamsByID, check whether newKey already exists in newSetParamsByID and if
so return a Config error referencing modelId and the colliding key(s); do this
check inside the loop that constructs newSetParamsByID so collisions are
rejected early rather than overwriting.

In `@proxy/process.go`:
- Around line 99-115: checkHealthEndpoint() currently uses hard-coded timeouts
causing startup health checks to fail despite transport using config.Timeouts;
modify checkHealthEndpoint() to create its http.Client with a Transport that
applies the same timeout settings from config.Timeouts (same DialContext
timeout, TLSHandshakeTimeout, ResponseHeaderTimeout, ExpectContinueTimeout,
IdleConnTimeout, etc.) or reuse the existing transport instance used for
reverseProxy (transport) so the health probe uses identical connection and
response timeouts as the proxied requests.

In `@ui-svelte/src/App.svelte`:
- Around line 51-56: The Playground subtree is still mounted because
class:hidden only hides it; change to conditionally render Playground so it only
mounts on the root route: replace the div wrapping Playground with an {`#if`
$currentRoute === "/"} block that renders <Playground /> (and remove the
class:hidden usage), while leaving <Router {routes}
on:routeLoaded={handleRouteLoaded} /> mounted as-is so routeLoaded still fires.

In `@ui-svelte/src/components/playground/AudioInterface.svelte`:
- Around line 26-28: The effect setting playgroundStores.audioTranscribing from
isTranscribing doesn't clear the flag on unmount, so add an unmount cleanup that
resets playgroundStores.audioTranscribing to false: either return a cleanup
function from the existing $effect that calls
playgroundStores.audioTranscribing.set(false) or add an onDestroy(() =>
playgroundStores.audioTranscribing.set(false)) next to the $effect to ensure the
global flag is cleared when the component is destroyed.

In `@ui-svelte/src/components/playground/ImageInterface.svelte`:
- Around line 33-38: When the user changes models, clear or re-key the SDAPI
LoRA state so LoRAs from the previous model aren’t sent to the new one: reset
availableLoras, selectedLoras, lorasLoaded and loraError whenever the selected
model (or its reactive store) changes; alternatively scope these stores by model
id (e.g., a Map keyed by model name). Update the reactive blocks or watchers
that load LoRAs (the code manipulating availableLoras, selectedLoras,
isLoadingLoras, lorasLoaded, loraError — and the similar logic repeated in the
other occurrences) to perform this reset before starting the new load and ensure
selectedLoras is always tied to the current model.

In `@ui-svelte/src/lib/markdown.ts`:
- Around line 239-245: normalizeLatexDelimiters currently does blind string
replacements and mutates fenced code/inline code; replace it with a remark-style
transform that walks the Markdown AST and performs the same regex substitutions
only on plain text nodes (e.g., visit/text nodes) while explicitly skipping
nodes of type "code", "inlineCode", "math", and "inlineMath". Update the
function (or add a new remark plugin referenced by normalizeLatexDelimiters) to
accept/transform an MDAST root, apply the two replacements on node.value for
eligible text nodes, and return the root so markdown parsing preserves literal
code blocks and existing math nodes.
- Around line 120-122: The current logic that treats a blank trimmed line (where
trimmed === "") as a safe split and sets lastCompleteBoundary = i can break
CommonMark containers (e.g., lists/blockquote that continue across blank lines)
and causes suffix-only rendering to diverge from renderMarkdown(text); change
the split strategy so that before accepting a boundary you check whether the
previous top-level block can legally continue (or whether the blank line is
still part of the same container) and if it can, fall back to re-rendering the
full complete section instead of only the suffix, or implement caching of parsed
top-level AST nodes and use those to determine safe split points; update the
code paths that compute lastCompleteBoundary, complete, and suffix and the code
that calls renderMarkdown to use this container-aware decision (or the cached
top-level AST) so streamed DOM matches full renderMarkdown output.

---

Outside diff comments:
In @.github/workflows/go-ci.yml:
- Around line 49-67: The cache keys for the restore and save steps differ: the
"restore-simple-responder" step hashes
'cmd/simple-responder/simple-responder.go' but the "save-simple-responder" step
hashes 'misc/simple-responder/simple-responder.go', preventing cache reuse;
update the key in the save step (save-simple-responder) to use the same
hashFiles path 'cmd/simple-responder/simple-responder.go' so both steps compute
identical cache keys.
- Around line 49-58: The workflow excludes cmd/** changes but the restore/save
cache keys and this job depend on cmd/simple-responder/simple-responder.go,
causing CI to skip and cache misses; update the workflow path filter to include
cmd/simple-responder/** (or specifically
cmd/simple-responder/simple-responder.go) so PRs touching that file trigger the
job, and make the cache key paths consistent by changing either the restore key
or the save key so both reference the same source file (ensure the cache step
identified as restore-simple-responder and the corresponding save step use
identical hashFiles('cmd/simple-responder/simple-responder.go') or both point to
misc/... if you intend that path).

---

Duplicate comments:
In `@proxy/proxymanager.go`:
- Around line 265-268: The code currently ignores errors from http.NewRequest
and processGroup.ProxyRequest and always emits ModelPreloadedEvent{ModelName:
mid, Success: true}; change this so you capture and check both errors: call
http.NewRequest and handle its error, call processGroup.ProxyRequest and capture
any returned error, only emit ModelPreloadedEvent with Success: true if both
succeed; on failure, emit Success: false and log the error via
proxyLogger.Errorf (include mid and error), and avoid treating the preload as
successful when ProxyRequest or request creation fails.

---

Nitpick comments:
In `@AGENTS.md`:
- Around line 30-40: The fenced code block showing the example commit message in
AGENTS.md is missing a language identifier and triggers a markdown lint warning;
update that block (the triple-backtick block containing "proxy: add new feature"
and the list) to include a language tag such as text or markdown (e.g., change
``` to ```text) so the linter recognizes the block type and the warning is
resolved.

In `@docker/unified/install-llama-swap.sh`:
- Around line 30-39: The unauthenticated curl call that resolves the latest
release (using VERSION, REPO and the curl pipeline) can hit GitHub rate limits;
update the script to use a GITHUB_TOKEN when present by adding an Authorization
header to the curl request (e.g., set header only if $GITHUB_TOKEN is non-empty)
and fall back to the unauthenticated request otherwise, keeping the same parsing
of tag_name and the existing error handling for an empty VERSION.

In `@proxy/config/model_config.go`:
- Around line 8-10: The constant MODEL_CONFIG_DEFAULT_TTL uses
SCREAMING_SNAKE_CASE; rename it to follow Go idioms (e.g., ModelConfigDefaultTTL
for exported or defaultModelTTL for package-private) and update all references,
including any usages in functions, tests, and imports in this package so
compilation succeeds; ensure documentation/comments and any JSON/tags that
relied on the old name are adjusted accordingly.

In `@proxy/peerproxy.go`:
- Around line 40-54: The transport creation currently constructs a fresh
http.Transport and loses stdlib defaults; change it to start from
http.DefaultTransport.Clone() (type-assert to *http.Transport) and then override
only peer-specific fields: set DialContext using a net.Dialer with Timeout and
KeepAlive, and set TLSHandshakeTimeout, ResponseHeaderTimeout,
ExpectContinueTimeout, IdleConnTimeout, and MaxIdleConnsPerHost as before;
update the variable peerTransport in peerproxy.go to use the cloned transport so
defaults like Proxy, ForceAttemptHTTP2, and MaxIdleConns are preserved.

In `@proxy/process_test.go`:
- Line 588: The test creates a local variable named debugLogger that shadows the
package-level debugLogger; rename the local variable (e.g., testLogger) where
NewLogMonitorWriter(io.Discard) is called in process_test.go to avoid shadowing
and potential confusion and update any subsequent references in the test to use
the new name (reference: debugLogger, NewLogMonitorWriter).

In `@ui-svelte/src/components/playground/ChatInterface.svelte`:
- Around line 60-75: The throttled persistence currently using $effect,
lastSaveTime, and the save() closure can miss the latest messages if the
component unmounts before the timer fires; add a final flush in an onDestroy
handler that calls the same save logic (writing JSON.stringify(messages) to
"playground-messages" via localStorage) to guarantee persistence on teardown, or
extract the save functionality into a shared function and invoke it both from
the throttled timer and from onDestroy to avoid duplication.

In `@ui-svelte/src/stores/api.ts`:
- Line 65: Fix the spacing inside the options object passed to
String.prototype.localeCompare: change the options from "{ numeric : true}" to
"{ numeric: true }" in the expression "(a.name + a.id).localeCompare(b.name +
b.id, undefined, { numeric : true} )" so the colon has no leading space and add
the closing-space before the final parenthesis for consistent formatting.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b3e4dd0d-49c4-46ac-8ac3-97ca408fb785

📥 Commits

Reviewing files that changed from the base of the PR and between 78dbba7 and 0353306.

⛔ Files ignored due to path filters (1)
  • ui-svelte/package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (70)
  • .coderabbit.yaml
  • .github/workflows/config-schema.yml
  • .github/workflows/containers.yml
  • .github/workflows/go-ci.yml
  • .github/workflows/unified-docker.yml
  • AGENTS.md
  • CLAUDE.md
  • Makefile
  • README.md
  • cmd/simple-responder/simple-responder.go
  • config-schema.json
  • config.example.yaml
  • docker/build-container.sh
  • docker/build-image.sh
  • docker/unified/Dockerfile
  • docker/unified/README.md
  • docker/unified/build-image.sh
  • docker/unified/config.example.yaml
  • docker/unified/install-ik-llama.sh
  • docker/unified/install-llama-swap.sh
  • docker/unified/install-llama.sh
  • docker/unified/install-sd.sh
  • docker/unified/install-whisper.sh
  • docs/configuration.md
  • go.mod
  • proxy/config/config.go
  • proxy/config/config_posix_test.go
  • proxy/config/config_test.go
  • proxy/config/config_windows_test.go
  • proxy/config/filters.go
  • proxy/config/filters_test.go
  • proxy/config/macro_in_macro_test.go
  • proxy/config/model_config.go
  • proxy/config/model_config_test.go
  • proxy/config/peer.go
  • proxy/events.go
  • proxy/metrics_monitor.go
  • proxy/metrics_monitor_test.go
  • proxy/peerproxy.go
  • proxy/peerproxy_test.go
  • proxy/process.go
  • proxy/process_test.go
  • proxy/proxymanager.go
  • proxy/proxymanager_api.go
  • proxy/proxymanager_test.go
  • ui-svelte/.npmrc
  • ui-svelte/package.json
  • ui-svelte/src/App.svelte
  • ui-svelte/src/components/Header.svelte
  • ui-svelte/src/components/LogPanel.svelte
  • ui-svelte/src/components/ModelsPanel.svelte
  • ui-svelte/src/components/StatsPanel.svelte
  • ui-svelte/src/components/playground/AudioInterface.svelte
  • ui-svelte/src/components/playground/ChatInterface.svelte
  • ui-svelte/src/components/playground/ChatMessage.svelte
  • ui-svelte/src/components/playground/ImageInterface.svelte
  • ui-svelte/src/components/playground/ModelSelector.svelte
  • ui-svelte/src/components/playground/RerankInterface.svelte
  • ui-svelte/src/components/playground/SpeechInterface.svelte
  • ui-svelte/src/lib/markdown.test.ts
  • ui-svelte/src/lib/markdown.ts
  • ui-svelte/src/lib/rerankApi.ts
  • ui-svelte/src/lib/sdApi.ts
  • ui-svelte/src/lib/types.ts
  • ui-svelte/src/routes/Playground.svelte
  • ui-svelte/src/routes/PlaygroundStub.svelte
  • ui-svelte/src/stores/api.ts
  • ui-svelte/src/stores/playgroundActivity.ts
  • ui-svelte/src/stores/route.ts
  • ui-svelte/vite.config.ts
✅ Files skipped from review due to trivial changes (11)
  • .coderabbit.yaml
  • CLAUDE.md
  • docker/unified/README.md
  • go.mod
  • ui-svelte/.npmrc
  • ui-svelte/src/routes/PlaygroundStub.svelte
  • ui-svelte/src/components/ModelsPanel.svelte
  • docker/unified/config.example.yaml
  • ui-svelte/vite.config.ts
  • ui-svelte/package.json
  • proxy/events.go

Comment on lines +52 to +61
backends=()
# schedule uses defaults (build both); workflow_dispatch respects inputs
if [[ "${{ github.event_name }}" == "schedule" ]] || [[ "${{ inputs.build_cuda }}" == "true" ]]; then
backends+=("cuda")
fi
if [[ "${{ github.event_name }}" == "schedule" ]] || [[ "${{ inputs.build_vulkan }}" == "true" ]]; then
backends+=("vulkan")
fi
matrix=$(printf '%s\n' "${backends[@]}" | jq -R . | jq -sc .)
echo "matrix=$matrix" >> $GITHUB_OUTPUT

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

H2: Return [] when no backend is selected.

If both inputs are false, this still serializes one empty entry, so the build job runs once with matrix.backend == "" instead of being skipped.

Suggested fix
-          matrix=$(printf '%s\n' "${backends[@]}" | jq -R . | jq -sc .)
-          echo "matrix=$matrix" >> $GITHUB_OUTPUT
+          if [[ ${`#backends`[@]} -eq 0 ]]; then
+            matrix='[]'
+          else
+            matrix=$(printf '%s\n' "${backends[@]}" | jq -R . | jq -sc .)
+          fi
+          echo "matrix=$matrix" >> "$GITHUB_OUTPUT"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
backends=()
# schedule uses defaults (build both); workflow_dispatch respects inputs
if [[ "${{ github.event_name }}" == "schedule" ]] || [[ "${{ inputs.build_cuda }}" == "true" ]]; then
backends+=("cuda")
fi
if [[ "${{ github.event_name }}" == "schedule" ]] || [[ "${{ inputs.build_vulkan }}" == "true" ]]; then
backends+=("vulkan")
fi
matrix=$(printf '%s\n' "${backends[@]}" | jq -R . | jq -sc .)
echo "matrix=$matrix" >> $GITHUB_OUTPUT
backends=()
# schedule uses defaults (build both); workflow_dispatch respects inputs
if [[ "${{ github.event_name }}" == "schedule" ]] || [[ "${{ inputs.build_cuda }}" == "true" ]]; then
backends+=("cuda")
fi
if [[ "${{ github.event_name }}" == "schedule" ]] || [[ "${{ inputs.build_vulkan }}" == "true" ]]; then
backends+=("vulkan")
fi
if [[ ${`#backends`[@]} -eq 0 ]]; then
matrix='[]'
else
matrix=$(printf '%s\n' "${backends[@]}" | jq -R . | jq -sc .)
fi
echo "matrix=$matrix" >> "$GITHUB_OUTPUT"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/unified-docker.yml around lines 52 - 61, The matrix
serialization currently produces an empty-string entry when the backends array
is empty; modify the logic around backends and matrix so that if backends is
empty you write an explicit empty JSON array ("[]") to GITHUB_OUTPUT instead of
serializing an empty element. Locate the backends array and the matrix
generation (variables backends and matrix, and the echo "matrix=$matrix" >>
$GITHUB_OUTPUT) and add a conditional: if backends has no elements set
matrix='[]' else build the JSON with printf | jq as before, then echo matrix to
GITHUB_OUTPUT.

Comment thread AGENTS.md
## Tech stack

- golang
- typescript, vite and svelt5 for UI (located in ui/)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Typo: "svelt5" should be "svelte5".

Proposed fix
-- typescript, vite and svelt5 for UI (located in ui/)
+- typescript, vite and svelte5 for UI (located in ui/)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- typescript, vite and svelt5 for UI (located in ui/)
- typescript, vite and svelte5 for UI (located in ui/)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@AGENTS.md` at line 8, Update the typo "svelt5" to "svelte5" in the
documentation string (the phrase "typescript, vite and svelt5 for UI (located in
ui/)") so the UI tech list correctly reads "typescript, vite and svelte5 for
UI"; ensure only the spelling is changed and no other text is modified.

Comment thread docker/build-image.sh
Comment on lines +117 to +128
get_default_branch() {
local repo_url="$1"

# Check for master first
if git ls-remote --heads "${repo_url}" master &>/dev/null; then
echo "master"
elif git ls-remote --heads "${repo_url}" main &>/dev/null; then
echo "main"
else
echo "master" # fallback
fi
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

repo="https://github.com/ggml-org/whisper.cpp.git"

set +e
git ls-remote --heads "$repo" branch-that-does-not-exist >/dev/null 2>&1
plain_status=$?
git ls-remote --exit-code --heads "$repo" branch-that-does-not-exist >/dev/null 2>&1
strict_status=$?
set -e

echo "plain ls-remote status: $plain_status"
echo "--exit-code status: $strict_status"
echo -n "remote HEAD branch: "
git ls-remote --symref "$repo" HEAD \
  | awk '/^ref:/ {sub("refs/heads/","",$2); print $2; exit}'

Repository: napmany/llmsnap

Length of output: 133


Fix get_default_branch to discover non-master defaults.

Line 121 uses git ls-remote --heads ... master without --exit-code. This command returns success (exit code 0) even when the branch does not exist, so the if-condition always succeeds for any reachable repository and the elif chain never reaches the main probe. Repositories with main as the default branch will then produce an empty hash in get_latest_commit and fail the build.

Use --exit-code to make the probes fail when a branch is absent, or read the remote HEAD symref directly.

Suggested fix
 function get_default_branch() {
     local repo_url="$1"
-
-    # Check for master first
-    if git ls-remote --heads "${repo_url}" master &>/dev/null; then
-        echo "master"
-    elif git ls-remote --heads "${repo_url}" main &>/dev/null; then
-        echo "main"
-    else
-        echo "master"  # fallback
-    fi
+    local branch
+
+    branch="$(
+        git ls-remote --symref "${repo_url}" HEAD 2>/dev/null \
+            | awk '/^ref:/ {sub("refs/heads/","",$2); print $2; exit}'
+    )"
+
+    if [[ -n "${branch}" ]]; then
+        echo "${branch}"
+    elif git ls-remote --exit-code --heads "${repo_url}" main >/dev/null 2>&1; then
+        echo "main"
+    elif git ls-remote --exit-code --heads "${repo_url}" master >/dev/null 2>&1; then
+        echo "master"
+    else
+        echo "ERROR: Could not determine default branch for ${repo_url}" >&2
+        return 1
+    fi
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
get_default_branch() {
local repo_url="$1"
# Check for master first
if git ls-remote --heads "${repo_url}" master &>/dev/null; then
echo "master"
elif git ls-remote --heads "${repo_url}" main &>/dev/null; then
echo "main"
else
echo "master" # fallback
fi
}
get_default_branch() {
local repo_url="$1"
local branch
branch="$(
git ls-remote --symref "${repo_url}" HEAD 2>/dev/null \
| awk '/^ref:/ {sub("refs/heads/","",$2); print $2; exit}'
)"
if [[ -n "${branch}" ]]; then
echo "${branch}"
elif git ls-remote --exit-code --heads "${repo_url}" main >/dev/null 2>&1; then
echo "main"
elif git ls-remote --exit-code --heads "${repo_url}" master >/dev/null 2>&1; then
echo "master"
else
echo "ERROR: Could not determine default branch for ${repo_url}" >&2
return 1
fi
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker/build-image.sh` around lines 117 - 128, The get_default_branch
function currently probes branches with git ls-remote --heads which returns
success even if the ref is absent; update get_default_branch to perform reliable
detection by either adding --exit-code to each git ls-remote --heads probe
(e.g., git ls-remote --exit-code --heads ...) so the if/elif properly fails when
the branch is missing, or alternatively query the remote HEAD symref (git
ls-remote --symref <repo_url> HEAD) and parse the target to determine the
default branch; ensure the function still falls back predictably (used by
get_latest_commit) when detection fails.

Comment on lines +177 to +179
if [[ -n "${LS_VERSION:-}" ]]; then
LS_HASH=$(resolve_ref "${LLAMA_SWAP_REPO}" "${LS_VERSION}") || exit 1
echo "llama-swap: ${LS_VERSION} -> ${LS_HASH}"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

H3: LS_VERSION no longer accepts the values this script advertises.

resolve_ref() rejects latest and bare release numbers like 170, so the documented examples/help text now terminate before docker buildx starts. Either pass llama-swap release selectors through unchanged, or tighten the docs/input descriptions so this path only accepts real git refs.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker/unified/build-image.sh` around lines 177 - 179, The script is calling
resolve_ref("${LLAMA_SWAP_REPO}" "${LS_VERSION}") which rejects user-friendly
selectors like "latest" or bare release numbers, causing help/examples to fail;
update build-image.sh so that when LS_VERSION is set the script either (a) skips
resolving and passes the raw LS_VERSION through to the build (i.e. set
LS_HASH="${LS_VERSION}" when resolve_ref would reject it) or (b)
validate/normalize LS_VERSION up front and restrict the help text to only
document real git refs; modify the logic around LS_VERSION/LS_HASH and the
resolve_ref invocation (and update any help text) so LLAMA_SWAP_REPO uses a
usable ref for docker buildx instead of aborting for "latest"/numeric values.

Comment on lines +32 to +45
if [ "$BACKEND" = "cuda" ]; then
CMAKE_FLAGS+=(
-DGGML_CUDA=ON
-DGGML_VULKAN=OFF
"-DCMAKE_CUDA_ARCHITECTURES=${CMAKE_CUDA_ARCHITECTURES:?CMAKE_CUDA_ARCHITECTURES must be set}"
"-DCMAKE_CUDA_FLAGS=-allow-unsupported-compiler"
"-DCMAKE_EXE_LINKER_FLAGS=-Wl,-rpath-link,/usr/local/cuda/lib64/stubs -lcuda"
)
elif [ "$BACKEND" = "vulkan" ]; then
CMAKE_FLAGS+=(
-DGGML_CUDA=OFF
-DGGML_VULKAN=ON
)
fi

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

M2 (Medium): Fail fast on unsupported BACKEND values.

Any value other than cuda or vulkan falls through with only the common CMake flags, which can leave you with a “successful” image that lacks the requested accelerator support. Add an explicit failure here; the same guard is needed in docker/unified/install-sd.sh.

Suggested fix
 if [ "$BACKEND" = "cuda" ]; then
     CMAKE_FLAGS+=(
         -DGGML_CUDA=ON
         -DGGML_VULKAN=OFF
         "-DCMAKE_CUDA_ARCHITECTURES=${CMAKE_CUDA_ARCHITECTURES:?CMAKE_CUDA_ARCHITECTURES must be set}"
         "-DCMAKE_CUDA_FLAGS=-allow-unsupported-compiler"
         "-DCMAKE_EXE_LINKER_FLAGS=-Wl,-rpath-link,/usr/local/cuda/lib64/stubs -lcuda"
     )
 elif [ "$BACKEND" = "vulkan" ]; then
     CMAKE_FLAGS+=(
         -DGGML_CUDA=OFF
         -DGGML_VULKAN=ON
     )
+else
+    echo "FATAL: unsupported BACKEND '$BACKEND' (expected cuda or vulkan)" >&2
+    exit 1
 fi
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if [ "$BACKEND" = "cuda" ]; then
CMAKE_FLAGS+=(
-DGGML_CUDA=ON
-DGGML_VULKAN=OFF
"-DCMAKE_CUDA_ARCHITECTURES=${CMAKE_CUDA_ARCHITECTURES:?CMAKE_CUDA_ARCHITECTURES must be set}"
"-DCMAKE_CUDA_FLAGS=-allow-unsupported-compiler"
"-DCMAKE_EXE_LINKER_FLAGS=-Wl,-rpath-link,/usr/local/cuda/lib64/stubs -lcuda"
)
elif [ "$BACKEND" = "vulkan" ]; then
CMAKE_FLAGS+=(
-DGGML_CUDA=OFF
-DGGML_VULKAN=ON
)
fi
if [ "$BACKEND" = "cuda" ]; then
CMAKE_FLAGS+=(
-DGGML_CUDA=ON
-DGGML_VULKAN=OFF
"-DCMAKE_CUDA_ARCHITECTURES=${CMAKE_CUDA_ARCHITECTURES:?CMAKE_CUDA_ARCHITECTURES must be set}"
"-DCMAKE_CUDA_FLAGS=-allow-unsupported-compiler"
"-DCMAKE_EXE_LINKER_FLAGS=-Wl,-rpath-link,/usr/local/cuda/lib64/stubs -lcuda"
)
elif [ "$BACKEND" = "vulkan" ]; then
CMAKE_FLAGS+=(
-DGGML_CUDA=OFF
-DGGML_VULKAN=ON
)
else
echo "FATAL: unsupported BACKEND '$BACKEND' (expected cuda or vulkan)" >&2
exit 1
fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker/unified/install-llama.sh` around lines 32 - 45, The script currently
only handles BACKEND="cuda" and "vulkan" so add a fail-fast else branch after
the existing if/elif to error out for any other BACKEND; update install-llama.sh
to check the BACKEND variable and call echo >&2 with a clear message like
"Unsupported BACKEND: $BACKEND" and exit 1 if it's not "cuda" or "vulkan", and
apply the same guard change to docker/unified/install-sd.sh; reference the
BACKEND variable and the CMAKE_FLAGS assignment to locate where to insert the
else { echo ...; exit 1 } branch.

Comment thread ui-svelte/src/App.svelte
Comment on lines +51 to +56
<div class="h-full" class:hidden={$currentRoute !== "/"}>
<Playground />
</div>
<div class="h-full" class:hidden={$currentRoute === "/"}>
<Router {routes} on:routeLoaded={handleRouteLoaded} />
</div>

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

H1: Only mount Playground on the root route.

class:hidden just toggles CSS. On non-root pages, the whole playground subtree still mounts, so its state and effects stay live in the background. If the router needs to stay mounted for routeLoaded, you can still keep <Router> mounted and gate only <Playground>.

Suggested fix
   <main class="flex-1 overflow-auto p-4">
-    <div class="h-full" class:hidden={$currentRoute !== "/"}>
-      <Playground />
-    </div>
+    {`#if` $currentRoute === "/"}
+      <div class="h-full">
+        <Playground />
+      </div>
+    {/if}
     <div class="h-full" class:hidden={$currentRoute === "/"}>
       <Router {routes} on:routeLoaded={handleRouteLoaded} />
     </div>
   </main>
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
<div class="h-full" class:hidden={$currentRoute !== "/"}>
<Playground />
</div>
<div class="h-full" class:hidden={$currentRoute === "/"}>
<Router {routes} on:routeLoaded={handleRouteLoaded} />
</div>
{`#if` $currentRoute === "/"}
<div class="h-full">
<Playground />
</div>
{/if}
<div class="h-full" class:hidden={$currentRoute === "/"}>
<Router {routes} on:routeLoaded={handleRouteLoaded} />
</div>
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ui-svelte/src/App.svelte` around lines 51 - 56, The Playground subtree is
still mounted because class:hidden only hides it; change to conditionally render
Playground so it only mounts on the root route: replace the div wrapping
Playground with an {`#if` $currentRoute === "/"} block that renders <Playground />
(and remove the class:hidden usage), while leaving <Router {routes}
on:routeLoaded={handleRouteLoaded} /> mounted as-is so routeLoaded still fires.

Comment on lines +26 to +28
$effect(() => {
playgroundStores.audioTranscribing.set(isTranscribing);
});

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

L1: Reset audioTranscribing when this component unmounts.

This effect only writes the current value. If the component is destroyed while isTranscribing is true, the global activity flag stays stuck true.

Suggested fix
   $effect(() => {
     playgroundStores.audioTranscribing.set(isTranscribing);
+    return () => {
+      playgroundStores.audioTranscribing.set(false);
+    };
   });
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
$effect(() => {
playgroundStores.audioTranscribing.set(isTranscribing);
});
$effect(() => {
playgroundStores.audioTranscribing.set(isTranscribing);
return () => {
playgroundStores.audioTranscribing.set(false);
};
});
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ui-svelte/src/components/playground/AudioInterface.svelte` around lines 26 -
28, The effect setting playgroundStores.audioTranscribing from isTranscribing
doesn't clear the flag on unmount, so add an unmount cleanup that resets
playgroundStores.audioTranscribing to false: either return a cleanup function
from the existing $effect that calls
playgroundStores.audioTranscribing.set(false) or add an onDestroy(() =>
playgroundStores.audioTranscribing.set(false)) next to the $effect to ensure the
global flag is cleared when the component is destroyed.

Comment on lines +33 to +38
// SDAPI lora state
let availableLoras = $state<SdApiLora[]>([]);
let selectedLoras = $state<SdApiLoraRef[]>([]);
let isLoadingLoras = $state(false);
let lorasLoaded = $state(false);
let loraError = $state<string | null>(null);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

M1 (Medium): Reset LoRA state when the selected model changes.

availableLoras, selectedLoras, and lorasLoaded persist across model switches, but selectedLoras is sent unchanged in the next SDAPI request. After loading LoRAs for model A, switching to model B can submit A’s LoRAs to B. Clear this state on model change, or key it by model.

Suggested fix
   let availableLoras = $state<SdApiLora[]>([]);
   let selectedLoras = $state<SdApiLoraRef[]>([]);
   let isLoadingLoras = $state(false);
   let lorasLoaded = $state(false);
   let loraError = $state<string | null>(null);
+
+  $effect(() => {
+    $selectedModelStore;
+    availableLoras = [];
+    selectedLoras = [];
+    lorasLoaded = false;
+    loraError = null;
+  });

Also applies to: 47-62, 99-114

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ui-svelte/src/components/playground/ImageInterface.svelte` around lines 33 -
38, When the user changes models, clear or re-key the SDAPI LoRA state so LoRAs
from the previous model aren’t sent to the new one: reset availableLoras,
selectedLoras, lorasLoaded and loraError whenever the selected model (or its
reactive store) changes; alternatively scope these stores by model id (e.g., a
Map keyed by model name). Update the reactive blocks or watchers that load LoRAs
(the code manipulating availableLoras, selectedLoras, isLoadingLoras,
lorasLoaded, loraError — and the similar logic repeated in the other
occurrences) to perform this reset before starting the new load and ensure
selectedLoras is always tied to the current model.

Comment on lines +120 to +122
// Outside fences/math: blank line marks a complete boundary
if (trimmed === "") {
lastCompleteBoundary = i;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

H2 — Suffix-only rendering breaks container markdown.

A blank line is not a safe split point for all CommonMark containers. For example, - a\n\n- b and > a\n\n> b are still single list/blockquote structures when parsed together, but Line 218 renders the suffix in isolation, so the streamed final DOM can diverge from renderMarkdown(text). Please fall back to re-rendering the full complete section whenever the previous block can legally continue, or cache parsed top-level AST nodes instead of raw string slices.

Also applies to: 214-221

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ui-svelte/src/lib/markdown.ts` around lines 120 - 122, The current logic that
treats a blank trimmed line (where trimmed === "") as a safe split and sets
lastCompleteBoundary = i can break CommonMark containers (e.g., lists/blockquote
that continue across blank lines) and causes suffix-only rendering to diverge
from renderMarkdown(text); change the split strategy so that before accepting a
boundary you check whether the previous top-level block can legally continue (or
whether the blank line is still part of the same container) and if it can, fall
back to re-rendering the full complete section instead of only the suffix, or
implement caching of parsed top-level AST nodes and use those to determine safe
split points; update the code paths that compute lastCompleteBoundary, complete,
and suffix and the code that calls renderMarkdown to use this container-aware
decision (or the cached top-level AST) so streamed DOM matches full
renderMarkdown output.

Comment on lines +239 to +245
// Convert \[...\] to $$...$$ and \(...\) to $...$
export function normalizeLatexDelimiters(text: string): string {
// Display math: \[...\] → $$...$$ (may span multiple lines)
text = text.replace(/\\\[([\s\S]*?)\\\]/g, (_match, inner) => `$$${inner}$$`);
// Inline math: \(...\) → $...$
text = text.replace(/\\\(([\s\S]*?)\\\)/g, (_match, inner) => `$${inner}$`);
return text;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

M1 — Raw-string LaTeX normalization mutates code examples.

Line 254 normalizes \(...\) and \[...\] before markdown parsing, so the same sequences inside fenced code blocks or inline code are rewritten into $...$/$$...$$ and no longer render as literal code. This should run as a remark transform over plain text nodes only, skipping code, inlineCode, and existing math nodes.

Also applies to: 254-254

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ui-svelte/src/lib/markdown.ts` around lines 239 - 245,
normalizeLatexDelimiters currently does blind string replacements and mutates
fenced code/inline code; replace it with a remark-style transform that walks the
Markdown AST and performs the same regex substitutions only on plain text nodes
(e.g., visit/text nodes) while explicitly skipping nodes of type "code",
"inlineCode", "math", and "inlineMath". Update the function (or add a new remark
plugin referenced by normalizeLatexDelimiters) to accept/transform an MDAST
root, apply the two replacements on node.value for eligible text nodes, and
return the root so markdown parsing preserves literal code blocks and existing
math nodes.

The previous approach used a seen map that collapsed non-contiguous
same-group entries into a single batch, reordering the preload sequence.
This change preserves the original preload order by only batching models
that appear consecutively for the same group.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants