proxy: parallelize startup preload within process groups#637
Conversation
WalkthroughStartup preload in Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Models in the same group now preload concurrently via goroutines instead of sequentially. Groups are still loaded in order to respect exclusivity. This significantly reduces startup time when a group has multiple models (e.g., an LLM + STT + TTS that share GPU resources). Each model's container starts and health-checks simultaneously. - Group preload models by ProcessGroup before starting - Use sync.WaitGroup to parallelize within each group - Call swapProcessGroup once per group (handles exclusivity) - Log group completion with model count Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
78dbba7 to
0353306
Compare
There was a problem hiding this comment.
Actionable comments posted: 14
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
docs/examples/aider-qwq-coder/README.md (2)
145-152:⚠️ Potential issue | 🟠 MajorFix the broken multiline aider command.
The line after
--model-settings-file aider.model.settings.ymlis missing a trailing\, so copy/pasting this block runs--openai-api-keyas a separate shell command.Suggested fix
$ aider --architect \ --no-show-model-warnings \ --model openai/aider:QwQ \ --editor-model openai/aider:qwen-coder-32B \ --config aider.conf.yml \ - --model-settings-file aider.model.settings.yml + --model-settings-file aider.model.settings.yml \ --openai-api-key "sk-na" \ --openai-api-base "http://10.0.1.24:8080/v1"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/examples/aider-qwq-coder/README.md` around lines 145 - 152, The multiline shell example for the aider command is missing a trailing backslash after the `--model-settings-file aider.model.settings.yml` line causing `--openai-api-key` to be treated as a new command; fix the example by adding a trailing `\` to the end of the `--model-settings-file aider.model.settings.yml` line (so the `--openai-api-key` and `--openai-api-base` options remain part of the same continued command) and verify the `aider --architect` block uses `\` at the end of each continued line.
130-139:⚠️ Potential issue | 🟠 MajorKeep the profiled model names consistent.
This section defines
openai/aider:qwen-coder-32B, butweak_model_nameandeditor_model_namepoint atopenai/aider:qwen-coder-32B-aider. That target is never declared in the example, so the config will not resolve as written.Suggested fix
- name: "openai/aider:QwQ" - weak_model_name: "openai/aider:qwen-coder-32B-aider" - editor_model_name: "openai/aider:qwen-coder-32B-aider" + weak_model_name: "openai/aider:qwen-coder-32B" + editor_model_name: "openai/aider:qwen-coder-32B" - name: "openai/aider:qwen-coder-32B" - editor_model_name: "openai/aider:qwen-coder-32B-aider" + editor_model_name: "openai/aider:qwen-coder-32B"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/examples/aider-qwq-coder/README.md` around lines 130 - 139, The model settings reference mismatched profile-tagged names: the entry with name "openai/aider:qwen-coder-32B" points its weak_model_name and editor_model_name at "openai/aider:qwen-coder-32B-aider", which is not declared; make the identifiers consistent by either renaming the top-level name to "openai/aider:qwen-coder-32B-aider" or changing weak_model_name and editor_model_name to "openai/aider:qwen-coder-32B" so that the name in the model settings file matches the targets; update the entries that mention weak_model_name and editor_model_name accordingly to ensure every referenced model (e.g., "openai/aider:qwen-coder-32B" and "openai/aider:qwen-coder-32B-aider") is declared with the same profile tag "aider:".
🧹 Nitpick comments (4)
cmd/simple-responder/simple-responder.go (1)
286-292: Handle body read failures explicitly in/collective_rpc.At Line [287],
io.ReadAllignores errors. If body read fails, the handler silently falls through to a generic 400-path, which makes wake-sequence test failures harder to diagnose.Proposed patch
r.POST("/collective_rpc", func(c *gin.Context) { - body, _ := io.ReadAll(c.Request.Body) + body, err := io.ReadAll(c.Request.Body) + if err != nil { + c.JSON(http.StatusBadRequest, gin.H{"error": "failed to read request body"}) + return + } if strings.Contains(string(body), "reload_weights") { c.Status(http.StatusOK) } else { c.Status(http.StatusBadRequest) } })🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/simple-responder/simple-responder.go` around lines 286 - 292, The handler registered in r.POST("/collective_rpc", func(c *gin.Context) {...}) currently ignores the error returned by io.ReadAll(c.Request.Body); change this to check the error, log or attach the error to the context, and return an explicit failure status (e.g., 500 Internal Server Error) when ReadAll fails instead of silently falling back to the generic 400 path — only proceed to inspect the body for "reload_weights" if err == nil. Ensure you reference the same handler closure when adding the error check and use c.Status/c.JSON (or c.AbortWithStatus) to return the failure response.docker/llmsnap-sd.Containerfile (1)
3-3: Note: Default BASE value assumes local image exists.The default
BASE=llmsnap:latestis correct for the rebranding. However, if someone builds this Containerfile manually without providing--build-arg BASE=<image>, the build will fail because no localllmsnap:latestimage exists yet.This is acceptable since the normal build workflow (in
docker/build-container.sh) overrides this default. Just be aware that standalone/manual builds require explicitly passing the BASE argument.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docker/llmsnap-sd.Containerfile` at line 3, The Containerfile's ARG BASE currently defaults to a local-only image name ("ARG BASE=llmsnap:latest") which causes manual builds to fail if that local image doesn't exist; change the default ARG BASE to a widely available public base image (for example a stable distro slim tag) or add a clear inline comment above ARG BASE instructing maintainers to pass --build-arg BASE=<image> when building manually; update the ARG line (symbol: ARG BASE) and/or the Containerfile comment so the normal build script docker/build-container.sh can still override it while manual builds succeed.ui-svelte/src/stores/theme.ts (1)
7-7: Preserve existing stored title values across the key rename.Line 7 changes the storage key, which will reset title customization for existing users. Consider either keeping the old key or adding one-time migration.
Suggested minimal compatibility fix
-export const appTitle = persistentStore<string>("appTitle", "llmsnap"); +export const appTitle = persistentStore<string>("app-title", "llmsnap");🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ui-svelte/src/stores/theme.ts` at line 7, The change to the persistentStore key for appTitle will reset existing users' custom titles; update the appTitle initialization to first check the old storage key and migrate it to the new key (or fall back to the old key if present) before calling persistentStore, i.e., use the persistentStore API or storage get/set helpers to read the previous key value and write it under the new key so existing values are preserved (referencing appTitle and persistentStore<string>("appTitle", "llmsnap") to locate the change).README.md (1)
53-53: Minor style suggestion: hyphenation.Consider using "real-time" instead of "real time" for consistency with compound adjective conventions.
Proposed fix
-llmsnap includes a real time web interface for monitoring logs and controlling models: +llmsnap includes a real-time web interface for monitoring logs and controlling models:🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@README.md` at line 53, Update the sentence containing the phrase "llmsnap includes a real time web interface for monitoring logs and controlling models:" to use the compound adjective "real-time" (i.e., change "real time web interface" to "real-time web interface") so the README's wording is consistent with hyphenation conventions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ai-plans/issue-264-add-metadata.md`:
- Around line 233-234: Update the tests in proxymanager_test.go to stop
asserting or checking the old key name "llamaswap_meta" and instead assert the
new "llmsnap_meta" behavior: replace any occurrences of "llamaswap_meta" with
"llmsnap_meta" in the test fixtures and assertions, remove mixed old/new-key
checks so each test is consistent (i.e., when metadata exists assert presence
and content of llmsnap_meta; when metadata is empty assert the key is absent),
and apply the same replacements/cleanup for the other affected test sections
referenced around the same areas (the snippets near the earlier mentioned
checks).
- Around line 88-90: The changes introduce API metadata/rebranding (the
llmsnap_meta key and related bullets "Add metadata to each model record under
the key `llmsnap_meta`", "Only include `llmsnap_meta` if metadata is non-empty",
"Preserve all types when marshaling to JSON") which are outside this PR's stated
scope (preload concurrency for process groups); remove or revert those
llmsnap_meta bullets and any rebranding language from this document so the file
only documents the preload concurrency change, and move the llmsnap_meta rebrand
text into a separate focused PR (and corresponding release-note entry)
referencing the llmsnap_meta key to avoid scope drift.
In `@CLAUDE.md`:
- Line 5: Update the sentence in CLAUDE.md to use correct compound-word
hyphenation and spelling: change "light weight" to "lightweight" and add a
hyphen in "OpenAI-compatible" so the phrase reads "llmsnap is a lightweight,
transparent proxy server that provides automatic model swapping to vllm,
llama.cpp and other OpenAI-compatible servers."
In `@codemaps/architecture.md`:
- Line 11: There are three unlabeled fenced code blocks; update each opening
triple-backtick to include a language tag (e.g., change ``` to ```text) so the
blocks are labeled and MD040-compliant; locate the three unlabeled ``` blocks in
the document and add the language specifier to each opening fence.
In `@config-schema.json`:
- Around line 242-315: The schema currently allows "sleepMode": "enable" without
enforcing that sleepEndpoints and wakeEndpoints are present and non-empty; add
an if/then that checks if sleepMode equals "enable" and then requires
["sleepEndpoints","wakeEndpoints"] and enforces "minItems": 1 for both
sleepEndpoints and wakeEndpoints array schemas so editor/JSON-schema validation
matches the runtime checks in proxy/config/model_config.go (referencing
sleepMode, sleepEndpoints, wakeEndpoints).
In `@docs/examples/README.md`:
- Line 3: Fix the typo in the README sentence "A collections of usecases and
examples for getting the most out of llmsnap." by changing it to "A collection
of use cases and examples for getting the most out of llmsnap." — update the
words "collections" → "collection" and "usecases" → "use cases" in the README
header/intro line that currently contains that exact string.
In `@docs/examples/restart-on-config-change/README.md`:
- Line 3: Update the user-facing sentence in the README text that currently
reads "trail and error" to the correct phrase "trial and error"; locate the
sentence mentioning `watch-and-restart.sh`, `config.yaml`, and `llmsnap` in the
README and replace the typo so the line reads "Sometimes editing the
configuration file can take a bit of trial and error to get a model
configuration tuned just right.".
In `@proxy/metrics_monitor.go`:
- Around line 200-201: The fallback metric path is using recorder.StartTime()
which can be zero if recorder.Write never set its start; instead use the
originally captured requestStartTime as the fallback. Update the code around the
newBodyCopier/recorder usage so any metric calculation that currently calls
recorder.StartTime() falls back to requestStartTime (or use
recorder.RequestTime() if present) — e.g., when computing DurationMs in the
default metric path replace recorder.StartTime() with a function that returns
recorder.StartTime() if non-zero else requestStartTime (referencing
requestStartTime, recorder, newBodyCopier, RequestTime(), and StartTime()).
- Around line 215-218: The warning currently logs the raw upstream response body
(recorder.body.Bytes()) when recorder.Status() != http.StatusOK; instead
sanitize and truncate the message before logging: build a safeSnippet by
stripping non-printable characters and limiting to a hard cap (e.g. 200 chars)
and log that snippet with mp.logger.Warnf along with recorder.Status() and
request.URL.Path, and keep the full body only for debug/capture paths (e.g.,
when a debug flag or capture hook is enabled) so raw provider payloads are not
written to standard warning logs.
In `@proxy/process.go`:
- Around line 347-350: The health URL construction failure occurs after the
child process has been started (around the p.buildFullURL(checkEndpoint) call),
so before returning the fmt.Errorf you must cleanly stop/reset the process state
to avoid leaving it in StateStarting; call the process teardown method on p
(e.g., the existing stop/shutdown/cleanup method used elsewhere in this file),
ensure the process state is set to a non-starting state
(StateStopped/StateFailed), handle any teardown error (log or wrap it) and then
return the original URL construction error referencing p.config.Proxy and
checkEndpoint.
In `@proxy/processgroup_test.go`:
- Around line 171-199: The test "non-sleep processes are stopped" creates a
ProcessGroup with NewProcessGroup and starts processes but lacks the consistent
teardown used elsewhere; add a deferred cleanup by calling
pg.StopProcesses(StopImmediately) (e.g., defer
pg.StopProcesses(StopImmediately)) immediately after creating pg so processes
started in the loop and affected by pg.MakeIdleProcesses() are always stopped.
In `@scripts/uninstall.sh`:
- Line 38: The uninstall script currently unquotes the command substitution in
the line that runs $SUDO rm $(which llmsnap), which can break on paths
containing spaces and when which returns empty; update the uninstall.sh logic
that references which llmsnap to (1) use command -v llmsnap instead of which for
a POSIX-safe lookup, (2) capture the result into a variable like snap_path, (3)
check that snap_path is non-empty before attempting removal, and (4) quote the
variable when calling rm (e.g., "$snap_path") so paths with spaces are handled
safely.
---
Outside diff comments:
In `@docs/examples/aider-qwq-coder/README.md`:
- Around line 145-152: The multiline shell example for the aider command is
missing a trailing backslash after the `--model-settings-file
aider.model.settings.yml` line causing `--openai-api-key` to be treated as a new
command; fix the example by adding a trailing `\` to the end of the
`--model-settings-file aider.model.settings.yml` line (so the `--openai-api-key`
and `--openai-api-base` options remain part of the same continued command) and
verify the `aider --architect` block uses `\` at the end of each continued line.
- Around line 130-139: The model settings reference mismatched profile-tagged
names: the entry with name "openai/aider:qwen-coder-32B" points its
weak_model_name and editor_model_name at "openai/aider:qwen-coder-32B-aider",
which is not declared; make the identifiers consistent by either renaming the
top-level name to "openai/aider:qwen-coder-32B-aider" or changing
weak_model_name and editor_model_name to "openai/aider:qwen-coder-32B" so that
the name in the model settings file matches the targets; update the entries that
mention weak_model_name and editor_model_name accordingly to ensure every
referenced model (e.g., "openai/aider:qwen-coder-32B" and
"openai/aider:qwen-coder-32B-aider") is declared with the same profile tag
"aider:".
---
Nitpick comments:
In `@cmd/simple-responder/simple-responder.go`:
- Around line 286-292: The handler registered in r.POST("/collective_rpc",
func(c *gin.Context) {...}) currently ignores the error returned by
io.ReadAll(c.Request.Body); change this to check the error, log or attach the
error to the context, and return an explicit failure status (e.g., 500 Internal
Server Error) when ReadAll fails instead of silently falling back to the generic
400 path — only proceed to inspect the body for "reload_weights" if err == nil.
Ensure you reference the same handler closure when adding the error check and
use c.Status/c.JSON (or c.AbortWithStatus) to return the failure response.
In `@docker/llmsnap-sd.Containerfile`:
- Line 3: The Containerfile's ARG BASE currently defaults to a local-only image
name ("ARG BASE=llmsnap:latest") which causes manual builds to fail if that
local image doesn't exist; change the default ARG BASE to a widely available
public base image (for example a stable distro slim tag) or add a clear inline
comment above ARG BASE instructing maintainers to pass --build-arg BASE=<image>
when building manually; update the ARG line (symbol: ARG BASE) and/or the
Containerfile comment so the normal build script docker/build-container.sh can
still override it while manual builds succeed.
In `@README.md`:
- Line 53: Update the sentence containing the phrase "llmsnap includes a real
time web interface for monitoring logs and controlling models:" to use the
compound adjective "real-time" (i.e., change "real time web interface" to
"real-time web interface") so the README's wording is consistent with
hyphenation conventions.
In `@ui-svelte/src/stores/theme.ts`:
- Line 7: The change to the persistentStore key for appTitle will reset existing
users' custom titles; update the appTitle initialization to first check the old
storage key and migrate it to the new key (or fall back to the old key if
present) before calling persistentStore, i.e., use the persistentStore API or
storage get/set helpers to read the previous key value and write it under the
new key so existing values are preserved (referencing appTitle and
persistentStore<string>("appTitle", "llmsnap") to locate the change).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 3cea36b2-a884-4ca5-865b-d5c0f16280ed
⛔ Files ignored due to path filters (3)
docs/assets/hero1.jpgis excluded by!**/*.jpgdocs/assets/hero2.pngis excluded by!**/*.pngui-svelte/package-lock.jsonis excluded by!**/package-lock.json
📒 Files selected for processing (61)
.github/ISSUE_TEMPLATE/bug-report.md.github/workflows/containers.yml.github/workflows/go-ci-windows.yml.github/workflows/release.yml.gitignoreCLAUDE.mdLICENSE.mdMakefileREADME.mdai-plans/issue-264-add-metadata.mdcmd/misc/benchmark-chatcompletion/main.gocmd/simple-responder/simple-responder.gocmd/wol-proxy/README.mdcodemaps/architecture.mdcodemaps/backend.mdcodemaps/data.mdcodemaps/frontend.mdconfig-schema.jsonconfig.example.yamldocker/build-container.shdocker/llmsnap-sd.Containerfiledocker/llmsnap.Containerfiledocs/assets/hero3.webpdocs/configuration.mddocs/container-security.mddocs/examples/README.mddocs/examples/aider-qwq-coder/README.mddocs/examples/aider-qwq-coder/aider.model.settings.dualgpu.ymldocs/examples/benchmark-snakegame/README.mddocs/examples/restart-on-config-change/README.mddocs/examples/speculative-decoding/README.mdgo.modllama-swap.goproxy/config/config.goproxy/config/config_posix_test.goproxy/config/config_test.goproxy/config/config_windows_test.goproxy/config/model_config.goproxy/helpers_test.goproxy/logMonitor.goproxy/metrics_monitor.goproxy/metrics_monitor_test.goproxy/peerproxy.goproxy/peerproxy_test.goproxy/process.goproxy/process_test.goproxy/processgroup.goproxy/processgroup_test.goproxy/proxymanager.goproxy/proxymanager_api.goproxy/proxymanager_test.goscripts/install.shscripts/uninstall.shui-svelte/index.htmlui-svelte/public/site.webmanifestui-svelte/src/components/Header.svelteui-svelte/src/components/ModelsPanel.svelteui-svelte/src/index.cssui-svelte/src/lib/types.tsui-svelte/src/stores/api.tsui-svelte/src/stores/theme.ts
💤 Files with no reviewable changes (1)
- .github/workflows/go-ci-windows.yml
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
proxy/proxymanager.go (1)
258-271:⚠️ Potential issue | 🔴 CriticalSwap-enabled groups cannot be preloaded through parallel
ProxyRequestcalls.On Line 266, each goroutine enters
ProcessGroup.ProxyRequest. Inproxy/processgroup.go, theswappath holds the group lock, stopslastUsedProcess, and then proxies the new model, so the calls serialize and each later preload tears down the one before it. The batch therefore leaves only the last model running, while Line 267 still emitsSuccess: truefor the earlier models. Keepswap=truegroups sequential here, or add a preload path that bypasses swap bookkeeping.Minimal safe fallback
- // Start all models in this group in parallel - var wg sync.WaitGroup - for _, modelID := range ge.models { - wg.Add(1) - go func(mid string) { - defer wg.Done() - proxyLogger.Infof("Preloading model: %s", mid) - req, _ := http.NewRequest("GET", "/", nil) - processGroup.ProxyRequest(mid, &DiscardWriter{}, req) - event.Emit(ModelPreloadedEvent{ModelName: mid, Success: true}) - proxyLogger.Infof("Preloaded model: %s", mid) - }(modelID) - } - wg.Wait() + if processGroup.swap { + for _, mid := range ge.models { + proxyLogger.Infof("Preloading model: %s", mid) + req, _ := http.NewRequest("GET", "/", nil) + processGroup.ProxyRequest(mid, &DiscardWriter{}, req) + event.Emit(ModelPreloadedEvent{ModelName: mid, Success: true}) + proxyLogger.Infof("Preloaded model: %s", mid) + } + } else { + var wg sync.WaitGroup + for _, modelID := range ge.models { + wg.Add(1) + go func(mid string) { + defer wg.Done() + proxyLogger.Infof("Preloading model: %s", mid) + req, _ := http.NewRequest("GET", "/", nil) + processGroup.ProxyRequest(mid, &DiscardWriter{}, req) + event.Emit(ModelPreloadedEvent{ModelName: mid, Success: true}) + proxyLogger.Infof("Preloaded model: %s", mid) + }(modelID) + } + wg.Wait() + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@proxy/proxymanager.go` around lines 258 - 271, The current parallel preload loop calling processGroup.ProxyRequest for each modelID (ge.models) races with swap bookkeeping in ProcessGroup.ProxyRequest causing teardown of previous processes and false-success ModelPreloadedEvent emissions; modify the preload logic to detect swap-enabled groups and run their preloads sequentially (instead of in goroutines) or implement a dedicated preload path that bypasses swap bookkeeping, and ensure you only emit ModelPreloadedEvent{..., Success: true} after ProxyRequest/ preload returns a success (check/propagate its error) rather than unconditionally; refer to ge.models, processGroup.ProxyRequest, DiscardWriter and ModelPreloadedEvent to locate and update the loop and emission points.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@proxy/proxymanager.go`:
- Around line 225-245: The current logic (variables groupOrder, seen map and
loop over proxyConfig.Hooks.OnStartup.Preload) deduplicates by group ID and
reorders batches (collapsing non-contiguous same-group entries), so change it to
only batch contiguous same-group entries: iterate
proxyConfig.Hooks.OnStartup.Preload in order, call
pm.findGroupByModelName(modelID) and proxyLogger.Warnf on missing entries as
before, but instead of using seen, keep lastGroupID (string) and if the found
pg.id equals lastGroupID append modelID to the last groupEntry.models, otherwise
push a new groupEntry{id: pg.id, models: []string{modelID}} and set
lastGroupID=pg.id; remove the seen map and its related dedupe logic so original
preload sequence is preserved.
---
Duplicate comments:
In `@proxy/proxymanager.go`:
- Around line 258-271: The current parallel preload loop calling
processGroup.ProxyRequest for each modelID (ge.models) races with swap
bookkeeping in ProcessGroup.ProxyRequest causing teardown of previous processes
and false-success ModelPreloadedEvent emissions; modify the preload logic to
detect swap-enabled groups and run their preloads sequentially (instead of in
goroutines) or implement a dedicated preload path that bypasses swap
bookkeeping, and ensure you only emit ModelPreloadedEvent{..., Success: true}
after ProxyRequest/ preload returns a success (check/propagate its error) rather
than unconditionally; refer to ge.models, processGroup.ProxyRequest,
DiscardWriter and ModelPreloadedEvent to locate and update the loop and emission
points.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 69b035ad-fa28-42d2-bbd3-41f9dd3ca50a
📒 Files selected for processing (1)
proxy/proxymanager.go
The previous approach used a seen map that collapsed non-contiguous same-group entries into a single batch, reordering the preload sequence. This change preserves the original preload order by only batching models that appear consecutively for the same group. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
♻️ Duplicate comments (1)
proxy/proxymanager.go (1)
258-271:⚠️ Potential issue | 🔴 CriticalDon’t parallelize
ProxyRequestinside swap-enabled groups.
processGroup.ProxyRequeststill mutateslastUsedProcessunderpg.Lock()and stops the previously active member whenpg.swapis true (seeproxy/processgroup.go). So this goroutine fan-out serializes on that lock and each later preload tears down the model loaded by the previous goroutine. In practice, only the last model in the batch stays running, while every model still emitsModelPreloadedEvent{Success:true}.At minimum, keep
processGroup.swapbatches sequential until preload can use a path that doesn’t updatelastUsedProcess.Possible safe fallback
- // Start all models in this group in parallel - var wg sync.WaitGroup - for _, modelID := range ge.models { - wg.Add(1) - go func(mid string) { - defer wg.Done() - proxyLogger.Infof("Preloading model: %s", mid) - req, _ := http.NewRequest("GET", "/", nil) - processGroup.ProxyRequest(mid, &DiscardWriter{}, req) - event.Emit(ModelPreloadedEvent{ModelName: mid, Success: true}) - proxyLogger.Infof("Preloaded model: %s", mid) - }(modelID) - } - wg.Wait() + preloadOne := func(mid string) { + proxyLogger.Infof("Preloading model: %s", mid) + req, _ := http.NewRequest("GET", "/", nil) + processGroup.ProxyRequest(mid, &DiscardWriter{}, req) + event.Emit(ModelPreloadedEvent{ModelName: mid, Success: true}) + proxyLogger.Infof("Preloaded model: %s", mid) + } + + if processGroup.swap { + for _, modelID := range ge.models { + preloadOne(modelID) + } + } else { + var wg sync.WaitGroup + for _, modelID := range ge.models { + wg.Add(1) + go func(mid string) { + defer wg.Done() + preloadOne(mid) + }(modelID) + } + wg.Wait() + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@proxy/proxymanager.go` around lines 258 - 271, The preload loop currently spawns goroutines for every model in ge.models but processGroup.ProxyRequest mutates lastUsedProcess and stops previous members when processGroup.swap is true, causing race/teardown; change the preload logic to detect processGroup.swap (or pg.swap) and, when true, run the per-model ProxyRequest calls sequentially (no goroutines) for that group so each preload completes without tearing down the prior model; when swap is false you may keep the existing parallel goroutine fan-out; reference the existing loop over ge.models, processGroup.ProxyRequest, DiscardWriter, and the emitted ModelPreloadedEvent to locate and update the code.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@proxy/proxymanager.go`:
- Around line 258-271: The preload loop currently spawns goroutines for every
model in ge.models but processGroup.ProxyRequest mutates lastUsedProcess and
stops previous members when processGroup.swap is true, causing race/teardown;
change the preload logic to detect processGroup.swap (or pg.swap) and, when
true, run the per-model ProxyRequest calls sequentially (no goroutines) for that
group so each preload completes without tearing down the prior model; when swap
is false you may keep the existing parallel goroutine fan-out; reference the
existing loop over ge.models, processGroup.ProxyRequest, DiscardWriter, and the
emitted ModelPreloadedEvent to locate and update the code.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: a5ffc979-6b2b-4b32-ae4b-9d81f2aa5759
📒 Files selected for processing (1)
proxy/proxymanager.go
Models in the same group now preload concurrently via goroutines instead of sequentially. Groups are still loaded in order to respect exclusivity.
This significantly reduces startup time when a group has multiple models (e.g., an LLM + STT + TTS that share GPU resources). Each model's container starts and health-checks simultaneously.
Changes:
sync.WaitGroupto parallelize within each groupswapProcessGrouponce per group (handles exclusivity)Before: 3 models in same group → loaded sequentially (total = sum of load times)
After: 3 models in same group → loaded in parallel (total ≈ max load time)
Will also submit to napmany/llmsnap.