Skip to content

proxy: parallelize startup preload within process groups#637

Open
Quentin-M wants to merge 2 commits into
mostlygeek:mainfrom
Quentin-M:parallel-preload
Open

proxy: parallelize startup preload within process groups#637
Quentin-M wants to merge 2 commits into
mostlygeek:mainfrom
Quentin-M:parallel-preload

Conversation

@Quentin-M

Copy link
Copy Markdown
Contributor

Models in the same group now preload concurrently via goroutines instead of sequentially. Groups are still loaded in order to respect exclusivity.

This significantly reduces startup time when a group has multiple models (e.g., an LLM + STT + TTS that share GPU resources). Each model's container starts and health-checks simultaneously.

Changes:

  • Group preload models by ProcessGroup before starting
  • Use sync.WaitGroup to parallelize within each group
  • Call swapProcessGroup once per group (handles exclusivity)
  • Log group completion with model count

Before: 3 models in same group → loaded sequentially (total = sum of load times)
After: 3 models in same group → loaded in parallel (total ≈ max load time)

Will also submit to napmany/llmsnap.

@coderabbitai

coderabbitai Bot commented Apr 8, 2026

Copy link
Copy Markdown

Walkthrough

Startup preload in ProxyManager is refactored to group contiguous preload entries by process-group id, perform a single swap per group, and issue concurrent ProxyRequest preload calls per model within each group batch, emitting per-model preload events and per-batch logging.

Changes

Cohort / File(s) Summary
Preload Execution Refactor
proxy/proxymanager.go
Replaced per-model sequential swap-and-request with grouping by process-group id: swap once per group, launch concurrent goroutines for ProxyRequest per model with a sync.WaitGroup, emit ModelPreloadedEvent per model (success/failure), remove shared discardWriter in favor of per-request DiscardWriter, and add per-model and per-batch logging.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 24.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The pull request title directly describes the main change: parallelizing model preload within process groups to improve startup performance.
Description check ✅ Passed The description is fully related to the changeset, providing clear context about the performance improvement, implementation approach, and motivation for the parallelization strategy.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Models in the same group now preload concurrently via goroutines instead
of sequentially. Groups are still loaded in order to respect exclusivity.

This significantly reduces startup time when a group has multiple models
(e.g., an LLM + STT + TTS that share GPU resources). Each model's
container starts and health-checks simultaneously.

- Group preload models by ProcessGroup before starting
- Use sync.WaitGroup to parallelize within each group
- Call swapProcessGroup once per group (handles exclusivity)
- Log group completion with model count

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 14

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
docs/examples/aider-qwq-coder/README.md (2)

145-152: ⚠️ Potential issue | 🟠 Major

Fix the broken multiline aider command.

The line after --model-settings-file aider.model.settings.yml is missing a trailing \, so copy/pasting this block runs --openai-api-key as a separate shell command.

Suggested fix
 $ aider --architect \
     --no-show-model-warnings \
     --model openai/aider:QwQ \
     --editor-model openai/aider:qwen-coder-32B \
     --config aider.conf.yml \
-    --model-settings-file aider.model.settings.yml
+    --model-settings-file aider.model.settings.yml \
     --openai-api-key "sk-na" \
     --openai-api-base "http://10.0.1.24:8080/v1"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/examples/aider-qwq-coder/README.md` around lines 145 - 152, The
multiline shell example for the aider command is missing a trailing backslash
after the `--model-settings-file aider.model.settings.yml` line causing
`--openai-api-key` to be treated as a new command; fix the example by adding a
trailing `\` to the end of the `--model-settings-file aider.model.settings.yml`
line (so the `--openai-api-key` and `--openai-api-base` options remain part of
the same continued command) and verify the `aider --architect` block uses `\` at
the end of each continued line.

130-139: ⚠️ Potential issue | 🟠 Major

Keep the profiled model names consistent.

This section defines openai/aider:qwen-coder-32B, but weak_model_name and editor_model_name point at openai/aider:qwen-coder-32B-aider. That target is never declared in the example, so the config will not resolve as written.

Suggested fix
 - name: "openai/aider:QwQ"
-  weak_model_name: "openai/aider:qwen-coder-32B-aider"
-  editor_model_name: "openai/aider:qwen-coder-32B-aider"
+  weak_model_name: "openai/aider:qwen-coder-32B"
+  editor_model_name: "openai/aider:qwen-coder-32B"

 - name: "openai/aider:qwen-coder-32B"
-  editor_model_name: "openai/aider:qwen-coder-32B-aider"
+  editor_model_name: "openai/aider:qwen-coder-32B"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/examples/aider-qwq-coder/README.md` around lines 130 - 139, The model
settings reference mismatched profile-tagged names: the entry with name
"openai/aider:qwen-coder-32B" points its weak_model_name and editor_model_name
at "openai/aider:qwen-coder-32B-aider", which is not declared; make the
identifiers consistent by either renaming the top-level name to
"openai/aider:qwen-coder-32B-aider" or changing weak_model_name and
editor_model_name to "openai/aider:qwen-coder-32B" so that the name in the model
settings file matches the targets; update the entries that mention
weak_model_name and editor_model_name accordingly to ensure every referenced
model (e.g., "openai/aider:qwen-coder-32B" and
"openai/aider:qwen-coder-32B-aider") is declared with the same profile tag
"aider:".
🧹 Nitpick comments (4)
cmd/simple-responder/simple-responder.go (1)

286-292: Handle body read failures explicitly in /collective_rpc.

At Line [287], io.ReadAll ignores errors. If body read fails, the handler silently falls through to a generic 400-path, which makes wake-sequence test failures harder to diagnose.

Proposed patch
 r.POST("/collective_rpc", func(c *gin.Context) {
-	body, _ := io.ReadAll(c.Request.Body)
+	body, err := io.ReadAll(c.Request.Body)
+	if err != nil {
+		c.JSON(http.StatusBadRequest, gin.H{"error": "failed to read request body"})
+		return
+	}
 	if strings.Contains(string(body), "reload_weights") {
 		c.Status(http.StatusOK)
 	} else {
 		c.Status(http.StatusBadRequest)
 	}
 })
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/simple-responder/simple-responder.go` around lines 286 - 292, The handler
registered in r.POST("/collective_rpc", func(c *gin.Context) {...}) currently
ignores the error returned by io.ReadAll(c.Request.Body); change this to check
the error, log or attach the error to the context, and return an explicit
failure status (e.g., 500 Internal Server Error) when ReadAll fails instead of
silently falling back to the generic 400 path — only proceed to inspect the body
for "reload_weights" if err == nil. Ensure you reference the same handler
closure when adding the error check and use c.Status/c.JSON (or
c.AbortWithStatus) to return the failure response.
docker/llmsnap-sd.Containerfile (1)

3-3: Note: Default BASE value assumes local image exists.

The default BASE=llmsnap:latest is correct for the rebranding. However, if someone builds this Containerfile manually without providing --build-arg BASE=<image>, the build will fail because no local llmsnap:latest image exists yet.

This is acceptable since the normal build workflow (in docker/build-container.sh) overrides this default. Just be aware that standalone/manual builds require explicitly passing the BASE argument.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker/llmsnap-sd.Containerfile` at line 3, The Containerfile's ARG BASE
currently defaults to a local-only image name ("ARG BASE=llmsnap:latest") which
causes manual builds to fail if that local image doesn't exist; change the
default ARG BASE to a widely available public base image (for example a stable
distro slim tag) or add a clear inline comment above ARG BASE instructing
maintainers to pass --build-arg BASE=<image> when building manually; update the
ARG line (symbol: ARG BASE) and/or the Containerfile comment so the normal build
script docker/build-container.sh can still override it while manual builds
succeed.
ui-svelte/src/stores/theme.ts (1)

7-7: Preserve existing stored title values across the key rename.

Line 7 changes the storage key, which will reset title customization for existing users. Consider either keeping the old key or adding one-time migration.

Suggested minimal compatibility fix
-export const appTitle = persistentStore<string>("appTitle", "llmsnap");
+export const appTitle = persistentStore<string>("app-title", "llmsnap");
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ui-svelte/src/stores/theme.ts` at line 7, The change to the persistentStore
key for appTitle will reset existing users' custom titles; update the appTitle
initialization to first check the old storage key and migrate it to the new key
(or fall back to the old key if present) before calling persistentStore, i.e.,
use the persistentStore API or storage get/set helpers to read the previous key
value and write it under the new key so existing values are preserved
(referencing appTitle and persistentStore<string>("appTitle", "llmsnap") to
locate the change).
README.md (1)

53-53: Minor style suggestion: hyphenation.

Consider using "real-time" instead of "real time" for consistency with compound adjective conventions.

Proposed fix
-llmsnap includes a real time web interface for monitoring logs and controlling models:
+llmsnap includes a real-time web interface for monitoring logs and controlling models:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@README.md` at line 53, Update the sentence containing the phrase "llmsnap
includes a real time web interface for monitoring logs and controlling models:"
to use the compound adjective "real-time" (i.e., change "real time web
interface" to "real-time web interface") so the README's wording is consistent
with hyphenation conventions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ai-plans/issue-264-add-metadata.md`:
- Around line 233-234: Update the tests in proxymanager_test.go to stop
asserting or checking the old key name "llamaswap_meta" and instead assert the
new "llmsnap_meta" behavior: replace any occurrences of "llamaswap_meta" with
"llmsnap_meta" in the test fixtures and assertions, remove mixed old/new-key
checks so each test is consistent (i.e., when metadata exists assert presence
and content of llmsnap_meta; when metadata is empty assert the key is absent),
and apply the same replacements/cleanup for the other affected test sections
referenced around the same areas (the snippets near the earlier mentioned
checks).
- Around line 88-90: The changes introduce API metadata/rebranding (the
llmsnap_meta key and related bullets "Add metadata to each model record under
the key `llmsnap_meta`", "Only include `llmsnap_meta` if metadata is non-empty",
"Preserve all types when marshaling to JSON") which are outside this PR's stated
scope (preload concurrency for process groups); remove or revert those
llmsnap_meta bullets and any rebranding language from this document so the file
only documents the preload concurrency change, and move the llmsnap_meta rebrand
text into a separate focused PR (and corresponding release-note entry)
referencing the llmsnap_meta key to avoid scope drift.

In `@CLAUDE.md`:
- Line 5: Update the sentence in CLAUDE.md to use correct compound-word
hyphenation and spelling: change "light weight" to "lightweight" and add a
hyphen in "OpenAI-compatible" so the phrase reads "llmsnap is a lightweight,
transparent proxy server that provides automatic model swapping to vllm,
llama.cpp and other OpenAI-compatible servers."

In `@codemaps/architecture.md`:
- Line 11: There are three unlabeled fenced code blocks; update each opening
triple-backtick to include a language tag (e.g., change ``` to ```text) so the
blocks are labeled and MD040-compliant; locate the three unlabeled ``` blocks in
the document and add the language specifier to each opening fence.

In `@config-schema.json`:
- Around line 242-315: The schema currently allows "sleepMode": "enable" without
enforcing that sleepEndpoints and wakeEndpoints are present and non-empty; add
an if/then that checks if sleepMode equals "enable" and then requires
["sleepEndpoints","wakeEndpoints"] and enforces "minItems": 1 for both
sleepEndpoints and wakeEndpoints array schemas so editor/JSON-schema validation
matches the runtime checks in proxy/config/model_config.go (referencing
sleepMode, sleepEndpoints, wakeEndpoints).

In `@docs/examples/README.md`:
- Line 3: Fix the typo in the README sentence "A collections of usecases and
examples for getting the most out of llmsnap." by changing it to "A collection
of use cases and examples for getting the most out of llmsnap." — update the
words "collections" → "collection" and "usecases" → "use cases" in the README
header/intro line that currently contains that exact string.

In `@docs/examples/restart-on-config-change/README.md`:
- Line 3: Update the user-facing sentence in the README text that currently
reads "trail and error" to the correct phrase "trial and error"; locate the
sentence mentioning `watch-and-restart.sh`, `config.yaml`, and `llmsnap` in the
README and replace the typo so the line reads "Sometimes editing the
configuration file can take a bit of trial and error to get a model
configuration tuned just right.".

In `@proxy/metrics_monitor.go`:
- Around line 200-201: The fallback metric path is using recorder.StartTime()
which can be zero if recorder.Write never set its start; instead use the
originally captured requestStartTime as the fallback. Update the code around the
newBodyCopier/recorder usage so any metric calculation that currently calls
recorder.StartTime() falls back to requestStartTime (or use
recorder.RequestTime() if present) — e.g., when computing DurationMs in the
default metric path replace recorder.StartTime() with a function that returns
recorder.StartTime() if non-zero else requestStartTime (referencing
requestStartTime, recorder, newBodyCopier, RequestTime(), and StartTime()).
- Around line 215-218: The warning currently logs the raw upstream response body
(recorder.body.Bytes()) when recorder.Status() != http.StatusOK; instead
sanitize and truncate the message before logging: build a safeSnippet by
stripping non-printable characters and limiting to a hard cap (e.g. 200 chars)
and log that snippet with mp.logger.Warnf along with recorder.Status() and
request.URL.Path, and keep the full body only for debug/capture paths (e.g.,
when a debug flag or capture hook is enabled) so raw provider payloads are not
written to standard warning logs.

In `@proxy/process.go`:
- Around line 347-350: The health URL construction failure occurs after the
child process has been started (around the p.buildFullURL(checkEndpoint) call),
so before returning the fmt.Errorf you must cleanly stop/reset the process state
to avoid leaving it in StateStarting; call the process teardown method on p
(e.g., the existing stop/shutdown/cleanup method used elsewhere in this file),
ensure the process state is set to a non-starting state
(StateStopped/StateFailed), handle any teardown error (log or wrap it) and then
return the original URL construction error referencing p.config.Proxy and
checkEndpoint.

In `@proxy/processgroup_test.go`:
- Around line 171-199: The test "non-sleep processes are stopped" creates a
ProcessGroup with NewProcessGroup and starts processes but lacks the consistent
teardown used elsewhere; add a deferred cleanup by calling
pg.StopProcesses(StopImmediately) (e.g., defer
pg.StopProcesses(StopImmediately)) immediately after creating pg so processes
started in the loop and affected by pg.MakeIdleProcesses() are always stopped.

In `@scripts/uninstall.sh`:
- Line 38: The uninstall script currently unquotes the command substitution in
the line that runs $SUDO rm $(which llmsnap), which can break on paths
containing spaces and when which returns empty; update the uninstall.sh logic
that references which llmsnap to (1) use command -v llmsnap instead of which for
a POSIX-safe lookup, (2) capture the result into a variable like snap_path, (3)
check that snap_path is non-empty before attempting removal, and (4) quote the
variable when calling rm (e.g., "$snap_path") so paths with spaces are handled
safely.

---

Outside diff comments:
In `@docs/examples/aider-qwq-coder/README.md`:
- Around line 145-152: The multiline shell example for the aider command is
missing a trailing backslash after the `--model-settings-file
aider.model.settings.yml` line causing `--openai-api-key` to be treated as a new
command; fix the example by adding a trailing `\` to the end of the
`--model-settings-file aider.model.settings.yml` line (so the `--openai-api-key`
and `--openai-api-base` options remain part of the same continued command) and
verify the `aider --architect` block uses `\` at the end of each continued line.
- Around line 130-139: The model settings reference mismatched profile-tagged
names: the entry with name "openai/aider:qwen-coder-32B" points its
weak_model_name and editor_model_name at "openai/aider:qwen-coder-32B-aider",
which is not declared; make the identifiers consistent by either renaming the
top-level name to "openai/aider:qwen-coder-32B-aider" or changing
weak_model_name and editor_model_name to "openai/aider:qwen-coder-32B" so that
the name in the model settings file matches the targets; update the entries that
mention weak_model_name and editor_model_name accordingly to ensure every
referenced model (e.g., "openai/aider:qwen-coder-32B" and
"openai/aider:qwen-coder-32B-aider") is declared with the same profile tag
"aider:".

---

Nitpick comments:
In `@cmd/simple-responder/simple-responder.go`:
- Around line 286-292: The handler registered in r.POST("/collective_rpc",
func(c *gin.Context) {...}) currently ignores the error returned by
io.ReadAll(c.Request.Body); change this to check the error, log or attach the
error to the context, and return an explicit failure status (e.g., 500 Internal
Server Error) when ReadAll fails instead of silently falling back to the generic
400 path — only proceed to inspect the body for "reload_weights" if err == nil.
Ensure you reference the same handler closure when adding the error check and
use c.Status/c.JSON (or c.AbortWithStatus) to return the failure response.

In `@docker/llmsnap-sd.Containerfile`:
- Line 3: The Containerfile's ARG BASE currently defaults to a local-only image
name ("ARG BASE=llmsnap:latest") which causes manual builds to fail if that
local image doesn't exist; change the default ARG BASE to a widely available
public base image (for example a stable distro slim tag) or add a clear inline
comment above ARG BASE instructing maintainers to pass --build-arg BASE=<image>
when building manually; update the ARG line (symbol: ARG BASE) and/or the
Containerfile comment so the normal build script docker/build-container.sh can
still override it while manual builds succeed.

In `@README.md`:
- Line 53: Update the sentence containing the phrase "llmsnap includes a real
time web interface for monitoring logs and controlling models:" to use the
compound adjective "real-time" (i.e., change "real time web interface" to
"real-time web interface") so the README's wording is consistent with
hyphenation conventions.

In `@ui-svelte/src/stores/theme.ts`:
- Line 7: The change to the persistentStore key for appTitle will reset existing
users' custom titles; update the appTitle initialization to first check the old
storage key and migrate it to the new key (or fall back to the old key if
present) before calling persistentStore, i.e., use the persistentStore API or
storage get/set helpers to read the previous key value and write it under the
new key so existing values are preserved (referencing appTitle and
persistentStore<string>("appTitle", "llmsnap") to locate the change).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 3cea36b2-a884-4ca5-865b-d5c0f16280ed

📥 Commits

Reviewing files that changed from the base of the PR and between d87f0ce and 78dbba7.

⛔ Files ignored due to path filters (3)
  • docs/assets/hero1.jpg is excluded by !**/*.jpg
  • docs/assets/hero2.png is excluded by !**/*.png
  • ui-svelte/package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (61)
  • .github/ISSUE_TEMPLATE/bug-report.md
  • .github/workflows/containers.yml
  • .github/workflows/go-ci-windows.yml
  • .github/workflows/release.yml
  • .gitignore
  • CLAUDE.md
  • LICENSE.md
  • Makefile
  • README.md
  • ai-plans/issue-264-add-metadata.md
  • cmd/misc/benchmark-chatcompletion/main.go
  • cmd/simple-responder/simple-responder.go
  • cmd/wol-proxy/README.md
  • codemaps/architecture.md
  • codemaps/backend.md
  • codemaps/data.md
  • codemaps/frontend.md
  • config-schema.json
  • config.example.yaml
  • docker/build-container.sh
  • docker/llmsnap-sd.Containerfile
  • docker/llmsnap.Containerfile
  • docs/assets/hero3.webp
  • docs/configuration.md
  • docs/container-security.md
  • docs/examples/README.md
  • docs/examples/aider-qwq-coder/README.md
  • docs/examples/aider-qwq-coder/aider.model.settings.dualgpu.yml
  • docs/examples/benchmark-snakegame/README.md
  • docs/examples/restart-on-config-change/README.md
  • docs/examples/speculative-decoding/README.md
  • go.mod
  • llama-swap.go
  • proxy/config/config.go
  • proxy/config/config_posix_test.go
  • proxy/config/config_test.go
  • proxy/config/config_windows_test.go
  • proxy/config/model_config.go
  • proxy/helpers_test.go
  • proxy/logMonitor.go
  • proxy/metrics_monitor.go
  • proxy/metrics_monitor_test.go
  • proxy/peerproxy.go
  • proxy/peerproxy_test.go
  • proxy/process.go
  • proxy/process_test.go
  • proxy/processgroup.go
  • proxy/processgroup_test.go
  • proxy/proxymanager.go
  • proxy/proxymanager_api.go
  • proxy/proxymanager_test.go
  • scripts/install.sh
  • scripts/uninstall.sh
  • ui-svelte/index.html
  • ui-svelte/public/site.webmanifest
  • ui-svelte/src/components/Header.svelte
  • ui-svelte/src/components/ModelsPanel.svelte
  • ui-svelte/src/index.css
  • ui-svelte/src/lib/types.ts
  • ui-svelte/src/stores/api.ts
  • ui-svelte/src/stores/theme.ts
💤 Files with no reviewable changes (1)
  • .github/workflows/go-ci-windows.yml

Comment thread ai-plans/issue-264-add-metadata.md Outdated
Comment thread ai-plans/issue-264-add-metadata.md Outdated
Comment thread CLAUDE.md Outdated
Comment thread codemaps/architecture.md Outdated
Comment thread config-schema.json Outdated
Comment thread proxy/process.go Outdated
Comment thread proxy/process.go Outdated
Comment thread proxy/processgroup_test.go Outdated
Comment thread proxy/proxymanager.go
Comment thread scripts/uninstall.sh Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
proxy/proxymanager.go (1)

258-271: ⚠️ Potential issue | 🔴 Critical

Swap-enabled groups cannot be preloaded through parallel ProxyRequest calls.

On Line 266, each goroutine enters ProcessGroup.ProxyRequest. In proxy/processgroup.go, the swap path holds the group lock, stops lastUsedProcess, and then proxies the new model, so the calls serialize and each later preload tears down the one before it. The batch therefore leaves only the last model running, while Line 267 still emits Success: true for the earlier models. Keep swap=true groups sequential here, or add a preload path that bypasses swap bookkeeping.

Minimal safe fallback
-				// Start all models in this group in parallel
-				var wg sync.WaitGroup
-				for _, modelID := range ge.models {
-					wg.Add(1)
-					go func(mid string) {
-						defer wg.Done()
-						proxyLogger.Infof("Preloading model: %s", mid)
-						req, _ := http.NewRequest("GET", "/", nil)
-						processGroup.ProxyRequest(mid, &DiscardWriter{}, req)
-						event.Emit(ModelPreloadedEvent{ModelName: mid, Success: true})
-						proxyLogger.Infof("Preloaded model: %s", mid)
-					}(modelID)
-				}
-				wg.Wait()
+				if processGroup.swap {
+					for _, mid := range ge.models {
+						proxyLogger.Infof("Preloading model: %s", mid)
+						req, _ := http.NewRequest("GET", "/", nil)
+						processGroup.ProxyRequest(mid, &DiscardWriter{}, req)
+						event.Emit(ModelPreloadedEvent{ModelName: mid, Success: true})
+						proxyLogger.Infof("Preloaded model: %s", mid)
+					}
+				} else {
+					var wg sync.WaitGroup
+					for _, modelID := range ge.models {
+						wg.Add(1)
+						go func(mid string) {
+							defer wg.Done()
+							proxyLogger.Infof("Preloading model: %s", mid)
+							req, _ := http.NewRequest("GET", "/", nil)
+							processGroup.ProxyRequest(mid, &DiscardWriter{}, req)
+							event.Emit(ModelPreloadedEvent{ModelName: mid, Success: true})
+							proxyLogger.Infof("Preloaded model: %s", mid)
+						}(modelID)
+					}
+					wg.Wait()
+				}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@proxy/proxymanager.go` around lines 258 - 271, The current parallel preload
loop calling processGroup.ProxyRequest for each modelID (ge.models) races with
swap bookkeeping in ProcessGroup.ProxyRequest causing teardown of previous
processes and false-success ModelPreloadedEvent emissions; modify the preload
logic to detect swap-enabled groups and run their preloads sequentially (instead
of in goroutines) or implement a dedicated preload path that bypasses swap
bookkeeping, and ensure you only emit ModelPreloadedEvent{..., Success: true}
after ProxyRequest/ preload returns a success (check/propagate its error) rather
than unconditionally; refer to ge.models, processGroup.ProxyRequest,
DiscardWriter and ModelPreloadedEvent to locate and update the loop and emission
points.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@proxy/proxymanager.go`:
- Around line 225-245: The current logic (variables groupOrder, seen map and
loop over proxyConfig.Hooks.OnStartup.Preload) deduplicates by group ID and
reorders batches (collapsing non-contiguous same-group entries), so change it to
only batch contiguous same-group entries: iterate
proxyConfig.Hooks.OnStartup.Preload in order, call
pm.findGroupByModelName(modelID) and proxyLogger.Warnf on missing entries as
before, but instead of using seen, keep lastGroupID (string) and if the found
pg.id equals lastGroupID append modelID to the last groupEntry.models, otherwise
push a new groupEntry{id: pg.id, models: []string{modelID}} and set
lastGroupID=pg.id; remove the seen map and its related dedupe logic so original
preload sequence is preserved.

---

Duplicate comments:
In `@proxy/proxymanager.go`:
- Around line 258-271: The current parallel preload loop calling
processGroup.ProxyRequest for each modelID (ge.models) races with swap
bookkeeping in ProcessGroup.ProxyRequest causing teardown of previous processes
and false-success ModelPreloadedEvent emissions; modify the preload logic to
detect swap-enabled groups and run their preloads sequentially (instead of in
goroutines) or implement a dedicated preload path that bypasses swap
bookkeeping, and ensure you only emit ModelPreloadedEvent{..., Success: true}
after ProxyRequest/ preload returns a success (check/propagate its error) rather
than unconditionally; refer to ge.models, processGroup.ProxyRequest,
DiscardWriter and ModelPreloadedEvent to locate and update the loop and emission
points.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 69b035ad-fa28-42d2-bbd3-41f9dd3ca50a

📥 Commits

Reviewing files that changed from the base of the PR and between 78dbba7 and 0353306.

📒 Files selected for processing (1)
  • proxy/proxymanager.go

Comment thread proxy/proxymanager.go
The previous approach used a seen map that collapsed non-contiguous
same-group entries into a single batch, reordering the preload sequence.
This change preserves the original preload order by only batching models
that appear consecutively for the same group.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
proxy/proxymanager.go (1)

258-271: ⚠️ Potential issue | 🔴 Critical

Don’t parallelize ProxyRequest inside swap-enabled groups.

processGroup.ProxyRequest still mutates lastUsedProcess under pg.Lock() and stops the previously active member when pg.swap is true (see proxy/processgroup.go). So this goroutine fan-out serializes on that lock and each later preload tears down the model loaded by the previous goroutine. In practice, only the last model in the batch stays running, while every model still emits ModelPreloadedEvent{Success:true}.

At minimum, keep processGroup.swap batches sequential until preload can use a path that doesn’t update lastUsedProcess.

Possible safe fallback
-				// Start all models in this group in parallel
-				var wg sync.WaitGroup
-				for _, modelID := range ge.models {
-					wg.Add(1)
-					go func(mid string) {
-						defer wg.Done()
-						proxyLogger.Infof("Preloading model: %s", mid)
-						req, _ := http.NewRequest("GET", "/", nil)
-						processGroup.ProxyRequest(mid, &DiscardWriter{}, req)
-						event.Emit(ModelPreloadedEvent{ModelName: mid, Success: true})
-						proxyLogger.Infof("Preloaded model: %s", mid)
-					}(modelID)
-				}
-				wg.Wait()
+				preloadOne := func(mid string) {
+					proxyLogger.Infof("Preloading model: %s", mid)
+					req, _ := http.NewRequest("GET", "/", nil)
+					processGroup.ProxyRequest(mid, &DiscardWriter{}, req)
+					event.Emit(ModelPreloadedEvent{ModelName: mid, Success: true})
+					proxyLogger.Infof("Preloaded model: %s", mid)
+				}
+
+				if processGroup.swap {
+					for _, modelID := range ge.models {
+						preloadOne(modelID)
+					}
+				} else {
+					var wg sync.WaitGroup
+					for _, modelID := range ge.models {
+						wg.Add(1)
+						go func(mid string) {
+							defer wg.Done()
+							preloadOne(mid)
+						}(modelID)
+					}
+					wg.Wait()
+				}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@proxy/proxymanager.go` around lines 258 - 271, The preload loop currently
spawns goroutines for every model in ge.models but processGroup.ProxyRequest
mutates lastUsedProcess and stops previous members when processGroup.swap is
true, causing race/teardown; change the preload logic to detect
processGroup.swap (or pg.swap) and, when true, run the per-model ProxyRequest
calls sequentially (no goroutines) for that group so each preload completes
without tearing down the prior model; when swap is false you may keep the
existing parallel goroutine fan-out; reference the existing loop over ge.models,
processGroup.ProxyRequest, DiscardWriter, and the emitted ModelPreloadedEvent to
locate and update the code.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@proxy/proxymanager.go`:
- Around line 258-271: The preload loop currently spawns goroutines for every
model in ge.models but processGroup.ProxyRequest mutates lastUsedProcess and
stops previous members when processGroup.swap is true, causing race/teardown;
change the preload logic to detect processGroup.swap (or pg.swap) and, when
true, run the per-model ProxyRequest calls sequentially (no goroutines) for that
group so each preload completes without tearing down the prior model; when swap
is false you may keep the existing parallel goroutine fan-out; reference the
existing loop over ge.models, processGroup.ProxyRequest, DiscardWriter, and the
emitted ModelPreloadedEvent to locate and update the code.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a5ffc979-6b2b-4b32-ae4b-9d81f2aa5759

📥 Commits

Reviewing files that changed from the base of the PR and between 0353306 and 8e00a60.

📒 Files selected for processing (1)
  • proxy/proxymanager.go

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant