Summary
memory-lancedb-pro's LLM completion lane builds a minimal OpenAI chat-completions request in its own client (src/llm-client.ts, createApiKeyClient) and posts it directly to the configured llm.baseURL. This bypasses the host model catalog and runtime entirely. When the base URL is OpenRouter, there is no way to apply OpenRouter's provider routing object, or any other catalog-level model configuration, so OpenRouter free-routes the model across all providers.
Where
src/llm-client.ts, createApiKeyClient(). The request carries only model, messages, temperature (plus a conditional chat_template_kwargs). The same client backs every completion lane (smart extraction, reflection, dedup/merge, admission-control, the memory upgrader).
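To make the shape of that request concrete, here is a minimal sketch of a body with only the fields listed above. The names `ChatMessage`, `MinimalChatRequest`, and `buildMinimalRequest` are hypothetical and chosen for illustration; the actual code in createApiKeyClient() may be structured differently.

```typescript
// Hypothetical types for illustration; not copied from src/llm-client.ts.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface MinimalChatRequest {
  model: string;
  messages: ChatMessage[];
  temperature: number;
  // Only present when the caller supplies it (the "conditional" field).
  chat_template_kwargs?: Record<string, unknown>;
}

// Builds a body with exactly the fields the issue describes.
// Note there is no slot for a `provider` object or other catalog settings.
function buildMinimalRequest(
  model: string,
  messages: ChatMessage[],
  temperature: number,
  chatTemplateKwargs?: Record<string, unknown>,
): MinimalChatRequest {
  const body: MinimalChatRequest = { model, messages, temperature };
  if (chatTemplateKwargs !== undefined) {
    body.chat_template_kwargs = chatTemplateKwargs;
  }
  return body;
}

const minimalBody = buildMinimalRequest(
  "openai/gpt-oss-120b",
  [{ role: "user", content: "Summarize this memory." }],
  0.1,
);
console.log(Object.keys(minimalBody)); // [ 'model', 'messages', 'temperature' ]
```

Because the body is assembled from a fixed set of arguments, anything the host catalog knows about the model has no way to reach the wire.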
Impact
On OpenRouter, an open-weight model such as openai/gpt-oss-120b is served by many providers with very different latency, throughput, quality, and rate limits. Because the plugin's calls carry no provider preference, they free-route (observed across SiliconFlow, DeepInfra, Together, Amazon Bedrock, NovitaAI, and Weights & Biases). A latency-sensitive memory and reflection lane cannot be pinned to fast providers (for example Cerebras or Groq), cannot avoid providers that return 429s, and cannot be attributed to the host app in OpenRouter's dashboard. Provider routing is only the first example: any catalog-level model setting (sampling params, request headers, request policy) is equally unavailable to this lane.
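For comparison, the sketch below shows roughly what a pinned request could look like if the lane were able to carry OpenRouter's `provider` object. The field names `order`, `allow_fallbacks`, and `ignore`, and the `HTTP-Referer` / `X-Title` attribution headers, follow OpenRouter's documentation as I understand it; the provider slugs, the site URL, and the helper names `withProviderRouting` and `attributionHeaders` are assumptions for illustration and should be checked against the routing reference.

```typescript
// Subset of OpenRouter's provider preferences; see the routing reference for the full list.
interface ProviderPreferences {
  order?: string[];          // providers to try, in priority order
  allow_fallbacks?: boolean; // false = do not route outside `order`
  ignore?: string[];         // providers to skip entirely
}

interface RoutedChatRequest {
  model: string;
  messages: { role: string; content: string }[];
  temperature: number;
  provider?: ProviderPreferences;
}

// Hypothetical helper: returns a copy of the request with routing attached.
function withProviderRouting(
  request: RoutedChatRequest,
  provider: ProviderPreferences,
): RoutedChatRequest {
  return { ...request, provider };
}

// Hypothetical helper: headers OpenRouter uses to attribute traffic to an app.
function attributionHeaders(siteUrl: string, appTitle: string): Record<string, string> {
  return { "HTTP-Referer": siteUrl, "X-Title": appTitle };
}

const pinnedRequest = withProviderRouting(
  { model: "openai/gpt-oss-120b", messages: [{ role: "user", content: "ping" }], temperature: 0 },
  // Slugs below are illustrative; confirm the exact spelling in OpenRouter's docs.
  { order: ["cerebras", "groq"], allow_fallbacks: false },
);
const appHeaders = attributionHeaders("https://example.invalid", "OpenClaw");
console.log(JSON.stringify(pinnedRequest.provider), Object.keys(appHeaders));
```

This is the kind of per-setting passthrough the issue argues against maintaining by hand; it is shown only to illustrate what the lane currently cannot express.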
Proposed direction
Rather than adding a one-off passthrough per setting, let the completion lane route through the host managed LLM runtime using a catalog model reference, the way other OpenClaw plugins already do (for example lossless-claw's summaryModel and expansionModel, which set model: <ref> and let the host resolve and complete it). The plugin already holds a host-runtime handle (api.runtime.agent) for its recall sub-agent, so the completion lane could use the same managed path when running under OpenClaw with a catalog model reference, and fall back to the existing standalone client for CLI and non-host usage.
Benefit: the full catalog entry applies automatically (provider routing, sampling params, headers, auth profiles, app attribution, and anything added later), instead of re-patching the client for each new field.
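One possible shape for the lane selection, as a rough sketch only: use the host-managed path when both a runtime handle and a catalog model reference are present, otherwise fall back to the standalone client. Everything here is hypothetical, including the `HostRuntime` interface and its `complete` method; the issue only establishes that the plugin holds api.runtime.agent, not what completion API the host exposes.

```typescript
interface CompletionRequest {
  prompt: string;
}

type CompletionFn = (request: CompletionRequest) => Promise<string>;

// Hypothetical stand-in for the host-managed runtime; the real OpenClaw API may differ.
interface HostRuntime {
  complete(modelRef: string, request: CompletionRequest): Promise<string>;
}

interface LaneOptions {
  hostRuntime?: HostRuntime;      // present only when running under the host
  modelRef?: string;              // catalog model reference, if configured
  standaloneClient: CompletionFn; // existing direct client (CLI / non-host usage)
}

interface CompletionLane {
  kind: "host" | "standalone";
  complete: CompletionFn;
}

// Prefer the managed path so the full catalog entry applies; otherwise fall back.
function selectCompletionLane(options: LaneOptions): CompletionLane {
  const { hostRuntime, modelRef, standaloneClient } = options;
  if (hostRuntime !== undefined && modelRef !== undefined && modelRef !== "") {
    return {
      kind: "host",
      complete: (request) => hostRuntime.complete(modelRef, request),
    };
  }
  return { kind: "standalone", complete: standaloneClient };
}

// Usage with fakes: the same options object resolves differently with and without a host.
const fakeStandalone: CompletionFn = async () => "standalone";
const fakeHost: HostRuntime = { complete: async (ref) => `host:${ref}` };

const cliLane = selectCompletionLane({ standaloneClient: fakeStandalone });
const hostLane = selectCompletionLane({
  hostRuntime: fakeHost,
  modelRef: "catalog/memory-model", // hypothetical reference
  standaloneClient: fakeStandalone,
});
console.log(cliLane.kind, hostLane.kind); // standalone host
```

Keeping the decision in one function means every completion lane (extraction, reflection, dedup/merge, and so on) inherits the same behavior without each call site knowing which path is active.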
OpenRouter provider routing reference: https://openrouter.ai/docs/features/provider-routing
Environment
- memory-lancedb-pro 1.1.0-beta.11 (master)
- OpenClaw 2026.6.1, host-native, Node 24
- Provider: OpenRouter (chat-completions)