knowledge base improvements#5204
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds an OpenRouter LLM client, threads optional response_format through OpenAI/OpenRouter request/retry flows, introduces Citations to render model JSON into markdown with grouped sources and sanitized links, refactors prompt assembly to use global source numbering, and persists/displays model and context-expansion metadata with migrations and UI wiring. ChangesKnowledge Answer Citations and Multi-Model Infrastructure
Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
lib/openai/question.ex (1)
22-30:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winDocument
:response_formatinask/…tracing options.
asknow supports:response_format, but the parameter list in the doc block still omits it.Also applies to: 45-47
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lib/openai/question.ex` around lines 22 - 30, The doc block for tracing_opts on the ask/… functions is missing the newly supported :response_format option; update the documentation for ask (and the repeated block around lines 45-47) to include `:response_format` in the tracing_opts list, describing that it selects the expected response format (e.g., "text", "json", or a schema) and any default behavior, so callers of ask/… and the ask function’s tracing options know this supported key.
🧹 Nitpick comments (3)
lib/openai/question.ex (1)
16-17: ⚡ Quick winAdd a typespec for
default_model/0.The new public function is missing an explicit
@spec.As per coding guidelines, "Add typespecs (
@spec) to all public functions".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lib/openai/question.ex` around lines 16 - 17, The public function default_model/0 is missing an `@spec`; add a typespec for it (e.g., `@spec` default_model() :: String.t()) immediately above def default_model(), referencing the existing `@model` module attribute as the returned type; ensure the spec matches the actual type of `@model` (String.t() or atom() as appropriate) so Dialyzer and the project's coding guidelines are satisfied.lib/open_router/question.ex (1)
41-43: ⚡ Quick winAdd
@specdeclarations for the new public functions.Please add explicit specs for the exported API (
ask/...,default_model/0,configured_model/0,openrouter_apikey/0) to keep contracts analyzable and consistent with project rules.As per coding guidelines, "Add typespecs (
@spec) to all public functions".Also applies to: 139-151
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lib/open_router/question.ex` around lines 41 - 43, Add explicit `@spec` declarations for the exported API: add `@spec` ask(question :: String.t()) :: {:ok, map()} | {:error, term()} for the ask/1 function, add `@spec` default_model() :: String.t(), `@spec` configured_model() :: String.t(), and `@spec` openrouter_apikey() :: String.t() | nil for the respective zero-arity helpers; also ensure any other public functions around lines 139-151 have matching `@specs` following the same style so the module's public contracts are fully annotated.lib/sanbase_web/live/ask_live.ex (1)
227-244: 🏗️ Heavy liftMigrate the new model selector to the
to_form/2+<.input>form pipeline.This new control is wired as raw
<select>+ event handling, which diverges from the project’s required LiveView form pattern and makes this screen harder to keep consistent with the rest of the app.As per coding guidelines: "When building forms, always use the already imported
Phoenix.Component.to_form/2... Never pass raw changesets to templates." and "Always use a form assigned viato_form/2in the LiveView, and the<.input>component in the template."🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lib/sanbase_web/live/ask_live.ex` around lines 227 - 244, The answer_model_select/1 component currently renders a raw <select> and manual phx-change handling; refactor it to use Phoenix.Component.to_form/2 and the project's <.input> form component: in answer_model_select/1 assign a form (e.g., assign(assigns, :form, to_form(assigns.form_changeset_or_params, name: "ask"))) and assign :models from Sanbase.Knowledge.answer_models(); then replace the raw select with <.input type="select" field={:answer_model} form={`@form`} options={Enum.map(`@models`, &{&1.label, &1.key})} /> (remove the manual phx-change and selected logic). Keep the same field name "answer_model" so existing handlers continue to work and ensure the LiveView sets/updates the form assign used by to_form/2.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@lib/sanbase/knowledge/citations.ex`:
- Around line 170-177: The markdown link interpolation in link/1 and
labelled_link/1 currently injects raw url into "[label](url)"; validate and
sanitize the URL before interpolation by parsing it (e.g., via URI.parse),
ensuring the scheme is "http" or "https", and rejecting or skipping entries with
other schemes, and escape/percent-encode any characters that could break
markdown (parentheses, angle brackets, spaces) so the final string uses a safe,
canonical URL; keep using Context.escape_label(label) for the label and only
interpolate the sanitized/validated URL into the returned string.
In `@lib/sanbase/knowledge/knowledge.ex`:
- Around line 288-301: The function answer_academy_hits currently swallows
non-{:ok, _} results from Sanbase.Knowledge.Academy.search_chunks by returning
{:ok, []}; change it to propagate errors instead: use a with to call
Sanbase.Knowledge.Academy.search_chunks(embedding, `@retrieval_top_k`) and only on
{:ok, raw_chunks} run the pipeline (filter_by_similarity, rerank,
diversify_by_document, maybe_expand_context) and return {:ok, hits}, but on any
non-{:ok, reason} return that {:error, reason} so backend failures are not
masked.
In
`@priv/repo/migrations/20260604120000_add_context_expansion_to_question_answer_logs.exs`:
- Around line 5-7: Update the migration so the new column is non-null with a
default false: in the alter table(:question_answer_logs) block change
add(:context_expansion, :boolean) to add(:context_expansion, :boolean, default:
false, null: false) so existing rows become false and the DB enforces a boolean
(no nil); reference the :question_answer_logs table and :context_expansion
column in the migration file to locate the change.
---
Outside diff comments:
In `@lib/openai/question.ex`:
- Around line 22-30: The doc block for tracing_opts on the ask/… functions is
missing the newly supported :response_format option; update the documentation
for ask (and the repeated block around lines 45-47) to include
`:response_format` in the tracing_opts list, describing that it selects the
expected response format (e.g., "text", "json", or a schema) and any default
behavior, so callers of ask/… and the ask function’s tracing options know this
supported key.
---
Nitpick comments:
In `@lib/open_router/question.ex`:
- Around line 41-43: Add explicit `@spec` declarations for the exported API: add
`@spec` ask(question :: String.t()) :: {:ok, map()} | {:error, term()} for the
ask/1 function, add `@spec` default_model() :: String.t(), `@spec`
configured_model() :: String.t(), and `@spec` openrouter_apikey() :: String.t() |
nil for the respective zero-arity helpers; also ensure any other public
functions around lines 139-151 have matching `@specs` following the same style so
the module's public contracts are fully annotated.
In `@lib/openai/question.ex`:
- Around line 16-17: The public function default_model/0 is missing an `@spec`;
add a typespec for it (e.g., `@spec` default_model() :: String.t()) immediately
above def default_model(), referencing the existing `@model` module attribute as
the returned type; ensure the spec matches the actual type of `@model` (String.t()
or atom() as appropriate) so Dialyzer and the project's coding guidelines are
satisfied.
In `@lib/sanbase_web/live/ask_live.ex`:
- Around line 227-244: The answer_model_select/1 component currently renders a
raw <select> and manual phx-change handling; refactor it to use
Phoenix.Component.to_form/2 and the project's <.input> form component: in
answer_model_select/1 assign a form (e.g., assign(assigns, :form,
to_form(assigns.form_changeset_or_params, name: "ask"))) and assign :models from
Sanbase.Knowledge.answer_models(); then replace the raw select with <.input
type="select" field={:answer_model} form={`@form`} options={Enum.map(`@models`,
&{&1.label, &1.key})} /> (remove the manual phx-change and selected logic). Keep
the same field name "answer_model" so existing handlers continue to work and
ensure the LiveView sets/updates the form assign used by to_form/2.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 207ef979-921d-4992-bd2a-f6eb018186a0
📒 Files selected for processing (14)
lib/open_router/question.exlib/openai/question.exlib/sanbase/knowledge/citations.exlib/sanbase/knowledge/context.exlib/sanbase/knowledge/knowledge.exlib/sanbase/knowledge/question_answer_log.exlib/sanbase_web/live/admin/faq/history_live.exlib/sanbase_web/live/admin/faq/history_show.exlib/sanbase_web/live/ask_live.expriv/repo/migrations/20260604120000_add_context_expansion_to_question_answer_logs.exspriv/repo/migrations/20260604130000_add_model_to_question_answer_logs.exspriv/repo/structure.sqltest/sanbase/knowledge/citations_test.exstest/sanbase/knowledge/context_test.exs
8705fc7 to
f4fb672
Compare
f4fb672 to
4e01873
Compare
1b1c3c0 to
972d3ba
Compare
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
lib/sanbase_web/live/ask_live.ex (1)
95-97:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winValidate the selected model key in the event handler.
The
"select_answer_model"event handler accepts any string from the client without validation. If a user sends an invalid or malicious key (e.g., via browser devtools), it will be stored in the socket and later passed toAnswerModel.options_for/1, which returns[]for unknown keys (falling back to the default). While this is safe, explicitly validating the key provides better defense-in-depth and clearer error handling.🔒 Proposed fix to validate the model key
`@impl` true def handle_event("select_answer_model", %{"answer_model" => key}, socket) do - {:noreply, assign(socket, :answer_model, key)} + valid_keys = Enum.map(AnswerModel.selectable(), & &1.key) + + if key in valid_keys do + {:noreply, assign(socket, :answer_model, key)} + else + {:noreply, socket} + end endAlternatively, add a flash message for invalid selections.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lib/sanbase_web/live/ask_live.ex` around lines 95 - 97, The handler handle_event("select_answer_model", %{"answer_model" => key}, socket) currently assigns any client-provided string; update it to validate the key against the known set of models (e.g., via AnswerModel.options_for/1 or a new AnswerModel.valid?/1 or list of allowed keys) before assigning to :answer_model, and if invalid either ignore the change (keep existing socket.assigns.answer_model) or set a safe default and optionally push a flash/error; ensure the validation logic lives in the handler so malicious or unknown keys are never directly stored in the socket.
🧹 Nitpick comments (2)
lib/sanbase_web/live/ask_live.ex (1)
11-23: Depends onAnswerModel.default_key/0fix.Line 18 calls
AnswerModel.default_key()during mount. If the fix for the empty-list crash inanswer_model.exis not applied, this LiveView will crash on mount when no models are selectable.Ensure the fix for
AnswerModel.default_key/0is applied before deploying this LiveView change, or add defensive handling here:answer_model: AnswerModel.default_key() || "gpt-5-nano"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lib/sanbase_web/live/ask_live.ex` around lines 11 - 23, The mount assigns answer_model via AnswerModel.default_key() which can crash if that function returns nil/raises; either ensure the fix is applied inside AnswerModel.default_key/0 so it never returns nil, or change the LiveView mount to defensively set answer_model to a safe fallback (e.g. use AnswerModel.default_key() || "gpt-5-nano") before assigning; update the mount function (the assign call that sets :answer_model) to use this fallback or verify AnswerModel.default_key/0 behavior to prevent mount crashes.test/sanbase/knowledge/answer_model_test.exs (1)
26-30: ⚡ Quick winAdd test coverage for
default_key/0when no models are available.The current test only validates the happy path. Add a test case that verifies behavior when
selectable/0returns an empty list (e.g., when allrequires_envconditions fail). Given the main module currently useshd/1unsafely, this test would help document the expected behavior once that issue is fixed.🧪 Proposed test for edge case
describe "default_key/0" do test "is the first available entry (always present gpt-5-nano)" do assert AnswerModel.default_key() == "gpt-5-nano" end + + # This test would require the ability to clear env vars in test, or mock selectable/0 + # Uncomment once default_key/0 handles empty list gracefully + # test "raises when no models are available" do + # # Would need to mock System.get_env to return nil for all requires_env checks + # assert_raise RuntimeError, ~r/No selectable answer models/, fn -> + # AnswerModel.default_key() + # end + # end endBased on learnings: Focus on testing public context APIs and structure tests with Arrange-Act-Assert pattern.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/sanbase/knowledge/answer_model_test.exs` around lines 26 - 30, Add a test in test/sanbase/knowledge/answer_model_test.exs that simulates AnswerModel.selectable/0 returning an empty list and asserts the current behavior of AnswerModel.default_key/0 in that case; specifically, stub or temporarily override selectable/0 to return [] and assert that calling default_key/0 raises an ArgumentError (hd/1 on empty list) so the edge case is covered and documented until the hd/1 usage in default_key/0 is made safe.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@lib/sanbase/knowledge/answer_model.ex`:
- Around line 54-56: Replace the unsafe hd(selectable()) in default_key/0 with a
safe lookup using List.first(selectable()) and return a safe result (e.g. {:ok,
key} when a model exists or {:error, :no_models_available} when the list is
empty); update callers (notably the LiveView mount in ask_live.ex) to
pattern-match on the tuple and handle the {:error, :no_models_available} case
gracefully instead of crashing.
- Around line 24-44: The `@models` list contains invalid OpenAI model IDs for the
entries with key "gpt-5-nano" and "gpt-5-mini"; update the model fields in those
map entries (referenced in the module's `@models`) to the correct OpenAI
identifiers (for example replace "gpt-5-nano" -> "gpt-5.4-nano" and "gpt-5-mini"
-> "gpt-5.4-mini" or another supported ID like "gpt-5.5" as appropriate), or
alternatively read the desired model from configuration/env and use that value
as the model field to provide a safe fallback; ensure the client remains
Sanbase.OpenAI.Question and keep the keys/labels intact.
- Around line 46-89: Add doctest-style usage examples to each public function's
`@doc`: include minimal example calls and expected return forms for selectable/0,
default_key/0, options_for/1, client/1 and resolve/1 so docs compile as tests;
update the `@doc` for selectable(), default_key(), options_for(key),
client(options) and resolve(options \\ []) to show one concrete call (e.g.
calling selectable() and showing a list/map shape, calling
options_for("some_key") returning a keyword list or [], client/1 returning a
module, resolve/1 returning a string) while keeping examples simple and not
relying on external state (use pattern that works in tests or guard with
explanatory comment).
---
Outside diff comments:
In `@lib/sanbase_web/live/ask_live.ex`:
- Around line 95-97: The handler handle_event("select_answer_model",
%{"answer_model" => key}, socket) currently assigns any client-provided string;
update it to validate the key against the known set of models (e.g., via
AnswerModel.options_for/1 or a new AnswerModel.valid?/1 or list of allowed keys)
before assigning to :answer_model, and if invalid either ignore the change (keep
existing socket.assigns.answer_model) or set a safe default and optionally push
a flash/error; ensure the validation logic lives in the handler so malicious or
unknown keys are never directly stored in the socket.
---
Nitpick comments:
In `@lib/sanbase_web/live/ask_live.ex`:
- Around line 11-23: The mount assigns answer_model via
AnswerModel.default_key() which can crash if that function returns nil/raises;
either ensure the fix is applied inside AnswerModel.default_key/0 so it never
returns nil, or change the LiveView mount to defensively set answer_model to a
safe fallback (e.g. use AnswerModel.default_key() || "gpt-5-nano") before
assigning; update the mount function (the assign call that sets :answer_model)
to use this fallback or verify AnswerModel.default_key/0 behavior to prevent
mount crashes.
In `@test/sanbase/knowledge/answer_model_test.exs`:
- Around line 26-30: Add a test in test/sanbase/knowledge/answer_model_test.exs
that simulates AnswerModel.selectable/0 returning an empty list and asserts the
current behavior of AnswerModel.default_key/0 in that case; specifically, stub
or temporarily override selectable/0 to return [] and assert that calling
default_key/0 raises an ArgumentError (hd/1 on empty list) so the edge case is
covered and documented until the hd/1 usage in default_key/0 is made safe.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 73927631-d583-403a-86f5-10baa777598c
📒 Files selected for processing (6)
lib/sanbase/knowledge/answer_model.exlib/sanbase/knowledge/citations.exlib/sanbase/knowledge/knowledge.exlib/sanbase_web/live/ask_live.expriv/repo/structure.sqltest/sanbase/knowledge/answer_model_test.exs
🚧 Files skipped from review as they are similar to previous changes (2)
- lib/sanbase/knowledge/citations.ex
- lib/sanbase/knowledge/knowledge.ex
Changes
Ticket
Checklist:
Summary by CodeRabbit
New Features
Behavior / UX
Admin Improvements
Database
Tests