Improve Knowledge Base re-embeddings#5200
Conversation
📝 WalkthroughWalkthroughAdds prune_all_stale/0 to remove embeddings for posts no longer published; embed_all_posts/0 gains per-batch error handling; embed_posts_batch/1 now replaces delete+insert with an atomic Repo.transaction. A new DB-backed test asserts prune_all_stale/0 behavior. ChangesPost Embedding Atomicity and Pruning
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
test/sanbase/insight/post_embedding_prune_test.exs (1)
1-11: ⚡ Quick winConsider adding a test case that verifies stale embeddings are actually pruned.
The current test is a useful smoke test confirming the query executes without error, but it doesn't verify the core pruning logic. A more complete test would:
- Insert a published post with embeddings (should be kept)
- Insert an unpublished/draft post with embeddings (should be pruned)
- Call
prune_all_stale/0- Assert the return count matches expected deletions
- Assert embeddings for the published post still exist
This would catch regressions in the
NOT IN (subquery)logic.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/sanbase/insight/post_embedding_prune_test.exs` around lines 1 - 11, Add a focused test to verify prune_all_stale/0 actually deletes embeddings for stale/unpublished posts while preserving published ones: in Sanbase.Insight.PostEmbeddingPruneTest insert one published post with embeddings and one unpublished/draft post with embeddings, call PostEmbedding.prune_all_stale(), assert the returned deleted count equals the number of embeddings for the draft post, and finally assert embeddings for the published post still exist (and embeddings for the draft post are gone) so the NOT IN (subquery) pruning logic is validated.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@lib/sanbase/insight/post_embedding.ex`:
- Around line 152-159: The transaction result from Sanbase.Repo.transaction
(wrapping Sanbase.Repo.delete_all(from(pe in __MODULE__, where: pe.post_id in
^post_ids)) and Sanbase.Repo.insert_all(__MODULE__, data)) is currently
discarded, so failures are silent; change the call to return the transaction
result and propagate it up out of the function that performs the batch write,
then update embed_all_posts/0 (which currently uses Enum.each and ignores
returns) to handle the {:ok, _} and {:error, reason} tuples (e.g., log the error
and decide whether to continue or halt) so batch insert errors are visible and
handled.
---
Nitpick comments:
In `@test/sanbase/insight/post_embedding_prune_test.exs`:
- Around line 1-11: Add a focused test to verify prune_all_stale/0 actually
deletes embeddings for stale/unpublished posts while preserving published ones:
in Sanbase.Insight.PostEmbeddingPruneTest insert one published post with
embeddings and one unpublished/draft post with embeddings, call
PostEmbedding.prune_all_stale(), assert the returned deleted count equals the
number of embeddings for the draft post, and finally assert embeddings for the
published post still exist (and embeddings for the draft post are gone) so the
NOT IN (subquery) pruning logic is validated.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 19f67815-d5cc-4bce-a149-8ba375f192ab
📒 Files selected for processing (2)
lib/sanbase/insight/post_embedding.extest/sanbase/insight/post_embedding_prune_test.exs
730996d to
36f6322
Compare
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@lib/sanbase/insight/post_embedding.ex`:
- Around line 168-171: Remove the inline block comment above the atomic replace
logic and instead document the behavior on the embed_posts_batch/1 function: add
a `@doc` string to embed_posts_batch/1 that explains embeddings are generated
before the transaction and that the delete+insert are performed atomically, or
extract the delete+insert logic into a descriptively named helper (e.g.,
replace_embeddings_atomically/2) and add a `@doc` to that helper so the behavior
is self-documenting rather than using an inline comment.
- Around line 56-58: Remove the inline comment inside embed_all_posts/0 and
instead add a short `@doc` for embed_all_posts/0 that documents the per-batch
error handling behavior (that failing batches are logged and skipped and DB
errors inside the transaction are rescued in addition to {:error, _} rollbacks).
Alternatively, make the code self-documenting by renaming local variables or
extracting a helper (e.g., handle_batch_result/1 or log_and_skip_batch/1) so the
behavior is clear without an inline comment; ensure the new `@doc` or helper name
mentions logging and skipping failing batches.
- Around line 94-104: Add an `@spec` for prune_all_stale/0 (e.g. `@spec`
prune_all_stale() :: {:ok, non_neg_integer()} | {:error, term()}) and change the
implementation to return a tagged tuple: wrap the DB call in a safe block (use
try/rescue or with/case around Sanbase.Repo.delete_all to catch exceptions) so
successful deletes return {:ok, deleted} and failures return {:error, reason};
update Logger calls to log on success and errors accordingly and ensure callers
that relied on an integer are updated to handle the {:ok, count} / {:error,
reason} tuple.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 4207c7dd-22bd-429b-9a3b-4676a0119a9e
📒 Files selected for processing (2)
lib/sanbase/insight/post_embedding.extest/sanbase/insight/post_embedding_prune_test.exs
🚧 Files skipped from review as they are similar to previous changes (1)
- test/sanbase/insight/post_embedding_prune_test.exs
Changes
Ticket
Checklist:
Summary by CodeRabbit
New Features
Improvements
Tests