This PR added an IS NOT NULL guard on the vector column in the WHERE clause of the vector search query. Rows with no embedding are now excluded before scanning, so the query succeeds and returns results from rows that have been embedded.
Refactor
Extracted query construction from VectorSearch into a pure buildVectorSearchQuery function to make the logic unit-testable without a database connection.
Tests
Added TestBuildVectorSearchQuery covering all four combinations:
No filter, no minSimilarity — NULL guard always present
minSimilarity only — threshold condition at $3, correct arg ordering
Filter only — filter params start at $3, NULL guard appended
Filter + minSimilarity — filter params shift to $4, minSimilarity at $3
More details on the issue:
When a RAG service is initially deployed on a single node and the database is later scaled to multiple nodes via Spock replication, queries to the newly added nodes return "No relevant information found" with tokens_used: 0.
Root cause: on newly joined nodes, the embedding column contains NULL for rows that haven't been processed by the background embedder yet. The pgvector cosine distance operator (<=>) returns NULL when either operand is NULL, and scanning NULL into *float64 fails at runtime:
level=WARN msg="vector search failed" error="failed to scan row: can't scan into dest[1] (col: score): cannot scan NULL into *float64"
This caused vector search to return zero results, triggering the "No relevant information found" fallback response.
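To illustrate why a NULL score cannot land in a plain `float64`, the sketch below uses the standard library's `sql.NullFloat64`, which records whether the column value was NULL. This is not the approach taken in this PR (the PR filters such rows out in the `WHERE` clause instead); the helper name `scoreFromColumn` is hypothetical and exists only for this example.

```go
package main

import (
	"database/sql"
	"fmt"
)

// scoreFromColumn is a hypothetical helper: it mimics scanning one "score"
// column value into a nullable destination instead of a plain *float64.
func scoreFromColumn(raw any) (score float64, valid bool, err error) {
	var s sql.NullFloat64
	if err := s.Scan(raw); err != nil {
		return 0, false, err
	}
	return s.Float64, s.Valid, nil
}

func main() {
	// A row whose embedding is NULL yields a NULL distance, hence a NULL score.
	_, valid, _ := scoreFromColumn(nil)
	fmt.Println("NULL score valid:", valid)

	// A row with an embedding yields an ordinary float score.
	score, valid, _ := scoreFromColumn(0.83)
	fmt.Println("score:", score, "valid:", valid)
}
```

A nullable destination would avoid the scan error, but rows without an embedding still carry no usable score, which is why excluding them in the query is the simpler fix.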
This change refactors vector search query construction by extracting SQL assembly from the (*Pool).VectorSearch method into a new buildVectorSearchQuery helper function. The helper builds the WHERE clause by combining filter conditions with a NULL guard on the vector column and an optional min-similarity predicate, and returns the query string and its arguments. A 102-line test suite checks that the generated SQL contains the expected clauses, uses the correct parameter placeholders, and places the argument values at the expected indices.
PLAT-584