
fix: exclude NULL embeddings from vector search to prevent scan errors#19

Open
tsivaprasad wants to merge 1 commit into main from
PLAT-584-rag-service-returns-no-relevant-information-found-on-nodes-added-via-database-update

Conversation

Contributor

@tsivaprasad tsivaprasad commented Apr 30, 2026

This PR adds an IS NOT NULL guard on the vector column to the WHERE clause of the vector search query. Rows with no embedding are now excluded before scanning, so the query succeeds and returns results from the rows that have been embedded.

Refactor
Extracted query construction from VectorSearch into a pure buildVectorSearchQuery function to make the logic unit-testable without a database connection.

Tests
Added TestBuildVectorSearchQuery covering all four combinations:

No filter, no minSimilarity — NULL guard always present
minSimilarity only — threshold condition at $3, correct arg ordering
Filter only — filter params start at $3, NULL guard appended
Filter + minSimilarity — filter params shift to $4, minSimilarity at $3

More details on the issue:
When a RAG service is initially deployed on a single node and the database is later scaled to multiple nodes via Spock replication, queries to the newly added nodes return "No relevant information found" with tokens_used: 0.

Root cause: on newly joined nodes, the embedding column contains NULL for rows that haven't been processed by the background embedder yet. The pgvector cosine distance operator (<=>) returns NULL when either operand is NULL, and scanning NULL into *float64 fails at runtime:

level=WARN msg="vector search failed" error="failed to scan row: can't scan into dest[1] (col: score): cannot scan NULL into *float64"

This caused vector search to return zero results, triggering the "No relevant information found" fallback response.
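
To illustrate why a NULL score cannot land in a plain float64, here is a small sketch using the standard library's sql.NullFloat64, which carries an explicit validity flag. It is not part of this PR (the fix here is the WHERE-clause guard, which keeps NULL scores out of the result set entirely), and scoreFromColumn is a hypothetical helper that takes a raw column value in place of a real database row.

```go
package main

import (
	"database/sql"
	"fmt"
)

// scoreFromColumn is a hypothetical helper for illustration only. src stands
// in for the raw value of the score column: nil for SQL NULL, or a float64.
// A plain *float64 destination has no way to represent "no value", which is
// why the scan in the log line above fails; sql.NullFloat64 records it in
// its Valid field instead.
func scoreFromColumn(src any) (score float64, ok bool, err error) {
	var n sql.NullFloat64
	if err := n.Scan(src); err != nil {
		return 0, false, err
	}
	return n.Float64, n.Valid, nil
}

func main() {
	for _, src := range []any{0.87, nil} {
		score, ok, err := scoreFromColumn(src)
		fmt.Println(score, ok, err)
	}
}
```

With the IS NOT NULL guard in place, the score expression is only evaluated for rows that have an embedding, so the existing *float64 destination remains sufficient.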

PLAT-584

@codacy-production

Up to standards ✅

🟢 Issues 1 medium

Results:
1 new issue

Category Results
Complexity 1 medium

🟢 Metrics 2 duplication

Metric Results
Duplication 2



coderabbitai Bot commented Apr 30, 2026

📝 Walkthrough

This change extracts SQL assembly from the (*Pool).VectorSearch method into a new buildVectorSearchQuery helper function. The helper builds the WHERE clause by combining filter conditions with a NULL guard on the vector column and an optional min-similarity predicate, then returns the query string and its arguments. A 102-line test suite checks that the generated SQL contains the expected clauses, the correct parameter placeholders, and the right argument values at the specified indices.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately and clearly summarizes the main change: adding an IS NOT NULL guard to exclude NULL embeddings from vector search queries to prevent scan errors. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description check | ✅ Passed | The pull request description clearly explains the changes, root cause, and rationale: adding IS NOT NULL guard on vector column, refactoring query construction into a testable helper function, and providing comprehensive test coverage with specific scenarios. |
