
Unify scientific RAG sources with Semantic Scholar references #24

Open
shaiananvari8 wants to merge 1 commit into aietal:master from shaiananvari8:isaac-497-semantic-scholar-rag

@shaiananvari8

Part of the open Algora bounty for [ISAAC-497] Implement an enhanced RAG Pipeline for Scientific/Research Workflows.

/claim #45

Bounty reference: https://algora.io/isaac/bounties/clq18zr98000ejs0gt0nv7gwu

Summary

  • Add a shared scientific source helper that normalizes uploaded PDF chunks and saved Semantic Scholar references into one citation-aware RAG source model.
  • Let /api/inject-documents ingest optional Semantic Scholar reference JSON alongside uploaded PDFs, preserving paper IDs, authors, years, venues, DOI/URL data, page/chunk metadata, and stable citation keys.
  • Return structured context and sources from /api/fetch-documents, with a bounded retrieval count, HTTP method and input validation, and support for the CHROMA_PATH setting.
  • Update RAG chat to fetch evidence from the current deployment origin, use the returned citation-ready context, cite exact source keys, and respect the configured temperature.
  • Add unit coverage for mixed uploaded-document and Semantic Scholar reference formatting.
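To make the "one citation-aware RAG source model" idea concrete, here is a minimal sketch of how two very different inputs could be normalized into a single shape with stable citation keys. Everything here is hypothetical and for illustration only: the type `ScientificSource`, the functions `normalizePdfChunk` and `normalizeSemanticScholarReference`, and the `doc:`/`s2:` key formats are assumptions, not the PR's actual code. The reference shape assumes a subset of Semantic Scholar Graph API paper fields (`paperId`, `title`, `authors`, `year`, `venue`, `abstract`, `url`, `externalIds.DOI`).

```typescript
// Hypothetical unified source model; field names are illustrative, not the PR's types.
type SourceKind = "uploaded_document" | "semantic_scholar";

interface ScientificSource {
  kind: SourceKind;
  citationKey: string; // stable key the model is asked to cite verbatim
  title: string;
  authors: string[];
  year?: number;
  venue?: string;
  doi?: string;
  url?: string;
  page?: number;
  chunkIndex?: number;
  text: string; // PDF chunk text, or the reference abstract
}

// Assumed shape of an uploaded PDF chunk (loosely modeled on LangChain Document metadata).
interface PdfChunk {
  pageContent: string;
  metadata: { source: string; page?: number; chunkIndex?: number };
}

// Assumed subset of a saved Semantic Scholar paper record.
interface SemanticScholarReference {
  paperId: string;
  title: string;
  authors?: { name: string }[];
  year?: number;
  venue?: string;
  abstract?: string | null;
  url?: string;
  externalIds?: { DOI?: string };
}

// Turn a file name into a deterministic slug so re-ingesting the same PDF yields the same key.
function slug(value: string): string {
  return value
    .toLowerCase()
    .replace(/\.pdf$/, "")
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Uploaded chunk -> unified source; the key encodes document, page, and chunk position.
function normalizePdfChunk(chunk: PdfChunk): ScientificSource {
  const { source, page, chunkIndex } = chunk.metadata;
  const parts = [`doc:${slug(source)}`];
  if (page !== undefined) parts.push(`p${page}`);
  if (chunkIndex !== undefined) parts.push(`c${chunkIndex}`);
  return {
    kind: "uploaded_document",
    citationKey: parts.join(":"),
    title: source,
    authors: [],
    page,
    chunkIndex,
    text: chunk.pageContent,
  };
}

// Saved reference -> unified source; the Semantic Scholar paperId is already a stable ID.
function normalizeSemanticScholarReference(ref: SemanticScholarReference): ScientificSource {
  return {
    kind: "semantic_scholar",
    citationKey: `s2:${ref.paperId}`,
    title: ref.title,
    authors: (ref.authors ?? []).map((a) => a.name),
    year: ref.year,
    venue: ref.venue || undefined,
    doi: ref.externalIds?.DOI,
    url: ref.url,
    text: ref.abstract ?? "",
  };
}
```

Deriving keys deterministically from existing identifiers (file name plus page/chunk position, or `paperId`) rather than generating random IDs means a citation emitted by the model can be mapped back to the same source across re-ingestion runs.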

Why this helps ISAAC-497

The bounty calls out unifying uploaded documents with saved references from Semantic Scholar. The current master branch only stores uploaded PDF chunks and formats raw Chroma rows. This PR keeps the existing Chroma/LangChain architecture but gives the pipeline a mixed-source evidence contract so scientific answers can cite both user documents and reference metadata consistently.
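One way to picture the mixed-source evidence contract between /api/fetch-documents and RAG chat is a response carrying both a citation-ready `context` string and the structured `sources` array. The sketch below is an assumption-laden illustration: `FetchDocumentsResponse`, `formatEvidence`, `clampRetrievalCount`, and the `DEFAULT_K`/`MAX_K` values are hypothetical names and numbers, since the PR only states that the retrieval count is bounded.

```typescript
// Minimal, hypothetical source shape for this sketch (not the PR's actual type).
interface EvidenceSource {
  citationKey: string;
  kind: "uploaded_document" | "semantic_scholar";
  title: string;
  year?: number;
  page?: number;
  text: string;
}

// Hypothetical response contract: prompt-ready text plus metadata the UI can render.
interface FetchDocumentsResponse {
  context: string;
  sources: EvidenceSource[];
}

// Assumed bounds; the real limits are not stated in the PR description.
const DEFAULT_K = 4;
const MAX_K = 20;

// Clamp a user-supplied retrieval count into [1, MAX_K]; fall back to DEFAULT_K
// when the value is missing, empty, or not a finite number.
function clampRetrievalCount(raw: unknown): number {
  const n = typeof raw === "number" || typeof raw === "string" ? Number(raw) : NaN;
  if (raw === "" || !Number.isFinite(n)) return DEFAULT_K;
  return Math.min(MAX_K, Math.max(1, Math.floor(n)));
}

// Render each source as "[key] (kind: title, year, page)" followed by its text,
// so the chat prompt can ask the model to cite the bracketed keys exactly.
function formatEvidence(sources: EvidenceSource[]): FetchDocumentsResponse {
  const blocks = sources.map((s) => {
    const label = s.kind === "semantic_scholar" ? "Semantic Scholar reference" : "Uploaded document";
    const details = [s.title, s.year?.toString(), s.page !== undefined ? `p. ${s.page}` : undefined]
      .filter((d): d is string => Boolean(d))
      .join(", ");
    return `[${s.citationKey}] (${label}: ${details})\n${s.text.trim()}`;
  });
  return { context: blocks.join("\n\n"), sources };
}
```

Keeping `sources` alongside the flattened `context` lets the chat side validate that every key it cites actually appeared in the retrieved evidence, instead of parsing metadata back out of prompt text.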

Validation

From ui/:

npx vitest run __tests__/scientific-sources.test.ts --reporter verbose
npx vitest run --reporter verbose
npx tsc --noEmit --pretty false
npm run lint -- --file pages/api/fetch-documents.ts --file pages/api/inject-documents.ts --file pages/api/rag-chat.ts --file utils/server/scientific-sources.ts --file __tests__/scientific-sources.test.ts
npm run lint
npm run build

Results:

  • Focused source helper tests passed: 5/5
  • Full Vitest suite passed: 16/16
  • TypeScript passed
  • Targeted lint passed with no warnings
  • Full lint and build passed with pre-existing React hook dependency warnings in unrelated files

