Query a Docs2DB RAG database with modern retrieval techniques. Docs2DB-API provides a Python library for hybrid search (vector + BM25) with reranking.
What it does:
- Queries RAG databases created by docs2db
- Hybrid search: combines vector similarity with BM25 full-text search
- Reciprocal Rank Fusion (RRF) for result combination
- Cross-encoder reranking for improved result quality
- Question refinement for query expansion
- Universal RAG engine adaptable to multiple API frameworks
What it's for:
- Building RAG applications and agents
- Adding document search to LLM systems
- Serving RAG APIs (FastAPI, LlamaStack, custom frameworks)
Install the package:

uv add docs2db-api

Step 1: Create a database with docs2db
uv tool install docs2db
docs2db pipeline /path/to/documents

This creates ragdb_dump.sql.
Step 2: Restore and query
# Start database
uv run docs2db-api db-start
# Restore dump
uv run docs2db-api db-restore ragdb_dump.sql
# Check status
uv run docs2db-api db-status

Step 3: Use in your application
import asyncio
from docs2db_api.rag.engine import UniversalRAGEngine, RAGConfig

async def main():
    # Initialize engine with defaults (auto-detects database from environment)
    engine = UniversalRAGEngine()
    await engine.start()

    # Or with specific settings:
    # config = RAGConfig(
    #     model_name="granite-30m-english",
    #     max_chunks=5,
    #     similarity_threshold=0.7
    # )
    # db_config = {
    #     "host": "localhost",
    #     "port": "5432",
    #     "database": "ragdb",
    #     "user": "postgres",
    #     "password": "postgres"
    # }
    # engine = UniversalRAGEngine(config=config, db_config=db_config)
    # await engine.start()

    # Search
    result = await engine.search_documents("How do I configure authentication?")
    for doc in result.documents:
        print(f"Score: {doc['similarity_score']:.3f}")
        print(f"Source: {doc['document_path']}")
        print(f"Text: {doc['text'][:200]}...\n")
asyncio.run(main())

Docs2DB-API includes a native LlamaStack tool provider for agent-based RAG. See the complete demo with setup scripts and examples:
📁 demos/llama-stack/ - LlamaStack RAG tool provider with agent demos
(Note: the demo still needs adjustments to run from the PyPI package.)
Configuration precedence (highest to lowest):
- CLI arguments: --host, --port, --db, --user, --password
- Environment variables: DOCS2DB_DB_HOST, DOCS2DB_DB_PORT, DOCS2DB_DB_DATABASE, DOCS2DB_DB_USER, DOCS2DB_DB_PASSWORD
- DOCS2DB_DB_URL: postgresql://user:pass@host:port/database
- postgres-compose.yml in current directory
- Defaults: localhost:5432, user=postgres, password=postgres, db=ragdb
Examples:
# Use defaults
uv run docs2db-api db-status
# Environment variables
export DOCS2DB_DB_HOST=prod.example.com
export DOCS2DB_DB_DATABASE=mydb
uv run docs2db-api db-status
# DOCS2DB_DB_URL (cloud providers)
export DOCS2DB_DB_URL="postgresql://user:pass@host:5432/db"
uv run docs2db-api db-status
# CLI arguments
uv run docs2db-api db-status --host localhost --db mydb

Note: Don't mix DOCS2DB_DB_URL with individual DOCS2DB_DB_* variables.
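The library resolves these settings internally; the short sketch below only illustrates how a DOCS2DB_DB_URL corresponds to the individual fields of the db_config dict shown in Step 3 (the parsing code is illustrative, not part of Docs2DB-API):

import os
from urllib.parse import urlparse

# Illustrative only: how a DOCS2DB_DB_URL maps onto the individual connection fields.
# Docs2DB-API performs this resolution itself; this is not its code.
url = urlparse(os.environ.get("DOCS2DB_DB_URL", "postgresql://postgres:postgres@localhost:5432/ragdb"))
db_config = {
    "host": url.hostname or "localhost",
    "port": str(url.port or 5432),
    "database": (url.path or "/ragdb").lstrip("/"),
    "user": url.username or "postgres",
    "password": url.password or "postgres",
}
print(db_config)  # same keys as the db_config dict in Step 3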
Configure the LLM used for query refinement:
export DOCS2DB_LLM_BASE_URL=http://localhost:11434 # OpenAI-compatible API (e.g., Ollama)
export DOCS2DB_LLM_MODEL=qwen2.5:7b-instruct # Model name
export DOCS2DB_LLM_TIMEOUT=30.0 # HTTP timeout (seconds)
export DOCS2DB_LLM_TEMPERATURE=0.7 # Generation temperature
export DOCS2DB_LLM_MAX_TOKENS=500 # Max tokens per response

export DOCS2DB_OFFLINE=true # Only use locally cached embedding model (no downloads)

By default, the embedding model is downloaded automatically on first use. Set DOCS2DB_OFFLINE=true for airgapped/offline environments where the model must already be cached.
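Because these are ordinary environment variables, they can also be set from Python before the engine starts. A minimal sketch (the variable names are the ones documented above; the values are placeholders):

import asyncio
import os

from docs2db_api.rag.engine import UniversalRAGEngine

# Set LLM and offline-mode configuration before the engine reads the environment.
os.environ["DOCS2DB_LLM_BASE_URL"] = "http://localhost:11434"  # e.g. a local Ollama server
os.environ["DOCS2DB_LLM_MODEL"] = "qwen2.5:7b-instruct"
os.environ["DOCS2DB_OFFLINE"] = "true"  # use only the locally cached embedding model

async def main():
    engine = UniversalRAGEngine()
    await engine.start()
    result = await engine.search_documents("deployment guide")
    for doc in result.documents:
        print(doc["document_path"])

asyncio.run(main())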
RAG settings control retrieval behavior (similarity thresholds, reranking, refinement, etc.) and can be stored in the database or provided at query time.
Available settings:
- refinement_prompt - Custom prompt for query refinement
- enable_refinement (refinement) - Enable question refinement (true/false)
- enable_reranking (reranking) - Enable cross-encoder reranking (true/false)
- similarity_threshold - Similarity threshold (0.0-1.0)
- max_chunks - Maximum chunks to return
- max_tokens_in_context - Maximum tokens in the context window
- refinement_questions_count - Number of refined questions to generate
Settings are resolved in order of precedence (highest to lowest):
- Query parameters - Passed directly to engine.search_documents() or the CLI (--threshold, --limit, etc.)
- RAGConfig object - Provided when initializing UniversalRAGEngine
- Database settings - Stored in the database via the docs2db config command (see docs2db)
- Code defaults - Built-in fallback values
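For example, retrieval settings stored in the database can be overridden for a single application by passing a RAGConfig at initialization. This sketch reuses only the RAGConfig fields shown in Step 3 and assumes db_config can be omitted so the database connection is still resolved from the environment:

import asyncio

from docs2db_api.rag.engine import RAGConfig, UniversalRAGEngine

async def main():
    # Override database-stored retrieval settings for this application only.
    config = RAGConfig(
        max_chunks=10,              # return up to 10 chunks
        similarity_threshold=0.6,   # accept slightly weaker matches
    )
    engine = UniversalRAGEngine(config=config)  # assumption: db settings come from the environment
    await engine.start()
    result = await engine.search_documents("How do I configure authentication?")
    for doc in result.documents:
        print(doc["similarity_score"], doc["document_path"])

asyncio.run(main())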
docs2db-api db-start # Start PostgreSQL with Podman/Docker
docs2db-api db-stop # Stop PostgreSQL (data preserved)
docs2db-api db-destroy # Stop and delete all data
docs2db-api db-status # Check connection and stats
docs2db-api db-restore <file> # Restore database from dump
docs2db-api manifest # Generate list of documents

# Basic search
docs2db-api query "How do I configure authentication?"
# Advanced options
docs2db-api query "deployment guide" \
  --model granite-30m-english \
  --limit 20 \
  --threshold 0.8 \
  --no-refine # Disable question refinement

Docs2DB-API implements modern retrieval techniques:
- Contextual chunks - LLM-generated context situating each chunk within its document (Anthropic's approach)
- Hybrid search - Combines BM25 (lexical) and vector embeddings (semantic)
- Reciprocal Rank Fusion (RRF) - Intelligent result combination (see the sketch after this list)
- Cross-encoder reranking - Improved result quality
- Question refinement - Query expansion for better matches
- PostgreSQL full-text search - tsvector with GIN indexing for BM25
- pgvector similarity - Fast vector search with HNSW indexes
- Universal RAG engine - Adaptable to multiple API frameworks
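To illustrate the RRF step referenced above, here is a generic sketch of Reciprocal Rank Fusion over two ranked result lists (each document scores the sum of 1/(k + rank) across lists). This is not Docs2DB-API's internal implementation, and k=60 is just a conventional default:

from collections import defaultdict

def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    # Each ranking is a list of document IDs ordered best-first.
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Example: fuse BM25 (lexical) and vector (semantic) rankings of the same corpus.
bm25_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_c", "doc_a", "doc_d"]
print(rrf_fuse([bm25_hits, vector_hits]))  # doc_a ranks first: strong in both lists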
See LICENSE for details.