What example would you like to see?
A Jupyter notebook demonstrating retrieval quality benchmarking for financial document RAG using Pinecone. Financial documents (SEC filings, earnings calls, and financial reports) are one of the highest-value use cases for Pinecone, but no existing example shows how to evaluate retrieval precision on structured financial data.
Why this example is needed
Existing Pinecone examples show how to build RAG pipelines, but not how to measure retrieval quality on domain-specific data. Financial documents have specific challenges:
- Heterogeneous structure — 10-Ks mix prose (MD&A, Risk Factors) with tables (Balance Sheets, Income Statements) and footnotes
- Section-boundary bleed — vector search retrieves chunks from adjacent sections that are semantically similar but not contextually relevant to the query
- Numerical precision — retrieving the right financial figure matters ("current assets FY2024" vs. "total assets FY2023")
- No existing benchmark — there's no Pinecone example showing retrieval precision@k on a financial QA dataset
Proposed notebook outline
1. Dataset: FinanceBench (public) or custom 10-K QA pairs
2. Indexing: Chunk 10-K with section-aware metadata
   - doc_type, section, fiscal_year, chunk_role (table/prose/footnote)
3. Query evaluation:
   - For each QA pair, retrieve top-k chunks
   - Score precision@1, precision@5, NDCG@10
4. Metadata filter comparison:
   - Baseline: no filters (pure vector search)
   - Filtered: section + fiscal_year filters applied
5. Show: how metadata filtering improves precision from ~0.55 → ~0.82 on financial queries
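To make the indexing step in the outline concrete, here is a minimal sketch of a section-aware chunk record. The `Chunk` class, the `to_record` helper, the example ID, and the placeholder text are all hypothetical names introduced for illustration. The only assumption about Pinecone is that records are passed as dicts with `id`, `values`, and `metadata` keys. `fiscal_year` is kept as a string so it would match a string-valued equality filter.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """One chunk of a filing plus the metadata fields named in the outline."""
    chunk_id: str
    text: str
    doc_type: str     # e.g. "10-K"
    section: str      # e.g. "balance_sheet", "income_statement", "risk_factors"
    fiscal_year: str  # stored as a string (assumption) so "$eq": "2024" would match
    chunk_role: str   # "table", "prose", or "footnote"


def to_record(chunk, embedding):
    """Build a dict in the id/values/metadata shape (hypothetical helper)."""
    return {
        "id": chunk.chunk_id,
        "values": list(embedding),
        "metadata": {
            "doc_type": chunk.doc_type,
            "section": chunk.section,
            "fiscal_year": chunk.fiscal_year,
            "chunk_role": chunk.chunk_role,
            "text": chunk.text,
        },
    }


# Illustrative values only; the ID, text, and 3-dimensional vector are placeholders.
chunk = Chunk(
    chunk_id="example-10k-2024-bs-003",
    text="Placeholder balance sheet table text",
    doc_type="10-K",
    section="balance_sheet",
    fiscal_year="2024",
    chunk_role="table",
)
record = to_record(chunk, [0.1, 0.2, 0.3])
```

Keeping the metadata flat and string-valued is one possible design choice; it keeps the filtered and unfiltered runs comparable because both query the same records.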
Example code sketch
from pinecone import Pinecone

pc = Pinecone(api_key="...")
index = pc.Index("financial-rag")

# query_embedding is assumed to be the query vector produced earlier
# by whatever embedding model was used to index the chunks.

# Baseline: pure semantic search
results_baseline = index.query(
    vector=query_embedding,
    top_k=5,
    include_metadata=True,
)

# Filtered: section-aware + fiscal year
results_filtered = index.query(
    vector=query_embedding,
    top_k=5,
    filter={
        "section": {"$in": ["balance_sheet", "income_statement"]},
        "fiscal_year": {"$eq": "2024"},
    },
    include_metadata=True,
)

# Eval: compare precision@k
def precision_at_k(results, ground_truth_ids, k=5):
    """Fraction of the top-k retrieved chunk IDs that are in the ground truth."""
    retrieved_ids = [r.id for r in results.matches[:k]]
    hits = len(set(retrieved_ids) & set(ground_truth_ids))
    return hits / k
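The outline also lists NDCG@10, which the sketch above does not cover. A rough version with binary relevance could look like the following. `ndcg_at_k` and `mean_metric` are hypothetical helpers, not part of the Pinecone SDK. They take plain lists of chunk IDs instead of query result objects, and they assume each retrieved ID appears at most once.

```python
import math


def ndcg_at_k(retrieved_ids, ground_truth_ids, k=10):
    """NDCG@k with binary relevance: a chunk is relevant iff its ID is in the ground truth."""
    relevant = set(ground_truth_ids)
    # Discounted gain of each relevant hit; rank is 0-based, so the discount is log2(rank + 2).
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, chunk_id in enumerate(retrieved_ids[:k])
        if chunk_id in relevant
    )
    # Ideal ordering puts every relevant chunk (up to k of them) at the top.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


def mean_metric(metric, runs, k):
    """Average a per-query metric over (retrieved_ids, ground_truth_ids) pairs."""
    scores = [metric(retrieved, truth, k) for retrieved, truth in runs]
    return sum(scores) / len(scores) if scores else 0.0


# Toy data for illustration only, not results from a real benchmark.
toy_runs = [
    (["a", "x"], ["a"]),  # relevant chunk ranked first
    (["x", "a"], ["a"]),  # relevant chunk ranked second
]
toy_mean_ndcg = mean_metric(ndcg_at_k, toy_runs, k=2)
```

Running the same `mean_metric` call over the baseline and the filtered result lists would give the side-by-side comparison described in step 4 of the outline.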
Why I'm well-placed to contribute this
I've been building financial RAG eval frameworks and have a working prototype in my finrag-eval project. I can contribute this notebook as a PR if the team is interested. I'm also familiar with FinanceBench as a QA dataset and can set up the ground truth labels.
Happy to discuss the scope and any preferred notebook format.