
[Example Request] Retrieval eval notebook for financial document RAG — precision/recall benchmarking on SEC filings with Pinecone #584

@Ruthwik-Data

Description


What example would you like to see?

A Jupyter notebook demonstrating retrieval quality benchmarking for financial document RAG using Pinecone. This is one of the highest-value use cases for Pinecone — SEC filings, earnings calls, and financial reports — but there's no existing example that shows how to evaluate retrieval precision on structured financial data.

Why this example is needed

Existing Pinecone examples show how to build RAG pipelines, but not how to measure retrieval quality on domain-specific data. Financial documents have specific challenges:

  1. Heterogeneous structure — 10-Ks mix prose (MD&A, Risk Factors) with tables (Balance Sheets, Income Statements) and footnotes
  2. Section-boundary bleed — vector search retrieves chunks from adjacent sections that are semantically similar but not contextually relevant to the query
  3. Numerical precision — retrieving the right financial figure matters: a query about "current assets FY2024" must not be satisfied by a chunk reporting "total assets FY2023"
  4. No existing benchmark — there's no Pinecone example showing retrieval precision@k on a financial QA dataset
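Challenges 2 and 3 can be made measurable by counting how many retrieved chunks come from the wrong section or the wrong fiscal year. This is a minimal sketch that assumes each retrieved chunk's metadata carries `section` and `fiscal_year` keys (the schema proposed in the outline below). `bleed_rates` is a hypothetical helper and is not part of the Pinecone client.

```python
from typing import Any


def bleed_rates(
    retrieved: list[dict[str, Any]], gold_section: str, gold_year: str
) -> dict[str, float]:
    """Share of retrieved chunks from the wrong section or the wrong fiscal year.

    `retrieved` is a list of chunk metadata dicts. The "section" and
    "fiscal_year" keys follow a hypothetical schema, not a Pinecone requirement.
    """
    if not retrieved:
        return {"section_bleed": 0.0, "year_mismatch": 0.0}
    n = len(retrieved)
    wrong_section = sum(1 for m in retrieved if m.get("section") != gold_section)
    wrong_year = sum(1 for m in retrieved if m.get("fiscal_year") != gold_year)
    return {"section_bleed": wrong_section / n, "year_mismatch": wrong_year / n}


# Toy metadata for one query whose answer lives in the FY2024 balance sheet
retrieved = [
    {"section": "balance_sheet", "fiscal_year": "2024"},
    {"section": "balance_sheet", "fiscal_year": "2023"},  # right section, wrong year
    {"section": "risk_factors", "fiscal_year": "2024"},   # adjacent-section bleed
    {"section": "mdna", "fiscal_year": "2024"},
]
print(bleed_rates(retrieved, "balance_sheet", "2024"))
# → {'section_bleed': 0.5, 'year_mismatch': 0.25}
```

Reporting these two rates next to precision@k helps separate failures caused by section-boundary bleed from failures caused by the wrong reporting period.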

Proposed notebook outline

1. Dataset: FinanceBench (public) or custom 10-K QA pairs
2. Indexing: chunk each 10-K with section-aware metadata:
   - doc_type, section, fiscal_year, chunk_role (table/prose/footnote)
3. Query evaluation:
   - For each QA pair, retrieve the top-k chunks (k ≥ 10, so NDCG@10 can be computed)
   - Score precision@1, precision@5, NDCG@10
4. Metadata filter comparison:
   - Baseline: no filters (pure vector search)
   - Filtered: section + fiscal_year filters applied
5. Show how metadata filtering improves precision on financial queries
   (from ~0.55 to ~0.82)
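Steps 2 and 4 of the outline can be sketched as a chunk record that carries the proposed metadata, plus a helper that builds the same kind of filter used in the code sketch below. This sketch rests on several assumptions. `FilingChunk`, `to_record`, and `build_filter` are hypothetical names, and the chunk ID format is made up. `fiscal_year` is stored as a string so that it matches `{"$eq": "2024"}`. The record dict uses the `id`/`values`/`metadata` shape that `index.upsert(vectors=...)` accepts.

```python
from dataclasses import dataclass
from typing import Optional

ALLOWED_ROLES = {"table", "prose", "footnote"}


@dataclass(frozen=True)
class FilingChunk:
    """One 10-K chunk with section-aware metadata (hypothetical schema)."""
    chunk_id: str
    text: str
    doc_type: str      # e.g. "10-K"
    section: str       # e.g. "balance_sheet", "risk_factors", "mdna"
    fiscal_year: str   # string, so it matches {"$eq": "2024"} filters
    chunk_role: str    # one of ALLOWED_ROLES

    def __post_init__(self) -> None:
        if self.chunk_role not in ALLOWED_ROLES:
            raise ValueError(f"unknown chunk_role: {self.chunk_role!r}")


def to_record(chunk: FilingChunk, embedding: list[float]) -> dict:
    """Build an {id, values, metadata} dict suitable for index.upsert(vectors=[...])."""
    return {
        "id": chunk.chunk_id,
        "values": embedding,
        "metadata": {
            "doc_type": chunk.doc_type,
            "section": chunk.section,
            "fiscal_year": chunk.fiscal_year,
            "chunk_role": chunk.chunk_role,
            "text": chunk.text,  # kept so retrieved passages can be inspected
        },
    }


def build_filter(sections: list[str], fiscal_year: Optional[str] = None) -> dict:
    """Metadata filter restricting a query to given sections (and optionally a year).

    Top-level keys in a Pinecone filter are combined with AND.
    """
    flt: dict = {"section": {"$in": sections}}
    if fiscal_year is not None:
        flt["fiscal_year"] = {"$eq": fiscal_year}
    return flt


chunk = FilingChunk(
    "acme-10k-2024-bs-003", "Total current assets ...", "10-K",
    "balance_sheet", "2024", "table",
)
record = to_record(chunk, [0.1, 0.2, 0.3])  # real embeddings come from your model
# index.upsert(vectors=[record])            # requires a live index
print(build_filter(["balance_sheet", "income_statement"], "2024"))
```

Keeping the chunk text in metadata lets the notebook show each retrieved passage next to its score. Pinecone caps metadata size per record (40 KB at the time of writing), so long table chunks may need trimming.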

Example code sketch

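from pinecone import Pinecone

pc = Pinecone(api_key="...")
index = pc.Index("financial-rag")

# query_embedding is assumed to be the query text embedded with the same
# model (and dimension) that was used to embed the indexed chunks.

# Baseline: pure semantic search
# top_k=10 so the same results can be scored for precision@1/@5 and NDCG@10
results_baseline = index.query(
    vector=query_embedding,
    top_k=10,
    include_metadata=True
)

# Filtered: section-aware + fiscal year (metadata fields set at indexing time)
results_filtered = index.query(
    vector=query_embedding,
    top_k=10,
    filter={
        "section": {"$in": ["balance_sheet", "income_statement"]},
        "fiscal_year": {"$eq": "2024"}  # assumes fiscal_year was upserted as a string
    },
    include_metadata=True
)

# Eval: compare precision@k
def precision_at_k(results, ground_truth_ids, k=5):
    """Fraction of the top-k retrieved chunk IDs that are in the ground truth."""
    retrieved_ids = [match.id for match in results.matches[:k]]
    hits = len(set(retrieved_ids) & set(ground_truth_ids))
    return hits / k  # divide by k, not len(retrieved_ids), so short result lists are penalized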
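The outline also scores precision@1 and NDCG@10, and the title mentions recall. The sketch above computes neither NDCG nor recall. Below is a minimal offline version that assumes binary relevance: a chunk counts as relevant if and only if its ID is in the QA pair's ground-truth set. It works on plain ID lists so it can be unit-tested without an index. A Pinecone response converts with `[m.id for m in results.matches]`. The names `precision_at`, `recall_at`, `ndcg_at`, and `evaluate` are hypothetical, and the IDs are toy values rather than real results.

```python
import math
from statistics import mean


def precision_at(retrieved_ids: list[str], relevant: set[str], k: int) -> float:
    """Hits in the top k divided by k."""
    return sum(1 for doc_id in retrieved_ids[:k] if doc_id in relevant) / k


def recall_at(retrieved_ids: list[str], relevant: set[str], k: int) -> float:
    """Hits in the top k divided by the number of relevant chunks."""
    if not relevant:
        return 0.0
    return sum(1 for doc_id in retrieved_ids[:k] if doc_id in relevant) / len(relevant)


def ndcg_at(retrieved_ids: list[str], relevant: set[str], k: int = 10) -> float:
    """Binary-relevance NDCG: DCG of the ranking divided by DCG of an ideal ranking."""
    dcg = sum(
        1.0 / math.log2(rank + 2)  # rank is 0-based, so position 1 contributes 1/log2(2) = 1
        for rank, doc_id in enumerate(retrieved_ids[:k])
        if doc_id in relevant
    )
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(min(len(relevant), k)))
    return dcg / idcg if idcg > 0 else 0.0


def evaluate(runs: dict[str, list[str]], gold: dict[str, set[str]]) -> dict[str, float]:
    """Mean metrics over labelled queries; runs maps query id -> ranked chunk ids."""
    return {
        "P@1": mean(precision_at(runs[q], gold[q], 1) for q in gold),
        "P@5": mean(precision_at(runs[q], gold[q], 5) for q in gold),
        "R@5": mean(recall_at(runs[q], gold[q], 5) for q in gold),
        "NDCG@10": mean(ndcg_at(runs[q], gold[q], 10) for q in gold),
    }


# Toy ground truth and rankings for two queries (illustrative IDs only)
gold = {"q1": {"a"}, "q2": {"b", "c"}}
baseline = {"q1": ["x", "a", "y", "z", "w"], "q2": ["b", "x", "y", "z", "w"]}
filtered = {"q1": ["a", "x", "y", "z", "w"], "q2": ["b", "c", "x", "y", "z"]}
print("baseline:", evaluate(baseline, gold))
print("filtered:", evaluate(filtered, gold))
```

Running the baseline and filtered queries through the same `evaluate` call keeps the comparison in step 4 like-for-like.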

Why I'm well-placed to contribute this

I've been building financial RAG eval frameworks and have a working prototype in my finrag-eval project. I can contribute this notebook as a PR if the team is interested. I'm also familiar with FinanceBench as a QA dataset and can set up the ground truth labels.

Happy to discuss the scope and any preferred notebook format.
