Report: Finance AI Agent Report
Tools: Python, Neo4j, LangChain, OpenAI GPT-4, Streamlit
- Yuwen (April) Yang - GitHub | LinkedIn
- Wenjun Song - GitHub | LinkedIn
- Selina Bian - GitHub | LinkedIn
- Xinxin (Stephanie) Liu - GitHub | LinkedIn
10-K filings are long, complex, and packed with valuable insights, but they are hard to parse and compare quickly. This makes it difficult for investors, analysts, and finance professionals to make fast, informed decisions.
Finance AI Agent is a GraphRAG-powered Q&A system that transforms raw 10-K filings into a structured knowledge graph. It supports natural language queries, connects key concepts like risks and strategies, and delivers AI-powered answers in seconds.
Without better tools:
- Valuable signals in 10-Ks go unnoticed.
- Traditional systems can’t interpret unstructured data effectively.
- Decision-makers lack timely, contextual answers to complex questions.
We asked:
- Can we extract and connect meaningful entities from filings automatically?
- Can users ask natural-language questions and get context-aware, trustworthy answers?
- Can we build a system that scales and stays grounded in facts?
Our end-to-end GraphRAG pipeline integrates:
- Data Automation – Scrape and parse SEC 10-Ks using Serper and CrewAI
- Entity Extraction – spaCy NER + risk-specific keywords
- Graph Construction – Build Neo4j knowledge graph with TextChunk, Entity, and relationships
- Semantic Search – Embed questions + text chunks using OpenAI embeddings
- Answer Generation – LangChain QA chain (RetrievalQAWithSources) returns grounded, cited answers
Source:
- SEC EDGAR
- Parsed into JSON format, retaining key sections:
  - item1 – Business Overview
  - item1a – Risk Factors
  - item7 – MD&A
  - item7a – Market Risk
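A minimal sketch of reading one parsed filing and keeping only the sections listed above. The section keys come from this README; the surrounding JSON layout (the `company` and `year` fields, one object per filing) is an assumption for illustration, not the project's confirmed schema.

```python
import json

# Section keys and labels as listed in this README.
SECTION_LABELS = {
    "item1": "Business Overview",
    "item1a": "Risk Factors",
    "item7": "MD&A",
    "item7a": "Market Risk",
}


def extract_sections(filing: dict) -> dict:
    """Return only the retained sections that are present and non-empty."""
    return {
        key: filing[key].strip()
        for key in SECTION_LABELS
        if isinstance(filing.get(key), str) and filing[key].strip()
    }


# Hypothetical parsed filing; "company" and "year" are illustrative fields.
raw = json.dumps({
    "company": "ExampleCo",
    "year": 2023,
    "item1": "ExampleCo sells widgets.",
    "item1a": "Supply chain disruption could affect results.",
    "item7": "",               # empty sections are skipped
    "item9": "Not retained.",  # sections outside the list are ignored
})

sections = extract_sections(json.loads(raw))
for key, text in sections.items():
    print(f"{key} ({SECTION_LABELS[key]}): {text}")
```

Skipping empty or missing sections up front keeps later steps (chunking, entity extraction) from having to handle blank input.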
Tech Highlights:
- Paragraph-level chunking improves retrieval accuracy
- Entity detection via spaCy and PhraseMatcher
- Graph schema: TextChunk, Entity, MENTIONS, CO_OCCURS_WITH
- Yearly refresh with logging for automation tracking
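Paragraph-level chunking can be sketched roughly as follows. The blank-line split, the `min_chars` threshold, and the rule of merging short paragraphs (such as headings) into the next one are assumptions made for this example; the project's actual chunking rules may differ.

```python
import re


def chunk_paragraphs(text: str, min_chars: int = 40) -> list[str]:
    """Split on blank lines; merge short paragraphs (e.g. headings) into the next one."""
    paragraphs = [re.sub(r"\s+", " ", p).strip() for p in re.split(r"\n\s*\n", text)]
    chunks, pending = [], ""
    for para in paragraphs:
        if not para:
            continue
        pending = f"{pending} {para}".strip()
        if len(pending) >= min_chars:
            chunks.append(pending)
            pending = ""
    if pending:  # keep a trailing short paragraph rather than dropping it
        chunks.append(pending)
    return chunks


sample = (
    "Risk Factors\n\n"
    "Our operations depend on a limited number of suppliers, and any\n"
    "disruption could delay production.\n\n"
    "Rising inflation may increase our costs and reduce consumer demand."
)

chunks = chunk_paragraphs(sample)
for i, chunk in enumerate(chunks):
    print(i, chunk)
```

Each resulting chunk is a candidate TextChunk node: small enough to retrieve precisely, large enough to carry its own context.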
- Basic: Generic spaCy NER (ORG, PRODUCT, etc.)
- Advanced: Added risk-specific phrases like “supply chain disruption”, “macroeconomic downturn”, “inflation risk”
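The risk-phrase matching described above might look like this with spaCy's `PhraseMatcher`. This sketch uses a blank English pipeline (tokenizer only) so that it runs without a model download; the phrase list is the three examples from this README, and the `RISK` label and `find_risk_phrases` helper are hypothetical names.

```python
import spacy
from spacy.matcher import PhraseMatcher

# Example phrases taken from this README; a real list would be longer.
RISK_PHRASES = ["supply chain disruption", "macroeconomic downturn", "inflation risk"]

# A blank English pipeline only tokenizes, so no trained model is required here.
nlp = spacy.blank("en")
matcher = PhraseMatcher(nlp.vocab, attr="LOWER")  # match on lowercased tokens
matcher.add("RISK", [nlp.make_doc(phrase) for phrase in RISK_PHRASES])


def find_risk_phrases(text: str) -> list[str]:
    """Return every risk phrase found in the text, lowercased."""
    doc = nlp(text)
    return [doc[start:end].text.lower() for _, start, end in matcher(doc)]


found = find_risk_phrases(
    "A Macroeconomic Downturn or supply chain disruption could hurt sales."
)
print(found)
```

Generic NER labels such as ORG and PRODUCT additionally require a trained pipeline (for example one loaded with `spacy.load`), whose entities are available on `doc.ents`.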
- Nodes: Entity, TextChunk
- Edges: MENTIONS, CO_OCCURS_WITH
- Optimized for fast, context-aware retrieval
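One way to write this schema to Neo4j is with parameterized `MERGE` statements, sketched below. The node and relationship names come from this README; the property names (`id`, `text`, `name`, `count`), the assumption that co-occurrence means "mentioned in the same chunk", and the `write_chunk` helper are illustrative. A small recording stand-in replaces the real transaction so the example runs without a database.

```python
from itertools import combinations

MERGE_CHUNK = """
MERGE (c:TextChunk {id: $chunk_id})
SET c.text = $text
"""

MERGE_MENTION = """
MATCH (c:TextChunk {id: $chunk_id})
MERGE (e:Entity {name: $entity})
MERGE (c)-[:MENTIONS]->(e)
"""

# Assumption: two entities co-occur when they are mentioned in the same chunk.
MERGE_CO_OCCURRENCE = """
MATCH (a:Entity {name: $a}), (b:Entity {name: $b})
MERGE (a)-[r:CO_OCCURS_WITH]-(b)
ON CREATE SET r.count = 1
ON MATCH SET r.count = r.count + 1
"""


def write_chunk(tx, chunk_id: str, text: str, entities: list[str]) -> None:
    """Upsert one chunk, its entities, and pairwise co-occurrence edges."""
    names = sorted(set(entities))
    tx.run(MERGE_CHUNK, chunk_id=chunk_id, text=text)
    for name in names:
        tx.run(MERGE_MENTION, chunk_id=chunk_id, entity=name)
    for a, b in combinations(names, 2):
        tx.run(MERGE_CO_OCCURRENCE, a=a, b=b)


class RecordingTx:
    """Stand-in for a Neo4j transaction that just records what would be run."""

    def __init__(self):
        self.calls = []

    def run(self, query, **params):
        self.calls.append((query, params))


tx = RecordingTx()
write_chunk(
    tx,
    "example-item1a-0",  # hypothetical chunk id
    "Inflation risk and supply chain disruption may affect margins.",
    ["inflation risk", "supply chain disruption"],
)
print(len(tx.calls), "statements recorded")
```

With the official Neo4j Python driver, `write_chunk` could be passed to a session's managed write transaction (`session.execute_write` in recent driver versions) instead of the recording stand-in.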
- Question encoded into 1536-dim vector (OpenAI)
- Top-k vector search → select best supporting chunks
- Answers generated by LangChain QA chain using GPT-4
- Sources always cited, ensuring grounded answers
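The retrieval step above can be illustrated with a small in-memory example. Real question and chunk embeddings would be 1536-dimensional vectors from OpenAI and the search would run against the stored chunks; here, 3-dimensional toy vectors, the chunk ids, and the `[source] text` context format are all assumptions for demonstration.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(question_vec: list[float], chunks: list[dict], k: int = 2) -> list[dict]:
    """Return the k chunks whose embeddings are most similar to the question."""
    ranked = sorted(
        chunks, key=lambda c: cosine(question_vec, c["embedding"]), reverse=True
    )
    return ranked[:k]


def build_context(selected: list[dict]) -> str:
    """Prefix each chunk with its source id so the answer can cite it."""
    return "\n".join(f"[{c['source']}] {c['text']}" for c in selected)


# Toy data: ids, texts, and vectors are made up for this sketch.
chunks = [
    {"source": "TSLA-item1a-3", "text": "Inflation may raise material costs.",
     "embedding": [0.9, 0.1, 0.0]},
    {"source": "TSLA-item1-0", "text": "We design and sell electric vehicles.",
     "embedding": [0.0, 0.2, 0.9]},
    {"source": "TSLA-item7a-1", "text": "Commodity price changes affect margins.",
     "embedding": [0.7, 0.6, 0.1]},
]
question_vec = [1.0, 0.2, 0.0]  # stands in for the embedded question

selected = top_k(question_vec, chunks, k=2)
context = build_context(selected)
print(context)
```

Passing only this source-tagged context to the language model is what lets each answer cite the chunks it was generated from.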
Users ask questions like:
“What are Netflix’s core products?”
“What inflation risks does Tesla mention?”
“How is Johnson & Johnson handling supply chain issues?”
📌 Business Value: Context-aware answers in plain English, reducing research time by 90%
- CO_OCCURS_WITH links reveal hidden relationships between risks, products, and strategies
- MENTIONS tracks which entities are discussed in each section
📌 Business Value: Improves strategic foresight and reduces oversight
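As an illustration of how CO_OCCURS_WITH links can be explored, the sketch below looks up the entities most often linked to a given one. The relationship and label names come from this README; the `name` and `count` properties, the `weight` alias, and the `related_entities` helper are hypothetical. A fake session with canned rows stands in for a live Neo4j session.

```python
RELATED_ENTITIES = """
MATCH (e:Entity {name: $name})-[r:CO_OCCURS_WITH]-(other:Entity)
RETURN other.name AS name, r.count AS weight
ORDER BY weight DESC
LIMIT $limit
"""


def related_entities(session, name: str, limit: int = 5) -> list[tuple]:
    """Return (entity name, weight) pairs linked to the given entity."""
    result = session.run(RELATED_ENTITIES, name=name, limit=limit)
    return [(record["name"], record["weight"]) for record in result]


class FakeSession:
    """Stand-in for a Neo4j session; returns canned rows for the demo."""

    def run(self, query, **params):
        self.last_params = params
        return [
            {"name": "supply chain disruption", "weight": 4},
            {"name": "macroeconomic downturn", "weight": 2},
        ]


session = FakeSession()
related = related_entities(session, "inflation risk", limit=2)
print(related)
```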
- Reduced hallucination risk: the system generates answers only from retrieved 10-K chunks
- Every answer is traceable back to its source
📌 Business Value: Builds trust and ensures compliance in financial reporting
Ask questions about real 10-K filings and receive grounded, AI-generated answers
Demo video: Screen.Recording.2025-06-02.at.09.59.22.mov
- Clone this repository
- Install the required dependencies:
    pip install -r requirements.txt
- Create a .env file in the root directory with the following variables:
    NEO4J_URI=your_neo4j_uri
    NEO4J_USERNAME=your_neo4j_username
    NEO4J_PASSWORD=your_neo4j_password
    NEO4J_DATABASE=your_neo4j_database
    OPENAI_API_KEY=your_openai_api_key
- Start the Streamlit app:
    streamlit run app.py
- Open your web browser and navigate to the URL shown in the terminal (typically http://localhost:8501)
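Before starting the app, it can help to confirm that all five variables from the .env file are actually set. The check below is a small standalone sketch, not part of the project's code: the variable names come from this README, while `missing_settings` is a hypothetical helper. It inspects a mapping (the process environment by default), so how the .env file gets loaded into the environment is left to the app.

```python
import os

# Variable names as listed in the setup steps of this README.
REQUIRED_SETTINGS = [
    "NEO4J_URI",
    "NEO4J_USERNAME",
    "NEO4J_PASSWORD",
    "NEO4J_DATABASE",
    "OPENAI_API_KEY",
]


def missing_settings(env=None) -> list[str]:
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SETTINGS if not env.get(name, "").strip()]


# Illustrative environment with two variables missing and one blank.
example_env = {
    "NEO4J_URI": "bolt://localhost:7687",
    "NEO4J_USERNAME": "neo4j",
    "OPENAI_API_KEY": " ",
}

missing = missing_settings(example_env)
if missing:
    print("Missing settings:", ", ".join(missing))
else:
    print("All settings present.")
```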
- Enter your question in the text input field
- The system will search the knowledge graph and provide an answer based on the relevant information from the SEC filings
- Example questions are provided to help you get started
- What is Netflix's primary business?
- Where is Apple headquartered?
- What are the top risks mentioned in Johnson & Johnson's 10-K?
- Where are the primary suppliers for Tesla?
- How is ExxonMobil addressing climate change and the energy transition?
Make sure your Neo4j database is properly set up with the knowledge graph containing the SEC filing data before running the application.
