YuwenAprilYang/FinAgent

📈 Finance AI Agent: Your Smart Assistant for Insights on Listed Companies

Report: Finance AI Agent Report

Tools: Python, Neo4j, LangChain, OpenAI GPT-4, Streamlit

Collaborators


Executive Summary

10-K filings are long, complex, and packed with valuable insights, but they are hard to parse and compare quickly. This makes it difficult for investors, analysts, and finance professionals to make fast, informed decisions.

Finance AI Agent is a GraphRAG-powered Q&A system that transforms raw 10-K filings into a structured knowledge graph. It supports natural language queries, connects key concepts like risks and strategies, and delivers AI-powered answers in seconds.

Business Problem

Without better tools:

  • Valuable signals in 10-Ks go unnoticed.
  • Traditional systems can’t interpret unstructured data effectively.
  • Decision-makers lack timely, contextual answers to complex questions.

We asked:

  • Can we extract and connect meaningful entities from filings automatically?
  • Can users ask natural-language questions and get context-aware, trustworthy answers?
  • Can we build a system that scales and stays grounded in facts?

System Overview

Our end-to-end GraphRAG pipeline integrates:

  1. Data Automation – Scrape and parse SEC 10-Ks using Serper and CrewAI
  2. Entity Extraction – spaCy NER + risk-specific keywords
  3. Graph Construction – Build Neo4j knowledge graph with TextChunk, Entity, and relationships
  4. Semantic Search – Embed questions + text chunks using OpenAI embeddings
  5. Answer Generation – LangChain QA chain (RetrievalQAWithSources) returns grounded, cited answers

Data Pipeline

Source:

  • SEC EDGAR
  • Parsed into JSON format, retaining key sections:
    • item1 – Business Overview
    • item1a – Risk Factors
    • item7 – MD&A
    • item7a – Market Risk
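
As a rough illustration of the section filtering described above, the sketch below keeps only the four listed keys from a parsed filing. The sample record, the `company` field, and the helper name `select_key_sections` are hypothetical; the project's actual JSON layout may differ.

```python
import json

# Section keys listed above, mapped to their human-readable titles.
KEY_SECTIONS = {
    "item1": "Business Overview",
    "item1a": "Risk Factors",
    "item7": "MD&A",
    "item7a": "Market Risk",
}


def select_key_sections(filing: dict) -> dict:
    """Keep only the key sections that are present and non-empty."""
    return {
        key: filing[key]
        for key in KEY_SECTIONS
        if isinstance(filing.get(key), str) and filing[key].strip()
    }


# Hypothetical parsed filing (assumption: one JSON object per filing).
raw = json.dumps({
    "company": "ExampleCo",
    "item1": "We sell widgets.",
    "item1a": "Supply chain disruption could hurt results.",
    "item9": "Not retained.",
})
sections = select_key_sections(json.loads(raw))
for key, text in sections.items():
    print(f"{key} ({KEY_SECTIONS[key]}): {text}")
```

Sections that are missing or empty are simply skipped, so a filing without `item7a` would still load.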

Tech Highlights:

  • Paragraph-level chunking improves retrieval accuracy
  • Entity detection via spaCy and PhraseMatcher
  • Graph schema: TextChunk, Entity, MENTIONS, CO_OCCURS_WITH
  • Yearly refresh with logging for automation tracking
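
Paragraph-level chunking can be sketched in a few lines of standard-library Python. This is a minimal sketch under assumptions, not the repository's implementation: the function name `chunk_paragraphs`, the blank-line splitting rule, and the `min_chars` threshold are all illustrative.

```python
import re


def chunk_paragraphs(text: str, min_chars: int = 40) -> list[str]:
    """Split text on blank lines and merge short paragraphs
    (such as headings) into the paragraph that follows them."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: list[str] = []
    buffer = ""
    for para in paragraphs:
        buffer = f"{buffer}\n{para}" if buffer else para
        if len(buffer) >= min_chars:
            chunks.append(buffer)
            buffer = ""
    if buffer:
        # A short trailing paragraph joins the previous chunk if there is one.
        if chunks:
            chunks[-1] = f"{chunks[-1]}\n{buffer}"
        else:
            chunks.append(buffer)
    return chunks


sample = (
    "Risk Factors\n\n"
    "Our supply chain depends on a small number of vendors in one region.\n\n"
    "Inflation may raise our input costs significantly over time."
)
chunks = chunk_paragraphs(sample)
```

Merging short paragraphs keeps headings attached to the text they introduce, which tends to give each chunk enough context to be retrieved on its own.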

Models & Methods

1. Named Entity Recognition (NER)

  • Basic: Generic spaCy NER (ORG, PRODUCT, etc.)
  • Advanced: Added risk-specific phrases like “supply chain disruption”, “macroeconomic downturn”, “inflation risk”
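
The idea behind the risk-specific phrase layer can be shown without spaCy. The sketch below swaps in a plain regular-expression matcher as a stand-in for spaCy's PhraseMatcher; the phrase list comes from the bullet above, while the function name `find_risk_phrases` is hypothetical.

```python
import re

# Risk phrases quoted in the text above; a real list would be longer.
RISK_PHRASES = [
    "supply chain disruption",
    "macroeconomic downturn",
    "inflation risk",
]

# One case-insensitive pattern that matches any phrase on word boundaries.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(p) for p in RISK_PHRASES) + r")\b",
    flags=re.IGNORECASE,
)


def find_risk_phrases(text: str) -> list[str]:
    """Return the distinct risk phrases found, in order of first appearance."""
    found: list[str] = []
    for match in _PATTERN.finditer(text):
        phrase = match.group(1).lower()
        if phrase not in found:
            found.append(phrase)
    return found


hits = find_risk_phrases(
    "A Macroeconomic Downturn or supply chain disruption may occur. "
    "Supply chain disruption is likely."
)
```

Unlike a token-based matcher, this stand-in does not handle inflections or extra whitespace between words, so it is only a simplified picture of the approach.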

2. Knowledge Graph (Neo4j)

  • Nodes: Entity, TextChunk
  • Edges: MENTIONS, CO_OCCURS_WITH
  • Optimized for fast, context-aware retrieval
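
To make the schema concrete, the sketch below derives MENTIONS and CO_OCCURS_WITH edges from a mapping of chunk IDs to detected entities. It works on plain Python data rather than a live Neo4j instance; the chunk IDs, the `build_edges` helper, and the rule that two entities co-occur when they share a chunk are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations


def build_edges(chunk_entities: dict[str, list[str]]):
    """Return (mentions, co_occurs).

    mentions:  list of (chunk_id, entity) pairs, one per MENTIONS edge.
    co_occurs: Counter mapping a sorted entity pair to the number of
               chunks in which both entities appear.
    """
    mentions: list[tuple[str, str]] = []
    co_occurs: Counter = Counter()
    for chunk_id, entities in chunk_entities.items():
        unique = sorted(set(entities))  # ignore repeats within one chunk
        mentions.extend((chunk_id, entity) for entity in unique)
        for a, b in combinations(unique, 2):
            co_occurs[(a, b)] += 1
    return mentions, co_occurs


# Hypothetical extraction output for two text chunks.
mentions, co_occurs = build_edges({
    "c1": ["Tesla", "inflation risk"],
    "c2": ["inflation risk", "Tesla", "supply chain disruption"],
})
```

Sorting each pair gives an undirected edge a single canonical key, so `("Tesla", "inflation risk")` is counted once per chunk regardless of the order in which the entities were detected.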

3. Retrieval-Augmented Generation (RAG)

  • Question encoded into 1536-dim vector (OpenAI)
  • Top-k vector search → select best supporting chunks
  • Answers generated by LangChain QA chain using GPT-4
  • Sources always cited, ensuring grounded answers
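
The top-k retrieval step can be illustrated with cosine similarity over toy vectors. This sketch uses 3-dimensional vectors in place of the 1536-dimensional OpenAI embeddings and a brute-force scan in place of a vector index; the chunk IDs and function names are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec, chunk_vecs: dict, k: int = 2):
    """Return the k (chunk_id, score) pairs most similar to the query."""
    scored = [(cid, cosine(query_vec, vec)) for cid, vec in chunk_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings standing in for real chunk vectors.
chunk_vecs = {
    "risk": [1.0, 0.0, 0.0],
    "products": [0.0, 1.0, 0.0],
    "mixed": [1.0, 1.0, 0.0],
}
results = top_k([0.9, 0.1, 0.0], chunk_vecs, k=2)
best_ids = [cid for cid, _ in results]
```

The selected chunk IDs would then be passed, together with their text, to the QA chain so that the generated answer can cite them as sources.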

Key Insights & Benefits

1. Turn 10-Ks into Actionable Knowledge

Users ask questions like:

“What are Netflix’s core products?”
“What inflation risks does Tesla mention?”
“How is Johnson & Johnson handling supply chain issues?”

📌 Business Value: Context-aware answers in plain English, reducing research time by 90%

2. Boost Research Efficiency with Connected Insights

  • CO_OCCURS_WITH links reveal hidden relationships between risks, products, and strategies
  • MENTIONS tracks which entities are discussed in each section

📌 Business Value: Improves strategic foresight and reduces oversight

3. Grounded, Scalable, Transparent

  • Lower hallucination risk: the system generates answers only from retrieved 10-K chunks
  • Every answer is traceable back to its source

📌 Business Value: Builds trust and ensures compliance in financial reporting

Demonstration

Knowledge Graph Visualization

[Screenshot: knowledge graph visualization]

Ask questions about real 10-K filings and receive grounded, AI-generated answers

[Screen recording: question-and-answer demo]

Setup Instructions

  1. Clone this repository

  2. Install the required dependencies:

    pip install -r requirements.txt
  3. Create a .env file in the root directory with the following variables:

    NEO4J_URI=your_neo4j_uri
    NEO4J_USERNAME=your_neo4j_username
    NEO4J_PASSWORD=your_neo4j_password
    NEO4J_DATABASE=your_neo4j_database
    OPENAI_API_KEY=your_openai_api_key
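
Before starting the app, it can help to confirm that every variable listed above is set. The check below is a small sketch using only the standard library; it assumes the values from `.env` have already been loaded into the process environment (for example by a loader such as python-dotenv), and the helper name `missing_settings` is hypothetical.

```python
import os

# Variable names taken from the .env template above.
REQUIRED_VARS = [
    "NEO4J_URI",
    "NEO4J_USERNAME",
    "NEO4J_PASSWORD",
    "NEO4J_DATABASE",
    "OPENAI_API_KEY",
]


def missing_settings(env=None) -> list[str]:
    """Return the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_settings()
if missing:
    print("Missing settings:", ", ".join(missing))
else:
    print("All required settings are present.")
```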
    

Running the Application

  1. Start the Streamlit app:

    streamlit run app.py
  2. Open your web browser and navigate to the URL shown in the terminal (typically http://localhost:8501)

Usage

  1. Enter your question in the text input field
  2. The system will search the knowledge graph and provide an answer based on the relevant information from the SEC filings
  3. Example questions are provided to help you get started

Example Questions

  • What is Netflix's primary business?
  • Where is Apple headquartered?
  • What are the top risks mentioned in Johnson & Johnson's 10-K?
  • Where are the primary suppliers for Tesla?
  • How is ExxonMobil addressing climate change and the energy transition?

Note

Make sure your Neo4j database is properly set up with the knowledge graph containing the SEC filing data before running the application.
