Skip to content

Meguazy/semantic-layer-chatbot

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

36 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Semantic Query Agent

A natural-language analytics agent for vehicle sales data. Ask questions in plain English; get structured JSON results and a plain-English summary.

Architecture

User question
  → synonym detection   (pre-LLM hint injection)
  → LLM #1             (parse intent → structured JSON)
  → synonym resolution  (post-LLM canonical name fix)
  → validation         (check against YAML semantic model)
  → pandas             (filter + groupby + aggregate)
  → LLM #2             (plain-English summary)
  → LLM #3             (confidence score — LLM-as-a-judge)
  → Final JSON

Multi-turn support is handled by SemanticQueryAgent in conversation_manager.py, which maintains conversation history and injects prior context into each LLM call.

Project structure

.
├── src/
│   ├── pipeline_functions.py  # Individual pipeline steps
│   ├── conversation_manager.py # SemanticQueryAgent (stateful, multi-turn)
│   ├── constants.py           # Prompts, paths, config
│   └── run_agent.py           # CLI entry point
├── data/
│   ├── semantic_model.yaml    # Metrics, dimensions, time periods, synonyms
│   ├── sales_data.json        # Mock transaction records
│   └── test_questions.json    # Eight sample questions
├── tests/
│   ├── unit/src/agent/        # One file per function, fully mocked
│   └── integration/
│       └── test_pipeline.py   # End-to-end tests against the real API
├── conftest.py
├── pyproject.toml
└── .env.example

Prerequisites

Python 3.12+

python3 --version

On macOS with Homebrew: brew install python@3.12. On Ubuntu/Debian: sudo apt install python3.12.

uv

# macOS / Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# Windows (PowerShell)
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

Anthropic API key

Get one from console.anthropic.com. The agent uses claude-sonnet-4-20250514 for all LLM calls.

Setup

git clone <repo-url>
cd holisticon_assessment

uv sync

cp .env.example .env
# Edit .env and set ANTHROPIC_API_KEY

Run

uv run python src/run_agent.py

Runs the five test questions, then enters an interactive loop.

Output shape

{
  "interpretation": {
    "metrics": ["avg_deal_margin"],
    "dimensions": ["region"],
    "filters": {},
    "time_period": "last_quarter",
    "confidence": 0.95,
    "confidence_reasoning": "Exact match — all fields map directly."
  },
  "results": [{"region": "Nordic", "avg_deal_margin": 14.23}, "..."],
  "summary": "In Q3 2025, Nordic had the highest average deal margin at 14.23%."
}

Vague questions return {"type": "clarification", "message": "..."}. Off-topic questions return {"type": "out_of_scope", "message": "..."}.

Tests

uv run pytest tests/unit/ -v       # no API key needed
uv run pytest tests/integration/ -v  # requires ANTHROPIC_API_KEY
uv run pytest -v                   # all (integration auto-skipped if no key)

Semantic model

Defined in data/semantic_model.yaml:

  • Metrics: total_revenue, units_sold, avg_deal_margin, avg_sale_price, total_cost, avg_days_to_sale
  • Dimensions: region, vehicle_model, vehicle_type, dealer_name, customer_segment, sale_month, sale_quarter
  • Time periods: ytd, last_quarter, current_quarter, last_month, last_year
  • Synonyms: informal terms mapped to canonical names (e.g. "EV"vehicle_type = "Electric")

To extend the model, edit semantic_model.yaml and add any new metrics to METRIC_AGG in src/constants.py.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages