Skip to content

hungle123-dev/analysis_agent

Repository files navigation

AI Analysis Workbench

An AI-assisted data analysis workbench for Data Visualization projects.

The application follows a strict human-in-the-loop workflow:

user request -> AI proposal -> user review/edit -> approval -> local execution -> logs/results

AI is used to suggest analysis ideas, generate Python analysis code, and explain approved outputs. It does not execute code silently, modify original datasets, or invent data, columns, charts, images, or results.

Why This Project Exists

This project was built for an AI integration requirement in a Data Visualization final project. The central idea is not to attach a chatbot to a dashboard, but to build a controlled analysis workflow where:

  • AI helps propose and write analysis code.
  • Humans review, edit, and approve the code.
  • Approved code runs locally on the user's machine.
  • Every request, proposal, approval, execution, result, error, and artifact is logged for review.

Tech Stack

Layer Technology
Backend FastAPI, Pydantic, SQLite
Frontend React, JavaScript, Vite
Styling Tailwind CSS
Code editor Monaco Editor
AI provider Mock, DeepSeek, or OpenAI-compatible API such as ds2api
Analysis runtime Local Python subprocess
Data/Charts pandas, matplotlib, seaborn, scipy, scikit-learn, statsmodels
Tests pytest

Core Safety Rules

  • AI-generated code is always shown before execution.
  • Users can edit generated code before approval.
  • Execution is allowed only after human approval.
  • Execution must happen locally, not in an online/cloud runner.
  • Original datasets must not be modified.
  • Generated code must use the input dataframe as df.
  • Generated code must copy data with work_df = df.copy() before transformations.
  • Output artifacts must be written only to outputs_dir.
  • Unsafe imports, shell/network/file operations, and direct mutation of df are blocked by the backend policy checker.

Architecture

React Frontend
  - Dataset/sidebar context
  - Prompt input
  - Monaco code editor
  - Approval controls
  - Result, Policy, and Logs inspector
        |
        v
FastAPI Backend
  - Dataset API
  - AI Proposal API
  - Approval API
  - Local Execution API
  - Logs API
        |
        v
Local Python Runner
  - Loads registered dataset as df
  - Runs approved code only
  - Captures stdout/stderr
  - Stores artifacts under runs/{run_id}/outputs
        |
        v
SQLite Audit Log

Repository Structure

.
├── backend/
│   ├── app/
│   │   ├── api/              # FastAPI routers
│   │   ├── core/             # runtime paths/config helpers
│   │   ├── db/               # SQLite storage
│   │   ├── services/         # business logic and safety rules
│   │   ├── main.py           # FastAPI app setup
│   │   └── schemas.py        # Pydantic contracts
│   ├── tests/                # pytest workflow tests
│   ├── requirements.txt
│   └── README.md
├── frontend/
│   ├── src/
│   │   ├── api/              # API client
│   │   ├── components/       # workbench UI components
│   │   ├── App.jsx
│   │   └── styles.css
│   ├── package.json
│   └── vite.config.js
├── ds2api/                  # optional local ds2api Docker setup
├── docs/ai-integration/      # design, API, prompts, demo, traceability docs
├── PHANCONG.md               # team task split
└── AGENTS.md                 # project guidance for coding agents

Dataset Visibility Rule

Datasets shown in the frontend dataset list are controlled by backend/config/datasets.json.

Each registered dataset can include:

  • allowed: backend allowlist for dataset usage
  • visible: whether the dataset appears in the frontend selector

Example:

{
  "id": "vietnam_real_estate_cleaned",
  "path": "cleaned_vietnam_real_estate.csv",
  "allowed": true,
  "visible": true
}

Backend Modules

Module Responsibility
DatasetRegistry / dataset_service.py Dataset metadata, schema, sample values, allowed dataset IDs
analysis_intent.py / dataset_capabilities.py Classifies user prompts into analysis intents and checks whether the selected dataset has the required capabilities
LLMProvider / llm_provider.py Adapter boundary for mock, DeepSeek, or OpenAI-compatible providers such as ds2api
prompt_builder.py Prompt rules and structured proposal schema
proposal_service.py Create, edit, approve, reject, and hash AI proposals
policy_checker.py AST-based unsafe-code validation
execution_runner.py Local approved-code execution and artifact capture
storage.py SQLite persistence for proposals, approvals, executions, and audit events
log_service.py Trace/audit retrieval

API Overview

Method Endpoint Purpose
GET /api/datasets List registered datasets
GET /api/datasets/{dataset_id}/context Get dataset schema/context
POST /api/ai/proposals Create an AI analysis proposal
POST /api/ai/proposals/jobs Start background AI proposal generation
GET /api/ai/proposals/jobs/{job_id} Poll background generation status
GET /api/ai/proposals/{proposal_id} Get a generated proposal
PATCH /api/ai/proposals/{proposal_id} Save user-edited code
POST /api/ai/proposals/{proposal_id}/approve Approve the current code and generate a hash
POST /api/ai/proposals/{proposal_id}/reject Reject a proposal and save the decision in logs
POST /api/executions Run approved code locally
GET /api/logs/{trace_id} Retrieve audit events for a trace

FastAPI docs are available after starting the backend:

http://127.0.0.1:8000/docs

Getting Started

1. Backend

python -m venv .venv
./.venv/bin/python -m pip install -r backend/requirements.txt
cp .env.example .env
# edit .env and set DEEPSEEK_API_KEY / ds2api key, or set AI_PROVIDER=mock for offline demo
set -a
source .env
set +a
./.venv/bin/python -m uvicorn backend.app.main:app --host 127.0.0.1 --port 8000

The backend creates demo CSV files and runtime folders locally when needed. These generated files are ignored by Git.

Recommended ds2api settings:

AI_PROVIDER=deepseek
DEEPSEEK_BASE_URL=http://127.0.0.1:5001
DEEPSEEK_MODEL=deepseek-v4-flash-nothinking
DEEPSEEK_INSIGHT_MODEL=deepseek-v4-flash-nothinking
DEEPSEEK_MAX_TOKENS=2200
DEEPSEEK_INSIGHT_MAX_TOKENS=900
DEEPSEEK_TIMEOUT_SECONDS=60
DEEPSEEK_THINKING=disabled
AI_FALLBACK_TO_MOCK_ON_ERROR=false
AI_EXPLAIN_RESULT_ENABLED=true
AI_EXPLAIN_RESULT_PROVIDER=deepseek
AI_EXPLAIN_RESULT_TIMEOUT_SECONDS=45
EXECUTION_TIMEOUT_SECONDS=60

Each generated proposal audit log includes llm_duration_ms, model, finish reason, prompt size, and token/cache usage when the provider returns it. Keep mock fallback disabled for real demos so the app fails honestly instead of fabricating code. Result insight uses ds2api/DeepSeek after approved local execution and analyzes only stdout, table previews, and artifact metadata; charts are displayed as visual illustration. The insight prompt is evidence-only and does not include the full generated code, reducing hallucinated commentary.

Before calling the LLM, the backend runs an intent/capability planner. It maps prompts to general analysis intents such as distribution, correlation, group comparison, time series, revenue/profit, funnel, customer retention, and coordinate map. If the selected dataset lacks the required columns, the backend returns a text-only proposal explaining the missing schema instead of letting the AI invent columns or silently switch to an unrelated chart.

Optional local ds2api setup lives in ds2api/:

cd ds2api
Copy-Item .env.example .env
Copy-Item config.example.json config.json
# edit config.json with real DeepSeek account credentials / allowed API keys
docker compose up -d
Invoke-WebRequest http://127.0.0.1:5001/v1/models -UseBasicParsing

The root .env should use the same key configured in ds2api/config.json:

DEEPSEEK_BASE_URL=http://127.0.0.1:5001
DEEPSEEK_API_KEY=ds2api-local-key

2. Frontend

npm install --prefix frontend
npm run dev --prefix frontend

Open:

http://127.0.0.1:5173

Running Checks

Backend tests:

./.venv/bin/python -m pytest backend/tests

Frontend production build:

npm run build --prefix frontend

Dependency audit:

npm audit --audit-level=moderate --prefix frontend

Demo Flow

  1. Start the backend and frontend.
  2. Select a dataset.
  3. Enter an analysis request.
  4. Generate an AI proposal.
  5. Review and edit the generated Python code in Monaco Editor.
  6. Approve the code.
  7. Run the approved code locally.
  8. Inspect chart artifacts, stdout/stderr, policy status, and audit logs.

Current Status

Implemented:

  • FastAPI backend structure with clear API/service/db boundaries.
  • VSCode-style React workbench UI following the main workbench containers: Title Bar, Activity Bar, Primary Sidebar, Editor, bottom Panel, Secondary Sidebar chat, and Status Bar.
  • Mock AI provider behind an LLMProvider boundary.
  • DeepSeek/OpenAI-compatible provider with ds2api-friendly configuration and latency/token logging.
  • Prompt builder and structured proposal schema.
  • Human approval workflow with code hash validation.
  • Local Python execution runner.
  • AST-based policy checker.
  • SQLite audit logging.
  • Backend workflow tests.
  • GitHub-ready cleanup and .gitignore.

Next recommended work:

  • Add streaming or background proposal jobs if AI generation still feels slow.
  • Move dataset registration from hardcoded metadata to a config file.
  • Return structured table artifacts, not only stdout/chart images.
  • Display backend policy errors directly in the frontend Policy tab.
  • Add end-to-end frontend tests.

Documentation

Detailed design documents live in:

docs/ai-integration/

Important files:

  • APP.md
  • SOFTWARE_DESIGN_PRINCIPLES.md
  • IMPLEMENTATION_GUIDE.md
  • API_CONTRACT.md
  • AI_PROMPTS_AND_SCHEMA.md
  • HUMAN_APPROVAL_EXECUTION.md
  • LOGS_REPORTING.md
  • DEMO_QA.md
  • REQUIREMENT_TRACEABILITY.md
  • REFERENCES.md

Team Work Split

See:

PHANCONG.md

License

This repository is currently intended for academic project use. Add a license file before public reuse or distribution.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors