An AI-assisted data analysis workbench for Data Visualization projects.
The application follows a strict human-in-the-loop workflow:
user request -> AI proposal -> user review/edit -> approval -> local execution -> logs/results
AI is used to suggest analysis ideas, generate Python analysis code, and explain approved outputs. It does not execute code silently, modify original datasets, or invent data, columns, charts, images, or results.
This project was built for an AI integration requirement in a Data Visualization final project. The central idea is not to attach a chatbot to a dashboard, but to build a controlled analysis workflow where:
- AI helps propose and write analysis code.
- Humans review, edit, and approve the code.
- Approved code runs locally on the user's machine.
- Every request, proposal, approval, execution, result, error, and artifact is logged for review.
| Layer | Technology |
|---|---|
| Backend | FastAPI, Pydantic, SQLite |
| Frontend | React, JavaScript, Vite |
| Styling | Tailwind CSS |
| Code editor | Monaco Editor |
| AI provider | Mock, DeepSeek, or OpenAI-compatible API such as ds2api |
| Analysis runtime | Local Python subprocess |
| Data/Charts | pandas, matplotlib, seaborn, scipy, scikit-learn, statsmodels |
| Tests | pytest |
- AI-generated code is always shown before execution.
- Users can edit generated code before approval.
- Execution is allowed only after human approval.
- Execution must happen locally, not in an online/cloud runner.
- Original datasets must not be modified.
- Generated code must use the input dataframe as `df`.
- Generated code must copy data with `work_df = df.copy()` before transformations.
- Output artifacts must be written only to `outputs_dir`.
- Unsafe imports, shell/network/file operations, and direct mutation of `df` are blocked by the backend policy checker.
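The rules above lend themselves to a static check on the syntax tree before anything runs. The following is a minimal sketch of that idea, not the project's actual `policy_checker.py`: the blocklists, the function name `check_policy`, and the violation messages are all assumptions made for illustration.

```python
import ast

# Hypothetical blocklists for illustration; the real rules live in policy_checker.py.
BLOCKED_MODULES = {"os", "subprocess", "socket", "shutil", "requests"}
BLOCKED_CALLS = {"eval", "exec", "open", "__import__"}


def _is_df(node):
    """True when the node is the bare name `df`."""
    return isinstance(node, ast.Name) and node.id == "df"


def _copies_df(node):
    """True for the statement `work_df = df.copy()`."""
    return (
        isinstance(node, ast.Assign)
        and len(node.targets) == 1
        and isinstance(node.targets[0], ast.Name)
        and node.targets[0].id == "work_df"
        and isinstance(node.value, ast.Call)
        and isinstance(node.value.func, ast.Attribute)
        and node.value.func.attr == "copy"
        and _is_df(node.value.func.value)
    )


def check_policy(code):
    """Return a list of violations; an empty list means the code passed."""
    try:
        tree = ast.parse(code)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]

    violations = []
    has_copy = False
    for node in ast.walk(tree):
        if _copies_df(node):
            has_copy = True
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in BLOCKED_MODULES:
                    violations.append(f"blocked import: {alias.name}")
        elif isinstance(node, ast.ImportFrom):
            if (node.module or "").split(".")[0] in BLOCKED_MODULES:
                violations.append(f"blocked import: {node.module}")
        elif isinstance(node, ast.Call):
            if isinstance(node.func, ast.Name) and node.func.id in BLOCKED_CALLS:
                violations.append(f"blocked call: {node.func.id}()")
        elif isinstance(node, (ast.Assign, ast.AugAssign, ast.AnnAssign)):
            targets = node.targets if isinstance(node, ast.Assign) else [node.target]
            for target in targets:
                # df = ..., df["col"] = ..., and df.attr = ... all touch the original frame.
                base = target.value if isinstance(target, (ast.Subscript, ast.Attribute)) else target
                if _is_df(base):
                    violations.append("direct mutation of df")

    if not has_copy:
        violations.append("missing work_df = df.copy()")
    return violations
```

This sketch only catches assignment-style mutation. In-place method calls such as `df.drop(..., inplace=True)` or chained subscripts would need additional rules, so a check like this works best as one layer alongside human review.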
React Frontend
- Dataset/sidebar context
- Prompt input
- Monaco code editor
- Approval controls
- Result, Policy, and Logs inspector
|
v
FastAPI Backend
- Dataset API
- AI Proposal API
- Approval API
- Local Execution API
- Logs API
|
v
Local Python Runner
- Loads registered dataset as df
- Runs approved code only
- Captures stdout/stderr
- Stores artifacts under runs/{run_id}/outputs
|
v
SQLite Audit Log
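The Local Python Runner stage in the diagram above could be sketched as a subprocess call with captured output and a timeout. This is an illustration under assumptions, not the code in `execution_runner.py`: the function name, the returned dictionary keys, and the way `outputs_dir` is injected are hypothetical, and loading the registered dataset as `df` is left out to keep the example dependency-free.

```python
import subprocess
import sys
from pathlib import Path


def run_approved_code(code, run_dir, timeout_seconds=60):
    """Run already-approved code in a separate Python process and capture its output."""
    run_dir = Path(run_dir).resolve()
    outputs_dir = run_dir / "outputs"
    outputs_dir.mkdir(parents=True, exist_ok=True)

    # Expose outputs_dir to the script; dataset loading is omitted in this sketch.
    preamble = f"from pathlib import Path\noutputs_dir = Path({str(outputs_dir)!r})\n"
    script_path = run_dir / "approved_code.py"
    script_path.write_text(preamble + code, encoding="utf-8")

    try:
        completed = subprocess.run(
            [sys.executable, str(script_path)],
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
            cwd=run_dir,
        )
    except subprocess.TimeoutExpired:
        return {"status": "timeout", "stdout": "", "stderr": "", "artifacts": []}

    # Only files written under outputs/ are reported as artifacts.
    artifacts = sorted(p.name for p in outputs_dir.iterdir() if p.is_file())
    return {
        "status": "succeeded" if completed.returncode == 0 else "failed",
        "stdout": completed.stdout,
        "stderr": completed.stderr,
        "artifacts": artifacts,
    }
```

Running the code in a child process rather than with `exec()` keeps a crash or an infinite loop in the analysis script from taking down the backend, and the timeout mirrors the `EXECUTION_TIMEOUT_SECONDS` setting.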
.
├── backend/
│ ├── app/
│ │ ├── api/ # FastAPI routers
│ │ ├── core/ # runtime paths/config helpers
│ │ ├── db/ # SQLite storage
│ │ ├── services/ # business logic and safety rules
│ │ ├── main.py # FastAPI app setup
│ │ └── schemas.py # Pydantic contracts
│ ├── tests/ # pytest workflow tests
│ ├── requirements.txt
│ └── README.md
├── frontend/
│ ├── src/
│ │ ├── api/ # API client
│ │ ├── components/ # workbench UI components
│ │ ├── App.jsx
│ │ └── styles.css
│ ├── package.json
│ └── vite.config.js
├── ds2api/ # optional local ds2api Docker setup
├── docs/ai-integration/ # design, API, prompts, demo, traceability docs
├── PHANCONG.md # team task split
└── AGENTS.md # project guidance for coding agents
Datasets shown in the frontend dataset list are controlled by backend/config/datasets.json.
Each registered dataset can include:
- `allowed`: backend allowlist for dataset usage
- `visible`: whether the dataset appears in the frontend selector
Example:
{
  "id": "vietnam_real_estate_cleaned",
  "path": "cleaned_vietnam_real_estate.csv",
  "allowed": true,
  "visible": true
}

| Module | Responsibility |
|---|---|
| `DatasetRegistry` / `dataset_service.py` | Dataset metadata, schema, sample values, allowed dataset IDs |
| `analysis_intent.py` / `dataset_capabilities.py` | Classifies user prompts into analysis intents and checks whether the selected dataset has the required capabilities |
| `LLMProvider` / `llm_provider.py` | Adapter boundary for mock, DeepSeek, or OpenAI-compatible providers such as ds2api |
| `prompt_builder.py` | Prompt rules and structured proposal schema |
| `proposal_service.py` | Create, edit, approve, reject, and hash AI proposals |
| `policy_checker.py` | AST-based unsafe-code validation |
| `execution_runner.py` | Local approved-code execution and artifact capture |
| `storage.py` | SQLite persistence for proposals, approvals, executions, and audit events |
| `log_service.py` | Trace/audit retrieval |
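The hashing responsibility listed for `proposal_service.py` is what ties an approval to the exact code the user reviewed. A minimal sketch of that idea follows; the use of SHA-256, the dictionary fields, and the function names are assumptions for illustration, not the project's confirmed implementation.

```python
import hashlib


def code_hash(code):
    """SHA-256 hex digest of the exact code text (hash choice is an assumption)."""
    return hashlib.sha256(code.encode("utf-8")).hexdigest()


def approve(proposal):
    """Record approval by pinning the hash of the code that was reviewed."""
    approved = dict(proposal)  # leave the caller's proposal untouched
    approved["status"] = "approved"
    approved["approved_hash"] = code_hash(proposal["code"])
    return approved


def can_execute(proposal):
    """Allow execution only when the code is approved and unchanged since approval."""
    return (
        proposal.get("status") == "approved"
        and proposal.get("approved_hash") == code_hash(proposal["code"])
    )
```

With this shape, any edit made after approval changes the hash, so the proposal has to be approved again before it can run.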
| Method | Endpoint | Purpose |
|---|---|---|
| GET | `/api/datasets` | List registered datasets |
| GET | `/api/datasets/{dataset_id}/context` | Get dataset schema/context |
| POST | `/api/ai/proposals` | Create an AI analysis proposal |
| POST | `/api/ai/proposals/jobs` | Start background AI proposal generation |
| GET | `/api/ai/proposals/jobs/{job_id}` | Poll background generation status |
| GET | `/api/ai/proposals/{proposal_id}` | Get a generated proposal |
| PATCH | `/api/ai/proposals/{proposal_id}` | Save user-edited code |
| POST | `/api/ai/proposals/{proposal_id}/approve` | Approve the current code and generate a hash |
| POST | `/api/ai/proposals/{proposal_id}/reject` | Reject a proposal and save the decision in logs |
| POST | `/api/executions` | Run approved code locally |
| GET | `/api/logs/{trace_id}` | Retrieve audit events for a trace |
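The trace lookup behind `GET /api/logs/{trace_id}` can be pictured as an append-only SQLite table keyed by trace ID. The sketch below is illustrative only: the table name, columns, event type strings, and helper functions are hypothetical and are not taken from the project's `storage.py` or `log_service.py`.

```python
import json
import sqlite3
from datetime import datetime, timezone


def open_audit_db(path=":memory:"):
    """Open (or create) a database with a hypothetical audit_events table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS audit_events (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            trace_id TEXT NOT NULL,
            event_type TEXT NOT NULL,
            payload TEXT NOT NULL,
            created_at TEXT NOT NULL
        )
        """
    )
    return conn


def log_event(conn, trace_id, event_type, payload):
    """Append one event; rows are never updated or deleted."""
    conn.execute(
        "INSERT INTO audit_events (trace_id, event_type, payload, created_at) "
        "VALUES (?, ?, ?, ?)",
        (trace_id, event_type, json.dumps(payload), datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()


def get_trace(conn, trace_id):
    """Return every event for one trace in insertion order."""
    rows = conn.execute(
        "SELECT event_type, payload, created_at FROM audit_events "
        "WHERE trace_id = ? ORDER BY id",
        (trace_id,),
    ).fetchall()
    return [
        {"event_type": etype, "payload": json.loads(raw), "created_at": created}
        for etype, raw, created in rows
    ]
```

Keeping the payload as JSON text lets each event type carry its own fields (for example a code hash on approval or an exit status on execution) without a schema change.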
FastAPI docs are available after starting the backend:
http://127.0.0.1:8000/docs
python -m venv .venv
./.venv/bin/python -m pip install -r backend/requirements.txt
cp .env.example .env
# edit .env and set DEEPSEEK_API_KEY / ds2api key, or set AI_PROVIDER=mock for offline demo
set -a
source .env
set +a
./.venv/bin/python -m uvicorn backend.app.main:app --host 127.0.0.1 --port 8000

The backend creates demo CSV files and runtime folders locally when needed. These generated files are ignored by Git.
Recommended ds2api settings:
AI_PROVIDER=deepseek
DEEPSEEK_BASE_URL=http://127.0.0.1:5001
DEEPSEEK_MODEL=deepseek-v4-flash-nothinking
DEEPSEEK_INSIGHT_MODEL=deepseek-v4-flash-nothinking
DEEPSEEK_MAX_TOKENS=2200
DEEPSEEK_INSIGHT_MAX_TOKENS=900
DEEPSEEK_TIMEOUT_SECONDS=60
DEEPSEEK_THINKING=disabled
AI_FALLBACK_TO_MOCK_ON_ERROR=false
AI_EXPLAIN_RESULT_ENABLED=true
AI_EXPLAIN_RESULT_PROVIDER=deepseek
AI_EXPLAIN_RESULT_TIMEOUT_SECONDS=45
EXECUTION_TIMEOUT_SECONDS=60

Each generated proposal audit log includes `llm_duration_ms`, model, finish reason, prompt size, and token/cache usage when the provider returns it. Keep mock fallback disabled for real demos so the app fails honestly instead of fabricating code.

Result insight uses ds2api/DeepSeek after approved local execution and analyzes only stdout, table previews, and artifact metadata; charts are displayed as visual illustration. The insight prompt is evidence-only and does not include the full generated code, reducing hallucinated commentary.
Before calling the LLM, the backend runs an intent/capability planner. It maps prompts to general analysis intents such as distribution, correlation, group comparison, time series, revenue/profit, funnel, customer retention, and coordinate map. If the selected dataset lacks the required columns, the backend returns a text-only proposal explaining the missing schema instead of letting the AI invent columns or silently switch to an unrelated chart.
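A planner of this kind can be approximated with a keyword table and a per-intent capability requirement. The sketch below is a simplified illustration, not the logic in `analysis_intent.py` or `dataset_capabilities.py`: the keyword lists, the column "kinds", and the result fields are all assumptions, and only three of the intents are shown.

```python
# Hypothetical keyword and capability tables; the real planner's rules may differ.
INTENT_KEYWORDS = {
    "time_series": ["over time", "trend", "monthly"],
    "correlation": ["correlation", "relationship"],
    "distribution": ["distribution", "histogram"],
}
REQUIRED_KINDS = {
    "time_series": {"datetime", "numeric"},
    "correlation": {"numeric"},
    "distribution": {"numeric"},
}


def classify_intent(prompt):
    """Return the first intent whose keywords appear in the prompt, else None."""
    text = prompt.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return intent
    return None


def plan(prompt, column_kinds):
    """Decide whether code generation may proceed for this dataset.

    column_kinds maps column name -> kind, e.g. {"price": "numeric"}.
    """
    intent = classify_intent(prompt)
    if intent is None:
        # Unrecognized prompts are passed through in this sketch (an assumption).
        return {"intent": None, "can_generate_code": True, "missing": []}
    available = set(column_kinds.values())
    missing = sorted(REQUIRED_KINDS[intent] - available)
    return {"intent": intent, "can_generate_code": not missing, "missing": missing}
```

When `can_generate_code` is false, the `missing` list is what a text-only proposal would report back to the user instead of sending the request on to the LLM.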
Optional local ds2api setup lives in ds2api/:
cd ds2api
Copy-Item .env.example .env
Copy-Item config.example.json config.json
# edit config.json with real DeepSeek account credentials / allowed API keys
docker compose up -d
Invoke-WebRequest http://127.0.0.1:5001/v1/models -UseBasicParsing

The root .env should use the same key configured in ds2api/config.json:
DEEPSEEK_BASE_URL=http://127.0.0.1:5001
DEEPSEEK_API_KEY=ds2api-local-key

npm install --prefix frontend
npm run dev --prefix frontend

Open:
http://127.0.0.1:5173
Backend tests:
./.venv/bin/python -m pytest backend/tests

Frontend production build:
npm run build --prefix frontend

Dependency audit:
npm audit --audit-level=moderate --prefix frontend

- Start the backend and frontend.
- Select a dataset.
- Enter an analysis request.
- Generate an AI proposal.
- Review and edit the generated Python code in Monaco Editor.
- Approve the code.
- Run the approved code locally.
- Inspect chart artifacts, stdout/stderr, policy status, and audit logs.
Implemented:
- FastAPI backend structure with clear API/service/db boundaries.
- VSCode-style React workbench UI following the main workbench containers: Title Bar, Activity Bar, Primary Sidebar, Editor, bottom Panel, Secondary Sidebar chat, and Status Bar.
- Mock AI provider behind an `LLMProvider` boundary.
- DeepSeek/OpenAI-compatible provider with ds2api-friendly configuration and latency/token logging.
- Prompt builder and structured proposal schema.
- Human approval workflow with code hash validation.
- Local Python execution runner.
- AST-based policy checker.
- SQLite audit logging.
- Backend workflow tests.
- GitHub-ready cleanup and `.gitignore`.
Next recommended work:
- Add streaming or background proposal jobs if AI generation still feels slow.
- Move dataset registration from hardcoded metadata to a config file.
- Return structured table artifacts, not only stdout/chart images.
- Display backend policy errors directly in the frontend Policy tab.
- Add end-to-end frontend tests.
Detailed design documents live in:
docs/ai-integration/
Important files:
- `APP.md`
- `SOFTWARE_DESIGN_PRINCIPLES.md`
- `IMPLEMENTATION_GUIDE.md`
- `API_CONTRACT.md`
- `AI_PROMPTS_AND_SCHEMA.md`
- `HUMAN_APPROVAL_EXECUTION.md`
- `LOGS_REPORTING.md`
- `DEMO_QA.md`
- `REQUIREMENT_TRACEABILITY.md`
- `REFERENCES.md`
See:
PHANCONG.md
This repository is currently intended for academic project use. Add a license file before public reuse or distribution.