A Claude Code skill that turns data exploration into a strategic conversation. Upload any tabular dataset (CSV, Excel, TSV) and get domain-aware analysis that challenges your assumptions, flags data limitations, and delivers prioritized next steps — not just statistics.
| Generic EDA | This Skill | |
|---|---|---|
| Context | Blind statistics | Asks about your domain and goals first |
| Hypotheses | Confirms what you expect | Actively seeks contradicting evidence |
| Limitations | Rarely mentioned | Always explicit |
| Output | Descriptive stats | Prioritized, actionable recommendations |
Phase 1: Context Gathering (interactive)
→ 4 questions about your domain, objective, and hypotheses
Phase 2: Data Profiling (automated)
→ Missing data, duplicates, outliers, type mismatches
Phase 3: Domain Exploration (analytical)
→ Universal + domain-specific analysis
→ Devil's advocate: alternative explanations, Simpson's Paradox
Phase 4: Synthesis (strategic)
→ Executive summary, confirming vs. contradicting evidence,
explicit limitations, 3–5 prioritized next steps
The skill generates an interactive HTML report with embedded visualizations. The example below is from an e-commerce sales analysis where the user suspected a shipping policy change caused a revenue drop — the skill found the real cause was a product mix shift.
| Overview | Revenue Trend |
|---|---|
![]() |
![]() |
| Customer Segments | Correlation Matrix |
|---|---|
![]() |
![]() |
The skill adapts its analysis lens based on your answers in Phase 1:
- Financial / Banking — transaction patterns, fraud indicators, portfolio concentration
- Retail / E-commerce — customer behavior, conversion funnels, seasonality, product mix
- Manufacturing / Supply Chain — defect rates, process capability, bottleneck identification
- Healthcare / Medical — patient cohorts, treatment outcomes, comorbidity patterns
- Marketing / Advertising — campaign ROI, channel attribution, audience segmentation
- General — works on any tabular dataset
Requirements: Claude Code CLI (claude) and Python 3.8+.
# 1. Clone the repo
git clone https://github.com/YOUR_USERNAME/eda-skill.git
cd eda-skill
# 2. Copy to your Claude skills directory
cp -r . ~/.claude/skills/exploratory-data-analysis/
# 3. Install Python dependencies
./setup.shOr install dependencies manually:
pip install pandas numpy matplotlib seaborn scipy- Start a Claude Code conversation
- Upload a CSV or Excel file
- Say any of the following:
- "Analyze this dataset"
- "Help me explore this data"
- "What patterns do you see?"
- "Run EDA on this file"
- Answer 4 context questions about your domain and objectives
- Receive a full analysis + interactive HTML report
E-commerce dataset (generated, ~5,000 rows — includes intentional quality issues):
python test-cases/generate_sample_data.py
# Then upload test-cases/sample_ecommerce_data.csv and say:
# "Our revenue dropped last quarter. I think it was our new shipping policy. Can you analyze this."Expected: the skill confirms the drop but reveals product mix shift — not the shipping policy — is the real driver.
Car sales dataset (real data, 50,000 rows):
Upload test-cases/car_sales_data.csv and say:
"Analyze this car sales data. I want to understand what factors most influence price."
.
├── SKILL.md # Claude skill definition (what Claude reads)
├── WORKFLOW.md # Visual workflow diagram
├── QUICK_REFERENCE.md # One-page cheat sheet
├── INSTALLATION.md # Detailed installation and customization guide
├── setup.sh # Dependency installer
├── scripts/
│ ├── data_profiler.py # Standalone data quality checker
│ └── generate_report.py # Interactive HTML report generator
├── test-cases/
│ ├── test-prompts.md # 10 test scenarios for evaluating the skill
│ ├── generate_sample_data.py # Generates realistic e-commerce test data
│ ├── sample_ecommerce_data.csv # Pre-generated e-commerce dataset (5,000+ transactions)
│ └── car_sales_data.csv # Real car sales dataset (50,000 rows)
└── examples/
├── fig1_overview.png
├── fig2_price.png
├── fig3_segments.png
└── fig4_correlation.png
Add a domain template — edit SKILL.md Phase 3 (~line 170):
#### Your Domain (e.g., "Education / Learning Analytics")
- Student engagement rates
- Learning path completion rates
- Drop-off points in coursesChange report styling — edit the CSS in scripts/generate_report.py (line ~39):
--primary-color: #9b59b6; /* default: #3498db */Adjust context questions — edit Phase 1 of SKILL.md.
See INSTALLATION.md for the full customization guide and troubleshooting.
# Profile any CSV
python scripts/data_profiler.py your_data.csv --output report.json
# Generate a report (called programmatically from the skill)
python scripts/generate_report.py --profile report.json --output report.htmlMIT — use it, fork it, adapt it.



