Exploratory Data Analysis — Claude Skill

A Claude Code skill that turns data exploration into a strategic conversation. Upload any tabular dataset (CSV, Excel, TSV) and get domain-aware analysis that challenges your assumptions, flags data limitations, and delivers prioritized next steps — not just statistics.

What Makes This Different

Aspect	Generic EDA	This Skill
Context	Blind statistics	Asks about your domain and goals first
Hypotheses	Confirms what you expect	Actively seeks contradicting evidence
Limitations	Rarely mentioned	Always explicit
Output	Descriptive stats	Prioritized, actionable recommendations

The 4-Phase Workflow

Phase 1: Context Gathering   (interactive)
         → 4 questions about your domain, objective, and hypotheses

Phase 2: Data Profiling      (automated)
         → Missing data, duplicates, outliers, type mismatches

Phase 3: Domain Exploration  (analytical)
         → Universal + domain-specific analysis
         → Devil's advocate: alternative explanations, Simpson's Paradox

Phase 4: Synthesis           (strategic)
         → Executive summary, confirming vs. contradicting evidence,
           explicit limitations, 3–5 prioritized next steps
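
The devil's-advocate step in Phase 3 mentions Simpson's Paradox: a comparison that points one way on pooled data and the opposite way inside every subgroup. The following is a minimal sketch of what such a check could look like, not the skill's actual implementation. The function name `simpsons_paradox_check`, the record layout, and the sample counts are all illustrative assumptions.

```python
from collections import defaultdict


def simpsons_paradox_check(records):
    """Compare two arms overall and within each subgroup.

    records: iterable of (group, arm, successes, trials) tuples.
    Returns (overall_winner, winner_per_group, reversed_in_every_group).
    """
    totals = defaultdict(lambda: [0, 0])   # arm -> [successes, trials]
    rates_by_group = defaultdict(dict)     # group -> {arm: success rate}
    for group, arm, successes, trials in records:
        totals[arm][0] += successes
        totals[arm][1] += trials
        rates_by_group[group][arm] = successes / trials

    overall = {arm: s / n for arm, (s, n) in totals.items()}
    overall_winner = max(overall, key=overall.get)
    winner_per_group = {g: max(r, key=r.get) for g, r in rates_by_group.items()}
    # The paradox is present when no subgroup agrees with the pooled result.
    reversed_in_every_group = all(
        winner != overall_winner for winner in winner_per_group.values()
    )
    return overall_winner, winner_per_group, reversed_in_every_group


# Made-up counts: (basket size, checkout variant, conversions, sessions).
records = [
    ("small_basket", "A", 81, 87),
    ("small_basket", "B", 234, 270),
    ("large_basket", "A", 192, 263),
    ("large_basket", "B", 55, 80),
]

overall_winner, winner_per_group, flipped = simpsons_paradox_check(records)
print(overall_winner, winner_per_group, flipped)
```

In this invented data, variant B wins on the pooled numbers while A wins inside both basket sizes, because the two variants saw very different proportions of small and large baskets.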

Example Output

The skill generates an interactive HTML report with embedded visualizations. The example below is from an e-commerce sales analysis where the user suspected a shipping policy change caused a revenue drop — the skill found the real cause was a product mix shift.

[Figures: Overview, Revenue Trend, Customer Segments, Correlation Matrix]

Supported Domains

The skill adapts its analysis lens based on your answers in Phase 1:

Financial / Banking — transaction patterns, fraud indicators, portfolio concentration
Retail / E-commerce — customer behavior, conversion funnels, seasonality, product mix
Manufacturing / Supply Chain — defect rates, process capability, bottleneck identification
Healthcare / Medical — patient cohorts, treatment outcomes, comorbidity patterns
Marketing / Advertising — campaign ROI, channel attribution, audience segmentation
General — works on any tabular dataset

Installation

Requirements: Claude Code CLI (claude) and Python 3.8+.

# 1. Clone the repo
git clone https://github.com/YOUR_USERNAME/eda-skill.git
cd eda-skill

# 2. Copy to your Claude skills directory
cp -r . ~/.claude/skills/exploratory-data-analysis/

# 3. Install Python dependencies
./setup.sh

Or install dependencies manually:

pip install pandas numpy matplotlib seaborn scipy

Quick Start

1. Start a Claude Code conversation.
2. Upload a CSV or Excel file.
3. Say any of the following:
   - "Analyze this dataset"
   - "Help me explore this data"
   - "What patterns do you see?"
   - "Run EDA on this file"
4. Answer 4 context questions about your domain and objectives.
5. Receive a full analysis plus an interactive HTML report.

Try the included sample datasets

E-commerce dataset (generated, ~5,000 rows — includes intentional quality issues):

python test-cases/generate_sample_data.py
# Then upload test-cases/sample_ecommerce_data.csv and say:
# "Our revenue dropped last quarter. I think it was our new shipping policy. Can you analyze this?"

Expected: the skill confirms the drop but reveals product mix shift — not the shipping policy — is the real driver.
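
To see how revenue can fall through product mix alone, it helps to split the change into volume, mix, and price effects. The sketch below uses a standard price-volume-mix decomposition, which is not necessarily what the skill computes. The function name `price_volume_mix`, the product names, and all prices and unit counts are invented for illustration and are not taken from the sample dataset. It assumes both periods contain the same products.

```python
def price_volume_mix(before, after):
    """Split a revenue change into volume, mix, and price effects.

    before/after: dict mapping product -> (unit_price, units_sold).
    Assumes the same products appear in both periods.
    """
    units0 = sum(units for _, units in before.values())
    units1 = sum(units for _, units in after.values())
    revenue0 = sum(price * units for price, units in before.values())
    revenue1 = sum(price * units for price, units in after.values())
    avg_price0 = revenue0 / units0

    # Later-period revenue if every product had kept its earlier price.
    revenue1_at_old_prices = sum(
        before[name][0] * units for name, (_, units) in after.items()
    )

    return {
        "volume": (units1 - units0) * avg_price0,           # more or fewer units overall
        "mix": revenue1_at_old_prices - units1 * avg_price0,  # shift between products
        "price": revenue1 - revenue1_at_old_prices,          # per-product price changes
        "total": revenue1 - revenue0,
    }


# Hypothetical quarter-over-quarter figures: same prices, same total units,
# but buyers moved from the premium product to the budget one.
before = {"premium": (100, 400), "budget": (20, 600)}
after = {"premium": (100, 200), "budget": (20, 800)}

effects = price_volume_mix(before, after)
print(effects)
```

With these numbers the whole drop lands in the mix term: total units and every price are unchanged, yet revenue falls because sales moved to the cheaper product. A policy change that left prices and order counts untouched would not show up in the volume or price terms.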

Car sales dataset (real data, 50,000 rows):

Upload test-cases/car_sales_data.csv and say:
"Analyze this car sales data. I want to understand what factors most influence price."

Repository Structure

.
├── SKILL.md                        # Claude skill definition (what Claude reads)
├── WORKFLOW.md                     # Visual workflow diagram
├── QUICK_REFERENCE.md              # One-page cheat sheet
├── INSTALLATION.md                 # Detailed installation and customization guide
├── setup.sh                        # Dependency installer
├── scripts/
│   ├── data_profiler.py            # Standalone data quality checker
│   └── generate_report.py          # Interactive HTML report generator
├── test-cases/
│   ├── test-prompts.md             # 10 test scenarios for evaluating the skill
│   ├── generate_sample_data.py     # Generates realistic e-commerce test data
│   ├── sample_ecommerce_data.csv   # Pre-generated e-commerce dataset (5,000+ transactions)
│   └── car_sales_data.csv          # Real car sales dataset (50,000 rows)
└── examples/
    ├── fig1_overview.png
    ├── fig2_price.png
    ├── fig3_segments.png
    └── fig4_correlation.png

Customization

Add a domain template — edit SKILL.md Phase 3 (~line 170):

#### Your Domain (e.g., "Education / Learning Analytics")
- Student engagement rates
- Learning path completion rates
- Drop-off points in courses

Change report styling — edit the CSS in scripts/generate_report.py (line ~39):

--primary-color: #9b59b6;   /* default: #3498db */

Adjust context questions — edit Phase 1 of SKILL.md.

See INSTALLATION.md for the full customization guide and troubleshooting.

Running the Scripts Standalone

# Profile any CSV
python scripts/data_profiler.py your_data.csv --output report.json

# Generate an HTML report from the profile (the skill normally calls this programmatically)
python scripts/generate_report.py --profile report.json --output report.html
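
For a sense of the checks Phase 2 lists (missing data, duplicates, outliers, type mismatches), here is a minimal standalone sketch. It is not the contents of scripts/data_profiler.py and does not reproduce its JSON output. It uses only the standard library rather than the pandas stack the repository installs. The function name `profile_rows`, the 80% threshold for treating a column as numeric, and the 1.5 × IQR outlier fence are assumptions chosen for illustration.

```python
import csv
import io
import statistics
from collections import Counter


def _to_float(text):
    """Return text as a float, or None if it does not parse."""
    try:
        return float(text)
    except ValueError:
        return None


def profile_rows(rows):
    """Profile a list of dict rows as produced by csv.DictReader."""
    columns = list(rows[0].keys())
    report = {"rows": len(rows), "missing": {}, "type_mismatches": {}, "outliers": {}}

    # Exact duplicates: every repeat after the first occurrence counts once.
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    report["duplicate_rows"] = sum(n - 1 for n in counts.values())

    for col in columns:
        values = [(row[col] or "").strip() for row in rows]
        present = [v for v in values if v != ""]
        report["missing"][col] = len(values) - len(present)

        numbers = [x for x in map(_to_float, present) if x is not None]
        # Assumption: a column is numeric when at least 80% of its cells parse.
        if present and len(numbers) / len(present) >= 0.8:
            report["type_mismatches"][col] = len(present) - len(numbers)
            if len(numbers) >= 4:
                q1, _, q3 = statistics.quantiles(numbers, n=4)
                fence = 1.5 * (q3 - q1)
                report["outliers"][col] = [
                    x for x in numbers if x < q1 - fence or x > q3 + fence
                ]
    return report


# Tiny invented sample with one gap, one bad cell, one extreme value, one duplicate.
SAMPLE_CSV = """\
order_id,amount,region
1,20.0,north
2,22.5,south
3,,north
4,21.0,north
5,19.5,south
6,500.0,south
7,n/a,north
8,20.5,south
8,20.5,south
"""

report = profile_rows(list(csv.DictReader(io.StringIO(SAMPLE_CSV))))
print(report)
```

The IQR fence is only one of several reasonable outlier rules. On heavily skewed columns such as order amounts it can flag many legitimate values, so flagged rows are candidates for review rather than errors.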

License

MIT — use it, fork it, adapt it.
