A hands-on, project-based path to mastering Pandas, from first data exploration through to building real-world data preprocessing pipelines.
This repository contains 10 structured projects designed to take you from beginner to advanced in Pandas.
Each project focuses on a core concept and gradually builds toward real-world workflows used in:
- Data Analysis
- Machine Learning
- Data Engineering

Skills covered:
- Data loading & exploration
- Data cleaning & preprocessing
- Feature engineering
- Filtering & transformation
- GroupBy & aggregation
- Working with relational datasets
- Building end-to-end ML pipelines
| # | Project Name | Key Skills |
|---|---|---|
| 01 | Dataset Explorer Tool | Data loading, inspection |
| 02 | Dataset Summary Generator | Statistics, aggregation |
| 03 | Data Cleaning Pipeline | Missing values, preprocessing |
| 04 | Categorical Data Processor | Encoding, transformations |
| 05 | Data Filtering Engine | Querying, boolean masking |
| 06 | Feature Engineering Toolkit | Feature creation, binning |
| 07 | Sales Data Analyzer | GroupBy, business insights |
| 08 | Student Performance Analyzer | Aggregation, logic building |
| 09 | Relational Data Merger | Joins, multi-table analysis |
| 10 | ML Prep Pipeline | End-to-end preprocessing |
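
To give a flavour of the skills listed for Projects 05 and 06, here is a minimal sketch of boolean masking, `query()`, and binning with `pd.cut`. The toy DataFrame, its column names, and the bin edges are assumptions made for illustration; they are not taken from the project notebooks.

```python
import pandas as pd

# Hypothetical toy data; not taken from any dataset used in this repository.
df = pd.DataFrame(
    {
        "name": ["Ana", "Ben", "Cara", "Dev"],
        "age": [8, 25, 41, 70],
        "fare": [10.0, 55.0, 7.5, 80.0],
    }
)

# Boolean masking: combine conditions with & / | and wrap each in parentheses.
adults_high_fare = df[(df["age"] >= 18) & (df["fare"] > 50)]

# query() expresses the same filter as a string.
same_rows = df.query("age >= 18 and fare > 50")

# Binning: pd.cut assigns each age to a labelled interval (right edge inclusive).
df["age_group"] = pd.cut(
    df["age"],
    bins=[0, 12, 60, 120],
    labels=["child", "adult", "senior"],
)
```

Masking and `query()` select the same rows here; which one reads better is largely a matter of taste.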
pandas-mastery-projects/
│
├── 01-dataset-explorer/
├── 02-summary-generator/
├── 03-data-cleaning-pipeline/
├── 04-categorical-cleaner/
├── 05-filtering-engine/
├── 06-feature-engineering/
├── 07-sales-analysis/
├── 08-student-analyzer/
├── 09-data-merger/
├── 10-ml-prep-pipeline/
└── README.md
This repository uses a mix of real-world, open-source, and generated datasets; the generated ones keep the analysis controlled and reproducible.
Iris Dataset (Kaggle) https://www.kaggle.com/datasets/uciml/iris
- File used: Iris.csv
- Used in: Project 1: Dataset Explorer
Titanic Dataset (Kaggle, YasserH version) https://www.kaggle.com/datasets/yasserh/titanic-dataset
- File used: Titanic-Dataset.csv
- Used in: Project 2: Dataset Summary Generator
Titanic Dataset (Seaborn / GitHub source) https://raw.githubusercontent.com/mwaskom/seaborn-data/master/titanic.csv
- File used: titanic.csv
- Used in:
  - Project 3: Data Cleaning Pipeline
  - Project 4: Categorical Data Processor
  - Project 5: Data Filtering Engine
  - Project 6: Feature Engineering Toolkit
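
The cleaning and encoding steps practised on this file can be sketched roughly as follows. The rows are invented and only imitate a few of the columns in `titanic.csv` (`age`, `embarked`, `sex`); the median/mode imputation choices are one reasonable option, not necessarily what the notebooks do.

```python
import numpy as np
import pandas as pd

# Made-up rows that imitate a few columns of titanic.csv (not real records).
df = pd.DataFrame(
    {
        "age": [22.0, np.nan, 30.0, np.nan, 40.0],
        "embarked": ["S", "C", None, "S", "S"],
        "sex": ["male", "female", "female", "male", "female"],
    }
)

# Count missing values per column before deciding how to handle them.
missing_before = df.isna().sum()

cleaned = df.copy()
# Numeric column: fill gaps with the median, which is robust to outliers.
cleaned["age"] = cleaned["age"].fillna(cleaned["age"].median())
# Categorical column: fill gaps with the most frequent value (the mode).
cleaned["embarked"] = cleaned["embarked"].fillna(cleaned["embarked"].mode()[0])
# Encode a two-level category as 0/1 integers.
cleaned["sex_code"] = cleaned["sex"].map({"male": 0, "female": 1})
```

Working on a copy keeps the raw frame intact, which makes it easy to compare missing-value counts before and after.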
Synthetic Datasets (generated using NumPy)
- Created within notebooks for controlled analysis
- Used in:
  - Project 7: Sales Data Analyzer
  - Project 8: Student Performance Analyzer
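
As an illustration of how a reproducible synthetic dataset can be generated and then summarised, here is a minimal sketch. The seed, column names, and value ranges are assumptions chosen for the example and may differ from what the notebooks actually generate.

```python
import numpy as np
import pandas as pd

# A fixed seed makes the generated data identical on every run.
rng = np.random.default_rng(42)
n_rows = 200

# Column names and value ranges are illustrative assumptions.
sales = pd.DataFrame(
    {
        "region": rng.choice(["North", "South", "East", "West"], size=n_rows),
        "units": rng.integers(1, 20, size=n_rows),  # upper bound is exclusive
        "unit_price": rng.uniform(5.0, 50.0, size=n_rows).round(2),
    }
)
sales["revenue"] = sales["units"] * sales["unit_price"]

# GroupBy with named aggregation: one row per region.
summary = (
    sales.groupby("region")
    .agg(orders=("units", "size"), total_revenue=("revenue", "sum"))
    .sort_values("total_revenue", ascending=False)
)
```

Seeding the generator is what makes the dataset reproducible: anyone rerunning the cell with the same NumPy version should get the same rows.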
Brazilian E-Commerce Dataset (Olist) https://www.kaggle.com/datasets/olistbr/brazilian-ecommerce/data
- Files used: orders, customers, order items, products, payments, category translation
- Used in: Project 9: Relational Data Merger
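
Joining relational tables like these usually comes down to `DataFrame.merge`. The sketch below uses two tiny invented frames standing in for the orders and customers tables; the column names (`order_id`, `customer_id`, `customer_state`) are assumed to match the Olist CSVs and should be checked against the actual files.

```python
import pandas as pd

# Tiny stand-ins for two Olist tables; rows are invented for illustration.
customers = pd.DataFrame(
    {"customer_id": ["c1", "c2", "c3"], "customer_state": ["SP", "RJ", "MG"]}
)
orders = pd.DataFrame(
    {
        "order_id": ["o1", "o2", "o3"],
        "customer_id": ["c1", "c1", "c4"],  # c4 has no customer record
    }
)

# Left join keeps every order; validate= raises if the key relationship
# is not many-to-one, and indicator= records where each row came from.
merged = orders.merge(
    customers,
    on="customer_id",
    how="left",
    validate="many_to_one",
    indicator=True,
)

# Orders whose customer_id found no match in the customers table.
unmatched = merged[merged["_merge"] == "left_only"]
```

Checking the `_merge` column after each join is a cheap way to catch silently dropped or unmatched rows in multi-table analysis.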
Titanic Dataset (Kaggle, ML pipeline version) https://www.kaggle.com/datasets/whenamancodes/titanic-dataset-machine-learning-from-disaster
- File used: train.csv
- Used in: Project 10: ML Prep Pipeline
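
An end-to-end preprocessing step for this kind of file might look like the sketch below. The function name `prepare_features` is hypothetical, the sample rows are invented, and the column names are assumed to follow the standard Kaggle Titanic `train.csv` layout; the notebook's actual pipeline may differ.

```python
import numpy as np
import pandas as pd


def prepare_features(raw: pd.DataFrame) -> tuple[pd.DataFrame, pd.Series]:
    """Return (X, y) ready for a model; a sketch, not the notebook's code."""
    df = raw.copy()
    # Impute missing ages with the median.
    df["Age"] = df["Age"].fillna(df["Age"].median())
    # Derive a simple feature from two existing columns.
    df["FamilySize"] = df["SibSp"] + df["Parch"] + 1
    # One-hot encode the categorical column; drop_first avoids a redundant column.
    df = pd.get_dummies(df, columns=["Sex"], drop_first=True, dtype=int)
    y = df.pop("Survived")
    X = df[["Pclass", "Age", "FamilySize", "Sex_male"]]
    return X, y


# Invented rows using the assumed column names of train.csv.
sample = pd.DataFrame(
    {
        "Survived": [0, 1, 1, 0],
        "Pclass": [3, 1, 3, 2],
        "Sex": ["male", "female", "female", "male"],
        "Age": [22.0, 38.0, np.nan, 35.0],
        "SibSp": [1, 1, 0, 0],
        "Parch": [0, 0, 0, 0],
    }
)
X, y = prepare_features(sample)
```

Wrapping the steps in a function that takes and returns DataFrames keeps the raw data untouched and makes the same preprocessing easy to reapply to new data.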
- Python
- Pandas
- NumPy
- Jupyter Notebook
- Clone the repository and move into it: `git clone https://github.com/codewithhanzlah/pandas-mastery-projects.git`, then `cd pandas-mastery-projects`
- Install dependencies: `pip install -r requirements.txt`
- Navigate to any project folder: `cd 01-dataset-explorer`
- Launch Jupyter Notebook: `jupyter notebook`
- Run the notebook inside the folder.
This repository follows a structured progression:
Exploration → Cleaning → Transformation → Analysis → Real-World Pipelines
- 📦 10 end-to-end projects
- 🧠 Covers beginner → advanced concepts
- 🔗 Real-world relational data handling (Olist Brazilian E-Commerce)
- 🤖 ML-ready preprocessing pipeline
- 📊 Business-focused data analysis
- Beginners learning Pandas
- Aspiring Data Analysts
- Machine Learning beginners
- Anyone building a strong data portfolio
- Add visualizations (Matplotlib / Seaborn) — in progress as a separate series
- Convert pipelines into reusable Python modules
- Integrate preprocessing pipelines with ML models
If you found this helpful, feel free to connect and share feedback!
⭐ If you like this project, consider giving it a star!