🐼 Pandas Mastery for Data Analysis & Machine Learning

A hands-on, project-based path to learning Pandas, from first-pass data exploration to building real-world data preprocessing pipelines.

🚀 Overview

This repository contains 10 structured projects designed to take you from beginner to advanced in Pandas.

Each project focuses on a core concept and gradually builds toward real-world workflows used in:

- Data Analysis
- Machine Learning
- Data Engineering

🧠 What You'll Learn

- Data loading & exploration
- Data cleaning & preprocessing
- Feature engineering
- Filtering & transformation
- GroupBy & aggregation
- Working with relational datasets
- Building end-to-end ML pipelines

📊 Projects Overview

| #  | Project Name                 | Key Skills                     |
|----|------------------------------|--------------------------------|
| 01 | Dataset Explorer Tool        | Data loading, inspection       |
| 02 | Dataset Summary Generator    | Statistics, aggregation        |
| 03 | Data Cleaning Pipeline       | Missing values, preprocessing  |
| 04 | Categorical Data Processor   | Encoding, transformations      |
| 05 | Data Filtering Engine        | Querying, boolean masking      |
| 06 | Feature Engineering Toolkit  | Feature creation, binning      |
| 07 | Sales Data Analyzer          | GroupBy, business insights     |
| 08 | Student Performance Analyzer | Aggregation, logic building    |
| 09 | Relational Data Merger       | Joins, multi-table analysis    |
| 10 | ML Prep Pipeline             | End-to-end preprocessing       |

📁 Project Structure

pandas-mastery-projects/
│
├── 01-dataset-explorer/
├── 02-summary-generator/
├── 03-data-cleaning-pipeline/
├── 04-categorical-cleaner/
├── 05-filtering-engine/
├── 06-feature-engineering/
├── 07-sales-analysis/
├── 08-student-analyzer/
├── 09-data-merger/
├── 10-ml-prep-pipeline/
└── README.md

📊 Datasets Used

This repository uses a combination of real-world datasets, open-source datasets, and generated datasets for reproducibility.

🟢 Early Projects (1–2)

Iris Dataset (Kaggle) https://www.kaggle.com/datasets/uciml/iris

File used: Iris.csv
Used in: Project 1 — Dataset Explorer
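
As a rough illustration of the loading and inspection step in Project 1, the sketch below reads a small inline sample and collects basic facts about it. The `explore` helper and the sample rows are hypothetical and not taken from the notebook; the column names are assumed to match the Kaggle `Iris.csv` header.

```python
import io

import pandas as pd

# Tiny inline sample standing in for Iris.csv (values are illustrative only;
# the header is assumed to match the Kaggle file).
SAMPLE_CSV = """Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
1,5.1,3.5,1.4,0.2,Iris-setosa
2,4.9,,1.4,0.2,Iris-setosa
3,7.0,3.2,4.7,1.4,Iris-versicolor
4,6.3,3.3,6.0,2.5,Iris-virginica
"""


def explore(df: pd.DataFrame) -> dict:
    """Collect basic facts about a DataFrame (hypothetical helper)."""
    return {
        "shape": df.shape,
        "columns": list(df.columns),
        "dtypes": df.dtypes.astype(str).to_dict(),
        "missing": df.isna().sum().to_dict(),
    }


# For the real file, replace the StringIO object with the path "Iris.csv".
df = pd.read_csv(io.StringIO(SAMPLE_CSV))
report = explore(df)

print(report["shape"])
print(report["missing"])
print(df.head())
```

With the real dataset, `df.info()` and `df.describe()` are the usual next calls after a first look like this.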

Titanic Dataset (Kaggle — YasserH Version) https://www.kaggle.com/datasets/yasserh/titanic-dataset

File used: Titanic-Dataset.csv
Used in: Project 2 — Dataset Summary Generator
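
The statistics and aggregation skills listed for Project 2 might look roughly like the sketch below. The rows are hand-made for illustration, and only the column names are assumed to follow the Kaggle Titanic CSV; this is not the notebook's actual code.

```python
import pandas as pd

# Hand-made rows; column names assumed to follow the Kaggle Titanic CSV.
df = pd.DataFrame({
    "Pclass": [1, 1, 3, 3, 2],
    "Sex": ["female", "male", "male", "female", "male"],
    "Age": [38.0, 54.0, 22.0, None, 35.0],
    "Fare": [71.28, 51.86, 7.25, 7.92, 13.0],
    "Survived": [1, 0, 0, 1, 0],
})

# Descriptive statistics for numeric columns (count ignores missing values).
numeric_summary = df[["Age", "Fare"]].describe()

# Passenger count and survival rate per ticket class.
survival_by_class = df.groupby("Pclass")["Survived"].agg(["count", "mean"])

print(numeric_summary)
print(survival_by_class)
```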

🟡 Preprocessing Projects (3–6)

Titanic Dataset (Seaborn / GitHub Source) https://raw.githubusercontent.com/mwaskom/seaborn-data/master/titanic.csv

File used: titanic.csv
Used in:
- Project 3 — Data Cleaning Pipeline
- Project 4 — Categorical Data Processor
- Project 5 — Data Filtering Engine
- Project 6 — Feature Engineering Toolkit
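
To give a feel for how Projects 3 to 6 build on each other, here is a minimal sketch that chains one step from each: filling missing values, encoding categories, boolean masking, and binning. The rows are made up, the column names are assumed to follow the seaborn `titanic.csv` header, and the bin edges and labels are arbitrary choices for illustration.

```python
import pandas as pd

# Made-up rows; column names assumed to follow the seaborn titanic.csv header.
df = pd.DataFrame({
    "survived": [0, 1, 1, 0, 1],
    "sex": ["male", "female", "female", "male", "female"],
    "age": [22.0, 38.0, None, 35.0, 4.0],
    "fare": [7.25, 71.28, 7.92, 8.05, 16.7],
    "embarked": ["S", "C", None, "S", "S"],
})

# Cleaning (Project 3): median for numeric gaps, most frequent value for categories.
df["age"] = df["age"].fillna(df["age"].median())
df["embarked"] = df["embarked"].fillna(df["embarked"].mode()[0])

# Encoding (Project 4): map a binary column, one-hot encode a multi-valued one.
df["sex_code"] = df["sex"].map({"male": 0, "female": 1})
df = pd.get_dummies(df, columns=["embarked"], prefix="embarked")

# Filtering (Project 5): combine conditions with a boolean mask.
adult_survivors = df[(df["age"] >= 18) & (df["survived"] == 1)]

# Feature engineering (Project 6): bin a continuous column into labeled groups.
df["age_group"] = pd.cut(
    df["age"],
    bins=[0, 12, 18, 60, 120],  # illustrative edges, right-inclusive
    labels=["child", "teen", "adult", "senior"],
)

print(adult_survivors)
print(df[["age", "age_group"]])
```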

🟣 Analysis Projects (7–8)

Synthetic Datasets (Generated using NumPy)

Created within notebooks for controlled analysis
Used in:
- Project 7 — Sales Data Analyzer
- Project 8 — Student Performance Analyzer
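
A seeded NumPy generator is one common way to create reproducible synthetic data of this kind. The sketch below is an assumption about how such a sales table could be built and summarized; the column names, value ranges, and seed are hypothetical and not taken from the notebooks.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # fixed seed so every run produces the same table
n = 200

# Hypothetical sales records; categories and ranges are arbitrary.
sales = pd.DataFrame({
    "region": rng.choice(["North", "South", "East", "West"], size=n),
    "product": rng.choice(["A", "B", "C"], size=n),
    "units": rng.integers(1, 20, size=n),  # upper bound is exclusive
    "unit_price": rng.uniform(5.0, 50.0, size=n).round(2),
})
sales["revenue"] = sales["units"] * sales["unit_price"]

# GroupBy with named aggregations, sorted by total revenue.
revenue_by_region = (
    sales.groupby("region")["revenue"]
    .agg(total="sum", average="mean", orders="count")
    .sort_values("total", ascending=False)
)

print(revenue_by_region)
```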

🔴 Real-World Projects (9–10)

Brazilian E-Commerce Dataset (Olist) https://www.kaggle.com/datasets/olistbr/brazilian-ecommerce/data

Files used: orders, customers, order items, products, payments, category translation
Used in: Project 9 — Relational Data Merger
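
Joining tables like these usually comes down to chained `merge` calls on shared keys. The sketch below uses three tiny hand-made tables in place of the real Olist files; the key and column names (`order_id`, `customer_id`, `customer_state`, `price`) are assumed to match the Olist schema, and the rows are invented.

```python
import pandas as pd

# Tiny stand-ins for the Olist tables; rows are invented for illustration.
customers = pd.DataFrame({
    "customer_id": ["c1", "c2", "c3"],
    "customer_state": ["SP", "RJ", "MG"],
})
orders = pd.DataFrame({
    "order_id": ["o1", "o2", "o3"],
    "customer_id": ["c1", "c1", "c2"],
})
order_items = pd.DataFrame({
    "order_id": ["o1", "o1", "o2", "o3"],
    "price": [10.0, 5.0, 20.0, 7.5],
})

# Collapse items to one row per order first, so the join cannot duplicate orders.
order_totals = order_items.groupby("order_id", as_index=False)["price"].sum()

# validate= makes pandas raise if the key relationship is not what we expect.
merged = (
    orders
    .merge(customers, on="customer_id", how="left", validate="many_to_one")
    .merge(order_totals, on="order_id", how="left", validate="one_to_one")
)

revenue_by_state = merged.groupby("customer_state")["price"].sum()

print(merged)
print(revenue_by_state)
```

Aggregating before merging keeps the row count of `orders` stable, which makes multi-table results easier to sanity-check.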

Titanic Dataset (Kaggle — ML Pipeline Version) https://www.kaggle.com/datasets/whenamancodes/titanic-dataset-machine-learning-from-disaster

File used: train.csv
Used in: Project 10 — ML Prep Pipeline
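
An end-to-end preprocessing step of the kind Project 10 targets could be sketched with Pandas and NumPy alone, as below. The `prepare` function, the chosen columns, and the 80/20 split are hypothetical; the rows are invented, and only the column names are assumed to follow the Kaggle `train.csv` header.

```python
import numpy as np
import pandas as pd

# Invented rows; column names assumed to follow the Kaggle train.csv header.
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1, 0],
    "Pclass": [3, 1, 3, 1, 2, 3],
    "Sex": ["male", "female", "female", "male", "female", "male"],
    "Age": [22.0, 38.0, None, 54.0, 27.0, None],
    "Fare": [7.25, 71.28, 7.92, 51.86, 11.13, 8.46],
    "Embarked": ["S", "C", "S", None, "S", "Q"],
})


def prepare(frame: pd.DataFrame, target: str = "Survived"):
    """Impute, encode, and split features from the target (hypothetical helper)."""
    out = frame.copy()
    out["Age"] = out["Age"].fillna(out["Age"].median())
    out["Embarked"] = out["Embarked"].fillna(out["Embarked"].mode()[0])
    # One-hot encode; drop_first avoids a redundant column per category.
    out = pd.get_dummies(out, columns=["Sex", "Embarked"], drop_first=True, dtype=int)
    y = out.pop(target)
    return out, y


X, y = prepare(df)

# Simple shuffled 80/20 split using positional indices.
rng = np.random.default_rng(0)
idx = rng.permutation(len(X))
cut = int(len(X) * 0.8)
X_train, X_test = X.iloc[idx[:cut]], X.iloc[idx[cut:]]
y_train, y_test = y.iloc[idx[:cut]], y.iloc[idx[cut:]]

print(X.columns.tolist())
print(X_train.shape, X_test.shape)
```

For brevity this sketch imputes before splitting; when evaluating a model, computing the fill values on the training split only avoids leaking test information.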

🛠️ Tech Stack

- Python
- Pandas
- NumPy
- Jupyter Notebook

▶️ How to Use This Repository

Clone the repository:

git clone https://github.com/codewithhanzlah/pandas-mastery-projects.git

Install dependencies from the repository root (the cloned pandas-mastery-projects folder):
```
pip install -r requirements.txt
```
Navigate to any project folder:
```
cd 01-dataset-explorer
```
Launch Jupyter Notebook:
```
jupyter notebook
```
Run the notebook inside the folder.

📈 Learning Path

This repository follows a structured progression:

Exploration → Cleaning → Transformation → Analysis → Real-World Pipelines

🎯 Key Highlights

📦 10 end-to-end projects
🧠 Covers beginner → advanced concepts
🔗 Real-world relational data handling (Olist Brazilian E-Commerce)
🤖 ML-ready preprocessing pipeline
📊 Business-focused data analysis

💡 Who Is This For?

- Beginners learning Pandas
- Aspiring Data Analysts
- Machine Learning beginners
- Anyone building a strong data portfolio

📌 Future Improvements

- Add visualizations (Matplotlib / Seaborn); in progress as a separate series
- Convert pipelines into reusable Python modules
- Integrate preprocessing pipelines with ML models

🤝 Connect With Me

If you found this helpful, feel free to connect and share feedback!

⭐ If you like this project, consider giving it a star!
