
codewithhanzlah/pandas-mastery-projects


🐼 Pandas Mastery for Data Analysis & Machine Learning

A complete hands-on, project-based journey to mastering Pandas, covering everything from data exploration to building real-world data preprocessing pipelines.


🚀 Overview

This repository contains 10 structured projects designed to take you from beginner to advanced in Pandas.

Each project focuses on a core concept and gradually builds toward real-world workflows used in:

  • Data Analysis
  • Machine Learning
  • Data Engineering

🧠 What You'll Learn

  • Data loading & exploration
  • Data cleaning & preprocessing
  • Feature engineering
  • Filtering & transformation
  • GroupBy & aggregation
  • Working with relational datasets
  • Building end-to-end ML pipelines

📊 Projects Overview

#    Project Name                    Key Skills
01   Dataset Explorer Tool           Data loading, inspection
02   Dataset Summary Generator       Statistics, aggregation
03   Data Cleaning Pipeline          Missing values, preprocessing
04   Categorical Data Processor      Encoding, transformations
05   Data Filtering Engine           Querying, boolean masking
06   Feature Engineering Toolkit     Feature creation, binning
07   Sales Data Analyzer             GroupBy, business insights
08   Student Performance Analyzer    Aggregation, logic building
09   Relational Data Merger          Joins, multi-table analysis
10   ML Prep Pipeline                End-to-end preprocessing
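To make the querying and boolean-masking skills from Project 05 concrete, here is a minimal sketch (not the notebook's code). The five rows are hypothetical and only borrow column names from the Seaborn Titanic data used in that project.

```python
import pandas as pd

# Tiny hypothetical stand-in for the Seaborn Titanic data
df = pd.DataFrame({
    "survived": [0, 1, 1, 0, 1],
    "pclass":   [3, 1, 3, 1, 2],
    "sex":      ["male", "female", "female", "male", "female"],
    "age":      [22.0, 38.0, 26.0, 54.0, None],
    "fare":     [7.25, 71.28, 7.93, 51.86, 30.07],
})

# Boolean masking: combine conditions with & and |, each wrapped in parentheses
mask = (df["sex"] == "female") & (df["pclass"] < 3)
first_second_class_women = df[mask]

# The same filter written with DataFrame.query; @ references a Python variable
max_class = 3
same_result = df.query("sex == 'female' and pclass < @max_class")

# isin() and between() cover common membership and (inclusive) range checks
mid_fare = df[df["fare"].between(10, 60)]
upper_classes = df[df["pclass"].isin([1, 2])]

print(first_second_class_women)
print(same_result.equals(first_second_class_women))  # True
```

Masks are plain boolean Series, so they can be stored, combined, and reused; `query` trades that flexibility for a more readable string expression.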

📁 Project Structure

pandas-mastery-projects/
│
├── 01-dataset-explorer/
├── 02-summary-generator/
├── 03-data-cleaning-pipeline/
├── 04-categorical-cleaner/
├── 05-filtering-engine/
├── 06-feature-engineering/
├── 07-sales-analysis/
├── 08-student-analyzer/
├── 09-data-merger/
├── 10-ml-prep-pipeline/
└── README.md

📊 Datasets Used

This repository uses a combination of real-world open datasets (from Kaggle and the Seaborn data repository) and synthetic datasets generated with NumPy for reproducibility.


🟢 Early Projects (1–2)

Iris Dataset (Kaggle) https://www.kaggle.com/datasets/uciml/iris

  • File used: Iris.csv
  • Used in: Project 1 — Dataset Explorer
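A Dataset Explorer pass usually starts with shape, dtypes, missing values, and summary statistics. The sketch below bundles those into a hypothetical `explore()` helper. It assumes the Kaggle `Iris.csv` columns are `Id`, `SepalLengthCm`, `SepalWidthCm`, `PetalLengthCm`, `PetalWidthCm`, and `Species`, and uses four rows copied from the Iris data in place of the file so it runs offline.

```python
import pandas as pd

def explore(df: pd.DataFrame, n: int = 5) -> dict:
    """Collect a quick first-look report for any DataFrame (hypothetical helper)."""
    return {
        "shape": df.shape,                          # (rows, columns)
        "dtypes": df.dtypes.astype(str).to_dict(),  # column -> dtype name
        "missing": df.isna().sum().to_dict(),       # column -> count of missing values
        "head": df.head(n),                         # first n rows
        "numeric_summary": df.describe(),           # count/mean/std/min/quartiles/max
    }

# In the project this would come from disk, e.g. pd.read_csv("Iris.csv");
# four rows from the Iris data stand in here so the sketch runs without the file.
iris = pd.DataFrame({
    "Id": [1, 2, 51, 101],
    "SepalLengthCm": [5.1, 4.9, 7.0, 6.3],
    "SepalWidthCm": [3.5, 3.0, 3.2, 3.3],
    "PetalLengthCm": [1.4, 1.4, 4.7, 6.0],
    "PetalWidthCm": [0.2, 0.2, 1.4, 2.5],
    "Species": ["Iris-setosa", "Iris-setosa", "Iris-versicolor", "Iris-virginica"],
})

report = explore(iris, n=3)
print(report["shape"])
print(report["missing"])
print(iris["Species"].value_counts())
```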

Titanic Dataset (Kaggle — YasserH Version) https://www.kaggle.com/datasets/yasserh/titanic-dataset

  • File used: Titanic-Dataset.csv
  • Used in: Project 2 — Dataset Summary Generator
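For the Summary Generator, one way to structure the output is a dictionary of numeric statistics, missing-value percentages, category counts, and survival rates per category. This is a sketch under assumptions: the `summarize()` helper is hypothetical, and the six rows are invented but use column names from the Kaggle Titanic CSV (`Survived`, `Pclass`, `Sex`, `Age`, `Fare`).

```python
import pandas as pd

# Hypothetical rows using column names from the Kaggle Titanic CSV
titanic = pd.DataFrame({
    "Survived": [0, 1, 1, 1, 0, 0],
    "Pclass":   [3, 1, 3, 1, 3, 2],
    "Sex":      ["male", "female", "female", "female", "male", "male"],
    "Age":      [22.0, 38.0, 26.0, 35.0, None, 54.0],
    "Fare":     [7.25, 71.28, 7.93, 53.10, 8.46, 51.86],
})

def summarize(df: pd.DataFrame, target: str = "Survived") -> dict:
    """Hypothetical summary helper: stats, category counts, and target rates."""
    numeric = df.select_dtypes("number")
    categorical = df.select_dtypes(exclude="number")
    return {
        "numeric_stats": numeric.describe().T,              # one row per numeric column
        "missing_pct": df.isna().mean().mul(100).round(1),  # % missing per column
        "category_counts": {c: df[c].value_counts() for c in categorical.columns},
        # Mean of a 0/1 column is a rate, computed per value of each categorical column
        "target_rate": {c: df.groupby(c)[target].mean() for c in categorical.columns},
    }

summary = summarize(titanic)
print(summary["missing_pct"])
print(summary["target_rate"]["Sex"])
```

Because `Survived` is coded 0/1, its group mean is the survival rate for that group.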

🟡 Preprocessing Projects (3–6)

Titanic Dataset (Seaborn / GitHub Source) https://raw.githubusercontent.com/mwaskom/seaborn-data/master/titanic.csv

  • File used: titanic.csv
  • Used in:
    • Project 3 — Data Cleaning Pipeline
    • Project 4 — Categorical Data Processor
    • Project 5 — Data Filtering Engine
    • Project 6 — Feature Engineering Toolkit
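Projects 3, 4, and 6 can be read as consecutive stages on the same table. The sketch chains hypothetical `clean`, `encode`, and `add_features` functions with `DataFrame.pipe`. The specific choices (median age, most frequent port, dropping `deck`, the age bin edges) are illustrative assumptions rather than what the notebooks necessarily do, and the rows are invented with Seaborn-style lowercase column names.

```python
import numpy as np
import pandas as pd

# Hypothetical rows with Seaborn-style column names
raw = pd.DataFrame({
    "survived": [0, 1, 1, 0, 1, 0],
    "pclass":   [3, 1, 3, 3, 2, 1],
    "sex":      ["male", "female", "female", "male", "female", "male"],
    "age":      [22.0, 38.0, np.nan, 35.0, 4.0, np.nan],
    "sibsp":    [1, 1, 0, 0, 1, 0],
    "parch":    [0, 0, 0, 0, 1, 0],
    "fare":     [7.25, 71.28, 7.93, 8.05, 26.0, 30.5],
    "embarked": ["S", "C", "S", np.nan, "S", "S"],
    "deck":     [np.nan, "C", np.nan, np.nan, np.nan, "A"],
})

def clean(df):
    """Cleaning stage: drop a mostly empty column, fill the other gaps."""
    out = df.drop(columns=["deck"])
    out["age"] = out["age"].fillna(out["age"].median())
    out["embarked"] = out["embarked"].fillna(out["embarked"].mode()[0])
    return out

def encode(df):
    """Encoding stage: binary map for sex, one-hot columns for embarked."""
    out = df.copy()
    out["sex"] = out["sex"].map({"male": 0, "female": 1})
    return pd.get_dummies(out, columns=["embarked"], prefix="emb", dtype=int)

def add_features(df):
    """Feature stage: derived columns plus age binning with pd.cut."""
    out = df.copy()
    out["family_size"] = out["sibsp"] + out["parch"] + 1
    out["is_alone"] = (out["family_size"] == 1).astype(int)
    out["age_group"] = pd.cut(out["age"], bins=[0, 12, 18, 60, 120],
                              labels=["child", "teen", "adult", "senior"])
    return out

prepared = raw.pipe(clean).pipe(encode).pipe(add_features)
print(prepared.head())
```

In the notebooks the input could instead be loaded with `pd.read_csv` from the Seaborn CSV URL above; keeping each stage as a function makes it easy to rerun or reorder them.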

🟣 Analysis Projects (7–8)

Synthetic Datasets (Generated using NumPy)

  • Created within notebooks for controlled analysis
  • Used in:
    • Project 7 — Sales Data Analyzer
    • Project 8 — Student Performance Analyzer
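A synthetic table is easy to make reproducible by seeding NumPy's random generator. The sketch below builds a hypothetical sales schema (`region`, `product`, `units`, `price`), summarizes it with named aggregation and a pivot table, and adds a small rule-based grading step in the spirit of Project 8. Column names, value ranges, and grade cut-offs are all assumptions, not the notebooks' actual setup.

```python
import numpy as np
import pandas as pd

# Seeded generator: the same "random" data on every run
rng = np.random.default_rng(42)
n = 200

# Hypothetical schema for a synthetic sales table
sales = pd.DataFrame({
    "date":    pd.date_range("2024-01-01", periods=n, freq="D"),
    "region":  rng.choice(["North", "South", "East", "West"], size=n),
    "product": rng.choice(["A", "B", "C"], size=n),
    "units":   rng.integers(1, 20, size=n),             # 1..19 (upper bound exclusive)
    "price":   rng.uniform(5.0, 50.0, size=n).round(2),
})
sales["revenue"] = sales["units"] * sales["price"]
sales["month"] = sales["date"].dt.to_period("M")

# Named aggregation: one row per region with descriptive column names
by_region = (
    sales.groupby("region")
         .agg(total_revenue=("revenue", "sum"),
              orders=("revenue", "count"),
              avg_units=("units", "mean"))
         .sort_values("total_revenue", ascending=False)
)

# Monthly revenue per product as a wide table
monthly = sales.pivot_table(index="month", columns="product", values="revenue",
                            aggfunc="sum", fill_value=0)

# Student-style logic: np.select picks the first matching condition per row
scores = pd.DataFrame({"student": [f"S{i}" for i in range(1, 7)],
                       "score": rng.integers(40, 101, size=6)})
conditions = [scores["score"] >= 90, scores["score"] >= 75, scores["score"] >= 60]
scores["grade"] = np.select(conditions, ["A", "B", "C"], default="F")

print(by_region)
print(monthly)
print(scores)
```

Because conditions are checked in order, a score of 92 matches the first rule and gets "A" even though it also satisfies the later ones.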

🔴 Real-World Projects (9–10)

Brazilian E-Commerce Dataset (Olist) https://www.kaggle.com/datasets/olistbr/brazilian-ecommerce/data

  • Files used: orders, customers, order items, products, payments, category translation
  • Used in: Project 9 — Relational Data Merger
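Joining the Olist tables is mostly a matter of choosing join keys and guarding against row duplication. The sketch uses tiny invented tables whose column names are assumed to follow the public Olist schema (`order_id`, `customer_id`, `product_id`, `product_category_name`, `payment_value`, and so on), so check those names against the downloaded CSVs. `validate="many_to_one"` makes pandas raise an error if a lookup table unexpectedly has duplicate keys.

```python
import pandas as pd

# Tiny hypothetical tables with column names assumed from the Olist schema
orders = pd.DataFrame({"order_id": ["o1", "o2", "o3"],
                       "customer_id": ["c1", "c2", "c1"],
                       "order_status": ["delivered", "delivered", "canceled"]})
customers = pd.DataFrame({"customer_id": ["c1", "c2"], "customer_state": ["SP", "RJ"]})
items = pd.DataFrame({"order_id": ["o1", "o1", "o2", "o3"],
                      "product_id": ["p1", "p2", "p1", "p3"],
                      "price": [100.0, 50.0, 100.0, 20.0],
                      "freight_value": [10.0, 5.0, 12.0, 3.0]})
products = pd.DataFrame({"product_id": ["p1", "p2", "p3"],
                         "product_category_name": ["beleza_saude", "esporte_lazer", "beleza_saude"]})
translation = pd.DataFrame({"product_category_name": ["beleza_saude", "esporte_lazer"],
                            "product_category_name_english": ["health_beauty", "sports_leisure"]})
payments = pd.DataFrame({"order_id": ["o1", "o1", "o2"],
                         "payment_value": [130.0, 35.0, 112.0]})

# Item-level table: one row per order item, enriched from each lookup table
fact = (items
        .merge(orders, on="order_id", how="left", validate="many_to_one")
        .merge(customers, on="customer_id", how="left", validate="many_to_one")
        .merge(products, on="product_id", how="left", validate="many_to_one")
        .merge(translation, on="product_category_name", how="left", validate="many_to_one"))

# Aggregate payments per order first, then join at the order level
order_payments = payments.groupby("order_id", as_index=False)["payment_value"].sum()
order_totals = (fact.groupby("order_id", as_index=False)
                    .agg(items_value=("price", "sum"), freight=("freight_value", "sum"))
                    .merge(order_payments, on="order_id", how="left"))

# Item revenue by English category name, delivered orders only
delivered = fact[fact["order_status"] == "delivered"]
revenue_by_category = delivered.groupby("product_category_name_english")["price"].sum()

print(order_totals)
print(revenue_by_category)
```

Payments are summed per order before joining because an order can have several payment rows; merging them straight onto the item table would repeat item prices and inflate totals.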

Titanic Dataset (Kaggle — ML Pipeline Version) https://www.kaggle.com/datasets/whenamancodes/titanic-dataset-machine-learning-from-disaster

  • File used: train.csv
  • Used in: Project 10 — ML Prep Pipeline
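A common shape for an ML prep step is to learn fill values on the training data and then apply one `transform` to every split, so no statistics leak from test data. The sketch follows that pattern with hypothetical `fit_params()` and `transform()` helpers and invented rows in the classic Kaggle `train.csv` layout. The engineered features (`Title`, `HasCabin`, `FamilySize`) are illustrative choices, not necessarily the ones used in Project 10.

```python
import numpy as np
import pandas as pd

# Hypothetical rows in the classic Kaggle Titanic train.csv layout (Ticket omitted)
train = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5],
    "Survived": [0, 1, 1, 1, 0],
    "Pclass": [3, 1, 3, 1, 3],
    "Name": ["Braund, Mr. Owen Harris", "Cumings, Mrs. John Bradley",
             "Heikkinen, Miss. Laina", "Futrelle, Mrs. Jacques Heath",
             "Allen, Mr. William Henry"],
    "Sex": ["male", "female", "female", "female", "male"],
    "Age": [22.0, 38.0, 26.0, np.nan, 35.0],
    "SibSp": [1, 1, 0, 1, 0],
    "Parch": [0, 0, 0, 0, 0],
    "Fare": [7.25, 71.2833, 7.925, 53.1, 8.05],
    "Cabin": [np.nan, "C85", np.nan, "C123", np.nan],
    "Embarked": ["S", "C", "S", "S", np.nan],
})

def fit_params(df):
    """Learn fill values from the training split only."""
    return {"age_median": df["Age"].median(),
            "fare_median": df["Fare"].median(),
            "embarked_mode": df["Embarked"].mode()[0]}

def transform(df, params):
    """Apply identical cleaning, feature creation, and encoding to any split."""
    out = df.copy()
    out["Title"] = out["Name"].str.extract(r",\s*([^.]+)\.", expand=False)  # "Mr", "Mrs", ...
    out["HasCabin"] = out["Cabin"].notna().astype(int)
    out["Age"] = out["Age"].fillna(params["age_median"])
    out["Fare"] = out["Fare"].fillna(params["fare_median"])
    out["Embarked"] = out["Embarked"].fillna(params["embarked_mode"])
    out["FamilySize"] = out["SibSp"] + out["Parch"] + 1
    out["Sex"] = out["Sex"].map({"male": 0, "female": 1})
    # errors="ignore" skips columns a given file does not have (e.g. Ticket here)
    out = out.drop(columns=["PassengerId", "Name", "Ticket", "Cabin"], errors="ignore")
    return pd.get_dummies(out, columns=["Embarked", "Title", "Pclass"], dtype=int)

params = fit_params(train)
features = transform(train, params)
X = features.drop(columns="Survived")
y = features["Survived"]
print(X.head())
```

A test split would go through `transform(test, params)` followed by `.reindex(columns=X.columns, fill_value=0)`, so any dummy column missing from that split is added as zeros and the feature order matches the training matrix.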

🛠️ Tech Stack

  • Python
  • Pandas
  • NumPy
  • Jupyter Notebook

▶️ How to Use This Repository

  1. Clone the repository and move into it:

    git clone https://github.com/codewithhanzlah/pandas-mastery-projects.git
    cd pandas-mastery-projects
  2. Install dependencies:

    pip install -r requirements.txt
  3. Navigate to any project folder:

    cd 01-dataset-explorer
  4. Launch Jupyter Notebook:

    jupyter notebook
  5. Open the notebook in that folder and run its cells.


📈 Learning Path

This repository follows a structured progression:

Exploration → Cleaning → Transformation → Analysis → Real-World Pipelines


🎯 Key Highlights

  • 📦 10 end-to-end projects
  • 🧠 Covers beginner → advanced concepts
  • 🔗 Real-world relational data handling (Olist Brazilian E-Commerce)
  • 🤖 ML-ready preprocessing pipeline
  • 📊 Business-focused data analysis

💡 Who Is This For?

  • Beginners learning Pandas
  • Aspiring Data Analysts
  • Machine Learning beginners
  • Anyone building a strong data portfolio

📌 Future Improvements

  • Add visualizations (Matplotlib / Seaborn) — in progress as a separate series
  • Convert pipelines into reusable Python modules
  • Integrate preprocessing pipelines with ML models

🤝 Connect With Me

If you found this helpful, feel free to connect and share feedback!


⭐ If you like this project, consider giving it a star!

About

Hands-on Pandas mastery through 10 real-world projects. Covers data cleaning, feature engineering, GroupBy analysis, multi-table joins, and ML preprocessing.
