mivige/NataVisionaries

NataVisionaries

Overview

This project builds a machine learning pipeline to predict the quality (OK / KO) of Pastel de Nata products for the Nata Visionaries brotherhood. The model will classify each production run as OK or KO from its recipe and process data, saving pastries from being destroyed.

Project Structure / Notebooks

The project will consist of 5 Jupyter notebooks as required:

1. MLXX NB1 DATAEXPLORATION.ipynb

  • Load learn.csv
  • Understand data structure & feature types
  • Summary statistics, distributions, correlations
  • Initial observations & potential data issues
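
The NB1 checks above can be sketched with pandas. This is a minimal sketch under assumptions: the column names (`sugar_g`, `bake_temp_c`), the `origin` values and the target name `quality` are hypothetical placeholders, since this README does not list the real columns of learn.csv. A tiny in-memory frame stands in for the file so the snippet runs on its own.

```python
import pandas as pd

def explore(df: pd.DataFrame, target: str = "quality") -> dict:
    """Collect first-pass facts for NB1: shape, types, gaps, class balance."""
    numeric = df.select_dtypes(include="number")
    return {
        "shape": df.shape,
        "dtypes": df.dtypes.astype(str).to_dict(),
        "missing": df.isna().sum().to_dict(),
        # Share of OK vs KO rows; strong imbalance makes raw accuracy misleading.
        "class_balance": df[target].value_counts(normalize=True).to_dict(),
        "numeric_summary": numeric.describe(),
        "correlations": numeric.corr(),
    }

# In NB1 this would be: df = pd.read_csv("learn.csv")
# The frame below is a hypothetical stand-in with made-up columns.
df = pd.DataFrame({
    "origin": ["Lisboa", "Porto", "Lisboa", None],
    "sugar_g": [120.0, 135.0, None, 128.0],
    "bake_temp_c": [280, 300, 290, 310],
    "quality": ["OK", "KO", "OK", "OK"],
})
report = explore(df)
print(report["shape"])
print(report["missing"])
print(report["class_balance"])
```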

2. MLXX NB2 PREPROCESSING.ipynb

  • Handle missing values / outliers
  • Encode categorical features (origin)
  • Scale / normalize numerical values if needed
  • Save processed dataset (if needed for later notebooks)
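
One way to wire the NB2 steps together, assuming scikit-learn is used: imputation, scaling and one-hot encoding of `origin` sit in a single `ColumnTransformer`, so they are fitted on training rows and then reused unchanged on new rows. The numeric column names and the `origin` values are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

NUMERIC = ["sugar_g", "bake_temp_c"]  # hypothetical numeric columns
CATEGORICAL = ["origin"]              # the categorical feature named in this README

preprocess = ColumnTransformer([
    # Median imputation is less sensitive to outliers than the mean.
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), NUMERIC),
    # handle_unknown="ignore": an origin seen only in predict.csv
    # becomes an all-zero row instead of raising an error.
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]), CATEGORICAL),
])

X_train = pd.DataFrame({
    "origin": ["Lisboa", "Porto", np.nan, "Lisboa"],
    "sugar_g": [120.0, np.nan, 130.0, 125.0],
    "bake_temp_c": [280, 300, 290, 310],
})
# Fit on training rows only, then reuse the fitted transformer on new rows.
Xt = preprocess.fit_transform(X_train)
X_new = pd.DataFrame({"origin": ["Faro"], "sugar_g": [122.0], "bake_temp_c": [295]})
print(Xt.shape, preprocess.transform(X_new).shape)
```

Keeping the fitted transformer, rather than only a saved transformed CSV, makes it easier to rebuild exactly the same steps from raw files later.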

3. MLXX NB3 FEATUREMGMT.ipynb

  • Engineer new relevant features
  • Drop irrelevant / redundant variables
  • Justify transformations
  • Export cleaned + engineered dataset
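
A sketch of the NB3 steps under stated assumptions: `sugar_per_yolk` is an invented example of an engineered ratio, and "redundant" is judged here by absolute Pearson correlation above 0.95, which is only one possible criterion. `yolk_g` and `bake_temp_f` are hypothetical columns; `bake_temp_f` is a unit conversion of `bake_temp_c`, so it adds no information.

```python
import numpy as np
import pandas as pd

def add_features(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical engineered feature: grams of sugar per gram of egg yolk."""
    out = df.copy()
    out["sugar_per_yolk"] = out["sugar_g"] / out["yolk_g"]
    return out

def redundant_columns(df: pd.DataFrame, threshold: float = 0.95) -> list:
    """Numeric columns whose |Pearson r| with an earlier column exceeds threshold."""
    corr = df.select_dtypes(include="number").corr().abs()
    # Keep only the strict upper triangle so each pair is checked once
    # and the earlier column of a redundant pair is the one that survives.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    return [col for col in upper.columns if (upper[col] > threshold).any()]

df = pd.DataFrame({
    "sugar_g": [120, 135, 128, 110, 140],
    "yolk_g": [60, 55, 70, 65, 50],
    "bake_temp_c": [280, 300, 290, 310, 285],
})
df["bake_temp_f"] = df["bake_temp_c"] * 9 / 5 + 32  # same information, other unit

features = add_features(df)
to_drop = redundant_columns(features)
features = features.drop(columns=to_drop)
print(to_drop)  # ['bake_temp_f']
```

Correlation only catches linear redundancy, and the threshold is exactly the kind of choice the notebook should justify in Markdown.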

4. MLXX NB4 MODELLING.ipynb

  • Train/test split
  • Experiment with ML classifiers (e.g. Random Forest, XGBoost)
  • Compare models by accuracy
  • Optional hyperparameter search
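
A minimal NB4 sketch, assuming scikit-learn. `make_classification` generates synthetic data in place of the engineered learn.csv, and the two candidate models are examples only. XGBoost is left out so the sketch needs nothing beyond scikit-learn; the `xgboost` package's `XGBClassifier` follows the same fit/predict interface but expects numeric labels, so OK/KO would need encoding first.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the engineered learn.csv features (assumption).
X, y = make_classification(n_samples=600, n_features=8, n_informative=5, random_state=0)

# Stratify so the class ratio is the same in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

candidates = {
    "logreg": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=42),
}
scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
print(scores)

# Optional: small grid search on the training split only, scored by accuracy.
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid={"n_estimators": [100, 200], "max_depth": [None, 8]},
    scoring="accuracy",
    cv=5,
)
search.fit(X_train, y_train)
print(search.best_params_, round(search.best_score_, 3))
```

Running the search on the training split only keeps the held-out test split untouched for the final comparison.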

9. MLXX NB9 FINAL.ipynb

  • Load raw data directly from files (learn.csv, predict.csv)
  • Rebuild best data prep + feature steps
  • Load best model configuration
  • Train on learn.csv and export predictions for predict.csv in the Kaggle sampred.csv format
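
A sketch of the NB9 flow. It assumes sampred.csv has an id column first and the prediction column last, that the same id column exists in learn.csv and predict.csv, and that the target is called `quality`; this README confirms none of these layouts. `make_submission` is a hypothetical helper, and the pipeline stands in for whatever configuration NB4 selects. In-memory frames replace the three CSV files so the snippet runs as-is.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

def make_submission(model, learn, predict, sample, target="quality"):
    """Fit on all of learn.csv and return predictions laid out like sampred.csv.

    Assumes sampred.csv has the id column first and the prediction column last.
    """
    id_col, pred_col = sample.columns[0], sample.columns[-1]
    features = [c for c in learn.columns if c not in (id_col, target)]
    model.fit(learn[features], learn[target])  # refit on every labelled row
    sub = pd.DataFrame({id_col: predict[id_col].to_numpy(),
                        pred_col: model.predict(predict[features])})
    return sub[list(sample.columns)]  # same column order as sampred.csv

# Hypothetical "best" configuration carried over from NB4.
model = Pipeline([
    ("prep", ColumnTransformer(
        [("origin", OneHotEncoder(handle_unknown="ignore"), ["origin"])],
        remainder="passthrough")),
    ("clf", RandomForestClassifier(n_estimators=200, random_state=42)),
])

# In NB9 these would come from pd.read_csv on learn.csv, predict.csv and sampred.csv.
learn = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "origin": ["Lisboa", "Porto", "Lisboa", "Porto", "Lisboa", "Porto"],
    "sugar_g": [120, 135, 128, 110, 140, 125],
    "quality": ["OK", "KO", "OK", "KO", "OK", "KO"],
})
predict = pd.DataFrame({"id": [7, 8], "origin": ["Lisboa", "Porto"], "sugar_g": [122, 131]})
sample = pd.DataFrame({"id": [7, 8], "quality": ["OK", "OK"]})

submission = make_submission(model, learn, predict, sample)
print(submission)
# submission.to_csv("submission.csv", index=False)  # hypothetical output name
```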

Data Files

  • learn.csv → full training dataset including target
  • predict.csv → same features without the target; these are the rows to predict
  • sampred.csv → example submission format
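
Because predict.csv should carry the same features as learn.csv minus the target, a quick schema check before any modelling catches renamed columns or a numeric column parsed as text. A minimal sketch, assuming pandas and a target column named `quality` (hypothetical); `check_schema` is an illustrative helper.

```python
import pandas as pd

def check_schema(learn: pd.DataFrame, predict: pd.DataFrame, target: str = "quality") -> None:
    """Raise ValueError unless predict has exactly learn's columns minus the target."""
    if target not in learn.columns:
        raise ValueError(f"target {target!r} not found in learn.csv")
    expected = set(learn.columns) - {target}
    actual = set(predict.columns)
    if actual != expected:
        raise ValueError(f"column mismatch: {sorted(expected ^ actual)}")
    # Matching names are not enough: a numeric column read as text hints at a parsing issue.
    is_num = pd.api.types.is_numeric_dtype
    for col in sorted(expected):
        if is_num(learn[col]) != is_num(predict[col]):
            raise ValueError(f"{col!r} is numeric in one file but not the other")

# Hypothetical miniature versions of the two files.
learn = pd.DataFrame({"origin": ["Lisboa"], "sugar_g": [120.0], "quality": ["OK"]})
predict = pd.DataFrame({"origin": ["Porto"], "sugar_g": [131.0]})
check_schema(learn, predict)  # passes silently
```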

Kaggle Submission

  • Output must match sampred.csv format
  • Submit early and often to get the required leaderboard feedback
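
A small pre-upload check, sketched with pandas. It assumes sampred.csv starts with an id column and ends with the prediction column, and that the labels are literally `OK` and `KO`; `validate_submission` is a hypothetical helper, not part of any Kaggle tooling.

```python
import pandas as pd

def validate_submission(sub: pd.DataFrame, sample: pd.DataFrame, labels=("OK", "KO")) -> list:
    """List every way `sub` deviates from sampred.csv; empty means it looks uploadable."""
    problems = []
    if list(sub.columns) != list(sample.columns):
        problems.append(f"columns {list(sub.columns)} != {list(sample.columns)}")
    if len(sub) != len(sample):
        problems.append(f"{len(sub)} rows, expected {len(sample)}")
    id_col, pred_col = sample.columns[0], sample.columns[-1]
    if id_col in sub.columns and set(sub[id_col]) != set(sample[id_col]):
        problems.append("ids differ from sampred.csv")
    if pred_col in sub.columns:
        unexpected = set(sub[pred_col]) - set(labels)
        if unexpected:
            problems.append(f"unexpected labels: {sorted(map(str, unexpected))}")
    return problems

sample = pd.DataFrame({"id": [7, 8], "quality": ["OK", "OK"]})  # hypothetical sampred.csv
good = pd.DataFrame({"id": [7, 8], "quality": ["KO", "OK"]})
bad = pd.DataFrame({"id": [7], "quality": ["maybe"]})
print(validate_submission(good, sample))  # []
print(validate_submission(bad, sample))
```

When writing the file, `DataFrame.to_csv(..., index=False)` avoids adding the DataFrame index as an extra leading column.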

Deadlines

  • Kaggle closes: 17 Dec
  • Final notebook submission: 20 Dec (follow naming rules!)

Notes

  • Markdown explanations in every notebook
  • No data leakage
  • NB1 and NB9 must run standalone from raw files
  • Use accuracy as the main metric
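
The "no data leakage" and "accuracy" notes combine naturally in scikit-learn: with preprocessing inside a `Pipeline`, `cross_val_score` refits the scaler on each training fold only, so validation rows never influence it. A minimal sketch on synthetic data from `make_classification`; the model choice is illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=400, n_features=10, random_state=1)

# Leaky pattern to avoid: scaling all rows before splitting lets the
# validation folds shape the scaler's mean and variance.
#   X_scaled = StandardScaler().fit_transform(X)

# Leak-free: the scaler is refit inside each training fold only.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
acc = cross_val_score(pipe, X, y, cv=cv, scoring="accuracy")
print(f"accuracy {acc.mean():.3f} ± {acc.std():.3f}")
```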

This README is a quick roadmap to keep the project organized and correctly aligned with grading rules.

About

Machine learning pipeline to predict the quality of Pastel de Nata products.
