An ensemble classifier for the Spaceship Titanic Kaggle competition.
Team: cranberry128, bobbbbbbyli, yangtom0516
We rely on extensive feature engineering and stack a variety of models. The primary base model is a Random Forest.
The current ensemble uses a HistGradientBoostingClassifier meta-model on top of the following base models:
- RandomForestClassifier (anchor, fine-tuned)
- ExtraTreesClassifier
- GradientBoostingClassifier
- XGBClassifier
- LogisticRegression
- CatBoostClassifier
- SGDClassifier
- LinearDiscriminantAnalysis
- BaggingClassifier
- MLPClassifier
Current Best Accuracy (Stacking): 0.81318, rank 73/~2700 teams
Current Best Individual Model Accuracy (Random Forest): 0.80009
base_models.ipynb is a notebook for fine-tuning individual base models. stacking_ensemble.ipynb is self-contained and creates the ensemble model.
The notebooks will automatically create files/, model/ and output/ directories.
You must copy train.csv, test.csv, and sample_submission.csv from the competition data into files/.
Trained models will be saved to disk in model/ while predictions will be saved to output/.
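The directory layout above can be sketched as follows. This is a minimal illustration, not the notebooks' actual setup code; the function name prepare_workspace and the idea of reporting missing files are hypothetical additions.

```python
from pathlib import Path

# Directory and file names are taken from this README.
DIRECTORIES = ("files", "model", "output")
REQUIRED_FILES = ("train.csv", "test.csv", "sample_submission.csv")


def prepare_workspace(root="."):
    """Create the working directories and return the competition files
    that are still missing from files/ (hypothetical helper)."""
    root = Path(root)
    for name in DIRECTORIES:
        # exist_ok=True makes repeated runs harmless.
        (root / name).mkdir(parents=True, exist_ok=True)
    return [
        name for name in REQUIRED_FILES
        if not (root / "files" / name).is_file()
    ]
```

Calling prepare_workspace() from the repository root returns an empty list once all three CSV files are in place.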
For convenience, the stacking notebook caches models to stackcache/ with a key of modelname_seed. To retrain a model (e.g. with different hyperparameters) simply delete the corresponding cache file.
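The caching behaviour described above could look roughly like this. It is a sketch under assumptions: the notebook's real serialization format is not stated here, so pickle and the .pkl suffix are guesses, and fit_or_load, cache_path, and build_and_fit are hypothetical names.

```python
import pickle
from pathlib import Path

CACHE_DIR = Path("stackcache")  # directory name from this README


def cache_path(model_name, seed, cache_dir=CACHE_DIR):
    # The modelname_seed key comes from the README; ".pkl" is an assumption.
    return Path(cache_dir) / f"{model_name}_{seed}.pkl"


def fit_or_load(model_name, seed, build_and_fit, cache_dir=CACHE_DIR):
    """Return the cached model for (model_name, seed) if one exists;
    otherwise call build_and_fit(seed), cache the result, and return it."""
    path = cache_path(model_name, seed, cache_dir)
    if path.exists():
        with path.open("rb") as f:
            return pickle.load(f)
    model = build_and_fit(seed)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(model, f)
    return model
```

Because the cache key does not include hyperparameters, a changed configuration is not detected automatically, which is why the cache file has to be deleted by hand before retraining.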