Formula One Results Analysis with MongoDB

This repository contains a data analysis project that explores the history of the Formula 1 World Championship (1950-2024) with a document-oriented NoSQL database. Developed for the SMBUD course at Politecnico di Milano, the project uses MongoDB's aggregation framework to query large, interconnected racing datasets.

Overview

The primary goal of this project is to demonstrate the flexibility of document databases for analyzing complex, real-world data. Using a document model rather than a traditional relational one, the project works through historical F1 data to uncover driver performance patterns, race trends, and circuit statistics across seven decades of motorsport.

Dataset Structure

The analysis is built upon the publicly available Kaggle F1 dataset, which was preprocessed and imported into MongoDB. The database is structured into four primary collections:

🏁 results: Contains individual race entries, including finishing positions, grid spots, points scored, and fastest lap data.
🏎️ drivers: Biographical and career information for every driver in F1 history.
🌍 circuits: Geographic and technical details regarding the race tracks.
📅 races: Event-specific data, including season calendars, rounds, and historical dates.

Note: While a fully embedded subdocument structure was considered, separate collections were maintained and linked via $lookup to avoid excessive redundancy while preserving NoSQL flexibility.
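As an illustration of the linking described in the note above, the following is a minimal sketch of a $lookup pipeline that attaches driver details to each result. The field names (driverId, raceId, positionOrder, points, forename, surname) follow the usual Kaggle CSV headers and are assumptions here; the preprocessed collections in this project may name them differently.

```python
def results_with_driver_pipeline(limit=5):
    """Build a pipeline joining each result to its driver document.

    Field names are assumed from the common Kaggle F1 CSV headers and
    may not match the preprocessed collections exactly.
    """
    return [
        # Join results -> drivers on the shared driverId key (assumed name).
        {"$lookup": {
            "from": "drivers",
            "localField": "driverId",
            "foreignField": "driverId",
            "as": "driver",
        }},
        # $lookup yields an array; flatten it to a single embedded document.
        {"$unwind": "$driver"},
        # Keep a compact view of each joined result.
        {"$project": {
            "_id": 0,
            "raceId": 1,
            "positionOrder": 1,
            "points": 1,
            "driverName": {"$concat": ["$driver.forename", " ", "$driver.surname"]},
        }},
        {"$limit": limit},
    ]


pipeline = results_with_driver_pipeline(limit=3)
```

With PyMongo, a pipeline like this could be run as `db.results.aggregate(pipeline)`, where `db` is a handle to the imported database. The same list of stages can be pasted into the Aggregations tab of MongoDB Compass.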

Key Features & Methodologies

This project heavily utilizes MongoDB Aggregation Pipelines to perform complex data transformations and analytics.

Advanced Data Aggregation: Extensive use of operators such as $group, $match, $sort, and $project to filter and summarize historical trends.
Cross-Collection Joins: Utilized $lookup to combine data across the results, drivers, races, and circuits collections, simulating relational joins in a document database.
Analytical Queries:
- Calculated the historical Average Driver Position across various seasons and cars.
- Analyzed historical reliability by tracking Non-Finishers (DNFs) and identifying races/eras with the highest attrition rates.
- Evaluated specific circuit characteristics and their impact on race outcomes.
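The two analytical queries listed above could be sketched roughly as follows. This is not the project's actual query code: it assumes that results documents carry a numeric `position` that is null or missing for unclassified entries, and that races documents carry `raceId` and `year`. Treating every unclassified entry as a DNF is also a simplification, since it counts disqualifications as well.

```python
# Shared stages: join each result to its race to obtain the season year.
# Field names (raceId, year) are assumptions based on the Kaggle CSV headers.
RACE_JOIN = [
    {"$lookup": {
        "from": "races",
        "localField": "raceId",
        "foreignField": "raceId",
        "as": "race",
    }},
    {"$unwind": "$race"},
]


def average_position_pipeline(min_races=5):
    """Average classified finishing position per driver and season."""
    return [
        # Keep only classified finishes (assumes position is numeric when set).
        {"$match": {"position": {"$type": "number"}}},
        *RACE_JOIN,
        {"$group": {
            "_id": {"driverId": "$driverId", "year": "$race.year"},
            "avgPosition": {"$avg": "$position"},
            "races": {"$sum": 1},
        }},
        # Drop driver-seasons with too few classified finishes to be meaningful.
        {"$match": {"races": {"$gte": min_races}}},
        {"$sort": {"avgPosition": 1}},
    ]


def dnf_rate_pipeline():
    """Share of unclassified entries per season, highest attrition first."""
    return [
        *RACE_JOIN,
        {"$group": {
            "_id": "$race.year",
            "entries": {"$sum": 1},
            # $isNumber requires MongoDB 4.4+; a non-numeric position counts as a DNF.
            "dnfs": {"$sum": {"$cond": [{"$isNumber": "$position"}, 0, 1]}},
        }},
        {"$project": {
            "_id": 0,
            "year": "$_id",
            "entries": 1,
            "dnfs": 1,
            "dnfRate": {"$divide": ["$dnfs", "$entries"]},
        }},
        {"$sort": {"dnfRate": -1}},
    ]
```

Both functions only build the list of stages; with PyMongo they would be passed to `db.results.aggregate(...)`. Placing the `$match` before the join in the first pipeline keeps the `$lookup` from processing rows that would be discarded anyway.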

Usage & Setup

Prerequisites

MongoDB installed locally or via Atlas.
MongoDB Compass (GUI for importing data and running queries).

Installation

Clone the repository:

git clone https://github.com/paolorv/F1_DataAnalysis.git

Download the source dataset from Kaggle.
Open MongoDB Compass, create a new database (e.g., f1_db), and import the .csv/.json files into their respective collections (results, drivers, circuits, races).
Open the provided query scripts (queries.txt) in this repository and run the aggregation pipelines in the Compass embedded shell or through a MongoDB driver.
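As an alternative to importing through Compass, the CSV files could be converted to typed documents in Python before loading. The sketch below is illustrative only and is not the repository's preprocessing.py. It assumes that missing values appear as an empty string or the literal `\N`, as in the commonly used Kaggle export, and the column names in the sample are likewise assumptions.

```python
import csv
import io


def clean_row(row, int_fields=(), float_fields=()):
    """Convert one CSV row (all strings) into a typed document."""
    doc = {}
    for key, value in row.items():
        if value in ("", "\\N"):  # assumed missing-value markers
            doc[key] = None
        elif key in int_fields:
            doc[key] = int(value)
        elif key in float_fields:
            doc[key] = float(value)
        else:
            doc[key] = value
    return doc


def load_documents(csv_file, int_fields=(), float_fields=()):
    """Read an open CSV file and return a list of cleaned documents."""
    return [clean_row(r, int_fields, float_fields) for r in csv.DictReader(csv_file)]


# Tiny in-memory sample standing in for a results CSV (hypothetical rows).
sample = io.StringIO(
    "resultId,raceId,driverId,position,points\n"
    "1,18,1,1,10.0\n"
    "2,18,5,\\N,0.0\n"
)
docs = load_documents(
    sample,
    int_fields=("resultId", "raceId", "driverId", "position"),
    float_fields=("points",),
)
```

With PyMongo, the resulting list could then be loaded with `db.results.insert_many(docs)`. Storing `position` as a number or null, rather than as a string, is what lets numeric operators such as $avg work on it later.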

