
paolorv/f1results-analysis


Formula One Results Analysis with MongoDB


This repository contains a data analysis project exploring the history of the Formula 1 World Championship (1950-2024) with a NoSQL document-oriented database. Developed for the SMBUD course at Politecnico di Milano, the project uses MongoDB's aggregation framework to extract insights from large, interconnected racing datasets.


Overview

The primary goal of this project is to demonstrate the power and flexibility of document databases in analyzing complex, real-world data. By using a document model instead of a traditional relational schema, the project analyzes historical F1 data to uncover driver performance patterns, race trends, and circuit statistics across more than seven decades of motorsport.

Dataset Structure

The analysis is built upon the publicly available Kaggle F1 dataset, which was preprocessed and imported into MongoDB. The database is structured into four primary collections:

  • 🏁 results: Contains individual race entries, including finishing positions, grid spots, points scored, and fastest lap data.
  • 🏎️ drivers: Biographical and career information for every driver in F1 history.
  • 🌍 circuits: Geographic and technical details regarding the race tracks.
  • 📅 races: Event-specific data, including season calendars, rounds, and historical dates.

Note: A fully embedded subdocument structure was considered, but the collections were kept separate and linked via $lookup. This avoids duplicating driver, race, and circuit details in every result entry while preserving the flexibility of the document model.
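
As an illustration of how the separate collections can be joined at query time, the sketch below builds a pipeline (as plain Python dictionaries that a driver such as PyMongo can pass to `Collection.aggregate`) that attaches driver and race details to each result. It is not taken from the repository's scripts: the function name is hypothetical, and it assumes the collections keep the Kaggle dataset's key fields (`driverId`, `raceId`, the driver's `forename`/`surname`, and the race's `year`/`name`).

```python
from typing import Any


def results_with_context_pipeline() -> list[dict[str, Any]]:
    """Hypothetical helper: join each result to its driver and race.

    Field names follow the Kaggle F1 CSVs; this is an assumption about
    how the data was imported.
    """
    return [
        # Attach the matching driver document as a one-element array.
        {"$lookup": {
            "from": "drivers",
            "localField": "driverId",
            "foreignField": "driverId",
            "as": "driver",
        }},
        # Flatten the array so driver fields can be referenced directly.
        {"$unwind": "$driver"},
        # Same pattern for the race each result belongs to.
        {"$lookup": {
            "from": "races",
            "localField": "raceId",
            "foreignField": "raceId",
            "as": "race",
        }},
        {"$unwind": "$race"},
        # Keep only the fields needed downstream.
        {"$project": {
            "_id": 0,
            "year": "$race.year",
            "raceName": "$race.name",
            "driver": {"$concat": ["$driver.forename", " ", "$driver.surname"]},
            "grid": 1,
            "position": 1,
            "points": 1,
        }},
    ]
```

With PyMongo this would run server-side as `db.results.aggregate(results_with_context_pipeline())`. Because `$unwind` is used without `preserveNullAndEmptyArrays`, results with no matching driver or race are dropped, which mirrors an inner join.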

Key Features & Methodologies

This project relies heavily on MongoDB aggregation pipelines to perform complex data transformations and analytics.

  • Advanced Data Aggregation: Extensive use of pipeline stages such as $group, $match, $sort, and $project to filter and summarize historical trends.
  • Cross-Collection Joins: Utilized $lookup to combine data across the results, drivers, races, and circuits collections, simulating relational joins in a document database.
  • Analytical Queries:
    • Calculated the historical Average Driver Position across various seasons and cars.
    • Analyzed historical reliability by tracking Non-Finishers (DNFs) and identifying races/eras with the highest attrition rates.
    • Evaluated specific circuit characteristics and their impact on race outcomes.
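
The average-position and non-finisher analyses listed above could be expressed along the lines of the sketch below. It is not the repository's actual query: the function names and the `min_races` threshold are hypothetical, and it assumes the Kaggle field names (`driverId`, `raceId`, `year`, `position`), numeric positions, and missing positions imported as null rather than as the literal `\N` string. It also approximates a non-finisher as any result without a position, which counts disqualified and unclassified entries too.

```python
from typing import Any


def _attach_year() -> list[dict[str, Any]]:
    """Copy the season year from `races` onto each result (assumes a shared raceId)."""
    return [
        {"$lookup": {"from": "races", "localField": "raceId",
                     "foreignField": "raceId", "as": "race"}},
        {"$unwind": "$race"},
        {"$addFields": {"year": "$race.year"}},
    ]


def avg_position_pipeline(min_races: int = 10) -> list[dict[str, Any]]:
    """Average finishing position per driver and season, classified finishes only."""
    return (
        # Filter before the $lookup so fewer documents need to be joined.
        [{"$match": {"position": {"$ne": None}}}]
        + _attach_year()
        + [
            {"$group": {
                "_id": {"driverId": "$driverId", "year": "$year"},
                "avgPosition": {"$avg": "$position"},
                "finishes": {"$sum": 1},
            }},
            # Skip driver-seasons with too few finishes to be meaningful.
            {"$match": {"finishes": {"$gte": min_races}}},
            {"$sort": {"avgPosition": 1}},
            {"$project": {
                "_id": 0,
                "driverId": "$_id.driverId",
                "year": "$_id.year",
                "avgPosition": {"$round": ["$avgPosition", 2]},
                "finishes": 1,
            }},
        ]
    )


def dnf_rate_by_season_pipeline() -> list[dict[str, Any]]:
    """Share of entries per season that have no classified finishing position."""
    return _attach_year() + [
        {"$group": {
            "_id": "$year",
            "entries": {"$sum": 1},
            # $ifNull maps a missing field to null so both cases count as 1.
            "nonFinishers": {"$sum": {"$cond": [
                {"$eq": [{"$ifNull": ["$position", None]}, None]}, 1, 0]}},
        }},
        {"$project": {
            "_id": 0,
            "year": "$_id",
            "entries": 1,
            "nonFinishers": 1,
            # entries is always >= 1 here, so the division is safe.
            "dnfRate": {"$divide": ["$nonFinishers", "$entries"]},
        }},
        {"$sort": {"dnfRate": -1}},
    ]
```

Both pipelines start from the `results` collection, e.g. `db.results.aggregate(dnf_rate_by_season_pipeline())` with PyMongo. Sorting by `dnfRate` in descending order puts the seasons with the highest attrition first.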

Usage & Setup

Prerequisites

Installation

  1. Clone the repository:
    git clone https://github.com/paolorv/F1_DataAnalysis.git
  2. Download the source dataset from Kaggle.
  3. Open MongoDB Compass, create a new database (e.g., f1_db), and import the .csv/.json files into their respective collections (results, drivers, circuits, races).
  4. Run the aggregation pipelines from the provided query scripts, either directly in the Compass shell (mongosh) or through a MongoDB driver.
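
The preprocessing mentioned under Dataset Structure is not detailed in this README. One plausible sketch converts each CSV row into a typed document before import, assuming the Kaggle files mark missing values with the literal `\N` token (as the Ergast-derived F1 files do) and that numeric columns should be stored as numbers. The function names are hypothetical.

```python
import csv
import re
from typing import Any, Iterable, Optional

NULL_TOKEN = "\\N"  # Ergast-style marker for a missing value (assumption)
_INT_RE = re.compile(r"-?\d+")
_FLOAT_RE = re.compile(r"-?\d+\.\d+")


def convert_value(raw: Optional[str]) -> Any:
    """Turn a CSV cell into None, int, float, or leave it as a string."""
    if raw is None or raw in (NULL_TOKEN, ""):
        return None
    if _INT_RE.fullmatch(raw):
        return int(raw)
    if _FLOAT_RE.fullmatch(raw):
        return float(raw)
    # Dates, lap times such as "1:27.452", names, and URLs stay as strings.
    return raw


def csv_to_documents(lines: Iterable[str]) -> list[dict[str, Any]]:
    """Parse CSV text (header row first) into a list of typed documents."""
    reader = csv.DictReader(lines)
    return [{key: convert_value(value) for key, value in row.items()} for row in reader]
```

Inside a `with open("results.csv", newline="") as f:` block, the output of `csv_to_documents(f)` could be passed to PyMongo's `db.results.insert_many(...)`, or saved as JSON and imported through Compass. Storing missing positions as null, rather than as the `\N` string, lets queries filter non-finishers with a simple null check.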

About

MongoDB query system for analyzing Formula 1 results throughout history, developed within the "SMBUD" course at Politecnico di Milano.
