
Danialjfz/spotify_project


Spotify High-Popularity Tracks - Exploratory Data Analysis

An in-depth exploratory data analysis of Spotify's high-popularity tracks, uncovering patterns in audio features, genre distributions, and temporal trends that define commercially successful music.



Key Findings

Metric           Value
Tracks Analyzed  1,686
Unique Artists   1,437
Genres Covered   6
Time Span        1957 – 2024
  • Popular tracks average 0.67 energy, 0.65 danceability, and 0.53 valence — a balanced mix of intensity and accessibility
  • Strongest correlation: Energy and Loudness (r = 0.68); strongest negative: Energy and Acousticness (r = -0.58)
  • Major mode accounts for 58% of tracks, and C#/Db is the most common key
  • Pop, rock, and hip-hop account for nearly half of all high-popularity tracks
  • Recent releases (2020s) make up the overwhelming majority of high-popularity tracks
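The summary statistics above can be recomputed from the raw CSV. The sketch below is not taken from the notebook; it assumes the dataset's columns use the feature names from the Audio Features table, and `top_correlations` is a hypothetical helper that ranks every feature pair by Pearson r.

```python
from itertools import combinations

import pandas as pd

# Audio feature columns, as listed in the Audio Features table (assumed column names)
FEATURES = ["energy", "danceability", "valence", "acousticness", "speechiness",
            "instrumentalness", "liveness", "loudness", "tempo"]


def top_correlations(df, features=FEATURES, n=3):
    """Return the n strongest positive and n strongest negative feature pairs (Pearson r)."""
    corr = df[features].corr(method="pearson")
    # combinations() yields each unordered pair once and skips the diagonal
    pairs = pd.Series({(a, b): corr.loc[a, b] for a, b in combinations(features, 2)})
    pairs = pairs.sort_values(ascending=False)
    # Head holds the most positive pairs, tail the most negative ones
    return pd.concat([pairs.head(n), pairs.tail(n)])
```

Applied to `pd.read_csv("archive/high_popularity_spotify_data.csv")`, the first and last rows of the result should correspond to the strongest positive and negative pairs reported above, and `df[["energy", "danceability", "valence"]].mean()` should reproduce the quoted averages.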

Visualizations

The notebook generates 20 publication-quality visualizations, including:

No.  Visualization                Description
01   Audio Feature Distributions  Histograms with KDE for all 9 audio features
02   Feature Box Plots            Comparative box plots on a normalized 0–1 scale
04   Correlation Heatmap          Pearson correlation matrix of audio features
07   Genre Radar Chart            Normalized audio profiles for top 6 genres
08   Genre Violin Plots           Distribution shape comparisons across genres
11   Feature Evolution            Audio feature trends across decades
17   Pair Plot                    Pairwise relationships colored by genre
18   2D Density Plot              Energy vs. Danceability density distribution
19   Cluster Map                  Hierarchical clustering of audio features
20   Summary Dashboard            Combined overview of all key findings
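Several of these plots (the box plots and the genre radar chart) put features on a common 0–1 scale. Min-max scaling is one common way to do that; the sketch below assumes it, though the notebook may normalize differently. The genre column name `playlist_genre` and both helper functions are assumptions for illustration.

```python
import pandas as pd


def min_max_normalize(df, features):
    """Rescale each feature column to the 0-1 range (min-max scaling)."""
    sub = df[features].astype(float)
    span = sub.max() - sub.min()
    # Constant columns would divide by zero; map them to 0 instead
    span = span.replace(0, 1.0)
    return (sub - sub.min()) / span


def genre_profiles(df, features, genre_col="playlist_genre", top_n=6):
    """Mean normalized feature values for the top_n most frequent genres (radar-chart input)."""
    # Normalize across all tracks first, so genres are compared on one shared scale
    normalized = min_max_normalize(df, features)
    normalized[genre_col] = df[genre_col]
    top_genres = df[genre_col].value_counts().head(top_n).index
    subset = normalized[normalized[genre_col].isin(top_genres)]
    return subset.groupby(genre_col)[features].mean()
```

Normalizing before filtering keeps each genre's profile relative to the whole dataset rather than to the selected genres only.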

Dataset

Two CSV files sourced from Spotify's audio features API:

  • high_popularity_spotify_data.csv — 1,686 tracks (popularity score >= 68)
  • low_popularity_spotify_data.csv — available for comparative analysis
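A minimal sketch of preparing the two files for a side-by-side comparison. It checks that every high-popularity track meets the documented threshold and tags each row with a group label; `combine_popularity_groups` and the `popularity_group` column are hypothetical names, not part of the notebook.

```python
import pandas as pd


def combine_popularity_groups(high, low, threshold=68):
    """Stack the two datasets into one frame with a group label for comparison."""
    # Sanity check: every track in the high-popularity file should meet the threshold
    if not (high["track_popularity"] >= threshold).all():
        raise ValueError(f"high-popularity data contains tracks below {threshold}")
    return pd.concat(
        [high.assign(popularity_group="high"), low.assign(popularity_group="low")],
        ignore_index=True,
    )
```

For example: `combine_popularity_groups(pd.read_csv("archive/high_popularity_spotify_data.csv"), pd.read_csv("archive/low_popularity_spotify_data.csv"))`.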

Audio Features

Feature           Range          Description
energy            0.0 – 1.0      Intensity and activity measure
danceability      0.0 – 1.0      Suitability for dancing
valence           0.0 – 1.0      Musical positiveness (happiness)
acousticness      0.0 – 1.0      Confidence that the track is acoustic
speechiness       0.0 – 1.0      Presence of spoken words
instrumentalness  0.0 – 1.0      Likelihood that the track has no vocals
liveness          0.0 – 1.0      Likelihood that a live audience is present
loudness          -60 – 0 dB     Overall loudness in decibels
tempo             ~50 – 210 BPM  Estimated tempo in beats per minute
track_popularity  0 – 100        Spotify popularity score
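The ranges in this table double as a quick data-quality check during cleaning. The sketch below counts values that fall outside them; the helper name is hypothetical, and tempo is skipped because its range is an observed span rather than a defined bound.

```python
import pandas as pd

# Documented ranges from the Audio Features table. Tempo is left out because
# ~50-210 BPM is an observed span rather than a hard bound.
UNIT_FEATURES = ["energy", "danceability", "valence", "acousticness",
                 "speechiness", "instrumentalness", "liveness"]
EXPECTED_RANGES = {name: (0.0, 1.0) for name in UNIT_FEATURES}
EXPECTED_RANGES.update({"loudness": (-60.0, 0.0), "track_popularity": (0, 100)})


def out_of_range_counts(df, ranges=EXPECTED_RANGES):
    """Count values outside each column's documented range; absent columns are skipped."""
    counts = {}
    for col, (low, high) in ranges.items():
        if col in df.columns:
            # between() is inclusive on both ends; NaN compares False, so it is counted too
            counts[col] = int((~df[col].between(low, high)).sum())
    return pd.Series(counts, dtype="int64")
```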

Getting Started

Prerequisites

  • Python 3.8+
  • pip

Installation

# Clone the repository
git clone https://github.com/Danialjfz/spotify_project.git
cd spotify_project

# Create and activate a virtual environment
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

Requirements

numpy>=1.21.0
pandas>=1.3.0
matplotlib>=3.4.0
seaborn>=0.11.0
scipy>=1.7.0
jupyter>=1.0.0

Running the Notebook

jupyter notebook archive/spotify_eda_analysis.ipynb

Or open directly in VS Code with the Jupyter extension.

Output

After running all cells, the notebook produces:

  • 20 PNG visualizations in the archive/ directory
  • processed_spotify_data.csv — cleaned and feature-engineered dataset

Project Structure

spotify-eda/
├── README.md
├── requirements.txt
├── archive/
│   ├── spotify_eda_analysis.ipynb      # Main analysis notebook
│   ├── high_popularity_spotify_data.csv # Primary dataset
│   ├── low_popularity_spotify_data.csv  # Comparison dataset
│   ├── PROJECT_GUIDE.md                # Detailed code walkthrough
│   ├── processed_spotify_data.csv      # Generated after running
│   └── *.png                           # Generated visualizations

Analysis Sections

  1. Data Loading & Exploration — Shape, types, missing values, duplicates
  2. Data Cleaning & Feature Engineering — Datetime extraction, key mapping, duration conversion
  3. Descriptive Statistics — Central tendency, spread, skewness, kurtosis
  4. Distribution Analysis — Histograms, KDE, box plots, popularity breakdown
  5. Correlation Analysis — Heatmap, scatter plots, top correlated pairs
  6. Genre Analysis — Distribution, radar charts, violin plots, popularity comparison
  7. Temporal Analysis — Release trends, decade evolution, monthly patterns
  8. Musical Key & Mode Analysis — Key distribution, major/minor split
  9. Artist Analysis — Top artists, audio feature profiles
  10. Advanced Visualizations — Pair plots, 2D density, hierarchical clustering
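Section 2 mentions datetime extraction, key mapping, and duration conversion. A possible sketch follows. It relies on Spotify's documented encodings (`key` as a pitch class from 0 = C to 11 = B, with -1 meaning no key detected; `mode` as 1 = major, 0 = minor; `duration_ms` in milliseconds), but the release-date column name `track_album_release_date` and the helper `engineer_features` are assumptions.

```python
import pandas as pd

# Spotify encodes key as a pitch class (0 = C ... 11 = B; -1 = no key detected)
KEY_NAMES = ["C", "C#/Db", "D", "D#/Eb", "E", "F",
             "F#/Gb", "G", "G#/Ab", "A", "A#/Bb", "B"]


def engineer_features(df, date_col="track_album_release_date"):
    """Add release year/decade, key name, mode label, and duration in minutes."""
    out = df.copy()
    # Release dates may be 'YYYY', 'YYYY-MM', or 'YYYY-MM-DD'; the first 4 chars are the year
    out["release_year"] = pd.to_numeric(out[date_col].astype(str).str[:4], errors="coerce")
    out["decade"] = (out["release_year"] // 10) * 10
    # Unmapped values (including -1) become NaN, then "Unknown"
    out["key_name"] = out["key"].map(dict(enumerate(KEY_NAMES))).fillna("Unknown")
    out["mode_name"] = out["mode"].map({1: "Major", 0: "Minor"})
    out["duration_min"] = out["duration_ms"] / 60_000
    return out
```

`engineer_features(df)["mode_name"].value_counts(normalize=True)` then gives a major/minor split like the one in section 8, and grouping by `decade` supports the decade-level comparisons in section 7.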

Tech Stack

Library     Purpose
NumPy       Numerical operations and array computations
Pandas      Data manipulation, cleaning, and aggregation
Matplotlib  Static visualizations and custom plot layouts
Seaborn     Statistical visualizations and enhanced aesthetics
SciPy       Skewness, kurtosis, and statistical tests
Jupyter     Interactive notebook environment
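For the skewness and kurtosis figures in the descriptive-statistics section, here is a sketch using `scipy.stats.skew` and `scipy.stats.kurtosis`. The helper `shape_summary` is hypothetical. SciPy's defaults (`bias=True`) produce slightly different values than pandas' `DataFrame.skew()` and `DataFrame.kurt()`, which apply a sample-size bias correction, so the notebook's numbers may differ a little depending on which it uses.

```python
import pandas as pd
from scipy import stats


def shape_summary(df, features):
    """Skewness and excess kurtosis for each feature (NaNs dropped first)."""
    rows = {}
    for col in features:
        values = df[col].dropna().to_numpy()
        rows[col] = {
            "skewness": stats.skew(values),
            # fisher=True (the default) reports excess kurtosis: 0 for a normal distribution
            "kurtosis": stats.kurtosis(values, fisher=True),
        }
    return pd.DataFrame.from_dict(rows, orient="index")
```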

Future Work

  • Comparative analysis with low-popularity tracks
  • Predictive model to forecast track popularity
  • K-means clustering to identify track archetypes
  • Time-series forecasting of audio feature trends
  • Interactive dashboard with Plotly or Streamlit

License

This project is licensed under the MIT License. See LICENSE for details.


Acknowledgments

  • Dataset sourced from the Spotify Web API
  • Built with Python's data science ecosystem
