An in-depth exploratory data analysis of Spotify's high-popularity tracks, uncovering patterns in audio features, genre distributions, and temporal trends that define commercially successful music.
| Metric | Value |
|---|---|
| Tracks Analyzed | 1,686 |
| Unique Artists | 1,437 |
| Genres Covered | 6 |
| Time Span | 1957 – 2024 |
- Popular tracks average 0.67 energy, 0.65 danceability, and 0.53 valence — a balanced mix of intensity and accessibility
- Strongest correlation: Energy and Loudness (r = 0.68); strongest negative: Energy and Acousticness (r = -0.58)
- Major keys dominate at 58%, with C#/Db as the most common key
- Pop, rock, and hip-hop account for nearly half of all high-popularity tracks
- Modern releases (2020s) dominate the high-popularity tracks
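The averages and correlations listed above could be recomputed roughly as follows. This is a minimal sketch, not the notebook's actual code: it assumes columns named `energy`, `loudness`, and `acousticness`, the helper `strongest_pairs` is hypothetical, and the four-row `sample` frame is made-up placeholder data rather than values from the dataset.

```python
from itertools import combinations

import pandas as pd


def strongest_pairs(df: pd.DataFrame, features: list) -> pd.DataFrame:
    """Return the Pearson r of every unique feature pair, sorted by |r| descending."""
    corr = df[features].corr(method="pearson")
    rows = [(a, b, corr.loc[a, b]) for a, b in combinations(features, 2)]
    pairs = pd.DataFrame(rows, columns=["feature_a", "feature_b", "r"])
    # Sort on absolute value so strong negative correlations rank high too.
    return pairs.sort_values("r", key=abs, ascending=False).reset_index(drop=True)


# Placeholder data for illustration only (NOT taken from the real CSV).
sample = pd.DataFrame(
    {
        "energy": [0.2, 0.4, 0.6, 0.8],
        "loudness": [-12.0, -9.0, -6.0, -3.0],
        "acousticness": [0.9, 0.5, 0.6, 0.1],
    }
)
features = ["energy", "loudness", "acousticness"]

feature_means = sample[features].mean().round(2)  # per-feature averages
top_pairs = strongest_pairs(sample, features)
print(feature_means)
print(top_pairs)
```

On the real data, the same call with all nine audio features would be expected to rank the Energy/Loudness pair first, provided the columns are named as assumed here.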
The notebook generates 20 publication-quality visualizations, including:
| # | Visualization | Description |
|---|---|---|
| 01 | Audio Feature Distributions | Histograms with KDE for all 9 audio features |
| 02 | Feature Box Plots | Comparative box plots on a normalized 0–1 scale |
| 04 | Correlation Heatmap | Pearson correlation matrix of audio features |
| 07 | Genre Radar Chart | Normalized audio profiles for top 6 genres |
| 08 | Genre Violin Plots | Distribution shape comparisons across genres |
| 11 | Feature Evolution | Audio feature trends across decades |
| 17 | Pair Plot | Pairwise relationships colored by genre |
| 18 | 2D Density Plot | Energy vs. Danceability density distribution |
| 19 | Cluster Map | Hierarchical clustering of audio features |
| 20 | Summary Dashboard | Combined overview of all key findings |
Two CSV files sourced from Spotify's audio features API:
- `high_popularity_spotify_data.csv`: 1,686 tracks (popularity score >= 68)
- `low_popularity_spotify_data.csv`: available for comparative analysis
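For a comparative analysis, the two files can be stacked into one frame with a label that records which file each row came from. The sketch below is one possible approach; the function name `load_tracks` and the `popularity_group` column are illustrative assumptions, not part of the notebook.

```python
import pandas as pd


def load_tracks(high_path, low_path) -> pd.DataFrame:
    """Read both CSVs and stack them, tagging each row with its source group."""
    high = pd.read_csv(high_path).assign(popularity_group="high")
    low = pd.read_csv(low_path).assign(popularity_group="low")
    # ignore_index gives the combined frame a fresh 0..n-1 index.
    return pd.concat([high, low], ignore_index=True, sort=False)


# Example call, assuming the files sit in archive/ as in the project layout:
# tracks = load_tracks(
#     "archive/high_popularity_spotify_data.csv",
#     "archive/low_popularity_spotify_data.csv",
# )
# print(tracks["popularity_group"].value_counts())
```

Keeping the group as a column rather than as two separate frames makes later group-by comparisons and genre-colored plots simpler.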
| Feature | Range | Description |
|---|---|---|
| `energy` | 0.0 – 1.0 | Intensity and activity measure |
| `danceability` | 0.0 – 1.0 | Suitability for dancing |
| `valence` | 0.0 – 1.0 | Musical positiveness / happiness |
| `acousticness` | 0.0 – 1.0 | Acoustic vs. electronic |
| `speechiness` | 0.0 – 1.0 | Presence of spoken words |
| `instrumentalness` | 0.0 – 1.0 | Likelihood of no vocals |
| `liveness` | 0.0 – 1.0 | Presence of live audience |
| `loudness` | -60 – 0 dB | Overall loudness in decibels |
| `tempo` | ~50 – 210 | Beats per minute (BPM) |
| `track_popularity` | 0 – 100 | Spotify popularity score |
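The documented ranges also work as a quick sanity check after loading the data. The following is a minimal sketch under the assumption that the CSV columns carry exactly the feature names in the table; `EXPECTED_RANGES` and `out_of_range_counts` are hypothetical names, and `tempo` is left out because its bounds are only approximate.

```python
import pandas as pd

# Ranges copied from the feature table; tempo is omitted (approximate bounds).
EXPECTED_RANGES = {
    "energy": (0.0, 1.0),
    "danceability": (0.0, 1.0),
    "valence": (0.0, 1.0),
    "acousticness": (0.0, 1.0),
    "speechiness": (0.0, 1.0),
    "instrumentalness": (0.0, 1.0),
    "liveness": (0.0, 1.0),
    "loudness": (-60.0, 0.0),
    "track_popularity": (0, 100),
}


def out_of_range_counts(df: pd.DataFrame, ranges: dict = EXPECTED_RANGES) -> dict:
    """Count values outside the documented range for each column present in df."""
    counts = {}
    for column, (low, high) in ranges.items():
        if column in df.columns:
            values = df[column].dropna()
            # between() is inclusive on both ends by default.
            counts[column] = int((~values.between(low, high)).sum())
    return counts


# Placeholder rows for illustration only (NOT from the real dataset).
demo = pd.DataFrame(
    {
        "energy": [0.5, 1.2],
        "loudness": [-5.0, 0.3],
        "track_popularity": [70, 101],
    }
)
print(out_of_range_counts(demo))
```

A nonzero count is a prompt to inspect those rows, not proof that they are wrong.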
- Python 3.8+
- pip
# Clone the repository
git clone https://github.com/your-username/spotify-eda.git
cd spotify-eda
# Create and activate a virtual environment
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt

The `requirements.txt` file specifies these minimum versions:

numpy>=1.21.0
pandas>=1.3.0
matplotlib>=3.4.0
seaborn>=0.11.0
scipy>=1.7.0
jupyter>=1.0.0
jupyter notebook archive/spotify_eda_analysis.ipynb

Or open the notebook directly in VS Code with the Jupyter extension.
After running all cells, the notebook produces:
- 20 PNG visualizations in the `archive/` directory
- `processed_spotify_data.csv`: cleaned and feature-engineered dataset
spotify-eda/
├── README.md
├── requirements.txt
├── archive/
│ ├── spotify_eda_analysis.ipynb # Main analysis notebook
│ ├── high_popularity_spotify_data.csv # Primary dataset
│ ├── low_popularity_spotify_data.csv # Comparison dataset
│ ├── PROJECT_GUIDE.md # Detailed code walkthrough
│ ├── processed_spotify_data.csv # Generated after running
│ └── *.png # Generated visualizations
- Data Loading & Exploration — Shape, types, missing values, duplicates
- Data Cleaning & Feature Engineering — Datetime extraction, key mapping, duration conversion
- Descriptive Statistics — Central tendency, spread, skewness, kurtosis
- Distribution Analysis — Histograms, KDE, box plots, popularity breakdown
- Correlation Analysis — Heatmap, scatter plots, top correlated pairs
- Genre Analysis — Distribution, radar charts, violin plots, popularity comparison
- Temporal Analysis — Release trends, decade evolution, monthly patterns
- Musical Key & Mode Analysis — Key distribution, major/minor split
- Artist Analysis — Top artists, audio feature profiles
- Advanced Visualizations — Pair plots, 2D density, hierarchical clustering
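The cleaning and feature-engineering step (datetime extraction, key mapping, duration conversion) might look something like the sketch below. The input column names `track_album_release_date`, `key`, `mode`, and `duration_ms` are assumptions about the CSV schema, and the output column names are illustrative. The key mapping follows standard pitch-class notation (0 = C, 1 = C#/Db, and so on), with mode 1 as major and 0 as minor.

```python
import pandas as pd

# Standard pitch-class notation: index 0 is C, index 11 is B.
PITCH_CLASSES = [
    "C", "C#/Db", "D", "D#/Eb", "E", "F",
    "F#/Gb", "G", "G#/Ab", "A", "A#/Bb", "B",
]


def engineer_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add release year/month/decade, key and mode names, and duration in minutes."""
    out = df.copy()
    # Unparseable dates become NaT instead of raising.
    dates = pd.to_datetime(out["track_album_release_date"], errors="coerce")
    out["release_year"] = dates.dt.year
    out["release_month"] = dates.dt.month
    out["decade"] = (out["release_year"] // 10) * 10
    # Keys outside 0-11 (for example -1 for "no key detected") map to NaN.
    out["key_name"] = out["key"].map(dict(enumerate(PITCH_CLASSES)))
    out["mode_name"] = out["mode"].map({1: "Major", 0: "Minor"})
    out["duration_min"] = out["duration_ms"] / 60_000
    return out


# Placeholder rows for illustration only (NOT from the real dataset).
raw = pd.DataFrame(
    {
        "track_album_release_date": ["2020-03-20", "1957-09-01"],
        "key": [1, 9],
        "mode": [1, 0],
        "duration_ms": [180_000, 150_000],
    }
)
processed = engineer_features(raw)
print(processed[["release_year", "decade", "key_name", "mode_name", "duration_min"]])
```

Working on a copy leaves the raw frame intact, so the original values stay available for checks against the engineered columns.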
| Library | Purpose |
|---|---|
| NumPy | Numerical operations and array computations |
| Pandas | Data manipulation, cleaning, and aggregation |
| Matplotlib | Static visualizations and custom plot layouts |
| Seaborn | Statistical visualizations and enhanced aesthetics |
| SciPy | Skewness, kurtosis, and statistical tests |
| Jupyter | Interactive notebook environment |
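As an illustration of the SciPy role listed in the table, skewness and kurtosis for a single feature can be computed as sketched here. The helper `shape_stats` is a hypothetical name, and the input lists are placeholder values; note that `scipy.stats.kurtosis` returns excess kurtosis by default (Fisher definition, where a normal distribution scores 0).

```python
import numpy as np
from scipy import stats


def shape_stats(values) -> dict:
    """Summarize center, spread, and shape of a numeric sequence, ignoring NaNs."""
    arr = np.asarray(values, dtype=float)
    arr = arr[~np.isnan(arr)]
    return {
        "mean": float(arr.mean()),
        "std": float(arr.std(ddof=1)),  # sample standard deviation
        "skewness": float(stats.skew(arr)),
        "excess_kurtosis": float(stats.kurtosis(arr)),  # Fisher: normal = 0
    }


# Placeholder values for illustration only (NOT from the real dataset).
symmetric = shape_stats([1, 2, 3, 4, 5])
right_skewed = shape_stats([1, 1, 1, 10])
print(symmetric)
print(right_skewed)
```

A positive skewness indicates a long right tail, which is the usual pattern to expect for features where most tracks score near zero.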
- Comparative analysis with low-popularity tracks
- Predictive model to forecast track popularity
- K-means clustering to identify track archetypes
- Time-series forecasting of audio feature trends
- Interactive dashboard with Plotly or Streamlit
This project is licensed under the MIT License. See LICENSE for details.
- Dataset sourced from the Spotify Web API
- Built with Python's data science ecosystem