Multimodal Sentiment Classification

A production-ready MLOps repository for predicting the sentiment (Negative, Neutral, Positive) of multimodal inputs, built on a modern deep learning stack and served through an API.

🚀 Key Features

Multi-Modal Fusion: Processes and fuses text (RoBERTa), images (ViT), and audio (wav2vec2).
Clean Architecture: Refactored from dispersed Jupyter Notebooks into a strictly typed, modular pipeline (src/models/, src/data/, src/pipelines/).
Automated Data Ingestion: One-command aggregation from sources like MSCTD and Kaggle InstaNY100K.
SLURM Ready: Includes pre-configured batch scripts for submitting jobs to a cluster queue.
Experiment Tracking: Integrated Weights & Biases (wandb) to log every batch, metric, and checkpoint automatically.
FastAPI Server: Production interface with a glassmorphism-styled web UI.

📁 Repository Structure

app/: FastAPI application server and UI templates.
data/: Local storage for downloaded and processed dataset files.
notebooks/: Contains test_development.ipynb, a single notebook for Jupyter experimentation.
slurm/: Job submission files.
src/: Core logic (Configuration, Dataloaders, Deep Learning Models, Preprocessors).

🛠 Setup & Installation

Create a .env file from the example:

cp .env.example .env
# Edit .env with your keys (Kaggle API, Github, WandB)
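Before starting a long download or training run, it can help to confirm that the keys in .env are actually filled in. The following is a minimal sketch using only the Python standard library. The key names in REQUIRED_KEYS are hypothetical placeholders, not taken from this repository; check .env.example for the real names.

```python
from pathlib import Path

# Hypothetical key names for illustration only; see .env.example for the real ones.
REQUIRED_KEYS = ("KAGGLE_KEY", "GITHUB_TOKEN", "WANDB_API_KEY")


def parse_env_text(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


def check_env_file(path=".env"):
    """Read the file at `path` (if present) and report unset required keys."""
    env_path = Path(path)
    text = env_path.read_text() if env_path.exists() else ""
    return missing_keys(parse_env_text(text))
```

Calling check_env_file() from the repository root returns an empty list when every required key has a value, and otherwise the names that still need to be set.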

Install dependencies:

pip install -r requirements.txt

🧠 Training & Pipelines

To execute the full lifecycle on a SLURM queue:

# First adjust setup_env.sh so it loads your cluster's environment
bash slurm/setup_env.sh

Or run steps manually:

python src/data/ingestion.py   # Download datasets
python src/pipelines/train.py  # Train Multimodal Network
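If you run the manual steps often, a small driver script can chain them and stop at the first failure. The sketch below is not part of the repository; it only assumes the two script paths listed above and that they are run from the repository root.

```python
import subprocess
import sys

# The two manual steps from this README, in execution order.
STEPS = [
    ("Download datasets", "src/data/ingestion.py"),
    ("Train multimodal network", "src/pipelines/train.py"),
]


def build_commands(python=sys.executable, steps=STEPS):
    """Build one argv list per step, using the given Python interpreter."""
    return [[python, script] for _, script in steps]


def run_all(commands, runner=subprocess.run):
    """Run each command in order; check=True raises on a non-zero exit code."""
    for command in commands:
        runner(command, check=True)


if __name__ == "__main__":
    try:
        run_all(build_commands())
    except (subprocess.CalledProcessError, FileNotFoundError) as error:
        # Training is skipped automatically if ingestion fails.
        print(f"Pipeline stopped: {error}")
```

Because check=True raises subprocess.CalledProcessError as soon as a step exits with an error, training never starts on a partially downloaded dataset.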

🌐 Running the UI Web Server

Start the frontend interface and inference engine locally:

uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload

Navigate to http://localhost:8000/. You can submit text and upload images and standard .wav files to generate predictions.

Audio Processing Note 🎵

Audio feature extraction is optional. If you provide a .wav file in the UI, it is routed through wav2vec2. If audio is omitted, the framework substitutes zeros for the audio features in the fusion space instead of failing.
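The zero-fill behavior for missing audio can be illustrated with a small sketch. It assumes the modalities are fused by simple concatenation and uses toy embedding sizes; the repository's actual fusion layer and dimensions may differ, and the fuse function here is hypothetical.

```python
# Toy embedding size for illustration; real encoder outputs are much larger.
AUDIO_DIM = 3


def fuse(text_emb, image_emb, audio_emb=None, audio_dim=AUDIO_DIM):
    """Concatenate modality embeddings, zero-filling the audio slot if absent.

    Concatenation is an assumption made for this sketch, not a confirmed
    description of the repository's fusion layer.
    """
    if audio_emb is None:
        # No .wav supplied: keep the fused vector the same length by
        # substituting zeros for the audio features.
        audio_emb = [0.0] * audio_dim
    if len(audio_emb) != audio_dim:
        raise ValueError(f"expected {audio_dim} audio features, got {len(audio_emb)}")
    return list(text_emb) + list(image_emb) + list(audio_emb)


with_audio = fuse([0.1, 0.2], [0.3, 0.4], [0.5, 0.6, 0.7])
without_audio = fuse([0.1, 0.2], [0.3, 0.4])
print(len(with_audio), len(without_audio))
```

Both calls produce vectors of the same length, so a downstream classifier sees a fixed input size whether or not audio was provided.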
