# FAANG Forecasting MLOps Project

*(Diagram: mlops-cycle)*

## Project Overview

A full-cycle MLOps project for FAANG stock price forecasting. Built to showcase core MLOps concepts like reproducibility, monitoring, CI/CD, and cloud readiness, all within a containerized setup. The project runs locally using Docker Compose and can be extended to cloud setups.


## 📌 Problem Statement

Stock prices for high-growth tech companies (FAANG) are volatile, influenced by global economic trends, sector-specific developments, and investor sentiment. Accurately forecasting such prices is challenging but crucial for applications like:

  • Algorithmic trading strategies
  • Portfolio risk management
  • Investment decision-making

This project tackles the challenge of building a fully reproducible, deployable, and monitored forecasting pipeline that predicts FAANG stock closing prices using Linear Regression. Our goal was not just to build a model, but to design an MLOps-ready system that could be trained, deployed, and monitored in both local and cloud environments.

## 🎯 Project Goals

  • Develop a predictive model to forecast FAANG closing prices.
  • Implement MLOps best practices: reproducibility, CI/CD, monitoring, containerization.
  • Enable seamless deployment through Docker Compose, with potential cloud migration.

## 🛠 Solution Approach

  1. Data Exploration & Preparation: collected historical FAANG stock price data; cleaned missing values, handled outliers, and engineered features (lag variables, rolling averages).

  2. Model Selection & Training: chose Linear Regression for its interpretability and quick iteration; trained with Scikit-Learn; logged experiments and metrics in MLflow.

  3. Deployment: built a FastAPI service for real-time predictions; containerized each service (model, monitoring, API) with Docker.

  4. Monitoring: used Evidently to detect data drift and monitor prediction quality; stored monitoring metrics in PostgreSQL and visualized them in Grafana.

  5. CI/CD & Automation: automated builds and tests with GitHub Actions; used a Makefile for reproducible local workflows.

πŸ“ Project Structure

```
faang-mlops/
├── orchestration/
│   ├── mlflow_pipeline/        # MLflow tracking & model training
│   ├── fastapi_app/            # FastAPI app for model serving
│   ├── monitoring/             # Evidently + Grafana setup
│   ├── data/                   # Reference and current CSV data for monitoring
│   ├── requirements.txt        # Python dependencies
│   └── Dockerfiles             # Each service has its own Dockerfile
├── docker-compose.yml          # Service orchestration
├── .env                        # Environment variables (not committed)
├── Makefile                    # Workflow automation
└── README.md
```

## ⚙️ Technologies Used

  • MLflow: Model training, experiment tracking, and versioning
  • FastAPI: REST API for model inference
  • Evidently: Data and performance monitoring
  • Grafana + PostgreSQL: Visualization of monitoring metrics
  • Docker + Docker Compose: Container orchestration
  • Pandas & Scikit-Learn: Data manipulation & ML modeling

## 📊 Results

Model: Linear Regression

Metrics: 6.7

Delivered an MLOps-ready stack that is fully containerized, version-controlled, and monitored.

## 🔬 1. Experimentation

Objective: Train a time-series regression model on FAANG stock data

🔧 Steps:

  • Collected and preprocessed FAANG historical stock data
  • Performed feature engineering (rolling averages, lags)
  • Trained a linear regression model using Scikit-Learn
  • Logged metrics and model to MLflow
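The steps above can be sketched end to end. Everything below (the `make_features` helper, the lag/window choices, and the synthetic price series) is illustrative rather than the repo's actual `train.py`, and the MLflow logging step is omitted:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

def make_features(close: pd.Series, lags=(1, 2, 3), window=5) -> pd.DataFrame:
    """Build lag and rolling-average features from a closing-price series."""
    df = pd.DataFrame({"close": close})
    for lag in lags:
        df[f"lag_{lag}"] = df["close"].shift(lag)
    # shift(1) so the rolling mean only sees past prices (no target leakage)
    df[f"roll_mean_{window}"] = df["close"].shift(1).rolling(window).mean()
    return df.dropna()

# Toy random-walk series standing in for real FAANG closing prices
rng = np.random.default_rng(42)
prices = pd.Series(100 + np.cumsum(rng.normal(0, 1, 250)))

features = make_features(prices)
X = features.drop(columns="close")
y = features["close"]

model = LinearRegression().fit(X, y)
print(f"In-sample R^2: {model.score(X, y):.3f}")
```

In the real pipeline, the fitted model and its metrics would then be logged to MLflow instead of printed.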

🗂️ Key Files:

  • orchestration/mlflow_pipeline/train.py
  • orchestration/mlflow_pipeline/config.yaml
  • orchestration/mlflow_pipeline/utils.py

🧠 Decisions:

  • Chose linear regression for interpretability
  • Used MLflow to version both metrics and artifacts

## 🚀 2. Deployment

Objective: Serve the trained model via REST API

🔧 Steps:

  • Loaded the latest MLflow model from local mlruns/
  • Created FastAPI endpoints for healthcheck and prediction
  • Containerized with Docker

🗂️ Key Files:

  • orchestration/fastapi_app/main.py
  • orchestration/fastapi_app/Dockerfile

🔥 How to Run:

```shell
make build-fastapi
make run-fastapi
```

## 📊 3. Monitoring

Objective: Track data drift and prediction quality using Evidently

🔧 Steps:

  • Compared reference.csv (baseline) vs current.csv
  • Generated HTML and JSON monitoring reports
  • Saved metrics to PostgreSQL
  • Visualized in Grafana
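Evidently's presets produce far richer reports; purely to illustrate the underlying idea of a reference-vs-current comparison, here is a hand-rolled sketch (the threshold, column name, and toy data are made up):

```python
import pandas as pd

def drift_summary(reference: pd.DataFrame, current: pd.DataFrame,
                  threshold: float = 0.2) -> dict:
    """Flag columns whose mean shifted by more than `threshold`
    reference standard deviations (a crude stand-in for drift presets)."""
    summary = {}
    for col in reference.columns:
        ref_std = reference[col].std() or 1.0  # guard against zero variance
        shift = abs(current[col].mean() - reference[col].mean()) / ref_std
        summary[col] = {"shift": round(float(shift), 3),
                        "drifted": bool(shift > threshold)}
    return summary

# reference.csv vs current.csv, reduced to tiny in-line frames
reference = pd.DataFrame({"close": [100.0, 101.0, 99.0, 100.5, 100.2]})
current = pd.DataFrame({"close": [110.0, 111.0, 109.5, 110.8, 110.1]})
print(drift_summary(reference, current))
```

In the actual pipeline, metrics like these are written to PostgreSQL and charted in Grafana rather than printed.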

🗂️ Key Files:

  • orchestration/monitoring/monitor.py
  • orchestration/monitoring/grafana/ (dashboard config)
  • docker-compose.yml (services)

⚠️ Known Issues:

  • PostgreSQL SSL errors (solved by matching container names)
  • Resource-intensive on low-spec machines

## 🧪 4. Testing

Objective: Ensure stability of the FastAPI prediction pipeline

🔧 Steps:

  • Wrote unit test for /predict route using pytest
  • Added pre-commit hooks for linting

🗂️ Key Files:

  • orchestration/fastapi_app/test_main.py
  • .pre-commit-config.yaml

πŸ” 5. Automation & CI/CD

Objective: Enable reproducible development and deployments

🔧 Steps:

  • Added Makefile for repeatable workflows
  • Defined GitHub Actions for lint, test, and build
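The workflow file itself is not reproduced here; a typical lint-test-build pipeline under these assumptions (that `lint` and `test` targets exist in the Makefile) might look like:

```yaml
name: CI
on: [push, pull_request]

jobs:
  lint-test-build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install dependencies
        run: pip install -r orchestration/requirements.txt
      - name: Lint
        run: make lint
      - name: Test
        run: make test
      - name: Build images
        run: docker compose build
```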

🗂️ Key Files:

  • Makefile
  • .github/workflows/ci.yml

## 🧠 Key Lessons Learned

  • Container isolation ensures reproducibility
  • Evidently + Grafana provides powerful monitoring with minimal setup
  • MLflow simplifies experiment tracking and version control
  • Modular development aids debugging and future extensions

## 🧱 Future Improvements

  • Use a lighter model (e.g., LightGBM or Ridge) for better performance
  • Add support for cloud deployment (e.g., Render or LocalStack)
  • Extend monitoring to support concept drift and multivariate alerts
  • Improve model retraining pipeline with DAG (e.g., Mage or Airflow)

## 🌍 Author

Kiriinya Antony

MLOps | Data Engineering | Forecasting Systems

LinkedIn | GitHub | Nairobi, Kenya


## 📦 How to Run Entire Stack

```shell
# Build all services (run from the repo root, faang-mlops/)
make build-all          # or: docker compose up

# Start the stack (MLflow + FastAPI + Monitoring)
make up

# Open dashboards:
# MLflow:     http://localhost:5000
# FastAPI:    http://localhost:8000/docs
# Grafana:    http://localhost:3000 (admin/admin)
```
