🚀 Real-Time ML Feature Store & Low-Latency Inference System

🧠 Overview

A production-grade real-time ML feature store and low-latency inference system that simulates the applied AI infrastructure behind fintech fraud detection.

This project demonstrates:

  • Streaming ingestion (Redpanda / Kafka-compatible)
  • Online feature store (Redis HASH)
  • Offline feature store (Parquet snapshots)
  • Feature aggregation pipeline
  • Model training pipeline
  • Multi-worker FastAPI inference service
  • Concurrency benchmarking
  • Prometheus metrics instrumentation
  • Dockerized deployment

This is not a notebook project — it is a production-style ML system.


🏗 Architecture

[Architecture diagram]

The system follows a layered ML infrastructure design:

  1. Streaming ingestion via Redpanda
  2. Real-time feature aggregation (sketched below)
  3. Dual feature storage (Redis + Parquet)
  4. Offline training pipeline
  5. Multi-worker FastAPI inference service
  6. Prometheus metrics instrumentation
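
To make step 2 concrete, here is a minimal sketch of a streaming aggregation consumer, assuming a "transactions" topic and a features:<user_id> HASH key scheme; the topic, key scheme, and field names are illustrative, not the project's exact code:

import json

import redis
from kafka import KafkaConsumer  # Redpanda speaks the Kafka protocol

r = redis.Redis(host="localhost", port=6379, decode_responses=True)
consumer = KafkaConsumer(
    "transactions",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for event in consumer:
    txn = event.value
    key = f"features:{txn['user_id']}"  # one HASH per user
    # Keep rolling aggregates as HASH fields for O(1) lookup at inference time
    r.hincrby(key, "txn_count", 1)
    r.hincrbyfloat(key, "txn_amount_sum", txn["amount"])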

⚙ Tech Stack

  • Python 3.12
  • FastAPI
  • Redis (Online Feature Store - HASH schema)
  • Redpanda (Kafka-compatible streaming)
  • Parquet (Offline Feature Store)
  • Scikit-learn
  • Prometheus metrics
  • Docker

📊 Performance Benchmarks

Production Mode (4 Workers)

  • p50 latency: ~7 ms
  • p95 latency: ~15 ms
  • Max latency: ~20 ms

Stress Test (1000 Requests / 50 Threads)

  • Throughput: ~2650 requests/sec
  • p50 latency: ~15 ms
  • p95 latency: ~29 ms
  • Max latency: ~42 ms

These numbers demonstrate stable tail latency under concurrent load.


🚀 Running Locally

Start the API in production mode:

uvicorn src.inference.api:app --workers 4 --host 0.0.0.0 --port 8000

Run benchmark:

python benchmark.py
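
For reference, a concurrency benchmark in the spirit of benchmark.py can be as small as the sketch below; the endpoint path and payload are assumptions, not the script's exact contents:

import statistics
import time
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "http://localhost:8000/predict/user_10"  # hypothetical endpoint path

def one_request(_) -> float:
    start = time.perf_counter()
    requests.get(URL, timeout=5)
    return (time.perf_counter() - start) * 1000  # latency in milliseconds

# 1000 requests across 50 threads, mirroring the stress test above
with ThreadPoolExecutor(max_workers=50) as pool:
    latencies = sorted(pool.map(one_request, range(1000)))

cuts = statistics.quantiles(latencies, n=100)  # 99 percentile cut points
print(f"p50={cuts[49]:.1f} ms  p95={cuts[94]:.1f} ms  max={latencies[-1]:.1f} ms")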

🐳 Docker Deployment

docker-compose up --build

📈 Observability

Prometheus metrics are available at:

/metrics

Exposed metrics:

  • inference_requests_total
  • inference_latency_seconds (histogram)
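
A minimal sketch of how these two metrics can be wired up with prometheus_client; the metric names match the ones listed above, while the prediction route shown is illustrative:

from fastapi import FastAPI, Response
from prometheus_client import (CONTENT_TYPE_LATEST, Counter, Histogram,
                               generate_latest)

app = FastAPI()

REQUESTS = Counter("inference_requests_total", "Total inference requests served")
LATENCY = Histogram("inference_latency_seconds", "End-to-end inference latency")

@app.get("/metrics")
def metrics() -> Response:
    # Expose all registered metrics in the Prometheus text format
    return Response(generate_latest(), media_type=CONTENT_TYPE_LATEST)

@app.get("/predict/{user_id}")  # hypothetical route name
def predict(user_id: str) -> dict:
    REQUESTS.inc()
    with LATENCY.time():  # records elapsed seconds into the histogram
        return {"user_id": user_id}  # placeholder for the real prediction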

🧪 Example Inference Response

{
  "user_id": "user_10",
  "fraud_probability": 0.6747,
  "risk_level": "MEDIUM",
  "latency_ms": 3.33,
  "model_version": "v1.0"
}
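
A rough sketch of how a response of this shape can be assembled, assuming a features:<user_id> HASH key scheme, a two-field feature schema, and 0.5/0.8 risk thresholds (all illustrative):

import time

FEATURE_ORDER = ["txn_count", "txn_amount_sum"]  # assumed training-time order

def predict_user(user_id: str, r, model) -> dict:
    start = time.perf_counter()
    raw = r.hgetall(f"features:{user_id}")           # single Redis HASH read
    x = [[float(raw[name]) for name in FEATURE_ORDER]]
    prob = float(model.predict_proba(x)[0][1])       # scikit-learn classifier
    level = "HIGH" if prob > 0.8 else "MEDIUM" if prob > 0.5 else "LOW"
    return {
        "user_id": user_id,
        "fraud_probability": round(prob, 4),
        "risk_level": level,
        "latency_ms": round((time.perf_counter() - start) * 1000, 2),
        "model_version": "v1.0",
    }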

🏗 Engineering Highlights

  • Online vs Offline feature store separation (snapshot sketch below)
  • Redis HASH schema for low-latency retrieval
  • Multi-worker scaling
  • Tail latency optimization
  • Concurrency benchmarking
  • Prometheus instrumentation
  • Dockerized service
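
As a sketch of the online/offline separation, the same per-user HASHes that serve inference can be periodically dumped to a Parquet snapshot for training; the helper and path below are illustrative:

import pandas as pd
import redis

r = redis.Redis(decode_responses=True)

def snapshot_offline(user_ids: list[str], path: str = "data/features.parquet") -> None:
    # Flatten each online HASH into a row of the offline (training) table
    rows = [{"user_id": uid, **r.hgetall(f"features:{uid}")} for uid in user_ids]
    pd.DataFrame(rows).to_parquet(path, index=False)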

📌 Tradeoffs

  • Sync Redis client (simpler, slightly higher tail latency)
  • In-memory model loading per worker (increased RAM usage)
  • No horizontal load balancer (single-instance test)

📈 Scaling Plan

  • Deploy behind reverse proxy (NGINX)
  • Add horizontal replicas
  • Add async Redis client (sketched below)
  • Introduce model version routing
  • Add feature consistency validation
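
The async Redis item could look roughly like this with redis-py's redis.asyncio module; the route is illustrative:

import redis.asyncio as aioredis
from fastapi import FastAPI

app = FastAPI()
r = aioredis.Redis(decode_responses=True)

@app.get("/features/{user_id}")  # hypothetical route
async def features(user_id: str) -> dict:
    # Awaiting the HASH read frees the event loop while Redis responds
    return await r.hgetall(f"features:{user_id}")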

⚠ Failure Modes Considered

  • Redis unavailable → prediction failure
  • Feature missing → 404 returned
  • Model missing → startup failure
  • Worker crash → process-level isolation
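
A hedged sketch of how the first two failure modes can surface as HTTP errors; the 503 for the Redis-down case is an assumption beyond the 404 stated above:

from fastapi import HTTPException
from redis.exceptions import ConnectionError as RedisConnectionError

def load_features(r, user_id: str) -> dict:
    try:
        raw = r.hgetall(f"features:{user_id}")
    except RedisConnectionError as exc:
        # Redis unavailable -> fail the prediction with an explicit 503
        raise HTTPException(status_code=503, detail="feature store unavailable") from exc
    if not raw:
        # No HASH for this user -> the 404 described above
        raise HTTPException(status_code=404, detail=f"no features for {user_id}")
    return raw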

🔮 Future Enhancements

  • Online/Offline feature consistency checker
  • Model A/B testing
  • Kubernetes deployment
  • Horizontal auto-scaling
  • Feature freshness monitoring
