🚀 Real-Time ML Feature Store & Low-Latency Inference System

🧠 Overview

A production-grade real-time ML feature store and low-latency inference system that simulates the applied AI infrastructure behind fintech fraud detection.

This project demonstrates:

  • Streaming ingestion (Redpanda / Kafka-compatible)
  • Online feature store (Redis HASH)
  • Offline feature store (Parquet snapshots)
  • Feature aggregation pipeline
  • Model training pipeline
  • Multi-worker FastAPI inference service
  • Concurrency benchmarking
  • Prometheus metrics instrumentation
  • Dockerized deployment

This is not a notebook project — it is a production-style ML system.


🏗 Architecture

[Architecture diagram]

The system follows a layered ML infrastructure design:

  1. Streaming ingestion via Redpanda
  2. Real-time feature aggregation (sketched below)
  3. Dual feature storage (Redis + Parquet)
  4. Offline training pipeline
  5. Multi-worker FastAPI inference service
  6. Prometheus metrics instrumentation
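
To make step 2 concrete, here is a minimal sketch of a streaming aggregation consumer, assuming a "transactions" topic and a features:<user_id> HASH key scheme; the topic, key scheme, and field names are illustrative, not the project's exact code:

import json

import redis
from kafka import KafkaConsumer  # Redpanda speaks the Kafka protocol

r = redis.Redis(host="localhost", port=6379, decode_responses=True)
consumer = KafkaConsumer(
    "transactions",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for event in consumer:
    txn = event.value
    key = f"features:{txn['user_id']}"  # one HASH per user
    # Keep rolling aggregates as HASH fields for O(1) lookup at inference time
    r.hincrby(key, "txn_count", 1)
    r.hincrbyfloat(key, "txn_amount_sum", txn["amount"])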

⚙ Tech Stack

  • Python 3.12
  • FastAPI
  • Redis (Online Feature Store - HASH schema)
  • Redpanda (Kafka-compatible streaming)
  • Parquet (Offline Feature Store)
  • Scikit-learn
  • Prometheus metrics
  • Docker

📊 Performance Benchmarks

Production Mode (4 Workers)

  • p50 latency: ~7 ms
  • p95 latency: ~15 ms
  • Max latency: ~20 ms

Stress Test (1000 Requests / 50 Threads)

  • Throughput: ~2650 requests/sec
  • p50 latency: ~15 ms
  • p95 latency: ~29 ms
  • Max latency: ~42 ms

These numbers demonstrate stable tail latency under concurrent load.


🚀 Running Locally

Start the API in production mode:

uvicorn src.inference.api:app --workers 4 --host 0.0.0.0 --port 8000

Run benchmark:

python benchmark.py
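
For reference, a concurrency benchmark in the spirit of benchmark.py can be as small as the sketch below; the endpoint path and payload are assumptions, not the script's exact contents:

import statistics
import time
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "http://localhost:8000/predict/user_10"  # hypothetical endpoint path

def one_request(_) -> float:
    start = time.perf_counter()
    requests.get(URL, timeout=5)
    return (time.perf_counter() - start) * 1000  # latency in milliseconds

# 1000 requests across 50 threads, mirroring the stress test above
with ThreadPoolExecutor(max_workers=50) as pool:
    latencies = sorted(pool.map(one_request, range(1000)))

cuts = statistics.quantiles(latencies, n=100)  # 99 percentile cut points
print(f"p50={cuts[49]:.1f} ms  p95={cuts[94]:.1f} ms  max={latencies[-1]:.1f} ms")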

🐳 Docker Deployment

docker-compose up --build

📈 Observability

Prometheus metrics are available at:

/metrics

Exposed metrics:

  • inference_requests_total
  • inference_latency_seconds (histogram)
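
A minimal sketch of how these two metrics can be wired up with prometheus_client; the metric names match the ones listed above, while the prediction route shown is illustrative:

from fastapi import FastAPI, Response
from prometheus_client import (CONTENT_TYPE_LATEST, Counter, Histogram,
                               generate_latest)

app = FastAPI()

REQUESTS = Counter("inference_requests_total", "Total inference requests served")
LATENCY = Histogram("inference_latency_seconds", "End-to-end inference latency")

@app.get("/metrics")
def metrics() -> Response:
    # Expose all registered metrics in the Prometheus text format
    return Response(generate_latest(), media_type=CONTENT_TYPE_LATEST)

@app.get("/predict/{user_id}")  # hypothetical route name
def predict(user_id: str) -> dict:
    REQUESTS.inc()
    with LATENCY.time():  # records elapsed seconds into the histogram
        return {"user_id": user_id}  # placeholder for the real prediction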

🧪 Example Inference Response

{
  "user_id": "user_10",
  "fraud_probability": 0.6747,
  "risk_level": "MEDIUM",
  "latency_ms": 3.33,
  "model_version": "v1.0"
}
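
A rough sketch of how a response of this shape can be assembled, assuming a features:<user_id> HASH key scheme, a two-field feature schema, and 0.5/0.8 risk thresholds (all illustrative):

import time

FEATURE_ORDER = ["txn_count", "txn_amount_sum"]  # assumed training-time order

def predict_user(user_id: str, r, model) -> dict:
    start = time.perf_counter()
    raw = r.hgetall(f"features:{user_id}")           # single Redis HASH read
    x = [[float(raw[name]) for name in FEATURE_ORDER]]
    prob = float(model.predict_proba(x)[0][1])       # scikit-learn classifier
    level = "HIGH" if prob > 0.8 else "MEDIUM" if prob > 0.5 else "LOW"
    return {
        "user_id": user_id,
        "fraud_probability": round(prob, 4),
        "risk_level": level,
        "latency_ms": round((time.perf_counter() - start) * 1000, 2),
        "model_version": "v1.0",
    }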

🏗 Engineering Highlights

  • Online vs Offline feature store separation (snapshot sketch below)
  • Redis HASH schema for low-latency retrieval
  • Multi-worker scaling
  • Tail latency optimization
  • Concurrency benchmarking
  • Prometheus instrumentation
  • Dockerized service
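
As a sketch of the online/offline separation, the same per-user HASHes that serve inference can be periodically dumped to a Parquet snapshot for training; the helper and path below are illustrative:

import pandas as pd
import redis

r = redis.Redis(decode_responses=True)

def snapshot_offline(user_ids: list[str], path: str = "data/features.parquet") -> None:
    # Flatten each online HASH into a row of the offline (training) table
    rows = [{"user_id": uid, **r.hgetall(f"features:{uid}")} for uid in user_ids]
    pd.DataFrame(rows).to_parquet(path, index=False)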

📌 Tradeoffs

  • Sync Redis client (simpler, slightly higher tail latency)
  • In-memory model loading per worker (increased RAM usage)
  • No horizontal load balancer (single-instance test)

📈 Scaling Plan

  • Deploy behind reverse proxy (NGINX)
  • Add horizontal replicas
  • Add async Redis client (sketched below)
  • Introduce model version routing
  • Add feature consistency validation
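
The async Redis item could look roughly like this with redis-py's redis.asyncio module; the route is illustrative:

import redis.asyncio as aioredis
from fastapi import FastAPI

app = FastAPI()
r = aioredis.Redis(decode_responses=True)

@app.get("/features/{user_id}")  # hypothetical route
async def features(user_id: str) -> dict:
    # Awaiting the HASH read frees the event loop while Redis responds
    return await r.hgetall(f"features:{user_id}")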

⚠ Failure Modes Considered

  • Redis unavailable → prediction failure
  • Feature missing → 404 returned
  • Model missing → startup failure
  • Worker crash → process-level isolation
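
A hedged sketch of how the first two failure modes can surface as HTTP errors; the 503 for the Redis-down case is an assumption beyond the 404 stated above:

from fastapi import HTTPException
from redis.exceptions import ConnectionError as RedisConnectionError

def load_features(r, user_id: str) -> dict:
    try:
        raw = r.hgetall(f"features:{user_id}")
    except RedisConnectionError as exc:
        # Redis unavailable -> fail the prediction with an explicit 503
        raise HTTPException(status_code=503, detail="feature store unavailable") from exc
    if not raw:
        # No HASH for this user -> the 404 described above
        raise HTTPException(status_code=404, detail=f"no features for {user_id}")
    return raw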

🔮 Future Enhancements

  • Online/Offline feature consistency checker
  • Model A/B testing
  • Kubernetes deployment
  • Horizontal auto-scaling
  • Feature freshness monitoring
