A production-style real-time ML feature store and inference system that simulates the applied AI infrastructure used in fintech fraud detection.
This project demonstrates:
- Streaming ingestion (Redpanda / Kafka-compatible)
- Online feature store (Redis HASH)
- Offline feature store (Parquet snapshots)
- Feature aggregation pipeline
- Model training pipeline
- Multi-worker FastAPI inference service
- Concurrency benchmarking
- Prometheus metrics instrumentation
- Dockerized deployment
This is not a notebook project; it is structured as a production-style ML system.
The system follows a layered ML infrastructure design:
- Streaming ingestion via Redpanda
- Real-time feature aggregation
- Dual feature storage (Redis + Parquet)
- Offline training pipeline
- Multi-worker FastAPI inference service
- Prometheus metrics instrumentation
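A high-level sketch of the data flow through these layers (the layout is illustrative):

```text
transactions -> Redpanda topic -> aggregation pipeline -+-> Redis HASH (online store)
                                                        +-> Parquet    (offline store)

Parquet -> training pipeline -> model artifact -> FastAPI workers (Redis lookups) -> /predict
```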
Tech stack:
- Python 3.12
- FastAPI
- Redis (online feature store, HASH schema)
- Redpanda (Kafka-compatible streaming)
- Parquet (offline feature store)
- scikit-learn
- Prometheus metrics
- Docker
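A minimal sketch of reads and writes against the online store, one Redis HASH per user (the key layout and field names are assumptions, not the project's actual schema):

```python
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def write_features(user_id: str, features: dict) -> None:
    # One HASH per user; a single HSET updates all fields in one round trip.
    r.hset(f"features:{user_id}", mapping=features)

def read_features(user_id: str) -> dict:
    # HGETALL fetches every field in one call, keeping lookup latency low
    # for the small per-user feature sets used at inference time.
    return r.hgetall(f"features:{user_id}")

write_features("user_10", {"txn_count_1h": 4, "avg_amount_1h": 132.5})
print(read_features("user_10"))
```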
Benchmark results:

Run 1:
- p50 latency: ~7 ms
- p95 latency: ~15 ms
- Max latency: ~20 ms
- Throughput: ~2650 requests/sec

Run 2:
- p50 latency: ~15 ms
- p95 latency: ~29 ms
- Max latency: ~42 ms

Both runs demonstrate stable tail latency under concurrent load.
Start the API in production mode:

```bash
uvicorn src.inference.api:app --workers 4 --host 0.0.0.0 --port 8000
```

Run the benchmark:

```bash
python benchmark.py
```
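For reference, a minimal sketch of what such a concurrency benchmark can look like (the endpoint route and load parameters are assumptions, not the project's actual benchmark.py):

```python
import asyncio
import statistics
import time

import httpx

URL = "http://localhost:8000/predict/user_10"  # assumed route

async def timed_request(client: httpx.AsyncClient) -> float:
    start = time.perf_counter()
    resp = await client.get(URL)
    resp.raise_for_status()
    return (time.perf_counter() - start) * 1000  # latency in ms

async def main(total: int = 2000, concurrency: int = 50) -> None:
    sem = asyncio.Semaphore(concurrency)

    async def bounded(client: httpx.AsyncClient) -> float:
        async with sem:  # cap in-flight requests at `concurrency`
            return await timed_request(client)

    async with httpx.AsyncClient() as client:
        t0 = time.perf_counter()
        latencies = await asyncio.gather(*(bounded(client) for _ in range(total)))
        elapsed = time.perf_counter() - t0

    latencies = sorted(latencies)
    print(f"p50 {statistics.median(latencies):.1f} ms | "
          f"p95 {latencies[int(0.95 * len(latencies))]:.1f} ms | "
          f"max {latencies[-1]:.1f} ms | "
          f"{total / elapsed:.0f} req/s")

if __name__ == "__main__":
    asyncio.run(main())
```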
To bring up the full stack with Docker Compose:

```bash
docker-compose up --build
```
Prometheus metrics are available at /metrics.
Exposed metrics:
- inference_requests_total
- inference_latency_seconds (histogram)
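A minimal sketch of how these two metrics can be wired up with prometheus_client in FastAPI (the middleware shape is illustrative; the metric names match the list above):

```python
from fastapi import FastAPI, Request
from prometheus_client import Counter, Histogram, make_asgi_app

app = FastAPI()

REQUESTS = Counter("inference_requests_total", "Total inference requests")
LATENCY = Histogram("inference_latency_seconds", "Inference latency in seconds")

@app.middleware("http")
async def track_metrics(request: Request, call_next):
    REQUESTS.inc()
    with LATENCY.time():  # records elapsed seconds into histogram buckets
        return await call_next(request)

# Serve the Prometheus exposition format at /metrics.
app.mount("/metrics", make_asgi_app())
```

Note: with multiple uvicorn workers, prometheus_client typically needs its multiprocess mode (PROMETHEUS_MULTIPROC_DIR) so counters aggregate across worker processes rather than per process.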
Example prediction response:

```json
{
  "user_id": "user_10",
  "fraud_probability": 0.6747,
  "risk_level": "MEDIUM",
  "latency_ms": 3.33,
  "model_version": "v1.0"
}
```
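A minimal sketch of an inference endpoint that could produce this shape (the route, feature names, risk thresholds, and model path are illustrative assumptions, not the project's actual code):

```python
import time

import joblib
import redis
from fastapi import FastAPI, HTTPException

app = FastAPI()
r = redis.Redis(decode_responses=True)
model = joblib.load("models/fraud_v1.joblib")  # loaded once per worker process

FEATURE_ORDER = ("txn_count_1h", "avg_amount_1h")  # must match training column order

@app.get("/predict/{user_id}")
def predict(user_id: str) -> dict:
    start = time.perf_counter()
    features = r.hgetall(f"features:{user_id}")
    if not features:
        raise HTTPException(status_code=404, detail="features not found")
    x = [[float(features[name]) for name in FEATURE_ORDER]]
    prob = float(model.predict_proba(x)[0][1])  # probability of the fraud class
    return {
        "user_id": user_id,
        "fraud_probability": round(prob, 4),
        "risk_level": "HIGH" if prob >= 0.8 else "MEDIUM" if prob >= 0.5 else "LOW",
        "latency_ms": round((time.perf_counter() - start) * 1000, 2),
        "model_version": "v1.0",
    }
```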
Key engineering aspects:
- Online vs. offline feature store separation
- Redis HASH schema for low-latency retrieval
- Multi-worker scaling
- Tail latency optimization
- Concurrency benchmarking
- Prometheus instrumentation
- Dockerized service
Known trade-offs:
- Sync Redis client (simpler, slightly higher tail latency)
- In-memory model loading per worker (increased RAM usage)
- No horizontal load balancer (single-instance test)
Production improvements:
- Deploy behind a reverse proxy (NGINX)
- Add horizontal replicas
- Add an async Redis client (see the sketch after this list)
- Introduce model version routing
- Add feature consistency validation
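As a sketch of the async Redis improvement, the same lookup using redis-py's asyncio client (the key layout matches the earlier sketch and is equally an assumption):

```python
import redis.asyncio as aioredis
from fastapi import FastAPI, HTTPException

app = FastAPI()
r = aioredis.Redis(decode_responses=True)

@app.get("/features/{user_id}")
async def get_features(user_id: str) -> dict:
    # The await yields the event loop while Redis responds, so a single
    # worker can overlap many in-flight requests instead of blocking.
    features = await r.hgetall(f"features:{user_id}")
    if not features:
        raise HTTPException(status_code=404, detail="features not found")
    return features
```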
Failure modes handled (a handling sketch follows this list):
- Redis unavailable → prediction failure
- Feature missing → 404 returned
- Model missing → startup failure
- Worker crash → process-level isolation
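A minimal sketch of how those failure modes can map onto HTTP behavior (the 404 matches the list above; the 503 for Redis outages, the exception types, and the model path are assumptions):

```python
import joblib
import redis
from fastapi import FastAPI, HTTPException

app = FastAPI()
r = redis.Redis(decode_responses=True)

# Model missing -> startup failure: joblib.load raises FileNotFoundError
# before the worker accepts traffic, so a bad deploy fails fast.
model = joblib.load("models/fraud_v1.joblib")

@app.get("/predict/{user_id}")
def predict(user_id: str) -> dict:
    try:
        features = r.hgetall(f"features:{user_id}")
    except redis.exceptions.ConnectionError as exc:
        # Redis unavailable -> prediction failure, surfaced as 503.
        raise HTTPException(status_code=503, detail="feature store unavailable") from exc
    if not features:
        # Feature missing -> 404.
        raise HTTPException(status_code=404, detail="features not found")
    return {"user_id": user_id, "feature_count": len(features)}  # scoring elided
```

Worker crashes are isolated at the process level because uvicorn --workers runs each worker as a separate OS process; one failing worker does not take down its siblings.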
Future work:
- Online/offline feature consistency checker (sketched below)
- Model A/B testing
- Kubernetes deployment
- Horizontal auto-scaling
- Feature freshness monitoring
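As a sketch of the consistency checker idea, compare each user's Parquet snapshot row against its Redis HASH (the file path, schema, and tolerance are assumptions):

```python
import pandas as pd
import redis

r = redis.Redis(decode_responses=True)

def find_inconsistent_users(snapshot_path: str, tolerance: float = 1e-6) -> list[str]:
    """Return user_ids whose offline snapshot disagrees with the online store."""
    offline = pd.read_parquet(snapshot_path)
    mismatched = []
    for row in offline.itertuples(index=False):
        online = r.hgetall(f"features:{row.user_id}")
        for col in offline.columns:
            if col == "user_id":
                continue
            # A field missing online, or a value drifting beyond tolerance,
            # counts as an online/offline inconsistency for this user.
            if col not in online or abs(float(online[col]) - float(getattr(row, col))) > tolerance:
                mismatched.append(row.user_id)
                break
    return mismatched

print(find_inconsistent_users("data/offline/features.parquet"))
```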
