- Overview
- Key Features
- Quick Start
- Architecture
- Documentation
- Configuration
- Troubleshooting
- License
- Support & Contributions
kube-opex-analytics is a Kubernetes usage accounting and analytics tool that helps organizations track CPU, Memory, and GPU resources consumed by their clusters over time (hourly, daily, monthly).
It provides insightful usage analytics metrics and charts that engineering and financial teams can use as key indicators for cost optimization decisions.
- CPU - Core usage and requests per namespace
- Memory - RAM consumption and requests per namespace
- GPU - NVIDIA GPU utilization via DCGM integration (v26.01.0-beta1 or later)
Multi-cluster Integration: kube-opex-analytics tracks usage for a single Kubernetes cluster. For centralized multi-cluster analytics, see Krossboard Kubernetes Operator (demo video).
| Feature | Description |
|---|---|
| Hourly/Daily/Monthly Trends | Tracks actual usage and requested capacities per namespace, collected every 5 minutes and consolidated hourly |
| Non-allocatable Capacity Tracking | Highlights system overhead (OS, kubelets) vs. usable application capacity at node and cluster levels |
| Cluster Capacity Planning | Visualize consumed capacity globally, instantly, and over time |
| Usage Efficiency Analysis | Compare resource requests against actual usage to identify over/under-provisioning |
| Cost Allocation & Chargeback | Automatic resource usage accounting per namespace for billing and showback |
| Prometheus Integration | Native exporter at /metrics for Grafana dashboards and alerting |
- Kubernetes cluster v1.19+ (or OpenShift 4.x+)
- `kubectl` configured with cluster access
- Helm 3.x (fine-tuned installation) or `kubectl` for a basic opinionated deployment
- Cluster permissions: read access to pods, nodes, and namespaces
- Kubernetes Metrics Server deployed in your cluster (required for CPU and memory metrics)
- NVIDIA DCGM Exporter deployed in your cluster (required for GPU metrics, optional if no GPUs)
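Optionally, you can sanity-check the tooling and read permissions of the kubeconfig context you plan to install with. This is a minimal sketch covering only the basics listed above:

```bash
# Confirm kubectl and Helm are available
kubectl version
helm version

# Confirm read access to the resources kube-opex-analytics needs
kubectl auth can-i list pods --all-namespaces
kubectl auth can-i list nodes
kubectl auth can-i list namespaces
```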
Before installing, ensure metrics-server is running in your cluster:
```bash
# Check if metrics-server is deployed
kubectl -n kube-system get deploy | grep metrics-server

# Verify it's working
kubectl top nodes

# If not installed, deploy with kubectl
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
```

If your cluster has NVIDIA GPUs and you want GPU metrics, ensure DCGM Exporter is running:
```bash
# Check if DCGM Exporter is deployed
kubectl get daemonset -A | grep dcgm

# If not installed, deploy with Helm (requires NVIDIA GPU Operator or drivers)
helm repo add gpu-helm-charts https://nvidia.github.io/dcgm-exporter/helm-charts
helm install dcgm-exporter gpu-helm-charts/dcgm-exporter \
  --namespace gpu-operator \
  --create-namespace
```

Clone the repository:

```bash
git clone https://github.com/rchakode/kube-opex-analytics.git --depth=1
cd kube-opex-analytics
```

OpenShift users: Skip this section and use Helm installation with OpenShift-specific settings.
```bash
# Create namespace
kubectl create namespace kube-opex-analytics

# Deploy using Kustomize
kubectl apply -k ./manifests/kustomize -n kube-opex-analytics

# Watch pod status
kubectl get pods -n kube-opex-analytics -w
```

For advanced customization (OpenShift, custom storage, etc.), edit manifests/helm/values.yaml:
- OpenShift: Set `securityContext.openshift: true`
- Custom storage: Set `dataVolume.storageClass` and `dataVolume.capacity`
- DCGM Integration: Set `dcgm.enable: true` and `dcgm.endpoint`
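As an illustration, the sketch below writes a hypothetical values override combining these settings. The key nesting, storage class, capacity, and endpoint are assumptions to verify against manifests/helm/values.yaml before use:

```bash
# Sketch only: a hypothetical override file; check key names in manifests/helm/values.yaml
cat > my-values.yaml <<'EOF'
securityContext:
  openshift: true                 # OpenShift clusters only
dataVolume:
  storageClass: standard          # assumption: use a storage class available in your cluster
  capacity: 4Gi                   # assumption: size to your retention needs
dcgm:
  enable: true
  endpoint: http://dcgm-exporter.gpu-operator:9400/metrics
EOF
```

Append `-f my-values.yaml` to the `helm upgrade --install` command below to apply the override.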
Then run:
```bash
# Create namespace
kubectl create namespace kube-opex-analytics

# Install with Helm
helm upgrade --install kube-opex-analytics ./manifests/helm -n kube-opex-analytics

# Watch pod status
kubectl get pods -n kube-opex-analytics -w
```

Once the pod is running, access the built-in web UI:

```bash
# Port-forward to access the UI
kubectl port-forward svc/kube-opex-analytics 5483:80 -n kube-opex-analytics

# Open http://localhost:5483 in your browser
```

Running kube-opex-analytics with Docker requires `kubectl proxy` running locally to provide API access:
```bash
# Start kubectl proxy in background
kubectl proxy &

# Run kube-opex-analytics
docker run -d \
  --net="host" \
  --name kube-opex-analytics \
  -v /var/lib/kube-opex-analytics:/data \
  -e KOA_DB_LOCATION=/data/db \
  -e KOA_K8S_API_ENDPOINT=http://127.0.0.1:8001 \
  rchakode/kube-opex-analytics

# Access at http://localhost:5483
```

The overall architecture is summarized below:

```
┌───────────────────┐
│ Metrics Server │──┐
│ (CPU/Memory) │ │ ┌──────────────────────────────────────┐
└───────────────────┘ ├───>│ kube-opex-analytics │
┌───────────────────┐ │ │ ┌─────────┐ ┌────────┐ ┌─────────┐│
│ DCGM Exporter │──┘ │ │ Poller │─>│RRD DBs │─>│ API ││
│ (GPU metrics) │ │ │ (5 min) │ │ │ │ ││
└───────────────────┘ │ └─────────┘ └────────┘ └────┬────┘│
└───────────────────────────────┼──────┘
│
┌───────────────────────────────┼──────┐
│ v │
│ ┌────────────┐ ┌────────────┐ │
│ │ Web UI │ │ /metrics │ │
│ │ (D3.js) │ │ (Prometheus│ │
│ └────────────┘ └────────────┘ │
└──────────────────────────────────────┘
│ │
v v
  Built-in Dashboards        Grafana/Alerting
```
Data Flow:
- Metrics polled every 5 minutes (configurable):
  - CPU/Memory from Kubernetes Metrics Server
  - GPU from NVIDIA DCGM Exporter
- Metrics are processed and stored in internal lightweight time-series databases (round-robin DBs)
- Data is consolidated into hourly, daily, and monthly aggregates
- API serves data to the built-in web UI and Prometheus scraper
| Topic | Link |
|---|---|
| Installation on Kubernetes/OpenShift | docs/installation-on-kubernetes-and-openshift.md |
| Installation on Docker | docs/installation-on-docker.md |
| Built-in Dashboards | docs/built-in-dashboards-and-charts.md |
| Prometheus & Grafana | docs/prometheus-exporter-grafana-dashboard.md |
| Configuration Reference | docs/configuration-settings.md |
| Design Fundamentals | docs/design-fundamentals.md |
Key environment variables:
| Variable | Description | Default |
|---|---|---|
| `KOA_K8S_API_ENDPOINT` | Kubernetes API server URL | Required |
| `KOA_K8S_AUTH_TOKEN` | Service account token | Auto-detected in-cluster |
| `KOA_DB_LOCATION` | Path for RRDtool databases | `/data` |
| `KOA_POLLING_INTERVAL_SEC` | Metrics collection interval (seconds) | `300` |
| `KOA_COST_MODEL` | Billing model (`CUMULATIVE_RATIO`, `RATIO`, `CHARGE_BACK`) | `CUMULATIVE_RATIO` |
| `KOA_BILLING_HOURLY_RATE` | Hourly cost for the chargeback model | `-1.0` |
| `KOA_BILLING_CURRENCY_SYMBOL` | Currency symbol for cost display | `$` |
| `KOA_NVIDIA_DCGM_ENDPOINT` | NVIDIA DCGM Exporter endpoint for GPU metrics | Not set (GPU disabled) |
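For example, to switch from the default `CUMULATIVE_RATIO` accounting to the chargeback model, the variables above can be set as follows (the rate and currency are placeholder values to replace with your own):

```bash
export KOA_COST_MODEL=CHARGE_BACK
export KOA_BILLING_HOURLY_RATE=7.5        # placeholder: your cluster's hourly cost
export KOA_BILLING_CURRENCY_SYMBOL='€'    # placeholder: your billing currency
```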
To enable GPU metrics collection, set the DCGM Exporter endpoint:
```bash
# Environment variable
export KOA_NVIDIA_DCGM_ENDPOINT=http://dcgm-exporter.gpu-operator:9400/metrics

# Or with Helm
helm upgrade --install kube-opex-analytics ./manifests/helm \
  --set dcgm.enabled=true \
  --set dcgm.endpoint=http://dcgm-exporter.gpu-operator:9400/metrics
```

See Configuration Settings for the complete reference.
Pod stuck in CrashLoopBackOff
- Check logs: `kubectl logs -f deployment/kube-opex-analytics -n kube-opex-analytics`
- Verify RBAC permissions are correctly applied
- Ensure the service account has read access to pods and nodes
No data appearing in dashboard
- Wait at least 5-10 minutes for initial data collection
- Verify the pod can reach the Kubernetes API: check for connection errors in logs
- Confirm `KOA_K8S_API_ENDPOINT` is correctly set
Metrics not appearing in Prometheus
- Ensure the `/metrics` endpoint is accessible
- Check ServiceMonitor/PodMonitor configuration if using Prometheus Operator (see the sketch after this list)
- Verify network policies allow Prometheus to scrape the pod
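To narrow this down, you can first confirm the exporter responds (assuming `/metrics` is served on the same service port as the web UI), then, if you use the Prometheus Operator, declare a ServiceMonitor along the lines of the sketch below. The label selector and port name are assumptions; align them with the Service created by your installation.

```bash
# Check the exporter responds (same port-forward as in the Quick Start)
kubectl port-forward svc/kube-opex-analytics 5483:80 -n kube-opex-analytics &
sleep 2
curl -s http://localhost:5483/metrics | head

# Hypothetical ServiceMonitor sketch; adjust labels and port name before applying
kubectl apply -f - <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kube-opex-analytics
  namespace: kube-opex-analytics
spec:
  selector:
    matchLabels:
      app: kube-opex-analytics    # assumption: copy the labels of the actual Service
  endpoints:
    - port: http                  # assumption: use the Service's named port
      path: /metrics
      interval: 60s
EOF
```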
Polling interval
- By default, the polling interval to collect raw metrics from the Kubernetes API or NVIDIA DCGM is 300 seconds (5 minutes).
- You can increase this interval with the `KOA_POLLING_INTERVAL_SEC` variable. Always use a multiple of 300 seconds, as the backend RRD databases use a 5-minute resolution.
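For example, to collect raw metrics every 15 minutes (900 seconds, a multiple of 300):

```bash
export KOA_POLLING_INTERVAL_SEC=900
```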
- Check existing GitHub Issues
- Review the Design Fundamentals for architectural context
kube-opex-analytics is licensed under Apache License 2.0.
Third-party library licenses are documented in NOTICE.
We welcome feedback and contributions!
- Report Issues: GitHub Issues
- Contribute Code: Pull Requests
All contributions must be released under Apache 2.0 License terms.
