Add Windrose metrics exporter #15

Open
carl-shipstuff wants to merge 4 commits into main from codex/metrics-exporter

@carl-shipstuff
Contributor

Summary

  • add stdlib Prometheus exporter in metrics.py, runnable as a standalone sidecar or imported by server.py for opt-in /metrics
  • wire opt-in Helm and Compose support for the metrics sidecar, ServiceMonitor, annotation-based Prometheus scraping, and packaged Grafana dashboard
  • add tests and CI coverage for exporter output, HTTP handlers, dashboard JSON, Helm metrics render, and hostNetwork metrics/UI port conflict validation
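
For readers unfamiliar with a stdlib-only exporter, the following is a minimal sketch of the general shape, not the code in metrics.py. The names `render_metrics`, `collect`, `MetricsHandler`, and `serve` are hypothetical, and every sample is treated as an unlabeled gauge for brevity.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def render_metrics(samples):
    """Render {metric_name: value} as Prometheus text exposition format."""
    lines = []
    for name, value in sorted(samples.items()):
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


def collect():
    """Hypothetical collector; a real exporter would query the game server."""
    return {"windrose_exporter_scrape_success": 1}


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only /metrics is served; everything else is a 404.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(collect()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port):
    """Run the exporter as a standalone process (blocks until interrupted)."""
    ThreadingHTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

Calling `serve(28082)` would expose `/metrics` on the port used by the canary sidecar; importing `MetricsHandler` from another server would correspond to the opt-in embedded mode.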

Validation

  • python3 -m py_compile server.py metrics.py scripts/patch-idle-cpu.py tests/test_metrics.py
  • python3 tests/test_metrics.py
  • python3 tests/test_retention.py
  • python3 tests/test_restore.py
  • python3 tests/test_auto_backup.py
  • python3 tests/test_server_control.py
  • python3 tests/test_schema.py
  • python3 tests/test_http.py
  • bash tests/test_install_env_merge.sh
  • bash tests/test_engine_ini_reconcile.sh
  • bash tests/test_shadow_stamp.sh
  • docker compose config --quiet
  • docker compose --profile metrics config --quiet
  • helm lint ./helm/windrose
  • Helm templates for default, managed, mutable, and metrics-enabled paths
  • verified default Helm render has no metrics resources
  • verified hostNetwork port conflict fails when metrics.port == service.port
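
The hostNetwork port-conflict rule lives in the Helm templates. As an illustration only, the same rule can be expressed in Python as below. The function name `check_metrics_port` and the `hostNetwork` and `metrics.enabled` value keys are assumptions, as is the port 28080 in the usage note; only `metrics.port` and `service.port` are named in this PR.

```python
def check_metrics_port(values):
    """Return an error message when the metrics port collides, else None.

    `values` is a dict shaped like Helm values. The `hostNetwork` and
    `metrics.enabled` keys are assumed names for this sketch.
    """
    metrics = values.get("metrics", {})
    # The conflict only matters when both features are switched on,
    # because with hostNetwork every container shares the node's ports.
    if not (values.get("hostNetwork") and metrics.get("enabled")):
        return None
    port = metrics.get("port")
    if port is not None and port == values.get("service", {}).get("port"):
        return (
            f"metrics.port ({port}) must differ from service.port "
            "when hostNetwork is enabled"
        )
    return None
```

With `hostNetwork` on, `metrics.port` 28082 and `service.port` 28082 yield an error message, while changing `service.port` to 28080 yields `None`.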

Canary dogfood

  • built/pushed split-persistence-compatible image registry.local:30500/games/windrose-server:metrics-split-5554b8d
  • took pinned backup manual-20260505T020754Z
  • live-patched canary only because its current StatefulSet has storage drift that this main-based chart cannot safely represent
  • canary is 4/4 Running, zero restarts, with windrose-metrics on 28082
  • direct scrape returns windrose_exporter_scrape_success 1 and build labels for Steam build 23065343
  • Prometheus query windrose_exporter_scrape_success{job="windrose-canary"} returns one successful series
  • Grafana imported dashboard Windrose Server at /d/windrose-server/windrose-server
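
The direct-scrape check above can be reproduced with a few lines of Python. This is a sketch, not a script from this PR: `parse_samples`, `scrape_succeeded`, and `fetch` are hypothetical helpers, and the parser assumes samples carry no trailing timestamp.

```python
from urllib.request import urlopen


def parse_samples(text):
    """Parse Prometheus text exposition into {series: float}.

    Assumes each sample line is `<series> <value>` with no timestamp.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        # Split on the last space so label values containing spaces survive.
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


def scrape_succeeded(text):
    """True when the exporter reports a successful scrape."""
    return parse_samples(text).get("windrose_exporter_scrape_success") == 1.0


def fetch(url, timeout=5.0):
    """Fetch the raw /metrics body from an exporter endpoint."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

From inside the cluster, `scrape_succeeded(fetch(url))` with the sidecar's `/metrics` URL on port 28082 would mirror the manual check.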

Note: https://windrose-canary.shipstuff.fun/api/status timed out from this environment; current canary Helm values have ingress.enabled=false, so validation used in-cluster service/API.

@seslly
Contributor

seslly commented May 5, 2026

@codex review

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c639463e7e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +281 to +284
- name: WINDROSE_GAME_CPU_LIMIT
  value: {{ .Values.resources.game.limits.cpu | default "" | quote }}
- name: WINDROSE_GAME_MEM_LIMIT
  value: {{ .Values.resources.game.limits.memory | default "" | quote }}

[P2] Source game limits from the game container, not sidecar

When .Values.resources.game.limits.* are unset, this sidecar gets empty WINDROSE_GAME_*_LIMIT values and metrics.py falls back to cgroup detection. In the windrose-metrics container, /sys/fs/cgroup/* reflects the sidecar’s own cgroup (for example its default 128Mi memory cap), not the game container’s, so windrose_cpu_limit_millicores / windrose_memory_limit_bytes can be wrong and dashboards/alerts will compute incorrect utilization. This is especially visible for installs that do not set explicit game CPU/memory limits in Helm values.
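
To make the failure mode concrete, here is a sketch of an env-first, cgroup-fallback lookup for the memory limit. It is not the code in metrics.py: `parse_memory_quantity` and `resolve_memory_limit` are hypothetical, the quantity parser covers only common Kubernetes suffixes, and the fallback assumes a cgroup v2 `memory.max` file. Returning the source alongside the value is one possible way to let callers notice when the number came from the sidecar's own cgroup.

```python
_UNITS = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,  # binary suffixes
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,     # decimal suffixes
}


def parse_memory_quantity(text):
    """Convert a Kubernetes-style quantity such as '128Mi' to bytes."""
    text = text.strip()
    # Try two-letter suffixes before one-letter ones.
    for suffix in sorted(_UNITS, key=len, reverse=True):
        if text.endswith(suffix):
            return int(float(text[: -len(suffix)]) * _UNITS[suffix])
    return int(text)


def resolve_memory_limit(env, cgroup_path="/sys/fs/cgroup/memory.max"):
    """Return (limit_bytes_or_None, source) with source in {env, cgroup, unknown}."""
    raw = env.get("WINDROSE_GAME_MEM_LIMIT", "").strip()
    if raw:
        return parse_memory_quantity(raw), "env"
    # Fallback: inside a sidecar this file describes the sidecar's cgroup,
    # not the game container's, which is the problem the review describes.
    try:
        with open(cgroup_path) as fh:
            value = fh.read().strip()
    except OSError:
        return None, "unknown"
    if value == "max":
        return None, "cgroup"  # no limit set on this cgroup
    return int(value), "cgroup"
```

When the Helm value is unset, the env var is empty and the sketch lands in the `cgroup` branch, so a sidecar capped at 128Mi would report 134217728 bytes regardless of the game container's real limit.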

Useful? React with 👍 / 👎.

@seslly force-pushed the codex/metrics-exporter branch from d8bde2e to 981b1ef on May 5, 2026 06:20