InseeFrLab · VincentG1234 · May 20, 2026 · May 19, 2026
diff --git a/.ai/context/architecture.md b/.ai/context/architecture.md
@@ -0,0 +1,52 @@
+# Architecture — execution flow
+
+**User diagrams:** [`docs/architecture.md`](../../docs/architecture.md).
+
+## 1. Configuration loading
+
+- YAML study file (reference: `examples/study_config_local_exec.yaml`).
+- `StudyConfig.from_file()` in `core/config.py` — parse, validate objectives, storage, parameters.
+- CLI `optimize` (`cli/main.py`) copies config beside SQLite when `storage_file` is set.
+
+## 2. Study setup
+
+- `StudyController.create_from_config(LocalExecutionBackend, config)`:
+  - Optional PostgreSQL (`storage/postgres_utils.py`).
+  - `get_storage(config)` → SQLite / PostgreSQL (`storage/utils.py`).
+  - Optuna `Study` + sampler (TPE, Grid, NSGA-II, …).
+  - `CentralizedLogger` if logging block present.
+
+## 3. Study loop
+
+- Baselines: `_run_baseline_trials` → enqueue + run with default/static params.
+- Optimization: `study.ask()` → `TrialConfig` → `backend.submit_trial()` → poll → `study.tell()`.
+- Failures: error classification → trial user attrs.
+- Optional `optimization.log_metrics` → extra user attrs (PR #22).
+
+## 4. Backend (supported path)
+
+**`LocalExecutionBackend`** (`execution/backends.py`): thread pool, `LocalTrialController`, `poll_trials`, `cleanup_all_trials`.
+
+Legacy: `RayExecutionBackend` remains for upstream compatibility but is not the focus of this fork.
+
+## 5. Single trial
+
+`BaseTrialController.run_trial()` (`execution/trial_controller.py`):
+
+1. Validate imports (vllm, guidellm, optuna).
+2. `GuideLLMBenchmark` from `benchmarks/providers.py`.
+3. `_start_vllm_server()` → `_wait_for_server_ready()`.
+4. State machine: `WAITING_FOR_VLLM` → `RUNNING_BENCHMARK`.
+5. Metrics → objectives; `cleanup_resources()` on exit/cancel.
+
+## 6. Storage & logs
+
+- Optuna: `study.storage_file` or `study.database_url`.
+- Logs: `logging/manager.py` (file and/or DB).
+- Dashboard: `optuna_dashboard/start_optuna_dashboard.sh`.
+
+## 7. Cleanup
+
+- Per trial: kill vLLM + benchmark process group.
+- Study end / interrupt: `backend.cleanup_all_trials()`, `shutdown()`.
+- Known gap: orphan vLLM if parent killed abruptly (issue #2).
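
The per-trial cleanup in section 7 (kill the vLLM and benchmark process group without touching the parent's group) can be illustrated with a minimal, stdlib-only sketch. It assumes a POSIX system; `start_in_own_group` and `terminate_group` are hypothetical names chosen for illustration and are not taken from `trial_controller.py`, whose actual implementation may differ.

```python
import os
import signal
import subprocess
import sys


def start_in_own_group(cmd):
    """Start cmd as the leader of a new session, and therefore of a new process group."""
    return subprocess.Popen(cmd, start_new_session=True)


def terminate_group(proc, grace_seconds=5.0):
    """Send SIGTERM to the child's process group, then SIGKILL if it does not exit in time."""
    if proc.poll() is not None:
        return proc.returncode  # already exited, nothing to signal
    pgid = os.getpgid(proc.pid)
    # Safety net: never signal the group that the parent itself belongs to.
    shares_parent_group = pgid == os.getpgrp()
    if shares_parent_group:
        proc.terminate()
    else:
        os.killpg(pgid, signal.SIGTERM)
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        if shares_parent_group:
            proc.kill()
        else:
            os.killpg(pgid, signal.SIGKILL)
        return proc.wait()


if __name__ == "__main__":
    # A sleeping Python child stands in for a long-running server process.
    child = start_in_own_group([sys.executable, "-c", "import time; time.sleep(60)"])
    print("exit status:", terminate_group(child))
```

Because the child leads its own group, signalling that group also reaches any workers the child spawned, while the parent stays outside the blast radius. This sketch does not address the known gap noted above: if the parent is killed abruptly, nothing runs `terminate_group`.
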
diff --git a/.ai/context/current-work.md b/.ai/context/current-work.md
@@ -0,0 +1,41 @@
+# Current work
+
+_Last updated from local `gh` / git — refresh before large changes._
+
+## Open pull requests (InseeFrLab)
+
+| PR | Branch | Objective | Status | Next step |
+|----|--------|-----------|--------|-----------|
+| [#22](https://github.com/InseeFrLab/auto-tuning-vllm/pull/22) | `FEAT/optuna-user-attrs-log-metrics` | `optimization.log_metrics` → Optuna user attrs for dashboard | OPEN | Review + merge; ensure docs/example match `StudyConfig` validation |
+| [#21](https://github.com/InseeFrLab/auto-tuning-vllm/pull/21) | `fix/exclude-baseline-trials-budget` | Baselines must not increment `completed_trials` / consume `n_trials` | OPEN | Merge; run `pytest tests/core/test_study_controller.py` |
+| [#17](https://github.com/InseeFrLab/auto-tuning-vllm/pull/17) | `fix/guidellm-cli-preflight` | GuideLLM CLI preflight + pin `vllm<=0.19` | OPEN | Resolve overlap with issue #19 / current `pyproject` vllm pin |
+| [#13](https://github.com/InseeFrLab/auto-tuning-vllm/pull/13) | `fix/local-backend-cleanup` | Cooperative cancel + cleanup on local backend | OPEN | Merge after manual interrupt test |
+
+## Remote branches (not all have open PRs)
+
+| Branch | Notes |
+|--------|--------|
+| `origin/FEAT/custom-metrics` | Merged as #18 on main |
+| `origin/FEAT/grid-cardinality-auto-switch` | Merged as #7 |
+| `origin/FEAT/ray-optional` | Legacy: Ray optional extra (merged) |
+| `origin/add-optuna-dashboard-example` | Dashboard launcher (#14 merged) |
+| `origin/add-startup-timeout-baseline-run` | Startup timeout for baselines — **verify if merged or stale** |
+| `origin/ci-setup` | CI workflow (#8) |
+| `origin/renovate/configure` | Dependency bot config |
+
+## README roadmap (main)
+
+| Item | Status | Next step |
+|------|--------|-----------|
+| Comprehensive test suite | In progress (small `tests/` tree) | Add controller/backend tests per PR #21 pattern |
+| CI runs tests strictly | Partial | Remove `pytest ... \|\| true` in `ci.yml` when suite is stable |
+| Dependency pinning / hygiene | Open | Align `pyproject.toml` with supported vLLM/GuideLLM matrix |
+| CLI validation / error messages | Open | Extend `StudyConfig` errors + Typer messages |
+| Speculative decoding params | Future | Design parameter module + example YAML |
+| Extra benchmark providers | Future | Implement `BenchmarkProvider` subclass |
+
+## Maintainer TODO (fill if stale)
+
+- **Active local branch:** `FEAT/optuna-user-attrs-log-metrics` — confirm whether uncommitted edits on `config.py` / `study_controller.py` belong in PR #22.
+- **Production study configs:** _Add paths or naming convention used internally._
+- **Target vLLM version for production:** _e.g. 0.19 vs 0.20+ — drives issue #19 resolution._
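
The behavior targeted by PR #21 (baselines must not increment `completed_trials` or consume `n_trials`) can be stated as a small counting rule. The sketch below is an assumption about the intended semantics, not the code in `study_controller.py`; `TrialBudget`, `record`, and `exhausted` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class TrialBudget:
    """Hypothetical counter: only optimization trials consume the n_trials budget."""

    n_trials: int
    completed_trials: int = 0
    baseline_trials: int = 0

    def record(self, is_baseline: bool) -> None:
        """Record a finished trial; baselines are tallied separately."""
        if is_baseline:
            self.baseline_trials += 1
        else:
            self.completed_trials += 1

    @property
    def exhausted(self) -> bool:
        """True once the optimization budget is used up."""
        return self.completed_trials >= self.n_trials


# Two baselines followed by three optimization trials against a budget of three.
budget = TrialBudget(n_trials=3)
for is_baseline in (True, True, False, False, False):
    budget.record(is_baseline)
print(budget)
```

A unit test in the style suggested for `tests/core/test_study_controller.py` could assert the same invariant against the real controller without needing a GPU.
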
diff --git a/.ai/context/diagrams.md b/.ai/context/diagrams.md
@@ -0,0 +1,5 @@
+# Diagrams (agents)
+
+User-facing Mermaid diagrams live in **[`docs/architecture.md`](../../docs/architecture.md)**.
+
+When changing structure, update that file (see [`.ai/skills/architecture-diagrams.md`](../skills/architecture-diagrams.md)).
diff --git a/.ai/context/external-links.md b/.ai/context/external-links.md
@@ -0,0 +1,37 @@
+# External links
+
+## Repositories
+
+| Resource | URL |
+|----------|-----|
+| Fork | https://github.com/InseeFrLab/auto-tuning-vllm |
+| Upstream | https://github.com/openshift-psap/auto-tuning-vllm |
+| GuideLLM | https://github.com/neuralmagic/guidellm |
+| vLLM | https://github.com/vllm-project/vllm |
+
+## Documentation
+
+| Topic | URL |
+|-------|-----|
+| vLLM | https://docs.vllm.ai/ |
+| Optuna | https://optuna.readthedocs.io/ |
+| Optuna Dashboard | https://github.com/optuna/optuna-dashboard |
+
+## In-repo
+
+| Doc | Path |
+|-----|------|
+| Quick start | `docs/quick_start.md` |
+| Configuration | `docs/configuration.md` |
+| Architecture (diagrams) | `docs/architecture.md` |
+
+## Legacy (Ray — not agent focus)
+
+| Doc | Path |
+|-----|------|
+| Ray cluster | `docs/ray_cluster_setup.md` |
+| Ray auto-start | `docs/ray_auto_start.md` |
+
+## GitHub (fork)
+
+Issues: https://github.com/InseeFrLab/auto-tuning-vllm/issues — see `current-work.md` for open PRs.
diff --git a/.ai/context/history.md b/.ai/context/history.md
@@ -0,0 +1,43 @@
+# History — decisions to preserve
+
+## Execution model
+
+| Decision | Reference |
+|----------|-----------|
+| **Local backend is the product path** | `LocalExecutionBackend`; fork README |
+| **Do not kill parent process group** on vLLM cleanup | upstream PR #92 |
+| Ray backend kept as **legacy / optional** extra | `db1e9ab`; issue #3 — not primary development |
+
+## Optuna / study
+
+| Decision | Reference |
+|----------|-----------|
+| Baselines visible in Optuna dashboard | upstream PR #111 |
+| Failed trial attrs for sampler | #93, #97 |
+| Constraint sampling | #101 |
+| Grid cardinality auto-switch | fork PR #7 |
+| Custom metric expressions | fork PR #18 |
+| `max_concurrent_trials` naming | upstream #122, #125 |
+
+## Benchmarking
+
+| Decision | Reference |
+|----------|-----------|
+| GuideLLM as default provider | `benchmarks/providers.py` |
+| Process-group benchmark terminate | `BenchmarkProvider` |
+
+## Config / vLLM
+
+| Decision | Reference |
+|----------|-----------|
+| Versioned defaults in `schemas/vllm_defaults/` | `version_manager.py` |
+| Config validation in Python (no separate JSON schema) | upstream #110 |
+
+## Tooling
+
+| Decision | Reference |
+|----------|-----------|
+| CI: Ruff + pytest matrix | fork PR #8 |
+| Optuna Dashboard script | fork PR #14 |
+
+Upstream: [openshift-psap/auto-tuning-vllm](https://github.com/openshift-psap/auto-tuning-vllm). Fork emphasizes **local execution**, tests, and dependency control.
diff --git a/.ai/context/known-issues.md b/.ai/context/known-issues.md
@@ -0,0 +1,22 @@
+# Known issues
+
+Update this file whenever a fix is merged; there is no separate triage skill.
+
+| Title | Status | Link | Component | Next action |
+|-------|--------|------|-----------|-------------|
+| GuideLLM + vLLM ≥ 0.20 | open | [#19](https://github.com/InseeFrLab/auto-tuning-vllm/issues/19) | `providers.py`, deps | Merge #17 or document pins |
+| GuideLLM + transformers ≥ 5 | open | [#15](https://github.com/InseeFrLab/auto-tuning-vllm/issues/15) | GuideLLM | Reproduce; track upstream |
+| Orphan vLLM on parent stop | open | [#2](https://github.com/InseeFrLab/auto-tuning-vllm/issues/2) | `trial_controller.py` | Merge #13 |
+| Local backend cleanup | fix pending | [#13](https://github.com/InseeFrLab/auto-tuning-vllm/pull/13) | `backends.py` | Merge PR |
+| Baselines consume `n_trials` | fix pending | [#21](https://github.com/InseeFrLab/auto-tuning-vllm/pull/21) | `study_controller.py` | Merge PR |
+| CI pytest non-blocking | open | `ci.yml` | CI | Remove `\|\| true` when stable |
+| Basic usage tests | open | [#4](https://github.com/InseeFrLab/auto-tuning-vllm/issues/4) | `tests/` | Expand pytest |
+| Ray removal / deprecation | open | [#3](https://github.com/InseeFrLab/auto-tuning-vllm/issues/3) | `backends.py` | Legacy only; local path default |
+
+## Code TODOs
+
+| File | Note |
+|------|------|
+| `cli/main.py` | Sync log streaming |
+| `trial_controller.py` | Remove debug health logging |
+| `config.py` | Split int/float range parameter types |
diff --git a/.ai/context/repo-map.md b/.ai/context/repo-map.md
@@ -0,0 +1,26 @@
+# Repository map
+
+| Path | Role |
+|------|------|
+| `auto_tune_vllm/` | Python package |
+| `auto_tune_vllm/cli/main.py` | Typer CLI: `optimize`, `resume`, `logs` |
+| `auto_tune_vllm/core/config.py` | `StudyConfig.from_file()` |
+| `auto_tune_vllm/core/study_controller.py` | Optuna loop, baselines, concurrency |
+| `auto_tune_vllm/core/trial.py` | `TrialConfig`, `TrialResult` |
+| `auto_tune_vllm/core/parameters.py` | Search-space types |
+| `auto_tune_vllm/core/storage/` | Optuna storage, PostgreSQL helpers |
+| `auto_tune_vllm/execution/backends.py` | `LocalExecutionBackend` (+ legacy Ray class) |
+| `auto_tune_vllm/execution/trial_controller.py` | vLLM + GuideLLM + cleanup |
+| `auto_tune_vllm/benchmarks/` | `GuideLLMBenchmark`, `BenchmarkConfig` |
+| `auto_tune_vllm/logging/` | Centralized trial logs |
+| `auto_tune_vllm/utils/` | Grid cardinality, vLLM CLI, versioned defaults |
+| `auto_tune_vllm/schemas/vllm_defaults/` | Per-version default YAML |
+| `docs/` | `quick_start.md`, `architecture.md`, `configuration.md` |
+| `examples/` | Study YAMLs and demos |
+| `tests/` | Pytest (`core/`, `execution/`) |
+| `optuna_dashboard/` | Dashboard launcher + sample DB |
+| `.github/workflows/ci.yml` | Ruff, pytest matrix |
+| `pyproject.toml` | Dependencies and tooling |
+| `README.md` | Install and usage |
+
+**CLI:** `auto-tune-vllm` → `auto_tune_vllm.cli:main`
diff --git a/.ai/skills/architecture-diagrams.md b/.ai/skills/architecture-diagrams.md
@@ -0,0 +1,28 @@
+# Skill: Architecture diagrams
+
+## User doc (source of truth)
+
+**[`docs/architecture.md`](../../docs/architecture.md)** — Mermaid diagrams for contributors and users. Also linked from `README.md`.
+
+Agent context: [`.ai/context/architecture.md`](../context/architecture.md) (prose only).
+
+## When to update `docs/architecture.md`
+
+| Change | Section |
+|--------|---------|
+| New package / module layout | Repository layout |
+| Study or trial flow | End-to-end flow, Study orchestration, Single trial lifecycle |
+| Storage / logs | Outputs per study |
+| Import graph | Module dependencies |
+
+## Rules
+
+1. Use real module paths (`study_controller.py`).
+2. The default path is the **local** backend; give Ray at most one sentence and no diagrams of its own.
+3. Mermaid only (GitHub renders natively).
+4. PR: note “updated docs/architecture.md” when structure changes.
+
+## Do not
+
+- Duplicate full diagrams under `.ai/context/`.
+- Run `auto-tune-vllm optimize` to validate diagrams.
diff --git a/.ai/skills/docs-writer.md b/.ai/skills/docs-writer.md
@@ -0,0 +1,24 @@
+# Skill: Docs writer
+
+## Scope
+
+| Audience | Files |
+|----------|-------|
+| Users | `README.md`, `docs/quick_start.md`, `docs/configuration.md` |
+| Examples | `examples/*.yaml` |
+| Agents | `.ai/context/*` |
+
+## Rules
+
+1. Runnable commands: `pip install -e .`, `ruff`, `pytest`, `auto-tune-vllm --help` — E2E optimize only in maintainer sections.
+2. YAML keys match `StudyConfig` in `core/config.py`.
+3. Link GitHub issues instead of long incident writeups.
+4. Structural changes → update `docs/architecture.md` per `architecture-diagrams.md`.
+
+## Ray
+
+Legacy user docs live in `docs/ray_*.md`; do not expand Ray in agent context unless deprecating.
+
+## Agents must not document
+
+Never document “run optimize to verify” as an agent step; see `AGENTS.md`.
diff --git a/.ai/skills/pr-reviewer.md b/.ai/skills/pr-reviewer.md
@@ -0,0 +1,70 @@
+# Skill: PR reviewer
+
+Review = **read the diff**, **reason about behavior**, optionally **lint + unit tests**.
+**Never run the autotuner** (`auto-tune-vllm optimize`, `resume`, or any command that starts vLLM / GuideLLM / GPU work). Maintainers run end-to-end studies manually.
+
+## Allowed commands (agents)
+
+```bash
+source venv/bin/activate
+ruff check .
+pytest -v tests/
+# optional: basedpyright (if enabled locally)
+```
+
+## Review workflow
+
+1. Read PR description and linked issues.
+2. Walk changed files; trace call path from `cli/main.py` or `StudyController` when relevant.
+3. Run `ruff check .` and `pytest -v tests/` if environment is available.
+4. Record findings in the output format below.
+
+## Config & CLI
+
+- [ ] `StudyConfig.from_file()` — new fields validated; errors actionable.
+- [ ] `examples/*.yaml` + `docs/configuration.md` aligned.
+- [ ] Typer options in `cli/main.py` documented when added.
+
+## Local execution path (primary)
+
+- [ ] `LocalExecutionBackend` — submit/poll/cancel/cleanup semantics still coherent.
+- [ ] `trial_controller.py` — vLLM + GuideLLM lifecycle, cancellation, `cleanup_resources()`.
+- [ ] No regression for install **without** Ray (`pip install -e .` only).
+
+## Optuna
+
+- [ ] `study.ask()` / `study.tell()` paired; failures → `FAIL` + user attrs.
+- [ ] Baseline vs optimization trial counting (`n_trials`, PR #21 context).
+- [ ] Grid / sampler / multi-objective values consistent.
+
+## Benchmarks & metrics
+
+- [ ] `benchmarks/providers.py` — GuideLLM CLI args from `BenchmarkConfig`.
+- [ ] Objective expressions match `ObjectiveConfig.valid_metrics_combined`.
+
+## Tests & docs
+
+- [ ] New behavior covered in `tests/` without mandatory GPU.
+- [ ] User-facing docs updated when behavior or YAML changes.
+
+## Legacy Ray (only if PR touches `RayExecutionBackend`)
+
+- [ ] Optional import still works; no new hard dependency on `ray` in core install path.
+- [ ] No Ray-specific review steps unless the diff is explicitly Ray-related.
+
+## Output format
+
+```markdown
+### Blockers
+- ...
+
+### Questions
+- ...
+
+### Nits
+- ...
+
+### Checks run
+- [ ] ruff
+- [ ] pytest
+```
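
For the "Local execution path" checks above, it can help to have a reference picture of coherent submit/poll/cancel/cleanup semantics. The following is a toy, stdlib-only sketch: the method names `submit_trial`, `poll_trials`, and `cleanup_all_trials` come from the context files, but their signatures here, the `ToyLocalBackend` class, and the cancellation mechanism are assumptions and do not reproduce `LocalExecutionBackend`.

```python
import threading
import time
from concurrent.futures import Future, ThreadPoolExecutor


class ToyLocalBackend:
    """Hypothetical stand-in for a thread-pool backend; not the project's class."""

    def __init__(self, max_workers: int = 2) -> None:
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._cancel = threading.Event()
        self._futures: dict[int, Future] = {}

    def _run(self, trial_id: int, duration: float) -> str:
        # Stands in for "start server, run benchmark".
        # Cooperative cancel: the worker checks the shared flag between short waits.
        deadline = time.monotonic() + duration
        while time.monotonic() < deadline:
            if self._cancel.wait(timeout=0.01):
                return "cancelled"
        return "completed"

    def submit_trial(self, trial_id: int, duration: float) -> None:
        self._futures[trial_id] = self._pool.submit(self._run, trial_id, duration)

    def poll_trials(self) -> dict[int, str]:
        """Return and forget the results of trials that have finished."""
        done = {tid: f.result() for tid, f in self._futures.items() if f.done()}
        for tid in done:
            del self._futures[tid]
        return done

    def cleanup_all_trials(self) -> dict[int, str]:
        """Ask running trials to stop, wait for them, and release the pool."""
        self._cancel.set()
        results = {tid: f.result() for tid, f in self._futures.items()}
        self._futures.clear()
        self._pool.shutdown(wait=True)
        return results
```

A reviewer can compare a diff against three properties this toy makes explicit: each finished trial is reported by polling exactly once, cancellation is requested rather than forced, and cleanup leaves no running worker behind.
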