diff --git a/.ai/context/architecture.md b/.ai/context/architecture.md
new file mode 100644
index 0000000..8caa790
--- /dev/null
+++ b/.ai/context/architecture.md
@@ -0,0 +1,52 @@
+# Architecture — execution flow
+
+**User diagrams:** [`docs/architecture.md`](../../docs/architecture.md).
+
+## 1. Configuration loading
+
+- YAML study file (reference: `examples/study_config_local_exec.yaml`).
+- `StudyConfig.from_file()` in `core/config.py`: parses the YAML and validates objectives, storage, and parameters.
+- CLI `optimize` (`cli/main.py`) copies the config file next to the SQLite database when `storage_file` is set.
+
+## 2. Study setup
+
+- `StudyController.create_from_config(LocalExecutionBackend, config)`:
+  - Optional PostgreSQL (`storage/postgres_utils.py`).
+  - `get_storage(config)` → SQLite / PostgreSQL (`storage/utils.py`).
+  - Optuna `Study` + sampler (TPE, Grid, NSGA-II, …).
+  - `CentralizedLogger` if a `logging` block is present.
+
+## 3. Study loop
+
+- Baselines: `_run_baseline_trials` → enqueue + run with default/static params.
+- Optimization: `study.ask()` → `TrialConfig` → `backend.submit_trial()` → poll → `study.tell()`.
+- Failures: error classification → trial user attrs.
+- Optional `optimization.log_metrics` → extra user attrs (PR #22).
+
+## 4. Backend (supported path)
+
+**`LocalExecutionBackend`** (`execution/backends.py`): thread pool, `LocalTrialController`, `poll_trials`, `cleanup_all_trials`.
+
+Legacy: `RayExecutionBackend` is kept for upstream compatibility but is not the focus of this fork.
+
+## 5. Single trial
+
+`BaseTrialController.run_trial()` (`execution/trial_controller.py`):
+
+1. Validate imports (vllm, guidellm, optuna).
+2. `GuideLLMBenchmark` from `benchmarks/providers.py`.
+3. `_start_vllm_server()` → `_wait_for_server_ready()`.
+4. State machine: `WAITING_FOR_VLLM` → `RUNNING_BENCHMARK`.
+5. Metrics → objectives; `cleanup_resources()` on exit/cancel.
+
+## 6. Storage & logs
+
+- Optuna: `study.storage_file` or `study.database_url`.
+- Logs: `logging/manager.py` (file and/or DB).
+- Dashboard: `optuna_dashboard/start_optuna_dashboard.sh`.
+
+## 7. Cleanup
+
+- Per trial: kill vLLM + benchmark process group.
+- Study end / interrupt: `backend.cleanup_all_trials()`, `shutdown()`.
+- Known gap: orphan vLLM if parent killed abruptly (issue #2).
diff --git a/.ai/context/current-work.md b/.ai/context/current-work.md
new file mode 100644
index 0000000..4dcda86
--- /dev/null
+++ b/.ai/context/current-work.md
@@ -0,0 +1,41 @@
+# Current work
+
+_Last updated from local `gh` / git — refresh before large changes._
+
+## Open pull requests (InseeFrLab)
+
+| PR | Branch | Objective | Status | Next step |
+|----|--------|-----------|--------|-----------|
+| [#22](https://github.com/InseeFrLab/auto-tuning-vllm/pull/22) | `FEAT/optuna-user-attrs-log-metrics` | `optimization.log_metrics` → Optuna user attrs for dashboard | OPEN | Review + merge; ensure docs/example match `StudyConfig` validation |
+| [#21](https://github.com/InseeFrLab/auto-tuning-vllm/pull/21) | `fix/exclude-baseline-trials-budget` | Baselines must not increment `completed_trials` / consume `n_trials` | OPEN | Merge; run `pytest tests/core/test_study_controller.py` |
+| [#17](https://github.com/InseeFrLab/auto-tuning-vllm/pull/17) | `fix/guidellm-cli-preflight` | GuideLLM CLI preflight + pin `vllm<=0.19` | OPEN | Resolve overlap with issue #19 / current vLLM pin in `pyproject.toml` |
+| [#13](https://github.com/InseeFrLab/auto-tuning-vllm/pull/13) | `fix/local-backend-cleanup` | Cooperative cancel + cleanup on local backend | OPEN | Merge after manual interrupt test |
+
+## Remote branches (not all have open PRs)
+
+| Branch | Notes |
+|--------|--------|
+| `origin/FEAT/custom-metrics` | Merged as #18 on main |
+| `origin/FEAT/grid-cardinality-auto-switch` | Merged as #7 |
+| `origin/FEAT/ray-optional` | Legacy: Ray optional extra (merged) |
+| `origin/add-optuna-dashboard-example` | Dashboard launcher (#14 merged) |
+| `origin/add-startup-timeout-baseline-run` | Startup timeout for baselines — **verify if merged or stale** |
+| `origin/ci-setup` | CI workflow (#8) |
+| `origin/renovate/configure` | Dependency bot config |
+
+## README roadmap (main)
+
+| Item | Status | Next step |
+|------|--------|-----------|
+| Comprehensive test suite | In progress (small `tests/` tree) | Add controller/backend tests following the PR #21 pattern |
+| CI runs tests strictly | Partial | Remove `pytest ... \|\| true` from `ci.yml` once the suite is stable |
+| Dependency pinning / hygiene | Open | Align `pyproject.toml` with supported vLLM/GuideLLM matrix |
+| CLI validation / error messages | Open | Extend `StudyConfig` errors + Typer messages |
+| Speculative decoding params | Future | Design parameter module + example YAML |
+| Extra benchmark providers | Future | Implement a `BenchmarkProvider` subclass |
+
+## Maintainer TODO (fill if stale)
+
+- **Active local branch:** `FEAT/optuna-user-attrs-log-metrics` — confirm whether uncommitted edits on `config.py` / `study_controller.py` belong in PR #22.
+- **Production study configs:** _Add paths or naming convention used internally._
+- **Target vLLM version for production:** _e.g. 0.19 vs 0.20+ — drives issue #19 resolution._
diff --git a/.ai/context/diagrams.md b/.ai/context/diagrams.md
new file mode 100644
index 0000000..efe15f1
--- /dev/null
+++ b/.ai/context/diagrams.md
@@ -0,0 +1,5 @@
+# Diagrams (agents)
+
+User-facing Mermaid diagrams live in **[`docs/architecture.md`](../../docs/architecture.md)**.
+
+When changing structure, update that file (see [`.ai/skills/architecture-diagrams.md`](../skills/architecture-diagrams.md)).
diff --git a/.ai/context/external-links.md b/.ai/context/external-links.md
new file mode 100644
index 0000000..31502c8
--- /dev/null
+++ b/.ai/context/external-links.md
@@ -0,0 +1,37 @@
+# External links
+
+## Repositories
+
+| Resource | URL |
+|----------|-----|
+| Fork | https://github.com/InseeFrLab/auto-tuning-vllm |
+| Upstream | https://github.com/openshift-psap/auto-tuning-vllm |
+| GuideLLM | https://github.com/neuralmagic/guidellm |
+| vLLM | https://github.com/vllm-project/vllm |
+
+## Documentation
+
+| Topic | URL |
+|-------|-----|
+| vLLM | https://docs.vllm.ai/ |
+| Optuna | https://optuna.readthedocs.io/ |
+| Optuna Dashboard | https://github.com/optuna/optuna-dashboard |
+
+## In-repo
+
+| Doc | Path |
+|-----|------|
+| Quick start | `docs/quick_start.md` |
+| Configuration | `docs/configuration.md` |
+| Architecture (diagrams) | `docs/architecture.md` |
+
+## Legacy (Ray — not agent focus)
+
+| Doc | Path |
+|-----|------|
+| Ray cluster | `docs/ray_cluster_setup.md` |
+| Ray auto-start | `docs/ray_auto_start.md` |
+
+## GitHub (fork)
+
+Issues: https://github.com/InseeFrLab/auto-tuning-vllm/issues — see `current-work.md` for open PRs.
diff --git a/.ai/context/history.md b/.ai/context/history.md
new file mode 100644
index 0000000..337c723
--- /dev/null
+++ b/.ai/context/history.md
@@ -0,0 +1,43 @@
+# History — decisions to preserve
+
+## Execution model
+
+| Decision | Reference |
+|----------|-----------|
+| **Local backend is the product path** | `LocalExecutionBackend`; fork README |
+| **Do not kill parent process group** on vLLM cleanup | upstream PR #92 |
+| Ray backend kept as a **legacy / optional** extra | `db1e9ab`; issue #3 (not under active development) |
+
+## Optuna / study
+
+| Decision | Reference |
+|----------|-----------|
+| Baselines visible in Optuna dashboard | upstream PR #111 |
+| Failed trial attrs for sampler | #93, #97 |
+| Constraint sampling | #101 |
+| Grid cardinality auto-switch | fork PR #7 |
+| Custom metric expressions | fork PR #18 |
+| `max_concurrent_trials` naming | upstream #122, #125 |
+
+## Benchmarking
+
+| Decision | Reference |
+|----------|-----------|
+| GuideLLM as default provider | `benchmarks/providers.py` |
+| Terminate the benchmark by its process group | `BenchmarkProvider` |
+
+## Config / vLLM
+
+| Decision | Reference |
+|----------|-----------|
+| Versioned defaults in `schemas/vllm_defaults/` | `version_manager.py` |
+| Config validation in Python (no separate JSON schema) | upstream #110 |
+
+## Tooling
+
+| Decision | Reference |
+|----------|-----------|
+| CI: Ruff + pytest matrix | fork PR #8 |
+| Optuna Dashboard script | fork PR #14 |
+
+Upstream: [openshift-psap/auto-tuning-vllm](https://github.com/openshift-psap/auto-tuning-vllm). The fork emphasizes **local execution**, tests, and dependency control.
diff --git a/.ai/context/known-issues.md b/.ai/context/known-issues.md
new file mode 100644
index 0000000..6c7861b
--- /dev/null
+++ b/.ai/context/known-issues.md
@@ -0,0 +1,22 @@
+# Known issues
+
+Update this file when merging fixes (no separate triage skill).
+
+| Title | Status | Link | Component | Next action |
+|-------|--------|------|-----------|-------------|
+| GuideLLM + vLLM ≥ 0.20 | open | [#19](https://github.com/InseeFrLab/auto-tuning-vllm/issues/19) | `providers.py`, deps | Merge #17 or document pins |
+| GuideLLM + transformers ≥ 5 | open | [#15](https://github.com/InseeFrLab/auto-tuning-vllm/issues/15) | GuideLLM | Reproduce; track upstream |
+| Orphan vLLM on parent stop | open | [#2](https://github.com/InseeFrLab/auto-tuning-vllm/issues/2) | `trial_controller.py` | Merge #13 |
+| Local backend cleanup | fix pending | [#13](https://github.com/InseeFrLab/auto-tuning-vllm/pull/13) | `backends.py` | Merge PR |
+| Baselines consume `n_trials` | fix pending | [#21](https://github.com/InseeFrLab/auto-tuning-vllm/pull/21) | `study_controller.py` | Merge PR |
+| CI pytest non-blocking | open | `ci.yml` | CI | Remove `\|\| true` when stable |
+| Basic usage tests | open | [#4](https://github.com/InseeFrLab/auto-tuning-vllm/issues/4) | `tests/` | Expand pytest |
+| Ray removal / deprecation | open | [#3](https://github.com/InseeFrLab/auto-tuning-vllm/issues/3) | `backends.py` | Legacy only; local path default |
+
+## Code TODOs
+
+| File | Note |
+|------|------|
+| `cli/main.py` | Sync log streaming |
+| `trial_controller.py` | Remove debug health logging |
+| `config.py` | Split int/float range parameter types |
diff --git a/.ai/context/repo-map.md b/.ai/context/repo-map.md
new file mode 100644
index 0000000..c1f0617
--- /dev/null
+++ b/.ai/context/repo-map.md
@@ -0,0 +1,26 @@
+# Repository map
+
+| Path | Role |
+|------|------|
+| `auto_tune_vllm/` | Python package |
+| `auto_tune_vllm/cli/main.py` | Typer CLI: `optimize`, `resume`, `logs` |
+| `auto_tune_vllm/core/config.py` | `StudyConfig.from_file()` |
+| `auto_tune_vllm/core/study_controller.py` | Optuna loop, baselines, concurrency |
+| `auto_tune_vllm/core/trial.py` | `TrialConfig`, `TrialResult` |
+| `auto_tune_vllm/core/parameters.py` | Search-space types |
+| `auto_tune_vllm/core/storage/` | Optuna storage, PostgreSQL helpers |
+| `auto_tune_vllm/execution/backends.py` | `LocalExecutionBackend` (+ legacy Ray class) |
+| `auto_tune_vllm/execution/trial_controller.py` | vLLM + GuideLLM + cleanup |
+| `auto_tune_vllm/benchmarks/` | `GuideLLMBenchmark`, `BenchmarkConfig` |
+| `auto_tune_vllm/logging/` | Centralized trial logs |
+| `auto_tune_vllm/utils/` | Grid cardinality, vLLM CLI, versioned defaults |
+| `auto_tune_vllm/schemas/vllm_defaults/` | Per-version default YAML |
+| `docs/` | `quick_start.md`, `architecture.md`, `configuration.md` |
+| `examples/` | Study YAMLs and demos |
+| `tests/` | Pytest (`core/`, `execution/`) |
+| `optuna_dashboard/` | Dashboard launcher + sample DB |
+| `.github/workflows/ci.yml` | Ruff, pytest matrix |
+| `pyproject.toml` | Dependencies and tooling |
+| `README.md` | Install and usage |
+
+**CLI:** `auto-tune-vllm` → `auto_tune_vllm.cli:main`
diff --git a/.ai/skills/architecture-diagrams.md b/.ai/skills/architecture-diagrams.md
new file mode 100644
index 0000000..3a7cc6f
--- /dev/null
+++ b/.ai/skills/architecture-diagrams.md
@@ -0,0 +1,28 @@
+# Skill: Architecture diagrams
+
+## User doc (source of truth)
+
+**[`docs/architecture.md`](../../docs/architecture.md)** — Mermaid diagrams for contributors and users. Also linked from `README.md`.
+
+Agent context: [`.ai/context/architecture.md`](../context/architecture.md) (prose only).
+
+## When to update `docs/architecture.md`
+
+| Change | Section |
+|--------|---------|
+| New package / module layout | Repository layout |
+| Study or trial flow | End-to-end flow, Study orchestration, Single trial lifecycle |
+| Storage / logs | Outputs per study |
+| Import graph | Module dependencies |
+
+## Rules
+
+1. Use real module paths (`study_controller.py`).
+2. The default path is the **local** backend; mention Ray in at most one sentence and add no Ray diagrams.
+3. Mermaid only (GitHub renders it natively).
+4. In the PR description, note “updated docs/architecture.md” when the structure changes.
+
+## Do not
+
+- Duplicate full diagrams under `.ai/context/`.
+- Run `auto-tune-vllm optimize` to validate diagrams.
diff --git a/.ai/skills/docs-writer.md b/.ai/skills/docs-writer.md
new file mode 100644
index 0000000..b820076
--- /dev/null
+++ b/.ai/skills/docs-writer.md
@@ -0,0 +1,24 @@
+# Skill: Docs writer
+
+## Scope
+
+| Audience | Files |
+|----------|-------|
+| Users | `README.md`, `docs/quick_start.md`, `docs/configuration.md` |
+| Examples | `examples/*.yaml` |
+| Agents | `.ai/context/*` |
+
+## Rules
+
+1. Runnable commands: `pip install -e .`, `ruff`, `pytest`, `auto-tune-vllm --help`; describe end-to-end `optimize` runs only in maintainer sections.
+2. YAML keys match `StudyConfig` in `core/config.py`.
+3. Link GitHub issues instead of long incident writeups.
+4. Structural changes → update `docs/architecture.md` per `architecture-diagrams.md`.
+
+## Ray
+
+Legacy user docs live in `docs/ray_*.md`; do not expand Ray in agent context unless deprecating.
+
+## Agents must not document
+
+“Run optimize to verify” as an agent step — see `AGENTS.md`.
diff --git a/.ai/skills/pr-reviewer.md b/.ai/skills/pr-reviewer.md
new file mode 100644
index 0000000..06df62b
--- /dev/null
+++ b/.ai/skills/pr-reviewer.md
@@ -0,0 +1,70 @@
+# Skill: PR reviewer
+
+Review = **read the diff**, **reason about behavior**, optionally **lint + unit tests**.
+**Never run the autotuner** (`auto-tune-vllm optimize`, `resume`, or any command that starts vLLM / GuideLLM / GPU work). Maintainers run end-to-end studies manually.
+
+## Allowed commands (agents)
+
+```bash
+source venv/bin/activate
+ruff check .
+pytest -v tests/
+# optional: basedpyright (if enabled locally)
+```
+
+## Review workflow
+
+1. Read PR description and linked issues.
+2. Walk changed files; trace call path from `cli/main.py` or `StudyController` when relevant.
+3. Run `ruff check .` and `pytest -v tests/` if the environment is available.
+4. Record findings in the output format below.
+
+## Config & CLI
+
+- [ ] `StudyConfig.from_file()` — new fields validated; errors actionable.
+- [ ] `examples/*.yaml` + `docs/configuration.md` aligned.
+- [ ] Typer options in `cli/main.py` documented when added.
+
+## Local execution path (primary)
+
+- [ ] `LocalExecutionBackend` — submit/poll/cancel/cleanup semantics still coherent.
+- [ ] `trial_controller.py` — vLLM + GuideLLM lifecycle, cancellation, `cleanup_resources()`.
+- [ ] No regression for install **without** Ray (`pip install -e .` only).
+
+## Optuna
+
+- [ ] `study.ask()` / `study.tell()` paired; failures → `FAIL` + user attrs.
+- [ ] Baseline vs optimization trial counting (`n_trials`, PR #21 context).
+- [ ] Grid / sampler / multi-objective values consistent.
+
+## Benchmarks & metrics
+
+- [ ] `benchmarks/providers.py`: GuideLLM CLI args correctly derived from `BenchmarkConfig`.
+- [ ] Objective expressions match `ObjectiveConfig.valid_metrics_combined`.
+
+## Tests & docs
+
+- [ ] New behavior covered in `tests/` without requiring a GPU.
+- [ ] User-facing docs updated when behavior or YAML changes.
+
+## Legacy Ray (only if PR touches `RayExecutionBackend`)
+
+- [ ] Optional import still works; no new hard dependency on `ray` in core install path.
+- [ ] No Ray-specific review steps unless the diff is explicitly Ray-related.
+
+## Output format
+
+```markdown
+### Blockers
+- ...
+
+### Questions
+- ...
+
+### Nits
+- ...
+
+### Checks run
+- [ ] ruff
+- [ ] pytest
+```
diff --git a/.ai/skills/pr-writer.md b/.ai/skills/pr-writer.md
new file mode 100644
index 0000000..9b98da2
--- /dev/null
+++ b/.ai/skills/pr-writer.md
@@ -0,0 +1,49 @@
+# Skill: PR writer
+
+## Before opening
+
+1. Branch from `main`; one logical change per PR.
+2. Run `ruff check .` and `pytest -v tests/` inside the `venv`.
+3. Update `docs/configuration.md` + `examples/*.yaml` if schema or CLI changed.
+4. **Do not** list `auto-tune-vllm optimize` as agent-run validation; maintainers run E2E manually.
+
+## Title convention
+
+- `[FEAT]` / `[FIX]` / `[CI]` / `[Docs]`
+
+## Template
+
+```markdown
+## Summary
+<what changed>
+
+## Why
+<problem or goal>
+
+## What changed
+- `path/module.py` — …
+- `tests/...` — …
+- `docs/` or `examples/` — …
+
+## How tested
+- [ ] `ruff check .`
+- [ ] `pytest -v tests/...`
+- [ ] Manual E2E (maintainer): `auto-tune-vllm optimize …`
+
+## Risks / limitations
+- …
+
+## Links
+- Closes #…
+```
+
+## Fork checks
+
+- Baseline vs `n_trials` counting if `study_controller.py` is touched.
+- Optuna storage (SQLite vs PostgreSQL).
+- `pyproject.toml` pins (vLLM / GuideLLM).
+- Structural change → update `docs/architecture.md` (see `architecture-diagrams.md`).
+
+## After merge
+
+Update `.ai/context/current-work.md` and `.ai/context/known-issues.md` when relevant.
diff --git a/.ai/skills/test-writer.md b/.ai/skills/test-writer.md
new file mode 100644
index 0000000..4117e18
--- /dev/null
+++ b/.ai/skills/test-writer.md
@@ -0,0 +1,24 @@
+# Skill: Test writer
+
+## Run
+
+```bash
+source venv/bin/activate
+pytest -v tests/
+```
+
+**Do not** use full `auto-tune-vllm optimize` in tests or agent workflows.
+
+## Priority
+
+1. `StudyConfig.from_file` / validation (`tests/core/`)
+2. Metric expressions (`tests/execution/test_evaluate_metric_expression.py`)
+3. `StudyController` with fake `ExecutionBackend` (no vLLM)
+4. Mock `subprocess` for GuideLLM in `providers.py`
+5. GPU integration only when explicitly requested by maintainer
+
+## Patterns
+
+- Optuna: `sqlite:///:memory:`
+- Fake backend: return a `TrialResult` on the first `poll_trials()` call
+- No CUDA in default CI matrix
diff --git a/AGENTS.md b/AGENTS.md
new file mode 100644
index 0000000..1c8c246
--- /dev/null
+++ b/AGENTS.md
@@ -0,0 +1,46 @@
+# Agent guide — auto-tuning-vllm (InseeFrLab fork)
+
+Hyperparameter optimization for **vLLM** serving: YAML configs, **Optuna**, **GuideLLM** benchmarks, **local** execution (`LocalExecutionBackend`). PostgreSQL storage optional.
+
+> **Agents:** do not run `auto-tune-vllm optimize` / `resume` (GPU, vLLM, long jobs). Use lint + `pytest` only.
+
+## Context files
+
+| File | Purpose |
+|------|---------|
+| [.ai/context/repo-map.md](.ai/context/repo-map.md) | Directories and entry points |
+| [.ai/context/architecture.md](.ai/context/architecture.md) | Execution flow (prose); diagrams in `docs/architecture.md` |
+| [.ai/context/current-work.md](.ai/context/current-work.md) | Open PRs, roadmap |
+| [.ai/context/known-issues.md](.ai/context/known-issues.md) | Bugs and limitations |
+| [.ai/context/history.md](.ai/context/history.md) | Design decisions |
+| [.ai/context/external-links.md](.ai/context/external-links.md) | External docs |
+
+## Skills
+
+| Skill | Use when |
+|-------|----------|
+| [.ai/skills/pr-writer.md](.ai/skills/pr-writer.md) | Drafting a PR |
+| [.ai/skills/pr-reviewer.md](.ai/skills/pr-reviewer.md) | Reviewing a PR (diff + ruff + pytest) |
+| [.ai/skills/test-writer.md](.ai/skills/test-writer.md) | Adding tests |
+| [.ai/skills/docs-writer.md](.ai/skills/docs-writer.md) | README / `docs/` |
+| [.ai/skills/architecture-diagrams.md](.ai/skills/architecture-diagrams.md) | Updating `docs/architecture.md` |
+
+## Priorities
+
+1. **Local backend** — `LocalExecutionBackend`, `BaseTrialController` / `LocalTrialController`.
+2. **Config** — `StudyConfig`; sync `docs/configuration.md` + `examples/*.yaml`.
+3. **Trial lifecycle** — vLLM subprocess, GuideLLM, cancellation, cleanup (`trial_controller.py`, `backends.py`).
+4. **Optuna** — `ask`/`tell`, baselines, `core/storage/utils.py`.
+5. **Tests** — `pytest` under `tests/`; no GPU in default CI.
+6. **Small diffs** — match existing patterns.
+
+## Commands (safe for agents)
+
+```bash
+source venv/bin/activate
+pip install -e ".[dev]"
+ruff check .
+pytest -v tests/
+```
+
+Fork: https://github.com/InseeFrLab/auto-tuning-vllm
diff --git a/README.md b/README.md
index f5b908a..2dfa2ab 100644
--- a/README.md
+++ b/README.md
@@ -70,6 +70,7 @@ auto-tune-vllm logs --study-name study_35884
 ## Documentation
 
 - [Quick Start Guide](docs/quick_start.md) - Get running in 5 minutes
+- [Architecture overview](docs/architecture.md) - How the framework works (diagrams)
 - [Configuration Reference](docs/configuration.md) - Complete YAML configuration guide
 - [Ray Cluster Setup](docs/ray_cluster_setup.md) - For distributed optimization (optional)
 
diff --git a/docs/architecture.md b/docs/architecture.md
new file mode 100644
index 0000000..9e614bb
--- /dev/null
+++ b/docs/architecture.md
@@ -0,0 +1,183 @@
+# Architecture overview
+
+How **auto-tune-vllm** fits together: YAML configuration, Optuna studies, local trial execution, vLLM serving, and GuideLLM benchmarks.
+
+The default path uses **`LocalExecutionBackend`** on a single machine. An optional Ray backend exists for legacy distributed setups (see [Ray Cluster Setup](ray_cluster_setup.md)).
+
+---
+
+## End-to-end flow
+
+1. You provide a **study YAML** (see [Configuration Reference](configuration.md)).
+2. The CLI loads **`StudyConfig`** and creates a **`StudyController`** with an Optuna study (SQLite or PostgreSQL).
+3. Optional **baseline trials** run with default vLLM parameters.
+4. The optimizer loops: suggest parameters → run trial → record metrics in Optuna.
+5. Each trial starts **vLLM**, waits until healthy, runs **GuideLLM**, then cleans up processes.
+
+```mermaid
+flowchart TD
+  A[Study YAML] --> B["StudyConfig.from_file()"]
+  B --> C["auto-tune-vllm optimize / resume"]
+  C --> D["StudyController"]
+  D --> E["Optuna study + storage"]
+  D --> F["LocalExecutionBackend"]
+
+  F --> G[Baseline trials]
+  G --> H["Loop: ask → run trial → tell"]
+
+  H --> I[TrialConfig]
+  I --> J["TrialController.run_trial()"]
+
+  J --> K[vLLM subprocess]
+  K --> L[Server healthy]
+  L --> M[GuideLLM benchmark]
+  M --> N[Objectives + metrics]
+  N --> P[Cleanup]
+  P --> O["Optuna tell()"]
+
+  E -.-> O
+```
+
+---
+
+## Repository layout
+
+```mermaid
+flowchart TB
+  subgraph root["Repository"]
+    README["README.md"]
+    PY["pyproject.toml"]
+    PKG["auto_tune_vllm/"]
+    DOCS["docs/"]
+    EX["examples/"]
+    TESTS["tests/"]
+    DASH["optuna_dashboard/"]
+  end
+
+  subgraph pkg["auto_tune_vllm package"]
+    CLI["cli/"]
+    CORE["core/"]
+    EXEC["execution/"]
+    BENCH["benchmarks/"]
+    LOG["logging/"]
+    UTIL["utils/"]
+  end
+
+  PKG --> CLI
+  PKG --> CORE
+  PKG --> EXEC
+  PKG --> BENCH
+  PKG --> LOG
+  PKG --> UTIL
+
+  CORE --> CFG["config.py — YAML model"]
+  CORE --> SC["study_controller.py — Optuna loop"]
+  EXEC --> BE["backends.py — local execution"]
+  EXEC --> TC["trial_controller.py — vLLM + benchmark"]
+  BENCH --> PROV["providers.py — GuideLLM"]
+```
+
+| Area | Responsibility |
+|------|----------------|
+| `cli/` | Commands: `optimize`, `resume`, `logs` |
+| `core/` | Config, study orchestration, Optuna storage |
+| `execution/` | Backends and per-trial runtime |
+| `benchmarks/` | GuideLLM integration |
+| `logging/` | Centralized trial logs |
+| `examples/` | Sample study YAML files |
+
+---
+
+## Study orchestration
+
+```mermaid
+sequenceDiagram
+  participant User
+  participant CLI as CLI
+  participant SC as StudyController
+  participant BE as LocalExecutionBackend
+  participant TC as TrialController
+  participant O as Optuna
+
+  User->>CLI: optimize --config study.yaml
+  CLI->>SC: create_from_config()
+  SC->>O: create or load study
+  SC->>SC: run baselines
+  loop optimization trials
+    SC->>O: ask()
+    SC->>BE: submit_trial()
+    BE->>TC: run_trial()
+    TC-->>BE: TrialResult
+    BE-->>SC: completed results (poll_trials)
+    SC->>O: tell(metric values)
+  end
+  CLI->>BE: cleanup / shutdown
+```
+
+Concurrency is controlled by **`--max-concurrent-trials`**: several trials may run in parallel, each with its own vLLM process, so parallelism is bounded by available GPU memory.
+
+---
+
+## Single trial lifecycle
+
+```mermaid
+stateDiagram-v2
+  [*] --> ValidateEnv
+  ValidateEnv --> StartVLLM
+  StartVLLM --> WaitReady: process started
+  WaitReady --> RunBenchmark: HTTP health OK
+  RunBenchmark --> ParseMetrics: GuideLLM finished
+  ParseMetrics --> Cleanup
+  Cleanup --> [*]
+
+  StartVLLM --> Cleanup: error or cancel
+  WaitReady --> Cleanup: timeout or cancel
+  RunBenchmark --> Cleanup: error or cancel
+```
+
+On failure, error details are stored on the Optuna trial (user attributes) to help the sampler avoid repeating bad configurations.
+
+---
+
+## Module dependencies (simplified)
+
+```mermaid
+flowchart LR
+  CLI["cli/main.py"] --> CFG["core/config.py"]
+  CLI --> SC["core/study_controller.py"]
+  CLI --> BE["execution/backends.py"]
+
+  SC --> CFG
+  SC --> BE
+  BE --> TC["execution/trial_controller.py"]
+  TC --> PROV["benchmarks/providers.py"]
+  TC --> LOGM["logging/manager.py"]
+```
+
+---
+
+## Outputs per study
+
+```mermaid
+flowchart LR
+  YAML[study_config.yaml] --> DB[(Optuna DB)]
+  YAML --> LOGS[Trial log directory]
+  TC2[Trial run] --> VLOG[vLLM logs]
+  TC2 --> GJSON[GuideLLM results]
+  GJSON --> MET[Metrics]
+  MET --> DB
+```
+
+Typical locations:
+
+- **Optuna database** — path from `study.storage_file` (SQLite) or `study.database_url` (PostgreSQL).
+- **Logs** — `logging.file_path` in your YAML.
+- **Dashboard** — `./optuna_dashboard/start_optuna_dashboard.sh path/to/study.db`
+
+---
+
+## Related docs
+
+- [Quick Start](quick_start.md)
+- [Configuration Reference](configuration.md)
+- [Ray Cluster Setup](ray_cluster_setup.md) (optional, legacy distributed path)
diff --git a/docs/quick_start.md b/docs/quick_start.md
index 72c394b..5928eb8 100644
--- a/docs/quick_start.md
+++ b/docs/quick_start.md
@@ -51,6 +51,8 @@ auto-tune-vllm --help
 
 Start from [`examples/study_config_local_exec.yaml`](../examples/study_config_local_exec.yaml) for a full example configuration file.
 
+For a visual overview of the runtime, see [Architecture overview](architecture.md).
+
 Key configuration areas:
 - Set/confirm the study name and model ([Study Configuration](configuration.md#study-configuration))
 - Choose the optimization objective(s) (e.g., throughput) ([Optimization Configuration](configuration.md#optimization-configuration))
diff --git a/examples/study_config_local_exec.yaml b/examples/study_config_local_exec.yaml
index ecc83b5..6a8a2e6 100644
--- a/examples/study_config_local_exec.yaml
+++ b/examples/study_config_local_exec.yaml
@@ -63,8 +63,9 @@ parameters: # parameters to optimize
     enabled: true
     options: [0, 256, 512, 1024, 2048, 4096, 8192, 1150]
 
+  # Unsupported on vLLM V1 (see static_environment_variables.VLLM_USE_V1 below).
   max_num_partial_prefills:
-    enabled: true
+    enabled: false
     options: [1, 2, 4, 8]
 
   max_seq_len_to_capture: