diff --git a/.ai/context/architecture.md b/.ai/context/architecture.md
new file mode 100644
index 0000000..8caa790
--- /dev/null
+++ b/.ai/context/architecture.md
@@ -0,0 +1,52 @@
+# Architecture — execution flow
+
+**User diagrams:** [`docs/architecture.md`](../../docs/architecture.md).
+
+## 1. Configuration loading
+
+- YAML study file (`examples/study_config_local_exec.yaml` reference).
+- `StudyConfig.from_file()` in `core/config.py` — parse, validate objectives, storage, parameters.
+- CLI `optimize` (`cli/main.py`) copies config beside SQLite when `storage_file` is set.
+
+## 2. Study setup
+
+- `StudyController.create_from_config(LocalExecutionBackend, config)`:
+  - Optional PostgreSQL (`storage/postgres_utils.py`).
+  - `get_storage(config)` → SQLite / PostgreSQL (`storage/utils.py`).
+  - Optuna `Study` + sampler (TPE, Grid, NSGA-II, …).
+  - `CentralizedLogger` if logging block present.
+
+## 3. Study loop
+
+- Baselines: `_run_baseline_trials` → enqueue + run with default/static params.
+- Optimization: `study.ask()` → `TrialConfig` → `backend.submit_trial()` → poll → `study.tell()`.
+- Failures: error classification → trial user attrs.
+- Optional `optimization.log_metrics` → extra user attrs (PR #22).
+
+## 4. Backend (supported path)
+
+**`LocalExecutionBackend`** (`execution/backends.py`): thread pool, `LocalTrialController`, `poll_trials`, `cleanup_all_trials`.
+
+Legacy: `RayExecutionBackend` exists for upstream compatibility; not the fork focus.
+
+## 5. Single trial
+
+`BaseTrialController.run_trial()` (`execution/trial_controller.py`):
+
+1. Validate imports (vllm, guidellm, optuna).
+2. `GuideLLMBenchmark` from `benchmarks/providers.py`.
+3. `_start_vllm_server()` → `_wait_for_server_ready()`.
+4. State machine: `WAITING_FOR_VLLM` → `RUNNING_BENCHMARK`.
+5. Metrics → objectives; `cleanup_resources()` on exit/cancel.
+
+## 6. Storage & logs
+
+- Optuna: `study.storage_file` or `study.database_url`.
+- Logs: `logging/manager.py` (file and/or DB).
+- Dashboard: `optuna_dashboard/start_optuna_dashboard.sh`.
+
+## 7. Cleanup
+
+- Per trial: kill vLLM + benchmark process group.
+- Study end / interrupt: `backend.cleanup_all_trials()`, `shutdown()`.
+- Known gap: orphan vLLM if parent killed abruptly (issue #2).
diff --git a/.ai/context/current-work.md b/.ai/context/current-work.md
new file mode 100644
index 0000000..4dcda86
--- /dev/null
+++ b/.ai/context/current-work.md
@@ -0,0 +1,41 @@
+# Current work
+
+_Last updated from local `gh` / git — refresh before large changes._
+
+## Open pull requests (InseeFrLab)
+
+| PR | Branch | Objective | Status | Next step |
+|----|--------|-----------|--------|-----------|
+| [#22](https://github.com/InseeFrLab/auto-tuning-vllm/pull/22) | `FEAT/optuna-user-attrs-log-metrics` | `optimization.log_metrics` → Optuna user attrs for dashboard | OPEN | Review + merge; ensure docs/example match `StudyConfig` validation |
+| [#21](https://github.com/InseeFrLab/auto-tuning-vllm/pull/21) | `fix/exclude-baseline-trials-budget` | Baselines must not increment `completed_trials` / consume `n_trials` | OPEN | Merge; run `pytest tests/core/test_study_controller.py` |
+| [#17](https://github.com/InseeFrLab/auto-tuning-vllm/pull/17) | `fix/guidellm-cli-preflight` | GuideLLM CLI preflight + pin `vllm<=0.19` | OPEN | Resolve overlap with issue #19 / current `pyproject` vllm pin |
+| [#13](https://github.com/InseeFrLab/auto-tuning-vllm/pull/13) | `fix/local-backend-cleanup` | Cooperative cancel + cleanup on local backend | OPEN | Merge after manual interrupt test |
+
+## Remote branches (not all have open PRs)
+
+| Branch | Notes |
+|--------|--------|
+| `origin/FEAT/custom-metrics` | Merged as #18 on main |
+| `origin/FEAT/grid-cardinality-auto-switch` | Merged as #7 |
+| `origin/FEAT/ray-optional` | Legacy: Ray optional extra (merged) |
+| `origin/add-optuna-dashboard-example` | Dashboard launcher (#14 merged) |
+| `origin/add-startup-timeout-baseline-run` | Startup timeout for baselines — **verify if merged or stale** |
+| `origin/ci-setup` | CI workflow (#8) |
+| `origin/renovate/configure` | Dependency bot config |
+
+## README roadmap (main)
+
+| Item | Status | Next step |
+|------|--------|-----------|
+| Comprehensive test suite | In progress (small `tests/` tree) | Add controller/backend tests per PR #21 pattern |
+| CI runs tests strictly | Partial | Remove `pytest ... \|\| true` in `ci.yml` when suite is stable |
+| Dependency pinning / hygiene | Open | Align `pyproject.toml` with supported vLLM/GuideLLM matrix |
+| CLI validation / error messages | Open | Extend `StudyConfig` errors + Typer messages |
+| Speculative decoding params | Future | Design parameter module + example YAML |
+| Extra benchmark providers | Future | Implement `BenchmarkProvider` subclass |
+
+## Maintainer TODO (fill if stale)
+
+- **Active local branch:** `FEAT/optuna-user-attrs-log-metrics` — confirm whether uncommitted edits on `config.py` / `study_controller.py` belong in PR #22.
+- **Production study configs:** _Add paths or naming convention used internally._
+- **Target vLLM version for production:** _e.g. 0.19 vs 0.20+ — drives issue #19 resolution._
diff --git a/.ai/context/diagrams.md b/.ai/context/diagrams.md
new file mode 100644
index 0000000..efe15f1
--- /dev/null
+++ b/.ai/context/diagrams.md
@@ -0,0 +1,5 @@
+# Diagrams (agents)
+
+User-facing Mermaid diagrams live in **[`docs/architecture.md`](../../docs/architecture.md)**.
+
+When changing structure, update that file (see [`.ai/skills/architecture-diagrams.md`](../skills/architecture-diagrams.md)).
diff --git a/.ai/context/external-links.md b/.ai/context/external-links.md
new file mode 100644
index 0000000..31502c8
--- /dev/null
+++ b/.ai/context/external-links.md
@@ -0,0 +1,37 @@
+# External links
+
+## Repositories
+
+| Resource | URL |
+|----------|-----|
+| Fork | https://github.com/InseeFrLab/auto-tuning-vllm |
+| Upstream | https://github.com/openshift-psap/auto-tuning-vllm |
+| GuideLLM | https://github.com/neuralmagic/guidellm |
+| vLLM | https://github.com/vllm-project/vllm |
+
+## Documentation
+
+| Topic | URL |
+|-------|-----|
+| vLLM | https://docs.vllm.ai/ |
+| Optuna | https://optuna.readthedocs.io/ |
+| Optuna Dashboard | https://github.com/optuna/optuna-dashboard |
+
+## In-repo
+
+| Doc | Path |
+|-----|------|
+| Quick start | `docs/quick_start.md` |
+| Configuration | `docs/configuration.md` |
+| Architecture (diagrams) | `docs/architecture.md` |
+
+## Legacy (Ray — not agent focus)
+
+| Doc | Path |
+|-----|------|
+| Ray cluster | `docs/ray_cluster_setup.md` |
+| Ray auto-start | `docs/ray_auto_start.md` |
+
+## GitHub (fork)
+
+Issues: https://github.com/InseeFrLab/auto-tuning-vllm/issues — see `current-work.md` for open PRs.
diff --git a/.ai/context/history.md b/.ai/context/history.md
new file mode 100644
index 0000000..337c723
--- /dev/null
+++ b/.ai/context/history.md
@@ -0,0 +1,43 @@
+# History — decisions to preserve
+
+## Execution model
+
+| Decision | Reference |
+|----------|-----------|
+| **Local backend is the product path** | `LocalExecutionBackend`; fork README |
+| **Do not kill parent process group** on vLLM cleanup | upstream PR #92 |
+| Ray backend kept as **legacy / optional** extra | `db1e9ab`; issue #3 — not primary development |
+
+## Optuna / study
+
+| Decision | Reference |
+|----------|-----------|
+| Baselines visible in Optuna dashboard | upstream PR #111 |
+| Failed trial attrs for sampler | #93, #97 |
+| Constraint sampling | #101 |
+| Grid cardinality auto-switch | fork PR #7 |
+| Custom metric expressions | fork PR #18 |
+| `max_concurrent_trials` naming | upstream #122, #125 |
+
+## Benchmarking
+
+| Decision | Reference |
+|----------|-----------|
+| GuideLLM as default provider | `benchmarks/providers.py` |
+| Process-group benchmark terminate | `BenchmarkProvider` |
+
+## Config / vLLM
+
+| Decision | Reference |
+|----------|-----------|
+| Versioned defaults in `schemas/vllm_defaults/` | `version_manager.py` |
+| Config validation in Python (no separate JSON schema) | upstream #110 |
+
+## Tooling
+
+| Decision | Reference |
+|----------|-----------|
+| CI: Ruff + pytest matrix | fork PR #8 |
+| Optuna Dashboard script | fork PR #14 |
+
+Upstream: [openshift-psap/auto-tuning-vllm](https://github.com/openshift-psap/auto-tuning-vllm). Fork emphasizes **local execution**, tests, and dependency control.
diff --git a/.ai/context/known-issues.md b/.ai/context/known-issues.md
new file mode 100644
index 0000000..6c7861b
--- /dev/null
+++ b/.ai/context/known-issues.md
@@ -0,0 +1,22 @@
+# Known issues
+
+Update this file when merging fixes (no separate triage skill).
+
+| Title | Status | Link | Component | Next action |
+|-------|--------|------|-----------|-------------|
+| GuideLLM + vLLM ≥ 0.20 | open | [#19](https://github.com/InseeFrLab/auto-tuning-vllm/issues/19) | `providers.py`, deps | Merge #17 or document pins |
+| GuideLLM + transformers ≥ 5 | open | [#15](https://github.com/InseeFrLab/auto-tuning-vllm/issues/15) | GuideLLM | Reproduce; track upstream |
+| Orphan vLLM on parent stop | open | [#2](https://github.com/InseeFrLab/auto-tuning-vllm/issues/2) | `trial_controller.py` | Merge #13 |
+| Local backend cleanup | fix pending | [#13](https://github.com/InseeFrLab/auto-tuning-vllm/pull/13) | `backends.py` | Merge PR |
+| Baselines consume `n_trials` | fix pending | [#21](https://github.com/InseeFrLab/auto-tuning-vllm/pull/21) | `study_controller.py` | Merge PR |
+| CI pytest non-blocking | open | `ci.yml` | CI | Remove `\|\| true` when stable |
+| Basic usage tests | open | [#4](https://github.com/InseeFrLab/auto-tuning-vllm/issues/4) | `tests/` | Expand pytest |
+| Ray removal / deprecation | open | [#3](https://github.com/InseeFrLab/auto-tuning-vllm/issues/3) | `backends.py` | Legacy only; local path default |
+
+## Code TODOs
+
+| File | Note |
+|------|------|
+| `cli/main.py` | Sync log streaming |
+| `trial_controller.py` | Remove debug health logging |
+| `config.py` | Split int/float range parameter types |
diff --git a/.ai/context/repo-map.md b/.ai/context/repo-map.md
new file mode 100644
index 0000000..c1f0617
--- /dev/null
+++ b/.ai/context/repo-map.md
@@ -0,0 +1,26 @@
+# Repository map
+
+| Path | Role |
+|------|------|
+| `auto_tune_vllm/` | Python package |
+| `auto_tune_vllm/cli/main.py` | Typer CLI: `optimize`, `resume`, `logs` |
+| `auto_tune_vllm/core/config.py` | `StudyConfig.from_file()` |
+| `auto_tune_vllm/core/study_controller.py` | Optuna loop, baselines, concurrency |
+| `auto_tune_vllm/core/trial.py` | `TrialConfig`, `TrialResult` |
+| `auto_tune_vllm/core/parameters.py` | Search-space types |
+| `auto_tune_vllm/core/storage/` | Optuna storage, PostgreSQL helpers |
+| `auto_tune_vllm/execution/backends.py` | `LocalExecutionBackend` (+ legacy Ray class) |
+| `auto_tune_vllm/execution/trial_controller.py` | vLLM + GuideLLM + cleanup |
+| `auto_tune_vllm/benchmarks/` | `GuideLLMBenchmark`, `BenchmarkConfig` |
+| `auto_tune_vllm/logging/` | Centralized trial logs |
+| `auto_tune_vllm/utils/` | Grid cardinality, vLLM CLI, versioned defaults |
+| `auto_tune_vllm/schemas/vllm_defaults/` | Per-version default YAML |
+| `docs/` | `quick_start.md`, `architecture.md`, `configuration.md` |
+| `examples/` | Study YAMLs and demos |
+| `tests/` | Pytest (`core/`, `execution/`) |
+| `optuna_dashboard/` | Dashboard launcher + sample DB |
+| `.github/workflows/ci.yml` | Ruff, pytest matrix |
+| `pyproject.toml` | Dependencies and tooling |
+| `README.md` | Install and usage |
+
+**CLI:** `auto-tune-vllm` → `auto_tune_vllm.cli:main`
diff --git a/.ai/skills/architecture-diagrams.md b/.ai/skills/architecture-diagrams.md
new file mode 100644
index 0000000..3a7cc6f
--- /dev/null
+++ b/.ai/skills/architecture-diagrams.md
@@ -0,0 +1,28 @@
+# Skill: Architecture diagrams
+
+## User doc (source of truth)
+
+**[`docs/architecture.md`](../../docs/architecture.md)** — Mermaid diagrams for contributors and users. Also linked from `README.md`.
+
+Agent context: [`.ai/context/architecture.md`](../context/architecture.md) (prose only).
+
+## When to update `docs/architecture.md`
+
+| Change | Section |
+|--------|---------|
+| New package / module layout | Repository layout |
+| Study or trial flow | End-to-end flow, Study orchestration, Single trial lifecycle |
+| Storage / logs | Outputs per study |
+| Import graph | Module dependencies |
+
+## Rules
+
+1. Use real module paths (`study_controller.py`).
+2. Default path = **local** backend; Ray at most one sentence, no extra diagrams.
+3. Mermaid only (GitHub renders natively).
+4. PR: note “updated docs/architecture.md” when structure changes.
+
+## Do not
+
+- Duplicate full diagrams under `.ai/context/`.
+- Run `auto-tune-vllm optimize` to validate diagrams.
diff --git a/.ai/skills/docs-writer.md b/.ai/skills/docs-writer.md
new file mode 100644
index 0000000..b820076
--- /dev/null
+++ b/.ai/skills/docs-writer.md
@@ -0,0 +1,24 @@
+# Skill: Docs writer
+
+## Scope
+
+| Audience | Files |
+|----------|-------|
+| Users | `README.md`, `docs/quick_start.md`, `docs/configuration.md` |
+| Examples | `examples/*.yaml` |
+| Agents | `.ai/context/*` |
+
+## Rules
+
+1. Runnable commands: `pip install -e .`, `ruff`, `pytest`, `auto-tune-vllm --help` — E2E optimize only in maintainer sections.
+2. YAML keys match `StudyConfig` in `core/config.py`.
+3. Link GitHub issues instead of long incident writeups.
+4. Structural changes → update `docs/architecture.md` per `architecture-diagrams.md`.
+
+## Ray
+
+Legacy user docs live in `docs/ray_*.md`; do not expand Ray in agent context unless deprecating.
+
+## Agents must not document
+
+“Run optimize to verify” as an agent step — see `AGENTS.md`.
diff --git a/.ai/skills/pr-reviewer.md b/.ai/skills/pr-reviewer.md
new file mode 100644
index 0000000..06df62b
--- /dev/null
+++ b/.ai/skills/pr-reviewer.md
@@ -0,0 +1,70 @@
+# Skill: PR reviewer
+
+Review = **read the diff**, **reason about behavior**, optionally **lint + unit tests**.
+**Never run the autotuner** (`auto-tune-vllm optimize`, `resume`, or any command that starts vLLM / GuideLLM / GPU work). Maintainers run end-to-end studies manually.
+
+## Allowed commands (agents)
+
+```bash
+source venv/bin/activate
+ruff check .
+pytest -v tests/
+# optional: basedpyright (if enabled locally)
+```
+
+## Review workflow
+
+1. Read PR description and linked issues.
+2. Walk changed files; trace call path from `cli/main.py` or `StudyController` when relevant.
+3. Run `ruff check .` and `pytest -v tests/` if environment is available.
+4. Record findings in the output format below.
+
+## Config & CLI
+
+- [ ] `StudyConfig.from_file()` — new fields validated; errors actionable.
+- [ ] `examples/*.yaml` + `docs/configuration.md` aligned.
+- [ ] Typer options in `cli/main.py` documented when added.
+
+## Local execution path (primary)
+
+- [ ] `LocalExecutionBackend` — submit/poll/cancel/cleanup semantics still coherent.
+- [ ] `trial_controller.py` — vLLM + GuideLLM lifecycle, cancellation, `cleanup_resources()`.
+- [ ] No regression for install **without** Ray (`pip install -e .` only).
+
+## Optuna
+
+- [ ] `study.ask()` / `study.tell()` paired; failures → `FAIL` + user attrs.
+- [ ] Baseline vs optimization trial counting (`n_trials`, PR #21 context).
+- [ ] Grid / sampler / multi-objective values consistent.
+
+## Benchmarks & metrics
+
+- [ ] `benchmarks/providers.py` — GuideLLM CLI args from `BenchmarkConfig`.
+- [ ] Objective expressions match `ObjectiveConfig.valid_metrics_combined`.
+
+## Tests & docs
+
+- [ ] New behavior covered in `tests/` without mandatory GPU.
+- [ ] User-facing docs updated when behavior or YAML changes.
+
+## Legacy Ray (only if PR touches `RayExecutionBackend`)
+
+- [ ] Optional import still works; no new hard dependency on `ray` in core install path.
+- [ ] No Ray-specific review steps unless the diff is explicitly Ray-related.
+
+## Output format
+
+```markdown
+### Blockers
+- ...
+
+### Questions
+- ...
+
+### Nits
+- ...
+
+### Checks run
+- [ ] ruff
+- [ ] pytest
+```
diff --git a/.ai/skills/pr-writer.md b/.ai/skills/pr-writer.md
new file mode 100644
index 0000000..9b98da2
--- /dev/null
+++ b/.ai/skills/pr-writer.md
@@ -0,0 +1,49 @@
+# Skill: PR writer
+
+## Before opening
+
+1. Branch from `main`; one logical change per PR.
+2. Run `ruff check .` and `pytest -v tests/` (`venv`).
+3. Update `docs/configuration.md` + `examples/*.yaml` if schema or CLI changed.
+4. **Do not** list `auto-tune-vllm optimize` as agent-run validation; maintainers run E2E manually.
+
+## Title convention
+
+- `[FEAT]` / `[FIX]` / `[CI]` / `[Docs]`
+
+## Template
+
+```markdown
+## Summary
+
+
+## Why
+
+
+## What changed
+- `path/module.py` — …
+- `tests/...` — …
+- `docs/` or `examples/` — …
+
+## How tested
+- [ ] `ruff check .`
+- [ ] `pytest -v tests/...`
+- [ ] Manual E2E (maintainer): auto-tune-vllm optimize …
+
+## Risks / limitations
+- …
+
+## Links
+- Closes #…
+```
+
+## Fork checks
+
+- Baseline vs `n_trials` if `study_controller.py` touched.
+- Optuna storage (SQLite vs PostgreSQL).
+- `pyproject.toml` pins (vLLM / GuideLLM).
+- Structural change → update `docs/architecture.md` (see `architecture-diagrams.md`).
+
+## After merge
+
+Update `.ai/context/current-work.md` and `.ai/context/known-issues.md` when relevant.
diff --git a/.ai/skills/test-writer.md b/.ai/skills/test-writer.md
new file mode 100644
index 0000000..4117e18
--- /dev/null
+++ b/.ai/skills/test-writer.md
@@ -0,0 +1,24 @@
+# Skill: Test writer
+
+## Run
+
+```bash
+source venv/bin/activate
+pytest -v tests/
+```
+
+**Do not** use full `auto-tune-vllm optimize` in tests or agent workflows.
+
+## Priority
+
+1. `StudyConfig.from_file` / validation (`tests/core/`)
+2. Metric expressions (`tests/execution/test_evaluate_metric_expression.py`)
+3. `StudyController` with fake `ExecutionBackend` (no vLLM)
+4. Mock `subprocess` for GuideLLM in `providers.py`
+5. GPU integration only when explicitly requested by maintainer
+
+## Patterns
+
+- Optuna: `sqlite:///:memory:`
+- Backend fake: return `TrialResult` on first `poll_trials`
+- No CUDA in default CI matrix
diff --git a/AGENTS.md b/AGENTS.md
new file mode 100644
index 0000000..1c8c246
--- /dev/null
+++ b/AGENTS.md
@@ -0,0 +1,46 @@
+# Agent guide — auto-tuning-vllm (InseeFrLab fork)
+
+Hyperparameter optimization for **vLLM** serving: YAML configs, **Optuna**, **GuideLLM** benchmarks, **local** execution (`LocalExecutionBackend`). PostgreSQL storage optional.
+
+> **Agents:** do not run `auto-tune-vllm optimize` / `resume` (GPU, vLLM, long jobs). Use lint + `pytest` only.
+
+## Context files
+
+| File | Purpose |
+|------|---------|
+| [.ai/context/repo-map.md](.ai/context/repo-map.md) | Directories and entry points |
+| [.ai/context/architecture.md](.ai/context/architecture.md) | Execution flow (prose); diagrams in `docs/architecture.md` |
+| [.ai/context/current-work.md](.ai/context/current-work.md) | Open PRs, roadmap |
+| [.ai/context/known-issues.md](.ai/context/known-issues.md) | Bugs and limitations |
+| [.ai/context/history.md](.ai/context/history.md) | Design decisions |
+| [.ai/context/external-links.md](.ai/context/external-links.md) | External docs |
+
+## Skills
+
+| Skill | Use when |
+|-------|----------|
+| [.ai/skills/pr-writer.md](.ai/skills/pr-writer.md) | Drafting a PR |
+| [.ai/skills/pr-reviewer.md](.ai/skills/pr-reviewer.md) | Reviewing a PR (diff + ruff + pytest) |
+| [.ai/skills/test-writer.md](.ai/skills/test-writer.md) | Adding tests |
+| [.ai/skills/docs-writer.md](.ai/skills/docs-writer.md) | README / `docs/` |
+| [.ai/skills/architecture-diagrams.md](.ai/skills/architecture-diagrams.md) | Updating `docs/architecture.md` |
+
+## Priorities
+
+1. **Local backend** — `LocalExecutionBackend`, `BaseTrialController` / `LocalTrialController`.
+2. **Config** — `StudyConfig`; sync `docs/configuration.md` + `examples/*.yaml`.
+3. **Trial lifecycle** — vLLM subprocess, GuideLLM, cancellation, cleanup (`trial_controller.py`, `backends.py`).
+4. **Optuna** — `ask`/`tell`, baselines, `core/storage/utils.py`.
+5. **Tests** — `pytest` under `tests/`; no GPU in default CI.
+6. **Small diffs** — match existing patterns.
+
+## Commands (safe for agents)
+
+```bash
+source venv/bin/activate
+pip install -e ".[dev]"
+ruff check .
+pytest -v tests/
+```
+
+Fork: https://github.com/InseeFrLab/auto-tuning-vllm
diff --git a/README.md b/README.md
index f5b908a..2dfa2ab 100644
--- a/README.md
+++ b/README.md
@@ -70,6 +70,7 @@ auto-tune-vllm logs --study-name study_35884
 ## Documentation
 - [Quick Start Guide](docs/quick_start.md) - Get running in 5 minutes
+- [Architecture overview](docs/architecture.md) - How the framework works (diagrams)
 - [Configuration Reference](docs/configuration.md) - Complete YAML configuration guide
 - [Ray Cluster Setup](docs/ray_cluster_setup.md) - For distributed optimization (optional)
diff --git a/docs/architecture.md b/docs/architecture.md
new file mode 100644
index 0000000..9e614bb
--- /dev/null
+++ b/docs/architecture.md
@@ -0,0 +1,183 @@
+# Architecture overview
+
+How **auto-tune-vllm** fits together: YAML configuration, Optuna studies, local trial execution, vLLM serving, and GuideLLM benchmarks.
+
+The default path uses **`LocalExecutionBackend`** on a single machine. An optional Ray backend exists for legacy distributed setups (see [Ray Cluster Setup](ray_cluster_setup.md)).
+
+---
+
+## End-to-end flow
+
+1. You provide a **study YAML** (see [Configuration Reference](configuration.md)).
+2. The CLI loads **`StudyConfig`** and creates a **`StudyController`** with an Optuna study (SQLite or PostgreSQL).
+3. Optional **baseline trials** run with default vLLM parameters.
+4. The optimizer loops: suggest parameters → run trial → record metrics in Optuna.
+5. Each trial starts **vLLM**, waits until healthy, runs **GuideLLM**, then cleans up processes.
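Reviewer note: the five steps above can be sketched as a small, stdlib-only loop. This is an illustration under stated assumptions, not the project's implementation: `FakeStudy`, `FakeTrial`, `run_trial`, and `optimize` are hypothetical stand-ins for the Optuna study and the trial controller, the `max_num_seqs` candidates are arbitrary, and the "throughput" is a made-up number. It only shows the shape of ask → run → tell, with failures recorded on the trial and cleanup running on every path.

```python
from dataclasses import dataclass, field
from itertools import cycle
from typing import Optional


@dataclass
class FakeTrial:
    """Hypothetical stand-in for an Optuna trial."""
    number: int
    params: dict
    state: str = "RUNNING"
    value: Optional[float] = None
    user_attrs: dict = field(default_factory=dict)


class FakeStudy:
    """Hypothetical stand-in for an Optuna study (ask/tell only)."""

    def __init__(self, candidates):
        self._candidates = cycle(candidates)
        self.trials = []

    def ask(self):
        # Propose the next candidate value and register a new trial.
        trial = FakeTrial(len(self.trials), {"max_num_seqs": next(self._candidates)})
        self.trials.append(trial)
        return trial

    def tell(self, trial, value=None, state="COMPLETE"):
        # Record the outcome of a finished trial.
        trial.value, trial.state = value, state


def run_trial(params, log):
    """Pretend to start a server, benchmark it, and always clean up."""
    log.append("start_server")
    try:
        if params["max_num_seqs"] > 128:
            raise RuntimeError("simulated startup failure")
        log.append("benchmark")
        return float(params["max_num_seqs"])  # fake throughput, not a real metric
    finally:
        log.append("cleanup")  # runs on success and on failure


def optimize(study, n_trials):
    log = []
    for _ in range(n_trials):
        trial = study.ask()
        try:
            value = run_trial(trial.params, log)
        except RuntimeError as exc:
            # Store the error on the trial before marking it failed.
            trial.user_attrs["error"] = str(exc)
            study.tell(trial, state="FAIL")
        else:
            study.tell(trial, value=value)
    return log


study = FakeStudy([64, 128, 256])
event_log = optimize(study, n_trials=3)
print([(t.params["max_num_seqs"], t.state, t.value) for t in study.trials])
```

Putting cleanup in a `finally` block is the design point worth keeping from this sketch: whichever branch a trial takes, the server and benchmark processes are released before the result is told to the study.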
+
+```mermaid
+flowchart TD
+    A[Study YAML] --> B["StudyConfig.from_file()"]
+    B --> C["auto-tune-vllm optimize / resume"]
+    C --> D["StudyController"]
+    D --> E["Optuna study + storage"]
+    D --> F["LocalExecutionBackend"]
+
+    F --> G[Baseline trials]
+    G --> H["Loop: ask → run trial → tell"]
+
+    H --> I[TrialConfig]
+    I --> J["TrialController.run_trial()"]
+
+    J --> K[vLLM subprocess]
+    K --> L[Server healthy]
+    L --> M[GuideLLM benchmark]
+    M --> N[Objectives + metrics]
+    N --> O["Optuna tell()"]
+    O --> P[Cleanup]
+
+    E -.-> O
+```
+
+---
+
+## Repository layout
+
+```mermaid
+flowchart TB
+    subgraph root["Repository"]
+        README["README.md"]
+        PY["pyproject.toml"]
+        PKG["auto_tune_vllm/"]
+        DOCS["docs/"]
+        EX["examples/"]
+        TESTS["tests/"]
+        DASH["optuna_dashboard/"]
+    end
+
+    subgraph pkg["auto_tune_vllm package"]
+        CLI["cli/"]
+        CORE["core/"]
+        EXEC["execution/"]
+        BENCH["benchmarks/"]
+        LOG["logging/"]
+        UTIL["utils/"]
+    end
+
+    PKG --> CLI
+    PKG --> CORE
+    PKG --> EXEC
+    PKG --> BENCH
+    PKG --> LOG
+    PKG --> UTIL
+
+    CORE --> CFG["config.py — YAML model"]
+    CORE --> SC["study_controller.py — Optuna loop"]
+    EXEC --> BE["backends.py — local execution"]
+    EXEC --> TC["trial_controller.py — vLLM + benchmark"]
+    BENCH --> PROV["providers.py — GuideLLM"]
+```
+
+| Area | Responsibility |
+|------|----------------|
+| `cli/` | Commands: `optimize`, `resume`, `logs` |
+| `core/` | Config, study orchestration, Optuna storage |
+| `execution/` | Backends and per-trial runtime |
+| `benchmarks/` | GuideLLM integration |
+| `logging/` | Centralized trial logs |
+| `examples/` | Sample study YAML files |
+
+---
+
+## Study orchestration
+
+```mermaid
+sequenceDiagram
+    participant User
+    participant CLI as CLI
+    participant SC as StudyController
+    participant BE as LocalExecutionBackend
+    participant TC as TrialController
+    participant O as Optuna
+
+    User->>CLI: optimize --config study.yaml
+    CLI->>SC: create_from_config()
+    SC->>O: create or load study
+    SC->>SC: run baselines
+    loop optimization trials
+        SC->>O: ask()
+        SC->>BE: submit_trial()
+        BE->>TC: run_trial()
+        TC-->>BE: TrialResult
+        BE-->>SC: poll completed
+        SC->>O: tell(metric values)
+    end
+    CLI->>BE: cleanup / shutdown
+```
+
+Concurrency is controlled by **`--max-concurrent-trials`**: several trials may run in parallel, each with its own vLLM process (subject to GPU memory).
+
+---
+
+## Single trial lifecycle
+
+```mermaid
+stateDiagram-v2
+    [*] --> ValidateEnv
+    ValidateEnv --> StartVLLM
+    StartVLLM --> WaitReady: process started
+    WaitReady --> RunBenchmark: HTTP health OK
+    RunBenchmark --> ParseMetrics: GuideLLM finished
+    ParseMetrics --> Cleanup
+    Cleanup --> [*]
+
+    StartVLLM --> Cleanup: error or cancel
+    WaitReady --> Cleanup: timeout or cancel
+    RunBenchmark --> Cleanup: error or cancel
+```
+
+On failure, error details are stored on the Optuna trial (user attributes) to help the sampler avoid repeating bad configurations.
+
+---
+
+## Module dependencies (simplified)
+
+```mermaid
+flowchart LR
+    CLI["cli/main.py"] --> CFG["core/config.py"]
+    CLI --> SC["core/study_controller.py"]
+    CLI --> BE["execution/backends.py"]
+
+    SC --> CFG
+    SC --> BE
+    BE --> TC["execution/trial_controller.py"]
+    TC --> PROV["benchmarks/providers.py"]
+    TC --> LOGM["logging/manager.py"]
+```
+
+---
+
+## Outputs per study
+
+```mermaid
+flowchart LR
+    YAML[study_config.yaml] --> DB[(Optuna DB)]
+    YAML --> LOGS[Trial log directory]
+    TC2[Trial run] --> VLOG[vLLM logs]
+    TC2 --> GJSON[GuideLLM results]
+    GJSON --> MET[Metrics]
+    MET --> DB
+```
+
+Typical locations:
+
+- **Optuna database** — path from `study.storage_file` (SQLite) or `study.database_url` (PostgreSQL).
+- **Logs** — `logging.file_path` in your YAML.
+- **Dashboard** — `./optuna_dashboard/start_optuna_dashboard.sh path/to/study.db`
+
+---
+
+## Related docs
+
+- [Quick Start](quick_start.md)
+- [Configuration Reference](configuration.md)
+- [Ray Cluster Setup](ray_cluster_setup.md) (optional, legacy distributed path)
diff --git a/docs/quick_start.md b/docs/quick_start.md
index 72c394b..5928eb8 100644
--- a/docs/quick_start.md
+++ b/docs/quick_start.md
@@ -51,6 +51,8 @@ auto-tune-vllm --help
 Start from [`examples/study_config_local_exec.yaml`](../examples/study_config_local_exec.yaml) for a full example configuration file.
+
+For a visual overview of the runtime, see [Architecture overview](architecture.md).
+
 Key configuration areas:
 - Set/confirm the study name and model ([Study Configuration](configuration.md#study-configuration))
 - Choose the optimization objective(s) (e.g., throughput) ([Optimization Configuration](configuration.md#optimization-configuration))
diff --git a/examples/study_config_local_exec.yaml b/examples/study_config_local_exec.yaml
index ecc83b5..6a8a2e6 100644
--- a/examples/study_config_local_exec.yaml
+++ b/examples/study_config_local_exec.yaml
@@ -63,8 +63,9 @@ parameters: # parameters to optimize
     enabled: true
     options: [0, 256, 512, 1024, 2048, 4096, 8192, 1150]
+  # Unsupported on vLLM V1 (see static_environment_variables.VLLM_USE_V1 below).
   max_num_partial_prefills:
-    enabled: true
+    enabled: false
     options: [1, 2, 4, 8]
   max_seq_len_to_capture: