[FEAT]: add warmup and cooldown for GuideLLM runs #24
Merged
Signed-off-by: Vincent Gimenes <vincent.gimenes@gmail.com>
3 tasks
VincentG1234 added a commit that referenced this pull request on Jun 11, 2026
## Summary

Add optional `benchmark.rampup` to study configs, forwarding GuideLLM's `--rampup` flag so concurrent benchmarks can ramp load linearly to target concurrency instead of starting at full rate.

## Why

Sudden full-concurrency load can skew benchmark metrics (cold caches, queue buildup, OOM risk). GuideLLM supports a ramp-up period; this fork already exposes `warmup` and `cooldown` but not `rampup`, so users could not control how load increases at the start of a run.

## What changed

- `auto_tune_vllm/benchmarks/config.py`: add optional `rampup` field; validate `> 0` in `__post_init__`
- `auto_tune_vllm/benchmarks/providers.py`: pass `--rampup` to GuideLLM CLI when set
- `tests/benchmarks/test_guidellm_command.py`: CLI construction and validation tests for rampup
- `docs/configuration.md`: document `rampup` (seconds, included in metrics, unlike warmup)
- `examples/study_config.yaml`, `examples/study_config_minimal.yaml`: commented example
- `README.md`: fork changelog entry

## How tested

- [x] `ruff check .`
- [x] `pytest -v tests/benchmarks/test_guidellm_command.py` (11 passed)
- [ ] Manual E2E (maintainer): `auto-tune-vllm optimize` with `benchmark.rampup: 10` and verify GuideLLM receives `--rampup 10`

## Risks / limitations

- Requires a GuideLLM version that supports `--rampup` (same pattern as existing warmup/cooldown flags).
- Ramp-up requests are included in reported metrics; only `warmup`/`cooldown` exclude phases from measurement.
- No interaction with fractional warmup/cooldown sum validation (rampup is always absolute seconds).

## Links

- Follows [#24](#24) (warmup/cooldown) and [#27](#27) (sample_requests) benchmark config extensions.

Signed-off-by: Vincent Gimenes <vincent.gimenes@gmail.com>
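The validate-then-forward pattern described in the commit message can be sketched roughly as follows. This is an illustration only, not the code in `config.py` or `providers.py`: the class `BenchmarkSettings` and the helper `rampup_cli_args` are hypothetical names, and the only detail taken from the commit message is that `rampup` must be `> 0` and is passed as `--rampup` when set.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BenchmarkSettings:
    """Hypothetical stand-in for the benchmark section of a study config."""

    # Ramp-up duration in seconds; None means the flag is not forwarded.
    rampup: Optional[float] = None

    def __post_init__(self) -> None:
        # Reject zero or negative durations ("validate > 0").
        if self.rampup is not None and self.rampup <= 0:
            raise ValueError(f"rampup must be > 0, got {self.rampup}")


def rampup_cli_args(settings: BenchmarkSettings) -> List[str]:
    """Return the extra GuideLLM arguments for ramp-up, or [] when unset."""
    if settings.rampup is None:
        return []
    # The :g format drops a trailing ".0", so 10.0 is rendered as "10".
    return ["--rampup", f"{settings.rampup:g}"]
```

With `benchmark.rampup: 10` in the study YAML, a helper like this would contribute `--rampup 10` to the command line, which is what the manual E2E check above looks for.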
## Summary

Expose optional GuideLLM `warmup` and `cooldown` settings in study YAML so each trial (baseline and optimization) can exclude cold-start and shutdown phases from reported metrics, reducing benchmark variance without changing Optuna objectives or trial lifecycle.

## Why

Cold GPU/KV cache at the start of each benchmark run increases metric variance across trials. GuideLLM already supports `--warmup` and `--cooldown`; auto-tuning-vllm did not pass them through. Users had no config-level way to stabilize measurements.

## What changed

- `auto_tune_vllm/benchmarks/config.py`: add optional `warmup` / `cooldown` fields; validate values (`> 0`, fractional sum `< 1`).
- `auto_tune_vllm/benchmarks/providers.py`: forward flags to `guidellm benchmark` when set.
- `docs/configuration.md`: document fields, measured-duration note, production example comments.
- `examples/study_config.yaml`, `examples/study_config_minimal.yaml`: commented example lines.
- `tests/benchmarks/test_guidellm_command.py`: unit tests for CLI args and validation (no GPU).

## How tested

- `ruff check .`
- `pytest -v tests/` (60 passed, including 7 new benchmark tests)
- `auto-tune-vllm optimize` with `benchmark.warmup: 0.1` / `cooldown: 0.1` and verify GuideLLM CLI receives flags and trials complete

## Risks / limitations

- […] `--warmup` / `--cooldown` on `guidellm benchmark` (`pyproject.toml` still pins `guidellm>=0.1.0`; older CLIs may fail at runtime).
- […] `max_seconds` budget; users should increase `max_seconds` if they need a longer steady-state measurement window.
- […] `max_seconds * 1.5`); acceptable because GuideLLM excludes warmup/cooldown within the same run duration.

Branch: `feat/guidellm-warmup-cooldown`
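As a rough illustration of the validation and flag forwarding described under "What changed", the sketch below shows one way the rules could fit together. It is not the code from this PR: `PhaseSettings` and `measured_seconds` are hypothetical names, and it assumes that a value strictly between 0 and 1 is a fraction of the run while a value of 1 or more is an absolute amount in seconds.

```python
from dataclasses import dataclass
from typing import List, Optional


def _is_fraction(value: Optional[float]) -> bool:
    # Assumption: values strictly between 0 and 1 are fractions of the run.
    return value is not None and 0 < value < 1


@dataclass
class PhaseSettings:
    """Hypothetical stand-in for the warmup/cooldown part of the config."""

    warmup: Optional[float] = None
    cooldown: Optional[float] = None

    def __post_init__(self) -> None:
        for name in ("warmup", "cooldown"):
            value = getattr(self, name)
            if value is not None and value <= 0:
                raise ValueError(f"{name} must be > 0, got {value}")
        # Two fractions that cover the whole run leave nothing to measure.
        if _is_fraction(self.warmup) and _is_fraction(self.cooldown):
            if self.warmup + self.cooldown >= 1:
                raise ValueError("fractional warmup + cooldown must be < 1")

    def cli_args(self) -> List[str]:
        """Extra arguments for `guidellm benchmark`; unset fields add nothing."""
        args: List[str] = []
        if self.warmup is not None:
            args += ["--warmup", f"{self.warmup:g}"]
        if self.cooldown is not None:
            args += ["--cooldown", f"{self.cooldown:g}"]
        return args


def measured_seconds(max_seconds: float, settings: PhaseSettings) -> float:
    """Estimate the steady-state window left inside the max_seconds budget."""

    def as_seconds(value: Optional[float]) -> float:
        if value is None:
            return 0.0
        # Fractions scale with the run length; larger values are seconds.
        return value * max_seconds if value < 1 else value

    excluded = as_seconds(settings.warmup) + as_seconds(settings.cooldown)
    return max(0.0, max_seconds - excluded)
```

Under these assumptions, `warmup: 0.1` and `cooldown: 0.1` with `max_seconds: 300` leave about 240 seconds of measured traffic, which is why the limitations above suggest raising `max_seconds` when a longer steady-state window is needed.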