[FEAT]: add warmup and cooldown for GuideLLM runs#24

Merged
VincentG1234 merged 1 commit into main from FEAT/guidellm-warmup-cooldown
May 20, 2026
@VincentG1234 VincentG1234 commented May 20, 2026

Collaborator

Summary

Expose optional GuideLLM warmup and cooldown settings in study YAML so each trial (baseline and optimization) can exclude cold-start and shutdown phases from reported metrics, reducing benchmark variance without changing Optuna objectives or trial lifecycle.

Why

Cold GPU/KV cache at the start of each benchmark run increases metric variance across trials. GuideLLM already supports --warmup and --cooldown; auto-tuning-vllm did not pass them through. Users had no config-level way to stabilize measurements.

What changed

  • auto_tune_vllm/benchmarks/config.py — add optional warmup / cooldown fields; validate values (> 0, fractional sum < 1).
  • auto_tune_vllm/benchmarks/providers.py — forward flags to guidellm benchmark when set.
  • docs/configuration.md — document fields, measured-duration note, production example comments.
  • examples/study_config.yaml, examples/study_config_minimal.yaml — commented example lines.
  • tests/benchmarks/test_guidellm_command.py — unit tests for CLI args and validation (no GPU).
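
The validation and flag forwarding described above could look roughly like the sketch below. It is not the code from this PR: the class name `BenchmarkSettings` and the method `cli_args` are hypothetical, and it assumes that values below 1 are fractions of the run while values of 1 or more are absolute amounts. Only the `--warmup` and `--cooldown` flag names and the two validation rules come from the description.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BenchmarkSettings:
    """Hypothetical stand-in for the project's benchmark config class."""

    max_seconds: int = 300
    warmup: Optional[float] = None
    cooldown: Optional[float] = None

    def __post_init__(self) -> None:
        # Rule 1: any value that is set must be strictly positive.
        for name in ("warmup", "cooldown"):
            value = getattr(self, name)
            if value is not None and value <= 0:
                raise ValueError(f"{name} must be > 0, got {value}")

        # Rule 2 (assumption: values below 1 are fractions of the run):
        # the fractional parts together must leave room for measurement.
        fractions = [
            v for v in (self.warmup, self.cooldown) if v is not None and v < 1
        ]
        if sum(fractions) >= 1:
            raise ValueError("fractional warmup + cooldown must sum to < 1")

    def cli_args(self) -> List[str]:
        """Return extra CLI arguments, emitting a flag only when it is set."""
        args: List[str] = []
        if self.warmup is not None:
            args += ["--warmup", str(self.warmup)]
        if self.cooldown is not None:
            args += ["--cooldown", str(self.cooldown)]
        return args
```

Leaving both fields unset yields an empty argument list, so existing study configs would produce the same GuideLLM command as before.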

How tested

  • ruff check .
  • pytest -v tests/ (60 passed, including 7 new benchmark tests)
  • Manual E2E (maintainer): auto-tune-vllm optimize with benchmark.warmup: 0.1 / cooldown: 0.1 and verify GuideLLM CLI receives flags and trials complete

Risks / limitations

  • Requires a recent GuideLLM install with --warmup / --cooldown on guidellm benchmark (pyproject.toml still accepts any guidellm>=0.1.0, so older CLIs without these flags may fail at runtime).
  • Warmup/cooldown share the same max_seconds budget; users should increase max_seconds if they need a longer steady-state measurement window.
  • No change to trial timeout logic (max_seconds * 1.5); acceptable because GuideLLM takes the warmup and cooldown phases out of the same run duration instead of extending it.
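
To make the shared-budget point concrete, the arithmetic can be sketched as follows. The helper names are hypothetical, and the sketch assumes both values are given as fractions of `max_seconds`; only the `max_seconds * 1.5` timeout rule is taken from the description above.

```python
def measured_window(max_seconds: float, warmup: float = 0.0, cooldown: float = 0.0) -> float:
    """Seconds left for steady-state measurement.

    Assumes warmup and cooldown are fractions of the run (each in [0, 1)
    and summing to less than 1), so both are carved out of max_seconds.
    """
    if warmup < 0 or cooldown < 0 or warmup + cooldown >= 1:
        raise ValueError("warmup and cooldown must be fractions summing to < 1")
    return max_seconds * (1 - warmup - cooldown)


def trial_timeout(max_seconds: float) -> float:
    """Trial timeout as described in this PR: unchanged at 1.5x max_seconds."""
    return max_seconds * 1.5


def max_seconds_for(target_window: float, warmup: float = 0.0, cooldown: float = 0.0) -> float:
    """Run length needed to keep a given steady-state window."""
    if warmup < 0 or cooldown < 0 or warmup + cooldown >= 1:
        raise ValueError("warmup and cooldown must be fractions summing to < 1")
    return target_window / (1 - warmup - cooldown)
```

Under these assumptions, a 300-second run with `warmup: 0.1` and `cooldown: 0.1` leaves about 240 seconds of measured traffic, and keeping a full 300-second window would need `max_seconds` of about 375.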

Branch: feat/guidellm-warmup-cooldown

Signed-off-by: Vincent Gimenes <vincent.gimenes@gmail.com>
@VincentG1234 VincentG1234 merged commit 5841e59 into main May 20, 2026
7 checks passed
@VincentG1234 VincentG1234 deleted the FEAT/guidellm-warmup-cooldown branch May 20, 2026 16:47
VincentG1234 added a commit that referenced this pull request Jun 11, 2026
## Summary
Add optional `benchmark.rampup` to study configs, forwarding GuideLLM's
`--rampup` flag so concurrent benchmarks can ramp load linearly to
target concurrency instead of starting at full rate.

## Why
Sudden full-concurrency load can skew benchmark metrics (cold caches,
queue buildup, OOM risk). GuideLLM supports a ramp-up period; this fork
already exposes `warmup` and `cooldown` but not `rampup`, so users could
not control how load increases at the start of a run.

## What changed
- `auto_tune_vllm/benchmarks/config.py` — add optional `rampup` field;
validate `> 0` in `__post_init__`
- `auto_tune_vllm/benchmarks/providers.py` — pass `--rampup` to GuideLLM
CLI when set
- `tests/benchmarks/test_guidellm_command.py` — CLI construction and
validation tests for rampup
- `docs/configuration.md` — document `rampup` (seconds, included in
metrics, unlike warmup)
- `examples/study_config.yaml`, `examples/study_config_minimal.yaml` —
commented example
- `README.md` — fork changelog entry
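
A minimal sketch of the `rampup` handling listed above, assuming the value is a number of seconds. The function names are hypothetical and this is not the code from the commit; only the `> 0` rule and the `--rampup` flag name come from the description.

```python
from typing import List, Optional


def validate_rampup(rampup: Optional[float]) -> None:
    """Reject non-positive values; None means the option is not set."""
    if rampup is not None and rampup <= 0:
        raise ValueError(f"rampup must be > 0, got {rampup}")


def rampup_args(rampup: Optional[float]) -> List[str]:
    """Return the extra CLI arguments, or an empty list when rampup is unset."""
    validate_rampup(rampup)
    return [] if rampup is None else ["--rampup", str(rampup)]
```

With `benchmark.rampup: 10`, this sketch yields `["--rampup", "10"]`. Because ramp-up is always absolute seconds, it plays no part in the fractional warmup/cooldown sum check.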

## How tested
- [x] `ruff check .`
- [x] `pytest -v tests/benchmarks/test_guidellm_command.py` (11 passed)
- [ ] Manual E2E (maintainer): `auto-tune-vllm optimize` with
`benchmark.rampup: 10` and verify GuideLLM receives `--rampup 10`

## Risks / limitations
- Requires a GuideLLM version that supports `--rampup` (same pattern as
existing warmup/cooldown flags).
- Ramp-up requests are included in reported metrics; only
`warmup`/`cooldown` exclude phases from measurement.
- No interaction with fractional warmup/cooldown sum validation (rampup
is always absolute seconds).

## Links
- Follows [#24](#24) (warmup/cooldown) and [#27](#27) (sample_requests) benchmark config extensions.

Signed-off-by: Vincent Gimenes <vincent.gimenes@gmail.com>