[FEAT] Add benchmark.sample_requests for GuideLLM output size control #27

Merged: VincentG1234 merged 1 commit into main from add-sample-requests-parameter on May 28, 2026

VincentG1234 (Collaborator) commented on May 28, 2026

Summary

Expose GuideLLM --sample-requests via benchmark.sample_requests in study YAML (default 0) and bump the GuideLLM dependency to >= 0.5.4.

Why

GuideLLM benchmark JSON files can grow very large when every request is stored. During Optuna studies this adds unnecessary disk I/O and storage overhead, while aggregate metrics used for optimization are unaffected. The --sample-requests flag regressed around GuideLLM v0.4 and was fixed in v0.5.4.

What changed

  • auto_tune_vllm/benchmarks/config.py — add sample_requests: int = 0 with validation (>= 0)
  • auto_tune_vllm/benchmarks/providers.py — pass --sample-requests to the GuideLLM CLI on every benchmark run
  • docs/configuration.md — document sample_requests and minimum GuideLLM version
  • examples/study_config.yaml, examples/study_config_minimal.yaml — commented example
  • pyproject.toml, requirements.txt — require guidellm>=0.5.4 (the release that fixes --sample-requests)
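
The config and provider changes listed above can be illustrated with a minimal sketch. It is not the code from this PR: the class name `BenchmarkConfigSketch` and the helper `sample_requests_args` are hypothetical, and placing the check in `__post_init__` is an assumption. Only the field name `sample_requests`, its default of 0, the `>= 0` rule, and the `--sample-requests` flag come from the description.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkConfigSketch:
    """Hypothetical stand-in for the benchmark config in benchmarks/config.py."""

    # Default from the PR description: keep no per-request samples.
    sample_requests: int = 0

    def __post_init__(self) -> None:
        # Assumed location of the check; the PR only states "validation (>= 0)".
        if self.sample_requests < 0:
            raise ValueError("sample_requests must be >= 0")


def sample_requests_args(config: BenchmarkConfigSketch) -> list[str]:
    """Build the CLI arguments that are passed on every benchmark run."""
    return ["--sample-requests", str(config.sample_requests)]


if __name__ == "__main__":
    print(sample_requests_args(BenchmarkConfigSketch()))
    print(sample_requests_args(BenchmarkConfigSketch(sample_requests=20)))
```

Because the flag is passed even at the default value, a study that never sets `benchmark.sample_requests` would still request zero stored samples explicitly, which matches the "on every benchmark run" wording above.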

How tested

  • ruff check .
  • pytest -v tests/
  • Manual E2E (maintainer): run a short study and confirm benchmark JSON size stays small with default config, and grows when sample_requests: 20 is set

Links

Signed-off-by: Vincent Gimenes <vincent.gimenes@gmail.com>
VincentG1234 merged commit 9c99ab9 into main on May 28, 2026
7 checks passed
VincentG1234 deleted the add-sample-requests-parameter branch on May 28, 2026 at 10:33
VincentG1234 added a commit that referenced this pull request on Jun 11, 2026:
## Summary
Add optional `benchmark.rampup` to study configs, forwarding GuideLLM's
`--rampup` flag so concurrent benchmarks can ramp load linearly to
target concurrency instead of starting at full rate.

## Why
Sudden full-concurrency load can skew benchmark metrics (cold caches,
queue buildup, OOM risk). GuideLLM supports a ramp-up period; this fork
already exposes `warmup` and `cooldown` but not `rampup`, so users could
not control how load increases at the start of a run.

## What changed
- `auto_tune_vllm/benchmarks/config.py` — add optional `rampup` field;
validate `> 0` in `__post_init__`
- `auto_tune_vllm/benchmarks/providers.py` — pass `--rampup` to GuideLLM
CLI when set
- `tests/benchmarks/test_guidellm_command.py` — CLI construction and
validation tests for rampup
- `docs/configuration.md` — document `rampup` (seconds, included in
metrics, unlike warmup)
- `examples/study_config.yaml`, `examples/study_config_minimal.yaml` —
commented example
- `README.md` — fork changelog entry
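
A minimal sketch of the optional-flag pattern described in this list, under stated assumptions: `RampupConfigSketch` and `rampup_args` are hypothetical names, and the real field type is not given in the commit message. Only the `rampup` field, the `> 0` check in `__post_init__`, and the "pass `--rampup` when set" behavior come from the text.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class RampupConfigSketch:
    """Hypothetical stand-in for the benchmark config in benchmarks/config.py."""

    # Ramp-up duration in seconds; None means the flag is omitted entirely.
    rampup: Optional[Union[int, float]] = None

    def __post_init__(self) -> None:
        # The commit message states that rampup is validated as > 0 here.
        if self.rampup is not None and self.rampup <= 0:
            raise ValueError("rampup must be > 0 when set")


def rampup_args(config: RampupConfigSketch) -> list[str]:
    """Return the extra CLI arguments, or an empty list when rampup is unset."""
    if config.rampup is None:
        return []
    return ["--rampup", str(config.rampup)]


if __name__ == "__main__":
    print(rampup_args(RampupConfigSketch()))
    print(rampup_args(RampupConfigSketch(rampup=10)))
```

Unlike `sample_requests`, which is always forwarded, this sketch adds nothing to the command line when `rampup` is unset, so GuideLLM's own default behavior applies.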

## How tested
- [x] `ruff check .`
- [x] `pytest -v tests/benchmarks/test_guidellm_command.py` (11 passed)
- [ ] Manual E2E (maintainer): `auto-tune-vllm optimize` with
`benchmark.rampup: 10` and verify GuideLLM receives `--rampup 10`

## Risks / limitations
- Requires a GuideLLM version that supports `--rampup` (same pattern as
existing warmup/cooldown flags).
- Ramp-up requests are included in reported metrics; only
`warmup`/`cooldown` exclude phases from measurement.
- `rampup` is not part of the fractional warmup/cooldown sum validation,
because it is always given in absolute seconds.

## Links
- Follows [#24](#24) (warmup/cooldown) and [#27](#27) (sample_requests)
benchmark config extensions.

Signed-off-by: Vincent Gimenes <vincent.gimenes@gmail.com>