[FEAT] Parse prompt and total token throughput from GuideLLM by VincentG1234 · Pull Request #29 · InseeFrLab/auto-tuning-vllm

VincentG1234 · 2026-06-03T14:25:05Z

Summary

Expose GuideLLM prompt_tokens_per_second (input/prefill throughput) and tokens_per_second (prompt + output) in trial detailed_metrics, so they can be used in optimization.log_metrics, objectives, and the Optuna dashboard alongside existing output throughput metrics.

Why

Users want to track prompt/input throughput in Optuna user attrs (log_metrics) and optionally optimize or compare prefill vs decode performance. GuideLLM has emitted these metrics since the v0.4+ stats refactor, but this project parsed only the original five benchmark metrics.

What changed

auto_tune_vllm/benchmarks/providers.py — parse prompt_tokens_per_second and tokens_per_second from GuideLLM JSON (total category, same loop as existing metrics).
auto_tune_vllm/core/config.py — add both base metrics to ObjectiveConfig.valid_metrics (objectives, expressions, log_metrics validation).
docs/configuration.md — document new metrics and log_metrics example.
examples/README_optimization_guide.md — metrics table.
examples/study_config.yaml — commented log_metrics example.
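
The parsing change can be pictured with a minimal sketch. The JSON layout assumed here (each metric holding a `total` category with `mean` and `median` values) and the helper name `flatten_metrics` are illustrative assumptions, not the project's actual code; only the two metric names and the `_median` suffix come from this PR.

```python
from typing import Any

# Metric names added by this PR.
NEW_METRICS = ("prompt_tokens_per_second", "tokens_per_second")


def flatten_metrics(
    metrics: dict[str, Any],
    names: tuple[str, ...] = NEW_METRICS,
    category: str = "total",
) -> dict[str, float]:
    """Flatten nested per-metric stats into keys such as
    'prompt_tokens_per_second_median' (hypothetical helper)."""
    flat: dict[str, float] = {}
    for name in names:
        # Assumed layout: metrics[name][category][stat]
        stats = metrics[name][category]
        for stat in ("mean", "median"):
            flat[f"{name}_{stat}"] = float(stats[stat])
    return flat


# Made-up numbers, only to show the resulting key names.
sample = {
    "prompt_tokens_per_second": {"total": {"mean": 1250.0, "median": 1200.0}},
    "tokens_per_second": {"total": {"mean": 1900.0, "median": 1850.0}},
}
detailed_metrics = flatten_metrics(sample)
```

Under these assumptions, `detailed_metrics` contains flat keys such as `prompt_tokens_per_second_median`, which is the identifier used in the `log_metrics` example.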

How tested

ruff check .
pytest -v tests/ (full suite)
Manual E2E (maintainer): auto-tune-vllm optimize with log_metrics including prompt_tokens_per_second_median

Verified parsing against guidellm/benchmarks_mp.json (GuideLLM >= 0.5.4 export).

Risks / limitations

Trials fail if GuideLLM results omit prompt_tokens_per_second (e.g. very old JSON exports). The project already requires guidellm>=0.5.4 in pyproject.toml, which limits this risk.
tokens_per_second was already present in many JSON files but was not parsed until this PR.
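
The failure mode described in the first item can be sketched as a required-metrics check. The list mirrors the `required_metrics` entries shown in the diff; the function names and the `ValueError` are assumptions chosen for illustration, since the PR does not show how the project reports the failure.

```python
# Mirrors the required_metrics list from the diff in this PR.
REQUIRED_METRICS = [
    "requests_per_second",
    "request_latency",
    "output_tokens_per_second",
    "prompt_tokens_per_second",
    "tokens_per_second",
    "time_to_first_token_ms",
    "inter_token_latency_ms",
]


def find_missing_metrics(metrics: dict, required=REQUIRED_METRICS) -> list[str]:
    """Return required metric names absent from the parsed results."""
    return [name for name in required if name not in metrics]


def check_required_metrics(metrics: dict) -> None:
    """Raise if any required metric is missing (hypothetical error handling)."""
    missing = find_missing_metrics(metrics)
    if missing:
        raise ValueError(f"GuideLLM results are missing metrics: {missing}")


# Simulated old export that lacks the prompt throughput metric.
old_export = {
    name: {} for name in REQUIRED_METRICS if name != "prompt_tokens_per_second"
}
missing = find_missing_metrics(old_export)
```

With an export like `old_export`, `check_required_metrics` would raise, which corresponds to a failed trial rather than a silently incomplete one.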

Links

Complements [FEAT] add log_metrics as Optuna trial user attrs #22 (log_metrics → Optuna user attrs)

Expose prompt_tokens_per_second and tokens_per_second in detailed_metrics so they can be used in log_metrics and objectives (guidellm>=0.5.4).

Signed-off-by: Vincent Gimenes <vincent.gimenes@gmail.com>

Copilot

Pull request overview

Exposes GuideLLM prompt (prefill) throughput and combined prompt+decode throughput by parsing prompt_tokens_per_second and tokens_per_second into trial detailed_metrics, and wiring these base metrics into objective/log-metrics validation and documentation.

Changes:

Parse prompt_tokens_per_second and tokens_per_second from GuideLLM benchmark JSON and include them in the required metrics set.
Extend objective/log-metrics validation to accept the two new base metrics (and their percentile-suffixed identifiers).
Document the new metrics and add a commented log_metrics example in sample configs/docs.
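
One way to picture the validation of suffixed identifiers is the sketch below. The suffix set and the function name are assumptions for illustration (only `median` is confirmed by this PR), and the real check lives in `ObjectiveConfig.valid_metrics`, which is not shown here.

```python
# Throughput base metrics named in this PR.
BASE_METRICS = {
    "output_tokens_per_second",
    "prompt_tokens_per_second",
    "tokens_per_second",
}
# Assumed suffixes; only "median" appears in this PR.
ASSUMED_SUFFIXES = {"mean", "median", "p95"}


def is_valid_metric(
    identifier: str,
    base_metrics: set[str] = BASE_METRICS,
    suffixes: set[str] = ASSUMED_SUFFIXES,
) -> bool:
    """Accept a base metric or '<base>_<suffix>' (hypothetical check)."""
    if identifier in base_metrics:
        return True
    # Split on the last underscore so the base name is matched exactly.
    base, sep, suffix = identifier.rpartition("_")
    return bool(sep) and base in base_metrics and suffix in suffixes
```

Matching the base name exactly, rather than by substring, matters here because `tokens_per_second` is itself a substring of both `prompt_tokens_per_second` and `output_tokens_per_second`.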

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| `auto_tune_vllm/benchmarks/providers.py` | Adds parsing/required-metric handling for `prompt_tokens_per_second` and `tokens_per_second` from GuideLLM results. |
| `auto_tune_vllm/core/config.py` | Adds the new base metrics to `ObjectiveConfig.valid_metrics` so they can be used in objectives/expressions and `log_metrics`. |
| `docs/configuration.md` | Documents the new metrics and adds a `log_metrics` example including `prompt_tokens_per_second_median`. |
| `examples/README_optimization_guide.md` | Extends the metrics table to include prompt and combined throughput metrics. |
| `examples/study_config.yaml` | Adds a commented `log_metrics` example entry for `prompt_tokens_per_second_median`. |

            required_metrics = [
                "requests_per_second",
                "request_latency",
                "output_tokens_per_second",
+                "prompt_tokens_per_second",
+                "tokens_per_second",
                "time_to_first_token_ms",
                "inter_token_latency_ms",
            ]


[FEAT] Parse prompt and total token throughput from GuideLLM

ce0a3f7

VincentG1234 requested a review from Copilot June 3, 2026 15:43

Copilot started reviewing on behalf of VincentG1234 June 3, 2026 15:43

Copilot AI reviewed Jun 3, 2026

Comment thread auto_tune_vllm/benchmarks/providers.py

Comment on lines 395 to 403

            required_metrics = [
                "requests_per_second",
                "request_latency",
                "output_tokens_per_second",
                "prompt_tokens_per_second",
                "tokens_per_second",
                "time_to_first_token_ms",
                "inter_token_latency_ms",
            ]

VincentG1234 wants to merge 1 commit into main from FEAT/guidellm-prompt-throughput-metrics
