[FEAT] Parse prompt and total token throughput from GuideLLM #29

Open
VincentG1234 wants to merge 1 commit into main from FEAT/guidellm-prompt-throughput-metrics
VincentG1234 (Collaborator) commented Jun 3, 2026

Summary

Expose GuideLLM prompt_tokens_per_second (input/prefill throughput) and tokens_per_second (prompt + output) in trial detailed_metrics, so they can be used in optimization.log_metrics, objectives, and the Optuna dashboard alongside existing output throughput metrics.

Why

Users want to track prompt/input throughput in Optuna user attrs (log_metrics) and optionally optimize or compare prefill vs decode performance. GuideLLM has emitted these metrics since the v0.4+ stats refactor; auto-tuning only parsed the original five benchmark scalars.

What changed

  • auto_tune_vllm/benchmarks/providers.py — parse prompt_tokens_per_second and tokens_per_second from GuideLLM JSON (total category, same loop as existing metrics).
  • auto_tune_vllm/core/config.py — add both base metrics to ObjectiveConfig.valid_metrics (objectives, expressions, log_metrics validation).
  • docs/configuration.md — document new metrics and log_metrics example.
  • examples/README_optimization_guide.md — metrics table.
  • examples/study_config.yaml — commented log_metrics example.
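
The parsing step described above can be sketched roughly as follows. This is a minimal illustration, not the code in providers.py: it assumes each metric in the GuideLLM JSON carries a `total` category holding summary statistics such as `mean` and `median`, and the names `flatten_total_stats`, `STATS`, and the sample numbers are hypothetical.

```python
from typing import Any

# Hypothetical: the statistics copied out of each metric's "total" category.
STATS = ("mean", "median")

# The two base metrics this PR adds.
NEW_METRICS = ("prompt_tokens_per_second", "tokens_per_second")


def flatten_total_stats(
    metrics: dict[str, Any], names: tuple[str, ...] = NEW_METRICS
) -> dict[str, float]:
    """Copy selected 'total' statistics into a flat dict keyed '<metric>_<stat>'."""
    flat: dict[str, float] = {}
    for name in names:
        total = metrics[name]["total"]  # raises KeyError if the export omits the metric
        for stat in STATS:
            flat[f"{name}_{stat}"] = float(total[stat])
    return flat


# Made-up numbers, for illustration only (assumed JSON layout).
sample = {
    "prompt_tokens_per_second": {"total": {"mean": 1200.0, "median": 1180.0}},
    "tokens_per_second": {"total": {"mean": 1500.0, "median": 1475.0}},
}
detailed = flatten_total_stats(sample)
print(detailed["prompt_tokens_per_second_median"])
```

Flat keys of the form `<metric>_<stat>` match the identifier style used elsewhere in this PR (for example `prompt_tokens_per_second_median`).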

How tested

  • ruff check .
  • pytest -v tests/ (full suite)
  • Manual E2E (maintainer): auto-tune-vllm optimize with log_metrics including prompt_tokens_per_second_median

Verified parsing against guidellm/benchmarks_mp.json (GuideLLM >= 0.5.4 export).

Risks / limitations

  • Trials fail if GuideLLM results omit prompt_tokens_per_second (e.g. very old JSON exports). Project already requires guidellm>=0.5.4 in pyproject.toml.
  • tokens_per_second was already present in many JSON files but was not parsed until this PR.
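
To illustrate the first risk, here is a small sketch of a required-metrics check. The list mirrors the `required_metrics` snippet quoted in the review further down; the helper `missing_metrics` and the way the failure is reported are assumptions, not the project's actual error handling.

```python
# Metric names taken from the required_metrics list in this PR's diff.
REQUIRED = [
    "requests_per_second",
    "request_latency",
    "output_tokens_per_second",
    "prompt_tokens_per_second",
    "tokens_per_second",
    "time_to_first_token_ms",
    "inter_token_latency_ms",
]


def missing_metrics(metrics: dict, required: list[str] = REQUIRED) -> list[str]:
    """Return the required metric names that are absent from a parsed export."""
    return [name for name in required if name not in metrics]


# Simulate an old export that lacks prompt_tokens_per_second.
old_export = {name: {} for name in REQUIRED if name != "prompt_tokens_per_second"}
missing = missing_metrics(old_export)
if missing:
    # Hypothetical message; the real provider may raise or log differently.
    print(f"GuideLLM results are missing required metrics: {missing}")
```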

Links

Expose prompt_tokens_per_second and tokens_per_second in detailed_metrics
so they can be used in log_metrics and objectives (guidellm>=0.5.4).

Signed-off-by: Vincent Gimenes <vincent.gimenes@gmail.com>

Copilot AI (Contributor) left a comment

Pull request overview

Exposes GuideLLM prompt (prefill) throughput and combined prompt+decode throughput by parsing prompt_tokens_per_second and tokens_per_second into trial detailed_metrics, and wiring these base metrics into objective/log-metrics validation and documentation.

Changes:

  • Parse prompt_tokens_per_second and tokens_per_second from GuideLLM benchmark JSON and include them in the required metrics set.
  • Extend objective/log-metrics validation to accept the two new base metrics (and their percentile-suffixed identifiers).
  • Document the new metrics and add a commented log_metrics example in sample configs/docs.
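
A rough sketch of how base metrics and their percentile-suffixed identifiers could be validated. It is not the implementation in ObjectiveConfig: the suffix set and the function `is_valid_metric` are hypothetical, and only the throughput base metrics are listed here.

```python
# Throughput base metrics named in this PR (the real valid_metrics set is larger).
BASE_METRICS = {
    "output_tokens_per_second",
    "prompt_tokens_per_second",
    "tokens_per_second",
}

# Hypothetical suffixes; the project's accepted set may differ.
SUFFIXES = {"mean", "median", "p50", "p90", "p95", "p99"}


def is_valid_metric(identifier: str) -> bool:
    """Accept a bare base metric or '<base>_<suffix>' with a known suffix."""
    if identifier in BASE_METRICS:
        return True
    # Split on the last underscore so metric names containing '_' stay intact.
    base, sep, suffix = identifier.rpartition("_")
    return bool(sep) and base in BASE_METRICS and suffix in SUFFIXES


for candidate in ("prompt_tokens_per_second_median", "prefill_tokens_per_second_median"):
    print(candidate, is_valid_metric(candidate))
```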

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| auto_tune_vllm/benchmarks/providers.py | Adds parsing/required-metric handling for prompt_tokens_per_second and tokens_per_second from GuideLLM results. |
| auto_tune_vllm/core/config.py | Adds the new base metrics to ObjectiveConfig.valid_metrics so they can be used in objectives/expressions and log_metrics. |
| docs/configuration.md | Documents the new metrics and adds a log_metrics example including prompt_tokens_per_second_median. |
| examples/README_optimization_guide.md | Extends the metrics table to include prompt and combined throughput metrics. |
| examples/study_config.yaml | Adds a commented log_metrics example entry for prompt_tokens_per_second_median. |

Comment on lines 395 to 403
required_metrics = [
    "requests_per_second",
    "request_latency",
    "output_tokens_per_second",
    "prompt_tokens_per_second",
    "tokens_per_second",
    "time_to_first_token_ms",
    "inter_token_latency_ms",
]