Skip to content

Conversation

@godnight10061
Copy link
Contributor

Summary

  • Run ScanBuilder::prepare() for ScanBuilder::into_stream() on spawn_blocking instead of inside Stream::poll_next.
  • Add a regression test to ensure the first poll returns quickly.

Motivation

This addresses the ClickBench regression reported in #5899. prepare() can be expensive, and running it synchronously inside poll_next can cause head-of-line blocking when many scan streams are polled concurrently.

Testing

  • cargo +nightly fmt --all --check
  • cargo test -p vortex-scan --lib
  • cargo clippy --locked --all-features --all-targets -p vortex-scan -- -D warnings
  • cargo hack clippy -p vortex-scan --no-default-features -- -D warnings
  • cargo semver-checks --workspace (ran in a temporary worktree after setting workspace version to 0.58.0)
  • PYO3_PYTHON=C:\Python312\python.exe cargo test --locked --workspace --all-features --exclude vortex-duckdb --exclude duckdb-bench

Bench

Local DataFusion ClickBench (partitioned, q23, 5 iters):

  • datafusion:vortex-file-compressed: 39.405s → 27.661s
  • datafusion:vortex-compact: 57.573s → 35.945s

@codecov
Copy link

codecov bot commented Jan 9, 2026

Codecov Report

❌ Patch coverage is 56.43564% with 44 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.98%. Comparing base (9d4fcc7) to head (1f416f2).
⚠️ Report is 18 commits behind head on develop.

Files with missing lines Patch % Lines
vortex-scan/src/scan_builder.rs 48.83% 44 Missing ⚠️

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@connortsui20 connortsui20 added the performance Release label indicating an improvement to performance label Jan 9, 2026
@connortsui20
Copy link
Contributor

connortsui20 commented Jan 9, 2026

Thanks for the PR!

I'm going to run our benchmarks on this branch so there will be a bunch of large comments made here by a bot that gives a rough idea where there are improvements/regressions, this is just a heads up!

Edit: huh something went wrong with that but at least they ran...

Edit again: There seems to be some sort of race condition in our CI. Because we have to approve the workflows on outside contributors' PRs, the actions bot doesn't take the benchmark-sql tag off and we don't get any summaries from the bot. The benchmarks do run though. Maybe @AdamGS understands this better?

@joseph-isaacs
Copy link
Contributor

Re the benchmark that is the expected behaviour. We cannot give a fork action write permission. We use the run summary to print out the result

@joseph-isaacs joseph-isaacs added the benchmark Run benchmarks on this branch label Jan 12, 2026
@godnight10061
Copy link
Contributor Author

godnight10061 commented Jan 12, 2026

@connortsui20 @joseph-isaacs The two failing checks don't appear to be caused by this PR. Happy to provide additional context if needed.

@joseph-isaacs
Copy link
Contributor

@godnight10061 look there is not a merge conflict

I also see some regressions in clickbench: https://github.com/vortex-data/vortex/actions/runs/20914237836?pr=5906 and fineweb any ideas??

@godnight10061
Copy link
Contributor Author

I also see some regressions in clickbench: https://github.com/vortex-data/vortex/actions/runs/20914237836?pr=5906 and fineweb any ideas??

May you tell me which exact target line you're referring to?

@joseph-isaacs joseph-isaacs added benchmark Run benchmarks on this branch and removed benchmark Run benchmarks on this branch labels Jan 14, 2026
@joseph-isaacs
Copy link
Contributor

joseph-isaacs commented Jan 14, 2026

clickbench: clickbench_{q24,q26}/duckdb:vortex-file-compressed, I have just kicked off another run so we can see if these repeat (new: https://github.com/vortex-data/vortex/actions/runs/20988924098 previous: https://github.com/vortex-data/vortex/actions/runs/20914237836)

@joseph-isaacs joseph-isaacs added benchmark Run benchmarks on this branch and removed benchmark Run benchmarks on this branch labels Jan 14, 2026
@robert3005
Copy link
Contributor

clickbench_q24/duckdb:vortex-file-compressed 58105110 4.38643e+07 1.32465 ns 🚨
clickbench_q25/duckdb:vortex-file-compressed 72616639 6.3129e+07 1.15029 ns 🚨
clickbench_q26/duckdb:vortex-file-compressed 53189696 4.08331e+07 1.30261 ns 🚨

Interestingly doesn't affect datafusion

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

benchmark Run benchmarks on this branch performance Release label indicating an improvement to performance

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants