Optimize sql/sqlall: batch jsonpath extractions to reduce DB round-trips#81

Open
stalep wants to merge 2 commits into main from issue_80



stalep commented May 3, 2026

Summary

  • Replaces INSERT-UPDATE-SELECT pattern with SELECT-first for sql/sqlall node value computation
  • Batches all sibling sql/sqlall nodes sharing the same source into a single SQL query using PostgreSQL VALUES + CASE
  • Falls back to individual queries for SQLite or single-node cases

Problem

Each sql/sqlall node did 3-4 DB operations: INSERT a value with null data, UPDATE it with the jsonpath result, SELECT to check the result, and DELETE the value if the result was null. With ~90 sql/sqlall nodes per upload (rhivos-perf-comprehensive), that was at least ~270 DB round-trips.

Fix

  1. SELECT-first: Compute the jsonpath result directly via SELECT. Only INSERT values that have data. Null results skip all writes.
  2. Batch siblings: When the first sql/sqlall node for a source is processed, find all sibling nodes sharing that source and compute all jsonpaths in one query. Cache results so subsequent siblings skip the DB entirely.
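
The difference between the two per-node patterns can be sketched as follows. This is a minimal illustration, not the code in this PR: the `source` and `value` tables, their columns, and both function names are hypothetical, and it runs against in-memory SQLite with `json_extract` standing in for the jsonpath evaluation (which assumes the SQLite build includes the JSON functions). Each function returns the number of statements it issued.

```python
import json
import sqlite3


def make_db():
    # Hypothetical schema, for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE source (id INTEGER PRIMARY KEY, data TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE value ("
        "id INTEGER PRIMARY KEY, node_id INTEGER, source_id INTEGER, data TEXT)"
    )
    return conn


def compute_insert_first(conn, node_id, source_id, path):
    """Old pattern: INSERT null, UPDATE, SELECT, DELETE if null."""
    cur = conn.execute(
        "INSERT INTO value (node_id, source_id, data) VALUES (?, ?, NULL)",
        (node_id, source_id),
    )
    value_id = cur.lastrowid
    conn.execute(
        "UPDATE value SET data = "
        "(SELECT json_extract(data, ?) FROM source WHERE id = ?) WHERE id = ?",
        (path, source_id, value_id),
    )
    row = conn.execute("SELECT data FROM value WHERE id = ?", (value_id,)).fetchone()
    ops = 3
    if row[0] is None:
        # The path matched nothing, so the placeholder row is removed again.
        conn.execute("DELETE FROM value WHERE id = ?", (value_id,))
        ops += 1
    return ops


def compute_select_first(conn, node_id, source_id, path):
    """New pattern: SELECT the result, INSERT only when it is non-null."""
    row = conn.execute(
        "SELECT json_extract(data, ?) FROM source WHERE id = ?", (path, source_id)
    ).fetchone()
    ops = 1
    if row is not None and row[0] is not None:
        conn.execute(
            "INSERT INTO value (node_id, source_id, data) VALUES (?, ?, ?)",
            (node_id, source_id, row[0]),
        )
        ops += 1
    return ops


conn = make_db()
conn.execute(
    "INSERT INTO source (id, data) VALUES (1, ?)",
    (json.dumps({"cpu": {"user": 42}}),),
)
old_hit = compute_insert_first(conn, 10, 1, "$.cpu.user")
old_miss = compute_insert_first(conn, 11, 1, "$.missing")
new_hit = compute_select_first(conn, 20, 1, "$.cpu.user")
new_miss = compute_select_first(conn, 21, 1, "$.missing")
```

In this sketch a missing path costs four statements under the old pattern and a single read under the new one, and no row is ever written for it.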

For rhivos (82 sql/sqlall nodes in 2 source groups), batching reduces 82 individual queries to 2 batched queries.
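
One way the sibling batching and result cache could fit together is sketched below. Everything here is an assumption made for illustration: the `Node` class, `build_batch_query`, `BatchCache`, the `source` table with a jsonb `data` column, the psycopg-style `%s` placeholders, and the mapping of `sql` to `jsonb_path_query_first` and `sqlall` to `jsonb_path_query_array`. The query is only built, not executed, so a stand-in `fake_run_query` replaces the database. The sketch also keys the cache by node id and always batches, whereas the PR falls back to individual queries for SQLite and single-node cases.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    id: int
    kind: str  # "sql" or "sqlall"
    source_id: int
    jsonpath: str


def build_batch_query(nodes):
    """Return (sql, params) evaluating every node's jsonpath in one statement."""
    if not nodes:
        raise ValueError("need at least one node")
    source_ids = {n.source_id for n in nodes}
    if len(source_ids) != 1:
        raise ValueError("all nodes in a batch must share one source")
    rows = ", ".join("(%s, %s, %s)" for _ in nodes)
    sql = (
        "SELECT p.node_id, "
        "CASE p.kind "
        "WHEN 'sqlall' THEN jsonb_path_query_array(s.data, p.path::jsonpath) "
        "ELSE jsonb_path_query_first(s.data, p.path::jsonpath) END AS result "
        f"FROM (VALUES {rows}) AS p(node_id, kind, path) "
        "CROSS JOIN source s WHERE s.id = %s"
    )
    params = [x for n in nodes for x in (n.id, n.kind, n.jsonpath)]
    params.append(source_ids.pop())
    return sql, params


class BatchCache:
    """Runs one batched query per source group and serves siblings from memory."""

    def __init__(self, nodes, run_query):
        self._by_source = defaultdict(list)
        for n in nodes:
            self._by_source[n.source_id].append(n)
        self._run_query = run_query  # callable(sql, params) -> [(node_id, result)]
        self._results = {}
        self.queries = 0

    def value_for(self, node):
        if node.id not in self._results:
            # First node of this source group: compute all siblings at once.
            siblings = self._by_source[node.source_id]
            sql, params = build_batch_query(siblings)
            self.queries += 1
            fetched = dict(self._run_query(sql, params))
            for s in siblings:
                # Nodes missing from the result set are cached as None.
                self._results[s.id] = fetched.get(s.id)
        return self._results[node.id]


def fake_run_query(sql, params):
    # Stand-in for a real database call: echoes each path back as its result.
    triples = [params[i:i + 3] for i in range(0, len(params) - 1, 3)]
    return [(node_id, f"result for {path}") for node_id, _kind, path in triples]


nodes = [
    Node(1, "sql", 100, "$.a"),
    Node(2, "sqlall", 100, "$.b[*]"),
    Node(3, "sql", 100, "$.c"),
    Node(4, "sql", 200, "$.d"),
    Node(5, "sqlall", 200, "$.e[*]"),
]
cache = BatchCache(nodes, fake_run_query)
values = [cache.value_for(n) for n in nodes]
```

With five nodes in two source groups, only the first node of each group triggers a query; the remaining three lookups are served from the cache.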

Benchmark — quarkus-spring-boot-comparison (100 uploads, PostgreSQL)

| Branch | Time | Values | vs main |
|---|---|---|---|
| main | 47.1s | 4502 | baseline |
| load-legacy | 46.4s | 4502 | -1.5% (no regression) |
| issue_80 (batched) | 35.5s | 4502 | -24.5% |

Same value count across all branches — no regressions. The load-legacy import changes have no impact on core upload performance.

Benchmark — rhivos-perf-comprehensive (legacy import, 4 runs)

| | Baseline | Batched | Improvement |
|---|---|---|---|
| Average | 113s | 103s | ~9% |

Test plan

  • All 218 existing tests pass (after clean DB — stale data from crash caused false failures)
  • quarkus-spring-boot-comparison benchmark: same value count (4502), 24.5% faster
  • load-legacy branch: no regression (46.4s vs 47.1s, same values)
  • Verified rhivos import match rate unchanged (84.9%)

Fixes #80 (partial — addresses the sql/sqlall batching item)

stalep added 2 commits May 3, 2026 18:40
Instead of creating a value with null data, updating it via SQL, then
reading it back to check for null — directly SELECT the jsonpath
result, skip entirely if null, and only INSERT values that have data.

Before: INSERT + UPDATE + SELECT + (DELETE if null) = 3-4 DB ops per node
After:  SELECT + (INSERT if non-null) = 1-2 DB ops per node

For null results (missing jsonpath), eliminates all writes entirely.

When the first sql/sqlall node for a source is processed, find all
sibling sql/sqlall nodes sharing the same source and compute all
their jsonpaths in a single SQL query using VALUES + CASE. Results
are cached so subsequent siblings skip the DB entirely.

For rhivos (82 sql/sqlall nodes in 2 source groups), this reduces
82 individual queries to 2 batched queries.
@stalep stalep marked this pull request as draft May 4, 2026 11:11
@stalep stalep marked this pull request as ready for review May 4, 2026 14:01

Development

Successfully merging this pull request may close these issues.

Performance: upload pipeline takes ~113s per run for complex tests
