Optimize sql/sqlall: batch jsonpath extractions to reduce DB round-trips by stalep · Pull Request #81 · Hyperfoil/h5m

stalep · 2026-05-03T18:34:01Z

Summary

Replaces INSERT-UPDATE-SELECT pattern with SELECT-first for sql/sqlall node value computation
Batches all sibling sql/sqlall nodes sharing the same source into a single SQL query using PostgreSQL VALUES + CASE
Falls back to individual queries for SQLite or single-node cases

Problem

Each sql/sqlall node did 3-4 DB operations: INSERT a value with null data, UPDATE it with the jsonpath result, SELECT to verify, and DELETE if the result was null. With ~90 sql/sqlall nodes per upload (rhivos-perf-comprehensive), that was at least ~270 DB round-trips.

Fix

SELECT-first: Compute the jsonpath result directly via SELECT. Only INSERT values that have data. Null results skip all writes.
Batch siblings: When the first sql/sqlall node for a source is processed, find all sibling nodes sharing that source and compute all jsonpaths in one query. Cache results so subsequent siblings skip the DB entirely.

For rhivos (82 sql/sqlall nodes in 2 source groups), this reduces 82 individual queries to 2 batched queries.
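
The batch-and-cache behaviour described above can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: `BatchingExtractor`, `fake_batch_query`, and the `(node_id, source_id, jsonpath)` tuple layout are hypothetical names chosen for the example, and the database call is replaced by a stand-in function so the query count can be observed.

```python
from collections import defaultdict


class BatchingExtractor:
    """Hypothetical sketch: one batched query per source, cached for siblings."""

    def __init__(self, nodes, run_batch_query):
        # nodes: list of (node_id, source_id, jsonpath) tuples (assumed layout)
        self._by_source = defaultdict(list)
        self._source_of = {}
        for node_id, source_id, jsonpath in nodes:
            self._by_source[source_id].append((node_id, jsonpath))
            self._source_of[node_id] = source_id
        self._run_batch_query = run_batch_query
        self._cache = {}
        self.queries = 0  # number of batched queries actually issued

    def value_for(self, node_id):
        # The first sibling for a source triggers the batch; later siblings hit the cache.
        if node_id not in self._cache:
            source_id = self._source_of[node_id]
            siblings = self._by_source[source_id]
            self.queries += 1
            results = self._run_batch_query(source_id, siblings)
            # Cache every sibling, including null results, so they never re-query.
            for sibling_id, _ in siblings:
                self._cache[sibling_id] = results.get(sibling_id)
        return self._cache[node_id]


def fake_batch_query(source_id, siblings):
    """Stand-in for the real batched SQL query (assumption for this sketch)."""
    return {node_id: f"{source_id}:{path}" for node_id, path in siblings}
```

With 82 nodes split across 2 sources (the per-source split is not stated in this PR, so any split works for the illustration), calling `value_for` on every node leaves `queries` at 2.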

Benchmark — quarkus-spring-boot-comparison (100 uploads, PostgreSQL)

Branch	Time	Values	vs main
main	47.1s	4502	baseline
load-legacy	46.4s	4502	-1.5% (no regression)
issue_80 (batched)	35.5s	4502	-24.5%

Same value count across all branches — no regressions. The load-legacy import changes have no impact on core upload performance.

Benchmark — rhivos-perf-comprehensive (legacy import, 4 runs)

	Baseline	Batched	Improvement
Average	113s	103s	~9%

Test plan

All 218 existing tests pass (after cleaning the DB; stale data left by a crash had caused false failures)
quarkus-spring-boot-comparison benchmark: same value count (4502), 24.5% faster
load-legacy branch: no regression (46.4s vs 47.1s, same values)
Verified rhivos import match rate unchanged (84.9%)

Fixes #80 (partial — addresses the sql/sqlall batching item)

Instead of creating a value with null data, updating it via SQL, then reading it back to check for null, directly SELECT the jsonpath result, skip entirely if null, and only INSERT values that have data.

Before: INSERT + UPDATE + SELECT + (DELETE if null) = 3-4 DB ops per node
After: SELECT + (INSERT if non-null) = 1-2 DB ops per node

For null results (missing jsonpath), eliminates all writes entirely.
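
To make the operation counts concrete, here is a small sketch that contrasts the two flows against an in-memory stand-in for the database. Everything in it is hypothetical (`CountingDb`, `compute_old`, `compute_select_first`), and a plain dictionary key lookup stands in for the jsonpath evaluation; it only illustrates how many operations each flow issues, not how h5m implements them.

```python
class CountingDb:
    """In-memory stand-in for the database that counts operations (hypothetical)."""

    def __init__(self, documents):
        self.documents = documents  # source_id -> dict, stands in for the JSON source
        self.values = {}            # (node_id, source_id) -> stored data
        self.ops = 0                # number of simulated DB round-trips

    def insert_value(self, node_id, source_id, data):
        self.ops += 1
        self.values[(node_id, source_id)] = data

    def update_from_source(self, node_id, source_id, key):
        self.ops += 1
        self.values[(node_id, source_id)] = self.documents[source_id].get(key)

    def select_value(self, node_id, source_id):
        self.ops += 1
        return self.values.get((node_id, source_id))

    def delete_value(self, node_id, source_id):
        self.ops += 1
        self.values.pop((node_id, source_id), None)

    def select_from_source(self, source_id, key):
        self.ops += 1
        return self.documents[source_id].get(key)


def compute_old(db, node_id, source_id, key):
    """Old flow: INSERT null, UPDATE, SELECT to verify, DELETE if null."""
    db.insert_value(node_id, source_id, None)
    db.update_from_source(node_id, source_id, key)
    data = db.select_value(node_id, source_id)
    if data is None:
        db.delete_value(node_id, source_id)
    return data


def compute_select_first(db, node_id, source_id, key):
    """New flow: SELECT the result first, INSERT only when it is non-null."""
    data = db.select_from_source(source_id, key)
    if data is not None:
        db.insert_value(node_id, source_id, data)
    return data
```

In this model a missing key costs four operations in the old flow and a single read in the new one, with nothing written.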

When the first sql/sqlall node for a source is processed, find all sibling sql/sqlall nodes sharing the same source and compute all their jsonpaths in a single SQL query using VALUES + CASE. Results are cached so subsequent siblings skip the DB entirely. For rhivos (82 sql/sqlall nodes in 2 source groups), this reduces 82 individual queries to 2 batched queries.
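
The actual query is not shown in this description, so the following is only one plausible shape for a VALUES + CASE batch, written as a Python helper that builds the SQL text and its parameters. The table name `value`, the columns `id` and `data` (assumed to be `jsonb`), the `%s` placeholder style, and the function name `build_batched_query` are all assumptions made for illustration; `jsonb_path_query_first` is a standard PostgreSQL function (version 12 and later).

```python
def build_batched_query(source_value_id, nodes):
    """Build one query that evaluates a jsonpath per node against a single source row.

    nodes: list of (node_id, jsonpath) pairs (assumed layout).
    Returns (sql, params) using %s placeholders.
    """
    if not nodes:
        raise ValueError("at least one node is required")

    # One CASE branch per node: pick that node's jsonpath.
    branches = " ".join(
        "WHEN %s::bigint THEN jsonb_path_query_first(v.data, %s::jsonpath)"
        for _ in nodes
    )
    # One VALUES row per node, so the result has one row per node id.
    rows = ", ".join("(%s::bigint)" for _ in nodes)

    sql = (
        f"SELECT n.node_id, CASE n.node_id {branches} END AS result "
        f"FROM (VALUES {rows}) AS n(node_id) "
        "CROSS JOIN value v "  # 'value' table name is hypothetical
        "WHERE v.id = %s"
    )

    # Parameters follow placeholder order: CASE branches, VALUES rows, source id.
    params = []
    for node_id, jsonpath in nodes:
        params.extend([node_id, jsonpath])
    params.extend(node_id for node_id, _ in nodes)
    params.append(source_value_id)
    return sql, params
```

For example, `build_batched_query(7, [(1, "$.a"), (2, "$.b")])` yields a single statement with seven placeholders; rows whose `result` is NULL would be skipped rather than inserted, in line with the SELECT-first change.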

stalep added 2 commits May 3, 2026 18:40

stalep marked this pull request as draft May 4, 2026 11:11

stalep marked this pull request as ready for review May 4, 2026 14:01

stalep wants to merge 2 commits into main from issue_80
