chore(bq|sf|rs|pg|db|ora|benchmark): cross-cloud spatial-index benchmark suite [sc-550438] #606
Open
vdelacruzb wants to merge 26 commits into
Conversation
Adds the infrastructure for running per-function timing benchmarks on Oracle modules, paralleling the existing test target.

- `bench()` helper added to `test_utils/__init__.py`: single timed run, prints a markdown row to stdout and appends it to `benchmark_results.md` with per-file `### filename` sub-sections.
- `list_functions.js` extended with `--type=benchmark` to discover `modules/benchmarks/<module>/benchmark_<FUNCTION>.py` files.
- New `make benchmark` target in `core/clouds/oracle/modules/Makefile` mirrors `make test`'s `modules=`/`functions=` filter shape, prepends a timestamped `## Benchmark run` header to `benchmark_results.md`, then invokes each matching benchmark via python.
- `benchmark_results.md` added to `.gitignore`.
- One example benchmark: `benchmark_H3_KRING.py`.
- README at `clouds/oracle/modules/benchmarks/README.md` documents layout, running, output format, and how to author new benchmarks.

This is Oracle scaffolding only; BigQuery and Snowflake equivalents follow in subsequent commits/PRs once the Oracle pattern is verified. The 18 H3 + 18 quadbin per-function benchmark files follow once the scaffolding is approved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
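For orientation, here is a minimal sketch of what a `bench()` helper with this behavior could look like. It is not the PR's implementation: `run_query` is assumed to be the existing `test_utils` entry point with its cached connection, and the output format is inferred from the description above.

```python
# Hypothetical sketch, not the PR's code: one timed run per call, a markdown
# row to stdout, and the same row appended to benchmark_results.md under a
# per-file "### <filename>" sub-section.
import os
import sys
import time

from test_utils import run_query  # assumed existing helper with connection caching

RESULTS_FILE = 'benchmark_results.md'
_files_with_header = set()


def bench(name, sql):
    start = time.perf_counter()
    run_query(sql)  # timed region; the review below suggests hoisting connection setup out of it
    elapsed = time.perf_counter() - start

    row = f'| {name} | {elapsed:.3f} s |'
    print(row)

    caller = os.path.basename(sys.argv[0])
    with open(RESULTS_FILE, 'a') as f:
        if caller not in _files_with_header:
            f.write(f'\n### {caller}\n\n')
            _files_with_header.add(caller)
        f.write(row + '\n')
```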
Code Review
This pull request introduces a benchmarking framework for Oracle modules, featuring a discovery script for benchmark files, a timing utility in the test helpers, and a new Makefile target. Review feedback identifies a missing variable declaration in the JavaScript discovery script and suggests refactoring the timing logic to exclude database connection overhead, ensuring more accurate benchmark results.
Two fixes responding to review feedback:

1. `bench()` doesn't belong in `test_utils` — different concern. Moved to a new `clouds/oracle/common/benchmark_utils/` package, sibling of `test_utils`. `benchmark_utils` imports `run_query` from `test_utils` to reuse its connection caching.
2. The per-file `sys.path.insert` bootstrap was ugly. Removed in favor of Make setting `PYTHONPATH=$(COMMON_DIR)` when invoking benchmarks. This mirrors how pytest auto-loads `modules/test/__init__.py` for tests: the orchestrator (make / pytest) handles path setup so per-file code stays clean.
Per-function benchmark files now look like:

```python
from benchmark_utils import bench

SOURCE_TABLE = '@@ORA_SCHEMA@@.SAMPLE_TABLE'

bench(name='...', sql='...')
```
For single-file re-runs (after a fix), use the Make filter: `make benchmark functions=H3_KRING`. Direct `python <file>` requires the user to set PYTHONPATH manually; the Make form is the preferred path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three small alignments after review:

1. Rename `modules/benchmarks/` → `modules/benchmark/` (singular). The project's existing convention is singular collective nouns: `modules/test/`, `modules/sql/`, `modules/doc/`. Industry leans plural (asv, Google Benchmark) but internal consistency wins.
2. Add `modules/benchmark/__init__.py` with the same `sys.path` bootstrap that `modules/test/__init__.py` uses. With this, `python -m clouds.oracle.modules.benchmark.h3.benchmark_H3_KRING` works without manual PYTHONPATH; full structural parity with tests.
3. Replace the per-file docstring with the standard `# Copyright (c) 2026, CARTO` header, matching the format used in `test_*.py` files.

Updated `list_functions.js` (`subdir = 'benchmark'`), Makefile, and README references accordingly.

Side-by-side parity:

| Tests | Benchmarks |
| ----- | ---------- |
| `common/test_utils/__init__.py` | `common/benchmark_utils/__init__.py` |
| `modules/test/__init__.py` (path setup) | `modules/benchmark/__init__.py` (path setup) |
| `modules/test/<m>/test_<F>.py` | `modules/benchmark/<m>/benchmark_<F>.py` |
| `make test modules=X functions=Y` | `make benchmark modules=X functions=Y` |
| `list_functions.js` | `list_functions.js --type=benchmark` |
| pytest runner | direct python runner |

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
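As an illustration of point 2, a minimal sketch of what such a bootstrap could contain, assuming the same relative layout as `modules/test/__init__.py` (the exact path arithmetic in the repo may differ):

```python
# Hypothetical modules/benchmark/__init__.py sketch: put clouds/oracle/common
# on sys.path so benchmark files can `from benchmark_utils import bench`
# without a manually exported PYTHONPATH. The relative path is an assumption.
import os
import sys

_COMMON_DIR = os.path.abspath(
    os.path.join(os.path.dirname(__file__), os.pardir, os.pardir, 'common')
)
if _COMMON_DIR not in sys.path:
    sys.path.insert(0, _COMMON_DIR)
```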
…nore intent) Working-tree copy preserved for local benchmarking; existing gitignore rule was anchored to the wrong path and never took effect, so the file was committed by accident.
Summary
Cross-cloud spatial-index benchmark suite. Replaces the Oracle-only scaffolding originally scoped for sc-550438 — now covers all six supported clouds (BigQuery, Snowflake, Redshift, Postgres, Databricks, Oracle) with per-function timing benchmarks for H3 and Quadbin modules.
What's in this PR
Per cloud (`core/clouds/<cloud>/`):

- `common/benchmark_utils/` (Py) or `common/benchmark-utils.js` (JS) — `bench()`/`benchmark()` helpers: cached per-process connection (so auth/setup isn't in the timer), CTAS output pattern (real tables, not `COUNT(*)`), `@@SCHEMA@@` placeholder resolution, env-var-driven config dir; see the sketch after this list.
- `common/list_functions.js` — extended with `--type=benchmark` and `--base-dirs="p1,p2"` for cross-repo discovery (mirrors how `build_modules.js` aggregates via `MODULES_DIRS`).
- `modules/benchmark/<module>/benchmark_<FN>.{py,bench.js}` — one ~10-line file per function (19 H3 + 19 Quadbin per applicable cloud).
- `modules/benchmark/config.template.json` (committed) + `config.json` (gitignored).
- `modules/Makefile` — `benchmark` target with the same `modules=`/`functions=` filters as `make test`, plus `keep=1` (preserve output tables) and `verbose=1` (full errors).
- `Makefile` — `benchmark`/`benchmark-modules` per-cloud targets.
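A sketch of the helper shape the first bullet describes. Every name here is an assumption; the per-cloud implementations in the PR differ in driver details.

```python
# Hypothetical sketch of the bench()/benchmark() shape: cached per-process
# connection (auth/setup stays out of the timer), CTAS output (a real table,
# not COUNT(*)), and @@SCHEMA@@ placeholder resolution. All names assumed.
import time

_connection = None  # created once per process, reused across benchmarks


def benchmark(name, sql, conn_factory, schema):
    global _connection
    if _connection is None:
        _connection = conn_factory()  # driver-specific connect; not timed below

    # CTAS materializes a real output table so the engine cannot skip the
    # measured work the way a COUNT(*) wrapper sometimes allows.
    output_table = f'{schema}.BENCH_{name}'
    ctas = f'CREATE TABLE {output_table} AS {sql}'.replace('@@SCHEMA@@', schema)

    start = time.perf_counter()
    _connection.execute(ctas)
    elapsed = time.perf_counter() - start
    return name, elapsed, output_table
```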
Top-level (`core/Makefile`) — `make benchmark cloud=<cloud>`, mirroring `make test`/`make deploy`.

Each run appends a markdown table to `dist/benchmark_<ts>.md`; the `Output Table` column appears only with `keep=1`.

Cross-cloud alignment: same row counts / resolutions / modes everywhere (e.g. `H3_COMPACT` and `H3_UNCOMPACT` capped at 100 rows because Oracle's `H3_UNCOMPACT` hits an internal `VARCHAR2(4000)` ceiling on larger arrays).
Docs: each cloud's `README.md` gains a `make benchmark` bullet plus a `benchmark-modules` entry in the existing Filtering section. `.claude/rules/<cloud>.md` documents the placeholder convention (`<my-{ns}>.<my-table>` / `<my-{ns}>.<my-output-table>`) used across docs and benchmark templates.
Usage
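Representative invocations, built from the flags documented in the Test plan below (the `cloud=` values are assumed to be the lowercase per-cloud directory names):

```sh
# Run every configured benchmark for one cloud
make benchmark cloud=oracle

# Filter like `make test`
make benchmark cloud=bigquery modules=h3 functions=H3_KRING

# Keep output tables and show full error text
make benchmark cloud=snowflake keep=1 verbose=1
```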
Test plan
- `make benchmark cloud=<cloud>` runs every configured benchmark
- Filters (`modules=`, `functions=`) work as in `make test`
- `keep=1` preserves output tables and shows the extra column
- `verbose=1` writes the full error into the table cell (no truncation)
- `make test`/`make deploy` unaffected; lint passes across all 6 clouds

🤖 Generated with Claude Code