Fix build_v3 Druid cube compatibility, dialect auto-detection, and pre-agg fan-out by shangyian · Pull Request #1908 · DataJunction/dj

shangyian · 2026-03-20T07:30:06Z

Summary

This PR fixes several related issues in v3 SQL generation.

Legacy Druid cube column names

When a cube was materialized by the legacy DruidMaterializationJob (the predecessor of DruidCubeMaterializationJob), the physical column names in the Druid table use the amenable_name format (e.g. containing _DOT_) rather than the short column name. build_synthetic_grain_group now reads materialization.config["combiners"][*]["columns"] to build a lookup from short name to physical column name, and applies it to both SELECT projections and WHERE clause references.
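A minimal sketch of the lookup described above. The shape of each entry under "columns" is not shown in this PR, so the "name" (physical column) and "column" (short name) keys are assumptions, and both helper names are hypothetical rather than the functions added here.

```python
from typing import Any, Dict


def build_physical_column_lookup(config: Dict[str, Any]) -> Dict[str, str]:
    """Map short column names to the physical names stored in the Druid table."""
    lookup: Dict[str, str] = {}
    for combiner in config.get("combiners", []):
        for col in combiner.get("columns", []):
            short_name = col.get("column")  # assumed key: short column name
            physical = col.get("name")  # assumed key: amenable (physical) name
            if short_name and physical:
                # Keep the first mapping seen if combiners repeat a column.
                lookup.setdefault(short_name, physical)
    return lookup


def physical_name(lookup: Dict[str, str], short_name: str) -> str:
    """Fall back to the short name when the cube has no legacy mapping."""
    return lookup.get(short_name, short_name)


# Hypothetical legacy materialization config.
config = {
    "combiners": [
        {
            "columns": [
                {"name": "default_DOT_orders_DOT_status", "column": "status"},
                {"name": "default_DOT_orders_DOT_total", "column": "total"},
            ]
        }
    ]
}
lookup = build_physical_column_lookup(config)
```

The fallback in physical_name means cubes built by the newer job, which have no legacy mapping, keep using their short column names unchanged.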

Dialect-based table references

Druid SQL syntax does not support catalog-qualified (three-part) table names, so build_synthetic_grain_group now emits schema.table for Druid and catalog.schema.table for all other dialects.
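The dialect rule above can be sketched as a small helper. The function name and the plain-string dialect argument are assumptions for illustration; the actual code in build_synthetic_grain_group may represent dialects and table names differently.

```python
from typing import Optional


def table_reference(
    dialect: Optional[str], catalog: str, schema: str, table: str
) -> str:
    """Return the table reference to emit for the given dialect."""
    if dialect is not None and dialect.lower() == "druid":
        # Druid: two-part name only.
        return f"{schema}.{table}"
    # Every other dialect: fully qualified three-part name.
    return f"{catalog}.{schema}.{table}"
```

For example, table_reference("druid", "warehouse", "analytics", "orders_cube") yields analytics.orders_cube, while the same call with "trino" yields warehouse.analytics.orders_cube.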

Dialect defaults

Previously, dialect=None was silently normalized to SPARK before the cube-path check, so requests without a dialect skipped the materialized cube path entirely. The resolver now picks the fastest available engine tier: Druid (if a matching cube exists), then Trino, then Spark. The order is documented in _ENGINE_TIER_PREFERENCE.
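A rough sketch of the tiered resolution described above. ENGINE_TIERS is a hypothetical stand-in, since the actual shape of _ENGINE_TIER_PREFERENCE is not shown in this PR; the function signature and the final fallback to Spark are likewise assumptions.

```python
from typing import Optional, Set

# Hypothetical stand-in for _ENGINE_TIER_PREFERENCE: fastest tier first.
ENGINE_TIERS = ("druid", "trino", "spark")


def resolve_dialect(
    requested: Optional[str],
    has_matching_cube: bool,
    available: Set[str],
) -> str:
    """Pick a dialect, honoring an explicit request before auto-detecting."""
    if requested is not None:
        return requested.lower()
    for engine in ENGINE_TIERS:
        if engine not in available:
            continue
        # Druid only qualifies when a materialized cube matches the request.
        if engine == "druid" and not has_matching_cube:
            continue
        return engine
    # Assumed fallback, mirroring the old default of SPARK.
    return "spark"
```

The key point is that the None case is resolved after the cube check rather than before it, so a request without a dialect can still reach the materialized cube path.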

Pre-agg wrapper CTEs

Adds _build_pre_agg_wrapper_cte to prevent fan-out when a metric with LIMITED aggregability is combined with a pre-aggregated grain group. The helper wraps the pre-agg source in a CTE that aggregates it to the right grain before joining.
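To illustrate the fan-out being prevented, here is a small self-contained demonstration using SQLite. The tables, columns, and SQL are hypothetical and are not the SQL that _build_pre_agg_wrapper_cte generates; they only show why aggregating the pre-agg source to the join grain first keeps the other metric from being multiplied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical pre-agg table kept at a finer grain (region, user_id).
    CREATE TABLE pre_agg (region TEXT, user_id INTEGER, revenue REAL);
    INSERT INTO pre_agg VALUES
        ('us', 1, 10.0), ('us', 2, 5.0), ('us', 2, 7.0), ('eu', 3, 4.0);
    -- Hypothetical second grain group, already at the region grain.
    CREATE TABLE views_by_region (region TEXT, page_views INTEGER);
    INSERT INTO views_by_region VALUES ('us', 100), ('eu', 40);
    """
)

# Naive join: each pre_agg row repeats the region's page_views value.
naive_sql = """
SELECT p.region, COUNT(DISTINCT p.user_id) AS users, SUM(v.page_views) AS page_views
FROM pre_agg p
JOIN views_by_region v ON p.region = v.region
GROUP BY p.region
ORDER BY p.region
"""

# Wrapped: aggregate the pre-agg source to the region grain before joining.
wrapped_sql = """
WITH pre_agg_wrapper AS (
    SELECT region, COUNT(DISTINCT user_id) AS users
    FROM pre_agg
    GROUP BY region
)
SELECT w.region, SUM(w.users) AS users, SUM(v.page_views) AS page_views
FROM pre_agg_wrapper w
JOIN views_by_region v ON w.region = v.region
GROUP BY w.region
ORDER BY w.region
"""

naive_rows = conn.execute(naive_sql).fetchall()
wrapped_rows = conn.execute(wrapped_sql).fetchall()
print(naive_rows)    # 'us' page_views is inflated to 300
print(wrapped_rows)  # 'us' page_views stays at 100
```

In the wrapped query, SUM(w.users) re-aggregates a single row per region, so it leaves the distinct count unchanged while the join no longer multiplies page_views.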

Test Plan

PR has an associated issue: #
make check passes
make test shows 100% unit test coverage

Deployment Plan

netlify · 2026-03-20T07:30:11Z

✅ Deploy Preview for thriving-cassata-78ae72 canceled.

Name	Link
🔨 Latest commit	`94946f6`
🔍 Latest deploy log	https://app.netlify.com/projects/thriving-cassata-78ae72/deploys/69c223c1173e4b0007d66f62

shangyian · 2026-03-20T16:21:51Z

datajunction-server/datajunction_server/construction/build_v3/metrics.py

+    # Handle LIMITED aggregability (COUNT DISTINCT).
+    # If the grain group was pre-aggregated (is_pre_aggregated=True), the wrapper CTE
+    # already computed COUNT(DISTINCT grain_key) and stored it as a named column.
+    # Emit SUM(pre_agg_col) — a no-op re-aggregation since the wrapper produces


We have to aggregate to the right grain before we combine these metrics to avoid double counting.


shangyian added 2 commits March 20, 2026 00:07

Clean up

8a0cb40

Fix tests

8dd91e2

Add tests

07d4422


shangyian added 3 commits March 23, 2026 11:22

Fix metrics SQL tests

00c03aa

Fix

5be7eb2

Add appropriate fallbacks based on dialect selected, materialized cube state, and materialization type

b66dc05

shangyian changed the title ~~Add build v3 tests~~ Fix build_v3 Druid cube compatibility, dialect auto-detection, and pre-agg fan-out Mar 24, 2026

shangyian added 5 commits March 23, 2026 18:29

Fix tests

ba5cb8e

Fix

b2c39cd

Fix

d887917

Fix tests

d75e02c

Fix

94946f6

shangyian marked this pull request as ready for review March 24, 2026 06:14

shangyian merged commit 89fd3aa into DataJunction:main Mar 24, 2026
17 checks passed

shangyian deleted the add-build-v3-tests branch March 24, 2026 06:14

shangyian merged 11 commits into DataJunction:main from shangyian:add-build-v3-tests
