Skip to content

Fix build_v3 Druid cube compatibility, dialect auto-detection, and pre-agg fan-out#1908

Merged
shangyian merged 11 commits intoDataJunction:mainfrom
shangyian:add-build-v3-tests
Mar 24, 2026
Merged

Fix build_v3 Druid cube compatibility, dialect auto-detection, and pre-agg fan-out#1908
shangyian merged 11 commits intoDataJunction:mainfrom
shangyian:add-build-v3-tests

Conversation

@shangyian
Copy link
Copy Markdown
Collaborator

@shangyian shangyian commented Mar 20, 2026

Summary

This PR fixes several related issues in v3 SQL generation.

Legacy Druid cube column names

When a cube was materialized using the legacy DruidMaterializationJob (pre-DruidCubeMaterializationJob), physical column names in the Druid table use the amenable_name format (e.g. with _DOT_) rather than the short column name. build_synthetic_grain_group now reads materialization.config["combiners"][*]["columns"] to build a short name to physical column name lookup and applies it to both SELECT projections and WHERE clause references.

Dialect-based table references

Druid SQL syntax does not support schema-qualified table names, so build_synthetic_grain_group now emits schema.table for Druid and catalog.schema.table for all other dialects.

Dialect defaults

Previously, dialect=None was silently normalized to SPARK before the cube-path check, causing dialect=None requests to skip the materialized cube path entirely. The resolver now picks the fastest available engine tier: Druid (if a matching cube exists), then Trino, then Spark. This is documented in _ENGINE_TIER_PREFERENCE.

Pre-agg wrapper CTEs

Adds _build_pre_agg_wrapper_cte to prevent fan-out when a LIMITED aggregability metric is combined with a pre-aggregated grain group, wrapping the pre-agg source in a CTE that aggregates it to the right grain before joining.

Test Plan

  • PR has an associated issue: #
  • make check passes
  • make test shows 100% unit test coverage

Deployment Plan

@netlify
Copy link
Copy Markdown

netlify bot commented Mar 20, 2026

Deploy Preview for thriving-cassata-78ae72 canceled.

Name Link
🔨 Latest commit 94946f6
🔍 Latest deploy log https://app.netlify.com/projects/thriving-cassata-78ae72/deploys/69c223c1173e4b0007d66f62

# Handle LIMITED aggregability (COUNT DISTINCT).
# If the grain group was pre-aggregated (is_pre_aggregated=True), the wrapper CTE
# already computed COUNT(DISTINCT grain_key) and stored it as a named column.
# Emit SUM(pre_agg_col) — a no-op re-aggregation since the wrapper produces
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have to aggregate to the right grain before we combine these metrics to remove chances of double counting.

@shangyian shangyian changed the title Add build v3 tests Fix build_v3 Druid cube compatibility, dialect auto-detection, and pre-agg fan-out Mar 24, 2026
@shangyian shangyian marked this pull request as ready for review March 24, 2026 06:14
@shangyian shangyian merged commit 89fd3aa into DataJunction:main Mar 24, 2026
17 checks passed
@shangyian shangyian deleted the add-build-v3-tests branch March 24, 2026 06:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant