Skip to content

Conversation

@praveenc7
Copy link
Contributor

Summary

ISSUE=#17336

For dictionary-encoded columns, DISTINCT_COUNT_SMART_HLL currently uses a RoaringBitmap to deduplicate dictionary IDs before feeding values into HLL. While efficient for low cardinality, this approach becomes CPU-intensive for high cardinality (hundreds of thousands to millions of distinct values), where RoaringBitmap insertions dominate query execution time and negate the benefits of HLL.

Screenshot 2025-12-16 at 3 59 43 PM

Proposal

Introduce a cardinality-aware execution path for DISTINCT_COUNT_HLL:

  • Low cardinality → continue using RoaringBitmap (exact deduplication, memory-efficient)
  • High cardinality → bypass RoaringBitmap and update HLL directly

Observed improvements

  • Reduces server-side CPU time by ~4x - 10× for high-cardinality queries (observed improvements from ~8s → ~700ms in prod benchmarks).

Testing Done

Added JMH benchmark covering:

This JMH benchmark isolates server-side aggregation cost for the DistinctCountHLLAggregationFunction under controlled parameters: Each variation was run for 10 minutes

recordCount: {100K, 500K, 1M, 5M, 10M, 25M}
cardinalityRatioPercent: {1, 10, 30, 50, 80, 100} → Creates a record with configured cardinality
useRoaringBitMap/HLL -> Controls on to run the test with useRoaringBitMap or HLL

DictIds are pre-generated so benchmark timing includes only aggregation, not data generation.

Sample plots :
Screenshot 2025-12-16 at 3 58 39 PM

Flame graph after optimization : Aggregate doesn't dominate CPU
Screenshot 2025-12-16 at 4 01 42 PM

Benchmark Results (Average Latency, ms/op)

Record Count = 100,000

Cardinality RoaringBitmap Direct HLL
1,000 0.6 0.80
10,000 0.71 0.87
30,000 0.89 0.90
50,000 1.05 0.96
80,000 1.79 1.00
100,000 1.91 1.05

Record Count = 500,000

Cardinality RoaringBitmap Direct HLL
5,000 1.45 2.85
50,000 2.36 2.92
150,000 5.53 3.16
250,000 7.26 3.18
400,000 9.59 3.18
500,000 10.69 3.17

Record Count = 1,000,000

Cardinality RoaringBitmap Direct HLL
10,000 2.53 5.36
100,000 6.69 5.44
300,000 13.12 5.80
500,000 15.92 5.78
800,000 19.84 5.78
1,000,000 22.11 5.71

Record Count = 5,000,000

Cardinality RoaringBitmap Direct HLL
50,000 11.51 25.12
500,000 53.62 25.29
1,500,000 75.60 26.13
2,500,000 92.91 25.53
4,000,000 113.21 25.24
5,000,000 129.34 25.79

Record Count = 10,000,000

Cardinality RoaringBitmap Direct HLL
100,000 52.98 50.64
1,000,000 117.68 50.61
3,000,000 161.56 50.08
5,000,000 206.71 51.14
8,000,000 248.77 50.01
10,000,000 278.78 50.37

Record Count = 25,000,000

Cardinality RoaringBitmap Direct HLL
250,000 199.06 125.82
2,500,000 348.39 126.40
7,500,000 466.14 124.74
12,500,000 555.77 124.35
20,000,000 679.43 124.99

Recommendation:

Based on the micro-benchmark results across record counts and cardinalities, 100K distinct values is a good default threshold to start with for switching away from the RoaringBitmap path. At this scale, RoaringBitmap remains efficient for low-cardinality cases, while higher cardinalities already show clear benefits from using direct HLL updates. This threshold provides a safe balance between preserving deduplication benefits for low cardinality and avoiding excessive bitmap maintenance cost for high-cardinality workloads

@codecov-commenter
Copy link

codecov-commenter commented Dec 22, 2025

Codecov Report

❌ Patch coverage is 42.02899% with 40 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.28%. Comparing base (72be4e6) to head (fd10acb).
⚠️ Report is 35 commits behind head on master.

Files with missing lines Patch % Lines
...seDistinctCountSmartSketchAggregationFunction.java 38.23% 17 Missing and 4 partials ⚠️
...tion/DistinctCountSmartHLLAggregationFunction.java 45.71% 16 Missing and 3 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #17411      +/-   ##
============================================
+ Coverage     63.26%   63.28%   +0.02%     
  Complexity     1474     1474              
============================================
  Files          3152     3162      +10     
  Lines        187881   188725     +844     
  Branches      28765    28881     +116     
============================================
+ Hits         118855   119433     +578     
- Misses        59810    60039     +229     
- Partials       9216     9253      +37     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-11 63.24% <42.02%> (+0.03%) ⬆️
java-21 63.22% <42.02%> (-0.02%) ⬇️
temurin 63.28% <42.02%> (+0.02%) ⬆️
unittests 63.28% <42.02%> (+0.02%) ⬆️
unittests1 55.61% <42.02%> (-0.06%) ⬇️
unittests2 34.00% <0.00%> (+0.10%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@praveenc7 praveenc7 marked this pull request as ready for review December 23, 2025 22:45
@xiangfu0 xiangfu0 requested review from Copilot and xiangfu0 January 6, 2026 07:05
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces an adaptive cardinality-aware execution path for DISTINCT_COUNT_SMART_HLL aggregation on dictionary-encoded columns. For high-cardinality scenarios (>100K distinct values), bypassing RoaringBitmap and directly updating HLL reduces server-side CPU time by 4x-10x. A new dictThreshold parameter (default: 100K) controls when to convert from RoaringBitmap to HLL during aggregation instead of waiting until finalization.

Key changes:

  • Added adaptive conversion logic that switches from RoaringBitmap to HLL when cardinality exceeds threshold
  • Introduced dictThreshold parameter with 100K default based on benchmark results
  • Implemented early conversion checks during aggregation for both group-by and non-group-by queries

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
DistinctCountSmartHLLAggregationFunction.java Added _dictIdCardinalityThreshold field, parameter parsing, and adaptive conversion logic for non-group-by aggregation
BaseDistinctCountSmartSketchAggregationFunction.java Implemented group-by adaptive conversion with modified group tracking and cardinality checks
DistinctCountSmartHLLAggregationFunctionTest.java Added unit tests for parameter parsing, HLL operations, and adaptive conversion behavior
BenchmarkDistinctCountHLLThreshold.java Added JMH benchmark to measure performance across different cardinalities and record counts
pinot-perf/pom.xml Registered new benchmark class in Maven build configuration

Comment on lines 700 to 702
if (!(result instanceof DictIdsWrapper)) {
// Already converted to sketch, offer directly
((HyperLogLog) result).offer(dictionary.get(dictId));
Copy link

Copilot AI Jan 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code assumes any result that is not a DictIdsWrapper is a HyperLogLog, but result could be null on first access for a group. This will cause a NullPointerException. Add a null check and initialize the result if it's null before casting.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, it can be null for invalid groupKey. Updating it

Comment on lines 717 to 721
if (!(result instanceof DictIdsWrapper)) {
// Already converted to sketch, offer directly
for (int dictId : dictIds) {
((HyperLogLog) result).offer(dictionary.get(dictId));
}
Copy link

Copilot AI Jan 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code assumes any result that is not a DictIdsWrapper is a HyperLogLog, but result could be null on first access for a group. This will cause a NullPointerException. Add a null check and initialize the result if it's null before casting.

Copilot uses AI. Check for mistakes.
Comment on lines +54 to +56
* - dictThreshold: Threshold for dictionary-encoded columns to trigger early conversion from RoaringBitmap to HLL
* during aggregation. 100_000 by default. Set to Integer.MAX_VALUE to disable and convert only
* at finalization.
Copy link

Copilot AI Jan 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation should clarify that non-positive values (≤0) are also treated as disabled and converted to Integer.MAX_VALUE, consistent with the implementation in the Parameters class where values ≤0 are set to Integer.MAX_VALUE.

Copilot uses AI. Check for mistakes.
for (int i = 0; i < 1000; i++) {
hll.offer(i);
}
Long cardinality = Long.valueOf(function.extractFinalResult(hll));
Copy link

Copilot AI Jan 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace Long.valueOf() with direct cast (Long) since extractFinalResult() already returns a Long object. Using Long.valueOf() on an object is unnecessary and could cause a NullPointerException if the result is null.

Suggested change
Long cardinality = Long.valueOf(function.extractFinalResult(hll));
Long cardinality = (Long) function.extractFinalResult(hll);

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

function.extractFinalResult(hll) returns Int actually hence Long.valueOf() is used

Copy link
Contributor

@xiangfu0 xiangfu0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm otherwise

@Override
public void aggregate(int length, AggregationResultHolder aggregationResultHolder,
Map<ExpressionContext, BlockValSet> blockValSetMap) {
Map<ExpressionContext, BlockValSet> blockValSetMap) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revert this

@xiangfu0
Copy link
Contributor

xiangfu0 commented Jan 6, 2026

also please add a release note for this enhancement

@xiangfu0
Copy link
Contributor

xiangfu0 commented Jan 6, 2026

Also please fix the tests.

@praveenc7
Copy link
Contributor Author

also please add a release note for this enhancement

Sure will add one

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants