metrics: added more metrics for multi-column group by#19526
metrics: added more metrics for multi-column group by#19526rluvaton wants to merge 5 commits intoapache:mainfrom
Conversation
| /// Maximum hash map capacity | ||
| pub(crate) maximum_hash_map_capacity: Gauge, |
There was a problem hiding this comment.
is this tracking the HashTable's initial capacity, or how it grows during execution? When is this useful vs. maximum_number_of_entries_in_map?
There was a problem hiding this comment.
How it grows during execution.
This is different than max number of elements as you can't know from max number of elements if the hash map fit in cache and which cache. Because unlike vector which only the number of elements is used and you can know if it fit in the cache, in hash map it is scattered
| @@ -1280,8 +1356,12 @@ mod tests { | |||
| #[test] | |||
| fn test_emit_first_n_for_vectorized_group_values() { | |||
There was a problem hiding this comment.
Besides these tests, I think it would be good to add .slt tests demonstrating the new metrics appear in EXPLAIN ANALYZE.
There was a problem hiding this comment.
Wouldn't that be highly fragile?
There was a problem hiding this comment.
You could use slt:ignore like this
| pub(crate) fn new(metrics: &ExecutionPlanMetricsSet, partition: usize) -> Self { | ||
| Self { | ||
| time_hashing_grouping_columns: MetricBuilder::new(metrics) | ||
| .subset_time("time_hashing_grouping_columns", partition), |
There was a problem hiding this comment.
Is aggregation of per-partition metrics handled elsewhere, or will users need to manually sum across partitions to see total time?
There was a problem hiding this comment.
Per partition I think, it's the same as other metrics
|
Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days. |
Which issue does this PR close?
N/A
Rationale for this change
The first step in looking for a problem in client env is by metrics, and we currently have no metrics for multi group by aggregation (as well as other types of group by - single column and row based) so this add some
What changes are included in this PR?
Added metrics for time hashing, building hash map and maximum length and capacity for the hash map for multi group by
I would like to add metrics for the
vectorized_appendandvectorized_equal_toas well but I don't have a good name for them that do not leak internal implementationAre these changes tested?
No
Are there any user-facing changes?
Yes