fix: updated the sql query for dataset stats #1555
nikhilsinhaparseable merged 4 commits into parseablehq:main
Conversation
Walkthrough
This PR refactors the field statistics SQL generation logic by reordering CTEs for improved pagination handling, adds the `All` action to the SuperAdmin role, enables backward-compatible deserialization of the User `protected` field, and adjusts a log level.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~30 minutes
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/storage/field_stats.rs (2)
637-649: ⚠️ Potential issue | 🔴 Critical
`rv.field_name` alias doesn't exist inside `top_values`; the field filter will always fail at runtime.
The `fields_filter` string is interpolated directly into the body of `top_values`:

    top_values AS (
        SELECT * FROM ranked_values
        WHERE rn > ... AND rn <= ...
        AND rv.field_name IN (...)  -- ← no alias 'rv' here
    )

`rv` is only introduced in the final `SELECT ... FROM top_values tv JOIN field_totals ft ...`, not inside `top_values`. Any request with a non-empty `fields` list will produce a SQL execution error.
🐛 Proposed fix

    - format!("AND rv.field_name IN ({})", quoted_fields.join(", "))
    + format!("AND field_name IN ({})", quoted_fields.join(", "))

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/storage/field_stats.rs` around lines 637 - 649, The generated SQL filter uses the nonexistent alias rv inside the top_values CTE causing runtime errors; update the fields_filter construction so it references the correct column name as used in ranked_values/top_values (e.g., use "field_name" without the rv prefix) or build the filter to apply against ranked_values (e.g., "field_name IN (...)") so the WHERE inside top_values matches available columns; modify the code that creates fields_filter (the block assigning fields_filter) to remove the "rv." prefix when interpolating into the top_values CTE so the filter uses the actual column name present in top_values/ranked_values.
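As a rough illustration of the fix described above, the filter can be built against a column name that is passed in explicitly, so no alias from an outer query leaks into the CTE. This is a minimal sketch: `build_fields_filter` and its `column` parameter are hypothetical names chosen for the example, not functions from `src/storage/field_stats.rs`.

```rust
/// Hypothetical helper (not the project's code): builds the optional
/// `AND <column> IN (...)` predicate that gets interpolated into a CTE.
/// The column is passed unqualified so it resolves inside the CTE scope.
fn build_fields_filter(column: &str, fields: &[String]) -> String {
    // No requested fields means no extra predicate at all.
    if fields.is_empty() {
        return String::new();
    }
    let quoted_fields: Vec<String> = fields
        .iter()
        // Wrap each name in single quotes, doubling embedded single quotes
        // so the value cannot terminate its own SQL literal.
        .map(|f| format!("'{}'", f.replace('\'', "''")))
        .collect();
    format!("AND {} IN ({})", column, quoted_fields.join(", "))
}
```

Calling it with `"field_name"` and the fields `a` and `b` yields `AND field_name IN ('a', 'b')`, which contains no `rv.` prefix.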
650-650: ⚠️ Potential issue | 🟠 Major
`dataset_name` is escaped for double quotes but is used inside single-quoted SQL string literals; the correct escaping is for single quotes.
Line 650:

    let dataset_name = dataset_name.replace('"', "\"\"");

`dataset_name` is then interpolated at lines 677 and 700 as a SQL string literal:

    WHERE dataset_name = '{dataset_name}'

Double-quote escaping does nothing to protect the single-quoted literal. A dataset name containing `'` (e.g. `"customer's_data"`) produces broken SQL `WHERE dataset_name = 'customer's_data'`: either a parse error or a potential injection point.
🐛 Proposed fix

    - let dataset_name = dataset_name.replace('"', "\"\"");
    + let dataset_name = dataset_name.replace('\'', "''");

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/storage/field_stats.rs` at line 650, The code currently replaces double quotes in dataset_name (let dataset_name = dataset_name.replace('"', "\"\"")) but the value is later embedded into single-quoted SQL literals (WHERE dataset_name = '{dataset_name}'), so replace the unsafe approach by either using a parameterized query/binding for dataset_name when building the SQL or, if parameterization is not possible in this context, escape single quotes correctly by replacing ' with '' (e.g., dataset_name.replace('\'', "''")); update the code that constructs the SQL strings (the sites referencing dataset_name at the WHERE clauses) to use the safe/escaped value or the parameter placeholder to eliminate parse errors and injection risk.
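The escaping fallback mentioned above (used only when parameter binding is not available) can be sketched as follows. `escape_sql_literal` and `dataset_predicate` are hypothetical helper names for illustration; the review itself only proposes the `replace('\'', "''")` call.

```rust
/// Hypothetical helper: doubles single quotes so the value stays inside
/// a single-quoted SQL string literal.
fn escape_sql_literal(value: &str) -> String {
    value.replace('\'', "''")
}

/// Hypothetical helper: builds the dataset predicate from an escaped name.
fn dataset_predicate(dataset_name: &str) -> String {
    format!("WHERE dataset_name = '{}'", escape_sql_literal(dataset_name))
}
```

With the name `customer's_data`, the predicate becomes `WHERE dataset_name = 'customer''s_data'`, which parses as one literal instead of ending the string early.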
📒 Files selected for processing (3)
- src/handlers/http/modal/query_server.rs
- src/handlers/http/modal/server.rs
- src/storage/field_stats.rs
fe47d78 to 885a954
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/storage/field_stats.rs (1)
713-715: Make final ordering deterministic for equal counts.
The final sort only uses `distinct_value_count DESC`. Add `distinct_value ASC` as a tie-breaker to keep output stable across executions/pages.
Suggested fix

      ORDER BY
          tv.field_name,
    -     tv.distinct_value_count DESC"
    +     tv.distinct_value_count DESC,
    +     tv.distinct_value ASC"

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/storage/field_stats.rs` around lines 713 - 715, The ORDER BY clause in the SQL built in src/storage/field_stats.rs currently sorts only by tv.field_name and tv.distinct_value_count DESC which can yield non-deterministic ordering when counts tie; update the ORDER BY to add tv.distinct_value ASC as a tie-breaker (i.e., ORDER BY tv.field_name, tv.distinct_value_count DESC, tv.distinct_value ASC) in the query string where it's defined so results are stable across executions/pages.
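To show why the tie-breaker matters, the same three-key ordering can be modeled in memory. This is only an illustrative sketch: the `Row` struct and the `final_order` and `sort_rows` functions are hypothetical and simply mirror the column names used in the suggested `ORDER BY`.

```rust
use std::cmp::Ordering;

/// Hypothetical in-memory stand-in for one result row of the query.
#[derive(Debug, Clone, PartialEq)]
struct Row {
    field_name: String,
    distinct_value: String,
    distinct_value_count: u64,
}

/// Mirrors `ORDER BY field_name, distinct_value_count DESC, distinct_value ASC`.
fn final_order(a: &Row, b: &Row) -> Ordering {
    a.field_name
        .cmp(&b.field_name)
        // Descending count: compare b against a.
        .then_with(|| b.distinct_value_count.cmp(&a.distinct_value_count))
        // Tie-breaker: ascending value makes equal counts deterministic.
        .then_with(|| a.distinct_value.cmp(&b.distinct_value))
}

/// Sorts rows in place using the three-key ordering above.
fn sort_rows(rows: &mut [Row]) {
    rows.sort_by(final_order);
}
```

Because every pair of distinct rows now compares unequal, two inputs holding the same rows in different order sort to the same sequence, which is the property pagination relies on.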
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/storage/field_stats.rs`:
- Around line 677-680: The injected fields predicate uses an undefined alias
"rv" (rv.field_name IN (...)) causing runtime SQL errors; update the code that
builds fields_filter to reference the correct column in this CTE scope (either
plain field_name or the CTE/table alias used here, e.g. field_stats.field_name)
instead of rv.field_name so the predicate resolves in the same SELECT/CTE where
it is injected.
🚧 Files skipped from review as they are similar to previous changes (2)
- src/handlers/http/modal/server.rs
- src/handlers/http/modal/query_server.rs
🧹 Nitpick comments (2)
src/storage/field_stats.rs (2)
716-718: Add a tie-breaker to final ordering for stable output.
Rows with equal `distinct_value_count` can be returned in non-deterministic order. Adding `tv.distinct_value ASC` keeps results stable and consistent with ranking behavior.
Proposed diff

      ORDER BY
          tv.field_name,
    -     tv.distinct_value_count DESC"
    +     tv.distinct_value_count DESC,
    +     tv.distinct_value ASC"

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/storage/field_stats.rs` around lines 716 - 718, The final ORDER BY in the SQL string currently sorts by tv.field_name and tv.distinct_value_count DESC but lacks a tie-breaker; update the ORDER BY clause in the SQL (the string containing "ORDER BY tv.field_name, tv.distinct_value_count DESC") to append a stable tie-breaker "tv.distinct_value ASC" so rows with equal distinct_value_count are returned deterministically; locate the SQL in src/storage/field_stats.rs and modify that ORDER BY clause to include tv.distinct_value ASC.
696-706: Filter `field_totals` by requested fields to avoid extra aggregation work.
When `fields` is provided, `ranked_values` is filtered but `field_totals` still aggregates all fields for the dataset. That adds avoidable scan/grouping cost.
Proposed diff

      field_totals AS (
          SELECT
              field_stats_field_name,
              SUM(field_stats_count) AS total_field_count
          FROM
              {DATASET_STATS_STREAM_NAME}
          WHERE
              dataset_name = '{dataset_name}'
    +         {fields_filter}
          GROUP BY
              field_stats_field_name
      )

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/storage/field_stats.rs` around lines 696 - 706, The field_totals CTE is aggregating across all fields even when a specific set is requested; modify the field_totals query (the CTE named field_totals that reads from DATASET_STATS_STREAM_NAME) to apply the same filter used by ranked_values by adding a WHERE clause limiting field_stats_field_name to the requested fields (i.e., field_stats_field_name IN ({fields}) or the equivalent parameterized condition) so aggregation/grouping only processes the requested fields; keep the existing dataset_name = '{dataset_name}' predicate and ensure the placeholder/parameter format matches how fields are passed in the surrounding code.
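A minimal sketch of the suggestion above, assuming the CTE text is assembled with `format!` the way the diff implies. The function name `field_totals_cte` and its parameters are hypothetical; the caller is assumed to pass an already escaped `dataset_name` and a `fields_filter` that is either empty or starts with `AND`.

```rust
/// Hypothetical helper: renders the `field_totals` CTE with the same
/// optional fields predicate that `ranked_values` uses.
/// Assumes `dataset_name` is already escaped for a single-quoted literal.
fn field_totals_cte(stream: &str, dataset_name: &str, fields_filter: &str) -> String {
    format!(
        "field_totals AS (\n\
         \x20   SELECT field_stats_field_name, SUM(field_stats_count) AS total_field_count\n\
         \x20   FROM {}\n\
         \x20   WHERE dataset_name = '{}'\n\
         \x20   {}\n\
         \x20   GROUP BY field_stats_field_name\n\
         )",
        stream, dataset_name, fields_filter
    )
}
```

Passing an empty `fields_filter` leaves a blank line in the SQL, and the aggregation then covers all fields as before.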
41112ca to bc86d2f
Actionable comments posted: 1
♻️ Duplicate comments (1)
src/storage/field_stats.rs (1)
643-646: Previously raised issue: confirmed addressed.
`field_stats_field_name` is correctly referenced in the fields filter (no undefined alias). This resolves the prior review comment.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/storage/field_stats.rs` around lines 643 - 646, The previous undefined-alias bug in the fields filter has been resolved: ensure the SQL fragment uses the correct column name field_stats_field_name (as shown in the format! call) and keep the quoted_fields.join(", ") usage; no further code changes are needed for the fields filter in the function generating this SQL fragment.
🧹 Nitpick comments (1)
src/storage/field_stats.rs (1)
696-715: `field_totals` is not filtered by `fields_filter`; confirm this is intentional.
When a `fields_filter` is provided (e.g., `field_stats_field_name IN ('a', 'b')`), `field_totals` scans all fields for `dataset_name`. This is harmless because the `JOIN ON tv.field_name = ft.field_stats_field_name` restricts the final result to only the filtered fields, but it does cause an unnecessary full scan of all fields for `field_totals`. If you want to avoid the extra work, you can propagate `{fields_filter}` (replacing `field_stats_field_name` with the same column) into `field_totals` as well.
♻️ Proposed optimization to propagate fields_filter into field_totals

      field_totals AS (
          SELECT
              field_stats_field_name,
              SUM(field_stats_count) AS total_field_count
          FROM
              {DATASET_STATS_STREAM_NAME}
          WHERE
              dataset_name = '{dataset_name}'
    +         {fields_filter}
          GROUP BY
              field_stats_field_name
      )

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/storage/field_stats.rs` around lines 696 - 715, The CTE field_totals currently sums field_stats_count for the dataset without applying the provided fields_filter, causing an unnecessary scan; update the field_totals CTE (referencing field_totals, field_stats_field_name, DATASET_STATS_STREAM_NAME, dataset_name) to include the same {fields_filter} condition (matching field_stats_field_name to the filtered column) so it only aggregates over the requested fields, ensuring the JOIN with top_values remains correct.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/storage/field_stats.rs`:
- Around line 660-686: The CTE ranked_values uses ROW_NUMBER() OVER (... ORDER
BY SUM(field_stats_distinct_stats_count) ...) which DataFusion 51 doesn't allow
because ORDER BY in window functions must be row-level expressions; fix by
computing the SUM(field_stats_distinct_stats_count) in the GROUP BY step (e.g.,
produce a column like distinct_value_count in the aggregation) and then
reference that aggregated column in a separate windowing layer/CTE where
ROW_NUMBER() is applied (update the CTE chain so ranked_values emits field_name,
distinct_value, distinct_value_count, distinct_count and the subsequent CTE
applies ROW_NUMBER() OVER (PARTITION BY field_name ORDER BY distinct_value_count
DESC, distinct_value ASC)).
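The two-layer shape requested above (aggregate first, then rank over the aggregated column) can be modeled in memory to make the data flow concrete. This is a sketch only: `rank_values` is a hypothetical function that imitates the `GROUP BY` and `ROW_NUMBER()` steps on plain tuples and does not call DataFusion.

```rust
use std::collections::BTreeMap;

/// Hypothetical model of the two CTE layers.
/// Input rows are (field_name, distinct_value, count);
/// output rows are (field_name, distinct_value, distinct_value_count, rn).
fn rank_values(rows: &[(&str, &str, u64)]) -> Vec<(String, String, u64, usize)> {
    // Layer 1: GROUP BY (field_name, distinct_value) with SUM(count).
    let mut totals: BTreeMap<(String, String), u64> = BTreeMap::new();
    for &(field, value, count) in rows {
        *totals.entry((field.to_string(), value.to_string())).or_insert(0) += count;
    }

    // Regroup the aggregated rows per field (the window's PARTITION BY).
    let mut per_field: BTreeMap<String, Vec<(String, u64)>> = BTreeMap::new();
    for ((field, value), total) in totals {
        per_field.entry(field).or_default().push((value, total));
    }

    // Layer 2: ROW_NUMBER() ordered by the already aggregated column,
    // count descending, then value ascending as the tie-breaker.
    let mut ranked = Vec::new();
    for (field, mut values) in per_field {
        values.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
        for (i, (value, total)) in values.into_iter().enumerate() {
            ranked.push((field.clone(), value, total, i + 1));
        }
    }
    ranked
}
```

The ranking step never sees a raw `SUM(...)` expression, only the finished `distinct_value_count` column, which is the separation the comment asks for.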
bc86d2f to 4c13fdc
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/handlers/http/modal/utils/ingest_utils.rs (1)
270-276: ⚠️ Potential issue | 🟠 Major
Keep near-limit dataset field alert at warning level.
Line 270 now logs threshold breach at `trace`, which is usually disabled in production; this hides an important pre-failure signal before ingestion gets blocked at the limit.
Suggested fix

    - tracing::trace!(
    + tracing::warn!(
          "Dataset {0} has {1} fields, which exceeds the warning threshold of {2}. Ingestion will not be possible after reaching {3} fields. We recommend creating a new dataset.",
          stream_name,
          fields_count,
          dataset_fields_warn_threshold as usize,
          PARSEABLE.options.dataset_fields_allowed_limit
      );

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/handlers/http/modal/utils/ingest_utils.rs` around lines 270 - 276, Change the logging call that reports a dataset approaching the field limit from trace to warn so the near-limit alert is visible in production; update the tracing::trace!(...) invocation that uses stream_name, fields_count, dataset_fields_warn_threshold, and PARSEABLE.options.dataset_fields_allowed_limit to tracing::warn!(...) while keeping the existing message and interpolation intact.
♻️ Duplicate comments (1)
src/storage/field_stats.rs (1)
660-718: ⚠️ Potential issue | 🔴 Critical
Window `ORDER BY` still uses an aggregate expression and can fail at runtime.
Line 670 orders `ROW_NUMBER()` by `SUM(field_stats_distinct_stats_count)`, which should be computed in a prior aggregation layer and then referenced as a row-level column.
Suggested fix

    -WITH
    -    ranked_values AS (
    +WITH
    +    aggregated_values AS (
             SELECT
                 field_stats_field_name AS field_name,
                 field_stats_distinct_stats_distinct_value AS distinct_value,
    -            SUM(field_stats_distinct_stats_count) AS distinct_value_count,
    +            SUM(field_stats_distinct_stats_count) AS distinct_value_count
    +        FROM
    +            {DATASET_STATS_STREAM_NAME}
    +        WHERE
    +            dataset_name = '{dataset_name}'
    +            AND field_stats_distinct_stats_distinct_value IS NOT NULL
    +            {fields_filter}
    +        GROUP BY
    +            field_stats_field_name,
    +            field_stats_distinct_stats_distinct_value
    +    ),
    +    ranked_values AS (
    +        SELECT
    +            field_name,
    +            distinct_value,
    +            distinct_value_count,
                 ROW_NUMBER() OVER (
                     PARTITION BY
    -                    field_stats_field_name
    +                    field_name
                     ORDER BY
    -                    SUM(field_stats_distinct_stats_count) DESC,
    -                    field_stats_distinct_stats_distinct_value ASC
    +                    distinct_value_count DESC,
    +                    distinct_value ASC
                 ) AS rn,
                 COUNT(*) OVER (
                     PARTITION BY
    -                    field_stats_field_name
    +                    field_name
                 ) AS distinct_count
    -        FROM
    -            {DATASET_STATS_STREAM_NAME}
    -        WHERE
    -            dataset_name = '{dataset_name}'
    -            AND field_stats_distinct_stats_distinct_value IS NOT NULL
    -            {fields_filter}
    -        GROUP BY
    -            field_stats_field_name,
    -            field_stats_distinct_stats_distinct_value
    +        FROM aggregated_values
         ),

Run this read-only check to confirm the risky pattern and dependency version:

    #!/bin/bash
    set -euo pipefail
    echo "DataFusion dependency versions:"
    rg -n --glob 'Cargo.toml' --glob 'Cargo.lock' '\bdatafusion\b'
    echo
    echo "Window ORDER BY with aggregate in src/storage/field_stats.rs:"
    rg -nP -C3 'ROW_NUMBER\(\)\s+OVER[\s\S]*ORDER BY[\s\S]*SUM\(field_stats_distinct_stats_count\)' src/storage/field_stats.rs

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/storage/field_stats.rs` around lines 660 - 718, The window ORDER BY is using an aggregate expression (SUM(field_stats_distinct_stats_count)) inside ROW_NUMBER() which can fail; refactor the query in src/storage/field_stats.rs so the aggregation is computed in a prior GROUP BY layer and given an alias (e.g., distinct_value_count) and then the ROW_NUMBER() OVER clause in the ranked_values CTE orders by that alias (ORDER BY distinct_value_count DESC, field_stats_distinct_stats_distinct_value ASC). Concretely, modify the CTE that produces ranked_values (and its GROUP BY) to SELECT SUM(field_stats_distinct_stats_count) AS distinct_value_count and then use ROW_NUMBER() OVER (PARTITION BY field_stats_field_name ORDER BY distinct_value_count DESC, field_stats_distinct_stats_distinct_value ASC); ensure downstream references (top_values, joins to field_totals) use the new distinct_value_count column name.
📒 Files selected for processing (4)
- src/handlers/http/modal/utils/ingest_utils.rs
- src/rbac/role.rs
- src/rbac/user.rs
- src/storage/field_stats.rs
🚧 Files skipped from review as they are similar to previous changes (2)
- src/rbac/user.rs
- src/rbac/role.rs
@coderabbitai review
✅ Actions performed: Review triggered.
@coderabbitai full review
✅ Actions performed: Full review triggered.
🧹 Nitpick comments (2)
src/storage/field_stats.rs (1)
716-718: Add a deterministic tie-breaker in final ordering.
At Lines 716-718, equal `distinct_value_count` rows can be returned in unstable order. Adding `tv.distinct_value ASC` keeps output deterministic.
Suggested change

      ORDER BY
          tv.field_name,
    -     tv.distinct_value_count DESC"
    +     tv.distinct_value_count DESC,
    +     tv.distinct_value ASC"

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/storage/field_stats.rs` around lines 716 - 718, The final SQL ORDER BY in the query using alias tv currently sorts by tv.field_name and tv.distinct_value_count DESC but lacks a deterministic tie-breaker; update the ORDER BY clause (where tv is defined) to append a secondary sort on tv.distinct_value ASC so rows with equal distinct_value_count are returned in a stable, deterministic order.
src/handlers/http/modal/utils/ingest_utils.rs (1)
270-276: Keep the pre-limit dataset signal at a visible log level.
At Line 270, using `trace!` makes this threshold warning effectively invisible in most production setups, so operators lose early notice before ingestion starts failing at the hard limit.
Suggested change

    - tracing::trace!(
    + tracing::warn!(
          "Dataset {0} has {1} fields, which exceeds the warning threshold of {2}. Ingestion will not be possible after reaching {3} fields. We recommend creating a new dataset.",
          stream_name,
          fields_count,
          dataset_fields_warn_threshold as usize,
          PARSEABLE.options.dataset_fields_allowed_limit
      );

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/handlers/http/modal/utils/ingest_utils.rs` around lines 270 - 276, The log about datasets exceeding the warning threshold is currently emitted with tracing::trace!, which is too low for operators to see; change that tracing::trace! invocation to tracing::warn! (keeping the same formatted message and parameters: stream_name, fields_count, dataset_fields_warn_threshold as usize, PARSEABLE.options.dataset_fields_allowed_limit) so the pre-limit dataset signal is visible in production logging.
Fixes #XXXX.