chore: upgrade sql parser to support additional ClickHouse features#3645
Conversation
First step of the incremental sqlparser crate upgrade toward 0.54 (ClickHouse SAMPLE clause + query-level SETTINGS support). No serialized-AST shape changes between 0.48 and 0.49; NIF remains a pure pass-through and no Elixir AST handling required. sql_test.exs passes (147 tests, 0 failures). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
FROM-clause table-function args became a TableFunctionArgs struct
("args": {"args": [...], ...}). Updated the restricted-function matcher
in Logflare.Sql for the new shape so table-valued functions (file(),
pg_catalog.*) are still rejected. NIF unchanged.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
0.53's Spanned feature adds source-location metadata to the AST. Fixes: - AstUtils.build_identifier/2 stamps the now-required `span` on hand-built idents - `uses_odbc_syntax` added to hand-built Function nodes - countif's count(*) arg uses the new Wildcard shape - test helper compares ASTs with non-semantic span/*_token metadata stripped Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
|
||
| table_name = ClickHouseAdaptor.clickhouse_ingest_table_name(backend, :log) | ||
|
|
||
| query = |
There was a problem hiding this comment.
🟡 Severity: MEDIUM
The PR enables ClickHouse query-level SETTINGS clauses to pass through the SQL transform pipeline (newly parseable by sqlparser 0.54) without any application-level validation. Users could embed settings such as allow_introspection_functions = 1 that bypass ClickHouse-level security controls if the ClickHouse user account has sufficient privileges. The validate_query / has_restricted_functions pipeline only inspects SQL function calls, not SETTINGS nodes in the AST.
Helpful? Add 👍 / 👎
💡 Fix Suggestion
Suggestion: The fix should be applied in lib/logflare/sql.ex, not in the test file. Add a new pattern-matching clause to has_restricted_functions/3 (around line 709) that intercepts {"Setting", ...} AST nodes emitted by sqlparser for ClickHouse SETTINGS clauses. The safest approach is to either (a) block all SETTINGS clauses entirely with {:error, "Query-level SETTINGS are not allowed"}, or (b) maintain an allowlist of known-safe performance settings (e.g., max_threads, max_execution_time, max_rows_to_read) and return {:error, "Restricted setting #{key}"} for anything not on the list. Example clause to add before the catch-all:
@allowed_ch_settings ~w(max_threads max_execution_time max_rows_to_read max_bytes_to_read)
defp has_restricted_functions(
{"Setting", %{"key" => key}},
:ok,
%{dialect: "clickhouse"}
)
when is_binary(key) do
if String.downcase(key) in @allowed_ch_settings do
:ok
else
{:error, "Restricted or disallowed ClickHouse setting: #{String.downcase(key)}"}
end
endThis ensures that any SETTINGS node surfaced by sqlparser 0.54+ is inspected and only an explicit allowlist of safe, non-security-impacting settings can pass through.
This bumps the Rust-based
sqlparsercrate to v0.54.0 (from v0.48.0) to allow for the ClickHouse "SAMPLE" keyword to be used in queries / endpoints / etc. Aside from this, we also handle ClickHouse's query-level session settings similar to functions by defining an allowlist of select settings.The biggest challenge was the jump to 0.53 due to this change, which now prevents us from doing quick equality assertions as we serialize the AST to JSON. For that I had to put together a helper function for a number of comparison tests - particularly around BQ -> PG transformations.
I also added a couple integration tests to also validate that ClickHouse actually is able to use the output of the parser for things like SAMPLE. Either way - this gap seems to be covered enough for now.