Merged
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
@@ -20,7 +20,7 @@ concurrency:
env:
CARGO_TERM_COLOR: always
INCAN_REF: release/v0.3
EXPECTED_INCAN_VERSION: 0.3.0-rc20
EXPECTED_INCAN_VERSION: 0.3.0-rc21
RUST_BACKTRACE: 1
INCAN_NO_BANNER: 1

10 changes: 10 additions & 0 deletions AGENTS.md
@@ -79,6 +79,16 @@ Normative behavior is defined in **`docs/rfcs/`**. If package code and an RFC di
- Use `docs/architecture.md` for system boundaries, not implementation diaries.
- Use `docs/release_notes/` for shipped/user-visible changes.

## RFC 016+ review strategy

The function-catalog RFC series is cumulative. From RFC 016 onward, review each PR against the merged end-to-end boundary rather than only the local diff: authoring helpers and query surface → Prism logical planning → Substrait IR → backend adapter. Scan shared files such as the function registry, Prism output/rewrite/lowering, Substrait expression/relation lowering, extension declarations, and the DataFusion adapter even when the current PR changes only one slice.

Function construction is a first-class review axis. Check that new helpers are constructed through the declaration-side pattern: public helper metadata lives in decorators/partials, examples live in docstrings, names/signatures/parameter and return types derive from checked helper metadata where possible, and registry entries are loaded from the helper declaration rather than a central hardcoded list. Keep aggregate measures, scalar applications, window calls, and generator applications as distinct semantic shapes even when they share registry metadata; do not collapse them into ad hoc per-function models or backend-specific shortcuts.
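The rule about distinct semantic shapes can be pictured with a small Python model. Everything in it is a hypothetical, simplified stand-in: `RegistryEntry`, `WindowSpec`, and `WindowCall` are invented for illustration, and `AggregateMeasure` and `GeneratorApplication` only borrow InQL's type names. What it sketches is that placing an aggregate over a window yields a separate window-call value that shares the same registry entry, instead of an ad hoc per-function model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegistryEntry:
    """Shared, non-derivable facts about one function (illustrative subset)."""
    reference: str
    function_class: str


SUM = RegistryEntry("inql.functions.sum", "aggregate")
ROW_NUMBER = RegistryEntry("inql.functions.row_number", "window")
EXPLODE = RegistryEntry("inql.functions.explode", "generator")


@dataclass(frozen=True)
class WindowSpec:
    partition_by: tuple[str, ...] = ()
    order_by: tuple[str, ...] = ()


@dataclass(frozen=True)
class WindowCall:
    """A function placed over a window spec: its own shape, not a scalar or aggregate."""
    entry: RegistryEntry
    arguments: tuple[str, ...]
    spec: WindowSpec


@dataclass(frozen=True)
class AggregateMeasure:
    entry: RegistryEntry
    argument: str

    def over(self, spec: WindowSpec) -> WindowCall:
        # Aggregate-over-window builds a new WindowCall that reuses the same
        # registry entry; the aggregate measure itself is left unchanged.
        return WindowCall(self.entry, (self.argument,), spec)


@dataclass(frozen=True)
class GeneratorApplication:
    """A relation-shaping call with explicit output aliases."""
    entry: RegistryEntry
    argument: str
    aliases: tuple[str, ...]
```

Lowering and adapter code in such a model can dispatch on the shape (`WindowCall` versus `AggregateMeasure`) while still reading shared facts through `entry`.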

DataFusion is the first adapter, not InQL’s semantic owner. Do not encode DataFusion-only behavior in Substrait IR, do not model core-function “unsupported” as a normal Substrait state, and do not use SQL/string-script generation when DataFusion exposes a typed API. Invalid context belongs in authoring/Prism/lowering validation; backend inability belongs in adapter capability or error handling.

Merge strategy for this series: land cross-cutting boundary resets first, then base later slices on fresh `main`. Prefer fewer coherent follow-up PRs when RFCs share the same registry/Substrait/backend boundary, but do not fold unrelated remaining RFCs into an active boundary-reset PR without explicit maintainer approval.

## Common commands (this repo)

| Command | Purpose |
6 changes: 4 additions & 2 deletions docs/language/reference/dataset_methods.md
@@ -7,7 +7,7 @@ The Substrait helper surface behind these methods is split by semantic role:
- `src/substrait/relations.incn` builds concrete `Rel` nodes
- `src/substrait/plans.incn` assembles `Plan` envelopes
- `src/substrait/inspect.incn` owns relation/plan inspection and output-column inference
- `src/substrait/schema_registry.incn` owns named-table schema binding
- `src/schema_registry.incn` owns logical named-table schema binding

## Shared method surface

@@ -20,9 +20,9 @@ The Substrait helper surface behind these methods is split by semantic role:
| `group_by` | `def group_by(self, columns: list[ColumnExpr]) -> Self` | Define grouping keys using scalar expressions. |
| `agg` | `def agg(self, measures: list[AggregateMeasure]) -> Self` | Apply aggregate measures over the current relation or current grouping. |
| `generate` | `def generate(self, generator: GeneratorApplication) -> Self` | Apply a relation-shaping generator such as `explode(...)` with explicit output aliases. |
| `with_window_column` | `def with_window_column(self, name: str, application: WindowFunctionApplication) -> Self` | Add or replace one projected column using a placed window function. |
| `order_by` | `def order_by(self, columns: list[ColumnExpr]) -> Self` | Sort rows by scalar expressions or ordering helpers such as `asc(...)` and `desc(...)`. |
| `limit` | `def limit(self, n: int) -> Self` | Cap row count. |
| `explode` | `def explode(self) -> Self` | Emit the lower-level `EXPLODE` extension boundary without expression/schema metadata. |
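The ordering helpers accepted by `order_by(...)` include `*_nulls_first` and `*_nulls_last` variants that make null placement explicit. The plain-Python sketch below illustrates that placement for a single sort key; `sort_values` is a hypothetical function, not part of InQL, and it says nothing about how InQL lowers these helpers.

```python
from typing import Optional


def sort_values(values: list[Optional[int]], descending: bool,
                nulls_first: bool) -> list[Optional[int]]:
    """Order one key with explicit null placement, in the spirit of
    asc_nulls_first(...), asc_nulls_last(...), desc_nulls_first(...), etc."""
    present = sorted((v for v in values if v is not None), reverse=descending)
    nulls = [v for v in values if v is None]
    # Nulls never take part in the comparison; they are placed as a block.
    return nulls + present if nulls_first else present + nulls
```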

## `with_column`

@@ -69,6 +69,7 @@ def enrich(orders: LazyFrame[Order]) -> LazyFrame[Order]:
- `join(...)` is constrained to same-carrier inputs and the boolean join predicate surface shown in the signature.
- `select(...)` preserves projection shape; explicit projection lists are represented today through `with_column(...)` and scalar-expression builders.
- `generate(...)` preserves all input columns and appends generated output aliases for `explode`, `explode_outer`, `posexplode`, `posexplode_outer`, `inline`, `inline_outer`, `flatten`, and `stack` generator applications. Alias collisions are rejected during planning/lowering.
- `with_window_column(...)` supports placed ranking, distribution, offset, value, and aggregate-over-window helpers over explicit window specs. Portable helpers lower through Substrait window relations and execute through the DataFusion session adapter.
- `DataFrame[T]` exposes materialized metadata and preview text; row-level accessors belong to the materialized DataFrame API surface.
- Query-block and scoped DSL surfaces lower into these builder APIs rather than defining separate method semantics.

@@ -77,3 +78,4 @@ def enrich(orders: LazyFrame[Order]) -> LazyFrame[Order]:
- [Filter builders](builders/filters.md)
- [Aggregate builders](builders/aggregates.md)
- [Projection builders](builders/projections.md)
- [Window functions](functions/windows.md)
4 changes: 0 additions & 4 deletions docs/language/reference/functions/generators.md
@@ -35,9 +35,5 @@ The explicit generator surface currently includes:
Generator applications preserve input columns and append generated columns in declaration order. Generated aliases are
required, must be non-empty, and must not collide with existing input columns.

The zero-argument `DataSet.explode()` method is a lower-level extension-boundary operation. It emits the registered
`EXPLODE` relation extension without carrying a source expression or generated output schema. Generator code should use
`generate(explode(...))` so the relation-shaping function identity, input expression, and output schema are explicit.

Nested scalar helpers such as `array_flatten(...)` remain scalar expressions. They do not expand rows and are documented
on the [nested data functions](nested.md) page. The relation-shaping `flatten(...)` helper is intentionally separate.
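The column-preservation and alias rules can be illustrated with a plain-Python model of `posexplode(...)` over row dicts. This is a sketch, not InQL's implementation: `posexplode_rows` and `validate_aliases` are hypothetical, null arrays are treated like empty ones, and the `outer=True` branch assumes the common Spark-style `*_outer` behavior of keeping input rows that produce no elements, which this page does not spell out.

```python
from typing import Any

Row = dict[str, Any]


def validate_aliases(input_columns: list[str], aliases: list[str]) -> None:
    """Reject empty aliases and aliases that collide with input columns.

    This sketch also rejects duplicates among the generated aliases themselves.
    """
    taken = set(input_columns)
    for alias in aliases:
        if not alias:
            raise ValueError("generated aliases must be non-empty")
        if alias in taken:
            raise ValueError(f"generated alias {alias!r} collides with an existing column")
        taken.add(alias)


def posexplode_rows(
    input_columns: list[str],
    rows: list[Row],
    source: str,
    pos_alias: str,
    value_alias: str,
    outer: bool = False,
) -> list[Row]:
    """One output row per array element: input columns first, then a zero-based
    position column and a value column, in declaration order."""
    validate_aliases(input_columns, [pos_alias, value_alias])
    out: list[Row] = []
    for row in rows:
        items = row[source] or []  # treat a null array like an empty one
        if not items and outer:
            # Assumed *_outer behavior: keep the input row with null generated columns.
            out.append({**row, pos_alias: None, value_alias: None})
        for position, item in enumerate(items):
            out.append({**row, pos_alias: position, value_alias: item})
    return out
```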
4 changes: 3 additions & 1 deletion docs/language/reference/functions/index.md
@@ -9,10 +9,11 @@ Today the concrete shipped surfaces are documented here:
- [Projection builders](../builders/projections.md)
- [Generator and table-valued functions](generators.md)
- [Nested data functions](nested.md)
- [Window functions](windows.md)

The canonical scalar literal helper is `lit(...)`. Typed literal helpers construct the same scalar-expression representation.

The current registry-backed helper surface covers references, literals, casts, operators, predicates, conditionals, math, ordering, aggregates, generators, and nested data. Each runtime entry exposes a stable function reference such as `inql.functions.col`, namespace, canonical name, typed lifecycle metadata (`since`, versioned changes, and optional deprecation), function policy category, function class, null behavior, alias policy, aggregate modifier policy, and Substrait mapping metadata. Checked public helpers provide the signature and, by default, the canonical name; metadata may override the canonical name only for source spelling constraints such as the reserved-word `mod` case.
The current registry-backed helper surface covers references, literals, casts, operators, predicates, conditionals, math, ordering, aggregates, generators, nested data, and windows. Each runtime entry exposes a stable function reference such as `inql.functions.col`, namespace, canonical name, typed lifecycle metadata (`since`, versioned changes, and optional deprecation), function policy category, function class, null behavior, alias policy, aggregate modifier policy, and Substrait mapping metadata. Checked public helpers provide the signature and, by default, the canonical name; metadata may override the canonical name only for source spelling constraints such as the reserved-word `mod` case.

The registry is the source for non-derivable machine facts. Public helper declarations are the source for argument names, argument types, and return types. Docstrings remain human-facing explanation, examples, and parameter intent. The `registry-metadata` check validates the checked API metadata projections produced from public facade aliases, registry decorators, and decorated callable signatures. Runtime registry entries are lazy and process-local: they support helper execution and lowering for loaded helpers, while the complete public catalog comes from checked metadata. This matters for generated docs, diagnostics, Prism lowering, and backend capability checks as the catalog grows.
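A minimal Python sketch of the split between declaration-derived facts and the lazy runtime registry: argument names and types come from the helper's signature via `inspect.signature`, the canonical name defaults to the helper's own name, and entries exist only after the helper is loaded. `registered`, `Entry`, `RUNTIME_REGISTRY`, and `load_math_helpers` are hypothetical names, and Python's `inspect` merely stands in for InQL's checked metadata projection.

```python
import inspect
from dataclasses import dataclass
from typing import Callable, Optional

# Process-local runtime registry: entries appear only once a helper has been loaded.
RUNTIME_REGISTRY: dict[str, "Entry"] = {}


@dataclass(frozen=True)
class Entry:
    reference: str
    canonical_name: str
    parameters: tuple[tuple[str, str], ...]  # (name, type) pairs read from the declaration
    returns: str


def registered(canonical_name: Optional[str] = None) -> Callable:
    """Hypothetical declaration-site decorator: derives the entry from the helper."""
    def decorate(helper: Callable) -> Callable:
        signature = inspect.signature(helper)
        name = canonical_name or helper.__name__  # override only when spelling requires it
        reference = f"inql.functions.{name}"
        RUNTIME_REGISTRY[reference] = Entry(
            reference=reference,
            canonical_name=name,
            parameters=tuple((p.name, p.annotation) for p in signature.parameters.values()),
            returns=signature.return_annotation,
        )
        return helper
    return decorate


def load_math_helpers() -> None:
    """Stand-in for importing a helper module; the decorators run only at load time.

    "Expr" is a placeholder type name; string annotations are kept verbatim.
    """

    @registered()
    def floor(value: "Expr") -> "Expr":
        return ("floor", value)

    @registered("mod")  # the source spelling `mod_` differs from the canonical name
    def mod_(left: "Expr", right: "Expr") -> "Expr":
        return ("mod", left, right)
```

Before `load_math_helpers()` runs, the registry has no `floor` entry; the complete public catalog would therefore have to come from a separate checked source, as the paragraph above describes.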

@@ -35,6 +36,7 @@ The registered helper surface currently includes:
| `abs(...)`, `ceil(...)`, `floor(...)`, `round(...)` | scalar | registered Substrait math scalar mappings; `round(...)` is currently the single-argument form |
| `array(...)`, `cardinality(...)`, `array_contains(...)`, `arrays_overlap(...)`, `array_position(...)`, `array_range(...)`, `element_at(...)`, `array_sort(...)`, `array_distinct(...)`, `array_except(...)`, `array_intersect(...)`, `array_union(...)`, `array_join(...)`, `array_slice(...)`, `array_reverse(...)`, `array_flatten(...)`, `map_from_arrays(...)`, `map_extract(...)`, `map_contains_key(...)`, `map_keys(...)`, `map_values(...)`, `map_entries(...)`, `named_struct(...)` | scalar | registered nested scalar helpers backed by Substrait extension mappings; `array_range(...)` registers canonical `range` for positional generator lowering and `map_contains_key(...)` lowers as a documented predicate rewrite |
| `explode(...)`, `explode_outer(...)`, `posexplode(...)`, `posexplode_outer(...)`, `inline(...)`, `inline_outer(...)`, `flatten(...)`, `stack(...)` | generator | relation-extension mappings consumed by `generate(...)`; positional forms use zero-based positions |
| `window()`, `unbounded_preceding()`, `preceding(...)`, `current_row()`, `following(...)`, `unbounded_following()`, `row_number()`, `rank()`, `dense_rank()`, `percent_rank()`, `cume_dist()`, `ntile(...)`, `lag(...)`, `lead(...)`, `first_value(...)`, `last_value(...)`, `nth_value(...)` | window | `window()` and bound helpers build structural window-spec metadata; placed ranking, distribution, offset, value, and aggregate-over-window helpers lower through `ConsistentPartitionWindowRel` and execute through the DataFusion session adapter |
| `asc(...)`, `desc(...)`, `asc_nulls_first(...)`, `asc_nulls_last(...)`, `desc_nulls_first(...)`, `desc_nulls_last(...)` | ordering | structural sort-field helpers consumed by `order_by(...)` and lowered to Substrait `SortRel.sorts` |
| `sum(...)`, `count(...)`, `count_expr(...)`, `count_distinct(...)`, `count_if(...)`, `avg(...)`, `min(...)`, `max(...)` | aggregate | registered Substrait extension functions for core aggregates plus compatibility rewrites for `count_expr(...)`, `count_distinct(...)`, and `count_if(...)`; core aggregates allow `DISTINCT` and aggregate-local `FILTER` where the aggregate shape is valid |
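The `DISTINCT` and aggregate-local `FILTER` modifiers, and the compatibility aggregates, can be sketched over plain row dicts. This assumes the standard SQL meanings: null inputs are skipped, `SUM` over no input is null, `count_distinct(x)` behaves like `COUNT(DISTINCT x)`, and `count_if(p)` like `COUNT(*) FILTER (WHERE p)`. The table does not state the actual rewrite targets, so those equivalences are assumptions, and all function names here are hypothetical.

```python
from typing import Any, Callable, Optional

Row = dict[str, Any]
Predicate = Callable[[Row], bool]


def sum_measure(rows: list[Row], column: str, distinct: bool = False,
                filter_: Optional[Predicate] = None) -> Optional[float]:
    """SUM with optional DISTINCT and aggregate-local FILTER; nulls are ignored."""
    values = [r[column] for r in rows
              if (filter_ is None or filter_(r)) and r[column] is not None]
    if distinct:
        values = list(dict.fromkeys(values))  # keep the first occurrence of each value
    return sum(values) if values else None  # SQL SUM over no input is NULL


def count_distinct(rows: list[Row], column: str) -> int:
    """Assumed rewrite target: COUNT(DISTINCT column), which skips nulls."""
    return len({r[column] for r in rows if r[column] is not None})


def count_if(rows: list[Row], predicate: Predicate) -> int:
    """Assumed rewrite target: COUNT(*) FILTER (WHERE predicate)."""
    return sum(1 for r in rows if predicate(r))
```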

49 changes: 49 additions & 0 deletions docs/language/reference/functions/windows.md
@@ -0,0 +1,49 @@
# Window Functions (Reference)

Window helpers are relation-aware. A window function application produces one output value per input row while reading a
partition of related rows. It is not an ordinary scalar expression and must be placed through a projection-like dataset
method such as `with_window_column(...)`.

```incan
from pub::inql import LazyFrame
from pub::inql.functions import col, current_row, desc, lag, rank, sum, unbounded_preceding, window
from models import Order

def ranked_orders(orders: LazyFrame[Order]) -> LazyFrame[Order]:
    spec = window().partition_by([col("customer_id")]).order_by([desc(col("amount"))])
    return (
        orders
        .with_window_column("customer_rank", rank().over(spec))
        .with_window_column("previous_amount", lag(col("amount")).over(spec))
        .with_window_column(
            "running_amount",
            sum(col("amount")).over(spec.rows_between(unbounded_preceding(), current_row())),
        )
    )
```
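To make the example's semantics concrete, here is a plain-Python sketch that computes the same three columns over a list of row dicts. It is an illustration under assumptions, not how InQL or DataFusion evaluate windows: `ranked_orders_reference` is a hypothetical name, `lag` is assumed to yield null when no prior row exists, rows that tie on `amount` keep their input order, and the output row order is an artifact of the sketch.

```python
from itertools import groupby
from typing import Any

Row = dict[str, Any]


def ranked_orders_reference(orders: list[Row]) -> list[Row]:
    out: list[Row] = []
    # partition_by([col("customer_id")]): group rows by customer
    by_customer = sorted(orders, key=lambda r: r["customer_id"])
    for _, group in groupby(by_customer, key=lambda r: r["customer_id"]):
        # order_by([desc(col("amount"))]); sorted() is stable, so ties keep input order
        partition = sorted(group, key=lambda r: r["amount"], reverse=True)
        running = 0
        for i, row in enumerate(partition):
            # rank(): 1 + number of rows ordered strictly ahead, so ties share a rank
            rank = 1 + sum(1 for other in partition if other["amount"] > row["amount"])
            # lag(col("amount")): previous row's amount, null for the first row
            previous = partition[i - 1]["amount"] if i > 0 else None
            # sum(...) over ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
            running += row["amount"]
            out.append({**row, "customer_rank": rank,
                        "previous_amount": previous, "running_amount": running})
    return out
```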

The window helper surface includes:

| Function | Meaning | Placement |
| --- | --- | --- |
| `window()` | Build an empty window specification with a whole-partition row frame. | Refine with `.partition_by(...)`, `.order_by(...)`, `.rows_between(...)`, or `.range_between(...)`, then pass to `.over(...)`. |
| `unbounded_preceding()`, `preceding(n)`, `current_row()`, `following(n)`, `unbounded_following()` | Build frame bounds. | Use with `.rows_between(...)` or `.range_between(...)`. |
| `row_number()` | Assign a sequential row number inside the ordered window. | Use `.over(window().order_by(...))`, then `with_window_column(...)`. |
| `rank()` | Rank rows with gaps after ties inside the ordered window. | Use `.over(window().order_by(...))`, then `with_window_column(...)`. |
| `dense_rank()` | Rank rows without gaps after ties inside the ordered window. | Use `.over(window().order_by(...))`, then `with_window_column(...)`. |
| `percent_rank()` | Return relative rank within the ordered window. | Use `.over(window().order_by(...))`, then `with_window_column(...)`. |
| `cume_dist()` | Return cumulative distribution within the ordered window. | Use `.over(window().order_by(...))`, then `with_window_column(...)`. |
| `ntile(n)` | Split ordered rows into `n` buckets. | Use `.over(window().order_by(...))`, then `with_window_column(...)`. |
| `lag(expr, offset=1, default_value=...)` | Read a prior row in the ordered window. | Use `.over(window().order_by(...))`, then `with_window_column(...)`. |
| `lead(expr, offset=1, default_value=...)` | Read a later row in the ordered window. | Use `.over(window().order_by(...))`, then `with_window_column(...)`. |
| `first_value(expr)`, `last_value(expr)`, `nth_value(expr, n)` | Read a value from the current frame. | Use `.over(window().order_by(...))`, then `with_window_column(...)`; value calls may use `.ignore_nulls()` or `.respect_nulls()` before `.over(...)`. |
| `sum(...)`, `count(...)`, `avg(...)`, `min(...)`, `max(...)` | Reuse aggregate helpers over a window frame. | Call `.over(window_spec)` on the aggregate measure, then `with_window_column(...)`. |
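The ranking and distribution rows in the table correspond to the usual SQL definitions; the sketch below spells them out for one already-ordered partition. It assumes InQL matches standard SQL semantics (`percent_rank = (rank - 1) / (n - 1)`, a peer-inclusive `cume_dist`, and `ntile` giving the earlier buckets one extra row when `n` does not divide evenly); `ranking_columns` and `ntile_bucket` are hypothetical helpers.

```python
def ranking_columns(keys: list, buckets: int) -> list[dict]:
    """keys: ordering keys of one partition, already sorted in window order."""
    n = len(keys)
    out = []
    rank = dense = 0
    for i, key in enumerate(keys):
        if i == 0 or key != keys[i - 1]:
            rank = i + 1  # rank jumps past ties, leaving gaps
            dense += 1    # dense_rank advances by one, without gaps
        # Peers share the same key; cume_dist counts through the last peer.
        last_peer = i
        while last_peer + 1 < n and keys[last_peer + 1] == key:
            last_peer += 1
        out.append({
            "row_number": i + 1,
            "rank": rank,
            "dense_rank": dense,
            "percent_rank": (rank - 1) / (n - 1) if n > 1 else 0.0,
            "cume_dist": (last_peer + 1) / n,
            "ntile": ntile_bucket(i, n, buckets),
        })
    return out


def ntile_bucket(i: int, n: int, buckets: int) -> int:
    """1-based bucket for row i; the first n % buckets buckets get one extra row."""
    base, extra = divmod(n, buckets)
    large = extra * (base + 1)  # rows covered by the larger buckets
    if i < large:
        return i // (base + 1) + 1
    return extra + (i - large) // base + 1
```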

`WindowSpec.partition_by(...)` replaces the partition expressions. `WindowSpec.order_by(...)` replaces the ordering
expressions. `WindowSpec.rows_between(...)` and `WindowSpec.range_between(...)` replace the frame. Ranking,
distribution, offset, and value helpers require explicit ordering; missing ordering is rejected during logical lowering.
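Frame bounds can be read as row offsets relative to the current row, clipped to the partition edges. The following Python sketch models only `rows_between(...)` frames (RANGE frames compare ordering-key values and are not modeled here), encodes bounds as hypothetical tuples such as `("preceding", 1)`, and treats the whole-partition frame of `window()` as unbounded preceding through unbounded following. `nth_value` is assumed to be 1-based, as in SQL.

```python
from typing import Optional

Bound = tuple  # hypothetical encoding, e.g. ("preceding", 2) or ("current_row",)


def bound_index(bound: Bound, i: int, n: int) -> int:
    """Translate one ROWS bound into an absolute row index for current row i."""
    kind = bound[0]
    if kind == "unbounded_preceding":
        return 0
    if kind == "preceding":
        return i - bound[1]
    if kind == "current_row":
        return i
    if kind == "following":
        return i + bound[1]
    if kind == "unbounded_following":
        return n - 1
    raise ValueError(f"unknown bound {bound!r}")


def rows_frame(values: list, i: int, start: Bound, end: Bound) -> list:
    """Values in the ROWS frame of row i, clipped to the partition edges."""
    n = len(values)
    lo = max(bound_index(start, i, n), 0)
    hi = min(bound_index(end, i, n), n - 1)
    return values[lo : hi + 1] if lo <= hi else []  # empty frame when bounds cross


def nth_value(frame: list, k: int) -> Optional[object]:
    """1-based nth_value over a frame; None when the frame has fewer than k rows."""
    return frame[k - 1] if len(frame) >= k else None
```

With these helpers, `first_value` and `last_value` are simply the first and last elements of a non-empty frame.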

`with_window_column(name, application)` preserves the input columns and either adds `name` as a new column or replaces the
existing column with that name (add-or-replace projection semantics). Compatible adjacent window projections lower through
Substrait `ConsistentPartitionWindowRel` with registry-backed function anchors, frame bounds, invocation metadata,
null-treatment options, and output aliases. The DataFusion session backend executes the portable window helpers through the
Substrait adapter boundary.
2 changes: 1 addition & 1 deletion docs/language/reference/substrait/operator_catalog.md
@@ -34,7 +34,7 @@ The following table maps InQL plan capabilities to Substrait logical relations a
| Group by / aggregates | `AggregateRel` with scalar grouping keys and aggregate measures; grouping sets are tracked as a distinct capability below | core |
| Rollup / cube / grouping sets | `AggregateRel` with multiple groupings | core |
| Distinct rows | `AggregateRel` with grouping keys and no measures | core |
| Window / analytic functions | `ProjectRel` with window expressions | core |
| Window / analytic functions | `ConsistentPartitionWindowRel` with partition/order expressions and registered window function anchors | core |
| Sort | `SortRel` | core |
| Limit / offset | `FetchRel` | core |
| Union, intersect, except | `SetRel` with the appropriate set operation enum | core |
1 change: 1 addition & 0 deletions docs/release_notes/v0_1.md
@@ -16,6 +16,7 @@ Entries will be filled in as work lands (link RFCs and PRs when applicable).
- **Common scalar functions:** The first RFC 018 slice adds registry-backed math helpers for `abs(...)`, `ceil(...)`, `floor(...)`, and single-argument `round(...)`, with Substrait mappings and DataFusion-backed execution coverage.
- **Nested data functions:** RFC 020 adds registry-backed scalar helpers for array construction/access, cardinality, containment, overlap, sorting, set-like operations, joining, slicing, reversing, scalar array flattening, map construction/access, map key/value/entry extraction, map key containment, and named struct construction. These helpers lower through Substrait extension metadata without introducing generator semantics, with representative DataFusion-backed Session coverage for composable array projection paths.
- **Generator functions:** RFC 021 adds registry-backed generator applications for `explode(...)`, `explode_outer(...)`, `posexplode(...)`, `posexplode_outer(...)`, `inline(...)`, `inline_outer(...)`, portable `flatten(...)`, and `stack(...)`. Generators remain relation-shaping operations applied with `generate(...)`; they preserve input columns, require explicit output aliases, lower through the current Substrait extension-relation gap encoding, and execute through the DataFusion Session adapter with concrete output-column materialization.
- **Window functions:** RFC 019 adds `window()` specs, explicit row/range frame bounds, ranking and distribution helpers (`row_number`, `rank`, `dense_rank`, `percent_rank`, `cume_dist`, `ntile`), offset and value helpers (`lag`, `lead`, `first_value`, `last_value`, `nth_value`), and aggregate-over-window placement through `with_window_column(...)`. Ranking, distribution, offset, and value helpers require explicit ordering; portable window helpers lower through Substrait `ConsistentPartitionWindowRel` and execute through the DataFusion session adapter.
- **Function registry:** RFC 014 adds declaration-site registry decorators for the current public helper surface, including stable function references, checked signature projection, lifecycle metadata, behavior categories, alias policy, Substrait mapping categories, and checked API metadata drift validation.
- **Function extension policy:** InQL RFC 024 policy metadata now distinguishes portable core functions, namespaced extension-only functions, opt-in compatibility aliases, engine-specific functions, and rejected compatibility requests without adding an extension plugin system or backend-owned semantics.
- **Projection:** builder-based `with_column`, `add`, `mul`, and literal expression helpers now lower derived columns through Prism, Substrait, and Session execution.