Skip to content

perf(stark): migrate end_exemptions_poly to evaluation form#621

Merged
MauroToscano merged 3 commits into
mainfrom
perf/end-exemptions-eval-form
May 26, 2026
Merged

perf(stark): migrate end_exemptions_poly to evaluation form#621
MauroToscano merged 3 commits into
mainfrom
perf/end-exemptions-eval-form

Conversation

@diegokingston
Copy link
Copy Markdown
Collaborator

Follow-up to #604. Removes transition.rs's last dependency on Polynomial arithmetic — combined with #604 (already merged) no production code in the prover uses Polynomial operators anymore.

Summary

Replaces the coefficient-form end_exemptions_poly: Polynomial<F> in crypto/stark/src/constraints/transition.rs with two eval-form helpers:

  • end_exemptions_roots(trace_primitive_root, trace_length) -> Vec<FieldElement<F>> — the roots rᵢ of ∏(x − rᵢ).
  • end_exemptions_lde_evaluations(domain) -> Vec<FieldElement<F>> — the product evaluated over the precomputed LDE coset directly.

The OOD path in evaluate_zerofier now computes ∏(z − rᵢ) directly instead of Polynomial::evaluate. The Polynomial and evaluate_polynomial_on_lde_domain imports are dropped from transition.rs.

Performance

  • VM path (end_exemptions == 0, the only case in every VM table): unchanged. The else-branch's existing fast-path early-return is preserved bit-for-bit, so the function still returns the short cyclic vector instead of expanding it to the full LDE domain.
  • k > 0 path (example AIRs only — fibonacci variants and read-only-memory): an O(N · end_exemptions) direct product replaces an O(N log N) FFT. Strictly faster for end_exemptions ≤ 2.

Test plan

  • cargo build --workspace — clean.
  • make lint — clean (all three clippy configs).
  • cargo test -p stark — 125/125 passing, including the example AIRs that exercise the eval-form k > 0 path (end_exemptions = 1, 2 in fibonacci variants and read-only-memory).
  • cargo test -p lambda-vm-prover --lib — 273 pass / 77 fail; identical counts on the pre-cleanup(stark/constraints): drop coefficient-form leftovers, dedupe eval #604 baseline (verified by reverting just transition.rs). All 77 failures are pre-existing on the base and unrelated to constraints (UnknownSyscall(5) etc.).

Replaces the coefficient-form `end_exemptions_poly: Polynomial<F>` with
two eval-form helpers, removing transition.rs's last dependency on
`Polynomial` arithmetic. Stacks on #604 (which removed boundary.rs's
`Polynomial` dependency). With both landed, no production code in the
prover uses `Polynomial` operators.

- `end_exemptions_roots` returns the roots r_i of prod(x - r_i)
  (<= 2 in the example AIRs, 0 in every VM table).
- `end_exemptions_lde_evaluations(domain)` evaluates the product over
  the precomputed LDE coset directly: an O(N * end_exemptions) loop
  replaces an O(N log N) FFT.
- `evaluate_zerofier`'s OOD path computes prod(z - r_i) directly
  instead of `Polynomial::evaluate`.

Performance: the VM's dominant path (end_exemptions == 0) is unchanged
- the existing else-branch early-return is preserved bit-for-bit, so
that case still returns the short cyclic vector instead of expanding it.
The k > 0 path (example AIRs only) goes from FFT to direct product,
strictly faster.

Correctness verified by `cargo test -p stark` (125/125, includes the
fibonacci and read-only-memory AIRs with end_exemptions = 1 and 2 -
exactly the eval-form k > 0 path). VM prover lib test counts are
identical to the #604 baseline (273 pass, 77 pre-existing failures
unrelated to constraints).
@github-actions
Copy link
Copy Markdown

Codex Code Review

No issues found in the PR diff.

I was not able to run the Rust tests: cargo/rustup tried to write under read-only /home/runner/.rustup / /home/runner/.cargo, and offline mode then failed on the unresolved git dependency.

@claude
Copy link
Copy Markdown

claude Bot commented May 22, 2026

Code Review

Clean refactoring. The eval-form replacement is correct and the logic is preserved faithfully. A few items worth noting:

Medium — Latent correctness bug: offset != 0 with end_exemptions > 0

The FIXME on line 122 is carried over from the original, but now has sharper consequences: the roots vector is the sole source of truth for which rows are exempt, so a wrong root silently produces a wrong proof for any future constraint with both offset != 0 and end_exemptions > 0. No existing AIR exercises this combination today, but it is easy to miss.

The correct roots should be the last k elements of the offset periodic domain — currently decrement = g^(-period) ignores the offset entirely. The correct value for the last root in the offset domain is g^(offset + (trace_length/period - 1)*period) = g^(trace_length - period + offset).

Low — exemptions_period branch does not skip LDE expansion when end_exemptions == 0

The else branch has a fast-path early return for end_exemptions == 0 (preserving the cyclic vector). The exemptions_period branch calls end_exemptions_lde_evaluations unconditionally: when roots are empty the fold produces a vector of N ones and multiplies through pointlessly. Not a regression (old code also evaluated one_poly over the full domain here), but worth a matching guard if this path ever gets a VM caller.

Nit — double-negation in evaluate_zerofier

The expression -(root.clone() - z.clone()) is a workaround for the subfield/superfield subtraction direction. The comment explains it, but if the math library exposes to_extension(), z.clone() - root.to_extension() reads more directly.


Overall the change is correct, the described performance improvement holds, and tests cover the exercised paths.

Comment thread crypto/stark/src/constraints/transition.rs Outdated
Comment thread crypto/stark/src/constraints/transition.rs
@diegokingston
Copy link
Copy Markdown
Collaborator Author

/bench 5

@github-actions
Copy link
Copy Markdown

Benchmark — fib_iterative_8M (median of 5)

Table parallelism: auto (cores / 3)

Metric main PR Δ
Peak heap 51992 MB 51823 MB -169 MB (-0.3%) ⚪
Prove time 24.990s 25.078s +0.088s (+0.4%) ⚪

✅ No significant change.

✅ Low variance (time: 1.2%, heap: 0.5%)

Commit: ccab5dc · Baseline: cached · Runner: self-hosted bench

@MauroToscano MauroToscano enabled auto-merge May 26, 2026 17:15
@MauroToscano MauroToscano added this pull request to the merge queue May 26, 2026
Merged via the queue into main with commit 23121a6 May 26, 2026
12 checks passed
@MauroToscano MauroToscano deleted the perf/end-exemptions-eval-form branch May 26, 2026 18:19
diegokingston added a commit that referenced this pull request May 26, 2026
After #604 (boundary.rs eval-form) and #621 (end_exemptions_poly eval-form)
landed on main, no production code in the prover uses `Polynomial`
operators anymore. The only remaining user was `Polynomial::interpolate`
(test-only Lagrange interpolation, gated `#[cfg(test)]` and used as a
naive reference by the FFT cross-check tests). With both production
references gone, the operators + the naive interpolant can go too.

Removed:
- All 14 operator `impl`s (Add P+/-P, Neg P, Mul P*P, Sub P-FieldElement);
  the previous PR kept these four families because `transition.rs` and
  `boundary.rs` used them. They are now fully unreachable.
- `Polynomial::interpolate` and `InterpolateError` from `polynomial.rs`.
- Operator unit tests in `polynomial_tests.rs` (adding_, negating_,
  multiply_*) and their helpers (`polynomial_a`, `polynomial_b`,
  `polynomial_minus_a`, `polynomial_a_plus_b`).
- `compose` test helper + `composition_works` test (depended on the
  removed `interpolate`).
- Direct `interpolate_x_*_y_*` tests (tested the removed function).

Rewrote:
- FFT correctness tests (both `fft_tests.rs` and
  `fft_friendly_u64_goldilocks_tests.rs`): the
  `gen_fft_and_naive_interpolate` / `_coset_` helpers cross-checked
  `interpolate_fft` against naive Lagrange. Replaced with FFT
  round-trip via direct Horner evaluation — interpolate via FFT, then
  re-evaluate at every twiddle with `Polynomial::evaluate` (a different
  algorithm). Same independence property, no naive Lagrange needed.
- `simple_interpolating_polynomial_by_hand_works`: swapped `*` for
  `mul_with_ref(&...)` (the only remaining call site that needed it).

Verified:
- `cargo build --workspace` clean.
- `cargo test -p math` -> 178 passing.
- `cargo test -p lambda-vm-prover --lib` -> 281 pass / 77 pre-existing
  failures unrelated to constraints (same baseline as main).
- `make lint` clean across all three clippy configs.
diegokingston added a commit that referenced this pull request May 26, 2026
After #604 (boundary.rs eval-form) and #621 (end_exemptions_poly eval-form)
landed on main, no production code in the prover uses `Polynomial`
operators anymore. The only remaining user was `Polynomial::interpolate`
(test-only Lagrange interpolation, gated `#[cfg(test)]` and used as a
naive reference by the FFT cross-check tests). With both production
references gone, the operators + the naive interpolant can go too.

Removed:
- All 14 operator `impl`s (Add P+/-P, Neg P, Mul P*P, Sub P-FieldElement);
  the previous PR kept these four families because `transition.rs` and
  `boundary.rs` used them. They are now fully unreachable.
- `Polynomial::interpolate` and `InterpolateError` from `polynomial.rs`.
- Operator unit tests in `polynomial_tests.rs` (adding_, negating_,
  multiply_*) and their helpers (`polynomial_a`, `polynomial_b`,
  `polynomial_minus_a`, `polynomial_a_plus_b`).
- `compose` test helper + `composition_works` test (depended on the
  removed `interpolate`).
- Direct `interpolate_x_*_y_*` tests (tested the removed function).

Rewrote:
- FFT correctness tests (both `fft_tests.rs` and
  `fft_friendly_u64_goldilocks_tests.rs`): the
  `gen_fft_and_naive_interpolate` / `_coset_` helpers cross-checked
  `interpolate_fft` against naive Lagrange. Replaced with FFT
  round-trip via direct Horner evaluation — interpolate via FFT, then
  re-evaluate at every twiddle with `Polynomial::evaluate` (a different
  algorithm). Same independence property, no naive Lagrange needed.
- `simple_interpolating_polynomial_by_hand_works`: swapped `*` for
  `mul_with_ref(&...)` (the only remaining call site that needed it).

Verified:
- `cargo build --workspace` clean.
- `cargo test -p math` -> 178 passing.
- `cargo test -p lambda-vm-prover --lib` -> 281 pass / 77 pre-existing
  failures unrelated to constraints (same baseline as main).
- `make lint` clean across all three clippy configs.
diegokingston added a commit that referenced this pull request May 26, 2026
#618)

After #604 (boundary.rs eval-form) and #621 (end_exemptions_poly eval-form)
landed on main, no production code in the prover uses `Polynomial`
operators anymore. The only remaining user was `Polynomial::interpolate`
(test-only Lagrange interpolation, gated `#[cfg(test)]` and used as a
naive reference by the FFT cross-check tests). With both production
references gone, the operators + the naive interpolant can go too.

Removed:
- All 14 operator `impl`s (Add P+/-P, Neg P, Mul P*P, Sub P-FieldElement);
  the previous PR kept these four families because `transition.rs` and
  `boundary.rs` used them. They are now fully unreachable.
- `Polynomial::interpolate` and `InterpolateError` from `polynomial.rs`.
- Operator unit tests in `polynomial_tests.rs` (adding_, negating_,
  multiply_*) and their helpers (`polynomial_a`, `polynomial_b`,
  `polynomial_minus_a`, `polynomial_a_plus_b`).
- `compose` test helper + `composition_works` test (depended on the
  removed `interpolate`).
- Direct `interpolate_x_*_y_*` tests (tested the removed function).

Rewrote:
- FFT correctness tests (both `fft_tests.rs` and
  `fft_friendly_u64_goldilocks_tests.rs`): the
  `gen_fft_and_naive_interpolate` / `_coset_` helpers cross-checked
  `interpolate_fft` against naive Lagrange. Replaced with FFT
  round-trip via direct Horner evaluation — interpolate via FFT, then
  re-evaluate at every twiddle with `Polynomial::evaluate` (a different
  algorithm). Same independence property, no naive Lagrange needed.
- `simple_interpolating_polynomial_by_hand_works`: swapped `*` for
  `mul_with_ref(&...)` (the only remaining call site that needed it).

Verified:
- `cargo build --workspace` clean.
- `cargo test -p math` -> 178 passing.
- `cargo test -p lambda-vm-prover --lib` -> 281 pass / 77 pre-existing
  failures unrelated to constraints (same baseline as main).
- `make lint` clean across all three clippy configs.
github-merge-queue Bot pushed a commit that referenced this pull request May 26, 2026
* refactor(math): remove dead Deserializable trait and unreachable error variants

`Deserializable` had a single impl (`Proof<T>`) whose only `deserialize`
call site was inside that impl itself — a recursion with no external
entry point. Production deserialization uses `bincode`/serde everywhere.
Delete the trait and the `Proof<T>` impl.

`merkle_tree/proof.rs` also carried a dead `Serializable` impl behind
`#[cfg(feature = "alloc")]` — a feature the `crypto` crate does not even
define, so the block never compiled. It referenced `math::traits::Serializable`,
which does not exist. Remove it.

Error cleanup (none constructed anywhere in the workspace):
- `DeserializationError` enum + `From<ByteConversionError>` impl — only
  reachable through the now-deleted `Deserializable`.
- `PairingError` enum — lambda_vm has no pairings.
- `ByteConversionError::{InvalidValue, PointNotInSubgroup, ValueNotCompressed}`
  — the latter two are elliptic-curve concepts; there is no EC here.
- `CreationError::InvalidDecString`.

Pure dead-code removal; `cargo build`/`cargo test -p math` unchanged
(216 tests pass).

* refactor(math): remove vestigial helpers module

`helpers::next_power_of_two(u64)` was bit-for-bit equivalent to stdlib
`u64::next_power_of_two()`. Its only caller, `resize_to_next_power_of_two`,
was used solely by the `fibonacci_rap` example — production trace sizing
already calls stdlib `.next_power_of_two()` directly everywhere.

Move `resize_to_next_power_of_two` into `examples/fibonacci_rap.rs` as a
local helper (rewritten on `usize` with the stdlib call, dropping the old
`try_into().unwrap()` TODO), and delete the `helpers` module.

* refactor(math): gate evaluate_fft_bit_reversed behind cfg(test)

`evaluate_fft_bit_reversed` was a non-gated `pub fn` but is only called
from the FFT property tests — no production caller in stark or prover.
Gate it `#[cfg(test)]` so it stops appearing in the production API
surface. Restore the gate (or remove it) when a real caller appears.

* refactor(math): gate test-only Polynomial::{truncate, reverse}

Both methods are called only from `fast_division` / `invert_polynomial_mod`,
which are themselves already `#[cfg(test)]`. No production caller in stark
or prover. Gate them `#[cfg(test)]` so they leave the production API surface.

`long_division_with_remainder` is left public — it is used by the
debug-checks path (`check_boundary_polys_divisibility`) and is a
legitimate general polynomial operation other code could want.

* refactor(math): flatten fft/cpu/* up to fft/*

The `fft/cpu/` directory implied a `fft/gpu/` sibling that never existed —
GPU code lives in the separate `crypto/math-cuda` crate. The `cpu` nesting
was a vestigial holdover.

Move `bit_reversing`, `bowers_fft`, `roots_of_unity`, `fft`, and
`bowers_fft_tests` from `fft/cpu/` up to `fft/`; delete `fft/cpu/mod.rs`
and merge its declarations into `fft/mod.rs`. Update every
`fft::cpu::...` / `super::cpu::...` path across math, stark, prover, and
math-cuda to `fft::...`.

Pure module move — no logic change. 216 math tests pass; stark, prover,
math-cuda all build.

* refactor(math): collapse polynomial/ folder into one polynomial.rs

Two problems compounded:
1. `polynomial/` was a folder holding a single file (`mod.rs`) — the
   directory level served no purpose.
2. Two files were named `polynomial` (`polynomial/mod.rs` and
   `fft/polynomial.rs`), both defining `impl Polynomial<...>` blocks.

Merge `fft/polynomial.rs` (the FFT-specific `impl Polynomial` blocks plus
the `evaluate_fft_cpu` / `interpolate_fft_cpu` free functions) into
`polynomial/mod.rs`, then rename it to a top-level `crypto/math/src/polynomial.rs`
and delete the folder.

Result: one `polynomial.rs` holding the `Polynomial` type and all of its
methods (core ring ops + FFT). `fft/` is now purely the FFT *algorithm*
layer (Bowers, twiddles, bit-reversal, roots of unity).

Also delete the stale `polynomial/README.md` — inherited lambdaworks
docs referencing multilinear-polynomial files that don't exist in
lambda_vm.

Updated the one external path (`fft::polynomial::compose_fft` ->
`polynomial::compose_fft`). 216 math + 124 stark tests pass.

* refactor(math): consolidate scattered tests into tests/ directory

Three test conventions coexisted: the `tests/` directory, a sibling
`bowers_fft_tests.rs` next to source, and an inline `#[cfg(test)] mod`
in the polynomial module. Unify on the `tests/` directory.

- Move `fft/bowers_fft_tests.rs` -> `tests/bowers_fft_tests.rs`, register
  it in `tests/mod.rs`, drop the `mod bowers_fft_tests;` from `fft/mod.rs`.
- Move the inline coset-LDE `mod tests` out of `polynomial.rs` into a new
  `coset_lde_tests` module in `tests/fft_tests.rs`.
- Rename `fft/fft.rs` -> `fft/reference_fft.rs`: after the Step 6 flatten
  it collided with its own parent module name (`fft::fft`), which clippy
  rejects. The new name also says what it is — the reference radix-2 FFT
  used only to cross-check the production Bowers FFT. Gate the module
  `#[cfg(test)] pub(crate)` since its whole content is test-only.

216 math tests pass (same count — no tests lost in the move).

* refactor(math): drop redundant FFT cross-check code, relocate inline tests

The reference radix-2 FFT existed only to cross-check the production Bowers
FFT. Bowers is already verified directly against a naive O(n^2) DFT at the
same sizes, so reference_fft.rs and its four comparison tests are dead
weight. gate get_powers_of_primitive_root as test-only (production twiddles
go through LayerTwiddles), collapse the evaluate_fft_cpu / interpolate_fft_cpu
wrappers into their sole callers, and narrow coset_lde_full_into to
pub(crate).

Move the inline test modules from bit_reversing.rs, goldilocks.rs and
extensions_goldilocks.rs into the dedicated tests/ directory.

* refactor(math): remove dead polynomial-division cluster, fix zero-offset panic

Addresses PR review feedback.

Reviewers flagged that `#[cfg(test)]`-gating previously-public methods is a
breaking change. Rather than keep them as `pub(crate)` dead code, the cluster
is removed outright — none of it has a production caller:

- Delete `Polynomial::{fast_division, invert_polynomial_mod,
  fast_fft_multiplication, truncate, reverse}` and `evaluate_fft_bit_reversed`,
  plus their 8 tests. The fast polynomial-division algorithm had no production
  use; it existed only to be tested.
- `get_powers_of_primitive_root` stays `#[cfg(test)]` — it is a genuine test
  helper used to validate the production FFT, not dead code.

Also fixes the pre-existing panic reviewers noted in `interpolate_offset_fft`:
`offset.inv().unwrap()` panicked on a zero coset offset. Add
`FFTError::InvalidCosetOffset` and propagate it instead.

* refactor(math): remove dead Polynomial APIs and self-referential tests

Continues the dead-code audit of polynomial.rs. Every item removed here
has no production caller — each is a function reachable only from its own
tests, or a test-file algorithm that tests itself.

Removed from polynomial.rs:
- to_extension (zero callers anywhere)
- long_division_with_remainder, leading_coefficient
- ruffini_division_inplace
- interpolate_coset_eval_with_g_n_inv (base-field variant)

Removed the matching self-referential tests/helpers from
polynomial_tests.rs (division_works, division_by_zero_degree_polynomial_works,
test_xgcd, ruffini_inplace_* and the div_with_ref / xgcd / ruffini_division
helpers).

The 3 barycentric tests that exercised the base-field
interpolate_coset_eval_with_g_n_inv now call the production-used
interpolate_coset_eval_ext_with_g_n_inv (instantiated E=F), so they keep
covering the real code path.

* refactor(math): keep Deserializable in math; leave crypto/merkle_tree/proof.rs untouched

The earlier dead-code sweep removed `Deserializable` + `DeserializationError`
from `math` because its only out-of-crate user was the `impl Deserializable`
in `crypto/crypto/src/merkle_tree/proof.rs`, so the PR also dropped those
impls. PR #611 (arity-4 FRI + Merkle caps) explicitly asks this PR to leave
`proof.rs` alone, so revert that pair of deletions:

- Restore `proof.rs` to main's contents (diff vs main: empty).
- Restore `DeserializationError` enum and `From<ByteConversionError>` impl
  in `math/src/errors.rs`.
- Restore `Deserializable` trait + the `DeserializationError` import in
  `math/src/traits.rs`.

The `Serializable` half of the impl block in `proof.rs` is gated on
`crypto/crypto`'s `alloc` feature, which no consumer in this repo
enables — that code path was already non-compiling and is unchanged here.

Tests: `cargo test -p math -p crypto` → 46 + 198 green. `make lint` clean.

* refactor(math): drop all 14 Polynomial operator impls + naive Lagrange (#618)

After #604 (boundary.rs eval-form) and #621 (end_exemptions_poly eval-form)
landed on main, no production code in the prover uses `Polynomial`
operators anymore. The only remaining user was `Polynomial::interpolate`
(test-only Lagrange interpolation, gated `#[cfg(test)]` and used as a
naive reference by the FFT cross-check tests). With both production
references gone, the operators + the naive interpolant can go too.

Removed:
- All 14 operator `impl`s (Add P+/-P, Neg P, Mul P*P, Sub P-FieldElement);
  the previous PR kept these four families because `transition.rs` and
  `boundary.rs` used them. They are now fully unreachable.
- `Polynomial::interpolate` and `InterpolateError` from `polynomial.rs`.
- Operator unit tests in `polynomial_tests.rs` (adding_, negating_,
  multiply_*) and their helpers (`polynomial_a`, `polynomial_b`,
  `polynomial_minus_a`, `polynomial_a_plus_b`).
- `compose` test helper + `composition_works` test (depended on the
  removed `interpolate`).
- Direct `interpolate_x_*_y_*` tests (tested the removed function).

Rewrote:
- FFT correctness tests (both `fft_tests.rs` and
  `fft_friendly_u64_goldilocks_tests.rs`): the
  `gen_fft_and_naive_interpolate` / `_coset_` helpers cross-checked
  `interpolate_fft` against naive Lagrange. Replaced with FFT
  round-trip via direct Horner evaluation — interpolate via FFT, then
  re-evaluate at every twiddle with `Polynomial::evaluate` (a different
  algorithm). Same independence property, no naive Lagrange needed.
- `simple_interpolating_polynomial_by_hand_works`: swapped `*` for
  `mul_with_ref(&...)` (the only remaining call site that needed it).

Verified:
- `cargo build --workspace` clean.
- `cargo test -p math` -> 178 passing.
- `cargo test -p lambda-vm-prover --lib` -> 281 pass / 77 pre-existing
  failures unrelated to constraints (same baseline as main).
- `make lint` clean across all three clippy configs.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants