Goldilocks#210

Open
TomWambsgans wants to merge 47 commits into main from goldilocks
@TomWambsgans
Collaborator

No description provided.

TomWambsgans and others added 30 commits April 15, 2026 18:18
Co-authored-by: Copilot <copilot@github.com>
w
Co-authored-by: Copilot <copilot@github.com>
Bring main's MTU-XMSS structure (tweak table, public_param, T-Sponge with
replacement) into the goldilocks branch with all poseidon-related sizes
halved:

  field-element widths   main (KoalaBear)   goldilocks
  --------------------   ----------------   ----------
  TWEAK_LEN                      2               1
  XMSS_DIGEST_LEN                4               2
  RANDOMNESS_LEN_FE              6               3
  MESSAGE_LEN_FE                 8               4
  PUBLIC_PARAM_LEN_FE            4               2
  POSEIDON1_WIDTH               16               8
  DIGEST_LEN_FE                  8               4

Tweak table slots are 2 FE (1 actual tweak FE + 1 zero pad). The packed
tweak fits in a single 64-bit Goldilocks element via
`(tweak_type << 42) | (sub_position << 32) | index`.
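
As an illustration of the packing formula above, here is a minimal sketch on raw `u64` values. The function names and the field widths (32-bit `index`, 10-bit `sub_position`, remaining high bits for `tweak_type`) are assumptions read off the shift amounts, not taken from the branch. The final comparison against the Goldilocks prime is also an assumption about how canonicity might be enforced.

```rust
/// Goldilocks prime p = 2^64 - 2^32 + 1.
const GOLDILOCKS_P: u64 = 0xFFFF_FFFF_0000_0001;

/// Hypothetical helper: packs a tweak with the layout
/// `(tweak_type << 42) | (sub_position << 32) | index`.
/// Returns None if a field overflows its assumed width or if the
/// packed value is not a canonical Goldilocks element (>= p).
fn pack_tweak(tweak_type: u64, sub_position: u64, index: u64) -> Option<u64> {
    // Assumed widths: index 32 bits, sub_position 10 bits, tweak_type 22 bits.
    if index >= 1 << 32 || sub_position >= 1 << 10 || tweak_type >= 1 << 22 {
        return None;
    }
    let packed = (tweak_type << 42) | (sub_position << 32) | index;
    (packed < GOLDILOCKS_P).then_some(packed)
}

/// Hypothetical inverse: returns (tweak_type, sub_position, index).
fn unpack_tweak(packed: u64) -> (u64, u64, u64) {
    (packed >> 42, (packed >> 32) & 0x3FF, packed & 0xFFFF_FFFF)
}
```

With these assumed widths the three fields do not overlap, so `unpack_tweak(pack_tweak(t, s, i).unwrap())` returns `(t, s, i)` for any in-range input.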

Port main's poseidon precompile features (`half_output`,
`hardcoded_offset_left`) from Poseidon16 to Poseidon8, with new committed
columns for the flags and `effective_index_left_first/second`. The
half-output trace tail values are filled in a post-pass from
`memory_padded` (lookup-only — the AIR doesn't constrain them).

Encoding decomposition uses the goldilocks-proven 21 chunks of W=3 bits
per FE with a factored 1-bit canonical check
`(diff)·(diff − 2^63) == 0`, applied to the first 2 of 4 output FE for
exactly V = 42 chunks (no V_GRINDING).
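
One plausible reading of the decomposition above, sketched on plain integers rather than field elements: 21 chunks of 3 bits cover the low 63 bits of a 64-bit value, so the difference between the value and its recomposed chunks is either 0 or 2^63, which is what a constraint of the form `(diff)·(diff − 2^63) == 0` would accept. Two such elements give 2 × 21 = 42 chunks, matching V = 42. The function names are hypothetical, and the real AIR works over Goldilocks arithmetic, which this sketch does not model.

```rust
const CHUNK_BITS: u32 = 3; // W = 3 bits per chunk
const CHUNKS_PER_FE: usize = 21; // 21 * 3 = 63 bits covered

/// Hypothetical helper: splits `x` into 21 three-bit chunks (little-endian)
/// and returns them with `diff = x - recomposed`, the leftover top bit.
fn decompose(x: u64) -> ([u8; CHUNKS_PER_FE], u64) {
    let mut chunks = [0u8; CHUNKS_PER_FE];
    for (i, c) in chunks.iter_mut().enumerate() {
        *c = ((x >> (CHUNK_BITS * i as u32)) & 0b111) as u8;
    }
    // Recompose the low 63 bits from the chunks.
    let recomposed: u64 = chunks
        .iter()
        .enumerate()
        .map(|(i, &c)| (c as u64) << (CHUNK_BITS * i as u32))
        .sum();
    (chunks, x - recomposed)
}

/// Integer analogue of the factored check `diff * (diff - 2^63) == 0`.
fn top_bit_check(diff: u64) -> bool {
    diff == 0 || diff == 1 << 63
}
```

For every `u64` input the returned `diff` passes `top_bit_check`; the check only becomes a real constraint when the chunks are supplied by a prover instead of being computed from `x`.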

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
TomWambsgans and others added 4 commits May 11, 2026 22:05
…onky3#1606)

Routes div_2exp_u64(1) through halve() instead of mul_2exp_u64(191),
which walks the 96-entry POWERS_OF_TWO table and does a multiply.
Microbench: ~0.95 ns/op -> ~0.43 ns/op (~2.2x throughput).

The same upstream PR also makes halve() branchless; that change measured
~10% slower here on Zen 4 (LLVM already emits cmov for the simple
`if x & 1 == 0` form), so it is not included.
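
To make the equivalence concrete, here is a minimal sketch on canonical `u64` representatives, not the branch's actual field type. Since 2^96 ≡ −1 (mod p) in Goldilocks, 2^192 ≡ 1, so multiplying by 2^191 is the same as dividing by 2. `halve` uses the simple branching form mentioned above; `mul_mod` and `pow2` are hypothetical helpers written only to check the result.

```rust
/// Goldilocks prime p = 2^64 - 2^32 + 1.
const P: u64 = 0xFFFF_FFFF_0000_0001;

/// Halves a canonical element (x < P) without a multiplication.
fn halve(x: u64) -> u64 {
    debug_assert!(x < P);
    if x & 1 == 0 {
        x >> 1
    } else {
        // (x + P) / 2 computed without overflowing u64:
        // (x - 1) / 2 + (P + 1) / 2.
        (x >> 1) + (P >> 1) + 1
    }
}

/// Reference multiplication via u128, for checking only.
fn mul_mod(a: u64, b: u64) -> u64 {
    ((a as u128 * b as u128) % P as u128) as u64
}

/// Reference computation of 2^k mod P by repeated doubling.
fn pow2(k: u32) -> u64 {
    let mut acc = 1u64;
    for _ in 0..k {
        acc = mul_mod(acc, 2);
    }
    acc
}
```

Under these definitions `halve(x)` agrees with `mul_mod(x, pow2(191))` for canonical `x`, which is the slow path the commit replaces.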

Co-authored-by: Robin Salen <salenrobin@gmail.com>
Symmetric counterpart to the div_2exp_u64 fast path. mul_2exp_u64(1)
no longer indexes into POWERS_OF_TWO and does a full Goldilocks mul —
it returns *self + *self. Microbench: ~0.55 ns/op -> ~0.43 ns/op
(~25% faster).
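
A minimal sketch of doubling by addition, again on canonical `u64` representatives and not the branch's actual field type. The name `double` is hypothetical. Because 2x < 2p, at most one subtraction of p is needed after the add.

```rust
/// Goldilocks prime p = 2^64 - 2^32 + 1.
const P: u64 = 0xFFFF_FFFF_0000_0001;

/// Doubles a canonical element (x < P) with one add and one conditional
/// subtract, avoiding a table lookup and a full modular multiplication.
fn double(x: u64) -> u64 {
    debug_assert!(x < P);
    let (sum, carry) = x.overflowing_add(x);
    if carry || sum >= P {
        // The true value 2x lies in [P, 2P), so 2x - P fits in u64;
        // wrapping_sub gives the right answer even when the add carried.
        sum.wrapping_sub(P)
    } else {
        sum
    }
}
```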
Brings main into the goldilocks branch. The bulk of the work was porting
main's PR #223 (duplex-sponge Fiat-Shamir) to the Goldilocks field, since
goldilocks never adopted it.

Conflict resolutions of note:
- AIR trait: kept main's `n_shift_columns` / shift-columns-first layout;
  dropped the `low_degree` feature (goldilocks removed it — the Goldilocks
  poseidon8 AIR uses direct x^7 constraints, not `low_degree_block`).
- extension_op/air.rs: cubic (DIM=3) layout reordered shift-columns-first.
- Duplex Challenger ported to Goldilocks (WIDTH=8, RATE=4, CAPACITY=4);
  added a `Permutation` trait to the `symetric` crate.
- New `poseidon8_permute` precompile: AIR (flag_permute column,
  outputs_left/right, mutex constraints), trace gen, ISA, simplifier.
- Duplex `fiat_shamir.py` rewritten for DIGEST_LEN=4.
- poseidon8 MAX_LOG_N_ROWS lowered 21 -> 20: the permute variant widened
  the table by 5 columns, which would otherwise exceed the WHIR commitment
  surface cap.
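
For readers unfamiliar with the duplex construction, the shape of a challenger with WIDTH=8 and RATE=4 can be sketched as follows. This is an illustration only: `ToyPermutation` is a placeholder mixing function, not Poseidon8 and not cryptographically secure; the trait signature, the struct names, the overwrite-mode absorb, and the use of raw `u64` lanes instead of Goldilocks elements are all assumptions rather than the branch's API.

```rust
const WIDTH: usize = 8;
const RATE: usize = 4; // the remaining WIDTH - RATE = 4 lanes are the capacity

/// Hypothetical stand-in for a permutation trait over the sponge state.
trait Permutation {
    fn permute(&self, state: &mut [u64; WIDTH]);
}

/// Placeholder permutation (NOT Poseidon8, NOT secure): illustrative mixing only.
struct ToyPermutation;

impl Permutation for ToyPermutation {
    fn permute(&self, s: &mut [u64; WIDTH]) {
        for round in 0..4u64 {
            for i in 0..WIDTH {
                s[i] = s[i]
                    .wrapping_mul(0x9E37_79B9_7F4A_7C15)
                    .wrapping_add(round + i as u64);
                let t = s[i].rotate_left(17);
                s[(i + 1) % WIDTH] ^= t;
            }
        }
    }
}

/// Duplex sponge: each call absorbs RATE lanes and squeezes RATE lanes.
struct Duplex<Perm: Permutation> {
    perm: Perm,
    state: [u64; WIDTH],
}

impl<Perm: Permutation> Duplex<Perm> {
    fn new(perm: Perm) -> Self {
        Self { perm, state: [0; WIDTH] }
    }

    /// Overwrites the rate lanes with `input`, permutes, and returns the
    /// new rate lanes. The capacity lanes carry state between calls.
    fn duplex(&mut self, input: [u64; RATE]) -> [u64; RATE] {
        self.state[..RATE].copy_from_slice(&input);
        self.perm.permute(&mut self.state);
        let mut out = [0u64; RATE];
        out.copy_from_slice(&self.state[..RATE]);
        out
    }
}
```

A prover and verifier that feed the same transcript into two `Duplex` instances obtain the same outputs, which is the property Fiat-Shamir relies on.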

cargo fmt + clippy clean; full `cargo test --workspace` passes;
`recursion --n 2` aggregation runs end-to-end.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>