Skip to content

Refactor/fiat shamir statement streamlined#617

Open
MauroToscano wants to merge 3 commits into
mainfrom
refactor/fiat-shamir-statement-streamlined
Open

Refactor/fiat shamir statement streamlined#617
MauroToscano wants to merge 3 commits into
mainfrom
refactor/fiat-shamir-statement-streamlined

Conversation

@MauroToscano
Copy link
Copy Markdown
Contributor

No description provided.

diegokingston and others added 3 commits May 22, 2026 11:42
…script

The VM prover and verifier built their transcript with `new(&[])` — the
statement (program, public output, table layout) was never absorbed, so
Fiat-Shamir challenges did not depend on it. Soundness was preserved only
by out-of-band reconstruction checks in the verifier.

Add a canonical, versioned, length-prefixed encoder (`prover/src/statement.rs`)
that digests the full statement — ELF hash, public output, table counts, and
page layout — into a 32-byte transcript seed. Seed the transcript with it on
the prove path, the verify path, and the bus-balance replay
(`replay_transcript_phase_a`), so all three derive identical challenges for an
honest proof and diverge for any tampered statement.

Closes the frozen-heart class of issue as defense-in-depth and is a
prerequisite for recursion. Breaking proof-format change (challenges differ).

`tiny-keccak` is promoted from a dev-dependency to a dependency for the
statement digest.
Apply the simplifications to the statement-binding work:

- Drop tiny-keccak as a prod dep; the DefaultTranscript is already a
  Keccak256 absorber, so the outer keccak256 in statement_seed() was
  redundant. ELF digest now uses sha3::Keccak256 (already a transitive
  prod dep via crypto). tiny-keccak stays as a dev-dep for the in-VM
  keccak precompile cross-check.

- Replace encode_statement -> Vec<u8> + statement_seed -> [u8; 32] with
  a single absorb_statement(&mut transcript, ...) that streams bytes
  directly into the transcript. No intermediate allocation.

- Drop FORMAT_VERSION constant — the version is in DOMAIN_TAG suffix
  (LAMBDAVM_STARK_STATEMENT_V1). Bump the suffix on any encoding change.

- Drop the redundant 8-byte length prefix on the elf bytes: we hash to
  fixed 32-byte digest, so the prefix doesn't prevent any collision.

- replay_transcript_phase_a / compute_expected_commit_bus_balance now
  take &mut DefaultTranscript<E> instead of an opaque transcript_seed:
  &[u8]. Callers decide what's in the transcript; internal tests just
  hand a fresh empty one (no &[] magic sentinel).

- Fold the duplicated lde_trace_main_merkle_root append branches in
  replay_transcript_phase_a.

- verify_with_options absorbs the statement ONCE, then clones the
  transcript so multi_verify and the bus-balance replay both start
  from the same post-absorb state — no closure, no field-list drift.

- statement_tests.rs reframed around transcript.state(). 3 tests
  (state_is_deterministic, state_depends_on_every_field,
  public_output_length_prefix_prevents_collision) cover the same
  surface as the 8 byte-level encoder tests they replace, hitting the
  real absorb path. Drop the two-ELF e2e test that needed RISC-V
  artifacts and couldn't run in author env.

- 6 internal test sites (decode_tests.rs, prove_elfs_tests.rs): pass
  a fresh DefaultTranscript instead of &[].

Soundness commitment unchanged: same five statement fields bound, same
domain separation, same length prefixes where needed, same cross-site
agreement (now explicitly tested via deterministic absorb).

Follow-up: Phase A is still replayed twice on the verify path (once in
compute_expected_commit_bus_balance, once in multi_verify). That's
pre-existing, not from this PR; removing it would change multi_verify's
public API to take a closure that produces the bus balance offset given
(z, alpha). Separate scope.
@github-actions
Copy link
Copy Markdown

Codex Code Review

No issues found in the PR diff.

I reviewed the statement absorption changes, transcript replay path, prover/verifier transcript initialization, and updated tests. The changes appear consistent: prover, verifier, and commit bus replay all start from the same statement-bound transcript state, and tampered statement fields should cause verification failure.

Verification note: I attempted cargo test -p lambda-vm-prover statement_tests --no-default-features and cargo check -p lambda-vm-prover --no-default-features, but both were blocked by the environment because rustup cannot write to /home/runner/.rustup/tmp on this filesystem. git diff --check passed.

Comment thread prover/src/statement.rs

t.append_bytes(&(num_private_input_pages as u64).to_le_bytes());

// runtime_page_ranges: count-prefixed; each entry fixed width.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Medium (soundness): The hardcoded field list silently excludes any field added to TableCounts in the future. There's no compile-time check that this array covers the full struct.

A new TableCounts field that isn't listed here won't be bound into the transcript, which would let a proof generated for one table configuration pass verification against a different one — breaking the statement-binding guarantee this PR introduces.

Consider adding a test that structurally validates field completeness, e.g.:

#[test]
fn table_counts_all_fields_absorbed() {
    // If TableCounts gains a new field, this test should fail until
    // absorb_statement is updated to include it.
    let counts = TableCounts {
        cpu: 1, lt: 2, memw: 3, memw_aligned: 4, load: 5,
        mul: 6, dvrm: 7, shift: 8, branch: 9, memw_register: 10,
        // Adding a new field here causes a compile error, which is the
        // signal to update the loop in absorb_statement.
    };
    // exhaustive struct literal above is the actual guard
    let _ = counts;
}

Alternatively, implement a fn counts_as_array(&self) -> [usize; 10] method on TableCounts (or a to_le_bytes helper) so the encoding is co-located with the struct definition.

@claude
Copy link
Copy Markdown

claude Bot commented May 22, 2026

Review: Fiat-Shamir statement binding

The security approach is sound. Binding the full statement into the transcript before Phase A prevents malleability attacks where a tampered ELF, public output, or table layout would otherwise produce valid-looking challenges.

Logic correctness:

  • The refactor of the phase-A replay is semantically equivalent to the original (the merkle root was always appended in both branches).
  • The transcript clone for the bus-balance replay is the correct pattern — both paths start from the same post-statement state.
  • Tests using the low-level prover directly are consistent (both prover and replay use empty transcripts at that level), and the security property is exercised end-to-end by the higher-level prove/verify tests.

One issue found:

Medium (soundness): The hardcoded TableCounts field array in absorb_statement silently excludes any field added to the struct in the future — see inline comment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants