feat(blockchain): advance XMSS preparation window preemptively #332

Open
conache wants to merge 12 commits into lambdaclass:main from conache:xmss-advance-background

Contributor

@conache conache commented Apr 30, 2026

🗒️ Description / Motivation

This PR closes #262.

Every 65,536 slots, an XMSS signing key has to precompute its next bottom tree via leansig's advance_preparation. The two most recently computed trees form a sliding "prepared window" of 131,072 slots - the range the key can sign for without doing more work. Once the wall-clock slot crosses out of that window, the precomputation has to run before the next signature.
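
The window arithmetic above can be sketched as follows. This is a minimal model, not leansig's API: the function names are hypothetical, and it assumes each advance slides the window forward by exactly one bottom tree.

```rust
/// Slots covered by one bottom tree (figure from the description above).
const SLOTS_PER_BOTTOM_TREE: u64 = 65_536;
/// The prepared window spans the two most recently computed bottom trees.
const PREPARED_WINDOW_SLOTS: u64 = 2 * SLOTS_PER_BOTTOM_TREE;

/// Hypothetical helper: true if `slot` lies inside the window starting at `window_start`.
fn is_prepared(window_start: u64, slot: u64) -> bool {
    slot >= window_start && slot < window_start + PREPARED_WINDOW_SLOTS
}

/// Hypothetical helper: number of one-tree advances needed before `slot`
/// is inside the window. Assumes `slot >= window_start`, since the window
/// only moves forward.
fn advances_needed(window_start: u64, slot: u64) -> u64 {
    let window_end = window_start + PREPARED_WINDOW_SLOTS;
    if slot < window_end {
        0
    } else {
        (slot - window_end) / SLOTS_PER_BOTTOM_TREE + 1
    }
}
```

With a window starting at slot 0, slot 131,071 needs no advance, slot 131,072 needs one, and slot 196,608 needs two.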

PR #261 made advance_preparation() run synchronously on the BlockChainServer actor's tick handler. When the window has to slide forward, the actor blocks on the hash work long enough to stall its other work.

This PR moves that advance off the signing path, running it preemptively where blocking is cheap:

  • At startup, in BlockChain::spawn (before the actor starts ticking) - catches each loaded key's prepared window up to the current wall-clock slot. Handles long offline gaps.
  • At the end of every tick, after the interval's duties - advances each key's window to cover slot + 1, so the next tick's signing is always inside the prepared window.

The lazy advance loop inside sign_with_* is kept as a safety net, with added elapsed_ms timing logs so we'd see it if it ever fires.
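
A rough sketch of the two preemptive call sites, under stated assumptions: `MockKey`, `on_startup`, and this simplified `on_tick` are hypothetical stand-ins rather than the PR's actual code, and the mock's `advance_preparation` only moves a counter instead of doing hash work.

```rust
const SLOTS_PER_BOTTOM_TREE: u64 = 65_536;
const PREPARED_WINDOW_SLOTS: u64 = 2 * SLOTS_PER_BOTTOM_TREE;

/// Hypothetical stand-in for a signing key; it only tracks its window start.
struct MockKey {
    window_start: u64,
    advances: u32,
}

impl MockKey {
    fn is_prepared_for(&self, slot: u64) -> bool {
        slot >= self.window_start && slot < self.window_start + PREPARED_WINDOW_SLOTS
    }

    /// Stand-in for the expensive precomputation: slides the window by one tree.
    fn advance_preparation(&mut self) {
        self.window_start += SLOTS_PER_BOTTOM_TREE;
        self.advances += 1;
    }
}

/// Advances every key until its window covers `slot`.
fn advance_keys_to(keys: &mut [MockKey], slot: u64) {
    for key in keys.iter_mut() {
        // Loop only while `slot` is past the window end, so it always terminates.
        while slot >= key.window_start + PREPARED_WINDOW_SLOTS {
            key.advance_preparation();
        }
    }
}

/// Startup: catch up to the current wall-clock slot before ticking begins.
fn on_startup(keys: &mut [MockKey], current_slot: u64) {
    advance_keys_to(keys, current_slot);
}

/// End of tick: duties run first, then the window is prepared for the next slot.
fn on_tick(keys: &mut [MockKey], slot: u64) {
    // ... the interval's duties (signing, metrics) would run here ...
    advance_keys_to(keys, slot + 1);
}
```

In this model, a key whose window starts at slot 0 needs three advances at startup to cover slot 300,000. Later ticks do nothing until `slot + 1` reaches the window end.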

What Changed

crates/blockchain/src/key_manager.rs

  • New KeyManager::advance_keys_to(slot) - iterates registered validators and advances both attestation and proposal keys to cover slot.
  • New free helper advance_key(...) - synchronous advance loop with Instant::now() timing. Emits info! at start, info! with elapsed_ms at end, warn! on activation-interval exhaustion.
  • Added matching Instant::now() + elapsed_ms info! to the pre-existing advance loops in sign_with_attestation_key / sign_with_proposal_key.
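
For illustration, a timed advance loop in the spirit of the advance_key helper might look like the sketch below. Everything here is an assumption made for the example: `MockKey`, `AdvanceOutcome`, the boolean return used to signal exhaustion, and the use of `eprintln!` in place of the `info!`/`warn!` macros. How leansig actually reports exhaustion is not shown in this PR description.

```rust
use std::time::Instant;

const SLOTS_PER_BOTTOM_TREE: u64 = 65_536;

/// Hypothetical key model: a sliding window inside a fixed activation interval.
struct MockKey {
    window_start: u64,
    /// First slot past the key's activation interval.
    activation_end: u64,
}

impl MockKey {
    fn window_end(&self) -> u64 {
        self.window_start + 2 * SLOTS_PER_BOTTOM_TREE
    }

    /// Stand-in for the real precomputation. Returns false when the window
    /// cannot slide any further (an assumption of this sketch).
    fn advance_preparation(&mut self) -> bool {
        if self.window_end() >= self.activation_end {
            return false;
        }
        self.window_start += SLOTS_PER_BOTTOM_TREE;
        true
    }
}

#[derive(Debug, PartialEq)]
enum AdvanceOutcome {
    AlreadyPrepared,
    Advanced { steps: u32 },
    Exhausted,
}

/// Synchronous advance loop with timing around the blocking work.
fn advance_key(key: &mut MockKey, slot: u64) -> AdvanceOutcome {
    if slot < key.window_end() {
        return AdvanceOutcome::AlreadyPrepared;
    }
    let started = Instant::now();
    eprintln!("info: preparing key for slot {slot}");
    let mut steps = 0;
    while slot >= key.window_end() {
        if !key.advance_preparation() {
            // Log and stop; the next sign attempt is left to surface the hard error.
            eprintln!("warn: activation interval exhausted before slot {slot}");
            return AdvanceOutcome::Exhausted;
        }
        steps += 1;
    }
    eprintln!("info: advance complete, elapsed_ms={}", started.elapsed().as_millis());
    AdvanceOutcome::Advanced { steps }
}
```

Returning an outcome instead of an error mirrors the described behavior: the helper only logs on exhaustion, and the signing path remains the place where the failure becomes a hard error.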

crates/blockchain/src/lib.rs

  • BlockChain::spawn: computes current wall-clock slot and calls key_manager.advance_keys_to(current_slot) before the actor starts ticking.
  • on_tick: at the very end, after metric updates, calls self.key_manager.advance_keys_to((slot + 1) as u32).

Correctness / Behavior Guarantees

  • Signing is never delayed by an advance - the preemptive advance runs at the idle tail of the interval (or at startup, before any tick fires).
  • Steady-state: the end-of-tick advance is a no-op in 65,535 out of every 65,536 slots.
  • After a long offline gap: startup catch-up walks the window forward before any tick fires.
  • Activation-interval exhaustion stays a hard error in the signing path; advance_key logs warn! and breaks so the next sign attempt surfaces the error.
  • ValidatorKeyPair shape and the KeyManagerError enum are unchanged from main.
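
The steady-state claim can be checked with a small simulation. This is a sketch under simplifying assumptions: one end-of-tick call is modeled per slot, the function name is hypothetical, and the window slides by one bottom tree per advance.

```rust
const SLOTS_PER_BOTTOM_TREE: u64 = 65_536;
const PREPARED_WINDOW_SLOTS: u64 = 2 * SLOTS_PER_BOTTOM_TREE;

/// Simulates the end-of-tick advance for `num_slots` consecutive slots and
/// returns how many of those calls had to do real work (were not no-ops).
fn count_working_ticks(mut window_start: u64, first_slot: u64, num_slots: u64) -> u64 {
    let mut worked = 0;
    for slot in first_slot..first_slot + num_slots {
        // The end-of-tick advance targets the next slot.
        let target = slot + 1;
        let mut advanced = false;
        while target >= window_start + PREPARED_WINDOW_SLOTS {
            window_start += SLOTS_PER_BOTTOM_TREE;
            advanced = true;
        }
        if advanced {
            worked += 1;
        }
    }
    worked
}
```

Starting with a window at slot 0 and simulating four bottom trees' worth of slots from slot 65,536 onward, exactly four calls do work, one per 65,536 slots.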

Tests Added / Run

  • make fmt clean
  • make lint clean
  • make test passes
  • Boundary-crossing verification on devnet: the advance path only fires once every 65,536 slots, so this is deferred to operational verification. The log lines "Preparing XMSS key for slot in background" (start) and "XMSS key advance complete" (success) signal that the path fired.

Related Issues / PRs

@conache conache marked this pull request as ready for review April 30, 2026 18:32
Contributor

greptile-apps Bot commented Apr 30, 2026

Greptile Summary

This PR moves the blocking XMSS advance_preparation call off the BlockChainServer actor's tick handler into a spawn_blocking worker. The key is taken out of Option<ValidatorSecretKey>, advanced on a background thread, and sent back via a new KeyPreparedForSlot message, preventing the actor from stalling during the ~65,536-slot window advance.

  • The new test_advance_until_prepared_advances_then_detects_exhaustion test calls generate_key_with_three_bottom_trees() — documented as "slow (~minutes)" — without the #[ignore] attribute that the only other test using the same helper already carries. This will add several minutes to every make test run.

Confidence Score: 4/5

Safe to merge after addressing the missing #[ignore] on the slow test

A single P1 finding (missing test attribute that will block CI for minutes); the background-worker design itself is correct and idempotent

crates/common/types/src/signature.rs — the new test needs #[ignore]

Important Files Changed

| Filename | Overview |
| --- | --- |
| crates/common/types/src/signature.rs | Adds advance_until_prepared method; new test is missing #[ignore] despite calling the slow key-generation helper (~minutes) |
| crates/blockchain/src/key_manager.rs | Fields changed to Option<ValidatorSecretKey>, new KeyRole enum and KeyNotPreparedForSlot/KeyUnavailable error variants added; logic is clean |
| crates/blockchain/src/lib.rs | Adds prepare_key_for_slot helper and KeyPreparedForSlot message handler; background worker pattern is correct and idempotent |
| bin/ethlambda/src/main.rs | Trivial change: wraps loaded keys in Some(...) to match new Option<ValidatorSecretKey> field type |

Sequence Diagram

sequenceDiagram
    participant Tick as on_tick actor
    participant KM as KeyManager
    participant Worker as spawn_blocking
    participant Handler as KeyPreparedForSlot handler

    Tick->>KM: sign_attestation or sign_block_root
    KM-->>Tick: Err(KeyNotPreparedForSlot)
    Tick->>Tick: prepare_key_for_slot - field.take removes key
    Tick->>Worker: advance_until_prepared(target_slot)
    Note over Tick: returns immediately, key field is None

    Worker-->>Handler: KeyPreparedForSlot message

    alt advance succeeded
        Handler->>KM: restore key field with advanced key
    else key exhausted
        Handler->>Handler: emit error log and field stays None
    end
Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
crates/common/types/src/signature.rs:202-203
**New test missing `#[ignore]` despite using slow key generation**

`generate_key_with_three_bottom_trees()` is explicitly documented as "slow (~minutes) because it computes 3 bottom trees of 65,536 leaves each", and the only other test that calls it (`test_advance_preparation_duration`) carries `#[ignore = "slow: generates production-size XMSS key (~minutes)"]` for exactly this reason. The new test omits that attribute, so `make test` will now block for several minutes on every CI run.

```suggestion
    #[test]
    #[ignore = "slow: generates production-size XMSS key (~minutes)"]
    fn test_advance_until_prepared_advances_then_detects_exhaustion() {
```

Reviews (1): Last reviewed commit: "Add formatting fixes"

Comment thread crates/common/types/src/signature.rs Outdated
Collaborator

@MegaRedHand MegaRedHand left a comment


Thanks for the PR, but this approach isn't good for the following reasons:

  • it skips signing when key preparation is needed
  • it adds a new edge case of a key not being available
  • changes are too many for what it offers

I realize it might be better to instead check at specific times if we can advance validator keys preemptively (i.e. check they're prepared for current slot + 1), along with an advance until the current slot when first initializing the node. Let's try with that first, doing it right after we finish duties for the current interval, in a blocking manner, and adding timing logs for it. We can see if a more complex approach is needed later down the line.

Comment thread crates/blockchain/src/lib.rs Outdated
Contributor Author

conache commented May 18, 2026

> Thanks for the PR, but this approach isn't good for the following reasons:
>
>   • it skips signing when key preparation is needed
>   • it adds a new edge case of a key not being available
>   • changes are too many for what it offers
>
> I realize it might be better to instead check at specific times if we can advance validator keys preemptively (i.e. check they're prepared for current slot + 1), along with an advance until the current slot when first initializing the node. Let's try with that first, doing it right after we finish duties for the current interval, in a blocking manner, and adding timing logs for it. We can see if a more complex approach is needed later down the line.

@MegaRedHand thanks for the clear review! 🙏 Working on a simpler approach based on your feedback.

@conache conache changed the title from "feat(blockchain): advance XMSS preparation window in the background" to "feat(blockchain): advance XMSS preparation window preemptively" May 18, 2026
Contributor Author

conache commented May 18, 2026

Hey @MegaRedHand, I just updated the PR based on your feedback. Now we're advancing the XMSS preparation window preemptively, in a blocking manner:

  • at startup, so we can catch keys up to the current slot
  • at the end of each tick, for slot + 1, so signing on the next tick shouldn't wait

As suggested, I added timing logs for all the cases, so we can see how long the blocking work takes. Looking forward to your review!

@conache conache requested a review from MegaRedHand May 18, 2026 14:40
Development

Successfully merging this pull request may close these issues.

Advance XMSS preparation window in the background

3 participants