Add checkpoint-restore before testing mode #149
Open
kamilchodola wants to merge 16 commits into
b895a1e to 5afd879
Replace the complex 3-layer overlay approach (mount --move to ready-lower, then mount the test overlay on top) with a simpler strategy:
- After setup completes, snapshot the overlay upper dir via rsync/cp
- Before each CRIU restore, wipe upper+work and restore from the snapshot
- Removes the mount --move dependency and the extra overlay layer
- Setup files now run normally without checkpoint interference
- Checkpoint is taken after all setup (gas-bump, funding) completes
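A minimal sketch of the upper-dir snapshot/restore, assuming the overlay's upper and work dirs are plain host directories and that nothing has the overlay mounted while they are swapped (overlayfs leaves behaviour undefined if its layers change underneath a live mount). The helper names (mirror_dir, snapshot_upper, restore_upper) are hypothetical, not the ones in run.sh; it prefers rsync --delete and falls back to cp -a, mirroring the "rsync/cp" wording.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: make DST an exact copy of SRC's contents.
# rsync --delete also removes files in DST that are absent from SRC;
# the cp fallback gets the same effect by emptying DST first.
mirror_dir() {
  local src=$1 dst=$2
  mkdir -p "$dst"
  if command -v rsync >/dev/null 2>&1; then
    rsync -a --delete "$src/" "$dst/"
  else
    find "$dst" -mindepth 1 -delete   # clear DST, including dotfiles
    cp -a "$src/." "$dst/"            # copy contents, preserving attributes
  fi
}

# Run once after setup (gas-bump, funding) finishes: capture the baseline.
snapshot_upper() {
  local upper=$1 snapshot=$2
  mirror_dir "$upper" "$snapshot"
}

# Run before each CRIU restore: wipe upper+work, repopulate upper from the snapshot.
# Assumes the overlay using these dirs is NOT mounted at this point.
restore_upper() {
  local upper=$1 work=$2 snapshot=$3
  rm -rf "$upper" "$work"
  mkdir -p "$upper" "$work"
  mirror_dir "$snapshot" "$upper"
}
```

In this shape the overlay would be remounted after restore_upper returns, so every restored process sees the same post-setup filesystem state.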
ed8804c to 1887197
prepare_overlay_for_client runs in a $() subshell, so associative-array assignments made inside it (ACTIVE_OVERLAY_LOWERS) are lost in the parent shell. Emit merged|lower as a pipe-delimited pair on stdout, parse it in register_overlay_for_client, and strip the |lower suffix before using data_dir as --dataDir.
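A stripped-down reproduction of the subshell problem and the pipe-delimited fix. Only the names prepare_overlay_for_client, register_overlay_for_client and ACTIVE_OVERLAY_LOWERS come from this commit; the function bodies, lossy_prepare_overlay and the /tmp/overlays base path are hypothetical placeholders, and no overlay is actually mounted.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Requires bash >= 4 for associative arrays.
declare -A ACTIVE_OVERLAY_LOWERS=()

# The original bug: when called as $(lossy_prepare_overlay x), this write
# happens in a subshell and the parent's array never sees it.
lossy_prepare_overlay() {
  ACTIVE_OVERLAY_LOWERS["$1"]="/tmp/$1/lower"
  echo "/tmp/$1/merged"
}

# Fixed shape: report both paths on stdout as "merged|lower", touch no globals.
prepare_overlay_for_client() {
  local client=$1 base=$2
  printf '%s|%s\n' "$base/$client/merged" "$base/$client/lower"
}

# Called directly (not inside $(...)), so the array write persists.
register_overlay_for_client() {
  local client=$1 result=$2
  ACTIVE_OVERLAY_LOWERS["$client"]="${result#*|}"   # everything after the first '|'
}

result=$(prepare_overlay_for_client nethermind /tmp/overlays)
register_overlay_for_client nethermind "$result"
data_dir="${result%%|*}"   # strip the "|lower" suffix before passing --dataDir
printf '%s\n' "--dataDir=$data_dir"
```

The split assumes the merged path never contains '|'; if it did, both halves would be cut at the wrong place.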
- Store checkpoint export on tmpfs (RAM) to eliminate disk I/O
- Add --keep flag to checkpoint (keep container during dump)
- Add --print-stats to both checkpoint and restore for timing visibility
- Remove wait_for_rpc after CRIU restore (process resumes mid-execution)
- Log checkpoint tar size for diagnostics
- Add tmpfs setup/teardown lifecycle management
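A sketch of the tmpfs lifecycle and the tar-size log, assuming root access and a dedicated mount point. CKPT_TMPFS_DIR, CKPT_TMPFS_SIZE and the function names are hypothetical, as is the 50% default. tmpfs accepts either an absolute size= or a percentage of physical RAM and only consumes memory for what is actually written, so the limit mainly has to exceed the exported checkpoint.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical defaults; size= takes e.g. 16G or a percentage of RAM such as 50%.
CKPT_TMPFS_DIR=${CKPT_TMPFS_DIR:-/mnt/ckpt-tmpfs}
CKPT_TMPFS_SIZE=${CKPT_TMPFS_SIZE:-50%}

# Mount the tmpfs unless something is already mounted there (e.g. a stale run).
setup_ckpt_tmpfs() {
  mkdir -p "$CKPT_TMPFS_DIR"
  if ! mountpoint -q "$CKPT_TMPFS_DIR"; then
    mount -t tmpfs -o "size=$CKPT_TMPFS_SIZE" tmpfs "$CKPT_TMPFS_DIR"
  fi
}

# Unmount only if mounted, so it is safe to call from an EXIT trap.
teardown_ckpt_tmpfs() {
  if mountpoint -q "$CKPT_TMPFS_DIR"; then
    umount "$CKPT_TMPFS_DIR"
  fi
}

# Log the size of an exported checkpoint archive for diagnostics.
log_checkpoint_size() {
  local tar=$1 bytes
  bytes=$(wc -c < "$tar")
  bytes=${bytes//[[:space:]]/}   # BSD wc pads the number with spaces
  printf 'checkpoint archive %s: %d bytes (%d MiB)\n' "$tar" "$bytes" $((bytes / 1024 / 1024))
}
```

Typical wiring would be setup_ckpt_tmpfs before the first checkpoint, trap teardown_ckpt_tmpfs EXIT so a failed run still unmounts, and log_checkpoint_size on the archive right after each dump.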
- tmpfs 4G was too small for the Nethermind checkpoint (62GB RAM machine)
- Remove socket drain before checkpoint (not needed with MaxActivePeers=0)
- Remove podman rm after checkpoint (--keep keeps the container alive, matching benchmarkoor's approach)
Summary
- checkpoint_before_testing workflow dispatch input for Linux Docker/CRIU checkpoint-restore runs
- run.sh -C mode that checkpoints and restores gas-execution-client before each measured test
- restart_before_testing / -R
- criu in the workflow when checkpoint mode is requested, and fail clearly if Docker checkpoint support is unavailable

Notes
Docker/CRIU checkpoints snapshot the execution client process memory, not the bind-mounted data directory. This is an experimental Linux-only alternative to container restart for measuring cache/process-state effects.
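A sketch of the -C flag and the "fail clearly" guard from the summary, assuming run.sh parses its mode flags with getopts and that checking for the criu binary plus an experimental-mode Docker daemon (docker checkpoint is an experimental feature) is enough to detect support. The variable and function names are hypothetical; a stricter probe could also run criu check.

```shell
#!/usr/bin/env bash
set -euo pipefail

CHECKPOINT_BEFORE_TESTING=0
RESTART_BEFORE_TESTING=0

# Hypothetical subset of run.sh option parsing: -C = checkpoint-restore, -R = restart.
parse_mode_flags() {
  local opt OPTIND=1
  while getopts "CR" opt; do
    case $opt in
      C) CHECKPOINT_BEFORE_TESTING=1 ;;
      R) RESTART_BEFORE_TESTING=1 ;;
      *) return 2 ;;   # getopts already printed the unknown-option error
    esac
  done
}

# Fail early, with an explicit message, when the host cannot checkpoint containers.
require_checkpoint_support() {
  if ! command -v criu >/dev/null 2>&1; then
    echo "checkpoint mode requested but criu is not installed" >&2
    return 1
  fi
  if [ "$(docker info --format '{{.ExperimentalBuild}}' 2>/dev/null)" != "true" ]; then
    echo "checkpoint mode requested but the Docker daemon is not in experimental mode" >&2
    return 1
  fi
}
```

Calling require_checkpoint_support right after parsing, only when CHECKPOINT_BEFORE_TESTING=1, keeps restart mode usable on hosts without CRIU.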
Testing
- .github/workflows/repricing-nethermind.yml with PyYAML
- git diff --check
- bash -n against a CRLF-stripped stream of run.sh
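The CRLF-stripped syntax check can be reproduced with a short helper. Stripping carriage returns first matters because bash treats a trailing \r as part of the word, so under CRLF endings then\r or fi\r are not recognised as keywords and bash -n reports a syntax error. The function name is hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Parse (but do not execute) a script after removing every carriage return.
# With pipefail, a bash -n failure makes the whole pipeline fail.
check_script_syntax() {
  local script=$1
  tr -d '\r' < "$script" | bash -n
}
```

Usage: check_script_syntax run.sh && echo "run.sh parses". Note that tr removes every \r, not only line-ending ones, which is fine for a syntax-only check.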