fix(core): register RDMA-fetched blocks to MetaServer after prefetch #378
Merged
RDMA prefetch pulled blocks into the local read cache but never re-advertised them to the MetaServer. Once the original holder evicted those blocks (unregister), the MetaServer believed nobody owned them — even though the fetcher still held valid, RDMA-servable copies. After RDMA prefetch completes, the fetched block hashes are now registered to the MetaServer (fire-and-forget, same path as the normal save). SSD prefetch is skipped because those blocks were already registered by this node's own save path and eviction explicitly unregisters them. The p2p_rdma integration test now asserts that the fetching node re-registers the blocks it pulled.
feifei-111
approved these changes
Jul 1, 2026
Problem
RDMA prefetch pulled blocks into the local read cache but never re-advertised them to the MetaServer. Once the original holder evicted those blocks (unregister), the MetaServer believed nobody owned them — even though the fetcher still held valid, RDMA-servable copies in pinned RAM.
Scenario that breaks:
try_unregister → MetaServer thinks nobody owns them
Fix
After RDMA prefetch completes and blocks are inserted into the read cache, the fetched block hashes are registered to the MetaServer via the same fire-and-forget path (try_register_namespace) used by the normal save path.
SSD prefetch is intentionally skipped: those blocks were already registered by this node's own save path, and eviction explicitly unregisters them, so the SSD case is internally consistent.
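The ownership bookkeeping described above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: MockMetaServer, PrefetchSource, BlockHash, and on_prefetch_complete are hypothetical names invented for illustration, not pegaflow's actual types, and the real registration goes through the fire-and-forget try_register_namespace path rather than a synchronous call.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical block hash type; the real type is not shown in this PR.
type BlockHash = u64;

/// Where a prefetched block came from (illustrative names).
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum PrefetchSource {
    Rdma,
    Ssd,
}

/// In-memory stand-in for the MetaServer: block hash -> owning node ids.
#[derive(Default)]
struct MockMetaServer {
    owners: HashMap<BlockHash, HashSet<String>>,
}

impl MockMetaServer {
    fn register(&mut self, node: &str, hashes: &[BlockHash]) {
        for h in hashes {
            self.owners.entry(*h).or_default().insert(node.to_string());
        }
    }

    fn unregister(&mut self, node: &str, hashes: &[BlockHash]) {
        for h in hashes {
            if let Some(set) = self.owners.get_mut(h) {
                set.remove(node);
                if set.is_empty() {
                    self.owners.remove(h);
                }
            }
        }
    }

    fn has_owner(&self, hash: BlockHash) -> bool {
        self.owners.contains_key(&hash)
    }
}

/// Runs after fetched blocks have been inserted into the read cache.
/// Registers only RDMA-sourced hashes and returns the hashes it registered.
/// SSD-sourced blocks are skipped, and nothing happens without a client.
fn on_prefetch_complete(
    meta: Option<&mut MockMetaServer>,
    node: &str,
    fetched: &[(BlockHash, PrefetchSource)],
) -> Vec<BlockHash> {
    let Some(meta) = meta else {
        return Vec::new();
    };
    let rdma: Vec<BlockHash> = fetched
        .iter()
        .filter(|(_, src)| *src == PrefetchSource::Rdma)
        .map(|(hash, _)| *hash)
        .collect();
    if !rdma.is_empty() {
        meta.register(node, &rdma);
    }
    rdma
}

fn main() {
    let mut meta = MockMetaServer::default();
    meta.register("node-a", &[1, 2]); // node A saves and registers
    let fetched = [(1, PrefetchSource::Rdma), (2, PrefetchSource::Rdma)];
    on_prefetch_complete(Some(&mut meta), "node-b", &fetched); // node B prefetches over RDMA
    meta.unregister("node-a", &[1, 2]); // node A evicts and unregisters
    // With the re-registration step, node B remains a known owner.
    assert!(meta.has_owner(1) && meta.has_owner(2));
    println!("block 1 still has an owner: {}", meta.has_owner(1));
}
```

If the on_prefetch_complete call is dropped from main, the final assertion fails, which mirrors the broken scenario: the only registered owner has unregistered while the fetcher still holds the blocks.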
Changes
pegaflow-core/src/storage/prefetch.rs: PrefetchScheduler now holds an optional MetaServerClient. In poll_existing, after read_cache.batch_insert, RDMA-sourced blocks are registered to the MetaServer.
pegaflow-core/src/storage/mod.rs: pass metaserver_client into PrefetchScheduler::new.
pegaflow-server/tests/p2p_rdma.rs: added wait_for_metaserver_ownership helper and a test assertion verifying the fetching node re-registers the blocks it pulled.
Verification
cargo build -p pegaflow-core --no-default-features --features cuda-13,rdma: passes
cargo clippy -p pegaflow-core --no-default-features --features cuda-13,rdma: clean
cargo test -p pegaflow-core --no-default-features --features cuda-13,rdma -- storage::prefetch: 2/2 pass
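Because registration is fire-and-forget, a test cannot assume ownership is visible the moment prefetch returns, so a polling helper is a natural fit. The signature and body of wait_for_metaserver_ownership are not shown in this PR. The following is a minimal sketch of the general poll-until-deadline pattern, with wait_until as a hypothetical name and a counter standing in for the MetaServer query.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Polls `check` until it returns true or `timeout` elapses.
/// Returns true if the condition was observed before the deadline.
fn wait_until<F: FnMut() -> bool>(mut check: F, timeout: Duration, interval: Duration) -> bool {
    let deadline = Instant::now() + timeout;
    loop {
        if check() {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        sleep(interval);
    }
}

fn main() {
    // Stand-in for "ask the MetaServer whether this node owns the blocks":
    // the condition becomes true on the third poll.
    let mut polls = 0;
    let observed = wait_until(
        || {
            polls += 1;
            polls >= 3
        },
        Duration::from_secs(1),
        Duration::from_millis(10),
    );
    assert!(observed);
    println!("ownership observed after {} polls", polls);
}
```

In a real test the closure would query the MetaServer for the fetched block hashes, and the timeout would bound how long the asynchronous registration is allowed to take.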