fix(core): load blocks sealed by split-KV peers into contiguous device layouts #382
Open · xiaguan wants to merge 3 commits into
Newer clippy flags the 8-argument binding; the signature mirrors the NIXL descriptor-index read API 1:1, so collapsing it into a struct would only obscure the FFI contract.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
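For context on the lint: Clippy's `too_many_arguments` fires when a function takes more than 7 parameters (the default `too-many-arguments-threshold`), and a function-level `allow` silences it without restructuring the signature. A minimal sketch, assuming a hypothetical `read_async_indices_sketch` whose parameter names are illustrative only and do not reproduce the actual NIXL or `rdma_v1` signature:

```rust
/// Hypothetical binding that forwards eight parameters one-to-one to a C-style
/// read call. Clippy's `too_many_arguments` lint fires above 7 parameters by
/// default, so the attribute keeps `cargo clippy -- -D warnings` passing.
#[allow(clippy::too_many_arguments)]
pub fn read_async_indices_sketch(
    agent: u32,
    local_desc: u64,
    local_indices: &[u32],
    remote_desc: u64,
    remote_indices: &[u32],
    remote_agent: &str,
    notify_msg: Option<&str>,
    backend: u32,
) -> Result<usize, String> {
    // This sketch assumes local[i] is paired with remote[i], so the lists must match.
    if local_indices.len() != remote_indices.len() {
        return Err(format!(
            "index count mismatch: {} local vs {} remote",
            local_indices.len(),
            remote_indices.len()
        ));
    }
    // A real binding would pass every argument through to the FFI call here;
    // the sketch only touches them to stay warning-free.
    let _ = (agent, local_desc, remote_desc, remote_agent, notify_msg, backend);
    Ok(local_indices.len())
}
```

Keeping the flat parameter list means each argument lines up with its FFI counterpart, which is the trade-off the commit message argues for over a wrapper struct.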
…e layouts

build_copy_descs paired a Contiguous device block with only the host block's segment 0 for the full span. A block sealed by a peer with a split-KV registration (e.g. the vLLM connector's KV-first layout) stores K and V as two separate host segments, so a contiguous-layout instance loading it read past the K allocation and restored garbage V data.

Mirror the Split branch's existing contiguous-host fallback: when the host block carries two segments, emit one copy per segment targeting each half of the device span, and reject blocks whose segments do not exactly span the device block instead of copying misaligned bytes.

This is what lets an openinfer decode instance (per-layer fused [K|V] pages) restore blocks prefilled and sealed by a vLLM prefill instance (per-layer split K/V) in P/D disaggregation.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
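The Split branch's contiguous-host fallback that this commit mirrors can be sketched as below. `Segment` and `split_host_sources` are hypothetical stand-ins, not the crate's real types; the sketch only shows the pointer arithmetic `v_ptr = k_ptr + k.bytes` used when the host block is a single allocation.

```rust
/// Hypothetical stand-in for one host allocation backing part of a block.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Segment {
    pub ptr: u64,
    pub bytes: u64,
}

/// Resolves the host source pointers (K, V) when loading into a split device
/// layout whose K half is `k_bytes` long.
pub fn split_host_sources(host: &[Segment], k_bytes: u64) -> Option<(u64, u64)> {
    match host {
        // Split-KV host: K and V already live in separate allocations.
        [k, v] => Some((k.ptr, v.ptr)),
        // Contiguous host: V starts right after K (v_ptr = k_ptr + k.bytes).
        [kv] => Some((kv.ptr, kv.ptr + k_bytes)),
        // Any other segment count cannot be mapped onto K/V halves.
        _ => None,
    }
}
```

The Contiguous-branch change in this commit is the inverse mapping: two host segments gathered into one device span.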
…pping

Add regression tests for build_copy_descs' Contiguous branch when the host block carries two segments (split-KV peer): verify that two copies are emitted with the correct device addresses and sizes, and that mismatched segment spans are rejected as incompatible KV layouts instead of copying misaligned bytes.

Also clarifies the rejection error message: 'N segments (k+v bytes) but contiguous device block is M bytes' reads less ambiguously than the prior 'N x k+v bytes' phrasing.
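A minimal sketch of how the clarified message could be rendered, assuming a hypothetical `IncompatibleKvLayout` error type; the real error type and its exact prefix are not shown in this PR.

```rust
use std::fmt;

/// Hypothetical error for a host block whose segments do not span the device block.
#[derive(Debug)]
pub struct IncompatibleKvLayout {
    /// Size of each host segment, in order (K then V for a split-KV peer).
    pub segment_bytes: Vec<u64>,
    /// Size of the contiguous device block being filled.
    pub device_bytes: u64,
}

impl fmt::Display for IncompatibleKvLayout {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Render per-segment sizes as "k+v" so the count and sizes read separately.
        let parts: Vec<String> = self.segment_bytes.iter().map(|b| b.to_string()).collect();
        write!(
            f,
            "incompatible KV layouts: {} segments ({} bytes) but contiguous device block is {} bytes",
            self.segment_bytes.len(),
            parts.join("+"),
            self.device_bytes
        )
    }
}
```

With two 4096-byte segments and a 4096-byte device block, this yields "2 segments (4096+4096 bytes) but contiguous device block is 4096 bytes", where the old "2 x 4096+4096" form could be misread as a multiplication.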
Problem
`build_copy_descs` pairs a `Contiguous` device block with only the host block's segment 0 for the full span. A block sealed by a peer with a split-KV registration (e.g. the vLLM connector's KV-first `(2, num_blocks, ...)` layout) stores K and V as two separate host segments, so a contiguous-layout instance loading it reads past the K allocation and silently restores garbage V data.

This is the missing half of an existing asymmetry: the `Split` branch already falls back gracefully when the host block is contiguous (`v_ptr = k_ptr + k.bytes`), but the `Contiguous` branch had no handling for a split host block.

Fix
When the host block carries two segments, emit one copy per segment targeting each half of the device span. Blocks whose segments don't exactly span the device block are rejected with an explicit error (same "incompatible KV layouts" family as the existing slot-count guard) instead of copying misaligned bytes.
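The per-segment mapping described above, as a self-contained sketch. `Segment`, `CopyDesc`, and `contiguous_copy_descs` are hypothetical names rather than the actual `build_copy_descs` code; the sketch assumes host segments are ordered K then V, and that the single-segment path keeps the previous full-span copy.

```rust
/// Hypothetical stand-in for one host allocation backing part of a block.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Segment {
    pub ptr: u64,
    pub bytes: u64,
}

/// Hypothetical copy descriptor: move `bytes` from host `src` to device `dst`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct CopyDesc {
    pub src: u64,
    pub dst: u64,
    pub bytes: u64,
}

/// Builds copies into one contiguous [K|V] device block of `dev_bytes` at `dev_ptr`.
/// `host` holds one segment (contiguous peer) or two (split-KV peer, K then V).
pub fn contiguous_copy_descs(
    dev_ptr: u64,
    dev_bytes: u64,
    host: &[Segment],
) -> Result<Vec<CopyDesc>, String> {
    match host {
        // Contiguous peer: a single copy over the full device span (prior behavior).
        [kv] => Ok(vec![CopyDesc { src: kv.ptr, dst: dev_ptr, bytes: dev_bytes }]),
        // Split-KV peer: K fills the front of the device span, V the remainder.
        [k, v] => {
            // Reject instead of copying misaligned bytes; checked_add also guards overflow.
            if k.bytes.checked_add(v.bytes) != Some(dev_bytes) {
                return Err(format!(
                    "incompatible KV layouts: 2 segments ({}+{} bytes) but contiguous device block is {} bytes",
                    k.bytes, v.bytes, dev_bytes
                ));
            }
            Ok(vec![
                CopyDesc { src: k.ptr, dst: dev_ptr, bytes: k.bytes },
                CopyDesc { src: v.ptr, dst: dev_ptr + k.bytes, bytes: v.bytes },
            ])
        }
        other => Err(format!(
            "incompatible KV layouts: unsupported host segment count {}",
            other.len()
        )),
    }
}
```

Checking the span before emitting any descriptor is what turns the old silent over-read of the K allocation into an explicit error.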
Also carries a one-line `#[allow(clippy::too_many_arguments)]` on `rdma_v1::read_async_indices`: newer clippy fails the pre-commit hook on master, and the signature mirrors the NIXL FFI 1:1.

Validation
End-to-end P/D disaggregation on an 8-GPU H200 node (jz node 34):

- Prefill (vLLM, …): `--block-size 16`, per-layer split-KV registration, 36 slots
- Decode (openinfer, …): per-layer fused `[K|V]` pages, contiguous registration, 36 slots
- … `temperature=0`: all outputs byte-identical to D's local-prefill baseline.

Without this fix, the same setup restores corrupted V segments.
🤖 Generated with Claude Code