Skip to content

Fast forward with closed source changes and reconcile NvmExpressDxe divergence#69

Merged
maheeraeron merged 5 commits intomicrosoft:mainfrom
maheeraeron:user/maheeraeron/nvme-fix
Apr 28, 2026
Merged

Fast forward with closed source changes and reconcile NvmExpressDxe divergence#69
maheeraeron merged 5 commits intomicrosoft:mainfrom
maheeraeron:user/maheeraeron/nvme-fix

Conversation

@maheeraeron
Copy link
Copy Markdown
Collaborator

@maheeraeron maheeraeron commented Apr 28, 2026

NvmExpressDxe diverged significantly from closed source due to missing changes that never made it here. That fix has gone in.

Afterwards, cherry pick the remaining closed source changes on top.

@maheeraeron maheeraeron requested a review from mebersol April 28, 2026 20:07
jennagoddard and others added 4 commits April 28, 2026 13:10
Addresses a few issues:
1. Incorrect buffer is passed to NvmExpressMakeAddressRangeShared, resulting in the admin buffer being made host visible twice and the IO buffer remaining private.
2. Missing error handling results in a hypercall to make a buffer private that was never shared if NvmExpressMakeAddressRangeShared fails x 2
3. On Map failures in NvmExpressPassThru transfer bounce buffer is leaked
4. On Map failure in NvmeCreatePrpList Prp list is not made private before freeing buffer

----
#### AI description  (iteration 1)
#### PR Classification
Bug fix to prevent NVMe DXE crashes in isolated/TDX environments by correcting cleanup paths and memory visibility handling.

#### PR Summary
Improves error handling and resource cleanup in NVMe DXE initialization and pass-through flows on Turin/TDX, preventing leaks and invalid buffer usage during isolated memory mapping failures.
- `MsvmPkg/NvmExpressDxe/NvmExpressHci.c`: Free Admin/IO queue buffers, null pointers, and log/assert when `NvmExpressMakeAddressRangeShared()` fails in isolated mode.
- `MsvmPkg/NvmExpressDxe/NvmExpressPassthru.c`: Convert early returns on DMA mapping failures to unified `EXIT` cleanup, add detailed DEBUG logs, and fix PRP list allocation/share/map error handling to free buffers and clear pointers safely.
- `MsvmPkg/NvmExpressDxe/NvmExpress.c`: Minor cleanup/consistency in queue buffer `FreeBuffer` calls during driver queue teardown.
<!-- GitOpsUserAgent=GitOps.Apps.Server.pullrequestcopilot -->

Related work items: #61838911
…th uninitialized handle

Vmbus DXE calls MakeAddressRangeNotHostVisible with uninitialized protection handle for confidential channels as the channel was never shared with the host.

Related work items: #61859786
…rotection object

Fix memory leak of protection object

----
#### AI description  (iteration 1)
#### PR Classification
Bug fix and API change to prevent leaked `EFI_HV_PROTECTION_OBJECT` instances when reverting host-visibility protections.

#### PR Summary
Fixes a protection-handle lifetime leak by updating `MakeAddressRangeNotHostVisible` to free and null the handle, and propagates the new pointer-based API across all callers.
- `MsvmPkg/EfiHvDxe/EfiHv.c`: change `EfiHvMakeAddressRangeNotHostVisible` to take `EFI_HV_PROTECTION_HANDLE*`, validate non-NULL (fail-fast), use the underlying protection object for list removal/page params, then `FreePool` and set handle to `NULL`; add `PageCount==0` validation in `EfiHvpModifySparseGpaPageHostVisibility`; update disconnect cleanup callsite.
- `MsvmPkg/Include/Protocol/EfiHv.h`: update protocol function typedef to accept `IN OUT EFI_HV_PROTECTION_HANDLE *` to support freeing and nulling.
- `MsvmPkg/EventLogDxe/EventLogger.c`: update `MakeAddressRangeNotHostVisible` call to pass handle by address.
- `MsvmPkg/EmclDxe/Emcl.c`: update `MakeAddressRangeNotHostVisible` call to pass `Block->ProtectionHandle` by address.
- `MsvmPkg/NvmExpressDxe/NvmExpressBounce.c` and `MsvmPkg/VmbusDxe/VmbusChannel.c`: update all `MakeAddressRangeNotHostVisible` callers to pass protection handles by address.
<!-- GitOpsUserAgent=GitOps.Apps.Server.pullrequestcopilot -->

Related work items: #61856792
…d physical address width

**Why is this change being made?**

Bug #48431797: On ARM64, `ConfigureMmu()` in PEI fails when the host provides MMIO ranges that extend beyond the CPU's physical address width (e.g. 36-bit PA = 64GB). The MMU setup attempts to create page table entries for unmappable addresses, causing silent corruption or crashes.

**What changed?**

Added explicit validation in `ConfigureMmu()` that the high MMIO gap (base + size) fits within `MaxAddress`, using safe arithmetic (`SafeUint64Add`). If the MMIO range overflows or exceeds the physical address limit, the firmware now calls `FAIL_FAST_UNEXPECTED_HOST_BEHAVIOR()` with a clear error message indicating why.

The existing region 4 underflow check was also upgraded from `ASSERT + return EFI_INVALID_PARAMETER` (which callers ignored) to `FAIL_FAST_UNEXPECTED_HOST_BEHAVIOR()`.

**How was the change tested?**

Build verification (no compilation errors). Runtime testing needed on ARM64 VM with 36-bit PA width.

----
#### AI description  (iteration 1)
#### PR Classification
Bug fix to prevent system crashes by adding validation to fail fast when MMIO (Memory-Mapped I/O) ranges exceed the CPU's physical address width on AARCH64 architecture.

#### PR Summary
Adds early validation in the MMU configuration to detect when host-provided MMIO ranges extend beyond the CPU's physical address capabilities, replacing silent failures with explicit fail-fast behavior.

- `Mmu.c`: Added validation block to check if high MMIO range causes overflow or exceeds `MaxAddress` using `SafeUint64Add`, triggering `FAIL_FAST_UNEXPECTED_HOST_BEHAVIOR()` on error
- `Mmu.c`: Replaced `ASSERT(FALSE)` and `EFI_INVALID_PARAMETER` return with `FAIL_FAST_UNEXPECTED_HOST_BEHAVIOR()` for the underflow validation case
- `Mmu.c`: Added `CrashLib.h` include to support the fail-fast macro
<!-- GitOpsUserAgent=GitOps.Apps.Server.pullrequestcopilot -->

Related work items: #48431797
@maheeraeron maheeraeron changed the title Match NvmExpressDxe with closed source Fast forward with closed source changes and reconcile NvmExpressDxe divergence Apr 28, 2026
##

#Override : 00000002 | MdeModulePkg/Bus/Pci/NvmExpressDxe/NvmExpressDxe.inf | 7e0eadc2c658b4d73335aba27f4a1c0c | 2025-06-11T22-02-17 | 3568dd82badd2e9c331f43c30f119f6c6908497a
#Override : 00000002 | MdeModulePkg/Bus/Pci/NvmExpressDxe/NvmExpressDxe.inf | 7e0eadc2c658b4d73335aba27f4a1c0c | 2026-02-27T22-48-59 | 187a160e565265c7e9bfdcf157beebef9f2287b9
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

187a160e565265c7e9bfdcf157beebef9f2287b9

correct for OSS submodule?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, discussed offline

it matches closed source and builds, but i figure that's because closed source's override hash points to a MU_BASECORE commit that also lives in MU_BASECORE's github

Copy link
Copy Markdown
Collaborator

@mebersol mebersol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@maheeraeron maheeraeron merged commit 2907a9a into microsoft:main Apr 28, 2026
8 checks passed
@maheeraeron maheeraeron deleted the user/maheeraeron/nvme-fix branch April 28, 2026 20:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants