Skip to content

[AMDGPU] comgr: Hotswap dispatcher vtable + .def registry, fix Windows support#2491

Merged
lamb-j merged 3 commits into
ROCm:amd-stagingfrom
xintin:comgr-windows-hotswap-vtable
May 13, 2026
Merged

[AMDGPU] comgr: Hotswap dispatcher vtable + .def registry, fix Windows support#2491
lamb-j merged 3 commits into
ROCm:amd-stagingfrom
xintin:comgr-windows-hotswap-vtable

Conversation

@xintin
Copy link
Copy Markdown

@xintin xintin commented May 11, 2026

Requires: to run CI in order to test the fix.

Summary

Replaces the LLVM_ATTRIBUTE_WEAK + #if !defined(_MSC_VER) patch dispatch pattern with a function-pointer vtable driven by a central comgr-hotswap-patches.def registry.

Addresses issue #2479: on Windows, LLVM_ATTRIBUTE_WEAK expands to nothing under MSVC, so the no-op stubs in comgr-hotswap-b0a0.cpp became regular definitions and either (a) collided with the strong overrides as LNK2005 duplicate symbols, or (b) silently won every dispatch when the patch .cpp files were guarded out with #if !defined(_MSC_VER), disabling hotswap entirely while still reporting build success.

The new dispatch contract is:
one slot on HotswapPatchVTable per patch family;
one alpha-ordered line in comgr-hotswap-patches.def per merged patch; one register*Patch(HotswapPatchVTable&) exported from each comgr-hotswap-patch-*.cpp;
installHotswapPatches() walks the .def to bind every slot;
retargetCodeObjectB0A0 drives the install once via std::call_once;
a missing registrar is a libamd_comgr / HotswapMCTests link error, not a silent feature disable. applyWmmaSplitPatches and applyScratchPatches keep nullptr defaults because no patch file ships them yet.
The dispatcher treats nullptr as a no-op slot, preserving the previous "weak stub returns 0" behaviour until those families land their first strong override.

Changes

  • comgr-hotswap-patches.def (new): X-macro registry with one HOTSWAP_PATCH(Name) line per merged patch (InPlace, Trampoline, Vop3px2Src2, WmmaHazard).
  • comgr-hotswap-internal.h: Add HotswapPatchVTable, getHotswapPatchVTable(), installHotswapPatches(). Forward-declare every register*Patch from the same .def so internal.h is the single prototype source for both libamd_comgr and the unit tests. Remove the per-pass applyXxx external declarations -- impls are now file-local in their patch .cpp.
  • comgr-hotswap-b0a0.cpp: Delete the six weak apply* stubs and the forward-declaration block. Add the singleton accessor and the .def-driven installHotswapPatches. Route the dispatcher through the vtable via a small runPerInstPass helper that treats nullptr slots as 0-return. Install once per process from retargetCodeObjectB0A0 under std::call_once. Liveness / DWARF stubs that still have no strong override anywhere in tree keep the LLVM_ATTRIBUTE_WEAK pattern -- they migrate the same way the first time a real implementation lands.
  • comgr-hotswap-patch-{inplace,trampoline,wmma-hazard,vop3px2-src2}.cpp: Drop the #if !defined(_MSC_VER) guard and its explanatory comment. Rename the dispatcher entry point applyXxx to a file-static applyXxxImpl. Add a file-scope registerXxxPatch(HotswapPatchVTable&) that binds the impl into the vtable. Test-facing helpers (classifyWmmaNops, patchScaleSrc2) now compile unconditionally, fixing the LNK2019 failures in HotswapMCTests on MSVC.

Tests

Link errors catch .def entries without matching register*Patch definitions. The tests cover the runtime binding behavior the linker cannot validate.

  • HotswapMCTest.cpp: 3 new TEST(HotswapPatchVTable, ...) blocks.
    • RegisterInPlaceBindsOnlyInPlaceSlot -- canonical worked example that pins the "binds only its own slot" invariant for one installer. Wrong-slot bugs in the other three register*Patch functions are caught transitively by the install end-to-end test below.
    • InstallBindsRegisteredAndLeavesUnregisteredNull -- end-to-end install: defaults null, every .def entry gets bound, slots without a .def entry (wmma-split, scratch) stay null. Pins the dispatcher's no-op contract for unimplemented pass families.
    • ProcessSingletonIdentityAndInstallPersists -- getHotswapPatchVTable() is a real Meyers singleton and bindings persist on it; the retargetCodeObjectB0A0 install path under std::call_once relies on this.
  • test-unit/CMakeLists.txt: Pull comgr-hotswap-b0a0.cpp, comgr-hotswap-patch-inplace.cpp, and comgr-hotswap-patch-trampoline.cpp into HotswapMCTests (the other two patch sources were already there) so installHotswapPatches and every register*Patch resolve at link time. A patches.def line without a matching definition produces a HotswapMCTests link error, which is itself a regression guard for the new contract.
  • All existing LIT and gtest coverage continues to apply unchanged. The dispatcher's observable Linux behaviour is preserved (same call order, same first-non-zero-wins per-instruction semantics, same whole-kernel passes after the inner loop).

Follow-up (deferred)

  • In-flight patches that still ride on the weak-symbol pattern (e.g. ds_2addr_stride64 per #2379, need a one-line .def add and a register*Patch shim when they rebase.
  • Migrate the remaining LLVM_ATTRIBUTE_WEAK liveness / DWARF stubs (buildCfg, computeLiveness, getInstRegDefUse, getBranchImm, verifyPatchCorrectness, addTrampolineSymbols, patchDebug*) onto the same .def contract the first time any of them grows a real implement.

@xintin xintin requested review from chinmaydd and lamb-j as code owners May 11, 2026 23:50
@xintin xintin added comgr Related to Code Object Manager hotswap Related to the Comgr Hotswap feature labels May 11, 2026
@xintin xintin requested review from harsh-amd and martin-luecke May 12, 2026 06:45
Comment thread amd/comgr/src/comgr-hotswap-b0a0.cpp Outdated
Comment thread amd/comgr/src/comgr-hotswap-b0a0.cpp Outdated
Comment thread amd/comgr/src/comgr-hotswap-b0a0.cpp Outdated
Comment thread amd/comgr/src/comgr-hotswap-patches.def Outdated
Copy link
Copy Markdown

@harsh-amd harsh-amd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, after addressing comments

Comment thread amd/comgr/src/comgr-hotswap-b0a0.cpp Outdated
Comment thread amd/comgr/AGENT_CONVENTIONS.md Outdated
xintin added 2 commits May 12, 2026 18:42
Signed-off-by: xintin <gaurav.verma@amd.com>
Signed-off-by: xintin <gaurav.verma@amd.com>
@xintin
Copy link
Copy Markdown
Author

xintin commented May 12, 2026

PR #2379 landed in amd-staging and uses the old weak-symbol + #if !defined(_MSC_VER) pattern.
Including that also in this PR.

@xintin xintin force-pushed the comgr-windows-hotswap-vtable branch from 2c0862e to 28af859 Compare May 12, 2026 18:55
Signed-off-by: xintin <gaurav.verma@amd.com>
@xintin xintin force-pushed the comgr-windows-hotswap-vtable branch from 28af859 to 657b312 Compare May 12, 2026 19:55
Copy link
Copy Markdown

@chinmaydd chinmaydd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@chinmaydd
Copy link
Copy Markdown

chinmaydd commented May 12, 2026

@xintin when merging, could you shorten down the commit message to a minimal description of the PR ?

Jacob and me have been discussing about this and will share a directive soon but we want to move towards commit messages that reflect only what the goal of the PR is.

EDIT: I guess directive is the wrong word, but we want to discuss this more broadly.

@lamb-j lamb-j merged commit dc69325 into ROCm:amd-staging May 13, 2026
52 checks passed
lamb-j added a commit to ROCm/SPIRV-LLVM-Translator that referenced this pull request May 13, 2026
ROCm/llvm-project#2491 (merged to amd-staging) fixes the
HotswapMCTests link failure on Windows that #2479 tracked. The SED
that stripped add_subdirectory(test-unit) is no longer needed —
test-unit builds and runs cleanly. check-comgr on Windows now
exercises lit + ctest + gtest, matching the Linux variant.
lamb-j added a commit to ROCm/SPIRV-LLVM-Translator that referenced this pull request May 13, 2026
* [CI] Add Windows variant (Build + 3 test jobs)

Adds Windows::release dispatcher job + spirv-ci-windows.yml — full
parity with the Linux variant: Build + Test SPIRV translator lit (with
baseline-diff + gate) + Test LLVM SPIRV codegen + Test Comgr.

Mirrors TheRock's Windows CI conventions:
  - Runner: azure-windows-scale-rocm
  - No container (Windows runs natively)
  - MSVC env via ilammy/msvc-dev-cmd action
  - Pinned ninja 1.12.1 via Chocolatey (1.13.0 has a known bug)
  - bash shell defaults
  - git long paths enabled
  - No strip step (PE/COFF debug info lives in separate .pdb)

Translator-lit uses a separate sticky-comment marker
(spirv-ci:translator-lit-windows) so it doesn't fight with the Linux
variant's comment on the same PR.

After this lands the PR rollup gets 4 more checks:
  SPIRV Compiler CI / Windows::release / Build
  SPIRV Compiler CI / Windows::release / Test SPIRV translator lit
  SPIRV Compiler CI / Windows::release / Test LLVM SPIRV codegen
  SPIRV Compiler CI / Windows::release / Test Comgr

Wall-time risk: Windows build is typically 25-45 min vs ~14 min on
Linux. Total wall-clock for a PR roughly 2x.

Test-side risk: Windows lit + Comgr fixture generation are unproven
on this runner. If individual test jobs flake, can break them out
into follow-up PRs.

Companion: ROCm/llvm-project copy of this workflow doesn't get the
Windows variant in this PR — defer until Windows is proven on the
translator side.

* [CI] Touch build.ninja after untar (Windows variant)

Same fix as #197 for the Linux variant.
tar -m sets per-file mtimes from sequential extraction order; build.ninja
ends up older than CMakeCache.txt, triggering ninja's regen rule and
cascade rebuild. Touch build.ninja explicitly to make it the newest
file in the tree.

* [CI debug] Locate AMDDeviceLibsConfig.cmake on Windows

Comgr's find_package(AMDDeviceLibs) fails on Windows. Add a one-shot
debug step before the Comgr configure to print:
  - $PWD (validates path expansion in MSVC bash)
  - file path of AMDDeviceLibs*.cmake under build-device-libs
  - any cmake/ subdirs under build-device-libs

Once we see where the file actually lands we can pin AMDDeviceLibs_DIR
explicitly. Revert this debug after the targeted fix.

* [CI] Use pwd -W on Windows for cmake -D paths

Comgr's find_package(AMDDeviceLibs) failed on Windows even though the
config file was at the expected build-device-libs/lib/cmake/AMDDeviceLibs/
location. Cause: $PWD in MSYS bash returns a Unix-style path
(/c/home/runner/...) that cmake's find_package can't traverse on Windows.

Switch device-libs and Comgr configures to compute PWD_WIN=$(pwd -W)
which returns Windows-style with forward slashes (C:/home/runner/...)
that cmake handles natively.

Also pin AMDDeviceLibs_DIR explicitly as belt-and-suspenders.

Drop the debug step.

* [CI] Skip Comgr gtests on Windows (lit + ctest only)

Comgr's HotswapMCTests gtest binary fails to link on Windows because
the hotswap apply* path is fully MSVC-guarded out (see
ROCm/llvm-project#2479). Replace `ninja check-comgr` with
`ninja test test-lit` on Windows so we run the lit suite and ctest
without dragging in the gtest aggregator (test-unit).

Linux variant unchanged — still runs full check-comgr including the
gtest binaries.

* [CI] Skip Comgr test-unit subdirectory on Windows

Drop test-unit (gtest binaries) at configure time so check-comgr's lit
and ctest layers still work on Windows. HotswapMCTests fails to link
because of the MSVC-guarded hotswap apply* path
(ROCm/llvm-project#2479); rather than special-casing the test step,
sed out add_subdirectory(test-unit) from Comgr's CMakeLists.txt before
configuring.

This is a workflow-side hack — when the upstream Comgr issue is
resolved (or Comgr exposes a CMake option to gate test-unit), drop
the sed and revert to plain check-comgr.

* [CI] Enable zstd via vcpkg for clang-offload-bundler (Windows)

Two of the Comgr lit failures on Windows (unbundle, cache) were hitting
"Compression not supported" from clang-offload-bundler — LLVM was built
without zstd or zlib. The manylinux container ships zstd dev libs on
Linux; on Windows we install via vcpkg (already present at $VCPKG_ROOT)
and pass CMAKE_TOOLCHAIN_FILE + LLVM_ENABLE_ZSTD=FORCE_ON so the
configure fails loudly if zstd isn't located.

* [CI] Use $VCPKG_ROOT vcpkg.exe to install zstd (Windows)

The runner has two vcpkg installs: C:\vcpkg (first in PATH) and the
MSVC-bundled one at $VCPKG_ROOT. Plain `vcpkg install` resolved to
C:\vcpkg, but Configure LLVM's CMAKE_TOOLCHAIN_FILE points at the
MSVC-bundled vcpkg.cmake, so find_package(zstd) couldn't see the
package. Invoke $VCPKG_ROOT\vcpkg.exe directly so install and lookup
share the same root.

* [CI] Switch to C:\vcpkg for classic-mode install (Windows)

The MSVC-bundled vcpkg at $VCPKG_ROOT is manifest-mode only — `vcpkg
install <pkg>` errors out with "this distribution does not have a
classic mode instance". The other vcpkg on the runner (C:\vcpkg, what
plain `vcpkg` resolves to via PATH) supports classic mode. Point both
the install command and CMAKE_TOOLCHAIN_FILE at C:\vcpkg so the
package zstd installs and the LLVM cmake find_package locate it.

* [CI] Pass vcpkg toolchain to device-libs and Comgr (Windows)

LLVM built with zstd exports LLVMSupport linking against
zstd::libzstd_shared. Downstream find_package(LLVM) callers in
device-libs and Comgr need CMAKE_TOOLCHAIN_FILE pointing at the same
vcpkg.cmake so the imported zstd target is defined when the export
file is processed; otherwise cmake errors with "target was not found".

* [CI] Install zstd on Windows test runners (tactical)

build.ninja embeds C:/vcpkg/installed/.../zstd.lib as a build-edge
input that ninja validates at planning time. Test jobs running
ninja against the downloaded artifact need the same vcpkg state as
the build runner. Install zstd in each Windows test job to match.

This pattern doesn't scale — every new build-time system dep
requires updates to N test jobs. Followup planned to drop ninja
from test jobs in favor of llvm-lit-direct invocation, which
removes the test-time dependency on build-time system libs.

* [CI] Drop SED workaround for Comgr test-unit on Windows

ROCm/llvm-project#2491 (merged to amd-staging) fixes the
HotswapMCTests link failure on Windows that #2479 tracked. The SED
that stripped add_subdirectory(test-unit) is no longer needed —
test-unit builds and runs cleanly. check-comgr on Windows now
exercises lit + ctest + gtest, matching the Linux variant.

* [CI] Pin test jobs to Build job's resolved llvm-project SHA

Build and test jobs both used \`ref: amd-staging\` for llvm-project
checkout. If amd-staging advances mid-run (e.g., upstream merge lands
between Build and Test), test jobs land on a newer commit than what
Build produced. ninja then sees source > object, decides to rebuild,
and uses the OLD compile command from build.ninja — which can lack
include paths added by the newer commit (recently hit when libc/shared
became a Support dependency for APFloat).

Build job now exports the resolved SHA as a job output. Test jobs
check out llvm-project at that exact SHA via needs.build.outputs.
Both Linux and Windows variants.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comgr Related to Code Object Manager hotswap Related to the Comgr Hotswap feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants