[AMDGPU] comgr: Fix ELF trampoline growth for clang/lld-produced HSACOs#2331
Merged
Conversation
Signed-off-by: xintin <gaurav.verma@amd.com>
Collaborator
chinmaydd
reviewed
Apr 27, 2026
chinmaydd
reviewed
Apr 27, 2026
chinmaydd
reviewed
Apr 27, 2026
66d1eed to
746c6d8
Compare
Collaborator
746c6d8 to
983348c
Compare
Collaborator
Signed-off-by: xintin <gaurav.verma@amd.com>
983348c to
895cfb0
Compare
Collaborator
chinmaydd
reviewed
Apr 27, 2026
chinmaydd
reviewed
Apr 27, 2026
chinmaydd
reviewed
Apr 28, 2026
chinmaydd
reviewed
Apr 28, 2026
Signed-off-by: xintin <gaurav.verma@amd.com>
Author
|
Thanks, @chinmaydd |
Collaborator
lamb-j
pushed a commit
that referenced
this pull request
Apr 30, 2026
~Stacked on #2203.~ Merged! ~picked from #2226 Closed. ~Depends on #2331 (cherry-picked for now).~ Merged. ## Summary Hotswap patch for the GFX1250 A0 WMMA/SWMMAC co-execution hazard. Detects WMMA/SWMMAC instructions that lack sufficient v_nop separation before the first overlapping co-executable VALU on A0 silicon, and inserts the required v_nop padding via trampoline. On A0, certain WMMA forms need more v_nop slots than B0 before an overlapping VALU can safely co-execute. The patch scans the decoded instruction stream, classifies each WMMA by its A0/B0 nop requirement, counts existing safe slots (v_nop, non-overlapping VALU, non-terminating SALU), and emits a trampoline with the deficit v_nops before the hazardous VALU. ## Changes - **comgr-hotswap-patch-wmma-hazard.cpp**: Strong-symbol override for applyWmmaHazardPatch. Classifies WMMA/SWMMAC via MCInstrDesc::TSFlags, detects VGPR overlap via MCRegisterInfo::regsOverlap, emits v_nop padding via buildTrampoline. Includes TSFlags sync comment, classifyWmmaNops ordering documentation, printInst round-trip documentation, duplicate VALU guard (DenseSet), buildTrampoline result check, dead variable removal. - **comgr-hotswap-internal.h**: Add WmmaNopReq struct and classifyWmmaNops declaration. - **comgr-hotswap-elf.cpp**: Remove duplicate refuse check from cherry-pick merge of #2331. - **hotswap-rewrite.c**: Remove output-size check from --output path (trampoline patches grow the ELF). - **CMakeLists.txt**: Add WMMA patch source to library build. - **hotswap-wmma-hazard.s**: Lit test: v_wmma_i32_16x16x64_iu8 followed by overlapping v_add_f32, verifies v_nop padding and idempotency. - **HotswapMCTest.cpp**: 10 new classifyWmmaNops unit tests covering all WMMA variants and ordering. - **test-unit/CMakeLists.txt**: Link WMMA patch source into HotswapMCTests. ### Test plan - lit, ctest, and unit test passes - ASan lit tests: no memory errors --------- Signed-off-by: xintin <gaurav.verma@amd.com> Co-authored-by: Sanket Pandit <sanket.pandit@amd.com>
kirthana14m
pushed a commit
that referenced
this pull request
Apr 30, 2026
~Stacked on #2203.~ Merged! ~picked from #2226 Closed. ~Depends on #2331 (cherry-picked for now).~ Merged. ## Summary Hotswap patch for the GFX1250 A0 WMMA/SWMMAC co-execution hazard. Detects WMMA/SWMMAC instructions that lack sufficient v_nop separation before the first overlapping co-executable VALU on A0 silicon, and inserts the required v_nop padding via trampoline. On A0, certain WMMA forms need more v_nop slots than B0 before an overlapping VALU can safely co-execute. The patch scans the decoded instruction stream, classifies each WMMA by its A0/B0 nop requirement, counts existing safe slots (v_nop, non-overlapping VALU, non-terminating SALU), and emits a trampoline with the deficit v_nops before the hazardous VALU. ## Changes - **comgr-hotswap-patch-wmma-hazard.cpp**: Strong-symbol override for applyWmmaHazardPatch. Classifies WMMA/SWMMAC via MCInstrDesc::TSFlags, detects VGPR overlap via MCRegisterInfo::regsOverlap, emits v_nop padding via buildTrampoline. Includes TSFlags sync comment, classifyWmmaNops ordering documentation, printInst round-trip documentation, duplicate VALU guard (DenseSet), buildTrampoline result check, dead variable removal. - **comgr-hotswap-internal.h**: Add WmmaNopReq struct and classifyWmmaNops declaration. - **comgr-hotswap-elf.cpp**: Remove duplicate refuse check from cherry-pick merge of #2331. - **hotswap-rewrite.c**: Remove output-size check from --output path (trampoline patches grow the ELF). - **CMakeLists.txt**: Add WMMA patch source to library build. - **hotswap-wmma-hazard.s**: Lit test: v_wmma_i32_16x16x64_iu8 followed by overlapping v_add_f32, verifies v_nop padding and idempotency. - **HotswapMCTest.cpp**: 10 new classifyWmmaNops unit tests covering all WMMA variants and ordering. - **test-unit/CMakeLists.txt**: Link WMMA patch source into HotswapMCTests. ### Test plan - lit, ctest, and unit test passes - ASan lit tests: no memory errors --------- Signed-off-by: xintin <gaurav.verma@amd.com> Co-authored-by: Sanket Pandit <sanket.pandit@amd.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
growWithTrampolinespreviously refused to operate on ELFs where anySHF_ALLOCsection (e.g..dynamic) appeared after.textin the file layout. clang/lld-produced HSACOs always place.dynamicafter.text, so trampoline-based hotswap patch will fail on code objects assembled via%clangin lit tests.Changes
.textalloc sections exist.adjustSectionHeaders: shiftsh_addrforSHF_ALLOCsections after.text, keeping virtual addresses consistent with the shifted file offsets.adjustProgramHeaders: shiftp_vaddrandp_paddrfor segments after.text, keeping the ELF loader mapping consistent.growWithTrampolinessignature updated to acceptSNopBytesfor alignment padding. Trampoline insertion is padded to the maximumsh_addralignof post-.textalloc sections (usingllvm::alignTo) to preserve section alignment invariants. Padding is filled with s_nop bytes.comgr-hotswap-internal.hto reflect the new behavior (replaces the old "must be the last SHF_ALLOC section" invariant).comgr-hotswap-b0a0.cppto passLS.SNopBytes.The data copy and file-offset shifting (
sh_offset,p_offset,e_shoff) were already implemented correctly. Only the virtual address updates were missing.Need
This unblocks
.slit tests for all trampoline-based hotswap patches:Without this fix, these patches can only be tested via comgr-action C drivers that produce a different ELF layout.
Test plan
It is an infra only change.
%clang -nostdlib, confirms.dynamicexists after.textin the input ELF, runshotswap-rewrite --output, and verifies the output is a valid disassemblable elf64-amdgpu.hotswap-rewrite.candhotswap-rewrite-e2e.hiptests still pass (no regression).