Skip to content

[AMDGPU] comgr: Fix ELF trampoline growth for clang/lld-produced HSACOs#2331

Merged
chinmaydd merged 3 commits into
ROCm:amd-stagingfrom
xintin:hotswap-elf-growth-fix
Apr 28, 2026
Merged

[AMDGPU] comgr: Fix ELF trampoline growth for clang/lld-produced HSACOs#2331
chinmaydd merged 3 commits into
ROCm:amd-stagingfrom
xintin:hotswap-elf-growth-fix

Conversation

@xintin
Copy link
Copy Markdown

@xintin xintin commented Apr 27, 2026

Summary

growWithTrampolines previously refused to operate on ELFs where any SHF_ALLOC section (e.g. .dynamic) appeared after .text in the file layout. clang/lld-produced HSACOs always place .dynamic after .text, so trampoline-based hotswap patch will fail on code objects assembled via %clang in lit tests.

Changes

  • Removed the conservative refuse check that blocked growth when post-.text alloc sections exist.
  • adjustSectionHeaders: shift sh_addr for SHF_ALLOC sections after .text, keeping virtual addresses consistent with the shifted file offsets.
  • adjustProgramHeaders: shift p_vaddr and p_paddr for segments after .text, keeping the ELF loader mapping consistent.
  • growWithTrampolines signature updated to accept SNopBytes for alignment padding. Trampoline insertion is padded to the maximum sh_addralign of post-.text alloc sections (using llvm::alignTo) to preserve section alignment invariants. Padding is filled with s_nop bytes.
  • Updated the header comment in comgr-hotswap-internal.h to reflect the new behavior (replaces the old "must be the last SHF_ALLOC section" invariant).
  • Updated the call site in comgr-hotswap-b0a0.cpp to pass LS.SNopBytes.

The data copy and file-offset shifting (sh_offset, p_offset, e_shoff) were already implemented correctly. Only the virtual address updates were missing.

Need

This unblocks .s lit tests for all trampoline-based hotswap patches:

Without this fix, these patches can only be tested via comgr-action C drivers that produce a different ELF layout.

Test plan

It is an infra only change.

  • hotswap-elf-growth.s (new): assembles a gfx1250 kernel via %clang -nostdlib, confirms .dynamic exists after .text in the input ELF, runs hotswap-rewrite --output, and verifies the output is a valid disassemblable elf64-amdgpu.
  • Existing hotswap-rewrite.c and hotswap-rewrite-e2e.hip tests still pass (no regression).

Signed-off-by: xintin <gaurav.verma@amd.com>
@xintin xintin requested a review from yxsamliu April 27, 2026 19:35
@xintin xintin requested review from chinmaydd and lamb-j as code owners April 27, 2026 19:35
@xintin xintin added comgr Related to Code Object Manager hotswap Related to the Comgr Hotswap feature labels Apr 27, 2026
@z1-cciauto
Copy link
Copy Markdown
Collaborator

@xintin xintin requested review from jmmartinez and yoonseoch April 27, 2026 20:33
Comment thread amd/comgr/test-lit/hotswap-elf-growth.s
Comment thread amd/comgr/src/comgr-hotswap-elf.cpp
Comment thread amd/comgr/src/comgr-hotswap-elf.cpp
@xintin xintin force-pushed the hotswap-elf-growth-fix branch from 66d1eed to 746c6d8 Compare April 27, 2026 21:36
@z1-cciauto
Copy link
Copy Markdown
Collaborator

@xintin xintin force-pushed the hotswap-elf-growth-fix branch from 746c6d8 to 983348c Compare April 27, 2026 21:46
@xintin xintin requested a review from chinmaydd April 27, 2026 21:48
@z1-cciauto
Copy link
Copy Markdown
Collaborator

Signed-off-by: xintin <gaurav.verma@amd.com>
@xintin xintin force-pushed the hotswap-elf-growth-fix branch from 983348c to 895cfb0 Compare April 27, 2026 21:52
@z1-cciauto
Copy link
Copy Markdown
Collaborator

Comment thread amd/comgr/src/comgr-hotswap-b0a0.cpp
Comment thread amd/comgr/src/comgr-hotswap-elf.cpp Outdated
Comment thread amd/comgr/src/comgr-hotswap-elf.cpp
Comment thread amd/comgr/src/comgr-hotswap-elf.cpp Outdated
Copy link
Copy Markdown

@chinmaydd chinmaydd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with nits

Signed-off-by: xintin <gaurav.verma@amd.com>
@xintin
Copy link
Copy Markdown
Author

xintin commented Apr 28, 2026

Thanks, @chinmaydd

@z1-cciauto
Copy link
Copy Markdown
Collaborator

@chinmaydd chinmaydd merged commit 48d88df into ROCm:amd-staging Apr 28, 2026
39 checks passed
lamb-j pushed a commit that referenced this pull request Apr 30, 2026
~Stacked on #2203.~ Merged! 
~picked from #2226 Closed.

~Depends on #2331 (cherry-picked for now).~ Merged.

## Summary
Hotswap patch for the GFX1250 A0 WMMA/SWMMAC co-execution hazard.
Detects WMMA/SWMMAC instructions that lack sufficient v_nop separation
before the first overlapping co-executable VALU on A0 silicon, and
inserts the required v_nop padding via trampoline.
On A0, certain WMMA forms need more v_nop slots than B0 before an
overlapping VALU can safely co-execute.
The patch scans the decoded instruction stream, classifies each WMMA by
its A0/B0 nop requirement, counts existing safe slots (v_nop,
non-overlapping VALU, non-terminating SALU), and emits a trampoline with
the deficit v_nops before the hazardous VALU.

## Changes
- **comgr-hotswap-patch-wmma-hazard.cpp**: Strong-symbol override for
applyWmmaHazardPatch. Classifies WMMA/SWMMAC via MCInstrDesc::TSFlags,
detects VGPR overlap via MCRegisterInfo::regsOverlap, emits v_nop
padding via buildTrampoline. Includes TSFlags sync comment,
classifyWmmaNops ordering documentation, printInst round-trip
documentation, duplicate VALU guard (DenseSet), buildTrampoline result
check, dead variable removal.
- **comgr-hotswap-internal.h**: Add WmmaNopReq struct and
classifyWmmaNops declaration.
- **comgr-hotswap-elf.cpp**: Remove duplicate refuse check from
cherry-pick merge of #2331.
- **hotswap-rewrite.c**: Remove output-size check from --output path
(trampoline patches grow the ELF).
- **CMakeLists.txt**: Add WMMA patch source to library build.
- **hotswap-wmma-hazard.s**: Lit test: v_wmma_i32_16x16x64_iu8 followed
by overlapping v_add_f32, verifies v_nop padding and idempotency.
- **HotswapMCTest.cpp**: 10 new classifyWmmaNops unit tests covering all
WMMA variants and ordering.
- **test-unit/CMakeLists.txt**: Link WMMA patch source into
HotswapMCTests.

### Test plan
- lit, ctest, and unit test passes
- ASan lit tests: no memory errors

---------

Signed-off-by: xintin <gaurav.verma@amd.com>
Co-authored-by: Sanket Pandit <sanket.pandit@amd.com>
kirthana14m pushed a commit that referenced this pull request Apr 30, 2026
~Stacked on #2203.~ Merged! 
~picked from #2226 Closed.

~Depends on #2331 (cherry-picked for now).~ Merged.

## Summary
Hotswap patch for the GFX1250 A0 WMMA/SWMMAC co-execution hazard.
Detects WMMA/SWMMAC instructions that lack sufficient v_nop separation
before the first overlapping co-executable VALU on A0 silicon, and
inserts the required v_nop padding via trampoline.
On A0, certain WMMA forms need more v_nop slots than B0 before an
overlapping VALU can safely co-execute.
The patch scans the decoded instruction stream, classifies each WMMA by
its A0/B0 nop requirement, counts existing safe slots (v_nop,
non-overlapping VALU, non-terminating SALU), and emits a trampoline with
the deficit v_nops before the hazardous VALU.

## Changes
- **comgr-hotswap-patch-wmma-hazard.cpp**: Strong-symbol override for
applyWmmaHazardPatch. Classifies WMMA/SWMMAC via MCInstrDesc::TSFlags,
detects VGPR overlap via MCRegisterInfo::regsOverlap, emits v_nop
padding via buildTrampoline. Includes TSFlags sync comment,
classifyWmmaNops ordering documentation, printInst round-trip
documentation, duplicate VALU guard (DenseSet), buildTrampoline result
check, dead variable removal.
- **comgr-hotswap-internal.h**: Add WmmaNopReq struct and
classifyWmmaNops declaration.
- **comgr-hotswap-elf.cpp**: Remove duplicate refuse check from
cherry-pick merge of #2331.
- **hotswap-rewrite.c**: Remove output-size check from --output path
(trampoline patches grow the ELF).
- **CMakeLists.txt**: Add WMMA patch source to library build.
- **hotswap-wmma-hazard.s**: Lit test: v_wmma_i32_16x16x64_iu8 followed
by overlapping v_add_f32, verifies v_nop padding and idempotency.
- **HotswapMCTest.cpp**: 10 new classifyWmmaNops unit tests covering all
WMMA variants and ordering.
- **test-unit/CMakeLists.txt**: Link WMMA patch source into
HotswapMCTests.

### Test plan
- lit, ctest, and unit test passes
- ASan lit tests: no memory errors

---------

Signed-off-by: xintin <gaurav.verma@amd.com>
Co-authored-by: Sanket Pandit <sanket.pandit@amd.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comgr Related to Code Object Manager hotswap Related to the Comgr Hotswap feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants