[AMDGPU] comgr: Add DS_ADDTID trampoline patch for A0 16-bit M0 truncation by xintin · Pull Request #2302 · ROCm/llvm-project

xintin · 2026-04-23T20:26:05Z

Stacked on #2583.

Problem

ds_load_addtid_b32 and ds_store_addtid_b32 compute LDS addresses as M0 + laneID * 4 + offset. On B0 the DS unit reads 20 bits of M0 (1 MB); on A0 it reads only 16 (64 KB), silently truncating bits [19:16] (DEGFXMI400-12025). gfx1250 supports up to 320 KB LDS per WGP, so the upper 256 KB is unreachable through ADDTID. When B0-compiled code using ADDTID is hotswapped onto A0, M0 values above 65535 wrap and produce silently wrong LDS accesses.

Fix

patchDsAddtid (in comgr-hotswap-patch-trampoline.cpp) rewrites each ADDTID instruction into an ALU sequence (v_mbcnt_lo_u32_b32 / v_mbcnt_hi_u32_b32 for laneID, v_lshlrev_b32 ×4, v_add_u32 with M0 as a 32-bit operand) followed by a regular ds_load_b32 / ds_store_b32.
The M0 read is bypassed and the original offset is preserved.
Emission goes through emitReplacementCode (NOP sled preferred, deferred-trampoline fallback). GDS=1 is rejected with a diagnostic.

Load reuses vDST as the laneID temporary (no extra VGPR).
Store pulls a dead VGPR via allocScratchVgpr. If none is available the patch bails; when the kernel's LDS exceeds 64 KiB a warning is also logged so the A0 truncation risk surfaces to the user. The LDS size is read through ElfView::getKernelLdsSize, added in [AMDGPU] comgr: Add ElfView::getKernelLdsSize helper #2583.

Testing

test-lit/hotswap-trampoline-addtid.s : sled-based rewrite, operand pinned DISASM across the full 6-instruction body.
test-lit/hotswap-trampoline-addtid-nosled.s : deferred-trampoline fallback.
test-unit/HotswapMCTest.cpp AddTid suite: ADDTID assemble/decode round-trips, operand layout, GDS rejection, trampoline asm assembly, buildTrampoline integration.

z1-cciauto · 2026-04-23T20:47:31Z

PSDB Link: https://compiler-ci.amd.com/job/compiler-psdb-amd-staging/5431

Adds a small `ElfView` helper that reads `group_segment_fixed_size` from the kernel descriptor symbol `<KernelName>.kd`. Returns` std::nullopt` when the descriptor symbol is missing. Split out as a precursor to the `ds_*_addtid_b32` trampoline patch (PR #2302). That patch will use the helper to gate a diagnostic for kernels with LDS allocations larger than the A0 16-bit M0 truncation limit. Unit test `ElfView.GetKernelLdsSizeReturnsNulloptWhenKdMissing` covers the missing-descriptor path with a hand-crafted minimal ELF; the positive path will be exercised through lit once a consumer lands. --------- Signed-off-by: xintin <gaurav.verma@amd.com>

…ation Rewrites `ds_*_addtid_b32` into an ALU-computed `ds_*_b32` so the address bypasses the 16-bit M0 read on A0 (DEGFXMI400-12025). A diagnostic is emitted when the kernel's LDS allocation exceeds 64 KiB. Tests: lit `hotswap-trampoline-addtid{,-nosled}.s` and gtest `AddTid` in `HotswapMCTest.cpp`. Stacked on llvm#2583. Assisted-by: Cursor Signed-off-by: xintin <gaurav.verma@amd.com>

Signed-off-by: xintin <gaurav.verma@amd.com>

chinmaydd

LGTM, thanks for addressing comments. Please ensure to keep the commit message minimal during squash+merge.

Signed-off-by: xintin <gaurav.verma@amd.com>

xintin added comgr Related to Code Object Manager hotswap Related to the Comgr Hotswap feature ci:skip Skip all CI builds/tests for this PR labels Apr 23, 2026

xintin mentioned this pull request Apr 27, 2026

[AMDGPU] comgr: Fix ELF trampoline growth for clang/lld-produced HSACOs #2331

Merged

xintin changed the title ~~[comgr][hotswap] Add DS_ADDTID trampoline patch for A0 16-bit M0 truncation~~ [AMDGPU] comgr: Add DS_ADDTID trampoline patch for A0 16-bit M0 truncation Apr 30, 2026

xintin force-pushed the hotswap-ds-load-store-addtid branch from 961821a to d70fc7a Compare May 15, 2026 21:20

xintin mentioned this pull request May 18, 2026

[AMDGPU] comgr: Add ElfView::getKernelLdsSize helper #2583

Merged

xintin force-pushed the hotswap-ds-load-store-addtid branch from d70fc7a to 8db0906 Compare May 18, 2026 12:03

jmmartinez reviewed May 20, 2026

View reviewed changes

Comment thread amd/comgr/src/comgr-hotswap-patch-trampoline.cpp Outdated

xintin force-pushed the hotswap-ds-load-store-addtid branch from 8db0906 to 42c9d6b Compare May 20, 2026 07:27

xintin marked this pull request as ready for review May 20, 2026 07:28

xintin requested review from chinmaydd and lamb-j as code owners May 20, 2026 07:28

xintin requested a review from jmmartinez May 20, 2026 07:40

xintin removed the ci:skip Skip all CI builds/tests for this PR label May 20, 2026

chinmaydd reviewed May 20, 2026

View reviewed changes

Comment thread amd/comgr/src/comgr-hotswap-elf.cpp