Skip to content

[AMDGPU] comgr: split non-stride64 DS 2-address ops for A0#2281

Open
xintin wants to merge 4 commits into
ROCm:amd-stagingfrom
xintin:hotswap-ds2-split
Open

[AMDGPU] comgr: split non-stride64 DS 2-address ops for A0#2281
xintin wants to merge 4 commits into
ROCm:amd-stagingfrom
xintin:hotswap-ds2-split

Conversation

@xintin
Copy link
Copy Markdown

@xintin xintin commented Apr 22, 2026

Depends on #2584.

Summary

A0 silicon requires DS 2-address instructions to have offsets aligned to the payload size.
B0 dropped that restriction, so a B0-compiled binary may emit a 2-address DS instruction with unaligned offsets that silently corrupts LDS on A0.

The trampoline PR (introduced in #2369, with the drain-preserving wait fix in #2584) already covers the stride64 variants. This PR extends getDs2AddrReplacement and extractDsOperands in comgr-hotswap-patch-trampoline.cpp to the non-stride64 encodings of the same families, including the storexchg form:

Mnemonic Replacement Byte-offset scale
ds_load_2addr_b32 ds_load_b32 index * 4
ds_load_2addr_b64 ds_load_b64 index * 8
ds_store_2addr_b32 ds_store_b32 index * 4
ds_store_2addr_b64 ds_store_b64 index * 8
ds_storexchg_2addr_rtn_b32 ds_storexchg_rtn_b32 index * 4
ds_storexchg_2addr_rtn_b64 ds_storexchg_rtn_b64 index * 8

The stride64 forms encode index * 64 * ElemBytes byte offsets; the non-stride64 forms encode index * ElemBytes. Replacement single-address instructions take byte offsets directly, so a single ternary in extractDsOperands materialises the right scale for each encoding and the layout-specific expansion helpers are unchanged.

Follow-up

  • Cache resolved opcodes in LLVMState at init time (same follow-up as #2257).

@xintin xintin added comgr Related to Code Object Manager hotswap Related to the Comgr Hotswap feature ci:skip Skip all CI builds/tests for this PR labels Apr 22, 2026
@z1-cciauto
Copy link
Copy Markdown
Collaborator

Comment thread amd/comgr/src/comgr-hotswap-patch-ds2-split.cpp Outdated
@z1-cciauto
Copy link
Copy Markdown
Collaborator

@z1-cciauto
Copy link
Copy Markdown
Collaborator

xintin added 2 commits May 18, 2026 15:32
Follow-up to llvm#2369. The split-aware wait bump was incrementing drains
(s_wait_dscnt 0) along with bounded waits, relaxing them and letting
split halves escape into a downstream data hazard. Skip the bump when
imm == 0; non-drain waits still bump by +1.

Signed-off-by: xintin <gaurav.verma@amd.com>
Signed-off-by: xintin <gaurav.verma@amd.com
@xintin xintin force-pushed the hotswap-ds2-split branch 2 times, most recently from ff9d5e7 to 7325ff7 Compare May 20, 2026 08:25
…ore test

hotswap-trampoline-ds-nostride64.s: add ds_store_2addr_b64 kernel
exercising fmtRegOperand on b64 data pairs.

Signed-off-by: xintin <gaurav.verma@amd.com>
@xintin xintin force-pushed the hotswap-ds2-split branch from 7325ff7 to 3c21d5e Compare May 20, 2026 08:28
@xintin xintin changed the title [AMDGPU] comgr: split non-stride64 DS_READ2/DS_WRITE2 for A0 [AMDGPU] comgr: split non-stride64 DS 2-address ops for A0" May 20, 2026
@xintin xintin changed the title [AMDGPU] comgr: split non-stride64 DS 2-address ops for A0" [AMDGPU] comgr: split non-stride64 DS 2-address ops for A0 May 20, 2026
@xintin xintin requested a review from jmmartinez May 20, 2026 08:33
@xintin xintin marked this pull request as ready for review May 20, 2026 08:33
@xintin xintin requested review from chinmaydd and lamb-j as code owners May 20, 2026 08:33
@xintin xintin requested a review from harsh-amd May 20, 2026 08:34
xintin added a commit that referenced this pull request May 20, 2026
Follow-up to #2369. The split-aware wait bump was incrementing drains
(s_wait_dscnt 0) along with bounded waits, relaxing them and letting
split halves escape into a downstream data hazard.
Skip the bump when imm == 0; non-drain waits still bump by +1.

This will unlock PR #2281.

(follow-up #2585): a small dataflow pass at the wait site computing the
bump from `(outstanding-count-at-wait, K)`. It would subsume the drain
special-case naturally and remove the `// TODO:` marker left in
`bumpNextWaitDscnt`.

---------

Signed-off-by: xintin <gaurav.verma@amd.com>
@xintin xintin requested a review from yxsamliu May 20, 2026 15:26
Comment thread amd/comgr/test-lit/hotswap-trampoline-ds-nostride64-multi.s
Comment thread amd/comgr/test-lit/hotswap-trampoline-ds-nostride64.s Outdated
Comment thread amd/comgr/src/comgr-hotswap-patch-trampoline.cpp
Signed-off-by: xintin <gaurav.verma@amd.com>
@xintin xintin requested a review from chinmaydd May 20, 2026 22:25
Copy link
Copy Markdown

@chinmaydd chinmaydd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci:skip Skip all CI builds/tests for this PR comgr Related to Code Object Manager hotswap Related to the Comgr Hotswap feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants