[AMDGPU] comgr: split non-stride64 DS 2-address ops for A0 #2281
Open
xintin wants to merge 4 commits into
yxsamliu reviewed on Apr 27, 2026
Follow-up to llvm#2369. The split-aware wait bump was incrementing drains (`s_wait_dscnt 0`) along with bounded waits, relaxing them and letting split halves escape into a downstream data hazard. Skip the bump when `imm == 0`; non-drain waits still bump by +1.

Signed-off-by: xintin <gaurav.verma@amd.com>
…ore test hotswap-trampoline-ds-nostride64.s: add ds_store_2addr_b64 kernel exercising fmtRegOperand on b64 data pairs. Signed-off-by: xintin <gaurav.verma@amd.com>
xintin added a commit that referenced this pull request on May 20, 2026
Follow-up to #2369. The split-aware wait bump was incrementing drains (`s_wait_dscnt 0`) along with bounded waits, relaxing them and letting split halves escape into a downstream data hazard. Skip the bump when `imm == 0`; non-drain waits still bump by +1. This will unlock PR #2281.

Follow-up (#2585): a small dataflow pass at the wait site that computes the bump from `(outstanding-count-at-wait, K)`. It would subsume the drain special-case naturally and remove the `// TODO:` marker left in `bumpNextWaitDscnt`.

Signed-off-by: xintin <gaurav.verma@amd.com>
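The drain-preserving rule in the commit message above can be sketched as follows. This is a minimal illustration, not the PR's code: the helper name `bumpedWaitImm` and its signature are assumptions, and the sketch does not handle saturation at the width of the real immediate field.

```cpp
#include <cstdint>

// Hypothetical helper (not taken from the PR): given the immediate of an
// s_wait_dscnt that follows a split DS op, return the immediate to emit.
//  - imm == 0 is a drain and must stay a drain, so it is returned unchanged.
//  - any bounded (non-zero) wait is bumped by +1, as the commit message
//    describes for the split-aware bump.
constexpr std::uint32_t bumpedWaitImm(std::uint32_t imm) {
  return imm == 0 ? 0u : imm + 1u;
}
```

Under these assumptions, `bumpedWaitImm(0)` stays `0` while `bumpedWaitImm(1)` becomes `2`.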
chinmaydd reviewed (3 times) on May 20, 2026
Depends on #2584.
Summary
A0 silicon requires DS 2-address instructions to have offsets aligned to the payload size.
B0 dropped that restriction, so a B0-compiled binary may emit a 2-address DS instruction with unaligned offsets that silently corrupts LDS on A0.
The trampoline PR (introduced in #2369, with the drain-preserving wait fix in #2584) already covers the stride64 variants. This PR extends `getDs2AddrReplacement` and `extractDsOperands` in `comgr-hotswap-patch-trampoline.cpp` to the non-stride64 encodings of the same families, including the `storexchg` form:

| 2-address instruction | Replacement | Byte offset |
|---|---|---|
| `ds_load_2addr_b32` | `ds_load_b32` | `index * 4` |
| `ds_load_2addr_b64` | `ds_load_b64` | `index * 8` |
| `ds_store_2addr_b32` | `ds_store_b32` | `index * 4` |
| `ds_store_2addr_b64` | `ds_store_b64` | `index * 8` |
| `ds_storexchg_2addr_rtn_b32` | `ds_storexchg_rtn_b32` | `index * 4` |
| `ds_storexchg_2addr_rtn_b64` | `ds_storexchg_rtn_b64` | `index * 8` |

The stride64 forms encode `index * 64 * ElemBytes` byte offsets; the non-stride64 forms encode `index * ElemBytes`. Replacement single-address instructions take byte offsets directly, so a single ternary in `extractDsOperands` materialises the right scale for each encoding and the layout-specific expansion helpers are unchanged.

Follow-up
`LLVMState` at init time (same follow-up as #2257).
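The offset scaling described in the Summary can be sketched as below. This is an illustration under stated assumptions, not the code in `extractDsOperands`: the names `dsByteOffset`, `splitDs2Addr`, and `SplitOffsets` are hypothetical, and `offset0`/`offset1` stand for the two index fields of the 2-address encoding.

```cpp
#include <cstdint>

// Byte offsets for the two single-address halves of a split 2-address DS op.
// (Hypothetical type, introduced only for this sketch.)
struct SplitOffsets {
  std::uint32_t first;
  std::uint32_t second;
};

// Hypothetical helper: convert an encoded index into the byte offset that a
// single-address replacement instruction expects.
// elemBytes is 4 for the b32 forms and 8 for the b64 forms.
// The ternary selects the stride64 scale (64 elements) or the plain scale (1).
constexpr std::uint32_t dsByteOffset(std::uint32_t index,
                                     std::uint32_t elemBytes,
                                     bool isStride64) {
  return index * elemBytes * (isStride64 ? 64u : 1u);
}

// Hypothetical helper: compute both byte offsets for the split halves.
constexpr SplitOffsets splitDs2Addr(std::uint32_t offset0,
                                    std::uint32_t offset1,
                                    std::uint32_t elemBytes,
                                    bool isStride64) {
  return SplitOffsets{dsByteOffset(offset0, elemBytes, isStride64),
                      dsByteOffset(offset1, elemBytes, isStride64)};
}
```

For example, a non-stride64 b64 op with `offset0 = 1` and `offset1 = 2` yields byte offsets 8 and 16, while a stride64 b32 op with index 1 yields 256.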