
[AMDGPU] comgr: fix VOP3PX2 scale_src2 false SGPR dependency #2293

Merged: lamb-j merged 1 commit into ROCm:amd-staging from xintin:hotswap-vop3px2-src2-fix on May 2, 2026

@xintin commented Apr 22, 2026

Summary

VOP3PX2 V_WMMA_SCALE* instructions have an unused scale_src2 field at bits [58:50] (see VOP3PX2e::Inst{58-50} in VOP3PInstructions.td) that the SQ incorrectly decodes as an SGPR reference, causing a 3-cycle SALU stall after WMMA co-execution. This patch sets the field to the VGPR0 encoding (0x100) to prevent the false dependency. This applies to both A0 and B0 steppings.
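A minimal sketch of the field arithmetic, assuming the low 64 bits of the instruction are held in a `uint64_t` and using the usual AMDGPU 9-bit source-operand convention, in which values 256 through 511 select VGPRs (so 0x100 is v0 and 0 is s0). The constant and helper names below are hypothetical and are not taken from comgr.

```cpp
#include <cstdint>

// Hypothetical names for illustration only; not comgr code.
// Assumes scale_src2 sits at bits [58:50] of the low 64-bit instruction word.
constexpr unsigned kScaleSrc2Shift = 50;
constexpr uint64_t kScaleSrc2Mask = uint64_t{0x1FF} << kScaleSrc2Shift; // 9-bit field
constexpr uint16_t kVgpr0Encoding = 0x100; // 256 + n selects VGPRn, so 0x100 is v0

// Extract the 9-bit scale_src2 value from the low 64 bits of the instruction.
constexpr uint16_t scaleSrc2Field(uint64_t lo) {
  return static_cast<uint16_t>((lo & kScaleSrc2Mask) >> kScaleSrc2Shift);
}

// Values 256..511 name VGPRs; anything below 256 is a non-VGPR operand
// (an SGPR, a special register, or an inline constant).
constexpr bool isVgprEncoding(uint16_t field) { return field >= 0x100; }
```

Under this convention a zeroed field would decode as s0, which is one way an otherwise unused field can look like an SGPR read.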

The fix uses raw byte manipulation because scale_src2 is a hardware encoding artifact not modeled as an MC operand. The MC layer has no mechanism to read or set this field.
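The in-place byte patch can be sketched as follows, assuming the instruction bytes are little-endian so that bits [63:0] occupy the first eight bytes. `patchScaleSrc2Sketch` and its `(pointer, size)` signature are hypothetical; the real `patchScaleSrc2` declared in comgr-hotswap-internal.h may look different. Returning `false` when the field already holds VGPR0 mirrors the already-VGPR0 unit-test case described below.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical sketch; the name and signature are assumptions, not comgr's API.
// Assumes `inst` points at the first byte of a VOP3PX2 instruction stored
// little-endian, so bits [63:0] live in inst[0..7].
bool patchScaleSrc2Sketch(uint8_t *inst, size_t size) {
  constexpr unsigned kShift = 50;
  constexpr uint64_t kMask = uint64_t{0x1FF} << kShift;  // scale_src2, bits [58:50]
  constexpr uint64_t kVgpr0 = uint64_t{0x100} << kShift; // VGPR0 encoding

  if (inst == nullptr || size < 8)
    return false; // too few bytes to hold bits [63:0]

  // Assemble the low 64 bits explicitly as little-endian so the result does
  // not depend on host byte order.
  uint64_t lo = 0;
  for (unsigned i = 0; i < 8; ++i)
    lo |= uint64_t{inst[i]} << (8 * i);

  if ((lo & kMask) == kVgpr0)
    return false; // already VGPR0; leave the bytes untouched

  // Clear only bits [58:50], then write the VGPR0 encoding into them.
  lo = (lo & ~kMask) | kVgpr0;
  for (unsigned i = 0; i < 8; ++i)
    inst[i] = static_cast<uint8_t>(lo >> (8 * i));
  return true;
}
```

Assembling the word byte by byte keeps the sketch independent of host endianness, and because every other bit is masked back in unchanged, a second call on the same bytes is a no-op.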

The fix runs only on the B0-to-A0 hotswap rewrite path. A0-native binaries are compiled with an A0-targeted Clang that already sets the field correctly at codegen time.

Changes

  • comgr-hotswap-patch-vop3px2-src2.cpp: Add the applyVop3px2Src2Fix whole-kernel pass, which matches the 4 known v_wmma_scale* mnemonics via explicit StringSwitch enumeration (not starts_with) and patches bits [58:50] in place. Include diagnostic logging on the size/bounds skip paths and a gated summary log.
  • comgr-hotswap-internal.h: Declare patchScaleSrc2 for unit testing.
  • comgr-hotswap-b0a0.cpp: Add declaration, weak stub, and dispatcher call for applyVop3px2Src2Fix.
  • CMakeLists.txt: Add new source to library build.
  • hotswap-vop3px2-src2.s: Positive lit test -- kernel with v_wmma_scale_f32_16x16x128_f8f6f4. Verifies that the instruction survives patching and that patching is idempotent: a second rewrite produces identical bytes, which proves the first pass set the field to VGPR0, since the assembler default is not VGPR0.
  • hotswap-vop3px2-src2-noop.s: Passthrough lit test -- kernel with a regular (non-scale) WMMA. Verifies that no v_wmma_scale instruction appears and that the layout is preserved.
  • HotswapMCTest.cpp: 6 new PatchScaleSrc2 unit tests covering zeroed field, all-ones field, already-VGPR0 (returns false), idempotency (patch + re-patch), byte preservation outside scale_src2, and non-scale-src2 bit preservation.
  • test-unit/CMakeLists.txt: Link new source into HotswapMCTests.
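The difference between explicit enumeration and a prefix check can be sketched without LLVM. This stand-in uses `std::string_view` in place of `llvm::StringSwitch`, and it lists only `v_wmma_scale_f32_16x16x128_f8f6f4` because that is the one mnemonic this description names; the other three are left out rather than guessed. `isKnownScaleWmma` and `prefixMatch` are hypothetical names.

```cpp
#include <array>
#include <string_view>

// Hypothetical stand-in for the StringSwitch enumeration, written with the
// standard library so it compiles without LLVM. Only one of the four
// mnemonics is named in the PR description; the rest are intentionally omitted.
constexpr std::array<std::string_view, 1> kKnownScaleWmma = {
    "v_wmma_scale_f32_16x16x128_f8f6f4",
};

// Exact, whole-string match against the enumerated list.
bool isKnownScaleWmma(std::string_view mnemonic) {
  for (std::string_view known : kKnownScaleWmma)
    if (mnemonic == known)
      return true;
  return false;
}

// Prefix match, shown only for contrast with the enumerated approach.
bool prefixMatch(std::string_view mnemonic) {
  constexpr std::string_view kPrefix = "v_wmma_scale";
  return mnemonic.substr(0, kPrefix.size()) == kPrefix;
}
```

With exact matching, a future mnemonic that merely shares the `v_wmma_scale` prefix is left untouched until it is added to the list on purpose, whereas `prefixMatch` would start patching it immediately.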

Tests

  • 2 lit tests (positive + passthrough) with idempotency checks
  • 6 unit tests for patchScaleSrc2 byte-level correctness
  • All existing lit, ctest, and unit tests pass
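The byte-preservation and non-scale-src2 bit-preservation checks can be expressed as a single comparison helper, assuming the same little-endian layout with scale_src2 at bits [58:50] of the first eight bytes. `onlyScaleSrc2Changed` is a hypothetical helper and is not code from HotswapMCTest.cpp.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical test helper: returns true if `after` differs from `before`
// only inside scale_src2 (bits [58:50] of the little-endian instruction).
// Assumes both buffers hold `size` bytes.
bool onlyScaleSrc2Changed(const uint8_t *before, const uint8_t *after,
                          size_t size) {
  constexpr uint64_t kMask = uint64_t{0x1FF} << 50;
  for (size_t i = 0; i < size; ++i) {
    // Byte i covers bits [8*i+7 : 8*i]; only bytes 0..7 can overlap the field.
    uint8_t allowed = 0;
    if (i < 8)
      allowed = static_cast<uint8_t>(kMask >> (8 * i));
    uint8_t diff = static_cast<uint8_t>(before[i] ^ after[i]);
    if ((diff & static_cast<uint8_t>(~allowed)) != 0)
      return false; // a bit outside scale_src2 changed
  }
  return true;
}
```

A test would copy the instruction bytes, run the patch on the copy, and assert that `onlyScaleSrc2Changed(original, patched, size)` holds.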

Follow-up (deferred)

@xintin added the comgr (Related to Code Object Manager), hotswap (Related to the Comgr Hotswap feature), and ci:skip (Skip all CI builds/tests for this PR) labels Apr 22, 2026
@z1-cciauto (Collaborator)

@xintin force-pushed the hotswap-vop3px2-src2-fix branch 4 times, most recently from fbd4d9a to 7d3c009 on April 29, 2026 20:26
@z1-cciauto (Collaborator)

@xintin removed the ci:skip label Apr 29, 2026
@xintin marked this pull request as ready for review April 29, 2026 21:00
@xintin requested review from chinmaydd and lamb-j as code owners April 29, 2026 21:00
Comment thread amd/comgr/test-lit/hotswap-vop3px2-src2.s
Comment thread amd/comgr/src/comgr-hotswap-patch-vop3px2-src2.cpp (outdated)
Comment thread amd/comgr/src/comgr-hotswap-patch-vop3px2-src2.cpp (outdated)
@xintin force-pushed the hotswap-vop3px2-src2-fix branch from 7d3c009 to f88577e on April 30, 2026 04:24
@z1-cciauto (Collaborator)

@xintin requested a review from chinmaydd April 30, 2026 04:27
Signed-off-by: xintin <gaurav.verma@amd.com>
@xintin force-pushed the hotswap-vop3px2-src2-fix branch from f88577e to 7173f42 on May 1, 2026 17:37
@z1-cciauto (Collaborator)

@chinmaydd left a comment

LGTM, thanks !

@lamb-j (Collaborator) commented May 2, 2026

PSDB passed the Comgr tests, and Windows compiler-runtime Multi-Arch passed. The failures are unrelated.

@lamb-j merged commit c5b632f into ROCm:amd-staging May 2, 2026
57 of 77 checks passed

Labels

comgr (Related to Code Object Manager), hotswap (Related to the Comgr Hotswap feature)

4 participants