[AMDGPU] comgr: fix VOP3PX2 scale_src2 false SGPR dependency #2293
Merged
fbd4d9a to
7d3c009
chinmaydd
reviewed
Apr 30, 2026
7d3c009 to
f88577e
Signed-off-by: xintin <gaurav.verma@amd.com>
f88577e to
7173f42
PSDB passes Comgr tests, and Windows compiler-runtime Multi-Arch passed. The remaining failures are unrelated.
Summary
VOP3PX2 V_WMMA_SCALE* instructions have an unused scale_src2 field at bits [58:50] (see VOP3PX2e::Inst{58-50} in VOP3PInstructions.td) that the SQ incorrectly decodes as an SGPR reference, causing a 3-cycle SALU stall after WMMA co-execution. This patch sets the field to VGPR0 encoding (0x100) to prevent the false dependency. Applies to both A0 and B0 steppings.
The fix uses raw byte manipulation because scale_src2 is a hardware encoding artifact not modeled as an MC operand. The MC layer has no mechanism to read or set this field.
This only fires on the B0-to-A0 hotswap rewrite path. A0-native binaries are compiled with an A0-targeted Clang that sets the field correctly at codegen time.
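As a rough illustration of the raw byte manipulation described above, the following is a minimal sketch, not the code from this PR. The function name `patchScaleSrc2Sketch`, its signature, and the constants are hypothetical, and it assumes the first 8 bytes of the instruction hold bits [63:0] in little-endian order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Values taken from the PR description; the constant names are hypothetical.
constexpr unsigned kScaleSrc2Shift = 50;                               // lowest bit of [58:50]
constexpr std::uint64_t kScaleSrc2Mask = 0x1FFull << kScaleSrc2Shift;  // 9-bit field
constexpr std::uint64_t kVgpr0Encoding = 0x100;                        // VGPR0 source encoding

// Sets bits [58:50] to the VGPR0 encoding in place.
// Returns true if any byte changed, false if the buffer is too short
// or the field already holds VGPR0.
// Assumption: Bytes[0..7] hold instruction bits [63:0], little-endian.
bool patchScaleSrc2Sketch(std::uint8_t *Bytes, std::size_t Size) {
  if (Bytes == nullptr || Size < 8)
    return false; // cannot contain bits [58:50]

  // Assemble the low 64 bits from little-endian bytes.
  std::uint64_t Word = 0;
  for (int I = 7; I >= 0; --I)
    Word = (Word << 8) | Bytes[I];

  const std::uint64_t Patched =
      (Word & ~kScaleSrc2Mask) | (kVgpr0Encoding << kScaleSrc2Shift);
  assert(((Patched >> kScaleSrc2Shift) & 0x1FF) == kVgpr0Encoding);
  if (Patched == Word)
    return false; // already VGPR0, nothing to do

  // Write the patched word back; bytes past index 7 are never touched.
  for (int I = 0; I < 8; ++I)
    Bytes[I] = static_cast<std::uint8_t>(Patched >> (8 * I));
  return true;
}
```

Under these assumptions, a zeroed instruction ends up with only bit 58 set (byte 7 becomes 0x04), and calling the function a second time returns false, which is the idempotency property the tests in this PR check for.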
Changes
applyVop3px2Src2Fix whole-kernel pass. Matches the 4 known v_wmma_scale* mnemonics via explicit StringSwitch enumeration (not starts_with), patches bits [58:50] in place. Includes diagnostic logging on size/bounds skip paths and a gated summary log.
patchScaleSrc2 for unit testing.
applyVop3px2Src2Fix.
v_wmma_scale_f32_16x16x128_f8f6f4. Verifies instruction survives patching and idempotency (second rewrite produces identical bytes, which proves the first pass set the field to VGPR0 since the assembler default is not VGPR0).
v_wmma_scale appears and layout is preserved.
PatchScaleSrc2 unit tests covering zeroed field, all-ones field, already-VGPR0 (returns false), idempotency (patch + re-patch), byte preservation outside scale_src2, and non-scale-src2 bit preservation.
Tests
patchScaleSrc2 byte-level correctness
Follow-up (deferred)
isVop3px2ScaleInst unit tests (known-variant + near-miss coverage); requires exposing the function in the header
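To show why explicit enumeration and near-miss coverage matter, here is a small sketch contrasting exact matching with prefix matching. It uses `std::string_view` as a stand-in for `StringSwitch`, and the function names are hypothetical. The list is deliberately incomplete: the PR matches four mnemonics, but only the one named in this description is filled in, and the near-miss mnemonic in the usage note is invented for illustration.

```cpp
#include <algorithm>
#include <array>
#include <string_view>

// Hypothetical, incomplete list: only the mnemonic named in the PR text.
constexpr std::array<std::string_view, 1> kKnownScaleMnemonics = {
    "v_wmma_scale_f32_16x16x128_f8f6f4",
};

// Exact-match check: accepts only mnemonics that appear in the list.
bool isKnownScaleMnemonicSketch(std::string_view Mnemonic) {
  return std::find(kKnownScaleMnemonics.begin(), kKnownScaleMnemonics.end(),
                   Mnemonic) != kKnownScaleMnemonics.end();
}

// Prefix check, shown only for contrast: accepts anything that starts with
// "v_wmma_scale", including instructions the patch was never validated on.
bool hasScalePrefixSketch(std::string_view Mnemonic) {
  constexpr std::string_view Prefix = "v_wmma_scale";
  return Mnemonic.substr(0, Prefix.size()) == Prefix;
}
```

With these definitions, a made-up mnemonic such as `v_wmma_scale_made_up` passes the prefix check but is rejected by the exact-match check, which is the kind of near-miss case the deferred unit tests would cover.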