[WaveTransform] Update filecheck pattern tests with basic divergent CGF #2539

Open
vg0204 wants to merge 1 commit into amd-feature/wave-transform from amd/dev/gvikash/fileCheck-tests-basic-divergent-CFG

@vg0204 vg0204 commented May 14, 2026

This patch updates LIT tests that contain a basic divergent CFG. The regenerated checks show similar scalar manipulation of EXEC for control-flow navigation and are semantically equivalent to the structurized version of the CFG.

The updated tests are:

  • llvm/test/CodeGen/AMDGPU/cf-loop-on-constant.ll
  • llvm/test/CodeGen/AMDGPU/cgp-addressing-modes-flat.ll
  • llvm/test/CodeGen/AMDGPU/cgp-addressing-modes-gfx1030.ll
  • llvm/test/CodeGen/AMDGPU/coalescer_distribute.ll
  • llvm/test/CodeGen/AMDGPU/coexec-scheduler.ll
  • llvm/test/CodeGen/AMDGPU/combine-add-zext-xor.ll
  • llvm/test/CodeGen/AMDGPU/copy-to-reg-frameindex.ll
  • llvm/test/CodeGen/AMDGPU/cse-convergent.ll
  • llvm/test/CodeGen/AMDGPU/ctpop64.ll
  • llvm/test/CodeGen/AMDGPU/dag-divergence-atomic.ll
  • llvm/test/CodeGen/AMDGPU/dagcombine-v1i8-extractvecelt-crash.ll

@vg0204 vg0204 changed the title from "[WaveTransform] update filecheck pattern tests with basic divergent CGF" to "[WaveTransform] Update filecheck pattern tests with basic divergent CGF" on May 14, 2026
@z1-cciauto
Collaborator

Comment on lines +312 to +317
; GFX10-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v4
; GFX10-NEXT: v_mov_b32_e32 v4, 0
; GFX10-NEXT: s_xor_b32 s5, vcc_lo, exec_lo
; GFX10-NEXT: s_xor_b32 s4, exec_lo, s5
; GFX10-NEXT: s_mov_b32 exec_lo, s5
; GFX10-NEXT: ; divergent control-flow edge

Have you spent some time understanding why the left-hand side needs only two lines of code where this side needs this whole block?

Comment on lines -14 to -15
; GFX1010-NEXT: v_cndmask_b32_e64 v0, 0, 1, s4
; GFX1010-NEXT: v_cmp_ne_u32_e64 s4, 1, v0

The cndmask and the compare are optimized away on the right-hand side. Can you explain why this happens in the new flow?

; GFX1010-NEXT: s_cbranch_vccz .LBB0_4
; GFX1010-NEXT: .LBB0_2: ; %.a
; GFX1010-NEXT: ; =>This Inner Loop Header: Depth=1
; GFX1010-NEXT: s_and_b32 vcc_lo, exec_lo, s4

Also, this s_and with exec is missing. In the new codegen, it is just a mov of s4 to vcc.
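
To make the concern concrete, here is a minimal sketch that models wave32 masks as Python integers. It assumes s4 could carry bits for lanes that are inactive in exec_lo; whether that can actually happen in this test is exactly what needs checking. The helper names and mask values are made up for illustration and are not taken from the patch.

```python
MASK32 = 0xFFFFFFFF

def s_and_b32(a: int, b: int) -> int:
    """Model of s_and_b32: bitwise AND of two 32-bit scalars."""
    return (a & b) & MASK32

def s_mov_b32(a: int) -> int:
    """Model of s_mov_b32: plain 32-bit copy."""
    return a & MASK32

def vccz(vcc: int) -> bool:
    """s_cbranch_vccz takes the branch when vcc is all zeros."""
    return vcc == 0

exec_lo = 0x0000FFFF  # hypothetical: lanes 0..15 active

# Case 1: the condition mask only has bits for active lanes.
s4_inside = 0x000000F0
same = s_and_b32(exec_lo, s4_inside) == s_mov_b32(s4_inside)

# Case 2: the condition mask only has bits for inactive lanes.
s4_stale = 0xFFFF0000
branch_with_and = vccz(s_and_b32(exec_lo, s4_stale))  # old codegen
branch_with_mov = vccz(s_mov_b32(s4_stale))           # new codegen

print(same, branch_with_and, branch_with_mov)
```

On this model the mov is a safe replacement only if s4 is known to be a subset of exec_lo at that point; otherwise the two forms can branch differently.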

Comment thread llvm/test/CodeGen/AMDGPU/coalescer_distribute.ll
; CHECK-LABEL: phi_with_alloca_and_divergent_copy_to_reg:
; CHECK: ; %bb.0: ; %entry
; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; CHECK-NEXT: s_mov_b64 s[4:5], -1

This is dead because s[4:5] is redefined to 0 three instructions below in the same BB. I assume they were different virtual registers (ACC registers), one initialized to -1 and the other to 0. Later, the one initialized with allOnes turned out to be dead, and RA ended up giving both the same physical register. We should analyze it and see whether it is yet another optimization opportunity.

Comment on lines -345 to -349
; SI-NEXT: s_mov_b64 s[2:3], 0
; SI-NEXT: s_andn2_b64 vcc, exec, s[2:3]
; SI-NEXT: s_waitcnt lgkmcnt(0)
; SI-NEXT: s_mov_b64 vcc, vcc
; SI-NEXT: s_cbranch_vccnz .LBB7_3

Did you analyze this code change? The code looks much simpler in the late wave transform flow.

Comment on lines +10 to +14
; CHECK-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
; CHECK-NEXT: s_xor_b64 s[6:7], vcc, exec
; CHECK-NEXT: s_xor_b64 s[4:5], exec, s[6:7]
; CHECK-NEXT: s_mov_b64 exec, s[6:7]
; CHECK-NEXT: ; divergent control-flow edge

This is a classic case where we can explore whether it is feasible to introduce the s_and_saveexec pattern instead of the 4-MI idiom we currently emit at the divergent conditional block.
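
As a rough illustration of that comparison, the sketch below models the EXEC arithmetic of the quoted block with Python integers and sets it next to an s_and_saveexec-style sequence. It assumes the compare writes vcc bits only for active lanes, and that the alternative would use the inverted compare followed by an s_xor to recover the deferred lanes. The function names and mask values are hypothetical, and nothing here reflects what the pass would actually emit.

```python
MASK64 = (1 << 64) - 1

def v_cmp(exec_mask: int, lane_pred: int) -> int:
    """Vector compare: a result bit is set only for active lanes."""
    return lane_pred & exec_mask & MASK64

def four_mi_idiom(exec_mask: int, vcc: int):
    """The quoted sequence after v_cmp_eq_u32."""
    s67 = (vcc ^ exec_mask) & MASK64   # s_xor_b64 s[6:7], vcc, exec
    s45 = (exec_mask ^ s67) & MASK64   # s_xor_b64 s[4:5], exec, s[6:7]
    new_exec = s67                     # s_mov_b64 exec, s[6:7]
    return new_exec, s45

def saveexec_variant(exec_mask: int, vcc_inverted: int):
    """s_and_saveexec_b64 on the inverted condition, then s_xor_b64."""
    saved = exec_mask                              # sdst = exec
    new_exec = (vcc_inverted & exec_mask) & MASK64  # exec = ssrc & exec
    deferred = (saved ^ new_exec) & MASK64         # s_xor_b64 sdst, exec, sdst
    return new_exec, deferred

exec_mask = 0x0F            # hypothetical: lanes 0..3 active
pred_eq = 0xF5              # per-lane "v0 == 0", including inactive lanes
vcc_eq = v_cmp(exec_mask, pred_eq)              # 0x05
vcc_ne = v_cmp(exec_mask, ~pred_eq & MASK64)    # 0x0A

idiom = four_mi_idiom(exec_mask, vcc_eq)
variant = saveexec_variant(exec_mask, vcc_ne)
print(idiom, variant)
```

In this model both sequences leave the same lanes active and park the same lanes for later, so the open question is one of instruction count and scheduling rather than semantics.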

@vg0204
Author

vg0204 commented May 19, 2026

@cdevadas, @lalaniket8, @TejaX-Alaghari

I would like to outline the procedure I followed to process the above tests (it should also show my current understanding of the new divergent code in relation to your questions, CD):

  1. Using Cursor, I regenerated all 70 test files with the new LWT flag set.
  2. Then I handpicked the tests in this PR, choosing those with minimal CFG changes (or large changes that follow a repetitive pattern).
  3. Then I fed Cursor the wave-transform rule file and the AMDGPU codegen rule file (which I created).
  4. Based on the context from step 3, I asked it to perform semantic equivalence checks.
  5. After a positive result, I posted these tests.

So, to answer your questions, I need to start looking into the diffs now. Since I was waiting on Aniket's basic pattern tests, I thought of using the resources at hand to process some simple tests.

@lalaniket8

Please don't wait on my tests; go ahead and check the diffs in detail.

; GCN_DBG-NEXT: s_cbranch_scc1 .LBB3_1
; GCN_DBG-NEXT: s_branch .LBB3_2
; GCN_DBG-NEXT: s_cbranch_scc1 .LBB3_2
; GCN_DBG-NEXT: s_branch .LBB3_1

@TejaX-Alaghari TejaX-Alaghari May 19, 2026

Is this flip in branch polarity intended?

Aren't they supposed to be the same as in St-WT? (unless cbranch_scc1 became cbranch_scc0)
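
A small sketch of the point, assuming the first pair of quoted check lines is the old output and the second pair is the new one (the diff markers are not visible here), and that the labels name the same blocks in both versions. The function names are made up for illustration.

```python
def old_tail(scc: bool) -> str:
    # s_cbranch_scc1 .LBB3_1 ; s_branch .LBB3_2
    return ".LBB3_1" if scc else ".LBB3_2"

def new_tail(scc: bool) -> str:
    # s_cbranch_scc1 .LBB3_2 ; s_branch .LBB3_1
    return ".LBB3_2" if scc else ".LBB3_1"

# With the same SCC value the two tails pick different successors.
differs = all(old_tail(s) != new_tail(s) for s in (False, True))

# They agree only if the instruction producing SCC was inverted as well.
agrees_if_inverted = all(old_tail(s) == new_tail(not s) for s in (False, True))

print(differs, agrees_if_inverted)
```

So under these assumptions the swapped targets are equivalent only when the SCC-defining compare changed polarity too, which is worth confirming in the surrounding check lines.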
