[WaveTransform] Update filecheck pattern tests with basic divergent CGF by vg0204 · Pull Request #2539 · ROCm/llvm-project

vg0204 · 2026-05-14T11:31:05Z

This patch updates the LIT tests that contain basic divergent CFG. The new output shows similar scalar manipulation around EXEC for control-flow navigation, and it is semantically equivalent to the structurized version of the CFG.

The updated tests are:

llvm/test/CodeGen/AMDGPU/cf-loop-on-constant.ll
llvm/test/CodeGen/AMDGPU/cgp-addressing-modes-flat.ll
llvm/test/CodeGen/AMDGPU/cgp-addressing-modes-gfx1030.ll
llvm/test/CodeGen/AMDGPU/coalescer_distribute.ll
llvm/test/CodeGen/AMDGPU/coexec-scheduler.ll
llvm/test/CodeGen/AMDGPU/combine-add-zext-xor.ll
llvm/test/CodeGen/AMDGPU/copy-to-reg-frameindex.ll
llvm/test/CodeGen/AMDGPU/cse-convergent.ll
llvm/test/CodeGen/AMDGPU/ctpop64.ll
llvm/test/CodeGen/AMDGPU/dag-divergence-atomic.ll
llvm/test/CodeGen/AMDGPU/dagcombine-v1i8-extractvecelt-crash.ll

z1-cciauto · 2026-05-14T11:36:30Z

PSDB Build Link: http://mlse-bdc-20dd129:8065/#/builders/10/builds/347

cdevadas · 2026-05-15T12:21:49Z

+; GFX10-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v4
+; GFX10-NEXT:    v_mov_b32_e32 v4, 0
+; GFX10-NEXT:    s_xor_b32 s5, vcc_lo, exec_lo
+; GFX10-NEXT:    s_xor_b32 s4, exec_lo, s5
+; GFX10-NEXT:    s_mov_b32 exec_lo, s5
+; GFX10-NEXT:    ; divergent control-flow edge


Have you spent some time understanding the 2-line code at the LHS compared to this block of code?
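
For reference while comparing the two sides, here is a minimal sketch of what the four quoted instructions compute on wave32 lane masks. It assumes that `v_cmp` only sets VCC bits for lanes that are active in EXEC; the function name `divergent_edge` and the sample masks are illustrative and not taken from the tests.

```python
MASK32 = 0xFFFFFFFF  # wave32: one bit per lane


def divergent_edge(exec_lo, cond):
    """Model the quoted GFX10 sequence on 32-bit lane masks."""
    # v_cmp_eq_u32_e32 vcc_lo, 0, v4 (assumption: inactive lanes yield 0 bits)
    vcc_lo = cond & exec_lo & MASK32
    # s_xor_b32 s5, vcc_lo, exec_lo -> active lanes where the compare failed
    s5 = (vcc_lo ^ exec_lo) & MASK32
    # s_xor_b32 s4, exec_lo, s5 -> active lanes where the compare passed
    s4 = (exec_lo ^ s5) & MASK32
    # s_mov_b32 exec_lo, s5
    new_exec = s5
    return new_exec, s4


exec_lo = 0x0000FFFF  # hypothetical: lower 16 lanes active
cond = 0x00FF00FF     # hypothetical per-lane compare result
new_exec, s4 = divergent_edge(exec_lo, cond)
print(hex(new_exec), hex(s4))
```

Under that assumption the two XORs split the active lanes into two disjoint sets: EXEC keeps the lanes where the compare failed, and `s4` holds the lanes where it passed.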

cdevadas · 2026-05-15T12:46:06Z

-; GFX1010-NEXT:    v_cndmask_b32_e64 v0, 0, 1, s4
-; GFX1010-NEXT:    v_cmp_ne_u32_e64 s4, 1, v0


The cndmask and the compare are optimized away on the right-hand side. Can you explain why this happens in the new flow?

cdevadas · 2026-05-15T12:47:15Z

 ; GFX1010-NEXT:    s_cbranch_vccz .LBB0_4
 ; GFX1010-NEXT:  .LBB0_2: ; %.a
 ; GFX1010-NEXT:    ; =>This Inner Loop Header: Depth=1
-; GFX1010-NEXT:    s_and_b32 vcc_lo, exec_lo, s4


Also, this s_and with exec is missing. In the new codegen, it is just a mov of s4 to vcc.
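
A small sketch of when the plain mov would match the dropped `s_and_b32`: only if `s4` carries no bits for lanes that are inactive in EXEC. Whether the new flow guarantees that is exactly the open question here; the helper names and masks below are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # wave32: one bit per lane


def s_and_b32(a, b):
    """Bitwise AND on 32-bit lane masks, as s_and_b32 would compute."""
    return a & b & MASK32


def mov_matches_and(exec_lo, s4):
    """True when 'mov vcc_lo, s4' gives the same VCC as 's_and_b32 vcc_lo, exec_lo, s4'."""
    return s_and_b32(exec_lo, s4) == (s4 & MASK32)


exec_lo = 0x0000000F                             # hypothetical: four active lanes
clean = mov_matches_and(exec_lo, 0x00000003)     # s4 only has active-lane bits
stale = mov_matches_and(exec_lo, 0x00000013)     # s4 has a bit for an inactive lane
print(clean, stale)
```

Since `s_cbranch_vccz` tests whether VCC is zero, a stale bit from an inactive lane would be enough to change the branch outcome.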

cdevadas · 2026-05-15T12:54:37Z

 ; CHECK-LABEL: phi_with_alloca_and_divergent_copy_to_reg:
 ; CHECK:       ; %bb.0: ; %entry
 ; CHECK-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; CHECK-NEXT:    s_mov_b64 s[4:5], -1


This is dead because s[4:5] is redefined to 0 three instructions below in the same BB. I assume they were different virtual registers (ACC registers), one initialized to -1 and the other to 0. Later, the one initialized with allOnes turned out to be dead, and RA ended up giving both the same physical register. We should analyze it and see whether it is yet another optimization opportunity.

cdevadas · 2026-05-15T13:04:33Z

-; SI-NEXT:    s_mov_b64 s[2:3], 0
-; SI-NEXT:    s_andn2_b64 vcc, exec, s[2:3]
-; SI-NEXT:    s_waitcnt lgkmcnt(0)
-; SI-NEXT:    s_mov_b64 vcc, vcc
-; SI-NEXT:    s_cbranch_vccnz .LBB7_3


Did you analyze this code change? The code looks much simpler in the late wave transform flow.

cdevadas · 2026-05-15T13:09:19Z

+; CHECK-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
+; CHECK-NEXT:    s_xor_b64 s[6:7], vcc, exec
+; CHECK-NEXT:    s_xor_b64 s[4:5], exec, s[6:7]
+; CHECK-NEXT:    s_mov_b64 exec, s[6:7]
+; CHECK-NEXT:    ; divergent control-flow edge


This is a classic case where we can explore whether it is feasible to introduce the s_and_saveexec pattern instead of the 4-MI idiom we currently emit at the divergent conditional block.
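
To make the comparison concrete, here is a rough sketch of both idioms on wave64 lane masks. It assumes `s_and_saveexec_b64 sdst, ssrc` saves EXEC into `sdst` and then sets EXEC to `ssrc & EXEC`, and that the compare would be emitted with inverted polarity so that EXEC keeps the same lanes. The function names and masks are illustrative only.

```python
MASK64 = (1 << 64) - 1  # wave64: one bit per lane


def four_mi_idiom(exec_mask, vcc):
    """Current sequence: v_cmp (not modelled), two s_xor_b64, one s_mov_b64."""
    s67 = (vcc ^ exec_mask) & MASK64   # s_xor_b64 s[6:7], vcc, exec
    s45 = (exec_mask ^ s67) & MASK64   # s_xor_b64 s[4:5], exec, s[6:7]
    return s67, s45                    # s_mov_b64 exec, s[6:7]


def saveexec_idiom(exec_mask, cond):
    """Candidate sequence: s_and_saveexec_b64 followed by one s_xor_b64."""
    saved = exec_mask                    # s_and_saveexec_b64 s[4:5], cond
    new_exec = cond & exec_mask & MASK64
    rest = (new_exec ^ saved) & MASK64   # s_xor_b64 s[4:5], exec, s[4:5]
    return new_exec, rest


exec_mask = 0xF0F0                 # hypothetical active lanes
vcc = 0x3030                       # hypothetical compare result, subset of exec
inverted = ~vcc & MASK64           # what an inverted compare would feed in
current = four_mi_idiom(exec_mask, vcc)
candidate = saveexec_idiom(exec_mask, inverted)
print(current == candidate)
```

In this model both sequences leave the same EXEC and the same saved mask, so the open question is mostly whether the inverted compare and the extra SCC definition of `s_and_saveexec` fit the late wave transform flow.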

vg0204 · 2026-05-19T06:14:52Z

@cdevadas, @lalaniket8, @TejaX-Alaghari

I would like to outline the procedure I followed to process the tests above (it also shows my current understanding of the new divergent code in relation to your questions, CD):

1. Using Cursor, I regenerated all 70 test files with the new LWT flag set.
2. I then handpicked the tests in this PR based on minimal CFG changes (or large changes with a repetitive pattern).
3. I then fed Cursor the wavetransform rule file and the AMDGPU codegen rule file (this one created by me).
4. Based on the context from step 3, I asked it to perform semantic equivalence checks.
5. After a positive result, I posted these tests.

So, to answer your questions, I need to start looking into the diffs now. Since I was waiting on Aniket's basic pattern tests, I thought of using the resources at hand to process some simple tests.

lalaniket8 · 2026-05-19T06:17:36Z

Please don't wait on my tests; go ahead and check the diffs in detail.

TejaX-Alaghari · 2026-05-19T06:19:48Z

-; GCN_DBG-NEXT:    s_cbranch_scc1 .LBB3_1
-; GCN_DBG-NEXT:    s_branch .LBB3_2
+; GCN_DBG-NEXT:    s_cbranch_scc1 .LBB3_2
+; GCN_DBG-NEXT:    s_branch .LBB3_1


Is this flip in branch polarity intended?

Aren't they supposed to be the same as in St-WT (unless cbranch_scc1 became cbranch_scc0)?
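
A tiny model of the two branch pairs in the diff, to show why the swap alone changes behaviour. It treats the labels as fixed block identities, which is an assumption: if block numbering or layout also changed, the labels may not refer to the same blocks on both sides.

```python
def successor(scc, taken, fallthrough):
    """Model 's_cbranch_scc1 <taken>' followed by 's_branch <fallthrough>'."""
    return taken if scc else fallthrough


def old_flow(scc):
    return successor(scc, ".LBB3_1", ".LBB3_2")


def new_flow(scc):
    return successor(scc, ".LBB3_2", ".LBB3_1")


# Same SCC value: the two sequences always disagree.
same_polarity = [old_flow(s) == new_flow(s) for s in (False, True)]
# Inverted SCC value: the two sequences always agree.
inverted_polarity = [old_flow(s) == new_flow(not s) for s in (False, True)]
print(same_polarity, inverted_polarity)
```

So, with identical labels, the new pair is only equivalent if the instruction that defines SCC was inverted as well.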

[WaveTransform] fix tests for late wave transform with basic divergen…

97c2673

…t flow

vg0204 requested review from TejaX-Alaghari, cdevadas and lalaniket8 May 14, 2026 11:31

vg0204 changed the title ~~[WaveTransform] update filecheck pattern tests with basic divergent CGF~~ [WaveTransform] Update filecheck pattern tests with basic divergent CGF May 14, 2026

cdevadas reviewed May 15, 2026

TejaX-Alaghari reviewed May 19, 2026

vg0204 wants to merge 1 commit into amd-feature/wave-transform from amd/dev/gvikash/fileCheck-tests-basic-divergent-CFG
