[WaveTransform] Update FileCheck pattern tests with basic divergent CFG #2539
vg0204 wants to merge 1 commit into
PSDB Build Link: http://mlse-bdc-20dd129:8065/#/builders/10/builds/347
| ; GFX10-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v4 | ||
| ; GFX10-NEXT: v_mov_b32_e32 v4, 0 | ||
| ; GFX10-NEXT: s_xor_b32 s5, vcc_lo, exec_lo | ||
| ; GFX10-NEXT: s_xor_b32 s4, exec_lo, s5 | ||
| ; GFX10-NEXT: s_mov_b32 exec_lo, s5 | ||
| ; GFX10-NEXT: ; divergent control-flow edge |
Have you spent some time understanding the 2-line code at the LHS compared to this block of code?
| ; GFX1010-NEXT: v_cndmask_b32_e64 v0, 0, 1, s4 | ||
| ; GFX1010-NEXT: v_cmp_ne_u32_e64 s4, 1, v0 |
This cndmask and the compare are optimized away on the right-hand side. Can you understand why this happens in the new flow?
| ; GFX1010-NEXT: s_cbranch_vccz .LBB0_4 | ||
| ; GFX1010-NEXT: .LBB0_2: ; %.a | ||
| ; GFX1010-NEXT: ; =>This Inner Loop Header: Depth=1 | ||
| ; GFX1010-NEXT: s_and_b32 vcc_lo, exec_lo, s4 |
Also, this s_and with exec is missing. In the new codegen, it is just a mov of s4 to vcc.
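To make the concern about the dropped `s_and` concrete, here is a minimal Python sketch of a wave32 mask model. It assumes, hypothetically, that `s4` could carry set bits in lanes that are inactive in `exec_lo`; whether that can actually happen in the new flow is exactly what needs checking. The function names are illustrative and are not LLVM APIs.

```python
WAVE32_MASK = 0xFFFFFFFF  # one bit per lane in wave32


def vcc_with_and(exec_lo: int, s4: int) -> int:
    """Models `s_and_b32 vcc_lo, exec_lo, s4`."""
    return exec_lo & s4 & WAVE32_MASK


def vcc_with_mov(s4: int) -> int:
    """Models `s_mov_b32 vcc_lo, s4`."""
    return s4 & WAVE32_MASK


def branch_vccz_taken(vcc: int) -> bool:
    """`s_cbranch_vccz` branches when VCC is zero."""
    return vcc == 0


exec_lo = 0b0011   # lanes 0 and 1 active
s4_clean = 0b0001  # subset of exec: both forms agree
s4_stray = 0b0100  # hypothetical: bit set only in an inactive lane

agree_when_clean = (
    branch_vccz_taken(vcc_with_and(exec_lo, s4_clean))
    == branch_vccz_taken(vcc_with_mov(s4_clean))
)
differ_when_stray = (
    branch_vccz_taken(vcc_with_and(exec_lo, s4_stray))
    != branch_vccz_taken(vcc_with_mov(s4_stray))
)
```

Under this model the plain mov is only a safe replacement if `s4` is guaranteed to be a subset of `exec_lo` at that point.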
| ; CHECK-LABEL: phi_with_alloca_and_divergent_copy_to_reg: | ||
| ; CHECK: ; %bb.0: ; %entry | ||
| ; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) | ||
| ; CHECK-NEXT: s_mov_b64 s[4:5], -1 |
This is dead because s[4:5] is redefined to 0 three instructions below in the same BB. I assume they were different virtual registers (ACC registers), one initialized to -1 and the other to 0. Later, the one initialized with allOnes turned out to be dead, and RA ended up giving both the same physical register. We should analyze it and see whether it is yet another optimization opportunity.
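The dead-definition reasoning can be sketched as a backward liveness scan over one basic block. This is a simplified Python model, not the LLVM implementation: it ignores side effects and sub-register overlap, and the two middle instructions and the live-out set are placeholders, since the real ones are not shown in the snippet.

```python
from typing import List, Set, Tuple

# (instruction text, registers defined, registers used)
Inst = Tuple[str, Set[str], Set[str]]


def dead_defs(block: List[Inst], live_out: Set[str]) -> List[str]:
    """Return instructions whose every defined register is overwritten or unused."""
    live = set(live_out)
    dead: List[str] = []
    for text, defs, uses in reversed(block):
        if defs and not (defs & live):
            # Nothing reads this definition before it is clobbered.
            dead.append(text)
            continue  # a dead instruction does not keep its operands live
        live -= defs
        live |= uses
    return list(reversed(dead))


block: List[Inst] = [
    ("s_mov_b64 s[4:5], -1", {"s[4:5]"}, set()),
    ("placeholder_a", {"v1"}, {"v0"}),   # hypothetical instruction
    ("placeholder_b", {"v2"}, {"v1"}),   # hypothetical instruction
    ("s_mov_b64 s[4:5], 0", {"s[4:5]"}, set()),
]

result = dead_defs(block, live_out={"s[4:5]", "v2"})
```

In this model only the `-1` initialization is reported, which matches the observation that the redefinition in the same block kills it.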
| ; SI-NEXT: s_mov_b64 s[2:3], 0 | ||
| ; SI-NEXT: s_andn2_b64 vcc, exec, s[2:3] | ||
| ; SI-NEXT: s_waitcnt lgkmcnt(0) | ||
| ; SI-NEXT: s_mov_b64 vcc, vcc | ||
| ; SI-NEXT: s_cbranch_vccnz .LBB7_3 |
Did you analyze this code change? The code looks much simpler in the late wave transform flow.
| ; CHECK-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0 | ||
| ; CHECK-NEXT: s_xor_b64 s[6:7], vcc, exec | ||
| ; CHECK-NEXT: s_xor_b64 s[4:5], exec, s[6:7] | ||
| ; CHECK-NEXT: s_mov_b64 exec, s[6:7] | ||
| ; CHECK-NEXT: ; divergent control-flow edge |
This is a classic case where we can explore whether it is feasible to introduce the s_and_saveexec pattern instead of the 4-MI idiom we currently emit at the divergent conditional block.
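As a rough way to reason about that replacement, the following Python sketch models the scalar part of the sequence above and an `s_and_saveexec_b64` alternative as 64-bit lane masks. It assumes `vcc` is a subset of `exec` (a VALU compare only sets bits for active lanes) and that the compare could be inverted to feed `s_and_saveexec`; both are assumptions for illustration, and the helper names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def four_mi_idiom(exec_mask: int, vcc: int):
    """Models the three scalar ops that follow the v_cmp."""
    s67 = (vcc ^ exec_mask) & MASK64   # s_xor_b64 s[6:7], vcc, exec
    s45 = (exec_mask ^ s67) & MASK64   # s_xor_b64 s[4:5], exec, s[6:7]
    new_exec = s67                     # s_mov_b64 exec, s[6:7]
    return new_exec, s45


def and_saveexec(exec_mask: int, src: int):
    """Models s_and_saveexec_b64 sdst, src: sdst = exec; exec = src & exec."""
    saved = exec_mask
    new_exec = src & exec_mask & MASK64
    return new_exec, saved


exec_mask = 0b1111
vcc = 0b0101  # lanes where the compare was true (subset of exec)

idiom_exec, idiom_rest = four_mi_idiom(exec_mask, vcc)

# Assumed inverted compare: lanes where the condition is false.
inverted = exec_mask & ~vcc
save_exec, saved = and_saveexec(exec_mask, inverted)
# The deferred lanes are recovered later with one xor against the saved mask.
save_rest = saved ^ save_exec
```

In this model both forms leave the same lanes active and defer the same lanes; the `s_and_saveexec` form trades the two up-front xors for one xor at the point where the deferred lanes are needed.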
@cdevadas, @lalaniket8, @TejaX-Alaghari I would like to outline the procedure I followed to process the above tests (it also reflects my current understanding of the new divergent code, in relation to your questions, CD):
So, to answer your questions, I need to start looking into them now. Since I was waiting on Aniket's basic pattern tests, I thought I would use the resources at hand to process some simple tests.
Please don't wait on my tests; please proceed with checking the diffs in detail.
| ; GCN_DBG-NEXT: s_cbranch_scc1 .LBB3_1 | ||
| ; GCN_DBG-NEXT: s_branch .LBB3_2 | ||
| ; GCN_DBG-NEXT: s_cbranch_scc1 .LBB3_2 | ||
| ; GCN_DBG-NEXT: s_branch .LBB3_1 |
Is this flip in branch polarity intended?
Aren't they supposed to be the same as in St-WT (unless cbranch_scc1 became cbranch_scc0)?
This patch updates the LIT tests with basic divergent CFG. They exhibit similar scalar manipulation around EXEC for control-flow navigation, which is semantically equivalent to the structurized version of the CFG.
The updated tests are: