[HLK] Adding WaveMatch Long Vector Test #7937

joaosaffran · 2025-11-20T19:43:42Z

This patch adds support for WaveMatch to the HLK Long Vector Tests.
Tested with wave sizes: 4, 8, 16, 32, 64, 128.

Closes: #7613

alsepkow · 2025-11-20T19:46:06Z

Were you able to validate these? I would expect these to fail on WARP right now.

alsepkow · 2025-11-22T00:15:06Z

tools/clang/unittests/HLSLExec/ShaderOpArith.xml

+            }
+            uint4 result = WaveMatch(Vector);
+            uint index = WaveGetLaneIndex() * 4;
+            if(index < NUM)


I'm not sure what the intent of using NUM was here.
NUM describes the number of elements in the input Vector, which isn't related to the number of lanes in the wave.
I think what you want to do is store the mask from each lane. That is, your output should have WaveGetLaneCount() elements, each of them a uint4. In this test case you would expect two unique masks: one for the group that contains only lane 0, and one for the group that contains all other lanes.

That would greatly simplify your logic for computing expected values as well.
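
To make the "one mask per lane" idea concrete, here is a minimal host-side sketch of how expected values could be computed. It is not code from this PR: `LaneMask` and `referenceWaveMatch` are hypothetical names, and the sketch assumes every lane is active and the wave has at most 128 lanes.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side reference, not code from the PR.
// One 128-bit mask per lane, split into four 32-bit words (x, y, z, w).
using LaneMask = std::array<uint32_t, 4>;

// For each lane I, set bit J when lane J holds the same value as lane I.
// Assumes all lanes are active and the wave has at most 128 lanes.
template <typename T>
std::vector<LaneMask> referenceWaveMatch(const std::vector<T> &LaneValues) {
  assert(LaneValues.size() <= 128 && "mask only has 128 bits");
  std::vector<LaneMask> Masks(LaneValues.size(), LaneMask{0, 0, 0, 0});
  for (size_t I = 0; I < LaneValues.size(); ++I)
    for (size_t J = 0; J < LaneValues.size(); ++J)
      if (LaneValues[I] == LaneValues[J])
        Masks[I][J / 32] |= 1u << (J % 32);
  return Masks;
}
```

With eight lanes where only lane 0 holds a different value, this yields two distinct masks: bit 0 alone for lane 0, and bits 1 through 7 for every other lane.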

Yeah, I don't need NUM check here. Thanks for pointing out.

The output stores the result mask for each lane. The mask is 128 bits long, so I am breaking it into groups of 32 bits. I do this so that I can inspect all the bits and make sure the correct ones are being set.

If I store it as a uint4, like here:

g_OutputVector.Store<uint>(WaveGetLaneIndex() * sizeof(uint4), result);

Some bits get truncated to 0, which is not the correct value: all but one lane will have a different value, and the number I expect here changes if I change which lane has the different vector. Instead, to make sure the values are correct, I need to store each component of the WaveMatch result mask as a uint, like so:

uint index = WaveGetLaneIndex() * 4;
g_OutputVector.Store<uint>(index * sizeof(uint), result.x);
g_OutputVector.Store<uint>((index + 1) * sizeof(uint), result.y);
g_OutputVector.Store<uint>((index + 2) * sizeof(uint), result.z);
g_OutputVector.Store<uint>((index + 3) * sizeof(uint), result.w);

Maybe a typo in your response, but shouldn't it be .Store<uint4>? I wouldn't expect that to truncate anything.

tools/clang/unittests/HLSLExec/LongVectors.cpp

inbelic · 2025-11-24T21:23:15Z

From a pair review with Alex: we were concerned that the WaveMatch spec does not specifically detail how bits for unused lanes are handled, so we should probably assume that it is undefined behaviour. For the sake of testing, then, we shouldn't check that those bits are set to 0.

So we might want/need to have the wave size always be 32 and only check the first component. However, we weren't sure off the top of our heads whether that is easily possible.

Do drivers ignore those bits or set them to zero?

tools/clang/unittests/HLSLExec/LongVectors.cpp

damyanp · 2025-11-24T21:33:49Z

...we were concerned that the WaveMatch spec does not specifically detail how bits for unused lanes are handled.

It looks to me that the spec does specify this (they're set to 0):

"Bits in the mask corresponding to inactive lanes, or at positions beyond current implementation’s wave width, will contribute 0’s."

So we might want/need to have the wave size always be 32

This isn't possible, not all devices support wave size == 32. In D3D we need to support 4, 8, 16, 32, 64 and 128 wave sizes. (Vulkan also supports 1 and 2).
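
As an illustration of what the quoted spec sentence lets a test assert across wave sizes, the sketch below checks that a 128-bit mask has no bits set at positions beyond the wave width. It is a hedged example only: `hasNoBitsBeyondWave` is a hypothetical helper, not something in LongVectors.cpp, and the mask is assumed to be laid out as four 32-bit words with lane 0 in the lowest bit of the first word.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of the PR.
// Returns true when no bit at position >= WaveSize is set in the mask.
// Assumes word 0 holds lanes 0..31, word 1 holds lanes 32..63, and so on.
bool hasNoBitsBeyondWave(const std::array<uint32_t, 4> &Mask,
                         unsigned WaveSize) {
  assert(WaveSize <= 128 && "mask only has 128 bits");
  for (unsigned Bit = WaveSize; Bit < 128; ++Bit)
    if (Mask[Bit / 32] & (1u << (Bit % 32)))
      return false;
  return true;
}
```

Because the check is driven by the wave size rather than hard-coded to 32 lanes, the same assertion works for 4, 8, 16, 32, 64 and 128.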

tools/clang/unittests/HLSLExec/ShaderOpArith.xml

inbelic · 2025-11-24T21:42:12Z

"Bits in the mask corresponding to inactive lanes, or at positions beyond current implementation’s wave width, will contribute 0’s."

Ah skipped over that sentence... thanks. The last sentence tripped me up as I assumed it was only defined for pixel shaders.

We can disregard needing 32 lanes then, since we can expect those bits to be 0 in the output.

joaosaffran · 2025-11-25T02:13:24Z

tools/clang/unittests/HLSLExec/LongVectorOps.def

 OP(Wave, WaveMultiPrefixBitAnd, 1, "TestWaveMultiPrefixBitAnd", "", " -DFUNC_WAVE_MULTI_PREFIX_BIT_AND=1 -DIS_WAVE_PREFIX_OP=1", "LongVectorOp", WaveMultiPrefixBitwise, Default2, Default3)
 OP(Wave, WaveMultiPrefixBitOr, 1, "TestWaveMultiPrefixBitOr", "", " -DFUNC_WAVE_MULTI_PREFIX_BIT_OR=1 -DIS_WAVE_PREFIX_OP=1", "LongVectorOp", WaveMultiPrefixBitwise, Default2, Default3)
 OP(Wave, WaveMultiPrefixBitXor, 1, "TestWaveMultiPrefixBitXor", "", " -DFUNC_WAVE_MULTI_PREFIX_BIT_XOR=1 -DIS_WAVE_PREFIX_OP=1", "LongVectorOp", WaveMultiPrefixBitwise, Default2, Default3)
+OP_DEFAULT_DEFINES(Wave, WaveMatch, 1, "TestWaveMatch", "", " -DFUNC_WAVE_MATCH=1 -DIS_WAVE_PREFIX_OP=1")


-DIS_WAVE_PREFIX_OP=1 is required: the test function returns void because it writes to the out vector itself instead of delegating that to main.

tools/clang/unittests/HLSLExec/LongVectors.cpp

damyanp · 2025-11-25T17:26:47Z

tools/clang/unittests/HLSLExec/LongVectors.cpp

+    uint64_t LowWaveShift = (LowWaves < 64) ? (1ULL << LowWaves) : 0;
+    uint64_t HighWaveShift = (HighWaves < 64) ? (1ULL << HighWaves) : 0;
+
+    uint64_t result[2] = {(LowWaveShift - 1 & ~1ULL),


We need to follow LLVM naming conventions.

Suggested change

-    uint64_t result[2] = {(LowWaveShift - 1 & ~1ULL),
+    uint64_t Result[2] = {(LowWaveShift - 1 & ~1ULL),

(note that this is the reason I'm requesting changes - but I really think that this code needs to go away!)

damyanp

We need to follow LLVM naming conventions.

I also have feedback on how some of the calculations are structured that I really think you should take.

damyanp · 2025-11-25T17:30:42Z

tools/clang/unittests/HLSLExec/LongVectors.cpp

+    uint64_t LowWaveShift = (LowWaves < 64) ? (1ULL << LowWaves) : 0;
+    uint64_t HighWaveShift = (HighWaves < 64) ? (1ULL << HighWaves) : 0;
+
+    uint64_t result[2] = {(LowWaveShift - 1 & ~1ULL),
+                          (HighWaveShift - 1 & ~0ULL)};


I've been trying to nudge you in (what I think is) the right direction here. It really bothers me that the "build the mask" and the "what we expect the result to be" are mangled into the same expression. This is my recommended approach that separates the concerns:

Suggested change

-    uint64_t LowWaveShift = (LowWaves < 64) ? (1ULL << LowWaves) : 0;
-    uint64_t HighWaveShift = (HighWaves < 64) ? (1ULL << HighWaves) : 0;
-
-    uint64_t result[2] = {(LowWaveShift - 1 & ~1ULL),
-                          (HighWaveShift - 1 & ~0ULL)};
+    uint64_t LowWaveMask = ((LowWaves < 64) ? (1ULL << LowWaves) : 0) - 1;
+    uint64_t HighWaveMask = ((HighWaves < 64) ? (1ULL << HighWaves) : 0) - 1;
+    uint64_t LowExpected = ~1ULL & LowWaveMask;
+    uint64_t HighExpected = ~0ULL & HighWaveMask;

IMO, separating it like this makes it clearer what the different calculations are.

Although this is take-it-or-leave-it, I do really think you should factor it this way!
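
For readers following along, here is a self-contained sketch of the factoring suggested above, with a few wave sizes worked through. It rests on assumptions not stated in the thread: that LowWaves and HighWaves are the lane counts falling in the lower and upper 64 bits of the mask, and that lane 0 is the lane holding the different vector. `ExpectedMask`, `lowBits` and `expectedForOtherLanes` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical names throughout; a sketch of the suggested factoring only.
struct ExpectedMask {
  uint64_t Low;
  uint64_t High;
};

// Bits [0, Count) set, for Count in [0, 64]. The ternary avoids shifting a
// 64-bit value by 64, which is undefined; 0 - 1 wraps to all ones instead.
uint64_t lowBits(unsigned Count) {
  assert(Count <= 64);
  return ((Count < 64) ? (1ULL << Count) : 0) - 1;
}

// Expected WaveMatch mask for any lane other than lane 0, assuming lane 0
// alone holds a different value and every lane in the wave is active.
ExpectedMask expectedForOtherLanes(unsigned WaveSize) {
  assert(WaveSize >= 1 && WaveSize <= 128);
  // Assumption: lane counts in the lower and upper 64 bits of the mask.
  unsigned LowWaves = std::min(WaveSize, 64u);
  unsigned HighWaves = WaveSize - LowWaves;
  // Step 1: build the masks of lanes that exist.
  uint64_t LowWaveMask = lowBits(LowWaves);
  uint64_t HighWaveMask = lowBits(HighWaves);
  // Step 2: state what we expect (everything except lane 0), then clip it.
  return {~1ULL & LowWaveMask, ~0ULL & HighWaveMask};
}
```

For a wave size of 4 this gives a low half of 0xE and a high half of 0; for 128 the low half is all ones except bit 0 and the high half is all ones.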

joaosaffran added 3 commits November 19, 2025 18:25

init test

68ae80f

update test

fec685c

clean up

42a3127

joaosaffran requested review from alsepkow and damyanp November 20, 2025 19:43

github-project-automation bot added this to HLSL Roadmap Nov 20, 2025

github-project-automation bot moved this to New in HLSL Roadmap Nov 20, 2025

damyanp requested changes Nov 20, 2025

github-project-automation bot moved this from New to In progress in HLSL Roadmap Nov 20, 2025

joaosaffran marked this pull request as draft November 21, 2025 18:48

joaosaffran added 4 commits November 21, 2025 15:29

update test

997405c

fixing test

40c3814

clean up

30709ab

clean up 2

74b20df

joaosaffran marked this pull request as ready for review November 21, 2025 23:59

joaosaffran requested a review from damyanp November 21, 2025 23:59

alsepkow reviewed Nov 22, 2025

tools/clang/unittests/HLSLExec/LongVectors.cpp (outdated, resolved)

joaosaffran requested a review from alsepkow November 24, 2025 19:34

clean test

77bc0fe

damyanp reviewed Nov 24, 2025

tools/clang/unittests/HLSLExec/LongVectors.cpp (outdated, resolved)

damyanp reviewed Nov 24, 2025

tools/clang/unittests/HLSLExec/LongVectors.cpp (resolved)

damyanp reviewed Nov 24, 2025

tools/clang/unittests/HLSLExec/LongVectors.cpp (outdated, resolved)

damyanp reviewed Nov 24, 2025

tools/clang/unittests/HLSLExec/ShaderOpArith.xml (outdated, resolved)

joaosaffran added 2 commits November 24, 2025 17:06

improve text

53aa9a2

format

f1944d3

joaosaffran added 5 commits November 24, 2025 17:55

fix redundant masking

4b692ce

Merge branch 'main' into test/wave-match

1224e9a

remove oopsi

1fd9a33

remove oopsi

c74ccd9

adding flag back

f976a91

joaosaffran commented Nov 25, 2025

joaosaffran requested a review from damyanp November 25, 2025 02:14

joaosaffran added 2 commits November 24, 2025 18:57

clean up

8d942fb

format

9e3f3f1

damyanp reviewed Nov 25, 2025

tools/clang/unittests/HLSLExec/LongVectors.cpp (outdated, resolved)

joaosaffran-zz added 2 commits November 25, 2025 09:19

remove unused parameter

9ca3790

renaming vars

f8869cf

joaosaffran requested a review from damyanp November 25, 2025 17:22

damyanp reviewed Nov 25, 2025

damyanp requested changes Nov 25, 2025

joaosaffran-zz added 3 commits November 25, 2025 10:26

change code

3efdb38

format

882e3e1

improve code

4cf8b98

joaosaffran requested a review from damyanp November 25, 2025 19:46

fix damyan comment

71dea03

joaosaffran marked this pull request as draft November 25, 2025 20:00

addressing comments from damyan

f51add7

joaosaffran marked this pull request as ready for review November 26, 2025 02:36

joaosaffran marked this pull request as draft November 26, 2025 18:16

updating code to follow llvm guide rules

360f6ff

inbelic approved these changes Nov 26, 2025

joaosaffran marked this pull request as ready for review November 26, 2025 23:19

damyanp approved these changes Nov 27, 2025

joaosaffran merged commit 78c43dd into microsoft:main Nov 27, 2025
12 checks passed

github-project-automation bot moved this from In progress to Done in HLSL Roadmap Nov 27, 2025
