-
Notifications
You must be signed in to change notification settings - Fork 812
[HLK] Adding WaveMatch Long Vector Test #7937
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Were you able to validate these? I would expect these to fail on WARP right now. |
| } | ||
| uint4 result = WaveMatch(Vector); | ||
| uint index = WaveGetLaneIndex() * 4; | ||
| if(index < NUM) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what the intent of using NUM was here.
NUM describes the number of elements in the input Vector, which isn't related to the number of lanes in the wave.
I think what you will want to do is store the mask from each lane. That is, your output should have WaveGetLaneCount() elements. With each element being a uint4. And in this test case you would expect to have two unique masks. With the first mask being for the first group in lane 0 only. And all other lanes in another group (mask).
That would greatly simplify your logic for computing expected values as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I don't need NUM check here. Thanks for pointing out.
The output is storing the result mask for each lane, the mask is 128 bits long, so I am breaking into groups of 32 bits. This is done, so I can inspect all the bits and make sure the correct ones are being set.
If I store it as a uint4, like here:
g_OutputVector.Store<uint>(WaveGetLaneIndex() * sizeof(uint4), result);Some bits get truncated to 0, which is not the correct value, since all but one lane will have a different value, the number that I expect here changes if I change the lane that has the different vector. Instead, to make sure the values are correct, I need to store uint each component of WaveMatch resulting mask, like so:
uint index = WaveGetLaneIndex() * 4;
g_OutputVector.Store<uint>(index * sizeof(uint), result.x);
g_OutputVector.Store<uint>((index + 1) * sizeof(uint), result.y);
g_OutputVector.Store<uint>((index + 2) * sizeof(uint), result.z);
g_OutputVector.Store<uint>((index + 3) * sizeof(uint), result.w);There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a typo in your response, but shouldn't it be .Store<uint4>? I wouldn't expect that to truncate anything
|
From pair review with Alex, we were concerned that the WaveMatch spec does not specifically detail how bits for unused lanes are handled, so we should probably assume that it is undefined behaviour. For the sake of testing then we shouldn't check that those bits match being set to 0. So we might want/need to have the wave size always be 32 and only check the first component. However, we weren't sure if that was easily possible of the top of our heads. Do drivers ignore those bits or set them to zero? |
It looks to me that the spec does specify this (they're set to 0): "Bits in the mask corresponding to inactive lanes, or at positions beyond current implementation’s wave width, will contribute 0’s."
This isn't possible, not all devices support wave size == 32. In D3D we need to support 4, 8, 16, 32, 64 and 128 wave sizes. (Vulkan also supports 1 and 2). |
Ah skipped over that sentence... thanks. The last sentence tripped me up as I assumed it was only defined for pixel shaders. We can disregard needing 32 lanes then as we can match with the expected output = 0. |
| OP(Wave, WaveMultiPrefixBitAnd, 1, "TestWaveMultiPrefixBitAnd", "", " -DFUNC_WAVE_MULTI_PREFIX_BIT_AND=1 -DIS_WAVE_PREFIX_OP=1", "LongVectorOp", WaveMultiPrefixBitwise, Default2, Default3) | ||
| OP(Wave, WaveMultiPrefixBitOr, 1, "TestWaveMultiPrefixBitOr", "", " -DFUNC_WAVE_MULTI_PREFIX_BIT_OR=1 -DIS_WAVE_PREFIX_OP=1", "LongVectorOp", WaveMultiPrefixBitwise, Default2, Default3) | ||
| OP(Wave, WaveMultiPrefixBitXor, 1, "TestWaveMultiPrefixBitXor", "", " -DFUNC_WAVE_MULTI_PREFIX_BIT_XOR=1 -DIS_WAVE_PREFIX_OP=1", "LongVectorOp", WaveMultiPrefixBitwise, Default2, Default3) | ||
| OP_DEFAULT_DEFINES(Wave, WaveMatch, 1, "TestWaveMatch", "", " -DFUNC_WAVE_MATCH=1 -DIS_WAVE_PREFIX_OP=1") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-DIS_WAVE_PREFIX_OP=1 is required, the test function returns void, since it handles writing to the out vector inside the test function, instead of delegating to main
| uint64_t LowWaveShift = (LowWaves < 64) ? (1ULL << LowWaves) : 0; | ||
| uint64_t HighWaveShift = (HighWaves < 64) ? (1ULL << HighWaves) : 0; | ||
|
|
||
| uint64_t result[2] = {(LowWaveShift - 1 & ~1ULL), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to follow LLVM naming conventions.
| uint64_t result[2] = {(LowWaveShift - 1 & ~1ULL), | |
| uint64_t Result[2] = {(LowWaveShift - 1 & ~1ULL), |
(note that this is the reason I'm requesting changes - but I really think that this code needs to go away!)
damyanp
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to follow LLVM naming conventions.
I also have feedback on how some of the calculations are structured that I really think you should take.
| uint64_t LowWaveShift = (LowWaves < 64) ? (1ULL << LowWaves) : 0; | ||
| uint64_t HighWaveShift = (HighWaves < 64) ? (1ULL << HighWaves) : 0; | ||
|
|
||
| uint64_t result[2] = {(LowWaveShift - 1 & ~1ULL), | ||
| (HighWaveShift - 1 & ~0ULL)}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been trying to nudge you in (what I think is) the right direction here. It really bothers me that the "build the mask" and the "what we expect the result to be" are mangled into the same expression. This is my recommended approach that separates the concerns:
| uint64_t LowWaveShift = (LowWaves < 64) ? (1ULL << LowWaves) : 0; | |
| uint64_t HighWaveShift = (HighWaves < 64) ? (1ULL << HighWaves) : 0; | |
| uint64_t result[2] = {(LowWaveShift - 1 & ~1ULL), | |
| (HighWaveShift - 1 & ~0ULL)}; | |
| uint64_t LowWaveMask = ((LowWaves < 64) ? (1ULL << LowWaves) : 0) - 1; | |
| uint64_t HighWaveMask = ((HighWaves < 64) ? (1ULL << HighWaves) : 0) - 1; | |
| uint64_t LowExpected = ~1ULL & LowWaveMask; | |
| uint64_t HighExpected = ~0ULL & HighWaveMask; |
IMO, separating it like this makes it clearer what the different calculations are.
Although this is take-it-or-leave-it, I do really think you should factor it this way!
This patch adds support to WaveMatch in the HLK Long Vector Tests.
Tested with Wave sizes: 4, 8, 16, 32, 64, 128.
Closes: #7613