[naga wgsl-in] Short-circuiting of && and || operators #7339
Merged
Conversation
Contributor (Author): Removing draft status so this PR shows up in triage.
teoxoy (Member) reviewed on Jul 28, 2025 and left a comment:
Looks like a nice improvement over what we currently have and we can iterate on it.
teoxoy approved these changes on Jul 28, 2025.
github-merge-queue bot pushed a commit to bevyengine/bevy that referenced this pull request on Oct 13, 2025:
Fixes #18904. See also: gfx-rs/wgpu#4394, gfx-rs/wgpu#7339.

# A Debugging Story

In the above issue, the defect can be reproduced by running the `3d_scene` example with an Intel adapter on Vulkan and observing the following output:

<img width="1924" height="1126" alt="image" src="https://github.com/user-attachments/assets/c6a82873-7afc-4901-a812-dca46951b082" />

## 1. Indirect Draw Args

The first thing to check is what's being passed to `vkCmdDrawIndexedIndirectCount` in `main_opaque_pass_3d`. What we see is a little bizarre:

<img width="895" height="130" alt="image" src="https://github.com/user-attachments/assets/675cc55d-e377-4e77-81f2-9b2720ccb188" />

We should see two separate meshes here for the circular base and the cube, but instead we see three items, two of which are identical except for their `first_instance` id.

## 2. Reversed work item ordering

Trying to debug what could be wrong with the shader input led to inspecting the `PreprocessWorkItem` buffer that gets passed to mesh preprocessing. There was a very conspicuous difference between when things would work and when they would break:

```sh
// Working
work_items[0] = {input_index: 0, output_index: 0}
work_items[1] = {input_index: 1, output_index: 1}

// Broken
work_items[0] = {input_index: 1, output_index: 0}
work_items[1] = {input_index: 0, output_index: 1}
```

The reversed ordering is likely due to ECS query instability. However, the code looked like it should be robust enough to handle work items in any order. Further, this works just fine on Nvidia, so the ordering itself is clearly not the issue.

## 3. Memory ordering?

My first assumption was that this must be some kind of weirdness with memory ordering, or some other race only observable on Intel. This led me to the following write:

```wgsl
// If we're the first mesh instance in this batch, write the index of our
// `MeshInput` into the appropriate slot so that the indirect parameters
// building shader can access it.
if (instance_index == 0u ||
    work_items[instance_index - 1].output_or_indirect_parameters_index != indirect_parameters_index) {
    indirect_parameters_gpu_metadata[indirect_parameters_index].mesh_index = input_index;
}
```

Even though the logic looks totally fine and shouldn't require any synchronization, I tried making the write atomic, which didn't seem to help at all. My next step was to remove the condition and write unconditionally. This could lead to some weirdness in the event of batch size N > 1, but I just wanted to see what happened. And... it solved the bug?

## 4. SPIR-V de-compilation

It made no sense to me why this would fix things, so I decompiled the shader to see what was going on, and found the following:

```c
bool _247 = _236 == 0;
uint _248 = _236 - 1;
uint* _249 = &_221[_248].output_or_indirect_parameters_index;
uint _250 = *_249;
bool _251 = _250 != _246;
bool _252 = _247 || _251;
if (_252) {
    uint* _256 = &_227[_246].mesh_index;
    *_256 = _244;
}
```

This looks wrong. The final condition `_247 || _251` doesn't seem right: [`||` in WGSL is supposed to short-circuit](https://www.w3.org/TR/WGSL/#logical-expr). Instead, here we unconditionally read from `&_221[_248].output_or_indirect_parameters_index`, a.k.a. `work_items[instance_index - 1]`. In the event that `instance_index` is 0, that read is out of bounds. Uh oh!

## 5. Vendor differences

I'm not sure why this undefined behavior has no visible effect on Nvidia, but we can walk through the entire bug: on the first thread in the workgroup, where `instance_index` is 0, we will *always* read out of bounds, which on Intel seems to cause the thread to terminate or otherwise return garbage that makes the condition fail. *However*, in the event that the first work item's input/output index is *supposed* to be 0, everything happens to just work, since the zero-initialized memory of the GPU metadata is correct by chance. Thus, the bug only appears when the work items are sorted such that a batch set with a non-zero input index appears at the front. Yikes!

The fix is to make sure that we only read from the prior work item when we are not the first item.
mate-h pushed a commit to mate-h/bevy that referenced this pull request on Oct 22, 2025.
mockersf pushed a commit to bevyengine/bevy that referenced this pull request on Nov 17, 2025.
andyleiserson added a commit to andyleiserson/wgpu that referenced this pull request on Nov 19, 2025.
Connections
Addresses parts of #4394 and #6302.
Description
The WGSL `&&` and `||` operators should short-circuit, i.e., not evaluate their RHS if the result can be determined from the LHS alone. This is accomplished by transforming expressions with these operators into an `if` statement guarding the evaluation of the RHS. In the case of nested expressions using these operators, it is necessary to emit nested `if` statements.

Things I am not sure about:

- The use of `with_nested_runtime_expression_ctx` to emit the expressions for the RHS within the `if` statement. I don't understand the code well enough yet to be confident this approach is sound.

Testing
Adds a snapshot test.
Squash or Rebase?
Squash
Checklist
- Run `cargo fmt`.
- Run `taplo format`.
- Run `cargo clippy`. If applicable, add: `--target wasm32-unknown-unknown`
- Run `cargo xtask test` to run tests.
- Add a `CHANGELOG.md` entry.