
Conversation

@andyleiserson (Contributor) commented Mar 14, 2025

Connections
Addresses parts of #4394 and #6302.

Description
The WGSL `&&` and `||` operators should short-circuit, i.e., not evaluate their RHS if the result can be determined from the LHS alone. This is accomplished by transforming expressions that use these operators into an `if` statement that guards evaluation of the RHS. In the case of nested expressions using these operators, it is necessary to emit nested `if` statements.
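
For illustration, here is a rough WGSL-level sketch of the lowering. The actual transformation operates on Naga IR rather than WGSL source, and the functions `a`, `b`, and `c` below are hypothetical stand-ins for operands that may have side effects:

```wgsl
// Original: let r = a() && b();
// Lowered sketch: the RHS runs only when the LHS is true.
var r: bool = a();
if (r) {
    r = b();
}

// A nested expression such as `a() && (b() || c())` needs nested guards:
var t: bool = a();
if (t) {
    var u: bool = b();
    if (!u) {
        u = c();
    }
    t = u;
}
```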

Things I am not sure about:

  • I add a function `with_nested_runtime_expression_ctx` to emit the expressions for the RHS within the `if` statement. I don't understand the code well enough yet to be confident this approach is sound.
  • I raised a few issues about short-circuiting of constant expressions in WGSL: Const. eval. short-circuiting #6302 (comment). The implementation for constant expressions in this PR is quite simple, but has the drawback that it omits nearly all validation of the RHS. So we'd get correct behavior for correct programs, but in exchange we'd accept some programs that aren't valid.
  • I think this is unlikely to be worth the effort, but we could attempt to undo this transformation in the backends that support short-circuit behavior of these operators.

Testing
Adds a snapshot test.

Squash or Rebase?
Squash

Checklist

  • Run cargo fmt.
  • Run taplo format.
  • Run cargo clippy. If applicable, add:
    • --target wasm32-unknown-unknown
  • Run cargo xtask test to run tests.
  • Needs a CHANGELOG.md entry.

@andyleiserson (Contributor, Author)

Removing draft status so this PR shows up in triage.

@cwfitzgerald cwfitzgerald assigned teoxoy and unassigned jimblandy Jul 16, 2025
@teoxoy teoxoy (Member) left a comment


Looks like a nice improvement over what we currently have, and we can iterate on it.

github-merge-queue bot pushed a commit to bevyengine/bevy that referenced this pull request Oct 13, 2025
Fixes #18904.

See also:
- gfx-rs/wgpu#4394
- gfx-rs/wgpu#7339

# A Debugging Story

In the above issue, the defect can be reproduced by running the
`3d_scene` example with an Intel adapter on Vulkan and observing the
following output:

<img width="1924" height="1126" alt="image"
src="https://github.com/user-attachments/assets/c6a82873-7afc-4901-a812-dca46951b082"
/>

## 1. Indirect Draw Args

The first thing to check is what's being passed to
`vkCmdDrawIndexedIndirectCount` in `main_opaque_pass_3d`. What we see is
a little bizarre:

<img width="895" height="130" alt="image"
src="https://github.com/user-attachments/assets/675cc55d-e377-4e77-81f2-9b2720ccb188"
/>

We should see two separate meshes here for the circular base and the
cube, but instead we see 3 items, two of which are identical except for
their `first_instance` id.

## 2. Reversed work item ordering

Trying to debug what could possibly be wrong with the shader input led
me to inspect the `PreprocessWorkItem` buffer that gets passed to mesh
preprocessing. Here, there was a very conspicuous difference between
when things would work and when they would break:

```sh
// Working
work_items[0] = {input_index: 0, output_index: 0}
work_items[1] = {input_index: 1, output_index: 1}

// Broken
work_items[0] = {input_index: 1, output_index: 0}
work_items[1] = {input_index: 0, output_index: 1}
```

This reversal is likely due to ECS query instability. However, the code
looked like it should be robust enough to handle work items in any
order. Further, this works just fine on Nvidia, so the ordering itself
is clearly not the issue.

## 3. Memory ordering?

My first assumption was that this must be some kind of weirdness with
memory ordering or some other race only observable on Intel. This led me
to the following write:

```wgsl
    // If we're the first mesh instance in this batch, write the index of our
    // `MeshInput` into the appropriate slot so that the indirect parameters
    // building shader can access it.
    if (instance_index == 0u || work_items[instance_index - 1].output_or_indirect_parameters_index != indirect_parameters_index) {
        indirect_parameters_gpu_metadata[indirect_parameters_index].mesh_index = input_index;
    }
```

Even though the logic looks totally fine and shouldn't require any
synchronization, I tried making the write atomic, which didn't seem to
help at all. My next step was to try to remove the condition and just
unconditionally write. This could lead to some weirdness when batch
size N > 1, but I just wanted to see what happened. And... it
solved the bug?
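
For reference, the atomic-write experiment might have looked roughly like the sketch below. This assumes the metadata field were redeclared as `atomic<u32>`, which is not how the shader is actually written; it is only meant to show the shape of the attempt:

```wgsl
// Sketch only: assumes `mesh_index` is declared as `atomic<u32>` in the
// indirect parameters metadata struct.
if (instance_index == 0u || work_items[instance_index - 1].output_or_indirect_parameters_index != indirect_parameters_index) {
    atomicStore(&indirect_parameters_gpu_metadata[indirect_parameters_index].mesh_index, input_index);
}
```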

## 4. SPIR-V de-compilation

It made no sense to me why this would fix things, so I decompiled the
shader to see what was going on and found the following:

```c
  bool _247 = _236 == 0;
  uint _248 = _236 - 1;
  uint* _249 = &_221[_248].output_or_indirect_parameters_index;
  uint _250 = *_249;
  bool _251 = _250 != _246;
  bool _252 = _247 || _251;
  
  if(_252) {
    uint* _256 = &_227[_246].mesh_index;
    *_256 = _244;
  }
```

This looks wrong. The final condition `_247 || _251` doesn't seem right.
Checking and confirming, [`||` in WGSL is supposed to
short-circuit](https://www.w3.org/TR/WGSL/#logical-expr). Instead, here, we
unconditionally read from
`&_221[_248].output_or_indirect_parameters_index;`, AKA
`work_items[instance_index - 1]`. When `instance_index` is 0,
that read is out of bounds. Uh oh!

## 5. Vendor differences

I'm not sure why this UB doesn't have any effect on Nvidia. But we can
walk through the entire bug:

On the first thread in the workgroup where `instance_index` is 0, we
will *always* OOB read, which on Intel seems to cause the thread to
terminate or otherwise return garbage that makes the condition fail or
something else weird. *However*, in the event that the first work item's
input/output index is *supposed* to be 0, everything happens to just
work, since the zero-initialized memory of the GPU metadata is by chance
correct. Thus, the bug only appears when things are sorted funny and a
batch set with a non-zero input index appears at the front of the work
items. Yikes!

The fix is to make sure that we only read from the prior input when we
are not the first item.
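
A rough WGSL sketch of that fix, adapted from the snippet above (the variable name `first_in_batch` is made up here, and this is not necessarily the exact upstream patch):

```wgsl
// Only consult the previous work item when one actually exists, so the
// out-of-bounds read can never happen.
var first_in_batch = instance_index == 0u;
if (!first_in_batch) {
    first_in_batch = work_items[instance_index - 1].output_or_indirect_parameters_index !=
        indirect_parameters_index;
}
if (first_in_batch) {
    indirect_parameters_gpu_metadata[indirect_parameters_index].mesh_index = input_index;
}
```
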
mate-h pushed a commit to mate-h/bevy that referenced this pull request Oct 22, 2025
@andyleiserson andyleiserson force-pushed the short-circuit branch 3 times, most recently from 7b61352 to e92b5d5 Compare October 29, 2025 20:51
mockersf pushed a commit to bevyengine/bevy that referenced this pull request Nov 17, 2025
andyleiserson added a commit to andyleiserson/wgpu that referenced this pull request Nov 19, 2025
@andyleiserson andyleiserson merged commit 119b4ef into gfx-rs:trunk Nov 20, 2025
42 checks passed
@andyleiserson andyleiserson deleted the short-circuit branch November 20, 2025 01:06