
Conversation

@andyleiserson (Contributor) commented Mar 14, 2025

Connections
Addresses parts of #4394 and #6302.

Description
The WGSL `&&` and `||` operators should short-circuit, i.e., not evaluate their RHS if the result can be determined from the LHS alone. This is accomplished by transforming expressions that use these operators into an `if` statement that guards evaluation of the RHS. In the case of nested expressions using these operators, it is necessary to emit nested `if` statements.
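
For illustration, here is a rough WGSL-level sketch of the lowering. The actual transformation operates on Naga IR rather than WGSL source, and the functions `a`, `b`, and `c` below are hypothetical stand-ins for operands that may have side effects:

```wgsl
// Original: let r = a() && b();
// Lowered sketch: the RHS runs only when the LHS is true.
var r: bool = a();
if (r) {
    r = b();
}

// A nested expression such as `a() && (b() || c())` needs nested guards:
var t: bool = a();
if (t) {
    var u: bool = b();
    if (!u) {
        u = c();
    }
    t = u;
}
```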

Things I am not sure about:

  • I add a function `with_nested_runtime_expression_ctx` to emit the expressions for the RHS within the `if` statement. I don't understand the code well enough yet to be confident this approach is sound.
  • I raised a few issues about short-circuiting of constant expressions in WGSL: Const. eval. short-circuiting #6302 (comment). The implementation for constant expressions in this PR is quite simple, but has the drawback that it omits nearly all validation of the RHS. So we'd get correct behavior for correct programs, but in exchange we'd accept some programs that aren't valid.
  • I think this is unlikely to be worth the effort, but we could attempt to undo this transformation in the backends that support short-circuit behavior of these operators.

Testing
Adds a snapshot test.

Squash or Rebase?
Squash

Checklist

  • Run cargo fmt.
  • Run taplo format.
  • Run cargo clippy. If applicable, add:
    • --target wasm32-unknown-unknown
  • Run cargo xtask test to run tests.
  • Needs a CHANGELOG.md entry.

@andyleiserson (Contributor, Author)

Removing draft status so this PR shows up in triage.

@cwfitzgerald cwfitzgerald assigned teoxoy and unassigned jimblandy Jul 16, 2025
@teoxoy teoxoy (Member) left a comment


Looks like a nice improvement over what we currently have, and we can iterate on it.

github-merge-queue bot pushed a commit to bevyengine/bevy that referenced this pull request Oct 13, 2025
Fixes #18904.

See also:
- gfx-rs/wgpu#4394
- gfx-rs/wgpu#7339

# A Debugging Story

In the above issue, the defect can be reproduced by running the
`3d_scene` example with an Intel adapter on Vulkan and observing the
following output:

<img width="1924" height="1126" alt="image"
src="https://github.com/user-attachments/assets/c6a82873-7afc-4901-a812-dca46951b082"
/>

## 1. Indirect Draw Args

The first thing to check is what's being passed to
`vkCmdDrawIndexedIndirectCount` in `main_opaque_pass_3d`. What we see is
a little bizarre:

<img width="895" height="130" alt="image"
src="https://github.com/user-attachments/assets/675cc55d-e377-4e77-81f2-9b2720ccb188"
/>

We should see two separate meshes here for the circular base and the
cube, but instead we see 3 items, two of which are identical except for
their `first_instance` id.

## 2. Reversed work item ordering

Trying to debug what could possibly be wrong with the shader input led
me to inspect the `PreprocessWorkItem` buffer that gets passed to mesh
preprocessing. Here, there was a very conspicuous difference between
when things would work and when they would break:

```sh
// Working
work_items[0] = {input_index: 0, output_index: 0}
work_items[1] = {input_index: 1, output_index: 1}

// Broken
work_items[0] = {input_index: 1, output_index: 0}
work_items[1] = {input_index: 0, output_index: 1}
```

This reversal is likely due to ECS query instability. However, the code
looked like it should be robust enough to handle work items in any
order. Further, this works just fine on Nvidia, so the ordering itself
is clearly not the issue.

## 3. Memory ordering?

My first assumption was that this must be some kind of weirdness with
memory ordering or some other race only observable on Intel. This led me
to the following write:

```wgsl
    // If we're the first mesh instance in this batch, write the index of our
    // `MeshInput` into the appropriate slot so that the indirect parameters
    // building shader can access it.
    if (instance_index == 0u || work_items[instance_index - 1].output_or_indirect_parameters_index != indirect_parameters_index) {
        indirect_parameters_gpu_metadata[indirect_parameters_index].mesh_index = input_index;
    }
```

Even though the logic looks totally fine and shouldn't require any
synchronization, I tried making the write atomic, which didn't seem to
help at all. My next step was to try to remove the condition and just
unconditionally write. This could lead to some weirdness when batch
size N > 1, but I just wanted to see what happened. And... it
solved the bug?
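
For reference, the atomic-write experiment might have looked roughly like the sketch below. This assumes the metadata field were redeclared as `atomic<u32>`, which is not how the shader is actually written; it is only meant to show the shape of the attempt:

```wgsl
// Sketch only: assumes `mesh_index` is declared as `atomic<u32>` in the
// indirect parameters metadata struct.
if (instance_index == 0u || work_items[instance_index - 1].output_or_indirect_parameters_index != indirect_parameters_index) {
    atomicStore(&indirect_parameters_gpu_metadata[indirect_parameters_index].mesh_index, input_index);
}
```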

## 4. SPIR-V de-compilation

It made no sense to me why this would fix things, so I decompiled the
shader to see what was going on and found the following:

```c
  bool _247 = _236 == 0;
  uint _248 = _236 - 1;
  uint* _249 = &_221[_248].output_or_indirect_parameters_index;
  uint _250 = *_249;
  bool _251 = _250 != _246;
  bool _252 = _247 || _251;
  
  if(_252) {
    uint* _256 = &_227[_246].mesh_index;
    *_256 = _244;
  }
```

This looks wrong. The final condition `_247 || _251` doesn't seem right.
Checking and confirming, [`||` in WGSL is supposed to
short-circuit](https://www.w3.org/TR/WGSL/#logical-expr). Instead, here, we
unconditionally read from
`&_221[_248].output_or_indirect_parameters_index;`, AKA
`work_items[instance_index - 1]`. When `instance_index` is 0,
that read is out of bounds. Uh oh!

## 5. Vendor differences

I'm not sure why this UB doesn't have any effect on Nvidia. But we can
walk through the entire bug:

On the first thread in the workgroup where `instance_index` is 0, we
will *always* OOB read, which on Intel seems to cause the thread to
terminate or otherwise return garbage that makes the condition fail or
something else weird. *However*, in the event that the first work item's
input/output index is *supposed* to be 0, everything happens to just
work, since the zero-initialized memory of the GPU metadata is by chance
correct. Thus, the bug only appears when things are sorted funny and a
batch set with a non-zero input index appears at the front of the work
items. Yikes!

The fix is to make sure that we only read from the prior input when we
are not the first item.
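
A rough WGSL sketch of that fix, adapted from the snippet above (the variable name `first_in_batch` is made up here, and this is not necessarily the exact upstream patch):

```wgsl
// Only consult the previous work item when one actually exists, so the
// out-of-bounds read can never happen.
var first_in_batch = instance_index == 0u;
if (!first_in_batch) {
    first_in_batch = work_items[instance_index - 1].output_or_indirect_parameters_index !=
        indirect_parameters_index;
}
if (first_in_batch) {
    indirect_parameters_gpu_metadata[indirect_parameters_index].mesh_index = input_index;
}
```
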
mate-h pushed a commit to mate-h/bevy that referenced this pull request Oct 22, 2025
@andyleiserson andyleiserson force-pushed the short-circuit branch 3 times, most recently from 7b61352 to e92b5d5 Compare October 29, 2025 20:51
mockersf pushed a commit to bevyengine/bevy that referenced this pull request Nov 17, 2025
andyleiserson added a commit to andyleiserson/wgpu that referenced this pull request Nov 19, 2025
@andyleiserson andyleiserson merged commit 119b4ef into gfx-rs:trunk Nov 20, 2025
42 checks passed
@andyleiserson andyleiserson deleted the short-circuit branch November 20, 2025 01:06