Skip to content

Conversation

@inner-daemons
Copy link
Collaborator

@inner-daemons inner-daemons commented Oct 31, 2025

Connections
Works towards #7197
Builds on #8370

Description

Add a SPIR-V writer for mesh shaders

All major changes here are in the SPIR-V backend for naga and naga snapshots. Other "changes" are inherited from #8370

Testing

Mesh shader test WGSL is now written to SPIR-V as a snapshot. Mesh shader tests & example use naga-written spirv on vulkan backend.

Squash or Rebase?
Squash

Checklist

  • Run cargo fmt.
  • Run taplo format.
  • Run cargo clippy --tests. If applicable, add:
    • --target wasm32-unknown-unknown
  • Run cargo xtask test to run tests.
  • If this contains user-facing changes, add a CHANGELOG.md entry.

inner-daemons and others added 30 commits August 14, 2025 12:53
@amarhassam
Copy link

is this almost done ?

@inner-daemons
Copy link
Collaborator Author

@amarhassam

Haha yeah its basically done. I merged trunk a while back and now Im getting errors I can't understand (see commit log) but I will no doubt figure it out. I think @cwfitzgerald will get to reviewing it on the 26th, but it'll probably arrive after that.

@amarhassam
Copy link

@inner-daemons so its mainly you who is working on mesh shaders ?

@inner-daemons inner-daemons added this to the v28 milestone Nov 15, 2025
@inner-daemons
Copy link
Collaborator Author

@amarhassam Yeah, I try to post updates to #7197 every now and then, and you can track the status there. I've had 2 people offer to write the HLSL writer and WGSL writer, but not much progress has been made on either I think. Otherwise everything is just me

inner-daemons and others added 5 commits November 26, 2025 13:19
Co-authored-by: Connor Fitzgerald <connorwadefitzgerald@gmail.com>
Co-authored-by: Connor Fitzgerald <connorwadefitzgerald@gmail.com>
@inner-daemons
Copy link
Collaborator Author

@cwfitzgerald All comments should be addressed, ready for round 2! (=2)

Copy link
Member

@cwfitzgerald cwfitzgerald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some more comments - also tests need to pass.

@inner-daemons
Copy link
Collaborator Author

I'll work on fixing the tests... at some point

@inner-daemons
Copy link
Collaborator Author

For future me to enjoy and ponder:

	<call no='157' class='pipe_context' method='create_fs_state'>
		<arg name='pipe'><ptr>0x78e9e40b7010</ptr></arg>
		<arg name='state'><struct name='pipe_shader_state'><member name='type'><uint>1</uint></member><member name='tokens'><null/></member><member name='ir'><string><![CDATA[shader: MESA_SHADER_FRAGMENT
source_blake3: {0x36b56488, 0x1d19aa4b, 0xcbbf732d, 0x58f338fc, 0x4ff49005, 0x051203b8, 0x5a5ae0b8, 0x945f3dc4}
inputs_read: 32-33
outputs_written: 4
per_primitive_inputs: 33
subgroup_size: 0
bit_sizes_float: 0x20
bit_sizes_int: 0x20
origin_upper_left: true
inputs: 2
outputs: 1
decl_var shader_in INTERP_MODE_NONE none vec4 color (VARYING_SLOT_VAR0.xyzw, 0, 0)
decl_var per_primitive shader_in INTERP_MODE_NONE none vec4 colorMask (VARYING_SLOT_VAR1.xyzw, 1, 0)
decl_var shader_out INTERP_MODE_NONE none vec4 #0 (FRAG_RESULT_DATA0.xyzw, 0, 0)
decl_function fs_main () (entrypoint)

impl fs_main {
    block b0:   // preds:
    32     %0 = deref_var &color (shader_in vec4)
    32x4   %1 = @load_deref (%0) (access=none)
    32     %2 = deref_var &colorMask (shader_in vec4)
    32x4   %3 = @load_deref (%2) (access=none)
    32     %4 = fmul %1.x, %3.x
    32     %5 = fmul %1.y, %3.y
    32     %6 = fmul %1.z, %3.z
    32     %7 = fmul %1.w, %3.w
    32     %8 = deref_var &#0 (shader_out vec4)
    32x4   %9 = vec4 %4, %5, %6, %7
                @store_deref (%8, %9) (wrmask=xyzw, access=none)
                // succs: b1
    block b1:
}

]]></string></member><member name='stream_output'><struct name='pipe_stream_output_info'><member name='num_outputs'><uint>0</uint></member><member name='stride'><array><elem><uint>0</uint></elem><elem><uint>0</uint></elem><elem><uint>0</uint></elem><elem><uint>0</uint></elem></array></member><member name='output'><array></array></member></struct></member></struct></arg>
		<ret><ptr>0x78e9e1da32f0</ptr></ret>
		<time><int>72</int></time>
	</call>
	<call no='158' class='pipe_context' method='create_ts_state'>
		<arg name='pipe'><ptr>0x78e9e40b7010</ptr></arg>
		<arg name='state'><struct name='pipe_shader_state'><member name='type'><uint>1</uint></member><member name='tokens'><null/></member><member name='ir'><string><![CDATA[shader: MESA_SHADER_TASK
source_blake3: {0xa93be064, 0x9f407d12, 0x5b23394f, 0xc5a9b7f6, 0x9146aa9a, 0xd2190e79, 0x0df20856, 0xdec646c8}
workgroup_size: 1, 1, 1
shared_size: 4
task_payload_size: 20
subgroup_size: 0
Unhandled stage 6
decl_var task_payload INTERP_MODE_NONE none TaskPayload taskPayload
decl_var shared INTERP_MODE_NONE none float workgroupData = null
decl_function ts_main () (entrypoint)

impl ts_main {
    block b0:  // preds:
    32    %0 = load_const (0x3f800000 = 1.000000 = 1065353216)
    32    %1 = load_const (0x00000000)
               @store_shared (%0 (0x3f800000), %1 (0x0)) (base=0, wrmask=x, align_mul=256, align_offset=0)
    32x4  %2 = load_const (0x3f800000, 0x3f800000, 0x00000000, 0x3f800000) = (1.000000, 1.000000, 0.000000, 1.000000) = (1065353216, 1065353216, 0, 1065353216)
               @store_task_payload (%2 (0x3f800000, 0x3f800000, 0x0, 0x3f800000), %1 (0x0)) (base=0, wrmask=xyzw, align_mul=256, align_offset=0)
    32    %3 = load_const (0x00000010 = 16)
    32    %4 = load_const (0x00000001 = 0.000000)
               @store_task_payload (%4 (0x1), %3 (0x10)) (base=0, wrmask=x, align_mul=256, align_offset=16)
    32x3  %5 = load_const (0x00000003, 0x00000001, 0x00000001) = (0.000000, 0.000000, 0.000000)
               @launch_mesh_workgroups (%5 (0x3, 0x1, 0x1)) (base=0, range=20)
               // succs: b1
    block b1:
}

]]></string></member><member name='stream_output'><struct name='pipe_stream_output_info'><member name='num_outputs'><uint>0</uint></member><member name='stride'><array><elem><uint>0</uint></elem><elem><uint>0</uint></elem><elem><uint>0</uint></elem><elem><uint>0</uint></elem></array></member><member name='output'><array></array></member></struct></member></struct></arg>
		<ret><ptr>0x78e9e1d87610</ptr></ret>
		<time><int>25</int></time>
	</call>
	<call no='159' class='pipe_context' method='create_ms_state'>
		<arg name='pipe'><ptr>0x78e9e40b7010</ptr></arg>
		<arg name='state'><struct name='pipe_shader_state'><member name='type'><uint>1</uint></member><member name='tokens'><null/></member><member name='ir'><string><![CDATA[shader: MESA_SHADER_MESH
source_blake3: {0xa41a4a8c, 0xdb892a32, 0x86c55f33, 0x112e676c, 0xd27ecff8, 0x46a99a32, 0x1337b67d, 0xcc785c3c}
workgroup_size: 1, 1, 1
outputs_written: 0,27-28,32-33
system_values_read: 0x00000000'00000000'00000800'00000000
per_primitive_outputs: 27-28,33
shared_size: 140
task_payload_size: 20
subgroup_size: 0
bit_sizes_int: 0x21
ms_cross_invocation_output_access: 0,27-28,32-33
max_vertices_out: 3
max_primitives_out: 1
primitive_type: TRIANGLES
outputs: 5
decl_var shader_out INTERP_MODE_NONE none vec4[3] naga_vertex_builtin_outputs[*].field0 (VARYING_SLOT_POS.xyzw, 0, 0)
decl_var per_primitive shader_out INTERP_MODE_NONE none uvec3[1] naga_primitive_indices_outputs (VARYING_SLOT_PRIMITIVE_INDICES.xyz, 2, 0)
decl_var per_primitive shader_out INTERP_MODE_NONE none bool[1] naga_primitive_builtin_outputs[*].field0 (VARYING_SLOT_CULL_PRIMITIVE.x, 3, 0)
decl_var shader_out INTERP_MODE_NONE none vec4[3] #0 (VARYING_SLOT_VAR0.xyzw, 1, 0)
decl_var per_primitive shader_out INTERP_MODE_NONE none vec4[1] #1 (VARYING_SLOT_VAR1.xyzw, 4, 0)
decl_var task_payload INTERP_MODE_NONE none TaskPayload taskPayload
decl_var shared INTERP_MODE_NONE none float workgroupData = null
decl_var shared INTERP_MODE_NONE none MeshOutput mesh_output = null
decl_function ms_main () (entrypoint)

impl ms_main {
    block b0:   // preds:
    32     %0 = undefined
    32     %1 = @load_local_invocation_index
    32     %2 = load_const (0x00000004)
    32     %3 = load_const (0x00000084 = 132)
    32     %4 = load_const (0x00000003)
                @store_shared (%4 (0x3), %3 (0x84)) (base=0, wrmask=x, align_mul=256, align_offset=132)
    32     %5 = load_const (0x00000088 = 136)
    32     %6 = load_const (0x00000001)
                @store_shared (%6 (0x1), %5 (0x88)) (base=0, wrmask=x, align_mul=256, align_offset=136)
    32     %7 = load_const (0x40000000 = 2.000000 = 1073741824)
    32     %8 = load_const (0x00000000)
                @store_shared (%7 (0x40000000), %8 (0x0)) (base=0, wrmask=x, align_mul=256, align_offset=0)
    32     %9 = load_const (0x00000005)
    32x4  %10 = load_const (0x00000000, 0x3f800000, 0x00000000, 0x3f800000) = (0.000000, 1.000000, 0.000000, 1.000000) = (0, 1065353216, 0, 1065353216)
                @store_shared (%10 (0x0, 0x3f800000, 0x0, 0x3f800000), %2 (0x4)) (base=0, wrmask=xyzw, align_mul=256, align_offset=4)
    32x4  %11 = @load_task_payload (%8 (0x0)) (base=0, align_mul=256, align_offset=0)
    32x4  %12 = vec4 %8 (0x0), %11.y, %8 (0x0), %11.w
    32    %13 = load_const (0x00000010 = 16)
    32    %14 = load_const (0x00000014 = 20)
                @store_shared (%12, %14 (0x14)) (base=0, wrmask=xyzw, align_mul=256, align_offset=20)
    32    %15 = load_const (0x00000024 = 36)
    32x4  %16 = load_const (0xbf800000, 0xbf800000, 0x00000000, 0x3f800000) = (-1.000000, -1.000000, 0.000000, 1.000000) = (-1082130432, -1082130432, +0, +1065353216) = (3212836864, 3212836864, 0, 1065353216)
                @store_shared (%16 (0xbf800000, 0xbf800000, 0x0, 0x3f800000), %15 (0x24)) (base=0, wrmask=xyzw, align_mul=256, align_offset=36)
    32x4  %17 = vec4 %8 (0x0), %8 (0x0), %11.z, %11.w
    32    %18 = load_const (0x00000034 = 52)
                @store_shared (%17, %18 (0x34)) (base=0, wrmask=xyzw, align_mul=256, align_offset=52)
    32    %19 = load_const (0x00000044 = 68)
    32x4  %20 = load_const (0x3f800000, 0xbf800000, 0x00000000, 0x3f800000) = (1.000000, -1.000000, 0.000000, 1.000000) = (+1065353216, -1082130432, +0, +1065353216) = (1065353216, 3212836864, 0, 1065353216)
                @store_shared (%20 (0x3f800000, 0xbf800000, 0x0, 0x3f800000), %19 (0x44)) (base=0, wrmask=xyzw, align_mul=256, align_offset=68)
    32x4  %21 = vec4 %11.x, %8 (0x0), %8 (0x0), %11.w
    32    %22 = load_const (0x00000054 = 84)
                @store_shared (%21, %22 (0x54)) (base=0, wrmask=xyzw, align_mul=256, align_offset=84)
    32    %23 = load_const (0x00000064 = 100)
    32x3  %24 = load_const (0x00000000, 0x00000001, 0x00000002) = (0.000000, 0.000000, 0.000000)
                @store_shared (%24 (0x0, 0x1, 0x2), %23 (0x64)) (base=0, wrmask=xyz, align_mul=256, align_offset=100)
    32    %25 = @load_task_payload (%13 (0x10)) (base=0, align_mul=256, align_offset=16)
    1     %26 = ieq %25, %8 (0x0)
    32    %27 = load_const (0x00000070 = 112)
    32    %28 = b2b32 %26
                @store_shared (%28, %27 (0x70)) (base=0, wrmask=x, align_mul=256, align_offset=112)
    32    %29 = load_const (0x00000074 = 116)
    32x4  %30 = load_const (0x3f800000, 0x00000000, 0x3f800000, 0x3f800000) = (1.000000, 0.000000, 1.000000, 1.000000) = (1065353216, 0, 1065353216, 1065353216)
                @store_shared (%30 (0x3f800000, 0x0, 0x3f800000, 0x3f800000), %29 (0x74)) (base=0, wrmask=xyzw, align_mul=256, align_offset=116)
                @barrier (execution_scope=WORKGROUP, memory_scope=WORKGROUP, mem_semantics=ACQ|REL, mem_modes=shader_out|shared)
    32    %31 = @load_shared (%3 (0x84)) (base=0, align_mul=256, align_offset=132)
    32    %32 = umin %31, %4 (0x3)
    32    %33 = @load_shared (%5 (0x88)) (base=0, align_mul=256, align_offset=136)
    32    %34 = umin %33, %6 (0x1)
                @set_vertex_and_primitive_count (%32, %34, %0) (stream_id=0)
                // succs: b1
    loop {
        block b1:   // preds: b0 b4
        32    %35 = phi b0: %1, b4: %46
        1     %36 = uge %35, %32
                    // succs: b2 b3
        if %36 {
            block b2:// preds: b1
            break
            // succs: b5
        } else {
            block b3:  // preds: b1, succs: b4
        }
        block b4:   // preds: b3
        32    %37 = ishl %35, %9 (0x5)
        32    %38 = iadd %2 (0x4), %37
        32x4  %39 = @load_shared (%38) (base=0, align_mul=32, align_offset=4)
        32    %40 = iadd %14 (0x14), %37
        32x4  %41 = @load_shared (%40) (base=0, align_mul=32, align_offset=20)
        32    %42 = deref_var &naga_vertex_builtin_outputs[*].field0 (shader_out vec4[3])
        32    %43 = deref_array &(*%42)[%35] (shader_out vec4)  // &naga_vertex_builtin_outputs[*].field0[%35]
                    @store_deref (%43, %39) (wrmask=xyzw, access=none)
        32    %44 = deref_var &#0 (shader_out vec4[3])
        32    %45 = deref_array &(*%44)[%35] (shader_out vec4)  // &#0[%35]
                    @store_deref (%45, %41) (wrmask=xyzw, access=none)
        32    %46 = iadd %35, %6 (0x1)
                    // succs: b1
    }
    block b5:  // preds: b2, succs: b6
    loop {
        block b6:   // preds: b5 b9
        32    %47 = phi b5: %1, b9: %63
        1     %48 = uge %47, %34
                    // succs: b7 b8
        if %48 {
            block b7:// preds: b6
            break
            // succs: b10
        } else {
            block b8:  // preds: b6, succs: b9
        }
        block b9:   // preds: b8
        32    %49 = ishl %47, %9 (0x5)
        32    %50 = iadd %23 (0x64), %49
        32x3  %51 = @load_shared (%50) (base=0, align_mul=32, align_offset=4)
        32    %52 = iadd %27 (0x70), %49
        32    %53 = @load_shared (%52) (base=0, align_mul=32, align_offset=16)
        1     %54 = b2b1 %53
        32    %55 = iadd %29 (0x74), %49
        32x4  %56 = @load_shared (%55) (base=0, align_mul=32, align_offset=20)
        32    %57 = deref_var &naga_primitive_indices_outputs (shader_out uvec3[1])
        32    %58 = deref_array &(*%57)[%47] (shader_out uvec3)  // &naga_primitive_indices_outputs[%47]
                    @store_deref (%58, %51) (wrmask=xyz, access=none)
        32    %59 = deref_var &naga_primitive_builtin_outputs[*].field0 (shader_out bool[1])
        32    %60 = deref_array &(*%59)[%47] (shader_out bool)  // &naga_primitive_builtin_outputs[*].field0[%47]
                    @store_deref (%60, %54) (wrmask=x, access=none)
        32    %61 = deref_var &#1 (shader_out vec4[1])
        32    %62 = deref_array &(*%61)[%47] (shader_out vec4)  // &#1[%47]
                    @store_deref (%62, %56) (wrmask=xyzw, access=none)
        32    %63 = iadd %47, %6 (0x1)
                    // succs: b6
    }
    block b10:  // preds: b7, succs: b11
    block b11:
}

]]>

@inner-daemons
Copy link
Collaborator Author

I also have some AMD GPU code from windows that I'll upload at some point but its absolutely unreadable to my eyes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants