@Jenya705
Contributor

@Jenya705 Jenya705 commented Nov 30, 2025

Objective

Enables accessing slices from tables directly via Queries.

Fixes: #21861

Solution

One new trait:

  • ContiguousQueryData fetches all values of a component from a table at once. The implementation for &T returns a slice of the components in the currently set table; the implementation for &mut T returns a mutable slice of those components together with a struct whose methods set the update ticks (to match the regular fetch implementation).
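
To make the shape of the &mut T case concrete, here is a toy model in plain Rust. It is only a sketch: `ToyColumn`, `ToyTicks` and `contiguous_mut` are hypothetical names invented for illustration, they are not Bevy types, and the actual ContiguousQueryData implementation may look quite different.

```rust
/// Hypothetical stand-in for one table column: values plus one "changed" tick per row.
pub struct ToyColumn<T> {
    values: Vec<T>,
    changed_ticks: Vec<u32>,
}

/// Hypothetical handle covering the ticks of every row in the returned slice.
pub struct ToyTicks<'a> {
    changed_ticks: &'a mut [u32],
    this_run: u32,
}

impl ToyTicks<'_> {
    /// Marks every row covered by the slice as changed in the current run.
    pub fn mark_all_as_updated(&mut self) {
        self.changed_ticks.fill(self.this_run);
    }
}

impl<T> ToyColumn<T> {
    pub fn new(values: Vec<T>) -> Self {
        let changed_ticks = vec![0; values.len()];
        Self { values, changed_ticks }
    }

    /// Returns the whole column as one mutable slice plus the tick handle.
    /// Nothing is marked as changed until the caller asks for it.
    pub fn contiguous_mut(&mut self, this_run: u32) -> (&mut [T], ToyTicks<'_>) {
        let ticks = ToyTicks {
            changed_ticks: self.changed_ticks.as_mut_slice(),
            this_run,
        };
        (self.values.as_mut_slice(), ticks)
    }

    pub fn values(&self) -> &[T] {
        &self.values
    }

    pub fn changed_ticks(&self) -> &[u32] {
        &self.changed_ticks
    }
}
```

Splitting the data slice from the tick handle lets a caller skip tick updates entirely when change detection is not needed, which is the trade-off the no_detection benchmarks measure.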

A new method, as_contiguous_iter, on QueryIter makes it possible to iterate using this trait.

The QueryData derive macro now supports contiguous items when the contiguous(target) attribute is added. The target can be all, mutable, or immutable; refer to the custom_query_param example.

Testing

  • sparse_set_contiguous_query test verifies that you can't use next_contiguous with sparse set components
  • test_contiguous_query_data test verifies that returned values are valid
  • base_contiguous benchmark (file is named iter_simple_contiguous.rs)
  • base_no_detection benchmark (file is named iter_simple_no_detection.rs)
  • base_no_detection_contiguous benchmark (file is named iter_simple_no_detection_contiguous.rs)
  • base_contiguous_avx2 benchmark (file is named iter_simple_contiguous_avx2.rs)

Showcase

See the contiguous_query and custom_query_param examples.

Example

let mut world = World::new();
let mut query = world.query::<(&Velocity, &mut Position)>();
let mut iter = query.iter_mut(&mut world);
// velocity's type is &[Velocity]
// position's type is &mut [Position]
// ticks's type is ContiguousComponentTicks
for (velocity, (position, mut ticks)) in iter.as_contiguous_iter().unwrap() {
    for (v, p) in velocity.iter().zip(position.iter_mut()) {
        p.0 += v.0;
    }
    // sets ticks
    ticks.mark_all_as_updated();
}

Benchmarks

Code for base benchmark:

#[derive(Component, Copy, Clone)]
struct Transform(Mat4);

#[derive(Component, Copy, Clone)]
struct Position(Vec3);

#[derive(Component, Copy, Clone)]
struct Rotation(Vec3);

#[derive(Component, Copy, Clone)]
struct Velocity(Vec3);

pub struct Benchmark<'w>(World, QueryState<(&'w Velocity, &'w mut Position)>);

impl<'w> Benchmark<'w> {
    pub fn new() -> Self {
        let mut world = World::new();

        world.spawn_batch(core::iter::repeat_n(
            (
                Transform(Mat4::from_scale(Vec3::ONE)),
                Position(Vec3::X),
                Rotation(Vec3::X),
                Velocity(Vec3::X),
            ),
            10_000,
        ));

        let query = world.query::<(&Velocity, &mut Position)>();
        Self(world, query)
    }

    #[inline(never)]
    pub fn run(&mut self) {
        for (velocity, mut position) in self.1.iter_mut(&mut self.0) {
            position.0 += velocity.0;
        }
    }
}

The benchmark iterates over 10,000 entities in a single table and adds each entity's 3-dimensional Velocity vector to its 3-dimensional Position vector.

| Name | Time | Time (AVX2) | Description |
|---|---|---|---|
| base | 5.5828 µs | 5.5122 µs | Iteration over components |
| base_contiguous | 4.8825 µs | 1.8665 µs | Iteration over contiguous chunks |
| base_contiguous_avx2 | 2.0740 µs | 1.8665 µs | Iteration over contiguous chunks with enforced avx2 optimizations |
| base_no_detection | 4.8065 µs | 4.7723 µs | Iteration over components while bypassing change detection through the bypass_change_detection() method |
| base_no_detection_contiguous | 4.3979 µs | 1.5797 µs | Iteration over contiguous chunks without registering update ticks |
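Using the contiguous 'iterator' makes the program a little faster without extra target features (4.88 µs versus 5.58 µs for base), and the loop over slices can be vectorized further: with AVX2 enabled it drops to 1.87 µs.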
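
For readers who want to see why slices help, the inner loop of the contiguous variant reduces to something like the plain-Rust sketch below. The `Vec3`, `Position` and `Velocity` types here are simplified stand-ins (not the glam or benchmark types), and `integrate` is a hypothetical helper name. Whether the compiler actually emits SIMD instructions for it depends on the optimization level and on target features (for example `-C target-feature=+avx2`), so treat it as an illustration rather than a guarantee.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Vec3 {
    pub x: f32,
    pub y: f32,
    pub z: f32,
}

#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Position(pub Vec3);

#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Velocity(pub Vec3);

/// Adds each velocity to the position at the same index.
/// Both inputs are plain slices, so the loop body has no per-entity
/// bookkeeping between iterations and is a good candidate for auto-vectorization.
pub fn integrate(positions: &mut [Position], velocities: &[Velocity]) {
    for (p, v) in positions.iter_mut().zip(velocities) {
        p.0.x += v.0.x;
        p.0.y += v.0.y;
        p.0.z += v.0.z;
    }
}
```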

Things to think about

  • Whether the offset parameter in ContiguousQueryData is necessary

@Jenya705 Jenya705 marked this pull request as draft November 30, 2025 15:04
@Jondolf added labels Nov 30, 2025:
  • C-Feature: A new feature, making something new possible
  • A-ECS: Entities, components, systems, and events
  • C-Performance: A change motivated by improving speed, memory usage or compile times
  • S-Needs-Review: Needs reviewer attention (from anyone!) to move forward
  • D-Complex: Quite challenging from either a design or technical perspective. Ask for help!
  • D-Unsafe: Touches with unsafe code in some way
@hymm
Contributor

hymm commented Dec 1, 2025

@Jenya705
Contributor Author

Jenya705 commented Dec 1, 2025

This PR only enables slices from tables to be returned directly when applicable. It doesn't implement any batching, and it doesn't guarantee any alignment beyond Rust's own (though these slices can still be used to apply SIMD).

  • Am I right in my understanding that some things might not properly vectorize due to alignment issues even if they use as_contiguous_iter?

This PR doesn't deal with alignment, but (as far as I understand) you can always take sub-slices that meet your alignment requirements. And, referring to issue #21861, the code gets vectorized even without any specific alignment.
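
One way to take such sub-slices in plain Rust is the standard `align_to_mut` method on slices. The sketch below assumes a slice of bare `f32` values rather than a real component type; `Lanes8` and `scale_in_place` are hypothetical names introduced only for this illustration. The standard library is allowed to return a shorter aligned middle part than is theoretically possible, so the code must be correct for any split.

```rust
/// Hypothetical 32-byte aligned block of eight `f32` lanes.
#[derive(Clone, Copy)]
#[repr(C, align(32))]
pub struct Lanes8(pub [f32; 8]);

/// Multiplies every element by `factor`, processing the 32-byte aligned
/// middle of the slice in blocks of eight lanes.
pub fn scale_in_place(values: &mut [f32], factor: f32) {
    // SAFETY: `Lanes8` is exactly eight `f32`s with no padding, and every bit
    // pattern that is valid for eight `f32`s is also a valid `Lanes8`.
    let (head, body, tail) = unsafe { values.align_to_mut::<Lanes8>() };

    // Unaligned elements before the first 32-byte boundary.
    for v in head.iter_mut() {
        *v *= factor;
    }
    // Aligned blocks: each `Lanes8` starts on a 32-byte boundary.
    for block in body.iter_mut() {
        for lane in block.0.iter_mut() {
            *lane *= factor;
        }
    }
    // Leftover elements that do not fill a whole block.
    for v in tail.iter_mut() {
        *v *= factor;
    }
}
```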

No, the returned slices do not have any specific (other than rust's) alignment requirements.

@chengts95

The solution looks promising to solve issue #21861.

If you want to use SIMD instructions explicitly, alignment is something you usually have to manage yourself (with an aligned allocator or a peeled prologue). Auto-vectorization won’t “update” the alignment for you – it just uses whatever alignment it can prove and otherwise emits unaligned loads. From that perspective, a contiguous slice is already sufficient; fully aligned SIMD is a separate concern on top of that.
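
A peeled prologue can be sketched with the standard pointer method `align_offset`. This is only an illustration under stated assumptions: it works on a bare `&[f32]`, the 32-byte boundary is an arbitrary choice, and `split_at_alignment` and `sum` are hypothetical helper names. The eight-lane accumulator stands in for what explicit SIMD code would do with aligned loads.

```rust
/// Splits `values` into an unaligned prologue and a rest that starts on an
/// `align`-byte boundary. `align` must be a power of two, otherwise
/// `align_offset` panics.
pub fn split_at_alignment(values: &[f32], align: usize) -> (&[f32], &[f32]) {
    // `align_offset` returns the number of elements to skip. It may return
    // `usize::MAX` when alignment cannot be reached, so clamp it to the length.
    let skip = values.as_ptr().align_offset(align).min(values.len());
    values.split_at(skip)
}

/// Sums a slice using a scalar prologue, eight-lane aligned blocks, and a scalar tail.
pub fn sum(values: &[f32]) -> f32 {
    let (prologue, rest) = split_at_alignment(values, 32);

    let mut chunks = rest.chunks_exact(8);
    let mut acc = [0.0f32; 8];
    for chunk in &mut chunks {
        for i in 0..8 {
            acc[i] += chunk[i];
        }
    }

    let scalar_part: f32 = prologue.iter().chain(chunks.remainder()).sum();
    scalar_part + acc.iter().sum::<f32>()
}
```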

Contributor

@hymm hymm left a comment

This is not a full review, but I'm on board with the general approach in this PR. Overall this is fairly straightforward. I imagine we'll eventually want to have some SIMD-aligned storage, but in the meantime users can probably align their components manually.

@github-actions
Contributor

github-actions bot commented Dec 3, 2025

You added a new example but didn't add metadata for it. Please update the root Cargo.toml file.

@Jenya705 Jenya705 marked this pull request as ready for review December 3, 2025 20:02
@alice-i-cecile added the M-Release-Note label (Work that should be called out in the blog due to impact) Dec 8, 2025
@github-actions
Contributor

github-actions bot commented Dec 8, 2025

It looks like your PR has been selected for a highlight in the next release blog post, but you didn't provide a release note.

Please review the instructions for writing release notes, then expand or revise the content in the release notes directory to showcase your changes.

Contributor

@hymm hymm left a comment

This is just a review of the changes to ThinSlicePtr. I'll try to review more later.

///
/// # Safety
///
/// `len` must be less or equal to the length of the slice.

Suggested change
- /// `len` must be less or equal to the length of the slice.
+ /// `len` must be less than or equal to the length of the slice.

#[cfg(debug_assertions)]
assert!(len <= self.len, "tried to create an out-of-bounds slice");

// SAFETY: The caller guarantees `len` is not greater than the length of the slice

For this safety comment you should also explain why all of these bullet points are met for from_raw_parts. The caller is only covering the last bullet point. https://doc.rust-lang.org/std/slice/fn.from_raw_parts.html#safety

  • data must be non-null, valid for reads for len * size_of::<T>() many bytes, and it must be properly aligned.

  • data must point to len consecutive properly initialized values of type T.

  • The memory referenced by the returned slice must not be mutated for the duration of lifetime 'a, except inside an UnsafeCell.

  • The total size len * size_of::<T>() of the slice must be no larger than isize::MAX, and adding that size to data must not “wrap around” the address space. See the safety documentation of pointer::offset.
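
A safety comment that addresses each bullet could look roughly like the sketch below. `ThinSlice` is a hypothetical, simplified type written for this illustration; it is not the ThinSlicePtr from the PR, and the actual invariants there may be justified differently.

```rust
use core::marker::PhantomData;

/// Hypothetical slice pointer that only stores its length in debug builds.
pub struct ThinSlice<'a, T> {
    ptr: *const T,
    #[cfg(debug_assertions)]
    len: usize,
    _marker: PhantomData<&'a [T]>,
}

impl<'a, T> ThinSlice<'a, T> {
    pub fn new(slice: &'a [T]) -> Self {
        Self {
            ptr: slice.as_ptr(),
            #[cfg(debug_assertions)]
            len: slice.len(),
            _marker: PhantomData,
        }
    }

    /// # Safety
    ///
    /// `len` must be less than or equal to the length of the slice this was created from.
    pub unsafe fn as_slice(&self, len: usize) -> &'a [T] {
        #[cfg(debug_assertions)]
        assert!(len <= self.len, "tried to create an out-of-bounds slice");

        // SAFETY:
        // - `ptr` came from `<[T]>::as_ptr` on a live `&'a [T]`, so it is
        //   non-null and properly aligned for `T`.
        // - The caller guarantees `len` does not exceed the original length, so
        //   `ptr` points to `len` consecutive initialized `T`s valid for reads.
        // - The original shared borrow lasts for `'a`, so the memory is not
        //   mutated during `'a` (except inside an `UnsafeCell`).
        // - A prefix of an existing slice cannot exceed `isize::MAX` bytes or
        //   wrap around the address space.
        unsafe { core::slice::from_raw_parts(self.ptr, len) }
    }
}
```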

}

/// Casts the slice to another type
pub fn cast<U>(&self) -> ThinSlicePtr<'a, U> {

I'm pretty sure this needs to be unsafe. This has a safety requirement that T and U have the same layout and valid bit representations.
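
To illustrate the concern: if `cast` were safe, safe code could reinterpret a pointer to `u8` values as a pointer to `bool` or `u64` values and then read invalid or out-of-bounds data. A sketch of an unsafe version with the requirement spelled out is shown below. `ThinPtr` and `Meters` are hypothetical types made up for this example, and the debug assertions only catch size and alignment mismatches, not invalid bit patterns.

```rust
use core::marker::PhantomData;
use core::mem::{align_of, size_of};

/// Hypothetical length-less slice pointer, for illustration only.
pub struct ThinPtr<'a, T> {
    ptr: *const T,
    _marker: PhantomData<&'a [T]>,
}

impl<'a, T> ThinPtr<'a, T> {
    pub fn new(slice: &'a [T]) -> Self {
        Self { ptr: slice.as_ptr(), _marker: PhantomData }
    }

    /// Reinterprets the elements as `U`.
    ///
    /// # Safety
    ///
    /// `U` must have the same size and alignment as `T`, and every element
    /// behind the pointer must be a valid value of type `U`.
    pub unsafe fn cast<U>(&self) -> ThinPtr<'a, U> {
        debug_assert_eq!(size_of::<T>(), size_of::<U>());
        debug_assert_eq!(align_of::<T>(), align_of::<U>());
        ThinPtr { ptr: self.ptr.cast::<U>(), _marker: PhantomData }
    }

    /// # Safety
    ///
    /// `index` must be in bounds of the slice this pointer was created from.
    pub unsafe fn get(&self, index: usize) -> &'a T {
        // SAFETY: the caller guarantees `index` is in bounds, and the
        // original borrow keeps the element alive for `'a`.
        unsafe { &*self.ptr.add(index) }
    }
}

/// `repr(transparent)` guarantees the same layout as the wrapped `f32`.
#[repr(transparent)]
pub struct Meters(pub f32);
```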

/// # Safety
///
/// - There must not be any aliases to the slice
/// - `len` must be less or equal to the length of the slice

Is this sound? We're basically converting a &[UnsafeCell<T>] to a &mut [T]. Feels like that shouldn't be sound.

Do we have any tests that mutate the change ticks? I would expect miri to catch this if we do.
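
A reduced, standalone version of the pattern in question can be written against the standard library and run under Miri. The sketch below is not the PR's code: `cells_as_mut_slice` is a hypothetical helper, and the claim that this is acceptable when no aliases exist is an assumption to be checked with Miri, not an established result for the PR itself.

```rust
use core::cell::UnsafeCell;

/// Views a slice of cells as one mutable slice of their contents.
///
/// # Safety
///
/// No other reference or pointer may read or write any element of `cells`
/// while the returned slice is alive.
pub unsafe fn cells_as_mut_slice<T>(cells: &[UnsafeCell<T>]) -> &mut [T] {
    // `UnsafeCell::raw_get` converts a pointer to the cell into a pointer to
    // its contents without creating an intermediate reference.
    let first: *mut T = UnsafeCell::raw_get(cells.as_ptr());
    // SAFETY: `UnsafeCell<T>` has the same in-memory representation as `T`,
    // the pointer covers `cells.len()` initialized elements, and the caller
    // guarantees exclusive access for the lifetime of the returned slice.
    unsafe { core::slice::from_raw_parts_mut(first, cells.len()) }
}
```

A test that writes through the returned slice and then reads the cells back exercises the mutation path the question is about.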


Development

Successfully merging this pull request may close these issues.

Raw table iteration to improve query iteration speed by bypassing change ticks

5 participants