Skip to content

Conversation

@mcroomp
Copy link

@mcroomp mcroomp commented Nov 28, 2025

Ensure that we access coefficient blocks 32 byte aligned so that we can autovectorize and optimize some codepaths via SIMD

@vstroebel
Copy link
Owner

Just some random thoughts, but wouldn't it be better to expose the 8x8 nature of the blocks to the compiler by introducing a Block structure that is backed by an [8; [i16;8]]?
In addition it might be possible to make the exact datatype of a "row" an implementation detail. This could make it possible to switch from a [i16;8] to a i16x8 if wide or std:simd is enabled.
I'm not sure sure whether this really works, but maybe this helps to unify some code of different fdct implementations.

@Shnatsel
Copy link
Contributor

i16x8 should be a zero-cost conversion as long as the data is guaranteed to be aligned in memory, so I don't think it will be of any benefit compared to AlignedBlock.

We can even keep using unaligned loads for safety and the compiler will automatically transform them into aligned ones where possible.

@mcroomp
Copy link
Author

mcroomp commented Nov 29, 2025

The best row size depends on how wide the SIMD registers are, so it probably is better not to hardcode. The main advantage of this change is that even if you transmute to a SIMD in-place, the compile knows that it's properly aligned.

If you use the bytemuck crate you can cast directly from [i16;64] to the appropriate SIMD type safely (it statically asserts on the alignment).

@mcroomp
Copy link
Author

mcroomp commented Nov 29, 2025

Also there's some changes in here that came from cargo fmt, maybe as part of the checkin test you can add

cargo fmt --check

to the github action so that changes are always checked for formatting before checkin

@mcroomp
Copy link
Author

mcroomp commented Nov 29, 2025

I checked the generated code and it's true the compiler changes the loads to aligned with this change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants