add alignment on JPEG [i16; 64] blocks #28

mcroomp · 2025-11-28T16:37:14Z

Ensure that coefficient blocks are accessed with 32-byte alignment, so that the compiler can autovectorize these accesses and we can optimize some code paths with SIMD.
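For readers who want a concrete picture, here is a minimal sketch of what a 32-byte-aligned block wrapper could look like. It is not the code from this PR: the field name `data` and the helper `scale_block` are hypothetical, and only the name `AlignedBlock` comes from the discussion.

```rust
/// One 8x8 block of JPEG coefficients, stored as 64 `i16` values.
/// `align(32)` matches the width of a 256-bit register (assumed target: AVX2).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(C, align(32))]
pub struct AlignedBlock {
    /// Hypothetical field name; the real type may be laid out differently.
    pub data: [i16; 64],
}

impl Default for AlignedBlock {
    fn default() -> Self {
        AlignedBlock { data: [0; 64] }
    }
}

/// Hypothetical kernel: multiply every coefficient by `factor`.
/// Because the block is known to be 32-byte aligned, the compiler is free
/// to use aligned vector loads and stores if it vectorizes this loop.
pub fn scale_block(block: &mut AlignedBlock, factor: i16) {
    for v in block.data.iter_mut() {
        *v = v.wrapping_mul(factor);
    }
}
```

The size of the struct stays at 128 bytes, so a `Vec<AlignedBlock>` is as compact as a `Vec<[i16; 64]>`; only the starting address of each block is constrained.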

…uture optimizations and auto vectorization

vstroebel · 2025-11-29T15:31:54Z

Just some random thoughts, but wouldn't it be better to expose the 8x8 nature of the blocks to the compiler by introducing a Block structure that is backed by a [[i16; 8]; 8]?
In addition, it might be possible to make the exact data type of a "row" an implementation detail. This could make it possible to switch from a [i16; 8] to an i16x8 if wide or std::simd is enabled.
I'm not sure whether this really works, but maybe it helps to unify some code across the different fdct implementations.
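A rough sketch of that idea, assuming a hypothetical `Row` alias and hypothetical method names (`from_flat`, `to_flat`, `get`); none of this exists in the crate as far as this thread shows:

```rust
/// Hypothetical row type. Behind a feature flag this alias could point at a
/// SIMD vector type instead of a plain array; that switch is not shown here.
type Row = [i16; 8];

/// An 8x8 block whose row representation is an implementation detail.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(C, align(32))]
pub struct Block {
    rows: [Row; 8],
}

impl Block {
    /// Build a block from 64 coefficients in row-major order.
    pub fn from_flat(flat: [i16; 64]) -> Self {
        let mut rows = [[0i16; 8]; 8];
        for (i, v) in flat.iter().enumerate() {
            rows[i / 8][i % 8] = *v;
        }
        Block { rows }
    }

    /// Copy the coefficients back out in row-major order.
    pub fn to_flat(&self) -> [i16; 64] {
        let mut flat = [0i16; 64];
        for (r, row) in self.rows.iter().enumerate() {
            flat[r * 8..r * 8 + 8].copy_from_slice(row);
        }
        flat
    }

    /// Access by (row, column), so callers never name the row type.
    pub fn get(&self, row: usize, col: usize) -> i16 {
        self.rows[row][col]
    }
}
```

Callers only see `Block` and its methods, so swapping `Row` for a vector type would be a local change, provided the replacement type supports the same indexing and copying operations.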

Shnatsel · 2025-11-29T18:02:20Z

Converting to i16x8 should be zero-cost as long as the data is guaranteed to be aligned in memory, so I don't think a row-based Block will be of any benefit compared to AlignedBlock.

We can even keep using unaligned loads for safety and the compiler will automatically transform them into aligned ones where possible.

mcroomp · 2025-11-29T18:04:44Z

The best row size depends on how wide the SIMD registers are, so it is probably better not to hardcode it. The main advantage of this change is that even if you transmute to a SIMD type in place, the compiler knows that the data is properly aligned.

If you use the bytemuck crate you can cast directly from [i16;64] to the appropriate SIMD type safely (it statically asserts on the alignment).
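To illustrate the kind of check involved, here is a std-only sketch that does not call bytemuck itself. `Lanes16` is a hypothetical stand-in for a 256-bit SIMD vector type, `as_lanes` is a made-up helper, and the `AlignedBlock` layout is assumed rather than taken from the PR.

```rust
use std::mem::{align_of, size_of};

/// Assumed layout of the aligned block: 64 coefficients, 32-byte aligned.
#[repr(C, align(32))]
pub struct AlignedBlock {
    pub data: [i16; 64],
}

/// Hypothetical stand-in for a 256-bit vector of sixteen `i16` lanes.
#[derive(Clone, Copy)]
#[repr(C, align(32))]
pub struct Lanes16(pub [i16; 16]);

// Compile-time checks: the build fails if the sizes differ or if the block
// is less strictly aligned than the vector type it is reinterpreted as.
const _: () = assert!(size_of::<AlignedBlock>() == size_of::<[Lanes16; 4]>());
const _: () = assert!(align_of::<AlignedBlock>() >= align_of::<Lanes16>());

/// View the block as four 16-lane vectors without copying.
pub fn as_lanes(block: &AlignedBlock) -> &[Lanes16; 4] {
    // SAFETY: both types are `repr(C)`, contain only `i16` values with no
    // padding, have equal size, and the source alignment is sufficient
    // (both properties are asserted at compile time above).
    unsafe { &*(block as *const AlignedBlock as *const [Lanes16; 4]) }
}
```

With a plain `[i16; 64]`, whose alignment is only 2 bytes, the second assertion would fail at compile time, which is why the alignment has to be part of the block type.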

mcroomp · 2025-11-29T18:07:25Z

Also, some of the changes in here came from cargo fmt. Maybe you could add

cargo fmt --check

to the GitHub Action as part of the check-in test, so that formatting is always checked before check-in.

mcroomp · 2025-11-29T20:23:10Z

I checked the generated code, and the compiler does indeed change the loads to aligned ones with this change.

mcroomp added 4 commits November 28, 2025 17:33

add align on blocks so that we process on simd aligned to allow for f…

8cd3c7f

…uture optimizations and auto vectorization

aligned block

f756560

merge

3db9558

fix avx2

644411d

mcroomp added 2 commits November 29, 2025 18:52

merge

2916a60

fix merge

1c655d0
