@vstroebel
Owner

This gives the compiler more information to vectorize the code.
On Zen 3 with target-cpu=native, this is nearly 40% faster than the current AVX2 code path in the ycbcr criterion micro-benchmark.
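
The diff itself is not shown in this conversation, so the following is only a generic sketch of the kind of scalar loop that compilers tend to autovectorize well, not the code from this PR. The function name `ycbcr_to_rgb`, the planar input layout, and the fixed-point JFIF coefficients are all assumptions made for illustration. The idea is to re-slice every buffer to one common length up front so that bounds checks can be hoisted out of the loop, and to keep the loop body branch-free.

```rust
/// Hypothetical planar YCbCr -> interleaved RGB conversion (JFIF, full range).
/// Coefficients are the usual JFIF constants scaled by 2^16; this is an
/// illustration, not the crate's actual implementation.
pub fn ycbcr_to_rgb(y: &[u8], cb: &[u8], cr: &[u8], rgb: &mut [u8]) {
    // Re-slice everything to a common length so the optimizer can prove
    // that every index below is in bounds.
    let n = y.len().min(cb.len()).min(cr.len()).min(rgb.len() / 3);
    let (y, cb, cr) = (&y[..n], &cb[..n], &cr[..n]);
    let rgb = &mut rgb[..n * 3];

    for (i, px) in rgb.chunks_exact_mut(3).enumerate() {
        let yv = (y[i] as i32) << 16;
        let cbv = cb[i] as i32 - 128;
        let crv = cr[i] as i32 - 128;

        // 0x8000 rounds to nearest before shifting back down by 16 bits.
        let r = (yv + 91881 * crv + 0x8000) >> 16;
        let g = (yv - 22554 * cbv - 46802 * crv + 0x8000) >> 16;
        let b = (yv + 116130 * cbv + 0x8000) >> 16;

        // clamp compiles to min/max rather than branches.
        px[0] = r.clamp(0, 255) as u8;
        px[1] = g.clamp(0, 255) as u8;
        px[2] = b.clamp(0, 255) as u8;
    }
}
```

Whether a loop like this actually vectorizes depends on the compiler version and the enabled target features, so the generated assembly or a benchmark is the only reliable check.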

@Shnatsel
Contributor

This looks very promising! If we could also generate versions with #[target_feature] attributes for SSE 4.2 and AVX2 and dispatch to those, that'd be great!

The multiversion crate makes that easy, but I don't know whether it will work inside a macro. If it doesn't, we could change this function from being generated by a macro to being a const generic function.
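
As a rough sketch of what such dispatch could look like when written by hand (roughly the kind of boilerplate the multiversion crate is meant to automate), the example below combines a const generic kernel with #[target_feature] wrappers and runtime detection. Everything in it is hypothetical: the names `luma_kernel`, `luma_avx2`, `luma_sse42`, and `luma`, the `STRIDE` parameter, and the luma computation itself are stand-ins, not this crate's functions.

```rust
/// Hypothetical kernel: BT.601 luma from interleaved pixels of STRIDE bytes
/// each (3 = RGB, 4 = RGBA). Marked inline(always) so that each wrapper
/// below gets its own copy compiled with that wrapper's target features.
#[inline(always)]
fn luma_kernel<const STRIDE: usize>(pixels: &[u8], y: &mut [u8]) {
    assert!(STRIDE >= 3, "need at least R, G and B per pixel");
    for (px, out) in pixels.chunks_exact(STRIDE).zip(y.iter_mut()) {
        // 0.299 R + 0.587 G + 0.114 B in 16-bit fixed point, rounded.
        let acc = 19595 * px[0] as u32 + 38470 * px[1] as u32 + 7471 * px[2] as u32 + 0x8000;
        *out = (acc >> 16) as u8;
    }
}

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn luma_avx2<const STRIDE: usize>(pixels: &[u8], y: &mut [u8]) {
    luma_kernel::<STRIDE>(pixels, y)
}

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "sse4.2")]
unsafe fn luma_sse42<const STRIDE: usize>(pixels: &[u8], y: &mut [u8]) {
    luma_kernel::<STRIDE>(pixels, y)
}

/// Picks the best available version at runtime and falls back to the
/// baseline build of the same kernel.
pub fn luma<const STRIDE: usize>(pixels: &[u8], y: &mut [u8]) {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was verified on the line above.
            return unsafe { luma_avx2::<STRIDE>(pixels, y) };
        }
        if is_x86_feature_detected!("sse4.2") {
            // SAFETY: SSE4.2 support was verified on the line above.
            return unsafe { luma_sse42::<STRIDE>(pixels, y) };
        }
    }
    luma_kernel::<STRIDE>(pixels, y)
}
```

A caller would pick the layout through the const parameter, for example `luma::<3>(&rgb, &mut y)` for RGB or `luma::<4>(&rgba, &mut y)` for RGBA. All versions share one scalar body, so the only difference between them is which instructions the compiler is allowed to use when vectorizing it.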

@Shnatsel
Contributor

Shnatsel commented Nov 29, 2025

I can confirm the performance gain over the explicit AVX2 code path on desktop Zen 4 as well. I built with -C target-cpu=x86-64-v3 rather than native, which would tune for my CPU specifically.
