Optimize scalar ycbcr conversion #27

vstroebel · 2025-11-25T18:13:44Z

This gives the compiler more information to vectorize the code.
On Zen 3 with `-C target-cpu=native`, this is nearly 40% faster than the current AVX2 code path in the ycbcr Criterion micro-benchmark.
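For readers unfamiliar with the technique, here is a minimal sketch of what a vectorization-friendly scalar kernel can look like. It is not the code from this PR. The function name `rgb_to_ycbcr`, the packed-RGB input with planar outputs, and the 16.16 fixed-point JFIF (full-range BT.601) coefficients are all assumptions. The idea is to give the compiler equal-length buffers and a simple iterator loop with no data-dependent branches, so that LLVM is free to auto-vectorize it. Whether it actually does so is worth confirming in the generated assembly.

```rust
/// Hypothetical scalar kernel: packed RGB8 pixels -> separate Y, Cb, Cr planes.
/// Coefficients are JFIF (full-range BT.601) values scaled by 2^16.
pub fn rgb_to_ycbcr(rgb: &[u8], y: &mut [u8], cb: &mut [u8], cr: &mut [u8]) {
    let n = rgb.len() / 3;
    // Fail loudly on mismatched buffers instead of letting `zip` silently stop early.
    assert_eq!(rgb.len(), n * 3);
    assert!(y.len() == n && cb.len() == n && cr.len() == n);

    // Iterating with `zip` avoids indexed writes into the output planes, so the
    // loop body is straight-line integer arithmetic that LLVM can auto-vectorize.
    let pixels = rgb.chunks_exact(3);
    let outputs = y.iter_mut().zip(cb.iter_mut()).zip(cr.iter_mut());
    for (px, ((out_y, out_cb), out_cr)) in pixels.zip(outputs) {
        let (r, g, b) = (i32::from(px[0]), i32::from(px[1]), i32::from(px[2]));
        // Y coefficients sum to 65536; adding 1 << 15 rounds to nearest.
        *out_y = ((19595 * r + 38470 * g + 7471 * b + (1 << 15)) >> 16) as u8;
        // Cb/Cr coefficients sum to 0; offset by 128 and round with (0.5 - epsilon)
        // so the largest possible result is 255, never 256, and no clamp is needed.
        *out_cb = ((-11059 * r - 21709 * g + 32768 * b + (128 << 16) + (1 << 15) - 1) >> 16) as u8;
        *out_cr = ((32768 * r - 27439 * g - 5329 * b + (128 << 16) + (1 << 15) - 1) >> 16) as u8;
    }
}
```

The `(1 << 15) - 1` rounding term for Cb and Cr follows libjpeg's trick of using a rounding factor of 0.5 minus epsilon, which keeps every result in `0..=255` without a branch.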

Shnatsel · 2025-11-29T16:36:51Z

This looks very promising! If we could also generate versions with `#[target_feature]` attributes for SSE 4.2 and AVX2 and dispatch to those, that'd be great!
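One way to get per-ISA versions with only the standard library is the pattern below, shown as an alternative to a crate-based approach. It is a sketch under assumptions: all names are hypothetical, and the kernel is reduced to luma only to keep it short. A portable body marked `#[inline(always)]` is inlined into small wrappers compiled with `#[target_feature(enable = ...)]`, so each wrapper gets its own copy of the loop optimized for that instruction set, and `is_x86_feature_detected!` picks one at runtime.

```rust
/// Portable body (hypothetical, luma only). `#[inline(always)]` makes it get
/// inlined into each wrapper below and recompiled with that wrapper's features.
#[inline(always)]
fn rgb_to_luma_impl(rgb: &[u8], y: &mut [u8]) {
    assert_eq!(rgb.len(), y.len() * 3);
    for (px, out) in rgb.chunks_exact(3).zip(y.iter_mut()) {
        let (r, g, b) = (u32::from(px[0]), u32::from(px[1]), u32::from(px[2]));
        *out = ((19595 * r + 38470 * g + 7471 * b + (1 << 15)) >> 16) as u8;
    }
}

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn rgb_to_luma_avx2(rgb: &[u8], y: &mut [u8]) {
    rgb_to_luma_impl(rgb, y)
}

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "sse4.2")]
unsafe fn rgb_to_luma_sse42(rgb: &[u8], y: &mut [u8]) {
    rgb_to_luma_impl(rgb, y)
}

/// Public entry point: picks the best available version at runtime.
pub fn rgb_to_luma(rgb: &[u8], y: &mut [u8]) {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: the CPU was just checked for AVX2 support.
            return unsafe { rgb_to_luma_avx2(rgb, y) };
        }
        if is_x86_feature_detected!("sse4.2") {
            // SAFETY: the CPU was just checked for SSE 4.2 support.
            return unsafe { rgb_to_luma_sse42(rgb, y) };
        }
    }
    // Fallback for older x86 CPUs and non-x86 targets.
    rgb_to_luma_impl(rgb, y)
}
```

The standard library caches the result of `is_x86_feature_detected!`, so repeated calls to the dispatcher stay cheap.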

The `multiversion` crate makes that easy, but I don't know whether it will work inside a macro. If it doesn't, we could change this function from being generated by a macro to being a const generic function.
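Here is a hypothetical illustration of the const generic approach. The names and pixel layouts are assumptions, since the PR does not show what the macro currently generates. One generic function is parameterized over the channel count and the positions of R, G and B, and thin non-generic wrappers pin down each layout. Because the parameters are compile-time constants, each instantiation is compiled with fixed indices, much like macro-generated code. The result is still an ordinary function that `#[target_feature]` wrappers can call like any other.

```rust
/// Hypothetical generic kernel: CHANNELS bytes per pixel, with R, G, B at the
/// given byte offsets. Each instantiation is monomorphized with constant indices.
#[inline(always)]
fn to_luma<const CHANNELS: usize, const R: usize, const G: usize, const B: usize>(
    pixels: &[u8],
    y: &mut [u8],
) {
    assert_eq!(pixels.len(), y.len() * CHANNELS);
    for (px, out) in pixels.chunks_exact(CHANNELS).zip(y.iter_mut()) {
        let (r, g, b) = (u32::from(px[R]), u32::from(px[G]), u32::from(px[B]));
        *out = ((19595 * r + 38470 * g + 7471 * b + (1 << 15)) >> 16) as u8;
    }
}

// Thin wrappers that a macro might otherwise have stamped out one by one.
pub fn luma_rgb(p: &[u8], y: &mut [u8]) {
    to_luma::<3, 0, 1, 2>(p, y)
}
pub fn luma_bgr(p: &[u8], y: &mut [u8]) {
    to_luma::<3, 2, 1, 0>(p, y)
}
pub fn luma_rgba(p: &[u8], y: &mut [u8]) {
    to_luma::<4, 0, 1, 2>(p, y)
}
```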

Shnatsel · 2025-11-29T18:56:09Z

I can confirm the performance gain over the explicit AVX code path on a desktop Zen 4 as well, built with `-C target-cpu=x86-64-v3` rather than `native`, which would tune for my specific CPU.
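`x86-64-v3` is a microarchitecture level that guarantees AVX, AVX2, BMI1/BMI2, FMA and a few other extensions, so code built with it can use AVX2 unconditionally. `native` additionally enables everything the build machine supports and tunes scheduling for that CPU. The small sketch below uses a hypothetical function name. It checks which case a binary is in by comparing what was enabled at compile time with what the running CPU reports.

```rust
/// Returns (AVX2 enabled at compile time, AVX2 supported by the running CPU).
fn avx2_status() -> (bool, bool) {
    // True only if the crate was compiled with AVX2 enabled,
    // e.g. via `-C target-cpu=x86-64-v3` or `-C target-feature=+avx2`.
    let compile_time = cfg!(target_feature = "avx2");

    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    let runtime = is_x86_feature_detected!("avx2");
    #[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
    let runtime = false;

    (compile_time, runtime)
}
```

With the default `x86-64` target CPU, the first value is `false` even on hardware that supports AVX2. That gap is what the runtime-dispatched `#[target_feature]` versions discussed above would cover.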

Optimize ycbcr conversion

d35f404
