Intrinsic implementation prevents instruction shuffling

Consider the generated code for the two `foo` variants:

https://godbolt.org/z/P7TMe4vax

The code using `double` interleaves the multiplications for the first and second summand to achieve more instruction-level parallelism (ILP). The code using `std::simd` doesn't do this.
IMHO the reason is the use of intrinsics in the `std::simd` implementation, which prevents such optimizations. This can noticeably reduce the speedup you get from vectorization.
I have no good solution for this issue yet; I just want to raise awareness here.
Maybe annotations for intrinsics need to be introduced that tell the compiler an annotated intrinsic may be reordered and optimized like an ordinary operation.
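
To make the scheduling difference concrete without the Compiler Explorer link, here is a minimal sketch. It is not the code behind the link: the kernel `foo`, its body, and the hand-scheduled `foo_interleaved` are hypothetical stand-ins chosen only to show two independent multiplication chains whose results are summed.

```cpp
// Hypothetical kernel (an assumption, not the code from the link above):
// two independent multiplication chains, one per summand.
template <class T>
T foo(T a, T b) {
    T s1 = a * a * a * a;  // chain for the first summand
    T s2 = b * b * b * b;  // chain for the second summand
    return s1 + s2;
}

// Hand-written version of the interleaved order described above for the
// `double` variant: steps of both chains alternate, so each multiply can
// issue while the previous multiply of the other chain is still in flight.
inline double foo_interleaved(double a, double b) {
    double s1 = a * a;  // step 1, first summand
    double s2 = b * b;  // step 1, second summand
    s1 = s1 * a;        // step 2, first summand
    s2 = s2 * b;        // step 2, second summand
    s1 = s1 * a;        // step 3, first summand
    s2 = s2 * b;        // step 3, second summand
    return s1 + s2;
}
```

Both functions perform the same operations in the same per-chain order, so they return identical results; only the instruction order differs. The same `foo` template could be instantiated with a SIMD type to compare the generated code against the `double` instantiation.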

Intrinsic implementation prevents instruction shuffling #48