Intrinsic implementation prevents instruction shuffling #48

@krzikalla

Just consider the generated code for the two foo variants:

https://godbolt.org/z/P7TMe4vax

The code using double interleaves the multiplications of the first and second summand to achieve more instruction-level parallelism (ILP). The code using std::simd does not.
IMHO the reason is the use of intrinsics in the std::simd implementation, which prevents such optimizations. This can have quite an impact on the speedup you get from vectorization.
I have no good solution for this issue yet; I just want to raise awareness here.
Maybe annotations for intrinsics need to be introduced that tell the compiler that the annotated intrinsic may be optimized.
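
For readers who cannot open the link, here is a minimal sketch of the kind of function pair I mean. It is not the code behind the Compiler Explorer link: the name `foo` is reused for illustration only, and the six-argument signature, the helper names `foo_scalar` and `foo_simd`, and the alias `vdouble` are assumptions. The point is only the shape of the computation: two summands, each a short chain of multiplications that does not depend on the other. The SIMD variant is guarded because it assumes GCC with libstdc++'s `<experimental/simd>` and C++17 or later.

```cpp
// Hypothetical stand-in for the two `foo` variants; not the code from the link.
// Two summands, each a chain of multiplications independent of the other,
// so a scheduler is free to interleave the two chains.
template <class T>
T foo(T a, T b, T c, T d, T e, T f) {
    T first  = a * b * c;  // multiplication chain of the first summand
    T second = d * e * f;  // multiplication chain of the second summand
    return first + second;
}

// Scalar variant: plain double arithmetic.
double foo_scalar(double a, double b, double c,
                  double d, double e, double f) {
    return foo<double>(a, b, c, d, e, f);
}

// SIMD variant: same expression, instantiated with a data-parallel type.
// Assumption: GCC with libstdc++'s <experimental/simd>, C++17 or later.
#if defined(__GNUC__) && !defined(__clang__) && __cplusplus >= 201703L
#  if __has_include(<experimental/simd>)
#    include <experimental/simd>
namespace stdx = std::experimental;
using vdouble = stdx::native_simd<double>;

vdouble foo_simd(vdouble a, vdouble b, vdouble c,
                 vdouble d, vdouble e, vdouble f) {
    return foo<vdouble>(a, b, c, d, e, f);
}
#  endif
#endif
```

Compiling both variants with optimizations enabled and comparing the order of the multiply instructions in the assembly is the comparison I have in mind. Whether the interleaving shows up will depend on the compiler version, flags, and target.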
