# x86-simd-sort

C++ header file library for SIMD based 16-bit, 32-bit and 64-bit data type
- sorting algorithms on x86 processors. We currently have AVX-512 and AVX2
- (32-bit and 64-bit only) based implementation of quicksort, quickselect,
- partialsort, argsort, argselect & key-value
- sort. The following API's are currently supported:
+ sorting algorithms on x86 processors. We currently have AVX-512 and AVX2 based
+ implementation of quicksort, quickselect, partialsort, argsort, argselect &
+ key-value sort. The static methods can be used by including
+ `src/x86simdsort-static-incl.h` file. Compiling them with the appropriate
+ compiler flags will choose either the AVX-512 or AVX2 versions. For AVX-512, we
+ recommend using -march=skylake-avx512 for 32-bit and 64-bit datatypes,
+ -march=icelake-client for 16-bit datatype and -march=sapphirerapids for
+ _Float16. For AVX2 just using -mavx2 will suffice. The following API's are
+ currently supported:

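As a rough way to confirm which path a given set of flags selects, the sketch below reports the SIMD level of the current build from the macros that GCC and Clang predefine when the matching `-m`/`-march` flags are active (`__AVX512F__`, `__AVX2__`). `compiled_simd_level` is a hypothetical helper, not part of this library, and whether the library keys its selection on exactly these macros is an assumption.

```cpp
#include <string>

// Hypothetical helper (not part of x86-simd-sort): reports which SIMD level
// this translation unit was compiled for, using compiler-predefined macros.
inline std::string compiled_simd_level()
{
#if defined(__AVX512F__)
    return "AVX-512"; // e.g. built with -march=skylake-avx512
#elif defined(__AVX2__)
    return "AVX2"; // e.g. built with -mavx2
#else
    return "scalar"; // neither AVX-512F nor AVX2 enabled at compile time
#endif
}
```

Printing the returned string from a small test program built with the same flags as the real target is a quick sanity check before benchmarking.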
#### Quicksort

@@ -13,8 +18,7 @@ Equivalent to `qsort` in
`std::sort` in [C++](https://en.cppreference.com/w/cpp/algorithm/sort).

```cpp
- void avx512_qsort<T>(T* arr, size_t arrsize, bool hasnan = false, bool descending = false);
- void avx2_qsort<T>(T* arr, size_t arrsize, bool hasnan = false, bool descending = false);
+ void x86simdsortStatic::qsort<T>(T* arr, size_t arrsize, bool hasnan = false, bool descending = false);
```
Supported datatypes: `uint16_t`, `int16_t`, `_Float16`, `uint32_t`, `int32_t`,
`float`, `uint64_t`, `int64_t` and `double`. AVX2 versions currently support
@@ -30,8 +34,7 @@ Equivalent to `std::nth_element` in


```cpp
- void avx512_qselect<T>(T* arr, size_t k, size_t arrsize, bool hasnan = false, bool descending = false);
- void avx2_qselect<T>(T* arr, size_t k, size_t arrsize, bool hasnan = false, bool descending = false);
+ void x86simdsortStatic::qselect<T>(T* arr, size_t k, size_t arrsize, bool hasnan = false, bool descending = false);
```
Supported datatypes: `uint16_t`, `int16_t`, `_Float16`, `uint32_t`, `int32_t`,
`float`, `uint64_t`, `int64_t` and `double`. AVX2 versions currently support
@@ -46,8 +49,7 @@ Equivalent to `std::partial_sort` in


```cpp
- void avx512_partial_qsort<T>(T* arr, size_t k, size_t arrsize, bool hasnan = false, bool descending = false)
- void avx2_partial_qsort<T>(T* arr, size_t k, size_t arrsize, bool hasnan = false, bool descending = false)
+ void x86simdsortStatic::partial_qsort<T>(T* arr, size_t k, size_t arrsize, bool hasnan = false, bool descending = false)
```
Supported datatypes: `uint16_t`, `int16_t`, `_Float16`, `uint32_t`, `int32_t`,
`float`, `uint64_t`, `int64_t` and `double`. AVX2 versions currently support
@@ -61,8 +63,7 @@ Equivalent to `np.argsort` in
[NumPy](https://numpy.org/doc/stable/reference/generated/numpy.argsort.html).

```cpp
- void avx512_argsort<T>(T* arr, size_t *arg, size_t arrsize, bool hasnan = false, bool descending = false);
- void avx2_argsort<T>(T* arr, size_t *arg, size_t arrsize, bool hasnan = false, bool descending = false);
+ void x86simdsortStatic::argsort<T>(T* arr, size_t *arg, size_t arrsize, bool hasnan = false, bool descending = false);
```
Supported datatypes: `uint32_t`, `int32_t`, `float`, `uint64_t`, `int64_t` and
`double`.
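For readers unfamiliar with argsort semantics, here is a scalar reference written with the standard library only. It is a sketch of what an argsort computes (the indices that would sort the array, with the input left untouched), not the SIMD implementation in this library. `reference_argsort` is a hypothetical name, and the choice of `std::stable_sort` is an assumption made only to give a deterministic order for equal elements.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Scalar reference only; not an x86-simd-sort function.
// Returns the indices that would sort `arr` in ascending order.
template <typename T>
std::vector<std::size_t> reference_argsort(const std::vector<T> &arr)
{
    std::vector<std::size_t> arg(arr.size());
    std::iota(arg.begin(), arg.end(), std::size_t{0}); // arg = 0, 1, 2, ...
    // Order the indices by the values they refer to; arr is not modified.
    std::stable_sort(arg.begin(), arg.end(),
                     [&arr](std::size_t a, std::size_t b) {
                         return arr[a] < arr[b];
                     });
    return arg;
}
```

A reference like this can be handy for checking the output of the vectorized routine on small inputs, keeping in mind that the relative order of equal elements may differ between the two.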
@@ -74,8 +75,7 @@ Equivalent to `np.argselect` in
[NumPy](https://numpy.org/doc/stable/reference/generated/numpy.argpartition.html).

```cpp
- void avx512_argselect<T>(T* arr, size_t *arg, size_t k, size_t arrsize);
- void avx2_argselect<T>(T* arr, size_t *arg, size_t k, size_t arrsize);
+ void x86simdsortStatic::argselect<T>(T* arr, size_t *arg, size_t k, size_t arrsize, bool hasnan = false);
```
Supported datatypes: `uint32_t`, `int32_t`, `float`, `uint64_t`, `int64_t` and
`double`.
@@ -84,10 +84,10 @@ The algorithm resorts to scalar `std::sort` if the array contains NaNs.

#### Key-value sort
```cpp
- void avx512_qsort_kv<T1, T2>(T1* key, T2* value, size_t arrsize);
- void avx2_qsort_kv<T1, T2>(T1* key, T2* value, size_t arrsize);
+ void x86simdsortStatic::keyvalue_qsort<T1, T2>(T1* key, T2* value, size_t arrsize, bool hasnan = false, bool descending = false);
```
- Supported datatypes: `uint64_t`, `int64_t` and `double`.
+ Supported datatypes: `uint32_t`, `int32_t`, `float`, `uint64_t`, `int64_t` and
+ `double`.
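The README does not spell out key-value sort semantics, so the sketch below assumes the usual meaning: keys are sorted in ascending order and each value moves with its key. It uses only the standard library, and `reference_keyvalue_sort` is a hypothetical name, not part of this library. It also assumes both arrays have the same length.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Scalar reference only; not an x86-simd-sort function.
// Sorts `key` ascending and applies the same permutation to `value`.
// Assumes key.size() == value.size().
template <typename K, typename V>
void reference_keyvalue_sort(std::vector<K> &key, std::vector<V> &value)
{
    // Compute the permutation that sorts the keys.
    std::vector<std::size_t> order(key.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&key](std::size_t a, std::size_t b) {
                         return key[a] < key[b];
                     });

    // Gather keys and values through that permutation.
    std::vector<K> sorted_key(key.size());
    std::vector<V> sorted_value(value.size());
    for (std::size_t i = 0; i < order.size(); ++i) {
        sorted_key[i] = key[order[i]];
        sorted_value[i] = value[order[i]];
    }
    key = std::move(sorted_key);
    value = std::move(sorted_value);
}
```

Unlike this out-of-place reference, the library signature takes raw pointers and an explicit `arrsize`, so a caller would pass `key.data()`, `value.data()` and the shared length.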

## Algorithm details

@@ -106,9 +106,7 @@ source code associated with that paper [3].
### Sample code `main.cpp`

```cpp
- #include "src/xss-common-includes.h"
- #include "src/xss-common-qsort.h"
- #include "src/avx512-32bit-qsort.hpp"
+ #include "src/x86simdsort-static-incl.h"

int main() {
    const int ARRSIZE = 1000;
@@ -120,7 +118,7 @@ int main() {
    }

    /* call avx512 quicksort */
-     avx512_qsort(arr.data(), ARRSIZE);
+     x86simdsortStatic::qsort(arr.data(), ARRSIZE);
    return 0;
}

@@ -129,7 +127,8 @@ int main() {
### Build using g++

```
- g++ main.cpp -mavx512f -mavx512dq -O3
+ g++ main.cpp -mavx512f -mavx512dq -mavx512vl -O3 /* for AVX-512 */
+ g++ main.cpp -mavx2 -O3 /* for AVX2 */
```

If you are using src files directly, then it is a header file only and we do
@@ -142,7 +141,7 @@ to include and build this library with your source code.
## Build requirements

The sorting routines relies only on the C++ Standard Library and requires a
- relatively modern compiler to build (gcc 8.x and above).
+ relatively modern compiler to build (ex: gcc 8.x and above).

## Instruction set requirements
