
Commit b880258

Improve generated doc
- explicitly document the dispatching system
- more cross references
- typo fixes
- and many small details like the tag dispatching mechanism...
1 parent c5e9d93 commit b880258

9 files changed (+119 lines, -35 lines)


docs/Doxyfile

Lines changed: 10 additions & 4 deletions
@@ -1,9 +1,11 @@
 PROJECT_NAME = "xsimd"
 XML_OUTPUT = xml
-INPUT = ../include-refactoring/xsimd/types/xsimd_api.hpp \
-        ../include-refactoring/xsimd/types/xsimd_batch.hpp \
-        ../include-refactoring/xsimd/config/xsimd_config.hpp \
-        ../include-refactoring/xsimd/memory/xsimd_aligned_allocator.hpp
+INPUT = ../include/xsimd/types/xsimd_api.hpp \
+        ../include/xsimd/types/xsimd_batch.hpp \
+        ../include/xsimd/config/xsimd_arch.hpp \
+        ../include/xsimd/config/xsimd_config.hpp \
+        ../include/xsimd/memory/xsimd_alignment.hpp \
+        ../include/xsimd/memory/xsimd_aligned_allocator.hpp
 GENERATE_LATEX = NO
 GENERATE_MAN = NO
 GENERATE_RTF = NO
@@ -14,3 +16,7 @@ RECURSIVE = YES
 QUIET = YES
 JAVADOC_AUTOBRIEF = YES
 WARN_IF_UNDOCUMENTED = NO
+ENABLE_PREPROCESSING = YES
+MACRO_EXPANSION = YES
+EXPAND_ONLY_PREDEF = YES
+PREDEFINED = XSIMD_NO_DISCARD=
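
These preprocessing settings are what let Doxygen render clean signatures: predefining XSIMD_NO_DISCARD as empty keeps the attribute macro out of the generated documentation. A minimal sketch of the effect (checked_sum and the macro definition below are illustrative, not taken from the xsimd sources):

    // Simplified stand-in for the real definition in the xsimd headers.
    #define XSIMD_NO_DISCARD [[nodiscard]]

    // What the compiler sees:
    XSIMD_NO_DISCARD int checked_sum(int a, int b) { return a + b; }

    // With EXPAND_ONLY_PREDEF = YES and PREDEFINED = XSIMD_NO_DISCARD=, Doxygen expands
    // the macro to nothing and documents the plain signature:
    //     int checked_sum(int a, int b)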

docs/source/api/batch_index.rst

Lines changed: 2 additions & 2 deletions
@@ -4,8 +4,8 @@
 
 The full license is in the file LICENSE, distributed with this software.
 
-Wrapper types
-=============
+Batch types
+===========
 
 .. toctree::

docs/source/api/data_transfer.rst

Lines changed: 7 additions & 0 deletions
@@ -11,3 +11,10 @@ Data transfer
    :project: xsimd
    :content-only:
 
+The following empty types are used for tag dispatching:
+
+.. doxygenstruct:: xsimd::aligned_mode
+   :project: xsimd
+
+.. doxygenstruct:: xsimd::unaligned_mode
+   :project: xsimd
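
As a quick illustration of how these tag types are meant to be used (a sketch following the load/store calls shown in the vectorized code page; double_first_batch is a hypothetical helper, not part of xsimd):

    #include "xsimd/xsimd.hpp"

    // The empty tag type selects the aligned or unaligned code path at compile
    // time, with no runtime cost.
    template <class Tag>
    void double_first_batch(const double* in, double* out, Tag tag)
    {
        using b_type = xsimd::batch<double>;
        b_type v = b_type::load(in, tag);   // aligned_mode -> load_aligned, unaligned_mode -> load_unaligned
        xsimd::store(out, v + v, tag);
    }

    // double_first_batch(p, q, xsimd::unaligned_mode{});  // any valid pointers
    // double_first_batch(p, q, xsimd::aligned_mode{});    // p and q must be suitably aligned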

docs/source/basic_usage.rst

Lines changed: 4 additions & 1 deletion
@@ -28,6 +28,8 @@ Here is an example that computes the mean of two sets of 4 double floating point
     return 0;
 }
 
+Note that in that case, the instruction set is explicitly specified in the batch type.
+
 This example outputs:
 
 .. code::
@@ -37,7 +39,7 @@ This example outputs:
 Auto detection of the instruction set extension to be used
 ----------------------------------------------------------
 
-The same computation operating on vectors and using the most performant instruction set available:
+The same computation operating on vectors and using the most performant instruction set available, with code that is generic over the batch size:
 
 .. code::
 
@@ -67,3 +69,4 @@ The same computation operating on vectors and using the most performant instruct
     }
 }
 
+In that case, the architecture is chosen based on the compilation flags, prioritizing the largest width and the most recent instruction set.
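
The contrast between the two flavours is easiest to see side by side; a minimal sketch (the aliases are illustrative):

    #include "xsimd/xsimd.hpp"

    // Explicit architecture: compiles to AVX instructions and requires an AVX-capable CPU.
    using avx_batch  = xsimd::batch<double, xsimd::avx>;

    // No architecture given: xsimd picks the best instruction set enabled by the
    // compilation flags (e.g. -mavx2 or -msse2), so the code stays portable.
    using auto_batch = xsimd::batch<double>;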

docs/source/index.rst

Lines changed: 4 additions & 0 deletions
@@ -19,6 +19,9 @@ vendors and compilers.
 `xsimd` provides a unified means for using these features for library authors. Namely, it enables manipulation of batches of numbers with the same arithmetic
 operators as for single values. It also provides accelerated implementation of common mathematical functions operating on batches.
 
+`xsimd` makes it easy to write a single algorithm, generate one version of the algorithm per micro-architecture and pick the best one at runtime, based on the
+running processor capability.
+
 You can find out more about this implementation of C++ wrappers for SIMD intrinsics at the `The C++ Scientist`_. The mathematical functions are a
 lightweight implementation of the algorithms also used in `boost.SIMD`_.
 
@@ -80,6 +83,7 @@ This software is licensed under the BSD-3-Clause license. See the LICENSE file f
    api/batch_manip
    api/math_index
    api/aligned_allocator
+   api/dispatching
 
 .. _The C++ Scientist: http://johanmabille.github.io/blog/archives/
 .. _boost.SIMD: https://github.com/NumScale/boost.simd

docs/source/installation.rst

Lines changed: 6 additions & 6 deletions
@@ -21,27 +21,27 @@
 Installation
 ============
 
-Although ``xsimd`` is a header-only library, we provide standardized means to install it, with package managers or with cmake.
+Although `xsimd` is a header-only library, we provide standardized means to install it, with package managers or with cmake.
 
-Besides the xsimd headers, all these methods place the ``cmake`` project configuration file in the right location so that third-party projects can use cmake's ``find_package`` to locate xsimd headers.
+Besides the `xsimd` headers, all these methods place the ``cmake`` project configuration file in the right location so that third-party projects can use cmake's ``find_package`` to locate `xsimd` headers.
 
 .. image:: conda.svg
 
 Using the conda-forge package
 -----------------------------
 
-A package for xsimd is available for the mamba (or conda) package manager.
+A package for `xsimd` is available for the `mamba <https://mamba.readthedocs.io>`_ (or `conda <https://conda.io>`_) package manager.
 
 .. code::
 
-    mamba install -c conda-forge xsimd
+    mamba install -c conda-forge xsimd
 
 .. image:: spack.svg
 
 Using the Spack package
 -----------------------
 
-A package for xsimd is available on the Spack package manager.
+A package for `xsimd` is available on the `Spack <https://spack.io>`_ package manager.
 
 .. code::
 
@@ -53,7 +53,7 @@ A package for xsimd is available on the Spack package manager.
 From source with cmake
 ----------------------
 
-You can also install ``xsimd`` from source with cmake. On Unix platforms, from the source directory:
+You can also install `xsimd` from source with `cmake <https://cmake.org/>`_. On Unix platforms, from the source directory:
 
 .. code::
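
Whichever installation route is used, a small program like the following can serve to check that the headers are visible to the compiler (a minimal sketch; build with your usual SIMD flags, e.g. -mavx2):

    #include <iostream>
    #include "xsimd/xsimd.hpp"

    int main()
    {
        xsimd::batch<double> b(1.5);            // broadcast 1.5 into every lane
        double out[xsimd::batch<double>::size];
        (b + b).store_unaligned(out);
        std::cout << out[0] << std::endl;       // expected output: 3
        return 0;
    }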

docs/source/vectorized_code.rst

Lines changed: 50 additions & 10 deletions
@@ -28,8 +28,8 @@ How can we used `xsimd` to take advantage of vectorization ?
 Explicit use of an instruction set
 ----------------------------------
 
-`xsimd` provides the template class ``batch<T, A>`` where ``A`` is the target architecture and ``T`` the type of the values involved in SIMD
-instructions. If you know which intruction set is available on your machine, you can directly use the corresponding specialization
+`xsimd` provides the template class :cpp:class:`xsimd::batch` parametrized by ``T`` and ``A`` types where ``T`` is the type of the values involved in SIMD
+instructions and ``A`` is the target architecture. If you know which instruction set is available on your machine, you can directly use the corresponding specialization
 of ``batch``. For instance, assuming the AVX instruction set is available, the previous code can be vectorized the following way:
 
 .. code::
@@ -60,19 +60,19 @@ of ``batch``. For instance, assuming the AVX instruction set is available, the p
     }
 
 However, if you want to write code that is portable, you cannot rely on the use of ``batch<double, xsimd::avx>``.
-Indeed this won't compile on a CPU where only SSE2 instruction set is available for instance. Fortuantely, if you don't set the second template parameter, ``xsimd`` picks the best architecture among the one available, based on the compiler flag you use.
+Indeed this won't compile on a CPU where only SSE2 instruction set is available for instance. Fortunately, if you don't set the second template parameter, `xsimd` picks the best architecture among the one available, based on the compiler flag you use.
 
 
 Aligned vs unaligned memory
 ---------------------------
 
-In the previous example, you may have noticed the ``load_unaligned/store_unaligned`` functions. These
+In the previous example, you may have noticed the :cpp:func:`xsimd::batch::load_unaligned` and :cpp:func:`xsimd::batch::store_unaligned` functions. These
 are meant for loading values from contiguous dynamically allocated memory into SIMD registers and
 reciprocally. When dealing with memory transfer operations, some instructions sets required the memory
 to be aligned by a given amount, others can handle both aligned and unaligned modes. In that latter case,
-operating on aligned memory is always faster than operating on unaligned memory.
+operating on aligned memory is generally faster than operating on unaligned memory.
 
-`xsimd` provides an aligned memory allocator which follows the standard requirements, so it can be used
+`xsimd` provides an aligned memory allocator, namely :cpp:class:`xsimd::aligned_allocator` which follows the standard requirements, so it can be used
 with STL containers. Let's change the previous code so it can take advantage of this allocator:
 
 .. code::
@@ -118,7 +118,7 @@ mechanism that allows you to easily write such a generic code:
     #include "xsimd/xsimd.hpp"
 
     template <class C, class Tag>
-    void mean(const C& a, const C& b, C& res)
+    void mean(const C& a, const C& b, C& res, Tag)
     {
         using b_type = xsimd::batch<double>;
         std::size_t inc = b_type::size;
@@ -139,10 +139,50 @@ mechanism that allows you to easily write such a generic code:
         }
     }
 
-Here, the ``Tag`` template parameter can be ``xsimd::aligned_mode`` or ``xsimd::unaligned_mode``. Assuming the existence
-of a ``get_alignment_tag`` metafunction in the code, the previous code can be invoked this way:
+Here, the ``Tag`` template parameter can be :cpp:struct:`xsimd::aligned_mode` or :cpp:struct:`xsimd::unaligned_mode`. Assuming the existence
+of a ``get_alignment_tag`` meta-function in the code, the previous code can be invoked this way:
 
 .. code::
 
-    mean<get_alignment_tag<decltype(a)>>(a, b, res);
+    mean(a, b, res, get_alignment_tag<decltype(a)>());
 
+Writing arch-independent code
+-----------------------------
+
+If your code may target either SSE2, AVX2 or AVX512 instruction set, `xsimd`
+makes it possible to make your code even more generic by using the architecture
+as a template parameter:
+
+
+.. code::
+
+    #include <cstddef>
+    #include <vector>
+    #include "xsimd/xsimd.hpp"
+
+    struct mean {
+        template <class C, class Tag, class Arch>
+        void operator()(Arch, const C& a, const C& b, C& res, Tag)
+        {
+            using b_type = xsimd::batch<double, Arch>;
+            std::size_t inc = b_type::size;
+            std::size_t size = res.size();
+            // size for which the vectorization is possible
+            std::size_t vec_size = size - size % inc;
+            for(std::size_t i = 0; i < vec_size; i += inc)
+            {
+                b_type avec = b_type::load(&a[i], Tag());
+                b_type bvec = b_type::load(&b[i], Tag());
+                b_type rvec = (avec + bvec) / 2;
+                xsimd::store(&res[i], rvec, Tag());
+            }
+            // Remaining part that cannot be vectorized
+            for(std::size_t i = vec_size; i < size; ++i)
+            {
+                res[i] = (a[i] + b[i]) / 2;
+            }
+        }
+    };
+
+This can be useful to implement runtime dispatching, based on the instruction set detected at runtime. `xsimd` provides a generic machinery :cpp:func:`xsimd::dispatch()` to implement
+this pattern. Based on the above example, instead of calling ``mean{}(arch, a, b, res, tag)``, one can use ``xsimd::dispatch(mean{})(a, b, res, tag)``.
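
A sketch of what the dispatched call site can look like, assuming the mean functor above and a build that includes the architectures to dispatch over:

    #include <vector>
    #include "xsimd/xsimd.hpp"

    void run(const std::vector<double>& a, const std::vector<double>& b, std::vector<double>& res)
    {
        // xsimd::dispatch inspects the running CPU and forwards to
        // mean::operator()(BestAvailableArch, ...).
        auto best_mean = xsimd::dispatch(mean{});
        best_mean(a, b, res, xsimd::unaligned_mode());
    }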

include/xsimd/types/xsimd_api.hpp

Lines changed: 12 additions & 12 deletions
@@ -1382,10 +1382,10 @@ auto ssub(T const& x, Tp const& y) -> decltype(x - y) {
 /**
  * @ingroup batch_data_transfer
  *
- * copy content of batch \c val to the buffer \c mem. the
+ * Copy content of batch \c val to the buffer \c mem. The
  * memory does not need to be aligned.
- * @param mem the memory buffer to read
- * @param val the batch to copy
+ * @param mem the memory buffer to write to
+ * @param val the batch to copy from
  */
 template<class To, class A, class From>
 void store(From* mem, batch<To, A> const& val, aligned_mode={}) {
@@ -1395,10 +1395,10 @@ void store(From* mem, batch<To, A> const& val, aligned_mode={}) {
 /**
  * @ingroup batch_data_transfer
  *
- * copy content of batch \c val to the buffer \c mem. the
+ * Copy content of batch \c val to the buffer \c mem. The
  * memory does not need to be aligned.
- * @param mem the memory buffer to read
- * @param val the batch to copy
+ * @param mem the memory buffer to write to
+ * @param val the batch to copy from
  */
 template<class To, class A, class From>
 void store(To* mem, batch<From, A> const& val, unaligned_mode) {
@@ -1408,10 +1408,10 @@ void store(To* mem, batch<From, A> const& val, unaligned_mode) {
 /**
  * @ingroup batch_data_transfer
  *
- * copy content of batch \c val to the buffer \c mem. the
- * memory does not need to be aligned.
- * @param mem the memory buffer to read
- * @param val the batch to copy
+ * Copy content of batch \c val to the buffer \c mem. The
+ * memory needs to be aligned.
+ * @param mem the memory buffer to write to
+ * @param val the batch to copy from
  */
 template<class To, class A, class From>
 void store_aligned(To* mem, batch<From, A> const& val) {
@@ -1421,9 +1421,9 @@ void store_aligned(To* mem, batch<From, A> const& val) {
 /**
  * @ingroup batch_data_transfer
  *
- * copy content of batch \c val to the buffer \c mem. the
+ * Copy content of batch \c val to the buffer \c mem. The
  * memory does not need to be aligned.
- * @param mem the memory buffer to read
+ * @param mem the memory buffer to write to
  * @param val the batch to copy
  */
 template<class To, class A, class From>
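
A short usage sketch of these store overloads (the buffer uses xsimd::aligned_allocator so the aligned variants are legal; the function name is illustrative):

    #include <vector>
    #include "xsimd/xsimd.hpp"

    void store_examples()
    {
        using b_type = xsimd::batch<double>;
        std::vector<double, xsimd::aligned_allocator<double>> buf(b_type::size);
        b_type v(2.0);                                           // broadcast
        xsimd::store(buf.data(), v);                             // defaults to aligned_mode
        xsimd::store(buf.data(), v, xsimd::unaligned_mode());    // explicit unaligned overload
        xsimd::store_aligned(buf.data(), v);                     // aligned store, spelled out
    }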

include/xsimd/types/xsimd_batch.hpp

Lines changed: 24 additions & 0 deletions
@@ -355,24 +355,48 @@ template<class T, class A>
 template<size_t... Is>
 batch<T, A>::batch(T const*data, detail::index_sequence<Is...>) : batch(kernel::set<A>(batch{}, A{}, data[Is]...)) {}
 
+/**
+ * Copy content of this batch to the buffer \c mem. The
+ * memory needs to be aligned.
+ * @param mem the memory buffer to write to
+ */
 template<class T, class A>
 template<class U>
 void batch<T, A>::store_aligned(U* mem) const {
     kernel::store_aligned<A>(mem, *this, A{});
 }
 
+/**
+ * Copy content of this batch to the buffer \c mem. The
+ * memory does not need to be aligned.
+ * @param mem the memory buffer to write to
+ */
 template<class T, class A>
 template<class U>
 void batch<T, A>::store_unaligned(U* mem) const {
     kernel::store_unaligned<A>(mem, *this, A{});
 }
 
+/**
+ * Loading from aligned memory. May involve a conversion if \c U is different
+ * from \c T.
+ *
+ * @param mem the memory buffer to read from.
+ * @return a new batch instance.
+ */
 template<class T, class A>
 template<class U>
 batch<T, A> batch<T, A>::load_aligned(U const* mem) {
     return kernel::load_aligned<A>(mem, kernel::convert<T>{}, A{});
 }
 
+/**
+ * Loading from unaligned memory. May involve a conversion if \c U is different
+ * from \c T.
+ *
+ * @param mem the memory buffer to read from.
+ * @return a new batch instance.
+ */
 template<class T, class A>
 template<class U>
 batch<T, A> batch<T, A>::load_unaligned(U const* mem) {
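
And a matching sketch for the member functions documented here (same-type loads only; the function name is illustrative):

    #include <vector>
    #include "xsimd/xsimd.hpp"

    void load_examples()
    {
        using b_type = xsimd::batch<double>;
        std::vector<double, xsimd::aligned_allocator<double>> in(b_type::size, 1.0);
        b_type x = b_type::load_aligned(in.data());     // aligned load
        b_type y = b_type::load_unaligned(in.data());   // no alignment requirement
        (x + y).store_unaligned(in.data());             // write the result back
    }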
