
Commit 9b1c802

Adding some more documentation
1 parent cd3ddd1 commit 9b1c802

File tree: 12 files changed, +349 -109

docs/source/dft.rst

Lines changed: 3 additions & 3 deletions
@@ -84,7 +84,7 @@ We denote the entries of a two-dimensional array as :math:`u_{j_0, j_1}`,

  which corresponds to a row-major matrix
  :math:`\boldsymbol{u}=\{u_{j_0, j_1}\}_{(j_0, j_1) \in \textbf{j}_0 \times \textbf{j}_1}` of
  size :math:`N_0\cdot N_1`. Denoting also :math:`\omega_m=j_m k_m / N_m`, a
- two-dimensional forward and backward DFTs can be defined as
+ two-dimensional forward and backward DFT can be defined as

  .. math::
     :label: 2dfourier
@@ -93,7 +93,7 @@ two-dimensional forward and backward DFTs can be defined as

      u_{j_0, j_1} &= \sum_{k_1\in \textbf{k}_1} \Big( e^{2\pi i \omega_1} \sum_{k_0\in\textbf{k}_0} \Big( e^{2\pi i \omega_0} \hat{u}_{k_0, k_1} \Big) \Big), \quad \forall \, (j_0, j_1) \in \textbf{j}_0 \times \textbf{j}_1.

  Note that the forward transform corresponds to taking the 1D Fourier
- transform first along axis 1, once for each of the indices in :math:`j_0`.
+ transform first along axis 1, once for each of the indices in :math:`\textbf{j}_0`.
  Afterwards the transform is executed along axis 0. The two steps are more
  easily understood if we break things up a little bit and write the forward
  transform in :eq:`2dfourier` in two steps as
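
This two-step view is easy to check numerically. A minimal sketch using plain
Numpy (the array shape is arbitrary, chosen here only for illustration)::

    import numpy as np

    u = np.random.random((4, 5)).astype(complex)
    # Forward 2D DFT as two 1D passes: first along axis 1,
    # once per index along axis 0, then along axis 0.
    step1 = np.fft.fft(u, axis=1)
    u_hat = np.fft.fft(step1, axis=0)
    assert np.allclose(u_hat, np.fft.fftn(u, axes=(0, 1)))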
@@ -184,7 +184,7 @@ depending on the requested precision being single, double or long double,

  respectively. Apart from precision, the three classes are identical.
  All transforms are non-normalized by default. Note that all these functions
  are *planners*. They do not execute the transforms, they simply return an
- instance of a class that can do it. See docstrings of each function for usage.
+ instance of a class that can do it (see docstrings of each function for usage).
  For quick reference, the 2D transform :ref:`shown for Numpy <numpy2d>` can be
  done using :mod:`.fftw` as::

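A minimal sketch of what such a planned 2D transform can look like, assuming
the ``fftw.fftn``/``fftw.ifftn`` planners and the ``fftw.aligned`` helper named
in the text; the exact call signature and planner flags are assumptions, not
taken from this commit::

    import numpy as np
    from mpi4py_fft import fftw

    N = (64, 64)
    u = fftw.aligned(N, dtype=np.complex128)      # byte-aligned input array
    u_hat = fftw.aligned(N, dtype=np.complex128)  # byte-aligned output array
    fft = fftw.fftn(u, axes=(0, 1))               # planner only; executes nothing yet
    ifft = fftw.ifftn(u_hat, axes=(0, 1))

    # Fill after planning: measuring planners may scribble on the arrays.
    u[:] = np.random.random(N) + 1j*np.random.random(N)
    u_hat = fft(u, u_hat)                         # execute the forward transform
    uc = ifft(u_hat, normalize=True)              # transforms are non-normalized by default
    assert np.allclose(uc, u)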
docs/source/global.rst

Lines changed: 80 additions & 51 deletions
@@ -75,21 +75,22 @@ aligned (non-distributed). We may now inspect the low-level

      p0 = a.pencil

- The ``p0`` :class:`.Pencil` object now contains information about the
- distribution of a 2D dataarray of global shape (8, 8), that will be
- distributed in the first axis and aligned in the second. However, ``p0``
- does not contain any of the data itself. Distributed arrays are instead
- created using the information that is in ``p0``. The distributed array
- ``a`` uses the associated pencil to look up information about the global
- array, for example::
-
-     a.alignment
-     a.global_shape
-     a.subcomm
-     a.commsizes
-
- will return, respectively, ``1``, ``(8, 8)``, a list of two subcommunicators
- (subcomm) and finally their sizes. Naturally, their sizes will depend on the
+ The ``p0`` :class:`.Pencil` object contains information about the
+ distribution of a 2D data array of global shape (8, 8). The
+ distributed array ``a`` has been created using the information that is in
+ ``p0``, and ``p0`` is used by ``a`` to look up information about
+ the global array, for example::
+
+     >>> a.alignment
+     1
+     >>> a.global_shape
+     (8, 8)
+     >>> a.subcomm
+     (<mpi4py.MPI.Cartcomm at 0x10cc14a68>, <mpi4py.MPI.Cartcomm at 0x10e028690>)
+     >>> a.commsizes
+     [1, 1]
+
+ Naturally, the sizes of the communicators will depend on the
  number of processors used to run the program. If we used 4, then
  ``a.commsizes`` would return ``[1, 4]``.

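For context, ``a`` and ``p0`` come from earlier in this file. A minimal sketch
of that setup, assuming ``DistributedArray`` is importable from the top-level
package and that the subcommunicator spec ``[0, 1]`` (distribute axis 0, keep
axis 1 aligned) is what produces ``a.alignment == 1``::

    from mpi4py import MPI
    from mpi4py_fft import DistributedArray

    comm = MPI.COMM_WORLD
    N = (8, 8)
    a = DistributedArray(N, [0, 1])   # distribute axis 0, align axis 1
    p0 = a.pencil                     # the Pencil describing the distribution
    print(comm.rank, a.alignment, a.global_shape, a.commsizes)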
@@ -110,12 +111,13 @@ it would be a pure Numpy array (created on each processor) and it would

  not contain any of the information about the global array that it is
  part of ``(global_shape, pencil, subcomm, etc.)``. It contains the same
  amount of data as ``a`` though and ``a0`` is as such a perfectly fine
- distributed array.
+ distributed array. Used together with ``p0`` it contains exactly the
+ same information as ``a``.

  Since at least one axis needs to be aligned (non-distributed), a 2D array
  can only be distributed with one processor group. If we wanted to
  distribute the second axis instead
- of the first, then we would could have done::
+ of the first, then we would have done::

      a = DistributedArray(N, [1, 0])

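Under the same assumptions as the sketch above, the swapped spec can be
checked quickly (the expected alignment is inferred from the surrounding
text, not shown in this commit)::

    a = DistributedArray(N, [1, 0])   # align axis 0, distribute axis 1
    assert a.alignment == 0           # the aligned axis is now the first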
@@ -249,71 +251,98 @@ Multidimensional distributed arrays

  -----------------------------------

  The procedure discussed above remains the same for any type of array, of any
- dimension. With mpi4py-fft we can distribute any array of arbitrary dimensionality
+ dimensionality. With mpi4py-fft we can distribute any array of arbitrary dimensionality
  using an arbitrary number of processor groups. How to distribute is completely
  configurable through the classes in the :mod:`.pencil` module.

- The
-
  We denote a global :math:`d`-dimensional array as :math:`u_{j_0, j_1, \ldots, j_{d-1}}`,
  where :math:`j_m\in\textbf{j}_m` for :math:`m=[0, 1, \ldots, d-1]`.
  A :math:`d`-dimensional array distributed with only one processor group in the
  first axis is denoted as :math:`u_{j_0/P, j_1, \ldots, j_{d-1}}`. If using more
  than one processor group, the groups are indexed, like :math:`P_0, P_1` etc.

- Lets illustrate using a 4-dimensional array and 3 processor groups::
+ Let's illustrate using a 4-dimensional array with 3 processor groups. Let the
+ array be aligned only in axis 3 first (:math:`u_{j_0/P_0, j_1/P_1, j_2/P_2, j_3}`),
+ and then redistributed for alignment along axes 2, 1 and finally 0. Mathematically,
+ we will now be executing the three following global redistributions:
+
+ .. math::
+    :label: 4d_redistribute
+
+    u_{j_0/P_0, j_1/P_1, j_2, j_3/P_2} \xleftarrow[P_2]{3 \rightarrow 2} u_{j_0/P_0, j_1/P_1, j_2/P_2, j_3} \\
+    u_{j_0/P_0, j_1, j_2/P_1, j_3/P_2} \xleftarrow[P_1]{2 \rightarrow 1} u_{j_0/P_0, j_1/P_1, j_2, j_3/P_2} \\
+    u_{j_0, j_1/P_0, j_2/P_1, j_3/P_2} \xleftarrow[P_0]{1 \rightarrow 0} u_{j_0/P_0, j_1, j_2/P_1, j_3/P_2}
+
+ Now, it is not necessary to use three processor groups just because we have a
+ four-dimensional array. We could just as well have been using 2 or 1. The advantage
+ of using more groups is that you can then use more processors in total. Assuming
+ :math:`N = N_0 = N_1 = N_2 = N_3`, you can use a maximum of :math:`N^p` processors,
+ where :math:`p` is the number of processor groups. So for an array of shape
+ :math:`(8,8,8,8)` it is possible to use 8, 64 and 512 processors for 1, 2 and 3
+ processor groups, respectively. On the other hand, if you can get away with it,
+ or if you do not have access to a great number of processors, then fewer groups
+ are usually found to be faster for the same number of processors in total.
+
+ We can implement the global redistribution using the high-level
+ :class:`.DistributedArray` class::

      N = (8, 8, 8, 8)
+     a3 = DistributedArray(N, [0, 0, 0, 1])
+     a2 = a3.redistribute(2)
+     a1 = a2.redistribute(1)
+     a0 = a1.redistribute(0)
+
+ Note that the three redistribution steps correspond exactly to the three steps
+ in :eq:`4d_redistribute`.
+
+ Using the low-level API the same can be achieved with slightly more elaborate
+ coding. We start by creating pencils for the 4 different alignments::

      subcomm = Subcomm(comm, [0, 0, 0, 1])
-     p0 = Pencil(subcomm, N, axis=3)
-     p1 = p0.pencil(2)
-     p2 = p1.pencil(1)
-     p3 = p2.pencil(0)
+     p3 = Pencil(subcomm, N, axis=3)
+     p2 = p3.pencil(2)
+     p1 = p2.pencil(1)
+     p0 = p1.pencil(0)

  Here we have defined 4 different pencil groups, ``p0, p1, p2, p3``, aligned in
- axis 3, 2, 1 and 0, respectively. Transfer objects for arrays of type ``np.float``
+ axis 0, 1, 2 and 3, respectively. Transfer objects for arrays of type ``np.float``
  are then created as::

-     transfer01 = p0.transfer(p1, np.float)
-     transfer12 = p1.transfer(p2, np.float)
-     transfer23 = p2.transfer(p3, np.float)
+     transfer32 = p3.transfer(p2, np.float)
+     transfer21 = p2.transfer(p1, np.float)
+     transfer10 = p1.transfer(p0, np.float)

  Note that we can create transfer objects between any two pencils, not just
- neighbouring axes.
-
- We may now perform three different global redistributions as::
+ neighbouring axes. We may now perform three different global redistributions
+ as::

      a0 = np.zeros(p0.subshape)
      a1 = np.zeros(p1.subshape)
      a2 = np.zeros(p2.subshape)
      a3 = np.zeros(p3.subshape)
-     a0[:] = np.random.random(a0.shape)
-     transfer01.forward(a0, a1)
-     transfer12.forward(a1, a2)
-     transfer23.forward(a2, a3)
+     a3[:] = np.random.random(a3.shape)
+     transfer32.forward(a3, a2)
+     transfer21.forward(a2, a1)
+     transfer10.forward(a1, a0)

  Storing this code under ``pencils4d.py``, we can use 8 processors that will
  give us 3 processor groups with 2 processors in each group::

      mpirun -np 8 python pencils4d.py

- Mathematically, we will now, with the three calls to ``transfer``, be executing
- the three following global redistributions:
+ Note that with the low-level approach we can now easily go back using the
+ reverse ``backward`` method of the :class:`.Transfer` objects::

-     .. math::
+     transfer10.backward(a0, a1)

-         u_{j_0/P_0, j_1/P_1, j_2, j_3/P_2} \xleftarrow[P_2]{3 \rightarrow 2} u_{j_0/P_0, j_1/P_1, j_2/P_2, j_3} \\
-         u_{j_0/P_0, j_1, j_2/P_1, j_3/P_2} \xleftarrow[P_1]{2 \rightarrow 1} u_{j_0/P_0, j_1/P_1, j_2, j_3/P_2} \\
-         u_{j_0, j_1/P_0, j_2/P_1, j_3/P_2} \xleftarrow[P_0]{1 \rightarrow 0} u_{j_0/P_0, j_1, j_2/P_1, j_3/P_2}
+ A different approach is also possible with the high-level API::

+     a0.redistribute(darray=a1)
+     a1.redistribute(darray=a2)
+     a2.redistribute(darray=a3)

- Now, it is not necessary to use three processor groups just because we have a
- four-dimensional array. We could just as well have been using 2 or 1. The advantage
- of using more groups is that you can then use more processors in total. Assuming
- :math:`N = N_0 = N_1 = N_2 = N_3`, you can use a maximum of :math:`N^p` processors,
- where :math:`p` is
- the number of processor groups. So for an array of shape :math:`(8,8,8,8)`
- it is possible to use 8, 64 and 512 number of processors for 1, 2 and 3
- processor groups, respectively. On the other hand, if you can get away with it,
- or if you do not have access to a great number of processors, then fewer groups
- are usually found to be faster for the same number of processors in total.
+ which corresponds to the backward transfers. However, with the high-level
+ API the transfer objects are created (and deleted on exit) during the call
+ to ``redistribute`` and as such this latter approach may be slightly less
+ efficient.
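
For reference, a self-contained ``pencils4d.py`` might look as follows. This is
a sketch collecting the low-level steps above; the imports and the
``np.float64`` spelling (instead of the deprecated ``np.float`` used in the
text) are assumptions, not part of the commit::

    # pencils4d.py -- sketch of the low-level redistribution walkthrough
    import numpy as np
    from mpi4py import MPI
    from mpi4py_fft.pencil import Pencil, Subcomm

    comm = MPI.COMM_WORLD
    N = (8, 8, 8, 8)

    # Four pencils, aligned in axes 3, 2, 1 and 0, respectively
    subcomm = Subcomm(comm, [0, 0, 0, 1])
    p3 = Pencil(subcomm, N, axis=3)
    p2 = p3.pencil(2)
    p1 = p2.pencil(1)
    p0 = p1.pencil(0)

    # Transfer objects between neighbouring alignments
    transfer32 = p3.transfer(p2, np.float64)
    transfer21 = p2.transfer(p1, np.float64)
    transfer10 = p1.transfer(p0, np.float64)

    # Local arrays for each distribution, then redistribute 3 -> 2 -> 1 -> 0
    a3 = np.random.random(p3.subshape)
    a2 = np.zeros(p2.subshape)
    a1 = np.zeros(p1.subshape)
    a0 = np.zeros(p0.subshape)
    transfer32.forward(a3, a2)
    transfer21.forward(a2, a1)
    transfer10.forward(a1, a0)

    # And back again, using the reverse backward methods
    transfer10.backward(a0, a1)
    transfer21.backward(a1, a2)
    transfer32.backward(a2, a3)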

docs/source/howtocite.rst

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
+ How to cite?
+ ============
+
+ Please cite mpi4py-fft using
+
+ ::
+
+     @article{jpdc_fft,
+         author = {{Dalcin, Lisandro and Mortensen, Mikael and Keyes, David E}},
+         year = 2019,
+         title = {{Fast parallel multidimensional FFT using advanced MPI}},
+         journal = {{Journal of Parallel and Distributed Computing}},
+         volume = {in press}
+     }
+     @electronic{mpi4py-fft,
+         author = {{Lisandro Dalcin and Mikael Mortensen}},
+         title = {{mpi4py-fft}},
+         url = {https://bitbucket.org/mpi4py/mpi4py-fft}
+     }

docs/source/howtocontribute.rst

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+ How to contribute?
+ ==================
+
+ Mpi4py-fft is an open source project and anyone is welcome to contribute.
+ An easy way to get started is by suggesting a new enhancement on the
+ `issue tracker <https://bitbucket.org/mpi4py/mpi4py-fft/issues>`_. If you have
+ found a bug, then either report this on the issue tracker, or even better, make
+ a fork of the repository, fix the bug and then create a
+ `pull request <https://bitbucket.org/mpi4py/mpi4py-fft/pull-requests/>`_
+ to get the fix into the master branch.

docs/source/index.rst

Lines changed: 6 additions & 5 deletions
@@ -17,10 +17,11 @@ Welcome to mpi4py-fft's documentation!

     dft
     parallel
     io
+    installation
+    howtocite
+    howtocontribute

- Indices and tables
- ==================
+ .. toctree::
+    :caption: Indices and tables

- * :ref:`genindex`
- * :ref:`modindex`
- * :ref:`search`
+    indices

docs/source/indices.rst

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+ Indices and tables
+ ==================
+
+ * :ref:`genindex`
+ * :ref:`modindex`
+ * :ref:`search`
