@@ -252,8 +252,10 @@ Multidimensional distributed arrays
 
 The procedure discussed above remains the same for any type of array, of any
 dimensionality. With mpi4py-fft we can distribute any array of arbitrary dimensionality
-using an arbitrary number of processor groups. How to distribute is completely
-configurable through the classes in the :mod:`.pencil` module.
+using any number of processor groups. We only require that the number of processor
+groups is at least one less than the number of dimensions, since one axis must
+remain aligned. Apart from this the distribution is completely configurable through
+the classes in the :mod:`.pencil` module.
 
 We denote a global :math:`d`-dimensional array as :math:`u_{j_0, j_1, \ldots, j_{d-1}}`,
 where :math:`j_m\in\textbf{j}_m` for :math:`m=[0, 1, \ldots, d-1]`.
@@ -263,7 +265,7 @@ than one processor group, the groups are indexed, like :math:`P_0, P_1` etc.
 
 Lets illustrate using a 4-dimensional array with 3 processor groups. Let the
 array be aligned only in axis 3 first (:math:`u_{j_0/P_0, j_1/P_1, j_2/P_2, j_3}`),
-and then redistributed for alignment along axes 2, 1 and finally 0. Mathematically,
+and then redistribute for alignment along axes 2, 1 and finally 0. Mathematically,
 we will now be executing the three following global redistributions:
 
 .. math::
@@ -273,6 +275,13 @@ we will now be executing the three following global redistributions:
     u_{j_0/P_0, j_1, j_2/P_1, j_3/P_2} \xleftarrow[P_1]{2 \rightarrow 1} u_{j_0/P_0, j_1/P_1, j_2, j_3/P_2} \\
     u_{j_0, j_1/P_0, j_2/P_1, j_3/P_2} \xleftarrow[P_0]{1 \rightarrow 0} u_{j_0/P_0, j_1, j_2/P_1, j_3/P_2}
 
+Note that in the first step it is only processor group :math:`P_2` that is
+active in the redistribution, and the output (left hand side) is now aligned
+in axis 2. This can be seen since there is no processor group there to
+share the :math:`j_2` index.
+In the second step processor group :math:`P_1` is the active one, and
+in the final step :math:`P_0`.
+
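Review note: the bookkeeping behind the three redistributions can be sketched in plain Python. This is only an illustration under stated assumptions, not the mpi4py-fft API. The helpers ``redistribute`` and ``show`` are hypothetical names introduced here, as is the tuple layout: one entry per axis, holding the index of the processor group that shares that axis, or ``None`` for the aligned axis.

```python
def redistribute(layout, v, w):
    """Move alignment from axis v to axis w (hypothetical helper).

    layout[i] is the index of the processor group sharing axis i,
    or None if axis i is aligned (not distributed).
    Returns the new layout and the index of the active group.
    """
    if layout[v] is not None:
        raise ValueError(f"axis {v} is not aligned")
    group = layout[w]
    if group is None:
        raise ValueError(f"axis {w} is not distributed")
    new_layout = list(layout)
    new_layout[v] = group  # the group that shared axis w now shares axis v
    new_layout[w] = None   # axis w becomes the aligned axis
    return tuple(new_layout), group


def show(layout):
    """Format a layout in the u_{j_0/P_0, ...} notation of the text."""
    parts = [f"j_{i}" if g is None else f"j_{i}/P_{g}"
             for i, g in enumerate(layout)]
    return "u_{" + ", ".join(parts) + "}"


# Start aligned in axis 3 only, with groups P_0, P_1, P_2 on axes 0, 1, 2.
layout = (0, 1, 2, None)
history = [layout]
active = []
for v, w in [(3, 2), (2, 1), (1, 0)]:
    layout, group = redistribute(layout, v, w)
    history.append(layout)
    active.append(group)
    print(f"{v} -> {w} via P_{group}: {show(layout)}")
```

Each step touches exactly one group: the one that shared the axis about to become aligned. All other entries of the layout are left unchanged, which is why only :math:`P_2`, then :math:`P_1`, then :math:`P_0` take part.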
 Now, it is not necessary to use three processor groups just because we have a
 four-dimensional array. We could just as well have been using 2 or 1. The advantage
 of using more groups is that you can then use more processors in total. Assuming