# # Getting started
#
# Welcome to this "Getting Started" tutorial on using the PartitionedArrays.jl package. In this guide, you will learn how to implement a parallel version of the one-dimensional Jacobi method using PartitionedArrays. Before you start, please make sure you have installed the following packages:
# ```julia
# using Pkg
# Pkg.add("PartitionedArrays")
# Pkg.add("MPI")
# ```

# ## Learning Outcomes
# In this notebook, you will learn:
#
# - How to parallelize the one-dimensional Jacobi method
# - How to create a block partition with ghost cells
# - How to run functions in parallel using `map`
# - How to update ghost cells using `consistent!`
# - The difference between the debug and MPI execution modes
# - How to execute parallel Julia code with MPI
#

# ## The Jacobi method for the Laplace equation
#
# The [Jacobi method](https://en.wikipedia.org/wiki/Jacobi_method) is a numerical tool to solve systems of linear algebraic equations. One of its main applications is to solve the equations resulting from boundary value problems (BVPs), i.e., given the values at the boundary (of a grid), we are interested in finding the interior values that fulfill a certain equation.
#
# A sketch of the discretization of the one-dimensional Laplace equation with boundary conditions is given in the figure below. A possible application of the 1D Laplace equation is, for instance, to simulate the temperature of a thin bar whose two ends are kept at constant temperatures.

# 

# When solving the Laplace equation in 1D, the Jacobi method leads to the following iterative scheme: entry $i$ of the vector $u$ at iteration $t+1$ is computed as
#
# $u^{t+1}_i = \dfrac{u^t_{i-1}+u^t_{i+1}}{2}$

# ## Sequential version
# The following code implements the iterative scheme above for boundary conditions -1 and 1 on a grid with $n$ interior points, running for `niters` iterations.

function jacobi(n,niters)
    u = zeros(n+2)
    u[1] = -1
    u[end] = 1
    u_new = copy(u)
    for t in 1:niters
        for i in 2:(n+1)
            u_new[i] = 0.5*(u[i-1]+u[i+1])
        end
        u, u_new = u_new, u
    end
    u
end

jacobi(10,100)

# !!! note
#     In this version of the Jacobi method, we return after a given number of iterations. Other stopping criteria are
#     possible. For instance, one could iterate until the difference between `u` and `u_new` is below a given tolerance.
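
# As an illustration, a minimal sketch of such a tolerance-based variant could look as follows. The function name `jacobi_tol` and the choice of `maximum(abs, u_new - u)` as the convergence measure are ours, for illustration only; they are not part of the original code.
# ```julia
# function jacobi_tol(n, tol; maxiters=10_000)
#     u = zeros(n+2)
#     u[1] = -1
#     u[end] = 1
#     u_new = copy(u)
#     for t in 1:maxiters
#         for i in 2:(n+1)
#             u_new[i] = 0.5*(u[i-1]+u[i+1])
#         end
#         # stop when the largest change between iterations is below tol
#         maximum(abs, u_new - u) < tol && return u_new
#         u, u_new = u_new, u
#     end
#     u
# end
# ```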


# ### Extracting parallelism
# Consider the two nested loops of the Jacobi function to analyze where parallelism can be exploited:
#
# ```julia
# for t in 1:niters
#     for i in 2:(n+1)
#         u_new[i] = 0.5*(u[i-1]+u[i+1])
#     end
#     u, u_new = u_new, u
# end
# ```
#
# - The outer loop cannot be parallelized: the value of `u` at step `t+1` depends on its value at the previous step `t`.
# - The inner loop can be parallelized.
#
# #### Partitioning scheme
# We choose a block partition to distribute the data `u` over several processes. The image below illustrates the partitioning with 3 processes.

# 
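
# To make the partitioning idea concrete, the block ranges for our example could be computed by hand as sketched below. This is only an illustration (the helper `block_ranges` is not part of PartitionedArrays, and the exact split chosen by the package may differ slightly); later we will let `uniform_partition` do this work for us.
# ```julia
# # hypothetical helper: split 1:n into p contiguous, roughly equal blocks
# function block_ranges(n, p)
#     map(1:p) do rank
#         lo = div((rank-1)*n, p) + 1
#         hi = div(rank*n, p)
#         lo:hi
#     end
# end
#
# block_ranges(10, 3) # [1:3, 4:6, 7:10]
# ```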

# #### Data dependencies
# Recall the Jacobi update:
#
# `u_new[i] = 0.5*(u[i-1]+u[i+1])`
#
# Thus, in order to update the local entries of `u_new`, we also need remote entries of the vector `u` located on neighboring processes. The figure below shows the entries of `u` needed to update the local entries of `u_new` in a particular process (CPU 2).

# 

# #### Ghost (aka halo) cells
#
# A usual way of implementing the Jacobi method and related algorithms is to use so-called ghost cells. Ghost cells represent the missing data dependencies in the data owned by each process. After importing the appropriate values from the neighboring processes, one can perform the usual sequential Jacobi update locally on each process.

# 

# Thus, the algorithm is usually implemented in two main phases at each Jacobi iteration:
#
# 1. Fill the ghost entries via communication.
# 2. Perform the Jacobi update sequentially on each process.
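
# In pseudocode, each iteration of the parallel algorithm therefore has the following shape. The two helper names are placeholders only; the actual PartitionedArrays-based implementation of both phases is developed in the remainder of this tutorial.
# ```julia
# for t in 1:niters
#     # phase 1: communication, fill the ghost entries of u with the
#     # values owned by the neighboring processes
#     update_ghost_cells!(u)
#     # phase 2: computation, each process performs the sequential Jacobi
#     # update on its local block, reading the ghost entries of u
#     local_jacobi_update!(u_new, u)
#     u, u_new = u_new, u
# end
# ```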

# ## Parallel version
# Next, we will implement a parallel version of the Jacobi method using partitioned arrays. The parallel function takes the number of processes `p` as an additional argument. We start from the following empty function and fill it in step by step in the remainder of the tutorial.

function jacobi_par(n,niters,p)

end

using PartitionedArrays

# Define the grid size `n` and the number of iterations `niters`. We also set the number of processes to 3.

n = 10
niters = 100
p = 3

# The following line creates an array of the Julia type `LinearIndices`. This array holds the linear indices of a specified range and shape ([documentation](https://docs.julialang.org/en/v1/base/arrays/#Base.LinearIndices)).

ranks = LinearIndices((p,))
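
# For `p = 3`, this object simply holds the rank ids `1`, `2`, and `3`:
# ```julia
# collect(ranks) # returns [1, 2, 3]
# ```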

# ### Debug Mode
# While developing the parallel Jacobi method, we can make use of the PartitionedArrays debug mode to test the parallel code before running it with MPI. When running the code in parallel using MPI, the data type `MPIArray` is used to hold the partitioned data. This array type is not as flexible as standard Julia arrays, and many operations are not allowed on an `MPIArray` for performance reasons. For instance, it is not permitted to index arbitrary entries.
#
# In debug mode, one instead uses the data structure `DebugArray`, which is a wrapper around a standard Julia array and can therefore be used in sequential debugging sessions. This allows for easier development of parallel code, since debugging on multiple running instances can be challenging. Additionally, `DebugArray` emulates the limitations of `MPIArray`, which enables the user to detect possible MPI-related errors while debugging the code sequentially. For more information on debug and MPI mode, see the [Usage](https://www.francescverdugo.com/PartitionedArrays.jl/dev/usage/) section of the documentation.

ranks = DebugArray(LinearIndices((p,)))

# To demonstrate that `DebugArray` emulates the limitations of `MPIArray`, run the following code. It is expected to throw an error, since indexing is not permitted:
# ```julia
# ranks[1]
# ```


# ### Partition the data
#
# Next, we create a distributed partition of the data. Using the PartitionedArrays method `uniform_partition`, one can generate a block partition with roughly equal block sizes. It is also possible to create multi-dimensional partitions and to add ghost cells with `uniform_partition`. For more information on this function, see the [documentation](https://www.francescverdugo.com/PartitionedArrays.jl/dev/reference/partition/#PartitionedArrays.uniform_partition).

# The following line divides the `n=10` grid points into `p=3` approximately equally sized blocks and assigns the corresponding row indices to the ranks.

row_partition = uniform_partition(ranks,p,n)

# As discussed above, the Jacobi method requires the neighboring values $u_{i-1}$ and $u_{i+1}$ to update $u_i$. Therefore, some neighboring values that are stored on remote processes need to be communicated in each iteration. To store these values, we add ghost cells to the partition by calling `uniform_partition` with the optional argument `ghost=true`.

ghost = true
row_partition = uniform_partition(ranks,p,n,ghost)
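
# To inspect which global rows each process now stores (its own block plus the ghost rows), we can query the partition objects. The tutorial only uses `local_to_owner` and related queries below; assuming the analogous index query `local_to_global` is available, the overlap could be displayed like this:
# ```julia
# map(local_to_global, row_partition)
# ```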

# Note that rows 3, 4, 6, and 7 are now stored on more than one process. The `DebugArray` also keeps track of which process owns each row. This information can be retrieved with the function `local_to_owner`. The output is a `DebugArray` containing, for each local element, the rank id of its owner.

map(local_to_owner,row_partition)

# Likewise, it is possible to view which entries are the ghost cells in each part:

map(local_to_ghost, row_partition)

# And which process owns each of the local ghost cells:

map(ghost_to_owner, row_partition)

# The following line initializes the data structure that will hold the solution $u$ and fills it with zeros.

u = pzeros(Float64,row_partition)

# Note that, like a `DebugArray`, a `PVector` represents an array whose elements are distributed (i.e., partitioned) across processes, and indexing arbitrary entries is disabled here as well. Therefore, the following examples are expected to raise an error:
# ```julia
# u[1]
# u[end]
# ```


# To view the local values of a partitioned vector, use the function `partition` or `local_values`.

partition(u)

# `partition` returns a `DebugArray`, so again indexing, such as in the following examples, is not permitted:
# ```julia
# partition(u)[1][1]
# partition(u)[end][end]
# ```
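
# Besides `local_values`, it can be handy to look only at the entries a process owns, or only at its ghost entries. Assuming the companion accessors `own_values` and `ghost_values` (they are not used elsewhere in this tutorial), this would look like:
# ```julia
# own_values(u)
# ghost_values(u)
# ```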


# ### Initialize boundary conditions
# The values of the partition are still all 0, so next we need to set the correct boundary conditions:
# ```julia
# u[1] = -1
# u[end] = 1
# ```

# Since a `PVector` is distributed, one process cannot access the values owned by other processes, so we need a different approach: each process sets the boundary conditions locally. This is done with the following piece of code. Using the Julia function `map`, the function `set_bcs` is executed locally by each process on its locally available part of `partition(u)`. These local parts are standard Julia `Vector`s, which are allowed to be indexed.

function set_bcs(my_u,rank)
    @show rank
    if rank == 1
        my_u[1] = -1
    end
    if rank == 3
        my_u[end] = 1
    end
end
map(set_bcs,partition(u),ranks)
partition(u)

# Using `map`, we can apply the boundary conditions to each vector within the `DebugArray` individually. We can even do so without checking the rank: the local border cells (i.e. the ghost cells) that are not global boundaries then get initialized with the values `-1` and `1` as well. This is not a problem, since the ghost cells are overwritten with the values from the neighboring process in each iteration.

map(partition(u)) do my_u
    my_u[1] = -1
    my_u[end] = 1
end
partition(u)

# Remember that the Jacobi update alternates between reading from one data structure `u` and writing to another, `u_new`. Hence, we need a second data structure holding a copy of our partitioned vector. Using the Julia function `copy`, the new object has the same type and values as the original data structure `u`.

u_new = copy(u)

partition(u_new)

# ### Communication of ghost cell values
# The PartitionedArrays package provides the function `consistent!` to update all ghost values of `u` with the values from the corresponding remote owners. Thus, the local values are made globally _consistent_. The function returns a `Task`, which enables latency hiding (i.e. other computations can be performed between calling `consistent!` and `wait`). In the first iteration, calling `consistent!` effectively overwrites the initialization of the ghost values with `-1` and `1`.

t = consistent!(u)
wait(t)

partition(u)

# ### Updating grid values with Jacobi iteration

# After the ghost cells have been updated, each process can perform the Jacobi update on its local data. To perform the update on each part of the data in parallel, we again use `map`. You can verify that the grid points are updated correctly by running one iteration of the Jacobi method on the partitioned vectors `u` and `u_new` with the following code:

map(partition(u),partition(u_new)) do my_u, my_u_new
    my_n = length(my_u)
    for i in 2:(my_n-1)
        my_u_new[i] = 0.5*(my_u[i-1]+my_u[i+1])
    end
end
partition(u_new)
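
# As mentioned above, the `Task` returned by `consistent!` enables latency hiding: the interior entries of each block do not depend on ghost values, so they can be updated while the communication is still in flight, and only the entries adjacent to the ghost cells have to wait for it. The following sketch illustrates this pattern; it is an optional optimization and is not used in the rest of this tutorial.
# ```julia
# t = consistent!(u)                       # start the ghost exchange
# map(partition(u),partition(u_new)) do my_u, my_u_new
#     my_n = length(my_u)
#     for i in 3:(my_n-2)                  # interior entries: no ghost values needed
#         my_u_new[i] = 0.5*(my_u[i-1]+my_u[i+1])
#     end
# end
# wait(t)                                  # ghost values are now up to date
# map(partition(u),partition(u_new)) do my_u, my_u_new
#     my_n = length(my_u)
#     for i in (2, my_n-1)                 # entries next to the ghost cells
#         my_u_new[i] = 0.5*(my_u[i-1]+my_u[i+1])
#     end
# end
# ```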

# ### Final parallel implementation
# To conclude, we combine all the steps into the final parallel implementation:

function jacobi_par(n,niters,p)
    ranks = DebugArray(LinearIndices((p,)))
    ghost = true
    row_partition = uniform_partition(ranks,p,n,ghost)
    u = pzeros(Float64,row_partition)
    map(partition(u)) do my_u
        my_u[1] = -1
        my_u[end] = 1
    end
    u_new = copy(u)
    for iter in 1:niters
        t = consistent!(u)
        wait(t)
        map(partition(u),partition(u_new)) do my_u, my_u_new
            my_n = length(my_u)
            for i in 2:(my_n-1)
                my_u_new[i] = 0.5*(my_u[i-1]+my_u[i+1])
            end
        end
        u, u_new = u_new, u
    end
    u
end

u = jacobi_par(10,100,3)

partition(u)

# ## Parallel execution
# After having debugged the code sequentially, we only need to change a few lines to execute the Jacobi method in parallel using MPI. First of all, load the Julia MPI bindings provided by `MPI.jl`.

using MPI

# In general, any Julia program can be executed with MPI like so:

run(`$(mpiexec()) -np 3 julia -e 'println("hi!")'`);

# The command `mpiexec` launches MPI and `-np` specifies the number of processes. Instead of passing the code inline, you can also copy it into a file called `filename.jl` and launch it with
# ```
# run(`$(mpiexec()) -np 3 julia --project=. filename.jl`)
# ```
# ### The MPI mode
# Now we can call the main function, which calls the parallel Jacobi method, using `with_mpi(f)`. This expression calls the function `f` "in MPI mode". Essentially, `with_mpi(f)` calls `f` with the function `distribute_with_mpi` as its argument. The function `distribute_with_mpi` in turn creates an `MPIArray` from a given collection and distributes its items over the ranks of the given MPI communicator `comm` (if `comm` is not specified, the default communicator `MPI.COMM_WORLD` is used). The difference from debug mode is that now a real distributed `MPIArray` is used where previously a `DebugArray` was employed. To switch back to debug mode, simply replace `with_mpi` with `with_debug`.
#
# Finally, the whole program is placed in a Julia `quote` block and run with `mpiexec`.

code = quote
    using PartitionedArrays

    function main(distribute)
        function jacobi_par(n,niters,p)
            ranks = distribute(LinearIndices((p,)))
            ghost = true
            row_partition = uniform_partition(ranks,p,n,ghost)
            u = pzeros(Float64,row_partition)
            map(partition(u)) do my_u
                my_u[1] = -1
                my_u[end] = 1
            end
            u_new = copy(u)
            for iter in 1:niters
                t = consistent!(u)
                wait(t)
                map(partition(u),partition(u_new)) do my_u, my_u_new
                    my_n = length(my_u)
                    for i in 2:(my_n-1)
                        my_u_new[i] = 0.5*(my_u[i-1]+my_u[i+1])
                    end
                end
                u, u_new = u_new, u
            end
            u
        end
        u = jacobi_par(10,100,3)
        display(partition(u))
    end # main

    with_mpi(main)

end # quote

run(`$(mpiexec()) -np 3 julia --project=. -e $code`);