
Commit 7ba31ee

Add Jacobi tutorial to documentation
1 parent f167f73 commit 7ba31ee

7 files changed: +323 -0 lines changed

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -1,3 +1,4 @@
+.vscode/
 *.code-workspace
 *.jl.*.cov
 *.jl.cov

docs/jacobi_tutorial.jl

Lines changed: 319 additions & 0 deletions
@@ -0,0 +1,319 @@
# # Getting started
#
# Welcome to this "Getting Started" tutorial on using the PartitionedArrays.jl package. In this guide, you'll learn how to implement a parallel version of the one-dimensional Jacobi method using PartitionedArrays. Before you start, please make sure you have installed the following packages:
# ```julia
# using Pkg
# Pkg.add("PartitionedArrays")
# Pkg.add("MPI")
# ```

# ## Learning Outcomes
# In this notebook, you will learn:
#
# - How to parallelize the one-dimensional Jacobi method
# - How to create a block partition with ghost cells
# - How to run functions in parallel using `map`
# - How to update ghost cells using `consistent!`
# - The difference between the debug and MPI execution modes
# - How to execute parallel Julia code with MPI
#

# ## The Jacobi method for the Laplace equation
#
# The [Jacobi method](https://en.wikipedia.org/wiki/Jacobi_method) is a numerical tool to solve systems of linear algebraic equations. One of its main applications is to solve the equations resulting from boundary value problems (BVPs). That is, given the values at the boundary (of a grid), we are interested in finding the interior values that fulfill a certain equation.
#
# A sketch of the discretization of the one-dimensional Laplace equation with boundary conditions is given in the figure below. A possible application of the 1D Laplace equation is, for instance, to simulate the temperature of a thin bar whose two ends are kept at constant temperatures.

# ![discretization](assets/jacobi-discretization.png)

# When solving the Laplace equation in 1D, the Jacobi method leads to the following iterative scheme: the entry $i$ of vector $u$ at iteration $t+1$ is computed as
#
# $u^{t+1}_i = \dfrac{u^t_{i-1}+u^t_{i+1}}{2}$
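#
# For example, if $u^t_{i-1} = -1$ and $u^t_{i+1} = 0$, then $u^{t+1}_i = (-1+0)/2 = -0.5$.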

# ## Sequential version
# The following code implements the iterative scheme above for boundary conditions -1 and 1 on a grid with $n$ interior points and `niters` iterations.

function jacobi(n,niters)
    u = zeros(n+2)
    u[1] = -1
    u[end] = 1
    u_new = copy(u)
    for t in 1:niters
        for i in 2:(n+1)
            u_new[i] = 0.5*(u[i-1]+u[i+1])
        end
        u, u_new = u_new, u
    end
    u
end

jacobi(10,100)

# !!! note
#     In this version of the Jacobi method, we return after a given number of iterations. Other stopping criteria are
#     possible. For instance, iterate until the difference between `u` and `u_new` is below a tolerance.
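#
# For illustration, here is a sketch of such a tolerance-based variant of the sequential solver
# (the name `jacobi_tol` and the parameter `tol` are only used in this example):
# ```julia
# function jacobi_tol(n,tol)
#     u = zeros(n+2)
#     u[1] = -1
#     u[end] = 1
#     u_new = copy(u)
#     while true
#         for i in 2:(n+1)
#             u_new[i] = 0.5*(u[i-1]+u[i+1])
#         end
#         # stop when the largest update is below the tolerance
#         maximum(abs,u_new-u) < tol && return u_new
#         u, u_new = u_new, u
#     end
# end
# ```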


# ### Extracting parallelism
# Consider the two nested loops of the Jacobi function to analyze where parallelism can be exploited:
#
# ```julia
# for t in 1:niters
#     for i in 2:(n+1)
#         u_new[i] = 0.5*(u[i-1]+u[i+1])
#     end
#     u, u_new = u_new, u
# end
# ```
#
# - The outer loop cannot be parallelized. The value of `u` at step `t+1` depends on the value at the previous step `t`.
# - The inner loop can be parallelized.
#
# #### Partitioning scheme
# We choose a block partition to distribute the data `u` over several processes. The image below illustrates the partitioning with 3 processes.

# ![partition](assets/jacobi-partition.png)

# #### Data dependencies
# Recall the Jacobi update:
#
# `u_new[i] = 0.5*(u[i-1]+u[i+1])`
#
# Thus, in order to update the local entries of `u_new`, we also need remote entries of the vector `u` located on neighboring processes. The figure below shows the entries of `u` needed to update the local entries of `u_new` in a particular process (CPU 2).

# ![data-dependencies](assets/jacobi-data-dependencies.png)

# #### Ghost (aka halo) cells
#
# A usual way of implementing the Jacobi method and related algorithms is using so-called ghost cells. Ghost cells represent the missing data dependencies in the data owned by each process. After importing the appropriate values from the neighboring processes, one can perform the usual sequential Jacobi update locally on each process.

# ![ghost-cells](assets/jacobi-ghost-cell-update.png)

# Thus, the algorithm is usually implemented following two main phases at each Jacobi iteration:
#
# 1. Fill the ghost entries via communication
# 2. Do the Jacobi update sequentially on each process
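#
# In code, each iteration of the parallel algorithm therefore follows the pattern sketched below;
# `update_ghosts!` and `local_jacobi_update!` are hypothetical placeholders for the two phases,
# which we implement with PartitionedArrays in the remainder of this tutorial:
# ```julia
# for t in 1:niters
#     update_ghosts!(u)             # phase 1: communicate ghost values
#     local_jacobi_update!(u_new,u) # phase 2: sequential update on each process
#     u, u_new = u_new, u
# end
# ```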

# ## Parallel version
# Next, we will implement a parallel version of the Jacobi method using partitioned arrays. The parallel function will take the number of processes $p$ as an additional argument. We start from the following empty stub, develop its body step by step, and assemble the complete implementation at the end of this section.

function jacobi_par(n,niters,p)

end

using PartitionedArrays

# Define the grid size `n` and the number of iterations `niters`. We also specify the number of processes as 3.

n = 10
niters = 100
p = 3

# The following line creates an array of Julia type `LinearIndices`. This array holds linear indices of a specified range and shape ([documentation](https://docs.julialang.org/en/v1/base/arrays/#Base.LinearIndices)).

ranks = LinearIndices((p,))
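
# For instance, with `p = 3` this is simply the index vector `[1, 2, 3]`, i.e. one rank id per process:
# ```julia
# collect(LinearIndices((3,))) == [1, 2, 3]
# ```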

# ### Debug Mode
# While developing the parallel Jacobi method, we can make use of the PartitionedArrays debug mode to test the parallel code before running it with MPI. When running the code in parallel using MPI, the data type `MPIArray` is used to hold the partitioned data. This array type is not as flexible as standard Julia arrays, and many operations are not allowed for `MPIArray` for performance reasons. For instance, it is not permitted to index arbitrary entries.
#
# Essentially, in debug mode one uses the data structure `DebugArray`, which is a wrapper of a standard Julia array and can therefore be used in sequential debugging sessions. This allows for easier development of parallel code, since debugging on multiple running instances can be challenging. Additionally, `DebugArray` emulates the limitations of `MPIArray`, which enables the user to detect possible MPI-related errors while debugging the code sequentially. For more information on the debug and MPI modes, see the [Usage](https://www.francescverdugo.com/PartitionedArrays.jl/dev/usage/) section of the documentation.

ranks = DebugArray(LinearIndices((p,)))

# To demonstrate that `DebugArray` emulates the limitations of `MPIArray`, run the following code. It is expected to throw an error, since indexing is not permitted.
# ```julia
# ranks[1]
# ```


# ### Partition the data
#
# Next, we create a distributed partition of the data. Using the PartitionedArrays.jl function `uniform_partition`, one can generate a block partition with roughly equal block sizes. It is also possible to create multi-dimensional partitions and ghost cells with `uniform_partition`. For more information on the function, see the [documentation](https://www.francescverdugo.com/PartitionedArrays.jl/dev/reference/partition/#PartitionedArrays.uniform_partition).

# The following line divides the `n=10` grid points into `p=3` approximately equally sized blocks and assigns the corresponding row indices to the ranks.

row_partition = uniform_partition(ranks,p,n)
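
# To inspect which global rows were assigned to each part, we can query the partition with one of the
# index helpers of PartitionedArrays, for instance `own_to_global` (assumed here; it belongs to the same
# family of query functions as `local_to_owner` used below):
# ```julia
# map(own_to_global,row_partition)
# ```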

# As discussed above, the Jacobi method requires the neighboring values $u_{i-1}$ and $u_{i+1}$ to update $u_i$. Therefore, some neighboring values that are stored on remote processes need to be communicated in each iteration. To store these neighbor values, we add ghost cells to the partition by calling `uniform_partition` with the optional argument `ghost=true`.

ghost = true
row_partition = uniform_partition(ranks,p,n,ghost)

# Note that rows 3, 4, 6, and 7 are now stored on more than one process. The `DebugArray` also keeps the information about which process is the owner of each row. It is possible to retrieve this information with the function `local_to_owner`. The output is a `DebugArray` with the rank id of the owner of each local element.

map(local_to_owner,row_partition)

# Likewise, it is possible to view which are the ghost cells in each part of the partition:

map(local_to_ghost, row_partition)

# And which process owns each local ghost cell:

map(ghost_to_owner, row_partition)

# The following line initializes the data structure that will hold the solution $u$ and fills it with zeros.

u = pzeros(Float64,row_partition)

# Note that, like `DebugArray`, a `PVector` represents an array whose elements are distributed (i.e. partitioned) across processes, and indexing is disabled here as well. Therefore, the following examples are expected to raise an error.
# ```julia
# u[1]
# u[end]
# ```


# To view the local values of a partitioned vector, use the method `partition` or `local_values`.

partition(u)

# `partition` returns a `DebugArray`, so again indexing, as in the following examples, is not permitted.
# ```julia
# partition(u)[1][1]
# partition(u)[end][end]
# ```
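
# As mentioned above, the method `local_values` can be used for the same purpose:
# ```julia
# local_values(u)
# ```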


# ### Initialize boundary conditions
# The values of the partition are still all zero, so next we need to set the correct boundary conditions. In the sequential version, this was done as follows:
# ```julia
# u[1] = -1
# u[end] = 1
# ```

# Since `PVector` is distributed, one process cannot access the values that are owned by other processes, so we need to find a different approach: each process can set the boundary conditions locally. This is possible with the following piece of code. Using the Julia function `map`, the function `set_bcs` is executed locally by each process on its locally available part of `partition(u)`. These local parts are standard Julia `Vector`s and are allowed to be indexed.

function set_bcs(my_u,rank)
    @show rank
    if rank == 1
        my_u[1] = 1
    end
    if rank == 3
        my_u[end] = -1
    end
end
map(set_bcs,partition(u),ranks)
partition(u)

# Alternatively, using `map` with the do-block syntax, we can apply the boundary conditions to each vector within the `DebugArray` without checking the rank. The result is that the local border cells (i.e. the ghost cells), which are not global borders, will be initialized with the values `1` and `-1` as well. This is not a problem, since the ghost cells are overwritten with the values from the neighboring process in each iteration.

map(partition(u)) do my_u
    my_u[1] = 1
    my_u[end] = -1
end
partition(u)

# Remember that the Jacobi update alternates between writing to one data structure `u` and another `u_new`. Hence, we need to create a second data structure with the same partition. Using the Julia function `copy`, the new object has the same type and values as the original data structure `u`.

u_new = copy(u)

partition(u_new)

# ### Communication of ghost cell values
# The PartitionedArrays package provides the method `consistent!` to update all ghost values of the partitioned vector `u` with the values from the corresponding remote owners. Thus, the local values are made globally _consistent_. The function returns a `Task`, which enables latency hiding (i.e. other computations can be performed between calling `consistent!` and `wait`). In the first iteration, calling `consistent!` effectively overwrites the values `1` and `-1` that were written into the ghost cells during initialization.

t = consistent!(u)
wait(t)

partition(u)
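
# Since `consistent!` returns a `Task`, computations that do not read the ghost entries can be overlapped
# with the communication. A minimal sketch (`do_local_work` is a hypothetical placeholder, not part of this tutorial):
# ```julia
# t = consistent!(u)
# do_local_work() # anything that does not read the ghost entries of u
# wait(t)         # from here on, the ghost values are up to date
# ```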

# ### Updating grid values with Jacobi iteration

# After having updated the ghost cells, each process can perform the Jacobi update on its local data. To perform the update on each part of the data in parallel, we again use `map`. You can verify that the grid points are updated correctly by running one iteration of the Jacobi method on the partitioned vectors `u` and `u_new` with the following code:

map(partition(u),partition(u_new)) do my_u, my_u_new
    my_n = length(my_u)
    for i in 2:(my_n-1)
        my_u_new[i] = 0.5*(my_u[i-1]+my_u[i+1])
    end
end
partition(u_new)

# ### Final parallel implementation
# To conclude, we can combine the steps into a final parallel implementation:

function jacobi_par(n,niters,p)
    ranks = DebugArray(LinearIndices((p,)))
    ghost = true
    row_partition = uniform_partition(ranks,p,n,ghost)
    u = pzeros(Float64,row_partition)
    map(partition(u)) do my_u
        my_u[1] = 1
        my_u[end] = -1
    end
    u_new = copy(u)
    for iter in 1:niters
        t = consistent!(u)
        wait(t)
        map(partition(u),partition(u_new)) do my_u, my_u_new
            my_n = length(my_u)
            for i in 2:(my_n-1)
                my_u_new[i] = 0.5*(my_u[i-1]+my_u[i+1])
            end
        end
        u, u_new = u_new, u
    end
    u
end

u = jacobi_par(10,100,3)

partition(u)

# ## Parallel execution
# After having debugged the code sequentially, we just need to change a couple of lines to execute the Jacobi method in parallel using MPI. First of all, load the Julia MPI wrapper `MPI.jl`.

using MPI

# In general, any Julia program can be executed with MPI using `MPI.jl` like so:

run(`$(mpiexec()) -np 3 julia -e 'println("hi!")'`);

# The command `mpiexec` launches MPI and `-np` specifies the number of processes. Instead of passing the code directly, you can also copy the code into a file called `filename.jl` and launch it with
# ```julia
# run(`$(mpiexec()) -np 3 julia --project=. filename.jl`)
# ```
# ### The MPI mode
# Now we can call the main function, which calls the parallel Jacobi method, using `with_mpi(f)`. This expression calls the function `f` "in MPI mode". Essentially, `with_mpi(f)` calls `f` with the function argument `distribute_with_mpi`. The function `distribute_with_mpi` in turn creates an `MPIArray` from a given collection and distributes its items over the ranks of the given MPI communicator `comm`. (If `comm` is not specified, the standard communicator `MPI.COMM_WORLD` is used.) The difference from the debug mode is that now a truly distributed `MPIArray` is used where a `DebugArray` was employed before. To switch back to debug mode, simply replace `with_mpi` with `with_debug`.
#
# Finally, the whole code is placed in a Julia `quote` block and run with `mpiexec`.

code = quote

    using PartitionedArrays

    function main(distribute)
        function jacobi_par(n,niters,p)
            ranks = distribute(LinearIndices((p,)))
            ghost = true
            row_partition = uniform_partition(ranks,p,n,ghost)
            u = pzeros(Float64,row_partition)
            map(partition(u)) do my_u
                my_u[1] = 1
                my_u[end] = -1
            end
            u_new = copy(u)
            for iter in 1:niters
                t = consistent!(u)
                wait(t)
                map(partition(u),partition(u_new)) do my_u, my_u_new
                    my_n = length(my_u)
                    for i in 2:(my_n-1)
                        my_u_new[i] = 0.5*(my_u[i-1]+my_u[i+1])
                    end
                end
                u, u_new = u_new, u
            end
            u
        end
        u = jacobi_par(10,100,3)
        display(partition(u))
    end # main

    with_mpi(main)

end # quote

run(`$(mpiexec()) -np 3 julia --project=. -e $code`);
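
# To run the same `main` function in debug mode instead (for instance while developing), replace
# `with_mpi` with `with_debug` and call it directly in the current Julia session (assuming the
# definition of `main` above is available there):
# ```julia
# with_debug(main)
# ```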

docs/make.jl

Lines changed: 3 additions & 0 deletions
@@ -3,8 +3,10 @@ using Literate
 using PartitionedArrays
 
 examples_jl = joinpath(@__DIR__,"examples.jl")
+jacobi_tutorial_jl = joinpath(@__DIR__,"jacobi_tutorial.jl")
 src_dir = joinpath(@__DIR__,"src")
 Literate.markdown(examples_jl,src_dir)
+Literate.markdown(jacobi_tutorial_jl, src_dir)
 
 makedocs(
 sitename = "PartitionedArrays.jl",
@@ -16,6 +18,7 @@ makedocs(
 "Introduction" => "index.md",
 "usage.md",
 "examples.md",
+"jacobi_tutorial.md",
 "Reference" =>[
 "reference/backends.md",
 "reference/arraymethods.md",
4 binary files added (the tutorial's figure assets): 25 KB, 35.2 KB, 44.4 KB, 27.4 KB
