@@ -140,8 +140,124 @@ the error.
140 140
141 141 When using MPI, the computational time to run some code can be different for each one of
142 142 the processes. Usually, one measures the time for each process and computes some statistics
143- of the resulting values. To this end, the library provides a special timer type called
144- [`PTimer`](@ref).
143+ of the resulting values. To do so, take time measurements with the tool of your choice and then `gather` the results
144+ on the root process for further analysis. This relies on the changes in version 0.4.1,
145+ which allow `gather` to be used on arbitrary objects.
146+
147+ In the following example, we force different computation times at each of the processes
148+ by sleeping for a time proportional to the rank id. We gather all the timings in the main process and compute some statistics:
149+
150+ ```julia
151+ using PartitionedArrays
152+ using Statistics
153+ with_mpi() do distribute
154+     np = 3
155+     ranks = distribute(LinearIndices((np,)))
156+     t = @elapsed map(ranks) do rank
157+         sleep(rank)
158+     end
159+     ts = gather(map(rank->t,ranks))
160+     map_main(ts) do ts
161+         @show ts
162+         @show maximum(ts)
163+         @show minimum(ts)
164+         @show Statistics.mean(ts)
165+     end
166+ end
167+ ```
168+
169+ ```
170+ ts = [1.001268313, 2.0023204, 3.001216396]
171+ maximum(ts) = 3.001216396
172+ minimum(ts) = 1.001268313
173+ Statistics.mean(ts) = 2.001601703
174+ ```
175+
176+ This mechanism also works for the other back-ends. For sequential ones, it provides the time
177+ spent by all parts combined. Note that `t` is measured outside the call to `map` and that the object passed to `gather` holds this same value for every part.
178+
179+ ```julia
180+ using PartitionedArrays
181+ using Statistics
182+ with_debug() do distribute
183+     np = 3
184+     ranks = distribute(LinearIndices((np,)))
185+     t = @elapsed map(ranks) do rank
186+         sleep(rank)
187+     end
188+     ts = gather(map(rank->t,ranks))
189+     map_main(ts) do ts
190+         @show ts
191+         @show maximum(ts)
192+         @show minimum(ts)
193+         @show Statistics.mean(ts)
194+     end
195+ end;
196+ ```
197+
198+ ```
199+ ts = [6.009726399, 6.009726399, 6.009726399]
200+ maximum(ts) = 6.009726399
201+ minimum(ts) = 6.009726399
202+ Statistics.mean(ts) = 6.009726398999999
203+ ```
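
The identical entries in the output above can be reproduced without any back-end. The following is a minimal sketch, not part of the library: a plain `map` over `1:np` stands in for a sequential back-end, and the sleeps are scaled by a hypothetical factor of `0.1` to keep the run short. Because the parts are visited one after another, `t` accumulates all the sleeps, and every entry of `ts` holds that same total.

```julia
using Statistics

np = 3

# Stand-in for a sequential back-end (assumption): a plain `map`
# visits the parts one after another in a single process.
t = @elapsed map(1:np) do rank
    sleep(0.1 * rank)  # hypothetical scaling, shorter than in the example above
end

# Same pattern as above: every part contributes the one shared measurement.
ts = map(rank -> t, 1:np)

@show ts
@show maximum(ts) == minimum(ts)
@show Statistics.mean(ts)
```

To obtain per-part timings in a sequential run, the measurement would have to be taken inside the function passed to `map` instead.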
204+
205+ We can also consider more sophisticated ways of measuring the times, e.g., with [TimerOutputs](https://github.com/KristofferC/TimerOutputs.jl).
206+
207+ ```julia
208+ using PartitionedArrays
209+ using Statistics
210+ using TimerOutputs
211+ with_mpi() do distribute
212+     np = 3
213+     ranks = distribute(LinearIndices((np,)))
214+     to = TimerOutput()
215+     @timeit to "phase 1" map(ranks) do rank
216+         sleep(rank)
217+     end
218+     @timeit to "phase 2" map(ranks) do rank
219+         sleep(2*rank)
220+     end
221+     tos = gather(map(rank->to,ranks))
222+     map_main(tos) do tos
223+         # check the timings on the first rank
224+         display(tos[1])
225+         # compute statistics for phase 1
226+         ts = map(tos) do to
227+             TimerOutputs.time(to["phase 1"])
228+         end
229+         @show ts
230+         @show maximum(ts)
231+         @show minimum(ts)
232+         @show Statistics.mean(ts)
233+     end
234+ end
235+ ```
236+
237+ ```
238+ ────────────────────────────────────────────────────────────────────
239+                             Time                  Allocations
240+                   ───────────────────────   ────────────────────────
241+ Tot / % measured:      10.3s / 29.3%             44.9MiB / 0.0%
242+
243+ Section   ncalls     time    %tot     avg     alloc    %tot      avg
244+ ────────────────────────────────────────────────────────────────────
245+ phase 2        1    2.00s   66.6%   2.00s      120B   50.0%     120B
246+ phase 1        1    1.00s   33.4%   1.00s      120B   50.0%     120B
247+ ────────────────────────────────────────────────────────────────────
248+ ts = [1002323746, 2001614329, 3004363808]
249+ maximum(ts) = 3004363808
250+ minimum(ts) = 1002323746
251+ Statistics.mean(ts) = 2.0027672943333333e9
252+ ```
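
The values of `ts` printed above are in nanoseconds, whereas `@elapsed` in the earlier examples reports seconds. As a small sketch, assuming the gathered values are available as a plain vector (copied here from the output above, with the hypothetical names `ts_ns` and `ts_s`), they can be converted before computing the same statistics:

```julia
using Statistics

# Phase 1 timings in nanoseconds, copied from the output above.
ts_ns = [1002323746, 2001614329, 3004363808]

# Convert to seconds so the values are comparable with `@elapsed` results.
ts_s = ts_ns ./ 1e9

@show ts_s
@show maximum(ts_s)
@show minimum(ts_s)
@show Statistics.mean(ts_s)
```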
253+
254+ In addition, the library provides a special timer type called [`PTimer`](@ref).
255+
256+ !!! note
257+     `PTimer` has been deprecated. Do time measurements with the tool of your choice and then `gather` the results
258+     on the root for further analysis (see above).
259+
260+
145 261 In the following example we force different computation times at each of the processes
146 262 by sleeping for a time proportional to the rank id.
147 263 When displayed, the instance of [`PTimer`](@ref) shows some statistics of the
@@ -170,7 +286,7 @@ Sleep 3.021e+00 1.021e+00 2.021e+00
170 286 ───────────────────────────────────────────
171 287 ```
172 288
173- This mechanism also works for the other back-ends. For sequential ones, it provides the type
289+ This mechanism also works for the other back-ends. For sequential ones, it provides the time
174 290 spent by all parts combined.
175 291
176 292 ```julia