@@ -78,56 +78,7 @@ Finally, to reduce the chance of a garbage collection occurring in the middle
 of the benchmark, ideally a garbage collection cycle should occur prior to the
 run of the benchmark, postponing the next cycle as far as possible.
 
-The `scala.testing.Benchmark` trait is predefined in the Scala standard
-library and is designed with above in mind. Here is an example of benchmarking
-a map operation on a concurrent trie:
-
-    import collection.parallel.mutable.ParTrieMap
-    import collection.parallel.ForkJoinTaskSupport
-
-    object Map extends testing.Benchmark {
-      val length = sys.props("length").toInt
-      val par = sys.props("par").toInt
-      val partrie = ParTrieMap((0 until length) zip (0 until length): _*)
-
-      partrie.tasksupport = new ForkJoinTaskSupport(new scala.concurrent.forkjoin.ForkJoinPool(par))
-
-      def run = {
-        partrie map {
-          kv => kv
-        }
-      }
-    }
-
-The `run` method embodies the microbenchmark code which will be run
-repetitively and whose running time will be measured. The object `Map` above
-extends the `scala.testing.Benchmark` trait and parses system specified
-parameters `par` for the parallelism level and `length` for the number of
-elements in the trie.
-
-After compiling the program above, run it like this:
-
-    java -server -cp .:../../build/pack/lib/scala-library.jar -Dpar=1 -Dlength=300000 Map 10
-
-The `server` flag specifies that the server VM should be used. The `cp`
-specifies the classpath and includes classfiles in the current directory and
-the scala library jar. Arguments `-Dpar` and `-Dlength` are the parallelism
-level and the number of elements. Finally, `10` means that the benchmark
-should be run that many times within the same JVM.
-
-Running times obtained by setting the `par` to `1`, `2`, `4` and `8` on a
-quad-core i7 with hyperthreading:
-
-    Map$    126    57    56    57    54    54    54    53    53    53
-    Map$     90    99    28    28    26    26    26    26    26    26
-    Map$    201    17    17    16    15    15    16    14    18    15
-    Map$    182    12    13    17    16    14    14    12    12    12
-
-We can see above that the running time is higher during the initial runs, but
-is reduced after the code gets optimized. Further, we can see that the benefit
-of hyperthreading is not high in this example, as going from `4` to `8`
-threads results only in a minor performance improvement.
-
+For proper benchmark examples, see the source code of the [Scala library benchmarks][3].
 
 ## How big should a collection be to go parallel?
 
@@ -162,114 +113,7 @@ depends on many factors. Some of them, but not all, include:
   collection cycle can be triggered. Depending on how the references
   to new objects are passed around, the GC cycle can take more or less time.
 
-Even in separation, it is not easy to reason about things above and
-give a precise answer to what the collection size should be. To
-roughly illustrate what the size should be, we give an example of
-a cheap side-effect-free parallel vector reduce (in this case, sum)
-operation performance on an i7 quad-core processor (not using
-hyperthreading) on JDK7:
-
-    import collection.parallel.immutable.ParVector
-
-    object Reduce extends testing.Benchmark {
-      val length = sys.props("length").toInt
-      val par = sys.props("par").toInt
-      val parvector = ParVector((0 until length): _*)
-
-      parvector.tasksupport = new collection.parallel.ForkJoinTaskSupport(new scala.concurrent.forkjoin.ForkJoinPool(par))
-
-      def run = {
-        parvector reduce {
-          (a, b) => a + b
-        }
-      }
-    }
-
-    object ReduceSeq extends testing.Benchmark {
-      val length = sys.props("length").toInt
-      val vector = collection.immutable.Vector((0 until length): _*)
-
-      def run = {
-        vector reduce {
-          (a, b) => a + b
-        }
-      }
-    }
-
-We first run the benchmark with `250000` elements and obtain the
-following results, for `1`, `2` and `4` threads:
-
-    java -server -cp .:../../build/pack/lib/scala-library.jar -Dpar=1 -Dlength=250000 Reduce 10 10
-    Reduce$    54    24    18    18    18    19    19    18    19    19
-    java -server -cp .:../../build/pack/lib/scala-library.jar -Dpar=2 -Dlength=250000 Reduce 10 10
-    Reduce$    60    19    17    13    13    13    13    14    12    13
-    java -server -cp .:../../build/pack/lib/scala-library.jar -Dpar=4 -Dlength=250000 Reduce 10 10
-    Reduce$    62    17    15    14    13    11    11    11    11    9
-
-We then decrease the number of elements down to `120000` and use `4`
-threads to compare the time to that of a sequential vector reduce:
-
-    java -server -cp .:../../build/pack/lib/scala-library.jar -Dpar=4 -Dlength=120000 Reduce 10 10
-    Reduce$    54    10    8    8    8    7    8    7    6    5
-    java -server -cp .:../../build/pack/lib/scala-library.jar -Dlength=120000 ReduceSeq 10 10
-    ReduceSeq$    31    7    8    8    7    7    7    8    7    8
-
-`120000` elements seems to be the around the threshold in this case.
-
-As another example, we take the `mutable.ParHashMap` and the `map`
-method (a transformer method) and run the following benchmark in the same environment:
-
-    import collection.parallel.mutable.ParHashMap
-
-    object Map extends testing.Benchmark {
-      val length = sys.props("length").toInt
-      val par = sys.props("par").toInt
-      val phm = ParHashMap((0 until length) zip (0 until length): _*)
-
-      phm.tasksupport = new collection.parallel.ForkJoinTaskSupport(new scala.concurrent.forkjoin.ForkJoinPool(par))
-
-      def run = {
-        phm map {
-          kv => kv
-        }
-      }
-    }
-
-    object MapSeq extends testing.Benchmark {
-      val length = sys.props("length").toInt
-      val hm = collection.mutable.HashMap((0 until length) zip (0 until length): _*)
-
-      def run = {
-        hm map {
-          kv => kv
-        }
-      }
-    }
-
-For `120000` elements we get the following times when ranging the
-number of threads from `1` to `4`:
-
-    java -server -cp .:../../build/pack/lib/scala-library.jar -Dpar=1 -Dlength=120000 Map 10 10
-    Map$    187    108    97    96    96    95    95    95    96    95
-    java -server -cp .:../../build/pack/lib/scala-library.jar -Dpar=2 -Dlength=120000 Map 10 10
-    Map$    138    68    57    56    57    56    56    55    54    55
-    java -server -cp .:../../build/pack/lib/scala-library.jar -Dpar=4 -Dlength=120000 Map 10 10
-    Map$    124    54    42    40    38    41    40    40    39    39
-
-Now, if we reduce the number of elements to `15000` and compare that
-to the sequential hashmap:
-
-    java -server -cp .:../../build/pack/lib/scala-library.jar -Dpar=1 -Dlength=15000 Map 10 10
-    Map$    41    13    10    10    10    9    9    9    10    9
-    java -server -cp .:../../build/pack/lib/scala-library.jar -Dpar=2 -Dlength=15000 Map 10 10
-    Map$    48    15    9    8    7    7    6    7    8    6
-    java -server -cp .:../../build/pack/lib/scala-library.jar -Dlength=15000 MapSeq 10 10
-    MapSeq$    39    9    9    9    8    9    9    9    9    9
 
-For this collection and this operation it makes sense
-to go parallel when there are above `15000` elements (in general,
-it is feasible to parallelize hashmaps and hashsets with fewer
-elements than would be required for arrays or vectors).
 
 
 
@@ -279,6 +123,8 @@ elements than would be required for arrays or vectors).
 
 1. [Anatomy of a flawed microbenchmark, Brian Goetz][1]
 2. [Dynamic compilation and performance measurement, Brian Goetz][2]
+3. [Scala library benchmarks][3]
 
 [1]: http://www.ibm.com/developerworks/java/library/j-jtp02225/index.html "flawed-benchmark"
 [2]: http://www.ibm.com/developerworks/library/j-jtp12214/ "dynamic-compilation"
+[3]: https://github.com/scala/scala/tree/2.12.x/test/benchmarks
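The examples removed by this diff relied on the old `scala.testing.Benchmark` trait, which handled repeated timed runs and warm-up for the JIT. As a rough, hand-rolled sketch of the same measurement discipline (repetitions within one JVM, a GC request before each timed run), one might write something like the following. The `ReduceBench` object and every name in it are illustrative inventions, not part of any library:

```scala
object ReduceBench {
  // Time one run of the measured operation, requesting a GC first so a
  // collection cycle is less likely to fall inside the timed region.
  def timeOnce(data: Vector[Long]): Long = {
    System.gc()
    val start = System.nanoTime()
    val sum = data.reduce(_ + _)            // the operation under measurement
    val elapsed = System.nanoTime() - start
    require(sum == data.sum)                // use the result so it stays live
    elapsed
  }

  def main(args: Array[String]): Unit = {
    val data = (0L until 1000000L).toVector
    // Repeat within the same JVM so later runs execute JIT-compiled code;
    // the first few runs are expected to be the slowest.
    val timesMs = (1 to 10).map(_ => timeOnce(data) / 1000000)
    println(timesMs.mkString(" "))
  }
}
```

This prints one time per run, making warm-up effects visible, much like the per-run listings in the removed section. For publishable numbers, prefer the dedicated harness used by the linked Scala library benchmarks, which also handles forking and statistical reporting.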