55Here you will learn how to build and run the simplest example and measure the startup time on Linux using
66Debian packages
77
8- ### Build uber jar, print its size, and measure its start up time (tested with Oracle GraalVM 24 -dev)
8+ ### Build uber jar, print its size, and measure its start up time (tested with Oracle GraalVM 25 -dev)
99
1010``` sh
1111sudo apt install linux-tools-common linux-tools-generic
@@ -18,23 +18,23 @@ Expected output:
1818``` text
1919 Performance counter stats for './example01.jar' (100 runs):
2020
21- 275.05 msec task-clock # 2.067 CPUs utilized ( +- 0.19 % )
22- 1,970 context-switches # 7.162 K/sec ( +- 0.48 % )
23- 15 cpu-migrations # 54.536 /sec ( +- 2.23 % )
24- 22,159 page-faults # 80.565 K/sec ( +- 0.20 % )
25- 1,101,372,203 cycles # 4.004 GHz ( +- 0.20% )
26- 1,504,873,660 instructions # 1.37 insn per cycle ( +- 0.10% )
27- 297,952,673 branches # 1.083 G/sec ( +- 0.10% )
28- 9,110,277 branch-misses # 3.06% of all branches ( +- 0.11% )
29- TopdownL1 # 14.1 % tma_backend_bound
30- # 35.6 % tma_bad_speculation
31- # 30.0 % tma_frontend_bound
32- # 20.3 % tma_retiring ( +- 0.13% )
21+ 202.75 msec task-clock # 2.286 CPUs utilized ( +- 0.27 % )
22+ 2,191 context-switches # 10.806 K/sec ( +- 1.53 % )
23+ 39 cpu-migrations # 192.356 /sec ( +- 2.49 % )
24+ 24,775 page-faults # 122.195 K/sec ( +- 0.27 % )
25+ 303,999,880 cpu_atom/ cycles/ # 1.499 GHz ( +- 3.51% ) (71.71% )
26+ 1,005,678,639 cpu_core/cycles/ # 4.960 GHz ( +- 0.44% ) (77.24% )
27+ 431,508,692 cpu_atom/instructions/ # 1.42 insn per cycle ( +- 3.69% ) (71.71% )
28+ 1,684,651,998 cpu_core/instructions/ # 5.54 insn per cycle ( +- 0.47% ) (77.24% )
29+ 83,740,906 cpu_atom/branches/ # 413.027 M/sec ( +- 3.78% ) (71.71%)
30+ 335,230,949 cpu_core/branches/ # 1.653 G/sec ( +- 0.49% ) (77.24%)
31+ 2,017,405 cpu_atom/branch-misses/ # 2.41% of all branches ( +- 4.20% ) (71.71%)
32+ 9,450,892 cpu_core/branch-misses/ # 11.29% of all branches ( +- 0.47% ) (77.24% )
3333
34- 0.133067 +- 0.000190 seconds time elapsed ( +- 0.14 % )
34+ 0.088700 +- 0.000135 seconds time elapsed ( +- 0.15 % )
3535```
3636
37- ### Build Scala JS output, print its size, and measure its start up time with ` node `
37+ ### Build Scala JS output, print its size, and measure its start up time with ` node ` (tested with node.js 22)
3838
3939``` sh
4040curl https://raw.githubusercontent.com/creationix/nvm/master/install.sh | bash
@@ -49,23 +49,23 @@ Expected output:
4949``` text
5050 Performance counter stats for 'node ./example01.js' (100 runs):
5151
52- 21.58 msec task-clock # 0.996 CPUs utilized ( +- 0.16 % )
53- 19 context-switches # 880.402 /sec ( +- 1.65 % )
54- 0 cpu-migrations # 0.000 /sec
55- 2,705 page-faults # 125.341 K/sec ( +- 0.04 % )
56- 94,278,347 cycles # 4.369 GHz ( +- 0.14% )
57- 168,171,694 instructions # 1.78 insn per cycle ( +- 0.01% )
58- 28,793,274 branches # 1.334 G/sec ( +- 0.02% )
59- 580,994 branch-misses # 2.02% of all branches ( +- 0.06% )
60- TopdownL1 # 17.7 % tma_backend_bound
61- # 14.4 % tma_bad_speculation
62- # 31.8 % tma_frontend_bound
63- # 36.2 % tma_retiring ( +- 0.14% )
52+ 14.90 msec task-clock # 1.014 CPUs utilized ( +- 0.30 % )
53+ 20 context-switches # 1.343 K /sec ( +- 1.60 % )
54+ 2 cpu-migrations # 134.254 /sec ( +- 5.08% )
55+ 2,748 page-faults # 184.466 K/sec ( +- 0.00 % )
56+ 45,371,808 cpu_atom/ cycles/ # 3.046 GHz ( +- 2.61% ) (20.12% )
57+ 75,772,761 cpu_core/cycles/ # 5.086 GHz ( +- 0.32% ) (85.22% )
58+ 97,892,289 cpu_atom/instructions/ # 2.16 insn per cycle ( +- 3.38% ) (20.12% )
59+ 165,775,873 cpu_core/instructions/ # 3.65 insn per cycle ( +- 0.29% ) (85.22% )
60+ 16,573,542 cpu_atom/branches/ # 1.113 G/sec ( +- 3.35% ) (20.12%)
61+ 28,245,004 cpu_core/branches/ # 1.896 G/sec ( +- 0.27% ) (85.22%)
62+ 97,278 cpu_atom/branch-misses/ # 0.59% of all branches ( +- 11.08% ) (20.12%)
63+ 573,362 cpu_core/branch-misses/ # 3.46% of all branches ( +- 0.72% ) (85.22% )
6464
65- 0.0216665 +- 0.0000338 seconds time elapsed ( +- 0.16 % )
65+ 0.0146864 +- 0.0000435 seconds time elapsed ( +- 0.30 % )
6666```
6767
68- ### Build Scala JS Wasm output, print its size, and measure its start up time with ` node `
68+ ### Build Scala JS Wasm output, print its size, and measure its start up time with ` node ` (tested with node.js 22)
6969
7070``` sh
7171curl https://raw.githubusercontent.com/creationix/nvm/master/install.sh | bash
@@ -81,78 +81,78 @@ Expected output:
8181``` text
8282 Performance counter stats for 'node --experimental-wasm-exnref ./example01.js/main.js' (100 runs):
8383
84- 33.51 msec task-clock # 0.997 CPUs utilized ( +- 0.15 % )
85- 66 context-switches # 1.970 K/sec ( +- 0.44 % )
86- 0 cpu-migrations # 0.000 /sec
87- 3,102 page-faults # 92.567 K/sec ( +- 0.04 % )
88- 146,180,631 cycles # 4.362 GHz ( +- 0.13% )
89- 239,361,867 instructions # 1.64 insn per cycle ( +- 0.01 % )
90- 41,631,362 branches # 1.242 G/sec ( +- 0.01% )
91- 1,112,970 branch-misses # 2.67% of all branches ( +- 0.04 % )
92- TopdownL1 # 14.6 % tma_backend_bound
93- # 23.2 % tma_bad_speculation
94- # 32.0 % tma_frontend_bound
95- # 30.3 % tma_retiring ( +- 0.12 % )
84+ 23.36 msec task-clock # 1.036 CPUs utilized ( +- 0.23 % )
85+ 61 context-switches # 2.611 K/sec ( +- 0.47 % )
86+ 3 cpu-migrations # 128.413 /sec ( +- 4.53% )
87+ 3,156 page-faults # 135.090 K/sec ( +- 0.01 % )
88+ 81,924,942 cpu_atom/ cycles/ # 3.507 GHz ( +- 0.93% ) (10.81% )
89+ 115,425,581 cpu_core/cycles/ # 4.941 GHz ( +- 0.30 % )
90+ 151,891,461 cpu_atom/instructions/ # 1.85 insn per cycle ( +- 1.65% ) (10.81% )
91+ 228,699,784 cpu_core/instructions/ # 2.79 insn per cycle ( +- 0.23 % )
92+ 26,745,975 cpu_atom/branches/ # 1.145 G/sec ( +- 1.49% ) (10.81%)
93+ 39,592,483 cpu_core/branches/ # 1.695 G/sec ( +- 0.24% )
94+ 459,225 cpu_atom/branch-misses/ # 1.72% of all branches ( +- 1.76% ) (10.81%)
95+ 1,062,312 cpu_core/branch-misses/ # 3.97% of all branches ( +- 0.30 % )
9696
97- 0.0336197 +- 0.0000488 seconds time elapsed ( +- 0.15 % )
97+ 0.0225519 +- 0.0000530 seconds time elapsed ( +- 0.23 % )
9898```
9999
100- ### Build GraalVM native image, print its size, and measure its start up time
100+ ### Build GraalVM native image, print its size, and measure its start up time (tested with Oracle GraalVM 24)
101101
102102``` sh
103103sudo apt install linux-tools-common linux-tools-generic gcc zlib1g-dev
104104sudo sysctl kernel.perf_event_paranoid=1
105- scala-cli --power package --graalvm-jvm-id graalvm-oracle:23 --native-image example01.sc --force -o example01_graalvm.bin -- --no-fallback -O3 -H:+UnlockExperimentalVMOptions -R:MaxHeapSize=16m -H:-GenLoopSafepoints -H:-ParseRuntimeOptions -H:-IncludeMethodData --initialize-at-build-time
105+ scala-cli --power package --graalvm-jvm-id graalvm-oracle:24 --native-image example01.sc --force -o example01_graalvm.bin -- --no-fallback -O3 -H:+UnlockExperimentalVMOptions -R:MaxHeapSize=16m -H:-GenLoopSafepoints -H:-ParseRuntimeOptions -H:-IncludeMethodData --initialize-at-build-time
106106ls -l ./example01_graalvm.bin
107107perf stat -r 100 ./example01_graalvm.bin > /dev/null
108108```
109109Expected output:
110110``` text
111111 Performance counter stats for './example01_graalvm.bin' (100 runs):
112112
113- 1.54 msec task-clock # 0.929 CPUs utilized ( +- 2.17 % )
114- 1 context-switches # 647.939 /sec ( +- 2.93 % )
113+ 1.08 msec task-clock # 0.911 CPUs utilized ( +- 0.73 % )
114+ 1 context-switches # 927.547 /sec ( +- 4.79 % )
115115 0 cpu-migrations # 0.000 /sec
116- 535 page-faults # 346.647 K/sec ( +- 0.01% )
117- 6,194,402 cycles # 4.014 GHz ( +- 0.32% )
118- 8,776,339 instructions # 1.42 insn per cycle ( +- 0.04 % )
119- 1,661,644 branches # 1.077 G/sec ( +- 0.03% )
120- 18,496 branch-misses # 1.11% of all branches ( +- 0.87 % )
121- TopdownL1 # 32.3 % tma_backend_bound
122- # 6.6 % tma_bad_speculation
123- # 30.4 % tma_frontend_bound
124- # 30.7 % tma_retiring ( +- 0.31 % )
116+ 477 page-faults # 442.440 K/sec ( +- 0.01% )
117+ 4,315,458 cpu_atom/ cycles/ # 4.003 GHz ( +- 1.85% ) (19.75% )
118+ 3,872,466 cpu_core/cycles/ # 3.592 GHz ( +- 5.11 % )
119+ 10,829,542 cpu_atom/instructions/ # 2.51 insn per cycle ( +- 2.48% ) (19.75% )
120+ 5,351,563 cpu_core/instructions/ # 1.24 insn per cycle ( +- 5.21 % )
121+ 1,856,118 cpu_atom/branches/ # 1.722 G/sec ( +- 2.33% ) (19.75%)
122+ 1,000,400 cpu_core/branches/ # 927.918 M/sec ( +- 5.18% )
123+ 7,651 cpu_atom/branch-misses/ # 0.41% of all branches ( +- 6.90% ) (19.75%)
124+ 11,792 cpu_core/branch-misses/ # 0.64% of all branches ( +- 5.28 % )
125125
126- 0.0016622 +- 0.0000371 seconds time elapsed ( +- 2.23 % )
126+ 0.00118361 +- 0.00000667 seconds time elapsed ( +- 0.56 % )
127127```
128128
129- ### Build Scala Native image, print its size, and measure its start up time
129+ ### Build Scala Native image, print its size, and measure its start up time (tested with Scala Native 0.5.7)
130130
131131``` sh
132132sudo apt install linux-tools-common linux-tools-generic clang libstdc++-12-dev libgc-dev
133133sudo sysctl kernel.perf_event_paranoid=1
134- scala-cli --power package --native-version 0.5.5 --native example01.sc --native-mode release-full --force -o example01_native.bin
134+ scala-cli --power package --native-version 0.5.7 --native example01.sc --native-mode release-full --force -o example01_native.bin
135135ls -l ./example01_native.bin
136136perf stat -r 100 ./example01_native.bin > /dev/null
137137```
138138Expected output:
139139``` text
140140 Performance counter stats for './example01_native.bin' (100 runs):
141141
142- 0.80 msec task-clock # 0.823 CPUs utilized ( +- 0.41 % )
142+ 0.71 msec task-clock # 0.855 CPUs utilized ( +- 0.64 % )
143143 0 context-switches # 0.000 /sec
144144 0 cpu-migrations # 0.000 /sec
145- 246 page-faults # 306.115 K /sec ( +- 0.04 % )
146- 3,345,318 cycles # 4.163 GHz ( +- 0.49% )
147- 5,077,992 instructions # 1.52 insn per cycle ( +- 0.05 % )
148- 924,515 branches # 1.150 G/sec ( +- 0.05% )
149- 16,240 branch-misses # 1.76% of all branches ( +- 0.44 % )
150- TopdownL1 # 26.8 % tma_backend_bound
151- # 11.7 % tma_bad_speculation
152- # 29.4 % tma_frontend_bound
153- # 32.1 % tma_retiring ( +- 0.48 % )
145+ 731 page-faults # 1.024 M /sec ( +- 0.01 % )
146+ <not counted> cpu_atom/ cycles/ ( +-100.00% ) (0.00% )
147+ 3,769,012 cpu_core/cycles/ # 5.280 GHz ( +- 1.17 % )
148+ <not counted> cpu_atom/instructions/ ( +-100.00% ) (0.00% )
149+ 5,907,572 cpu_core/instructions/ # 139.11 insn per cycle ( +- 1.02 % )
150+ <not counted> cpu_atom/branches/ ( +-100.01% ) (0.00%)
151+ 1,055,882 cpu_core/branches/ # 1.479 G/sec ( +- 1.02% )
152+ <not counted> cpu_atom/branch-misses/ ( +-100.62% ) (0.00%)
153+ 9,104 cpu_core/branch-misses/ # 84.85% of all branches ( +- 1.35 % )
154154
155- 0.00097599 +- 0.00000574 seconds time elapsed ( +- 0.59 % )
155+ 0.00083490 +- 0.00000519 seconds time elapsed ( +- 0.62 % )
156156```
157157
158158## RFC-8259 validation (example02)
@@ -175,16 +175,16 @@ gunzip 2023_06_430_65B0_in_network_rates.json.gz
175175Speed of validation highly depends on performance of file reading on the particular disk system. Use the following command
176176to measure reading speed on Linux:
177177``` sh
178- sudo hdparm -Tt /dev/sda
178+ sudo hdparm -Tt /dev/nvme0n1p4
179179```
180180Here is an example of expected output:
181181``` text
182- /dev/sda :
183- Timing cached reads: 45532 MB in 2.00 seconds = 22806.82 MB/sec
184- Timing buffered disk reads: 2180 MB in 3.04 seconds = 716.07 MB/sec
182+ /dev/nvme0n1p4 :
183+ Timing cached reads: 48588 MB in 2.00 seconds = 24334.91 MB/sec
184+ Timing buffered disk reads: 4456 MB in 3.00 seconds = 1485.08 MB/sec
185185```
186186
187- ### Build uber jar, print its size, and measure its running time (tested with Oracle GraalVM 24 -dev)
187+ ### Build uber jar, print its size, and measure its running time (tested with Oracle GraalVM 25 -dev)
188188
189189``` sh
190190scala-cli --power package --assembly example02.sc --force -o example02.jar
@@ -193,12 +193,12 @@ time ./example02.jar -J-XX:+UnlockExperimentalVMOptions -J-XX:+UseEpsilonGC -J-X
193193```
194194Expected output:
195195``` text
196- real 1m8.991s
197- user 1m5.239s
198- sys 0m4.102s
196+ real 0m37.615s
197+ user 0m35.187s
198+ sys 0m2.662s
199199```
200200
201- ### Build GraalVM native image, print its size, and measure its running time
201+ ### Build GraalVM native image, print its size, and measure its running time (tested with Oracle GraalVM 23)
202202
203203``` sh
204204scala-cli --power package --graalvm-jvm-id graalvm-oracle:23 --native-image example02.sc --force -o example02_graalvm.bin -- --no-fallback --gc=epsilon -O3 -H:+UnlockExperimentalVMOptions -R:MaxHeapSize=16m -H:-GenLoopSafepoints -H:-ParseRuntimeOptions -H:-IncludeMethodData --initialize-at-build-time
@@ -207,9 +207,9 @@ time ./example02_graalvm.bin < 2023_06_430_65B0_in_network_rates.json 2> /dev/nu
207207```
208208Expected output:
209209``` text
210- real 1m34.353s
211- user 1m29.199s
212- sys 0m5.151s
210+ real 0m44.963s
211+ user 0m42.471s
212+ sys 0m2.486s
213213```
214214
215215You can use profile guided optimization (PGO) to improve performance of Oracle GraalVM native image, for that you need:
@@ -233,21 +233,21 @@ time ./example02_graalvm_optimized.bin < 2023_06_430_65B0_in_network_rates.json
233233```
234234Expected output:
235235``` text
236- real 1m20.185s
237- user 1m15.976s
238- sys 0m4.207s
236+ real 0m43.674s
237+ user 0m41.213s
238+ sys 0m2.461s
239239```
240240
241241### Build Scala Native image, print its size, and measure its running time
242242
243243``` sh
244- scala-cli --power package --native-version 0.5.5 --native example02.sc --native-mode release-full --native-gc none --native-lto thin --native-multithreading=false --force -o example02_native.bin
244+ scala-cli --power package --native-version 0.5.7 --native example02.sc --native-mode release-full --native-gc none --native-lto thin --native-multithreading=false --force -o example02_native.bin
245245ls -l ./example02_native.bin
246246time ./example02_native.bin < 2023_06_430_65B0_in_network_rates.json 2> /dev/null
247247```
248248Expected output:
249249``` text
250- real 1m27.275s
251- user 1m23.233s
252- sys 0m4.038s
250+ real 0m49.232s
251+ user 0m46.838s
252+ sys 0m2.389s
253253```
0 commit comments