entries/ghatem-fpc/README.md
- -t flag to specify the thread-count (default reads the thread-count available on the CPU)
currently there are 2 versions that can be compiled / run:
- `OneBRC.lpr -> ghatem`: compact record, optimal for the 1B-row / 41k-station test; will fail on the other tests due to overflow
- `OneBRC-largerec.lpr -> ghatem-largerec`: same as OneBRC, but StationData's `count` is UInt32 instead of UInt16. Passes all the tests.
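The UInt16 overflow is easy to reproduce: at 1B rows over ~41k stations, each station sees roughly 24k measurements, which fits in 16 bits, but the larger tests push per-station counts past 65535 and the counter silently wraps. A minimal C sketch of the effect (the function name is hypothetical; only the 16-bit counter width mirrors the compact record):

```c
#include <stdint.h>

/* Hypothetical counter mirroring the UInt16 "count" field of the compact
   StationData record: it silently wraps modulo 65536 once a station
   accumulates more than 65535 rows. */
uint16_t count_rows(long rows) {
    uint16_t count = 0;
    for (long i = 0; i < rows; i++)
        count++;                 /* wraps past 65535 */
    return count;
}
```

For example, 24,000 rows per station survives intact, but 70,000 rows wraps to 70000 mod 65536 = 4464, corrupting the mean — which is why only the UInt32 `largerec` variant passes every test.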
## Hardware + Environment
host:
VM (VirtualBox):
- CPU count: 4 out of 8 (threads, probably)
- 20 GB RAM
## Baseline
the initial implementation (the Delphi baseline found in /baseline) aimed to get a correct output, regardless of performance:
"Make it work, then make it work better".
It turns out even the baseline caused some trouble, namely the `Ceil` implementation was yielding different results between FPC and Delphi (and different results between Delphi Win32/Win64).
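One way to sidestep such cross-compiler `Ceil` differences is to avoid floating-point entirely and round in fixed-point integer math. A hedged C sketch (assuming temperatures are accumulated in tenths of a degree — an assumption about the representation, not necessarily what the baseline does):

```c
#include <stdint.h>

/* Mean of sum/count rounded toward +infinity (Ceil semantics), in tenths
   of a degree, using only integer ops so that every compiler and target
   (FPC/Delphi, Win32/Win64) agrees. Assumes count > 0. */
int64_t mean_tenths_ceil(int64_t sum_tenths, int64_t count) {
    int64_t q = sum_tenths / count;   /* C division truncates toward zero */
    int64_t r = sum_tenths % count;
    if (r > 0)                        /* positive leftover: bump upward */
        q += 1;
    return q;
}
```

For negative sums, truncation toward zero already rounds toward +infinity, so only a positive remainder needs the bump: `mean_tenths_ceil(-7, 2)` yields -3 (i.e. ceil(-3.5)).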
The idea:
- -> data about the same station will be stored at the same index for all threads' data-arrays
- -> names will also be stored at that same index upon first encounter, and is common to all threads
- no locking needs to occur when the key is already found, since there is no multiple-write occurring
- the data-arrays are pre-allocated, and an atomic-counter will be incremented to know where the next element will be stored.
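The lock-free slot allocation above can be sketched with a C11 atomic (names are hypothetical; the entry is FPC, which would use an `InterLockedIncrement`-style RTL call instead):

```c
#include <stdatomic.h>

/* Shared cursor into the pre-allocated data-arrays. A thread that sees a
   station name for the first time claims the next free index; fetch_add
   hands out a unique slot to each caller without any lock. */
static atomic_int next_slot;

int claim_slot(void) {
    return atomic_fetch_add(&next_slot, 1);
}
```

Because the increment is atomic, two threads racing on first-encounter of different stations still receive distinct indices; only the first-encounter path pays for the atomic, and lookups of already-known keys stay entirely lock-free.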
Thinking again, this is likely similar to the approach mentioned by @synopse in one of his comments.
For the ExtractLineData, three ideas to try implementing:
- avoid using a function, to get rid of the cost of stack checking
- reduce branching: I think it should be possible to go from 3 if-statements to only 1
- unroll the loop (although when I tried this in the past, it did not show any improvements)
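The branch-reduction idea for temperature parsing can be illustrated in C (a sketch, not the entry's FPC code; it assumes the 1BRC input shape `-?\d{1,2}\.\d` and values kept as integer tenths):

```c
/* Parse "4.5", "45.6", "-4.5", "-45.6" into tenths (45, 456, -45, -456)
   using arithmetic on the layout instead of per-character if-statements. */
int parse_temp_tenths(const char *p, int len) {
    int neg = (p[0] == '-');          /* 0 or 1 */
    p   += neg;
    len -= neg;
    int two = (len == 4);             /* 1 for "XX.X", 0 for "X.X" */
    int v = two * 100 * (p[0] - '0')  /* optional tens digit */
          + 10 * (p[two] - '0')       /* ones digit */
          +      (p[two + 2] - '0');  /* tenths digit, skipping the '.' */
    return (v ^ -neg) + neg;          /* branchless conditional negate */
}
```

The sign and the one-vs-two-digit cases become 0/1 integers that index and scale the digits, so the only data-dependent control flow left is whatever the compiler chooses to emit for the comparisons (typically `setcc`, not a jump).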
Edit 2:
- was unable to get rid of the stack_check: removing the function somehow became more expensive; I don't understand why that is.
- I was able to reduce branching to zero in the parsing of temperature
- unrolling the loop was also beneficial, and even more so when the inner if-statement was removed in favor of branchless code
- dictionary improvements were successful and showed a 30% speedup
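A branchless replacement for an inner min/max if-statement could look like this in C (a sketch under the usual caveats: arithmetic right shift of a negative `int` is implementation-defined, though universal on mainstream compilers, and `a - b` must not overflow — which holds for temperatures stored in tenths):

```c
/* Branchless min/max: (a - b) >> 31 is all-ones exactly when a < b, so
   the mask selects a or b with no conditional jump for the branch
   predictor to miss on unpredictable data. */
int bmin(int a, int b) {
    int d = a - b;
    return b + (d & (d >> 31));
}

int bmax(int a, int b) {
    int d = a - b;
    return a - (d & (d >> 31));
}
```

On hot per-row updates like station min/max, the win comes from the temperature sequence being essentially random, making the corresponding branch nearly unpredictable; a plain ternary often compiles to a `cmov` too, so measuring both is worthwhile.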