
Commit d1cb21f

readme improvements
1 parent: 91c0cc1

1 file changed: +9 / -8 lines

entries/ghatem-fpc/README.md

@@ -8,9 +8,8 @@
- `-t` flag to specify the thread count (defaults to the number of threads available on the CPU)

Currently there are 2 versions that can be compiled / run:
-- `OneBRC.lpr -> ghatem`: all threads share the station names - involves locking
-- `OneBRC-nosharedname.lpr -> ghatem-nosharedname`: each thread maintains a copy of the station names - no locking involved
-- `OneBRC-smallrec.lpr -> ghatem-smallrec`: same as OneBRC, but the StationData's "count" is UInt16 instead of 32. Will likely fail to match hash on the 5B rows test
+- `OneBRC.lpr -> ghatem`: compact record, optimal for the 1B-row / 41k-station test; will fail on the other tests due to overflow
+- `OneBRC-largerec.lpr -> ghatem-largerec`: same as OneBRC, but the StationData's "count" is UInt32 instead of UInt16. Passes all the tests
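To see why the compact record only suits the 1B-row / 41k-station case, a rough back-of-the-envelope check helps. The Python sketch below is illustrative only (the entry itself is Free Pascal, and the helper names are hypothetical): it compares the average rows per station against the UInt16 and UInt32 limits and shows how an unchecked UInt16 counter silently wraps. It assumes rows are spread roughly evenly across about 41,000 stations; real per-station counts vary around that average.

```python
# Hypothetical back-of-the-envelope check for the StationData "count" field.
UINT16_MAX = 2**16 - 1  # 65535: largest value a UInt16 count can hold
UINT32_MAX = 2**32 - 1  # 4294967295: largest value a UInt32 count can hold


def avg_rows_per_station(rows: int, stations: int) -> int:
    """Average number of rows per station, assuming an even spread."""
    return rows // stations


def count_fits(rows: int, stations: int, max_count: int) -> bool:
    """True if the *average* per-station count stays within max_count.

    Ignores the natural variation around the average, so this is a
    necessary condition, not a guarantee.
    """
    return avg_rows_per_station(rows, stations) <= max_count


def wrap_uint16(increments: int) -> int:
    """Value a UInt16 counter holds after `increments` unchecked increments."""
    return increments % (UINT16_MAX + 1)


if __name__ == "__main__":
    stations = 41_000  # assumption: "41k stations" from the README
    for rows in (1_000_000_000, 5_000_000_000):
        avg = avg_rows_per_station(rows, stations)
        print(f"{rows:,} rows: ~{avg:,} per station, "
              f"UInt16 ok={count_fits(rows, stations, UINT16_MAX)}, "
              f"UInt32 ok={count_fits(rows, stations, UINT32_MAX)}")
```

Under these assumptions, 1B rows average about 24k rows per station, well below 65,535, while 5B rows average about 122k, which a UInt16 count cannot hold.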

## Hardware + Environment
host:
@@ -25,10 +24,6 @@ VM (VirtualBox):
- CPU count: 4 out of 8 (threads, probably)
- 20 GB RAM

-note about the hash:
-run with DEBUG compiler directive to write from stream directly to file, otherwise the hash will not match.
-
-## Baseline
The initial implementation (the Delphi baseline found in /baseline) aimed to get a correct output, regardless of performance:
"Make it work, then make it work better".
It turns out even the baseline caused some trouble: the `Ceil` implementation yielded different results between FPC and Delphi (and between Delphi Win32 and Win64).
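One plausible way such discrepancies arise (an assumption; the README does not state the root cause) is that a ceiling applied to a floating-point product depends on the last bit of that product, which can differ between compilers, FPU modes and 32/64-bit targets. The Python sketch below shows the effect and a common workaround: keep temperatures as integer tenths so rounding becomes exact integer arithmetic. The helper names are hypothetical, and the sketch assumes the rule is "round up to the next tenth", as the use of `Ceil` suggests.

```python
import math

# 0.1 * 3 is not exactly 0.3 in binary floating point (it is 0.30000000000000004),
# so scaling by 10 and taking the ceiling jumps to the next tenth.
naive = math.ceil(0.1 * 3 * 10) / 10  # 0.4 rather than the expected 0.3


def ceil_div(numerator: int, denominator: int) -> int:
    """Exact ceiling of numerator / denominator for a positive denominator."""
    return -((-numerator) // denominator)


def mean_tenths_rounded_up(sum_tenths: int, count: int) -> int:
    """Mean of values stored as integer tenths, rounded up (toward +inf) to a tenth."""
    return ceil_div(sum_tenths, count)


def format_tenths(tenths: int) -> str:
    """Format integer tenths as a one-decimal string, e.g. -35 -> '-3.5'."""
    sign = "-" if tenths < 0 else ""
    magnitude = abs(tenths)
    return f"{sign}{magnitude // 10}.{magnitude % 10}"
```

For example, readings of 0.1 and 0.2 are stored as 1 and 2 tenths; their mean of 1.5 tenths rounds up to 2, printed as `0.2`. Because every step is integer arithmetic, the result no longer depends on how a particular compiler or target rounds intermediate floating-point values.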
@@ -267,11 +262,17 @@ The idea:
- -> data about the same station will be stored at the same index for all threads' data-arrays
- -> names will also be stored at that same index upon first encounter, and are common to all threads
- no locking needs to occur when the key is already found, since no concurrent writes occur
-- the data-arrays are pre-allocated, and a atomic-counter will be incremented to know where the next element will be stored.
+- the data-arrays are pre-allocated, and an atomic-counter will be incremented to know where the next element will be stored.
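The shared-index scheme described above can be sketched in Python as follows. This is an illustration under assumptions, not the entry's Free Pascal code. `SharedStationIndex`, `new_stats`, `worker`, `merge` and `aggregate` are hypothetical names. A lock-protected integer stands in for the atomic counter, and the lock-free fast path relies on CPython dict lookups being safe to run alongside inserts. The sketch mirrors three points: a name is written once to its slot, known names are found without locking, and each thread updates only its own pre-allocated array at the shared slot, so merging is a walk over slot indices.

```python
import threading


class SharedStationIndex:
    """Hands out one slot index per station name, shared by all threads.

    Hypothetical sketch: a lock-protected integer stands in for the atomic
    counter, and the lock-free fast path relies on CPython dict lookups
    being safe to run while another thread inserts.
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.names = [None] * capacity  # pre-allocated; a name is written once, at its slot
        self._slot_of = {}              # name -> slot index
        self._insert_lock = threading.Lock()
        self._next_slot = 0             # stands in for the atomic counter

    @property
    def used(self) -> int:
        return self._next_slot

    def slot(self, name: str) -> int:
        found = self._slot_of.get(name)  # fast path: known name, no locking
        if found is not None:
            return found
        with self._insert_lock:          # slow path: first encounter of a name
            found = self._slot_of.get(name)  # re-check: another thread may have added it
            if found is None:
                if self._next_slot >= self.capacity:
                    raise RuntimeError("pre-allocated capacity exhausted")
                found = self._next_slot
                self._next_slot += 1
                self.names[found] = name     # store the name before publishing the slot
                self._slot_of[name] = found
            return found


def new_stats(capacity: int) -> list:
    """One thread's pre-allocated array of [min, max, sum, count] in tenths."""
    return [[10**9, -(10**9), 0, 0] for _ in range(capacity)]


def worker(index: SharedStationIndex, lines: list, stats: list) -> None:
    """Update only this thread's own array, at the slot shared by all threads."""
    for line in lines:
        name, temp = line.split(";")
        value = int(temp.replace(".", ""))  # e.g. "-3.4" -> -34 tenths
        s = stats[index.slot(name)]
        s[0] = min(s[0], value)
        s[1] = max(s[1], value)
        s[2] += value
        s[3] += 1


def merge(index: SharedStationIndex, per_thread: list) -> dict:
    """Combine all threads' arrays slot by slot; no name lookups needed."""
    result = {}
    for slot in range(index.used):
        rows = [stats[slot] for stats in per_thread if stats[slot][3] > 0]
        result[index.names[slot]] = (
            min(r[0] for r in rows),
            max(r[1] for r in rows),
            sum(r[2] for r in rows),
            sum(r[3] for r in rows),
        )
    return result


def aggregate(lines: list, threads: int = 4, capacity: int = 1024) -> dict:
    index = SharedStationIndex(capacity)
    per_thread = [new_stats(capacity) for _ in range(threads)]
    pool = [
        threading.Thread(target=worker, args=(index, lines[i::threads], per_thread[i]))
        for i in range(threads)
    ]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return merge(index, per_thread)
```

In this sketch a given thread takes the lock at most once per station: after its first slow-path call, the name is in the shared map and every later lookup takes the fast path.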

Thinking again, this is likely similar to the approach mentioned by @synopse in one of his comments.

For ExtractLineData, three ideas to try:
- avoid using a function, to get rid of the cost of stack checking
- reduce branching: I think it should be possible to go from 3 if-statements to only 1
- unroll the loop (although when I tried this in the past, it did not show any improvements)
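As an illustration of the third idea, here is what a manually unrolled scan for the `;` separator could look like. This is a hypothetical Python sketch (`find_semicolon` is not from the entry). In Python itself `bytes.find` would be faster, so the point is only the loop shape: four bytes are examined per iteration, with a short tail loop for the leftover bytes, which in compiled code cuts the per-byte loop-counter and bounds-check overhead.

```python
SEMICOLON = ord(";")


def find_semicolon(buf: bytes, start: int) -> int:
    """Index of the first ';' at or after `start`, or -1 if none (unrolled by 4)."""
    i = start
    n = len(buf)
    # Main loop: examine four bytes per iteration.
    while i + 4 <= n:
        if buf[i] == SEMICOLON:
            return i
        if buf[i + 1] == SEMICOLON:
            return i + 1
        if buf[i + 2] == SEMICOLON:
            return i + 2
        if buf[i + 3] == SEMICOLON:
            return i + 3
        i += 4
    # Tail: the remaining 0 to 3 bytes.
    while i < n:
        if buf[i] == SEMICOLON:
            return i
        i += 1
    return -1
```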
+
+Edit 2:
+- was unable to get rid of the stack_check: removing the function somehow made things more expensive, and I don't understand why
+- I was able to reduce branching to zero in the parsing of the temperature
+- unrolling the loop was also beneficial, even more so once the inner if-statement was replaced with branchless code
+- dictionary improvements were successful and showed a 30% speedup
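Branch-free temperature parsing, as mentioned in Edit 2, can be sketched as follows. The sketch assumes the usual 1BRC format (an optional `-`, one or two integer digits, a `.`, and exactly one fractional digit) and returns the value in integer tenths. `parse_tenths` is a hypothetical name, and this is not the entry's code. Python still evaluates the comparisons at run time, so the sketch only shows the data-flow idea. The sign and the digit count become 0/1 values that feed multiplications and offsets, instead of choosing between code paths with if-statements, which a native compiler can often turn into straight-line code.

```python
def parse_tenths(field: bytes) -> int:
    """Parse b'-12.3', b'4.5', ... into integer tenths (-123, 45) without if-statements.

    Assumes the field is well formed: optional '-', 1 or 2 integer digits,
    '.', and exactly one fractional digit.
    """
    is_negative = int(field[0] == ord("-"))  # 1 for a leading '-', else 0
    body = field[is_negative:]                 # skip the sign (if any) by offset
    has_two_digits = int(len(body) == 4)       # 'dd.d' -> 1, 'd.d' -> 0
    zero = ord("0")
    tens = (body[0] - zero) * has_two_digits   # contributes only for 'dd.d'
    ones = body[has_two_digits] - zero         # index 1 for 'dd.d', 0 for 'd.d'
    frac = body[-1] - zero                     # always the last byte
    magnitude = tens * 100 + ones * 10 + frac
    return magnitude * (1 - 2 * is_negative)   # multiply by +1 or -1
```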
