Skip to content

Commit ae54a7b

Browse files
authored
Merge pull request #158 from georges-hatem/main
adjust name
2 parents 32bda29 + badaf50 commit ae54a7b

File tree

4 files changed

+30
-19
lines changed

4 files changed

+30
-19
lines changed

README.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -169,7 +169,7 @@ These are the results from running all entries into the challenge on my personal
169169
|--:|----------------:|---------:|:----------|:------|:-------------|
170170
| 1 | 0:1.260 | lazarus-3.99, fpc-3.3.1 | Arnaud Bouchez | Using `mORMot2`, 32 threads | |
171171
| 2 | 0:2.006 | lazarus-3.99, fpc-3.3.1 | O Coddo | Using `SCL`, 32 threads | |
172-
| 3 | 0:3.164 | lazarus-3.99, fpc-3.3.1 | Georges Hatem - FPC | Free Pascal: Using 32 thread | |
172+
| 3 | 0:3.164 | lazarus-3.99, fpc-3.3.1 | Georges Hatem | Using 32 threads | |
173173
| 4 | 0:9.652 | lazarus-3.99, fpc-3.3.1 | G Klark | Using 32 threads | |
174174
| 5 | 0:13.388 | lazarus-3.99, fpc-3.3.1 | Székely Balázs | Using 32 threads | |
175175
| 6 | 0:18.007 | lazarus-3.99, fpc-3.3.1 | Lurendrejer Aksen | Using 32 threads | |

entries/ghatem-fpc/README.md

Lines changed: 9 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -8,9 +8,8 @@
88
- -t flag to specify the thread-count (default reads the thread-count available on the CPU)
99

1010
currently there are 2 versions that can be compiled / run:
11-
- `OneBRC.lpr -> ghatem `: all threads share the station names - involves locking
12-
- `OneBRC-nosharedname.lpr -> ghatem-nosharedname`: each thread maintains a copy of the station names - no locking involved
13-
- `OneBRC-smallrec.lpr -> ghatem-smallrec `: same as OneBRC, but the StationData's "count" is UInt16 instead of 32. Will likely fail to match hash on the 5B rows test
11+
- `OneBRC.lpr -> ghatem `: compact record, optimal for the 1B row / 41k stations, will fail on the other tests due to overflow
12+
- `OneBRC-largerec.lpr -> ghatem-largerec `: same as OneBRC, but the StationData's "count" is UInt32 instead of 16. Passes all the tests
1413

1514
## Hardware + Environment
1615
host:
@@ -25,10 +24,6 @@ VM (VirtualBox):
2524
- CPU count: 4 out of 8 (threads, probably)
2625
- 20 GB RAM
2726

28-
note about the hash:
29-
run with DEBUG compiler directive to write from stream directly to file, otherwise the hash will not match.
30-
31-
## Baseline
3227
the initial implementation (the Delphi baseline found in /baseline) aimed to get a correct output, regardless of performance:
3328
"Make it work, then make it work better".
3429
It turns out even the baseline caused some trouble, namely the `Ceil` implementation was yielding different results between FPC and Delphi (and different results between Delphi Win32/Win64).
@@ -267,11 +262,17 @@ The idea:
267262
- -> data about the same station will be stored at the same index for all threads' data-arrays
268263
- -> names will also be stored at that same index upon first encounter, and is common to all threads
269264
- no locking needs to occur when the key is already found, since there is no multiple-write occurring
270-
- the data-arrays are pre-allocated, and a atomic-counter will be incremented to know where the next element will be stored.
265+
- the data-arrays are pre-allocated, and an atomic-counter will be incremented to know where the next element will be stored.
271266

272267
Thinking again, this is likely similar to the approach mentioned by @synopse in one of his comments.
273268

274269
For the ExtractLineData, three ideas to try implementing:
275270
- avoid using a function, to get rid of the cost of stack checking
276271
- reduce branching, I think it should be possible to go from 3 if-statements, to only 1
277272
- unroll the loop (although I had tried this in the past, did not show any improvements)
273+
274+
Edit 2:
275+
- was unable to get rid of the stack_check: removing the function somehow became more expensive, I don't understand why that is.
276+
- I was able to reduce branching to zero in the parsing of temperature
277+
- unroll the loop was also beneficial, and even more so when the inner if-statement was removed in favor of branchless
278+
- dictionary improvements were successful and showed a 30% speedup

entries/ghatem-fpc/src/OneBRC-largerec.lpr

Lines changed: 10 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -82,17 +82,17 @@ TBRCDictionary = class
8282

8383
// searches for a given key, returns if found the key and the storage index
8484
// (or, if not found, which index to use next)
85-
procedure InternalFind(const aKey: Cardinal; out aFound: Boolean; out aIndex: THashSize);
85+
procedure InternalFind(const aKey: Cardinal; out aFound: Boolean; out aIndex: THashSize); {$IFNDEF VALGRIND} inline; {$ENDIF}
8686

8787
public
8888
constructor Create;
8989
destructor Destroy; override;
9090

9191
// simple wrapper to find station-record pointers
92-
function TryGetValue (const aKey: Cardinal; const aThreadNb: TThreadCount; out aValue: PStationData): Boolean; inline;
92+
function TryGetValue (const aKey: Cardinal; const aThreadNb: TThreadCount; out aValue: PStationData): Boolean;
9393

9494
// multithread-unprotected: adds a firstly-encountered station-data (temp, name)
95-
procedure Add (const aHashIdx: THashSize; const aThreadNb: TThreadCount; const aTemp: SmallInt; const aStationName: AnsiString); inline;
95+
procedure Add (const aHashIdx: THashSize; const aThreadNb: TThreadCount; const aTemp: SmallInt; const aStationName: AnsiString); {$IFNDEF VALGRIND} inline; {$ENDIF}
9696

9797
// multithread-protected: safely assign a slot for a given key
9898
function AtomicRegisterHash (const aKey: Cardinal): THashSize;
@@ -114,7 +114,7 @@ TOneBRC = class
114114
FDictionary: TBRCDictionary;
115115

116116
// for a line between idx [aStart; aEnd], returns the station-name length, and the integer-value of temperature
117-
procedure ExtractLineData(const aStart: Int64; const aEnd: Int64; out aLength: ShortInt; out aTemp: SmallInt); inline;
117+
procedure ExtractLineData(const aStart: Int64; const aEnd: Int64; out aLength: ShortInt; out aTemp: SmallInt); {$IFNDEF VALGRIND} inline; {$ENDIF}
118118

119119
public
120120
constructor Create (const aThreadCount: TThreadCount);
@@ -564,6 +564,8 @@ procedure TOneBRC.GenerateOutput;
564564
vHash: Cardinal;
565565
vStations: TStringList;
566566
iStationName: AnsiString;
567+
vIdx: THashSize;
568+
vRes: Boolean;
567569
begin
568570
vStream := TStringStream.Create;
569571
vStations := TStringList.Create;
@@ -587,7 +589,10 @@ procedure TOneBRC.GenerateOutput;
587589
// would it be more efficient to store the hash as well?
588590
// debatable, and the whole output generation is < 0.3 seconds, so not exactly worth it
589591
vHash := crc32c(0, @vStations[i][1], Length (vStations[i]));
590-
FDictionary.TryGetValue(vHash, 0, vData);
592+
593+
FDictionary.InternalFind (vHash, vRes, vIdx);
594+
vData := @FDictionary.FThreadData[0][FDictionary.FIndexes[vIdx]];
595+
591596
vMean := RoundExInteger(vData^.Sum/vData^.Count/10);
592597

593598
vStream.WriteString(

entries/ghatem-fpc/src/OneBRC.lpr

Lines changed: 10 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -82,17 +82,17 @@ TBRCDictionary = class
8282

8383
// searches for a given key, returns if found the key and the storage index
8484
// (or, if not found, which index to use next)
85-
procedure InternalFind(const aKey: Cardinal; out aFound: Boolean; out aIndex: THashSize);
85+
procedure InternalFind(const aKey: Cardinal; out aFound: Boolean; out aIndex: THashSize); {$IFNDEF VALGRIND} inline; {$ENDIF}
8686

8787
public
8888
constructor Create;
8989
destructor Destroy; override;
9090

9191
// simple wrapper to find station-record pointers
92-
function TryGetValue (const aKey: Cardinal; const aThreadNb: TThreadCount; out aValue: PStationData): Boolean; inline;
92+
function TryGetValue (const aKey: Cardinal; const aThreadNb: TThreadCount; out aValue: PStationData): Boolean; {$IFNDEF VALGRIND} inline; {$ENDIF}
9393

9494
// multithread-unprotected: adds a firstly-encountered station-data (temp, name)
95-
procedure Add (const aHashIdx: THashSize; const aThreadNb: TThreadCount; const aTemp: SmallInt; const aStationName: AnsiString); inline;
95+
procedure Add (const aHashIdx: THashSize; const aThreadNb: TThreadCount; const aTemp: SmallInt; const aStationName: AnsiString); {$IFNDEF VALGRIND} inline; {$ENDIF}
9696

9797
// multithread-protected: safely assign a slot for a given key
9898
function AtomicRegisterHash (const aKey: Cardinal): THashSize;
@@ -114,7 +114,7 @@ TOneBRC = class
114114
FDictionary: TBRCDictionary;
115115

116116
// for a line between idx [aStart; aEnd], returns the station-name length, and the integer-value of temperature
117-
procedure ExtractLineData(const aStart: Int64; const aEnd: Int64; out aLength: ShortInt; out aTemp: SmallInt); inline;
117+
procedure ExtractLineData(const aStart: Int64; const aEnd: Int64; out aLength: ShortInt; out aTemp: SmallInt); {$IFNDEF VALGRIND} inline; {$ENDIF}
118118

119119
public
120120
constructor Create (const aThreadCount: TThreadCount);
@@ -564,6 +564,8 @@ procedure TOneBRC.GenerateOutput;
564564
vHash: Cardinal;
565565
vStations: TStringList;
566566
iStationName: AnsiString;
567+
vIdx: THashSize;
568+
vRes: Boolean;
567569
begin
568570
vStream := TStringStream.Create;
569571
vStations := TStringList.Create;
@@ -587,7 +589,10 @@ procedure TOneBRC.GenerateOutput;
587589
// would it be more efficient to store the hash as well?
588590
// debatable, and the whole output generation is < 0.3 seconds, so not exactly worth it
589591
vHash := crc32c(0, @vStations[i][1], Length (vStations[i]));
590-
FDictionary.TryGetValue(vHash, 0, vData);
592+
593+
FDictionary.InternalFind (vHash, vRes, vIdx);
594+
vData := @FDictionary.FThreadData[0][FDictionary.FIndexes[vIdx]];
595+
591596
vMean := RoundExInteger(vData^.Sum/vData^.Count/10);
592597

593598
vStream.WriteString(

0 commit comments

Comments
 (0)