Commit 830cc6c

author
Arnaud Bouchez
committed
minor refactoring of the final code
1 parent 11c7b53 commit 830cc6c

2 files changed: +22 -23 lines changed

entries/abouchez/README.md

Lines changed: 3 additions & 2 deletions
@@ -48,6 +48,7 @@ Reference implementations of the 1brc challenge in other languages:
 - [another crazy DotNet attempt](https://github.com/noahfalk/1brc/tree/main);
 - [a quite readable C entry](https://github.com/lehuyduc/1brc-simd);
 - [a good blog article with comparison of most known solutions](https://hotforknowledge.com/2024/01/13/1brc-in-dotnet-among-fastest-on-linux-my-optimization-journey/#results).
+
 Note that those versions did not use the same input as we do in pascal: we use a "41K dataset" with 41343 station names, whereas they were optimized for 400 stations - see the last blog article.
 
 In the compiler landscape, FPC is not as advanced/magical as gcc/llvm are, but it generates good enough code (on paar or better than Delphi's), and works is still done to enhance its output - e.g. by [Kit](https://www.patreon.com/curiouskit). I was amazed how good "pure pascal" code runs even on aarch64, like Ampere (see my blog article above) or on Apple M1/M2.
@@ -96,7 +97,7 @@ Here are the columns meaning:
 - "full" indicates that the full station name is checked, byte-per-byte, to detect any hash collision (not required by our Pascal challenge, but required by the original Java challenge) - so no `X` here states that the ["perfect hash trick"](#perfect-hash-trick) is used by this solution;
 - "nobranch" indicates that the temperature parsing is using a branchless algorithm;
 - "submap" indicates that `mmap()` is not called for the whole 16GB input file, but for each chunk in its own worker thread;
-- "41K" and "400" are the time (in milliseconds) reported on OVH public cloud by `paweld` in [the "Alternative results" discussion thread](https://github/1brc-ObjectPascal/discussions/103#discussioncomment-9273061) for 41343 or 400 stations - so it is on AMD CPU, but not the "official" timing.
+- "41K" and "400" are the time (in milliseconds) reported on OVH public cloud by `paweld` in [the "Alternative results" discussion thread](https://github.com/1brc-ObjectPascal/discussions/103#discussioncomment-9273061) for 41343 or 400 stations - so it is on AMD CPU, but not the "official" timing.
 
 So we have a good coverage on what should be the best solution to propose.
 
@@ -210,7 +211,7 @@ We used a similar branchless approach in our [From Delphi to AVX2](https://blog.
 
 ## Perfect Hash Trick
 
-The "perfect hash" trick was not allowed in the original Java challenge, for good reasons. We have made some versions with full name comparison, but they are noticeably slower, and [the Pascal challenge does not make such requirement](https://github/1brc-ObjectPascal/issues/118).
+The "perfect hash" trick was not allowed in the original Java challenge, for good reasons. We have made some versions with full name comparison, but they are noticeably slower, and [the Pascal challenge does not make such requirement](https://github.com/1brc-ObjectPascal/issues/118).
 
 Our final implementation is safe with the official dataset, and gives the expected result - which was the goal of this challenge: compute the right data reduction with as little time as possible, with all possible hacks and tricks. A "perfect hash" is a well known hacking pattern, when the dataset is validated in advance. We can imagine that if a new weather station appear, we can check for any collision. And since our CPUs offers `crc32c` which is perfect for our dataset... let's use it! https://en.wikipedia.org/wiki/Perfect_hash_function ;)
 
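Reviewer note: the README text above says the dataset is "validated in advance" and that a new station could be checked for collisions. A minimal sketch of such a check follows. It is not the entry's code: `Fnv1a32` and `FindHashCollision` are hypothetical names, FNV-1a is used only as a portable stand-in for `crc32c`, the sample station names are illustrative, and the sketch assumes fewer names than `HASHSIZE` slots.

```pascal
program perfecthashcheck;

{$mode objfpc}{$H+}
{$Q-}{$R-} // the hash relies on wrapping 32-bit arithmetic

const
  HASHSIZE = 1 shl 18; // same table size as HASHSIZE in brcmormot.lpr

// FNV-1a 32-bit: a portable stand-in, NOT the crc32c used by the entry
function Fnv1a32(const s: string): cardinal;
var
  i: integer;
begin
  result := 2166136261;
  for i := 1 to length(s) do
    result := cardinal((result xor byte(s[i])) * cardinal(16777619));
end;

// returns true, with both indexes, if two distinct names share one 32-bit hash
// (assumes length(names) < HASHSIZE, otherwise probing could not terminate)
function FindHashCollision(const names: array of string;
  out a, b: integer): boolean;
var
  slots: array of integer; // index into names, -1 = empty slot
  hashes: array of cardinal;
  i, slot: integer;
  h: cardinal;
begin
  result := false;
  a := -1;
  b := -1;
  SetLength(slots, HASHSIZE);
  for i := 0 to HASHSIZE - 1 do
    slots[i] := -1;
  SetLength(hashes, length(names));
  for i := 0 to high(names) do
  begin
    h := Fnv1a32(names[i]);
    hashes[i] := h;
    slot := h and (HASHSIZE - 1);
    while slots[slot] >= 0 do
    begin
      if hashes[slots[slot]] = h then
        if names[slots[slot]] = names[i] then
          break // same station listed twice: not a collision
        else
        begin
          a := slots[slot];
          b := i;
          exit(true); // distinct names, identical hash
        end;
      slot := (slot + 1) and (HASHSIZE - 1); // linear probing
    end;
    slots[slot] := i;
  end;
end;

var
  a, b: integer;
begin
  if FindHashCollision(['Hamburg', 'Bulawayo', 'Palembang'], a, b) then
    writeln('collision between entries ', a, ' and ', b)
  else
    writeln('no collision: the hash alone identifies each station');
end.
```

Running such a check once over the full station list, and again whenever a station is added, is what would make skipping the byte-per-byte name comparison safe for a given dataset.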
entries/abouchez/src/brcmormot.lpr

Lines changed: 19 additions & 21 deletions
@@ -254,6 +254,7 @@ constructor TBrcThread.Create(owner: TBrcMain);
 
 const
   HASHSIZE = 1 shl 18; // slightly oversized to avoid most collisions
+  // we tried with a prime constant for fast modulo mult-by-reciprocal: slower
 
 constructor TBrcMain.Create(const fn: TFileName; threads, chunkmb, max: integer;
   affinity, fullsearch: boolean);
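Reviewer note: the new comment contrasts the power-of-two `HASHSIZE` with a prime table size. A small sketch of the two slot computations, under assumptions: `SlotByMask`, `SlotByPrime` and the `PRIMESIZE` value are hypothetical and do not come from the entry, and the sketch says nothing about the timings the author measured.

```pascal
program slotindex;

{$mode objfpc}{$H+}

const
  HASHSIZE = 1 shl 18;  // power of two, as in the diff
  PRIMESIZE = 262139;   // hypothetical prime just below 1 shl 18

// power-of-two table: the slot is a single bitwise AND
function SlotByMask(hash: cardinal): cardinal;
begin
  result := hash and (HASHSIZE - 1);
end;

// prime-sized table: needs a modulo by a constant, which a compiler
// may implement as a multiplication by a reciprocal
function SlotByPrime(hash: cardinal): cardinal;
begin
  result := hash mod PRIMESIZE;
end;

begin
  writeln(SlotByMask($DEADBEEF));
  writeln(SlotByPrime($DEADBEEF));
end.
```

Both functions map any 32-bit hash into a valid slot; the mask variant only works because `HASHSIZE` is a power of two.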
@@ -561,28 +562,25 @@ function TBrcMain.SortedText: RawUtf8;
 begin
   assert(SizeOf(TBrcStation) <= 64 div 4); // 64 = CPU L1 cache line size
   // read command line parameters
-  Executable.Command.ExeDescription := 'The mORMot One Billion Row Challenge';
-  if Executable.Command.Arg(0, 'the data source #filename') then
-    Utf8ToFileName(Executable.Command.Args[0], fn{%H-});
-  verbose := Executable.Command.Option(
-    ['v', 'verbose'], 'generate verbose output with timing');
-  affinity := Executable.Command.Option(
-    ['a', 'affinity'], 'force thread affinity to a single CPU core');
-  full := Executable.Command.Option(
-    ['f', 'full'], 'force full name lookup (disable "perfect hash" trick)');
-  Executable.Command.Get(
-    ['t', 'threads'], threads, '#number of threads to run',
-    SystemInfo.dwNumberOfProcessors);
-  Executable.Command.Get(
-    ['c', 'chunk'], chunkmb, 'size in #megabytes used for per-thread chunking', 16);
-  help := Executable.Command.Option(['h', 'help'], 'display this help');
-  if Executable.Command.ConsoleWriteUnknown then
-    exit
-  else if help or
-          (fn = '') then
+  with Executable.Command do
   begin
-    ConsoleWrite(Executable.Command.FullDescription);
-    exit;
+    ExeDescription := 'The mORMot One Billion Row Challenge';
+    if Arg(0, 'the data source #filename') then
+      Utf8ToFileName(Executable.Command.Args[0], fn{%H-});
+    verbose := Option(['v', 'verbose'], 'generate verbose output with timing');
+    affinity := Option(['a', 'affinity'], 'force thread affinity to a single CPU core');
+    full := Option(['f', 'full'], 'force full name lookup (disable "perfect hash" trick)');
+    Get(['t', 'threads'], threads, '#number of thread workers', SystemInfo.dwNumberOfProcessors);
+    Get(['c', 'chunk'], chunkmb, 'size in #megabytes for per-thread chunking', 16);
+    help := Option(['h', 'help'], 'display this help');
+    if ConsoleWriteUnknown then
+      exit
+    else if help or
+            (fn = '') then
+    begin
+      ConsoleWrite(FullDescription);
+      exit;
+    end;
   end;
   // actual process
   if verbose then
