minor refactoring of the final code

Arnaud Bouchez · Arnaud Bouchez · commit 830cc6c05621 · 2024-05-02T21:45:30.000+02:00
diff --git a/entries/abouchez/README.md b/entries/abouchez/README.md
@@ -48,6 +48,7 @@ Reference implementations of the 1brc challenge in other languages:
 - [another crazy DotNet attempt](https://github.com/noahfalk/1brc/tree/main);
 - [a quite readable C entry](https://github.com/lehuyduc/1brc-simd);
 - [a good blog article with comparison of most known solutions](https://hotforknowledge.com/2024/01/13/1brc-in-dotnet-among-fastest-on-linux-my-optimization-journey/#results).
+
 Note that those versions did not use the same input as we do in pascal: we use a "41K dataset" with 41343 station names, whereas they were optimized for 400 stations - see the last blog article.
 
 In the compiler landscape, FPC is not as advanced/magical as gcc/llvm are, but it generates good enough code (on par with or better than Delphi's), and work is still being done to enhance its output - e.g. by [Kit](https://www.patreon.com/curiouskit). I was amazed how well "pure pascal" code runs even on aarch64, like Ampere (see my blog article above) or on Apple M1/M2.
@@ -96,7 +97,7 @@ Here are the columns meaning:
 - "full" indicates that the full station name is checked, byte-per-byte, to detect any hash collision (not required by our Pascal challenge, but required by the original Java challenge) - so no `X` here states that the ["perfect hash trick"](#perfect-hash-trick) is used by this solution;
 - "nobranch" indicates that the temperature parsing is using a branchless algorithm;
 - "submap" indicates that `mmap()` is not called for the whole 16GB input file, but for each chunk in its own worker thread;
-- "41K" and "400" are the time (in milliseconds) reported on OVH public cloud by `paweld` in [the "Alternative results" discussion thread](https://github/1brc-ObjectPascal/discussions/103#discussioncomment-9273061) for 41343 or 400 stations - so it is on AMD CPU, but not the "official" timing.
+- "41K" and "400" are the time (in milliseconds) reported on OVH public cloud by `paweld` in [the "Alternative results" discussion thread](https://github.com/1brc-ObjectPascal/discussions/103#discussioncomment-9273061) for 41343 or 400 stations - so it is on AMD CPU, but not the "official" timing.
 
 So we have a good coverage on what should be the best solution to propose.
 
@@ -210,7 +211,7 @@ We used a similar branchless approach in our [From Delphi to AVX2](https://blog.
 
 ## Perfect Hash Trick
 
-The "perfect hash" trick was not allowed in the original Java challenge, for good reasons. We have made some versions with full name comparison, but they are noticeably slower, and [the Pascal challenge does not make such requirement](https://github/1brc-ObjectPascal/issues/118).
+The "perfect hash" trick was not allowed in the original Java challenge, for good reasons. We have made some versions with full name comparison, but they are noticeably slower, and [the Pascal challenge does not make such requirement](https://github.com/1brc-ObjectPascal/issues/118).
 
 Our final implementation is safe with the official dataset, and gives the expected result - which was the goal of this challenge: compute the right data reduction in as little time as possible, with all possible hacks and tricks. A "perfect hash" is a well-known hacking pattern, when the dataset is validated in advance. We can imagine that if a new weather station appears, we can check for any collision. And since our CPUs offer `crc32c` which is perfect for our dataset... let's use it! https://en.wikipedia.org/wiki/Perfect_hash_function ;)
 
diff --git a/entries/abouchez/src/brcmormot.lpr b/entries/abouchez/src/brcmormot.lpr
@@ -254,6 +254,7 @@ constructor TBrcThread.Create(owner: TBrcMain);
 
 const
   HASHSIZE = 1 shl 18; // slightly oversized to avoid most collisions
+  // we tried with a prime constant for fast modulo mult-by-reciprocal: slower
 
 constructor TBrcMain.Create(const fn: TFileName; threads, chunkmb, max: integer;
   affinity, fullsearch: boolean);
@@ -561,28 +562,25 @@ function TBrcMain.SortedText: RawUtf8;
 begin
   assert(SizeOf(TBrcStation) <= 64 div 4); // 64 = CPU L1 cache line size
   // read command line parameters
-  Executable.Command.ExeDescription := 'The mORMot One Billion Row Challenge';
-  if Executable.Command.Arg(0, 'the data source #filename') then
-    Utf8ToFileName(Executable.Command.Args[0], fn{%H-});
-  verbose := Executable.Command.Option(
-    ['v', 'verbose'], 'generate verbose output with timing');
-  affinity := Executable.Command.Option(
-    ['a', 'affinity'], 'force thread affinity to a single CPU core');
-  full := Executable.Command.Option(
-    ['f', 'full'], 'force full name lookup (disable "perfect hash" trick)');
-  Executable.Command.Get(
-    ['t', 'threads'], threads, '#number of threads to run',
-      SystemInfo.dwNumberOfProcessors);
-  Executable.Command.Get(
-    ['c', 'chunk'], chunkmb, 'size in #megabytes used for per-thread chunking', 16);
-  help := Executable.Command.Option(['h', 'help'], 'display this help');
-  if Executable.Command.ConsoleWriteUnknown then
-    exit
-  else if help or
-     (fn = '') then
+  with Executable.Command do
   begin
-    ConsoleWrite(Executable.Command.FullDescription);
-    exit;
+    ExeDescription := 'The mORMot One Billion Row Challenge';
+    if Arg(0, 'the data source #filename') then
+      Utf8ToFileName(Executable.Command.Args[0], fn{%H-});
+    verbose := Option(['v', 'verbose'], 'generate verbose output with timing');
+    affinity := Option(['a', 'affinity'], 'force thread affinity to a single CPU core');
+    full := Option(['f', 'full'], 'force full name lookup (disable "perfect hash" trick)');
+    Get(['t', 'threads'], threads, '#number of thread workers', SystemInfo.dwNumberOfProcessors);
+    Get(['c', 'chunk'], chunkmb, 'size in #megabytes for per-thread chunking', 16);
+    help := Option(['h', 'help'], 'display this help');
+    if ConsoleWriteUnknown then
+      exit
+    else if help or
+       (fn = '') then
+    begin
+      ConsoleWrite(FullDescription);
+      exit;
+    end;
   end;
   // actual process
   if verbose then