- [a quite readable C entry](https://github.com/lehuyduc/1brc-simd);
- [a good blog article with comparison of most known solutions](https://hotforknowledge.com/2024/01/13/1brc-in-dotnet-among-fastest-on-linux-my-optimization-journey/#results).
+
Note that those versions did not use the same input as we do in Pascal: we use a "41K dataset" with 41343 station names, whereas they were optimized for 400 stations - see the last blog article.

In the compiler landscape, FPC is not as advanced/magical as gcc/llvm are, but it generates good enough code (on par with or better than Delphi's), and work is still being done to enhance its output - e.g. by [Kit](https://www.patreon.com/curiouskit). I was amazed at how well "pure pascal" code runs even on aarch64, like Ampere (see my blog article above) or on Apple M1/M2.
@@ -96,7 +97,7 @@ Here are the columns meaning:
- "full" indicates that the full station name is checked, byte by byte, to detect any hash collision (not required by our Pascal challenge, but required by the original Java challenge) - so the absence of an `X` here means that the ["perfect hash trick"](#perfect-hash-trick) is used by this solution;
- "nobranch" indicates that the temperature parsing uses a branchless algorithm;
- "submap" indicates that `mmap()` is not called for the whole 16GB input file, but for each chunk in its own worker thread;
-
- "41K" and "400" are the time (in milliseconds) reported on OVH public cloud by `paweld` in [the "Alternative results" discussion thread](https://github/1brc-ObjectPascal/discussions/103#discussioncomment-9273061) for 41343 or 400 stations - so it is on AMD CPU, but not the "official" timing.
+
- "41K" and "400" are the time (in milliseconds) reported on OVH public cloud by `paweld` in [the "Alternative results" discussion thread](https://github.com/1brc-ObjectPascal/discussions/103#discussioncomment-9273061) for 41343 or 400 stations - so it is on AMD CPU, but not the "official" timing.

So we have good coverage of what should be the best solution to propose.
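
To illustrate what the "nobranch" column refers to, here is a minimal sketch in C of parsing one temperature without any `if`. It is not the code of any of the listed solutions: the function name `parse_temp_tenths` is hypothetical, and it assumes the 1BRC value format (optional `-`, one or two integer digits, a dot, exactly one fractional digit). Comparisons are used as 0/1 integers, which compilers typically turn into flag-setting instructions rather than jumps.

```c
/* Sketch only: parse "-?d?d.d" into tenths of a degree without if/else.
   Assumes well-formed 1BRC input; no validation is done. */
static int parse_temp_tenths(const char *p, const char **end)
{
    int neg = (p[0] == '-');   /* 1 when the value is negative, else 0 */
    p += neg;                  /* skip the sign, if any */
    int two = (p[1] != '.');   /* 1 when there are two integer digits */

    /* "d.d"  -> d*10 + f          (two == 0, middle term is multiplied by 0)
       "dd.d" -> d*100 + d*10 + f  (two == 1) */
    int value = (p[0] - '0') * (10 + 90 * two)
              + (p[1] - '0') * (10 * two)
              + (p[2 + two] - '0');

    *end = p + 3 + two;        /* first byte after the fractional digit */
    return (value ^ -neg) + neg; /* two's complement negate when neg == 1 */
}
```

Called on `"-5.7\n"`, this returns `-57` and leaves `*end` on the newline. Working in integer tenths avoids any floating-point conversion until the final output.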
@@ -210,7 +211,7 @@ We used a similar branchless approach in our [From Delphi to AVX2](https://blog.
## Perfect Hash Trick
-
The "perfect hash" trick was not allowed in the original Java challenge, for good reasons. We have made some versions with full name comparison, but they are noticeably slower, and [the Pascal challenge does not make such requirement](https://github/1brc-ObjectPascal/issues/118).
+
The "perfect hash" trick was not allowed in the original Java challenge, for good reasons. We have made some versions with full name comparison, but they are noticeably slower, and [the Pascal challenge does not make such requirement](https://github.com/1brc-ObjectPascal/issues/118).

Our final implementation is safe with the official dataset, and gives the expected result - which was the goal of this challenge: compute the right data reduction in as little time as possible, with all possible hacks and tricks. A "perfect hash" is a well-known hacking pattern when the dataset is validated in advance. We can imagine that if a new weather station appears, we can check for any collision. And since our CPUs offer `crc32c`, which is perfect for our dataset... let's use it! https://en.wikipedia.org/wiki/Perfect_hash_function ;)
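
The "check for any collision" step could look like the following sketch in C. It is an illustration under assumptions, not the project's code: `crc32c_sw` is a plain software CRC32C (the hardware instruction computes the same polynomial), `is_perfect_hash` is a hypothetical helper, and reducing the hash to a slot by masking the low `bits` bits is an assumption, since the actual slot mapping is not described here.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Bitwise software CRC32C (Castagnoli), reflected polynomial 0x82F63B78. */
static uint32_t crc32c_sw(const void *data, size_t len)
{
    const uint8_t *p = (const uint8_t *)data;
    uint32_t crc = 0xFFFFFFFFu;
    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Hypothetical validation helper: returns 1 when every name lands in its own
   slot of a table with 2^bits entries, 0 on the first collision, -1 when the
   scratch table cannot be allocated. Names are assumed distinct, bits < 32. */
static int is_perfect_hash(const char **names, size_t count, unsigned bits)
{
    size_t slots = (size_t)1 << bits;
    uint8_t *seen = calloc(slots, 1);
    if (seen == NULL)
        return -1;
    int ok = 1;
    for (size_t i = 0; i < count; i++) {
        uint32_t h = crc32c_sw(names[i], strlen(names[i]));
        size_t slot = h & (slots - 1);   /* assumed slot mapping */
        if (seen[slot]) {                /* two names share one slot */
            ok = 0;
            break;
        }
        seen[slot] = 1;
    }
    free(seen);
    return ok;
}
```

Running such a check once over the station list, and again whenever a station is added, is what makes skipping the full name comparison defensible for a fixed dataset. On x86-64 with SSE4.2, the inner loop can be replaced by the `_mm_crc32_u8`/`_mm_crc32_u64` intrinsics from `<nmmintrin.h>`, which perform the raw CRC update without the initial and final inversion.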