Commit 8fe8623

minor fixes
1 parent 5d9273d commit 8fe8623

2 files changed (+13 -14 lines changed)

entries/abcz/README.md

Lines changed: 10 additions & 12 deletions
@@ -19,17 +19,17 @@ I am very happy to share decades of server-side performance coding techniques us
 Here are the main ideas behind this implementation proposal:
 
 - **mORMot** makes cross-platform and cross-compiler support simple (e.g. `TMemMap`, `TDynArray.Sort`,`TTextWriter`, `SetThreadCpuAffinity`, `crc32c`, `ConsoleWrite` or command-line parsing);
-- Memory map the entire 16GB file at once (so won't work on 32-bit OS, but reduce syscalls);
+- Will memmap the entire 16GB file at once into memory (so won't work on 32-bit OS, but reduce syscalls);
 - Process file in parallel using several threads (configurable, with `-t=16` by default);
-- Each thread is fed from 64MB chunks of input (because thread scheduling is unfair, it is inefficient to pre-divide the size of the whole input file into the number of threads);
+- Fed each thread from 64MB chunks of input (because thread scheduling is unfair, it is inefficient to pre-divide the size of the whole input file into the number of threads);
 - Each thread manages its own data, so there is no lock until the thread is finished and data is consolidated;
-- Each station information (name and values) is packed into a record of exactly 64 bytes, with no external pointer/string, so match the CPU L1 cache size for efficiency;
+- Each station information (name and values) is packed into a record of exactly 64 bytes, with no external pointer/string, to match the CPU L1 cache size for efficiency;
 - Use a dedicated hash table for the name lookup, with direct crc32c SSE4.2 hash - when `TDynArrayHashed` is involved, it requires a transient name copy on the stack, which is noticeably slower (see last paragraph of this document);
-- Store values as 16-bit or 32-bit integers (temperature multiplied by 10);
+- Store values as 16-bit or 32-bit integers (i.e. temperature multiplied by 10);
 - Parse temperatures with a dedicated code (expects single decimal input values);
 - No memory allocation (e.g. no transient `string` or `TBytes`) nor any syscall is done during the parsing process to reduce contention and ensure the process is only CPU-bound and RAM-bound (we checked this with `strace` on Linux);
-- Pascal code was tuned to generate the best possible asm output on FPC x86_64 (which is our target) with no SIMD involved;
-- Some dedicated x86_64 asm has been written to replace mORMot `crc32c` and `MemCmp` general-purpose functions and gain a last few percents;
+- Pascal code was tuned to generate the best possible asm output on FPC x86_64 (which is our target);
+- Some dedicated x86_64 asm has been written to replace mORMot `crc32c` and `MemCmp` general-purpose functions and gain a last few percents (nice to have);
 - Can optionally output timing statistics and hash value on the console to debug and refine settings (with the `-v` command line switch);
 - Can optionally set each thread affinity to a single core (with the `-a` command line switch).
 
@@ -60,11 +60,9 @@ We will use these command-line switches for local (dev PC), and benchmark (chall
 
 ## Local Analysis
 
-On my PC, it takes less than 5 seconds to process the 16GB file with 8 threads.
+On my PC, it takes less than 5 seconds to process the 16GB file with 8/10 threads.
 
-If we use the `time` command on Linux, we can see that there is little time spend in kernel (sys) land.
-
-If we compare our `mormot` with a solid multi-threaded entry using file buffer reads and no memory map (like `sbalazs`):
+Let's compare our `mormot` with a solid multi-threaded entry using file buffer reads and no memory map (like `sbalazs`), using the `time` command on Linux:
 
 ```
 ab@dev:~/dev/github/1brc-ObjectPascal/bin$ time ./mormot measurements.txt -t=10 >resmrel5.txt
@@ -79,7 +77,7 @@ real 0m25,330s
 user 6m44,853s
 sys 0m31,167s
 ```
-We used 20 threads for `sbalazs`, and 10 threads for `mormot` because it was giving the best results on each entry on this particular PC.
+We used 20 threads for `sbalazs`, and 10 threads for `mormot` because it was giving the best results for each program on our PC.
 
 Apart from the obvious global "wall" time reduction (`real` numbers), the raw parsing and data gathering in the threads match the number of threads and the running time (`user` numbers), and no syscall is involved by `mormot` thanks to the memory mapping of the whole file (`sys` numbers, which contain only memory page faults).
 
@@ -125,7 +123,7 @@ On the https://github.com/gcarreno/1brc-ObjectPascal challenge hardware, which i
 ./mormot measurements.txt -v -t=24 -a
 ./mormot measurements.txt -v -t=32 -a
 ```
-Please run those command lines, to guess which parameters are to be run for the benchmark to give the best results on the actual benchmark PC with its Ryzen 9 CPU. We will see if core affinity makes a difference here.
+Please run those command lines, to guess which parameters are to be run for the benchmark, and would give the best results on the actual benchmark PC with its Ryzen 9 CPU. We will see if core affinity makes a difference here.
 
 ## Feedback Needed
 
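The README bullets about storing temperatures as integers multiplied by 10 and parsing them with dedicated code (single-decimal input only) can be illustrated with a small Free Pascal sketch. `ParseTenths` and `TenthsOf` are hypothetical names, not taken from `brcmormot.lpr`; the sketch assumes well-formed input (optional `-`, digits, `.`, exactly one decimal digit) and performs no validation. Like the entry, it walks a `PAnsiChar` and allocates nothing per value.

```pascal
program TenthsSketch;

{$mode objfpc}{$H+}

// Hypothetical helper (not the entry's code): parses one temperature such as
// '-12.3' starting at p into tenths of a degree (-123), assuming well-formed
// input: optional '-', digits, '.', exactly one decimal digit. No validation.
// On return, p points just past the decimal digit (i.e. at the line feed).
function ParseTenths(var p: PAnsiChar): Integer;
var
  neg: Boolean;
begin
  neg := p^ = '-';
  if neg then
    Inc(p);
  Result := 0;
  while p^ <> '.' do
  begin
    Result := Result * 10 + (Ord(p^) - Ord('0'));
    Inc(p);
  end;
  Inc(p); // skip the '.'
  Result := Result * 10 + (Ord(p^) - Ord('0')); // the single decimal digit
  Inc(p);
  if neg then
    Result := -Result;
end;

// Hypothetical convenience wrapper used by the demo and tests.
function TenthsOf(const s: AnsiString): Integer;
var
  p: PAnsiChar;
begin
  p := PAnsiChar(s);
  Result := ParseTenths(p);
end;

const
  // Values only, one per line: a stand-in for the part of each line
  // that follows the ';' separator in measurements.txt.
  Sample: PAnsiChar = '-12.3'#10'4.5'#10'0.0'#10;

var
  p: PAnsiChar;
  t, sum: Integer;
begin
  p := Sample;
  sum := 0;
  while p^ <> #0 do
  begin
    t := ParseTenths(p);
    Inc(sum, t); // integer accumulation, no floating point per value
    WriteLn(t);
    Inc(p); // skip the line feed
  end;
  // convert back to degrees only when printing
  WriteLn('sum = ', sum / 10:0:1);
end.
```

Keeping tenths as plain integers means min/max/sum per station are integer operations, and division by 10 happens only once at output time.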
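Feeding threads from 64MB chunks requires that no line is split between two chunks. One common way to achieve this is to push each chunk end forward to just after the next line feed, as in the Free Pascal sketch below. `ChunkEnd` is a hypothetical name. The README does not say how the entry computes chunk boundaries, and the sketch runs sequentially: handing chunks to threads (e.g. under a lock) is left out.

```pascal
program ChunkSketch;

{$mode objfpc}{$H+}

// Hypothetical helper (not the entry's code): returns the exclusive end of a
// chunk that starts at Start and should be about ChunkSize bytes long. The end
// is pushed forward to just after the next line feed, so no "station;temp"
// line is split between two chunks.
function ChunkEnd(Buf: PAnsiChar; Size, Start, ChunkSize: PtrInt): PtrInt;
begin
  Result := Start + ChunkSize;
  if Result >= Size then
    Exit(Size);
  while (Result < Size) and (Buf[Result - 1] <> #10) do
    Inc(Result);
end;

const
  // Tiny stand-in for the memory-mapped file.
  Data: AnsiString = 'Hamburg;12.0'#10'Bulawayo;8.9'#10'Palembang;38.8'#10;

var
  start, stop: PtrInt;
begin
  start := 0;
  // 10 bytes instead of 64MB, so that every chunk end needs adjusting
  while start < Length(Data) do
  begin
    stop := ChunkEnd(PAnsiChar(Data), Length(Data), start, 10);
    // print the chunk without its trailing line feed
    WriteLn('chunk [', start, '..', stop, ') = ',
      Copy(Data, start + 1, stop - start - 1));
    start := stop;
  end;
end.
```

With a real 64MB chunk size, the adjustment only moves each boundary by at most one line, so chunks stay roughly equal while remaining independently parseable.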
entries/abcz/src/brcmormot.lpr

Lines changed: 3 additions & 2 deletions
@@ -518,7 +518,7 @@ function TBrcMain.SortedText: RawUtf8;
 var
 fn: TFileName;
 threads: integer;
-verbose, affinity: boolean;
+verbose, affinity, help: boolean;
 main: TBrcMain;
 res: RawUtf8;
 start, stop: Int64;
@@ -533,9 +533,10 @@ function TBrcMain.SortedText: RawUtf8;
 ['a', 'affinity'], 'force thread affinity to a single CPU core');
 Executable.Command.Get(
 ['t', 'threads'], threads, '#number of threads to run', 16);
+help := Executable.Command.Option(['h', 'help'], 'display this help');
 if Executable.Command.ConsoleWriteUnknown then
 exit
-else if Executable.Command.Option(['h', 'help'], 'display this help') or
+else if help or
 (fn = '') then
 begin
 ConsoleWrite(Executable.Command.FullDescription);

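The `brcmormot.lpr` change moves the `Option(['h', 'help'], ...)` call in front of `ConsoleWriteUnknown`. Presumably this matters because mORMot treats as unknown any argument that no earlier `Option`/`Get` call has consumed, so with the old order `-h` would be rejected before the help text could be printed. The Free Pascal sketch below models that ordering with a hypothetical `TMiniCommandLine` record; it neither uses nor reproduces the mORMot API.

```pascal
program OptionOrderSketch;

{$mode objfpc}{$H+}

type
  // Hypothetical stand-in for a parser that reports "unknown" arguments as
  // those not consumed by an earlier Option() call. It is NOT mORMot's
  // TExecutableCommandLine, only a model of the ordering issue.
  TMiniCommandLine = record
    Args: array of string;
    Used: array of Boolean;
  end;

procedure Init(out c: TMiniCommandLine; const Args: array of string);
var
  i: Integer;
begin
  SetLength(c.Args, Length(Args));
  SetLength(c.Used, Length(Args));
  for i := 0 to High(Args) do
  begin
    c.Args[i] := Args[i];
    c.Used[i] := False;
  end;
end;

// True if '-Name' was passed; also marks that argument as consumed.
function Option(var c: TMiniCommandLine; const Name: string): Boolean;
var
  i: Integer;
begin
  Result := False;
  for i := 0 to High(c.Args) do
    if c.Args[i] = '-' + Name then
    begin
      c.Used[i] := True;
      Result := True;
    end;
end;

// Number of arguments that no previous Option() call consumed.
function CountUnknown(const c: TMiniCommandLine): Integer;
var
  i: Integer;
begin
  Result := 0;
  for i := 0 to High(c.Used) do
    if not c.Used[i] then
      Inc(Result);
end;

// Order before the commit: '-h' is only queried after the unknown check.
function OldOrderShowsHelp(const Args: array of string): Boolean;
var
  c: TMiniCommandLine;
begin
  Init(c, Args);
  Option(c, 'v');
  if CountUnknown(c) > 0 then
    Exit(False); // '-h' is still unconsumed here, so it counts as unknown
  Result := Option(c, 'h');
end;

// Order after the commit: '-h' is consumed first, then unknowns are checked.
function NewOrderShowsHelp(const Args: array of string): Boolean;
var
  c: TMiniCommandLine;
  help: Boolean;
begin
  Init(c, Args);
  Option(c, 'v');
  help := Option(c, 'h');
  if CountUnknown(c) > 0 then
    Exit(False);
  Result := help;
end;

begin
  WriteLn('old order, -h shows help: ', OldOrderShowsHelp(['-h']));
  WriteLn('new order, -h shows help: ', NewOrderShowsHelp(['-h']));
end.
```

In this model, `OldOrderShowsHelp(['-h'])` returns False while `NewOrderShowsHelp(['-h'])` returns True, which mirrors the intent of storing the `help` flag before the unknown-argument check.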