Commit fde946d

Merge branch 'gcarreno:main' into main
2 parents 32be537 + 342f6a4 commit fde946d

25 files changed

+2521
-172
lines changed

.gitignore

Lines changed: 5 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -4,9 +4,7 @@
44
/profiling
55
/results
66
/entries/entries.json
7-
compile_all.*
8-
test_all.*
9-
run_all.*
7+
*_all.*
108

119
# Compiled l10n files: .mo should be ignored
1210
*.mo
@@ -67,3 +65,7 @@ backup/
6765

6866
# Castalia statistics file
6967
*.stat
68+
generator/Delphi/src/OSX64/
69+
generator/Delphi/src/Win32/
70+
generator/Delphi/src/Win64/
71+
generator/Delphi/src/Linux64/

CONTRIBUTING.md

Lines changed: 5 additions & 4 deletions
@@ -3,7 +3,7 @@
33

44
## Formatting
55

6-
All entries should be formatted using the `Ctr-D` shortcut for both Lazarus and Delphi entries.
6+
All source code should be formatted using the default formatting rules for Pascal code, which your IDE should provide. Both Lazarus and Delphi use `Ctrl-D` as a shortcut to format your code.
77

88
## Folder name for the entry
99

@@ -13,11 +13,11 @@ For example, using _Gustavo Carreno_ for the name, the folder would be `entries/
1313

1414
## Name of the executable binary
1515

16-
The executable binary follows the same rules has the entry folder above.
16+
The executable binary follows the same rules as the entry folder above; therefore, on Windows, the above example's executable would be `gcarreno.exe` and on Linux, just `gcarreno`.
1717

1818
## Placement of the executable binary
1919

20-
The executable binary should be placed under a folder named `bin` below the root folder of this repository.
20+
The executable binary should be placed under a folder named `bin` below the root folder of this repository (`../../../bin` relative to your source).
2121

2222
This folder is not present in the repository and is ignored via the `.gitignore`.
2323

@@ -57,7 +57,8 @@ The type is contained within the title and can be one of these types:
5757
Subjects should be no longer than 50 characters, should begin with a capital letter, and should not end with a period.
5858

5959
Use an imperative tone to describe what a commit does, rather than what it did. For example, use change; not changed or changes.
60-
The Body
60+
61+
### The Body
6162

6263
Not all commits are complex enough to warrant a body, therefore it is optional and only used when a commit requires a bit of explanation and context. Use the body to explain the what and why of a commit, not the how.
6364

README.md

Lines changed: 16 additions & 14 deletions
@@ -36,36 +36,37 @@ The task is to write an Object Pascal program which reads the file, calculates t
3636
```
3737

3838
## Entering The Challenge
39-
Submissions will be via a `PR`( Pull Request ) to this repository. \
39+
Submissions will be via a `PR` (Pull Request) to this repository.
4040
The challenge will run from the 10th of March until the 10th of May, 2024.
4141

4242
When creating your entry, please do as follows:
4343
1. Create a folder under `entries` with your first initial and last name, e.g., for Gustavo Carreno: `entries/gcarreno`.
4444
2. If you're worried about anonymity, because the Internet stinks, feel free to use a fictional one: Bruce Wayne, Clark Kent, James Logan, Peter Parker, Diana of Themyscira. Your pick!
4545
3. Create a `README.md` with some content about your approach, e.g., `entries/gcarreno/README.md`.
4646
4. Put all your code under `entries/<your name>/src`, e.g., `entries/gcarreno/src`.
47-
5. If you need to provide a custom `.gitignore` for something not present in the main one, please do.
47+
5. Send your binary to the `bin` folder off the root of this repository.
48+
6. If you need to provide a custom `.gitignore` for something not present in the main one, please do.
49+
7. Read the [CONTRIBUTING.md](./CONTRIBUTING.md) file for more details.
4850

4951
This challenge is mainly to allow us to learn something new. This means that copying code from others will be allowed, under these conditions:
5052
1. You can only use pure Object Pascal with no calls to any operating system's `API` or external `C/C++` libraries. \
51-
**There's been a bit of confusion about this restriction.** \
52-
To clear that out: You can use any package/custom code you want. \
53-
As long as it compiles cross-platform and itself is only pure Object Pascal. \
54-
Anything from the `Jedi Project` or even `mORMmot` ( or anything else ), if it compiles, runs cross-platform it's allowed.
53+
**There's been a bit of confusion about this restriction.**
54+
- To clear that up: you can use any package/custom code you want.
55+
- As long as it compiles cross-platform and is itself pure Object Pascal.
56+
- Anything from the `Jedi Project` or even `mORMot` (or anything else) is allowed, as long as it compiles and runs cross-platform.
5557
2. The code must have some sort of mention/attribution to the original author, in case you've used someone else's code.
5658
3. It's not a blatant copy just for the sake of submission.
5759
4. It adds something of value, not just a different code formatting.
5860
5. All code should be formatted with the `IDE`'s default formatting tool.
5961

60-
**IMPORTANT** \
61-
This challenge can be entered even if you only have access to the Community Edition of RAD Studio. \
62-
I have a Windows VM, with RAD Studio installed, that will do the necessary cross compilation into my Linux host.
62+
**IMPORTANT**
63+
This challenge can be entered even if you only have access to the Community Edition of RAD Studio. I have a Windows VM, with RAD Studio installed, that will do the necessary cross compilation into my Linux host.
6364

6465
Submit your implementation and become part of the leader board!
6566

6667
## Rounding
6768

68-
Székely Balázs has provided code for rounding towards positive infinity per the original challenge.\
69+
Székely Balázs has provided code for rounding towards positive infinity per the original challenge.
6970
This will be the official way to round the output values:
7071
```pas
7172
function TBaseline.RoundEx(x: Double): Double;
@@ -96,7 +97,7 @@ end;
9697
```
9798

9899
## Generating the measurements.txt
99-
> **NOTE** \
100+
> **NOTE**
100101
> We now have both a Lazarus version and a Delphi version of the generator for both 32b and 64b.
101102
102103
In order to produce the One Billion Rows of text, we are providing the [source code](./generator) for the official generator, so we all have the same entry data.
@@ -110,7 +111,7 @@ In order to produce the One Billion Rows of text, we are providing the [source c
110111
| **-n** or **--line-count \<number\>** | The amount of lines to be generated ( Can use 1_000_000_000 ) |
111112

112113
## Baseline
113-
> **NOTE** \
114+
> **NOTE**
114115
> This is still a bit in flux, still needing to get the Delphi version done.
115116
116117
In order to verify the official output, we are providing the [source code](./baseline) for the official baseline.
@@ -178,7 +179,7 @@ These are the results from running all entries into the challenge on my personal
178179
| 4 | 1:16.059 | lazarus-3.0, fpc-3.2.2 | Richard Lawson | Using 1 thread | |
179180
| 5 | 12:40.179 | lazarus-3.0, fpc-3.2.2 | Iwan Kelaiah | Using 1 thread | |
180181

181-
> ** NOTE **
182+
> **NOTE**
182183
>
183184
> After some tests performed by @paweld, it makes no sense to have an `HDD` run.
184185
> I've removed that from the results
@@ -206,7 +207,8 @@ A: Ubuntu 23.10 64b.
206207
I'd like to thank [@paweld](https://github.com/paweld) for taking us from my miserable 20m attempt, to a whopping ~25s, beating the [Python script](https://github.com/gunnarmorling/1brc/blob/main/src/main/python/create_measurements.py) by about 4 and a half minutes.\
207208
I'd like to thank [@mobius](https://github.com/mobius1qwe) for taking the time to provide the Delphi version of the generator.\
208209
I'd like to thank [@dtpfl](https://github.com/dtpfl) for his invaluable work on keeping the `README.md` file up to date with everything.\
209-
I'd like to thank Székely Balázs for providing many patches to make everything compliant with the original challenge.
210+
I'd like to thank Székely Balázs for providing many patches to make everything compliant with the original challenge.\
211+
I'd like to thank [@corneliusdavid](https://github.com/corneliusdavid) for giving some of the information files a once-over and making things more legible and clear.
210212

211213
## Links
212214
The original repository: https://github.com/gunnarmorling/1brc \

entries/dcornelius/README.md

Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
1+
# David Cornelius
2+
3+
An Entry to the One Billion Row Challenge in Object Pascal using Delphi 12 Athens by [David Cornelius](https://github.com/corneliusdavid).
4+
5+
I wanted to see how different methods compared for ease of writing the code and speed of execution, so I solved this in three different ways:
6+
7+
- **TDictionary** - as each line is read, create an object and add it to a `TDictionary` collection; if an entry already exists for a city, update it instead of adding. This was simple to implement but since `TDictionary` doesn't have a sort method, after this list is built, another list must be created to sort them.
8+
- **TStringList** - this is a really simple implementation but requires a lot of memory because the `LoadFromFile` method is used to read in all rows before processing them. Then a second `TStringList` is used to collate and sort the data. *NOTE: Using `LoadFromFile` resulted in an immediate Range Check Error when trying to read in the 1-billion line file! The default Stream created in `LoadFromFile` was the problem. When I switched to `LoadFromStream` and created my own Stream, it worked. However, it's not nearly as fast as the `TDictionary` version.*
9+
- **In-Memory Table** - another approach I thought I'd try was to load all the data into an in-memory table and use local SQL to query the data. While I did learn some cool things about FireDAC, I also learned that this is, by far, the most *inefficient* approach for this: after running for 26 *HOURS*, I killed the process!
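
As a rough illustration of the `TDictionary` approach described above, here is a minimal sketch. It is not the code from this entry: the `TStationStats` class, the sample lines, and the plain `Format` output are assumptions made for illustration, and it skips the official `RoundEx` rounding. It assumes a recent Delphi version.

```pas
program DictionarySketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.Generics.Collections;

type
  // Hypothetical per-city accumulator (not the entry's actual type).
  TStationStats = class
  public
    Min, Max, Sum: Double;
    Count: Int64;
  end;

const
  // Made-up sample input in the challenge's "<city>;<temperature>" format.
  SampleLines: array[0..2] of string = ('Oslo;1.5', 'Lima;20.0', 'Oslo;-3.5');

var
  Stats: TObjectDictionary<string, TStationStats>;
  Names: TStringList;
  Item: TStationStats;
  Line, City: string;
  Temp: Double;
  SepPos: Integer;
  FS: TFormatSettings;
begin
  FS := TFormatSettings.Invariant; // '.' as decimal separator on any locale
  Stats := TObjectDictionary<string, TStationStats>.Create([doOwnsValues]);
  Names := TStringList.Create;
  try
    for Line in SampleLines do
    begin
      SepPos := Pos(';', Line);
      City := Copy(Line, 1, SepPos - 1);
      Temp := StrToFloat(Copy(Line, SepPos + 1, MaxInt), FS);
      if Stats.TryGetValue(City, Item) then
      begin
        // City already seen: update the existing entry.
        if Temp < Item.Min then Item.Min := Temp;
        if Temp > Item.Max then Item.Max := Temp;
        Item.Sum := Item.Sum + Temp;
        Inc(Item.Count);
      end
      else
      begin
        // First reading for this city: add a new entry.
        Item := TStationStats.Create;
        Item.Min := Temp;
        Item.Max := Temp;
        Item.Sum := Temp;
        Item.Count := 1;
        Stats.Add(City, Item);
      end;
    end;

    // The dictionary is unordered, so copy the keys to a second list and sort it.
    for City in Stats.Keys do
      Names.Add(City);
    Names.Sort;

    for City in Names do
    begin
      Item := Stats[City];
      Writeln(Format('%s=%.1f/%.1f/%.1f',
        [City, Item.Min, Item.Sum / Item.Count, Item.Max], FS));
    end;
  finally
    Names.Free;
    Stats.Free; // doOwnsValues frees the TStationStats instances too
  end;
end.
```

Using `TObjectDictionary` with `doOwnsValues` is one way to avoid freeing each value by hand; a plain `TDictionary` with a record value type would also work and avoids the per-city allocation.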
10+
11+
## Compiler
12+
13+
**Delphi 12 Athens** Enterprise Edition - which includes the ability to generate Linux console apps.
14+
15+
### Dependencies
16+
17+
There are no dependencies if run under one of the most recent versions of Delphi. The code should be backwards-compatible to Delphi 10.3 Rio (it uses inline variables and type inference introduced in that version) and further back with a few simple modifications. It uses `System.StrUtils`, `Generics.Collections`, and a few other run-time libraries in Delphi.
18+
19+
### Conditional Compilation
20+
21+
There are compiler directives to add some convenience when debugging. If built with the default *Debug* configuration, then the DEBUG compiler symbol is defined which turns on a few lines of code that give a little feedback and wait for Enter to be pressed so you can run this from the IDE without missing the quickly disappearing DOS box where the output is displayed.
22+
23+
Thus, build in *Release* mode for the official challenge.
24+
25+
### Execution
26+
27+
The program runs identically on Win32, Win64, and Linux64. If you run it without exactly two parameters, it will display the syntax and exit. The parameters are:
28+
29+
- **Filename**: this is the input filename containing the measurements data as defined by the challenge.
30+
- **Method**: this specifies which method to use when reading and collating the data and corresponds to the methods discussed above:
31+
- **DIC**: TDictionary
32+
- **TSL**: TStringList; NOT an official entry--very slow.
33+
- **TBL**: In-Memory Table; NOT an official entry, only test with *small* data sets
34+
35+
#### Example
36+
37+
To run the challenge, read from the `measurements.txt` file, and use the TDictionary method, run it like this:
38+
39+
```
40+
C:> dcornelius measurements.txt dic
41+
```
42+
43+
## Remarks
44+
45+
I entered this challenge as a learning experience. I did not expect to be the fastest as I don't have time to implement multiple threads (which is clearly the road to victory here) but I had fun and learned a lot!
46+
47+
I now know more about how buffer size can affect stream reads significantly. I have not used streams much in Delphi before but after using `AssignFile`/`Reset`/`Readln`/`CloseFile` in my first attempts, noticing how fast `TStringList.LoadFromFile` was on small files and studying its implementation, I switched to using a `TStreamReader` and realized how much simpler and faster it is for reading text files.
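
A minimal sketch of reading a text file line by line through `TStreamReader` with an explicit buffer size follows. The 1 MiB buffer and the program name are assumptions for illustration only; the value this entry actually uses is not shown here, and the best size is likely to depend on the machine and the file.

```pas
program StreamReaderSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

const
  // Hypothetical buffer size, chosen only for illustration.
  ReadBufferSize = 1024 * 1024;

var
  Reader: TStreamReader;
  LineCount: Int64;
begin
  if ParamCount <> 1 then
  begin
    Writeln('Usage: StreamReaderSketch <filename>');
    Exit;
  end;

  LineCount := 0;
  // Arguments: file name, encoding, DetectBOM, buffer size in bytes.
  Reader := TStreamReader.Create(ParamStr(1), TEncoding.UTF8, False, ReadBufferSize);
  try
    while not Reader.EndOfStream do
    begin
      Reader.ReadLine; // a real entry would parse the line here
      Inc(LineCount);
    end;
  finally
    Reader.Free;
  end;

  Writeln('Lines read: ', LineCount);
end.
```

Timing this loop with a few different values of `ReadBufferSize` is a simple way to see the effect of buffer size on a given machine.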
48+
49+
I also learned some things about a `TDictionary` and why it works so well for this particular situation. And, *after* I implemented this method, I looked at other entries and noticed the one by IWAN KELAIAH was very similar to mine. While the "ikelaiah" entry was submitted before mine, I did not look at or copy anything from that implementation. Chalk it up to great minds thinking alike, I guess!
50+
51+
Finally, I learned that FireDAC has "LocalSQL" and uses the SQLite engine internally to query in-memory tables. It's not efficient for this long data set of two fields but could come in handy later, like when handling results from a REST service or something.
52+
53+
## History
54+
55+
- Version 1.0: working version with `TDictionary`, `TStringList`, and `TFDMemTable` methods implemented.
Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
1+
@echo off
2+
echo Build 1BRC Entry for DCornelius - Delphi 12 Athens, Win32, Release mode.
3+
echo Assumes Delphi 12 is in the path and can find the compiler for Win32 (dcc32.exe).
4+
pause
5+
6+
REM RSVars is a batch supplied with Delphi and sets environment variables used in the compilation
7+
call RSVars
8+
9+
REM -$L- : no debug symbols
10+
REM -$Y- : no symbol reference info
11+
REM -B : build all units
12+
REM -Q : quiet compile
13+
REM -TX : set extension
14+
REM -D : define compiler symbol
15+
REM -E : output folder
16+
REM -CC : console target
17+
REM -U : unit folders
18+
dcc32.exe -$L- -$Y- --no-config -B -Q -TX.exe -DRELEASE -E..\..\..\bin -CC -U"%BDS%\lib\win32\release" dcornelius.dpr
Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
1+
@echo off
2+
echo Build 1BRC Entry for DCornelius - Delphi 12 Athens, Win64, Release mode.
3+
echo Assumes Delphi 12 is in the path and can find the compiler for Win64 (dcc64.exe).
4+
pause
5+
6+
REM RSVars is a batch supplied with Delphi and sets environment variables used in the compilation
7+
call RSVars
8+
9+
REM -$L- : no debug symbols
10+
REM -$Y- : no symbol reference info
11+
REM -B : build all units
12+
REM -Q : quiet compile
13+
REM -TX : set extension
14+
REM -D : define compiler symbol
15+
REM -E : output folder
16+
REM -CC : console target
17+
REM -U : unit folders
18+
dcc64.exe -$L- -$Y- --no-config -B -Q -TX.exe -DRELEASE -E..\..\..\bin -CC -U"%BDS%\lib\Win64\release" dcornelius.dpr
Lines changed: 79 additions & 0 deletions
@@ -0,0 +1,79 @@
1+
program dcornelius;
2+
(* as: OBRC_DCornelius.dpr
3+
* by: David Cornelius
4+
* on: March, 2024
5+
* in: Delphi 12 Athens
6+
* to: submit an entry in the One Billion Row Challenge
7+
*
8+
* Development/Testing was done on a 3.8 GHz Intel i7 desktop computer running Windows 11.
9+
*
10+
* NOTE: Build with 'Debug' configuration for messages, automatic timing, and a pause at the end.
11+
* Build with 'Release' configuration for the official submission.
12+
*)
13+
14+
{$APPTYPE CONSOLE}
15+
16+
{$R *.res}
17+
18+
uses
19+
System.SysUtils, System.Classes,
20+
{$IFDEF DEBUG}
21+
System.Diagnostics,
22+
{$ENDIF }
23+
uChallengeWithDictionary in 'uChallengeWithDictionary.pas',
24+
uChallengeCommon in 'uChallengeCommon.pas',
25+
uChallengeWithStringList in 'uChallengeWithStringList.pas',
26+
udmChallengeWithFireDAC in 'udmChallengeWithFireDAC.pas' {dmChallengeWithFireDAC: TDataModule};
27+
28+
procedure DisplaySyntax;
29+
begin
30+
Writeln('SYNTAX: ' + ExtractFileName(ParamStr(0)) + ' <filename> <method>');
31+
Writeln(' where <filename> is a text file with weather station data');
32+
Writeln('    and <method> is the algorithm for summarizing the data:');
33+
Writeln(' TSL = read in all data to a TStringList (lots of memory needed)');
34+
Writeln(' DIC = build a Dictionary, then sort after built');
35+
Writeln(' TBL = load a FireDAC in-memory table - warning: takes several hours!');
36+
Writeln;
37+
{$IFDEF DEBUG}
38+
Writeln('Running in Debug mode.');
39+
{$ELSE}
40+
Writeln('Running in Release mode.');
41+
{$ENDIF}
42+
Writeln('Press ENTER...');
43+
Readln;
44+
end;
45+
46+
begin
47+
try
48+
if ParamCount <> 2 then
49+
DisplaySyntax
50+
else begin
51+
{$IFDEF DEBUG}
52+
var StopWatch := TStopwatch.StartNew;
53+
{$ENDIF}
54+
ChallengeCommon := TChallengeCommon.Create(ParamStr(1));
55+
try
56+
var Method := ParamStr(2);
57+
if SameText(Method, 'TSL') then
58+
ChallengeWithStringList
59+
else if SameText(Method, 'DIC') then
60+
ChallengeWithDictionary
61+
else if SameText(Method, 'TBL') then
62+
ChallengeWithFireDAC
63+
else
64+
raise EArgumentException.Create('Invalid method');
65+
finally
66+
ChallengeCommon.Free;
67+
end;
68+
{$IFDEF DEBUG}
69+
StopWatch.Stop;
70+
var ms := StopWatch.ElapsedMilliseconds;
71+
Writeln(Format('Elapsed Time milliseconds: %d, minutes: ~%d:%.2d', [ms, ms div 1000 div 60, (ms div 1000) mod 60]));
72+
Readln;
73+
{$ENDIF}
74+
end;
75+
except
76+
on E: Exception do
77+
Writeln(E.ClassName, ': ', E.Message);
78+
end;
79+
end.
