CONTRIBUTING.md: 5 additions & 4 deletions
@@ -3,7 +3,7 @@

 ## Formatting

-All entries should be formatted using the `Ctr-D`shortcut for both Lazarus and Delphi entries.
+All source code should be formatted using the default formatting rules for Pascal code, which your IDE should provide. Both Lazarus and Delphi use `Ctrl-D` as the shortcut to format your code.

 ## Folder name for the entry

@@ -13,11 +13,11 @@ For example, using _Gustavo Carreno_ for the name, the folder would be `entries/

 ## Name of the executable binary

-The executable binary follows the same rules has the entry folder above.
+The executable binary follows the same rules as the entry folder above; therefore, on Windows, the above example's executable would be `gcarreno.exe` and on Linux, just `gcarreno`.

 ## Placement of the executable binary

-The executable binary should be placed under a folder named `bin` below the root folder of this repository.
+The executable binary should be placed under a folder named `bin` below the root folder of this repository (`../../../bin` relative to your source).

 This folder is not present in the repository and is ignored via the `.gitignore`.

@@ -57,7 +57,8 @@ The type is contained within the title and can be one of these types:
 Subjects should be no longer than 50 characters, should begin with a capital letter, and should not end with a period.

 Use an imperative tone to describe what a commit does, rather than what it did. For example, use "change", not "changed" or "changes".
-The Body
+
+### The Body

 Not all commits are complex enough to warrant a body; it is therefore optional and only used when a commit requires a bit of explanation and context. Use the body to explain the what and why of a commit, not the how.
README.md: 16 additions & 14 deletions
@@ -36,36 +36,37 @@ The task is to write an Object Pascal program which reads the file, calculates t
 ```

 ## Entering The Challenge
-Submissions will be via a `PR`( Pull Request) to this repository. \
+Submissions will be via a `PR` (Pull Request) to this repository.
 The challenge will run from the 10th of March until the 10th of May, 2024.

 When creating your entry, please do as follows:
 1. Create a folder under `entries` with your first initial and last name, e.g., for Gustavo Carreno: `entries/gcarreno`.
 2. If you're worried about anonymity, because the Internet stinks, feel free to use a fictional name: Bruce Wayne, Clark Kent, James Logan, Peter Parker, Diana of Themyscira. Your pick!
 3. Create a `README.md` with some content about your approach, e.g., `entries/gcarreno/README.md`.
 4. Put all your code under `entries/<your name>/src`, e.g., `entries/gcarreno/src`.
-5. If you need to provide a custom `.gitignore` for something not present in the main one, please do.
+5. Place your binary in the `bin` folder at the root of this repository.
+6. If you need to provide a custom `.gitignore` for something not present in the main one, please do.
+7. Read the [CONTRIBUTING.md](./CONTRIBUTING.md) file for more details.

 This challenge is mainly to allow us to learn something new. This means that copying code from others will be allowed, under these conditions:
 1. You can only use pure Object Pascal with no calls to any operating system's `API` or external `C/C++` libraries. \
-**There's been a bit of confusion about this restriction.**\
-To clear that out: You can use any package/custom code you want.\
-As long as it compiles cross-platform and itself is only pure Object Pascal.\
-Anything from the `Jedi Project` or even `mORMmot` ( or anything else ), if it compiles, runs cross-platform it's allowed.
+**There's been a bit of confusion about this restriction.**
+- To clear that up: you can use any package/custom code you want.
+- As long as it compiles cross-platform and is itself pure Object Pascal.
+- Anything from the `Jedi Project` or even `mORMot` (or anything else) is allowed if it compiles and runs cross-platform.
 2. The code must have some sort of mention/attribution to the original author, in case you've used someone else's code.
 3. It's not a blatant copy just for the sake of submission.
 4. It adds something of value, not just a different code formatting.
 5. All code should be formatted with the `IDE`'s default formatting tool.

-**IMPORTANT**\
-This challenge can be entered even if you only have access to the Community Edition of RAD Studio. \
-I have a Windows VM, with RAD Studio installed, that will do the necessary cross compilation into my Linux host.
+**IMPORTANT**
+This challenge can be entered even if you only have access to the Community Edition of RAD Studio. I have a Windows VM, with RAD Studio installed, that will do the necessary cross-compilation for my Linux host.

 Submit your implementation and become part of the leader board!

 ## Rounding

-Székely Balázs has provided code for rounding towards positive infinity per the original challenge.\
+Székely Balázs has provided code for rounding towards positive infinity per the original challenge.
 This will be the official way to round the output values:
 ```pas
 function TBaseline.RoundEx(x: Double): Double;
@@ -96,7 +97,7 @@ end;
 ```

 ## Generating the measurements.txt
-> **NOTE**\
+> **NOTE**
 > We now have both a Lazarus version and a Delphi version of the generator, for both 32-bit and 64-bit.

 In order to produce the One Billion Rows of text, we are providing the [source code](./generator) for the official generator, so we all have the same entry data.
@@ -110,7 +111,7 @@ In order to produce the One Billion Rows of text, we are providing the [source c
 |**-n** or **--line-count \<number\>**| The number of lines to be generated (can use 1_000_000_000) |

 ## Baseline
-> **NOTE**\
+> **NOTE**
 > This is still a bit in flux; the Delphi version still needs to be done.

 In order to verify the official output, we are providing the [source code](./baseline) for the official baseline.
@@ -178,7 +179,7 @@ These are the results from running all entries into the challenge on my personal
 | 4 | 1:16.059 | lazarus-3.0, fpc-3.2.2 | Richard Lawson | Using 1 thread ||
 > After some tests performed by @paweld, it makes no sense to have an `HDD` run.
 > I've removed that from the results
@@ -206,7 +207,8 @@ A: Ubuntu 23.10 64b.
 I'd like to thank [@paweld](https://github.com/paweld) for taking us from my miserable 20m attempt, to a whopping ~25s, beating the [Python script](https://github.com/gunnarmorling/1brc/blob/main/src/main/python/create_measurements.py) by about 4 and a half minutes.\
 I'd like to thank [@mobius](https://github.com/mobius1qwe) for taking the time to provide the Delphi version of the generator.\
 I'd like to thank [@dtpfl](https://github.com/dtpfl) for his invaluable work on keeping the `README.md` file up to date with everything.\
-I'd like to thank Székely Balázs for providing many patches to make everything compliant with the original challenge.
+I'd like to thank Székely Balázs for providing many patches to make everything compliant with the original challenge.\
+I'd like to thank [@corneliusdavid](https://github.com/corneliusdavid) for giving some of the information files a once-over and making things more legible and clear.

 ## Links
 The original repository: https://github.com/gunnarmorling/1brc\
+An Entry to the One Billion Row Challenge in Object Pascal using Delphi 12 Athens by [David Cornelius](https://github.com/corneliusdavid).
+
+I wanted to see how different methods compared for ease of writing the code and speed of execution, so I solved this in three different ways:
+
+- **TDictionary** - as each line is read, create an object and add it to a `TDictionary` collection; if an entry already exists for a city, update it instead of adding. This was simple to implement, but since `TDictionary` doesn't have a sort method, another list must be created after this one is built in order to sort the entries.
+- **TStringList** - this is a really simple implementation but requires a lot of memory because the `LoadFromFile` method is used to read in all rows before processing them. Then a second `TStringList` is used to collate and sort the data. *NOTE: Using `LoadFromFile` resulted in an immediate Range Check Error when trying to read in the 1-billion line file! The default Stream created in `LoadFromFile` was the problem. When I switched to `LoadFromStream` and created my own Stream, it worked. However, it's not nearly as fast as the `TDictionary` version.*
+- **In-Memory Table** - another approach I thought I'd try was to load all the data into an in-memory table and use local SQL to query the data. While I did learn some cool things about FireDAC, I also learned that this is, by far, the most *inefficient* approach for this: after running for 26 *HOURS*, I killed the process!
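
The `TDictionary` method described in the list above could be sketched roughly as follows. This is a minimal illustration and not the entry's actual code: `TCityStats`, `AddReading`, and the sample readings are hypothetical, it uses `TObjectDictionary` so the stored objects are freed automatically, and it assumes a recent Delphi with `System.Generics.Collections`.

```pas
program DictSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  System.Generics.Collections;

type
  // Hypothetical per-city accumulator; not the class used by the entry.
  TCityStats = class
  public
    MinTemp, MaxTemp, Sum: Double;
    Count: Integer;
  end;

// Adds one reading: creates the city's object on first sight,
// otherwise updates the existing object in place.
procedure AddReading(Stats: TObjectDictionary<string, TCityStats>;
  const City: string; Temp: Double);
var
  Entry: TCityStats;
begin
  if Stats.TryGetValue(City, Entry) then
  begin
    if Temp < Entry.MinTemp then
      Entry.MinTemp := Temp;
    if Temp > Entry.MaxTemp then
      Entry.MaxTemp := Temp;
    Entry.Sum := Entry.Sum + Temp;
    Inc(Entry.Count);
  end
  else
  begin
    Entry := TCityStats.Create;
    Entry.MinTemp := Temp;
    Entry.MaxTemp := Temp;
    Entry.Sum := Temp;
    Entry.Count := 1;
    Stats.Add(City, Entry);
  end;
end;

var
  Stats: TObjectDictionary<string, TCityStats>;
  Cities: TArray<string>;
  City: string;
  Entry: TCityStats;
begin
  // doOwnsValues: the dictionary frees the TCityStats objects it holds.
  Stats := TObjectDictionary<string, TCityStats>.Create([doOwnsValues]);
  try
    // Made-up sample data standing in for lines of measurements.txt.
    AddReading(Stats, 'Lisbon', 17.5);
    AddReading(Stats, 'Austin', 30.1);
    AddReading(Stats, 'Lisbon', 12.3);

    // The dictionary itself cannot be sorted, so copy the keys out
    // into a second list and sort that instead.
    Cities := Stats.Keys.ToArray;
    TArray.Sort<string>(Cities);

    for City in Cities do
    begin
      Entry := Stats[City];
      Writeln(Format('%s=%.1f/%.1f/%.1f',
        [City, Entry.MinTemp, Entry.Sum / Entry.Count, Entry.MaxTemp]));
    end;
  finally
    Stats.Free;
  end;
end.
```

Note that the mean printed here uses plain `Format` rounding, not the official `RoundEx` rounding required by the challenge.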
+
+## Compiler
+
+**Delphi 12 Athens** Enterprise Edition - which includes the ability to generate Linux console apps.
+
+### Dependencies
+
+There are no dependencies if run under one of the most recent versions of Delphi. The code should be backwards-compatible to Delphi 10.3 Rio (it uses inline variables and type inference introduced in that version) and further back with a few simple modifications. It uses `System.StrUtils`, `Generics.Collections`, and a few other run-time libraries in Delphi.
+
+### Conditional Compilation
+
+There are compiler directives to add some convenience when debugging. If built with the default *Debug* configuration, the DEBUG compiler symbol is defined, which turns on a few lines of code that give a little feedback and wait for Enter to be pressed, so you can run this from the IDE without missing the quickly disappearing DOS box where the output is displayed.
+
+Thus, build in *Release* mode for the official challenge.
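
As an illustration of the kind of directive described above, a debug-only pause might look like the sketch below. The messages are made up and the entry's real code may differ; the only assumption is that `DEBUG` is defined by the default *Debug* build configuration, as the text says.

```pas
program DebugPauseSketch;

{$APPTYPE CONSOLE}

begin
  Writeln('(results would be written here)');
{$IFDEF DEBUG}
  // Compiled only when the DEBUG symbol is defined, so a Release
  // build never waits for input.
  Writeln('Press Enter to close...');
  Readln;
{$ENDIF}
end.
```

Because the block between `{$IFDEF DEBUG}` and `{$ENDIF}` is removed at compile time, the *Release* binary carries no trace of the pause.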
+
+### Execution
+
+The program runs identically on Win32, Win64, and Linux64. If you run it without exactly two parameters, it will display the syntax and exit. The parameters are:
+
+- **Filename**: this is the input filename containing the measurements data as defined by the challenge.
+- **Method**: this specifies which method to use when reading and collating the data and corresponds to the methods discussed above:
+  - **DIC**: TDictionary
+  - **TSL**: TStringList; NOT an official entry--very slow.
+  - **TBL**: In-Memory Table; NOT an official entry, only test with *small* data sets
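
The parameter handling described above could be sketched like this. It is only an illustration: the wording of the syntax message and the case-insensitive comparison of the method name are assumptions, and the actual processing is replaced by placeholder output.

```pas
program ParamSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

var
  FileName, Method: string;
begin
  // Anything other than exactly two parameters: show the syntax and stop.
  if ParamCount <> 2 then
  begin
    Writeln('Syntax: ', ExtractFileName(ParamStr(0)),
      ' <filename> <DIC|TSL|TBL>');
    Exit;
  end;

  FileName := ParamStr(1);
  // Assumption: the method name is matched case-insensitively.
  Method := UpperCase(ParamStr(2));

  if Method = 'DIC' then
    Writeln('Would process ', FileName, ' with TDictionary')
  else if Method = 'TSL' then
    Writeln('Would process ', FileName, ' with TStringList')
  else if Method = 'TBL' then
    Writeln('Would process ', FileName, ' with an in-memory table')
  else
    Writeln('Unknown method: ', ParamStr(2));
end.
```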
+
+#### Example
+
+To run the challenge, read from the `measurements.txt` file, and use the TDictionary method, run it like this:
+
+```
+C:> dcornelius measurements.txt dic
+```
+
+## Remarks
+
+I entered this challenge as a learning experience. I did not expect to be the fastest as I don't have time to implement multiple threads (which is clearly the road to victory here) but I had fun and learned a lot!
+
+I now know more about how buffer size can affect stream reads significantly. I have not used streams much in Delphi before, but after using `AssignFile`/`Reset`/`Readln`/`CloseFile` in my first attempts, noticing how fast `TStringList.LoadFromFile` was on small files, and studying its implementation, I switched to using a `TStreamReader` and realized how much simpler and faster it is for reading text files.
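
For readers unfamiliar with `TStreamReader`, a minimal sketch of reading a text file line by line with an explicit buffer size is shown below. The 1 MiB buffer, the UTF-8 encoding, and the line and character counting are assumptions chosen for illustration; they are not values or logic taken from the entry.

```pas
program StreamReaderSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  System.Classes;

const
  // Assumption: an arbitrary 1 MiB read buffer (the default is much smaller).
  ReadBufferSize = 1024 * 1024;

var
  Reader: TStreamReader;
  Line: string;
  LineCount, CharCount: Int64;
begin
  if ParamCount < 1 then
  begin
    Writeln('Syntax: ', ExtractFileName(ParamStr(0)), ' <filename>');
    Exit;
  end;

  // The fourth constructor argument is the buffer size in bytes.
  Reader := TStreamReader.Create(ParamStr(1), TEncoding.UTF8, False,
    ReadBufferSize);
  try
    LineCount := 0;
    CharCount := 0;
    while not Reader.EndOfStream do
    begin
      Line := Reader.ReadLine;
      Inc(LineCount);
      Inc(CharCount, Length(Line));
    end;
    Writeln(LineCount, ' lines, ', CharCount, ' characters');
  finally
    Reader.Free;
  end;
end.
```

Timing this loop with a few different `ReadBufferSize` values on the same file is one way to observe the buffer-size effect mentioned above.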
+
+I also learned some things about `TDictionary` and why it works so well for this particular situation. And, *after* I implemented this method, I looked at other entries and noticed the one by Iwan Kelaiah was very similar to mine. While the "ikelaiah" entry was submitted before mine, I did not look at or copy anything from that implementation. Chalk it up to great minds thinking alike, I guess!
+
+Finally, I learned that FireDAC has "LocalSQL" and uses the SQLite engine internally to query in-memory tables. It's not efficient for this long data set of two fields but could come in handy later, like when handling results from a REST service or something.
+
+## History
+
+- Version 1.0: working version with `TDictionary`, `TStringList`, and `TFDMemTable` methods implemented.