Commit 14506c1

Merge branch 'gcarreno:main' into main
2 parents 2d2e750 + 6c03319 commit 14506c1

15 files changed: +2247 -31 lines changed

.gitignore

Lines changed: 4 additions & 0 deletions
@@ -65,3 +65,7 @@ backup/

 # Castalia statistics file
 *.stat
+generator/Delphi/src/OSX64/
+generator/Delphi/src/Win32/
+generator/Delphi/src/Win64/
+generator/Delphi/src/Linux64/

entries/dcornelius/README.md

Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
# David Cornelius

An entry to the One Billion Row Challenge in Object Pascal using Delphi 12 Athens by [David Cornelius](https://github.com/corneliusdavid).

I wanted to see how different methods compared for ease of writing the code and speed of execution, so I solved this in three different ways:

- **TDictionary** - as each line is read, create an object and add it to a `TDictionary` collection; if an entry already exists for a city, update it instead of adding. This was simple to implement, but since `TDictionary` doesn't have a sort method, another list must be created after the dictionary is built in order to sort the entries.
- **TStringList** - this is a really simple implementation but requires a lot of memory because the `LoadFromFile` method is used to read in all rows before processing them. Then a second `TStringList` is used to collate and sort the data. *NOTE: Using `LoadFromFile` resulted in an immediate Range Check Error when trying to read in the 1-billion line file! The default Stream created in `LoadFromFile` was the problem. When I switched to `LoadFromStream` and created my own Stream, it worked. However, it's not nearly as fast as the `TDictionary` version.*
- **In-Memory Table** - another approach I thought I'd try was to load all the data into an in-memory table and use local SQL to query the data. While I did learn some cool things about FireDAC, I also learned that this is, by far, the most *inefficient* approach for this: after running for 26 *HOURS*, I killed the process!
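
The **TDictionary** approach described in the list can be sketched roughly as follows. This is a minimal illustration and not the code from this entry: the `TStationStats` class, the `Summarize` procedure, and the printed layout are all assumptions made for the example, and it uses the default `TStringList` sort order rather than whatever ordering the challenge requires.

```pascal
program DictionarySketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.Generics.Collections;

type
  // Hypothetical per-station accumulator (an assumption, not the entry's type).
  TStationStats = class
    Min, Max, Sum: Double;
    Count: Int64;
  end;

procedure Summarize(const FileName: string);
var
  Stats: TObjectDictionary<string, TStationStats>;
  Names: TStringList;
  Reader: TStreamReader;
  Line, City: string;
  Entry: TStationStats;
  Temp: Double;
  SepPos, I: Integer;
  Fmt: TFormatSettings;
begin
  Fmt := TFormatSettings.Invariant; // parse and print with '.' as decimal separator
  Stats := nil;
  Names := nil;
  Reader := nil;
  try
    Stats := TObjectDictionary<string, TStationStats>.Create([doOwnsValues]);
    Names := TStringList.Create;
    Reader := TStreamReader.Create(FileName, TEncoding.UTF8);

    while not Reader.EndOfStream do
    begin
      Line := Reader.ReadLine;
      SepPos := Pos(';', Line);
      if SepPos = 0 then
        Continue; // skip empty or malformed lines
      City := Copy(Line, 1, SepPos - 1);
      Temp := StrToFloat(Copy(Line, SepPos + 1, MaxInt), Fmt);

      if Stats.TryGetValue(City, Entry) then
      begin
        // City already seen: update the existing entry instead of adding
        if Temp < Entry.Min then Entry.Min := Temp;
        if Temp > Entry.Max then Entry.Max := Temp;
        Entry.Sum := Entry.Sum + Temp;
        Inc(Entry.Count);
      end
      else
      begin
        // First reading for this city: create and add a new entry
        Entry := TStationStats.Create;
        Entry.Min := Temp;
        Entry.Max := Temp;
        Entry.Sum := Temp;
        Entry.Count := 1;
        Stats.Add(City, Entry);
      end;
    end;

    // TDictionary has no sort method, so copy the keys into a sortable list
    for City in Stats.Keys do
      Names.Add(City);
    Names.Sort;

    for I := 0 to Names.Count - 1 do
    begin
      Entry := Stats[Names[I]];
      Writeln(Format('%s=%.1f/%.1f/%.1f',
        [Names[I], Entry.Min, Entry.Sum / Entry.Count, Entry.Max], Fmt));
    end;
  finally
    Reader.Free;
    Names.Free;
    Stats.Free; // also frees the owned TStationStats values
  end;
end;

begin
  if ParamCount = 1 then
    Summarize(ParamStr(1))
  else
    Writeln('usage: DictionarySketch <filename>');
end.
```

The second pass over `Stats.Keys` is the extra list mentioned in the first bullet: the dictionary gives fast lookups while reading, and the `TStringList` exists only to produce sorted output.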

## Compiler

**Delphi 12 Athens** Enterprise Edition - which includes the ability to generate Linux console apps.

### Dependencies

There are no dependencies if run under one of the most recent versions of Delphi. The code should be backwards-compatible to Delphi 10.3 Rio (it uses inline variables and type inference introduced in that version) and further back with a few simple modifications. It uses `System.StrUtils`, `Generics.Collections`, and a few other run-time libraries in Delphi.

### Conditional Compilation

There are compiler directives to add some convenience when debugging. If built with the default *Debug* configuration, the `DEBUG` compiler symbol is defined, which turns on a few lines of code that give a little feedback and wait for Enter to be pressed. This lets you run the program from the IDE without missing the quickly disappearing DOS box where the output is displayed.

Thus, build in *Release* mode for the official challenge.
24+
25+
### Execution
26+
27+
The program runs identical between Win32, Win64, and Linux64. If you run it without exactly two parameters, it will display the syntax and exit. The parameters are:
28+
29+
- **Filename**: this is the input filename containing the measurements data as defined by the challenge.
30+
- *Method*: this specifies which method to use when reading and collating the data and correspond to the methods as discussed above:
31+
- **DIC**: TDictionary
32+
- **TSL**: TStringList; NOT an official entry--very slow.
33+
- **TBL**: In-Memory Table; NOT an official entry, only test with *small* data sets
34+
35+
#### Example
36+
37+
To run the challenge, read from the `measurements.txt` file, and use the TDictionary method, run it like this:
38+
39+
```
40+
C:> docrnelius measurements.txt dic
41+
```

## Remarks

I entered this challenge as a learning experience. I did not expect to be the fastest, as I don't have time to implement multiple threads (which is clearly the road to victory here), but I had fun and learned a lot!

I now know more about how significantly buffer size can affect stream reads. I have not used streams much in Delphi before, but after using `AssignFile`/`Reset`/`Readln`/`CloseFile` in my first attempts, noticing how fast `TStringList.LoadFromFile` was on small files, and studying its implementation, I switched to using a `TStreamReader` and realized how much simpler and faster it is for reading text files.
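
One way to observe the effect of buffer size is to time the same read loop with different `TStreamReader` buffers. The sketch below is an illustration only: the program name, the `CountLines` and `TimeRun` helpers, and the three buffer sizes are arbitrary choices for the example, not values taken from this entry.

```pascal
program BufferSizeSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.Diagnostics;

// Counts the lines in a text file using a TStreamReader
// created with the given buffer size (hypothetical helper).
function CountLines(const FileName: string; BufferSize: Integer): Int64;
var
  Reader: TStreamReader;
begin
  Result := 0;
  // The last constructor argument sets the reader's internal buffer size.
  Reader := TStreamReader.Create(FileName, TEncoding.UTF8, False, BufferSize);
  try
    while not Reader.EndOfStream do
    begin
      Reader.ReadLine; // line content is discarded; only the count matters here
      Inc(Result);
    end;
  finally
    Reader.Free;
  end;
end;

// Times one full pass over the file and prints the result (hypothetical helper).
procedure TimeRun(const FileName: string; BufferSize: Integer);
var
  Watch: TStopwatch;
  Lines: Int64;
begin
  Watch := TStopwatch.StartNew;
  Lines := CountLines(FileName, BufferSize);
  Writeln(Format('buffer %8d: %d lines in %d ms',
    [BufferSize, Lines, Watch.ElapsedMilliseconds]));
end;

begin
  if ParamCount <> 1 then
  begin
    Writeln('usage: BufferSizeSketch <filename>');
    Exit;
  end;
  // Example sizes only; suitable values depend on the machine and the file.
  TimeRun(ParamStr(1), 4 * 1024);
  TimeRun(ParamStr(1), 64 * 1024);
  TimeRun(ParamStr(1), 1024 * 1024);
end.
```

Because the operating system caches file contents, the first pass may be slower than later ones regardless of buffer size, so it is worth running the program more than once before comparing numbers.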

I also learned some things about a `TDictionary` and why it works so well for this particular situation. And, *after* I implemented this method, I looked at other entries and noticed the one by IWAN KELAIAH was very similar to mine. While the "ikelaiah" entry was submitted before mine, I did not look at or copy anything from that implementation. Chalk it up to great minds thinking alike, I guess!

Finally, I learned that FireDAC has "LocalSQL" and uses the SQLite engine internally to query in-memory tables. It's not efficient for this long data set of two fields but could come in handy later, like when handling results from a REST service or something.

## History

- Version 1.0: working version with `TDictionary`, `TStringList`, and `TFDMemTable` methods implemented.
Lines changed: 79 additions & 0 deletions
@@ -0,0 +1,79 @@
program dcornelius;
(* as: OBRC_DCornelius.dpr
 * by: David Cornelius
 * on: March, 2024
 * in: Delphi 12 Athens
 * to: submit an entry in the One Billion Row Challenge
 *
 * Development/Testing was done on a 3.8 GHz Intel i7 desktop computer running Windows 11.
 *
 * NOTE: Build with 'Debug' configuration for messages, automatic timing, and a pause at the end.
 *       Build with 'Release' configuration for the official submission.
 *)

{$APPTYPE CONSOLE}

{$R *.res}

uses
  System.SysUtils, System.Classes,
  {$IFDEF DEBUG}
  System.Diagnostics,
  {$ENDIF }
  uChallengeWithDictionary in 'uChallengeWithDictionary.pas',
  uChallengeCommon in 'uChallengeCommon.pas',
  uChallengeWithStringList in 'uChallengeWithStringList.pas',
  udmChallengeWithFireDAC in 'udmChallengeWithFireDAC.pas' {dmChallengeWithFireDAC: TDataModule};

procedure DisplaySyntax;
begin
  Writeln('SYNTAX: ' + ExtractFileName(ParamStr(0)) + ' <filename> <method>');
  Writeln(' where <filename> is a text file with weather station data');
  Writeln(' and <method> is the algorithm for summarizing the data:');
  Writeln(' TSL = read in all data to a TStringList (lots of memory needed)');
  Writeln(' DIC = build a Dictionary, then sort after built');
  Writeln(' TBL = load a FireDAC in-memory table - warning: takes several hours!');
  Writeln;
  {$IFDEF DEBUG}
  Writeln('Running in Debug mode.');
  {$ELSE}
  Writeln('Running in Release mode.');
  {$ENDIF}
  Writeln('Press ENTER...');
  Readln;
end;

begin
  try
    if ParamCount <> 2 then
      DisplaySyntax
    else begin
      {$IFDEF DEBUG}
      var StopWatch := TStopwatch.StartNew;
      {$ENDIF}
      ChallengeCommon := TChallengeCommon.Create(ParamStr(1));
      try
        var Method := ParamStr(2);
        if SameText(Method, 'TSL') then
          ChallengeWithStringList
        else if SameText(Method, 'DIC') then
          ChallengeWithDictionary
        else if SameText(Method, 'TBL') then
          ChallengeWithFireDAC
        else
          raise EArgumentException.Create('Invalid method');
      finally
        ChallengeCommon.Free;
      end;
      {$IFDEF DEBUG}
      StopWatch.Stop;
      var ms := StopWatch.ElapsedMilliseconds;
      // %.2d zero-pads the seconds so 65000 ms prints as 1:05 rather than 1:5
      Writeln(Format('Elapsed Time milliseconds: %d, minutes: ~%d:%.2d', [ms, ms div 1000 div 60, (ms div 1000) mod 60]));
      Readln;
      {$ENDIF}
    end;
  except
    on E: Exception do
      Writeln(E.ClassName, ': ', E.Message);
  end;
end.
