@@ -17,75 +17,60 @@ Add the Modular community channel (https://repo.prefix.dev/modular-community) to
1717channels = ["conda-forge", "https://conda.modular.com/max", "https://repo.prefix.dev/modular-community"]
1818```
1919
20- ` pixi add mojo_csv `
2120
22- ##### Basic Usage
21+ ``` sh
22+ pixi add mojo_csv
23+ ```
2324
24- ``` mojo
25- from mojo_csv import CsvReader
26- from pathlib import Path
25+ ## Usage
2726
28- fn main():
29- var csv_path = Path("path/to/csv/file.csv")
30- var reader = CsvReader(csv_path)
31- for i in range(len(reader)):
32- print(reader[i])
33- ```
3427
35- ##### Optional Usage
28+ By default, `CsvReader` uses all logical cores minus 2.
29+ ``` mojo
30+ CsvReader(
31+     in_csv: Path,
32+     delimiter: String = ",",
33+     quotation_mark: String = '"',
34+     num_threads: Int = 0,  # default = 0 = use all available cores - 2
35+ )
36+ ```
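
The default described above could be resolved roughly as follows. This is a minimal sketch, not the library's actual code: `resolve_threads` is a hypothetical helper, and the clamp to at least one thread is an assumption about how machines with two or fewer cores are handled.

```mojo
from sys import num_logical_cores


fn resolve_threads(requested: Int) -> Int:
    # Hypothetical helper, not part of mojo_csv.
    # An explicit positive value is used as-is.
    if requested > 0:
        return requested
    # 0 means "pick a default": all logical cores minus 2.
    # Assumption: never go below one thread.
    return max(1, num_logical_cores() - 2)


fn main():
    print(resolve_threads(0))  # depends on the machine
    print(resolve_threads(4))  # prints 4
```

Under these assumptions, a 16-core (32-thread) machine would end up with 30 worker threads by default.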
3637
3738``` mojo
3839from mojo_csv import CsvReader
3940from pathlib import Path
41+ from sys import exit
4042
41- fn main():
43+ fn main() raises:
4244     var csv_path = Path("path/to/csv/file.csv")
43-     var reader = CsvReader(csv_path, delimiter="|", quotation_mark='*')
45+     try:
46+         var reader = CsvReader(csv_path)
47+     except:
48+         exit()
4449     for i in range(len(reader)):
4550         print(reader[i])
4651```
4752
48- #### BETA
49- 1.4.0 will be the last version where this isn't the default
50-
51- ``` mojo
52- ThreadedCsvReader(
53- file_path: Path,
54- delimiter: String = ",",
55- quotation_mark: String = '"',
56- num_threads: Int = 0 # 0 = use all available cores
57- )
58- ```
5953
60- ### Example 1: Default (All Cores)
54+ ### Delimiters
6155
6256``` mojo
63- var reader = ThreadedCsvReader(Path("large_file.csv"))
64- // Uses all 16 cores on a 16-core system
57+ CsvReader(csv_path, delimiter=";", quotation_mark='|')
6558```
6659
67- ### Example 2: Custom Thread Count
68-
60+ ### Threads
61+ __force single-threaded__
6962``` mojo
70- var reader = ThreadedCsvReader(Path("data.csv"), num_threads=4)
71- // Uses exactly 4 threads
63+ CsvReader(csv_path, num_threads=1)
7264```
73-
74- ### Example 3: Single-threaded
75-
65+ __use all the threads__
7666``` mojo
77- var reader = ThreadedCsvReader(Path("data.csv"), num_threads=1)
78- // Forces single-threaded execution (same as CsvReader)
79- ```
80-
81- ### Example 4: Custom Delimiter
67+ from sys import num_logical_cores
8268
83- ```` mojo
84- var reader = ThreadedCsvReader(
85- Path("pipe_separated.csv"),
86- delimiter="|",
87- num_threads=8
69+ var reader = CsvReader(
70+     csv_path, num_threads=num_logical_cores()
8871)
72+ ```
73+
8974
9075### Attributes
9176
@@ -99,7 +84,7 @@ reader.elements : List[String] # all delimited elements
9984reader.length : Int # total number of elements
10085````
10186
102- ##### Indexing
87+ ### Indexing
10388
10489Currently the array is only 1D, so indexing is fairly manual.
10590
@@ -109,72 +94,73 @@ reader[0] # first element
10994
11095### Performance
11196
112- - average times over 1k iterations
113- - 7950x@5.8ghz (peak clock)
114- - uncompiled
97+ - average times over 100-1k iterations
98+ - AMD 7950x@5.8ghz
11599- single-threaded
116100
117- micro file benchmark (3 rows)
118- mini (100 rows)
119- small (1k rows)
120- medium file benchmark (100k rows)
121- large file benchmark (2m rows)
101+ micro file benchmark (3 rows)
102+ mini (100 rows)
103+ small (1k rows)
104+ medium file benchmark (100k rows)
105+ large file benchmark (2m rows)
122106
123107``` log
124- ✨ Pixi task (bench): mojo bench.mojo
125- running benchmark for micro csv:
108+ ✨ Pixi task (bench): mojo bench.mojo running benchmark for micro csv:
126109average time in ms for micro file:
127- 0.01875
110+ 0.0094 ms
128111-------------------------
129112running benchmark for mini csv:
130113average time in ms for mini file:
131- 0.07328
114+ 0.0657 ms
132115-------------------------
133116running benchmark for small csv:
134117average time in ms for small file:
135- 0.417368
118+ 0.317 ms
136119-------------------------
137120running benchmark for medium csv:
138121average time in ms for medium file:
139- 36.45899
122+ 24.62 ms
140123-------------------------
141124running benchmark for large csv:
142125average time in ms for large file:
143- 1253.19458
126+ 878.6 ms
144127```
145128
146- === ThreadedCsvReader Performance Comparison ===
129+ #### CSV Reader Performance Comparison
130+ ```
131+ Small file benchmark (1,000 rows):
132+ Single-threaded:
133+ Average time: 0.455 ms
134+ Multi-threaded:
135+ Average time: 0.3744 ms
136+ Speedup: 1.22 x
137+
138+ Medium file benchmark (100,000 rows):
139+ Single-threaded:
140+ Average time: 37.37 ms
141+ Multi-threaded:
142+ Average time: 24.46 ms
143+ Speedup: 1.53 x
144+
145+ Large file benchmark (2,000,000 rows):
146+ Single-threaded:
147+ Average time: 1210.3 ms
148+ Multi-threaded:
149+ Average time: 863.9 ms
150+ Speedup: 1.4 x
147151
148- Small file benchmark (1,000 rows):
149- Single-threaded:
150- Average time: 0.500384 ms
151- Multi-threaded:
152- Average time: 0.451094 ms
153- Speedup: 1.11 x
154- -------------------------
155- Medium file benchmark (100,000 rows):
156- Single-threaded:
157- Average time: 38.124275 ms
158- Multi-threaded:
159- Average time: 24.650092 ms
160- Speedup: 1.55 x
161- -------------------------
162- Large file benchmark (2,000,000 rows):
163- Single-threaded:
164- Average time: 1175.345429 ms
165- Multi-threaded:
166- Average time: 830.02685 ms
167- Speedup: 1.42 x
168- -------------------------
169152Summary:
170- Small file speedup: 1.11 x
171- Medium file speedup: 1.55 x
172- Large file speedup: 1.42 x
153+ Small file speedup: 1.22 x
154+ Medium file speedup: 1.53 x
155+ Large file speedup: 1.4 x
156+ ```
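
The speedup figures above are the single-threaded average divided by the multi-threaded average. As a quick check, this sketch recomputes them from the reported times; `speedup` is an illustrative helper, not part of the benchmark script.

```mojo
fn speedup(single_ms: Float64, multi_ms: Float64) -> Float64:
    # Ratio of single-threaded to multi-threaded average time.
    return single_ms / multi_ms


fn main():
    print(speedup(0.455, 0.3744))  # small file, about 1.22
    print(speedup(37.37, 24.46))   # medium file, about 1.53
    print(speedup(1210.3, 863.9))  # large file, about 1.40
```

The printed values carry more digits than the log, which rounds to two decimal places.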
173157
174- _ Tested on AMD 7950x (16 cores) @ 5.8GHz_
175158
176159## Future Improvements
177160
161+ - [ ] 2D indexing
162+ - [ ] CsvWriter
163+ - [ ] CsvDictReader
178164- [ ] SIMD optimization within each thread
179165- [ ] Async Chunking
180166- [ ] Streaming support for very large files