
Commit 25815be

Authored by Phelsong, Josh S Wilkinson, and carolinefrasca
mojo_csv 1.5.0, supports mojo 25.5.0 (#162)
* mojo_csv
* Update recipe.yaml for CI
* unquote for ci
* update test
* Update source git URL
* change tests syntax
* verify 25.3.0 and update test
* update test
* update csv path
* update test to cwd
* update build and versioning
* add logo
* update readme
* update mojo_csv to 1.3
* 1.4.0
* mojo_csv_1.5.0/mojo_25.5.0

---------

Co-authored-by: Josh S Wilkinson <josh.wilkinson@apkudo.com>
Co-authored-by: Caroline Frasca <42614552+carolinefrasca@users.noreply.github.com>
1 parent 9c460f8 commit 25815be

2 files changed

+76
-92
lines changed

recipes/mojo_csv/README.md

Lines changed: 73 additions & 87 deletions
@@ -17,75 +17,60 @@ Add the Modular community channel (https://repo.prefix.dev/modular-community) to
1717
channels = ["conda-forge", "https://conda.modular.com/max", "https://repo.prefix.dev/modular-community"]
1818
```
1919

20-
`pixi add mojo_csv`
2120

22-
##### Basic Usage
21+
```sh
22+
pixi add mojo_csv
23+
```
2324

24-
```mojo
25-
from mojo_csv import CsvReader
26-
from pathlib import Path
25+
## Usage
2726

28-
fn main():
29-
    var csv_path = Path("path/to/csv/file.csv")
30-
    var reader = CsvReader(csv_path)
31-
    for i in range(len(reader)):
32-
        print(reader[i])
33-
```
3427

35-
##### Optional Usage
28+
By default, `CsvReader` uses all logical cores minus 2.
29+
```mojo
30+
CsvReader(
31+
    in_csv: Path,
32+
    delimiter: String = ",",
33+
    quotation_mark: String = '"',
34+
    num_threads: Int = 0,  # 0 (default) = use all logical cores minus 2
35+
)
36+
```
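The `num_threads` default can be read as a small resolution rule. The sketch below is written in Python purely as an illustration and is not taken from mojo_csv: `resolve_num_threads` is a hypothetical helper, and the clamp to at least one worker is an assumption about how machines with one or two cores would be handled.

```python
import os


def resolve_num_threads(requested: int, reserve: int = 2) -> int:
    """Hypothetical helper: map a num_threads argument to a worker count."""
    if requested > 0:
        # An explicit positive request is used as-is.
        return requested
    # os.cpu_count() reports logical cores and may return None.
    cores = os.cpu_count() or 1
    # Assumption: never go below one worker on small machines.
    return max(1, cores - reserve)


print(resolve_num_threads(0))  # logical cores minus 2, but at least 1
print(resolve_num_threads(1))  # forces a single worker
```

Under these assumptions, passing `0` leaves two logical cores free for the rest of the system, while any positive value is honored exactly.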
3637

3738
```mojo
3839
from mojo_csv import CsvReader
3940
from pathlib import Path
41+
from sys import exit
4042
41-
fn main():
43+
fn main() raises:
4244
    var csv_path = Path("path/to/csv/file.csv")
43-
    var reader = CsvReader(csv_path, delimiter="|", quotation_mark='*')
45+
    try:
46+
        var reader = CsvReader(csv_path)
47+
    except:
48+
        exit()
4449
    for i in range(len(reader)):
4550
        print(reader[i])
4651
```
4752

48-
#### BETA
49-
1.4.0 will be the last version where this isn't the default
50-
51-
```mojo
52-
ThreadedCsvReader(
53-
    file_path: Path,
54-
    delimiter: String = ",",
55-
    quotation_mark: String = '"',
56-
    num_threads: Int = 0  # 0 = use all available cores
57-
)
58-
```
5953

60-
### Example 1: Default (All Cores)
54+
### Delimiters
6155

6256
```mojo
63-
var reader = ThreadedCsvReader(Path("large_file.csv"))
64-
# Uses all 16 cores on a 16-core system
57+
CsvReader(csv_path, delimiter=";", quotation_mark='|')
6558
```
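To make the roles of `delimiter` and `quotation_mark` concrete, here is a minimal sketch that uses Python's standard `csv` module as a stand-in. It does not reflect how mojo_csv parses input; the sample data is invented, and `quotechar` is assumed to play the same role as `quotation_mark`.

```python
import csv
import io

# Semicolon-delimited sample; '|' wraps a field that itself contains ';'.
sample = "name;note\nada;|likes; semicolons|\n"

rows = list(csv.reader(io.StringIO(sample), delimiter=";", quotechar="|"))

# Flatten row by row, mirroring a 1D list of elements.
elements = [field for row in rows for field in row]
print(elements)
```

The quoted field keeps its embedded semicolon, which is the reason a quotation mark is configurable alongside the delimiter.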
6659

67-
### Example 2: Custom Thread Count
68-
60+
### Threads
61+
__Force single-threaded__
6962
```mojo
70-
var reader = ThreadedCsvReader(Path("data.csv"), num_threads=4)
71-
# Uses exactly 4 threads
63+
CsvReader(csv_path, num_threads=1)
7264
```
73-
74-
### Example 3: Single-threaded
75-
65+
__Use all threads__
7666
```mojo
77-
var reader = ThreadedCsvReader(Path("data.csv"), num_threads=1)
78-
# Forces single-threaded execution (same as CsvReader)
79-
```
80-
81-
### Example 4: Custom Delimiter
67+
from sys import num_logical_cores
8268
83-
````mojo
84-
var reader = ThreadedCsvReader(
85-
    Path("pipe_separated.csv"),
86-
    delimiter="|",
87-
    num_threads=8
69+
var reader = CsvReader(
70+
    csv_path, num_threads=num_logical_cores()
8871
)
72+
```
73+
8974

9075
### Attributes
9176

@@ -99,7 +84,7 @@ reader.elements : List[String] # all delimited elements
9984
reader.length : Int # total number of elements
10085
````
10186
102-
##### Indexing
87+
### Indexing
10388
10489
Currently the array is only 1D, so indexing is fairly manual.
10590
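Because the elements are stored in one flat list, a 2D lookup reduces to index arithmetic. The sketch below is in Python for illustration only and assumes the elements are stored row by row (row-major). `flat_index` and `num_cols` are hypothetical names, not part of the API shown in this diff.

```python
def flat_index(row: int, col: int, num_cols: int) -> int:
    """Hypothetical helper: convert (row, col) to a position in a flat list."""
    if not 0 <= col < num_cols:
        raise IndexError("column out of range")
    # Each full row before `row` contributes num_cols elements.
    return row * num_cols + col


# Two columns, three rows, stored row by row.
elements = ["id", "name", "1", "ada", "2", "bob"]
num_cols = 2

print(elements[flat_index(1, 1, num_cols)])  # row 1, column 1
```

The same formula applies to any reader that exposes a flat element list together with a column count.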
@@ -109,72 +94,73 @@ reader[0] # first element
10994

11095
### Performance
11196

112-
- average times over 1k iterations
113-
- 7950x@5.8ghz (peak clock)
114-
- uncompiled
97+
- average times over 100-1k iterations
98+
- AMD 7950X @ 5.8 GHz
11599
- single-threaded
116100

117-
micro file benchmark (3 rows)
118-
mini (100 rows)
119-
small (1k rows)
120-
medium file benchmark (100k rows)
121-
large file benchmark (2m rows)
101+
micro file benchmark (3 rows)
102+
mini (100 rows)
103+
small (1k rows)
104+
medium file benchmark (100k rows)
105+
large file benchmark (2m rows)
122106

123107
```log
124-
✨ Pixi task (bench): mojo bench.mojo
125-
running benchmark for micro csv:
108+
✨ Pixi task (bench): mojo bench.mojo
running benchmark for micro csv:
126109
average time in ms for micro file:
127-
0.01875
110+
0.0094 ms
128111
-------------------------
129112
running benchmark for mini csv:
130113
average time in ms for mini file:
131-
0.07328
114+
0.0657 ms
132115
-------------------------
133116
running benchmark for small csv:
134117
average time in ms for small file:
135-
0.417368
118+
0.317 ms
136119
-------------------------
137120
running benchmark for medium csv:
138121
average time in ms for medium file:
139-
36.45899
122+
24.62 ms
140123
-------------------------
141124
running benchmark for large csv:
142125
average time in ms for large file:
143-
1253.19458
126+
878.6 ms
144127
```
145128

146-
=== ThreadedCsvReader Performance Comparison ===
129+
#### CSV Reader Performance Comparison
130+
```
131+
Small file benchmark (1,000 rows):
132+
Single-threaded:
133+
Average time: 0.455 ms
134+
Multi-threaded:
135+
Average time: 0.3744 ms
136+
Speedup: 1.22 x
137+
138+
Medium file benchmark (100,000 rows):
139+
Single-threaded:
140+
Average time: 37.37 ms
141+
Multi-threaded:
142+
Average time: 24.46 ms
143+
Speedup: 1.53 x
144+
145+
Large file benchmark (2,000,000 rows):
146+
Single-threaded:
147+
Average time: 1210.3 ms
148+
Multi-threaded:
149+
Average time: 863.9 ms
150+
Speedup: 1.4 x
147151
148-
Small file benchmark (1,000 rows):
149-
Single-threaded:
150-
Average time: 0.500384 ms
151-
Multi-threaded:
152-
Average time: 0.451094 ms
153-
Speedup: 1.11 x
154-
-------------------------
155-
Medium file benchmark (100,000 rows):
156-
Single-threaded:
157-
Average time: 38.124275 ms
158-
Multi-threaded:
159-
Average time: 24.650092 ms
160-
Speedup: 1.55 x
161-
-------------------------
162-
Large file benchmark (2,000,000 rows):
163-
Single-threaded:
164-
Average time: 1175.345429 ms
165-
Multi-threaded:
166-
Average time: 830.02685 ms
167-
Speedup: 1.42 x
168-
-------------------------
169152
Summary:
170-
Small file speedup: 1.11 x
171-
Medium file speedup: 1.55 x
172-
Large file speedup: 1.42 x
153+
Small file speedup: 1.22 x
154+
Medium file speedup: 1.53 x
155+
Large file speedup: 1.4 x
156+
```
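The speedup figures follow from dividing the single-threaded time by the multi-threaded time. The short Python check below reproduces them from the averages reported above; the dictionary layout and the `speedup` helper are illustrative only.

```python
# (single-threaded ms, multi-threaded ms) copied from the comparison above.
timings_ms = {
    "small": (0.455, 0.3744),
    "medium": (37.37, 24.46),
    "large": (1210.3, 863.9),
}


def speedup(single_ms: float, multi_ms: float) -> float:
    """Ratio of single-threaded to multi-threaded time."""
    return single_ms / multi_ms


for name, (single_ms, multi_ms) in timings_ms.items():
    print(f"{name}: {speedup(single_ms, multi_ms):.2f} x")
```

Rounded to two decimals, the ratios match the 1.22 x, 1.53 x, and 1.4 x values in the summary.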
173157

174-
_Tested on AMD 7950x (16 cores) @ 5.8GHz_
175158

176159
## Future Improvements
177160

161+
- [ ] 2D indexing
162+
- [ ] CsvWriter
163+
- [ ] CsvDictReader
178164
- [ ] SIMD optimization within each thread
179165
- [ ] Async Chunking
180166
- [ ] Streaming support for very large files

recipes/mojo_csv/recipe.yaml

Lines changed: 3 additions & 5 deletions
@@ -1,15 +1,13 @@
11
context:
2-
version: 1.4.0
3-
2+
version: 1.5.0
43

54
package:
65
name: "mojo_csv"
76
version: ${{ version }}
87

98
source:
109
- git: https://github.com/Phelsong/mojo_csv.git
11-
rev: d92e7b72933445c71c463d3f9eb52404dd01edf2
12-
10+
rev: b3a9dc4422efbea7a94939e3a48ff4a3b03e3505
1311

1412
build:
1513
number: 0
@@ -18,7 +16,7 @@ build:
1816

1917
requirements:
2018
host:
21-
- max >=25.1.0,<26
19+
- max >=25.4.0,<26
2220
run:
2321
- ${{ pin_compatible('max') }}
2422
