-
Notifications
You must be signed in to change notification settings - Fork 68
Add batch support #112
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Add batch support #112
Conversation
lib/csv_importer/csv_reader.rb
Outdated
|
|
||
| if use_batches | ||
| io = file_or_content_io | ||
| separator = detect_separator(io.read(1024)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should extract separator to a method and memoize it as it's called twice.
lib/csv_importer/runner.rb
Outdated
| batch_size: batch_size, | ||
| use_batches: use_batches | ||
| ) | ||
| end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of overwriting initialize, what do you think about extracting csv_reader to a method and memoizing it?
lib/csv_importer/csv_reader.rb
Outdated
| end | ||
| end | ||
|
|
||
| def sanitize_content_stream(io) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great method but it's not used. Can you make the Enumerator use it?
|
This is great @ziadsawalha ! I added a couple of comments. Could creating a new class simplify the codebase? e.g. Regarding the public API, We could release this as the next |
|
Thanks for the good feedback. I'll submit updates when I get to it... hopefully tonight (Europe). |
033f9c6 to
2ff239f
Compare
|
I've updated the PR with the following changes:
I struggled with trying to support streams that don't have rewind (which means we need to detect the separator and headers on load and then not try to restart the stream). That complicated things a bit, IMO. More feedback welcome while I try to add more tests and validate this in real life use cases. |
2ff239f to
6ed5a1d
Compare
6ed5a1d to
d37ade1
Compare
|
Great, thank you for making these changes @ziadsawalha ! How does this work against real-life use cases? What kind of streams don't have rewind? S3? Could we close and re-open (or duplicate?!) the stream if it doesn't support rewinding? |
Giving this a stab. Feedback welcome!