Commit 73cf38b
perf: fix record copying performance bug
If there happens to be an abnormally long record in a CSV file---where
the rest are short---this abnormally long record ends up causing a
performance loss while parsing subsequent records. Such a thing is
usually caused by a buffer being expanded, and then that expanded buffer
leading to extra cost that shouldn't be paid when parsing smaller
records. Indeed, this case is no exception.
In this case, the standard record iterators use an internal record for
copying CSV data into, and then clone this record as appropriate in the
iterator's `next` method. In this way, that record's memory can be
reused. This is a bit better than just allocating a fresh buffer every
time, since the length of each CSV row is usually pretty similar to the
length of prior rows.
However, in this case, when we come across an exceptionally long record,
the internal record is expanded to handle that record. When that
internal record is cloned to give back to the caller, the record *and*
its excess capacity are both cloned. In the case of an abnormally long
record, this ends up copying that extra excess capacity for all
subsequent rows. This easily explains the performance bug.
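
To make the cost concrete, here is a minimal sketch of the failure mode using a toy record type. `Record`, `fill`, and `clone_cost_after_long_record` are hypothetical names for illustration, not the library's actual internals, and the sketch assumes the excess space lives inside the buffer's length (so a derived `Clone` copies all of it).

```rust
// Toy model of a reusable record; not the real record type.
#[derive(Clone, Debug)]
struct Record {
    // Backing storage. It only ever grows, so its length reflects the
    // largest record seen so far (assumption of this sketch).
    buf: Vec<u8>,
    // Number of leading bytes in `buf` that belong to the current record.
    used: usize,
}

impl Record {
    fn new() -> Record {
        Record { buf: Vec::new(), used: 0 }
    }

    // Overwrite the record with `data`, growing the buffer if needed.
    fn fill(&mut self, data: &[u8]) {
        if self.buf.len() < data.len() {
            self.buf.resize(data.len(), 0);
        }
        self.buf[..data.len()].copy_from_slice(data);
        self.used = data.len();
    }

    // The bytes that actually belong to the current record.
    fn as_bytes(&self) -> &[u8] {
        &self.buf[..self.used]
    }
}

// Fill the record with one long row and then one short row, clone it, and
// return (bytes in use, bytes copied by the derived `Clone`).
fn clone_cost_after_long_record(long_len: usize) -> (usize, usize) {
    let mut rec = Record::new();
    rec.fill(&vec![b'x'; long_len]);
    rec.fill(b"a,b,c");
    let copy = rec.clone();
    (copy.as_bytes().len(), copy.buf.len())
}

fn main() {
    let (used, copied) = clone_cost_after_long_record(1_000_000);
    // The short row needs 5 bytes, but the clone carries the whole buffer.
    println!("used = {}, copied = {}", used, copied);
}
```

In this model every clone made after the long row copies the full expanded buffer, even though only a few bytes are in use.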
So to fix it, we introduce a new private method that lets us copy a
record *without* excess capacity. (We could implement `Clone` more
intelligently, but I'm not sure whether it's appropriate to drop excess
capacity in a `Clone` impl. That might be unexpected.) We then use this
new method in the iterators instead of standard `clone`.
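
As a rough sketch of the shape of this fix (not the actual patch), the toy iterator below reuses one internal record and hands out truncated copies from `next`. The names `Record`, `Records`, `clone_truncated`, and `returned_buffer_lens` are assumptions made for illustration, and splitting on lines stands in for real CSV parsing.

```rust
use std::str::Lines;

// Toy model of a reusable record; not the real record type.
#[derive(Debug)]
struct Record {
    // Backing storage; only grows, so it may be longer than `used`.
    buf: Vec<u8>,
    // Number of leading bytes in `buf` that belong to the current record.
    used: usize,
}

impl Record {
    // Overwrite the record with `data`, growing the buffer if needed.
    fn fill(&mut self, data: &[u8]) {
        if self.buf.len() < data.len() {
            self.buf.resize(data.len(), 0);
        }
        self.buf[..data.len()].copy_from_slice(data);
        self.used = data.len();
    }

    // Copy only the bytes in use, leaving the excess behind.
    fn clone_truncated(&self) -> Record {
        Record { buf: self.buf[..self.used].to_vec(), used: self.used }
    }
}

// Toy iterator: one line of input is treated as one record.
struct Records<'a> {
    lines: Lines<'a>,
    rec: Record, // internal record, reused across calls to `next`
}

impl<'a> Iterator for Records<'a> {
    type Item = Record;

    fn next(&mut self) -> Option<Record> {
        let line = self.lines.next()?;
        self.rec.fill(line.as_bytes());
        // Hand out a truncated copy instead of a full clone.
        Some(self.rec.clone_truncated())
    }
}

// Buffer length of each record handed to the caller.
fn returned_buffer_lens(input: &str) -> Vec<usize> {
    let rec = Record { buf: Vec::new(), used: 0 };
    Records { lines: input.lines(), rec }.map(|r| r.buf.len()).collect()
}

fn main() {
    let input = format!("{}\na,b\nc,d,e\n", "x".repeat(1000));
    // Rows after the long one are sized by their own content.
    println!("{:?}", returned_buffer_lens(&input));
}
```

The internal record keeps its expanded buffer for reuse, while each copy returned to the caller is only as large as the row it holds.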
In the case where there are no abnormally long records, this shouldn't
have any impact.
Fixes #2271
Parent: 6623d87
3 files changed: 24 additions, 4 deletions.
(Diff contents were not preserved. The hunks touch lines 500-511 of the first file (12 lines added), lines 2053, 2090, 2129 and 2166 of the second file (one line replaced at each), and lines 613-620 of the third file (8 lines added).)