Description
Problem: slow because of repetitive heap-allocation
The 5 iterators below generate Vec<...> items.
multi_cartesian_product(self)
combinations(self, k: usize)
combinations_with_replacement(self, k: usize)
permutations(self, k: usize)
powerset(self)
These iterators can be slow because of the repeated allocations, even when the user does not need to keep each generated vector.
In most cases, I think, the user uses each vector to produce something else and then discards it. Because these iterators generate up to very many items (a product of lengths, a factorial-related count, or a power of two), they create many temporary vectors, which can slow down code execution.
Proposed solution: map slices
An alternative iterator could own one single (internal/private) vector and only give a reference to it to a function implementing FnMut(&[...]) -> R. Those iterators would then generate R items instead of vectors.
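The core idea can be sketched in a few lines. This is a minimal, illustrative sketch (the function name `pairs_map` and its shape are mine, not the proposed API): one buffer is allocated once, the callback only borrows it as a slice, and the buffer is reused for every item instead of being reallocated.

```rust
// Sketch of the idea: a single internal buffer, reused across items,
// passed by reference to a FnMut(&[T]) -> R callback.
fn pairs_map<T: Clone, R, F: FnMut(&[T]) -> R>(items: &[T], mut func: F) -> Vec<R> {
    let mut buf: Vec<T> = Vec::with_capacity(2); // allocated once
    let mut out = Vec::new();
    for i in 0..items.len() {
        for j in (i + 1)..items.len() {
            buf.clear(); // drop the contents, keep the allocation
            buf.push(items[i].clone());
            buf.push(items[j].clone());
            out.push(func(&buf)); // the caller only borrows the buffer
        }
    }
    out
}

fn main() {
    // Sums of all 2-element combinations of [1, 2, 3].
    let sums = pairs_map(&[1, 2, 3], |pair: &[i32]| pair.iter().sum::<i32>());
    assert_eq!(sums, vec![3, 4, 5]);
    println!("{:?}", sums);
}
```

With `combinations(2)` each of those pairs would be a fresh `Vec`; here only `out` and the one buffer are ever allocated.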
// current methods unchanged, new methods:
multi_cartesian_product_map<R, F>(self, func: F)
combinations_map<R, F>(self, k: usize, func: F)
combinations_with_replacement_map<R, F>(self, k: usize, func: F)
permutations_map<R, F>(self, k: usize, func: F)
powerset_map<R, F>(self, func: F)
// all with F: FnMut(&[...]) -> R
Usage examples
it.combinations(k) and it.combinations_map(k, ToOwned::to_owned) would produce the same elements, but the first should be used in that case!
it.combinations(k).map(closure) and it.combinations_map(k, closure) could produce the same elements if the closure does not require an owned vector. From what I tested, the _map variant is then roughly 4 to 5 times faster at going through all combinations, unless the iterator has very few elements (in which case .map should be used).
In it.permutations(5).filter_map(|perm: Vec<_>| some_option), if some_option does not need perm to be an owned vector, it could become it.permutations_map(5, |perm: &[_]| some_option).flatten(), with no heap-allocation per permutation.
Implementation (commits here)
New module vec_items (behind use_alloc feature) with:
- `pub trait VecItems<T> { type Output; fn new_item<I: Iterator<Item = T>>(&mut self, iter: I) -> Self::Output; }`
- `pub struct CollectToVec;` implementing `VecItems` with `Vec<T>` output and just `iter.collect()` (so a new vector each time, as is currently done).
- `pub struct MapSlice<F, T> { func: F, vec: Vec<T> }` implementing `VecItems` with `R` output where `F: FnMut(&[T]) -> R`; the internal vector is cleared but not dropped after use (and its length will rarely, if ever, change).
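A self-contained sketch of that module, following the description above (the actual commits may differ in details):

```rust
// The proposed trait: builds one item of the iterator from the current indices.
pub trait VecItems<T> {
    type Output;
    fn new_item<I: Iterator<Item = T>>(&mut self, iter: I) -> Self::Output;
}

// Current behaviour: collect into a fresh Vec every time.
pub struct CollectToVec;

impl<T> VecItems<T> for CollectToVec {
    type Output = Vec<T>;
    fn new_item<I: Iterator<Item = T>>(&mut self, iter: I) -> Vec<T> {
        iter.collect()
    }
}

// Proposed behaviour: refill one internal vector and hand out a slice.
pub struct MapSlice<F, T> {
    func: F,
    vec: Vec<T>,
}

impl<T, R, F: FnMut(&[T]) -> R> VecItems<T> for MapSlice<F, T> {
    type Output = R;
    fn new_item<I: Iterator<Item = T>>(&mut self, iter: I) -> R {
        self.vec.clear(); // cleared but not dropped: the allocation is kept
        self.vec.extend(iter);
        (self.func)(&self.vec)
    }
}

fn main() {
    assert_eq!(CollectToVec.new_item(1..4), vec![1, 2, 3]);

    let mut map = MapSlice { func: |s: &[i32]| s.iter().sum::<i32>(), vec: Vec::new() };
    assert_eq!(map.new_item(1..4), 6);
    assert_eq!(map.new_item(4..6), 9); // same buffer reused, no new allocation
    println!("ok");
}
```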
E.g. for combinations_map:
- In the `Itertools` trait: `#[cfg(feature = "use_alloc")] fn combinations_map<R, F: FnMut(&[Self::Item]) -> R>(self, k: usize, f: F) -> CombinationsMap<Self, F> where ...`
- In `src/combinations.rs`:
  - Convert `Combinations<I>` to `CombinationsBase<I, F>` with the new field `manager: F`
  - New (fused) iterator constraint `F: VecItems<I::Item>` and item type `F::Output`
  - New `pub type Combinations<I> = CombinationsBase<I, CollectToVec>;`
  - New `pub type CombinationsMap<I, F> = CombinationsBase<I, MapSlice<F, <I as Iterator>::Item>>;`
  - New `fn combinations_map<I, F>(iter: I, k: usize, f: F) -> CombinationsMap<I, F>`
- A test comparing the output with `combinations`.
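To illustrate the `CombinationsBase` refactor end to end, here is a simplified, self-contained sketch (the trait is repeated so the example compiles on its own). It is not itertools' real implementation: the index-advancing logic is a naive stand-in, and it takes a pre-collected `Vec` instead of a lazy source, but it shows how one generic struct plus two type aliases covers both the current and the proposed behaviour.

```rust
pub trait VecItems<T> {
    type Output;
    fn new_item<I: Iterator<Item = T>>(&mut self, iter: I) -> Self::Output;
}

pub struct CollectToVec;
impl<T> VecItems<T> for CollectToVec {
    type Output = Vec<T>;
    fn new_item<I: Iterator<Item = T>>(&mut self, iter: I) -> Vec<T> { iter.collect() }
}

pub struct MapSlice<F, T> { func: F, vec: Vec<T> }
impl<T, R, F: FnMut(&[T]) -> R> VecItems<T> for MapSlice<F, T> {
    type Output = R;
    fn new_item<I: Iterator<Item = T>>(&mut self, iter: I) -> R {
        self.vec.clear();
        self.vec.extend(iter);
        (self.func)(&self.vec)
    }
}

// One base struct generic over the item manager, as in the proposal.
pub struct CombinationsBase<T, M> {
    pool: Vec<T>,
    indices: Vec<usize>,
    first: bool,
    manager: M, // the new field
}

pub type Combinations<T> = CombinationsBase<T, CollectToVec>;
pub type CombinationsMap<T, F> = CombinationsBase<T, MapSlice<F, T>>;

impl<T: Clone, M: VecItems<T>> Iterator for CombinationsBase<T, M> {
    type Item = M::Output;
    fn next(&mut self) -> Option<M::Output> {
        let k = self.indices.len();
        if self.first {
            if k > self.pool.len() { return None; }
            self.first = false;
        } else {
            // Advance to the next combination, rightmost index first.
            let mut i = k.checked_sub(1)?;
            loop {
                if self.indices[i] + (k - i) < self.pool.len() {
                    self.indices[i] += 1;
                    for j in i + 1..k {
                        self.indices[j] = self.indices[j - 1] + 1;
                    }
                    break;
                }
                if i == 0 { return None; }
                i -= 1;
            }
        }
        let pool = &self.pool;
        let it = self.indices.iter().map(|&i| pool[i].clone());
        Some(self.manager.new_item(it))
    }
}

fn combinations_map<T, R, F: FnMut(&[T]) -> R>(pool: Vec<T>, k: usize, f: F) -> CombinationsMap<T, F> {
    CombinationsBase {
        pool,
        indices: (0..k).collect(),
        first: true,
        manager: MapSlice { func: f, vec: Vec::with_capacity(k) },
    }
}

fn main() {
    let sums: Vec<i32> = combinations_map(vec![1, 2, 3, 4], 2, |c: &[i32]| c.iter().sum()).collect();
    assert_eq!(sums, vec![3, 4, 5, 5, 6, 7]);
    println!("{:?}", sums);
}
```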
Very similar for the 4 others. powerset[_map] is a bit simpler because it just uses combinations[_map] internally; and multi_cartesian_product turned out simpler than anticipated.
Closing thoughts
Avoid cloning in the MapSlice case? I think(?) it might be possible to hold only references: MapSlice { func: F, vec: Vec<&T> }. Then F: FnMut(&[&T]) -> R and no cloning. But I struggle with the lifetimes (or, more probably, some hard barrier). And is it worth it? Probably not if the items are just integers, but otherwise it might be.
The documentation I have not written yet would suggest using the non-_map variants when the user needs owned vectors rather than slices (or when very few elements are expected).
PS: I got the idea while replacing (0..nb).permutations(nb) (after profiling with flamegraph and heaptrack) with a _map version I wrote of permutohedron::Heap, to avoid any internal heap-allocation while keeping a nice iterator structure. And for 8! == 40320 permutations, that was also 5 times faster.
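For what it's worth, the borrowed version compiles fine in a function-style API, because the `Vec<&T>` never outlives the input slice; the hard part (which this sketch deliberately sidesteps) is storing such borrows inside a lazy iterator struct next to the data they borrow from. A minimal sketch, with an illustrative name (`pairs_map_ref`) of my own:

```rust
// No T: Clone bound: the reusable buffer holds references, not clones.
fn pairs_map_ref<T, R, F: FnMut(&[&T]) -> R>(items: &[T], mut func: F) -> Vec<R> {
    let mut buf: Vec<&T> = Vec::with_capacity(2); // references only, no cloning
    let mut out = Vec::new();
    for i in 0..items.len() {
        for j in (i + 1)..items.len() {
            buf.clear();
            buf.push(&items[i]);
            buf.push(&items[j]);
            out.push(func(&buf));
        }
    }
    out
}

fn main() {
    // Works even though nothing is cloned; String here just shows it is not
    // limited to Copy types like integers.
    let words = [String::from("a"), String::from("b"), String::from("c")];
    let joined = pairs_map_ref(&words, |p: &[&String]| format!("{}{}", p[0], p[1]));
    assert_eq!(joined, vec!["ab", "ac", "bc"]);
    println!("{:?}", joined);
}
```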