@@ -253,27 +253,23 @@ to recover the indices in the middle (`5 == 3 + 2` and `7 == 3 + 4`).
253253
254254### Searcher
255255
256- A searcher has two required methods ` .search() ` and ` .consume() ` ,
257- and an optional method ` .trim_start() ` .
256+ A searcher only provides a single method: ` .search() ` . It takes a span as input ,
257+ and returns the first sub-range where the given pattern is found .
258258
259259``` rust
260260pub unsafe trait Searcher <A : Hay + ? Sized > {
261261 fn search (& mut self , span : Span <& A >) -> Option <Range <A :: Index >>;
262- fn consume (& mut self , span : Span <& A >) -> Option <A :: Index >;
263- fn trim_start (& mut self , hay : & A ) -> A :: Index { ... }
264262}
265263
266264pub unsafe trait ReverseSearcher <A : Hay + ? Sized >: Searcher <A > {
267265 fn rsearch (& mut self , span : Span <& A >) -> Option <Range <A :: Index >>;
268- fn rconsume (& mut self , span : Span <& A >) -> Option <A :: Index >;
269- fn trim_end (& mut self , hay : & A ) -> A :: Index { ... }
270266}
271267
272268pub unsafe trait DoubleEndedSearcher <A : Hay + ? Sized >: ReverseSearcher <A > {}
273269```
274270
275- ` .search() ` and ` .consume() ` are safe because there is no safe ways to construct a ` Span<&A> `
276- with invalid ranges. Implementations of these methods often start with:
271+ The ` .search() ` function is safe because there is no safe ways to construct a ` Span<&A> `
272+ with invalid ranges. Implementations of ` .search() ` often start with:
277273
278274``` rust
279275 fn search (& mut self , span : Span <& A >) -> Option <Range <A :: Index >> {
@@ -284,9 +280,27 @@ with invalid ranges. Implementations of these methods often start with:
284280
285281The trait is unsafe to implement because it needs to guarantee the returned range is valid.
286282
287- The ` .search() ` method will look for the first slice matching the searcher's pattern in the span,
283+ ### Consumer
284+
285+ A consumer provides the ` .consume() ` method to implement ` starts_with() ` and ` trim_start() ` . It
286+ takes a span as input, and if the beginning matches the pattern, returns the end index of the match.
287+
288+ ``` rust
289+ pub unsafe trait Consumer <A : Hay + ? Sized > {
290+ fn consume (& mut self , span : Span <& A >) -> Option <A :: Index >;
291+ }
292+
293+ pub unsafe trait ReverseConsumer <A : Hay + ? Sized >: Consumer <A > {
294+ fn rconsume (& mut self , span : Span <& A >) -> Option <A :: Index >;
295+ }
296+
297+ pub unsafe trait DoubleEndedConsumer <A : Hay + ? Sized >: ReverseConsumer <A > {}
298+ ```
299+
300+ Comparing searcher and consumer, the ` .search() ` method will look for the first slice
301+ matching the searcher's pattern in the span,
288302and returns the range where the slice is found (relative to the hay's start index).
289- The ` .consume() ` method will is similar, but anchored to the start of the span.
303+ The ` .consume() ` method is similar, but anchored to the start of the span.
290304
291305``` rust
292306let span = unsafe { Span :: from_parts (" CDEFG" , 3 .. 8 ) };
@@ -310,9 +324,10 @@ A pattern is simply a "factory" of a searcher and consumer.
310324``` rust
311325trait Pattern <H : Haystack >: Sized {
312326 type Searcher : Searcher <H :: Target >;
327+ type Consumer : Consumer <H :: Target >;
313328
314329 fn into_searcher (self ) -> Self :: Searcher ;
315- fn into_consumer (self ) -> Self :: Searcher { self . into_searcher () }
330+ fn into_consumer (self ) -> Self :: Consumer ;
316331}
317332```
318333
@@ -322,27 +337,55 @@ mutable state when implementing some more sophisticated string searching algorit
322337
323338The relation between ` Pattern ` and ` Searcher ` is thus like ` IntoIterator ` and ` Iterator ` .
324339
325- There is a required method ` .into_searcher() ` as well as an optional method ` .into_consumer() ` .
340+ There are two required methods ` .into_searcher() ` and ` .into_consumer() ` .
326341In some patterns (e.g. substring search), checking if a prefix match will require much less
327- pre-computation than checking if any substring match. Therefore, if an algorithm can declare that
328- it will only call ` .consume() ` , the searcher could use a more efficient structure.
342+ pre-computation than checking if any substring match.
343+ Therefore, a consumer could use a more efficient structure with this specialized purpose .
329344
330345``` rust
331346impl <H : Haystack <Target = str >> Pattern <H > for & 'p str {
332347 type Searcher = SliceSearcher <'p , [u8 ]>;
348+ type Consumer = NaiveSearcher <'p , [u8 ]>;
333349 #[inline]
334350 fn into_searcher (self ) -> Self :: Searcher {
335351 // create a searcher based on Two-Way algorithm.
336- SliceSearcher :: new_searcher (self )
352+ SliceSearcher :: new (self )
337353 }
338354 #[inline]
339- fn into_consumer (self ) -> Self :: Searcher {
355+ fn into_consumer (self ) -> Self :: Consumer {
340356 // create a searcher based on naive search (which requires no pre-computation)
341- SliceSearcher :: new_consumer (self )
357+ NaiveSearcher :: new (self )
342358 }
343359}
344360```
345361
362+ Note that, unlike ` IntoIterator ` , the standard library is unable to provide a blanket impl:
363+
364+ ``` rust
365+ impl <H , S > Pattern <H > for S
366+ where
367+ H : Haystack ,
368+ S : Searcher <H :: Target > + Consumer <H :: Target >,
369+ {
370+ type Searcher = Self ;
371+ type Consumer = Self ;
372+ fn into_searcher (self ) -> Self { self }
373+ fn into_consumer (self ) -> Self { self }
374+ }
375+ ```
376+
377+ This is because there is already an existing Pattern impl:
378+
379+ ``` rust
380+ impl <'h , F > Pattern <& 'h str > for F
381+ where
382+ F : FnMut (char ) -> bool ,
383+ { ... }
384+ ```
385+
386+ and a type can implement all of ` (FnMut(char) -> bool) + Searcher<str> + Consumer<str> ` ,
387+ causing impl conflict.
388+
346389### Algorithms
347390
348391Standard algorithms are provided as * functions* in the ` core::pattern::ext ` module.
@@ -360,7 +403,7 @@ where
360403pub fn ends_with <H , P >(haystack : H , pattern : P ) -> bool
361404where
362405 H : Haystack ,
363- P : Pattern <H , Searcher : ReverseSearcher <H :: Target >>;
406+ P : Pattern <H , Consumer : ReverseConsumer <H :: Target >>;
364407```
365408
366409** Trim**
@@ -374,12 +417,12 @@ where
374417pub fn trim_end <H , P >(haystack : H , pattern : P ) -> H
375418where
376419 H : Haystack ,
377- P : Pattern <H , Searcher : ReverseSearcher <H :: Target >>;
420+ P : Pattern <H , Consumer : ReverseConsumer <H :: Target >>;
378421
379422pub fn trim <H , P >(haystack : H , pattern : P ) -> H
380423where
381424 H : Haystack ,
382- P : Pattern <H , Searcher : DoubleEndedSearcher <H :: Target >>;
425+ P : Pattern <H , Consumer : DoubleEndedConsumer <H :: Target >>;
383426```
384427
385428** Matches**
@@ -665,7 +708,7 @@ The main performance improvement comes from `trim()`. In v1.0, `trim()` depends
665708the `Searcher :: next_reject ()` method , which requires initializing a searcher and compute
666709the critical constants for the Two - Way search algorithm . Search algorithms mostly concern about
667710quickly skip through mismatches , but the purpose of `. next_reject ()` is to find mismatches , so a
668- searcher would be a job mismatch for `trim ()`. This justifies the `. into_consumer ()` method in v3 . 0 .
711+ searcher would be a job mismatch for `trim ()`. This justifies the `Consumer ` trait in v3 . 0 .
669712
670713<details ><summary >Summary of benchmark </ summary >
671714
@@ -717,7 +760,7 @@ searcher would be a job mismatch for `trim()`. This justifies the `.into_consume
717760
718761 [suffix table ]: https : // docs.rs/suffix/1.0.0/suffix/struct.SuffixTable.html#method.positions
719762
720- 2 . Patterns are still moved when converting to a Searcher .
763+ 2 . Patterns are still moved when converting to a Searcher or Consumer .
721764 Taking the entire ownership of the pattern might prevent some use cases... ?
722765
723766* Stabilization of this RFC is blocked by [RFC 1672] \(disjointness based on associated types)
@@ -1087,26 +1130,6 @@ trait Consumer<A: Hay + ?Sized> {
10871130Both ` starts_with() ` and ` trim() ` can be efficiently implemented in terms of ` .consume() ` ,
10881131though for some patterns a specialized ` trim() ` can be even faster, so we keep this default method.
10891132
1090- During the RFC, after we have actually tried the API on third party code, we found that having
1091- ` Searcher ` and ` Consumer ` as two distinct traits seldom have any advantages as most of the time they
1092- are the same type anyway. Therefore, we * merge* the consumer methods into the ` Searcher ` trait,
1093- while still keeping ` Pattern::into_consumer() ` so we could still choose the less expensive algorithm
1094- at runtime.
1095-
1096- ``` rust
1097- // v3.0-alpha.8
1098- trait Pattern <H : Haystack > {
1099- type Searcher : Searcher <H :: Target >;
1100- fn into_searcher (self ) -> Self :: Searcher ;
1101- fn into_consumer (self ) -> Self :: Searcher { self . into_searcher () }
1102- }
1103- trait Searcher <A : Hay + ? Sized > {
1104- fn search (& mut self , hay : Span <& A >) -> Option <Range <A :: Index >>;
1105- fn consume (& mut self , hay : Span <& A >) -> Option <A :: Index >;
1106- fn trim_start (& mut self , hay : & A ) -> A :: Index { /* default impl */ }
1107- }
1108- ```
1109-
11101133## Miscellaneous decisions
11111134
11121135### ` usize ` as index instead of pointers
@@ -1183,13 +1206,12 @@ And thus the more general `Borrow` trait offers no advantage over `Deref`.
11831206
11841207### Searcher makes Hay an input type instead of associated type
11851208
1186- The ` Searcher ` trait makes the hay as input type.
1209+ The ` Searcher ` and ` Consumer ` traits makes the hay as input type.
11871210This makes any algorithm relying on a ` ReverseSearcher ` need to spell out the hay as well.
11881211
11891212``` rust
11901213trait Searcher <A : Hay + ? Sized > {
11911214 fn search (& mut self , span : Span <& A >) -> Option <Range <A :: Index >>;
1192- ...
11931215}
11941216
11951217fn rfind <H , P >(haystack : H , pattern : P ) -> Option <H :: Target :: Index >
@@ -1205,7 +1227,6 @@ An alternative is to make Hay an associated type:
12051227trait Searcher {
12061228 type Hay : Hay + ? Sized ;
12071229 fn search (& mut self , span : Span <& Self :: Hay >) -> Option <Range <Self :: Hay :: Index >>;
1208- ...
12091230}
12101231
12111232fn rfind <H , P >(haystack : H , pattern : P ) -> Option <H :: Target :: Index >
@@ -1251,12 +1272,17 @@ With specialization, this dilemma can be easily fixed: we will fallback to an al
12511272which only requires ` T: PartialEq ` (e.g. [ ` galil-seiferas ` ] or even naive search),
12521273and use the faster Two-Way algorithm when ` T: Ord ` .
12531274
1254- ### Not having default implementations for ` Searcher::{ search, consume} `
1275+ ### Not having default implementations for ` search ` and ` consume `
12551276
1256- In the ` Searcher ` trait, ` .search() ` and ` .consume() ` can be implemented in terms of each other:
1277+ In the ` Searcher ` and ` Consumer ` traits, ` .search() ` and ` .consume() ` can be implemented
1278+ in terms of each other:
12571279
12581280``` rust
1259- trait Searcher <A : Hay + ? Sized > {
1281+ impl <A , C > Searcher <A > for C
1282+ where
1283+ A : Hay + ? Sized ,
1284+ C : Consumer <A >,
1285+ {
12601286 fn search (& mut self , span : Span <& A >) -> Option <Range <A :: Index >> {
12611287 // we can implement `search` in terms of `consume`
12621288 let (hay , range ) = span . into_parts ();
@@ -1272,7 +1298,13 @@ trait Searcher<A: Hay + ?Sized> {
12721298 }
12731299 }
12741300 }
1301+ }
12751302
1303+ impl <A , S > Consumer <A > for S
1304+ where
1305+ A : Hay + ? Sized ,
1306+ S : Searcher <A >,
1307+ {
12761308 fn consume (& mut self , span : Span <& A >) -> Option <A :: Index > {
12771309 // we can implement `consume` in terms of `search`
12781310 let start = span . original_range (). start;
@@ -1283,8 +1315,6 @@ trait Searcher<A: Hay + ?Sized> {
12831315 None
12841316 }
12851317 }
1286-
1287- ...
12881318}
12891319```
12901320
@@ -1308,12 +1338,19 @@ where they should have full control of the details, we keep them as required met
13081338 ` .next_match() ` since it needs to take a span as input and thus no longer iterator-like.
13091339 It is renamed to ` .search() ` as a shorter verb and also consistent with the trait name.
13101340
1311- * ** Searcher ::consume()** . The name is almost randomly chosen as there's no good name for
1341+ * ** Consumer ::consume()** . The name is almost randomly chosen as there's no good name for
13121342 this operation. This name is taken from the same function in the [ ` re2 ` library] [ re2-consume ] .
13131343
1344+ * ` Consumer ` is totally different from ` Searcher ` . Calling it ` PrefixSearcher ` or
1345+ ` AnchoredSearcher ` would imply a non-existing sub-classing relationship.
1346+
13141347 * We would also like a name which is only a single word.
13151348
1316- * "match" (using name from Python) is incompatible with the existing ` .matches() ` method.
1349+ * We want the name * not* start with the letter ** S**
1350+ so we could easily distinguish between this and ` Searcher ` when quick-scanning the code,
1351+ in particular when ` ReverseXxxer ` is involved.
1352+
1353+ * "Matcher" (using name from Python) is incompatible with the existing ` .matches() ` method.
13171354 Besides, the meaning of "match" is very ambiguous among other libraries.
13181355
13191356 <details ><summary >Names from other languages and libraries</summary >
@@ -1636,7 +1673,8 @@ Unlike this RFC, the `Extract` class is much simpler.
16361673 the core type `& A ` only , we could keep `SharedHaystack ` unstable longer
16371674 (a separate track from the main Pattern API ) until this question is resolved .
16381675
1639- * With a benefit of type checking , we may still want to split `Consumer ` from `Searcher `.
1676+ * With a benefit of simplified API ,
1677+ we may want to merge `Consumer ` and `Searcher ` into a single trait .
16401678
16411679[RFC 528 ]: https : // github.com/rust-lang/rfcs/pull/528
16421680[RFC 1309 ]: https : // github.com/rust-lang/rfcs/pull/1309
0 commit comments