Commit 34bfe5e
tco-190 Filter searches before citation detector
** Why are these changes being introduced:
There is a transaction cost to calling the citation detector, in both
elapsed time and processing time. If we can develop a way to flag
searches which have zero chance of being a citation, we can cut out this
expense by skipping the citation detector.
** Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/tco-190
** How does this address that need:
This moves the extract_features method in Detector::MlCitation into the
initialize method, allowing us to quickly check whether a given term
has enough non-zero features to make it worth calling the detector.
From our analysis in the TACOS notebooks, we believe that phrases which
result in only two non-zero values among all their features will never
end up being a citation - and that this threshold will allow us to skip
the citation detector in 90% of searches.
The filtering is performed in a convenience method named
enough_nonzero_values? (naming things is hard).
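
As a rough illustration of the filtering described above, the sketch below counts non-zero feature values during initialization and only calls the (stubbed) detector when the count exceeds a threshold. Everything except the name enough_nonzero_values? is an assumption: the class name, the feature set, the NONZERO_THRESHOLD constant, and the call_detector stub are hypothetical, and the sketch reads "only two non-zero values" as "two or fewer".

```ruby
# Hypothetical stand-in for Detector::MlCitation. The features and the
# detector call are placeholders, not the actual TACOS implementation.
class MlCitationSketch
  # Assumption: phrases with this many non-zero features (or fewer) are skipped.
  NONZERO_THRESHOLD = 2

  attr_reader :detections, :features

  def initialize(phrase)
    @detections = false
    # Features are extracted up front so the cheap check can run first.
    @features = extract_features(phrase)
    return unless enough_nonzero_values?

    @detections = call_detector
  end

  # True when more than NONZERO_THRESHOLD feature values are non-zero.
  def enough_nonzero_values?
    @features.values.count { |value| !value.zero? } > NONZERO_THRESHOLD
  end

  private

  # Placeholder features; the real feature list is not shown in the source.
  def extract_features(phrase)
    {
      words: phrase.split.length,
      commas: phrase.count(','),
      colons: phrase.count(':'),
      periods: phrase.count('.'),
      year_parens: phrase.scan(/\(\d{4}\)/).length
    }
  end

  # Stub for the expensive external call that the filter is meant to avoid.
  def call_detector
    true
  end
end
```

With these placeholder features, a short keyword search such as "hello world" has a single non-zero feature and never reaches call_detector, while a phrase with citation-like punctuation does.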
** Document any side effects to this change:
The @Detections instance variable is defined as false at the top of
the initialize method, before the first guard clause, so that we get a
consistent Boolean value in all conditions. This required changing one
test that previously expected nil.
1 parent 1b0a034 commit 34bfe5e
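
To illustrate the side effect noted in the commit message, here is a minimal sketch of why assigning the instance variable before the first guard clause gives a Boolean on every path. The class name, the lowercase `detections` reader, and the blank-phrase guard are assumptions for illustration; the real guard clauses are not shown in the source.

```ruby
# Hypothetical detector; only the ordering of the assignment and the guard
# clause is the point of this sketch.
class GuardedDetectorSketch
  attr_reader :detections

  def initialize(phrase)
    # Assigned before any guard clause, so an early return still leaves
    # a Boolean rather than nil.
    @detections = false
    return if phrase.to_s.strip.empty? # assumed guard condition

    @detections = true # placeholder for the real detection result
  end
end
```

Under this ordering, a test that asserted nil for an early-return case would need to expect false instead.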
5 files changed, +46 -8 lines changed
File tree:
- app/models/detector
- test
- models/detector