docs/value/index.md: 26 additions & 8 deletions
@@ -229,6 +229,16 @@ objects for different datasets. You can read more about [setting up the
 cache][getting-started-cache] in the installation guide, and in the
 documentation of the [caching][pydvl.utils.caching] module.
 
+!!! danger "Errors are hidden by default"
+    During semi-value computations, the utility can be evaluated on subsets that
+    break the fitting process. For instance, a classifier might require at least two
+    classes to fit, but the utility is sometimes evaluated on subsets with only one
+    class. This will raise an error with most classifiers. To avoid this, we set
+    `catch_errors=True` by default upon instantiation, which catches the error and
+    returns the scorer's default value instead. While we show a warning to signal that
+    something went wrong, this suppression can lead to unexpected results, so it is
+    important to be aware of this setting and to set it to `False` when testing, or if
+    you are sure that the utility will not be evaluated on problematic subsets.
 
 ### Computing some values
 
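Both the caching backend and the error-catching behaviour discussed above are configured when the utility is constructed. The sketch below illustrates this; the names `Utility`, `Scorer`, `InMemoryCacheBackend`, `cache_backend` and `catch_errors` follow pyDVL's documented interface as I recall it, but they are assumptions here and may differ between versions, so check the API reference.

```python
# Minimal sketch, assuming a Utility class that accepts `cache_backend` and
# `catch_errors` keyword arguments as described above (names are assumptions
# and may vary between pyDVL versions).
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression

from pydvl.utils import Dataset, Scorer, Utility
from pydvl.utils.caching import InMemoryCacheBackend

data = Dataset.from_sklearn(load_breast_cancer())
utility = Utility(
    model=LogisticRegression(),
    data=data,
    scorer=Scorer("accuracy", default=0.0),
    cache_backend=InMemoryCacheBackend(),  # reuse scores for repeated subsets
    catch_errors=False,  # surface fitting errors instead of returning the default score
)
```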
@@ -267,25 +277,33 @@ over, sliced, sorted, as well as converted to a [pandas.DataFrame][] using
 
 ### Learning the utility
 
-Since each evaluation of the utility entails a full retrain of the model on a new subset of the training data, it is natural to try to learn this mapping from subsets to scores. This is the idea behind **Data Utility Learning (DUL)**
+Since each evaluation of the utility entails a full retraining of the model on a
+new subset of the training data, it is natural to try to learn this mapping from
+subsets to scores. This is the idea behind **Data Utility Learning (DUL)**
 [@wang_improving_2022] and in pyDVL it's as simple as wrapping the
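For illustration, here is a rough sketch of what that wrapping might look like, assuming a `DataUtilityLearning` wrapper exported from `pydvl.utils` with a `training_budget` parameter and a surrogate `model`; the exact names and signature are assumptions and should be checked against the current API reference.

```python
# Rough sketch of Data Utility Learning (DUL), under the assumption that
# pyDVL exposes a DataUtilityLearning wrapper: after `training_budget` real
# utility evaluations, further subsets are scored by the surrogate model
# instead of retraining the original model.
from sklearn.linear_model import LinearRegression

from pydvl.utils import DataUtilityLearning

wrapped_utility = DataUtilityLearning(
    u=utility,                 # the Utility object constructed earlier
    training_budget=300,       # real evaluations used to fit the surrogate
    model=LinearRegression(),  # surrogate mapping subsets to scores
)
# `wrapped_utility` can then be passed wherever a utility is expected,
# e.g. to a semi-value computation.
```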