Commit db7ddd3

Checklist: docs/value/loo.md
1 parent da85a1a commit db7ddd3

3 files changed: 14 additions & 9 deletions


docs/influence/index.md

Lines changed: 1 addition & 1 deletion
@@ -89,7 +89,7 @@ $$\frac{d \ \hat{\theta}_{\epsilon, z}}{d \epsilon} \Big|_{\epsilon=0} =
 -H_{\hat{\theta}}^{-1} \nabla_\theta L(z, \hat{\theta}), $$

 where $H_{\hat{\theta}} = \frac{1}{n} \sum_{i=1}^n \nabla_\theta^2 L(z_i,
-\hat{\theta})$ is the Hessian of $L$. These quantities are also knows as
+\hat{\theta})$ is the Hessian of $L$. These quantities are also known as
 **influence factors**.

 Importantly, notice that this expression is only valid when $\hat{\theta}$ is a
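The corrected sentence defines the influence factors $H_{\hat{\theta}}^{-1} \nabla_\theta L(z, \hat{\theta})$. For a model whose Hessian is available in closed form they can be computed directly; below is a minimal numpy sketch for ridge-regularized least squares. It is illustrative only, not pyDVL's influence implementation, and the toy data and names are assumptions; the ridge term also keeps the Hessian invertible.

```python
# Influence factors H^{-1} ∇_θ L(z, θ̂) for ridge-regularized least squares.
# Minimal illustrative sketch; the toy data and names are assumptions.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                # training inputs
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=100)
n, d = X.shape
lam = 1e-2                                   # regularization strength

# θ̂ minimizes (1/n) Σ_i (x_i·θ - y_i)² + λ‖θ‖², so it solves a linear system.
A = X.T @ X / n + lam * np.eye(d)
theta = np.linalg.solve(A, X.T @ y / n)
H = 2.0 * A                                  # Hessian of the empirical loss

def influence_factor(x: np.ndarray, y_true: float) -> np.ndarray:
    """Return H^{-1} ∇_θ L(z, θ̂) for the squared loss at z = (x, y)."""
    grad = 2.0 * (x @ theta - y_true) * x    # ∇_θ (x·θ - y)²
    return np.linalg.solve(H, grad)

s_0 = influence_factor(X[0], y[0])           # factor for the first sample
```

For models where factoring $H$ is infeasible, the linear system is typically solved iteratively instead, e.g. with conjugate gradient.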

docs/value/loo.md

Lines changed: 11 additions & 6 deletions
@@ -8,10 +8,10 @@ alias:
 
 LOO is the simplest approach to valuation. Let $D$ be the training set, and
 $D_{-i}$ be the training set without the sample $x_i$. Assume some utility
-function $u(S)$ that measures the performance of a model trained on 
+function $u(S)$ that measures the performance of a model trained on
 $S \subseteq D$.
 
-LOO assigns to each sample its *marginal utility* as value: 
+LOO assigns to each sample its *marginal utility* as value:
 
 $$v_\text{loo}(i) = u(D) - u(D_{-i}),$$
 
@@ -20,13 +20,13 @@ method. In pyDVL it is available as
 [LOOValuation][pydvl.valuation.methods.loo.LOOValuation].
 
 For the purposes of data valuation, this is rarely useful beyond serving as a
-baseline for benchmarking. Although it can perform astonishingly well on
-occasion.
+baseline for benchmarking (although it can perform astonishingly well on
+occasion).
 
 One particular weakness is that it does not necessarily correlate with an
 intrinsic value of a sample: since it is a marginal utility, it is affected by
-_diminishing returns_. Often, the training set is large enough for a single sample
-not to have any significant effect on training performance, despite any
+_diminishing returns_. Often, the training set is large enough for a single
+sample not to have any significant effect on training performance, despite any
 qualities it may possess. Whether this is indicative of low value or not depends
 on one's goals and definitions, but other methods are typically preferable.
 
@@ -46,3 +46,8 @@ on one's goals and definitions, but other methods are typically preferable.
 
 Strictly speaking, LOO can be seen as a [semivalue][semi-values-intro] where
 all the coefficients are zero except for $k=|D|-1.$
+
+!!! tip "Connection to the influence function"
+    With a slight change of perspective, the _influence function_ can be seen as
+    a first order approximation to the Leave-One-Out values. See [Approximating
+    the influence of a point][influence-of-a-point].
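The definition in the first hunk, $v_\text{loo}(i) = u(D) - u(D_{-i})$, can be computed exactly whenever retraining is cheap. A brute-force sketch with a toy utility (validation accuracy of a k-NN classifier) follows; it mirrors the formula only and makes no assumptions about LOOValuation's actual interface.

```python
# Exact LOO values v(i) = u(D) - u(D_{-i}) with a toy utility: validation
# accuracy of a k-NN classifier. Illustrative sketch, not pyDVL's API.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

def utility(idx: list[int]) -> float:
    """u(S): validation accuracy of a model trained on the subset S."""
    model = KNeighborsClassifier(n_neighbors=3)
    model.fit(X_tr[idx], y_tr[idx])
    return model.score(X_val, y_val)

all_idx = list(range(len(X_tr)))
u_full = utility(all_idx)
loo = [u_full - utility([j for j in all_idx if j != i]) for i in all_idx]
```

On a training set of this size most values will typically come out zero or near zero: exactly the diminishing-returns effect the passage describes.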

docs/value/shapley.md

Lines changed: 2 additions & 2 deletions
@@ -111,9 +111,9 @@ Let's decompose definition [(1)][combinatorial-shapley-intro] into "layers",
 one per subset size $k,$ by writing it in the equivalent form:[^not1]
 
 $$v_\text{shap}(i) = \sum_{k=0}^{n-1} \frac{1}{n} \binom{n-1}{k}^{-1}
-\sum_{S \subseteq N_{-i}^{k}} \Delta_i(S).$$
+\sum_{S \subseteq D_{-i}^{k}} \Delta_i(S).$$
 
-Here $N_i^{k}$ is the set of all subsets of size $k$ in the complement of
+Here $D_i^{k}$ is the set of all subsets of size $k$ in the complement of
 $\{i\}.$ Since there are $\binom{n-1}{k}$ such sets, the above is an average
 over all $n$ set sizes $k$ of the average marginal contributions of the point
 $i$ to all sets of size $k.$
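To make the layered decomposition concrete, here is a direct, exponential-time transcription of the formula, assuming only some utility `u` defined on index sets (a hypothetical stand-in, not one of pyDVL's estimators).

```python
# Direct transcription of the layered combinatorial Shapley formula:
# v_shap(i) = Σ_{k=0}^{n-1} (1/n) C(n-1, k)^{-1} Σ_{S ⊆ D_{-i}, |S|=k} Δ_i(S).
# O(2^n): only feasible for tiny D. `u` is an assumed utility over subsets.
from itertools import combinations
from math import comb

def shapley_value(i, D, u) -> float:
    n = len(D)
    rest = [j for j in D if j != i]           # D_{-i}
    value = 0.0
    for k in range(n):                        # one "layer" per subset size k
        layer = sum(
            u(set(S) | {i}) - u(set(S))       # Δ_i(S) = u(S ∪ {i}) - u(S)
            for S in combinations(rest, k)
        )
        value += layer / (n * comb(n - 1, k))  # average over the layer
    return value

# e.g. with a toy utility: shapley_value(0, range(4), lambda S: len(S) ** 0.5)
```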
