Commit db7ddd3

Checklist: docs/value/loo.md
1 parent da85a1a commit db7ddd3

3 files changed: 14 additions & 9 deletions


docs/influence/index.md

Lines changed: 1 addition & 1 deletion
@@ -89,7 +89,7 @@ $$\frac{d \ \hat{\theta}_{\epsilon, z}}{d \epsilon} \Big|_{\epsilon=0} =
 -H_{\hat{\theta}}^{-1} \nabla_\theta L(z, \hat{\theta}), $$

 where $H_{\hat{\theta}} = \frac{1}{n} \sum_{i=1}^n \nabla_\theta^2 L(z_i,
-\hat{\theta})$ is the Hessian of $L$. These quantities are also knows as
+\hat{\theta})$ is the Hessian of $L$. These quantities are also known as
 **influence factors**.

 Importantly, notice that this expression is only valid when $\hat{\theta}$ is a
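The corrected sentence defines the influence factors $H_{\hat{\theta}}^{-1} \nabla_\theta L(z, \hat{\theta})$. For a model whose Hessian is available in closed form they can be computed directly; below is a minimal numpy sketch for ridge-regularized least squares. It is illustrative only, not pyDVL's influence implementation, and the toy data and names are assumptions; the ridge term also keeps the Hessian invertible.

```python
# Influence factors H^{-1} ∇_θ L(z, θ̂) for ridge-regularized least squares.
# Minimal illustrative sketch; the toy data and names are assumptions.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                # training inputs
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=100)
n, d = X.shape
lam = 1e-2                                   # regularization strength

# θ̂ minimizes (1/n) Σ_i (x_i·θ - y_i)² + λ‖θ‖², so it solves a linear system.
A = X.T @ X / n + lam * np.eye(d)
theta = np.linalg.solve(A, X.T @ y / n)
H = 2.0 * A                                  # Hessian of the empirical loss

def influence_factor(x: np.ndarray, y_true: float) -> np.ndarray:
    """Return H^{-1} ∇_θ L(z, θ̂) for the squared loss at z = (x, y)."""
    grad = 2.0 * (x @ theta - y_true) * x    # ∇_θ (x·θ - y)²
    return np.linalg.solve(H, grad)

s_0 = influence_factor(X[0], y[0])           # factor for the first sample
```

For models where factoring $H$ is infeasible, the linear system is typically solved iteratively instead, e.g. with conjugate gradient.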

docs/value/loo.md

Lines changed: 11 additions & 6 deletions
@@ -8,10 +8,10 @@ alias:
 
 LOO is the simplest approach to valuation. Let $D$ be the training set, and
 $D_{-i}$ be the training set without the sample $x_i$. Assume some utility
-function $u(S)$ that measures the performance of a model trained on 
+function $u(S)$ that measures the performance of a model trained on
 $S \subseteq D$.
 
-LOO assigns to each sample its *marginal utility* as value: 
+LOO assigns to each sample its *marginal utility* as value:
 
 $$v_\text{loo}(i) = u(D) - u(D_{-i}),$$
 
@@ -20,13 +20,13 @@ method. In pyDVL it is available as
 [LOOValuation][pydvl.valuation.methods.loo.LOOValuation].
 
 For the purposes of data valuation, this is rarely useful beyond serving as a
-baseline for benchmarking. Although it can perform astonishingly well on
-occasion.
+baseline for benchmarking (although it can perform astonishingly well on
+occasion).
 
 One particular weakness is that it does not necessarily correlate with an
 intrinsic value of a sample: since it is a marginal utility, it is affected by
-_diminishing returns_. Often, the training set is large enough for a single sample
-not to have any significant effect on training performance, despite any
+_diminishing returns_. Often, the training set is large enough for a single
+sample not to have any significant effect on training performance, despite any
 qualities it may possess. Whether this is indicative of low value or not depends
 on one's goals and definitions, but other methods are typically preferable.
 
@@ -46,3 +46,8 @@ on one's goals and definitions, but other methods are typically preferable.
 
 Strictly speaking, LOO can be seen as a [semivalue][semi-values-intro] where
 all the coefficients are zero except for $k=|D|-1.$
+
+!!! tip "Connection to the influence function"
+    With a slight change of perspective, the _influence function_ can be seen as
+    a first order approximation to the Leave-One-Out values. See [Approximating
+    the influence of a point][influence-of-a-point].
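The definition in the first hunk, $v_\text{loo}(i) = u(D) - u(D_{-i})$, can be computed exactly whenever retraining is cheap. A brute-force sketch with a toy utility (validation accuracy of a k-NN classifier) follows; it mirrors the formula only and makes no assumptions about LOOValuation's actual interface.

```python
# Exact LOO values v(i) = u(D) - u(D_{-i}) with a toy utility: validation
# accuracy of a k-NN classifier. Illustrative sketch, not pyDVL's API.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

def utility(idx: list[int]) -> float:
    """u(S): validation accuracy of a model trained on the subset S."""
    model = KNeighborsClassifier(n_neighbors=3)
    model.fit(X_tr[idx], y_tr[idx])
    return model.score(X_val, y_val)

all_idx = list(range(len(X_tr)))
u_full = utility(all_idx)
loo = [u_full - utility([j for j in all_idx if j != i]) for i in all_idx]
```

On a training set of this size most values will typically come out zero or near zero: exactly the diminishing-returns effect the passage describes.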

docs/value/shapley.md

Lines changed: 2 additions & 2 deletions
@@ -111,9 +111,9 @@ Let's decompose definition [(1)][combinatorial-shapley-intro] into "layers",
 one per subset size $k,$ by writing it in the equivalent form:[^not1]
 
 $$v_\text{shap}(i) = \sum_{k=0}^{n-1} \frac{1}{n} \binom{n-1}{k}^{-1}
-\sum_{S \subseteq N_{-i}^{k}} \Delta_i(S).$$
+\sum_{S \subseteq D_{-i}^{k}} \Delta_i(S).$$
 
-Here $N_i^{k}$ is the set of all subsets of size $k$ in the complement of
+Here $D_i^{k}$ is the set of all subsets of size $k$ in the complement of
 $\{i\}.$ Since there are $\binom{n-1}{k}$ such sets, the above is an average
 over all $n$ set sizes $k$ of the average marginal contributions of the point
 $i$ to all sets of size $k.$
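To make the layered decomposition concrete, here is a direct, exponential-time transcription of the formula, assuming only some utility `u` defined on index sets (a hypothetical stand-in, not one of pyDVL's estimators).

```python
# Direct transcription of the layered combinatorial Shapley formula:
# v_shap(i) = Σ_{k=0}^{n-1} (1/n) C(n-1, k)^{-1} Σ_{S ⊆ D_{-i}, |S|=k} Δ_i(S).
# O(2^n): only feasible for tiny D. `u` is an assumed utility over subsets.
from itertools import combinations
from math import comb

def shapley_value(i, D, u) -> float:
    n = len(D)
    rest = [j for j in D if j != i]           # D_{-i}
    value = 0.0
    for k in range(n):                        # one "layer" per subset size k
        layer = sum(
            u(set(S) | {i}) - u(set(S))       # Δ_i(S) = u(S ∪ {i}) - u(S)
            for S in combinations(rest, k)
        )
        value += layer / (n * comb(n - 1, k))  # average over the layer
    return value

# e.g. with a toy utility: shapley_value(0, range(4), lambda S: len(S) ** 0.5)
```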
