DOC: Update derivation equations of positive Group Lasso penalty
- Add complete Lagrangian derivation for case w = 0
- Include rigorous KKT conditions and optimality analysis
- Replace incomplete derivations with full mathematical proofs
Fixes #243
@@ -135,8 +134,6 @@ and thus, combined with Equations :eq:`prox_projection_nn_Sc` and :eq:`prox_proj
    (1 - \frac{\lambda}{\norm{x_S}})_{+} x_S
    .

-
-
.. _subdiff_positive_group_lasso:

Subdifferential of the positive Group Lasso penalty
@@ -184,20 +181,41 @@ Minimizing over :math:`n` then over :math:`u`, thanks to [`1 <https://math.stack
where :math:`v^+` is :math:`v` restricted to its positive coordinates.

Intuitively, it is clear that if :math:`v_i < 0`, we can cancel it exactly in the objective function by taking :math:`n_i = v_i` and :math:`u_i = 0`; on the other hand, if :math:`v_i > 0`, taking a nonzero :math:`n_i` only increases the quantity that :math:`u_i` needs to bring closer to 0.

-For a rigorous derivation of this, introduce the Lagrangian on a squared objective
+**Rigorous derivation:** Consider the Lagrangian (where we have squared the objective and the :math:`u` constraint for convenience when taking derivatives):

.. math::

    \mathcal{L}(u, n, \nu, \mu) =
    \frac{1}{2}\norm{u + n - v}^2 + \nu(\frac{1}{2} \norm{u}^2 - \lambda^2 / 2) + \langle\mu, n \rangle
    ,

-and write down the optimality condition with respect to :math:`u` and :math:`n`.
-Treat the case :math:`nu = 0` separately; in the other case show that :\math:`u` must be positive, and that :math:`v = (1 + \nu) u + n`, together with :math:`u = \mu / \nu` and complementary slackness, to reach the conclusion.
+with a nonnegative scalar :math:`\nu` and a nonnegative vector :math:`\mu`.
+
+Slater's condition is met (assuming :math:`\lambda > 0`), so the KKT conditions are necessary and sufficient. Writing the optimality conditions with respect to :math:`u` and :math:`n` respectively, we obtain:
+
+.. math::
+
+    u + n - v + \nu u &= 0 \\
+    u + n - v + \mu &= 0
+
+Hence :math:`\mu = \nu u`. If :math:`\nu = 0`, then :math:`v = u + n` and the optimal objective is 0. Otherwise, :math:`\nu > 0` and :math:`\mu \geq 0`, so any solution :math:`u = \frac{1}{\nu}\mu` must be nonnegative. By complementary slackness, :math:`\mu_j n_j = 0 = \nu u_j n_j`, so :math:`u` and :math:`n` have disjoint supports.
+
+Since :math:`v = (1 + \nu)u + n`, it is clear that:
+
+- if :math:`v_j > 0`, it is :math:`u_j` which is nonzero, equal to :math:`v_j / (1 + \nu)`;
+- if :math:`v_j < 0`, it is :math:`n_j` which is nonzero, equal to :math:`v_j`.
+
+We thus have :math:`v_j > 0 \Rightarrow n_j = 0` and :math:`v_j < 0 \Rightarrow u_j = 0`, so we can rewrite the problem as:
+
+.. math::
+
+    \min_{\norm{u} \leq \lambda} \frac{1}{2} \norm{u - v^+}^2
+    ,
+
which is the projection problem yielding the final result.
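+
+Explicitly, the projection of :math:`v^+` onto the :math:`\ell_2` ball of radius :math:`\lambda` is :math:`\min(1, \lambda / \norm{v^+}) \, v^+`, so the distance from :math:`v` to the subdifferential is
+
+.. math::
+
+    \max(0, \norm{v^+} - \lambda)
+    .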

Case :math:`\norm{w} \neq 0`
----------------------------

-The subdifferential in that case is :math:`\lambda w / {|| w ||} + C_1\times\ldots\times C_g` where :math:`C_j = {0}` if :math:`w_j > 0` and :math:`C_j = mathbb{R}_-` otherwise (:math:`w_j =0`).
+The subdifferential in that case is :math:`\lambda w / \norm{w} + C_1 \times \ldots \times C_g`, where :math:`C_j = \{0\}` if :math:`w_j > 0` and :math:`C_j = \mathbb{R}_-` otherwise (:math:`w_j = 0`).

Letting :math:`p` denote the projection of :math:`v` onto this set,
one has
@@ -216,13 +234,12 @@ The distance to the subdifferential is then:

.. math::

-    D(v) = || v - p || = \sqrt{\sum_{j, w_j > 0} (v_j - \lambda\frac{w_j}{||w||})^2 + \sum_{j, w_j=0} \max(0, v_j)^2
+    D(v) = \norm{v - p} = \sqrt{\sum_{j, w_j > 0} \left( v_j - \lambda \frac{w_j}{\norm{w}} \right)^2 + \sum_{j, w_j = 0} \max(0, v_j)^2}
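+
+As a quick numerical sanity check (a minimal sketch using only NumPy and SciPy; the vectors below are illustrative placeholders, not library API), this closed form can be compared against a brute-force projection:
+
+.. code-block:: python
+
+    import numpy as np
+    from scipy.optimize import minimize
+
+    rng = np.random.default_rng(0)
+    lam, w = 1.5, np.array([0.7, 0.0, 1.2, 0.0])  # nonzero group, some zero coords
+    v = rng.standard_normal(4)
+    base = lam * w / np.linalg.norm(w)
+
+    # Brute-force squared distance from v to base + C_1 x ... x C_g,
+    # with C_j = {0} if w_j > 0 and C_j = R_- if w_j = 0.
+    bounds = [(0.0, 0.0) if wj > 0 else (None, 0.0) for wj in w]
+    res = minimize(lambda c: np.sum((v - base - c) ** 2), np.zeros(4), bounds=bounds)
+
+    # Closed-form distance D(v) from the equation above.
+    D = np.sqrt(
+        np.sum((v[w > 0] - base[w > 0]) ** 2)
+        + np.sum(np.maximum(v[w == 0], 0.0) ** 2)
+    )
+
+    assert np.isclose(np.sqrt(res.fun), D, atol=1e-5)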