
Commit cda40e1

Aishwarya0811 authored and malakazlan committed
DOC: Update derivation equations of positive Group Lasso penalty
- Add complete Lagrangian derivation for case w = 0
- Include rigorous KKT conditions and optimality analysis
- Replace incomplete derivations with full mathematical proofs

Fixes #243
1 parent 06e3a99 commit cda40e1

File tree

1 file changed: +27 −10 lines changed


doc/tutorials/prox_nn_group_lasso.rst

@@ -68,7 +68,6 @@ Using the Moreau decomposition, Equations :eq:`fenchel` and :eq:`prox_projection
 A similar formula can be derived for the group Lasso with nonnegative constraints.

-
 Proximity operator of the group Lasso with positivity constraints
 =================================================================

@@ -135,8 +134,6 @@ and thus, combined with Equations :eq:`prox_projection_nn_Sc` and :eq:`prox_proj
     (1 - \frac{\lambda}{\norm{x_S}})_{+} x_S
 .

-
-
 .. _subdiff_positive_group_lasso:

 Subdifferential of the positive Group Lasso penalty
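The closed-form prox above (block soft-thresholding of the positive part of the input) can be sanity-checked numerically. A minimal sketch assuming NumPy and SciPy; `prox_pos_group_lasso` is a hypothetical helper name for illustration, not skglm's API:

```python
import numpy as np
from scipy.optimize import minimize

def prox_pos_group_lasso(v, lam):
    # Closed form from the tutorial: keep the positive part of v,
    # then block soft-threshold it: (1 - lam / ||v+||)_+ v+.
    v_pos = np.maximum(v, 0.0)
    norm = np.linalg.norm(v_pos)
    if norm <= lam:
        return np.zeros_like(v)
    return (1.0 - lam / norm) * v_pos

# Sanity check against a generic bound-constrained solver:
# min_w 0.5 * ||w - v||^2 + lam * ||w||  subject to  w >= 0.
v = np.array([3.0, -2.0, 0.5, -1.0, 2.0])
lam = 1.0

def objective(w):
    return 0.5 * np.sum((w - v) ** 2) + lam * np.linalg.norm(w)

res = minimize(objective, x0=np.ones_like(v), bounds=[(0, None)] * v.size)
print(np.abs(res.x - prox_pos_group_lasso(v, lam)).max())
```

The solver zeros out the coordinates where :math:`v_i < 0` and shrinks the positive block, matching the closed form to numerical precision.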
@@ -184,20 +181,41 @@ Minimizing over :math:`n` then over :math:`u`, thanks to [`1 <https://math.stack
 where :math:`v^+` is :math:`v` restricted to its positive coordinates.
 Intuitively, it is clear that if :math:`v_i < 0`, we can cancel it exactly in the objective function by taking :math:`n_i = v_i` and :math:`u_i = 0`; on the other hand, if :math:`v_i > 0`, taking a nonzero :math:`n_i` will only increase the quantity that :math:`u_i` needs to bring closer to 0.

-For a rigorous derivation of this, introduce the Lagrangian on a squared objective
+**Rigorous derivation:** Consider the Lagrangian (where we have squared the objective and the :math:`u` constraint for convenience when taking derivatives):

 .. math::

     \mathcal{L}(u, n, \nu, \mu) =
     \frac{1}{2}\norm{u + n - v}^2 + \nu(\frac{1}{2} \norm{u}^2 - \lambda^2 / 2) + \langle \mu, n \rangle
     ,

-and write down the optimality condition with respect to :math:`u` and :math:`n`.
-Treat the case :math:`\nu = 0` separately; in the other case show that :math:`u` must be positive, and that :math:`v = (1 + \nu) u + n`, together with :math:`u = \mu / \nu` and complementary slackness, to reach the conclusion.
+with a nonnegative scalar :math:`\nu` and a nonnegative vector :math:`\mu`.
+
+Slater's condition is met (assuming :math:`\lambda > 0`), so the KKT conditions are necessary and sufficient. Considering optimality with respect to :math:`u` and :math:`n` respectively, we obtain:
+
+.. math::
+
+    u + n - v + \nu u &= 0 \\
+    u + n - v + \mu &= 0
+
+Hence :math:`\mu = \nu u`. If :math:`\nu = 0`, then :math:`v = u + n` and the optimal objective is 0. Otherwise, :math:`\nu > 0` and :math:`\mu \geq 0`, so any solution :math:`u = \frac{1}{\nu} \mu` must be nonnegative. By complementary slackness, :math:`\mu_j n_j = 0 = \nu u_j n_j`, so :math:`u` and :math:`n` have disjoint supports.
+
+Since :math:`v = (1 + \nu) u + n`, it is clear that:
+
+- if :math:`v_j > 0`, it is :math:`u_j` which is nonzero, equal to :math:`v_j / (1 + \nu)`;
+- if :math:`v_j < 0`, it is :math:`n_j` which is nonzero, equal to :math:`v_j`.
+
+We have :math:`v_j > 0 \Rightarrow n_j = 0` and :math:`v_j < 0 \Rightarrow u_j = 0`, so we can rewrite the problem as:
+
+.. math::
+
+    \min_{u} \sum_{j: v_j > 0} (u_j - v_j)^2 \quad \text{s.t.} \quad \sum_{j: v_j > 0} u_j^2 \leq \lambda^2
+
+which is the projection problem yielding the final result.

 Case :math:`|| w || \ne 0`
 ---------------------------
-The subdifferential in that case is :math:`\lambda w / {|| w ||} + C_1 \times \ldots \times C_g` where :math:`C_j = \{0\}` if :math:`w_j > 0` and :math:`C_j = mathbb{R}_-` otherwise (:math:`w_j = 0`).
+The subdifferential in that case is :math:`\lambda w / {|| w ||} + C_1 \times \ldots \times C_g` where :math:`C_j = \{0\}` if :math:`w_j > 0` and :math:`C_j = \mathbb{R}_-` otherwise (:math:`w_j = 0`).

 By letting :math:`p` denote the projection of :math:`v` onto this set,
 one has
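The conclusion of the derivation above can also be checked numerically: assuming the set in question is :math:`\{u + n : \norm{u} \le \lambda, n \le 0\}`, the distance from :math:`v` to it should be :math:`(\norm{v^+} - \lambda)_+`. A short sketch with SciPy's SLSQP solver (variable names are ours):

```python
import numpy as np
from scipy.optimize import minimize

v = np.array([2.0, -1.0, 0.5, -0.2])
lam = 1.0
dim = v.size

# Closed form suggested by the KKT analysis: only the positive
# coordinates of v contribute, D(v) = max(0, ||v+|| - lam).
d_closed = max(0.0, np.linalg.norm(np.maximum(v, 0.0)) - lam)

# Brute force: min ||u + n - v||^2 over ||u|| <= lam and n <= 0,
# with z = [u, n] stacked into one vector.
def objective(z):
    return np.sum((z[:dim] + z[dim:] - v) ** 2)

res = minimize(
    objective,
    x0=np.zeros(2 * dim),
    method="SLSQP",
    bounds=[(None, None)] * dim + [(None, 0.0)] * dim,
    constraints=[{"type": "ineq", "fun": lambda z: lam**2 - np.sum(z[:dim] ** 2)}],
)
d_num = np.sqrt(res.fun)
print(d_num, d_closed)
```

The two values agree, and inspecting `res.x` shows the disjoint supports predicted by complementary slackness.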
@@ -216,13 +234,12 @@ The distance to the subdifferential is then:
 .. math::

-    D(v) = || v - p || = \sqrt{\sum_{j, w_j > 0} (v_j - \lambda \frac{w_j}{||w||})^2 + \sum_{j, w_j=0} \max(0, v_j)^2
+    D(v) = || v - p || = \sqrt{\sum_{j, w_j > 0} (v_j - \lambda \frac{w_j}{||w||})^2 + \sum_{j, w_j=0} \max(0, v_j)^2}

 since :math:`v_j - \min(v_j, 0) = v_j + \max(-v_j, 0) = \max(0, v_j)`.

-
 References
 ==========

-[1] `<https://math.stackexchange.com/a/2887332/167258>`_
+[1] `<https://math.stackexchange.com/a/2887332/167258>`_
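The distance formula above can be illustrated directly against the projection :math:`p` it is derived from; a short NumPy sketch with a hypothetical helper name (`dist_subdiff_pos` is ours, not skglm's API):

```python
import numpy as np

def dist_subdiff_pos(v, w, lam):
    # Explicit projection p of v onto lam * w / ||w|| + C_1 x ... x C_g:
    # p_j = lam * w_j / ||w|| where w_j > 0, and p_j = min(v_j, 0) where w_j = 0.
    norm_w = np.linalg.norm(w)
    p = np.where(w > 0, lam * w / norm_w, np.minimum(v, 0.0))
    return np.linalg.norm(v - p)

w = np.array([0.7, 0.0, 1.2, 0.0])
v = np.array([0.5, -0.3, 0.9, 0.4])
lam = 0.8

# The closed-form distance from the tutorial.
pos = w > 0
d_formula = np.sqrt(
    np.sum((v[pos] - lam * w[pos] / np.linalg.norm(w)) ** 2)
    + np.sum(np.maximum(0.0, v[~pos]) ** 2)
)
print(np.isclose(dist_subdiff_pos(v, w, lam), d_formula))
```

Both routes compute the same quantity term by term, using :math:`v_j - \min(v_j, 0) = \max(0, v_j)` on the coordinates where :math:`w_j = 0`.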
