Convex - Optimization - Homework 3
1. Consider the LASSO (Least Absolute Shrinkage and Selection Operator) problem
\[
\text{minimize} \quad \tfrac{1}{2}\,\|Xw - y\|_2^2 + \lambda\,\|w\|_1
\]
in the variable w ∈ R^d, with X ∈ R^{n×d}, y ∈ R^n, and λ > 0 the regularization parameter.
Figure 1: The objective function vs the number of iterations for different µ
A subgradient of f at w is given by
\[
g(w) = X^T X w - X^T y + \lambda\,\big(\operatorname{sgn}(w_i)\big)_{1 \le i \le d}, \qquad \text{with } \operatorname{sgn}(0) = 0.
\]
Indeed, for all z ∈ R^d we have f(z) − f(w) − g(w)^T(z − w) ≥ 0, so g(w) ∈ ∂f(w) and we can use g as a subgradient of f.
Figure 3: The objective function vs the number of iterations for different strategies
• Strategy 3: Square summable but not summable, α_k = h/k
We can see that the 4th strategy is the fastest, while the 1st and the 2nd are really slow. However, if we keep iterating long enough, we notice that the first two strategies are more precise, even though they do not converge.
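A minimal sketch of the subgradient method with the diminishing step size α_k = h/k (strategy 3 above); it reuses lasso_objective and lasso_subgradient from the previous sketch and keeps the best iterate seen so far, since the objective is not monotone along the iterates. The default h and iteration count are arbitrary placeholders, not the values used in the assignment.

import numpy as np

def subgradient_method(X, y, lam, h=1e-3, n_iter=1000):
    # Plain subgradient method: w_{k+1} = w_k - alpha_k * g(w_k)
    w = np.zeros(X.shape[1])
    w_best, f_best = w.copy(), lasso_objective(w, X, y, lam)
    history = [f_best]
    for k in range(1, n_iter + 1):
        g = lasso_subgradient(w, X, y, lam)
        alpha = h / k                      # strategy 3: square summable, not summable
        w = w - alpha * g
        f = lasso_objective(w, X, y, lam)
        if f < f_best:                     # track the best point seen so far
            w_best, f_best = w.copy(), f
        history.append(f)
    return w_best, history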
2. The function is of the form $f(w) = \tfrac{1}{2} w^T X^T X w - y^T X w + \tfrac{1}{2} y^T y + \lambda \|w\|_1$.
If we write the function out explicitly, we get:
\[
f(w) = \tfrac{1}{2}\sum_{i=1}^{d}\sum_{j=1}^{d} (X^T X)_{ij}\, w_i w_j - \sum_{i=1}^{d} (X^T y)_i\, w_i + \tfrac{1}{2}\, y^T y + \lambda \sum_{i=1}^{d} |w_i|.
\]
A better form, which separates the diagonal terms, is:
\[
f(w) = \tfrac{1}{2}\sum_{i=1}^{d} (X^T X)_{ii}\, w_i^2 + \sum_{i=1}^{d}\sum_{j=1}^{i-1} (X^T X)_{ij}\, w_i w_j - \sum_{i=1}^{d} (X^T y)_i\, w_i + \tfrac{1}{2}\, y^T y + \lambda \sum_{i=1}^{d} |w_i|.
\]
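The separated form above is what makes coordinate descent convenient: minimizing over a single w_i with the other coordinates fixed is a one-dimensional problem whose solution is a soft-thresholding step. Here is a minimal sketch, assuming cyclic coordinate order and nonzero columns of X (so that (X^T X)_{ii} > 0); it reuses lasso_objective from the earlier sketch, and the names are illustrative.

import numpy as np

def soft_threshold(z, t):
    # argmin_x (1/2)(x - z)^2 + t*|x|
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def coordinate_descent(X, y, lam, n_iter=300):
    # Cyclic coordinate descent on the expanded LASSO objective above
    G = X.T @ X                 # precompute X^T X
    b = X.T @ y                 # precompute X^T y
    d = X.shape[1]
    w = np.zeros(d)
    history = []
    for _ in range(n_iter):
        for i in range(d):
            # linear coefficient of w_i with the other coordinates fixed
            rho = b[i] - G[i] @ w + G[i, i] * w[i]
            # closed-form minimizer of (1/2) G_ii w_i^2 - rho w_i + lam |w_i|
            w[i] = soft_threshold(rho, lam) / G[i, i]
        history.append(lasso_objective(w, X, y, lam))
    return w, history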
We notice that the coordinate descent method is faster than the subgradient method. Indeed, while the subgradient method does not converge but oscillates around p*, the coordinate descent method ensures convergence within a certain number of steps (approximately 250 iterations to reach a gap of 10^{-3}).
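A minimal sketch of how gap curves like those in Figure 7 could be produced from the iteration histories of the two sketches above, assuming p* is approximated by the best objective value observed (an assumption for illustration, not part of the assignment).

import matplotlib.pyplot as plt

def plot_gap(hist_subgrad, hist_cd):
    # Approximate p* by the best value observed by either method
    p_star = min(min(hist_subgrad), min(hist_cd))
    plt.semilogy([f - p_star + 1e-12 for f in hist_subgrad], label="subgradient")
    plt.semilogy([f - p_star + 1e-12 for f in hist_cd], label="coordinate descent")
    plt.xlabel("iteration")
    plt.ylabel("f(w_k) - p*")
    plt.legend()
    plt.show()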
Figure 5: The objective function vs the number of iterations for the coordinate descent
Figure 7: Comparison of the gap for the 2 methods