
Convex optimization - Homework 3

1. Second order methods for dual problem

1. Consider the LASSO (Least Absolute Shrinkage and Selection Operator) problem

minimize (1/2)||Xw − y||_2^2 + λ||w||_1

in the variable w ∈ R^d, with X ∈ R^{n×d}, y ∈ R^n and λ > 0 the regularization parameter.

We can rewrite the problem as

minimize (1/2)||z − y||_2^2 + λ||w||_1
subject to z = Xw

The Lagrangian of the problem is
L(w, z, v) = (1/2)||z − y||_2^2 + λ||w||_1 + v^T(Xw − z).

As the function is separable,

inf_{w,z} L(w, z, v) = inf_w (λ||w||_1 + v^T Xw) + inf_z ((1/2)||z − y||_2^2 − v^T z)

For the first part,

(1/2)||z − y||_2^2 − v^T z = (1/2)z^T z − (y + v)^T z + (1/2)y^T y
argmin_z ((1/2)z^T z − (y + v)^T z + (1/2)y^T y) = y + v
inf_z ((1/2)||z − y||_2^2 − v^T z) = −(1/2)||v||_2^2 − y^T v

And for the second part,

inf_w (λ||w||_1 + (X^T v)^T w) = Σ_{i=1}^d inf_{w_i} (λ|w_i| + (X^T v)_i w_i)

inf_{w_i} (λ|w_i| + (X^T v)_i w_i) = 0 if |(X^T v)_i| ≤ λ, −∞ otherwise

inf_w (λ||w||_1 + (X^T v)^T w) = 0 if ||X^T v||_∞ ≤ λ, −∞ otherwise
We deduce that:

inf_{w,z} L(w, z, v) = −(1/2)||v||_2^2 − y^T v when ||X^T v||_∞ ≤ λ

Thus, the dual problem is

maximize −(1/2)||v||_2^2 − y^T v
subject to ||X^T v||_∞ ≤ λ

We can rewrite this problem as

minimize v^T Qv + p^T v
subject to Av ≤ b

in the variable v ∈ R^n, with Q = (1/2)I_n, p = y, A = (X, −X)^T ∈ R^{2d×n} and b = λ1_{2d}, where 1_{2d} is the vector of all ones in R^{2d}.
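As a sketch (function name and shapes are my own, not from the original code), the QP data (Q, p, A, b) can be assembled from (X, y, λ) as follows:

```python
import numpy as np

def dual_qp_data(X, y, lam):
    """Build the data (Q, p, A, b) of the dual QP
         minimize  v^T Q v + p^T v   subject to  A v <= b,
    following the identification above."""
    n, d = X.shape
    Q = 0.5 * np.eye(n)          # Q = (1/2) I_n
    p = y.copy()                 # p = y
    A = np.vstack([X.T, -X.T])   # A = (X, -X)^T, shape (2d, n)
    b = lam * np.ones(2 * d)     # b = lambda * 1_{2d}
    return Q, p, A, b
```

The 2d rows of A encode ||X^T v||_∞ ≤ λ as the pair of inequalities X^T v ≤ λ1 and −X^T v ≤ λ1; note that v = 0 is strictly feasible since λ > 0.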

3. We can take v_0 = 0, which satisfies the initial condition (it is strictly feasible since λ > 0).


We notice that the bigger µ is, the larger the decrease of the objective function at each iteration; consequently, when µ is bigger, the number of iterations decreases.

Figure 1: The objective function vs the number of iterations for different µ

Figure 2: The gap in a semilog scale

Thus, an appropriate choice of µ is 50.
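The second-order scheme used here can be sketched as a standard barrier method with Newton centering steps. This is a minimal sketch under assumptions of my own (the QP data (Q, p, A, b) identified above, common backtracking constants, illustrative tolerances); it is not the code used to produce the figures:

```python
import numpy as np

def centering_step(Q, p, A, b, v, t, tol=1e-8, max_iter=50):
    """Newton's method on  t*(v^T Q v + p^T v) - sum(log(b - A v))."""
    alpha, beta = 0.25, 0.5          # backtracking constants (a common choice)
    f = lambda u: t * (u @ Q @ u + p @ u) - np.sum(np.log(b - A @ u))
    for _ in range(max_iter):
        s = b - A @ v                # slacks, kept strictly positive
        grad = t * (2 * Q @ v + p) + A.T @ (1.0 / s)
        hess = 2 * t * Q + A.T @ np.diag(1.0 / s**2) @ A
        dv = np.linalg.solve(hess, -grad)
        if -(grad @ dv) / 2 <= tol:  # Newton decrement stopping test
            break
        step = 1.0
        # shrink until the trial point is strictly feasible
        while np.min(b - A @ (v + step * dv)) <= 0:
            step *= beta
        # backtracking (Armijo) line search
        while f(v + step * dv) > f(v) + alpha * step * (grad @ dv):
            step *= beta
        v = v + step * dv
    return v

def barrier_method(Q, p, A, b, v0, mu=50, t0=1.0, tol=1e-6):
    v, t, m = v0, t0, len(b)
    while m / t > tol:               # duality-gap stopping criterion
        v = centering_step(Q, p, A, b, v, t)
        t *= mu
    return v
```

Each outer iteration multiplies t by µ, which is why a larger µ reduces the number of outer iterations.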

2. First order methods for primal problem

1. The function f(w) = (1/2)||Xw − y||_2^2 + λ||w||_1 is not differentiable on the set {w ∈ R^d | ∃i ∈ {1, ..., d}, w_i = 0}; elsewhere, it is differentiable with

∇f(w) = X^T Xw − X^T y + λ(sgn(w_i))_{1≤i≤d}.

By setting g(w) = X^T Xw − X^T y + λ(sgn(w_i))_{1≤i≤d}, with sgn(0) = 0, we have

∀z ∈ R^d, f(z) − f(w) − g(w)^T(z − w) ≥ 0,

so we can take g(w) as a subgradient of f at w.
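The subgradient inequality can be checked numerically; the sketch below (function names are mine) evaluates f and g as defined above:

```python
import numpy as np

def lasso_obj(X, y, lam, w):
    """f(w) = (1/2)||Xw - y||_2^2 + lam * ||w||_1"""
    r = X @ w - y
    return 0.5 * r @ r + lam * np.sum(np.abs(w))

def lasso_subgradient(X, y, lam, w):
    """g(w) = X^T X w - X^T y + lam * sgn(w), with sgn(0) = 0
    (np.sign returns 0 at 0, matching the convention above)."""
    return X.T @ (X @ w - y) + lam * np.sign(w)
```

Sampling random points z and verifying f(z) ≥ f(w) + g(w)^T(z − w) gives a quick sanity check that g is indeed a subgradient.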

Figure 3: The objective function vs the number of iterations for different strategies

Figure 4: The gap in a semilog scale for different strategies

• Strategy 1: constant step size, α_k = h

• Strategy 2: constant step length, α_k = h/||g_k||_2

• Strategy 3: square summable but not summable, α_k = h/k

• Strategy 4: nonsummable diminishing, α_k = h/√k

We can see that the 4th strategy is the fastest, while the 1st and the 2nd are very slow. However, if we continue to iterate a certain number of times, we notice that the first two strategies are more precise, even though they do not converge.
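A minimal sketch of the subgradient method with the four step-size rules above (the function name and the default values of h and the iteration count are my own, not the ones used for the figures):

```python
import numpy as np

def subgradient_method(X, y, lam, strategy, h=0.01, iters=300):
    """Subgradient method with one of the four step-size rules.
    Returns the last iterate and the best objective value seen
    (the subgradient method is not a descent method)."""
    w = np.zeros(X.shape[1])
    f_best = np.inf
    for k in range(1, iters + 1):
        g = X.T @ (X @ w - y) + lam * np.sign(g := None) if False else \
            X.T @ (X @ w - y) + lam * np.sign(w)   # subgradient from part 1
        if strategy == 1:
            alpha = h                              # constant step size
        elif strategy == 2:
            alpha = h / np.linalg.norm(g)          # constant step length
        elif strategy == 3:
            alpha = h / k                          # square summable, not summable
        else:
            alpha = h / np.sqrt(k)                 # nonsummable diminishing
        w = w - alpha * g
        r = X @ w - y
        f_best = min(f_best, 0.5 * r @ r + lam * np.sum(np.abs(w)))
    return w, f_best
```

Tracking the best value seen, rather than the last one, is the standard way to report progress for a method whose iterates oscillate.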
2. The function is of the form f(w) = (1/2)w^T X^T Xw − y^T Xw + (1/2)y^T y + λ||w||_1.
If we expand it in coordinates, we get:

f(w) = (1/2) Σ_{i=1}^d Σ_{j=1}^d (X^T X)_{ij} w_i w_j − Σ_{i=1}^d (X^T y)_i w_i + (1/2)y^T y + λ Σ_{i=1}^d |w_i|.

A more convenient form is:

f(w) = (1/2) Σ_{i=1}^d (X^T X)_{ii} w_i^2 + Σ_{i=1}^d Σ_{j=1}^{i−1} (X^T X)_{ij} w_i w_j − Σ_{i=1}^d (X^T y)_i w_i + (1/2)y^T y + λ Σ_{i=1}^d |w_i|.

Here, for each i ∈ {1, ..., d}, if we fix w_j for j ≠ i, f_i(w_i) = f(w) takes the form

f_i(w_i) = α_i w_i^2 + β_i w_i + γ_i |w_i| + δ_i

with α_i = (1/2)(X^T X)_{ii}, β_i = Σ_{j≠i} (X^T X)_{ij} w_j − (X^T y)_i, γ_i = λ and δ_i = (1/2) Σ_{j≠i} Σ_{k≠i} (X^T X)_{kj} w_k w_j − Σ_{j≠i} (X^T y)_j w_j + (1/2)y^T y + λ Σ_{j≠i} |w_j|.
We can see that α_i, γ_i > 0. Thus, when we study the function f_i(w_i) = α_i w_i^2 + β_i w_i + γ_i |w_i| + δ_i, we know that it is convex and tends to +∞ at ±∞. Moreover:

f_i'(w_i) = 2α_i w_i + β_i + γ_i if w_i > 0
f_i'(w_i) = 2α_i w_i + β_i − γ_i if w_i < 0

(the function is not differentiable at 0).

We can thus deduce that:

argmin_{w_i} f_i(w_i) = 0 if |β_i| ≤ γ_i
argmin_{w_i} f_i(w_i) = (−β_i − γ_i)/(2α_i) if β_i < −γ_i
argmin_{w_i} f_i(w_i) = (−β_i + γ_i)/(2α_i) if β_i > γ_i

We can thus set, at each iteration, w_i^{(k+1)} = argmin_{w_i} f_i(w_i) and w_j^{(k+1)} = w_j^{(k)} for j ≠ i, cycling over i.
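This coordinate update is a soft-thresholding step; a sketch of the resulting cyclic coordinate descent (my own implementation of the closed form above, assuming the columns of X are nonzero so that α_i > 0):

```python
import numpy as np

def coordinate_descent(X, y, lam, n_cycles=500):
    """Cyclic coordinate descent for the LASSO, using the closed-form
    minimizer of f_i(w_i) = alpha_i w_i^2 + beta_i w_i + gamma_i |w_i| + delta_i."""
    d = X.shape[1]
    G = X.T @ X                  # Gram matrix, entries (X^T X)_{ij}
    c = X.T @ y                  # entries (X^T y)_i
    w = np.zeros(d)
    for _ in range(n_cycles):
        for i in range(d):
            alpha = 0.5 * G[i, i]
            # beta_i = sum_{j != i} (X^T X)_{ij} w_j - (X^T y)_i
            beta = G[i] @ w - G[i, i] * w[i] - c[i]
            gamma = lam
            if abs(beta) <= gamma:
                w[i] = 0.0
            elif beta < -gamma:
                w[i] = (-beta - gamma) / (2 * alpha)
            else:
                w[i] = (-beta + gamma) / (2 * alpha)
    return w
```

At a fixed point of these updates, every coordinate satisfies its own optimality condition given the others, which for this problem is exactly the subgradient optimality condition of f.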

We notice that the coordinate descent method is faster than the subgradient method. Indeed, while the subgradient method does not converge but oscillates around p*, the coordinate descent method converges in a certain number of steps (approximately 250 to reach a gap ε = 10^−3).

Figure 5: The objective function vs the number of iterations for the coordinate
descent

Figure 6: The gap in a semilog scale for the coordinate descent

Figure 7: Comparison of the gap for the 2 methods
