
ORF 523 Lecture 8                          Spring 2017, Princeton University

Instructor: A.A. Ahmadi

Scribe: G. Hall                            Thursday, March 9, 2017

When in doubt on the accuracy of these notes, please cross-check with the instructor’s notes
on aaa.princeton.edu/orf523. Any typos should be emailed to gh4@princeton.edu.

1 Outline
• Convexity-preserving operations

• Convex envelopes, cardinality constrained optimization and LASSO

• An application in supervised learning: support vector machines (SVMs)

2 Operations that preserve convexity


The role of convexity-preserving operations is to produce new convex functions out of a
set of “atom” functions that are already known to be convex. This is very important for
broadening the scope of problems that we can recognize as efficiently solvable via convex
optimization. There is a long list of convexity-preserving rules; see Section 3.2 of [2]. We
present only a few of them here. The software CVX has a lot of these rules built in [1], [4].

2.1 Nonnegative weighted sums


Rule 1. If f1, . . . , fm : Rn → R are convex functions and ω1, . . . , ωm are nonnegative scalars,
then

f(x) = ω1 f1(x) + . . . + ωm fm(x)

is also convex. Similarly, a nonnegative weighted sum of concave functions is concave.


Exercise: If f1, f2 are convex functions,

• is f1 − f2 convex?

• is f1 · f2 convex?

• is f1/f2 convex?
2.2 Composition with an affine mapping
Rule 2. Suppose f : Rn → R, A ∈ Rn×m , and b ∈ Rn . Define g : Rm → R as

g(x) = f (Ax + b)

with dom(g) = {x|Ax + b ∈ dom(f )}. Then, if f is convex, so is g; if f is concave, so is g.

The proof is a simple exercise.


Example: The following function is immediately seen to be convex. (Without knowing the
previous rule, it would be much harder to prove convexity.)

f(x1, x2) = (x1 − 2x2)^4 + 2e^{3x1 + 2x2 − 5}
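To spell out how the rules apply here (a short elaboration, not in the original notes): write

f(x1, x2) = g1(x1 − 2x2) + 2 g2(3x1 + 2x2 − 5),   where g1(u) = u^4 and g2(u) = e^u.

Each gi is a convex univariate function composed with an affine map of (x1, x2) (Rule 2), and f is a
nonnegative weighted sum of the resulting convex functions (Rule 1).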

2.3 Pointwise maximum


Rule 3. If f1 , . . . , fm are convex functions, then their pointwise maximum

f (x) = max{f1 (x), . . . , fm (x)},

with dom(f ) = dom(f1 ) ∩ . . . ∩ dom(fm ) is also convex.

Figure 1: An illustration of the pointwise maximum rule

Proof: Pick any x, y ∈ dom(f), λ ∈ [0, 1]. Then,

f(λx + (1 − λ)y) = fj(λx + (1 − λ)y)   (for some j ∈ {1, . . . , m})
                ≤ λfj(x) + (1 − λ)fj(y)
                ≤ λ max{f1(x), . . . , fm(x)} + (1 − λ) max{f1(y), . . . , fm(y)}
                = λf(x) + (1 − λ)f(y). □

• It is also easy to prove this result using epigraphs. Recall that f convex ⇔ epi(f) is
convex. But epi(f) = ∩_{i=1}^m epi(fi), and we know that the intersection of convex sets is
convex.

• One can similarly show that the pointwise minimum of two concave functions is con-
cave.

• But the pointwise minimum of two convex functions may not be convex.
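A small numerical illustration of the last two bullets (not in the notes; the functions f1(x) = (x − 1)^2 and f2(x) = (x + 1)^2 are chosen purely for illustration):

import numpy as np

f1 = lambda x: (x - 1)**2                      # convex
f2 = lambda x: (x + 1)**2                      # convex
h_max = lambda x: np.maximum(f1(x), f2(x))     # pointwise maximum
h_min = lambda x: np.minimum(f1(x), f2(x))     # pointwise minimum

# Check the convexity inequality at the midpoint of x = -1 and y = 1.
x, y, mid = -1.0, 1.0, 0.0
print(h_max(mid) <= 0.5 * (h_max(x) + h_max(y)))   # True: no violation for the max
print(h_min(mid) <= 0.5 * (h_min(x) + h_min(y)))   # False: the min violates convexity here

The pointwise minimum here is a “W”-shaped double-well function, which is not convex.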

2.4 Restriction to a line


Rule 4. Let f : Rn → R be a convex function and fix some x, y ∈ Rn . Then the function
g : R → R given by g(α) = f (x + αy) is convex.

Figure 2: An illustration of the restriction to a line rule (image credit: [6])

Many algorithms for unconstrained convex optimization (e.g., steepest descent with exact
line search) work by iteratively minimizing a function over lines. It’s useful to remember
that the restriction of a convex function to a line remains convex. This tells us that in each
subproblem we are faced with a univariate convex minimization problem, and hence we can
simply find a global minimum e.g. by finding a zero of the first derivative.
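As a small sketch of such a subproblem (not from the notes; the quadratic, point, and direction below are made up for illustration), one can restrict a convex function to a line and hand the resulting univariate convex problem to a scalar solver:

import numpy as np
from scipy.optimize import minimize_scalar

# A convex quadratic f(x) = x^T Q x with Q positive definite (illustrative choice).
Q = np.array([[2.0, 0.5],
              [0.5, 1.0]])
f = lambda x: x @ Q @ x

x0 = np.array([1.0, -2.0])   # current iterate
d = np.array([0.3, 1.0])     # search direction

# Restriction of f to the line through x0 in direction d: g(alpha) = f(x0 + alpha*d).
g = lambda alpha: f(x0 + alpha * d)

# g is univariate and convex, so any local minimizer found is global.
res = minimize_scalar(g)
print(res.x, res.fun)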

2.5 Power of a nonnegative function


Rule 5. If f is convex and nonnegative (i.e., f(x) ≥ 0, ∀x) and k ≥ 1, then f^k is convex.

Proof: We prove this in the case where f is twice differentiable. Let g = f^k. Then

∇g(x) = k f^{k−1}(x) ∇f(x),

∇^2 g(x) = k[(k − 1) f^{k−2}(x) ∇f(x)∇f(x)^T + f^{k−1}(x) ∇^2 f(x)].

We see that ∇^2 g(x) ⪰ 0 for all x (why?). □

Does this result hold if you remove the nonnegativity assumption on f ?

3 Convex envelopes
Definition 1. The convex envelope (or convex hull) conv_D f of a function f : Rn → R
over a convex set D ⊆ Rn is “the largest convex underestimator of f on D”; i.e.,

if h(x) ≤ f(x) ∀x ∈ D and h is convex ⇒ h(x) ≤ conv_D f(x), ∀x ∈ D.

Figure 3: The convex envelope of a function over two different sets

• Equivalently, conv_D f(x) is the pointwise maximum of all convex functions that lie
below f (on D).

• As the pictures suggest, the epigraph of conv f is the convex hull of the epigraph of f .

• Computing convex hulls of functions is in general a difficult task; e.g., computing
the convex envelope of a multilinear function over the unit hypercube is NP-hard [3].
Indeed, if we could compute conv_D f, then we could minimize f over D, as the following
statement illustrates.

Theorem 1 ([5]). Consider the problem min_{x∈S} f(x), where S is a convex set. Then,

f∗ := min_{x∈S} f(x) = min_{x∈S} conv_S f(x)                     (1)

and

{y ∈ S | f(y) = f∗} ⊆ {y ∈ S | conv_S f(y) = f∗}.                (2)

Proof: First we prove (1). As conv_S f is an underestimator of f, we clearly have

min_{x∈S} conv_S f(x) ≤ min_{x∈S} f(x).

To see the converse, note that the constant function g(x) = f∗ is a convex underestimator
of f. Hence, we must have conv_S f(x) ≥ f∗, ∀x ∈ S.

To prove (2), let y ∈ S be such that f(y) = f∗. Suppose for the sake of contradiction that
conv_S f(y) < f∗. But this means that the function

max{f∗, conv_S f}

is convex (why?), an underestimator of f on S (why?), but larger than conv_S f at y. This
contradicts conv_S f being the convex envelope. □
Example: In simple cases, the convex envelope of some functions over certain sets can be
computed. A well-known example is the envelope of the function l0 (x) := ||x||0 , which is
the function l1 (x) = ||x||1 . The l0 “pseudonorm”, also known as the cardinality function, is
defined as
||x||0 = # of nonzero elements of x.
This function is not a norm (why?) and is not convex (why?).

Theorem 2. The convex envelope of the l0 pseudonorm over the set {x| ||x||∞ ≤ 1} is the
l1 function.
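As a quick sanity check in one dimension (an elaboration, not spelled out in the notes): take the set [−1, 1] ⊆ R, on which ||x||_0 equals 0 at x = 0 and 1 everywhere else. If h is convex and h(x) ≤ ||x||_0 on [−1, 1], then for x ∈ [0, 1] we have h(x) = h((1 − x) · 0 + x · 1) ≤ (1 − x)h(0) + x h(1) ≤ x, and similarly h(x) ≤ −x for x ∈ [−1, 0]; so h(x) ≤ |x|. Since |x| is itself convex and lies below ||x||_0 on [−1, 1], it is the convex envelope there.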

This simple observation is the motivation (or one motivation) behind many heuristics for l0
optimization like compressed sensing, LASSO, etc.

3.1 LASSO [7]


LASSO stands for least absolute shrinkage and selection operator. It is simply a least squares
problem with an l1 penalty:

min_x  ||Ax − b||_2 + λ||x||_1,

where λ > 0 is a fixed parameter. This is a convex optimization problem (why?).

• By increasing λ, we increase our preference for having sparse solutions.

• By decreasing λ, we increase our preference for decreasing the regression error.

Here’s the idea behind why this could be useful. Consider a very simple scenario where you
are given m data points in Rn and want to fit a function f to the data that minimizes the
sum of the squares of the deviations. The problem, however, is that you don’t have a good
idea of which function class f belongs to. So you decide to throw a lot of functions into
your basis: maybe you include a term for every monomial up to a certain degree, you
add trigonometric functions, exponential functions, etc. After this, you try to write f as a
linear combination of this massive set of basis functions by solving an optimization problem
that finds the coefficients of the linear combination. Well, if you use all the basis functions
(nonzero coefficients everywhere), then you will have very small least squares error, but you
will be overfitting the data like crazy. What LASSO tries to do, as you increase λ, is to
set many of these coefficients equal to zero and tell you (somehow magically) which of the
basis functions were actually important for fitting the data and which weren’t.
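As an illustration of the problem above (not part of the original notes), here is a minimal sketch in cvxpy, the Python sibling of the CVX package referenced earlier; the data A, b and the value of λ below are made up.

import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
m, n = 50, 200                       # many more basis functions than data points
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[:5] = rng.standard_normal(5)  # only 5 "important" basis functions
b = A @ x_true + 0.01 * rng.standard_normal(m)

lam = 1.0                            # fixed a priori; larger values favor sparser x
x = cp.Variable(n)
objective = cp.Minimize(cp.norm(A @ x - b, 2) + lam * cp.norm(x, 1))
cp.Problem(objective).solve()

print("nonzero entries in the solution:", int(np.sum(np.abs(x.value) > 1e-4)))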

4 Support vector machines
• Support vector machines (SVM) constitute a prime example of supervised learning. In
such a setting, we would like to learn a classifier from a labeled data set (called the
training set). The classifier is then used to label future data points.

• A classic example is an email spam filter:

– Given a large number of emails with correct labels “spam” or “not spam”, we
would like an algorithm for classifying future emails as spam or not spam.
– The emails for which we already have the labels constitute the “training set”.

Figure 4: An example of a good spam filter

• A basic approach is to associate a pair (xi , yi ) to each email: yi is the label, which is
either 1 (spam) or −1 (not spam). The vector xi ∈ Rn is called a feature vector; it
collects some relevant information about email i. For example:

– How many words are in the email?


– How many misspelled words?
– How many links?
– Is there a $ sign?
– Does the word “bank account” appear?
– Is the sender’s email client trustworthy?
– ...

• If we have m emails, we end up with m vectors in Rn , each with a label ±1. Here is a
toy example in R2 :

Figure 5: An example of a labeled training set with only two features

• The goal is now to find a classifier f : Rn → R, which takes a positive value on spam
emails and a negative value on non-spam emails.

• The zero level set of f serves as a classifier for future predictions.

• We can search for many classes of classifier functions using convex optimization.

• The simplest one is linear classification: f(x) = a^T x − b.

• Here, we need to find a ∈ Rn, b ∈ R that satisfy

a^T xi − b > 0 if yi = 1,
a^T xi − b < 0 if yi = −1.

• This is equivalent (why?) to finding a ∈ Rn, b ∈ R that satisfy:

yi(a^T xi − b) ≥ 1, i = 1, . . . , m.

• This is a convex feasibility problem (in fact a set of linear inequalities). It may or may
not be feasible (compare examples above and below). Can you identify the geometric
condition for feasibility of linear classification? (Hint: think of convex hulls.)

Figure 6: An example of linearly separable data
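As a small illustration of this feasibility problem (not in the notes), here is a sketch in cvxpy, the Python sibling of CVX, on made-up data that is linearly separable by construction:

import numpy as np
import cvxpy as cp

rng = np.random.default_rng(2)
X_pos = rng.standard_normal((20, 2)) + np.array([2.0, 2.0])    # points labeled +1
X_neg = rng.standard_normal((20, 2)) + np.array([-2.0, -2.0])  # points labeled -1
X = np.vstack([X_pos, X_neg])
y = np.concatenate([np.ones(20), -np.ones(20)])

a = cp.Variable(2)
b = cp.Variable()

# Feasibility: find any (a, b) with y_i (a^T x_i - b) >= 1 for all i.
constraints = [cp.multiply(y, X @ a - b) >= 1]
prob = cp.Problem(cp.Minimize(0), constraints)
prob.solve()
print(prob.status)   # expected "optimal" when the data admits such a separating hyperplane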

• When linear separation is possible, there could be many (in fact infinitely many) linear
classifiers to choose from. Which one should we pick?

• As we explain next, the following optimization problem (known as maximum-margin
SVM) tries to find the most “robust” one:

min_{a,b}  ||a||                                                 (3)
s.t.  yi(a^T xi − b) ≥ 1,  i = 1, . . . , m.

– This is a convex optimization problem (why?).


– Its optimal solution is unique (why?).
– But what exactly is this optimization problem doing?

Claim 1. The optimization problem above is equivalent to

max_{a,b,t}  t
s.t.  yi(a^T xi − b) ≥ t,  i = 1, . . . , m,                     (4)
      ||a|| ≤ 1.

Claim 2. An optimal solution of (4) always satisfies ||a|| = 1.

Claim 3. The Euclidean distance of a point v ∈ Rn to a hyperplane a^T z = b is given by

|a^T v − b| / ||a||.

• Let’s believe these three claims for the moment. What optimization problem (3) is
doing, then, is finding a hyperplane that maximizes the minimum distance between the
hyperplane (our classifier) and any of our data points. Do you see why?

• We are trying to end up with as wide a margin as possible. Formally, the margin is
defined to be the distance between the two gray hyperplanes in the figure above. What
is the length of this margin in terms of a∗ (and possibly b∗)?

• Having a wide margin helps us be robust to noise, in case the feature vectors of our
future data points happen to be slightly misspecified.

The proofs of the three claims are given as homework. Here are a few hints:

• Claim 1: how would you get feasible solutions to one from the other?

• Claim 2: how would you improve the objective if it didn’t?

• Claim 3: a good exercise on our optimality conditions.

4.1 Data that is not linearly separable


• What if the data points are not linearly separable?

• Idea: let’s try to minimize the number of points misclassified:

min_{a,b,η}  ||η||_0
s.t.  yi(a^T xi − b) ≥ 1 − ηi,  i = 1, . . . , m,
      ηi ≥ 0,  i = 1, . . . , m.

• Here, ||η||_0 denotes the number of nonzero elements of η.

• If ηi = 0, data point i is correctly classified.

• The optimization problem above is trying to set as many entries of η to zero as possible.

– Unfortunately, it is a hard problem to solve.


– Which entries to set to zero? Many different subsets to consider.
– As a powerful heuristic for this problem, people solve the following problem instead:

min_{a,b,η}  ||η||_1
s.t.  yi(a^T xi − b) ≥ 1 − ηi,  i = 1, . . . , m,
      ηi ≥ 0,  i = 1, . . . , m.

• This is a convex program (why?). We can solve it efficiently.

• The solution with minimum l1 norm tends to be sparse; i.e., has many entries that are
zero.

• Note that when 0 < ηi ≤ 1, data point i is still correctly classified, but it falls within our
margin; hence it is not “robustly classified”.

• When ηi > 1, data point i is misclassified.

• We can solve a modified optimization problem to balance the tradeoff between the
number of misclassified points and the width of our margin:

min_{a,b,η}  ||a|| + γ||η||_1
s.t.  yi(a^T xi − b) ≥ 1 − ηi,  i = 1, . . . , m,
      ηi ≥ 0,  i = 1, . . . , m.

• γ ≥ 0 is a parameter that we fix a priori.

• Larger γ means we assign more importance to reducing the number of misclassified points.

• Smaller γ means we assign more importance to having a large margin.


– Note that the length of our margin (counting both sides) is 2/||a|| (why?).

• For each γ, the problem is a convex program (why?).

• On your homework, you will run some numerical experiments on this problem.
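One possible starting point for such experiments (a sketch, not the assigned code) is the following, again using cvxpy, the Python sibling of the CVX package referenced in these notes; the toy data and the value of γ are made up.

import numpy as np
import cvxpy as cp

rng = np.random.default_rng(1)
m, n = 100, 2
X = rng.standard_normal((m, n))
y = np.where(X[:, 0] + X[:, 1] + 0.3 * rng.standard_normal(m) > 0, 1.0, -1.0)

gamma = 1.0                          # fixed a priori; larger gamma puts more weight on misclassification
a = cp.Variable(n)
b = cp.Variable()
eta = cp.Variable(m, nonneg=True)

constraints = [cp.multiply(y, X @ a - b) >= 1 - eta]
objective = cp.Minimize(cp.norm(a, 2) + gamma * cp.norm(eta, 1))
cp.Problem(objective, constraints).solve()

print("margin width 2/||a||:", 2 / np.linalg.norm(a.value))
print("points with eta_i > 1 (misclassified):", int(np.sum(eta.value > 1 + 1e-6)))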

Notes
Further reading for this lecture can include Chapter 3 of [2]. You can read more about SVMs
in Section 8.6 of [2].

References
[1] S. Boyd and M. Grant. Graph implementations for nonsmooth convex programs. Recent
Advances in Learning and Control. Springer-Verlag, 2008.

[2] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press,
http://stanford.edu/~boyd/cvxbook/, 2004.

[3] Y. Crama. Recognition problems for special classes of polynomials in 0-1 variables.
Mathematical Programming, 44, 1989.

[4] CVX Research, Inc. CVX: Matlab software for disciplined convex programming, version
2.0. Available online at http://cvxr.com/cvx, 2011.

[5] D.Z. Du, P.M. Pardalos, and W. Wu. Mathematical Theory of Optimization. Kluwer
Academic Publishers, 2001.

[6] J.R. Shewchuk. An introduction to the conjugate gradient method without the agonizing
pain. Carnegie-Mellon University. Department of Computer Science, 1994.

[7] R. Tibshirani. Regression shrinkage and selection via the Lasso. Journal of the Royal
Statistical Society. Series B (Methodological), pages 267–288, 1996.
