
Why convexity is the key to optimization

Optimization is easy with convex cost functions

Convex Sets

To simplify things, think of a convex set as a shape in which the line segment joining any two points of the set never leaves the set.
Take a look at the examples below.

It is evident that any line segment joining two points in a circle or a square (the shapes on the extreme left and in the middle) lies entirely within the shape. These are examples of convex sets.
On the other hand, the shape on the extreme right in the figure above has part of a line segment outside the shape. Thus, it is not a convex set.
A convex set C can be represented as follows: for any two points x, y ∈ C and any θ ∈ [0, 1],

θx + (1 − θ)y ∈ C

Convex set condition
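To make this condition concrete, here is a minimal numerical sketch (the indicator functions, the sampling loop, and the helper name looks_convex are my own illustration, not from the article): sample pairs of points inside a set and check that points θx + (1 − θ)y on the segment between them stay inside.

```python
import numpy as np

# Hypothetical indicator functions for two sets in the plane.
def in_disk(p):
    """Unit disk: a convex set."""
    return np.linalg.norm(p) <= 1.0

def in_annulus(p):
    """Ring between radius 0.5 and 1.0: not a convex set."""
    return 0.5 <= np.linalg.norm(p) <= 1.0

def looks_convex(indicator, n_trials=10_000, rng=np.random.default_rng(0)):
    """Sample pairs of points in the set and check that a random point
    theta*x + (1 - theta)*y on the connecting segment stays in the set."""
    for _ in range(n_trials):
        x, y = rng.uniform(-1, 1, size=(2, 2))
        if not (indicator(x) and indicator(y)):
            continue  # only test segments whose endpoints lie in the set
        theta = rng.uniform()
        if not indicator(theta * x + (1 - theta) * y):
            return False  # found a segment that leaves the set
    return True  # no counterexample found (evidence, not a proof)

print(looks_convex(in_disk))     # True
print(looks_convex(in_annulus))  # False
```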

Epigraph

Consider the graph of a function f.


The epigraph of f is the set of points lying on or above the function’s graph: epi f = {(x, t) : t ≥ f(x)}.
Epigraph of a function

Convex Function
Okay, now that you understand what convex sets and epigraphs are, we can talk about convex functions.

A function f is said to be a convex function if its epigraph is a convex set (as seen in the green
figure below on the left).
This means that every chord, i.e. the line segment joining two points on the graph, lies on or above the graph of the function. Pause a minute and check for yourself.
Conversely, a function f is not convex if there exist two points x, y such that the line segment joining (x, f(x)) and (y, f(y)) dips below the curve of the function. This breaks the convexity of the epigraph (as seen in the red figure above on the right).
In that case, not every chord lies on or above the graph of the function; you can verify this by taking points on either side of a bend.
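If you prefer to check this numerically rather than by eye, here is a quick sketch of my own (the interval, tolerance, and helper name are arbitrary choices): sample random chords and verify that f(θx + (1 − θ)y) ≤ θf(x) + (1 − θ)f(y).

```python
import numpy as np

def chord_above_graph(f, a=-3.0, b=3.0, n_trials=10_000, rng=np.random.default_rng(0)):
    """Check numerically that f(theta*x + (1-theta)*y) <= theta*f(x) + (1-theta)*f(y)
    for random x, y in [a, b] and random theta in [0, 1]."""
    for _ in range(n_trials):
        x, y = rng.uniform(a, b, size=2)
        theta = rng.uniform()
        lhs = f(theta * x + (1 - theta) * y)      # point on the graph
        rhs = theta * f(x) + (1 - theta) * f(y)   # point on the chord
        if lhs > rhs + 1e-9:                      # the chord dips below the graph
            return False
    return True

print(chord_above_graph(lambda x: x**2))       # True: x^2 is convex
print(chord_above_graph(lambda x: np.sin(x)))  # False: sin has bends
```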

Testing for convexity


Most cost functions in the case of neural networks are non-convex, so you should know how to test a function for convexity.
A twice-differentiable function f is said to be convex if its second-order derivative is greater than or equal to 0 everywhere.

f″(x) ≥ 0 for all x

Condition for convex functions.


Examples of convex functions: y = eˣ and y = x². Both are twice differentiable, and their second derivatives (eˣ and 2) are non-negative everywhere.
If −f(x) is a convex function, then f is called a concave function.

f″(x) ≤ 0 for all x

Condition for concave functions.


Example of a concave function: y = −eˣ. It is twice differentiable, and its second derivative (−eˣ) is non-positive everywhere.
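As a sanity check, the second-derivative test above can be run symbolically, for example with SymPy (assuming it is installed; the functions are the examples from this section):

```python
import sympy as sp

x = sp.symbols('x', real=True)

# Check the second-derivative test f''(x) >= 0 on the examples above.
for expr in (sp.exp(x), x**2, -sp.exp(x)):
    second = sp.diff(expr, x, 2)  # second-order derivative
    print(expr, " f''(x) =", second, " f'' >= 0 everywhere:", second.is_nonnegative)
```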
Convexity in gradient descent optimization
As said earlier, gradient descent is a first-order iterative optimization algorithm that is used to
minimize a cost function.
To understand how convexity plays a crucial role in gradient descent, let us take the example of
convex and non-convex cost functions.
For a Linear Regression model, we define the cost function as the Mean Squared Error (MSE), which measures the average squared difference between the actual and predicted values. Our goal is to minimize this cost function in order to improve the accuracy of the model. MSE is a convex function of the model's parameters (it is a quadratic, and its second derivative is non-negative everywhere). This means there are no local minima, only the global minimum. Thus gradient descent converges to the global minimum.

MSE = (1/n) Σᵢ (yᵢ − ŷᵢ)²

MSE equation.
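Because MSE is convex in the parameters, plain gradient descent reliably reaches the global minimum. Here is a minimal sketch (the toy data, learning rate, and number of steps are arbitrary choices of mine, not from the article):

```python
import numpy as np

# Toy data roughly following y = 2x + 1 plus a little noise.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=100)
y = 2.0 * X + 1.0 + 0.1 * rng.standard_normal(100)

w, b = 0.0, 0.0  # parameters of the model y_hat = w*x + b
lr = 0.1         # learning rate

for step in range(500):
    y_hat = w * X + b
    error = y_hat - y
    # Gradients of MSE = mean((y_hat - y)^2) with respect to w and b.
    grad_w = 2.0 * np.mean(error * X)
    grad_b = 2.0 * np.mean(error)
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b)  # close to the true values 2 and 1: the global minimum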


Now let us consider a non-convex cost function. In this case, take an arbitrary non-convex function, as plotted below.

Instead of converging to the global minimum, you can see that gradient descent would stop at the local minimum, because the gradient at that point is zero (the slope is 0) and the point is a minimum within its neighborhood. One way to mitigate this issue is to use momentum.
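To see this behaviour, here is a small sketch using an arbitrary non-convex function of my own (the article's plotted function is not reproduced): plain gradient descent settles in whichever basin it starts from, and a classical momentum variant is included for comparison.

```python
import numpy as np

# An arbitrary non-convex function with one local and one global minimum.
f      = lambda x: x**4 - 3 * x**2 + x
grad_f = lambda x: 4 * x**3 - 6 * x + 1

def gradient_descent(x0, lr=0.01, steps=2000):
    """Plain gradient descent: follows the local slope only."""
    x = x0
    for _ in range(steps):
        x -= lr * grad_f(x)
    return x

def gradient_descent_momentum(x0, lr=0.01, beta=0.9, steps=2000):
    """Classical momentum: the update accumulates past gradients, which can
    help roll through shallow local minima (not guaranteed in general)."""
    x, v = x0, 0.0
    for _ in range(steps):
        v = beta * v + grad_f(x)
        x -= lr * v
    return x

# Depending on where it starts, plain gradient descent settles in a different basin:
print(gradient_descent(x0=2.0))    # ~  1.13, a local minimum
print(gradient_descent(x0=-2.0))   # ~ -1.30, the global minimum
```

Momentum accumulates past gradients, which can carry the iterate through shallow local minima, although it still does not guarantee that the global minimum is reached.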

Conclusion

Convex functions play a huge role in optimization, and optimization is at the core of a machine learning model. Understanding convexity is therefore really important, and I hope this article has helped you do exactly that.
https://towardsdatascience.com/understand-convexity-in-optimization-db87653bf920
