Module-4-Optimization
What is optimization?
Why Optimization?
It helps to find the minimum error or the best solution for a problem. For example,
in regression the error for the first record is calculated as:
e_1 = y_1 - a_1 - a_2 x_1
In general, e_n = y_n - a_1 - a_2 x_n
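For instance, a short NumPy sketch (the data and parameter values below are made up for illustration) that computes these residuals and the resulting sum of squared errors:

```python
import numpy as np

# Hypothetical data: inputs x and observed outputs y
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])

# Candidate parameter values: a1 (intercept) and a2 (slope)
a1, a2 = 0.5, 1.8

# Residual for each record: e_n = y_n - a1 - a2 * x_n
errors = y - a1 - a2 * x

# A common objective is the sum of squared errors
sse = np.sum(errors ** 2)
print(errors, sse)
```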
If there are no constraints on what values the decision variables can take,
we have an unconstrained optimization problem. This is a type of problem
encountered in linear regression. It is also called a functional
approximation problem and is widely used in data science.
Further, if we look at data points from various classes and want to find a
hyperplane to separate them, the question is how to find the best hyperplane
that effectively separates these classes.
The answer lies in optimization. The hyperplane must be chosen in such a
way that the classes are effectively separated, and the decision variables are the
parameters that characterize the hyperplane.
Thus, as we have seen, almost all Machine Learning algorithms can be viewed
as solutions to optimization problems. A deeper understanding of
optimization problems gives a better understanding of Machine Learning
and helps to rationalize how the algorithms work.
Components of Optimization
1. Objective function
2. Decision variables
3. Constraints
The first component is an objective function f(x) which we are trying to
either maximize or minimize. In general, we talk about minimization
problems because if you have a maximization problem with f(x), it can
be converted to a minimization problem with -f(x).
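As a minimal sketch of this conversion (SciPy is used here purely for illustration and is an assumption, not something from the text): to maximize f(x) = -(x - 2)^2 + 3, we can minimize its negative.

```python
from scipy.optimize import minimize_scalar

# Hypothetical function to maximize: f(x) = -(x - 2)^2 + 3
f = lambda x: -(x - 2) ** 2 + 3

# Maximizing f is the same as minimizing -f
result = minimize_scalar(lambda x: -f(x))
print(result.x)      # approx. 2.0, the maximizer of f
print(f(result.x))   # approx. 3.0, the maximum value of f
```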
Types of Optimization
For example, the decision variables may be restricted to a discrete set of values:
x1 ∈ {0, 1, 2, 3}
x2 ∈ {0, 1, 2, 3}
If the objective function and constraints are linear and the decision variables are
integers, it is called a Linear integer programming problem.
A special case is when the decision variables can only take binary values
(0, 1); this is then called a Binary integer programming problem.
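As an illustrative sketch, a tiny binary integer program (the objective, constraint, and coefficients below are invented for the example) can be solved by simply enumerating all 0/1 assignments, which is feasible only for very small problems:

```python
from itertools import product

# Hypothetical binary integer program:
#   maximize   3*x1 + 5*x2 + 4*x3          (linear objective)
#   subject to 2*x1 + 4*x2 + 3*x3 <= 6     (linear constraint)
#   x1, x2, x3 in {0, 1}                   (binary decision variables)
values = [3, 5, 4]
weights = [2, 4, 3]
capacity = 6

best_x, best_value = None, float("-inf")
for x in product([0, 1], repeat=3):                            # all 2**3 assignments
    if sum(w * xi for w, xi in zip(weights, x)) <= capacity:   # feasibility check
        value = sum(v * xi for v, xi in zip(values, x))
        if value > best_value:
            best_x, best_value = x, value

print(best_x, best_value)   # (1, 1, 0) with objective value 8
```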
Consider the simplest continuous case:
min f(x), x ∈ R
The x-axis represents the values of the decision variable (x) and the
y-axis represents the objective function f(x). The graph shows the point at
which this function attains its minimum value. By dropping a
perpendicular onto the x-axis, we get x*, the value of x at which
the function attains its minimum. The corresponding value on the y-axis,
f*, is the best value this function can possibly take. This is a
Convex function, as it has only one minimum. This minimum is both the local
and the global minimum, i.e. the value of the function there is smaller than at
nearby points and is also the best solution, smaller than at all other feasible points.
However, in certain cases the local and global minima might not coincide.
The value of the function may be smaller than at nearby points
(a local minimum) but not necessarily the optimal solution. Such functions
are known as Non-convex functions and may have multiple local minima. In
other words, the local vicinity might not contain a better point/solution, but
there could be one farther away.
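A small illustration, assuming the made-up non-convex function f(x) = x^4 - 3x^2 + x (which has its global minimum near x ≈ -1.30 and a second, purely local minimum near x ≈ 1.13) and using SciPy's local solver, which is an assumption here: starting the search from different points lands in different minima.

```python
from scipy.optimize import minimize

def f(v):
    """Non-convex example function f(x) = x^4 - 3x^2 + x (takes a 1-element vector)."""
    x = v[0]
    return x**4 - 3 * x**2 + x

# A local-search solver started from different points can land in different minima.
left = minimize(f, x0=[-2.0])   # typically converges near x ≈ -1.30, f ≈ -3.5 (global minimum)
right = minimize(f, x0=[2.0])   # typically converges near x ≈  1.13, f ≈ -1.1 (only a local minimum)
print(left.x, left.fun)
print(right.x, right.fun)
```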
Gradient descent
Gradient descent is an optimization algorithm which can be used for
estimating the values of regression parameters, given a dataset with
inputs and outputs.
The cost function for the linear regression model is the total error (mean
squared error) across all N records and is given by:
L(β0, β1) = (1/N) Σ (y_n - β0 - β1 x_n)^2, summed over n = 1, ..., N
Gradient descent finds the optimal values of β0 and β1 that minimize the loss
function using the following steps:
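Before walking through those steps, here is a minimal NumPy sketch of the complete procedure; the dataset, learning rate, and iteration count below are illustrative assumptions rather than values from the text.

```python
import numpy as np

# Hypothetical data: one input feature x and output y for N records
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.2, 3.9, 6.1, 8.0, 9.8])
N = len(x)

beta0, beta1 = 0.0, 0.0   # initial guesses for the parameters
lr = 0.01                 # learning rate (step size), chosen by hand

for _ in range(5000):
    y_hat = beta0 + beta1 * x   # current predictions
    error = y_hat - y           # residuals
    # Gradients of the MSE cost (1/N) * sum((y_hat - y)**2) w.r.t. beta0 and beta1
    grad_beta0 = (2.0 / N) * np.sum(error)
    grad_beta1 = (2.0 / N) * np.sum(error * x)
    # Step each parameter against its gradient
    beta0 -= lr * grad_beta0
    beta1 -= lr * grad_beta1

mse = np.mean((beta0 + beta1 * x - y) ** 2)
print(beta0, beta1, mse)   # beta0 ≈ 0.21, beta1 ≈ 1.93 for this made-up data
```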