Module-4-Optimization

The document outlines the three pillars of Data Science: Linear Algebra, Statistics, and Optimization, emphasizing the importance of optimization in finding the best solutions to various problems across multiple fields. It explains the components of optimization, including objective functions, decision variables, and constraints, and discusses different types of optimization problems such as linear programming and non-linear programming. Furthermore, it introduces concepts like convex optimization and gradient descent as methods for minimizing error in data science applications.


The three pillars of Data Science are:

• Linear Algebra

• Statistics

• Optimization

What is optimization?

“An optimization problem consists of maximizing or minimizing a real function by systematically choosing input values from an allowed set and computing the value of the function.”

It is useful in finding the best solution to a problem, which could be minimizing or maximizing the functional form f(x). Here x stands for the decision variables. We choose values for x so that the function f is either maximized or minimized. There might be certain constraints on x which have to be satisfied while solving the optimization problem, i.e. we can choose x only from certain regions or sets of values.

Why Optimization?

In real life, optimization helps improve the efficiency of a system. It is used in a myriad of areas including medicine, manufacturing, transportation, supply chain, finance, government, physics, economics, artificial intelligence, etc. In an optimization model, the goal can be to minimize cost in a production system. In a hospital, the goal can be to minimize the wait time for patients in an emergency room before they are seen by a doctor. In marketing, the goal can be to maximize the profit obtained by targeting the right customers under budget and operational constraints. In a humanitarian operation, the goal would be to reach as many affected people as quickly as possible and distribute resources such as water, food, and medical services by designing optimal routes. Optimization also serves as a backbone for data science algorithms. It is at the heart of almost all machine learning and statistical techniques used in data science.

Optimization helps to find the minimum error or the best solution for a problem. For example, in regression, the error is calculated as the sum of squared differences between the actual and predicted values:

error = Σ (yi - ŷi)², summed over all samples i

Optimization helps find the minimum value of this loss function.

Let’s take another example.

Suppose we have data on y and x and we want to fit a function between y and x, say y = a1 + a2*x. We have multiple samples of x and y: (x1, y1), (x2, y2) and so on. We can generalize this as yn = a1 + a2*xn. As we see, there are n equations but only 2 variables, so in general it is impossible to satisfy all these equations exactly. So we define the error for each sample as:

e1 = y1 - a1 - a2*x1

In general, en = yn - a1 - a2*xn

If there are no constraints on what values the decision variables can take,
we have an unconstrained optimization problem. This is a type of problem
encountered in linear regression. It is also called a functional
approximation problem and is widely used in data science.
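As a concrete illustration, the following is a minimal sketch (in Python with NumPy, using made-up sample data) of solving this unconstrained functional approximation problem, i.e. choosing a1 and a2 to minimize the total squared error:

import numpy as np

# Made-up samples (x1, y1), ..., (xn, yn) for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Fit y = a1 + a2*x by minimizing the sum of squared errors
# sum_n (yn - a1 - a2*xn)^2, here via least squares
A = np.column_stack([np.ones_like(x), x])   # design matrix [1, x]
(a1, a2), *_ = np.linalg.lstsq(A, y, rcond=None)

print(f"a1 = {a1:.3f}, a2 = {a2:.3f}")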

Further, if we look at data points from various classes and want to find a hyperplane to separate them, the question is how to find the best hyperplane which effectively separates these classes.

The answer lies in optimization. The hyperplane must be chosen in such a way that the classes are effectively separated, and the decision variables are the parameters that characterize the hyperplane.
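For instance, a linear support vector machine chooses the separating hyperplane by solving exactly such an optimization problem over the hyperplane parameters. A minimal sketch using scikit-learn, with toy two-class data invented for illustration:

import numpy as np
from sklearn.svm import LinearSVC

# Toy 2D points from two classes (invented for illustration)
X = np.array([[1, 2], [2, 3], [3, 3],    # class 0
              [6, 5], [7, 8], [8, 6]])   # class 1
y = np.array([0, 0, 0, 1, 1, 1])

# The decision variables of the underlying optimization problem are
# the hyperplane parameters: the weight vector w and the intercept b
clf = LinearSVC(C=1.0).fit(X, y)

print("weights w:", clf.coef_)         # orientation of the hyperplane
print("intercept b:", clf.intercept_)  # offset of the hyperplane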

Thus, as we have seen, almost all Machine Learning algorithms can be viewed as solutions to optimization problems. A deeper understanding of optimization problems gives a better understanding of Machine Learning and helps to rationalize the working of algorithms.

Components of Optimization

1. Objective function (either maximum or minimum)

2. Decision variables

3. Constraints

The first component is an objective function f(x) which we are trying to either maximize or minimize. In general, we talk about minimization problems because if you have a maximization problem with f(x), it can be converted to a minimization problem with -f(x).

Decision variables (x): The second component is the decision variables, which we can choose in order to minimize the function. So, we usually write this as min f(x).

Constraints (a ≤ x ≤ b): The third component is the constraints, which restrict x to a certain set of values.

Types of optimization

Depending on the type of objective function, constraints and decision variables, optimization problems can be of various types.

1. Linear programming: If the decision variable is continuous, the functional form is linear and all constraints are also linear, it is called a Linear programming problem (a small code sketch of this case appears after the next item).

2. Non-linear programming: If the decision variable remains continuous and either the objective function or constraints are non-linear, it is called a Non-linear programming problem.
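To make the linear programming case concrete, here is a minimal sketch using SciPy's linprog with a small made-up problem; linprog minimizes a linear objective subject to linear inequality constraints and variable bounds:

from scipy.optimize import linprog

# Minimize f(x) = 2*x1 + 3*x2
# subject to x1 + x2 >= 4  (written below as -x1 - x2 <= -4)
# and        x1 >= 0, x2 >= 0
c = [2, 3]                       # linear objective coefficients
A_ub = [[-1, -1]]                # inequality constraints: A_ub @ x <= b_ub
b_ub = [-4]
bounds = [(0, None), (0, None)]  # continuous decision variables, x >= 0

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
print("optimal x:", res.x, "optimal value:", res.fun)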

In many cases, decision variables might not be continuous and could be integers. For example, f(x1, x2) with

x1 ∈ {0, 1, 2, 3}

x2 ∈ {0, 1, 2, 3}

If the objective function and constraints are linear and the decision variables are integers, it is called a Linear integer programming problem.

If the objective function or constraints are non-linear and the decision variables are integers, it is called a Non-linear integer programming problem.

A special case is when the decision variables can only take binary values (0, 1); this is called a Binary integer programming problem.
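To make the binary case concrete, here is a minimal brute-force sketch of a tiny binary integer programming problem; the objective and constraint are invented for illustration, and enumeration is only practical because there are very few variables:

from itertools import product

# Minimize f(x1, x2, x3) = 3*x1 + 2*x2 + 4*x3
# subject to x1 + x2 + x3 >= 2, with each xi in {0, 1}
def f(x1, x2, x3):
    return 3 * x1 + 2 * x2 + 4 * x3

best_x, best_val = None, float("inf")
for x in product([0, 1], repeat=3):        # enumerate all binary assignments
    if sum(x) >= 2 and f(*x) < best_val:   # check feasibility and improvement
        best_x, best_val = x, f(*x)

print("optimal x:", best_x, "optimal value:", best_val)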

Mixed-integer linear programming problem: If the decision variables (x) are mixed (some integer, some continuous), the objective function (f) is linear and all the constraints are also linear, then this type of problem is known as a mixed-integer linear programming problem. So, in this case, the decision variables are mixed, the objective function is linear and the constraints are also linear.

Mixed-integer non-linear programming problem: If the decision variables (x) remain mixed but either the objective function (f) or the constraints are non-linear, then this type of problem is known as a mixed-integer non-linear programming problem. So, a programming problem becomes non-linear if either the objective or the constraints become non-linear.

Let’s briefly touch upon Convex optimization.

Univariate optimization is a non-linear optimization problem with no constraints. In univariate optimization, there is only one decision variable:

min f(x)

x ∈ R,

where x is the decision variable and f is the objective function. x is continuous as it can have an infinite set of values.

Univariate optimization is easy to visualize in 2D, as shown below:

The x-axis represents various values of the decision variable (x) and the y-axis represents the objective function f(x). The graph shows the point at which this function attains its minimum value. By dropping a perpendicular onto the x-axis, we get x*, which is the value of x at which the function attains its minimum value. The corresponding value on the y-axis, f*, is the best value this function could possibly take. This is a Convex function, as it has only one minimum. This minimum is both the local and the global minimum, i.e. the value of the function there is smaller than at nearby points and it is also the best solution, smaller than at all other feasible points.
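Numerically, such a univariate problem can be solved with a standard routine. A minimal sketch using SciPy's minimize_scalar on an example convex function chosen purely for illustration:

from scipy.optimize import minimize_scalar

# Example convex objective (chosen for illustration): f(x) = (x - 3)^2 + 1
def f(x):
    return (x - 3) ** 2 + 1

res = minimize_scalar(f)   # unconstrained, single decision variable
print("x* =", res.x)       # minimizer, approximately 3.0
print("f* =", res.fun)     # minimum value, approximately 1.0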

However, in certain cases, the local and global minima might not coincide. The value of the function might be smaller than that at nearby points (a local minimum) but not necessarily the optimal solution. Such functions are known as Non-convex functions. They might have multiple local minima. In other words, the local vicinity might not have a better point/solution, but there could be one farther away.

Gradient descent
Gradient descent is an optimization algorithm which can be used for
estimating the values of regression parameters, given a dataset with
inputs and outputs.

The cost function for the linear regression model is the total error (mean squared error) across all N records and is given by:

MSE = (1/N) * Σ (Yi - (β0 + β1*Xi))², summed over i = 1, …, N

Gradient descent finds the optimal values of β0 and β1 that minimize this loss function using the following steps (a code sketch follows the list):

1. Randomly guess the initial values of β0 (bias or intercept) and β1 (feature weight).

2. Calculate the estimated value of the outcome variable Ŷi for the initialized values of bias and weights.

3. Calculate the mean squared error (MSE).

4. Adjust the β0 and β1 values by calculating the gradients of the error function.

5. Repeat steps 2 to 4 for several iterations until the error stops reducing further or the change in cost is infinitesimally small.
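The following is a minimal sketch of these steps for simple linear regression, with made-up data and an assumed learning rate; it illustrates the procedure above rather than any particular library implementation:

import numpy as np

# Made-up training data (inputs X, outputs Y)
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
N = len(X)

# Step 1: initial guesses for the bias (beta0) and feature weight (beta1)
beta0, beta1 = 0.0, 0.0
learning_rate = 0.01                               # assumed step size

for _ in range(5000):
    Y_hat = beta0 + beta1 * X                      # Step 2: estimated outcomes
    mse = np.mean((Y - Y_hat) ** 2)                # Step 3: mean squared error
    grad0 = -2.0 / N * np.sum(Y - Y_hat)           # Step 4: gradient w.r.t. beta0
    grad1 = -2.0 / N * np.sum((Y - Y_hat) * X)     # Step 4: gradient w.r.t. beta1
    beta0 -= learning_rate * grad0
    beta1 -= learning_rate * grad1
    if max(abs(grad0), abs(grad1)) < 1e-8:         # Step 5: stop when change is tiny
        break

print(f"beta0 = {beta0:.3f}, beta1 = {beta1:.3f}, final MSE = {mse:.4f}")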
