w3 - Linear Model - Linear Regression
LINEAR REGRESSION
Dr. Srikanth Allamsetty
Formulation & Mathematical Foundation of Regression Problem
What is Regression
• Regression – predict value of response variable from attribute variables.
• Variables – continuous numeric values
• Regression analysis – a set of statistical processes for estimating the relationships
between a dependent variable and one or more independent variables.
• Dependent variables are often called the 'predictand', 'outcome' or 'response' variable;
• Independent variables are often called 'predictors', 'covariates', 'explanatory variables' or
'features'.
• Regression analysis is a way of mathematically sorting out which of the independent variables indeed has an impact.
• Used for modeling relationships between variables and for forecasting.
• Statistical process – the science of collecting, organizing, analyzing, and interpreting data, and of exploring patterns and trends, to answer questions and make decisions (a broad area).
Basics of Regression Models
• Regression models predict a value of the Y variable given known values
of the X variables.
• Prediction within the range of values in the dataset used for model-fitting
is known as interpolation.
• Prediction outside this range of the data is known as extrapolation.
• First, a model to estimate the outcome needs to be chosen.
• Then the parameters of that model need to be estimated using a chosen method (e.g., least squares), as in the sketch below.
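As a minimal sketch of these two steps in Python with NumPy (the data values and query points here are hypothetical, chosen only to illustrate interpolation versus extrapolation):

```python
import numpy as np

# Step 1: fix a model form -- here a straight line, y = w0 + w1 * x.
# Step 2: estimate its parameters by least squares.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical X values
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])   # hypothetical Y values

A = np.column_stack([np.ones_like(x), x])  # design matrix [1, x]
w, *_ = np.linalg.lstsq(A, y, rcond=None)  # least-squares estimate of (w0, w1)

print(w[0] + w[1] * 3.5)   # interpolation: 3.5 lies inside the fitted range [1, 5]
print(w[0] + w[1] * 10.0)  # extrapolation: 10 lies outside the data range
```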
Formulation of Regression Models
• Regression models involve the following components:
• The unknown parameters, often denoted as β or ω or w.
• The independent variables, which are observed in data and are often
denoted as a vector Xi (where i denotes a row of data).
• The dependent variable, which is observed in data and often denoted using the scalar Yi.
• The error terms, which are not directly observed in data and are often
denoted using the scalar ei.
Formulation of Regression Models
• Most regression models propose that Yi is a function of Xi and β, with ei representing an additive error term that may stand in for random statistical noise.
• Our objective is to estimate the function f(Xi , β) that most closely fits the data.
• To carry out regression analysis, the form of the function f must be specified.
• Sometimes the form of this function is based on knowledge about the
relationship between Yi and Xi .
• If no such knowledge is available, a flexible or convenient form for f is chosen.
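In symbols, one standard way of writing this formulation, consistent with the components listed above, is:

$Y_i = f(X_i, \beta) + e_i, \qquad i = 1, \dots, N$

where, in the linear case, $f(X_i, \beta) = \beta_0 + \beta_1 X_{i1} + \dots + \beta_n X_{in}$.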
Formulation of Regression Models
• You may start with a simple univariate linear regression:
$\hat{y} = w_0 + w_1 x$
Notation:
$n$ = number of features
$x^{(i)}$ = input (features) of the $i$-th training example
$x_j^{(i)}$ = value of feature $j$ in the $i$-th training example
$N$ = number of training examples
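A minimal numeric sketch of fitting such a univariate model with the textbook closed-form least-squares estimates (the data values are hypothetical):

```python
import numpy as np

# Closed-form least-squares fit of y ~ w0 + w1 * x (univariate case).
x = np.array([1.0, 2.0, 3.0, 4.0])   # hypothetical feature values
y = np.array([3.1, 4.9, 7.2, 8.8])   # hypothetical targets

# Slope: covariance of x and y divided by the variance of x.
w1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
w0 = y.mean() - w1 * x.mean()        # intercept from the means
print(w0, w1)
```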
Multiple Linear Regression
$\hat{y} = w_0 + w_1 x_1 + w_2 x_2 + \dots + w_n x_n$
Notation:
$n$ = number of features
$x^{(i)}$ = input (features) of the $i$-th training example
$x_j^{(i)}$ = value of feature $j$ in the $i$-th training example
$N$ = number of training examples
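A sketch of the multivariate fit via the normal equation $w = (X^\top X)^{-1} X^\top y$, one standard way to estimate the weights (the data here are hypothetical):

```python
import numpy as np

# Multiple linear regression: y_hat = w0 + w1*x1 + ... + wn*xn,
# solved via the normal equation w = (X^T X)^{-1} X^T y.
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0]])            # hypothetical N=4 examples, n=2 features
y = np.array([5.1, 4.2, 11.1, 10.0])  # hypothetical targets

Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend a bias column of ones
w = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)       # solve the normal equations
print(w)        # [w0, w1, w2]
print(Xb @ w)   # fitted values
```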
The Regression Model & The Concepts of Least Squares
What is Least Square Method
• The least-squares method is a statistical method used to find a regression line, or best-fit line, for the given pattern.
• The method of least squares is used in regression.
• In regression analysis, this method is said to be a standard approach for the
approximation of sets of equations having more equations than the number of
unknowns (overdetermined systems).
• It is used to approximate the solution by minimizing the sum of the squares of the
residuals made in the results of each individual equation.
• Residual: the difference between an observed value and the fitted value provided by a model
• The problem of finding a linear regressor function will be formulated as a problem
of minimizing a criterion function.
• The widely used criterion function for regression is the sum of squared errors (see the numeric sketch below).
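For instance, a small overdetermined system (more equations than unknowns) solved by minimizing the sum of squared residuals; the arrays are hypothetical and np.linalg.lstsq is used as the solver:

```python
import numpy as np

# Overdetermined system: 5 equations in 2 unknowns.
# np.linalg.lstsq returns the w minimizing the sum of squared residuals.
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0],
              [1.0, 5.0]])
b = np.array([2.0, 4.1, 5.9, 8.2, 9.9])

w, residuals, rank, sv = np.linalg.lstsq(A, b, rcond=None)
r = b - A @ w                 # residuals: observed minus fitted values
print(w, np.sum(r ** 2))      # weights and the minimized sum of squares
```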
Least Square Method with Linear Regression
• In general, regression methods are used to predict the value of the response (dependent) variable from the attribute (independent) variables.
• The linear regressor model fits a linear function (relationship), $\hat{y} = w_0 + w_1 x_1 + \dots + w_n x_n$, between the dependent (output) variable and the independent (input) variables.
• The weights can be found iteratively by gradient descent:
$w(k+1) = w(k) - \eta \, \nabla E(w(k))$
where $\eta$ denotes the learning rate, and $k$ stands for the actual iteration step.
Note:
● Need to choose the learning rate $\eta$.
● Needs many iterations.
● Works well even when the number of features $n$ is large.
● Gradient descent serves as the basis for learning algorithms that search the hypothesis space of possible weight vectors to find the weights that best fit the training examples (see the sketch after this list).
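A minimal sketch of the gradient descent loop described above, assuming the sum-of-error-squares criterion; the data, the learning rate value, and the iteration count are hypothetical illustrative choices:

```python
import numpy as np

# Batch gradient descent for linear regression on hypothetical data.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.9, 5.1, 7.0, 9.2])
Xb = np.hstack([np.ones((X.shape[0], 1)), X])   # bias column

w = np.zeros(Xb.shape[1])
eta = 0.01                        # learning rate (must be chosen)
for k in range(2000):             # many iterations are typical
    grad = -(y - Xb @ w) @ Xb     # dE/dw for E = 1/2 * sum of squared errors
    w = w - eta * grad            # w(k+1) = w(k) - eta * grad
print(w)
```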
What is Gradient Descent
Gradient Descent Optimization Schemes
● The Gradient Descent optimization method is used for minimization tasks. Changes of the weights are made according to the following algorithm:
$w(k+1) = w(k) - \eta \, \nabla E(w(k))$
where $\eta$ denotes the learning rate, and $k$ stands for the actual iteration step.
Approaches for deciding the iteration step:
1. Batch methods use all the data in one shot.
● The iteration step $k$ means the $k$-th presentation of the training dataset.
● The gradient is calculated across the entire set of training patterns.
2. Online methods, where
● the iteration step $k$ is taken after a single data pair is presented.
● These share almost all the good features of the recursive least-squares algorithm, with reduced computational complexity (both schemes are sketched below).
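A sketch contrasting the two schemes on the same hypothetical data (the learning rate and iteration counts are illustrative, not prescribed):

```python
import numpy as np

X = np.hstack([np.ones((4, 1)), np.array([[1.0], [2.0], [3.0], [4.0]])])
y = np.array([2.9, 5.1, 7.0, 9.2])
eta = 0.01

# 1. Batch: one step k uses the gradient over the whole training set.
w_batch = np.zeros(2)
for k in range(2000):
    w_batch -= eta * (-(y - X @ w_batch) @ X)

# 2. Online: one step k per single data pair (x_i, y_i).
w_online = np.zeros(2)
for epoch in range(2000):
    for i in range(len(y)):
        w_online -= eta * (-(y[i] - X[i] @ w_online) * X[i])

print(w_batch, w_online)   # both approach the same least-squares solution
```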
The gradient descent training rule
Performance criterion, to be minimized, called the cost function:
$E(\mathbf{w}) = \frac{1}{2} \sum_{i=1}^{N} \big(y^{(i)} - \hat{y}^{(i)}\big)^2$
• The negative of the gradient, $-\nabla E(\mathbf{w})$, gives the direction of steepest decrease.
• Therefore, the training rule for gradient descent is
$\mathbf{w} \leftarrow \mathbf{w} - \eta \, \nabla E(\mathbf{w})$
where $\eta$ is the learning rate, a positive constant that determines the step size in the search.
The gradient descent training rule
• This training rule can also be written in its component form:
$w_j \leftarrow w_j - \eta \, \frac{\partial E}{\partial w_j}$
repeated until the minimum of $E$ is attained.
The gradient with respect to weight $w_j$
With the performance criterion, to be minimized, called the cost function,
$E(\mathbf{w}) = \frac{1}{2} \sum_{i=1}^{N} \big(y^{(i)} - \hat{y}^{(i)}\big)^2, \qquad \hat{y}^{(i)} = w_0 + \sum_{j} w_j x_j^{(i)},$
the gradient with respect to weight $w_j$ is
$\frac{\partial E}{\partial w_j} = -\sum_{i=1}^{N} \big(y^{(i)} - \hat{y}^{(i)}\big)\, x_j^{(i)}.$
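As a sanity check on this formula, the analytic gradient can be compared against a finite-difference approximation; everything below (data, weights, the helper E) is hypothetical:

```python
import numpy as np

# Check dE/dw_j = -sum_i (y_i - yhat_i) * x_ij against finite differences.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # bias + one feature
y = np.array([2.0, 4.1, 6.2])
w = np.array([0.5, 1.5])

def E(w):
    # Sum-of-error-squares cost function from the slide above.
    return 0.5 * np.sum((y - X @ w) ** 2)

analytic = -(y - X @ w) @ X   # the derived gradient formula
eps = 1e-6
numeric = np.array([(E(w + eps * np.eye(2)[j]) - E(w - eps * np.eye(2)[j])) / (2 * eps)
                    for j in range(2)])
print(analytic, numeric)      # the two should agree closely
```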