w3 - Linear Model - Linear Regression
LINEAR REGRESSION
Dr. Srikanth Allamsetty
Formulation & Mathematical Foundation of Regression Problem
What is Regression
• Regression – predict value of response variable from attribute variables.
• Variables – continuous numeric values
• Regression analysis – a set of statistical processes for estimating the relationships
between a dependent variable and one or more independent variables.
• Dependent variables are often called the 'predictand', 'outcome' or 'response' variable;
• Independent variables are often called 'predictors', 'covariates', 'explanatory variables' or
'features'.
• Regression analysis is a way of mathematically sorting out which of the independent variables indeed has an impact.
• Used for modeling relationships between variables and for forecasting.
• Statistical process – the science of collecting, organizing, analyzing, and interpreting data, and of exploring patterns and trends, to answer questions and make decisions (a broad area).
Basics of Regression Models
• Regression models predict a value of the Y variable given known values
of the X variables.
• Prediction within the range of values in the dataset used for model-fitting
is known as interpolation.
• Prediction outside this range of the data is known as extrapolation.
• First, a model to estimate the outcome needs to be chosen.
• Then the parameters of that model need to be estimated using a chosen method (e.g., least squares), as in the sketch below.
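As a minimal sketch of these two steps in Python with NumPy (the data values and query points here are hypothetical, chosen only to illustrate interpolation versus extrapolation):

```python
import numpy as np

# Step 1: fix a model form -- here a straight line, y = w0 + w1 * x.
# Step 2: estimate its parameters by least squares.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical X values
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])   # hypothetical Y values

A = np.column_stack([np.ones_like(x), x])  # design matrix [1, x]
w, *_ = np.linalg.lstsq(A, y, rcond=None)  # least-squares estimate of (w0, w1)

print(w[0] + w[1] * 3.5)   # interpolation: 3.5 lies inside the fitted range [1, 5]
print(w[0] + w[1] * 10.0)  # extrapolation: 10 lies outside the data range
```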
Formulation of Regression Models
• Regression models involve the following components:
• The unknown parameters, often denoted as β or ω or w.
• The independent variables, which are observed in data and are often
denoted as a vector Xi (where i denotes a row of data).
• The dependent variable, which is observed in data and often denoted using the scalar Yi.
• The error terms, which are not directly observed in data and are often
denoted using the scalar ei.
Formulation of Regression Models
• Most regression models propose that Yi is a function of Xi and β, with ei representing an additive error term that may stand in for random statistical noise.
• Our objective is to estimate the function f(Xi , β) that most closely fits the data.
• To carry out regression analysis, the form of the function f must be specified.
• Sometimes the form of this function is based on knowledge about the
relationship between Yi and Xi .
• If no such knowledge is available, a flexible or convenient form for f is chosen.
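In symbols, one standard way of writing this formulation, consistent with the components listed above, is:

$Y_i = f(X_i, \beta) + e_i, \qquad i = 1, \dots, N$

where, in the linear case, $f(X_i, \beta) = \beta_0 + \beta_1 X_{i1} + \dots + \beta_n X_{in}$.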
Formulation of Regression Models
• You may start with a simple univariate linear regression:
$\hat{y} = w_0 + w_1 x$
Notation:
$n$ = number of features
$x^{(i)}$ = input (features) of the $i$-th training example
$x_j^{(i)}$ = value of feature $j$ in the $i$-th training example
$N$ = number of training examples
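A minimal numeric sketch of fitting such a univariate model with the textbook closed-form least-squares estimates (the data values are hypothetical):

```python
import numpy as np

# Closed-form least-squares fit of y ~ w0 + w1 * x (univariate case).
x = np.array([1.0, 2.0, 3.0, 4.0])   # hypothetical feature values
y = np.array([3.1, 4.9, 7.2, 8.8])   # hypothetical targets

# Slope: covariance of x and y divided by the variance of x.
w1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
w0 = y.mean() - w1 * x.mean()        # intercept from the means
print(w0, w1)
```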
Multiple Linear Regression
$\hat{y} = w_0 + w_1 x_1 + w_2 x_2 + \dots + w_n x_n$
Notation:
$n$ = number of features
$x^{(i)}$ = input (features) of the $i$-th training example
$x_j^{(i)}$ = value of feature $j$ in the $i$-th training example
$N$ = number of training examples
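A sketch of the multivariate fit via the normal equation $w = (X^\top X)^{-1} X^\top y$, one standard way to estimate the weights (the data here are hypothetical):

```python
import numpy as np

# Multiple linear regression: y_hat = w0 + w1*x1 + ... + wn*xn,
# solved via the normal equation w = (X^T X)^{-1} X^T y.
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0]])            # hypothetical N=4 examples, n=2 features
y = np.array([5.1, 4.2, 11.1, 10.0])  # hypothetical targets

Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend a bias column of ones
w = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)       # solve the normal equations
print(w)        # [w0, w1, w2]
print(Xb @ w)   # fitted values
```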
The Regression Model & The Concepts of Least Squares
What is Least Square Method
• The least-squares method is a statistical method used to find a regression line, or best-fit line, for the given pattern.
• The method of least squares is used in regression.
• In regression analysis, this method is said to be a standard approach for the
approximation of sets of equations having more equations than the number of
unknowns (overdetermined systems).
• It is used to approximate the solution by minimizing the sum of the squares of the
residuals made in the results of each individual equation.
• Residual: the difference between an observed value and the fitted value provided by a model
• The problem of finding a linear regressor function will be formulated as a problem
of minimizing a criterion function.
• The widely used criterion function for regression is the sum of squared errors (see the numeric sketch below).
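For instance, a small overdetermined system (more equations than unknowns) solved by minimizing the sum of squared residuals; the arrays are hypothetical and np.linalg.lstsq is used as the solver:

```python
import numpy as np

# Overdetermined system: 5 equations in 2 unknowns.
# np.linalg.lstsq returns the w minimizing the sum of squared residuals.
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0],
              [1.0, 5.0]])
b = np.array([2.0, 4.1, 5.9, 8.2, 9.9])

w, residuals, rank, sv = np.linalg.lstsq(A, b, rcond=None)
r = b - A @ w                 # residuals: observed minus fitted values
print(w, np.sum(r ** 2))      # weights and the minimized sum of squares
```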
Least Square Method with Linear Regression
• In general, regression methods are used to predict the value of the response (dependent) variable from the attribute (independent) variables.
• The linear regressor model fits a linear function (relationship), $\hat{y} = w_0 + w_1 x_1 + \dots + w_n x_n$, between the dependent (output) variable and the independent (input) variables.
• The weights can be found iteratively by gradient descent:
$w(k+1) = w(k) - \eta \, \nabla E(w(k))$
where $\eta$ denotes the learning rate, and $k$ stands for the actual iteration step.
Note:
● Need to choose the learning rate $\eta$.
● Needs many iterations.
● Works well even when the number of features $n$ is large.
● Gradient descent serves as the basis for learning algorithms that search the hypothesis space of possible weight vectors to find the weights that best fit the training examples (see the sketch after this list).
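A minimal sketch of the gradient descent loop described above, assuming the sum-of-error-squares criterion; the data, the learning rate value, and the iteration count are hypothetical illustrative choices:

```python
import numpy as np

# Batch gradient descent for linear regression on hypothetical data.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.9, 5.1, 7.0, 9.2])
Xb = np.hstack([np.ones((X.shape[0], 1)), X])   # bias column

w = np.zeros(Xb.shape[1])
eta = 0.01                        # learning rate (must be chosen)
for k in range(2000):             # many iterations are typical
    grad = -(y - Xb @ w) @ Xb     # dE/dw for E = 1/2 * sum of squared errors
    w = w - eta * grad            # w(k+1) = w(k) - eta * grad
print(w)
```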
What is Gradient Descent
Gradient Descent Optimization Schemes
● The Gradient Descent optimization method is used for minimization tasks. Changes of the weights are made according to the following algorithm:
$w(k+1) = w(k) - \eta \, \nabla E(w(k))$
where $\eta$ denotes the learning rate, and $k$ stands for the actual iteration step.
Approaches for deciding the iteration step:
1. Batch methods use all the data in one shot.
● The iteration step $k$ means the $k$-th presentation of the training dataset.
● The gradient is calculated across the entire set of training patterns.
2. Online methods, where
● the iteration step $k$ is taken after a single data pair is presented.
● These share almost all the good features of the recursive least-squares algorithm, with reduced computational complexity (both schemes are sketched below).
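A sketch contrasting the two schemes on the same hypothetical data (the learning rate and iteration counts are illustrative, not prescribed):

```python
import numpy as np

X = np.hstack([np.ones((4, 1)), np.array([[1.0], [2.0], [3.0], [4.0]])])
y = np.array([2.9, 5.1, 7.0, 9.2])
eta = 0.01

# 1. Batch: one step k uses the gradient over the whole training set.
w_batch = np.zeros(2)
for k in range(2000):
    w_batch -= eta * (-(y - X @ w_batch) @ X)

# 2. Online: one step k per single data pair (x_i, y_i).
w_online = np.zeros(2)
for epoch in range(2000):
    for i in range(len(y)):
        w_online -= eta * (-(y[i] - X[i] @ w_online) * X[i])

print(w_batch, w_online)   # both approach the same least-squares solution
```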
The gradient descent training rule
Performance criterion, to be minimized, called the cost function:
$E(\mathbf{w}) = \frac{1}{2} \sum_{i=1}^{N} \big(y^{(i)} - \hat{y}^{(i)}\big)^2$
• The negative of the gradient, $-\nabla E(\mathbf{w})$, gives the direction of steepest decrease.
• Therefore, the training rule for gradient descent is
$\mathbf{w} \leftarrow \mathbf{w} - \eta \, \nabla E(\mathbf{w})$
where $\eta$ is the learning rate, a positive constant that determines the step size in the search.
The gradient descent training rule
• This training rule can also be written in its component form:
$w_j \leftarrow w_j - \eta \, \frac{\partial E}{\partial w_j}$
repeated until the minimum of $E$ is attained.
The gradient with respect to weight $w_j$
With the performance criterion, to be minimized, called the cost function,
$E(\mathbf{w}) = \frac{1}{2} \sum_{i=1}^{N} \big(y^{(i)} - \hat{y}^{(i)}\big)^2, \qquad \hat{y}^{(i)} = w_0 + \sum_{j} w_j x_j^{(i)},$
the gradient with respect to weight $w_j$ is
$\frac{\partial E}{\partial w_j} = -\sum_{i=1}^{N} \big(y^{(i)} - \hat{y}^{(i)}\big)\, x_j^{(i)}.$
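As a sanity check on this formula, the analytic gradient can be compared against a finite-difference approximation; everything below (data, weights, the helper E) is hypothetical:

```python
import numpy as np

# Check dE/dw_j = -sum_i (y_i - yhat_i) * x_ij against finite differences.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # bias + one feature
y = np.array([2.0, 4.1, 6.2])
w = np.array([0.5, 1.5])

def E(w):
    # Sum-of-error-squares cost function from the slide above.
    return 0.5 * np.sum((y - X @ w) ** 2)

analytic = -(y - X @ w) @ X   # the derived gradient formula
eps = 1e-6
numeric = np.array([(E(w + eps * np.eye(2)[j]) - E(w - eps * np.eye(2)[j])) / (2 * eps)
                    for j in range(2)])
print(analytic, numeric)      # the two should agree closely
```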