
Lesson 2

 Least Squares Regression

a. Linear Regression

The simplest example of a least-squares approximation is fitting a straight line to a set of paired observations: (x_1, y_1), (x_2, y_2), ..., (x_n, y_n). The mathematical expression for the straight line is

y = a_0 + a_1 x + e

FIGURE 1 (a) Data exhibiting significant error. (b) Polynomial fit oscillating beyond the range of the data. (c) More satisfactory result using the least-squares fit.

where a_0 and a_1 are coefficients representing the intercept and the slope, respectively, and e is the error, or residual, between the model and the observations, which can be represented by rearranging the model as

e = y - a_0 - a_1 x

Thus, the error, or residual, is the discrepancy between the true value of y and the approximate value, a_0 + a_1 x, predicted by the linear equation.

Criterion for a best fit:

\min_{a_0, a_1} S_r = \min_{a_0, a_1} \sum_{i=1}^{n} e_i^2 = \min_{a_0, a_1} \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2


Find a_0 and a_1 by setting the partial derivatives of S_r to zero:

\frac{\partial S_r}{\partial a_0} = -2 \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i) = 0    (1)

\frac{\partial S_r}{\partial a_1} = -2 \sum_{i=1}^{n} [(y_i - a_0 - a_1 x_i) x_i] = 0    (2)
From (1),

\sum_{i=1}^{n} y_i - \sum_{i=1}^{n} a_0 - \sum_{i=1}^{n} a_1 x_i = 0

or

n a_0 + \left( \sum_{i=1}^{n} x_i \right) a_1 = \sum_{i=1}^{n} y_i    (3)
From (2),

\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} a_0 x_i - \sum_{i=1}^{n} a_1 x_i^2 = 0

or

\left( \sum_{i=1}^{n} x_i \right) a_0 + \left( \sum_{i=1}^{n} x_i^2 \right) a_1 = \sum_{i=1}^{n} x_i y_i    (4)

Equations (3) and (4) are called the normal equations.

From (3),

a_0 = \frac{1}{n} \sum_{i=1}^{n} y_i - \frac{1}{n} \sum_{i=1}^{n} x_i \, a_1 = \bar{y} - \bar{x} a_1

where

\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i , \qquad \bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i

From (4), substituting this expression for a_0,

\sum_{i=1}^{n} x_i \left( \frac{1}{n} \sum_{i=1}^{n} y_i - \frac{1}{n} \sum_{i=1}^{n} x_i \, a_1 \right) + \sum_{i=1}^{n} x_i^2 \, a_1 = \sum_{i=1}^{n} x_i y_i

a_1 = \frac{\sum_{i=1}^{n} x_i y_i - \frac{1}{n} \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} x_i^2 - \frac{1}{n} \left( \sum_{i=1}^{n} x_i \right)^2} = \frac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2}


a_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left( \sum x_i \right)^2}
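To make the boxed formulas concrete, here is a minimal Python sketch (not part of the original module; the function name `fit_line` is illustrative) that evaluates a_1 and then a_0 = ȳ − x̄ a_1 from paired data:

```python
def fit_line(x, y):
    """Least-squares straight line y = a0 + a1*x through paired data."""
    n = len(x)
    sx = sum(x)
    sy = sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)

    # Slope from the normal equations: a1 = (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2)
    a1 = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    # Intercept: a0 = y_bar - a1 * x_bar
    a0 = sy / n - a1 * (sx / n)
    return a0, a1
```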

Quantification of Error of Linear Regression

Any other line results in a larger sum of the squares of the residuals. Thus, the least-squares line is unique and, in terms of our chosen criterion, is the "best" line through the points. A number of additional properties of this fit can be elucidated by examining more closely the way in which the residuals were computed. Recall that the sum of the squares is defined as

S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2

By analogy with the standard deviation, this reasoning can be extended to cases where (1) the spread of the
points around the line is of similar magnitude along the entire range of the
data and (2) the distribution of these points about the line is normal. It can
be demonstrated that if these criteria are met, least-squares regression will
provide the best (that is, the most likely) estimates of 𝑎𝑜 and 𝑎1 (Draper and
Smith, 1981). This is called the maximum likelihood principle in statistics. In
addition, if these criteria are met, a “standard deviation” for the regression
line can be determined as
S_{y/x} = \sqrt{\frac{S_r}{n-2}}
where 𝑺𝒚/𝒙 is called the standard error of the estimate. The subscript
notation “ 𝒚/𝒙 ” designates that the error is for a predicted value of 𝒚
corresponding to a particular value of 𝒙. Also, notice that we now divide by
𝒏 − 𝟐 because two data-derived estimates—𝑎𝑜 and 𝑎1 —were used to compute
𝑆𝑟 ; thus, we have lost two degrees of freedom.
The standard error of the estimate quantifies the spread of the data. However, S_{y/x} quantifies the spread around the regression line, as shown in Fig. 2b, in contrast to the original standard deviation S_y, which quantifies the spread around the mean (Fig. 2a).
The above concepts can be used to quantify the “goodness” of our fit.
This is particularly useful for comparison of several regressions (Fig. 3). To
do this, we return to the original data and determine the total sum of the
squares around the mean for the dependent variable (in our case, 𝑦). This
quantity is designated 𝑺𝒕 . This is the magnitude of the residual error
associated with the dependent variable prior to regression.
S_y = \sqrt{\frac{S_t}{n-1}} = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}} , the standard deviation of the data points,


where

S_t = \sum_{i=1}^{n} (y_i - \bar{y})^2

After performing the regression, we can compute S_r, the sum of the squares of the residuals around the regression line. This characterizes the residual error that remains after the regression. It is, therefore, sometimes called the unexplained sum of the squares. The difference between the two quantities, S_t - S_r, quantifies the improvement or error reduction due to describing the data in terms of a straight line rather than as an average value. Because the magnitude of this quantity is scale-dependent, the difference is normalized to S_t to yield

r^2 = \frac{S_t - S_r}{S_t}
where r^2 is called the coefficient of determination and r is the correlation coefficient.
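A short Python sketch of these error measures is given below, assuming the coefficients produced by the hypothetical `fit_line` helper shown earlier (the function name `regression_errors` is likewise illustrative):

```python
import math

def regression_errors(x, y, a0, a1):
    """Standard deviation Sy, standard error of the estimate Sy/x,
    and coefficient of determination r^2 for a fitted straight line."""
    n = len(y)
    y_bar = sum(y) / n
    # Total sum of squares of deviations around the mean (St)
    st = sum((yi - y_bar) ** 2 for yi in y)
    # Sum of squares of residuals around the regression line (Sr)
    sr = sum((yi - a0 - a1 * xi) ** 2 for xi, yi in zip(x, y))
    sy = math.sqrt(st / (n - 1))      # spread around the mean
    sy_x = math.sqrt(sr / (n - 2))    # spread around the regression line
    r2 = (st - sr) / st               # fraction of variation explained
    return sy, sy_x, r2
```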

FIGURE 2 Regression data showing (a) the spread of the data around
the mean of the dependent variable and (b) the spread of the data around
the best-fit line. The reduction in the spread in going from (a) to (b), as
indicated by the bell-shaped curves at the right, represents the
improvement due to linear regression.


FIGURE 3 Examples of linear regression with (a) small and (b) large
residual errors.

Example:

Fit a straight line to the x and y values in the first two columns of the table below. Also, compute the total standard deviation, the standard error of the estimate, and the correlation coefficient for the data.

𝑥𝑖 𝑦𝑖
1 0.5
2 2.5
3 2.0
4 4.0
5 3.5
6 6.0
7 5.5

SOLUTION:
Using MS Excel,
n = 7        \sum x_i = 28        \bar{x} = 28/7 = 4

\sum y_i = 24        \bar{y} = 24/7 = 3.428571

\sum x_i y_i = 119.5

\sum x_i^2 = 140


Thus,

a_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - (\sum x_i)^2} = \frac{7(119.5) - (28)(24)}{7(140) - 28^2} = 0.8392857

a_0 = \bar{y} - \bar{x} a_1 = 3.428571 - (4)(0.8392857) = 0.0714282

Therefore, the least-squares fit is

y = 0.0714282 + 0.8392857 x

Using the data above,

S_t = \sum_{i=1}^{n} (y_i - \bar{y})^2 = 22.7143

The standard deviation is

S_y = \sqrt{\frac{S_t}{n-1}} = \sqrt{\frac{22.7143}{7-1}} = 1.94569

and the standard error of the estimate (with S_r = 2.991071) is

S_{y/x} = \sqrt{\frac{S_r}{n-2}} = \sqrt{\frac{2.991071}{7-2}} = 0.77344
Thus, because 𝑺𝒚/𝒙 < 𝑆𝑦 , the linear regression model has merit. The extent
of the improvement is quantified by

r^2 = \frac{22.7143 - 2.991071}{22.7143} = 0.8683
Or
𝑟 = 0.9318

These results indicate that 86.8 percent of the original uncertainty has been
explained by the linear model.
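For reference, the hypothetical helpers sketched earlier reproduce these numbers when applied to the tabulated data:

```python
x = [1, 2, 3, 4, 5, 6, 7]
y = [0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5]

a0, a1 = fit_line(x, y)                         # a0 ~ 0.0714, a1 ~ 0.8393
sy, sy_x, r2 = regression_errors(x, y, a0, a1)  # sy ~ 1.946, sy_x ~ 0.773, r2 ~ 0.868
```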


b. Polynomial Regression
The least-squares procedure can be readily extended to fit the data to a
higher-order polynomial. For example, suppose that we fit a second-order
polynomial or quadratic:
𝑦 = 𝑎0 + 𝑎1 𝑥 + 𝑎2 𝑥 2 + 𝑒

For this case the sum of the squares of the residuals is

S_r = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i - a_2 x_i^2)^2

Taking the derivative with respect to each of the unknown coefficients of the polynomial gives

\frac{\partial S_r}{\partial a_0} = -2 \sum (y_i - a_0 - a_1 x_i - a_2 x_i^2)

\frac{\partial S_r}{\partial a_1} = -2 \sum x_i (y_i - a_0 - a_1 x_i - a_2 x_i^2)

\frac{\partial S_r}{\partial a_2} = -2 \sum x_i^2 (y_i - a_0 - a_1 x_i - a_2 x_i^2)

These equations can be set equal to zero and rearranged to develop the
following set of normal equations:
n a_0 + \left( \sum x_i \right) a_1 + \left( \sum x_i^2 \right) a_2 = \sum y_i

\left( \sum x_i \right) a_0 + \left( \sum x_i^2 \right) a_1 + \left( \sum x_i^3 \right) a_2 = \sum x_i y_i

\left( \sum x_i^2 \right) a_0 + \left( \sum x_i^3 \right) a_1 + \left( \sum x_i^4 \right) a_2 = \sum x_i^2 y_i

where all summations are from 𝑖 = 1 through 𝑛. Note that the above
three equations are linear and have three unknowns: 𝑎0 , 𝑎1 , and 𝑎2 . The
coefficients of the unknowns can be calculated directly from the observed
data.
For this case, we see that the problem of determining a least-squares
second-order polynomial is equivalent to solving a system of three
simultaneous linear equations.
The two-dimensional case can be easily extended to an mth-order
polynomial as
𝑦 = 𝑎0 + 𝑎1 𝑥 + 𝑎2 𝑥 2 + ⋯ + 𝑎𝑚 𝑥 𝑚 + 𝑒

The foregoing analysis can be easily extended to this more general case.
Thus, we can recognize that determining the coefficients of an mth-order
polynomial is equivalent to solving a system of 𝑚 + 1 simultaneous linear
equations. For this case, the standard error is formulated as


S_{y/x} = \sqrt{\frac{S_r}{n - (m+1)}}
This quantity is divided by n - (m + 1) because (m + 1) data-derived coefficients—a_0, a_1, ..., a_m—were used to compute S_r; thus, we have lost (m + 1) degrees of freedom. In addition to the standard error, a coefficient of determination can also be computed for polynomial regression using the same equation as for linear regression.
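As a rough illustration of this procedure (an assumption of mine, not code from the module), the following Python sketch builds the (m + 1) × (m + 1) normal-equation system for an mth-order polynomial and solves it with NumPy; solving the normal equations directly is simple, though not the most numerically robust approach:

```python
import numpy as np

def fit_polynomial(x, y, m):
    """Least-squares polynomial y = a0 + a1*x + ... + am*x**m
    obtained by solving the normal equations A a = b."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    # A[j][k] = sum(x**(j + k)) and b[j] = sum(x**j * y), for j, k = 0..m
    A = np.array([[np.sum(x ** (j + k)) for k in range(m + 1)]
                  for j in range(m + 1)])
    b = np.array([np.sum(x ** j * y) for j in range(m + 1)])
    a = np.linalg.solve(A, b)                     # coefficients a0..am
    # Standard error of the estimate, Sy/x = sqrt(Sr / (n - (m + 1)))
    residuals = y - sum(a[j] * x ** j for j in range(m + 1))
    sr = float(np.sum(residuals ** 2))
    sy_x = (sr / (n - (m + 1))) ** 0.5
    return a, sy_x
```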
Example:
Fit a second-order polynomial to the data in the first two columns of the table below.
𝑥𝑖 𝑦𝑖
0 2.1
1 7.7
2 13.6
3 27.2
4 40.9
5 61.1

Solution:
Using MS Excel

m = 2
n = 6

\sum x_i = 15        \bar{x} = 15/6 = 2.5        \sum x_i^2 = 55        \sum x_i^3 = 225        \sum x_i^4 = 979

\sum y_i = 152.6        \bar{y} = 152.6/6 = 25.433

\sum x_i y_i = 585.6

\sum x_i^2 y_i = 2488.8


Therefore, the simultaneous linear equations are

n a_0 + \left( \sum x_i \right) a_1 + \left( \sum x_i^2 \right) a_2 = \sum y_i
6 a_0 + 15 a_1 + 55 a_2 = 152.6

\left( \sum x_i \right) a_0 + \left( \sum x_i^2 \right) a_1 + \left( \sum x_i^3 \right) a_2 = \sum x_i y_i
15 a_0 + 55 a_1 + 225 a_2 = 585.6

\left( \sum x_i^2 \right) a_0 + \left( \sum x_i^3 \right) a_1 + \left( \sum x_i^4 \right) a_2 = \sum x_i^2 y_i
55 a_0 + 225 a_1 + 979 a_2 = 2488.8

Thus,

\begin{bmatrix} 6 & 15 & 55 \\ 15 & 55 & 225 \\ 55 & 225 & 979 \end{bmatrix} \begin{Bmatrix} a_0 \\ a_1 \\ a_2 \end{Bmatrix} = \begin{Bmatrix} 152.6 \\ 585.6 \\ 2488.8 \end{Bmatrix}

Solving these equations through Gauss elimination gives

a_0 = 2.47857
a_1 = 2.35929
a_2 = 1.86071

y = 2.47857 + 2.35929 x + 1.86071 x^2

For the standard error of the estimate, the sum of the squares of the residuals based on the regression polynomial is

S_r = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i - a_2 x_i^2)^2 = 3.74657144

so that


S_{y/x} = \sqrt{\frac{S_r}{n - (m+1)}} = \sqrt{\frac{3.74657144}{6 - (2+1)}} = 1.1175
The coefficient of determination is

r^2 = \frac{S_t - S_r}{S_t} = \frac{2513.39333 - 3.74657144}{2513.39333} = 0.998509
or the correlation coefficient is

r = 0.99925

These results indicate that 99.851 percent of the original uncertainty has been explained by the model. This result supports the conclusion that the quadratic equation represents an excellent fit.
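Applying the hypothetical `fit_polynomial` sketch from earlier to this data set recovers the same values:

```python
x = [0, 1, 2, 3, 4, 5]
y = [2.1, 7.7, 13.6, 27.2, 40.9, 61.1]

a, sy_x = fit_polynomial(x, y, m=2)
# a ~ [2.47857, 2.35929, 1.86071], sy_x ~ 1.12
```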

c. Multiple Linear Regression

A useful extension of linear regression is the case where y is a linear function of two or more independent variables. For example, y might be a linear function of x_1 and x_2, as in

y = a_0 + a_1 x_1 + a_2 x_2 + e

Such an equation is particularly useful when fitting experimental data, where the variable being studied is often a function of two other variables. For this two-dimensional case, the regression "line" becomes a "plane" (Fig. 4).

FIGURE 4 Graphical depiction of multiple linear regression where y is a linear function of x_1 and x_2.


As with the previous cases, the "best" values of the coefficients are determined by setting up the sum of the squares of the residuals,

S_r = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_{1i} - a_2 x_{2i})^2

and differentiating with respect to each of the unknown coefficients:

\frac{\partial S_r}{\partial a_0} = -2 \sum (y_i - a_0 - a_1 x_{1i} - a_2 x_{2i})

\frac{\partial S_r}{\partial a_1} = -2 \sum x_{1i} (y_i - a_0 - a_1 x_{1i} - a_2 x_{2i})

\frac{\partial S_r}{\partial a_2} = -2 \sum x_{2i} (y_i - a_0 - a_1 x_{1i} - a_2 x_{2i})

The coefficients yielding the minimum sum of the squares of the residuals are
obtained by setting the partial derivatives equal to zero and expressing the
result in matrix form as

\begin{bmatrix} n & \sum x_{1i} & \sum x_{2i} \\ \sum x_{1i} & \sum x_{1i}^2 & \sum x_{1i} x_{2i} \\ \sum x_{2i} & \sum x_{1i} x_{2i} & \sum x_{2i}^2 \end{bmatrix} \begin{Bmatrix} a_0 \\ a_1 \\ a_2 \end{Bmatrix} = \begin{Bmatrix} \sum y_i \\ \sum x_{1i} y_i \\ \sum x_{2i} y_i \end{Bmatrix}
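A minimal Python sketch of this procedure (NumPy is assumed, and names such as `fit_plane` are illustrative) assembles the 3 × 3 normal-equation system above and solves it:

```python
import numpy as np

def fit_plane(x1, x2, y):
    """Least-squares plane y = a0 + a1*x1 + a2*x2 via the normal equations."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Coefficient matrix and right-hand side of the normal equations
    A = np.array([
        [n,           np.sum(x1),       np.sum(x2)],
        [np.sum(x1),  np.sum(x1 ** 2),  np.sum(x1 * x2)],
        [np.sum(x2),  np.sum(x1 * x2),  np.sum(x2 ** 2)],
    ])
    b = np.array([np.sum(y), np.sum(x1 * y), np.sum(x2 * y)])
    return np.linalg.solve(A, b)    # [a0, a1, a2]
```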

Example:
The following data were calculated from the equation y = 5 + 4x_1 - 3x_2:
𝒙𝟏 𝒙𝟐 𝒚
0 0 5
2 1 10
2.5 2 9
1 3 0
4 6 3
7 2 27

Use multiple linear regression to fit these data.


Solution:
Using MS Excel

The summations required to develop the normal equations of multiple linear regression are computed from the data in the table above. The result is

\begin{bmatrix} n & \sum x_{1i} & \sum x_{2i} \\ \sum x_{1i} & \sum x_{1i}^2 & \sum x_{1i} x_{2i} \\ \sum x_{2i} & \sum x_{1i} x_{2i} & \sum x_{2i}^2 \end{bmatrix} \begin{Bmatrix} a_0 \\ a_1 \\ a_2 \end{Bmatrix} = \begin{Bmatrix} \sum y_i \\ \sum x_{1i} y_i \\ \sum x_{2i} y_i \end{Bmatrix}

\begin{bmatrix} 6 & 16.5 & 14 \\ 16.5 & 76.25 & 48 \\ 14 & 48 & 54 \end{bmatrix} \begin{Bmatrix} a_0 \\ a_1 \\ a_2 \end{Bmatrix} = \begin{Bmatrix} 54 \\ 243.5 \\ 100 \end{Bmatrix}
which can be solved using a method such as Gauss elimination for
𝑎0 = 5 𝑎1 = 4 𝑎2 = −3

which is consistent with the original equation from which the data were derived.
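Running the hypothetical `fit_plane` sketch from above on the tabulated data gives the same coefficients:

```python
x1 = [0, 2, 2.5, 1, 4, 7]
x2 = [0, 1, 2, 3, 6, 2]
y = [5, 10, 9, 0, 3, 27]

a0, a1, a2 = fit_plane(x1, x2, y)   # a0 ~ 5, a1 ~ 4, a2 ~ -3
```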
The foregoing two-dimensional case can be easily extended to m dimensions,
as
𝑦 = 𝑎0 + 𝑎1 𝑥1 + 𝑎2 𝑥2 + ⋯ … . + 𝑎𝑚 𝑥𝑚 + 𝑒
where the standard error is formulated as

S_{y/x} = \sqrt{\frac{S_r}{n - (m+1)}}
Although there may be certain cases where a variable is linearly related to
two or more other variables, multiple linear regression has additional utility
in the derivation of power equations of the general form
y = a_0 x_1^{a_1} x_2^{a_2} \cdots x_m^{a_m}
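The module does not show the transformation, but one standard way to fit such a power equation with multiple linear regression (stated here as a supplementary note, not part of the original text) is to take the logarithm of both sides, which makes the model linear in the unknown coefficients:

```latex
\log y = \log a_0 + a_1 \log x_1 + a_2 \log x_2 + \cdots + a_m \log x_m
```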
