EEPC102-Module_6-Lesson-2
a. Linear Regression
For the straight-line model $y = a_0 + a_1 x$, the residual (error) at each data point is
$$e = y - a_0 - a_1 x$$
Find $a_0$ and $a_1$ by minimizing the sum of the squares of the residuals, $S_r$ (defined below), that is, by setting its partial derivatives equal to zero:
$$\frac{\partial S_r}{\partial a_0} = -2\sum_{i=1}^{n}\left(y_i - a_0 - a_1 x_i\right) = 0 \qquad (1)$$
$$\frac{\partial S_r}{\partial a_1} = -2\sum_{i=1}^{n}\left[\left(y_i - a_0 - a_1 x_i\right)x_i\right] = 0 \qquad (2)$$
From (1),
$$\sum_{i=1}^{n} y_i - \sum_{i=1}^{n} a_0 - \sum_{i=1}^{n} a_1 x_i = 0$$
or, since $\sum a_0 = n a_0$,
$$n a_0 + \left(\sum_{i=1}^{n} x_i\right) a_1 = \sum_{i=1}^{n} y_i \qquad (3)$$
From (2),
$$\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} a_0 x_i - \sum_{i=1}^{n} a_1 x_i^2 = 0$$
or
$$\left(\sum_{i=1}^{n} x_i\right) a_0 + \left(\sum_{i=1}^{n} x_i^2\right) a_1 = \sum_{i=1}^{n} x_i y_i \qquad (4)$$
From (3),
$$a_0 = \frac{1}{n}\sum_{i=1}^{n} y_i - \frac{1}{n}\sum_{i=1}^{n} x_i\, a_1 = \bar{y} - \bar{x}\, a_1$$
where
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$
From (4),
$$\sum_{i=1}^{n} x_i\left(\frac{1}{n}\sum_{i=1}^{n} y_i - \frac{1}{n}\sum_{i=1}^{n} x_i\, a_1\right) + \sum_{i=1}^{n} x_i^2\, a_1 = \sum_{i=1}^{n} x_i y_i$$
$$a_1 = \frac{\sum_{i=1}^{n} x_i y_i - \frac{1}{n}\sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2} = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}$$
$$a_1 = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2}$$
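As a quick illustration of how these formulas for the slope $a_1$ and the intercept $a_0 = \bar{y} - \bar{x}a_1$ translate into a computation, here is a minimal Python sketch; the function name `linear_fit` and the argument names are chosen for illustration and are not part of the module.

```python
# A minimal sketch of the least-squares formulas derived above.
# Names such as linear_fit, xs, ys are illustrative.

def linear_fit(xs, ys):
    """Return (a0, a1) for the line y = a0 + a1*x fitted by least squares."""
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    a1 = (n * sxy - sx * sy) / (n * sxx - sx ** 2)  # slope
    a0 = sy / n - (sx / n) * a1                     # intercept: ybar - xbar*a1
    return a0, a1
```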
Any other line results in a larger sum of the squares of the residuals. Thus, the line is unique and, in terms of our chosen criterion, is a "best" line through the points. A number of additional properties of this fit can be elucidated by examining more closely the way in which the residuals were computed. Recall that the sum of the squares is defined as
$$S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n}\left(y_i - a_0 - a_1 x_i\right)^2$$
These ideas apply to cases where (1) the spread of the points around the line is of similar magnitude along the entire range of the data and (2) the distribution of these points about the line is normal. It can be demonstrated that if these criteria are met, least-squares regression will provide the best (that is, the most likely) estimates of $a_0$ and $a_1$ (Draper and Smith, 1981). This is called the maximum likelihood principle in statistics. In addition, if these criteria are met, a "standard deviation" for the regression line can be determined as
$$S_{y/x} = \sqrt{\frac{S_r}{n-2}}$$
where $S_{y/x}$ is called the standard error of the estimate. The subscript notation "$y/x$" designates that the error is for a predicted value of $y$ corresponding to a particular value of $x$. Also, notice that we now divide by $n - 2$ because two data-derived estimates, $a_0$ and $a_1$, were used to compute $S_r$; thus, we have lost two degrees of freedom.
The standard error of the estimate quantifies the spread of the data. However, $S_{y/x}$ quantifies the spread around the regression line, as shown in Fig. 2b, in contrast to the original standard deviation $S_y$, which quantifies the spread around the mean (Fig. 2a).
The above concepts can be used to quantify the "goodness" of our fit. This is particularly useful for comparing several regressions (Fig. 3). To do this, we return to the original data and determine the total sum of the squares around the mean for the dependent variable (in our case, $y$). This quantity is designated $S_t$; it is the magnitude of the residual error associated with the dependent variable prior to regression.
$$S_y = \sqrt{\frac{S_t}{n-1}} = \sqrt{\frac{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}{n-1}}, \quad \text{the standard deviation of the data points,}$$
where
$$S_t = \sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2$$
After the regression, the remaining spread around the line is $S_r$. The reduction is quantified by the coefficient of determination,
$$r^2 = \frac{S_t - S_r}{S_t}$$
and its square root, $r$, is the correlation coefficient.
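The quantities $S_t$, $S_r$, $S_y$, $S_{y/x}$, and $r^2$ follow directly from their definitions. The Python sketch below assumes the line coefficients $a_0$ and $a_1$ are already known; the function name `fit_statistics` is illustrative.

```python
import math

def fit_statistics(xs, ys, a0, a1):
    """Goodness-of-fit measures for the line y = a0 + a1*x."""
    n = len(xs)
    ybar = sum(ys) / n
    st = sum((y - ybar) ** 2 for y in ys)                      # spread about the mean
    sr = sum((y - a0 - a1 * x) ** 2 for x, y in zip(xs, ys))   # spread about the line
    s_y = math.sqrt(st / (n - 1))    # standard deviation of the data
    s_yx = math.sqrt(sr / (n - 2))   # standard error of the estimate
    r2 = (st - sr) / st              # coefficient of determination
    return s_y, s_yx, r2
```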
FIGURE 2 Regression data showing (a) the spread of the data around
the mean of the dependent variable and (b) the spread of the data around
the best-fit line. The reduction in the spread in going from (a) to (b), as
indicated by the bell-shaped curves at the right, represents the
improvement due to linear regression.
FIGURE 3 Examples of linear regression with (a) small and (b) large
residual errors.
Example:
Fit a straight line to the x and y values in the first two columns of the table below. Also, compute the total standard deviation, the standard error of the estimate, and the correlation coefficient for the data.
𝑥𝑖 𝑦𝑖
1 0.5
2 2.5
3 2.0
4 4.0
5 3.5
6 6.0
7 5.5
SOLUTION:
Using MS Excel,
$$n = 7, \qquad \sum x_i = 28, \qquad \bar{x} = \frac{28}{7} = 4$$
$$\sum y_i = 24, \qquad \bar{y} = \frac{24}{7} = 3.428571$$
$$\sum x_i y_i = 119.5, \qquad \sum x_i^2 = 140$$
Thus,
$$a_1 = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2} = \frac{7(119.5) - (28)(24)}{7(140) - 28^2} = 0.8392857$$
$$a_0 = \bar{y} - \bar{x}\, a_1 = 3.428571 - (4)(0.8392857) = 0.0714282$$
Therefore, the least-squares fit is
$$y = 0.0714282 + 0.8392857x$$
For this line, $S_t = \sum(y_i - \bar{y})^2 = 22.7143$ and $S_r = \sum(y_i - a_0 - a_1 x_i)^2 = 2.9911$, so the standard deviation of the data is $S_y = \sqrt{22.7143/6} = 1.9457$, the standard error of the estimate is $S_{y/x} = \sqrt{2.9911/5} = 0.7735$, and the coefficient of determination is $r^2 = (22.7143 - 2.9911)/22.7143 = 0.868$, giving a correlation coefficient of $r = 0.932$. These results indicate that 86.8 percent of the original uncertainty has been explained by the linear model.
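As a cross-check on the hand calculation (a sketch, assuming NumPy is available), a library routine such as `numpy.polyfit` reproduces the same coefficients and the 86.8 percent figure:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7], dtype=float)
y = np.array([0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5])

a1, a0 = np.polyfit(x, y, 1)   # degree-1 (straight-line) least-squares fit
print(a0, a1)                  # approximately 0.071429 and 0.839286

r = np.corrcoef(x, y)[0, 1]    # for a straight-line fit, r**2 equals (St - Sr)/St
print(r ** 2)                  # approximately 0.868
```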
b. Polynomial Regression
The least-squares procedure can be readily extended to fit the data to a higher-order polynomial. For example, suppose that we fit a second-order polynomial, or quadratic:
$$y = a_0 + a_1 x + a_2 x^2 + e$$
For this case the sum of the squares of the residuals is
$$S_r = \sum_{i=1}^{n}\left(y_i - a_0 - a_1 x_i - a_2 x_i^2\right)^2$$
Taking the derivative with respect to each of the unknown coefficients gives
$$\frac{\partial S_r}{\partial a_0} = -2\sum\left(y_i - a_0 - a_1 x_i - a_2 x_i^2\right)$$
$$\frac{\partial S_r}{\partial a_1} = -2\sum x_i\left(y_i - a_0 - a_1 x_i - a_2 x_i^2\right)$$
$$\frac{\partial S_r}{\partial a_2} = -2\sum x_i^2\left(y_i - a_0 - a_1 x_i - a_2 x_i^2\right)$$
These equations can be set equal to zero and rearranged to develop the
following set of normal equations:
$$\begin{aligned}
(n)\,a_0 + \left(\sum x_i\right)a_1 + \left(\sum x_i^2\right)a_2 &= \sum y_i \\
\left(\sum x_i\right)a_0 + \left(\sum x_i^2\right)a_1 + \left(\sum x_i^3\right)a_2 &= \sum x_i y_i \\
\left(\sum x_i^2\right)a_0 + \left(\sum x_i^3\right)a_1 + \left(\sum x_i^4\right)a_2 &= \sum x_i^2 y_i
\end{aligned}$$
where all summations are from 𝑖 = 1 through 𝑛. Note that the above
three equations are linear and have three unknowns: 𝑎0 , 𝑎1 , and 𝑎2 . The
coefficients of the unknowns can be calculated directly from the observed
data.
For this case, we see that the problem of determining a least-squares
second-order polynomial is equivalent to solving a system of three
simultaneous linear equations.
The second-order case can be readily extended to an mth-order polynomial as
$$y = a_0 + a_1 x + a_2 x^2 + \cdots + a_m x^m + e$$
The foregoing analysis can be easily extended to this more general case.
Thus, we can recognize that determining the coefficients of an mth-order
polynomial is equivalent to solving a system of 𝑚 + 1 simultaneous linear
equations. For this case, the standard error is formulated as
$$S_{y/x} = \sqrt{\frac{S_r}{n - (m+1)}}$$
This quantity is divided by $n - (m + 1)$ because $(m + 1)$ data-derived coefficients ($a_0, a_1, \ldots, a_m$) were used to compute $S_r$; thus, we have lost $(m + 1)$ degrees of freedom. In addition to the standard error, a coefficient of determination can also be computed for polynomial regression using the same formula as for linear regression, $r^2 = (S_t - S_r)/S_t$.
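A short sketch of one way to assemble and solve these $m + 1$ normal equations in Python follows; the function name `poly_least_squares` is illustrative, and `numpy.linalg.solve` stands in for Gauss elimination.

```python
import numpy as np

def poly_least_squares(x, y, m):
    """Fit y = a0 + a1*x + ... + am*x**m by solving the (m+1)x(m+1) normal equations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # A[j, k] = sum(x**(j+k)),  b[j] = sum(x**j * y)
    A = np.array([[np.sum(x ** (j + k)) for k in range(m + 1)] for j in range(m + 1)])
    b = np.array([np.sum(x ** j * y) for j in range(m + 1)])
    a = np.linalg.solve(A, b)                       # coefficients a0, a1, ..., am
    sr = np.sum((y - np.polyval(a[::-1], x)) ** 2)  # residual sum of squares
    s_yx = np.sqrt(sr / (len(x) - (m + 1)))         # standard error of the estimate
    return a, s_yx
```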
Example:
Fit a second-order polynomial to the data in the first two columns
𝑥𝑖 𝑦𝑖
0 2.1
1 7.7
2 13.6
3 27.2
4 40.9
5 61.1
Solution:
Using MS Excel
$$m = 2, \qquad n = 6, \qquad \sum x_i = 15, \qquad \bar{x} = \frac{15}{6} = 2.5$$
$$\sum x_i^2 = 55, \qquad \sum x_i^3 = 225, \qquad \sum x_i^4 = 979$$
$$\sum y_i = 152.6, \qquad \bar{y} = \frac{152.6}{6} = 25.433$$
$$\sum x_i y_i = 585.6, \qquad \sum x_i^2 y_i = 2488.8$$
Thus,
$$\begin{bmatrix} 6 & 15 & 55 \\ 15 & 55 & 225 \\ 55 & 225 & 979 \end{bmatrix}
\begin{Bmatrix} a_0 \\ a_1 \\ a_2 \end{Bmatrix} =
\begin{Bmatrix} 152.6 \\ 585.6 \\ 2488.8 \end{Bmatrix}$$
Solving this system (for example, by Gauss elimination) gives $a_0 = 2.47857$, $a_1 = 2.35929$, and $a_2 = 1.86071$, so the least-squares quadratic is
$$y = 2.47857 + 2.35929x + 1.86071x^2$$
The resulting sum of the squares of the residuals is $S_r = 3.74657$, and the standard error of the estimate is
$$S_{y/x} = \sqrt{\frac{S_r}{n - (m+1)}} = \sqrt{\frac{3.74657}{6 - (2+1)}} = 1.1175$$
The coefficient of determination is
$$r^2 = \frac{2513.39333 - 3.74657}{2513.39333} = 0.998509$$
or the correlation coefficient is
$$r = 0.99925$$
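The same numbers can be checked with `numpy.polyfit` (a sketch; the variable names are illustrative):

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5], dtype=float)
y = np.array([2.1, 7.7, 13.6, 27.2, 40.9, 61.1])

a2, a1, a0 = np.polyfit(x, y, 2)                     # second-order least-squares fit
sr = np.sum((y - (a0 + a1 * x + a2 * x ** 2)) ** 2)  # residual sum of squares
st = np.sum((y - y.mean()) ** 2)                     # total sum of squares

print(a0, a1, a2)              # approximately 2.4786, 2.3593, 1.8607
print(np.sqrt(sr / (6 - 3)))   # standard error of the estimate, about 1.12
print((st - sr) / st)          # coefficient of determination, about 0.99851
```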
c. Multiple Linear Regression

A useful extension of linear regression is the case where $y$ is a linear function of two or more independent variables, for example
$$y = a_0 + a_1 x_1 + a_2 x_2 + e$$
FIGURE 4 Graphical depiction of multiple linear regression, where y is a linear function of $x_1$ and $x_2$.
As with the previous cases, the "best" values of the coefficients are determined by setting up the sum of the squares of the residuals,
$$S_r = \sum_{i=1}^{n}\left(y_i - a_0 - a_1 x_{1i} - a_2 x_{2i}\right)^2$$
The coefficients yielding the minimum sum of the squares of the residuals are obtained by setting the partial derivatives equal to zero and expressing the result in matrix form as
$$\begin{bmatrix} n & \sum x_{1i} & \sum x_{2i} \\ \sum x_{1i} & \sum x_{1i}^2 & \sum x_{1i}x_{2i} \\ \sum x_{2i} & \sum x_{1i}x_{2i} & \sum x_{2i}^2 \end{bmatrix}
\begin{Bmatrix} a_0 \\ a_1 \\ a_2 \end{Bmatrix} =
\begin{Bmatrix} \sum y_i \\ \sum x_{1i} y_i \\ \sum x_{2i} y_i \end{Bmatrix}$$
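The sketch below (illustrative names, assuming NumPy) assembles exactly this 3×3 system and solves it:

```python
import numpy as np

def two_variable_fit(x1, x2, y):
    """Solve the normal equations for y = a0 + a1*x1 + a2*x2."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    A = np.array([
        [n,         x1.sum(),        x2.sum()],
        [x1.sum(),  (x1 ** 2).sum(), (x1 * x2).sum()],
        [x2.sum(),  (x1 * x2).sum(), (x2 ** 2).sum()],
    ])
    b = np.array([y.sum(), (x1 * y).sum(), (x2 * y).sum()])
    return np.linalg.solve(A, b)   # a0, a1, a2
```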
Example:
The following data were calculated from the equation $y = 5 + 4x_1 - 3x_2$:
𝒙𝟏 𝒙𝟐 𝒚
0 0 5
2 1 10
2.5 2 9
1 3 0
4 6 3
7 2 27
Solution:
Using MS Excel
$$\begin{bmatrix} n & \sum x_{1i} & \sum x_{2i} \\ \sum x_{1i} & \sum x_{1i}^2 & \sum x_{1i}x_{2i} \\ \sum x_{2i} & \sum x_{1i}x_{2i} & \sum x_{2i}^2 \end{bmatrix}
\begin{Bmatrix} a_0 \\ a_1 \\ a_2 \end{Bmatrix} =
\begin{Bmatrix} \sum y_i \\ \sum x_{1i} y_i \\ \sum x_{2i} y_i \end{Bmatrix}$$
$$\begin{bmatrix} 6 & 16.5 & 14 \\ 16.5 & 76.25 & 48 \\ 14 & 48 & 54 \end{bmatrix}
\begin{Bmatrix} a_0 \\ a_1 \\ a_2 \end{Bmatrix} =
\begin{Bmatrix} 54 \\ 243.5 \\ 100 \end{Bmatrix}$$
which can be solved using a method such as Gauss elimination for
$$a_0 = 5, \qquad a_1 = 4, \qquad a_2 = -3$$
which is consistent with the original equation from which these data were derived.
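Equivalently (a sketch, assuming NumPy), the coefficients can be recovered by a least-squares solve on the design matrix $[1,\ x_1,\ x_2]$, which gives the same result as Gauss elimination on the normal equations:

```python
import numpy as np

x1 = np.array([0, 2, 2.5, 1, 4, 7])
x2 = np.array([0, 1, 2, 3, 6, 2], dtype=float)
y = np.array([5, 10, 9, 0, 3, 27], dtype=float)

A = np.column_stack([np.ones_like(x1), x1, x2])  # design matrix [1, x1, x2]
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(coef)   # approximately [5., 4., -3.]
```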
The foregoing two-dimensional case can be easily extended to m dimensions, as
$$y = a_0 + a_1 x_1 + a_2 x_2 + \cdots + a_m x_m + e$$
where the standard error is formulated as
$$S_{y/x} = \sqrt{\frac{S_r}{n - (m+1)}}$$
Although there may be certain cases where a variable is linearly related to
two or more other variables, multiple linear regression has additional utility
in the derivation of power equations of the general form
$$y = a_0 x_1^{a_1} x_2^{a_2} \cdots x_m^{a_m}$$
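One common way to fit such a power equation (a standard transformation, stated here as an assumption rather than as part of the module) is to take logarithms, which converts it into the multiple linear form above: $\log y = \log a_0 + a_1 \log x_1 + \cdots + a_m \log x_m$. A minimal sketch for two variables, assuming strictly positive data and illustrative names:

```python
import numpy as np

def fit_power_model(x1, x2, y):
    """Fit y = a0 * x1**a1 * x2**a2 by linear regression on the logarithms."""
    X = np.column_stack([np.ones(len(y)), np.log(x1), np.log(x2)])
    coef, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
    return np.exp(coef[0]), coef[1], coef[2]   # a0, a1, a2
```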