Unit 3 - Linear Regression and Correlation Analysis
LEARNING OUTCOMES:
Understand the meaning of regression analysis.
Construct a simple regression model.
Understand correlation analysis and relationships between variables.
Understand and calculate Pearson's correlation coefficient.
Interpret the correlation coefficient.
Calculate and interpret the coefficient of determination.
In simple linear regression, one variable is called the independent or predictor
variable, x, and the other is called the dependent or response variable, y.
Regression Equation
Regression analysis finds the equation that best fits a straight line to the scatter
points. A straight-line graph is defined as follows:
$\hat{y} = b_0 + b_1 x$
where: x = values of the independent variable
$\hat{y}$ = estimated values of the dependent variable
$b_0$ = y-intercept coefficient (where the regression line cuts the y-axis)
$b_1$ = slope (gradient) coefficient of the regression line
Regression analysis uses the method of least squares to find the best-fitting straight-
line equation.
$\hat{y} = b_0 + b_1 x$, where
$b_1 = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - (\sum x)^2}$
$b_0 = \frac{\sum y - b_1 \sum x}{n}$
[Figure: Flat-screen TV sales per week. Scatter plot of Sales (y-axis) against Adverts (x-axis), with fitted line f(x) = 4.3684x + 12.8158.]
Fig. 3.1 Scatter plot of TV sales, with superimposed regression equation.
Example 3.1
Calculate the regression coefficients $b_0$ and $b_1$ for the flat-screen TV sales data.
Adverts (x)    Sales (y)    $x^2$    xy
4              26           16       104
4              28           16       112
3              24           9        72
2              18           4        36
5              35           25       175
2              24           4        48
4              36           16       144
3              25           9        75
5              31           25       155
5              37           25       185
3              30           9        90
4              32           16       128
Totals: $\sum x = 44$, $\sum y = 346$, $\sum x^2 = 174$, $\sum xy = 1324$
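With n = 12 pairs of observations:
$b_1 = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - (\sum x)^2} = \frac{12(1324) - (44)(346)}{12(174) - (44)^2} = \frac{664}{152} = 4.368$
$b_0 = \frac{\sum y - b_1 \sum x}{n} = \frac{346 - 4.368(44)}{12} = 12.817$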
The simple linear regression equation to estimate flat-screen TV sales is given by:
$\hat{y} = 12.817 + 4.368x$
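The arithmetic in Example 3.1 can be checked with a short Python sketch. It applies the summation formulas for the slope and intercept to the twelve (adverts, sales) pairs from the table. The function name `least_squares_fit` and the variable names are illustrative choices, not part of the original notes.

```python
# Data from Example 3.1: adverts placed (x) and weekly flat-screen TV sales (y).
adverts = [4, 4, 3, 2, 5, 2, 4, 3, 5, 5, 3, 4]
sales = [26, 28, 24, 18, 35, 24, 36, 25, 31, 37, 30, 32]


def least_squares_fit(x, y):
    """Return (b0, b1) from the summation formulas for simple linear regression."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_x2 = sum(xi * xi for xi in x)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))

    # Slope: b1 = (n*Sxy - Sx*Sy) / (n*Sx2 - Sx^2)
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: b0 = (Sy - b1*Sx) / n
    b0 = (sum_y - b1 * sum_x) / n
    return b0, b1


b0, b1 = least_squares_fit(adverts, sales)
print(f"b1 = {b1:.3f}, b0 = {b0:.3f}")
```

Because the sketch carries the unrounded slope into the intercept, it gives an intercept of about 12.816 rather than 12.817. The hand calculation rounds the slope to 4.368 first, which accounts for the small difference in the last decimal.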
Estimate y-values using the regression equation
The regression equation can now be used to estimate y-values from known x-values
by substituting a given x-value into the equation.
Example 3.2
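What will be the average sales of flat-screen TVs in a week when 6 advertisements
are placed, using the regression equation from Example 3.1?
Solution
Substitute x = 6 into the regression equation:
$\hat{y} = 12.817 + 4.368(6) = 12.817 + 26.208 = 39.025$
The estimated average sales are about 39 flat-screen TVs in a week with 6 adverts.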
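The substitution in Example 3.2 can be written as a small Python sketch. It assumes the rounded coefficients from Example 3.1. The constant names and the function `estimate_sales` are hypothetical, introduced only for illustration.

```python
# Coefficients from Example 3.1, rounded to three decimals as in the text.
B0 = 12.817  # y-intercept
B1 = 4.368   # slope: extra sales per additional advert


def estimate_sales(num_adverts):
    """Estimate weekly sales from the fitted line y_hat = b0 + b1 * x."""
    return B0 + B1 * num_adverts


estimate = estimate_sales(6)
print(f"Estimated weekly sales for 6 adverts: {estimate:.3f}")
```

The observed advert counts in Example 3.1 range from 2 to 5, so x = 6 lies just outside the data used to fit the line and the estimate should be read with that in mind.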
Example 3.3
Refer to Example 3.1. Find the sample correlation coefficient, r, between the number
of adverts placed and flat-screen TV sales. Comment on the strength of the linear
relationship.
Solution:
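Adverts (x)    Sales (y)    $x^2$    xy     $y^2$
4              26           16       104    676
4              28           16       112    784
3              24           9        72     576
2              18           4        36     324
5              35           25       175    1225
2              24           4        48     576
4              36           16       144    1296
3              25           9        75     625
5              31           25       155    961
5              37           25       185    1369
3              30           9        90     900
4              32           16       128    1024
Totals: $\sum x = 44$, $\sum y = 346$, $\sum x^2 = 174$, $\sum xy = 1324$, $\sum y^2 = 10336$
Pearson's sample correlation coefficient is
$r = \frac{n \sum xy - \sum x \sum y}{\sqrt{[n \sum x^2 - (\sum x)^2][n \sum y^2 - (\sum y)^2]}}$
With n = 12:
$r = \frac{12(1324) - (44)(346)}{\sqrt{[12(174) - (44)^2][12(10336) - (346)^2]}} = \frac{664}{\sqrt{(152)(4316)}} = 0.8198$
Since r = 0.8198 is positive and fairly close to 1, there is a fairly strong positive
linear relationship between the number of adverts placed and flat-screen TV sales.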
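Pearson's correlation coefficient for the data in Example 3.3 can be checked with a short Python sketch that uses the same column totals as the table. The function name `pearson_r` is an illustrative choice, not part of the original notes.

```python
import math

# Data from Example 3.3: adverts placed (x) and weekly flat-screen TV sales (y).
adverts = [4, 4, 3, 2, 5, 2, 4, 3, 5, 5, 3, 4]
sales = [26, 28, 24, 18, 35, 24, 36, 25, 31, 37, 30, 32]


def pearson_r(x, y):
    """Return Pearson's sample correlation coefficient from the column totals."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(xi * xi for xi in x)
    sum_y2 = sum(yi * yi for yi in y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))

    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


r = pearson_r(adverts, sales)
print(f"r = {r:.4f}")
```

The numerator is the same quantity (664) that appears in the slope formula, so r always has the same sign as the slope of the regression line.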
$0 < r^2 < 1$: The strength of association depends on how close $r^2$ lies to either
0 or 1.
When $r^2$ lies closer to 0, there is a weak association between x and y.
When $r^2$ lies closer to 1, there is a strong association between x and y.
Example 3.4
Calculate the sample coefficient of determination, $r^2$, between the number of adverts
placed and flat-screen TV sales in Example 3.3.
Solution:
Given r = 0.8198, then $r^2 = (0.8198)^2 = 0.6721$.
This means that about 67.2% of the variation in weekly flat-screen TV sales is
explained by the number of adverts placed, which indicates a moderate to strong
association between adverts and sales.
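The calculation in Example 3.4 amounts to squaring r and reading the result as a proportion. A minimal Python sketch, assuming the rounded value r = 0.8198 from Example 3.3 (the variable names are illustrative):

```python
# Sample correlation coefficient from Example 3.3, rounded to four decimals.
r = 0.8198

# Coefficient of determination: the proportion of variation in y
# that is explained by the linear relationship with x.
r_squared = r ** 2

print(f"r^2 = {r_squared:.4f}")
print(f"Explained variation: {r_squared:.1%}")
```

Expressing $r^2$ as a percentage makes the interpretation direct: the closer it is to 100%, the more of the variation in sales the regression line accounts for.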