Regression Corr
$$a = \bar{y} - b\bar{x}, \qquad \bar{x} = \frac{\sum x}{n}, \qquad \bar{y} = \frac{\sum y}{n}$$
Simple Linear Regression & Correlation
One way to measure the strength of a linear relationship between two
variables is the product moment correlation coefficient, $r$.
This is a number which lies between -1 and +1. Consider 3 cases:
If $r > 0$ then there is positive linear correlation.
If $r < 0$ then there is negative linear correlation.
If $r = 0$ then there is no linear correlation present.
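The correlation coefficient is calculated using the formula:
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$
From this formula we can also determine the coefficient of determination:
$$cd = r^2 = \frac{\left(n\sum xy - \sum x \sum y\right)^2}{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}$$
This measure represents the ratio of explained variation to total variation.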
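As a rough illustration of how the two formulas translate into a calculation, the following Python sketch computes $r$ and $cd = r^2$ directly from the sums. The function name `correlation` and the small data set are assumptions made for illustration; they do not come from the slides.

```python
import math

def correlation(xs, ys):
    """Product moment correlation coefficient, built from the sums in the formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator

# Illustrative data only (hypothetical): the points lie exactly on y = 2x + 1,
# so the correlation is perfectly positive.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

r = correlation(xs, ys)
cd = r ** 2  # coefficient of determination
print(r, cd)
```

Reversing the order of `ys` would give a perfectly negative correlation, which matches the three cases for the sign of $r$.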
A Practical Example
The table below gives the maintenance cost ($00s) and the age (months) of 10 X-ray machines.
i. Find the least squares regression line of maintenance cost on age and use it to
predict the maintenance cost for a similar machine 40 months old.
ii. Calculate the product moment correlation coefficient between the age of the
machine and the cost for maintenance.
iii. Determine the percentage of the variation in total maintenance cost that is explained
by the variation in machine age.
Question adapted from Business Mathematics & Statistics
Machine       1    2    3    4    5    6    7    8    9   10
Age (x)       5   10   15   20   30   30   30   50   50   60
Cost (y)    190  240  250  300  310  335  300  300  350  395
     x      y      xy     x²       y²
     5    190     950     25    36100
    10    240    2400    100    57600
    15    250    3750    225    62500
    20    300    6000    400    90000
    30    310    9300    900    96100
    30    335   10050    900   112225
    30    300    9000    900    90000
    50    300   15000   2500    90000
    50    350   17500   2500   122500
    60    395   23700   3600   156025

$$\sum x = 300, \quad \sum y = 2970, \quad \sum xy = 97650, \quad \sum x^2 = 12050, \quad \sum y^2 = 913050$$
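The column totals can be checked with a short Python sketch. The variable names (`ages`, `costs`, `rows`, `totals`) are chosen here for illustration; the data are the ten age and cost pairs from the table.

```python
# Age (months) and maintenance cost for machines 1 to 10, as tabulated.
ages = [5, 10, 15, 20, 30, 30, 30, 50, 50, 60]
costs = [190, 240, 250, 300, 310, 335, 300, 300, 350, 395]

# One row per machine: x, y, xy, x^2, y^2.
rows = [(x, y, x * y, x * x, y * y) for x, y in zip(ages, costs)]

# Sum each column to get the five totals used in the formulas.
totals = [sum(column) for column in zip(*rows)]
print(totals)  # [300, 2970, 97650, 12050, 913050]
```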
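i. The slope and intercept of the least squares line $y = a + bx$ are:
$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} = \frac{10(97650) - (300)(2970)}{10(12050) - (300)^2} = \frac{85500}{30500} \approx 2.803$$
$$a = \bar{y} - b\bar{x} = \frac{\sum y}{n} - b\,\frac{\sum x}{n} = \frac{2970}{10} - 2.803\left(\frac{300}{10}\right) \approx 212.9$$
Rounding $b$ to 2.8, the regression line is:
$$y = 212.9 + 2.8x$$
For a machine 40 months old:
$$y = 212.9 + 2.8(40) = \$324.90$$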
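The arithmetic for part i can be reproduced with the sketch below, which assumes only the totals already computed for this example. The variable names are illustrative.

```python
# Totals for the 10 X-ray machines.
n = 10
sum_x, sum_y, sum_xy, sum_x2 = 300, 2970, 97650, 12050

# Least squares slope and intercept for the line y = a + b*x.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = sum_y / n - b * sum_x / n

# Predicted maintenance cost for a machine 40 months old.
predicted = a + b * 40

print(round(b, 3), round(a, 1), round(predicted, 1))  # 2.803 212.9 325.0
```

With unrounded coefficients the prediction is about 325.0. The figure of 324.9 comes from using the rounded values 212.9 and 2.8.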
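Recall
$$n = 10; \quad \sum x = 300; \quad \sum y = 2970; \quad \sum xy = 97650; \quad \sum x^2 = 12050; \quad \sum y^2 = 913050$$
Using the formulae from slide 11
ii. The correlation coefficient =
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$
$$r = \frac{10(97650) - (300)(2970)}{\sqrt{\left[10(12050) - (300)^2\right]\left[10(913050) - (2970)^2\right]}}$$
$$r = 0.88$$
iii. The coefficient of determination =
$$cd = r^2 = (0.88)^2 = 0.77$$
Thus 77% of the variation in maintenance costs is explained by the variation in
machine ages.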
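A quick numerical check of parts ii and iii, sketched in Python from the same totals (variable names are illustrative):

```python
import math

# Totals for the 10 X-ray machines.
n = 10
sum_x, sum_y, sum_xy, sum_x2, sum_y2 = 300, 2970, 97650, 12050, 913050

# Product moment correlation coefficient.
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

# Coefficient of determination: the share of variation in cost explained by age.
cd = r ** 2

print(round(r, 2), round(cd, 2))  # 0.88 0.77
```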