Unit 8.1 Correlation-Regression
Correlation and Regression
• Overview
• Correlation
• Regression
• Variation and Prediction Intervals
• Multiple Regression
• Modeling
Overview
Paired Data
• Is there a relationship?
• If so, what is the equation?
• Use the equation for prediction.
Correlation
Definition
A correlation exists between two variables when one of them is related to the other in some way.
Assumptions
1. The sample of paired data (x, y) is a random sample.
2. The pairs of (x, y) data have a bivariate normal distribution.
Definition
A scatter plot (or scatter diagram) is a graph in which the paired (x, y) sample data are plotted with a horizontal x axis and a vertical y axis. Each individual (x, y) pair is plotted as a single point.
[Figure: Scatter Diagram of Paired Data]
Scatter Plots: Positive Linear Correlation
[Figure: three scatter plots of y against x: (a) Positive, (b) Strong positive, (c) Perfect positive]
Scatter Plots: Negative Linear Correlation
[Figure: three scatter plots of y against x: (d) Negative, (e) Strong negative, (f) Perfect negative]
Scatter Plots: No Linear Correlation
[Figure: two scatter plots of y against x: (g) No Correlation, (h) Nonlinear Correlation]
Definition
The linear correlation coefficient r measures the strength of the linear relationship between paired x and y values in a sample.
$$
r = \frac{n\Sigma xy - (\Sigma x)(\Sigma y)}{\sqrt{n(\Sigma x^2) - (\Sigma x)^2}\;\sqrt{n(\Sigma y^2) - (\Sigma y)^2}}
$$
Formula 1
(Σx)² indicates that the x values should be added and the total then squared. Σx² indicates that each x value should be squared first and the squares then added.
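As a minimal sketch of how Formula 1 can be evaluated, the Python function below builds r from the five sums that appear in the formula. The function name `linear_correlation` is an illustrative choice, not part of the course material; the data are the Garbage Project values used later in this unit.

```python
from math import sqrt

def linear_correlation(x, y):
    """Linear correlation coefficient r from the shortcut formula (Formula 1)."""
    if len(x) != len(y):
        raise ValueError("x and y must contain the same number of values")
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)  # Σx²: square each value, then add
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    # (Σx)² below is sum_x ** 2: add the values first, then square the total
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator

# Garbage Project data from this unit
plastic = [0.27, 1.41, 2.19, 2.83, 2.19, 1.81, 0.85, 3.05]
household = [2, 3, 3, 6, 4, 2, 1, 5]

r = linear_correlation(plastic, household)
print(round(r, 3))  # 0.842, the value reported in the worked example
```

The function divides by zero if all x values (or all y values) are identical, because r is undefined in that case.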
Interpreting the Linear Correlation Coefficient
If the absolute value of r exceeds the value in Table A-6, conclude that there is a significant linear correlation.
TABLE A-6 Critical Values of the Pearson Correlation Coefficient r
n α = .05 α = .01
4 .950 .999
5 .878 .959
6 .811 .917
7 .754 .875
8 .707 .834
9 .666 .798
10 .632 .765
11 .602 .735
12 .576 .708
13 .553 .684
14 .532 .661
15 .514 .641
16 .497 .623
17 .482 .606
18 .468 .590
19 .456 .575
20 .444 .561
25 .396 .505
30 .361 .463
35 .335 .430
40 .312 .402
45 .294 .378
50 .279 .361
60 .254 .330
70 .236 .305
80 .220 .286
90 .207 .269
100 .196 .256
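The decision rule for Table A-6 can be sketched as a simple lookup. The dictionary below copies the table values; the names `TABLE_A6`, `critical_r`, and `is_significant` are hypothetical helpers introduced only for illustration, and sample sizes not listed in the table are rejected rather than interpolated.

```python
# Critical values of r copied from Table A-6: n -> (alpha = .05, alpha = .01)
TABLE_A6 = {
    4: (0.950, 0.999), 5: (0.878, 0.959), 6: (0.811, 0.917),
    7: (0.754, 0.875), 8: (0.707, 0.834), 9: (0.666, 0.798),
    10: (0.632, 0.765), 11: (0.602, 0.735), 12: (0.576, 0.708),
    13: (0.553, 0.684), 14: (0.532, 0.661), 15: (0.514, 0.641),
    16: (0.497, 0.623), 17: (0.482, 0.606), 18: (0.468, 0.590),
    19: (0.456, 0.575), 20: (0.444, 0.561), 25: (0.396, 0.505),
    30: (0.361, 0.463), 35: (0.335, 0.430), 40: (0.312, 0.402),
    45: (0.294, 0.378), 50: (0.279, 0.361), 60: (0.254, 0.330),
    70: (0.236, 0.305), 80: (0.220, 0.286), 90: (0.207, 0.269),
    100: (0.196, 0.256),
}

def critical_r(n, alpha=0.05):
    """Return the Table A-6 critical value for sample size n."""
    if n not in TABLE_A6:
        raise KeyError(f"n = {n} is not listed in Table A-6")
    column = {0.05: 0, 0.01: 1}[alpha]  # only the two tabulated levels
    return TABLE_A6[n][column]

def is_significant(r, n, alpha=0.05):
    """True when |r| exceeds the tabulated critical value."""
    return abs(r) > critical_r(n, alpha)

print(critical_r(8))             # 0.707
print(is_significant(0.842, 8))  # True
```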
Properties of the Linear Correlation Coefficient r
1. -1 ≤ r ≤ 1
2. The value of r does not change if all values of either variable are converted to a different scale.
3. The value of r is not affected by the choice of x and y. Interchange x and y and the value of r will not change.
4. r measures the strength of a linear relationship.
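Properties 1 through 3 can be checked numerically. The sketch below is only an illustration: it reuses the Garbage Project data from this unit, converts pounds to kilograms as an example of a change of scale, and uses a hypothetical helper named `linear_correlation`.

```python
from math import sqrt

def linear_correlation(x, y):
    """Linear correlation coefficient r from the shortcut formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator

plastic_lb = [0.27, 1.41, 2.19, 2.83, 2.19, 1.81, 0.85, 3.05]
household = [2, 3, 3, 6, 4, 2, 1, 5]

r_original = linear_correlation(plastic_lb, household)

# Property 2: change of scale (pounds to kilograms) leaves r unchanged
plastic_kg = [0.45359237 * value for value in plastic_lb]
r_rescaled = linear_correlation(plastic_kg, household)

# Property 3: interchanging x and y leaves r unchanged
r_swapped = linear_correlation(household, plastic_lb)

print(round(r_original, 3), round(r_rescaled, 3), round(r_swapped, 3))
```

All three printed values agree, up to floating-point rounding.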
Common Errors Involving Correlation
[Figure 9-2: Scatterplot of Distance above Ground and Time for Object Thrown Upward. Horizontal axis: Time (seconds), 0 to 8; vertical axis: Distance (feet), 0 to 250]
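A plot like Figure 9-2 is a reminder that r measures only linear association. The sketch below is an assumption-laden illustration, not the data behind the figure: it generates hypothetical heights from h = 128t - 16t² (an object thrown upward at 128 ft/s, chosen so that the values fit the axis ranges of the figure) and shows that r is 0 even though distance is completely determined by time.

```python
from math import sqrt

def linear_correlation(x, y):
    """Linear correlation coefficient r from the shortcut formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator

# Hypothetical data: time in seconds and height in feet, h = 128t - 16t^2
times = list(range(9))                           # 0, 1, ..., 8
heights = [128 * t - 16 * t ** 2 for t in times]  # 0, 112, 192, 240, 256, ...

r = linear_correlation(times, heights)
print(r)  # no linear correlation, despite an exact nonlinear relationship
```

Because the hypothetical heights are symmetric about t = 4, the positive and negative contributions to the numerator cancel.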
Formal Hypothesis Test
To determine whether there is a significant linear correlation between two variables, either of two methods can be used. Both methods use the same hypotheses:
H0: ρ = 0 (no significant linear correlation)
H1: ρ ≠ 0 (significant linear correlation)
Method 1: Test Statistic is t
(follows format of earlier chapters)
Test statistic:
$$
t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}
$$
Critical values: use the t distribution with n - 2 degrees of freedom.
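A minimal sketch of Method 1, using r = 0.842 and n = 8 from the Garbage Project example in this unit. The function name `t_statistic` is illustrative, and the critical value 2.447 is the standard two-tailed t value for α = 0.05 with 6 degrees of freedom, hard-coded here as an assumption instead of being looked up.

```python
from math import sqrt

def t_statistic(r, n):
    """Test statistic t = r / sqrt((1 - r^2) / (n - 2)) for H0: rho = 0."""
    if n <= 2:
        raise ValueError("n must be greater than 2")
    if abs(r) >= 1:
        raise ValueError("t is undefined when |r| = 1")
    return r / sqrt((1 - r ** 2) / (n - 2))

r, n = 0.842, 8
t = t_statistic(r, n)

# Assumed table value: two-tailed, alpha = 0.05, n - 2 = 6 degrees of freedom
T_CRITICAL = 2.447
reject_h0 = abs(t) > T_CRITICAL

print(round(t, 2), reject_h0)
```

With these inputs t is about 3.82, which exceeds 2.447, so Method 1 also rejects H0.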
Method 2: Test Statistic is r
(uses fewer calculations)
Test statistic: r
Critical values: Refer to Table A-6
(no degrees of freedom)
[Figure: number line for r from -1 to 1, with critical values r = -0.811 and r = 0.811 marked. Sample data: r = 0.828]
[Flowchart: testing for a linear correlation. Start; let H0: ρ = 0 and H1: ρ ≠ 0; calculate r using Formula 9-1; then continue with Method 1 or Method 2]
Is there a significant linear correlation?
Data from the Garbage Project
x  Plastic (lb)   0.27  1.41  2.19  2.83  2.19  1.81  0.85  3.05
y  Household      2     3     3     6     4     2     1     5
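For reference, the value of r for these data can be worked by hand from the shortcut formula for r. The sums below were computed from the eight (x, y) pairs listed above; the intermediate values are rounded for display.

```latex
\[
n = 8,\quad \Sigma x = 14.60,\quad \Sigma y = 26,\quad
\Sigma x^2 = 32.9632,\quad \Sigma y^2 = 104,\quad \Sigma xy = 56.80
\]
\[
r = \frac{8(56.80) - (14.60)(26)}
         {\sqrt{8(32.9632) - (14.60)^2}\;\sqrt{8(104) - (26)^2}}
  = \frac{74.8}{\sqrt{50.5456}\;\sqrt{156}}
  \approx 0.842
\]
```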
n = 8, α = 0.05
H0: ρ = 0
H1: ρ ≠ 0
Test statistic is r = 0.842
Critical values are r = -0.707 and r = 0.707 (Table A-6, row n = 8, column α = .05)
0.842 > 0.707; that is, the test statistic falls within the critical region. Reject H0: ρ = 0 and conclude that there is a significant linear correlation.
[Figure: number line for r from -1 to 1, with critical values r = -0.707 and r = 0.707 marked. Sample data: r = 0.842, in the right-hand critical region]
Justification for r Formula
Formula 9-1 is developed from
$$
r = \frac{\Sigma (x - \bar{x})(y - \bar{y})}{(n - 1)\, s_x s_y}
$$
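The deviation form above can be checked against the shortcut formula. The sketch below computes r from deviations about the means using Python's standard `statistics` module; the function name `r_from_deviations` is an illustrative choice, and the data are the Garbage Project values from this unit.

```python
from statistics import mean, stdev

def r_from_deviations(x, y):
    """r = Σ(x - x̄)(y - ȳ) / ((n - 1) s_x s_y)."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)  # sample standard deviations (n - 1 divisor)
    cross_products = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    return cross_products / ((n - 1) * s_x * s_y)

plastic = [0.27, 1.41, 2.19, 2.83, 2.19, 1.81, 0.85, 3.05]
household = [2, 3, 3, 6, 4, 2, 1, 5]

r = r_from_deviations(plastic, household)
print(round(r, 3))  # 0.842, the same value the shortcut formula gives
```

Each product (x - x̄)(y - ȳ) is positive for points in Quadrants 1 and 3 relative to (x̄, ȳ) and negative in Quadrants 2 and 4, which is why the sign of r follows the direction of the pattern.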
[Figure: scatterplot with x axis from 0 to 7 and y axis from 0 to 16, divided into Quadrants 1 through 4 by lines through the point (x̄, ȳ), where ȳ = 11]