
Unit 8.1 Correlation-Regression

Correlation and regression analysis can be used to study the relationship between two quantitative variables. Correlation determines if a linear relationship exists between two variables, while regression allows predicting the value of one variable based on the other. The correlation coefficient r measures the strength and direction of the linear relationship between two variables. A scatter plot visually depicts the relationship. Hypothesis testing using r as the test statistic or a t-statistic derived from r can determine if a significant linear correlation exists between the variables. Multiple regression generalizes the analysis to three or more variables.

Uploaded by tebebe solomon
Copyright © All Rights Reserved

Correlation and Regression

• Overview
• Correlation
• Regression
• Variation and Prediction Intervals
• Multiple Regression
• Modeling

Overview

Paired Data
• Is there a relationship?
• If so, what is the equation?
• Use the equation for prediction.

Correlation

Definition
Correlation exists between two variables when one of them is related to the other in some way.

Assumptions
1. The sample of paired data (x, y) is a random sample.
2. The pairs of (x, y) data have a bivariate normal distribution.

Definition
A scatter plot (or scatter diagram) is a graph in which the paired (x, y) sample data are plotted with a horizontal x axis and a vertical y axis. Each individual (x, y) pair is plotted as a single point.

[Figure: Scatter Diagram of Paired Data]

Positive Linear Correlation
[Figure: scatter plots showing (a) positive, (b) strong positive, and (c) perfect positive linear correlation]

Negative Linear Correlation
[Figure: scatter plots showing (d) negative, (e) strong negative, and (f) perfect negative linear correlation]

No Linear Correlation
[Figure: scatter plots showing (g) no correlation and (h) a nonlinear correlation]

Definition
The linear correlation coefficient r measures the strength of the linear relationship between the paired x and y values in a sample; the sign of r gives the direction of that relationship.
$$ r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n(\sum x^2) - (\sum x)^2}\;\sqrt{n(\sum y^2) - (\sum y)^2}} $$

Formula 1

Calculators can compute r

ρ (rho) is the linear correlation coefficient for all paired data in the population.

Notation for the Linear Correlation Coefficient

n = number of pairs of data presented

Σ denotes the addition of the items indicated.

Σx denotes the sum of all x values.

Σx² indicates that each x score should be squared and then those squares added.

(Σx)² indicates that the x scores should be added and the total then squared.

Σxy indicates that each x score should first be multiplied by its corresponding y score. After obtaining all such products, find their sum.

r represents the linear correlation coefficient for a sample.

ρ represents the linear correlation coefficient for a population.
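Each piece of this notation maps onto one line of code. The sketch below is a minimal plain-Python illustration of the r formula built from these sums; the function name `pearson_r` is our own, and it is applied to the Garbage Project data that appears later in this unit.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Sample linear correlation coefficient r, computed from the sums in the r formula."""
    if len(xs) != len(ys):
        raise ValueError("x and y must contain the same number of values")
    n = len(xs)                                   # n: number of pairs
    sum_x = sum(xs)                               # Σx
    sum_y = sum(ys)                               # Σy
    sum_x2 = sum(x * x for x in xs)               # Σx²: square each x, then add
    sum_y2 = sum(y * y for y in ys)               # Σy²
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # Σxy: multiply each pair, then add
    numerator = n * sum_xy - sum_x * sum_y
    # sum_x ** 2 is (Σx)², which is not the same as Σx²
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator


# Garbage Project data: x = plastic (lb), y = household size
plastic = [0.27, 1.41, 2.19, 2.83, 2.19, 1.81, 0.85, 3.05]
household = [2, 3, 3, 6, 4, 2, 1, 5]
print(round(pearson_r(plastic, household), 3))  # 0.842
```

Rounding to three decimal places at the end, rather than rounding the intermediate sums, keeps the result comparable with the critical values in Table A-6.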

Rounding the Linear Correlation Coefficient r
• Round r to three decimal places so that it can be compared to the critical values in Table A-6.
• Use a calculator or computer if possible.

Interpreting the Linear Correlation Coefficient
If the absolute value of r exceeds the value in Table A-6, conclude that there is a significant linear correlation.

Otherwise, there is not sufficient evidence to support the conclusion of a significant linear correlation.

TABLE A-6 Critical Values of the Pearson Correlation Coefficient r
n α = .05 α = .01
4 .950 .990
5 .878 .959
6 .811 .917
7 .754 .875
8 .707 .834
9 .666 .798
10 .632 .765
11 .602 .735
12 .576 .708
13 .553 .684
14 .532 .661
15 .514 .641
16 .497 .623
17 .482 .606
18 .468 .590
19 .456 .575
20 .444 .561
25 .396 .505
30 .361 .463
35 .335 .430
40 .312 .402
45 .294 .378
50 .279 .361
60 .254 .330
70 .236 .305
80 .220 .286
90 .207 .269
100 .196 .256
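The entries of Table A-6 are linked to the t distribution. Solving the t test statistic t = r / sqrt((1 - r²)/(n - 2)) for r gives r = t / sqrt(t² + n - 2), so a two-tailed critical r can be obtained from a critical t with n - 2 degrees of freedom. This is a minimal sketch assuming SciPy is installed; `critical_r` is a hypothetical helper name. It reproduces rows such as n = 8 (0.707) and n = 30 (0.361, 0.463), although a printed table may differ in the third decimal for some larger n.

```python
from math import sqrt

from scipy import stats


def critical_r(n, alpha=0.05):
    """Two-tailed critical value of r for n pairs, derived from the t distribution."""
    df = n - 2
    t_crit = stats.t.ppf(1 - alpha / 2, df)   # two-tailed critical t with n - 2 df
    return t_crit / sqrt(t_crit ** 2 + df)    # invert t = r / sqrt((1 - r^2) / (n - 2))


for n in (4, 8, 15, 30):
    print(n, round(critical_r(n, 0.05), 3), round(critical_r(n, 0.01), 3))
```

For n = 4 the relation reduces exactly to r = 1 - α, which is why that row reads .950 and .990.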

Properties of the Linear Correlation Coefficient r

1. -1 ≤ r ≤ 1
2. The value of r does not change if all values of either variable are converted to a different scale.
3. The value of r is not affected by which variable is called x and which is called y: interchange x and y and the value of r will not change.
4. r measures the strength of a linear relationship; it is not designed to measure the strength of a relationship that is not linear.
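Properties 1 to 3 are easy to check numerically. The sketch below assumes NumPy is available and uses `np.corrcoef`, whose off-diagonal entry is the sample r. The "change of scale" shown is pounds to kilograms, i.e. multiplication by a positive constant; multiplying by a negative constant would flip the sign of r.

```python
import numpy as np

x = np.array([0.27, 1.41, 2.19, 2.83, 2.19, 1.81, 0.85, 3.05])  # plastic (lb)
y = np.array([2, 3, 3, 6, 4, 2, 1, 5])                          # household size


def corr(a, b):
    """Sample r: the off-diagonal entry of NumPy's 2x2 correlation matrix."""
    return np.corrcoef(a, b)[0, 1]


r_original = corr(x, y)
r_rescaled = corr(x * 0.45359237, y)   # property 2: pounds converted to kilograms
r_swapped = corr(y, x)                 # property 3: x and y interchanged

print(round(r_original, 3), round(r_rescaled, 3), round(r_swapped, 3))
assert -1 <= r_original <= 1                 # property 1
assert np.isclose(r_original, r_rescaled)    # unchanged by the change of scale
assert np.isclose(r_original, r_swapped)     # unchanged by swapping x and y
```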

Common Errors Involving Correlation

1. Causation: It is wrong to conclude that correlation implies causality.

2. Averages: Averages suppress individual variation and may inflate the correlation coefficient.

3. Linearity: There may be some relationship between x and y even when there is no significant linear correlation.

FIGURE 9-2 Scatterplot of Distance above Ground and Time for Object Thrown Upward
[Figure: distance above ground (feet, 0 to 250) plotted against time (seconds, 0 to 8)]
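Error 3 can be reproduced with a perfectly predictable but nonlinear relationship. This sketch assumes NumPy and uses hypothetical heights h(t) = 64t - 16t² feet for an object thrown upward; these values are an illustration, not the data plotted in Figure 9-2. The heights rise and then fall symmetrically, so the linear correlation coefficient comes out as zero even though h is completely determined by t.

```python
import numpy as np

# Hypothetical object thrown upward: h(t) = 64t - 16t^2 feet (assumed, not Figure 9-2's data)
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])   # time (seconds)
h = 64 * t - 16 * t ** 2                  # distance above ground (feet): rises, then falls

r = np.corrcoef(t, h)[0, 1]
print(h)             # heights follow an exact parabola
print(round(r, 3))   # yet r shows no linear correlation
```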

Formal Hypothesis Test
• To determine whether there is a significant linear correlation between two variables
• Two methods
• Both methods let H0: ρ = 0 (no linear correlation)
  H1: ρ ≠ 0 (linear correlation)

Method 1: Test Statistic is t
(follows format of earlier chapters)

Test statistic:
$$ t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} $$

Critical values: use Table A-3 with degrees of freedom = n - 2
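As a worked instance of Method 1, the sketch below plugs in the Garbage Project values used later in this unit (r = 0.842, n = 8, α = 0.05). It assumes SciPy is installed and uses `scipy.stats.t.ppf` in place of Table A-3, plus `scipy.stats.t.sf` for a two-tailed P-value, which the slides do not compute.

```python
from math import sqrt

from scipy import stats

r, n, alpha = 0.842, 8, 0.05                 # Garbage Project values from this unit
df = n - 2
t_stat = r / sqrt((1 - r ** 2) / df)         # Method 1 test statistic
t_crit = stats.t.ppf(1 - alpha / 2, df)      # two-tailed critical value (as in Table A-3)
p_value = 2 * stats.t.sf(abs(t_stat), df)    # two-tailed P-value

print(f"t = {t_stat:.3f}, critical t = ±{t_crit:.3f}, P-value = {p_value:.4f}")
print("reject H0" if abs(t_stat) > t_crit else "fail to reject H0")
```

Here t is about 3.823, which exceeds the critical value of about 2.447, so H0: ρ = 0 is rejected, the same decision Method 2 reaches.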

Method 2: Test Statistic is r
(uses fewer calculations)

Test statistic: r
Critical values: Refer to Table A-6 (no degrees of freedom)

[Figure: number line from -1 to 1 with critical values r = -0.811 and r = 0.811. Reject ρ = 0 when r < -0.811 or r > 0.811; fail to reject ρ = 0 between them. Sample data: r = 0.828.]

Testing for a Linear Correlation (flowchart)
1. Start. Let H0: ρ = 0 and H1: ρ ≠ 0.
2. Select a significance level α.
3. Calculate r using Formula 9-1.
4. Apply one of the two methods:
   METHOD 1: The test statistic is $t = r / \sqrt{(1 - r^2)/(n - 2)}$. Critical values of t are from Table A-3 with n - 2 degrees of freedom.
   METHOD 2: The test statistic is r. Critical values of r are from Table A-6.
5. If the absolute value of the test statistic exceeds the critical values, reject H0: ρ = 0. Otherwise, fail to reject H0.
6. If H0 is rejected, conclude that there is a significant linear correlation. If you fail to reject H0, then there is not sufficient evidence to conclude that there is a linear correlation.
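The flowchart steps can be sketched as one function that runs both methods. This is an illustrative sketch assuming NumPy and SciPy; the name `linear_correlation_test` and its returned dictionary keys are hypothetical. Because the Method 2 critical value here is obtained by converting the critical t back to r, the two methods reach the same decision by construction.

```python
import math

import numpy as np
from scipy import stats


def linear_correlation_test(xs, ys, alpha=0.05):
    """Hypothetical helper following the flowchart: H0: rho = 0 vs H1: rho != 0."""
    n = len(xs)
    df = n - 2
    r = float(np.corrcoef(xs, ys)[0, 1])              # step 3: sample r
    if abs(r) >= 1:                                    # perfect correlation: t is unbounded
        t_stat = math.copysign(math.inf, r)
    else:
        t_stat = r / math.sqrt((1 - r ** 2) / df)      # Method 1 test statistic
    t_crit = stats.t.ppf(1 - alpha / 2, df)            # Method 1 critical value (as in Table A-3)
    r_crit = t_crit / math.sqrt(t_crit ** 2 + df)      # Method 2 critical value, converted from t_crit
    return {
        "r": r,
        "t": t_stat,
        "reject_method_1": bool(abs(t_stat) > t_crit),  # step 5, Method 1
        "reject_method_2": bool(abs(r) > r_crit),       # step 5, Method 2
    }


plastic = [0.27, 1.41, 2.19, 2.83, 2.19, 1.81, 0.85, 3.05]
household = [2, 3, 3, 6, 4, 2, 1, 5]
result = linear_correlation_test(plastic, household)
print(result)
```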

Is there a significant linear correlation?
Data from the Garbage Project

x  Plastic (lb)     0.27  1.41  2.19  2.83  2.19  1.81  0.85  3.05
y  Household size   2     3     3     6     4     2     1     5

n = 8, α = 0.05
H0: ρ = 0
H1: ρ ≠ 0

Test statistic is r = 0.842

Critical values are r = -0.707 and 0.707 (Table A-6 with n = 8 and α = 0.05).

[Figure: number line from -1 to 1 with critical values r = -0.707 and r = 0.707. Reject ρ = 0 when r < -0.707 or r > 0.707; fail to reject ρ = 0 between them. Sample data: r = 0.842.]

Because 0.842 > 0.707, the test statistic falls within the critical region.

Therefore, we REJECT H0: ρ = 0 (no correlation) and conclude that there is a significant linear correlation between the weights of discarded plastic and household size.
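Software can confirm this conclusion. The sketch below assumes SciPy is installed and uses `scipy.stats.pearsonr`, which returns r together with a two-tailed P-value; the critical value 0.707 is copied from Table A-6 for n = 8 and α = 0.05.

```python
from scipy import stats

plastic = [0.27, 1.41, 2.19, 2.83, 2.19, 1.81, 0.85, 3.05]   # x: plastic (lb)
household = [2, 3, 3, 6, 4, 2, 1, 5]                          # y: household size

r, p_value = stats.pearsonr(plastic, household)  # r and its two-tailed P-value
r_crit = 0.707                                    # Table A-6, n = 8, alpha = 0.05

print(f"r = {r:.3f}, P-value = {p_value:.4f}")
print("reject H0" if abs(r) > r_crit else "fail to reject H0")
```

The P-value is well below 0.05, which agrees with the critical-value decision above.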

Justification for r Formula
Formula 9-1 is developed from

$$ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1)\, s_x s_y} $$

where $(\bar{x}, \bar{y})$ is the centroid of the sample points.

[Figure: sample points with the centroid $(\bar{x}, \bar{y}) = (3, 11)$ dividing the plane into Quadrants 1 to 4. For the point (7, 23): $x - \bar{x} = 7 - 3 = 4$ and $y - \bar{y} = 23 - 11 = 12$.]

Points in Quadrants 1 and 3 give positive products $(x - \bar{x})(y - \bar{y})$ and points in Quadrants 2 and 4 give negative products, so a sum dominated by Quadrant 1 and 3 points yields a positive r.
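The deviation form can be checked against the sums-based formula on the same data. This minimal sketch uses only the standard library (`statistics.stdev` is the sample standard deviation, dividing by n - 1); the function name `r_from_deviations` is our own.

```python
from statistics import mean, stdev

plastic = [0.27, 1.41, 2.19, 2.83, 2.19, 1.81, 0.85, 3.05]   # x: plastic (lb)
household = [2, 3, 3, 6, 4, 2, 1, 5]                          # y: household size


def r_from_deviations(xs, ys):
    """r = Σ(x - x̄)(y - ȳ) / ((n - 1) s_x s_y), using deviations from the centroid."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)   # centroid (x̄, ȳ)
    # Quadrant 1 and 3 points give positive products; Quadrant 2 and 4 points give negative ones
    products = [(x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)]
    return sum(products) / ((n - 1) * stdev(xs) * stdev(ys))


print(round(r_from_deviations(plastic, household), 3))  # 0.842, same as the sums-based formula
```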