Correlation & Regression
[Figure: scatterplot. y-axis: Stats Test Score (0 to 70); x-axis: $0 to $80]
Correlation
linear pattern of relationship between one variable (x) and
another variable (y); an association between two variables
relative position of one variable correlates with relative
distribution of another variable
graphical representation of the relationship between two
variables
Warning:
No proof of causality
Cannot assume x causes y
Scatterplot!
No Correlation
Random or circular
assortment of dots
Positive Correlation
ellipse leaning to right
GPA and SAT
Smoking and Lung Damage
Negative Correlation
ellipse leaning to left
Depression & Self-esteem
Studying & test errors
-1.0 = Strong Negative
0.0 = No Relationship
+1.0 = Strong Positive
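The -1.0 to +1.0 scale can be tied to a computation. Below is a minimal sketch in Python, assuming the usual Pearson formula (sum of cross-products divided by the root of the product of the sums of squares). The function name `pearson_r` and all data values are hypothetical, invented only for illustration; they are not from the slides.

```python
import math

def pearson_r(xs, ys):
    """Pearson r: cross-products over the root of the product of sums of squares."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data (illustration only).
hours_studied = [1, 2, 3, 4, 5]
test_errors = [9, 8, 6, 5, 2]       # falls as studying rises: negative r
quiz_score = [52, 58, 61, 70, 74]   # rises with studying: positive r

r_negative = pearson_r(hours_studied, test_errors)
r_positive = pearson_r(hours_studied, quiz_score)
print(round(r_negative, 3), round(r_positive, 3))
```

A value near -1.0 or +1.0 corresponds to a tight ellipse of dots; a value near 0.0 corresponds to the random or circular assortment.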
Go to website!
playing with scatterplots
r = .__ __
r = .__ __
r = .__ __
r = .__ __
Correlation Guesstimation
Correlations (Pearson r, with 2-tailed Sig. in parentheses; N = 12 for every pair)

                        Miles walked per day   Weight           Depression       Anxiety
Miles walked per day    1
Weight                  -.797** (.002)         1
Depression              -.800** (.002)         .648* (.023)     1
Anxiety                 -.774** (.003)         .780** (.003)    .753** (.005)    1

* significant at the 0.05 level (2-tailed); ** significant at the 0.01 level (2-tailed)
Case #2
Correlation between aiming and points, r = .628
Sample small (n=6), and r is only moderate in size
We guess ρ = 0 (we guess there is NO correlation in the population)
Bottom-line
We can only guess about ρ
We can be wrong in two ways
Correlations (Pearson r, with 2-tailed Sig. in parentheses; N = 6 for every pair)
a. Day sample collected = Tuesday

                              1               2               3               4               5               6               7
1. Total ball toss points     1
2. Distance from target       -.904* (.013)   1
3. Time spun before throwing  -.582 (.226)    .279 (.592)     1
4. Aiming accuracy            .628 (.181)     -.653 (.159)    -.390 (.445)    1
5. Manual dexterity           .821* (.045)    -.883* (.020)   -.248 (.635)    .758 (.081)     1
6. College grade point avg    -.037 (.945)    .228 (.664)     -.087 (.869)    -.546 (.262)    -.553 (.255)    1
7. Confidence for task        -.502 (.310)    .522 (.288)     .267 (.609)     -.250 (.633)    -.101 (.848)    -.524 (.286)    1

* Correlation is significant at the 0.05 level (2-tailed).
Correlations
Total ball toss points and distance from target: r = -.904, p = .013
p = probability of getting a correlation this size by sheer chance
Reject Ho if p ≤ .05
N = 6 is the sample size
r(4) = -.904, p < .05
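The decision to reject Ho can also be checked by hand. The sketch below assumes the standard t test for a correlation, t = r * sqrt(n - 2) / sqrt(1 - r^2) with df = n - 2, and compares it with the two-tailed critical value for df = 4 from a standard t table (2.776 at the .05 level). The function names are hypothetical; the r and n values are the ones reported in the slides.

```python
import math

def t_for_r(r, n):
    """t statistic for testing Ho: population correlation = 0 (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Two-tailed critical t for df = 4 at alpha = .05, from a standard t table.
T_CRIT_DF4 = 2.776

def is_significant(r, n, t_crit):
    """Reject Ho when |t| exceeds the critical value."""
    return abs(t_for_r(r, n)) > t_crit

t_distance = t_for_r(-0.904, 6)  # points vs. distance from target
t_aiming = t_for_r(0.628, 6)     # points vs. aiming accuracy (Case #2)

print(round(t_distance, 2), is_significant(-0.904, 6, T_CRIT_DF4))
print(round(t_aiming, 2), is_significant(0.628, 6, T_CRIT_DF4))
```

With n = 6, r = -.904 clears the critical value while r = .628 does not, which agrees with the Sig. values of .013 and .181 in the SPSS output.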
Predictive Potential
Coefficient of Determination
r²
Amount of variance accounted for in y by x
Percentage increase in accuracy you gain by using the regression
line to make predictions
Without correlation, you can only guess the mean of y
[Used with regression]
[Figure: scale from 0% to 100%]
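One way to see r² as a gain in accuracy: compare the squared error from always guessing the mean of y with the squared error from using the regression line. This is a minimal sketch; `fit_line`, `r_squared`, and the four data points are hypothetical and chosen only so the arithmetic is easy to follow.

```python
def fit_line(xs, ys):
    """Least-squares slope b and intercept a for y = b*x + a."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return b, a

def r_squared(xs, ys):
    """Proportional reduction in squared error: line vs. guessing the mean."""
    b, a = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    error_mean = sum((y - mean_y) ** 2 for y in ys)                   # guess the mean
    error_line = sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))  # use the line
    return 1 - error_line / error_mean

# Hypothetical data (illustration only).
xs = [1, 2, 3, 4]
ys = [2, 6, 6, 10]
r2 = r_squared(xs, ys)
print(round(r2, 3))
```

For these made-up points the line removes 90% of the squared error left by the mean-only guess, so 90% of the variance in y is accounted for by x.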
Limitations of Correlation
linearity:
can't describe non-linear relationships
e.g., relation between anxiety & performance
truncation of range:
underestimates strength of relationship if you can't see the full range
of x values
no proof of causation
third variable problem:
could be a 3rd variable causing change in both variables
directionality: can't be sure which way causality flows
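The first two limitations can be demonstrated with small made-up data sets. The sketch below is an illustration under stated assumptions: the inverted-U data and the zig-zag "noise" are hypothetical, and `pearson_r` is a hand-written helper, not a library call.

```python
import math

def pearson_r(xs, ys):
    """Pearson r from sums of squares and cross-products."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Linearity: a perfect inverted-U relationship (hypothetical data).
# y is completely determined by x, yet r comes out at zero.
xs_u = [-3, -2, -1, 0, 1, 2, 3]
ys_u = [-(x ** 2) for x in xs_u]
r_inverted_u = pearson_r(xs_u, ys_u)

# Truncation of range: same relationship, full range vs. a narrow slice of x.
xs_full = list(range(1, 21))
ys_full = [x + (3 if x % 2 == 0 else -3) for x in xs_full]  # trend plus zig-zag scatter
r_full = pearson_r(xs_full, ys_full)

narrow = [(x, y) for x, y in zip(xs_full, ys_full) if 8 <= x <= 12]
r_truncated = pearson_r([x for x, _ in narrow], [y for _, y in narrow])

print(round(r_inverted_u, 3), round(r_full, 3), round(r_truncated, 3))
```

The truncated slice shows a noticeably weaker r than the full range even though the underlying relationship is identical.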
Regression
Regression: Correlation + Prediction
predicting y based on x
e.g., predicting throwing points (y)
based on distance from target (x)
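The "Correlation + Prediction" idea can be sketched directly: the least-squares slope equals r scaled by the ratio of the standard deviations, and the intercept makes the line pass through the two means. The function names and the distance/points values below are hypothetical, invented only for illustration.

```python
import math

def slope_and_intercept_from_r(xs, ys):
    """Regression line built from the correlation: b = r * (s_y / s_x), a = mean_y - b * mean_x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    r = sxy / math.sqrt(sxx * syy)
    s_x = math.sqrt(sxx / (n - 1))
    s_y = math.sqrt(syy / (n - 1))
    b = r * (s_y / s_x)
    a = mean_y - b * mean_x
    return b, a

# Hypothetical data (not from the slides): distance from target and points scored.
distance = [10, 14, 18, 22]
points = [80, 70, 45, 35]

b, a = slope_and_intercept_from_r(distance, points)
print(round(b, 3), round(a, 3))
```

A negative r gives a negative slope: predicted points drop as distance from the target grows.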
Regression equation
[Figure: scatterplot with fitted regression line; x-axis ticks 8 to 26, y-axis ticks 20 to 120; Rsq = 0.6031]
if x = 18, then y = 47
if x = 24, then y = 20
Regression Equation
ŷ = bx + a
ŷ = predicted value of y
b = slope of the line
x = value of x that you plug in
a = y-intercept (where the line crosses the y axis)
Predictor: the x-axis variable, what you're basing the prediction on
In this case:
ŷ = -4.263(x) + 125.401
For x = 20:
ŷ = -4.263(20) + 125.401
ŷ = 40.141
See correlation & regression worksheet

R      R Square   Adjusted R Square   Std. Error of the Estimate
.777   .603       .581                18.476
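Plugging values into the regression equation is easy to script. This short sketch uses the slope and intercept reported on the slide; the function and constant names are hypothetical.

```python
SLOPE_B = -4.263        # b, from the slide's regression equation
INTERCEPT_A = 125.401   # a, from the slide's regression equation

def predict_points(distance_from_target):
    """Predicted y for a given x, using y-hat = b*x + a."""
    return SLOPE_B * distance_from_target + INTERCEPT_A

predicted_at_20 = predict_points(20)
print(round(predicted_at_20, 3))  # 40.141, matching the worked example
```

The same function can be called with any other x; predictions far outside the observed x range should be treated with caution.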
Coefficients

                       Unstandardized Coefficients   Standardized Coefficients
Model 1                B          Std. Error         Beta                        t        Sig.
(Constant)             125.401    14.265                                         8.791    .000
Distance from target   -4.263     .815               -.777                       -5.230   .000
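The t column in the coefficients output is each unstandardized coefficient divided by its standard error. The sketch below recomputes it from the rounded values shown on the slide, so the results may differ from the printed t values in the third decimal place; the function name is hypothetical.

```python
def t_ratio(coefficient, std_error):
    """t statistic for a regression coefficient: B divided by its standard error."""
    return coefficient / std_error

t_constant = t_ratio(125.401, 14.265)  # (Constant) row
t_slope = t_ratio(-4.263, 0.815)       # Distance from target row

print(round(t_constant, 2), round(t_slope, 2))
```

With one predictor, the standardized Beta (-.777) has the same magnitude as R in the model summary (.777).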
Predictive Ability
Mantra!!
As variability decreases, prediction accuracy ___
if we can account for variance, we can make better predictions
As r increases:
r² increases
variance accounted for increases
the prediction accuracy increases
prediction error decreases (distance between y and ŷ)
Sy decreases
the standard error of the residual/predictor
measures overall amount of prediction error
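The link between r and prediction error can be checked numerically. The sketch below assumes the usual definition of the standard error of the estimate, sqrt(sum of squared residuals / (n - 2)), and compares two hypothetical data sets that share the same x values and the same spread in y but differ in r. All names and data are invented for illustration.

```python
import math

def std_error_of_estimate(xs, ys):
    """sqrt(SSE / (n - 2)) around the least-squares line; overall prediction error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    sse = sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / (n - 2))

# Hypothetical data (illustration only): same x, same variance in y, different r.
xs = [1, 2, 3, 4]
ys_strong = [2, 6, 6, 10]   # r is about .95
ys_weak = [6, 2, 10, 6]     # r is about .32

see_strong = std_error_of_estimate(xs, ys_strong)
see_weak = std_error_of_estimate(xs, ys_weak)
print(round(see_strong, 3), round(see_weak, 3))
```

The data set with the larger r has the smaller standard error of the estimate, which is the mantra in numeric form.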