Theory of Regression
The Course
16 (or so) lessons
Some flexibility
Depends how we feel
What we get through
House Rules
Jeremy must remember
Not to talk too fast
My data
I'll provide you with
Simple examples, small sample sizes
Conceptually simple (even silly)
Computer Programs
SPSS
Mostly
Excel
For calculations
GPower
Stata (if you like)
R (because it's flexible and free)
Mplus (SEM, ML?)
AMOS (if you like)
7
Lesson 1: Models in
statistics
Models, parsimony, error,
mean, OLS estimators
10
What is a Model?
11
What is a model?
Representation
Of reality
Not reality
Sifting
What is important from what is not
important
Parsimony
In statistical models we seek
parsimony
Parsimony = simplicity
13
Parsimony in Science
A model should be:
1: able to explain a lot
2: use as few concepts as possible
More it explains
The more you get
Fewer concepts
The lower the price
A Simple Model
Height of five individuals
1.40m
1.55m
1.80m
1.62m
1.63m
A Little Notation
Y: the variable
Yi: the ith value of Y
e.g. Y = {4, 5, 6, 7, 8}, so Y2 = 5
b0, bj: parameters (intercept and slopes)
e: error
18
Ȳ: the mean of Y
β1 and b1 refer to the same slope parameter
I will use b1 (because it is easier to type)
20
21
22
We want a model
To represent those data
Model 1:
1.40m, 1.55m, 1.80m, 1.62m, 1.63m
Not a model
A copy
VERY unparsimonious
Data: 5 statistics
Model: 5 statistics
No improvement
23
Model 2:
The mean (arithmetic mean)
A one parameter model
Ŷi = b0 = Ȳ
Ȳ = Σ Yi / n
25
26
27
28
What is error?
Data (Y): 1.40, 1.55, 1.80, 1.62, 1.63
Model (b0 = mean): 1.60
Error (e): -0.20, -0.05, 0.20, 0.02, 0.03
29
ERROR: Σei = Σ(Yi − Ȳ) = Σ(Yi − b0)
Σei = 0 implies no ERROR?
Not the case
31
ERROR: Σ|ei| = Σ|Yi − Ȳ| = Σ|Yi − b0|
= 0.20 + 0.05 + 0.20 + 0.02 + 0.03 = 0.50
32
Y = (2, 2, 4, 4)
b0 = any value from 2 to 4
Indeterminate
There are an infinite number of solutions which would
satisfy our criteria for minimum error
33
ERROR: Σei² = Σ(Yi − Ȳ)² = Σ(Yi − b0)² = 0.08
34
Determinate
Always gives one answer
If we minimise SSE
Get the mean
Shown in graph
SSE plotted against b0
Min value of SSE occurs when
b0 = mean
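A minimal Python sketch (not from the slides) using the five heights given earlier: compute SSE for a grid of candidate values of b0 and confirm the minimum falls at the mean. The grid of candidates is just illustrative.

y = [1.40, 1.55, 1.80, 1.62, 1.63]

def sse(b0, values):
    # sum of squared errors for a one-parameter model that predicts b0 for every case
    return sum((v - b0) ** 2 for v in values)

mean = sum(y) / len(y)
candidates = [round(1.10 + 0.01 * i, 2) for i in range(81)]   # 1.10 ... 1.90
best = min(candidates, key=lambda b0: sse(b0, y))
print(round(mean, 2), round(sse(mean, y), 4))   # 1.6, ~0.08
print(best)                                      # the grid minimum lands on the mean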
35
[Figure: SSE plotted against candidate values of b0 (1.1 to 1.9); the minimum of SSE occurs at b0 = the mean, 1.60]
37
BLUE Estimators
Best
Minimum variance (of all possible
unbiased estimators)
Narrower distribution than other
estimators
e.g. median, mode
Linear
Ŷi = Ȳ
Linear predictions
For the mean
Linear (straight, flat) line
39
Unbiased
Centred around true (population)
values
Expected value = population value
Minimum is biased.
Minimum in samples > minimum in
population
Estimators
Errrmm they are estimators
Also consistent
Sample approaches infinity, get closer
to population values
Variance shrinks
40
SSE = Σ(Yi − Ȳ)²
Variance = Σ(Yi − Ȳ)² / n
Sample estimate: Σ(Yi − Ȳ)² / (n − 1)
41
42
Proof
That the mean minimises SSE
Not that difficult
As statistical proofs go
Available in
Maxwell and Delaney Designing
experiments and analysing data
Judd and McClelland Data Analysis
(out of print?)
43
What's a df?
The number of parameters free to
vary
When one is fixed
44
0 df
No variation
available
1 df
Fix 1 corner, the
shape is fixed
45
Variance has N − 1 df
Mean has been fixed
2nd moment
Can think of as amount cases vary
away from the mean
46
While we are at it
Skewness has N − 2 df
3rd moment
Kurtosis has N − 3 df
4th moment
Amount cases vary from
47
Parsimony and df
Number of df remaining
Measure of parsimony
Normal distribution
Can be described in terms of mean and SD
2 parameters
(z distribution: 0 parameters)
48
Summary of Lesson 1
Statistics is about modelling DATA
Models have parameters
Fewer parameters, more parsimony, better
50
51
52
In Lesson 1 we said
Use a model to predict and
describe data
Mean is a simple, one parameter
model
Ŷi = b0 = Ȳ
53
More Models
Slopes and Intercepts
54
More Models
The mean is OK
As far as it goes
It just doesn't go very far
Very simple prediction, uses very little
information
House Prices
In the UK, two of the largest
lenders (Halifax and Nationwide)
compile house price indices
Predict the price of a house
Examine effect of different
circumstances
Price (£000s): 77, 74, 88, 62, 90, 136, 35, 134, 138, 55
57
Ȳ = 88.9
Ŷ = b0 = Ȳ
SSE = 11806.9
How much is that house worth?
£88,900
58
Use 1 df to say that
Ŷ = b0 + b1x1
59
Alternative Expression
Estimate of Y (expected value of Y): Ŷ = b0 + b1x1
Value of Y: Yi = b0 + b1xi1 + ei
60
62
Mark errors on it
Called residuals
Sum and square these to find SSE
63
[Scatterplots: house price (£000s) against number of bedrooms, with the residuals marked]
66
First attempt:
67
68
Gradient: the line rises b1 units for every 1 unit increase in x
Height (intercept): b0 units
70
Height
If we fix slope to zero
Height becomes mean
Hence mean is b0
71
Why the
constant?
b0x0
Where x0 is 1.00
for every case
i.e. x0 is constant
Implicit in SPSS
Some packages
force you to make
it explicit
(Later on we'll need to make it explicit)
[Table: beds (x1), x0 (= 1 for every case), price (£000s): 77, 74, 88, 62, 90, 136, 35, 134, 138, 55]
72
73
Start with
b0=88.9 (mean)
b1=10 (nice round number)
SSE = 14948 – worse than it was
b0 = 86.9, b1 = 10, SSE = 13828
b0 = 66.9, b1 = 10, SSE = 7029
b0 = 56.9, b1 = 10, SSE = 6628
b0 = 46.9, b1 = 10, SSE = 8228
b0 = 51.9, b1 = 10, SSE = 7178
b0 = 51.9, b1 = 12, SSE = 6179
b0 = 46.9, b1 = 14, SSE = 5957
…
75
[Figure: actual and predicted price (£000s) plotted against number of bedrooms]
We now know
A house with no bedrooms is worth
£46,000 (??!)
Adding a bedroom adds £15,000
78
Standardised Regression
Line
One big but:
Scale dependent
Values change (currency, inflation)
Scales change (£, £000s, …)
b1 = 14.79
We increase x1 by 1, and Ŷ increases by 14.79
14.79 = (14.79 / 36.21) SDs of Y = 0.408 SDs
80
Standardised slope = b1 × SD(x1) / SD(y)
= 14.79 × 1.72 / 36.21 = 0.706
The standardised regression line
Change (in SDs) in Y associated with
a change of 1 SD in x1
Correlation coefficient is a
standardised regression slope
Relative change, in terms of SDs
83
Proportional Reduction in
Error
84
Proportional Reduction in
Error
We might be interested in the level
of improvement of the model
How much less error (as proportion)
do we have
Proportional Reduction in Error (PRE)
Mean only
Error(model 0) = 11806
Mean + slope
Error(model 1) = 5921
85
PRE = (ERROR(0) − ERROR(1)) / ERROR(0)
PRE = 1 − ERROR(1) / ERROR(0)
PRE = 1 − 5921 / 11806
PRE = 0.4984
86
√0.4984 = 0.706
This is the correlation coefficient
Correlation coefficient is the square root of the proportion of variance explained
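A short sketch of the PRE calculation, using the SSE values quoted on the slides:

sse_model0 = 11806   # mean-only model
sse_model1 = 5921    # mean + slope model

pre = 1 - sse_model1 / sse_model0
print(round(pre, 4))         # ~0.498
print(round(pre ** 0.5, 3))  # ~0.706, the correlation coefficient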
Standardised Covariance
88
Standardised Covariance
We are still iterating
Need a closed-form
Equation to solve to get the
parameter estimates
Answer is a standardised
covariance
A variable has variance
Amount of differentness
Divide the SSE by N
Gives SSE per person
(Actually N − 1: we have lost a df to the mean)
This is the variance
Same as SD²
We thought of SSE as a scattergram
Y plotted against X
[Scatterplot: Y (price) plotted against X (bedrooms), with the squared residuals drawn as areas]
Sum of areas = SSE
92
Plot of Y against Y
[Scatterplot: Y plotted against Y]
Draw Squares
[Figure: squares drawn on the Y-vs-Y plot]
For Y = 138: 138 − 88.9 = 40.1; area = 40.1 × 40.1 = 1608.1
For Y = 35: 35 − 88.9 = −53.9; area = (−53.9) × (−53.9) = 2905.21
[Figure: rectangles drawn on the Y-vs-X plot]
For Y = 55, X = 1: 55 − 88.9 = −33.9; 1 − 3 = −2; area = (−33.9) × (−2) = 67.8
For Y = 138, X = 4: 138 − 88.9 = 49.1; 4 − 3 = 1; area = 49.1 × 1 = 49.1
97
Cov(x, y) = Σ(x − x̄)(y − ȳ) / (N − 1)
Cov(x, y) = 44.2
What do points in different sectors
do to the covariance?
98
Need to standardise it
Like the slope
99
First approach
Much more computationally
expensive
Too much like hard work to do by hand
Second approach
Much easier
Standardise the final value only
Standardised covariance
r = Cov(x, y) / √(Var(x) × Var(y))
= 44.2 / √(2.9 × 1311) = 0.706
101
Correlation = covariance / √(variance(x) × variance(y))
102
Expanded
r = [Σ(x − x̄)(y − ȳ) / (N − 1)] / √[ (Σ(x − x̄)² / (N − 1)) × (Σ(y − ȳ)² / (N − 1)) ]
103
This means
We now have a closed form equation
to calculate the correlation
Which is the standardised slope
Which we can use to calculate the
unstandardised slope
104
We know that: r = b1 × SD(x1) / SD(y)
So: b1 = r × SD(y) / SD(x1)
b1 = 0.706 × 36.21 / 1.72 = 14.79
So value of b1 is the same as the
iterative approach
106
The intercept
Just while we are at it
107
Subtract mean of x
But not the whole mean of x
Need to correct it for the slope
c = ȳ − b1 × x̄1
c = 88.9 − 14.79 × 2.9
c = 46.00
Naturally, the same
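A sketch of the closed-form estimates in Python. The house prices are from the slides; the bedroom counts below are reconstructed so that they reproduce the summary statistics quoted on the slides (x̄ = 2.9, Σ(x − x̄)² = 26.9, Cov = 44.2), so treat them as an assumption rather than the original data file.

import math

def ols_line(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
    vx = sum((xi - mx) ** 2 for xi in x) / (n - 1)
    vy = sum((yi - my) ** 2 for yi in y) / (n - 1)
    r = cov / math.sqrt(vx * vy)             # standardised covariance
    b1 = r * math.sqrt(vy) / math.sqrt(vx)   # = r * SD(y) / SD(x)
    b0 = my - b1 * mx
    return cov, r, b1, b0

price = [77, 74, 88, 62, 90, 136, 35, 134, 138, 55]
beds = [1, 2, 1, 3, 5, 5, 2, 5, 4, 1]   # reconstructed bedroom counts (assumption)
print(ols_line(beds, price))            # ~ (44.2, 0.706, 14.79, 46.0)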
108
Accuracy of Prediction
109
Actual Price   Predicted Price
77             60.80
74             75.59
88             60.80
62             90.38
90             119.96
136            119.96
35             75.59
134            119.96
138            105.17
55             60.80
Plot actual
price against
predicted
price
From the
model
111
[Scatterplot: predicted value plotted against actual value]
r = 0.706
The correlation
113
r = Σxy / √(Σx² × Σy²)   (x and y as deviations from their means)
Point biserial:
r = (My1 − My0) × √(P × Q) / SDy
114
Phi (φ)
Used for 2 dichotomous variables
                 Vote P   Vote Q
Homeowner        A: 19    B: 54
Not homeowner    C: 60    D: 53
r = (BC − AD) / √((A+B)(C+D)(A+C)(B+D))
115
Spearman: r = 1 − 6Σd² / (n(n² − 1))
116
Summary
Mean is an OLS estimate
OLS estimates are BLUE
Regression line
Best prediction of DV from IV
OLS estimate (like mean)
117
118
119
120
Lesson 3: Why
Regression?
A little aside, where we look
at why regression has such a
curious name.
121
Regression
The or an act of regression;
reversion; return towards the
mean; return to an earlier stage of
development, as in an adult's or an
adolescent's behaving like a child
(From Latin gradi, to go)
Francis Galton
Charles Darwin's cousin
Studying heritability
123
124
Other Examples
Secrist (1933): The Triumph of
Mediocrity in Business
Second albums often tend to not be as
good as first
Sequel to a film is not as good as the
first one
Curse of Athletics Weekly
Parents think that punishing bad
behaviour works, but rewarding good
behaviour doesn't
125
126
[Figure: scatter of points with r = 1.00, and scatter of points with r = 0.00]
128
From Regression to
Correlation
Where do we predict an
individuals score on y will be,
based on their score on x?
Depends on the correlation
r=1.00
Starts here
Will end
up here
y
130
r=0.00
Starts here
Could end
anywhere here
y
131
r=0.50
Probably
end
somewher
e here
Starts
here
132
133
r=0.00
Ends here
Group starts
here
Group starts
here
y
134
r=0.50
y
135
r=1.00
y
136
r
units
1 unit
137
No
regression
r=1.00
138
Some
regression
r=0.50
139
r=0.00
Lots
(maximum)
regression
r=0.00
y
140
Formula
ẑy = rxy × zx
141
Conclusion
Regression towards mean is statistical
necessity
No regression only with a perfect correlation
Very non-intuitive
Interest in regression and correlation
From examining the extent of regression
towards mean
By Pearson, who worked with Galton
Stuck with curious name
143
144
Lesson 4: Samples to
Populations Standard
Errors and Statistical
Significance
145
The Problem
In Social Sciences
We investigate samples
Theoretically
Randomly taken from a specified
population
Every member has an equal chance
of being sampled
Sampling one member does not alter
the chances of sampling another
Population
But it's the population that we are
interested in
Not the sample
Population statistic represented with
Greek letter
Hat means estimate
x
x
147
148
Sampling Distribution
We need to know the sampling
distribution of a parameter
estimate
How much does it vary from sample
to sample
Sampling Distribution of
the Mean
Given
Normal distribution
Random sample
Continuous data
151
Concrete   Abstract   Diff (x)
12         4          8
11         7          4
4          6          -2
9          12         -3
8          6          2
12         10         2
9          8          1
8          5          3
12         10         2
8          4          4
x̄ = 2.1, SD(x) = 3.11, N = 10
152
Confidence Intervals
This means
If we know the mean in our sample
We can estimate where the mean in
the population (μ) is likely to be
Using
The standard error (se) of the mean
Represents the standard deviation of
the sampling distribution of the mean
153
1 SD
contains
68%
Almost 2
SDs contain
95%
154
se(x̄) = SD(x) / √n
155
Decreasing SD decreases SE
x̄ = 2.1, SD(x) = 3.11, N = 10
se(x̄) = 3.11 / √10 = 0.98
157
What is a CI?
(For 95% CI):
95% chance that the true
(population) value lies within the
confidence interval?
95% of samples, true mean will
land within the confidence
interval?
159
Significance Test
Probability that μ is a certain value
Almost always 0
Doesn't have to be though
t = x̄ / se(x̄)
t = 2.1 / 0.98 = 2.14
p = 0.061
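A sketch of the same one-sample t-test in Python, using scipy for the p-value:

from math import sqrt
from scipy import stats

xbar, sd, n = 2.1, 3.11, 10
se = sd / sqrt(n)                       # ~0.98
t = xbar / se                           # ~2.14
p = 2 * stats.t.sf(abs(t), df=n - 1)    # two-tailed
print(round(se, 2), round(t, 2), round(p, 3))   # 0.98 2.14 0.061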
161
Other Parameter
Estimates
Same approach
Prediction, slope, intercept, predicted
values
At this point, prediction and slope are
the same
Wont be later on
163
F = (SSreg / df1) / (SSres / df2)
df1 = k
df2 = N − k − 1
164
Slope = 14.79
Intercept = 46.0
r = 0.706
165
F = (SSreg / df1) / (SSres / df2)
F = (5885 / 1) / (5921 / (10 − 1 − 1)) = 7.95
df1 = k = 1
df2 = N − k − 1 = 8
166
F = 7.95, df = 1, 8, p = 0.02
Can reject H0
H0: Prediction is not better than chance
A significant effect
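A sketch of the F test from the sums of squares quoted above, with the p-value from scipy:

from scipy import stats

ss_reg, ss_res = 5885, 5921
n, k = 10, 1
df1, df2 = k, n - k - 1
F = (ss_reg / df1) / (ss_res / df2)
p = stats.f.sf(F, df1, df2)
print(round(F, 2), round(p, 3))   # ~7.95, p ~= 0.02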
167
Statistical Significance:
What does a p-value (really)
mean?
168
A Quiz
Six questions, each true or false
Write down your answers (if you like)
An experiment has been done. Carried
out perfectly. All assumptions perfectly
satisfied. Absolutely no problems.
P = 0.01
Which of the following can we say?
169
170
171
172
173
174
A Bit of Notation
Not because we like notation
But we have to say a lot less
Probability P
Null hypothesis is true H
Result (data) D
Given - |
178
What's a P Value
P(D|H)
Probability of the data occurring if the
null hypothesis is true
Not
P(H|D)
Probability that the null hypothesis is
true, given that we have the data =
P(H|D)
P(H|D) ≠ P(D|H)
179
P(M|B) ≠ P(B|M)
180
Police say:
P(D|H) = 1/1,000,000
182
True
34% of students
15% of professors/lecturers,
10% of professors/lecturers teaching
statistics
. False
. We have found evidence against
the null hypothesis
184
. False
. We don't know
185
20% of students
13% of professors/lecturers
10% of professors/lecturers teaching
statistics
False
186
. False
187
68% of students
67% of professors/lecturers
73% of professors/lecturers
teaching statistics
. False
. Can be worked out
P(replication)
188
. False
. Another tricky one
It can be worked out
189
190
191
Yates (1951)
"the emphasis given to formal tests of
significance ... has resulted in ... an undue
concentration of effort by mathematical
statisticians on investigations of tests of
significance applicable to problems which
are of little or no practical importance ...
and ... it has caused scientific research
workers to pay undue attention to the
results of the tests of significance ... and
too little to the estimates of the magnitude
of the effects they are investigating
193
194
s_y.x = √( Σ(Y − Ŷ)² / (N − k − 1) )
s_y.x = √( SSres / (N − k − 1) )
s_y.x = √( 5921 / 8 ) = 27.2
195
196
se(b_y.x) = s_y.x / √( Σ(x − x̄)² )
se(b_y.x) = 27.2 / √26.9 = 5.24
197
Confidence Limits
95% CI
t dist with N - k - 1 df is 2.31
CI = 5.24 × 2.31 = 12.06
t = b / se(b) = 14.7 / 5.2 = 2.81
df = N − k − 1 = 8
p = 0.02
This probability is (of course) the same as for the F test
199
Need to transform it
Fisher z transformation approximately
normal
z′ = 0.5 × [ln(1 + r) − ln(1 − r)]
SE(z′) = 1 / √(n − 3)
200
SE(z′) = 1 / √(10 − 3) = 0.38
95% CIs:
0.879 − 1.96 × 0.38 = 0.13
0.879 + 1.96 × 0.38 = 1.62
201
Transform back: r = (e^(2z′) − 1) / (e^(2z′) + 1)
Using Excel
Functions in excel
Fisher() to carry out Fisher
transformation
Fisherinv() to transform back to
correlation
203
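A sketch of the whole Fisher-z confidence interval in Python, mirroring the hand calculation above (r = 0.706, n = 10); the limits are returned on the r scale.

import math

def fisher_ci(r, n, crit=1.96):
    z = 0.5 * (math.log(1 + r) - math.log(1 - r))   # Fisher transformation
    se = 1 / math.sqrt(n - 3)
    back = lambda v: (math.exp(2 * v) - 1) / (math.exp(2 * v) + 1)  # inverse
    return back(z - crit * se), back(z + crit * se)

print(fisher_ci(0.706, 10))   # roughly (0.13, 0.92) once transformed back to r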
The Others
Same ideas for calculation of CIs
and SEs for
Predicted score
Gives expected range of values given
X
Lesson 5: Introducing
Multiple Regression
205
Residuals
We said
Y = b0 + b1x1
206
Contains information
Something is making the residual 0
But what?
207
[Figure: actual and predicted price against number of bedrooms; one house above the line has a swimming pool, one below the line has unpleasant neighbours]
[Table: beds, price (£000s): 77, 74, 88, 62, 90, 136, 35, 134, 138, 55]
210
211
Control?
In experimental research
Use experimental control
e.g. same conditions, materials, time
of day, accurate measures, random
assignment to conditions
In non-experimental research
Can't use experimental control
Use statistical control instead
212
Analysis of Residuals
What predicts differences in crime
rate
After controlling for socio-economic
deprivation
Number of police?
Crime prevention schemes?
Rural/Urban proportions?
Something else
Exam performance
Consider number of books a student
read (books)
Number of lectures (max 20) a
student attended (attend)
214
Books   Attend   Grade
0       9        45
1       15       57
0       10       45
2       16       51
4       10       65
4       20       88
1       11       44
4       20       87
3       15       89
0       15       59
First 10 cases
215
Use books as IV
R=0.492, F=12.1, df=1, 28, p=0.001
b0=52.1, b1=5.7
(Intercept makes sense)
Use attend as IV
R=0.482, F=11.5, df=1, 38, p=0.002
b0=37.0, b1=1.9
(Intercept makes less sense)
216
[Scatterplots: grade (out of 100) against books, and grade against attend]
Problem
Use R2 to give proportion of shared
variance
Books = 24%
Attend = 23%
219
[Diagram: correlations among BOOKS, ATTEND and GRADE — books–attend r = 0.44, books–grade r = 0.49, attend–grade r = 0.48]
Well. Almost.
This would give us correct values for
SS
Would not be correct for slopes, etc
Simultaneously estimate 2
parameters
b1 and b2
Y = b0 + b1x1 + b2x2
x1 and x2 are IVs
3D scatterplot
(2points only)
y
x2
x1
224
b2
b1
b0
x2
x1
225
(Really) Ridiculous
Equations
b1 = [ Σ(y − ȳ)(x1 − x̄1) × Σ(x2 − x̄2)² − Σ(y − ȳ)(x2 − x̄2) × Σ(x1 − x̄1)(x2 − x̄2) ] / [ Σ(x1 − x̄1)² × Σ(x2 − x̄2)² − (Σ(x1 − x̄1)(x2 − x̄2))² ]
b2 = [ Σ(y − ȳ)(x2 − x̄2) × Σ(x1 − x̄1)² − Σ(y − ȳ)(x1 − x̄1) × Σ(x1 − x̄1)(x2 − x̄2) ] / [ Σ(x1 − x̄1)² × Σ(x2 − x̄2)² − (Σ(x1 − x̄1)(x2 − x̄2))² ]
b0 = ȳ − b1x̄1 − b2x̄2
226
227
228
A scalar is a number
A scalar: 4
A vector is a row or column of
numbers
A row vector:
4 8 7
A column vector: 11
230
4 8 7
Is a 1 4 vector
11
Is a 2 1 vector
A number (scalar) is a 1 1 vector
231
2 6 5 7 8
4 5 7 5 3
1 5 2 7 8
Is a 3 x 5 matrix
Matrices are referred to with bold
capitals
232
233
I = [1 0 0; 0 1 0; 0 0 1]
The identity matrix: 1s on the diagonal, 0s everywhere else
234
Matrix Operations
Transposition
A matrix is transposed by putting it
on its side
Transpose of A is A′
A = [7 5 6]
A′ = [7; 5; 6]   (the row becomes a column)
235
Matrix multiplication
A matrix can be multiplied by a scalar,
a vector or a matrix
Not commutative: AB ≠ BA
To multiply AB
The number of columns in A must equal the number of rows in B
236
Matrix by vector:
[a b c; d e f; g h i] × [j; k; l] = [aj + bk + cl; dj + ek + fl; gj + hk + il]

[2 3 5; 7 11 13; 17 19 23] × [2; 3; 4] = [4 + 9 + 20; 14 + 33 + 52; 34 + 57 + 92] = [33; 99; 183]
Matrix by matrix:
[a b; c d] × [e f; g h] = [ae + bg, af + bh; ce + dg, cf + dh]

[2 3; 5 7] × [2 3; 4 5] = [4 + 12, 6 + 15; 10 + 28, 15 + 35] = [16 21; 38 50]
238
AI = A
[2 3; 5 7] × [1 0; 0 1] = [2 3; 5 7]
239
We will do a 2x2
Much more difficult for larger matrices
241
A = [a b; c d]
|A| = ad − cb
A = [1.0 0.3; 0.3 1.0]
|A| = 1 × 1 − 0.3 × 0.3 = 0.91
242
Described as:
Not positive definite
Singular (if determinant is zero)
In different error messages
243
For A = [a b; c d]:
adj A = [d −b; −c a]
A⁻¹ = adj A / |A|
244
Find A⁻¹
A = [1.0 0.3; 0.3 1.0], |A| = 0.91
A⁻¹ = (1 / 0.91) × [1.0 −0.3; −0.3 1.0] = [1.10 −0.33; −0.33 1.10]
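A quick numpy check of the determinant and inverse above:

import numpy as np

A = np.array([[1.0, 0.3],
              [0.3, 1.0]])
print(np.linalg.det(A))   # 0.91
print(np.linalg.inv(A))   # [[ 1.0989 -0.3297], [-0.3297  1.0989]]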
245
246
Determinants
Determinant of a correlation matrix
The volume of space taken up by the
(hyper) sphere that contains all of the
points
A = [1.0 0.0; 0.0 1.0], |A| = 1.0 — uncorrelated variables: the points fill the whole space
A = [1.0 1.0; 1.0 1.0], |A| = 0.0 — perfectly correlated variables: the points lie on a line and take up no space
249
Negative Determinant
Points take up less than no
space
Correlation matrix cannot exist
Non-positive definite matrix
250
Sometimes Obvious
A = [1.0 1.2; 1.2 1.0]
|A| = −0.44
A = [1 0.9 0.9; 0.9 1 −0.9; 0.9 −0.9 1]
|A| = −2.88
252
Sometimes No Idea
A = [1.00 0.76 0.40; 0.76 1 −0.30; 0.40 −0.30 1]
|A| = −0.01
A = [1.00 0.75 0.40; 0.75 1 −0.30; 0.40 −0.30 1]
|A| = 0.0075
253
R²_i.123…k = 1 − 1 / a_ii
(a_ii is the ith diagonal element of the inverse of the correlation matrix)
254
Regression Weights
Where i is DV
j is IV
b_i.j = −a_ij / a_ii
(elements of the inverse of the correlation matrix)
255
Y = XB + E
257
Where
Y = vector of DV
X = matrix of IVs
B = vector of coefficients
258
Y = XB + E written out for the books/attend data:

Y = [45, 57, 45, 51, 65, 88, 44, 87, 89, 59]′ — the DV, grade
X = [1 0 9; 1 1 5; 1 0 10; 1 2 16; 1 4 10; 1 4 20; 1 1 11; 1 4 20; 1 3 15; 1 0 15] — first column x0, then books, then attend
B = [b0; b1; b2] — the parameter estimates; we are trying to find the best values of these
E = [e1; e2; …; e10] — the errors

x0 could be any number, but it is most convenient to make it 1; it is used to capture the intercept.
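A sketch of the matrix solution B = (X′X)⁻¹X′Y with numpy, using the books/attend data laid out above:

import numpy as np

X = np.array([[1, 0,  9], [1, 1,  5], [1, 0, 10], [1, 2, 16], [1, 4, 10],
              [1, 4, 20], [1, 1, 11], [1, 4, 20], [1, 3, 15], [1, 0, 15]], dtype=float)
y = np.array([45, 57, 45, 51, 65, 88, 44, 87, 89, 59], dtype=float)

B = np.linalg.inv(X.T @ X) @ X.T @ y   # b0, b1 (books), b2 (attend)
E = y - X @ B                          # residuals
print(B)
print((E ** 2).sum())                  # SSE for the two-predictor model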
Y = XB + E
Simple way of representing as many IVs
as you like
Y = b0x0 + b1x1 + b2x2 + b3x3 + b4x4 + b5x5 + e
(two cases shown)
X = [x01 x11 x21 x31 x41 x51; x02 x12 x22 x32 x42 x52]
B = [b0; b1; b2; b3; b4; b5]
E = [e1; e2]
Generalises to Multivariate
Case
Y = XB + E
Y, B and E
Matrices, not vectors
267
268
269
270
Lesson 6: More on
Multiple Regression
271
Parameter Estimates
Parameter estimates (b1, b2 bk)
were standardised
Because we analysed a correlation
matrix
Standard Error of
Regression Coefficient
Standardised is easier
SE(βi) = √[ (1 − R²_Y) / (n − k − 1) × 1 / (1 − R²_i) ]
(R²_Y: variance in Y explained by the model; R²_i: variance in xi explained by the other IVs)
Multiple R
The degree of prediction
R (or Multiple R)
No longer equal to b
275
In Terms of Variance
Can also think of this in terms of
variance explained.
Each IV explains some variance in the
DV
The IVs share some of their variance
276
[Venn diagram: the total variance of Y = 1; variance in Y accounted for by x1: r²x1y = 0.36; variance in Y accounted for by x2: r²x2y = 0.36]
In this model
R² = r²yx1 + r²yx2
R² = 0.36 + 0.36 = 0.72
R = √0.72 = 0.85
But
If x1 and x2 are correlated
No longer the case
278
[Venn diagram: as above, but x1 and x2 now overlap — the variance shared between x1 and x2 (not equal to rx1x2) overlaps with the variance each shares with Y]
So
We can no longer sum the r2
Need to sum them, and subtract the
shared variance i.e. the correlation
But
It's not the correlation between them
It's the correlation between them as a
proportion of the variance of Y
Based on estimates
R² = β1·ryx1 + β2·ryx2
If rx1x2 = 0, then β1 = ryx1 (and β2 = ryx2)
Equivalent to r²yx1 + r²yx2
281
Based on correlations
R² = (r²yx1 + r²yx2 − 2·ryx1·ryx2·rx1x2) / (1 − r²x1x2)
If rx1x2 = 0
Equivalent to r²yx1 + r²yx2
282
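A sketch of the two-predictor formula above, with illustrative correlation values (the 0.6s reproduce the r² = 0.36 example; the 0.5 inter-correlation is a hypothetical):

def r_squared(r_yx1, r_yx2, r_x1x2):
    return (r_yx1**2 + r_yx2**2 - 2 * r_yx1 * r_yx2 * r_x1x2) / (1 - r_x1x2**2)

print(r_squared(0.6, 0.6, 0.0))   # 0.72: with uncorrelated IVs it is just the sum
print(r_squared(0.6, 0.6, 0.5))   # 0.48: smaller, because the IVs share variance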
283
Adjusted R2
R2 is an overestimate of population
value of R2
Any x will not correlate 0 with Y
Any variation away from 0 increases R
Variation from 0 more pronounced
with lower N
Need to correct R2
Adjusted R2
284
Calculation of Adj. R2
Adj. R² = 1 − (1 − R²) × (N − 1) / (N − k − 1)
1 − R²: proportion of unexplained variance
We multiply this by an adjustment
More variables → greater adjustment
More people → less adjustment
285
Shrunken R2
Some authors treat shrunken and
adjusted R2 as the same thing
Others don't
286
Adjustment = (N − 1) / (N − k − 1)
N = 20, k = 3: (20 − 1) / (20 − 3 − 1) = 19 / 16 = 1.1875
N = 10, k = 3: (10 − 1) / (10 − 3 − 1) = 9 / 6 = 1.5
N = 10, k = 8: (10 − 1) / (10 − 8 − 1) = 9 / 1 = 9
287
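A short sketch of the adjusted R² calculation shown above:

def adjusted_r2(r2, n, k):
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# the adjustment factor alone, for the slide's examples
for n, k in [(20, 3), (10, 3), (10, 8)]:
    print(n, k, (n - 1) / (n - k - 1))   # 1.1875, 1.5, 9.0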
Extra Bits
Some stranger things that
can
happen
Counter-intuitive
288
Suppressor variables
Can be hard to understand
Very counter-intuitive
Definition
An independent variable which
increases the size of the parameters
associated with other independent
variables above the size of their
correlations
289
Correlation matrix
          Mech   Verb   Success
Mech      1      0.5    0.3
Verb      0.5    1      0
Success   0.3    0      1
290
291
Mechanical ability
b = 0.4
Larger than r!
Verbal ability
b = -0.2
Smaller than r!!
So what is happening?
You need verbal ability to do the test
Not related to mechanical ability
Measure of mechanical ability is
contaminated by verbal ability
292
Low verbal
Negative, because we are talking about
standardised scores
Your mech score is really high: you did well on
the mechanical test, without being good
at the words
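A sketch of where those slopes come from: standardised coefficients are β = Rxx⁻¹·rxy, using the mech/verbal/success correlations above.

import numpy as np

Rxx = np.array([[1.0, 0.5],
                [0.5, 1.0]])      # correlations among the IVs (mech, verbal)
rxy = np.array([0.3, 0.0])        # correlations of each IV with success

beta = np.linalg.inv(Rxx) @ rxy
print(beta)                       # [ 0.4 -0.2]: mech above its r, verbal below zero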
Another suppressor?
     x1    x2    y
x1   1     0.5   0.3
x2   0.5   1     0.2
y    0.3   0.2   1
b1 = ?
b2 = ?
294
Another suppressor?
     x1    x2    y
x1   1     0.5   0.3
x2   0.5   1     0.2
y    0.3   0.2   1
b1 = 0.26
b2 = 0.06
295
And another?
     x1    x2    y
x1   1     0.5   0.3
x2   0.5   1     -0.2
y    0.3   -0.2  1
b1 = ?
b2 = ?
296
And another?
     x1    x2    y
x1   1     0.5   0.3
x2   0.5   1     -0.2
y    0.3   -0.2  1
b1 = 0.53
b2 = -0.47
297
One more?
     x1    x2    y
x1   1     -0.5  0.3
x2   -0.5  1     0.2
y    0.3   0.2   1
b1 = ?
b2 = ?
298
One more?
     x1    x2    y
x1   1     -0.5  0.3
x2   -0.5  1     0.2
y    0.3   0.2   1
b1 = 0.53
b2 = 0.47
299
          Mech   Verbal1   Verbal2   Scores
Mech      1      0.1       0.1       0.6
Verbal1   0.1    1         0.9       0.6
Verbal2   0.1    0.9       1         0.3
Scores    0.6    0.6       0.3       1

Standardised estimates: Mech b = 0.56, Verbal1 b = 1.71, Verbal2 b = -1.29
Mechanical
About where we expect
Verbal 1
Very high
Verbal 2
Very low
303
What is going on
Its a suppressor again
An independent variable which
increases the size of the parameters
associated with other independent
variables above the size of their
correlations
304
Variable Selection
What are the appropriate
independent variables to use in a
model?
Depends what you are trying to do
Prediction
What will happen
in the future?
Emphasis on
practical
application
Variables selected
(more) empirically
Value free
Explanation
Why did
something
happen?
Emphasis on
understanding
phenomena
Variables selected
theoretically
Not value free
306
Hierarchical
Variables entered in a predetermined
order
Stepwise
Variables entered according to
change in R2
Actually a family of techniques
308
Entrywise
All variables entered simultaneously
All treated equally
Hierarchical
Entered in a theoretically determined
order
Change in R2 is assessed, and tested
for significance
e.g. sex and age
Should not be treated equally with other
variables
Sex and age MUST be first
Stepwise
Variables entered empirically
Variable which increases R2 the most
goes first
Then the next
Example
IVs: Sex, age, extroversion,
DV: Car how long someone spends
looking after their car
310
Correlation Matrix
SEX
SEX
AGE
EXTRO
CAR
AGE
1.00
-0.05
0.40
0.66
-0.05
1.00
0.40
0.23
EXTRO CAR
0.40
0.66
0.40
0.23
1.00
0.67
0.67
1.00
311
Entrywise analysis
R² = 0.64
         b      p
SEX      0.49   <0.01
AGE      0.08   0.46
EXTRO    0.44   <0.01
312
Stepwise Analysis
Data determines the order
Model 1: Extroversion, R2 = 0.450
Model 2: Extroversion + Sex, R2 =
0.633
         b      p
EXTRO    0.48   <0.01
SEX      0.47   <0.01
313
Hierarchical analysis
Theory determines the order
Model 1: Sex + Age, R2 = 0.510
Model 2: S, A + E, R2 = 0.638
Change in R2 = 0.128, p = 0.001
         b      p
SEX      0.49   <0.01
AGE      0.08   0.46
EXTRO    0.44   <0.01
314
Hierarchical
The change in R2 gives the best estimate
of the importance of extroversion
316
N is large
40 people per predictor, Cohen, Cohen,
Aiken, West (2003)
A quick note on R2
R2 is sometimes regarded as the fit
of a regression model
Bad idea
318
Critique of Multiple
Regression
Goertzel (2002)
Myths of murder and multiple
regression
Skeptical Inquirer (Paper B1)
But:
More guns in rural Southern US
More crime in urban North (crack
cocaine epidemic at time of data)
320
Legalised Abortion
Donohue and Levitt (1999)
Legalised abortion in 1970s cut crime in
1990s
Another Critique
Berk (2003)
Regression analysis: a constructive critique
(Sage)
Is Regression Useless?
Do regression carefully
Dont go beyond data which you have
a strong theoretical understanding of
Validate models
Where possible, validate predictive
power of models in other areas,
times, groups
Particularly important with stepwise
324
Lesson 7: Categorical
Independent Variables
325
Introduction
326
Introduction
So far, just looked at continuous
independent variables
Also possible to use categorical
(nominal, qualitative) independent
variables
e.g. Sex; Job; Religion; Region; Type
(of anything)
Historical Note
But these (t-test/ANOVA) are
special cases of regression analysis
Aspects of General Linear Models
(GLMs)
It is much easier to do it by
partitioning of sums of squares
These cases
Very rare in applied research
Very common in experimental
research
Fisher worked at Rothamsted agricultural
research station
Never have problems manipulating
wheat, pigs, cabbages, etc
329
In psychology
Led to a split between experimental
psychologists and correlational
psychologists
Experimental psychologists (until
recently) would not think in terms of
continuous variables
The Approach
331
The Approach
Recode the nominal variable
Into one, or more, variables to represent
that variable
333
The Techniques
334
Effect coding
For >2 groups
Original Category → New Variable
Exp → 1
Con → 0
337
Some data
Group is x, score is y
               Control Group   Experimental Group
Experiment 1   10              10
Experiment 2   10              20
Experiment 3   10              30
338
Control Group = 0
Intercept = Score on Y when x = 0
Intercept = mean of control group
Experimental Group = 1
b = change in Y when x increases 1
unit
b = difference between experimental
group and control group
339
[Figure: group means plotted for the Control and Experimental groups in Experiments 1–3; the gradient of the slope represents the difference between the means]
340
Dummy Coding 3+
Groups
With three groups the approach is
similar
g = 3, therefore g-1 = 2 variables
needed
3 Groups
Control
Experimental Group 1
Experimental Group 2
341
Original Category   Gp1   Gp2
Con                 0     0
Gp1                 1     0
Gp2                 0     1
F and associated p
Tests H0 that
μg1 = μg2 = μg3
b1 and b2 and associated p-values
Test difference between each
experimental group and the control
group
343
344
Effect Coding
Usually used for 3+ groups
Compares each group (except the
reference group) to the mean of all
groups
Dummy coding compares each group to the
reference group.
Examples
Dummy coding and Effect Coding
Group 1 chosen as reference group
each time
Data
Group   Mean    SD
1       52.40   4.60
2       56.30   5.70
3       60.10   5.00
Total   56.27   5.88
347
Dummy coding
Group   dummy2   dummy3
1       0        0
2       1        0
3       0        1

Effect coding
Group   effect2   effect3
1       -1        -1
2       1         0
3       0         1
348
Dummy:  R = 0.543, F = 5.7, df = 2, 27, p = 0.009
        b0 = 52.4
        b1 = 3.9, p = 0.100
        b2 = 7.7, p = 0.002
        b0 = mean(g1); b1 = mean(g2) − mean(g1); b2 = mean(g3) − mean(g1)

Effect: R = 0.543, F = 5.7, df = 2, 27, p = 0.009
        b0 = 56.27
        b1 = 0.03, p = 0.980
        b2 = 3.8, p = 0.007
        b0 = grand mean (G); b1 = mean(g2) − G; b2 = mean(g3) − G
349
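A sketch contrasting the two coding schemes. The scores are hypothetical (two cases per group, balanced), purely to show how the intercepts differ: with dummy coding b0 is the reference-group mean, with effect coding b0 is the grand mean.

import numpy as np

scores = np.array([50, 54, 55, 57, 59, 61], dtype=float)   # hypothetical, 2 per group
dummy = np.array([[1, 0, 0], [1, 0, 0],
                  [1, 1, 0], [1, 1, 0],
                  [1, 0, 1], [1, 0, 1]], dtype=float)
effect = np.array([[1, -1, -1], [1, -1, -1],
                   [1,  1,  0], [1,  1,  0],
                   [1,  0,  1], [1,  0,  1]], dtype=float)

b_dummy, *_ = np.linalg.lstsq(dummy, scores, rcond=None)
b_effect, *_ = np.linalg.lstsq(effect, scores, rcond=None)
print(b_dummy)    # [52. 4. 8.]: group-1 mean, then differences from group 1
print(b_effect)   # [56. 0. 4.]: grand mean, then differences from the grand mean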
In SPSS
SPSS provides two equivalent
procedures for regression
Regression (which we have been using)
GLM (which we havent)
GLM will:
Automatically code categorical variables
Automatically calculate interaction terms
GLM wont:
Give standardised effects
Give hierarchical R2 p-values
Allow you to not understand
350
351
Test
(Which is a trick; but its designed to
make you think about it)
354
        High Stress   Low Stress
AM      20.1          22.3
PM      6.8           11.8
Diff    13.3          10.5
Using regression
Ensures that all the variance that is
subtracted is true
Reduces the error variance
Two effects
Adjusts the means
Compensates for differences between
groups
358
In SPSS
SPSS automates all of this
But you have to understand it, to
know what it is doing
359
[Screenshots of the SPSS GLM dialog: the outcome goes here, categorical predictors here, continuous predictors here; click Options and select Parameter estimates]
361
More on Change
If difference score is correlated
with either pre-test or post-test
Subtraction fails to remove the
difference between the scores
If two scores are uncorrelated
Difference will be correlated with both
Failure to control
Equal SDs, r = 0
Correlation of change and pre-score
=0.707
362
Lesson 8: Assumptions in
Regression Analysis
364
The Assumptions
1. The distribution of residuals is normal
(at each value of the dependent
variable).
2. The variance of the residuals for every
set of values for the independent
variable is equal.
violation is called
heteroscedasticity.
3. The error term is additive
no interactions.
366
367
Assumption 1: The
Distribution of Residuals is
Normal at Every Value of
the Dependent Variable
368
Look at Normal
Distributions
A normal distribution is
symmetrical and bell-shaped (so they
say)
369
Kurtosis
too flat or too peaked
kurtosed
Outliers
Individual cases which are far from the
distribution
370
Kurtosis
mean not biased
standard deviation is
and hence standard errors, and
significance tests
371
Examining Univariate
Distributions
Histograms
Boxplots
P-P plots
Calculation based methods
372
Histograms
[Histograms of distributions A and B, C and D, E and F]
376
Boxplots
377
P-P Plots
[P–P plots for distributions A and B, C and D, E and F]
Calculation Based
Skew and Kurtosis statistics
Outlier detection statistics
381
382
Skewness: -0.12, 0.271, 0.454, 0.117, 2.106, 0.171 (SE = 0.172)
Kurtosis: -0.084, 0.265, 1.885, -1.081, 5.75, -0.21 (SE = 0.342)
383
Outlier Detection
Calculate distance from mean
z-score (number of standard deviations)
deleted z-score
that case biased the mean, so remove it
Calculate influence
how much effect did that case have on the
mean?
384
Non-Normality in
Regression
385
Checks on Normality
Check residuals are normally
distributed
SPSS will draw histogram and p-p plot
of residuals
Regression Diagnostics
Residuals
standardised, unstandardised, studentised,
deleted, studentised-deleted
look for cases > |3| (?)
Influence statistics
Look for the effect a case has
If we remove that case, do we get a
different answer?
DFBeta, Standardised DFBeta
changes in b
388
Covariance ratio
Ratio of the determinants of the
covariance matrices, with and without
the case
Distances
measures of distance from the
centroid
some include IV, some dont
389
More on Residuals
Residuals are trickier than you
might have imagined
Raw residuals
OK
Standardised residuals
Residuals divided by SD
s_e = √( Σe² / (n − k − 1) )
390
Leverage
But
That SD is wrong
Variance of the residuals is not equal
Those further from the centroid on the
predictors have higher variance
Need a measure of this
hi = 1/n + (xi − x̄)² / Σ(x − x̄)²
The mean leverage is Σhi / n
The centred leverage leaves out the 1/n term:
h*i = (xi − x̄)² / Σ(x − x̄)²
393
Multiple predictors
Calculate the hat matrix (H)
Leverage values are the diagonals of
this matrix
H = X(X′X)⁻¹X′
Where X is the augmented matrix of
predictors (i.e. matrix that includes
the constant)
Hence leverage hii element ii of H
394
Example: X = [1 15; 1 20; … ; 1 65]
H = X(X′X)⁻¹X′ = [0.318 0.273 …; 0.273 0.236 …; …]
Leverage values are the diagonal elements: h11 = 0.318, h22 = 0.236, …
Standardised / Studentised
Now we can calculate the
standardised residuals
SPSS calls them studentised residuals
Also called internally studentised
residuals
e′i = ei / (s_e × √(1 − hi))
396
Deleted Studentised
Residuals
Studentised residuals do not have
a known distribution
Cannot use them for inference
Testing Significance
We can calculate the probability of
a residual
Is it sampled from the same
population
BUT
Massive type I error rate
Bonferroni correct it
Multiply p value by N
398
Bivariate Normality
We didn't just say residuals are
normally distributed
We said at every value of the
dependent variable
Two variables can be normally
distributed univariate,
but not bivariate
399
Couples' IQs
male and female
[Histograms: female and male IQ scores, 60–140]
But wait!!
[Scatterplot: male IQ plotted against female IQ]
So plot X against Y
OK for bivariate
but may be a multivariate outlier
Need to draw graph in 3+ dimensions
can't draw a graph in 3+ dimensions
[Histogram of residuals from the couples' IQ regression]
403
Multivariate Outliers
Will be explored later in the
exercises
So we move on
404
Transform data
removes skew
positive skew → log transform
negative skew → square
405
Transformation
May need to transform IV and/or DV
More often DV
time, income, symptoms (e.g. depression) all
positively skewed
Change measures
increase sensitivity at ranges
avoiding floor and ceiling effects
Outliers
Can be tricky
Why did the outlier occur?
Error? Delete them.
Weird person? Probably delete them
Normal person? Tricky.
407
Which is better?
A good model, which explains 99% of
your data?
A poor model, which explains all of it
409
410
Heteroscedasticity
This assumption is about
heteroscedasticity of the residuals
Hetero=different
Scedastic = scattered
[Scatterplot: male IQ against female IQ]
standardised residuals
deleted residuals
standardised deleted residuals
studentised residuals
[Residuals against predicted values: good — no heteroscedasticity]
[Residuals against predicted values: bad — heteroscedasticity]
415
Testing Heteroscedasticity
White's test
Regress the squared residuals on the predictors (plus their squares and cross-products)
Test statistic = N × R² (from this second regression)
Distributed as χ²
df = k (for the second regression)
Magnitude of
Heteroscedasticity
Chop data into slices
5 slices, based on X (or predicted
score)
Done in SPSS
420
Variances of the 5 groups:
0.219, 0.336, 0.757, 0.751, 3.119
We have a problem
3 / 0.2 ≈ 15
Dealing with
Heteroscedasticity
4.
5.
6.
7.
Heteroscedasticity
Implications and Meanings
Implications
What happens as a result of
heteroscedasticity?
Parameter estimates are correct
not biased
However
If there is no skew in predicted
scores
P-values a tiny bit wrong
If skewed,
P-values very wrong
Can do exercise
425
Meaning
What is heteroscedasticity trying
to tell us?
Our model is wrong: it is misspecified
Something important is happening
that we have not accounted for
b0 = 0.24, p=0.97
b1 = 0.71, p < 0.001
b2 = 0.23, p = 0.031
White's test
χ² = 18.6, df = 5, p = 0.002
Which means
the effects of the variables are not
additive
If you think that what a charity does
is important
you might give more money
how much more depends on how much
money you have
429
[Scatterplot: amount GIVEN against IMPORT (perceived importance of the charity), with separate lines for High and Low earnings]
430
431
432
Additivity
What heteroscedasticity shows you
effects of variables need to be additive
In medicine
Choose to test for salient non-additive
effects
e.g. sex, race
435
Assumption 4: At every
value of the dependent
variable the expected
(mean) value of the
residuals is zero
436
Linearity
Relationships between variables should
be linear
best represented by a straight line
437
[Scatterplot: fuel consumption against speed — a curved relationship]
438
R2 = 0.938
looks pretty good
know speed, make a good prediction
of fuel
BUT
look at the chart
if we know speed we can make a
perfect prediction of fuel used
R2 should be 1.00
439
Detecting Non-Linearity
Residual plot
just like heteroscedasticity
440
Residual plot
441
Linearity: A Case of
Additivity
Linearity = additivity along the range of
the IV
Jeremy rides his bicycle harder
Increase in speed depends on current speed
Not additive, multiplicative
MacCallum and Mar (1995). Distinguishing
between moderator and quadratic effects in
multiple regression. Psychological Bulletin.
442
443
Independence Assumption
Also: lack of autocorrelation
Tricky one
often ignored
exists for almost all tests
How is it Detected?
Can be difficult
need some clever statistics
(multilevel models)
Residual Plots
Were data collected in time order?
If so plot ID number against the
residuals
Look for any pattern
Test for linear relationship
Non-linear relationship
Heteroscedasticity
446
[Plot: residuals against participant number]
447
clusters of cases
patients treated by three doctors
children from different classes
people assessed in groups
448
An example
students do an exam (on statistics)
choose one of three questions
IV: time
DV: grade
449
[Scatterplot: grade against time]
BUT
we haven't considered which question
people answered
we might have violated the
independence assumption
DV will be autocorrelated
Look again
with questions marked
451
[Scatterplot: grade against time, with points labelled by the question answered]
452
453
454
Assumption 6: All
independent variables are
uncorrelated with the
error term.
455
It is about the DV
must have no effect (when the IVs
have been removed)
on the DV
456
Problem in economics
Demand increases supply
Supply increases wages
Higher wages increase demand
457
Assumption 7: No
independent variables are
a perfect linear function
of other independent
variables
no perfect multicollinearity
458
No Perfect Multicollinearity
IVs must not be linear functions of one
another
matrix of correlations of IVs is not positive
definite
cannot be inverted
analysis cannot proceed
460
461
Y 0 1 x1
Y ( 0 3) 1 x1 ( 3)
- note, Greek letters because we are
talking about population values
462
463
465
Lesson 9: Issues in
Regression Analysis
Things that alter the
interpretation of the
regression equation
466
Causality
Sample sizes
Collinearity
Measurement error
467
Causality
468
What is a Cause?
Debate about definition of cause
some statistics (and philosophy)
books try to avoid it completely
We are not going into depth
just going to show why it is hard
I exist because
My parents met because
My father had a job
Proximal cause
the direct and immediate cause of
something
Ultimate cause
the thing that started the process off
I fell off my bicycle because of the
bump
I fell off because I was going too fast
471
474
Association
Correlation does not mean causation
we all know
But
Causation does mean correlation
         Price   Demand   Sales
Price    1       0.6      0
Demand   0.6     1        0.6
Sales    0       0.6      1
477
Direction of Influence
Relationship between A and B
three possible processes
A causes B
B causes A
C causes A & B
478
Storm
Isolation
Isolate the dependent variable
from all other influences
as experimenters try to do
Cannot do this
can statistically isolate the effect
using multiple regression
480
Role of Theory
Strong theory is crucial to making
causal statements
Fisher said: to make causal
statements make your theories
elaborate.
dont rely purely on statistical
analysis
483
I drink a lot
of beer
16 causal
relations
120 non-causal
correlations
laugh
toilet
jokes (about
statistics)
vomit
karaoke
curtains closed
sleeping
headache
equations (beermat)
thirsty
fried breakfast
no beer
curry
chips
falling over
lose keys
484
485
1.
2.
3.
4.
5.
6.
7.
8.
488
No Causation without
Experimentation
Blatantly untrue
I don't doubt that the sun shining
makes us warm
AI and Causality
A robot needs to make judgements
about causality
Needs to have a mathematical
representation of causality
Suddenly, a problem!
Doesn't exist
Most operators are non-directional
Causality is directional
490
Sample Sizes
How many subjects does it
take to run a regression
analysis?
491
Introduction
Social scientists don't worry enough
about the sample size required
"Why didn't you get a significant result?"
"I didn't have a large enough sample"
Not a common answer
493
Rules of Thumb
Lots of simple rules of thumb exist
10 cases per IV
>100 cases
Green (1991) more sophisticated
To test significance of R2 N = 50 + 8k
To test sig of slopes, N = 104 + k
Power Analysis
Introducing Power Analysis
Hypothesis test
tells us the probability of a result of
that magnitude occurring, if the null
hypothesis is correct (i.e. there is no
effect in the population)
Doesn't tell us
the probability of that result, if the
null hypothesis is false
495
496
Type I Errors
Type I error is false rejection of H0
Probability of making a type I error
the significance value cut-off
usually 0.05 (by convention)
Type II errors
Type II error is false acceptance of
the null hypothesis
Much, much trickier
Example
I do an experiment (random
sampling, all assumptions perfectly
satisfied)
I find p = 0.05
498
Power = 1 − β
Probability of getting a significant
result
500
Research findings against reality:
                                   H0 true (no effect to be found)   H0 false (effect to be found)
We find no effect (p > 0.05)       correct                           Type II error: p = β
We find an effect (p < 0.05)       Type I error: p = α               correct: power = 1 − β
501
503
504
f² = R² / (1 − R²)
506
f² = sr²i / (1 − R²)   (using the squared semi-partial correlation of a single predictor)
508
Underpowered Studies
Research in the social sciences is
often underpowered
Why?
See Paper B11 the persistence of
underpowered studies
510
Extra Reading
Power traditionally focuses on p
values
What about CIs?
Paper B8 Obtaining regression
coefficients that are accurate, not
simply significant
511
Collinearity
512
514
Meaning of Collinearity
Literally co-linearity
lying along the same line
Perfect collinearity
when some IVs predict another
Total = S1 + S2 + S3 + S4
S1 = Total − (S2 + S3 + S4)
rare
515
516
Implications
Effects the stability of the
parameter estimates
and so the standard errors of the
parameter estimates
and so the significance
Because
shared variance, which the regression
procedure doesnt know where to put
517
Sex differences
due to genetics?
due to upbringing?
(almost) perfect collinearity
statistically impossible to tell
519
520
Detecting Collinearity
Look at the parameter estimates
large standardised parameter
estimates (>0.3?), which are not
significant
be suspicious
Tolerance = 1 − R²i (R² of that IV regressed on the other IVs)
VIF = 1 / Tolerance
522
Actions
What you can do about collinearity
no quick fix (Fox, 1991)
get a bigger N
Many measures
4. Ridge regression
Measurement Error
526
What is Measurement
Error
In social science, it is unlikely that
we measure any variable perfectly
measurement error represents this
imperfection
x = T + e
just like a regression equation
standardise the parameters
T is the reliability
the amount of variance in x which comes from
T
Simple Effects of
Measurement Error
Lowers the measured correlation
between two variables
Real correlation
true scores (x* and y*)
Measured correlation
measured scores (x and y)
529
True correlation
of x and y
rx*y*
x*
y*
Reliability of x
rxx
Reliability of y
ryy
Measured
correlation of x and y
rxy
530
Attenuation of correlation
r_x*y* = r_xy / √(r_xx × r_yy)
531
Example
r_xx = 0.7, r_yy = 0.8, r_xy = 0.3
r_x*y* = r_xy / √(r_xx × r_yy)
r_x*y* = 0.3 / √(0.7 × 0.8) = 0.40
Complex Effects of
Measurement Error
Really horribly complex
Measurement error reduces
correlations
reduces estimate of
reducing one estimate
increases others
Complications
Assume measurement error is
additive
linear
Additive
e.g. weight people may under-report /
over-report at the extremes
Linear
particularly the case when using proxy
variables
535
536
537
Introduction
Non-linear effect occurs
when the effect of one independent
variable
is not consistent across the range of
the IV
Assumption is violated
expected value of residuals = 0
no longer the case
538
Some Examples
539
[Figure: a learning curve — skill against experience]
[Figure: the Yerkes–Dodson curve — performance against arousal]
[Figure: mood (suicidal to enthusiastic) against time]
542
Learning
line changed direction once
Yerkes-Dodson
line changed direction once
Enthusiasm
line changed direction twice
543
Everything is Non-Linear
Every relationship we look at is
non-linear, for two reasons
Exam results cannot keep increasing
with reading more books
Linear in the range we examine
Non-Linear
Transformations
545
Transformations
We need to transform the data
rather than estimating a curved line
which would be very difficult
may not work with OLS
Much trickier
Statistical theory either breaks down
OR becomes harder
547
Linear transformations
multiply by a constant
add a constant
change the slope and the intercept
548
[Figure: the lines y = x, y = 2x and y = x + 3]
549
Non-linear transformation
will bend the slope
Quadratic transformation
y = x²
one change of direction
Cubic transformation
y = x² + x³
two changes of direction
551
Quadratic Transformation
552
y = 20 − 3x + 5x²
553
Cubic Transformation
y = 3 − 4x + 2x² − 0.2x³
554
Logarithmic Transformation
y = 1 + 0.1x + 10log(x)
555
Inverse Transformation
y = 20 -10x + 8(1/x)
556
557
Detecting Non-linearity
558
Draw a Scatterplot
Draw a scatterplot of y plotted
against x
see if it looks a bit non-linear
e.g. Anscombe's data
e.g. Education and beginning salary
from bank data
drawn in SPSS
with line of best fit
559
Anscombe (1973)
constructed a set of datasets
show the importance of graphs in
regression/correlation
All four datasets share: N = 11, mean of x = 9, mean of y = 7.5, regression line y = 3 + 0.5x, r = 0.82, R² = 0.67
560
561
562
563
564
A Real Example
Starting salary and years of
education
From employee data.sav
565
[Scatterplot: beginning salary against education; over some ranges of education the expected value of the error (residual) is > 0, over others it is < 0]
566
567
We want
points to lie in a nice straight sausage
568
We don't want
a nasty bent sausage
570
571
Linear Transformation
Linear transformation doesn't
change
interpretation of slope
standardised slope
se, t, or p of slope
R2
Can change
effect of a transformation
572
Non-linear Effect
Compute new variable
quadratic
educ2 = educ²
Standardised
b1 (educ) = -2.4
b2 (educ2) = 3.1
Collinearity
is what is going on
Correlation of educ and educ2
r = 0.990
Cubic Effect
While we are at it, let's look at the
cubic effect
R² (change) = 0.004, p = 0.045
ŷ = 19138 + 103e − 206e² + 12e³
Standardised:
b1(e) = 0.04
b2(e2) = -2.04
b3(e3) = 2.71
577
Fourth Power
Keep going while we are ahead
won't run
???
Interpretation
Tricky, given that parameter
estimates are a bit nonsensical
Two methods
1: Use R2 change
Save predicted values
or calculate predicted values to plot line
of best fit
[Figure: beginning salary against education (years), with linear, quadratic and cubic fitted lines]
580
581
Education   Slope
9           -962
10          -342
11          278
12          898
13          1518
14          2138
15          2758
16          3378
17          3998
18          4618
19          5238
20          5858
1 year of
education at the
higher end of the
scale, better than
1 year at the lower
end of the scale.
MBA versus GCSE
582
Differentiate Cubic
ŷ = 19138 + 103e − 206e² + 12e³
dy/de = 103 − 206 × 2 × e + 12 × 3 × e²
Can calculate slopes for quadratic
and cubic at different values
583
A Quick Note on
Differentiation
For y = xᵖ
dy/dx = p·xᵖ⁻¹
y = 4x + 5x² + 6x³
dy/dx = 4 + 5 × 2 × x + 6 × 3 × x²
Many functions are simple to
differentiate
Not all though
586
Automatic Differentiation
If you
Don't know how to differentiate
Can't be bothered to look up the
function
588
589
Introduction
Often in social sciences, we have a
dichotomous/nominal DV
we will look at dichotomous first, then a
quick look at multinomial
Dichotomous DV
e.g.
guilty/not guilty
pass/fail
won/lost
Alive/dead (used in medicine)
590
591
Exp   Pass
6     0
15    0
12    0
6     0
15    1
6     0
16    1
10    1
12    0
26    1
593
DV
pass (1 = Yes, 0 = No)
Or does it?
1st Problem: P–P plot of residuals
[P–P plot: expected cumulative probability against observed probability for the residuals]
595
596
Problems 1 and 2
strange distributions of residuals
parameter estimates may be wrong
standard errors will certainly be
wrong
597
Cannot be interpreted
need a different approach
598
A Different Approach
Logistic Regression
599
Logit Transformation
In lesson 10, transformed IVs
now transform the DV
No lower limit
you cant do worse than fail
600
Step 1: Convert to
Probability
First, stop talking about values
talk about probability
for each value of score, calculate
probability of pass
601
Score        1     2     3     4     5
Fail   N     7     5     6     4     2
       P     0.7   0.5   0.6   0.4   0.2
Pass   N     3     5     4     6     8
       P     0.3   0.5   0.4   0.6   0.8

The probability of failure given a score of 1 is 0.7; the probability of passing given a score of 5 is 0.8
602
This is better
Now a score of 0.41 has a meaning
a 0.41 probability of pass
603
604
605
Odds = p / (1 − p); for p = 0.8, odds = 0.8 / 0.2 = 4
equivalent to 4:1 (odds on)
4 times out of five
606
607
log₁₀(x)
log(10) = 1
log(100) = 2
log(1000) = 3
608
log(1) = 0
log(0.1) = -1
log(0.00001) = -5
609
Natural log, ln
Has some desirable properties, that
log10 doesnt
For us
If y = ln(x) + c
dy/dx = 1/x
Not true for any other logarithm
610
611
612
613
Score             1      2      3      4      5
Fail   N          7      5      6      4      2
       P          0.7    0.5    0.6    0.4    0.2
Pass   N          3      5      4      6      8
       P          0.3    0.5    0.4    0.6    0.8
Odds (Fail)       2.33   1.00   1.50   0.67   0.25
log(odds) Fail    …
[Figure: probability plotted against logit — an S-shaped curve; probability gets closer to zero, but never reaches it, as the logit goes down]
616
Parameter Estimation
using ML
ML tries to find estimates of model
parameters that are most likely to
give rise to the pattern of
observations in the sample data
All gets a bit complicated
OLS is a special case of ML
the mean is an ML estimator
617
Interpreting Output
Using SPSS
Overall fit for:
step (only used for stepwise)
block (for hierarchical)
model (always)
in our model, all are the same
χ² = 4.9, df = 1, p = 0.025
(plays the role the F test does in OLS regression)
619
        Chi-square   df   Sig.
Step    4.990        1    .025
Block   4.990        1    .025
Model   4.990        1    .025
620
Model summary
-2LL (=2/N)
Cox & Snell R2
Nagelkerke R2
Different versions of R2
No real R2 in logistic regression
should be considered pseudo R2
621
Model Summary
Step   -2 Log likelihood   Cox & Snell R²   Nagelkerke R²
1      64.245              .095             .127
622
Classification Table
predictions of model
based on cut-off of 0.5 (by default)
predicted values x actual values
623
Classification Table (a)
                     Predicted 0   Predicted 1   Percentage Correct
Observed PASS = 0    18            8             69.2
Observed PASS = 1    12            12            50.0
Overall Percentage                               60.0
a. The cut value is .500
624
Model parameters
B
Change in the logged odds associated
with a change of 1 unit in IV
just like OLS regression
difficult to interpret
SE (B)
Standard error
Multiply by 1.96 to get 95% CIs
625
           B       S.E.   Wald
SCORE      -.467   .219   4.566
Constant   1.314   .714   3.390
Constant
i.e. score = 0
B = 1.314
Exp(B) = e^B = e^1.314 = 3.720
OR = 3.720; p = 1 − (1 / (OR + 1))
= 1 − (1 / (3.720 + 1))
p = 0.788
627
Score 1
Constant b = 1.314
Score B = -0.467
Exp(1.314 − 0.467) = Exp(0.847) = 2.332
OR = 2.332
p = 1 − (1 / (2.332 + 1))
= 0.699
628
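A sketch of the same arithmetic in Python: turn the fitted coefficients above back into the probability the model implies at each score.

import math

b0, b1 = 1.314, -0.467           # constant and slope for score, from the output above

def p_outcome(score):
    logit = b0 + b1 * score      # log odds
    odds = math.exp(logit)
    return odds / (odds + 1)     # same as 1 - 1/(odds + 1)

for s in (0, 1, 2):
    print(s, round(p_outcome(s), 3))   # 0.788, 0.699, ...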
Symmetrical in B
Non-symmetrical (sometimes very) in
exp(B)
629
           B       S.E.   Exp(B)   95% CI Lower   95% CI Upper
SCORE      -.467   .219   .627     .408           .962
Constant   1.314   .714   3.720
631
633
Probit Regression
Very similar to logistic
much more complex initial
transformation (to normal
distribution)
Very similar results to logistic
(multiplied by 1.7)
In SPSS:
A bit weird
Probit regression available through
menus
634
However
Ordinal logistic regression is
equivalent to binary logistic
If outcome is binary
635
Results
                     Variable   Estimate   SE      p
Logistic (binary)    Score      0.288      0.301   0.339
                     Exp        0.147      0.073   0.043
Logistic (ordinal)   Score      0.288      0.301   0.339
                     Exp        0.147      0.073   0.043
Probit               Score      0.191      0.178   0.282
                     Exp        0.090      0.042   0.033
636
Differentiating Between
Probit and Logistic
Depends on shape of the error term
Normal or logistic
Graphs are very similar to each other
Could distinguish quality of fit
Given enormous sample size
Probit advantage
Understand the distribution
Logistic advantage
Much simpler to get back to the probability
637
638
[Figure: the normal (probit) and logistic curves overlaid — very similar shapes]
Infinite Parameters
Non-convergence can happen
because of infinite parameters
Insoluble model
Three kinds:
Complete separation
The groups are completely distinct
Pass group all score more than 10
Fail group all score less than 10
639
Quasi-complete separation
Separation with some overlap
Pass group all score 10 or more
Fail group all score 10 or less
Both cases:
No convergence
Close to this
Curious estimates
Curious standard errors
640
Categorical Predictors
Can cause separation
Esp. if correlated
Need people in every cell
[Crosstab: Male/Female × White/Non-White × Below/Above Poverty Line — every cell needs people]
641
Calculate c statistic
Measure of discriminative power
Percentage of all possible pairs of cases in which the model
gives a higher probability to the case with the outcome
than to the case without it
642
Save probabilities
Use Graphs, ROC Curve
Test variable: predicted probability
State variable: outcome
643
Specificity
Probability of saying someone has a
negative result
If they do: p(neg)|neg
644
Sensitivity (value)
P(m)
645
Salary   P(minority)
10       .39
20       .31
30       .23
40       .17
50       .12
60       .09
70       .06
80       .04
90       .03
646
647
[ROC curve: sensitivity against 1 − specificity; diagonal segments are produced by ties]
The area under the curve is the c-statistic
648
More Advanced
Techniques
Multinomial Logistic Regression
more than two categories in DV
same procedure
one category chosen as reference
group
odds of being in category other than
reference
Final Thoughts
Logistic Regression can be
extended
dummy variables
non-linear effects
interactions (even though we don't
cover them until the next lesson)
651
652
653
Introduction
Moderator
Level of one variable influences effect of
another variable
Mediator
One variable influences another via a third
variable
education
beginning
salary
Why?
What is the process?
Are we making assumptions about the
process?
Should we test those assumptions?
655
[Path diagram: education → job skills, expectations, negotiating skills, kudos for bank → beginning salary]
[Path diagram: having fun in pub in evening → not reading books on regression → less knowledge. Anything here (a direct path)?]
[Path diagram: the same, with fatigue added as another mediator. Is the direct path still needed?]
659
Mediators needed
to cope with more sophisticated
theory in social sciences
make explicit assumptions made
about processes
examine direct and indirect influences
660
Detecting Mediation
661
4 Steps
From Baron and Kenny (1986)
To establish that the effect of X on Y
is mediated by M
1. Show that X predicts Y
2. Show that X predicts M
3. Show that M predicts Y, controlling
for X
4. If effect of X controlling for M is zero, M is a complete mediator of the relationship
[Path diagram: Buy Books → Read Books]
663
Three Variables
Enjoy
How much an individual enjoys books
Buy
How many books an individual buys
(in a year)
Read
How many books an individual reads
(in a year)
664
        ENJOY   BUY    READ
ENJOY   1.00    0.64   0.73
BUY     0.64    1.00   0.75
READ    0.73    0.75   1.00
665
The Theory
enjoy
buy
read
666
Step 1
1. Show that X (enjoy) predicts Y
(read)
b1 = 0.487, p < 0.001
standardised b1 = 0.732
OK
667
668
669
b2 = 0.287, p = 0.001
standardised b2 = 0.431
Hmmmm
670
[Path diagram: enjoy → read direct effect = 0.287 (step 4); enjoy → buy = 0.974 (from step 2); buy → read = 0.206 (from step 3)]
SE of Mediator
[Path diagram: enjoy → buy, path a (from step 2); buy → read, path b (from step 3)]
sa = se(a)
sb = se(b)
673
Sobel test
Standard error of mediation
coefficient can be calculated
se(ab) = √( b²·sa² + a²·sb² − sa²·sb² )
a = 0.974, sa = 0.189
b = 0.206, sb = 0.054
674
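A sketch of the Sobel test using the a and b paths and standard errors above (the formula with the subtracted term, as on the slide):

import math
from scipy import stats

a, sa = 0.974, 0.189
b, sb = 0.206, 0.054

ab = a * b                                             # the indirect (mediated) effect
se_ab = math.sqrt(b**2 * sa**2 + a**2 * sb**2 - sa**2 * sb**2)
z = ab / se_ab
p = 2 * stats.norm.sf(abs(z))
print(round(ab, 3), round(se_ab, 3), round(z, 2), round(p, 4))   # ~0.201, 0.065, 3.1, 0.002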
675
A Note on Power
Recently
Move in methodological literature away
from this conventional approach
Problems of power:
Several tests, all of which must be
significant
Type I error rate = 0.05 * 0.05 = 0.0025
Must affect power
676
677
678
679
Introduction
Moderator relationships have many
different names
interactions (from ANOVA)
multiplicative
non-linear (just confusing)
non-additive
681
Hang on
That seems very like a nonlinear
relationship
Moderator
Effect of one variable depends on level of another
Non-linear
Effect of one variable depends on level of itself
682
684
685
686
Presence of heteroscedasticity
Clue there may be a moderated
relationship missing
688
689
2 IVs
Data
The four cells (words, test): (1,1), (1,2), (2,1), (2,2)
5 per group
lesson12.1.sav
690
Recog
Recall
Total
691
Graph of means
[Graph of cell means by WORDS and TEST]
692
ANOVA Results
Standard way to analyse these
data would be to use ANOVA
Words: F=6.1, df=1, 16, p=0.025
Test: F=5.1, df=1, 16, p=0.039
Words x Test: F=5.6, df=1, 16,
p=0.031
693
word   test   w×t
-1     -1     1
1      -1     -1
-1     1      -1
1      1      1
695
696
b0=13.2
b1 (words) = -2.3, p=0.025
b2 (test) = -2.1, p=0.039
b3 (words x test) = -2.2, p=0.031
697
b0 = 13.2
the grand mean
b1 = -2.3
distance from the grand mean to the means for the two word types
13.2 - (-2.3) = 15.5
13.2 + (-2.3) = 10.9
b2 = -2.1
distance from the grand mean to the recog and recall means
b3 = -2.2
to understand b3 we need to look at predictions from the equation without this term
699
700
b1 = -2.3, b2 = -2.1
Expected values from the equation without the interaction term, against the actual cell means:
Word  Test  Expected  Actual
 -1    -1     17.6     15.4
 -1     1     13.4     15.6
  1    -1     13.0     15.2
  1     1      8.8     11.0
703
[Figure: means plotted against test type, Recog (-1) vs Recall (1).]
Gradient for both word groups combined: (11.1 - 15.3) / 2 = -2.1
Gradient for abstract words: (6.6 - 15.2) / 2 = -4.3
Gradient for concrete words: (15.6 - 15.4) / 2 = 0.1
705
706
as we shall see
708
Categorical x Continuous
709
Note on Dichotomisation
Very common to see people
dichotomise a variable
Makes the analysis easier
Very bad idea
Paper B6
710
Data
A chain of 60 supermarkets
examining the relationship
between profitability, shop size,
and local competition
2 IVs
shop size
comp (local competition, 0=no,
1=yes)
DV
profit
711
First 10 cases:
Comp  Profit
  1     23
  1     25
  0     19
  0      9
  1     18
  1     33
  0     17
  1     20
  0     21
  0      8
712
1st Analysis
Two IVs
R2=0.367, df=2, 57, p < 0.001
Unstandardised estimates
b1 (shopsize) = 0.083 (p=0.001)
b2 (comp) = 5.883 (p<0.001)
Standardised estimates
b1 (shopsize) = 0.356
b2 (comp) = 0.448
713
Suspicions
Presence of competition is likely to
have an effect
Residual plot shows a little
heteroscedasticity
[Residual plot: standardised residuals (about -3 to 3) against standardised predicted values (-2.0 to 2.0).]
714
Hierarchical regression
715
Result
Unstandardised estimates
b1 (shopsize) = 0.071 (p=0.006)
b2 (comp) = -1.67 (p = 0.506)
b3 (sxc) = -0.050 (p=0.050)
Standardised estimates
b1 (shopsize) = 0.306
b2 (comp) = -0.127
b3 (sxc) = -0.389
716
717
Interpretation
Draw graph with lines of best fit
drawn automatically by SPSS
718
40
30
20
Profit
10
Competition
No competition
All Shops
0
20
40
60
80
100
Shopsize
719
Effects of size
in the presence and absence of competition
(we can ignore the constant)
Y = 0.071 x1 + (-1.67) x2 + (-0.050) x1 x2
Competition present (x2 = 1):
Y = 0.071 x1 + (-1.67) + (-0.050) x1
Y = 0.021 x1 - 1.67
720
721
722
Data
Bank Employees
only using clerical staff
363 cases
predicting starting salary
previous experience
age
age x experience
723
Correlation matrix
only one significant
724
The Procedure
Very similar to before:
create a multiplicative interaction term
BUT differences in means and SDs
cause one variable to dominate the interaction term
By standardising both variables first, we avoid this
726
To standardise x,
subtract mean, and divide by SD
re-expresses x in terms of distance
from the mean, in SDs
ie z-scores
Hierarchical regression
two linear effects first
moderator effect in second
hint: it is often easier to interpret if
standardised versions of all variables
are used
728
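A hedged R sketch of that procedure; it assumes the bank data are in a data frame called dat with columns salbegin, prevexp and agestart (names not checked against the file):

dat$z_exp <- as.numeric(scale(dat$prevexp))    # z-score: (x - mean) / SD
dat$z_age <- as.numeric(scale(dat$agestart))
block1 <- lm(salbegin ~ z_exp + z_age, data = dat)   # linear effects only
block2 <- update(block1, . ~ . + z_exp:z_age)        # add the moderator term
anova(block1, block2)                                # tests the change in R^2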
Change in R2
0.085, p<0.001
Estimates (standardised)
b1 (exp) = 0.104
b2 (agestart) = -0.54
b3 (age x exp) = -0.54
729
Interpretation 1: Pick-a-Point
A graph is tricky
can't put two continuous variables on one scatterplot
Choose specific points (pick-a-point)
Graph the line of best fit of one variable at chosen values of the other
We know:
Y = 0.10 e + (-0.54) a + (-0.54) a e
where a = agestart and e = experience
731
e = -1: b0 = (-1 x 0.10) = -0.10; slope of a = (-0.54 + (-1)(-0.54)) = 0
e = 0: b0 = (0 x 0.10) = 0; slope of a = (-0.54 + 0(-0.54)) = -0.54, i.e. -0.54a
e = 1: b0 = (1 x 0.10) = 0.10; slope of a = (-0.54 + 1(-0.54)) = -1.08, i.e. -1.08a
732
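The same arithmetic in a few lines of R, using the standardised estimates from the slides:

b_exp <- 0.10; b_age <- -0.54; b_int <- -0.54
simple_slope <- function(e) b_age + b_int * e   # slope of agestart at experience = e
sapply(c(-1, 0, 1), simple_slope)               # 0.00, -0.54, -1.08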
733
Calculate p-value
At any point
735
Get results
Calculations in Bauer and Curran
(in press: Multivariate Behavioral
Research)
Paper B13
736
[Figure: critical values of the moderator, CVz1(1), CVz1(2), CVz1(3), plotted over the range -1.0 to 1.0; y-axis roughly 4.0 to 4.5.]
737
Areas of Significance
[Figure: the simple slope (about -0.6 to 0.4) plotted against experience, with confidence bands marking the areas of significance.]
738
2 complications
1: The constant differed
2: The DV was logged, hence non-linear: the effect of a 1-unit change depends on where that unit is
739
Finally
740
Unlimited Moderators
Moderator effects are not limited
to
2 variables
linear effects
741
Block 2
Age x Sex, Age x Exp, Sex x Exp
Block 3
Age x Sex x Exp
742
Results
All two way interactions significant
Three way not significant
Effect of Age depends on sex
Effect of experience depends on sex
Size of the age x experience
interaction does not depend on sex
(phew!)
743
Moderated Non-Linear
Relationships
Enter the non-linear effect
Enter the non-linear effect x moderator
if significant, this indicates that the degree of non-linearity differs by the moderator
744
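As a sketch (hypothetical variable names in a data frame dat, not course data), the model in R would look like:

fit <- lm(y ~ x + I(x^2) + m + x:m + I(x^2):m, data = dat)
summary(fit)   # a significant I(x^2):m term says the curvature of x differs with m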
745
746
[Figure: distribution of the count outcome, with frequencies of roughly 109, 65, 22, 3, 1, 0.]
747
Common approach
Log transform and treat as normal
Problems
Censored at 0
Integers only allowed
Heteroscedasticity
748
749
$p(y \mid x) = \frac{e^{-\mu}\mu^{y}}{y!}$
Where:
y is the count
$\mu$ is the mean of the Poisson distribution
In a Poisson distribution the mean = the variance (hence the heteroscedasticity issue)
751
Poisson Regression in
SPSS
Not directly available
SPSS can be tweaked to do it in three ways:
General loglinear model (genlog)
Non-linear regression (CNLR), bootstrapped p-values only
Generalized linear models (SPSS 15 onwards)
752
Weight cases by
bites
Analyse,
Loglinear, General
Colour is factor
753
Results
Correspondence Between Parameters and Terms of the Design:
Parameter 1: Constant
Parameter 2: [COLOUR = 1]
Parameter 3: x [COLOUR = 2]
Note: 'x' indicates an aliased (or redundant) parameter. These parameters are set to zero.
754
Asymptotic parameter estimates:
Param   Est.     SE      Z-value   95% CI Lower   95% CI Upper
1       4.1190   .1275   32.30     3.87           4.37
2       -.5495   .2108   -2.61     -.96           -.14
3       .0000    .       .         .              .
Note: the intercept (param 1) is curious
Param 2 is the difference in the means
755
SPSS: Continuous
Predictors
Bleedin' nightmare
http://www.spss.com/tech/answer/details.cfm?tech_tan_id=100006204
756
Poisson Regression in
Stata
SPSS will save a Stata file
Open it in Stata
Statistics, Count outcomes, Poisson
regression
757
Poisson Regression in R
R is a freeware program
Similar to SPlus
www.r-project.org
Commands in R
Stage 1: enter the data
colour <- c(1, 0, 1, 0, 1, 0, 1)
bites <- c(3, 1, 0, 0, ...)
Run the analysis
p1 <- glm(bites ~ colour, family = poisson)
Get the results
summary.glm(p1)
759
R Results
Coefficients:
            Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)  -0.3567      0.1686   -2.115   0.03441 *
colour        0.5555      0.2116    2.625   0.00866 **
Predicted Values
Need to get exponential of
parameter estimates
Like logistic regression
exp(0.5555) = 1.74
You are likely to be bitten by a shark
1.74 times more often with a red
surfboard
761
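In R the same back-transformation can be done directly on the model fitted earlier (p1):

exp(coef(p1))              # rate ratios, e.g. exp(0.5555) = 1.74 for colour
exp(confint.default(p1))   # Wald confidence intervals on the same scale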
Checking Assumptions
Was it really Poisson distributed?
For Poisson:
$p(y \mid x) = \frac{e^{-\mu}\mu^{y}}{y!}$
Strictly:
$p(y_i \mid x_i) = \frac{e^{-\mu_i}\mu_i^{y_i}}{y_i!}$
763
764
Overdispersion
A problem in Poisson regression
Too many zeroes
Causes:
χ² inflation
Standard error deflation
Hence p-values too low
Solution:
Negative binomial regression
765
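A sketch of that solution in R, reusing the shark-bite variables from the earlier example and glm.nb() from the MASS package:

library(MASS)
nb1 <- glm.nb(bites ~ colour)   # negative binomial allows variance > mean
summary(nb1)
AIC(p1, nb1)                    # compare with the Poisson model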
Using R
R can read an SPSS file
But you have to ask it nicely
More on R
R uses objects
To place something into an object use <-
X <- Y
Puts Y into X
The function is read.spss(), from the foreign package
mydata <- read.spss("spssfilename.sav", to.data.frame = TRUE)
GLM in R
Command
glm(outcome ~ pred1 + pred2 + ... + predk [, family = familyname])
If no family is given, the default is gaussian (i.e. OLS)
Use binomial for logistic, poisson for Poisson
769
770
Introducing Structural
Equation Modelling
Lesson 15
771
Introduction
Related to regression analysis
All (OLS) regression can be
considered as a special case of SEM
Regression as SEM
Grades example
Grade = constant + books + attend +
error
Looks like a regression equation
Also
Books correlated with attend
Explicit modelling of error
773
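A rough equivalent in R's lavaan package (an aside on my part; the course itself uses AMOS and Mplus), assuming the grades example is in a data frame called grades with columns grade, books and attend:

library(lavaan)
model <- '
  grade ~ books + attend   # the regression part
  books ~~ attend          # correlation between predictors modelled explicitly
'
fit <- sem(model, data = grades)
summary(fit, standardized = TRUE)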
Path Diagram
A system of equations is usefully represented in a path diagram
[Legend: symbols for a measured variable, an unmeasured variable, a regression path, and a correlation.]
[Diagram: Books and Attend -> Grade, with an error term on Grade; the Books-Attend correlation must be explicitly modelled.]
775
Results
Unstandardised estimates (figure): BOOKS -> GRADE = 4.04, ATTEND -> GRADE = 1.28; other values shown in the figure: 2.00, 1.00, 2.65, 17.84, 13.52.
Standardised estimates (figure): BOOKS -> GRADE = .35, ATTEND -> GRADE = .33, error -> GRADE = .82, BOOKS-ATTEND correlation = .44.
777
Table
SEM estimates:
                    Estimate   S.E.   C.R.   P     St. Est.
GRADE <-- BOOKS        4.04    1.71   2.36   .02     0.35
GRADE <-- ATTEND       1.28    0.57   2.25   .03     0.33
GRADE <-- e           13.52    1.53   8.83   .00     0.82
GRADE (constant)      37.38    7.54   4.96   .00

SPSS regression coefficients, for comparison:
              B       Std. Error   Beta   Sig.
(Constant)   37.38      7.74              .00
BOOKS         4.04      1.75       .35    .03
ATTEND        1.28       .59       .33    .04
778
Restrict parameters
To zero
To the value of other parameters
To 1
779
Restrictions
Questions
Is a parameter really necessary?
Are a set of parameters necessary?
Are parameters equal?
780
The χ² Test
Can the model proposed have generated the data?
A test of the significance of the difference between model and data
A statistically significant result is bad
Theoretically driven
Start with the model
Don't start with the data
781
Regression Again
[Path diagram: BOOKS and ATTEND -> GRADE, error term labelled '0, 1'.]
782
Two restrictions
2 df for the χ² test
χ² = 15.9, p = 0.0003
783
Multivariate Regression
[A series of path diagrams: two predictors (x1, x2) and three outcomes (y1, y2, y3), shown with different sets of paths.]
E.g. the mediator model
[Path diagram: ENJOY -> BUY -> READ, with error terms e_buy and e_read.]
1 restriction
No path from enjoy -> read
789
Result
χ² = 10.9, 1 df, p = 0.001
Not a complete mediator
Additional path is required
790
Multiple Groups
Same model
Different people
Correlations (SEX = f, N = 110):
           AGE     SEVE    SEVNONE   GHQ_A   GHQ_D
AGE        1.00    -.270   -.248      .017    .035
SEVE       -.270   1.00     .665      .045    .075
SEVNONE    -.248    .665   1.00       .109    .096
GHQ_A       .017    .045    .109     1.00     .782
GHQ_D       .035    .075    .096      .782   1.00
(2-tailed p-values: AGE-SEVE .004, AGE-SEVNONE .009, SEVE-SEVNONE .000, GHQ_A-GHQ_D .000; the remaining correlations are non-significant.)
793
Correlations (SEX = m, N = 79):
           AGE     SEVE    SEVNONE   GHQ_A   GHQ_D
AGE        1.00    -.243   -.116     -.195   -.190
SEVE       -.243   1.00     .671      .456    .453
SEVNONE    -.116    .671   1.00       .210    .232
GHQ_A      -.195    .456    .210     1.00     .800
GHQ_D      -.190    .453    .232      .800   1.00
(2-tailed p-values: AGE-SEVE .031, SEVE-SEVNONE .000, SEVE-GHQ_A .000, SEVE-GHQ_D .000, SEVNONE-GHQ_D .040, GHQ_A-GHQ_D .000; the remaining correlations are non-significant.)
794
Model
[Path diagram: AGE -> SEVE and SEVNONE (errors e_s, e_sn); SEVE and SEVNONE -> Dep and Anx (errors e_d, e_a).]
795
Females
[Path diagram, standardised estimates: AGE -> SEVE = -.27, AGE -> SEVNONE = -.25; Dep-Anx association .78; the remaining path estimates and residual variances are as shown in the original figure.]
796
Males
[Path diagram, standardised estimates: AGE -> SEVE = -.24, AGE -> SEVNONE = -.12; the remaining path estimates and residual variances are as shown in the original figure.]
797
Constraint
sevnone -> dep
Constrained to be equal for males and females
1 restriction, 1 df
χ² = 1.3, not significant
4 restrictions
The 2 severity variables -> anx & dep
798
4 restrictions, 4 df
χ² = 1.3, p = 0.014
799
Power: A Smaller
Advantage
Power for regression gets tricky
with large models
With SEM power is (relatively) easy
It's all based on the chi-square
Paper B14
801
802
The Independence
Assumption
In Lesson 8 we talked about
independence
The residual of any one case should not tell
you about the residual of any other case
803
Clusters of Cases
Problem with cluster (group)
randomised studies
Or group effects
Complex Samples
As with Huber-White for heteroscedasticity
Add a variable that identifies the clusters
Put it into the clusters box
Run the GLM as before
Warning:
You need about 20 clusters for the solutions to be stable
805
Example
People were randomised by week to one of two forms of triage
Compare the total cost of treating each
Ignoring clustering:
Difference is 2.40 per person, 95% CI 0.58 to 4.22, p = 0.010
Including clustering:
Difference is still 2.40, but the 95% CI is -0.85 to 5.65, and p = 0.141
Longitudinal Research
For comparing
repeated
measures
Clusters are
people
Can model the
repeated
measures over
time
ID
V1 V2 V3 V4
807
Converting Data
Change data to
tall and thin
Use Data,
Restructure in
SPSS
Clusters are ID
808
(Simple) Example
Use employee data.sav
Compare beginning salary and salary
Would normally use a paired-samples t-test
809
Difference = $17,430, 95% CI = 16427.407 to 18739.555
[Table: the restructured data with columns ID, Time, and Cash, e.g. $18,750, $21,450, $12,000, $21,900, $13,200, $45,000.]
810
Interesting
That wasn't very interesting
What is more interesting is when we
have multiple measurements of the
same people
811
[Figures: the repeated measurements plotted against time.]
Complex Trajectories
An event occurs
Can have two effects:
A jump in the value
A change in the slope
[Figure: slope 1 before the event occurs, a jump when the event occurs, then slope 2 afterwards.]
815
Parameterising
Time   Event   Time2   Outcome
  1      0       0       12
  2      0       0       13
  3      0       0       14
  4      0       0       15
  5      0       0       16
  6      1       0       10
  7      1       1        9
  8      1       2        8
  9      1       3        7
816
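Entering that table into R and regressing the outcome on all three terms recovers the pieces of the trajectory:

d <- data.frame(time  = 1:9,
                event = c(0, 0, 0, 0, 0, 1, 1, 1, 1),
                time2 = c(0, 0, 0, 0, 0, 0, 1, 2, 3),
                outcome = c(12, 13, 14, 15, 16, 10, 9, 8, 7))
coef(lm(outcome ~ time + event + time2, data = d))
# time = slope before the event, event = the jump, time2 = the change in slope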
817
Moderator effects
Slope differences
818
Multilevel Models
Fixed versus random effects
Fixed effects are fixed across
individuals (or clusters)
Random effects have variance
Levels
Level 1 individual measurement
occasions
Level 2 higher order clusters
819
More on Levels
NHS direct study
Level 1 units: .
Level 2 units:
820
More Flexibility
Three levels:
Level 1: measurements
Level 2: people
Level 3: schools
821
More Effects
Variances and covariances of
effects
Level 1 and level 2 residuals
Makes R2 difficult to talk about
Outcome variable
Yij
The score of the ith person in the jth
group
822
Y     i   j
2.3   1   1
3.2   2   1
4.5   3   1
4.8   1   2
7.2   2   2
3.1   3   2
1.6   4   2
823
Notation
Notation gets a bit horrid
Varies a lot between books and
programs
824
Standard Errors
Intercept has standard errors
Slopes have standard errors
Random effects have variances
Those variances have standard errors
Is there statistically significant variation
between higher level units (people)?
OR
Is everyone the same?
825
Programs
Since version 12 you can do this in SPSS
Can't do anything really clever
The menus are completely unusable
You have to use syntax
826
SPSS Syntax
MIXED
relfd with time
/fixed = time
/random = intercept time |
subject (id) covtype(un)
/print = solution.
827
SPSS Syntax
MIXED
relfd with time
Outcome
Continuous
predictor
828
SPSS Syntax
MIXED
relfd with time
/fixed = time
Must specify effect as
fixed first
829
SPSS Syntax
MIXED
 relfd with time
 /fixed = time
 /random = intercept time |
  subject (id) covtype(un)
Specify the random effects: the intercept and time are random
SPSS Syntax
MIXED
 relfd with time
 /fixed = time
 /random = intercept time |
  subject (id) covtype(un)
The covariance matrix of the random effects is unstructured.
(Alternatives are id, identity, or vc, variance components.)
831
SPSS Syntax
MIXED
 relfd with time
 /fixed = time
 /random = intercept time |
  subject (id) covtype(un)
 /print = solution.
Print the answer
832
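For comparison (an aside, not part of the course), the equivalent model in R's lme4 package, assuming the restructured data are in a data frame called dat with columns relfd, time and id:

library(lme4)
m1 <- lmer(relfd ~ time + (1 + time | id), data = dat)   # random intercept and slope for time,
summary(m1)                                               # with an unstructured covariance between them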
The Output
Information criteria (we'll come back to these):
-2 Restricted Log Likelihood              64899.758
Akaike's Information Criterion (AIC)      64907.758
Hurvich and Tsai's Criterion (AICC)       64907.763
Bozdogan's Criterion (CAIC)               64940.134
Schwarz's Bayesian Criterion (BIC)        64936.134
Fixed Effects
Not useful here; useful for interactions
Type III Tests of Fixed Effects:
Source      Numerator df   Denominator df   F          Sig.
Intercept   1              741              3251.877   .000
time        1              741.000          2.550      .111
834
Estimates of Fixed Effects:
             Estimate   Std. Error   t         Sig.   95% CI Lower   95% CI Upper
Intercept    21.90      .38          57.025    .000   21.15          22.66
time         -.06       .04          -1.597    .111   -.14           .01
835
Covariance Parameters
Estimates of Covariance Parameters:
Parameter                          Estimate     Std. Error
Residual                           64.11577     1.0526353
Intercept + time [subject = id]
  UN (1,1)                         85.16791     5.7003732
  UN (2,1)                         -4.53179      .5067146
  UN (2,2)                          .7678319     .0636116
Change Covtype to VC
We know that this is wrong:
the covariance of the effects was statistically significant
We can also see that it is wrong by comparing information criteria:
                                          UN Model     VC Model
-2 Restricted Log Likelihood              64899.758    65041.891
Akaike's Information Criterion (AIC)      64907.758    65047.891
Hurvich and Tsai's Criterion (AICC)       64907.763    65047.894
Bozdogan's Criterion (CAIC)               64940.134    65072.173
Schwarz's Bayesian Criterion (BIC)        64936.134    65069.173
The information criteria are displayed in smaller-is-better forms (dependent variable: relfd). Lower is better.
838
Adding Bits
So far, all a bit dull
We want some more predictors, to make it more exciting
E.g. female
Add:
relfd with time female
/fixed = time female time * female
Extending Models
Models can be extended
Any kind of regression can be used
Logistic, multinomial, Poisson, etc
More levels
Children within classes within schools
Measures within people within classes within
prisons
840
841
Books
Singer, JD and Willett, JB (2003). Applied
Longitudinal Data Analysis: Modeling
Change and Event Occurrence. Oxford,
Oxford University Press.
Examples at:
http://www.ats.ucla.edu/stat/SPSS/examples/alda/default.htm
842
The End
843