762 Slides 1
Arden Miller
Linear Models and Geometry
There is a rich geometry associated with the statistical linear model. Understanding this geometry provides insight into much of the analysis associated with regression.
- The idea is to write the regression model as a vector equation and explore the implications of this equation using a basic understanding of vector spaces.
- We first need to review some aspects of vectors and vector spaces.
The Basics of Vectors
For our purposes, a vector is an "n-tuple" of real numbers, which we denote
$$\mathbf{v} = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix}$$
Boldface is used to indicate vectors (and matrices).
Example: 2-component Vectors
Two-component vectors can be displayed as directed line
segments on a standard scatterplot.
$$\mathbf{v} = \begin{pmatrix} 2 \\ 3 \end{pmatrix} \qquad \mathbf{w} = \begin{pmatrix} -4 \\ 1 \end{pmatrix}$$
[Figure: $\mathbf{v}$ and $\mathbf{w}$ drawn as arrows from the origin, on axes running from $-4$ to $4$.]
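A plot along these lines can be reproduced in R with base graphics (a minimal sketch, not code from the original slides):
> v <- c(2, 3); w <- c(-4, 1)
> plot(NULL, xlim = c(-4, 4), ylim = c(-4, 4), xlab = "", ylab = "")
> arrows(0, 0, v[1], v[2])   # v as a directed line segment from the origin
> arrows(0, 0, w[1], w[2])   # w likewise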
Vector Addition
The sum of two vectors is obtained by adding their
corresponding entries:
$$\begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix} + \begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{pmatrix} = \begin{pmatrix} v_1 + w_1 \\ v_2 + w_2 \\ \vdots \\ v_n + w_n \end{pmatrix}$$
For example: $\mathbf{v} + \mathbf{w} = \begin{pmatrix} 2 \\ 3 \end{pmatrix} + \begin{pmatrix} -4 \\ 1 \end{pmatrix} = \begin{pmatrix} -2 \\ 4 \end{pmatrix}$
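The same computation in R, where + acts componentwise on vectors:
> v <- c(2, 3)
> w <- c(-4, 1)
> v + w
[1] -2  4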
Visualising Vector Addition
Visually, we translate the starting point of one of the vectors to the endpoint of the other.
[Figure: three panels showing $\mathbf{v}$, then $\mathbf{w}$ translated to the endpoint of $\mathbf{v}$, then the sum $\mathbf{v} + \mathbf{w}$.]
The sum is the vector from the origin to the new endpoint.
Scalar Multiplication of Vectors
To multiply a vector by a constant, simply multiply each entry
by that constant:
$$k \times \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix} = \begin{pmatrix} k \times v_1 \\ k \times v_2 \\ \vdots \\ k \times v_n \end{pmatrix}$$
Visualising Scalar Multiplication
[Figure: two panels showing $\mathbf{v}$ together with the scalar multiples $2\mathbf{v}$ and $-2\mathbf{v}$, all lying on the same line through the origin.]
Properties of Vector Arithmetic
These operations obey the usual algebraic rules (verified numerically in the sketch below):
1. $\mathbf{v} + \mathbf{w} = \mathbf{w} + \mathbf{v}$
2. $\mathbf{u} + (\mathbf{v} + \mathbf{w}) = (\mathbf{u} + \mathbf{v}) + \mathbf{w}$
3. $k_1(\mathbf{v} + \mathbf{w}) = k_1\mathbf{v} + k_1\mathbf{w}$
4. $(k_1 + k_2)\mathbf{v} = k_1\mathbf{v} + k_2\mathbf{v}$
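A quick check in R, using arbitrary vectors of the same length (the particular vectors are illustrative):
> u <- c(1, -2); v <- c(2, 3); w <- c(-4, 1)
> all.equal(v + w, w + v)                 # rule 1
[1] TRUE
> all.equal(u + (v + w), (u + v) + w)     # rule 2
[1] TRUE
> all.equal(2 * (v + w), 2 * v + 2 * w)   # rule 3
[1] TRUE
> all.equal((2 + 3) * v, 2 * v + 3 * v)   # rule 4
[1] TRUE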
The Linear (Regression) Model
The linear model can be written as an equation which relates
the value of a response variable Y to the values of one or
more explanatory variables:
$$Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k + \epsilon$$
The Data
Suppose we have n observed values of the response, $y_1$ through $y_n$. For observation i, denote the values of the explanatory variables as $x_{i1}$ through $x_{ik}$ and arrange the data in a table:

Obs   Response   X1    X2    · · ·   Xk
1     y1         x11   x12   · · ·   x1k
2     y2         x21   x22   · · ·   x2k
...
n     yn         xn1   xn2   · · ·   xnk
A Set of Equations
For each observation $y_i$, we can write:
$$y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + \epsilon_i, \qquad i = 1, \ldots, n$$
The Linear Model as a Vector Equation
The previous set of equations can be rewritten as a vector
equation:
$$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \beta_0 \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix} + \beta_1 \begin{pmatrix} x_{11} \\ x_{21} \\ \vdots \\ x_{n1} \end{pmatrix} + \cdots + \beta_k \begin{pmatrix} x_{1k} \\ x_{2k} \\ \vdots \\ x_{nk} \end{pmatrix} + \begin{pmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{pmatrix}$$
$$\mathbf{y} = \beta_0 \mathbf{1} + \beta_1 \mathbf{x}_1 + \cdots + \beta_k \mathbf{x}_k + \boldsymbol{\epsilon}$$
Catheter Length Example
For 12 young patients, catheters were fed from a principal vein
into the heart. The catheter length was measured as was the
height and weight of the patients. Is it possible to predict the
necessary catheter length based on height and weight?
Catheter Length Data
Patient Height (in.) Weight (lbs.) Catheter (cm)
1 42.8 40.0 37
2 63.5 93.5 50
3 37.5 35.5 34
4 39.5 30.0 36
5 45.5 52.0 43
6 38.5 17.0 28
7 43.0 38.5 37
8 22.5 8.5 20
9 37.0 33.0 34
10 23.5 9.5 30
11 33.0 21.0 38
12 58.0 79.0 47
Catheter Regression Model
We can explore using a regression model that relates the
necessary catheter length to the height and weight of the
patient:
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \epsilon$$
- Y is catheter length.
- $X_1$ is patient height.
- $X_2$ is patient weight.
- $\epsilon$ represents patient-to-patient variability.
The Vector Equation for the Catheter Data
$$\underbrace{\begin{pmatrix} 37 \\ 50 \\ 34 \\ 36 \\ 43 \\ 28 \\ 37 \\ 20 \\ 34 \\ 30 \\ 38 \\ 47 \end{pmatrix}}_{\mathbf{y}} = \beta_0 \underbrace{\begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{pmatrix}}_{\mathbf{1}} + \beta_1 \underbrace{\begin{pmatrix} 42.8 \\ 63.5 \\ 37.5 \\ 39.5 \\ 45.5 \\ 38.5 \\ 43.0 \\ 22.5 \\ 37.0 \\ 23.5 \\ 33.0 \\ 58.0 \end{pmatrix}}_{\mathbf{x}_1} + \beta_2 \underbrace{\begin{pmatrix} 40.0 \\ 93.5 \\ 35.5 \\ 30.0 \\ 52.0 \\ 17.0 \\ 38.5 \\ 8.5 \\ 33.0 \\ 9.5 \\ 21.0 \\ 79.0 \end{pmatrix}}_{\mathbf{x}_2} + \underbrace{\begin{pmatrix} \epsilon_1 \\ \epsilon_2 \\ \epsilon_3 \\ \epsilon_4 \\ \epsilon_5 \\ \epsilon_6 \\ \epsilon_7 \\ \epsilon_8 \\ \epsilon_9 \\ \epsilon_{10} \\ \epsilon_{11} \\ \epsilon_{12} \end{pmatrix}}_{\boldsymbol{\epsilon}}$$
Fixed Vectors and Random Vectors
The linear model contains two types of vectors:
- fixed vectors, whose entries are known constants: $\mathbf{1}, \mathbf{x}_1, \ldots, \mathbf{x}_k$;
- random vectors, whose entries are random variables: $\boldsymbol{\epsilon}$, and consequently $\mathbf{Y}$.
Some Stuff about Random Vectors
A random vector $\mathbf{V}$ that contains random variables $V_1, \ldots, V_p$ can be thought of as a vector that has a density function or as a collection of random variables.
- The distribution for $\mathbf{V}$ is determined by the joint distribution of $V_1, \ldots, V_p$.
- The expected value of $\mathbf{V}$ represents its "average location" and is a fixed vector given by:
$$E(\mathbf{V}) = E\begin{pmatrix} V_1 \\ \vdots \\ V_p \end{pmatrix} = \begin{pmatrix} E(V_1) \\ \vdots \\ E(V_p) \end{pmatrix}$$
More Stuff about Random Vectors
To summarise how $\mathbf{V}$ varies about $\boldsymbol{\mu}_V$, both the variability of the elements and how they vary relative to each other must be considered (the variances of the individual elements and the covariances between pairs of elements).
- It is convenient to put these variances and covariances into a matrix, which we call $\Sigma_V$ or $\mathrm{Cov}(\mathbf{V})$:
$$\Sigma_V = \begin{pmatrix} \mathrm{var}(V_1) & \mathrm{cov}(V_1, V_2) & \cdots & \mathrm{cov}(V_1, V_p) \\ \mathrm{cov}(V_2, V_1) & \mathrm{var}(V_2) & \cdots & \mathrm{cov}(V_2, V_p) \\ \vdots & \vdots & & \vdots \\ \mathrm{cov}(V_p, V_1) & \mathrm{cov}(V_p, V_2) & \cdots & \mathrm{var}(V_p) \end{pmatrix}$$
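To make these ideas concrete, here is a small simulation sketch in R (the sample size and the standard-Normal distribution are our assumptions, not from the slides); the sample mean vector and sample covariance matrix estimate $E(\mathbf{V})$ and $\Sigma_V$:
> set.seed(1)
> V <- matrix(rnorm(5000 * 2), 5000, 2)   # 5000 draws of a 2-component vector
> colMeans(V)                             # estimates E(V), here approximately (0, 0)
> cov(V)                                  # estimates Sigma_V, here approximately I_2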
The Density of a Random Vector
Conceptually, it is useful to think of the density function for a random vector as a cloud in $\mathbb{R}^n$ that indicates the plausible endpoints for the random vector: the vector is more likely to end in a region where the cloud is dense than in one where it is not dense.
[Figure: a density cloud of points centered at $\boldsymbol{\mu}_V$.]
Working with Random Vectors
If we add a fixed vector $\mathbf{C}$ to a random vector $\mathbf{V}$, the resulting vector $\mathbf{U} = \mathbf{V} + \mathbf{C}$ is a random vector with:
$$\boldsymbol{\mu}_U = \boldsymbol{\mu}_V + \mathbf{C} \qquad \text{and} \qquad \Sigma_U = \Sigma_V$$
- E.g. if $\boldsymbol{\mu}_V = \begin{pmatrix} 2 \\ 3 \end{pmatrix}$ and $\mathbf{C} = \begin{pmatrix} -4 \\ 1 \end{pmatrix}$, then $\boldsymbol{\mu}_U = \begin{pmatrix} 2 \\ 3 \end{pmatrix} + \begin{pmatrix} -4 \\ 1 \end{pmatrix} = \begin{pmatrix} -2 \\ 4 \end{pmatrix}$
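A simulation sketch of the same shift property in R (the distributions are assumed for illustration):
> set.seed(1)
> V <- cbind(rnorm(5000, mean = 2), rnorm(5000, mean = 3))   # mu_V = (2, 3)
> C <- c(-4, 1)
> U <- sweep(V, 2, C, "+")   # add the fixed vector C to every draw
> colMeans(U)                # approximately (-2, 4) = mu_V + C
> cov(U) - cov(V)            # the zero matrix (up to rounding): Sigma_U = Sigma_V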
Working with Random Vectors
The mean of the vector has been shifted but how the vector
varies about its mean stays the same.
[Figure: two panels showing the same density cloud before and after adding $\mathbf{c}$; the centre moves from $\boldsymbol{\mu}_V$ to $\boldsymbol{\mu}_U = \boldsymbol{\mu}_V + \mathbf{c}$ but the spread is unchanged.]
The Distribution of the Errors
The linear model assumes that the errors are independent $N(0, \sigma^2)$ observations.
- The joint distribution of the $\epsilon_i$'s is multivariate Normal with $E(\epsilon_i) = 0$, $\mathrm{var}(\epsilon_i) = \sigma^2$ and $\mathrm{cov}(\epsilon_i, \epsilon_j) = 0$ for all $j \neq i$.
- Thus the joint density function is:
$$f(\epsilon_1, \epsilon_2, \ldots, \epsilon_n) = \frac{1}{(2\pi\sigma^2)^{n/2}} \, e^{-(\epsilon_1^2 + \epsilon_2^2 + \cdots + \epsilon_n^2)/2\sigma^2}$$
The Distribution of $\boldsymbol{\epsilon}$
Earlier we defined the random vector $\boldsymbol{\epsilon}$:
$$\boldsymbol{\epsilon} = \begin{pmatrix} \epsilon_1 \\ \vdots \\ \epsilon_n \end{pmatrix} \qquad \text{where the } \epsilon_i\text{'s are independent } N(0, \sigma^2) \text{ random variables.}$$
The Density "Cloud" for $\boldsymbol{\epsilon}$
For $\boldsymbol{\epsilon}$, the density can be written as:
$$f(\boldsymbol{\epsilon}) = \frac{1}{(2\pi\sigma^2)^{n/2}} \, e^{-\|\boldsymbol{\epsilon}\|^2/2\sigma^2} \qquad \text{where } \|\boldsymbol{\epsilon}\|^2 = \boldsymbol{\epsilon}^t\boldsymbol{\epsilon} = \epsilon_1^2 + \cdots + \epsilon_n^2$$
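This density is easy to code directly; a minimal sketch in R (the function name f.eps is ours, not from the slides):
> f.eps <- function(eps, sigma2) {
+   n <- length(eps)
+   exp(-sum(eps^2) / (2 * sigma2)) / (2 * pi * sigma2)^(n / 2)
+ }
> f.eps(c(0, 0), sigma2 = 1)   # height of the cloud at the origin when n = 2
[1] 0.1591549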
The Density "Cloud" for a $N(\mathbf{0}, \sigma^2 I_2)$ Vector
A two-dimensional $N(\mathbf{0}, \sigma^2 I_2)$ random vector would have a density cloud like this:
[Figure: a circular density cloud of points centered at the origin.]
Is the Response Vector Fixed or Random?
It depends:
- Before the data are collected, the response vector $\mathbf{Y}$ is a random vector.
- Once the data have been observed, the observed response vector $\mathbf{y}$ is a fixed vector of numbers.
The Distribution of Y
The linear model represents Y as the sum of a fixed vector
and a random vector:
$$\mathbf{Y} = \underbrace{\beta_0\mathbf{1} + \beta_1\mathbf{x}_1 + \cdots + \beta_k\mathbf{x}_k}_{\text{fixed vector}} + \underbrace{\boldsymbol{\epsilon}}_{\text{random vector}}$$
$$\boldsymbol{\mu}_Y = \beta_0\mathbf{1} + \beta_1\mathbf{x}_1 + \cdots + \beta_k\mathbf{x}_k + \boldsymbol{\mu}_\epsilon = \beta_0\mathbf{1} + \beta_1\mathbf{x}_1 + \cdots + \beta_k\mathbf{x}_k$$
$$\Sigma_Y = \Sigma_\epsilon = \sigma^2 I_n$$
The Density Cloud of Y
$\mathbf{Y}$ has the same density as $\boldsymbol{\epsilon}$ except that it is centered around $\boldsymbol{\mu}_Y$ rather than the origin.
[Figure: two panels showing the same density cloud, centered at the origin (for $\boldsymbol{\epsilon}$) and at $\boldsymbol{\mu}_Y$ (for $\mathbf{Y}$).]
The Mean
The linear model restricts the possibilities for $\boldsymbol{\mu}_Y$ to vectors that can be formed by taking linear combinations of the vectors $\mathbf{1}, \mathbf{x}_1, \ldots, \mathbf{x}_k$:
$$\boldsymbol{\mu}_Y = \beta_0\mathbf{1} + \beta_1\mathbf{x}_1 + \cdots + \beta_k\mathbf{x}_k$$
Vector Spaces
For our purposes, we only need to consider vectors which
contain real numbers and the usual definitions of vector
addition and scalar multiplication. In this case, a vector space
is any collection of vectors that is closed under addition and
scalar multiplication.
- This means that if we take two vectors $\mathbf{u}$ and $\mathbf{v}$ from a vector space, then any linear combination $k_1\mathbf{u} + k_2\mathbf{v}$ must also be in that vector space.
- As a result, the zero vector must be in every vector space.
Definition of $\mathbb{R}^n$
Let $\mathbb{R}^n$ be the set of all n-component vectors where each component is a real number.
Subspaces of $\mathbb{R}^n$
We will need to consider the different subspaces of $\mathbb{R}^n$.
- Any subset of the vectors in $\mathbb{R}^n$ which is itself a vector space is called a subspace of $\mathbb{R}^n$.
- Since we use the same definitions for addition and scalar multiplication as before, all we really need to check is that the subset of vectors is closed under addition and scalar multiplication.
The Basis of a Vector Space
The usual way to define a subspace is by identifying a set of
vectors that form a basis.
- Suppose we take any finite collection of vectors from $\mathbb{R}^n$ and consider the set of vectors produced by taking all possible linear combinations of these vectors. Our method of generating this subset guarantees that it will be closed under addition and scalar multiplication, and thus be a subspace of $\mathbb{R}^n$.
- Further, suppose that none of the vectors in the original collection can be generated as a linear combination of the other vectors – i.e. none of them is redundant. Then this collection of vectors is called a basis for the vector space they generate.
$\mathbb{R}^3$ as an Example
$\mathbb{R}^3$ is a useful example since we can think of it as representing the space around us.
Consider the vector space generated by a single vector $\mathbf{v}_1$ in $\mathbb{R}^3$: the subspace consists of $\mathbf{v}_1$ and all scalar multiples of $\mathbf{v}_1$.
- This subspace can be thought of as an infinite line in $\mathbb{R}^3$.
[Figure: the vector $\mathbf{v}_1$ and the line through the origin that it generates.]
$\mathbb{R}^3$ as an Example
Now consider the vector space generated by two vectors, $\mathbf{v}_1$ and $\mathbf{v}_2$, in $\mathbb{R}^3$.
- Provided that $\mathbf{v}_1 \neq k \times \mathbf{v}_2$ (i.e. they are not co-linear), the subspace generated by $\mathbf{v}_1$ and $\mathbf{v}_2$ is a plane.
[Figure: the vectors $\mathbf{v}_1$ and $\mathbf{v}_2$ and the plane through the origin that they generate.]
The Subspaces of $\mathbb{R}^3$
The subspaces of $\mathbb{R}^3$ can be categorised as follows:
1. The origin itself.
2. Any line through the origin.
3. Any plane through the origin.
4. $\mathbb{R}^3$ itself.
Items 1 and 4 on this list are technically subspaces of $\mathbb{R}^3$ but are not of much practical interest – they are referred to as the "improper subspaces."
Dimensions of Subspaces
Notice that our categories are based on the dimensions of the subspaces.
- The origin is considered 0-dimensional.
- Lines are 1-dimensional, as they can be defined by a single vector.
- Planes are 2-dimensional, as 2 (non-colinear) vectors are needed to define a plane.
- $\mathbb{R}^3$ is 3-dimensional.
A Basis of a Subspace
Suppose that for a subspace S we have vectors $\mathbf{v}_1, \ldots, \mathbf{v}_k$ such that every vector in S can be expressed as a linear combination of $\mathbf{v}_1, \ldots, \mathbf{v}_k$. Then $\mathbf{v}_1, \ldots, \mathbf{v}_k$ are said to span S. If, in addition, none of these vectors is a linear combination of the others (they are linearly independent), they form a basis for S.
The Dimension of a Subspace
For any subspace S, there are an infinite number of bases. However, each of these will consist of exactly the same number of vectors. The number of vectors in a basis for S is called the dimension of S.
- E.g. for a line in $\mathbb{R}^3$, a basis will consist of one vector that falls on that line – lines are 1-dimensional.
- For any plane in $\mathbb{R}^3$, any set of 2 linearly independent (non-colinear) vectors that fall on that plane is a basis – planes are 2-dimensional.
- Any set of 3 linearly independent vectors in $\mathbb{R}^3$ will be a basis for $\mathbb{R}^3$ itself (see the R sketch below).
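In R, the dimension of the subspace spanned by a set of vectors can be found from the rank reported by the QR decomposition (the vectors below are illustrative):
> v1 <- c(1, 2, 3); v2 <- c(2, 4, 6); v3 <- c(0, 1, 1)
> qr(cbind(v1, v2))$rank   # v2 = 2 * v1, so the pair only spans a line
[1] 1
> qr(cbind(v1, v3))$rank   # linearly independent, so the pair spans a plane
[1] 2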
Extending to $\mathbb{R}^n$
The subspaces of $\mathbb{R}^n$ can be categorised by their dimension:
- The origin itself (0-dimensional).
- Any line through the origin (1-dimensional).
- Any plane through the origin (2-dimensional).
- Any 3-dimensional hyperplane through the origin.
  ...
- Any (n − 1)-dimensional hyperplane through the origin.
- $\mathbb{R}^n$ itself (n-dimensional).
Back to the Regression Model
For the regression model:
$$\mathbf{Y} = \boldsymbol{\mu}_Y + \boldsymbol{\epsilon} \qquad \text{where } \boldsymbol{\mu}_Y = \beta_0\mathbf{1} + \beta_1\mathbf{x}_1 + \cdots + \beta_k\mathbf{x}_k$$
Matrix Form of the Regression Model
The regression model as a vector equation:
$$\begin{pmatrix} Y_1 \\ \vdots \\ Y_n \end{pmatrix} = \beta_0 \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix} + \beta_1 \begin{pmatrix} x_{11} \\ \vdots \\ x_{n1} \end{pmatrix} + \cdots + \beta_k \begin{pmatrix} x_{1k} \\ \vdots \\ x_{nk} \end{pmatrix} + \begin{pmatrix} \epsilon_1 \\ \vdots \\ \epsilon_n \end{pmatrix}$$
Collecting the explanatory vectors as the columns of a matrix $X$ and the coefficients in a vector $\boldsymbol{\beta}$, this can be written compactly as $\mathbf{Y} = X\boldsymbol{\beta} + \boldsymbol{\epsilon}$.
Matrix Form for Catheter Data
$$\underbrace{\begin{pmatrix} 37 \\ 50 \\ 34 \\ 36 \\ 43 \\ 28 \\ 37 \\ 20 \\ 34 \\ 30 \\ 38 \\ 47 \end{pmatrix}}_{\mathbf{Y}} = \underbrace{\begin{pmatrix} 1 & 42.8 & 40.0 \\ 1 & 63.5 & 93.5 \\ 1 & 37.5 & 35.5 \\ 1 & 39.5 & 30.0 \\ 1 & 45.5 & 52.0 \\ 1 & 38.5 & 17.0 \\ 1 & 43.0 & 38.5 \\ 1 & 22.5 & 8.5 \\ 1 & 37.0 & 33.0 \\ 1 & 23.5 & 9.5 \\ 1 & 33.0 & 21.0 \\ 1 & 58.0 & 79.0 \end{pmatrix}}_{X} \underbrace{\begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{pmatrix}}_{\boldsymbol{\beta}} + \underbrace{\begin{pmatrix} \epsilon_1 \\ \epsilon_2 \\ \epsilon_3 \\ \epsilon_4 \\ \epsilon_5 \\ \epsilon_6 \\ \epsilon_7 \\ \epsilon_8 \\ \epsilon_9 \\ \epsilon_{10} \\ \epsilon_{11} \\ \epsilon_{12} \end{pmatrix}}_{\boldsymbol{\epsilon}}$$
Geometric Representation
Thus we have:
$$\mathbf{Y} = \boldsymbol{\mu}_Y + \boldsymbol{\epsilon} \qquad \text{where } \boldsymbol{\mu}_Y = X\boldsymbol{\beta}$$
Notice that we have defined the model space as the subspace of $\mathbb{R}^n$ spanned by the columns of $X$ – another name for this subspace is the column space of $X$, denoted $\mathrm{colsp}(X)$.
[Figure: $\mathbf{y}$ lying above the plane $\mathrm{colsp}(X)$, decomposed as $\boldsymbol{\mu}_Y$ in the plane plus the error $\boldsymbol{\epsilon}$.]
Model Fitting
We can divide the model fitting procedure into two steps:
1. Find $\hat{\boldsymbol{\mu}}_Y$, the estimate of $\boldsymbol{\mu}_Y$, by projecting the observed response onto the model space.
2. Find the coefficient vector $\hat{\boldsymbol{\beta}}$ for which $\hat{\boldsymbol{\mu}}_Y = X\hat{\boldsymbol{\beta}}$ (the parameter estimates).
Step 1: Finding $\hat{\boldsymbol{\mu}}_Y$
The regression model restricts $\boldsymbol{\mu}_Y$ to the subspace of $\mathbb{R}^n$ spanned by the explanatory vectors (we called this the model space). Since the distribution of $\mathbf{Y}$ is centered around $\boldsymbol{\mu}_Y$, it makes sense to define $\hat{\boldsymbol{\mu}}_Y$ as the point in the model space that is closest to $\mathbf{y}$.
- To find this point, we take the orthogonal projection of $\mathbf{y}$ onto the model space.
Orthogonal Projection Matrices
To find the orthogonal projection of the observed response vector $\mathbf{y}$ onto $\mathrm{colsp}(X)$, we can pre-multiply $\mathbf{y}$ by a projection matrix $H$ given by:
$$H = X(X^tX)^{-1}X^t$$
Orthogonal Projection of Y
Projecting $\mathbf{y}$ onto $\mathrm{colsp}(X)$ gives our estimated mean vector for $\mathbf{Y}$ (i.e. the fitted values):
$$\hat{\boldsymbol{\mu}}_Y = X(X^tX)^{-1}X^t\mathbf{y} = H\mathbf{y}$$
[Figure: $\mathbf{y}$ projected orthogonally onto the plane $\mathrm{colsp}(X)$; the foot of the perpendicular is $H\mathbf{y}$.]
Catheter Data Analysis using R
In R, we can create the X matrix and the y vector for the
catheter data as follows:
> x1<-c(42.8,63.5,37.5,39.5,45.5,38.5,
+ 43.0,22.5,37.0,23.5,33.0,58.0)
> x2<-c(40.0,93.5,35.5,30.0,52.0,17.0,
+ 38.5, 8.5,33.0, 9.5,21.0,79.0)
> X<-cbind(1,x1,x2)
> y<-matrix(c(37,50,34,36,43,28,37,20,34,30,38,47),12,1)
Fitted Values for the Catheter Example
Then we can project $\mathbf{y}$ onto $\mathrm{colsp}(X)$ to get $\hat{\boldsymbol{\mu}}_Y$ as follows:
> H<-X%*%solve(t(X)%*%X)%*%t(X)
> H%*%y
[,1]
[1,] 37.03954
[2,] 51.62559
[3,] 35.06266
[4,] 34.43313
[5,] 39.90170
[6,] 31.73815
[7,] 36.79505
[8,] 26.74188
[9,] 34.47955
[10,] 27.14373
[11,] 31.34342
[12,] 47.69560
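As a cross-check (not part of the original analysis), R's built-in lm() reproduces these fitted values; y[, 1] is used so the response is passed as a plain vector:
> fit <- lm(y[, 1] ~ x1 + x2)
> fitted(fit)   # the same twelve values as H %*% y above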
The Residual Vector
The vector of residuals is defined as:
$$\mathbf{r} = \mathbf{y} - \hat{\boldsymbol{\mu}}_Y = \mathbf{y} - H\mathbf{y} = (I - H)\mathbf{y}$$
[Figure: $\mathbf{y}$ decomposed into $H\mathbf{y}$ in $\mathrm{colsp}(X)$ and the residual $(I - H)\mathbf{y}$ perpendicular to it.]
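Continuing the R session above, the residual vector can be computed and its orthogonality to the model space checked (a sketch using the objects X, y and H created earlier):
> r <- (diag(12) - H) %*% y   # r = (I - H) y
> t(X) %*% r                  # numerically zero: r is orthogonal to colsp(X)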
Least Squares
The orthogonal projection of $\mathbf{y}$ minimises the distance between $\mathbf{y}$ and $\hat{\boldsymbol{\mu}}_Y$. From the previous picture it is clear that this distance is equal to the length of the residual vector $\mathbf{r}$, which we denote $\|\mathbf{r}\|$. Recalling some linear algebra:
$$\|\mathbf{r}\| = \sqrt{\mathbf{r}^t\mathbf{r}} = \sqrt{r_1^2 + r_2^2 + \cdots + r_n^2}$$
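In R (continuing the session, with r from the sketch above), the two expressions agree:
> sqrt(sum(r^2))     # ||r|| from the sum of squared residuals
> sqrt(t(r) %*% r)   # the same length, computed as sqrt(r'r)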
Parameter Estimates
Since the column vectors of $X$ are linearly independent, they form a basis for $\mathrm{colsp}(X)$. Thus there is a unique linear combination of the columns of $X$ that produces $\hat{\boldsymbol{\mu}}_Y$. Putting the coefficients of this combination in a vector $\hat{\boldsymbol{\beta}}$, we get $\hat{\boldsymbol{\mu}}_Y = X\hat{\boldsymbol{\beta}}$. Solving $X\hat{\boldsymbol{\beta}} = H\mathbf{y}$ gives $\hat{\boldsymbol{\beta}} = (X^tX)^{-1}X^t\mathbf{y}$.
Parameter Estimates for Catheter Data
To get β̂ for our catheter data:
> solve(t(X)%*%X)%*%t(X)%*%y
[,1]
20.3757645
x1 0.2107473
x2 0.1910949
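The same estimates come from R's lm(); this cross-check is ours, not part of the original slides:
> coef(lm(y[, 1] ~ x1 + x2))   # (Intercept) 20.376, x1 0.211, x2 0.191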