Simple Linear Regression (Chapter 11): Review of Some Inference and Notation: A Common Population Mean Model
Consider a random sample, y1,...,yn, from the normal distribution with unknown
mean Eyi = η and known variance Vyi = σ². The model for these data may be written

yi ~ iid N(η, σ²), i = 1,...,n,

or equivalently,

yi = η + ei, e1,...,en ~ iid N(0, σ²), i = 1,...,n.

We call this the common population mean (CPM) model. Under this model,

(ȳ − η)/(σ/√n) ~ N(0,1),

where ȳ = (1/n) Σ yi is the sample mean. It follows that a suitable point estimate of η
is ȳ, that an exact 1 − α confidence interval (CI) for η is

ȳ ± z_{α/2} σ/√n,

and that an exact p-value for testing H0: η = η0 versus H1: η ≠ η0 is
2P( Z > |ȳ − η0| / (σ/√n) ), where Z ~ N(0,1).
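For instance, these formulae may be evaluated in R as follows (a minimal sketch using made-up data; the sample y, the known value of σ and the hypothesised value η0 are assumptions of the illustration):

y = c(4.9, 5.3, 4.7, 5.1, 5.6); n = length(y)     # hypothetical sample
sigma = 0.4                                       # known standard deviation (assumed)
ybar = mean(y); zval = qnorm(0.975)               # z_{alpha/2} for alpha = 0.05
CI = ybar + c(-1,1)*zval*sigma/sqrt(n); CI        # exact 95% CI for eta
eta0 = 5                                          # hypothesised value (assumed)
pval = 2*pnorm(-abs((ybar-eta0)/(sigma/sqrt(n)))); pval   # exact two-sided p-value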
Note 1: For notational simplicity, we will in this chapter not usually distinguish
between random variables and their realised values using upper and lower case letters.
For example, yi may refer to both what were denoted Yi and yi in previous chapters.
Note 2: We use the symbol η because we wish to reserve μ for another quantity.
Note 3: Suppose that the normal variance σ² is unknown. In that case, an important
result is that the sample variance

sy² = (1/(n−1)) Σ (yi − ȳ)² = (1/(n−1)) ( Σ yi² − nȳ² )

is unbiased and consistent for σ². Also,

(n−1)sy²/σ² ~ χ²(n−1), and (ȳ − η)/(sy/√n) ~ t(n−1).

It follows that an exact 1 − α CI for η is

ȳ ± t_{α/2}(n−1) sy/√n,

and an exact p-value for testing H0: η = η0 versus H1: η ≠ η0 is
2P( T_{n−1} > |ȳ − η0| / (sy/√n) ), where T_{n−1} ~ t(n−1).
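With σ unknown, the same inference may be sketched in R as follows (again with made-up data; t.test is R's built-in version of this computation):

y = c(4.9, 5.3, 4.7, 5.1, 5.6); n = length(y)
ybar = mean(y); sy = sd(y)                        # sample mean and sample SD
tval = qt(0.975, n-1)                             # t_{alpha/2}(n-1) for alpha = 0.05
CI = ybar + c(-1,1)*tval*sy/sqrt(n); CI           # exact 95% CI for eta
eta0 = 5
pval = 2*pt(-abs((ybar-eta0)/(sy/sqrt(n))), n-1); pval
# t.test(y, mu = eta0) reproduces this CI and p-value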
Note 4: We use the symbol sy² because we wish to reserve s² for another quantity.
Note 5: Suppose that the sample values are not normally distributed. Then all of the
above inferences are still valid, with an understanding that they are only approximate,
but that the approximations improve as the sample size n increases, and that the
inferences are asymptotically exact, meaning exact in the limit as n tends to infinity.
This is true even if σ is unknown and replaced in the relevant formulae by sy, and/or
if t_{α/2}(n−1) and T_{n−1} are replaced by z_{α/2} and Z. It is true even if the error
terms e1,...,en are not independent and/or not identically distributed. All that is
required is that these terms are uncorrelated with mean zero and finite variance σ².

These facts follow by the central limit theorem (CLT) and other results in probability
theory. As a very rough rule of thumb, the approximations may be considered 'good' if
n is at least about 30.
Consider the above CPM model, but now instead suppose that the n sample values
y1,...,yn do not all have the same mean η, but rather that the ith value yi has mean

μi = Eyi = α + βxi, i = 1,...,n,

where α and β are unknown constants and x1,...,xn are known constants.
The idea here is that we believe there to be a linear relationship between two variables
x and y which can be expressed in the form of the equation
μ = Ey = α + βx,

with x1,...,xn being examples of x, and with y1,...,yn being examples of y. In this
equation, μ is implicitly a function of α, β and x.
The model just described is called the simple linear regression (SLR) model. In the
context of this model, we call y the dependent variable (since it depends on another
variable, namely x), and we call x the independent variable. Another term for x is the
covariate variable, and x1 ,..., xn may be referred to as the sample covariate values.
The focus of inference is now on the two parameters α and β (and functions thereof),
rather than on the single parameter η as previously in the CPM model. We call α and β
the SLR parameters. More specifically, α is the intercept parameter, and β is the
slope parameter. The next example should explain why this terminology is used.
Note 1: If β = 0 then the SLR model reduces to the CPM model, with η = α.

Note 2: In some books, the symbols β0 and β1 are used instead of α and β,
respectively. We will use the latter notation since it is easier to write and say.

Note 3: For definiteness, we have defined the SLR model with iid normally distributed
errors that have a known variance σ². As for the CPM model, we first treat this
'basic' version of the model, and later consider variations (e.g. unknown variance).
Example 1
Field, i     Fertiliser (kg), xi     Yield (tonnes), yi
-------------------------------------------------------
1            0.0                     2.0
2            0.5                     3.1
3            1.0                     3.0
4            1.5                     3.8
5            2.0                     4.1
6            2.5                     4.3
7 (= n)      3.0                     6.0
-------------------------------------------------------
Produce a plot of these data and discuss the relationship between fertiliser and yield.
Solution
The required plot is shown below. There appears to be a positive linear relationship
between fertiliser and yield, since we can imagine drawing a straight line which
passes roughly through the points. The equation of this imaginary line has the form
Ey = α + βx,

where α is the intercept and β is the slope of the line. Thus it is plausible that the
data follow the SLR model

yi = μi + ei, i = 1,...,n,

where μi = Eyi = α + βxi and e1,...,en ~ iid N(0, σ²).

We suppose that the two parameters α and β have some 'true' values which may
never be known exactly but which could be estimated.
[Figure: scatterplot of the data, with x (kilograms of fertiliser) on the horizontal
axis and y (tonnes of wheat) on the vertical axis.]
R Code
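A minimal sketch of code that produces such a plot (the plotting options, such as the axis limits and point style, are choices made for the illustration):

x = c(0, 0.5, 1, 1.5, 2, 2.5, 3); y = c(2.0, 3.1, 3.0, 3.8, 4.1, 4.3, 6.0)
plot(x, y, xlim=c(0,3), ylim=c(0,7),
  xlab="x (kilograms of fertiliser)", ylab="y (tonnes of wheat)", pch=16, cex=1.2)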
Suppose, as in Example 1, that the data follow the SLR model

yi = μi + ei, i = 1,...,n,

where μi = Eyi = α + βxi and e1,...,en ~ iid N(0, σ²), with σ² known.

We will now derive formulae for suitable estimates a and b of α and β, respectively.
These estimates will be functions of the observed data pairs, (x1, y1),...,(xn, yn).

Note: The estimates of α and β could also be denoted α̂ and β̂ (or β̂0 and β̂1).
However, for ease of writing, and speaking, we choose to use the symbols a and b.

Given candidate estimates a and b, we may estimate the ith mean μi = α + βxi by

μ̂i = a + bxi.

Note: This quantity also provides an estimate of yi = α + βxi + ei (since Eei = 0).
Therefore, it may also be denoted ŷi and be referred to as the ith fitted value or ith
predicted value or ith predictor. Another notation for μ̂i is Êyi (since μi = Eyi).

The idea now is to choose a and b so that the fitted values μ̂1,...,μ̂n are collectively
as close as possible to the observed values y1,...,yn, or equivalently, so that the
estimated errors êi = yi − μ̂i are collectively as small as possible.
To formalise this idea, we define the sum of squares for error (SSE) as

SSE = Σ êi² = Σ (yi − μ̂i)² = Σ (yi − a − bxi)²,

where all sums are over i = 1,...,n.
We next write down the partial derivatives of the SSE with respect to a and b:

∂SSE/∂a = Σ 2(yi − a − bxi)(−1)

∂SSE/∂b = Σ 2(yi − a − bxi)(−xi).

Setting the first of these derivatives to zero, we get

0 = Σ (yi − a − bxi) = Σ yi − a Σ 1 − b Σ xi = nȳ − an − bnx̄  ⇒  a = ȳ − bx̄.

Setting the second derivative to zero, we get

0 = Σ (yi − a − bxi)xi = Σ xiyi − a Σ xi − b Σ xi² = Σ xiyi − anx̄ − b Σ xi²,

so that

a = ( Σ xiyi − b Σ xi² ) / (nx̄).

Equating the two expressions for a, we find that

ȳ − bx̄ = ( Σ xiyi − b Σ xi² ) / (nx̄)

⇒ nx̄ȳ − bnx̄² = Σ xiyi − b Σ xi²

⇒ b( Σ xi² − nx̄² ) = Σ xiyi − nx̄ȳ.

Thus the least squares solution is

b = ( Σ xiyi − nx̄ȳ ) / ( Σ xi² − nx̄² ) and a = ȳ − bx̄.
We call a and b the least squares estimates (LSEs) of the SLR parameters α and β.
Note 1: The formulae for the LSEs may also be written as

b = Sxy/Sxx and a = ȳ − bx̄,

where: Sxy = Σ (xi − x̄)(yi − ȳ) = Σ xiyi − nx̄ȳ

Sxx = Σ (xi − x̄)(xi − x̄) = Σ (xi − x̄)² = Σ xi² − nx̄².

The slope estimate may also be written b = sxy/sxx, where:

sxy = Sxy/(n−1) (the sample covariance of x and y)

sxx = Sxx/(n−1) = sx² (the sample variance for the covariate variable).
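This note can be checked numerically in R, since cov and var both use the divisor n − 1; a quick sketch with the Example 1 data:

x = c(0, 0.5, 1, 1.5, 2, 2.5, 3); y = c(2.0, 3.1, 3.0, 3.8, 4.1, 4.3, 6.0); n = length(x)
b1 = cov(x,y)/var(x)                                          # b = sxy/sxx
b2 = (sum(x*y) - n*mean(x)*mean(y))/(sum(x^2) - n*mean(x)^2)  # b = Sxy/Sxx
c(b1, b2)                                                     # both equal 1.107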
Note 2: The quantities a and b here may also be called the least squares estimates in
any context with data of the form ( x1 , y1 ),..., ( xn , yn ) . The formulae for a and b do not
involve σ² or depend on any assumptions about the distribution of the errors e1,...,en.
Example 2
For the data in Example 1, find the least squares estimates of the simple linear
regression parameters. Then draw the associated line of best fit. Also calculate and
display in your graph the fitted values. Also, calculate the fitted errors and the SSE.
Solution
Here: x̄ = (1/n) Σ xi = (1/7)(0 + 0.5 + 1 + 1.5 + 2 + 2.5 + 3) = 1.5

ȳ = (1/n) Σ yi = (1/7)(2 + 3.1 + 3 + 3.8 + 4.1 + 4.3 + 6) = 3.757.
Also, Σ xi² = 22.75 and Σ xiyi = 47.2, so that

Sxx = Σ xi² − nx̄² = 22.75 − 7(1.5²) = 7

Sxy = Σ xiyi − nx̄ȳ = 47.2 − 7(1.5)(3.757) = 7.75.

Hence the LSEs are

b = Sxy/Sxx = 7.75/7 = 1.107 and a = ȳ − bx̄ = 3.757 − 1.107(1.5) = 2.096.

The fitted values are then

ŷ1 = a + b(0.0) = 2.096, ŷ2 = 2.650, ŷ3 = 3.204, ŷ4 = 3.757,
ŷ5 = 4.311, ŷ6 = 4.864, ŷ7 = 5.418,

and the fitted errors are

ê1 = y1 − ŷ1 = −0.096, ê2 = 0.450, ê3 = −0.204, ê4 = 0.043,
ê5 = −0.211, ê6 = −0.564, ê7 = 0.582.

Also, SSE = Σ êi² = 0.9568.
Below is the required figure, showing the observed values y1,...,yn, the estimated
regression line ŷ = a + bx, and the fitted values ŷi = a + bxi, i = 1,...,n.
[Figure: regression of wheat yield on fertiliser, showing the observed values (solid
points), the fitted values (open points) and the estimated regression line, with x
(kilograms of fertiliser) on the horizontal axis and y (tonnes of wheat) on the
vertical axis.]
R Code
options(digits=4); x = c(0, 0.5, 1, 1.5, 2, 2.5, 3); y = c(2.0, 3.1, 3.0, 3.8, 4.1, 4.3, 6.0)
xbar=mean(x); sumx2=sum(x^2); ybar=mean(y); sumxy = sum(x*y); n=length(y)
c(xbar,sumx2,ybar,sumxy) # 1.500 22.750 3.757 47.200
Sxx=sumx2-n*xbar^2; Sxy=sumxy-n*xbar*ybar; c(Sxx, Sxy) # 7.00 7.75
b=Sxy/Sxx; a =ybar-b*xbar; c(a,b) # 2.096 1.107
yhat=a+b*x; ehat=y-yhat; rbind(yhat, ehat)
# yhat 2.09643 2.65 3.2036 3.75714 4.3107 4.8643 5.4179
# ehat -0.09643 0.45 -0.2036 0.04286 -0.2107 -0.5643 0.5821
SSE = sum(ehat^2); SSE # 0.9568
X11(w=8,h=4)
plot(x,y,xlim=c(0,3),ylim=c(0,7), main="Regression of wheat yield on fertiliser",
xlab="x (kilograms of fertiliser)",ylab="y (tonnes of wheat)",pch=16,cex=1.2)
abline(a,b,lwd=2); points(x,yhat,lwd=2,cex=1.2)
legend(0,7,c("Observed values","Fitted values"),
pch=c(16,1), pt.lwd=c(1,1.5),pt.cex=c(1.2,1.2))
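As a check, R's built-in least squares routine gives the same estimates (a sketch; lm fits the regression directly):

fit = lm(y ~ x)
coef(fit)   # (Intercept) 2.096 and slope 1.107, agreeing with a and b above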
We will now determine some properties of the LSEs under the SLR model.
Unless otherwise indicated, all summations will be over i = 1,...,n.
Theorem 1: Eb = β.

Proof: First write

Sxy = Σ (xi − x̄)(yi − ȳ) = Σ (xi − x̄)yi − ȳ Σ (xi − x̄),

where Σ (xi − x̄) = Σ xi − x̄ Σ 1 = nx̄ − x̄n = 0, so that Sxy = Σ (xi − x̄)yi.

Then also, Sxx = Σ (xi − x̄)² = Σ (xi − x̄)xi − x̄ Σ (xi − x̄) = Σ (xi − x̄)xi − 0.

Therefore ESxy = Σ (xi − x̄)Eyi = Σ (xi − x̄)(α + βxi) = α Σ (xi − x̄) + β Σ (xi − x̄)xi
= α(0) + βSxx.

It follows that Eb = ESxy/Sxx = βSxx/Sxx = β.

Theorem 2: Ea = α.

Proof: First, Eȳ = (1/n) Σ Eyi = (1/n) Σ (α + βxi) = (1/n)(nα + βnx̄) = α + βx̄.

Therefore Ea = E(ȳ − bx̄) = Eȳ − x̄Eb = (α + βx̄) − x̄β = α.
Note: Theorems 1 and 2 are true under much less restrictive assumptions than those in
the SLR model (as we have defined it). These results are true so long as all the error
terms e1 ,..., en have mean 0. These error terms need not be uncorrelated or normal;
they don't even need to be identically distributed or have the same variance.
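Theorems 1 and 2 may be illustrated by simulation; below is a minimal sketch in R, where the 'true' values α = 2, β = 1 and σ = 0.5 are arbitrary choices made for the illustration:

set.seed(1)
x = c(0, 0.5, 1, 1.5, 2, 2.5, 3); n = length(x)
alpha = 2; beta = 1; sigma = 0.5
ests = replicate(10000, {
  y = alpha + beta*x + rnorm(n, 0, sigma)   # simulate data from the SLR model
  b = (sum(x*y) - n*mean(x)*mean(y))/(sum(x^2) - n*mean(x)^2)
  c(b, mean(y) - b*mean(x)) })              # LSEs b and a for this sample
rowMeans(ests)   # averages over 10000 samples; close to (beta, alpha) = (1, 2)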
Theorem 3: Vb = σ²/Sxx.

Proof: First, VSxy = V( Σ (xi − x̄)yi ) = Σ (xi − x̄)² Vyi = Sxxσ².

Therefore Vb = V(Sxy/Sxx) = VSxy/Sxx² = Sxxσ²/Sxx² = σ²/Sxx.
Theorem 4: C(ȳ, b) = 0.

Proof: C(ȳ, b) = C( (1/n) Σi yi , (1/Sxx) Σj (xj − x̄)yj )
= (1/(nSxx)) Σi Σj (xj − x̄) C(yi, yj).

Now, C(yi, yj) = Vyi = σ² if j = i, and C(yi, yj) = 0 if j ≠ i (since the yi's are
independent).

Therefore C(ȳ, b) = (1/(nSxx)) Σi (xi − x̄)σ² = (σ²/(nSxx)) Σi (xi − x̄) = (σ²/(nSxx))(0) = 0.

Theorem 5: Va = σ² Σ xi² / (nSxx).

Proof: First, Vȳ = (1/n²) Σ Vyi = (1/n²)(nσ²) = σ²/n.

Therefore Va = V(ȳ − bx̄) = Vȳ + x̄²Vb − 2x̄C(ȳ, b)
= σ²/n + x̄²σ²/Sxx − 2x̄(0)
= σ²( 1/n + x̄²/Sxx )
= σ²( Sxx + nx̄² ) / (nSxx)
= σ²( (Σ xi² − nx̄²) + nx̄² ) / (nSxx) = σ² Σ xi² / (nSxx).
Theorem 6: C(a, b) = −x̄σ²/Sxx.

Proof: C(a, b) = C(ȳ − bx̄, b) = C(ȳ, b) − x̄C(b, b) = 0 − x̄Vb = −x̄σ²/Sxx.
Theorem 7: Suppose that ψ = u + vα + wβ, where u, v and w are known constants.

Then: (i) an unbiased estimate of ψ is ψ̂ = u + va + wb

(ii) Vψ̂ = (σ²/Sxx)( v²(1/n) Σ xi² + w² − 2vwx̄ ).

Proof: (i) Eψ̂ = u + vEa + wEb = u + vα + wβ = ψ.

(ii) Vψ̂ = v²Va + w²Vb + 2vwC(a, b)
= v² σ² Σ xi² / (nSxx) + w² σ²/Sxx + 2vw( −x̄σ²/Sxx )
= (σ²/Sxx)( v²(1/n) Σ xi² + w² − 2vwx̄ ).
Note: Theorems 3 to 7 are true under much less restrictive assumptions than all of
those in the SLR model (as we have defined it). These results are true so long as the
error terms e1 ,..., en are uncorrelated and have the same variance 2 . The error terms
need not be independent or normal; they don't even need to be identically distributed.
Now, under the SLR model as defined (with iid normal errors and known variance σ²),

(a − α)/√Va ~ N(0,1), (b − β)/√Vb ~ N(0,1) and (ψ̂ − ψ)/√Vψ̂ ~ N(0,1),

where a, b, ψ̂, Va, Vb and Vψ̂ are as above. This follows because each of a, b, and so
also ψ̂, is a linear combination of the normally distributed data values, y1,...,yn.

These results allow us to perform inference on the regression parameters α and β, as
well as on any linear combination of the parameters having the form ψ = u + vα + wβ.

For example, an exact 1 − α CI for β is b ± z_{α/2}√Vb, and an exact p-value for
testing H0: β = β0 versus H1: β ≠ β0 is 2P( Z > |b − β0|/√Vb ).
As another example, consider μ = Ey = α + βx, the mean of a y-value whose covariate
value is x. This has the form ψ = u + vα + wβ with u = 0, v = 1 and w = x. So an
unbiased estimate of μ is

μ̂ = 0 + 1a + xb = a + xb,

and also, by Theorem 7,

Vμ̂ = (σ²/Sxx)( v²(1/n) Σ xi² + w² − 2vwx̄ )
= (σ²/Sxx)( (1/n) Σ xi² + x² − 2xx̄ )
= (σ²/Sxx)( (1/n)(Sxx + nx̄²) + x² − 2xx̄ )
= (σ²/Sxx)( Sxx/n + (x − x̄)² )
= σ²( 1/n + (x − x̄)²/Sxx ).

It follows that an exact 1 − α CI for μ is μ̂ ± z_{α/2}√Vμ̂, and an exact p-value for
testing H0: μ = μ0 versus H1: μ ≠ μ0 is 2P( Z > |μ̂ − μ0|/√Vμ̂ ).
Example 3
Suppose that the data in Example 1 follow the SLR model with normal errors and
standard deviation 0.15. Estimate the slope parameter and the mean of a y-value with
covariate 2.2. For each quantity, report a point estimate and suitable 95% CI.
Solution
As in Example 2: x̄ = 1.5, ȳ = 3.757, Σ xi² = 22.75, Σ xiyi = 47.2, Sxx = 7,
Sxy = 7.75, b = Sxy/Sxx = 1.107 and a = ȳ − bx̄ = 2.096.

Here σ = 0.15, so that Vb = σ²/Sxx = 0.15²/7 = 0.003214.

Thus we estimate the slope as b = 1.107, and a 95% CI for β is

b ± z_{α/2}√Vb = 1.107 ± 1.96√0.003214 = (0.996, 1.218).

Next, with covariate value x = 2.2, we have

μ̂ = a + xb = 2.096 + 2.2(1.107) = 4.532,

Vμ̂ = σ²( 1/n + (x − x̄)²/Sxx ) = 0.15²( 1/7 + (2.2 − 1.5)²/7 ) = 0.004789.

Thus, we estimate the mean as μ̂ = 4.532, and a 95% CI for μ is

μ̂ ± z_{α/2}√Vμ̂ = 4.532 ± 1.96√0.004789 = (4.397, 4.668).
R Code
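A sketch of suitable code, following the pattern of the code in Example 2:

x = c(0, 0.5, 1, 1.5, 2, 2.5, 3); y = c(2.0, 3.1, 3.0, 3.8, 4.1, 4.3, 6.0)
n = length(y); xbar = mean(x); ybar = mean(y)
Sxx = sum(x^2) - n*xbar^2; Sxy = sum(x*y) - n*xbar*ybar
b = Sxy/Sxx; a = ybar - b*xbar
sigma = 0.15; zval = qnorm(0.975)
Vb = sigma^2/Sxx; b + c(-1,1)*zval*sqrt(Vb)          # 0.996 1.218
xval = 2.2; muhat = a + xval*b
Vmuhat = sigma^2*(1/n + (xval-xbar)^2/Sxx)
c(muhat, muhat + c(-1,1)*zval*sqrt(Vmuhat))          # 4.532 4.397 4.668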
Prediction
We have already seen how to perform inference on μ = α + βx, the mean of a y-value
with covariate value x. This quantity may also be thought of as the average of a
hypothetically infinite number of independent 'new' y-values, all with covariate x.
But what if we wish to perform inference on just a single such value itself (and not on
its mean or expectation)?
To this end, we may write the new independent single value of interest as
y = α + βx + e,

where e ~ N(0, σ²) is an error term which is independent of e1,...,en.

Now, Ee = 0, and so we may estimate the new value y = α + βx + e by ŷ = a + bx.

Notably, this is exactly the same as the estimate μ̂ = a + xb of μ = α + βx.
To construct an interval estimate for y = α + βx + e, we consider the error in
estimation (or prediction), ŷ − y, whose mean is zero and which has variance

V(ŷ − y) = V{ (a + bx) − (α + βx + e) } = V{ (a + bx) − e }
= V(a + bx) + Ve − 2C(a + bx, e).

Now, a and b are functions of y1,...,yn, which are independent of the new error term,
e. Therefore C(a + bx, e) = 0, and so

V(ŷ − y) = Vμ̂ + Ve − 2(0)
= σ²( 1/n + (x − x̄)²/Sxx ) + σ²
= σ²( 1 + 1/n + (x − x̄)²/Sxx ).

Since ŷ − y is a linear combination of normal random variables, we now have that

(ŷ − y)/√V(ŷ − y) ~ N(0,1),

from which it follows that an exact 1 − α prediction interval (PI) for the new value y is

ŷ ± z_{α/2}√V(ŷ − y) = a + bx ± z_{α/2} σ √( 1 + 1/n + (x − x̄)²/Sxx ).
S xx
Note that this interval for y = + x + e is somewhat wider than the exact 1 CI
for = + x , namely
z / 2 V = a + bx z /2
1 ( x x )2
+
n
S xx
This makes sense, since there is obviously more variability associated with a single
value y than with the average of a hypothetically infinite number of such values.
As an exercise, the reader may wish to find an exact 1 PI for the average of m
independent y-values all having covariate x. (This should be 'between' the 1 CI for
Suppose now that the variance σ² in the SLR model is unknown. Then an important
result is that the quantity

s² = SSE/(n − 2) = (1/(n − 2)) Σ êi² = (1/(n − 2)) Σ (yi − ŷi)²
= (1/(n − 2)) Σ (yi − a − bxi)²

is unbiased and consistent for σ². See Note 2 below for a proof that Es² = σ².

Some other important results in that case are that

(n − 2)s²/σ² ~ χ²(n − 2)

and that s² is independent of both a and b. The proof of these results is omitted.
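In R, s² is the square of the 'residual standard error' reported by summary applied to an lm fit; a quick check using the Example 1 data (x and y as before):

fit = lm(y ~ x)
s2 = sum(residuals(fit)^2)/df.residual(fit)   # SSE/(n-2)
c(s2, summary(fit)$sigma^2)                   # both 0.1914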
Note 1: The quantity s² here is the one for which the symbol was reserved earlier; it
should not be confused with the sample variance sy² = (1/(n−1)) Σ (yi − ȳ)².
Note 2: Here is a proof that Es² = σ².

First, C(yi, b) = C( yi , (1/Sxx) Σj (xj − x̄)yj ) = C( yi , (1/Sxx)(xi − x̄)yi )
= (xi − x̄)σ²/Sxx.

Also, C(yi, a) = C(yi, ȳ − x̄b) = C( yi , (1/n) Σj yj ) − x̄C(yi, b)
= C( yi , (1/n)yi ) − x̄(xi − x̄)σ²/Sxx
= σ²( 1/n − x̄(xi − x̄)/Sxx ).
Therefore, using previous results, we have that

Eêi² = Vêi (since Eêi = 0, by Theorems 1 and 2)
= V(yi − a − bxi)
= Vyi + Va + xi²Vb + 2xiC(a, b) − 2C(yi, a) − 2xiC(yi, b)
= σ² + σ² Σj xj² / (nSxx) + xi²σ²/Sxx − 2xix̄σ²/Sxx
− 2σ²( 1/n − x̄(xi − x̄)/Sxx ) − 2xi(xi − x̄)σ²/Sxx
= (σ²/Sxx)( (1 − 2/n)Sxx + (1/n) Σj xj² + xi² − 2x̄xi + 2x̄(xi − x̄) − 2xi(xi − x̄) )
= (σ²/Sxx)( (1 − 2/n)Sxx + (1/n) Σj xj² − xi² + 2x̄xi − 2x̄² ).

So E(SSE) = Σi Eêi²
= (σ²/Sxx)( n(1 − 2/n)Sxx + Σj xj² − Σi xi² + 2x̄(nx̄) − 2nx̄² )
= (σ²/Sxx)( (n − 2)Sxx ) = (n − 2)σ²,

and so it follows that Es² = E( SSE/(n − 2) ) = σ², as required.
It then follows, using the results above, that

(a − α)/√V̂a ~ t(n − 2), (b − β)/√V̂b ~ t(n − 2) and (ψ̂ − ψ)/√V̂ψ̂ ~ t(n − 2),

where V̂a = s² Σ xi² / (nSxx), V̂b = s²/Sxx and
V̂ψ̂ = (s²/Sxx)( v²(1/n) Σ xi² + w² − 2vwx̄ )

(i.e. where V̂a, V̂b and V̂ψ̂ are the same as Va, Vb and Vψ̂ with σ² replaced by s²).

(Keep in mind that ψ = u + vα + wβ and ψ̂ = u + va + wb.)

On the basis of these facts, inference regarding α, β and ψ can proceed exactly as
before, but with σ and z_{α/2} everywhere changed to s and t_{α/2}(n − 2), respectively.

For example, an exact 1 − α CI for β is b ± t_{α/2}(n − 2)√V̂b, where V̂b = s²/Sxx,
and an exact p-value for testing H0: β = β0 vs H1: β ≠ β0 is
2P( T_{n−2} > |b − β0|/√V̂b ), where T_{n−2} ~ t(n − 2).
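These t-based intervals and tests are what lm reports; for instance (a sketch, with x and y as in Example 1):

fit = lm(y ~ x)
confint(fit, level = 0.95)    # exact 95% CIs for alpha and beta
summary(fit)$coefficients     # estimates, standard errors, t-values and p-values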
Also, an exact 1 − α PI for a single new value y with covariate value x is

ŷ ± t_{α/2}(n − 2)√V̂(ŷ − y) = a + bx ± t_{α/2}(n − 2) s √( 1 + 1/n + (x − x̄)²/Sxx ).
Example 4
Consider Example 1. Assuming that the data follow the SLR model with normal
errors, but with the normal variance σ² unknown, find a 95% CI for the quantity
μ = α + βx, where x = 2.2, and also a 95% PI for a new y-value with covariate x.
Solution
As in Example 2, a = 2.096, b = 1.107 and SSE = 0.9568, so that

s² = SSE/(n − 2) = 0.9568/5 = 0.1914.

With x = 2.2, the estimate of μ is μ̂ = a + xb = 4.532 (as in Example 3), and

V̂μ̂ = s²( 1/n + (x − x̄)²/Sxx ) = 0.04073.

Also, t_{α/2}(n − 2) = t_{0.025}(5) = 2.571.

So the required 95% CI for μ is

μ̂ ± t_{0.025}(5)√V̂μ̂ = 4.532 ± 2.571√0.04073 = (4.013, 5.051).

Also, V̂(ŷ − y) = s²( 1 + 1/n + (x − x̄)²/Sxx ) = 0.2321,

and so the required 95% PI for the new y-value is

ŷ ± t_{0.025}(5)√V̂(ŷ − y) = 4.532 ± 2.571√0.2321 = (3.294, 5.771).
Note that the 95% PI has the same centre as, but is wider than, the 95% CI.
Below is a figure showing:
• the estimate of μ = α + βx as a function of x over the range 0 ≤ x ≤ 3
• the 95% CI for μ as a function of x
• the 95% PI for a new y-value as a function of x
• the point estimate, 95% CI and 95% PI at x = 2.2 (from Example 4).
Note that one line not shown in the figure is the unknown 'true' regression line given
by the equation Ey = α + βx. In most situations, this will never be known exactly.
[Figure: the estimate of mu as a function of x, the 95% CI for mu as a function of x,
and the 95% PI for y as a function of x, with x (kilograms of fertiliser) on the
horizontal axis and y (tonnes of wheat) on the vertical axis.]
R Code
options(digits=4); x = c(0, 0.5, 1, 1.5, 2, 2.5, 3); y = c(2.0, 3.1, 3.0, 3.8, 4.1, 4.3, 6.0)
xbar=mean(x); sumx2=sum(x^2); ybar=mean(y); sumxy = sum(x*y); n=length(y)
c(xbar,sumx2,ybar,sumxy) # 1.500 22.750 3.757 47.200
Sxx=sumx2-n*xbar^2; Sxy=sumxy-n*xbar*ybar; c(Sxx, Sxy) # 7.00 7.75
b=Sxy/Sxx; a =ybar-b*xbar; c(a,b) # 2.096 1.107
yhat=a+b*x; ehat=y-yhat; rbind(yhat, ehat)
SSE = sum(ehat^2); SSE # 0.9568
xval=2.2; muhat=a+xval*b; muhat # 4.532
s2=SSE/(n-2); c(s2, sqrt(s2)) # 0.1914 0.4374
Vhatmuhat=s2*(1/n + (xval-xbar)^2/Sxx); Vhatmuhat # 0.04073
tval=qt(0.975,n-2); tval # 2.571
CI = muhat + c(-1,1)*tval*sqrt(Vhatmuhat); CI # 4.013 5.051
ypred=muhat
Vhatypred=s2*(1+1/n + (xval-xbar)^2/Sxx); Vhatypred # 0.2321
PI = ypred + c(-1,1)*tval*sqrt(Vhatypred); PI # 3.294 5.771
X11(w=8,h=5)
plot(x,y,xlim=c(0,3),ylim=c(0,8),
main="Inference on mu = alpha + x*beta and y = mu + e as functions of x",
xlab="x (kilograms of fertiliser)",ylab="y (tonnes of wheat)",pch=16,cex=1.2)
abline(a,b,lwd=2);
points(rep(xval,3),c(muhat,CI),lwd=2,cex=1.2)
points(rep(xval,2),PI,pch=2,lwd=2,cex=1.2)
xvec=seq(0,3,0.01); J=length(xvec); CILBvec= rep(NA,J); CIUBvec= rep(NA,J)
for(j in 1:J){ xvalue=xvec[j]; muhatvalue=a+xvalue*b
CI = muhatvalue + c(-1,1)*tval*sqrt(s2*(1/n + (xvalue-xbar)^2/Sxx))
CILBvec[j]=CI[1]; CIUBvec[j]=CI[2] }
lines(xvec,CILBvec); lines(xvec,CIUBvec)
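The 95% PI bounds shown in the figure may be computed and drawn in the same way; a sketch, assuming the variables defined in the code above (the dashed line type is a choice):

PILBvec=rep(NA,J); PIUBvec=rep(NA,J)
for(j in 1:J){ xvalue=xvec[j]
  PI = a + xvalue*b + c(-1,1)*tval*sqrt(s2*(1 + 1/n + (xvalue-xbar)^2/Sxx))
  PILBvec[j]=PI[1]; PIUBvec[j]=PI[2] }
lines(xvec,PILBvec,lty=2); lines(xvec,PIUBvec,lty=2)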
Suppose that the sample values are not normally distributed. Then all of the above
inferences are still valid, with an understanding that they are only approximate, but
that the approximations improve as the sample size n increases, and that the
inferences are asymptotically exact, meaning exact in the limit as n tends to infinity.
This is true even if σ is unknown and replaced in the relevant formulae by s, and/or if
t_{α/2}(n − 2) and T_{n−2} are replaced by z_{α/2} and Z.
Recall that the LSE of the slope parameter is b = Sxy/Sxx.

Another useful tool for exploring the relationship between two variables x and y, but
in the context where both are random variables, is correlation analysis. In particular,
the sample correlation coefficient

r = Sxy / √(Sxx Syy)

may be used to estimate the population correlation coefficient

ρ = C(x, y) / √(Vx Vy),

where Syy = Σ (yi − ȳ)². One relationship between simple linear regression and
correlation analysis is that

r = b √(Sxx/Syy).

Another relationship between simple linear regression and correlation analysis is that

r² = (Syy − SSE)/Syy = 1 − SSE/Syy,

where SSE = Σ (yi − ŷi)² is the sum of squares for error, as before.
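These relationships can be verified in R for the Example 1 data (a sketch; cor computes r directly):

x = c(0, 0.5, 1, 1.5, 2, 2.5, 3); y = c(2.0, 3.1, 3.0, 3.8, 4.1, 4.3, 6.0); n = length(y)
Sxx = sum(x^2) - n*mean(x)^2; Syy = sum(y^2) - n*mean(y)^2
Sxy = sum(x*y) - n*mean(x)*mean(y)
r = Sxy/sqrt(Sxx*Syy); c(r, cor(x,y))   # both 0.9485
b = Sxy/Sxx; a = mean(y) - b*mean(x); SSE = sum((y - a - b*x)^2)
c(r^2, 1 - SSE/Syy)                     # both 0.8997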