Simple Linear Regression (Chapter 11): Review of Some Inference and Notation: A Common Population Mean Model
Consider a random sample, y1,...,yn, from the normal distribution with unknown
mean Eyi = η and known variance Vyi = σ². The model for these data may be written

yi ~ iid N(η, σ²), i = 1,...,n,

or equivalently,

yi = η + ei, e1,...,en ~ iid N(0, σ²), i = 1,...,n.

We call this the common population mean (CPM) model. Under this model,

(ȳ − η)/(σ/√n) ~ N(0,1),

where ȳ = (1/n) Σ yi is the sample mean. It follows that a suitable point estimate of η
is ȳ, that an exact 1 − α confidence interval (CI) for η is

ȳ ± z_{α/2} σ/√n,

and that an exact p-value for testing H0: η = η0 versus H1: η ≠ η0 is
2P( Z > |ȳ − η0| / (σ/√n) ), where Z ~ N(0,1).
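For instance, these formulae may be evaluated in R as follows (a minimal sketch using made-up data; the sample y, the known value of σ and the hypothesised value η0 are assumptions of the illustration):

y = c(4.9, 5.3, 4.7, 5.1, 5.6); n = length(y)     # hypothetical sample
sigma = 0.4                                       # known standard deviation (assumed)
ybar = mean(y); zval = qnorm(0.975)               # z_{alpha/2} for alpha = 0.05
CI = ybar + c(-1,1)*zval*sigma/sqrt(n); CI        # exact 95% CI for eta
eta0 = 5                                          # hypothesised value (assumed)
pval = 2*pnorm(-abs((ybar-eta0)/(sigma/sqrt(n)))); pval   # exact two-sided p-value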
Note 1: For notational simplicity, we will in this chapter not usually distinguish
between random variables and their realised values using upper and lower case letters.
For example, yi may refer to both what were denoted Yi and yi in previous chapters.
Note 2: We use the symbol η because we wish to reserve μ for another quantity.
Note 3: Suppose that the normal variance σ² is unknown. In that case, an important
result is that the sample variance

sy² = (1/(n−1)) Σ (yi − ȳ)² = (1/(n−1)) ( Σ yi² − nȳ² )

is unbiased and consistent for σ². Also,

(n−1)sy²/σ² ~ χ²(n−1), and (ȳ − η)/(sy/√n) ~ t(n−1).

It follows that an exact 1 − α CI for η is

ȳ ± t_{α/2}(n−1) sy/√n,

and an exact p-value for testing H0: η = η0 versus H1: η ≠ η0 is
2P( T_{n−1} > |ȳ − η0| / (sy/√n) ), where T_{n−1} ~ t(n−1).
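With σ unknown, the same inference may be sketched in R as follows (again with made-up data; t.test is R's built-in version of this computation):

y = c(4.9, 5.3, 4.7, 5.1, 5.6); n = length(y)
ybar = mean(y); sy = sd(y)                        # sample mean and sample SD
tval = qt(0.975, n-1)                             # t_{alpha/2}(n-1) for alpha = 0.05
CI = ybar + c(-1,1)*tval*sy/sqrt(n); CI           # exact 95% CI for eta
eta0 = 5
pval = 2*pt(-abs((ybar-eta0)/(sy/sqrt(n))), n-1); pval
# t.test(y, mu = eta0) reproduces this CI and p-value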
Note 4: We use the symbol sy² because we wish to reserve s² for another quantity.
Note 5: Suppose that the sample values are not normally distributed. Then all of the
above inferences are still valid, with an understanding that they are only approximate,
but that the approximations improve as the sample size n increases, and that the
inferences are asymptotically exact, meaning exact in the limit as n tends to infinity.
This is true even if σ is unknown and replaced in the relevant formulae by sy, and/or
if t_{α/2}(n−1) and T_{n−1} are replaced by z_{α/2} and Z. It is true even if the error
terms e1,...,en are not independent and/or not identically distributed. All that is
required is that these terms are uncorrelated with mean zero and finite variance σ².

These facts follow by the central limit theorem (CLT) and other results in probability
theory. As a very rough rule of thumb, the approximations may be considered 'good' if
n is at least about 30.
Consider the above CPM model, but now instead suppose that the n sample values
y1,...,yn do not all have the same mean η, but rather that the ith value yi has mean

μi = Eyi = α + βxi, i = 1,...,n,

where α and β are unknown constants and x1,...,xn are known constants.
The idea here is that we believe there to be a linear relationship between two variables
x and y which can be expressed in the form of the equation
μ = Ey = α + βx,

with x1,...,xn being examples of x, and with y1,...,yn being examples of y. In this
equation, μ is implicitly a function of α, β and x.
The model just described is called the simple linear regression (SLR) model. In the
context of this model, we call y the dependent variable (since it depends on another
variable, namely x), and we call x the independent variable. Another term for x is the
covariate variable, and x1 ,..., xn may be referred to as the sample covariate values.
The focus of inference is now on the two parameters α and β (and functions thereof),
rather than on the single parameter η as previously in the CPM model. We call α and β
the SLR parameters. More specifically, α is the intercept parameter, and β is the
slope parameter. The next example should explain why this terminology is used.
Note 1: If β = 0 then the SLR model reduces to the CPM model, with η = α.

Note 2: In some books, the symbols β0 and β1 are used instead of α and β,
respectively. We will use the latter notation since it is easier to write and say.

Note 3: For definiteness, we have defined the SLR model with iid normally distributed
errors that have a known variance σ². As for the CPM model, we first treat this
'basic' version of the model, and later consider variations (e.g. unknown variance).
Example 1
Field, i     Fertiliser (kg), xi     Yield (tonnes), yi
-------------------------------------------------------
1            0.0                     2.0
2            0.5                     3.1
3            1.0                     3.0
4            1.5                     3.8
5            2.0                     4.1
6            2.5                     4.3
7 (= n)      3.0                     6.0
-------------------------------------------------------
Produce a plot of these data and discuss the relationship between fertiliser and yield.
Solution
The required plot is shown below. There appears to be a positive linear relationship
between fertiliser and yield, since we can imagine drawing a straight line which
passes roughly through the points. The equation of this imaginary line has the form
Ey = α + βx,

where α is the intercept and β is the slope of the line. Thus it is plausible that the
data follow the SLR model

yi = μi + ei, i = 1,...,n,

where μi = Eyi = α + βxi and e1,...,en ~ iid N(0, σ²).

We suppose that the two parameters α and β have some 'true' values which may
never be known exactly but which could be estimated.
[Figure: scatterplot of the data, with x (kilograms of fertiliser) on the horizontal
axis and y (tonnes of wheat) on the vertical axis.]
R Code
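A minimal sketch of code that produces such a plot (the plotting options, such as the axis limits and point style, are choices made for the illustration):

x = c(0, 0.5, 1, 1.5, 2, 2.5, 3); y = c(2.0, 3.1, 3.0, 3.8, 4.1, 4.3, 6.0)
plot(x, y, xlim=c(0,3), ylim=c(0,7),
  xlab="x (kilograms of fertiliser)", ylab="y (tonnes of wheat)", pch=16, cex=1.2)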
Suppose, as in Example 1, that the data follow the SLR model

yi = μi + ei, i = 1,...,n,

where μi = Eyi = α + βxi and e1,...,en ~ iid N(0, σ²), with σ² known.

We will now derive formulae for suitable estimates a and b of α and β, respectively.
These estimates will be functions of the observed data pairs, (x1, y1),...,(xn, yn).

Note: The estimates of α and β could also be denoted α̂ and β̂ (or β̂0 and β̂1).
However, for ease of writing, and speaking, we choose to use the symbols a and b.

Given candidate estimates a and b, we may estimate the ith mean μi = α + βxi by

μ̂i = a + bxi.

Note: This quantity also provides an estimate of yi = α + βxi + ei (since Eei = 0).
Therefore, it may also be denoted ŷi and be referred to as the ith fitted value or ith
predicted value or ith predictor. Another notation for μ̂i is Êyi (since μi = Eyi).

The idea now is to choose a and b so that the fitted values μ̂1,...,μ̂n are collectively
as close as possible to the observed values y1,...,yn, or equivalently, so that the
estimated errors êi = yi − μ̂i are collectively as small as possible.
To formalise this idea, we define the sum of squares for error (SSE) as

SSE = Σ êi² = Σ (yi − μ̂i)² = Σ (yi − a − bxi)²,

where all sums are over i = 1,...,n.
We next write down the partial derivatives of the SSE with respect to a and b:

∂SSE/∂a = Σ 2(yi − a − bxi)(−1)

∂SSE/∂b = Σ 2(yi − a − bxi)(−xi).

Setting the first of these derivatives to zero, we get

0 = Σ (yi − a − bxi) = Σ yi − a Σ 1 − b Σ xi = nȳ − an − bnx̄  ⇒  a = ȳ − bx̄.

Setting the second derivative to zero, we get

0 = Σ (yi − a − bxi)xi = Σ xiyi − a Σ xi − b Σ xi² = Σ xiyi − anx̄ − b Σ xi²,

so that

a = ( Σ xiyi − b Σ xi² ) / (nx̄).

Equating the two expressions for a, we find that

ȳ − bx̄ = ( Σ xiyi − b Σ xi² ) / (nx̄)

⇒ nx̄ȳ − bnx̄² = Σ xiyi − b Σ xi²

⇒ b( Σ xi² − nx̄² ) = Σ xiyi − nx̄ȳ.

Thus the least squares solution is

b = ( Σ xiyi − nx̄ȳ ) / ( Σ xi² − nx̄² ) and a = ȳ − bx̄.
We call a and b the least squares estimates (LSEs) of the SLR parameters α and β.
Note 1: The formulae for the LSEs may also be written as

b = Sxy/Sxx and a = ȳ − bx̄,

where: Sxy = Σ (xi − x̄)(yi − ȳ) = Σ xiyi − nx̄ȳ

Sxx = Σ (xi − x̄)(xi − x̄) = Σ (xi − x̄)² = Σ xi² − nx̄².

The slope estimate may also be written b = sxy/sxx, where:

sxy = Sxy/(n−1) (the sample covariance of x and y)

sxx = Sxx/(n−1) = sx² (the sample variance for the covariate variable).
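This note can be checked numerically in R, since cov and var both use the divisor n − 1; a quick sketch with the Example 1 data:

x = c(0, 0.5, 1, 1.5, 2, 2.5, 3); y = c(2.0, 3.1, 3.0, 3.8, 4.1, 4.3, 6.0); n = length(x)
b1 = cov(x,y)/var(x)                                          # b = sxy/sxx
b2 = (sum(x*y) - n*mean(x)*mean(y))/(sum(x^2) - n*mean(x)^2)  # b = Sxy/Sxx
c(b1, b2)                                                     # both equal 1.107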
Note 2: The quantities a and b here may also be called the least squares estimates in
any context with data of the form ( x1 , y1 ),..., ( xn , yn ) . The formulae for a and b do not
involve σ² or depend on any assumptions about the distribution of the errors e1,...,en.
Example 2
For the data in Example 1, find the least squares estimates of the simple linear
regression parameters. Then draw the associated line of best fit. Also calculate and
display in your graph the fitted values. Also, calculate the fitted errors and the SSE.
Solution
Here: x̄ = (1/n) Σ xi = (1/7)(0 + 0.5 + 1 + 1.5 + 2 + 2.5 + 3) = 1.5

ȳ = (1/n) Σ yi = (1/7)(2 + 3.1 + 3 + 3.8 + 4.1 + 4.3 + 6) = 3.757.
Also, Σ xi² = 22.75 and Σ xiyi = 47.2, so that

Sxx = Σ xi² − nx̄² = 22.75 − 7(1.5²) = 7

Sxy = Σ xiyi − nx̄ȳ = 47.2 − 7(1.5)(3.757) = 7.75.

Hence the LSEs are

b = Sxy/Sxx = 7.75/7 = 1.107 and a = ȳ − bx̄ = 3.757 − 1.107(1.5) = 2.096.

The fitted values are then

ŷ1 = a + b(0.0) = 2.096, ŷ2 = 2.650, ŷ3 = 3.204, ŷ4 = 3.757,
ŷ5 = 4.311, ŷ6 = 4.864, ŷ7 = 5.418,

and the fitted errors are

ê1 = y1 − ŷ1 = −0.096, ê2 = 0.450, ê3 = −0.204, ê4 = 0.043,
ê5 = −0.211, ê6 = −0.564, ê7 = 0.582.

Also, SSE = Σ êi² = 0.9568.
Below is the required figure, showing the observed values y1,...,yn, the estimated
regression line ŷ = a + bx, and the fitted values ŷi = a + bxi, i = 1,...,n.
[Figure: regression of wheat yield on fertiliser, showing the observed values (solid
points), the fitted values (open points) and the estimated regression line, with x
(kilograms of fertiliser) on the horizontal axis and y (tonnes of wheat) on the
vertical axis.]
R Code
options(digits=4); x = c(0, 0.5, 1, 1.5, 2, 2.5, 3); y = c(2.0, 3.1, 3.0, 3.8, 4.1, 4.3, 6.0)
xbar=mean(x); sumx2=sum(x^2); ybar=mean(y); sumxy = sum(x*y); n=length(y)
c(xbar,sumx2,ybar,sumxy) # 1.500 22.750 3.757 47.200
Sxx=sumx2-n*xbar^2; Sxy=sumxy-n*xbar*ybar; c(Sxx, Sxy) # 7.00 7.75
b=Sxy/Sxx; a =ybar-b*xbar; c(a,b) # 2.096 1.107
yhat=a+b*x; ehat=y-yhat; rbind(yhat, ehat)
# yhat 2.09643 2.65 3.2036 3.75714 4.3107 4.8643 5.4179
# ehat -0.09643 0.45 -0.2036 0.04286 -0.2107 -0.5643 0.5821
SSE = sum(ehat^2); SSE # 0.9568
X11(w=8,h=4)
plot(x,y,xlim=c(0,3),ylim=c(0,7), main="Regression of wheat yield on fertiliser",
xlab="x (kilograms of fertiliser)",ylab="y (tonnes of wheat)",pch=16,cex=1.2)
abline(a,b,lwd=2); points(x,yhat,lwd=2,cex=1.2)
legend(0,7,c("Observed values","Fitted values"),
pch=c(16,1), pt.lwd=c(1,1.5),pt.cex=c(1.2,1.2))
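As a check, R's built-in least squares routine gives the same estimates (a sketch; lm fits the regression directly):

fit = lm(y ~ x)
coef(fit)   # (Intercept) 2.096 and slope 1.107, agreeing with a and b above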
We will now determine some properties of the LSEs under the SLR model.
Unless otherwise indicated, all summations will be over i = 1,...,n.
Theorem 1: Eb = β.

Proof: First write

Sxy = Σ (xi − x̄)(yi − ȳ) = Σ (xi − x̄)yi − ȳ Σ (xi − x̄),

where Σ (xi − x̄) = Σ xi − x̄ Σ 1 = nx̄ − x̄n = 0, so that Sxy = Σ (xi − x̄)yi.

Then also, Sxx = Σ (xi − x̄)² = Σ (xi − x̄)xi − x̄ Σ (xi − x̄) = Σ (xi − x̄)xi − 0.

Therefore ESxy = Σ (xi − x̄)Eyi = Σ (xi − x̄)(α + βxi) = α Σ (xi − x̄) + β Σ (xi − x̄)xi
= α(0) + βSxx.

It follows that Eb = ESxy/Sxx = βSxx/Sxx = β.

Theorem 2: Ea = α.

Proof: First, Eȳ = (1/n) Σ Eyi = (1/n) Σ (α + βxi) = (1/n)(nα + βnx̄) = α + βx̄.

Therefore Ea = E(ȳ − bx̄) = Eȳ − x̄Eb = (α + βx̄) − x̄β = α.
Note: Theorems 1 and 2 are true under much less restrictive assumptions than those in
the SLR model (as we have defined it). These results are true so long as all the error
terms e1 ,..., en have mean 0. These error terms need not be uncorrelated or normal;
they don't even need to be identically distributed or have the same variance.
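Theorems 1 and 2 may be illustrated by simulation; below is a minimal sketch in R, where the 'true' values α = 2, β = 1 and σ = 0.5 are arbitrary choices made for the illustration:

set.seed(1)
x = c(0, 0.5, 1, 1.5, 2, 2.5, 3); n = length(x)
alpha = 2; beta = 1; sigma = 0.5
ests = replicate(10000, {
  y = alpha + beta*x + rnorm(n, 0, sigma)   # simulate data from the SLR model
  b = (sum(x*y) - n*mean(x)*mean(y))/(sum(x^2) - n*mean(x)^2)
  c(b, mean(y) - b*mean(x)) })              # LSEs b and a for this sample
rowMeans(ests)   # averages over 10000 samples; close to (beta, alpha) = (1, 2)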
Theorem 3: Vb = σ²/Sxx.

Proof: First, VSxy = V( Σ (xi − x̄)yi ) = Σ (xi − x̄)² Vyi = Sxxσ².

Therefore Vb = V(Sxy/Sxx) = VSxy/Sxx² = Sxxσ²/Sxx² = σ²/Sxx.
Theorem 4: C(ȳ, b) = 0.

Proof: C(ȳ, b) = C( (1/n) Σi yi , (1/Sxx) Σj (xj − x̄)yj )
= (1/(nSxx)) Σi Σj (xj − x̄) C(yi, yj).

Now, C(yi, yj) = Vyi = σ² if j = i, and C(yi, yj) = 0 if j ≠ i (since the yi's are
independent).

Therefore C(ȳ, b) = (1/(nSxx)) Σi (xi − x̄)σ² = (σ²/(nSxx)) Σi (xi − x̄) = (σ²/(nSxx))(0) = 0.

Theorem 5: Va = σ² Σ xi² / (nSxx).

Proof: First, Vȳ = (1/n²) Σ Vyi = (1/n²)(nσ²) = σ²/n.

Therefore Va = V(ȳ − bx̄) = Vȳ + x̄²Vb − 2x̄C(ȳ, b)
= σ²/n + x̄²σ²/Sxx − 2x̄(0)
= σ²( 1/n + x̄²/Sxx )
= σ²( Sxx + nx̄² ) / (nSxx)
= σ²( (Σ xi² − nx̄²) + nx̄² ) / (nSxx) = σ² Σ xi² / (nSxx).
Theorem 6: C(a, b) = −x̄σ²/Sxx.

Proof: C(a, b) = C(ȳ − bx̄, b) = C(ȳ, b) − x̄C(b, b) = 0 − x̄Vb = −x̄σ²/Sxx.
Theorem 7: Suppose that ψ = u + vα + wβ, where u, v and w are known constants.

Then: (i) an unbiased estimate of ψ is ψ̂ = u + va + wb

(ii) Vψ̂ = (σ²/Sxx)( v²(1/n) Σ xi² + w² − 2vwx̄ ).

Proof: (i) Eψ̂ = u + vEa + wEb = u + vα + wβ = ψ.

(ii) Vψ̂ = v²Va + w²Vb + 2vwC(a, b)
= v² σ² Σ xi² / (nSxx) + w² σ²/Sxx + 2vw( −x̄σ²/Sxx )
= (σ²/Sxx)( v²(1/n) Σ xi² + w² − 2vwx̄ ).
Note: Theorems 3 to 7 are true under much less restrictive assumptions than all of
those in the SLR model (as we have defined it). These results are true so long as the
error terms e1 ,..., en are uncorrelated and have the same variance 2 . The error terms
need not be independent or normal; they don't even need to be identically distributed.
Now, under the SLR model as defined (with iid normal errors and known variance σ²),

(a − α)/√Va ~ N(0,1), (b − β)/√Vb ~ N(0,1) and (ψ̂ − ψ)/√Vψ̂ ~ N(0,1),

where a, b, ψ̂, Va, Vb and Vψ̂ are as above. This follows because each of a, b, and so
also ψ̂, is a linear combination of the normally distributed data values, y1,...,yn.

These results allow us to perform inference on the regression parameters α and β, as
well as on any linear combination of the parameters having the form ψ = u + vα + wβ.

For example, an exact 1 − α CI for β is b ± z_{α/2}√Vb, and an exact p-value for
testing H0: β = β0 versus H1: β ≠ β0 is 2P( Z > |b − β0|/√Vb ).
As another example, consider μ = Ey = α + βx, the mean of a y-value whose covariate
value is x. This has the form ψ = u + vα + wβ with u = 0, v = 1 and w = x. So an
unbiased estimate of μ is

μ̂ = 0 + 1a + xb = a + xb,

and also, by Theorem 7,

Vμ̂ = (σ²/Sxx)( v²(1/n) Σ xi² + w² − 2vwx̄ )
= (σ²/Sxx)( (1/n) Σ xi² + x² − 2xx̄ )
= (σ²/Sxx)( (1/n)(Sxx + nx̄²) + x² − 2xx̄ )
= (σ²/Sxx)( Sxx/n + (x − x̄)² )
= σ²( 1/n + (x − x̄)²/Sxx ).

It follows that an exact 1 − α CI for μ is μ̂ ± z_{α/2}√Vμ̂, and an exact p-value for
testing H0: μ = μ0 versus H1: μ ≠ μ0 is 2P( Z > |μ̂ − μ0|/√Vμ̂ ).
Example 3
Suppose that the data in Example 1 follow the SLR model with normal errors and
standard deviation 0.15. Estimate the slope parameter and the mean of a y-value with
covariate 2.2. For each quantity, report a point estimate and suitable 95% CI.
Solution
As in Example 2: x̄ = 1.5, ȳ = 3.757, Σ xi² = 22.75, Σ xiyi = 47.2, Sxx = 7,
Sxy = 7.75, b = Sxy/Sxx = 1.107 and a = ȳ − bx̄ = 2.096.

Here σ = 0.15, so that Vb = σ²/Sxx = 0.15²/7 = 0.003214.

Thus we estimate the slope as b = 1.107, and a 95% CI for β is

b ± z_{α/2}√Vb = 1.107 ± 1.96√0.003214 = (0.996, 1.218).

Next, with covariate value x = 2.2, we have

μ̂ = a + xb = 2.096 + 2.2(1.107) = 4.532,

Vμ̂ = σ²( 1/n + (x − x̄)²/Sxx ) = 0.15²( 1/7 + (2.2 − 1.5)²/7 ) = 0.004789.

Thus, we estimate the mean as μ̂ = 4.532, and a 95% CI for μ is

μ̂ ± z_{α/2}√Vμ̂ = 4.532 ± 1.96√0.004789 = (4.397, 4.668).
R Code
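A sketch of suitable code, following the pattern of the code in Example 2:

x = c(0, 0.5, 1, 1.5, 2, 2.5, 3); y = c(2.0, 3.1, 3.0, 3.8, 4.1, 4.3, 6.0)
n = length(y); xbar = mean(x); ybar = mean(y)
Sxx = sum(x^2) - n*xbar^2; Sxy = sum(x*y) - n*xbar*ybar
b = Sxy/Sxx; a = ybar - b*xbar
sigma = 0.15; zval = qnorm(0.975)
Vb = sigma^2/Sxx; b + c(-1,1)*zval*sqrt(Vb)          # 0.996 1.218
xval = 2.2; muhat = a + xval*b
Vmuhat = sigma^2*(1/n + (xval-xbar)^2/Sxx)
c(muhat, muhat + c(-1,1)*zval*sqrt(Vmuhat))          # 4.532 4.397 4.668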
Prediction
We have already seen how to perform inference on μ = α + βx, the mean of a y-value
with covariate value x. This quantity may also be thought of as the average of a
hypothetically infinite number of independent 'new' y-values, all with covariate x.
But what if we wish to perform inference on just a single such value itself (and not on
its mean or expectation)?
To this end, we may write the new independent single value of interest as
y = α + βx + e,

where e ~ N(0, σ²) is an error term which is independent of e1,...,en.

Now, Ee = 0, and so we may estimate the new value y = α + βx + e by ŷ = a + bx.

Notably, this is exactly the same as the estimate μ̂ = a + xb of μ = α + βx.
To construct an interval estimate for y = α + βx + e, we consider the error in
estimation (or prediction), ŷ − y, whose mean is zero and which has variance

V(ŷ − y) = V{ (a + bx) − (α + βx + e) } = V{ (a + bx) − e }
= V(a + bx) + Ve − 2C(a + bx, e).

Now, a and b are functions of y1,...,yn, which are independent of the new error term,
e. Therefore C(a + bx, e) = 0, and so

V(ŷ − y) = Vμ̂ + Ve − 2(0)
= σ²( 1/n + (x − x̄)²/Sxx ) + σ²
= σ²( 1 + 1/n + (x − x̄)²/Sxx ).

Since ŷ − y is a linear combination of normal random variables, we now have that

(ŷ − y)/√V(ŷ − y) ~ N(0,1),

from which it follows that an exact 1 − α prediction interval (PI) for the new value y is

ŷ ± z_{α/2}√V(ŷ − y) = a + bx ± z_{α/2} σ √( 1 + 1/n + (x − x̄)²/Sxx ).
S xx
Note that this interval for y = + x + e is somewhat wider than the exact 1 CI
for = + x , namely
z / 2 V = a + bx z /2
1 ( x x )2
+
n
S xx
This makes sense, since there is obviously more variability associated with a single
value y than with the average of a hypothetically infinite number of such values.
As an exercise, the reader may wish to find an exact 1 PI for the average of m
independent y-values all having covariate x. (This should be 'between' the 1 CI for
Suppose now that the variance σ² in the SLR model is unknown. Then an important
result is that the quantity

s² = SSE/(n − 2) = (1/(n − 2)) Σ êi² = (1/(n − 2)) Σ (yi − ŷi)²
= (1/(n − 2)) Σ (yi − a − bxi)²

is unbiased and consistent for σ². See Note 2 below for a proof that Es² = σ².

Some other important results in that case are that

(n − 2)s²/σ² ~ χ²(n − 2)

and that s² is independent of both a and b. The proof of these results is omitted.
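In R, s² is the square of the 'residual standard error' reported by summary applied to an lm fit; a quick check using the Example 1 data (x and y as before):

fit = lm(y ~ x)
s2 = sum(residuals(fit)^2)/df.residual(fit)   # SSE/(n-2)
c(s2, summary(fit)$sigma^2)                   # both 0.1914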
Note 1: The quantity s² here is the one for which the symbol was reserved earlier; it
should not be confused with the sample variance sy² = (1/(n−1)) Σ (yi − ȳ)².
Note 2: Here is a proof that Es² = σ².

First, C(yi, b) = C( yi , (1/Sxx) Σj (xj − x̄)yj ) = C( yi , (1/Sxx)(xi − x̄)yi )
= (xi − x̄)σ²/Sxx.

Also, C(yi, a) = C(yi, ȳ − x̄b) = C( yi , (1/n) Σj yj ) − x̄C(yi, b)
= C( yi , (1/n)yi ) − x̄(xi − x̄)σ²/Sxx
= σ²( 1/n − x̄(xi − x̄)/Sxx ).
Therefore, using previous results, we have that

Eêi² = Vêi (since Eêi = 0, by Theorems 1 and 2)
= V(yi − a − bxi)
= Vyi + Va + xi²Vb + 2xiC(a, b) − 2C(yi, a) − 2xiC(yi, b)
= σ² + σ² Σj xj² / (nSxx) + xi²σ²/Sxx − 2xix̄σ²/Sxx
− 2σ²( 1/n − x̄(xi − x̄)/Sxx ) − 2xi(xi − x̄)σ²/Sxx
= (σ²/Sxx)( (1 − 2/n)Sxx + (1/n) Σj xj² + xi² − 2x̄xi + 2x̄(xi − x̄) − 2xi(xi − x̄) )
= (σ²/Sxx)( (1 − 2/n)Sxx + (1/n) Σj xj² − xi² + 2x̄xi − 2x̄² ).

So E(SSE) = Σi Eêi²
= (σ²/Sxx)( n(1 − 2/n)Sxx + Σj xj² − Σi xi² + 2x̄(nx̄) − 2nx̄² )
= (σ²/Sxx)( (n − 2)Sxx ) = (n − 2)σ²,

and so it follows that Es² = E( SSE/(n − 2) ) = σ², as required.
It then follows, using the results above, that

(a − α)/√V̂a ~ t(n − 2), (b − β)/√V̂b ~ t(n − 2) and (ψ̂ − ψ)/√V̂ψ̂ ~ t(n − 2),

where V̂a = s² Σ xi² / (nSxx), V̂b = s²/Sxx and
V̂ψ̂ = (s²/Sxx)( v²(1/n) Σ xi² + w² − 2vwx̄ )

(i.e. where V̂a, V̂b and V̂ψ̂ are the same as Va, Vb and Vψ̂ with σ² replaced by s²).

(Keep in mind that ψ = u + vα + wβ and ψ̂ = u + va + wb.)

On the basis of these facts, inference regarding α, β and ψ can proceed exactly as
before, but with σ and z_{α/2} everywhere changed to s and t_{α/2}(n − 2), respectively.

For example, an exact 1 − α CI for β is b ± t_{α/2}(n − 2)√V̂b, where V̂b = s²/Sxx,
and an exact p-value for testing H0: β = β0 vs H1: β ≠ β0 is
2P( T_{n−2} > |b − β0|/√V̂b ), where T_{n−2} ~ t(n − 2).
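These t-based intervals and tests are what lm reports; for instance (a sketch, with x and y as in Example 1):

fit = lm(y ~ x)
confint(fit, level = 0.95)    # exact 95% CIs for alpha and beta
summary(fit)$coefficients     # estimates, standard errors, t-values and p-values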
Also, an exact 1 − α PI for a single new value y with covariate value x is

ŷ ± t_{α/2}(n − 2)√V̂(ŷ − y) = a + bx ± t_{α/2}(n − 2) s √( 1 + 1/n + (x − x̄)²/Sxx ).
Example 4
Consider Example 1. Assuming that the data follow the SLR model with normal
errors, but with the normal variance σ² unknown, find a 95% CI for the quantity
μ = α + βx, where x = 2.2, and also a 95% PI for a new y-value with covariate x.
Solution
As in Example 2, a = 2.096, b = 1.107 and SSE = 0.9568, so that

s² = SSE/(n − 2) = 0.9568/5 = 0.1914.

With x = 2.2, the estimate of μ is μ̂ = a + xb = 4.532 (as in Example 3), and

V̂μ̂ = s²( 1/n + (x − x̄)²/Sxx ) = 0.04073.

Also, t_{α/2}(n − 2) = t_{0.025}(5) = 2.571.

So the required 95% CI for μ is

μ̂ ± t_{0.025}(5)√V̂μ̂ = 4.532 ± 2.571√0.04073 = (4.013, 5.051).

Also, V̂(ŷ − y) = s²( 1 + 1/n + (x − x̄)²/Sxx ) = 0.2321,

and so the required 95% PI for the new y-value is

ŷ ± t_{0.025}(5)√V̂(ŷ − y) = 4.532 ± 2.571√0.2321 = (3.294, 5.771).
Note that the 95% PI has the same centre as, but is wider than, the 95% CI.
Below is a figure showing:
• the estimate of μ = α + βx as a function of x over the range 0 ≤ x ≤ 3
• the 95% CI for μ as a function of x
• the 95% PI for a new y-value as a function of x
• the point estimate, 95% CI and 95% PI at x = 2.2 (from Example 4).
Note that one line not shown in the figure is the unknown 'true' regression line given
by the equation Ey = α + βx. In most situations, this will never be known exactly.
[Figure: the estimate of mu as a function of x, the 95% CI for mu as a function of x,
and the 95% PI for y as a function of x, with x (kilograms of fertiliser) on the
horizontal axis and y (tonnes of wheat) on the vertical axis.]
R Code
options(digits=4); x = c(0, 0.5, 1, 1.5, 2, 2.5, 3); y = c(2.0, 3.1, 3.0, 3.8, 4.1, 4.3, 6.0)
xbar=mean(x); sumx2=sum(x^2); ybar=mean(y); sumxy = sum(x*y); n=length(y)
c(xbar,sumx2,ybar,sumxy) # 1.500 22.750 3.757 47.200
Sxx=sumx2-n*xbar^2; Sxy=sumxy-n*xbar*ybar; c(Sxx, Sxy) # 7.00 7.75
b=Sxy/Sxx; a =ybar-b*xbar; c(a,b) # 2.096 1.107
yhat=a+b*x; ehat=y-yhat; rbind(yhat, ehat)
SSE = sum(ehat^2); SSE # 0.9568
xval=2.2; muhat=a+xval*b; muhat # 4.532
s2=SSE/(n-2); c(s2, sqrt(s2)) # 0.1914 0.4374
Vhatmuhat=s2*(1/n + (xval-xbar)^2/Sxx); Vhatmuhat # 0.04073
tval=qt(0.975,n-2); tval # 2.571
CI = muhat + c(-1,1)*tval*sqrt(Vhatmuhat); CI # 4.013 5.051
ypred=muhat
Vhatypred=s2*(1+1/n + (xval-xbar)^2/Sxx); Vhatypred # 0.2321
PI = ypred + c(-1,1)*tval*sqrt(Vhatypred); PI # 3.294 5.771
X11(w=8,h=5)
plot(x,y,xlim=c(0,3),ylim=c(0,8),
main="Inference on mu = alpha + x*beta and y = mu + e as functions of x",
xlab="x (kilograms of fertiliser)",ylab="y (tonnes of wheat)",pch=16,cex=1.2)
abline(a,b,lwd=2);
points(rep(xval,3),c(muhat,CI),lwd=2,cex=1.2)
points(rep(xval,2),PI,pch=2,lwd=2,cex=1.2)
xvec=seq(0,3,0.01); J=length(xvec); CILBvec= rep(NA,J); CIUBvec= rep(NA,J)
for(j in 1:J){ xvalue=xvec[j]; muhatvalue=a+xvalue*b
CI = muhatvalue + c(-1,1)*tval*sqrt(s2*(1/n + (xvalue-xbar)^2/Sxx))
CILBvec[j]=CI[1]; CIUBvec[j]=CI[2] }
lines(xvec,CILBvec); lines(xvec,CIUBvec)
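The 95% PI bounds shown in the figure may be computed and drawn in the same way; a sketch, assuming the variables defined in the code above (the dashed line type is a choice):

PILBvec=rep(NA,J); PIUBvec=rep(NA,J)
for(j in 1:J){ xvalue=xvec[j]
  PI = a + xvalue*b + c(-1,1)*tval*sqrt(s2*(1 + 1/n + (xvalue-xbar)^2/Sxx))
  PILBvec[j]=PI[1]; PIUBvec[j]=PI[2] }
lines(xvec,PILBvec,lty=2); lines(xvec,PIUBvec,lty=2)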
Suppose that the sample values are not normally distributed. Then all of the above
inferences are still valid, with an understanding that they are only approximate, but
that the approximations improve as the sample size n increases, and that the
inferences are asymptotically exact, meaning exact in the limit as n tends to infinity.
This is true even if σ is unknown and replaced in the relevant formulae by s, and/or if
t_{α/2}(n − 2) and T_{n−2} are replaced by z_{α/2} and Z.
Recall that the LSE of the slope parameter is b = Sxy/Sxx.

Another useful tool for exploring the relationship between two variables x and y, but
in the context where both are random variables, is correlation analysis. In particular,
the sample correlation coefficient

r = Sxy / √(Sxx Syy)

may be used to estimate the population correlation coefficient

ρ = C(x, y) / √(Vx Vy),

where Syy = Σ (yi − ȳ)². One relationship between simple linear regression and
correlation analysis is that

r = b √(Sxx/Syy).

Another relationship between simple linear regression and correlation analysis is that

r² = (Syy − SSE)/Syy = 1 − SSE/Syy,

where SSE = Σ (yi − ŷi)² is the sum of squares for error, as before.
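These relationships can be verified in R for the Example 1 data (a sketch; cor computes r directly):

x = c(0, 0.5, 1, 1.5, 2, 2.5, 3); y = c(2.0, 3.1, 3.0, 3.8, 4.1, 4.3, 6.0); n = length(y)
Sxx = sum(x^2) - n*mean(x)^2; Syy = sum(y^2) - n*mean(y)^2
Sxy = sum(x*y) - n*mean(x)*mean(y)
r = Sxy/sqrt(Sxx*Syy); c(r, cor(x,y))   # both 0.9485
b = Sxy/Sxx; a = mean(y) - b*mean(x); SSE = sum((y - a - b*x)^2)
c(r^2, 1 - SSE/Syy)                     # both 0.8997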