
SIMPLE LINEAR REGRESSION (Chapter 11)


Review of some inference and notation: A common population mean model

We begin this chapter by reviewing an important topic: inference regarding a common population mean based on a random sample. We then modify this topic so as to introduce the topic of regression, in particular simple linear regression.

Consider a random sample, $y_1, \ldots, y_n$, from the normal distribution with unknown mean $Ey_i = \mu$ and known variance $Vy_i = \sigma^2$. The model for these data may be written
$$y_i \sim \text{iid } N(\mu, \sigma^2), \quad i = 1, \ldots, n,$$
or equivalently,
$$y_i = \mu + e_i, \quad i = 1, \ldots, n,$$
where $e_1, \ldots, e_n \sim \text{iid } N(0, \sigma^2)$. We refer to $e_1, \ldots, e_n$ as the error terms.


The above model may be called the common population mean (CPM) model. In the context of this model, a key result is that
$$\frac{\bar{y} - \mu}{\sigma/\sqrt{n}} \sim N(0,1).$$
This implies that an unbiased estimate of $\mu$ is the sample mean, $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$, an exact $1-\gamma$ confidence interval (CI) for $\mu$ is
$$\left( \bar{y} \pm z_{\gamma/2}\,\sigma/\sqrt{n} \right),$$
and an exact p-value for testing $H_0: \mu = \mu_0$ versus $H_1: \mu \neq \mu_0$ is
$$2P\left( Z > \left| \frac{\bar{y} - \mu_0}{\sigma/\sqrt{n}} \right| \right), \quad \text{where } Z \sim N(0,1).$$

Note 1: For notational simplicity, we will in this chapter not usually distinguish between random variables and their realised values using upper and lower case letters. For example, $y_i$ may refer to both what were denoted $Y_i$ and $y_i$ in previous chapters.

Note 2: We use the symbol $\gamma$ (rather than the more usual $\alpha$) for the significance level because we wish to reserve $\alpha$ for another quantity introduced later in this chapter.
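As a small illustration (not part of the original notes), the following R sketch computes the above CI and p-value for a made-up sample; the data vector, the 'known' value $\sigma = 0.4$ and the null value $\mu_0 = 5$ are all hypothetical.

# Illustrative z-based inference under the CPM model (made-up data, known sigma)
y = c(4.8, 5.3, 5.1, 4.9, 5.6, 5.0, 5.2, 4.7)     # hypothetical sample
sig = 0.4                                         # assumed known standard deviation
n = length(y); ybar = mean(y)
CI = ybar + c(-1,1)*qnorm(0.975)*sig/sqrt(n)      # exact 95% CI for the mean
mu0 = 5; z = (ybar - mu0)/(sig/sqrt(n))
pval = 2*pnorm(-abs(z))                           # exact two-sided p-value
c(ybar, CI, pval)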


Note 3: Suppose that the normal variance $\sigma^2$ is unknown. In that case, an important result is that the sample variance
$$s_y^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2 = \frac{1}{n-1}\left( \sum_{i=1}^{n} y_i^2 - n\bar{y}^2 \right)$$
is unbiased and consistent for $\sigma^2$. It can also be shown that $\dfrac{(n-1)s_y^2}{\sigma^2} \sim \chi^2(n-1)$, and that $s_y^2$ is independent of $\bar{y}$. From these results, it follows that
$$\frac{\bar{y} - \mu}{s_y/\sqrt{n}} \sim t(n-1).$$
This fact implies that an exact $1-\gamma$ CI for $\mu$ is
$$\left( \bar{y} \pm t_{\gamma/2}(n-1)\, s_y/\sqrt{n} \right),$$
and an exact p-value for testing $H_0: \mu = \mu_0$ versus $H_1: \mu \neq \mu_0$ is $2P\left( T_{n-1} > \left| \dfrac{\bar{y} - \mu_0}{s_y/\sqrt{n}} \right| \right)$, where $T_{n-1} \sim t(n-1)$, and where $t_{\gamma/2}(n-1)$ denotes the upper $\gamma/2$-quantile of $T_{n-1}$.

Note 4: We use the symbol $s_y^2$ because we wish to reserve $s^2$ for another quantity.
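Continuing the same made-up sample (again, purely illustrative), the t-based CI and p-value of Note 3 can be computed directly, or obtained from R's built-in t.test() function:

# t-based inference for the hypothetical sample, treating sigma^2 as unknown
y = c(4.8, 5.3, 5.1, 4.9, 5.6, 5.0, 5.2, 4.7)
n = length(y); ybar = mean(y); sy = sd(y)         # sd() uses the n-1 divisor
CI = ybar + c(-1,1)*qt(0.975, n-1)*sy/sqrt(n)     # exact 95% CI for the mean
CI
t.test(y, mu = 5)                                 # same CI, plus the two-sided p-value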

Note 5: Suppose that the sample values are not normally distributed. Then all of the above inferences are still valid, with an understanding that they are only approximate, but that the approximations improve as the sample size n increases, and that the inferences are asymptotically exact, meaning exact in the limit as n tends to infinity. This is true even if $\sigma$ is unknown and replaced in the relevant formulae by $s_y$, and/or if $t_{\gamma/2}(n-1)$ and $T_{n-1}$ are replaced by $z_{\gamma/2}$ and $Z$. It is true even if the error terms $e_1, \ldots, e_n$ are not independent and/or not identically distributed. All that is required is that these terms are uncorrelated with mean zero and finite variance $\sigma^2$. These facts follow by the central limit theorem (CLT) and other results in probability theory. As a very rough rule of thumb, the approximations may be considered 'good' if n is 'large', meaning $n \geq 30$ (say).


Introduction to regression: Simple linear regression

Consider the above CPM model, but now instead suppose that the n sample values $y_1, \ldots, y_n$ do not all have the same mean $\mu$, but rather that the ith value $y_i$ has mean
$$\mu_i = Ey_i = \alpha + \beta x_i, \quad i = 1, \ldots, n,$$
where $\alpha$ and $\beta$ are unknown constants and $x_1, \ldots, x_n$ are known constants.

This new model may be written
$$y_i \stackrel{\perp}{\sim} N(\mu_i, \sigma^2), \quad i = 1, \ldots, n$$
(where $\perp$ signifies independence) or
$$y_i = \mu_i + e_i, \quad i = 1, \ldots, n,$$
where $e_1, \ldots, e_n \sim \text{iid } N(0, \sigma^2)$. As before, we refer to $e_1, \ldots, e_n$ as the error terms.

The idea here is that we believe there to be a linear relationship between two variables x and y which can be expressed in the form of the equation
$$\mu = Ey = \alpha + \beta x,$$
with $x_1, \ldots, x_n$ being examples of x, and with $y_1, \ldots, y_n$ being examples of y. In this equation, $\mu$ is implicitly a function of $\alpha$, $\beta$ and x.

The model just described is called the simple linear regression (SLR) model. In the context of this model, we call y the dependent variable (since it depends on another variable, namely x), and we call x the independent variable. Another term for x is the covariate, and $x_1, \ldots, x_n$ may be referred to as the sample covariate values. The focus of inference is now on the two parameters $\alpha$ and $\beta$ (and functions thereof), rather than on the single parameter $\mu$ as previously in the CPM model. We call $\alpha$ and $\beta$ the SLR parameters. More specifically, $\alpha$ is the intercept parameter and $\beta$ is the slope parameter. The next example should explain why this terminology is used.


Note 1: If $\beta = 0$ then the SLR model reduces to the CPM model, with $\mu = \alpha$.

Note 2: In some books, the symbols $\beta_0$ and $\beta_1$ are used instead of $\alpha$ and $\beta$, respectively. We will use the latter notation since it is easier to write and say.

Note 3: For definiteness, we have defined the SLR model with iid normally distributed errors that have a known variance $\sigma^2$. As for the CPM model, we first treat this 'basic' version of the model, and later consider variations (e.g. unknown variance).

Example 1

We are interested in the effects of a particular fertiliser on wheat yield.


To this end, we divide a large field into 7 plots of equal size, similar soil quality, etc.
We then plant wheat in these 7 plots after adding varying amounts of the fertiliser.
At harvest time the yields are observed and recorded, as shown in the following table.

Field, i     Quantity of fertiliser (kg), xi     Yield (tonnes), yi
----------------------------------------------------------------------
   1                     0.0                            2.0
   2                     0.5                            3.1
   3                     1.0                            3.0
   4                     1.5                            3.8
   5                     2.0                            4.1
   6                     2.5                            4.3
 7 (= n)                 3.0                            6.0
----------------------------------------------------------------------

Produce a plot of these data and discuss the relationship between fertiliser and yield.


Solution

The required plot is shown below. There appears to be a positive linear relationship between fertiliser and yield, since we can imagine drawing a straight line which passes roughly through the points. The equation of this imaginary line has the form
$$Ey = \alpha + \beta x,$$
where $\alpha$ is the intercept and $\beta$ is the slope of the line. Thus it is plausible that the data follow the SLR model
$$y_i = \mu_i + e_i, \quad i = 1, \ldots, n,$$
where $\mu_i = Ey_i = \alpha + \beta x_i$ and $e_1, \ldots, e_n \sim \text{iid } N(0, \sigma^2)$. We suppose that the two parameters $\alpha$ and $\beta$ have some 'true' values which may never be known exactly but which could be estimated.

[Figure: 'Plot of wheat yield versus fertiliser', a scatterplot of y (tonnes of wheat) against x (kilograms of fertiliser).]

R Code

x = c(0, 0.5, 1, 1.5, 2, 2.5, 3)


y = c(2.0, 3.1, 3.0, 3.8, 4.1, 4.3, 6.0)
X11(w=8,h=4); par(mfrow=c(1,1))
plot(x,y,xlim=c(0,3),ylim=c(0,7), main="Plot of wheat yield versus fertiliser",
xlab="x (kilograms of fertiliser)",ylab="y (tonnes of wheat)",pch=16,cex=1.2)


Least squares estimation

Consider the SLR model given by
$$y_i = \mu_i + e_i, \quad i = 1, \ldots, n,$$
where $\mu_i = Ey_i = \alpha + \beta x_i$ and $e_1, \ldots, e_n \sim \text{iid } N(0, \sigma^2)$, with $\sigma^2$ known.
We will now derive formulae for suitable estimates a and b of $\alpha$ and $\beta$, respectively. These estimates will be functions of the observed data pairs, $(x_1, y_1), \ldots, (x_n, y_n)$.

Note: The estimates of $\alpha$ and $\beta$ could also be denoted $\hat{\alpha}$ and $\hat{\beta}$ (or $\hat{\beta}_0$ and $\hat{\beta}_1$). However, for ease of writing, and speaking, we choose to use the symbols a and b.

First, define the ith fitted mean as
$$\hat{\mu}_i = a + b x_i.$$
Note: This quantity also provides an estimate of $y_i = \alpha + \beta x_i + e_i$ (since $Ee_i = 0$). Therefore, it may also be denoted $\hat{y}_i$ and be referred to as the ith fitted value or ith predicted value or ith predictor. Another notation for $\hat{\mu}_i$ is $\widehat{Ey}_i$ (since $\mu_i = Ey_i$).

Next define the ith error as
$$e_i = y_i - \mu_i = y_i - \alpha - \beta x_i,$$
and the ith fitted error as
$$\hat{e}_i = y_i - \hat{\mu}_i = y_i - \hat{y}_i = y_i - a - b x_i.$$
Now, intuitively, a straight line with equation y = a + bx will provide a good fit to
the n data points if the sum of the squares of the n fitted errors is small. Therefore, one
reasonable approach is to choose a and b so as to make that sum as small as possible.

To formalise this idea, we define the sum of squares for error (SSE) as
$$SSE = \sum_{i=1}^{n} \hat{e}_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - a - bx_i)^2.$$

We next write down the partial derivatives of the SSE with respect to a and b:
$$\frac{\partial\,SSE}{\partial a} = \sum_{i=1}^{n} 2(y_i - a - bx_i)^1 \times (-1)$$
$$\frac{\partial\,SSE}{\partial b} = \sum_{i=1}^{n} 2(y_i - a - bx_i)^1 \times (-x_i).$$

Setting these derivatives to zero, respectively, we get:
$$0 = \sum_{i=1}^{n}(y_i - a - bx_i) = \sum_{i=1}^{n} y_i - a\sum_{i=1}^{n} 1 - b\sum_{i=1}^{n} x_i = n\bar{y} - an - bn\bar{x} \;\Rightarrow\; a = \bar{y} - b\bar{x}$$
$$0 = \sum_{i=1}^{n}(y_i - a - bx_i)x_i = \sum_{i=1}^{n} x_i y_i - a\sum_{i=1}^{n} x_i - b\sum_{i=1}^{n} x_i^2 \;\Rightarrow\; a = \frac{\sum_{i=1}^{n} x_i y_i - b\sum_{i=1}^{n} x_i^2}{n\bar{x}}.$$

Equating these two different expressions for a, we get
$$\bar{y} - b\bar{x} = \frac{\sum_{i=1}^{n} x_i y_i - b\sum_{i=1}^{n} x_i^2}{n\bar{x}}$$
$$\Rightarrow\; n\bar{x}\bar{y} - nb\bar{x}^2 = \sum_{i=1}^{n} x_i y_i - b\sum_{i=1}^{n} x_i^2$$
$$\Rightarrow\; b\left( \sum_{i=1}^{n} x_i^2 - n\bar{x}^2 \right) = \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}$$
$$\Rightarrow\; b = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}.$$

Thus, the required formulae are given by
$$b = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2} \qquad \text{and} \qquad a = \bar{y} - b\bar{x}.$$

We call a and b the least squares estimates (LSEs) of the SLR parameters $\alpha$ and $\beta$.


Note 1: Another way to express the LSEs is as
$$b = \frac{S_{xy}}{S_{xx}} \qquad \text{and} \qquad a = \bar{y} - b\bar{x},$$
where:
$$S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}$$
$$S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x}) = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2.$$

Yet another way to express b is as $b = \dfrac{s_{xy}}{s_{xx}}$, where:
$$s_{xy} = \frac{S_{xy}}{n-1} \quad \text{(the sample covariance for the two variables)}$$
$$s_{xx} = \frac{S_{xx}}{n-1} = s_x^2 \quad \text{(the sample variance for the covariate variable)}.$$

Note 2: The quantities a and b here may also be called the least squares estimates in any context with data of the form $(x_1, y_1), \ldots, (x_n, y_n)$. The formulae for a and b do not involve $\sigma^2$ or depend on any assumptions about the distribution of the errors $e_1, \ldots, e_n$.
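As a brief aside (not in the original notes), the following R sketch uses a small made-up data set to check numerically that minimising the SSE with a general-purpose optimiser reproduces the closed-form formulae above; the data values are hypothetical.

# Minimising the SSE numerically reproduces b = Sxy/Sxx and a = ybar - b*xbar
x = c(1.2, 2.5, 3.1, 4.0, 5.3, 6.1)                # hypothetical covariate values
y = c(2.0, 3.4, 3.1, 4.8, 5.9, 6.2)                # hypothetical responses
sse = function(par) sum((y - par[1] - par[2]*x)^2) # SSE as a function of (a, b)
optim(c(0,0), sse)$par                             # numerical minimiser of the SSE
Sxy = sum((x-mean(x))*(y-mean(y))); Sxx = sum((x-mean(x))^2)
b = Sxy/Sxx; a = mean(y) - b*mean(x); c(a, b)      # closed-form LSEs (should agree)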

Example 2

For the data in Example 1, find the least squares estimates of the simple linear
regression parameters. Then draw the associated line of best fit. Also calculate and
display in your graph the fitted values. Also, calculate the fitted errors and the SSE.

Solution

Here:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1}{7}(0 + 0.5 + 1 + 1.5 + 2 + 2.5 + 3) = 1.5$$
$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i = \frac{1}{7}(2 + 3.1 + 3 + 3.8 + 4.1 + 4.3 + 6) = 3.757$$

$$\sum_{i=1}^{n} x_i^2 = 0^2 + 0.5^2 + \cdots + 3^2 = 22.75, \qquad \sum_{i=1}^{n} x_i y_i = 0 \cdot 2 + 0.5 \cdot 3.1 + \cdots + 3 \cdot 6 = 47.2$$
$$S_{xx} = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2 = 7, \qquad S_{xy} = \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y} = 7.75.$$

So the least squares estimates are:
$$b = \frac{S_{xy}}{S_{xx}} = 1.107 \qquad \text{and} \qquad a = \bar{y} - b\bar{x} = 2.096.$$

The fitted values are:
$$\hat{y}_1 = a + bx_1 = 2.096 + 1.107 \times 0 = 2.096$$
$$\hat{y}_2 = a + bx_2 = 2.096 + 1.107 \times 0.5 = 2.650$$
$$\hat{y}_3 = \cdots = 3.204, \quad \hat{y}_4 = \cdots = 3.757, \quad \hat{y}_5 = \cdots = 4.311, \quad \hat{y}_6 = \cdots = 4.864, \quad \hat{y}_7 = \cdots = 5.418.$$

So the fitted errors are:
$$\hat{e}_1 = y_1 - \hat{y}_1 = 2 - 2.096 = -0.096$$
$$\hat{e}_2 = y_2 - \hat{y}_2 = 3.1 - 2.65 = 0.45$$
$$\hat{e}_3 = \cdots = -0.203, \quad \hat{e}_4 = \cdots = 0.043, \quad \hat{e}_5 = \cdots = -0.211, \quad \hat{e}_6 = \cdots = -0.564, \quad \hat{e}_7 = \cdots = 0.582.$$

Finally, the sum of squares for error is
$$SSE = \sum_{i=1}^{n} \hat{e}_i^2 = (-0.096)^2 + 0.45^2 + \cdots + 0.582^2 = 0.957.$$

Below is the required figure, showing the observed values $y_1, \ldots, y_n$, the estimated regression line $\hat{y} = a + bx$, and the fitted values $\hat{y}_i = a + bx_i$, $i = 1, \ldots, n$.


[Figure: 'Regression of wheat yield on fertiliser', showing y (tonnes of wheat) against x (kilograms of fertiliser), with the fitted regression line; legend: observed values, fitted values.]

R Code

options(digits=4); x = c(0, 0.5, 1, 1.5, 2, 2.5, 3); y = c(2.0, 3.1, 3.0, 3.8, 4.1, 4.3, 6.0)
xbar=mean(x); sumx2=sum(x^2); ybar=mean(y); sumxy = sum(x*y); n=length(y)
c(xbar,sumx2,ybar,sumxy) # 1.500 22.750 3.757 47.200
Sxx=sumx2-n*xbar^2; Sxy=sumxy-n*xbar*ybar; c(Sxx, Sxy) # 7.00 7.75
b=Sxy/Sxx; a =ybar-b*xbar; c(a,b) # 2.096 1.107
yhat=a+b*x; ehat=y-yhat; rbind(yhat, ehat)
# yhat 2.09643 2.65 3.2036 3.75714 4.3107 4.8643 5.4179
# ehat -0.09643 0.45 -0.2036 0.04286 -0.2107 -0.5643 0.5821
SSE = sum(ehat^2); SSE # 0.9568

X11(w=8,h=4)
plot(x,y,xlim=c(0,3),ylim=c(0,7), main="Regression of wheat yield on fertiliser",
xlab="x (kilograms of fertiliser)",ylab="y (tonnes of wheat)",pch=16,cex=1.2)
abline(a,b,lwd=2); points(x,yhat,lwd=2,cex=1.2)
legend(0,7,c("Observed values","Fitted values"),
pch=c(16,1), pt.lwd=c(1,1.5),pt.cex=c(1.2,1.2))
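As an optional cross-check (not in the original notes), R's built-in lm() function reproduces the quantities computed above:

# Cross-check of Example 2 using lm()
x = c(0, 0.5, 1, 1.5, 2, 2.5, 3); y = c(2.0, 3.1, 3.0, 3.8, 4.1, 4.3, 6.0)
fit = lm(y ~ x)
coef(fit)                  # intercept 2.096 and slope 1.107: matches a and b
fitted(fit)                # matches yhat
residuals(fit)             # matches ehat
sum(residuals(fit)^2)      # 0.9568: matches SSE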


Properties of the least squares estimators

We will now determine some properties of the LSEs under the SLR model.
Unless otherwise indicated, all summations will be over i = 1,...,n.

Theorem 1: $b = S_{xy}/S_{xx}$ is an unbiased estimator of $\beta$.

Proof: First, observe that $S_{xx} = \sum (x_i - \bar{x})^2$ is a constant.

Next, note that
$$S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) = \sum (x_i - \bar{x})y_i - \bar{y}\sum (x_i - \bar{x}),$$
where $\sum (x_i - \bar{x}) = \sum x_i - \bar{x}\sum 1 = n\bar{x} - \bar{x}n = 0$.

Then also, $S_{xx} = \sum (x_i - \bar{x})^2 = \sum (x_i - \bar{x})x_i - \bar{x}\sum (x_i - \bar{x}) = \sum (x_i - \bar{x})x_i - 0$.

Therefore
$$ES_{xy} = \sum (x_i - \bar{x})Ey_i = \sum (x_i - \bar{x})(\alpha + \beta x_i) = \alpha\sum (x_i - \bar{x}) + \beta\sum (x_i - \bar{x})x_i = 0 + \beta S_{xx}.$$

It follows that $Eb = \dfrac{ES_{xy}}{S_{xx}} = \dfrac{\beta S_{xx}}{S_{xx}} = \beta$.

Theorem 2: $a = \bar{y} - b\bar{x}$ is an unbiased estimator of $\alpha$.

Proof: Observe that $E\bar{y} = \dfrac{1}{n}\sum Ey_i = \dfrac{1}{n}\sum (\alpha + \beta x_i) = \dfrac{1}{n}(n\alpha + n\beta\bar{x}) = \alpha + \beta\bar{x}$.

Therefore $Ea = E(\bar{y} - b\bar{x}) = E\bar{y} - \bar{x}Eb = (\alpha + \beta\bar{x}) - \bar{x}\beta = \alpha$ (using Theorem 1).

Note: Theorems 1 and 2 are true under much less restrictive assumptions than those in the SLR model (as we have defined it). These results are true so long as all the error terms $e_1, \ldots, e_n$ have mean 0. These error terms need not be uncorrelated or normal; they don't even need to be identically distributed or have the same variance.


Theorem 3: $Vb = \sigma^2/S_{xx}$.

Proof: First, $VS_{xy} = V\sum (x_i - \bar{x})y_i = \sum (x_i - \bar{x})^2\,Vy_i = S_{xx}\sigma^2$.

Therefore $Vb = V\left( \dfrac{S_{xy}}{S_{xx}} \right) = \dfrac{S_{xx}\sigma^2}{S_{xx}^2} = \dfrac{\sigma^2}{S_{xx}}$.

Theorem 4: $C(\bar{y}, b) = 0$.

Proof:
$$C(\bar{y}, b) = C\left( \frac{1}{n}\sum_{i=1}^{n} y_i,\; \frac{1}{S_{xx}}\sum_{j=1}^{n} (x_j - \bar{x})y_j \right) = \frac{1}{nS_{xx}}\sum_{i=1}^{n}\sum_{j=1}^{n} (x_j - \bar{x})\,C(y_i, y_j).$$
Now, $C(y_i, y_j) = 0$ for all $i \neq j$, and $C(y_i, y_j) = Vy_i = \sigma^2$ if $i = j$.

Therefore $C(\bar{y}, b) = \dfrac{1}{nS_{xx}}\sum_{i=1}^{n} (x_i - \bar{x})\sigma^2 = \dfrac{\sigma^2}{nS_{xx}} \times 0 = 0$.

Theorem 5: $Va = \dfrac{\sigma^2\sum x_i^2}{nS_{xx}}$.

Proof: First note that $V\bar{y} = \dfrac{1}{n^2}\sum Vy_i = \dfrac{1}{n^2}\,n\sigma^2 = \dfrac{\sigma^2}{n}$.

So, by Theorems 3 and 4,
$$Va = V(\bar{y} - b\bar{x}) = V\bar{y} + \bar{x}^2 Vb - 2\bar{x}\,C(\bar{y}, b) = \frac{\sigma^2}{n} + \bar{x}^2\frac{\sigma^2}{S_{xx}} - 2\bar{x}\times 0 = \sigma^2\left( \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right)$$
$$= \frac{\sigma^2(S_{xx} + n\bar{x}^2)}{nS_{xx}} = \frac{\sigma^2\left( (\sum x_i^2 - n\bar{x}^2) + n\bar{x}^2 \right)}{nS_{xx}} = \frac{\sigma^2\sum x_i^2}{nS_{xx}}.$$

Theorem 6: $C(a, b) = -\dfrac{\bar{x}\sigma^2}{S_{xx}}$.

Proof: $C(a, b) = C(\bar{y} - b\bar{x}, b) = C(\bar{y}, b) - \bar{x}\,C(b, b) = 0 - \bar{x}\,Vb = -\dfrac{\bar{x}\sigma^2}{S_{xx}}$.
S xx

Theorem 7: Let $\theta = u + v\alpha + w\beta$, where u, v and w are finite constants. (Thus, $\theta$ is any linear combination of the regression parameters.)

Then: (i) an unbiased estimate of $\theta$ is $\hat{\theta} = u + va + wb$;

(ii) $V\hat{\theta} = \dfrac{\sigma^2}{S_{xx}}\left( \dfrac{v^2}{n}\sum x_i^2 + w^2 - 2vw\bar{x} \right)$.

Proof: (i) $E\hat{\theta} = u + vEa + wEb = u + v\alpha + w\beta = \theta$.

(ii)
$$V\hat{\theta} = v^2 Va + w^2 Vb + 2vw\,C(a, b) = v^2\frac{\sigma^2\sum x_i^2}{nS_{xx}} + w^2\frac{\sigma^2}{S_{xx}} + 2vw\left( -\frac{\bar{x}\sigma^2}{S_{xx}} \right) = \frac{\sigma^2}{S_{xx}}\left( \frac{v^2}{n}\sum x_i^2 + w^2 - 2vw\bar{x} \right).$$

Note: Theorems 3 to 7 are true under much less restrictive assumptions than all of those in the SLR model (as we have defined it). These results are true so long as the error terms $e_1, \ldots, e_n$ are uncorrelated and have the same variance $\sigma^2$. The error terms need not be independent or normal; they don't even need to be identically distributed.
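As an aside (not part of the notes), the following R sketch checks Theorems 1, 2, 3, 5 and 6 by simulation, using the covariate values from Example 1 together with hypothetical 'true' values alpha = 2, beta = 1 and sigma = 0.5.

# Monte Carlo check of the means, variances and covariance of a and b
set.seed(1)
alpha = 2; beta = 1; sig = 0.5                     # hypothetical true values
x = c(0, 0.5, 1, 1.5, 2, 2.5, 3); n = length(x)
Sxx = sum((x - mean(x))^2)
sims = replicate(20000, {
  y = alpha + beta*x + rnorm(n, 0, sig)            # data generated from the SLR model
  b = sum((x - mean(x))*y)/Sxx
  c(a = mean(y) - b*mean(x), b = b) })
rowMeans(sims)                                     # close to (alpha, beta): Theorems 1, 2
c(var(sims["b",]), sig^2/Sxx)                      # Theorem 3
c(var(sims["a",]), sig^2*sum(x^2)/(n*Sxx))         # Theorem 5
c(cov(sims["a",], sims["b",]), -mean(x)*sig^2/Sxx) # Theorem 6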


Making inferences under the SLR model


Under the assumption in the SLR model that $e_1, \ldots, e_n \sim \text{iid } N(0, \sigma^2)$, it is true that:
$$\frac{a - \alpha}{\sqrt{Va}} \sim N(0,1), \qquad \frac{b - \beta}{\sqrt{Vb}} \sim N(0,1) \qquad \text{and} \qquad \frac{\hat{\theta} - \theta}{\sqrt{V\hat{\theta}}} \sim N(0,1),$$
where a, b, $\hat{\theta}$, $Va$, $Vb$ and $V\hat{\theta}$ are as above. This follows because each of a, b, and so also $\hat{\theta}$, is a linear combination of the normally distributed data values $y_1, \ldots, y_n$. These results allow us to perform inference on the regression parameters $\alpha$ and $\beta$, as well as on any linear combination of the parameters having the form $\theta = u + v\alpha + w\beta$.

For example, an exact $1-\gamma$ CI for $\beta$ is $\left( b \pm z_{\gamma/2}\sqrt{Vb} \right)$, where $Vb = \sigma^2/S_{xx}$, and an exact p-value for testing $H_0: \beta = 0$ versus $H_1: \beta \neq 0$ is $2P\left( Z > \left| \dfrac{b - 0}{\sqrt{Vb}} \right| \right)$.

As another example, suppose that we are interested in $\mu = \alpha + x\beta$, the mean (or mathematical expectation) of a y-value with some specified covariate value x. Now, $\mu$ is just a special case of the general linear combination $\theta = u + v\alpha + w\beta$ (with u = 0, v = 1 and w = x). Therefore, an unbiased estimate of $\mu$ is
$$\hat{\mu} = 0 + 1a + xb = a + xb,$$
and also, by Theorem 7,
$$V\hat{\mu} = \frac{\sigma^2}{S_{xx}}\left( \frac{v^2}{n}\sum x_i^2 + w^2 - 2vw\bar{x} \right) = \frac{\sigma^2}{S_{xx}}\left( \frac{1}{n}\sum x_i^2 + x^2 - 2x\bar{x} \right) = \frac{\sigma^2}{S_{xx}}\left( \frac{S_{xx} + n\bar{x}^2}{n} + x^2 - 2x\bar{x} \right) = \sigma^2\left( \frac{1}{n} + \frac{(x - \bar{x})^2}{S_{xx}} \right).$$

It follows that an exact $1-\gamma$ CI for $\mu$ is $\left( \hat{\mu} \pm z_{\gamma/2}\sqrt{V\hat{\mu}} \right)$, and an exact p-value for testing $H_0: \mu = \mu_0$ versus $H_1: \mu \neq \mu_0$ is $2P\left( Z > \left| \dfrac{\hat{\mu} - \mu_0}{\sqrt{V\hat{\mu}}} \right| \right)$.


Example 3

Suppose that the data in Example 1 follow the SLR model with normal errors and standard deviation 0.15. Estimate the slope parameter and the mean of a y-value with covariate 2.2. For each quantity, report a point estimate and a suitable 95% CI.

Solution

Here (as in Example 2):
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1}{7}(0 + 0.5 + 1 + 1.5 + 2 + 2.5 + 3) = 1.5, \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i = \frac{1}{7}(2 + 3.1 + 3 + 3.8 + 4.1 + 4.3 + 6) = 3.757$$
$$\sum_{i=1}^{n} x_i^2 = 0^2 + 0.5^2 + \cdots + 3^2 = 22.75, \qquad \sum_{i=1}^{n} x_i y_i = 0 \cdot 2 + 0.5 \cdot 3.1 + \cdots + 3 \cdot 6 = 47.2$$
$$S_{xx} = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2 = 7, \qquad S_{xy} = \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y} = 7.75, \qquad b = \frac{S_{xy}}{S_{xx}} = 1.107$$
$$\sigma = 0.15, \qquad Vb = \frac{\sigma^2}{S_{xx}} = 0.003214.$$

Thus, we estimate the slope parameter by b = 1.107, and a 95% CI for $\beta$ is
$$\left( b \pm z_{\gamma/2}\sqrt{Vb} \right) = (0.996,\ 1.218) \quad \text{(using } z_{\gamma/2} = z_{0.025} = 1.96\text{)}.$$

Next, we wish to estimate $\mu = \alpha + x\beta$, where x = 2.2. To this end:
$$a = \bar{y} - b\bar{x} = 2.096, \qquad \hat{\mu} = a + xb = 4.532, \qquad V\hat{\mu} = \sigma^2\left( \frac{1}{n} + \frac{(x - \bar{x})^2}{S_{xx}} \right) = 0.004789.$$

Thus, we estimate the mean as $\hat{\mu}$ = 4.532, and a 95% CI for $\mu$ is
$$\left( \hat{\mu} \pm z_{\gamma/2}\sqrt{V\hat{\mu}} \right) = (4.397,\ 4.668).$$


R Code

options(digits=4); sig=0.15; xval = 2.2


x = c(0, 0.5, 1, 1.5, 2, 2.5, 3); y = c(2.0, 3.1, 3.0, 3.8, 4.1, 4.3, 6.0)
xbar=mean(x); sumx2=sum(x^2); ybar=mean(y); sumxy = sum(x*y); n=length(y)
c(xbar,sumx2,ybar,sumxy) # 1.500 22.750 3.757 47.200
Sxx=sumx2-n*xbar^2; Sxy=sumxy-n*xbar*ybar; c(Sxx, Sxy) # 7.00 7.75
b=Sxy/Sxx; Vb=sig^2/Sxx; CIbeta=b+c(-1,1)*qnorm(0.975)*sqrt(Vb)
c(b, Vb, CIbeta) # 1.107143 0.003214 0.996023 1.218262
a =ybar-b*xbar; muhat = a + xval*b
Vmuhat = sig^2 * ( 1/n + (xval-xbar)^2 / Sxx )
CImu = muhat + c(-1,1)*qnorm(0.975)*sqrt(Vmuhat)
c(a, muhat, Vmuhat, CImu) # 2.096429 4.532143 0.004789 4.396504 4.667782

Prediction

We have already seen how to perform inference on $\mu = \alpha + x\beta$, the mean of a y-value with covariate value x. This quantity may also be thought of as the average of a hypothetically infinite number of independent 'new' y-values, all with covariate x.

But what if we wish to perform inference on just a single such value itself (and not on its mean or expectation)?

To this end, we may write the new independent single value of interest as
$$y = \alpha + x\beta + e,$$
where $e \sim N(0, \sigma^2)$ is an error term which is independent of $e_1, \ldots, e_n$. Now, Ee = 0, and so we may estimate the new value $y = \alpha + x\beta + e$ by $\hat{y} = a + bx$. Notably, this is exactly the same as the estimate $\hat{\mu} = a + xb$ of $\mu = \alpha + x\beta$.

To construct an interval estimate for $y = \alpha + x\beta + e$, we consider the error in estimation (or prediction), $\hat{y} - y$, whose mean is zero and which has variance
$$V(\hat{y} - y) = V\{(a + bx) - (\alpha + x\beta + e)\} = V\{(a + bx) - e\} = V(a + bx) + Ve - 2C(a + bx, e).$$

Now, a and b are functions of $y_1, \ldots, y_n$, which are independent of the new error term, e. Therefore $C(a + bx, e) = 0$, and so
$$V(\hat{y} - y) = V\hat{\mu} + Ve - 2 \times 0 = \sigma^2\left( \frac{1}{n} + \frac{(x - \bar{x})^2}{S_{xx}} \right) + \sigma^2 = \sigma^2\left( 1 + \frac{1}{n} + \frac{(x - \bar{x})^2}{S_{xx}} \right).$$

Since $\hat{y} - y$ is a linear combination of normal random variables, we now have that
$$\frac{\hat{y} - y}{\sqrt{V(\hat{y} - y)}} \sim N(0,1),$$
from which it follows that an exact $1-\gamma$ prediction interval (PI) for the new value y is
$$\hat{y} \pm z_{\gamma/2}\sqrt{V(\hat{y} - y)} = a + bx \pm z_{\gamma/2}\,\sigma\sqrt{1 + \frac{1}{n} + \frac{(x - \bar{x})^2}{S_{xx}}}.$$

Note that this interval for $y = \alpha + x\beta + e$ is somewhat wider than the exact $1-\gamma$ CI for $\mu = \alpha + x\beta$, namely
$$\hat{\mu} \pm z_{\gamma/2}\sqrt{V\hat{\mu}} = a + bx \pm z_{\gamma/2}\,\sigma\sqrt{\frac{1}{n} + \frac{(x - \bar{x})^2}{S_{xx}}}.$$

This makes sense, since there is obviously more variability associated with a single value y than with the average of a hypothetically infinite number of such values. As an exercise, the reader may wish to find an exact $1-\gamma$ PI for the average of m independent y-values all having covariate x. (This should be 'between' the $1-\gamma$ CI for $\mu$ and the $1-\gamma$ PI for y, and it should converge to the CI as m tends to infinity.)


The case of unknown variance

Consider the SLR model above (with $e_1, \ldots, e_n \sim \text{iid } N(0, \sigma^2)$), but now suppose the normal variance $\sigma^2$ is unknown. In that case, an important result is that the quantity
$$s^2 = \frac{SSE}{n-2} = \frac{1}{n-2}\sum_{i=1}^{n} \hat{e}_i^2 = \frac{1}{n-2}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \frac{1}{n-2}\sum_{i=1}^{n} (y_i - a - bx_i)^2$$
is unbiased and consistent for $\sigma^2$. See Note 2 below for a proof that $Es^2 = \sigma^2$. Some other important results in that case are that
$$\frac{(n-2)s^2}{\sigma^2} \sim \chi^2(n-2)$$
and that $s^2$ is independent of both a and b. The proof of these results is omitted.

Note 1: Here, $s^2$ is not the same as the sample variance, $s_y^2 = \dfrac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2$.

Note 2: The following is a proof that $Es^2 = \sigma^2$.

First observe that $E\hat{e}_i = Ey_i - Ea - x_i Eb = (\alpha + \beta x_i) - \alpha - x_i\beta = 0$.

Therefore, $E\hat{e}_i^2 = V\hat{e}_i = Vy_i + Va + x_i^2 Vb - 2C(y_i, a) - 2x_i C(y_i, b) + 2x_i C(a, b)$.

Now, $C(y_i, b) = C\left( y_i,\; \dfrac{1}{S_{xx}}\sum_{j=1}^{n} y_j(x_j - \bar{x}) \right) = C\left( y_i,\; \dfrac{1}{S_{xx}}\,y_i(x_i - \bar{x}) \right) = \dfrac{(x_i - \bar{x})\sigma^2}{S_{xx}}$.

Also, $C(y_i, a) = C(y_i, \bar{y} - \bar{x}b) = C\left( y_i,\; \dfrac{1}{n}\sum_{j=1}^{n} y_j - \bar{x}\,\dfrac{1}{S_{xx}}\sum_{j=1}^{n} y_j(x_j - \bar{x}) \right) = C\left( y_i,\; \dfrac{1}{n}y_i - \bar{x}\,\dfrac{x_i - \bar{x}}{S_{xx}}\,y_i \right) = \left( \dfrac{1}{n} - \bar{x}\,\dfrac{x_i - \bar{x}}{S_{xx}} \right)\sigma^2$.

Therefore, using previous results, we have that
$$E\hat{e}_i^2 = \sigma^2 + \frac{\sigma^2\sum_{j=1}^{n} x_j^2}{nS_{xx}} + x_i^2\frac{\sigma^2}{S_{xx}} - 2\left( \frac{1}{n} - \bar{x}\,\frac{x_i - \bar{x}}{S_{xx}} \right)\sigma^2 - 2x_i\,\frac{(x_i - \bar{x})\sigma^2}{S_{xx}} + 2x_i\left( -\frac{\bar{x}\sigma^2}{S_{xx}} \right)$$
$$= \frac{\sigma^2}{S_{xx}}\left( S_{xx} + \frac{1}{n}\sum_{j=1}^{n} x_j^2 + x_i^2 - \frac{2}{n}S_{xx} + 2\bar{x}(x_i - \bar{x}) - 2x_i(x_i - \bar{x}) - 2x_i\bar{x} \right)$$
$$= \frac{\sigma^2}{S_{xx}}\left( S_{xx} - \frac{2}{n}S_{xx} + \frac{1}{n}\sum_{j=1}^{n} x_j^2 - x_i^2 + 2\bar{x}x_i - 2\bar{x}^2 \right).$$

So
$$E(SSE) = \sum_{i=1}^{n} E\hat{e}_i^2 = \frac{\sigma^2}{S_{xx}}\left( nS_{xx} - 2S_{xx} + \sum_{j=1}^{n} x_j^2 - \sum_{i=1}^{n} x_i^2 + 2\bar{x}\,n\bar{x} - 2n\bar{x}^2 \right) = (n-2)\sigma^2,$$
and so it follows that $Es^2 = E\left( \dfrac{SSE}{n-2} \right) = \sigma^2$, as required.
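As a quick numerical illustration (not part of the notes) of the result just proved, the following R sketch simulates many data sets from the SLR model with hypothetical true values alpha = 2, beta = 1 and sigma = 0.5 and confirms that the average of $s^2 = SSE/(n-2)$ is close to $\sigma^2$.

# Simulation check that s^2 = SSE/(n-2) is unbiased for sigma^2
set.seed(2)
alpha = 2; beta = 1; sig = 0.5                     # hypothetical true values
x = c(0, 0.5, 1, 1.5, 2, 2.5, 3); n = length(x)
s2 = replicate(20000, {
  y = alpha + beta*x + rnorm(n, 0, sig)
  sum(residuals(lm(y ~ x))^2)/(n - 2) })           # s^2 for each simulated data set
c(mean(s2), sig^2)                                 # the two values should be close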

The above results can be used to show that:
$$\frac{a - \alpha}{\sqrt{\hat{V}a}} \sim t(n-2), \qquad \frac{b - \beta}{\sqrt{\hat{V}b}} \sim t(n-2) \qquad \text{and} \qquad \frac{\hat{\theta} - \theta}{\sqrt{\hat{V}\hat{\theta}}} \sim t(n-2),$$
where $\hat{V}a = \dfrac{s^2\sum x_i^2}{nS_{xx}}$, $\hat{V}b = \dfrac{s^2}{S_{xx}}$ and $\hat{V}\hat{\theta} = \dfrac{s^2}{S_{xx}}\left( \dfrac{v^2}{n}\sum x_i^2 + w^2 - 2vw\bar{x} \right)$
(i.e. where $\hat{V}a$, $\hat{V}b$ and $\hat{V}\hat{\theta}$ are the same as $Va$, $Vb$ and $V\hat{\theta}$ with $\sigma^2$ replaced by $s^2$). (Keep in mind that $\theta = u + v\alpha + w\beta$ and $\hat{\theta} = u + va + wb$.)

On the basis of these facts, inference regarding $\alpha$, $\beta$ and $\theta$ can proceed exactly as before, but with $\sigma$ and $z_{\gamma/2}$ everywhere changed to s and $t_{\gamma/2}(n-2)$, respectively.

For example, an exact $1-\gamma$ CI for $\beta$ is $\left( b \pm t_{\gamma/2}(n-2)\sqrt{\hat{V}b} \right)$, where $\hat{V}b = s^2/S_{xx}$, and an exact p-value for testing $H_0: \beta = 0$ vs $H_1: \beta \neq 0$ is $2P\left( T_{n-2} > \left| \dfrac{b - 0}{\sqrt{\hat{V}b}} \right| \right)$.

Also, an exact $1-\gamma$ PI for a single new value y with covariate value x is
$$\hat{y} \pm t_{\gamma/2}(n-2)\sqrt{\hat{V}(\hat{y} - y)} = a + bx \pm t_{\gamma/2}(n-2)\, s\sqrt{1 + \frac{1}{n} + \frac{(x - \bar{x})^2}{S_{xx}}}.$$

Example 4

Consider Example 1. Assuming that the data follow the SLR model with normal errors, but with the normal variance $\sigma^2$ unknown, find a 95% CI for the quantity $\mu = \alpha + x\beta$, where x = 2.2, and also a 95% PI for a new y-value with covariate x.

Solution

Using results from previous examples, we have that:
$$\hat{\mu} = \hat{y} = a + xb = 2.096 + 2.2 \times 1.107 = 4.532 \quad \text{(same as before)}$$
$$s^2 = \frac{SSE}{n-2} = \frac{0.9568}{5} = 0.1914$$
$$\hat{V}\hat{\mu} = s^2\left( \frac{1}{n} + \frac{(x - \bar{x})^2}{S_{xx}} \right) = 0.04073$$
$$t_{\gamma/2}(n-2) = t_{0.025}(5) = 2.571.$$

So a 95% CI for $\mu$ is $\left( \hat{\mu} \pm t_{\gamma/2}(n-2)\sqrt{\hat{V}\hat{\mu}} \right) = (4.013,\ 5.051)$.

Also, $\hat{V}(\hat{y} - y) = s^2\left( 1 + \dfrac{1}{n} + \dfrac{(x - \bar{x})^2}{S_{xx}} \right) = 0.2321$.

Therefore, a 95% PI for y is $\left( \hat{y} \pm t_{\gamma/2}(n-2)\sqrt{\hat{V}(\hat{y} - y)} \right) = (3.294,\ 5.771)$.

Note that the 95% PI has the same centre as, but is wider than, the 95% CI.

Below is a figure showing:

- the observed data pairs in the sample, namely $(x_1, y_1), \ldots, (x_n, y_n)$
- the estimated regression line, $\hat{y} = a + bx$, where a = 2.096 and b = 1.107
- the estimate of $\mu = \alpha + 2.2\beta$, namely $\hat{\mu} = a + 2.2b = 4.532$
- the 95% CI for $\mu = \alpha + 2.2\beta$, namely (4.013, 5.051)
- the estimate and 95% CI for $\mu = \alpha + x\beta$ for all x over the range $0 \leq x \leq 3$
- the 95% PI for $y = \alpha + 2.2\beta + e$, namely (3.294, 5.771)
- the 95% PI for $y = \alpha + x\beta + e$ for all x over the range $0 \leq x \leq 3$.

Note that one line not shown in the figure is the unknown 'true' regression line given by the equation $Ey = \alpha + \beta x$. In most situations, this will never be known exactly.

[Figure: 'Inference on mu = alpha + x*beta and y = mu + e as functions of x', showing the observed values at x = 0, 0.5, 1, 1.5, 2, 2.5, 3, the fitted regression line, the estimate and 95% CI for mu at x = 2.2, the 95% prediction interval for y at x = 2.2, and the 95% CI and PI as functions of x; axes: x (kilograms of fertiliser), y (tonnes of wheat).]


R Code

options(digits=4); x = c(0, 0.5, 1, 1.5, 2, 2.5, 3); y = c(2.0, 3.1, 3.0, 3.8, 4.1, 4.3, 6.0)
xbar=mean(x); sumx2=sum(x^2); ybar=mean(y); sumxy = sum(x*y); n=length(y)
c(xbar,sumx2,ybar,sumxy) # 1.500 22.750 3.757 47.200
Sxx=sumx2-n*xbar^2; Sxy=sumxy-n*xbar*ybar; c(Sxx, Sxy) # 7.00 7.75
b=Sxy/Sxx; a =ybar-b*xbar; c(a,b) # 2.096 1.107
yhat=a+b*x; ehat=y-yhat; rbind(yhat, ehat)
SSE = sum(ehat^2); SSE # 0.9568
xval=2.2; muhat=a+xval*b; muhat # 4.532
s2=SSE/(n-2); c(s2, sqrt(s2)) # 0.1914 0.4374
Vhatmuhat=s2*(1/n + (xval-xbar)^2/Sxx); Vhatmuhat # 0.04073
tval=qt(0.975,n-2); tval # 2.571
CI = muhat + c(-1,1)*tval*sqrt(Vhatmuhat); CI # 4.013 5.051

ypred=muhat
Vhatypred=s2*(1+1/n + (xval-xbar)^2/Sxx); Vhatypred # 0.2321
PI = ypred + c(-1,1)*tval*sqrt(Vhatypred); PI # 3.294 5.771

X11(w=8,h=5)
plot(x,y,xlim=c(0,3),ylim=c(0,8),
main="Inference on mu = alpha + x*beta and y = mu + e as functions of x",
xlab="x (kilograms of fertiliser)",ylab="y (tonnes of wheat)",pch=16,cex=1.2)
abline(a,b,lwd=2);
points(rep(xval,3),c(muhat,CI),lwd=2,cex=1.2)
points(rep(xval,2),PI,pch=2,lwd=2,cex=1.2)

legend(0,8,c("Observed values at x = 0,0.5,1,1.5,2,2.5,3",


"Estimate and 95% CI for mu at x = 2.2",
" 95% prediction interval for y at x = 2.2"),
pch=c(16,1,2), pt.lwd=c(1,2,2),pt.cex=c(1.2,1.2,1.2))

xvec=seq(0,3,0.01); J=length(xvec); CILBvec= rep(NA,J); CIUBvec= rep(NA,J)
for(j in 1:J){ xvalue=xvec[j]; muhatvalue=a+xvalue*b
CI = muhatvalue + c(-1,1)*tval*sqrt(s2*(1/n + (xvalue-xbar)^2/Sxx))
CILBvec[j]=CI[1]; CIUBvec[j]=CI[2] }
lines(xvec,CILBvec); lines(xvec,CIUBvec)

PILBvec= rep(NA,J); PIUBvec= rep(NA,J)


for(j in 1:J){ xvalue=xvec[j]; muhatvalue=a+xvalue*b
PI = muhatvalue + c(-1,1)*tval*sqrt(s2*(1+1/n + (xvalue-xbar)^2/Sxx))
PILBvec[j]=PI[1]; PIUBvec[j]=PI[2] }
lines(xvec, PILBvec,lty=2); lines(xvec, PIUBvec,lty=2)

legend(1.6,2.4,c("Estimate of mu as a function of x",


"95% CI for mu as a function of x",
"95% PI for y as a function of x"),lty=c(1,1,2), lwd=c(2,1,1))

The case of non-normality

Suppose that the sample values are not normally distributed. Then all of the above inferences are still valid, with an understanding that they are only approximate, but that the approximations improve as the sample size n increases, and that the inferences are asymptotically exact, meaning exact in the limit as n tends to infinity. This is true even if $\sigma$ is unknown and replaced in the relevant formulae by s, and/or if $t_{\gamma/2}(n-2)$ and $T_{n-2}$ are replaced by $z_{\gamma/2}$ and Z. It is true even if the error terms $e_1, \ldots, e_n$ are not independent and/or not identically distributed. All that is required is that these terms are uncorrelated with mean zero and finite variance $\sigma^2$. These facts follow by the central limit theorem (CLT) and other results in probability theory. As a very rough rule of thumb, the approximations may be considered 'good' if n is 'large', meaning $n \geq 30$ (say).


The relationship between simple linear regression and correlation analysis


We have seen that simple linear regression can be a useful tool for exploring the relationship between two variables x and y, where y is a random variable and x is a non-random variable (the covariate). The strength of that relationship is reflected by the least squares estimate $b = \dfrac{S_{xy}}{S_{xx}}$ of the simple linear regression slope parameter $\beta$.

Another useful tool for exploring the relationship between two variables x and y, but in the context where both are random variables, is correlation analysis. In particular, the strength of that relationship is reflected by the sample correlation,
$$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}},$$
which provides a consistent estimate of the correlation, $\rho = \mathrm{Corr}(x, y) = \dfrac{C(x, y)}{\sqrt{Vx\,Vy}}$.

We see that there is a relationship between the two estimates, namely $r = b\sqrt{\dfrac{S_{xx}}{S_{yy}}}$.

Another relationship between simple linear regression and correlation analysis is that
$$r^2 = \frac{S_{yy} - SSE}{S_{yy}} = 1 - \frac{SSE}{S_{yy}},$$
where $SSE = \sum_{i=1}^{n} (y_i - a - bx_i)^2$ and $S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2$. (For a proof of this, see Section 11.8 in the textbook.)

Here, $r^2$ may be interpreted as the proportion of the 'total' variation in the y-values (around their average, $\bar{y}$) which is 'explained' by the x-variable in a simple linear regression (the remaining, 'unexplained' variation being that of the y-values around their fitted values, $a + bx_i$). We call $r^2$ the coefficient of determination. The idea of analysing the 'total' variation in a regression (such as of y on x in our examples) and attributing part of that variation to some variable (like x) (or set of variables) is called analysis of variance (ANOVA). This is a topic in more advanced courses on regression.
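As a final optional check (not in the original notes), the relationships above can be verified in R for the Example 1 data:

# Check of r = b*sqrt(Sxx/Syy) and r^2 = 1 - SSE/Syy for the Example 1 data
x = c(0, 0.5, 1, 1.5, 2, 2.5, 3); y = c(2.0, 3.1, 3.0, 3.8, 4.1, 4.3, 6.0)
fit = lm(y ~ x); SSE = sum(residuals(fit)^2); Syy = sum((y - mean(y))^2)
Sxx = sum((x - mean(x))^2); b = coef(fit)[2]
r = cor(x, y)
c(r, b*sqrt(Sxx/Syy))                        # the two expressions for r agree
c(r^2, 1 - SSE/Syy, summary(fit)$r.squared)  # all three agree (coefficient of determination)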
