Lecture 3
Fall 2022
Applied Statistical Methods, T. S. Lu 2
\sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2    (1)
or
Total variation = Variation due to regression + Unexplained residual variation
Equation (1) is called the fundamental equation of regression
analysis; it holds for any general regression model.
It can be shown that the mean-square residual and mean-square
regression terms are statistically independent of one another. Thus,
if H0 : β1 = 0 is true, the ratio of these terms represents the ratio
of two independent estimates of the same variance σ 2. Under the
normality and independence assumptions about the Y ’s, such a ratio
has the F distribution, and this F statistic can be used to test the
hypothesis H0: "No significant straight-line relationship of Y on X"
(i.e., H0 : β1 = 0).
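In symbols, for the straight-line case the ratio described above is

```latex
F = \frac{\text{MS regression}}{\text{MS residual}}
  = \frac{\sum_{i=1}^{n}(\hat{Y}_i - \bar{Y})^2 / 1}
         {\sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2 / (n-2)}
  \;\sim\; F_{1,\,n-2} \quad \text{under } H_0 : \beta_1 = 0 .
```

Large values of F favor rejecting H0, since a real straight-line relationship inflates the regression mean square relative to σ².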
Y = β0 + β1X1 + β2X2 + · · · + βkXk + ε,
Y ∼ N(µ_{Y|X1,X2,...,Xk}, σ²)
1. Evaluate assumptions
3. Test hypotheses
in this course and instead just assume that we obtain the results from
a computer program.
The criterion used here is the same as for simple linear regression:
we use the least-squares approach, minimizing the sum of squared
distances between the observed responses and those predicted by the
fitted model:
SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{1i} - \hat{\beta}_2 X_{2i} - \cdots - \hat{\beta}_k X_{ki})^2
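As an illustrative sketch (the data and variable names below are made up, not from the lecture), the least-squares criterion can be carried out numerically with NumPy's `lstsq`:

```python
# Minimal least-squares sketch: fit Y on two predictors plus an intercept.
import numpy as np

# Small synthetic data set: n = 5 observations, k = 2 predictors.
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
Y  = np.array([3.1, 3.9, 7.2, 7.8, 10.9])

# Design matrix with an intercept column; lstsq minimizes SSE directly.
X = np.column_stack([np.ones_like(X1), X1, X2])
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)

Y_hat = X @ beta_hat
SSE = np.sum((Y - Y_hat) ** 2)   # the quantity minimized above
print(beta_hat.round(3), round(float(SSE), 4))
```

At the minimum the residuals are orthogonal to every column of the design matrix, which is how the normal equations characterize the least-squares solution.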
Source       d.f.            SS                    MS       F      R2
Regression   k = 3           SSY − SSE = 693.06    231.02   9.47   0.7803
Residual     n − k − 1 = 8   SSE = 195.19          24.40
Total        n − 1 = 11      SSY = 888.25
SSY = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 = 888.25: the total sum of squares, representing the total variability in the Y observations before accounting for the predictors.
R2 = (SSY − SSE)/SSY (between 0 and 1): measuring how well the
fitted model containing the variables HGT, AGE, and AGE2 predicts
the dependent variable WGT.
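Every entry of this ANOVA table can be reproduced from SSY and SSE alone; a quick arithmetic check (using the values quoted above, with n = 12 and k = 3):

```python
# Reproduce the ANOVA table quantities from SSY and SSE
# (values taken from the table above; n = 12 observations, k = 3 predictors).
n, k = 12, 3
SSY = 888.25   # total sum of squares
SSE = 195.19   # residual sum of squares

SSR = SSY - SSE           # regression sum of squares
MSR = SSR / k             # mean square regression
MSE = SSE / (n - k - 1)   # mean square residual
F = MSR / MSE             # overall F statistic
R2 = SSR / SSY            # coefficient of determination

print(round(SSR, 2), round(MSR, 2), round(MSE, 2), round(F, 2), round(R2, 4))
```

The printed values match the table: SSR = 693.06, MSR = 231.02, MSE = 24.40, F = 9.47, and R2 = 0.7803.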
agesq = age*age;
label WGT="Weight"
HGT="Height"
agesq="Age Squared";
run;
*above is called the data step;
[SAS output: Analysis of Variance table (Source, DF, Sum of Squares, Mean Square, F Value, Pr > F) and Parameter Estimates table (Variable, DF, Parameter Estimate, Standard Error, t Value, Pr > |t|); numeric values not shown]
H0 : βi = 0
HA : βi ≠ 0

t_{obs} = \frac{\hat{\beta}_i - \beta_{i0}}{s_{\hat{\beta}_i}}
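For instance, with made-up numbers (an assumed coefficient estimate and standard error, not from the lecture), the statistic is computed as:

```python
# Hypothetical illustration: testing H0: beta_i = 0 for one coefficient.
beta_hat = 1.20   # assumed estimated coefficient (illustrative only)
se_beta = 0.15    # assumed standard error of that estimate
beta_i0 = 0.0     # null-hypothesis value under H0

t_obs = (beta_hat - beta_i0) / se_beta
print(round(t_obs, 2))
# compare |t_obs| with the critical value t_{n-k-1, 1-alpha/2}
```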
Remarks:
1. Overall test. Does the entire set of independent variables (or the
fitted model itself) contribute significantly to the prediction of Y ?
Y = β0 + β1X1 + β2X2 + ε
and
Y = β0 + β1X1 + ε
Y = β0 + β1X1 + β2X2 + · · · + βkXk + ε
= 693.06 − 692.82 = 0.24
We compute the partial F statistic

F(X^* \mid X_1, \ldots, X_k) = \frac{SS(X^* \mid X_1, \ldots, X_k)/1}{\text{MS residual (full model)}}

for testing the null hypothesis that the addition of X^* to a model containing X1, X2, . . . , Xk does not significantly improve the prediction of Y. For our example, the partial F statistic is
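Assuming the full model here is the three-variable model tabulated earlier (MS residual = 24.40) and the reduced model has regression SS = 692.82, the computation can be sketched as:

```python
# Sketch of the partial F computation from the sums of squares above.
# Assumption: full model = WGT ~ HGT + AGE + AGE2 (regression SS = 693.06,
# residual SS = 195.19 on 8 d.f.); reduced model regression SS = 692.82.
extra_SS = 693.06 - 692.82        # SS added by the extra variable (1 d.f.)
MS_res_full = 195.19 / 8          # residual mean square of the full model
partial_F = (extra_SS / 1) / MS_res_full
print(round(extra_SS, 2), round(partial_F, 4))
```

A partial F this small (far below any F_{1,8} critical value) indicates that the added variable contributes essentially nothing beyond the variables already in the model.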