Lecture 3
Fall 2022
Applied Statistical Methods, T. S. Lu 2
\sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2    (1)
or
Total variation = Variation due to regression + Unexplained residual variation
Equation (1) is called the fundamental equation of regression
analysis; it holds for any general regression model.
It can be shown that the mean-square residual and mean-square
regression terms are statistically independent of one another. Thus,
if H0 : β1 = 0 is true, the ratio of these terms represents the ratio
of two independent estimates of the same variance σ 2. Under the
normality and independence assumptions about the Y ’s, such a ratio
has the F distribution, and this F statistic can be used to test the
hypothesis H0: "No significant straight-line relationship of Y on X"
(i.e., H0 : β1 = 0).
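In symbols, for the straight-line case the ratio described above is

```latex
F = \frac{\text{MS regression}}{\text{MS residual}}
  = \frac{\sum_{i=1}^{n}(\hat{Y}_i - \bar{Y})^2 / 1}
         {\sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2 / (n-2)}
  \;\sim\; F_{1,\,n-2} \quad \text{under } H_0 : \beta_1 = 0 .
```

Large values of F favor rejecting H0, since a real straight-line relationship inflates the regression mean square relative to σ².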
Y = β0 + β1X1 + β2X2 + · · · + βkXk + ε,
Y ∼ N(µ_{Y|X1,X2,...,Xk}, σ²)
1. Evaluate assumptions
3. Test hypotheses
in this course and instead just assume that we obtain the results from
a computer program.
The criterion used here is the same as for simple linear regression:
we use the least-squares approach, minimizing the sum of squared
distances between the observed responses and those predicted by the
fitted model:
SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{1i} - \hat{\beta}_2 X_{2i} - \cdots - \hat{\beta}_k X_{ki})^2
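As an illustrative sketch (the data and variable names below are made up, not from the lecture), the least-squares criterion can be carried out numerically with NumPy's `lstsq`:

```python
# Minimal least-squares sketch: fit Y on two predictors plus an intercept.
import numpy as np

# Small synthetic data set: n = 5 observations, k = 2 predictors.
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
Y  = np.array([3.1, 3.9, 7.2, 7.8, 10.9])

# Design matrix with an intercept column; lstsq minimizes SSE directly.
X = np.column_stack([np.ones_like(X1), X1, X2])
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)

Y_hat = X @ beta_hat
SSE = np.sum((Y - Y_hat) ** 2)   # the quantity minimized above
print(beta_hat.round(3), round(float(SSE), 4))
```

At the minimum the residuals are orthogonal to every column of the design matrix, which is how the normal equations characterize the least-squares solution.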
Source       d.f.            SS                    MS       F      R2
Regression   k = 3           SSY − SSE = 693.06    231.02   9.47   0.7803
Residual     n − k − 1 = 8   SSE = 195.19          24.40
Total        n − 1 = 11      SSY = 888.25
SSY = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 = 888.25: the total sum of squares, representing the total variability in the Y observations before accounting for the predictors.
R2 = (SSY − SSE)/SSY (between 0 and 1): measuring how well the
fitted model containing the variables HGT, AGE, and AGE2 predicts
the dependent variable WGT.
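Every entry of this ANOVA table can be reproduced from SSY and SSE alone; a quick arithmetic check (using the values quoted above, with n = 12 and k = 3):

```python
# Reproduce the ANOVA table quantities from SSY and SSE
# (values taken from the table above; n = 12 observations, k = 3 predictors).
n, k = 12, 3
SSY = 888.25   # total sum of squares
SSE = 195.19   # residual sum of squares

SSR = SSY - SSE           # regression sum of squares
MSR = SSR / k             # mean square regression
MSE = SSE / (n - k - 1)   # mean square residual
F = MSR / MSE             # overall F statistic
R2 = SSR / SSY            # coefficient of determination

print(round(SSR, 2), round(MSR, 2), round(MSE, 2), round(F, 2), round(R2, 4))
```

The printed values match the table: SSR = 693.06, MSR = 231.02, MSE = 24.40, F = 9.47, and R2 = 0.7803.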
agesq = age*age;
label WGT="Weight"
HGT="Height"
agesq="Age Squared";
run;
*above is called the data step;
[SAS output: Analysis of Variance table (Source, DF, Sum of Squares, Mean Square, F Value, Pr > F) and Parameter Estimates table (Variable, DF, Parameter Estimate, Standard Error, t Value, Pr > |t|); numeric values not shown]
H0 : βi = 0
HA : βi ≠ 0

t_{obs} = \frac{\hat{\beta}_i - \beta_{i0}}{s_{\hat{\beta}_i}}
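For instance, with made-up numbers (an assumed coefficient estimate and standard error, not from the lecture), the statistic is computed as:

```python
# Hypothetical illustration: testing H0: beta_i = 0 for one coefficient.
beta_hat = 1.20   # assumed estimated coefficient (illustrative only)
se_beta = 0.15    # assumed standard error of that estimate
beta_i0 = 0.0     # null-hypothesis value under H0

t_obs = (beta_hat - beta_i0) / se_beta
print(round(t_obs, 2))
# compare |t_obs| with the critical value t_{n-k-1, 1-alpha/2}
```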
Remarks:
1. Overall test. Does the entire set of independent variables (or the
fitted model itself) contribute significantly to the prediction of Y ?
Y = β0 + β1X1 + β2X2 + ε
and
Y = β0 + β1X1 + ε
Y = β0 + β1X1 + β2X2 + · · · + βkXk + ε
= 693.06 − 692.82 = 0.24
We compute the partial F statistic

F(X^* \mid X_1, \ldots, X_k) = \frac{SS(X^* \mid X_1, \ldots, X_k)/1}{\text{MS residual (full model)}}

for testing the null hypothesis that the addition of X^* to a model containing X1, X2, . . . , Xk does not significantly improve the prediction of Y. For our example, the partial F statistic is
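Assuming the full model here is the three-variable model tabulated earlier (MS residual = 24.40) and the reduced model has regression SS = 692.82, the computation can be sketched as:

```python
# Sketch of the partial F computation from the sums of squares above.
# Assumption: full model = WGT ~ HGT + AGE + AGE2 (regression SS = 693.06,
# residual SS = 195.19 on 8 d.f.); reduced model regression SS = 692.82.
extra_SS = 693.06 - 692.82        # SS added by the extra variable (1 d.f.)
MS_res_full = 195.19 / 8          # residual mean square of the full model
partial_F = (extra_SS / 1) / MS_res_full
print(round(extra_SS, 2), round(partial_F, 4))
```

A partial F this small (far below any F_{1,8} critical value) indicates that the added variable contributes essentially nothing beyond the variables already in the model.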