Chapter Three Statistical Inference in Simple Linear Regression Model
Tasew T. (PhD)
2.7.1 Hypothesis testing
The interest of an econometrician is not only in obtaining the
estimator (point estimation) but also in using it
to make inferences about the true parameter (interval
estimation).
We want to make inferences about the likely population values from the
regression parameters.
Since sampling errors are inevitable in all estimates, we must apply tests of significance to measure the size of the error and to determine the degree of confidence we can place in the validity of these estimates.
Hypothesis testing
• To test the significance of the OLS parameter estimates
we need the following:
– Unbiased estimator of the variance of Ui
– Variance of the parameter estimators
– The assumption of normality of the distribution of the error term
• Each error term is assumed to be normally distributed. Since the estimators are linear functions of the error terms, they are also normally distributed.
i. Variance of the parameter estimators
• In order to test for the statistical significance of the parameter estimates of the regression, the variances of $\hat b_0$ and $\hat b_1$ are required:
$$\operatorname{Var}(\hat b_0) = \frac{\sigma^2 \sum X_i^2}{n \sum x_i^2} \qquad\qquad \operatorname{Var}(\hat b_1) = \frac{\sigma^2}{\sum x_i^2}$$

where $x_i = X_i - \bar{X}$ denotes the deviation of $X_i$ from its mean.
ii. Unbiased estimator of the variance of Ui
$$\hat{\sigma}^2 = \frac{\sum \hat{u}_i^2}{n-2} = \frac{\text{RSS}}{n-2}$$
• where RSS is the residual sum of squares $\sum \hat u_i^2$ (the quantity called ESS in Section 2.8), and $n-2 = n-k$ is the degrees of freedom, $k$ being the number of estimated parameters (here $k = 2$).
• Unbiased estimates of the variances of $\hat b_0$ and $\hat b_1$ are then obtained by replacing $\sigma^2$ with $\hat\sigma^2$:

$$\widehat{\operatorname{Var}}(\hat b_0) = \frac{\hat\sigma^2 \sum X_i^2}{n \sum x_i^2} \qquad\qquad \widehat{\operatorname{Var}}(\hat b_1) = \frac{\hat\sigma^2}{\sum x_i^2}$$

• The standard errors $\operatorname{se}(\hat b_0)$ and $\operatorname{se}(\hat b_1)$ are the square roots of these estimated variances.
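To make the formulas concrete, here is a minimal Python sketch with made-up data and illustrative variable names (nothing here comes from the text) that computes the OLS estimates, the unbiased variance estimate $\hat\sigma^2$, and the standard errors:

```python
import numpy as np

# Hypothetical sample data (illustrative only)
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
Y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9, 8.2, 8.8])
n = len(Y)

x = X - X.mean()                           # deviations from the mean
b1 = (x * Y).sum() / (x ** 2).sum()        # OLS slope estimate
b0 = Y.mean() - b1 * X.mean()              # OLS intercept estimate

u_hat = Y - (b0 + b1 * X)                  # residuals
sigma2_hat = (u_hat ** 2).sum() / (n - 2)  # unbiased estimate of Var(u)

var_b1 = sigma2_hat / (x ** 2).sum()
var_b0 = sigma2_hat * (X ** 2).sum() / (n * (x ** 2).sum())
se_b0, se_b1 = np.sqrt(var_b0), np.sqrt(var_b1)
print(b0, b1, se_b0, se_b1)
```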
iii. Normality of the distribution of error term
• The error terms are assumed to follow $u_i \sim N(0, \sigma^2)$. Since the OLS estimators are linear functions of the $u_i$, they too are normally distributed; and because $\sigma^2$ must be replaced by its estimate $\hat\sigma^2$, the ratio $(\hat b_1 - b_1)/\operatorname{se}(\hat b_1)$ follows the Student's t distribution with $n-2$ degrees of freedom, which underlies the tests that follow.
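The role of the normality assumption can be illustrated by simulation. The sketch below (population values assumed purely for illustration) draws repeated samples with normal errors and confirms that the slope estimates scatter around the true value with variance close to $\sigma^2/\sum x_i^2$:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.linspace(0.0, 10.0, 30)
b0_true, b1_true, sigma = 1.0, 2.0, 1.5   # assumed population values

x = X - X.mean()
slopes = []
for _ in range(5000):
    u = rng.normal(0.0, sigma, size=X.size)  # normally distributed errors
    Y = b0_true + b1_true * X + u
    slopes.append((x * Y).sum() / (x ** 2).sum())

# Mean of the estimates is near b1_true; their variance is near
# sigma^2 / sum(x^2), and their histogram is approximately normal.
print(np.mean(slopes), np.var(slopes), sigma ** 2 / (x ** 2).sum())
```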
Methods of testing hypothesis
• The statistical significance of the parameter estimates of the regression can be tested using various methods. The most common ones are:
– Student’s t-test/tests of significance approach
– Confidence-interval approach
In the language of statistics, the stated hypothesis is
known as the null hypothesis and is denoted by the symbol
Ho.
The null hypothesis is usually tested against an alternative
hypothesis (or maintained hypothesis) denoted by H1 or HA
i. Student’s t-test/tests of significance approach
Steps in testing hypothesis:
• Step 1: State the null and alternative hypotheses.
• Step 2: Compute the test statistic, $t = (\hat b_1 - b_1^*)/\operatorname{se}(\hat b_1)$.
• Step 3: Choose the level of significance, α.
• Step 4: Check whether it is a one-tail or a two-tail test.
– If the inequality sign in the alternative hypothesis is ≠, it implies a two-tail test; divide the chosen level of significance by two and decide the critical value of t.
– But if the inequality sign is either > or <, it indicates a one-tail test and there is no need to divide the chosen level of significance by two to obtain the critical value of t.
Steps in testing hypothesis (cont.):
• Step 5: Obtain the critical value of t, tc. We need some tabulated distribution with which to compare the estimated test statistic.
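In practice the tabulated critical values can be obtained with scipy.stats rather than a printed t-table; a minimal sketch (α and the degrees of freedom are chosen for illustration):

```python
from scipy import stats

alpha, df = 0.05, 8   # significance level and n - 2 degrees of freedom

t_two_tail = stats.t.ppf(1 - alpha / 2, df)  # two-tail: split alpha in half
t_one_tail = stats.t.ppf(1 - alpha, df)      # one-tail: use alpha as is
print(t_two_tail, t_one_tail)                # about 2.306 and 1.860
```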
Two-Tailed t-test
1. State the hypothesis:
$$H_0: b_1 = b_1^* \qquad H_1: b_1 \neq b_1^*$$
2. Compute:
$$t = \frac{\hat b_1 - b_1^*}{\operatorname{se}(\hat b_1)}$$
Two-Tailed t-test (cont.)
3. Choose the level of significance α and find the critical value $t_c = t_{\alpha/2,\,n-2}$.
4. Compare t and tc.
5. Decision Rule: If $t > t_c$ or $t < -t_c$ (equivalently, $|t| > |t_c|$), then reject H0.
H0 is not rejected when $b_1^*$ falls inside the acceptance region
$$\hat b_1 - t_{\alpha/2,\,n-2}\operatorname{se}(\hat b_1) \;\le\; b_1 \;\le\; \hat b_1 + t_{\alpha/2,\,n-2}\operatorname{se}(\hat b_1)$$
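Putting steps 1-5 together, here is a sketch of the two-tailed test and the acceptance region, assuming the estimate, the hypothesized value, and the standard error are already in hand (all numbers illustrative, not from the text):

```python
from scipy import stats

b1_hat, b1_star, se_b1, n = 0.7240, 0.0, 0.0700, 10   # illustrative values
alpha = 0.05
df = n - 2

t = (b1_hat - b1_star) / se_b1        # step 2: test statistic
t_c = stats.t.ppf(1 - alpha / 2, df)  # step 3: two-tail critical value

# Steps 4-5: compare |t| with t_c
print("reject H0" if abs(t) > t_c else "do not reject H0")

# Acceptance region (confidence interval) for b1
print(b1_hat - t_c * se_b1, b1_hat + t_c * se_b1)
```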
One-tailed t-test
Step 1: State the hypothesis:
$$H_0: b_1 \le b_1^* \qquad H_1: b_1 > b_1^*$$
$$\big(\text{or } H_0: b_1 \ge b_1^* \qquad H_1: b_1 < b_1^*\big)$$
Step 2: Compute the t-value:
$$t = \frac{\hat b_1 - b_1^*}{\operatorname{se}(\hat b_1)}$$
Step 3: Choose the level of significance, often denoted by α.
Step 4: Check the t-table for the critical t value, $t_c = t_{\alpha,\,n-2}$.
One-tailed t-test decision rule
Decision Rule
Step 5:
– Right-tail test ($H_1: b_1 > b_1^*$): if $t > t_c$, reject H0; if $t \le t_c$, do not reject H0.
– Left-tail test ($H_1: b_1 < b_1^*$): if $t < -t_c$, reject H0; otherwise do not reject H0.
Student’s t-test/tests of significance approach: example
• Variances and standard errors of $\hat b_0$ and $\hat b_1$ are computed from the sample data as above; dividing each estimate by its standard error gives the calculated t-values.
• t-values (tcal): t0 = 13.7; t1 = 16.6
• Since both t0 and t1 exceed tc = 2.306, the critical value with 8 df at the 5% level of significance (from the t-table), we conclude that both $\hat b_0$ and $\hat b_1$ are statistically significant at the 5% level.
One-Tailed t-test
Example: Gujarati (2003), p. 123
Given $\hat b_1 = 0.5091$, $n = 10$, $\operatorname{se}(\hat b_1) = 0.0357$, we could postulate that:
$$H_0: b_1 \le 0.3 \qquad H_1: b_1 > 0.3$$
1. Compute:
$$t = \frac{\hat b_1 - b_1^*}{\operatorname{se}(\hat b_1)} = \frac{0.5091 - 0.3}{0.0357} = \frac{0.2091}{0.0357} = 5.857$$
One-Tailed t-test (cont.)
2. With $n - 2 = 8$ degrees of freedom, the one-tail critical value at the 5% level is $t_c = t_{0.05,\,8} = 1.860$ (from the t-table). Since $t = 5.857 > 1.860$, we reject H0 and conclude that $b_1$ is significantly greater than 0.3.
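The example can be checked numerically with scipy.stats, using only the figures quoted above:

```python
from scipy import stats

# Figures from the Gujarati (2003) example above
b1_hat, b1_star, se_b1, n = 0.5091, 0.3, 0.0357, 10

t = (b1_hat - b1_star) / se_b1   # 5.857
t_c = stats.t.ppf(0.95, n - 2)   # one-tail 5% critical value, 1.860
print(t, t_c, "reject H0" if t > t_c else "do not reject H0")
```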
2.8 Test of Model Adequacy
Is the estimated equation a useful one? To answer this, an
objective measure of some sort is desirable.
The total variation in the dependent variable Y is given by:
$$\sum (y_i - \bar{y})^2$$
Our goal is to partition this variation into two:
one that accounts for variation due to the regression equation
(explained portion) and
another that is associated with the unexplained portion of the
model.
In other words, the total sum of squares (TSS) is decomposed
into regression (explained) sum of squares (RSS) and error
(residual or unexplained) sum of squares (ESS).
Explained and Unexplained Variation
where:
$\bar y$ = average/mean value of the dependent variable
$y_i$ = observed values of the dependent variable
$\hat y_i$ = estimated value of y for the given x value
Explained and Unexplained Variation (cont.)
$$\underbrace{\sum (y_i - \bar y)^2}_{\text{TSS}} \;=\; \underbrace{\sum (\hat y_i - \bar y)^2}_{\text{RSS}} \;+\; \underbrace{\sum (y_i - \hat y_i)^2}_{\text{ESS}}$$
TSS = total sum of squares: measures the variation of the $y_i$ values around their mean $\bar y$.
RSS = regression sum of squares: explained variation attributable to the relationship between x and y; it measures the amount of the total variability in the observed values of Y that is accounted for by the linear relationship between the observed values of X and Y.
ESS = error (residual or unexplained) sum of squares: a measure of the dispersion of the observed values of Y about the regression line; variation attributable to factors other than the relationship between x and y.
Explained and Unexplained Variation (continued)
[Figure: for each observation, the total deviation $(y_i - \bar y)$ splits into an explained part $(\hat y_i - \bar y)$ and a residual part $(y_i - \hat y_i)$; summing squares gives $\text{TSS} = \sum(y_i - \bar y)^2$, $\text{RSS} = \sum(\hat y_i - \bar y)^2$, $\text{ESS} = \sum(y_i - \hat y_i)^2$.]
Coefficient of determination (R2)
• If a regression equation does a good job of describing the relationship between two variables, the explained sum of squares should constitute a large proportion of the total sum of squares. Thus, it would be of interest to determine the magnitude of this proportion by computing the ratio of the explained sum of squares to the total sum of squares.
• This proportion is called the sample coefficient of determination, R2:
$$R^2 = \frac{\text{RSS}}{\text{TSS}} = 1 - \frac{\text{ESS}}{\text{TSS}}$$
• The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable; it is also called R-squared.
• If R2 is close to 1, x is a ‘good’ explanatory variable; if R2 is close to 0, it explains very little of the variation.
• R2 is taken as a measure of ‘goodness of fit’.
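The decomposition and R2 can be verified numerically. This sketch reuses the made-up data from the earlier OLS sketch (illustrative only) and follows the text’s naming convention: RSS for the explained and ESS for the residual sum of squares.

```python
import numpy as np

# Hypothetical data and OLS fit, as in the earlier sketch
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
Y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9, 8.2, 8.8])
x = X - X.mean()
b1 = (x * Y).sum() / (x ** 2).sum()
b0 = Y.mean() - b1 * X.mean()
Y_hat = b0 + b1 * X

TSS = ((Y - Y.mean()) ** 2).sum()      # total variation
RSS = ((Y_hat - Y.mean()) ** 2).sum()  # explained (regression) variation
ESS = ((Y - Y_hat) ** 2).sum()         # unexplained (residual) variation

print(np.isclose(TSS, RSS + ESS))      # TSS = RSS + ESS
print(RSS / TSS, 1 - ESS / TSS)        # two equivalent forms of R^2
```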
• In the two-variable model, R2 equals the square of the sample correlation coefficient between X and Y: R2 = r2.
The largest value that R2 can assume is 1 (in which case all
observations fall on the regression line), and the smallest it can
assume is zero.
A low value of R2 is an indication that:
– X is a poor explanatory variable, in the sense that variation in X leaves Y unaffected, or
– while X is a relevant variable, its influence on Y is weak as compared to some other variables that are omitted from the regression equation, or
– the regression equation is misspecified (for example, an exponential relationship might be more appropriate).
Thus, a small value of R2 casts doubt about the usefulness of the
regression equation. We do not, however, pass final judgment on the
equation until it has been subjected to an objective statistical test.
Testing the significance of R2
Such a test is accomplished by means of analysis of variance
(ANOVA) which enables us to test the significance of R2 (i.e., the
adequacy of the linear regression model).
To test for the significance of R 2, we compare the variance ratio
with the critical value from the F distribution with k-1 and (n-2)
degrees of freedom in the numerator and denominator,
respectively, for a given significance level α
$$F_{cal} = \frac{\text{RSS}/(k-1)}{\text{ESS}/(n-2)} = \text{calculated variance ratio}$$
Decision: If the calculated variance ratio exceeds the tabulated value, that is, if $F_{cal} > F_\alpha(1,\,n-2)$, we conclude that R2 is significant (or that the linear regression model is adequate).
The F test is designed to test the significance of all variables or a set of
variables in a regression model.
In the two-variable model, however, it is used to test the explanatory power of a single variable (X) and, at the same time, is equivalent to the test of significance of R2.
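A minimal sketch of the variance-ratio test, with illustrative RSS and ESS values (not taken from the text):

```python
from scipy import stats

# Illustrative values; k = 2 parameters in the two-variable model
RSS, ESS, n, k, alpha = 84.0, 6.0, 10, 2, 0.05

F_cal = (RSS / (k - 1)) / (ESS / (n - 2))   # calculated variance ratio
F_c = stats.f.ppf(1 - alpha, k - 1, n - 2)  # critical value F_alpha(1, n-2)

print(F_cal, F_c,
      "R2 significant" if F_cal > F_c else "R2 not significant")
```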