
Chapter Three
Statistical Inference in the Simple Linear Regression Model

• Hypothesis testing: tests of significance of parameter estimates
  – Student’s t-test
• Test of goodness of fit / model adequacy

Tasew T. (PhD)
2.7.1 Hypothesis testing
 The interest of an econometrician is not only in obtaining the estimator (point estimation) but also in using it to make inferences about the true parameter (interval estimation).
 We want to make inferences about the likely population values from the regression parameters.
 Since sampling errors are inevitable in all estimates, we must apply tests of significance in order to measure the size of the error and determine the degree of confidence in the validity of these estimates.
Hypothesis testing
• To test the significance of the OLS parameter estimates we need the following:
  – An unbiased estimator of the variance of Ui
  – The variance of the parameter estimators
  – The assumption of normality of the distribution of the error term
• Each error term is assumed to be normally distributed. Since the estimators are linear functions of the error term, they are also normally distributed.
i. Variance of the parameter estimators
• In order to test for the statistical significance of the parameter estimates of the regression, the variances of b0 and b1 are required:

  Var(b0) = σ² · ΣXᵢ² / (n · Σxᵢ²)

  Var(b1) = σ² / Σxᵢ²

  where xᵢ = Xᵢ − X̄ denotes deviations from the mean.
ii. Unbiased estimator of the variance of Ui
• But the population (error) variance, σ², is not known. It has to be estimated.
• In practice, since σ² is unknown, we use the residual variance as an (unbiased) estimate of σ²:

  σ̂² = Σûᵢ² / (n − 2) = RSS / (n − 2)

• where RSS is the residual sum of squares and n − 2 is the degrees of freedom (n observations minus k = 2 estimated parameters).
• Unbiased estimates of the variances of b0 and b1 are then given respectively by the formulas on the next slide.
ii. Unbiased estimator …

  var̂(b0) = σ̂² · ΣXᵢ² / (n · Σxᵢ²),   se(b0) = √var̂(b0)

  var̂(b1) = σ̂² / Σxᵢ²,                se(b1) = √var̂(b1)
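The estimates above can be computed directly. The sketch below, using a small hypothetical data set (the numbers are illustrative, not from the text), follows the slide formulas: OLS slope and intercept, residual variance σ̂² = RSS/(n − 2), then the variances and standard errors of b0 and b1.

```python
import numpy as np

# Hypothetical sample -- illustrative values only
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

n = len(X)
x = X - X.mean()                      # deviations from the mean
y = Y - Y.mean()

b1 = (x * y).sum() / (x ** 2).sum()   # OLS slope
b0 = Y.mean() - b1 * X.mean()         # OLS intercept

resid = Y - (b0 + b1 * X)
sigma2_hat = (resid ** 2).sum() / (n - 2)   # unbiased estimate of sigma^2

# Variances and standard errors of the parameter estimates
var_b0 = sigma2_hat * (X ** 2).sum() / (n * (x ** 2).sum())
var_b1 = sigma2_hat / (x ** 2).sum()
se_b0, se_b1 = np.sqrt(var_b0), np.sqrt(var_b1)
```

Dividing the residual sum of squares by n − 2 rather than n is what makes σ̂² unbiased: two degrees of freedom are used up estimating b0 and b1.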
iii. Normality of the distribution of the error term

• Since ui is normally distributed, Yi, and therefore b0 and b1, are also normally distributed, so we can use the t-distribution with n − k (here n − 2) degrees of freedom to test hypotheses about, and construct confidence intervals for, b0 and b1.
Methods of testing hypotheses
• The statistical significance of the parameter estimates of the regression can be tested in various ways. The most common are:
  – Student’s t-test / test-of-significance approach
  – The confidence-interval approach
 In the language of statistics, the stated hypothesis is known as the null hypothesis and is denoted by the symbol H0.
 The null hypothesis is usually tested against an alternative hypothesis (or maintained hypothesis), denoted by H1 or HA.
i. Student’s t-test / test-of-significance approach

 It is a procedure by which sample results are used to verify the truth or falsity of a null hypothesis.
 If the true β1 is specified under the null hypothesis, the t-value can readily be computed from the available sample and serves as a test statistic.
 A statistic is said to be statistically significant if the value of the test statistic lies in the critical region. In such a case the null hypothesis is rejected.
 A statistic is said to be insignificant if the value of the test statistic lies in the “acceptance” region; the null is then not rejected.
 Note: how ‘large’ a t-value must be to count as evidence against the null hypothesis depends on the degrees of freedom.
Student’s t-test / test-of-significance approach

• Usually we test whether coefficients are significantly different from zero or not.
• The most common null hypothesis used to test for a relationship between variables is the zero null hypothesis.
• We can formulate the hypotheses for the slope coefficient as follows:
  H0: β = 0
  H1: β ≠ 0
• In order to test this hypothesis we need to form the test statistic relevant for this case. We know that the sample estimator is normally distributed with a mean and a standard error.
Steps in testing hypotheses:
• To undertake the above test we follow these steps:

• Step 1: State the hypotheses.

• Step 2: Compute tcal (the t-value).

• Step 3: Choose the significance level, often denoted by α.

• Step 4: Check whether the test is one-tailed or two-tailed.
  – If the inequality sign in the alternative hypothesis is ≠, the test is two-tailed: divide the chosen level of significance by two and decide the critical value of t.
  – But if the inequality sign is either > or <, the test is one-tailed and there is no need to divide the chosen level of significance by two to obtain the critical value.
Steps in testing hypotheses:
• Step 5: Obtain the critical value of t, tc. We need a tabulated distribution with which to compare the computed test statistic.

• Step 6: Compare tcal (the computed value of t) and tc (the critical value of t).

• Decision rule:
• If |tcal| > |tc|, reject H0 and accept H1.
  – The conclusion is that b1 is statistically significant.
• If |tcal| < |tc|, do not reject H0.
  – The conclusion is that b1 is statistically insignificant, i.e., not statistically different from zero.
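The six steps above can be sketched in a few lines. The estimate, standard error, and sample size below are hypothetical placeholders; scipy’s t-distribution supplies the critical value that would otherwise come from a t-table.

```python
from scipy import stats

# Hypothetical estimates -- illustrative numbers, not from the text
b1, se_b1 = 0.75, 0.25
n = 20                        # sample size
df = n - 2                    # degrees of freedom in simple regression

# Steps 1-2: H0: beta1 = 0 vs H1: beta1 != 0; compute the t-value
t_cal = (b1 - 0.0) / se_b1

# Steps 3-5: alpha = 0.05; two-tailed, so put alpha/2 in each tail
alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df)

# Step 6: decision rule -- reject H0 if |tcal| > tc
reject_H0 = abs(t_cal) > t_crit
```

With these placeholder numbers, tcal = 3.0 exceeds the two-tailed 5% critical value (about 2.10 for 18 df), so the slope would be judged statistically significant.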
Two-tailed t-test

1. State the hypotheses:
   H0: β1 = β1*
   H1: β1 ≠ β1*

2. Compute:
   t = (b̂1 − β1*) / se(b̂1)

3. Check the t-table for the critical value tc = t(α/2, n−2).
Two-tailed t-test (cont.)

4. Compare t and tc.

5. Decision rule: reject H0 if t > tc or −t < −tc, i.e., if |t| > |tc|.

[Figure: acceptance region between b̂1 − t(α/2, n−2)·se(b̂1) and b̂1 + t(α/2, n−2)·se(b̂1), with the two rejection regions for H0 in the tails.]
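The interval pictured in the figure is exactly a confidence interval, which gives the equivalent confidence-interval form of the two-tailed test: reject H0: β1 = β1* whenever β1* falls outside b̂1 ± t(α/2, n−2)·se(b̂1). A sketch, with a hypothetical estimate and standard error:

```python
from scipy import stats

# Hypothetical slope estimate and standard error -- illustrative values
b1_hat, se_b1 = 0.509, 0.036
n = 10
alpha = 0.05
t_c = stats.t.ppf(1 - alpha / 2, n - 2)   # t(alpha/2, n-2), here df = 8

# 95% confidence interval: b1_hat +/- t_c * se(b1_hat)
lower = b1_hat - t_c * se_b1
upper = b1_hat + t_c * se_b1

# Two-tailed test of H0: beta1 = 0 -- reject if 0 lies outside the interval
beta1_star = 0.0
reject_H0 = not (lower <= beta1_star <= upper)
```

This is the confidence-interval approach listed earlier: the two procedures always agree at the same α.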
One-tailed t-test

Step 1: State the hypotheses:
   H0: β1 ≤ β1*   (or H0: β1 ≥ β1*)
   H1: β1 > β1*   (or H1: β1 < β1*)

Step 2: Compute the t-value:
   t* = (b̂1 − β1*) / se(b̂1)

Step 3: Choose the significance level, often denoted by α.

Step 4: Check the t-table for the critical value tc = t(α, n−2).

Step 5: Compare tc and t*.
One-tailed t-test decision rule

Right-tail test (H1: β1 > β1*):
   If t* > tc  ==> reject H0
   If t* < tc  ==> do not reject H0

Left-tail test (H1: β1 < β1*):
   If t* < −tc ==> reject H0
   If t* > −tc ==> do not reject H0
The Zero Null Hypothesis and the 2-t Rule of Thumb

 The most common null hypothesis used to test for a relationship between variables is the zero null hypothesis.
 If the number of degrees of freedom is 20 or more and the level of significance is set at 0.05, the null hypothesis β = 0 can be rejected if the computed t-value exceeds 2 in absolute value.
 p-value: the smallest level of significance, or the smallest value of α, at which the null hypothesis can be rejected.
 In statistical software output, the p-value corresponds to the value labelled “Sig.”
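The rule of thumb and the exact p-value can be compared side by side. The computed t-value and degrees of freedom below are hypothetical; scipy’s survival function gives the exact two-tailed p-value.

```python
from scipy import stats

# Hypothetical computed t-value and degrees of freedom (df >= 20)
t_cal, df = 2.4, 25

# Exact two-tailed p-value: the smallest alpha at which H0 is rejected
p_value = 2 * stats.t.sf(abs(t_cal), df)

rule_of_thumb = abs(t_cal) > 2     # 2-t rule: quick check at alpha = 0.05
exact = p_value < 0.05             # exact check against alpha = 0.05
```

For |t| = 2.4 with 25 df both checks reject H0, which is the point of the rule: once df reaches about 20, the 5% two-tailed critical value is close enough to 2 that |t| > 2 is a safe shortcut.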
Student’s t-test / test-of-significance approach

• From the previous example, find:
  – the variance of the residuals,
  – var(b0), var(b1), se(b0), se(b1),
  – Hypothesis testing: formulate the hypotheses and test whether b0 and b1 are statistically significant at the 5% level.
• Note: show all the necessary steps and calculations required to test the statistical significance of the parameter estimates.
Table: steps to compute the variance of the residuals and of the parameters

[Table omitted in this extract.]
Variances and standard errors of b0 and b1:

[Calculations omitted in this extract.]
• t-values (tcal): t0 = 13.7; t1 = 16.6
• Since both t0 and t1 exceed tc = 2.306 with 8 df at the 5% level of significance (from the t-table), we conclude that both b0 and b1 are statistically significant at the 5% level.
One-tailed t-test
Example: Gujarati (2003), p. 123
Given b1 = 0.5091, n = 10, se(b1) = 0.0357, we could postulate that:
   H0: β1 ≤ 0.3
   H1: β1 > 0.3
1. Compute:
   t = (b̂1 − β1*) / se(b̂1)
     = (0.5091 − 0.3) / 0.0357 = 0.2091 / 0.0357 = 5.857
One-tailed t-test (cont.)

2. Check the t-table for the critical value at α = 0.05 with n − 2 = 8 df:
   tc = t(0.05, 8) = 1.860

3. Compare t and the critical t:
   t = 5.857 > tc = 1.860

   ∴ reject H0
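The arithmetic of the Gujarati example can be checked directly, using the numbers given in the text and scipy in place of the t-table:

```python
from scipy import stats

# Values from the Gujarati (2003, p. 123) example in the text
b1_hat, se_b1, n = 0.5091, 0.0357, 10
beta1_star = 0.3

# Steps 1-2: H0: beta1 <= 0.3 vs H1: beta1 > 0.3; compute the t-value
t_cal = (b1_hat - beta1_star) / se_b1       # 0.2091 / 0.0357

# Steps 3-4: one-tailed critical value at alpha = 0.05 with n - 2 = 8 df
t_crit = stats.t.ppf(0.95, n - 2)

# Step 5: right-tail decision rule
reject_H0 = t_cal > t_crit
```

Since the test is one-tailed, all of α = 0.05 goes in the upper tail, which is why `ppf(0.95, 8)` is used rather than `ppf(0.975, 8)`.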
Critical values of the t distribution

[t-table omitted in this extract.]
2.8 Test of Model Adequacy
 Is the estimated equation a useful one? To answer this, an objective measure of some sort is desirable.
 The total variation in the dependent variable Y is given by:
   Σ(yᵢ − ȳ)²
 Our goal is to partition this variation into two parts:
 one that accounts for variation due to the regression equation (the explained portion), and
 another that is associated with the unexplained portion of the model.
 In other words, the total sum of squares (TSS) is decomposed into the regression (explained) sum of squares (RSS) and the error (residual or unexplained) sum of squares (ESS).
Explained and Unexplained Variation

• Total variation is made up of two parts:

  TSS = ESS + RSS
  Σ(yᵢ − ȳ)² = Σ(yᵢ − ŷᵢ)² + Σ(ŷᵢ − ȳ)²

  where:
  ȳ = average/mean value of the dependent variable
  yᵢ = observed values of the dependent variable
  ŷᵢ = estimated value of y for the given x value
Explained and Unexplained Variation

 TSS = total sum of squares
  Measures the variation of the yᵢ values around their mean ȳ.
 ESS = error (residual or unexplained) sum of squares
  A measure of the dispersion of the observed values of Y about the regression line.
  Variation attributable to factors other than the relationship between x and y.
 RSS = regression sum of squares
  Explained variation attributable to the relationship between x and y.
  Measures the amount of the total variability in the observed values of Y that is accounted for by the linear relationship between the observed values of X and Y.
Explained and Unexplained Variation (continued)

[Figure: scatter of (xᵢ, yᵢ) with the fitted regression line, showing for one point the decomposition of TSS = Σ(yᵢ − ȳ)² into ESS = Σ(yᵢ − ŷᵢ)² and RSS = Σ(ŷᵢ − ȳ)².]
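The decomposition in the figure can be verified numerically. The data below are hypothetical; the naming follows this chapter’s convention (RSS = regression sum of squares, ESS = error sum of squares), which is the reverse of some other textbooks.

```python
import numpy as np

# Hypothetical data -- illustrative only
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

# Fit the simple OLS line
x = X - X.mean()
b1 = (x * (Y - Y.mean())).sum() / (x ** 2).sum()
b0 = Y.mean() - b1 * X.mean()
Y_hat = b0 + b1 * X

TSS = ((Y - Y.mean()) ** 2).sum()       # total variation
RSS = ((Y_hat - Y.mean()) ** 2).sum()   # regression (explained) SS
ESS = ((Y - Y_hat) ** 2).sum()          # error (unexplained) SS

# For OLS with an intercept, TSS = RSS + ESS holds exactly
```

The identity TSS = RSS + ESS is guaranteed by the OLS normal equations (the residuals are orthogonal to the fitted values), not by the particular data.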
 If a regression equation does a good job of describing the relationship between two variables, the explained sum of squares should constitute a large proportion of the total sum of squares.
 Thus, it is of interest to determine the magnitude of this proportion by computing the ratio of the explained sum of squares to the total sum of squares.
• This proportion is called the sample coefficient of determination, R².
• The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable.
• The coefficient of determination is also called R-squared and is denoted R²:
  R² = RSS/TSS = 1 − (ESS/TSS)
• If R² is close to 1, x is a ‘good’ explanatory variable; if R² is close to 0, it explains very little of the variation.
• R² is taken as a measure of “goodness of fit”.
• R² = r²
• While R² measures the proportion of the variation in the dependent variable y that is explained by the independent variable x, the correlation coefficient r measures the association between the two variables.
• The proportion of the total variation in the dependent variable (Y) that is explained by changes in the independent variable (X), or by the regression line, is equal to R² × 100%.
• The proportion of the total variation in the dependent variable (Y) that is due to factors other than X (for example, excluded variables, chance, etc.) is equal to (1 − R²) × 100%.
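In the two-variable model, the identity R² = r² can be checked numerically. The data below are hypothetical placeholders:

```python
import numpy as np

# Hypothetical data -- illustrative only
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
Y = np.array([1.2, 2.9, 4.1, 5.2, 6.8, 8.1])

x, y = X - X.mean(), Y - Y.mean()
b1 = (x * y).sum() / (x ** 2).sum()     # OLS slope
Y_hat = Y.mean() + b1 * x               # fitted values

R2 = ((Y_hat - Y.mean()) ** 2).sum() / (y ** 2).sum()   # RSS / TSS
r = np.corrcoef(X, Y)[0, 1]                             # correlation coefficient

# In simple linear regression, R^2 equals the square of r
```

Note the identity holds only in the two-variable model; with several regressors, R² is instead the square of the correlation between Y and the fitted values.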
Tests for the coefficient of determination

 The largest value that R² can assume is 1 (in which case all observations fall on the regression line), and the smallest it can assume is zero.
 A low value of R² is an indication that:
  X is a poor explanatory variable, in the sense that variation in X leaves Y unaffected, or
  while X is a relevant variable, its influence on Y is weak compared to some other variables that are omitted from the regression equation, or
  the regression equation is misspecified (for example, an exponential relationship might be more appropriate).
 Thus, a small value of R² casts doubt on the usefulness of the regression equation. We do not, however, pass final judgment on the equation until it has been subjected to an objective statistical test.
Testing the significance of R²
 Such a test is accomplished by means of analysis of variance (ANOVA), which enables us to test the significance of R² (i.e., the adequacy of the linear regression model).
 To test the significance of R², we compare the variance ratio with the critical value from the F distribution with k − 1 and n − 2 degrees of freedom in the numerator and denominator, respectively, for a given significance level α:
  Fcal = (RSS/(k − 1)) / (ESS/(n − 2)) = calculated variance ratio
 Decision: If the calculated variance ratio exceeds the tabulated value, that is, if Fcal > Fα(1, n−2), we conclude that R² is significant (or that the linear regression model is adequate).
 The F-test is designed to test the significance of all variables, or a set of variables, in a regression model.
 In the two-variable model, however, it is used to test the explanatory power of a single variable (X), and is at the same time equivalent to the test of significance of R².
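The F-test above can be sketched with hypothetical sums of squares (following this chapter’s convention that RSS is the regression sum of squares and ESS the error sum of squares), with scipy supplying the tabulated critical value:

```python
from scipy import stats

# Hypothetical sums of squares -- illustrative values only
RSS, ESS = 80.0, 20.0     # regression SS and error SS (text's convention)
n, k = 12, 2              # observations and estimated parameters

# Calculated variance ratio: (RSS/(k-1)) / (ESS/(n-2))
F_cal = (RSS / (k - 1)) / (ESS / (n - 2))

# Tabulated critical value F_alpha(k-1, n-2) at alpha = 0.05
alpha = 0.05
F_crit = stats.f.ppf(1 - alpha, k - 1, n - 2)

# Decision: R^2 is significant if F_cal exceeds the critical value
R2_significant = F_cal > F_crit
```

With these placeholder values, Fcal = 40 far exceeds F₀.₀₅(1, 10) ≈ 4.96, so R² would be judged significant and the linear model adequate.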
Example

• Find the coefficient of determination for the corn-fertilizer example.
• Thus the regression equation explains about 97% of the total variation in corn output.
• The remaining 3% is attributed to factors captured by the error term.