Lecture Two (Copy)
Example: SLR (taken from Wooldridge 5th edition)
In a non-linear model, the relationships between variables are curved lines.
What about experience, natural ability, …?
Variation in Y = Explained variation + Unexplained variation
By: Mmaru S.(MA) 30/05/2023
Econometrics 5/30/2023
Technically, Var(X) must be a finite positive number.
This means that X assumes different values in a given sample, but it assumes fixed values in hypothetical repeated samples.
Assumption 9: The regression model is correctly specified
This means that the mathematical form of the model is correctly specified and all important explanatory variables are included in it.
There is no specification bias or error in the model used in empirical analysis.
No matter what the relationship actually is, we have to know how to estimate the parameters (the β).
The most widely known estimation techniques are:
Method of Ordinary Least Squares (OLS)
Method of Moments (MM)
Method of Maximum Likelihood Estimation (ML)
The magnitude of the residuals is the vertical distance between the actual observed points and the estimating line (see the figure below).
The sum of squares of the errors (SSE) is:
$$\mathrm{SSE} = \sum_{i=1}^{n} e_i^2$$
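To make the residuals and their sum of squares concrete, here is a minimal Python sketch. The data and the helper names `ols_fit` and `sse` are hypothetical and chosen only for illustration; the slope and intercept use the usual closed-form OLS expressions for a simple regression.

```python
# Hypothetical illustration data (not taken from the lecture).
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.1, 5.9, 8.2, 9.8]


def ols_fit(x, y):
    """Return (intercept, slope) that minimise the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def sse(x, y, intercept, slope):
    """Sum of squared vertical distances between the observed points and the line."""
    return sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))


a_hat, b_hat = ols_fit(X, Y)
sse_ols = sse(X, Y, a_hat, b_hat)
# Shift the intercept away from its OLS value to compare the two lines.
sse_other = sse(X, Y, a_hat + 0.1, b_hat)
```

Moving the line away from the OLS solution (here by shifting the intercept) raises the SSE, which is what "least squares" means.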
Sometimes the theorem is referred to as the BLUE theorem, i.e. Best Linear Unbiased Estimator. An estimator is called BLUE if it is:
A. Linear: a linear function of a random variable, such as the dependent variable Y.
B. Unbiased: its average or expected value is equal to the true population parameter.
C. Minimum variance: it has the minimum variance in the class of linear and unbiased estimators. An unbiased estimator with the least variance is known as an efficient estimator.
Recall Assumptions of the CLRM:
1. The true model is: $Y_i = \alpha + \beta X_i + \varepsilon_i$
2. The error terms have zero mean: $E(\varepsilon_i) = 0$
3. Homoskedasticity (error terms have constant variance): $\mathrm{Var}(\varepsilon_i) = \sigma^2$ for all i
4. No error autocorrelation (the error terms $\varepsilon_i$ are statistically independent of each other)
5. $X_i$ are deterministic (non-stochastic): $X_i$ and $\varepsilon_j$ are independent for all i, j
The Gauss-Markov Theorem:
Under assumptions (1) to (5) of the CLRM, the OLS estimators $\hat{\alpha}$ and $\hat{\beta}$ (of the intercept and the slope) are Best Linear Unbiased Estimators (BLUE).
The theorem tells us that, of all estimators of $\alpha$ and $\beta$ which are linear and unbiased, the estimators resulting from OLS have the minimum variance; that is, $\hat{\alpha}$ and $\hat{\beta}$ are the best (most efficient) linear unbiased estimators (BLUE) of $\alpha$ and $\beta$.
Note: If some of the assumptions stated above do not hold, then the OLS estimators are no longer BLUE!
Proof of Gauss-Markov Theorem:
Here we will prove that $\hat{\beta}$ is the BLUE of $\beta$. The proof for $\hat{\alpha}$ can be done similarly.
Thus, we can see that $\hat{\beta}$ is a linear estimator, as it can be written as a weighted sum of the individual observations on Y.
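The linearity claim can be checked numerically. The sketch below assumes the standard textbook weights $k_i = x_i / \sum x_i^2$, where $x_i$ is the deviation of $X_i$ from its mean; the symbol `k` and the data are illustrative assumptions, not taken from the slides.

```python
# Hypothetical illustration data (not taken from the lecture).
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.1, 5.9, 8.2, 9.8]

n = len(X)
x_bar = sum(X) / n
y_bar = sum(Y) / n

dev_x = [xi - x_bar for xi in X]      # deviations of X from its mean
s_xx = sum(d * d for d in dev_x)      # sum of squared deviations

# Assumed weights: k_i = x_i / sum(x_i^2). They depend on X only, not on Y.
k = [d / s_xx for d in dev_x]

# Slope written as a weighted sum of the observations on Y.
slope_weighted = sum(ki * yi for ki, yi in zip(k, Y))

# Slope from the usual deviation formula, for comparison.
slope_usual = sum(d * (yi - y_bar) for d, yi in zip(dev_x, Y)) / s_xx

sum_k = sum(k)                                    # should be 0
sum_kx = sum(ki * xi for ki, xi in zip(k, X))     # should be 1
```

The two properties of the weights (they sum to zero, and their products with X sum to one) are the ones normally used in the unbiasedness step of the proof.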
c) To show that $\hat{\beta}$ has the smallest variance out of all linear unbiased estimators of $\beta$.
We have seen above (in proof (a)) that the OLS estimator of $\beta$ can be expressed as:
Taking expectations we have:
A priori criteria are set by economic theory and concern whether the coefficients of the econometric model are consistent with that theory.
Statistical criteria, also known as first order tests, are set by statistical theory and aim to evaluate the statistical reliability of the model.
Econometric criteria refer to whether the assumptions of the econometric model employed in estimating the parameters are fulfilled or not.
The two most commonly used tests in econometrics are:
1. The square of the correlation coefficient ($r^2$), which is used for judging the explanatory power of the linear regression of Y on X or on X's. The square of the correlation coefficient in simple regression is known as the coefficient of determination and is denoted by $R^2$. The coefficient of determination measures the goodness of fit of the regression line to the observed sample values of Y and X.
2. The standard error test of the parameter estimates, which is applied for judging the statistical reliability of the estimates. This test measures the degree of confidence that we may attribute to the estimates.
From remarks (1) and (2) it follows that the cross product term in equation (*) vanishes, and we are left with:
$$\mathrm{TSS} = \mathrm{RSS} + \mathrm{ESS}$$
In other words, the total sum of squares (TSS) is decomposed into the regression (explained) sum of squares (RSS) and the error (residual or unexplained) sum of squares (ESS).
The error (residual or unexplained) sum of squares (ESS) is a measure of the dispersion of the observed values of Y about the regression line. This is computed as:
$$\mathrm{ESS} = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$$
If a regression equation does a good job of describing the relationship between two variables, the explained sum of squares should constitute a large proportion of the total sum of squares. Thus, it is of interest to determine the magnitude of this proportion by computing the ratio of the explained sum of squares to the total sum of squares. This proportion is called the sample coefficient of determination:
$$R^2 = \frac{\mathrm{RSS}}{\mathrm{TSS}}$$
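The decomposition can be verified on a small example. The sketch below follows this lecture's naming (RSS for the regression sum of squares, ESS for the error sum of squares; some textbooks swap the two labels). The data are hypothetical.

```python
# Hypothetical illustration data (not taken from the lecture).
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.1, 5.9, 8.2, 9.8]

n = len(X)
x_bar = sum(X) / n
y_bar = sum(Y) / n

# OLS slope and intercept for the simple regression of Y on X.
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y)) / sum(
    (xi - x_bar) ** 2 for xi in X
)
intercept = y_bar - slope * x_bar
y_hat = [intercept + slope * xi for xi in X]   # fitted values

tss = sum((yi - y_bar) ** 2 for yi in Y)                # total variation in Y
rss = sum((yh - y_bar) ** 2 for yh in y_hat)            # explained by the regression
ess = sum((yi - yh) ** 2 for yi, yh in zip(Y, y_hat))   # left unexplained

r_squared = rss / tss   # sample coefficient of determination
```

Because TSS = RSS + ESS holds for an OLS fit with an intercept, `r_squared` can equivalently be computed as `1 - ess / tss`.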
Note
1) The proportion of total variation in the dependent variable (Y) that is explained by changes in the independent variable (X) or by the regression line is equal to: $R^2 \times 100\%$.
2) The proportion of total variation in the dependent variable (Y) that is due to factors other than X (for example, due to excluded variables, chance, etc.) is equal to: $(1 - R^2) \times 100\%$.
The higher the coefficient of determination, the better the fit. Conversely, the smaller the coefficient of determination, the poorer the fit. That is why the coefficient of determination is used to compare two or more models.
One minus the coefficient of determination is called the coefficient of non-determination, and it gives the proportion of the variation in the dependent variable that remains undetermined or unexplained by the model.
ii) Testing the significance of a given regression coefficient
• The significance level is a measure of how strong the sample evidence must be before the results are declared statistically significant.
• The significance level (α) should be chosen in advance.
• A significance level of 0.05 (that is, α = 5%) is widely used.
• We reject the null hypothesis whenever the P-value is less than the chosen level of α.
• Rejection of the null hypothesis suggests that the data support our research (alternative) hypothesis.
• Moreover, to carry out a t-test, we need to know the standard errors of the individual estimators.
A. The Standard Error Test
This test first establishes the two hypotheses to be tested, commonly known as the null and alternative hypotheses. The null hypothesis states that the sample comes from a population whose parameter is not significantly different from zero, while the alternative hypothesis states that the sample comes from a population whose parameter is significantly different from zero. The two hypotheses are given as follows:
$$H_0: \beta_i = 0 \qquad H_1: \beta_i \neq 0$$
N.B. Standard deviation is the positive square root of the variance, i.e. the standard error of a given coefficient is the positive square root of the variance of the coefficient.
Consider the simple linear regression model
$$Y_i = \beta_0 + \beta_1 X_i + U_i$$
If there is no relationship between X and Y, then the null hypothesis is expressed as:
$$H_0: \beta_1 = 0$$
The alternative hypothesis is that there is a significant relationship between X and Y, that is,
$$H_1: \beta_1 \neq 0$$
In order to reject or not reject the null hypothesis, we calculate the standard errors of the estimates, given by:
$$SE(\hat{\beta}_1) = \sqrt{Var(\hat{\beta}_1)} = \sqrt{\frac{\hat{\sigma}_U^2}{\sum x_i^2}}$$
$$SE(\hat{\beta}_0) = \sqrt{Var(\hat{\beta}_0)} = \sqrt{\frac{\hat{\sigma}_U^2 \sum X_i^2}{n \sum x_i^2}}$$
where $x_i = X_i - \bar{X}$ is the deviation of $X_i$ from its sample mean.
Next, compare the standard errors of the estimates with the numerical values of the estimates and make a decision.
A) If the standard error of the estimate is less than half of the numerical value of the estimate, that is, if $se(\hat{\beta}_i) < \tfrac{1}{2}|\hat{\beta}_i|$, reject the null hypothesis and conclude that the estimate is statistically significant.
B) If the standard error of the estimate is greater than half of the numerical value of the estimate, that is, if $se(\hat{\beta}_i) > \tfrac{1}{2}|\hat{\beta}_i|$, the parameter estimate is not statistically reliable: do not reject (accept) the null hypothesis and conclude that the estimate is not statistically significant.
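The standard error test can be sketched in a few lines of Python. The function name `standard_error_test` and the data are hypothetical, and the error variance is assumed to be estimated by the residual sum of squares divided by n − 2, which this part of the lecture does not state explicitly.

```python
import math


def standard_error_test(x, y):
    """Apply the 'SE less than half the estimate' rule to both OLS coefficients."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)   # sum of squared deviations of X
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    residuals = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    # Assumption: error variance estimated as sum(e^2) / (n - 2).
    sigma2_hat = sum(e ** 2 for e in residuals) / (n - 2)
    se_b1 = math.sqrt(sigma2_hat / s_xx)
    se_b0 = math.sqrt(sigma2_hat * sum(xi ** 2 for xi in x) / (n * s_xx))
    return {
        "b0": b0, "se_b0": se_b0, "b0_significant": se_b0 < abs(b0) / 2,
        "b1": b1, "se_b1": se_b1, "b1_significant": se_b1 < abs(b1) / 2,
    }


# Hypothetical illustration data (not taken from the lecture).
result = standard_error_test([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.1, 5.9, 8.2, 9.8])
```

On these made-up data the slope passes the test while the intercept does not, which shows that the rule is applied to each coefficient separately.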
One-Sample Z-Test
A one-sample z test is used to check if there is a difference between the sample mean and the population mean when the population standard deviation is known.
The formula for the z test statistic is given as follows:
$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$
$\bar{X}$ is the sample mean,
μ is the population mean,
σ is the population standard deviation and
n is the sample size.
The standard normal test or Z-test is outlined as follows:
1. Test the null hypothesis against the alternative hypothesis.
2. Determine the level of significance. It is the probability of committing a type I error, i.e. the probability of rejecting the null hypothesis while it is true. It is common in applied econometrics to use the 5% level of significance.
3. Determine the tabulated value of Z from the table.
4. Make a decision. If $|Z_{cal}| < Z_{tab}$, accept the null hypothesis and conclude that the estimate is not statistically significant, while if $|Z_{cal}| > Z_{tab}$, reject the null hypothesis and conclude that the estimate is statistically significant. Here $Z_{cal}$ is the calculated test statistic and $Z_{tab}$ is the tabulated value.
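The four steps can be traced in a short Python sketch. The numbers are hypothetical; 1.96 is the familiar two-tailed 5% critical value of the standard normal distribution, and the name `one_sample_z` is only illustrative.

```python
import math


def one_sample_z(sample_mean, pop_mean, pop_sd, n):
    """Z statistic for a one-sample test with known population standard deviation."""
    return (sample_mean - pop_mean) / (pop_sd / math.sqrt(n))


# Hypothetical example: sample mean 52, hypothesised mean 50, sigma 5, n = 25.
z_cal = one_sample_z(52.0, 50.0, 5.0, 25)

# Tabulated two-tailed critical value at the 5% level of significance.
z_tab = 1.96

# Decision rule: reject the null hypothesis when |Z_cal| exceeds Z_tab.
reject_null = abs(z_cal) > z_tab
```

With these inputs the standard error of the mean is 5/√25 = 1, so the statistic is simply the difference between the two means.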
Example
1. Suppose that from a sample size n = 20 we estimate the following consumption function, where X is the explanatory variable:
$$C = \underset{(75.5)}{100} + \underset{(0.21)}{0.70}\,X + e$$
Solution:
The values in the brackets are standard errors. We want to test the null hypothesis $H_0: \beta_1 = 0$ against the alternative $H_1: \beta_1 \neq 0$ using the t-test at the 5% level of significance.
Step 1: The t-value for the test statistic is:
$$t^* = \frac{\hat{\beta}_1 - 0}{SE(\hat{\beta}_1)} = \frac{\hat{\beta}_1}{SE(\hat{\beta}_1)} = \frac{0.70}{0.21} = 3.3$$
Step 2: Since the alternative hypothesis ($H_1$) is stated with the inequality sign (≠), it is a two-tail test; hence we use $\alpha/2 = 0.05/2 = 0.025$ to obtain the critical value of t at $\alpha/2 = 0.025$ and 18 degrees of freedom (df), i.e. n − 2 = 20 − 2.
• From the t-table, $t_c$ at the 0.025 level of significance and 18 df is 2.10.
Step 3: Since $t^* = 3.3$ and $t_c = 2.1$, $t^* > t_c$. It implies that $\hat{\beta}_1$ is statistically significant.
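The three steps of this example translate directly into code. The sketch below uses only the figures quoted in the example (slope 0.70, standard error 0.21, n = 20) and takes the critical value 2.10 from the t-table as given, rather than computing it.

```python
# Figures quoted in the example.
beta_hat = 0.70      # estimated slope
se_beta = 0.21       # its standard error
n = 20               # sample size

# Step 1: t statistic under H0: beta = 0.
t_star = (beta_hat - 0) / se_beta

# Step 2: two-tail test at the 5% level, so alpha/2 = 0.025 with n - 2 df.
alpha = 0.05
tail_area = alpha / 2
df = n - 2
t_crit = 2.10        # tabulated value at 0.025 and 18 df, as quoted in the example

# Step 3: reject H0 when |t*| exceeds the critical value.
significant = abs(t_star) > t_crit
```

The variable names are illustrative only; the arithmetic is the same as in the worked solution.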
If both deviations are positive or negative then this will give us a positive value (indicative of the deviations being in the same direction), but if one deviation is positive and one negative then the resulting product will be negative.
Calculating the covariance is a good way to assess whether two variables are related to each other.
A positive covariance indicates that as one variable deviates from the mean, the other variable deviates in the same direction.
On the other hand, a negative covariance indicates that as one variable deviates from the mean (e.g. increases), the other deviates from the mean in the opposite direction (e.g. decreases).
There is, however, one problem with covariance as a measure of the relationship: it depends upon the scales of measurement used. So, covariance is not a standardized measure.
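The scale dependence is easy to demonstrate. The following sketch uses hypothetical height and weight figures and assumes the sample covariance with an n − 1 divisor (the lecture's own formula is not shown in this part).

```python
def covariance(x, y):
    """Sample covariance; the n - 1 divisor is an assumption of this sketch."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)


# Hypothetical data: the same heights expressed in metres and in centimetres.
height_m = [1.5, 1.6, 1.7, 1.8]
height_cm = [h * 100 for h in height_m]
weight_kg = [50.0, 55.0, 65.0, 70.0]

cov_m = covariance(height_m, weight_kg)
cov_cm = covariance(height_cm, weight_kg)   # same relationship, different unit
```

Nothing about the relationship between height and weight has changed, yet the covariance is 100 times larger when height is recorded in centimetres.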
Thus, we have to express the covariance in a standard unit of measurement.
The standardized covariance is known as a correlation coefficient (it is called Pearson's Correlation Coefficient) and is defined as:
$$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}}$$
By standardizing the covariance we end up with a value that has to lie between −1 and +1, i.e.
$$-1 \le r_{xy} \le 1$$
So, if you find a correlation coefficient less than −1 or more than +1 you can be sure that something has gone hideously wrong!
Example
Suppose that the Ethiopian ministry of tourism expects that coffee sales rise when more adult visitors come by (see the table below).
How to measure the relationship between coffee sales (x) and number of visitors (y)?
$$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}} = -0.579$$
Interpretation
r = −0.5791
There is a negative correlation between returns of coffee and the number of adult visitors.
This means the association between the two variables is negative.
Strange!
Why do you think this has happened?
Have we controlled for other things (other variables)?
E.g. what if temperature is controlled for (since weather conditions may affect the visitors' decision to come to Ethiopia)?
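The quoted value of r can be reproduced from the six observations on coffee returns and adult visitors tabulated for this example. The function name `pearson_r` is illustrative; the formula is the Pearson correlation coefficient.

```python
import math

# Six observations from the lecture's coffee example.
coffee = [35, 21, 19, 23, 7, 24]        # returns of coffee (c)
visitors = [29, 35, 48, 85, 75, 52]     # number of adult visitors (v)


def pearson_r(x, y):
    """Pearson correlation: S_xy / sqrt(S_xx * S_yy)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)


r = pearson_r(coffee, visitors)
```

Here S_xy = −576, S_xx = 407.5 and S_yy = 2428, which gives r ≈ −0.579, in agreement with the figure reported above.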
If we have more data measured (information on temperature), will the relationship between coffee sales and visitors stay the same?

returns of coffee (c)   number of adult visitors (v)   temperature (t)
35                      29                             12
21                      35                             14
19                      48                             17
23                      85                             26
7                       75                             24
24                      52                             18

Generally
There is a type of correlation that allows you to look at the relationship between two variables when the effects of a third variable are held constant.
For example, exam performance may be negatively related to exam anxiety, but positively related to revision time, and revision time itself may be negatively related to exam anxiety.
This scenario is complex, so we may want to look at the relationship between exam anxiety and exam performance while ‘controlling’ for the effect of revision time.
Let's see the output of the coffee example before and after controlling for the effect of temperature.
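A correlation that holds a third variable constant is usually called a partial correlation. The sketch below applies the standard first-order partial correlation formula to the coffee, visitors and temperature data of this example; the formula and the function names are not taken from the slides and should be read as one common way to do the calculation.

```python
import math

# Observations from the lecture's coffee example.
coffee = [35, 21, 19, 23, 7, 24]         # returns of coffee (c)
visitors = [29, 35, 48, 85, 75, 52]      # number of adult visitors (v)
temperature = [12, 14, 17, 26, 24, 18]   # temperature (t)


def pearson_r(x, y):
    """Pearson correlation: S_xy / sqrt(S_xx * S_yy)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def partial_r(x, y, z):
    """First-order partial correlation of x and y, holding z constant."""
    r_xy = pearson_r(x, y)
    r_xz = pearson_r(x, z)
    r_yz = pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


r_cv = pearson_r(coffee, visitors)                       # not controlling for t
r_cv_given_t = partial_r(coffee, visitors, temperature)  # controlling for t
```

On these six observations the simple correlation is about −0.58, while the partial correlation holding temperature constant is positive (about 0.94), so the sign of the association reverses once temperature is taken into account.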
Case 1: when we do not control the effect of temperature
Case 2: when we control the effect of temperature
Illustrative Example
1. Given data on hours studied and grade on exam, which are correlated:
A. Find the estimated parameters.
B. Draw the estimated regression line.
C. Interpret the coefficients.
D. Suppose the number of hours studied is 3; what is the predicted grade on the exam?
E. Find the coefficient of determination and interpret the result.
F. Find the correlation coefficient.
G. Conduct a Student's t-test at the 5% level of significance.
H. Construct a 95% confidence interval for the slope parameter.
I. Test the significance of the slope parameter using the constructed confidence interval.
Illustrative Example
2. The following results have been obtained from a sample of 11 observations on the values of sales (Y) of a firm and the corresponding prices (X).
$$\bar{X} = 519.18, \qquad \bar{Y} = 217.82$$
$$\sum X_i^2 = 3{,}134{,}543, \qquad \sum X_i Y_i = 1{,}296{,}836, \qquad \sum Y_i^2 = 539{,}512$$
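One natural use of these summary statistics is to compute the OLS estimates without the raw data. The sketch below assumes that 519.18 and 217.82 are the sample means of X and Y, and that the three large figures are the raw sums of squares and cross products; the exercise's actual questions are not shown here, so the quantities computed are only a plausible starting point.

```python
# Summary statistics, with the interpretation assumed in the lead-in.
n = 11
x_bar = 519.18          # assumed: sample mean of prices X
y_bar = 217.82          # assumed: sample mean of sales Y
sum_x2 = 3_134_543      # assumed: sum of X_i^2
sum_xy = 1_296_836      # assumed: sum of X_i * Y_i
sum_y2 = 539_512        # assumed: sum of Y_i^2

# Convert raw sums into sums of squared deviations and cross products.
s_xx = sum_x2 - n * x_bar ** 2
s_xy = sum_xy - n * x_bar * y_bar
s_yy = sum_y2 - n * y_bar ** 2

# OLS estimates and coefficient of determination for Y on X.
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar
r_squared = s_xy ** 2 / (s_xx * s_yy)
```

The identities used in the middle block (for example, the sum of squared deviations equals the raw sum of squares minus n times the squared mean) are what make it possible to work from summary figures alone.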