Econometrics Practical
Here is the derivation of the OLS estimators for the coefficients B0 and B1 in a simple linear
regression model:
y = B0 + B1x + u
Where:
y = dependent variable
x = independent variable
B0 = intercept term
B1 = coefficient on x
u = error term
The OLS method finds the coefficients that minimize the sum of squared residuals (RSS):
RSS = Σ(yi - B0 - B1xi)²
To minimize the RSS, we take the partial derivatives with respect to B0 and B1 and set them
equal to zero. Solving the resulting normal equations gives:
B1 = Σ(xi - x̅)(yi - ȳ) / Σ(xi - x̅)²
B0 = ȳ - B1x̅
So in summary, the OLS coefficient estimators are found by minimizing the RSS, which gives
us the familiar formulas for B1 as the slope of the regression line, and B0 as the intercept.
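As an illustration of these formulas, here is a minimal numerical sketch in Python using numpy; the data arrays x and y are made up purely for the example:

import numpy as np

# hypothetical sample data for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

x_bar, y_bar = x.mean(), y.mean()

# OLS slope: B1 = sum((xi - x_bar)(yi - y_bar)) / sum((xi - x_bar)^2)
b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
# OLS intercept: B0 = y_bar - B1 * x_bar
b0 = y_bar - b1 * x_bar

print("B1 (slope):", b1)
print("B0 (intercept):", b0)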
Ques 2) Gauss-Markov assumptions
Here are the key Gauss-Markov assumptions outlined in Gujarati's econometrics
textbook:
1. Linear in parameters - The regression model is linear in the parameters B0, B1, B2 etc.
This means it takes the form:
y = B0 + B1x1 + B2x2 + ... + Bkxk + u
2. Random sampling - The sample drawn is random from the population. Each observation
has an equal probability of being selected.
3. No perfect collinearity - The independent variables in the model should not be perfectly
linearly related. If they are, the coefficients cannot be estimated.
4. Zero conditional mean - The error term has an expected value of zero given the values of
the independent variables. Formally:
E(u|x1,x2...xk) = 0
5. Homoskedasticity - The error term has a constant variance given any set of values for the
independent variables. Formally:
Var(u|x1,x2...xk) = σ²
6. No autocorrelation - The error terms for different observations are uncorrelated with each
other: Cov(ui, uj) = 0 for i ≠ j.
7. Normality - The error terms are normally distributed. Along with assumptions 4 and 5, this
implies u ~ N(0, σ²).
If these assumptions hold, OLS estimators are the Best Linear Unbiased Estimators (BLUE).
Estimates are unbiased and efficient, with minimum variance. Inference using t-stats and
F-stats is valid.
These are the standard assumptions made in the Classical Linear Regression Model using
OLS estimation. Checking for violations is important for properly specifying and interpreting
the model.
Ques 3) prove that under the Gauss-Markov assumptions OLS
estimators are BLUE
Here is a proof that under the Gauss-Markov assumptions, the OLS estimators are Best
Linear Unbiased Estimators (BLUE). Consider the multiple regression model:
y = β0 + β1x1 + β2x2 + ... + βkxk + u
Where y is the dependent variable, x1 to xk are the independent variables, the βs are the
true population parameters, and u is the error term. The assumptions are:
1) Linear in parameters
2) Random sampling
3) No perfect collinearity
4) Zero conditional mean (E(u|x) = 0)
5) Homoscedasticity (Var(u|x) = σ2)
6) No autocorrelation between error terms
7) Normality of error terms (u ~ N(0, σ2))
Under these assumptions, the OLS estimators β^0, β^1, ..., β^k have the following
properties:
Unbiasedness:
E(β^j) = βj for j = 0, 1, ..., k
This follows from assumption 4 (zero conditional mean): substituting the population model
into the OLS formula and taking expectations conditional on the regressors, the terms
involving u drop out because E(u|x) = 0, leaving E(β^j) = βj.
Minimum Variance:
Var(β^j) ≤ Var(θj) for any other linear unbiased estimator θj.
To see this, write any other linear unbiased estimator θj as the OLS estimator β^j plus an
additional linear function of the errors. Unbiasedness forces that additional term to have
mean zero and to be uncorrelated with β^j, so its variance simply adds on:
Var(θj) = Var(β^j) + a non-negative term. Hence no linear unbiased estimator can have a
smaller variance than OLS, which is the content of the Gauss-Markov theorem.
Linearity:
The OLS estimators are linear functions of the observations on the dependent variable y,
with weights that depend only on the values of the regressors X.
Therefore, under the Gauss-Markov assumptions, OLS estimators are linear, unbiased,
and have minimum variance - making them the Best Linear Unbiased Estimators (BLUE).
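A small simulation can illustrate the unbiasedness property numerically. This is only a sketch with made-up parameter values (true_b0 = 1, true_b1 = 2); it does not replace the algebraic proof:

import numpy as np

rng = np.random.default_rng(0)
true_b0, true_b1 = 1.0, 2.0        # assumed "true" parameters for the simulation
n, reps = 50, 5000

slopes = []
for _ in range(reps):
    x = rng.uniform(0, 10, n)
    u = rng.normal(0, 1, n)        # errors with zero mean and constant variance
    y = true_b0 + true_b1 * x + u
    x_bar, y_bar = x.mean(), y.mean()
    b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    slopes.append(b1)

# If OLS is unbiased, the average of the estimated slopes should be close to 2.0
print("mean of estimated slopes:", np.mean(slopes))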
Ques 5) explain significance of t-test
Here is an explanation of the significance of the t-test in regression analysis, based on
Gujarati's econometrics textbook:
The t-test is used to determine whether the estimated regression coefficient (β^j) for an
independent variable is statistically significantly different from zero. It tests the null
hypothesis H0: βj = 0 against the alternative Ha: βj ≠ 0, using the test statistic:
t = β^j / SE(β^j)
Where SE(β^j) is the standard error of the estimate, which measures its precision. A large
standard error means the coefficient was imprecisely estimated.
Under the Gauss-Markov assumptions (plus normality of the errors), the t-statistic follows a
t-distribution with n - k - 1 degrees of freedom under the null hypothesis, where n is the
sample size and k is the number of regressors.
We compare the t-statistic to the critical t-value from the t-distribution at our chosen
significance level (e.g. 5%).
If |t| > tcritical, the coefficient is significantly different from zero at that significance level. We
reject the null hypothesis and conclude there is a statistically significant linear relationship
between the independent variable and dependent variable.
If |t| < tcritical, we fail to reject the null hypothesis and conclude the evidence is insufficient to
say the coefficient is nonzero. There is no statistically significant linear relationship between
the independent variable and the dependent variable.
So in summary, Gujarati emphasizes that the t-test lets us formally assess whether the data
provides evidence of a nonzero linear effect of an independent variable on the expected
value of the dependent variable. A significant t-statistic indicates a statistically meaningful
relationship.
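A minimal numerical sketch of this decision rule in Python (using scipy; the coefficient estimate, standard error, and sample size below are made-up values for illustration):

from scipy import stats

beta_hat = 0.85      # hypothetical estimated coefficient
se_beta = 0.30       # hypothetical standard error
n, k = 50, 3         # assumed sample size and number of regressors

t_stat = beta_hat / se_beta                  # t = beta_hat / SE(beta_hat)
df = n - k - 1                               # degrees of freedom
t_crit = stats.t.ppf(1 - 0.05 / 2, df)       # two-sided 5% critical value

# Reject H0: beta_j = 0 if |t| exceeds the critical value
print("t =", round(t_stat, 3), " critical =", round(t_crit, 3))
print("significant at 5%:", abs(t_stat) > t_crit)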
Ques 6) explain significance of F-test
Here is an explanation of the F-test for overall model significance, based on Gujarati's
econometrics textbook:
The F-test is used to test whether the overall regression model has statistically significant
explanatory power.
H0: β1 = β2 = ... = βk = 0
(All coefficients except intercept are 0)
F = [(RSSR - RSSUR) / k] / [RSSUR / (n - k - 1)]
Where:
RSSR = residual sum of squares of the restricted model (intercept only)
RSSUR = residual sum of squares of the unrestricted model (all regressors included)
k = number of regressors (restrictions being tested)
n = number of observations
Under the null hypothesis, the F-statistic follows an F-distribution with k and (n - k - 1)
degrees of freedom.
We compare the F-statistic to the critical F-value from the F-distribution table at the chosen
significance level (e.g. 5%).
If F > Fcritical, we reject the null hypothesis and conclude at least one population coefficient
is nonzero. The model has statistically significant explanatory power.
If F ≤ Fcritical, we fail to reject the null and conclude there is insufficient evidence that the
regressors jointly explain the dependent variable. The model lacks statistical significance.
So in summary, Gujarati emphasizes that the F-test allows us to formally test whether the
independent variables are jointly useful in explaining variation in the dependent variable. A
significant F-statistic indicates a meaningful regression model.
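A minimal sketch of the computation in Python (using scipy; the two RSS values and the sample dimensions are made-up numbers for illustration):

from scipy import stats

rss_r = 120.0        # hypothetical RSS of the restricted model (intercept only)
rss_ur = 80.0        # hypothetical RSS of the unrestricted model (all regressors)
n, k = 50, 3         # assumed sample size and number of regressors

f_stat = ((rss_r - rss_ur) / k) / (rss_ur / (n - k - 1))
f_crit = stats.f.ppf(1 - 0.05, k, n - k - 1)    # 5% critical value

# Reject H0 (all slope coefficients are zero) if F exceeds the critical value
print("F =", round(f_stat, 3), " critical =", round(f_crit, 3))
print("model significant at 5%:", f_stat > f_crit)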
Ques 7) explain significance of R^2 and how is
it different from adjusted R^2
The R-squared (R2) and adjusted R-squared (adjusted R2) are measures of the explanatory
power of a regression model. Here is an explanation of their significance and how they differ:
R-squared (R2)
- R2 measures the proportion of the variation in the dependent variable that is explained by
the independent variables in the model.
- It ranges from 0 to 1, with higher values indicating more of the variation is explained by the
model.
- R2 is calculated as 1 - (RSS/TSS). RSS is the residual sum of squares and TSS is the total
sum of squares.
- R2 will increase as more variables are added to the model, even if they are not statistically
significant.
Adjusted R-squared
- Adjusted R2 accounts for the number of independent variables in the model. It will increase
only if additional variables improve the model more than would be expected by chance.
- The formula is adjusted R2 = 1 - (1 - R2)*(n - 1)/(n - k - 1). Where n is sample size and k is
the number of independent variables.
- Adjusted R2 can decrease as variables are added, if the improvement is less than
expected.
- Adjusted R2 will always be less than or equal to R2.
Key Differences:
- R2 is biased upwards as more variables are added, even if useless. Adjusted R2 corrects
for this.
- Adjusted R2 is lower than R2 when there are extra useless variables in the model.
- Adjusted R2 is better for comparing models with different numbers of independent
variables.
- R2 is preferred for assessing how well the model fits the data overall.
So in summary, R2 measures how well the model fits the data, while adjusted R2 corrects
for the number of variables and is better for comparing models. Adjusted R2 is always lower
than R2 if extra useless variables are included.
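The two measures can be computed directly from the sums of squares. A short sketch (the RSS, TSS, n and k values are made up for illustration):

rss, tss = 40.0, 100.0    # hypothetical residual and total sums of squares
n, k = 60, 4              # assumed sample size and number of regressors

r2 = 1 - rss / tss
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)

print("R-squared:", round(r2, 4))               # 0.6
print("Adjusted R-squared:", round(adj_r2, 4))  # slightly below R-squared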
Ques 8) Explain the concept of the standard error of
the model and how will you interpret it
The standard error of a regression model measures how precisely the model estimates the
dependent variable. It captures the amount of unexplained variation around the regression
line.
- Defined as the square root of the Residual Sum of Squares (RSS) divided by degrees of
freedom.
σ^ = √(RSS/(n - k - 1))
Where:
n = number of observations
k = number of independent variables
- A lower standard error indicates less variance around the regression line and more precise
estimates.
- A large standard error relative to the coefficients indicates imprecise estimation and a
poorly fitting model.
- Statistical tests like t-tests and F-tests rely on the standard error to test coefficient and
model significance.
In summary, the standard error of a regression measures the precision of the model
estimates, with a lower standard error suggesting a better fitting and more useful model. It is
essential for inference in regression analysis.
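A minimal sketch of the calculation (the observed and fitted values below are made up; in practice they come from the estimated regression):

import numpy as np

# hypothetical observed values and fitted values from a regression with k regressors
y     = np.array([3.1, 4.0, 5.2, 6.1, 6.9, 8.2, 9.1, 10.3])
y_hat = np.array([3.0, 4.2, 5.0, 6.0, 7.1, 8.0, 9.3, 10.1])
n, k = len(y), 1                          # assumed single regressor

rss = np.sum((y - y_hat) ** 2)            # residual sum of squares
se_regression = np.sqrt(rss / (n - k - 1))

print("standard error of the regression:", round(se_regression, 4))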
Ques 10) explain the concept of confidence interval of
Bi, how is it different from OLS estimators
The confidence interval for a regression coefficient Bi provides a range of likely values for
the true population parameter based on the sample estimate. It differs from the OLS
estimator in the following ways:
Confidence Interval:
β^i ± t(α/2) × SE(β^i)
Where:
β^i = the OLS point estimate of Bi
SE(β^i) = the standard error of the estimate
t(α/2) = the critical t-value with n - k - 1 degrees of freedom for the chosen confidence level
(e.g. 95%)
Key Differences:
- The OLS estimator β^i is a single point estimate of Bi, while the confidence interval is a
range of values that, in repeated sampling, contains the true Bi with the stated probability
(e.g. 95%).
- The interval conveys the sampling uncertainty of the estimate through its standard error;
the point estimate alone does not.
- A wider interval indicates a less precise estimate of Bi.
So in summary, the confidence interval describes a range of plausible values for the true
coefficient Bi based on the sample estimate β^i and its standard error. It provides more
information than just the point estimate.
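A minimal sketch of the calculation (using scipy; the point estimate, standard error, and sample dimensions are made-up values):

from scipy import stats

beta_hat = 0.85        # hypothetical OLS point estimate of Bi
se_beta = 0.30         # hypothetical standard error
n, k = 50, 3           # assumed sample size and number of regressors

t_crit = stats.t.ppf(1 - 0.05 / 2, n - k - 1)   # two-sided 5% critical value
lower = beta_hat - t_crit * se_beta
upper = beta_hat + t_crit * se_beta

print("95% confidence interval for Bi:", (round(lower, 3), round(upper, 3)))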
Ques 11) explain the role of t and F test in econometrics
The t-test and F-test play important roles in econometric analysis and inference:
t-test:
- Used to test if an individual regression coefficient is significantly different from zero.
- Tests the null hypothesis H0: βi = 0 against the alternative Ha: βi ≠ 0
- Calculated as t = β^i / SE(β^i)
- Follows a t-distribution under the null hypothesis with n - k - 1 degrees of freedom
- A large t-statistic indicates strong evidence to reject the null and conclude the regressor is
significantly related to the dependent variable.
Role in econometrics:
- Lets researchers determine if an independent variable has a statistically significant effect
on the dependent variable after controlling for other factors.
- Essential for assessing the validity of economic theories and the relationship between
economic variables.
F-test:
- Used to test if all regression coefficients (except intercept) are jointly equal to zero.
- Tests the joint null hypothesis H0: β1 = β2 = ... = βk = 0
- Calculated as F = [(RSSR - RSSUR)/k] / [RSSUR/(n - k - 1)]
- Follows an F-distribution under the null with k and n - k - 1 degrees of freedom.
Role in econometrics:
- Lets researchers test the overall fit and significance of the regression model as a whole.
- Determines if including the independent variables jointly improves the model.
- Essential for model selection and assessing empirical economic models.
In summary, the t and F-tests are fundamental for assessing the relationships between
economic variables and testing economic theories using regression analysis. They facilitate
statistical inference and valid econometric modeling.
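In practice both tests are reported automatically by regression software. A brief sketch using Python's statsmodels library (the data here are simulated purely for illustration):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 0.5 * x1 + 0.0 * x2 + rng.normal(size=n)   # x2 has no true effect

X = sm.add_constant(np.column_stack([x1, x2]))        # intercept plus two regressors
results = sm.OLS(y, X).fit()

print(results.tvalues)     # t-statistics for the intercept, x1 and x2
print(results.fvalue)      # F-statistic for overall significance
print(results.f_pvalue)    # p-value of the F-test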
Ques 12) explain expected value and variability
The expected value and variability are basic statistical concepts that describe the central
tendency and dispersion of a probability distribution.
Expected Value:
- Equals the weighted average of all possible values, weighted by their probability of
occurrence.
- For a discrete random variable X, the expected value is E(X) = ∑x*P(x) for all possible
values x.
- In econometrics, the population regression function passes through the expected value of
Y for given X.
Variability:
- Variance and standard deviation measure how far outcomes lie from the mean on average.
For a discrete random variable, Var(X) = E[(X - E(X))²] = ∑(x - E(X))²*P(x), and the standard
deviation is its square root.
- In econometrics, the error term captures the variability of Y around the expected value.
In summary, the expected value measures central tendency, while variability measures
dispersion. Understanding both provides insight into the distribution as a whole and
relationships between economic variables.
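As a simple worked example, consider a fair six-sided die (chosen only for illustration):

import numpy as np

values = np.arange(1, 7)          # possible outcomes of a fair die
probs = np.full(6, 1 / 6)         # each outcome has probability 1/6

expected = np.sum(values * probs)                      # E(X) = sum of x*P(x) = 3.5
variance = np.sum((values - expected) ** 2 * probs)    # Var(X) = sum of (x - E(X))^2 * P(x)

print("E(X) =", expected)                  # 3.5
print("Var(X) =", round(variance, 4))      # about 2.9167
print("SD(X) =", round(np.sqrt(variance), 4))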
Ques 13) explain binomial and how is it different from
Poisson
The binomial and Poisson distributions are two common discrete probability distributions
used in statistics and econometrics. The key differences are:
Binomial Distribution:
- Models the number of successes in a fixed number n of independent trials, each with the
same probability of success p.
- P(X = x) = nCx * p^x * (1 - p)^(n-x), with mean np and variance np(1 - p).
Poisson Distribution:
- Models the number of events occurring in a fixed interval of time or space, when events
occur independently at a constant average rate λ.
- P(X = x) = e^(-λ) * λ^x / x!, with mean and variance both equal to λ.
Key Differences:
- The binomial has two parameters (n and p) and a bounded range (0 to n); the Poisson has
one parameter (λ) and no upper bound on the count.
- In the binomial the variance np(1 - p) is less than the mean np; in the Poisson the mean
and variance are equal.
- The Poisson arises as the limit of the binomial when n is large, p is small, and np = λ, so it
is often used to approximate the binomial for rare events.
In summary, the binomial distribution models the number of successes in a fixed number of
discrete trials, while the Poisson distribution models the count of events occurring at a given
average rate over a fixed interval. Their applications and properties differ accordingly.
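The approximation of the binomial by the Poisson for large n and small p can be checked numerically. A short sketch using scipy (the parameter values are made up for illustration):

from scipy import stats

n, p = 100, 0.03     # assumed binomial parameters: many trials, small success probability
lam = n * p          # matching Poisson rate, lambda = np = 3

# Compare the probability of observing exactly x events under each distribution
for x in range(6):
    print(x, round(stats.binom.pmf(x, n, p), 4), round(stats.poisson.pmf(x, lam), 4))
# The two columns of probabilities are close, illustrating the limiting relationship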
Ques 14) what are the various aspects of descriptive
statistics
Descriptive statistics are used to summarize and describe the characteristics of a data set.
The main aspects or elements of descriptive statistics include:
- Measures of central tendency: the mean, median, and mode, which describe the typical or
central value of the data.
- Measures of dispersion: the range, variance, standard deviation, and interquartile range,
which describe how spread out the data are.
- Measures of shape: skewness (asymmetry) and kurtosis (peakedness of the distribution).
- Frequency distributions and graphical summaries such as tables, histograms, and box plots.
Descriptive statistics provide simple summaries about the sample and the measures form
the basis for more formal statistical inference. Together they provide a comprehensive
overview of any data set.
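Most of these measures can be obtained in one call with pandas. A small sketch (the data series is made up for illustration):

import pandas as pd

# hypothetical sample data, e.g. monthly returns in percent
data = pd.Series([2.1, 3.4, 2.8, 5.0, 4.2, 3.9, 6.1, 2.5, 4.8, 3.3])

print(data.describe())                     # count, mean, std, min, quartiles, max
print("median:", data.median())
print("skewness:", round(data.skew(), 3))
print("kurtosis:", round(data.kurtosis(), 3))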
Ques 15) how box plots help us to understand descriptive statistics
Here are the key points about box plots, based on Gujarati:
- Visually depict the distribution of a variable using median, quartiles, and extremes.
- Center line in box shows 50th percentile (median). Box boundaries depict 25th and 75th
percentiles (interquartile range).
- Whiskers extend to farthest observations within 1.5 times the interquartile range from the
box.
- Easy to draw and interpret. More insightful than just reporting sample median and standard
deviation.
- Can be used with time series data to identify patterns like trend and seasonal fluctuations.
Overall, Gujarati emphasizes box plots are a simple and useful descriptive statistic tool for
visualizing key aspects of a distribution, detecting outliers, and comparing groups. They
provide insights that may not be apparent from numerical measures alone.
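A minimal sketch of drawing box plots in Python with matplotlib (the two groups of data are made up, with one extreme value in Group B to show how outliers appear):

import matplotlib.pyplot as plt

# hypothetical samples for two groups; Group B contains one extreme value (35)
group_a = [12, 14, 15, 15, 16, 17, 18, 19, 20, 21]
group_b = [10, 11, 13, 14, 14, 15, 16, 17, 18, 35]

plt.boxplot([group_a, group_b])            # boxes show medians, quartiles and whiskers
plt.xticks([1, 2], ["Group A", "Group B"])
plt.ylabel("Value")
plt.title("Box plots for comparing two groups")
plt.show()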