

HAWASSA UNIVERSITY

COLLEGE OF BUSINESS AND

ECONOMICS

DEPARTMENT OF ECONOMICS

GROUP ASSIGNMENT FOR ECONOMETRICS ONE

NAME OF STUDENTS ID.NO

1. SEBEWONEL TEADEL _____________________________________ SoScR/0413/15


2. MERON GETACHEW _____________________________________ SoScR/1981/15
3. JONAH SUNDAY _________________________________________ SoScR/1393/14
4. WOYNSHET ATINAFU _____________________________________ SoScR/2093/15
5. ASEFAW MANDFRO ______________________________________ SoScR/1188/15
6. NEBIL MOHAMMED ______________________________________ SoScR/0368/15
7. TSIYON ATALEW _________________________________________ SocSR/1440/15
8. MERSHAYE ABERA_______________________________________ SocSR/0324/15

Submitted to: Dr. Abate Yesigat

3. Discuss the nature, causes, consequences, and remedies of each of the following problems we might encounter in regression analysis.
a) Multicollinearity
b) Heteroscedasticity

A. Multicollinearity in Regression Analysis

Multicollinearity is a phenomenon in regression analysis where two or more predictor variables (independent variables) are highly correlated with each other. This high correlation makes it difficult to isolate the individual effect of each predictor variable on the dependent variable.

1. Nature of Multicollinearity

In simple terms, multicollinearity occurs when there is a strong linear relationship between two or more independent variables in a regression model. This can create several issues for regression analysis, particularly with interpreting coefficients and making reliable predictions. There are two main types of multicollinearity:

• Perfect Multicollinearity: This occurs when one independent variable is an exact linear function of another. Perfect multicollinearity is rare, but it makes it impossible to estimate separate regression coefficients for the correlated variables (a short Stata illustration follows this list).
• Imperfect or High Multicollinearity: This is more common and occurs when the correlation between two or more independent variables is very high, but not exactly 1 or -1.
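As a quick, hypothetical illustration of the perfect case (the variable names WEIGHT, HEIGHT_IN and HEIGHT_CM are invented for the example and are not part of the assignment data), one regressor is an exact linear function of another, so the software cannot estimate both coefficients:

* HEIGHT_CM is an exact linear function of HEIGHT_IN, so the two regressors
* are perfectly collinear; Stata will automatically omit one of them and
* report that it was dropped because of collinearity.
. gen HEIGHT_CM = 2.54*HEIGHT_IN
. regress WEIGHT HEIGHT_IN HEIGHT_CM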
2. Causes of Multicollinearity

Several factors can contribute to the occurrence of multicollinearity:

• Inclusion of Highly Correlated Variables: Sometimes, multiple predictor variables in a model may be closely related, even though they measure different aspects of a concept (e.g., height in inches and height in centimeters).
• Measurement Errors: If one or more variables are measured with error, this can increase correlations between predictors.
• Overfitting the Model: Adding too many independent variables to the regression model, especially those that are similar or highly correlated, can increase multicollinearity.
• Linear Relationships in Data: In some cases, multicollinearity arises naturally in the dataset due to inherent relationships between variables, such as in economic data (e.g., income and education level).
• Polynomial and Interaction Terms: Including interaction terms or higher-degree polynomial terms can also cause multicollinearity if the variables involved are already correlated.


3. Consequences of Multicollinearity

Multicollinearity can lead to several issues in regression analysis:

• Unstable Coefficients: When predictors are highly correlated, it becomes difficult to determine the individual effect of each variable on the dependent variable. The regression coefficients become highly sensitive to small changes in the model or the data, making them unstable.
  o For example, adding or removing a correlated variable can lead to large changes in the estimated coefficients of other variables.
• Inflated Standard Errors: Multicollinearity causes the standard errors of the estimated coefficients to increase. This inflation reduces the precision of the coefficient estimates and makes it more difficult to determine whether the variables are statistically significant.
• Significance Testing Problems: With large standard errors, the t-statistics for individual predictors decrease, which can lead to a failure to reject the null hypothesis that a coefficient is equal to zero (i.e., failing to find significant predictors even when they should be significant).
• Redundancy in Variables: If two variables are highly correlated, they may essentially be capturing the same information. This redundancy does not provide additional explanatory power but makes the model unnecessarily complex.
• Misleading Conclusions: The interpretation of the coefficients may become misleading, making it difficult to understand the relationship between each predictor and the dependent variable. For example, you might incorrectly attribute the effect of one variable to another highly correlated one.

4. Remedies for Multicollinearity

Several strategies can be used to address multicollinearity (a short Stata sketch of some of these remedies follows the list):

• Examine the Correlation Matrix: Before fitting a regression model, examine the correlation matrix of all independent variables. If two variables are highly correlated (typically with a correlation coefficient above 0.8), this is an indicator of potential multicollinearity.
• Remove Highly Correlated Variables: If two variables are highly correlated, it may be worth removing one from the model. This can simplify the model and reduce multicollinearity. Sometimes, domain knowledge can help decide which variable to keep.
• Combine Correlated Variables: In some cases, it might make sense to combine correlated variables into a single composite variable. For example, if "income" and "education level" are highly correlated, they might be combined into a socioeconomic status score.

• Use Principal Component Analysis (PCA): PCA is a dimensionality reduction technique that can transform correlated predictors into a smaller set of uncorrelated components. These components can then be used in the regression model instead of the original variables, helping mitigate multicollinearity.
• Apply Ridge or Lasso Regression: These are regularization techniques that can help reduce the impact of multicollinearity:
  o Ridge Regression (L2 regularization) adds a penalty proportional to the square of the coefficients to the loss function. This helps shrink the coefficients of correlated predictors, making them more stable.
  o Lasso Regression (L1 regularization) can help by setting some coefficients to zero, effectively removing some correlated predictors from the model.
• Increase the Sample Size: If multicollinearity arises from limited data, increasing the sample size may help stabilize the estimates of the coefficients and reduce the impact of multicollinearity.
• Center the Variables: In some cases, centering the variables (subtracting the mean from each variable) can help reduce multicollinearity, especially in the presence of interaction terms or polynomial terms. Centering does not remove the correlation but can reduce some computational issues.
• Use the Variance Inflation Factor (VIF): VIF is a diagnostic tool that quantifies how much the variance of a regression coefficient is inflated due to collinearity with other predictors. A high VIF (typically greater than 10) indicates significant multicollinearity. Variables with high VIFs can be considered for removal or transformation.
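A minimal Stata sketch of some of these remedies, assuming a hypothetical dependent variable y and correlated regressors x1-x3 (the lasso command requires Stata 16 or later):

* Replace correlated predictors with uncorrelated principal components.
. pca x1 x2 x3
. predict pc1 pc2, score
. regress y pc1 pc2

* Center a regressor before building interaction or polynomial terms.
. summarize x1, meanonly
. gen x1c = x1 - r(mean)

* Lasso can shrink some coefficients to zero, dropping redundant predictors.
. lasso linear y x1 x2 x3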

5. Diagnosing Multicollinearity

• Correlation Matrix: This is the simplest way to detect multicollinearity. If two or more predictors have a high correlation coefficient (above 0.8 or 0.9), multicollinearity may be present.
• Variance Inflation Factor (VIF): VIF measures how much the variance of a regression coefficient is inflated because of collinearity with other variables. A VIF value greater than 10 is typically considered indicative of problematic multicollinearity.
• Condition Index: The condition index is another diagnostic tool. It is used in conjunction with the eigenvalues of the predictor matrix. High condition indices (greater than 30) often signal multicollinearity.


• Tolerance: Tolerance is the reciprocal of VIF (i.e., 1/VIF). Tolerance values close to 0 indicate severe multicollinearity.
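In Stata, the first two of these diagnostics can be run as follows (a sketch using hypothetical variable names y and x1-x3; estat vif reports both the VIF and 1/VIF, i.e. the tolerance):

* Pairwise correlations among the candidate regressors.
. correlate x1 x2 x3

* VIF and tolerance, computed after fitting the regression.
. regress y x1 x2 x3
. estat vif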

B. Heteroscedasticity in Regression Analysis

Heteroscedasticity refers to a situation in regression analysis where the variance of the errors (or residuals) is not constant across all levels of the independent variable(s). In simple terms, it occurs when the spread or dispersion of the residuals varies systematically with the values of the independent variables. This violates one of the key assumptions of classical linear regression, which assumes homoscedasticity (constant variance of errors).

1. Nature of Heteroscedasticity

• Inconsistent Variance of Errors: In a homoscedastic model, the variability (spread) of the residuals should be roughly the same across all levels of the independent variable(s). In the case of heteroscedasticity, however, the spread of the residuals increases or decreases as the value of the predictor variable changes.
• Graphical Indicators: Heteroscedasticity can often be identified visually through a scatter plot of residuals versus fitted values. If the variance of the residuals changes systematically (e.g., a wider spread for larger values of the independent variable), the plot will show a "fan" or "cone" shape.
• Example: In a simple linear regression model, if the residuals tend to spread out more as the predicted (fitted) values increase, this is an indicator of heteroscedasticity.

2. Causes of Heteroscedasticity

Several factors can contribute to heteroscedasticity:

• Non-constant Variability in the Data: In some real-world situations, the variability of the dependent variable naturally increases or decreases as the level of the independent variable changes. For example, in income-related models, the variance of income may increase as income levels rise (because higher-income groups often show more variation in their earnings).
• Model Misspecification: Heteroscedasticity can result from incorrect model specification. For instance, omitting important predictor variables (which influence the dependent variable) or using an incorrect functional form (e.g., linear instead of logarithmic) can lead to heteroscedastic errors.
• Presence of Outliers: Outliers or extreme values in the data can sometimes cause heteroscedasticity, as these values might cause the residuals to behave erratically, leading to a non-constant variance.


• Measurement Error: If there are errors in measuring the dependent or independent variables, the variance of the residuals can vary across observations.
• Data Transformation: If the data are not properly transformed (for example, failing to log-transform highly skewed data), heteroscedasticity can emerge as the spread of the errors becomes larger or smaller as the independent variable(s) change.

3. Consequences of Heteroscedasticity

Heteroscedasticity can have several negative effects on the results of a regression analysis:

• Inefficient Estimates: Ordinary Least Squares (OLS) estimates remain unbiased in the presence of heteroscedasticity, but they are no longer efficient. This means that while the estimated coefficients are still correct on average, they no longer have the minimum possible variance. Therefore, they are not the best (most precise) estimates.
• Incorrect Standard Errors: One of the primary consequences of heteroscedasticity is that the usual OLS standard errors become incorrect. They can be too large or too small depending on the pattern of heteroscedasticity. This results in inaccurate significance tests (e.g., t-tests), potentially leading to wrong conclusions about whether predictors are significant.
• Invalid Inference: Since the standard errors are incorrect, hypothesis tests (e.g., testing whether a coefficient is zero) may lead to false positives (Type I errors) or false negatives (Type II errors). For example, you might incorrectly reject a true null hypothesis (finding a variable significant when it isn't) or fail to reject a false null hypothesis (failing to detect a significant relationship).
• Unreliable Confidence Intervals: Inaccurate standard errors also lead to incorrect confidence intervals for the estimated coefficients. These intervals might be too wide or too narrow, giving misleading results about the precision of the estimates.
• Model Fit and Predictive Power: The presence of heteroscedasticity does not necessarily affect the fit of the model (R-squared remains valid), but it affects the reliability of statistical tests on the coefficients, which can undermine the usefulness of the model for drawing valid inferences.

4. Remedies for Heteroscedasticity

There are several ways to deal with heteroscedasticity, depending on the severity of the problem and the context of the data (a short Stata sketch of some of these remedies follows the list):

• Transform the Dependent Variable: A common remedy is to apply a transformation to the dependent variable (e.g., a log transformation, square root transformation, or inverse transformation). This can help stabilize the variance of the errors. For instance, taking the natural logarithm of income data can reduce heteroscedasticity because the variability tends to be more uniform on a logarithmic scale.

• Weighted Least Squares (WLS): WLS is a method in which different observations are given different weights based on the variance of their residuals. Observations with larger error variance are down-weighted, while those with smaller error variance are up-weighted. This corrects for heteroscedasticity and gives more efficient estimates.
• Use Robust Standard Errors: A simple and popular approach is to compute robust standard errors (also called heteroscedasticity-consistent standard errors). These adjusted standard errors correct for the non-constant variance and allow for valid hypothesis testing even in the presence of heteroscedasticity. One common type of robust standard error is the Huber-White standard error.
• Generalized Least Squares (GLS): GLS is another method that attempts to correct heteroscedasticity by modeling the structure of the variance-covariance matrix of the errors. This approach requires an assumption about how the variance changes, and it adjusts the estimation process accordingly.
• Examine the Functional Form: Heteroscedasticity may arise from a misspecified functional form of the model. For example, using a linear model when the relationship between the dependent and independent variables is non-linear can lead to heteroscedastic errors. A non-linear model, polynomial regression, or adding interaction terms might resolve this issue.
• Add Missing Variables: If the heteroscedasticity is caused by omitted-variable bias, adding relevant variables to the model could correct the issue. For instance, adding income as a control in a model predicting household expenditures could reduce heteroscedasticity.
• Use Non-Parametric Methods: In some cases, non-parametric or semi-parametric models (e.g., generalized additive models or kernel regression) can provide more flexibility in modeling relationships and errors that exhibit heteroscedasticity.
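A minimal Stata sketch of three of these remedies, using hypothetical variables y, x1 and x2 (the WLS weight below simply assumes the error variance is proportional to x1, an assumption that would have to be justified in practice):

* Log-transform the dependent variable (requires y > 0).
. gen ln_y = ln(y)
. regress ln_y x1 x2

* Heteroscedasticity-robust (Huber-White) standard errors.
. regress y x1 x2, vce(robust)

* Weighted least squares via analytic weights, weighting by 1/x1.
. gen w = 1/x1
. regress y x1 x2 [aweight=w]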

5. Diagnosing Heteroscedasticity

Several diagnostic tools and tests are available to detect heteroscedasticity in regression models:

• Residual Plot: Plotting the residuals (errors) against the fitted values is one of the most common ways to detect heteroscedasticity. A pattern in which the residuals fan out or contract as the fitted values change indicates heteroscedasticity.


• Breusch-Pagan Test: This is a statistical test for the presence of heteroscedasticity. The null hypothesis of the test is that the variance of the residuals is constant (i.e., homoscedasticity). A significant result (small p-value) suggests the presence of heteroscedasticity.
• White's Test: White's test is more general than the Breusch-Pagan test, as it does not assume a specific form for the heteroscedasticity. It can detect both heteroscedasticity and model misspecification.
• Goldfeld-Quandt Test: This test is used for detecting heteroscedasticity when there is a suspicion that the variance of the errors depends on an independent variable or on a specific part of the data.
• Plotting Residuals by Grouping: If you suspect that heteroscedasticity arises from a categorical variable, plotting the residuals against the different levels of that variable can help identify where the variance changes.
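In Stata, the plot and the first two tests can be run after fitting the regression, as in the sketch below (hypothetical variables y, x1 and x2; the hettest-style command is the same one used in the analysis later in this assignment):

* Residual-versus-fitted plot; look for a fan or cone shape.
. regress y x1 x2
. rvfplot

* Breusch-Pagan / Cook-Weisberg test (Ho: constant variance).
. estat hettest

* White's test, obtained through the information matrix test.
. estat imtest, white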

1. Use the data file wage to work on using STATA and answer the following questions

A. Examine the data

. describe

Contains data from C:\Users\osama\Documents\wage (1).dta


obs: 39
vars: 10 18 Mar 2001 17:39
size: 1,560

storage display value


variable name type format label variable label

HRS float %9.0g Average hours worked during the


year
RATE float %9.0g Average hourly wage ($)
ERSP float %9.0g Average yearly earnings of spouse
($)
ERNO float %9.0g Average yearly earnings of other
family members ($)
NEIN float %9.0g Average yearly non-earned income
ASSET float %9.0g Average family asset holdings
(Bank account, etc.) ($)
AGE float %9.0g Average age of respondent
DEP float %9.0g Average number of dependents
RACE float %9.0g Percent of white respondents
SCHOOL float %9.0g Average highest grade of school
completed


B. Carry out remedial measure(s) if there is any problem with the data


. misstable summarize
Obs<.

Unique
Variable Obs=. Obs>. Obs<. values Min Max

ERSP 2 37 36 342 1805


ERNO 3 36 32 30 594
DEP 2 37 36 1.159 4.512
RACE 8 31 29 9.7 83.1

Now we replace the missing values with the variable means (mean imputation):

. mean(ERSP)

Mean estimation Number of obs = 37

Mean Std. Err. [95% Conf. Interval]

ERSP 1101.703 44.88189 1010.678 1192.727

. replace ERSP=1101.703 if missing(ERSP)


(2 real changes made)

. mean DEP

Mean estimation Number of obs = 37

Mean Std. Err. [95% Conf. Interval]

DEP 2.477541 .1059457 2.262673 2.692408

. replace DEP=2.477541 if missing(DEP)


(2 real changes made)
. mean ERNO

Mean estimation Number of obs = 36

Mean Std. Err. [95% Conf. Interval]

ERNO 298.2778 15.6037 266.6006 329.955

. replace ERNO=298.2778 if missing(ERNO)


(3 real changes made)


. mean RACE

Mean estimation Number of obs = 31

Mean Std. Err. [95% Conf. Interval]

RACE 37.96129 3.38077 31.05684 44.86574

. replace RACE=37.96129 if missing(RACE)


(8 real changes made)
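The four mean replacements above can also be written more compactly with a loop; a sketch of the same mean imputation in do-file form (summarize, meanonly stores the mean in r(mean)):

* Mean-impute each variable that has missing values.
foreach v in ERSP ERNO DEP RACE {
    quietly summarize `v', meanonly
    replace `v' = r(mean) if missing(`v')
}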

C. Regress HRS on RATE, ERSP, ERNO, NEIN, AGE and DEP


. regress HRS RATE ERSP ERNO NEIN AGE DEP

Source SS df MS Number of obs = 39


F(6, 32) = 17.35
Model 119154.526 6 19859.0877 Prob > F = 0.0000
Residual 36628.7046 32 1144.64702 R-squared = 0.7649
Adj R-squared = 0.7208
Total 155783.231 38 4099.5587 Root MSE = 33.833

HRS Coef. Std. Err. t P>|t| [95% Conf. Interval]

RATE 6.525294 22.33447 0.29 0.772 -38.96853 52.01912


ERSP -.0416553 .028775 -1.45 0.157 -.1002681 .0169576
ERNO -.2493987 .0982828 -2.54 0.016 -.4495942 -.0492032
NEIN .4517179 .0768562 5.88 0.000 .295167 .6082689
AGE -4.42077 2.487202 -1.78 0.085 -9.487034 .6454937
DEP 7.89696 12.96462 0.61 0.547 -18.51111 34.30503
_cons 2244.167 110.312 20.34 0.000 2019.468 2468.865

D. Conduct model specification tests using the linktest and ovtest commands of STATA, and interpret the result

. linktest

Source SS df MS Number of obs = 39


F(2, 36) = 65.17
Model 122068.155 2 61034.0777 Prob > F = 0.0000
Residual 33715.0754 36 936.529873 R-squared = 0.7836
Adj R-squared = 0.7716
Total 155783.231 38 4099.5587 Root MSE = 30.603

HRS Coef. Std. Err. t P>|t| [95% Conf. Interval]

_hat 12.70239 6.635216 1.91 0.064 -.7544527 26.15923


_hatsq -.0027452 .0015564 -1.76 0.086 -.0059017 .0004113
_cons -12462.99 7068.384 -1.76 0.086 -26798.33 1872.36


. ovtest

Ramsey RESET test using powers of the fitted values of HRS


Ho: model has no omitted variables
F(3, 29) = 2.50
Prob > F = 0.0792

Interpretation: in the linktest, _hatsq is not statistically significant at the 5% level (p = 0.086), and the RESET test fails to reject the null hypothesis of no omitted variables (p = 0.0792), so neither test provides strong evidence of a specification error.

E. Perform multicollinearity test

. vif

Variable VIF 1/VIF

NEIN 3.82 0.261487


AGE 3.66 0.273137
RATE 3.45 0.289794
ERNO 2.59 0.386270
DEP 2.20 0.455492
ERSP 1.94 0.515220

Mean VIF 2.94

All VIF values are well below 10 (mean VIF = 2.94), so multicollinearity does not appear to be a serious problem in this model.

F. Perform heteroscedasticity test

. hettest

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity


Ho: Constant variance
Variables: fitted values of HRS

chi2(1) = 5.25
Prob > chi2 = 0.0220
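Since the Breusch-Pagan test rejects the null hypothesis of constant variance (p = 0.0220 < 0.05), one possible follow-up, consistent with the remedies discussed in question 3, is to re-estimate the model with heteroscedasticity-robust (Huber-White) standard errors; a minimal sketch:

* Same regression, but with robust standard errors for valid inference.
. regress HRS RATE ERSP ERNO NEIN AGE DEP, vce(robust)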

G. Comment on the explanatory power and adequacy of the model

The regression model demonstrates strong explanatory power, with an R-squared of 0.7649, indicating that approximately 76.49% of the variation in the dependent variable (HRS) is explained by the independent variables, and an adjusted R-squared of 0.7208, which accounts for the number of regressors. The model is statistically significant overall, as shown by the F-statistic (17.35) and its p-value (0.0000). However, while ERNO (p = 0.016) and NEIN (p = 0.000) are statistically significant, the remaining predictors (RATE, ERSP, AGE and DEP) are not, as their p-values exceed 0.05. The Root MSE (33.833) reflects the typical size of the prediction error, and its adequacy depends on the scale of HRS. While the model performs well overall, it could be improved by reconsidering the insignificant variables and conducting further diagnostics to enhance its predictive accuracy.


H. Interpret the regression coefficients

Among the significant variables:

1. ERNO (-0.2494): A one-unit increase in ERNO is associated with a decrease of 0.2494 units in HRS, holding all other variables constant. This negative relationship is statistically significant (p-value = 0.016), suggesting that ERNO has a meaningful effect in reducing HRS.
2. NEIN (0.4517): A one-unit increase in NEIN is associated with an increase of 0.4517 units in HRS, holding all other variables constant. This positive relationship is statistically significant (p-value = 0.000), indicating that NEIN contributes significantly to increasing HRS.

These findings suggest that ERNO and NEIN are key predictors of HRS, with ERNO decreasing and NEIN increasing its value.


5. Use the data file EARNINGS and, using STATA for analysis, carry out the following tasks.
a. Perform a regression of EARNINGS on S, where EARNINGS represents current hourly earnings in $ and S represents education (highest grade completed) in number of years of schooling of the respondent. Interpret the regression results.
. regress EARNINGS S

Source SS df MS Number of obs = 540


F(1, 538) = 112.15
Model 19321.5589 1 19321.5589 Prob > F = 0.0000
Residual 92688.6722 538 172.283777 R-squared = 0.1725
Adj R-squared = 0.1710
Total 112010.231 539 207.811189 Root MSE = 13.126

EARNINGS Coef. Std. Err. t P>|t| [95% Conf. Interval]

S 2.455321 .2318512 10.59 0.000 1.999876 2.910765


_cons -13.93347 3.219851 -4.33 0.000 -20.25849 -7.608444

EARNINGS = -13.93 + 2.46(S): The intercept (-13.93) is the predicted hourly earnings for a respondent with zero years of schooling; this negative value has no meaningful economic interpretation and simply anchors the fitted line. The slope (2.46) shows that each additional year of schooling is associated with an increase in hourly earnings of about $2.46 on average.

B. Comment on the value of R²

R² = 0.1725 indicates that 17.25% of the total variation in earnings is explained by years of schooling (S). The remaining 82.75% of the variation is due to factors not included in the model and captured by the error term.

C. Perform a test on the coefficients of regression. Explain the implications of the result of the test. Calculate a 95% confidence interval for the slope coefficient

To test the significance of the regression coefficients, we can use the standard error test, the t-test, or confidence intervals.

t-test: Compare the calculated t-statistic (10.59) with the critical value, which is approximately 1.96 at the 5% level with 538 degrees of freedom. Since 10.59 > 1.96, we reject H₀: β₁ = 0, indicating that the slope coefficient is statistically significant.

Confidence interval: At the 95% level, the slope coefficient lies within [2.00, 2.91].

Thus, both the intercept and the slope coefficient are statistically significant.
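The same interval can be reproduced in Stata after the regression, using the stored coefficient, standard error and residual degrees of freedom (a sketch):

* Lower and upper bounds of the 95% confidence interval for the slope on S.
. regress EARNINGS S
. display _b[S] - invttail(e(df_r), 0.025)*_se[S]
. display _b[S] + invttail(e(df_r), 0.025)*_se[S]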


D. Perform an F test for the goodness of fit and comment on the result

The F-test evaluates the overall significance of the model. The null hypothesis states that all slope coefficients are equal to zero (here, H₀: β₁ = 0). Since the F-statistic (112.15) is greater than the F-critical value (approximately 3.86 at the 5% level with (1, 538) degrees of freedom) and the p-value (0.000) is less than 0.05, we reject the null hypothesis. This indicates that the regression is significant overall, and the model is valid.

E. Regress S on ASVABC and SM, where ASVABC is a composite measure of the numerical and verbal ability of the respondent and SM is the years of schooling of the respondent's mother. Repeat the regression using SF, the years of schooling of the father, instead of SM, and again including both as regressors. Do your regression results support the view that if you educate a male, you educate an individual, while if you educate a female, you educate a nation?

. regress S ASVABC SM

Source SS df MS Number of obs = 540


F(2, 537) = 147.36
Model 1135.67473 2 567.837363 Prob > F = 0.0000
Residual 2069.30861 537 3.85346109 R-squared = 0.3543
Adj R-squared = 0.3519
Total 3204.98333 539 5.94616574 Root MSE = 1.963

S Coef. Std. Err. t P>|t| [95% Conf. Interval]

ASVABC .1328069 .0097389 13.64 0.000 .1136758 .151938


SM .1235071 .0330837 3.73 0.000 .0585178 .1884963
_cons 5.420733 .4930224 10.99 0.000 4.452244 6.389222

. regress S ASVABC SF

Source SS df MS Number of obs = 540


F(2, 537) = 155.49
Model 1175.37867 2 587.689333 Prob > F = 0.0000
Residual 2029.60467 537 3.77952452 R-squared = 0.3667
Adj R-squared = 0.3644
Total 3204.98333 539 5.94616574 Root MSE = 1.9441

S Coef. Std. Err. t P>|t| [95% Conf. Interval]

ASVABC .1285797 .0095914 13.41 0.000 .1097385 .1474209


SF .1289751 .0259437 4.97 0.000 .0780115 .1799387
_cons 5.541335 .4692887 11.81 0.000 4.619468 6.463202


. regress S ASVABC SM SF

Source SS df MS Number of obs = 540


F(3, 536) = 104.30
Model 1181.36981 3 393.789935 Prob > F = 0.0000
Residual 2023.61353 536 3.77539837 R-squared = 0.3686
Adj R-squared = 0.3651
Total 3204.98333 539 5.94616574 Root MSE = 1.943

S Coef. Std. Err. t P>|t| [95% Conf. Interval]

ASVABC .1257087 .0098533 12.76 0.000 .1063528 .1450646


SM .0492424 .0390901 1.26 0.208 -.027546 .1260309
SF .1076825 .0309522 3.48 0.001 .04688 .1684851
_cons 5.370631 .4882155 11.00 0.000 4.41158 6.329681

1. For Mother's Education (SM):

In the regression that includes both parents' schooling, the coefficient on SM (mother's education) is statistically insignificant (p = 0.208), meaning that, once the father's schooling is controlled for, the mother's education does not have a significant effect on the respondent's schooling (S). On this evidence, we do not find support for the view that "if you educate a female, you educate a nation"; the mother's education does not appear to play a large role in shaping the individual's outcome in this specification.

2. For Father's Education (SF):

We conduct a hypothesis test for the father's education, where:
Null hypothesis (H₀): "If you educate a male, you don't educate an individual."
Alternative hypothesis (H₁): "If you educate a male, you educate an individual."
The p-value for SF is 0.001, which is less than the 0.05 significance level. We therefore reject the null hypothesis, indicating that educating a male has a significant positive effect on the individual's outcome.

In conclusion, the results suggest that the father's education plays the more significant role in shaping an individual's schooling, while the impact of the mother's education appears less significant in this context. Thus, the statement "if you educate a male, you educate an individual" is supported by these data, but the statement "if you educate a female, you educate a nation" is not.


F. Regress EARNINGS on S and EXP (total out-of-school work experience in years), interpret the results and perform t tests
. regress EARNINGS S EXP

Source SS df MS Number of obs = 540


F(2, 537) = 67.54
Model 22513.6473 2 11256.8237 Prob > F = 0.0000
Residual 89496.5838 537 166.660305 R-squared = 0.2010
Adj R-squared = 0.1980
Total 112010.231 539 207.811189 Root MSE = 12.91

EARNINGS Coef. Std. Err. t P>|t| [95% Conf. Interval]

S 2.678125 .2336497 11.46 0.000 2.219146 3.137105


EXP .5624326 .1285136 4.38 0.000 .3099816 .8148837
_cons -26.48501 4.27251 -6.20 0.000 -34.87789 -18.09213

the analysis involves testing the significance of the intercept (β0) and two slope coefficients
(β1 and β2) using t-tests:

1. Intercept (β0):

Null hypothesis (H₀): β0 = 0

Alternative hypothesis (H₁): β0 ≠ 0

|t*| = 6.20 (the estimated t-statistic is -6.20), which is greater than the critical t-value (tc) of 1.96, so we reject H₀ and conclude that β0 is statistically significant.

Interpretation: The intercept suggests that when all explanatory variables are zero, predicted earnings are about -26.49, a value with no meaningful economic interpretation on its own.

2. Slope for S (β1):

Null hypothesis (H₀): β1 = 0

Alternative hypothesis (H₁): β1 ≠ 0

t* = 11.46, which is greater than the critical t-value (tc) of 1.96, so we reject H₀ and conclude
that β1 is statistically significant.

Interpretation: For every unit increase in S, earnings (Y) increase by 2.678 units on average,

holding other factors constant.


3. Slope for EXP (β2):

Null hypothesis (H₀): β2 = 0

Alternative hypothesis (H₁): β2 ≠ 0

t* = 4.38, which is greater than the critical t-value (tc) of 1.96, so we reject H₀ and conclude that β2 is statistically significant.

Interpretation: For every additional year of EXP, earnings (Y) increase by 0.562 units on average, holding other factors constant. All coefficients are statistically significant, indicating that the explanatory variables (S and EXP) have meaningful relationships with earnings (Y).
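The same conclusions can be obtained directly in Stata with the test command, which after regress reports an F statistic equal to the square of the corresponding t statistic (a sketch):

. regress EARNINGS S EXP
. test S
. test EXP
* Joint significance of both slope coefficients:
. test S EXP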
