CH 5 2023 Eonometrics For Acct and Finance
5.1 Introduction
Standard linear regression models are applied when the dependent variable is
continuous such as asset returns, rental value of properties, saving, expenditure, output,
etc. However, there are many situations in which the dependent variable in a regression
equation simply represents a discrete choice assuming only a limited number of
values. Models involving dependent variables of this kind are called categorical
(limited, discrete or qualitative) dependent variable models. In such models, the
values that the dependent variables may take are limited to certain integers (e.g. 0, 1, 2,
3, and 4) or even binary (only 0 or 1).
Categorical dependent variable models may be used when a decision maker faces a
choice among a set of alternatives meeting the following criteria:
The number of choices is finite.
The choices are mutually exclusive (the person chooses only one of the alternatives).
The choices are exhaustive (all possible alternatives are included).
The first criterion is a binding one. We can always refine the available choices so that
they can satisfy the last two criteria. Throughout our discussion we shall restrict
ourselves to cases of qualitative choice where the set of alternatives is binary. For the
sake of convenience the dependent variable is given a value of 0 or 1.
Example: Suppose we want to develop a model for prediction of business failure. Here
the appropriate form for the dependent variable would be a dummy variable taking the
values 0 and 1 since there are only two possible outcomes:
$$Y_i = \begin{cases} 1 & \text{if the } i\text{th company fails} \\ 0 & \text{otherwise} \end{cases}$$
The independent variables that affect the success or failure of companies (that is,
indicators of financial status) may be the working capital to total assets ratio, the
retained earnings to total assets ratio, the earnings before interest and taxes to total
assets ratio, and the sales to total assets ratio. Thus, we would predict the probability
of failure of companies on the basis of these explanatory variables.
Suppose, for example, we wanted to fit a model relating defaults on a bank loan (failing
to pay back loans on time) to income. The dependent variable ($Y_i$) is a dummy variable
taking the value 1 if the $i$th individual defaults and 0 otherwise:
CHAPTER V: Categorical Dependent Variable Models 74
$$Y_i = \beta_1 + \beta_2 X_{2i} + \varepsilon_i \qquad (3)$$
where $X_{2i}$ is the income of the $i$th individual. The probability that the $i$th
individual defaults on a bank loan, $P_i = \text{Prob}(Y_i = 1)$, is called the response
probability. Similarly, the probability that the $i$th individual does not default on a
bank loan is given by $1 - P_i = \text{Prob}(Y_i = 0)$.
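From equation (3) we can see that $\varepsilon_i$ assumes only two values:
    Value of $\varepsilon_i$                                  Probability
    $1 - (\beta_1 + \beta_2 X_{2i})$   (when $Y_i = 1$)       $P_i$
    $-(\beta_1 + \beta_2 X_{2i})$      (when $Y_i = 0$)       $1 - P_i$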
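In standard regression analysis, we assume that $E(\varepsilon_i) = 0$. Weighting each of the
two possible values of the error term by its probability, this assumption gives:
$$P_i\,[1 - (\beta_1 + \beta_2 X_{2i})] + (1 - P_i)\,[-(\beta_1 + \beta_2 X_{2i})] = 0$$
$$P_i - (\beta_1 + \beta_2 X_{2i}) = 0$$
$$P_i = \text{Prob}(Y_i = 1) = \beta_1 + \beta_2 X_{2i}$$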
The fitted or estimated probability of defaulting for the ith individual is thus:
$$\hat{P}_i = \hat{\beta}_1 + \hat{\beta}_2 X_{2i}$$
75 Applied Econometrics for Accounting and Finance
where $\hat{\beta}_1$ and $\hat{\beta}_2$ are the OLS estimators. The slope estimates for the
linear probability model (LPM) can be interpreted as the change in the probability that the
dependent variable will be equal to one for a one-unit change in a given explanatory variable.
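As a rough illustration of how such an LPM could be estimated, the sketch below applies the textbook OLS formulas for a single regressor to a small made-up data set. The data values and the helper name `fit_lpm` are assumptions introduced only for this example; they do not come from the chapter.

```python
# Hypothetical data (not from the text): income in thousands of birr
# and a default dummy (1 = default, 0 = no default).
incomes = [10, 20, 30, 40, 50, 60]
defaults = [1, 0, 1, 0, 0, 0]


def fit_lpm(x, y):
    """Return (intercept, slope) of the OLS line y = b1 + b2 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of cross-products and squares in deviation form.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


b1_hat, b2_hat = fit_lpm(incomes, defaults)

# Fitted values are read as estimated probabilities of default.
fitted = [b1_hat + b2_hat * xi for xi in incomes]

print(f"intercept = {b1_hat:.4f}, slope = {b2_hat:.4f}")
print("fitted probabilities:", [round(p, 4) for p in fitted])
```

Here the slope is read as the estimated change in the probability of default for each additional birr 1,000 of income.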
Example: Suppose the fitted model relating defaults on a bank loan and income (in
thousands of birr) is given by:
$$\hat{P}_i = 0.15 - 0.0025 X_{2i}$$
This model suggests that for every birr 1,000 increase in income, the probability of
defaulting of an individual, $P_i = \text{Prob}(Y_i = 1)$, decreases by 0.0025 (that is, by
0.25 percentage points). For instance, an individual whose income is birr 10,000 will have a
$0.15 - 0.0025(10) = 0.125$ (or 12.5%) probability of defaulting.
The problem with this model is that for any individual whose income is more than birr
60,000, the model-predicted probability of defaulting is negative. For instance, the
probability of defaulting of an individual whose income is birr 80,000 is:
$$0.15 - 0.0025(80) = -0.05$$
Clearly, such predictions cannot be allowed to stand since we know that the probability
of an event is always a number between 0 and 1 (inclusive), that is, probabilities can
never be negative. The LPM can also produce probabilities that are greater than one.
Thus, the use of the LPM when the dependent variable is categorical may lead to
nonsense probabilities.
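The arithmetic in this example can be reproduced with a few lines of code. The sketch below uses the fitted coefficients quoted above (0.15 and -0.0025); the function name `lpm_probability` and the choice of income levels are assumptions made only for illustration.

```python
def lpm_probability(income_thousands, intercept=0.15, slope=-0.0025):
    """Predicted probability of default from the fitted LPM in the example."""
    return intercept + slope * income_thousands


# Income is measured in thousands of birr, as in the example.
for income in (10, 40, 80):
    p = lpm_probability(income)
    status = "valid" if 0.0 <= p <= 1.0 else "outside [0, 1]"
    print(f"income = birr {income},000: predicted P = {p:.3f} ({status})")
```

Nothing in the linear form restricts the prediction to the unit interval, which is why the income of birr 80,000 is flagged.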
In the logit model, $e$ denotes the base of the natural logarithm. For a logit model with a
single explanatory variable ($X_{2i}$), the response probability is given by:
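$$P_i = P(Y_i = 1) = \frac{e^{\beta_1 + \beta_2 X_{2i}}}{1 + e^{\beta_1 + \beta_2 X_{2i}}}$$
$$P(Y_i = 0) = 1 - P_i = 1 - \frac{e^{\beta_1 + \beta_2 X_{2i}}}{1 + e^{\beta_1 + \beta_2 X_{2i}}} = \frac{1}{1 + e^{\beta_1 + \beta_2 X_{2i}}}$$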
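To see that these two probabilities always lie strictly between 0 and 1 and sum to one, the following sketch evaluates them for a few income levels. The coefficient values and the function name `response_probabilities` are hypothetical and chosen only for illustration; the two branches are an implementation detail that avoids numerical overflow for large values of the linear index.

```python
import math


def response_probabilities(x, beta1, beta2):
    """Return (P(Y=1), P(Y=0)) for a logit model with one regressor."""
    z = beta1 + beta2 * x
    if z >= 0:
        # Divide numerator and denominator by e^z to avoid overflow.
        p1 = 1.0 / (1.0 + math.exp(-z))
        p0 = math.exp(-z) / (1.0 + math.exp(-z))
    else:
        p1 = math.exp(z) / (1.0 + math.exp(z))
        p0 = 1.0 / (1.0 + math.exp(z))
    return p1, p0


# Hypothetical coefficients (assumed, not estimated from any data).
beta1, beta2 = -1.0, -0.05

for income in (0, 10, 80, 500):
    p1, p0 = response_probabilities(income, beta1, beta2)
    print(f"X = {income}: P(Y=1) = {p1:.6f}, P(Y=0) = {p0:.6f}")
```

Unlike the linear probability model, no value of the regressor can push the predicted probability below zero or above one.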
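For the logit model, the ratio of the response probability to the non-response
probability,
$$\frac{P_i}{1 - P_i} = e^{\beta_1 + \beta_2 X_{2i}},$$
is called the odds of $Y_i = 1$ against $Y_i = 0$. The natural logarithm of the odds, called
the log-odds or logit, is given by:
$$\ln\left(\frac{P_i}{1 - P_i}\right) = \beta_1 + \beta_2 X_{2i}$$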
This is nothing but a simple linear regression model where the dependent variable is the
log-odds instead of the observed values of Yi (which are all zeros and ones). In a
similar fashion, the multiple logistic regression model is given by:
$$\ln\left(\frac{P_i}{1 - P_i}\right) = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_k X_{ki}$$
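The link between the linear index, the response probability and the log-odds can be checked numerically. In the sketch below the coefficients and regressor values are hypothetical, as are the helper names `log_odds` and `probability_from_log_odds`; the point is only that taking the log-odds of the implied probability returns the linear index.

```python
import math


def log_odds(p):
    """Natural logarithm of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))


def probability_from_log_odds(z):
    """Invert the logit: P = e^z / (1 + e^z), written as 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical values: beta1 (constant), beta2, beta3 and one observation.
betas = [-2.0, 0.8, -0.03]
x_row = [1.0, 1.5, 20.0]  # the leading 1.0 multiplies the constant

z = sum(b * x for b, x in zip(betas, x_row))  # linear index
p = probability_from_log_odds(z)              # response probability
odds = p / (1.0 - p)

print(f"linear index = {z:.4f}")
print(f"P(Y=1) = {p:.4f}, odds = {odds:.4f}, log-odds = {log_odds(p):.4f}")
```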
5.4 Illustration
The following EViews output is a fitted logistic regression model in which the
dependent variable is:
For categorical (qualitative) explanatory variables, the category that is assigned the
value zero is the reference category. When interpreting results, all comparisons are
made with reference to this category.
Is the model a good fit? To answer this we can use the Hosmer-Lemeshow test.
The null hypothesis of this test is that the model fits the data well. As can be seen from
the table below, the chi-square test statistic is insignificant (the p-value of 0.3867
exceeds 5%). Thus, we fail to reject the null hypothesis: there is no evidence that the
model fits the data poorly.
Goodness-of-Fit Evaluation for Binary Specification
Hosmer-Lemeshow Test
H-L Statistic 8.4942 Prob. Chi-Sq(8) 0.3867
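The reported p-value can be checked from the statistic and its degrees of freedom. The sketch below uses the closed-form upper-tail probability of a chi-square distribution with an even number of degrees of freedom so that it runs with the standard library alone; the function name `chi2_sf_even_df` is an assumption for this example, and in practice a statistics library routine such as `scipy.stats.chi2.sf` would normally be used instead.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square with even df.

    For df = 2m the tail has the closed form
    exp(-x/2) * sum_{j=0}^{m-1} (x/2)^j / j!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    terms = sum(half ** j / math.factorial(j) for j in range(df // 2))
    return math.exp(-half) * terms


# Values taken from the Hosmer-Lemeshow output above.
hl_statistic = 8.4942
df = 8

p_value = chi2_sf_even_df(hl_statistic, df)
print(f"p-value = {p_value:.4f}")
print("reject H0 at 5%?", p_value < 0.05)
```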
We can also use the likelihood ratio (LR) test to assess the overall model fit. This tests
the joint null hypothesis that all coefficients except the constant are zero:
$$H_0: \beta_2 = \beta_3 = \cdots = \beta_8 = 0$$
$$H_1: \text{at least one } \beta_j \neq 0, \quad j = 2, 3, \ldots, 8$$
The p-value of the test, Prob(LR statistic), is less than 0.001. Thus, we reject the null
hypothesis at the one percent level of significance and conclude that the explanatory
variables are jointly statistically significant (that is, at least one of their
coefficients is different from zero).
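For readers who want to see the mechanics, the LR statistic is twice the difference between the log-likelihood of the full model and that of the constant-only model, and under the null hypothesis it is compared with a chi-square distribution whose degrees of freedom equal the number of restrictions (seven here). The log-likelihood values in the sketch below are purely hypothetical, since the estimation output is not reproduced in this section; 18.475 is the tabulated 1% critical value of the chi-square distribution with 7 degrees of freedom.

```python
# Hypothetical log-likelihoods (assumed for illustration only).
loglik_restricted = -250.0    # constant-only model
loglik_unrestricted = -200.0  # model with all seven explanatory variables

# LR statistic: twice the gain in log-likelihood from adding the regressors.
lr_statistic = 2.0 * (loglik_unrestricted - loglik_restricted)

# Tabulated 1% critical value of chi-square with 7 degrees of freedom.
CRITICAL_VALUE_1PCT_DF7 = 18.475

reject_null = lr_statistic > CRITICAL_VALUE_1PCT_DF7
print(f"LR statistic = {lr_statistic:.3f}")
print("reject H0 at the 1% level?", reject_null)
```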
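The response probability $P_i = \text{Prob}(Y_i = 1)$ refers to the probability that the $i$th
individual defaults on a bank loan. For a categorical explanatory variable, a negative
coefficient means that the probability of defaulting is lower for the non-reference
category as compared to the reference category (the category that is assigned the value
zero). On the other hand, if the coefficient is positive, then the probability of
defaulting is higher for the non-reference category as compared to the reference category.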
Interpretation of estimated regression coefficients
Debt-to-Income ratio
The coefficient of debt-to-income ratio is positive. This implies that an increase in
the debt-to-income ratio increases the probability of defaulting, keeping all other
covariates fixed.
Household income
The coefficient of income is negative. Thus, an increase in income decreases the
probability of defaulting, keeping all other covariates fixed.
Number of residents in the household
Since the coefficient is positive, an increase in the number of residents in the
household leads to an increase in the probability of defaulting on a bank loan, keeping
all other covariates fixed.
Level of education
The positive coefficient implies that an increase in the level of education of an
individual increases the probability of defaulting, keeping all other covariates fixed.
Home ownership status
The coefficient of home ownership is positive. Since the reference category is Own
(reside in own house), the odds of defaulting are higher for the non-reference
category (those who reside in a rented house). The odds ratio is calculated as
$\exp(\hat{\beta}_j) = \exp(0.319665) = 1.377$. The interpretation is that the odds of
defaulting for those who reside in rented houses are 1.377 times the odds for those
who reside in their own houses (that is, about 37.7% higher), keeping all other
covariates fixed.
Retired
The coefficient of retired (whether the individual is retired or not) is positive, and
the odds ratio is $\exp(\hat{\beta}_j) = \exp(3.095) \approx 22.081$. Since the reference
category is Yes (retired), the odds of defaulting for those who are not retired are
about 22 times the odds for those who are retired, keeping all other covariates fixed.
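The odds ratios quoted for the two categorical variables can be reproduced by exponentiating the coefficients. The sketch below uses the coefficient values as printed in the text; the dictionary keys are labels invented for this example. With the rounded coefficient 3.095 the result is about 22.09, slightly above the reported 22.081, presumably because the reported figure was computed from a coefficient carried to more decimal places (an assumption, since the full output is not shown here).

```python
import math

# Coefficients as quoted in the text; the labels are illustrative only.
coefficients = {
    "home ownership (rented vs. own)": 0.319665,
    "retired (not retired vs. retired)": 3.095,
}

# Odds ratio = exp(coefficient): odds for the non-reference category
# divided by odds for the reference category.
odds_ratios = {name: math.exp(b) for name, b in coefficients.items()}

for name, ratio in odds_ratios.items():
    pct_change = (ratio - 1.0) * 100.0
    print(f"{name}: odds ratio = {ratio:.3f} "
          f"({pct_change:.1f}% higher odds than the reference category)")
```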