ASSIGNMENT ON PROBIT MODEL

• In statistics, a probit model is a type of regression where the dependent variable can take only two values, for example married or not married.
• The word is a portmanteau, coming from probability + unit.
• The purpose of the model is to estimate the probability that an observation with particular characteristics will fall into a specific one of the categories; moreover, classifying observations based on their predicted probabilities is a type of binary classification model.

• A probit model is a popular specification for a binary response model.
• As such it treats the same set of problems as does logistic regression, using similar techniques.
• When viewed in the generalized linear model framework, the probit model employs a probit link function. It is most often estimated using the maximum likelihood procedure, such an estimation being called a probit regression.
CONCEPTUAL FRAMEWORK
Suppose a response variable Y is binary, that is, it can have
only two possible outcomes which we will denote as 1 and 0.
For example, Y may represent presence/absence of a certain
condition, success/failure of some device, a yes/no answer on
a survey, etc. We also have a vector of regressors X, which
are assumed to influence the outcome Y. Specifically, we
assume that the model takes the form:

Pr(Y = 1 | X) = Φ(X'β)

where Pr denotes probability and Φ is the Cumulative
Distribution Function (CDF) of the standard normal
distribution. The parameters β are typically estimated by
maximum likelihood.

It is possible to motivate the probit model as a latent variable
model. Suppose there exists an auxiliary random variable

Y* = X'β + ε, where ε ~ N(0, 1).

Then Y can be viewed as an indicator for whether this latent
variable is positive:

Y = 1 if Y* > 0, and Y = 0 otherwise.
The use of the standard normal distribution causes no loss of
generality compared with the use of a normal distribution
with an arbitrary mean and standard deviation, because
adding a fixed amount to the mean can be compensated by
subtracting the same amount from the intercept, and
multiplying the standard deviation by a fixed amount can be
compensated by multiplying the weights by the same
amount.

To see that the two models are equivalent, note that

Pr(Y = 1 | X) = Pr(Y* > 0) = Pr(X'β + ε > 0) = Pr(ε > -X'β) = Pr(ε < X'β) = Φ(X'β),

using the symmetry of the standard normal distribution about zero.

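To make the latent-variable formulation concrete, here is a minimal simulation sketch in R; the sample size, coefficient values and object names are illustrative assumptions, not part of the assignment. Fitting a probit GLM to the simulated 0/1 outcome recovers the coefficients used to generate the latent variable:

# Sketch: simulate the latent-variable representation of a probit model
set.seed(1)
n      <- 5000
x      <- rnorm(n)
y_star <- -0.5 + 1.2 * x + rnorm(n)   # latent Y* = X'beta + eps, eps ~ N(0, 1)
y      <- as.numeric(y_star > 0)      # observed Y = 1 if Y* > 0, else 0

# A probit GLM fitted to y should recover -0.5 and 1.2 up to sampling error
fit <- glm(y ~ x, family = binomial(link = "probit"))
coef(fit)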
MODEL ESTIMATION THROUGH MAXIMUM LIKELIHOOD METHOD

Suppose the data set {y_i, x_i}, i = 1, ..., n, contains n independent
statistical units corresponding to the model above.

For a single observation, conditional on the vector of inputs of
that observation, we have:

Pr(y_i = 1 | x_i) = Φ(x_i'β)

where x_i is a (K × 1) vector of inputs, and β is a (K × 1) vector of
coefficients.

The likelihood of a single observation (y_i, x_i) is then

L(β; y_i, x_i) = Φ(x_i'β)^(y_i) [1 - Φ(x_i'β)]^(1 - y_i).

In fact, if y_i = 1, then L(β; y_i, x_i) = Φ(x_i'β), and if y_i = 0, then
L(β; y_i, x_i) = 1 - Φ(x_i'β).

Since the observations are independent and identically
distributed, the likelihood of the entire sample, or the joint
likelihood, will be equal to the product of the likelihoods of the
single observations:

L(β; Y, X) = Π_{i=1}^{n} Φ(x_i'β)^(y_i) [1 - Φ(x_i'β)]^(1 - y_i).

The joint log-likelihood function is thus:

ln L(β; Y, X) = Σ_{i=1}^{n} [ y_i ln Φ(x_i'β) + (1 - y_i) ln(1 - Φ(x_i'β)) ].

The estimator which maximizes this function will be
consistent, asymptotically normal and efficient provided that
E[XX'] exists and is not singular. It can be shown that this log-
likelihood function is globally concave in β, and therefore
standard numerical algorithms for optimization will converge
rapidly to the unique maximum.
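As a rough illustration of this estimation step, the probit log-likelihood can be written out and maximized numerically in R; the simulated data and all object names below are illustrative assumptions, and in practice glm() with a probit link does the same job:

# Sketch: probit log-likelihood and its numerical maximization
probit_loglik <- function(beta, y, X) {
  eta <- X %*% beta
  # y_i * log Phi(x_i'beta) + (1 - y_i) * log(1 - Phi(x_i'beta)),
  # with log(1 - Phi(eta)) computed as log Phi(-eta) for numerical stability
  sum(y * pnorm(eta, log.p = TRUE) + (1 - y) * pnorm(-eta, log.p = TRUE))
}

set.seed(2)
n <- 2000
X <- cbind(1, rnorm(n))                        # intercept plus one regressor
y <- rbinom(n, 1, pnorm(X %*% c(-0.5, 1.2)))   # outcomes drawn from a probit model

# Global concavity in beta means a standard optimizer reaches the unique maximum
opt <- optim(par    = rep(0, ncol(X)),
             fn     = function(b) -probit_loglik(b, y, X),  # minimize the negative
             method = "BFGS")
opt$par   # close to coef(glm(y ~ X[, 2], family = binomial(link = "probit")))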

The asymptotic distribution for β̂ is given by

√n (β̂ - β) →d N(0, Ω⁻¹), where Ω = E[ φ(X'β)² / (Φ(X'β)(1 - Φ(X'β))) · XX' ],

and φ = Φ' is the Probability Density Function (PDF) of the
standard normal distribution.
PROBIT MODEL OR LOGIT MODEL?

The logit and probit predictors can be written as:

ŷ = f(α + βx)

Logit and probit differ in how they define f (∗).


The logit model uses the cumulative distribution function of
the logistic distribution.
The probit model uses the cumulative distribution function of
the standard normal distribution to define f (∗).
Both functions will take any number and rescale it to fall
between 0 and 1.
Hence, whatever α + βx equals, it can be transformed by the
function to yield a predicted probability. Any function that
returns a value between zero and one could work.
But there is a deeper theoretical model underpinning logit
and probit that requires the function to be based on a
probability distribution.
The logistic and standard normal CDFs turn out to be
mathematically convenient and are programmed into just
about any general-purpose statistical package.
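A small sketch in R makes the comparison visible; the grid of values is arbitrary, and plogis() and pnorm() are the logistic and standard normal CDFs respectively:

# Sketch: both link functions squeeze alpha + beta*x into (0, 1)
eta <- seq(-4, 4, by = 0.1)                 # values of alpha + beta*x

p_logit  <- plogis(eta)                     # CDF of the logistic distribution
p_probit <- pnorm(eta)                      # CDF of the standard normal distribution

plot(eta, p_logit, type = "l", xlab = "alpha + beta*x",
     ylab = "Predicted probability")
lines(eta, p_probit, lty = 2)
legend("topleft", legend = c("logit", "probit"), lty = c(1, 2))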
Is logit better than probit, or vice versa? Both methods will
yield similar (though not identical) inferences. Logit – also
known as logistic regression – is more popular in the health
sciences, such as epidemiology, partly because coefficients can
be interpreted in terms of odds ratios. Probit models can be
generalized to account for non-constant error variances in
more advanced econometric settings (known as
heteroskedastic probit models) and hence are used in some
contexts by economists and political scientists. If these more
advanced applications are not relevant, then it does not
matter which method you choose.
PRACTICAL APPLICATION
Research Question: We want to estimate the likelihood of
admission based on GRE score, GPA and the rank of the
institution.

• This data set has a binary response (outcome, dependent) variable called admit.
• There are three predictor variables: gre, gpa and rank. We will treat the variables gre and gpa as continuous.
• The variable rank takes on the values 1 through 4. Institutions with a rank of 1 have the highest prestige, while those with a rank of 4 have the lowest. A sketch of the model fit in R is given after this list.
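A minimal sketch of the fit reported below, assuming the admissions data sit in a data frame called binary with columns admit, gre, gpa and rank; the file name and the object name myprobit are illustrative assumptions:

# Sketch: fit the probit model for admission
binary <- read.csv("binary.csv")      # assumed location of the admissions data
binary$rank <- factor(binary$rank)    # rank is categorical: 1 (highest) to 4 (lowest)

myprobit <- glm(admit ~ gre + gpa + rank,
                family = binomial(link = "probit"),
                data   = binary)
summary(myprobit)                     # produces the output shown below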
REGRESSION RESULTS:

Call:
glm(formula = admit ~ gre + gpa + rank, family = binomial(link = "probit"),
    data = binary)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.6163  -0.8710  -0.6389   1.1560   2.1035

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.386836   0.673946  -3.542 0.000398 ***
gre          0.001376   0.000650   2.116 0.034329 *
gpa          0.477730   0.197197   2.423 0.015410 *
rank2       -0.415399   0.194977  -2.131 0.033130 *
rank3       -0.812138   0.208358  -3.898 9.71e-05 ***
rank4       -0.935899   0.245272  -3.816 0.000136 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 499.98  on 399  degrees of freedom
Residual deviance: 458.41  on 394  degrees of freedom
AIC: 470.41
Number of Fisher Scoring iterations: 4
Interpretation:

• In the output above, the first thing we see is the call; this is R reminding us what model we ran, what options we specified, etc.
• Next we see the deviance residuals, which are a measure of model fit. This part of the output shows the distribution of the deviance residuals for individual cases used in the model. Below we discuss how to use summaries of the deviance statistic to assess model fit.
• The next part of the output shows the coefficients, their standard errors, the z-statistic (sometimes called a Wald z-statistic), and the associated p-values.
• gre, gpa, and the three terms for rank are all statistically significant.
• The probit regression coefficients give the change in the z-score or probit index for a one unit change in the predictor.
• For a one unit increase in GRE, the z-score increases by 0.001.
• For each one unit increase in GPA, the z-score increases by 0.478.
• The indicator variables for rank have a slightly different interpretation. For example, having attended an undergraduate institution with a rank of 2, versus an institution with a rank of 1, decreases the z-score by 0.415.
WALD TEST

Explanation:

• The Wald test (also called the Wald Chi-Squared Test) is a way to find out if the explanatory variables in a model are significant.
• "Significant" means that they add something to the model; variables that add nothing can be deleted without affecting the model in any meaningful way.

• The null hypothesis for the test is: some parameter = some value. For example, you might be studying whether weight is affected by eating junk food twice a week. "Weight" would be your parameter. The value could be zero (indicating that you don't think weight is affected by eating junk food).

• If the null hypothesis is not rejected, it suggests that the variables in question can be removed without much harm to the model fit.
• If the Wald test shows that the parameters for certain explanatory variables are zero, you can remove those variables from the model.
• If the test shows the parameters are not zero, you should keep the variables in the model.
RELEVANCE OF WALD TEST:

• It is sometimes said that the prestige of the institution may not be well expressed in terms of the rank of the institution.
• Hence, we need to check the overall impact of rank on the model through a Wald test, as sketched below.
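One way to run this joint test is with wald.test() from the aod package; this is a hedged sketch assuming the package is installed, that the fitted model object is called myprobit (as in the earlier sketch), and that the rank indicators are its 4th to 6th coefficients, as in the output above:

# Sketch: joint Wald test that the three rank coefficients are all zero
library(aod)

wald.test(b = coef(myprobit), Sigma = vcov(myprobit), Terms = 4:6)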
RESULTS

Wald test:
----------

Chi-squared test:
X2 = 21.4, df = 3, P(> X2) = 8.9e-05

Interpretation: The chi-squared test statistic of 21.4 with
three degrees of freedom is associated with a p-value of less
than 0.001, indicating that the overall effect of rank is
statistically significant.

PREDICTING PROBABILITIES:

• We have presented the predicted probabilities graphically for better understanding.
• Four plots were created, one for each level of GPA, i.e., 2.5, 3, 3.5 and 4.
• The color of the lines indicates the rank for which the predicted probabilities were computed. A sketch of how such predictions can be generated is given below.
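A sketch of how such predicted-probability curves can be produced in R, using the hypothetical myprobit fit from above; the GRE grid, the fixed GPA of 3.5 and the plotting choices are illustrative assumptions:

# Sketch: predicted admission probabilities over GRE, one line per rank, GPA fixed
newdata <- expand.grid(gre  = seq(200, 800, by = 10),
                       gpa  = 3.5,
                       rank = factor(1:4))
newdata$prob <- predict(myprobit, newdata = newdata, type = "response")

plot(NULL, xlim = c(200, 800), ylim = c(0, 1),
     xlab = "GRE", ylab = "Predicted probability of admission")
for (r in 1:4) {
  with(subset(newdata, rank == r), lines(gre, prob, col = r))  # color marks the rank
}
legend("topleft", legend = paste("rank", 1:4), col = 1:4, lty = 1)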
TESTING FOR FIT OF THE MODEL

• Herein, we adopt a new approach to test the fit of the model, wherein we take the overall fit of the model into consideration.
• This test asks whether the model with predictors fits significantly better than a model with just an intercept (i.e. a null model).
• The test statistic is the difference between the residual deviance for the model with predictors and the null model.
• The test statistic is distributed chi-squared with degrees of freedom equal to the difference in degrees of freedom between the current and the null model (i.e. the number of predictor variables in the model).
• To find the difference in deviance for the two models (i.e. the test statistic) we can compute the change in deviance and test it using a chi-squared test, the change in deviance being distributed as chi-squared on the change in degrees of freedom, as sketched below.
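A sketch of this change-in-deviance (likelihood ratio) test in R, again assuming the fitted model object is called myprobit:

# Sketch: compare the fitted model against the intercept-only (null) model
with(myprobit, null.deviance - deviance)     # change in deviance
with(myprobit, df.null - df.residual)        # change in degrees of freedom
with(myprobit, pchisq(null.deviance - deviance,
                      df.null - df.residual,
                      lower.tail = FALSE))   # chi-squared p-value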
RESULTS

CHANGE IN DEVIANCE: 41.56335
DEGREES OF FREEDOM: 5
CHI-SQUARED P-VALUE: 7.218932e-08

INTERPRETATION

• The chi-square of 41.56 with 5 degrees of freedom and an associated p-value of less than 0.001 tells us that our model as a whole fits significantly better than an empty model.
• This is also called a likelihood ratio test.
CONCLUSIONS:

• Probit models simply use the cumulative distribution function of the standard normal (Gaussian) distribution rather than the logistic function for calculating the probability of being in one category or not.
• Graphical analysis of predicted values can help in better understanding of the model and results.
• The theoretical and statistical analysis must go hand in hand while developing a model, otherwise false conclusions may be drawn.
• Theoretical evidence and justification for the omission or inclusion of variables in the model should be supported with statistical evidence.
• While testing the fit of the model, the overall fit of the model should be considered, so that every aspect of the model is considered and is in line with the final conclusions drawn.
