Assignment On Probit Model

• In statistics, a probit model is a type of regression where
the dependent variable can take only two values, for
example married or not married.
• The word is a portmanteau of probability + unit.
• The purpose of the model is to estimate the probability
that an observation with particular characteristics will
fall into a specific one of the two categories. Classifying
observations by their predicted probabilities then yields
a binary classification model.

• A probit model is a popular specification for a binary
response model.
• As such, it treats the same set of problems as logistic
regression, using similar techniques.
• When viewed in the generalized linear model
framework, the probit model employs a probit link
function. It is most often estimated by maximum
likelihood; such an estimation is called a probit
regression.
Suppose a response variable Y is binary, that is, it can have
only two possible outcomes, which we will denote as 1 and 0.
For example, Y may represent presence/absence of a certain
condition, success/failure of some device, answer yes/no on
a survey, etc. We also have a vector of regressors X, which
are assumed to influence the outcome Y. Specifically, we
assume that the model takes the form:

$$ \Pr(Y = 1 \mid X) = \Phi(X'\beta) $$

where Pr denotes probability and Φ is the cumulative
distribution function (CDF) of the standard normal
distribution. The parameters β are typically estimated by
maximum likelihood.
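
As a minimal sketch, the probit success probability is the standard normal CDF Φ evaluated at the linear index X'β. The coefficients and regressor values below are made up purely for illustration; `pnorm` is base R's standard normal CDF.

```r
# Hypothetical coefficients and regressors (illustrative values only)
beta <- c(intercept = -1.0, x1 = 0.5, x2 = 0.25)
x    <- c(1, 2.0, 1.0)            # the leading 1 multiplies the intercept

index  <- sum(x * beta)           # linear index X'beta
p_one  <- pnorm(index)            # Pr(Y = 1 | X) = Phi(X'beta)
p_zero <- 1 - p_one               # Pr(Y = 0 | X)

print(c(index = index, p_one = p_one, p_zero = p_zero))
```

Whatever value the index takes, `pnorm` maps it into the interval (0, 1), so the result can be read as a probability.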

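It is possible to motivate the probit model as a latent variable
model. Suppose there exists an auxiliary random variable

$$ Y^{*} = X'\beta + \varepsilon, $$

where ε ~ N(0, 1). Then Y can be viewed as an indicator for
whether this latent variable is positive:

$$ Y = \begin{cases} 1 & \text{if } Y^{*} > 0 \\ 0 & \text{otherwise} \end{cases} $$

The use of the standard normal distribution causes no loss of
generality compared with the use of a normal distribution
with an arbitrary mean and standard deviation, because
adding a fixed amount to the mean can be compensated by
subtracting the same amount from the intercept, and
multiplying the standard deviation by a fixed amount can be
compensated by multiplying the weights by the same
amount.

To see that the two models are equivalent, note that

$$ \Pr(Y = 1 \mid X) = \Pr(Y^{*} > 0) = \Pr(X'\beta + \varepsilon > 0) = \Pr(\varepsilon > -X'\beta) = \Pr(\varepsilon < X'\beta) = \Phi(X'\beta), $$

where the second-to-last equality uses the symmetry of the
normal distribution around zero.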
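
The latent-variable construction can also be checked by simulation. The sketch below assumes a single hypothetical value of the index X'β (0.4, chosen arbitrarily), draws standard normal errors, and compares the share of positive latent values with Φ(X'β).

```r
set.seed(1)                        # arbitrary seed for reproducibility
n      <- 100000
index  <- 0.4                      # hypothetical value of X'beta
eps    <- rnorm(n)                 # epsilon ~ N(0, 1)
y_star <- index + eps              # latent variable Y*
y      <- as.integer(y_star > 0)   # Y = 1 when the latent variable is positive

share_ones <- mean(y)              # simulated Pr(Y = 1 | X)
theory     <- pnorm(index)         # Phi(X'beta)
print(c(simulated = share_ones, theoretical = theory))
```

The two numbers should agree up to simulation noise, which shrinks as `n` grows.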



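Suppose the data set $\{y_i, x_i\}_{i=1}^{n}$ contains n
independent statistical units corresponding to the model above.

For a single observation, conditional on the vector of
inputs of that observation, we have:

$$ \Pr(y_i = 1 \mid x_i) = \Phi(x_i'\beta), \qquad \Pr(y_i = 0 \mid x_i) = 1 - \Phi(x_i'\beta), $$

where x_i is a K×1 vector of inputs, and β is a K×1 vector of
coefficients.

The likelihood of a single observation (y_i, x_i) is then

$$ L(\beta; y_i, x_i) = \Phi(x_i'\beta)^{y_i} \, [1 - \Phi(x_i'\beta)]^{1 - y_i}. $$

In fact, if y_i = 1, then L(β; y_i, x_i) = Φ(x_i'β), and if
y_i = 0, then L(β; y_i, x_i) = 1 − Φ(x_i'β).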

Since the observations are independent and identically
distributed, the likelihood of the entire sample, or the
joint likelihood, is equal to the product of the likelihoods
of the single observations:

$$ L(\beta; Y, X) = \prod_{i=1}^{n} \Phi(x_i'\beta)^{y_i} \, [1 - \Phi(x_i'\beta)]^{1 - y_i} $$

The joint log-likelihood function is thus:

$$ \ln L(\beta; Y, X) = \sum_{i=1}^{n} \Big( y_i \ln \Phi(x_i'\beta) + (1 - y_i) \ln\big(1 - \Phi(x_i'\beta)\big) \Big) $$

The estimator β̂ which maximizes this function is
consistent, asymptotically normal and efficient provided that
E[XX'] exists and is not singular. It can be shown that this
log-likelihood function is globally concave in β, and therefore
standard numerical algorithms for optimization will converge
rapidly to the unique maximum.
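
To illustrate the maximization step, the following sketch codes the probit log-likelihood by hand and maximizes it numerically with base R's `optim`. It then compares the result with `glm`. The data are simulated and the "true" parameters are hypothetical; BFGS is just one convenient optimizer, not necessarily the one a given package uses.

```r
set.seed(42)
n <- 2000
x <- rnorm(n)
beta_true <- c(-0.5, 1.0)                     # hypothetical true parameters
y <- as.integer(beta_true[1] + beta_true[2] * x + rnorm(n) > 0)
X <- cbind(1, x)                              # design matrix with intercept

# Negative joint log-likelihood of the probit model
neg_loglik <- function(beta) {
  eta <- drop(X %*% beta)
  # log Phi(eta) when y = 1; log(1 - Phi(eta)) = log Phi(-eta) when y = 0
  -sum(y * pnorm(eta, log.p = TRUE) + (1 - y) * pnorm(-eta, log.p = TRUE))
}

fit_optim <- optim(c(0, 0), neg_loglik, method = "BFGS")
fit_glm   <- glm(y ~ x, family = binomial(link = "probit"))

print(rbind(optim = fit_optim$par, glm = unname(coef(fit_glm))))
```

Because the log-likelihood is globally concave, the starting value `c(0, 0)` is not critical; both routes should land on essentially the same estimates.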

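The asymptotic distribution of β̂ is given by

$$ \sqrt{n}\,(\hat{\beta} - \beta) \xrightarrow{d} \mathcal{N}(0, \Omega^{-1}), $$

where

$$ \Omega = E\left[ \frac{\varphi^{2}(X'\beta)}{\Phi(X'\beta)\,\big(1 - \Phi(X'\beta)\big)} \, XX' \right] $$

and φ = Φ' is the probability density function (PDF) of the
standard normal distribution.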

The logit and probit predictors can be written as:

$$ \hat{Y} = f(\alpha + \beta x) $$

Logit and probit differ in how they define f(·).

The logit model uses the cumulative distribution function of
the logistic distribution to define f(·).
The probit model uses the cumulative distribution function of
the standard normal distribution to define f(·).
Both functions take any number and rescale it to fall
between 0 and 1.
Hence, whatever α + βx equals, it can be transformed by the
function to yield a predicted probability. Any function that
returns a value between zero and one could work, but there
is a deeper theoretical model underpinning logit and probit
that requires the function to be based on a probability
distribution.
The logistic and standard normal CDFs turn out to be
convenient mathematically and are programmed into just
about any general-purpose statistical package.
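
A small sketch comparing the two choices of f: in base R, `plogis` is the logistic CDF and `pnorm` is the standard normal CDF. The index values are arbitrary.

```r
z <- seq(-4, 4, by = 1)          # example values of alpha + beta * x

comparison <- data.frame(
  index  = z,
  logit  = plogis(z),            # logistic CDF: 1 / (1 + exp(-z))
  probit = pnorm(z)              # standard normal CDF
)
print(round(comparison, 4))
```

Both columns rise from near 0 to near 1 and cross 0.5 at an index of zero; the logistic curve has heavier tails, so it approaches 0 and 1 more slowly.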
Is logit better than probit, or vice versa? Both methods will
yield similar (though not identical) inferences. Logit (also
known as logistic regression) is more popular in health
sciences like epidemiology, partly because coefficients can be
interpreted in terms of odds ratios. Probit models can be
generalized to account for non-constant error variances in
more advanced econometric settings (known as
heteroskedastic probit models) and hence are used in some
contexts by economists and political scientists. If these more
advanced applications are not of relevance, then it does not
matter which method you choose to go with.
Research Question: We need to check the likelihood of
admission based on the GRE score, the GPA and the rank of the
undergraduate institution.

• This data set has a binary response (outcome,
dependent) variable called admit.
• There are three predictor variables: gre, gpa and rank.
We will treat the variables gre and gpa as continuous.
• The variable rank takes on the values 1 through 4.
• Institutions with a rank of 1 have the highest prestige,
while those with a rank of 4 have the lowest.

glm(formula = admit ~ gre + gpa + rank, family = binomial(link = "probit"),
    data = binary)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.6163  -0.8710  -0.6389   1.1560   2.1035

             Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.386836   0.673946  -3.542 0.000398 ***
gre          0.001376   0.000650   2.116 0.034329 *
gpa          0.477730   0.197197   2.423 0.015410 *
rank2       -0.415399   0.194977  -2.131 0.033130 *
rank3       -0.812138   0.208358  -3.898 9.71e-05 ***
rank4       -0.935899   0.245272  -3.816 0.000136 ***

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 499.98 on 399 degrees of freedom
Residual deviance: 458.41 on 394 degrees of freedom
AIC: 470.41

Number of Fisher Scoring iterations: 4
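
A model of this form can be fitted along the following lines. The admissions data file is not part of this text, so the sketch simulates a stand-in data frame with the same column names (`admit`, `gre`, `gpa`, `rank`); its estimates will therefore not match the output above. The object name `myprobit` and the data-generating values are assumptions. Converting `rank` to a factor is what produces separate `rank2`, `rank3` and `rank4` terms.

```r
set.seed(123)
n <- 400
# Simulated stand-in for the admissions data (NOT the real data set)
binary <- data.frame(
  gre  = round(rnorm(n, 580, 115)),
  gpa  = round(runif(n, 2.3, 4.0), 2),
  rank = sample(1:4, n, replace = TRUE)
)
# Hypothetical data-generating index, loosely inspired by the reported estimates
eta <- -2.4 + 0.0014 * binary$gre + 0.48 * binary$gpa - 0.3 * (binary$rank - 1)
binary$admit <- as.integer(eta + rnorm(n) > 0)

binary$rank <- factor(binary$rank)            # treat rank as categorical
myprobit <- glm(admit ~ gre + gpa + rank,
                family = binomial(link = "probit"), data = binary)
print(summary(myprobit))
```

With `rank` left as a number instead of a factor, the model would estimate a single slope for rank rather than one coefficient per level.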

• In the output above, the first thing we see is the call:
this is R reminding us what model we ran, what
options we specified, etc.
• Next we see the deviance residuals, which are a
measure of model fit. This part of the output shows the
distribution of the deviance residuals for the individual
cases used in the model. Below we discuss how to use
summaries of the deviance statistic to assess model fit.
• The next part of the output shows the coefficients, their
standard errors, the z-statistic (sometimes called a Wald
z-statistic), and the associated p-values.
• gre, gpa, and the three terms for rank are all
statistically significant at the 5% level.
• The probit regression coefficients give the change in the
z-score or probit index for a one-unit change in the
predictor.
• For a one-unit increase in GRE, the z-score increases by
0.001.
• For each one-unit increase in GPA, the z-score increases
by 0.478.
• The indicator variables for rank have a slightly different
interpretation. For example, having attended an
undergraduate institution with a rank of 2, versus an
institution with a rank of 1, decreases the z-score by
0.415.
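
Because the coefficients act on the z-score rather than directly on the probability, it can help to convert an index into a predicted probability with `pnorm`. The sketch below uses the coefficient estimates reported in the output above; the applicant profile (GRE 600, GPA 3.5) is a hypothetical example.

```r
# Coefficient estimates copied from the reported model output
b <- c(intercept = -2.386836, gre = 0.001376, gpa = 0.477730,
       rank2 = -0.415399, rank3 = -0.812138, rank4 = -0.935899)

# Hypothetical applicant: GRE 600, GPA 3.5 (values chosen for illustration)
z_rank1 <- b[["intercept"]] + b[["gre"]] * 600 + b[["gpa"]] * 3.5
z_rank2 <- z_rank1 + b[["rank2"]]             # same applicant, rank 2 school

p_rank1 <- pnorm(z_rank1)                     # predicted Pr(admit = 1)
p_rank2 <- pnorm(z_rank2)
print(round(c(z_rank1 = z_rank1, z_rank2 = z_rank2,
              p_rank1 = p_rank1, p_rank2 = p_rank2), 3))
```

The shift in the z-score is the same for every applicant, but the implied change in probability depends on where the applicant starts on the normal CDF.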


• The Wald test (also called the Wald chi-squared test) is
a way to find out if explanatory variables in a model are
significant.
• “Significant” means that they add something to the
model; variables that add nothing can be deleted
without affecting the model in any meaningful way.

• The null hypothesis for the test is: some parameter =
some value. For example, you might be studying whether
weight is affected by eating junk food twice a week.
The coefficient on junk-food consumption would be your
parameter, and the value could be zero (indicating that
you don’t think weight is affected by eating junk food).

• If the null hypothesis is not rejected, it suggests that the
variables in question can be removed without much
harm to the model fit.
• In other words, if the Wald test gives no evidence that the
parameters for certain explanatory variables differ from
zero, you can remove the variables from the model.
• If the test shows the parameters are not zero, you
should include the variables in the model.

• It is sometimes said that the prestige of an institution
may not be well expressed in terms of the rank of the
institution.
• Hence, we need to check the overall impact of rank on
the model through a Wald test.

Wald test:

Chi-squared test:
X2 = 21.4, df = 3, P(> X2) = 8.9e-05

Interpretation: The chi-squared test statistic of 21.4 with
three degrees of freedom is associated with a p-value of less
than 0.001, indicating that the overall effect of rank is
statistically significant.
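
A joint Wald test of this kind can be sketched in base R as W = b' V⁻¹ b, where b holds the coefficients being tested and V is their estimated covariance matrix. Under the null, W is compared with a chi-squared distribution with as many degrees of freedom as there are tested coefficients. The data below are simulated stand-ins, so the statistic will not reproduce the 21.4 reported above. Output in the format shown above resembles what `wald.test()` from the `aod` package prints, but that is an assumption about the tool used.

```r
set.seed(7)
n <- 400
# Simulated stand-in data (NOT the admissions data used in the text)
d <- data.frame(gpa  = runif(n, 2.3, 4.0),
                rank = factor(sample(1:4, n, replace = TRUE)))
d$admit <- as.integer(-1.5 + 0.5 * d$gpa - 0.3 * (as.integer(d$rank) - 1) +
                        rnorm(n) > 0)
fit <- glm(admit ~ gpa + rank, family = binomial(link = "probit"), data = d)

terms_rank <- grep("^rank", names(coef(fit)))  # positions of rank2..rank4
b <- coef(fit)[terms_rank]
V <- vcov(fit)[terms_rank, terms_rank]

wald_stat <- drop(t(b) %*% solve(V) %*% b)     # W = b' V^{-1} b
wald_df   <- length(b)                         # number of tested coefficients
p_value   <- pchisq(wald_stat, df = wald_df, lower.tail = FALSE)
print(c(chisq = wald_stat, df = wald_df, p = p_value))
```

Testing the three rank terms together answers whether rank matters overall, which the individual z-tests for `rank2`, `rank3` and `rank4` do not address on their own.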


• We have tried to present the predicted
probabilities graphically for better
understanding.
• Four plots were created, one for each level of
GPA: 2.5, 3, 3.5 and 4.
• The color of each line indicates the rank that the
predicted probabilities are for.
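
Plots of this kind can be produced by predicting over a grid of covariate values. The sketch below uses simulated stand-in data and assumes GRE on the horizontal axis, which the text does not state; the four GPA levels and the coloring by rank follow the description above.

```r
set.seed(99)
n <- 400
# Simulated stand-in data (NOT the admissions data used in the text)
d <- data.frame(gre  = round(rnorm(n, 580, 115)),
                gpa  = runif(n, 2.3, 4.0),
                rank = factor(sample(1:4, n, replace = TRUE)))
d$admit <- as.integer(-2.4 + 0.0014 * d$gre + 0.48 * d$gpa -
                        0.3 * (as.integer(d$rank) - 1) + rnorm(n) > 0)
fit <- glm(admit ~ gre + gpa + rank, family = binomial(link = "probit"), data = d)

# Grid: a range of GRE scores crossed with four GPA levels and four ranks
grid <- expand.grid(gre  = seq(220, 800, by = 20),
                    gpa  = c(2.5, 3, 3.5, 4),
                    rank = factor(1:4))
grid$p <- predict(fit, newdata = grid, type = "response")

# One panel per GPA level, one colored line per rank
op <- par(mfrow = c(2, 2))
for (g in unique(grid$gpa)) {
  panel <- grid[grid$gpa == g, ]
  plot(NULL, xlim = range(grid$gre), ylim = c(0, 1),
       xlab = "gre", ylab = "Pr(admit = 1)", main = paste("gpa =", g))
  for (r in levels(panel$rank)) {
    lines(panel$gre[panel$rank == r], panel$p[panel$rank == r],
          col = as.integer(r))
  }
}
par(op)
```

`type = "response"` returns probabilities; the default `type = "link"` would return the z-score instead.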

• Here we adopt a different approach to testing the fit of
the model, one that takes the overall fit of the model
into consideration.
• This test asks whether the model with predictors fits
significantly better than a model with just an intercept
(i.e. a null model).
• The test statistic is the difference between the residual
deviance for the model with predictors and the null
model.
• The test statistic is distributed chi-squared with degrees
of freedom equal to the difference in degrees of
freedom between the current and the null model (i.e.
the number of predictor terms in the model, here 5).
• To find the difference in deviance for the two models
(i.e. the test statistic) we can compute the change in
deviance and test it using a chi-square test; the change
in deviance is distributed as chi-square on the change in
degrees of freedom.

• Change in deviance (test statistic): 41.56335
• p-value: 7.218932e-08


• The chi-square of 41.56 with 5 degrees of freedom and
an associated p-value of less than 0.001 tells us that our
model as a whole fits significantly better than an empty
model.
• This is also called a likelihood ratio test.
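
The likelihood ratio test can be recomputed from the deviances printed in the model output. The sketch uses the rounded values shown there (499.98 and 458.41), so the statistic comes out as 41.57 rather than the unrounded 41.56335.

```r
# Deviances and degrees of freedom copied from the reported model output
null_dev  <- 499.98; null_df  <- 399
resid_dev <- 458.41; resid_df <- 394

lr_stat <- null_dev - resid_dev               # change in deviance
lr_df   <- null_df - resid_df                 # change in degrees of freedom
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

print(c(chisq = lr_stat, df = lr_df, p = p_value))
# With a fitted glm object `fit` (hypothetical name) the same quantities are
# fit$null.deviance - fit$deviance and fit$df.null - fit$df.residual.
```

`lower.tail = FALSE` requests the upper-tail probability, i.e. the chance of a chi-squared value at least this large if the predictors had no effect.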

• Probit models simply use the cumulative normal
(Gaussian) distribution rather than the logistic function
for calculating the probability of being in one category
or the other.
• Graphical analysis of predicted values can help in better
understanding the model and its results.
• Theoretical and statistical analysis must go hand in
hand while developing a model; otherwise false
conclusions may be drawn.
• Theoretical justification for omitting or including
variables in the model should be supported with
statistical evidence.
• While testing the fit of the model, the overall fit should
be considered, so that every aspect of the model is taken
into account and is in line with the final conclusion
drawn.

