
Categorical Data Analysis

Lecture 8
Analyzing a Binary Response
• We have discussed how to estimate and make inferences about a single probability of success π.
• We then generalized this discussion to the situation of two probabilities of success that depend on the level of a group.
• This chapter now completes the generalization to a situation where there are many different possible probabilities of success to estimate and perform inferences upon.
• Furthermore, this chapter allows us to quantify how an explanatory variable with many possible levels (perhaps continuous rather than categorical) affects the probability of success. These generalizations are made through the use of binary regression models.
Linear models
Review of normal linear regression models

Yi = 0 + 1xi1 + 2xi2 + … + pxip + i

where i ~ independent N(0, 2) and i = 1, …, n

Note that

E(Yi) = 0 + 1xi1 + 2xi2 + … + pxip

E(Yi) is what one would expect Yi to be on average for a


set of xi1, xi2, …, xip values. Also, note that this model
implies that Yi ~ independent N(E(Yi), 2).
If k (one of the ’s above) is equal to 0, this says there is no
linear relationship between the corresponding explanatory
variable xk and the response variable. If k > 0, there is a
positive relationship, and if k < 0, there is a negative
relationship. All of these statements are with respect to the
other explanatory variables in the model remaining constant.
Regression models for a binary response

Let Yi be a binary response variable for i = 1, …, n, where a 1 denotes a success and a 0 denotes a failure. Suppose Yi has a Bernoulli distribution again as in Chapter 1, but now with probability of success parameter πi. Thus, Yi ~ independent Bernoulli(πi).
Notice that the probability of success can be different for each of the i = 1, …, n observations. Potentially, there could then be n different parameters that we need to estimate!
We can simplify the number of parameters that we need
to estimate by using a linear model of the form

E(Yi) = 0 + 1xi1

where I am using just one explanatory variable to


simplify the explanation. Because E(Yi) is i, we could
also write the model as

i = 0 + 1xi1

Therefore, instead of potentially having n different


parameters to estimate, we now only have two!!!
To estimate the parameters, we should not proceed as
we would with normal linear regression models for the
following reasons:
 Yi is binary here, but Yi had a continuous distribution in normal linear regression.
 Var(Yi) = πi(1 − πi) for a Bernoulli random variable; thus, the variance potentially changes for each Yi. With normal linear regression, Var(Yi) = Var(εi) = σ² is constant for i = 1, …, n.
We estimate the β's using maximum likelihood estimation. The likelihood function is

L(β0, β1 | y1, …, yn) = P(Y1 = y1) × ⋯ × P(Yn = yn)
                      = ∏_{i=1}^n P(Yi = yi)
                      = ∏_{i=1}^n πi^yi (1 − πi)^(1−yi)

where i = 0 + 1xi1. Maximizing the likelihood function leads


to the maximum likelihood estimates of 0 and 1.
Unfortunately, there is still a problem – i = 0 + 1xi1 is
not constrained to be within 0 and 1. For particular
values of 0, 1, and xi1, i may end be greater than 1 or
less than 0.
Logistic regression models

• There are a number of solutions to prevent πi from being outside the range of a probability. Most solutions rely on non-linear transformations to prevent these types of problems from occurring. The most commonly used transformation results in the logistic regression model:

πi = exp(β0 + β1xi1 + ⋯ + βpxip) / (1 + exp(β0 + β1xi1 + ⋯ + βpxip))

Notice that exp(β0 + β1xi1 + ⋯ + βpxip) > 0, so the numerator is always positive and less than the denominator. Thus, 0 < πi < 1.
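To see this numerically, here is a small R sketch using the built-in inverse logit function plogis(), which computes exp(q)/(1 + exp(q)); the linear predictor values below are made up for illustration:

# Any real-valued linear predictor maps into (0, 1) under the inverse logit
lin.pred <- c(-100, -5, 0, 5, 100)  # hypothetical linear predictor values
plogis(lin.pred)  # approximately 3.7e-44, 0.0067, 0.5, 0.9933, 1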

The logistic regression model can also be written as

log( πi / (1 − πi) ) = β0 + β1xi1 + ⋯ + βpxip
• Notice that the left-hand side is the log
transformation of the odds of a success!
This will be very important for us later when
interpreting the effect an explanatory
variable has on the response variable.
 The log( πi / (1 − πi) ) transformation is often referred to as the logit transformation. Thus, the most compact way that people write the model is

logit(πi) = β0 + β1xi1 + ⋯ + βpxip

 The 0  1xi1    p xip part of the model is often


referred to as the linear predictor.
 We can write the model without the i subscript when
we want to state the model in general:
π = exp(β0 + β1x1 + ⋯ + βpxp) / (1 + exp(β0 + β1x1 + ⋯ + βpxp))   and   log( π / (1 − π) ) = β0 + β1x1 + ⋯ + βpxp

Obviously, this leads to some notational ambiguity with what we had in Section 1.1 for π, but the meaning should be obvious within the context of the problem.
Example: Plot of  vs. x (PiPlot.R)

When there is only one explanatory variable, β0 = 1, and β1 = 0.5 (or −0.5), a plot of π vs. x1 looks like the following:

π = exp(1 + 0.5x1) / (1 + exp(1 + 0.5x1))   and   π = exp(1 − 0.5x1) / (1 + exp(1 − 0.5x1))
[Figure: two S-shaped curves of π vs. x1 for −15 ≤ x1 ≤ 15, with π on a 0-to-1 axis; the left panel (β1 = 0.5) increases from 0 toward 1, and the right panel (β1 = −0.5) decreases from 1 toward 0.]
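For reference, here is a minimal R sketch that draws curves like these with curve(); the plotting details are my assumptions rather than the exact contents of PiPlot.R:

par(mfrow = c(1, 2))  # two plots side-by-side
curve(expr = exp(1 + 0.5*x) / (1 + exp(1 + 0.5*x)), xlim = c(-15, 15),
  ylim = c(0, 1), xlab = expression(x[1]), ylab = expression(pi),
  main = expression(beta[1] == 0.5))
curve(expr = exp(1 - 0.5*x) / (1 + exp(1 - 0.5*x)), xlim = c(-15, 15),
  ylim = c(0, 1), xlab = expression(x[1]), ylab = expression(pi),
  main = expression(beta[1] == -0.5))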
We can make the following generalizations:
 0<<1
 When 1 > 0, there is a positive relationship between
x1 and . When 1 < 0, there is a negative relationship
between x1 and .
 The shape of the curve is somewhat similar to the
letter s.
 Above  = 0.5 is a mirror image of below  = 0.5.
 The slope of the curve is dependent on the value of x 1.
We can show this mathematically by taking the
d
derivative with respect to x1:  1(1  )
dx1
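A quick numerical check of this derivative (a sketch assuming β0 = 1, β1 = 0.5, and an arbitrary point x1 = 2):

beta0 <- 1; beta1 <- 0.5; x1 <- 2; h <- 1e-6
pi.fun <- function(x) exp(beta0 + beta1*x) / (1 + exp(beta0 + beta1*x))
(pi.fun(x1 + h) - pi.fun(x1 - h)) / (2*h)  # central-difference slope
beta1 * pi.fun(x1) * (1 - pi.fun(x1))      # analytical slope; both are about 0.0525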
Questions:

• What happens to the β1 = 0.5 plot when β1 is increased?
• What happens to the β1 = 0.5 plot when β1 is decreased to be close to 0?
• Suppose a plot of logit(π) vs. x1 was made. What would the plot look like?
Parameter estimation
Maximum likelihood estimation is used to estimate the
parameters of the model. As shown earlier, the likelihood
function is
L(β0, β1, …, βp | y1, …, yn) = ∏_{i=1}^n πi^yi (1 − πi)^(1−yi)

but now

πi = exp(β0 + β1xi1 + ⋯ + βpxip) / (1 + exp(β0 + β1xi1 + ⋯ + βpxip))
Using this, we can find the log likelihood function:

log L(β0, …, βp | y1, …, yn) = ∑_{i=1}^n [ yi log(πi) + (1 − yi) log(1 − πi) ]

= ∑_{i=1}^n [ yi log( exp(β0 + β1xi1 + ⋯ + βpxip) / (1 + exp(β0 + β1xi1 + ⋯ + βpxip)) ) + (1 − yi) log( 1 − exp(β0 + β1xi1 + ⋯ + βpxip) / (1 + exp(β0 + β1xi1 + ⋯ + βpxip)) ) ]

= ∑_{i=1}^n [ yi(β0 + β1xi1 + ⋯ + βpxip) − yi log(1 + exp(β0 + β1xi1 + ⋯ + βpxip)) − (1 − yi) log(1 + exp(β0 + β1xi1 + ⋯ + βpxip)) ]

Taking derivatives with respect to β0, …, βp, setting them equal to 0, and then solving for the parameters leads to the MLEs. These parameter estimates are denoted by β̂0, …, β̂p. Corresponding estimates of π are

π̂ = exp(β̂0 + β̂1x1 + ⋯ + β̂pxp) / (1 + exp(β̂0 + β̂1x1 + ⋯ + β̂pxp))
Unfortunately, there are no closed-form expressions for β̂0, …, β̂p except in very simple cases. The MLEs instead are found using iterative numerical procedures.
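As one illustration of such a procedure, the log likelihood can be maximized directly with optim(); this sketch anticipates the placekick data frame used later in the lecture and is not how glm() itself works (glm() uses IRLS, described next):

# Log likelihood for a logistic regression with one explanatory variable
logL <- function(beta, x, y) {
  pi.hat <- exp(beta[1] + beta[2]*x) / (1 + exp(beta[1] + beta[2]*x))
  sum(y * log(pi.hat) + (1 - y) * log(1 - pi.hat))
}
# fnscale = -1 tells optim() to maximize rather than minimize
opt <- optim(par = c(0, 0), fn = logL, x = placekick$distance,
  y = placekick$good, control = list(fnscale = -1), method = "BFGS")
opt$par  # close to the MLEs that glm() reports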

The Newton-Raphson procedure, one of these iterative numerical procedures, can be used to find the MLE of π in a homogeneous population setting.

We will use a procedure called iteratively reweighted least squares (IRLS) to find the maximum likelihood estimates.
Without going into all of the details behind IRLS, initial estimates for the parameters, say β̂0^(0), …, β̂p^(0), are found. Weighted least squares estimation (see Chapter 11 of my STAT 870 notes; weights are based on π̂i) is used to find a "better" set of parameter estimates.
If the new parameter estimates, say β̂0^(1), …, β̂p^(1), are very close to β̂0^(0), …, β̂p^(0), the iterative numerical procedure is said to "converge," and β̂0^(1), …, β̂p^(1) are used as the MLEs β̂0, …, β̂p. If the new parameter estimates β̂0^(1), …, β̂p^(1) are not very close to β̂0^(0), …, β̂p^(0), weighted least squares estimation is used again with new weights.
This iterative process continues until convergence or a pre-specified maximum number of iterations is reached.
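To make the algorithm concrete, here is a bare-bones IRLS sketch for logistic regression; it is for illustration only (glm() handles starting values, convergence checks, and numerical safeguards for you):

irls <- function(X, y, maxit = 25, tol = 1e-8) {
  beta <- rep(0, ncol(X))  # initial estimates, iteration (0)
  for (iter in 1:maxit) {
    pi.hat <- as.vector(exp(X %*% beta) / (1 + exp(X %*% beta)))
    W <- diag(pi.hat * (1 - pi.hat))  # weights based on the current pi-hat
    # Weighted least squares update of the parameter estimates
    beta.new <- beta + solve(t(X) %*% W %*% X, t(X) %*% (y - pi.hat))
    if (max(abs(beta.new - beta)) < tol) { beta <- beta.new; break }
    beta <- beta.new
  }
  as.vector(beta)
}
# e.g., irls(X = cbind(1, placekick$distance), y = placekick$good)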
The glm() function computes the parameter estimates.

Question: If the pre-specified maximum number of iterations is reached without convergence, should the last set of parameter estimates be used as β̂0, …, β̂p?
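For what it is worth, the iteration limit can be raised through glm()'s control argument, and the converged component reports whether IRLS converged; the model and data here anticipate the placekick example that follows:

mod.fit <- glm(formula = good ~ distance, family = binomial(link = logit),
  data = placekick, control = glm.control(maxit = 50))  # default maxit is 25
mod.fit$converged  # FALSE would suggest the estimates should not be trusted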
Example: Placekicking (Placekick.R, Placekick.csv)

This example is motivated by the work that I did for my MS report and Bilder and Loughin (Chance, 1998).

The purpose of this and future examples involving these data is to estimate the probability of success for a placekick.
Below are the explanatory variables to be considered:

 Week: Week of the season
 Distance: Distance of the placekick in yards
 Change: Binary variable denoting lead-change (1) vs.
non-lead-change (0) placekicks; successful lead-
change placekicks are those that change which team
is winning the game.
 Elap30: Number of minutes remaining before the end
of the half with overtime placekicks receiving a value
of 0
 PAT: Binary variable denoting the type of placekick
where a point after touchdown (PAT) is a 1 and a field
goal is a 0
 Type: Binary variable denoting dome (0) vs. outdoor
(1) placekicks
 Field: Binary variable denoting grass (1) vs. artificial
turf (0) placekicks
 Wind: Binary variable for placekicks attempted in windy conditions (1) vs. non-windy conditions (0); I define windy as a wind stronger than 15 miles per hour at kickoff in an outdoor stadium
The response variable is referred to as “Good” in the
data set. It is a 1 for successful placekicks and a 0 for
failed placekicks.

There are 1,425 placekick observations from the 1995 NFL season within this data set.
For this particular example, we are only going to use the
distance explanatory variable to estimate the probability
of a successful placekick. Thus, our logistic regression
model is

logit( )  0  1x1

where Y is the good response variable and x1 denotes the distance in yards for the placekick. Less formally, we will also write the model as

logit(π) = β0 + β1·distance
• We use R to estimate the model with the glm() function.
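A sketch of the fit (the object and variable names match the code shown later in these notes; placekick is read from Placekick.csv):

placekick <- read.csv(file = "Placekick.csv")
mod.fit <- glm(formula = good ~ distance, family = binomial(link = logit),
  data = placekick)
summary(mod.fit)  # estimates of beta0 and beta1: about 5.8121 and -0.1150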
The estimated logistic regression model is

logit( ˆ )  5.8121  0.1150distance

Note that the function gets its name from "generalized linear model". This is a general class of linear models which includes logistic regression models. At the end of this chapter, I will formally define this general class.
Question: What happens to the estimated probability of
success as the distance increases?
Now is a good time for a reminder of why R is often referred to as an "object-oriented language": every object in R has a class associated with it. The classes for mod.fit are:
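Since the original slide output is not reproduced here, a quick check in R (fitted glm objects inherit from lm):

class(mod.fit)
# [1] "glm" "lm"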
Notice all of the method functions have the class name
at their end. For example, there is a summary.glm()
function. When the generic function summary() is run,
R first finds the class of the object and then checks to
see if there is a summary.glm() function. Because the
function exists, this method function completes the main
calculations.
The purpose of generic functions is to use a familiar
language set with any object. For example, we
frequently want to summarize data or a model
(summary()), compute confidence intervals
(confint()), and find predictions (predict()), so
it is convenient to use the same language set no
matter the application.
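For instance, all of the following work on mod.fit because glm-specific methods exist behind the scenes (confint() on a glm may require the MASS package in older versions of R):

summary(mod.fit)                     # dispatches to summary.glm()
confint(mod.fit, level = 0.95)       # profile likelihood confidence intervals
predict(mod.fit, type = "response")  # estimated probabilities of success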
We can find the estimated probability of success for a
particular distance using:

π̂ = exp(5.8121 − 0.1150·distance) / (1 + exp(5.8121 − 0.1150·distance))

For example, the estimated probability of success at a distance of 20 yards is 0.97.

The estimated probability of success for a distance of 50 yards is 0.52.
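These values can be computed by hand from the estimated model or with predict() (a sketch):

lin.pred <- 5.8121 - 0.1150 * c(20, 50)
exp(lin.pred) / (1 + exp(lin.pred))  # about 0.97 and 0.52
predict(mod.fit, newdata = data.frame(distance = c(20, 50)), type = "response")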
Using this method to estimate the probability of success,
we can now plot the model with the curve() function:
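A sketch of such a plot (the 18-to-66 yard range is my assumption about the observed distances):

curve(expr = exp(mod.fit$coefficients[1] + mod.fit$coefficients[2]*x) /
    (1 + exp(mod.fit$coefficients[1] + mod.fit$coefficients[2]*x)),
  from = 18, to = 66, xlab = "Distance (yards)",
  ylab = "Estimated probability of success")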
If more than one explanatory variable is included in the
model, the variable names can be separated by “+” symbols
in the formula argument. For example, suppose we include
the change variable in addition to distance in the model

mod.fit2 <- glm(formula = good ~ change + distance,
                family = binomial(link = logit), data = placekick)
