(A) Modeling: 2.3 Models For Binary Responses
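A linear model for the probability itself,

$$\pi = \sum_{j=1}^{p} x_j \beta_j,$$

would be inconsistent with the laws of probability, because the right-hand side is not restricted to $[0, 1]$. A simple and effective way of avoiding this difficulty is to use a transformation $g(\pi)$ that maps the unit interval $[0, 1]$ onto the whole real line $(-\infty, \infty)$. That is,

$$g(\pi) = \eta = \sum_{j=1}^{p} x_j \beta_j.$$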
Several functions (link functions) commonly used in practice are:
1. The logit or logistic function
$$g_1(\pi) = \log\left(\frac{\pi}{1 - \pi}\right).$$
2. The probit or inverse normal function
$$g_2(\pi) = \Phi^{-1}(\pi).$$
3. The complementary log-log function
$$g_3(\pi) = \log\left[-\log(1 - \pi)\right].$$
4. The log-log function
$$g_4(\pi) = -\log\left[-\log(\pi)\right].$$
Note:
$$g_1(\pi) = -g_1(1 - \pi), \qquad g_3(\pi) = -g_4(1 - \pi).$$
Note:
The required inverse functions are
1. The logit or logistic function
$$\pi_1(\eta) = \frac{e^{\eta}}{1 + e^{\eta}}.$$
2. The probit or inverse normal function
$$\pi_2(\eta) = \Phi(\eta).$$
3. The complementary log-log function
$$\pi_3(\eta) = 1 - e^{-e^{\eta}}.$$
4. The log-log function
$$\pi_4(\eta) = e^{-e^{-\eta}}.$$
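The four links and their inverses can be checked numerically. The following is a minimal sketch in Python using only the standard library; the function names and the `LINKS` table are illustrative choices, not part of the notes. It verifies that each inverse undoes its link and lets the symmetry relations between the links be tested directly.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def logit(p):
    """g1: log odds."""
    return math.log(p / (1.0 - p))


def probit(p):
    """g2: inverse of the standard normal CDF."""
    return _STD_NORMAL.inv_cdf(p)


def cloglog(p):
    """g3: complementary log-log."""
    return math.log(-math.log(1.0 - p))


def loglog(p):
    """g4: log-log."""
    return -math.log(-math.log(p))


def inv_logit(eta):
    return math.exp(eta) / (1.0 + math.exp(eta))


def inv_probit(eta):
    return _STD_NORMAL.cdf(eta)


def inv_cloglog(eta):
    return 1.0 - math.exp(-math.exp(eta))


def inv_loglog(eta):
    return math.exp(-math.exp(-eta))


# Illustrative lookup table: name -> (link, inverse link)
LINKS = {
    "logit": (logit, inv_logit),
    "probit": (probit, inv_probit),
    "cloglog": (cloglog, inv_cloglog),
    "loglog": (loglog, inv_loglog),
}


def max_roundtrip_error(probs=(0.05, 0.3, 0.5, 0.9)):
    """Largest |inverse(link(p)) - p| over all links and test probabilities."""
    return max(
        abs(inverse(link(p)) - p)
        for link, inverse in LINKS.values()
        for p in probs
    )


print("max round-trip error:", max_roundtrip_error())
```

Evaluating `logit(0.3) + logit(0.7)` and `cloglog(0.3) + loglog(0.7)` gives values that are zero up to rounding, which is the symmetry between the logit with itself and between the complementary log-log and log-log links.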
Note:
The logistic function is the most commonly used link function.
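Note:
For the data in the motivating example, suppose the logistic link function is used. Then,

$$\log\left(\frac{\pi}{1 - \pi}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2
\iff
\pi = \frac{\exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2)}{1 + \exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2)},$$

so that

$$\frac{\partial \pi}{\partial x_j} = \beta_j \pi (1 - \pi), \qquad j = 1, 2.$$

The last equation implies that a given change in $x_j$ produces a larger change in $\pi$ when $\pi$ is near 0.5 than when $\pi$ is near 0 or 1, because the factor $\pi(1 - \pi)$ is largest at $\pi = 0.5$ and tends to 0 at either end.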
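As a small numerical illustration of the slope formula for the logistic link, the sketch below evaluates $\beta_1 \pi (1 - \pi)$ at two covariate settings and compares it with a finite-difference derivative. The coefficient values in `beta_demo` are made up for illustration and are not estimates from the motivating example.

```python
import math


def fitted_probability(beta, x1, x2):
    """Logistic model: pi = exp(eta) / (1 + exp(eta)), eta = b0 + b1*x1 + b2*x2."""
    eta = beta[0] + beta[1] * x1 + beta[2] * x2
    return 1.0 / (1.0 + math.exp(-eta))


def slope_in_x1(beta, x1, x2):
    """Analytic derivative of pi with respect to x1: beta_1 * pi * (1 - pi)."""
    p = fitted_probability(beta, x1, x2)
    return beta[1] * p * (1.0 - p)


# Hypothetical coefficients (beta_0, beta_1, beta_2), chosen only for illustration.
beta_demo = (0.0, 1.0, 0.5)

slope_mid = slope_in_x1(beta_demo, 0.0, 0.0)   # here eta = 0, so pi = 0.5
slope_tail = slope_in_x1(beta_demo, 4.0, 0.0)  # here eta = 4, so pi is close to 1

# Central finite difference at the midpoint as an independent check.
h = 1e-6
numeric_mid = (
    fitted_probability(beta_demo, h, 0.0) - fitted_probability(beta_demo, -h, 0.0)
) / (2.0 * h)

print("slope at pi = 0.5:", slope_mid)
print("slope near pi = 1:", slope_tail)
print("finite-difference slope at pi = 0.5:", numeric_mid)
```

The slope at $\pi = 0.5$ is $\beta_1 / 4$, the largest value the factor $\pi(1 - \pi)$ allows, while the slope in the tail is much smaller for the same $\beta_1$.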
(b) Estimation
Suppose $Y_i \sim b(m_i, \pi_i)$, $i = 1, 2, \ldots, n$, independently, with link function

$$g(\pi_i) = \log\left(\frac{\pi_i}{1 - \pi_i}\right) = \eta_i = \sum_{j=1}^{p} x_{ij} \beta_j.$$

Note that $E(Y_i) = \mu_i = m_i \pi_i$. The likelihood function is

$$f(y \mid \beta) = \prod_{i=1}^{n} \binom{m_i}{y_i} \pi_i^{y_i} (1 - \pi_i)^{m_i - y_i}$$
and the log-likelihood function is

$$
\begin{aligned}
l(\beta) = \log f(y \mid \beta)
&= \sum_{i=1}^{n} \left[ \log\binom{m_i}{y_i} + y_i \log \pi_i + (m_i - y_i) \log(1 - \pi_i) \right] \\
&= \sum_{i=1}^{n} \left[ y_i \log\left(\frac{\pi_i}{1 - \pi_i}\right) + m_i \log(1 - \pi_i) \right] + \sum_{i=1}^{n} \log\binom{m_i}{y_i}.
\end{aligned}
$$
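Thus, writing $l_i$ for the contribution of observation $i$ to $l(\beta)$,

$$
\begin{aligned}
U_r(\beta) = \frac{\partial l}{\partial \beta_r}
&= \sum_{i=1}^{n} \frac{\partial l_i}{\partial \pi_i} \frac{\partial \pi_i}{\partial \eta_i} \frac{\partial \eta_i}{\partial \beta_r} \\
&= \sum_{i=1}^{n} \frac{y_i - m_i \pi_i}{\pi_i (1 - \pi_i)} \, \pi_i (1 - \pi_i) \, x_{ir} \\
&= \sum_{i=1}^{n} (y_i - m_i \pi_i) x_{ir},
\end{aligned}
$$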
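since

$$
\frac{\partial l_i}{\partial \pi_i}
= y_i \left( \frac{1}{\pi_i} + \frac{1}{1 - \pi_i} \right) - \frac{m_i}{1 - \pi_i}
= \frac{y_i}{\pi_i (1 - \pi_i)} - \frac{m_i}{1 - \pi_i}
= \frac{y_i - m_i \pi_i}{\pi_i (1 - \pi_i)},
$$

where $l_i = y_i \log\left[\pi_i / (1 - \pi_i)\right] + m_i \log(1 - \pi_i) + \log\binom{m_i}{y_i}$ is the contribution of observation $i$ to the log-likelihood,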
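and

$$
\frac{\partial \pi_i}{\partial \eta_i}
= \left( \frac{\partial \eta_i}{\partial \pi_i} \right)^{-1}
= \left[ \frac{\partial}{\partial \pi_i} \log\left(\frac{\pi_i}{1 - \pi_i}\right) \right]^{-1}
= \left[ \frac{1 - \pi_i}{\pi_i} \cdot \frac{1}{(1 - \pi_i)^2} \right]^{-1}
= \pi_i (1 - \pi_i),
\qquad
\frac{\partial \eta_i}{\partial \beta_r} = x_{ir}.
$$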
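On the other hand,

$$
\begin{aligned}
\frac{\partial^2 l}{\partial \beta_r \partial \beta_s}
&= \frac{\partial}{\partial \beta_s} \left[ \sum_{i=1}^{n} (y_i - m_i \pi_i) x_{ir} \right]
= -\sum_{i=1}^{n} m_i x_{ir} \frac{\partial \pi_i}{\partial \beta_s} \\
&= -\sum_{i=1}^{n} m_i x_{ir} \frac{\partial \pi_i}{\partial \eta_i} \frac{\partial \eta_i}{\partial \beta_s}
= -\sum_{i=1}^{n} m_i \pi_i (1 - \pi_i) x_{ir} x_{is}.
\end{aligned}
$$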
Therefore,

$$
I_{sr}(\beta) = -E\left[ \frac{\partial^2 l}{\partial \beta_r \partial \beta_s} \right]
= \sum_{i=1}^{n} m_i \pi_i (1 - \pi_i) x_{ir} x_{is}.
$$
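Denote

$$
X = \begin{pmatrix}
x_{11} & x_{12} & \cdots & x_{1p} \\
x_{21} & x_{22} & \cdots & x_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{np}
\end{pmatrix},
\quad
y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix},
\quad
\mu(\beta) = \begin{pmatrix} m_1 \pi_1 \\ m_2 \pi_2 \\ \vdots \\ m_n \pi_n \end{pmatrix},
$$

$$
W(\beta) = \begin{pmatrix}
m_1 \pi_1 (1 - \pi_1) & 0 & \cdots & 0 \\
0 & m_2 \pi_2 (1 - \pi_2) & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & m_n \pi_n (1 - \pi_n)
\end{pmatrix}.
$$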
Then, in matrix form,

$$U(\beta) = X^t \left[ y - \mu(\beta) \right], \qquad I(\beta) = X^t W(\beta) X.$$
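The Fisher scoring method is

$$
\begin{aligned}
\beta_{t+1} &= \beta_t + I^{-1}(\beta_t) \, U(\beta_t), \qquad t = 0, 1, 2, \ldots \\
&= \beta_t + \left( X^t W_t X \right)^{-1} X^t (y - \mu_t) \\
&= \left( X^t W_t X \right)^{-1} X^t W_t \left[ X \beta_t + W_t^{-1} (y - \mu_t) \right] \\
&= \left( X^t W_t X \right)^{-1} X^t W_t z_t,
\end{aligned}
$$

with $W_t = W(\beta_t)$ and $\mu_t = \mu(\beta_t)$; equivalently, $\left( X^t W_t X \right) \beta_{t+1} = X^t W_t z_t$,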
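where

$$
z_t = \begin{pmatrix} z_{t1} \\ z_{t2} \\ \vdots \\ z_{tn} \end{pmatrix}
= X \beta_t + W_t^{-1} (y - \mu_t)
$$

and

$$
z_{ti} = \sum_{j=1}^{p} x_{ij} \beta_{tj} + \frac{y_i - m_i \pi_i^{(t)}}{m_i \pi_i^{(t)} \left( 1 - \pi_i^{(t)} \right)},
$$

with $\pi_i^{(t)}$ the probability for observation $i$ evaluated at $\beta_t$.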
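The Fisher scoring iteration for the logistic link can be sketched as a weighted least squares loop. The following Python/NumPy code is a minimal illustration under stated assumptions, not a reference implementation: the function name `fisher_scoring_logit`, the starting value of zero, the stopping rule on the change in the coefficients, and the small grouped data set are all hypothetical choices made for the example.

```python
import numpy as np


def fisher_scoring_logit(X, y, m, max_iter=25, tol=1e-10):
    """Fit a binomial logit model by Fisher scoring (iterative weighted LS).

    X: (n, p) design matrix; y: (n,) success counts; m: (n,) binomial totals.
    Returns the coefficient estimate and the inverse information matrix
    (X^t W X)^{-1} evaluated at that estimate.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    m = np.asarray(m, dtype=float)
    beta = np.zeros(X.shape[1])  # assumed starting value: all coefficients 0

    for _ in range(max_iter):
        eta = X @ beta                       # linear predictor X beta_t
        pi = 1.0 / (1.0 + np.exp(-eta))      # inverse logit
        w = m * pi * (1.0 - pi)              # diagonal of W_t
        z = eta + (y - m * pi) / w           # working response z_t
        XtW = X.T * w                        # X^t W_t without an n x n matrix
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        step = np.max(np.abs(beta_new - beta))
        beta = beta_new
        if step < tol:                       # simple stopping rule (assumption)
            break

    # Information matrix at the final estimate, inverted for the covariance.
    pi = 1.0 / (1.0 + np.exp(-(X @ beta)))
    w = m * pi * (1.0 - pi)
    cov = np.linalg.inv((X.T * w) @ X)
    return beta, cov


# Hypothetical grouped data: 5 covariate levels, 10 trials at each level.
dose = np.arange(5.0)
X_demo = np.column_stack([np.ones_like(dose), dose])  # intercept and slope
m_demo = np.full(5, 10.0)
y_demo = np.array([1.0, 3.0, 5.0, 7.0, 9.0])

beta_hat, cov_hat = fisher_scoring_logit(X_demo, y_demo, m_demo)
std_err = np.sqrt(np.diag(cov_hat))
print("beta_hat:", beta_hat)
print("std_err:", std_err)
```

The update solves the linear system $(X^t W_t X)\beta_{t+1} = X^t W_t z_t$ rather than forming the inverse explicitly, and the weights are kept as a vector so that $W_t$ is never stored as a full matrix. At convergence the score $X^t(y - \mu)$ is zero up to rounding, which gives a direct check on the result.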
Note:
A good choice of starting value usually reduces the number of cycles
by about one or perhaps two.
Note:
After a few cycles of the weighted estimating equation, the fitted
values $\hat{\mu}_i = m_i \pi_i^{(t)}$, where $\pi_i^{(t)}$ is the fitted probability at cycle $t$, are normally quite accurate, but the parameter
estimates and their standard errors may not be. Two criteria are
tested to detect abnormal convergence of this type. The primary
criterion is based on the change in the fitted probabilities, for
instance by using the deviance. The other is based on the change in
$\beta_t$.
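Note:
Let

$$
W = W(\beta) = \begin{pmatrix}
m_1 \pi_1 (1 - \pi_1) & 0 & \cdots & 0 \\
0 & m_2 \pi_2 (1 - \pi_2) & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & m_n \pi_n (1 - \pi_n)
\end{pmatrix}.
$$

Then, as $n \to \infty$,

$$E(\hat{\beta} - \beta) = O(n^{-1})$$

and

$$\operatorname{Cov}(\hat{\beta}) = \left( X^t W X \right)^{-1} \left[ 1 + O(n^{-1}) \right].$$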
Note:
The above results are also true for the alternative limit in which $n$ is
fixed and $m \to \infty$.