The Statistician (2001)
50, Part 1, pp. 41±50
The median lethal doseÐdesign and estimation
Gabrielle E. Kelly
National University of Ireland, Dublin, Republic of Ireland
[Received September 1999. Revised June 2000]
Summary. A biological assay from a pharmaceutical company is simulated with various designs
and a range of parameter values. Three estimation techniquesÐFieller's method, pro®le likelihood
and the bootstrapÐof ®nding con®dence intervals for the median lethal dose LD50 are compared.
Correcting for bias does not substantially improve the maximum likelihood point estimator of LD50 .
The choice of the number of dose levels and sample size at each dose is seen to be critical in terms
of the accuracy and precision of the estimation techniques.
Keywords: Bootstrap; Fieller interval; LD50 ; Monte Carlo simulation; Probit response curve; Pro®le
likelihood interval
1. Introduction
Interest in estimation of the median lethal dose LD50 arose from consultation with a pharmaceutical company in Ireland. The company regularly carries out a biological assay on mice. Doses
of a compound at eight levels are administered to mice, with 10 mice at each dose level, to assess
the toxicity of the compound. The death or survival within 24 h of each mouse is recorded.
However, the assay has often to be repeated, because the con®dence limits for the toxicity of the
compound, as measured by LD50, do not satisfy regulatory requirements. As this is expensive it is
appropriate to examine the design of the assay and estimation methods to ®nd ways in which it
could be improved. The choice of a time limit of 24 h for death or survival of a mouse is made
because results are needed quickly in the production process and furthermore it has been found in
practice that a change in survival status after 24 h occurs only rarely. To understand why eight
dose levels and 10 mice at each dose level are used we introduce the following notation. De®ne
the random variable Y by Y 1 if the outcome for a mouse is death and Y 0 if the outcome is
survival and let ð P(Y 1). The response Y varies with dose. From repeated assays, the
proportions of deaths when plotted against log(dose) were found to be close to those expected in a
cumulative normal distribution function. Therefore a model for ð is taken as
ð Öf(t ÿ ì)=ó g,
(1)
where Ö is the cumulative distribution function of the standard normal distribution and t is
de®ned by the link function
(t ÿ ì)=ó á â log(dose) Öÿ1 (ð):
(2)
This is known as the probit response dose curve, in which t represents the dose (on the log-scale)
likely to kill 100ð% of experimental animals; so, for example, t ì corresponds to the logarithm
Address for correspondence: Gabrielle E. Kelly, Department of Statistics, University College Dublin, National University
of Ireland, Bel®eld, Dublin 4, Republic of Ireland.
E-mail: gabrielle.kelly@ucd.ie
& 2001 Royal Statistical Society
0039±0526/01/50041
42
G. E. Kelly
of the median lethal dose, log(LD50 ). From equation (2), log(LD50 ) ÿá=â. We de®ne the
tolerance of a mouse as the dose of such a magnitude as to be just insuf®cient to cause death,
and the model here assumes that the log-tolerances in the mouse population have a symmetric
distribution that is usually close to normal. The cumulative distribution function of the tolerance
distribution is Ö. This biological interpretation of the model has led to its appeal in bioassays.
Finney (1971) contains many examples of its use.
It is usual to estimate the slope â and intercept á of equation (2), together with their asymptotic
variances and covariance, by maximum likelihood (ML). The LD50 is then estimated as the ratio
of the estimates of á and â. The two methods that have found most prominence in the literature
for then determining a con®dence interval for LD50 are Fieller's, as described in Finney (1971),
and the likelihood ratio (LR) as proposed by Williams (1986). The LR interval is also known as
the pro®le likelihood method. Here we examine also a bootstrap con®dence interval for LD50. In
addition we consider point estimation of LD50, as the ML estimator is biased. We correct for bias
by using the bootstrap distribution, the delta method and the midpoint of the Fieller interval and
we compare the results.
The eight doses used by the company bracket the target log(LD50 ) of 2 and the choice of 10
mice has been found to give adequate precision when log(LD50 ) 2. In practice, because of
variation in the manufacturing process, the true log(LD50 ) will take a range of values and within
this range the standard deviation of the tolerances may also vary. It is not clear whether we should
take a large number of dose levels, k, with a relatively small number n of mice at each dose, or
whether we should use eight dose levels or fewer, as is done at present, and increase n. There is no
simple relationship between the way that dose levels are chosen and the precision obtained for the
estimated LD50. Simulation is needed to study this. A further discussion of this problem can be
found in Finney (1971). In this paper we simulate designs with a range of dose levels and varying
number n of mice at each dose level. A range of typical values for á and â found in practice by
the company are used. We compare the Fieller, LR and bootstrap intervals and estimators for LD50
for these designs. Approximate values of k and n necessary to achieve prespeci®ed degrees of
precision and coverage for a range of the parameters are found from the simulation.
The paper is organized as follows. Section 2 describes the Wald, Fieller, LR and bootstrap
con®dence intervals for LD50 as well as point estimators. Section 3 describes a simulation study.
Section 4 looks at its results. Section 5 summarizes the results, and conclusions are drawn.
2. Point and interval estimation
Suppose that the experiment contains k dose levels with log-doses x1 , x2 , . . ., x k and replications
n1 , n2 , . . ., n k , where Ó n i n, and the n animals are randomly assigned to the dose levels. The
number of deaths at the ith dose is denoted by yi, and the yi are assumed independently distributed
binomial(n i , ð i ). The probabilities ð i are assumed to be related to the log-doses either through the
logistic model, ð i F(á âxi ), where F is the standard logistic distribution, or through the
probit model, ð i Ö(á âxi ), as described in Section 1. log(LD50 ) is given by r ÿá=â. The
standard deviation of the tolerance distribution is given by 1=â. The remainder of this paper will
assume the probit model, but similar arguments can be applied to the logistic model. The
likelihood of the data is given by
k
Q
ni
(3)
Ö(á âxi ) yi f1 ÿ Ö(á âxi )g n i ÿ yi :
L(á, â)
yi
t1
^ Their variances var(á)
^ and â.
^ íá
Numerical methods can be used to ®nd the ML estimates á
^
^
^ â) íáâ are estimated from the inverse of the sample
and var( â) íâ, and their covariance cov(á,
Median Lethal Dose
43
information matrix. We denote the estimates by í^á, í^â and í^áâ respectively. The ML estimate of r
^
^ â.
is r^ ÿá=
2.1. Wald's interval
The estimator r^ is a ratio estimator and using the delta method an approximate variance is given
by
(
)
^2
E(á)
íá
íâ
2íáâ
^
:
(4)
ÿ
var(r)
^ 2 E(á)
^
^ 2 E(á)
^ 2 E( â)
^ E( â)
E( â)
Thus, aspr^ is asymptotically normal, an approximate 100(1ÿá)% con®dence interval for r is
^ íá , í â
^ í^á , í^â and í^áâ for E(á),
^ E( â),
c r),
c r)
^ â,
^ where var(
^ is found by substituting á,
r^ zá=2 var(
and íáâ respectively in equation (4) and zá is the 100á upper percentage point of the standard
normal distribution. This is identical with the Wald interval as shown by Cox (1990). It will not be
studied here because, as shown by Finney (1971) and Lui (1998), it is too short on average.
2.2. Fieller's interval
Because, under suitable regularity conditions, ML estimates are asymptotically normal, Fieller
^ râ^ was approximately distributed as N (0, íá 2ríáâ r2 íâ ). The Fieller
(1940) argued that á
interval for r, with con®dence coef®cient 1 ÿ á, is then given by the set of r satisfying
^2
^ râ)
(á
, z 2á
íáâ r2 í^â
í^á 2r^
(5)
where zá is de®ned as before. The limits of this interval are given by the roots of the following
quadratic equation in r:
^ 2 z 2 (^
^ râ)
íáâ r2 í^â )
(á
á íá 2r^
(6)
which gives
^ p f(z2 í^áâ ÿ á
^ 2 ÿ ( â^2 ÿ z2 í^â )(á
^â)
^â)
^2 ÿ z2á í^á )g
(z2á í^áâ ÿ á
á
á
:
r
â^2 ÿ z2 í^â
á
^ Cox (1990) showed that if n is large the asymmetric intervals
This interval is asymmetric about r.
given by equation (6) are close to the symmetric intervals given by the Wald intervals using
íâ , z2á , i.e. the Wald test of â 0 is
equation (4). The Fieller interval is of in®nite length if â^2 =^
not statistically signi®cant.
2.3. Likelihood ratio (pro®le likelihood) intervals
The Williams (1986) method for ®nding a con®dence interval for r is based on the asymptotic LR
test of the null hypothesis of r r0 against the alternative r 6 r0 . Consider the log-likelihood of
the data l(á, â) and the pro®le likelihood of r given by
Pl(r) maxfl(ÿrâ, â)g:
â
(7)
Under the null hypothesis ð i Ö(ÿr0 â0 â0 xi ) Öfâ0 (xi ÿ r0 )g and, after evaluating the ML
estimate â^0 of the single unknown parameter â0 , we obtain the log-likelihood under the null
^ The LR test rejects the
^ â).
hypothesis, Pl(r0 ). The log-likelihood under the alternative is l(á,
2
^
^ â)g . zá . It follows that the 1 ÿ á LR interval for r is
null hypothesis when ÿ2fPl(r0 ) ÿ l(á,
44
G. E. Kelly
^ , z2 . In practice it may be found by doing a
^ â)g
given by the set of r0 satisfying ÿ2fPl(r0 ) ÿ l(á,
á
grid search, i.e. computing the pro®le likelihood value Pl(r0 ) for a suitable set of ®xed values of
r0 , and tabulating or plotting the function Pl(r0 ) against r0 ; from this the approximate range for
^ , z2 can be deduced. This is computationally expensive. Alho and
^ â)g
which ÿ2fPl(r0 ) ÿ l(á,
á
Valtonen (1995) used Newton's method to calculate the end points of the interval. They de®ned
^ ÿ Pl(r)g ÿ z2 . Then Newton's method proceeds iteratively as
^ â)
h(r) 2fl(á,
á
r( j1) r( j) ÿ h(r( j) )=h9(r( j) ),
j 0, 1, 2 . . . :
Further details can be found in Alho and Valtonen (1995). This interval we call the LR interval.
As shown by Williams (1986), if the slope â is not signi®cantly different from 0 by the LR test,
the LR interval is of in®nite length.
2.4. Bootstrap intervals
The bootstrap method for ®nding a con®dence interval for an unknown parameter is outlined in
Efron and Tibshirani (1993). A parametric bootstrap method is used here. Using the ML estimates
^ responses yi are randomly generated from a binomial(n i , ð^i ), i 1, . . ., k, where ð^i
^ and â,
á
^ i ), and the ML estimates á , â and r ÿá =â for these data are found. The
^ âx
Ö(á
estimates á , â and r are called bootstrap estimates. This simulation of data is repeated a large
number B of times, giving bootstrap estimates r1 , r2 , . . ., rB . The distribution of these estimates
is known as the bootstrap distribution. These values are ranked from smallest to largest. Many
methods are available for identifying a 100(1 ÿ á)% con®dence interval from this rank-ordered
distribution as described in Efron (1987). The percentile method uses the 100á=2% and
100(1 ÿ á=2)% ranked values to de®ne the con®dence interval. This interval we call the bootstrap
interval. The advantage of the bootstrap is that the joint distribution of the ML estimators of á and
â is not assumed to be normal, unlike in the Fieller method. Also, the asymptotic result, that the
LR statistic is ÷ 2 , is not used.
2.5. Point estimation
The ML estimator of r, being a ratio estimator, is biased. Using the delta method the bias can be
d Alternad í^áâ =â^2 ÿ á^
^íâ =â^3, resulting in the bias-corrected estimator r^ ÿ bias.
estimated by bias
tively we can use the midpoint of the Fieller con®dence interval to estimate r. However, the choice
of con®dence coef®cient is arbitrary and different choices will give different estimators. Denote
the mean (or median) of the bootstrap distribution by r^mid. The bootstrap estimator of the bias is
^ resulting in the bias-corrected bootstrap estimator of 2r^ ÿ r^mid .
r^mid ÿ r,
3. Simulation study
A large Monte Carlo study was carried out with the eight log-dose levels used by the company.
These are equally spaced between 1.571 and 2.437. These log-doses are dilution factors, so a small
log-dose will result in a high response. As mentioned in Section 1, because of variation in the
manufacturing process, the true log(LD50 ) may range from 1.89 to 2.09 and within this range the
standard deviation of the tolerances may also vary from about 0.07 to 0.11. The doses used by the
company follow the recommendation of Dixon and Mood (1948) that they bracket LD50 (when it
equals 2) and give â(xi1 ÿ xi ) a value in the range 1±2. However, this design is only optimal
when the true LD50 is 2. We present results for four representative values of á and â:
(a) á 20 and â ÿ10 (log(LD50 ) 2:0);
Median Lethal Dose
45
(b) á 30 and â ÿ15 (log(LD50 ) 2:0);
(c) á 17 and â ÿ9 (log(LD50 ) 1:89);
(d) á 20 and â ÿ9:57 (log(LD50 ) 2:09).
However, a grid of values for LD50 between 1.89 and 2.09 and â between 0.07 and 0.11 was taken.
For each of (a)±(d) replications n i 10 and n i 15 were used at each dose level, and 1000
samples were drawn according to the probit model. If the dose±response curve is steep ( â large)
relative to the spread of doses fewer than two dose groups may have observed mortalities strictly
between 0 and 100%. In such cases the ML estimate of â is in®nite and con®dence intervals
cannot be determined. This fact has been used by Chanter and Heywood (1982) as an argument
for advocating the use of more dose levels and by Williams (1986) for increasing n i . Therefore the
simulation was repeated, increasing the number of dose levels to 15, again equally spaced between
1.571 and 2.437 and n i 10 at each dose. Thus there are more dose levels bracketing the more
extreme LD50 s, i.e. 1.8 and 2.09, which will increase precision, though at greater cost. The
simulation was also carried out using six dose levels, omitting the highest and lowest log-dose
levels used by the company but with n i 15 and n i 20 at each dose. The Newton±Raphson
method was used to ®nd the ML estimates of the parameters in the model. For each sample, each
of the methods described in Section 2 was used to ®nd a point estimate and 95% con®dence
interval for log(LD50 ). For the LR interval both Newton's method and a grid search were used. For
the bootstrap, B 1000 replicates were made for each sample. For the interval estimates, the
criteria used to evaluate them were the ®niteness of the interval, coverage probability, average and
median width of the interval and standard deviation of the width. All calculations were correct to
four decimal places. The results are presented in Table 1; 90% and 99% con®dence intervals were
also computed and the results were consistent with Table 1. Values of n i less than 10 were also
simulated but it was found that n i at least 10 was needed to achieve reasonable coverage and
precision.
For the point estimates, the average and median bias (from true) were computed, together with
the standard deviation of the bias. Because some values in the bootstrap distribution were quite
extreme, the median (rather than the mean) of the bootstrap distribution was used to produce the
bias-corrected bootstrap estimator. Results on point estimation can be seen in Table 2. The results
for the other experimental con®gurations of Table 1 and the other simulations carried out on the
bias were consistent with Table 2.
4. Results
From Table 1 we see that the existence of the ML estimates was best when 15 doses and n i 10
or six doses and n i 20 were used. In this respect the latter pair is to be preferred as it has a
smaller sample size. Shorter intervals were found for all three methods when log(LD50 ) was 2, as
then the log-dose levels are symmetric around 2. As to be expected also, the mean width of the
intervals was shortest when the slope was steepest, i.e. á 30 and â ÿ15. Unlike Williams's
(1986) ®nding, for most simulations the LR interval did not lie within the Fieller interval. For all
experimental con®gurations the LR method performed worst of all three methods, in terms of
®niteness, coverage probability and width of the interval. In certain instances the lower bound was
0 and=or a ®nite upper bound did not exist. This is illustrated in Fig. 1, where Newton's algorithm
converges very slowly to give a lower con®dence bound of 0 and the upper bound is ®nite. The
responses here were 10, 10, 6, 2, 1, 0, 0 and 0. Fig. 1 illustrates that the ÷ 2 -approximation to the
LR statistic does not always work well. In this example both the bootstrap and the Fieller methods
gave con®dence limits of 1.81 and 1.93.
46
G. E. Kelly
Table 1. Percentage of ®nite intervals, coverage probability, mean and median width and standard deviation of
the width of the ®nite intervals, of the Fieller, LR and bootstrap 95% con®dence intervals for LD50 {
Method
% ®nite intervals
Coverage
Mean width
Standard deviation
of width
Median width
Set (A)
(a) á 20; â ÿ10; n 10; number of ®nite ML estimate 966
Fieller
100
0.962
LR
86
0.979
Bootstrap
100
0.942
0.114
0.171
0.097
0.0094
0.2739
0.0156
0.113
0.134
0.098
(b) á 30; â ÿ15; n 10; number of ®nite ML estimate 739
Fieller
100
0.992
LR
96
0.997
Bootstrap
100
0.986
0.102
0.120
0.075
0.0073
0.1100
0.0149
0.103
0.104
0.071
(c) á 17; â ÿ9; n 10; number of ®nite ML estimate 974
Fieller
100
0.969
LR
82
0.983
Bootstrap
100
0.945
0.118
0.147
0.103
0.0105
0.0550
0.0158
0.118
0.143
0.104
(d) á 20; â ÿ9:57; n 10; number of ®nite ML estimate 948
Fieller
100
0.961
0.116
LR
83
0.963
0.145
Bootstrap
100
0.912
0.101
0.0096
0.0333
0.0143
0.116
0.141
0.101
Set (B)
(a) á 20; â ÿ10; n 15; number of ®nite ML estimate 996
Fieller
100
0.963
LR
97.7
0.965
Bootstrap
100
0.944
0.088
0.106
0.080
0.0079
0.0412
0.0108
0.089
0.102
0.081
(b) á 30; â ÿ15; n 15; number of ®nite ML estimate 902
Fieller
100
0.988
LR
97
0.969
Bootstrap
100
0.969
0.077
0.083
0.063
0.0061
0.0155
0.0109
0.077
0.080
0.064
(c) á 17; â ÿ9; n 15; number of ®nite ML estimate 999
Fieller
100
0.945
LR
95.2
0.954
Bootstrap
100
0.928
0.093
0.110
0.086
0.0080
0.0251
0.0102
0.093
0.105
0.087
(d) á 20; â ÿ9:57; n 15; number of ®nite ML estimate 983
Fieller
100
0.956
0.091
LR
94.2
0.957
0.108
Bootstrap
100
0.928
0.084
0.0075
0.0453
0.0093
0.091
0.103
0.084
Set (C)
(a) á 20; â ÿ10; n 10; number of ®nite ML estimate 1000
Fieller
100
0.960
LR
96
0.979
Bootstrap
100
0.937
0.075
0.100
0.071
0.0059
0.0445
0.0066
0.076
0.094
0.071
(b) á 30; â ÿ15; n 10; number of ®nite ML estimate 990
Fieller
100
0.949
LR
91
0.968
Bootstrap
100
0.913
0.063
0.084
0.058
0.0056
0.0578
0.0071
0.064
0.079
0.058
(c) á 17; â ÿ9; n 10; number of ®nite ML estimate 1000
Fieller
100
0.959
LR
97
0.974
Bootstrap
100
0.942
0.079
0.105
0.075
0.0064
0.0302
0.0071
0.079
0.100
0.076
(d) á 20; â ÿ9:57; n 10; number of ®nite ML estimate 1000
Fieller
100
0.948
0.077
LR
96
0.978
0.100
Bootstrap
100
0.931
0.073
0.0059
0.0196
0.0067
0.077
0.096
0.073
(continued )
Median Lethal Dose
47
Table 1 (continued )
Method
% ®nite intervals
Coverage
Mean width
Standard deviation
of width
Median width
Set (D)
(a) á 20; â ÿ10; n 15; number of ®nite ML estimate 994
Fieller
100
0.956
LR
100
0.971
Bootstrap
100
0.933
0.088
0.109
0.080
0.0078
0.0726
0.0107
0.089
0.101
0.081
(b) á 30; â ÿ15; n 15; number of ®nite ML estimate 888
Fieller
100
0.986
LR
97
0.972
Bootstrap
100
0.971
0.076
0.081
0.062
0.0062
0.0340
0.0110
0.076
0.078
0.064
(c) á 17; â ÿ9; n 15; number of ®nite ML estimate 999
Fieller
100
0.953
LR
95
0.957
Bootstrap
100
0.930
0.095
0.124
0.087
0.0102
0.0685
0.0121
0.094
0.107
0.087
(d) á 20; â ÿ9:57; n 15; number of ®nite ML estimate 989
Fieller
100
0.947
0.091
LR
95
0.960
0.119
Bootstrap
100
0.923
0.084
0.0090
0.0737
0.0108
0.091
0.104
0.084
Set (E)
(a) á 20; â ÿ10; n 20; number of ®nite ML estimate 999
Fieller
100
0.950
LR
99
0.957
Bootstrap
100
0.932
0.076
0.090
0.071
0.0061
0.0463
0.0074
0.076
0.084
0.072
(b) á 30; â ÿ15; n 20; number of ®nite ML estimate 963
Fieller
100
0.961
LR
100
0.951
Bootstrap
100
0.948
0.063
0.067
0.055
0.0051
0.0221
0.0084
0.063
0.064
0.056
(c) á 17; â ÿ9; n 20; number of ®nite ML estimate 999
Fieller
100
0.944
LR
98
0.956
Bootstrap
100
0.929
0.081
0.097
0.076
0.0077
0.0368
0.0085
0.079
0.100
0.076
(d) á 20; â ÿ9:57; n 10; number of ®nite ML estimate 998
Fieller
100
0.954
0.077
LR
98
0.959
0.090
Bootstrap
100
0.935
0.073
0.0068
0.0273
0.0077
0.078
0.085
0.073
{The model is Y i binomial(n, ð i ), where probit(ð i ) á âxi . LD50 ÿá=â. Results are shown for four pairs (á, â). In
sets (A) and (B) each con®guration has eight log-dose levels xi equally spaced between 1.571 and 2.437; in set (A) at each
dose the replication n is 10 and in set (B) it is 15. In set (C) there are 15 log-dose levels equally spaced between 1.571 and
2.437 and at each dose the replication n is 10. In sets (D) and (E) there are six log-dose levels which are the same as in set
(A) except that the smallest and largest have been omitted, and the replication n is 15 in set (D) and 20 in set (E).
We note that both Newton's method and the grid search gave identical results for the LR
interval. A maximum of 30 iterations were done for Newton's method. The grid search found on
average an extra 3% of ®nite intervals over Newton's method and these intervals were among the
widest. Newton's algorithm failed to converge when the responses at the extremes of the design
space were identical e.g. 10, 10, 7, 0, 3, 0, 0 and 0. The coverage probability of the Fieller method
was almost always too high. The bootstrap gave varying coverage probability results and in
general tended to underestimate the coverage probability. Whereas the coverage probability for
Fieller intervals approaches the nominal value, on average, as the total number of animals
increases, this occurs at a slower rate with the bootstrap. The mean width of the bootstrap intervals
48
G. E. Kelly
Table 2. Mean and median bias, standard deviation SD of the bias, of the ML estimate, Fieller, delta-methodcorrected and bootstrap-method-corrected estimators of LD50 for the four con®gurations of set (A) of Table 1 and
n 10
Method
(a) á 20; â ÿ10
ML estimate
Fieller
Delta
Bootstrap
(c) á 17; â ÿ9
ML estimate
Fieller
Delta
Bootstrap
Mean
Median
SD
0.0003
0.0003
0.0003
0.00025
0.0038
0.0038
0.0038
0.0038
0.0251
0.0262
0.0250
0.0267
ÿ0.0005
ÿ0.0008
ÿ0.0004
ÿ0.00006
ÿ0.0010
0.0008
ÿ0.0014
ÿ0.0002
0.0271
0.0272
0.0271
0.0280
Mean
Median
(b) á 30; â ÿ15
0.0010
0.0038
0.0009
0.0038
0.0010
0.0038
0.0009
0.0038
(d) á 20; â ÿ9:57
0.0019
0.0019
0.0020
0.0004
0.0019
0.0021
0.0011
ÿ0.0007
SD
0.0185
0.0189
0.0185
0.0211
0.0263
0.0263
0.0263
0.0263
Fig. 1. Plot of deviance dev 2fl ÿ l(R)g versus R, for one simulation of set (A) of Table 1, with á 17,
â ÿ9:0 and n 10, and R denotes LD50 : the reference line 3.841 is also shown
was always the shortest of the three methods. Our results on the LR interval are in contrast with
those of Alho and Valtonen (1995) and Williams (1986). However, we simulated other experimental con®gurations (which were not of interest to the company), e.g. á 2, â ÿ1 and logdose levels 0.5, 1, 1.5, 2, 2.5, 3 and 3.5, and for these, where the slope was not so steep relative to
the spread of doses, we found results similar to those of Williams (1986). For these, the coverage
probability of all three methods was close to nominal, with the mean width of the bootstrap being
shortest, followed by that for the LR method, with that of the Fieller method always the largest.
Comparing the various experimental con®gurations of Table 1 we see that for eight dose levels the
coverage probability remains approximately the same, going from n i 10 in set (A) to n i 15 in
Median Lethal Dose
49
set (B) for the Fieller and the bootstrap methods, but the mean width decreases by approximately
0.2. Using 15 doses as opposed to eight reduces the mean width of the intervals by about 0.3 and
improves the coverage probability. It is interesting that using six doses with n i 15 in set (D)
compared with eight doses and n i 10 in set (A) (i.e. approximately the same number of
experimental animals) is better in terms of the existence of the ML estimate and also gives an
improvement of 0.1 in coverage probability and 0.2 in mean width. There is little improvement in
using eight doses with n i 15 over six doses with n i 15. The design with six doses and
n i 20 in set (E) does almost as well as that with 15 doses and n i 10 in set (C). Similarly the
design with eight doses and n i 15 gives less precision than that with six doses and n i 20.
Thus designs which distribute the animals into a small number of dose groups do slightly better
than those with the same total number of animals but which have few animals at each of a large
number of dose groups. In general we note that for a ®xed number of dose levels the precision
increases as n i increases, as expected. We also note that the mean width of the intervals remains
relatively constant over the four (á, â) pairs in each of sets (A)±(E) of Table 1. It follows that the
sample size to achieve a ®xed precision on average can be read from Table 1. The standard
deviation of the width of the intervals is roughly a tenth that of the average width for those from
the Fieller and the bootstrap methods, so this sample size will be approximately correct for
individual assays also. The same is true for the coverage probability of Fieller intervals. Larger
sample sizes than are shown here are necessary to guarantee a prespeci®ed coverage with the
bootstrap.
With respect to point esimation, from Table 2 we see that the bootstrap bias is smaller than that
of the delta method or midpoint of the 95% Fieller interval on average, but it is not a substantial
improvement on the ML estimate whose bias is small also.
5. Discussion
In the simulation study of Alho and Valtonen (1995) it was shown for particular values of the
slope and intercept that the Fieller and LR intervals failed to exist, or were in®nite, in as many as
50% of the simulations. For their simulations also, the true con®dence probability of the LR
intervals tended to fall below the nominal level, whereas the Fieller intervals were conservative.
This was also shown by Williams (1986). Lui (1998) compared con®dence intervals based on the
log-transform of the risk ratio and those by using Fieller's theorem and found the former to be
superior. Perhaps if Fieller's theorem had also been applied on the log-scale the two methods
might have been found to be more similar.
In this simulation we ®nd that none of the three methods, the Fieller, LR and bootstrap
methods, are entirely satisfactory in terms of coverage probability for providing con®dence
intervals for LD50 for this company, unless very large sample sizes are taken. The LR intervals
have the poorest precision and coverage probability of the three methods. In terms of shortest
intervals, the bootstrap is clearly the method of choice. Even though the bootstrap is always more
precise than the Fieller method, because of its variable coverage probability, it cannot be
recommended and therefore the Fieller method is the more satisfactory. In terms of design, we
have found that choosing six dose levels and 15±20 animals at each dose for the Fieller method is
satisfactory in meeting the company's requirements of precision and coverage. This is contrary to
the assertion of Chanter and Heywood (1982) who recommended a large number of dose levels.
There is variability in the true log(LD50 ) of the manufacturing process of this company. Thus
the log-doses used in the assay are not always symmetric around the true log(LD50 ), resulting in
poorer coverage and wider intervals than if they were. This problem has led many researchers (e.g.
Dixon and Mood (1948) and Brownlee et al. (1953)) to consider sequential methods. Brownlee
50
G. E. Kelly
et al. (1953) showed that the `up-and-down' method requires only two-thirds the number of
animals needed for a conventional study to yield the same degree of precision. However, these
methods require that trials be made sequentially; before each trial is started the response to the
preceding trials must be known. Brownlee et al. (1953) proposed to mitigate this disadvantage by
running several short up-and-down series simultaneously, rather than a single long series. This still
raises logistic problems with laboratories and has not found popularity. The evidence is strong,
though, that a small pilot study which established the approximate location of LD50 would be
valuable. Then the log-doses could be chosen to bracket this value with a corresponding increase
in precision, and a smaller sample size could be taken. There is also evidence that the standard
deviation of the tolerances of the mice is quite variable, ranging from 0.06 to 0.11 on different
occasions. This suggests that the mice do not come from a homogeneous population. It is well
known that the response of an animal to a given dose is affected by (among other things) its
weight, age and previous history. One possibility is to include these factors as covariates in the
probit analysis. Other measures, such as those suggested by Zbinden and Flury-Roversi (1981) for
the conduct of the trials and care of the animals, if adopted, could lead to a more homogeneous
population.
Acknowledgements
We wish to thank the Joint Editor and Associate Editor for their helpful comments which led to
substantial improvements in the paper.
References
Alho, J. M. and Valtonen, E. (1995) Interval estimation of inverse dose-response. Biometrics, 51, 491±501.
Brownlee, K., Hodges, J. and Rosenblatt, M. (1953) The up-and-down method with small samples. J. Am. Statist. Ass., 48,
262±277.
Chanter, D. and Heywood, R. (1982) The LD50 test: some considerations of precision. Toxicol. Lett., 10, 303±307.
Cox, C. (1990) Fieller's theorem, the likelihood and the delta method. Biometrics, 46, 709±718.
Dixon, W. and Mood, A. (1948) A method for obtaining and analyzing sensitivity data. J. Am. Statist. Ass., 43, 109±126.
Efron, B. (1987) Better bootstrap con®dence intervals. J. Am. Statist. Ass., 82, 171±200.
Efron, B. and Tibshirani, R. (1993) An Introduction to the Bootstrap. New York: Chapman and Hall.
Fieller, E. (1940) The biological standardization of insulin (with discussion). J. R. Statist. Soc., suppl., 7, 1±64.
Finney, D. (1971) Probit Analysis, 3rd edn. London: Cambridge University Press.
Lui, K.-J. (1998) Interval estimation of the risk ratio between a secondary infection, given a primary infection, and the
primary infection. Biometrics, 54, 706±711.
Williams, D. H. (1986) Interval estimation of the median lethal dose. Biometrics, 42, 641±645.
Zbinden, G. and Flury-Roversi, M. (1981) Signi®cance of the LD50 test for the toxicological evaluation of chemical
substances. Arch. Toxicol., 47, 77±99.