Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Academia.eduAcademia.edu

The Median Lethal Dose-Design and Estimation

2001, Journal of the Royal Statistical Society: Series D (The Statistician)

A biological assay from a pharmaceutical company is simulated with various designs and a range of parameter values. Three estimation techniquesÐFieller's method, pro®le likelihood and the bootstrapÐof ®nding con®dence intervals for the median lethal dose LD 50 are compared. Correcting for bias does not substantially improve the maximum likelihood point estimator of LD 50. The choice of the number of dose levels and sample size at each dose is seen to be critical in terms of the accuracy and precision of the estimation techniques.

The Statistician (2001) 50, Part 1, pp. 41±50 The median lethal doseÐdesign and estimation Gabrielle E. Kelly National University of Ireland, Dublin, Republic of Ireland [Received September 1999. Revised June 2000] Summary. A biological assay from a pharmaceutical company is simulated with various designs and a range of parameter values. Three estimation techniquesÐFieller's method, pro®le likelihood and the bootstrapÐof ®nding con®dence intervals for the median lethal dose LD50 are compared. Correcting for bias does not substantially improve the maximum likelihood point estimator of LD50 . The choice of the number of dose levels and sample size at each dose is seen to be critical in terms of the accuracy and precision of the estimation techniques. Keywords: Bootstrap; Fieller interval; LD50 ; Monte Carlo simulation; Probit response curve; Pro®le likelihood interval 1. Introduction Interest in estimation of the median lethal dose LD50 arose from consultation with a pharmaceutical company in Ireland. The company regularly carries out a biological assay on mice. Doses of a compound at eight levels are administered to mice, with 10 mice at each dose level, to assess the toxicity of the compound. The death or survival within 24 h of each mouse is recorded. However, the assay has often to be repeated, because the con®dence limits for the toxicity of the compound, as measured by LD50, do not satisfy regulatory requirements. As this is expensive it is appropriate to examine the design of the assay and estimation methods to ®nd ways in which it could be improved. The choice of a time limit of 24 h for death or survival of a mouse is made because results are needed quickly in the production process and furthermore it has been found in practice that a change in survival status after 24 h occurs only rarely. To understand why eight dose levels and 10 mice at each dose level are used we introduce the following notation. De®ne the random variable Y by Y ˆ 1 if the outcome for a mouse is death and Y ˆ 0 if the outcome is survival and let ð ˆ P(Y ˆ 1). The response Y varies with dose. From repeated assays, the proportions of deaths when plotted against log(dose) were found to be close to those expected in a cumulative normal distribution function. Therefore a model for ð is taken as ð ˆ Öf(t ÿ ì)=ó g, (1) where Ö is the cumulative distribution function of the standard normal distribution and t is de®ned by the link function (t ÿ ì)=ó ˆ á ‡ â log(dose) ˆ Öÿ1 (ð): (2) This is known as the probit response dose curve, in which t represents the dose (on the log-scale) likely to kill 100ð% of experimental animals; so, for example, t ˆ ì corresponds to the logarithm Address for correspondence: Gabrielle E. Kelly, Department of Statistics, University College Dublin, National University of Ireland, Bel®eld, Dublin 4, Republic of Ireland. E-mail: gabrielle.kelly@ucd.ie & 2001 Royal Statistical Society 0039±0526/01/50041 42 G. E. Kelly of the median lethal dose, log(LD50 ). From equation (2), log(LD50 ) ˆ ÿá=â. We de®ne the tolerance of a mouse as the dose of such a magnitude as to be just insuf®cient to cause death, and the model here assumes that the log-tolerances in the mouse population have a symmetric distribution that is usually close to normal. The cumulative distribution function of the tolerance distribution is Ö. This biological interpretation of the model has led to its appeal in bioassays. Finney (1971) contains many examples of its use. It is usual to estimate the slope â and intercept á of equation (2), together with their asymptotic variances and covariance, by maximum likelihood (ML). The LD50 is then estimated as the ratio of the estimates of á and â. The two methods that have found most prominence in the literature for then determining a con®dence interval for LD50 are Fieller's, as described in Finney (1971), and the likelihood ratio (LR) as proposed by Williams (1986). The LR interval is also known as the pro®le likelihood method. Here we examine also a bootstrap con®dence interval for LD50. In addition we consider point estimation of LD50, as the ML estimator is biased. We correct for bias by using the bootstrap distribution, the delta method and the midpoint of the Fieller interval and we compare the results. The eight doses used by the company bracket the target log(LD50 ) of 2 and the choice of 10 mice has been found to give adequate precision when log(LD50 ) ˆ 2. In practice, because of variation in the manufacturing process, the true log(LD50 ) will take a range of values and within this range the standard deviation of the tolerances may also vary. It is not clear whether we should take a large number of dose levels, k, with a relatively small number n of mice at each dose, or whether we should use eight dose levels or fewer, as is done at present, and increase n. There is no simple relationship between the way that dose levels are chosen and the precision obtained for the estimated LD50. Simulation is needed to study this. A further discussion of this problem can be found in Finney (1971). In this paper we simulate designs with a range of dose levels and varying number n of mice at each dose level. A range of typical values for á and â found in practice by the company are used. We compare the Fieller, LR and bootstrap intervals and estimators for LD50 for these designs. Approximate values of k and n necessary to achieve prespeci®ed degrees of precision and coverage for a range of the parameters are found from the simulation. The paper is organized as follows. Section 2 describes the Wald, Fieller, LR and bootstrap con®dence intervals for LD50 as well as point estimators. Section 3 describes a simulation study. Section 4 looks at its results. Section 5 summarizes the results, and conclusions are drawn. 2. Point and interval estimation Suppose that the experiment contains k dose levels with log-doses x1 , x2 , . . ., x k and replications n1 , n2 , . . ., n k , where Ó n i ˆ n, and the n animals are randomly assigned to the dose levels. The number of deaths at the ith dose is denoted by yi, and the yi are assumed independently distributed binomial(n i , ð i ). The probabilities ð i are assumed to be related to the log-doses either through the logistic model, ð i ˆ F(á ‡ âxi ), where F is the standard logistic distribution, or through the probit model, ð i ˆ Ö(á ‡ âxi ), as described in Section 1. log(LD50 ) is given by r ˆ ÿá=â. The standard deviation of the tolerance distribution is given by 1=â. The remainder of this paper will assume the probit model, but similar arguments can be applied to the logistic model. The likelihood of the data is given by   k Q ni (3) Ö(á ‡ âxi ) yi f1 ÿ Ö(á ‡ âxi )g n i ÿ yi : L(á, â) ˆ yi tˆ1 ^ Their variances var(á) ^ and â. ^ ˆ íá Numerical methods can be used to ®nd the ML estimates á ^ ^ ^ â) ˆ íáâ are estimated from the inverse of the sample and var( â) ˆ íâ, and their covariance cov(á, Median Lethal Dose 43 information matrix. We denote the estimates by í^á, í^â and í^áâ respectively. The ML estimate of r ^ ^ â. is r^ ˆ ÿá= 2.1. Wald's interval The estimator r^ is a ratio estimator and using the delta method an approximate variance is given by ( ) ^2 E(á) íá íâ 2íáâ ^ ˆ : (4) ‡ ÿ var(r) ^ 2 E(á) ^ ^ 2 E(á) ^ 2 E( â) ^ E( â) E( â) Thus, aspr^ is asymptotically normal, an approximate 100(1ÿá)% con®dence interval for r is ^ íá , í â ^ í^á , í^â and í^áâ for E(á), ^ E( â), c r), c r) ^ â, ^ where var( ^ is found by substituting á, r^  zá=2 var( and íáâ respectively in equation (4) and zá is the 100á upper percentage point of the standard normal distribution. This is identical with the Wald interval as shown by Cox (1990). It will not be studied here because, as shown by Finney (1971) and Lui (1998), it is too short on average. 2.2. Fieller's interval Because, under suitable regularity conditions, ML estimates are asymptotically normal, Fieller ^ ‡ râ^ was approximately distributed as N (0, íá ‡ 2ríá⠇ r2 íâ ). The Fieller (1940) argued that á interval for r, with con®dence coef®cient 1 ÿ á, is then given by the set of r satisfying ^2 ^ ‡ râ) (á , z 2á íá⠇ r2 í^â í^á ‡ 2r^ (5) where zá is de®ned as before. The limits of this interval are given by the roots of the following quadratic equation in r: ^ 2 ˆ z 2 (^ ^ ‡ râ) íá⠇ r2 í^â ) (á á íá ‡ 2r^ (6) which gives ^  p f(z2 í^áâ ÿ á ^ 2 ÿ ( â^2 ÿ z2 í^â )(á ^â) ^â) ^2 ÿ z2á í^á )g (z2á í^áâ ÿ á á á : rˆ â^2 ÿ z2 í^â á ^ Cox (1990) showed that if n is large the asymmetric intervals This interval is asymmetric about r. given by equation (6) are close to the symmetric intervals given by the Wald intervals using íâ , z2á , i.e. the Wald test of ⠈ 0 is equation (4). The Fieller interval is of in®nite length if â^2 =^ not statistically signi®cant. 2.3. Likelihood ratio (pro®le likelihood) intervals The Williams (1986) method for ®nding a con®dence interval for r is based on the asymptotic LR test of the null hypothesis of r ˆ r0 against the alternative r 6ˆ r0 . Consider the log-likelihood of the data l(á, â) and the pro®le likelihood of r given by Pl(r) ˆ maxfl(ÿrâ, â)g: â (7) Under the null hypothesis ð i ˆ Ö(ÿr0 â0 ‡ â0 xi ) ˆ Öfâ0 (xi ÿ r0 )g and, after evaluating the ML estimate â^0 of the single unknown parameter â0 , we obtain the log-likelihood under the null ^ The LR test rejects the ^ â). hypothesis, Pl(r0 ). The log-likelihood under the alternative is l(á, 2 ^ ^ â)g . zá . It follows that the 1 ÿ á LR interval for r is null hypothesis when ÿ2fPl(r0 ) ÿ l(á, 44 G. E. Kelly ^ , z2 . In practice it may be found by doing a ^ â)g given by the set of r0 satisfying ÿ2fPl(r0 ) ÿ l(á, á grid search, i.e. computing the pro®le likelihood value Pl(r0 ) for a suitable set of ®xed values of r0 , and tabulating or plotting the function Pl(r0 ) against r0 ; from this the approximate range for ^ , z2 can be deduced. This is computationally expensive. Alho and ^ â)g which ÿ2fPl(r0 ) ÿ l(á, á Valtonen (1995) used Newton's method to calculate the end points of the interval. They de®ned ^ ÿ Pl(r)g ÿ z2 . Then Newton's method proceeds iteratively as ^ â) h(r) ˆ 2fl(á, á r( j‡1) ˆ r( j) ÿ h(r( j) )=h9(r( j) ), j ˆ 0, 1, 2 . . . : Further details can be found in Alho and Valtonen (1995). This interval we call the LR interval. As shown by Williams (1986), if the slope â is not signi®cantly different from 0 by the LR test, the LR interval is of in®nite length. 2.4. Bootstrap intervals The bootstrap method for ®nding a con®dence interval for an unknown parameter is outlined in Efron and Tibshirani (1993). A parametric bootstrap method is used here. Using the ML estimates ^ responses yi are randomly generated from a binomial(n i , ð^i ), i ˆ 1, . . ., k, where ð^i ^ and â, á ^ i ), and the ML estimates á , â and r ˆ ÿá =â for these data are found. The ^ ‡ âx ˆ Ö(á estimates á , â and r are called bootstrap estimates. This simulation of data is repeated a large number B of times, giving bootstrap estimates r1 , r2 , . . ., rB . The distribution of these estimates is known as the bootstrap distribution. These values are ranked from smallest to largest. Many methods are available for identifying a 100(1 ÿ á)% con®dence interval from this rank-ordered distribution as described in Efron (1987). The percentile method uses the 100á=2% and 100(1 ÿ á=2)% ranked values to de®ne the con®dence interval. This interval we call the bootstrap interval. The advantage of the bootstrap is that the joint distribution of the ML estimators of á and â is not assumed to be normal, unlike in the Fieller method. Also, the asymptotic result, that the LR statistic is ÷ 2 , is not used. 2.5. Point estimation The ML estimator of r, being a ratio estimator, is biased. Using the delta method the bias can be d Alternad ˆ í^áâ =â^2 ÿ á^ ^íâ =â^3, resulting in the bias-corrected estimator r^ ÿ bias. estimated by bias tively we can use the midpoint of the Fieller con®dence interval to estimate r. However, the choice of con®dence coef®cient is arbitrary and different choices will give different estimators. Denote the mean (or median) of the bootstrap distribution by r^mid. The bootstrap estimator of the bias is ^ resulting in the bias-corrected bootstrap estimator of 2r^ ÿ r^mid . r^mid ÿ r, 3. Simulation study A large Monte Carlo study was carried out with the eight log-dose levels used by the company. These are equally spaced between 1.571 and 2.437. These log-doses are dilution factors, so a small log-dose will result in a high response. As mentioned in Section 1, because of variation in the manufacturing process, the true log(LD50 ) may range from 1.89 to 2.09 and within this range the standard deviation of the tolerances may also vary from about 0.07 to 0.11. The doses used by the company follow the recommendation of Dixon and Mood (1948) that they bracket LD50 (when it equals 2) and give â(xi‡1 ÿ xi ) a value in the range 1±2. However, this design is only optimal when the true LD50 is 2. We present results for four representative values of á and â: (a) á ˆ 20 and ⠈ ÿ10 (log(LD50 ) ˆ 2:0); Median Lethal Dose 45 (b) á ˆ 30 and ⠈ ÿ15 (log(LD50 ) ˆ 2:0); (c) á ˆ 17 and ⠈ ÿ9 (log(LD50 ) ˆ 1:89); (d) á ˆ 20 and ⠈ ÿ9:57 (log(LD50 ) ˆ 2:09). However, a grid of values for LD50 between 1.89 and 2.09 and â between 0.07 and 0.11 was taken. For each of (a)±(d) replications n i ˆ 10 and n i ˆ 15 were used at each dose level, and 1000 samples were drawn according to the probit model. If the dose±response curve is steep ( â large) relative to the spread of doses fewer than two dose groups may have observed mortalities strictly between 0 and 100%. In such cases the ML estimate of â is in®nite and con®dence intervals cannot be determined. This fact has been used by Chanter and Heywood (1982) as an argument for advocating the use of more dose levels and by Williams (1986) for increasing n i . Therefore the simulation was repeated, increasing the number of dose levels to 15, again equally spaced between 1.571 and 2.437 and n i ˆ 10 at each dose. Thus there are more dose levels bracketing the more extreme LD50 s, i.e. 1.8 and 2.09, which will increase precision, though at greater cost. The simulation was also carried out using six dose levels, omitting the highest and lowest log-dose levels used by the company but with n i ˆ 15 and n i ˆ 20 at each dose. The Newton±Raphson method was used to ®nd the ML estimates of the parameters in the model. For each sample, each of the methods described in Section 2 was used to ®nd a point estimate and 95% con®dence interval for log(LD50 ). For the LR interval both Newton's method and a grid search were used. For the bootstrap, B ˆ 1000 replicates were made for each sample. For the interval estimates, the criteria used to evaluate them were the ®niteness of the interval, coverage probability, average and median width of the interval and standard deviation of the width. All calculations were correct to four decimal places. The results are presented in Table 1; 90% and 99% con®dence intervals were also computed and the results were consistent with Table 1. Values of n i less than 10 were also simulated but it was found that n i at least 10 was needed to achieve reasonable coverage and precision. For the point estimates, the average and median bias (from true) were computed, together with the standard deviation of the bias. Because some values in the bootstrap distribution were quite extreme, the median (rather than the mean) of the bootstrap distribution was used to produce the bias-corrected bootstrap estimator. Results on point estimation can be seen in Table 2. The results for the other experimental con®gurations of Table 1 and the other simulations carried out on the bias were consistent with Table 2. 4. Results From Table 1 we see that the existence of the ML estimates was best when 15 doses and n i ˆ 10 or six doses and n i ˆ 20 were used. In this respect the latter pair is to be preferred as it has a smaller sample size. Shorter intervals were found for all three methods when log(LD50 ) was 2, as then the log-dose levels are symmetric around 2. As to be expected also, the mean width of the intervals was shortest when the slope was steepest, i.e. á ˆ 30 and ⠈ ÿ15. Unlike Williams's (1986) ®nding, for most simulations the LR interval did not lie within the Fieller interval. For all experimental con®gurations the LR method performed worst of all three methods, in terms of ®niteness, coverage probability and width of the interval. In certain instances the lower bound was 0 and=or a ®nite upper bound did not exist. This is illustrated in Fig. 1, where Newton's algorithm converges very slowly to give a lower con®dence bound of 0 and the upper bound is ®nite. The responses here were 10, 10, 6, 2, 1, 0, 0 and 0. Fig. 1 illustrates that the ÷ 2 -approximation to the LR statistic does not always work well. In this example both the bootstrap and the Fieller methods gave con®dence limits of 1.81 and 1.93. 46 G. E. Kelly Table 1. Percentage of ®nite intervals, coverage probability, mean and median width and standard deviation of the width of the ®nite intervals, of the Fieller, LR and bootstrap 95% con®dence intervals for LD50 { Method % ®nite intervals Coverage Mean width Standard deviation of width Median width Set (A) (a) á ˆ 20; ⠈ ÿ10; n ˆ 10; number of ®nite ML estimate ˆ 966 Fieller 100 0.962 LR 86 0.979 Bootstrap 100 0.942 0.114 0.171 0.097 0.0094 0.2739 0.0156 0.113 0.134 0.098 (b) á ˆ 30; ⠈ ÿ15; n ˆ 10; number of ®nite ML estimate ˆ 739 Fieller 100 0.992 LR 96 0.997 Bootstrap 100 0.986 0.102 0.120 0.075 0.0073 0.1100 0.0149 0.103 0.104 0.071 (c) á ˆ 17; ⠈ ÿ9; n ˆ 10; number of ®nite ML estimate ˆ 974 Fieller 100 0.969 LR 82 0.983 Bootstrap 100 0.945 0.118 0.147 0.103 0.0105 0.0550 0.0158 0.118 0.143 0.104 (d) á ˆ 20; ⠈ ÿ9:57; n ˆ 10; number of ®nite ML estimate ˆ 948 Fieller 100 0.961 0.116 LR 83 0.963 0.145 Bootstrap 100 0.912 0.101 0.0096 0.0333 0.0143 0.116 0.141 0.101 Set (B) (a) á ˆ 20; ⠈ ÿ10; n ˆ 15; number of ®nite ML estimate ˆ 996 Fieller 100 0.963 LR 97.7 0.965 Bootstrap 100 0.944 0.088 0.106 0.080 0.0079 0.0412 0.0108 0.089 0.102 0.081 (b) á ˆ 30; ⠈ ÿ15; n ˆ 15; number of ®nite ML estimate ˆ 902 Fieller 100 0.988 LR 97 0.969 Bootstrap 100 0.969 0.077 0.083 0.063 0.0061 0.0155 0.0109 0.077 0.080 0.064 (c) á ˆ 17; ⠈ ÿ9; n ˆ 15; number of ®nite ML estimate ˆ 999 Fieller 100 0.945 LR 95.2 0.954 Bootstrap 100 0.928 0.093 0.110 0.086 0.0080 0.0251 0.0102 0.093 0.105 0.087 (d) á ˆ 20; ⠈ ÿ9:57; n ˆ 15; number of ®nite ML estimate ˆ 983 Fieller 100 0.956 0.091 LR 94.2 0.957 0.108 Bootstrap 100 0.928 0.084 0.0075 0.0453 0.0093 0.091 0.103 0.084 Set (C) (a) á ˆ 20; ⠈ ÿ10; n ˆ 10; number of ®nite ML estimate ˆ 1000 Fieller 100 0.960 LR 96 0.979 Bootstrap 100 0.937 0.075 0.100 0.071 0.0059 0.0445 0.0066 0.076 0.094 0.071 (b) á ˆ 30; ⠈ ÿ15; n ˆ 10; number of ®nite ML estimate ˆ 990 Fieller 100 0.949 LR 91 0.968 Bootstrap 100 0.913 0.063 0.084 0.058 0.0056 0.0578 0.0071 0.064 0.079 0.058 (c) á ˆ 17; ⠈ ÿ9; n ˆ 10; number of ®nite ML estimate ˆ 1000 Fieller 100 0.959 LR 97 0.974 Bootstrap 100 0.942 0.079 0.105 0.075 0.0064 0.0302 0.0071 0.079 0.100 0.076 (d) á ˆ 20; ⠈ ÿ9:57; n ˆ 10; number of ®nite ML estimate ˆ 1000 Fieller 100 0.948 0.077 LR 96 0.978 0.100 Bootstrap 100 0.931 0.073 0.0059 0.0196 0.0067 0.077 0.096 0.073 (continued ) Median Lethal Dose 47 Table 1 (continued ) Method % ®nite intervals Coverage Mean width Standard deviation of width Median width Set (D) (a) á ˆ 20; ⠈ ÿ10; n ˆ 15; number of ®nite ML estimate ˆ 994 Fieller 100 0.956 LR 100 0.971 Bootstrap 100 0.933 0.088 0.109 0.080 0.0078 0.0726 0.0107 0.089 0.101 0.081 (b) á ˆ 30; ⠈ ÿ15; n ˆ 15; number of ®nite ML estimate ˆ 888 Fieller 100 0.986 LR 97 0.972 Bootstrap 100 0.971 0.076 0.081 0.062 0.0062 0.0340 0.0110 0.076 0.078 0.064 (c) á ˆ 17; ⠈ ÿ9; n ˆ 15; number of ®nite ML estimate ˆ 999 Fieller 100 0.953 LR 95 0.957 Bootstrap 100 0.930 0.095 0.124 0.087 0.0102 0.0685 0.0121 0.094 0.107 0.087 (d) á ˆ 20; ⠈ ÿ9:57; n ˆ 15; number of ®nite ML estimate ˆ 989 Fieller 100 0.947 0.091 LR 95 0.960 0.119 Bootstrap 100 0.923 0.084 0.0090 0.0737 0.0108 0.091 0.104 0.084 Set (E) (a) á ˆ 20; ⠈ ÿ10; n ˆ 20; number of ®nite ML estimate ˆ 999 Fieller 100 0.950 LR 99 0.957 Bootstrap 100 0.932 0.076 0.090 0.071 0.0061 0.0463 0.0074 0.076 0.084 0.072 (b) á ˆ 30; ⠈ ÿ15; n ˆ 20; number of ®nite ML estimate ˆ 963 Fieller 100 0.961 LR 100 0.951 Bootstrap 100 0.948 0.063 0.067 0.055 0.0051 0.0221 0.0084 0.063 0.064 0.056 (c) á ˆ 17; ⠈ ÿ9; n ˆ 20; number of ®nite ML estimate ˆ 999 Fieller 100 0.944 LR 98 0.956 Bootstrap 100 0.929 0.081 0.097 0.076 0.0077 0.0368 0.0085 0.079 0.100 0.076 (d) á ˆ 20; ⠈ ÿ9:57; n ˆ 10; number of ®nite ML estimate ˆ 998 Fieller 100 0.954 0.077 LR 98 0.959 0.090 Bootstrap 100 0.935 0.073 0.0068 0.0273 0.0077 0.078 0.085 0.073 {The model is Y i  binomial(n, ð i ), where probit(ð i ) ˆ á ‡ âxi . LD50 ˆ ÿá=â. Results are shown for four pairs (á, â). In sets (A) and (B) each con®guration has eight log-dose levels xi equally spaced between 1.571 and 2.437; in set (A) at each dose the replication n is 10 and in set (B) it is 15. In set (C) there are 15 log-dose levels equally spaced between 1.571 and 2.437 and at each dose the replication n is 10. In sets (D) and (E) there are six log-dose levels which are the same as in set (A) except that the smallest and largest have been omitted, and the replication n is 15 in set (D) and 20 in set (E). We note that both Newton's method and the grid search gave identical results for the LR interval. A maximum of 30 iterations were done for Newton's method. The grid search found on average an extra 3% of ®nite intervals over Newton's method and these intervals were among the widest. Newton's algorithm failed to converge when the responses at the extremes of the design space were identical e.g. 10, 10, 7, 0, 3, 0, 0 and 0. The coverage probability of the Fieller method was almost always too high. The bootstrap gave varying coverage probability results and in general tended to underestimate the coverage probability. Whereas the coverage probability for Fieller intervals approaches the nominal value, on average, as the total number of animals increases, this occurs at a slower rate with the bootstrap. The mean width of the bootstrap intervals 48 G. E. Kelly Table 2. Mean and median bias, standard deviation SD of the bias, of the ML estimate, Fieller, delta-methodcorrected and bootstrap-method-corrected estimators of LD50 for the four con®gurations of set (A) of Table 1 and n ˆ 10 Method (a) á ˆ 20; ⠈ ÿ10 ML estimate Fieller Delta Bootstrap (c) á ˆ 17; ⠈ ÿ9 ML estimate Fieller Delta Bootstrap Mean Median SD 0.0003 0.0003 0.0003 0.00025 0.0038 0.0038 0.0038 0.0038 0.0251 0.0262 0.0250 0.0267 ÿ0.0005 ÿ0.0008 ÿ0.0004 ÿ0.00006 ÿ0.0010 0.0008 ÿ0.0014 ÿ0.0002 0.0271 0.0272 0.0271 0.0280 Mean Median (b) á ˆ 30; ⠈ ÿ15 0.0010 0.0038 0.0009 0.0038 0.0010 0.0038 0.0009 0.0038 (d) á ˆ 20; ⠈ ÿ9:57 0.0019 0.0019 0.0020 0.0004 0.0019 0.0021 0.0011 ÿ0.0007 SD 0.0185 0.0189 0.0185 0.0211 0.0263 0.0263 0.0263 0.0263 Fig. 1. Plot of deviance dev ˆ 2fl ÿ l(R)g versus R, for one simulation of set (A) of Table 1, with á ˆ 17, ⠈ ÿ9:0 and n ˆ 10, and R denotes LD50 : the reference line 3.841 is also shown was always the shortest of the three methods. Our results on the LR interval are in contrast with those of Alho and Valtonen (1995) and Williams (1986). However, we simulated other experimental con®gurations (which were not of interest to the company), e.g. á ˆ 2, ⠈ ÿ1 and logdose levels 0.5, 1, 1.5, 2, 2.5, 3 and 3.5, and for these, where the slope was not so steep relative to the spread of doses, we found results similar to those of Williams (1986). For these, the coverage probability of all three methods was close to nominal, with the mean width of the bootstrap being shortest, followed by that for the LR method, with that of the Fieller method always the largest. Comparing the various experimental con®gurations of Table 1 we see that for eight dose levels the coverage probability remains approximately the same, going from n i ˆ 10 in set (A) to n i ˆ 15 in Median Lethal Dose 49 set (B) for the Fieller and the bootstrap methods, but the mean width decreases by approximately 0.2. Using 15 doses as opposed to eight reduces the mean width of the intervals by about 0.3 and improves the coverage probability. It is interesting that using six doses with n i ˆ 15 in set (D) compared with eight doses and n i ˆ 10 in set (A) (i.e. approximately the same number of experimental animals) is better in terms of the existence of the ML estimate and also gives an improvement of 0.1 in coverage probability and 0.2 in mean width. There is little improvement in using eight doses with n i ˆ 15 over six doses with n i ˆ 15. The design with six doses and n i ˆ 20 in set (E) does almost as well as that with 15 doses and n i ˆ 10 in set (C). Similarly the design with eight doses and n i ˆ 15 gives less precision than that with six doses and n i ˆ 20. Thus designs which distribute the animals into a small number of dose groups do slightly better than those with the same total number of animals but which have few animals at each of a large number of dose groups. In general we note that for a ®xed number of dose levels the precision increases as n i increases, as expected. We also note that the mean width of the intervals remains relatively constant over the four (á, â) pairs in each of sets (A)±(E) of Table 1. It follows that the sample size to achieve a ®xed precision on average can be read from Table 1. The standard deviation of the width of the intervals is roughly a tenth that of the average width for those from the Fieller and the bootstrap methods, so this sample size will be approximately correct for individual assays also. The same is true for the coverage probability of Fieller intervals. Larger sample sizes than are shown here are necessary to guarantee a prespeci®ed coverage with the bootstrap. With respect to point esimation, from Table 2 we see that the bootstrap bias is smaller than that of the delta method or midpoint of the 95% Fieller interval on average, but it is not a substantial improvement on the ML estimate whose bias is small also. 5. Discussion In the simulation study of Alho and Valtonen (1995) it was shown for particular values of the slope and intercept that the Fieller and LR intervals failed to exist, or were in®nite, in as many as 50% of the simulations. For their simulations also, the true con®dence probability of the LR intervals tended to fall below the nominal level, whereas the Fieller intervals were conservative. This was also shown by Williams (1986). Lui (1998) compared con®dence intervals based on the log-transform of the risk ratio and those by using Fieller's theorem and found the former to be superior. Perhaps if Fieller's theorem had also been applied on the log-scale the two methods might have been found to be more similar. In this simulation we ®nd that none of the three methods, the Fieller, LR and bootstrap methods, are entirely satisfactory in terms of coverage probability for providing con®dence intervals for LD50 for this company, unless very large sample sizes are taken. The LR intervals have the poorest precision and coverage probability of the three methods. In terms of shortest intervals, the bootstrap is clearly the method of choice. Even though the bootstrap is always more precise than the Fieller method, because of its variable coverage probability, it cannot be recommended and therefore the Fieller method is the more satisfactory. In terms of design, we have found that choosing six dose levels and 15±20 animals at each dose for the Fieller method is satisfactory in meeting the company's requirements of precision and coverage. This is contrary to the assertion of Chanter and Heywood (1982) who recommended a large number of dose levels. There is variability in the true log(LD50 ) of the manufacturing process of this company. Thus the log-doses used in the assay are not always symmetric around the true log(LD50 ), resulting in poorer coverage and wider intervals than if they were. This problem has led many researchers (e.g. Dixon and Mood (1948) and Brownlee et al. (1953)) to consider sequential methods. Brownlee 50 G. E. Kelly et al. (1953) showed that the `up-and-down' method requires only two-thirds the number of animals needed for a conventional study to yield the same degree of precision. However, these methods require that trials be made sequentially; before each trial is started the response to the preceding trials must be known. Brownlee et al. (1953) proposed to mitigate this disadvantage by running several short up-and-down series simultaneously, rather than a single long series. This still raises logistic problems with laboratories and has not found popularity. The evidence is strong, though, that a small pilot study which established the approximate location of LD50 would be valuable. Then the log-doses could be chosen to bracket this value with a corresponding increase in precision, and a smaller sample size could be taken. There is also evidence that the standard deviation of the tolerances of the mice is quite variable, ranging from 0.06 to 0.11 on different occasions. This suggests that the mice do not come from a homogeneous population. It is well known that the response of an animal to a given dose is affected by (among other things) its weight, age and previous history. One possibility is to include these factors as covariates in the probit analysis. Other measures, such as those suggested by Zbinden and Flury-Roversi (1981) for the conduct of the trials and care of the animals, if adopted, could lead to a more homogeneous population. Acknowledgements We wish to thank the Joint Editor and Associate Editor for their helpful comments which led to substantial improvements in the paper. References Alho, J. M. and Valtonen, E. (1995) Interval estimation of inverse dose-response. Biometrics, 51, 491±501. Brownlee, K., Hodges, J. and Rosenblatt, M. (1953) The up-and-down method with small samples. J. Am. Statist. Ass., 48, 262±277. Chanter, D. and Heywood, R. (1982) The LD50 test: some considerations of precision. Toxicol. Lett., 10, 303±307. Cox, C. (1990) Fieller's theorem, the likelihood and the delta method. Biometrics, 46, 709±718. Dixon, W. and Mood, A. (1948) A method for obtaining and analyzing sensitivity data. J. Am. Statist. Ass., 43, 109±126. Efron, B. (1987) Better bootstrap con®dence intervals. J. Am. Statist. Ass., 82, 171±200. Efron, B. and Tibshirani, R. (1993) An Introduction to the Bootstrap. New York: Chapman and Hall. Fieller, E. (1940) The biological standardization of insulin (with discussion). J. R. Statist. Soc., suppl., 7, 1±64. Finney, D. (1971) Probit Analysis, 3rd edn. London: Cambridge University Press. Lui, K.-J. (1998) Interval estimation of the risk ratio between a secondary infection, given a primary infection, and the primary infection. Biometrics, 54, 706±711. Williams, D. H. (1986) Interval estimation of the median lethal dose. Biometrics, 42, 641±645. Zbinden, G. and Flury-Roversi, M. (1981) Signi®cance of the LD50 test for the toxicological evaluation of chemical substances. Arch. Toxicol., 47, 77±99.