Meta-Analysis in Environmental Statistics: Vic Hasselblad
12
© 1994 Elsevier Science B.V. All rights reserved.
Vic Hasselblad
1. Introduction
and Laird (1986), and (6) a general method of combining evidence as described by
Eddy et al. (1992).
These methods are slowly being adopted in environmental research, having originated in the fields of chemistry, physics, clinical medicine and psychological testing (see Hedges, 1987). This chapter will discuss the methods listed above, giving applications of each method to environmental studies.
The first step in moving away from global subjective judgment is the creation of an evidence table. Although this step is obvious to individuals performing meta-analyses, it has not always been the first step in a decision-making process. An
evidence table, as defined by Eddy (1992), is a table that describes for each study the
design, sample size(s), outcomes measured, types of subjects, interventions com-
pared, potential biases, experimental conditions, and observed outcomes. Evidence
tables have been used for years. For example, Pearson (1904) presented evidence on the relation between deaths and the presence of vaccination scars in cases of smallpox. One early example of an evidence table in the environmental field is found in the first air quality criteria document for hydrocarbons (US Department of Health, Education, and Welfare, 1970): a table of the dose-response relationships of various plants to ethylene. All current EPA criteria and assessment documents contain several evidence tables.
In some cases, it may be desirable to list all studies in an evidence table, regardless
of quality. However, the evidence table is often the first step in a synthesis of studies measuring similar health endpoints. When study quality varies, inclusion criteria may be necessary. Chalmers et al. (1981) have described standards for a
good randomized control trial (RCT) and have developed a scoring system to allow
for a quantitative assessment of reported trials. The scoring system includes factors
such as the method of blinding, the presence of a biostatistician, and loss to
follow-up. Any score can then be used to include or exclude a study, or can be used as
a weighting factor. However, the work of Orwin and Cordray (1985) suggests that
direct ratings of study quality have low reliability.
The inclusion or exclusion of a study is a special case of weighting, where the
weights are either 1 or 0. Laird and Mosteller (1990) give a brief discussion of
weighting. As difficult as weighting or scoring is for RCTs, it is much more difficult for environmental studies. Not only are many different study designs used, but the definitions of exposures (or treatments) are often made with significant measurement error. General guidelines for inclusion or exclusion of environmental studies may not be feasible.
Any evidence table is valid only if it considers all available information. The published literature is particularly susceptible to the claim that it is unrepresentative of all studies that may have been conducted; this is the publication bias problem (Hedges, 1987). The published literature tends strongly to overrepresent statistically significant results. In some cases this may be less of a problem for
environmental studies. The large epidemiological studies are very expensive and are
usually published regardless of the results. Even animal studies can cost several
hundred thousand dollars and the results are important whether they are positive or
negative. However, the use of existing databases for case-control studies does not require the same level of effort, and locating all such relevant studies may be difficult. For additional discussion, see Hedges (1987).
The use of an evidence table can be an end in itself. The compilation of studies may be so overwhelmingly positive or negative that further meta-analyses are not necessary. This is more likely to be the case when the object of the meta-analysis is the determination of statistical significance rather than the estimation of a health effect.
3. Combining p-values
There are two situations where combining p-values may be appropriate. First, some
studies do not report any effect measures but do report p-values. Second, the study
designs or treatment levels may be so different that combining effect measures would
be inappropriate. In these cases, combining p-values is a method for computing an
overall test of significance.
Assume that there are $m$ independent studies of an effect, each having a p-value $p_j$, $j = 1, 2, \ldots, m$, corresponding to the test of the null hypothesis against a specified one-sided alternative. One method for combining p-values from these studies was originally given by Fisher (1932), and thus the method has been referred to as Fisher's method (Rosenthal, 1984). Fisher noted that, under the null hypothesis, $-2\ln(p_j)$ is distributed as a chi-square variable with 2 degrees of freedom. Thus the sum

$$X^2 = -2\sum_{j=1}^{m} \ln(p_j) \tag{3.1}$$

is distributed as a chi-square variable with $2m$ degrees of freedom when all $m$ null hypotheses are true.
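A minimal sketch of equation (3.1) in standard-library Python follows; the function name `fisher_combined_p` is hypothetical. Because the degrees of freedom ($2m$) are always even, the chi-square upper tail has a closed-form series, so no statistics package is needed. The sketch assumes, as the text does, independent studies with one-sided p-values.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent one-sided p-values by Fisher's method (equation 3.1).

    Under the null, X^2 = -2 * sum(ln p_j) is chi-square with 2m degrees of
    freedom. For even degrees of freedom the upper tail has the closed form
    P(X^2 > x) = exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!.
    Returns (X^2, combined p-value).
    """
    if not p_values or any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    m = len(p_values)
    x2 = -2.0 * sum(math.log(p) for p in p_values)
    half = x2 / 2.0
    term, tail = 1.0, 1.0          # k = 0 term of the series
    for k in range(1, m):
        term *= half / k           # builds (x/2)^k / k! incrementally
        tail += term
    return x2, math.exp(-half) * tail
```

For example, two independent studies that each report p = 0.05 give X² ≈ 11.98 on 4 degrees of freedom and a combined p-value of about 0.017.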
Table 1
Epidemiological studies of cognitive effects from low level lead exposure in children as given by Schwartz et al. (1985)
4. Effect sizes
In certain situations, several studies measure outcomes that are similar but not exactly the same, or they measure the same endpoint under different circumstances. One solution to this problem is to create an outcome measure that does not depend on the scale of measurement. The method of creating such a measure has been termed the method of 'effect sizes'.
4.1. Continuous outcomes
Several authors have described the method of effect sizes for combining evidence
(Cohen, 1977; Glass, 1980; Hedges, 1981; Rosenthal and Rubin, 1982; Rosenthal,
1984). Effect sizes are most commonly used for continuous outcome experiments.
The effect size $d$ of an experiment is defined as

$$d = \frac{M_t - M_c}{S}, \tag{4.1.1}$$

where $M_t$ and $M_c$ are the sample means of the treated and control arms, respectively, and $S$ is an estimate of the standard deviation of a single observation. $S$ could be the estimate of the standard deviation in the control arm, or it could be a pooled estimate. Assuming a normal distribution for the individual observations with equal variances in each arm of the experiment, $S$ is given by

$$S = \left[\frac{(n_t - 1)S_t^2 + (n_c - 1)S_c^2}{n_t + n_c - 2}\right]^{1/2}, \tag{4.1.2}$$

where $S_t^2$ is the sample variance of the treated arm and $S_c^2$ is the sample variance of the control arm. The variance of $d$ is

$$\operatorname{var}(d) = v = \left(1 + \frac{t^2}{2\nu}\right)\frac{n_t + n_c}{n_t n_c}, \tag{4.1.3}$$

where $t$ is the standard Student-t statistic for testing the hypothesis of no effect in the study and $\nu = n_t + n_c - 2$.
Now assume that there are $m$ studies, so that $d$ and $v$ are indexed by $j$, $j = 1, 2, \ldots, m$. The combined effect measure is the inverse variance weighted average of the $d_j$:

$$\bar{d} = \sum_{j=1}^{m} w_j d_j \Big/ \sum_{j=1}^{m} w_j, \tag{4.1.4}$$

where $w_j = 1/v_j$.
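The homogeneity of the $d_j$ can be tested using the statistic

$$X^2 = \sum_{j=1}^{m} w_j (d_j - \bar{d})^2, \tag{4.1.5}$$

which has an approximate chi-square distribution with $m - 1$ degrees of freedom when the studies are homogeneous.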
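The calculations in (4.1.1) through (4.1.5) can be sketched in standard-library Python as below. The function names are hypothetical, the pooled $S$ of (4.1.2) is assumed, and $t$ is taken to be the pooled two-sample t statistic. The standard error $1/\sqrt{\sum w_j}$ of the combined value is the usual fixed effects result and is not stated in the text.

```python
import math


def effect_size(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Standardized difference d (4.1.1) and its variance v (4.1.3) for one study."""
    nu = n_t + n_c - 2
    # Pooled standard deviation, assuming equal variances in both arms (4.1.2)
    s = math.sqrt(((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / nu)
    d = (mean_t - mean_c) / s
    # Two-sample t statistic for "no effect", using the same pooled S
    t = d * math.sqrt(n_t * n_c / (n_t + n_c))
    v = (1 + t ** 2 / (2 * nu)) * (n_t + n_c) / (n_t * n_c)
    return d, v


def combine_effect_sizes(ds, vs):
    """Inverse variance weighted mean (4.1.4) and homogeneity statistic (4.1.5)."""
    ws = [1.0 / v for v in vs]
    d_bar = sum(w * d for w, d in zip(ws, ds)) / sum(ws)
    x2 = sum(w * (d - d_bar) ** 2 for w, d in zip(ws, ds))
    se = math.sqrt(1.0 / sum(ws))  # standard error of d_bar (fixed effects)
    return d_bar, se, x2
```

Each study's summary statistics go through `effect_size`, and the resulting lists of $d_j$ and $v_j$ go to `combine_effect_sizes`; the returned `x2` is compared with a chi-square distribution on $m - 1$ degrees of freedom.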
$$d = P_t - P_c. \tag{4.2.1}$$

The estimate of $v$ is

$$v = \frac{P_t(1 - P_t)}{n_t} + \frac{P_c(1 - P_c)}{n_c},$$

where $P_t$ and $P_c$ are the observed proportions in the two arms of the study.
Laird and Mosteller (1990) have also described the use of effect sizes for
dichotomous outcome variables. They defined d as before, but defined v as
4.3. An example
The following example shows the use of effect sizes to estimate the combined effect of
studies of environmental lead exposure on nerve conduction velocity (NCV). A
summary of the studies was given by Davis and Svendsgaard (1990). The studies are occupational exposure studies, and as a result the actual exposures and lengths of exposure varied greatly. The summary by Davis and Svendsgaard gave several
measures of NCV for each study, but median sensory NCV is perhaps the most
representative of the measures. Table 2 shows the median sensory NCV effect size for
each study. Combining the studies in Table 2 using effect sizes gives an overall value of -0.428, which has a corresponding one-sided p-value of less than $10^{-5}$. Thus, assuming a fixed effects model, the studies show an overall effect. Davis and Svendsgaard (1990) give a detailed discussion of the analysis of the complete data set and also use an empirical Bayes approach (see Section 7.4) with effect sizes. Although the confidence intervals are slightly wider, the conclusion is the same.
4.4. Discussion
Effect sizes have the advantage that they can combine 'apples and oranges' if the outcome measures are reasonably similar. For example, one study of lung function might measure forced expiratory volume (FEV) and another might measure vital capacity (VC). These two measures are closely related, but the values of VC are larger by definition. The method of effect sizes incorporates the variance of the estimate from each study and is therefore an improvement over combining p-values. The disadvantage of the method is that the result is unitless, and as such it cannot give direct information about the magnitude of the effect seen.

Table 2
Effect sizes for median sensory nerve conduction velocity studies of lead exposure
Many studies are designed to compare the rate of a health outcome in a treated group, $\theta_t$, with the rate in a control group, $\theta_c$. The results are usually summarized by a standard 2 × 2 table:
Let $N = A + B + C + D$. We will assume that there are $m$ studies, each given in the form of a 2 × 2 table. We will subscript the letters A, B, C, D, and N by $j$ to indicate results from the $j$-th study ($j = 1, 2, \ldots, m$). The obvious estimates for $\theta_t$ and $\theta_c$ are
Generally, the parameter of interest from a contingency table is the odds ratio $\psi$, defined as

$$\psi = \frac{\theta_t(1 - \theta_c)}{\theta_c(1 - \theta_t)}.$$
Although the difference in rates, $\theta_t - \theta_c$, and the relative risk, $\theta_t/\theta_c$, are often used as effect measures, the odds ratio is used for more complicated models involving dichotomous outcomes. Three methods for calculating a combined estimate of an odds ratio from 2 × 2 contingency tables are (1) the Mantel-Haenszel method, (2) Peto's method, and (3) the maximum likelihood method. The methods all assume a common odds ratio between studies and assume that the binomial model holds for each study.
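$$\hat{\psi}_{MH} = \sum_{j=1}^{m} \frac{A_j D_j}{N_j} \Big/ \sum_{j=1}^{m} \frac{B_j C_j}{N_j}, \tag{5.1.1}$$

$$\operatorname{var}\left(\ln \hat{\psi}_{MH}\right) \approx 1 \Big/ \sum_{j=1}^{m} \left(\frac{1}{A_j} + \frac{1}{B_j} + \frac{1}{C_j} + \frac{1}{D_j}\right)^{-1}. \tag{5.1.2}$$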
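Note that these formulas break down when cell counts are zero: the estimate (5.1.1) is undefined if every $B_j C_j$ is zero, and the variance (5.1.2) is undefined if any cell is zero. For this reason, all values of $A_j$, $B_j$, $C_j$, and $D_j$ are often increased by 0.5 in equations (5.1.1) and (5.1.2).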
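A sketch of the Mantel-Haenszel calculation (5.1.1) and the variance approximation (5.1.2) follows. The function name and the `correction` argument (the amount added to every cell, 0.5 in the common practice described above) are illustrative choices, not the interface of any particular package.

```python
def mantel_haenszel(tables, correction=0.0):
    """Mantel-Haenszel common odds ratio (5.1.1) and var(ln psi) from (5.1.2).

    tables: sequence of (A, B, C, D) cell counts, one 2 x 2 table per study.
    correction: amount added to every cell (e.g. 0.5) to guard against zeros.
    """
    num = den = inv_var_total = 0.0
    for a, b, c, d in tables:
        a, b, c, d = (x + correction for x in (a, b, c, d))
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
        # Each study contributes the reciprocal of its Woolf-type variance
        inv_var_total += 1.0 / (1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d)
    return num / den, 1.0 / inv_var_total
```

With `correction=0.0` the function raises `ZeroDivisionError` on any table containing a zero cell, because (5.1.2) divides by every cell count; passing `correction=0.5` applies the adjustment described in the text.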
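$$v_j = \frac{(A_j + B_j)(A_j + C_j)(B_j + D_j)(C_j + D_j)}{N_j^2 (N_j - 1)}.$$

Then

$$\ln \hat{\psi} = \sum_{j=1}^{m} \left[A_j - E(A_j)\right] \Big/ \sum_{j=1}^{m} v_j, \qquad E(A_j) = \frac{(A_j + B_j)(A_j + C_j)}{N_j}.$$

Note that this can be thought of as the sum of the observed values $A_j$ minus their expected values $E(A_j)$ given the null hypothesis, with the entire quantity normalized by the variance. The variance of $\ln(\hat{\psi})$ is estimated by

$$\operatorname{var}(\ln \hat{\psi}) = 1 \Big/ \sum_{j=1}^{m} v_j.$$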
Yusuf et al. (1985) gave the justification for using the quantity $A - E(A)$ as the statistic for pooling the information. An asymptotically efficient test of $H_0\colon \beta = 0$ versus $H_a\colon \beta < 0$ can be based on the slope of the log-likelihood function evaluated at $\beta = 0$ (Cox and Oakes, 1984). For this particular problem, it can be shown that the slope is proportional to $A - E(A)$.
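Peto's one-step estimate can be sketched as follows (standard-library Python; the function name is hypothetical). It accumulates $A_j - E(A_j)$ and the hypergeometric variances $v_j$, and returns both the odds ratio and $\operatorname{var}(\ln\hat\psi) = 1/\sum v_j$.

```python
import math


def peto_odds_ratio(tables):
    """Peto's one-step estimate of a common odds ratio from (A, B, C, D) tables."""
    obs_minus_exp = total_var = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        expected_a = (a + b) * (a + c) / n  # E(A_j) under the null hypothesis
        # Hypergeometric variance of A_j given the table margins
        v = (a + b) * (a + c) * (b + d) * (c + d) / (n ** 2 * (n - 1))
        obs_minus_exp += a - expected_a
        total_var += v
    log_psi = obs_minus_exp / total_var
    return math.exp(log_psi), 1.0 / total_var  # odds ratio and var(ln psi)
```

Unlike (5.1.2), this calculation needs no continuity correction when a single cell is zero, provided no margin is zero and $N_j > 1$.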
5.4. An example
The US Environmental Protection Agency (1990) summarized several studies of the health effects of passive smoking. Many of these studies were described earlier by the National Research Council (1986). There were 19 case-control studies giving estimates of the odds ratio of death from lung cancer in nonsmoking women as a function of the smoking status of their spouse. Of these 19 studies, seven used the same definition of exposure: smoking by the spouse of at least one cigarette per day. The seven studies with a common definition of exposure are in Table 3.

An overall chi-square test for homogeneity of the seven studies gives a value of 3.42 for 6 degrees of freedom (p = 0.755), giving no evidence of heterogeneity. The results from combining the seven studies using the three methods just described are in Table 4. The combined estimates suggest that smoking by the spouse results in a risk of lung cancer in women 1.7 times that of women whose spouse does not smoke. If all 19 studies are included, as was done by US EPA (1990), the estimated risk is 1.42. Using either estimate, there is a substantially increased risk of lung cancer in women from sidestream smoke.

Table 3
Case-control studies of lung cancer deaths in nonsmoking women with husband's smoking status used as the exposure variable

Table 4
Combined estimates of the odds ratio using three different methods

5.5. Discussion
For 2 × 2 contingency tables with moderate sample sizes (all cells with counts of 5 or more), the choice of method for combining studies makes very little difference. For smaller sample sizes, the methods will give slightly different answers. Hauck (1984) compared the bias and precision of estimates using maximum likelihood, conditional maximum likelihood and Mantel-Haenszel. For m = 10, both the Mantel-Haenszel estimator and the conditional maximum likelihood estimator were superior to the maximum likelihood estimator. For m = 5, all three methods gave very similar bias and precision. Because the maximum likelihood estimate and the conditional maximum likelihood estimate are difficult to compute, the Mantel-Haenszel estimate is a logical choice for most problems. The underlying model is slightly different for Peto's method, and there may be situations where that method is preferred.
A fixed parameter is one which remains the same from study to study. The opposite of this is a parameter which varies randomly from study to study, and this kind of parameter is often analyzed using a random effects model (see Section 7). There are two general methods for combining evidence on a single fixed parameter. The first of these, the variance weighted method, has been used for many years. The second method is the calculation of the maximum likelihood estimate from the joint likelihood function.
6.1. Inverse variance weighted method
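One of the oldest methods of combining estimates of a parameter is the inverse variance weighted method. This method has been used for many years, having been discussed by Birge (1932) and Cochran (1937). A simple description of the method was given by Hald (1952). Assume that there are $m$ studies, each giving an estimate $\theta_j$ of a parameter, where $j = 1, \ldots, m$, and let $w_j$ be the reciprocal of the variance of $\theta_j$. Then the minimum variance estimate among all unbiased linear combinations of the $\theta_j$ is

$$\bar{\theta} = \sum_{j=1}^{m} w_j \theta_j \Big/ \sum_{j=1}^{m} w_j. \tag{6.1.1}$$

The following statistic will test the null hypothesis that all $\theta_j$ are estimating the same parameter:

$$X^2 = \sum_{j=1}^{m} w_j (\theta_j - \bar{\theta})^2,$$

which has an approximate chi-square distribution with $m - 1$ degrees of freedom under that hypothesis.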
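Equation (6.1.1) and the accompanying homogeneity statistic can be sketched as follows. For odds ratios, as in the example of Section 6.3, a natural choice is to combine on the log scale and exponentiate the result, which is what the helper `combine_odds_ratios` does. Both names are hypothetical, the normal-theory interval (z = 1.959964 for 95%) is an assumption, and each study's variance of the log odds ratio must be supplied by the user.

```python
import math


def inverse_variance_combine(estimates, variances, z=1.959964):
    """Inverse variance weighted estimate (6.1.1), its standard error,
    a normal-theory confidence interval, and the homogeneity statistic."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total
    se = math.sqrt(1.0 / total)
    x2 = sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))
    return pooled, se, (pooled - z * se, pooled + z * se), x2


def combine_odds_ratios(odds_ratios, variances_of_log):
    """Pool odds ratios on the log scale and transform back."""
    logs = [math.log(r) for r in odds_ratios]
    pooled, _, (lo, hi), x2 = inverse_variance_combine(logs, variances_of_log)
    return math.exp(pooled), (math.exp(lo), math.exp(hi)), x2
```

Working on the log scale keeps the confidence limits for the odds ratio asymmetric about the point estimate, which is the pattern of the limits reported in Section 6.3.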
6.3. An example
The studies summarized below gave evidence on the effect of nitrogen dioxide (NO2) on respiratory disease in children. The studies were all prospective, but the model used for analysis of the data varied from study to study. All studies were adjusted so that the estimated effect was for an increase of about 30 μg/m³ in NO2 exposure. The effect was summarized as an odds ratio for the increase in respiratory disease in children. The nine studies giving estimates for children in the age range of 6 to 12 years are given in Table 5. The studies were reviewed and combined by Hasselblad et al. (1992).

Table 5
Summary of studies of the effects of nitrogen dioxide on respiratory illness in children
Likelihoods for the results of the nine studies are shown in Figure 1. Taken separately, only 4 of the 9 studies were statistically significant at the 0.05 level. The combined estimate using the variance weighted method is an odds ratio of 1.193 with 95% confidence limits of 1.123 to 1.266. The combined estimate using the maximum likelihood method is an odds ratio of 1.190 with 95% confidence limits of 1.121 to 1.261, and is also shown in Figure 1. Either estimate suggests that an increase of about 30 μg/m³ in NO2 exposure will result in an increase of about 20% in the reported number of cases of respiratory disease in elementary school children.

Figure 1. Likelihoods for the nine studies and the combined estimate; horizontal axis: odds ratio for increase in respiratory disease.
6.4. Discussion
For all but very small sample sizes, the results using the inverse variance weighted
method and the maximum likelihood estimates will be extremely similar. The
inverse variance weighted method may be preferred for its simplicity of computa-
tion. The use of likelihood functions has the advantage that it allows for the
visualization of the estimates via marginal likelihood functions or posterior dis-
tributions.
The use of random effects models is not a recent innovation. They have been referred
to as two-stage or hierarchical models. The idea of a random effects model is that the
parameter or parameters of 'mother nature' do not remain constant from study to
study. Instead, they vary randomly, and are in fact random variables sampled from
some distribution. The problem then becomes to estimate (or derive a posterior
distribution for) some function of these parameters.
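$$y_j = \ln\left(\frac{s_{tj} f_{cj}}{f_{tj} s_{cj}}\right), \tag{7.1.1}$$

that is, the log odds ratio for study $j$, with $s$ and $f$ denoting the numbers of successes and failures in the treated ($t$) and control ($c$) arms.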
The model partitions the variation of the $y_j$ into two sources: (1) the variation $\tau^2$ resulting from nature choosing different parameter values for each study, and (2) the sampling variation from the study itself given a particular value from nature. Let
where $w_j$ is defined as

and define

Estimate $\tau^2$ by

$$\hat{\tau}^2 = \max\left\{0,\ \frac{Q - (m - 1)}{\sum_{j=1}^{m} w_j - \left(\sum_{j=1}^{m} w_j^2 \Big/ \sum_{j=1}^{m} w_j\right)}\right\}. \tag{7.1.4}$$
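Now define $u_j$ as

$$u_j = \frac{1}{(1/w_j) + \hat{\tau}^2}. \tag{7.1.5}$$

Compute

$$\hat{\mu} = \sum_{j=1}^{m} u_j y_j \Big/ \sum_{j=1}^{m} u_j \tag{7.1.6}$$

and its variance, $\operatorname{var}(\hat{\mu}) = 1 \big/ \sum_{j=1}^{m} u_j$.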
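The DerSimonian and Laird calculation can be sketched as follows. Because the definitions of $w_j$, the weighted mean and $Q$ were lost in extraction, the code assumes the standard choices from DerSimonian and Laird (1986): $w_j$ is the reciprocal of the within-study variance, $\bar{y}$ is the inverse variance weighted mean, and $Q = \sum w_j (y_j - \bar{y})^2$. The function name is hypothetical.

```python
def dersimonian_laird(y, s2):
    """DerSimonian-Laird random effects estimate.

    y:  per-study estimates y_j (e.g. log odds ratios, as in 7.1.1)
    s2: their within-study sampling variances; w_j = 1 / s2_j (assumed)
    Returns (mu_hat, var_mu_hat, tau2_hat, Q).
    """
    m = len(y)
    w = [1.0 / v for v in s2]
    sw = sum(w)
    y_bar = sum(wj * yj for wj, yj in zip(w, y)) / sw         # fixed effects mean
    q = sum(wj * (yj - y_bar) ** 2 for wj, yj in zip(w, y))   # homogeneity statistic
    tau2 = max(0.0, (q - (m - 1)) / (sw - sum(wj ** 2 for wj in w) / sw))  # (7.1.4)
    u = [1.0 / (1.0 / wj + tau2) for wj in w]                  # (7.1.5)
    mu = sum(uj * yj for uj, yj in zip(u, y)) / sum(u)         # (7.1.6)
    return mu, 1.0 / sum(u), tau2, q
```

When $Q \le m - 1$ the estimate of $\tau^2$ is truncated at zero and the result coincides with the fixed effects inverse variance estimate.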
7.2. Normal-normal model
(7.2.1)
For $\alpha = $ … and small values of $\lambda$, this prior approaches $1/\tau^2$, the scale invariant prior. In practice, an informative prior with $\alpha = \lambda = $ … works very well, in that if there are a small number of studies, the prior favors a value of zero for $\tau^2$. Otherwise, a small number of studies will result in a very diffuse posterior.
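The Jeffreys prior (Berger, 1985) is

$$\pi(\nu, \tau^2) \propto \left[\sum_{i=1}^{m} \frac{1}{\sigma_i^2 + \tau^2} \sum_{i=1}^{m} \frac{1}{(\sigma_i^2 + \tau^2)^2}\right]^{1/2}. \tag{7.2.3}$$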
If the $\sigma_i^2$ are all equal to $\sigma^2$, then equation (7.2.3) reduces to

$$\pi(\nu, \tau^2) \propto \left[\frac{1}{\sigma^2 + \tau^2}\right]^{3/2}. \tag{7.2.4}$$
leads to problems. The integrals do not converge near $\tau = 0$, and the integrals do not converge as $\tau \to \infty$ unless there are at least four studies (likelihoods). For this reason, a gamma prior (equation (7.2.2)) is preferable and, if necessary, can be used with very small values of $\alpha$ and $\lambda$.
The joint posterior for $\nu$ and $\tau^2$ is proportional to the product of equations (7.2.1) and (7.2.2).
In most cases, $\sigma_i^2$ is not known. It is standard practice to replace $\sigma_i^2$ by its estimate. This should not be a problem for estimates made from reasonably large samples.
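The major difficulty in computing a marginal posterior for $\nu$ is that the posterior must be integrated over $\tau^2$:

$$\int_0^\infty f(\tau^2)\,d\tau^2 \approx \ln(r) \sum_{i=-n}^{n} c_i f(c_i),$$

$$r = \exp\left\{3\left[\ln\left(1 + c^2/s^2\right)\right]^{1/2} \big/ n\right\}.$$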
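The marginal posterior for $\nu$ can be computed numerically by evaluating the joint posterior over a geometric grid $c_i = c_0 r^i$ of $\tau^2$ values, for which $\int f(\tau^2)\,d\tau^2 \approx \ln(r)\sum_i c_i f(c_i)$. The sketch below assumes a flat prior on $\nu$, a gamma prior on $\tau^2$ with shape $\alpha$ and rate $\lambda$ (this parametrization is an assumption), known $\sigma_i^2$, and the usual marginal likelihood $y_i \sim N(\nu, \sigma_i^2 + \tau^2)$ obtained by integrating out the study-specific means. The grid limits are arbitrary illustrative values chosen by hand, and all function names are hypothetical.

```python
import math


def gamma_pdf(x, shape, rate):
    """Gamma density in the (assumed) shape/rate parametrization."""
    return math.exp(shape * math.log(rate) - math.lgamma(shape)
                    + (shape - 1.0) * math.log(x) - rate * x)


def log_likelihood(nu, tau2, y, s2):
    """Normal-normal model with study means integrated out: y_i ~ N(nu, s2_i + tau2)."""
    total = 0.0
    for yi, vi in zip(y, s2):
        var = vi + tau2
        total -= 0.5 * math.log(2.0 * math.pi * var) + (yi - nu) ** 2 / (2.0 * var)
    return total


def marginal_posterior_nu(nu_grid, y, s2, shape, rate, lo=1e-6, hi=10.0, n=400):
    """Normalized marginal posterior of nu on an equally spaced grid (flat prior on nu)."""
    r = (hi / lo) ** (1.0 / (n - 1))
    tau2_grid = [lo * r ** i for i in range(n)]  # geometric grid c_i = lo * r**i
    dens = []
    for nu in nu_grid:
        # Integral over tau2 approximated by ln(r) * sum of c_i * f(c_i)
        s = sum(c * math.exp(log_likelihood(nu, c, y, s2)) * gamma_pdf(c, shape, rate)
                for c in tau2_grid)
        dens.append(math.log(r) * s)
    step = nu_grid[1] - nu_grid[0]
    norm = sum(dens) * step
    return [d / norm for d in dens]
```

The geometric grid places many points near $\tau^2 = 0$, where the integrand can change rapidly, while still reaching large values of $\tau^2$ with a modest number of evaluations.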
$$\hat{\theta}_j = \frac{\hat{\tau}^2 y_j + \sigma^2 \bar{y}}{\hat{\tau}^2 + \sigma^2}, \qquad j = 1, 2, \ldots, 36,$$

where $\sigma^2$ is the known variance of $y_j$. Thus $\hat{\tau}^2$ is used to 'shrink' the estimate $y_j$ towards the overall mean. Under certain assumptions it can be shown that this estimator has a smaller total mean squared error than the original estimator.
The key estimate in this problem is the estimate of $\tau^2$. The likelihood for $\tau^2$ under an empirical Bayes model is shown in Figure 2, along with the random effects model posterior for $\tau^2$ assuming two different gamma priors. The results from all three models are similar, which is to be expected since the sample size (n = 36) was relatively large.
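Figure 2. Empirical Bayes likelihood for $\tau^2$ and random effects posteriors for $\tau^2$ under two gamma priors; horizontal axis: $\tau^2$.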
Thus the empirical Bayes and random effects model (hierarchical Bayes) solutions can be quite similar. Berger (1985) concludes that '... it appears that the hierarchical
Bayes approach is the superior methodology for general applications. When [m] is
large, of course, there will be little difference between the two approaches, and
whichever is more convenient can then be employed.'
7.5. An example
The same example of respiratory disease in children from exposure to NO2 (see Section 6.3) can be used to demonstrate the use of random effects models. A test of homogeneity of the nine studies gives a chi-square of 13.55 for 8 degrees of freedom (p = 0.094). Thus there is some evidence for a lack of homogeneity in the nine studies, although it is not statistically significant at the 0.05 level.

Estimates of odds ratios are known to be approximately lognormal. By transforming the x-axis to the log scale, we can use a normal-normal model to fit the nine studies. Estimates from the normal-normal model and the model of DerSimonian and Laird (1986) are shown in Table 6. For comparison, the maximum likelihood (ML) estimates for the fixed effects model are also given.

The estimates from either random effects model lead to the same conclusions as the standard maximum likelihood estimates, although the confidence limits (credible set values) are slightly wider for the random effects models. The implication is that even if each study is not measuring exactly the same health endpoint, the overall parameter which they are estimating is greater than one. Thus there is convincing evidence for the effect of increased NO2 exposure on respiratory disease in children in spite of possible heterogeneity.

Table 6
Combined estimates of the odds ratio using three different models

Method                   Estimate   95% confidence limits
                                    Lower      Upper
Fixed effect (ML)        1.190      1.121      1.261
Normal-normal            1.200      1.102      1.307
DerSimonian and Laird    1.192      1.090      1.304

Raudenbush and Bryk (1985) give formulas for empirical Bayes estimates of effect sizes using the same random effects model, and Davis and Svendsgaard (1990) used this model to estimate an effect size for the same studies described in Section 4.3. Allowing for the heterogeneity gave a one-sided p-value of 0.0025, compared with a value of less than $10^{-5}$ under the fixed effects analysis.
7.6. Discussion
Hierarchical models are seldom used as the primary method of analysis. They are
very useful when there is concern about the lack of homogeneity of the studies. In
these cases it may be possible to drop the assumption of homogeneity and to make
conclusions about the average effect by appropriately modeling the lack of homo-
geneity.
The confidence profile method is a very general method for combining virtually any
kind of evidence about various parameters, as long as those parameters can be
described in some model. It was first described by Eddy (1989), and a more complete description was given by Eddy et al. (1992). In general, the method can be divided into four steps, but the later steps can give information requiring the modification of earlier steps; as a result, the process may be considered iterative. The four basic steps are: (1) define the problem precisely; (2) obtain all available evidence; (3) define the model parameters and their relationships; and finally, (4) solve the model using any of several different methods described later.
The third step is the heart of the confidence profile method. Models consist of
three elements: (1) basic parameters; (2) functional parameters; and (3) likelihood
functions relating evidence to basic or functional parameters. These elements are
described below.
Basic parameters are those parameters that appear in the model but are not functions of any other parameters. Examples of basic parameters are (1) the probability of having at least one respiratory illness with a background exposure of 10 μg/m³; (2) the mean diastolic blood pressure of US men aged 50; or (3) the expected number of respiratory illnesses in infants during the first year of life. There may or may not be any direct evidence about a basic parameter. For convenience, we will denote the basic parameters as $\theta_1, \theta_2, \ldots, \theta_k$. It is possible that some of the $\theta_j$ could be multivariate in nature. For example, a multinomial distribution requires a multivariate parameter.
If a Bayesian analysis is to be used, then all basic parameters must have prior
distributions. Noninformative priors for these parameters can be derived in a variety
of ways (see Berger, 1985).
$$\theta_j = f_j(\theta_1, \theta_2, \ldots, \theta_{j-1}), \qquad j = k + 1, \ldots, p. \tag{8.2.1}$$
Difference: $\theta_d = \theta_t - \theta_c$,
Ratio: $\theta_r = \theta_t / \theta_c$,
Odds ratio: $\theta_{or} = \dfrac{\theta_t(1 - \theta_c)}{\theta_c(1 - \theta_t)}$.
A second class of functions arises from errors in measurement. Most of these are variations of the following model: let $\theta_{e1}$ denote the probability that a success is labeled a failure, and let $\theta_{e2}$ denote the probability that a failure is labeled a success. The observed success rate $\theta'$ as a function of the true rate $\theta$ is given by

$$\theta' = \theta(1 - \theta_{e1}) + (1 - \theta)\theta_{e2}.$$
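Under the misclassification model $\theta' = \theta(1 - \theta_{e1}) + (1 - \theta)\theta_{e2}$, the observed rate can be computed and, when $\theta_{e1} + \theta_{e2} < 1$, inverted to recover the true rate as $(\theta' - \theta_{e2})/(1 - \theta_{e1} - \theta_{e2})$. The inverse follows algebraically from the model and is not given in the text; the function names are hypothetical.

```python
def observed_rate(theta, e1, e2):
    """Observed success rate when successes are mislabeled with probability e1
    and failures are mislabeled with probability e2."""
    return theta * (1.0 - e1) + (1.0 - theta) * e2


def true_rate(theta_obs, e1, e2):
    """Invert observed_rate; requires e1 + e2 < 1 so the mapping is increasing."""
    if e1 + e2 >= 1.0:
        raise ValueError("misclassification too large to invert")
    return (theta_obs - e2) / (1.0 - e1 - e2)
```

In a confidence profile model the observed rate would be the functional parameter linked to the data, and the true rate the basic parameter of interest.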
Other functions arise in the area of survival analysis. These functions are too numerous to describe here.
Although equation (8.4.1) defines the log-likelihood function and equation (8.4.2) defines the posterior distribution, these functions are not directly useful for most applications. Summary statistics or marginal distributions must be calculated, and these are discussed in the section on solution methods.
The obvious solution is to sample more where the likelihood is larger, a method known as importance sampling. The importance function should mimic the posterior density. One obvious choice is to start with the approximate multivariate normal likelihood function (or posterior) for $\theta_1, \ldots, \theta_k$ estimated by maximum likelihood methods. The sample is drawn from this $k$-dimensional multivariate normal. Define $w_i$ as

$$w_i = \frac{L(\theta^i)}{l(\theta^i)},$$

where $\theta^i = (\theta_1^i, \ldots, \theta_k^i)$ is the $i$-th sampled point and $l(\theta^i)$ is the value of the multivariate normal density evaluated at $\theta^i$. The estimate $\hat{E}(\theta)$ is then given by

$$\hat{E}(\theta) = \sum_i w_i \theta^i \Big/ \sum_i w_i.$$

The variance of the estimate can be computed from the weighted variance of the individual estimates. A detailed discussion of various Monte Carlo methods to obtain marginal density distributions was given by Gelfand and Smith (1990).
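A one-dimensional sketch of this importance sampling scheme follows (the text's version is $k$-dimensional). A normal proposal plays the role of $l(\theta)$, the weights $L(\theta^i)/l(\theta^i)$ are computed on the log scale for numerical stability, and the weighted mean and weighted variance are returned. The function name, the default sample size and the seed are illustrative assumptions.

```python
import math
import random


def importance_sampling_mean(log_target, proposal_mean, proposal_sd, n=100_000, seed=1):
    """Self-normalized importance sampling estimate of a posterior mean (1-D sketch).

    log_target: log of the (unnormalized) likelihood or posterior L(theta)
    proposal:   normal density l(theta), e.g. centered at the ML estimate
    Returns (weighted mean, weighted variance).
    """
    rng = random.Random(seed)
    draws, log_w = [], []
    for _ in range(n):
        theta = rng.gauss(proposal_mean, proposal_sd)
        z = (theta - proposal_mean) / proposal_sd
        log_l = -0.5 * z * z - math.log(proposal_sd * math.sqrt(2.0 * math.pi))
        draws.append(theta)
        log_w.append(log_target(theta) - log_l)  # log of w_i = L / l
    top = max(log_w)                             # subtract the max to avoid overflow
    w = [math.exp(lw - top) for lw in log_w]
    total = sum(w)
    mean = sum(wi * t for wi, t in zip(w, draws)) / total
    var = sum(wi * (t - mean) ** 2 for wi, t in zip(w, draws)) / total
    return mean, var
```

A proposal slightly wider than the target, as in the usage below, keeps the weights bounded; a proposal much narrower than the posterior can produce a few very large weights and an unstable estimate.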
Obviously, a key to the method is the ability to quickly generate multivariate normal observations with the specified mean vector and covariance matrix. Note that if $V$ is a positive definite matrix, then $V$ can be written as $C'C$ (for example, by the Cholesky decomposition, with $C$ upper triangular). If we start with $k$ independent normals with mean 0 and variance 1 (denote the vector as $X$), then $\mu + C'X$ will have mean vector $\mu$ and covariance matrix $C'C = V$.
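The factorization $V = C'C$ and the transformation $\mu + C'X$ can be sketched in standard-library Python as follows. Since $X$ has identity covariance, $C'X$ has covariance $C'C = V$. A hand-written Cholesky decomposition keeps the example free of dependencies; both function names are hypothetical.

```python
import math
import random


def cholesky_upper(V):
    """Return upper-triangular C with V = C'C (C' is the transpose of C)."""
    k = len(V)
    L = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = sum(L[i][p] * L[j][p] for p in range(j))
            if i == j:
                L[i][j] = math.sqrt(V[i][i] - s)
            else:
                L[i][j] = (V[i][j] - s) / L[j][j]
    # L is lower triangular with V = L L'; its transpose C = L' satisfies V = C'C
    return [[L[j][i] for j in range(k)] for i in range(k)]


def mvn_sample(mu, C, rng):
    """One draw of mu + C'X, where X is a vector of independent N(0, 1) variables."""
    k = len(mu)
    x = [rng.gauss(0.0, 1.0) for _ in range(k)]
    # (C'x)_i = sum_j C[j][i] * x[j]
    return [mu[i] + sum(C[j][i] * x[j] for j in range(k)) for i in range(k)]
```

The factorization is computed once; each subsequent draw costs only $k$ standard normals and a triangular matrix-vector product, which is what makes large importance samples practical.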
8.7. An example
The determination of 'safe' levels of toxic substances in ambient air is an extremely difficult problem. EPA has the responsibility for determining such levels and has labeled these values reference concentrations (RfCs). The following is an application of the confidence profile method (CPM) to determine an RfC for chronic inhalation exposure to n-hexane. The general approach is to determine the concentration that will produce a specified adverse health effect. This methodology is related to the 'benchmark' concentration method of Crump (1984).
The result from each study is a posterior distribution for the dose producing the
specified health effect. Thus the distributions are all on the same scale and can be
compared directly. With extreme caution, the distributions can be combined. The
computations are illustrated by an example detailed by Jarabek and Hasselblad
(1991).
Sanagi et al. (1980) conducted an occupational study of workers exposed to 58 ppm n-hexane and compared them with an unexposed population. The health endpoint was nerve conduction velocity (NCV), and the specified health effect was a decrease in NCV of 2 m/s. The model for the Sanagi et al. study is

$$y_i = \alpha + \beta x_i + \varepsilon_i,$$

where $y$ is the NCV value, $x$ is the concentration of n-hexane, $\alpha$ and $\beta$ are the basic parameters, and $\varepsilon_i$ is normally distributed with mean 0 and variance $\sigma^2$. The prior distribution for $\alpha$ is the uniform distribution over the entire real line, and the prior for $\beta$ is the uniform distribution over the negative half line (n-hexane is assumed not to improve NCV). The functional parameter of interest is the concentration $\theta_c$ that will produce a decrease of 2 m/s: $\theta_c = -2/\beta$. Under these assumptions, the marginal posterior distribution for $\theta_c$ will be an inverted truncated Student-t distribution. This distribution, using the data of Sanagi et al., is shown in Figure 3.
The second study is an experiment by Dunnick et al. (1990), which showed increased nasal turbinate lesions in female mice exposed to n-hexane. In this case, the EC10 was chosen as the biologically significant effect. The model for the data is the log-logistic model, where the exposure variable $x$ is the human equivalent concentration of n-hexane:

$$P(\text{lesion} \mid x) = \frac{1}{1 + \exp(-\alpha - \beta \ln x)},$$

where $\alpha$ and $\beta$ are the basic parameters. The prior distribution for $\alpha$ is the uniform distribution over the entire real line, and the prior for $\beta$ is the uniform distribution over the positive half line (n-hexane is assumed not to decrease lesions). The functional parameter of interest is the concentration $\theta_c$ that will produce a lesion rate of 10 percent (the EC10):

$$\ln \theta_c = \frac{-\alpha - \ln 9}{\beta}.$$

Figure 3. Posterior distributions for $\theta_c$ (curves include Dunnick and Combined); horizontal axis: concentration of n-hexane (mg/m³).
An approximate solution for this problem was given by Finney (1978), but it is inaccurate for small sample sizes. The posterior distribution for $\theta_c$ based on the data of Dunnick et al. (1990) can be calculated numerically, and it is also shown in Figure 3.
The two posterior distributions are not inconsistent with each other, even though the studies were done on different species and measured very different endpoints. The RfC derived from either study would be relatively similar. For mathematical interest, the posterior distribution assuming a common $\theta_c$ is also shown in Figure 3.
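The two functional parameters in this example, the concentration $-2/\beta$ giving a 2 m/s drop in NCV under the linear model and the EC10 of the log-logistic model, are simple functions of the basic parameters. The sketch below (hypothetical names) generalizes the EC10 to any lesion probability $p$ by solving $1/[1 + \exp(-\alpha - \beta\ln x)] = p$ for $x$. Applying these functions to posterior draws of $(\alpha, \beta)$ is one way to obtain a sample from the posterior of $\theta_c$; this is an illustration and not necessarily how the original analysis was computed.

```python
import math


def ncv_benchmark(beta, decrease=2.0):
    """Concentration giving a `decrease` (m/s) drop in NCV under y = alpha + beta*x."""
    if beta >= 0:
        raise ValueError("beta must be negative (n-hexane assumed not to improve NCV)")
    return -decrease / beta


def ec_p(alpha, beta, p=0.10):
    """Concentration with lesion probability p under the log-logistic model (beta > 0)."""
    if beta <= 0:
        raise ValueError("beta must be positive (n-hexane assumed not to decrease lesions)")
    # 1 / (1 + exp(-alpha - beta ln x)) = p  =>  ln x = (ln(p / (1 - p)) - alpha) / beta
    return math.exp((math.log(p / (1.0 - p)) - alpha) / beta)
```

For p = 0.10, $\ln(p/(1-p)) = -\ln 9$, so `ec_p` reproduces the EC10 expression for $\ln\theta_c$ given in the text.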
8.8. Discussion
References
Barnard, G. A. and D. A. Sprott (1982). Likelihood. In: Encyclopedia of Statistical Sciences, Vol. 3. Wiley,
New York.
Berger, J. O. (1985). Statistical Decision Theory and Bayesian Analysis. Springer, New York.
Berger, J. O. and R. L. Wolpert (1984). The Likelihood Principle. Institute of Mathematical Statistics,
Hayward, CA.
Berndt, E., B. Hall, R. Hall and J. Hausman (1974). Estimation and inference in nonlinear structural
models. Ann. Econom. Soc. Measure 3, 655-665.
Birge, R. T. (1932). The calculation of errors by the method of least squares. Phys. Rev. 30, 207-227.
Box, G. E. P. and G. C. Tiao (1973). Bayesian Inference in Statistical Analysis. Addison-Wesley, Reading,
MA.
Burden, R. L. and J. D. Faires (1985). Numerical Analysis. Prindle, Weber and Schmidt, Boston, MA.
Chalmers, T. C., H. Smith Jr., B. Blackburn, B. Silverman, B. Schroeder, D. Reitman and A. Ambroz
(1981). A method for assessing the quality of a randomized control trial. Control. Clinic. Trials 2, 31-49.
Cochran, W. G. (1937). Problems arising in the analysis of a series of similar experiments. J. Roy. Statist.
Soc. Suppl. 4, 102-118.
Cohen, J. (1977). Statistical Power Analysis for the Behavioral Sciences. Academic Press, New York.
Cox, D. R. and D. Oakes (1984). Analysis of Survival Data. Chapman and Hall, London.
Crump, K. S. (1984). A new method for determining allowable daily intakes. Fund. Appl. Toxicol. 4,
854-871.
Davis, J. M. and D. Svendsgaard (1990). Nerve conduction velocity and lead: A critical review and
meta-analysis. In: B. L. Johnson, ed., Advances in Neurobehavioral Toxicology: Applications in
Environmental and Occupational Health. Lewis Publications, Chelsea, MI.
DeGroot, M. H. (1970). Optimal Statistical Decisions. McGraw-Hill, New York.
DerSimonian, R. and N. Laird (1986). Meta-analysis in clinical trials. Control. Clinic. Trials 7, 177-188.
Dunnick, J. K., D. G. Graham, R. S. Yang, S. B. Haber and H. R. Brown (1990). Thirteen-week toxicity
study of n-hexane in B6C3F1 mice after inhalation exposure. Toxicology 57, 163-172.
Eddy, D. M. (1992). Manual for Evaluating Health Practices & Designing Practice Policies. American
College of Physicians, Philadelphia, PA, 44-45.
Eddy, D. M. (1989). The confidence profile method: A Bayesian method for assessing health technologies.
Oper. Res. 37, 210-228.
Eddy, D. M., V. Hasselblad and R. D.'Shachter, (1992). The Statistical Synthesis of Evidence: Meta-
analysis by the Confidence Profile Method. Academic Press, Boston, MA.
Efron, B. and C. Morris (1975). Data analysis using Stein's estimator and its generalizations. J. Amer.
Statist. Assoc. 70, 311-319.
Finney, D. J. (1978). Statistical Methods in Biological Assay. Griffin, London.
Fisher, R. A. (1932). Statistical Methods for Research Workers. 4th ed., Oliver and Boyd, London.
Gelfand, A. and A. F. M. Smith (1990). Sampling-based approaches to calculating marginal densities. J.
Amer. Statist. Assoc. 85, 398-409.
George, E. O. (1977). Combining independent one-sided and two-sided statistical tests - Some theory and
applications. Unpublished Doctoral Dissertation, University of Rochester.
Glass, G. (1980). Summarizing effect sizes. In: New Directions for Methodology of Social and Behavioral
Science: Quantitative Assessment of Research Domains. Jossey-Bass, San Francisco, CA.
Hald, A. (1952). Statistical Theory with Engineering Applications. Wiley, New York.
Hasselblad, V., D. J. Kotchmar and D. M. Eddy (1992). Synthesis of environmental evidence: Nitrogen
dioxide epidemiology studies. J. Air Waste Management Assoc. 42, 662-671.
Hauck, W. W. (1984). A comparative study of conditional maximum likelihood estimation of a common
odds ratio. Biometrics 40, 1117-1123.
Hedges, L. (1981). Distribution theory for Glass's estimator of effect size and related estimators. J. Educat.
Statist. 6, 107-128.
Hedges, L. (1987). Statistical issues in the meta-analysis of environmental studies. In: ASA/EPA Conf. on
Interpretation of Environmental Data II. Statistical Issues in Combining Environmental Studies, October
1-2, 1986. EPA-230-12-87-032, 30-44.
Hedges, L. V. and I. Olkin (1985). Statistical Methods for Meta-Analysis. Academic Press, Orlando.
Howard, R. A. and J. E. Matheson (1984). Influence diagrams. In: R. A. Howard and J. E. Matheson, eds.,
Readings in the Principles and Applications of Decision Analysis. Strategic Decision Group, Menlo
Park, CA.
Jarabek, A. M. and V. Hasselblad (1991). Inhalation reference concentration methodology: Impact of
dosimetric adjustments and future directions using the confidence profile method. In: Proc. 84th Ann.
Meeting of the Air & Waste Management Association, Vancouver, Canada.
Kleinbaum, D. G., L. L. Kupper and H. Morgenstern (1982). Epidemiologic Research. Lifetime Learning
Publications, Belmont, CA.
Laird, N. M. and F. Mosteller (1990). Some statistical methods for combining experimental results.
Internat. J. Technol. Assess. Health Care 6, 5-30.
Landwehr, J. M. (1987). Discussion of 'Statistical issues in the meta-analysis of environmental studies'.
In: ASA/EPA Conf. on Interpretation of Environmental Data II. Statistical Issues in Combining
Environmental Studies, October 1-2, 1986. EPA-230-12-87-032, 47-49.
Mann, C. (1990). Meta-analysis in the breech. Science 249, 476-480.
Mantel, N. (1963). Chi-square tests with one degree of freedom; Extensions of the Mantel-Haenszel
procedure. J. Amer. Statist. Assoc. 58, 690-700.
Mantel, N. and W. Haenszel (1959). Statistical aspects of the analysis of data from retrospective studies of
disease. J. Nat. Cancer Inst. 22, 719-748.
Morris, C. N. (1983). Parametric empirical Bayes inference: Theory and applications. J. Amer. Statist.
Assoc. 78, 47-59.
National Research Council (NRC) (1986). Environmental Tobacco Smoke: Measuring Exposures and
Assessing Health Effects. National Academy Press, Washington, DC.
Orwin, R. G. and D. S. Cordray (1985). Effects of deficient reporting on meta-analysis: A conceptual
framework and reanalysis. Psychol. Bull. 97, 134-147.
Patil, G. P., G. J. Babu, M. T. Boswell, K. Chatterjee, E. Linder and C. Taillie (1987). Statistical issues in
combining ecological and environmental studies with examples in marine fisheries research and
management. In: ASA/EPA Conf. on Interpretation of Environmental Data II. Statistical Issues in
Combining Environmental Studies, October 1, 1986, EPA-230-12-87-032, 30-44.
Pearson, K. (1904). Report on certain enteric fever inoculation statistics. British Medical J. 2, 1243-1246.
Peto, R., M. C. Pike, P. Armitage, N. E. Breslow, D. R. Cox, S. V. Howard, N. Mantel, K. McPherson,
J. Peto and P. G. Smith (1977). Design and analysis of randomized clinical trials requiring prolonged
observation of each patient. II. Analysis and examples. British J. Cancer 35, 1-39.
Raudenbush, S. and A. S. Bryk (1985). Empirical Bayes meta-analysis. J. Educat. Statist. 10, 75-98.
Rosenthal, R. (1984). Meta-Analytic Procedures for Social Research. Sage, Beverly Hills, CA.
Rosenthal, R. and D. B. Rubin (1982). Combining effect sizes from independent studies. Psychol. Bull. 92,
500-503.
Sanagi, S., Y. Seki, K. Sugimoto and M. Hirata (1980). Peripheral nervous system functions of workers
exposed to n-hexane at a low level. Internat. Arch. Occup. Health 47, 69-70.
Schwartz, J., H. Pitcher, R. Levin, B. Ostro and A. J. Nichols (1985). Cost and benefits of reducing lead in
gasoline: Final regulatory impact analysis. USEPA: EPA 230-05-85-006.
Shachter, R. (1986). Evaluating Influence Diagrams. In: A. P. Basu, ed., Reliability and Quality Control.
Elsevier, Amsterdam, 321-344.
Shachter, R. D., D. M. Eddy and V. Hasselblad (1990). An influence diagram approach to medical
technology assessment. In: R. M. Oliver and J. Q. Smith, eds., Influence Diagrams and Belief Nets.
Wiley, Chichester, England.
Stouffer, S. A., E. A. Suchman, L. C. DeVinney, S. A. Star and R. M. Williams Jr. (1949). The American
Soldier: Adjustment During Army Life, Vol. I. Princeton Univ. Press, Princeton, NJ.
Tippett, L. H. C. (1931). The Methods of Statistics. Williams and Norgate, London.
US Department of Health, Education, and Welfare (1970). Air Quality Criteria for Hydrocarbons.
National Air Pollution Control Administration, Washington DC.
US Environmental Protection Agency (1990). Health Effects of Passive Smoking: Assessment of Lung
Cancer in Adults and Respiratory Disorders in Children, EPA-600/6-90-006A. Office of Atmospheric
and Indoor Air Programs, Washington, DC.
Welch, B. L. (1938). The significance of the difference between two means when the population variances
are unequal. Biometrika 29, 350-362.
Williams, D. A. (1975). The analysis of binary responses from toxicological experiments involving
reproduction and teratogenicity. Biometrics 31, 949-952.
Yusuf, S., R. Peto, J. Lewis, R. Collins and P. Sleight (1985). Beta blockade during and after myocardial
infarction: An overview of the randomized trials. Progr. Cardiovasc. Diseases 27, 335-371.