Meta-Analysis in Environmental Statistics: Vic Hasselblad

G. P. Patil and C. R. Rao, eds., Handbook of Statistics, Vol. 12
© 1994 Elsevier Science B.V. All rights reserved.
Abstract: The synthesis of evidence from different experiments or investigations
is a challenging problem for environmental science. This chapter reviews
standard methods for such synthesis, including combining p-values, effect sizes,
and methods for combining contingency tables. Recent developments such as
random effects models and the confidence profile method are also described.
For each method, published environmental examples are given.
1. Introduction
One of the most challenging problems in environmental science is the synthesis of
evidence from different experiments or investigations. There are several books and
review articles written on the general topic of evidence synthesis or meta-analysis.
Two review articles which are most pertinent to environmental studies are those of
Hedges (1987) and Laird and Mosteller (1990). This chapter will review standard
methods, include some recent advances, and give environmental examples of each
method.
By far the most common approach to the synthesis of evidence is the use of the
global subjective judgment. One or more experts gather all the evidence they want to
consider, synthesize it, and describe their conclusions, often in the form of a review
article or guideline document. According to Thomas Chalmers, 'The days of the
expert supposedly putting the state of the field into a review article are numbered'
(Mann, 1990).
Over the past decade a set of quantitative methods has been developed to
synthesize evidence in order to estimate the outcomes of different actions. This set of
methods has come to be known as meta-analysis. The methods include (1) combining
p-values, first described by Fisher (1932), (2) effect sizes as developed by Glass
(1980) and Hedges (1981), (3) contingency table methods such as those of Mantel and
Haenszel (1959) and Peto et al. (1977), (4) the inverse variance weighted method as
described by Hald (1952), (5) random effects models as described by DerSimonian
and Laird (1986), and (6) a general method of combining evidence as described by
Eddy et al. (1992).
These methods are slowly being adopted in environmental research, having originated
in the fields of chemistry, physics, clinical medicine and psychological testing
(see Hedges, 1987). This chapter will discuss the methods listed above, giving
applications of each method to environmental studies.
2. Descriptive methods: Evidence tables
The first step in moving away from the global subjective judgment is the creation of
an evidence table. Although this step is obvious to individuals performing meta-analyses,
it has not always been the first step in a decision-making process. An
evidence table, as defined by Eddy (1992), is a table that describes for each study the
design, sample size(s), outcomes measured, types of subjects, interventions compared,
potential biases, experimental conditions, and observed outcomes. Evidence
tables have been used for years. For example, Pearson (1904) presented evidence on
the relation between deaths and the presence of vaccination scars in cases of
smallpox. One early example of an evidence table in the environmental field is
found in the first air quality criteria document for hydrocarbons (US Department of
Health, Education, and Welfare, 1970), a table on the dose-response relationships of
various plants to ethylene. All current EPA criteria and assessment documents
contain several evidence tables.
In some cases, it may be desirable to list all studies in an evidence table, regardless
of quality. However, the evidence table is often the first step in a synthesis of studies
measuring similar health endpoints. Where study quality varies, inclusion
criteria may be necessary. Chalmers et al. (1981) have described standards for a
good randomized controlled trial (RCT) and have developed a scoring system to allow
for a quantitative assessment of reported trials. The scoring system includes factors
such as the method of blinding, the presence of a biostatistician, and loss to
follow-up. Any score can then be used to include or exclude a study, or can be used as
a weighting factor. However, the work of Orwin and Cordray (1985) suggests that
direct ratings of study quality have low reliability.
The inclusion or exclusion of a study is a special case of weighting, where the
weights are either 1 or 0. Laird and Mosteller (1990) give a brief discussion of
weighting. As difficult as weighting or scoring is for RCTs, it is much more difficult
for environmental studies. Not only are many different study designs used, but the
definitions of exposures (or treatments) are often subject to significant measurement
error. General guidelines for inclusion or exclusion of environmental studies
may not be feasible.
Any evidence table is valid only if it considers all available information. The
published literature is particularly susceptible to the claim that it is unrepresentative
of all studies that may have been conducted (the publication bias problem) (Hedges,
1987). The published literature has a strong tendency to overrepresent
statistically significant results. In some cases this may be less of a problem for
environmental studies. The large epidemiological studies are very expensive and are
usually published regardless of the results. Even animal studies can cost several
hundred thousand dollars and the results are important whether they are positive or
negative. However, the use of existing databases for case-control studies does not
require the same level of effort, and the location of all such relevant studies may be
difficult. For additional discussion, see Hedges (1987).
The use of an evidence table can be an end in itself. The compilation of studies
may be so overwhelmingly positive or negative that further meta-analyses are not
necessary. This is more likely to be the case when the object of the meta-analysis is
the determination of statistical significance rather than the estimation of a health effect.
3. Combining p-values
There are two situations where combining p-values may be appropriate. First, some
studies do not report any effect measures but do report p-values. Second, the study
designs or treatment levels may be so different that combining effect measures would
be inappropriate. In these cases, combining p-values is a method for computing an
overall test of significance.
Assume that there are m independent studies of an effect, each giving a p-value $p_j$,
$j = 1, 2, \ldots, m$, for the test of the null hypothesis against a specified one-sided
alternative. One method for combining p-values from these studies was originally
given by Fisher (1932), and thus the method has been referred to as Fisher's method
(Rosenthal, 1984). Fisher noted that, under the null hypothesis, $-2\ln(p_j)$ is
distributed as a chi-square variable with 2 degrees of freedom. Thus the sum

$$X^2 = -2\sum_{j=1}^{m} \ln(p_j) \qquad (3.1)$$

has a chi-square distribution with 2m degrees of freedom under the null hypothesis.
The p-value for the combined test can be calculated from standard chi-square tables.
Several other methods of combining p-values have been published, and are
discussed by Hedges (1987). These include the minimum p method proposed by
Tippett (1931), the inverse normal method by Stouffer et al. (1949) and the logit
method described by George (1977).
These methods have the advantage that the studies being combined need not have
the same design or health endpoint, but there are two important drawbacks. First,
the tests do not weight the studies according to their uncertainty or sample size.
Second, they do not give any estimates of the magnitude of the effect. For these two
reasons, methods of combining p-values are seldom used as meta-analytic tools. The
following is an example of the use of Fisher's method for the meta-analysis of
environmental studies.
Schwartz et al. (1985) summarized six studies of cognitive effects from low level lead
exposure in children. The results of these studies were controversial because they
suggested adverse health effects from lead exposures much lower than previously
thought to be of concern. Furthermore, some of the studies were statistically
significant and others were not. The important consideration was the possible
existence of effects at low lead levels rather than the estimation of the actual deficit in
cognitive abilities. The studies are in Table 1. As calculated by Schwartz et al. (1985),
the combined chi-square was 31.31 with 12 degrees of freedom, giving a one-sided
p-value of 0.00177. Thus the studies strongly suggest an effect of low level lead
exposure on cognitive abilities. This was a key point in demonstrating the benefits of
reducing lead in gasoline, and was part of EPA's regulatory impact analysis.

Table 1
Epidemiological studies of cognitive effects from low level lead exposure in children as given by
Schwartz et al. (1985)

                           Sample sizes        Internal lead levels     IQ
Study                      Control  Exposed    Control    Exposed       deficit   p-value   -2 ln p
McBride et al. (1982)      86       86         < 9 a      19-30 a       1.2 c     0.25      2.77
Yule et al. (1981)         20       21         7-10 a     17-32 a       7.6 a     0.027     7.22
Smith et al. (1983)        145      1?5        < 2.5 b    > 8 b         2.3 d     0.067     5.41
Yule and Lansdown (1983)   80       82         7-12 a     13-24 a       1.8 d     0.13      4.08
Winneke et al. (1982)      26       26         2.4 b      7 b           5.0 b     0.10      4.82
Needleman et al. (1979)    100      58         < 10 b     > 20 b        4.5 d     0.03      7.01
a Blood lead levels in μg/dl.
b Tooth lead levels in ppm.
c Peabody Picture Vocabulary Test IQ.
d Wechsler Intelligence Scale for Children, Revised.
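As a rough check on the arithmetic, Fisher's method (3.1) can be sketched in Python using only the standard library. The sketch relies on the fact that, for an even number of degrees of freedom 2m, the chi-square upper-tail probability has the closed form $e^{-x/2}\sum_{k=0}^{m-1}(x/2)^k/k!$, so no special-function package is needed. The function names `chi2_sf_even_df` and `fisher_combine` are illustrative, not taken from any library.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X^2 > x) for a chi-square variable with even df.

    For df = 2m the survival function is exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = total = 1.0  # k = 0 term of the series
    for k in range(1, df // 2):
        term *= half / k  # builds (x/2)^k / k! incrementally
        total += term
    return math.exp(-half) * total


def fisher_combine(pvalues):
    """Fisher's method (3.1): X^2 = -2 * sum(ln p_j), chi-square with 2m df under H0."""
    if not pvalues or any(not 0.0 < p <= 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    stat = -2.0 * sum(math.log(p) for p in pvalues)
    return stat, chi2_sf_even_df(stat, 2 * len(pvalues))


# One-sided p-values as printed in Table 1 (Schwartz et al., 1985).
table1_p = [0.25, 0.027, 0.067, 0.13, 0.10, 0.03]
stat, p_combined = fisher_combine(table1_p)
print(f"X^2 = {stat:.2f} on {2 * len(table1_p)} df, combined p = {p_combined:.5f}")
# Tail probability for the tabulated statistic itself.
print(f"p for X^2 = 31.31 on 12 df: {chi2_sf_even_df(31.31, 12):.5f}")
```

Computed from the rounded p-values, the statistic comes out near 31.1 rather than 31.31, because the tabulated $-2\ln p$ of 4.82 for Winneke et al. does not equal $-2\ln(0.10) \approx 4.61$. Evaluating the tail probability at the tabulated 31.31 with 12 degrees of freedom recovers the published 0.00177.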
4. Effect sizes
In certain situations there are several studies measuring outcomes which are similar
but not exactly the same, or measuring the same endpoint under different
circumstances. One solution to this problem is to create an outcome
measure which does not depend on the scale of measurement. The method of
creating such a measure has been termed the method of 'effect sizes'.
4.1. Continuous outcomes
Several authors have described the method of effect sizes for combining evidence
(Cohen, 1977; Glass, 1980; Hedges, 1981; Rosenthal and Rubin, 1982; Rosenthal,
1984). Effect sizes are most commonly used for continuous outcome experiments.
The effect size d of an experiment is defined as

$$d = \frac{M_t - M_c}{S}, \qquad (4.1.1)$$

where $M_t$ and $M_c$ are the sample means of the treated and control arms respectively
and S is the estimate of the standard deviation of a single observation. S could be the
estimate of the standard deviation in the control arm, or it could be a pooled
estimate. Assuming a normal distribution for the individual observations with equal
variances in each arm of the experiment, the pooled estimate S is given by

$$S^2 = \frac{(n_t - 1)S_t^2 + (n_c - 1)S_c^2}{n_t + n_c - 2}, \qquad (4.1.2)$$

where $n_t$ and $n_c$ are the numbers of observations in the two arms, $S_t^2$ is the sample
variance of the treated arm and $S_c^2$ is the sample variance of the control arm. The
variance of d is

$$\operatorname{var}(d) = v = \left(1 + \frac{t^2}{2\nu}\right) \Big/ \left[\frac{n_t n_c}{n_t + n_c}\right], \qquad (4.1.3)$$

where t is the standard Student's t statistic for testing the hypothesis of no effect in
the study and $\nu = n_t + n_c - 2$.
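Equations (4.1.1) to (4.1.3) can be sketched as a small Python function working from per-arm summary statistics. It assumes that t is the pooled two-sample t statistic, which, with the same pooled S, equals $d/\sqrt{1/n_t + 1/n_c}$; the function name, argument names and the study in the usage lines are hypothetical.

```python
import math


def effect_size(mean_t, mean_c, var_t, var_c, n_t, n_c):
    """Return (d, v) for one two-arm study following (4.1.1)-(4.1.3).

    var_t and var_c are the sample variances S_t^2 and S_c^2 of the two arms.
    """
    nu = n_t + n_c - 2
    s_pooled = math.sqrt(((n_t - 1) * var_t + (n_c - 1) * var_c) / nu)  # (4.1.2)
    d = (mean_t - mean_c) / s_pooled                                     # (4.1.1)
    # Pooled two-sample t statistic, written in terms of d.
    t = d / math.sqrt(1.0 / n_t + 1.0 / n_c)
    v = (1.0 + t * t / (2.0 * nu)) / (n_t * n_c / (n_t + n_c))           # (4.1.3)
    return d, v


# Hypothetical study: 20 subjects per arm, means 10 and 8, both variances 4.
d, v = effect_size(10.0, 8.0, 4.0, 4.0, 20, 20)
print(f"d = {d:.3f}, var(d) = {v:.4f}")
```

For these made-up numbers d = 1 and the variance is about 0.113, close to the familiar large-sample approximation $(n_t + n_c)/(n_t n_c) + d^2/[2(n_t + n_c)] = 0.1125$.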
Now assume that there are m studies, so that d and v are indexed by j,
$j = 1, 2, \ldots, m$. The combined effect measure is the inverse variance weighted average
of the $d_j$:

$$\bar{d} = \sum_{j=1}^{m} w_j d_j \Big/ \sum_{j=1}^{m} w_j, \qquad (4.1.4)$$

where $w_j = 1/v_j$.
The homogeneity of the $d_j$ can be tested using the statistic

$$X^2 = \sum_{j=1}^{m} w_j (d_j - \bar{d})^2, \qquad (4.1.5)$$

where $\bar{d}$ is defined as before. The statistic $X^2$ has an approximate chi-square
distribution with m - 1 degrees of freedom.
4.2. Dichotomous outcomes

Rosenthal (1984) gives formulas for the use of effect sizes for dichotomous outcome
variables. In the dichotomous case, d is given by

$$d = P_t - P_c. \qquad (4.2.1)$$

The estimate of v is

$$v = \frac{n_t P_c(1 - P_c) + n_c P_t(1 - P_t)}{n_t n_c}, \qquad (4.2.2)$$

where $P_t$ and $P_c$ are the observed proportions in the two arms of the study.
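A minimal sketch of Rosenthal's dichotomous effect size (4.2.1) and its variance (4.2.2), computed from event counts and arm sizes; the function name and the counts in the usage lines are hypothetical.

```python
def dichotomous_effect(events_t, n_t, events_c, n_c):
    """Return (d, v): the difference in proportions (4.2.1) and its variance (4.2.2)."""
    p_t, p_c = events_t / n_t, events_c / n_c
    d = p_t - p_c
    v = (n_t * p_c * (1.0 - p_c) + n_c * p_t * (1.0 - p_t)) / (n_t * n_c)
    return d, v


# Hypothetical counts: 30/100 events among the treated, 20/100 among controls.
d, v = dichotomous_effect(30, 100, 20, 100)
print(f"d = {d:.3f}, v = {v:.4f}")
```

Dividing through shows that (4.2.2) equals $P_t(1 - P_t)/n_t + P_c(1 - P_c)/n_c$, the usual binomial variance of a difference in proportions.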
Laird and Mosteller (1990) have also described the use of effect sizes for
dichotomous outcome variables. They defined d as before, but defined v as
$$v = P_c(1 - P_c). \qquad (4.2.3)$$
4.3. An example
The following example shows the use of effect sizes to estimate the combined effect of
studies of environmental lead exposure on nerve conduction velocity (NCV). A
summary of the studies was given by Davis and Svendsgaard (1990). The studies are
occupational exposure studies and as a result the actual exposures and lengths of
exposure varied greatly. The summary by Davis and Svendsgaard gave several
measures of NCV for each study, but median sensory NCV is perhaps the most
representative of the measures. Table 2 shows the median sensory NCV effect size for
each study. The result of combining the studies in Table 2 using effect sizes gives an
overall value of -0.428, which has a corresponding one-sided p-value less than $10^{-5}$.
Thus the studies show an overall effect assuming a fixed effects model. Davis and
Svendsgaard (1990) give a detailed discussion of the analysis of the complete data set
and also use an empirical Bayes approach (see Section 7.4) with effect sizes. Although
the confidence intervals are slightly wider, the conclusion is the same.
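The fixed effects combination in this example can be approximately reproduced from Table 2 with a short Python sketch of (4.1.4) and (4.1.5), taking each $v_j$ as the square of the tabulated standard error. Recomputing from the rounded entries gives a pooled value of about -0.428, matching the text, but a standard error near 0.070 rather than the tabulated 0.067; the gap may reflect rounding or a variance estimate different from the one assumed here. The function name is illustrative.

```python
import math

# (effect size, standard error) pairs transcribed from Table 2, excluding the overall row.
TABLE2 = [
    (-0.116, 0.295), (-0.916, 0.441), (-0.697, 0.535), (-0.393, 0.640),
    (-0.377, 0.199), (-0.333, 0.199), (-0.706, 0.220), (0.633, 0.414),
    (0.000, 0.359), (-0.952, 0.422), (-1.004, 0.243), (0.100, 0.179),
    (-0.398, 0.339), (-0.548, 0.280), (-0.766, 0.275), (-1.093, 0.340),
]


def fixed_effect_pool(estimates, std_errors):
    """Inverse variance weighted mean (4.1.4) and homogeneity statistic (4.1.5)."""
    weights = [1.0 / se ** 2 for se in std_errors]  # w_j = 1 / v_j
    total = sum(weights)
    pooled = sum(w * d for w, d in zip(weights, estimates)) / total
    x2 = sum(w * (d - pooled) ** 2 for w, d in zip(weights, estimates))
    return pooled, math.sqrt(1.0 / total), x2, len(estimates) - 1


effects, std_errors = zip(*TABLE2)
pooled, se, x2, df = fixed_effect_pool(effects, std_errors)
z = pooled / se
one_sided_p = 0.5 * math.erfc(-z / math.sqrt(2.0))  # Phi(z) = P(Z <= z) for a negative effect
print(f"pooled d = {pooled:.3f} (SE {se:.3f}), X^2 = {x2:.2f} on {df} df, one-sided p = {one_sided_p:.1e}")
```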
4.4. Discussion
Effect sizes have the advantage that they can combine 'apples and oranges' if the
outcome measures are reasonably similar. For example, one study of lung function
might measure forced expiratory volume (FEV) and another might measure vital capacity
(VC). These two measures are closely related, but the values of VC are larger by
definition. The method of effect sizes incorporates the variance of the estimate of
each study and is therefore an improvement over combining p-values. The disadvantage
of the method is that the result is unitless, and as such cannot give us direct
information about the magnitude of the effect seen.

Table 2
Effect sizes for median sensory nerve conduction velocity studies of lead exposure

Study                        Exposure group   Effect size   Std. error
Seppäläinen et al. (1983)    0 yr             -0.116        0.295
                             1 yr             -0.916        0.441
                             2 yr             -0.697        0.535
                             4 yr             -0.393        0.640
Wang et al. (1985)           PbB 22 μg/dl     -0.377        0.199
                             PbB 61 μg/dl     -0.333        0.199
Seppäläinen et al. (1975)                     -0.706        0.220
Seppäläinen et al. (1975)                      0.633        0.414
Rosén et al. (1983)                            0.000        0.359
Araki et al. (1986)                           -0.952        0.422
Bordo et al. (1982a, b)                       -1.004        0.243
Jeyaratnam et al. (1985)                       0.100        0.179
Paulev et al. (1979)                          -0.398        0.339
Chen et al. (1985)                            -0.548        0.280
Singer et al. (1983)                          -0.766        0.275
Buchthal and Behse (1979)                     -1.093        0.340
Overall effect size                           -0.428        0.067
5. Methods for contingency tables
Many studies are designed to compare the rate of a health outcome in a treated
group, $\theta_t$, with the rate in a control group, $\theta_c$. The results are usually summarized
by a standard 2 x 2 table:

                Treated   Not treated
    Successes   A         B
    Failures    C         D

Let N = A + B + C + D. We will assume that there are m studies given in the form of
2 x 2 tables. We will subscript the letters A, B, C, D, and N by j to indicate results
from the j-th study ($j = 1, 2, \ldots, m$). The natural estimates for $\theta_t$ and $\theta_c$ are

$$\hat{\theta}_t = \frac{A}{A + C} \quad \text{and} \quad \hat{\theta}_c = \frac{B}{B + D}. \qquad (5.1)$$

Generally, the parameter of interest from a contingency table is the odds ratio $\psi$,
defined as

$$\psi = \frac{\theta_t(1 - \theta_c)}{\theta_c(1 - \theta_t)}. \qquad (5.2)$$
Although the difference in rates, $\theta_t - \theta_c$, and the relative risk, $\theta_t/\theta_c$, are often used
as effect measures, the odds ratio is the measure generally used in more complicated
models involving dichotomous outcomes. Three methods for calculating a combined
estimate of an odds ratio from 2 x 2 contingency tables are (1) the Mantel-Haenszel
method, (2) Peto's method, and (3) the maximum likelihood method. The methods all
assume a common odds ratio between studies and assume that the binomial model
holds for each study.
5.1. Mantel-Haenszel method for combining odds ratios

The Mantel-Haenszel method was first described by Mantel and Haenszel (1959)
and was used to calculate a combined odds ratio for stratified case-control studies.
Mantel (1963) later reported that the method could be used for a wider class of
problems, including prospective studies. Assuming a common odds ratio, the
method can be used to combine results from different studies if those results are in
the form of 2 x 2 tables. The estimates obtained using this method are usually very
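close to maximum likelihood estimates. The Mantel-Haenszel estimate is

$$\hat{\psi}_{MH} = \sum_{j=1}^{m} \frac{A_j D_j}{N_j} \Big/ \sum_{j=1}^{m} \frac{B_j C_j}{N_j}. \qquad (5.1.1)$$

A relatively simple estimate of the variance of $\ln(\hat{\psi})$ was given by Kleinbaum
et al. (1982):

$$\operatorname{var}(\ln \hat{\psi}) \approx 1 \Big/ \sum_{j=1}^{m} \left[\frac{1}{A_j} + \frac{1}{B_j} + \frac{1}{C_j} + \frac{1}{D_j}\right]^{-1}. \qquad (5.1.2)$$

Note that these formulas run into difficulty when cells are empty: (5.1.2) cannot be
evaluated if any cell count is zero, and (5.1.1) is undefined if every product $B_j C_j$
is zero. For this reason, 1/2 is often added to all values of $A_j$, $B_j$, $C_j$, and $D_j$ in
equations (5.1.1) and (5.1.2).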
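The Mantel-Haenszel estimate (5.1.1) and the variance (5.1.2) can be sketched in a few lines of Python. Tables are assumed to be tuples (A, B, C, D) in the layout of the 2 x 2 table above; the optional `correction` argument adds a constant (such as 1/2) to every cell, and the 95% limits assume approximate normality of $\ln\hat{\psi}$. The function name and the strata in the usage lines are hypothetical.

```python
import math


def mantel_haenszel(tables, correction=0.0):
    """Mantel-Haenszel common odds ratio (5.1.1) with var(ln psi) from (5.1.2).

    Each table is (A, B, C, D): A, B = successes among treated / not treated,
    C, D = failures among treated / not treated.
    """
    num = den = inverse_var_sum = 0.0
    for cells in tables:
        a, b, c, d = (x + correction for x in cells)
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
        inverse_var_sum += 1.0 / (1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d)
    psi = num / den
    var_log_psi = 1.0 / inverse_var_sum
    half_width = 1.96 * math.sqrt(var_log_psi)
    return psi, var_log_psi, (psi * math.exp(-half_width), psi * math.exp(half_width))


# Hypothetical example: two identical strata.
psi, var_log, ci = mantel_haenszel([(10, 5, 20, 25), (10, 5, 20, 25)])
print(f"psi = {psi:.3f}, var(ln psi) = {var_log:.4f}, 95% limits = ({ci[0]:.3f}, {ci[1]:.3f})")
```

With a single stratum the estimate reduces to the ordinary odds ratio AD/(BC), and the variance to Woolf's 1/A + 1/B + 1/C + 1/D.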
5.2. Peto's method for combining odds ratios

Peto's method combines results from separate trials by adding the observed and
expected cell counts. It has the advantage that it can be used even if some of the cells
in the table are zero. Peto et al. (1977) give the formula for the combined odds ratio.
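Let $v_j$, the hypergeometric variance of $A_j$ given the table margins, be defined as

$$v_j = \frac{(A_j + B_j)(C_j + D_j)(A_j + C_j)(B_j + D_j)}{N_j^2 (N_j - 1)}. \qquad (5.2.1)$$

Then

$$\hat{\psi} = \exp\left\{ \sum_{j=1}^{m} \left[A_j - \frac{(A_j + B_j)(A_j + C_j)}{N_j}\right] \Big/ \sum_{j=1}^{m} v_j \right\}. \qquad (5.2.2)$$

Note that the exponent can be thought of as the sum of the observed values $A_j$ minus
their expected values $E(A_j)$ under the null hypothesis, with the entire quantity
normalized by the variance. The variance of $\ln(\hat{\psi})$ is estimated by

$$\operatorname{var}(\ln \hat{\psi}) \approx 1 \Big/ \sum_{j=1}^{m} v_j. \qquad (5.2.3)$$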
Yusuf et al. (1985) gave the justification for the use of the quantity A - E(A) as the
statistic to use for the pooling of the information. An asymptotically efficient test of
$H_0: \beta = 0$ versus $H_a: \beta < 0$, where $\beta = \ln\psi$, can be based on the slope of the
log-likelihood function evaluated at $\beta = 0$ (Cox and Oakes, 1984). For this particular
problem, it can be shown that the slope is proportional to A - E(A).
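Peto's estimate can be sketched in the same style. The sketch accumulates $A_j - E(A_j)$ and the hypergeometric variance $v_j = (A_j+B_j)(C_j+D_j)(A_j+C_j)(B_j+D_j)/[N_j^2(N_j-1)]$ over tables given as (A, B, C, D); unlike (5.1.2), it needs no correction for an empty cell. The function name and the tables in the usage lines are hypothetical.

```python
import math


def peto_odds_ratio(tables):
    """Peto's combined odds ratio (5.2.2) and var(ln psi) (5.2.3).

    Each table is (A, B, C, D) in the 2 x 2 layout of Section 5.
    """
    observed_minus_expected = 0.0
    total_v = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        expected_a = (a + b) * (a + c) / n                              # E(A_j) under H0
        v = (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))    # (5.2.1)
        observed_minus_expected += a - expected_a
        total_v += v
    psi = math.exp(observed_minus_expected / total_v)                   # (5.2.2)
    return psi, 1.0 / total_v                                           # (5.2.3)


# Hypothetical tables; the second has an empty cell, which Peto's method tolerates.
psi, var_log = peto_odds_ratio([(10, 5, 20, 25), (0, 3, 12, 9)])
print(f"psi = {psi:.3f}, var(ln psi) = {var_log:.4f}")
```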
5.3. Maximum likelihood estimates applied to 2 x 2 contingency tables

The maximum likelihood estimate is based on the likelihood of the m studies (with
the combinatorial terms dropped)

$$L \propto \prod_{j=1}^{m} \theta_{cj}^{B_j} (1 - \theta_{cj})^{D_j} \, \theta_{tj}^{A_j} (1 - \theta_{tj})^{C_j}, \qquad (5.3.1)$$

subject to

$$\psi = \frac{\theta_{tj}(1 - \theta_{cj})}{\theta_{cj}(1 - \theta_{tj})} \quad \text{for all } j.$$

Given this model, the likelihood is a function of m + 1 parameters (the m control
rates $\theta_{cj}$ and the common odds ratio $\psi$). The maximum likelihood estimate for $\psi$
and its asymptotic variance can be calculated using standard maximum likelihood
methods (see Section 8). As a function of an increasing number of studies, the
maximum likelihood estimator is not consistent because
the number of parameters is increasing with the number of studies. Hauck (1984)
derived a conditional maximum likelihood estimator for 2 x 2 contingency tables
which is superior to the maximum likelihood estimator in both bias and precision.
5.4. An example
The US Environmental Protection Agency (1990) summarized several studies of the
health effects of passive smoking. Many of these studies were described earlier by the
National Research Council (1986). There were 19 case-control studies giving
estimates of the odds ratio of death from lung cancer in nonsmoking women as a
function of the smoking status of their spouse. Of these studies, seven used
the same definition of exposure. This exposure was defined to be smoking by the
spouse of at least one cigarette per day. The seven studies with a common definition
of exposure are in Table 3.
An overall chi-square test for homogeneity of the seven studies gives a value of
3.42 for 6 degrees of freedom (p = 0.755), giving no evidence of heterogeneity. The
results from combining the seven studies using the three methods just described are
in Table 4. The combined estimates suggest that smoking by the spouse results in a
risk of lung cancer in women 1.7 times that of women whose spouse does not smoke.
If all 19 studies are included, as was done by US EPA (1990), the estimated risk is
1.42. Using either estimate, there is a substantially increased risk of lung cancer in
women from sidestream smoke.
Table 3
Case-control studies of lung cancer deaths in nonsmoking women with husband's smoking
status used as the exposure variable

Study                 Location                     Exposure    Cases   Controls   Odds ratio (95% conf. interval)
Akiba et al.          Hiroshima, Nagasaki, Japan   Unexposed   21      82         1.577 (0.874, 2.629)
                                                   Exposed     73      188
Correa et al.         Louisiana, US                Unexposed   8       72         2.312 (0.812, 5.248)
                                                   Exposed     14      61
Garfinkel et al.      US                           Unexposed   44      157        1.340 (0.868, 1.980)
                                                   Exposed     90      245
Geng et al.           Tianjin, China               Unexposed   20      52         2.293 (1.084, 4.287)
                                                   Exposed     34      41
Humble et al.         New Mexico, US               Unexposed   5       71         2.703 (0.819, 6.729)
                                                   Exposed     15      91
Lam et al.            Hong Kong                    Unexposed   84      183        1.675 (1.157, 2.349)
                                                   Exposed     115     152
Trichopoulos et al.   Athens, Greece               Unexposed   24      109        2.228 (1.185, 3.829)
                                                   Exposed     38      81

Table 4
Combined estimates of the odds ratio using three different methods

Method               Estimate   95% confidence limits
                                Lower     Upper
Maximum likelihood   1.666      1.360     2.042
Mantel-Haenszel      1.667      1.360     2.043
Peto's formula       1.649      1.353     2.009

5.5. Discussion
For 2 x 2 contingency tables with moderate sample sizes (all cells with counts of 5 or
more), the choice of method for combining studies makes very little difference. For
smaller sample sizes, the methods will give slightly different answers. Hauck (1984)
compared the bias and precision of estimates using maximum likelihood, conditional
maximum likelihood and Mantel-Haenszel. For m = 10, both the Mantel-Haenszel
estimator and the conditional maximum likelihood estimator were superior to the
maximum likelihood estimator. For m = 5, all three methods gave very similar bias
and precision. Because the maximum likelihood estimate and the conditional
maximum likelihood estimate are difficult to compute, the Mantel-Haenszel estimate
is a logical choice for most problems. The underlying model is slightly different
for Peto's method, and there may be situations where this method is preferred.
6. General fixed effects models
A fixed parameter is one which remains the same from study to study. The opposite
of this is a parameter which varies randomly from study to study, and this kind of
parameter is often analyzed using a random effects model (see Section 7). There are
two general methods for combining evidence on a single fixed parameter. The first of
these, the variance weighted method, has been used for many years. The second
method is the calculation of the maximum likelihood estimate from the joint
likelihood function.
6.1. Inverse variance weighted method
One of the oldest methods of combining estimates of a parameter is the inverse
variance weighted method. This method has been used for years, having been
discussed by Birge (1932) and Cochran (1937). A simple description of the method
was given by Hald (1952). Assume that there are m studies, each giving an estimate $\hat{\theta}_j$
of a parameter, where $j = 1, \ldots, m$. Then the minimum variance unbiased estimate
among linear combinations of the $\hat{\theta}_j$ is

$$\hat{\theta} = \sum_{j=1}^{m} w_j \hat{\theta}_j \Big/ \sum_{j=1}^{m} w_j, \qquad (6.1.1)$$

where $w_j = 1/\operatorname{var}[\hat{\theta}_j]$. The variance of the estimate is

$$\operatorname{var}(\hat{\theta}) = 1 \Big/ \sum_{j=1}^{m} w_j. \qquad (6.1.2)$$

The following statistic will test the null hypothesis that all $\hat{\theta}_j$ are estimating the
same parameter:

$$X^2 = \sum_{j=1}^{m} w_j (\hat{\theta}_j - \hat{\theta})^2, \qquad (6.1.3)$$

which is approximately distributed as a chi-square distribution with m - 1 degrees
of freedom.
6.2. Maximum likelihood method

The second method is the application of the confidence profile method (CPM) to a
single fixed parameter (see Section 8). When applied to a single fixed parameter, the
CPM is equivalent to maximum likelihood estimation, or to a standard
Bayesian method if a prior distribution is used.
6.3. An example
The studies summarized below gave evidence on the effect of nitrogen dioxide (NO2)
on respiratory disease in children. The studies were all prospective, but the model
used for analysis of the data varied from study to study. All studies were adjusted so
that the estimated effect was for an increase of about 30 μg/m³ in NO2 exposure. The
effect was summarized as an odds ratio for the increase in respiratory disease in
children. The nine studies giving estimates for children in the age range of 6 to 12
years are given in Table 5. The studies were reviewed and combined by Hasselblad
et al. (1992).

Table 5
Summary of studies of the effects of nitrogen dioxide on respiratory illness in children

Author                   Model for analysis           Estimated    95% confidence
                                                      odds ratio   interval
Melia et al. (1977)      Multiple logistic            1.31         (1.16, 1.48)
Melia et al. (1979)      Multiple logistic            1.24         (1.09, 1.42)
Ware et al. (1984)       Two-arm experiment           1.08         (0.97, 1.19)
Neas et al. (1990)       Multiple logistic            1.47         (1.17, 1.86)
Ekwo et al. (1983)       Multiway contingency table   1.10         (0.79, 1.53)
Dijkstra et al. (1990)   Multiple logistic            0.94         (0.66, 1.33)
Keller et al. (1979)     Two-arm experiment           1.10         (0.79, 1.54)
Likelihoods for the results of the nine studies are in Figure 1. Taken separately, only 4
of the 9 studies were statistically significant at the 0.05 level. The combined estimate
using the variance weighted method is an odds ratio of 1.193 with 95% confidence
limits of 1.123 to 1.266. The combined estimate using the maximum likelihood
method is an odds ratio of 1.190 with 95% confidence limits of 1.121 to 1.261,
and is also shown in Figure 1. Either estimate suggests that an increase of about
30 μg/m³ in NO2 exposure will result in an increase of about 20% in the reported
number of cases of respiratory disease in elementary school children.

Fig. 1. Likelihood functions for studies of respiratory disease in children (horizontal axis: odds ratio for increase in respiratory disease).
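The inverse variance weighted method of Section 6.1 can be applied to odds ratios such as those in Table 5 by working on the log scale. This sketch assumes that each published 95% interval is symmetric on the log scale, so that the standard error of $\ln(\mathrm{OR})$ is $(\ln U - \ln L)/(2 \times 1.96)$. Table 5 lists seven studies, whereas the combined estimate in the text is based on nine, so the result should be close to, but is not expected to reproduce exactly, the published 1.193 (1.123, 1.266). The function name is illustrative.

```python
import math

# (odds ratio, lower 95% limit, upper 95% limit) for the studies listed in Table 5.
TABLE5 = [
    (1.31, 1.16, 1.48), (1.24, 1.09, 1.42), (1.08, 0.97, 1.19), (1.47, 1.17, 1.86),
    (1.10, 0.79, 1.53), (0.94, 0.66, 1.33), (1.10, 0.79, 1.54),
]


def pool_odds_ratios(rows, z=1.96):
    """Inverse variance weighted pooling, (6.1.1)-(6.1.3), on the log odds ratio scale.

    Each standard error is recovered from the 95% limits, assuming the published
    interval is symmetric on the log scale: se = (ln U - ln L) / (2 z).
    """
    logs = [math.log(odds) for odds, _, _ in rows]
    weights = [(2.0 * z / (math.log(hi) - math.log(lo))) ** 2 for _, lo, hi in rows]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, logs)) / total      # (6.1.1)
    se = math.sqrt(1.0 / total)                                     # (6.1.2)
    x2 = sum(w * (y - pooled) ** 2 for w, y in zip(weights, logs))  # (6.1.3)
    limits = (math.exp(pooled - z * se), math.exp(pooled + z * se))
    return math.exp(pooled), limits, x2


odds_ratio, (lower, upper), x2 = pool_odds_ratios(TABLE5)
print(f"pooled OR = {odds_ratio:.3f} (95% limits {lower:.3f}, {upper:.3f}), X^2 = {x2:.2f}")
```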
6.4. Discussion
For all but very small sample sizes, the results using the inverse variance weighted
method and the maximum likelihood estimates will be extremely similar. The
inverse variance weighted method may be preferred for its simplicity of computation.
The use of likelihood functions has the advantage that it allows for the
visualization of the estimates via marginal likelihood functions or posterior distributions.
7. Random effects models
The use of random effects models is not a recent innovation. They have been referred
to as two-stage or hierarchical models. The idea of a random effects model is that the
parameter or parameters of 'mother nature' do not remain constant from study to
study. Instead, they vary randomly, and are in fact random variables sampled from
some distribution. The problem then becomes one of estimating (or deriving a posterior
distribution for) some function of these parameters.
7.1. DerSimonian and Laird model

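DerSimonian and Laird (1986) give formulas for combining effect measures using a
random effects model. For example, if the effect measure is the odds ratio, then define
$y_j$, the log odds ratio of study j in the notation of Section 5, as

$$y_j = \ln\left(\frac{A_j D_j}{B_j C_j}\right). \qquad (7.1.1)$$

The model partitions the variation of the $y_j$ into two sources: (1) the variation $\tau^2$
resulting from nature choosing different parameter values for each study, and (2) the
sampling variation from the study itself given a particular value from nature. Let

$$\bar{y}_w = \sum_{j=1}^{m} w_j y_j \Big/ \sum_{j=1}^{m} w_j, \qquad (7.1.2)$$

where $w_j$ is defined as the reciprocal of the estimated within-study variance of $y_j$,
and define

$$Q = \sum_{j=1}^{m} w_j (y_j - \bar{y}_w)^2. \qquad (7.1.3)$$

Estimate $\tau^2$ by

$$\hat{\tau}^2 = \max\left\{0, \; [Q - (m - 1)] \Big/ \left[\sum_{j=1}^{m} w_j - \left(\sum_{j=1}^{m} w_j^2 \Big/ \sum_{j=1}^{m} w_j\right)\right]\right\}. \qquad (7.1.4)$$

Now define $u_j$ as

$$u_j = \frac{1}{(1/w_j) + \hat{\tau}^2}. \qquad (7.1.5)$$

Compute

$$\hat{\mu} = \sum_{j=1}^{m} u_j y_j \Big/ \sum_{j=1}^{m} u_j \qquad (7.1.6)$$

and

$$\operatorname{var}(\hat{\mu}) = 1 \Big/ \sum_{j=1}^{m} u_j. \qquad (7.1.7)$$

This partitioning makes no distributional assumptions. However, DerSimonian and
Laird also give some maximum likelihood estimates which assume approximate
normality of the $y_j$ and the values given by mother nature. This model is described in
the following section.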
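Equations (7.1.2) to (7.1.7) translate directly into a short Python sketch. It takes study estimates $y_j$ and their within-study variances, assuming $w_j$ is the reciprocal of each variance; the function name and the two-study toy numbers in the usage lines are hypothetical.

```python
def dersimonian_laird(y, within_var):
    """Random effects estimate of DerSimonian and Laird (1986), (7.1.2)-(7.1.7).

    y          -- study estimates y_j (for example log odds ratios)
    within_var -- estimated within-study variances, so that w_j = 1 / within_var[j]
    Returns (mu_hat, var(mu_hat), tau2_hat, Q).
    """
    m = len(y)
    if m < 2:
        raise ValueError("at least two studies are needed")
    w = [1.0 / v for v in within_var]
    sum_w = sum(w)
    y_w = sum(wj * yj for wj, yj in zip(w, y)) / sum_w                  # (7.1.2)
    q = sum(wj * (yj - y_w) ** 2 for wj, yj in zip(w, y))               # (7.1.3)
    denom = sum_w - sum(wj * wj for wj in w) / sum_w
    tau2 = max(0.0, (q - (m - 1)) / denom)                               # (7.1.4)
    u = [1.0 / (1.0 / wj + tau2) for wj in w]                            # (7.1.5)
    mu = sum(uj * yj for uj, yj in zip(u, y)) / sum(u)                   # (7.1.6)
    return mu, 1.0 / sum(u), tau2, q                                     # (7.1.7)


# Toy example: two precise but discordant studies.
mu, var_mu, tau2, q = dersimonian_laird([0.0, 1.0], [0.01, 0.01])
print(f"Q = {q:.1f}, tau^2 = {tau2:.2f}, mu = {mu:.2f}, var(mu) = {var_mu:.2f}")
```

In this toy case Q = 50 far exceeds m - 1 = 1, so $\hat{\tau}^2 = 0.49$ dominates the within-study variances of 0.01 and the variance of the combined estimate grows from 0.005 under the fixed effects model to 0.25.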
7.2. Normal-normal model
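In the normal-normal model we assume that the observation $y_j$ (this could be the
estimate of a parameter as in the previous section) is sampled from a normal
distribution with mean $\mu_j$ and variance $\sigma_j^2$, $j = 1, 2, \ldots, m$. We assume that $\sigma_j^2$ is
known and that $\mu_j$ is a random value from a normal distribution with mean $\nu$ and
variance $\tau^2$. The likelihood, with the $\mu_j$ integrated out, is proportional to

$$L(\nu, \tau^2) \propto \exp\left\{-\frac{1}{2}\sum_{j=1}^{m}\left[\frac{(y_j - \nu)^2}{\sigma_j^2 + \tau^2} + \ln(\sigma_j^2 + \tau^2)\right]\right\}. \qquad (7.2.1)$$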
One general prior for $\nu, \tau^2$ is the gamma prior

$$g(\nu, \tau^2) \propto (\tau^2)^{\alpha - 1} e^{-\lambda \tau^2 / 2}. \qquad (7.2.2)$$

For $\alpha = \ldots$ and small values of $\lambda$, this prior approaches $1/\tau$, the scale invariant prior.
In practice, an informative prior with $\alpha = \lambda = \ldots$ works very well in that if there are a
small number of studies, the prior favors a value of zero for $\tau^2$. Otherwise, a small
number of studies will result in a very diffuse posterior.
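Jeffreys' prior (Berger, 1985) is proportional to

$$\pi(\nu, \tau^2) \propto \left[\sum_{j=1}^{m} \frac{1}{\sigma_j^2 + \tau^2} \; \sum_{j=1}^{m} \frac{1}{(\sigma_j^2 + \tau^2)^2}\right]^{1/2}. \qquad (7.2.3)$$

If the $\sigma_j^2$ are all equal to $\sigma^2$, then equation (7.2.3) reduces to

$$\pi(\nu, \tau^2) \propto \left[\frac{1}{\sigma^2 + \tau^2}\right]^{3/2}. \qquad (7.2.4)$$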
The natural noninformative prior,

$$\pi(\nu, \tau^2) \propto 1/\tau, \qquad (7.2.5)$$

leads to problems. The integrals do not converge near $\tau = 0$, and the integrals do not
converge as $\tau \to \infty$ unless there are at least four studies (likelihoods). For this reason, a
gamma prior (equation (7.2.2)) is preferable, and if necessary, can be used with very
small values of $\alpha$ and $\lambda$.
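The joint posterior for $\nu$ and $\tau^2$ is proportional to the product of equations (7.2.1)
and (7.2.2):

$$\pi(\nu, \tau^2) \propto \exp\left\{-\frac{1}{2}\sum_{j=1}^{m}\left[\frac{(y_j - \nu)^2}{\sigma_j^2 + \tau^2} + \ln(\tau^2 + \sigma_j^2)\right]\right\} (\tau^2)^{\alpha - 1} e^{-\lambda \tau^2 / 2}. \qquad (7.2.6)$$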
In most cases, $\sigma_j^2$ is not known. It is standard practice to replace $\sigma_j^2$ by its estimate.
This should not be a problem for estimates made from reasonably large samples.
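The major difficulty in computing a marginal posterior for $\nu$ is that the posterior
must be integrated over $\tau^2$:

$$\pi(\nu) = \int_0^\infty \pi(\nu, \tau^2) \, d\tau^2. \qquad (7.2.7)$$

Thinking of $\pi(\nu, \tau^2)$ as a function $f(\tau^2)$ of $\tau^2$, this integral can be approximated by

$$\int_0^\infty f(\tau^2)\, d\tau^2 \approx \ln(r) \sum_{i=-\infty}^{\infty} c r^i f(c r^i) \approx c \ln(r) \left[ f(c) + \sum_{i=1}^{n} \left( r^{-i} f(c r^{-i}) + r^{i} f(c r^{i}) \right) \right], \qquad (7.2.8)$$

where the first approximation follows from the substitution $\tau^2 = c r^u$ and a sum over
unit steps in u, c is chosen to be near the center of the expected answer, s is an
approximate standard deviation, and

$$r = \exp\left\{ 3 \left[\ln(1 + c^2/s^2)\right]^{1/2} / n \right\}.$$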
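A minimal sketch of the numerical integration (7.2.7) and (7.2.8) for the normal-normal model follows. It takes the gamma prior (7.2.2) to be $(\tau^2)^{\alpha-1}e^{-\lambda\tau^2/2}$, evaluates the log of the joint posterior on a grid of $\nu$ values and on the geometric $\tau^2$ grid $c r^i$, $i = -n, \ldots, n$, with the step ratio r as given above, and normalizes the result over the $\nu$ grid. The values of $\alpha$, $\lambda$, c, s and n, the toy data, and all function names are assumptions chosen for illustration.

```python
import math


def log_joint_posterior(nu, tau2, y, s2, alpha, lam):
    """Log of (7.2.6) up to a constant: likelihood (7.2.1) times gamma prior (7.2.2)."""
    total = 0.0
    for yj, s2j in zip(y, s2):
        v = s2j + tau2
        total += (yj - nu) ** 2 / v + math.log(v)
    return -0.5 * total + (alpha - 1.0) * math.log(tau2) - lam * tau2 / 2.0


def tau2_grid(c, s, n):
    """Nodes c * r**i (i = -n..n) and their weights c * r**i * ln(r), as in (7.2.8)."""
    r = math.exp(3.0 * math.sqrt(math.log(1.0 + c * c / (s * s))) / n)
    nodes = [c * r ** i for i in range(-n, n + 1)]
    return nodes, [t * math.log(r) for t in nodes]


def marginal_posterior_nu(nu_grid, y, s2, alpha, lam, c, s, n=20):
    """Marginal posterior (7.2.7) of nu evaluated on nu_grid, normalized to sum to 1."""
    nodes, weights = tau2_grid(c, s, n)
    logs = [[log_joint_posterior(nu, t, y, s2, alpha, lam) for t in nodes] for nu in nu_grid]
    top = max(max(row) for row in logs)  # subtract the maximum to avoid overflow
    dens = [sum(w * math.exp(l - top) for w, l in zip(weights, row)) for row in logs]
    total = sum(dens)
    return [d / total for d in dens]


# Toy data (assumed): three estimates with known sampling variance 0.01.
y, s2 = [-0.2, 0.0, 0.2], [0.01, 0.01, 0.01]
nu_grid = [-1.0 + 0.01 * k for k in range(201)]
post = marginal_posterior_nu(nu_grid, y, s2, alpha=0.5, lam=0.01, c=0.02, s=0.02)
mean_nu = sum(nu * p for nu, p in zip(nu_grid, post))
print(f"posterior mean of nu = {mean_nu:.4f}")
```

Because the toy estimates are symmetric about zero with equal variances, the marginal posterior of $\nu$ is symmetric and peaks at zero, which gives a simple check on the integration.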
7.3. Other models

Several other random effects models have appeared in the environmental literature.
One such model is the beta-binomial model which assumes a beta distribution for
the parameter p of a binomial model. An example of this model was given by
Williams (1975) for the analysis of responses from toxicological experiments involv-
ing reproduction and teratogenicity. A second model is the mixing of the location
parameter of a lognormal distribution by another lognormal distribution. Patil et al.
(1987) give an example using this model to fit recruitment data from oceanic fish
stocks. Random effects models can be used with effect size measures. Hedges and
Olkin (1985) give formulas very similar to those of DerSimonian and Laird (1986).
7.4. Comparison to empirical Bayes estimation

An empirical Bayes problem is one where relationships among the parameters of the
model allow the use of the data to estimate parts of the prior distribution. For
problems with several observations (likelihoods), the results are usually quite similar
to the random effects model solution. Several examples of empirical Bayes analyses
are referenced by Morris (1983).
Efron and Morris (1975) give examples of empirical Bayes methods used to derive
James-Stein estimators. These estimators will have smaller variances than standard
estimators under certain assumptions. The following example, although not an
environmental one, illustrates the concepts. The example is a study of the prevalence
of toxoplasmosis in 36 cities of El Salvador. The prevalence rates for each city were
adjusted using an indirect method of adjustment so that these 36 rates yj had an
overall mean of 0. Thus it was assumed that mother nature sampled from a normal
distribution with mean 0 and variance $\tau^2$. If $\tau^2$ is estimated by $\hat{\tau}^2$, then the
James-Stein estimator for each $y_j$ is

$$\hat{y}_j = \frac{\hat{\tau}^2 y_j}{\hat{\tau}^2 + \sigma^2}, \qquad j = 1, 2, \ldots, 36,$$

where $\sigma^2$ is the known variance of $y_j$. Thus $\hat{\tau}^2$ is used to 'shrink' the estimate of $y_j$
towards the overall mean. Under certain assumptions it can be shown that this
estimator has a smaller variance than the original estimator.
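The shrinkage step can be sketched as follows. The chapter does not say how $\tau^2$ was estimated for the El Salvador data; this sketch assumes the positive-part estimate $\hat{\tau}^2 = \max\{0, \sum_j y_j^2/(k-2) - \sigma^2\}$ for k centred estimates, which makes the factor $\hat{\tau}^2/(\hat{\tau}^2 + \sigma^2)$ equal to the classical positive-part James-Stein factor $1 - (k-2)\sigma^2/\sum_j y_j^2$. The function name and the data in the usage lines are made up.

```python
def james_stein(y, sigma2):
    """Shrink centred estimates (known mean 0) with common known variance sigma2.

    Returns the shrunken values tau2_hat * y_j / (tau2_hat + sigma2) and tau2_hat.
    """
    k = len(y)
    if k < 3 or sigma2 <= 0:
        raise ValueError("need at least three estimates and a positive sigma2")
    sum_sq = sum(v * v for v in y)
    tau2_hat = max(0.0, sum_sq / (k - 2) - sigma2)  # assumed moment-type estimate of tau^2
    factor = tau2_hat / (tau2_hat + sigma2)
    return [factor * v for v in y], tau2_hat


# Made-up centred estimates with known sampling variance 1.
shrunk, tau2_hat = james_stein([1.0, -1.0, 2.0, -2.0], 1.0)
print(tau2_hat, shrunk)
```

When the estimates are no more spread out than sampling error alone would produce, $\hat{\tau}^2$ is zero and every estimate is shrunk all the way to the overall mean.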
The key estimate in this problem is the estimate for $\tau^2$. The likelihood for $\tau^2$ using
an empirical Bayes model is shown in Figure 2, along with the random effects model
posterior for $\tau^2$ assuming two different gamma priors. The results from all three
models are similar, which is to be expected since the sample size (n = 36) was
relatively large.
Fig. 2. Likelihood and posteriors for $\tau^2$.
Thus the empirical Bayes and random effects (hierarchical Bayes) solutions
can be quite similar. Berger (1985) concludes that '... it appears that the hierarchical
Bayes approach is the superior methodology for general applications. When [m] is
large, of course, there will be little difference between the two approaches, and
whichever is more convenient can then be employed.'
7.5. An example
The same example of respiratory disease in children from exposure to NO2 (see
Section 6.3) can be used to demonstrate the use of random effects models. A test of
homogeneity of the nine studies gives a chi-square of 13.55 for 8 degrees of freedom
(p = 0.094). Thus there is some evidence for a lack of homogeneity in the nine studies,
although it is not statistically significant at the 0.05 level.
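The p-value quoted for this homogeneity test can be checked with a small Python sketch. It relies on the closed-form upper tail of a chi-square distribution with an even number of degrees of freedom (8 here), $P(X > x) = e^{-x/2}\sum_{i=0}^{\nu/2-1} (x/2)^i/i!$, so no statistics library is needed; the helper name `chi2_sf_even_df` is hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with even df.

    Uses exp(-x/2) * sum_{i < df/2} (x/2)^i / i!, valid only for even df.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even number of degrees of freedom")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


p = chi2_sf_even_df(13.55, 8)  # homogeneity statistic for the nine NO2 studies
```

The result is about 0.094, matching the value reported above.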
Estimates of odds ratios are known to be approximately lognormal. By working on the
log scale (that is, transforming the x-axis), we can use a normal-normal model to fit the nine studies.
Estimates from the normal-normal model and the model of DerSimonian and
Laird (1986) are shown in Table 6. For comparison, the maximum likelihood (ML)
estimates for the fixed effects model are also given.
The estimates from either random effects model lead to the same conclusions as
do the fixed effects maximum likelihood estimates, although the
confidence limits (credible set values) are slightly wider for the random effects models.
The implication is that even if each study is not measuring exactly the same health
endpoint, the overall parameter which they are estimating is greater than one.
Thus there is convincing evidence for the effect of increased NO2 exposure on
respiratory disease in children in spite of possible heterogeneity.
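The DerSimonian and Laird moment estimator can be sketched in Python on the log odds ratio scale. The nine study-level estimates behind Table 6 are not listed in this section, so the inputs in the usage lines are hypothetical, as is the function name `dersimonian_laird`; exponentiating the pooled log odds ratio and its limits returns results to the odds ratio scale.

```python
import math


def dersimonian_laird(y, v):
    """Random-effects pooling of estimates y_j with within-study variances v_j.

    Returns (pooled estimate, standard error, tau^2, Q), all on the input scale.
    """
    if len(y) < 2:
        raise ValueError("need at least two studies")
    w = [1.0 / vj for vj in v]
    sw = sum(w)
    y_fixed = sum(wj * yj for wj, yj in zip(w, y)) / sw
    # Cochran's Q homogeneity statistic around the fixed effects estimate
    q = sum(wj * (yj - y_fixed) ** 2 for wj, yj in zip(w, y))
    df = len(y) - 1
    # Moment estimate of the between-study variance, truncated at zero
    tau2 = max(0.0, (q - df) / (sw - sum(wj * wj for wj in w) / sw))
    w_star = [1.0 / (vj + tau2) for vj in v]
    est = sum(wj * yj for wj, yj in zip(w_star, y)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return est, se, tau2, q


# Hypothetical log odds ratios and their variances
log_or = [math.log(1.1), math.log(1.3), math.log(1.2)]
var = [0.01, 0.02, 0.015]
est, se, tau2, q = dersimonian_laird(log_or, var)
print(math.exp(est), math.exp(est - 1.96 * se), math.exp(est + 1.96 * se))
```

When $Q$ does not exceed its degrees of freedom, $\hat{\tau}^2 = 0$ and the estimate reduces to the fixed effects (inverse-variance) result.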
Raudenbush and Bryk (1985) give formulas for empirical Bayes estimates of effect
sizes using the same random effects model, and Davis and Svendsgaard (1990) used
this model to estimate an effect size for the same studies described in Section 4.3.
Allowing for the heterogeneity resulted in the one-sided p-value being reduced to
0.0025.

Table 6
Combined estimates of the odds ratio using three different models

  Method                   Estimate   95% confidence limits
                                      Lower      Upper
  Fixed effect (ML)        1.190      1.121      1.261
  Normal-normal            1.200      1.102      1.307
  DerSimonian and Laird    1.192      1.090      1.304
7.6. Discussion
Hierarchical models are seldom used as the primary method of analysis. They are
very useful when there is concern about the lack of homogeneity of the studies. In
these cases it may be possible to drop the assumption of homogeneity and to make
conclusions about the average effect by appropriately modeling the lack of homo-
geneity.
8. Multiple parameter models - The confidence profile method
The confidence profile method is a very general method for combining virtually any
kind of evidence about various parameters, as long as those parameters can be
described in some model. It was first described by Eddy (1989), and a more complete
description was given by Eddy et al. (1992). In general, the method can be divided
into four steps, but the later steps can give information requiring the modification of
earlier steps, and as a result, the process may be considered iterative. The four basic
steps are: (1) define the problem precisely; (2) obtain all available evidence; (3) define
the model parameters and their relationships; and finally, (4) solve the model using
any of several different methods described later.
The third step is the heart of the confidence profile method. Models consist of
three elements: (1) basic parameters; (2) functional parameters; and (3) likelihood
functions relating evidence to basic or functional parameters. These elements are
described below.
8.1. Basic parameters
Basic parameters are those parameters in the model which are not
functions of any other parameters. Examples of basic parameters are (1) the
probability of having at least one respiratory illness with a background exposure of
10 µg/m3; (2) the mean diastolic blood pressure of US men aged 50; or (3) the
expected number of respiratory illnesses in infants during the first year of life. There
may or may not be any direct evidence about a basic parameter. For convenience, we
will denote the basic parameters as $\theta_1, \theta_2, \ldots, \theta_k$. Some of the $\theta_j$
could be multivariate; for example, a multinomial distribution requires a
multivariate parameter.
If a Bayesian analysis is to be used, then all basic parameters must have prior
distributions. Noninformative priors for these parameters can be derived in a variety
of ways (see Berger, 1985).
8.2. Functional parameters
Each functional parameter $\theta_j$ is defined as a function of the basic parameters
$\theta_1, \theta_2, \ldots, \theta_k$ and previously defined functional parameters $\theta_{k+1}, \ldots, \theta_{j-1}$:

$$\theta_j = f_j(\theta_1, \theta_2, \ldots, \theta_{j-1}), \qquad j = k+1, \ldots, p, \qquad (8.2.1)$$

where $p$ is the total number of parameters in the model. Functional parameters
are useful in defining the likelihood functions, but the models are technically
functions of only the $k$ basic parameters. Although the function can be any mathematical
expression, certain functions are very common when evaluating medical evidence. One
class of common functions is effect measures. Denote $\theta_c$ as the parameter of the
control group and $\theta_t$ as the parameter of the treated group. The following are three
different effect measures:
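$$\begin{aligned}
\text{Difference:}\quad & \theta_d = \theta_t - \theta_c,\\
\text{Ratio:}\quad & \theta_r = \frac{\theta_t}{\theta_c},\\
\text{Odds ratio:}\quad & \theta_{or} = \frac{\theta_t(1 - \theta_c)}{\theta_c(1 - \theta_t)}.
\end{aligned}$$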
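As a concrete illustration, the three effect measures can be computed from a pair of group probabilities with a few lines of Python. The function name `effect_measures` is hypothetical, and the sketch assumes $\theta_t$ and $\theta_c$ are probabilities strictly between 0 and 1, which the odds ratio requires.

```python
def effect_measures(theta_t, theta_c):
    """Difference, ratio, and odds ratio for a treated vs. control probability."""
    if not (0.0 < theta_t < 1.0 and 0.0 < theta_c < 1.0):
        raise ValueError("probabilities must lie strictly between 0 and 1")
    return {
        "difference": theta_t - theta_c,
        "ratio": theta_t / theta_c,
        "odds_ratio": theta_t * (1.0 - theta_c) / (theta_c * (1.0 - theta_t)),
    }


# Hypothetical treated and control probabilities
measures = effect_measures(0.3, 0.2)
```

For these inputs the difference is 0.1, the ratio 1.5, and the odds ratio 0.24/0.14, about 1.714.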
A second class of functions arises from errors in measurement. Most
of these are variations of the following model: let $\theta_{e1}$ denote the probability that a
success is labeled a failure, and let $\theta_{e2}$ denote the probability that a failure is labeled a
success. The observed success rate $\theta'$ as a function of the true rate $\theta$ is
given by

$$\theta' = (1 - \theta_{e1})\,\theta + \theta_{e2}\,(1 - \theta).$$
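Rearranging the equation gives $\theta' = \theta_{e2} + \theta(1 - \theta_{e1} - \theta_{e2})$, so the true rate can be recovered as $\theta = (\theta' - \theta_{e2})/(1 - \theta_{e1} - \theta_{e2})$ whenever $\theta_{e1} + \theta_{e2} < 1$. A minimal Python sketch of both directions follows; the function names are hypothetical.

```python
def observed_rate(theta, e1, e2):
    """Observed success rate when successes are recorded as failures with
    probability e1 and failures are recorded as successes with probability e2."""
    return (1.0 - e1) * theta + e2 * (1.0 - theta)


def true_rate(theta_obs, e1, e2):
    """Invert observed_rate; only defined when e1 + e2 < 1."""
    if e1 + e2 >= 1.0:
        raise ValueError("misclassification too severe to invert")
    return (theta_obs - e2) / (1.0 - e1 - e2)


# Hypothetical error probabilities: 10% of successes and 5% of failures mislabeled
obs = observed_rate(0.4, e1=0.1, e2=0.05)   # 0.36 + 0.03
back = true_rate(obs, e1=0.1, e2=0.05)      # recovers 0.4
```

With sampling noise in the observed rate, the inverted value can fall outside [0, 1], which is one reason to treat $\theta$ as a parameter in a likelihood rather than simply inverting.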
Other functions arise in the area of survival analysis. These functions are too
numerous to describe.
8.3. Likelihood functions

Likelihood functions connect observed evidence to basic and functional parameters.
Denote the likelihood of the $i$-th observation from the $s$-th experiment ($s = 1, 2, \ldots, S$;
$i = 1, \ldots, n_s$) as $L_s(X_i \mid \theta_1, \theta_2, \ldots, \theta_p)$. Remember that the likelihood functions are technically
functions of the $k$ basic parameters. For a detailed description of likelihood
functions, see Barnard and Sprott (1982) and Berger and Wolpert (1984).
8.4. The model

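Once the basic parameters, functional parameters, and likelihoods are defined, the
model has been formulated. The log-likelihood, $L'$, for the model (assuming
independence) is

$$L' = \sum_{s=1}^{S} \sum_{i=1}^{n_s} \log L_s(X_i \mid \theta_1, \ldots, \theta_p). \qquad (8.4.1)$$

If priors have been defined, then the $k$-dimensional posterior (thought of as a
function of the $p$ parameters) is

$$\pi(\theta_1, \ldots, \theta_k \mid X) \propto \pi(\theta_1, \ldots, \theta_k) \prod_{s=1}^{S} \prod_{i=1}^{n_s} L_s(X_i \mid \theta_1, \ldots, \theta_p). \qquad (8.4.2)$$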
Although equation (8.4.1) defines the log-likelihood function, and equation (8.4.2)
defines the posterior distribution, these functions are not directly useful for most
applications. Summary statistics or marginal distributions must be calculated,
and these are discussed in Section 8.6 on solution methods.
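To make (8.4.1) concrete, here is a hypothetical two-study model in Python. The basic parameters are a control-group probability $\theta_c$ and an odds ratio $\theta_{or}$; the treated-group probability $\theta_t = \theta_{or}\theta_c/(1 - \theta_c + \theta_{or}\theta_c)$ is a functional parameter; and each study contributes a binomial log-likelihood with the constant $\log\binom{n}{x}$ dropped. The evidence counts and function names are illustrative assumptions. Adding the log of a prior density would give the log of (8.4.2) up to a constant.

```python
import math


def treated_rate(theta_c, odds_ratio):
    """Functional parameter: treated-group probability implied by theta_c and the odds ratio."""
    return odds_ratio * theta_c / (1.0 - theta_c + odds_ratio * theta_c)


def binom_loglik(x, n, p):
    """Binomial log-likelihood without the constant log C(n, x)."""
    return x * math.log(p) + (n - x) * math.log(1.0 - p)


def model_loglik(theta_c, odds_ratio, evidence):
    """L' as in (8.4.1): sum of log-likelihoods, each tied to one (basic or functional) parameter."""
    params = {"control": theta_c, "treated": treated_rate(theta_c, odds_ratio)}
    return sum(binom_loglik(x, n, params[arm]) for arm, x, n in evidence)


# Hypothetical evidence: (parameter it informs, successes, trials)
evidence = [("control", 20, 100), ("treated", 30, 100)]
best = model_loglik(0.2, 12.0 / 7.0, evidence)  # theta_t = 0.3 at these values
```

Because each binomial term is maximized at its observed proportion, $L'$ is largest at $\theta_c = 0.2$ and $\theta_{or} = 0.3 \cdot 0.8/(0.2 \cdot 0.7) = 12/7$.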
8.5. Influence diagrams

Influence diagrams were developed as a Bayesian computer-aided modeling tool by
Howard and Matheson (1984). They are an alternative to decision trees for Bayesian
decision analysis. Shachter (1986) constructed an algorithm for solving Bayesian
decision analysis problems represented as influence diagrams. Even when influence diagrams are not used to arrive at a
solution, they provide an excellent graphical representation of the conditional
relationships between the parameters and the evidence.
8.6. Solution methods

As described earlier, the general solution of the model can be calculated by
maximizing the log-likelihood function in equation (8.4.1) or by calculating the
$k$-dimensional posterior in equation (8.4.2). There are four different methods for
solving the model: (1) exact solution in certain special cases; (2) approximate
solutions; (3) maximum likelihood methods; and (4) Monte Carlo simulation. The
choice depends on the complexity of the model, the accuracy required, and the
amount of computational power and time available. No distinction need be made
between a Bayesian and a non-Bayesian analysis, except that a Bayesian solution
requires the specification of priors.
8.6.1. Exact solution methods

There are a few situations where exact solutions for marginal likelihoods or
posteriors are available. Several of these arise when we have observations from a
univariate or multivariate normal distribution. Most of these results were given by
DeGroot (1970) and include (1) the univariate normal with the variance unknown;
(2) the multivariate normal with the covariance unknown; (3) the linear regression
model; and (4) the binomial likelihood with a beta prior. Box and Tiao (1973) show
that a normal prior and a normal likelihood result in a normal posterior. This is
useful when normal approximations for the likelihood function are used. Other
exact solutions can be calculated numerically for models with a small number of
parameters.
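Two of the closed-form cases mentioned above are easy to state in code. The Python sketch below shows the beta prior with binomial data and the normal prior with a normal likelihood of known variance; the function names are hypothetical and the numbers are illustrative only.

```python
def beta_binomial_update(a, b, successes, failures):
    """Beta(a, b) prior with binomial data gives a Beta(a + successes, b + failures) posterior."""
    return a + successes, b + failures


def normal_normal_update(prior_mean, prior_var, obs_mean, obs_var):
    """Normal prior times normal likelihood (known variance) gives a normal posterior.

    Precisions (inverse variances) add, and the posterior mean is the
    precision-weighted average of the prior mean and the observed mean.
    """
    post_var = 1.0 / (1.0 / prior_var + 1.0 / obs_var)
    post_mean = post_var * (prior_mean / prior_var + obs_mean / obs_var)
    return post_mean, post_var


beta_post = beta_binomial_update(1, 1, successes=7, failures=3)   # uniform prior
norm_post = normal_normal_update(0.0, 1.0, 2.0, 1.0)
```

The normal-normal case is the one that makes normal approximations to likelihood functions convenient: combining several of them stays within the normal family.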
8.6.2. Approximate solution methods

Some distributions can be closely approximated by other distributions, so that
distributions of certain functional parameters can also be approximated. Beta
distributions can be approximated by use of logarithmic or log-odds transformations.
These transformations result in nearly normal distributions for moderate
values of the beta parameters a and r, so that functional computations can be made. Computing the
distribution of the sum or difference of two variables, each independently distributed
as a Student t, is an extremely difficult problem known as the Behrens-Fisher
problem. Welch (1938) published an approximate solution to the
problem which is quite accurate for all but the most unusual cases.
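Welch's approximation is usually stated for the difference of two sample means with unequal variances. The Python sketch below applies the same Welch-Satterthwaite degrees-of-freedom formula to two independent scaled Student t variables, $\mu_j + s_j t_{\nu_j}$, and approximates their difference by a single scaled t with scale $\sqrt{s_1^2 + s_2^2}$. This is an assumption about how the idea carries over to posterior distributions, not necessarily the construction used in the confidence profile method; the function name is hypothetical.

```python
import math


def welch_difference(m1, s1, df1, m2, s2, df2):
    """Approximate theta1 - theta2 when theta_j = m_j + s_j * t(df_j), independently.

    Returns (location, scale, approximate df) of a single scaled t distribution,
    with df from the Welch-Satterthwaite formula.
    """
    a1, a2 = s1 * s1, s2 * s2
    scale = math.sqrt(a1 + a2)
    df = (a1 + a2) ** 2 / (a1 * a1 / df1 + a2 * a2 / df2)
    return m1 - m2, scale, df


loc, scale, df = welch_difference(1.0, 1.0, 10, 0.0, 1.0, 10)
```

For two equal components the approximate degrees of freedom are the sum of the two (20 here); in general they lie between the smaller of the two and their sum. For a sum rather than a difference, only the location changes to $m_1 + m_2$.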
8.6.3. Maximum likelihood methods

The maximum likelihood solution for the model can be found without the specification
of any priors. The maximum likelihood point estimate is the
value $\theta = (\theta_1, \theta_2, \ldots, \theta_k)$ which maximizes equation (8.4.1); denote this value as $\hat{\theta}$.
There are two common methods for obtaining this estimate: the standard
Newton-Raphson method (Burden and Faires, 1985) and the modified
Gauss-Newton method (Berndt et al., 1974). The posterior mode and credible set limits can
be calculated in the same manner (see Shachter et al., 1990), except that the prior
distribution is included in the function to be maximized.
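The Newton-Raphson step $\theta \leftarrow \theta - \ell'(\theta)/\ell''(\theta)$ can be sketched in one dimension in Python, with an approximate standard error taken from the observed information $-\ell''(\hat{\theta})$. The function name and the binomial example (30 successes in 100 trials, parameter on the logit scale) are hypothetical; in several dimensions the same step uses the gradient vector and Hessian matrix.

```python
import math


def newton_raphson(grad, hess, start, tol=1e-10, max_iter=100):
    """Maximize a smooth one-parameter log-likelihood given its first two derivatives.

    Returns the maximizer and an approximate standard error, 1 / sqrt(observed information).
    """
    x = start
    for _ in range(max_iter):
        step = grad(x) / hess(x)
        x -= step
        if abs(step) < tol:
            break
    else:
        raise RuntimeError("Newton-Raphson did not converge")
    return x, 1.0 / math.sqrt(-hess(x))


# Hypothetical binomial evidence on the logit scale: l(eta) = x*eta - n*log(1 + e^eta)
x_obs, n_obs = 30, 100


def prob(eta):
    return 1.0 / (1.0 + math.exp(-eta))


eta_hat, se = newton_raphson(lambda e: x_obs - n_obs * prob(e),
                             lambda e: -n_obs * prob(e) * (1.0 - prob(e)),
                             start=0.0)
```

The iteration converges to $\log(0.3/0.7)$, with standard error $1/\sqrt{100 \cdot 0.3 \cdot 0.7}$.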
8.6.4. Monte Carlo methods

Monte Carlo sampling methods have the advantage that posterior distributions
for any functions of the parameters can be calculated to any desired
degree of accuracy. The problem is that generating enough random numbers
to obtain this accuracy could take years. In Monte Carlo sampling, a random vector
of the basic parameters is generated, from which all functional parameters are
calculated. Denote these values as $\theta^i$, where the superscript $i$ denotes the $i$-th
simulation. From these values, the likelihood function $L^i$ (or posterior distribution)
is calculated. To estimate a particular function $g(\theta)$, compute

$$\hat{g}(\theta) = \frac{\sum_{i=1}^{n} g(\theta^i)\, L^i}{\sum_{i=1}^{n} L^i},$$

where $n$ is the number of random vectors generated.

The difficulty with this method is that most of the $L^i$ will be extremely small, and
the number of observations needed to get a precise estimate of $g(\theta)$ will be very large.
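A minimal Python sketch of this likelihood-weighted estimate, assuming the random vectors are drawn from the prior. The example is hypothetical: a single probability with a uniform prior and 30 successes in 100 trials, for which the exact posterior is Beta(31, 71) with mean 31/102, about 0.304. Likelihoods are handled on the log scale and rescaled by their maximum, which cancels in the ratio.

```python
import math
import random


def mc_posterior_mean(g, loglik, sample_prior, n=20000, seed=1):
    """Estimate E[g(theta) | data] as sum g(theta_i) L_i / sum L_i, theta_i drawn from the prior."""
    rng = random.Random(seed)
    draws = [sample_prior(rng) for _ in range(n)]
    logs = [loglik(t) for t in draws]
    top = max(logs)  # rescale to avoid underflow; the factor cancels in the ratio
    weights = [math.exp(lv - top) for lv in logs]
    return sum(g(t) * w for t, w in zip(draws, weights)) / sum(weights)


# Hypothetical evidence: 30 successes in 100 trials, uniform prior on (0, 1)
loglik = lambda t: 30 * math.log(t) + 70 * math.log(1.0 - t)
uniform = lambda rng: min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep away from 0 and 1
est = mc_posterior_mean(lambda t: t, loglik, uniform)
```

Most of the uniform draws land where the likelihood is negligible, so only a small fraction of the 20,000 draws carry real weight; this is the inefficiency described above.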
The obvious solution is to sample more where the likelihood is larger, a method
known as importance sampling. The distribution of the importance function should
mimic the posterior density. One obvious choice is to start with the approximate
multivariate normal likelihood function (or posterior) for $\theta_1, \ldots, \theta_k$ estimated by
maximum likelihood methods. The sample is drawn from this $k$-dimensional multivariate
normal. Define $w_i$ as

$$w_i = \frac{L(\theta^i)}{I(\theta^i)},$$

where $\theta^i = (\theta_1^i, \ldots, \theta_k^i)$ and $I(\theta^i)$ is the value of the multivariate normal density
evaluated at $\theta^i$. $\hat{g}(\theta)$ is then given by

$$\hat{g}(\theta) = \frac{\sum_{i=1}^{n} g(\theta^i)\, w_i}{\sum_{i=1}^{n} w_i}.$$
The variance of the estimate can be computed from the weighted variance of the
individual estimates. A detailed discussion of various Monte Carlo methods to
obtain marginal density distributions was given by Gelfand and Smith (1990).
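The importance-sampling version can be sketched in Python for a one-parameter case. The importance function is the normal approximation centered at the maximum likelihood estimate, and the weights are $w_i = L(\theta^i)/I(\theta^i)$ computed on the log scale. The evidence (30 successes in 100 trials), the flat prior on (0, 1), and the function names are hypothetical; the exact posterior mean for this case is 31/102. Heavier-tailed importance functions are often preferred in practice, since a normal can under-cover the tails of the posterior.

```python
import math
import random


def log_normal_pdf(x, mu, sd):
    """Log density of N(mu, sd^2) at x."""
    return -0.5 * ((x - mu) / sd) ** 2 - math.log(sd * math.sqrt(2.0 * math.pi))


def importance_estimate(g, loglik, mu, sd, n=20000, seed=1):
    """Estimate E[g(theta) | data] using a normal importance function N(mu, sd^2).

    Assumes a flat prior on (0, 1): draws outside (0, 1) have zero weight and are dropped.
    """
    rng = random.Random(seed)
    draws = [rng.gauss(mu, sd) for _ in range(n)]
    inside = [t for t in draws if 0.0 < t < 1.0]
    log_w = [loglik(t) - log_normal_pdf(t, mu, sd) for t in inside]
    top = max(log_w)  # common rescaling; cancels in the ratio
    w = [math.exp(lw - top) for lw in log_w]
    return sum(g(t) * wi for t, wi in zip(inside, w)) / sum(w)


# Hypothetical evidence: 30 successes in 100 trials
x_obs, n_obs = 30, 100
loglik = lambda t: x_obs * math.log(t) + (n_obs - x_obs) * math.log(1.0 - t)
p_hat = x_obs / n_obs                               # maximum likelihood estimate
se_hat = math.sqrt(p_hat * (1.0 - p_hat) / n_obs)   # its approximate standard error
est_is = importance_estimate(lambda t: t, loglik, p_hat, se_hat)
```

Because the draws are concentrated where the likelihood is large, the weights are far more even than with prior sampling, and the same number of draws gives a much more precise estimate.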
Obviously, a key to the method is the ability to quickly generate multivariate
normal observations with the specified mean vector and covariance matrix. Note
that if $V$ is a positive definite matrix, then $V$ can be written as $CC'$ (for example, with $C$
the lower-triangular Cholesky factor). If we start with $k$
independent normals with mean 0 and variance 1 (denote the vector as $X$), then
$\mu + CX$ will have mean vector $\mu$ and covariance matrix $CC' = V$.
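A self-contained Python sketch of this recipe, using the Cholesky factorization as the choice of $C$ and drawing the independent standard normals with the standard library. The function names and the 2 x 2 covariance matrix are hypothetical; for larger problems a numerical library's Cholesky routine would normally be used instead of this hand-written loop.

```python
import math
import random


def cholesky(v):
    """Lower-triangular C with C C' = V, for a symmetric positive definite V."""
    k = len(v)
    c = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = v[i][j] - sum(c[i][m] * c[j][m] for m in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                c[i][j] = math.sqrt(s)
            else:
                c[i][j] = s / c[j][j]
    return c


def mvn_draw(mu, c, rng):
    """One draw of mu + C X, where X holds independent N(0, 1) variables."""
    x = [rng.gauss(0.0, 1.0) for _ in mu]
    return [mu[i] + sum(c[i][j] * x[j] for j in range(i + 1)) for i in range(len(mu))]


# Hypothetical covariance matrix V and mean vector mu
c = cholesky([[4.0, 2.0], [2.0, 3.0]])   # [[2, 0], [1, sqrt(2)]]
rng = random.Random(0)
sample = [mvn_draw([1.0, -1.0], c, rng) for _ in range(5)]
```

Only the lower triangle of $C$ is used when forming $CX$, since the entries above the diagonal are zero.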
8.7. An example
The determination of 'safe' levels of ambient air toxic substances is an extremely
difficult problem. EPA has the responsibility for determining such levels, and has
labeled these values reference concentrations (RfCs). The following is an application of the confidence profile method (CPM) to determine
an RfC for chronic inhalation exposure to n-hexane. The general approach is to
determine the concentration which will produce a specified adverse health effect.
This methodology is related to the 'benchmark' concentration method of Crump
(1984).
The result from each study is a posterior distribution for the dose producing the
specified health effect. Thus the distributions are all on the same scale and can be
compared directly. With extreme caution, the distributions can be combined. The
computations are illustrated by an example detailed by Jarabek and Hasselblad
(1991).
Sanagi et al. (1980) conducted an occupational study of workers exposed to
58 ppm n-hexane and compared them with an unexposed population. The health
endpoint was nerve conduction velocity (NCV), and the specified health effect was a
decrease in NCV of 2 m/s. The model for the Sanagi et al. study is

$$y_i = \alpha + \beta x_i + \varepsilon_i,$$

where $y_i$ is the NCV value, $x_i$ is the concentration of n-hexane, $\alpha$ and $\beta$ are the basic
parameters, and $\varepsilon_i$ is normally distributed with mean 0 and variance $\sigma^2$. The prior
distribution for $\alpha$ is the uniform distribution over the entire real line, and the prior for
$\beta$ is the uniform distribution over the negative half line (n-hexane is assumed not to
improve NCV). The functional parameter of interest is the concentration $\theta_c$ which
will produce a decrease of 2 m/s: $\theta_c = -2/\beta$. Under these assumptions, the marginal
posterior distribution for $\theta_c$ will be an inverted truncated Student t distribution. This
distribution, using the data of Sanagi et al., is shown in Figure 3.
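The inverted truncated t posterior can be approximated by simulation, as in the Python sketch below. It assumes, beyond what the text states, a prior proportional to $1/\sigma$ for the error standard deviation. Under these priors $\beta$ given the data is a Student t with $n - 2$ degrees of freedom centered at the least squares slope, truncated to $\beta < 0$, and each draw is mapped to $\theta_c = -2/\beta$. The NCV values below are made up for illustration and are not the Sanagi et al. data; the function name is hypothetical.

```python
import math
import random
import statistics


def ec_posterior_draws(x, y, decrease=2.0, n_draws=20000, seed=1):
    """Posterior draws of theta_c = -decrease / beta for y = alpha + beta * x + error.

    Assumes flat priors on alpha and on beta < 0, and p(sigma) proportional to 1/sigma.
    """
    n = len(x)
    xbar, ybar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    beta_hat = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    alpha_hat = ybar - beta_hat * xbar
    rss = sum((yi - alpha_hat - beta_hat * xi) ** 2 for xi, yi in zip(x, y))
    df = n - 2
    se = math.sqrt(rss / df / sxx)  # standard error of the least squares slope
    rng = random.Random(seed)
    draws, attempts = [], 0
    while len(draws) < n_draws:
        attempts += 1
        if attempts > 100 * n_draws:
            raise RuntimeError("almost no draws have beta < 0")
        # Student t draw: standard normal divided by sqrt(chi-square / df)
        chi2 = rng.gammavariate(df / 2.0, 2.0)
        beta = beta_hat + se * rng.gauss(0.0, 1.0) / math.sqrt(chi2 / df)
        if beta < 0.0:  # prior truncation: n-hexane is assumed not to improve NCV
            draws.append(-decrease / beta)
    return draws


# Hypothetical NCV values (m/s) for 5 unexposed workers and 5 exposed at 58 ppm
x = [0.0] * 5 + [58.0] * 5
y = [50.0, 51.0, 49.5, 50.5, 49.0, 47.0, 48.5, 46.5, 48.0, 47.5]
draws = ec_posterior_draws(x, y)
```

A histogram of `draws` gives the kind of posterior curve shown in Figure 3; with these made-up values the median is close to $-2/\hat{\beta} = 2 \cdot 58/2.5 \approx 46$ ppm.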
The second study is an experiment by Dunnick et al. (1990) which showed
increased nasal turbinate lesions in female mice exposed to n-hexane. In this case,
the EC10 was chosen as the biologically significant effect. The model for the data is
the log-logistic model, where the exposure variable $x$ is the human equivalent
concentration of n-hexane:

$$P(\text{lesion} \mid x) = \frac{1}{1 + \exp(-\alpha - \beta \ln x)},$$

where $\alpha$ and $\beta$ are the basic parameters. The prior distribution for $\alpha$ is the uniform
distribution over the entire real line, and the prior for $\beta$ is the uniform distribution
over the positive half line (n-hexane is assumed not to decrease lesions). The functional
parameter of interest is the concentration $\theta_c$ which will produce a lesion rate of 10
percent (EC10):

$$\theta_c = \exp\left(\frac{-\alpha - \ln 9}{\beta}\right).$$

Fig. 3. Posterior for concentration producing specified effect. (Horizontal axis: concentration of n-hexane, mg/m3.)
An approximate solution for this problem was given by Finney (1978), but it is
inaccurate for small sample sizes. The posterior distribution for $\theta_c$ based on the
data of Dunnick et al. (1990) can be calculated numerically, and it is also shown in
Figure 3.
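The EC10 formula follows from setting $P(\text{lesion} \mid x) = 0.1$, which gives $\exp(-\alpha - \beta \ln x) = 9$. The Python sketch below checks that algebra; the parameter values are hypothetical, and producing posterior draws of $(\alpha, \beta)$ from the Dunnick et al. data is not shown. Given such draws, applying `ec10` to each one yields draws from the posterior of $\theta_c$.

```python
import math


def lesion_prob(x, alpha, beta):
    """Log-logistic dose-response: P(lesion | x) = 1 / (1 + exp(-alpha - beta * ln x))."""
    return 1.0 / (1.0 + math.exp(-alpha - beta * math.log(x)))


def ec10(alpha, beta):
    """Concentration giving a 10 percent lesion rate: exp((-alpha - ln 9) / beta)."""
    if beta <= 0.0:
        raise ValueError("beta must be positive under the stated prior")
    return math.exp((-alpha - math.log(9.0)) / beta)


# Hypothetical parameter values
theta_c = ec10(-8.0, 1.5)
check = lesion_prob(theta_c, -8.0, 1.5)  # should equal 0.1
```
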
The two posterior distributions are not inconsistent with each other, in spite of the
fact that the studies were done on different species and measured very different endpoints.
The RfC derived from either study would be relatively similar. For mathematical
interest, the posterior distribution assuming a common $\theta_c$ is also shown in Figure 3.
8.8. Discussion
The confidence profile method uses a defined model to relate multiple parameters
and differing types of evidence. Any evidence which can be expressed as a likelihood
function of the parameters of the model can be incorporated. Subjective priors,
where necessary, may also be included. All of the assumptions of the model are
explicitly given by the functional parameters and the likelihoods. The solution can
be given as a joint posterior distribution or as a marginal distribution on any
parameter of interest. Fixed effect models and random effect models are special cases
of the method.
One advantage of CPM is that graphs of the likelihood functions or posterior
distributions of the individual studies are an integral part of the analysis. According
to Landwehr (1987), '... plots should be constructed and used at the very beginning
of any meta-analysis, before getting into more complicated modeling, testing, and
estimation'. The primary drawback of the method is that the computations can be
difficult without specialized software.
References
Barnard, G. A. and D. A. Sprott (1982). Likelihood. In: Encyclopedia of Statistical Sciences, Vol. 3. Wiley,
New York.
Berger, J. O. (1985). Statistical Decision Theory and Bayesian Analysis. Springer, New York.
Berger, J. O. and R. L. Wolpert (1984). The Likelihood Principle. Institute of Mathematical Statistics,
Hayward, CA.
Berndt, E., B. Hall, R. Hall and J. Hausman (1974). Estimation and inference in nonlinear structural
models. Ann. Econom. Soc. Measure 3, 655-665.
Birge, R. T. (1932). The calculation of errors by the method of least squares. Phys. Rev. 30, 207-227.
Box, G. E. P. and G. C. Tiao (1973). Bayesian Inference in Statistical Analysis. Addison-Wesley,Reading,
MA.
Burden, R. L. and J. D. Faires (1985). Numerical Analysis. Prindle, Weber and Schmidt, Boston, MA.
Chalmers, T. C., H. Smith Jr., B. Blackburn, B. Silverman, B. Schroeder, D. Reitman and A. Ambroz
(1981). A method for assessing the quality of a randomized control trial. Control. Clinic. Trials 2, 31-49.
Cochran, W. G. (1937). Problems arising in the analysis of a series of similar experiments. J. Roy. Statist.
Soc. Suppl. 4, 102-118.
Cohen, J. (1977). Statistical Power Analysis for the Behavioral Sciences. Academic Press, New York.
Cox, D. R. and D. Oakes (1984). Analysis of Survival Data. Chapman and Hall, London.
Crump, K. S. (1984). A new method for determining allowable daily intakes. Fund. Appl. Toxicol. 4,
854-871.
Davis, J. M. and D. Svendsgaard (1990). Nerve conduction velocity and lead: A critical review and
meta-analysis. In: B. L. Johnson, ed., Advances in Neurobehavioral Toxicology: Applications in
Environmental and Occupational Health. Lewis Publications, Chelsea, MI.
DeGroot, M. H. (1970). Optimal Statistical Decisions. McGraw-Hill, New York.
DerSimonian, R. and N. Laird (1986). Meta-analysis in clinical trials. Control. Clinic. Trials 7, 177-188.
Dunnick, J. K., D. G. Graham, R. S. Yang, S. B. Haber and H. R. Brown (1990). Thirteen-week toxicity
study of n-hexane in B6C3F 1 mice after inhalation exposure. Toxicology 57, 163-172.
Eddy, D. M. (1992). Manual for Evaluating Health Practices & Designing Practice Policies. American
College of Physicians, Philadelphia, PA, 44-45.
Eddy, D. M. (1989). The confidence profile method: A Bayesian method for assessing health technologies.
Oper. Res. 37, 210-228.
Eddy, D. M., V. Hasselblad and R. D. Shachter (1992). The Statistical Synthesis of Evidence: Meta-
analysis by the Confidence Profile Method. Academic Press, Boston, MA.
Efron, B. and C. Morris (1975). Data analysis using Stein's estimator and its generalizations. J. Amer.
Statist. Assoc. 70, 311-319.
Finney, D. J. (1978). Statistical Methods in Biological Assay. Griffin, London.
Fisher, R. A. (1932). Statistical Methods for Research Workers. 4th ed., Oliver and Boyd, London.
Gelfand, A. and A. F. M. Smith (1990). Sampling-based approaches to calculating marginal densities. J.
Amer. Statist. Assoc. 85, 398-409.
George, E. O. (1977). Combining independent one-sided and two-sided statistical tests - Some theory and
applications. Unpublished Doctoral Dissertation, University of Rochester.
Glass, G. (1980). Summarizing effect sizes. In: New Directions for Methodology of Social and Behavioral
Science: Quantitative Assessment of Research Domains. Jossey-Bass, San Francisco, CA.
Hald, A. (1952). Statistical Theory with Engineering Applications. Wiley, New York.
Hasselblad, V., D. J. Kotchmar and D. M. Eddy (1992). Synthesis of environmental evidence: Nitrogen
dioxide epidemiology studies. J. Air Waste Management Assoc. 42, 662-671.
Hauck, W. W. (1984). A comparative study of conditional maximum likelihood estimation of a common
odds ratio. Biometrics 40, 1117-1123.
Hedges, L. (1981). Distribution theory for Glass's estimator of effect size and related estimators. J. Educat.
Statist. 6, 107-128.
Hedges, L. (1987). Statistical issues in the meta-analysis of environmental studies. In: ASA/EPA Conf. on
Interpretation of Environmental Data II. Statistical Issues in Combining Environmental Studies, October
1-2, 1986. EPA-230-12-87-032, 30-44.
Hedges, L. V. and I. Olkin (1985). Statistical Methods for Meta-Analysis. Academic Press, Orlando.
Howard, R. A. and J. E. Matheson (1984). Influence diagrams. In: R. A. Howard and J. E. Matheson, ed.,
Readings in the Principles and Applications of Decision Analysis. Strategic Decision Group, Menlo
Park, CA.
Jarabek, A. M. and V. Hasselblad (1991). Inhalation reference concentration methodology: Impact of
dosimetric adjustments and future directions using the confidence profile method. In: Proc. 84th Ann.
Meeting of the Air & Waste Management Association, Vancouver, Canada.
Kleinbaum, D. G., L. L. Kupper and H. Morgenstern (1982). Epidemiologic Research. Lifetime Learning
Publications, Belmont, CA.
Laird, N. M. and F. Mosteller (1990). Some statistical methods for combining experimental results.
Internat. J. Technol. Assess. Health Care 6, 5-30.
Landwehr, J. M. (1987). Discussion of 'Statistical issues in the meta-analysis of environmental studies'.
In: ASA/EPA Conf. on Interpretation of Environmental Data II. Statistical Issues in Combining
Environmental Studies, October 1-2, 1986. EPA-230-12-87-032, 47-49.
Mann, C. (1990). Meta-analysis in the breech. Science 249, 476-480.
Mantel, N. (1963). Chi-square tests with one degree of freedom; Extensions of the Mantel-Haenszel
procedure. J. Amer. Statist. Assoc. 58, 690-700.
Mantel, N. and W. Haenszel (1959). Statistical aspects of the analysis of data from retrospective studies of
disease. J. Nat. Cancer Inst. 22, 719-748.
Morris, C. N. (1983). Parametric empirical bayes inference: Theory and applications. J. Amer. Statist.
Assoc. 78, 47-59.
National Research Council (NRC) (1986). Environmental Tobacco Smoke: Measuring Exposures and
Assessing Health Effects. National Academy Press, Washington, DC.
Orwin, R. G. and D. S. Cordray (1985). Effects of deficient reporting on meta-analysis: A conceptual
framework and reanalysis. Psychol. Bull. 97, 134-147.
Patil, G. P., G. J. Babu, M. T. Boswell, K. Chatterjee, E. Linder and C. Taillie (1987). Statistical issues in
combining ecological and environmental studies with examples in marine fisheries research and
management. In: ASA/EPA Conf. on Interpretation of Environmental Data II. Statistical Issues in
Combining Environmental Studies, October 1, 1986, EPA-230-12-87-032, 30-44.
Pearson, K. (1904). Report on certain enteric fever inoculation statistics. British Medical J. 2, 1243-1246.
Peto, R., M. C. Pike, P. Armitage, N. E. Breslow, D. R. Cox, S. V. Howard, N. Mantel, K. McPherson,
J. Peto and P. G. Smith (1977). Design and analysis of randomized clinical trials requiring prolonged
observation of each patient II. Analysis and examples. British J. Cancer 35, 1-39.
Raudenbush, S. and A. S. Bryk (1985). Empirical Bayes meta-analysis. J. Educat. Statist. 10, 75-98.
Rosenthal, R. (1984). Meta-Analytic Procedures for Social Research. Sage, Beverly Hills, CA.
Rosenthal, R. and D. B. Rubin (1982). Combining effect sizes from independent studies. Psychol. Bull. 92,
500-503.
Sanagi, S., Y. Seki, K. Sugimoto and M. Hirata (1980). Peripheral nervous system functions of workers
exposed to n-hexane at a low level. Internat. Arch. Occup. Health 47, 69-70.
Schwartz, J., H. Pitcher, R. Levin, B. Ostro and A. J. Nichols (1985). Cost and benefits of reducing lead in
gasoline: Final regulatory impact analysis. USEPA: EPA 230-05-85-006.
Shachter, R. (1986). Evaluating Influence Diagrams. In: A. P. Basu, ed., Reliability and Quality Control.
Elsevier, Amsterdam, 321-344.
Shachter, R. D., D. M. Eddy and V. Hasselblad (1990). An influence diagram approach to medical
technology assessment. In: R. M. Oliver and J. Q. Smith eds., Influence Diagrams and Belief Nets.
Wiley, Chichester, England.
Stouffer, S. A., E. A. Suchman, L. C. DeVinney, S. A. Star and R. M. Williams Jr. (1949). The American
Soldier: Adjustment During Army Life, Vol. I. Princeton Univ. Press, Princeton, NJ.
Tippett, L. H. C. (1931). The Methods of Statistics. Williams and Norgate, London.
US Department of Health, Education, and Welfare (1970). Air Quality Criteria for Hydrocarbons.
National Air Pollution Control Administration, Washington DC.
US Environmental Protection Agency. (1990). Health Effects of Passive Smoking: Assessment of Lung
Cancer in Adults and Respiratory Disorders in Children, EPA-600/6-90-006A. Office of Atmospheric
and Indoor Air Programs, Washington, DC.
Welch, B. L. (1938). The significance of the difference between two means when the population variances
are unequal. Biometrika 29, 350-362.
Williams, D.A. (1975). The analysis of binary responses from toxicological experiments involving
reproduction and teratogenicity. Biometrics 31, 949-952.
Yusuf, S., R. Peto, J. Lewis, R. Collins and P. Sleight (1985). Beta blockade during and after myocardial
infarction: An overview of the randomized trials. Progr. Cardiovasc. Diseases 27, 335-371.
