Meta-Analysis in Clinical Trials*

Rebecca DerSimonian and Nan Laird

ABSTRACT: This paper examines eight published reviews each reporting results from several related trials. Each review pools the results from the relevant trials in order to evaluate the efficacy of a certain treatment for a specified medical condition. These reviews lack consistent assessment of homogeneity of treatment effect before pooling. We discuss a random effects approach to combining evidence from a series of experiments comparing two treatments. This approach incorporates the heterogeneity of effects in the analysis of the overall treatment efficacy. The model can be extended to include relevant covariates which would reduce the heterogeneity and allow for more specific therapeutic recommendations. We suggest a simple noniterative procedure for characterizing the distribution of treatment effects in a series of studies.

KEY WORDS: random effects model, heterogeneity of treatment effects, distribution of treatment effects, covariate information

INTRODUCTION

Meta-analysis is defined here as the statistical analysis of a collection of analytic results for the purpose of integrating the findings. Such analyses are becoming increasingly popular in medical research where information on efficacy of a treatment is available from a number of clinical studies with similar treatment protocols. If considered separately, any one study may be either too small or too limited in scope to come to unequivocal or generalizable conclusions about the effect of treatment. Combining the findings across such studies represents an attractive alternative to strengthen the evidence about the treatment efficacy.
Yale University, School of Medicine, New Haven, Connecticut (R.D.); Harvard University, School of Public Health, Boston, Massachusetts (N.L.)
*This research was supported by grant CA09424-03 from the National Cancer Institute and grant GM-29745 from the National Institute of Health. We are grateful to Frederick Mosteller, Tom Louis, and Katherine Halvorsen for critical readings of various drafts, encouragement, and advice.
Address reprint requests to: Rebecca DerSimonian, Yale University School of Medicine, P.O. Box 3333, New Haven, CT 06510.
Received March 25, 1986; accepted April 7, 1986.
Controlled Clinical Trials 7:177-188 (1986). © Elsevier Science Publishing Co., Inc. 1986, 52 Vanderbilt Ave., New York, New York 10017. 0197-2456/86/$3.50

The main difficulty in integrating the results from various studies stems from the sometimes diverse nature of the studies, both in terms of design and methods employed. Some are carefully controlled randomized experiments while others are less well controlled. Because of differing sample sizes and patient populations, each study has a different level of sampling error as well. Thus one problem in combining studies for integrative purposes is the assignment of weights that reflect the relative "value" of the information provided in a study. A more difficult issue in combining evidence is that one may be using incommensurable studies to answer the same question. Armitage [1] emphasizes the need for careful consideration of methods in drawing inferences from heterogeneous but logically related studies. In this setting, the use of a regression analysis to characterize differences in study outcomes may be more appropriate [2]. This paper discusses an approach to meta-analysis which addresses these two problems.
In this approach, we assume that there is a distribution of treatment effects and utilize the observed effects from individual studies to estimate this distribution. The approach allows for treatment effects to vary across studies and provides an objective method for weighting that can be made progressively more general by incorporating study characteristics into the analysis. We illustrate the use of this model in several examples, and based on the empirical evidence, suggest a simple noniterative procedure for testing and estimation.

DATABASE

In a systematic search of the first ten issues published in 1982 of each of four weekly journals (NEJM, JAMA, BMJ, and Lancet), Halvorsen [3] found only one article (out of 589) that considered combining results using formal statistical methods. Our data consist of an ad hoc collection of such articles from the medical literature found through references provided by colleagues and through bibliographic references in articles already located [4-11]. The method we propose applies to several additional articles that have come to our attention since our original analyses [12-14].

We examine in detail eight review articles each reporting results from several related trials. Each review pools the results from the relevant trials in order to evaluate the efficacy of a certain treatment for a specified medical condition. In most of these reviews the original investigators pool the results from the relevant trials and estimate an overall treatment effect without first checking whether the treatment effect across the trials is constant. Others exclude some trials and combine the results only from trials that are similar in design and implementation. The investigators who do check for homogeneity of treatment effect before pooling use different criteria to assess this homogeneity. With two exceptions [6,11], the reviews consider randomized trials only.
The two reviews that include nonrandomized studies analyze the data from the two groups of studies (randomized and nonrandomized) separately. In this study, we restrict our attention to the results of randomized trials only. We first describe the eight reviews identifying each by its first author, and in Table 1 summarize the methods used in each review:

Table 1. The Methods and Outcome Measures Used in the Original Reviews

Review        Outcome measure              Overall effect estimate                             Test of homogeneity
Winship       difference in proportions    pooled raw data                                     -
Conn          difference in proportions    pooled raw data                                     -
Miao          difference in proportions    weighted average                                    Chi-Square Test (Q_w)
DeSilva       relative risk                Mantel-Haenszel estimate for pooled relative risk   -
Stjernsward   difference in proportions    minimum and maximum                                 -
Baum          difference in proportions    pooled raw data                                     Gilbert, McPeek, and Mosteller estimate of variance [15]
Peto          difference in proportions    -                                                   Mentions lack of heterogeneity
Chalmers      difference in proportions    unweighted average                                  -

Winship: A review of eight trials that compare the healing rates in duodenal ulcer patients treated with cimetidine or placebo therapy [4].

Conn: A review of nine trials that compare the survival rates in alcoholic hepatitis patients with steroids or control therapy [5].

Miao: A review of six trials that compare gastric with sham freezing in the treatment of duodenal ulcer [6]. In addition, this review considers 14 observational and two controlled but nonrandomized studies.

DeSilva: A review of six trials that evaluate the effect of lignocaine on the incidence of ventricular fibrillation in acute myocardial infarction [7]. This study originally considered 15 trials but because the trials vary widely in treatment schedules and doses, some criteria for adequacy of treatment are established and only six trials that fulfill these requirements are analyzed.
Stjernsward: A review of five trials that compare the 5-year survival rates of patients with cancer of the breast treated with surgery plus radiotherapy or surgery alone [8].

Baum: A review of 26 trials that evaluate the efficacy of antibiotics in the prevention of wound infection following colon surgery [9].

Peto: A review of six trials that evaluate the efficacy of aspirin in the prevention of secondary mortality in persons recovered from myocardial infarction [10].

Chalmers: A review of a number of trials that evaluate the efficacy of anticoagulants in the treatment of acute myocardial infarction [11]. Data from 18 surveys employing historical controls (HCT), eight studies employing alternately assigned controls (ACT), and six randomized controlled trials (RCT) are given. Three endpoints, total case fatality rates, case fatality rates excluding early deaths, and thromboembolism rates are considered, although not all studies report all three endpoints. Here we consider thromboembolism and total case fatality rates in the RCTs only. The results from randomized trials are compared to those of nonrandomized ones (HCTs and ACTs) in Laird and DerSimonian [16].

METHODS

We consider the problem of combining information from a series of $k$ comparative clinical trials, where the data from each trial consist of the number of patients in treatment and control groups, $n_T$ and $n_C$, and the proportion of patients with some event in each of the groups, $r_T$ and $r_C$. Letting $i$ index the trials, we assume that the numbers of patients with the event in each of the study groups are independent binomial random variables with associated probabilities $p_{Ti}$ and $p_{Ci}$, $i = 1, \ldots, k$.

The basic idea of the random effects approach is to parcel out some measure of the observed treatment effect in each study, say $y_i$, into two additive components: the true treatment effect, $\theta_i$, and the sampling error, $e_i$.
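Written out, the additive decomposition in the preceding sentence is simply the following. This is a restatement in the paper's own symbols and adds nothing beyond what the text says:

```latex
y_i = \theta_i + e_i, \qquad i = 1, \ldots, k
```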
The variance of $e_i$ is the sample variance, $s_i^2$, and is usually calculated from the data of the $i$th observed sample. The true treatment effect associated with each trial will be influenced by several factors, including patient characteristics as well as design and execution of the study. To explicitly account for the variation in the true effects, the model assumes

$$\theta_i = \mu + \delta_i$$

where $\theta_i$ is the true treatment effect in the $i$th study, $\mu$ is the mean effect for a population of possible treatment evaluations, and $\delta_i$ is the deviation of the $i$th study's effect from the population mean. We regard the trials considered as a sample from this population and use the observed effects to estimate $\mu$ as well as the population variance [$\mathrm{var}(\delta_i) = \Delta^2$]. Here, $\Delta^2$ represents both the degree to which treatment effects vary across experiments as well as the degree to which individual studies give biased assessments of treatment effects.

The model just described can thus be characterized by two distinct sampling stages. First we sample a study from a population of possible studies with mean treatment effect $\mu$ and variance in treatment effects of $\Delta^2$. Then we sample observations in the $i$th study with underlying treatment effect $\theta_i$.

One issue which deserves some attention is the specification of treatment effect, $\theta_i$. Three commonly used measures are the risk difference, $p_{Ti} - p_{Ci}$, the relative risk, $p_{Ti}/p_{Ci}$, and the relative odds, $[p_{Ti}/(1 - p_{Ti})] / [p_{Ci}/(1 - p_{Ci})]$. The relative odds is popular because of its suitability in retrospective or case control studies, and because it has some interesting mathematical properties. In particular, if we assume a constant relative odds ($\theta_i = \mu$, or $\Delta^2 = 0$), then the Mantel-Haenszel statistic is optimal for testing $H_0: \mu = 1$, and there is considerable literature on efficient estimates of $\mu$ and on methods for testing $H_0: \theta_1 = \theta_2 = \cdots = \theta_k$.
Despite these advantages, the relative odds (and the closely related relative risk) suffers in interpretability. By far the most intuitively appealing measure for trials of clinical efficacy is the risk difference, since it measures actual gains which can be expected in terms of percentages of patients treated.

Besides relevance of the measure and statistical efficiency, it is also desirable to choose a measure which is nearly constant over studies, so that the effect of heterogeneity is minimized. Unless there is no treatment effect at all ($p_{Ti} = p_{Ci}$ for all $i$), constancy of treatment effect in one scale (say $p_{Ti}/p_{Ci} = A$ for all $i$) implies variation across studies in another ($p_{Ti} - p_{Ci}$, say). Thus it is conceivable that the wrong choice of scale could imply heterogeneity in treatment effects which would not exist if a different measure were used. However, this is not likely to happen in practice unless there is a very wide range in the control rates ($p_{Ci}$) or all the rates are very close to zero (or one). In such cases, one might want to do the analysis in both the relative odds and risk difference scales.

HOMOGENEITY OF TREATMENT EFFECT

To evaluate constancy of treatment effect across strata, we use a large sample test based on the statistic

$$Q = \sum_i w_i (y_i - \bar{y}_w)^2,$$

where $y_i$ is the $i$th treatment effect estimate, $\bar{y}_w = \sum_i w_i y_i / \sum_i w_i$ is the weighted estimator of treatment effect, and $w_i$ is the inverse of the $i$th sampling variance. The test statistic $Q$ is the sum of squares of the treatment effect about the mean where the $i$th square is weighted by the reciprocal of the estimated variance. Under the null hypothesis, $Q$ is approximately a $\chi^2$ statistic with $k - 1$ degrees of freedom; thus, when each study has a large sample size relative to the number of strata, $Q$ may be used to test $H_0: \Delta^2 = 0$.
When $y_i$ is a difference in proportions, $r_{Ti} - r_{Ci}$, we estimate the sampling variance in the $i$th study, $s_i^2$, by

$$s_i^2 = r_{Ti}(1 - r_{Ti})/n_{Ti} + r_{Ci}(1 - r_{Ci})/n_{Ci}, \qquad (1)$$

and use $Q_w = \sum_i w_i (y_i - \bar{y}_w)^2$ to test constancy of treatment effect. The weights in $Q$ may vary according to the assumptions made about the sampling variances. For instance, when the sampling variances can be assumed to be equal, then $w_i$, $i = 1, \ldots, k$, is the inverse of a common sampling variance $s^2$.

One review [9], which includes a qualitative assessment of homogeneity of treatment effect, uses the method of Gilbert et al. [15] to estimate the magnitude of the variation across the differences in proportions. Since the method of Gilbert et al. for estimating the variation in treatment effects assumes a common sampling variance, we calculate $Q_u$, the analogue of $Q_w$, assuming equal sampling variances. Here, the treatment effect is again the difference in proportions, but $w_i = \bar{s}^{-2}$, $i = 1, \ldots, k$, where $\bar{s}^2 = \sum_i s_i^2 / k$, and $s_i^2$ is defined in equation (1).

We also use the $Q$ statistic for testing homogeneity in the relative odds scale. In this scale,

$$Q_L = \sum_i w_i (y_i - \bar{y}_w)^2$$

where $y_i = \ln[r_{Ti}(1 - r_{Ci}) / \{r_{Ci}(1 - r_{Ti})\}]$, $\bar{y}_w = \sum_i w_i y_i / \sum_i w_i$, $w_i = s_i^{-2}$, and $s_i^2 = [n_{Ti} r_{Ti}(1 - r_{Ti})]^{-1} + [n_{Ci} r_{Ci}(1 - r_{Ci})]^{-1}$. In the large sample case, $Q_L$ is analogous to the goodness-of-fit test in logistic models [17]. An alternate test statistic for assessing homogeneity is the likelihood ratio test which is computationally more cumbersome than the $Q$ statistic used here [18].

ESTIMATION AND COMPUTATION

Most of the reviews consider the differences in proportions as a measure of treatment effect (Table 1). For estimating $\mu$ and $\Delta^2$ we also restrict our attention to this scale. When $\Delta^2 \neq 0$, $Q_w$ is used to derive a noniterative estimate of $\Delta^2$ by equating the sample statistic with the corresponding expected value.
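The moment-matching step can be made explicit. As a sketch, assuming the weights $w_i$ are treated as known constants and that $\mathrm{var}(y_i) = s_i^2 + \Delta^2$ with $w_i = s_i^{-2}$ as in the model above, the expected value of $Q_w$ is

```latex
E(Q_w) = (k - 1) + \Delta^2 \left( \sum_i w_i - \frac{\sum_i w_i^2}{\sum_i w_i} \right)
```

Setting the observed $Q_w$ equal to this expression gives a linear equation in $\Delta^2$; a negative solution is replaced by zero.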
This yields a weighted estimator

$$\hat{\Delta}_w^2 = \max\left\{0,\ \frac{Q_w - (k - 1)}{\sum_i w_i - \left(\sum_i w_i^2 / \sum_i w_i\right)}\right\},$$

where $Q_w$, $\bar{y}_w$, and $w_i$ are as described above. The weighted least squares or Cochran's [19] semiweighted estimator of $\mu$ is

$$\hat{\mu}_w = \sum_i w_i^* y_i / \sum_i w_i^*, \qquad (2)$$

where

$$w_i^* = (w_i^{-1} + \hat{\Delta}_w^2)^{-1}. \qquad (3)$$

The asymptotic standard error of $\hat{\mu}_w$ is

$$\mathrm{s.e.}(\hat{\mu}_w) = \left(\sum_i w_i^*\right)^{-1/2}. \qquad (4)$$

When the sampling variances are assumed to be equal, these equations reduce to:

$$\hat{\Delta}_u^2 = \max\left[0,\ \left\{\sum_i (y_i - \bar{y})^2 / (k - 1)\right\} - \bar{s}^2\right],$$

and

$$\mathrm{s.e.}(\hat{\mu}_u) = \left[(\bar{s}^2 + \hat{\Delta}_u^2)/k\right]^{1/2},$$

where $\bar{y} = \sum_i y_i / k$ and $\bar{s}^2 = \sum_i s_i^2 / k$.

Rao et al. [20] derive $\hat{\Delta}_u^2$ from an unweighted sum of squares procedure and show that it is also the MINQUE estimator when the sampling variances are all equal. The unweighted mean, $\hat{\mu}_u = \bar{y}$, is equivalent to the estimate of the treatment effect in reviews that use the average difference in proportions to assess the overall treatment efficacy.

With an additional assumption that $y_i$ is $N(\theta_i, s_i^2)$ and $\theta_i$ is $N(\mu, \Delta^2)$, we also compute maximum likelihood (ML) and restricted maximum likelihood (REML) estimates and compare them to the noniterative ones. The maximum likelihood estimates of the unknown parameters are those values that maximize the probability density function of the data. In REML estimation, the likelihood to be maximized is slightly modified to adjust for $\mu$ and $\Delta^2$ being estimated from the same data. The REML estimators are the iterative equivalents of the weighted estimators above. Both ML and REML estimates of $\mu$ and its s.e. take the form given in equations (2) and (4) with weights given in (3), but differ in the way $\Delta^2$ is estimated. The ML estimating equations are given in Rao et al. [20] and the REML equations are reviewed by Harville [21].
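The weighted noniterative procedure described above (equation (1) for the sampling variances, then the estimator of the between-study variance, then equations (2) to (4)) can be sketched in a few lines. This is a minimal illustration rather than the authors' own code: the function name `dersimonian_laird`, its argument names, and the counts in the usage example are all hypothetical, and the sketch assumes at least two trials with every observed proportion strictly between 0 and 1 so that no sampling variance is zero.

```python
def dersimonian_laird(events_t, n_t, events_c, n_c):
    """Noniterative weighted random effects estimate on the risk difference scale.

    Arguments are per-trial event counts and group sizes (hypothetical names).
    Returns (Q_w, estimated between-study variance, pooled effect, its s.e.).
    """
    k = len(n_t)
    y, w = [], []
    for et, nt, ec, nc in zip(events_t, n_t, events_c, n_c):
        r_t, r_c = et / nt, ec / nc
        # Equation (1): sampling variance of the difference in proportions.
        s2 = r_t * (1 - r_t) / nt + r_c * (1 - r_c) / nc
        y.append(r_t - r_c)
        w.append(1.0 / s2)

    sum_w = sum(w)
    y_bar_w = sum(wi * yi for wi, yi in zip(w, y)) / sum_w
    # Homogeneity statistic Q_w.
    q_w = sum(wi * (yi - y_bar_w) ** 2 for wi, yi in zip(w, y))

    # Moment estimator of the between-study variance, truncated at zero.
    denom = sum_w - sum(wi ** 2 for wi in w) / sum_w
    delta2 = max(0.0, (q_w - (k - 1)) / denom)

    # Equation (3): weights that add the between-study variance.
    w_star = [1.0 / (1.0 / wi + delta2) for wi in w]
    # Equations (2) and (4): pooled effect and its asymptotic standard error.
    mu = sum(ws * yi for ws, yi in zip(w_star, y)) / sum(w_star)
    se = sum(w_star) ** -0.5
    return q_w, delta2, mu, se


if __name__ == "__main__":
    # Made-up counts for two trials, for illustration only.
    q_w, delta2, mu, se = dersimonian_laird([10, 60], [100, 100],
                                            [20, 20], [100, 100])
    print(f"Q_w={q_w:.2f}  delta2={delta2:.4f}  mu={mu:.3f}  se={se:.3f}")
```

When the estimated between-study variance is zero the weights in equation (3) reduce to the fixed-effect weights $w_i$, so the sketch returns the ordinary inverse-variance weighted mean in that case.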
For implementing the ML or REML procedures, we use the EM algorithm [22] which is an iterative procedure for computing maximum likelihood estimates appropriate when the observations can be viewed as incomplete data.

RESULTS

Homogeneity of Treatment Effect

We present the statistics for testing homogeneity of treatment effect in Table 2. For these reviews $Q_w$, the weighted statistic in the difference scale, and $Q_L$, the analogous statistic in the log odds scale, imply similar conclusions about the constancy of treatment effect. The assumption of homogeneity holds in the reviews by DeSilva [7], Stjernsward [8], Peto [10], and in Chalmers' [11] randomized controlled trials (case fatality rates). In the remaining five sets of trials, the evidence suggests heterogeneity of treatment effect irrespective of the scale of measurement.

The review by Peto [10] mentions lack of heterogeneity in treatment effects across trials. For this review, the $Q_u$ statistic supports the homogeneity assumption (p value = 0.65), whereas both $Q_w$ and $Q_L$ support that assumption only marginally (p value = 0.12). Baum et al. [9] estimate the variability in treatment differences using the method of Gilbert et al. [15] and conclude that relative to within study variation (assumed equal for all studies), between study variation is negligible. This qualitative assessment is not consistent with the results of Table 2 where a common treatment effect across the trials in this review does not seem to hold in either scale of measurement. The third review which includes a test of homogeneity before pooling the results [6] uses a slightly modified version of $Q_w$ to test this hypothesis and the conclusion of lack of homogeneity agrees with the result in Table 2.

As in the review by Peto [10], $Q_u$ and $Q_w$ imply different conclusions about the homogeneity of treatment effect in the review by Winship [4].
In this review also, the method assuming equal weights ($Q_u$) implies homogeneity of effect while the weighted one implies the opposite.

These results emphasize that the variation in the treatment effect across several trials is often not negligible and should be incorporated into the analysis of the overall treatment efficacy. Lack of homogeneity holds both when the treatment effect is the difference in proportions and when it is the log odds. The unweighted statistic which assigns an equal weight to each study may not be appropriate for testing homogeneity when differences in sample sizes and/or underlying proportions across studies are large.

Table 2. Test of Homogeneity^a

Review                    df^c    Q_w^d     Q_L^e     Q_u^f
Winship                    7      15.2^b    15.6^b     7.9
Conn                       8      15.6^b    20.6^b    19.3^b
Miao                       5      21.7^b    18.8^b    20.9^b
DeSilva                    5       9.1       4.4       7.3
Stjernsward                4       2.1       2.2       2.7
Baum                      25      40.4^b    35.2^b    35.3^b
Peto                       5       9.0       9.2       3.5
Chalmers
  Thromboembolism          5      12.3^b    10.3^b    10.6^b
  Case fatality rates      5       3.5       2.4       1.8

^a Figures in Tables 2-4 are based on data available at the time of review publication.
^b p value < 0.10.
^c Degrees of freedom.
^d Q statistic in difference scale (unequal weights).
^e Q statistic in log odds scale (unequal weights).
^f Q statistic in difference scale (equal weights).

Estimation

For all four methods of estimation we present estimates of $\mu$ and its s.e. in Table 3, and estimates of $\Delta^2$ in Table 4. The estimates of $\Delta^2$, $\mu$, and s.e.$(\hat{\mu})$ are quite similar in the weighted noniterative method, maximum likelihood, and restricted maximum likelihood procedures. The estimates of $\Delta^2$ from these three methods are zero or nearly so in the reviews by DeSilva [7], Stjernsward [8], Peto [10], and Chalmers' [11] randomized trials (case fatality rates). These same reviews have $Q$ statistics that are small relative to their degrees of freedom (Table 2). The weighted method and the REML estimation procedures consistently yield slightly higher values of $\Delta^2$ than the ML procedure.
This is because both these procedures adjust for $\mu$ and $\Delta^2$ being estimated from the same data whereas the ML procedure does not. The estimates of $\mu$ and its s.e. from these three procedures are expected to be similar since the estimates of $\Delta^2$ are almost equal.

Table 3. Estimated Overall Effects and Their Standard Errors^a

Review                    mu_u^b           mu_w^c           mu_M^d           mu_R^e
Winship                   0.406 (0.046)    0.389 (0.058)    0.384 (0.053)    0.387 (0.056)
Conn                      0.102 (0.092)    0.075 (0.072)    0.070 (0.063)    0.073 (0.069)
Miao                      0.077 (0.125)    0.094 (0.111)    0.095 (0.106)    0.093 (0.118)
DeSilva                   0.026 (0.019)    0.027 (0.019)    0.026 (0.017)    0.027 (0.019)
Stjernsward               0.046 (0.020)    0.041 (0.018)    0.041 (0.018)    0.041 (0.018)
Baum                      0.203 (0.031)    0.208 (0.026)    0.208 (0.025)    0.208 (0.026)
Peto                      0.018 (0.008)    0.015 (0.008)    0.014 (0.008)    0.015 (0.008)
Chalmers
  Thromboembolism         0.102 (0.036)    0.079 (0.020)    0.078 (0.017)    0.078 (0.020)
  Case fatality rates     0.042 (0.024)    0.029 (0.012)    0.029 (0.012)    0.029 (0.012)

^a Figures in parentheses represent the standard errors of the corresponding estimates.
^b Noniterative estimates with equal weights.
^c Noniterative estimates with weights to reflect unequal variances.
^d Maximum likelihood estimates.
^e Restricted maximum likelihood estimates.

Comparing the unweighted method of moments with the other three methods, we find that the estimates for $\Delta^2$ from this method differ, and sometimes differ widely, from the estimates of the other three methods but without any consistent pattern. The estimates of $\mu$ and its s.e. from the unweighted method also differ from the estimates of the other three methods and these differences are not necessarily due to the differences in the estimates of $\Delta^2$. In Chalmers' [11] randomized trials (case fatality rates), for instance, even when $\Delta^2$ is zero for all four methods, the estimate of $\mu$ is 0.042 (s.e. = 0.024) for the unweighted method while it is 0.029 (s.e. = 0.012) for the other three methods. The original reviewers report the unweighted average of the observed rate differences
(0.042) as an estimate of overall treatment efficacy. The weighted estimate of the treatment effect which weights the observed effects in relation to sample size is lower than the unweighted average, since some of the larger studies have smaller estimated treatment effects.

Table 4. Estimated Variation in the True Effects

Review                    Delta_u^2 (a)    Delta_w^2 (b)    Delta_M^2 (c)    Delta_R^2 (d)
Winship                   0.0020           0.0137           0.0096           0.0117
Conn                      0.0442           0.0208           0.0112           0.0176
Miao                      0.0716           0.0540           0.0482           0.0638
DeSilva                   0.0007           0.0009           0.0006           0.0009
Stjernsward               0                0                0                0
Baum                      0.0072           0.0062           0.0049           0.0057
Peto                      0                0.0002           0.0002           0.0002
Chalmers
  Thromboembolism         0.0041           0.0012           0.0007           0.0012
  Case fatality rates     0                0                0                0

(a) Noniterative estimates with equal weights.
(b) Noniterative estimates with weights to reflect unequal variances.
(c) Maximum likelihood estimates.
(d) Restricted maximum likelihood estimates.

DISCUSSION

We have used a simple random effects model for combining evidence, and applied it to characterize the distribution of treatment effects in a series of studies. The model is useful both in summarizing the data and in illustrating the different kinds of results which one obtains from randomized and nonrandomized studies. In general, studies with greater potential for bias, such as uncontrolled or nonrandomized ones, show greater treatment effect as well as greater heterogeneity [2,16].

One important finding that emerges from this investigation is that heterogeneity of treatment effects across studies is common and should be incorporated into the analysis. The random effects model incorporates this heterogeneity, however small, in the analysis of the overall efficacy of the treatment. The method estimates the magnitude of the heterogeneity, and assigns a greater variability to the estimate of overall treatment effect to account for this heterogeneity.

In principle, we can extend the model to include pertinent covariate information [2].
Utilizing covariate information may substantially reduce the heterogeneity of effects and thus allow for more specific therapeutic recommendations. This is often difficult in practice, however, since covariate information may be missing for some studies. Improvement in publication standards for medical reporting and further methodological work for handling missing covariate information are needed to strengthen our ability to combine results from clinical studies.

For estimating the overall treatment effect and the variation of effects across studies, our results suggest that the weighted noniterative method is an attractive procedure because of the comparability of its estimates with those of the maximum likelihood methods and because of its relative simplicity. On the other hand, the unweighted method which ignores differences in sample sizes yields estimates that often differ from the estimates of the other methods.

A problem in pooling data we have not addressed here is that of publication bias. This problem relates to studies being executed, but not reported, usually because treatment effect has not been found. Reviewers generally recount those studies that appear to be worthwhile and discount those that are unpublished or are not in agreement with a favored group of studies. The method we describe here represents a systematic, quantitative pooling of available data to resolve controversies about a treatment effect. With each individual controversy, unpublished information may be elicited and along with recent findings the method can be used to update the results.

In all our work we assume that the sampling variances are known, although in reality we estimate them from the data.
Further research needs to be done in this area as there are alternative estimators that might be preferable to the ones we use. For instance, if the sample sizes in each study are small, then sampling variances based on pooled estimates of the proportions in the treatment and control groups might be better than the ones based on estimates of proportions from the individual studies. Another alternative is to shrink the individual proportions towards a pooled estimate before calculating the variances. Further investigation is needed before one single method emerges as superior.

REFERENCES

1. Armitage P: Controversies and achievements in clinical trials. Controlled Clin Trials 5: 67-72, 1984
2. DerSimonian R, Laird N: Evaluating the effect of coaching on SAT Scores: a meta-analysis. Harvard Ed Rev 53: 1-15, 1983
3. Halvorsen K: Combining results from independent investigations: meta-analysis in medical research. In: Medical Uses of Statistics, Bailar JC, Mosteller F, Eds. Boston: New England Journal of Medicine (in press)
4. Winship D: Cimetidine in the treatment of duodenal ulcer. Gastroenterology 74: 402-406, 1978
5. Conn H: Steroid treatment of alcoholic hepatitis. Gastroenterology 74: 319-326, 1978
6. Miao L: Gastric freezing: an example of the evaluation of medical therapy by randomized clinical trials. In: Costs, Risks, and Benefits of Surgery, Bunker JP, Barnes BA, Mosteller F, Eds. New York: Oxford University Press, 1977, pp. 198-211
7. DeSilva RA, Hennekens CH, Lown B, Casscells W: Lignocaine prophylaxis in acute myocardial infarction: An evaluation of randomized trials. Lancet ii: 855-858, 1981
8. Stjernsward J: Decreased survival related to irradiation post-operatively in early operable breast cancer. Lancet ii: 1285-1286, 1974
9.
Baum ML, Anish DS, Chalmers TC, Sacks HS, Smith H, Fagerstrom RM: A survey of clinical trials of antibiotic prophylaxis in colon surgery: evidence against further use of no-treatment controls. N Engl J Med 305: 795-799, 1981
10. Peto R: Aspirin after myocardial infarction. Lancet i: 1172-1173, 1980 (unsigned editorial)
11. Chalmers TC, Matta RJ, Smith H, Kunzler AM: Evidence favoring the use of anticoagulants in the hospital phase of acute myocardial infarction. N Engl J Med 297: 1091-1096, 1977
12. Stampfer MJ, Goldhaber SZ, Yusuf S, Peto R, Hennekens CH: Effect of intravenous streptokinase on acute myocardial infarction: Pooled results from randomized trials. N Engl J Med 307: 1180-1182, 1982
13. Long-term and short-term beta-blockade after myocardial infarction. Lancet i: 1159-1161, 1982
14. Wortman PM, Yeaton WH: Synthesis of results in controlled trials of coronary artery bypass graft surgery. Evaluation Studies Review Annual 1983
15. Gilbert JP, McPeek B, Mosteller F: Progress in surgery and anesthesia: benefits and risks of innovative therapy. In: Costs, Risks, and Benefits of Surgery, Bunker JP, Barnes BA, Mosteller F, Eds. New York: Oxford University Press, 1977, pp. 124-169
16. Laird N, DerSimonian R: Issues in combining evidence from several comparative trials of clinical therapy. In: Proceedings of the XIth International Biometric Conference. 1982, pp. 91-97
17. Breslow NE, Day NE: Statistical methods in cancer research. International Agency for Research on Cancer, 1980, pp. 136-146
18. Hedges LV, Olkin I: Statistical methods for meta-analysis. London: Academic Press, 1985, pp. 122-127
19. Cochran WG: Adjustments in analysis. In: Planning and Analysis of Observational Studies, Moses LE, Mosteller F, Eds. New York: Wiley, 1983, pp. 102-108
20. Rao PS, Kaplan J, Cochran WG: Estimators for the one-way random effects model with unequal error variances. J Am Stat Assoc 76: 89-97, 1981
21.
Harville DA: Maximum likelihood approaches to variance component estimation and to related problems. J Am Stat Assoc 72: 320-338, 1977
22. Dempster AP, Laird NM, Rubin DB: Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc B 39: 1-38, 1977