PUBLICATION BIAS IN PSYCHOLOGY

Meta-Analyses in Psychology Often Overestimate Evidence for and Size of Effects

František Bartoš¹*, Maximilian Maier²*, David R. Shanks², T. D. Stanley³,⁴, Martina Sladekova⁵, Eric-Jan Wagenmakers¹
* Both authors contributed equally
¹ Department of Psychological Methods, University of Amsterdam
² Department of Experimental Psychology, University College London
³ Deakin Laboratory for the Meta-Analysis of Research (DeLMAR), Deakin University
⁴ Department of Economics, School of Business, Deakin University
⁵ School of Psychology, University of Sussex

Author Note
Correspondence concerning this article should be addressed to František Bartoš, Department of Psychological Methods, University of Amsterdam. E-mail: f.bartos96@gmail.com

Abstract

Adjusting for publication bias is essential when drawing meta-analytic inferences. However, most methods that adjust for publication bias are sensitive to the particular research conditions, such as the degree of heterogeneity in effect sizes across studies. Sladekova et al. (2022) tried to circumvent this complication by selecting the methods that are most appropriate for a given set of conditions, and concluded that publication bias on average causes only minimal over-estimation of effect sizes in psychology. However, this approach suffers from a "catch-22" problem: to know the underlying research conditions, one needs to have adjusted for publication bias correctly, but to correctly adjust for publication bias, one needs to know the underlying research conditions. To alleviate this problem we conduct an alternative analysis, robust Bayesian meta-analysis (RoBMA), which is based not on model-selection but on model-averaging. In RoBMA, models that predict the observed results better are given correspondingly higher weights.
A RoBMA reanalysis of Sladekova et al.'s data reveals that more than 60% of meta-analyses in psychology notably overestimate the evidence for the presence of the meta-analytic effect and more than 50% overestimate its magnitude. Our results highlight the need for robust bias correction when conducting meta-analyses and for the adoption of publishing formats such as Registered Reports that are less prone to publication bias.

Keywords: publication bias, RoBMA, model-selection, model-averaging, Bayesian inference

Meta-Analyses in Psychology Often Overestimate Evidence for and Size of Effects

Meta-analysis is widely regarded as the best way to combine and summarize seemingly conflicting evidence across a set of primary studies. However, publication bias (the preferential publishing of statistically significant studies) often causes meta-analyses to overestimate mean effect sizes (Borenstein et al., 2009; Rosenthal & Gaito, 1964; Vevea & Hedges, 1995). Therefore, a key question concerns the extent to which meta-analytic estimates represent reliable indicators even when publication bias is left unaccounted for. To address this question, Sladekova, Webb, and Field (2022; henceforth SWF) compiled an extensive data set of 433 meta-analyses from the field of psychology and assessed the typical overestimation of effect sizes using methodologically advanced techniques and a model-selection procedure recently developed by Carter et al. (2019). SWF concluded that, on average, effect size estimates were only marginally lower after accounting for publication bias. The most aggressive average adjustment was provided by precision effect test (PET) models, ∆r = −0.032, 95% CI [−0.055, −0.009]; moreover, meta-analyses comprised of few studies often exhibited an anomalous upward adjustment.
In their analyses, SWF specified four plausible data-generating processes and selected the best estimator for each based on the findings of a simulation study by Carter et al. (2019). As different publication bias adjustment methods are generally found to perform well under different conditions, Carter et al. (2019) provided code that allows researchers to select the most suitable publication bias correction method based on specific assumed research conditions, such as the true degree of heterogeneity in the effect sizes included in the meta-analysis. Whereas this approach presents a substantial improvement over the common practice of applying bias correction methods with little regard to the observed meta-analytic conditions, we believe that in empirical practice it is nigh impossible to execute as intended. The reason is that critical research characteristics (i.e., the true effect size, heterogeneity, and degree and type of publication bias) cannot be accurately estimated unless one first adjusts for publication bias. Alternatively, specifying multiple conditions might result in different estimates, leaving the analyst with incompatible conclusions. Therefore, the approach by Carter et al. (2019), as employed by SWF, creates a catch-22 problem (Heller, 1961): to correctly adjust for publication bias, one needs to know the underlying research conditions; however, in order to know the underlying research conditions, one needs to have adjusted correctly for publication bias.¹ A second challenge for the "select-the-best-estimator" approach is that the Carter et al. (2019) simulation is based on specific assumptions about the data generating process. As with all simulations, the question is how well the data generating process actually corresponds to publication bias as it operates in the real world (Stanley et al., 2021).
In their discussion, SWF point out that an alternative solution is provided by Bayesian model-averaging (Hinne et al., 2020; Hoeting et al., 1999). Bayesian model-averaging (e.g., robust Bayesian meta-analysis or RoBMA; Bartoš et al., in press; Bartoš, Maier, Wagenmakers, et al., 2021; Maier et al., 2022) simultaneously considers an entire ensemble of models for publication selection and potential research conditions. The data then guide the inference to be based most strongly on those models that best predict the observed research results. In this way the Bayesian model-averaging of publication bias models alleviates the catch-22 problem outlined above. SWF discuss how RoBMA would be a good alternative approach; here, we follow SWF's suggestion and re-analyse their dataset with RoBMA. To preview, a very different (and, we argue, more credible) conclusion emerges from this re-analysis. A third challenge for the "select-the-best-estimator" approach is that investigations based on empirical data show that the specific correction methods employed by SWF do not adjust for publication bias sufficiently.

Footnote 1: The type and degree of publication bias as well as the true effect size are generally unknown, a problem which SWF sidestepped by calculating four possible models of publication bias and effect size. Moreover, the random-effects heterogeneity estimates that are required to select the best method depend on the degree and type of publication bias (Augusteijn et al., 2019; Hönekopp & Linden, 2022).

In particular, Kvarven et al. (2020) compared estimates from publication bias-adjusted meta-analyses to Registered Replication Reports on the same topic (Chambers, 2013; Chambers et al., 2015). Registered Reports are a publication format in which a submitted manuscript receives peer review and "in principle" acceptance based on the introduction and methods section alone.
Hence the journal commits itself to publishing the report independent of the outcome, as long as the data pass pre-specified outcome-neutral quality checks. Therefore, Registered Reports are not affected by publication bias and can be considered the "gold standard" of evidence. Consequently, a publication bias adjustment method that works well ought to produce an effect size estimate that is similar to the one from a Registered Report on the same topic. By comparing Registered Reports to associated meta-analyses, Kvarven et al. (2020) showed that the publication bias correction methods employed in SWF lead to substantial overestimation of effect size and under-estimation of the required correction (but see Lewis et al. (2020) for a criticism of this approach, which argues that the difference might partly be explained by genuine effect heterogeneity rather than publication bias). In contrast, Bartoš, Maier, Wagenmakers, et al. (2021) demonstrated that RoBMA results in estimates that are less biased and have considerably lower root mean square errors. Finally, in their work SWF focus solely on the impact of publication bias adjustment on meta-analytic effect size. In practice, researchers also wish to know whether there is a genuine effect in the first place (Jeffreys, 1961; Jeffreys, 1973). A Bayesian analysis allows us to quantify the evidence for a non-null effect and assess its posterior probability, while circumventing problems of frequentist significance testing (e.g., Wagenmakers, 2007; Wagenmakers et al., 2016). In sum, by applying multiple models to the data simultaneously, RoBMA avoids the catch-22 problem that plagues the "select-the-best-estimator" approach. Moreover, RoBMA does not underadjust for publication bias (Bartoš, Maier, Wagenmakers, et al., 2021), and offers a Bayesian way to quantify how much publication bias inflates the evidence for the presence of an overall effect.
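The model-averaging logic described above can be illustrated numerically. The following Python sketch is our own illustration (the model names, marginal likelihoods, and effect size estimates are hypothetical, not RoBMA output): posterior model probabilities follow from prior probabilities and marginal likelihoods, and both a model-averaged effect size and the posterior probability of a non-null effect fall out directly.

```python
# Sketch of Bayesian model-averaging (our own illustration; the four models,
# their marginal likelihoods, and estimates below are hypothetical).

# (model name, prior model probability, marginal likelihood, effect size estimate)
models = [
    ("null effect, no bias", 0.25, 0.8, 0.00),
    ("effect, no bias",      0.25, 1.2, 0.20),
    ("null effect, bias",    0.25, 2.0, 0.00),
    ("effect, bias",         0.25, 1.5, 0.08),
]

# Posterior model probability is proportional to prior probability times
# marginal likelihood (how well the model predicted the observed data).
weights = [prior * ml for _, prior, ml, _ in models]
posterior = [w / sum(weights) for w in weights]

# Model-averaged effect size: per-model estimates weighted by posterior probability.
averaged_effect = sum(p * est for p, (_, _, _, est) in zip(posterior, models))

# Posterior probability of a non-null effect: total mass on "effect" models.
p_effect = sum(p for p, (name, _, _, _) in zip(posterior, models)
               if name.startswith("effect"))
```

Models that predict the data better receive more weight, so no single model needs to be selected in advance; this is the feature that sidesteps the catch-22 problem.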
In the next sections, we apply RoBMA to the meta-analysis dataset compiled by SWF. The RoBMA re-analysis shows that many meta-analyses suffer from publication bias in the sense that both the effect size and the evidence for the presence of the effect are substantially overestimated (52.7% and 60.8%, respectively).

Method

The RoBMA Model Ensemble

Here we describe how we employed the robust Bayesian model-averaging methodology. The remaining publication bias adjustment methods used in SWF are explained in more detail therein and in Carter et al. (2019). The complete RoBMA-PSMA model ensemble (as implemented in Bartoš, Maier, Wagenmakers, et al., 2021; referred to below simply as RoBMA) employs models that can be categorized along three research dimensions: presence vs. absence of the effect, heterogeneity across reported effects, and publication selection bias. Each of these hypotheses is assigned a prior model probability of 1/2, reflecting a position of equipoise. The individual models specified within the RoBMA ensemble then represent a combination of these research characteristics, with prior model probabilities corresponding to the product of the prior probabilities of each corresponding hypothesis. For models representing the presence of publication bias, the prior model probability is first split equally between the selection models as a group and PET-PEESE, and then further split equally among the different selection models or between PET and PEESE. The complete RoBMA-PSMA ensemble consists of 36 different models. The hypothesis about the absence of the effect is represented by a point prior distribution on the effect size at 0, µ = 0, and the hypothesis about the presence of the effect is represented by a standard normal prior distribution on the Cohen's d effect size, µ ∼ Normal(0, 1), representing the most common range of effect sizes in psychology.
The hypothesis about the absence of heterogeneity is represented by a point prior distribution on the heterogeneity at 0, τ = 0, and the hypothesis about the presence of heterogeneity is represented by an inverse-gamma distribution, τ ∼ Inverse-Gamma(1, 0.15) (with shape and scale parameterization; corresponding to Cohen's d effect sizes), based on empirical heterogeneity estimates from the field of psychology (van Erp et al., 2017). The hypothesis about the absence of publication bias is instantiated by not applying any publication bias corrections, and the hypothesis about the presence of publication bias is instantiated by applying a set of six weight functions (Larose & Dey, 1998; Maier et al., 2021; Vevea & Hedges, 1995) and both the PET and PEESE models (Stanley et al., 2017) to adjust for publication bias. The weight functions are specified as combinations of cutoffs at significant and marginally significant p-values and the direction of the effect. The cumulative unit Dirichlet prior distributions enforce a decreasing relative prior probability with increasing p-values, which further improves the performance of the selection models. The PET and PEESE models are specified as meta-regressions of the effect sizes on the standard errors or squared standard errors, with truncated Cauchy distributions on the PET and PEESE regression coefficients, PET ∼ Cauchy+(0, 1), PEESE ∼ Cauchy+(0, 5), which enforce a positive relationship between standard errors and effect sizes. More details on the RoBMA specification are presented in Bartoš, Maier, Wagenmakers, et al. (2021). The performance of RoBMA has been evaluated extensively in simulation studies as well as empirical comparisons. In particular, Bartoš, Maier, Wagenmakers, et al.
(2021) reanalysed a large simulation study by Hong and Reed (n.d.), which itself combined four different previous simulation environments comprising 1,640 separate experimental conditions (Alinaghi & Reed, 2018; Bom & Rachinger, 2019; Carter et al., 2019; Stanley et al., 2017). In these simulations, RoBMA outperformed other methods for publication bias correction in terms of bias and root mean squared error. RoBMA was also evaluated empirically by comparing meta-analyses that are linked to Registered Replication Reports in Kvarven et al. (2020). As discussed above, comparing meta-analysis bias corrections to a "ground truth" as revealed by Registered Reports allows us to evaluate whether a given correction sufficiently adjusts for likely publication bias. In the Kvarven et al. (2020) comparison of meta-analyses and Registered Reports, RoBMA was shown to provide the best adjustment for publication bias when evaluated by average bias and/or root mean square error by Bartoš, Maier, Wagenmakers, et al. (2021). Nonetheless, RoBMA and Bayesian model-averaging are only as good as the models incorporated in the ensemble. Since none of the meta-analytic models employed in RoBMA directly adjusts for p-hacking, RoBMA can exhibit downward bias in cases with strong p-hacking (Bartoš, Maier, Wagenmakers, et al., 2021).

Effect Size Transformation

In contrast to SWF, we analyzed the effect sizes using the Fisher z scale (and subsequently transformed the meta-analytic estimates back to the correlation scale for interpretation). We prefer the Fisher z scale for two reasons.
First, it is unbounded (i.e., not restricted to the (−1, 1) interval) and its sampling distribution is approximately normal, which corresponds to the likelihoods used by meta-analytic models (this also prevents adjusted meta-analytic correlation estimates from falling outside of (−1, 1), which would be anomalous).² Second, the Fisher z score and its standard error are by definition orthogonal, which is an important assumption for models that adjust for the relationship between effect sizes and standard errors, as in PET-PEESE (this was not an issue in SWF, as they used standard errors of Fisher's z alongside the correlation effect sizes). The use of the Fisher z scale results in slight differences in (a) the selected methods for each condition (as the reduced range of correlation effect sizes limits the possible heterogeneity), and (b) the effect size estimates of those selected methods. These differences, however, do not change the qualitative conclusions.

Footnote 2: Use of the Fisher z transformation did necessitate the removal of 51 reported correlation coefficients equal to 1 from 4 meta-analyses.

Effects of Publication Bias on Evidence and on Effect Size

We extended the SWF results by first assessing the extent to which publication bias inflates the evidence for the presence of an effect. Then, similarly to SWF, we also evaluated and compared the effect of publication bias on the meta-analytic estimates of effect size. To evaluate the change in evidence for the presence of the effect, we compared the posterior probability for the presence of the effect under RoBMA to the posterior probability for the presence of the effect under RoBMA after excluding the models that adjust for publication bias. The publication bias unadjusted version of RoBMA corresponds to the Bayesian model-averaged meta-analysis (BMA; e.g., Bartoš, Gronau, et al., 2021; Gronau & Wagenmakers, 2018; Gronau et al., 2017).
For both RoBMA and BMA, the prior model probability for the presence of the effect is set to 1/2. Furthermore, we summarize the results as the change in the percentage of meta-analyses that provide at least moderate or strong evidence for either the null or the alternative hypothesis, based on the "rule of thumb" Bayes factor categories that have been proposed to facilitate the interpretation of Bayes factors (i.e., BF > 3 is moderate evidence and BF > 10 is strong evidence; Jeffreys, 1939; Lee & Wagenmakers, 2013). To evaluate the change in the meta-analytic estimate of effect size, we compared the model-averaged posterior mean obtained from RoBMA to effect size estimates from two meta-analytic methods that do not adjust for publication bias. The first comparison is to a random effects meta-analysis (reMA), which is regarded as the default meta-analytic method in behavioral research. The comparison of reMA and RoBMA estimates therefore quantifies the reduction in effect size that obtains when researchers use RoBMA instead of the standard methodology. The second comparison is to a different version of Bayesian model-averaged meta-analysis (BMA; Bartoš, Gronau, et al., 2021; Gronau et al., 2021; Gronau et al., 2017) that is identical to RoBMA apart from the fact that BMA lacks the models that adjust for publication bias; consequently, the comparison of BMA and RoBMA estimates quantifies the reduction in effect size that can be attributed solely to publication bias adjustment. Finally, we compare the effect size adjustments due to RoBMA against the adjustments due to the methods presented by SWF. We employ the same Bayesian hierarchical models as SWF to estimate the mean publication bias adjustment, separately for SWF's model selection and for RoBMA. When analyzing results from SWF's model selection, we combined 3PSM and 4PSM into a single category (PSM) and only estimated fixed effects for methods that were selected at least 20 times.
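The evidence summaries just described can be made concrete with a small sketch (our own illustration, not code from the analysis): with a prior probability of 1/2 for the presence of the effect, the posterior probability follows directly from the Bayes factor, and the "rule of thumb" categories partition the Bayes factor scale.

```python
# Sketch (our own illustration): converting a Bayes factor BF10 into a
# posterior probability for H1, and into the evidence categories used in the text.

def posterior_prob_h1(bf10, prior_h1=0.5):
    # With prior probability 1/2 for H1, prior odds are 1, so posterior
    # odds equal BF10; convert odds back to a probability.
    prior_odds = prior_h1 / (1 - prior_h1)
    post_odds = bf10 * prior_odds
    return post_odds / (1 + post_odds)

def evidence_category(bf10):
    # "Rule of thumb" thresholds (Jeffreys, 1939; Lee & Wagenmakers, 2013):
    # BF > 3 moderate, BF > 10 strong; reciprocals for the null hypothesis.
    if bf10 > 10:
        return "strong H1"
    if bf10 > 3:
        return "moderate H1"
    if bf10 < 1 / 10:
        return "strong H0"
    if bf10 < 1 / 3:
        return "moderate H0"
    return "undecided"
```

For example, BF10 = 3 with equal prior odds corresponds to a posterior probability of 0.75 for the presence of the effect.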
We performed the analysis in R (R Core Team, 2021) using the RoBMA R package (Bartoš & Maier, 2020) and additional R functions adopted from SWF and Carter et al. (2019). The analysis scripts and results are available at https://osf.io/7yzut/.

Results

Evidence for the Presence of the Effect

First, we used RoBMA to evaluate the inflation of the posterior probability for the presence of the effect. Figure 1 shows the estimates for the presence of an effect before (x-axis) and after (y-axis) the publication bias adjustment. The dotted diagonal line highlights the points of no change in the posterior probability of the alternative hypothesis due to publication bias. For many meta-analyses the evidence for the presence of an effect is considerably lower after adjusting for publication bias, which is further exemplified by the marginal densities of the posterior probabilities on the right and top sides of the figure. Across all meta-analyses, the median posterior probability drops from 0.97, interquartile range (IQR; 0.44, 1.00), to 0.53, IQR (0.26, 0.91), indicating considerable inflation of evidence due to publication bias. Nevertheless, for 39.2% of the meta-analyses the posterior probability for the presence of the effect did not change by more than 0.05, indicating that a notable proportion of psychology meta-analyses are relatively robust to publication bias.

Figure 1. Posterior probability for the presence of the effect from the publication bias adjusted vs. unadjusted models. [Scatter plot of adjusted P(H1 | Data), y-axis, against unadjusted P(H1 | Data), x-axis.] Note. Some meta-analyses show more evidence for an effect after publication bias adjustment. This anomaly occurs when the correction methodology adjusts a nearly zero effect size (showing evidence for the null hypothesis) further down to be slightly negative. This increases the evidence for an effect, but in the opposite direction compared to the estimate of the original meta-analysis.

Furthermore, the percentage of meta-analyses providing strong or at least moderate evidence for the alternative hypothesis (i.e., BF10 > 10 and BF10 > 3) decreased from 55.7% to 24.9% and from 64.3% to 36.9%, respectively. Interestingly, the proportion of meta-analyses providing strong or at least moderate evidence for the null hypothesis (i.e., BF10 < 1/10 and BF10 < 1/3) increased only marginally, from 4.7% to 5.2% and from 18.5% to 23.9%, respectively.³ Most of the change in evidence was due to the increase in the "undecided" evidence category (i.e., 1/3 < BF10 < 3), from 17.2% to 39.2%.

Effect Size Estimates

In addition to the impact on the posterior probability for the presence of the effect, we can also quantify the degree to which publication bias impacts the effect size estimates. Figure 2 and Figure 3 show the impact of adjusting for publication bias on the meta-analytic estimates. The dotted diagonal lines in Figure 2 highlight the points of no change in the effect size estimates due to publication bias. After adjusting for publication bias, many estimates are considerably lower. Specifically, the publication bias unadjusted meta-analytic effect sizes corresponded mostly to small to medium sized effects based on random effects meta-analyses, r = 0.17, IQR (0.09, 0.30), and BMA, r = 0.15, IQR (0.04, 0.28). However, the publication bias adjustment provided by RoBMA reduced the estimates to predominantly small sized effects (i.e., r = 0.07, IQR (0.01, 0.22)). Whereas the distributions of the publication bias unadjusted and adjusted effect size estimates were notably different, the distribution of differences between the estimates was highly skewed, with many meta-analyses undergoing only small publication bias adjustments (right panel of Figure 3).
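Because the estimates above are reported on the correlation scale after being analyzed on the Fisher z scale (see the Effect Size Transformation section), the transformation and its back-transformation can be sketched as follows (our own illustration; the helper names are ours):

```python
import math

def fisher_z(r):
    # Fisher z transformation of a correlation: unbounded, with an
    # approximately normal sampling distribution.
    return math.atanh(r)

def fisher_z_se(n):
    # The standard error of Fisher z depends only on sample size, not on r,
    # so the effect size and its standard error are orthogonal by definition.
    return 1.0 / math.sqrt(n - 3)

def z_to_r(z):
    # Back-transform a meta-analytic estimate to the correlation scale;
    # the result always lies inside (-1, 1).
    return math.tanh(z)
```

This orthogonality is what makes the scale suitable for PET-PEESE-style models, which regress effect sizes on their standard errors.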
The median adjustment from random effects meta-analyses to RoBMA was r = −0.07, IQR (−0.12, −0.04), and the median adjustment from BMA to RoBMA was r = −0.04, IQR (−0.08, −0.01). Interestingly, the comparison of BMA and RoBMA, quantifying the adjustment attributable only to the publication bias adjustment part, revealed that 47.3% of meta-analytic effect size estimates are adjusted by less than r = 0.03,⁴ again indicating that not all meta-analytic estimates are distorted by publication bias. Van Aert et al. (2019) argue that meta-analyses with low heterogeneity show little evidence of publication bias. To assess the impact of heterogeneity, we conducted an exploratory regression analysis predicting the effect size adjustment attributable to publication bias from heterogeneity, with the unadjusted effect size estimate as a covariate (to account for the fact that meta-analyses with larger effect sizes on average show larger absolute bias and larger τ). We found that publication bias and heterogeneity were indeed associated, BF10 = 8.96 × 10⁸, b = −0.20, 95% CI [−0.26, −0.15]. Contrary to the conclusions of van Aert et al. (2019), we obtain moderate evidence in favor of effect size overestimation even amongst homogeneous studies (i.e., no heterogeneity, tested via the coefficient for the intercept), BF10 = 8.66, Fisher's z = −0.02, 95% CI [−0.03, 0.00]. However, the effect size adjustment in homogeneous meta-analyses is much smaller than the average effect size adjustment across all meta-analyses (Fisher's z = −0.06, 95% CI [−0.07, −0.05]).⁵

Figure 2. Effect size estimates from the publication bias adjusted vs. unadjusted models. [Scatter plots of RoBMA effect size (correlations, y-axis) against reMA effect size (left panel) and BMA effect size (right panel).] Note. Model-averaged posterior mean effect size estimates based on RoBMA (y-axis) vs. mean effect size estimates based on random effects meta-analysis (x-axis, left panel) and model-averaged posterior mean effect size estimates based on BMA (x-axis, right panel). One outlier adjusted to −0.46 (from 0.15 with BMA and 0.16 with reMA) is omitted from the display.

Figure 3. Comparison of densities of the effect size estimates from the publication bias adjusted vs. unadjusted models. Note. Densities of meta-analytic estimates under each method (left) and densities of differences between the unadjusted and adjusted estimates (right). One outlier adjusted to −0.46 is omitted from the display.

Footnote 3: This can be partly explained by the difficulty of finding evidence for the null using priors centered on zero, as in RoBMA (and most other applications of Bayesian testing). We still chose to use these priors, as they have other desirable properties and they have been evaluated extensively in applied examples and simulation studies (Bartoš, Maier, Wagenmakers, et al., 2021).

Footnote 4: Other possible thresholds would result in 24.1% with r = 0.01, 37.7% with r = 0.02, 54.4% with r = 0.04, and 63.5% with r = 0.05.

Footnote 5: We calculated estimates and CIs using a standard linear model in R and corresponding Bayes factors using a normal approximation (e.g., Bartoš & Wagenmakers, 2022; Dienes, 2014; Spiegelhalter et al., 2004), specifying centered normal prior distributions with a standard deviation of 0.6. This prior corresponds to a standard deviation of 0.3 on the Cohen's d scale, which tests for small effects.
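The normal-approximation Bayes factor mentioned in the footnote above can be computed in several ways; one common route is the Savage-Dickey density ratio under conjugate normal updating. The sketch below is our own illustration of that approach under stated assumptions (a Normal(0, 0.6²) prior and a normal likelihood for the estimate); the actual implementation in the cited work may differ in detail.

```python
import math

def normal_pdf(x, mean, sd):
    # Density of a normal distribution at x.
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def bf10_normal_approx(estimate, se, prior_sd=0.6):
    # Savage-Dickey density ratio under a normal approximation (our sketch):
    # prior b ~ Normal(0, prior_sd^2); the likelihood of the estimate is
    # approximated as Normal(b, se^2), so the posterior of b is also normal.
    post_var = 1.0 / (1.0 / prior_sd**2 + 1.0 / se**2)
    post_mean = post_var * (estimate / se**2)
    post_sd = math.sqrt(post_var)
    # BF10 = prior density at b = 0 divided by posterior density at b = 0.
    return normal_pdf(0.0, 0.0, prior_sd) / normal_pdf(0.0, post_mean, post_sd)
```

An estimate near zero with a small standard error yields BF10 < 1 (evidence for the null), whereas a large, precisely estimated coefficient yields a large BF10.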
Comparison to Results from SWF

We compared the effect size adjustment based on RoBMA to the adjustment based on the model selected under different assumptions about the incidence of publication bias and effect size, called "Model 1" through "Model 4" by SWF. In short, "Model 1" specified the presence of moderate publication bias and small effect sizes, "Model 2" specified the presence of high publication bias and small effect sizes, "Model 3" specified the presence of moderate publication bias and large effect sizes, and "Model 4" specified the presence of high publication bias and large effect sizes.

Figure 4. Comparison of the publication bias adjustment generated by RoBMA and the remaining methods under different modes of publication bias as designed by Sladekova et al. (2022). [Distributions of the effect size adjustment (r) for RoBMA vs. reMA and RoBMA vs. BMA, and for WAAP-WLS, PET, PEESE, PET-PEESE, and PSM under Models 1-4.]

Figure 4 compares the effect size adjustments in the individual studies by RoBMA and under the different models of SWF. The most noticeable difference between the effect size adjustments is that RoBMA did not correct any of the effect size estimates in the opposite direction. As reported before, the median effect size adjustment of RoBMA, r = −0.07, IQR (−0.12, −0.04), and r = −0.04, IQR (−0.08, −0.01), when compared to random effects meta-analysis and BMA, respectively, was larger than the adjustments of the other methods: 0.00, IQR (−0.04, 0.03); 0.00, IQR (−0.08, 0.05); 0.00, IQR (−0.04, 0.03); and 0.00, IQR (−0.06, 0.03) for Models 1, 2, 3, and 4, respectively. The proportion of meta-analyses where RoBMA (compared to BMA) adjusted by less than r = 0.03 (reported earlier, 47.3%) was higher than in three out of four Models: 44.8%, 30.3%, 50.0%, and 41.9% for Models 1, 2, 3, and 4, respectively (but not for the adjustment from random effects meta-analysis to RoBMA, 23.2%).
In other words, while RoBMA, on average, adjusted effect sizes more aggressively than the other methods, it targeted the adjustment at a lower proportion of meta-analyses than the remaining methods. Furthermore, the RoBMA effect size adjustments were less dispersed than the effect size adjustments under the different models of publication bias; even when considering the spread of the absolute values of the effect size adjustments, the standard deviation of the absolute effect size adjustments by RoBMA was 0.071 vs. 0.131, 0.142, 0.133, and 0.136 under Models 1, 2, 3, and 4, respectively (and 0.076 for the adjustment from random effects meta-analysis to RoBMA). This further suggests that RoBMA is sensitive to the specific research conditions seen in each individual meta-analysis and adjusts accordingly. Finally, we estimated a three-level Bayesian model describing the effects of the different publication bias adjustments, with the same specification as in Sladekova et al. (2022) (combining the 3PSM and 4PSM categories into PSM), for RoBMA and each Model separately. Figure 5 compares the fixed effect estimates of the different methods under the different Models.
The fixed effect estimate of the RoBMA adjustment, βRoBMA = −0.04, 95% CI [−0.05, −0.03], is notably more negative than the adjustments of the remaining methods under Model 1: βPEESE = −0.01, 95% CI [−0.03, 0.02], βPSM = 0.00, 95% CI [−0.01, 0.01]; Model 2: βPEESE = 0.00, 95% CI [−0.05, 0.04], βPET = −0.01, 95% CI [−0.03, 0.02], βPSM = 0.00, 95% CI [−0.02, 0.02]; Model 3: βPSM = 0.00, 95% CI [−0.01, 0.01], βWAAP-WLS = 0.00, 95% CI [−0.01, 0.01]; or Model 4: βPEESE = 0.00, 95% CI [−0.02, 0.01], βPSM = −0.02, 95% CI [−0.04, −0.01], βWAAP-WLS = 0.00, 95% CI [−0.01, 0.02].

Figure 5. Comparison of the publication bias adjustment performed by RoBMA and the remaining methods under different models of publication bias, as constructed by SWF. [Fixed effect estimates of the change in effect size (r) for RoBMA, WAAP-WLS, PEESE, PET, and PSM under Models 1-4.]

Concluding Comments

It is widely accepted that different meta-analysis methods perform well under different conditions. Hence it can be risky to employ a single method to estimate the extent to which meta-analyses in general over-estimate effect sizes. SWF attempted to circumvent this complication by selecting different adjustment methods for four plausible conditions based on heterogeneity estimates indicated by a naive random effects meta-analysis. Their article was a much needed contribution to the bias adjustment literature, being the first comprehensive review that tried to select estimators appropriate for different data generating scenarios on an impressively large and representative dataset. Here we outlined an alternative approach based on Bayesian model-averaging. Rather than selecting a single model for each case and assumed data generating process, our robust Bayesian meta-analysis simultaneously considers multiple models, with their contribution to the meta-analytic inference determined by their predictive accuracy.
The difference is not a point of methodological pedantry but has a considerable impact on the conclusions regarding the necessary degree of publication bias adjustment. Whereas SWF, similarly to van Aert et al. (2019), found little overestimation of effect sizes due to publication bias (and, for some methods, even larger effects after adjustment), RoBMA often corrects more strongly and reveals the presence of notable bias. In addition, RoBMA also allowed us to assess the amount of spurious evidence, indicating that evidence for meta-analytic effect sizes is considerably weaker after publication bias is accounted for. The difference between the effect size correction provided by RoBMA and the remaining methods cannot be solely attributed to a downward bias of RoBMA. First, RoBMA did not adjust the meta-analytic effect size downward in many of the analyzed meta-analyses. Second, in Appendix E of Bartoš, Maier, Wagenmakers, et al. (2021), we applied RoBMA to 28 meta-analyses from Many Labs 2 (Klein et al., 2018), a multi-lab Registered Replication Report, where we know that publication bias is absent. Therefore, if a method still detects publication bias or notably corrects the estimate downwards, this is likely indicative of bias. When we applied RoBMA to the Many Labs 2 dataset, we found no notable downward bias, unlike other publication bias adjustment methods. Our analysis shows that it is important to employ multi-model methods when adjusting for publication bias, as model selection is problematic in the absence of strong knowledge about the data generating process. Our extension of the SWF work suggests that the effects of publication bias are more deleterious than previously estimated. However, it remains the case that for a sizeable proportion of studies, the correction is relatively modest.
The considerable overestimation of effect sizes and of the evidence for the effect highlights the importance of using appropriate bias correction methods and the imperative to adopt publishing formats that are robust to publication bias, such as Registered Reports (Chambers et al., 2015).

Data Availability Statement

The data and R scripts for performing the analyses are openly available on OSF at https://osf.io/7yzut/.

Conflicts of Interest

The authors declare that there were no other conflicts of interest with respect to the authorship or the publication of this article.

Acknowledgements

This work was supported by The Netherlands Organisation for Scientific Research (NWO) through a Vici grant (#016.Vici.170.083) to Eric-Jan Wagenmakers.

References

Alinaghi, N., & Reed, W. R. (2018). Meta-analysis and publication bias: How well does the FAT-PET-PEESE procedure work? Research Synthesis Methods, 9(2), 285–311. https://doi.org/10.1002/jrsm.1298

Augusteijn, H. E., van Aert, R., & van Assen, M. A. (2019). The effect of publication bias on the Q test and assessment of heterogeneity. Psychological Methods, 24(1), 116–134. https://doi.org/10.1037/met0000197

Bartoš, F., Gronau, Q. F., Timmers, B., Otte, W. M., Ly, A., & Wagenmakers, E.-J. (2021). Bayesian model-averaged meta-analysis in medicine. Statistics in Medicine, 40(30), 6743–6761. https://doi.org/10.1002/sim.9170

Bartoš, F., & Maier, M. (2020). RoBMA: An R package for robust Bayesian meta-analyses [R package version 2.1.1]. https://CRAN.R-project.org/package=RoBMA

Bartoš, F., Maier, M., Quintana, D., & Wagenmakers, E.-J. (in press). Adjusting for publication bias in JASP and R – Selection models, PET-PEESE, and robust Bayesian meta-analysis. Advances in Methods and Practices in Psychological Science. https://doi.org/10.31234/osf.io/75bqn

Bartoš, F., Maier, M., Wagenmakers, E.-J., Doucouliagos, H., & Stanley, T. D. (2021).
Robust Bayesian meta-analysis: Model-averaging across complementary publication bias adjustment methods. https://doi.org/10.31234/osf.io/kvsp7

Bartoš, F., & Wagenmakers, E.-J. (2022). Fast and accurate approximation to informed Bayes factors for focal parameters. arXiv preprint arXiv:2203.01435. https://doi.org/10.48550/arXiv.2203.01435

Bom, P. R., & Rachinger, H. (2019). A kinked meta-regression model for publication bias correction. Research Synthesis Methods, 10(4), 497–514. https://doi.org/10.1002/jrsm.1352

Borenstein, M., Hedges, L. V., Higgins, J. P., & Rothstein, H. R. (2009). Introduction to meta-analysis. John Wiley & Sons.

Carter, E. C., Schönbrodt, F. D., Gervais, W. M., & Hilgard, J. (2019). Correcting for bias in psychology: A comparison of meta-analytic methods. Advances in Methods and Practices in Psychological Science, 2(2), 115–144. https://doi.org/10.1177/2515245919847196

Chambers, C. D. (2013). Registered Reports: A new publishing initiative at Cortex. Cortex, 49(3), 609–610. https://doi.org/10.1016/j.cortex.2012.12.016

Chambers, C. D., Dienes, Z., McIntosh, R. D., Rotshtein, P., & Willmes, K. (2015). Registered Reports: Realigning incentives in scientific publishing. Cortex, 66, A1–A2.

Dienes, Z. (2014). Using Bayes to get the most out of non-significant results. Frontiers in Psychology, 5:781. https://doi.org/10.3389/fpsyg.2014.00781

Gronau, Q. F., & Wagenmakers, E.-J. (2018). Bayesian evidence accumulation in experimental mathematics: A case study of four irrational numbers. Experimental Mathematics, 27, 277–286. https://doi.org/10.1080/10586458.2016.1256006

Gronau, Q. F., Heck, D. W., Berkhout, S. W., Haaf, J. M., & Wagenmakers, E.-J. (2021). A primer on Bayesian model-averaged meta-analysis. Advances in Methods and Practices in Psychological Science, 4(3). https://doi.org/10.1177/25152459211031256

Gronau, Q. F., van Erp, S., Heck, D. W., Cesario, J., Jonas, K. J., & Wagenmakers, E.-J. (2017).
A Bayesian model-averaged meta-analysis of the power pose effect with informed and default priors: The case of felt power. Comprehensive Results in Social Psychology, 2(1), 123–138. https://doi.org/10.1080/23743603.2017.1326760

Heller, J. (1961). Catch-22. Simon & Schuster.

Hinne, M., Gronau, Q. F., van den Bergh, D., & Wagenmakers, E.-J. (2020). A conceptual introduction to Bayesian model averaging. Advances in Methods and Practices in Psychological Science, 3(2), 200–215. https://doi.org/10.1177/2515245919898657

Hoeting, J. A., Madigan, D., Raftery, A. E., & Volinsky, C. T. (1999). Bayesian model averaging: A tutorial. Statistical Science, 14(4), 382–401. https://doi.org/10.1214/ss/1009212519

Hönekopp, J., & Linden, A. H. (2022). Heterogeneity estimates in a biased world. PLoS ONE, 17(2), 1–21. https://doi.org/10.1371/journal.pone.0262809

Hong, S., & Reed, W. R. (n.d.). Using Monte Carlo experiments to select meta-analytic estimators. Research Synthesis Methods, 12(2), 192–215. https://doi.org/10.1002/jrsm.1467

Jeffreys, H. (1939). Theory of probability (1st ed.). Oxford University Press.

Jeffreys, H. (1961). Theory of probability (3rd ed.). Oxford University Press.

Jeffreys, H. (1973). Scientific inference (3rd ed.). Cambridge University Press.

Klein, R. A., Vianello, M., Hasselman, F., Adams, B. G., Adams Jr, R. B., Alper, S., Aveyard, M., Axt, J. R., Babalola, M. T., Bahník, Š., et al. (2018). Many Labs 2: Investigating variation in replicability across samples and settings. Advances in Methods and Practices in Psychological Science, 1(4), 443–490. https://doi.org/10.1177/2515245918810225

Kvarven, A., Strømland, E., & Johannesson, M. (2020). Comparing meta-analyses and preregistered multiple-laboratory replication projects. Nature Human Behaviour, 4(4), 423–434. https://doi.org/10.1038/s41562-019-0787-z

Larose, D. T., & Dey, D. K. (1998).
Modeling publication bias using weighted distributions in a Bayesian framework. Computational Statistics & Data Analysis, 26(3), 279–302. https://doi.org/10.1016/S0167-9473(97)00039-X

Lee, M. D., & Wagenmakers, E.-J. (2013). Bayesian cognitive modeling: A practical course. Cambridge University Press.

Lewis, M., Mathur, M., VanderWeele, T., & Frank, M. C. (2020). The puzzling relationship between multi-lab replications and meta-analyses of the rest of the literature. https://doi.org/10.31234/osf.io/pbrdk

Maier, M., Bartoš, F., & Wagenmakers, E.-J. (2022). Robust Bayesian meta-analysis: Addressing publication bias with model-averaging. Psychological Methods. https://doi.org/10.1037/met0000405

Maier, M., VanderWeele, T., & Mathur, M. (2021). Using selection models to assess sensitivity to publication bias: A tutorial and call for more routine use. https://doi.org/10.31222/osf.io/tp45u

R Core Team. (2021). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/

Rosenthal, R., & Gaito, J. (1964). Further evidence for the cliff effect in interpretation of levels of significance. Psychological Reports, 15(2), 570. https://doi.org/10.2466/pr0.1964.15.2.570

Sladekova, M., Webb, L. E., & Field, A. P. (2022). Estimating the change in meta-analytic effect size estimates after the application of publication bias adjustment methods. Psychological Methods.

Spiegelhalter, D. J., Abrams, K. R., & Myles, J. P. (2004). Bayesian approaches to clinical trials and health-care evaluation. John Wiley & Sons.

Stanley, T. D., Doucouliagos, H., Ioannidis, J. P. A., & Carter, E. C. (2021). Detecting publication selection bias through excess statistical significance. Research Synthesis Methods, 12(6), 776–795. https://doi.org/10.1002/jrsm.1512

Stanley, T. D., Doucouliagos, H., & Ioannidis, J. P. (2017). Finding the power to reduce publication bias. Statistics in Medicine, 36(10), 1580–1598.
https://doi.org/10.1002/sim.7228

van Aert, R. C., Wicherts, J. M., & van Assen, M. A. (2019). Publication bias examined in meta-analyses from psychology and medicine: A meta-meta-analysis. PLoS ONE, 14(4), e0215052. https://doi.org/10.1371/journal.pone.0215052

van Erp, S., Verhagen, J., Grasman, R. P., & Wagenmakers, E.-J. (2017). Estimates of between-study heterogeneity for 705 meta-analyses reported in Psychological Bulletin from 1990–2013. Journal of Open Psychology Data, 5(1), Article 4. https://doi.org/10.5334/jopd.33

Vevea, J. L., & Hedges, L. V. (1995). A general linear model for estimating effect size in the presence of publication bias. Psychometrika, 60(3), 419–435. https://doi.org/10.1007/BF02294384

Wagenmakers, E.-J. (2007). A practical solution to the pervasive problems of p values. Psychonomic Bulletin & Review, 14(5), 779–804. https://doi.org/10.3758/BF03194105

Wagenmakers, E.-J., Morey, R. D., & Lee, M. D. (2016). Bayesian benefits for the pragmatic researcher. Current Directions in Psychological Science, 25(3), 169–176. https://doi.org/10.1177/0963721416643289