Meta Prop

See discussions, stats, and author profiles for this publication at: https://www.researchgate.
net/publication/325486099
How to Conduct a Meta-Analysis of Proportions in R: A Comprehensive Tutorial
Preprint · June 2018

DOI: 10.13140/RG.2.2.27199.00161
CITATIONS READS
0 2,774
1 author:
Naike Wang
Texas A&M University
1 PUBLICATION 0 CITATIONS
SEE PROFILE
Some of the authors of this publication are also working on these related projects:
How to Conduct a Meta-Analysis of Proportions in R: A Comprehensive Tutorial View project
All content following this page was uploaded by Naike Wang on 01 June 2018.
The user has requested enhancement of the downloaded file.

Conducting Meta-Analyses of Proportions in R
Naike Wang, M.A.
naike.wang@jjay.cuny.edu
John Jay College of Criminal Justice, CUNY
1
Abstract
Meta-analysis of proportions is observational and non-comparative in nature. Rarely have
we seen a study or tutorial demonstrate how a meta-analysis of proportions should be per-
formed using the R programming language. This tutorial intends to fill this gap. The tutorial
consists of two major components: (1) a comprehensive, critical review of the process of con-
ducting a meta-analysis of proportions, in which a number of common practices that possibly
lead to biased estimates and misleading inferences are highlighted (e.g., not taking study size
and within-group estimates of between-study variance into consideration when calculating
mean proportions in the presence of subgroups), and (2) a step-by-step guide to conducting
the analysis using R. The process is described in six stages: (1) setting up the R environment
and getting a sense of the data being analyzed; (2) calculating effect sizes; (3) identifying and
quantifying heterogeneity; (4) constructing forest plots; (5) explaining heterogeneity with
moderator analysis; and (6) assessing publication bias. In the last section (assessing publi-
cation bias), we argued that funnel plot analyses developed for investigating publication bias
in randomized controlled trials may not be suitable for use with meta-analyses of proportions.
Three computational options are incorporated in the code for users to choose from to trans-
form proportional data. The presentation of the tutorial is conceptually oriented, the use of
formulas is kept to a minimum, and a published meta-analysis of proportions is used as an
example to illustrate how to implement the R code and interpret the results of the analysis.
Generic R code is provided for readers to use for their analyses. A video tutorial is also pro-
vided to facilitate learning (watch the video at wangnaike.com).
There are several good reasons for a reader to use this tutorial: (1) one does not need to
purchase expensive statistical software like Comprehensive Meta-Analysis (CMA) to perform
this particular type of meta-analysis, because our code yields exactly the same results as those
CMA delivers; (2) our code yields more accurate estimates for mean proportions in the pres-
ence of subgroups compared to the code other meta-analysis authors have used; (3) our code
allows readers to convert proportional data using three transformation methods (no transfor-
mation, the logit, and the double-arcsine transformation), whereas CMA only performs the
logit transformation; (4) this tutorial will help readers understand why publication bias (due
to non-significant results) is not pertinent in the context of meta-analysis of proportions.
2
Contents
1 Introduction 5
2 Setting up the R environment 6
2.1 R and RStudio . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.2 Setting up the working directory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
3 Getting a sense of the data 7
3.1 Illustrative example: Prevalence and epidemiological characteristics of congenital
cataract (Wu et al., 2016) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
3.2 Preferred formats for organizing data in Excel . . . . . . . . . . . . . . . . . . . . . . 8
4 Calculating effect sizes 10
4.1 Fixed-effect and random-effects model . . . . . . . . . . . . . . . . . . . . . . . . . . 10
4.2 Transformation of proportions: the logit transformation and the double-arcsine
transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
4.3 R code for calculating summary effect sizes . . . . . . . . . . . . . . . . . . . . . . . 15
5 Identifying and quantifying heterogeneity 19
5.1 Overview of heterogeneity: the between-study variance (τ 2 ) . . . . . . . . . . . . . . 19
5.2 Test for heterogeneity (Q) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
5.3 Quantifying heterogeneity (I 2 ) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
5.4 An important caveat: model selection should not be solely based on heterogeneity
tests and statistics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
5.5 R code for outputting results of the heterogeneity test and statistics . . . . . . . . . 23
6 Creating and interpreting forest plots 24
6.1 Visual inspection of forest plots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
6.2 Identifying outliers with formal tests . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
6.3 R code for creating forest plots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
6.4 R code for identifying outlying and influential studies . . . . . . . . . . . . . . . . . 28
6.5 R code for removing outlying studies . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
7 Explaining heterogeneity with moderator analyses 33
7.1 Overview of moderator analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3
7.2 Subgroup analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
7.3 An important caveat regarding obtaining an overall summary effect size in the pres-
ence of subgroups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
7.4 Meta-regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
7.5 Visualizing moderator analysis: Scatter plots . . . . . . . . . . . . . . . . . . . . . . 36
7.6 An important caveat: Results of moderator analyses cannot be seen as causal evi-
dence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
7.7 R code for calculating subgroup summary proportions, conduct subgroup analysis,
and recalculate the overall summary proportion . . . . . . . . . . . . . . . . . . . . . 37
7.8 R code for creating forest plots in the presence of subgroups . . . . . . . . . . . . . . 45
7.9 R code for conducting meta-regression . . . . . . . . . . . . . . . . . . . . . . . . . . 49
7.10 R code for visualizing moderator analyses . . . . . . . . . . . . . . . . . . . . . . . . 50
8 The issue of publication bias in meta-analyses of proportions 55
8.1 Overview of publication bias in the context of meta-analyses of proportions . . . . . 55
8.2 Detecting publication bias through visual inspection of funnel plots in meta-analyses
of randomized controlled trials (RCTs) . . . . . . . . . . . . . . . . . . . . . . . . . . 55
8.3 An important caveat: Funnel plot asymmetry does not equal to publication bias . . 56
8.4 Detecting publication bias with formal tests: rank correlation test, Egger’s regres-
sion test, and trim-and-fill . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
8.5 An important caveat: A significant p-value is not indicative of the presence of pub-
lication bias . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
8.6 R code for creating funnel plots and perform asymmetry tests . . . . . . . . . . . . . 59
References 62
4
1 Introduction
A meta-analysis statistically synthesizes the quantitative findings from multiple studies that inves-
tigate the same research question, providing a numerical summary and estimate of a research area
in an effort to better direct future work. Proportion is defined as the number of cases of interest
in a sample with a particular characteristic, divided by the size of the sample (Lipsey & Wilson,
2001). Meta-analyses of proportions synthesize a one-dimensional binomial measure known as the
(weighted) average proportion (Nyaga et al., 2014), which is an average of the results (i.e., pro-
portions) of multiple studies weighted by the inverse of their sampling variances using either the
fixed-effect or random-effects model (Wang & Liu, 2016).
While most meta-analyses focus on effect size metrics that are indicative of a relationship between
a treatment group and a control group, be it standardized mean difference, odds ratio, relative
risk, or risk difference, a meta-analysis of proportions has the goal of obtaining a more precise es-
timate of the overall proportion for a certain case or event (Borenstein et al., 2009; Barendregt, et
al., 2013).
Studies included in meta-analyses of proportions are observational and non-comparative (i.e.,
single-arm). In other words, each study contributes a number of “successes” and a sample size
(Hamza et al., 2008). For instance, a meta-analysis can be conducted to integrate several esti-
mates for the proportion of individuals who suffer from both post-traumatic stress disorder and
substance use disorder in samples of homeless veterans in various urban areas.
Meta-analyses of proportions are commonly conducted in a variety of prominent fields, such as
medicine (Gillen, 2010), clinical psychology (Fusar-Poli et al., 2016), epidemiology (Wu et al.,
2016), public health (Keithlin et al., 2014), etc. Results from these studies are frequently used for
decision models (Hunter et al., 2014).
Many researchers have used the R programming software (R Development Core Team, 2017) to
conduct meta-analysis of proportions (e.g., Ammirati et al., 2013; Fusar-Poli et al., 2016; Wu et
al., 2016). The reason why people choose R over other software to conduct meta-analyses is pri-
marily because it is free. With other statistical software, one has to pay an expensive license fee
for a limited period of use: you can usually use the software for a year, after which you have to
pay again in order to renew your license. Thus, you may no longer have access to the programs
you learned when you leave school. Additionally, there is a growing library of R packages (i.e.,
extensions to R) that have been developed for all kinds of specialized applications including meta-
analysis. This remarkable feature opens up all sorts of possibilities and flexibility in terms of ma-
5
nipulating data, especially in the case of meta-analyzing proportions.
It is important to note that we usually need to apply transformations to proportional data in or-
der to improve their statistical properties in terms of their distribution and variance. Two of the
most commonly used data transformation methods are the logit and the double-arcsine transfor-
mation (not transforming data is also appropriate under certain circumstances). We will discuss
this in more detail below. Both the metafor (Viechtbauer, 2010) and meta package (Schwarzer
et al., 2015) can perform these transformations whereas other statistical software specifically de-
signed to perform meta-analyses, such as Comprehensive Meta-Analysis (Borenstein et al., 2005)
and MedCalc (Schoonjans, 2017), are only able to perform one of them. Additionally, Comprehen-
sive Meta-Analysis and MedCalc will automatically transform data, but users can decide whether
data should be transformed in R.
Rarely have we seen any R tutorials that show how to conduct a full meta-analysis of proportions,
either in the literature or on the Internet. The purpose of this tutorial is to provide an introduc-
tion on how to perform meta-analyses of proportions in R. As far as we know, this is the first tu-
torial that demonstrates how to do this. This tutorial overviews core statistical constructs and
issues in relation to meta-analyses of proportions and shows an example to illustrate how to con-
duct the analysis using data extracted from a published meta-analytic study in R. Due to space
limitations, we only give one example to show the readers how data transformation can be incor-
porated properly as an integral part in a meta-analysis of proportions study. The R code has been
validated by CMA. The results yielded by the two software are identical.
Please note that this tutorial is designed for medium to advanced-level R users. we assume readers
have a basic understanding of the principles of meta-analysis; in particular, we will not discuss the
processes of searching the literature and collecting, coding, and extracting data.
2 Setting up the R environment
2.1 R and RStudio
First things first, you need to download R. The base R program (the latest version is 3.4.1) can
be downloaded for free from the Comprehensive R Archive Network, (https://cran.r-project.org/).
R provides a basic graphical user interface (GUI), but we would recommend that readers use a
more productive code editor that interfaces with R known as RStudio. This is a development en-
vironment that is built to make using R as effective and efficient as possible. It adds much more
6
functionality above and beyond R’s bare bones GUI. To use RStudio, you have to install the lat-
est version of R. RStudio is available for free at https://www.rstudio.com/. Once RStudio is in-
stalled on your computer, the first thing we do is to create a new R Script or Markdown file. we
would always recommend readers code in Markdown files. The reason for this is beyond the scope
of this article, but we suggest readers explore their website to get a sense of the unique features of
R Markdown.
2.2 Setting up the working directory
Next, readers need to set the programming environment by defining a working directory for the
current R session. A working directory is a place (e.g., a folder) for reading in your data and sav-
ing all of your work, such as code you have written, any plots produced, etc. It is useful for keep-
ing your work organized. To set up a working directory, type the following code in the RStudio
source editor (the upper left area):
setwd("C:/ data")
It is worth noting that when an .rmd file is created in an RStudio Project, the default directory is
the the folder containing this particular .rmd file.
3 Getting a sense of the data
3.1 Illustrative example: Prevalence and epidemiological characteristics

of congenital cataract (Wu et al., 2016)
We will illustrate the conduct of a full meta-analysis of proportions using data extracted from a
published meta-analytic study. The purpose of this exercise is to illustrate the process of meta-
analyzing proportions and not to answer substantive research questions. The example employs
data from a meta-analysis conducted by Wu et al. (2012). In their meta-analytic study, Wu et
al. (2012) estimated the prevalence of congenital cataracts (CC) and their main epidemiological
traits. CC refers to opacity of the lens detected at birth or at an early stage of childhood. It is the
primary cause of treatable childhood blindness worldwide. Current studies have not determined
the etiology of this condition. The few large-scale epidemiological studies on CC also have limita-
tions: they involve specific regions, limited populations and partial epidemiological variables. Wu
et al. (2016) aimed to explore its etiology and estimate its population-based prevalence and ma-
7
jor epidemiological characteristics, morphology, associated comorbidities and etiology. The origi-
nal dataset consists of 27 published studies that were published from 1983 to 2014, among which
17 contained data on the population-based prevalence of CC, 2 hospital-based studies and 8 CC-
based case reviews. Samples investigated in the studies were from different regions of the world,
including Europe, Asia, the USA, Africa, and Australia. The sample sizes of the included studies
ranged from 76 to 2,616,439 patients, with a combined total of 8,302,708 patients. The diagnosed
age ranged from 0 to 18 years of age. The proportions were transformed by the logit transforma-
tion, the most commonly used transformation when dealing with proportional data that results in
a more normal sampling distribution with a mean of zero, and a standard deviation of 1.83. The
authors coded 5 moderators, including world region (China vs. the rest of the world), study de-
sign (birth cohort vs. other), sample size (less vs. more than 100,000), diagnosed age (older vs.
younger than 1 year old), and research period (before vs. after the year 2000). All of these poten-
tial moderators are categorical variables. For the present tutorial, we will work with only a subset
of the moderating variables that were employed in the example study, including study design and
sample size.
3.2 Preferred formats for organizing data in Excel
Prior to performing a meta-analysis in R, it is first important to properly organize the data file.
An excerpt of the example data set that is used in this tutorial is shown in Table 1. In this ta-
ble, each row represents the data extracted from each independent study included in the current
meta-analysis and each column represents the variables that are mandatory to create in order to
properly compute effect sizes, create plots, and conduct further analysis. Following is an excerpt
of the dataset extracted from Wu et al. (2016).
We have separate columns for author names and publication years. This will be useful when we
need to sort studies according to the publication year in R. We also need to create a column con-
taining both author names and publication years, if we decide to use the forest() command in the
meta package to create forest plots. This is because it is not able to combine the author names
and the years of publication in a row automatically. In this case, the column is labeled autho-
ryear. Note that, when a data file is imported into R, if our column names contain letters in up-
percase, they will be converted to lowercase, which means that we are not able to use uppercase/low-
ercase letters to distinguish between different columns. We need to use different column names to
distinguish between them. In addition, we cannot leave a blank between two words. As can be
8
seen in the table, we use authoryear instead of author year, studydesign instead of study design,
samplesize instead of sample size. The variable cases represents the number of the event of inter-
est in the sample of each study. The variable total represents the sample size of each study. When
we divide the values that are contained in cases by the ones in total respectively, we will obtain
the proportions we need to compute effect sizes, which are labeled yi in R. R will also calculate
sampling variances based on the data, which are labeled vi. The rest of the variables that are part
of the dataset are potential moderators (I have only included study design and sample size as ex-
amples in the present excerpt of the data file. Readers can download the data file on my Github
site to get access to the full dataset), which will be examined in either a subgroup analysis or a
meta-regression. For instance, study design, as a potential moderator, has two categories or lev-
els: birth cohort and others. We have 1 and 0 coded respectively for each category in the column
labeled studesg. It is actually not mandatory to create the columns labeled studydesign and sam-
plesize , but it would be more convenient for us to include them because we could then see what 1
and 0 actually represent. I should note that we can use either one of the columns to conduct mod-
erator analyses and it will yield exactly the same results except when we want to create a scatter-
plot to visualize the analysis. If we use the studydesign column, R will create a box plot instead
of a scatterplot. For continuous moderators, readers can simply create columns that contain con-
tinuous values (e.g., the year column). We save the data in an Excel spreadsheet in a comma sep-
arated values (.csv) format in the working directory. We name the file data. csv (you can name it
whatever you want).
9
4 Calculating effect sizes
4.1 Fixed-effect and random-effects model
Before the meta-analysis can be conducted, we need to make a basic choice between two model-
ing approaches for calculating the summary effect size: the fixed-effect and random-effects model
(Hedges, 1992; Hedges & Vevea, 1998; Hunter & Schmidt, 2000). The fixed-effect model assumes
that studies included in a meta-analysis are functionally equivalent, and thus, they share a com-
mon true effect size. Put differently, the true effect sizes are identical across studies. The only rea-
son the observed effect sizes vary across the studies is due to the random sampling error inherent
in each study, namely, the within-study variance. Put differently, participants in the studies come
from a single common population and go through the same experimental procedures performed
by the same researchers under the same conditions. For instance, a series of studies with the same
protocol conducted in the same lab, sampling from the same population (e.g., school children from
the same class) can qualify for the fixed-effect model. However, these conditions rarely hold in re-
ality. True effect sizes actually vary from study to study because in the vast majority of cases we
include a group of studies on a common topic, but these studies are usually performed in differ-
ent ways, which causes the true effect sizes to vary (Barendregt et al., 2013). Therefore, instead
of being identical across the studies, the true effects follow a normal distribution. The random-
effects model allows the included studies to have true effect sizes that are not identical or “fixed”
but normally distributed. In other words, the random-effects model differs from the fixed-effect
model in the calculation of variance: the fixed-effect model assumes that the between-study vari-
ance does not exist (i.e., the between-study variance is zero), hence differences among observed
effect sizes are solely due to within-study variance, whereas the random-effects model takes both
within- and between-study variances into account. The fact that the fixed-effect model does not
take study heterogeneity or between-study variance into consideration leads to a serious limita-
tion: the conclusions drawn from a fixed-effect meta-analysis are limited to only the set of studies
included in a particular meta-analysis and cannot be generalized to a broader, more general pop-
ulation. However, most social scientists wish to make inferences that extend beyond the included
set of studies in their meta-analyses. As a general rule of thumb, in most meta-analytic studies,
the random-effects model will be more plausible than the fixed-effect model because the random-
effects model allows more generalizable conclusions (Card, 2015; Vevea and Coburn, 2014). But,
we discourage the practice of switching to the random-effects model from the fixed-effect model
only based on the results of heterogeneity tests (Borenstein et al., 2005). We will discuss this in
10
more depth later.
The random-effects model can be estimated by three methods (there are other methods, but here
we will focus on the three most popular ones): the method of moments or the DerSimonian and
Laird method (DL; DerSimonian and Laird, 1986), the restricted maximum likelihood method
(REML; Raudenbush and Bryk, 1985), and the maximum likelihood method (ML; Hardy and
Thompson, 1996). In all cases, the summary effect size (i.e., the summary proportion) is esti-
mated as the weighted average of the observed effect sizes of individual studies and the weighting
for each study is the inverse of the total variance of a study, which is the sum of the within-study
variance and the between-studies variance (see formulas below for more details; Ma et al., 2016).
They differ mainly on the estimation of the between-study variance, commonly denoted as τ 2 in
the meta-analytic literature (more on this in the section on heterogeneity). The technical differ-
ences between these methods have been summarized elsewhere (Knapp et al., 2006; Thorlund et
al., 2011) and will not be discussed here.
4.2 Transformation of proportions: the logit transformation and the

double-arcsine transformation
We usually apply transformations to the observed proportions identified across a collection of
studies to make the transformed proportions follow a normal distribution in order to accurately
estimate the summary proportion and increase the validity of the associated statistical analyses,
e.g., tests of significance (Feng et al., 2014; Nyaga, et al., 2014). Nevertheless, when the observed
proportions are around 0.5 and the number of the studies is sufficiently large, the proportions fol-
low a binomial distribution that is approximately symmetrical. Under such circumstances, the
normal distribution is a good approximation of the binomial distribution, and thus the raw pro-
portion is appropriate to be used as an effect size statistic for analysis (Barendregt et al., 2013;
Box et al., 2005; Wang & Liu, 2016). In fact. according to their simulation study, Lipsey and Wil-
son (2001) proposed that when observed proportions derived from individual studies are between
0.2 and 0.8, and only the mean proportion across the studies is of interest, then the raw propor-
tion can work adequately as an effect size index. In addition to the effect sizes, we also need their
corresponding estimates of sampling variance or standard error (SE) to do a meta-analysis. The
effect sizes are weighted by the inverse of their variances. Specifically, a larger study is given more
weight so its effect size has greater impact on the overall mean. The procedure for calculating the
effect size, standard error, sampling variance, and inverse variance weight for individual studies
11
using direct (raw) proportions is as follows (Lipsey & Wilson, 2001):
The raw/direct proportion is given by:
k
ESp = p = (1)
n
with its sampling variance: r

p (1 − p)
V arp = SEp2 = (2)
n
and the inverse variance weight:
1 1 n
wp = = 2
= (3)
V arp SEp p (1 − p)
where p is the proportion, k is the number of individuals or cases in the category of interest, n is
the sample size, ES, SE, Var, and w stand for effect size, standard error, sampling variance, and
inverse variance weight, respectively. Then, the weighted average proportion can be computed as
follows:
P
(wi ESi )
ESP = P = P (4)
wi
with its sampling variance:

1
V arP = SEP2 = P (5)
wi
The confident interval of the weighted average proportion can be expressed as follows:
PL = P − Z(1−α) (SEP )
(6)
PU = P + Z(1−α) (SEP )
where Z(1−α) = 1.96 when α = 0.05.
However, proportional data derived from real studies is rarely centered around 0.5 and is, in fact,
often quite skewed (Hunter et al., 2014). In a group of studies collected for a meta-analysis of
proportions, it was found that as the observed proportions get further from 0.5 and approach
closer to the margins, especially when they are less than 0.2 or greater than 0.8, they will become
less likely to be normally distributed (Lipsey & Wilson, 2001). As a result, the normal distribu-
tion will not adequately describe the observed proportions and a continuing assumption of nor-
mality may result in a biased estimation and a misleading or invalid inference (Feng et al., 2014;
Ma et al., 2016). Specifically, using the direct proportion as the effect size statistic in such cases
12
may lead to an underestimation of the size of the confidence interval around the weighted average
proportion and an overestimation of the degree of heterogeneity across the observed proportions
(Lipsey & Wilson, 2001). When the distribution of a set of observed proportions is skewed (i.e.,
the observed proportions are extremely high or low), we usually apply transformations to the data
in order to make it conform to the normal distribution as much as possible, enhancing the valid-
ity of the following statistical analyses (Barendregt, et al., 2013). Specifically, after transforming
the observed proportions, all analyses are conducted using the transformed proportion as the ef-
fect size statistic (e.g., the natural logarithm of proportion) and the inverse of the variance of the
transformed proportion as study weight. For reporting, the transformed summary proportion and
its confidence interval are converted back to proportions for ease of interpretation (Borenstein et
al., 2009). In practice, the approximate likelihood approach (Agresti & Coull, 1998) is arguably
the predominant framework in modeling proportional data (Hamza, et al., 2008; Nyaga et al.,
2014). There are two main ways to transform observed proportions in this framework: the logit or
log odds (Sahai & Ageel, 2012) and the Freeman-Tukey double arcsine (Freeman & Tukey, 1950;
Miller, 1978). In the logit transformation, observed proportions are first converted to the natural
logarithm of the proportions (i.e., the logit). After the transformation, the logit transformed pro-
portions are assumed to follow a normal distribution and all analyses are performed on the logit
as the effect size statistic. After the analysis, the logits are converted back into proportions for re-
porting. The procedure for calculating the logit, its standard error and inverse variance weight for
individual studies, and the weighted average proportion using the logit transformation is as follows
(Lipsey & Wilson, 2001):
The logit is calculated by:

p p
ESl = loge = ln (7)
1−p 1−p
with its sampling variance:

1 1
V arl = SEl2 = + (8)
np n(1 − p)
and the inverse variance weight:

1
wl = = np (1 − p) (9)
SEl2
To convert the transformed values into the original units of proportions using:
elogit
p= (10)
elogit+1
13
Being widely employed in meta-analyses of proportions, the logit transformation still has its lim-
itations in certain situations. Two limitations are highlighted here. First, the variance instabil-
ity that plagues untransformed proportions persists even after the logit transformation (Baren-
dregt, et al., 2013; Hamza, et al., 2008).We transform data in an attempt to make it closer to a
normal distribution, or at least have more constant variance. While the logit transformation cre-
ates a sampling distribution that is more approximately normally distributed, it fails to stabilize
the variance, and thus may place undue weight on studies (Barendregt, et al., 2013; Hamza, et al.,
2008). According to the equation used to compute the corresponding sampling variance (Eq.8),
for a fixed value of n, the variance changes with p. For instance, consider the following situation
where two studies of the same sample size (e.g., 100), an observed proportion that is close to 0 or
1 gives grossly magnified variance whereas an observed proportion around 0.5 gives squeezed vari-
ance, which results in variance instability (Barendregt, et al., 2013). Further, in situations where
the event of interest is extremely rare (i.e. p=0) or extremely common (i.e., p=1), the logits and
its sampling variances will become undefined. In practice, the common solution is to add an ar-
bitrary constant 0.5 correction to the np and n(1-p) for all studies, including those with no such
problem (Hamza, et al., 2008). However, the inclusion of such studies with a 0.5 continuity cor-
rection has been shown to bias the results even further (Ma et al., 2016). Both of the problems
discussed above can be solved gracefully by employing the variance stabilizing transformation as
proposed by Freeman & Tukey (1950), known as the double arcsine, which is accomplished with
the following equation: r r

1 k k+1
ESt = (sin−1 + sin−1 ) (11)
2 n+1 n+1
The sampling variance is computed by:
1
V art = (12)
4n + 2
The back transformation is computed by the equation as proposed by Miller (1978):
 v" 
u 1 2 #
sin t −

1 u
sin t
p= 1 − sgn (cos t) t 1 − sin t +  (13)
2 n0
where t denotes the double arcsine transformed value or the confidence interval around it with
sgn being the sign operator. In the back-transformation equation (Eq.13), the sample size denoted
by n’ is indicative of using the harmonic mean of individual sample sizes in the inversion formula
14
(Miller, 1978). The harmonic mean is defined as:
Xm
n0 = m( n−1
i )
−1
(14)
i
where ni denotes the sample size of each included study and m denotes the number of included
studies. Miller (1978) gives an example in his paper. A meta-analysis of proportions includes 4
studies, whose samples sizes are 11, 17, 21, 6, respectively. The harmonic mean would be:
4
n0 = 1 1 1 1 = 10.9885 (15)
11 + 17 + 21 + 6
Following the suggestions proposed by Lipsey and Wilson (2001) and Viechtbauer (2010), using
direct proportions would be adequate when the observed proportions identified across studies are
between 0.2 and 0.8; applying the logit transformation would be acceptable when the observed
proportions are less than 0.2 or larger than 0.8; the double-arcsine method would be a more ap-
propriate choice when the sample size is small and/or extreme proportions need to be handled.
This tutorial will demonstrate all three transformation methods with R (unfortunately no studies
have been conducted to compare which to use under different circumstances).
4.3 R code for calculating summary effect sizes
We will now begin the first step of our meta-analysis. First, readers need to install and download
required packages, which run within R and contain collections of functions that are needed to per-
form meta-analyses. In this tutorial, we will need to install two packages: metafor (Viechtbauer,
2010) and meta (Schwarzer et al., 2015). This tutorial is primarily based on using metafor because
it allows users to have control over every step of manipulating raw data. We will also use the meta
package because it will be much more convenient to employ the forest() function in meta to create
forest plots. To install these packages, execute the following command:
install.packages(c("metafor", "meta"))
Once readers have installed a package, it is available permanently for use in R. To use the in-
stalled packages, one will need to execute the library() command each time you open R. In other
words, if you close the R program and restart it, you will have to issue this command again in or-
der to use packages. Readers can load the packages that we just installed into the current R ses-
sion with:
15
library(metafor)
library(meta)
We then need to import the data saved in the file data.csv and create an object named dat to
store the data in R. This can be achieved by running the following code:
dat=read.csv("data.csv", header=T, sep=",")
To estimate the summary effect size (i.e., the weighted average proportion) , we fit a meta-analytic
model in metafor by employing the escalc(), rma(), and predict() functions. These functions, along
with a number of arguments specified within their parentheses, instruct R on how effect sizes should
be calculated. Some of the arguments are defaults, such as weighted=TRUE, so users do not need
to specify them. With the escalc() function, individual effect sizes and their corresponding sam-
pling variances are estimated by fitting a meta-analytic model. One can decide whether to trans-
form these effect sizes with the measure= argument. In the case of a meta-analysis of proportions,
this function has the following general format:
ies=escalc(xi=cases, ni=total, data=dat, measure="PR"/"PLO"/"PFT")
Here, ies is the name of the object in which the results of the escalc() function will be stored. The
function will yield individual effect sizes yi and their corresponding sampling variances vi based
on the information stored in dat. In this case, we have named this object ies, which stands for
individual effect size. cases is the name of the variable containing the number of the event of in-
terest of each study in dat. total is the variable containing the sample size of each study in dat.
Finally, the measure= argument dictates which computational option will be used to transform
the raw proportions:
measure="PR" #no transformation
measure="PLO" #the logit transformation
measure="PFT" #the double arcsine transformation
We will then pool the individual effect sizes and their sampling variances based on the inverse
variance method with the rma() function. To do this, we can execute:
pes/pes.logit/pes.da=rma(yi, vi, data=ies, method="DL"/"REML")
16
If we decide not to perform a transformation, this object is suggested to be named pes, which
stands for pooled effect size; if we decide to perform a transformation, either the logit or the double-
arcsine, it is suggested to be named pes. logit or pes.da, which stands for logit and double-arcsin
transformed summary effect size, respectively. This object will store the results generated by the
rma() function. The function will yield a pooled effect size based on the individual effect sizes and
their sampling variances contained in ies. The method= argument dictates which of the following
between-study variance estimators will be used (the default method is REML):
method="DL" #random effects using the DerSimonian-Laird estimator
method="REML" #random effects using the restricted maximum-likelihood estimator
If unspecified, rma() estimates the variance component using the REML estimator. Even though
rma() stands for random-effects meta-analysis, the function can perform a fixed-effect meta-analysis
with:
method="FE"
The object pes. logit or pes.da now contains the estimate of the transformed summary proportion.
To convert it to its original, non-transformed measurement scale (i.e., proportion) and yield a true
summary proportion, we need to use the predict() function:
pes=predict(pes.logit, transf=transf.ilogit)
pes=predict(pes.da, transf.ipft.hm, targ=list(ni=dat$total))
The transf= argument dictates how the transformed summary effect size should be converted back
to proportion:
transf=transf.ilogit #inverse of logit transformation
transf=transf.ipft.hm, targ=list(ni=dat$total) #inverse of double-arcsine transformation
Note that we use transf. ipft. hm instead of transf. ipft, because we want to use the harmonic
mean of individual sample sizes (explained above). Finally, to see the output for the true sum-
mary proportion and its 95% CI, we use the print() function.
print(pes)
For the sake of readers’ convenience, the generic code for generating results for a fitted random
effects model using three different transformation methods is provided here:
17
Option 1: no transformation
ies=escalc(xi=cases, ni=total, data=dat, measure="PR")
pes=rma(yi, vi, data=ies)
print(pes)
Option 2: the logit transformation
ies.logit=escalc(xi=cases, ni=total, data=dat, measure="PLO")
pes.logit=rma(yi, vi, data=ies.logit)
print(pes)
Option 3: the double-arcsine transformation
ies.da=escalc(xi=cases, ni=total, data=dat, measure="PFT", add=0)
pes.da=rma(yi, vi, data=ies.da)
pes=predict(pes.da, transf=transf.ipft.hm, targ=list(ni=dat$total))
print(pes)
Note the use of add=0 in line 11. When a study contains proportions equal to 0, R will automat-
ically add 0.5 to the observed data (i.e., the number of event of interest, namely the cases vari-
able). Since the double-arcsine transformation does not require any adjustments to be made to
the data in such a situation, we can explicitly switch add=0.5 to add=0 to stop the default ad-
justment. Returning to the running example, the summary proportion is generated using option 2
(i.e., the logit transformation) on the grounds that all of the observed proportions in the dataset
are far below 0.2 and there are no zero events. Thus, we would execute:
ies.logit=escalc(xi=cases, ni=total, measure="PLO", data=dat)
pes.logit=rma(yi, vi, data=ies.logit, method="DL", level=95)
print(pes, digits=6)
The argument digits=specifies the number of decimal places to which the printed results should
be rounded (the default is 4). The argument level= specifies the confidence interval (the default
is 95). In this case, if we use the 95% CI, the values of τ , τ 2 , and I 2 will fall outside of the con-
fidence interval. We can switch to the 99% CI to solve this problem. We will however keep us-
ing the 95% CI throughout this tutorial so readers can compare the results computed by our code
18
with those obtained by Wu et al. (2014).
The estimates of the summary proportion and its 95% CI are shown in the following output:
Interpreting these summary statistics, we find that the summary proportion is 0.000424 (95%
CI=0.000316, 0.000569).
5 Identifying and quantifying heterogeneity
5.1 Overview of heterogeneity: the between-study variance (τ 2 )
Meta-analyses aim to produce a more precise summary or estimate of effect by synthesizing stud-
ies. An important decision that authors of meta-analyses need to make is whether it makes sense
to combine a set of identified studies in a meta-analysis, given that the studies inevitably differ
in their characteristics to varying degrees. If we were to combine studies whose study estimates
vary substantially from one another, the summary effect we estimate and the conclusion we draw
may not be accurate or valid. For instance, a meta-analysis of proportions concludes that the
summary proportion of juvenile offenders who re-offend in a city falls in a medium range (i.e.,
around 0.5), but considerable variation exists among these proportions. That is, studies conducted
in some boroughs have small (e.g., under 0.1) proportions while others have very large propor-
tions (e.g., above 0.9). In this case, to report that the mean proportion is moderately large would
be misleading because It fails to provide an accurate summary of the core finding that there is
large variation or inconsistency in the observed effect sizes across the studies. This variation is
what is known as heterogeneity (Del Re, 2015; Borenstein et al., 2005). We can quantify hetero-
geneity by dividing it into two distinct components: one is the between-study variance due to the
true or real variation among a body of studies and the other is the within-study variance due to
sampling error. The real variation can be attributed to clinical and/or methodological diversity,
namely systematic differences between studies beyond what would be expected by chance, such
as experimental designs, measurements, sample characteristics, interventions, study settings, and
any combination of such factors (Cornell et al., 2014; Lijmer, 2002; Thompson and Higgins, 2002;
Veroniki et al., 2016). For the purposes of meta-analysis, we are only interested in the true vari-
ation in effect sizes (i.e., the between-study variance). We characterize study heterogeneity by its
standard deviation, τ , a statistic called tau. Under the assumption of normality, 95% of true ef-
19
fects are expected to fall within ± 2 × τ 2 of the point estimate of the summary effect size (Cornell
et al., 2014). The between-study variance, τ 2 , called tau-squared, reflects the amount of true het-
erogeneity on an absolute scale (Borenstein et al., 2005). That is, the total amount of systematic
differences in effects across studies. The total variance of a study is the summary of the between-
and within-study variance and is used to assign weights under the random-effects model (i.e., the
inverse of the total variance). τ 2 can be estimated by several methods as we have already men-
tioned in the section dealing with calculating effect sizes (e.g., ML, REML, DL). Researchers have
not come to an agreement as to which of the three methods is the best to estimate the random ef-
fects model. The ML tends to underestimate the between-studies variance if the number of studies
included in a meta-analysis is small (Schwarzer et al., 2015; Nyaga et al., 2014). Simulation stud-
ies (e.g., Chung et al., 2014) have also demonstrated that the DL estimator has the same issue: it
tends to result in downwards biased between-study variance, potentially producing overly narrow
confidence bounds for the summary effect and too small p values when the number of studies is
small or when there is substantive variability in effect sizes (Bowden et al., 2011; Cornell et al.,
2014). Jackson et al. (2010) suggests that the DL procedure is preferable when the sample size is
large and only the mean effect size is of interest. In his book, Borenstein et al. (2005) mentioned
that some statisticians favor the REML method despite its computationally demanding nature
and one of the key authors of the DerSimonian and Laird paper once argued that the DL estima-
tor should not be used any longer. Although the REML estimator has generally been shown to
produce superior results to the DL estimator, a number of studies have shown that the differences
between the results derived by the two approaches are negligible, and thus are rarely sufficiently
pronounced to make a substantive impact on the conclusions that will be drawn from the anal-
ysis (Thorlund et al., 2011). Nevertheless, as all the aforementioned estimators have limitations
as to estimating the amount of true variation in effect sizes, the 95% confidence interval around
the point estimate of τ 2 should be obtained, especially when the number of studies included in a
meta-analysis is small (Veroniki et al., 2016). In practice, the DerSimonian and Laird random ef-
fects model is arguably the most popular statistical method for the meta-analysis of proportions
nowadays and has become the conventional procedure and the default method in many software
packages to gauge the amount of heterogeneity (Cornell et al., 2014; Hamza et al., 2007; Ma et
al., 2016; Schwarzer et al., 2015; Thorlund et al., 2011) because it has the advantage of being the
easiest to compute and explain compared with other methods (Borenstein et al., 2005).
20
5.2 Test for heterogeneity (Q)
The first method to identify heterogeneity is through visual inspection of forest plots. We will dis-
cuss this in depth in its own section below. Using formal tests, the presence of study heterogeneity
is generally examined using a formal χ2 test with a statistic Q under the null hypothesis that all
studies share the same true effect (Hedges & Olkin, 1985). In other words, the Q-test and its p-
value serve as a test of significance to address the null hypothesis: H0 : τ 2 = 0. If the value of the
Q-statistic is above the critical χ2 value, we will reject the null hypothesis and conclude that the
effect sizes are heterogeneous. Under such circumstances, you may consider taking the random-
effects model route. If Q does not exceed this value, then we fail to reject the null hypothesis. It
is very important to note that when a non-significant p-value is present, we have to be very cau-
tious in making the conclusion that the true effects are homogeneous. This is because the statisti-
cal power of the Q-test is heavily dependent on the number of studies included in a meta-analysis,
and thus, it may fail to detect heterogeneity simply due to a lack of power when the number of
included studies is small (i.e., less than 10) and/or the included studies are of small size (Huedo-
Medina et al., 2006). Therefore, a non-signicant result (p>0.05) cannot be interpreted as showing
empirical evidence for homogeneity (Hardy & Thompson, 1998). This is an issue that needs to be
taken seriously because it is found that 75% of meta-analyses reported in Cochrane reviews con-
tained five or fewer studies (Davey, 2011). In addition to the aforementioned limitation, the Q-
test only serves to test if the null-hypothesis is viable, it is not able to quantify the magnitude of
true heterogeneity in effect sizes (Card, 2015).
5.3 Quantifying heterogeneity (I 2 )
One statistic of heterogeneity that is not directly influenced by the number of studies, the I 2 statis-
tic proposed by Higgins et al. (2003) can address these issues by estimating the proportion of the
observed heterogeneity that constitutes the true variation between studies. Put differently, in
essence, I 2 is roughly the ratio of between-study variance to the observed variance (i.e., the sum
of between- and within-study variance). Thus, it allows us to compare estimates of heterogene-
ity across meta-analyses, regardless of the original scale used for the meta-analyses themselves. I 2
can take values from 0% to 100%. When it is equal to 0%, it means that all of the heterogeneity
is caused by sampling error and there is nothing to explain; when it is equal to 100%, the overall
heterogeneity can be accounted for by the true differences between studies exclusively, and thus, it
makes sense to apply sub-group analyses or meta-regressions to identify potential moderating fac-
21
tors that can explain the inconsistencies between effect sizes across studies. It is assumed that an
I 2 of 25, 50, and 75% indicate low, medium, and large heterogeneity, respectively (Higgins et al.,
2003). Note that these are only tentative benchmarks for I 2 . The 95% CI around the I 2 statistic
should also be calculated (Cuijpers 2016; Ioannidis et al., 2007). The value of I 2 itself could be
misleading because an I 2 of 0 with a 95% CI ranging from 0 to 80% in a small study is not indica-
tive of homogeneity. Rather, the degree of heterogeneity in this case is uncertain.
5.4 An important caveat: model selection should not be solely based on

heterogeneity tests and statistics.
Together, the Q-statistic, τ 2 , and I 2 can inform us if the effects are homogeneous, or consistent.
When the effect sizes are reasonably consistent, it is appropriate to combine them and give a sum-
mary effect size in reports. In cases where moderate and substantial heterogeneity is present, the
summary effect size is of less or no value. However, in practice, some have adopted the practice
of automatically selecting the random-effects model in the presence of heterogeneity and giving
an overall summary effect despite the fact that the effects are highly inconsistent. In such cases,
we strongly suggest that readers conduct moderator analyses to provide a thorough explanation
of the possible sources of heterogeneity in observed effect sizes instead of placing too much em-
phasis on the mechanistic calculation of a single estimate for the mean effect (Egger et al., 1998).
We will discuss moderator analysis in more detail later. However, as we can see, the methods for
estimating the amount of heterogeneity and the significance tests for heterogeneity are not very
reliable in many cases, and thus may potentially lead to poor estimates and misleading interpre-
tations in terms of the variability of true effects across studies. If we underestimate the between-
study variance and thus underestimate study heterogeneity (or even fail to detect heterogeneity
since tests for this often suffer from low power), authors may risk fitting the wrong model (i.e.,
the fix-effects model) and thus make inaccurate inferences about the overall intervention effect as
well as miss the opportunity to explore and investigate potential sources of systematic variation
between studies, which is in fact one of the core components of a meta-analysis (Thompson, 1994;
Thompson & Sharp, 1999; Higgins & Thompson, 2002). In conclusion, the selection of a model
should be based on a combination of factors, such as the type of conclusion one wishes to draw,
the expectation of the distribution of true effects, the statistical significance of the test for het-
erogeneity, the number of included studies, etc. (Borenstein et al., 2005; Card, 2015; Whitehead,
2002).
22
5.5 R code for outputting results of the heterogeneity test and statis-
tics
To view the results of the test for heterogeneity (Q), the estimate of between-study variance (τ 2 ),
and the estimate for the proportion of the observed variability that reflects the between-study
variance (I 2 ), we still use the print() function:
print(pes or pes.logit or pes.da)
The confint() function computes and displays the confidence intervals of τ 2 and I 2 :
confint(pes or pes.logit or pes.da)
To display the output of heterogeneity-related results for the running example, we would type:
print(pes.logit, digits=4)
confint(pes.logit, digits=2)
which outputs:
The output reveals that in the example data, τ 2 is 0.3256 (95% CI=0.3296, 1.4997), I 2 is 97.24%
(95% CI=97.28, 99.39) , and the Q-statistic is 580.5387 (p<0.001), all of which suggests high het-
erogeneity in the effect sizes. Again, the values of τ , τ 2 , and I 2 have fallen out their 95% CIs.
Readers can fix this problem by switching to the 99% CI.
23
6 Creating and interpreting forest plots
6.1 Visual inspection of forest plots
A forest plot is a graph that visualizes the point estimates study effects and their confidence inter-
vals (Lewis & Clarke, 2001). It is constructed using two perpendicular lines. The horizontal line
represents the outcome measure, in our case, the proportion. As for the vertical line that inter-
sects the horizontal line, the value at the line is relevant to the statistic being used. Specifically,
in meta-analyses that use relative statistics, such as odds ratio and risk ratio, the vertical line
represents where the intervention has no effect and thus is placed at the value of 1. For absolute
statistics such as absolute risk and standard mean difference, the null difference value is 0. The
two kinds of aforementioned meta-analyses all focus on a treatment effect or a relationship be-
tween two groups. The statistic (i.e., proportion) being estimated in meta-analyses of proportions
is simply a single group summary since a meta-analysis of proportions has only one arm (Boren-
stein et al., 2005). Therefore, in meta-analyses of proportions, the vertical line is usually placed at
the value of the point estimate of the summary proportion, which is depicted as a diamond at the
bottom with the horizontal tips of the diamond representing the 95% confidence interval of the
summary effect. Each study effect plotted on a forest plot has two components to it: a box repre-
senting the point estimate of each study effect and a horizontal line through the box representing
the confidence interval around the point estimate. The size of the box gives a representation of
the sample size or the weighting of each study, meaning the bigger the sample size, the bigger the
box, the shorter the horizontal line. A shorter horizontal line suggests a better precision of the
study results. A bigger box with a shorter horizontal line indicates that the study has more im-
pact on the summary effect size (Anzures-Cabrera & Higgins, 2010). The forest plot is very useful
for understanding the nature of the data being analyzed because it provides a simple visual repre-
sentation of the amount of variation between effect sizes across studies. Study effects are regarded
as homogeneous if the horizontal lines of all studies overlap (Ried, 2006; Petrie et al., 2003). The
forest plot also allows us to detect outliers (Cuijpers, 2016). This can be achieved by identifying
studies whose 95% confidence intervals do not overlap with that of the summary effect. Further-
more, it is worth noting that if large studies are outliers then the overall heterogeneity could be
high.
24
6.2 Identifying outliers with formal tests
It is crucial to conduct several formal tests to determine if the outlying effect sizes identified by
examining the forest plot are truly outliers. If they are considered outliers, further investigation
is needed to determine whether or not they are actually influential to the overall effect size. A di-
agnostic plot known as the Baujat plot (Baujat et al., 2002) has been proposed to identify studies
that contribute to heterogeneity in meta-analytic data. The horizontal axis represents the con-
tribution of each study to the Q-statistic for heterogeneity. The vertical axis illustrates the in-
fluence of each study on the summary effect size. Studies that appear in the top right quadrant
of the graph contribute most to both of these factors. Outlying effect sizes can also be identi-
fied by screening for externally studentized residuals that are larger than 2 or 3 in absolute value
(Tabachnik & Fidell, 2013; Viechtbauer and Cheung, 2010). An outlying effect size, however, may
not be considered to be influential unless its exclusion leads to significant changes in the fitted
meta-analytic model and exerts considerable influence on the summary effect size. Viechtbauer &
Cheung (2010) have proposed a set of case deletion diagnostics derived from linear regression anal-
yses to identify influential studies, such as difference in fits values (DFFITS), Cook’s distances,
leave-one-out estimates for the amount of heterogeneity (i.e., τ 2 ) as well as the test statistic for
heterogeneity (i.e., Q-statistic). In leave-one-out analyses, each study is removed in turn one at a
time, and the summary proportion is re-estimated based on the remaining n-1 studies. Analyzed
as such, we are able to assess the influence that each study has on the summary proportion.
As a final note, instead of simply eliminating studies that yield outlying effect sizes, one should
investigate these outliers and influential cases fully to understand their occurrence. Often they
could reveal valuable study characteristics that can be used as potential moderating variables to
account for some of the between-study variability.
6.3 R code for creating forest plots
Many authors of meta-analysis of proportions fail to create forest plots in a proper way when a
meta-analysis contains subgroups of studies. Specifically, many published meta-analytic studies
have failed to display correct estimates for overall and subgroup summary proportions in their for-
est plots. Simply put, let us say that a group of studies are divided into two subsets of studies in
a subgroup analysis. The inappropriate approach to calculating the subgroup summary propor-
tions is to assume that the two subgroups of studies are two sets of studies independent of each
other. In fact, it is quite possible that they share a common between-study variance component,
25
so a common estimate of τ 2 needs to be applied to each study when calculating subgroup and
overall summary proportions. The summary proportion across all studies calculated in the pres-
ence of subgroups is also different than that derived in the absence of subgroups. We will discuss
this issue in more detail in the next section.
In this section, we will begin with learning how to create a basic forest plot using the meta pack-
age. We will show readers how to create a more difficult forest plot in the next section after a
thorough discussion on the aforementioned issue.
We can create a simple forest plot that does not have any subgroups with the following generic
code (assuming that we have loaded the meta package):
pes.summary=metaprop(cases, total, authoryear, data=dat, sm="PRAW"/"PLO"/"PFT")
forest(pes.summary)
The sm= argument in the metaprop() function dictates which transformation method will be used
to convert the original proportions:
PRAW #no transformation
PLO #the logit transformation
PFT #the double arcsine transformation
Forest plots created by the generic code are bare-boned and may fail to meet publishing standards
in many cases. The following code can produce publication-quality forest plots using the data in
the running example:
pes.summary=metaprop(cases, total, authoryear, data=dat, sm="PLO",
method.tau="DL", method.ci="NAsm")
forest(pes.summary,
xlim=c(0,4),
pscale=1000,
rightcols=FALSE,
leftcols=c("studlab", "event", "n", "effect", "ci"),
leftlabs=c("Study", "Cases", "Total", "Prevalence", "95% C.I."),
xlab="Prevalence of CC", smlab="",
weight.study="random", squaresize=0.5, col.square="navy",
col.square.lines="navy",
col.diamond="maroon", col.diamond.lines="maroon",
26
pooled.totals=FALSE,
comb.fixed=FALSE,
fs.hetstat=10,
print.tau2=TRUE,
print.Q=TRUE,
print.pval.Q=TRUE,
print.I2=TRUE,
digits=2)
which produces:
It should be mentioned that given space constraints we have only listed the most essential argu-
ments in the forest() function to create a forest plot. Readers are referred to the instruction of the
meta package to explore more useful arguments to customize their own forest plots.
We can order the individual studies by precision in order to help us visually inspect the nature of
the data and examine publication bias. This can be achieved by using SE or the inverse of SE as
the measure of accuracy and creating an object to store their estimates:
precision=sqrt(ies.logit$vi) or 1/sqrt(ies.logit$vi)
And then we add the sortvar= argument in the forest() function:
sortvar=precision
27
which produces:
This graph clearly shows that the CC prevalence are larger when studies are smaller and less pre-
cise. In meta-analyses of comparative studies (i.e., meta-analytic studies that use OR, RR, etc. as
effect size), what we would like to see is an even spread of studies with varying precision on either
side of the mean effect size because it indicates a lack of publication bias in most cases. However,
in a meta-analysis of observational data (e.g., proportions), an uneven spread of studies may ac-
tually reflect a genuine pattern in effect sizes instead of publication bias, especially when small
studies fall out to the right side of the mean (the reasons are explained in detail in the next sec-
tion). It is also possible that some small studies are not published due to legitimate reasons, such
as the use of poor methods. Thus, this uneven distribution of effects is certainly worth investigat-
ing further, which may provide new insight into the topic of interest.
Notice that the estimate of τ 2 is 0.33 in the absence of subgroups.
A visual inspection of the forest plot identifies several suspicious outlying studies, including Bermejo
(1998), SanGiovanni (2002), and Halilbasic (2014).
6.4 R code for identifying outlying and influential studies
Next, we have to conduct a few formal tests to confirm our visual inspection of the forest plot.
The first test involves screening for externally studentized residuals that are larger than 2 or 3 in
absolute value. Using the following generic code, studies will be shown in descending order accord-
ing to their residual estimates:
28
stud.res=rstudent(pes/pes.logit/pes.da)
abs.z=abs(stud.res$z)
stud.res[order(-abs.z)]
Performing the test on the running example outputs:
The key here is to find studies with z-values that are bigger than 2 or 3 (depending on the num-
ber of studies included). Since we only have 17 studies in the running example, we would set the
cut-off at 2, thus the second, eighth, and twelfth studies are chosen. They match the studies we
visually identified earlier.
Outliers have the potential to be influential, but we generally have to investigate further to deter-
mine whether or not they are truly influential. This can be achieved by performing a set of leave-
one-out diagnostic tests:
L1O=leave1out(pes); print(L1O)
L1O=leave1out(pes, transf=transf.ilogit); print(L1O)
L1O=leave1out(pes, transf=transf.ipft.hm, targ=list(ni=dat$total)); print(L1O)
Using the example data, we first execute the following code:
L1O=leave1out(pes.logit, transf=transf.ilogit); print(L1O, digits=6)
which outputs:
29
The numbers in the printed dataset look daunting at first. They are actually very easy to inter-
pret. For instance, the first estimate in the first column (i.e., 0.000419) is the estimate for the
summary proportion that is derived when we take the first study out of this set of studies and re-
calculate the overall mean. In other words, if the first study is left out of this set of studies, the
estimate for the observed summary proportion (0.000424) will become 0.000419.
We can actually visualize the change in the summary effect size with a forest plot using metafor.
The generic code is given below:
l1o=leave1out(pes)
yi=l1o$ estimate; vi=l1o$se^2
forest(yi, vi,
slab=paste(dat$author, dat$year, sep=","),
refline=pes$b,
xlab="Summary proportions leaving out each study")
l1o=leave1out(pes.logit)
yi=l1o$estimate; vi=l1o$se^2
forest(yi, vi, transf=transf.ilogit,
refline=pes$pred,
l1o=leave1out(pes.da)
yi=l1o$estimate; vi=l1o$se^2
30
forest(yi, vi, transf=transf.ipft.hm, targ=list(ni=dat$total),
refline=pes$pred,
The forest plot below is generated with the data in the running example. Each box represents a
summary proportion estimated leaving out a study. The reference line indicates where the original
summary proportion lies. From the graph we can deduce that the further a box deviates from the
reference line, the more pronounced the impact of the corresponding missing study will be on the
original summary proportion.
With these potential influential studies in mind, we now conduct a few more leave-one-out diag-
nostics with a built-in function in metafor to verify our guesses:
inf=influence(pes/pes.logit/pes.da)
print(inf); plot(inf)
31
which outputs:
We have actually described these tests earlier, so we will not repeat here. The plot below the
printed dataset visualizes the leave-one-out estimates. Influential studies are marked with an as-
terisk in the printed dataset and labeled in red in the plot. The second and eighth studies fulfill
the criteria as influential studies.
6.5 R code for removing outlying studies
Once all possible outliers are determined, we can remove them with the following generic code:
#remember to add "add=0" when using double-arcsine transformation
ies.noutlier=escalc(xi=cases, ni=total,
measure="PR"/"PLO"/"PFT",
data=dat[-c(study label 1, study label 2,),])
32
If we were to exclude the second and the eighth studies, we would execute the following code:
ies.logit.noutlier=escalc(xi=cases, ni=total, measure="PLO", data=dat[-c(2, 8),])
pes.logit.noutlier=rma(yi, vi, data=ies.logit.noutlier, method="DL")
pes.noutlier=predict(pes.logit.noutlier, transf=transf.ilogit)
print(pes.noutlier, digits=5)
7 Explaining heterogeneity with moderator analyses
7.1 Overview of moderator analysis
We have determined that heterogeneity do exist in our data and identified a few outlying studies
that contribute some variability to part of the heterogeneity. If substantial heterogeneity remains
after we exclude those outliers, a major way to discover other possible sources of heterogeneity is
through moderator analysis. In fact, a thorough moderator analysis is more informative than a
single estimate of summary effect size when meta-analytic data being examined contain substan-
tial heterogeneity. Similar to primary studies, moderator analyses have a sample of “participants”
(i.e., the studies included in a meta-analysis), one or multiple independent variables (i.e., moderat-
ing variables) and one dependent variable (i.e., effect sizes within each subgroup). To predict the
effect of a hypothesized moderator, we apply a weighted linear regression model in which the ef-
fect sizes (i.e., logit-or double-arcsine transformed proportions) are regressed onto the moderator
(Card, 2015; Thompson & Higgins, 2002):
ES = β0 + Cβ1 + e (16)
where C is the regression (slope) coefficient, β1 is the moderator effect, and e is the within-study
variance. β0 is more informative when testing categorical variables as potential moderators; in
such a situation, it is the mean effect of a reference category (i.e., the category coded as 0 in a
dummy variable). It is not necessary to know the mathematics behind the process, but keeping
the regression model formula in mind will be useful when we interpret the resulting output of
moderator analysis in R.
33
7.2 Subgroup analysis
There are two major forms of moderator analyses: subgroup analysis and meta-regression. Sub-
group analysis can be considered as a special form of meta-regression in which only one categor-
ical moderator is examined (Thompson & Higgins, 2002). Generally, subgroup analyses are con-
ducted with a mixed effects model in which the random-effects model is used to combine study
effects within each subgroup and the fixed-effect model is used to test whether the effects across
the subgroups vary significantly from each other (Cuijpers, 2016). Subgroup analysis uses analog
to the analysis of variance (ANOVA) to evaluate the impact of moderators on effect sizes (Lit-
tell et al., 2008). Under the framework of subgroup analysis, the total set of studies is split into
two or more subgroups according to the categories within a categorical moderator and we com-
pare the effect in one subgroup of studies versus that in the rest of the subgroup(s) of studies. In
essence, categorical moderators are study characteristics that can account for a certain proportion
of between-study variability (Hamza et al., 2008). For instance, if a subgroup of studies shares a
common characteristic that the other subgroup(s) does not (e.g., being exposed to a new treat-
ment vs. old treatment) and the effect sizes across the subgroups differ significantly, it is quite
possible that part of the variation can be attributed to this particular study characteristic. In sub-
group analysis, the summary effect size for each group as well as the within-group heterogeneity
are obtained. Moreover, a Wald-type test is conducted to compare the summary effect sizes across
subgroups: using either a Z -score or a Q-statistic (both yield the same p-value), whether or not
two groups have significantly different outcomes can be determined (Viechtbauer, 2010).
7.3 An important caveat regarding obtaining an overall summary effect

size in the presence of subgroups
Within each subgroup of studies, the summary effect size can be calculated under fixed-effect and
random-effects models. One thing to be noted here is the computation of the between-study vari-
ance, τ 2 (Borenstein et al., 2005). As is the case for a fixed-effect meta-analysis in the absence of
moderators, the between-study variance within each subgroup is assumed to be zero. By contrast,
when we compute the summary effect size for each subgroup under the random-effects model, the
value of τ 2 needs to be estimated within subgroups of studies rather than across all studies as we
want to compare the summary effect size and within-group variance of each subgroup. Once we
have obtained the estimated values of τ 2 yielded within the subgroups, we can choose either to
pool them or not, depending on whether we anticipate the true between-study variation in ef-
34
fect sizes within each subgroup to be the same. If we believe that the observed within-group τ 2
estimates are different from one study to the next due to sampling error alone, then we antici-
pate a common τ 2 across subgroups, and thus we should apply the estimate of the pooled τ 2 to
all the studies. On the other hand, if we believe that apart from sampling errors, some system-
atic causes are also responsible for the differential values of the observed within-group τ 2 , then
we apply separate estimates of τ 2 for each subgroup. Simply put, if we use a separate estimate of
between-study variance, that means we are actually conducting an independent meta-analysis for
each subgroup ignoring subgrouping. When we assume that τ 2 is the same for all subgroups un-
der the random-effects model, we can use the R index to represent the proportion of the between-
study variance across all studies that can be explained by a moderator. In the presence of sub-
groups, the estimate of the summary proportion for all studies will be different than that in the
absence of subgroups (Borenstein et al., 2005). This is because we use different estimates for τ 2 in
different cases. In the absence of subgroups, τ 2 is computed based on the dispersion of all studies
from the grand mean (Borenstein et al., 2005). In the present of subgroups, as we have just dis-
cussed above, we have two methods to compute τ 2 . Therefore, there will be 3 different estimates
for summary effect size according to which method you would use. Computing the different kinds
of summary proportion estimates depends on the goal of one’s study as well as the nature of the
meta-analytic data, but any differences among these estimates will usually be trivial. Given space
constraints, this issue will not be discussed any further here, but see the code below to learn how
different variants of estimates for summary proportion are derived in R.
7.4 Meta-regression
Just as subgroup analysis relied on an adaptation of ANOVA, the logic of evaluating moderators
in meta-regression parallels the use of regression or multiple regression in primary studies (Card,
2015). Meta-regression examines the relationship between covariates (i.e., moderators) and the ef-
fect sizes in a set of studies using the study as the unit of analysis (Borenstein et al., 2005; Lit-
tell et al., 2008), under the framework of which moderating variables can either be categorical
or continuous. That is, we can incorporate a single continuous variable (e.g., the average age of
participants, sample size, publication year, number of therapy sessions in treatment, etc.), a se-
rious of continuous variables, or a combination of continuous and categorical variables as mod-
erators. When categorical moderators are included in a meta-regression model, they should be
coded as “dummy” variables. In a multivariate model of meta-regression, due to potential multi-
35
collinearity resulting from interrelated moderators, Hox (2010) suggests that moderating variables
should be evaluated separately in univariate models prior to being tested simultaneously in a sin-
gle model. As is true in the random-effects subgroup analysis, the R2 index can be employed in
meta-regression to indicate the proportion of true heterogeneity across all studies that can be ac-
counted for by one or a set of moderators in order to quantify the magnitude of their impact on
study effects.
7.5 Visualizing moderator analysis: Scatter plots
We can visualize moderator analysis by producing a scatter plot of the moderating variable(s) and
the effect sizes. A scatter plot is constructed with a center line known as the regression line and
two curved lines showing a 95% confidence interval around it, with studies represented by a circle
drawn proportional to its study weight (i.e., larger studies are shown as larger circles). What is
important in a scatter plot is the slope of the regression line. A completely horizontal regression
line implies that there is no association between the moderator and the effect sizes. If, however,
the regression line is not horizontal, this indicates that the effect sizes vary as the value of the
moderator changes. The slope coefficient and its significance test can inform us if the slope signif-
icantly deviates from zero. A significantly positive or negative slope suggests that the explanatory
variable has a significant moderating effect and can explain a significant amount of heterogeneity.
7.6 An important caveat: Results of moderator analyses cannot be seen

as causal evidence
Moderator analysis has several limitations. A main one being that both subgroup analysis and
meta-regression require a large ratio of studies to moderators. In general, moderator analysis should
not be attempted unless at least 10 studies are available for each moderator in the analysis, espe-
cially in a multivariate model where the number of studies could be small, leading to reduced sta-
tistical power (Higgins & Green, 2006; Littell et al., 2008). Perhaps the most important limitation
is that the significant differences found between subgroups of studies in moderator analyses cannot
be seen as causal evidence. It is quite possible that unidentified factors that are not measured in
such moderator analyses are responsible for the differential effect sizes across the subgroups. Un-
fortunately, there is no solution for this problem (Cuijpers, 2016; Littell et al., 2008). Hence, one
cannot draw causal conclusions from moderator analyses. In light of this, we strongly suggest that
authors choose moderating variables based on theoretical reasoning and only test those for which
36
a strong theoretical case can be made in order to avoid erroneously attributing heterogeneity to
spurious moderators found by chance (Schmidt & Hunter, 2014).
7.7 R code for calculating subgroup summary proportions, conduct sub-

group analysis, and recalculate the overall summary proportion
Based on the analysis above, we have developed the following generic code to help readers perform
subgroup analyses and compute (overall and within-group) summary proportions appropriately. In
order to select the appropriate computational option, readers need to develop a good understand-
ing of the nature of the data they are working with.
In the first situation, we do not assume a common between-study variance component across sub-
groups and thus do not pool within-group estimates of τ 2 . To allow us to examine the moderating
effect of a potential moderator variable with a mixed-effect model and to recalculate a new over-
all summary effect size using separate τ 2 within each subgroup, we first fit two separate random-
effects models within each subgroup and then combine the estimated statistics from each model
into a data frame. Finally, we fit a fixed-effect model to compare the two estimated logit trans-
formed proportions and calculate a new summary effect size. The generic code is provided here:
Assumption 1: Do not assume a common between-study variance component (do not pool
within-group estimates of between-study variance)
pes.subgroup1=rma(yi, vi, data=ies, subset=moderator=="subgroup1")
pes.subgroup2=rma(yi, vi, data=ies, subset=moderator=="subgroup2")
dat.diffvar=data.frame(estimate=c(pes.subgroup1$b, pes.subgroup2$b),
stderror=c(pes.subgroup1$se, pes.subgroup2$se),
moderator=c("subgroup1", "subgroup2"),
tau2=round(c(pes.subgroup1$tau2, pes.subgroup2$tau2), 3))
subganal.moderator=rma(estimate, sei=stderror, mods=~moderator,
method="FE", data=dat.diffvar)
pes.moderator=rma(estimate, sei=stderror, method="FE", data=dat.diffvar)
pes.moderator=predict(pes.moderator)
print(pes.subgroup1) #display subgroup 1 summary effect size
print(pes.subgroup2) #display subgroup 2 summary effect size
print(subganal.moderator) #display subgroup analysis results
print(pes.moderator) #display recomputed summary effect size
37
pes.logit.subgroup1=rma(yi, vi, data=ies.logit, subset=moderator=="subgroup1")
pes.logit.subgroup2=rma(yi, vi, data=ies.logit, subset=moderator=="subgroup2")
pes.subgroup1=predict(pes.logit.subgroup1, transf=transf.ilogit)
pes.subgroup2=predict(pes.logit.subgroup2, transf=transf.ilogit)
dat.diffvar=data.frame(estimate=c(pes.logit.subgroup1$b, pes.logit.subgroup2$b),
stderror=c(pes.logit.subgroup1$se, pes.logit.subgroup2$se),
tau2=round(c(pes.logit.subgroup1$tau2,
pes.logit.subgroup2$tau2), 3))
pes.logit.moderator=rma(estimate, sei=stderror, method="FE", data=dat.diffvar)
pes.moderator=predict(pes.logit.moderator, transf=transf.ilogit)
print(pes.subgroup1); print(pes.logit.subgroup1) #display subgroup 1 summary effect size
print(pes.subgroup2); print(pes.logit.subgroup2) #display subgroup 2 summary effect size
pes.da.subgroup1=rma(yi,vi,data=ies.da, subset=moderator=="subgroup1")
pes.da.subgroup2=rma(yi,vi,data=ies.da, subset=moderator=="subgroup2")
pes.subgroup1=predict(pes.da.subgroup1, transf=transf.ipft.hm,targ=list(ni=dat$total))
pes.subgroup2=predict(pes.da.subgroup2, transf=transf.ipft.hm,targ=list(ni=dat$total))
dat.diffvar=data.frame(estimate=c(pes.da.subgroup1$b, pes.da.subgroup2$b),
stderror=c(pes.da.subgroup1$se, pes.da.subgroup2$se),
tau2=round(c(pes.da.subgroup1$tau2,
pes.da.subgroup2$tau2), 3))
pes.da.moderator=rma(estimate, sei=stderror, method="FE", data=dat.diffvar)
pes.moderator=predict(pes.da.moderator, transf=transf.ipft.hm, targ=list(ni=dat$total))
print(pes.subgroup1); print(pes.da.subgroup1) #display subgroup 1 summary effect size
38
print(pes.subgroup2); print(pes.da.subgroup2) #display subgroup 2 summary effect size
In the second situation, we assume a common between-study variance component across sub-
groups and pool within-group estimates of τ 2 . In this case, we can directly use the rma() com-
mand to fit a mixed-effect model to evaluate the moderating effect of a potential predictor. How-
ever, to allow us to calculate a new overall summary proportion using a pooled τ 2 across all stud-
ies, we still have to combine a new data frame containing statistics estimated in two random-
effects models. Once we have created the new data frame, we can calculate a new overall summary
effect based on the data frame using a fixed-effect model or a random-effects model (based on the
various factors we have discussed above, e.g., the conclusion one wishes to make, the true distribu-
tion of effect sizes, etc.).
Assumption 2: Assume a common between-study variance component (pool within-group
estimates of between-study variance)
subganal.moderator=rma(yi, vi, data=ies, mods=~moderator)
pes.subgroup1=rma(yi, vi, data=ies, mods=~moderator=="subgroup2")
pes.subgroup2=rma(yi, vi, data=ies, mods=~moderator=="subgroup1")
pes.subg.moderator=predict(subganal.moderator)
dat.samevar=data.frame(estimate=c((pes.subgroup1$b)[1], (pes.subgroup1$b)[1]),
stderror=c((pes.subgroup2$se)[1], (pes.subgroup2$se)[1]),
tau2=subganal.moderator$tau2)
pes.moderator=rma(estimate, sei=stderror,
method="FE or other RE estimators",
data=dat.samevar)
pes.moderator=predict(pes.moderator)
print(pes.subg.moderator[study label 1]) #display subgroup 1 summary effect size
print(pes.subg.moderator[study label 2]) #display subgroup 2 summary effect size
subganal.moderator=rma(yi, vi, data=ies.logit, mods=~moderator)
39
pes.logit.subgroup1=rma(yi, vi, data=ies.logit, mods=~moderator=="subgroup2")
pes.logit.subgroup2=rma(yi, vi, data=ies.logit, mods=~moderator=="subgroup1")
pes.subg.moderator=predict(subganal.moderator, transf=transf.ilogit)
dat.samevar=data.frame(estimate=
c((pes.logit.subgroup1$b)[1],(pes.logit.subgroup2$b)[1]),
stderror=
c((pes.logit.subgroup1$se)[1],(pes.logit.subgroup2$se)[1]),
pes.logit.moderator=rma(estimate, sei=stderror,
data=dat.samevar)
pes.moderator=predict(pes.logit.moderator, transf=transf.ilogit)
print(pes.subg.moderator[study lable 1]) #display subgroup 1 summary effect size
subganal.moderator=rma(yi, vi, data=ies.da, mods=~moderator)
pes.da.subgroup1=rma(yi, vi, data=ies.da, mods=~moderator=="subgroup2")
pes.da.subgroup2=rma(yi, vi, data=ies.da, mods=~moderator=="subgroup1")
pes.subg.moderator=predict(subganal.moderator,
transf=transf.ipft.hm,
targ=list(ni=dat$total))
dat.samevar=data.frame(estimate=c((pes.da.subgroup1$b)[1], (pes.da.subgroup2$b)[1]),
stderror=c((pes.da.subgroup1$se)[1], (pes.da.subgroup2$se)[1]),
pes.da.moderator=rma(estimate, sei=stderror,
data=dat.samevar)
pes.moderator=predict(pes.da.moderator, transf=transf.ipft.hm, targ=list(ni=dat$total))
40
To help readers better understand how to use the code, we will now illustrate its implementa-
tion with the running example. For demonstrative purposes, we will use the variable study design
(birth cohort vs. others) as the moderator and conduct the analysis with the logit transformation
under both situations and then compare the resulting estimates of the overall summary propor-
tions with that of the original one.
In the first situation, execute the following code:
pes.logit.birthcohort=rma(yi, vi, data=ies.logit,
subset=studydesign=="Birth cohort",
method="DL")
pes.logit.others=rma(yi, vi, data=ies.logit,
subset=studydesign=="Others",
method="DL")
pes.birthcohort=predict(pes.logit.birthcohort, transf=transf.ilogit, digits=5)
pes.others=predict(pes.logit.others, transf=transf.ilogit, digits=5)
dat.diffvar=data.frame(estimate=c(pes.logit.birthcohort$b, pes.logit.others$b),
stderror=c(pes.logit.birthcohort$se, pes.logit.others$se),
studydesign=c("Birth cohort", "Others"),
tau2=round(c(pes.logit.birthcohort$tau2,
pes.logit.others$tau2), 3))
subganal.studydesign=rma(estimate, sei=stderror, data=dat.diffvar,
mods=~studydesign, method="FE")
pes.logit.studydesign=rma(estimate, sei=stderror, method="FE", data=dat.diffvar)
pes.studydesign=predict(pes.logit.studydesign, transf=transf.ilogit)
print(pes.birthcohort, digits=6); print(pes.logit.birthcohort, digits=3)
print(pes.others, digits=6); print(pes.logit.others, digits=3)
print(subganal.studydesign, digits=3)
print(pes.studydesign, digits=6)
which outputs:
From the output above, we can derive that the summary effect estimates are 0.00035 (95% CI=0.00016,
0.00078), 0.00047 (95% CI=0.00034, 0.00065), and 0.00045 (95% CI=0.00034, 0.00061) for the
two subgroups and the overall group of studies, respectively. When we fit separate random-effects
models in the two subgroups, we decide to allow the amount of variance within each set of stud-
ies to be different, which results in two different within-group estimates of τ 2 (0.93 and 0.25 for
41
studies using the birth cohort design and other study designs, respectively). That means studies
within each subgroup share the same estimate of τ 2 . The results of the test of moderators reveals
that the difference between the two subgroup summary estimates is not significant (QM (1)=0.45,
p=0.51) despite the fact that the estimate of the second subgroup is larger than the first. Note
that the sum of the within-group heterogeneity across the subgroups in the fixed-effect model is
equal to QE (0)=0, p=1. This is because the within-group heterogeneity has been accounted for
in each subgroup (Q(df =5)=344.594, p<0.001; Q(df =10)=235.944, p<0.01, respectively) in the
random-effects model, thus there is no heterogeneity left to be accounted for (which is also the
definition of residual heterogeneity).
In the second situation, execute the following code:
subganal.studydesign=rma(yi, vi, data=ies.logit, mods=~studydesign, method="DL")
pes.logit.birthcohort=rma(yi, vi, data=ies.logit, mods=~studydesign=="Others",
method="DL")
42
pes.logit.others=rma(yi, vi, data=ies.logit, mods=~studydesign=="Birth cohort",
method="DL")
pes.subg.studydesign=predict(subganal.studydesign, transf=transf.ilogit)
dat.samevar=data.frame(estimate=
c((pes.logit.birthcohort$b)[1],(pes.logit.others$b)[1]),
stderror=
c((pes.logit.birthcohort$se)[1], (pes.logit.others$se)[1]),
tau2=subganal.studydesign$tau2)
pes.logit.studydesign=rma(estimate, sei=stderror, method="DL", data=dat.samevar)
pes.studydesign=predict(pes.logit.studydesign, transf=transf.ilogit)
print(subganal.studydesign, digits=4)
print(pes.subg.studydesign[1], digits=6)
print(pes.subg.studydesign[17], digits=6)
print(pes.studydesign, digits=6)
which outputs:
This output is fairly self-explanatory. From this output, we can derive that we have fitted a mixed-
effects model, meaning a random-effects model is used to combine studies within each subgroup
and a fixed-effect model is used to combine subgroups and produce the estimate for the summary
effect size. The amount of within-group heterogeneity across the two subgroups is assumed to be
the same (τ 2 =0.042 in this case). It is the combined estimate yielded by pooling the estimates
43
of the two within-group variance as displayed earlier (τ 2 =0.93 and τ 2 =0.25). Once we have the
pooled estimate, we then apply it to each study across the two subgroups, meaning every study
now shares the same estimate of τ 2 (i.e., 0.042). From the test of moderators section, we can de-
rive that the moderator study design does not have a moderating effect (QM (1)=0.92, p=0.34).
In other words, it cannot explain the true heterogeneity in the effect sizes. That is, when we di-
vide the included studies according to their study design, we will fail to find significant difference
between the two subgroups of effect sizes. This conclusion can also be supported by the result of
the test for residual heterogeneity: there is significant unexplained heterogeneity left between all
effect sizes in the data (QE (15)=580.54, p<0.01) after the study design has been added to the
mixed-effects model to examine its potential moderating effect, which can also explain why R2
shows 0%, suggesting that the study design can explain 0% of the between-study variance, namely
the true heterogeneity. It is worth noting that under the framework of the mixed-effect model,
the residual heterogeneity estimate here (QE (15)=580.54) is the sum of the two within-group
heterogeneity estimates we have obtained above in the random-effects model (Q(df =5)=344.59,
Q(df =10)=235.94, respectively). Finally, the estimates for the two subgroup summary propor-
tions and the overall summary proportion are displayed at the bottom of the output. They are
0.00034 (95% CI=0.0002, 0.00061), 0.00049 (95% CI=0.00032, 0.00074), and 0.00043 (95% CI=0.00031,
0.0006), respectively. There are several other points that are worth noting. When we code dummy
variables, the subset of studies coded as 0 in a dummy variable will function as the reference group
(represented by the intercept of the fitted mixed-effects regression model).The other subset of
studies coded as 1 will be compared against the reference group (in the running example, “birth
cohort” is the reference group). In fact, it makes no difference which subset is selected as the ref-
erence group from a statistical point of view. The estimate of the intercept (i.e., -7.97) is equal
to the logit-transformed summary effect estimate of the studies in the reference group (i.e., birth
cohort). The regression coefficient (i.e., the slope) is 0.35.
We can see that even though the conclusion drawn from the two computational models can be the
same (i.e., study design is not a significant moderating variable), when we calculate the summary
effect estimate for the overall set of studies, the estimates vary according to the τ 2 that is applied
in different situations (in the presence of subgroups vs. in the absence of subgroups). In general, if
the sample size of each subgroup is small, it is recommended to pool the separate τ 2 . In so doing
we can obtain a more accurate estimate of τ 2 . In contrast, if we decide not to pool them, we need
at least 5 studies in each subgroup to be able to yield a moderately stable estimate of τ 2 within
each subgroup (Borenstein et al., 2005).
44
7.8 R code for creating forest plots in the presence of subgroups
We have shown readers how to create forest plots without subgroups. We will now begin con-
structing forest plots with subgroups under different assumptions (a common between-study vari-
ance component vs. separate between-study variance components). Constructing forest plots using
metafor could be challenging even for experienced metafor users. Fortunately, we have obtained
the estimates for subgroup and overall summary proportions in the previous section, which are
needed to create our forest plot. We simply need to copy those numbers and paste them into the
forest plot being constructed. The following is the generic code to construct forest plots under the
first assumption (which also corresponds to the first assumption in the previous section):
Assumption 1: Do not assume common between-study variance component (use separate
within-group estimates of between-study variance).
ies.summary=summary(ies, ni=dat$total)
forest(ies.summary$yi, ci.lb=ies.summary$ci.lb, ci.ub=ies.summary$ci.ub,
rows=c(d:c, b:a))
ies.summary=summary(ies.logit, transf=transf.ilogit, ni=dat$total)
rows=c(d:c, b:a))
Option 3: the double arcsine transformation
ies.summary=summary(ies, transf=transf.ipft, ni=dat$total)
rows=c(d:c, b:a))
The code above merely builds the “bones” of a forest plot. More components need to be added to
the plot (e.g., text, headers, labels, etc.). We also have to manually adjust the appearance of the
plot to make it look prettier and more professional. Dividing a set of included studies into sev-
eral subgroups in a forest plot using metafor has to be done manually with the rows=argument.
Readers may have noticed that the parameters (a, b, c, and d denotes a particular position on the
Y -axis, respectively) in the argument are ordered from right to left. a specifies the vertical posi-
tion for plotting the first study in the first subgroup; b specifies the vertical position for plotting
45
the last study in the first subgroup; c specifies the vertical position for plotting the first study in
the second subgroup; d specifies the vertical position for plotting the last study in the second sub-
group. Mathematically speaking, b − a + 1 and d − c + 1 should be equal to the number of studies in
their corresponding subgroups. c and b do not need to be consecutive numbers. If we order these
parameters from left to right, studies will be displayed in reverse order with the first study being
displayed at the bottom of the plot and the last study being displayed at the top of all the stud-
ies.
To illustrate, we can execute the following code to create a forest plot using study design as mod-
erator.
ies.summary=summary(ies.logit, transf=transf.ilogit, ni=dat$total)
par(cex=1, font=6)
forest(ies.summary$yi,
ci.lb=ies.summary$ci.lb, ci.ub=ies.summary$ci.ub,
ylim=c(-5, 23),
xlim=c(-0.005, 0.005),
ilab=cbind(data=dat$cases, dat$total),
ilab.xpos=c(-0.0019, -0.0005),
ilab.pos=2,
rows=c(19:14, 8.5:-1.5),
at=c(seq(from=0, to=0.004, by=0.001)),
refline=pes.studydesign$pred,
main="",
xlab="Proportion(%)",
digits=4)
par(cex=1.2, font=7)
addpoly(pes.birthcohort$pred, ci.lb=pes.birthcohort$ci.lb, ci.ub=
pes.birthcohort$ci.ub, row=12.8, digits=5)
addpoly(pes.others$pred, ci.lb=pes.others$ci.lb, ci.ub=pes.others$ci.ub, row=-2.7,
digits=5)
addpoly(pes.studydesign$pred, ci.lb=pes.studydesign$ci.lb, ci.ub=
pes.studydesign$ci.ub, row=-4.6, digits=5)
par(cex=1.1, font=7)
text(-0.005, 21.8, pos=4, "Study")
46
text(c(-0.0026, -0.0014), 21.8, pos=4, c("Cases", "Total"))
text(0.0025, 21.8, pos=4, "Proportion [95% CI]")
text(-0.005, c(9.7, 20.2), pos=4, c("Others", "Birth cohort"))
par(cex=1, font=7)
text(-0.005, -4.6, pos=4, c("Overall"))
text(-0.005, 12.8, pos=4, c("Subgroup"))
text(-0.005, -2.7, pos=4, c("Subgroup"))
abline(h=-3.7)
which produces:
Notice that the overall summary proportion is 0.00045 (95% CI=0.00033, 0.00061) under the given
assumption, which is different than the one derived in the absence of subgroups (0.00042). As we
have mentioned before, it would be challenging and time-consuming to create a forest plot using
metafor. Recall that we recommend readers calculate summary proportions under the second as-
sumption so that the estimate of τ 2 can be more accurate, especially when a given meta-analysis
contains less than 5 studies in each subgroup. Fortunately, when we work under the second as-
sumption we can use the meta package to construct forest plots whose syntax for creating graphs
47
is highly user-friendly and very easy to learn.
Assumption 2: Assume common between-study variance component (pool within-group
estimates of between-study variance).
subganal.moderator=rma(yi, vi, data=ies/ies.logit/ies.da, mods=~moderator, method="DL")
pes.summary=metaprop(cases, total, authoryear, data=dat, sm="PRAW"/"PLO"/"PFT",
byvar=moderator,
tau.common=TRUE,
tau.preset=sqrt(subganal.moderator$tau2))
forest(pes.summary)
Using the combination of the argument mods= moderator in the rma() function and the argu-
ments byvar=moderator, tau.common=TRUE, and tau.preset=sqrt(subganal.moderator$tau2) in
the metaprop() function allows us to achieve our goal here. To construct a forest plot for the run-
ning example using study design as moderator, use the following code:
subganal.studydesign=rma(yi, vi, data=ies.logit, mods=~studydesign, method="DL")
pes.summary=metaprop(cases, total, authoryear, data=dat,
sm="PLO"
method.tau="DL",
method.ci="NAsm",
byvar=studydesign,
tau.common=TRUE,
tau.preset=sqrt(subganal.studydesign$tau2))
forest(pes.summary,
xlim=c(0,4),
pscale=1000,
rightcols=FALSE,
leftcols=c("studlab", "effect", "ci"),
leftlabs=c("Study", "Proportion", "95% C.I."),
text.random="Combined prevalence",
xlab="Prevalence of CC( )", smlab="",
weight.study="random", squaresize=0.5, col.square="navy",
col.diamond="maroon", col.diamond.lines="maroon",
pooled.totals=FALSE,
comb.fixed=FALSE,
48
fs.hetstat=10,
print.tau2=TRUE,
print.Q=TRUE,
print.pval.Q=TRUE,
print.I2=TRUE,
digits=2)
which returns the following forest plot:
Notice that the estimates of τ 2 are the same (0.4427) and the overall summary proportion and
its 95% CI have also changed (0.43; 95% CI=0.31, 0.6). Again, unless you have a large number of
studies in each subgroup or you have a solid reason to believe that the within-group heterogeneity
varies greatly across subgroups, calculating summary proportions and constructing forest plots
with a pooled τ 2 would suffice in most cases.
7.9 R code for conducting meta-regression
In cases where we want to evaluate the effect of a continuous moderator, the R code we would use
is identical to what we would use in a subgroup analysis. This can be achieved with the following
generic code:
49
metareg.moderator=rma(yi, vi, data=ies/ies.logit/ies.da, mods=~moderator)
As mentioned above, a combination of significant moderators can be regressed on effect size in a
single model. This can be achieved by adding the plus sign in the mods=argument. The generic
code is as follows:
metareg.moderators=rma(yi, vi, data=ies/ies.logit/ies.da,
mods=~moderatorA+moderatorB+moderatorC+...+moderatorZ)
As a final note, metafor and meta handle proportions equal to 0 or 1 differently. Metafor applies
the 0.5 adjustment for calculating the proportions and the sampling variances whereas meta does
not adjust the counts for computing the proportions themselves, but it does the usual 0.5 adjust-
ment for computing the sampling variances. This different handling of proportions and variances
would lead to small discrepancies between the results, but they are usually negligible.
7.10 R code for visualizing moderator analyses
The code to create scatter plots is partly different depending on which transformation option is
selected. Note that if we want to visualize subgroup analyses, we need to use dummy variables
to create scatter plots (e.g., variables labeled studesg in the running example). Using categorical
variables (e.g., studydesign in the example dataset), however, will create box plots.
wi=1/sqrt(ies$vi)
size=1+3*(wi-min(wi))/(max(wi)-min(wi))
plot(ies$dummyvar, ies$yi, cex=size)
wi=1/sqrt(ies.logit$vi)
plot(ies.logit$dummyvar, transf.ilogit(ies.logit$yi), cex=size)
plot(ies.logit$dummyvar, ies.logit$yi, cex=size) #y-axis unit: logit-transformed value
wi=1/sqrt(ies.da$vi)
50
plot(ies.da$dummyvar, transf.ipft.hm(ies.da$yi, targ=list(ni=dat$total)), cex=size)
plot(ies.da$dummyvar, ies.da$yi, cex=size) #y-axis unit: double-arcsine-transformed value
Using the running example, we can create scatter plots with regression lines and corresponding
95% CI bounds for study design with:
subganal.studesg=rma (yi, vi, data=ies.logit, mods=~studesg, method="DL")
plot(ies.logit$studesg, transf.ilogit(ies.logit$yi), cex=size, xlab="Study Design",
ylab="Proportion")
pes.studesg=predict(subganal.studesg, transf=transf.ilogit, newmods=c(0:2))
lines(0:2, pes.studesg$pred, col="navy")
lines(0:2, pes.studesg$ci.lb, lty="dashed", col="maroon")
lines(0:2, pes.studesg$ci.ub, lty="dashed", col="maroon")
which generates the scatter plot as shown below:
According to a visual inspection of the scatter plot, we can find that the slope of the estimated
regression line is neither completely horizontal nor very steep, suggesting a weak association be-
tween study design and the observed effects. In addition, nearly half of the studies fall outside of
the 95% CI bounds, implying that there might be one or more very important missing factors that
could better account for the heterogeneity in the effect sizes. But we are not certain as to whether
this relationship is significant unless we examine the output for the model. To generate the out-
put, run the following code:
51
print(subganal.studesg)
which outputs:
From this output, we can conclude that study design is not a significant moderator (QM (1)=0.92,
p=0.34), which is also supported by the insignificant regression coefficient (0.35; Z (15)=0.96,
p=0.021).
To create a scatter plot for sample size, execute:
subganal.size=rma(yi, vi, data=ies.logit, mods=~size, method="DL")
pes.size=predict(subganal.size, newmods=c(0:2), transf=transf.ilogit)
size=1+3 *(wi-min(wi))/(max(wi)-min(wi))
plot(ies.logit$size, transf.ilogit(ies.logit$yi), cex=size, xlab="Sample size",
ylab="Proportion")
lines(0:2, pes.size$pred, col="navy")
lines(0:2, pes.size$ci.lb, lty="dashed", col="maroon")
lines(0:2, pes.size$ci.ub, lty="dashed", col="maroon")
which generates the scatter plot as presented below:
In this case, the slope of the estimated regression line is much steeper. A visual inspection of this
scatter plot informs us that sample size appears to be negatively correlated with the observed pro-
portions. When the sample size is less than 100,000, the proportion is higher whereas when the
sample size is larger than 100,000, the proportion is lower. Again, missing factors results in a cer-
tain amount of omitted variable bias here. This is also confirmed by the mixed-effects model re-
52
sults:
Together, the results of the test of moderators (QM (1)=36.43, p<0.0001) as well as the significant
slope coefficient (−1.29; Z (15)=−6.04, p<0.0001) conform to our visual interpretation of the asso-
ciation between sample size and the observed effect sizes. In stark contrast with study design, the
R2 for sample size shows that 57.07% of the true heterogeneity in the observed effect size can be
accounted for by sample size.
The running example does not examine any continuous predictors. However, for illustrative pur-
poses, we can plot the observed effect sizes of the individual studies against the continuous vari-
able publication year based on a mixed-effects model:
metareg.year=rma(yi, vi, data=ies.logit, mods=~ year, method="DL")
53
pes.year=predict(metareg.year, newmods=c(1985:2020), transf=transf.ilogit)
plot(ies.logit$year, transf.ilogit(ies.logit$yi), cex=size, pch=1, las=1, xlab="Month",
ylab="Proportion")
lines(1985:2020, pes.year$pred, col="navy")
lines(1985:2020, pes.year$ci.lb, lty="dashed", col="maroon")
lines(1985:2020, pes.year$ci.ub, lty="dashed", col="maroon")
ids=c(1:17)
pos=c(1)
text(ies.logit$year[ids], transf.ilogit(ies.logit$yi)[ids], ids, cex=0.9, pos=pos)
which generates:
Note the use of the last three lines of code. They are used to label the studies (represented by the
circles) in the graph.
One might notice that the results presented in this section do not match those of Wu et al. ex-
actly. This is because Wu et al. (2012) calculated the overall and subgroup summary proportions
with the DL estimator, but they switched to the REML estimator to conduct subgroup analyses.
The reason is not stated in their report.
54
8 The issue of publication bias in meta-analyses of propor-
tions
8.1 Overview of publication bias in the context of meta-analyses of pro-

portions
One of the major threats to the validity of meta-analysis is publication bias. This is the tendency
to submit or accept a study depending on the direction or strength of its results; in other words,
compared with studies with positive and/or significant results, small studies reporting negative re-
sults and/or small effects are less likely to be published and subsequently included in a meta anal-
ysis (Dickerson, 1990; Quintana, 2015). Small studies have to have a more robust effect in order
to be published. Omitting unpublished studies from a review could lead to a biased estimate of
summary effect (Song et al., 2000), because the smaller a study, the larger the effect necessary for
the results to be found statistically significant (Sterne et al., 2000). Moreover, including published
small studies with large effects and not including negative studies at the same time could yield an
overestimation of true effect (Cuijpers, 2016).
Studies included in meta-analyses of proportions are observational and non-comparative, and thus
do not calculate significant levels for their results. This means that results reported in this kind of
studies cannot be interpreted as “positive/negative” or “desirable/undesirable.” Accordingly, sta-
tistical non-significance is unlikely to be an issue that may have biased publications (Maulik et al.,
2011). In practice, authors reporting low proportions (e.g., rare event rates) are just as likely to
have their work published as those reporting very high proportions (e.g., high cure rates). There-
fore, we believe that traditional publication bias modelling tools developed for randomized con-
trolled trials (i.e., funnel plot asymmetry analyses) are less useful in the context of meta-analyses
of observational studies (detailed explanation is provided below). However, authors of meta-analyses
of proportions keep using these tests and conclude that no publication bias is detected in their
studies when a significant relationship between study size and study effect is not found by the
tests.
8.2 Detecting publication bias through visual inspection of funnel plots

in meta-analyses of randomized controlled trials (RCTs)
The first step of exploring publication bias is to create a visual tool known as a funnel plot (Light
& Pillerner, 1984). Estimates of effect size are plotted on a funnel plot as circles, the distribution
55
of which represents the association between study effect and study size (Sterne et al., 2000). A
funnel plot is essentially a scatter plot of study effect estimates on the x-axis (in the case of a
meta-analysis of proportions, the logit- or double-arcsine- transformed proportion) against some
measure of study size on the y-axis (usually the standard error). In other words, on a funnel plot,
the location of a circle is plotted according to the sample size or precision (as precision depends
largely on sample size) of the corresponding study. A vertical line is situated at the value of the
(transformed) summary effect on the funnel plot. When a study is larger, it is more precise, and
thus its effect size is more similar to the summary effect size and it has a lower standard error;
when a study is smaller, it has less precision, and thus its effect size differs more from the sum-
mary effect size and it has a larger standard error. The result of this is, circles representing smaller
studies are broadly spread towards the bottom of the funnel plot and digress further from the cen-
ter line, whereas circles representing larger studies are distributed more narrowly towards the up-
per area of the graph, symmetrically clustered around the vertical line due to random sampling er-
rors (Sterne et al., 2000; Anzures-Cabrera & Higgins, 2010). In general, in the presence of publica-
tion bias, studies with null or undesired results will not be present (Littell et al., 2008), and thus
many circles at the bottom of a funnel plot and a few in the middle tend to be absent, making
the plot asymmetrical (Borenstein et al., 2005; Petrie et al., 2003). The empty area where circles
are not present could appear on either side of the center line, depending on the desired direction
of the effect. A funnel plot also consists of two limit lines indicating the 95% CI around the sum-
mary effect size, which can serve to visualize the extent of heterogeneity in true effects (Anzures-
Cabrera & Higgins, 2010): the more circles that lie beyond the two limit lines, the more likely it is
that heterogeneity is high.
8.3 An important caveat: Funnel plot asymmetry does not equal to pub-
lication bias
It is crucial for readers to know that the presence of funnel plot asymmetry does not necessar-
ily indicate publication bias (Hunter et al. 2014). Asymmetry may also arise due to numerous
mechanisms other than publication bias and heterogeneity (Egger et al., 1997), such as hetero-
geneity in true effects as we have mentioned above, English language bias (i.e., English speaking
researcher fail to include small studies that have been published in foreign languages), citation
bias (i.e., researchers failing to locate small studies in the search for relevant studies because they
are quoted less frequently in papers), etc., all of which can make the inclusion of small studies
56
less likely. However, when a small study shows a stronger effect due to various reasons (e.g., poor
study methodology), its chance of getting published is increased (Schwarzer et al., 2015). This
tendency, known as the “small-study effect”, can also contribute to funnel plot asymmetry (Sterne
et al., 2000). Visual inspection of funnel plots, however, might not allow one to draw concrete in-
terpretations and may even lead to misleading conclusions when such plots do not show a distinct
pattern suggesting evident publication bias (Card, 2015). In their study, Terrin et al. (2005) found
that participants (medical researchers) erroneously interpreted around 50% of funnel plots with
different levels of asymmetry as being affected or unaffected by publication bias by looking at the
graphs. This is not surprising because a funnel plot may not look very asymmetrical in the pres-
ence of publication bias, whereas it may not look very symmetrical in the absence of bias. Sterne
et al. (2000) suggested that funnel plots should be employed as a generic tool to identify small-
study effects instead of a means of diagnosing specific types of bias. In their study, Hunter et al.
(2014) examined the utility of funnel plots in detecting publication bias in meta-analyses of pro-
portions. They concluded that traditionally constructed funnel plots using standard error (SE)
or the inverse of SE on the y-axis may be a potentially misleading method for the detection of
publication bias because these methods are suspected to over-estimate the degree of funnel plot
asymmetry in cases where observed proportions are extreme. They suggested using sample size
as the measure of precision on the y-axis in dealing with extremely low or high proportions. In
summary, the interpretation of a funnel plot has an inherent degree of subjectivity, which makes it
often problematic (Vevea & Woods, 2005).
8.4 Detecting publication bias with formal tests: rank correlation test,
Egger’s regression test, and trim-and-fill
A more objective way to assess funnel plot asymmetry and unearth publication bias is to assess
the relationship between effect size and its precision (e.g., standard error) using linear regression
analysis. The presence of a strong relationship between these provides evidence of asymmetry in
the funnel plot and suggests the possibility of publication bias, while the absence of an relation-
ship suggests otherwise (Card, 2015; Rendina-Gobioff & Kromrey, 2004; Sterne et al., 2000). Two
variants of this linear regression analysis approach that are commonly employed are the rank cor-
relation test (Begg & Mazumdar, 1994) and Egger’s regression test (Egger et al., 1997). The rank
correlation test examines if observed study effects and their sampling variances are significantly
associated. A significant degree of correlation may suggest publication bias. A major limitation of
57
this test is its highly variable power. Generally, the test is quite powerful for large meta-analyses
involving more than 75 studies. Nonetheless, the power will become moderate when it is employed
in small meta-analyses with fewer than 25 studies (Rothstein et al., 2006). Thus, a non-significant
test result cannot be taken as evidence of a lack of publication bias when the meta-analysis is
small. Egger’s regression test is a weighted linear regression test in which the standardized ef-
fect estimate (i.e., the quotient of effect size and its standard error) is regressed against precision
(i.e., the inverse of the standard error). In other words, we can assess the publication bias by us-
ing precision to predict the standardized effect, with a significant result suggesting the possible
presence of publication bias. This test allows us to better detect publication bias in small meta-
analyses unless these studies are based on a small number of small studies. When publication bias
is suspected, the trim-and-fill method (Duval & Tweedie, 2000) can be used to estimate the num-
ber of missing studies that might exist in a meta-analysis and to determine where they might fall
on a funnel plot and visualize them in an attempt to increase the plot’s symmetry. Most impor-
tantly, it is able to add these missing studies to the analysis, recalculate the summary effect size,
and yield the best estimate of the unbiased summary effect size (Borenstein et al., 2005). It can
thus be used as a sensitivity analysis to assess the difference between the trim-and-fill-adjusted
and observed estimates; if the difference is negligible, the validity of the estimated summary effect
size will prove robust (Duval, 2005). However, the trim-and-fill performs poorly when study effects
are heterogeneous (Terrin et al., 2003). In addition, authors of meta-analyses of proportions ought
to be aware that the trim-and-fill approach depends strictly on the assumption that all missing
studies are those with the most negative or undesirable effect sizes. This assumption may be ques-
tionable when trim-and-fill is applied to meta-analyses of proportions with small studies because
it is possible that the effect sizes in such studies are actually large or small for a particular reason
and thus should not be considered “missing.” Put differently, a gap in the left-hand or right-hand
corner of a funnel plot may exist due to a particular reason rather than publication bias. Violat-
ing this assumption can lead to over-correction as pointed out by Vevea & Woods (2005).
8.5 An important caveat: A significant p-value is not indicative of the

presence of publication bias
Current methods for detecting publication bias and gauging its effects are based on the following
assumptions: (a) Large studies are published preferentially regardless of results. (b) Small studies
are unlikely to be published unless they have large effects. (c) Studies with a medium sample size
58
that have significant results are published. In other words, as the sample size of a study decreases,
the more likely the study is affected by publication bias (Borenstein, et al., 2005). As we can see,
traditional approaches of modelling publication bias, such as the aforementioned trim-and-fill, the
rank correlation test, and Egger’s regression model, as well as the more sophisticated weighted se-
lection approaches (e.g., Vevea & Hedges, 1995; Vevea & Woods, 2005) have all taken the assump-
tion that the likelihood of a study getting published depends on its sample size, statistical signif-
icance, or the direction of its results (Coburn & Vevea, 2015). Although it has been confirmed by
empirical research that statistical significance plays a dominant role in the publication of studies
(Preston et al., 2004), this is not entirely the case. In fact, the underlying publication selection
process across different fields is far more convoluted. Cooper et al. (1997) have demonstrated that
the decision as to whether or not to publish a study is influenced by a variety of criteria or “fil-
ters” set by journal editors and reviewers regardless of methodological quality and significance,
including but not limited to, the source of funding for research, social preferences related to race
and gender at the time when a research study is conducted, and even research findings that chal-
lenge previously held beliefs. In practice, authors of meta-analyses of proportions have employed
these methods in an attempt to detect publication bias. However, due to the unique nature of
studies included in meta-analyses of proportions (i.e., that they are non-comparative studies), the
p-value may not be a concern in the publication selection process, as a result of which these tradi-
tional methods may not be able to fully explain the asymmetric distribution of effect sizes on fun-
nel plots. Since the traditional methods fail to capture the full complexity of the selection process,
it is also possible that they may fail to identify publication bias in meta-analyses of proportions as
publication bias in non-comparative studies may arise for reasons independent of a lack of signifi-
cance. Hence, any conclusions regarding the presence of publication bias based on these methods
should be drawn with caution.
8.6 R code for creating funnel plots and perform asymmetry tests
As we have discussed earlier, when the goal is estimation of a single summary proportion rather
than a comparison of treatments, interventions, or methods, then publication bias is not really
pertinent. One can, of course, generate funnel plots and use tests (e.g., Egger test) to examine
whether the distribution of effect size estimates follows what one would ordinarily anticipate (i.e.,
less variation with higher number of studies constructs a roughly symmetric dispersion about the
mean) and to detect whether the small-study effect is present. Past that, the exercise is not very
59
informative.
The generic code to construct funnel plots is as follows:
X-axis scale: logit or double-arcsine transformed value
funnel(pes.logit or pes.da)
X-axis scale: proportion
funnel(pes)
funnel(pes.logit, atransf=transf.ilogit)
funnel(pes.da, atransf=transf.ipft.hm, targ=list(ni=dat$total))
To create a funnel plot for the running example, we would execute the following code:
funnel(pes.logit, yaxis="sei")
which produces:
There is clear evidence of heterogeneity and funnel plot asymmetry. There is also an indication of
small-study effect, even though the effect is not very evident. If we follow the suggestion of Hunter
et al. (2014) and use sample size as the measure of precision, we can change the argument yaxis=
“sei” to yaxis=“ni” in order to see if the asymmetry is induced by the method of funnel plot con-
struction. The new funnel plot is shown below:
It is evident that small-study effects truly exist in this meta-analysis study. Unfortunately, the
large differences between the small studies and the big studies are not explained in the original
report.
Now we conduct a rank correlation test to examine the relationship between the sample size and
the observed effect size of each study using the following code and the result is as follows:
60
rank (pes/pes.logit/pes.da)
Despite clear evidence to the contrary shown in the funnel plot the rank correlation test fails to
find a significant relationship between sample size and effect size. The reason could be that the
rank correlation test has low power when examining a small number of studies (we have men-
tioned that a meta-analysis of less than 25 studies is considered small and the current meta-analysis
consists of 17 studies).
Egger’s regression test performs better than the rank correlation test when the number of included
studies is small. It is important to note that the traditional weighted Egger’s regression test (Eg-
ger et al., 1997) is no longer advocated by Egger et al. due to a lack of theoretical justification
(Rothstein, et al., 2005). Therefore we will conduct only the unweighted regression test, which is
obtained by executing the following generic code:
regtest(pes/pes.logit/pes.da, model="rma", predictor="sei")
The Egger test shows that the funnel plot is significantly asymmetrical:
61
References
I cannot show you the references until the tutorial is published in a scientific journal. If you are
really interested in one or more articles that I cited in this tutorial, feel free to contact me via
email. Thank you for your understanding!
62
View publication stats

Meta Prop

Uploaded by

Copyright:

Available Formats

Meta Prop

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Meta Prop

Uploaded by

Copyright:

Available Formats

See discussions, stats, and author profiles for this publication at: https://www.researchgate.

How to Conduct a Meta-Analysis of Proportions in R: A Comprehensive Tutorial

Preprint · June 2018

How to Conduct a Meta-Analysis of Proportions in R: A Comprehensive Tutorial View project

The user has requested enhancement of the downloaded file.

Meta-analysis of proportions is observational and non-comparative in nature. Rarely have

we seen a study or tutorial demonstrate how a meta-analysis of proportions should be per-

ducting a meta-analysis of proportions, in which a number of common practices that possibly

and within-group estimates of between-study variance into consideration when calculating

formulas is kept to a minimum, and a published meta-analysis of proportions is used as an

vided to facilitate learning (watch the video at wangnaike.com).

purchase expensive statistical software like Comprehensive Meta-Analysis (CMA) to perform

to non-significant results) is not pertinent in the context of meta-analysis of proportions.

2 Setting up the R environment 6

2.1 R and RStudio . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

2.2 Setting up the working directory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

3 Getting a sense of the data 7

3.1 Illustrative example: Prevalence and epidemiological characteristics of congenital

cataract (Wu et al., 2016) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

3.2 Preferred formats for organizing data in Excel . . . . . . . . . . . . . . . . . . . . . . 8

4 Calculating effect sizes 10

4.1 Fixed-effect and random-effects model . . . . . . . . . . . . . . . . . . . . . . . . . . 10

4.2 Transformation of proportions: the logit transformation and the double-arcsine

4.3 R code for calculating summary effect sizes . . . . . . . . . . . . . . . . . . . . . . . 15

5 Identifying and quantifying heterogeneity 19

5.1 Overview of heterogeneity: the between-study variance (τ 2 ) . . . . . . . . . . . . . . 19

5.2 Test for heterogeneity (Q) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

5.3 Quantifying heterogeneity (I 2 ) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

tests and statistics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

6 Creating and interpreting forest plots 24

6.1 Visual inspection of forest plots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

6.2 Identifying outliers with formal tests . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

6.3 R code for creating forest plots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

6.4 R code for identifying outlying and influential studies . . . . . . . . . . . . . . . . . 28

6.5 R code for removing outlying studies . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

7 Explaining heterogeneity with moderator analyses 33

7.1 Overview of moderator analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

7.2 Subgroup analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

7.5 Visualizing moderator analysis: Scatter plots . . . . . . . . . . . . . . . . . . . . . . 36

and recalculate the overall summary proportion . . . . . . . . . . . . . . . . . . . . . 37

7.8 R code for creating forest plots in the presence of subgroups . . . . . . . . . . . . . . 45

7.9 R code for conducting meta-regression . . . . . . . . . . . . . . . . . . . . . . . . . . 49

7.10 R code for visualizing moderator analyses . . . . . . . . . . . . . . . . . . . . . . . . 50

8 The issue of publication bias in meta-analyses of proportions 55

8.1 Overview of publication bias in the context of meta-analyses of proportions . . . . . 55

of randomized controlled trials (RCTs) . . . . . . . . . . . . . . . . . . . . . . . . . . 55

sion test, and trim-and-fill . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

2001). Meta-analyses of proportions synthesize a one-dimensional binomial measure known as the

fixed-effect or random-effects model (Wang & Liu, 2016).

Studies included in meta-analyses of proportions are observational and non-comparative (i.e.,

substance use disorder in samples of homeless veterans in various urban areas.

Meta-analyses of proportions are commonly conducted in a variety of prominent fields, such as

decision models (Hunter et al., 2014).

nipulating data, especially in the case of meta-analyzing proportions.

signed to perform meta-analyses, such as Comprehensive Meta-Analysis (Borenstein et al., 2005)

data should be transformed in R.

2 Setting up the R environment

2.1 R and RStudio

be downloaded for free from the Comprehensive R Archive Network, (https://cran.r-project.org/).