The University of Manchester Research
Genome-wide association study identifies 74 loci
associated with educational attainment
DOI:
10.1038/nature17671
Document Version
Accepted author manuscript
Link to publication record in Manchester Research Explorer
Citation for published version (APA):
Okbay, A., Beauchamp, J. P., Fontana, M. A., Lee, J. J., Pers, T. H., Rietveld, C. A., Turley, P., Chen, G-B.,
Emilsson, V., Meddens, S. F. W., Oskarsson, S., Pickrell, J. K., Thom, K., Timshel, P., de Vlaming, R., Abdellaoui,
A., Ahluwalia, T. S., Bacelis, J., Baumbach, C., ... LifeLines Cohort Study (2016). Genome-wide association study
identifies 74 loci associated with educational attainment. Nature -London-, 533(7604), 539-42.
https://doi.org/10.1038/nature17671
Published in:
Nature
Citing this paper
Please note that where the full-text provided on Manchester Research Explorer is the Author Accepted Manuscript
or Proof version this may differ from the final Published version. If citing, it is advised that you check and use the
publisher's definitive version.
LETTER
Genome-wide association study identifies 74 loci
associated with educational attainment
A list of authors and their affiliations appears at the end of the paper.
Our meta-analysis identified 74 approximately independent genome-wide significant loci. For each locus, we define the ‘lead SNP’ as the SNP
in the genomic region that has the smallest P value (Supplementary
Information section 1.6.1). Figure 1 shows a Manhattan plot with
the lead SNPs highlighted. This includes the three SNPs that reached
genome-wide significance in the discovery stage of our previous GWAS
meta-analysis of educational attainment1. The quantile–quantile (Q–Q)
plot of the meta-analysis (Extended Data Fig. 1) exhibits inflation
(λGC = 1.28), as expected under polygenicity3.
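The inflation factor λGC is conventionally computed as the median observed χ² statistic divided by the median of a χ² distribution with one degree of freedom (about 0.455). The following is a minimal sketch of that convention, assuming two-sided P values as input; the function names and the toy P values are illustrative and are not taken from the study's pipeline.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()

def chi2_from_p(p):
    """Convert a two-sided P value to a 1-d.f. chi-square statistic (z squared)."""
    z = _STD_NORMAL.inv_cdf(1.0 - p / 2.0)
    return z * z

def genomic_inflation(p_values):
    """Median observed chi-square divided by the null median of chi-square (1 d.f.)."""
    null_median = _STD_NORMAL.inv_cdf(0.75) ** 2  # about 0.4549
    observed_median = median(chi2_from_p(p) for p in p_values)
    return observed_median / null_median

# Evenly spaced P values mimic the null hypothesis, so lambda should be close to 1.
null_p = [(i + 0.5) / 1000 for i in range(1000)]
lambda_null = genomic_inflation(null_p)
```

A value such as 1.28 means the median test statistic is 28% larger than expected under the null, which for a highly polygenic trait can arise from true signal rather than confounding.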
Extended Data Fig. 2 shows the estimated effect sizes of the lead
SNPs. The estimates range from 0.014 to 0.048 standard deviations
per allele (2.7 to 9.0 weeks of schooling), with incremental R² in the
range 0.01% to 0.035%.
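The conversion from standard deviations to weeks of schooling can be checked with simple arithmetic. The sketch below assumes the EduYears standard deviation of 3.6 years reported in the text and 52 weeks per year. The variance-explained formula 2p(1 − p)β² is the usual expression for a single additive SNP with allele frequency p and a standardized phenotype; the allele frequency of 0.5 is a hypothetical value chosen for illustration.

```python
WEEKS_PER_YEAR = 52   # assumption: calendar weeks per year of schooling
EDUYEARS_SD = 3.6     # years; s.d. of EduYears reported in the text

def sd_to_weeks(beta_sd):
    """Convert a per-allele effect in s.d. units to weeks of schooling."""
    return beta_sd * EDUYEARS_SD * WEEKS_PER_YEAR

def incremental_r2(beta_sd, allele_freq):
    """Variance explained by one additive SNP: 2p(1-p) * beta^2."""
    return 2.0 * allele_freq * (1.0 - allele_freq) * beta_sd ** 2

low_weeks = sd_to_weeks(0.014)           # about 2.6 weeks
high_weeks = sd_to_weeks(0.048)          # about 9.0 weeks
r2_example = incremental_r2(0.014, 0.5)  # about 9.8e-05, i.e. roughly 0.01%
```

The lower bound comes out at about 2.6 rather than the reported 2.7 weeks, presumably because the published effect sizes are rounded.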
To quantify the amount of population stratification in the GWAS
estimates that remains even after the stringent controls used by the
cohorts (Supplementary Information section 1.4), we used linkage-disequilibrium (LD) score regression4. The regression results indicate that ∼8% of the observed inflation in the mean χ² is due to bias
rather than polygenic signal (Extended Data Fig. 3a), suggesting that
stratification effects are small in magnitude. We also found evidence
for polygenic association signal in several within-family analyses,
although these are not powered for individual SNP association testing
(Supplementary Information section 2 and Extended Data Fig. 3b).
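For orientation, the LD score regression model of ref. 4 regresses each SNP's χ² statistic on its LD score. The symbols below follow the usual notation of that method and are not defined in this letter, so they should be read as assumptions: N is the sample size, M the number of SNPs, h² the SNP heritability, ℓ_j the LD score of SNP j, and a the contribution of confounding.

```latex
\mathbb{E}\left[\chi^2_j \mid \ell_j\right] = \frac{N h^2}{M}\,\ell_j + N a + 1,
\qquad
\text{ratio} = \frac{\text{intercept} - 1}{\overline{\chi^2} - 1}
```

Here the intercept is Na + 1, and the ratio expresses the share of the inflation in the mean χ² that is attributable to bias rather than polygenic signal; the ∼8% quoted above appears to correspond to this ratio.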
To further test the robustness of our findings, we examined the within-sample and out-of-sample replicability of SNPs reaching genome-wide significance (Supplementary Information sections 1.7–1.8). We
found that SNPs identified in the previous educational attainment
meta-analysis replicated in the new cohorts included here, and conversely, that SNPs reaching genome-wide significance in the new
cohorts replicated in the old cohorts. For the out-of-sample replication analyses of our 74 lead SNPs, we used the interim release of the
UK Biobank5 (UKB) (n = 111,349). As shown in Extended Data Fig. 4,
Educational attainment is strongly influenced by social and
other environmental factors, but genetic factors are estimated to
account for at least 20% of the variation across individuals1. Here
we report the results of a genome-wide association study (GWAS)
for educational attainment that extends our earlier discovery
sample1,2 of 101,069 individuals to 293,723 individuals, and a
replication study in an independent sample of 111,349 individuals
from the UK Biobank. We identify 74 genome-wide significant loci
associated with the number of years of schooling completed. Single-nucleotide polymorphisms associated with educational attainment
are disproportionately found in genomic regions regulating gene
expression in the fetal brain. Candidate genes are preferentially
expressed in neural tissue, especially during the prenatal period, and
enriched for biological pathways involved in neural development.
Our findings demonstrate that, even for a behavioural phenotype
that is mostly environmentally determined, a well-powered GWAS
identifies replicable associated genetic variants that suggest
biologically relevant pathways. Because educational attainment
is measured in large numbers of individuals, it will continue
to be useful as a proxy phenotype in efforts to characterize the
genetic influences of related phenotypes, including cognition and
neuropsychiatric diseases.
Educational attainment is measured in all main analyses as the
number of years of schooling completed (EduYears, n = 293,723,
mean = 14.3, s.d. = 3.6; Supplementary Information sections 1.1–1.2).
All GWAS were performed at the cohort level in samples restricted to
individuals of European descent whose educational attainment was
assessed at or above age 30. A uniform set of quality-control procedures was applied to the cohort-level summary statistics. In our GWAS
meta-analysis of ∼9.3 million SNPs from the 1000 Genomes Project,
we used sample-size weighting and applied a single round of genomic
control at the cohort level.
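Sample-size weighting is commonly implemented by combining cohort-level Z statistics with weights proportional to the square root of each cohort's sample size, and genomic control by deflating each cohort's statistics by its inflation factor. The sketch below illustrates that standard scheme; the cohort values are hypothetical, and whether the study's pipeline matches this in every detail is an assumption.

```python
from math import sqrt
from statistics import NormalDist

def apply_genomic_control(z, lambda_gc):
    """Deflate a Z statistic by a cohort's inflation factor (only if it exceeds 1)."""
    return z / sqrt(max(lambda_gc, 1.0))

def meta_z(z_scores, sample_sizes):
    """Sample-size-weighted Z: sum(w_i * z_i) / sqrt(sum(w_i^2)), w_i = sqrt(N_i)."""
    weights = [sqrt(n) for n in sample_sizes]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    return numerator / sqrt(sum(w * w for w in weights))

def two_sided_p(z):
    """Two-sided P value for a standard normal Z statistic."""
    return 2.0 * NormalDist().cdf(-abs(z))

# Hypothetical cohort-level results for one SNP (illustrative numbers only).
z_cohorts = [2.0, 1.5, 2.5]
n_cohorts = [10000, 40000, 90000]
z_meta = meta_z(z_cohorts, n_cohorts)
p_meta = two_sided_p(z_meta)
```

With these toy inputs, three cohorts that are individually unremarkable combine into a Z statistic above 3, which shows why pooling cohorts raises power.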
Figure 1 | Manhattan plot for EduYears associations (n = 293,723).
The x axis is chromosomal position, and the y axis is the significance on
a −log10 scale (two-tailed test). The black line shows the genome-wide
significance level (5 × 10−8). The red crosses are the 74 approximately
independent genome-wide significant associations (lead SNPs). The black
dots labelled with rs numbers are the three SNPs identified in ref. 1.
[Figure 1 image not reproduced. x axis: Chromosome (1–22); y axis: Significance of association (−log10[P value]); legend: Lead SNPs (n = 74), Rietveld et al. (2013) SNP, P value < 5 × 10−8.]
0 0 M O N T H 2 0 1 6 | VO L 0 0 0 | NAT U R E | 1
RESEARCH LETTER
Figure 2 | Genetic correlations between EduYears and other traits. Results from bivariate LD score regressions9: estimates of genetic correlation with
brain volume, neuropsychiatric, behavioural, and anthropometric phenotypes using published GWAS summary statistics. The error bars show the 95%
confidence intervals (CI).
[Figure 2 image not reproduced. y axis: Genetic correlation with EduYears (ticks from −0.5 to 1.0). Trait groups: Brain volume, Neuropsychiatric, Behavioural, Anthropometric. Traits shown: Accumbens, Caudate, Hippocampus, Pallidum, Putamen, Thalamus, Intracranial volume, Alzheimer's, Bipolar, Schizophrenia, Cognitive performance, Neuroticism, BMI, Height.]
72 out of the 74 lead SNPs have a consistent sign (P = 1.47 × 10−19), 52
are significant at the 5% level (P = 2.68 × 10−50), and 7 reach genome-wide significance in the UK Biobank data set (P = 1.41 × 10−42). For
comparison, the corresponding expected numbers, assuming each
SNP’s true effect size is its estimated effect adjusted for the winner’s
curse, are 71.4, 40.3, and 0.6 (Supplementary Information section
1.8.2). We also find out-of-sample replicability of our overall GWAS
results: the genetic correlation between EduYears in our meta-analysis sample and in the UKB data is 0.95 (s.e. = 0.021; Supplementary
Table 1.14).
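The three replication P values are consistent with one-sided binomial tests of the observed counts against chance rates of 0.5 (sign concordance), 0.05 (nominal significance) and 5 × 10−8 (genome-wide significance). That the authors used exactly these tests is an inference from the numbers; Supplementary Information section 1.8.2 remains the authority. A minimal sketch:

```python
from math import comb

def binomial_upper_tail(successes, trials, prob):
    """P(X >= successes) for X ~ Binomial(trials, prob)."""
    return sum(
        comb(trials, k) * prob ** k * (1.0 - prob) ** (trials - k)
        for k in range(successes, trials + 1)
    )

N_LEAD_SNPS = 74

p_sign = binomial_upper_tail(72, N_LEAD_SNPS, 0.5)      # consistent sign
p_nominal = binomial_upper_tail(52, N_LEAD_SNPS, 0.05)  # significant at 5%
p_gws = binomial_upper_tail(7, N_LEAD_SNPS, 5e-8)       # genome-wide significant
```

Each tail probability asks how likely it would be to see at least the observed number of replicating SNPs if none of the 74 associations were real.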
It is known that educational attainment, cognitive performance, and
many neuropsychiatric phenotypes are phenotypically correlated, and
several studies of twins find that the phenotypic correlations partly
reflect genetic overlap6–8 (Supplementary Information section 3.3.4).
Here we investigate genetic correlation using our GWAS results for
EduYears and published GWAS results for 14 other phenotypes, using
bivariate LD score regression9 (Supplementary Information section 3).
First, we estimated genetic correlations with EduYears. As shown in
Fig. 2, based on overall summary statistics for associated variants, we
find genetic covariance between increased educational attainment and
increased cognitive performance (P = 9.9 × 10−50), increased intracranial volume (P = 1.2 × 10−6), increased risk of bipolar disorder
(P = 7 × 10−13), decreased risk of Alzheimer’s (P = 4 × 10−4), and lower
neuroticism (P = 2.8 × 10−8). We also found positive, statistically significant, but very small, genetic correlations with height (P = 5.2 × 10−15)
and risk of schizophrenia (P = 3.2 × 10−4).
Second, we examined whether our 74 lead SNPs are jointly associated with each phenotype (Extended Data Fig. 5 and Supplementary
Information section 3.3.1). We reject the null hypothesis of no enrichment at P < 0.05 for 10 of the 14 phenotypes (all the exceptions are
subcortical brain structures).
Third, for each phenotype, we tested (in the published GWAS
results) each of our 74 lead SNPs (or its proxy) for association at a
significance threshold of 0.05/74. We found a total of 25 SNPs meeting this threshold for any of these phenotypes, but only one reaching
genome-wide significance. While these results provide suggestive
evidence that some of these SNPs may be associated with other
phenotypes, further testing of these associations in independent
cohorts is required (Supplementary Tables 3.2–3.4, Extended Data
Fig. 6).
To consider potential biological pathways, we first tested whether
SNPs in particular regions of the genome are implicated by our GWAS
results. Unlike what has been found for other phenotypes, SNPs in
regions that are DNase I hypersensitive in the fetal brain are more likely
to be associated with EduYears by a factor of ∼5 (95% confidence interval 2.89–7.07; Extended Data Fig. 7). Moreover, the 15% of SNPs residing in regions associated with histones marked in the central nervous
system (CNS) explain 44% of the heritable variation (Extended Data
Fig. 8a and Supplementary Table 4.4.2). This enrichment factor of
∼3 for CNS (P = 2.48 × 10−16) is greater than that of any of the other
nine tissue categories in this analysis.
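The enrichment factor can be read as the share of heritable variation explained by an annotation divided by the share of SNPs that fall in it. Using the rounded figures quoted above (the published estimate comes from the partitioned model itself, so this is only a consistency check):

```latex
\text{enrichment}
  = \frac{\text{proportion of heritable variation}}{\text{proportion of SNPs}}
  = \frac{0.44}{0.15} \approx 2.9
```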
Given that our findings disproportionately implicate SNPs in regions
regulating brain-specific gene expression, we examined whether genes
located near EduYears-associated SNPs show elevated expression in
neural tissue. We tested this hypothesis using data on mRNA transcript levels in the 37 adult tissues assayed by the Genotype-Tissue
Expression Project (GTEx)10. Remarkably, the 13 GTEx tissues that are
components of the CNS—and only those 13 tissues—show significantly
elevated expression levels of genes near EduYears-associated SNPs
(false discovery rate <0.05; Extended Data Fig. 8b and Supplementary
Table 4.5.2).
To investigate possible functions of the candidate genes from the
GWAS-implicated loci, we examined the extent of their overlap with
groups of genes (‘gene sets’) whose products are known or predicted to
participate in a common biological process11. We found 283 gene sets
significantly enriched by the candidate genes identified in our GWAS
(false discovery rate <0.05; Supplementary Table 4.5.1). To facilitate
interpretation, we used a standard procedure11 to group the 283 gene
sets into ‘clusters’ defined by degree of gene overlap. The resulting 34
clusters, shown in Fig. 3, paint a coherent picture, with many clusters
corresponding to stages of neural development: the proliferation of
neural progenitor cells and their specialization (the cluster npBAF
complex), the migration of new neurons to the different layers of the
cortex (forebrain development, abnormal cerebral cortex morphology),
the projection of axons from neurons to their signalling targets (axonogenesis, signalling by Robo receptor), the sprouting of dendrites and
their spines (dendrite, dendritic spine organization), and neuronal signalling and synaptic plasticity throughout the lifespan (voltage-gated
calcium channel complex, synapse part, synapse organization).
Many of our results implicate candidate genes and biological pathways that are active during distinct stages of prenatal brain development. To directly examine how the expression levels of candidate
genes identified in our GWAS vary over the course of development,
we used gene expression data from the BrainSpan Developmental
Transcriptome12. As shown in Extended Data Fig. 9, these candidate genes exhibit above-baseline expression in the brain throughout
life but especially higher expression levels in the brain during prenatal development (1.36 times higher prenatally than post-natally,
P = 6.02 × 10−8).
A summary overview of some promising candidate genes for
follow-up work is provided in Table 1.
We constructed polygenic scores13 to assess the joint predictive
power afforded by the GWAS results (Supplementary Information
section 5.2). Across our two holdout samples, the mean predictive
Figure 3 | Overview of biological annotation. Thirty-four clusters of
significantly enriched gene sets. Each cluster is named after one of its
member gene sets. The colour represents the permutation P value of
the member set exhibiting the most statistically significant enrichment.
Overlap between pairs of clusters is represented by an edge. Edge width
represents the Pearson correlation ρ between the two vectors of gene
membership scores (ρ < 0.3, no edge; 0.3 ≤ ρ < 0.5, thin edge; 0.5 ≤ ρ < 0.7,
intermediate edge; ρ ≥ 0.7, thick edge), where each cluster’s vector is the
vector for the gene set after which the cluster is named.
[Figure 3 network image not reproduced. Legend: Gene set P values (P < 1 × 10−4, P < 1 × 10−6, P < 1 × 10−8); Gene set overlap (Low, Medium, High). Cluster labels: Voltage-gated calcium channel complex; Dendritic spine organization; PPP1R2 PPI; Neuron recognition; Histone deacetylase activity (H3K14 specific); AP1S3 PPI; Dendrite; Vesicle membrane; Synapse organization; PSMD2 PPI; Synapse part; CALM2 PPI; Abnormal cerebral cortex morphology; Steroid hormone receptor activity; GRIK2 PPI; Axonogenesis; ENTHD1 PPI; Signaling by EGFR; Forebrain development; SHANK2 PPI; Regulation of cell development; BTBD2 PPI; npBAF complex; Histone acetyltransferase complex; Transcription cofactor activity; Protein kinase binding; CTNNB1 PPI; POU2F3 PPI; ESR1 PPI; Regulation of muscle cell differentiation; Signaling by Robo receptor; SRCAP PPI; RAC2 PPI; CBX8 PPI.]
power of a polygenic score constructed from all measured SNPs is
3.2% (P = 1.18 × 10−39; Supplementary Table 5.2 and Supplementary
Information section 5).
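A polygenic score is a weighted sum of a person's effect-allele counts, with weights taken from the GWAS estimates. The sketch below shows that construction and a simple squared correlation with the phenotype. All weights, genotypes and phenotype values are hypothetical, and the reported 3.2% is an incremental R² over covariates, which this simplified example does not model.

```python
def polygenic_score(allele_counts, weights):
    """Weighted sum of effect-allele counts (0, 1 or 2 per SNP) for one person."""
    return sum(count * weight for count, weight in zip(allele_counts, weights))

def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov * cov / (var_x * var_y)

# Hypothetical per-allele weights and genotypes (illustrative numbers only).
weights = [0.02, -0.01, 0.03]
genotypes = [[0, 1, 2], [2, 0, 1], [1, 1, 1], [2, 2, 0]]
edu_years = [16.0, 15.0, 14.0, 12.0]

scores = [polygenic_score(g, weights) for g in genotypes]
variance_explained = r_squared(scores, edu_years)
```

In practice the score is evaluated in a holdout sample that did not contribute to the GWAS weights, as in the two holdout samples described above, so that the estimate of predictive power is not inflated by overfitting.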
Studies of genetic analyses of behavioural phenotypes have been
prone to misinterpretation, such as characterizing identified associated variants as ‘genes for education’. Such characterization is not
correct for many reasons: educational attainment is primarily determined by environmental factors, the explanatory power of the individual SNPs is small, the candidate genes may not be causal, and
the genetic associations with educational attainment are mediated
by multiple intermediate phenotypes14. To illustrate this last point,
we studied mediation of the association between the all-SNPs polygenic score and EduYears in two of our cohorts. We found that
cognitive performance can statistically account for 23–42% of the
association (P < 0.001) and the personality trait ‘openness to experience’ for approximately 7% (P < 0.001; Supplementary Information
section 6).
It would also be a mistake to infer from our findings that the genetic
effects operate independently of environmental factors. Indeed, a
recent meta-analysis of twin studies found that genetic influences
on educational attainment are heterogeneous across countries and
birth cohorts15. We conducted exploratory analyses in the Swedish
Twin Registry to illustrate how environmental factors may amplify or
dampen the impact of genetic influences (Supplementary Information
section 7). We found that the predictive power of the all-SNPs polygenic
score is heterogeneous by birth cohort, with smaller explanatory power
in younger cohorts (Extended Data Fig. 10; see also Supplementary
Information section 7.4 for discussion of the contrast between these
Table 1 | Selected candidate genes implicated by bioinformatics analyses
Gene       SNP          Syndromic  Score  Top-ranking gene sets
TBR1       rs4500960    ID, ASD    6      Developmental biology, decreased brain size, abnormal cerebral cortex morphology
MEF2C      rs7277187    ID, ASD    5      ErbB signalling pathway, abnormal sternum ossification, regulation of muscle cell differentiation
ZSWIM6     rs61160187   –          5      Transcription factor binding, negative regulation of signal transduction, PI3K events in ErbB4 signalling
BCL11A     rs2457660    ASD        5      Dendritic spine organization, abnormal hippocampal mossy fibre morphology, SWI/SNF-type complex
CELSR3     rs11712056   SCZ        5      Dendrite morphogenesis, dendrite development, abnormal hippocampal mossy fibre morphology
MAPT       rs192818565  ID         5      Dendrite morphogenesis, abnormal hippocampal mossy fibre morphology, abnormal axon guidance
SBNO1      rs7306755    SCZ        5      Protein serine/threonine phosphatase complex
NBAS       rs12987662   –          5      –
NBEA       rs9544418    SCZ        4      Developmental biology, signalling by Robo receptor, dendritic shaft
SMARCA2    rs1871109    ID         4      –
MAP4       rs11712056   ASD        4      Developmental biology, signalling by Robo receptor, SWI/SNF-type complex
LINC00461  rs10061788   –          4      Decreased brain size, abnormal cerebral cortex morphology, abnormal hippocampal mossy fibre morphology
POU3F2     rs9320913    –          4      Dendrite morphogenesis, developmental biology, decreased brain size
RAD54L2    rs11712056   SCZ        4      Decreased brain size, SWI/SNF-type complex, nBAF complex
PLK2       rs2964197    –          4      Negative regulation of signal transduction, PI3K events in ErbB4 signalling
Fifteen candidate genes implicated most consistently across various analyses. To assemble this list, each gene in a DEPICT-defined locus (Supplementary Information section 4.5) was assigned a score equal to the number of criteria it satisfies out of ten (see Supplementary Table 4.1 for details). The DEPICT prioritization P value was used as the tiebreaker. SNP, the SNP in the gene’s locus with the lowest P value in the EduYears meta-analysis. Syndromic, which, if any, of three neuropsychiatric disorders have been linked to de novo mutations in the gene (Supplementary Information section 4.6). Top-ranking gene sets, DEPICT reconstituted gene sets of which the gene is a top-20 member (Supplementary Table 4.5.1). The three most significant gene sets are shown if more than three are available. ID, intellectual disability; ASD, autism spectrum disorder; SCZ, schizophrenia; ErbB, erythroblastosis oncogene B; PI3K, phosphatidylinositol-4,5-bisphosphate 3-kinase; SWI/SNF, SWitch/sucrose non-fermentable; nBAF, neuronal BRG1- or HRBM-associated factors.
results and findings from a seminal twin study that estimated educational attainment heritability by birth cohort16).
Online Content Methods, along with any additional Extended Data display items and
Source Data, are available in the online version of the paper; references unique to
these sections appear only in the online paper.
Received 24 June 2015; accepted 16 March 2016.
Published online xx xx 2016.
1. Rietveld, C. A. et al. GWAS of 126,559 individuals identifies genetic variants associated with educational attainment. Science 340, 1467–1471 (2013).
2. Rietveld, C. A. et al. Replicability and robustness of genome-wide-association studies for behavioral traits. Psychol. Sci. 25, 1975–1986 (2014).
3. Yang, J. et al. Genomic inflation factors under polygenic inheritance. Eur. J. Hum. Genet. 19, 807–812 (2011).
4. Bulik-Sullivan, B. K. et al. LD score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 47, 291–295 (2015).
5. Sudlow, C. et al. UK biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 12, e1001779 (2015).
6. Fowler, T., Zammit, S., Owen, M. J. & Rasmussen, F. A population-based study of shared genetic variation between premorbid IQ and psychosis among male twin pairs and sibling pairs from Sweden. Arch. Gen. Psychiatry 69, 460–466 (2012).
7. Tambs, K., Sundet, J. M., Magnus, P. & Berg, K. Genetic and environmental contributions to the covariance between occupational status, educational attainment, and IQ: a study of twins. Behav. Genet. 19, 209–222 (1989).
8. Thompson, L. A., Detterman, D. K. & Plomin, R. Associations between cognitive abilities and scholastic achievement: Genetic overlap but environmental differences. Psychol. Sci. 2, 158–165 (1991).
9. Bulik-Sullivan, B. et al. An atlas of genetic correlations across human diseases and traits. Nat. Genet. 47, 1236–1241 (2015).
10. Ardlie, K. G. et al.; GTEx Consortium. Human genomics. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science 348, 648–660 (2015).
11. Pers, T. H. et al. Biological interpretation of genome-wide association studies using predicted gene functions. Nat. Commun. 6, 5890 (2015).
12. Allen Institute for Brain Science. BrainSpan atlas of the developing human brain (2015) <http://www.brainspan.org/>.
13. Purcell, S. M. et al. Common polygenic variation contributes to risk of schizophrenia and bipolar disorder. Nature 460, 748–752 (2009).
14. Krapohl, E. et al. The high heritability of educational achievement reflects many genetically influenced traits, not just intelligence. Proc. Natl Acad. Sci. USA 111, 15273–15278 (2014).
15. Branigan, A. R., McCallum, K. J. & Freese, J. Variation in the heritability of educational attainment: An international meta-analysis. Social Forces 92, 109–140 (2013).
16. Heath, A. C. et al. Education policy and the heritability of educational attainment. Nature 314, 734–736 (1985).
17. Finucane, H. K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 47, 1228–1235 (2015).
Supplementary Information is available in the online version of the paper.
Acknowledgements This research was carried out under the auspices of
the Social Science Genetic Association Consortium (SSGAC). This research
has also been conducted using the UK Biobank Resource. This study was
supported by funding from the Ragnar Söderberg Foundation (E9/11),
the Swedish Research Council (421-2013-1061), The Jan Wallander and
Tom Hedelius Foundation, an ERC Consolidator Grant (647648 EdGe), the
Pershing Square Fund of the Foundations of Human Behavior, and the NIA/
NIH through grants P01-AG005842, P01-AG005842-20S2, P30-AG012810,
and T32-AG000186-23 to NBER, and R01-AG042568 to USC. We thank
S. Cunningham, N. Galla and J. Rashtian for research assistance. A full list of
acknowledgments is provided in the Supplementary Information.
Author Contributions Study design and management: D.J.B., D.Ce., T.E., M.J.,
P.D.K. and P.M.V. Quality control and meta-analysis: A.O., G.B.C., T.E., M.A.F.,
C.A.R. and T.H.P. Stratification: P.T., J.P.B., C.A.R. and J.Y. Genetic overlap: J.P.B.,
M.A.F., P.T. Biological annotation: J.J.L., T.E., T.H.P., J.K.P., J.H.B., J.P.B., L.F., V.E.,
G.A.M., M.A.F., S.F.W.M., P.Ti., R.A.P., R.d.V. and H.J.W. Prediction and mediation:
J.P.B., M.A.F. and J.Y. G×E: D.Co., S.F.L., K.O.L., S.O. and K.T. Replication in UKB:
M.A.F. and C.A.R. SSGAC advisory board: D.Co., T.E., A.H., R.F.K., D.I.L., S.E.M.,
M.N.M., G.D.S. and P.M.V. All authors contributed to and critically reviewed
the manuscript. Authors not listed above contributed to the recruitment,
genotyping, or data processing for the contributing components of the
meta-analysis. For a full list of author contributions, see Supplementary
Information section 8.
Author Information Results can be downloaded from the SSGAC website
(http://ssgac.org/Data.php). Data for our analyses come from many studies
and organizations, some of which are subject to an MTA, and are listed in the
Supplementary Information. Reprints and permissions information is available
at www.nature.com/reprints. The authors declare no competing financial
interests. Readers are welcome to comment on the online version of the
paper. Correspondence and requests for materials should be addressed
to D.J.B. (daniel.benjamin@gmail.com), D.Ce. (dac12@nyu.edu),
P.D.K. (p.d.koellinger@vu.nl) or P.M.V. (peter.visscher@uq.edu.au).
Aysu Okbay1,2,3*, Jonathan P. Beauchamp4*, Mark Alan Fontana5*,
James J. Lee6*, Tune H. Pers7,8,9,10*, Cornelius A. Rietveld1,2,3*, Patrick Turley4*,
Guo-Bo Chen11, Valur Emilsson12,13, S. Fleur W. Meddens3,14,15, Sven Oskarsson16,
Joseph K. Pickrell17, Kevin Thom18, Pascal Timshel8,19, Ronald de Vlaming1,2,3,
Abdel Abdellaoui20, Tarunveer S. Ahluwalia9,21,22, Jonas Bacelis23,
Clemens Baumbach24,25, Gyda Bjornsdottir26, Johannes H. Brandsma27,
Maria Pina Concas28, Jaime Derringer29, Nicholas A. Furlotte30,
Tessel E. Galesloot31, Giorgia Girotto32, Richa Gupta33, Leanne M. Hall34,35,
Sarah E. Harris36,37, Edith Hofer38,39, Momoko Horikoshi40,41,
Jennifer E. Huffman42, Kadri Kaasik43, Ioanna P. Kalafati44, Robert Karlsson45,
Augustine Kong26, Jari Lahti43,46, Sven J. van der Lee2, Christiaan de Leeuw14,47,
Penelope A. Lind48, Karl-Oskar Lindgren16, Tian Liu49, Massimo Mangino50,51,
Jonathan Marten42, Evelin Mihailov52, Michael B. Miller6, Peter J. van der Most53,
Christopher Oldmeadow54,55, Antony Payton56,57, Natalia Pervjakova52,58,
Wouter J. Peyrot59, Yong Qian60, Olli Raitakari61, Rico Rueedi62,63, Erika Salvi64,
Börge Schmidt65, Katharina E. Schraut66, Jianxin Shi67, Albert V. Smith68,69,
Raymond A. Poot27, Beate St Pourcain70,71, Alexander Teumer72,
Gudmar Thorleifsson26, Niek Verweij73, Dragana Vuckovic32,
Juergen Wellmann74, Harm-Jan Westra8,75,76, Jingyun Yang77,78, Wei Zhao79,
Zhihong Zhu11, Behrooz Z. Alizadeh53,80, Najaf Amin2, Andrew Bakshi11,
Sebastian E. Baumeister72,81, Ginevra Biino82, Klaus Bønnelykke21,
Patricia A. Boyle77,83, Harry Campbell66, Francesco P. Cappuccio84,
Gail Davies36,85, Jan-Emmanuel De Neve86, Panos Deloukas87,88,
Ilja Demuth89,90, Jun Ding60, Peter Eibich91,92, Lewin Eisele65, Niina Eklund58,
David M. Evans70,93, Jessica D. Faul94, Mary F. Feitosa95, Andreas J. Forstner96,97,
Ilaria Gandin32, Bjarni Gunnarsson26, Bjarni V. Halldórsson26,98,
Tamara B. Harris99, Andrew C. Heath100, Lynne J. Hocking101,
Elizabeth G. Holliday54,55, Georg Homuth102, Michael A. Horan103,
Jouke-Jan Hottenga20, Philip L. de Jager8,104,105, Peter K. Joshi66,
Astanand Jugessur106, Marika A. Kaakinen107, Mika Kähönen108,109,
Stavroula Kanoni87, Liisa Keltikangas-Järvinen43, Lambertus A. L. M. Kiemeney31,
Ivana Kolcic110, Seppo Koskinen58, Aldi T. Kraja95, Martin Kroh91,
Zoltan Kutalik62,63,111, Antti Latvala33, Lenore J. Launer112, Maël P. Lebreton15,113,
Douglas F. Levinson114, Paul Lichtenstein45, Peter Lichtner115,
David C. M. Liewald36,85, LifeLines Cohort Study†, Anu Loukola33,
Pamela A. Madden100, Reedik Mägi52, Tomi Mäki-Opas58,
Riccardo E. Marioni11,36,116, Pedro Marques-Vidal117, Gerardus A. Meddens118,
George McMahon70, Christa Meisinger25, Thomas Meitinger115,
Yuri Milaneschi59, Lili Milani52, Grant W. Montgomery119, Ronny Myhre106,
Christopher P. Nelson34,35, Dale R. Nyholt119,120, William E. R. Ollier56,
Aarno Palotie8,121,122,123,124,125, Lavinia Paternoster70, Nancy L. Pedersen45,
Katja E. Petrovic38, David J. Porteous37, Katri Räikkönen43,46, Susan M. Ring70,
Antonietta Robino126, Olga Rostapshova4,127, Igor Rudan66, Aldo Rustichini128,
Veikko Salomaa58, Alan R. Sanders129,130, Antti-Pekka Sarin124,131,
Helena Schmidt38,132, Rodney J. Scott55,133, Blair H. Smith134, Jennifer A. Smith79,
Jan A. Staessen135,136, Elisabeth Steinhagen-Thiessen89, Konstantin Strauch137,138,
Antonio Terracciano139, Martin D. Tobin140, Sheila Ulivi126, Simona Vaccargiu28,
Lydia Quaye50, Frank J. A. van Rooij2,141, Cristina Venturini50,51,
Anna A. E. Vinkhuyzen11, Uwe Völker102, Henry Völzke72, Judith M. Vonk53,
Diego Vozzi127, Johannes Waage21,22, Erin B. Ware79,142, Gonneke Willemsen20,
John R. Attia54,55, David A. Bennett77,78, Klaus Berger73, Lars Bertram143,144,
Hans Bisgaard21, Dorret I. Boomsma20, Ingrid B. Borecki95, Ute Bültmann145,
Christopher F. Chabris146, Francesco Cucca147, Daniele Cusi64,148, Ian J. Deary36,85,
George V. Dedoussis44, Cornelia M. van Duijn2, Johan G. Eriksson46,149,
Barbara Franke150, Lude Franke151, Paolo Gasparini32,126,152, Pablo V. Gejman129,130,
Christian Gieger24, Hans-Jörgen Grabe153,154, Jacob Gratten11,
Patrick J. F. Groenen155, Vilmundur Gudnason12,69, Pim van der Harst73,151,156,
Caroline Hayward42,157, David A. Hinds30, Wolfgang Hoffmann72,
Elina Hyppönen158,159,160, William G. Iacono6, Bo Jacobsson23,106,
Marjo-Riitta Järvelin161,162,163,164, Karl-Heinz Jöckel65, Jaakko Kaprio33,58,124,
Sharon L. R. Kardia79, Terho Lehtimäki165,166, Steven F. Lehrer167,168,
Patrik K. E. Magnusson45, Nicholas G. Martin169, Matt McGue6,
Andres Metspalu52,170, Neil Pendleton171,172, Brenda W. J. H. Penninx59,
Markus Perola52,58, Nicola Pirastu32, Mario Pirastu28, Ozren Polasek66,173,
Danielle Posthuma14,174, Christine Power160, Michael A. Province95,
Nilesh J. Samani34,35, David Schlessinger60, Reinhold Schmidt38,
Thorkild I. A. Sørensen9,70,175, Tim D. Spector50, Kari Stefansson26,69,
Unnur Thorsteinsdottir26,69, A. Roy Thurik1,3,176,177, Nicholas J. Timpson70,
Henning Tiemeier2,178,179, Joyce Y. Tung30, André G. Uitterlinden2,180,
Veronique Vitart42, Peter Vollenweider117, David R. Weir94, James F. Wilson42,66,
Alan F. Wright42, Dalton C. Conley181,182, Robert F. Krueger6,
George Davey Smith70, Albert Hofman2, David I. Laibson4, Sarah E. Medland48,
Michelle N. Meyer183, Jian Yang11,93, Magnus Johannesson184,
Peter M. Visscher11,93§, Tõnu Esko7,8,52,185§, Philipp D. Koellinger3,14,15§,
David Cesarini18,186§ & Daniel J. Benjamin5§
1Department of Applied Economics, Erasmus School of Economics, Erasmus
University Rotterdam, Rotterdam, 3062 PA, The Netherlands. 2Department of
Epidemiology, Erasmus Medical Center, Rotterdam, 3015 GE, The Netherlands.
3Erasmus University Rotterdam Institute for Behavior and Biology, Rotterdam, 3062
PA, The Netherlands. 4Department of Economics, Harvard University, Cambridge,
Massachusetts 02138, USA. 5Center for Economic and Social Research, University
of Southern California, Los Angeles, California 90089-3332, USA. 6Department of
Psychology, University of Minnesota Twin Cities, Minneapolis, Minnesota 55455,
USA. 7Division of Endocrinology and Center for Basic and Translational Obesity
Research, Boston Children’s Hospital, Boston, Massachusetts 2116, USA. 8Program
in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge,
Massachusetts 02142, USA. 9The Novo Nordisk Foundation Center for Basic
Metabolic Research, Section of Metabolic Genetics, University of Copenhagen,
Faculty of Health and Medical Sciences, Copenhagen 2100, Denmark. 10Statens
Serum Institut, Department of Epidemiology Research, Copenhagen 2300,
Denmark. 11Queensland Brain Institute, The University of Queensland, Brisbane,
QLD 4072, Australia. 12Icelandic Heart Association, Kopavogur 201, Iceland.
13Faculty of Pharmaceutical Sciences, University of Iceland, Reykjavík 107, Iceland.
14Department of Complex Trait Genetics, VU University, Center for Neurogenomics
and Cognitive Research, Amsterdam, 1081 HV, The Netherlands. 15Amsterdam
Business School, University of Amsterdam, Amsterdam, 1018 TV, The Netherlands.
16
Department of Government, Uppsala University, Uppsala 751 20, Sweden.
17New York Genome Center, New York, New York 10013, USA. 18Department of
Economics, New York University, New York, New York 10012, USA. 19Center for
Biological Sequence Analysis, Department of Systems Biology, Technical University
of Denmark, Lyngby 2800, Denmark. 20Department of Biological Psychology,
VU University Amsterdam, Amsterdam, 1081 BT, The Netherlands. 21COPSAC,
Copenhagen Prospective Studies on Asthma in Childhood, Herlev and Gentofte
Hospital, University of Copenhagen, Copenhagen 2820, Denmark. 22Steno Diabetes
Center, Gentofte 2820, Denmark. 23Department of Obstetrics and Gynecology,
Institute of Clinical Sciences, Sahlgrenska Academy, Gothenburg 416 85, Sweden.
24Research Unit of Molecular Epidemiology, Helmholtz Zentrum München, German
Research Center for Environmental Health, Neuherberg 85764, Germany. 25Institute
of Epidemiology II, Helmholtz Zentrum München, German Research Center for
Environmental Health, Neuherberg 85764, Germany. 26deCODE Genetics/Amgen
Inc., Reykjavik 101, Iceland. 27Department of Cell Biology, Erasmus Medical Center
Rotterdam, 3015 CN, The Netherlands. 28Istituto di Ricerca Genetica e Biomedica
U.O.S. di Sassari, National Research Council of Italy, Sassari 07100, Italy.
29Psychology, University of Illinois, Champaign, Illinois 61820, USA. 3023andMe,
Inc., Mountain View, California 94041, USA. 31Radboud Institute for Health Sciences,
Radboud University Medical Center, Nijmegen, 6500 HB, The Netherlands.
32Department of Medical, Surgical and Health Sciences, University of Trieste, Trieste
34100, Italy. 33Department of Public Health, University of Helsinki, 00014 Helsinki,
Finland. 34Department of Cardiovascular Sciences, University of Leicester, Leicester
LE3 9QP, UK. 35NIHR Leicester Cardiovascular Biomedical Research Unit, Glenfield
Hospital, Leicester LE3 9QP, UK. 36Centre for Cognitive Ageing and Cognitive
Epidemiology, University of Edinburgh, Edinburgh EH8 9JZ, UK. 37Centre for
Genomic and Experimental Medicine, Institute of Genetics and Molecular Medicine,
University of Edinburgh, Edinburgh EH4 2XU, UK. 38Department of Neurology,
General Hospital and Medical University Graz, Graz 8036, Austria. 39Institute for
Medical Informatics, Statistics and Documentation, General Hospital and Medical
University Graz, Graz 8036, Austria. 40Oxford Centre for Diabetes, Endocrinology &
Metabolism, University of Oxford, Oxford OX3 7LE, UK. 41Wellcome Trust Centre for
Human Genetics, University of Oxford, Oxford OX3 7BN, UK. 42MRC Human
Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh,
Edinburgh EH4 2XU, UK. 43Institute of Behavioural Sciences, University of Helsinki,
00014 Helsinki, Finland. 44Nutrition and Dietetics, Health Science and Education,
Harokopio University, Athens 17671, Greece. 45Department of Medical
Epidemiology and Biostatistics, Karolinska Institutet, Stockholm 171 77, Sweden.
46Folkhälsan Research Centre, 00014 Helsingfors, Finland. 47Institute for Computing
and Information Sciences, Radboud University Nijmegen, Nijmegen, 6525 EC,
The Netherlands. 48Quantitative Genetics, QIMR Berghofer Medical Research Institute,
Brisbane, QLD 4029, Australia. 49Lifespan Psychology, Max Planck Institute for
Human Development, Berlin 14195, Germany. 50Department of Twin Research and
Genetic Epidemiology, King’s College London, London SE1 7EH, UK. 51NIHR
Biomedical Research Centre, Guy’s and St. Thomas’ Foundation Trust, London
SE1 7EH, UK. 52Estonian Genome Center, University of Tartu, Tartu 51010, Estonia.
53Department of Epidemiology, University of Groningen, University Medical Center
Groningen, Groningen, 9700 RB, The Netherlands. 54Public Health Stream, Hunter
Medical Research Institute, New Lambton, NSW 2305, Australia. 55Faculty of Health
and Medicine, University of Newcastle, Newcastle, NSW 2300, Australia. 56Centre for
Integrated Genomic Medical Research, Institute of Population Health, The University
of Manchester, Manchester M13 9PT, UK. 57School of Psychological Sciences,
The University of Manchester, Manchester M13 9PL, UK. 58Department of Health,
THL-National Institute for Health and Welfare, 00271 Helsinki, Finland. 59Psychiatry,
VU University Medical Center & GGZ inGeest, Amsterdam, 1081 HL, The
Netherlands. 60Laboratory of Genetics, National Institute on Aging, Baltimore,
Maryland 21224, USA. 61Research Centre of Applied and Preventive Cardiovascular
Medicine, University of Turku, 20521 Turku, Finland. 62Department of Medical
Genetics, University of Lausanne, Lausanne 1005, Switzerland. 63Swiss Institute of
Bioinformatics, Lausanne 1015, Switzerland. 64Department of Health Sciences,
University of Milan, Milano 20142, Italy. 65Institute for Medical Informatics, Biometry
and Epidemiology, University Hospital of Essen, Essen 45147, Germany. 66Centre
for Global Health Research, The Usher Institute for Population Health Sciences
and Informatics, University of Edinburgh, Edinburgh EH8 9AG, UK. 67Division of
Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, Maryland
20892-9780, USA. 68Icelandic Heart Association, Kopavogur 201, Iceland.
69Faculty of Medicine, University of Iceland, Reykjavik 101, Iceland. 70MRC
Integrative Epidemiology Unit, University of Bristol, Bristol BS8 2BN, UK. 71School of
Oral and Dental Sciences, University of Bristol, Bristol BS1 2LY, UK. 72Institute for
Community Medicine, University Medicine Greifswald, Greifswald 17475, Germany.
73Department of Cardiology, University Medical Center Groningen, University of
Groningen, Groningen, 9700 RB, The Netherlands. 74Institute of Epidemiology and
Social Medicine, University of Münster, Münster 48149, Germany. 75Divisions of
Genetics and Rheumatology, Department of Medicine, Brigham and Women’s
Hospital, Harvard Medical School, Boston, Massachusetts 02115, USA. 76Partners
Center for Personalized Genetic Medicine, Boston, Massachusetts 02115, USA.
77Rush Alzheimer’s Disease Center, Rush University Medical Center, Chicago, Illinois
60612, USA. 78Department of Neurological Sciences, Rush University Medical
Center, Chicago, Illinois 60612, USA. 79Department of Epidemiology, University of
Michigan, Ann Arbor, Michigan 48109, USA. 80Department of Gastroenterology
and Hepatology, University of Groningen, University Medical Center Groningen,
Groningen, 9713 GZ, The Netherlands. 81Institute of Epidemiology and Preventive
Medicine, University of Regensburg, Regensburg D-93053, Germany. 82Institute of
Molecular Genetics, National Research Council of Italy, Pavia 27100, Italy.
83Department of Behavioral Sciences, Rush University Medical Center, Chicago,
Illinois 60612, USA. 84Warwick Medical School, University of Warwick, Coventry CV4
7AL, UK. 85Department of Psychology, University of Edinburgh, Edinburgh EH8 9JZ, UK.
86Saïd Business School, University of Oxford, Oxford OX1 1HP, UK. 87William Harvey
Research Institute, Barts and The London School of Medicine and Dentistry, Queen
Mary University of London, London EC1M 6BQ, UK. 88Princess Al-Jawhara
Al-Brahim Centre of Excellence in Research of Hereditary Disorders (PACER-HD),
King Abdulaziz University, Jeddah 21589, Saudi Arabia. 89The Berlin Aging Study II;
Research Group on Geriatrics, Charité – Universitätsmedizin Berlin, Germany,
Berlin 13347, Germany. 90Institute of Medical and Human Genetics, Charité-Universitätsmedizin, Berlin, Berlin 13353, Germany. 91German Socio-Economic
Panel Study, DIW Berlin, Berlin 10117, Germany. 92Health Economics Research
Centre, Nuffield Department of Population Health, University of Oxford, Oxford
OX3 7LF, UK. 93The University of Queensland Diamantina Institute, The Translational
Research Institute, Brisbane, QLD 4102, Australia. 94Survey Research Center,
Institute for Social Research, University of Michigan, Ann Arbor, Michigan 48109,
USA. 95Department of Genetics, Division of Statistical Genomics, Washington
University School of Medicine, St. Louis, Missouri 63018, USA. 96Institute of Human
Genetics, University of Bonn, Bonn 53127, Germany. 97Department of Genomics,
Life and Brain Center, University of Bonn, Bonn 53127, Germany. 98Institute of
Biomedical and Neural Engineering, School of Science and Engineering, Reykjavik
University, Reykjavik 101, Iceland. 99Laboratory of Epidemiology, Demography,
National Institute on Aging, National Institutes of Health, Bethesda, Maryland
20892-9205, USA. 100Department of Psychiatry, Washington University School of
Medicine, St. Louis, Missouri 63110, USA. 101Division of Applied Health Sciences,
University of Aberdeen, Aberdeen AB25 2ZD, UK. 102Interfaculty Institute for
Genetics and Functional Genomics, University Medicine Greifswald, Greifswald
17475, Germany. 103Manchester Medical School, The University of Manchester,
Manchester M13 9PT, UK. 104Program in Translational NeuroPsychiatric Genomics,
Departments of Neurology & Psychiatry, Brigham and Women’s Hospital, Boston,
Massachusetts 02115, USA. 105Harvard Medical School, Boston, Massachusetts
02115, USA. 106Department of Genes and Environment, Norwegian Institute of
Public Health, N-0403 Oslo, Norway. 107Department of Genomics of Common
Disease, Imperial College London, London, W12 0NN, UK. 108Department of Clinical
Physiology, Tampere University Hospital, 33521 Tampere, Finland. 109Department of
Clinical Physiology, University of Tampere, School of Medicine, 33014 Tampere,
Finland. 110Public Health, Medical School, University of Split, 21000 Split, Croatia.
111Institute of Social and Preventive Medicine, Lausanne University Hospital (CHUV),
Lausanne 1010, Switzerland. 112Neuroepidemiology Section, National Institute on
Aging, National Institutes of Health, Bethesda, Maryland 20892-9205, USA.
113Amsterdam Brain and Cognition Center, University of Amsterdam, Amsterdam,
1018 XA, The Netherlands. 114Department of Psychiatry and Behavioral Sciences,
Stanford University, Stanford, California 94305-5797, USA. 115Institute of Human
Genetics, Helmholtz Zentrum München, German Research Center for Environmental
Health, Neuherberg 85764, Germany. 116Medical Genetics Section, Centre for
Genomic and Experimental Medicine, Institute of Genetics and Molecular Medicine,
University of Edinburgh, Edinburgh, EH4 2XU, UK. 117Department of Internal
Medicine, Internal Medicine, Lausanne University Hospital (CHUV), Lausanne 1011,
Switzerland. 118Tema BV, Hoofddorp, 2131 HE, The Netherlands. 119Molecular
Epidemiology, QIMR Berghofer Medical Research Institute, Brisbane, QLD 4029,
Australia. 120Institute of Health and Biomedical Innovation, Queensland Institute
of Technology, Brisbane, QLD 4059, Australia. 121Analytic and Translational
Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston,
Massachusetts 02114, USA. 122The Stanley Center for Psychiatric Research, Broad
Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA. 123Psychiatric
& Neurodevelopmental Genetics Unit, Department of Psychiatry, Massachusetts
General Hospital, Boston, Massachusetts 02114, USA. 124Institute for Molecular
Medicine Finland (FIMM), University of Helsinki, Helsinki 00014, Finland.
125Department of Neurology, Massachusetts General Hospital, Boston,
Massachusetts 02114, USA. 126Medical Genetics, Institute for Maternal and Child
Health IRCCS “Burlo Garofolo”, Trieste 34100, Italy. 127Social Impact, Arlington,
Virginia 22201, USA. 128Department of Economics, University of Minnesota Twin
Cities, Minneapolis, Minnesota 55455, USA. 129Department of Psychiatry and
Behavioral Sciences, NorthShore University HealthSystem, Evanston, Illinois
60201-3137, USA. 130Department of Psychiatry and Behavioral Neuroscience,
University of Chicago, Chicago, Illinois 60637, USA. 131Public Health Genomics
Unit, National Institute for Health and Welfare, 00300 Helsinki, Finland. 132Research
Unit for Genetic Epidemiology, Institute of Molecular Biology and Biochemistry,
Center of Molecular Medicine, General Hospital and Medical University, Graz, Graz
8010, Austria. 133Information Based Medicine Stream, Hunter Medical Research
Institute, New Lambton, NSW 2305, Australia. 134Medical Research Institute,
University of Dundee, Dundee DD1 9SY, UK. 135Research Unit Hypertension and
Cardiovascular Epidemiology, Department of Cardiovascular Science, University
of Leuven, Leuven 3000, Belgium. 136R&D VitaK Group, Maastricht University,
Maastricht, 6229 EV, The Netherlands. 137Institute of Genetic Epidemiology,
Helmholtz Zentrum München, German Research Center for Environmental Health,
Neuherberg 85764, Germany. 138Institute of Medical Informatics, Biometry and
Epidemiology, Chair of Genetic Epidemiology, Ludwig Maximilians-Universität,
Munich 81377, Germany. 139Department of Geriatrics, Florida State University
College of Medicine, Tallahassee, Florida 32306, USA. 140Department of Health
Sciences and Genetics, University of Leicester, Leicester LE1 7RH, UK.
141Department of Internal Medicine, Erasmus Medical Center, Rotterdam, 3015 GE,
The Netherlands. 142Research Center for Group Dynamics, Institute for Social
Research, University of Michigan, Ann Arbor, Michigan 48104, USA. 143Platform for
Genome Analytics, Institutes of Neurogenetics & Integrative and Experimental
Genomics, University of Lübeck, Lübeck 23562, Germany. 144Neuroepidemiology
and Ageing Research Unit, School of Public Health, Faculty of Medicine, Imperial
College of Science, Technology and Medicine, London SW7 2AZ, UK. 145Department
of Health Sciences, Community & Occupational Medicine, University of Groningen,
University Medical Center Groningen, Groningen, 9713 AV, The Netherlands.
146Department of Psychology, Union College, Schenectady, New York 12308, USA.
147Istituto di Ricerca Genetica e Biomedica (IRGB), Consiglio Nazionale delle
Ricerche, c/o Cittadella Universitaria di Monserrato, Monserrato, Cagliari 09042, Italy.
148Institute of Biomedical Technologies, Italian National Research Council, Segrate
(Milano) 20090, Italy. 149Department of General Practice and Primary Health Care,
University of Helsinki, 00014 Helsinki, Finland. 150Departments of Human Genetics
and Psychiatry, Donders Centre for Neuroscience, Nijmegen, 6500 HB, The
Netherlands. 151Department of Genetics, University Medical Center Groningen,
University of Groningen, Groningen, 9700 RB, The Netherlands. 152Sidra,
Experimental Genetics Division, Sidra, Doha 26999, Qatar. 153Department of
Psychiatry and Psychotherapy, University Medicine Greifswald, Greifswald 17475,
Germany. 154Department of Psychiatry and Psychotherapy, HELIOS-Hospital
Stralsund, Stralsund 18437, Germany. 155Econometric Institute, Erasmus School of
Economics, Erasmus University Rotterdam, Rotterdam, 3062 PA, The Netherlands.
156Durrer Center for Cardiogenetic Research, ICIN-Netherlands Heart Institute,
Utrecht, 1105 AZ, The Netherlands. 157Generation Scotland, Centre for Genomics
and Experimental Medicine, Institute of Genetics and Molecular Medicine, University
of Edinburgh, Edinburgh EH4 2XU, UK. 158Centre for Population Health Research,
School of Health Sciences and Sansom Institute, University of South Australia,
Adelaide, SA 5000, Australia. 159South Australian Health and Medical Research
Institute, Adelaide, SA 5000, Australia. 160Population, Policy and Practice, UCL
Institute of Child Health, London WC1N 1EH, UK. 161Department of Epidemiology
and Biostatistics, MRC-PHE Centre for Environment & Health, School of Public
Health, Imperial College London, London W2 1PG, UK. 162Center for Life Course
Epidemiology, Faculty of Medicine, University of Oulu, 90014 Oulu, Finland. 163Unit
of Primary Care, Oulu University Hospital, 90029 Oulu, Finland. 164Biocenter Oulu,
University of Oulu, 90014 Oulu, Finland. 165Fimlab Laboratories, 33520 Tampere,
Finland. 166Department of Clinical Chemistry, University of Tampere, School of
Medicine, 33014 Tampere, Finland. 167Economics, NYU Shanghai, 200122 Pudong,
China. 168Policy Studies, Queen’s University, Kingston, Ontario K7L 3N6, Canada.
169Genetic Epidemiology, QIMR Berghofer Medical Research Institute, Brisbane,
QLD 4029, Australia. 170Institute of Molecular and Cell Biology, University of Tartu,
Tartu 51010, Estonia. 171Centre for Clinical and Cognitive Neuroscience, Institute
Brain Behaviour and Mental Health, Salford Royal Hospital, Manchester M6 8HD,
UK. 172Manchester Institute for Collaborative Research in Ageing, University of
Manchester, Manchester M13 9PL, UK. 173Faculty of Medicine, University of Split,
Split 21000, Croatia. 174Department of Clinical Genetics, VU Medical Centre,
Amsterdam, 1081 HV, The Netherlands. 175Institute of Preventive Medicine,
Bispebjerg and Frederiksberg Hospitals, The Capital Region, Frederiksberg 2000,
Denmark. 176Montpellier Business School, Montpellier 34080, France. 177Panteia,
Zoetermeer, 2715 CA, The Netherlands. 178Department of Psychiatry, Erasmus
Medical Center, Rotterdam, 3015 GE, The Netherlands. 179Department of Child
and Adolescent Psychiatry, Erasmus Medical Center, Rotterdam, 3015 GE, The
Netherlands. 180Department of Internal Medicine, Erasmus Medical Center,
Rotterdam, 3015 GE, The Netherlands. 181Department of Sociology, New York
University, New York, New York 10012, USA. 182School of Medicine, New York
University, New York, New York 10016, USA. 183Bioethics Program, Union Graduate
College – Icahn School of Medicine at Mount Sinai, Schenectady, New York 12308,
USA. 184Department of Economics, Stockholm School of Economics, Stockholm
113 83, Sweden. 185Department of Genetics, Harvard Medical School, Boston,
Massachusetts 02115, USA. 186Research Institute for Industrial Economics,
Stockholm 10215, Sweden.
*These authors contributed equally to this work.
§These authors jointly supervised this work.
†A list of participants and affiliations appears in the Supplementary Information.
Extended Data Figure 1 | Q–Q plot of the genome-wide association
meta-analysis of 64 EduYears results files (n = 293,723). Observed
and expected P values are on a −log10 scale (two-tailed). The grey
region depicts the 95% confidence interval under the null hypothesis of
a uniform P value distribution. The observed λGC is 1.28. (As reported
in Supplementary Information section 1.5.4, the unweighted mean λGC
is 1.02, the unweighted median is 1.01, and the range across cohorts is
0.95–1.15.)
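The genomic inflation factor quoted in this legend can be illustrated with a short calculation. The following is a minimal sketch, assuming the common definition of λGC as the median observed 1-d.f. χ2 statistic divided by its expected median under the null; the function name `lambda_gc` is hypothetical and is not taken from the study's analysis code.

```python
from statistics import NormalDist, median


def lambda_gc(p_values):
    """Median observed chi-square (1 d.f.) statistic over its null median.

    Assumes two-tailed P values strictly greater than 0 and at most 1.
    """
    std_normal = NormalDist()
    # A two-tailed P value p corresponds to z = Phi^-1(p / 2); chi2 = z^2.
    chi2 = [std_normal.inv_cdf(p / 2.0) ** 2 for p in p_values]
    # Null median of a 1-d.f. chi-square: the statistic at P = 0.5 (about 0.455).
    expected_median = std_normal.inv_cdf(0.25) ** 2
    return median(chi2) / expected_median


if __name__ == "__main__":
    # Evenly spaced P values mimic the uniform null distribution.
    n = 1001
    uniform_p = [(i + 0.5) / n for i in range(n)]
    print(lambda_gc(uniform_p))
```

Under a uniform P value distribution the ratio is 1; values above 1 indicate that the test statistics are larger than expected under the null, whether from polygenic signal or from confounding.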
Extended Data Figure 2 | The distribution of effect sizes of the 74 lead
SNPs. a, SNPs ordered by absolute value of the standardized effect of
one more copy of the education-increasing allele, with 95% confidence
intervals. b, SNPs ordered by R2. Effects on EduYears are benchmarked
against the top 74 genome-wide significant hits identified in the largest
GWAS conducted to date of height and body mass index (BMI), and the
48 associations reported for waist-to-hip ratio adjusted for BMI (WHR).
These results are based on the GIANT consortium’s publicly available
results for pooled analyses restricted to European-ancestry individuals:
https://www.broadinstitute.org/collaboration/giant/index.php/GIANT_consortium.
Extended Data Figure 3 | Assessing the extent to which population
stratification affects the estimates from the GWAS. a, LD score
regression plot with the summary statistics from the GWAS. Each point
represents an LD score quantile for a chromosome (the x and y coordinates
of the point are the mean LD score and the mean χ2 statistic of variants
in that quantile). That the intercept is close to 1 and that the χ2 statistics
increase linearly with the LD scores suggest that the bulk of the inflation
in the χ2 statistics is due to true polygenic signal and not to population
stratification. b, Estimates and 95% confidence intervals from individual-level and within-family regressions of EduYears on polygenic scores, for
scores constructed with sets of SNPs meeting different P value thresholds.
In addition to the analyses shown here, we conduct a sign concordance
test, and we decompose the variance of the polygenic score. Overall, these
analyses suggest that population stratification is unlikely to be a major
concern for our 74 lead SNPs. See Supplementary Information section 3
for additional details.
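The logic of panel a can be sketched as a straight-line fit of mean χ2 statistics on mean LD scores, where the intercept captures inflation unrelated to LD. This is a simplified illustration using unweighted ordinary least squares; LD score regression as implemented in dedicated software uses a weighted regression, and the function name and input values here are hypothetical.

```python
def ols_intercept_slope(ld_scores, chi2_stats):
    """Unweighted least-squares fit of chi2 = intercept + slope * LD score."""
    n = len(ld_scores)
    mean_x = sum(ld_scores) / n
    mean_y = sum(chi2_stats) / n
    # Sum of squares of x and cross-products of x and y about their means.
    sxx = sum((x - mean_x) ** 2 for x in ld_scores)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(ld_scores, chi2_stats))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


if __name__ == "__main__":
    # Illustrative values only: chi2 rises linearly with LD score from 1.
    ld = [10.0, 50.0, 100.0, 200.0]
    chi2 = [1.0 + 0.002 * x for x in ld]
    print(ols_intercept_slope(ld, chi2))
```

An intercept near 1 together with a positive slope is the pattern the legend attributes to polygenic signal rather than population stratification.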
Extended Data Figure 4 | Replication of 74 lead SNPs in the UK
Biobank data. Estimated effect sizes (in years of schooling) and 95%
confidence intervals of the 74 lead SNPs in the meta-analysis sample
(n = 293,723) and the UK Biobank replication sample (n = 111,349).
The reference allele is the allele associated with higher values of EduYears
in the meta-analysis sample. SNPs are in descending order of R2 in the
meta-analysis sample. Of the 74 lead SNPs, 72 have the anticipated sign
in the replication sample, 52 replicate at the 0.05 significance level, and
7 replicate at the 5 × 10−8 significance level.
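One way to gauge how unusual 72 concordant signs out of 74 would be by chance is an exact one-sided binomial test with success probability 0.5. The sketch below is an illustration under that assumption and is not necessarily the test used in the study; the function name is hypothetical.

```python
from math import comb


def sign_test_p_value(n_concordant, n_total):
    """P(X >= n_concordant) for X ~ Binomial(n_total, 0.5)."""
    # Count the sign patterns with at least n_concordant matches.
    tail = sum(comb(n_total, k) for k in range(n_concordant, n_total + 1))
    # Every sign pattern is equally likely under the null.
    return tail / 2 ** n_total


if __name__ == "__main__":
    # Counts taken from the legend: 72 of 74 lead SNPs share the sign.
    print(sign_test_p_value(72, 74))
```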
Extended Data Figure 5 | Q–Q plots for the 74 lead EduYears SNPs
(or LD proxies) in published GWAS of other phenotypes. SNPs with
concordant effects on both phenotypes are pink, and SNPs with discordant
effects are blue. SNPs outside the grey area pass Bonferroni-corrected
significance thresholds that correct for the total number of SNPs we
tested (P < 0.05/74 = 6.8 × 10−4) and are labelled with their rs numbers.
Observed and expected P values are on a −log10 scale. For the sign
concordance test: *P < 0.05, **P < 0.01 and ***P < 0.001.
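The Bonferroni threshold in this legend is the nominal level divided by the number of SNPs tested. A minimal sketch of that arithmetic follows; the SNP labels in the usage example are hypothetical placeholders, not rs numbers from the study.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


def passing_snps(p_by_snp, n_tests=74, alpha=0.05):
    """Return the labels of SNPs whose P value is below the threshold."""
    threshold = bonferroni_threshold(alpha, n_tests)
    return [snp for snp, p in p_by_snp.items() if p < threshold]


if __name__ == "__main__":
    # 0.05 / 74 rounds to 6.8e-4, the value quoted in the legend.
    print(bonferroni_threshold(0.05, 74))
    # Hypothetical labels and P values, for illustration only.
    print(passing_snps({"snp_a": 1e-5, "snp_b": 0.01}))
```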
Extended Data Figure 6 | Regional association plots for four of
the ten prioritized SNPs for mental health, brain anatomy, and
anthropometric phenotypes identified using EduYears as a proxy
phenotype. a, Cognitive performance; b, hippocampus; c, intracranial
volume; d, neuroticism. The four were selected because very few
genome-wide significant SNPs have been previously reported for these
traits. Data sources and methods are described in Supplementary
Information section 3. The R2 values are from the hg19 / 1000 Genomes
Nov 2014 EUR reference samples. The figures were created with
LocusZoom (http://csg.sph.umich.edu/locuszoom/). Mb, megabases.
Extended Data Figure 7 | Application of fgwas to EduYears.
See Supplementary Information section 4.2 for further details. a, The
results of single-annotation models. ‘Enrichment’ refers to the factor
by which the prior odds of association at an LD-defined region must be
multiplied if the region bears the given annotation; this factor is estimated
using an empirical Bayes method applied to all SNPs in the GWAS
meta-analysis regardless of statistical significance. Annotations were
derived from ENCODE and a number of other data sources. Plotted
are the base 2 logarithms of the enrichments and their 95% confidence
intervals. Multiple instances of the same annotation correspond to
independent replicates of the same experiment. b, The results of
combining multiple annotations and applying model selection and cross-validation. Although the maximum-likelihood estimates are plotted,
model selection was performed with penalized likelihood. c, Reweighting
of GWAS loci. Each point represents an LD-defined region of the genome,
and shown are the regional posterior probabilities of association (PPAs).
The x axis gives the PPA calculated from the GWAS summary statistics
alone, whereas the y axis gives the PPA upon reweighting on the basis of
the annotations in b. The orange points represent genomic regions where
the PPA is equivalent to the standard GWAS significance threshold only
upon reweighting.
Extended Data Figure 8 | Tissue-level biological annotation. a, The
enrichment factor for a given tissue type is the ratio of variance explained
by SNPs in that group to the overall fraction of SNPs in that group.
To benchmark the estimates for EduYears, we compare the enrichment
factors to those obtained when we use the largest GWAS conducted to date
on BMI, height, and waist-to-hip ratio adjusted for BMI. The estimates
were produced with the LDSC Python software, using the LD scores and
functional annotations introduced in ref. 17 and the HapMap3 SNPs with
minor allele frequency >0.05. Each of the ten enrichment calculations for
a particular cell type is performed independently, while each controlling
for the 52 functional annotation categories in the full baseline model. The
error bars show the 95% confidence intervals. b, We took measurements of
gene expression by the Genotype-Tissue Expression (GTEx) Consortium
and determined whether the genes overlapping EduYears-associated
loci are significantly overexpressed (relative to genes in random sets of
loci matched by gene density) in each of 37 tissue types. These types are
grouped in the panel by organ. The coloured bars correspond to tissues
where there is significant overexpression. The y axis is the significance
on a −log10 scale.
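The enrichment factor in panel a can be read as a ratio of two shares: the share of explained variance attributed to SNPs in a tissue group, and the share of all SNPs that fall in that group. The following sketch shows only this ratio, under that reading of the definition; it does not reproduce the LDSC estimation itself, and the function name and numbers are hypothetical.

```python
def enrichment_factor(var_group, var_total, n_snps_group, n_snps_total):
    """Share of explained variance in a group over its share of SNPs."""
    variance_share = var_group / var_total
    snp_share = n_snps_group / n_snps_total
    return variance_share / snp_share


if __name__ == "__main__":
    # Illustrative values only: a group holding 10% of SNPs that accounts
    # for 30% of the explained variance is enriched about three-fold.
    print(enrichment_factor(0.03, 0.10, 100, 1000))
```

A value of 1 means the group explains variance in proportion to its size; values above 1 indicate enrichment.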
Extended Data Figure 9 | Gene-level biological annotation. a, The
DEPICT-prioritized genes for EduYears measured in the BrainSpan
Developmental Transcriptome data (red curve) are more strongly
expressed in the brain prenatally rather than post-natally. The DEPICT-prioritized genes exhibit similar gene expression levels across different
brain regions (grey lines). Analyses were based on log2-transformed
RNA-seq data. Error bars represent 95% confidence intervals. b, For
each phenotype and disorder, we calculated the overlap between the
phenotype’s DEPICT-prioritized genes and genes believed to harbour
de novo mutations causing the disorder. The bars correspond to odds
ratios. c, DEPICT-prioritized genes in EduYears-associated loci exhibit
substantial overlap with genes previously reported to harbour sites where
mutations increase risk of intellectual disability and autism spectrum
disorder (Supplementary Table 4.6.1).
Extended Data Figure 10 | The predictive power of a polygenic score
(PGS) varies in Sweden by birth cohort. Five-year rolling regressions
of years of education on the PGS (left axis in all four panels), share of
individuals not affected by the comprehensive school reform (a, right
axis), and average distance to nearest junior high school (b, right axis),
nearest high school (c, right axis) and nearest college/university (d, right
axis). The shaded area displays the 95% confidence intervals for the
PGS effect.
Web summary
A genome-wide association study in 293,723 individuals identifies 74 genetic variants associated
with educational attainment, which, although only explaining a small proportion of the variation
in educational attainment, highlights candidate genes and pathways for further study.
SUBJECT WORDS
Biological sciences/Neuroscience/Learning and memory [URI /631/378/1595];Health sciences/Diseases/
Psychiatric disorders [URI /692/699/476];Biological sciences/Genetics/Genetic association study/Genome-wide association studies [URI /631/208/205/2138];Biological sciences/Genetics/Behavioural genetics
[URI /631/208/1515].
TECHNIQUE TERMS
Techniques: Life sciences techniques, Genomic analysis [Genome-wide association studies].