Abstract
Despite their importance in gene innovation and phenotypic variation, duplicated regions have remained largely intractable owing to difficulties in accurately resolving their structure, copy number and sequence content. We present an algorithm (mrFAST) to comprehensively map next-generation sequence reads, which allows for the prediction of absolute copy-number variation of duplicated segments and genes. We examine three human genomes and experimentally validate genome-wide copy number differences. We estimate that, on average, 73–87 genes vary in copy number between any two individuals and find that these genic differences overwhelmingly correspond to segmental duplications (odds ratio = 135; P < 2.2 × 10⁻¹⁶). Our method can distinguish between different copies of highly identical genes, providing a more accurate assessment of gene content and insight into functional constraint without the limitations of array-based technology.
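The general read-depth idea behind predicting absolute copy number from mapped reads can be illustrated with a minimal sketch. This is not the mrFAST implementation, and the abstract does not describe the authors' normalization procedure; the function names, the use of control windows assumed to be at diploid copy number, and the depth values below are all hypothetical and chosen only for illustration.

```python
from statistics import median


def estimate_copy_number(window_depths, control_depths, control_copy=2):
    """Scale per-window read depth to an absolute copy-number estimate.

    window_depths:  read depth for each window in the region of interest
    control_depths: read depth for windows ASSUMED to sit at a known,
                    fixed copy number (hypothetical control regions)
    control_copy:   assumed copy number of the controls (2 = diploid)
    """
    if not control_depths:
        raise ValueError("need at least one control window")
    # Depth contributed by a single copy, taken from the control windows.
    per_copy_depth = median(control_depths) / control_copy
    if per_copy_depth <= 0:
        raise ValueError("control depth must be positive")
    return [depth / per_copy_depth for depth in window_depths]


def call_integer_copy(estimates):
    """Round fractional estimates to non-negative integer copy numbers."""
    return [max(0, round(e)) for e in estimates]


if __name__ == "__main__":
    # Hypothetical depths, not data from the study.
    controls = [30, 29, 31, 30, 32]
    windows = [15, 30, 61, 0]
    estimates = estimate_copy_number(windows, controls)
    print(call_integer_copy(estimates))
```

Using the median of the control windows keeps a few outlier windows from skewing the per-copy depth. A real pipeline would also need to correct for biases such as GC content and to handle reads that map to several locations, neither of which this sketch attempts.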
Acknowledgements
We thank D. Bentley for early access to the Illumina WGS dataset for NA18507; J. Wang for the YH DNA and the cell line; M. Egholm and B. Simen for the JDW DNA and J.D. Watson for permission to analyze his genome. We also thank M. Shumway, P. Flicek and R. Leinonen for technical assistance in transferring large sequence datasets; E. Tüzün for help in parallelizing mrFAST for computation clusters through message passing interface; S. Girirajan for assistance with experiments and T. Brown for her help in manuscript preparation. J.M.K. is supported by a US National Science Foundation Graduate Research Fellowship. T.M.-B. is supported by a Marie Curie fellowship (FP7). This work was supported, in part, by U.S. National Institutes of Health grant HG004120 to E.E.E. E.E.E. is an investigator of the Howard Hughes Medical Institute.
Contributions
C.A., J.M.K., T.M.-B. and E.E.E. designed the study, performed analytical work and wrote the manuscript. C.A., F.H. and O.M. designed and implemented the mrFAST algorithm. C.A., J.M.K., G.A. and J.O.K. performed computational analysis. T.M.-B., F.A., C.B. and M.M. performed validation experiments. R.A.G. advised on handling of JDW data analysis. S.C.S. and E.E.E. obtained funding for the study.
Supplementary information
Supplementary Text and Figures
Supplementary Note, Supplementary Figures 1–7, and Supplementary Tables 1–3 and 6 (PDF 2725 kb)
Supplementary Table 4
Estimated diploid copy number for 17,601 autosomal coding genes (XLS 3961 kb)
Supplementary Table 5
Individual exons which are estimated to be copy-number variable among the three analyzed individuals (XLS 631 kb)
Cite this article
Alkan, C., Kidd, J., Marques-Bonet, T. et al. Personalized copy number and segmental duplication maps using next-generation sequencing. Nat Genet 41, 1061–1067 (2009). https://doi.org/10.1038/ng.437
DOI: https://doi.org/10.1038/ng.437