The greedy path-merging algorithm for contig scaffolding

Published: 01 September 2002

Abstract

    Given a collection of contigs and mate-pairs, the Contig Scaffolding Problem is to order and orient the given contigs in a manner that is consistent with as many mate-pairs as possible. This paper describes an efficient heuristic, the greedy path-merging algorithm, for solving this problem. The method was originally developed as a key component of the compartmentalized assembly strategy at Celera Genomics. This interim approach was used at an early stage of the sequencing of the human genome to produce a preliminary assembly from preliminary whole-genome shotgun data produced at Celera and preliminary human contigs produced by the Human Genome Project.
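    To make the objective in the abstract concrete, the following minimal sketch scores a candidate layout by counting the mate-pairs it satisfies. It is not the greedy path-merging algorithm itself, only an illustration of the quantity a scaffolder tries to maximize. The data model (`Read`, `MatePair`, `Placement`) and the acceptance rule (reads face each other and their distance lies within `k` standard deviations of the library mean, with `k = 3` by default) are assumptions made for illustration and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    contig: str    # name of the contig containing the read (hypothetical model)
    offset: int    # position of the read's 5' end within the contig
    forward: bool  # True if the read points left-to-right in the contig


@dataclass(frozen=True)
class MatePair:
    a: Read
    b: Read
    mean: float    # expected distance between the two 5' ends
    sd: float      # standard deviation of that distance


@dataclass(frozen=True)
class Placement:
    start: int     # leftmost coordinate of the contig in the scaffold
    length: int    # contig length
    flipped: bool  # True if the contig is reverse-complemented


def place(read, layout):
    """Return (scaffold position, points-right?) for a read under a layout."""
    p = layout[read.contig]
    if p.flipped:
        return p.start + p.length - read.offset, not read.forward
    return p.start + read.offset, read.forward


def is_satisfied(pair, layout, k=3.0):
    """Assumed rule: reads face each other at roughly the expected distance."""
    pos_a, fwd_a = place(pair.a, layout)
    pos_b, fwd_b = place(pair.b, layout)
    if fwd_a == fwd_b:
        return False  # both reads point the same way
    fwd_pos, rev_pos = (pos_a, pos_b) if fwd_a else (pos_b, pos_a)
    distance = rev_pos - fwd_pos
    return distance > 0 and abs(distance - pair.mean) <= k * pair.sd


def count_satisfied(pairs, layout):
    """Number of mate-pairs consistent with the given order and orientation."""
    return sum(is_satisfied(p, layout) for p in pairs)


pair = MatePair(Read("A", 900, True), Read("B", 100, False), mean=2000.0, sd=200.0)
layout = {"A": Placement(0, 1000, False), "B": Placement(2700, 800, False)}
print(count_satisfied([pair], layout))  # 1
```

    Flipping contig B in place, or placing B before A without flipping either contig, leaves the same mate-pair unsatisfied, which is the kind of evidence a scaffolding heuristic can use to choose between alternative orders and orientations.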

    Published In

    Journal of the ACM  Volume 49, Issue 5
    September 2002
    137 pages
    ISSN:0004-5411
    EISSN:1557-735X
    DOI:10.1145/585265

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tag

    1. Genome assembly

    Qualifiers

    • Article

    Cited By

    • (2024) Global exact optimisations for chloroplast structural haplotype scaffolding. Algorithms for Molecular Biology 19:1. DOI: 10.1186/s13015-023-00243-1. Online publication date: 6-Feb-2024
    • (2024) Maptcha: an efficient parallel workflow for hybrid genome scaffolding. BMC Bioinformatics 25:1. DOI: 10.1186/s12859-024-05878-4. Online publication date: 8-Aug-2024
    • (2023) An Optimized Scaffolding Algorithm for Unbalanced Sequencing. New Generation Computing 41:3 (553-579). DOI: 10.1007/s00354-023-00221-6. Online publication date: 28-May-2023
    • (2023) Current Progress of Bioinformatics for Human Health. In Methodologies of Multi-Omics Data Integration and Data Mining (145-162). DOI: 10.1007/978-981-19-8210-1_8. Online publication date: 16-Jan-2023
    • (2022) RegScaf: a regression approach to scaffolding. Bioinformatics 38:10 (2675-2682). DOI: 10.1093/bioinformatics/btac174. Online publication date: 25-Mar-2022
    • (2021) Empirical evaluation of methods for de novo genome assembly. PeerJ Computer Science 7 (e636). DOI: 10.7717/peerj-cs.636. Online publication date: 9-Jul-2021
    • (2021) SWALO: scaffolding with assembly likelihood optimization. Nucleic Acids Research. DOI: 10.1093/nar/gkab717. Online publication date: 20-Aug-2021
    • (2021) On the solution bound of two-sided scaffold filling. Theoretical Computer Science 873 (47-63). DOI: 10.1016/j.tcs.2021.04.024. Online publication date: Jun-2021
    • (2020) Sequencing and assembly of the Egyptian buffalo genome. PLOS ONE 15:8 (e0237087). DOI: 10.1371/journal.pone.0237087. Online publication date: 19-Aug-2020
    • (2019) Parameterized Algorithms in Bioinformatics: An Overview. Algorithms 12:12 (256). DOI: 10.3390/a12120256. Online publication date: 1-Dec-2019
