research-article
Open access

An Optimal O(nm) Algorithm for Enumerating All Walks Common to All Closed Edge-covering Walks of a Graph

Published: 29 July 2019

Abstract

In this article, we consider the following problem. Given a directed graph G, output all walks of G that are sub-walks of all closed edge-covering walks of G. This problem was first considered by Tomescu and Medvedev (RECOMB 2016), who characterized these walks through the notion of omnitig. Omnitigs were shown to be relevant for the genome assembly problem from bioinformatics, where a genome sequence must be assembled from a set of reads from a sequencing experiment. Tomescu and Medvedev (RECOMB 2016) also proposed an algorithm for listing all maximal omnitigs, by launching an exhaustive visit from every edge.
In this article, we prove new insights about the structure of omnitigs and solve several open questions about them. We combine these to achieve an O(nm)-time algorithm for outputting all the maximal omnitigs of a graph (with n nodes and m edges). This is also optimal, as we show families of graphs whose total omnitig length is Ω(nm). We implement this algorithm and show that it is 9--12 times faster in practice than the one of Tomescu and Medvedev (RECOMB 2016).
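
To make the problem statement concrete, the following Python sketch checks the two notions it relies on: whether a sequence of edges forms a closed edge-covering walk, and whether a given walk occurs as a sub-walk of it. It only illustrates the definitions and is not the article's O(nm) algorithm. The function names, the edge-list representation, the restriction to graphs without parallel edges, and the choice to let a sub-walk wrap around the end of the closed walk are all assumptions made for this example.

```python
from typing import Hashable, Sequence, Tuple

# An edge is a (tail, head) pair. Assumption: no parallel edges,
# so a pair identifies an edge uniquely.
Edge = Tuple[Hashable, Hashable]


def is_closed_edge_covering_walk(edges: Sequence[Edge], walk: Sequence[Edge]) -> bool:
    """Return True if `walk` is a closed walk of the graph that uses every edge."""
    if not walk:
        return False
    edge_set = set(edges)
    # Every step of the walk must be an edge of the graph.
    if any(e not in edge_set for e in walk):
        return False
    k = len(walk)
    # Consecutive edges must chain head-to-tail, including last -> first (closed).
    for i in range(k):
        if walk[i][1] != walk[(i + 1) % k][0]:
            return False
    # Edge-covering: every edge of the graph appears at least once.
    return set(walk) == edge_set


def is_subwalk_of_closed_walk(sub: Sequence[Edge], closed: Sequence[Edge]) -> bool:
    """Return True if `sub` occurs contiguously in `closed`, read circularly."""
    k = len(closed)
    if k == 0:
        return False
    return any(
        all(closed[(start + i) % k] == sub[i] for i in range(len(sub)))
        for start in range(k)
    )


# Hypothetical four-edge example graph on nodes a, b, c.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("b", "a")]
# One closed walk that traverses every edge at least once.
closed = [("a", "b"), ("b", "c"), ("c", "a"), ("a", "b"), ("b", "a")]
```

In this example graph, node c has the single out-edge (c, a) and node a has the single out-edge (a, b), so the walk (c, a), (a, b) must appear in every closed edge-covering walk. Verifying membership in all such walks by enumeration is infeasible in general, which is why a structural characterization such as omnitigs is needed.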

References

[1]
Endre Boros, Martin C. Golumbic, and Vadim E. Levit. 2002. On the number of vertices belonging to all maximum stable sets. Discrete Appl. Math. 124, 1--3 (2002), 17--25.
[2]
Massimo Cairo, Paul Medvedev, Nidia Obscura Acosta, Romeo Rizzi, and Alexandru I. Tomescu. 2017. Optimal omnitig listing for safe and complete contig assembly. In 28th Annual Symposium on Combinatorial Pattern Matching, CPM 2017, July 4--6, 2017, Warsaw, Poland (LIPIcs), Juha Kärkkäinen, Jakub Radoszewski, and Wojciech Rytter (Eds.), Vol. 78. Schloss Dagstuhl – Leibniz-Zentrum fuer Informatik, 29:1--29:12.
[3]
Katarína Cechlárová. 1998. Persistency in the assignment and transportation problems. Mat. Meth. OR 47, 2 (1998), 243--254.
[4]
Kun-Mao Chao, Ross C. Hardison, and Webb Miller. 1993. Locating well-conserved regions within a pairwise alignment. CABIOS 9, 4 (1993), 387--396. Retrieved from http://bioinformatics.oxfordjournals.org/content/9/4/387.full.pdf+html
[5]
Marie Costa. 1994. Persistency in maximum cardinality bipartite matchings. Oper. Res. Lett. 15, 3 (1994), 143--149.
[6]
Marie-Christine Costa, Dominique de Werra, and Christophe Picouleau. 2011. Minimum d-blockers and d-transversals in graphs. J. Comb. Optim. 22, 4 (2011), 857--872.
[7]
David Eppstein. 2015. K-best enumeration. Bulletin of the EATCS 115 (2015). Retrieved from http://eatcs.org/beatcs/index.php/beatcs/article/view/322.
[8]
Donatella Firmani, Giuseppe F. Italiano, Luigi Laura, Alessio Orlandi, and Federico Santaroni. 2012. Computing strong articulation points and strong bridges in large scale graphs. In Experimental Algorithms, Ralf Klasing (Ed.). Springer, Berlin, Germany, 195--207.
[9]
A. Friemann and S. Schmitz. 1992. A new approach for displaying identities and differences among aligned amino acid sequences. Comput. Appl. Biosci. 8, 3 (Jun. 1992), 261--265.
[10]
R. M. Idury and M. S. Waterman. 1995. A new algorithm for DNA sequence assembly. J. Comput. Biol. 2, 2 (1995), 291--306.
[11]
Giuseppe F. Italiano, Luigi Laura, and Federico Santaroni. 2012. Finding strong bridges and strong articulation points in linear time. Theor. Comput. Sci. 447 (Aug. 2012), 74--84.
[12]
Iu. P. Lysov, V. L. Florent’ev, A. A. Khorlin, K. R. Khrapko, and V. V. Shik. 1988. Determination of the nucleotide sequence of DNA using hybridization with oligonucleotides. A new method. Doklady Akademii nauk SSSR 303, 6 (1988), 1508--1511.
[13]
Benjamin Grant Jackson. 2009. Parallel Methods for Short Read Assembly. Ph.D. Dissertation. Iowa State University.
[14]
Evgeny Kapun and Fedor Tsarev. 2013. De Bruijn superwalk with multiplicities problem is NP-hard. BMC Bioinf. 14, Suppl 5 (2013), S7.
[15]
John D. Kececioglu and Eugene W. Myers. 1995. Combinatorial algorithms for DNA sequence assembly. Algorithmica 13, 1/2 (1995), 7--51.
[16]
Carl Kingsford, Michael C. Schatz, and Mihai Pop. 2010. Assembly complexity of prokaryotic genomes using short reads. BMC Bioinf. 11, 1 (2010), 21.
[17]
Paul Medvedev and Michael Brudno. 2009. Maximum likelihood genome assembly. J. Comput. Biol. 16, 8 (2009), 1101--1116.
[18]
Paul Medvedev, Konstantinos Georgiou, Gene Myers, and Michael Brudno. 2007. Computability of models for sequence assembly. In Proceedings of the 7th International Workshop on Algorithms in Bioinformatics (WABI 2007), Philadelphia, PA, September 8--9, 2007 (Lecture Notes in Computer Science), Raffaele Giancarlo and Sridhar Hannenhalli (Eds.), Vol. 4645. Springer, 289--301.
[19]
Gene Myers. 2014. Efficient local alignment discovery amongst noisy long reads. In Proceedings of the 14th International Workshop on Algorithms in Bioinformatics (WABI 2014), Wroclaw, Poland, September 8--10, 2014 (Lecture Notes in Computer Science), Daniel G. Brown and Burkhard Morgenstern (Eds.), Vol. 8701. Springer, 52--67.
[20]
Niranjan Nagarajan and Mihai Pop. 2009. Parametric complexity of sequence assembly: Theory and applications to next generation sequencing. J. Comput. Biol. 16, 7 (2009), 897--908.
[21]
Giuseppe Narzisi, Bud Mishra, and Michael C. Schatz. 2014. On algorithmic complexity of biomolecular sequence assembly problem. In Proceedings of the 1st International Conference on Algorithms for Computational Biology (AlCoB 2014), Tarragona, Spain, July 1--3, 2014 (Lecture Notes in Computer Science), Adrian Horia Dediu, Carlos Martín-Vide, and Bianca Truthe (Eds.), Vol. 8542. Springer, 183--195.
[22]
Foad Mahdavi Pajouh, Vladimir Boginski, and Eduardo L. Pasiliao. 2014. Minimum vertex blocker clique problem. Networks 64, 1 (2014), 48--64.
[23]
Pavel A. Pevzner. 1989. L-tuple DNA sequencing: Computer analysis. J. Biomol. Struct. Dyn. 7, 1 (Aug. 1989), 63--73.
[24]
Pavel A. Pevzner, Haixu Tang, and Michael S. Waterman. 2001. An Eulerian path approach to DNA fragment assembly. Proceedings of the National Academy of Sciences 98, 17 (2001), 9748--9753.
[25]
Leena Salmela and Alexandru I. Tomescu. 2019. Safely filling gaps with partial solutions common to all solutions. IEEE/ACM Trans. Comput. Biol. Bioinf. 16, 2 (2019), 617--626.
[26]
Jared T. Simpson and Richard Durbin. 2012. Efficient de novo assembly of large genomes using compressed data structures. Genome Res. 22, 3 (2012), 549--556.
[27]
Alexandru I. Tomescu and Paul Medvedev. 2016. Safe and complete contig assembly via omnitigs. In Proceedings of the 20th Annual Conference on Research in Computational Molecular Biology (RECOMB 2016), Santa Monica, CA, April 17--21, 2016 (Lecture Notes in Computer Science), Mona Singh (Ed.), Vol. 9649. Springer, 152--163.
[28]
Martin Vingron. 1996. Near-optimal sequence alignment. Curr. Opin. Struct. Biol. 6, 3 (June 1996), 346--352.
[29]
Martin Vingron and Patrick Argos. 1990. Determination of reliable regions in protein sequence alignments. Protein Eng. 3, 7 (1990), 565--569. Retrieved from http://peds.oxfordjournals.org/content/3/7/565.full.pdf+html
[30]
Michael S. Waterman. 1995. Introduction to Computational Biology: Maps, Sequences and Genomes. Chapman & Hall / CRC Press.
[31]
Rico Zenklusen, Bernard Ries, Christophe Picouleau, Dominique de Werra, Marie-Christine Costa, and Cédric Bentz. 2009. Blockers and transversals. Discrete Math. 309, 13 (2009), 4306--4314.
[32]
M. Zuker. 1991. Suboptimal sequence alignment in molecular biology. Alignment with error analysis. J. Mol. Biol. 221, 2 (Sept. 1991), 403--420.

      Published In

      ACM Transactions on Algorithms, Volume 15, Issue 4
      October 2019
      297 pages
      ISSN: 1549-6325
      EISSN: 1549-6333
      DOI: 10.1145/3351875
      This work is licensed under a Creative Commons Attribution International 4.0 License.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 29 July 2019
      Accepted: 01 June 2019
      Revised: 01 June 2019
      Received: 01 January 2019
      Published in TALG Volume 15, Issue 4

      Author Tags

      1. Genome assembly
      2. edge-covering walk
      3. graph algorithm
      4. safe and complete algorithm
      5. strong bridge

      Qualifiers

      • Research-article
      • Research
      • Refereed

      Cited By

      • (2024) Simplicity in Eulerian circuits. Information Processing Letters 183:C. DOI: 10.1016/j.ipl.2023.106421. Online publication date: 1-Jan-2024.
      • (2023) Genome Assembly, from Practice to Theory: Safe, Complete and Linear-Time. ACM Transactions on Algorithms 20:1 (1-26). DOI: 10.1145/3632176. Online publication date: 8-Nov-2023.
      • (2023) A safety framework for flow decomposition problems via integer linear programming. Bioinformatics 39:11. DOI: 10.1093/bioinformatics/btad640. Online publication date: 20-Oct-2023.
      • (2022) Improving RNA Assembly via Safety and Completeness in Flow Decompositions. Journal of Computational Biology 29:12 (1270-1287). DOI: 10.1089/cmb.2022.0261. Online publication date: 1-Dec-2022.
      • (2022) Safety in s-t Paths, Trails and Walks. Algorithmica 84:3 (719-741). DOI: 10.1007/s00453-021-00877-w. Online publication date: 1-Mar-2022.
      • (2022) Safety and Completeness in Flow Decompositions for RNA Assembly. Research in Computational Molecular Biology (177-192). DOI: 10.1007/978-3-031-04749-7_11. Online publication date: 29-Apr-2022.
      • (2021) Safety in multi-assembly via paths appearing in all path covers of a DAG. IEEE/ACM Transactions on Computational Biology and Bioinformatics (1-1). DOI: 10.1109/TCBB.2021.3131203. Online publication date: 2021.
      • (2020) Strong Connectivity in Directed Graphs under Failures, with Applications. SIAM Journal on Computing 49:5 (865-926). DOI: 10.1137/19M1258530. Online publication date: 1-Sep-2020.
