research-article

Open access

Data Structures to Represent a Set of k-long DNA Sequences

Authors:

Paul Medvedev

ACM Computing Surveys (CSUR), Volume 54, Issue 1

Article No.: 17, Pages 1 - 22

https://doi.org/10.1145/3445967

Published: 08 March 2021

Abstract

The analysis of biological sequencing data has been one of the biggest applications of string algorithms. The approaches used in many such applications are based on the analysis of k-mers, which are short fixed-length strings present in a dataset. While these approaches are rather diverse, storing and querying a k-mer set has emerged as a shared underlying component. A set of k-mers has unique features and applications that, over the past 10 years, have resulted in many specialized approaches for its representation. In this survey, we give a unified presentation and comparison of the data structures that have been proposed to store and query a k-mer set. We hope this survey will serve as a resource for researchers in the field as well as make the area more accessible to researchers outside the field.
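As a rough illustration of the operation the abstract describes, the sketch below extracts the k-mers of a few sequences and answers membership queries with a plain Python hash set. This is only a naive baseline under stated assumptions, not one of the specialized representations the survey covers; the function name `kmers`, the toy reads, and the choice k = 3 are all hypothetical.

```python
from typing import Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings (k-mers) of seq."""
    if k <= 0:
        raise ValueError("k must be positive")
    # A sequence shorter than k yields an empty range, hence an empty set.
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


# Toy dataset of two reads (hypothetical data, for illustration only).
reads = ["ACGTAC", "GTACGG"]
k = 3

# Store: take the union of the k-mers of every read.
kmer_set: Set[str] = set()
for read in reads:
    kmer_set |= kmers(read, k)

# Query: test whether a given k-mer occurs anywhere in the dataset.
print("GTA" in kmer_set)
print("TTT" in kmer_set)
```

A hash set of strings stores every k-mer explicitly, so its memory use grows with both the number of k-mers and k; reducing that cost while keeping queries fast is the kind of problem the surveyed data structures address.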

Index Terms

Data Structures to Represent a Set of k-long DNA Sequences
1. Applied computing
  1. Life and medical sciences
    1. Computational biology
2. Theory of computation
  1. Design and analysis of algorithms
    1. Data structures design and analysis
      1. Pattern matching

Published In

ACM Computing Surveys Volume 54, Issue 1

January 2022

844 pages

ISSN:0360-0300

EISSN:1557-7341

DOI:10.1145/3446641

Editor:
Albert Zomaya
University of Sydney, Australia

Copyright © 2021 Owner/Author.

This work is licensed under a Creative Commons Attribution-NonCommercial International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 08 March 2021

Accepted: 01 October 2020

Revised: 01 June 2020

Received: 01 April 2019

Published in CSUR Volume 54, Issue 1

Qualifiers

Research-article
Research
Refereed

Funding Sources

Fulbright Visiting Scholar Program
OP VVV MEYS
“Research Center for Informatics”
NSF
National Institute Of General Medical Sciences of the National Institutes of Health
INCEPTION
