Time- and memory-efficient genome assembly with Raven

Vaser, Robert; Šikić, Mile

doi:10.1038/s43588-021-00073-4

Brief Communication
Published: 20 May 2021

Nature Computational Science volume 1, pages 332–336 (2021)

A preprint version of the article is available at bioRxiv.

Abstract

Whole genome sequencing technologies are unable to invariably read DNA molecules intact, a shortcoming that assemblers try to resolve by stitching the obtained fragments back together. Here, we present methods for the improvement of de novo genome assembly from erroneous long reads incorporated into a tool called Raven. Raven maintains similar performance for various genomes and has accuracy on par with other assemblers that support third-generation sequencing data. It is one of the fastest options while having the lowest memory consumption on the majority of benchmarked datasets.

**Fig. 1: Bacterial assembly graph drawn with the force-directed placement algorithm.**

Data availability

The ONT dataset for A. thaliana is available under accession no. ERR2173373, for D. melanogaster under SRR6702603, for H. sapiens NA12878 at https://github.com/nanopore-wgs-consortium/NA12878 (release 6), for H. sapiens CHM13 at https://github.com/marbl/CHM13 (release 6), for H. sapiens HG002 at https://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/HG002_NA24385_son/Ultralong_OxfordNanopore/guppy-V3.4.5/ and for H. sapiens HG00733 at https://s3-us-west-2.amazonaws.com/human-pangenomics/index.html?prefix=NHGRI_UCSC_panel/HG00733/nanopore/. The PacBio CLR dataset for A. thaliana is available at https://downloads.pacbcloud.com/public/SequelData/ArabidopsisDemoData/, for D. melanogaster under accession no. SRR5439404, for H. sapiens CHM13 at https://github.com/marbl/CHM13 (extracted from draft v1.0 bam), for H. sapiens HG002 at https://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/HG002_NA24385_son/PacBio_MtSinai_NIST/PacBio_fasta/ and for H. sapiens HG00733 under SRR7615963. The PacBio HiFi dataset for H. sapiens CHM13 is available from accession nos. SRR11292120–SRR11292123, for H. sapiens HG002 under SRR10382244, SRR10382245, SRR10382248 and SRR10382249, and for H. sapiens HG00733 under ERX3831682. Illumina reads for yak evaluation are available from accession nos. SRX1049768–SRX1049782 for H. sapiens NA12878, from https://github.com/marbl/CHM13 (extracted from draft v1.0 bam) for H. sapiens CHM13, from https://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/HG002_NA24385_son/NIST_HiSeq_HG002_Homogeneity-10953946/NHGRI_Illumina300X_AJtrio_novoalign_bams/ (extracted from 60x bam) for H. sapiens HG002 and under accession no. SRR7782677 for H. sapiens HG00733. ONT plant datasets are available under accession nos. ERR2564160–ERR2564170 for B. rapa, from ERR2564373–ERR2564382 for B. oleracea, from ERR2571286–ERR2571303 for M. schizocarpa, from ERR3476478–ERR3476482 for O. sativa basmati 334 and from ERR3476463–ERR3476466 for O. sativa dom sufid. All generated assemblies in this research are available at Zenodo²⁶.

Code availability

The Raven source code is available under an MIT license on GitHub at https://github.com/lbcb-sci/raven. Source code for version 1.3.0 used in this manuscript is also available at Zenodo²⁷.

References

1. Koren, S. et al. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 27, 722–736 (2017).
2. Chin, C.-S. et al. Phased diploid genome assembly with single-molecule real-time sequencing. Nat. Methods 13, 1050–1054 (2016).
3. Kolmogorov, M., Yuan, J., Lin, Y. & Pevzner, P. A. Assembly of long, error-prone reads using repeat graphs. Nat. Biotechnol. 37, 540–546 (2019).
4. Li, H. Minimap and miniasm: fast mapping and de novo assembly for noisy long sequences. Bioinformatics 32, 2103–2110 (2016).
5. Shafin, K. et al. Nanopore sequencing and the Shasta toolkit enable efficient de novo assembly of eleven human genomes. Nat. Biotechnol. 38, 1044–1053 (2020).
6. Ruan, J. & Li, H. Fast and accurate long-read assembly with wtdbg2. Nat. Methods 17, 155–158 (2020).
7. Kamath, G. M., Shomorony, I., Xia, F., Courtade, T. A. & Tse, D. N. HINGE: long-read assembly achieves optimal repeat resolution. Genome Res. 27, 747–756 (2017).
8. Vaser, R., Sović, I., Nagarajan, N. & Šikić, M. Fast and accurate de novo genome assembly from long uncorrected reads. Genome Res. 27, 737–746 (2017).
9. Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).
10. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
11. Broder, A. Z. On the resemblance and containment of documents. In Proc. Compression and Complexity of SEQUENCES 1997 (cat. no. 97TB100171) (eds. Carpentieri, B. et al.) 21–29 (IEEE, 1997); https://doi.org/10.1109/SEQUEN.1997.666900
12. Jain, C., Dilthey, A., Koren, S., Aluru, S. & Phillippy, A. M. A fast approximate algorithm for mapping long reads to large reference databases. In Research in Computational Molecular Biology (ed. Sahinalp, S. C.) 66–81 (Springer, 2017).
13. Chin, C.-S. & Khalak, A. Human genome assembly in 100 minutes. Preprint at bioRxiv https://doi.org/10.1101/705616 (2019).
14. Fruchterman, T. M. J. & Reingold, E. M. Graph drawing by force-directed placement. Softw. Pract. Exp. 21, 1129–1164 (1991).
15. Barnes, J. & Hut, P. A hierarchical O(N log N) force-calculation algorithm. Nature 324, 446–449 (1986).
16. Wick, R. R. & Holt, K. E. Benchmarking of long-read assemblers for prokaryote whole genome sequencing. F1000Res. 8, 2138 (2020).
17. Nurk, S. et al. HiCanu: accurate assembly of segmental duplications, satellites, and allelic variants from high-fidelity long reads. Genome Res. 30, 1291–1305 (2020).
18. Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods 18, 170–175 (2021).
19. Belser, C. et al. Chromosome-scale assemblies of plant genomes using nanopore long reads and optical maps. Nat. Plants 4, 879–887 (2018).
20. Choi, J. Y. et al. Nanopore sequencing-based genome assembly and evolutionary genomics of circum-basmati rice. Genome Biol. 21, 21 (2020).
21. Vaser, R. & Šikić, M. Yet another de novo genome assembler. In Proc. 2019 11th International Symposium on Image and Signal Processing and Analysis (ISPA) (eds. Lončarić, S. et al.) 147–151 (IEEE, 2019); https://doi.org/10.1109/ISPA.2019.8868909
22. Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212 (2015).
23. Jain, M. et al. Nanopore sequencing and assembly of a human genome with ultra-long reads. Nat. Biotechnol. 36, 338–345 (2018).
24. Mikheenko, A., Prjibelski, A., Saveliev, V., Antipov, D. & Gurevich, A. Versatile genome assembly evaluation with QUAST-LG. Bioinformatics 34, i142–i150 (2018).
25. Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
26. Vaser, R. & Sikic, M. 2021. Assemblies generated in the manuscript “Time and memory efficient genome assembly with Raven”. Zenodo https://doi.org/10.5281/zenodo.4443062
27. Vaser, R. & Sikic, M. 2021. Raven source code used in the manuscript “Time and memory efficient genome assembly with Raven”. Zenodo https://doi.org/10.5281/zenodo.4672196

Acknowledgements

This work has been supported in part by the Croatian Science Foundation under the project “Single genome and metagenome assembly” (IP-2018-01-5886, to M.Š.), the European Regional Development Fund under grant no. KK.01.1.1.01.0009 (DATACROSS, to M.Š.) and the A*STAR Computational Resource Centre through the use of their high-performance computing facilities. R.V. and M.Š. have been partially supported by funding from A*STAR, Singapore. We acknowledge Intel Corporation for allowing us to test with the Intel Optane persistent memory server and providing us with high-quality technical support. Finally, we thank G. Žužić from Carnegie Mellon University for valuable discussions about graph drawings.

Author information

Authors and Affiliations

Laboratory for Bioinformatics and Computational Biology, Faculty of Electrical Engineering and Computing, University of Zagreb, Zagreb, Croatia
Robert Vaser & Mile Šikić
Laboratory of AI in Genomics, Genome Institute of Singapore, A*STAR, Singapore, Singapore
Robert Vaser & Mile Šikić

Contributions

M.Š. devised the project. R.V. designed and implemented Raven, and benchmarked it with other assemblers. Both authors drafted and revised the manuscript.

Corresponding author

Correspondence to Mile Šikić.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature Computational Science thanks the anonymous reviewers for their contribution to the peer review of this work. Handling editor: Ananya Rastogi, in collaboration with the Nature Computational Science team.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Figs. 1–4 and Tables 1–5.

About this article

Cite this article

Vaser, R., Šikić, M. Time- and memory-efficient genome assembly with Raven. Nat Comput Sci 1, 332–336 (2021). https://doi.org/10.1038/s43588-021-00073-4

Received: 23 February 2021
Accepted: 20 April 2021
Published: 20 May 2021
Issue Date: May 2021
