-
Graph-based variant discovery reveals novel dynamics in the human microbiome
Authors:
Harihara Subrahmaniam Muralidharan,
Jacquelyn S Michaelis,
Jay Ghurye,
Todd Treangen,
Sergey Koren,
Marcus Fedarko,
Mihai Pop
Abstract:
Sequence differences between the strains of bacteria comprising host-associated and environmental microbiota may play a role in community assembly and influence the resilience of microbial communities to disturbances. Tools for characterizing strain-level variation within microbial communities, however, are limited in scope, focusing on just single nucleotide polymorphisms, or relying on reference-based analyses that miss complex functional and structural variants. Here, we demonstrate the power of assembly graph analysis to detect and characterize structural variants in almost 1,000 metagenomes generated as part of the Human Microbiome Project. We identify over nine million variants comprising insertion/deletion events, repeat copy-number changes, and mobile elements such as plasmids. We highlight some of the potential functional roles of these genomic changes. Our analysis revealed striking differences in the rate of variation across body sites, highlighting niche-specific mechanisms of bacterial adaptation. The structural variants we detect also include potentially novel prophage integration events, highlighting the potential use of graph-based analyses for phage discovery.
Submitted 3 March, 2024;
originally announced March 2024.
-
MetaCompass: Reference-guided Assembly of Metagenomes
Authors:
Tu Luan,
Victoria Cepeda,
Bo Liu,
Zac Bowen,
Ujjwal Ayyangar,
Mathieu Almeida,
Christopher M. Hill,
Sergey Koren,
Todd J. Treangen,
Adam Porter,
Mihai Pop
Abstract:
Metagenomic studies have primarily relied on de novo assembly for reconstructing genes and genomes from microbial mixtures. While reference-guided approaches have been employed in the assembly of single organisms, they have not been used in a metagenomic context. Here we describe the first effective approach for reference-guided metagenomic assembly that can complement and improve upon de novo metagenomic assembly methods for certain organisms. Such approaches will be increasingly useful as more genomes are sequenced and made publicly available.
Submitted 3 March, 2024;
originally announced March 2024.
-
Scalable telomere-to-telomere assembly for diploid and polyploid genomes with double graph
Authors:
Haoyu Cheng,
Mobin Asri,
Julian Lucas,
Sergey Koren,
Heng Li
Abstract:
Despite recent advances in the length and the accuracy of long-read data, building haplotype-resolved genome assemblies from telomere to telomere still requires considerable computational resources. In this study, we present an efficient de novo assembly algorithm that combines multiple sequencing technologies to scale up population-wide telomere-to-telomere assemblies. By utilizing twenty-two human and two plant genomes, we demonstrate that our algorithm is around an order of magnitude cheaper than existing methods, while producing better diploid and haploid assemblies. Notably, our algorithm is the only feasible solution to the haplotype-resolved assembly of polyploid genomes.
Submitted 6 June, 2023;
originally announced June 2023.
-
Reducing assembly complexity of microbial genomes with single-molecule sequencing
Authors:
Sergey Koren,
Gregory P Harhay,
Timothy PL Smith,
James L Bono,
Dayna M Harhay,
D. Scott Mcvey,
Diana Radune,
Nicholas H Bergman,
Adam M Phillippy
Abstract:
Background: The short reads output by first- and second-generation DNA sequencing instruments cannot completely reconstruct microbial chromosomes. Therefore, most genomes have been left unfinished due to the significant resources required to manually close gaps in draft assemblies. Third-generation, single-molecule sequencing addresses this problem by greatly increasing sequencing read length, which simplifies the assembly problem.
Results: To measure the benefit of single-molecule sequencing on microbial genome assembly, we sequenced and assembled the genomes of six bacteria and analyzed the repeat complexity of 2,267 complete bacteria and archaea. Our results indicate that the majority of known bacterial and archaeal genomes can be assembled without gaps, at finished-grade quality, using a single PacBio RS sequencing library. These single-library assemblies are also more accurate than typical short-read assemblies and hybrid assemblies of short and long reads.
Conclusions: Automated assembly of long, single-molecule sequencing data reduces the cost of microbial finishing to $1,000 for most genomes, and future advances in this technology are expected to drive the cost lower. This is expected to increase the number of completed genomes, improve the quality of microbial genome databases, and enable high-fidelity, population-scale studies of pan-genomes and chromosomal organization.
Submitted 15 November, 2013; v1 submitted 12 April, 2013;
originally announced April 2013.
-
Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species
Authors:
Keith R. Bradnam,
Joseph N. Fass,
Anton Alexandrov,
Paul Baranay,
Michael Bechner,
İnanç Birol,
Sébastien Boisvert,
Jarrod A. Chapman,
Guillaume Chapuis,
Rayan Chikhi,
Hamidreza Chitsaz,
Wen-Chi Chou,
Jacques Corbeil,
Cristian Del Fabbro,
T. Roderick Docking,
Richard Durbin,
Dent Earl,
Scott Emrich,
Pavel Fedotov,
Nuno A. Fonseca,
Ganeshkumar Ganapathy,
Richard A. Gibbs,
Sante Gnerre,
Élénie Godzaridis,
Steve Goldstein,
et al. (66 additional authors not shown)
Abstract:
Background: The process of generating raw genome sequence data continues to become cheaper, faster, and more accurate. However, assembly of such data into high-quality, finished genome sequences remains challenging. Many genome assembly tools are available, but they differ greatly in terms of their performance (speed, scalability, hardware requirements, acceptance of newer read technologies) and in their final output (composition of assembled sequence). More importantly, it remains largely unclear how to best assess the quality of assembled genome sequences. The Assemblathon competitions are intended to assess current state-of-the-art methods in genome assembly.
Results: In Assemblathon 2, we provided a variety of sequence data to be assembled for three vertebrate species (a bird, a fish, and a snake). This resulted in a total of 43 submitted assemblies from 21 participating teams. We evaluated these assemblies using a combination of optical map data, Fosmid sequences, and several statistical methods. From over 100 different metrics, we chose ten key measures by which to assess the overall quality of the assemblies.
Conclusions: Many current genome assemblers produced useful assemblies, containing a significant representation of their genes, regulatory sequences, and overall genome structure. However, the high degree of variability between the entries suggests that there is still much room for improvement in the field of genome assembly and that approaches which work well in assembling the genome of one species may not necessarily work well for another.
Submitted 27 June, 2013; v1 submitted 23 January, 2013;
originally announced January 2013.