Abstract
Ocean microbes drive biogeochemical cycling on a global scale1. However, this cycling is constrained by viruses that affect community composition, metabolic activity, and evolutionary trajectories2,3. Owing to challenges with the sampling and cultivation of viruses, genome-level viral diversity remains poorly described and grossly understudied, with less than 1% of observed surface-ocean viruses known4. Here we assemble complete genomes and large genomic fragments from both surface- and deep-ocean viruses sampled during the Tara Oceans and Malaspina research expeditions5,6, and analyse the resulting ‘global ocean virome’ dataset to present a global map of abundant, double-stranded DNA viruses complete with genomic and ecological contexts. A total of 15,222 epipelagic and mesopelagic viral populations were identified, comprising 867 viral clusters (defined as approximately genus-level groups7,8). This roughly triples the number of known ocean viral populations4 and doubles the number of candidate bacterial and archaeal virus genera8, providing a near-complete sampling of epipelagic communities at both the population and viral-cluster level. We found that 38 of the 867 viral clusters were locally or globally abundant, together accounting for nearly half of the viral populations in any global ocean virome sample. While two-thirds of these clusters represent newly described viruses lacking any cultivated representative, most could be computationally linked to dominant, ecologically relevant microbial hosts. Moreover, we identified 243 viral-encoded auxiliary metabolic genes, of which only 95 were previously known. Deeper analyses of four of these auxiliary metabolic genes (dsrC, soxYZ, P-II (also known as glnB) and amoC) revealed that abundant viruses may directly manipulate sulfur and nitrogen cycling throughout the epipelagic ocean.
This viral catalog and functional analyses provide a necessary foundation for the meaningful integration of viruses into ecosystem models where they act as key players in nutrient cycling and trophic networks.
References
Falkowski, P. G., Fenchel, T. & Delong, E. F. The microbial engines that drive Earth’s biogeochemical cycles. Science 320, 1034–1039 (2008)
Rohwer, F. & Thurber, R. V. Viruses manipulate the marine environment. Nature 459, 207–212 (2009)
Brum, J. R. & Sullivan, M. B. Rising to the challenge: accelerated pace of discovery transforms marine virology. Nat. Rev. Microbiol. 13, 147–159 (2015)
Brum, J. et al. Patterns and ecological drivers of ocean viral communities. Science 348, 1261498 (2015)
Karsenti, E. et al. A holistic approach to marine eco-systems biology. PLoS Biol. 9, e1001177 (2011)
Duarte, C. M. Seafaring in the 21st century: the Malaspina 2010 circumnavigation expedition. Limnol. Oceanogr. 24, 11–14 (2015)
Lima-Mendez, G., Van Helden, J., Toussaint, A. & Leplae, R. Reticulate representation of evolutionary and functional relationships between phage genomes. Mol. Biol. Evol. 25, 762–777 (2008)
Roux, S., Hallam, S. J., Woyke, T. & Sullivan, M. B. Viral dark matter and virus-host interactions resolved from publicly available microbial genomes. eLife 4, 1–20 (2015)
Mizuno, C. M., Rodriguez-Valera, F., Kimes, N. E. & Ghai, R. Expanding the marine virosphere using metagenomics. PLoS Genet. 9, e1003987 (2013)
Chow, C.-E. T., Winget, D. M., White, R. A., III, Hallam, S. J. & Suttle, C. A. Combining genomic sequencing methods to explore viral diversity and reveal potential virus-host interactions. Front. Microbiol. 6, 265 (2015)
Roux, S. et al. Ecology and evolution of viruses infecting uncultivated SUP05 bacteria as revealed by single-cell- and meta-genomics. eLife 3, e03125 (2014)
Dutilh, B. E. et al. A highly abundant bacteriophage discovered in the unknown sequences of human faecal metagenomes. Nat. Commun. 5, 4498 (2014)
Albertsen, M. et al. Genome sequences of rare, uncultured bacteria obtained by differential coverage binning of multiple metagenomes. Nat. Biotechnol. 31, 533–538 (2013)
Sullivan, M. B. et al. Genomic analysis of oceanic cyanobacterial myoviruses compared with T4-like myoviruses from diverse hosts and environments. Environ. Microbiol. 12, 3035–3056 (2010)
Zhao, Y. et al. Abundant SAR11 viruses in the ocean. Nature 494, 357–360 (2013)
Labrie, S. J. et al. Genomes of marine cyanopodoviruses reveal multiple origins of diversity. Environ. Microbiol. 15, 1356–1376 (2013)
Andersson, A. F. & Banfield, J. F. Virus population dynamics and acquired virus resistance in natural microbial communities. Science 320, 1047–1050 (2008)
Sunagawa, S. et al. Ocean plankton. Structure and function of the global ocean microbiome. Science 348, 1261359 (2015)
Flores, C. O., Valverde, S. & Weitz, J. S. Multi-scale structure and geographic drivers of cross-infection within marine bacteria and phages. ISME J. 7, 520–532 (2013)
Hurwitz, B. L., Brum, J. R. & Sullivan, M. B. Depth-stratified functional and taxonomic niche specialization in the ‘core’ and ‘flexible’ Pacific Ocean Virome. ISME J. 9, 472–484 (2015)
Anantharaman, K. et al. Sulfur oxidation genes in diverse deep-sea viruses. Science 344, 757–760 (2014)
Friedrich, C. G., Bardischewsky, F., Rother, D., Quentmeier, A. & Fischer, J. Prokaryotic sulfur oxidation. Curr. Opin. Microbiol. 8, 253–259 (2005)
Santos, A. A. et al. A protein trisulfide couples dissimilatory sulfate reduction to energy conservation. Science 350, 1541–1545 (2015)
Venceslau, S. S., Stockdreher, Y., Dahl, C. & Pereira, I. A. C. The ‘bacterial heterodisulfide’ DsrC is a key protein in dissimilatory sulfur metabolism. Biochim. Biophys. Acta 1837, 1148–1164 (2014)
Dahl, C., Franz, B., Hensen, D., Kesselheim, A. & Zigann, R. Sulfite oxidation in the purple sulfur bacterium Allochromatium vinosum: identification of SoeABC as a major player and relevance of SoxYZ in the process. Microbiology 159, 2626–2638 (2013)
Huergo, L. F., Chandra, G. & Merrick, M. P(II) signal transduction proteins: nitrogen regulation and beyond. FEMS Microbiol. Rev. 37, 251–283 (2013)
Stahl, D. A. & de la Torre, J. R. Physiology and diversity of ammonia-oxidizing archaea. Annu. Rev. Microbiol. 66, 83–101 (2012)
Loy, A. et al. Reverse dissimilatory sulfite reductase as phylogenetic marker for a subgroup of sulfur-oxidizing prokaryotes. Environ. Microbiol. 11, 289–299 (2009)
Pester, M., Schleper, C. & Wagner, M. The Thaumarchaeota: an emerging view of their phylogeny and ecophysiology. Curr. Opin. Microbiol. 14, 300–306 (2011)
Weitz, J. S. et al. A multitrophic model to quantify the effects of marine viruses on microbial food webs and ecosystem processes. ISME J. 9, 1352–1364 (2015)
Arcondéguy, T., Jack, R. & Merrick, M. P(II) signal transduction proteins, pivotal players in microbial nitrogen control. Microbiol. Mol. Biol. Rev. 65, 80–105 (2001)
Pesant, S. et al. Open science resources for the discovery and analysis of Tara Oceans data. Sci. Data 2, 150023 (2015)
John, S. G. et al. A simple and efficient method for concentration of ocean viruses by chemical flocculation. Environ. Microbiol. Rep. 3, 195–202 (2011)
Hurwitz, B. L., Deng, L., Poulos, B. T. & Sullivan, M. B. Evaluation of methods to concentrate and purify ocean virus communities through comparative, replicated metagenomics. Environ. Microbiol. 15, 1428–1440 (2013)
Aminot, A., Kérouel, R. & Coverly, S. in Practical Guidelines for the Analysis of Seawater (ed. O. Wurl) 143–176 (CRC Press, 2009)
Tara Oceans Consortium & Tara Oceans Expedition. Registry of all samples from the Tara Oceans Expedition (2009–2013). http://dx.doi.org/10.1594/PANGAEA.842197 (2015)
Tara Oceans Consortium & Tara Oceans Expedition. Environmental context of all samples from the Tara Oceans Expedition (2009–2013). http://dx.doi.org/10.1594/PANGAEA.853810 (2015)
Tara Oceans Consortium & Tara Oceans Expedition. Biodiversity context of all samples from the Tara Oceans Expedition (2009–2013). http://dx.doi.org/10.1594/PANGAEA.853809 (2015)
Salazar, G. et al. Global diversity and biogeography of deep-sea pelagic prokaryotes. ISME J. 10, 596–608 (2016). 10.1038/ismej.2015.137
Kultima, J. R. et al. MOCAT: a metagenomics assembly and gene prediction toolkit. PLoS One 7, e47656 (2012)
Peng, Y., Leung, H. C. M., Yiu, S. M. & Chin, F. Y. L. IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics 28, 1420–1428 (2012)
Luo, R. et al. SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. Gigascience 1, 18 (2012)
Li, W. & Godzik, A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22, 1658–1659 (2006)
Kang, D. D., Froula, J., Egan, R. & Wang, Z. MetaBAT, an efficient tool for accurately reconstructing single genomes from complex microbial communities. PeerJ 3, e1165 (2015)
Mavromatis, K. et al. Use of simulated data sets to evaluate the fidelity of metagenomic processing methods. Nat. Methods 4, 495–500 (2007)
Roux, S., Krupovic, M., Debroas, D., Forterre, P. & Enault, F. Assessment of viral community functional potential from viral metagenomes may be hampered by contamination with cellular sequences. Open Biol. 3, 130160 (2013)
Roux, S., Enault, F., Hurwitz, B. L. & Sullivan, M. B. VirSorter: mining viral signal from microbial genomic data. PeerJ 3, e985 (2015)
Pope, W. H. et al. Whole genome comparison of a large collection of mycobacteriophages reveals a continuum of phage genetic diversity. eLife 4, e06416 (2015)
Enright, A. J., Van Dongen, S. & Ouzounis, C. A. An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res. 30, 1575–1584 (2002)
Finn, R. D. et al. Pfam: the protein families database. Nucleic Acids Res. 42, D222–D230 (2014)
Eddy, S. R. Accelerated Profile HMM Searches. PLOS Comput. Biol. 7, e1002195 (2011)
Brum, J. R. et al. Illuminating structural proteins in viral ‘dark matter’ with metaproteomics. Proc. Natl Acad. Sci. USA 113, 2436–2441 (2016)
Holmfeldt, K. et al. Twelve previously unknown phage genera are ubiquitous in global oceans. Proc. Natl Acad. Sci. USA 110, 12798–12803 (2013)
Kang, I., Jang, H. & Cho, J.-C. Complete genome sequences of two Persicivirga bacteriophages, P12024S and P12024L. J. Virol. 86, 8907–8908 (2012)
Kang, I., Oh, H.-M., Kang, D. & Cho, J.-C. Genome of a SAR116 bacteriophage shows the prevalence of this phage type in the oceans. Proc. Natl Acad. Sci. USA 110, 12343–12348 (2013)
Hjorleifsdottir, S., Aevarsson, A., Hreggvidsson, G. O., Fridjonsson, O. H. & Kristjansson, J. K. Isolation, growth and genome of the Rhodothermus RM378 thermophilic bacteriophage. Extremophiles 18, 261–270 (2014)
Marks, T. J. & Hamilton, P. T. Characterization of a thermophilic bacteriophage of Geobacillus kaustophilus. Arch. Virol. 159, 2771–2775 (2014)
Halmillawewa, A. P., Restrepo-Córdoba, M., Yost, C. K. & Hynes, M. F. Genomic and phenotypic characterization of Rhizobium gallicum phage vB_RglS_P106B. Microbiology 161, 611–620 (2015)
Rohwer, F. & Edwards, R. The Phage Proteomic Tree: a genome-based taxonomy for phage. J. Bacteriol. 184, 4529–4535 (2002)
Letunic, I. & Bork, P. Interactive Tree Of Life (iTOL): an online tool for phylogenetic tree display and annotation. Bioinformatics 23, 127–128 (2007)
Letunic, I. & Bork, P. Interactive Tree Of Life v2: online annotation and display of phylogenetic trees made easy. Nucleic Acids Res. 39, W475–8 (2011)
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012)
Edwards, R. A., McNair, K., Faust, K., Raes, J. & Dutilh, B. E. Computational approaches to predict bacteriophage-host relationships. FEMS Microbiol. Rev. 40, 258–272 (2016)
Bland, C. et al. CRISPR recognition tool (CRT): a tool for automatic detection of clustered regularly interspaced palindromic repeats. BMC Bioinformatics 8, 209 (2007)
Rho, M., Wu, Y.-W., Tang, H., Doak, T. G. & Ye, Y. Diverse CRISPRs evolving in human microbiomes. PLoS Genet. 8, e1002441 (2012)
Rice, P., Longden, I. & Bleasby, A. EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet. 16, 276–277 (2000)
Ogilvie, L. A. et al. Genome signature-based dissection of human gut metagenomes to extract subliminal viral sequences. Nat. Commun. 4, 2420 (2013)
Marçais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27, 764–770 (2011)
Oksanen, J. et al. The vegan package version 2.4-0; https://cran.r-project.org/web/packages/vegan/index.html (2016)
Sharon, I. et al. Comparative metagenomics of microbial traits within oceanic viral communities. ISME J. 5, 1178–1190 (2011)
Thompson, L. R. et al. Phage auxiliary metabolic genes and the redirection of cyanobacterial host carbon metabolism. Proc. Natl Acad. Sci. USA 108, E757–E764 (2011)
Dammeyer, T., Bagby, S. C., Sullivan, M. B., Chisholm, S. W. & Frankenberg-Dinkel, N. Efficient phage-mediated pigment biosynthesis in oceanic cyanobacteria. Curr. Biol. 18, 442–448 (2008)
Lindell, D., Jaffe, J. D., Johnson, Z. I., Church, G. M. & Chisholm, S. W. Photosynthesis genes in marine viruses yield proteins during host infection. Nature 438, 86–89 (2005)
Lindell, D. et al. Genome-wide expression dynamics of a marine virus and host reveal features of co-evolution. Nature 449, 83–86 (2007)
Sullivan, M. B. et al. Prevalence and evolution of core photosystem II genes in marine cyanobacterial viruses and their hosts. PLoS Biol. 4, e234 (2006)
Edgar, R. C. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics 5, 113 (2004)
Waterhouse, A. M., Procter, J. B., Martin, D. M. A., Clamp, M. & Barton, G. J. Jalview Version 2 - a multiple sequence alignment editor and analysis workbench. Bioinformatics 25, 1189–1191 (2009)
Price, M. N., Dehal, P. S. & Arkin, A. P. FastTree 2 - approximately maximum-likelihood trees for large alignments. PLoS One 5, e9490 (2010)
Huelsenbeck, J. P. & Ronquist, F. MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics 17, 754–755 (2001)
Schliep, K. P. phangorn: phylogenetic analysis in R. Bioinformatics 27, 592–593 (2011)
Sullivan, M. J., Petty, N. K. & Beatson, S. A. Easyfig: a genome comparison visualizer. Bioinformatics 27, 1009–1010 (2011)
Roy, A., Kucukural, A. & Zhang, Y. I-TASSER: a unified platform for automated protein structure and function prediction. Nat. Protocols 5, 725–738 (2010)
Wiederstein, M. & Sippl, M. J. ProSA-web: interactive web service for the recognition of errors in three-dimensional structures of proteins. Nucleic Acids Res. 35, W407–10 (2007)
Schloissnig, S. et al. Genomic variation landscape of the human gut microbiome. Nature 493, 45–50 (2013)
Alberti, A. et al. Comparison of library preparation methods reveals their impact on interpretation of metatranscriptomic data. BMC Genomics 15, 912 (2014)
Acknowledgements
We thank J. Weitz for advice on statistics, C. Pelikan for help with the DsrAB phylogenetic tree, C. Dahl for discussion regarding DsrC function, and members of the Sullivan and the V. Rich laboratories for suggestions and comments on this manuscript. We acknowledge support from UA high-performance computing and the Ohio Supercomputer Center. Sponsors and support for Tara Oceans and Malaspina expeditions are listed in the Supplementary Information. This viral research was funded by a National Science Foundation grant (1536989) and Gordon and Betty Moore Foundation grants (3790, 2631) to M.B.S., and the French Ministry of Research and Government through the ‘Investissements d’Avenir’ program OCEANOMICS (ANR-11-BTBR-0008) and France Genomique (ANR-10-INBS-09-08). Virus researchers were partially supported by the Water, Environmental and Energy Solutions Initiative and the Ecosystem Genomics Institute (S.R.), the Netherlands Organization for Scientific Research Vidi grant 864.14.004 and CAPES/BRASIL (B.E.D.), and the Austrian Science Fund (project P25111-B22, A.L.). Sequencing was provided by Genoscope (Tara Oceans) and DOE JGI (Malaspina). All authors approved the final manuscript. This article is contribution number 43 of the Tara Oceans expedition.
Author information
Contributions
S.R. and M.B.S. designed the study. C.D., M.P. and S.Se. contributed extensively to sample collection. S.K.-L. managed the logistics of the Tara Oceans project. B.T.P., N.S. and E.L. performed the viral-specific processing of the samples. J.P., C.C., A.A. and P.W. led the sequencing of viral samples. S.R., S.Su. and B.E.D. led the assembly of raw data. S.R., S.Su., M.B.D. and M.B.S. analysed the genomic diversity data. S.R., A.L., J.R.B. and M.B.S. analysed the AMG data. S.R., J.R.B., B.E.D., S.Su., M.B.D., A.L., S.P., P.B., S.G.A., C.D., J.M.G., D.V. and M.B.S. provided constructive comments, and revised and edited the manuscript. Tara Oceans Coordinators provided constructive criticism throughout the study. All authors discussed the results and commented on the manuscript.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Additional information
All data are fully and freely available from the date of publication, with no restrictions, at EBI, PANGAEA, and iVirus. All of the samples, analyses, publications, and ownership of data are free from legal entanglement or restriction of any sort by the nations in whose waters the Tara Oceans expedition sampled.
A list of participants and their affiliations appears in the Supplementary Information.
Extended data figures and tables
Extended Data Figure 1 Accumulation curves of populations and viral clusters and identification of abundant viral clusters in GOV samples.
a, b, Accumulation curves for viral populations (a) and viral clusters (b) were computed from 50 randomly shuffled samples (blue dots) for all samples, epipelagic, mesopelagic, or bathypelagic subsets. For each curve, the average of 50 iterations is displayed with red dots. c, Schematic of the selection process of abundant viral clusters. For each sample, viral clusters accounting for (up to) 80% of the sample diversity (as assessed by their Simpson index) were considered abundant. On the left is an example for sample 125_MIX. Viral clusters detected as abundant in at least two different stations were included in the 38 viral clusters described in Fig. 2 and Extended Data Fig. 3.
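The accumulation-curve procedure in panels a and b can be sketched as follows. This is a minimal illustration rather than the authors' code: it assumes each sample is represented as a set of population (or viral-cluster) identifiers, shuffles the sample order 50 times, and averages the cumulative number of distinct identifiers at each step. The function name `accumulation_curve` and the toy data are hypothetical.

```python
import random

def accumulation_curve(samples, n_iter=50, seed=0):
    """Mean number of distinct items seen after 1..N samples,
    averaged over n_iter random orderings of the samples."""
    rng = random.Random(seed)
    totals = [0] * len(samples)
    for _ in range(n_iter):
        order = list(samples)
        rng.shuffle(order)          # one random sample order per iteration
        seen = set()
        for i, sample in enumerate(order):
            seen |= sample          # add the items observed in this sample
            totals[i] += len(seen)  # cumulative richness at step i
    return [t / n_iter for t in totals]

# Hypothetical toy data: four samples, five populations in total.
toy_samples = [{"p1", "p2"}, {"p2", "p3"}, {"p1", "p4"}, {"p3", "p4", "p5"}]
curve = accumulation_curve(toy_samples)
```

A curve that flattens as samples are added suggests that additional sampling would recover few new populations or clusters.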
Extended Data Figure 2 Comparison of viral clusters with other classification methods (phage proteomic tree and percentage of shared genes).
The phage proteomic tree includes the 756 GOV complete and near-complete genomes from epipelagic and mesopelagic samples and the closest reference genomes from RefSeq and environmental phages (d < 0.5 to a GOV sequence or found in the same viral cluster as a GOV sequence). Branches of monophyletic clades that include more than 3 GOV and/or uncultivated marine sequences with no isolate reference are highlighted in blue. All viral clusters with more than 8 representatives in the tree or part of the 38 abundant viral clusters are indicated by the colours of the outer ring. The name and affiliation (if available) of the 38 abundant viral clusters are indicated next to the viral cluster on the coloured ring. Viral clusters in which members were gathered in single monophyletic clades are indicated with a solid black outline, while viral clusters for which all-but-one member were gathered in a single monophyletic clade are highlighted with a dashed black outline. The distribution of the percentage of shared genes, estimated from the number of shared protein clusters for pairs of viral genomes/contigs either between different viral clusters or within viral clusters, is shown at bottom right. On average, 73% and 39% of sequences within a viral cluster shared more than 20% and 40% of their genes, respectively, which represent the thresholds currently accepted for sub-family and genus designations. Similarly, 83% of sequences within a viral cluster were consistently affiliated in the phage proteomic tree as they formed a monophyletic group that included only members of the particular viral cluster. Thus all three classification methods are largely consistent for the GOV dataset (see Supplementary Information).
Extended Data Figure 3 Summary of 34 of the 38 abundant viral clusters.
Summaries are given for the 34 abundant viral clusters not summarized in Fig. 2. Predicted genome size is based on the set of isolates and circular contigs in the viral cluster. NA (not applicable) corresponds to viral clusters either without any circular contigs, or for which the relative standard deviation of estimated genome size across the different isolate(s) and/or circular contigs is greater than 15%. Host association values are based on the number of cluster members associated with each host group. Statistical significance of this number of predictions was evaluated by comparison with an expected number of associations calculated using a Poisson distribution. Host associations based on known isolates are indicated with a star (for associations based on cultivated isolates) or a dot (for associations based on the detection of a cluster member in a microbial genome from the VirSorter Curated Dataset). The abundant epipelagic microbial groups (representing >1% of the microbial OTUs in epipelagic samples) are highlighted in bold. Distribution and relative abundance of viral clusters are based on the cumulated coverage of viral cluster members among sample viral populations. The main oceanic basins are indicated for each set of samples.
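To illustrate the kind of Poisson comparison mentioned in this legend, the sketch below computes the upper-tail probability of observing at least k host associations when a given number is expected by chance. The legend does not specify the exact test, so the one-sided tail, the function name `poisson_upper_tail`, and the example numbers are all assumptions made for illustration.

```python
import math

def poisson_upper_tail(k, expected):
    """P(X >= k) for X ~ Poisson(expected): the chance of seeing at least
    k associations when `expected` associations are expected at random."""
    if k <= 0:
        return 1.0
    # Sum P(X = 0) .. P(X = k - 1), then take the complement.
    cdf = sum(math.exp(-expected) * expected ** i / math.factorial(i)
              for i in range(k))
    return max(0.0, 1.0 - cdf)

# Hypothetical example: 12 cluster members linked to one host group,
# where 2 such links would be expected by chance.
p_value = poisson_upper_tail(12, 2.0)
```

A small tail probability would indicate more associations with that host group than the chance expectation explains.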
Extended Data Figure 4 Association between abundant viral clusters and abundance and diversity of host groups.
a, Abundance and diversity of bacterial and archaeal host groups associated with the 38 abundant viral clusters (see Fig. 2a). For each host group (at the phylum level, except for Proteobacteria where the class level is used), the different panels display, from top to bottom: (i) the number of viral clusters associated with this host group; (ii) the global relative abundance of this group estimated from the microbial metagenomic OTU counts; (iii) the global diversity of this group based on a Chao index computation including all Tara Oceans microbial metagenome samples (that is, including both alpha and beta diversity); (iv) the distribution of Chao indexes by sample for this group (the alpha diversity); and (v) the average Sorensen index between pairs of samples that include at least one OTU of this group (the beta diversity). OTU counts were derived from the 109 epipelagic microbial metagenomes described previously18. b, Pearson correlations between host-group relative abundance or diversity indices (global Chao index, average Chao index across samples and average Sorensen index across samples) and the number of viral clusters.
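The diversity indices named in this legend can be illustrated with a short sketch. It assumes that the "Chao index" refers to the classic Chao1 richness estimator and that the Sorensen index is computed on OTU presence/absence and expressed as a dissimilarity; the legend does not state either detail, and the function names are hypothetical.

```python
def chao1(otu_counts):
    """Classic Chao1 richness estimate from per-OTU counts in one sample."""
    counts = [c for c in otu_counts if c > 0]
    s_obs = len(counts)                        # observed OTUs
    f1 = sum(1 for c in counts if c == 1)      # singletons
    f2 = sum(1 for c in counts if c == 2)      # doubletons
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    return s_obs + f1 * (f1 - 1) / 2           # bias-corrected form when f2 == 0

def sorensen_dissimilarity(otus_a, otus_b):
    """1 - 2|A and B| / (|A| + |B|) on presence/absence of OTUs."""
    a, b = set(otus_a), set(otus_b)
    if not a and not b:
        return 0.0
    return 1 - 2 * len(a & b) / (len(a) + len(b))
```

Under these assumptions, Chao1 per sample corresponds to the alpha-diversity panel, and the mean pairwise Sorensen value corresponds to the beta-diversity panel.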
Extended Data Figure 5 Diversity, distribution, and genome context of dsrC genes in GOV contigs.
a, Maximum-likelihood tree (from an amino-acid alignment) including the 11 viral DsrC and microbial sequences from microbial metagenomes and the NCBI nr database. The presence of conserved cysteine residues (termed CysA and CysB, as in ref. 24) is indicated with coloured circles next to each sequence or clade. The corresponding type of DsrC-like protein is indicated by the colouring of the branch or clade. The microbial metagenomic contigs affiliated to uncultivated, marine sulfur-oxidizing Gammaproteobacteria (as confirmed by complementary phylogenetic analysis of DsrAB; Supplementary Fig. 7) are indicated by stars. Viral AMG sequences are highlighted in blue, internal nodes and SH-like supports are represented by proportional circles (all nodes with support <0.40 were collapsed). Each dsrC AMG is associated with an abundance profile (right) that displays the relative abundance of the contig across the 91 epipelagic and mesopelagic samples (based on normalized coverage; that is, contig coverage per Gb of metagenome). b, Comparison of dsrC-containing contig maps. A T4-like marker gene (T4 baseplate) is indicated on the maps, alongside putative AMGs (Fe–S biosyn, iron–sulfur cluster biosynthesis; Amt, ammonia transporter).
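The normalization used for the abundance profiles (contig coverage per Gb of metagenome) amounts to a simple rescaling, sketched below. The sketch assumes that 1 Gb means 10^9 sequenced bases and that contig coverage has already been computed from read mapping; the function name and example values are hypothetical.

```python
def normalized_coverage(contig_coverage, metagenome_bp):
    """Contig coverage divided by the metagenome size expressed in Gb."""
    if metagenome_bp <= 0:
        raise ValueError("metagenome size must be positive")
    return contig_coverage / (metagenome_bp / 1e9)

# Hypothetical example: 30x coverage in a 2 Gb virome.
example = normalized_coverage(30.0, 2_000_000_000)
```

Dividing by sequencing depth makes coverage values comparable between samples that were sequenced to different depths.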
Extended Data Figure 6 Diversity, distribution, and genome context of soxYZ genes in GOV contigs.
a, Bayesian tree from an amino-acid alignment, including the four viral soxYZ and microbial sequences from microbial metagenomes and the NCBI nr database. The affiliation of microbial clades (either from the NCBI reference or from the LCA affiliation of metagenomic contigs) is indicated by the colouring of the grouped clades or by a coloured square next to the sequence. Viral AMG sequences are highlighted in blue, posterior probabilities are represented by proportional circles (all nodes with posterior probability <0.40 were collapsed). Clades including sulfur-oxidizing proteobacteria are indicated on the tree. Each soxYZ AMG is associated with an abundance profile (on the right) displaying the relative abundance of the contig across the 91 epipelagic and mesopelagic samples (based on normalized coverage; that is, contig coverage per Gb of metagenome). b, Comparison of soxYZ-containing contig maps. For contig GOV_bin_4310_contig-100_0, the second largest contig from the same bin (GOV_bin_4310_contig-100_1) is displayed. T4-like marker genes (gp23 and the gene encoding T4 baseplate) are indicated on the maps alongside putative AMGs.
Extended Data Figure 7 Diversity, distribution, and genome context of P-II genes in GOV contigs.
a, Maximum-likelihood tree from an amino-acid alignment that includes the 10 viral P-II and microbial sequences from microbial metagenomes and the NCBI nr database. The affiliation of microbial clades (either from the NCBI reference or from the LCA affiliation of metagenomic contigs) is indicated by the colouring of the grouped clades or by a coloured square next to the sequence. Sequences lacking the conserved uridylation site of P-II (Supplementary Fig. 5) are highlighted with a star next to the sequence name or clade. Viral AMG sequences are highlighted in blue, internal nodes and SH-like supports are represented by proportional circles (all nodes with support <0.40 were collapsed). Each P-II AMG is associated with an abundance profile (right) displaying the relative abundance of the contig across the 91 epipelagic and mesopelagic samples (based on normalized coverage; that is, contig coverage per Gb of metagenome). b, Comparison of P-II-containing contig maps. Ammonia transporter genes linked to P-II are indicated on the map (dark red). When available, the viral-cluster affiliation of each contig is indicated next to the contig name. Contig GOV_bin_5834_contig-100_7 is too short to be clustered based on a shared protein cluster network; however, the seed contig of its population was clustered (in VC_12, Siphoviridae P12024virus), hence the indication of this seed contig affiliation.
Extended Data Figure 8 Diversity, distribution, and genome context of amoC gene in GOV contigs.
a, Maximum-likelihood tree (from an amino-acid alignment) including the GOV amoC AMG and microbial sequences from microbial metagenomes and NCBI nr database. The affiliation of microbial clades (either from the NCBI reference or from the LCA affiliation of metagenomic contigs) is indicated by the colouring of the grouped clades or by a coloured square next to the sequence. Viral AMG sequence is highlighted in blue, internal nodes and SH-like supports are represented by proportional circles (all nodes with support <0.40 were collapsed). b, Abundance profile displaying the relative abundance of the contig across the 91 epipelagic and mesopelagic samples (based on normalized coverage; that is, contig coverage per Gb of metagenome). c, Map of the amoC-containing contig.
Extended Data Figure 9
Normalized coverage of contigs harbouring AMGs as a function of the temperature and nutrient concentrations of the corresponding samples. AMGs are grouped by clade based on their phylogeny (see Extended Data Figs 5, 6, 7) and their coverages are cumulated if multiple contigs are included in a clade. Plots display the cumulated normalized coverage of a clade (y axis) as a function of the temperature or nutrient concentration (x axis) across all epipelagic samples for geographically unrestricted clades (that is, clades found in >5 samples, see Fig. 3c). Mesopelagic samples were excluded from the analysis since the AMG signal was detected in epipelagic samples. Samples are colour-coded according to ocean and sea regions (Supplementary Table 1). The calculated preferential range of temperature or nutrient concentration is displayed below each plot for epipelagic AMGs (P-II-4 distribution could not be linked to specific environmental conditions, but this AMG is the only one consistently retrieved in mesopelagic samples).
Supplementary information
Supplementary Information
This file includes Supplementary Text and Data, Supplementary Figures 1-8, legends for Supplementary Tables 1-6 (see separate Excel files) and additional references. The text includes additional information and literature context that help document details about the generation of the GOV dataset (assembly, identification of viral contigs, read mapping to viral contigs), viral cluster definition and affiliation (including comparison to other genome classification methods), host prediction (methods evaluation and results), discussions about AMG affiliation and host prediction for associated contigs, and a list of supports and sponsors of Tara Oceans and Malaspina expeditions (including the list and affiliation of Tara Oceans coordinators). (PDF 8379 kb)
Supplementary Table 1
This file contains the list of viromes in the GOV dataset. Station number, depth, Longhurst province, biome, and sequencing effort are indicated for each virome sample. (XLS 63 kb)
Supplementary Table 2
This file contains the GOV viral population summary. The number of contigs and length of each population are presented, alongside their normalized coverage across the 104 GOV viromes. (XLS 13582 kb)
Supplementary Table 3
This file contains a summary of GOV Viral Clusters (VCs). For each VC, the composition (number and origin of VC members), affiliation, and coverage across GOV viromes are indicated. (XLS 865 kb)
Supplementary Table 4
This file contains the benchmarks of in silico host prediction methods. Results of host prediction methods evaluations performed using the NCBI RefSeq Virus database and VirSorter Curated Dataset. (XLS 7 kb)
Supplementary Table 5
This file contains the host prediction for GOV viral contigs that are associated with a population. Predictions are reported for each population with the type of signal (blastn, CRISPR, tetranucleotide composition), the host sequence used, and the strength of the prediction. (XLS 699 kb)
Supplementary Table 6
This file contains the PFAM domains detected in GOV viral contigs (≥1.5 kb). For each PFAM domain, the number of genes detected in the GOV dataset is indicated, alongside the functional category of the domain. (XLS 369 kb)
About this article
Cite this article
Roux, S., Brum, J., Dutilh, B. et al. Ecogenomics and potential biogeochemical impacts of globally abundant ocean viruses. Nature 537, 689–693 (2016). https://doi.org/10.1038/nature19366
DOI: https://doi.org/10.1038/nature19366