Tracing mother-infant transmission of bacteriophages by means of a novel analytical tool for shotgun metagenomic datasets: METAnnotatorX

Douwe Van Sinderen

Milani et al. Microbiome (2018) 6:145 https://doi.org/10.1186/s40168-018-0527-z RESEARCH Open Access Tracing mother-infant transmission of bacteriophages by means of a novel analytical tool for shotgun metagenomic datasets: METAnnotatorX Christian Milani1†, Eoghan Casey2,3†, Gabriele Andrea Lugli1†, Rebecca Moore4,2, Joanna Kaczorowska2,3, Conor Feehily2,5, Marta Mangifesta1,6, Leonardo Mancabelli1, Sabrina Duranti1, Francesca Turroni1,7, Francesca Bottacini2,3, Jennifer Mahony2,3, Paul D. Cotter2,5, Fionnuala M. McAuliffe4,2, Douwe van Sinderen1,2,3 and Marco Ventura1,7* Abstract Background: Despite the relevance of viral populations, our knowledge of (bacterio) phage populations, i.e., the phageome, suffers from the absence of a “gold standard” protocol for viral DNA extraction with associated in silico sequence processing analyses. To overcome this apparent hiatus, we present here a comprehensive performance evaluation of various protocols and propose an optimized pipeline that covers DNA extraction, sequencing, and bioinformatic analysis of phageome data. Results: Five widely used protocols for viral DNA extraction from fecal samples were tested for their performance in removal of non-viral DNA. Moreover, we developed a novel bioinformatic platform, METAnnotatorX, for metagenomic dataset analysis. This in silico tool facilitates a range of read- and assembly-based analyses, including taxonomic profiling using an iterative multi-database pipeline, classification of contigs at genus and species level, as well as functional characterizations of reads and assembled data. Performances of METAnnotatorX were assessed through investigation of seven mother-newborn pairs, leading to the identification of shared phage genotypes, of which two were genomically decoded and characterized. METAnnotatorX was furthermore employed to evaluate a protocol for the identification of contaminant non-viral DNA in sequenced datasets and was exploited to determine the amount of metagenomic data needed for robust evaluation of human adult-derived (fecal) phageomes. Conclusions: Results obtained in this study demonstrate that a comprehensive pipeline for analysis of phageomes will be pivotal for future explorations of the ecology of phages in the gut environment as well as for understanding their impact on the physiology and bacterial community kinetics as players of dysbiosis and homeostasis in the gut microbiota. Keywords: Gut microbiota, Metagenomics, Metagenome, Virome, Gastro intestinal tract, Vertical transmission * Correspondence: marco.ventura@unipr.it † Christian Milani, Eoghan Casey and Gabriele Andrea Lugli contributed equally to this work. 1 Laboratory of Probiogenomics, Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, Parco Area delle Scienze 11a, 43124 Parma, Italy 7 Microbiome Research Hub, University of Parma, Parma, Italy Full list of author information is available at the end of the article © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. Milani et al. Microbiome (2018) 6:145 Background The establishment of Next Generation Sequencing (NGS) technologies has facilitated explorations into the ecology and functionality of microorganisms living in complex communities [1]. Notably, a substantial portion of these research efforts has focused on the characterization of prokaryotes colonizing humans, being microbiota members that reside in various body sites, investigations that have clearly revealed the existence of an intimate relationship between these microbial populations and their host [2]. In this context, bacteria colonizing the gastro-intestinal tract have been described as a “forgotten organ” based on their understudied, yet key roles in a wide range of aspects of animal physiology, including the development, metabolism, and functionality of the immune system [3, 4]. Despite the scientific interest in the bacterial component of the gut microbiota, current knowledge on the associated (bacterio) phage populations, i.e., the phageome, is very limited. These bacterial viruses are believed to play an important role in influencing the ecology of prokaryotes, e.g., by modulating population dynamics and catalyzing horizontal gene transfer events [5], although knowledge on their prevalence, diversity, and specific functionalities is still in its infancy. In this context, only a limited number of studies have evaluated the functional role of phages in the gastrointestinal tract (GIT), the majority of which provide a descriptive profiling of the viral population in saliva or fecal samples [6–12]. This rather naïve view of phage ecology in the GIT reflects the very limited exploration of the role, if any, of phages in the development and evolution of common gut diseases, with studies focusing mainly on inflammatory bowel diseases (IBD), such as Crohn’s disease (CD) and ulcerative colitis (UC) [13, 14]. This knowledge gap can primarily be attributed to the lack of a comprehensive experimental pipeline for metagenomic analyses of viral populations that ideally should include an efficient and reliable protocol for viral DNA extraction and purification, as well as bioinformatic tools for phageome data management, processing and associated analysis. In fact, while a range of optimized protocols for extraction of phage DNA have been published [15, 16], their efficiency has not yet been comparatively assessed, primarily because the tools that are currently available for the analysis of phage metagenomic datasets rely on simple homology searches against a single viral database [17, 18]. Thus, the absence of data regarding other components of the metagenomic dataset, i.e., archaea, bacteria, and eukaryotes, does not permit an accurate evaluation of the viral DNA retrieved from an environmental sample. Moreover, the lack of available tools for efficient phageome assembly and subsequent functional interrogation and taxonomic classification of generated contigs prevents identification and reconstruction of the complete genome of free phage particles. Altogether, these limitations underline the need Page 2 of 16 for a thorough assessment of available methodologies for phageome analysis, with particular focus on the identification of the viral DNA extraction protocol providing the lowest relative abundance of exogenous DNA, as well as definition of a comprehensive bioinformatic pipeline for phylogenetic and genomic characterization of the viral population. For these reasons, the objective of the current report was to develop a start-to-finish protocol to cover phageome analysis from DNA extraction of fecal samples all the way to sequence data processing and database interrogations. We therefore performed a comparative analysis of the five most widely employed protocols for viral DNA extraction and purification from fecal samples, coupled with an in-depth evaluation of the generated sequences by means of a novel viral metagenomics analysis platform, which we called METAnnotatorX. This bioinformatics analysis platform supports a wide range of read- and assembly-based analyses using a multi-database, homology-based search approach to explore the viral, archaeal, bacterial, and eukaryotic biodiversity within a generated sequence dataset from a given sample. In order to provide an example of the functionality offered by analysis of phageomes, the optimal identified protocol for viral DNA extraction and METAnnotatorX was employed so as to profile phageomes of fecal samples collected from seven mothers and their corresponding infants. Results allowed the detection of mother-to-infant vertical transmission of phages, two of which were also genomically decoded and annotated. Methods Ethical statement and sample collection The study protocol was approved by the National Maternity Hospital Dublin ethics committee, and informed written consent for fecal sample collection and associated microbiological analyses was obtained from all participants or their legal guardians. Virus-like particle (VLP) isolation and DNA extraction Extraction protocols 1A, 1B, and 1C 0.5 g of fecal material was suspended in 45 ml of sterile SMG (sodium chloride magnesium sulphate) buffer (200 mM NaCl, 10 mM MgSO4, 50 mM Tris-HCl (pH 7.5), 0.01% gelatin) and homogenized in filter bags for 2 min at medium speed. The resultant solution was then incubated on ice for 1 h for virus-like particle (VLP) desorption. Samples were then centrifuged at 5000×g for 45 min at 4 °C. Supernatants were recovered and large particulates were removed using Whatman glass microfibre filters (Sigma-Aldrich, St. Louis, MO, USA). A second centrifugation step of 5000×g for 45 min at 4 °C was performed; the supernatant was then collected and, in the case of protocol 1A, used for VLP Milani et al. Microbiome (2018) 6:145 precipitation through supplementation with 10% PEG 6000 (Sigma-Aldrich, St. Louis, MO, U.S.A.) at 4 °C overnight. In contrast, in the case of protocol 1B, the supernatant was first subjected to 0.45-μm filtration (all filters obtained from Sarstedt, Numbrecht, Germany), while for protocol 1C, the supernatant was subjected to 0.45-μm filtration, followed by a 0.2-μm filtration, before precipitation of VLPs. PEG-precipitated VLPs were collected by centrifugation at 25000×g for 45 min at 4 °C. The resulting VLP-containing pellets where then re-suspended in 400 μl SMG buffer at 4 °C. The sample was DNase treated with 10 U ml−1 DNase I (Roche, Basel, Switzerland) for 1 h at room temperature with subsequent inactivation performed by heat treatment at 75 °C for 10 min. Viral DNA was then extracted using the Norgen Phage DNA isolation kit (Norgen Biotek Corp., Ontario, CA) according to the manufacturer’s instructions. Extraction protocols FD and DTT 0.5 g of fecal material was suspended in 1.2 mL of SMG buffer by vortexing for 2 min. The resultant solution was then incubated on ice for 1 h. Following incubation, a centrifugation step of 2500×g for 5 min at 4 °C was performed. The supernatant was then centrifuged again at 5000×g for 15 min at 4 °C. The supernatant was retained, and dithiothreitol (DTT) (Promega, Madison, WI, USA) was added to a final concentration of 6.5 mM and incubated for 1 h at 37 °C. In the FD protocol, this DTT treatment was absent. The resultant solution was then filtered employing a 0.45-μm filter. The sample was DNase treated with 10 U ml−1 DNase I (Roche) for 1 h at room temperature with subsequent inactivation performed by heat treatment at 75 °C for 10 min. Viral DNA was then extracted using the Norgen Phage DNA isolation kit according to the manufacturer’s instructions. DNA concentrations were quantified using the Qubit Fluorometer and Qubit dsDNA HS Assay Kit (Life Technologies, Bleiswijk, Netherlands). Shotgun metagenomics sequencing and analysis DNA was fragmented to 550–650 bp using a BioRuptor machine (Diagenode, Belgium). Samples were prepared following the TruSeq Nano DNA Sample Preparation Guide (Part#15041110Rev.D). Sequencing was performed using an Illumina NextSeq 500 sequencer with NextSeq Mid Output v2 Kit chemicals (Illumina Inc., San Diego, CA 92122, USA). Read- and assembly-based analyses were performed using the METAnnotatorX bioinformatic platform described below in this manuscript. Mapping of reads on nucleotide sequences was performed using the software BowTie2 [19] and retrieval of mapping or non-mapping reads was performed using the Sequence Alignment/Map tools (SAMtools) 43 [20]. Page 3 of 16 METAnnotatorX The METAnnotatorX bioinformatics platform described in this manuscript performs a range of in silico taxonomic and functional analyses of both reads and contigs assembled from shotgun metagenomics datasets. Details are reported in the “Results and Discussion” section while the default METAnnotatorX settings, used for all analyses reported in this manuscript, are listed in Additional file 1: Table S1. Results and discussion Comparative evaluation of various protocols for viral DNA extraction and purification Virome protocol analyses commonly consist of the isolation of virus-like particles (VLPs) from a fecal sample followed by extraction of the genetic material from these VLPs, prior to further analysis of the obtained genetic material by means of shotgun sequencing approaches [21–23]. Published protocols for VLP isolation from fecal samples all involve homogenization of fecal samples in a buffer, followed by centrifugation steps to remove bacteria and large particles, with a subsequent filtration step. Total nucleic acid can then be isolated from the resulting filtrate following a DNase treatment to remove bacterial DNA contamination [21–24]. Despite several attempts to optimize protocols for fecal VLP extraction (5, 6), a “gold standard” protocol has yet to be developed and to be accepted by the scientific community. A trial of an optimized PEG-precipitation method (Route 5 from [15] termed protocol 1A here) was undertaken with some modifications. Following sample homogenization, an incubation step on ice was included to encourage VLP desorption [25]. The other key modification of the protocol represents the omission of a CsCl density gradient centrifugation step as this has been shown to have a detrimental effect on phage infectivity [15] and can influence retrieved information on community composition by introducing a bias against certain phages [16]. Omission of the CsCl step is believed to lead to a more faithful representation of community composition, yet at the expense of a reduced efficiency of bacterial DNA removal [16]. To counteract this, we tested dead-end filtration steps, where protocol 1A lacked such a filtration step, protocol 1B included a 0.45-μm filtration step, whereas samples processed using protocol 1C were subjected to 0.45 μm followed by a 0.2-μm filtration, (in all protocols) prior to PEG precipitation. Furthermore, it was determined through a phage spiking experiment that PEG removal by buffer exchange was inefficient and in fact caused loss of phages during subsequent centrifugation (data not shown); therefore, DNA extraction was directly performed on the resuspended PEG-precipitated VLPs. Milani et al. Microbiome (2018) 6:145 Page 4 of 16 In addition to these protocols, two further methods from literature, namely the FD (termed here as 1D) and DTT (termed here as 1E) methods described by Kleiner et al. [16], were assessed. These PEG precipitation-based protocols require simple homogenization of the sample followed by filtration, DNase treatment, and DNA extraction, with the only difference between the two being a dithiotreitol treatment to degrade fecal mucus in the 1E protocol. In the current study, we modified these two protocols by the inclusion of a VLP desorption step and adjustment of the initial sample size. DNA yields were comparable when applying these five protocols on the same fecal sample, except in the case of protocol 1A, which yielded approximately four times more DNA compared to the other assessed protocols (Table 1). This was presumably due to the presence of host-derived DNA contaminating the viral DNA due to the lack of filtration and/or a density gradient centrifugation step. In terms of practical and experimental advantages, the 1D and 1E methods are vastly preferable to the 1A, 1B, and 1C methods in terms of execution time, with protocol completion achievable within 1 day as compared to 2 days, while also offering the advantage of a considerably shorter “hands-on” procedure (Table 1). Development of a comprehensive bioinformatic pipeline for analysis of shotgun metagenomic datasets A large proportion of the current, publicly available tools for analysis of (bacterio) phage populations, i.e., the phageome, relies on alignment against a single viral database to obtain taxonomic assignment of reads or pre-assembled contigs [17, 18]. This approach is very limiting since shotgun metagenomics datasets are mainly employed for taxonomic surveys, though such datasets may be able to generate novel information regarding genomic structure, functionality and host-specificity of identified phages. To fill these gaps, we developed a comprehensive bioinformatic platform, referred to here as METAnnotatorX, which performs a variety of analytical steps applied to a given shotgun metagenomic dataset. METAnnotatorX not only performs taxonomic and functional profiling of the reads, but also allows assembly and phage genome reconstruction, open reading frame identification, and annotation (Fig. 1). Moreover, the developed pipeline is able to analyze the read pools corresponding to archaea, bacteria, Table 1 Overview of viral DNA extraction protocols Protocol 1A 1B 1C 1D 1E Total DNA yield (ng) 58.3 8.2 10.2 15 10.2 Sample throughput (no. of samples processed simultaneously) 15–20 15–20 15–20 15–20 15–20 Protocol duration (days) 2 2 2 1 1 Hands-on time (hours) 10 10 10 6 7 and eukaryotes through iterative classification steps that exploit specific databases for viruses, bacteria, archaea, and eukaryotes. Notably, viruses are classified at the family and species level, while bacteria, archaea, and eukaryotes are classified at the genus and species level. Thus, the pipeline can be exploited not only to perform a comprehensive analysis of viromes, but also of shotgun metagenomic datasets that include bacterial, archaeal, and eukaryotic data (Fig. 1). METAnnotatorX is provided pre-installed in a virtual machine running Ubuntu 16.04.3 (http://probiogenomics.unipr.it/pbi/index.html). A graphic installation interface guides the user through a small number of steps for third party software installation and database downloading, which are necessary to install METAnnotatorX. The main graphic interface of the METAnnotatorX software allows selection of input dataset (s), output folder, and analysis steps (Fig. 1 and Additional file 1: Figure S1). Moreover, a configuration file provides the user with an option to modify a range of associated analysis parameters, such as the number of computing cores or databases to be used and specific cut-offs (Additional file 1: Figure S1). Outputs are provided as tabular files ready to be imported in spreadsheet software or as GenBank files in the case of assembled and annotated sequences (Additional file 1: Figure. S1). METAnnotatorX provides an innovative approach for taxonomic profiling of reads that relies on four consecutive read annotation steps querying four NCBI databases, including the Viral RefSeq, Archaeal RefSeq, bacterial RefSeq, and the whole RefSeq for eukaryote classification (Fig. 1). Notably, hits against the viral RefSeq are in default mode given the maximum priority, followed by archaeal RefSeq, bacterial RefSeq, and the whole RefSeq, so as to guarantee high sensitivity towards viral and archaeal profiling and to avoid annotation of viral reads as archaeal or bacterial in case of prophages that constitute part of prokaryotic genomes. In this regard, it is noteworthy that read-based metagenomic approaches cannot distinguish between reads corresponding to free viral particles and reads belonging to prophage genomes. Thus, efficient removal of non-viral DNA during DNA extraction is fundamental to minimize misclassification of prophages as free viral particles when analyzing phageomes. RefSeq databases are non-redundant datasets built from the sequence data available in the archival database GenBank, and each RefSeq record represents a synthesis of information obtained from GenBank records with identical sequences [26, 27]. It is also worth mentioning that the viral RefSeq database was selected as the default database for viral taxonomic classification since all its entries are genes predicted from manually revised and validated viral Milani et al. Microbiome (2018) 6:145 Page 5 of 16 Fig. 1 Schematic representation of the automated analyses performed by METAnnotatorX. Raw reads obtained from NGS sequencing can be directly used as input data for a range of read- and assembly-based analyses genomes. Although the viral RefSeq database is continually expanded and updated (at the time of writing of this manuscript), it encompasses 7485 genomes, whereas the GenBank viral database includes 5530 additional non-revised genomic sequences, thus totaling 13,015 genomes. METAnnotatorX was therefore developed to offer the possibility to interrogate the GenBank viral database as an alternative to the Viral RefSeq database if the user wants to maximize the sensitivity of the analysis while reducing specificity. Moreover, the user can request interrogation of alternative databases in the setting file. Notably, the header of fasta entries must be formatted as those included in the NCBI RefSeq database. In this context, external databases such as the recently published VirSorter [28] and IMG/VR databases [29] may represent useful alternatives. Nevertheless, due to the exponential increase of metagenomic data, such databases require constant updating as performed by NCBI for RefSeq databases. The user can also choose to perform functional classification analyses of the reads using custom databases for METAnnotatorX that can be downloaded and updated using a script available in the virtual machine. These analyses permit retrieval of (i) COG functional category profiles as based on the EggNog nomenclature [30]; (ii) carbohydrate-active enzymes, i.e., the glycobiome, based on CAZy database nomenclature [31]; and (iii) metabolic pathways based on the MetaCyc classification [32] (Fig. 1). Furthermore, shotgun metagenomic datasets can also be employed for metagenomic assembly using SPADES software [33] (Fig. 1). Notably, contigs > 5000 nucleotides are taxonomically classified by means of a novel in silico protocol, which taxonomically categorizes encoded ORFs following a multi-step approach, as described above for reads. The contigs are then classified with the most frequent taxonomy observed among genes encoded by each contig. Subsequently, the user can request the generation of GenBank files with annotated ORFs comprised of all contigs that share the same taxonomy at bacterial genus/viral family or species level (Fig. 1). ORFs are annotated based on the MEGAnnotator pipeline for accurate functional Milani et al. Microbiome (2018) 6:145 assignment [34]. Furthermore, each contig pool which corresponds to a taxonomic rank can be functionally profiled as indicated above. Additional analyses offered by METAnnotatorX encompass host prediction based on the CRISPRdb [35], as well as evaluation of the relative abundance and taxonomic profile of genes collected in user-provided databases, and identification of putative (pro) phage genomes without homologs in the NCBI Viral RefSeq database through screening of bacterial contigs for those encoding ORFs typically found in genomic modules of phages (Fig. 1). A comprehensive manual details the pipeline followed by each analysis offered by METAnnotatorX, including software and default cut-off values used (http:// probiogenomics.unipr.it/pbi/index.html). At the time of writing, we could not compare METAnnotatorX with the two available online tools for phageome analysis, i.e., VIROME [17] and MetaVir 2 [18] using a test dataset of known viral composition, due to limitations regarding input data or saturation of storage and computing capacities (details can be found in the Additional file 1). Nevertheless, we performed re-analysis of a dataset already processed with MetaVir2 that can be downloaded from the MetaVir2 website (Additional file 1: Table S2). In this context, comparison of the results retrieved through analysis of these datasets using MetaVir2 and METAnnotatorX revealed that METAnnotatorX is able to detect and classify a higher number of viral taxa (Additional file 1: Table S2). Notably, differences may be attributable to the more updated database and improved pipeline exploited by METAnnotatorX. In silico comparative analysis of shotgun metagenomics data obtained from the five tested protocols for viral DNA extraction and purification In order to reconstruct a detailed overview of the performance of the five tested protocols for double-stranded viral DNA purification, i.e. 1A, 1B, 1C, 1D and 1E, the same infant fecal sample was processed using these five distinct DNA isolation procedures. The obtained DNA was then subjected to Illumina paired-end sequencing. Subsequently, METAnnotatorX was employed for analysis of a sub-sample that consisted of 500,000 randomly selected reads of the total read pool obtained for each viral DNA purification protocol. Remarkably, read-based taxonomic profiling of the normalized datasets revealed that protocol 1E provides the best performance in terms of removal of non-viral DNA, i.e. the total relative abundance of reads not profiled as viral, in comparison to the other tested protocols (Fig. 2). Moreover, we evaluated the efficiency of recovered viral DNA obtained from the Page 6 of 16 five most abundant viral taxa profiled across all the five datasets (Fig. 2), encompassing both Siphoviridae and Podoviridae viral families. This was performed through mapping of reads obtained for each sample on the assembled contigs classified as the viral taxa listed in Fig. 2. Notably, evaluation of the number of mapped reads confirmed the superior performance of protocol 1E for all five viral taxa analyzed and demonstrated the absence of a species-specific bias in phage DNA enrichment. To confirm the observed performances in non-viral DNA removal, the five protocols were used to perform duplicate extractions from an additional human fecal sample. Notably, the obtained results confirmed the superior performance of protocol 1E and did not reveal any biases in the duplicates (Additional file 1: Figure S2). Overall, the 1E protocol yielded the best results both in terms of execution time (Table 1) and removal of non-viral DNA. Thus, this protocol to isolate and analyze double-stranded viral DNA was employed for processing of subsequent phageomes sequenced in this study. It should be noted that, since we did not include a multiple-displacement amplification (MDA) step in our pipeline, ssDNA viruses were not assessed (yet this can easily be remedied by the inclusion of such an MDA step). Evaluation of the sensitivity and specificity of phage classification as performed by METAnnotatorX An artificial sample of 500,000 reads was constructed using random reads corresponding to the virome of a human adult fecal sample with the addition of decreasing percentages of reads obtained from shotgun sequencing of C2 and 936 Lactococcus phages, as outlined in Additional file 1: Table S3. Notably, our findings showed that METAnnotatorX is able to accurately reconstruct the composition of the artificially composed sample, with limited discrepancies (Additional file 1: Table S3). Identification of contaminants The amount of viral DNA extracted from environmental samples may be of very low abundance, in particular when performing viral DNA extraction from samples with very limited bacterial colonization, e.g., meconium samples from newborns. This not only represents an issue for library preparation and sequencing yields but may also cause biases induced by environmental contamination. In fact, if the amount of viral DNA retrieved from a sample is limited, even the presence of a very low quantity of contaminating DNA is expected to result in the presence of a high relative contaminant level of sequencing reads in a given dataset. In order to identify and remove contaminant DNA in the phageome datasets used in this study, the genome Milani et al. Microbiome (2018) 6:145 Page 7 of 16 Fig. 2 Evaluation of non-viral DNA removal performances through analysis of viral DNA extracted from the same fecal sample using five different protocols. a The percentage of viral DNA detected through taxonomic classification of reads corresponding to coding regions. b The number of reads retrieved for the five most abundant viral taxa using the five different protocols align tool MAUVE [36] was exploited to perform cross-alignment of contigs obtained from the metagenomic datasets using METAnnotatorX. Interestingly, we observed that the five infant samples, which represent the first stool samples of these neonates following birth (i.e. the meconium), used for evaluation of mother-infant vertical transmission of phages (discussed below) share identical contigs (Additional file 1: Figure S3). ORF prediction and functional annotation of these contigs led to the reconstruction of the complete genome of phages extensively studied in our laboratory [37, 38]. Thus, we proceeded to map all datasets included in this study (see above and below for details) to these apparently contaminating contigs using a 99% identity cut-off in order to remove the reads corresponding to these putative contaminants. This cut-off was chosen to allow mapping of reads identical to the backbone, while permitting the 1% error rate that affects Illumina sequencing [39]. Moreover, the Milani et al. Microbiome (2018) 6:145 DNA extraction kit was identified as the primary source of contaminants and measures were taken to minimize such contamination, including the use of dedicated kits for fecal virome studies [40] and performing DNA isolation in laminar flow hoods [41]. However, in samples with low DNA abundance, the potential for DNA contamination remains significant, and it is therefore strongly recommended to perform routine sequencing of sham controls so as to monitor and identify DNA contaminants originating from the lab environment [42]. In this context, a newly acquired DNA extraction kit was used to process a sham sample, resulting in 46,269 quality-filtered reads, representing 4.6% of the target sequencing depth of 1,000,000 reads. Moreover, assembly of these data did not produce any contigs, thus indicating that the retrieved reads represent the sequencing background, i.e., sequencing and demultiplexing errors performed by the Illumina sequencer [39]. It is worth mentioning that while the latter approach is effective in the removal of contaminants that can be assembled into contigs, it is not possible to efficiently detect non-viral DNA present at low abundance using a read-based approach. Thus, prevention of DNA contamination should be considered critical in virome studies, particularly when analyzing samples with a low viral load. Notably, the presence of contaminant DNA from the lab environment seems to be a common issue in published phageome studies, as evidenced by MAUVE genomic alignment of contigs assembled from datasets sequenced in one of the largest infant phageome studies [22] and available in the NCBI SRA database (https://www.ncbi.nlm.nih.gov/sra). Interestingly, genome alignment of contigs assembled using METAnnotatorX from 12 random datasets revealed the presence of sequences taxonomically related to the Pseudomonas genus that are shared and show identity > 99% by most of the phageomes (Additional file 1: Figure S4). Notably, if a cross check of sequences assembled from unrelated samples processed in the same lab reveals contigs with high identity, they may represent contaminants from the environment. Thus, these contigs should be carefully evaluated and, if they are shown to represent contaminating sequences, be removed from such datasets. Evaluation of mother-infant transmission of phages To demonstrate the potential for a comprehensive pipeline for in depth analysis of phageomes, the 1E extraction protocol and METAnnotatorX platform were employed in combination for the analysis of fecal samples collected from seven mothers and their corresponding newborn infants. In total, 14 fecal samples were collected, corresponding to seven mothers sampled at 34 weeks of gestation and meconium Page 8 of 16 samples of their corresponding offspring. Viral DNA was extracted by means of the 1E protocol and sequenced with Illumina technology, aimed at achieving an output of 10 million reads for the meconium samples and 25 million reads for fecal samples of mothers. Shotgun sequencing produced a total of 148,797,588 reads, ranging from 238,288 to 34,105,775 reads (Additional file 1: Table S4). Notably, a high variability of sequencing yield was expected despite normalization of DNA used for library preparation, with those samples that encompass a very low virus load (i.e. meconium). The obtained datasets were processed with METAnnotatorX in order to classify the viral, archaeal, bacterial, and eukaryotic reads (Additional file 1: Figure S5). A complete profile of the archaeal and bacterial viral population is reported in Additional file 2. The obtained read-based taxonomic profiles revealed the presence of common viral taxa in each mother-infant pair (Table 2). To evaluate if the latter observation is due to sharing of the same phage genotypes, METAnnotatorX was employed for taxonomic assignment of contigs reconstructed from the infant datasets. Subsequently, the retrieved phage contigs were used as backbones for mapping of the reads constituting the dataset of the corresponding mother (Fig. 3). To avoid false positives, mappings were performed using a stringent identity cut-off of 99%. As reported above, a 99% cut-off was chosen to allow mapping of reads that are identical to the backbone while permitting the 1% error rate, which is imputable to Illumina sequencing [39]. Notably, for each mother-infant pair, reads of the mother’s phageome were mapped on multiple phage contigs reconstructed from the corresponding infant, thus suggesting a vertical route for phageome transmission from the maternal gut virome to her offspring. (Fig. 3). In contrast, cross-alignment of each mother dataset to phage contigs assembled from unrelated infants did not produce any common reads, thus indicating the absence of environmental contamination and supporting the notion of vertical transmission. Genome decoding and functional characterization of vertically transmitted phage genomes METAnnotatorX was employed for the reconstruction and functional characterization of complete viral genomes predicted to be transmitted from mother to newborn. This analysis resulted in the deduction of two phage genomes shared by Infant_7 and its corresponding mother’s phageome, named Infant_7_Myoviridae_36549 and Infant_7_Siphoviridae_29493, with genome sizes of 90,522 and 45,589 bp, respectively (Fig. 4). ORF prediction and functional annotation based on PHAST database [43] revealed that Infant_7_Myoviridae_36549 Milani et al. Microbiome (2018) 6:145 Page 9 of 16 Table 2 List of viral taxa with abundance > 0.01% identified in the fecal samples of both mother and corresponding newborn Viral taxonomy Motherinfant 1 Motherinfant 2 Motherinfant 3 Motherinfant 4 Motherinfant 5 Motherinfant 6 Unclassified__Bacillus virus 1 Shared Unclassified__Clostridium phage phiCT453A Shared Unclassified__Geobacillus phage GBSV1 Shared Unclassified__Geobacillus virus E2 Shared Myoviridae Abouovirus__Brevibacillus virus Abouo Myoviridae Felixo1virus__Escherichia virus AYO145A Myoviridae Felixo1virus__Escherichia virus EC6 Motherinfant 7 Shared Shared Shared Shared Shared Shared Shared Shared Shared Shared Myoviridae Felixo1virus__Escherichia virus HY02 Shared Myoviridae Felixo1virus__Escherichia virus JH2 Shared Myoviridae Felixo1virus__Escherichia virus VpaE1 Shared Myoviridae Felixo1virus__Salmonella virus FelixO1 Shared Myoviridae Felixo1virus__Salmonella virus HB2014 Shared Myoviridae Felixo1virus__Salmonella virus UAB87 Shared Myoviridae Mooglevirus__Citrobacter phage Michonne Shared Myoviridae Myoviridae_Unclassified__Bacillus phage 0305phi8-36 Shared Myoviridae Myoviridae_Unclassified__Bacillus phage AR9 Shared Myoviridae Myoviridae_Unclassified__Bacillus phage BCD7 Shared Myoviridae Myoviridae_Unclassified__Bacillus phage BM5 Myoviridae Myoviridae_Unclassified__Bacillus phage G Myoviridae Myoviridae_Unclassified__Bacillus phage SP-15 Shared Shared Shared Shared Shared Shared Shared Shared Shared Shared Myoviridae Myoviridae_Unclassified__Brochothrix phage A9 Shared Myoviridae Myoviridae_Unclassified__Clostridium phage c-st Shared Shared Shared Shared Shared Shared Shared Myoviridae Myoviridae_Unclassified__Clostridium phage phiCD211 Myoviridae Myoviridae_Unclassified__Cronobacter phage vB_CsaM_GAP32 Shared Shared Myoviridae Myoviridae_Unclassified__Enterobacteria phage phi92 Shared Shared Shared Shared Shared Myoviridae Myoviridae_Unclassified__Escherichia phage vB_EcoM_Alf5 Shared Myoviridae Myoviridae_Unclassified__Staphylococcus phage SA1 Shared Unclassified__Paenibacillus phage phiIBB_Pl23 Podoviridae Cba41virus__Cellulophaga virus Cba172 Shared Shared Podoviridae Cp1virus__Streptococcus virus Cp1 Shared Podoviridae Phi29virus__Bacillus virus B103 Podoviridae Phi29virus__Bacillus virus GA1 Shared Shared Shared Podoviridae Phi29virus__Bacillus virus phi29 Shared Shared Podoviridae Podoviridae_Unclassified__Actinomyces phage Av-1 Shared Shared Milani et al. Microbiome (2018) 6:145 Page 10 of 16 Table 2 List of viral taxa with abundance > 0.01% identified in the fecal samples of both mother and corresponding newborn (Continued) Viral taxonomy Motherinfant 1 Motherinfant 2 Motherinfant 3 Motherinfant 4 Motherinfant 5 Motherinfant 6 Podoviridae Podoviridae_Unclassified__Bacillus phage Aurora Shared Podoviridae Podoviridae_Unclassified__Bacillus phage MG-B1 Shared Podoviridae Podoviridae_Unclassified__Bacillus phage VMY22 Shared Podoviridae Podoviridae_Unclassified__Cellulophaga phage phi18:3 Shared Podoviridae Podoviridae_Unclassified__Planktothrix phage PaV-LD Shared Shared Shared Podoviridae Podoviridae_Unclassified__Streptococcus Shared phage Str-PAP-1 Unclassified__Pseudomonas phage O4 Shared Siphoviridae C5virus__Lactobacillus virus c5 Shared Siphoviridae Cba181virus__Cellulophaga virus Cba181 Shared Siphoviridae Cecivirus__Bacillus virus 250 Shared Siphoviridae Ff47virus__Mycobacterium virus Ff47 Shared Siphoviridae Mudcatvirus__Arthrobacter virus Mudcat Shared Shared Siphoviridae Omegavirus__Mycobacterium phage Courthouse Shared Shared Shared Shared Siphoviridae Pepy6virus__Rhodococcus virus Pepy6 Shared Siphoviridae Pepy6virus__Rhodococcus virus Poco6 Shared Siphoviridae Phietavirus__Staphylococcus phage EW Siphoviridae Sfi21dt1virus__Streptococcus phage 7201 Shared Siphoviridae Sfi21dt1virus__Streptococcus phage Abc2 Shared Siphoviridae Sfi21dt1virus__Streptococcus phage DT1 Shared Siphoviridae_Unclassified__Bacillus phage BCJA1c Shared Shared Shared Shared Shared Siphoviridae_Unclassified__Bacillus phage BtCS33 Shared Siphoviridae_Unclassified__Bacillus phage phi4J1 Siphoviridae_Unclassified__Bacteriophage Lily Shared Shared Siphoviridae_Unclassified__Bacteroides phage B124-14 Shared Siphoviridae_Unclassified__Brevibacillus phage Sundance Shared Siphoviridae_Unclassified__Cellulophaga phage phi46:1 Shared Siphoviridae_Unclassified__Clostridium phage 39-O Shared Siphoviridae_Unclassified__Clostridium phage phi8074-B1 Shared Siphoviridae_Unclassified__Clostridium phage phiCT453B Shared Shared Siphoviridae_Unclassified__Croceibacter phage P2559Y Siphoviridae_Unclassified__Enterococcus Shared Shared Motherinfant 7 Milani et al. Microbiome (2018) 6:145 Page 11 of 16 Table 2 List of viral taxa with abundance > 0.01% identified in the fecal samples of both mother and corresponding newborn (Continued) Viral taxonomy Motherinfant 1 Motherinfant 2 Motherinfant 3 Motherinfant 4 Motherinfant 5 Motherinfant 6 Shared Shared Shared Motherinfant 7 phage EFC-1 Siphoviridae_Unclassified__Geobacillus virus E3 Siphoviridae_Unclassified__Helicobacter phage phiHP33 Shared Siphoviridae_Unclassified__Lactobacillus phage Ldl1 Shared Siphoviridae_Unclassified__Lactococcus phage 1706 Siphoviridae_Unclassified__Lactococcus phage 50,101 Shared Shared Siphoviridae_Unclassified__Lactococcus phage bIL285 Shared Siphoviridae_Unclassified__Lactococcus phage Tuc2009 Shared Siphoviridae_Unclassified__Mycobacterium phage BTCU-1 Shared Siphoviridae_Unclassified__Pseudomonas phage YMC11/07/P54_PAE_BP Shared Siphoviridae_Unclassified__Riemerella phage RAP44 Shared Siphoviridae_Unclassified__Staphylococcus phage StB20 Shared Siphoviridae_Unclassified__Streptococcus phage Dp-1 Siphoviridae_Unclassified__Streptococcus phage MM1 Shared Shared Shared Shared Siphoviridae_Unclassified__Streptococcus phage PH15 Shared Siphoviridae_Unclassified__Streptococcus phage phiNJ2 Shared Siphoviridae_Unclassified__Streptococcus phage SM1 Shared Shared Shared Siphoviridae_Unclassified__Synechococcus phage S-CBS3 Shared Siphoviridae_Unclassified__Vibrio phage SIO-2 Shared Siphoviridae Spbetavirus__Bacillus virus SPbeta Unclassified__Streptococcus phage 20617 Shared Shared Shared Shared Shared Shared Unclassified__Streptococcus phage phiARI0131-2 Unclassified__Uncultured phage crAssphage encodes 118 genes, 89 of which were shown to encode hypothetical proteins, while Infant_7_Siphoviridae_29493 encodes a total of 62 genes, representing 41 hypothetical proteins (Fig. 4). Interestingly, evaluation of the taxonomy of homologous genes identified in the PHAST database showed that 63% of the ORFs encoded by Infant_7_Myoviridae_36549 and 32% of the ORFs encoded by Infant_7_Siphoviridae_29493 share distant homology with Shared Shared Shared Shared genes encoded by Bacillus phage BCD7 and Bacteroides phage B124-14, respectively. This finding suggests that the hosts of Infant_7_Myoviridae_36549 and Infant_7_Siphoviridae_29493 are members of the Firmicutes and Bacteroidetes phyla. Analysis of phage modules revealed that Infant_7_ Myoviridae_36549 and Infant_7_ Siphoviridae_29493 possess four modules typical of Myoviridae and Milani et al. Microbiome (2018) 6:145 Page 12 of 16 Fig. 3 Identification of vertical transmission events of phages. For each of the seven enrolled infants, the assembled viral contigs > 5000 bp were used as backbone for stringent mapping of sequencing reads obtained from their mothers. In case mapping reads were observed, the contig was either colored in yellow or in black Siphoviridae phages, i.e., DNA replication, DNA packaging, and tail and lysis module (Fig. 4). Both phages lack a clear lysogeny module, with no genes encoding integrases found within the genomes (Fig. 4). Notably, Infant_7_Myoviridae_36549 possesses a large region encoding genes of unknown function interspersed with genes with putative functions such as a putative type III restriction protein and two queuosine (Que) biosynthesis genes. The latter compound is a modified nucleoside that is present in certain tRNAs [44, 45], and genes for its synthesis have been identified in other Myoviridae phages [46]. Evaluation of the minimum amount of shotgun metagenomics data needed for robust phage biodiversity assessment The choice of the target sequencing depth is a critical step in resource management when planning phageome studies using shotgun metagenomics sequencing. To define the number of sequence reads needed to obtain a reliable and comprehensive coverage of the biodiversity from read- and contig-based analyses, the five datasets of mothers with > 20 M reads (Additional file 1: Table S4) were subjected to iterative analysis of subsamples to construct rarefaction curves reporting the number of phage species identified in sub-samplings from 0.5 M up to 20 M reads. Notably, for each of the five samples analyzed, the number of phage taxa detected increased exponentially until a read pool size of about 7 M reads, beyond which a plateauing was observed (Additional file 1: Figure S6). Moreover, the average curve obtained by integration of the five datasets revealed that 7.5 M reads are enough to cover 70% of the total biodiversity identified in the total pool of 20 M reads. This indicates that 7 M reads are the target sequencing depth needed to obtain a comprehensive read-based overview of the phage population harbored by a given fecal sample obtained from a healthy adult (Fig. 5). Focusing on the assembly and analysis of phage genomes, we constructed a rarefaction curve reporting the number of viral taxa for which we obtained at least one assembled contig > 5000 bp at increasing subsampling points from 0.5 M up to 25 M reads. Interestingly, the obtained graphs revealed that the curve rapidly increased up to the point of 7.5 M reads and then tends to plateau (Additional file 1: Figure S6). Furthermore, evaluation of the average curve revealed that 7.5 M reads are enough to obtain contigs for 70% of the total number of phage taxa assembled 25 M reads. Notably, evaluation of the logarithmic trendline for both the read- and contig-based rarefaction curves revealed that doubling the amount of shotgun metagenomic reads would only provide a limited increase of 14.4 and 16% in viral taxa identified through read profiling and contig classification, respectively (Additional file 1: Figure S7). Altogether, these results indicate that the minimum sequencing depth needed for robust read-based profiling and Milani et al. Microbiome (2018) 6:145 Page 13 of 16 Fig. 4 Genomic characterization of two vertically transmitted phages. a, b The genome map of the phages Infant_7_Myoviridae_36549 and Infant_7_Siphoviridae_29493, respectively. Genes are colored based on their predicted function assembly of gut phageomes of healthy adults is approximately 7.5 M reads. In fact, additional sequencing outputs do not provide additional valuable information about the biodiversity of phages in these complex ecosystems (Fig. 5). Nevertheless, re-evaluation and adjustment of the target sequencing depth is necessary in case of analysis of samples with remarkably lower or higher bacterial and viral biodiversity, e.g., infant gut or soil samples. In this context, we exploited the dataset of Infant 7 to reconstruct rarefaction curves of viral taxa observed trough taxonomic classification of reads and assembled contigs > 5000 bp (Additional file 1: Figure S8). Notably, these data confirmed 7 M reads as an optimal sequencing depth also for comprehensive analysis of infant phageomes (Additional file 1: Figure S8). Conclusions Despite environmental and host-associated microbiomes being the subject of an increasing number of studies, the phageome associated with these complex bacterial communities remains poorly understood. This is primarily due to the current lack of a gold standard procedure for viral DNA extraction and data analysis. Instead, there are a variety of different procedures associated with publications, which makes it near impossible to compare results between different studies. To address this issue, we performed a comparative assessment of various DNA extraction methods for virome analysis and developed a novel bioinformatic tool, METAnnotatorX, which enables an integrated and comprehensive processing of viral and prokaryotic metagenomic datasets. Notably, this software can perform a wide range of read- and assembly-based analyses and represents, to date, the most complete bioinformatics platform for the study of viromes. METAnnotatorX was employed to perform an in-depth comparison of five protocols for viral DNA extraction and enrichment, leading to the identification of protocol 1E as the one that performs best in terms of removal of non-viral DNA, unbiased representation of the viral population and execution time. Moreover, we also analyzed five deep-sequenced viromes retrieved from feces of human adults. The generated results demonstrated that 7.5 M reads represent a sufficient sequencing depth needed for both read- and assembly-based investigation of gut phageomes of heathy human adults. The proposed comprehensive pipeline for phageome analysis was then used to shed light on the vertical acquisition of phages by infants. Analysis of fecal samples collected from seven mothers and their newborns revealed that they share identical phage Milani et al. Microbiome (2018) 6:145 Page 14 of 16 Fig. 5 Evaluation of the optimal sequencing depth for read- and assembly-based analyses through investigation of five sequenced datasets. a The average number of viral taxa detected by means of read-based taxonomic profiling at increasing sub-samplings of the total read pools. b The average number of viral taxa detected among contigs assembled using increasing sub-samplings of the total read pools genotypes, thus indicating the existence of a putative vertical route for transmission of phages from the mother to the infant. Moreover, METAnnotatorX also allowed, for the first time, the reconstruction and characterization of the genome of two genotypes predicted to be vertically transmitted. Notably, these results demonstrate that the use of a comprehensive pipeline for analysis of phageomes will be pivotal for future explorations of the dark matter of phageomes, such as phage ecology in the gut environment, the role of phages in modulating the bacterial population and their impact on the physiology as well as bacterial community kinetics as players of dysbiosis and homeostasis in the gut microbiota. Additional files Additional file 1: Supplementary text, tables and figures. (DOCX 5306 kb) Additional file 2: Archaeal and bacterial viruses profiled in the analyzed samples. (XLSX 152 kb) Acknowledgements We gratefully acknowledge the technical assistance of Elaine M. Lawton, Teagasc, Moorepark Food Research Centre, Fermoy, Co. Cork, Ireland. This research benefited from the HPC (High Performance Computing) facility of Milani et al. Microbiome (2018) 6:145 Page 15 of 16 the University of Parma, Italy. We furthermore thank GenProbio srl for financial support of the Laboratory of Probiogenomics. Funding This work was primarily funded by the EU Joint Programming Initiative—A Healthy Diet for a Healthy Life (JPI HDHL, http:// www.healthydietforhealthylife.eu/) to DvS (in conjunction with Science Foundation Ireland [SFI], Grant number 15/JP-HDHL/3280) and to MV (in conjunction with MIUR, Italy). J.M. is supported by a Starting Investigator Research Grant (SIRG) (Ref. No. 15/SIRG/3430) funded by the Science Foundation Ireland (SFI). This publication was also supported in part by a research grant from Science Foundation Ireland (SFI) under Grant No. 12/RC/2273 and 16/SP/3827 and by a research grant from Alimentary Health Ltd. The study is supported by Fondazione Cariparma, under TeachInParma Project. Availability of data and materials Shotgun metagenomics datasets obtained in this study were deposited in SRA under the accession number PRJNA422569. METAnnotatorX virtual box can be downloaded from the Probiogenomics lab website (http://probiogenomics.unipr.it/pbi). Authors’ contributions CM and GAL developed the METAnnotatorX, performed the bioinformatics analyses, and wrote the manuscript. OC performed the viral DNA extractions and wrote the manuscript. Sample and metadata collection was done by RM, JK, CF, JM, PDC, and FMMA. MM, SD, and FT performed the library preparation and Illumina sequencing. FB contributed to the additional bioinformatics analyses during the revision of the manuscript. LM performed the bioinformatics analyses for phage contig reconstruction. DvS and MV designed the study and wrote the manuscript. All authors read and approved the final manuscript. Ethics approval and consent to participate The study protocol was approved by National Maternity Hospital Dublin ethics committee, and informed written consent for fecal sample collection and associated microbiological analyses was obtained from all participants or their legal guardians. Consent for publication Not applicable. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. Competing interests The authors declare that they have no competing interests. 18. Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. Author details 1 Laboratory of Probiogenomics, Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, Parco Area delle Scienze 11a, 43124 Parma, Italy. 2APC Microbiome Ireland, University College Cork, Cork, Ireland. 3School of Microbiology, University College Cork, Cork, Ireland. 4 UCD Perinatal Research Centre, School of Medicine, University College Dublin, National Maternity Hospital, Dublin, Ireland. 5Teagasc, Moorepark Food Research Centre, Fermoy, Co., Cork, Cork, Ireland. 6GenProbio srl, Parma, Italy. 7Microbiome Research Hub, University of Parma, Parma, Italy. 19. 20. 21. 22. 23. Received: 19 February 2018 Accepted: 9 August 2018 24. References 1. Heather JM, Chain B. The sequence of sequencers: the history of sequencing DNA. Genomics. 2016;107:1–8. 2. Eloe-Fadrosh EA, Rasko DA. The human microbiome: from symbiosis to pathogenesis. Annu Rev Med. 2013;64:145–63. 3. O'Hara AM, Shanahan F. The gut flora as a forgotten organ. EMBO Rep. 2006;7:688–93. 4. Milani C, Duranti S, Bottacini F, Casey E, Turroni F, Mahony J, Belzer C, Delgado Palacio S, Arboleya Montes S, Mancabelli L, et al. The first microbial 25. 26. 27. colonizers of the human gut: composition, activities, and health implications of the infant gut microbiota. Microbiol Mol Biol Rev. 2017;81. https://doi. org/10.1128/MMBR.00036-17. Print 2017 Dec. Mirzaei MK, Maurice CF. Menage a trois in the human gut: interactions between host, bacteria and phages. Nat Rev Microbiol. 2017;15:397–408. Pride DT, Salzman J, Haynes M, Rohwer F, Davis-Long C, White RA 3rd, Loomer P, Armitage GC, Relman DA. Evidence of a robust resident bacteriophage population revealed through analysis of the human salivary virome. ISME J. 2012;6:915–26. Abeles SR, Pride DT. Molecular bases and role of viruses in the human microbiome. J Mol Biol. 2014;426:3892–906. Yolken RH, Severance EG, Sabunciyan S, Gressitt KL, Chen O, Stallings C, Origoni A, Katsafanas E, Schweinfurth LA, Savage CL, et al. Metagenomic sequencing indicates that the oropharyngeal phageome of individuals with schizophrenia differs from that of controls. Schizophr Bull. 2015;41:1153–61. Ogilvie LA, Caplin J, Dedi C, Diston D, Cheek E, Bowler L, Taylor H, Ebdon J, Jones BV. Comparative (meta) genomic analysis and ecological profiling of human gut-specific bacteriophage phiB124-14. PLoS One. 2012;7:e35053. Yarygin K, Tyakht A, Larin A, Kostryukova E, Kolchenko S, Bitner V, Alexeev D. Abundance profiling of specific gene groups using precomputed gut metagenomes yields novel biological hypotheses. PLoS One. 2017;12:e0176154. Minot S, Bryson A, Chehoud C, Wu GD, Lewis JD, Bushman FD. Rapid evolution of the human gut virome. Proc Natl Acad Sci U S A. 2013;110:12450–5. Minot S, Sinha R, Chen J, Li H, Keilbaugh SA, Wu GD, Lewis JD, Bushman FD. The human gut virome: inter-individual variation and dynamic response to diet. Genome Res. 2011;21:1616–25. Norman JM, Handley SA, Baldridge MT, Droit L, Liu CY, Keller BC, Kambal A, Monaco CL, Zhao G, Fleshner P, et al. Disease-specific alterations in the enteric virome in inflammatory bowel disease. Cell. 2015;160:447–60. Tetz GV, Ruggles KV, Zhou H, Heguy A, Tsirigos A, Tetz V. Bacteriophages as potential new mammalian pathogens. Sci Rep. 2017;7:7043. Castro-Mejia JL, Muhammed MK, Kot W, Neve H, Franz CM, Hansen LH, Vogensen FK, Nielsen DS. Optimizing protocols for extraction of bacteriophages prior to metagenomic analyses of phage communities in the human gut. Microbiome. 2015;3:64. Kleiner M, Hooper LV, Duerkop BA. Evaluation of methods to purify virus-like particles for metagenomic sequencing of intestinal viromes. BMC Genomics. 2015;16:7. Wommack KE, Bhavsar J, Polson SW, Chen J, Dumas M, Srinivasiah S, Furman M, Jamindar S, Nasko DJ. VIROME: a standard operating procedure for analysis of viral metagenome sequences. Stand Genomic Sci. 2012;6:427–39. Roux S, Tournayre J, Mahul A, Debroas D, Enault F. Metavir 2: new tools for viral metagenome comparison and assembled virome analysis. BMC Bioinformatics. 2014;15:76. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012;9:357–9. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, Genome Project Data Processing S. The sequence alignment/ map format and SAMtools. Bioinformatics. 2009;25:2078–9. Breitbart M, Haynes M, Kelley S, Angly F, Edwards RA, Felts B, Mahaffy JM, Mueller J, Nulton J, Rayhawk S, et al. Viral diversity and dynamics in an infant gut. Res Microbiol. 2008;159:367–73. Lim ES, Zhou Y, Zhao G, Bauer IK, Droit L, Ndao IM, Warner BB, Tarr PI, Wang D, Holtz LR. Early life dynamics of the human gut virome and bacterial microbiome in infants. Nat Med. 2015;21:1228–34. Reyes A, Blanton LV, Cao S, Zhao G, Manary M, Trehan I, Smith MI, Wang D, Virgin HW, Rohwer F, Gordon JI. Gut DNA viromes of Malawian twins discordant for severe acute malnutrition. Proc Natl Acad Sci U S A. 2015;112:11941–6. Hayes S, Mahony J, Nauta A, van Sinderen D. Metagenomic approaches to assess bacteriophages in various environmental niches. Viruses. 2017;9:127. Hoyles L, McCartney AL, Neve H, Gibson GR, Sanderson JD, Heller KJ, van Sinderen D. Characterization of virus-like particles associated with the human faecal and caecal microbiota. Res Microbiol. 2014;165:803–12. Brister JR, Ako-Adjei D, Bao Y, Blinkova O. NCBI viral genomes resource. Nucleic Acids Res. 2015;43:D571–7. Pruitt KD, Tatusova T, Maglott DR. NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 2007;35:D61–5. Milani et al. Microbiome (2018) 6:145 28. Roux S, Enault F, Hurwitz BL, Sullivan MB. VirSorter: mining viral signal from microbial genomic data. PeerJ. 2015;3:e985. 29. Paez-Espino D, Eloe-Fadrosh EA, Pavlopoulos GA, Thomas AD, Huntemann M, Mikhailova N, Rubin E, Ivanova NN, Kyrpides NC. Uncovering Earth’s virome. Nature. 2016;536:425–30. 30. Huerta-Cepas J, Szklarczyk D, Forslund K, Cook H, Heller D, Walter MC, Rattei T, Mende DR, Sunagawa S, Kuhn M, et al. eggNOG 4.5: a hierarchical orthology framework with improved functional annotations for eukaryotic, prokaryotic and viral sequences. Nucleic Acids Res. 2016;44:D286–93. 31. Lombard V, Golaconda Ramulu H, Drula E, Coutinho PM, Henrissat B. The carbohydrate-active enzymes database (CAZy) in 2013. Nucleic Acids Res. 2014;42:D490–5. 32. Caspi R, Billington R, Ferrer L, Foerster H, Fulcher CA, Keseler IM, Kothari A, Krummenacker M, Latendresse M, Mueller LA, et al. The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/ genome databases. Nucleic Acids Res. 2016;44:D471–80. 33. Nurk S, Bankevich A, Antipov D, Gurevich AA, Korobeynikov A, Lapidus A, Prjibelski AD, Pyshkin A, Sirotkin A, Sirotkin Y, et al. Assembling single-cell genomes and mini-metagenomes from chimeric MDA products. J Comput Biol. 2013;20:714–37. 34. Lugli GA, Milani C, Mancabelli L, van Sinderen D, Ventura M. MEGAnnotator: a user-friendly pipeline for microbial genomes assembly and annotation. FEMS Microbiol Lett. 2016;363. https://doi.org/10.1093/femsle/fnw049. Epub 2016 Mar 1. 35. Grissa I, Vergnaud G, Pourcel C. The CRISPRdb database and tools to display CRISPRs and to generate dictionaries of spacers and repeats. BMC Bioinformatics. 2007;8:172. 36. Darling AE, Mau B, Perna NT. progressiveMauve: multiple genome alignment with gene gain, loss and rearrangement. PLoS One. 2010;5:e11147. 37. Wegmann U, O'Connell-Motherway M, Zomer A, Buist G, Shearman C, Canchaya C, Ventura M, Goesmann A, Gasson MJ, Kuipers OP, et al. Complete genome sequence of the prototype lactic acid bacterium Lactococcus lactis subsp. cremoris MG1363. J Bacteriol. 2007;189:3256–70. 38. Ventura M, Zomer A, Canchaya C, O'Connell-Motherway M, Kuipers O, Turroni F, Ribbera A, Foroni E, Buist G, Wegmann U, et al. Comparative analyses of prophage-like elements present in two Lactococcus lactis strains. Appl Environ Microbiol. 2007;73:7771–80. 39. Schirmer M, Ijaz UZ, D'Amore R, Hall N, Sloan WT, Quince C. Insight into biases and sequencing errors for amplicon sequencing with the Illumina MiSeq platform. Nucleic Acids Res. 2015;43:e37. 40. Monaco CL, Kwon DS: Next-generation sequencing of the DNA virome from fecal samples. Bio Protoc. 2017;7(5). https://doi.org/10.21769/BioProtoc.2159. 41. Thoendel M, Jeraldo P, Greenwood-Quaintance KE, Yao J, Chia N, Hanssen AD, Abdel MP, Patel R. Impact of contaminating DNA in whole-genome amplification kits used for metagenomic shotgun sequencing for infection diagnosis. J Clin Microbiol. 2017;55:1789–801. 42. Salter SJ, Cox MJ, Turek EM, Calus ST, Cookson WO, Moffatt MF, Turner P, Parkhill J, Loman NJ, Walker AW. Reagent and laboratory contamination can critically impact sequence-based microbiome analyses. BMC Biol. 2014;12:87. 43. Zhou Y, Liang Y, Lynch KH, Dennis JJ, Wishart DS. PHAST: a fast phage search tool. Nucleic Acids Res. 2011;39:W347–52. 44. Iwata-Reuyl D. Biosynthesis of the 7-deazaguanosine hypermodified nucleosides of transfer RNA. Bioorg Chem. 2003;31:24–43. 45. Morris RC, Elliott MS. Queuosine modification of tRNA: a case for convergent evolution. Mol Genet Metab. 2001;74:147–59. 46. Holmfeldt K, Solonenko N, Shah M, Corrier K, Riemann L, Verberkmoes NC, Sullivan MB. Twelve previously unknown phage genera are ubiquitous in global oceans. Proc Natl Acad Sci U S A. 2013;110:12798–803. Page 16 of 16

RELATED PAPERS

RELATED TOPICS

Log In

Tracing mother-infant transmission of bacteriophages by means of a novel analytical tool for shotgun metagenomic datasets: METAnnotatorX

Tracing mother-infant transmission of bacteriophages by means of a novel analytical tool for shotgun metagenomic datasets: METAnnotatorX

Related Papers

RELATED PAPERS

RELATED TOPICS