microorganisms
Article
Diversity of Human-Associated Bifidobacterial
Prophage Sequences
Darren Buckley 1 , Toshitaka Odamaki 2 , Jinzhong Xiao 2 , Jennifer Mahony 3 , Douwe van Sinderen 3, *
and Francesca Bottacini 3,4, *
1
2
3
4
*
Citation: Buckley, D.; Odamaki, T.;
Xiao, J.; Mahony, J.; van Sinderen, D.;
Bottacini, F. Diversity of
Human-Associated Bifidobacterial
Prophage Sequences. Microorganisms
2021, 9, 2559. https://doi.org/
10.3390/microorganisms9122559
Academic Editor: Henry C. Lin
Received: 20 October 2021
Accepted: 7 December 2021
Published: 10 December 2021
INFANT Research Centre, University College Cork, Cork, Ireland; darren.buckley@umail.ucc.ie
Next Generation Science Institute, Morinaga Milk Industry Co., Ltd., Zama 252-8583, Japan;
t-odamak@morinagamilk.co.jp (T.O.); j_xiao@morinagamilk.co.jp (J.X.)
APC Microbiome Ireland, School of Microbiology, University College Cork, Cork, Ireland; J.Mahony@ucc.ie
Biological Sciences, Munster Technological University, Cork, Ireland
Correspondence: d.vansinderen@ucc.ie (D.v.S.); francesca.bottacini@mtu.ie (F.B.)
Abstract: Members of Bifidobacterium play an important role in the development of the immature gut
and are associated with positive long-term health outcomes for their human host. It has previously
been shown that intestinal bacteriophages are detected within hours of birth, and that induced
prophages constitute a significant source of such gut phages. The gut phageome can be vertically
transmitted from mother to newborn and is believed to exert considerable selective pressure on target
prokaryotic hosts affecting abundance levels, microbiota composition, and host characteristics. The
objective of the current study was to investigate prophage-like elements and predicted CRISPR-Cas
viral immune systems present in publicly available, human-associated Bifidobacterium genomes.
Analysis of 585 fully sequenced bifidobacterial genomes identified 480 prophage-like elements with
an occurrence of 0.82 prophages per genome. Interestingly, we also detected the presence of very
similar bifidobacterial prophages and corresponding CRISPR spacers across different strains and
species, thus providing an initial exploration of the human-associated bifidobacterial phageome.
Our analyses show that closely related and likely functional prophages are commonly present
across four different species of human-associated Bifidobacterium. Further comparative analysis
of the CRISPR-Cas spacer arrays against the predicted prophages provided evidence of historical
interactions between prophages and different strains at an intra- and inter-species level. Clear
evidence of CRISPR-Cas acquired immunity against infection by bifidobacterial prophages across
several bifidobacterial strains and species was obtained. Notably, a spacer representing a putative
major capsid head protein was found on different genomes representing multiple strains across
B. adolescentis, B. breve, and B. bifidum, suggesting that this gene is a preferred target to provide
bifidobacterial phage immunity.
Keywords: gut microbiota; Bifidobacterium; bifidobacteria; phageome; prophage; CRISPR-Cas
Publisher’s Note: MDPI stays neutral
with regard to jurisdictional claims in
published maps and institutional affiliations.
Copyright: © 2021 by the authors.
Licensee MDPI, Basel, Switzerland.
This article is an open access article
distributed under the terms and
conditions of the Creative Commons
Attribution (CC BY) license (https://
creativecommons.org/licenses/by/
4.0/).
1. Introduction
The human body harbours trillions of microbial cells whose coordinated actions
are believed to be crucial for human health and well-being. The gastrointestinal (GI)
compartment contains the highest density of microbes, known collectively as the gut
microbiota. Over the course of a human life, the gut microbiota develops and changes in
composition, with the formation of complex trophic relationships ranging from symbiosis
to parasitism [1–3].
Colonisation of the infant gut is a crucial step in the assembly of the gut community
and begins immediately after, and perhaps even before, birth via the vertical transfer
of microbes from the mother [4]. The development and maturation of the early gut
microbiome is a dynamic and orderly process influenced by several environmental and
host factors in which positive and negative interactions occur between key members of the
Microorganisms 2021, 9, 2559. https://doi.org/10.3390/microorganisms9122559
https://www.mdpi.com/journal/microorganisms
Microorganisms 2021, 9, 2559
2 of 16
gut community [5]. It has been shown that certain species of Bifidobacterium or Lactobacillus
play an important role in the development of the immature gut and are associated with
positive implications for long-term host health [6,7]. Conversely, negative features of
the gut microbiota such as reduced genetic diversity or aberrant composition, especially
with abundant Bacteroides or Ruminococcus, have been associated with pro-inflammatory
states [8–13].
Bacteriophages (phages) are viruses that infect prokaryotic hosts and play crucial roles
in shaping the composition and diversity of bacterial communities in many environments
including the early gut microbiota [14,15]. Prophages are bacteriophages whose genetic
material is integrated into the bacterial chromosome and are able to produce phages when
activated or induced [16]. It has been shown that intestinal bacteriophages can be detected
within hours of birth. Induced prophages, already present within the microbial community,
appear to constitute a large source of these intestinal virions [17]. Following birth, the
intestinal community experiences successions of drops in phage abundance, followed by
periods of “steady-states” [18]. Intestinal phages are believed to exert considerable evolutionary pressure on bacterial hosts affecting abundance levels, microbiota composition,
and host characteristics including colonisation, antibiotic resistance, and immune system
modulation [19].
The ability of prokaryotes to withstand phage attacks is key for their survival. Prokaryotes have developed numerous resistance mechanisms to fend off phage predation. One
of these phage resistance mechanisms is the Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) system. CRISPR-associated (Cas) proteins play an instrumental
role in the survival of prokaryotes against viral invaders by cleaving the DNA of foreign
genetic elements. The CRISPR-cas adaptive immune system has been found within 77% of
studied bifidobacterial genomes [20] and consists of a genetic locus containing the CRISPRs
with their non-repetitive, unique spacer sequences and adjacent genes encoding the Cas
proteins [21,22]. It is the spacers that enable the adaptable and sequence-specific inactivating mechanism of the CRISPR system. A spacer is a short segment of sequence that is
homologous to phage or other invading DNA sequences and represents a genetic memory
of a previous interaction with that sequence. By the acquisition of up to 200 spacers, the
prokaryotic host can acquire an expansive viral immune system affecting the efficacy of
bacteriophages in exerting evolutionary pressure in their host environment.
However, the dynamics and mechanisms involved in these phage-host interactions
are currently poorly described. Therefore, deeper insights into the interplay between the
bacterial gut community and their viral predators may allow causal links to be established
between microbial dysbiosis and disease, improve current therapeutics, and identify novel
targets for microbiome modulating strategies.
In the current study, a comprehensive analysis of predicted bifidobacterial prophages [23]
and CRISPR-Cas spacer arrays found among Bifidobacterium adolescentis, Bifidobacterium
bifidum, Bifidobacterium breve, and Bifidobacterium longum is presented. These species were
chosen as the most abundant bifidobacterial species in the human gut. Notably Bifidobacterium bifidum, Bifidobacterium breve, and Bifidobacterium longum are most abundant in early
life while Bifidobacterium adolescentis becomes more abundant in adulthood. By availing of
prophage and CRISPR-Cas prediction software, comparative analysis, and phylogenetic
inference, this work represents an overview of the observed diversity among identified
prophage-like elements in human-derived bifidobacterial strains, thereby representing a
foundation for a better characterisation of the human-associated bifidobacterial phageome.
2. Materials and Methods
The assembly information and complete genome sequences of publicly available
bifidobacterial strains from B. adolescentis, B. bifidum, B. breve, and B. longum species were
retrieved from the RefSeq database (https://www.ncbi.nlm.nih.gov/refseq/ (accessed on
1 July 2021)) for analysis.
Microorganisms 2021, 9, 2559
3 of 16
The whole-genome nucleotidic sequence of the downloaded bifidobacterial strains
was used as input for prophage prediction using the PHASTER (PHAge Search Tool
Enhanced Release) application programming interface (API) (https://phaster.ca (accessed
on 1 July 2021)).
Comparative analysis was performed at both nucleotide and amino acid levels using Average Nucleotide Identity (ANI) analysis, and an all-vs-all BLASTP alignment,
respectively, and MCL (Markov Clustering Algorithm) clustering analysis. ANI was assessed at the nucleotide level using Pyani v0.3+ (https://github.com/widdowquinn/pyani
(accessed on 1 July 2021). Two-way hierarchical clustering heatmaps were generated
from the pyani outputs for graphical representation of intra- and inter-species similarities across the dataset. The predicted prophages underwent comparative analysis using
an all-vs-all BLASTP alignment [24] and MCL clustering [25] following a previously described protocol for the analysis of lactococcal prophages [26]. Open Reading Frame
(ORF) prediction was performed from nucleotide sequences of the predicted bifidobacterial prophages using Prodigal v2.6.3 [27]. The deduced amino acid sequences of the
ORFs were then concatenated and compared in an all-vs-all manner using BLASTP. The
all-vs-all BLASTP output file was used as the input for mclblastline to generate clusters of
functionally related families. An example script of the comparative analysis is available
here: https://github.com/frbot/Comparative-genomics/blob/master/comparative.sh
(accessed on 1 July 2021). The obtained binary matrix containing the presence/absence
of phage gene families obtained from the mclblastline output was used to generate interactive heatmaps via heatmaply (https://cran.r-project.org/web/packages/heatmaply/
vignettes/heatmaply.html (accessed on 1 July 2021)).
Comparative locus maps of seventeen prophages selected from six clusters were
generated using the python-based graphical representation program Easyfig v2.2.2 (https:
//mjsull.github.io/Easyfig/ (accessed on 1 July 2021)). To improve the overall analysis,
manual revision of the PHASTER predictions was performed by adjusting the left and right
boundaries of predicted prophages. Where appropriate, a prophage’s left- and/or rightward ends were modified based on the following criteria: (i) presence of a tRNA-encoding
gene or a phage integrase located at either the left or right side of the predicted prophagelike element; (ii) consecutive hypothetical proteins containing prophage-like predicted
functions assigned to be part of the same prophage region; (iii) putatively co-transcribed,
and contiguous Open Reading Frames (of distance < 1 Kb), and encoded within the same
DNA strand.
Following Phaster revision, seventeen representative prophages were compared, and
their features were classified in functional categories based on the information retrieved
from the Virus Orthologous Group (VOG—http://vogdb.org/ (accessed on 1 July 2021))
and visualised.
Intra-group phylogenetic analysis was performed using VICTOR (Virus Classification, and Tree Building Online Resource—https://ggdc.dsmz.de/home.php (accessed
on 1 July 2021)). All pairwise comparisons of the predicted phage nucleotide sequences
were conducted using the Genome-BLAST Distance Phylogeny (GBDP) method [28]
employing settings recommended for prokaryotic viruses [29]. The resulting intergenomic distances were used to infer a balanced minimum evolution tree with branch
support via FASTME including SPR postprocessing [30] for each of the formulas D0,
D4, and D6, respectively. Branch support was inferred from 100 pseudo-bootstrap replicates each. The obtained trees were rooted at the midpoint and visualised with FigTree
http://tree.bio.ed.ac.uk/software/figtree/ (accessed on 1 July 2021). Taxon boundaries
at species, genus and family level were estimated with the OPTSIL program [31], the
recommended clustering thresholds, and an F value (fraction of links required for cluster
fusion) of 0.5 [32].
ViPtree (the Viral Proteomic Tree server—https://www.genome.jp/viptree/ (accessed
on 1 July 2021) was used to generate a “proteomic tree” of the seventeen predicted prophages
and infer their positioning within the viral tree of life.
Microorganisms 2021, 9, 2559
4 of 16
Identification of CRISPR/Cas systems in bifidobacterial genomes was performed using CRISPRCasFinder [33] (https://crisprcas.i2bc.paris-saclay.fr/CrisprCasFinder/Index
(accessed on 1 July 2021)). Non-redundant databases of predicted phages and spacers were
created using CD-HIT [34] and used to perform comparative analysis at the nucleotide
level in a phage-vs-spacer BLASTN alignment. The nucleotide sequences associated with
the BLAST hits were extracted from their parent bifidobacterial genome GenBank file.
Prodigal was used to predict the ORFs of the extracted nucleotide sequences, and the
compressed archive of the HMMER3 compatible Hidden Markov Models for the pVOG
database was used to verify gene identification and functional annotation with the viral
protein-specific database.
3. Results
3.1. Sequence Download and Feature Extraction
In order to compare the genetic features of predicted bifidobacterial prophage-like
elements, the GenBank flat files (gff) of all currently available bifidobacterial genomes
belonging to the B. adolescentis, B. bifidum, B. breve, and B. longum species were downloaded
from the RefSeq database (See Section 2). The complete and draft genome sequences
of the four above-mentioned species were downloaded for a total of 52 genomes from
B. adolescentis, 96 from B. bifidum, 108 from B. breve, and 329 from B. longum (of which
288 belong to subsp. longum and 41 to subsp. infantis). Therefore, the final dataset consisted
of 585 bifidobacterial genomes, which were extracted and processed for further analysis. Of
note, in the case of the B. longum species, two human-associated subspecies have previously
been identified (subsp. infantis and longum). For this study, we did divide B. longum into
two relevant subspecies B. longum subsp. longum and B. longum subsp. infantis to assess
possible links with the development and maturation of the early gut microbiome.
3.2. Prophage Prediction
PHASTER was employed to identify prophage-like elements in the downloaded bifidobacterial genomes. PHASTER is a prophage prediction tool used for rapid identification
and annotation of prophage sequences within bacterial genomes [35]. PHASTER prediction
identified 480 prophage-like regions among the 585 bifidobacterial genomes analysed, thus
giving an average of 0.82 prophages per human-associated bifidobacterial genome in line
with previous literature on B. breve [17,36]. The gff files of the identified prophage regions
are provided as prophages_all_gff.zip. Overall, 52 presumed prophages were predicted to
be resident in the assessed B. adolescentis genomes, while 55, 121, 245, and 7 were assigned
to be present in B. bifidum, B. breve, B. longum subsp. longum, and B. longum subsp. infantis
genomes, respectively (Tables S1–S4). As expected, we observed a high degree of variability
in the number of putative prophage regions per genome (ranging from zero to six), and
size of the identified regions (ranging from 4.5 Kb to 51.2 Kb with a median size of 14.3 Kb,
and a mode of 9.9 Kb). It is noteworthy that many of these smaller prophages are likely to
be cryptic, remnant, or fragmented prophages, while the 47 prophage regions predicted
to be larger than 30 Kb in size are more likely to represent complete, biologically intact
prophages. Overall, the median GC content observed was 60.8%. Specifically, B. adolescentis
prophages had an average GC content of 59 %, while those residents in B. bifidum, B. breve,
and B. longum (representing both subspecies) were shown to possess a GC content of
62 %, 59.8 %, and 61 %, respectively. While evaluating the GC content of the predicted
prophages, we confirmed that the observed values are similar to the average GC content of
their corresponding bifidobacterial hosts.
We used a simple nomenclature integrating the host species, bacterial strain, and
phages numbered in ascending fashion, based on their location within the strain genome.
Each species was given an abbreviation. B. adolescentis was Bad, B. bifidum was Bif, B. longum
was Blong, and B. breve was Bre. For example, BifBIOML-A4ph1 is the first phage that we
predicted in the B. bifidum strain BIOML-A4.
Microorganisms 2021, 9, 2559
5 of 16
Strikingly, PHASTER predicted only one presumed complete phage across the whole
dataset, encoded within the B. breve BR3 genome (Table S3). The manual investigation,
and VOG annotation of this prophage, here named B. breve BR3 phage 1 (25 Kb), indicated
a lack of genes encoding for structural phage proteins, thus suggesting that this phage
is unlikely to constitute a complete phage. Furthermore, a manually curated analysis of
strains included in this study has previously indicated the presence of functional phages in
certain bifidobacterial genomes, but these were predicted as incomplete by PHASTER. This
discrepancy highlights one of the main challenges we observed in applying this popular
(pro)phage prediction software to bifidobacterial genomes. A high degree of variability and
lack of a general consensus between viral sequences, coupled with the underrepresentation
of bifidobacterial (and actinobacterial) prophages in the reference database results in ineffective prediction of complete phages. While PHASTER represents a helpful tool to highlight
prophage-like regions in bifidobacterial genomes, manual inspection, and experimental
investigations are generally required to verify the completeness of the identified phages.
3.3. Comparative Analysis
3.3.1. Average Nucleotide Identity
In order to explore phage diversity at the nucleotide level, ANI was calculated for
the identified 480 prophage-like sequences using PYANI. The resulting ANI values were
organised in a matrix and visualised in a two-way hierarchical clustering heatmap for
graphical representation of intra- and inter-species similarities (Figures 1 and 2). Using a
cut off of 0.9 ANI, the analysis identified twenty-four potentially interesting clusters of
high ANI including both inter-, and intra-species examples containing an average of twelve
representatives per cluster.
Figure 1. ANI heatmap produced by heatmaply. Yellow boxes in the heatmap signify prophage sequences with high levels
of nucleotide similarity to each other. There are clusters of high similarity at both intra- and inter-species levels. The rowside
legend indicates the degree of predicted completeness of the predicted prophages. Bifidophages predicted to be likely
complete or complete are highlighted in dark blue on the “Completeness” column key. Suspected partial or likely remnant
bifidophages are highlighted in light blue. The red letters a to f represent the six clusters highlighted in Figure 4.
Microorganisms 2021, 9, 2559
6 of 16
Figure 2. Interactive elements of ANI heatmap produced by heatmaply. Each cell can be highlighted to display the ANI
value and the associated bifidophages. Representation of the lowermost left cluster in Figure 1 showing high intra-species
ANI of a prophage-like region across several Bifidobacterium bifidum strains. Notably, five of the six regions were annotated
as “likely complete” prophages by PHASTER.
3.3.2. Markov Clustering Algorithm Analysis
To perform a comparative genome analysis at the protein level of the putative prophages,
the deduced products of the predicted prophage ORFs were compared in an all-versusall manner using BLASTP, and MCL clustering to assess the diversity of prophage-like
elements and the degree of gene sharing between elements. A heatmap representation
of the resulting presence/absence and clustering matrix is indicated in Figure 3a. Further investigation of this heatmap revealed clusters containing predicted prophage-like
sequences that were shared across all four assessed species (Figure 3b). Following comparative analysis, the MCL clusters were individually investigated and assessed based on the
size and number of prophage-like sequences contained within each cluster, the number
of “likely complete” or “complete” representatives, ANI scores, and the degree of gene
sharing within the clusters. Six clusters of similar prophage-like sequences were selected
for further visualisation with a comparative locus map using Easyfig v2.2.2.
3.3.3. Comparison of ANI and MCL Results
Detailed comparative analysis of the ANI, and MCL heatmaps revealed 14 clusters
containing 253 prophages exhibiting a high level of similarity at both ANI and gene sharing
levels. These ranged from clusters of 5 to 45 prophages with an average of 10 prophages per
cluster. On further investigation, six of these clusters proved to contain similar phage-like
sequences with shared integration regions present in genomes of different bifidobacterial
strains/species and likely representing complete prophage genomes. For these reasons,
they were selected for a more detailed inspection. The remaining eight clusters were
deemed to contain less valuable information as despite having shared integration regions
these prophage sequences appeared to represent remnant or incomplete phages (as their
length was <13 Kb). The large number of likely remnant prophages observed, lacking
structural genes or other functional modules (and thus of small size), is in line with previous
observations indicating a high degree of genomic decay among prophages within bacterial
genomes [37].
Microorganisms 2021, 9, 2559
7 of 16
Figure 3. (a) Comparative Analysis heatmap produced by heatmaply. Gene families are presented on the horizontal axis,
while bifidophages are presented on the vertical axis. We used a binary yes/no 100% gene similarity cut off. Gene familie
that are present are represented by a yellow cell, while purple indicates gene absence. Specifically, yellow cells on the same
vertical line indicate a gene family present in multiple bifidophages, while yellow cells on the same horizontal line indicates
multiple gene families present in a single bifidophage. The rowside legends indicates which species of bidifobacteria the
phage belongs to and the degree of completeness predicted by PHASTER. B. longum is represented in pale green, B. breve in
dark green, B. adolescentis in pale blue, and B. bifidum in dark blue. As previous, bifidophages predicted to be likely complete
or complete are highlighted in dark blue and suspected partial or likely remnant bifidophages are highlighted in light blue.
A blue arrow highlights an example of a cluster of multiple genes shared between the same Bifidobacterium bifidum strains
highlighted in Figure 2. A red arrow indicates an example of gene families from a bifidophage shared across multiple
Bifidobacterium species, highlighted in Figure 2. (b) Comparative analysis heatmap highlighting the same cluster represented
in Figure 2 indicated here by a red arrow. This image is a zoomed-in view of the area highlighted in Figure 3a. Multiple
gene families are shared between the B. bifidum phages BifBIOML-a4 phage 1, BifBIOML-a6 phage 1, BifBIOML-a8 phage 1,
Bif LMG11041 phage 1 and BifNCIMB41171 phage 1. There is also an example of interspecies gene sharing between phages
targeting hosts from B. adolescentis, B. breve, and B. longum directly below the bifidum example. Most notably, Bad22L phage
2, Bre139W423 phage 1 and BlongSu859 phage 1.
Microorganisms 2021, 9, 2559
8 of 16
Seventeen predicted prophages were included in the six clusters selected for further
analysis. As described in the Section 2, a manual review of the phage boundaries was
performed to refine the completeness of the predicted prophages and to improve the
comparative analysis. The annotated gff file of the seventeen manually curated prophages
is provided as Supplementary File prophage_clusters_gff.zip. For each of the six clusters,
prophage alignments were generated (Figure 4), providing insight into the diversity of
bifidophages predicted to be present in the genomes of B. adolescentis, B. bifidum, B. breve,
and B. longum. Interestingly, our analysis showed that closely related, and likely functional,
prophages with shared integration points can be found among all four species.
Figure 4. Bifidoprophage genomic comparison and characterisation by Easyfig. Each arrow indicates an open reading
frame, orientated in the direction of transcription. The length of arrows is proportional to the the length of the predicted
open reading frame. Genes predicted to have similar functions are marked with the same color. Grey arrows indicate
hypothetical proteins. Genome architecture and mosaic relationships between the prophage genomes are highlighted
with pairwise alignments. The colour spectrum between genomes reflects amino acid identity as shown in the percentage
based legend. The prophages in each alignment are described in descending order according to the above representations
and are described further in the accompanying text. (a) Bre215W447aph1, BreDRBB30ph3, BadTF06-29ph2 and Bad11ph1.
(b) BifBIOML-A4ph1, BifBIOML-A6ph1 and BifBIOML-A8ph1. (c) BlongATCC15697ph1 (from B. longum subsp. infantis)
and BlongTF06-12ACph1 (from B. longum subsp. longum). (d) Bre139W423ph1 and BlongCECT7210ph3. (e) Bif85Bph3,
BreDRBB30ph2 and BreLMC520ph1. (f) BlongAF05-2ph1 (from B. longum subsp. longum), BreDRBB29ph1 and BreBR3ph5.
The purple arrows highlight sequence inversions observed in phage tail-encoding genes. The white arrows highlight reverse
transcriptase encoding genes mentioned in Section 3.3.3.
A high degree of mosaicism in bifidophage genomes was observed, in keeping with
previous observations, indicating that events of integration and gene shuffling commonly
occur in these elements [17,38–40]. Accurate detection of phage genes required the aid
of pVOG predictions. Annotation of phage genes is difficult due to the lack of reference
sequences in the databases. Notably, genes predicted to encode structural components of
phage particles were always predicted to be present and were conserved across prophages
(Figure 4). Structural genes encoding phage capsid and tail were found to be located in close
proximity to one another, indicating that co-location of functionally-related phage genes
is a feature of the bifidobacterial prophages studied. Co-location of functionally-related
phage genes has previously been described in other phage families [26,41].
Microorganisms 2021, 9, 2559
9 of 16
Diversity-generating retroelements (DGRs) are genetic cassettes that selectively mutate target genes to produce hypervariable proteins. First characterised in a group of
Bordetella species, bacteriophage BPP-1 is predicted to possess a DGR associated with a
hypervariable tail fiber-encoding gene, which together would be expected to enable host
tropism switching [42,43]. We found evidence of a gene encoding a reverse transcriptase
located in close proximity of phage tail structural genes across two clusters representative of potentially active prophages. In the cluster represented by Figure 4e, a predicted
reverse transcriptase-encoding gene is positioned in close proximity to a phage tail proteinencoding gene present in BreDRBB30ph2 in B. breve DRBB30 and BreLMC520ph1 in B. breve
LMC520 (indicated by a vertical white arrow in Figure 4e). In a second cluster represented in Figure 4f, a reverse transcriptase-encoding gene is located in close proximity
to a tail protein-encoding gene in prophage genomes BlongAF05-2ph1, BreBR3ph5, and
BreDRBB29ph1, identified in the genomes of B. longum AF05-2, B. breve BR3, and B. breve
DRBB29, respectively (also indicated by a vertical white arrow in this figure). These may
introduce increased sequence divergence between the targeted tail gene and the rest of
the genome when compared to this genome region in related phages. An almost identical
tail protein-encoding gene is present in both BlongAF05-2ph1 and BreDRBB29ph1 with
evidence of sequence shuffling observed in the sequence alignment and occurring in the
tail protein-encoded gene by BreBR3ph5, indicating a change in the target for this phage.
Occasionally, we identified genes encoding elements of the DNA replication machinery, but these were not always present (or likely not detected). This may be due to a lack of
homologous genes in the reference database. Complete bifidophages are often integrated in
the proximity of tRNA-encoding genes [23] and were always found to harbour an integrase
gene (at either left or right border of the prophage genome), which is involved in the
integration or excision of the prophage in or out of the host chromosome.
All possibly complete and active bifidophages predicted by our analysis show a
modular organisation of their genes, based on gene function, resembling that of lambdoid
phages [37] and previously described Bifidobacterium prophages [17,44]. These modules
can functionally be grouped into integration/excision, lysis, DNA replication, tail/head
morphogenesis, and DNA packaging. We have observed that phage integrase-encoding
genes are often located at the left or right boundary of predicted prophage-like elements.
The products of these integrase genes mediate the integration/excision of the phage
into/out of the host chromosome during lysogenic/lytic cycle switching and promote
site-specific recombination between phage and host DNA recognition sites. These integrase
modules were often found close to genes predicted to be involved in lysis as previously
described in bifidophages [44]. During the lytic phase of most bacteriophages, a phageencoded peptidoglycan-degrading activity is activated. Holin proteins provide access to the
peptidoglycan layer for endolysins by creating small holes in the membrane. Endolysins
are enzymes that degrade peptidoglycan and were always identified as single proteins in
our predicted elements. Genes involved in tail/head morphogenesis follow a modular
organisation and are always co-located in clusters within the identified prophage-like
elements. Notably, genes encoding DNA packaging proteins, capsid components, and tail
fibers were found to be located in close proximity.
3.3.4. Comparative Analysis of Seventeen Identified Prophages
The combination of ANI and MCL clustering analysis allowed the identification of six
MCL clusters encompassing seventeen prophages, which are commonly present among
four Bifidobacterium species.
Figure 4a shows a pairwise alignment, which represents an interesting example of
an identical prophage shared between two non-identical hosts B. breve 215W447a and
B. breve DRBB30. There is a close relationship with two other prophages identified in
B. adolescentis TF06-29 and B. adolescentis 1-11. Therefore, this cluster represents a good
candidate for presumed inter- and intra-species prophage transmission. Of note, apparent
DNA inversion, insertion, and deletion events are observed in one of the tail-associated
Microorganisms 2021, 9, 2559
10 of 16
genes (Figure 4) across B. breve and B. adolescentis, suggesting that these prophages represent
potentially active phages with different host specificities.
Figure 4b contains three predicted prophages in B. bifidum BIOML-A4, BIOML-A6, and
BIOML-A8. These prophage genomes are nearly identical with no major genetic differences
or reshuffling observed between them. All three prophages are located at different loci
within the genomes. Of note, these prophages lack a gene with similarity to genes encoding
cell lytic functions. We cannot exclude that one of the hypothetical proteins encoded in this
prophage is involved in lysis, and this may be due to a lack of homologous genes in the
reference database.
Figure 4c shows two predicted prophages in B. longum subsp. (B. longum subsp.
infantis ATCC15697 and B. longum TF06-12AC strains) with a high degree of mosaicism.
The left half of the prophage region is highly homologous, but DNA integrations have
promoted genomic insertions on the second half of the genome. This is a prime example of
the high level of genomic diversity that can be observed across bifidobacterial prophages
even within the same species.
Figure 4d represents an example of a nearly identical phage that is common between
B. breve 139W423 and B. longum subsp. infantis CECT7210. This prophage harbours several
gene encoding functions involved in DNA packaging, capsid, tail fiber, and DNA replication. These give reason to believe that this phage may be active and target representative
strains of both B. breve and B. longum species.
In Figure 4e a typical example of closely related prophages was observed among
strains B. bifidum 85B, B. breve DRBB30, and B. breve LMC520. With a degree of mosaicism,
it appears that two distinct prophages (identified by two sets of DNA packaging and tail
genes) have integrated side by side in both B. bifidum 85B and B. breve DRBB30, thus suggesting a common integration site. All three strains carry a highly homologous prophage
showing gene identity above 90%. Our comparative analysis indicates that prophage
regions are highly prone to genomic inversions and reshuffling, especially in phage tailassociated regions.
Figure 4f represents a set of highly homologous prophage-like sequences in B. longum
and B. breve, designated BlongAF05-2ph1, BreDRBB29ph1 and BreBR3ph5, showing evidence of deletions and gene shuffling. As described above, a reverse transcriptase encoding gene is found in close proximity to a phage tail protein-encoding gene in all three
prophages likely representing a diversity generating retroelement not yet described in
Bifidobacterium (reverse transcriptase encoding gene highlighted by a vertical white arrow
and tail-associated inversion indicated by a purple arrow in Figure 4f).
3.4. Phylogenetic Analysis
To investigate the phylogenetic relationships of the seventeen selected prophages,
two approaches were used. The seventeen nucleotide sequences were compared to each
other using VICTOR for intra-group phylogenetic analysis. This process generated a
phylogenetic tree which was used to infer the evolutionary relationships between these
predicted prophages. Prophages identified by ANI to be present between strains and
species have been found to be phylogenetically related (Figure 5). The clusters identified
in Section 3.3. are supported by this analysis, thus indicating that the relationships and
similarities highlighted by our comparative genome analysis are also supported at a
phylogenetic level.
A multifasta file containing all seventeen predicted prophage genomes was assessed
by ViPtree for a global phylogenetic analysis against a viral reference database. To determine the phylogenetic positioning of the 17 selected bifidobacterial prophages within the
proteomic tree of reference prokaryotic dsDNA viral sequences, a nucleotide multifasta
file containing all seventeen predicted prophages was submitted to the ViPtree analysis
(see Section 2). As viral groups identified in a proteomic tree approach correspond well to
established viral taxonomies [45], we decided to use this resource to establish the nearest
neighbours of our prophage-like elements in existing databases. Gene prediction on the vi-
Microorganisms 2021, 9, 2559
11 of 16
ral genomes and a protein similarity search against the NCBI non-redundant database were
performed in parallel with the proteomic tree construction. This tool generated a global
phylogenetic analysis of our query sequences against the database of double-stranded
DNA viruses known to be associated with prokaryotic hosts containing 2687 viruses. Notably, there are no Bifidobacterium-associated viruses in the reference database, which is
an indication of the underrepresentation of bifidophages in currently available tools. As
scores of zero are excluded from the resulting proteomic tree, this computation resulted in
a final tree of 1490 sequences. The assessed Bifidobacterium prophages are highlighted by a
red star in the generated tree presented in Figure 6.
Figure 5. Intra-group Phylogenetic Tree created by VICTOR. Phylogenetic tree obtained using the prophage genomes of
the seventeen prophages included in the six clusters of interest. The bar scale indicates phylogenetic distances. Bootstrap
values are reported for a total of 100 replicates.
The predicted prophages do not cluster in the same clade and are unlikely to constitute
a Bifidobacterium-specific family. Though residing in very related hosts, these prophages
appear to be distinctly different from each other. This result represents another clear
indication that there is a high degree of genetic diversity observed across bifidophages,
which also justifies the need for collecting a higher number of representative and manuallycurated viral sequences for this genus. The high level of mosaicism displayed by these
prophage-like elements may be explained by multiple module exchange events between
phages infecting Firmicutes and Actinobacteria. Such events could have been facilitated
by the common ecological origin of members of these two phyla [44]. However, we also
noticed that our 17 prophages generally cluster with viruses of actinobacterial hosts.
As shown in Figure 6, the prophages represented by Figure 4a, highlighted by the
inferior left red star, are similar to each other, and to other bifidobacteria-associated phages.
They are also distantly related to Microbacterium, and Gordonia phages, which target actinobacterial hosts.
Furthermore, the prophages represented in Figure 4c, highlighted by the right red star,
are the only example of bifidobacterial prophages closely located with mainly Firmicutesinfecting phages. They are similar to each other but much less similar to the other bifidophages and are even less similar to the other phages analysed.
Finally, Figure 6 shows the third and largest phylogenetic clade identified by ViPtree
represented by the superior left red star containing prophages from the remaining clusters.
They are similar to each other but not to other bifidobacteria-associated phages. They are
more related to Streptomyces and Mycobacterium infecting prophages, all having Actinobacteria as prokaryotic host. These findings once again highlight the lack of bifidobacterial
prophages in the reference databases.
Microorganisms 2021, 9, 2559
12 of 16
Figure 6. ViPtree circular proteomic tree of related dsDNA viruses with prokaryotic hosts. Submitted prophage sequences
are highlighted with red stars.
3.5. CRISPR-Cas Prediction
In order to identify the CRISPR-Cas spacer arrays, the software tool CRISPRCasFinder
was employed (see Section 2). The CRISPRCasFinder program enables easy detection
of CRISPRs and Cas genes in sequence data based on the search of protein similarity
using Hidden Markov Models (HMMs) and a genetic model of the identified components.
CRISPRCasFinder analysis provides summary information on CRISPR arrays and cas
gene clusters in the order in which they are positioned along the sequence and details on
individual CRISPR arrays and cas gene clusters.
CRISPRCasFinder prediction identified 38,676 protospacers among the 585 whole
genomes analysed. Further analysis using CD-HIT identified 14,501 non-redundant spacers,
thus giving an average of 24.7 unique spacers per human-associated bifidobacterial genome.
Overall, 1513 spacers were predicted to belong to B. adolescentis, while 883 spacers were
assigned to B. bifidum, 2895 to B. breve, and 9210 to B. longum strains. The non-repetitive
spacers were extracted from these predictions and processed for further investigation.
3.6. Comparison of Phage and Spacer Sequences
In order to compare the predicted prophage and spacer nucleotide sequences, the
spacer arrays were mapped onto the prophage sequences to assess if they target any
identified prophages. BLAST alignments of predicted spacer arrays against prophages
identified 1488 significant hits and 638 unique sequences using an e-value of 0.01 and
98% identity as cut-off values for significance (Table S7). These unique sequences likely
Microorganisms 2021, 9, 2559
13 of 16
represent prophage genes and validate our prediction that these sequences represent parts
of invading genetic elements. The larger number of hits indicates that these sequences are
conserved across different bifidophages and may be preferential targets for the CRISPR/Cas
system in bifidobacteria.
Functional assignment of the genes was confirmed using pVOG database and HMMER3 revealing 20 hits across 15 different genes (Table S5). Less than 3% of the unique
sequences had corresponding information in the pVOG database, once again highlighting
the lack of bifidobacterial viral information currently present in this database.
Of those with information, the majority were classified as hypothetical proteins in
REFSEQ database. Of note, 13 of the 20 hits targeted likely complete prophage sequences,
while 4 of the 20 hits targeted remnant or partial phage sequences containing some but
not all of the required modular genetic elements. The remaining three hits are unlikely to
hit prophage sequences, however, one sequence was associated with protein A6, a protein
essential for maturation of an immature virion, and another was associated with a probable
diguanylate cyclase DgcQ. Most notably, a spacer representing a putative major capsid
head protein was found on multiple strains across B. adolescentis, B. breve, and B. bifidum,
thus suggesting that this gene may be a preferred target for bifidobacterial phage immunity
(Table S6).
4. Discussion
4.1. Prophage Prediction
Due to the inherent diversity, and lack of a universal marker among viral genomes,
prophage prediction is a challenging task. PHASTER was chosen as a putative phage
search tool due to its accessible application programming interface and competitive sensitivity scores in benchmarking tests on gold standard genomes [46,47]. PHASTER proved
adequate in predicting prophage-like regions within bifidobacterial genomes, but manual
investigation of these regions is required to verify predictions.
We found the PHASTER predictions of bifidobacterial prophages to have inadequate
sensitivity. PHASTER only predicted one phage to be intact/complete, and on manual
examination, the identified genomic region of this prophage does not contain genes associated with key structural phage proteins. Conversely, PHASTER classified prophages
previously identified to be complete, as incomplete prophage fragments. This is likely
due to a lack of high-quality information associated with bifidobacterial genomes, or close
relatives, in the reference database. To attenuate this issue for future studies, we intend to
develop a curated database of predicted and, where relevant, functionally characterised
bifidobacterial prophage sequences to supplement relevant viral reference databases.
The PHASTER completeness scoring algorithm relies heavily on phage-related keywords being present in the submitted Genbank files. Bifidobacterial prophages are a
novel area of research, highlighted by the lack of a single Bifidobacterium-associated viral nucleotide sequence in the Virus-Host database featuring manual annotation of viral
sequences covering RefSeq, Genbank, Uniprot, and Viral Zone. Furthermore, only two
Bifidobacterium phages are included in the Virus Orthologous Group database, used in
this project for gene annotation. These findings underscore the predictive weaknesses
exposed in this study and explain the motivation for a high-quality reference database
of Bifidobacterium-associated phages. To attenuate this issue for future research, we will
develop and submit a curated database of predicted bifidobacterial prophage sequences to
supplement viral reference databases.
4.2. Assumptions and Limitations
During our analysis, we made several assumptions due to a lack of high-quality
information in this area. We assumed that phages under a certain size are more likely
to represent cryptic or remnant phages (<13 Kb), and that phage genomes larger than a
certain size are more likely to encompass complete or active phages when compared to
shorter phages (>20 Kb). These assumptions are based on studies involving likely unrelated
Microorganisms 2021, 9, 2559
14 of 16
phages and may not be suitable for extrapolation to this area. Furthermore, comprehensive
studies investigating bifidobacterial phages will need to test these assumptions by manually
reviewing and functionally characterising a larger selection of predicted phages.
As this is an in silico analysis, the obtained findings were not validated by experimentation of host–phage interactions in a recreated gut environment to verify phage activity.
Nonetheless, our comparative study allowed us to identify a high level of genomic diversity
in Bifidobacterium-associated phages, some of which appear to be capable of infecting multiple species. Future research efforts will be needed to experimentally verify our findings of
an available biobank in a fashion similar to previous work [39]. However, a manual review
of the selected prophage sequences was performed, and all prophages involved in the five
clusters contained the key genes required for prophage activity, thus suggesting they may
be functional. These findings lend further weight to the argument for further research in
this area.
4.3. Bifidobacterial Prophage Elements
The objective of this study was successfully achieved. Our analysis represents an
investigation of the diversity of viral elements in bifidobacterial species, showing that
either similar or highly diverse prophages can be found between species and strains. This
study provides compelling evidence showing that closely related, and likely functional,
prophages can be found shared across all four human-associated Bifidobacterium species
analysed. This novel finding positions bifidobacterial phages as a strong candidate for
further investigation on a larger scale.
Analysis of the CRISPR-Cas protospacers provided evidence of historical interactions between bifidobacterial species and strains. Despite a large number of prophage–
protospacer hits, most of the sequences did not have accompanying pVOG annotation, once
again highlighting the shortage of available bifidobacterial gene information. Most notably,
a spacer representing a putative major capsid head protein was found on multiple strains,
thus suggesting that this gene may be a preferred target for bifidobacterial phage immunity.
Further research would involve metagenomic analysis of new samples to allow the
identification of potentially active phages and follow-up analysis of selected samples in
a recreated gut environment to verify host–phage interactions. Increased knowledge of
host-phage interactions may lead to novel targets for microbiota-modulating therapies.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/
10.3390/microorganisms9122559/s1, Table S1: Predicted prophages of Bifidobacterium adolescentis,
Table S2: Predicted prophages of Bifidobacterium bifidum, Table S3: Predicted prophages of Bifidobacterium breve, Table S4: (a) Predicted prophages of Bifidobacterium longum subsp. longum, (b) Predicted
Prophages of Bifidobacterium longum subsp. infantis, Table S5: Results table of pVOG HMMER3 queries
against the phage-protospacer hits, Table S6: Annotation of pVOG genes associated with phageprotospacer hits, Table S7: BLAST alignments of predicted spacer arrays against predicted prophages.
Author Contributions: Conceptualization, F.B. and D.v.S.; methodology, F.B. and D.B.; software,
D.B.; validation, J.M., T.O. and J.X.; formal analysis, D.B.; investigation, D.B.; resources, F.B., D.v.S.
and D.B.; data curation, F.B. and D.B.; writing—original draft preparation, D.B.; writing—review
and editing F.B. and D.v.S.; visualization, D.B.; supervision, F.B. and project administration; F.B. All
authors have read and agreed to the published version of the manuscript.
Funding: This publication has emanated from research conducted with the financial support
of/supported in part by a grant from Science Foundation Ireland under Grant numbers SFI/12/RC/22
73-P1, SFI/12/RC/2273-P2 and 15/SIRG/3430.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: Data supporting the reported results can be found in the Supplementary Materials.
Conflicts of Interest: The authors declare no conflict of interest.
Microorganisms 2021, 9, 2559
15 of 16
References
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
14.
15.
16.
17.
18.
19.
20.
21.
22.
23.
24.
25.
Backhed, F.; Ley, R.E.; Sonnenburg, J.L.; Peterson, D.A.; Gordon, J.I. Host-bacterial mutualism in the human intestine. Science
2005, 307, 1915–1990. [CrossRef] [PubMed]
Dieterich, W.; Schink, M.; Zopf, Y. Microbiota in the Gastrointestinal Tract. Med. Sci. 2018, 6, 116. [CrossRef]
Ventura, M.; O’Flaherty, S.; Claesson, M.J.; Turroni, F.; Klaenhammer, T.R.; van Sinderen, D.; O’Toole, P.W. Genome-scale analyses
of health-promoting bacteria: Probiogenomics. Nat. Rev. Microbiol. 2009, 7, 61–71. [CrossRef] [PubMed]
DiGiulio, D.B.; Romero, R.; Amogan, H.P.; Kusanovic, J.P.; Bik, E.M.; Gotsch, F.; Kim, C.J.; Erez, O.; Edwin, S.; Relman,
D.A. Microbial prevalence, diversity and abundance in amniotic fluid during preterm labor: A molecular and culture-based
investigation. PLoS ONE 2008, 3, e3056. [CrossRef] [PubMed]
Avershina, E.; Storro, O.; Oien, T.; Johnsen, R.; Wilson, R.; Egeland, T.; Rudi, K. Bifidobacterial succession and correlation networks
in a large unselected cohort of mothers and their children. Appl. Environ. Microbiol. 2013, 79, 497–507. [CrossRef] [PubMed]
Milani, C.; Mangifesta, M.; Mancabelli, L.; Lugli, G.A.; James, K.; Duranti, S.; Turroni, F.; Ferrario, C.; Ossiprandi, M.C.; van
Sinderen, D.; et al. Unveiling bifidobacterial biogeography across the mammalian branch of the tree of life. ISME J. 2017, 11,
2834–2847. [CrossRef]
Tojo, R.; Suarez, A.; Clemente, M.G.; de los Reyes-Gavilan, C.G.; Margolles, A.; Gueimonde, M.; Ruas-Madiedo, P. Intestinal
microbiota in health and disease: Role of bifidobacteria in gut homeostasis. World J. Gastroenterol. 2014, 20, 15163–15176.
[CrossRef]
Brown, C.T.; Davis-Richardson, A.G.; Giongo, A.; Gano, K.A.; Crabb, D.B.; Mukherjee, N.; Casella, G.; Drew, J.C.; Ilonen, J.;
Knip, M.; et al. Gut microbiome metagenomics analysis suggests a functional model for the development of autoimmunity for
type 1 diabetes. PLoS ONE 2011, 6, e25792. [CrossRef]
Cryan, J.F.; O’Riordan, K.J.; Cowan, C.S.M.; Sandhu, K.V.; Bastiaanssen, T.F.S.; Boehme, M.; Codagnone, M.G.; Cussotto, S.;
Fulling, C.; Golubeva, A.V.; et al. The Microbiota-Gut-Brain Axis. Physiol. Rev. 2019, 99, 1877–2013. [CrossRef]
Fujimura, K.E.; Sitarik, A.R.; Havstad, S.; Lin, D.L.; Levan, S.; Fadrosh, D.; Panzer, A.R.; LaMere, B.; Rackaityte, E.;
Lukacs, N.W.; et al. Neonatal gut microbiota associates with childhood multisensitized atopy and T cell differentiation. Nat. Med.
2016, 22, 1187–1191. [CrossRef]
Hua, X.; Goedert, J.J.; Pu, A.; Yu, G.; Shi, J. Allergy associations with the adult fecal microbiota: Analysis of the American Gut
Project. EBioMedicine 2016, 3, 172–179. [CrossRef]
Leach, S.T.; Lui, K.; Naing, Z.; Dowd, S.E.; Mitchell, H.M.; Day, A.S. Multiple Opportunistic Pathogens, but Not Pre-existing
Inflammation, May Be Associated with Necrotizing Enterocolitis. Dig. Dis. Sci. 2015, 60, 3728–3734. [CrossRef]
Saulnier, D.M.; Riehle, K.; Mistretta, T.A.; Diaz, M.A.; Mandal, D.; Raza, S.; Weidler, E.M.; Qin, X.; Coarfa, C.; Milosavljevic, A.; et al.
Gastrointestinal microbiome signatures of pediatric patients with irritable bowel syndrome. Gastroenterology 2011, 141, 1782–1791.
[CrossRef] [PubMed]
Suttle, C.A. Marine viruses-major players in the global ecosystem. Nat. Rev. Microbiol. 2007, 5, 801–812. [CrossRef] [PubMed]
Von Wintersdorff, C.J.; Penders, J.; van Niekerk, J.M.; Mills, N.D.; Majumder, S.; van Alphen, L.B.; Savelkoul, P.H.; Wolffs, P.F.
Dissemination of Antimicrobial Resistance in Microbial Ecosystems through Horizontal Gene Transfer. Front. Microbiol. 2016,
7, 173. [CrossRef] [PubMed]
Casjens, S.R. Comparative genomics and evolution of the tailed-bacteriophages. Curr. Opin. Microbiol. 2005, 8, 451–458. [CrossRef]
[PubMed]
Lugli, G.A.; Milani, C.; Turroni, F.; Tremblay, D.; Ferrario, C.; Mancabelli, L.; Duranti, S.; Ward, D.V.; Ossiprandi, M.C.;
Moineau, S.; et al. Prophages of the genus Bifidobacterium as modulating agents of the infant gut microbiota. Environ. Microbiol.
2016, 18, 2196–2213. [CrossRef] [PubMed]
Hsu, B.B.; Gibson, T.E.; Yeliseyev, V.; Liu, Q.; Lyon, L.; Bry, L.; Silver, P.A.; Gerber, G.K. Dynamic Modulation of the Gut Microbiota
and Metabolome by Bacteriophages in a Mouse Model. Cell Host Microbe 2019, 25, 803–814.e5. [CrossRef]
Koskella, B.; Brockhurst, M.A. Bacteria-phage coevolution as a driver of ecological and evolutionary processes in microbial
communities. FEMS Microbiol. Rev. 2014, 38, 916–931. [CrossRef] [PubMed]
Briner, A.E.; Lugli, G.A.; Milani, C.; Duranti, S.; Turroni, F.; Gueimonde, M.; Margolles, A.; van Sinderen, D.; Ventura, M.;
Barrangou, R. Occurrence and Diversity of CRISPR-Cas Systems in the Genus Bifidobacterium. PLoS ONE 2015, 10, e0133661.
[CrossRef] [PubMed]
Lillestol, R.K.; Redder, P.; Garrett, R.A.; Brugger, K. A putative viral defence mechanism in archaeal cells. Archaea 2006, 2, 59–72.
[CrossRef]
Nasko, D.J.; Ferrell, B.D.; Moore, R.M.; Bhavsar, J.D.; Polson, S.W.; Wommack, K.E. CRISPR Spacers Indicate Preferential Matching
of Specific Virioplankton Genes. mBio 2019, 10, e02651-18. [CrossRef]
Ventura, M.; Lee, J.H.; Canchaya, C.; Zink, R.; Leahy, S.; Moreno-Munoz, J.A.; O’Connell-Motherway, M.; Higgins, D.; Fitzgerald,
G.F.; O’Sullivan, D.J.; et al. Prophage-like elements in bifidobacteria: Insights from genomics, transcription, integration,
distribution, and phylogenetic analysis. Appl. Environ. Microbiol. 2005, 71, 8692–8705. [CrossRef] [PubMed]
Altschul, S.F.; Gish, W.; Miller, W.; Myers, E.W.; Lipman, D.J. Basic local alignment search tool. J. Mol. Biol. 1990, 215, 403–410.
[CrossRef]
Enright, A.J.; van Dongen, S.; Ouzounis, C.A. An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res.
2002, 30, 1575–1584. [CrossRef] [PubMed]
Microorganisms 2021, 9, 2559
26.
27.
28.
29.
30.
31.
32.
33.
34.
35.
36.
37.
38.
39.
40.
41.
42.
43.
44.
45.
46.
47.
16 of 16
Murphy, J.; Bottacini, F.; Mahony, J.; Kelleher, P.; Neve, H.; Zomer, A.; Nauta, A.; van Sinderen, D. Comparative genomics and
functional analysis of the 936 group of lactococcal Siphoviridae phages. Sci. Rep. 2016, 6, 21345. [CrossRef] [PubMed]
Hyatt, D.; Chen, G.L.; Locascio, P.F.; Land, M.L.; Larimer, F.W.; Hauser, L.J. Prodigal: Prokaryotic gene recognition and translation
initiation site identification. BMC Bioinform. 2010, 11, 119. [CrossRef]
Meier-Kolthoff, J.P.; Auch, A.F.; Klenk, H.P.; Goker, M. Genome sequence-based species delimitation with confidence intervals
and improved distance functions. BMC Bioinform. 2013, 14, 60. [CrossRef] [PubMed]
Meier-Kolthoff, J.P.; Goker, M. VICTOR: Genome-based phylogeny and classification of prokaryotic viruses. Bioinformatics 2017,
33, 3396–3404. [CrossRef]
Lefort, V.; Desper, R.; Gascuel, O. FastME 2.0: A Comprehensive, Accurate, and Fast Distance-Based Phylogeny Inference
Program. Mol. Biol. Evol. 2015, 32, 2798–2800. [CrossRef] [PubMed]
Goker, M.; Garcia-Blazquez, G.; Voglmayr, H.; Telleria, M.T.; Martin, M.P. Molecular taxonomy of phytopathogenic fungi: A case
study in Peronospora. PLoS ONE 2009, 4, e6319. [CrossRef] [PubMed]
Meier-Kolthoff, J.P.; Hahnke, R.L.; Petersen, J.; Scheuner, C.; Michael, V.; Fiebig, A.; Rohde, C.; Rohde, M.; Fartmann, B.;
Goodwin, L.A.; et al. Complete genome sequence of DSM 30083(T), the type strain (U5/41(T)) of Escherichia coli, and a proposal
for delineating subspecies in microbial taxonomy. Stand. Genom. Sci. 2014, 9, 2. [CrossRef] [PubMed]
Kurtz, S.; Schleiermacher, C. REPuter: Fast computation of maximal repeats in complete genomes. Bioinformatics 1999, 15, 426–427.
[CrossRef]
Li, W.; Godzik, A. Cd-hit: A fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics
2006, 22, 1658–1659. [CrossRef] [PubMed]
Arndt, D.; Marcu, A.; Liang, Y.; Wishart, D.S. PHASTER: A better, faster version of the PHAST phage search tool. Nucleic Acids
Res. 2016, 44, 16–21. [CrossRef] [PubMed]
Bottacini, F.; Morrissey, R.; Roberts, R.J.; James, K.; van Breen, J.; Egan, M.; Lambert, J.; van Limpt, K.; Knol, J.;
Motherway, M.O.; et al. Comparative genome and methylome analysis reveals restriction/modification system diversity
in the gut commensal Bifidobacterium breve. Nucleic Acids Res. 2018, 46, 1860–1877. [CrossRef] [PubMed]
Canchaya, C.; Proux, C.; Fournous, G.; Bruttin, A.; Brussow, H. Prophage genomics. Microbiol. Mol. Biol. Rev. 2003, 67, 238–276.
[CrossRef] [PubMed]
Arzamasov, A.A.; van Sinderen, D.; Rodionov, D.A. Comparative Genomics Reveals the Regulatory Complexity of Bifidobacterial
Arabinose and Arabino-Oligosaccharide Utilization. Front. Microbiol. 2018, 9, 776. [CrossRef]
Mavrich, T.N.; Casey, E.; Oliveira, J.; Bottacini, F.; James, K.; Franz, C.; Lugli, G.A.; Neve, H.; Ventura, M.; Hatfull, G.F.; et al.
Characterization and induction of prophages in human gut-associated Bifidobacterium hosts. Sci. Rep. 2018, 8, 12772. [CrossRef]
Van Hoek, A.H.; Mayrhofer, S.; Domig, K.J.; Florez, A.B.; Ammor, M.S.; Mayo, B.; Aarts, H.J. Mosaic tetracycline resistance genes
and their flanking regions in Bifidobacterium thermophilum and Lactobacillus johnsonii. Antimicrob. Agents Chemother. 2008, 52,
248–252. [CrossRef]
Summer, E.J.; Liu, M.; Gill, J.J.; Grant, M.; Chan-Cortes, T.N.; Ferguson, L.; Janes, C.; Lange, K.; Bertoli, M.; Moore, C.; et al.
Genomic and functional analyses of Rhodococcus equi phages ReqiPepy6, ReqiPoco6, ReqiPine5, and ReqiDocB7. Appl. Environ.
Microbiol. 2011, 77, 669–683. [CrossRef] [PubMed]
Benler, S.; Cobian-Guemes, A.G.; McNair, K.; Hung, S.H.; Levi, K.; Edwards, R.; Rohwer, F. A diversity-generating retroelement
encoded by a globally ubiquitous Bacteroides phage. Microbiome 2018, 6, 191. [CrossRef] [PubMed]
Liu, M.; Gingery, M.; Doulatov, S.R.; Liu, Y.; Hodes, A.; Baker, S.; Davis, P.; Simmonds, M.; Churcher, C.; Mungall, K.; et al.
Genomic and genetic analysis of Bordetella bacteriophages encoding reverse transcriptase-mediated tropism-switching cassettes.
J. Bacteriol. 2004, 186, 1503–1517. [PubMed]
Ventura, M.; Turroni, F.; Lima-Mendez, G.; Foroni, E.; Zomer, A.; Duranti, S.; Giubellini, V.; Bottacini, F.; Horvath, P.;
Barrangou, R.; et al. Comparative analyses of prophage-like elements present in Bifidobacterial genomes. Appl. Environ. Microbiol.
2009, 75, 6929–6936. [CrossRef] [PubMed]
Nishimura, Y.; Yoshida, T.; Kuronishi, M.; Uehara, H.; Ogata, H.; Goto, S. ViPTree: The viral proteomic tree server. Bioinformatics
2017, 33, 2379–2380. [CrossRef]
Arndt, D.; Grant, J.R.; Marcu, A.; Sajed, T.; Pon, A.; Liang, Y.; Wishart, D.S. PHAST, PHASTER and PHASTEST: Tools for finding
prophage in bacterial genomes. Brief. Bioinform. 2019, 20, 1560–1567. [CrossRef] [PubMed]
Reis-Cunha, J.L.; Bartholomeu, D.C.; Manson, A.L.; Earl, A.M.; Cerqueira, G.C. ProphET, prophage estimation tool: A stand-alone
prophage sequence prediction tool with self-updating reference database. PLoS ONE 2019, 14, e0223364. [CrossRef]