We present SimPhy, a fast and flexible software package for simulating multiple gene families evolving under incomplete lineage sorting, gene duplication and loss, horizontal gene transfer (all three potentially leading to species tree/gene tree discordance) and gene conversion. SimPhy implements a hierarchical phylogenetic model in which the evolution of species, locus and gene trees is governed by global and local parameters (e.g., genome-wide, species-specific, locus-specific) that can be fixed or sampled from a priori statistical distributions. SimPhy also incorporates comprehensive models of substitution rate variation among lineages (uncorrelated relaxed clocks) and can simulate partitioned nucleotide, codon and protein multilocus sequence alignments under a wide range of substitution models using the program INDELible. We validate SimPhy's output against theoretical expectations and other programs, and show that it scales extremely well with complex models and/or large trees, being an order of magnitude faster than the most similar program (DLCoal-Sim). In addition, we demonstrate how SimPhy can help in understanding interactions among different evolutionary processes by conducting a simulation study that characterizes the systematic overestimation of the duplication time when using standard reconciliation methods. SimPhy is available at https://github.com/adamallo/SimPhy, where users can find the source code, pre-compiled executables, a detailed manual and example cases.
The evidence for universal common ancestry (UCA) is vast and persuasive, and a phylogenetic test was proposed for quantifying its odds against independently originated sequences based on the comparison between one and several trees. This test was successfully applied to a well-supported homologous sequence alignment, but it was criticized once simulations showed that even alignments without any phylogenetic structure could mislead its conclusions. Despite claims to the contrary, we believe that the counterexample successfully exposed a drawback of the test: its reliance on good alignments. Here we present a simplified version of this counterexample, which can be interpreted as a tree with arbitrarily long branches, and where the test again fails. We also present another simulation showing circumstances whereby any sufficiently similar alignment will favor UCA irrespective of the true independent origins of the sequences. We therefore conclude that the test should not be trusted unl...
Background / Purpose: This poster describes the software biomc2, which detects phylogenetic recombination using a prior distribution of distances between topologies of consecutive alignment segments. Main conclusion: We present its usage on an HIV genomic dataset and on simulated alignments, where we show the importance of our chosen probability for the distances, which resemble the minimum number of recombinations.
Douglas Theobald recently developed an interesting test putatively capable of quantifying the evidence for a Universal Common Ancestry uniting the three domains of life (Eukarya, Archaea and Bacteria) against hypotheses of Independent Origins for some of these domains. We review his model here, in particular with respect to its treatment of Horizontal Gene Transfer and to the quality of the sequence alignment.
Background
The activity of the human APOBEC3G (A3G) protein is associated with innate immunity against HIV-1: it induces high rates of guanosine-to-adenosine (G-to-A) mutations (viz., hypermutation) in the viral DNA. If hypermutation is not enough to disrupt the reading frames of viral genes, it is likely to increase HIV-1 diversity. To counteract host innate immunity, HIV-1 encodes the Vif protein, which binds the A3G protein and forms complexes that are degraded by cellular proteolysis.
Methods
Here we studied the pattern of substitutions in the vif gene and its association with the clinical status of HIV-1-infected individuals. To perform the study, unique vif gene sequences were generated from 400 antiretroviral-naive individuals.
Results
The codon pairs 78–154, 85–154, 101–157, 105–157, and 105–176 of the vif gene were associated with a CD4+ T cell count lower than 500 cells per mm³. Some of these codons were located in the 81LGQGVSIEW89 region and within the BC-Box. We also identified codons under positive selection clustered in the N-terminal region of the Vif protein, between the 21WKSLVK26 and 40YRHHY44 regions (i.e., 31, 33, 37, 39), within the BC-Box (i.e., 155, 159) and within the Cullin5-Box (i.e., 168) of the vif gene. All these regions are involved in the Vif-induced degradation of A3G/F complexes, and the N-terminal region of the Vif protein binds to viral and cellular RNA.
Conclusions
Adaptive evolution of the vif gene acted mostly to optimize viral RNA binding and A3G/F recognition. Additionally, since there is no fully resolved structure of the Vif protein, codon pairs associated with CD4+ T cell count may elucidate key regions that interact with host cell factors. Here we identified and discriminated codons under positive selection and codons under functional constraint in the vif gene of HIV-1.
Genomic regions participating in recombination events may support distinct topologies, and phylogenetic analyses should incorporate this heterogeneity. Existing phylogenetic methods for recombination detection are challenged by the enormous number of possible topologies, even for a moderate number of taxa. If, however, the detection analysis is conducted independently between each putative recombinant sequence and a set of reference parentals, potential recombinations between the recombinants are neglected.
Two distinct Bayesian coalescent methods were used to analyze the evolutionary history of the HIV-1 subtypes B, F and C, as well as the BF and CB recombinant viruses, and to determine the age of the most recent common ancestor (MRCA) of the main strains circulating in South America. Near full-length sequences of pure B, F and C subtypes, as well as of the BF and CB recombinants, were analyzed.
Background: Since the discovery of deep-sea chemosynthesis-based communities, much work has been done to clarify their organismal and environmental aspects. However, major topics remain to be resolved, including when and how organisms invade and adapt to deep-sea environments; whether strategies for invasion and adaptation are shared by different taxa or unique to each taxon; how organisms extend their distribution and diversity; and how they become isolated to speciate in continuous waters.
Inferences about the evolutionary history of biological sequence data are greatly influenced by the presence of recombination, which tends to disrupt the phylogenetic signal. Current recombination detection procedures focus on the phylogenetic disagreement of the data along the aligned sequences, but only recently was the link between the quantification of this disagreement and the strength of the recombination realised.
Recombinant DNA sequences cannot be represented by a single topology, since recombinant segments support distinct evolutionary histories. Existing methods for recombination detection can handle only a limited number of taxa, constraining the recombination analysis to cases where the phylogeny can be assumed known for the parental sequences. If the analysis is conducted independently for each putative recombinant sequence, potential recombinations between them are neglected.
Papers by Leonardo de Oliveira Martins