Molecular phylogenetic and phylogeographic reconstructions generally assume time-homogeneous substitution processes. Motivated by computational convenience, this assumption sacrifices biological realism and offers little opportunity to uncover the temporal dynamics in evolutionary histories. Here, we propose an evolutionary approach that explicitly relaxes the time-homogeneity assumption by allowing the specification of different infinitesimal substitution rate matrices across different time intervals, called epochs, along the evolutionary history. We focus on an epoch model implementation in a Bayesian inference framework that offers great modeling flexibility in drawing inference about any discrete data type characterized as a continuous-time Markov chain, including phylogeographic traits. To alleviate the computational burden that the additional temporal heterogeneity imposes, we adopt a massively parallel approach that achieves both fine- and coarse-grain parallelization of the ...
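The epoch idea can be illustrated with a small sketch: when a branch spans several epochs, its finite-time transition probability matrix is the product of the matrix exponentials of each epoch's rate matrix over the time the branch spends in that epoch. The Python code below is a minimal illustration under stated assumptions, not the authors' implementation: time is assumed to run forward from parent to child, and the function names (`mat_exp`, `epoch_transition_probs`) and the toy two-state rate matrices are hypothetical.

```python
import math

def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def mat_exp(q, t, squarings=10, terms=20):
    """Approximate exp(Q * t) by scaling and squaring a truncated Taylor series."""
    n = len(q)
    scale = t / (2 ** squarings)
    a = [[q[i][j] * scale for j in range(n)] for i in range(n)]
    result = identity(n)
    term = identity(n)
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

def epoch_transition_probs(t_parent, t_child, boundaries, rate_matrices):
    """Transition probabilities along a branch from t_parent to t_child.

    boundaries: sorted epoch transition times (forward time, an assumption).
    rate_matrices: one rate matrix per epoch, len(boundaries) + 1 in total.
    """
    edges = [-math.inf] + list(boundaries) + [math.inf]
    p = identity(len(rate_matrices[0]))
    for k, q in enumerate(rate_matrices):
        lo = max(t_parent, edges[k])
        hi = min(t_child, edges[k + 1])
        if hi > lo:  # the branch overlaps epoch k
            p = mat_mul(p, mat_exp(q, hi - lo))
    return p

# Toy example: two-state chain, slow epoch before time 1.0 and fast epoch after.
q_slow = [[-1.0, 1.0], [1.0, -1.0]]
q_fast = [[-3.0, 3.0], [3.0, -3.0]]
probs = epoch_transition_probs(0.5, 1.5, [1.0], [q_slow, q_fast])
```

For this symmetric two-state toy case the result can be checked against the closed form 0.5 + 0.5·exp(−2(a₁t₁ + a₂t₂)) for the probability of remaining in the starting state.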
Phylogenetic analysis of novel dolphin (Tursiops truncatus) papillomavirus sequences, TtPV1, -2, and -3, indicates that the early and late protein coding regions of their genomes differ in evolutionary history. Sliding window bootscan analysis showed a significant change in phylogenetic clustering, in which the grouped sequences of TtPV1 and -3 move from a cluster with the Phocoena spinipinnis PsPV1 in the early region to a cluster with TtPV2 in the late region. This provides indications for a possible recombination event near the end of E2/beginning of L2. A second possible recombination site could be located near the end of L1, in the upstream regulatory region. Selection analysis by using maximum likelihood models of codon substitutions ruled out the possibility of intense selective pressure, acting asymmetrically on the viral genomes, as an alternative explanation for the observed difference in evolutionary history between the early and late genomic regions of these cetacean papillomaviruses.
Transmission lies at the interface of human immunodeficiency virus type 1 (HIV-1) evolution within and among hosts and separates distinct selective pressures that impose differences in both the mode of diversification and the tempo of evolution. In the absence of comprehensive direct comparative analyses of the evolutionary processes at different biological scales, our understanding of how fast within-host HIV-1 evolutionary rates translate to lower rates at the between-host level remains incomplete. Here, we address this by analyzing pol and env data from a large HIV-1 subtype C transmission chain for which both the timing and the direction are known for most transmission events. To this purpose, we develop a new transmission model in a Bayesian genealogical inference framework and demonstrate how to constrain the viral evolutionary history to be compatible with the transmission history while simultaneously inferring the within-host evolutionary and population dynamics. We show that accommodating a transmission bottleneck affords the best fit to our data, but the sparse within-host HIV-1 sampling prevents accurate quantification of the concomitant loss in genetic diversity. We draw inference under the transmission model to estimate HIV-1 evolutionary rates among epidemiologically related patients and demonstrate that they lie in between fast intra-host rates and lower rates among epidemiologically unrelated individuals infected with HIV subtype C. Using a new molecular clock approach, we quantify and find support for a lower evolutionary rate along branches that accommodate a transmission event or branches that represent the entire backbone of transmitted lineages in our transmission history. Finally, we recover the rate differences at the different biological scales for both synonymous and non-synonymous substitution rates, which is only compatible with the 'store and retrieve' hypothesis positing that viruses stored early in latently infected cells preferentially transmit or establish new infections upon reactivation.
Rotaviruses (RVs) are responsible for more than 600,000 child deaths each year. The worldwide introduction of two live oral vaccines, RotaTeq and Rotarix, is believed to reduce this number significantly. Before the licensing of both vaccines, two new genotypes, G9 and G12, emerged in the human population and were able to spread across the entire globe in a very short time span. To quantify the VP7 mutation rates of these G9 and G12 genotypes and to estimate their most recent common ancestors, we used a Bayesian Markov chain Monte Carlo framework. Based on 356 sequences for G9 and 140 sequences for G12, we estimated mutation rates (nt substitutions/site/year) of 1.87 × 10⁻³ (1.45-2.27 × 10⁻³) for G9 and 1.66 × 10⁻³ (1.13-2.32 × 10⁻³) for G12. For both the G9 and G12 strains, one particular (sub)lineage was able to disseminate and cause disease across the world. The most recent common ancestors of these particular lineages were dated back to 1989 (1986-1992) and 1995 (1992-1998) for the G9 and G12 genotypes, respectively. These estimates suggest that a single novel RV (e.g., a vaccine escape mutant) can spread worldwide in little more than a decade. These results re-emphasize the need for thorough and continued RV surveillance in order to detect such potential spreading events at an early stage.
Coronaviruses are enveloped, positive-stranded RNA viruses with a genome of approximately 30 kb. Based on genetic similarities, coronaviruses are classified into three groups. Two group 2 coronaviruses, human coronavirus OC43 (HCoV-OC43) and bovine coronavirus (BCoV), show remarkable antigenic and genetic similarities. In this study, we report the first complete genome sequence (30,738 nucleotides) of the prototype HCoV-OC43 strain (ATCC VR759). Complete genome and open reading frame (ORF) analyses were performed in comparison to the BCoV genome. In the region between the spike and membrane protein genes, a 290-nucleotide deletion is present, corresponding to the absence of BCoV ORFs ns4.9 and ns4.8. Nucleotide and amino acid similarity percentages were determined for the major HCoV-OC43 ORFs and for those of other group 2 coronaviruses. The highest degree of similarity is demonstrated between HCoV-OC43 and BCoV in all ORFs with the exception of the E gene. Molecular clock analysis ...
Like other RNA viruses, coxsackievirus B5 (CVB5) exists as circulating heterogeneous populations of genetic variants. In this study, we present the reconstruction and characterization of a probable ancestral virion of CVB5. Phylogenetic analyses based on capsid protein-encoding regions (the VP1 gene of 41 clinical isolates and the entire P1 region of eight clinical isolates) of CVB5 revealed two major cocirculating lineages. Ancestral capsid sequences were inferred from sequences of these contemporary CVB5 isolates by using maximum likelihood methods. By using Bayesian phylodynamic analysis, the inferred VP1 ancestral sequence dated back to 1854 (1807 to 1898). In order to study the properties of the putative ancestral capsid, the entire ancestral P1 sequence was synthesized de novo and inserted into the replicative backbone of an infectious CVB5 cDNA clone. Characterization of the recombinant virus in cell culture showed that fully functional infectious virus particles were assembl...
Human enteroviruses (HEVs) are responsible for a wide spectrum of clinical diseases. Even though usually associated with non-specific febrile illness, they are the most common cause of viral meningitis and pose a serious public-health problem, especially during outbreaks. Rapid detection and identification of HEV serotypes in clinical specimens are important in appropriate patient management and epidemiological investigation. A 5-year study (2003-2007) of clinical specimens from patients with viral meningitis and/or symptoms of enteroviral infection was carried out in Cyprus to determine the underlying enteroviral aetiology. Reverse transcription, followed by a sequential PCR strategy targeting the 5′ non-coding region and VP1 region, was used for typing the isolated enteroviruses. The serotype of each isolate was determined by BLAST search of the VP1 amplicon sequence against GenBank. Clinical specimens from a total of 146 patients were diagnosed as enterovirus-positive. Twenty-two different serotypes were identified. The main strains identified were echovirus 18 and echovirus 30, followed by coxsackievirus B5, echovirus 9, echovirus 6, coxsackievirus A10 and coxsackievirus B2. However, rapid changes in serotype frequency and diversity were observed over time. Serotype distribution corresponded essentially with observations reported from other European countries in the same period. The present report demonstrates the epidemiology of enteroviruses in Cyprus from 2003 to 2007.
The inflammatory bowel diseases (IBD), Crohn's disease (CD), and ulcerative colitis (UC), are complex multifactorial traits involving both environmental and genetic factors. Mannan-binding lectin (MBL) plays an important role in non-specific immunity and complement activation. Point mutations in codons 52, 54 and 57 of exon 1 of the MBL gene are associated with decreased MBL plasma concentrations and increased susceptibility to various infectious diseases. If these MBL mutations could lead to susceptibility to putative IBD-etiological microbial agents, or could temper the complement-mediated mucosal damage in IBD, MBL could function as the link between certain microbial, immunological and genetic factors in IBD. In this study, we investigated the presence of the codon 52, 54 and 57 mutations of the MBL gene in 431 unrelated IBD patients, 112 affected and 141 unaffected first-degree relatives, and 308 healthy control individuals. In the group of sporadic IBD patients (n = 340), the frequency of the investigated MBL variants was significantly lower in UC patients when compared with CD patients (P = 0.01) and with controls (P = 0.02). These results suggest that MBL mutations which decrease the formation of functional MBL could protect against the clinical development of sporadic UC, but not of CD. This could be explained by the differential T-helper response in both diseases. Genes and Immunity (2001) 2, 323-328.
Phylodynamic reconstructions rely on a measurable molecular footprint of epidemic processes in pathogen genomes. Identifying the factors that govern the tempo and mode by which these processes leave a footprint in pathogen genomes represents an important goal towards understanding infectious disease evolution. Discriminating between synonymous and non-synonymous substitution rates is crucial for testing hypotheses about the sources of evolutionary rate variation. Here, we implement a codon substitution model in a Bayesian statistical framework to estimate absolute rates of synonymous and non-synonymous substitution in unknown evolutionary histories. To demonstrate how this model can provide critical insights into pathogen evolutionary dynamics, we adopt hierarchical phylogenetic modelling with fixed effects and apply it to two viral examples. Using within-host HIV-1 data from patients with different host genetic background and different disease progression rates, we show that viral populations undergo faster absolute synonymous substitution rates in patients with faster disease progression, probably reflecting faster replication rates. We also re-analyse rabies data from different bat species in the Americas to demonstrate that climate predicts absolute synonymous substitution rates, which can be attributed to climate-associated bat activity and viral transmission dynamics. In conclusion, our model to estimate absolute rates of synonymous and non-synonymous substitution can provide a powerful approach to investigate how host ecology can shape the tempo of pathogen evolution.
Proceedings. Biological sciences / The Royal Society, Jan 22, 2015
The frequency and global impact of infectious disease outbreaks, particularly those caused by emerging viruses, demonstrate the need for a better understanding of how spatial ecology and pathogen evolution jointly shape epidemic dynamics. Advances in computational techniques and the increasing availability of genetic and geospatial data are helping to address this problem, particularly when both information sources are combined. Here, we review research at the intersection of evolutionary biology, human geography and epidemiology that is working towards an integrated view of spatial incidence, host mobility and viral genetic diversity. We first discuss how empirical studies have combined viral spatial and genetic data, focusing particularly on the contribution of evolutionary analyses to epidemiology and disease control. Second, we explore the interplay between virus evolution and global dispersal in more depth for two pathogens: human influenza A virus and chikungunya virus. We dis...
Since its isolation in 1966 in Kenya, rice yellow mottle virus (RYMV) has been reported throughout Africa, resulting in one of the most economically important emerging tropical plant diseases. A thorough understanding of RYMV evolution and dispersal is critical to manage viral spread in tropical areas that heavily rely on agriculture for subsistence. Phylogenetic analyses have suggested a relatively recent expansion, perhaps driven by the intensification of agricultural practices, but this has not yet been examined in a coherent statistical framework. To gain insight into the historical spread of RYMV within African rice cultivations, we analyse a dataset of 300 coat protein gene sequences, sampled from East to West Africa over a 46-year period, using Bayesian evolutionary inference. Spatiotemporal reconstructions date the origin of RYMV back to 1852 (1791-1903) and confirm Tanzania as the most likely geographic origin. Following a single long-distance transmission event from East to West Africa, separate viral populations have been maintained for about a century. To identify the factors that shaped the RYMV distribution, we apply a generalised linear model (GLM) extension of discrete phylogenetic diffusion and provide strong support for distances measured on a rice connectivity landscape as the major determinant of RYMV spread. Phylogeographic estimates in continuous space further complement this by demonstrating more pronounced expansion dynamics in West Africa that are consistent with agricultural intensification and extensification. Taken together, our principled phylogeographic inference approach shows for the first time that host ecology dynamics have shaped the historical spread of a plant virus.
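As a rough illustration of how a GLM extension of discrete phylogenetic diffusion can tie movement rates to predictors, the sketch below assumes the commonly used log-linear form, in which the log of the rate between locations i and j is a sum of coefficient × inclusion indicator × predictor value. That parameterisation is an assumption here (the abstract does not spell it out), and the function name `glm_rate_matrix`, the predictor names and all numbers are hypothetical toy values rather than estimates from the study.

```python
import math

def glm_rate_matrix(predictors, betas, deltas):
    """Off-diagonal rates exp(sum_k beta_k * delta_k * x_k[i][j]).

    predictors: list of square matrices, one per predictor (assumed to be
                log-transformed and standardised before use).
    betas:      effect size of each predictor.
    deltas:     0/1 inclusion indicator of each predictor.
    """
    n = len(predictors[0])
    rates = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # diagonal left at 0 in this sketch
            log_rate = sum(b * d * x[i][j]
                           for x, b, d in zip(predictors, betas, deltas))
            rates[i][j] = math.exp(log_rate)
    return rates

# Toy predictors for three locations (hypothetical values).
log_distance = [[0.0, 1.0, 2.0],
                [1.0, 0.0, 1.5],
                [2.0, 1.5, 0.0]]
log_rice_area = [[0.0, 0.3, -0.2],
                 [0.3, 0.0, 0.8],
                 [-0.2, 0.8, 0.0]]

# Distance included with a negative effect; rice area switched off.
rates = glm_rate_matrix([log_distance, log_rice_area],
                        betas=[-1.0, 0.5], deltas=[1, 0])
```

With the indicator of a predictor set to 0, that predictor drops out of every rate, which is how this kind of formulation lets inference weigh the support for each candidate predictor.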
Phylogeographic approaches help uncover the imprint that spatial epidemiological processes leave in the genomes of fast evolving viruses. Recent Bayesian inference methods that consider phylogenetic diffusion of discretely and continuously distributed traits offer a unique opportunity to explore genotypic and phenotypic evolution in greater detail. To provide a taste of the recent advances in viral diffusion approaches, we highlight key findings arising at the intrahost, local and global epidemiological scales. We also outline future areas of research and discuss how these may contribute to a quantitative understanding of the phylodynamics of RNA viruses.
As a key factor in endemic and epidemic dynamics, the geographical distribution of viruses has been frequently interpreted in the light of their genetic histories. Unfortunately, inference of historical dispersal or migration patterns of viruses has mainly been restricted to model-free heuristic approaches that provide little insight into the temporal setting of the spatial dynamics. The introduction of probabilistic models of evolution, however, offers unique opportunities to engage in this statistical endeavor. Here we introduce a Bayesian framework for inference, visualization and hypothesis testing of phylogeographic history. By implementing character mapping in a Bayesian software that samples time-scaled phylogenies, we enable the reconstruction of timed viral dispersal patterns while accommodating phylogenetic uncertainty. Standard Markov model inference is extended with a stochastic search variable selection procedure that identifies the parsimonious descriptions of the diffusion process. In addition, we propose priors that can incorporate geographical sampling distributions or characterize alternative hypotheses about the spatial dynamics. To visualize the spatial and temporal information, we summarize inferences using virtual globe software. We describe how Bayesian phylogeography compares with previous parsimony analysis in the investigation of the influenza A H5N1 origin and H5N1 epidemiological linkage among sampling localities. Analysis of rabies in West African dog populations reveals how virus diffusion may enable endemic maintenance through continuous epidemic cycles. From these analyses, we conclude that our phylogeographic framework will make an important asset in molecular epidemiology that can be easily generalized to infer biogeography from genetic data for many organisms.
Proceedings of the National Academy of Sciences, 2012
We introduce a conceptual bridge between the previously unlinked fields of phylogenetics and mathematical spatial ecology, which enables the spatial parameters of an emerging epidemic to be directly estimated from sampled pathogen genome sequences. By using phylogenetic history to correct for spatial autocorrelation, we illustrate how a fundamental spatial variable, the diffusion coefficient, can be estimated using robust nonparametric statistics, and how heterogeneity in dispersal can be readily quantified. We apply this framework to the spread of the West Nile virus across North America, an important recent instance of spatial invasion by an emerging infectious disease. We demonstrate that the dispersal of West Nile virus is greater and far more variable than previously measured, such that its dissemination was critically determined by rare, long-range movements that are unlikely to be discerned during field observations. Our results indicate that, by ignoring this heterogeneity, ...
Understanding the role of humans in the dispersal of predominately animal pathogens is essential for their control. We used newly developed Bayesian phylogeographic methods to unravel the dynamics and determinants of the spread of dog rabies virus (RABV) in North Africa. Each of the countries studied exhibited largely disconnected spatial dynamics with major geopolitical boundaries acting as barriers to gene flow. Road distances proved to be better predictors of the movement of dog RABV than accessibility or raw geographical distance, with occasional long distance and rapid spread within each of these countries. Using simulations that bridge phylodynamics and spatial epidemiology, we demonstrate that the contemporary viral distribution extends beyond that expected for RABV transmission in African dog populations. These results are strongly supportive of human-mediated dispersal, and demonstrate how an integrated phylogeographic approach will turn viral genetic data into a powerful asset for characterizing, predicting, and potentially controlling the spatial spread of pathogens.
Philosophical Transactions of the Royal Society B: Biological Sciences, 2013
The factors that determine the origin and fate of cross-species transmission events remain unclear for the majority of human pathogens, despite being central for the development of predictive models and assessing the efficacy of prevention strategies. Here, we describe a flexible Bayesian statistical framework to reconstruct virus transmission between different host species based on viral gene sequences, while simultaneously testing and estimating the contribution of several potential predictors of cross-species transmission. Specifically, we use a generalized linear model extension of phylogenetic diffusion to perform Bayesian model averaging over candidate predictors. By further extending this model with branch partitioning, we allow for distinct host transition processes on external and internal branches, thus discriminating between recent cross-species transmissions, many of which are likely to result in dead-end infections, and host shifts that reflect successful onwards transm...
Research aimed at understanding the geographic context of evolutionary histories is burgeoning across biological disciplines. Recent endeavors attempt to interpret contemporaneous genetic variation in the light of increasingly detailed geographical and environmental observations. Such interest has promoted the development of phylogeographic inference techniques that explicitly aim to integrate such heterogeneous data. One promising development involves reconstructing phylogeographic history on a continuous landscape. Here, we present a Bayesian statistical approach to infer continuous phylogeographic diffusion using random walk models while simultaneously reconstructing the evolutionary history in time from molecular sequence data. Moreover, by accommodating branch-specific variation in dispersal rates, we relax the most restrictive assumption of the standard Brownian diffusion process and demonstrate increased statistical efficiency in spatial reconstructions of overdispersed random walks by analyzing both simulated and real viral genetic data. We further illustrate how drawing inference about summary statistics from a fully specified stochastic process over both sequence evolution and spatial movement reveals important characteristics of a rabies epidemic. Together with recent advances in discrete phylogeographic inference, the continuous model developments furnish a flexible statistical framework for biogeographical reconstructions that is easily expanded upon to accommodate various landscape genetic features.
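To make the contrast between standard Brownian diffusion and branch-specific dispersal rates concrete, the sketch below simulates the displacement along a single branch. It assumes a simple parameterisation in which each coordinate moves by a normal deviate with variance equal to a baseline variance rate × branch duration × a branch-specific rate scalar; a scalar of 1 on every branch recovers the homogeneous Brownian case. This scaling convention, the function names and the numbers are illustrative assumptions, not the model specification from the paper.

```python
import random

def branch_variance(sigma2, duration, rate_scalar=1.0):
    """Per-coordinate displacement variance accumulated along one branch."""
    return sigma2 * duration * rate_scalar

def simulate_branch(start, sigma2, duration, rate_scalar, rng):
    """Draw the child location given the parent location of a branch."""
    sd = branch_variance(sigma2, duration, rate_scalar) ** 0.5
    return tuple(coord + rng.gauss(0.0, sd) for coord in start)

rng = random.Random(42)  # fixed seed so the sketch is reproducible
parent = (0.0, 0.0)

# Same branch duration, but the second branch disperses three times faster.
slow_child = simulate_branch(parent, sigma2=2.0, duration=0.5,
                             rate_scalar=1.0, rng=rng)
fast_child = simulate_branch(parent, sigma2=2.0, duration=0.5,
                             rate_scalar=3.0, rng=rng)
```

Drawing the rate scalars from a distribution with mean 1 across branches is one way such a model can accommodate occasional long-range movements without inflating the baseline rate for every branch.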
Motivation: Statistical methods for comparing relative rates of synonymous and non-synonymous substitutions maintain a central role in detecting positive selection. To identify selection, researchers often estimate the ratio of these relative rates (dN/dS) at individual alignment sites. Fitting a codon substitution model that captures heterogeneity in dN/dS across sites provides a reliable way to perform such estimation, but it remains computationally prohibitive for massive datasets. By using crude estimates of the numbers of synonymous and non-synonymous substitutions at each site, counting approaches scale well to large datasets, but they fail to account for ancestral state reconstruction uncertainty and to provide site-specific dN/dS estimates. Results: We propose a hybrid solution that borrows the computational strength of counting methods, but augments these methods with empirical Bayes modeling to produce a relatively fast and reliable method capable of estimating site-specific dN/dS values in large datasets. Importantly, our hybrid approach, set in a Bayesian framework, integrates over the posterior distribution of phylogenies and ancestral reconstructions to quantify uncertainty about site-specific dN/dS estimates. Simulations demonstrate that this method competes well with more-principled statistical procedures and, in some cases, even outperforms them. We illustrate the utility of our method using human immunodeficiency virus, feline panleukopenia and canine parvovirus evolution examples. Availability: Renaissance counting is implemented in the development branch of BEAST, freely available at http://code.google.com/p/beastmcmc/. The method will be made available in the next public release of the package, including support to set up analyses in BEAUti.
Rates of evolution span orders of magnitude among RNA viruses with important implications for viral transmission and emergence. Although the tempo of viral evolution is often ascribed to viral features such as mutation rates and transmission mode, these factors alone cannot explain variation among closely related viruses, where host biology might operate more strongly on viral evolution. Here, we analyzed sequence data from hundreds of rabies viruses collected from bats throughout the Americas to describe dramatic variation in the speed of rabies virus evolution when circulating in ecologically distinct reservoir species. Integration of ecological and genetic data through a comparative Bayesian analysis revealed that viral evolutionary rates were labile following historical jumps between bat species and nearly four times faster in tropical and subtropical bats compared to temperate species. The association between geography and viral evolution could not be explained by host metabolism, phylogeny or variable selection pressures, and instead appeared to be a consequence of reduced seasonality in bat activity and virus transmission associated with climate. Our results demonstrate a key role for host ecology in shaping the tempo of evolution in multi-host viruses and highlight the power of comparative phylogenetic methods to identify the host and environmental features that influence transmission dynamics.
Molecular phylogenetic and phylogeographic reconstructions generally assume time-homogeneous subs... more Molecular phylogenetic and phylogeographic reconstructions generally assume time-homogeneous substitution processes. Motivated by computational convenience, this assumption sacrifices biological realism and offers little opportunity to uncover the temporal dynamics in evolutionary histories. Here, we propose an evolutionary approach that explicitly relaxes the time-homogeneity assumption by allowing the specification of different infinitesimal substitution rate matrices across different time intervals, called epochs, along the evolutionary history. We focus on an epoch model implementation in a Bayesian inference framework that offers great modeling flexibility in drawing inference about any discrete data type characterized as a continuous-time Markov chain, including phylogeographic traits. To alleviate the computational burden that the additional temporal heterogeneity imposes, we adopt a massively parallel approach that achieves both fine- and coarse-grain parallelization of the ...
Phylogenetic analysis of novel dolphin (Tursiops truncatus) papillomavirus sequences, TtPV1,-2, a... more Phylogenetic analysis of novel dolphin (Tursiops truncatus) papillomavirus sequences, TtPV1,-2, and-3, indicates that the early and late protein coding regions of their genomes differ in evolutionary history. Sliding window bootscan analysis showed a significant a change in phylogenetic clustering, in which the grouped sequences of TtPV1 and-3 move from a cluster with the Phocoena spinipinnis PsPV1 in the early region to a cluster with TtPV2 in the late region. This provides indications for a possible recombination event near the end of E2/beginning of L2. A second possible recombination site could be located near the end of L1, in the upstream regulatory region. Selection analysis by using maximum likelihood models of codon substitutions ruled out the possibility of intense selective pressure, acting asymmetrically on the viral genomes, as an alternative explanation for the observed difference in evolutionary history between the early and late genomic regions of these cetacean papillomaviruses.
Transmission lies at the interface of human immunodeficiency virus type 1 (HIV-1) evolution withi... more Transmission lies at the interface of human immunodeficiency virus type 1 (HIV-1) evolution within and among hosts and separates distinct selective pressures that impose differences in both the mode of diversification and the tempo of evolution. In the absence of comprehensive direct comparative analyses of the evolutionary processes at different biological scales, our understanding of how fast within-host HIV-1 evolutionary rates translate to lower rates at the between host level remains incomplete. Here, we address this by analyzing pol and env data from a large HIV-1 subtype C transmission chain for which both the timing and the direction is known for most transmission events. To this purpose, we develop a new transmission model in a Bayesian genealogical inference framework and demonstrate how to constrain the viral evolutionary history to be compatible with the transmission history while simultaneously inferring the within-host evolutionary and population dynamics. We show that accommodating a transmission bottleneck affords the best fit our data, but the sparse within-host HIV-1 sampling prevents accurate quantification of the concomitant loss in genetic diversity. We draw inference under the transmission model to estimate HIV-1 evolutionary rates among epidemiologicallyrelated patients and demonstrate that they lie in between fast intra-host rates and lower rates among epidemiologically unrelated individuals infected with HIV subtype C. Using a new molecular clock approach, we quantify and find support for a lower evolutionary rate along branches that accommodate a transmission event or branches that represent the entire backbone of transmitted lineages in our transmission history. 
Finally, we recover the rate differences at the different biological scales for both synonymous and non-synonymous substitution rates, which is only compatible with the 'store and retrieve' hypothesis positing that viruses stored early in latently infected cells preferentially transmit or establish new infections upon reactivation.
Rotaviruses (RVs) are responsible for more than 600,000 child deaths each year. The worldwide introduction of two live oral vaccines, RotaTeq and Rotarix, is believed to reduce this number significantly. Before the licensing of both vaccines, two new genotypes, G9 and G12, emerged in the human population and were able to spread across the entire globe in a very short time span. To quantify the VP7 mutation rates of these G9 and G12 genotypes and to estimate their most recent common ancestors, we used a Bayesian Markov chain Monte Carlo framework. Based on 356 sequences for G9 and 140 sequences for G12, we estimated mutation rates (nt substitutions/site/year) of 1.87 × 10⁻³ (1.45-2.27 × 10⁻³) for G9 and 1.66 × 10⁻³ (1.13-2.32 × 10⁻³) for G12. For both the G9 and G12 strains, one particular (sub)lineage was able to disseminate and cause disease across the world. The most recent common ancestors of these particular lineages were dated back to 1989 (1986-1992) and 1995 (1992-1998) for the G9 and G12 genotypes, respectively. These estimates suggest that a single novel RV (e.g., a vaccine escape mutant) can spread worldwide in little more than a decade. These results re-emphasize the need for thorough and continued RV surveillance in order to detect such potential spreading events at an early stage.
Coronaviruses are enveloped, positive-stranded RNA viruses with a genome of approximately 30 kb. Based on genetic similarities, coronaviruses are classified into three groups. Two group 2 coronaviruses, human coronavirus OC43 (HCoV-OC43) and bovine coronavirus (BCoV), show remarkable antigenic and genetic similarities. In this study, we report the first complete genome sequence (30,738 nucleotides) of the prototype HCoV-OC43 strain (ATCC VR759). Complete genome and open reading frame (ORF) analyses were performed in comparison to the BCoV genome. In the region between the spike and membrane protein genes, a 290-nucleotide deletion is present, corresponding to the absence of BCoV ORFs ns4.9 and ns4.8. Nucleotide and amino acid similarity percentages were determined for the major HCoV-OC43 ORFs and for those of other group 2 coronaviruses. The highest degree of similarity is demonstrated between HCoV-OC43 and BCoV in all ORFs with the exception of the E gene. Molecular clock analysis ...
Like other RNA viruses, coxsackievirus B5 (CVB5) exists as circulating heterogeneous populations of genetic variants. In this study, we present the reconstruction and characterization of a probable ancestral virion of CVB5. Phylogenetic analyses based on capsid protein-encoding regions (the VP1 gene of 41 clinical isolates and the entire P1 region of eight clinical isolates) of CVB5 revealed two major cocirculating lineages. Ancestral capsid sequences were inferred from sequences of these contemporary CVB5 isolates by using maximum likelihood methods. By using Bayesian phylodynamic analysis, the inferred VP1 ancestral sequence dated back to 1854 (1807 to 1898). In order to study the properties of the putative ancestral capsid, the entire ancestral P1 sequence was synthesized de novo and inserted into the replicative backbone of an infectious CVB5 cDNA clone. Characterization of the recombinant virus in cell culture showed that fully functional infectious virus particles were assembl...
Human enteroviruses (HEVs) are responsible for a wide spectrum of clinical diseases. Even though usually associated with non-specific febrile illness, they are the most common cause of viral meningitis and pose a serious public-health problem, especially during outbreaks. Rapid detection and identification of HEV serotypes in clinical specimens are important in appropriate patient management and epidemiological investigation. A 5 year study (2003-2007) of clinical specimens from patients with viral meningitis and/or symptoms of enteroviral infection was carried out in Cyprus to determine the underlying enteroviral aetiology. Reverse transcription, followed by a sequential PCR strategy targeting the 5′ non-coding region and VP1 region, was used for typing the isolated enteroviruses. The serotype of each isolate was determined by BLAST search of the VP1 amplicon sequence against GenBank. Clinical specimens from a total of 146 patients were diagnosed as enterovirus-positive. Twenty-two different serotypes were identified. The main strains identified were echovirus 18 and echovirus 30, followed by coxsackievirus B5, echovirus 9, echovirus 6, coxsackievirus A10 and coxsackievirus B2. However, rapid changes in serotype frequency and diversity were observed over time. Serotype distribution corresponded essentially with observations reported from other European countries in the same period. The present report demonstrates the epidemiology of enteroviruses in Cyprus from 2003 to 2007.
The inflammatory bowel diseases (IBD), Crohn's disease (CD), and ulcerative colitis (UC), are complex multifactorial traits involving both environmental and genetic factors. Mannan-binding lectin (MBL) plays an important role in non-specific immunity and complement activation. Point mutations in codons 52, 54 and 57 of exon 1 of the MBL gene are associated with decreased MBL plasma concentrations and increased susceptibility to various infectious diseases. If these MBL mutations could lead to susceptibility to putative IBD-etiological microbial agents, or could temper the complement-mediated mucosal damage in IBD, MBL could function as the link between certain microbial, immunological and genetic factors in IBD. In this study, we investigated the presence of the codon 52, 54 and 57 mutations of the MBL gene in 431 unrelated IBD patients, 112 affected and 141 unaffected first-degree relatives, and 308 healthy control individuals. In the group of sporadic IBD patients (n = 340), the frequency of the investigated MBL variants was significantly lower in UC patients when compared with CD patients (P = 0.01) and with controls (P = 0.02). These results suggest that MBL mutations which decrease the formation of functional MBL could protect against the clinical development of sporadic UC, but not of CD. This could be explained by the differential T-helper response in both diseases. Genes and Immunity (2001) 2, 323-328.
Phylodynamic reconstructions rely on a measurable molecular footprint of epidemic processes in pathogen genomes. Identifying the factors that govern the tempo and mode by which these processes leave a footprint in pathogen genomes represents an important goal towards understanding infectious disease evolution. Discriminating between synonymous and non-synonymous substitution rates is crucial for testing hypotheses about the sources of evolutionary rate variation. Here, we implement a codon substitution model in a Bayesian statistical framework to estimate absolute rates of synonymous and non-synonymous substitution in unknown evolutionary histories. To demonstrate how this model can provide critical insights into pathogen evolutionary dynamics, we adopt hierarchical phylogenetic modelling with fixed effects and apply it to two viral examples. Using within-host HIV-1 data from patients with different host genetic background and different disease progression rates, we show that viral populations undergo faster absolute synonymous substitution rates in patients with faster disease progression, probably reflecting faster replication rates. We also re-analyse rabies data from different bat species in the Americas to demonstrate that climate predicts absolute synonymous substitution rates, which can be attributed to climate-associated bat activity and viral transmission dynamics. In conclusion, our model to estimate absolute rates of synonymous and non-synonymous substitution can provide a powerful approach to investigate how host ecology can shape the tempo of pathogen evolution.
Proceedings. Biological sciences / The Royal Society, Jan 22, 2015
The frequency and global impact of infectious disease outbreaks, particularly those caused by emerging viruses, demonstrate the need for a better understanding of how spatial ecology and pathogen evolution jointly shape epidemic dynamics. Advances in computational techniques and the increasing availability of genetic and geospatial data are helping to address this problem, particularly when both information sources are combined. Here, we review research at the intersection of evolutionary biology, human geography and epidemiology that is working towards an integrated view of spatial incidence, host mobility and viral genetic diversity. We first discuss how empirical studies have combined viral spatial and genetic data, focusing particularly on the contribution of evolutionary analyses to epidemiology and disease control. Second, we explore the interplay between virus evolution and global dispersal in more depth for two pathogens: human influenza A virus and chikungunya virus. We dis...
Since its isolation in 1966 in Kenya, rice yellow mottle virus (RYMV) has been reported throughout Africa resulting in one of the economically most important tropical plant emerging diseases. A thorough understanding of RYMV evolution and dispersal is critical to manage viral spread in tropical areas that heavily rely on agriculture for subsistence. Phylogenetic analyses have suggested a relatively recent expansion, perhaps driven by the intensification of agricultural practices, but this has not yet been examined in a coherent statistical framework. To gain insight into the historical spread of RYMV within African rice cultivations, we analyse a dataset of 300 coat protein gene sequences, sampled from East to West Africa over a 46-year period, using Bayesian evolutionary inference. Spatiotemporal reconstructions date the origin of RYMV back to 1852 (1791-1903) and confirm Tanzania as the most likely geographic origin. Following a single long-distance transmission event from East to West Africa, separate viral populations have been maintained for about a century. To identify the factors that shaped the RYMV distribution, we apply a generalised linear model (GLM) extension of discrete phylogenetic diffusion and provide strong support for distances measured on a rice connectivity landscape as the major determinant of RYMV spread. Phylogeographic estimates in continuous space further complement this by demonstrating more pronounced expansion dynamics in West Africa that are consistent with agricultural intensification and extensification. Taken together, our principled phylogeographic inference approach shows for the first time that host ecology dynamics have shaped the historical spread of a plant virus.
Phylogeographic approaches help uncover the imprint that spatial epidemiological processes leave in the genomes of fast evolving viruses. Recent Bayesian inference methods that consider phylogenetic diffusion of discretely and continuously distributed traits offer a unique opportunity to explore genotypic and phenotypic evolution in greater detail. To provide a taste of the recent advances in viral diffusion approaches, we highlight key findings arising at the intrahost, local and global epidemiological scales. We also outline future areas of research and discuss how these may contribute to a quantitative understanding of the phylodynamics of RNA viruses.
As a key factor in endemic and epidemic dynamics, the geographical distribution of viruses has been frequently interpreted in the light of their genetic histories. Unfortunately, inference of historical dispersal or migration patterns of viruses has mainly been restricted to model-free heuristic approaches that provide little insight into the temporal setting of the spatial dynamics. The introduction of probabilistic models of evolution, however, offers unique opportunities to engage in this statistical endeavor. Here we introduce a Bayesian framework for inference, visualization and hypothesis testing of phylogeographic history. By implementing character mapping in a Bayesian software that samples time-scaled phylogenies, we enable the reconstruction of timed viral dispersal patterns while accommodating phylogenetic uncertainty. Standard Markov model inference is extended with a stochastic search variable selection procedure that identifies the parsimonious descriptions of the diffusion process. In addition, we propose priors that can incorporate geographical sampling distributions or characterize alternative hypotheses about the spatial dynamics. To visualize the spatial and temporal information, we summarize inferences using virtual globe software. We describe how Bayesian phylogeography compares with previous parsimony analysis in the investigation of the influenza A H5N1 origin and H5N1 epidemiological linkage among sampling localities. Analysis of rabies in West African dog populations reveals how virus diffusion may enable endemic maintenance through continuous epidemic cycles. From these analyses, we conclude that our phylogeographic framework will make an important asset in molecular epidemiology that can be easily generalized to infer biogeography from genetic data for many organisms.
Proceedings of the National Academy of Sciences, 2012
We introduce a conceptual bridge between the previously unlinked fields of phylogenetics and mathematical spatial ecology, which enables the spatial parameters of an emerging epidemic to be directly estimated from sampled pathogen genome sequences. By using phylogenetic history to correct for spatial autocorrelation, we illustrate how a fundamental spatial variable, the diffusion coefficient, can be estimated using robust nonparametric statistics, and how heterogeneity in dispersal can be readily quantified. We apply this framework to the spread of the West Nile virus across North America, an important recent instance of spatial invasion by an emerging infectious disease. We demonstrate that the dispersal of West Nile virus is greater and far more variable than previously measured, such that its dissemination was critically determined by rare, long-range movements that are unlikely to be discerned during field observations. Our results indicate that, by ignoring this heterogeneity, ...
Understanding the role of humans in the dispersal of predominately animal pathogens is essential for their control. We used newly developed Bayesian phylogeographic methods to unravel the dynamics and determinants of the spread of dog rabies virus (RABV) in North Africa. Each of the countries studied exhibited largely disconnected spatial dynamics with major geopolitical boundaries acting as barriers to gene flow. Road distances proved to be better predictors of the movement of dog RABV than accessibility or raw geographical distance, with occasional long distance and rapid spread within each of these countries. Using simulations that bridge phylodynamics and spatial epidemiology, we demonstrate that the contemporary viral distribution extends beyond that expected for RABV transmission in African dog populations. These results are strongly supportive of human-mediated dispersal, and demonstrate how an integrated phylogeographic approach will turn viral genetic data into a powerful asset for characterizing, predicting, and potentially controlling the spatial spread of pathogens.
Philosophical Transactions of the Royal Society B: Biological Sciences, 2013
The factors that determine the origin and fate of cross-species transmission events remain unclear for the majority of human pathogens, despite being central for the development of predictive models and assessing the efficacy of prevention strategies. Here, we describe a flexible Bayesian statistical framework to reconstruct virus transmission between different host species based on viral gene sequences, while simultaneously testing and estimating the contribution of several potential predictors of cross-species transmission. Specifically, we use a generalized linear model extension of phylogenetic diffusion to perform Bayesian model averaging over candidate predictors. By further extending this model with branch partitioning, we allow for distinct host transition processes on external and internal branches, thus discriminating between recent cross-species transmissions, many of which are likely to result in dead-end infections, and host shifts that reflect successful onwards transm...
Research aimed at understanding the geographic context of evolutionary histories is burgeoning across biological disciplines. Recent endeavors attempt to interpret contemporaneous genetic variation in the light of increasingly detailed geographical and environmental observations. Such interest has promoted the development of phylogeographic inference techniques that explicitly aim to integrate such heterogeneous data. One promising development involves reconstructing phylogeographic history on a continuous landscape. Here, we present a Bayesian statistical approach to infer continuous phylogeographic diffusion using random walk models while simultaneously reconstructing the evolutionary history in time from molecular sequence data. Moreover, by accommodating branch-specific variation in dispersal rates, we relax the most restrictive assumption of the standard Brownian diffusion process and demonstrate increased statistical efficiency in spatial reconstructions of overdispersed random walks by analyzing both simulated and real viral genetic data. We further illustrate how drawing inference about summary statistics from a fully specified stochastic process over both sequence evolution and spatial movement reveals important characteristics of a rabies epidemic. Together with recent advances in discrete phylogeographic inference, the continuous model developments furnish a flexible statistical framework for biogeographical reconstructions that is easily expanded upon to accommodate various landscape genetic features.
Motivation: Statistical methods for comparing relative rates of synonymous and non-synonymous substitutions maintain a central role in detecting positive selection. To identify selection, researchers often estimate the ratio of these relative rates (dN/dS) at individual alignment sites. Fitting a codon substitution model that captures heterogeneity in dN/dS across sites provides a reliable way to perform such estimation, but it remains computationally prohibitive for massive datasets. By using crude estimates of the numbers of synonymous and non-synonymous substitutions at each site, counting approaches scale well to large datasets, but they fail to account for ancestral state reconstruction uncertainty and to provide site-specific dN/dS estimates. Results: We propose a hybrid solution that borrows the computational strength of counting methods, but augments these methods with empirical Bayes modeling to produce a relatively fast and reliable method capable of estimating site-specific dN/dS values in large datasets. Importantly, our hybrid approach, set in a Bayesian framework, integrates over the posterior distribution of phylogenies and ancestral reconstructions to quantify uncertainty about site-specific dN/dS estimates. Simulations demonstrate that this method competes well with more-principled statistical procedures and, in some cases, even outperforms them. We illustrate the utility of our method using human immunodeficiency virus, feline panleukopenia and canine parvovirus evolution examples. Availability: Renaissance counting is implemented in the development branch of BEAST, freely available at http://code.google.com/p/beastmcmc/. The method will be made available in the next public release of the package, including support to set up analyses in BEAUti.
Rates of evolution span orders of magnitude among RNA viruses with important implications for viral transmission and emergence. Although the tempo of viral evolution is often ascribed to viral features such as mutation rates and transmission mode, these factors alone cannot explain variation among closely related viruses, where host biology might operate more strongly on viral evolution. Here, we analyzed sequence data from hundreds of rabies viruses collected from bats throughout the Americas to describe dramatic variation in the speed of rabies virus evolution when circulating in ecologically distinct reservoir species. Integration of ecological and genetic data through a comparative Bayesian analysis revealed that viral evolutionary rates were labile following historical jumps between bat species and nearly four times faster in tropical and subtropical bats compared to temperate species. The association between geography and viral evolution could not be explained by host metabolism, phylogeny or variable selection pressures, and instead appeared to be a consequence of reduced seasonality in bat activity and virus transmission associated with climate. Our results demonstrate a key role for host ecology in shaping the tempo of evolution in multi-host viruses and highlight the power of comparative phylogenetic methods to identify the host and environmental features that influence transmission dynamics.
Papers by Philippe Lemey