My research uses population-genetic and computational tools to answer key evolutionary questions with genomic data. I am particularly interested in human evolution, both in terms of demography, migration and admixture history and in terms of our interactions with infectious disease, particularly pathogenic bacteria. These questions span recent time-scales (e.g. the emergence of antimicrobial-resistant bacteria) and ancient ones (e.g. human population structure at the time of the Neolithic revolution).
The Asian Central Steppe, consisting of current-day Kazakhstan and Russia, has acted as a highway for major migrations throughout history. Therefore, describing the genetic composition of past populations in Central Asia holds value for understanding human mobility in this pivotal region. In this study, we analyse paleogenomic data generated from five humans from Kuygenzhar, Kazakhstan. These individuals date to the early to mid-18th century, shortly after the founding of the Kazakh Khanate, a union of nomadic tribes of Mongol Golden Horde and Turkic origins. Genomic analysis identifies that these individuals are admixed with varying proportions of East Asian ancestry, indicating a recent admixture event from East Asia. The high amounts of DNA from the anaerobic Gram-negative bacterium Tannerella forsythia, a periodontal pathogen, recovered from their teeth suggest they may have suffered from periodontitis. Genomic analysis of this bacterium identified recently evolved virulence a...
Background: Methicillin-resistant Staphylococcus aureus (MRSA) is a major nosocomial pathogen subdivided into lineages termed sequence types (STs). Since the 1950s, successive waves of STs have appeared and replaced previously dominant lineages. One such event has been occurring in China since 2013, with community-associated (CA-MRSA) strains including ST59 largely replacing the previously dominant healthcare-associated (HA-MRSA) ST239. We previously showed that ST59 isolates tend to have a competitive advantage in growth experiments against ST239. However, the underlying genomic and phenotypic drivers of this replacement event are unclear. Methods: Here, we investigated the replacement of ST239 using whole-genome sequencing data from 204 ST239 and ST59 isolates collected in Chinese hospitals between 1994 and 2016. We reconstructed the evolutionary history of each ST and considered two non-mutually exclusive hypotheses for ST59 replacing ST239: antimicrobial resistance (AMR) profile a...
The rich linguistic, ethnic and cultural diversity of Ethiopia provides an unprecedented opportunity to understand the extent to which cultural factors correlate with, and shape, genetic structure in human populations. Using primarily new genetic variation data covering 1,214 Ethiopians representing 68 different ethnic groups, together with information on individuals’ birthplaces, linguistic/religious practices and 31 cultural practices, we disentangle the effects of geographic distance, elevation, and social factors on the genetic structure of Ethiopians today. We provide evidence of associations between social behaviours and genetic differences among present-day peoples. We show that genetic similarity is broadly associated with linguistic affiliation, but also identify pronounced genetic similarity among groups from disparate language classifications that may in part be attributable to recent intermixing. We also illustrate how groups reporting the same culture traits are more genet...
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) emerged in late 2019 and spread globally to cause the COVID-19 pandemic. Despite the constant accumulation of genetic variation in the SARS-CoV-2 population, there was little evidence for the emergence of significantly more transmissible lineages in the first half of 2020. Around November 2020, several more contagious and possibly more virulent ‘Variants of Concern’ (VoCs) were detected near-simultaneously in various regions of the world. These VoCs share some mutations and deletions that have arisen recurrently in distinct genetic backgrounds. Here, we build on our previous work modelling the association of mutations with SARS-CoV-2 transmissibility and characterise the contribution of individual recurrent mutations and deletions to estimated viral transmissibility. We estimate enhanced transmissibility associated with mutations characteristic of VoCs and identify a tendency for cytidine to thymidine (C→T) substitutions to b...
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the agent of the ongoing COVID-19 pandemic, jumped into humans from an unknown animal reservoir in late 2019. In line with other coronaviruses, SARS-CoV-2 has the potential to infect a broad range of hosts. SARS-CoV-2 genomes have now been isolated from cats, dogs, lions, tigers and minks. SARS-CoV-2 seems to transmit particularly well in mink farms, with outbreaks reported in Spain, Sweden, the Netherlands, Italy, the USA and Denmark. Genomic data from SARS-CoV-2 isolated from infected minks provide a natural case study of a secondary host jump of the virus, in this case from humans to animals, and occasionally back again. We screened published SARS-CoV-2 genomes isolated from minks for the presence of recurrent mutations common in mink but infrequent in SARS-CoV-2 genomes isolated from human infections. We identify 23 recurrent mutations including three nonsynonymous mutations in the Receptor Binding Domain of the SARS-CoV-2 spi...
Individuals with likely exposure to the highly infectious SARS-CoV-2 do not necessarily develop PCR or antibody positivity, suggesting some may clear sub-clinical infection before seroconversion. T cells can contribute to the rapid clearance of SARS-CoV-2 and other coronavirus infections [1–5]. We hypothesised that pre-existing memory T cell responses, with cross-protective potential against SARS-CoV-2 [6–12], would expand in vivo to mediate rapid viral control, potentially aborting infection. We studied T cells against the replication transcription complex (RTC) of SARS-CoV-2 since this is transcribed first in the viral life cycle [13–15] and should be highly conserved. We measured SARS-CoV-2-reactive T cells in a cohort of intensively monitored healthcare workers (HCW) who remained repeatedly negative by PCR, antibody binding, and neutralisation for SARS-CoV-2 (exposed seronegative, ES). At 16 weeks post-recruitment, ES had memory T cells that were stronger and more multispecific than an unexp...
Our understanding of the host component of sepsis has made significant progress. However, detailed study of the microorganisms causing sepsis, either as single pathogens or microbial assemblages, has received far less attention. Metagenomic data offer opportunities to characterize the microbial communities found in septic and healthy individuals. In this study we apply gradient-boosted tree classifiers and a novel computational decontamination technique built upon SHapley Additive exPlanations (SHAP) to identify microbial hallmarks which discriminate blood metagenomic samples of septic patients from those of healthy individuals. Classifiers had high performance when using the read assignments to microbial genera (area under the receiver operating characteristic curve, AUROC = 0.995), including after removal of species ‘culture-confirmed’ as the cause of sepsis through clinical testing (AUROC = 0.915). Models trained on single genera were inferior to those employing a polymicrobial model and w...
The COVID-19 pandemic has led to an unprecedented global sequencing effort of its viral agent SARS-CoV-2. The first whole genome assembly of SARS-CoV-2 was published on January 5, 2020. Since then, over 150,000 high-quality SARS-CoV-2 genomes have been made available. This large genomic resource has allowed tracing of the emergence and spread of mutations and phylogenetic reconstruction of SARS-CoV-2 lineages in near real time. However, the question of whether SARS-CoV-2 undergoes genetic recombination has been largely overlooked to date. Recombination-mediated rearrangement of variants that arose independently can be of major evolutionary importance. Moreover, the absence of recombination is a key assumption behind the application of phylogenetic inference methods. Here, we analyse the extant genomic diversity of SARS-CoV-2 and show that, to date, there is no detectable hallmark of recombination. We assess our detection power using simulations and validate our method on the related MERS-CoV for whic...
The mobile resistance gene blaNDM encodes the NDM enzyme capable of hydrolysing carbapenems, a class of antibiotics used to treat some of the most severe bacterial infections. blaNDM is globally distributed across a variety of Gram-negative bacteria and is typically located within a highly recombining, transposon-rich genomic region common to multiple plasmid types. As a result of this genomic complexity, the dynamics underlying the dissemination of blaNDM remain poorly resolved. In this work, we compiled a dataset of over 2000 bacterial genomes harbouring the blaNDM gene, including 112 new PacBio hybrid assemblies from clinical and livestock-associated isolates across China, and developed a novel computational approach to track structural variants in bacterial genomes. We were able to correlate specific structural variants with plasmid backbones, bacterial host species and sampling locations, and identified multiple transposition events that occurred during the global dissemination of...
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the novel coronavirus responsible for the COVID-19 pandemic, continues to cause significant public health burden and disruption globally. Genomic epidemiology approaches point to most countries in the world having experienced many independent introductions of SARS-CoV-2 during the early stages of the pandemic. However, this situation may change with local lockdown policies and restrictions on travel leading to the emergence of more geographically structured viral populations and lineages transmitting locally. Here, we report the first SARS-CoV-2 genomes from Palestine sampled from early March, when the first cases were observed, through to August of 2020. SARS-CoV-2 genomes from Palestine fall across the diversity of the global phylogeny, consistent with at least nine independent introductions into the region. We identify one locally predominant lineage in circulation represented by 50 Palestinian SARS-CoV-2, grouping wit...
Background: The last two decades have seen significant progress in our understanding of the host component of sepsis. However, detailed study of the composition of the microbial community present in sepsis, or indeed, the finer characterisation of this community as a predictive tool to identify sepsis cases, has only received limited attention. Methods: Using various rigorous computational methods, we analysed a publicly available metagenomic dataset, comparing the patterns of microbial DNA in the blood plasma of septic patients relative to that of healthy individuals. We evaluated the performance of gradient-boosted tree classifiers in determining if the microbial taxonomic assignments of a blood metagenomic sample belonged to a septic or healthy individual. Additionally, we demonstrated a novel application of SHapley Additive exPlanations (SHAP), a recently developed model interpretation approach, to computationally remove putative contaminant genera present in such data. Finally, we c...
Papers by Lucy van Dorp