Inconsistency of parsimony under the multispecies coalescent
Authors:
Daniel Rickert,
Wai-Tong Louis Fan,
Matthew Hahn
Abstract:
While it is known that parsimony can be statistically inconsistent under certain models of evolution due to high levels of homoplasy, the consistency of parsimony under the multispecies coalescent (MSC) is less well studied. Previous studies have shown the consistency of concatenated parsimony (parsimony applied to concatenated alignments) under the MSC for the rooted 4-taxa case under an infinite-sites model of mutation; on the other hand, other work has also established the inconsistency of concatenated parsimony for the unrooted 6-taxa case. These seemingly contradictory results suggest that concatenated parsimony may fail to be consistent for trees with more than 5 taxa, for all unrooted trees, or for some combination of the two. Here, we present a technique for computing the expected internal branch lengths of gene trees under the MSC. This technique allows us to determine the regions of the parameter space of the species tree under which concatenated parsimony fails for different numbers of taxa, for rooted or unrooted trees. We use our new approach to demonstrate that there are always regions of statistical inconsistency for concatenated parsimony for the 5- and 6-taxa cases, regardless of rooting. Our results therefore suggest that parsimony is not generally dependable under the MSC.
Submitted 4 July, 2024; v1 submitted 2 July, 2024;
originally announced July 2024.
Reply to Zhang et al.: Linear regression does not encapsulate the effect of non-pharmaceutical interventions on the number of COVID-19 cases
Authors:
Angeline G. Pendergrass,
Kristie L. Ebi,
Micah B. Hahn
Abstract:
Zhang et al. (2020) used linear regression to quantify the effect of lockdowns on the number of cases of COVID-19. Using differential equations from the susceptible-exposed-infected-recovered (SEIR) model and an example from another location not previously considered, we show that the Zhang et al. analysis should not be considered sound evidence that mask mandates are sufficient to control, or are the primary factor controlling, the spread of COVID-19.
Submitted 4 October, 2020; v1 submitted 1 July, 2020;
originally announced July 2020.
The sequencing and interpretation of the genome obtained from a Serbian individual
Authors:
Wazim Mohammed Ismail,
Kymberleigh A. Pagel,
Vikas Pejaver,
Simo V. Zhang,
Sofia Casasa,
Matthew Mort,
David N. Cooper,
Matthew W. Hahn,
Predrag Radivojac
Abstract:
Recent genetic studies and whole-genome sequencing projects have greatly improved our understanding of human variation and clinically actionable genetic information. Smaller ethnic populations, however, remain underrepresented in both individual and large-scale sequencing efforts and hence present an opportunity to discover new variants of biomedical and demographic significance. This report describes the sequencing and analysis of a genome obtained from an individual of Serbian origin, introducing tens of thousands of previously unknown variants to the currently available pool. Ancestry analysis places this individual in close proximity to the Central and Eastern European populations; i.e., closest to Croatian, Bulgarian and Hungarian individuals and, in terms of other Europeans, furthest from Ashkenazi Jewish, Spanish, Sicilian, and Baltic individuals. Our analysis confirmed gene flow between Neanderthal and ancestral pan-European populations, with similar contributions to the Serbian genome as those observed in other European groups. Finally, to assess the burden of potentially disease-causing/clinically relevant variation in the sequenced genome, we utilized manually curated genotype-phenotype association databases and variant-effect predictors. We identified several variants that have previously been associated with severe early-onset disease that is not evident in the proband, as well as variants that could yet prove to be clinically relevant to the proband over the next decades. The presence of numerous private and low-frequency variants along with the observed and predicted disease-causing mutations in this genome exemplifies some of the global challenges of genome interpretation, especially in the context of understudied ethnic groups.
Submitted 17 May, 2018;
originally announced May 2018.