-
Automated Data-Driven Discovery of Material Models Based on Symbolic Regression: A Case Study on Human Brain Cortex
Authors:
Jixin Hou,
Xianyan Chen,
Taotao Wu,
Ellen Kuhl,
Xianqiao Wang
Abstract:
We introduce a data-driven framework to automatically identify interpretable and physically meaningful hyperelastic constitutive models from sparse data. Leveraging symbolic regression, an algorithm based on genetic programming, our approach generates elegant hyperelastic models that achieve accurate data fitting through parsimonious mathematical formulae, while strictly adhering to hyperelasticity constraints such as polyconvexity. Our investigation spans three distinct hyperelastic models -- invariant-based, principal stretch-based, and normal strain-based -- and highlights the versatility of symbolic regression. We validate our new approach using synthetic data from five classic hyperelastic models and experimental data from the human brain to demonstrate algorithmic efficacy. Our results suggest that our symbolic regression robustly discovers accurate models with succinct mathematical expressions in invariant-based, stretch-based, and strain-based scenarios. Strikingly, the strain-based model exhibits superior accuracy, while both stretch- and strain-based models effectively capture the nonlinearity and tension-compression asymmetry inherent to human brain tissue. Polyconvexity examinations affirm the rigor of convexity within the training regime and demonstrate excellent extrapolation capabilities beyond this regime for all three models. However, the stretch-based models raise concerns regarding potential convexity loss under large deformations. Finally, robustness tests on noise-embedded data underscore the reliability of our symbolic regression algorithms. Our study confirms the applicability and accuracy of symbolic regression in the automated discovery of hyperelastic models for the human brain and gives rise to a wide variety of applications in other soft matter systems.
Submitted 7 February, 2024;
originally announced February 2024.
-
Guidelines in Wastewater-based Epidemiology of SARS-CoV-2 with Diagnosis
Authors:
Madiha Fatima,
Zhihua Cao,
Aichun Huang,
Shengyuan Wu,
Xinxian Fan,
Yi Wang,
Liu Jiren,
Ziyun Zhu,
Qiongrou Ye,
Yuan Ma,
Joseph K. F Chow,
Peng Jia,
Yangshou Liu,
Yubin Lin,
Manjun Ye,
Tong Wu,
Zhixun Li,
Cong Cai,
Wenhai Zhang,
Cheris H. Q. Ding,
Yuanzhe Cai,
Feijuan Huang
Abstract:
With the global spread and increasing transmission rate of SARS-CoV-2, more and more laboratories and researchers are turning their attention to wastewater-based epidemiology (WBE), hoping it can become an effective tool for large-scale testing and provide more accurate predictions of the number of infected individuals. Cases of sewage sampling and testing in regions such as Hong Kong, Brazil, and the United States indicate that detecting the novel coronavirus in sewage is highly feasible. This study reviews domestic and international achievements in detecting SARS-CoV-2 through WBE and summarizes four aspects of COVID-19, namely sampling methods, virus decay rate calculation, standardized population coverage of the watershed, and algorithm prediction. It also provides ideas for combining field modeling with epidemic prevention and control. Moreover, we highlight some diagnostic techniques for detecting the virus in sewage samples. Our review takes a new approach to identifying the research gaps in wastewater-based epidemiology and diagnosis, and we also predict the future prospects of our analysis.
Submitted 26 December, 2023;
originally announced January 2024.
-
Random Copolymer inverse design system orienting on Accurate discovering of Antimicrobial peptide-mimetic copolymers
Authors:
Tianyu Wu,
Yang Tang
Abstract:
Antimicrobial resistance is one of the biggest health problems, especially during the current COVID-19 pandemic. Owing to their unique membrane-destruction bactericidal mechanism, antimicrobial peptide-mimetic copolymers are receiving increasing attention, and it is urgent to find more potential candidates with broad-spectrum antibacterial efficacy and low toxicity. Artificial intelligence has shown significant performance on small-molecule and biotech drugs; however, the higher dimension of polymer space and the limited experimental data restrict the application of existing methods to copolymer design. Herein, we develop a universal random copolymer inverse design system via multi-modal copolymer representation learning, knowledge distillation, and reinforcement learning. Our system realizes high-precision antimicrobial activity prediction with few-shot data by extracting various chemical information from multi-modal copolymer representations. By pre-training a scaffold-decorator generative model via knowledge distillation, the copolymer space is greatly contracted to the neighborhood of existing data for exploration. Thus, our reinforcement learning algorithm can adapt to customized generation on specific scaffolds and to requirements on properties or structures. We apply our system to collected antimicrobial peptide-mimetic copolymer data and discover candidate copolymers with desired properties.
Submitted 7 December, 2022; v1 submitted 30 November, 2022;
originally announced December 2022.
-
Molecular Joint Representation Learning via Multi-modal Information
Authors:
Tianyu Wu,
Yang Tang,
Qiyu Sun,
Luolin Xiong
Abstract:
In recent years, artificial intelligence has played an important role in accelerating the whole process of drug discovery. Various molecular representation schemes of different modalities (e.g. textual sequence or graph) have been developed. By digitally encoding them, different chemical information can be learned through corresponding network structures. Molecular graphs and the Simplified Molecular Input Line Entry System (SMILES) are currently popular means for molecular representation learning. Previous works have attempted to combine both of them to solve the problem of specific information loss in single-modal representation on various tasks. To further fuse such multi-modal information, the correspondence between learned chemical features from different representations should be considered. To realize this, we propose a novel framework of molecular joint representation learning via Multi-Modal information of SMILES and molecular Graphs, called MMSG. We improve the self-attention mechanism by introducing bond-level graph representation as attention bias in the Transformer to reinforce feature correspondence between multi-modal information. We further propose a Bidirectional Message Communication Graph Neural Network (BMC GNN) to strengthen the information flow aggregated from graphs for further combination. Numerous experiments on public property prediction datasets have demonstrated the effectiveness of our model.
Submitted 25 November, 2022;
originally announced November 2022.
-
Learning biological neuronal networks with artificial neural networks: neural oscillations
Authors:
Ruilin Zhang,
Zhongyi Wang,
Tianyi Wu,
Yuhang Cai,
Louis Tao,
Zhuo-Cheng Xiao,
Yao Li
Abstract:
First-principles-based models have been extremely successful in providing crucial insights and predictions for complex biological functions and phenomena. However, they can be hard to build and expensive to simulate for complex living systems. On the other hand, modern data-driven methods excel at modeling many types of high-dimensional and noisy data. Still, the training and interpretation of these data-driven models remain challenging. Here, we combine the two types of methods to model stochastic neuronal network oscillations. Specifically, we develop a class of first-principles-based artificial neural networks to provide faithful surrogates to the high-dimensional, nonlinear oscillatory dynamics produced by neural circuits in the brain. Furthermore, when the training data set is enlarged within a range of parameter choices, the artificial neural networks become generalizable to these parameters, covering cases in distinctly different dynamical regimes. In all, our work opens a new avenue for modeling complex neuronal network dynamics with artificial neural networks.
Submitted 20 November, 2022;
originally announced November 2022.
-
SPRINT: A fast, new software tool for reconstructing the evolutionary past of polyploid datasets
Authors:
Liam J. Maher,
Taoyang Wu,
Katharina T. Huber
Abstract:
Polyploidization is an important evolutionary process which affects organisms ranging from plants to fish and fungi. The signal left behind by it is in the form of a species' ploidy level (number of complete chromosome sets found in a cell) which is inherently non-treelike. Currently available tools for reconstructing the evolutionary past of a polyploid dataset generally start with a multi-labelled tree obtained for a dataset of interest and then derive a (phylogenetic) network from that tree in some way that reflects that past by interpreting the network's vertices of indegree at least two as polyploidization events. Since obtaining such a tree can be computationally expensive, it is paramount to have alternative approaches available that allow one to shed light on the reticulate evolutionary past of a polyploid dataset. SPRINT aims to reconstruct the evolutionary past of a polyploid dataset in terms of a binary network which realises the dataset's ploidy profile (vector of ploidy levels of the dataset's taxa) and requires the minimum number of polyploidization events. It does this by representing the ploidy level of a species x in terms of the number of directed paths from the root of the network to the leaf of the network labelled by x. SPRINT is distributed on GitHub: https://github.com/lmaher1/SPRINT.
Submitted 9 November, 2022;
originally announced November 2022.
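The path-counting idea in the SPRINT abstract above (ploidy level of a species x as the number of directed root-to-leaf paths ending at x) can be illustrated with a short sketch. This is not SPRINT's code or its reconstruction algorithm; it only evaluates the ploidy profile that a given network realises. The function name `ploidy_profile` and the toy network are hypothetical and chosen for illustration.

```python
from collections import defaultdict
from functools import lru_cache


def ploidy_profile(edges, root):
    """Return {leaf: number of directed root-to-leaf paths} for a rooted DAG.

    `edges` is a list of (parent, child) pairs. Illustrative helper only,
    not part of SPRINT.
    """
    parents = defaultdict(list)
    has_child = set()
    nodes = {root}
    for u, v in edges:
        parents[v].append(u)
        has_child.add(u)
        nodes.update((u, v))

    @lru_cache(maxsize=None)
    def count_paths(node):
        # The root is reached by exactly one (empty) path; every other
        # vertex is reached by the sum of the paths reaching its parents.
        if node == root:
            return 1
        return sum(count_paths(p) for p in parents[node])

    leaves = sorted(n for n in nodes if n not in has_child)
    return {leaf: count_paths(leaf) for leaf in leaves}


# Hypothetical toy network: vertex "h" has indegree two, so it would be
# read as a single polyploidization event.
edges = [("r", "a"), ("r", "b"), ("a", "h"), ("b", "h"),
         ("a", "x"), ("h", "y"), ("b", "z")]
profile = ploidy_profile(edges, "r")
print(profile)
```

In this toy network, leaf y sits below the indegree-two vertex and is therefore reached by two directed paths, while x and z are each reached by one.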
-
Evolutionary games and spatial periodicity
Authors:
Te Wu,
Feng Fu,
Long Wang
Abstract:
We establish a theoretical framework to address evolutionary dynamics of spatial games under strong selection. As the selection intensity tends to infinity, strategy competition unfolds in the deterministic way of winners taking all. We rigorously prove that the evolutionary process sooner or later either enters a cycle and from then on repeats the cycle periodically, or stabilizes at some state almost everywhere. This conclusion holds for any population graph and a large class of finite games. This framework suffices to reveal the underlying mathematical rationale for the kaleidoscopic cooperation of Nowak and May's pioneering work on spatial games: a highly symmetric starting configuration causes a very long transient phase covering a large number of extremely beautiful spatial patterns. For all starting configurations, spatial patterns transit definitely over generations, so cooperators and defectors persist definitely. This framework can be extended to explore games including the snowdrift game, the public goods games (with or without loner, punishment), and repeated games on graphs. Aspiration dynamics can also be fully addressed when players deterministically switch strategy for unmet aspirations by virtue of our framework. Our results have potential implications for exploring the dynamics of a large variety of spatially extended systems in biology and physics.
Submitted 17 September, 2022;
originally announced September 2022.
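The eventual periodicity described in the abstract above can be observed numerically in a minimal sketch of deterministic winners-take-all dynamics. Everything specific here is an assumption made for illustration and is not taken from the paper: the ring-shaped population, the payoff values (1 for mutual cooperation, `b` for a defector meeting a cooperator, 0 otherwise), the tie-breaking rule, and the hypothetical function names `step` and `find_cycle`. Because the update is deterministic and the state space is finite, some state must eventually repeat, after which the trajectory cycles (a fixed point is a cycle of period 1).

```python
def step(state, b=1.6):
    """One synchronous imitate-the-best update on a ring (1 = C, 0 = D)."""
    n = len(state)
    payoff = []
    for i in range(n):
        total = 0.0
        for j in ((i - 1) % n, (i + 1) % n):
            if state[j] == 1:  # only cooperating neighbours yield payoff
                total += 1.0 if state[i] == 1 else b
        payoff.append(total)
    new_state = []
    for i in range(n):
        best = i  # assumed tie-break: keep own strategy, then left neighbour
        for j in ((i - 1) % n, (i + 1) % n):
            if payoff[j] > payoff[best]:
                best = j
        new_state.append(state[best])
    return tuple(new_state)


def find_cycle(state, b=1.6):
    """Return (transient length, period) of the deterministic trajectory."""
    seen = {}
    t = 0
    while state not in seen:
        seen[state] = t
        state = step(state, b)
        t += 1
    return seen[state], t - seen[state]


start = (1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1)
transient, period = find_cycle(start)
print(transient, period)
```

The dictionary of visited states grows with the number of distinct configurations encountered, so this brute-force detection is only practical for small populations.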
-
Multi-band oscillations emerge from a simple spiking network
Authors:
Tianyi Wu,
Yuhang Cai,
Ruilin Zhang,
Zhongyi Wang,
Louis Tao,
Zhuo-Cheng Xiao
Abstract:
In the brain, coherent neuronal activities often appear simultaneously in multiple frequency bands, e.g., as combinations of alpha (8-12 Hz), beta (12.5-30 Hz), gamma (30-120 Hz) oscillations, among others. These rhythms are believed to underlie information processing and cognitive functions and have been subjected to intense experimental and theoretical scrutiny. Computational modeling has provided a framework for the emergence of network-level oscillatory behavior from the interaction of spiking neurons. However, due to the strong nonlinear interactions between highly recurrent spiking populations, the interplay between cortical rhythms in multiple frequency bands has rarely been theoretically investigated. Many studies invoke multiple physiological timescales or oscillatory inputs to produce rhythms in multi-bands. Here we demonstrate the emergence of multi-band oscillations in a simple network consisting of one excitatory and one inhibitory neuronal population driven by constant input. First, we construct a data-driven, Poincaré section theory for robust numerical observations of single-frequency oscillations bifurcating into multiple bands. Then we develop model reductions of the stochastic, nonlinear, high-dimensional neuronal network to capture the appearance of multi-band dynamics and the underlying bifurcations theoretically. Furthermore, when viewed within the reduced state space, our analysis reveals conserved geometrical features of the bifurcations on low-dimensional dynamical manifolds. These results suggest a simple geometric mechanism behind the emergence of multi-band oscillations without appealing to oscillatory inputs or multiple synaptic or neuronal timescales. Thus our work points to unexplored regimes of stochastic competition between excitation and inhibition behind the generation of dynamic, patterned neuronal activities.
Submitted 29 June, 2022;
originally announced June 2022.
-
EGR: Equivariant Graph Refinement and Assessment of 3D Protein Complex Structures
Authors:
Alex Morehead,
Xiao Chen,
Tianqi Wu,
Jian Liu,
Jianlin Cheng
Abstract:
Protein complexes are macromolecules essential to the functioning and well-being of all living organisms. As the structure of a protein complex, in particular its region of interaction between multiple protein subunits (i.e., chains), has a notable influence on the biological function of the complex, computational methods that can quickly and effectively be used to refine and assess the quality of a protein complex's 3D structure can directly be used within a drug discovery pipeline to accelerate the development of new therapeutics and improve the efficacy of future vaccines. In this work, we introduce the Equivariant Graph Refiner (EGR), a novel E(3)-equivariant graph neural network (GNN) for multi-task structure refinement and assessment of protein complexes. Our experiments on new, diverse protein complex datasets, all of which we make publicly available in this work, demonstrate the state-of-the-art effectiveness of EGR for atomistic refinement and assessment of protein complexes and outline directions for future work in the field. In doing so, we establish a baseline for future studies in macromolecular refinement and structure analysis.
Submitted 24 May, 2022; v1 submitted 20 May, 2022;
originally announced May 2022.
-
Planar Rooted Phylogenetic Networks
Authors:
Vincent Moulton,
Taoyang Wu
Abstract:
A rooted phylogenetic network is a directed acyclic graph with a single root, whose sinks correspond to a set of species. As such networks are useful for representing the evolution of species that have undergone reticulate evolution, there has been great interest in developing the theory behind and algorithms for constructing them. However, unlike evolutionary trees, these networks can be highly non-planar, which can make them difficult to visualise and interpret. Here we investigate properties of planar rooted phylogenetic networks and algorithms for deciding whether or not rooted networks have certain special planarity properties. In particular, we introduce three natural subclasses of planar rooted phylogenetic networks and show that they form a hierarchy. In addition, for the well-known level-k networks, we show that level-1, -2, -3 networks are always outer, terminal, and upward planar, respectively, and that level-4 networks are not necessarily planar. Finally, we show that a regular network is terminal planar if and only if it is pyramidal. Our results make use of the highly developed field of planar digraphs, and we believe that the link between phylogenetic networks and planar graphs should prove useful in future for developing new approaches to both construct and visualise phylogenetic networks.
Submitted 27 June, 2022; v1 submitted 18 March, 2022;
originally announced March 2022.
-
Artificial Intelligence Enables Real-Time and Intuitive Control of Prostheses via Nerve Interface
Authors:
Diu Khue Luu,
Anh Tuan Nguyen,
Ming Jiang,
Markus W. Drealan,
Jian Xu,
Tong Wu,
Wing-kin Tam,
Wenfeng Zhao,
Brian Z. H. Lim,
Cynthia K. Overstreet,
Qi Zhao,
Jonathan Cheng,
Edward W. Keefer,
Zhi Yang
Abstract:
Objective: The next-generation prosthetic hand that moves and feels like a real hand requires a robust neural interconnection between human minds and machines. Methods: Here we present a neuroprosthetic system to demonstrate that principle by employing an artificial intelligence (AI) agent to translate the amputee's movement intent through a peripheral nerve interface. The AI agent is designed based on the recurrent neural network (RNN) and can simultaneously decode six degrees of freedom (DOF) from multichannel nerve data in real time. The decoder's performance is characterized in motor decoding experiments with three human amputees. Results: First, we show the AI agent enables amputees to intuitively control a prosthetic hand with individual finger and wrist movements up to 97-98% accuracy. Second, we demonstrate the AI agent's real-time performance by measuring the reaction time and information throughput in a hand gesture matching task. Third, we investigate the AI agent's long-term uses and show the decoder's robust predictive performance over a 16-month implant duration. Conclusion & significance: Our study demonstrates the potential of AI-enabled nerve technology, underlying the next generation of dexterous and intuitive prosthetic hands.
Submitted 16 March, 2022;
originally announced March 2022.
-
Distributions of cherries and pitchforks for the Ford model
Authors:
Gursharn Kaur,
Kwok Pui Choi,
Taoyang Wu
Abstract:
We study two fringe subtree counting statistics, the number of cherries and that of pitchforks for Ford's $α$ model, a one-parameter family of random phylogenetic tree models that includes the uniform and the Yule models, two tree models commonly used in phylogenetics. Based on a nonuniform version of the extended Pólya urn models in which negative entries are permitted for their replacement matrices, we obtain the strong law of large numbers and the central limit theorem for the joint distribution of these two count statistics for the Ford model. Furthermore, we derive a recursive formula for computing the exact joint distribution of these two statistics. This leads to exact formulas for their means and higher order asymptotic expansions of their second moments, which allows us to identify a critical parameter value for the correlation between these two statistics. That is, when $n$ is sufficiently large, they are negatively correlated for $0\le α\le 1/2$ and positively correlated for $1/2<α<1$.
Submitted 4 November, 2021; v1 submitted 6 October, 2021;
originally announced October 2021.
-
Evolutionary dynamics of zero-determinant strategies in repeated multiplayer games
Authors:
Fang Chen,
Te Wu,
Long Wang
Abstract:
Since Press and Dyson's ingenious discovery of the ZD (zero-determinant) strategy in the repeated Prisoner's Dilemma game, several studies have confirmed the existence of ZD strategies in repeated multiplayer social dilemmas. However, few studies have examined the evolutionary performance of multiplayer ZD strategies, especially from a theoretical perspective. Here, we use a newly proposed state-clustering method to theoretically analyze the evolutionary dynamics of two representative ZD strategies: generous ZD strategies and extortionate ZD strategies. Apart from the competitions between the two strategies and some classical strategies, we consider two new settings for multiplayer ZD strategies: competitions in the whole ZD strategy space and competitions in the space of all memory-1 strategies. In addition, we investigate the influence of the level of generosity and extortion on the evolutionary dynamics of generous and extortionate ZD strategies, which has commonly been ignored in previous studies. Theoretical results show that players with limited generosity are in an advantageous position and that extortioners extorting more severely hold their ground more readily. Our results may provide new insights into the evolutionary dynamics of ZD strategies in repeated multiplayer games.
Submitted 13 September, 2021;
originally announced September 2021.
-
Encoding and ordering X-cactuses
Authors:
Andrew Francis,
Katharina T. Huber,
Vincent Moulton,
Taoyang Wu
Abstract:
Phylogenetic networks are a generalization of evolutionary or phylogenetic trees that are commonly used to represent the evolution of species which cross with one another. A special type of phylogenetic network is an $X$-cactus, which is essentially a cactus graph in which all vertices with degree less than three are labelled by at least one element from a set $X$ of species. In this paper, we present a way to encode $X$-cactuses in terms of certain collections of partitions of $X$ that naturally arise from $X$-cactuses. Using this encoding, we also introduce a partial order on the set of $X$-cactuses (up to isomorphism), and derive some structural properties of the resulting partially ordered set. This includes an analysis of some properties of its least upper and greatest lower bounds. Our results not only extend some fundamental properties of phylogenetic trees to $X$-cactuses, but also provide a new approach to solving topical problems in phylogenetic network theory such as deriving consensus networks.
Submitted 7 September, 2021;
originally announced September 2021.
-
Modeling spatial waves of Wolbachia invasion for controlling mosquito-borne diseases
Authors:
Zhuolin Qu,
Tong Wu,
James Mac Hyman
Abstract:
Wolbachia is a natural bacterium that can infect mosquitoes and reduce their ability to transmit mosquito-borne diseases, such as dengue fever, Zika, and chikungunya. Field trials and modeling studies have shown that the fraction of infection among the mosquitoes must exceed a threshold level for the infection to persist. To capture this threshold, it is critical to consider the spatial heterogeneity in the distributions of the infected and uninfected mosquitoes, which is created by the local release of the infected mosquitoes. We develop and analyze partial differential equation (PDE) models to study the invasion dynamics of Wolbachia infection among mosquitoes in the field. Our reaction-diffusion-type models account for both the complex vertical transmission and the spatial mosquito dispersion. We characterize the threshold for a successful invasion, which is a bubble-shaped profile, called the "critical bubble". The critical bubble is optimal in its release size compared to other spatial profiles in a one-dimensional landscape. The fraction of infection near the release center is higher than the threshold level for the corresponding homogeneously mixing ODE models. We show that the proposed spatial models give rise to the traveling waves of Wolbachia infection when above the threshold. We quantify how the threshold condition and traveling-wave velocity depend on the diffusion coefficients and other model parameters. Numerical studies for different scenarios are presented to inform the design of release strategies.
Submitted 24 August, 2021;
originally announced August 2021.
-
State-clustering method of payoff computation in repeated multiplayer games
Authors:
Fang Chen,
Te Wu,
Guocheng Wang,
Long Wang
Abstract:
Direct reciprocity is a well-known mechanism that could explain how cooperation emerges and prevails in an evolving population. Numerous prior studies have examined the emergence of cooperation in multiplayer games. However, most of them use numerical or experimental methods, not theoretical analysis. This lack of theoretical work on the evolution of cooperation is due to the high complexity of calculating payoffs. In this paper, we propose a new method, namely the state-clustering method, to calculate the long-term payoffs in repeated games. Using this method, in an $n$-player repeated game, the computing complexity is reduced from $O(2^n)$ to $O(n^2)$, which makes it possible to compute a large-scale repeated game's payoff. We explore the evolution of cooperation in both infinitely and finitely repeated public goods games as an example to show the effectiveness of our method. In both cases, we find that when the synergy factor is sufficiently large, the increasing number of participants in a game is detrimental to the evolution of cooperation. Our work provides a theoretical approach to study the evolution of cooperation in repeated multiplayer games.
Submitted 24 August, 2021;
originally announced August 2021.
-
Model Reduction Captures Stochastic Gamma Oscillations on Low-Dimensional Manifolds
Authors:
Yuhang Cai,
Tianyi Wu,
Louis Tao,
Zhuo-Cheng Xiao
Abstract:
Gamma frequency oscillations (25-140 Hz), observed in the neural activities within many brain regions, have long been regarded as a physiological basis underlying many brain functions, such as memory and attention. Among numerous theoretical and computational modeling studies, gamma oscillations have been found in biologically realistic spiking network models of the primary visual cortex. However, due to their high dimensionality and strong nonlinearity, it is generally difficult to perform detailed theoretical analysis of the emergent gamma dynamics. Here we propose a suite of Markovian model reduction methods with varying levels of complexity and apply them to spiking network models exhibiting heterogeneous dynamical regimes, ranging from homogeneous firing to strong synchrony in the gamma band. The reduced models not only successfully reproduce gamma band oscillations in the full model, but also exhibit the same dynamical features as we vary parameters. Most remarkably, the invariant measure of the coarse-grained Markov process reveals a two-dimensional surface in state space upon which the gamma dynamics mainly resides. Our results suggest that the statistical features of gamma oscillations strongly depend on the subthreshold neuronal distributions. Because of the generality of the Markovian assumptions, our dimensional reduction methods offer a powerful toolbox for theoretical examinations of many other complex cortical spatio-temporal behaviors observed in both neurophysiological experiments and numerical simulations.
Submitted 22 January, 2021; v1 submitted 5 January, 2021;
originally announced January 2021.
-
On cherry and pitchfork distributions of random rooted and unrooted phylogenetic trees
Authors:
Kwok Pui Choi,
Ariadne Thompson,
Taoyang Wu
Abstract:
Tree shape statistics are important for investigating evolutionary mechanisms mediating phylogenetic trees. As a step towards bridging shape statistics between rooted and unrooted trees, we present a comparison study on two subtree statistics known as numbers of cherries and pitchforks for the proportional to distinguishable arrangements (PDA) and the Yule-Harding-Kingman (YHK) models. Based on recursive formulas on the joint distribution of the number of cherries and that of pitchforks, it is shown that cherry distributions are log-concave for both rooted and unrooted trees under these two models. Furthermore, the mean number of cherries and that of pitchforks for unrooted trees converge respectively to those for rooted trees under the YHK model while there exists a limiting gap of 1/4 for the PDA model. Finally, the total variation distances between the cherry distributions of rooted and those of unrooted trees converge for both models. Our results indicate that caution is required for conducting statistical analysis for tree shapes involving both rooted and unrooted trees.
Submitted 28 February, 2020;
originally announced February 2020.
-
ABCD Neurocognitive Prediction Challenge 2019: Predicting individual residual fluid intelligence scores from cortical grey matter morphology
Authors:
Neil P. Oxtoby,
Fabio S. Ferreira,
Agoston Mihalik,
Tong Wu,
Mikael Brudfors,
Hongxiang Lin,
Anita Rau,
Stefano B. Blumberg,
Maria Robu,
Cemre Zor,
Maira Tariq,
Maria Del Mar Estarellas Garcia,
Baris Kanber,
Daniil I. Nikitichev,
Janaina Mourao-Miranda
Abstract:
We predicted residual fluid intelligence scores from T1-weighted MRI data available as part of the ABCD NP Challenge 2019, using morphological similarity of grey-matter regions across the cortex. Individual structural covariance networks (SCN) were abstracted into graph-theory metrics averaged over nodes across the brain and in data-driven communities/modules. Metrics included degree, path length, clustering coefficient, centrality, rich club coefficient, and small-worldness. These features derived from the training set were used to build various regression models for predicting residual fluid intelligence scores, with performance evaluated both using cross-validation within the training set and using the held-out validation set. Our predictions on the test set were generated with a support vector regression model trained on the training set. We found minimal improvement over predicting a zero residual fluid intelligence score across the sample population, implying that structural covariance networks calculated from T1-weighted MR imaging data provide little information about residual fluid intelligence.
Submitted 26 May, 2019;
originally announced May 2019.
-
ABCD Neurocognitive Prediction Challenge 2019: Predicting individual fluid intelligence scores from structural MRI using probabilistic segmentation and kernel ridge regression
Authors:
Agoston Mihalik,
Mikael Brudfors,
Maria Robu,
Fabio S. Ferreira,
Hongxiang Lin,
Anita Rau,
Tong Wu,
Stefano B. Blumberg,
Baris Kanber,
Maira Tariq,
Maria Del Mar Estarellas Garcia,
Cemre Zor,
Daniil I. Nikitichev,
Janaina Mourao-Miranda,
Neil P. Oxtoby
Abstract:
We applied several regression and deep learning methods to predict fluid intelligence scores from T1-weighted MRI scans as part of the ABCD Neurocognitive Prediction Challenge (ABCD-NP-Challenge) 2019. We used voxel intensities and probabilistic tissue-type labels derived from these as features to train the models. The best predictive performance (lowest mean-squared error) came from Kernel Ridge Regression (KRR; $λ=10$), which produced a mean-squared error of 69.7204 on the validation set and 92.1298 on the test set. This placed our group in the fifth position on the validation leader board and first place on the final (test) leader board.
Submitted 26 May, 2019;
originally announced May 2019.
-
Deep Compressive Autoencoder for Action Potential Compression in Large-Scale Neural Recording
Authors:
Tong Wu,
Wenfeng Zhao,
Edward Keefer,
Zhi Yang
Abstract:
Understanding the coordinated activity underlying brain computations requires large-scale, simultaneous recordings from distributed neuronal structures at a cellular-level resolution. One major hurdle in designing high-bandwidth, high-precision, large-scale neural interfaces lies in the formidable data streams that are generated by the recorder chip and need to be transferred online to a remote computer. The data rates can require hundreds to thousands of I/O pads on the recorder chip and power consumption on the order of Watts for data streaming alone. We developed a deep learning-based compression model to reduce the data rate of multichannel action potentials. The proposed model is built upon a deep compressive autoencoder (CAE) with discrete latent embeddings. The encoder is equipped with residual transformations to extract representative features from spikes, which are mapped into the latent embedding space and updated via vector quantization (VQ). The decoder network reconstructs spike waveforms from the quantized latent embeddings. Experimental results show that the proposed model consistently outperforms conventional methods by achieving much higher compression ratios (20-500x) and better or comparable reconstruction accuracies. Testing results also indicate that CAE is robust against a diverse range of imperfections, such as waveform variation and spike misalignment, and has minor influence on spike sorting accuracy. Furthermore, we have estimated the hardware cost and real-time performance of CAE and shown that it could support thousands of recording channels simultaneously without excessive power/heat dissipation. The proposed model can reduce the required data transmission bandwidth in large-scale recording experiments and maintain good signal quality. The code of this work has been made available at https://github.com/tong-wu-umn/spike-compression-autoencoder
Submitted 16 September, 2018; v1 submitted 14 September, 2018;
originally announced September 2018.
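The vector quantization step mentioned in the abstract above maps a continuous latent vector to the nearest entry of a discrete codebook. A minimal sketch of that lookup is shown here, assuming squared Euclidean distance and a fixed codebook. It does not reproduce the paper's encoder, decoder, or codebook update, and the names `quantize` and `codebook` are hypothetical.

```python
def quantize(vector, codebook):
    """Return (index, codeword) of the codebook entry nearest to `vector`.

    Nearest is measured by squared Euclidean distance (an assumption for
    this sketch). Transmitting only `index` is what yields compression.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    index = min(range(len(codebook)), key=lambda i: sq_dist(vector, codebook[i]))
    return index, codebook[index]


if __name__ == "__main__":
    codebook = [(0.0, 0.0), (1.0, 1.0), (4.0, 4.0)]
    print(quantize((0.9, 1.2), codebook))
```

With a codebook of K entries, each latent vector is represented by a single integer in the range 0 to K-1, regardless of the vector's dimension.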
-
Phenotype affinity mediated interactions can facilitate the evolution of cooperation
Authors:
Te Wu,
Feng Fu,
Long Wang
Abstract:
We study the coevolutionary dynamics of the diversity of phenotype expression and the evolution of cooperation in the Prisoner's Dilemma game. Rather than pre-assigning a zero-or-one interaction rate, we diversify the rate of interaction by associating it with the phenotypes shared in common. Individuals each carry a set of potentially expressible phenotypes and express a certain number of phenotypes at a cost proportional to that number. The number of expressed phenotypes, and thus the rate of interaction, is an evolvable trait. Our results show that a nonnegligible cost of expressing phenotypes restrains phenotype expression, and the evolutionary race mainly proceeds between cooperative strains and defective strains that express very few phenotypes. It pays for cooperative strains to express very few phenotypes. Though such a low level of expression weakens reciprocity between cooperative strains, it decreases the rate of interaction between cooperative strains and defective strains to a larger degree, leading to the predominance of cooperative strains over defective strains. We also find that evolved diversity of phenotype expression can occasionally destabilize due to the invasion of defective mutants, implying that cooperation and diversity of phenotype expression can mutually reinforce each other. Therefore, our results provide new insights into the coevolution of cooperation and the diversity of phenotype expression.
Submitted 3 June, 2018;
originally announced June 2018.
-
Crowding induces entropically-driven changes to DNA dynamics that depend on crowder structure and ionic conditions
Authors:
Warren M. Mardoum,
Stephanie M. Gorczyca,
Kathryn E. Regan,
Tsai-Chin Wu,
Rae M. Robertson-Anderson
Abstract:
Macromolecular crowding plays a principal role in a wide range of biological processes including gene expression, chromosomal compaction, and viral infection. However, the impact that crowding has on the dynamics of nucleic acids remains a topic of debate. To address this problem, we use single-molecule fluorescence microscopy and custom particle-tracking algorithms to investigate the impact of varying macromolecular crowding conditions on the transport and conformational dynamics of large DNA molecules. Specifically, we measure the mean-squared center-of-mass displacements, as well as the conformational size, shape, and fluctuations, of individual 115 kbp DNA molecules diffusing through various in vitro solutions of crowding polymers. We determine the role of crowder structure and concentration, as well as ionic conditions, on the diffusion and configurational dynamics of DNA. We find that branched, compact crowders (10 kDa PEG, 420 kDa Ficoll) drive DNA to compact, whereas linear, flexible crowders (10 kDa, 500 kDa dextran) cause DNA to elongate. Interestingly, the extent to which DNA mobility is reduced by increasing crowder concentrations appears largely insensitive to crowder structure (branched vs linear), despite the highly different configurations DNA assumes in each case. We also characterize the role of ionic conditions on crowding-induced DNA dynamics. We show that both DNA diffusion and conformational size exhibit an emergent non-monotonic dependence on salt concentration that is not seen in the absence of crowders.
Submitted 27 March, 2018;
originally announced March 2018.
-
Quarnet inference rules for level-1 networks
Authors:
Katharine T. Huber,
Vincent Moulton,
Charles Semple,
Taoyang Wu
Abstract:
An important problem in phylogenetics is the construction of phylogenetic trees. One way to approach this problem, known as the supertree method, involves inferring a phylogenetic tree with leaves consisting of a set $X$ of species from a collection of trees, each having leaf-set some subset of $X$. In the 1980s, characterizations in terms of certain inference rules were given for when a collection of 4-leaved trees, one for each 4-element subset of $X$, can all be simultaneously displayed by a single supertree with leaf-set $X$. Recently, it has become of interest to extend such results to phylogenetic networks. These are a generalization of phylogenetic trees which can be used to represent reticulate evolution (where species can come together to form a new species). It has been shown that a certain type of phylogenetic network, called a level-1 network, can essentially be constructed from 4-leaved trees. However, the problem of providing appropriate inference rules for such networks remains unresolved. Here we show that by considering 4-leaved networks, called quarnets, as opposed to 4-leaved trees, it is possible to provide such rules. In particular, we show that these rules can be used to characterize when a collection of quarnets, one for each 4-element subset of $X$, can all be simultaneously displayed by a level-1 network with leaf-set $X$. The rules are an intriguing mixture of tree inference rules and an inference rule for building up a cyclic ordering of $X$ from orderings on subsets of $X$ of size 4. This opens up several new directions of research for inferring phylogenetic networks from smaller ones, which could yield new algorithms for solving the supernetwork problem in phylogenetics.
Submitted 17 November, 2017;
originally announced November 2017.
-
UPGMA and the normalized equidistant minimum evolution problem
Authors:
Vincent Moulton,
Andreas Spillner,
Taoyang Wu
Abstract:
UPGMA (Unweighted Pair Group Method with Arithmetic Mean) is a widely used clustering method. Here we show that UPGMA is a greedy heuristic for the normalized equidistant minimum evolution (NEME) problem, that is, finding a rooted tree that minimizes the minimum evolution score relative to the dissimilarity matrix among all rooted trees with the same leaf-set in which all leaves have the same distance to the root. We prove that the NEME problem is NP-hard. In addition, we present some heuristic and approximation algorithms for solving the NEME problem, including a polynomial time algorithm that yields a binary, rooted tree whose NEME score is within O(log^2 n) of the optimum. We expect these results to eventually provide further insights into the behavior of the UPGMA algorithm.
Submitted 3 April, 2017;
originally announced April 2017.
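As background for the entry above, the classical UPGMA procedure can be sketched in a few lines: repeatedly merge the two closest clusters and define the distance from the merged cluster to every other cluster as the size-weighted average of the two old distances. This is a plain textbook sketch, not the NEME heuristics or approximation algorithms of the paper. The input format (a dict keyed by `frozenset` pairs) and all names are assumptions made for illustration.

```python
from itertools import combinations


def upgma(labels, dist):
    """Build a rooted tree by UPGMA.

    labels: list of distinct leaf names.
    dist:   dict mapping frozenset({a, b}) to the dissimilarity of a and b.
    Returns (tree, height): tree is a nested tuple of leaf names and
    height is the distance from the root to every leaf.
    """
    sizes = {label: 1 for label in labels}  # active clusters and their sizes
    d = dict(dist)                          # working copy of the distances
    height = {label: 0.0 for label in labels}
    while len(sizes) > 1:
        # Pick the closest pair of active clusters.
        a, b = min(combinations(sizes, 2), key=lambda pair: d[frozenset(pair)])
        merged = (a, b)
        height[merged] = d[frozenset((a, b))] / 2
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        # Size-weighted arithmetic mean gives the new inter-cluster distances.
        for c in sizes:
            d[frozenset((merged, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    (root,) = sizes
    return root, height[root]


if __name__ == "__main__":
    pairwise = {
        frozenset(("A", "B")): 2.0,
        frozenset(("A", "C")): 6.0,
        frozenset(("B", "C")): 6.0,
    }
    print(upgma(["A", "B", "C"], pairwise))
```

Ties between equally close pairs are broken by iteration order here; a production implementation would need an explicit tie-breaking rule.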
-
Treewidth distance on phylogenetic trees
Authors:
Steven Kelk,
Georgios Stamoulis,
Taoyang Wu
Abstract:
In this article we study the treewidth of the \emph{display graph}, an auxiliary graph structure obtained from the fusion of phylogenetic (i.e., evolutionary) trees at their leaves. Earlier work has shown that the treewidth of the display graph is bounded if the trees are in some formal sense topologically similar. Here we further expand upon this relationship. We analyse a number of reduction rules which are commonly used in the phylogenetics literature to obtain fixed parameter tractable algorithms. In some cases (the \emph{subtree} reduction) the reduction rules behave similarly with respect to treewidth, while others (the \emph{cluster} reduction) behave very differently, and the behaviour of the \emph{chain reduction} is particularly intriguing because of its link with graph separators and forbidden minors. We also show that the gap between treewidth and Tree Bisection and Reconnect (TBR) distance can be infinitely large, and that unlike, for example, planar graphs the treewidth of the display graph can be as much as linear in its number of vertices. On a slightly different note we show that if a display graph is formed from the fusion of a phylogenetic network and a tree, rather than from two trees, the treewidth of the display graph is bounded whenever the tree can be topologically embedded ("displayed") within the network. This opens the door to the formulation of the display problem in Monadic Second Order Logic (MSOL). A number of other auxiliary results are given. We conclude with a discussion and list a number of open problems.
Submitted 31 March, 2017;
originally announced March 2017.
-
A cubic-time algorithm for computing the trinet distance between level-1 networks
Authors:
Vincent Moulton,
James Oldman,
Taoyang Wu
Abstract:
In evolutionary biology, phylogenetic networks are constructed to represent the evolution of species in which reticulate events are thought to have occurred, such as recombination and hybridization. It is therefore useful to have efficiently computable metrics with which to systematically compare such networks. Through developing an optimal algorithm to enumerate all trinets displayed by a level-1 network (a type of network that is slightly more general than an evolutionary tree), here we propose a cubic-time algorithm to compute the trinet distance between two level-1 networks. Employing simulations, we also present a comparison between the trinet metric and the so-called Robinson-Foulds phylogenetic network metric restricted to level-1 networks. The algorithms described in this paper have been implemented in JAVA and are freely available at https://www.uea.ac.uk/computing/TriLoNet.
Submitted 15 March, 2017;
originally announced March 2017.
-
Bounds for phylogenetic network space metrics
Authors:
Andrew Francis,
Katharina Huber,
Vincent Moulton,
Taoyang Wu
Abstract:
Phylogenetic networks are a generalization of phylogenetic trees that allow for representation of reticulate evolution. Recently, a space of unrooted phylogenetic networks was introduced, where such a network is a connected graph in which every vertex has degree 1 or 3 and whose leaf-set is a fixed set $X$ of taxa. This space, denoted $\mathcal{N}(X)$, is defined in terms of two operations on networks -- the nearest neighbor interchange and triangle operations -- which can be used to transform any network with leaf set $X$ into any other network with that leaf set. In particular, it gives rise to a metric $d$ on $\mathcal N(X)$ which is given by the smallest number of operations required to transform one network in $\mathcal N(X)$ into another in $\mathcal N(X)$. The metric generalizes the well-known NNI-metric on phylogenetic trees which has been intensively studied in the literature. In this paper, we derive a bound for the metric $d$ as well as a related metric $d_{N\!N\!I}$ which arises when restricting $d$ to the subset of $\mathcal{N}(X)$ consisting of all networks with $2(|X|-1+i)$ vertices, $i \ge 1$. We also introduce two new metrics on networks -- the SPR and TBR metrics -- which generalize the metrics on phylogenetic trees with the same name and give bounds for these new metrics. We expect our results to eventually have applications to the development and understanding of network search algorithms.
Submitted 8 March, 2017; v1 submitted 18 February, 2017;
originally announced February 2017.
-
Binets: fundamental building blocks for phylogenetic networks
Authors:
Leo van Iersel,
Vincent Moulton,
Eveline de Swart,
Taoyang Wu
Abstract:
Phylogenetic networks are a generalization of evolutionary trees that are used by biologists to represent the evolution of organisms which have undergone reticulate evolution. Essentially, a phylogenetic network is a directed acyclic graph having a unique root in which the leaves are labelled by a given set of species. Recently, some approaches have been developed to construct phylogenetic networks from collections of 2- and 3-leaved networks, which are known as binets and trinets, respectively. Here we study in more depth properties of collections of binets, one of the simplest possible types of networks into which a phylogenetic network can be decomposed. More specifically, we show that if a collection of level-1 binets is compatible with some binary network, then it is also compatible with a binary level-1 network. Our proofs are based on useful structural results concerning lowest stable ancestors in networks. In addition, we show that, although the binets do not determine the topology of the network, they do determine the number of reticulations in the network, which is one of its most important parameters. We also consider algorithmic questions concerning binets. We show that deciding whether an arbitrary set of binets is compatible with some network is at least as hard as the well-known Graph Isomorphism problem. However, if we restrict to level-1 binets, it is possible to decide in polynomial time whether there exists a binary network that displays all the binets. We also show that finding a network that displays a maximum number of the binets is NP-hard, but that there exists a simple polynomial-time 1/3-approximation algorithm for this problem. It is hoped that these results will eventually assist in the development of new methods for constructing phylogenetic networks from collections of smaller networks.
Submitted 31 January, 2017;
originally announced January 2017.
-
Exploring the reproducibility of functional connectivity alterations in Parkinson's Disease
Authors:
Liviu Badea,
Mihaela Onu,
Tao Wu,
Adina Roceanu,
Ovidiu Bajenaru
Abstract:
Since anatomic MRI is presently not able to directly discern neuronal loss in Parkinson's Disease (PD), studying the associated functional connectivity (FC) changes seems a promising approach toward developing non-invasive and non-radioactive neuroimaging markers for this disease. While several groups have reported such FC changes in PD, there are also significant discrepancies between studies. Investigating the reproducibility of PD-related FC changes on independent datasets is therefore of crucial importance. We acquired resting-state fMRI scans for 43 subjects (27 patients, 16 controls) and compared the observed FC changes with those obtained in 2 independent datasets, one made available by the PPMI consortium and a second one by the group of Tao Wu. Unfortunately, PD-related functional connectivity changes turned out to be non-reproducible across datasets. This could be due to disease heterogeneity, but also to technical differences. To distinguish between the two, we devised a method to directly check for disease heterogeneity using random splits of a single dataset. Since we still observe non-reproducibility in a large fraction of random splits of the same dataset, we conclude that functional heterogeneity may be a dominating factor behind the lack of reproducibility of FC alterations in different rs-fMRI studies of PD. While global PD-related functional connectivity changes were non-reproducible across datasets, we identified a few individual brain region pairs with marginally consistent FC changes across all three datasets. However, training classifiers on each one of the 3 datasets to discriminate PD scans from controls produced only low accuracies on the remaining two test datasets. Moreover, classifiers trained and tested on random splits of the same dataset (which are technically homogeneous) also had low test accuracies, directly substantiating disease heterogeneity.
Submitted 1 September, 2017; v1 submitted 15 November, 2016;
originally announced November 2016.
-
Transforming phylogenetic networks: Moving beyond tree space
Authors:
Katharina T. Huber,
Vincent Moulton,
Taoyang Wu
Abstract:
Phylogenetic networks are a generalization of phylogenetic trees that are used to represent reticulate evolution. Unrooted phylogenetic networks form a special class of such networks, which naturally generalize unrooted phylogenetic trees. In this paper we define two operations on unrooted phylogenetic networks, one of which is a generalization of the well-known nearest-neighbor interchange (NNI) operation on phylogenetic trees. We show that any unrooted phylogenetic network can be transformed into any other such network using only these operations. This generalizes the well-known fact that any phylogenetic tree can be transformed into any other such tree using only NNI operations. It also allows us to define a generalization of tree space and to define some new metrics on unrooted phylogenetic networks. To prove our main results, we employ some fascinating new connections between phylogenetic networks and cubic graphs that we have recently discovered. Our results should be useful in developing new strategies to search for optimal phylogenetic networks, a topic that has recently generated some interest in the literature, as well as for providing new ways to compare networks.
Submitted 8 January, 2016;
originally announced January 2016.
-
Reduction rules for the maximum parsimony distance on phylogenetic trees
Authors:
Steven Kelk,
Mareike Fischer,
Vincent Moulton,
Taoyang Wu
Abstract:
In phylogenetics, distances are often used to measure the incongruence between a pair of phylogenetic trees that are reconstructed by different methods or using different regions of the genome. Motivated by the maximum parsimony principle in tree inference, we recently introduced the maximum parsimony (MP) distance, which enjoys various attractive properties due to its connection with several other well-known tree distances, such as TBR and SPR. Here we show that computing the MP distance between two trees, an NP-hard problem in general, is fixed parameter tractable in terms of the TBR distance between the tree pair. Our approach is based on two reduction rules--the chain reduction and the subtree reduction--that are widely used in computing TBR and SPR distances. More precisely, we show that reducing chains to length 4 (but not shorter) preserves the MP distance. In addition, we describe a generalization of the subtree reduction which allows the pendant subtrees to be rooted in different places, and show that this still preserves the MP distance. On a slightly different note we also show that Monadic Second Order Logic (MSOL), posited over an auxiliary graph structure known as the display graph (obtained by merging the two trees at their leaves), can be used to obtain an alternative proof that computation of MP distance is fixed parameter tractable in terms of TBR-distance. We conclude with an extended discussion in which we focus on similarities and differences between MP distance and TBR distance and present a number of open problems. One particularly intriguing question, emerging from the MSOL formulation, is whether two trees with bounded MP distance induce display graphs of bounded treewidth.
Submitted 7 July, 2016; v1 submitted 23 December, 2015;
originally announced December 2015.
-
On joint subtree distributions under two evolutionary models
Authors:
Taoyang Wu,
Kwok Pui Choi
Abstract:
In population and evolutionary biology, hypotheses about micro-evolutionary and macro-evolutionary processes are commonly tested by comparing the shape indices of empirical evolutionary trees with those predicted by neutral models. A key ingredient in this approach is the ability to compute and quantify distributions of various tree shape indices under random models of interest. As a step to meet this challenge, in this paper we investigate the joint distribution of cherries and pitchforks (that is, subtrees with two and three leaves) under two widely used null models: the Yule-Harding-Kingman (YHK) model and the proportional to distinguishable arrangements (PDA) model. Based on two novel recursive formulae, we propose a dynamic approach to numerically compute the exact joint distribution (and hence the marginal distributions) for trees of any size. We also obtain insights into the statistical properties of trees generated under these two models, including a constant correlation between the cherry and the pitchfork distributions under the YHK model, and the log-concavity and unimodality of cherry distributions under both models. In particular, we show the existence of a unique change point for the cherry distribution between the two models, that is, there exists a critical value $τ_n$ for each $n\geq 4$ such that the probability that a random tree with $n$ leaves generated under the YHK model contains $k$ cherries is lower than that under the PDA model if $1<k< τ_n$, and higher if $τ_n<k\le n/2$.
Submitted 13 August, 2015;
originally announced August 2015.
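As a rough illustration of the dynamic approach mentioned in this abstract, the sketch below computes only the marginal cherry distribution under the YHK model. It does not reproduce the paper's joint recursion for cherries and pitchforks. It assumes the standard leaf-splitting description of YHK trees: a tree with m leaves grows by splitting a uniformly chosen leaf, which leaves the cherry count unchanged if that leaf lies in a cherry (probability 2k/m) and raises it by one otherwise. The function name `yhk_cherry_distribution` is hypothetical.

```python
from fractions import Fraction


def yhk_cherry_distribution(n):
    """Exact distribution of the number of cherries in a rooted YHK tree with n leaves.

    Returns a dict mapping cherry count k to its probability (as a Fraction).
    Assumption: trees grow by splitting a uniformly chosen leaf.
    """
    if n < 2:
        raise ValueError("need at least two leaves")
    dist = {1: Fraction(1)}  # a two-leaf tree is a single cherry
    for m in range(2, n):  # grow from m leaves to m + 1 leaves
        nxt = {}
        for k, p in dist.items():
            stay = Fraction(2 * k, m)  # chosen leaf belongs to one of the k cherries
            grow = 1 - stay            # chosen leaf is outside every cherry
            if stay:
                nxt[k] = nxt.get(k, 0) + p * stay
            if grow:
                nxt[k + 1] = nxt.get(k + 1, 0) + p * grow
        dist = nxt
    return dist


if __name__ == "__main__":
    for k, p in sorted(yhk_cherry_distribution(6).items()):
        print(k, p)
```

Exact rational arithmetic is used so that properties such as unimodality can be checked without rounding error; for large n a floating-point variant would be faster.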
-
Folding and unfolding phylogenetic trees and networks
Authors:
Katharina T. Huber,
Vincent Moulton,
Mike Steel,
Taoyang Wu
Abstract:
Phylogenetic networks are rooted, labelled directed acyclic graphs which are commonly used to represent reticulate evolution. There is a close relationship between phylogenetic networks and multi-labelled trees (MUL-trees). Indeed, any phylogenetic network $N$ can be 'unfolded' to obtain a MUL-tree $U(N)$ and, conversely, a MUL-tree $T$ can in certain circumstances be 'folded' to obtain a phylogenetic network $F(T)$ that exhibits $T$. In this paper, we study properties of the operations $U$ and $F$ in more detail. In particular, we introduce the class of stable networks, phylogenetic networks $N$ for which $F(U(N))$ is isomorphic to $N$, characterise such networks, and show that they are related to the well-known class of tree-sibling networks. We also explore how the concept of displaying a tree in a network $N$ can be related to displaying the tree in the MUL-tree $U(N)$. To do this, we develop a phylogenetic analogue of graph fibrations. This allows us to view $U(N)$ as the analogue of the universal cover of a digraph, and to establish a close connection between displaying trees in $U(N)$ and reconciling phylogenetic trees with networks.
Submitted 14 June, 2015;
originally announced June 2015.
-
Reconstructing phylogenetic level-1 networks from nondense binet and trinet sets
Authors:
Katharina Huber,
Leo van Iersel,
Vincent Moulton,
Celine Scornavacca,
Taoyang Wu
Abstract:
Binets and trinets are phylogenetic networks with two and three leaves, respectively. Here we consider the problem of deciding if there exists a binary level-1 phylogenetic network displaying a given set $\mathcal{T}$ of binary binets or trinets over a set $X$ of taxa, and constructing such a network whenever it exists. We show that this is NP-hard for trinets but polynomial-time solvable for binets. Moreover, we show that the problem is still polynomial-time solvable for inputs consisting of binets and trinets as long as the cycles in the trinets have size three. Finally, we present an $O(3^{|X|} poly(|X|))$ time algorithm for general sets of binets and trinets. The latter two algorithms generalise to instances containing level-1 networks with arbitrarily many leaves, and thus provide some of the first supernetwork algorithms for computing networks from a set of rooted phylogenetic networks.
Submitted 25 November, 2014;
originally announced November 2014.
-
Clades and clans: a comparison study of two evolutionary models
Authors:
Sha Zhu,
Cuong Than,
Taoyang Wu
Abstract:
The Yule-Harding-Kingman (YHK) model and the proportional to distinguishable arrangements (PDA) model are two binary tree generating models that are widely used in evolutionary biology. Understanding the distributions of clade sizes under these two models provides valuable insights into macro-evolutionary processes, and is important in hypothesis testing and Bayesian analyses in phylogenetics. Here we show that these distributions are log-convex, which implies that very large clades or very small clades are more likely to occur under these two models. Moreover, we prove that there exists a critical value $κ(n)$ for each $n\geqslant 4$ such that for a given clade with size $k$, the probability that this clade is contained in a random tree with $n$ leaves generated under the YHK model is higher than that under the PDA model if $1<k<κ(n)$, and lower if $κ(n)<k<n$. Finally, we extend our results to binary unrooted trees, and obtain similar results for the distributions of clan sizes.
Submitted 15 July, 2014;
originally announced July 2014.
-
Representing Partitions on Trees
Authors:
Katharina T. Huber,
Vincent Moulton,
Charles Semple,
Taoyang Wu
Abstract:
In evolutionary biology, biologists often face the problem of constructing a phylogenetic tree on a set $X$ of species from a multiset $Π$ of partitions corresponding to various attributes of these species. One approach that is used to solve this problem is to try instead to associate a tree (or even a network) to the multiset $Σ_Π$ consisting of all those bipartitions $\{A,X-A\}$ with $A$ a part of some partition in $Π$. The rationale behind this approach is that a phylogenetic tree with leaf set $X$ can be uniquely represented by the set of bipartitions of $X$ induced by its edges. Motivated by these considerations, given a multiset $Σ$ of bipartitions corresponding to a phylogenetic tree on $X$, in this paper we introduce and study the set $P(Σ)$ consisting of those multisets of partitions $Π$ of $X$ with $Σ_Π=Σ$. More specifically, we characterize when $P(Σ)$ is non-empty, and also identify some partitions in $P(Σ)$ that are of maximum and minimum size. We also show that it is NP-complete to decide when $P(Σ)$ is non-empty in case $Σ$ is an arbitrary multiset of bipartitions of $X$. Ultimately, we hope that by gaining a better understanding of the mapping that takes an arbitrary partition system $Π$ to the multiset $Σ_Π$, we will obtain new insights into the use of median networks and, more generally, split-networks to visualize sets of partitions.
Submitted 9 May, 2014;
originally announced May 2014.
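The mapping from a partition system $Π$ to the multiset $Σ_Π$ described in this abstract can be sketched as follows. This is an illustrative reading, not the paper's code: the function name `bipartition_multiset` is hypothetical, and the sketch assumes that each part $A$ contributes one copy of $\{A,X-A\}$, so a two-part partition contributes the same bipartition twice. The paper's multiplicity convention may differ.

```python
from collections import Counter


def bipartition_multiset(X, partitions):
    """Build the multiset of bipartitions {A, X - A} over all parts A of all partitions.

    X is an iterable of taxa; partitions is an iterable of partitions,
    each given as an iterable of disjoint parts covering X.
    Assumption: one copy of the bipartition is counted per part.
    """
    X = frozenset(X)
    sigma = Counter()
    for partition in partitions:
        for part in partition:
            A = frozenset(part)
            # A bipartition is unordered, so store it as a set of its two sides.
            sigma[frozenset({A, X - A})] += 1
    return sigma


if __name__ == "__main__":
    X = {1, 2, 3, 4}
    Pi = [[{1}, {2}, {3, 4}], [{1, 2}, {3, 4}]]
    for split, count in bipartition_multiset(X, Pi).items():
        print(sorted(map(sorted, split)), count)
```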
-
Compression of structured high-throughput sequencing data
Authors:
Fabien Campagne,
Kevin C. Dorff,
Nyasha Chambwe,
James T. Robinson,
Jill P. Mesirov,
Thomas D. Wu
Abstract:
Large biological datasets are being produced at a rapid pace and create substantial storage challenges, particularly in the domain of high-throughput sequencing (HTS). Most approaches currently used to store HTS data are either unable to quickly adapt to the requirements of new sequencing or analysis methods (because they do not support schema evolution), or fail to provide state-of-the-art compression of the datasets. We have devised new approaches to store HTS data that support seamless data schema evolution and compress datasets substantially better than existing approaches. Building on these new approaches, we discuss and demonstrate how a multi-tier data organization can dramatically reduce the storage, computational and network burden of collecting, analyzing, and archiving large sequencing datasets. For instance, we show that spliced RNA-Seq alignments can be stored in less than 4% of the size of a BAM file with perfect data fidelity. Compared to the previous compression state of the art, these methods reduce dataset size by more than 20% when storing gene expression and epigenetic datasets. The approaches have been integrated in a comprehensive suite of software tools (http://goby.campagnelab.org) that support common analyses for a range of high-throughput sequencing assays.
Submitted 28 November, 2012;
originally announced November 2012.
-
Reconstruction of Network Evolutionary History from Extant Network Topology and Duplication History
Authors:
Si Li,
Kwok Pui Choi,
Taoyang Wu,
Louxin Zhang
Abstract:
Genome-wide protein-protein interaction (PPI) data are readily available thanks to recent breakthroughs in biotechnology. However, PPI networks of extant organisms are only snapshots of the network evolution. Inferring the whole evolutionary history is a challenging problem in computational biology. In this paper, we present a likelihood-based approach to inferring network evolution history from the topology of PPI networks and the duplication relationship among the paralogs. Simulations show that our approach outperforms the existing ones in terms of the accuracy of reconstruction. Moreover, the growth parameters of several real PPI networks estimated by our method are more consistent with the ones predicted in the literature.
Submitted 12 March, 2012;
originally announced March 2012.
-
On Patchworks and Hierarchies
Authors:
Andreas Dress,
Vincent Moulton,
Mike Steel,
Taoyang Wu
Abstract:
Motivated by questions in biological classification, we discuss some elementary combinatorial and computational properties of certain set systems that generalize hierarchies, namely, 'patchworks', 'weak patchworks', 'ample patchworks' and 'saturated patchworks' and also outline how these concepts relate to an apparently new 'duality theory' for cluster systems that is based on the fundamental concept of 'compatibility' of clusters.
Submitted 11 February, 2012;
originally announced February 2012.
-
On the neighbourhoods of trees
Authors:
Peter J. Humphries,
Taoyang Wu
Abstract:
Tree rearrangement operations typically induce a metric on the space of phylogenetic trees. One important property of these metrics is the size of the neighbourhood, that is, the number of trees exactly one operation from a given tree. We present an expression for the size of the TBR (tree bisection and reconnection) neighbourhood, thus answering a question first posed in [Annals of Combinatorics, 5, 2001 1-15].
Submitted 10 February, 2012;
originally announced February 2012.
-
Reconciliation of Gene and Species Trees With Polytomies
Authors:
Yu Zheng,
Taoyang Wu,
Louxin Zhang
Abstract:
Motivation: Millions of genes in the modern species belong to only thousands of 'gene families'. A gene family includes instances of the same gene in different species (orthologs) and duplicate genes in the same species (paralogs). Genes are gained and lost during evolution. With advances in sequencing technology, researchers are able to investigate the important roles of gene duplications and losses in adaptive evolution. Because of complex gene evolution, ortholog identification is a basic but difficult task in comparative genomics. A key method for the task is to use an explicit model of the evolutionary history of the genes being studied, called the gene (family) tree. It compares the gene tree with the evolutionary history of the species in which the genes reside, called the species tree, using the procedure known as tree reconciliation. Reconciling binary gene and species trees is simple. However, both gene and species trees may be non-binary in practice and thus tree reconciliation presents challenging problems. Here, non-binary gene and species tree reconciliation is studied in a binary refinement model.
Results: The problem of reconciling arbitrary gene and species trees is proved NP-hard even for the duplication cost. We then present the first efficient method for reconciling a non-binary gene tree and a non-binary species tree. It attempts to find binary refinements of the given gene and species trees that minimize reconciliation cost. Our algorithms have been implemented in software to support quick automated analysis of large data sets.
Availability: The program, together with the source code, is available at its online server http://phylotoo.appspot.com.
Submitted 2 May, 2012; v1 submitted 19 January, 2012;
originally announced January 2012.
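For the binary case that the abstract calls simple, the duplication cost is commonly obtained by lowest-common-ancestor (LCA) reconciliation. The sketch below illustrates that standard technique only; it is not the paper's refinement algorithm for non-binary trees. Trees are nested tuples with string leaves, and the names `clusters`, `count_duplications` and `species_of` are hypothetical helpers introduced for this example.

```python
def clusters(tree):
    """Return the leaf set (as a frozenset) of every node of a nested-tuple tree."""
    out = []

    def walk(node):
        if isinstance(node, tuple):
            leaves = frozenset().union(*(walk(child) for child in node))
        else:
            leaves = frozenset([node])
        out.append(leaves)
        return leaves

    walk(tree)
    return out


def count_duplications(gene_tree, species_tree, species_of=lambda gene: gene):
    """Count duplication nodes under the LCA mapping of gene_tree into species_tree.

    species_of maps a gene leaf label to the label of its species.
    Each species-tree node is identified by its cluster of species.
    """
    species_clusters = clusters(species_tree)

    def lca(species_set):
        # The LCA is the smallest species cluster containing the given set.
        return min((c for c in species_clusters if species_set <= c), key=len)

    duplications = 0

    def walk(node):
        nonlocal duplications
        if not isinstance(node, tuple):
            return lca(frozenset([species_of(node)]))
        child_images = [walk(child) for child in node]
        image = lca(frozenset().union(*child_images))
        # A gene node is a duplication if it maps to the same species node as a child.
        if image in child_images:
            duplications += 1
        return image

    walk(gene_tree)
    return duplications


if __name__ == "__main__":
    species = (("A", "B"), "C")
    genes = (("A1", "B1"), ("A2", "C1"))
    print(count_duplications(genes, species, species_of=lambda g: g[0]))
```

Representing species nodes by clusters keeps the sketch short at the cost of a linear scan per LCA query; a production implementation would use a faster LCA structure.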
-
Species, Clusters and the 'Tree of Life': A graph-theoretic perspective
Authors:
Andreas Dress,
Vincent Moulton,
Mike Steel,
Taoyang Wu
Abstract:
A hierarchical structure describing the inter-relationships of species has long been a fundamental concept in systematic biology, from Linnean classification through to the more recent quest for a 'Tree of Life.' In this paper we use an approach based on discrete mathematics to address a basic question: Could one delineate this hierarchical structure in nature purely by reference to the 'genealogy' of present-day individuals, which describes how they are related with one another by ancestry through a continuous line of descent? We describe several mathematically precise ways by which one can naturally define collections of subsets of present day individuals so that these subsets are nested (and so form a tree) based purely on the directed graph that describes the ancestry of these individuals. We also explore the relationship between these and related clustering constructions.
Submitted 20 August, 2009;
originally announced August 2009.