-
Semi-Quantitative Group Testing for Efficient and Accurate qPCR Screening of Pathogens with a Wide Range of Loads
Authors:
Ananthan Nambiar,
Chao Pan,
Vishal Rana,
Mahdi Cheraghchi,
João Ribeiro,
Sergei Maslov,
Olgica Milenkovic
Abstract:
Pathogenic infections pose a significant threat to global health, affecting millions of people every year and presenting substantial challenges to healthcare systems worldwide. Efficient and timely testing plays a critical role in disease control and transmission prevention. Group testing is a well-established method for reducing the number of tests needed to screen large populations when the disease prevalence is low. However, it does not fully utilize the quantitative information provided by qPCR methods, nor is it able to accommodate a wide range of pathogen loads. To address these issues, we introduce a novel adaptive semi-quantitative group testing (SQGT) scheme to efficiently screen populations via two-stage qPCR testing. The SQGT method quantizes cycle threshold ($Ct$) values into multiple bins, leveraging the information from the first stage of screening to improve the detection sensitivity. Dynamic $Ct$ threshold adjustments mitigate dilution effects and enhance test accuracy. Comparisons with traditional binary outcome GT methods show that SQGT reduces the number of tests by $24$% while maintaining a negligible false negative rate.
Submitted 2 August, 2023; v1 submitted 30 July, 2023;
originally announced July 2023.
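For context on the binary-outcome baseline mentioned in the abstract above, here is a minimal sketch of the expected test count under classical two-stage (Dorfman) pooling. This is an assumption chosen for illustration: the abstract does not say which binary GT scheme was used for comparison, and the sketch does not implement SQGT or any $Ct$ quantization. The function names are hypothetical, and tests are assumed to be error-free.

```python
def dorfman_tests_per_person(prevalence: float, pool_size: int) -> float:
    """Expected tests per person for two-stage pooling with perfect tests.

    Stage 1 tests one pooled sample per group. Stage 2 retests every
    member individually, but only if the pool came back positive.
    """
    if pool_size < 2:
        raise ValueError("pool_size must be at least 2")
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must lie in [0, 1]")
    # Probability that at least one member of the pool is infected.
    p_pool_positive = 1.0 - (1.0 - prevalence) ** pool_size
    # One pooled test shared by pool_size people, plus one individual
    # retest per person whenever the pool is positive.
    return 1.0 / pool_size + p_pool_positive


def best_pool_size(prevalence: float, max_pool_size: int = 32) -> int:
    """Pool size in [2, max_pool_size] minimizing expected tests per person."""
    return min(
        range(2, max_pool_size + 1),
        key=lambda n: dorfman_tests_per_person(prevalence, n),
    )


if __name__ == "__main__":
    p = 0.01  # assumed prevalence, for illustration only
    n = best_pool_size(p)
    print(n, round(dorfman_tests_per_person(p, n), 4))
```

At low prevalence most pools are negative, so the single pooled test clears the whole group; this is the saving that any scheme using extra quantitative information would have to improve upon.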
-
Functional universality in slow-growing microbial communities arises from thermodynamic constraints
Authors:
Ashish B. George,
Tong Wang,
Sergei Maslov
Abstract:
The dynamics of microbial communities is incredibly complex, determined by competition for metabolic substrates and cross-feeding of byproducts. Species in the community grow by harvesting energy from chemical reactions that transform substrates to products. In many anoxic environments, these reactions are close to thermodynamic equilibrium and growth is slow. To understand the community structure in these energy-limited environments, we developed a microbial community consumer-resource model incorporating energetic and thermodynamic constraints on an interconnected metabolic network. The central ingredient of the model is product inhibition, meaning that microbial growth may be limited not only by depletion of metabolic substrates but also by accumulation of products. We demonstrate that these additional constraints on microbial growth cause a convergence in the structure and function of the community metabolic network -- independent of species composition and biochemical details -- providing a possible explanation for convergence of community function despite taxonomic variation observed in many natural and industrial environments. Furthermore, we discovered that the structure of community metabolic network is governed by the thermodynamic principle of maximum heat dissipation. Overall, the work demonstrates how universal thermodynamic principles may constrain community metabolism and explain observed functional convergence in microbial communities.
Submitted 18 March, 2022; v1 submitted 11 March, 2022;
originally announced March 2022.
-
A cross-study analysis of drug response prediction in cancer cell lines
Authors:
Fangfang Xia,
Jonathan Allen,
Prasanna Balaprakash,
Thomas Brettin,
Cristina Garcia-Cardona,
Austin Clyde,
Judith Cohn,
James Doroshow,
Xiaotian Duan,
Veronika Dubinkina,
Yvonne Evrard,
Ya Ju Fan,
Jason Gans,
Stewart He,
Pinyi Lu,
Sergei Maslov,
Alexander Partin,
Maulik Shukla,
Eric Stahlberg,
Justin M. Wozniak,
Hyunseung Yoo,
George Zaki,
Yitan Zhu,
Rick Stevens
Abstract:
To enable personalized cancer treatment, machine learning models have been developed to predict drug response as a function of tumor and drug features. However, most algorithm development efforts have relied on cross validation within a single study to assess model accuracy. While an essential first step, cross validation within a biological data set typically provides an overly optimistic estimate of the prediction performance on independent test sets. To provide a more rigorous assessment of model generalizability between different studies, we use machine learning to analyze five publicly available cell line-based data sets: NCI60, CTRP, GDSC, CCLE and gCSI. Based on observed experimental variability across studies, we explore estimates of prediction upper bounds. We report performance results of a variety of machine learning models, with a multitasking deep neural network achieving the best cross-study generalizability. By multiple measures, models trained on CTRP yield the most accurate predictions on the remaining testing data, and gCSI is the most predictable among the cell line data sets included in this study. With these experiments and further simulations on partial data, two lessons emerge: (1) differences in viability assays can limit model generalizability across studies, and (2) drug diversity, more than tumor diversity, is crucial for raising model generalizability in preclinical screening.
Submitted 13 August, 2021; v1 submitted 18 April, 2021;
originally announced April 2021.
-
Stochastic social behavior coupled to COVID-19 dynamics leads to waves, plateaus and an endemic state
Authors:
Alexei V. Tkachenko,
Sergei Maslov,
Tong Wang,
Ahmed Elbanna,
George N. Wong,
Nigel Goldenfeld
Abstract:
It is well recognized that population heterogeneity plays an important role in the spread of epidemics. While individual variations in social activity are often assumed to be persistent, i.e. constant in time, here we discuss the consequences of dynamic heterogeneity. By integrating the stochastic dynamics of social activity into traditional epidemiological models we demonstrate the emergence of a new long timescale governing the epidemic in broad agreement with empirical data. Our model captures multiple features of real-life epidemics such as COVID-19, including prolonged plateaus and multiple waves, which are transiently suppressed due to the dynamic nature of social activity. The existence of the long timescale due to the interplay between epidemic and social dynamics provides a unifying picture of how a fast-paced epidemic typically will transition to the endemic state.
Submitted 19 February, 2021; v1 submitted 28 January, 2021;
originally announced January 2021.
-
Time-dependent heterogeneity leads to transient suppression of the COVID-19 epidemic, not herd immunity
Authors:
Alexei V. Tkachenko,
Sergei Maslov,
Ahmed Elbanna,
George N. Wong,
Zachary J. Weiner,
Nigel Goldenfeld
Abstract:
Epidemics generally spread through a succession of waves that reflect factors on multiple timescales. On short timescales, super-spreading events lead to burstiness and overdispersion, while long-term persistent heterogeneity in susceptibility is expected to lead to a reduction in the infection peak and the herd immunity threshold (HIT). Here, we develop a general approach to encompass both timescales, including time variations in individual social activity, and demonstrate how to incorporate them phenomenologically into a wide class of epidemiological models through parameterization. We derive a non-linear dependence of the effective reproduction number Re on the susceptible population fraction S. We show that a state of transient collective immunity (TCI) emerges well below the HIT during early, high-paced stages of the epidemic. However, this is a fragile state that wanes over time due to changing levels of social activity, and so the infection peak is not an indication of herd immunity: subsequent waves can and will emerge due to behavioral changes in the population, driven (e.g.) by seasonal factors. Transient and long-term levels of heterogeneity are estimated by using empirical data from the COVID-19 epidemic as well as from real-life face-to-face contact networks. These results suggest that the hardest-hit areas, such as NYC, have achieved TCI following the first wave of the epidemic, but likely remain below the long-term HIT. Thus, in contrast to some previous claims, these regions can still experience subsequent waves.
Submitted 29 January, 2021; v1 submitted 10 August, 2020;
originally announced August 2020.
-
Modeling COVID-19 dynamics in Illinois under non-pharmaceutical interventions
Authors:
George N. Wong,
Zachary J. Weiner,
Alexei V. Tkachenko,
Ahmed Elbanna,
Sergei Maslov,
Nigel Goldenfeld
Abstract:
We present modeling of the COVID-19 epidemic in Illinois, USA, capturing the implementation of a Stay-at-Home order and scenarios for its eventual release. We use a non-Markovian age-of-infection model that is capable of handling long and variable time delays without changing its model topology. Bayesian estimation of model parameters is carried out using Markov Chain Monte Carlo (MCMC) methods. This framework allows us to treat all available input information, including both the previously published parameters of the epidemic and available local data, in a uniform manner. To accurately model deaths as well as demand on the healthcare system, we calibrate our predictions to total and in-hospital deaths as well as hospital and ICU bed occupancy by COVID-19 patients. We apply this model not only to the state as a whole but also its sub-regions in order to account for the wide disparities in population size and density. Without prior information on non-pharmaceutical interventions (NPIs), the model independently reproduces a mitigation trend closely matching mobility data reported by Google and Unacast. Forward predictions of the model provide robust estimates of the peak position and severity and also enable forecasting the regional-dependent results of releasing Stay-at-Home orders. The resulting highly constrained narrative of the epidemic is able to provide estimates of its unseen progression and inform scenarios for sustainable monitoring and control of the epidemic.
Submitted 15 June, 2020; v1 submitted 3 June, 2020;
originally announced June 2020.
-
Window of Opportunity for Mitigation to Prevent Overflow of ICU capacity in Chicago by COVID-19
Authors:
Sergei Maslov,
Nigel Goldenfeld
Abstract:
We estimate the growth in demand for ICU beds in Chicago during the emerging COVID-19 epidemic, using state-of-the-art computer simulations calibrated for the SARS-CoV-2 virus. The questions we address are these:
(1) Will the ICU capacity in Chicago be exceeded, and if so by how much?
(2) Can strong mitigation strategies, such as lockdown or shelter in place order, prevent the overflow of capacity?
(3) When should such strategies be implemented?
Our answers are as follows:
(1) The ICU capacity may be exceeded by a large amount, probably by a factor of ten.
(2) Strong mitigation can avert this emergency situation potentially, but even that will not work if implemented too late.
(3) If the strong mitigation precedes April 1st, then the growth of COVID-19 can be controlled and the ICU capacity could be adequate. The earlier the strong mitigation is implemented, the greater the probability that it will be successful. After around April 1 2020, any strong mitigation will not avert the emergency situation. In Italy, the lockdown occurred too late and the number of deaths is still doubling every 2.3 days. It is difficult to be sure about the precise dates for this window of opportunity, due to the inherent uncertainties in computer simulation. But there is high confidence in the main conclusion that it exists and will soon be closed.
Our conclusion is that, being fully cognizant of the societal trade-offs, there is a rapidly closing window of opportunity to avert a worst-case scenario in Chicago, but only with strong mitigation/lockdown implemented in the next week at the latest. If this window is missed, the epidemic will get worse and then strong mitigation/lockdown will be required after all, but it will be too late.
Submitted 20 March, 2020;
originally announced March 2020.
-
Modeling microbial cross-feeding at intermediate scale portrays community dynamics and species coexistence
Authors:
Chen Liao,
Tong Wang,
Sergei Maslov,
Joao B. Xavier
Abstract:
Social interaction between microbes can be described at many levels of detail, ranging from the biochemistry of cell-cell interactions to the ecological dynamics of populations. Choosing the best level to model microbial communities without losing generality remains a challenge. Here we propose to model cross-feeding interactions at an intermediate level between genome-scale metabolic models of individual species and consumer-resource models of ecosystems, which is suitable for empirical data. We applied our method to three published examples of multi-strain Escherichia coli communities with increasing complexity consisting of uni-, bi-, and multi-directional cross-feeding of either substitutable metabolic byproducts or essential nutrients. The intermediate-scale model accurately described empirical data and could quantify exchange rates elusive by other means, such as the byproduct secretions, even for a complex community of 14 amino acid auxotrophs. We used the three models to study each community's limits of robustness to perturbations such as variations in resource supply, antibiotic treatments and invasion by other "cheater" species. Our analysis provides a foundation to quantify cross-feeding interactions from experimental data, and highlights the importance of metabolic exchanges in the dynamics and stability of microbial communities.
Submitted 19 February, 2020;
originally announced February 2020.
-
Evidence for a multi-level trophic organization of the human gut microbiome
Authors:
Tong Wang,
Akshit Goyal,
Veronika Dubinkina,
Sergei Maslov
Abstract:
The human gut microbiome is a complex ecosystem, in which hundreds of microbial species and metabolites coexist, in part due to an extensive network of cross-feeding interactions. However, both the large-scale trophic organization of this ecosystem, and its effects on the underlying metabolic flow, remain unexplored. Here, using a simplified model, we provide quantitative support for a multi-level trophic organization of the human gut microbiome, where microbes consume and secrete metabolites in multiple iterative steps. Using a manually-curated set of metabolic interactions between microbes, our model suggests about four trophic levels, each characterized by a high level-to-level metabolic transfer of byproducts. It also quantitatively predicts the typical metabolic environment of the gut (fecal metabolome) in approximate agreement with the real data. To understand the consequences of this trophic organization, we quantify the metabolic flow and biomass distribution, and explore patterns of microbial and metabolic diversity in different levels. The hierarchical trophic organization suggested by our model can help mechanistically establish causal links between the abundances of microbes and metabolites in the human gut.
Submitted 28 August, 2019;
originally announced August 2019.
-
Multistability and regime shifts in microbial communities explained by competition for essential nutrients
Authors:
Veronika Dubinkina,
Yulia Fridman,
Parth Pratim Pandey,
Sergei Maslov
Abstract:
Microbial communities routinely have several possible species compositions or community states observed for the same environmental parameters. Changes in these parameters can trigger abrupt and persistent transitions (regime shifts) between such community states. Yet little is known about the main determinants and mechanisms of multistability in microbial communities. Here we introduce and study a resource-explicit model in which microbes compete for two types of essential nutrients. We adapt game-theoretical methods of the stable matching problem to identify all possible species compositions of a microbial community. We then classify them by their resilience against three types of perturbations: fluctuations in nutrient supply, invasions by new species, and small changes of abundances of existing ones. We observe multistability and explore an intricate network of regime shifts between stable states in our model. Our results suggest that multistability requires microbial species to have different stoichiometries of essential nutrients. We also find that a balanced nutrient supply promotes multistability and species diversity yet makes individual community states less stable.
Submitted 29 June, 2019;
originally announced July 2019.
-
Alternative stable states in a model of microbial community limited by multiple essential nutrients
Authors:
Veronika Dubinkina,
Yulia Fridman,
Parth Pandey,
Sergei Maslov
Abstract:
Microbial communities routinely have several alternative stable states observed for the same environmental parameters. Sudden and irreversible transitions between these states make external manipulation of these systems more complicated. To better understand the mechanisms and origins of multistability in microbial communities, we introduce and study a model of a microbial ecosystem colonized by multiple specialist species selected from a fixed pool. Growth of each species can be limited by essential nutrients of two types, e.g. carbon and nitrogen, each represented in the environment by multiple metabolites. We demonstrate that our model has an exponentially large number of potential stable states realized for different environmental parameters. Using game-theoretical methods adapted from the stable marriage problem, we predict all of these states based only on ranked lists of competitive abilities of species for each of the nutrients. We show that for every set of nutrient influxes, several mutually uninvadable stable states are generally feasible and we distinguish them based upon their dynamic stability. We further explore an intricate network of discontinuous transitions (regime shifts) between these alternative states, both in the course of community assembly and upon changes of nutrient influxes.
Submitted 10 October, 2018;
originally announced October 2018.
-
Multiple stable states in microbial communities explained by the stable marriage problem
Authors:
Akshit Goyal,
Veronika Dubinkina,
Sergei Maslov
Abstract:
Experimental studies of microbial communities routinely reveal that they have multiple stable states. While each of these states is generally resilient, certain perturbations such as antibiotics, probiotics and diet shifts, result in transitions to other states. Can we reliably both predict such stable states as well as direct and control transitions between them? Here we present a new conceptual model inspired by the stable marriage problem in game theory and economics in which microbial communities naturally exhibit multiple stable states, each state with a different species' abundance profile. Our model's core ingredient is that microbes utilize nutrients one at a time while competing with each other. Using only two ranked tables, one with microbes' nutrient preferences and one with their competitive abilities, we can determine all possible stable states as well as predict inter-state transitions, triggered by the removal or addition of a specific nutrient or microbe. Further, using an example of 7 Bacteroides species common to the human gut utilizing 9 polysaccharides, we predict that mutual complementarity in nutrient preferences enables these species to coexist at high abundances.
Submitted 3 July, 2018; v1 submitted 16 December, 2017;
originally announced December 2017.
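The two ranked tables described in the abstract above map naturally onto the inputs of the classical stable marriage problem. The following is a minimal sketch of the standard Gale-Shapley algorithm applied to such tables, under the simplifying assumption that each nutrient is held by exactly one microbe. It returns only one stable matching (the microbe-optimal one), whereas the paper reports determining all stable states; the function name, the example microbes `A` and `B`, and the nutrients `x` and `y` are hypothetical.

```python
def gale_shapley(microbe_prefs, nutrient_ranks):
    """Return one stable microbe -> nutrient assignment.

    microbe_prefs:  dict mapping each microbe to its nutrients, most
                    preferred first.
    nutrient_ranks: dict mapping each nutrient to the microbes that can
                    use it, strongest competitor first.
    """
    # Position of each microbe in each nutrient's competitive ranking
    # (a lower index means a stronger competitor).
    rank = {
        nutrient: {microbe: k for k, microbe in enumerate(order)}
        for nutrient, order in nutrient_ranks.items()
    }
    next_choice = {m: 0 for m in microbe_prefs}  # next nutrient to try
    holder = {}                                  # nutrient -> microbe
    free = list(microbe_prefs)

    while free:
        microbe = free.pop()
        prefs = microbe_prefs[microbe]
        if next_choice[microbe] >= len(prefs):
            continue  # this microbe has run out of nutrients to try
        nutrient = prefs[next_choice[microbe]]
        next_choice[microbe] += 1
        current = holder.get(nutrient)
        if current is None:
            holder[nutrient] = microbe
        elif rank[nutrient][microbe] < rank[nutrient][current]:
            # The newcomer outcompetes the current holder, who moves on.
            holder[nutrient] = microbe
            free.append(current)
        else:
            free.append(microbe)  # rejected; will try its next nutrient

    return {microbe: nutrient for nutrient, microbe in holder.items()}


if __name__ == "__main__":
    prefs = {"A": ["x", "y"], "B": ["x", "y"]}
    ranks = {"x": ["B", "A"], "y": ["A", "B"]}
    print(gale_shapley(prefs, ranks))
```

In the example, both microbes prefer nutrient `x`, but `B` is the stronger competitor for it, so `A` settles on `y`. Neither microbe can displace the other from a nutrient it would prefer, which is what makes the assignment stable.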
-
Diversity, stability, and reproducibility in stochastically assembled microbial ecosystems
Authors:
Akshit Goyal,
Sergei Maslov
Abstract:
Microbial ecosystems are remarkably diverse, stable, and often consist of a balanced mixture of core and peripheral species. Here we propose a conceptual model exhibiting all these emergent properties in quantitative agreement with real ecosystem data, specifically species' abundance and prevalence distributions. Resource competition and metabolic commensalism drive stochastic ecosystem assembly in our model. We demonstrate that even when supplied with just one resource, ecosystems can exhibit high diversity, increasing stability, and partial reproducibility between samples.
Submitted 2 November, 2017;
originally announced November 2017.
-
Onset of natural selection in auto-catalytic heteropolymers
Authors:
Alexei V. Tkachenko,
Sergei Maslov
Abstract:
Reduction of information entropy along with ever-increasing complexity are among the key signatures of living matter. Understanding the onset of such behavior in the early prebiotic world is essential for solving the problem of origins of life. To elucidate this transition, we study a theoretical model of information-storing heteropolymers capable of template-assisted ligation and subjected to cyclic non-equilibrium driving forces. We discover that this simple physical system undergoes a spontaneous reduction of the information entropy due to the competition of chains for constituent monomers. This natural-selection-like process ultimately results in the survival of a limited subset of polymer sequences. Importantly, the number of surviving sequences remains exponentially large, thus opening up the possibility of further increase in complexity due to Darwinian evolution. We also propose potential experimental implementations of our model using either biopolymers or artificial nano-structures.
Submitted 17 October, 2017;
originally announced October 2017.
-
Severe population collapses and species extinctions in multi-host epidemic dynamics
Authors:
Sergei Maslov,
Kim Sneppen
Abstract:
Most infectious diseases including more than half of known human pathogens are not restricted to just one host, yet much of the mathematical modeling of infections has been limited to a single species. We investigate consequences of a single epidemic propagating in multiple species and compare and contrast it with the endemic steady state of the disease. We use the two-species Susceptible-Infected-Recovered (SIR) model to calculate the severity of post-epidemic collapses in populations of two host species as a function of their initial population sizes, the times individuals remain infectious, and the matrix of infection rates. We derive the criteria for a very large, extinction-level, population collapse in one or both of the species. The main conclusion of our study is that a single epidemic could drive a species with high mortality rate to local or even global extinction provided that it is co-infected with an abundant species. Such collapse-driven extinctions depend on factors different than those in the endemic steady state of the disease.
Submitted 18 April, 2017;
originally announced April 2017.
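The two-species SIR setup named in the abstract above can be illustrated with a short numerical sketch. It assumes the textbook mass-action form, in which species a is infected at rate S_a · Σ_b β_ab I_b and leaves the infectious class at rate γ_a I_a, and it integrates with a plain forward Euler step. The parameter values, the function name, and the choice to lump deaths and recoveries into one removed compartment are illustrative assumptions, not taken from the paper.

```python
def simulate_two_species_sir(s0, i0, beta, gamma, dt=0.01, steps=20000):
    """Forward-Euler integration of a two-species SIR model.

    s0, i0: initial susceptible and infected populations, one per species.
    beta:   2x2 matrix; beta[a][b] is the rate at which infected
            individuals of species b infect susceptibles of species a.
    gamma:  removal rates (inverse infectious periods), one per species.
    Returns the final (s, i, r) lists.
    """
    s, i, r = list(s0), list(i0), [0.0, 0.0]
    for _ in range(steps):
        # Force of infection on each species from both infected pools.
        force = [sum(beta[a][b] * i[b] for b in range(2)) for a in range(2)]
        new_infections = [s[a] * force[a] * dt for a in range(2)]
        removals = [gamma[a] * i[a] * dt for a in range(2)]
        for a in range(2):
            s[a] -= new_infections[a]
            i[a] += new_infections[a] - removals[a]
            r[a] += removals[a]
    return s, i, r


if __name__ == "__main__":
    # Species 0 is abundant and sustains the epidemic on its own.
    # Species 1 is rare and is only infected by species 0.
    s, i, r = simulate_two_species_sir(
        s0=[1.0, 0.1],
        i0=[1e-3, 0.0],
        beta=[[2.0, 0.0], [2.0, 0.0]],
        gamma=[1.0, 1.0],
    )
    print("surviving susceptible fraction, species 0:", s[0] / 1.0)
    print("surviving susceptible fraction, species 1:", s[1] / 0.1)
```

In this illustrative run the rare species cannot sustain transmission by itself, yet most of its susceptibles are still infected because the abundant species drives the epidemic. This mirrors the qualitative point of the abstract about an abundant co-infected species.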
-
Family-specific scaling laws in bacterial genomes
Authors:
Eleonora de Lazzari,
Jacopo Grilli,
Sergei Maslov,
Marco Cosentino Lagomarsino
Abstract:
Among several quantitative invariants found in evolutionary genomics, one of the most striking is the scaling of the overall abundance of proteins, or protein domains, sharing a specific functional annotation across genomes of given size. The sizes of these functional categories change, on average, as power laws in the total number of protein-coding genes. Here, we show that such regularities are not restricted to the overall behavior of high-level functional categories, but also exist systematically at the level of single evolutionary families of protein domains. Specifically, the number of proteins within each family follows family-specific scaling laws with genome size. Functionally similar sets of families tend to follow similar scaling laws, but this is not always the case. To understand this systematically, we provide a comprehensive classification of families based on their scaling properties. Additionally, we develop a quantitative score for the heterogeneity of the scaling of families belonging to a given category or predefined group. Under the common reasonable assumption that selection is driven solely or mainly by biological function, these findings point to fine-tuned and interdependent functional roles of specific protein domains, beyond our current functional annotations. This analysis provides a deeper view on the links between evolutionary expansion of protein families and the functional constraints shaping the gene repertoire of bacterial genomes.
Submitted 28 March, 2017;
originally announced March 2017.
-
Recombinant transfer in the basic genome of E. coli
Authors:
Purushottam Dixit,
Tin Yau Pang,
F. William Studier,
Sergei Maslov
Abstract:
An approximation to the ~4 Mbp basic genome shared by 32 strains of E. coli representing six evolutionary groups has been derived and analyzed computationally. A multiple-alignment of the 32 complete genome sequences was filtered to remove mobile elements and identify the most reliable ~90% of the aligned length of each of the resulting 496 basic-genome pairs. Patterns of single-bp mutations (SNPs) in aligned pairs distinguish clonally inherited regions from regions where either genome has acquired DNA fragments from diverged genomes by homologous recombination since their last common ancestor. Such recombinant transfer is pervasive across the basic genome, mostly between genomes in the same evolutionary group, and generates many unique mosaic patterns. The six least-diverged genome-pairs have one or two recombinant transfers of length ~40-115 kbp (and few if any other transfers), each containing one or more gene clusters known to confer strong selective advantage in some environments. Moderately diverged genome-pairs (0.4-1% SNPs ) show mosaic patterns of interspersed clonal and recombinant regions of varying lengths throughout the basic genome, whereas more highly diverged pairs within an evolutionary group or pairs between evolutionary groups having >1.3% SNPs have few clonal matches longer than a few kbp. Many recombinant transfers appear to incorporate fragments of the entering DNA produced by restriction systems of the recipient cell. A simple computational model can closely fit the data. Most recombinant transfers seem likely to be due to generalized transduction by co-evolving populations of phages, which could efficiently distribute variability throughout bacterial genomes.
Submitted 14 July, 2015;
originally announced July 2015.
-
Diversity waves in collapse-driven population dynamics
Authors:
Sergei Maslov,
Kim Sneppen
Abstract:
Populations of species in ecosystems are often constrained by availability of resources within their environment. In effect, this means that growth of one population needs to be balanced by a comparable reduction in the populations of others. In neutral models of biodiversity all populations are assumed to change incrementally due to stochastic births and deaths of individuals. Here we propose and model another redistribution mechanism driven by abrupt and severe collapses of the entire population of a single species freeing up resources for the remaining ones. This mechanism may be relevant e.g. for communities of bacteria, with strain-specific collapses caused e.g. by invading bacteriophages, or for other ecosystems where infectious diseases play an important role.
The emergent dynamics of our system is cyclic "diversity waves" triggered by collapses of globally dominating populations. The population diversity peaks at the beginning of each wave and exponentially decreases afterwards. Species abundances are characterized by a bimodal time-aggregated distribution with the lower peak formed by populations of recently collapsed or newly introduced species, while the upper peak is formed by species that have not yet collapsed in the current wave. In most waves both upper and lower peaks are composed of several smaller peaks. This self-organized hierarchical peak structure has a long-term memory transmitted across several waves. It gives rise to a scale-free tail of the time-aggregated population distribution with a universal exponent of 1.7. We show that diversity wave dynamics is robust with respect to variations in the rules of our model such as diffusion between multiple environments, species-specific growth and extinction rates, and bet-hedging strategies.
Submitted 14 July, 2015; v1 submitted 2 March, 2015;
originally announced March 2015.
-
Onset of autocatalysis of information-coding polymers
Authors:
Alexei V. Tkachenko,
Sergei Maslov
Abstract:
Self-replicating systems based on information-coding polymers are of crucial importance in biology. They also recently emerged as a paradigm in material design on nano- and micro-scales. We present a general theoretical and numerical analysis of the problem of spontaneous emergence of autocatalysis for heteropolymers capable of template-assisted ligation driven by cyclic changes in the environment. Our central result is the existence of the first order transition between the regime dominated by free monomers and that with a self-sustaining population of sufficiently long chains. We provide a simple, mathematically tractable model supported by numerical simulations, which predicts the distribution of chain lengths and the onset of autocatalysis in terms of the overall monomer concentration and two fundamental rate constants. Another key result of our study is the emergence of the kinetically-limited optimal overlap length between a template and each of its two substrates. The template-assisted ligation allows for heritable transmission of the information encoded in chain sequences thus opening up the possibility of long-term memory and evolvability in such systems.
Submitted 14 July, 2015; v1 submitted 12 May, 2014;
originally announced May 2014.
-
Quantifying evolutionary dynamics of the basic genome of E. coli
Authors:
Purushottam Dixit,
Tin Yau Pang,
F. William Studier,
Sergei Maslov
Abstract:
The ~4-Mbp basic genome shared by 32 independent isolates of E. coli representing considerable population diversity has been approximated by whole-genome multiple-alignment and computational filtering designed to remove mobile elements and highly variable regions. Single nucleotide polymorphisms (SNPs) in the 496 basic-genome pairs are identified and clonally inherited stretches are distinguished from those acquired by horizontal transfer (HT) by sharp discontinuities in SNP density. The six least diverged genome-pairs each have only one or two HT stretches, each occupying 42-115-kbp of basic genome and containing at least one gene cluster known to confer selective advantage. At higher divergences, the typical mosaic pattern of interspersed clonal and HT stretches across the entire basic genome is observed, including likely fragmented integrations across a restriction barrier. A simple model suggests that individual HT events are of the order of 10-kbp and are the chief contributor to genome divergence, bringing in almost 12 times more SNPs than point mutations. As a result of continuing horizontal transfer of such large segments, 400 out of the 496 strain-pairs beyond genomic divergence of share virtually no genomic material with their common ancestor. We conclude that the active and continuing horizontal transfer of moderately large genomic fragments is likely to be mediated primarily by a co-evolving population of phages that distribute random genome fragments throughout the population by generalized transduction, allowing efficient adaptation to environmental changes.
Submitted 11 May, 2014;
originally announced May 2014.
-
Universal distribution of component frequencies in biological and technological systems
Authors:
Tin Yau Pang,
Sergei Maslov
Abstract:
Bacterial genomes and large-scale computer software projects both consist of a large number of components (genes or software packages) connected via a network of mutual dependencies. Components can be easily added or removed from individual systems and their usage frequencies vary over many orders of magnitude. We study this frequency distribution in genomes of ~500 bacterial species and in over 2 million Linux computers and find that in both cases it is described by the same scale-free power law distribution with an additional peak near the tail of the distribution corresponding to nearly universal components. We argue that this is a general property of any modular system with a multi-layered dependency network. We demonstrate that the frequency of a component is positively correlated with its dependency degree given by the total number of upstream components whose operation directly or indirectly depends on the selected component. The observed frequency/dependency degree distributions are reproduced in a simple mathematically tractable model introduced and analyzed in this study.
Submitted 9 August, 2013;
originally announced August 2013.
-
Well-temperate phage: optimal bet-hedging against local environmental collapses
Authors:
Sergei Maslov,
Kim Sneppen
Abstract:
Upon infection of their bacterial hosts temperate phages must choose between lysogenic and lytic developmental strategies. Here we apply the game-theoretic bet-hedging strategy introduced by Kelly to derive the optimal lysogenic fraction of the total population of phages as a function of frequency and intensity of environmental downturns affecting the lytic subpopulation. The "well-temperate" phage of our title is characterized by the best long-term population growth rate. We show that it is realized when the lysogenization frequency is approximately equal to the probability of lytic population collapse. We further predict the existence of sharp boundaries in the system's environmental, ecological, and biophysical parameters separating the regions where this temperate strategy is optimal from those dominated by purely virulent or dormant (purely lysogenic) strategies. We show that the virulent strategy works best for phages with large diversity of hosts, and access to multiple independent environments reachable by diffusion. Conversely, progressively more temperate or even dormant strategies are favored in environments that are subject to frequent and severe temporal downturns.
Submitted 14 July, 2015; v1 submitted 7 August, 2013;
originally announced August 2013.
-
Joint scaling laws in functional and evolutionary categories in prokaryotic genomes
Authors:
Jacopo Grilli,
Bruno Bassetti,
Sergei Maslov,
Marco Cosentino Lagomarsino
Abstract:
We propose and study a class-expansion/innovation/loss model of genome evolution taking into account biological roles of genes and their constituent domains. In our model numbers of genes in different functional categories are coupled to each other. For example, an increase in the number of metabolic enzymes in a genome is usually accompanied by addition of new transcription factors regulating these enzymes. Such coupling can be thought of as a proportional "recipe" for genome composition of the type "a spoonful of sugar for each egg yolk". The model jointly reproduces two known empirical laws: the distribution of family sizes and the nonlinear scaling of the number of genes in certain functional categories (e.g. transcription factors) with genome size. In addition, it allows us to derive a novel relation between the exponents characterising these two scaling laws, establishing a direct quantitative connection between evolutionary and functional categories. It predicts that functional categories that grow faster than linearly with genome size are characterised by flatter-than-average family size distributions. This relation is confirmed by our bioinformatics analysis of prokaryotic genomes. This proves that the joint quantitative trends of functional and evolutionary classes can be understood in terms of evolutionary growth with proportional recipes.
Submitted 9 August, 2011; v1 submitted 30 January, 2011;
originally announced January 2011.
-
Toolbox model of evolution of metabolic pathways on networks of arbitrary topology
Authors:
Tin Yau Pang,
Sergei Maslov
Abstract:
In prokaryotic genomes the number of transcriptional regulators is known to quadratically scale with the total number of protein-coding genes. The toolbox model was recently proposed to explain this scaling for metabolic enzymes and their regulators. According to its rules the metabolic network of an organism evolves by horizontal transfer of pathways from other species. These pathways are part of a larger "universal" network formed by the union of all species-specific networks. It remained to be understood, however, how the topological properties of this universal network influence the scaling law of functional content of genomes. In this study we answer this question by first analyzing the scaling properties of the toolbox model on arbitrary tree-like universal networks. We mathematically prove that the critical branching topology, in which the average number of upstream neighbors of a node is equal to one, is both necessary and sufficient for the quadratic scaling. Conversely, the toolbox model on trees with exponentially expanding, supercritical topology is characterized by the linear scaling with logarithmic corrections. We further generalize our model to include reactions with multiple substrates/products as well as branched or cyclic metabolic pathways. Unlike the original model the new version employs evolutionarily optimized pathways with the smallest number of reactions necessary to achieve their metabolic tasks. Numerical simulations of this most realistic model on the universal network from the KEGG database again produced approximately quadratic scaling. Our results demonstrate why, in spite of their "small-world" topology, real-life metabolic networks are characterized by a broad distribution of pathway lengths and sizes of metabolic regulons in regulatory networks.
Submitted 22 September, 2010;
originally announced September 2010.
-
Toolbox model of evolution of prokaryotic metabolic networks and their regulation
Authors:
Sergei Maslov,
Sandeep Krishna,
Tin Yau Pang,
Kim Sneppen
Abstract:
It has been reported that the number of transcription factors encoded in prokaryotic genomes scales approximately quadratically with their total number of genes. We propose a conceptual explanation of this finding and illustrate it using a simple model in which metabolic and regulatory networks of prokaryotes are shaped by horizontal gene transfer of coregulated metabolic pathways. Adapting to a new environmental condition monitored by a new transcription factor (e.g., learning to use another nutrient) involves both acquiring new enzymes and reusing some of the enzymes already encoded in the genome. As the repertoire of enzymes of an organism (its toolbox) grows larger, it can reuse its enzyme tools more often and thus needs to get fewer new ones to master each new task. From this observation, it logically follows that the number of functional tasks and their regulators increases faster than linearly with the total number of genes encoding enzymes. Genomes can also shrink, e.g., because of a loss of a nutrient from the environment, followed by deletion of its regulator and all enzymes that become redundant. We propose several simple models of network evolution elaborating on this toolbox argument and reproducing the empirically observed quadratic scaling. The distribution of lengths of pathway branches in our model agrees with that of the real-life metabolic network of Escherichia coli. Thus, our model provides a qualitative explanation for broad distributions of regulon sizes in prokaryotes.
Submitted 22 September, 2010;
originally announced September 2010.
-
Protein abundances and interactions coevolve to promote functional complexes while suppressing non-specific binding
Authors:
Muyoung Heo,
Sergei Maslov,
Eugene I. Shakhnovich
Abstract:
How do living cells achieve sufficient abundances of functional protein complexes while minimizing promiscuous non-functional interactions? Here we study this problem using a first-principle model of the cell whose phenotypic traits are directly determined from its genome through biophysical properties of protein structures and binding interactions in crowded cellular environment. The model cell includes three independent prototypical pathways, whose topologies of Protein-Protein Interaction (PPI) sub-networks are different, but whose contributions to the cell fitness are equal. Model cells evolve through genotypic mutations and phenotypic protein copy number variations. We found a strong relationship between evolved physical-chemical properties of protein interactions and their abundances due to a "frustration" effect: strengthening of functional interactions brings about hydrophobic interfaces, which make proteins prone to promiscuous binding. The balancing act is achieved by lowering concentrations of hub proteins while raising solubilities and abundances of functional monomers. Based on these principles we generated and analyzed a possible realization of the proteome-wide PPI network in yeast. In this simulation we found that high-throughput affinity capture - mass spectroscopy experiments can detect functional interactions with high fidelity only for high abundance proteins while missing most interactions for low abundance proteins.
Submitted 30 December, 2010; v1 submitted 15 July, 2010;
originally announced July 2010.
-
Fluctuations in Mass-Action Equilibrium of Protein Binding Networks
Authors:
Koon-Kiu Yan,
Dylan Walker,
Sergei Maslov
Abstract:
We consider two types of fluctuations in the mass-action equilibrium in protein binding networks. The first type is driven by relatively slow changes in total concentrations (copy numbers) of interacting proteins. The second type, which we refer to as spontaneous, is caused by quickly decaying thermodynamic deviations away from the equilibrium of the system. As such they are amenable to methods of equilibrium statistical mechanics used in our study. We investigate the effects of network connectivity on these fluctuations and compare them to their upper and lower bounds. The collective effects are shown to sometimes lead to large power-law distributed amplification of spontaneous fluctuations as compared to the expectation for isolated dimers. As a consequence of this, the strength of both types of fluctuations is positively correlated with the overall network connectivity of proteins forming the complex. On the other hand, the relative amplitude of fluctuations is negatively correlated with the abundance of the complex. Our general findings are illustrated using a real network of protein-protein interactions in baker's yeast with experimentally determined protein concentrations.
Submitted 10 March, 2008;
originally announced March 2008.
-
Prediction and verification of indirect interactions in densely interconnected regulatory networks
Authors:
Koon-Kiu Yan,
Sergei Maslov,
Ilya Mazo,
Anton Yuryev
Abstract:
We develop a matrix-based approach to predict and verify indirect interactions in gene and protein regulatory networks. It is based on the approximate transitivity of indirect regulations (e.g. A regulates B and B regulates C often implies that A regulates C) and optimally takes into account the length of a cascade and signs of intermediate interactions. Our method is at its most powerful when applied to large and densely interconnected networks. It successfully predicts both the yet unknown indirect regulations, as well as the sign (activation or repression) of already known ones. The reliability of sign predictions was calibrated using the gold-standard sets of positive and negative interactions. We fine-tuned the parameters of our algorithm by maximizing the area under the Receiver Operating Characteristic (ROC) curve. We then applied the optimized algorithm to large literature-derived networks of all direct and indirect regulatory interactions in several model organisms (Homo sapiens, Saccharomyces cerevisiae, Arabidopsis thaliana and Drosophila melanogaster).
Submitted 27 November, 2007; v1 submitted 3 October, 2007;
originally announced October 2007.
-
Propagation of large concentration changes in reversible protein binding networks
Authors:
Sergei Maslov,
I. Ispolatov
Abstract:
We study how the dynamic equilibrium of the reversible protein-protein binding network in yeast Saccharomyces cerevisiae responds to large changes in abundances of individual proteins. The magnitude of shifts between free and bound concentrations of their immediate and more distant neighbors in the network is influenced by such factors as the network topology, the distribution of protein concentrations among its nodes, and the average binding strength. Our primary conclusion is that, on average, the effects of a perturbation are strongly localized and exponentially decay with the network distance away from the perturbed node, which explains why, despite globally connected topology, individual functional modules in such networks are able to operate fairly independently. We also found that under specific favorable conditions, realized in a significant number of paths in the yeast network, concentration perturbations can selectively propagate over considerable network distances (up to four steps). Such "action-at-a-distance" requires high concentrations of heterodimers along the path as well as low free (unbound) concentration of intermediate proteins.
Submitted 17 August, 2007;
originally announced August 2007.
-
Detection of the dominant direction of information flow in densely interconnected regulatory networks
Authors:
I. Ispolatov,
Sergei Maslov
Abstract:
Finding the dominant direction of flow of information in densely interconnected regulatory or signaling networks is required in many applications in computational biology and neuroscience. This is achieved by first identifying and removing links which close up feedback loops in the original network and hierarchically arranging nodes in the remaining network. In mathematical language this corresponds to a problem of making a graph acyclic by removing as few links as possible and thus altering the original graph in the least possible way. In practically all applications the exact solution of this problem requires an enumeration of all combinations of removed links, which is computationally intractable. We introduce and compare two algorithms: the deterministic, "greedy" algorithm that preferentially cuts the links that participate in the largest number of feedback cycles, and the probabilistic one based on a simulated annealing of a hierarchical layout of the network which minimizes the number of "backward" links going from lower to higher hierarchical levels. We find that the annealing algorithm outperforms the deterministic one in terms of speed, memory requirement, and the actual number of removed links. Implications for systems biology and directions for further research are discussed.
Submitted 27 February, 2007;
originally announced February 2007.
-
UV-induced mutagenesis in Escherichia coli SOS response: A quantitative model
Authors:
Sandeep Krishna,
Sergei Maslov,
Kim Sneppen
Abstract:
Escherichia coli bacteria respond to DNA damage by a highly orchestrated series of events known as the SOS response, regulated by transcription factors, protein-protein binding and active protein degradation. We present a dynamical model of the UV-induced SOS response, incorporating mutagenesis by the error-prone polymerase, Pol V. In our model, mutagenesis depends on a combination of two key processes: damage counting by the replication forks and a long term memory associated with the accumulation of UmuD'. Together, these provide a tight regulation of mutagenesis resulting, we show, in a "digital" turn-on and turn-off of Pol V. Our model provides a compact view of the topology and design of the SOS network, pinpointing the specific functional role of each of the regulatory processes. In particular, we suggest that the recently observed second peak in the activity of promoters in the SOS regulon (Friedman et al., 2005, PLoS Biol. 3, e238) is the result of a positive feedback from Pol V to RecA filaments.
Submitted 6 January, 2007;
originally announced January 2007.
-
Propagation of fluctuations in interaction networks governed by the law of mass action
Authors:
Sergei Maslov,
Kim Sneppen,
Iaroslav Ispolatov
Abstract:
Using an example of physical interactions between proteins, we study how perturbations propagate in interconnected networks whose equilibrium state is governed by the law of mass action. We introduce a comprehensive matrix formalism which predicts the response of this equilibrium to small changes in total concentrations of individual molecules, and explain it using a heuristic analogy to a current flow in a network of resistors. Our main conclusion is that on average changes in free concentrations exponentially decay with the distance from the source of perturbation. We then study how this decay is influenced by such factors as the topology of a network, binding strength, and correlations between concentrations of neighboring nodes. An exact analytic expression for the decay constant is obtained for the case of uniform interactions on the Bethe lattice. Our general findings are illustrated using a real biological network of protein-protein interactions in baker's yeast with experimentally determined protein concentrations.
Submitted 7 November, 2006;
originally announced November 2006.
-
Parameters of proteome evolution from histograms of amino-acid sequence identities of paralogous proteins
Authors:
Jacob Bock Axelsen,
Koon-Kiu Yan,
Sergei Maslov
Abstract:
The evolution of the full repertoire of proteins encoded in a given genome is mostly driven by gene duplications, deletions, and sequence modifications of existing proteins. Indirect information about relative rates and other intrinsic parameters of these three basic processes is contained in the proteome-wide distribution of sequence identities of pairs of paralogous proteins. We introduce a simple mathematical framework based on a stochastic birth-and-death model that allows one to extract some of this information and apply it to the set of all pairs of paralogous proteins in seven model organisms. It was found that the histogram of sequence identities p generated by an all-to-all alignment of all protein sequences encoded in a genome is well fitted with a power-law form ~p^(-gamma) with the value of the exponent gamma around 4 for the majority of organisms used in this study. This implies that the intra-protein variability of substitution rates is best described by the Gamma-distribution with the exponent alpha ~ 0.33. We separately measure the short-term ("raw") duplication and deletion rates r*_dup, r*_del which include gene copies that will be removed soon after the duplication event and their dramatically reduced long-term counterparts r_dup, r_del. Systematic trends of each of the four duplication/deletion rates with the total number of genes in the genome were analyzed. All but the deletion rate of recent duplicates r*_del were shown to systematically increase with N_genes. Abnormally flat shapes of sequence identity histograms observed for yeast and human are consistent with lineages leading to these organisms undergoing one or more whole-genome duplications.
Submitted 5 November, 2007; v1 submitted 21 July, 2005;
originally announced July 2005.
-
Binding properties and evolution of homodimers in protein-protein interaction networks
Authors:
Iaroslav Ispolatov,
Anton Yuryev,
Ilya Mazo,
Sergei Maslov
Abstract:
We demonstrate that Protein-Protein Interaction (PPI) networks in several eukaryotic organisms contain significantly more self-interacting proteins than expected if such homodimers had appeared randomly in the course of evolution. We also show that on average homodimers have twice as many interaction partners as non-self-interacting proteins. More specifically, the likelihood of a protein physically interacting with itself was found to be proportional to the total number of its binding partners. These properties of dimers are in agreement with a phenomenological model in which individual proteins differ from each other by the degree of their ``stickiness'', or general propensity towards interaction with other proteins, including themselves. A duplication of a self-interacting protein creates a pair of paralogous proteins interacting with each other. We show that such pairs occur more frequently than could be explained by pure chance alone. Similar to homodimers, proteins involved in heterodimers with their paralogs on average have twice as many interacting partners as the rest of the network. The likelihood of a pair of paralogous proteins interacting with each other was also shown to decrease with their sequence similarity. This all points to the conclusion that most interactions between paralogs are inherited from ancestral homodimeric proteins, rather than established de novo after the duplication. We finally discuss possible implications of our empirical observations from functional and evolutionary standpoints.
Submitted 3 January, 2005;
originally announced January 2005.
-
Upstream Plasticity and Downstream Robustness in Evolution of Molecular Networks
Authors:
Sergei Maslov,
Kim Sneppen,
Kasper Astrup Eriksen
Abstract:
Evolving biomolecular networks have to combine stability against perturbations with flexibility that allows their constituents to assume new roles in the cell. Gene duplication followed by functional divergence of associated proteins is a major force shaping molecular networks in living organisms. The recent availability of system-wide data for the yeast S. cerevisiae allows us to assess the effects of gene duplication on robustness and plasticity of molecular networks. We demonstrate that the upstream transcriptional regulation of duplicated genes diverges fast, losing on average 4% of their common transcription factors for every 1% divergence of their amino acid sequences. In contrast, the set of physical interaction partners of their protein products changes much more slowly. The relative stability of downstream functions of duplicated genes is further corroborated by their ability to substitute for each other in gene knockout experiments. We believe that the combination of upstream plasticity and downstream robustness is a general feature determining the evolvability of molecular networks.
Submitted 22 October, 2003;
originally announced October 2003.
-
Hierarchy Measures in Complex Networks
Authors:
Ala Trusina,
Sergei Maslov,
Petter Minnhagen,
Kim Sneppen
Abstract:
Using each node's degree as a proxy for its importance, the topological hierarchy of a complex network is introduced and quantified. We propose a simple dynamical process used to construct networks which are either maximally or minimally hierarchical. Comparison with these extremal cases as well as with random scale-free networks allows us to better understand hierarchical versus modular features in several real-life complex networks. For random scale-free topologies the extent of topological hierarchy is shown to smoothly decline with $γ$ -- the exponent of a degree distribution -- reaching its highest possible value for $γ\leq 2$ and quickly approaching zero for $γ>3$.
Submitted 19 February, 2004; v1 submitted 18 August, 2003;
originally announced August 2003.
-
Specificity and stability in topology of protein networks
Authors:
Sergei Maslov,
Kim Sneppen
Abstract:
Molecular networks guide the biochemistry of a living cell on multiple levels: its metabolic and signalling pathways are shaped by the network of interacting proteins, whose production, in turn, is controlled by the genetic regulatory network. To address topological properties of these two networks we quantify correlations between connectivities of interacting nodes and compare them to a null model of a network, in which all links were randomly rewired. We find that for both interaction and regulatory networks, links between highly connected proteins are systematically suppressed, while those between highly connected and low-connected pairs of proteins are favored. This effect decreases the likelihood of cross talk between different functional modules of the cell, and increases the overall robustness of a network by localizing effects of deleterious perturbations.
Submitted 17 May, 2002;
originally announced May 2002.
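The null model mentioned in the abstract above, a network "in which all links were randomly rewired", can be illustrated with a minimal sketch. The abstract does not spell out the rewiring rule, so this sketch assumes one common choice: repeated edge swaps that keep every node's degree fixed and reject swaps that would create self-loops or duplicate links. The function names `rewire` and `degrees` and the toy edge list are hypothetical.

```python
import random
from collections import Counter


def rewire(edges, n_swaps, seed=0):
    """Randomize an undirected simple graph by edge swaps.

    Assumption (not stated in the abstract): each swap replaces links
    a-b and c-d with a-d and c-b, so every node keeps its degree.
    The input is assumed to have no self-loops or duplicate links.
    """
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        a, b = edges[i]
        c, d = edges[j]
        # Flip the second link at random so both swap orientations occur.
        if rng.random() < 0.5:
            c, d = d, c
        if a == d or c == b:
            continue  # swap would create a self-loop
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present:
            continue  # swap would create a duplicate link
        present.discard(frozenset((a, b)))
        present.discard(frozenset((c, d)))
        present.update((new1, new2))
        edges[i], edges[j] = (a, d), (c, b)
    return edges


def degrees(edges):
    """Count how many links touch each node."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


# Toy network (hypothetical data, for illustration only).
toy_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (3, 4), (4, 5)]
randomized = rewire(toy_edges, n_swaps=100)
```

Comparing how often links join nodes of given degrees in the original network against an ensemble of such randomized copies is one way to quantify the kind of connectivity correlations the abstract describes.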