1. Introduction
During evolution, organisms adapt to the environment for survival following natural selection processes, which involve modification of physiological characteristics over generations. Those modifications imply changes in the genome that may lead to different genetic interactions to perform a specific function,
i.e., gene regulation. An evolutionary process can be divided in two parts. The first part is the modification of gene regulation (recombination rules and mutation), which leads to the modification of certain characteristics in a specific way. The second part optimizes the characteristics modification by survival of the best-adapted characteristics to their purpose (selection and fitness). From a dynamical systems perspective, these processes define an evolutionary dynamics that drives a system from an initial state, attractor A, which is stable under certain environmental conditions, through different trajectories in a state space until it reaches a new attractor B, which is stable under new environmental conditions (
Figure 1) [
1].
Figure 1.
Scheme of evolutionary dynamics from three different perspectives. (Top) From the perspective of dynamical systems; (Middle) From the perspective of the evolutionary process; (Bottom) From the perspective of networks topology.
Figure 1.
Scheme of evolutionary dynamics from three different perspectives. (Top) From the perspective of dynamical systems; (Middle) From the perspective of the evolutionary process; (Bottom) From the perspective of networks topology.
In
Figure 1top, the system in attractor A modifies its state according to certain dynamics that may lead the system to different attractors. These target attractors may represent states, which are far from the optimum possible state (in grey). A selection towards an optimum may lead attractor A to the optimum (or close to the optimum) attractor B. Dashed circles represent the robustness vicinity where small perturbations maintain the system under the same attractor (canalization). Dynamical rules and stochastic perturbations define the trajectories in the state space and determine their final states. The state space in the dynamical system matches with the fitness space in the evolutionary process. In
Figure 1middle, the system in A (green dot) evolves according to certain dynamics and it can reach a new stable state with low fitness (local maxima represented with grey dots). A selection process towards an optimum may lead A to a global maximum-fitness, or sufficiently high-fitness, B (blue dot).
Evolutionary processes inherently define an optimization mechanism so that the approximation to attractor B is not a random trajectory. Heuristic optimization methods and evolutionary algorithms mimic natural selection using recombination and mutation rules to approximate to attractor B. The application of heuristic optimization to the formal definition of the evolutionary dynamics of genetic regulation may help to understand the dynamical rules that lead from A to B.
In practice, evolutionary dynamics formalisms use a set of given rules to reveal the final states of a modified genetic regulation. In the recent years, this approach has been especially studied in the fixation probability of deleterious mutations in viral quasispecies [
2,
3], and cancer development [
4,
5,
6]. The application of evolutionary algorithms to the evolutionary dynamics of genetic regulation determines the modification rules given both initial A and final B states. In a previous work [
7], we have followed this approach to determine possible dynamical rules, which change genetic regulation between two different developmental stages of a single developmental process [
7]; Gerstung
et al. identified possible dynamical rules for cancer progression between different disease stages [
8].
In the present work, we review the recent methodology for evolutionary dynamics and heuristic optimization, and discuss the applicability of such methods to real biological problems.
Basic Concepts
Gene regulation formalisms. Gene regulation includes a wide range of mechanisms that determine the concentration of gene products in a specific process that can be associated with the result of a specific phenotype or biological function. In a simplistic definition, gene regulation can be represented by the interaction among certain genes (or gene products), which contribute to develop biological processes. Under this definition genetic regulation can be studied in networked frameworks, where the nodes represent genes (or their products) and the links represent the regulatory interactions among genes. In gene regulatory networks (GRN) we can match nodes with expressed (non-expressed) genes that are functionally active (inactive) in a specific process. Links can define positive regulatory interactions, they favor the activation of a non-expressed gene or the activity of an already expressed one, or negative, they favor the deactivation of an expressed gene or block the activation of a non-expressed one. The simplest and most common representation of GRNs is the Boolean type [
7,
9,
10,
11,
12,
13,
14,
15,
16,
17,
18,
19,
20,
21,
22,
23,
24,
25,
26,
27,
28,
29,
30,
31,
32,
33], where two discrete values define the state of network nodes; common values are 1 and 0 (or +1 and −1) whether the state is active and inactive respectively. Extensions to this representation comprise more than two discrete values to define gradual activity of a node; this is usually considered in combination with Fuzzy-logic techniques [
34]. Links action can also be defined as Boolean so that value 1 indicates a link between two nodes, and 0 indicates that those two nodes are unconnected. The GRN link-values define the adjacency matrix
M, where the entry
mij describes the interaction between genes
i and
j; the diagonal entry
mii describes the self-interaction of gene
i. It is also common to designate an action-intensity (weight) to the link using a graded-value interval, thus the entry
wij in the weight matrix
W defines the strength of the interaction between gene
i and
j. A GRN is a directed network, so
M and
W are non-symmetric. Usually, GRNs are sparse though they may contain highly connected nodes (hubs).
Continuum formalisms provide more detail representations of GRNs. As a recently explored instance, S-Systems defines the nodes in a GRN as variables in an ordinary differential equation (ODE) system [
13,
35,
36,
37,
38,
39,
40,
41,
42], which dynamics is defined by:
where
Xi define the genes expression (network nodes), which is given in time as a function of the expression levels of the other genes. The dynamical reaction kinetics in Equation (1) is defined using a power-law formalism that allows capturing complex non-linear relations. The two reaction terms in Equation (1) formally describe synthesis and degradation of gene
i, which are influenced by genes
j. α
i and β
i are reaction rates (synthesis and degradation respectively) and
gij and
hij are the positive kinetic orders of the reactions that indicate the strength of the influence of gene
j in the synthesis and degradation of gene
i.
Other instances of GRN definition are based on linear ODE systems [
16,
43,
44,
45,
46,
47,
48], artificial neural networks (ANN) [
29,
34,
49] and reticular systems [
30,
31,
32].
Evolutionary dynamics rules. The topology of a GRN defines the expression of the genes (nodes activity) in a determined time point. Evolutionary rules can modify nodes activity and network wiring in time so that the final topology of the network may be different from the initial one. In discrete systems, evolution rules are generally defined as threshold-type functions, either step-like (Heaviside) or sigmoidal (like the Hill function); the rules are applied using difference equations with a discrete time counter. In continuum representations the rules are applied using differential equations and a real variable for the time. During evolution, GRNs can be ranked according to a predefined criterion using a fitness function. This function measures the adaptability of the evolved system and drives the evolution process to an attractor. The fitness function is defined according to the specific system of study. In
Figure 1bottom, Network A (with green nodes) represents a certain function of a system (phenotypic/genotypic characteristic in terms of gene networks). Following a specific dynamics, network A changes its topology by rewiring the links and/or changing the number of nodes. This new topology may represent a state far from the optimum (networks with nodes in grey). A selection process towards an optimum may lead network A to network B (with blue nodes) satisfying a specific functionality.
2. Evolutionary Dynamics Methods
Our starting point in the evolutionary dynamics methods is Wagner’s model [
9,
10]. Wagner studied the influence of gene duplicity on gene expression in metazoan development. The GRN is discretely represented using value 1 for active nodes and −1 for inactive nodes. In Equation (2),
Si(
t) represents the state of the nodes in time
t. The system evolves applying the difference equations:
where
f(
x) indicates the sign of the expression, and
. The expression of the gene
Si in time (
t + τ) is a function of the action of the genes
Sj in time
t with
wij the regulatory strength of the interaction of the gene
Sj on
Si. If
x < 0 then
f(
x) = −1 and the regulation is negative, thus the target gene is deactivated. If
x > 0 then
f(
x) = +1 and the regulation is positive, thus the target gene is activated. If
x = 0 then
f(
x) = 0 and the target node is not modified. The characteristic time τ indicates the time unit during the evolutionary process. This dynamics leads either to an equilibrium or oscillatory state. For simplicity, Wagner studied only the equilibrium state
and analyzed the epigenetic stability that is the number of stable states. An optimal state
Sopt exists during the developmental process. If the final equilibrium state is different from
Sopt, the developmental process may suffer modifications and, therefore, the fitness of the adult organism is reduced. The evolutionary process, starting from an initial state, is numerically simulated for a set of individuals and is characterized by a certain state
S. The distance of the individual fitness value against the optimal is measured using the Hamming distance:
From Expression (3), the fitness is given by the function:
where β is a positive parameter that measures the selection strength. To test whether the evolution rule leads to an equilibrium state, it is applied twice. If the network is in an equilibrium state the fitness is measured, otherwise it is assigned the minimum fitness value, exp(−1/β). The individuals measured are ranked by their fitness value. The recombination process is only performed over selected individuals. Two individuals are selected for recombination if their normalized fitness is higher than a random number with uniform distribution between 0 and 1. The non-selected individuals are discarded. The recombination process implies the exchange of rows in the weight matrix
W between two individuals selected according to a highest fitness rule. The weights
wij ≠ 0 have a probability of modification (mutation) based on a pseudorandom value.
Wagner determined that gene duplicity does not affect the expression level.
Many authors have considered variations on Wagner’s model. As instances, Siegal and Bergman proposed the same dynamical law as in Equation (2). As a difference, the authors introduce a sign function in the form:
The sign function
f(
x) is a sigmoidal function with
a controlling the speed of change from an activation to a repression state [
15]. With this model, the authors revisited Waddington’s canalization concept that is the robustness during development to changes in the genome [
50]. They claim that the dynamical GRN driving a developmental process, described by iteration of Equations (2) and (5), evolves to constrain the genetic system to produce canalization, even without a selection towards an optimum fitness [
15]. To define the final state,
, of the evolutionary process, they define a measure similar to a variance in the form:
where
and
is the mean expression level during the time interval (
t − τ, …,
t). The initial state is randomly selected and the steady state,
, is reached when ψ(
) < 10
−4. The system is considered unstable if this threshold is not reach within 100 iterations. In this model, the fitness of an unstable individual is 0. The fitness function in steady-state systems is defined as:
with β the selection strength and
Sopt predefined the optimum solution. Note that the dynamical law, Equation (2), and Hamming distance, Equation (3), defined in Wagner’s model is a special case of this one considering the limit of α➔∞.
The recombination is performed by randomly exchanging W rows in individuals with similar fitness. Mutations are performed according to a normal distribution. The new individual is considered part of the new population if it reaches the steady state and its fitness is higher than a random number with uniform distribution.
Other instances study the effect of mutations with respect to the stability of the evolved state. Masel studied genetic assimilations that are mutations previously induced by external environmental conditions that later turn into inherited genetic modifications [
17]. The author added a noise component ε in the form of a random number drown from a normal distribution. The noise component may alter the state of a node
si in the dynamic equation:
where
f(
x) = 0 if
x < 0 and
f(
x) = 1 otherwise with
x =
WS(
t − 1)
i + ε. If
S does not remain at some stable value during four time steps in a row within 100 time steps, then the individual, represented by the matrix
W, does not reach equilibrium and it is assumed to be unviable (
W is discarded). The author found that genetic assimilation could occur in absence of a selection process that favors the assimilating modification. This is also in agreement with Waddington’s canalization.
Similarly, Espinosa-Soto
et al. analyzed the effect of genetic and non-genetic perturbations on an individual and its relation with the phenotypic plasticity that a regulatory circuit may reach to ease the adaptive evolution [
31]. Azevedo
et al. developed a model in which sexual reproduction produces negative epistasis that increases the capacity to purge deadly mutations. This suggests that sexual reproduction selects conditions to favor its own maintenance [
21]. Kaneko
et al. studied the robustness against noise and mutations in genetic expression. They used a different type of model based on stochastic differential equations [
44].
3. Inference of GRNs Using Evolutionary Algorithms
One of the motivations for evolutionary dynamics methods is the interpretation of real experimental data. Several methods are used to infer this interpretation. In the case of genomics, this inference provides GRNs that are able to satisfy the experimental conditions. The most recently used methods are
evolutionary algorithms (EA) [
51].
Among the EA one can find the genetic algorithms (GA), evolution strategies (ES), evolutionary programming (EP), simulated annealing (SA), ant colonies (AC) and immune algorithms (IA) [
51]. Moreover, hybrid algorithms are also considered. Here, two or more methods are combined, or they are applied with other selection/learning methods such as artificial neural networks (ANN) and S-systems. We first discuss two of the most common single methods.
One of the, so-called, classical evolutionary algorithms are the
genetic algorithms. The GAs are based on natural selection. They contemplate inheritance (recombination), mutation, adaptation and elitism. The general scheme (
Figure 2) of these methods is the following: the starting point is the generation of individuals (often called
chromosomes) representing the possible solutions to a problem. The individuals are ranked according to a specific fitness function that measures the proximity of the individual’s solution to a predefined optimal solution (commonly the fit to experimental data). The selection of individuals to form
offspring that are new individuals, which are supposed to maintain most of the good qualities of their
parents, is fitness dependent. The offspring is form by crossover methods of the selected individuals. The new individuals can mutate. Mutations are small perturbations, which may eventually lead to better characteristics. This reproductive cycle intends to improve the quality of the individuals according to the fitness criteria.
In the group of the considered modern EA, one of the most used is the simulated annealing. While GA mimic laws of evolution in biology, SA emulates a metallurgic technique used to obtain uniform composition alloys. The fitness (energy) function has multiple local minima (metastable states) and it should be minimal in the thermodynamic equilibrium. In experimental annealing, a solid is heated to a high temperature, whereby particles in the liquid phase have a great freedom to adapt to a minimum energy configuration. Then, the cooling program slowly decreases temperature T, so that for each T value the system reaches an equilibrium state. The process can be described starting from the maximum temperature and considering that for each T value the solid reaches its thermal equilibrium. Each temperature-dependent state is characterized by an energy E given by the Boltzmann distribution, provided that the cooling program is sufficiently slow. As temperature lowers, the Boltzmann distribution concentrates in lower energy states. Finally, when the temperature is close to zero, only the lowest energy state has a nonzero probability. If the cooling program is done too fast, the solid can reach metastable structures.
Figure 2.
Workflow schemes for general genetic algorithms (GA) and simulated annealing (SA) implementation.
Figure 2.
Workflow schemes for general genetic algorithms (GA) and simulated annealing (SA) implementation.
In heuristic optimization, the SA method matches the fitness function with the energy function, which includes an artificial control parameter as temperature. The set of possible solutions is mapped to the possible states of the physical system. The steady state (the optimal solution) that minimizes the energy (fitness function) is achieved with high probability at the end of the SA process. The temperature parameter gradually decreases from an initial high value.
Instances of these methods in evolutionary dynamics of GRN are the following: Ina previous work we applied a GA to decipher the dynamical rules that drive the evolution of a GRN between two different developmental stages. The GA modifies both the network links and the evolution rules applied to each node. We determined that, in many cases, different rules might produce the same effect in the network evolution [
7]. Kobayashi
et al. applied a SA method to optimize a network topology so that the GRN exhibit stable sustained oscillations with a predefined period [
46,
47]. The authors considered the Metropolis algorithm from Monte Carlo method with the cooling program
T = μ
F, where
T is the effective temperature, μ is a normalization constant and
F is the fitness function. The fitness
F decreases with
T until it reaches a sufficiently low value. The fitness function is defined as:
where
P0 is the searched period,
P is the mean period during the simulation time and σ is the period
P variance, which should vanish if the GRN dynamics is really periodic.
Though single methods can get sufficiently good solutions, most recent studies are based on hybrid methods. Some of the most relevant are discussed below.
Tominaga
et al. designed an inference method for GRNs combining S-systems with GA [
35]. The motivation for this research is to fit a dynamical GRN output with experimental time series. In Tominaga’s method, the set of S-system parameters (α
i, β
i,
gij,
hij) forms an individual. The fitness function (Equation (11)) measures the proximity of the S-system solution for each individual with respect to an experimental temporal series.
where
n is the number of observable state variables,
m is the number of experimental points,
(
t) are the simulated values and
Xi(
t) are the experimental values.
Kikuchi et al. performed modifications on Tominaga’s model to increase both the number of parameters to optimize and the convergence rate to an optimum fit, Equation (12). This modification adds an extra term that favors the individuals with low gij and hij.
where
c is a balance weight between the two terms [
36]. The individuals with
gij and
hij under a pre-defined threshold are set to zero. This introduces a pruning mechanism that makes the network successively sparser. After a certain number of generations minimizing the network topology, Equation (11) is used for a refinement search.
Noman and Hiba performed a modification on Equation (12), see Equation (13), where they introduced a higher precision in the network topology search by adding a Hill Climbing local search [
40,
41].
where
Kij are the kinetic orders (
gij and
hij) of gene
i sorted in ascending order of their absolute values and
I is the maximum in-degree in the network. This modification allows finding the zero valued parameters increasingly and thus obtains a minimized topology more efficiently.
The combination of S-systems with immune algorithms (IA) reduces substantially the computational time to infer GRNs in comparison to a combination of S-systems and GA [
42]. The IA is a heuristic search algorithm that mimics a simple acquired immunity mechanism. In the immune system, the anti-bodies are produced to bind the antigen. The organism modifies the anti-body production to optimize the binding efficiency. The IA generates an initial anti-body population. A binding affinity characterizes each anti-body in the form:
where
F is the fitness function in Equation (11). The IA increases its search capacity by introducing a memory mechanism.
Other instances of hybrid methods are S-systems with SA [
48], GA with ANN [
49], GA with a Simplex method [
43], ANN with Fuzzy-logic [
34].
Sirbu
et al. performed a comparison among different simple and hybrid methods [
29]. On those methods, the authors measured some characteristics such as the fitting over experimental data, robustness against different simulation runs and noise, quality of the adjusted parameters, mean runtime and number of function calls. They chose two classic methods: Classic genetic algorithm (CLGA) [
35], and multi-objective genetic algorithm (MOGA) [
43]. Moreover, five hybrid algorithms: Genetic algorithm with evolution strategy (GA + ES) [
38], genetic algorithm with artificial neural network (GA + ANN) [
49], iterative GA (PEACE1) [
36], local genetic search (GLSDC) [
37], and differential evolution as search strategy with Akaike’s information criterion (DE + AIC) [
40,
41]. The five hybrid methods showed good data fitting. GA + ANN showed the best fitness followed by GLSDC. GA + ANN also showed the highest robustness against different simulation runs and it was also capable to identify more correct interactions than the rest of the methods. Though the S-system methods showed better mean performance, GLSDC method seemed more convenient for a quantitative analysis. According to robustness against noise, GA + ANN and GLSDC were at the top of the ranking. In general, hybrid methods were more efficient than single methods.
4. Discussion and Conclusions
Canalization and evolvability are two recurrent concepts in evolutionary dynamics. Canalization is the capacity of a developing system to show robustness against changes in the genotype and environment. Evolvability is the capacity of a system to adapt to environmental changes. These two concepts have counter definitions. However, both capacities are complementary to describe an evolutionary dynamic system. This system needs to show evolvability to properly adapt to changes in the environment and it needs to maintain the new characteristic over generations in a robust manner, thus showing canalization. Evolutionary algorithms are also able to show evolvability, as they are design to lead a system from attractor A to attractor B according to a specific fitness. And they show canalization as individuals evolve over generations to keep closer to the optimum fit.
We reviewed important methodology to study evolutionary dynamics in networked systems applied to gene networks, where both canalization and evolvability can be explored. Some studies relate innovation and robustness in canalization and conclude that robustness is a system property that can itself evolve [
22,
23]. Some other works study the effect of recombination under genetic variations [
25,
52], and the capacity of a population to adapt to certain conditions [
28]. Other instance studied canalization during evolution, modularity and robustness in body patterning [
53]. Recent studies showed that specificity in GRNs modifies the evolution of network motifs (recurrent sub-networks) concentration (modularity). This may be useful to build up new characteristics as combination of simple motifs [
30].
In the recent years, the amount of data available that describe biological processes has highly increased. These data, which mostly correspond to developmental biology experiments and evolutionary diseases like cancer, can be studied from the evolutionary dynamics perspective. With this work we want to highlight the use of EA in combination with evolutionary dynamics methods to tackle these problems. In this case, attractors A and B are known (experimental data) and the in-between dynamics is the unknown. Morphogenesis (
i.e., the formation of the size and shape of tissues/organs), one of the hot topics in developmental biology, is widely studied through the evolutionary dynamics perspective using EA to fit experimental data. Works in the field include the canalization of genetic expression and the segmentation of the fruit-fly embryogenesis [
45,
54,
55], and the evolvability through different developmental stages of the GRN that drives the formation of the mouse eye [
9]. From these works we learn how regulation can be done and modified during the development of some specific systems. An interesting question arises from this point: Do similar evolutionary dynamics apply to different developmental processes and times? Evolutionary developmental biology (Evo-Devo) studies how genetic regulation is conserved during evolution. As well-known instances, the fruit fly and the mouse eyes are far evolutionarily speaking, and share developmental genetic mechanisms [
56], thus there are similar genetic interactions in different species; also different tissues share similar developmental genetic mechanisms in a single organism, e.g.,
Pax genes family during the organogenesis of mouse eye and kidney [
57]. Though we are still far from deciphering general mechanisms for the modification of gene regulation, the combination of evolutionary algorithms with evolutionary dynamics methods build a powerful tool to explain experimental observations in biological and biomedical research.