Exact path-integral representation of the Wright-Fisher model with mutation and selection

David Waxman Centre for Computational Systems Biology, ISTBI, Fudan University, 220 Handan Road, Shanghai 200433, China

Abstract

The Wright-Fisher model describes a biological population containing a finite number of individuals. In this work we consider a Wright-Fisher model for a randomly mating population, where selection and mutation act at an unlinked locus. The selection acting has a general form, and the locus may have two or more alleles. We determine an exact representation of the time dependent transition probability of such a model in terms of a path integral. Path integrals were introduced in physics and mathematics, and have found numerous applications in different fields, where a probability distribution, or closely related object, is represented as a ‘sum’ of contributions over all paths or trajectories between two points. Path integrals provide alternative calculational routes to problems, and may be a source of new intuition and suggest new approximations. For the case of two alleles, we relate the exact Wright-Fisher path-integral result to the path-integral form of the transition density under the diffusion approximation. We determine properties of the Wright-Fisher transition probability for multiple alleles. We show how, in the absence of mutation, the Wright-Fisher transition probability incorporates phenomena such as fixation and loss.

keywords:

random genetic drift, frequency trajectories, multiple alleles, transition probability, Markov chain, allele frequency

Introduction

The Wright-Fisher model describes a biological population containing a finite number of individuals[1, 2]. It represents a fundamental model (or class of models) within population genetics that continues to be of relevance [3]. Such a model, at heart, describes the stochastic fluctuations of allele frequencies that occur in finite populations. The fluctuations arise because the parents of one generation do not all make the same contribution to the next generation. The Wright-Fisher model thus describes random genetic drift.

In addition to random genetic drift, the model can also incorporate the dynamical effects of selection, mutation and other evolutionary forces, and has been used in many different situations. For example, the model lies at the heart of forward simulations[4], is used in inference[5] and in evolutionary games[6], and connects intimately with the coalescent[7].

The Wright-Fisher model is a discrete state, discrete time, Markov chain. The discrete states correspond to possible allele frequencies (or sets of allele frequencies), and the discrete time corresponds to generations. The model has been analysed under the diffusion approximation[8, 3], where states and times are treated as taking continuous values. Recently, the Wright-Fisher model of a biallelic locus, subject to multiplicative selection but in the absence of mutation, has been considered under the diffusion approximation, and the time-dependent transition density has been represented as a path-integral[9].

Path integrals were introduced into quantum theory by Feynman[10], and into the mathematics of diffusion by Wiener[11]. Generally, a probability distribution (or closely related object), associated with a diffusion-like process, is represented as a ‘sum’ of contributions over all paths or trajectories between two points or states. The ‘sum’ over paths is often an integral, and this is the origin of the name ‘path integral’. An alternative name is ‘functional integral’, since integration over paths/trajectories can also be thought of as integration over functions - namely the trajectories themselves. Related approaches, termed functional methods, have a large variety of applications[12].

The introduction of path integrals in physics has provided alternative calculational routes to problems. Beyond this, path integrals, because they focus on visualisable trajectories, may be a source of new intuition, and may suggest new ways to proceed and new approximations[10].

In the present paper we work fully within the framework of a Wright-Fisher model where both states and time are discrete; we do not employ the diffusion approximation, although we relate some results to this approximation. We consider a randomly mating sexual population, where a general scheme of selection acts at an unlinked locus, which is also subject to mutation. We derive an exact path-integral representation of the time-dependent transition probability of this model.

We first consider a locus with two alleles, and then generalise to the locus having $n$ alleles, where $n$ can take the values $2$ , $3$ , $4$ , … . There are numerous examples of studies of loci with two alleles, and there are increasing numbers of examples where multiple ( $>2$ ) alleles exist at a locus[13, 14, 15, 16].

Theoretical background for the case of two alleles

Consider a population that is diploid and sexual, and reproduces by random mating. We assume there is an equal sex ratio and no sexual dimorphism. Time is measured in generations, and labelled by $t=0,1,2,3,\ldots$ .

The organisms in the population are subject to mutation and selection at a single unlinked locus. The locus has two alleles that we refer to as $A_{1}$ and $A_{2}$ . We shall focus on just one of the two alleles, say $A_{1}$ , and often call it the focal allele.

With two alleles, the sum of their frequencies is unity. Thus specification of the frequency of the focal allele at any time determines the frequency of the other allele at this time. The state of the population can thus be described just in terms of the frequency of the focal allele, which we will often just call the frequency.

Effectively infinite population

We first consider a very large (effectively infinite) population.

We take a generation to begin with adults, who sexually reproduce via random mating and then die. Each mating yields the same very large number of offspring.

If the frequency of the focal allele in adults is $x$ , then the frequency of the focal allele in offspring is $x^{\ast}=f_{mut}(x)$ where $f_{mut}(x)$ takes into account frequency changes caused by mutation. This function is given by

f_{mut}(x)=\left(1-u\right)x+v(1-x)

(1)

where $u$ is the probability that an $A_{1}$ allele in a parent mutates to an $A_{2}$ allele in an offspring, and $v$ is the corresponding $A_{2}$ to $A_{1}$ mutation probability. In the absence of mutation $f_{mut}(x)=x$ which is equivalent to $x^{\ast}=x$ .

We assume viability selection determines the probability of different offspring surviving to maturity. If the frequency of the focal allele in offspring (i.e., after mutation) is $x^{\ast}$ , then the frequency of this allele after viability selection has acted is $f_{sel}(x^{\ast})$ , where $f_{sel}(x^{\ast})$ takes into account frequency changes of $x^{\ast}$ due to selection. Some non selective thinning may occur at this point, but providing the population size remains very large, this does not cause any further changes in allele frequencies.

Selection acts on variation in the population, and when there is no variation there are no effects of selection. There is no variation when carriers of only one allele are present in the population, which corresponds to $x=0$ and $x=1$ . We take

f_{sel}(x)=x+\sigma(x)x(1-x)

(2)

where the function $\sigma(x)$ (with $|\sigma(x)|<\infty$ ) is determined by the particular scheme of selection that is operating, and the effect of selection in $f_{sel}(x)$ , namely $\sigma(x)x(1-x)$ , has the required property of vanishing at both $x=0$ and $x=1$ .

A few examples of $\sigma(x)$ are as follows.

(i)

If the relative fitnesses of the three genotypes $A_{1}A_{1}$ , $A_{1}A_{2}$ and $A_{2}A_{2}$ are $1+s$ , $1+hs$ and $1$ , respectively, then $\sigma(x)=s\times\left[\left(1-2h\right)x+h\right]/[1+sx^{2}+2hsx(1-x)]$ . To leading order in $s$ (assuming $|s|\ll 1$ ), we have $\sigma(x)=s\times\left[\left(1-2h\right)x+h\right]$ , in which case any $h\neq 1/2$ will lead to $\sigma(x)$ varying with $x$ .
(ii)

If selection is additive, and the relative fitnesses of the three genotypes are $1+2s$ , $1+s$ and $1$ , respectively, then $\sigma(x)=s/\left(1+2sx\right)$ , and to leading order in $s$ we have $\sigma(x)=s$ , i.e., a constant.
(iii)

If selection is multiplicative, and the relative fitnesses of the three genotypes are $\left(1+s\right)^{2}$ , $1+s$ and $1$ , respectively, then $\sigma(x)=s/(1+sx)$ , and to leading order in $s$ we have $\sigma(x)=s$ , i.e., a constant. Thus weak multiplicative selection is very similar, in effect, to weak additive selection.
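The three expressions for $\sigma(x)$ listed above can be checked numerically. The following is a minimal sketch, assuming random mating so that offspring genotypes are in Hardy-Weinberg proportions before viability selection acts; the function name `sigma_from_fitness` and the numerical values of `s`, `h` and `x` are illustrative choices, not part of the original analysis.

```python
import math

def sigma_from_fitness(x, w11, w12, w22):
    """Return sigma(x), defined via f_sel(x) = x + sigma(x) * x * (1 - x) (Eq. (2))."""
    w1 = w11 * x + w12 * (1 - x)      # marginal fitness of allele A1
    w2 = w12 * x + w22 * (1 - x)      # marginal fitness of allele A2
    w_bar = x * w1 + (1 - x) * w2     # mean fitness of the population
    return (w1 - w2) / w_bar

s, h, x = 0.3, 0.2, 0.37              # illustrative values only

# sigma(x) computed directly from the genotype fitnesses of examples (i)-(iii)
general = sigma_from_fitness(x, 1 + s, 1 + h * s, 1)
additive = sigma_from_fitness(x, 1 + 2 * s, 1 + s, 1)
multiplicative = sigma_from_fitness(x, (1 + s) ** 2, 1 + s, 1)

# Closed-form expressions quoted in examples (i)-(iii)
general_formula = s * ((1 - 2 * h) * x + h) / (1 + s * x**2 + 2 * h * s * x * (1 - x))
additive_formula = s / (1 + 2 * s * x)
multiplicative_formula = s / (1 + s * x)
```

The identity used in the function follows from $f_{sel}(x)=xw_{1}/\bar{w}$, which gives $f_{sel}(x)-x=x(1-x)(w_{1}-w_{2})/\bar{w}$, with $w_{1}$ and $w_{2}$ the marginal fitnesses and $\bar{w}$ the mean fitness.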

We note that while small selection coefficients (i.e., small values of $s$ , and more generally small values of $\sigma(x)$ ) are common in nature [17], strongly selected alleles do sometimes occur, for example alleles that are appreciably deleterious [13]. Accordingly, we will not make the assumption that $|\sigma(x)|$ is small, and will simply assume that $\sigma(x)$ follows, without any approximation, from a selection scheme.

The frequency of the $A_{1}$ allele in offspring, after selection and mutation have acted, can be expressed in terms of the frequency, $x$ , in adults, as $f_{sel}(x^{\ast})\equiv f_{sel}(f_{mut}(x))$ and we write

F(x)=f_{sel}(f_{mut}(x)).

(3)

Let $X(t)$ denote the frequency of the $A_{1}$ allele in the adults of generation $t$ . Then in a very large population, the frequency obeys the deterministic equation

X(t+1)=F(X(t)).

(4)
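As an illustration of Eqs. (1) to (4), the deterministic recursion can be iterated directly. This is a minimal sketch under assumed parameter values (multiplicative selection with $s=0.05$ and $u=v=10^{-4}$); the function names mirror the symbols in the text, while `iterate` is a hypothetical helper introduced only for this example.

```python
def f_mut(x, u, v):
    """Frequency after mutation, Eq. (1)."""
    return (1 - u) * x + v * (1 - x)

def f_sel(x, sigma):
    """Frequency after viability selection, Eq. (2); sigma is a function of x."""
    return x + sigma(x) * x * (1 - x)

def F(x, u, v, sigma):
    """Combined effect of mutation followed by selection, Eq. (3)."""
    return f_sel(f_mut(x, u, v), sigma)

def iterate(x0, generations, u, v, sigma):
    """Iterate X(t+1) = F(X(t)), Eq. (4), returning [X(0), ..., X(generations)]."""
    xs = [x0]
    for _ in range(generations):
        xs.append(F(xs[-1], u, v, sigma))
    return xs

# Illustrative parameter values (assumptions, not taken from the paper)
s, u, v = 0.05, 1e-4, 1e-4
sigma_mult = lambda x: s / (1 + s * x)      # multiplicative selection, example (iii)
trajectory = iterate(0.1, 200, u, v, sigma_mult)
```

With these values the focal allele is favoured, so the deterministic trajectory rises from its initial frequency towards a mutation-selection balance close to, but below, unity.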

Finite population

We now consider a finite population, where $N$ adults are present at the start of each generation. The processes of reproduction, mutation and viability selection occur as in an effectively infinite population. However, after viability selection there is a round of non selective sampling/number regulation of the mature offspring, that leads to $N$ individuals being present in the population. These become the $N$ adults of the next generation. The behaviour of this population can be described by a Wright-Fisher model, as is shown in textbooks[18]. We will now use such a model (which can, like the diffusion approximation, incorporate an effective population size[19]).

For the population under consideration, let $\mathbf{M}$ denote the transition matrix of the Wright-Fisher model. We write the $(i,j)$ element of $\mathbf{M}$ as $M_{i,j}$ , where $i$ and $j$ can take the values $0,1,2,\ldots,2N$ . Then for a population where the focal allele has the frequency $j/(2N)$ in one generation, the probability that the focal allele will have the frequency $i/(2N)$ in the next generation is $M_{i,j}$ . With $\binom{a}{b}=\tfrac{a!}{(a-b)!b!}$ a binomial coefficient, we have[18]

M_{i,j}=\binom{2N}{i}\left[F\left(\frac{j}{2N}\right)\right]^{i}\left[1-F\left(\frac{j}{2N}\right)\right]^{2N-i}.

(5)

The transition matrix is always normalised in the sense that $\sum_{i=0}^{2N}M_{i,j}=1$ for all $j$ , and invoking normalisation can resolve ambiguities (for example, when $F(0)=0$ , the element $M_{0,0}$ formally contains the factor $0^{0}$ , and the normalisation condition $\sum_{i=0}^{2N}M_{i,0}=1$ ensures $M_{0,0}=1$ ).

The transition matrix is key to many calculations. If $\mathbf{P}(t)$ is a column vector containing the probabilities of all $2N+1$ possible frequency states of the population in generation $t$ , i.e., the probability distribution for generation $t$ , then using the transition matrix we can determine the probability distribution for generation $t+1$ , namely $\mathbf{P}(t+1)=\mathbf{MP}(t)$ . Furthermore, using the elements of the transition matrix, we can determine the probability that the population passes through a particular set of frequency states over time, i.e., displays a particular frequency trajectory. For example, if the population starts with frequency $l/(2N)$ in one generation, then the probability that in the next $3$ generations the population will have the frequencies $k/(2N)$ , $j/(2N)$ and $i/(2N)$ , respectively, is given by $M_{i,j}M_{j,k}M_{k,l}$ .
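The construction of $\mathbf{M}$ from Eq. (5), the update $\mathbf{P}(t+1)=\mathbf{MP}(t)$ and the probability of a short trajectory can be sketched as follows. The population size, the parameter values and the helper name `transition_matrix` are assumptions made for illustration; the sketch is intended only for small $N$, where binomial coefficients can be evaluated directly.

```python
import numpy as np
from math import comb

def transition_matrix(N, F):
    """Return the (2N+1) x (2N+1) matrix with elements M[i, j] of Eq. (5)."""
    n = 2 * N
    M = np.zeros((n + 1, n + 1))
    for j in range(n + 1):
        p = F(j / n)
        for i in range(n + 1):
            # Python evaluates 0.0 ** 0 as 1.0, consistent with normalisation
            M[i, j] = comb(n, i) * p ** i * (1 - p) ** (n - i)
    return M

# Illustrative parameter values (assumptions)
N = 5
s, u, v = 0.1, 0.01, 0.01

def F(x):
    x_star = (1 - u) * x + v * (1 - x)                              # Eq. (1)
    return x_star + (s / (1 + s * x_star)) * x_star * (1 - x_star)  # Eq. (2)

M = transition_matrix(N, F)

# Distribution after one generation, starting with certainty at frequency 2/(2N)
P0 = np.zeros(2 * N + 1)
P0[2] = 1.0
P1 = M @ P0

# Probability of the trajectory 2/(2N) -> 3/(2N) -> 4/(2N) -> 5/(2N)
traj_prob = M[5, 4] * M[4, 3] * M[3, 2]
```

Each column of `M` sums to unity, and `P1` coincides with column 2 of `M`, as expected for a population that starts in a definite state.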

Alternative notation for the transition matrix

We shall now write elements of $\mathbf{M}$ in a different notation that will be useful for our purposes. We introduce the notion of an allowed frequency of an allele which is given by

\text{allowed frequency}=\frac{\text{integer}}{2N}

(6)

where the integer can take any of the values $0,1,2,\ldots,2N$ .

To keep the notation as simple as possible, we shall, for a locus with two alleles, reserve the use of $a$ , $x$ , $x^{\prime}$ , $x(r)$ (for various integral $r$ ) and $z$ , as values of allowed frequencies. In terms of the allowed frequencies $x^{\prime}$ and $x$ , we write the elements of $\mathbf{M}$ as

M(x^{\prime}|x)=\binom{2N}{2Nx^{\prime}}\left[F(x)\right]^{2Nx^{\prime}}\left[1-F(x)\right]^{2N(1-x^{\prime})}

(7)

which gives the probability of a transition from the population state (i.e., frequency) $x$ , in one generation, to state $x^{\prime}$ in the next generation. Thus if $x^{\prime}=i/(2N)$ and $x=j/(2N)$ , with $i$ and $j$ any two of the integers $0,1,2,\ldots,2N$ , then $M(x^{\prime}|x)$ coincides with $M_{i,j}$ in Eq. (5).

We shall refer to a locus that is not subject to selection (but which may be subject to mutation), as a neutral locus. The transition matrix of a neutral locus, written $M^{(0)}(x^{\prime}|x)$ , is obtained from $M(x^{\prime}|x)$ by setting $\sigma(x)$ to zero for all $x$ . With

x^{\ast}=f_{mut}(x)

(8)

this leads to

M^{(0)}(x^{\prime}|x)=\binom{2N}{2Nx^{\prime}}\left(x^{\ast}\right)^{2Nx^{\prime}}\left(1-x^{\ast}\right)^{2N(1-x^{\prime})}.

(9)

A central aspect of the analysis we present is that the form of $F(x)$ in Eq. (3) allows us to write the transition matrix $M(x^{\prime}|x)$ of Eq. (7) as the exact product of two factors:

M(x^{\prime}|x)=M^{(0)}(x^{\prime}|x)\times e^{C(x^{\prime}|x)}

(10)

where $M^{(0)}(x^{\prime}|x)$ is the neutral result given in Eq. (9), while, with $x^{\ast}$ given by Eq. (8), we have

C(x^{\prime}|x)=2N\times\big[x^{\prime}\ln\big(1+\sigma(x^{\ast})\left(1-x^{\ast}\right)\big)+(1-x^{\prime})\ln\big(1-\sigma(x^{\ast})x^{\ast}\big)\big]

(11)

(see the first subsection of Methods for details).

The factorisation in Eq. (10) says the transition matrix $M(x^{\prime}|x)$ depends on a ‘core’ neutral/mutational part, $M^{(0)}(x^{\prime}|x)$ , and a factor $e^{C(x^{\prime}|x)}$ that is ‘selectively controlled’ in the sense that if there is no selection (i.e., if $\sigma(x)$ vanishes for all $x$ ) then $C(x^{\prime}|x)$ vanishes and the factor $e^{C(x^{\prime}|x)}$ is simply unity.

We note that while $C(x^{\prime}|x)$ depends on selection, and precisely vanishes in the absence of selection, it also depends on the population size, $N$ , and the mutation rates $u$ and $v$ .
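The factorisation in Eq. (10) rests on the identities $F(x)=x^{\ast}\left[1+\sigma(x^{\ast})(1-x^{\ast})\right]$ and $1-F(x)=(1-x^{\ast})\left[1-\sigma(x^{\ast})x^{\ast}\right]$, and it can be verified numerically. The sketch below assumes multiplicative selection and non-zero mutation rates (so that $x^{\ast}$ is never exactly $0$ or $1$ and all logarithms are finite); the parameter values are illustrative.

```python
from math import comb, log, exp

# Illustrative parameter values (assumptions)
N, u, v, s = 4, 0.01, 0.02, 0.5
n = 2 * N
allowed = [i / n for i in range(n + 1)]       # allowed frequencies, Eq. (6)

def sigma(x):
    return s / (1 + s * x)                    # multiplicative selection, example (iii)

def f_mut(x):
    return (1 - u) * x + v * (1 - x)          # Eq. (1)

def F(x):
    x_star = f_mut(x)
    return x_star + sigma(x_star) * x_star * (1 - x_star)   # Eq. (3)

def M(xp, x):
    """Full transition probability, Eq. (7)."""
    k = round(n * xp)
    return comb(n, k) * F(x) ** k * (1 - F(x)) ** (n - k)

def M0(xp, x):
    """Neutral transition probability, Eq. (9)."""
    k = round(n * xp)
    x_star = f_mut(x)
    return comb(n, k) * x_star ** k * (1 - x_star) ** (n - k)

def C(xp, x):
    """Selectively controlled exponent, Eq. (11)."""
    x_star = f_mut(x)
    return n * (xp * log(1 + sigma(x_star) * (1 - x_star))
                + (1 - xp) * log(1 - sigma(x_star) * x_star))

# Largest discrepancy between M and M0 * exp(C) over all pairs of allowed frequencies
max_err = max(abs(M(xp, x) - M0(xp, x) * exp(C(xp, x)))
              for xp in allowed for x in allowed)
```

For these parameters `max_err` is at the level of floating-point rounding, consistent with the factorisation being exact rather than approximate.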

Trajectories and path integral

We now consider a trajectory of the frequency, which starts at frequency $x(0)$ in generation $0$ , has frequency $x(1)$ in generation $1$ ,…, and frequency $x(t)$ in generation $t$ . To represent such a trajectory, which runs from time $0$ to time $t$ , we use the notation

[x]_{0}^{t}=\left(x(0),x(1),...,x(t)\right).

(12)

This expresses the trajectory as a row vector with $t+1$ elements, all of which are allowed frequencies.

The probability of occurrence of the trajectory $[x]_{0}^{t}$ in Eq. (12) is obtained by multiplying together the appropriate $M(x^{\prime}|x)$ and is given by ${\textstyle\prod\limits_{r=1}^{t}}M(x(r)|x(r-1))={\textstyle\prod\limits_{r=1}^{t}}M^{(0)}(x(r)|x(r-1))\times e^{\sum_{r=1}^{t}C(x(r)|x(r-1))}$ . We write this as

\text{probability of }[x]_{0}^{t}=W^{(0)}\left([x]_{0}^{t}\right)\times e^{C\left([x]_{0}^{t}\right)}

(13)

where

W^{(0)}\left([x]_{0}^{t}\right)={\textstyle\prod_{r=1}^{t}}M^{(0)}(x(r)|x(r-1))

(14)

and

C\left([x]_{0}^{t}\right)=\sum_{r=1}^{t}C(x(r)|x(r-1)).

(15)

Equation (13) says that the trajectory $[x]_{0}^{t}$ , in the presence of selection, has a probability which we can write as the product of the probability of the trajectory under neutrality, $W^{(0)}\left([x]_{0}^{t}\right)$ , and the factor $e^{C\left([x]_{0}^{t}\right)}$ which is selectively controlled. The presence of $e^{C\left([x]_{0}^{t}\right)}$ in Eq. (13) indicates how non-zero selection (in combination with other forces, via its $N$ , $u$ and $v$ dependence), modifies the probability of occurrence of an entire trajectory under neutrality.

Let $K(z,t|a,0)$ denote the overall probability of going from an initial allowed frequency of $a$ at time $0$ to the allowed frequency of $z$ at time $t$ . In conventional (Markov chain) language, $K(z,t|a,0)$ is determined from matrix elements of $\mathbf{M}^{t}$ , where $\mathbf{M}$ is the transition matrix that was introduced above (with elements given in Eq. (5)). By contrast, in trajectory language, all possible trajectories between the end-points ( $a$ at time $0$ and $z$ at time $t$ ) contribute to $K(z,t|a,0)$ . We can thus write $K(z,t|a,0)$ as a sum of the trajectory probability, $W^{(0)}\left([x]_{0}^{t}\right)\times e^{C\left([x]_{0}^{t}\right)}$ , over all possible trajectories. That is

K(z,t|a,0)=\overset{x(t)=z}{\underset{x(0)=a}{\sum\cdots\sum}}W^{(0)}\left([x]_{0}^{t}\right)e^{C\left([x]_{0}^{t}\right)}.

(16)

The notation in Eq. (16) denotes a sum over all trajectories whose end points, $x(0)$ and $x(t)$ , have the specific allowed frequencies $a$ and $z$ , respectively, while $x(1)$ , $x(2)$ , $...,x(t-1)$ , which give the state of the population at intermediate times, take values that cover all allowed frequencies. In Figure 1 we illustrate two trajectories that contribute to a transition probability.

Figure 1: Contributing trajectories. An illustration of two trajectories (i.e., frequency as a function of time) that contribute to the transition probability $K(z,T|a,0)$ , where the initial frequency is $a=0.2$ at time $0$ , and the final frequency is $z=0.8$ at time $T=50$ . All trajectories that contribute to $K(z,T|a,0)$ take only allowed frequencies.

Equation (16) is an exact ‘path integral’ or ‘sum over paths’ representation of the finite time transition probability in a two allele Wright-Fisher model where states and times are discrete, and the model incorporates mutation and a general form of selection.

Since, by construction, $W^{(0)}\left([x]_{0}^{t}\right)$ is independent of selection, and since $C\left([x]_{0}^{t}\right)$ vanishes when selection vanishes, the transition probability corresponding to that in Eq. (16), when there is no selection, namely the neutral probability, is written as $K^{(0)}(z,t|a,0)$ and given by

K^{(0)}(z,t|a,0)=\overset{x(t)=z}{\underset{x(0)=a}{\sum\cdots\sum}}W^{(0)}\left([x]_{0}^{t}\right).

(17)

The quantity $K(z,t|a,0)$ in Eq. (16) can also be interpreted as the probability distribution of the frequency (of the $A_{1}$ allele) at time $t$ , which is a random variable that we write as $X(t)$ . In particular, $K(z,t|a,0)$ is the value of the distribution of $X(t)$ , when evaluated at frequency $z$ , given that the frequency $X(0)$ had the definite value $a$ . Thus, for example, the expected value of $X(t)$ , given $X(0)=a$ , is $E[X(t)|X(0)=a]=\sum_{z}z\times K(z,t|a,0)$ where the sum runs over all allowed values of $z$ .
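For a very small population the sum over trajectories in Eq. (16) can be carried out by brute force and compared with the corresponding element of $\mathbf{M}^{t}$. The following sketch does this under assumed parameter values ($2N=4$, $t=4$, multiplicative selection, non-zero mutation); the function name `K_path_sum` is a hypothetical helper, and exhaustive enumeration is feasible only because the number of trajectories, $(2N+1)^{t-1}$, is small here.

```python
import itertools
import numpy as np
from math import comb, log

# Illustrative parameter values (assumptions)
N, u, v, s = 2, 0.05, 0.03, 0.4
n = 2 * N
freqs = [i / n for i in range(n + 1)]          # allowed frequencies

sigma = lambda x: s / (1 + s * x)              # multiplicative selection
f_mut = lambda x: (1 - u) * x + v * (1 - x)    # Eq. (1)

# Neutral matrix M0 (Eq. (9)) and exponent C (Eq. (11)), indexed as [new state, old state]
M0 = np.zeros((n + 1, n + 1))
C = np.zeros((n + 1, n + 1))
for j, x in enumerate(freqs):
    x_star = f_mut(x)
    for i, xp in enumerate(freqs):
        M0[i, j] = comb(n, i) * x_star ** i * (1 - x_star) ** (n - i)
        C[i, j] = n * (xp * log(1 + sigma(x_star) * (1 - x_star))
                       + (1 - xp) * log(1 - sigma(x_star) * x_star))
M = M0 * np.exp(C)                             # Eq. (10), element by element

def K_path_sum(iz, t, ia):
    """Sum W0 * exp(C) over all trajectories from state ia to state iz, Eq. (16)."""
    total = 0.0
    for interior in itertools.product(range(n + 1), repeat=t - 1):
        path = (ia, *interior, iz)
        W0 = np.prod([M0[path[r], path[r - 1]] for r in range(1, t + 1)])   # Eq. (14)
        C_traj = sum(C[path[r], path[r - 1]] for r in range(1, t + 1))      # Eq. (15)
        total += W0 * np.exp(C_traj)
    return total

t, ia, iz = 4, 1, 3
K_sum = K_path_sum(iz, t, ia)                        # trajectory language
K_mat = np.linalg.matrix_power(M, t)[iz, ia]         # Markov chain language

# Expected frequency at time t, given X(0) = a, from the sum over z of z * K
mean_X = sum(freqs[k] * K_path_sum(k, t, ia) for k in range(n + 1))
```

The two routes give the same transition probability, and summing `K_path_sum` over all final states returns unity, as required of a probability distribution.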

Approximation when there is no mutation and selection is weak

We now consider a special case of the distribution $K(z,t|a,0)$ . We proceed under the following assumptions.

(i)

There is no mutation ( $u=v=0$ ).

Equation (9), with no mutation, entails replacing $x^{\ast}$ by $x$ and we obtain the no mutation, neutral (no selection) form of the transition matrix that we write as

M^{(0,0)}(x^{\prime}|x)=\binom{2N}{2Nx^{\prime}}x^{2Nx^{\prime}}\left(1-x\right)^{2N(1-x^{\prime})}.

(18)

(ii)

Selection is multiplicative.

We take the $A_{1}A_{1}$ , $A_{1}A_{2}$ and $A_{2}A_{2}$ genotypes to have relative fitnesses of $\left(1+s\right)^{2}$ , $1+s$ and $1$ , respectively. We then have $\sigma(x)=s/(1+sx)$ and from Eq. (11) obtain

C(x^{\prime}|x)=2N\left[x^{\prime}\ln\left(1+s\right)-\ln\left(1+sx\right)\right].

(19)

(iii)

Selection is weak ( $|s|\ll 1$ )

In terms of the scaled strength of selection

R=2Ns

(20)

which, unlike $s$ , need not be small, the expansion of $C(x^{\prime}|x)$ in $s$ is given by $C(x^{\prime}|x)\simeq R\cdot\left(x^{\prime}-x\right)-\frac{R^{2}}{4N}\left(x^{\prime}-x^{2}\right)$ with corrections of order $s^{3}$ . This yields

C\left([x]_{0}^{t}\right)\equiv\sum_{r=1}^{t}C(x(r)|x(r-1))\simeq\left(R-\frac{R^{2}}{4N}\right)\left[x(t)-x(0)\right]-\sum_{r=0}^{t-1}U\left(x(r)\right)

(21)

where

U\left(x\right)=\frac{R^{2}}{4N}x\left(1-x\right).

(22)

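Thus in the absence of mutation, but with weak selection, and with $W^{(0,0)}\left([x]_{0}^{t}\right)={\textstyle\prod_{r=1}^{t}}M^{(0,0)}(x(r)|x(r-1))$ denoting the probability of the trajectory $[x]_{0}^{t}$ when there is neither mutation nor selection, we have the approximation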

K(z,t|a,0)\simeq e^{\left[R-R^{2}/(4N)\right]\left(z-a\right)}\overset{x(t)=z}{\underset{x(0)=a}{\sum\cdots\sum}}W^{(0,0)}\left([x]_{0}^{t}\right)e^{-\sum_{r=0}^{t-1}U\left(x(r)\right)}.

(23)
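The quality of the weak-selection expansion in Eq. (21) can be gauged numerically by comparing it with the exact sum of Eq. (19) along a single trajectory. The sketch below uses an arbitrary trajectory of allowed frequencies, drawn at random purely for illustration (it is not a sample from the Wright-Fisher process), with assumed values $N=100$, $s=10^{-3}$ and $t=20$.

```python
import numpy as np

# Illustrative parameter values (assumptions)
N, s, t = 100, 1e-3, 20
R = 2 * N * s                                   # scaled selection strength, Eq. (20)

# An arbitrary trajectory of allowed frequencies x(0), ..., x(t)
rng = np.random.default_rng(0)
traj = rng.integers(0, 2 * N + 1, size=t + 1) / (2 * N)

def C_exact(x_new, x_old):
    """Exact single-generation exponent under multiplicative selection, Eq. (19)."""
    return 2 * N * (x_new * np.log1p(s) - np.log1p(s * x_old))

def U(x):
    """The function U(x) of Eq. (22)."""
    return R**2 / (4 * N) * x * (1 - x)

# Exact trajectory exponent, Eq. (15) with Eq. (19)
exact = sum(C_exact(traj[r], traj[r - 1]) for r in range(1, t + 1))

# Weak-selection approximation, Eq. (21)
approx = (R - R**2 / (4 * N)) * (traj[-1] - traj[0]) - sum(U(traj[r]) for r in range(t))

error = abs(exact - approx)
```

For these values the neglected terms contribute at most of order $2Ns^{3}$ per generation, so `error` is expected to be far smaller than either `exact` or `approx` would typically be.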

The path integral representation of the transition probability density, under the diffusion approximation, which involves continuous frequencies and continuous time, can be written as

K_{\text{diffusion}}(z,t|a,0)=e^{R\cdot\left(z-a\right)}\int_{x(0)=a}^{x(t)=z}P([x]_{0}^{t})e^{-\int_{0}^{t}U(x(r))dr}d[x]

(24)

where the integration is over all continuous trajectories that start at frequency $a$ at time $0$ and end at frequency $z$ at time $t$ , with $P([x]_{0}^{t})$ a ‘weight’ associated with neutral trajectories, and $d[x]$ the measure of the path integral[9].

A comparison of the approximate Wright-Fisher transition probability in Eq. (23) and the diffusion transition probability density in Eq. (24) indicates that the two results are similar. In particular, corresponding to the expressions $e^{\left[R-R^{2}/(4N)\right]\left(z-a\right)}$ and $e^{-\sum_{r=0}^{t-1}U\left(x(r)\right)}$ that are present in the Wright-Fisher result are, respectively, the expressions $e^{R\cdot\left(z-a\right)}$ and $e^{-\int_{0}^{t}U(x(r))dr}$ in the diffusion result. The analogue of the Wright-Fisher neutral, mutation-free, trajectory probability, $W^{(0,0)}\left([x]_{0}^{t}\right)$ , that is present in Eq. (23), is the neutral weight, $P([x]_{0}^{t})$ , that is present in Eq. (24).

Theoretical background for multiple alleles

We shall now generalise the above. We again consider a population that is diploid, reproduces sexually by random mating, has an equal sex ratio, exhibits no sexual dimorphism, and evolves in discrete generations. Selection again occurs at a single unlinked locus, but now there are $n$ alleles at the locus, where $n$ is arbitrary (i.e., $n=2,3,4,...$ ) and we write allele $i$ (for $i=1,2,...,n$ ) as $A_{i}$ .

When there are three or more alleles, the difference, compared with two alleles, is that knowledge of the frequency of one allele is not enough to specify the state of the population. In fact, we need to follow the behaviour of $n-1$ allele frequencies, while one allele frequency can be treated as being determined by all other allele frequencies (since allele frequencies sum to unity). However, we shall not proceed in this way; we shall treat all alleles as being on an equal footing, and follow the behaviour of all $n$ allele frequencies.

Effectively infinite population

We first consider a very large (effectively infinite) population.

In what follows, we shall use $\mathbf{x}$ to denote an $n$ component column vector whose $i$ ’th element, $x_{i}$ , is the frequency of allele $A_{i}$ in adults ( $i=1,2,...,n$ ).

The frequency of all alleles in offspring is then $\mathbf{x}^{\ast}=\mathbf{Q}\mathbf{x}$ where $\mathbf{Q}$ is an $n\times n$ matrix whose $(i,j)$ element, $Q_{i,j}$ , is the probability that an $A_{j}$ allele mutates to an $A_{i}$ allele. Elements of $\mathbf{Q}$ are non-negative, and satisfy $\sum_{i=1}^{n}Q_{i,j}=1$ for all $j$ (so the sum of all mutated frequencies is unity).

We next assume that viability selection acts and determines the probability of different offspring surviving to maturity. The frequencies, after viability selection, are given by $\mathbf{f}_{sel}(\mathbf{x}^{\ast})$ , where $\mathbf{f}_{sel}(\mathbf{x}^{\ast})$ takes into account frequency changes of $\mathbf{x}^{\ast}$ due to selection, and is an $n$ component column vector.

We shall shortly exploit a property of $\mathbf{f}_{sel}(\mathbf{x})$ that follows because selection acts on variation in a population. In particular, if the vector of allele frequencies, $\mathbf{x}$ , has an $i$ ’th element which is zero ( $x_{i}=0$ ), then the $i$ ’th element of $\mathbf{f}_{sel}(\mathbf{x})$ , which we write as $f_{sel,i}(\mathbf{x})$ , also vanishes, since selection alone cannot restore an allele that is absent from a population. This motivates us to take $f_{sel,i}(\mathbf{x})$ in the form

f_{sel,i}(\mathbf{x})=x_{i}\times[1+G_{i}(\mathbf{x})]

(25)

where $G_{i}(\mathbf{x})$ is finite ( $\left|G_{i}(\mathbf{x})\right|<\infty$ ) and is determined by the specific form of selection acting. Generally, $\sum_{i=1}^{n}x_{i}G_{i}(\mathbf{x})=0$ and $G_{i}(\mathbf{x})\geq-1$ (ensuring that after selection, the sum of all allele frequencies is unity, and all allele frequencies are non-negative).

The set of allele frequencies in offspring, after selection and mutation have acted, can be expressed in terms of the set of frequencies $\mathbf{x}$ , in adults, as $\mathbf{f}_{sel}(\mathbf{Qx})$ and we write

\mathbf{F}(\mathbf{x})=\mathbf{f}_{sel}(\mathbf{Qx}).

(26)

We now consider dynamics. Let $\mathbf{X}(t)$ denote an $n$ component column vector containing the set of allele frequencies in generation $t$ . Because we have an effectively infinite population, $\mathbf{X}(t)$ obeys the deterministic equation

\mathbf{X}(t+1)=\mathbf{F}(\mathbf{X}(t)).

(27)

Finite population

Consider now a finite population, where $N$ adults are present in each generation. The quantity $\mathbf{x}$ is still an $n$ component vector whose $i$ ’th element, $x_{i}$ , is the frequency of allele $A_{i}$ in adults, but it has the added feature that all elements have values which are allowed frequencies (Eq. (6)). That is, $x_{i}\geq 0$ , $\sum_{i=1}^{n}x_{i}=1$ , and each $x_{i}$ is an integer divided by $2N$ . We shall call a vector that has this property an allowed set of allele frequencies. In the multiallele case we shall reserve the use of $\mathbf{a}$ , $\mathbf{x}$ , $\mathbf{x}^{\prime}$ , $\mathbf{x}(r)$ (for various $r$ ), and $\mathbf{z}$ , for allowed sets of allele frequencies. We now write the transition matrix element for the probability of a transition from state $\mathbf{x}$ to state $\mathbf{x}^{\prime}$ as

M(\mathbf{x}^{\prime}|\mathbf{x})=\binom{2N}{2N\mathbf{x}^{\prime}}\prod\limits_{i=1}^{n}\left[F_{i}(\mathbf{x})\right]^{2Nx_{i}^{\prime}}

(28)

where $\binom{2N}{\mathbf{m}}$ , with $\mathbf{m}$ an $n$ component column vector with integer elements, denotes a multinomial coefficient for $n$ categories. We note that the transition matrix element, $M(\mathbf{x}^{\prime}|\mathbf{x})$ , in its conventional matrix form, is an element of a matrix with vector indices, not scalars[20].

The transition matrix of a neutral locus has elements which are the zero selection limit of $M(\mathbf{x}^{\prime}|\mathbf{x})$ , which we write as $M^{(0)}(\mathbf{x}^{\prime}|\mathbf{x})$ , and which is given by

M^{(0)}(\mathbf{x}^{\prime}|\mathbf{x})=\binom{2N}{2N\mathbf{x}^{\prime}}\prod\limits_{i=1}^{n}\left(x_{i}^{\ast}\right)^{2Nx_{i}^{\prime}}

(29)

where $\mathbf{x}^{\ast}$ is given by

\mathbf{x}^{\ast}=\mathbf{Qx}.

(30)

As for the case of two alleles, a factorisation is possible; the form of $\mathbf{f}_{sel}(\mathbf{x})$ in Eq. (25) allows us to write the transition matrix, $M(\mathbf{x}^{\prime}|\mathbf{x})$ of Eq. (28), as the exact product of two factors:

M(\mathbf{x}^{\prime}|\mathbf{x})=M^{(0)}(\mathbf{x}^{\prime}|\mathbf{x})\times e^{C(\mathbf{x}^{\prime}|\mathbf{x})}

(31)

where $M^{(0)}(\mathbf{x}^{\prime}|\mathbf{x})$ is given in Eq. (29) and

C(\mathbf{x}^{\prime}|\mathbf{x})=2N\sum_{i=1}^{n}x_{i}^{\prime}\ln\big(1+G_{i}(\mathbf{x}^{\ast})\big)

(32)

(see the second subsection of Methods for details).
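The multiallele factorisation of Eqs. (28) to (32) can be checked in the same spirit as the two-allele case. The sketch below assumes $n=3$ alleles, $2N=6$, multiplicative selection with $G_{i}(\mathbf{x})=(s_{i}-\sum_{j}s_{j}x_{j})/(1+\sum_{j}s_{j}x_{j})$, and a mutation matrix with strictly positive entries (so that no element of $\mathbf{x}^{\ast}$ vanishes); all numerical values and helper names are illustrative assumptions.

```python
import numpy as np
from math import factorial

# Illustrative parameter values (assumptions)
N = 3                                     # 2N = 6 gene copies
s = np.array([0.3, 0.1, 0.0])             # selection coefficients of A1, A2, A3
Q = np.array([[0.98, 0.01, 0.01],
              [0.01, 0.98, 0.01],
              [0.01, 0.01, 0.98]])        # mutation matrix; each column sums to 1

def multinomial_coeff(counts):
    """Multinomial coefficient (2N)! / (m_1! ... m_n!) for integer counts m_i."""
    out = factorial(sum(counts))
    for m in counts:
        out //= factorial(m)
    return out

def G(x):
    """G_i(x) for multiplicative selection; satisfies sum_i x_i G_i(x) = 0."""
    s_bar = s @ x
    return (s - s_bar) / (1 + s_bar)

def M(counts_new, x):
    """Full transition probability, Eq. (28); counts_new holds the integers 2N x'_i."""
    x_star = Q @ x
    F_x = x_star * (1 + G(x_star))        # Eqs. (25) and (26)
    return multinomial_coeff(counts_new) * np.prod(F_x ** np.array(counts_new))

def M0(counts_new, x):
    """Neutral transition probability, Eq. (29)."""
    x_star = Q @ x
    return multinomial_coeff(counts_new) * np.prod(x_star ** np.array(counts_new))

def C(counts_new, x):
    """Selectively controlled exponent, Eq. (32), using 2N x'_i = counts_new[i]."""
    x_star = Q @ x
    return float(np.array(counts_new) @ np.log1p(G(x_star)))

# All allowed sets of allele frequencies, stored as integer counts summing to 2N
states = [(a, b, 2 * N - a - b) for a in range(2 * N + 1) for b in range(2 * N + 1 - a)]

max_err = max(abs(M(new, np.array(old) / (2 * N))
                  - M0(new, np.array(old) / (2 * N)) * np.exp(C(new, np.array(old) / (2 * N))))
              for new in states for old in states)

# Normalisation: probabilities of all transitions out of one state sum to unity
col_sum = sum(M(new, np.array(states[5]) / (2 * N)) for new in states)
```

With three alleles and $2N=6$ there are $28$ allowed sets of frequencies, so all $28\times 28$ transition probabilities can be compared directly.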

Trajectories and path integral

We now write a trajectory as

[\mathbf{x}]_{0}^{t}=\left(\mathbf{x}(0),\mathbf{x}(1),...,\mathbf{x}(t)\right)

(33)

in which each $\mathbf{x}(r)$ is an $n$ component column vector containing an allowed set of allele frequencies, which gives the state of the population at time $r$ . It follows that the trajectory $[\mathbf{x}]_{0}^{t}$ in Eq. (33), is an $n\times(t+1)$ matrix. The probability of this trajectory is ${\textstyle\prod\limits_{r=1}^{t}}M(\mathbf{x}(r)|\mathbf{x}(r-1))={\textstyle\prod\limits_{r=1}^{t}}M^{(0)}(\mathbf{x}(r)|\mathbf{x}(r-1))\times\exp\left(\sum_{r=1}^{t}C(\mathbf{x}(r)|\mathbf{x}(r-1))\right)$ . We write this as

\text{probability of}\ [\mathbf{x}]_{0}^{t}=W^{(0)}\left([\mathbf{x}]_{0}^{t}\right)\times e^{C\left([\mathbf{x}]_{0}^{t}\right)}

(34)

where

W^{(0)}\left([\mathbf{x}]_{0}^{t}\right)={\textstyle\prod_{r=1}^{t}}M^{(0)}(\mathbf{x}(r)|\mathbf{x}(r-1))

(35)

and

C\left([\mathbf{x}]_{0}^{t}\right)=\sum_{r=1}^{t}C(\mathbf{x}(r)|\mathbf{x}(r-1)).

(36)

Let $K(\mathbf{z},t|\mathbf{a},0)$ denote the overall probability of going from an initial state of the population corresponding to the allowed set of frequencies, $\mathbf{a}$ at time $0$ , to state $\mathbf{z}$ at time $t$ , which is another allowed set of frequencies. All possible trajectories between these end-points contribute to $K(\mathbf{z},t|\mathbf{a},0)$ . We thus write $K(\mathbf{z},t|\mathbf{a},0)$ as a sum of the probabilities $W^{(0)}\left([\mathbf{x}]_{0}^{t}\right)\times e^{C\left([\mathbf{x}]_{0}^{t}\right)}$ over all possible trajectories. That is

K(\mathbf{z},t|\mathbf{a},0)=\overset{\mathbf{x}(t)=\mathbf{z}}{\underset{\mathbf{x}(0)=\mathbf{a}}{\sum\cdots\sum}}W^{(0)}\left([\mathbf{x}]_{0}^{t}\right)e^{C\left([\mathbf{x}]_{0}^{t}\right)}.

(37)

The notation in Eq. (37) denotes a sum over all trajectories whose end points, $\mathbf{x}(0)$ and $\mathbf{x}(t)$ , have the specific (allowed set) values $\mathbf{a}$ and $\mathbf{z}$ , respectively, while $\mathbf{x}(1)$ , $\mathbf{x}(2)$ , $...,\mathbf{x}(t-1)$ , which give the state of the population at intermediate times, take values that cover all allowed sets of frequencies.

Equation (37) is an exact ‘path integral’ representation of the finite time transition probability in a multiple ( $n$ ) allele Wright-Fisher model where states and times are discrete.

Since, by construction, $W^{(0)}\left([\mathbf{x}]_{0}^{t}\right)$ is independent of selection, and since $C\left([\mathbf{x}]_{0}^{t}\right)$ vanishes when there is no selection, the probability of going from state $\mathbf{a}$ at time $0$ to state $\mathbf{z}$ at time $t$ , when there is no selection, is $K^{(0)}(\mathbf{z},t|\mathbf{a},0)=\overset{\mathbf{x}(t)=\mathbf{z}}{\underset{\mathbf{x}(0)=\mathbf{a}}{\sum\cdots\sum}}W^{(0)}\left([\mathbf{x}]_{0}^{t}\right)$ .

Approximation when there is no mutation and selection is weak

We now consider a special case of the distribution $K(\mathbf{z},t|\mathbf{a},0)$ of Eq. (37), when there is no mutation and selection is multiplicative and weak.

When there is no mutation the matrix $\mathbf{Q}$ becomes the $n\times n$ identity matrix.

Under multiplicative selection, we take the $A_{i}A_{j}$ genotype to have a fitness proportional to $(1+s_{i})(1+s_{j})$ . It follows that $F_{i}(\mathbf{x})=x_{i}\left(1+s_{i}\right)/\left(1+{\sum\nolimits_{j=1}^{n}}s_{j}x_{j}\right)$ , hence, from Eq. (25), $G_{i}(\mathbf{x})=\left(s_{i}-{\sum\nolimits_{j=1}^{n}}s_{j}x_{j}\right)/\left[1+{\sum\nolimits_{j=1}^{n}}s_{j}x_{j}\right]$ and

C(\mathbf{x}^{\prime}|\mathbf{x})=2N\left[\sum_{i=1}^{n}x_{i}^{\prime}\ln\left(1+s_{i}\right)-\ln\left(1+\sum_{i=1}^{n}s_{i}x_{i}\right)\right].

(38)

We take weak selection to correspond to $|s_{i}|\ll 1$ for all $i$. Then, as in the case of two alleles, we obtain approximate results by expanding $C(\mathbf{x}^{\prime}|\mathbf{x})$ in the $s_{i}$ and discarding terms of third and higher order. We shall express results in terms of scaled selection strengths that are given by

\mathbf{R}=2N\mathbf{s}\text{ or }R_{i}=2Ns_{i}

(39)

where $\mathbf{s}$ is a column vector of the $s_{i}$ .

With $\delta_{i,j}$ denoting a Kronecker delta, a $T$ superscript denoting the transpose of a vector, and $\mathbf{V}(\mathbf{x})$ denoting an $n\times n$ matrix with elements

V_{i,j}(\mathbf{x})=x_{i}\delta_{i,j}-x_{i}x_{j}

(40)

we obtain

C\left([\mathbf{x}]_{0}^{t}\right)\equiv\sum_{r=1}^{t}C(\mathbf{x}(r)|\mathbf{x}(r-1))\simeq\mathbf{R}^{T}\left[\mathbf{x}(t)-\mathbf{x}(0)\right]+\phi\left(\mathbf{x}(t)\right)-\phi\left(\mathbf{x}(0)\right)-\sum_{r=0}^{t-1}U\left(\mathbf{x}(r)\right)

(41)

where

\phi\left(\mathbf{x}\right)=-\sum_{i=1}^{n}\frac{R_{i}^{2}}{4N}x_{i}

(42)

and

U\left(\mathbf{x}\right)=\frac{1}{4N}\mathbf{R}^{T}\mathbf{V}(\mathbf{x})\mathbf{R}.

(43)
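The step from Eq. (38) to Eq. (41) can be made explicit for a single generation. The following is a sketch of that expansion, using only Eqs. (38), (39), (40), (42) and (43); the shorthand $\mathbf{x}^{\prime}=\mathbf{x}(r)$ and $\mathbf{x}=\mathbf{x}(r-1)$ is introduced here purely for brevity, and $\sum_{i}x_{i}^{\prime}=1$ is used in Eq. (38).

```latex
\begin{aligned}
C(\mathbf{x}^{\prime}|\mathbf{x})
&\simeq 2N\left[\sum_{i=1}^{n}x_{i}^{\prime}\left(s_{i}-\tfrac{1}{2}s_{i}^{2}\right)
  -\sum_{i=1}^{n}s_{i}x_{i}+\tfrac{1}{2}\Big(\sum_{i=1}^{n}s_{i}x_{i}\Big)^{2}\right]\\
&=\mathbf{R}^{T}\left(\mathbf{x}^{\prime}-\mathbf{x}\right)
  -\frac{1}{4N}\sum_{i=1}^{n}R_{i}^{2}x_{i}^{\prime}
  +\frac{1}{4N}\left(\mathbf{R}^{T}\mathbf{x}\right)^{2}\\
&=\mathbf{R}^{T}\left(\mathbf{x}^{\prime}-\mathbf{x}\right)
  +\phi\left(\mathbf{x}^{\prime}\right)-\phi\left(\mathbf{x}\right)-U\left(\mathbf{x}\right).
\end{aligned}
```

The last line follows because $-U(\mathbf{x})=\phi(\mathbf{x})+\left(\mathbf{R}^{T}\mathbf{x}\right)^{2}/(4N)$. On summing over $r=1,\ldots,t$, the terms $\mathbf{R}^{T}(\mathbf{x}^{\prime}-\mathbf{x})$ and $\phi(\mathbf{x}^{\prime})-\phi(\mathbf{x})$ telescope to boundary contributions, while the $U$ terms accumulate over $r=0,\ldots,t-1$, which gives Eq. (41).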

Using Eqs. (41), (42) and (43) in Eq. (37), combined with $W^{(0,0)}\left([\mathbf{x}]_{0}^{t}\right)$, which denotes the probability of trajectory $[\mathbf{x}]_{0}^{t}$ in the absence of mutation and selection ($W^{(0,0)}\left([\mathbf{x}]_{0}^{t}\right)$ is constructed from a product of terms of the form $\binom{2N}{2N\mathbf{x}^{\prime}}\prod\limits_{i=1}^{n}x_{i}^{2Nx_{i}^{\prime}}$; cf. Eq. (35)), we obtain the approximation

K(\mathbf{z},t|\mathbf{a},0)\simeq e^{\mathbf{R}^{T}\left(\mathbf{z}-\mathbf{a}\right)+\phi\left(\mathbf{z}\right)-\phi\left(\mathbf{a}\right)}\overset{\mathbf{x}(t)=\mathbf{z}}{\underset{\mathbf{x}(0)=\mathbf{a}}{\sum\cdots\sum}}W^{(0,0)}\left([\mathbf{x}]_{0}^{t}\right)e^{-\sum_{r=0}^{t-1}U\left(\mathbf{x}(r)\right)}.

(44)

Discussion

In this work we have derived an exact ‘path integral’ representation of the time-dependent transition probability in a Wright-Fisher model. We have explicitly considered the case of two alleles, where the population’s description is in terms of a focal allele, and the case of an arbitrary number of $n$ alleles, where the description is in terms of all $n$ allele frequencies, with all frequencies treated as having the same status.

For the case of two alleles, we have compared the Wright-Fisher transition probability with a path integral representation of the corresponding quantity (a transition density) under the diffusion approximation. The diffusion approximation result was derived for multiplicative selection, in the absence of mutation, and we have established its relation to the exact Wright-Fisher result in this case.

The Wright-Fisher path integral, derived in this work for two alleles, applies for a wider class of fitness functions than just multiplicative fitness, and can incorporate mutation. The general form of the path integral for two alleles is given in Eq. (16), and takes the form of a sum over trajectories of a product of two terms: (i) a ‘weight’ $W^{(0)}\left([\mathbf{x}]_{0}^{t}\right)$ which gives the probability of a trajectory under neutrality, i.e., when only random genetic drift and mutation are operating, and (ii) the factor $e^{C\left([\mathbf{x}]_{0}^{t}\right)}$ which, while depending on parameters such as mutation rates, is primarily determined by selection; this factor incorporates all effects of selection, and $C\left([\mathbf{x}]_{0}^{t}\right)$ vanishes in the absence of selection. This separation into two factors represents an underlying property of the transition probability, $K(z,t|a,0)$, that we know from other analyses, namely that at long times ($t\rightarrow\infty$) the quantity $K(z,t|a,0)$ is a smooth function of selection, but the long time properties are very different for zero and non-zero mutation rates. For non-zero mutation rates, the long-time form of $K(z,t|a,0)$ is non-zero for all possible values of $z$ (i.e., all allowed frequencies), and independent of the initial frequency, $a$. By contrast, for vanishing mutation rates, only the terminal frequency classes ($0$ and $1$) have non-zero probabilities at long times, and furthermore, these probabilities depend on the initial frequency, $a$. Thus $K(z,t|a,0)$, as $t\rightarrow\infty$, behaves discontinuously as a function of mutation rates, in the sense that allowing mutation rates to tend to zero, and having mutation rates exactly equal to zero, yield different results.
A diffusion analysis shows this most clearly, where singular spikes (Dirac delta functions) at the terminal frequencies are generally present in the transition probability density when mutation rates are zero, and are absent when mutation rates are non-zero [21]. The separation of a probability of a trajectory into the product of $W^{(0)}\left([\mathbf{x}]_{0}^{t}\right)$ and $e^{C\left([\mathbf{x}]_{0}^{t}\right)}$ is thus natural and a reflection of different behaviours arising from different features of the dynamics.

On the matters of fixation and loss, we note that since a Wright-Fisher model can describe these phenomena (in the absence of mutation), an exact path integral representation associated with this model can also, generally, describe features such as fixation and loss. This will also carry over to a path integral representation based on the diffusion approximation, since the diffusion approximation is also known to encompass fixation and loss, albeit in a singular form [21]. Such singular behaviour seems likely to make the analysis of the path integral representation based on the diffusion approximation more complex than it would be in the absence of such behaviour.

As an elementary illustration of how fixation is incorporated into the path integral representation of the transition probability, $K(z,t|a,0)$, we note that when all mutation rates are zero, the probability of ultimate fixation of the focal allele is $\lim_{t\rightarrow\infty}K(1,t|a,0)$. Let us revisit the case considered above where there is no mutation and weak multiplicative selection acting. We can expand $K(1,t|a,0)$ in $s$ by first expanding $K(1,t|a,0)$ in $C\left([x]_{0}^{t}\right)$, and then expanding $C\left([x]_{0}^{t}\right)$ in $s$. To linear order in $s$ we obtain (from Eq. (23)) $K(1,t|a,0)\simeq[1+2Ns(1-a)]\times\overset{x(t)=1}{\underset{x(0)=a}{\sum\cdots\sum}}W^{(0,0)}\left([x]_{0}^{t}\right)$. Since $\lim_{t\rightarrow\infty}\overset{x(t)=1}{\underset{x(0)=a}{\sum\cdots\sum}}W^{(0,0)}\left([x]_{0}^{t}\right)$ is the probability of fixation ultimately occurring under neutrality, this limit coincides with the initial frequency, $a$. In this way, we arrive at a fixation probability of $P_{fix}(a)\simeq a+2Nsa(1-a)$, which contains the neutral result and a term which is first order in $s$, which is the leading correction due to selection. Expansion of $K(z,t|a,0)$ (and related quantities) to higher order in $s$ can be achieved, again by exploiting the factorisation between drift/mutation and selection that occurs in Eq. (16). Expansions in $s$ beyond linear order involve more complicated calculations than that of the linear case.
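The linear-order result $P_{fix}(a)\simeq a+2Nsa(1-a)$ can be compared with fixation probabilities computed directly from the Wright-Fisher transition matrix. The sketch below assumes no mutation and multiplicative selection, so that the expected frequency after selection is $x(1+s)/(1+sx)$; the values of $2N$ and $s$ are illustrative assumptions. Rather than taking $t\rightarrow\infty$ numerically, it solves the equivalent linear system for the probability of eventual absorption at frequency $1$.

```python
import math
import numpy as np

TWO_N = 20          # 2N gene copies (assumed value)
s = 1e-4            # assumed weak selection coefficient

def F(x):
    # multiplicative selection, no mutation: expected frequency after selection
    return x * (1 + s) / (1 + s * x)

# M[j, i] = probability of j copies in the next generation given i copies now
M = np.array([[math.comb(TWO_N, j) * F(i / TWO_N) ** j * (1 - F(i / TWO_N)) ** (TWO_N - j)
               for i in range(TWO_N + 1)] for j in range(TWO_N + 1)])

# Fixation probability p_i obeys p_i = sum_j M[j, i] p_j, with p_0 = 0 and p_2N = 1.
interior = np.arange(1, TWO_N)
A = np.eye(TWO_N - 1) - M[np.ix_(interior, interior)].T
b = M[TWO_N, interior]
p_fix = np.linalg.solve(A, b)

a = interior / TWO_N                          # initial frequencies
p_approx = a + TWO_N * s * a * (1 - a)        # a + 2N s a (1 - a)
print(np.max(np.abs(p_fix - p_approx)))
```

For these parameter values the discrepancy is expected to be of second order in $2Ns$, and so much smaller than the first-order correction itself.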

In the case of two alleles, we have seen the relation between the path integral of the ‘fully discrete’ Wright-Fisher model and the path integral of the diffusion approximation for this model. For the case of an arbitrary number of $n$ alleles there is, at the present time, no such path integral for the diffusion approximation. However, from the lessons learned for two alleles we can infer some of the properties of the general $n$ case under the diffusion approximation. In particular, when selection is multiplicative, and in the absence of mutation, we infer from Eq. (44) that

K_{\text{diffusion}}(\mathbf{z},t|\mathbf{a},0)=e^{\mathbf{R}^{T}\left(\mathbf{z}-\mathbf{a}\right)}\int_{\mathbf{x}(0)=\mathbf{a}}^{\mathbf{x}(t)=\mathbf{z}}P([\mathbf{x}]_{0}^{t})e^{-\int_{0}^{t}U\left(\mathbf{x}(r)\right)dr}d[\mathbf{x}]

(45)

where $\mathbf{R}$ is a column vector containing the set of scaled selection strengths (Eq. (39)), the quantity $P([\mathbf{x}]_{0}^{t})$ is the analogue of the neutral, mutation-free, probability of a trajectory in a Wright-Fisher model, $W^{(0,0)}\left([\mathbf{x}]_{0}^{t}\right)$ , while $U\left(\mathbf{x}\right)$ is given by Eq. (43). An interesting feature is the way selection enters Eq. (45), in both the prefactor, $e^{\mathbf{R}^{T}\left(\mathbf{z}-\mathbf{a}\right)}$ and within $U\left(\mathbf{x}\right)$ in forms that involve the vectors and matrices that occur in the problem. Additionally, a diffusion analysis would suggest that all occurrences of the population size, $N$ , are replaced by the effective population size, $N_{e}$ .

In the special cases considered above, of no mutation and weak selection, the ‘selectively controlled’ quantities $C\left([x]_{0}^{t}\right)$ and $C\left([\mathbf{x}]_{0}^{t}\right)$, for two and $n$ alleles, respectively, both naturally split into two terms (see Eqs. (21) and (41)). One of the terms depends only on the initial and final frequencies of the transition probability, and has no dependence on the frequencies taken by trajectories at intermediate times; it is natural to call this a boundary term. To leading order in selection coefficients, the boundary term changes sign when the signs of all selection coefficients are reversed (for two alleles reversal entails $s\rightarrow-s$; for $n$ alleles, $\mathbf{s}\rightarrow-\mathbf{s}$). The boundary term is thus the primary place where the deleterious or beneficial effect of a mutation manifests itself. The other term ($U\left([x]_{0}^{t}\right)$ and $U\left([\mathbf{x}]_{0}^{t}\right)$, respectively) depends on the frequencies taken by trajectories at all times, from the initial time to the final time. The $U$ terms, when large, have the effect of suppressing the contribution of a trajectory. They are a manifestation of the ‘probabilistic cost of selection’ of an entire trajectory. Interestingly, the $U$ terms cannot take negative values and remain unaltered when the signs of all selection coefficients are reversed.

In summary, we have presented an exact representation of the transition probability of a Wright-Fisher model in terms of a path integral (in reality a sum over paths/trajectories). Let us conclude with some possible ways that the path integral representation may be of use. We shall restrict our considerations to the case of two alleles, where the main result is given in Eq. (16), since very similar considerations apply to the $n$ allele case in Eq. (37).

1. The path integral representation may make it easy to carry out an expansion in a small parameter, such as a selection coefficient. This has been carried out for the transition density at intermediate frequencies, under the diffusion approximation[9]. In the present work we have shown that expansion in selection coefficients can also be applied to phenomena such as fixation and loss. There may be many other applications of expansion in a small parameter.

2. A path integral involves trajectories whose contributions generally have different probabilities of occurrence. A possible approximation is one where the most probable trajectory, along with trajectories that have small fluctuations around the most probable trajectory, are used to estimate the path integral. The most probable trajectory may be of interest in its own right, since it may typify the way the population makes a transition between two states of the population over time.

3. The path integral representation involves a fundamental separation of mutation and drift from the process of what is primarily selection, as manifested by the two factors $W^{(0)}\left([x]_{0}^{t}\right)$ and $e^{C\left([x]_{0}^{t}\right)}$ in Eq. (16). To exploit this separation, we note that while, in this work, we have implicitly assumed that all parameters are independent of the time, a straightforward generalisation of the exact results allows parameters to be time dependent. Then one immediate case of application occurs when just selection fluctuates over time, with selection coefficients drawn each generation from a given distribution, or generated by a random process. In the absence of further knowledge, it is plausibly the case that the relevant transition probability follows from an average over all such selection coefficients. With the average denoted by an overbar, the average of Eq. (16) reads $\overline{K(z,t|a,0)}=\overset{x(t)=z}{\underset{x(0)=a}{\sum\cdots\sum}}W^{(0)}\left([x]_{0}^{t}\right)\overline{e^{C\left([x]_{0}^{t}\right)}}$. Thus only the selectively controlled factor is averaged, and this may lead to an effective theory that has new/modified selective terms, compared with the case where selection coefficients are simply set equal to the time-averaged value[22, 23].

A different approach, compared to the above three approaches, is to rewrite Eq. (16) in the form $K(z,t|a,0)=K^{(0)}(z,t|a,0)\times D(z,t|a,0)$ where $K^{(0)}(z,t|a,0)$ is the neutral result (Eq. (17)) and

D(z,t|a,0)=\left.\overset{x(t)=z}{\underset{x(0)=a}{\sum\cdots\sum}}W^{(0)}\left([x]_{0}^{t}\right)e^{C\left([x]_{0}^{t}\right)}\right/\overset{x(t)=z}{\underset{x(0)=a}{\sum\cdots\sum}}W^{(0)}\left([x]_{0}^{t}\right).

(46)

We can interpret $D(z,t|a,0)$ as an average of the quantity $e^{C\left([x]_{0}^{t}\right)}$ over all neutral trajectories that start at allowed frequency $a$ at time $0$ and end at allowed frequency $z$ at time $t$ . Applying Jensen’s inequality[24] to Eq. (46) yields $D(z,t|a,0)\geq D_{J}(z,t|a,0)$ with

D_{J}(z,t|a,0)=\exp\left(\left.\overset{x(t)=z}{\underset{x(0)=a}{\sum\cdots\sum}}W^{(0)}\left([x]_{0}^{t}\right)C\left([x]_{0}^{t}\right)\right/\overset{x(t)=z}{\underset{x(0)=a}{\sum\cdots\sum}}W^{(0)}\left([x]_{0}^{t}\right)\right).

(47)

Thus we find $K(z,t|a,0)\geq K^{(0)}(z,t|a,0)\times D_{J}(z,t|a,0)$, where the exponent of $D_{J}(z,t|a,0)$ is a conditional average of $C\left([x]_{0}^{t}\right)$ over all neutral trajectories that start at frequency $a$ at time $0$ and end at frequency $z$ at time $t$.
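The bound $D\geq D_{J}$ can be illustrated numerically by enumerating all trajectories of a very small two-allele model. The sketch below is written under stated assumptions: the forms `f_mut(x) = (1-u)x + v(1-x)` and `sigma(x) = s/(1+sx)` (multiplicative selection) are illustrative choices, the parameter values are arbitrary, and the function names are hypothetical; only the structure of Eqs. (46), (47), (50) and (53) is taken from the text.

```python
import itertools
import math

TWO_N = 4                      # 2N gene copies (tiny, to allow enumeration)
s, u, v = 0.1, 0.01, 0.02      # assumed selection coefficient and mutation rates

def f_mut(x):
    # assumed form of mutation: focal allele lost at rate u, gained at rate v
    return (1 - u) * x + v * (1 - x)

def sigma(x):
    # assumed multiplicative selection: f_sel(x) = x + sigma(x) x (1 - x)
    return s / (1 + s * x)

def M0(k_next, k):
    # selection-free transition probability, Eq. (50)
    xs = f_mut(k / TWO_N)
    return math.comb(TWO_N, k_next) * xs ** k_next * (1 - xs) ** (TWO_N - k_next)

def C(k_next, k):
    # Eq. (53), written with 2N x' = k_next
    xs = f_mut(k / TWO_N)
    sg = sigma(xs)
    return k_next * math.log1p(sg * (1 - xs)) + (TWO_N - k_next) * math.log1p(-sg * xs)

def D_and_DJ(z, a, t):
    # neutral-trajectory averages of exp(C) and of C, conditional on the end points
    num = den = c_weighted = 0.0
    for middle in itertools.product(range(TWO_N + 1), repeat=t - 1):
        path = (a, *middle, z)
        steps = list(zip(path[1:], path[:-1]))
        W0 = math.prod(M0(kn, k) for kn, k in steps)
        C_tot = sum(C(kn, k) for kn, k in steps)
        den += W0
        num += W0 * math.exp(C_tot)
        c_weighted += W0 * C_tot
    return num / den, math.exp(c_weighted / den)

D, DJ = D_and_DJ(z=3, a=2, t=3)
print(D, DJ)
```

Here `a` and `z` are copy numbers of the focal allele, so the corresponding frequencies are `a / TWO_N` and `z / TWO_N`. Because mutation is present, every trajectory has non-zero neutral weight and the logarithms in `C` are always finite.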

Methods

Here we give details of the calculations underlying Eqs. (10) and (31).

Factorisation of the transition matrix: two alleles

For the case of two alleles, the transition matrix can be expressed as a product of two factors, one of which is independent of selection.

We begin with Eq. (7) for the transition matrix, which we reproduce here for convenience. We have

M(x^{\prime}|x)=\binom{2N}{2Nx^{\prime}}\left[F(x)\right]^{2Nx^{\prime}}\left[1-F(x)\right]^{2N(1-x^{\prime})}

(48)

where

F(x)=f_{sel}(f_{mut}(x)).

(49)

In the absence of selection, $F(x)$ reduces to $f_{mut}(x)$ and $M(x^{\prime}|x)$ reduces to $M^{(0)}(x^{\prime}|x)$, as given by

M^{(0)}(x^{\prime}|x)=\binom{2N}{2Nx^{\prime}}\left[f_{mut}(x)\right]^{2Nx^{\prime}}\left[1-f_{mut}(x)\right]^{2N(1-x^{\prime})}\equiv\binom{2N}{2Nx^{\prime}}\left(x^{\ast}\right)^{2Nx^{\prime}}\left(1-x^{\ast}\right)^{2N(1-x^{\prime})}

(50)

where we have set

x^{\ast}=f_{mut}(x).

(51)

To establish factorisation we use the adopted form of selection in Eq. (2), namely $f_{sel}(x)=x+\sigma(x)x(1-x)$ to write $F(x)=x^{\ast}\left[1+\sigma(x^{\ast})(1-x^{\ast})\right]$ . Similarly we have $1-F(x)=\left(1-x^{\ast}\right)\left[1-\sigma(x^{\ast})x^{\ast}\right]$ . These allow us to write Eq. (48) as

\begin{aligned}
M(x^{\prime}|x)&=\binom{2N}{2Nx^{\prime}}\left\{x^{\ast}\left[1+\sigma(x^{\ast})(1-x^{\ast})\right]\right\}^{2Nx^{\prime}}\left\{\left(1-x^{\ast}\right)\left[1-\sigma(x^{\ast})x^{\ast}\right]\right\}^{2N(1-x^{\prime})}\\
&=\binom{2N}{2Nx^{\prime}}\left(x^{\ast}\right)^{2Nx^{\prime}}\left(1-x^{\ast}\right)^{2N(1-x^{\prime})}\times\left[1+\sigma(x^{\ast})(1-x^{\ast})\right]^{2Nx^{\prime}}\left[1-\sigma(x^{\ast})x^{\ast}\right]^{2N(1-x^{\prime})}\\
&\equiv M^{(0)}(x^{\prime}|x)\times e^{C(x^{\prime}|x)}
\end{aligned}

(52)

where

C(x^{\prime}|x)=2N\times\big[x^{\prime}\ln\big(1+\sigma(x^{\ast})\left(1-x^{\ast}\right)\big)+(1-x^{\prime})\ln\big(1-\sigma(x^{\ast})x^{\ast}\big)\big].

(53)

Equation (52) represents an exact decomposition of the transition matrix, $M(x^{\prime}|x)$ , into the product of a transition matrix, $M^{(0)}(x^{\prime}|x)$ , that is independent of selection, and a factor $e^{C(x^{\prime}|x)}$ which depends on selection, and is unity in the absence of selection.
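Since Eq. (52) is an identity for any admissible $\sigma(x)$, it can be checked numerically entry by entry. The sketch below does this for a small population; the mutation function `f_mut`, the particular non-constant `sigma`, and all parameter values are arbitrary illustrative assumptions, chosen only so that $F(x)$ stays strictly between $0$ and $1$.

```python
import math

TWO_N = 10                      # 2N gene copies (assumed value)
u, v = 0.01, 0.02               # assumed mutation rates

def f_mut(x):
    # assumed form of mutation
    return (1 - u) * x + v * (1 - x)

def sigma(x):
    # arbitrary frequency-dependent selection, for illustration only
    return 0.05 * (0.3 + 0.4 * x)

def f_sel(x):
    # adopted form of selection, f_sel(x) = x + sigma(x) x (1 - x)
    return x + sigma(x) * x * (1 - x)

def binom_pmf(k, p):
    return math.comb(TWO_N, k) * p ** k * (1 - p) ** (TWO_N - k)

def M(k_next, k):
    # Eq. (48) with F(x) = f_sel(f_mut(x))
    return binom_pmf(k_next, f_sel(f_mut(k / TWO_N)))

def M0(k_next, k):
    # Eq. (50)
    return binom_pmf(k_next, f_mut(k / TWO_N))

def C(k_next, k):
    # Eq. (53), written with 2N x' = k_next
    xs = f_mut(k / TWO_N)
    sg = sigma(xs)
    return k_next * math.log1p(sg * (1 - xs)) + (TWO_N - k_next) * math.log1p(-sg * xs)

# largest absolute difference between M and M0 * exp(C) over all matrix entries
max_err = max(abs(M(kn, k) - M0(kn, k) * math.exp(C(kn, k)))
              for kn in range(TWO_N + 1) for k in range(TWO_N + 1))
print(max_err)
```

The printed value should be at the level of floating-point rounding, reflecting that the factorisation is exact and not an approximation in the strength of selection.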

Factorisation of the transition matrix: n alleles

For the case of $n$ alleles, the transition matrix can again be expressed as a product of two factors, one of which is independent of selection.

We begin with Eqs. (28) and (25), which we reproduce here for convenience:

M(\mathbf{x}^{\prime}|\mathbf{x})=\binom{2N}{2N\mathbf{x}^{\prime}}\prod\nolimits_{i=1}^{n}\left[F_{i}(\mathbf{x})\right]^{2Nx_{i}^{\prime}}

(54)

and

F_{i}(\mathbf{x})=x_{i}^{\ast}\times[1+G_{i}(\mathbf{x}^{\ast})]

(55)

in which

\mathbf{x}^{\ast}=\mathbf{Qx}.

(56)

In the absence of selection, $\mathbf{F}(\mathbf{x})$ reduces to $\mathbf{x}^{\ast}$ and $M(\mathbf{x}^{\prime}|\mathbf{x})$ reduces to $M^{(0)}(\mathbf{x}^{\prime}|\mathbf{x})$ , as given by

M^{(0)}(\mathbf{x}^{\prime}|\mathbf{x})=\binom{2N}{2N\mathbf{x}^{\prime}}\prod\nolimits_{i=1}^{n}\left(x_{i}^{\ast}\right)^{2Nx_{i}^{\prime}}.

(57)

To establish a factorisation we note that Eq. (55) allows us to write Eq. (54) as

\begin{aligned}
M(\mathbf{x}^{\prime}|\mathbf{x})&=\binom{2N}{2N\mathbf{x}^{\prime}}\prod\nolimits_{i=1}^{n}\left\{x_{i}^{\ast}\left[1+G_{i}(\mathbf{x}^{\ast})\right]\right\}^{2Nx_{i}^{\prime}}=\binom{2N}{2N\mathbf{x}^{\prime}}\prod\nolimits_{i=1}^{n}\left(x_{i}^{\ast}\right)^{2Nx_{i}^{\prime}}\times\prod\nolimits_{i=1}^{n}\left[1+G_{i}(\mathbf{x}^{\ast})\right]^{2Nx_{i}^{\prime}}\\
&=M^{(0)}(\mathbf{x}^{\prime}|\mathbf{x})\times e^{C(\mathbf{x}^{\prime}|\mathbf{x})}
\end{aligned}

(58)

where

M^{(0)}(\mathbf{x}^{\prime}|\mathbf{x})=\binom{2N}{2N\mathbf{x}^{\prime}}\prod\nolimits_{i=1}^{n}\left(x_{i}^{\ast}\right)^{2Nx_{i}^{\prime}}

(59)

and

C(\mathbf{x}^{\prime}|\mathbf{x})=2N\sum\nolimits_{i=1}^{n}x_{i}^{\prime}\ln\left(1+G_{i}(\mathbf{x}^{\ast})\right).

(60)

Equation (58) represents an exact decomposition of the transition matrix, $M(\mathbf{x}^{\prime}|\mathbf{x})$ , into the product of a transition matrix, $M^{(0)}(\mathbf{x}^{\prime}|\mathbf{x})$ , that is independent of selection, and a factor $e^{C(\mathbf{x}^{\prime}|\mathbf{x})}$ which depends on selection, and is unity in the absence of selection.

References

[1] Fisher, R. A. On the dominance ratio. Proceedings of the Royal Society of Edinburgh 42, 321–341 (1922).
[2] Wright, S. G. Evolution in Mendelian populations. Genetics 16, 97–159 (1931).
[3] Ewens, W. Mathematical Population Genetics I. Theoretical Introduction, 2nd Edition (Springer-Verlag, New York, 2004).
[4] Haller, B. C. & Messer, P. W. Forward genetic simulations beyond the Wright-Fisher model. Molecular Biology and Evolution 36, 632–637 (2019).
[5] Tataru, P., Simonsen, M., Bataillon, T. & Hobolth, A. Inference in the Wright-Fisher model using allele frequency data. Systematic Biology 66, e30–e46 (2017).
[6] Imhof, L. & Nowak, M. Evolutionary game dynamics in a Wright-Fisher process. Journal of Mathematical Biology 52, 667–681 (2006).
[7] Fu, Y.-X. Exact coalescent for the Wright-Fisher model. Theoretical Population Biology 69, 385–394 (2006).
[8] Kimura, M. Stochastic processes and distribution of gene frequencies under natural selection. Cold Spring Harbor Symposia on Quantitative Biology 20, 33–53 (1955).
[9] Schraiber, J. G. A path integral formulation of the Wright–Fisher process with genic selection. Theoretical Population Biology 92, 30–35 (2014).
[10] Feynman, R. P. & Hibbs, A. Quantum Mechanics and Path-Integrals (McGraw-Hill, New York, 1965).
[11] Koval’chik, I. M. The Wiener integral. Russian Mathematical Surveys 18(1), 97–134, DOI: 10.1070/RM1963v018n01ABEH004126 (1963).
[12] Kleinert, H. Path Integrals in Quantum Mechanics, Statistics, Polymer Physics, and Financial Markets, 5th Edition (World Scientific, Singapore, 2009).
[13] Okay, T. S., Oliveira, W. P., Raiz-Júnior, R., Rodrigues, J. C. & Negro, G. M. B. D. Frequency of the deltaF508 mutation in 108 cystic fibrosis patients in Sao Paulo: Comparison with reported Brazilian data. Clinics 60(2), DOI: 10.1590/S1807-59322005000200009 (2005).
[14] Hodgkinson, A. & Eyre-Walker, A. Human triallelic sites: Evidence for a new mutational mechanism? Genetics 184(1), DOI: 10.1534/genetics.109.110510 (2010).
[15] Campbell, I. M. et al. Multiallelic positions in the human genome: Challenges for genetic analyses. Human Mutation 37(3), 231–234, DOI: 10.1002/humu.22944 (2016).
[16] Phillips, C. et al. A compilation of tri-allelic SNPs from 1000 genomes and use of the most polymorphic loci for a large-scale human identification panel. Forensic Science International: Genetics 46:102232, DOI: 10.1016/j.fsigen.2020.102232 (2020).
[17] Eyre-Walker, A. & Keightley, P. D. Estimating the rate of adaptive molecular evolution in the presence of slightly deleterious mutations and population size change. Molecular Biology and Evolution 26, 2097–2108, DOI: 10.1093/molbev/msp119 (2009).
[18] Hoppensteadt, F. C. Mathematical Methods of Population Biology (Cambridge University Press, Cambridge, 1982).
[19] Zhao, L., Gossmann, T. I. & Waxman, D. A modified Wright–Fisher model that incorporates $N_{e}$: A variant of the standard model with increased biological realism and reduced computational complexity. Journal of Theoretical Biology 393, 218–228 (2016).
[20] Waxman, D. Fixation at a locus with multiple alleles: Structure and solution of the Wright-Fisher model. Journal of Theoretical Biology 257, 245–251 (2009).
[21] McKane, A. & Waxman, D. Singular solutions of the diffusion equation of population genetics. Journal of Theoretical Biology 247, 849–858 (2007).
[22] Takahata, N., Ishii, K. & Matsuda, H. Effect of temporal fluctuation of selection coefficient on gene frequency in a population. Genetics 72, 4541–4545 (1975).
[23] Waxman, D. A unified treatment of the probability of fixation when population size and the strength of selection change over time. Genetics 188, 907–913 (2011).
[24] Ross, S. M. Introduction to Probability Models (13th ed.) (Academic Press, Oxford, 2024).