TMRCA Estimates
2 A Bayesian Method
We wish to write down the posterior probability of TMRCA, which we will
call T, given the mutation rates. We will start with one allele at a time.
Since we will assume that mutations in different alleles are independent, the
probabilities for all alleles are just products of the individual probabilities.
Likewise, the probability distribution of a difference is given by the correlation of the two distributions, $P(W \equiv X - Y) = \sum_i P(X_i)\,P(W + X_i)$. For distributions, like ours, that are symmetric about zero, it is easy to show that convolution and correlation are the same.
Using this information we can immediately write down the probability distribution of the markers, $M = M_0 + D$, after T generations: it is the single-generation mutation distribution convolved with itself T times, and convolutions become simple products under discrete Fourier transforms. And so if P is the probability distribution of mutations,

$P(D|T, \vec\mu) = \mathrm{DFT}^{-1}\!\left(\mathrm{DFT}(P)^T\right)$  (2)

and for calculating the difference between the haplotypes of two descendants, you would just replace T in the above equation with 2T.
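A concrete sketch of equation (2), again assuming the illustrative three-point single-step model; the grid size N only needs to be large enough that the distribution does not wrap around:

```python
import numpy as np

def p_d_given_t(mu, T, N=256):
    # Single-generation mutation distribution on a circular grid;
    # index 0 is D = 0, index N-1 is D = -1, and so on.
    p = np.zeros(N)
    p[0] = 1 - mu
    p[1] = mu / 2
    p[-1] = mu / 2
    # Equation (2): the T-fold convolution is a T-th power in Fourier space.
    return np.real(np.fft.ifft(np.fft.fft(p) ** T))

dist = p_d_given_t(mu=0.002, T=500)
print(dist[0], dist[1])   # P(D = 0), P(D = +1)
# For the difference between two descendants, use 2*T in place of T.
```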
Generally the mutations are symmetric, and so the DFT becomes the DCT, the discrete cosine transform, which we define as

$F_k = \mathrm{DCT}[f](k) = \sum_j f_j \cos(2\pi jk/N)$  (3)
This is true for any N, but we can also take N arbitrarily large, and this becomes an integral:

$P(x|T) = (1-\mu)^T\,\frac{1}{2\pi}\int_0^{2\pi} d\theta\;\cos(x\theta)\,(1+\mu\cos\theta)^T$  (10)

$\phantom{P(x|T)} = (1-\mu)^T\,\frac{1}{\pi}\int_0^{\pi} d\theta\;\cos(x\theta)\,(1+\mu\cos\theta)^T$  (11)
This integral can be evaluated in various ways. For small µ, $(1-\mu)^T \approx e^{-\mu T}$ and $(1+\mu\cos\theta)^T \approx e^{\mu T\cos\theta}$, so it can be written

$P(x|T) \approx \exp(-\mu T)\,\frac{1}{\pi}\int_0^\pi d\theta\;\cos(x\theta)\,\exp(\mu T\cos\theta)$  (12)

$\phantom{P(x|T)} = \exp(-\mu T)\, I_x(\mu T)$  (13)

where $I_x$ is the modified Bessel function of the first kind.
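Equation (13) is simple to evaluate with standard libraries; a minimal sketch (parameter values are arbitrary) comparing it against direct quadrature of equation (11):

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import iv   # modified Bessel function I_x

mu, T, x = 0.002, 500, 3

# Small-mu approximation, equation (13)
approx = np.exp(-mu * T) * iv(x, mu * T)

# Direct numerical integration of equation (11)
integrand = lambda th: np.cos(x * th) * (1 + mu * np.cos(th)) ** T
exact = (1 - mu) ** T / np.pi * quad(integrand, 0, np.pi)[0]

print(approx, exact)   # nearly identical for small mu
```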
Alternatively, one can use the Chebyshev identity

$\cos(x\theta) = T_x(\cos\theta) = \sum_{i=0}^{x} a_{x,i}\,\cos^i(\theta)$  (14)

where $a_{x,i}$ are the Chebyshev polynomial coefficients. Using this and the binomial theorem, we have

$P(x|T) = (1-\mu)^T\,\frac{1}{\pi}\int_0^\pi d\theta\; T_x(\cos\theta)\,(1+\mu\cos\theta)^T$  (15)

$= (1-\mu)^T\,\frac{1}{\pi}\sum_{i=0}^{x} a_{x,i}\int_0^\pi d\theta\;\cos^i(\theta)\sum_{j=0}^{T}\binom{T}{j}\,\mu^j\cos^j(\theta)$  (16)

$= (1-\mu)^T\,\frac{1}{\pi}\sum_{i=0}^{x}\sum_{j=0}^{T} a_{x,i}\,\binom{T}{j}\,\mu^j\int_0^\pi d\theta\;\cos^{i+j}(\theta)$  (17)
Generally, the x values are small, especially for small T where this expansion is useful, so one need only tabulate the first few Chebyshev polynomial coefficients.
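A sketch of equation (17), pulling the power-basis Chebyshev coefficients from numpy rather than a hand-tabulated list, and using the classical result that the integral of cos^n(θ) over [0, π] vanishes for odd n and equals π (n−1)!!/n!! for even n:

```python
import math
from numpy.polynomial import chebyshev

def cos_power_integral(n):
    # Integral of cos^n(theta) over [0, pi]: 0 for odd n,
    # pi * (n-1)!!/n!! for even n (Wallis formula).
    if n % 2:
        return 0.0
    val = math.pi
    while n > 1:
        val *= (n - 1) / n
        n -= 2
    return val

def p_x_given_t(x, T, mu):
    # a_{x,i}: coefficients of the Chebyshev polynomial T_x in powers of cos(theta)
    a = chebyshev.cheb2poly([0] * x + [1])
    total = sum(a[i] * math.comb(T, j) * mu**j * cos_power_integral(i + j)
                for i in range(x + 1) for j in range(T + 1))
    return (1 - mu) ** T / math.pi * total

print(p_x_given_t(x=2, T=50, mu=0.002))
```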
When µ is very poorly known, one might wish to integrate over µ and write

$P(x|T) = \int_0^\infty d\mu\; P(\mu)\,\exp(-\mu T)\, I_x(\mu T)$  (20)

This can easily be done numerically for any distribution P(µ). Generally it has very little effect unless the uncertainty in µ is quite large. The integral can be computed analytically (albeit with some difficulty) when P(µ) is a Gamma distribution, a not unreasonable choice: $P(\mu) = \mu^{k-1}\,(\Gamma(k)\,\theta^k)^{-1}\exp(-\mu/\theta)$. This has mean kθ and variance kθ², so k is given by the square of the mean over the variance and θ is given by the variance over the mean. Using this we have
$P(x|T) = (\Gamma(k)\,\theta^k)^{-1}\int_0^\infty d\mu\;\mu^{k-1}\exp(-\mu/\theta)\,\exp(-\mu T)\, I_x(\mu T)$  (21)

$\phantom{P(x|T)} = (\Gamma(k)\,\theta^k)^{-1}\, T^{-k}\int_0^\infty d\tau\;\tau^{k-1}\exp(-\tau s)\, I_x(\tau)$  (22)

where we have substituted $\tau = \mu T$ and defined

$s \equiv 1 + 1/(\theta T)$  (23)
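Numerically, the µ-integration is a one-liner with quadrature; a minimal sketch (the assumed mean and spread of µ are purely illustrative):

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import iv
from scipy.stats import gamma

mu_mean, mu_sd = 0.002, 0.0005       # illustrative uncertainty on mu
k = mu_mean**2 / mu_sd**2            # shape: mean squared over variance
theta = mu_sd**2 / mu_mean           # scale: variance over mean

def p_x_given_t(x, T):
    # Equation (20) with a Gamma prior on mu
    f = lambda mu: gamma.pdf(mu, a=k, scale=theta) * np.exp(-mu * T) * iv(x, mu * T)
    return quad(f, 0, np.inf)[0]

print(p_x_given_t(x=2, T=500))
```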
2.4 Variances
Since our distributions are all symmetric about zero, the mean of the distribution is always zero. The next moment of interest is the variance. Another relevant fact is that the variance of a convolution of two functions is the sum of their variances. This fact allows us to write down the variance of the distribution $P(D|T,\vec\mu)$ as

$\mathrm{Var}[P(D|T,\vec\mu)] = T\,\mathrm{Var}[P]$

where Var[P] is the variance of the original mutation distribution. For single branching, Var[P] = µ, and so $\mathrm{Var}[P(D|T,\vec\mu)] = T\mu$. As usual, we replace T with 2T when talking about the variance between two descendants rather than the variance between a descendant and the ancestor.
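A quick Monte Carlo check that the variance really is µT under the single-step mutation model (a sketch; all parameter values arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, T, n_lines = 0.002, 500, 100_000

# Mutation count per line is Binomial(T, mu); each mutation is +1 or -1.
n_mut = rng.binomial(T, mu, size=n_lines)
D = 2 * rng.binomial(n_mut, 0.5) - n_mut

print(D.var(), mu * T)   # both close to 1.0 for these values
```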
It is well known that the sample mean is unbiased for the true mean but that the sample variance is not unbiased for the true variance: the expectation value for $s^2$ is $(N-1)/N\,\sigma^2$. But this bias is harmless, because the corrected statistic $N/(N-1)\,s^2$ is an unbiased estimator for $\sigma^2$.
These well-known results, however, make the assumption that the data are independent. If the data are not independent but are, rather, correlated, then this result for the sample variance is changed as follows. For clarity, we will drop the i subscript for the moment and reintroduce it later when needed. So we are just discussing the data in one allele.
$s^2 \equiv \frac{1}{N}\sum_j (D_j - m)^2$  (27)

$\phantom{s^2} = \frac{1}{N}\sum_j D_j^2 - \frac{2}{N}\sum_j D_j\, m + m^2$  (28)

Plugging in $m = N^{-1}\sum_k D_k$, this becomes

$s^2 = \frac{1}{N}\sum_j D_j^2 - \frac{2}{N^2}\sum_{jk} D_j D_k + \frac{1}{N^2}\sum_{jk} D_j D_k$  (30)

$\phantom{s^2} = \frac{1}{N}\sum_j D_j^2 - \frac{1}{N^2}\sum_{jk} D_j D_k$  (31)
Let µ be the population mean (just for now; we will use µ for mutation rates later). The data values are $D_j = \mu + \epsilon_j$, where the $\epsilon_j$ are the random deviations from the mean due to random mutations. The expectation values are $\langle\epsilon_j\rangle = 0$ (required if µ is to be the mean), and the expectation value of $\epsilon_j\epsilon_k$ defines the covariance matrix, $C_{jk} = \langle\epsilon_j\epsilon_k\rangle$. So now, we can write
$\frac{1}{N}\sum_j D_j^2 = \frac{1}{N}\sum_j (\mu + \epsilon_j)^2 = \frac{1}{N}\sum_j \left(\mu^2 + 2\mu\epsilon_j + \epsilon_j^2\right)$  (32)

The expectation value of this is $\mu^2 + \frac{1}{N}\sum_j C_{jj} = \mu^2 + N^{-1}\,\mathrm{Tr}(C)$, where Tr(C) is the trace (sum of diagonals) of the covariance matrix. Similarly, the expectation value of the second term is $\langle N^{-2}\sum_{jk} D_j D_k\rangle = \mu^2 + N^{-2}\sum_{jk} C_{jk}$, and
so we can finally write down the expectation value of the sample variance for
correlated data,
$\langle s^2\rangle = \frac{1}{N}\,\mathrm{Tr}(C) - \frac{1}{N^2}\sum_{jk} C_{jk}$  (34)

For the special case (uncorrelated and equal variances) $C_{jk} = \sigma^2\delta_{jk}$, we recover the usual result $\langle s^2\rangle = \sigma^2 - N^{-1}\sigma^2 = (N-1)/N\,\sigma^2$.
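Equation (34) is easy to verify by simulation; a sketch with an arbitrary, randomly generated covariance matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 5

A = rng.normal(size=(N, N))
C = A @ A.T                    # an arbitrary valid covariance matrix

# Many zero-mean correlated samples, each of length N
data = rng.multivariate_normal(np.zeros(N), C, size=500_000)
s2 = data.var(axis=1).mean()   # Monte Carlo estimate of <s^2>

predicted = np.trace(C) / N - C.sum() / N**2   # equation (34)
print(s2, predicted)           # agree to Monte Carlo precision
```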
Now, let's apply this to the STR data for a clade. We only need to know the covariance matrix of $D_j$. We are still just working with one allele, so we will suppress the i subscript. We already know that the variances are µT. From now on, µ will refer to mutation rates, not the mean. But what about the off-diagonal values? Here, we need to remember that the mutations are assumed
to be independent events. If two people have a pairwise TMRCA of Tjk , it
means that those people shared the exact same mutation events before that
time and, after that time, experienced independent (uncorrelated) mutations.
So it is clear that the off-diagonal covariances are given by µ(T − Tjk ). So
now, we can write down the expectation value of the sample variance for
STR marker data.
$\langle s^2\rangle = \frac{1}{N}\,\mathrm{Tr}(C) - \frac{1}{N^2}\sum_{jk} C_{jk}$  (35)

$\phantom{\langle s^2\rangle} = \mu T - \frac{\mu}{N^2}\sum_{jk}\,(T - T_{jk})$  (36)

$\phantom{\langle s^2\rangle} = \frac{\mu}{N^2}\sum_{jk} T_{jk}$  (37)
Note that the diagonals of $T_{jk}$ are zero and there are N(N−1) off-diagonal terms, so we can write this as

$\langle s^2\rangle = \mu\,\frac{N-1}{N}\,T_P$  (38)

where $T_P$ is the mean pairwise TMRCA,

$T_P = \frac{1}{N(N-1)}\sum_{jk} T_{jk}$  (39)
So, at last, we have shown that the corrected sample variance $N/(N-1)\,s^2$ is not in fact an unbiased estimator of µT but is instead an unbiased estimator of $\mu T_P$, the mutation rate times the mean pairwise TMRCA. This $T_P$ is of course always less than T. The ratio $T/T_P$ will depend on the structure and branching times of the particular tree but will usually be in the range 1 to 3.
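As a closing illustration, a sketch on an assumed toy genealogy (four descendants: two pairs that each coalesce 100 generations ago, joined at the root T = 500 generations ago) showing that the corrected sample variance tracks µT_P rather than µT:

```python
import numpy as np

rng = np.random.default_rng(2)
mu, T, n_trials = 0.002, 500, 200_000

def drift(gens, size):
    # Net STR displacement after `gens` generations of single-step mutation
    n_mut = rng.binomial(gens, mu, size=size)
    return 2 * rng.binomial(n_mut, 0.5) - n_mut

# Shared drift along each subclade's stem (root down to the pair's ancestor),
# then private drift along each leaf's final 100 generations.
a = drift(T - 100, n_trials)
b = drift(T - 100, n_trials)
leaves = np.stack([a + drift(100, n_trials), a + drift(100, n_trials),
                   b + drift(100, n_trials), b + drift(100, n_trials)])

N = 4
s2_corrected = leaves.var(axis=0).mean() * N / (N - 1)
T_P = (4 * 100 + 8 * T) / (N * (N - 1))    # mean pairwise TMRCA, about 367

print(s2_corrected, mu * T_P, mu * T)      # s^2 matches mu*T_P, not mu*T
```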
3 References

Walsh, B. 2001. Genetics 158(2):897 (The Genetics Society of America). http://www.genetics.org/cgi/reprint/158/2/897

http://en.wikipedia.org/wiki/Gamma_distribution

http://mathworld.wolfram.com/SampleVarianceDistribution.html