Haplotalk
January 7, 2015
Introduction Induced Haplotyping Improved FPT algorithm Simple Kernel Conclusion
A G T T A G C G A
A G T C A G C A A
gene gene
approx. 0.1% of human nucleotide sites differ between individuals
the sequence of SNPs is called a haplotype
T G
C A
Motivation
Haplotype Inference
goal: find relation between certain SNPs and genetic diseases
problem: difficult (expensive) to sequence both haplotypes
but: easy (cheap) to sequence the genotype instead
→ idea: sequence genotype and computationally infer haplotypes
Problems
impossible to infer haplotypes of just 1 genotype
→ sequence and infer groups/populations
which explanation should be preferred if there are multiple?
→ parsimony (prefer the explanation that uses the fewest distinct haplotypes)
how to perform the actual computation fast?
Parsimony
used in...
... Clark’s problem [Clark, Molecular Biology and Evolution ’90]
Preliminary Definitions
Example
haplotype1: 0 0 1 0 1 1 1 0 0 0 1
haplotype2: 0 0 1 1 0 1 0 0 1 1 1
genotype: 0 0 1 2 2 1 2 0 2 2 1
another pair of haplotypes explaining the same genotype:
haplotype1: 0 0 1 1 1 1 1 0 0 0 1
haplotype2: 0 0 1 0 0 1 0 0 1 1 1
genotype: 0 0 1 2 2 1 2 0 2 2 1
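The example can be checked mechanically. The following Python sketch combines two haplotypes into a genotype (writing 2 wherever they differ) and enumerates every unordered pair of haplotypes that explains a given genotype. The function names `genotype` and `resolutions` are illustrative and not taken from the talk; haplotypes and genotypes are assumed to be strings over 0/1 and 0/1/2.

```python
from itertools import product


def genotype(h1, h2):
    """Combine two equal-length 0/1 haplotypes into a 0/1/2 genotype."""
    return ''.join(a if a == b else '2' for a, b in zip(h1, h2))


def resolutions(g):
    """Return all unordered haplotype pairs whose genotype is g."""
    twos = [i for i, c in enumerate(g) if c == '2']
    pairs = set()
    # Every position with a 2 must get a 0 in one haplotype and a 1 in the other.
    for bits in product('01', repeat=len(twos)):
        h1, h2 = list(g), list(g)
        for i, b in zip(twos, bits):
            h1[i] = b
            h2[i] = '1' if b == '0' else '0'
        pairs.add(frozenset((''.join(h1), ''.join(h2))))
    return pairs


g = genotype('00101110001', '00110100111')
alternatives = resolutions(g)
```

A genotype with t ≥ 1 entries equal to 2 has 2^(t-1) such pairs, which is why a single genotype cannot be resolved on its own.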
Preliminary Definitions
Example
genotypes   haplotypes
11122       01001
21221       11111
12120       11011
12212       10110
22222       11100
21021
[Figure: haplotype graph with the five haplotypes as vertices and one edge per genotype, joining the two haplotypes that resolve it]
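To make the haplotype graph concrete, here is a minimal Python sketch that recomputes its edges from the six genotypes and five haplotypes of this example. The names `haplotype_graph` and `resolves` are hypothetical helpers, and the sketch assumes that a genotype is resolved by a pair of haplotypes (possibly the same haplotype twice) whose position-wise combination equals it.

```python
from itertools import combinations_with_replacement


def genotype(h1, h2):
    """Position-wise combination: equal entries are kept, differing ones become 2."""
    return ''.join(a if a == b else '2' for a, b in zip(h1, h2))


def haplotype_graph(haplotypes, genotypes):
    """Map each genotype to the haplotype pairs (edges) that resolve it."""
    wanted = set(genotypes)
    edges = {g: [] for g in genotypes}
    for h1, h2 in combinations_with_replacement(haplotypes, 2):
        g = genotype(h1, h2)
        if g in wanted:
            edges[g].append((h1, h2))
    return edges


def resolves(haplotypes, genotypes):
    """True if every genotype is explained by at least one haplotype pair."""
    return all(haplotype_graph(haplotypes, genotypes).values())


G = ['11122', '21221', '12120', '12212', '22222', '21021']
H = ['01001', '11111', '11011', '10110', '11100']
edges = haplotype_graph(H, G)
```

For this instance every genotype is resolved by exactly one pair, so the graph has five vertices and six edges.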
Previous Work
Complexity
NP-hard [Halldórsson et al., DMTCS ’03]
Algorithms
O(2^{|G|·d}) Branch&Bound [Wang & Xu, Bioinformatics ’03]
Our Contribution
Observation
G can be “nicely” partitioned into G0, G1, and G2 (the genotypes with a 0, 1, or 2 at a fixed position).
Example
2 0 1 2 1
2 0 2 1 1
2 2 1 1 2
2 2 1 2 2
1 0 2 2 1
1 2 1 0 2
1 2 1 2 2
1 2 2 1 2
1 2 2 2 2
1 1 1 2 0
column 1, all genotypes:
10 genotypes in total → 5 haplotypes
6 genotypes with 1 → 4 haplotypes with 1
0 genotypes with 0 → 1 haplotype with 0
column 2, the 6 genotypes with 1 in column 1:
6 genotypes in total → 4 haplotypes
1 genotype with 1 → 2 haplotypes with 1
1 genotype with 0 → 2 haplotypes with 0
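The counting in this example can be reproduced with a short Python sketch. It assumes, as the numbers on the slide suggest, that in the induced variant every pair of the k haplotypes contributes exactly one genotype, so that |G| = k(k-1)/2, |G1| = k1(k1-1)/2, |G0| = k0(k0-1)/2 and |G2| = k0·k1. The helper names `split_by_column` and `haplotype_counts` are made up for illustration.

```python
from math import comb


def split_by_column(genotypes, col):
    """Partition genotypes into (G0, G1, G2) by their entry in column col."""
    parts = {'0': [], '1': [], '2': []}
    for g in genotypes:
        parts[g[col]].append(g)
    return parts['0'], parts['1'], parts['2']


def haplotype_counts(genotypes, col):
    """Return (k0, k1): haplotypes with 0 and with 1 in column col, or None."""
    g0, g1, g2 = split_by_column(genotypes, col)
    n = len(genotypes)
    # Smallest k >= 1 with k choose 2 == n (k = 1 covers the case n = 0).
    k = next((k for k in range(1, n + 2) if comb(k, 2) == n), None)
    if k is None:
        return None
    for k1 in range(k + 1):
        k0 = k - k1
        if (comb(k1, 2) == len(g1) and comb(k0, 2) == len(g0)
                and k0 * k1 == len(g2)):
            return k0, k1
    return None


G = ['20121', '20211', '22112', '22122', '10221',
     '12102', '12122', '12212', '12222', '11120']
first = haplotype_counts(G, 0)
ones_in_first_column = split_by_column(G, 0)[1]
second = haplotype_counts(ones_in_first_column, 1)
```

Here `first` corresponds to the statement "1 haplotype with 0, 4 haplotypes with 1" for column 1, and `second` to "2 haplotypes with 0, 2 haplotypes with 1" for column 2 of the six genotypes that have a 1 in column 1.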
Observation
G2 ≠ ∅ but G0 = ∅ or G1 = ∅ → polynomial time
|G0| = |G1| = 1 → polynomial time (although we may get 2 solutions)
Observation
Let H0 induce G0 and let g be a genotype in G2 with the smallest number of 2’s.
→ All h ∈ H0 that are consistent with g are equal.
Proof Idea
[Figure: h’ = 1100... in H1 and h = 0110... in H0 resolve g = 2120...; the figure also shows a haplotype h’’ = 01?0... and a genotype g’]
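The notion of consistency used in the observation can be sketched in a few lines of Python. The sketch assumes that a haplotype h is consistent with a genotype g if h agrees with g wherever g is 0 or 1, and that the second haplotype is then obtained by flipping h at the positions where g is 2. The names `consistent` and `partner` are illustrative, and the four-character strings are only the prefixes shown on the slide.

```python
def consistent(h, g):
    """h can take part in resolving g: it matches g at every non-2 position."""
    return all(c == '2' or c == b for b, c in zip(h, g))


def partner(h, g):
    """The unique haplotype that resolves g together with h, or None."""
    if not consistent(h, g):
        return None
    flip = {'0': '1', '1': '0'}
    return ''.join(flip[b] if c == '2' else c for b, c in zip(h, g))


h_prime = partner('0110', '2120')
```

With h = 0110 and g = 2120 this yields h’ = 1100, matching the prefixes in the proof idea.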
Theorem
Induced Haplotype Inference by Parsimony can be solved in
O(|G| · k · m) time.
Definitions
Example
[Figure: graph whose edges are labeled with the genotypes 21221, 21021, 22222, 11122, 12212, 12120]
Observations
Example
[Figure: graph with four edges, labeled with the genotypes 21221, 22222, 11122, 12120]
Observations
Observation
(G, k) is a yes-instance with solution H
⇒ there is a Γ that can be extended in O(|Γ| · m) time to a haplotype graph of H and G
algorithmic idea: guess Γ
better idea: guess a “spanning” subgraph of Γ
Algorithm
1 guess “spanning” subgraph of Γ
  1.1 guess a size-k genotype subset of G → O(k^{2k}) possibilities
  1.2 for these genotypes, guess 2 (of k) vertices → O(k^{2k}) possibilities
2 infer the haplotype multiset H → O(k · m) time
3 check whether H resolves G → O(k^2 · m) time
Theorem
Haplotype Inference by Parsimony can be solved in O(k^{4k+2} · m) time.
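The bound in the theorem follows by multiplying the number of guesses by the work done per guess. The derivation below only restates the figures from the algorithm; the last line rewrites the result in the 2^{O(k log k)} form.

```latex
\begin{align*}
\underbrace{O(k^{2k})}_{\text{genotype subset}} \cdot
\underbrace{O(k^{2k})}_{\text{endpoint guesses}} \cdot
\bigl( \underbrace{O(k \cdot m)}_{\text{infer } H} +
       \underbrace{O(k^{2} \cdot m)}_{\text{check}} \bigr)
  &= O(k^{4k} \cdot k^{2} \cdot m) \\
  &= O(k^{4k+2} \cdot m), \\
k^{4k+2} = 2^{(4k+2) \log_2 k} &= 2^{O(k \log k)}.
\end{align*}
```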
Example
[Figure, built up over several overlays: two haplotype graphs.
Graph 1: haplotypes 11101, 01011, 10010, 01000; genotypes 21221, 12222, 01022, 22020.
Graph 2: haplotypes 01001, 11111, 11011, 10110, 11100; genotypes 21221, 21021, 22222, 11122, 12212, 12120.
The overlays first show only the genotypes (without 12120), then partially inferred haplotypes such as ?10?1 and 111?1, and finally all haplotypes.]
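The intermediate states of this example (entries such as ?10?1) are consistent with a simple propagation rule, sketched below in Python. This is an assumption about how the inference step might work, not a procedure stated in the talk: a 0 or 1 in a genotype fixes that position in both endpoint haplotypes, and a 2 forces the endpoints to differ, so a known entry at one end determines the other. The function name `infer_partial` and the vertex numbering are hypothetical, and the sketch does not detect contradictions caused by a wrong guess.

```python
def infer_partial(num_vertices, edges, m):
    """Fill in haplotype entries implied by genotype-labeled edges.

    edges is a list of (u, v, genotype); unknown entries stay '?'.
    """
    hap = [['?'] * m for _ in range(num_vertices)]
    flip = {'0': '1', '1': '0'}
    changed = True
    while changed:  # repeat until no entry can be filled in any more
        changed = False
        for u, v, g in edges:
            for i, c in enumerate(g):
                if c in '01':
                    # Both endpoints carry the genotype's value here.
                    for w in (u, v):
                        if hap[w][i] == '?':
                            hap[w][i] = c
                            changed = True
                else:
                    # c == '2': the endpoints differ at position i.
                    for a, b in ((u, v), (v, u)):
                        if hap[a][i] != '?' and hap[b][i] == '?':
                            hap[b][i] = flip[hap[a][i]]
                            changed = True
    return [''.join(h) for h in hap]


# Graph 2 without the edge 12120, vertices numbered 0..4.
graph2 = [(0, 1, '21221'), (0, 2, '21021'), (0, 3, '22222'),
          (1, 4, '11122'), (2, 3, '12212')]
# Graph 1, vertices numbered 0..3.
graph1 = [(0, 1, '21221'), (0, 2, '12222'), (1, 3, '01022'), (2, 3, '22020')]

haps2 = infer_partial(5, graph2, 5)
haps1 = infer_partial(4, graph1, 5)
```

On graph 2 the rule determines all five haplotypes. On graph 1 the fourth position stays open, because every genotype on the cycle has a 2 there.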
Lemma
Let MH resolve MG. Then:
columns i, j of MH are equal ⇒ columns i, j of MG are equal
Example
MG             MH
1 1 1 2 1 2    0 1 0 0 0 1
1 2 1 2 1 0    1 0 1 1 1 0
1 2 2 1 2 2    1 1 0 1 0 1
2 1 0 2 0 1    1 1 1 0 1 0
2 1 2 2 2 1    1 1 1 1 1 1
2 2 2 2 2 2
columns 3 and 5 are equal in both matrices; with column 5 removed:
MG           MH
1 1 1 2 2    0 1 0 0 1
1 2 1 2 0    1 0 1 1 0
1 2 2 1 2    1 1 0 1 1
2 1 0 2 1    1 1 1 0 0
2 1 2 2 1    1 1 1 1 1
2 2 2 2 2
Kernel Conclusion
Theorem
columns: ≤ 2^k, rows: ≤ k^2
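One way to read the column bound: a matrix with k binary rows has at most 2^k distinct columns, and by the lemma two different columns of MG cannot come from equal columns of MH. The Python sketch below assumes that duplicate columns of MG may simply be dropped; the example in the talk suggests this, but the reduction rule itself is not spelled out here. The function name `drop_duplicate_columns` is illustrative.

```python
def drop_duplicate_columns(rows):
    """Keep only the first occurrence of every distinct column."""
    if not rows:
        return []
    seen = set()
    keep = []
    for j in range(len(rows[0])):
        column = tuple(row[j] for row in rows)
        if column not in seen:
            seen.add(column)
            keep.append(j)
    return [''.join(row[j] for j in keep) for row in rows]


# Genotype matrix from the lemma's example, one string per row.
MG = ['111212', '121210', '122122', '210201', '212221', '222222']
reduced = drop_duplicate_columns(MG)
```

For this matrix only column 5 is dropped, since it repeats column 3.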
Conclusion
what we saw...
introduced induced variant with O(k^3 · m) time algorithm
improved 2^{O(k^2 log k)} time algorithm to 2^{O(k log k)} time
presented O(2^k · k^2)-bit kernel
future work
find polynomial kernel (or prove nonexistence)
distance from triviality measures
find 2^{O(k)} time algorithm
Thank you