Accurate and Flexible Bayesian Mutation Call from Multi-regional Tumor Samples

Moriyama, Takuya; Imoto, Seiya; Miyano, Satoru; Yamaguchi, Rui

doi:10.1007/978-3-030-35210-3_4

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11826)

Included in the following conference series:

International Symposium on Mathematical and Computational Oncology

Abstract

We propose MultiMuC, a Bayesian method for accurate detection of somatic mutations (mutation calling) from multi-regional tumor sequence data sets. To improve detection performance, our method relies on the assumption of mutation sharing: if we can predict that at least one tumor region has the mutation, then we can detect the mutation in additional tumor regions with more confidence by lowering the original detection threshold. We find two drawbacks in existing methods that leverage this assumption. First, they do not consider the probability of the “No-TP (True Positive)” case, in which mutation candidates are expected in multiple regions but no true mutation actually exists. Second, they cannot leverage scores from state-of-the-art mutation calling methods designed for a single-regional tumor. We overcome the first drawback by evaluating the probability of the No-TP case. We address the second through Bayes-factor-based model construction, which enables flexible integration of probability-based mutation call scores as building blocks of a Bayesian statistical model. Through a real-data-based simulation study, we show empirically that our method steadily improves the results of mutation calling methods for a single-regional tumor, e.g., Strelka2 and NeuSomatic, and outperforms existing methods for multi-regional tumors. Our implementation of MultiMuC is available at https://github.com/takumorizo/MultiMuC.

Acknowledgments

We used the supercomputers at Human Genome Center, the Institute of Medical Science, the University of Tokyo. This work has been supported by the Grant-in-Aid for JSPS Research Fellow (17J08884) and MEXT/JSPS KAKENHI Grant (15H05912, hp180198, hp170227, 18H03329, hp190158).

Author information

Authors and Affiliations

Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
Takuya Moriyama, Satoru Miyano & Rui Yamaguchi
Health Intelligence Center, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
Seiya Imoto & Satoru Miyano
Division of Cancer Systems Biology, Aichi Cancer Center Research Institute, Nagoya, Japan
Rui Yamaguchi
Department of Cancer Informatics, Nagoya University Graduate School of Medicine, Nagoya, Japan
Rui Yamaguchi

Corresponding author

Correspondence to Rui Yamaguchi.

Editor information

Editors and Affiliations

University of Nevada, Reno, NV, USA
George Bebis
University of Pittsburgh, Pittsburgh, PA, USA
Takis Benos
The University of Texas MD Anderson Cancer Center, Houston, TX, USA
Ken Chen
ETH Zurich, Basel, Switzerland
Katharina Jahn
The University of Texas, Austin, TX, USA
Ernesto Lima

A Appendix

A.1 Increasing Posterior Odds Score of Mutation Call Given $C=1$

In Sect. 2.1 of the main text, we assumed that lowering the score threshold given $C=1$ improves performance. Here, we show that this assumption rests on an increase in the posterior odds of a mutation call. We assume that each score is expressed in posterior-odds form, $V_i := Pr(X_i = 1|D_i)/Pr(X_i = 0|D_i)$. If we observe $C = 1$ in addition to the observed sequence data set, the true posterior odds can be written as follows.

$$\begin{aligned} V_i^{\prime }&:= \frac{Pr(X_i = 1|C=1, D_i)}{Pr(X_i = 0|C=1, D_i)} = \frac{Pr(X_i = 1, C=1, D_i)}{Pr(X_i = 0, C=1, D_i)} \\&= \frac{ Pr( X_i = 1|C= 1 )}{Pr( X_i = 0| C = 1)} \frac{Pr( D_i|X_i = 1, C=1 )}{ Pr( D_i|X_i = 0, C=1 ) } \end{aligned}$$

The true posterior odds are greater than the original posterior odds, as shown in the following lemma.

Lemma 2 (Increasing posterior odds)

$$\begin{aligned}&\text{ If } Pr(D_i|X_i, C) = Pr(D_i|X_i),\, 0< Pr(C = 0) < 1,\text{ and } V_i, V_i^{\prime } \in \mathbb {R},\,\text{ then } V_i^{\prime } > V_i\,. \end{aligned}$$

Proof

It is sufficient to show that the following condition holds true.

$$\begin{aligned}&\frac{ Pr(X_i = 1|C= 1)}{Pr(X_i = 0| C = 1)} > \frac{ Pr(X_i = 1)}{Pr(X_i = 0)} \end{aligned}$$

The condition can be proved by evaluating $Pr(X_i = 1)$ and $Pr(X_i = 0)$ as follows.

$$\begin{aligned}&Pr(X_i = 1) \\&= Pr(X_i = 1|C = 1) Pr(C = 1) + Pr(X_i = 1|C = 0) Pr(C = 0) \\&= Pr(X_i = 1|C = 1) Pr(C = 1) \,(\because \,Pr(X_i = 1|C = 0) = 0) \\&Pr(X_i = 0) \\&= Pr(X_i = 0|C = 1) Pr(C = 1) + Pr(X_i = 0|C = 0) Pr(C = 0) \\&= Pr(X_i = 0|C = 1) Pr(C = 1) + Pr(C = 0) \,(\because \,Pr(X_i = 0|C = 0) = 1) \\&> Pr(X_i = 0|C = 1) Pr(C = 1) \,(\because \,0< Pr(C = 0) < 1) \end{aligned}$$

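Dividing the first evaluation by the second, replacing the denominator $Pr(X_i = 0)$ by the strictly smaller quantity $Pr(X_i = 0|C = 1) Pr(C = 1)$, and cancelling $Pr(C = 1)$, we obtain $\frac{ Pr(X_i = 1|C= 1)}{Pr(X_i = 0| C = 1)} > \frac{ Pr(X_i = 1)}{Pr(X_i = 0)}$ (provided $Pr(X_i = 1|C = 1) > 0$).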

From this condition and the given hypothesis,

$$\begin{aligned}&V_i^{\prime } = \frac{ Pr( X_i = 1|C= 1 )}{Pr( X_i = 0| C = 1)} \frac{Pr( D_i|X_i = 1, C=1 )}{ Pr( D_i|X_i = 0, C=1 ) } > \frac{ Pr( X_i = 1 )}{Pr( X_i = 0)} \frac{Pr( D_i|X_i = 1 )}{ Pr( D_i|X_i = 0) } = V_i \end{aligned}$$

$\square $
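As a numerical illustration of Lemma 2, the following minimal Python sketch computes $V_i$ and $V_i^{\prime }$ for a single region. The function name `posterior_odds` and all parameter values are hypothetical and are not taken from MultiMuC. The sketch assumes $Pr(X_i = 1|C = 0) = 0$ and $Pr(D_i|X_i, C) = Pr(D_i|X_i)$, as in the proof above, and treats the likelihood ratio $Pr(D_i|X_i = 1)/Pr(D_i|X_i = 0)$ as a given positive number.

```python
def posterior_odds(p_x1_given_c1, p_c1, likelihood_ratio):
    """Return (V_i, V_i') for one region.

    p_x1_given_c1    : Pr(X_i = 1 | C = 1), assumed strictly between 0 and 1
    p_c1             : Pr(C = 1), assumed strictly between 0 and 1
    likelihood_ratio : Pr(D_i | X_i = 1) / Pr(D_i | X_i = 0), assumed > 0
    """
    # Marginal prior by the law of total probability, using Pr(X_i = 1 | C = 0) = 0.
    p_x1 = p_x1_given_c1 * p_c1
    p_x0 = 1.0 - p_x1

    # Original posterior odds: marginal prior odds times the likelihood ratio.
    v = (p_x1 / p_x0) * likelihood_ratio

    # Posterior odds given C = 1: conditional prior odds times the same
    # likelihood ratio (D_i is assumed independent of C given X_i).
    v_prime = (p_x1_given_c1 / (1.0 - p_x1_given_c1)) * likelihood_ratio
    return v, v_prime


# Illustrative values only.
v, v_prime = posterior_odds(p_x1_given_c1=0.3, p_c1=0.1, likelihood_ratio=5.0)
print(f"V_i = {v:.4f}, V_i' = {v_prime:.4f}, V_i' > V_i: {v_prime > v}")
```

With these illustrative values, conditioning on $C = 1$ replaces the prior odds $0.03/0.97$ by $0.3/0.7$, while the likelihood ratio is unchanged.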

A.2 The Probability of No-TP Case in General

Here, we evaluate the probability of the No-TP case when $M < N$. For simplicity, we first define vector variables and relational operators between a vector and a scalar in Eq. (10). The operators $\ge , <, \le , =$ between a vector and a scalar are defined analogously to $>$.

$$\begin{aligned}&\varvec{V} :=(V_{1},\cdots ,V_{M}),\, \widetilde{\varvec{V}} :=(V_{M+1},\cdots ,V_{N}),\varvec{X} :=(X_{1},\cdots ,X_{M}),\, \widetilde{\varvec{X}} :=(X_{M+1},\cdots ,X_{N}), \nonumber \\&\varvec{u}> v \iff u_{i} > v \,\,(\forall i ) \\&\varvec{u} \ne v \iff u_{i} \ne v \,\,(\exists i ) \nonumber \end{aligned}$$

(10)
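The vector-scalar operators of Eq. (10) can be sketched in Python as follows. The helper names are hypothetical. The point of the sketch is that the inequalities and equality must hold for every component, whereas $\ne $ only requires that some component differs, so that it is the negation of $=$.

```python
from typing import Sequence


def vec_gt(u: Sequence[float], v: float) -> bool:
    """u > v holds when every component of u exceeds v."""
    return all(u_i > v for u_i in u)


def vec_le(u: Sequence[float], v: float) -> bool:
    """u <= v holds when every component of u is at most v."""
    return all(u_i <= v for u_i in u)


def vec_eq(u: Sequence[float], v: float) -> bool:
    """u = v holds when every component of u equals v."""
    return all(u_i == v for u_i in u)


def vec_ne(u: Sequence[float], v: float) -> bool:
    """u != v holds when at least one component of u differs from v."""
    return any(u_i != v for u_i in u)


# X = 0 means no region carries the mutation; X != 0 means at least one does.
print(vec_eq([0, 0, 0], 0), vec_ne([0, 1, 0], 0))
```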

The probability of the No-TP case can be represented as follows.

$$\begin{aligned}&Pr( \varvec{X} = 0, \widetilde{\varvec{X}} = 0|\varvec{V} > v, \widetilde{\varvec{V}} \le v ) \end{aligned}$$

(11)

For $Pr( \varvec{V} > v, \widetilde{\varvec{V}} \le v )$, we can obtain a lower bound as follows.

$$\begin{aligned}&Pr(\varvec{V}> v, \widetilde{\varvec{V}} \le v) \nonumber \\&\qquad = \sum _{\varvec{X}, \widetilde{\varvec{X}}} Pr( \varvec{X},\widetilde{\varvec{X}} ) Pr(\varvec{V}> v, \widetilde{\varvec{V}} \le v| \varvec{X}, \widetilde{\varvec{X}}) \nonumber \\&\qquad = \sum _{\varvec{X}, \widetilde{\varvec{X}}} Pr( \varvec{X},\widetilde{\varvec{X}} ) \prod _{i=1}^{M} Pr(V_{i} > v|X_{i}) \prod _{k=M+1}^{N} Pr(V_{k} \le v|X_{k}) \,(\because \, Eq.\, (2)) \nonumber \\&\qquad = \sum _{\varvec{X}, \widetilde{\varvec{X}}} Pr( \varvec{X}, \widetilde{\varvec{X}} ) \prod _{i=1}^{M} (1-s_{i}(v))^{1-X_{i}} R_{i}(v)^{X_{i}} \prod _{k=M+1}^{N} s_{k}(v)^{1-X_{k}} (1-R_{k}(v))^{X_{k}} \nonumber \\&\qquad \ge Pr( \varvec{X} = 1, \widetilde{\varvec{X}} = 0 ) \prod _{i=1}^{M} R_{i}(v) \prod _{k=M+1}^{N} s_{k}(v) =: A, \end{aligned}$$

(12)

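where $R_{i}(v) := Pr(V_{i} > v|X_{i} = 1)$ corresponds to recall and, consistent with the expansion above, $s_{i}(v) = Pr(V_{i} \le v|X_{i} = 0)$ corresponds to specificity.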
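The lower bound in Eq. (12) can be checked by brute-force enumeration over all configurations of $(\varvec{X}, \widetilde{\varvec{X}})$. The following Python sketch does so for a small example. The independent Bernoulli prior `independent_prior`, the function names, and the numerical values are assumptions made only for illustration; the prior $Pr(\varvec{X}, \widetilde{\varvec{X}})$ used in the main text may differ.

```python
from itertools import product
from math import prod


def independent_prior(pi):
    """Hypothetical prior: each region is mutated independently with probability pi."""
    def prior(x):
        return prod(pi if x_i else 1.0 - pi for x_i in x)
    return prior


def prob_pattern(prior, s, R, M):
    """Pr(V > v, V~ <= v): the first M scores exceed v, the remaining N - M do not."""
    N = len(s)
    total = 0.0
    for x in product((0, 1), repeat=N):
        # Pr(V_i > v | X_i) is R_i if X_i = 1 and 1 - s_i if X_i = 0.
        above = prod(R[i] if x[i] else 1.0 - s[i] for i in range(M))
        # Pr(V_k <= v | X_k) is 1 - R_k if X_k = 1 and s_k if X_k = 0.
        below = prod(1.0 - R[k] if x[k] else s[k] for k in range(M, N))
        total += prior(x) * above * below
    return total


def lower_bound_A(prior, s, R, M):
    """The single term X = 1, X~ = 0 of the sum, i.e. A in Eq. (12)."""
    N = len(s)
    x_star = (1,) * M + (0,) * (N - M)
    return prior(x_star) * prod(R[:M]) * prod(s[M:])


# Illustrative values only: N = 3 regions, M = 2 of them above the threshold.
s = [0.99, 0.98, 0.97]  # specificities s_i(v)
R = [0.80, 0.70, 0.90]  # recalls R_i(v)
prior = independent_prior(0.1)
exact = prob_pattern(prior, s, R, M=2)
A = lower_bound_A(prior, s, R, M=2)
print(f"Pr(V > v, V~ <= v) = {exact:.6f}, lower bound A = {A:.6f}")
```

Because every term of the sum is non-negative, dropping all terms except one can only decrease the total, which is why $A$ never exceeds the enumerated probability.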

From Eq. (12), if $A > 0$, we can derive an upper bound for Eq. (11) as follows.

$$\begin{aligned}&Pr( \varvec{X} = 0, \widetilde{\varvec{X}} = 0|\varvec{V}> v, \widetilde{\varvec{V}} \le v ) = \frac{ Pr( \varvec{X} = 0, \widetilde{\varvec{X}} = 0, \varvec{V}> v, \widetilde{\varvec{V}} \le v ) }{ Pr( \varvec{V} > v, \widetilde{\varvec{V}} \le v ) } \nonumber \\&\le \min \left( 1, \frac{ Pr( \varvec{X} = 0, \widetilde{\varvec{X}} = 0 ) \prod _{i=1}^{M} (1-s_{i}(v)) \prod _{k=M+1}^{N} s_{k}(v) }{ Pr( \varvec{X} = 1, \widetilde{\varvec{X}} = 0 ) \prod _{i=1}^{M} R_{i}(v) \prod _{k=M+1}^{N} s_{k}(v) } \right) \nonumber \\&= \min \left( 1, \frac{Pr( \varvec{X} = 0, \widetilde{\varvec{X}} = 0 )}{ Pr( \varvec{X} = 1, \widetilde{\varvec{X}} = 0 ) } \prod _{i=1}^{M} \frac{ 1-s_{i}(v) }{ R_{i}(v)} \right) \end{aligned}$$

(13)

From Eq. (13), each factor $(1-s_{i}(v))/R_{i}(v)$ shrinks as the specificity $s_{i}(v)$ increases, so the upper bound on the probability of the No-TP case decreases when $M < N$ as well.
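To make the dependence on specificity concrete, the following Python sketch evaluates the right-hand side of Eq. (13) for increasing specificity. The function name `no_tp_upper_bound`, the prior ratio, and the recall values are hypothetical and chosen only for illustration.

```python
from math import prod


def no_tp_upper_bound(prior_ratio, s, R):
    """Right-hand side of Eq. (13).

    prior_ratio : Pr(X = 0, X~ = 0) / Pr(X = 1, X~ = 0)
    s, R        : specificities s_i(v) and recalls R_i(v) of the M regions
                  whose scores exceed the threshold v
    """
    ratio = prior_ratio * prod((1.0 - s_i) / r_i for s_i, r_i in zip(s, R))
    return min(1.0, ratio)


# Illustrative values only: M = 2 regions with recall 0.8 each.
prior_ratio = 100.0
recall = [0.8, 0.8]
for spec in (0.9, 0.99, 0.999):
    bound = no_tp_upper_bound(prior_ratio, [spec, spec], recall)
    print(f"specificity = {spec}: upper bound = {bound:.3g}")
```

In this example the prior ratio of 100 encodes an assumed prior in which the all-negative configuration is far more likely than mutations in all $M$ regions. The bound is vacuous (equal to 1) at specificity 0.9 and falls below 0.02 at specificity 0.99.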

Cite this paper

Moriyama, T., Imoto, S., Miyano, S., Yamaguchi, R. (2019). Accurate and Flexible Bayesian Mutation Call from Multi-regional Tumor Samples. In: Bebis, G., Benos, T., Chen, K., Jahn, K., Lima, E. (eds) Mathematical and Computational Oncology. ISMCO 2019. Lecture Notes in Computer Science, vol 11826. Springer, Cham. https://doi.org/10.1007/978-3-030-35210-3_4

DOI: https://doi.org/10.1007/978-3-030-35210-3_4
Published: 12 November 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-35209-7
Online ISBN: 978-3-030-35210-3
eBook Packages: Computer Science, Computer Science (R0)
