Accurate and Flexible Bayesian Mutation Call from Multi-regional Tumor Samples

  • Conference paper
Mathematical and Computational Oncology (ISMCO 2019)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11826)

Abstract

We propose a Bayesian method termed MultiMuC for accurate detection of somatic mutations (mutation call) from multi-regional tumor sequence data sets. To improve detection performance, our method is based on the assumption of mutation sharing: if we can predict that at least one tumor region has the mutation, then we can detect the mutation in more tumor regions with greater confidence by lowering the original detection threshold. We find two drawbacks in existing methods that leverage the assumption of mutation sharing. First, existing methods do not consider the probability of the “No-TP (True Positive)” case, in which mutation candidates are expected in multiple regions but no true mutation actually exists. Second, existing methods cannot leverage scores from other state-of-the-art mutation calling methods for a single-regional tumor. We overcome the first drawback by evaluating the probability of the No-TP case. We address the second drawback through Bayes-factor-based model construction, which enables flexible integration of probability-based mutation call scores as building blocks of a Bayesian statistical model. Through a real-data-based simulation study, we show empirically that our method steadily improves results from mutation calling methods for a single-regional tumor, e.g., Strelka2 and NeuSomatic, and outperforms existing methods for multi-regional tumors. Our implementation of MultiMuC is available at https://github.com/takumorizo/MultiMuC.

References

  1. Koboldt, D.C., et al.: VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Res. 22(3), 568–576 (2012)

  2. Saunders, C.T., et al.: Strelka: accurate somatic small-variant calling from sequenced tumor-normal sample pairs. Bioinformatics 28(14), 1811–1817 (2012)

  3. Cibulskis, K., et al.: Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nat. Biotechnol. 31(3), 213–219 (2013)

  4. Shiraishi, Y., et al.: An empirical Bayesian framework for somatic mutation detection from cancer genome sequencing data. Nucleic Acids Res. 41(7), e89 (2013)

  5. Usuyama, N., et al.: HapMuC: somatic mutation calling using heterozygous germ line variants near candidate mutations. Bioinformatics 30(23), 3302–3309 (2014)

  6. Kim, S., et al.: Strelka2: fast and accurate calling of germline and somatic variants. Nat. Methods 15(8), 591–594 (2018)

  7. Moriyama, T., et al.: A Bayesian model integration for mutation calling through data partitioning. Bioinformatics, btz233 (2019). https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btz233/5423180

  8. Sahraeian, S.M.E., et al.: Deep convolutional neural networks for accurate somatic mutation detection. Nat. Commun. 10(1), 1041 (2019)

  9. Poplin, R., et al.: A universal SNP and small-indel variant caller using deep neural networks. Nat. Biotechnol. 36(10), 983–987 (2018)

  10. Reiter, J.G., et al.: Reconstructing metastatic seeding patterns of human cancers. Nat. Commun. 8, 14114 (2017)

  11. Dorri, F., et al.: Somatic mutation detection and classification through probabilistic integration of clonal population information. Commun. Biol. 2(1), 44 (2019)

  12. van Rens, K.E., et al.: SNV-PPILP: refined SNV calling for tumor data using perfect phylogenies and ILP. Bioinformatics 31(7), 1133–1135 (2015)

  13. Salari, R., et al.: Inference of tumor phylogenies with improved somatic mutation discovery. J. Comput. Biol. 20(11), 933–944 (2013)

  14. Josephidou, M., et al.: multiSNV: a probabilistic approach for improving detection of somatic point mutations from multiple related tumour samples. Nucleic Acids Res. 43(9), e61 (2015)

  15. Detering, H., et al.: Accuracy of somatic variant detection in multiregional tumor sequencing data. bioRxiv 655605 (2019)

  16. Kass, R.E., et al.: Bayes factors. J. Am. Stat. Assoc. 90(430), 773–795 (1995)

  17. Neal, R.M.: Probabilistic inference using Markov Chain Monte Carlo methods. Technical report, Department of Computer Science, University of Toronto (1993)

  18. Koboldt, D.C., et al.: VarScan: variant detection in massively parallel sequencing of individual and pooled samples. Bioinformatics 25(17), 2283–2285 (2009)

  19. Wilm, A., et al.: LoFreq: a sequence-quality aware, ultra-sensitive variant caller for uncovering cell-population heterogeneity from high-throughput sequencing datasets. Nucleic Acids Res. 40(22), 11189–11201 (2012)

  20. Narzisi, G., et al.: Genome-wide somatic variant calling using localized colored de Bruijn graphs. Commun. Biol. 1(1), 20 (2018)

Acknowledgments

We used the supercomputers at the Human Genome Center, the Institute of Medical Science, the University of Tokyo. This work was supported by the Grant-in-Aid for JSPS Research Fellow (17J08884) and MEXT/JSPS KAKENHI Grants (15H05912, hp180198, hp170227, 18H03329, hp190158).

Author information

Corresponding author

Correspondence to Rui Yamaguchi.

A Appendix

A.1 Increasing Posterior Odds Score of Mutation Call Given \(C=1\)

In Sect. 2.1 of the main text, we assumed that lowering the score threshold given \(C=1\) improves performance. Here, we show that this assumption rests on an increase in the posterior odds of a mutation call. We assume that each score is expressed in posterior odds form, \(V_i := Pr(X_i = 1|D_i)/Pr(X_i = 0|D_i)\). If we observe \(C = 1\) in addition to the observed sequence data set, the true posterior odds can be written as follows.

$$\begin{aligned} V_i^{\prime }&:= \frac{Pr(X_i = 1|C=1, D_i)}{Pr(X_i = 0|C=1, D_i)} = \frac{Pr(X_i = 1, C=1, D_i)}{Pr(X_i = 0, C=1, D_i)} \\&= \frac{ Pr( X_i = 1|C= 1 )}{Pr( X_i = 0| C = 1)} \frac{Pr( D_i|X_i = 1, C=1 )}{ Pr( D_i|X_i = 0, C=1 ) } \end{aligned}$$

The true posterior odds is greater than the original posterior odds as shown in the following lemma.

Lemma 2 (Increasing posterior odds)

$$\begin{aligned}&\text{ If } Pr(D_i|X_i, C) = Pr(D_i|X_i),\, 0< Pr(C = 0) < 1,\text{ and } V_i, V_i^{\prime } \in \mathbb {R},\,\text{ then } V_i^{\prime } > V_i\,. \end{aligned}$$

Proof

It is sufficient to show that the following condition holds true.

$$\begin{aligned}&\frac{ Pr(X_i = 1|C= 1)}{Pr(X_i = 0| C = 1)} > \frac{ Pr(X_i = 1)}{Pr(X_i = 0)} \end{aligned}$$

The condition can be proved by evaluating \(Pr(X_i = 1)\) and \(Pr(X_i = 0)\) as follows.

$$\begin{aligned}&Pr(X_i = 1) \\&= Pr(X_i = 1|C = 1) Pr(C = 1) + Pr(X_i = 1|C = 0) Pr(C = 0) \\&= Pr(X_i = 1|C = 1) Pr(C = 1) \,(\because \,Pr(X_i = 1|C = 0) = 0) \\&Pr(X_i = 0) \\&= Pr(X_i = 0|C = 1) Pr(C = 1) + Pr(X_i = 0|C = 0) Pr(C = 0) \\&= Pr(X_i = 0|C = 1) Pr(C = 1) + Pr(C = 0) \,(\because \,Pr(X_i = 0|C = 0) = 1) \\&> Pr(X_i = 0|C = 1) Pr(C = 1) \,(\because \,0< Pr(C = 0) < 1) \end{aligned}$$

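Dividing the expression for \(Pr(X_i = 1)\) by the strict lower bound on \(Pr(X_i = 0)\) cancels the common factor \(Pr(C = 1)\), which gives \(\frac{ Pr(X_i = 1|C= 1)}{Pr(X_i = 0| C = 1)} > \frac{ Pr(X_i = 1)}{Pr(X_i = 0)}\) (the inequality is strict provided \(Pr(X_i = 1|C = 1) > 0\)).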

From this condition and the given hypothesis,

$$\begin{aligned}&V_i^{\prime } = \frac{ Pr( X_i = 1|C= 1 )}{Pr( X_i = 0| C = 1)} \frac{Pr( D_i|X_i = 1, C=1 )}{ Pr( D_i|X_i = 0, C=1 ) } > \frac{ Pr( X_i = 1 )}{Pr( X_i = 0)} \frac{Pr( D_i|X_i = 1 )}{ Pr( D_i|X_i = 0) } = V_i \end{aligned}$$

   \(\square \)
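The lemma can be checked numerically on a toy example. The sketch below is only an illustration: the values of \(Pr(C=1)\), \(Pr(X_i=1|C=1)\), and the likelihood ratio are hypothetical, and the helper name `posterior_odds` is not taken from the MultiMuC implementation. It uses the two facts from the proof, namely \(Pr(X_i = 1|C = 0) = 0\) and \(Pr(D_i|X_i, C) = Pr(D_i|X_i)\).

```python
def posterior_odds(prior_x1: float, likelihood_ratio: float) -> float:
    """Posterior odds = prior odds * likelihood ratio."""
    return prior_x1 / (1.0 - prior_x1) * likelihood_ratio


# Hypothetical toy values, chosen only for illustration.
pr_c1 = 0.3             # Pr(C = 1); the lemma needs 0 < Pr(C = 0) < 1
pr_x1_given_c1 = 0.5    # Pr(X_i = 1 | C = 1)
likelihood_ratio = 2.0  # Pr(D_i | X_i = 1) / Pr(D_i | X_i = 0), independent of C

# Since Pr(X_i = 1 | C = 0) = 0, the marginal prior is Pr(X_i = 1 | C = 1) * Pr(C = 1).
pr_x1 = pr_x1_given_c1 * pr_c1

v_original = posterior_odds(pr_x1, likelihood_ratio)           # V_i
v_given_c1 = posterior_odds(pr_x1_given_c1, likelihood_ratio)  # V_i'

print(f"V_i  = {v_original:.4f}")
print(f"V_i' = {v_given_c1:.4f}")
```

With these values the odds conditioned on \(C=1\) exceed the original odds, which is the behaviour the lemma states; other choices with \(0< Pr(C = 0) < 1\) and \(Pr(X_i=1|C=1) > 0\) should behave the same way.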

A.2 The Probability of No-TP Case in General

Here, we evaluate the probability of the No-TP case when \(M < N\), that is, when the scores of only \(M\) of the \(N\) regions exceed the threshold \(v\). For simplicity, we define the following vectors and relational operators between a vector and a scalar in Eq. (10). The operators \(\ge , <, \le , =\) between a vector and a scalar are defined in the same way as \(>\).

$$\begin{aligned}&\varvec{V} :=(V_{1},\cdots ,V_{M}),\, \widetilde{\varvec{V}} :=(V_{M+1},\cdots ,V_{N}),\, \varvec{X} :=(X_{1},\cdots ,X_{M}),\, \widetilde{\varvec{X}} :=(X_{M+1},\cdots ,X_{N}), \\&\varvec{u}> v \iff u_{i} > v \,\,(\forall i ), \\&\varvec{u} \ne v \iff u_{i} \ne v \,\,(\exists i ) \end{aligned}$$
(10)

The probability of the No-TP case can be represented as follows.

$$\begin{aligned}&Pr( \varvec{X} = 0, \widetilde{\varvec{X}} = 0|\varvec{V} > v, \widetilde{\varvec{V}} \le v ) \end{aligned}$$
(11)

For \(Pr( \varvec{V} > v, \widetilde{\varvec{V}} \le v )\), we can obtain a lower bound as follows.

$$\begin{aligned}&Pr(\varvec{V}> v, \widetilde{\varvec{V}} \le v) \\&\qquad = \sum _{\varvec{X}, \widetilde{\varvec{X}}} Pr( \varvec{X},\widetilde{\varvec{X}} ) Pr(\varvec{V}> v, \widetilde{\varvec{V}} \le v| \varvec{X}, \widetilde{\varvec{X}}) \\&\qquad = \sum _{\varvec{X}, \widetilde{\varvec{X}}} Pr( \varvec{X},\widetilde{\varvec{X}} ) \prod _{i=1}^{M} Pr(V_{i} > v|X_{i}) \prod _{k=M+1}^{N} Pr(V_{k} \le v|X_{k}) \,(\because \, \text{Eq. (2)}) \\&\qquad = \sum _{\varvec{X}, \widetilde{\varvec{X}}} Pr( \varvec{X}, \widetilde{\varvec{X}} ) \prod _{i=1}^{M} (1-s_{i}(v))^{1-X_{i}} R_{i}(v)^{X_{i}} \prod _{k=M+1}^{N} s_{k}(v)^{1-X_{k}} (1-R_{k}(v))^{X_{k}} \\&\qquad \ge Pr( \varvec{X} = 1, \widetilde{\varvec{X}} = 0 ) \prod _{i=1}^{M} R_{i}(v) \prod _{k=M+1}^{N} s_{k}(v) =: A, \end{aligned}$$
(12)

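where \(R_{i}(v) := Pr(V_{i} > v|X_{i} = 1)\) corresponds to recall and \(s_{i}(v) = Pr(V_{i} \le v|X_{i} = 0)\) corresponds to specificity. The inequality holds because every term of the sum is nonnegative, so the sum is at least its single term with \(\varvec{X} = 1, \widetilde{\varvec{X}} = 0\).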
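The lower bound in Eq. (12) can be verified by brute force on a small example. The sketch below enumerates all \(2^N\) configurations of \((\varvec{X}, \widetilde{\varvec{X}})\) and compares the full sum with the single term \(A\). The independent Bernoulli prior, the recall and specificity values, and all function names are assumptions made for illustration only; the bound itself does not require an independent prior.

```python
from itertools import product
from math import prod


def prob_observed_pattern(prior, recall, specificity, m):
    """Pr(V > v, V~ <= v), summed over all 2^N configurations of X.

    prior(x) returns Pr(X = x) for a tuple x of 0/1 values. Regions 0..m-1 are
    those whose score exceeds v; regions m..N-1 are those whose score does not.
    """
    n = len(recall)
    total = 0.0
    for x in product((0, 1), repeat=n):
        # Score above v: recall if mutated, 1 - specificity otherwise.
        above = prod(recall[i] if x[i] else 1.0 - specificity[i] for i in range(m))
        # Score at or below v: 1 - recall if mutated, specificity otherwise.
        below = prod(1.0 - recall[k] if x[k] else specificity[k] for k in range(m, n))
        total += prior(x) * above * below
    return total


def lower_bound_a(prior, recall, specificity, m):
    """The single term A of Eq. (12): X = 1 for the first m regions, 0 elsewhere."""
    n = len(recall)
    x = (1,) * m + (0,) * (n - m)
    return prior(x) * prod(recall[:m]) * prod(specificity[m:])


# Hypothetical values: N = 3 regions, M = 2 of them above the threshold.
recall = [0.8, 0.7, 0.9]
specificity = [0.99, 0.95, 0.98]
pi = 0.1  # assumed independent prior Pr(X_i = 1), for illustration only


def independent_prior(x):
    return prod(pi if xi else 1.0 - pi for xi in x)


total = prob_observed_pattern(independent_prior, recall, specificity, m=2)
a = lower_bound_a(independent_prior, recall, specificity, m=2)
print(f"full sum = {total:.6f}, A = {a:.6f}, bound holds: {total >= a}")
```

Enumeration is exponential in \(N\), so this is only practical as a sanity check for a handful of regions.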

From Eq. (12), if \(A > 0\), we can derive an upper bound for Eq. (11) as follows.

$$\begin{aligned}&Pr( \varvec{X} = 0, \widetilde{\varvec{X}} = 0|\varvec{V}> v, \widetilde{\varvec{V}} \le v ) = \frac{ Pr( \varvec{X} = 0, \widetilde{\varvec{X}} = 0, \varvec{V}> v, \widetilde{\varvec{V}} \le v ) }{ Pr( \varvec{V} > v, \widetilde{\varvec{V}} \le v ) } \\&\le \min \left( 1, \frac{ Pr( \varvec{X} = 0, \widetilde{\varvec{X}} = 0 ) \prod _{i=1}^{M} (1-s_{i}(v)) \prod _{k=M+1}^{N} s_{k}(v) }{ Pr( \varvec{X} = 1, \widetilde{\varvec{X}} = 0 ) \prod _{i=1}^{M} R_{i}(v) \prod _{k=M+1}^{N} s_{k}(v) } \right) \\&= \min \left( 1, \frac{Pr( \varvec{X} = 0, \widetilde{\varvec{X}} = 0 )}{ Pr( \varvec{X} = 1, \widetilde{\varvec{X}} = 0 ) } \prod _{i=1}^{M} \frac{ 1-s_{i}(v) }{ R_{i}(v)} \right) \end{aligned}$$
(13)

From Eq. (13), as the specificity \(s_{i}(v)\) increases, the upper bound on the probability of the No-TP case decreases, so the No-TP case also becomes less probable when \(M < N\).
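The dependence of the bound in Eq. (13) on specificity can be seen with a few numbers. The following sketch evaluates the final expression of Eq. (13) for increasing specificity. The prior ratio, the recall values, and the function name `no_tp_upper_bound` are hypothetical choices for illustration and are not taken from the MultiMuC implementation.

```python
from math import prod


def no_tp_upper_bound(prior_ratio, recall, specificity):
    """Upper bound of Eq. (13) on the probability of the No-TP case.

    prior_ratio is Pr(X = 0, X~ = 0) / Pr(X = 1, X~ = 0); recall and specificity
    list R_i(v) and s_i(v) for the M regions whose scores exceed v.
    """
    ratio = prod((1.0 - s) / r for s, r in zip(specificity, recall))
    return min(1.0, prior_ratio * ratio)


# Hypothetical values for M = 2 regions above the threshold.
prior_ratio = 100.0  # assumed prior ratio favouring "no mutation in any region"
recall = [0.8, 0.8]

for s in (0.9, 0.99, 0.999):
    bound = no_tp_upper_bound(prior_ratio, recall, [s, s])
    print(f"specificity = {s}: bound = {bound:.6f}")
```

In this example the bound is uninformative (equal to 1) at the lowest specificity and shrinks quickly as specificity approaches 1, because each of the \(M\) regions contributes a factor \((1-s_{i}(v))/R_{i}(v)\).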

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Moriyama, T., Imoto, S., Miyano, S., Yamaguchi, R. (2019). Accurate and Flexible Bayesian Mutation Call from Multi-regional Tumor Samples. In: Bebis, G., Benos, T., Chen, K., Jahn, K., Lima, E. (eds) Mathematical and Computational Oncology. ISMCO 2019. Lecture Notes in Computer Science(), vol 11826. Springer, Cham. https://doi.org/10.1007/978-3-030-35210-3_4

  • DOI: https://doi.org/10.1007/978-3-030-35210-3_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-35209-7

  • Online ISBN: 978-3-030-35210-3

  • eBook Packages: Computer Science, Computer Science (R0)
