Abstract
We propose a Bayesian method termed MultiMuC for accurate detection of somatic mutations (mutation call) from multi-regional tumor sequence data sets. To improve detection performance, our method relies on the assumption of mutation sharing: if we can predict that at least one tumor region has the mutation, then we can detect the mutation in other tumor regions with more confidence by lowering the original detection threshold. We find two drawbacks in existing methods that leverage the assumption of mutation sharing. First, existing methods do not consider the probability of the “No-TP (True Positive)” case, in which mutation candidates appear in multiple regions but no true mutation actually exists. Second, existing methods cannot leverage scores from other state-of-the-art mutation calling methods for single-regional tumors. We overcome the first drawback by evaluating the probability of the No-TP case. We solve the second drawback with Bayes-factor-based model construction, which enables flexible integration of probability-based mutation call scores as building blocks of a Bayesian statistical model. Through a real-data-based simulation study, we empirically show that our method steadily improves the results of mutation calling methods for single-regional tumors, e.g., Strelka2 and NeuSomatic, and outperforms existing methods for multi-regional tumors. Our implementation of MultiMuC is available at https://github.com/takumorizo/MultiMuC.
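The Bayes-factor idea rests on Bayes' theorem in odds form: posterior odds = Bayes factor × prior odds. The sketch below illustrates how a probability-based score from a single-region caller could be reused under that identity: it divides out the caller's prior odds to recover the Bayes factor \(Pr(D|X=1)/Pr(D|X=0)\) and then reattaches a different prior. It assumes the caller reports a calibrated posterior probability under a known prior; the function names and numbers are hypothetical, and this is not the MultiMuC implementation.

```python
def odds(p: float) -> float:
    """Convert a probability into odds p / (1 - p)."""
    return p / (1.0 - p)


def bayes_factor(p_post: float, p_prior: float) -> float:
    """Pr(D | X=1) / Pr(D | X=0) = posterior odds / prior odds (Bayes' theorem in odds form)."""
    return odds(p_post) / odds(p_prior)


def posterior_prob(bf: float, p_prior: float) -> float:
    """Recombine a Bayes factor with a (possibly different) prior; returns Pr(X=1 | D)."""
    post_odds = bf * odds(p_prior)
    return post_odds / (1.0 + post_odds)


# Hypothetical single-region caller output: posterior 0.5 obtained under a 1% prior.
bf = bayes_factor(0.5, 0.01)  # roughly 99
# Reusing the same evidence under a hypothetical 10% prior.
print(round(bf, 6), round(posterior_prob(bf, 0.1), 4))
```

Because the Bayes factor carries only the evidence from the data, it can be combined with whatever prior a multi-region model supplies; with the original 1% prior, `posterior_prob` returns the caller's score of 0.5 unchanged.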
References
Koboldt, D.C., et al.: VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Res. 22(3), 568–576 (2012)
Saunders, C.T., et al.: Strelka: accurate somatic small-variant calling from sequenced tumor-normal sample pairs. Bioinformatics 28(14), 1811–1817 (2012)
Cibulskis, K., et al.: Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nat. Biotechnol. 31(3), 213–219 (2013)
Shiraishi, Y., et al.: An empirical Bayesian framework for somatic mutation detection from cancer genome sequencing data. Nucleic Acids Res. 41(7), e89 (2013)
Usuyama, N., et al.: HapMuC: somatic mutation calling using heterozygous germ line variants near candidate mutations. Bioinformatics 30(23), 3302–3309 (2014)
Kim, S., et al.: Strelka2: fast and accurate calling of germline and somatic variants. Nat. Methods 15(8), 591–594 (2018)
Moriyama, T., et al.: A Bayesian model integration for mutation calling through data partitioning. Bioinformatics, btz233 (2019). https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btz233/5423180
Sahraeian, S.M.E., et al.: Deep convolutional neural networks for accurate somatic mutation detection. Nat. Commun. 10(1), 1041 (2019)
Poplin, R., et al.: A universal SNP and small-indel variant caller using deep neural networks. Nat. Biotechnol. 36(10), 983–987 (2018)
Reiter, J.G., et al.: Reconstructing metastatic seeding patterns of human cancers. Nat. Commun. 8, 14114 (2017)
Dorri, F., et al.: Somatic mutation detection and classification through probabilistic integration of clonal population information. Commun. Biol. 2(1), 44 (2019)
van Rens, K.E., et al.: SNV-PPILP: refined SNV calling for tumor data using perfect phylogenies and ILP. Bioinformatics 31(7), 1133–1135 (2015)
Salari, R., et al.: Inference of tumor phylogenies with improved somatic mutation discovery. J. Comput. Biol. 20(11), 933–944 (2013)
Josephidou, M., et al.: multiSNV: a probabilistic approach for improving detection of somatic point mutations from multiple related tumour samples. Nucleic Acids Res. 43(9), e61 (2015)
Detering, H., et al.: Accuracy of somatic variant detection in multiregional tumor sequencing data. bioRxiv 655605 (2019)
Kass, R.E., et al.: Bayes factors. J. Am. Stat. Assoc. 90(430), 773–795 (1995)
Neal, R.M.: Probabilistic inference using Markov Chain Monte Carlo methods. Technical report, Department of Computer Science, University of Toronto (1993)
Koboldt, D.C., et al.: VarScan: variant detection in massively parallel sequencing of individual and pooled samples. Bioinformatics 25(17), 2283–2285 (2009)
Wilm, A., et al.: LoFreq: a sequence-quality aware, ultra-sensitive variant caller for uncovering cell-population heterogeneity from high-throughput sequencing datasets. Nucleic Acids Res. 40(22), 11189–11201 (2012)
Narzisi, G., et al.: Genome-wide somatic variant calling using localized colored de Bruijn graphs. Commun. Biol. 1(1), 20 (2018)
Acknowledgments
We used the supercomputers at Human Genome Center, the Institute of Medical Science, the University of Tokyo. This work has been supported by the Grant-in-Aid for JSPS Research Fellow (17J08884) and MEXT/JSPS KAKENHI Grant (15H05912, hp180198, hp170227, 18H03329, hp190158).
A Appendix
A.1 Increasing Posterior Odds Score of Mutation Call Given \(C=1\)
In the main text (Sect. 2.1), we assumed that lowering the score threshold given \(C=1\) improves detection performance. Here, we show that this assumption rests on an increase in the posterior odds of the mutation call. We assume that each score takes the posterior odds form \(V_i := Pr(X_i = 1|D_i)/Pr(X_i = 0|D_i)\). If we observe \(C = 1\) in addition to the sequence data set, the true posterior odds can be represented as follows.
The true posterior odds are greater than the original posterior odds \(V_i\), as shown in the following lemma.
Lemma 2 (Increasing posterior odds)
Proof
It is sufficient to show that the following condition holds true.
The condition can be proved by evaluating \(Pr(X_i = 1)\) and \(Pr(X_i = 0)\) as follows.
By using the above evaluations, we can show \(\frac{ Pr(X_i = 1|C= 1)}{Pr(X_i = 0| C = 1)} > \frac{ Pr(X_i = 1)}{Pr(X_i = 0)}\).
From this condition and the given hypothesis,
\(\square \)
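Lemma 2 can be checked numerically. The sketch below uses assumptions of our own that are not taken from the main text: \(C = 1\) is read as "at least one of the \(N\) regions carries the mutation", the prior over \(X\) is independent Bernoulli with hypothetical per-region rates `pi`, and \(D_i\) depends on \(X\) only through \(X_i\), so region \(i\)'s evidence enters as a Bayes factor `bf`. Under these assumptions, posterior odds equal Bayes factor times prior odds, so conditioning on \(C=1\) changes only the prior-odds term, which the code computes by enumerating all \(2^N\) configurations.

```python
from itertools import product
from typing import Sequence, Tuple


def prior_prob(x: Tuple[int, ...], pi: Sequence[float]) -> float:
    """Pr(X = x) under independent Bernoulli priors (an illustrative assumption)."""
    p = 1.0
    for xj, pj in zip(x, pi):
        p *= pj if xj == 1 else 1.0 - pj
    return p


def prior_odds(i: int, pi: Sequence[float], given_c: bool) -> float:
    """Pr(X_i=1)/Pr(X_i=0), optionally conditioned on C=1 (at least one region mutated).

    Assumes 0 < pi[j] < 1 for all j and N >= 2, so the denominator is positive.
    """
    num = den = 0.0
    for x in product((0, 1), repeat=len(pi)):
        if given_c and not any(x):
            continue  # the all-zero configuration is ruled out by C = 1
        p = prior_prob(x, pi)
        if x[i] == 1:
            num += p
        else:
            den += p
    return num / den  # the normalizer Pr(C=1) cancels in the ratio


def posterior_odds(i: int, bf: float, pi: Sequence[float], given_c: bool) -> float:
    """Posterior odds = Bayes factor x prior odds (valid because D_i depends only on X_i)."""
    return bf * prior_odds(i, pi, given_c)


pi = [0.1, 0.1, 0.1]  # hypothetical per-region prior mutation rates
v_i = posterior_odds(0, 2.0, pi, given_c=False)  # original score V_i
v_i_c = posterior_odds(0, 2.0, pi, given_c=True)  # odds after also observing C = 1
print(f"{v_i:.4f} -> {v_i_c:.4f}")  # prints "0.2222 -> 1.1696"
```

With these hypothetical numbers the odds grow because \(C=1\) removes the all-zero configuration, which contributes only to the \(X_i = 0\) side; this mirrors the prior-odds inequality used in the proof.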
A.2 The Probability of the No-TP Case in General
Here, we evaluate the probability of the No-TP case when \(M < N\). For simplicity, we define the variables and the relational operator \(>\) between a vector and a scalar in Eq. (10); the relational operators \(\ge , <, \le , =\) between a vector and a scalar are defined analogously.
The probability of the No-TP case can be represented as follows.
For \(Pr( \varvec{V} > v, \widetilde{\varvec{V}} \le v )\), we can obtain a lower bound as follows.
where \(R_{i}(v) := Pr(V_{i} > v|X_{i} = 1)\) corresponds to recall.
From Eq. (12), if \(A > 0\), we can derive an upper bound for Eq. (11) as follows.
From Eq. (13), when \(M < N\), the probability of the No-TP case decreases as the specificity increases.
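To illustrate the quantity being bounded, the sketch below computes the No-TP probability \(Pr(\varvec{X} = 0 \mid \text{pattern})\) exactly for a small toy model, rather than reproducing the bounds of Eqs. (12) and (13). Its assumptions are ours: scores are independent across regions given \(\varvec{X}\), \(Pr(V_j > v|X_j = 1)\) is the recall \(R_j(v)\), \(Pr(V_j > v|X_j = 0)\) equals one minus a per-region specificity, and the prior over \(\varvec{X}\) is a hypothetical callable (here independent Bernoulli). All function names are hypothetical.

```python
from itertools import product
from typing import Callable, Sequence, Tuple

Config = Tuple[int, ...]


def pattern_likelihood(above: Sequence[bool], x: Config,
                       recall: Sequence[float], spec: Sequence[float]) -> float:
    """Pr(threshold pattern | X = x); scores independent across regions given X (assumption)."""
    p = 1.0
    for a, xj, r, s in zip(above, x, recall, spec):
        p_above = r if xj == 1 else 1.0 - s  # Pr(V_j > v | X_j): recall or false-positive rate
        p *= p_above if a else 1.0 - p_above
    return p


def no_tp_probability(above: Sequence[bool], recall: Sequence[float],
                      spec: Sequence[float], prior: Callable[[Config], float]) -> float:
    """Pr(X = 0 | pattern), computed by enumerating all 2^N mutation configurations."""
    n = len(above)
    joint = {x: pattern_likelihood(above, x, recall, spec) * prior(x)
             for x in product((0, 1), repeat=n)}
    return joint[(0,) * n] / sum(joint.values())


def independent_prior(pi: Sequence[float]) -> Callable[[Config], float]:
    """Hypothetical prior: region j is mutated independently with probability pi[j]."""
    def prior(x: Config) -> float:
        p = 1.0
        for xj, pj in zip(x, pi):
            p *= pj if xj == 1 else 1.0 - pj
        return p
    return prior


# N = 3 regions, of which M = 2 exceed the threshold v (so M < N).
above = [True, True, False]
recall = [0.9, 0.9, 0.9]
prior = independent_prior([0.1, 0.1, 0.1])
for s in (0.9, 0.99):
    print(s, round(no_tp_probability(above, recall, [s] * 3, prior), 4))
```

In this example, raising the specificity from 0.9 to 0.99 lowers the No-TP probability from about 0.247 to about 0.008. The prior is passed as a callable so that a sharing-aware prior with correlated \(X_j\) could be substituted without changing the enumeration.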
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Moriyama, T., Imoto, S., Miyano, S., Yamaguchi, R. (2019). Accurate and Flexible Bayesian Mutation Call from Multi-regional Tumor Samples. In: Bebis, G., Benos, T., Chen, K., Jahn, K., Lima, E. (eds) Mathematical and Computational Oncology. ISMCO 2019. Lecture Notes in Computer Science, vol 11826. Springer, Cham. https://doi.org/10.1007/978-3-030-35210-3_4
DOI: https://doi.org/10.1007/978-3-030-35210-3_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-35209-7
Online ISBN: 978-3-030-35210-3
eBook Packages: Computer Science, Computer Science (R0)