Abstract
Fragmentation trees are a technique for identifying molecular formulas and deriving some chemical properties of metabolites (small organic molecules) solely from mass spectral data. Computing these trees requires exact solutions to the NP-hard Maximum Colorful Subtree problem. Existing solvers struggle to solve the resulting large instances quickly enough to keep pace with instrument throughput, and their running time remains an obstacle to adoption in practice.
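To make the problem concrete, the following sketch (an illustration, not the authors' code) solves a toy Maximum Colorful Subtree instance by exhaustive search. It assumes the usual setting: a vertex-colored directed acyclic graph with weighted edges and a designated root r. The goal is a maximum-weight subtree rooted at r in which no color appears twice. The function name `max_colorful_subtree` and the toy graph are hypothetical. Enumerating all 2^|E| edge subsets is only feasible for tiny graphs. Because the problem is NP-hard, practical solvers rely on techniques such as reductions and ILP instead.

```python
from itertools import combinations


def max_colorful_subtree(edges, color, root):
    """Exact Maximum Colorful Subtree by exhaustive search (toy sizes only).

    edges: dict mapping (u, v) -> weight, describing a DAG.
    color: dict mapping vertex -> color.
    root:  vertex at which the subtree must be rooted.
    Returns (best_weight, sorted list of chosen edges); the empty tree has weight 0.
    """
    edge_list = list(edges)
    best_weight, best_tree = 0, []
    for k in range(1, len(edge_list) + 1):
        for subset in combinations(edge_list, k):
            # Tree condition: every vertex except the root gets at most one parent.
            parent = {}
            valid = True
            for u, v in subset:
                if v == root or v in parent:
                    valid = False
                    break
                parent[v] = u
            if not valid:
                continue
            # Connectivity: every chosen edge must leave the root or a vertex
            # that itself has a chosen parent. In a DAG this forces all chosen
            # edges into a single tree hanging from the root.
            if any(u != root and u not in parent for u, _ in subset):
                continue
            # Colorful condition: no color appears twice among the tree's vertices.
            vertices = {root} | set(parent)
            colors = [color[x] for x in vertices]
            if len(colors) != len(set(colors)):
                continue
            weight = sum(edges[e] for e in subset)
            if weight > best_weight:
                best_weight, best_tree = weight, sorted(subset)
    return best_weight, best_tree


# Hypothetical toy instance: vertices a and c share color 1.
edges = {("r", "a"): 2, ("r", "b"): 1, ("a", "b"): 3, ("b", "c"): 4, ("r", "c"): 5}
color = {"r": 0, "a": 1, "b": 2, "c": 1}
print(max_colorful_subtree(edges, color, "r"))  # (6, [('r', 'b'), ('r', 'c')])
```

On this toy graph, heavier trees such as r->a->b->c (weight 9) are rejected because a and c share a color. The colorful optimum is therefore the two edges r->b and r->c, with weight 6.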
We attack this problem on two fronts, combining fast and effective reduction algorithms with a strong integer linear program (ILP) formulation of the problem. Together, these achieve overall speedups of 9.4-fold and 8.8-fold on two sets of real-world problems, without sacrificing optimality. To our knowledge, both approaches are the first of their kind for this problem. We also evaluate the strategy of solving global problem instances, instead of first subdividing them into many candidate instances as has been done in the past. Software (C++ source for our reduction program and our CPLEX/Gurobi driver program) is available under the LGPL at https://github.com/wtwhite/speedy_colorful_subtrees/.
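For orientation, a basic ILP for Maximum Colorful Subtree uses one binary variable x_{uv} per edge, set to 1 when the edge is selected. The sketch below uses our own notation: root r, edge weights w, vertex coloring c, color set C. It assumes the graph is a DAG in which r is a source with a color of its own. This is a standard baseline formulation, not necessarily the strengthened formulation described above.

```latex
\begin{align}
\max \quad & \sum_{(u,v) \in E} w(u,v)\, x_{uv} \\
\text{s.t.} \quad & \sum_{u:\,(u,v) \in E} x_{uv} \le 1 && \forall v \in V \setminus \{r\} \quad \text{(at most one parent)} \\
& x_{vw} \le \sum_{u:\,(u,v) \in E} x_{uv} && \forall (v,w) \in E,\ v \neq r \quad \text{(connectivity)} \\
& \sum_{(u,v) \in E:\, c(v) = \gamma} x_{uv} \le 1 && \forall \gamma \in C \quad \text{(each color at most once)} \\
& x_{uv} \in \{0,1\} && \forall (u,v) \in E
\end{align}
```

Acyclicity matters here. On a graph with cycles, the connectivity constraints alone would not rule out a selected cycle that is detached from r.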
© 2015 Springer International Publishing Switzerland
White, W.T.J., Beyer, S., Dührkop, K., Chimani, M., Böcker, S. (2015). Speedy Colorful Subtrees. In: Xu, D., Du, D., Du, D. (eds) Computing and Combinatorics. COCOON 2015. Lecture Notes in Computer Science, vol 9198. Springer, Cham. https://doi.org/10.1007/978-3-319-21398-9_25
DOI: https://doi.org/10.1007/978-3-319-21398-9_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-21397-2
Online ISBN: 978-3-319-21398-9
eBook Packages: Computer Science, Computer Science (R0)