Abstract
In this paper we present an approach based on the integrated use of graph clustering and visualisation methods for the semi-supervised discovery of biologically significant features in biomolecular data sets. We describe several clustering algorithms that have been custom-designed for the analysis of biomolecular data. They share an iterated two-step approach: thresholds and other clustering parameters are first computed, and connected graph components are then identified; where needed, the clustering parameters are adjusted and the steps repeated for individual subgraphs.
We demonstrate the application of these algorithms to two concrete use cases: (1) analysis of protein coexpression in colorectal cancer cell lines; and (2) identification of protein homology from both sequence and structural similarity data.
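The iterated two-step procedure summarised above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The threshold rule (midpoint of the largest gap between sorted edge weights), the size limit `max_size` that decides when a component is re-processed, the depth guard `max_depth`, and all function names are hypothetical choices made only for illustration.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List

Edge = FrozenSet[str]


def edge(u: str, v: str) -> Edge:
    """Undirected edge key (hypothetical helper)."""
    return frozenset((u, v))


def components(nodes: List[str], edges: List[Edge]) -> List[List[str]]:
    """Connected components via union-find; groups follow the order of `nodes`."""
    parent = {v: v for v in nodes}

    def find(v: str) -> str:
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for e in edges:
        u, v = tuple(e)
        parent[find(u)] = find(v)
    groups: Dict[str, List[str]] = defaultdict(list)
    for v in nodes:
        groups[find(v)].append(v)
    return list(groups.values())


def gap_threshold(ws: List[float]) -> float:
    """ASSUMED parameter rule: midpoint of the largest gap between sorted weights."""
    ws = sorted(ws)
    _, i = max((ws[k + 1] - ws[k], k) for k in range(len(ws) - 1))
    return (ws[i] + ws[i + 1]) / 2


def cluster(nodes: List[str], weights: Dict[Edge, float],
            max_size: int = 3, max_depth: int = 5) -> List[List[str]]:
    node_set = set(nodes)
    local = [e for e in weights if e <= node_set]  # edges inside this subgraph
    ws = [weights[e] for e in local]
    # Stop when small enough, out of depth, or no weight spread to cut on.
    if len(nodes) <= max_size or max_depth == 0 or len(set(ws)) < 2:
        return [sorted(nodes)]
    # Step 1: compute the threshold from this subgraph's own edge weights.
    t = gap_threshold(ws)
    # Step 2: keep edges at or above the threshold and split into components.
    strong = [e for e in local if weights[e] >= t]
    result: List[List[str]] = []
    for comp in components(nodes, strong):
        # Oversized components are re-processed with a recomputed threshold.
        result.extend(cluster(comp, weights, max_size, max_depth - 1))
    return result


# Toy similarity graph (invented data, not from the paper).
nodes = list("abcdefg")
weights = {
    edge("a", "b"): 0.95, edge("b", "c"): 0.95, edge("a", "c"): 0.95,
    edge("d", "e"): 0.8, edge("f", "g"): 0.8, edge("e", "f"): 0.6,
    edge("c", "d"): 0.1,
}
clusters = cluster(nodes, weights)
print(clusters)  # [['a', 'b', 'c'], ['d', 'e'], ['f', 'g']]
```

On the toy data, the first pass cuts only the weak edge `c`–`d` and leaves `{d, e, f, g}` as one component. Because that component exceeds `max_size`, its threshold is recomputed from its own weights, which splits it at the weaker `e`–`f` edge. This is the per-subgraph parameter adjustment that a single global threshold would miss.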
Acknowledgements
The research was supported by ERDF project 1.1.1.1/16/A/135.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Celms, E. et al. (2018). Application of Graph Clustering and Visualisation Methods to Analysis of Biomolecular Data. In: Lupeikiene, A., Vasilecas, O., Dzemyda, G. (eds) Databases and Information Systems. DB&IS 2018. Communications in Computer and Information Science, vol 838. Springer, Cham. https://doi.org/10.1007/978-3-319-97571-9_20
DOI: https://doi.org/10.1007/978-3-319-97571-9_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-97570-2
Online ISBN: 978-3-319-97571-9
eBook Packages: Computer Science, Computer Science (R0)