Abstract
Graph similarity search (GSS) models chemical compounds as a graph database. GSS is an essential tool for drug discovery because it can find graphs (compounds) that are similar to a query. Existing GSS methods have two critical limitations. First, handling large databases is time-consuming. Second, finding compounds with a structure-activity relationship (SAR), which is vital in drug discovery, remains difficult. Herein, a novel graph-based method for chemical compound searches is proposed to overcome these limitations. Since compounds with SAR share similar substructures, the proposed method extracts correlated subgraphs included in a query and uses them to explore similar compounds. In a practical drug discovery task, the proposed method achieves faster searches and improved accuracy compared to existing methods.
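The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of searching by correlated substructures, under several stated assumptions: substructures are represented by plain string labels rather than mined subgraphs, correlation is measured with the phi coefficient over substructure occurrences, and compounds are ranked by how many correlated query substructures they contain. The names `phi`, `search`, `theta`, and `k` are hypothetical and are not taken from the paper.

```python
from math import sqrt


def phi(occ_a, occ_b, n):
    """Phi coefficient between two substructures, given the sets of
    compound ids in which each one occurs and the database size n."""
    n11 = len(occ_a & occ_b)          # compounds containing both
    na, nb = len(occ_a), len(occ_b)   # compounds containing each
    denom = sqrt(na * (n - na) * nb * (n - nb))
    return 0.0 if denom == 0 else (n * n11 - na * nb) / denom


def search(db, query_feats, theta=0.5, k=2):
    """Rank compounds by the number of correlated query substructures.

    db maps a compound id to a set of substructure labels (an assumed,
    simplified stand-in for subgraphs of a molecular graph).
    """
    n = len(db)
    # Inverted index: substructure label -> ids of compounds containing it.
    occ = {}
    for cid, feats in db.items():
        for f in feats:
            occ.setdefault(f, set()).add(cid)
    # Keep query substructures correlated with another query substructure.
    q = [f for f in query_feats if f in occ]
    correlated = {
        f for f in q for g in q
        if f != g and phi(occ[f], occ[g], n) >= theta
    }
    # Score each compound by how many correlated substructures it contains.
    scores = {cid: len(feats & correlated) for cid, feats in db.items()}
    ranked = sorted(scores, key=lambda c: (-scores[c], c))
    return [c for c in ranked if scores[c] > 0][:k]


# Toy database (invented for illustration only).
db = {
    "c1": {"benzene", "amide", "hydroxyl"},
    "c2": {"benzene", "amide"},
    "c3": {"hydroxyl"},
    "c4": {"carboxyl"},
}
result = search(db, {"benzene", "amide", "hydroxyl"})
```

In this toy database, "benzene" and "amide" always co-occur, so they form the correlated pair that drives the ranking, while "hydroxyl" is uncorrelated with either and is ignored. A real system would presumably operate on subgraphs of molecular graphs and would need an index to keep large databases tractable.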
Acknowledgements
This work was partly supported by JST PRESTO (JPMJPR2033) and JSPS KAKENHI (JP22K17894).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Naoi, Y., Shiokawa, H. (2023). Boosting Similar Compounds Searches via Correlated Subgraph Analysis. In: Delir Haghighi, P., et al. Information Integration and Web Intelligence. iiWAS 2023. Lecture Notes in Computer Science, vol 14416. Springer, Cham. https://doi.org/10.1007/978-3-031-48316-5_42
DOI: https://doi.org/10.1007/978-3-031-48316-5_42
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-48315-8
Online ISBN: 978-3-031-48316-5
eBook Packages: Computer Science, Computer Science (R0)