Abstract
A large body of research has focused on efficient and scalable processing of subgraph search queries on large networks. In these efforts, a query is posed in the form of a connected query graph. In practice, however, end users may not always have precise knowledge of the topological relationships between nodes in a query graph, and so cannot always formulate a connected query. In this paper, we present a novel graph querying paradigm called partial topology-based network search and propose a query processing framework called panda to efficiently process partial topology queries (ptqs) on a single machine. A ptq is a disconnected query graph containing multiple connected query components. ptqs allow an end user to formulate queries without precise information about the complete topology of a query graph. To this end, we propose an exact algorithm and an approximate algorithm, called sen-panda and po-panda, respectively, to generate the top-k matches of a ptq. We also present a subgraph simulation-based optimization technique to further speed up the processing of ptqs. Using real-life networks with millions of nodes, we experimentally verify that our proposed algorithms are superior to several baseline techniques.
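As an illustration of the definition above only (it is not the panda framework or either of its algorithms), the following minimal Python sketch represents a query graph as node and edge lists and checks whether it is a partial topology query by counting its connected query components. The function name `query_components` and the node labels are hypothetical and chosen purely for the example.

```python
from collections import defaultdict


def query_components(nodes, edges):
    """Return the connected components of an undirected query graph.

    Each component is returned as a sorted list of node labels.
    """
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        # Depth-first traversal from an unvisited node collects one component.
        seen.add(start)
        stack = [start]
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(sorted(component))
    return components


# Hypothetical query: the user knows two fragments but not how they connect.
nodes = ["author", "paper", "venue", "company", "city"]
edges = [("author", "paper"), ("paper", "venue"), ("company", "city")]

components = query_components(nodes, edges)
# More than one component means the query graph is disconnected,
# i.e. it has the shape of a partial topology query.
is_partial_topology_query = len(components) > 1
```

Here the query splits into two connected query components, so it is disconnected in the sense described in the abstract; how the components are matched and ranked against the data network is left to the framework itself.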
Notes
As we shall see later, our solution framework can easily handle overlapping cases by mapping them to a Steiner tree problem.
Acknowledgements
Qing Wang is supported by the National Natural Science Foundation of China under grants 61432001, 91318301, 91218302.
Additional information
This work was primarily done when the first author was visiting Nanyang Technological University.
Cite this article
Xie, M., Bhowmick, S.S., Cong, G. et al. PANDA: toward partial topology-based search on large networks in a single machine. The VLDB Journal 26, 203–228 (2017). https://doi.org/10.1007/s00778-016-0447-0
DOI: https://doi.org/10.1007/s00778-016-0447-0