research-article

Capturing topology in graph pattern matching

Published: 01 December 2011

Abstract

Graph pattern matching is often defined in terms of subgraph isomorphism, an NP-complete problem. To lower its complexity, various extensions of graph simulation have been considered instead. These extensions allow pattern matching to be conducted in cubic time. However, they fall short of capturing the topology of data graphs: the matched graphs may have a structure drastically different from the pattern graphs they match, and the matches found are often too large to understand and analyze. To rectify these problems, this paper proposes a notion of strong simulation, a revision of graph simulation, for graph pattern matching. (1) We identify a set of criteria for preserving the topology of graphs matched. We show that strong simulation preserves the topology of data graphs and finds a bounded number of matches. (2) We show that strong simulation retains the same complexity as earlier extensions of simulation, by providing a cubic-time algorithm for computing strong simulation. (3) We present the locality property of strong simulation, which allows us to effectively conduct pattern matching on distributed graphs. (4) We experimentally verify the effectiveness and efficiency of these algorithms, using real-life data and synthetic data.
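
As an illustration of the topology problem described above, the following is a minimal Python sketch of plain graph simulation, the notion that strong simulation revises. It is not the paper's strong simulation algorithm, and it uses a naive fixpoint refinement rather than an optimized procedure. The function name `graph_simulation` and the dictionary-based graph encoding are assumptions made for this example.

```python
def graph_simulation(pattern_labels, pattern_edges, data_labels, data_edges):
    """Naive fixpoint computation of the maximum graph simulation.

    pattern_labels / data_labels: dict mapping node -> label
    pattern_edges / data_edges: iterable of directed (source, target) pairs
    Returns {pattern_node: set of matching data nodes}, or {} if some
    pattern node has no match (the pattern does not match the data graph).
    """
    # Successor sets for both graphs.
    p_succ = {u: set() for u in pattern_labels}
    for u, v in pattern_edges:
        p_succ[u].add(v)
    d_succ = {x: set() for x in data_labels}
    for x, y in data_edges:
        d_succ[x].add(y)

    # Start from all label-compatible candidates.
    sim = {
        u: {x for x in data_labels if data_labels[x] == pattern_labels[u]}
        for u in pattern_labels
    }

    # Remove a candidate x of u whenever some pattern edge (u, v) has no
    # counterpart (x, y) with y still a candidate of v; repeat until stable.
    changed = True
    while changed:
        changed = False
        for u in pattern_labels:
            for x in list(sim[u]):
                if any(not (d_succ[x] & sim[v]) for v in p_succ[u]):
                    sim[u].discard(x)
                    changed = True

    if any(not candidates for candidates in sim.values()):
        return {}
    return sim


# Hypothetical example: the pattern is a 2-cycle A <-> B,
# the data graph is a 4-cycle a1 -> b1 -> a2 -> b2 -> a1.
pattern_labels = {"p": "A", "q": "B"}
pattern_edges = [("p", "q"), ("q", "p")]
data_labels = {"a1": "A", "b1": "B", "a2": "A", "b2": "B"}
data_edges = [("a1", "b1"), ("b1", "a2"), ("a2", "b2"), ("b2", "a1")]

match = graph_simulation(pattern_labels, pattern_edges, data_labels, data_edges)
```

In this example every node of the 4-cycle is matched, although the data graph contains no 2-cycle. This is the kind of structural mismatch between pattern and match that the abstract attributes to simulation-based matching.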

Published In

Proceedings of the VLDB Endowment  Volume 5, Issue 4
December 2011
120 pages

Publisher

VLDB Endowment
