On effective presentation of graph patterns: a structural representative approach

Authors:

Cindy Xide Lin,

Jiawei Han

CIKM '08: Proceedings of the 17th ACM conference on Information and knowledge management

Pages 299 - 308

https://doi.org/10.1145/1458082.1458124

Published: 26 October 2008

Abstract

Many fast algorithms have been developed to mine frequent patterns over graph data, covering a wide spectrum of problem variants. However, the real bottleneck for knowledge discovery on graphs is neither efficiency nor scalability, but the usability of the mined patterns. Current state-of-the-art techniques output a lengthy list of exact patterns, which is undesirable in two respects: (1) on the micro side, because of inherent noise and data diversity, exact patterns are often of limited use in real applications; and (2) on the macro side, the rigid structural requirement often generates an excessive number of patterns that differ only slightly from one another, which easily overwhelms users.

In this paper, we study the presentation problem of graph patterns, in which structural representatives are the key mechanism that makes the whole strategy effective. To fill the usability gap, we adopt a two-step smoothing-clustering framework: the first step adds error tolerance to individual patterns (the micro side), and the second step reduces output cardinality by collapsing multiple structurally similar patterns into one representative (the macro side). This novel, integrative approach has not been tried in previous studies; it essentially rolls up our attention to a more appropriate level that no longer examines every minute detail. The framework is general: it can be applied under various settings and can incorporate many extensions. Empirical studies indicate that a compact group of informative delegates can be obtained on real datasets and that the proposed algorithms are both efficient and scalable.
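The abstract does not say how smoothing or clustering is carried out, so the following is only a toy sketch of the two-step idea, not the paper's algorithm. It assumes that every vertex label is unique, so a pattern can be stored as a plain set of edges and no subgraph-isomorphism test is needed. The names `distance`, `smooth`, `pick_representatives`, `eps` and `radius` are all hypothetical, as are the edge-difference distance and the greedy clustering rule.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Edge = Tuple[str, str]       # assumption: vertex labels are unique
Pattern = FrozenSet[Edge]    # assumption: a pattern is just a set of edges


def distance(p: Pattern, q: Pattern) -> int:
    """Count the edges that appear in exactly one of the two patterns."""
    return len(p ^ q)


def smooth(support: Dict[Pattern, Set[int]], eps: int) -> Dict[Pattern, int]:
    """Step 1 (micro side): tolerate up to eps differing edges.

    A pattern is credited with every graph id that supports some
    pattern within distance eps of it.
    """
    smoothed = {}
    for p in support:
        graphs: Set[int] = set()
        for q, ids in support.items():
            if distance(p, q) <= eps:
                graphs |= ids
        smoothed[p] = len(graphs)
    return smoothed


def pick_representatives(
    smoothed: Dict[Pattern, int], radius: int
) -> Dict[Pattern, List[Pattern]]:
    """Step 2 (macro side): greedily collapse similar patterns.

    Patterns are visited by decreasing smoothed support; each one joins
    the first representative within `radius`, or becomes a new one.
    """
    clusters: Dict[Pattern, List[Pattern]] = {}
    for p in sorted(smoothed, key=lambda x: (-smoothed[x], sorted(x))):
        home = next((r for r in clusters if distance(p, r) <= radius), None)
        if home is None:
            clusters[p] = [p]
        else:
            clusters[home].append(p)
    return clusters


# Made-up input: three patterns and the ids of the graphs containing them.
a = frozenset({("a", "b"), ("b", "c")})
b = frozenset({("a", "b"), ("b", "c"), ("c", "d")})
c = frozenset({("x", "y")})
support = {a: {1, 2, 3}, b: {3, 4}, c: {5}}

smoothed = smooth(support, eps=1)
clusters = pick_representatives(smoothed, radius=1)
for rep, members in clusters.items():
    print(sorted(rep), "represents", len(members), "pattern(s)")
```

In this made-up input, patterns `a` and `b` differ by a single edge, so they share their supporting graphs after smoothing and collapse into one cluster, while `c` stays on its own. A realistic implementation would need a structural similarity measure that handles repeated labels, which this sketch deliberately avoids.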

Index Terms

On effective presentation of graph patterns: a structural representative approach
1. Information systems
  1. Information systems applications
    1. Data mining

Published In

CIKM '08: Proceedings of the 17th ACM conference on Information and knowledge management

October 2008

1562 pages

ISBN:9781595939913

DOI:10.1145/1458082

General Chair:
James G. Shanahan, Church and Duncan Group Inc, USA

Program Chairs:
Sihem Amer-Yahia, Yahoo! Research, USA
Ioana Manolescu, INRIA, France
Yi Zhang, University of California, Santa Cruz, USA
David A. Evans, JustSystems Evans Research, USA
Alek Kolcz, Microsoft Live Labs, USA
Key-Sun Choi, KAIST, Korea
Abdur Chowdury, Twitter, USA

Copyright © 2008 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

CIKM08

CIKM08: Conference on Information and Knowledge Management

October 26 - 30, 2008

Napa Valley, California, USA

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
