
Output space sampling for graph patterns

Published: 01 August 2009

Abstract

Recent interest in graph pattern mining has shifted from finding all frequent subgraphs to obtaining a small subset of frequent subgraphs that are representative, discriminative, or significant. The main motivation is to cope with the scalability problem that graph mining algorithms face when mining databases of large graphs. Another motivation is to obtain a succinct output set that is informative and useful. In the same spirit, researchers have also proposed sampling-based algorithms that sample the output space of frequent patterns to obtain representative subgraphs. In this work, we propose a generic sampling framework, based on the Metropolis-Hastings algorithm, that samples the output space of frequent subgraphs. Our experiments on various sampling strategies show the versatility, utility, and efficiency of the proposed sampling approach.
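
To make the idea of Metropolis-Hastings sampling over an output space concrete, the following is a minimal sketch, not the paper's implementation. It assumes a hypothetical toy output space in which each frequent pattern is an opaque label and the `NEIGHBORS` table lists the patterns reachable from it in one step (for example, immediate sub-patterns and super-patterns). The names `NEIGHBORS`, `mh_sample`, and `uniform_weight` are illustrative only. The proposal picks a neighbor uniformly at random, and the acceptance rule corrects for differing neighbor counts so that the walk converges to the target distribution given by `weight`.

```python
import random
from collections import Counter

# Hypothetical toy output space (an assumption for illustration): each key is
# a frequent pattern, each value lists the patterns one step away.
# The neighbor relation is symmetric and the graph is connected.
NEIGHBORS = {
    "a": ["ab", "ac"],
    "b": ["ab"],
    "c": ["ac"],
    "ab": ["a", "b", "abc"],
    "ac": ["a", "c", "abc"],
    "abc": ["ab", "ac"],
}


def mh_sample(neighbors, weight, start, steps, rng):
    """Run a Metropolis-Hastings walk and return the state after each step."""
    current = start
    visited = []
    for _ in range(steps):
        proposal = rng.choice(neighbors[current])
        # The proposal is uniform over neighbors, so q(x, y) = 1 / deg(x).
        # Acceptance ratio: [w(y) * q(y, x)] / [w(x) * q(x, y)].
        ratio = (weight(proposal) * len(neighbors[current])) / (
            weight(current) * len(neighbors[proposal])
        )
        if rng.random() < min(1.0, ratio):
            current = proposal
        visited.append(current)
    return visited


def uniform_weight(pattern):
    """Target every pattern with equal probability."""
    return 1.0


rng = random.Random(0)
walk = mh_sample(NEIGHBORS, uniform_weight, "a", 60000, rng)
counts = Counter(walk)
for pattern in sorted(counts):
    print(pattern, round(counts[pattern] / len(walk), 3))
```

With `uniform_weight`, each of the six patterns should be visited in roughly one sixth of the steps, even though the patterns have different numbers of neighbors. Replacing `uniform_weight` with a function that scores patterns differently (for instance, by support or by a significance measure) would bias the sample toward higher-scoring patterns, which is one way to read the "various sampling strategies" mentioned in the abstract.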


Published In

Proceedings of the VLDB Endowment, Volume 2, Issue 1
August 2009
1293 pages

Publisher

VLDB Endowment

Qualifiers

  • Research-article
