DOI: 10.1145/1557019.1557101

Scalable graph clustering using stochastic flows: applications to community discovery

Published: 28 June 2009

Abstract

Algorithms based on simulating stochastic flows are a simple and natural solution for the problem of clustering graphs, but their widespread use has been hampered by their lack of scalability and fragmentation of output. In this article we present a multi-level algorithm for graph clustering using flows that delivers significant improvements in both quality and speed. The graph is first successively coarsened to a manageable size, and a small number of iterations of flow simulation is performed on the coarse graph. The graph is then successively refined, with flows from the previous graph used as initializations for brief flow simulations on each of the intermediate graphs. When we reach the final refined graph, the algorithm is run to convergence and the high-flow regions are clustered together, with regions without any flow forming the natural boundaries of the clusters. Extensive experimental results on several real and synthetic datasets demonstrate the effectiveness of our approach when compared to state-of-the-art algorithms.
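The flow simulation that the abstract builds on can be illustrated with a small example. What follows is a minimal sketch of a plain, single-level flow simulation in the style of Markov clustering (alternating expansion and inflation of a column-stochastic matrix); it is not the multi-level algorithm presented in the paper, and it omits coarsening, refinement, and any sparsity or pruning optimizations. The function names, the dense list-of-lists matrix, the added self-loops, and the default inflation value of 2.0 are all assumptions made for illustration.

```python
def normalize_columns(m):
    """Scale each column so it sums to 1 (column-stochastic flow matrix)."""
    n = len(m)
    col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def expand(m):
    """Expansion step: square the matrix, spreading flow along two-step walks."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m, r):
    """Inflation step: raise entries to the power r, then renormalize columns.

    This strengthens strong flows and weakens weak ones.
    """
    return normalize_columns([[x ** r for x in row] for row in m])


def flow_clusters(adj, inflation=2.0, max_iters=100, tol=1e-9):
    """Cluster a graph given as a dense symmetric adjacency matrix.

    Hypothetical helper: returns a sorted list of clusters, each a list of
    node indices. Self-loops are added before normalizing, a common
    convention in this family of methods.
    """
    n = len(adj)
    m = normalize_columns(
        [[float(adj[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)])
    for _ in range(max_iters):
        nxt = inflate(expand(m), inflation)
        delta = max(abs(nxt[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = nxt
        if delta < tol:  # flows have stopped changing
            break
    # Assign each node (column) to the row that receives most of its flow.
    clusters = {}
    for j in range(n):
        attractor = max(range(n), key=lambda i: m[i][j])
        clusters.setdefault(attractor, []).append(j)
    return sorted(clusters.values())


# Two triangles (0-1-2 and 3-4-5) joined by the single edge 2-3.
TWO_TRIANGLES = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]

if __name__ == "__main__":
    print(flow_clusters(TWO_TRIANGLES))
```

The dense O(n^3) expansion in this sketch is the kind of cost that limits scalability; the abstract's approach is to run only a few such iterations on a coarsened graph and reuse the resulting flows as initializations on the successively refined graphs.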

Supplementary Material

JPG File (p737-satuluri.jpg)
MP4 File (p737-satuluri.mp4)

      Published In

      KDD '09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
      June 2009
      1426 pages
      ISBN:9781605584959
      DOI:10.1145/1557019
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. clustering
      2. communities
      3. graphs
      4. networks

      Qualifiers

      • Research-article

      Conference

      KDD09

      Acceptance Rates

      Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

      Bibliometrics

      Article Metrics

      • Downloads (last 12 months): 30
      • Downloads (last 6 weeks): 3
      Reflects downloads up to 30 Aug 2024

      Cited By

      • (2024) Detecting Communities Using Network Embedding and Graph Clustering Approach. Soft Computing and Signal Processing, pages 311-325. DOI: 10.1007/978-981-99-8451-0_27. Online publication date: 17-Feb-2024
      • (2023) A Novel Epitope Dataset: Performance of the MCL-Based Algorithms to Generate Dataset for Graph Learning Model. Engineering Innovations, 4:37-46. DOI: 10.4028/p-8a27xd. Online publication date: 15-Feb-2023
      • (2023) Multi-view Graph Clustering via Efficient Global-Local Spectral Embedding Fusion. Proceedings of the 31st ACM International Conference on Multimedia, pages 3268-3276. DOI: 10.1145/3581783.3612190. Online publication date: 26-Oct-2023
      • (2023) Large-Scale Multiple Sequence Alignment and the Maximum Weight Trace Alignment Merging Problem. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 20(3):1700-1712. DOI: 10.1109/TCBB.2022.3191848. Online publication date: 1-May-2023
      • (2023) A multi-level generative framework for community detection in attributed networks. Journal of Complex Networks, 11(3). DOI: 10.1093/comnet/cnad020. Online publication date: 7-Jun-2023
      • (2023) Algebraic multiscale grid coarsening using unsupervised machine learning for subsurface flow simulation. Journal of Computational Physics, article 112570. DOI: 10.1016/j.jcp.2023.112570. Online publication date: Oct-2023
      • (2023) Analyzing and Comparing Omicron Lineage Variants Protein–Protein Interaction Network Using Centrality Measure. SN Computer Science, 4(3). DOI: 10.1007/s42979-023-01685-5. Online publication date: 30-Mar-2023
      • (2023) Discrete Improved Grey Wolf Optimizer for Community Detection. Journal of Bionic Engineering, 20(5):2331-2358. DOI: 10.1007/s42235-023-00387-1. Online publication date: 18-May-2023
      • (2022) Community Detection in Graph: An Embedding Method. IEEE Transactions on Network Science and Engineering, 9(2):689-702. DOI: 10.1109/TNSE.2021.3130321. Online publication date: 1-Mar-2022
      • (2022) Efficient Distributed Clustering Algorithms on Star-Schema Heterogeneous Graphs. IEEE Transactions on Knowledge and Data Engineering, 34(10):4781-4796. DOI: 10.1109/TKDE.2020.3047631. Online publication date: 1-Oct-2022