DOI: 10.1145/3110025.3110030

Efficiently Clustering Very Large Attributed Graphs

Published: 31 July 2017

    Abstract

    Attributed graphs model real networks by enriching their nodes with attributes that describe node properties. Several techniques have been proposed for partitioning these graphs into clusters that are homogeneous with respect to both the semantic attributes and the structure of the graph. However, the time and space complexities of state-of-the-art algorithms limit their scalability to medium-sized graphs. We propose SToC (for Semantic-Topological Clustering), a fast and scalable algorithm for partitioning large attributed graphs. The approach is robust, being compatible with both categorical and quantitative attributes, and it is tailorable, allowing the user to weight the semantic and topological components. Further, the approach does not require the user to guess the number of clusters in advance. SToC relies on well-known approximation techniques such as bottom-k sketches, traditional graph-theoretic concepts, and a new perspective on the composition of heterogeneous distance measures. Experimental results demonstrate its ability to efficiently compute high-quality partitions of large-scale attributed graphs.
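
    As a rough illustration of two ingredients named in the abstract, the sketch below shows a generic bottom-k sketch used to estimate the Jaccard similarity of two sets (for example, two node neighbourhoods), together with a user-weighted combination of a topological and a semantic distance. This is not the SToC implementation: the function names, the SHA-1 based hash, and the convex combination controlled by the parameter alpha are all assumptions made for illustration, and the abstract does not specify how SToC actually composes its distances.

```python
import hashlib
import heapq


def _hash01(item):
    """Map an item to a pseudo-random float in [0, 1) with a stable hash."""
    digest = hashlib.sha1(repr(item).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def bottom_k_sketch(items, k):
    """Return the k smallest hash values of the set, in ascending order."""
    return heapq.nsmallest(k, {_hash01(x) for x in items})


def estimate_jaccard(sketch_a, sketch_b, k):
    """Estimate |A & B| / |A | B| from the bottom-k sketches of A and B."""
    # The k smallest hashes of the merged sketches form the sketch of A | B.
    union_sketch = heapq.nsmallest(k, set(sketch_a) | set(sketch_b))
    if not union_sketch:
        return 0.0
    in_both = set(sketch_a) & set(sketch_b)
    hits = sum(1 for h in union_sketch if h in in_both)
    return hits / len(union_sketch)


def combined_distance(topological, semantic, alpha=0.5):
    """Hypothetical convex combination of two distances, each in [0, 1].

    alpha weights the topological part; (1 - alpha) weights the semantic part.
    This is an illustrative assumption, not the composition used by SToC.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * topological + (1.0 - alpha) * semantic


if __name__ == "__main__":
    k = 64
    a = bottom_k_sketch(range(0, 1000), k)
    b = bottom_k_sketch(range(500, 1500), k)
    similarity = estimate_jaccard(a, b, k)  # true Jaccard is 1/3
    print(combined_distance(1.0 - similarity, 0.4, alpha=0.7))
```

    Each sketch keeps only k hash values per set, so similarities can be estimated without storing or intersecting full neighbourhoods, which is the usual motivation for this kind of approximation on very large graphs. When a set has fewer than k elements its sketch is exact, and so is the estimate.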


    Published In

    ASONAM '17: Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017
    July 2017
    698 pages
    ISBN: 9781450349932
    DOI: 10.1145/3110025
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ASONAM '17

    Acceptance Rates

    Overall Acceptance Rate 116 of 549 submissions, 21%

    Cited By

    • (2024) A novel intelligent Fuzzy-AHP based evolutionary algorithm for detecting communities in complex networks. Soft Computing. DOI: 10.1007/s00500-024-09648-5. Online publication date: 29-Feb-2024
    • (2023) Top-k Distance Queries on Large Time-Evolving Graphs. IEEE Access, 11, pages 102228-102242. DOI: 10.1109/ACCESS.2023.3316602. Online publication date: 2023
    • (2023) Community Detection Algorithms in Healthcare Applications: A Systematic Review. IEEE Access, 11, pages 30247-30272. DOI: 10.1109/ACCESS.2023.3260652. Online publication date: 2023
    • (2023) Learning attribute and homophily measures through random walks. Applied Network Science, 8:1. DOI: 10.1007/s41109-023-00558-3. Online publication date: 27-Jun-2023
    • (2023) Learning Attribute Distributions Through Random Walks. Complex Networks and Their Applications XI, pages 17-29. DOI: 10.1007/978-3-031-21131-7_2. Online publication date: 26-Jan-2023
    • (2022) Efficient Distributed Clustering Algorithms on Star-Schema Heterogeneous Graphs. IEEE Transactions on Knowledge and Data Engineering, 34:10, pages 4781-4796. DOI: 10.1109/TKDE.2020.3047631. Online publication date: 1-Oct-2022
    • (2022) VALKYRIE: a suite of topology-aware clustering approaches for cloud-based virtual network services. The Journal of Supercomputing, 79:3, pages 3298-3328. DOI: 10.1007/s11227-022-04786-9. Online publication date: 2-Sep-2022
    • (2021) Overlapping Community Detection Based on Attribute Augmented Graph. Entropy, 23:6, article 680. DOI: 10.3390/e23060680. Online publication date: 28-May-2021
    • (2021) Integrating Prior Knowledge in Mixed-Initiative Social Network Clustering. IEEE Transactions on Visualization and Computer Graphics, 27:2, pages 1775-1785. DOI: 10.1109/TVCG.2020.3030347. Online publication date: Feb-2021
    • (2021) X-Mark: a benchmark for node-attributed community discovery algorithms. Social Network Analysis and Mining, 11:1. DOI: 10.1007/s13278-021-00823-2. Online publication date: 15-Oct-2021
