DOI: 10.1145/1367497.1367591
research-article

Statistical properties of community structure in large social and information networks

Published: 21 April 2008

Abstract

A large body of work has been devoted to identifying community structure in networks. A community is often thought of as a set of nodes that has more connections between its members than to the remainder of the network. In this paper, we characterize the statistical and structural properties of such sets of nodes as a function of their size. We define the network community profile plot, which characterizes the "best" possible community, according to the conductance measure, over a wide range of size scales, and we study over 70 large sparse real-world networks taken from a wide range of application domains. Our results suggest a significantly more refined picture of community structure in large real-world networks than has been appreciated previously.
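To make the conductance measure and the profile plot concrete, here is a minimal sketch. It assumes the standard definition of conductance for an undirected graph, namely the number of edges leaving a node set divided by the smaller of the set's volume (sum of degrees) and the volume of its complement; the abstract itself does not spell the formula out. The names `conductance`, `ncp_brute_force`, and `toy` are hypothetical, and the exhaustive search is an illustration that only works on toy graphs, not a method that could scale to large networks.

```python
from itertools import combinations


def conductance(adj, nodes):
    """Conductance of a node set in an undirected graph given as {node: set(neighbors)}.

    Assumed definition: cut(S) / min(vol(S), vol(V \\ S)).
    """
    s = set(nodes)
    # Edges with exactly one endpoint inside the set.
    cut = sum(1 for u in s for v in adj[u] if v not in s)
    vol_s = sum(len(adj[u]) for u in s)
    vol_rest = sum(len(adj[u]) for u in adj if u not in s)
    denom = min(vol_s, vol_rest)
    return cut / denom if denom else float("inf")


def ncp_brute_force(adj, max_size):
    """Lowest conductance over all node sets of each size k = 1..max_size.

    Exhaustive enumeration: exponential cost, for tiny illustrative graphs only.
    """
    return {
        k: min(conductance(adj, c) for c in combinations(adj, k))
        for k in range(1, max_size + 1)
    }


# Hypothetical toy graph: two triangles joined by a single edge (2-3).
toy = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}

profile = ncp_brute_force(toy, 3)
for size, phi in profile.items():
    print(size, round(phi, 3))
```

In this toy graph the best set of size 3 is one of the triangles, which has a single outgoing edge and therefore a much lower conductance than any single node or pair. Plotting the values of `profile` against set size gives a miniature version of the profile plot described above.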
Our most striking finding is that in nearly every network dataset we examined, we observe tight but almost trivial communities at very small scales, and at larger size scales, the best possible communities gradually "blend in" with the rest of the network and thus become less "community-like." This behavior is not explained, even at a qualitative level, by any of the commonly used network generation models. Moreover, this behavior is exactly the opposite of what one would expect based on experience with and intuition from expander graphs, from graphs that are well-embeddable in a low-dimensional structure, and from small social networks that have served as testbeds of community detection algorithms. We have found, however, that a generative model, in which new edges are added via an iterative "forest fire" burning process, is able to produce graphs exhibiting a network community structure similar to our observations.
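The abstract does not give the details of the "forest fire" process, so the following is only a loose, simplified sketch of one common way such a model is described: each new node picks a random existing node, the "fire" spreads from there to a random number of unburned neighbors and recurses, and the new node links to everything that burned. The undirected simplification, the function name `forest_fire_graph`, and the parameter `p_burn` are assumptions made for illustration, not the authors' specification.

```python
import random


def forest_fire_graph(n, p_burn, seed=0):
    """Grow an undirected graph with a simplified forest-fire process.

    Assumption-laden sketch: returns {node: set(neighbors)} on nodes 0..n-1.
    """
    rng = random.Random(seed)
    adj = {0: set()}
    for new in range(1, n):
        # The new node first attaches to a uniformly random existing node.
        ambassador = rng.randrange(new)
        burned = {ambassador}
        frontier = [ambassador]
        while frontier:
            u = frontier.pop()
            candidates = sorted(v for v in adj[u] if v not in burned)
            rng.shuffle(candidates)
            # Burn a geometrically distributed number of neighbors:
            # keep spreading while a coin with bias p_burn comes up heads.
            k = 0
            while k < len(candidates) and rng.random() < p_burn:
                k += 1
            for v in candidates[:k]:
                burned.add(v)
                frontier.append(v)
        # The new node links to every node the fire reached.
        adj[new] = set(burned)
        for v in burned:
            adj[v].add(new)
    return adj


g = forest_fire_graph(200, p_burn=0.35, seed=42)
print(len(g), sum(len(nbrs) for nbrs in g.values()) // 2)
```

Because every new node links at least to its starting node, the resulting graph is connected. Larger values of `p_burn` let each fire spread further and so produce denser graphs.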




    Published In

    WWW '08: Proceedings of the 17th international conference on World Wide Web
    April 2008
    1326 pages
    ISBN:9781605580852
    DOI:10.1145/1367497

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. community structure
    2. conductance
    3. graph partitioning
    4. random walks
    5. social networks

    Qualifiers

    • Research-article

    Conference

    WWW '08

    Acceptance Rates

    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

