
Defining and evaluating network communities based on ground-truth

Published: 01 January 2015

Abstract

Nodes in real-world networks organize into densely linked communities where edges appear with high concentration among the members of the community. Identifying such communities of nodes has proven to be a challenging task due to a plethora of definitions of network communities, the intractability of methods for detecting them, and issues with evaluation that stem from the lack of a reliable gold-standard ground-truth. In this paper, we distinguish between structural and functional definitions of network communities. Structural definitions of communities are based on connectivity patterns, such as the density of connections between the community members, while functional definitions are based on the (often unobserved) common function or role of the community members in the network. We argue that the goal of network community detection is to extract functional communities based on the connectivity structure of the nodes in the network. We then identify networks with explicitly labeled functional communities, which we refer to as ground-truth communities. In particular, we study a set of 230 large real-world social, collaboration, and information networks where nodes explicitly state their community memberships. For example, in social networks, nodes explicitly join various interest-based social groups. We use such social groups to define a reliable and robust notion of ground-truth communities. We then propose a methodology that allows us to compare and quantitatively evaluate how well different structural definitions of communities correspond to ground-truth functional communities. We study 13 commonly used structural definitions of communities and examine their sensitivity, robustness, and performance in identifying the ground-truth. We show that the 13 structural definitions are heavily correlated and naturally group into four classes. We find that two of these definitions, Conductance and Triad participation ratio, consistently give the best performance in identifying ground-truth communities. We also investigate the task of detecting communities given a single seed node. We extend the local spectral clustering algorithm into a heuristic parameter-free community detection method that easily scales to networks with more than 100 million nodes. The proposed method achieves a 30% relative improvement over current local clustering methods.
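The abstract names Conductance and Triad participation ratio but does not give formulas. The sketch below is a minimal illustration under commonly used definitions, which are assumptions here: conductance as the fraction of a node set's total edge volume that points outside the set, and triad participation ratio as the fraction of nodes in the set that belong to at least one triangle lying entirely inside the set. The function names, the adjacency-dictionary representation, and the toy graph are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)}, ignoring self-loops."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def conductance(adj, community):
    """Boundary edges divided by (2 * internal edges + boundary edges).

    Lower values mean the set is better separated from the rest of the graph.
    """
    members = set(community)
    internal_twice = 0  # every internal edge is seen once from each endpoint
    boundary = 0
    for u in members:
        for v in adj.get(u, ()):
            if v in members:
                internal_twice += 1
            else:
                boundary += 1
    volume = internal_twice + boundary
    return boundary / volume if volume else 0.0


def triad_participation_ratio(adj, community):
    """Fraction of members that sit in at least one triangle inside the set."""
    members = set(community)
    if not members:
        return 0.0
    in_triad = 0
    for u in members:
        inside_neighbors = adj.get(u, set()) & members
        # u closes a triangle if any two of its in-set neighbors are adjacent
        if any(w in adj[v] for v, w in combinations(inside_neighbors, 2)):
            in_triad += 1
    return in_triad / len(members)


# Toy graph (assumed for illustration): two triangles joined by the bridge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
adj = build_adjacency(edges)

tight = {0, 1, 2}   # one of the triangles
loose = {2, 3}      # just the two bridge endpoints

print(conductance(adj, tight), triad_participation_ratio(adj, tight))
print(conductance(adj, loose), triad_participation_ratio(adj, loose))
```

On this toy graph the triangle scores low conductance and a high triad participation ratio, while the pair of bridge endpoints scores the opposite way, which is the kind of contrast a structural scoring function is meant to capture.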

Published In

Knowledge and Information Systems  Volume 42, Issue 1
January 2015
242 pages

Publisher

Springer-Verlag

Berlin, Heidelberg

Author Tags

  1. Community detection
  2. Community scoring function
  3. Ground-truth communities
  4. Modularity
  5. Network communities

Qualifiers

  • Article
