Abstract
Community detection is one of the most important problems in modern network science. Although numerous community detection algorithms have been proposed over the past decades, how to assess the statistical significance of a single community analytically and exactly remains an open problem. In this paper, we present an analytical solution for calculating the exact p-value of a single community under the Erdős–Rényi model. We also propose a local search method that finds statistically significant communities by minimizing this p-value. Experimental results on both real and simulated networks demonstrate that our method can effectively detect true communities in different types of networks.
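To make the idea of scoring a single community against an Erdős–Rényi null concrete, here is a minimal Python sketch. It is not the exact p-value derived in the paper. As a simplified stand-in, it assumes every node pair is linked independently with probability equal to the observed edge density, and it reports the binomial upper-tail probability of seeing at least as many internal edges as the candidate community has. The function names, the integer node labels 0..n-1, and the toy graph are all hypothetical and chosen only for illustration. The greedy routine likewise only illustrates the general pattern of local search by p-value minimization, not the authors' procedure.

```python
from math import comb


def binomial_upper_tail(trials, successes, prob):
    """Return P(X >= successes) for X ~ Binomial(trials, prob)."""
    return sum(
        comb(trials, i) * prob**i * (1.0 - prob) ** (trials - i)
        for i in range(successes, trials + 1)
    )


def community_p_value(edges, num_nodes, community):
    """Simplified p-value of a node set under an Erdős–Rényi null.

    Assumption (not from the paper): the edge probability is the observed
    density, and only the internal edge count of the set is tested.
    """
    members = set(community)
    pair_slots = comb(len(members), 2)  # node pairs inside the set
    internal = sum(1 for u, v in edges if u in members and v in members)
    density = len(edges) / comb(num_nodes, 2)
    return binomial_upper_tail(pair_slots, internal, density)


def greedy_local_search(edges, num_nodes, seed):
    """Toggle one node at a time while the p-value strictly decreases."""
    current = set(seed)
    best = community_p_value(edges, num_nodes, current)
    improved = True
    while improved:
        improved = False
        for node in range(num_nodes):
            candidate = current ^ {node}  # add the node if absent, else drop it
            p = community_p_value(edges, num_nodes, candidate)
            if p < best:
                current, best, improved = candidate, p, True
    return current, best


# Toy graph: two triangles (0-1-2 and 3-4-5) joined by the bridge 2-3.
EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
print(community_p_value(EDGES, 6, {0, 1, 2}))
print(greedy_local_search(EDGES, 6, {0, 1}))
```

Each accepted move strictly lowers the p-value, so the search terminates at a local minimum. A different seed may end at a different node set.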
Acknowledgements
This work was partially supported by the Natural Science Foundation of China under Grant Nos. 61972066 and 61572094.
Additional information
Responsible editor: Evangelos Papalexakis.
Cite this article
He, Z., Liang, H., Chen, Z. et al. Computing exact P-values for community detection. Data Min Knowl Disc 34, 833–869 (2020). https://doi.org/10.1007/s10618-020-00681-0
DOI: https://doi.org/10.1007/s10618-020-00681-0