Abstract
The massive size of contemporary social networks poses a serious challenge to the scalability of traditional graph clustering algorithms and to the evaluation of discovered communities. In this manuscript, we develop an approach for discovering hierarchical community structure in large networks. The proposed hybrid technique combines the strengths of bottom-up and top-down hierarchical clustering: the former is efficient at identifying small clusters, while the latter is good at determining large ones. Our mixed hierarchical clustering technique assumes an initial solution composed of k classes and combines the two methods; it does not change the number of partitions but modifies the distribution of members among the initial classes. The clustering process ends at a fixed point, which represents a local optimum of the cost function measuring the degree of importance between two partitions. Consequently, the combined model yields only a local community structure. To escape this local optimum and detect a community structure that converges to the global optimum of the cost function, this study treats community detection not only as a clustering problem but also as an optimization problem. Accordingly, a novel mixed hierarchical clustering algorithm based on swarm intelligence is proposed for identifying community structures in social networks. To validate the proposed method, its performance is evaluated on both real and artificial networks using internal and external clustering evaluation criteria.
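The fixed-point behaviour described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: standard Newman modularity is swapped in as a stand-in for the paper's cost function, and the names `modularity`, `refine_to_fixed_point`, `adjacency_from_edges` and `TWO_TRIANGLES` are hypothetical. Starting from an initial solution with k classes, nodes are moved one at a time to the class that most improves the cost, the number of classes is never changed, and the loop stops when no single move helps.

```python
def adjacency_from_edges(n, edges):
    """Build a symmetric 0/1 adjacency matrix for an undirected graph."""
    adj = [[0] * n for _ in range(n)]
    for u, v in edges:
        adj[u][v] = adj[v][u] = 1
    return adj


def modularity(adj, labels):
    """Newman modularity of a partition (stand-in cost, an assumption)."""
    n = len(adj)
    degrees = [sum(row) for row in adj]
    two_m = sum(degrees)  # twice the number of edges
    total = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                total += adj[i][j] - degrees[i] * degrees[j] / two_m
    return total / two_m


def refine_to_fixed_point(adj, labels, k):
    """Reassign nodes among k fixed classes until no single move improves the cost."""
    labels = list(labels)
    best_q = modularity(adj, labels)
    improved = True
    while improved:
        improved = False
        for node in range(len(adj)):
            current = labels[node]
            if labels.count(current) == 1:
                continue  # moving this node would empty a class and change k
            best_class = current
            for target in range(k):
                if target == current:
                    continue
                labels[node] = target
                q = modularity(adj, labels)
                if q > best_q + 1e-12:
                    best_q, best_class, improved = q, target, True
            labels[node] = best_class
    return labels, best_q


# Two triangles joined by a single edge (illustrative toy graph).
TWO_TRIANGLES = adjacency_from_edges(
    6, [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
)
labels, q = refine_to_fixed_point(TWO_TRIANGLES, [0, 0, 0, 0, 1, 1], k=2)
print(labels, round(q, 4))  # [0, 0, 0, 1, 1, 1] 0.3571
```

On larger graphs the fixed point reached this way generally depends on the initial partition, which is the local-optimum limitation that motivates treating the task as an optimization problem.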
Abbreviations
- SHC: Similarity-based hierarchical community
- HAMUHI-CODE: Heuristic algorithm for multi-scale hierarchical community detection
- PMAC: Partial matrix approximation convergence
- SN: Social network
- JS: Jaccard similarity measure
- AgA: Agglomerative algorithm
- DST: Dependence similarity table
- AHL: Ascendant hierarchical level
- DivA: Divisive algorithm
- DHL: Descendant hierarchical level
- MHA: Mixed hierarchical algorithm
- T-D-H-L: Top-down hierarchical level
- B-U-H-L: Bottom-up hierarchical level
- MHAS: Mixed hierarchical algorithm-based swarms
- AntCDivA: Ant colony-based divisive algorithm
- BeeCAgA: Bee colony-based agglomerative algorithm
- LFR benchmark: Lancichinetti–Fortunato–Radicchi benchmark
- CEC: Cross-entropy clustering
- NMI: Normalized mutual information
- DBI: Davies–Bouldin index
- PGP: Pretty good privacy
- SI: Swarm intelligence
- \(Q_\mathrm{comb}\): Combined modularity function
- \(Q_\mathrm{comb}\): Separated modularity function
- \(\mathrm{SN} = (V; E; \mu )\): Graph modeling SN
- V: Nodes representing social network members
- E: Edges modeling the relationships between social network members
- \(\mu \): Weight of edges
- n: Number of nodes
- \(\ell \): Hierarchical level
- k: Number of sub-partitions detected at each hierarchical level
- \(P=\{p_{1},p_{2},\ldots ,p_{s}\}\), \(G=\{g_{1},g_{2},\ldots ,g_{r}\}\), \(C=\{c_{1},c_{2},\ldots ,c_{s}\}\): Detected partitions of SN
- \(p_{1},p_{2},\ldots ,p_{s}\), \(g_{1},g_{2},\ldots ,g_{r}\), \(c_{1},c_{2},\ldots ,c_{s}\): Sub-partitions
- m: Social network member
- D: Any element contained in SN partitions
- A[i, j]: Adjacency matrix of SN
- \(\overline{A}[i]\): Average of the vector A[i]
- cov(\(E_{i,j}\)): Covariance function
- Op(\(V_{i}\)): Opinions extracted from node \(V_{i}\)
- Op(\(V_{j}\)): Opinions extracted from node \(V_{j}\)
- \(N_{i}\): Neighbors of node i
- \(N_{j}\): Neighbors of node j
- \(Score_{importantOp}\): Function measuring the degree of importance of nodes
- \(GScore_{importantOp}\): General \(Score_{importantOp}\)
- \(MoyScore_{importantOp}\): Average of \(Score_{importantOp}\) over sub-partitions
- Initpart: Initial partition
- cordMin: Function returning the member m with the lowest \(Score_{importantOp}\) value
- cordMax: Function returning the member m with the highest \(Score_{importantOp}\) value
- \(Q_{DS}\): Dependence similarity-based modularity
- \(AgQ_{DS}\): \(Q_{DS}\) function for BeeCAgA
- \(DivQ_{DS}\): \(Q_{DS}\) function for AntCDivA
- \(MixQ_{DS}\): \(Q_{DS}\) function for MHAS
- E: Energy function
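Two of the measures named in this list, JS and NMI, have standard textbook forms that can be sketched briefly. The code below is illustrative only and makes several assumptions: JS is taken as the Jaccard index of the neighbor sets \(N_{i}\) and \(N_{j}\) read from the adjacency matrix A, excluding the nodes themselves, and NMI uses the common normalization by the sum of the two entropies. The paper's exact definitions may differ, and the names `neighbors`, `jaccard`, `nmi` and `ADJ` are hypothetical.

```python
from collections import Counter
from math import log


def neighbors(adj, i):
    """Neighbor set N_i of node i, read from row i of the adjacency matrix."""
    return {j for j, a in enumerate(adj[i]) if a and j != i}


def jaccard(adj, i, j):
    """Jaccard similarity |N_i & N_j| / |N_i | N_j| (assumed form of JS)."""
    ni, nj = neighbors(adj, i), neighbors(adj, j)
    union = ni | nj
    return len(ni & nj) / len(union) if union else 0.0


def nmi(labels_a, labels_b):
    """Normalized mutual information, 2 * I(A; B) / (H(A) + H(B))."""
    n = len(labels_a)
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    mutual = sum(
        c / n * log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )
    entropy_a = -sum(c / n * log(c / n) for c in count_a.values())
    entropy_b = -sum(c / n * log(c / n) for c in count_b.values())
    if entropy_a + entropy_b == 0:
        return 1.0  # both partitions consist of a single class
    return 2 * mutual / (entropy_a + entropy_b)


# Toy graph: two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
ADJ = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(jaccard(ADJ, 0, 1))  # nodes 0 and 1 share neighbor 2 out of {0, 1, 2}
print(nmi([0, 0, 1, 1], [1, 1, 0, 0]))  # same partition, different label names
```

NMI is an external criterion in the sense that it compares a detected partition with a reference one, so it is invariant to how the classes are labeled.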
Cite this article
Toujani, R., Akaichi, J. An approach based on mixed hierarchical clustering and optimization for graph analysis in social media network: toward globally hierarchical community structure. Knowl Inf Syst 60, 907–947 (2019). https://doi.org/10.1007/s10115-019-01329-2
DOI: https://doi.org/10.1007/s10115-019-01329-2