Adaptive clustering algorithm for community detection in complex networks

Zhenqing Ye, Songnian Hu, and Jun Yu

Phys. Rev. E 78, 046115 – Published 30 October 2008

Abstract

Community structure is common in real-world networks, and methods for detecting such communities in complex networks have attracted great attention in recent years. We introduce a different adaptive clustering algorithm capable of extracting modules from complex networks with considerable accuracy and robustness. In this approach, each node in a network acts as an autonomous agent exhibiting flocking behavior, in which vertices always travel toward their preferred neighboring groups. An optimal modular structure can emerge from a collection of these active nodes during a self-organization process in which vertices constantly regroup. In addition, we show through intensive evaluation that our algorithm appears advantageous over competing methods (e.g., the Newman-fast algorithm). Applications to three real-world networks demonstrate the superiority of our algorithm in finding communities that parallel the actual organization of these networks.

Received 19 June 2008

DOI:https://doi.org/10.1103/PhysRevE.78.046115

Authors & Affiliations

Zhenqing Ye¹, Songnian Hu¹,², and Jun Yu¹,²,*

¹James D. Watson Institute of Genome Sciences, Zhejiang University, Hangzhou, China
²CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China

*junyu@big.ac.cn

Issue

Vol. 78, Iss. 4 — October 2008

Images

Figure 1
(Color online) An illustration of adaptive vertex movements according to the force rules. For the square vertex, calculating its forces gives $F_{out}^{(A)} = 0.333 > F_{in}^{(C)} = 0.074 > F_{out}^{(B)} = -0.147$, so it should move to module A (arrowed line). Although a similar situation applies to the triangular vertex, the earlier motion of the square vertex must be taken into account, as it may alter the forces on the triangular vertex. Moreover, we note that the motion of the triangular vertex reduces the total number of modules. This self-organization process is applied to all vertices iteratively until they become stabilized.
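The move decision described in this caption can be sketched as follows. This is only a minimal illustration: the formula for the forces is given in the article text, not in this caption, so the sketch takes the force values as inputs. The function name `choose_module` and its arguments are hypothetical, and the rule "move to the strongest outside module only if its pull exceeds the inside force" is an assumption read off the inequality quoted above.

```python
def choose_module(current, f_in, f_out):
    """Return the module a vertex should occupy after one move step.

    current: label of the module the vertex belongs to now
    f_in:    force holding the vertex inside its current module
    f_out:   dict mapping each neighboring module label to its pull

    Assumption (not the authors' code): the vertex moves to the
    neighboring module with the largest pull, and only if that pull
    is strictly greater than f_in.
    """
    if not f_out:
        return current
    best = max(f_out, key=f_out.get)
    return best if f_out[best] > f_in else current


# Force values quoted in the Figure 1 caption for the square vertex,
# which currently sits in module C.
print(choose_module("C", 0.074, {"A": 0.333, "B": -0.147}))
```

With the values from the caption the sketch selects module A, matching the arrowed line in the figure. Because each move can change the forces on neighboring vertices, a full implementation would recompute the forces before evaluating the next vertex.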
Figure 2
(Color online) A sample network illustrating the advantage of AdClust over the Newman-fast algorithm. The two nodes connected through edge $E$ are grouped together first by the agglomerative procedure because merging them gives the maximal increase of $Q$, and they ultimately remain together as an isolated group. However, it is clear that the two endpoints of $E$ may belong to different groups. In the adaptive clustering algorithm, the two nodes are separated into their respective groups according to the simple force rules $(F_{out} > F_{in})$, leading to an optimal partition.
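To make the quantity $Q$ concrete, here is a minimal sketch that computes it for a given partition. It assumes $Q$ is the standard Newman-Girvan modularity for an undirected, unweighted graph without self-loops, $Q = \sum_c [e_c/m - (d_c/2m)^2]$, where $e_c$ is the number of edges inside module $c$, $d_c$ is the total degree of its nodes, and $m$ is the number of edges. The function name and the toy graph are illustrative, not taken from the article.

```python
from collections import defaultdict


def modularity(edges, membership):
    """Newman-Girvan modularity Q of a partition.

    edges:      list of (u, v) pairs, undirected, no self-loops
    membership: dict mapping each node to its module label
    """
    m = len(edges)
    internal = defaultdict(int)  # edges with both ends in module c
    degree = defaultdict(int)    # summed degree of the nodes in module c
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Toy graph: two triangles joined by a single bridging edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
merged = {node: "a" for node in range(6)}
print(modularity(edges, split), modularity(edges, merged))
```

In this toy graph, splitting at the bridge gives $Q = 5/14$, while placing all nodes in one module gives $Q = 0$.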
Figure 3
(Color) Performance tests with artificial networks. In these computer-generated networks, four modules are predefined according to the parameters described in the text. Each data point is an average over 100 graphs. As $k_{out}$ increases, the boundaries between modules become harder to resolve. (a) A network with $k_{in} = 14$ and $k_{out} = 2$. (b) A network with $k_{in} = 8$ and $k_{out} = 8$. (c) The fraction of nodes correctly classified by AdClust. The Newman-fast algorithm was used as a reference. (d) The average modularity obtained at each point. The theoretical value corresponds to the four predefined communities.
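A benchmark of this kind can be sketched as below. The caption does not state the network size, so the sketch assumes the commonly used setup of 128 nodes in four modules of 32 nodes each; this is an assumption, as are the function name `planted_partition` and its parameters. Each within-module pair is linked with probability $k_{in}/31$ and each between-module pair with probability $k_{out}/96$, so that a node has on average $k_{in}$ internal and $k_{out}$ external links.

```python
import random


def planted_partition(k_in, k_out, n_modules=4, module_size=32, seed=None):
    """Generate a random graph with predefined modules.

    Returns (edges, membership), where membership[i] is the module of
    node i. The default size (4 x 32 nodes) is an assumed benchmark
    setup, not a value taken from the figure caption.
    """
    rng = random.Random(seed)
    n = n_modules * module_size
    membership = [i // module_size for i in range(n)]
    p_in = k_in / (module_size - 1)   # link probability inside a module
    p_out = k_out / (n - module_size)  # link probability across modules
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            p = p_in if membership[u] == membership[v] else p_out
            if rng.random() < p:
                edges.append((u, v))
    return edges, membership


# Parameters of panel (a): well-separated modules.
edges, membership = planted_partition(k_in=14, k_out=2, seed=1)
mean_degree = 2 * len(edges) / len(membership)
print(len(membership), round(mean_degree, 2))
```

Under these assumptions the expected mean degree is $k_{in} + k_{out} = 16$ for both panel (a) and panel (b); only the balance between internal and external links changes.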
Figure 4
(Color online) Robustness tests of AdClust. Because AdClust is stochastic, we performed 100 runs on the network described in Fig. 3(b) to check for consistency among the resulting partitions. (a) The fraction of runs in which two nodes are classified into the same module, shown on a gray scale from black (low) to white (high). The modular structure is clearly revealed, and most pairs of nodes are either always classified into the same module (white) or never classified into the same module (black), which suggests that the solution is robust. (b) The modularity value $Q$ of the partition obtained in each run. The horizontal solid line represents the mean value over all runs, and the dotted line represents the $Q$ value obtained with the Newman-fast algorithm as a reference.
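The matrix shown in panel (a) can be computed from the partitions of repeated runs, as in the following minimal sketch. The function name `co_classification` and the toy partitions are illustrative only. Each partition is assumed to be a list of module labels indexed by node, and labels only need to be consistent within a single run.

```python
def co_classification(partitions):
    """Fraction of runs in which each pair of nodes shares a module.

    partitions: list of runs; each run is a list of module labels,
                one label per node, all runs over the same nodes.
    Returns an n x n matrix of fractions between 0.0 and 1.0.
    """
    runs = len(partitions)
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / runs for c in row] for row in counts]


# Two toy runs over four nodes; node 2 changes module between runs.
matrix = co_classification([[0, 0, 1, 1], [0, 0, 0, 1]])
print(matrix[0][1], matrix[1][2], matrix[0][3])
```

Entries near 1 correspond to the white cells of the figure and entries near 0 to the black cells; intermediate values mark pairs whose grouping varies from run to run.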
Figure 5
(Color online) A test case on the Zachary network. The administrator and the instructor are defined as nodes 1 and 33, respectively. AdClust divides this network into four parts, indicated by four different shapes, yielding a $Q$ value of 0.4198. The two groups separated by the dashed line are consistent with the split that occurred among the members in reality.
Figure 6
Clustering for the football network. The different shapes represent the 12 conferences in reality, and the ten circles indicate the modules found by AdClust. Within most modules, the teams belong to the same conference.

Physical Review E

covering statistical, nonlinear, biological, and soft matter physics