A Qualitative Survey on Community Detection Attack Algorithms

Tekin, Leyla; Bostanoğlu, Belgin Ergenç

doi:10.3390/sym16101272

Open Access Review

by Leyla Tekin and Belgin Ergenç Bostanoğlu *

Department of Computer Engineering, Izmir Institute of Technology, 35433 Izmir, Turkey

* Author to whom correspondence should be addressed.

Symmetry 2024, 16(10), 1272; https://doi.org/10.3390/sym16101272

Submission received: 31 August 2024 / Revised: 20 September 2024 / Accepted: 24 September 2024 / Published: 26 September 2024

(This article belongs to the Section Computer)

Abstract:

Community detection enables the discovery of densely connected segments of complex networks, a capability that is essential for effective network analysis. However, it raises a growing concern about user privacy, since sensitive information may be over-mined by community detection algorithms. To address this issue, the problem of community detection attacks has emerged: subtly perturbing the network structure so that the performance of community detection algorithms deteriorates. Three scales of this problem have been identified in the literature to achieve different levels of concealment, namely target node, target community, and global attacks. A broad range of community detection attack algorithms has been proposed, utilizing various approaches to tackle the distinct requirements of each attack scale. However, existing surveys of the field usually concentrate on target community attacks. To be self-contained, this survey starts with an overview of the community detection algorithms that the attacks are designed to deceive, along with the performance measures employed to evaluate the effectiveness of community detection attacks. The core of the survey is a systematic analysis of the algorithms proposed across all three scales of community detection attacks. The survey concludes with a detailed discussion of research opportunities in the field. Overall, the main objective of the survey is to provide both a starting point and a deeper reference for researchers.

Keywords:

community hiding; community detection attack; target node attack; target community attack; global attack

1. Introduction

Complex networks exist in almost every aspect of our daily lives. Examples include social networks, which capture human interactions such as friendships, professional connections, and shared interests; transportation networks of cities and countries, which are analyzed to provide better service; population networks, which are used to track and manage the spread of disease; and biological networks, which reveal gene and protein interactions and support advances in genomics. Such complex networks are modeled by graph structures consisting of a set of nodes and edges. Nodes refer to entities in the network, and edges refer to the connections between them.

A community is a group of members who are closely connected to each other but only loosely connected to other members of the network. Community detection aims to identify community structures, thereby providing insights into the dynamics and organization of the network. Various algorithms from diverse disciplines have been developed for community detection. Some of its applications are given in [1]; these include identifying groups of users with similar political views and activities on social media, detecting terrorist groups by analyzing friend-of-a-friend connections within social networks, improving retail strategies by generating customer segments and providing personalized recommendations from shopping behavior data, discovering hidden connections between research papers, and predicting financial activities in the stock market.

As community detection algorithms have improved, a new challenge has emerged, namely the over-mining of knowledge [2]. People have become increasingly aware that social network analysis tools can over-mine their personal information, such as identities, social circles, interests, and business relationships, which is meticulously recorded and stored by various internet platforms. Even if an individual prefers to keep such information private, it may still be revealed if details about other members of the same community are disclosed. This underscores the need for a remedy to protect sensitive information, and the symmetrical concept of a community detection attack has become a crucial area of research [3]. A community detection attack intentionally modifies the structure of a network so that the performance of detection algorithms degrades and the accuracy of their results drops.

A community detection attack is a privacy-preserving task that involves concealing or obfuscating communities by imperceptibly altering connections. Hence, it is also known as ‘community hiding’. It offers individuals or organizations a guideline for strategically managing their social relationships or adjusting their communication methods. For example, if some members of a community work for the same organization, it is likely that other community members are also affiliated with this organization. Such a disclosure could have consequences such as targeted advertising [4]. Another example is that some groups, including activists or police forces, may collaborate on social networks like Twitter or Facebook but do not want to be detected by detection tools, so they strategically manage their connections to avoid detection. Community detection attack techniques can also be used maliciously, for example by terrorists seeking to communicate covertly. This in turn highlights the need for new community detection algorithms that are robust to deception [5,6].

The problem of community detection attacks is categorized into three distinct scales, i.e., target node attack, target community attack, and global attack [7]. Specifically, they try to achieve different objectives by manipulating the structure of the graph through the modification of a minimal number of edges. A target node attack focuses on concealing or obscuring the community membership of individual node(s) within the network; a target community attack aims to hide a target community. A global attack seeks to alter the overall community structure of the network.

The field of community detection attacks is rapidly advancing and increasingly crucial for solving a variety of problems mentioned above. Numerous studies have been devoted to exploring different methods for these attacks. Existing surveys [8,9] have examined community detection attack algorithms at a particular scale, focusing on target community attacks. There has been a lack of comprehensive reviews that address community detection attacks across all three scales. This paper aims to bridge this gap by conducting a thorough examination of community detection attack algorithms, encompassing target node, target community, and global attacks, thereby providing a more complete understanding of this evolving domain for researchers.

1.1. Scope and Contributions of the Survey

To find the relevant literature on the algorithms proposed for community detection attacks, an exhaustive search has been performed using the keywords ‘community detection attack’ and ‘community hiding’. Following this, relevant papers that are aligned with the inclusion criteria of trustworthy sources have been carefully selected. The key contributions of this survey are outlined as follows.

The survey provides an overview of the community detection algorithms specially used by community detection attack algorithms.
Another overview is related to the evaluation measures of community detection attack algorithms.
An objective-based categorization of the community detection attack problem is introduced by Chen et al. [7], which classifies attacks into three distinct scales: target node, target community, and global attacks. While some surveys have addressed the target community attack scale, this survey is the first to examine all three scales comprehensively.
Existing surveys often overlook various paradigms, such as genetic algorithms. In contrast, this survey encompasses various methodologies, including genetic algorithms, heuristic approaches, and objective-based strategies. This expanded coverage provides a broader view of the techniques employed in community detection attacks.
The survey incorporates the recent algorithms for community detection attacks to present an up-to-date overview.
The survey outlines future directions for researchers in the field.

1.2. Outline of the Survey

The remainder of the survey is organized as follows. Section 2 gives the necessary preliminary information on community detection, community detection attacks, and measures employed in evaluating the performance of community detection attack algorithms. Section 3 presents the problem formulation and the existing attack algorithms for each attack scale. Section 4 provides a discussion and future directions. The final section concludes the survey.

2. Preliminaries

This section introduces the preliminaries required to understand the literature in the scope of the review. It starts by introducing the concept of community detection, types of community detection algorithms, and outstanding algorithms of each type in the first subsection. It then continues to explain the concept of the community detection attack, which is the other side of the coin. The last subsection gives the measures used in the performance evaluation of community detection attacks.

2.1. Community Detection

Complex networks intrinsically possess a community structure as one of their characteristics. A community (i.e., a group) within such a network is defined as a subset of nodes that exhibit strong connections among themselves, whereas they maintain relatively sparse connections with nodes outside this group. Community detection is the task of uncovering and identifying these coherent substructures within the network, thus being critical for understanding the underlying organization and structural patterns of interaction and gaining insights into the functional dynamics that emerge from these structures.

Definition 1 (Community Detection). Let a network $G = (V, E)$ be a graph with V as the set of nodes (vertices) and E as the set of edges. The problem of community detection is to find the community structure, denoted by $\bar{C}$, that is, to divide the nodes in the set V into K communities $\bar{C} = \{C_{1}, \dots, C_{K}\}$, with $C_{i} \subseteq V$.

Networks can be conceptualized as static or dynamic, depending on whether their structure remains constant or evolves. In static networks, community detection focuses on identifying the community structure; in dynamic networks, detection methods must also consider the temporal evolution of communities to capture the inherent dynamics. Since there is no attack specifically designed for community detection methods applied to dynamic networks, this study concentrates on static methods. Various community detection methods have been offered for static networks, which can be mainly categorized as traditional methods, spectral methods, optimization methods, statistical methods, and dynamics-driven methods [10,11].

Traditional methods encompass early approaches to community detection, such as partitional clustering, graph partitioning, hierarchical clustering, etc. Partitional clustering splits the network vertices into a predefined number of clusters (i.e., K) by optimizing a specific cost function based on the distances. The most renowned algorithms within this category are k-means and its numerous extensions [12,13]. Graph partitioning involves separating the vertices into g groups with a predetermined size, aiming to minimize the number of edges that connect distinct groups. The Kernighan–Lin algorithm [14] is a popular example of this category. The network may exhibit a hierarchical organization, characterized by multiple levels of clusters. Hierarchical clustering is broadly divided into two categories: agglomerative and divisive approaches. Girvan and Newman [15] offer a divisive algorithm that progressively removes the edge with the highest (shortest path) betweenness. Spectral methods leverage spectral properties of matrices associated with the graph (such as adjacency matrix or Laplacian matrix) to determine the community structure. The works referenced in [16,17,18,19] provide examples of spectral methods.

Optimization methods seek to maximize or minimize a specific function that represents the quality of a partitioning. The most widely used quality function is modularity. Maximizing modularity is proved to be NP-hard [20], leading to the development of various approximation algorithms, including greedy algorithms, genetic algorithms, or simulated annealing [21]. Newman [22] introduces a hierarchical agglomeration algorithm that maximizes modularity through a greedy approach. Clauset et al. [23] significantly enhance its efficiency by applying strategic optimizations and advanced data structures. Newman [24] offers refinement procedures aimed at enhancing the effectiveness of spectral optimization for modularity. The study [20] delves into the use of integer programming for community detection problems. Moreover, Blondel et al. [25] propose the prominent Louvain method based on modularity optimization. It consists of two phases that are repeated iteratively. The Leiden [26] and Combo [27] algorithms are also proposed in this category.

Statistical methods in community detection typically employ a probabilistic framework to fit the observed network data to a generative model. The stochastic block model (SBM) is the most common generative model for graphs with communities. Dynamics-driven methods utilize the behavior of dynamical processes occurring on the network, such as random walks, spin dynamics, and synchronization, to detect communities. The idea of random walk-based methods is that vertices with similar properties are highly likely to be grouped within the same partition. Pons and Latapy [28] present the Walktrap method, which uses a distance matrix constructed from random walks between nodes. Rosvall and Bergstrom [29] introduce the Infomap algorithm, which minimizes the description length of a random walk. The work [30] utilizes spin glass models from statistical mechanics by optimizing a Hamiltonian function. Furthermore, Raghavan et al. [31] offer a label propagation algorithm in which each node initially receives a different label and, at every iteration, each node adopts the label held by most of its neighbors. Its main advantages are simplicity and time efficiency.
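The label propagation idea described above can be illustrated with a minimal sketch. This is not the implementation of [31]: the function name `label_propagation`, the asynchronous update order, and the rule of keeping the current label on ties (chosen here so that the loop is guaranteed to stop) are assumptions made for illustration.

```python
import random
from collections import Counter


def build_adj(edges):
    """Build an undirected adjacency map {node: set of neighbours}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def label_propagation(adj, seed=0, max_sweeps=100):
    """Asynchronous label propagation; returns {node: label}."""
    rng = random.Random(seed)
    labels = {v: v for v in adj}      # every node starts with its own label
    nodes = list(adj)
    for _ in range(max_sweeps):
        rng.shuffle(nodes)            # visit nodes in random order
        changed = False
        for v in nodes:
            if not adj[v]:
                continue              # an isolated node keeps its label
            counts = Counter(labels[u] for u in adj[v])
            best = max(counts.values())
            candidates = sorted(lab for lab, c in counts.items() if c == best)
            # Assumption of this sketch: keep the current label on ties.
            if labels[v] in candidates:
                continue
            labels[v] = rng.choice(candidates)
            changed = True
        if not changed:               # every node already holds a majority label
            break
    return labels


# Toy graph: two 4-cliques joined by the single edge (3, 4).
clique_a = [(i, j) for i in range(4) for j in range(i + 1, 4)]
clique_b = [(i, j) for i in range(4, 8) for j in range(i + 1, 8)]
adj = build_adj(clique_a + clique_b + [(3, 4)])
labels = label_propagation(adj)
```

On this toy graph, all nodes of a clique end up sharing one label, although the two cliques may or may not be merged into a single community depending on the random visiting order.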

Communities can be either overlapping or disjoint (non-overlapping) communities. Overlapping communities allow nodes to be assigned to more than one community, reflecting the nature of the real world, where entities often join various groups simultaneously. Non-overlapping communities, on the other hand, are strictly divided, with each node being assigned to only one community. The Clique Percolation Method (CPM) [32] is a well-known example of uncovering overlapping communities with the concept of overlapping cliques, where each clique is a complete subgraph.

Up to this point, the focus has been on exploring conventional techniques for community detection since community detection attack algorithms try to deceive them. However, the field has seen notable progress, particularly in the parallelization of these conventional techniques to improve their practical applicability [33,34]. Additionally, deep learning has gained prominence, demonstrating significant success in this field [35].

2.2. Community Detection Attack

Community detection algorithms can be considerably disrupted by altering only a small percentage of links, with each link having a different role in maintaining the community structure. A community detection attack aims to mislead community detection algorithms and can be considered as a symmetrical operation on the other side of the wall. It involves strategically manipulating the network structure, such as adding new edges and/or removing the existing ones. The goal is to fool the detection algorithms into inaccurately identifying the community structure of the network, effectively obscuring the true community affiliations and interactions within the network. Hence, the attack algorithms seek to achieve obfuscation at the desired scale by rearranging the minimum number of connections.

Definition 2 (Budget). It refers to the maximum number of edge modifications that an attack algorithm is allowed to perform within the network. This constraint limits the extent of adversarial modifications that can be applied to the graph structure so as to ensure the attack remains imperceptible. Budget is denoted by β.

Definition 3 (Rewiring). It describes a particular edge operation in which an edge connected to a node is deleted and another edge is added to the same node, leaving the degree of the node unchanged. It strategically changes the relationships of the node to other nodes while preserving its overall connectivity, which makes the attack less conspicuous.
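Definitions 2 and 3 can be made concrete with a small sketch. The function `rewire` and the choice to charge two edge modifications (one deletion plus one addition) against the budget are assumptions made for illustration; individual attack algorithms may count the cost differently.

```python
def rewire(adj, node, old_neighbor, new_neighbor, budget):
    """Replace edge (node, old_neighbor) with (node, new_neighbor) in place.

    adj maps each node to the set of its neighbours (undirected graph).
    Returns the remaining budget. Assumption of this sketch: one deletion
    plus one addition costs two edge modifications.
    """
    cost = 2
    if budget < cost:
        raise ValueError("budget exhausted")
    if old_neighbor not in adj[node]:
        raise ValueError("edge to delete does not exist")
    if new_neighbor == node or new_neighbor in adj[node]:
        raise ValueError("edge to add is a self-loop or already exists")
    adj[node].remove(old_neighbor)
    adj[old_neighbor].remove(node)
    adj[node].add(new_neighbor)
    adj[new_neighbor].add(node)
    return budget - cost


# Toy graph: a triangle {0, 1, 2} and a separate edge (3, 4).
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {4}, 4: {3}}
degree_before = len(adj[0])
remaining = rewire(adj, node=0, old_neighbor=1, new_neighbor=3, budget=2)
```

The degree of the rewired node is preserved, while the degrees of the old and new neighbours change by one each.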

Figure 1 illustrates an overview of community detection attack. The community structure of an original graph is identified using a community detection algorithm. Subsequently, a community detection attack algorithm is applied to the graph that makes some update operations (e.g., edge deletions and/or edge additions). The graph is then re-analyzed using the community detection algorithm. There may be significant changes in the community structure. For instance, members from one community may move to different communities, or the total number of communities in the graph may change significantly. This figure is intended to provide an overview only; different attack algorithms may utilize different knowledge and take some additional knowledge as input. The knowledge each algorithm exploits is explained in Section 3.

2.3. Evaluation Measures for Community Detection Attacks

In this section, the prominent performance evaluation measures used to highlight the effectiveness of the community detection attacks are explained. Performance measures listed below are used to indicate the different aspects of the attack algorithms like the quality of a division, similarity of two partitions, hiding success, etc.

Modularity: To measure the quality of a division of a network, modularity, first introduced in [36], is defined as $Q = \sum_{i} (e_{i i} - a_{i}^{2})$ , where $e_{i i}$ is the fraction of edges in the network that are internal to community $C_{i}$ and $a_{i} = \sum_{j} e_{i j}$ is the fraction of edges in the network that connect to nodes in community $C_{i}$ . That is, it measures the difference between the fraction of intra-community edges and the expected fraction of such edges under random connections. Higher values of modularity indicate better community structure. It is particularly used for networks whose community structure is unknown. For weighted networks, the formulation of the modularity is given in [37].
Normalized mutual information (NMI): It is a measure of similarity between two community partitions (X and Y) based on information theory. $H (X)$ is the entropy associated with the partition X, and $I (X, Y)$ is the mutual information between the two partitions, that is, the information that one partition has about the other. It is defined as $\mathrm{NMI} = \frac{2 I (X, Y)}{H (X) + H (Y)}$ . For the community structures detected before and after the attack ( $\bar{C}$ and $\hat{C}$ ), it can be written as follows:

$\mathrm{NMI} = \frac{- 2 \sum_{i = 1}^{| \bar{C} |} \sum_{j = 1}^{| \hat{C} |} m_{i j} \log \left(\frac{m_{i j} n}{M_{i} M_{j}}\right)}{\sum_{i = 1}^{| \bar{C} |} M_{i} \log \left(\frac{M_{i}}{n}\right) + \sum_{j = 1}^{| \hat{C} |} M_{j} \log \left(\frac{M_{j}}{n}\right)},$

(1)

where m is the confusion matrix containing the ‘real’ communities in the rows and the ‘found’ communities in the columns, $m_{i j}$ is the number of shared nodes between the real community i and the found community j, $M_{i}$ is the sum over row i, $M_{j}$ is the sum over column j, and n is the sum of all elements in the matrix (or the number of nodes in the network). A larger NMI value means the partitions are more similar [38].
Adjusted Rand Index (ARI): It measures the similarity between two partitions. It is based on pair counting and defined as [39]:

$\mathrm{ARI} = \frac{\sum_{i j} \binom{m_{i j}}{2} - \left[\sum_{i} \binom{M_{i}}{2} \sum_{j} \binom{M_{j}}{2}\right] / \binom{n}{2}}{\frac{1}{2} \left[\sum_{i} \binom{M_{i}}{2} + \sum_{j} \binom{M_{j}}{2}\right] - \left[\sum_{i} \binom{M_{i}}{2} \sum_{j} \binom{M_{j}}{2}\right] / \binom{n}{2}} .$

(2)

Its value is in the range $[- 1, 1]$ . The larger the ARI value, the closer the two partitions are to each other.
Success rate: It quantifies the ratio of incorrectly clustered target nodes out of all the target nodes [40], or the percentage of times the target node does not belong to its original community when the process is repeated several times [41].
AML: The average number of modified links (link alterations) needed to successfully attack a target node [40].
Percentage degree increase: It is the percentage increase in the degree of the target node due to the attack [7].
Community retention probability: It is the probability that the target nodes are found within their original communities after the attack [42].
Miss ratio: It is defined as $F N / | V - C |$ , where C is the target community and $F N$ is the fraction of the target community nodes being associated with another community. It demonstrates the fraction of the target community in the part of the network where the nodes of the target community attempt to hide [43].
Concealment measure (M): It assesses the effectiveness of concealing a target community C within a community structure $\bar{C}$ . Two measures (namely $M'$ and $M''$ ) are introduced to evaluate different dimensions. $M'$ assesses how well the members of C are distributed across communities in $\bar{C}$ , while $M''$ measures the degree to which C is concealed within the crowd [5].

$M = \alpha M' + (1 - \alpha) M''$

(3)

$M' = \frac{| \{C_{i} \in \bar{C} : C_{i} \cap C \neq \emptyset\} | - 1}{\max (| \bar{C} | - 1, 1) \max_{C_{i} \in \bar{C}} (| C_{i} \cap C |)}$

(4)

$M'' = \sum_{C_{i} \in \bar{C}} \frac{| C_{i} \setminus C |}{\max (n - | C |, 1)}$

(5)

where $\alpha \in [0, 1]$ and n is the number of nodes in the graph.
Community deception score (H): Fionda and Pirro [6] establish three indicators of good hiding of a target community C: (i) reachability preservation, that is, the members of C should still reach each other, so modifications should not break its connectivity; (ii) community spread, that is, the members of C should be spread over as many communities in the network as possible; and (iii) community hiding, that is, the members of C should be distributed in the largest communities. The deception score H captures all three. Given a target community C and a community structure $\bar{C} = \{C_{1}, C_{2}, \dots, C_{k}\}$ , the score H is defined as:

$\begin{aligned} H (C, \bar{C}) = {} & \left(1 - \frac{| S (C) | - 1}{| C | - 1}\right) \\ & \times \left(\frac{1}{2} \left(1 - \max_{C_{i} \in \bar{C}} R (C_{i}, C)\right) + \frac{1}{2} \left(1 - \frac{\sum_{C_{i} \cap C \neq \emptyset} P (C_{i}, C)}{| \{C_{i} \in \bar{C} : C_{i} \cap C \neq \emptyset\} |}\right)\right), \end{aligned}$

(6)

where $| S (C) |$ is the number of connected components within the subgraph formed by the members of C, and recall and precision of a community detection algorithm $A_{D}$ with regard to the C are defined as follows:

$R (C_{i}, C) = \frac{\# \, C \text{ members in } C_{i}}{| C |} \quad \forall C_{i} \in \bar{C} .$

(7)

$P (C_{i}, C) = \frac{\# \, C \text{ members in } C_{i}}{| C_{i} |} \quad \forall C_{i} \cap C \neq \emptyset .$

(8)

The goals (i), (ii), and (iii) stated above are fulfilled by the left multiplicative factor, the first term in the right factor, and the second term in the right factor, respectively. A natural question is whether the deception score H can be maximized directly to solve a target community hiding problem. Computing H requires the community structure $\bar{C}$ and therefore knowledge of the community detection algorithm $A_{D}$ that produced $\bar{C}$ . That means the deception would depend on $A_{D}$ .
Community splits (CommS): It captures the number of communities in the updated network containing the members of the target community [44].
Community uniformity (CommU): It captures how members of the target community are distributed among the communities in the updated network using entropy [44].
Modified NMI (MNMI): In large networks, concealing a target community might not significantly impact the communities that are not directly connected to it. Therefore, this measure evaluates NMI between the community memberships of target community nodes and their directly connected neighbors prior to and following the attack. It has the same range as NMI [44].
Fitness: The proposed fitness function can be used to assess the attack effect [7].
Variation of information (VI): It is a measure used for comparing two partitions (X and Y) based on information theory. It is defined as $V I = (H (X) - I (X, Y)) + (H (Y) - I (X, Y))$ in [45]. The lower VI value implies that the partitions are more similar.
Split–join distance (SJD): It calculates the distance between two community partitions (X and Y), proposed in [46]. It is given by $S J (X, Y) = P D_{Y} (X) + P D_{X} (Y)$ . The $P D_{Y} (X)$ is the projection distance of Y from X and found as follows: for each community in Y, determine the community in X with which it has the maximum overlap, then add up the maximal overlap sizes and subtract the sum from the number of elements in Y. A lower value of the split join indicates that the partitions are more similar.
Jaccard index: It is used to compare two community partitions (X and Y) [47]. It is defined as $J (X, Y) = | X \cap Y | / | X \cup Y |$ .
Recall: Given two community partitions (X and Y), for any distinct vertices i and j, it is defined as [47]:

$R (X, Y) = \frac{| {(i, j) \in X ∣ i \neq j, (i, j) \in Y} |}{| {(i, j) \in X ∣ i \neq j} |} .$

(9)

The numerator is the number of pairs $(i, j)$ that are in the same community in both partitions X and Y. The denominator is the total number of pairs $(i, j)$ that are in the same community in partition X. So, recall measures the proportion of relevant vertex pairs in the same community in partition X that are also found in the same community in partition Y.
Precision: It is used to compare two community partitions (X and Y) [48]. It is defined as $P (X, Y) = | X \cap Y | / | Y |$ .
Node-centric measures: It encompasses the gain in the graph safeness and the loss in the graph persistence [49]. The equations for node safeness and persistence (which closely resemble the permanence formula) are provided in Section 3.2, allowing graph-wide computation by summing the values of all nodes in the graph.
Constant community (CC) measures: Constant communities refer to groups of nodes that consistently belong to the same community across various community detection algorithms. The measures derived from CCs involve the number of nodes within CCs, the average density of such communities, and the average hub dominance [49].
Attack efficiency: It measures the effectiveness of an attack. It is the number of incorrectly clustered nodes with a limited number of edge modifications ( $β$ ). It is defined as [50]:

$AE = \frac{\text{Number of altered nodes}}{\beta} .$

(10)
BN ( $\beta \times \mathrm{NMI}$ ): It represents the attack cost. It measures the number of edge modifications required to decrease the NMI value. A lower BN value signifies a more effective attack [3].
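Two of the measures listed above, modularity and NMI (Equation (1)), can be computed directly from their definitions. The following is a minimal sketch for undirected, unweighted graphs; the function names, the toy graph, the hypothetical post-attack partition, and the convention of returning 1.0 when both partitions consist of a single community (where the denominator of Equation (1) is zero) are assumptions made for illustration.

```python
import math
from collections import Counter


def modularity(edges, community):
    """Q = sum_i (e_ii - a_i^2); community maps node -> community id."""
    m = len(edges)
    internal = Counter()   # edges with both ends in community i
    ends = Counter()       # edge ends attached to community i
    for u, v in edges:
        cu, cv = community[u], community[v]
        ends[cu] += 1
        ends[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (ends[c] / (2 * m)) ** 2 for c in ends)


def nmi(before, after):
    """Equation (1): NMI from the confusion matrix of two partitions."""
    n = len(before)
    conf = Counter((before[v], after[v]) for v in before)   # m_ij
    row = Counter(before.values())                           # M_i
    col = Counter(after.values())                            # M_j
    num = -2 * sum(mij * math.log(mij * n / (row[i] * col[j]))
                   for (i, j), mij in conf.items())
    den = (sum(r * math.log(r / n) for r in row.values())
           + sum(c * math.log(c / n) for c in col.values()))
    # Assumption: two single-community partitions are treated as identical.
    return num / den if den != 0 else 1.0


# Toy graph: two disconnected triangles.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]
before = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
after = {0: 0, 1: 0, 2: 1, 3: 1, 4: 1, 5: 1}   # hypothetical post-attack result

q = modularity(edges, before)
same = nmi(before, before)
shifted = nmi(before, after)
```

Here `q` is the modularity of the original partition, `same` compares a partition with itself, and `shifted` drops below 1 because one node changed community in the hypothetical post-attack partition.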

3. Community Detection Attacks

Community detection attacks try to deteriorate the performance of community detection algorithms. In this section, the problem of community detection attacks is presented at three scales: target node, target community, and global attacks. The attack algorithms available in the literature for each scale are explained. Some algorithms, such as EPA [7] and CH-SNMF [42], are designed for multiple scales; in these cases, the algorithm is explained in the subsection where it first appears.

3.1. Target Node Attack

In certain cases, a specific node within a network may prefer to avoid detection as part of any particular community. This node may have affiliation with sensitive or private groups, such as political or religious groups, or may not want its group to be known due to personal preferences. The main goal of a target node attack is to obscure the community membership of a target node within a network, ensuring that the community it belongs to cannot easily be detected. This can be executed by altering the connections around the target node, making it harder to accurately place the node within its original community.

Definition 4 (Target Node Attack). Given a target node u, suppose it is a member of community $C_{i}$ in the original graph $G = (V, E)$. The target node attack problem is to prevent the target node u from being identified as part of its initial community $C_{i}$ by altering the connections around it. After the attack, suppose that it is included in community $C_{j}$ of the adversarial graph $G'$, where $C_{j}$ is as different as possible from $C_{i}$. That means the target node is clustered into the wrong community.

Figure 2 illustrates an example of a target node attack on Zachary’s Karate network [51]. The community structure, comprising four distinct communities, is found using the Louvain algorithm [25]. Node 19 is chosen as the target node, with its border highlighted in red. Initially, it belongs to the blue community indicated by the circle. Given an attack budget of 2, the target node removes its connection with node 1, which is in the same community, and establishes a new connection with node 29 from a different community. That means edge $(19, 1)$ is deleted within the community of the target node and edge $(19, 29)$ is added between communities. After the attack, when the Louvain algorithm is re-applied to the modified network, as shown on the right, the target node is assigned to a different community, marked with plum color and hexagon shape. The goal here is to determine the optimal structural modification to the neighborhood of the target node. This modification should ensure that the target node remains hidden. The attack is considered successful when the original community and the new community involving the target node are different.

Chen et al. [40] put forward the fast gradient attack (FGA) on network embedding to prevent target nodes from being correctly identified (i.e., so that they are wrongly clustered in community detection). FGA comprises two parts: (i) generation of the adversarial network based on the graph convolutional network (GCN) gradient and (ii) using the adversarial network to attack the GCN or other network embedding methods. In the first stage, a target loss function is computed for a target node, and its partial derivative with respect to each element of the adjacency matrix is calculated. Then, the link with the maximum absolute gradient is chosen for addition or deletion, since it affects the result for the target node the most. In the experiments, the K-means algorithm is used to obtain the community detection results from the embedding vectors.

The problem of community detection attacks is formalized at three scales (global, target community, and target node attacks) in the study [7]. Each attack problem is modeled as an optimization problem, and a genetic algorithm-based method, EPA, is proposed. The index (ID) of an added or deleted link is used as a gene, and each chromosome represents an attack. For a target node attack, the fitness function is based only on the degree change, $f = \Psi(d')$, where $\Psi(d') = \exp(c \times d')$, $c$ is a constant that controls the decay speed, $d' = d / |E|$, $d = \frac{1}{4} \sum_{i=1}^{n} |d_i - \hat{d}_i|$, $n$ is the number of nodes in the graph, and $d_i$ and $\hat{d}_i$ are the degrees of node $i$ before and after the attack. Unequal crossover is adopted so that the chromosome length can vary. In the mutation process, structural properties computed between pairs of nodes are used (i.e., betweenness for deletion and shortest path length for addition).
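As a minimal sketch of the degree-change fitness described above, the following Python snippet evaluates $f = \exp(c \times d')$ for an attack given as edge lists before and after the modification. The function names, the edge-list representation, the use of the original edge count for $|E|$, and the negative default for `c` are illustrative assumptions, not details taken from [7].

```python
import math
from collections import Counter


def degrees(edges):
    """Degree of every node in an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def epa_node_fitness(edges_before, edges_after, c=-1.0):
    """f = exp(c * d'), with d' = d / |E| and d = (1/4) * sum_i |d_i - d_hat_i|.

    Assumptions: |E| is the edge count of the original graph, and c is
    negative so that larger degree changes lower the fitness.
    """
    before, after = degrees(edges_before), degrees(edges_after)
    nodes = set(before) | set(after)
    d = 0.25 * sum(abs(before[i] - after[i]) for i in nodes)
    d_norm = d / len(edges_before)
    return math.exp(c * d_norm)


original = [(1, 2), (2, 3), (3, 4)]
attacked = [(2, 3), (3, 4), (1, 4)]  # delete (1, 2), add (1, 4)
print(epa_node_fitness(original, attacked))
```

With a negative `c`, an attack that leaves all degrees unchanged scores 1, and the score decays as the degree sequence drifts from the original.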

Bernini et al. [41] address the problem of community membership hiding. The goal of the node deception task is to disassociate a target node $u$ from its original community $C_i$. The objective is defined by setting a threshold on the similarity between the target node’s original community $C_i$ and its new community $C_i'$, both excluding the target node, that is, $sim(C_i - \{u\}, C_i' - \{u\}) \leq \tau$, where $\tau \in [0, 1]$. The loss function is formulated as $loss = l_{decept} + \lambda l_{dist}$. The first part evaluates to 1 if node $u$ remains in the community $C_i$ and to 0 otherwise (i.e., the goal is achieved). The second part assesses the distance between the two graphs ($G$ and $G'$) and their corresponding community structures ($\bar{C}$ and $\hat{C}$). The target node can delete edges within its community and add new edges to nodes outside its community. The hiding problem is solved using deep reinforcement learning with the advantage actor-critic framework [52].

A unified framework (CH-SNMF) is introduced for the three scales of community hiding [42]. It utilizes the clustering properties of symmetric non-negative matrix factorization (SNMF). Rather than constructing a feature matrix, the adjacency matrix $A$ is used. The community indicator matrix $U = [U_{ij}]$ specifies the probability of a node $v_i$ being part of community $C_j$. It is obtained by solving $\min \|A - UU^T\|_F^2$ subject to $U \geq 0$. At each iteration, nodes having a high membership are chosen when updating the matrix $U$. For rewiring, the method removes the edges between nodes with high membership in a community and creates edges to nodes with high membership that belong to other communities.
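To make the SNMF objective concrete, the sketch below evaluates $\|A - UU^T\|_F^2$ for a given non-negative indicator matrix and reads off a hard community assignment from it. It only scores a candidate $U$; the update rule used by CH-SNMF is not reproduced here. The function names and the toy matrices are hypothetical.

```python
def snmf_objective(A, U):
    """Squared Frobenius norm of A - U U^T for list-of-lists matrices."""
    n, k = len(A), len(U[0])
    total = 0.0
    for i in range(n):
        for j in range(n):
            approx = sum(U[i][c] * U[j][c] for c in range(k))
            total += (A[i][j] - approx) ** 2
    return total


def hard_assignment(U):
    """Assign each node to the community with the largest membership value."""
    return [max(range(len(row)), key=row.__getitem__) for row in U]


# Toy example: two blocks of two nodes each (diagonal kept at 1 for an exact fit).
A = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
U = [[1, 0], [1, 0], [0, 1], [0, 1]]
print(snmf_objective(A, U), hard_assignment(U))
```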

Table 1 summarizes and compares target node attack algorithms. For each study, the table specifies the reference, the target node attack algorithm name(s), the update operation, whether intra-/inter-community edges are used, the knowledge needed (if any prior information is required), and the measures used to compare the attack algorithms with the algorithms in the comparison column. The last column shows the short names of the community detection algorithms used in the study. When the table is examined, it can be seen that the EPA algorithm employs only edge addition, while other methods incorporate both edge addition and removal to modify the network’s structure. Rewiring is only applied in CH-SNMF. Considering the community of the target node and the external communities, the algorithms that delete intra-community edges and add inter-community edges are DRL-Agent and CH-SNMF. The aim of a target node attack is to conceal a specific node, but all attack algorithms utilize the entire network. Some algorithms, such as DRL-Agent and CH-SNMF, even rely on prior knowledge of communities. Although success rate is used as a performance measure in two studies, different measures are also preferred. Comparisons of the algorithms are typically made against random baselines.

3.2. Target Community Attack

There may be scenarios where a group, such as law enforcement or activists, needs to interact and cooperate in a network without disclosing their community membership. A target community attack, also known as community deception, involves disguising a specific community that may be private or sensitive. This attack aims to conceal the existence of a target community within the network. To this end, the connections of the members of the target community are altered, which complicates detection. Changing the connections of the target community may spread its members across other communities, obscuring the target community.

Definition 5 (Target Community Attack). Given a target community $C \subseteq \bar{C}$, identified by a community detection algorithm $f$ on the original graph $G = (V, E)$, the target community attack problem is to hide the target community $C$ by altering the connections of its members within a specified budget $\beta$. After the attack, $C$ is no longer detectable by the community detection algorithm $f$. That is, in the adversarial graph $G'$, the nodes in $C$ are distributed among a set of new communities, denoted by $\hat{C} = \{C_1', C_2', \dots, C_L'\}$.

An illustrative example of a target community attack on the Karate network [51] is depicted in Figure 3. The community structure identified with the Louvain algorithm on the original network consists of four communities. Assume that the community whose members are shown as octagons and bordered in red is targeted, that is, the target community members are $\{23, 24, 25, 27, 28, 31\}$. With an update budget of 4, the following updates are applied to the graph: edges $(23, 27)$ and $(24, 31)$ are deleted, and edges $(24, 4)$ and $(25, 4)$ are added. The community detection result on the updated network, shown on the right, indicates that [6] (i) the members of the target community are evenly distributed over two communities, (ii) its members are better ‘hidden’ within the larger communities, and (iii) all its members are reachable from one another.

Nagaraja [43] first introduced the concept of hiding a community to counter community detection algorithms by allowing the community to alter its structure. The study considers only two communities where the nodes of the hidden community try to enter the main network (i.e., the other community). Their strategies are only based on link additions to the nodes with high centrality values, chosen from each of the two communities.

Waniek et al. [5] also focus on how a community can disguise itself to reduce the likelihood of being discovered through community detection. A heuristic algorithm named disconnect internally and connect externally (DICE) is proposed, which works by randomly deleting the links between members of the target community (intra-community edges) and adding the links between members and non-members of the target community (inter-community edges). The DICE algorithm is inspired by modularity. It does not require knowing the entire network topology and can be applied by any group of people.
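The DICE idea can be sketched in a few lines of Python. The version below is a simplified illustration, not the authors' implementation: the even split of the budget between deletions and additions, the `seed` parameter, and the function name `dice` are assumptions made for this example, and node labels are assumed to be sortable with no self-loops in the graph.

```python
import random


def dice(edges, community, budget, seed=0):
    """Return a perturbed edge list after a DICE-style attack."""
    rng = random.Random(seed)
    edge_set = {frozenset(e) for e in edges}
    members = set(community)
    nodes = {n for e in edge_set for n in e}
    outsiders = sorted(nodes - members)

    # Disconnect internally: drop random intra-community edges.
    # Assumption: half of the budget is spent on deletions.
    intra = sorted(tuple(sorted(e)) for e in edge_set if e <= members)
    to_delete = rng.sample(intra, min(budget // 2, len(intra)))
    edge_set -= {frozenset(e) for e in to_delete}

    # Connect externally: add random member/non-member edges that do not exist yet.
    candidates = [(u, v) for u in sorted(members) for v in outsiders
                  if frozenset((u, v)) not in edge_set]
    to_add = rng.sample(candidates, min(budget - len(to_delete), len(candidates)))
    edge_set |= {frozenset(e) for e in to_add}
    return sorted(tuple(sorted(e)) for e in edge_set)


graph = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (5, 6), (4, 6)]
print(dice(graph, community={1, 2, 3}, budget=2))
```

Because the procedure only needs the members' own edges and a list of non-members, it reflects the point made above that the heuristic does not require the full network topology beyond knowing which nodes are outsiders.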

Fionda and Pirro [6] model community deception as an optimization problem and propose “community safeness”. First, the node safeness $\sigma(u, C)$ of a node $u$ in $G$ is defined to measure how well it is hidden in the community $C$:

$$\sigma(u, C) = \frac{1}{2} \frac{|V_C^u| - |E(u, C)|}{|C| - 1} + \frac{1}{2} \frac{|\tilde{E}(u, C)|}{deg(u)}, \tag{11}$$

where $V_C^u \subseteq C$ is the set of nodes of $C$ reachable from $u$ by passing only via nodes in $C$, $E(u, C)$ is the set of internal edges of node $u \in C$, $\tilde{E}(u, C)$ is the set of external edges of node $u \in C$, and $deg(u)$ is the degree of node $u$. Node safeness considers two factors: the indirect connections of $u$ with the nodes in the community $C$ and the external connections of $u$. The first factor explains how well $u$ transmits information in $C$. The second factor explains how well $u$ is concealed in the network with respect to its degree; it serves to confuse detection because deception improves when the edges of $u$ are more diverse. The community safeness $\sigma(C)$ is then defined as $\sigma(C) = \sum_{u \in C} \sigma(u, C) / |C|$. A set of updates is better the more it increases the community safeness. Based on the community safeness, the Ds algorithm is proposed to deceive a detection algorithm by changing (adding/deleting) a certain number of links of the target community. Let $\Delta(C) = \sigma(C') - \sigma(C)$ be the safeness gain, where $C'$ is the community after one or more link updates. Ds is a greedy algorithm that selects the edge update giving the highest $\Delta(C)$ at each step. It considers only inter-community edge additions and intra-community edge deletions. Additionally, in the study [6], the Dm algorithm, which is based on the modularity loss, is presented for community deception. However, it needs knowledge of the community structure. Later, Fionda and Pirro [53] offer the SECRETORUM algorithm, in which node safeness is re-defined to work with weighted networks. Further, the same author group explores community deception for directed and/or weighted graphs in the studies [54,55,56].

In the EPA method [7], for target community and global attacks, the fitness function is calculated using entropy and degree change. Entropy is employed to assess the attack effect using the community detection results before and after the attack. Therefore, the EPA method requires knowledge of the community structure discovered by a particular detection algorithm. The fitness function for a global attack is designed as follows:

$$f = \Psi(d') \times \left( \frac{Ent_A}{\log_2 |\hat{C}|} + \frac{Ent_B}{\log_2 |\bar{C}|} \right), \tag{12}$$

where $Ent_A$ and $Ent_B$ are the entropies for the new communities and the original communities, respectively, $m$ is the confusion matrix of the two community distributions, $m_{ij}$ is the number of common nodes between the original community $C_i$ and the new community $C_j'$, $n$ is the number of nodes in the graph, and

$$Ent_A = -\sum_{i=1}^{|\bar{C}|} \sum_{j=1}^{|\hat{C}|} \frac{M_i}{n} \left( \frac{m_{ij}}{M_i} \log_2 \frac{m_{ij}}{M_i} \right). \tag{13}$$

The maximum value of $Ent_A$ (resp., $Ent_B$) is obtained if the values of each row (resp., column) are equal. For a target community attack, the fitness function given in Equation (12) can be adapted to limit the search space [7].
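The entropy term in Equation (13) can be evaluated directly from a confusion matrix, as in the sketch below. The text does not define $M_i$; this example assumes it is the sum of row $i$ of the confusion matrix (the size of the original community $C_i$) and that $n$ equals the sum of all entries. Both choices, and the function name, are assumptions for illustration.

```python
import math


def ent_a(m):
    """Ent_A from Equation (13) for a confusion matrix given as a list of rows.

    Assumptions: M_i is the sum of row i, n is the sum of all entries,
    and zero entries contribute nothing (0 * log 0 is treated as 0).
    """
    n = sum(sum(row) for row in m)
    total = 0.0
    for row in m:
        M_i = sum(row)
        for m_ij in row:
            if m_ij > 0:
                p = m_ij / M_i
                total -= (M_i / n) * p * math.log2(p)
    return total


# Each original community is split evenly over two new communities.
print(ent_a([[2, 2], [3, 3]]))
# Communities are unchanged by the attack.
print(ent_a([[4, 0], [0, 6]]))
```

Under these assumptions, an even split of every row yields $\log_2$ of the number of new communities, which matches the statement that the maximum is reached when the values in each row are equal.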

Chen et al. [57] give a new definition of community safeness that considers the total shortest path length over every pair of nodes in the target community $C$. For this, the degree of dispersion of the community $C$ is defined as $\rho(C) = \sum_{u} \sum_{v} sp(u, v)$, where $u, v \in C$, $u \neq v$, and $sp(u, v)$ is the length of the shortest path between node $u$ and node $v$. The higher $\rho(C)$ is, the better the community is hidden. $\rho(C)$ is normalized to the range $[0, 1]$, that is, $\psi(C) = (\rho(C) - \rho(C)_{min}) / (\rho(C)_{max} - \rho(C)_{min})$. Then, the community safeness is defined as $\sigma(C) = \frac{1}{2} \psi(C) + \frac{1}{2} \varphi(C)$, where $\varphi(C) = \sum_{u \in C} \frac{|\tilde{E}(u, C)|}{deg(u)}$. Building on this new safeness definition, the Hs algorithm is proposed as an improvement of the Ds algorithm.

Mittal et al. [44] use permanence for community deception. Permanence is a node-based metric that quantifies the belongingness of a node $v$ to its community $C$ [58]:

$$Perm(v) = \frac{I(v)}{E_{max}(v)} \times \frac{1}{deg(v)} - (1 - c_{in}(v)), \tag{14}$$

which is based on three factors: (1) the internal pull $I(v)$, the internal connections of $v$ inside its own community; (2) the maximum external pull $E_{max}(v)$, the maximum number of connections of $v$ to any one of its neighboring communities; and (3) the internal clustering coefficient $c_{in}(v)$ of $v$, the ratio of the actual to the possible number of links among the internal neighbors of $v$. The permanence of a node increases if its internal pull is greater than its external pull or its internal neighbors are densely connected. Figure 4 illustrates a toy example of the calculation of the permanence of two nodes. The permanence of a network $G$ is calculated as $Perm(G) = \sum_{v \in V} Perm(v) / |V|$. Mittal et al. propose NEURAL, a permanence-based deception method that aims to reduce the network permanence to hide a target community $C$. NEURAL is a greedy strategy that maximizes the permanence loss $P_l = Perm(G) - Perm(G')$ by choosing the edge update giving the highest $P_l$ at every iteration. NEURAL also allows only intra-community edge deletions and inter-community edge additions. The work [59] then adapts the permanence formula to operate on weighted networks and offers the PERMDEC algorithm using the permanence loss.
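The permanence formula in Equation (14) can be computed as in the following sketch. Two edge cases are handled by assumption, since the text does not cover them: $E_{max}(v)$ is set to 1 when $v$ has no external neighbors, and $c_{in}(v)$ is set to 0 when $v$ has fewer than two internal neighbors. Every node is assumed to have degree at least 1, and the function names are illustrative.

```python
def permanence(adj, community_of, v):
    """Perm(v) following Equation (14) for an adjacency-set graph."""
    own = community_of[v]
    internal = [u for u in adj[v] if community_of[u] == own]
    # Count neighbors of v in each external community.
    ext_counts = {}
    for u in adj[v]:
        c = community_of[u]
        if c != own:
            ext_counts[c] = ext_counts.get(c, 0) + 1
    e_max = max(ext_counts.values(), default=1)  # assumption: 1 if no external pull
    k = len(internal)
    if k < 2:
        c_in = 0.0  # assumption: undefined clustering coefficient treated as 0
    else:
        links = sum(1 for i, a in enumerate(internal)
                    for b in internal[i + 1:] if b in adj[a])
        c_in = links / (k * (k - 1) / 2)
    return (k / e_max) * (1 / len(adj[v])) - (1 - c_in)


def network_permanence(adj, community_of):
    """Perm(G): mean permanence over all nodes."""
    return sum(permanence(adj, community_of, v) for v in adj) / len(adj)


# Two triangles joined by the bridge (3, 4).
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
comm = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
print(permanence(adj, comm, 3), network_permanence(adj, comm))
```

A permanence-loss strategy such as the one described above would compare `network_permanence` before and after each candidate edge update.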

The Matthew effect caused by traditional quality function (i.e., modularity)-based methods is identified, where earlier edge perturbation influences the placement of subsequent perturbations at the community level [48]. To address this issue, a probabilistic framework, ProHiCo, is designed to hide a set of target communities. The core concept behind it is to initially allocate the resource of perturbations randomly and sequentially, followed by selecting the suitable edges for perturbation through likelihood minimization. By integrating the stochastic block model and its degree-corrected version into the ProHiCo framework, two scalable algorithms, SBM and DCSBM, employing sampling and pruning techniques, are proposed. Then, the scalable ComDeceptor method [60], leveraging the Laplacian spectrum, also hides an arbitrary set of target communities. It first allocates the resources of perturbations fairly, giving either a pair of communities for edge addition or a community for edge deletion in each round. The method performs inter-community edge additions by maximizing the second smallest Laplacian eigenvalue while performing intra-community edge deletions by minimizing the largest Laplacian eigenvalue. It incorporates heuristics for approximately solving them.

A node-centric approach [61] is introduced for community deception by allowing node updates, where each node operation (node addition or node deletion) results in several edge updates. It relies on node safeness defined in Equation (11). A greedy algorithm, nSAF, is proposed that selects the node operation (between a node addition candidate and a node deletion candidate) that yields the best safeness gain for each step, along with the respective edges. For deletion, the node with the least safeness is chosen among the target community nodes, and its edges are determined as the edges to be deleted. For the addition of a new node (natural or bot), the edges to be added are determined as follows: first, only one edge is established with a node in the target community to form a single component, and then, the remaining edges are established with nodes outside the target community. Another greedy algorithm with a node-centric approach, nDec [62], is based on modularity and considers three node operations (addition, deletion, and moving). Node addition is essential when new members join the network. While moving nodes between communities is generally preferable, node deletion may be necessary in cases where the complete concealment of the target community is critical. A deleted node can be re-added with carefully chosen edge additions.

Chang et al. [63] offer three genetic algorithms for hiding a target community using escape score, dispersion score (deception score H in [6]), and hiding score as the fitness functions. The escape score quantifies the number of nodes in the target community that are concealed. The hiding score formula is obtained from the first two scores.

A swarm intelligence-based method, SCP, is introduced for target community attacks [64]. The edge pool is initialized according to permanence, as in [4] (explained in Section 3.3), but only the edges related to the target community are considered. The search space is tailored for each particle. A self-adaptive mechanism is implemented to balance global and local search resources. The fitness function is derived from the structural entropy given in Equation (16).

Table 2 summarizes and compares target community attack algorithms. For each study, the table specifies the reference, the target community attack algorithm name(s), the update operation, whether intra-/inter-community edges are used, the knowledge needed (if any prior information is required), and the measures used to compare the attack algorithms with the algorithms in the comparison column. The last column shows the short names of the community detection algorithms used in the study. The table indicates that manipulation of the network structure exclusively through the addition of edges is achieved only in the study [43]. The other algorithms modify the network by both adding and removing edges. Additionally, the nSAF and nDec algorithms also perform node updates. Notably, edge rewiring is implemented only in the EPA and CH-SNMF algorithms. Rewiring is not applied in greedy optimization-based methods (i.e., Ds, Dm, SECRETORUM, Hs, NEURAL, nSAF, and nDec). Except for EPA, nSAF, and nDec, all of them perform inter-community link addition and intra-community link deletion since the target community to be concealed is known and other communities can be treated as external communities. Some algorithms may need additional prior information to hide a target community. For example, information about neighboring communities is needed to calculate the permanence value in the NEURAL algorithm. Certain algorithms even demand a community structure extracted from the network by a community detection algorithm, implying the necessity of knowing the entire network. This requirement may restrict the practical applicability of these methods in real-world scenarios. When evaluation measures are analyzed, it is observed that the community deception score H, which captures all three desiderata of effective community hiding, and NMI are predominantly preferred, followed by the measure M. In the comparative analysis of the attack algorithms, the Ds algorithm emerges as the most frequently utilized, indicating its widespread acceptance and effectiveness in the context of target community attacks.

3.3. Global Attack

In a global attack, all communities within the network are regarded as sensitive. This attack focuses on modifying the least number of connections to significantly alter the overall community structure, thereby ensuring privacy.

Definition 6 (Global Attack). Let $G = (V, E)$ be a graph and $f$ a community detection algorithm that discovers the community structure $\bar{C} = \{C_1, \dots, C_K\}$ on $G$. The global attack problem aims to maximize the change in the community structure by allowing a certain number ($\beta$) of edge modifications (addition or deletion) in the network, where $\beta$ is the budget. After the attack, the adversarial graph $G'$ is obtained. The community structure discovered from $G'$ through the community detection algorithm $f$ is $\hat{C} = \{C_1', \dots, C_L'\}$. The community structure $\hat{C}$ is as different as possible from the original community structure $\bar{C}$.

Figure 5 presents a visualization of a global attack on the Karate network [51]. The implementation of the Louvain algorithm divides the nodes in the original network into four communities, each represented by a different shape to indicate the membership of the individuals. Performing a global attack, consisting of four strategically selected edge rewirings, produces the adversarial network; the resulting community structure on it by the same detection algorithm is illustrated in Figure 5b [2], revealing the change in the number of communities and exhibiting disorganization at a broader scale by modifications across the network.

The attack strategies for global attacks are designed to disrupt the community structure. Figure 6 shows an ideal case for community obfuscation.

Chen et al. [2] tackle the problem of global community structure deception and develop strategies to attack community detection algorithms by rewiring a minimal number of links. Two heuristic attack strategies, community detection attack (CDA) and degree-based attack (DBA), are given as baselines, and a genetic algorithm-based strategy, Q-Attack, is proposed to fool the detection algorithms. These strategies rely on rewiring, that is, adding an edge to a node while deleting one from it. The CDA strategy uses knowledge of the community structure found by a specific detection algorithm. Since there is no target community, it randomly selects a certain number of nodes from the network. In every iteration, for a chosen node, an existent intra-community edge is deleted and a non-existent inter-community edge is added. The DBA heuristic strategy is similar to CDA but chooses the nodes with larger degrees from the network.

A genetic algorithm is used to search for the optimal set of rewiring links [2]. The Q-Attack algorithm is mainly composed of three parts: encoding, fitness function, and design of genetic operators. For encoding, an attack is represented by a chromosome and a rewiring (a deleted edge and an added edge) is represented by a gene. An example individual consisting of four rewirings is demonstrated in Table 3. The fitness function used is $f = e^{-Q}$, which indicates that individuals with lower modularity will have larger fitness. The probability of selecting an individual is proportional to its fitness, represented as $p_i = f(i) / \sum_{j=1}^{n} f(j)$. A single-point crossover between two individuals is adopted with the probability $p_c$, and, to promote diversity, three types of mutation operators, namely link deletion, link addition, and link reconnection, are proposed with probability $p_m$.
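The fitness and selection step described for Q-Attack can be sketched as follows. The modularity function uses the standard definition $Q = \sum_c [L_c / m - (d_c / 2m)^2]$, where $L_c$ is the number of intra-community edges and $d_c$ the total degree of community $c$; the function names and the toy graph are illustrative, and the encoding, crossover, and mutation operators are not reproduced here.

```python
import math


def modularity(edges, community_of):
    """Modularity Q of a partition of an undirected, unweighted graph."""
    m = len(edges)
    intra, deg_sum = {}, {}
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        deg_sum[cu] = deg_sum.get(cu, 0) + 1
        deg_sum[cv] = deg_sum.get(cv, 0) + 1
        if cu == cv:
            intra[cu] = intra.get(cu, 0) + 1
    return sum(intra.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in deg_sum.items())


def selection_probabilities(q_values):
    """Fitness f = exp(-Q) per individual, normalized to selection probabilities."""
    fitness = [math.exp(-q) for q in q_values]
    total = sum(fitness)
    return [f / total for f in fitness]


# Two triangles joined by a bridge, partitioned into the two triangles.
edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (5, 6), (4, 6)]
part = {1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1}
print(modularity(edges, part))
print(selection_probabilities([0.36, 0.20, 0.05]))
```

In a full attack loop, each individual's $Q$ would be the modularity of the partition that the detection algorithm returns on the rewired graph.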

In the study [47], structural entropy for a graph $G = (V, E)$ is expressed as follows:

$$H(G) = -\sum_{i=1}^{|V|} \frac{d(i)}{2|E|} \log_2 \frac{d(i)}{2|E|}, \tag{15}$$

where $d(i)$ is the degree of node $i$. Then, the structural entropy for the graph with respect to the community structure $\bar{C} = \{C_1, C_2, \dots, C_k\}$ is defined, in which $v(j)$ is the sum of degrees in community $C_j$ and $g(j)$ is the number of external edges from community $C_j$:

$$H_C(G) = -\sum_{j=1}^{k} \left[ \frac{v(j)}{2|E|} \left( -\sum_{i \in C_j} \frac{d(i)}{v(j)} \log_2 \frac{d(i)}{v(j)} \right) - \frac{g(j)}{2|E|} \log_2 \frac{v(j)}{2|E|} \right]. \tag{16}$$

To disrupt the community structure, their REM method minimizes $\rho = (H(G) - H_C(G)) / H(G)$. It only performs edge additions between communities by considering nodes with a low degree.
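Equation (15) depends only on the degree sequence, so it can be evaluated with a few lines of Python. The sketch below assumes an undirected edge list; nodes that appear in no edge have degree 0 and are treated as contributing nothing to the sum. The function name is illustrative.

```python
import math
from collections import Counter


def structural_entropy(edges):
    """One-dimensional structural entropy H(G) following Equation (15)."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    two_m = 2 * len(edges)
    return -sum((d / two_m) * math.log2(d / two_m) for d in deg.values())


# A 4-cycle: every node has degree 2, so each term is (1/4) * log2(1/4).
print(structural_entropy([(1, 2), (2, 3), (3, 4), (4, 1)]))
```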

Network embedding algorithms map the nodes of a network into the vectors in a Euclidean space. These vectors then can be used by downstream network tasks, such as node classification, community detection, and link prediction. Yu et al. [66] focus on attacking the network embedding process. An attack method, namely EDA, is proposed for disturbing the distances between the embedding vectors through minimal changes of the network structure. For embedding, the DeepWalk method is used. To obtain the optimal set of modified links, a genetic algorithm is used. The EDA method iteratively calls DeepWalk and the genetic algorithm and returns the adversarial network at the end. This attack method disturbs the global network structure. Moreover, to observe the effectiveness of the attack on community detection, nodes are converted into vectors using DeepWalk and the embedding vectors are clustered via the K-means algorithm. As an unsupervised attack, EDA does not need any knowledge of communities.

Persistence, which includes the same three factors as permanence and is very similar to it, is defined in the study [49]. Unlike permanence, however, persistence is used for global attacks. Two attack strategies, maximization of persistence loss (MPL) and a rewiring strategy, are offered. The MPL strategy aims to maximize the persistence loss of the network. A node is moved to a neighboring community if its persistence score decreases. All the nodes in the network are checked. In MPL, the structure of the network is not altered. A rewiring strategy can be applied after MPL to change the structure of the network. For every shifted node, an edge to the old community is deleted and a new edge is added to a node in the new community.

Modularity vitality reflects the role of a node (i.e., bridge or hub) within a given community structure [67]. This measure can be utilized to execute global attacks with a greedy approach that removes community hub nodes (those with high modularity vitality). An approximation that calculates the values only once is also offered. Experiments on a network reveal that a specific level of attack effect can be achieved either by removing a few nodes or by removing a large number of edges. The suggestion is that users seeking to safeguard their community identity break their connections with community leaders.

Graph neural networks (GNNs) have been widely preferred in graph-related tasks. Liu et al. [50] apply a graph auto-encoder [68] to handle the community hiding problem and propose a method called GCH. A graph auto-encoder is used to reconstruct the probability adjacency matrix. The GCH method takes the community structure discovered by a detection algorithm to generate the adversarial network. The original network is perturbed by deleting the edges with the highest probability within communities and adding the edges with the lowest probability between communities. In this manner, the global community structure is disturbed, so that the structure detected after the attack differs substantially from the original one.

Liu et al. [3] propose a community hiding algorithm based on genetic algorithms using normalized mutual information (NMI), called CGN. This algorithm realizes global community structure hiding. It mainly consists of four parts: creating a gene pool with prior information, encoding, fitness function, and genetic operators. A gene pool is established according to preliminary information about the community structure of the original graph found by a community detection algorithm. That is, rather than considering all existent graph edges for deletion and all non-existent graph edges for addition, the gene pool includes existent edges within the communities (intra-community edges) for deletion and non-existent edges between the communities (inter-community edges) for addition. In encoding, a chromosome corresponds to an attack and an edge randomly chosen from the gene pool corresponds to a gene. The fitness function is designed as $f = 1 - NMI$, which indicates that a chromosome with a lower NMI value will have larger fitness and produce a better attack. The benefit of employing NMI as a fitness function is that a minimal NMI value shows that the community structure of the network has changed significantly after the attack. Roulette wheel selection is used: the chromosomes in a population are mapped to a wheel, where those with higher fitness values have a higher probability of being selected. A single-point crossover is applied between two chromosomes with the probability $p_c$, and mutation is performed with probability $p_m$.
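The NMI-based fitness $f = 1 - NMI$ can be illustrated with a small self-contained implementation. Several normalizations of mutual information exist; this sketch assumes the common form $NMI = 2I(X; Y) / (H(X) + H(Y))$, which may differ from the variant used in [3]. The function names and the label-list representation of a partition are also assumptions.

```python
import math
from collections import Counter


def nmi(labels_a, labels_b):
    """Normalized mutual information between two partitions given as label lists."""
    n = len(labels_a)
    ca, cb = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    h_a = -sum(c / n * math.log2(c / n) for c in ca.values())
    h_b = -sum(c / n * math.log2(c / n) for c in cb.values())
    mi = sum(c / n * math.log2((c / n) / ((ca[a] / n) * (cb[b] / n)))
             for (a, b), c in joint.items())
    if h_a + h_b == 0:
        return 1.0  # assumption: two single-community partitions count as identical
    return 2 * mi / (h_a + h_b)


def cgn_fitness(labels_before, labels_after):
    """Fitness f = 1 - NMI: larger when the attack changes the partition more."""
    return 1 - nmi(labels_before, labels_after)


before = [0, 0, 1, 1]
print(cgn_fitness(before, [0, 0, 1, 1]))  # partition unchanged
print(cgn_fitness(before, [0, 1, 0, 1]))  # partition fully scrambled
```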

Zhao et al. [4] develop the self-adaptive evolutionary deception (SAEP) framework. It offers a permanence-based edge pool initialization mechanism. The permanence of a node is a measure of its loyalty to the community it belongs to. A permanence value lies in $(-1, 1]$. A value close to 1 means a loyal node, while a value close to −1 means a tendency to leave. For edge deletion, intra-community edges are sorted by the product of their nodes’ permanence in increasing order. This implies that, in a community, the nodes with low loyalty are more likely to leave because they lose the attraction from the ones with high loyalty. For edge addition, the framework focuses on the non-existent inter-community edges, which are sorted by the product of the nodes’ permanence in increasing order. This means that a node with low permanence in community A receives an invitation from a node loyal to an external community B. Further, a penalty for adding edges to the node from unrelated communities is defined by multiplying the result by 0.5. Then, edges are sampled from the pool with a probability $prob(i) = \frac{x_i^{-\theta}}{\sum_{k=1}^{|P|} x_k^{-\theta}}$, where $x_i$ is the $i$th element in the sorted pool, $|P|$ is the pool size, and $\theta$ is empirically set to 0.3.

A fitness function is proposed to capture local and global community change [4]. It consists of multiple components ($f = f_1 * f_2 * f_3$). To capture local change, the assumption is that if an edge is modified (added/deleted), its two nodes should be affected first. The aim is to observe whether the nodes of the modified edge join other communities. A set of nodes $\Delta(V')$ related to the modified edges $\Delta(E')$ is identified; it includes the nodes whose community after the attack is not the same as that of their most loyal neighbor. $f_1$ is designed as $f_1 = \sum_{v \in \Delta(V')} \log_2 d(v)$. To capture global change, two functions are defined based on the confusion matrix. $f_2$ is $Ent_A$ defined in the study [7]. This forces nodes to leave their initial community. $f_1$ cannot capture local obfuscation if the entire community (the affected node and its most loyal neighbor) combines with another community after the attack. To solve this issue, a variant of the confusion matrix, $m'$, is introduced as $m'_{ij} = \frac{m_{ij}}{|C_j'|} * \log_2 |C_j'|$. The first part, $\frac{m_{ij}}{|C_j'|}$, is the proportion of nodes of the original community $C_i$ that form the new community $C_j'$. The log part prevents degenerate situations such as a community with one node. $f_3$ is defined as $f_3 = e^{-\max(m')}$ and tries to make the maximum value as small as possible.

In SAEP [4], each chromosome is assigned a strength vector to evaluate the strength or weakness of its genes. The vector for the $j$th chromosome is $W^j = [W_1^j, \dots, W_\beta^j]$. An edge distance is introduced to ensure that the edges in the solution are close to each other. The penalty $Q$ of a modified edge is $Q(e) = \sum_{i=1}^{k} \sigma^i$, where $\sigma$ is set in $[0, 1]$ and $k$ indicates that there are no modified edges at $k$-hop distance. The strength of the $i$th edge (gene) in its chromosome is updated as $W_i = W_i * \frac{1}{1 + \exp(Q(e))}$ if $k > 1$. Moreover, adaptive crossover and mutation operations are designed according to the strength vectors. To exchange only the inferior genes, a uniform crossover operation is used; two genes are exchanged if a random number is greater than the maximum of $W_i^m$ and $W_i^f$, which represent the $i$th gene of the mother chromosome and the father chromosome, respectively. Then, the weights of the changed genes are updated according to the fitness gain. In mutation, the weaker a gene is, the more likely it is to mutate. The weight of the changed gene is also updated.

A heuristic algorithm, degree first deception (DFP), is proposed [4]. It assumes that nodes with a large degree have a greater impact on the community structure. Therefore, it prioritizes the deletion of intra-community edges between large-degree and small-degree nodes and the addition of inter-community edges with large-degree nodes. That is, it sorts the edges using the following rule:

$$S(u, v) = \begin{cases} \frac{\min(d_u, d_v)}{\max(d_u, d_v)}, & (u, v) \in E,\ C_u = C_v \\ d_u + d_v, & (u, v) \in \bar{E},\ C_u \neq C_v, \end{cases} \tag{17}$$

where $E$ and $\bar{E}$ are the sets of existing edges and non-existent edges in the graph, $d_x$ is the degree of node $x$, and $C_x$ is the community that $x$ belongs to. The study [69] later proposes a modified version called DFP-R, which uses the same metric for the addition and removal of edges, but the created edge is chosen to have a vertex in common with the removed edge.
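The scoring rule in Equation (17) can be sketched as follows. The code enumerates all node pairs, which is only practical for small graphs, and returns the two score tables separately. The sort directions in the usage lines (lowest ratio first for deletion, highest degree sum first for addition) are a reading of the prioritization described in the text rather than a detail confirmed by [4]; the function name is hypothetical.

```python
from itertools import combinations


def dfp_scores(edges, community_of):
    """Score candidate deletions and additions following Equation (17)."""
    edge_set = {frozenset(e) for e in edges}
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    deletions, additions = {}, {}
    for u, v in combinations(sorted(community_of), 2):
        same = community_of[u] == community_of[v]
        exists = frozenset((u, v)) in edge_set
        if exists and same:
            # Existing intra-community edge: ratio of smaller to larger degree.
            du, dv = deg[u], deg[v]
            deletions[(u, v)] = min(du, dv) / max(du, dv)
        elif not exists and not same:
            # Non-existent inter-community edge: sum of the endpoint degrees.
            additions[(u, v)] = deg.get(u, 0) + deg.get(v, 0)
    return deletions, additions


edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (5, 6), (4, 6)]
comm = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
deletions, additions = dfp_scores(edges, comm)
# Assumed ordering: most unbalanced intra edges first, highest-degree inter pairs first.
print(sorted(deletions, key=deletions.get)[:2])
print(sorted(additions, key=additions.get, reverse=True)[:2])
```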

CoeCo [70] is a coevolutionary method for obfuscating the community structure that is designed to be applicable to large-scale datasets as well. It divides the graph into multiple similar-sized subgraphs, and each is optimized separately. To reduce the search space, an initialization step is performed in which edges are sorted according to permanence multiplication, as in SAEP [4], for addition and according to motif weights (the number of triangles containing the edge) for deletion. Two fitness functions are adapted, namely the fitness function for the ith subgraph, $f_{i} = H(subG_{i})$, defined in Equation (16) [47], and the fitness function for the global graph, $f_{glob} = f_{1} * f_{2} * f_{3}$, introduced in [4]. In the coevolution part, the two fitness functions support each other to identify the optimal set of edges.
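The motif weight mentioned above, the number of triangles containing an edge, is simple to compute from adjacency sets. The following is a minimal sketch that assumes an undirected simple graph without self-loops. The function name is hypothetical, and neither the permanence-based ranking nor the fitness functions are reproduced here.

```python
def motif_weights(edges):
    """Map each edge (u, v) to the number of triangles that contain it."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Every common neighbour of u and v closes one triangle on the edge (u, v).
    return {(u, v): len(adj[u] & adj[v]) for u, v in edges}

# A triangle 0-1-2 with a pendant node 3 attached to node 2.
print(motif_weights([(0, 1), (1, 2), (0, 2), (2, 3)]))
```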

Another genetic algorithm-based method for global attack is EPCG [71], which employs two fitness functions: one based on NMI and one based on the difference between an individual and the best individual, which preserves population diversity. To better conceal the attack, the number of edge additions is kept equal to the number of edge deletions. It adopts a co-evolution mechanism with two elite populations to improve evolution.

The study [72] combines a graph auto-encoder (GAE) and a genetic algorithm to deceive community detection algorithms. Initially, small subnetworks are sampled and reconstructed using the GAE with an added community hiding constraint in the loss function. Subsequently, a genetic algorithm is applied to improve the hiding effect across the network, treating each reconstructed subnetwork as an individual. The fitness function and operators of the genetic algorithm follow those in Chen et al. [2], with elitism retaining the top 15% of individuals.
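The elitism step mentioned above can be sketched as follows. This is an illustrative sketch only: it assumes that a higher fitness value is better and rounds the elite count up, neither of which is stated in the text. The function and parameter names are hypothetical.

```python
import math

def select_elites(population, fitness, elite_fraction=0.15):
    """Return the best individuals that are carried over unchanged."""
    # Assumption: round up and always keep at least one individual.
    n_elite = max(1, math.ceil(elite_fraction * len(population)))
    ranked = sorted(population, key=fitness, reverse=True)
    return ranked[:n_elite]

# Toy population of ten individuals whose fitness is their own value.
print(select_elites(list(range(10)), fitness=lambda x: x))
```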

A heuristic approach based on local structures, namely LSHA, is proposed in [69]. It does not depend on the community structure produced by a particular community detection algorithm. Instead, local structures are detected using the local information of nodes, and a community can contain multiple local structures. For the attack, LSHA chooses two local structures (a low-degree one and a high-degree one) using the modularity concept. Then, to choose the edge in the low-degree structure, an edge vulnerability metric is proposed, which measures the significance of the edge in preserving the structure. It is defined as

Vul(e) = \frac{|N(v_{i})| - |N(v_{i}) \cap N_{LS}(v_{i})| + |N(v_{j})| - |N(v_{j}) \cap N_{LS}(v_{j})|}{|N(v_{i})| + |N(v_{j})|}

(18)

where $e$ is an edge $(v_{i}, v_{j})$, $N(v_{x})$ is the set of neighbors of $v_{x}$, and $N_{LS}(v_{x})$ is the set of nodes within the local structure to which $v_{x}$ belongs. For deletion, an edge with high vulnerability is chosen, and its lower-degree node is kept as the node around which rewiring is performed. For addition, the other node is selected from the high-degree structure according to the node entropy metric, which quantifies the perplexity level of a node with respect to the partition. For a node $v$, it is defined as

Ent(v) = - \sum_{i=1}^{M} \frac{|LS_{i} \cap N(v)|}{|N(v)|} \log \left( \frac{|LS_{i} \cap N(v)|}{|N(v)|} \right)

(19)

where $LS_{i}$ is the ith local structure that has connections with node $v$ and $M$ is the number of such structures. The entropy increase is calculated for each node, and the node with the largest increase is selected as the node to connect with.
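Both LSHA metrics can be evaluated directly from adjacency sets. The sketch below assumes that local structures are given as disjoint sets of nodes and that the logarithm is natural, since the base is not stated above. The function names and the toy graph are hypothetical, and the detection of local structures itself is not covered.

```python
import math

def vulnerability(vi, vj, adj, structure_of):
    """Eq. (18): share of the endpoints' neighbours outside their own local structure."""
    outside = 0
    for v in (vi, vj):
        outside += len(adj[v]) - len(adj[v] & structure_of[v])
    return outside / (len(adj[vi]) + len(adj[vj]))

def node_entropy(v, adj, structures):
    """Eq. (19): entropy of how v's neighbours are spread over local structures."""
    total = len(adj[v])
    entropy = 0.0
    for members in structures:
        k = len(members & adj[v])
        if k:  # only structures that v is connected to contribute
            p = k / total
            entropy -= p * math.log(p)  # assumption: natural logarithm
    return entropy

# Toy graph: structure {0, 1, 2} is a triangle, structure {3, 4} hangs off node 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0, 4}, 4: {3}}
structures = [{0, 1, 2}, {3, 4}]
structure_of = {v: s for s in structures for v in s}

print(vulnerability(0, 3, adj, structure_of))  # edge between the two structures
print(vulnerability(1, 2, adj, structure_of))  # edge inside the triangle
print(node_entropy(0, adj, structures))
```

In this example the edge (1, 2) has vulnerability 0 because all neighbours of both endpoints lie inside their own structure, whereas node 0 has non-zero entropy because its neighbours are split across two structures.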

Table 4 summarizes and compares global attack algorithms. For each study, the table specifies the reference, the name(s) of the global attack algorithm, the update operation, whether intra-/inter-community edges are used, the knowledge needed (if any prior information is required), and the measures used to compare the attack algorithm with the algorithms in the comparison column. The last column shows the short names of the community detection algorithms used in the study. The table shows that in the REM algorithm, modification is performed solely through the addition of edges, while other algorithms, except for the study [67], employ both edge addition and removal to alter the network structure. Rewiring is applied in heuristic approaches, such as DBA, CDA, LSHA, DFP-R, and Custom Rewiring [49]. Among the genetic algorithm-based methods, some (Q-Attack and EPA) incorporate rewiring, whereas others (EDA, CGN, SAEP, CoeCo, EPCG, and GAE+Genetic alg [72]) do not, on the assumption that, with rewiring, the genetic algorithm might not achieve its full potential efficiency [3]. Heuristic methods, with the exception of [67], leverage intra-community and inter-community information (intra-community edge deletion and inter-community edge addition) to guide their structural adjustments. While some genetic algorithm-based methods (Q-Attack, EPA, EDA, EPCG, and GAE+Genetic alg) do not utilize intra/inter-community information, recent methods like CGN, SAEP, and CoeCo have integrated it to enhance their effectiveness. Moreover, most global attack methods rely on a known community structure, though some methods (EDA, GAE+Genetic alg, and LSHA) are designed to function without prior knowledge of the communities, which broadens their applicability. For evaluating the performance of the attack algorithms, NMI is the most frequently used measure, followed by ARI.

4. Discussion and Future Directions

Community detection provides many benefits by uncovering structural insights within networks, but it also presents privacy concerns. Hence, the community detection attack problem, which is investigated across three distinct scales, has gained prominence. When the applicability of the attack at each scale is considered, the following conclusions can be drawn:

A target node attack aims to prevent a target node within a network from being identified as part of a specific community. This is achieved by allowing the node to strategically alter its connections to other nodes in the network. Hence, an attack tool can be designed to provide personalized recommendations for escaping community detection. For example, the tool might suggest to a user of a social network, “If you remove your connections with users X and Y, your membership in community Z will no longer be detectable” [41]. Nevertheless, concealing a specific target node may be challenging for an individual because of limited resources and a lack of influence over other participants. Moreover, if an attack method depends on knowledge of the entire network, its practical application becomes significantly limited in real-world scenarios.

A specific community can act collaboratively to hide its collective existence. In social media platforms like Facebook or Twitter, target community attacks can be implemented through specific actions [6] offered by an attack tool. On Facebook, intra-community edge deletions can be achieved by ‘unfriending’ some members of the target community, whereas inter-community deletions involve breaking ties with individuals outside the community. On Twitter, these actions can be achieved by ‘unfollowing’ the appropriate members inside or outside the community. When it comes to adding connections, intra-community edge additions are straightforward on Facebook, requiring only sending and accepting friend requests. Conversely, adding inter-community edges, which involves identifying new members, can be performed by connecting with colleagues, classmates, or even random people. On Twitter, this process is performed through the act of ‘following’ other users. However, an attack algorithm might require not only knowledge of the community members and their connections but also the community structure discovered from the entire network. Relying on network-wide knowledge could constrain the practical applicability of the algorithm.

Attacking at a global scale might be difficult due to the structural changes required throughout the network. Assuming that all nodes within a network can collaborate to manipulate edges, either by deleting or adding them, to avoid detection appears to be less realistic [61]. On the other hand, some systems, such as recommender systems, depend on community detection results, and these systems are affected when a general perturbation is made in the network. Therefore, the development of robust detection algorithms is important for such dependent systems. Furthermore, community detection attacks can serve as a robust evaluation index for assessing and comparing the resilience of community detection algorithms against adversarial perturbations [50]. The work [75], which investigates robust community detection, reports experiments conducted on networks with adversarial noise produced by certain attacks.

Designing a generic attack algorithm capable of deceiving all community detection methods may be quite challenging since there is no universally accepted or standardized definition of a community. As illustrated in Table 1, Table 2, and Table 4 in Section 3, most attack algorithms are tailored to mislead multiple detection algorithms (e.g., modularity optimization-based, spectral-based, random walk-based, label propagation, etc.). Typically, attacks are designed to target disjoint community detection algorithms, though the study [72] indicates the efficacy of the proposed method against an overlapping algorithm [32] as well. At this point, it is important to mention the transferability of attack algorithms. Since some attack methods rely on the community structure that a specific community detection algorithm generates from a network, their success in deceiving other detection algorithms should also be assessed. Attack methods with high transferability can effectively protect individuals’ privacy against community detection in practical scenarios.

Different methods, such as genetic algorithms, heuristic approaches, greedy algorithms, reinforcement learning, etc., have been proposed to provide solutions to the problem. Genetic algorithms are promising because they achieve good attack effects. However, their computational requirements may limit their applicability to large graphs. Heuristic algorithms may need less computation but often sacrifice optimality for efficiency. Reinforcement learning methods can yield effective results, but applying them to large graphs may become a concern.

There are still many unresolved issues in this area. Some potential research directions are explained below:

Incorporating additional knowledge: Community detection algorithms used in the domain of community detection attacks are mainly restricted to network topology. Nevertheless, real-world networks frequently involve attribute data. Attribute networks combine user attributes with topological data. Future research can explore attack techniques against detection algorithms that consider attribute networks.
Overlapping communities: Most studies on community detection attacks have concentrated on disjoint community detection algorithms. However, overlapping community detection allows for more flexible and realistic modeling of systems. Although a few techniques [76,77] have been introduced to hide target nodes in overlapping areas, attacks at the global scale or target community scale against overlapping community detection algorithms have not yet been investigated. Target nodes that are not in overlapping areas can also be attacked. Exploring such attacks is an important research direction.
Balancing efficiency and quality: The trade-off between efficiency and solution quality poses a significant limitation for attack algorithms. Achieving the balance between efficiency and solution quality continues to be a critical challenge in progressing this field. Future research should focus on designing new advanced algorithms that offer higher-quality solutions.
Deception-aware detection: Since community detection algorithms can be misled by strategic manipulation, such as the removal of real links or the introduction of fake ones, deception-aware algorithms can be developed to predict missing links and identify deceptively added links. This can be accomplished by utilizing the network history (containing different snapshots of the network). For instance, analyzing modification of the community structure or detecting anomalies (like a tightly connected community becoming loosely connected after a while) can reveal significant insights.
Large graphs: Attack algorithms may struggle with scalability issues when applied to large graphs. Among the attack algorithms proposed in the literature, only a limited number have been tested on moderate/large graphs [6,44,47,72]. Implementing global attacks on these graphs, in particular, may be more challenging. Future work in this area will likely involve designing scalable algorithms to handle large-scale graphs.
Different network types: An intriguing direction for future exploration lies in applying community detection attack algorithms to different network types like heterogeneous networks, dynamic networks, and multilayer networks.
Real applications: Adapting attack algorithms to real-life problems is likely to inspire further research, as it may reveal additional problems.

5. Conclusions

Community detection attacks are adversarial attacks that impair the performance of community detection algorithms by imperceptibly modifying the network structure. They have different applications, such as in preventing the detection of community membership information when it is confidential or sensitive or developing more robust community detection algorithms. These attacks operate at three scales: target node, target community, and global, with each scale having a different level of concealment objective. A diverse range of community detection attack algorithms has been offered to achieve network perturbation at these scales. Existing reviews limit their scope to target community attacks, but this survey extends the analysis to encompass community detection attacks across all three scales, offering a more integrated perspective. Additionally, it presents a thorough overview of the performance measures utilized to assess attack algorithms at these three scales, providing valuable guidance for selecting appropriate measures when comparing different algorithms. Moreover, analyzing the community detection algorithms targeted by attacks is crucial for understanding the ones that are commonly employed in this field. This study concludes with a compilation of suggestions for future research opportunities. Ultimately, this study serves as an important resource for researchers to quickly grasp community detection attacks.

Author Contributions

Conceptualization, L.T. and B.E.B.; Methodology, L.T. and B.E.B.; Validation, L.T. and B.E.B.; Formal analysis, L.T. and B.E.B.; Investigation, L.T. and B.E.B.; Writing—original draft preparation, L.T.; Writing—review and editing, L.T. and B.E.B.; Visualization, L.T.; Supervision, B.E.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Javed, M.A.; Younis, M.S.; Latif, S.; Qadir, J.; Baig, A. Community detection in networks: A multidisciplinary review. J. Netw. Comput. Appl. 2018, 108, 87–111.
2. Chen, J.; Chen, L.; Chen, Y.; Zhao, M.; Yu, S.; Xuan, Q.; Yang, X. GA-based Q-attack on community detection. IEEE Trans. Comput. Soc. Syst. 2019, 6, 491–503.
3. Liu, D.; Chang, Z.; Yang, G.; Chen, E. Hiding ourselves from community detection through genetic algorithms. Inf. Sci. 2022, 614, 123–137.
4. Zhao, J.; Wang, Z.; Cao, J.; Cheong, K.H. A self-adaptive evolutionary deception framework for community structure. IEEE Trans. Syst. Man Cybern. Syst. 2023, 53, 4954–4967.
5. Waniek, M.; Michalak, T.P.; Wooldridge, M.J.; Rahwan, T. Hiding individuals and communities in a social network. Nat. Hum. Behav. 2018, 2, 139–147.
6. Fionda, V.; Pirrò, G. Community deception or: How to stop fearing community detection algorithms. IEEE Trans. Knowl. Data Eng. 2017, 30, 660–673.
7. Chen, J.; Chen, Y.; Chen, L.; Zhao, M.; Xuan, Q. Multiscale evolutionary perturbation attack on community detection. IEEE Trans. Comput. Soc. Syst. 2020, 8, 62–75.
8. Fionda, V.; Pirrò, G. Community deception in networks: Where we are and where we should go. In Proceedings of the International Conference on Complex Networks and Their Applications, Madrid, Spain, 30 November–2 December 2021; Springer: Berlin/Heidelberg, Germany, 2021; pp. 144–155.
9. Kalaichelvi, N.; Easwarakumar, K. A comprehensive survey on community deception approaches in social networks. In Proceedings of the International Conference on Computer, Communication, and Signal Processing, Chennai, India, 24–25 February 2022; Springer: Berlin/Heidelberg, Germany, 2022; pp. 163–173.
10. Fortunato, S. Community detection in graphs. Phys. Rep. 2010, 486, 75–174.
11. Fortunato, S.; Hric, D. Community detection in networks: A user guide. Phys. Rep. 2016, 659, 1–44.
12. MacQueen, J. Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability; University of California Press: Berkeley, CA, USA, 1967; Volume 1, pp. 281–297.
13. Hlaoui, A.; Wang, S. A direct approach to graph clustering. Neural Netw. Comput. Intell. 2004, 4, 158–163.
14. Kernighan, B.W.; Lin, S. An efficient heuristic procedure for partitioning graphs. Bell Syst. Tech. J. 1970, 49, 291–307.
15. Girvan, M.; Newman, M.E. Community structure in social and biological networks. Proc. Natl. Acad. Sci. USA 2002, 99, 7821–7826.
16. Newman, M.E. Finding community structure in networks using the eigenvectors of matrices. Phys. Rev. E Stat. Nonlinear Soft Matter Phys. 2006, 74, 036104.
17. Newman, M.E. Spectral methods for community detection and graph partitioning. Phys. Rev. E Stat. Nonlinear Soft Matter Phys. 2013, 88, 042822.
18. Higham, D.J.; Kalna, G.; Kibble, M. Spectral clustering and its use in bioinformatics. J. Comput. Appl. Math. 2007, 204, 25–37.
19. Ruan, J.; Zhang, W. An efficient spectral algorithm for network community discovery and its applications to biological and social networks. In Proceedings of the Seventh IEEE International Conference on Data Mining (ICDM 2007), Omaha, NE, USA, 28–31 October 2007; pp. 643–648.
20. Brandes, U.; Delling, D.; Gaertler, M.; Gorke, R.; Hoefer, M.; Nikoloski, Z.; Wagner, D. On modularity clustering. IEEE Trans. Knowl. Data Eng. 2007, 20, 172–188.
21. Chen, M.; Kuzmin, K.; Szymanski, B.K. Community detection via maximization of modularity and its variants. IEEE Trans. Comput. Soc. Syst. 2014, 1, 46–65.
22. Newman, M.E. Fast algorithm for detecting community structure in networks. Phys. Rev. E Stat. Nonlinear Soft Matter Phys. 2004, 69, 066133.
23. Clauset, A.; Newman, M.E.; Moore, C. Finding community structure in very large networks. Phys. Rev. E 2004, 70, 066111.
24. Newman, M.E. Modularity and community structure in networks. Proc. Natl. Acad. Sci. USA 2006, 103, 8577–8582.
25. Blondel, V.D.; Guillaume, J.L.; Lambiotte, R.; Lefebvre, E. Fast unfolding of communities in large networks. J. Stat. Mech. Theory Exp. 2008, 2008, P10008.
26. Traag, V.A.; Waltman, L.; Van Eck, N.J. From Louvain to Leiden: Guaranteeing well-connected communities. Sci. Rep. 2019, 9, 1–12.
27. Sobolevsky, S.; Campari, R.; Belyi, A.; Ratti, C. General optimization technique for high-quality community detection in complex networks. Phys. Rev. E 2014, 90, 012811.
28. Pons, P.; Latapy, M. Computing communities in large networks using random walks. In Proceedings of the Computer and Information Sciences-ISCIS 2005: 20th International Symposium, Istanbul, Turkey, 26–28 October 2005; Springer: Berlin/Heidelberg, Germany, 2005; pp. 284–293.
29. Rosvall, M.; Bergstrom, C.T. Maps of random walks on complex networks reveal community structure. Proc. Natl. Acad. Sci. USA 2008, 105, 1118–1123.
30. Reichardt, J.; Bornholdt, S. Statistical mechanics of community detection. Phys. Rev. E Stat. Nonlinear Soft Matter Phys. 2006, 74, 016110.
31. Raghavan, U.N.; Albert, R.; Kumara, S. Near linear time algorithm to detect community structures in large-scale networks. Phys. Rev. E 2007, 76, 036106.
32. Palla, G.; Derényi, I.; Farkas, I.; Vicsek, T. Uncovering the overlapping community structure of complex networks in nature and society. Nature 2005, 435, 814–818.
33. Prat-Pérez, A.; Dominguez-Sal, D.; Larriba-Pey, J.L. High quality, scalable and parallel community detection for large real graphs. In Proceedings of the 23rd International Conference on World Wide Web, Seoul, Republic of Korea, 7–11 April 2014; pp. 225–236.
34. Fazlali, M.; Moradi, E.; Malazi, H.T. Adaptive parallel Louvain community detection on a multicore platform. Microprocess. Microsyst. 2017, 54, 26–34.
35. Al-Andoli, M.N.; Tan, S.C.; Cheah, W.P.; Tan, S.Y. A review on community detection in large complex networks from conventional to deep learning methods: A call for the use of parallel meta-heuristic algorithms. IEEE Access 2021, 9, 96501–96527.
36. Newman, M.E.; Girvan, M. Finding and evaluating community structure in networks. Phys. Rev. E 2004, 69, 026113.
37. Newman, M.E. Analysis of weighted networks. Phys. Rev. E 2004, 70, 056131.
38. Danon, L.; Diaz-Guilera, A.; Duch, J.; Arenas, A. Comparing community structure identification. J. Stat. Mech. Theory Exp. 2005, 2005, P09008.
39. Hubert, L.; Arabie, P. Comparing partitions. J. Classif. 1985, 2, 193–218.
40. Chen, J.; Wu, Y.; Xu, X.; Chen, Y.; Zheng, H.; Xuan, Q. Fast gradient attack on network embedding. arXiv 2018, arXiv:1809.02797.
41. Bernini, A.; Silvestri, F.; Tolomei, G. Community Membership Hiding as Counterfactual Graph Search via Deep Reinforcement Learning. arXiv 2023, arXiv:2310.08909.
42. Liu, D.; Jia, R.; Liu, X.; Zhang, W. A unified framework of community hiding using symmetric nonnegative matrix factorization. Inf. Sci. 2024, 663, 120235.
43. Nagaraja, S. The impact of unlinkability on adversarial community detection: Effects and countermeasures. In Proceedings of the International Symposium on Privacy Enhancing Technologies Symposium, Berlin, Germany, 21–23 July 2010; Springer: Berlin/Heidelberg, Germany, 2010; pp. 253–272.
44. Mittal, S.; Sengupta, D.; Chakraborty, T. Hide and seek: Outwitting community detection algorithms. IEEE Trans. Comput. Soc. Syst. 2021, 8, 799–808.
45. Meilă, M. Comparing clusterings by the variation of information. In Proceedings of the Learning Theory and Kernel Machines: 16th Annual Conference on Learning Theory and 7th Kernel Workshop, COLT/Kernel 2003, Washington, DC, USA, 24–27 August 2003; Springer: Berlin/Heidelberg, Germany, 2003; pp. 173–187.
46. Van Dongen, S. Performance criteria for graph clustering and Markov cluster experiments. In Report-Information Systems; Centrum Voor Wiskunde en Informatica: Amsterdam, The Netherlands, 2000; pp. 1–36.
47. Liu, Y.; Liu, J.; Zhang, Z.; Zhu, L.; Li, A. REM: From structural entropy to community structure deception. Adv. Neural Inf. Process. Syst. 2019, 32, 12918–12928.
48. Liu, X.; Fu, L.; Wang, X.; Hopcroft, J.E. Prohico: A probabilistic framework to hide communities in large networks. In Proceedings of the IEEE INFOCOM 2021-IEEE Conference on Computer Communications, Vancouver, BC, Canada, 10–13 May 2021; pp. 1–10.
49. Kumari, S.; Yadav, R.J.; Namasudra, S.; Hsu, C.H. Intelligent deception techniques against adversarial attack on the industrial system. Int. J. Intell. Syst. 2021, 36, 2412–2437.
50. Liu, D.; Chang, Z.; Yang, G.; Chen, E. Community hiding using a graph autoencoder. Knowl.-Based Syst. 2022, 253, 109495.
51. Zachary, W.W. An information flow model for conflict and fission in small groups. J. Anthropol. Res. 1977, 33, 452–473.
52. Mnih, V.; Badia, A.P.; Mirza, M.; Graves, A.; Lillicrap, T.; Harley, T.; Silver, D.; Kavukcuoglu, K. Asynchronous methods for deep reinforcement learning. In Proceedings of the International Conference on Machine Learning. PMLR, New York, NY, USA, 19–24 June 2016; pp. 1928–1937.
53. Fionda, V.; Pirrò, G. Community deception in weighted networks. In Proceedings of the 2021 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, Virtual Event, The Netherlands, 8–11 November 2021; pp. 278–282.
54. Fionda, V.; Madi, S.A.; Pirrò, G. Community deception: From undirected to directed networks. Soc. Netw. Anal. Min. 2022, 12, 74.
55. Fionda, V.; Pirrò, G. Community deception in attributed networks. IEEE Trans. Comput. Soc. Syst. 2022, 11, 228–237.
56. Madi, S.A.; Pirrò, G. Community deception in directed influence networks. Soc. Netw. Anal. Min. 2023, 13, 122.
57. Chen, X.; Jiang, Z.; Li, H.; Ma, J.; Yu, P.S. Community hiding by link perturbation in social networks. IEEE Trans. Comput. Soc. Syst. 2021, 8, 704–715.
58. Chakraborty, T.; Srinivasan, S.; Ganguly, N.; Mukherjee, A.; Bhowmick, S. Permanence and community structure in complex networks. ACM Trans. Knowl. Discov. Data (TKDD) 2016, 11, 1–34.
59. Nallusamy, K.; Easwarakumar, K. PERMDEC: Community deception in weighted networks using permanence. Computing 2024, 106, 353–370.
60. Zhang, C.; Fu, L.; Ding, J.; Cao, X.; Long, F.; Wang, X.; Zhou, L.; Zhang, J.; Zhou, C. Community Deception in Large Networks: Through the Lens of Laplacian Spectrum. IEEE Trans. Comput. Soc. Syst. 2023, 11, 2057–2069.
61. Madi, S.A.; Pirrò, G. Node-Centric Community Deception Based on Safeness. IEEE Trans. Comput. Soc. Syst. 2023, 11, 2955–2965.
62. Pirrò, G. Community Deception from a Node-Centric Perspective. IEEE Trans. Netw. Sci. Eng. 2023, 11, 969–981.
63. Chang, Z.; Liang, J.; Ma, S.; Liu, D. Community Hiding: Completely Escape from Community Detection. Inf. Sci. 2024, 672, 120665.
64. Zhao, J.; Wang, Z.; Yu, D.; Cao, J.; Cheong, K.H. Swarm intelligence for protecting sensitive identities in complex networks. Chaos Solitons Fractals 2024, 182, 114831.
65. Ye, F.; Chen, C.; Zheng, Z. Deep autoencoder-like nonnegative matrix factorization for community detection. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, Torino, Italy, 22–26 October 2018; pp. 1393–1402.
66. Yu, S.; Zheng, J.; Chen, J.; Xuan, Q.; Zhang, Q. Unsupervised euclidean distance attack on network embedding. In Proceedings of the 2020 IEEE Fifth International Conference on Data Science in Cyberspace (DSC), Hong Kong, China, 27–29 July 2020; pp. 71–77.
67. Magelinski, T.; Bartulovic, M.; Carley, K.M. Measuring node contribution to community structure with modularity vitality. IEEE Trans. Netw. Sci. Eng. 2021, 8, 707–723.
68. Kipf, T.N.; Welling, M. Variational graph auto-encoders. arXiv 2016, arXiv:1611.07308.
69. Yang, H.; Chen, L.; Cheng, F.; Qiu, J.; Zhang, L. LSHA: A Local Structure-Based Community Detection Attack Heuristic Approach. IEEE Trans. Comput. Soc. Syst. 2023, 11, 2966–2978.
70. Zhao, J.; Cheong, K.H. Obfuscating community structure in complex network with evolutionary divide-and-conquer strategy. IEEE Trans. Evol. Comput. 2023, 27, 1926–1940.
71. Yang, S.; Chen, B.; Zhu, G. EPCG: An Elite Population Co-evolutionary Genetic Algorithm for Global Community Deception. In Proceedings of the 7th International Conference on Control Engineering and Artificial Intelligence, Sanya, China, 28–30 January 2023; pp. 66–71.
72. Wang, X.; Li, J.; Guan, Y.; Yuan, J.; Tao, H.; Zhang, S. Enhancing Community Deception based on Graph Autoencoder and Genetic Algorithm. In Proceedings of the 2023 IEEE 9th International Conference on Computer and Communications (ICCC), Chengdu, China, 8–11 December 2023; pp. 742–746.
73. Grover, A.; Leskovec, J. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 855–864.
74. Perozzi, B.; Al-Rfou, R.; Skiena, S. Deepwalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 24–27 August 2014; pp. 701–710.
75. Zhou, J.; Chen, Z.; Du, M.; Chen, L.; Yu, S.; Chen, G.; Xuan, Q. RobustECD: Enhancement of network structure for robust community detection. IEEE Trans. Knowl. Data Eng. 2021, 35, 842–856.
76. Yang, G.; Wang, Y.; Chang, Z.; Liu, D. Overlapping Community Hiding Method Based on Multi-Level Neighborhood Information. Symmetry 2022, 14, 2328.
77. Liu, D.; Yang, G.; Wang, Y.; Jin, H.; Chen, E. How to protect ourselves from overlapping community detection in social networks. IEEE Trans. Big Data 2022, 8, 894–904.

Figure 1. An overview of community detection attack.

Figure 2. Illustration of a target node attack. (a) Community structure of the original graph with the target node border highlighted in red. (b) Community structure identified after applying a target node attack.

Figure 3. Illustration of target community attack. (a) Community structure of the original graph with the borders of target community nodes highlighted in red. (b) Community structure after applying a target community attack.

Figure 4. A toy example illustrating the calculation of permanence of two nodes.

Figure 5. Illustration of a global attack. (a) Community structure of the original graph. (b) Community structure after applying a global attack.

Figure 6. The objective of global community obfuscation is demonstrated: (a) original community structure; (b) community dispersion occurring by dividing the original community structure; (c) optimal case of community obfuscation, with nodes of each community spread among various communities [4].

Table 1. Summary of target node attacks.

Ref.	Community Detection Attack	Update	Intra/ Inter	Knowledge Needed	Measure	Comparison	Community Detection Algorithm (**)
[40]	FGA	EDel, EAdd	✗	Network	Success rate, AML	Random, Nettack, DICE	emb + km
[7]	EPA	EAdd	✗	Network	Percent. degree increase	Random	gre, inf, lou, wal, eig, spi
[41]	DRL-Agent	EDel, EAdd	✓	Network, CS	Success rate, NMI	Random, Degree, ROAM	opt, lou, wal
[42]	CH-SNMF	EDel, EAdd (Rewire)	✓	Network, community number	Comm. retention prob.	ROAM	gre, inf, lou, lab

(**): Greedy [22] (gre), Infomap [29] (inf), Louvain [25] (lou), Walktrap [28] (wal), Eigenvectors [16] (eig), SpinGlass [30] (spi), Label Propagation [31] (lab), Optimal [20] (opt), Embedding (emb), K-means [12] (km).

Table 2. Summary of target community attacks.

Ref.	Community Detection Attack	Update	Intra/ Inter	Knowledge Needed	Measure	Comparison	Community Detection Algorithm (**)
[43]	Naga.	EAdd	✓	C’s members links, Vertex centralities	Miss ratio	No	smo
[5]	DICE	EDel, EAdd	✓	C’s members links	M	No	cnm, inf, lou, wal, eig, spi, btw
[6]	Ds, Dm	EDel, EAdd	✓	C’s members links for Ds, CS for Dm	H, NMI	DICE	cnm, inf, lou, wal, eig, spi, btw, lab, opt, scd
[7]	EPA	EDel, EAdd (Rewire)	✗	Network, A part of CS	Fitness, H	DICE, Ds	gre, inf, lou, wal, eig, spi
[53]	SECRETORUM	EDel, EAdd	✓	C’s members links	H, NMI	Random, DICE, Ds, NEURAL	cnm, inf, lou, wal, lab
[57]	Hs	EDel, EAdd	✓	C’s members links	H	Ds, Dm	inf, lou, eig, spi, lab
[44]	NEURAL	EDel, EAdd	✓	Node info for a subset of nodes	NMI, MNMI, CommS, CommU	Random, Naga., DICE, Ds	cnm, inf, lou, wal, eig, lab
[48]	ProHiCo (SBM, DCSBM)	EDel, EAdd	✓	Network, CS	Jaccard, Recall, Precision, NMI	Ds, REM	cnm, inf, wal, lab, lei
[60]	ComDeceptor	EDel, EAdd	✓	Network, CS	Jaccard, Recall, Precision, NMI	Ds, REM, DCSBM	cnm, inf, lou, eig, lei, danmf
[41]	DRL-Agent	EDel, EAdd	✓	Network, CS	H, NMI	Ds, Dm	opt, lou, wal
[61]	nSAF	EDel, EAdd, NDel, NAdd	✗	C’s members links	H, NMI	Random, DICE, Ds, Dm, NEURAL	cnm, inf, lou, eig, lab, scd, lei, cmb, spec, kcut
[62]	nDec	EDel, EAdd, NDel, NAdd, NMov	✗	Network, CS	H, MNMI, NMI	Random, DICE, Ds, Dm, NEURAL	cnm, inf, lou, eig, lab, scd, lei, cmb, spec, kcut
[42]	CH-SNMF	EDel, EAdd (Rewire)	✓	Network, Community number	H, M	DICE, Ds	gre, inf, lou, lab
[63]	CEHA, CDHA, CCHA	EDel, EAdd	✓	Network, CS	NMI, Q, M	Random, DICE, Ds, NEURAL	cnm, inf, lou, lab
[64]	SCP	EDel, EAdd	✓	Network, CS	NMI, VI, SJD, Local	Random, DICE, Ds, NEURAL, MOD	cnm, inf, lou, wal, eig, btw

(**): Spectral Modularity Opt. [24] (smo), Greedy [22] (gre), Clauset–Newman–Moore [23] (cnm), Infomap [29] (inf), Louvain [25] (lou), Walktrap [28] (wal), Eigenvectors [16] (eig), SpinGlass [30] (spi), Edge-Betweenness [15] (btw), Label Propagation [31] (lab), Optimal [20] (opt), Scalable Community Detection [33] (scd), Leiden [26] (lei), Combo [27] (cmb), Spectral [18] (spec), Knowledge cut [19] (kcut), Deep Autoencoder-like NMF [65] (danmf).

Table 3. An individual representation.

$(19, 38)$	$(29, 21)$	$(25, 33)$	$(20, 15)$
$(19, 13)$	$(29, 14)$	$(25, 28)$	$(20, 5)$

Table 4. Summary of global attacks.

Ref.	Community Detection Attack	Update	Intra/ Inter	Knowledge Needed	Measure	Comparison	Community Detection Algorithm (**)
[2]	DBA, CDA	EDel, EAdd (Rewire)	✓	Network, CS	Q, NMI	Random-R	gre, inf, lou, eig, lab, n2v, km
	Q-Attack		✗	Network, CS, Q
[47]	REM	EAdd	✓	Network, CS	Jaccard, NMI, Recall	Modularity Min. (MOM), Random-Add	cnm, inf, lou, wal, spi, btw
[7]	EPA	EDel, EAdd (Rewire)	✗	Network, CS	NMI, ARI	Q-Attack, EPA with H, two heuristics	gre, inf, lou, wal, eig, spi
[66]	EDA	EDel, EAdd	✗	Network	NMI	Random, DICE, RLS, DBA	dpw + km
[49]	MPL, Custom Rewiring	EDel, EAdd (Rewire)	✓	Network, CS	Node-centric, NMI, CCs	Ds	gre, cnm, lou, lab, cpm, bis
[67]	Modularity Vitality	EDel, NDel	✗	Network, CS	Modularity Minimization	No	lei
[50]	GCH	EDel, EAdd	✓	Network, CS	NMI, Attack Effic.	Random, DICE, Ds	cnm, inf, lou, wal, eig, spi, btw, lab
[3]	CGN	EDel, EAdd	✓	Network, CS	Q, NMI, BN ( $β$ * NMI)	Random, DICE, Ds, Q-Attack, NEURAL	cnm, inf, lou, lab
[4]	SAEP, DFP	EDel, EAdd	✓	Network, CS	NMI, VI, SJD	Random, REM, Q-Attack, DFP	cnm, inf, lou, wal, eig, btw
[70]	CoeCo	EDel, EAdd	✓	Network, CS	NMI, ARI	Random, REM, Q-Attack, DFP	cnm, inf, lou, wal, eig, btw
[71]	EPCG	EDel, EAdd ($|E^{+}| = |E^{-}|$)	✗	Network, CS	NMI, ARI, Purity	Random, CDA, Q-Attack	cnm, lou, wal
[72]	GAE + Genetic alg	EDel, EAdd	✗	Network	NMI, ARI	Random, Barabasi and Albert, CDA, Q-Attack	lou, lab, cpm
[69]	LSHA	EDel, EAdd (Rewire)	✓	Network	NMI, ARI	Random-R, CDA, DBA, DFP-R	cnm, inf, lou, wal, lab
[42]	CH-SNMF	EDel, EAdd (Rewire)	✓	Network, Community number	NMI, ARI	Random, Q-Attack	gre, inf, lou, lab

(**): Greedy [22] (gre), Clauset–Newman–Moore [23] (cnm), Infomap [29] (inf), Louvain [25] (lou), Walktrap [28] (wal), Eigenvectors [16] (eig), SpinGlass [30] (spi), Edge-Betweenness [15] (btw), Label Propagation [31] (lab), Optimal [20] (opt), Node2vec [73] (n2v), K-means [12] (km), DeepWalk [74] (dpw), Clique Percolation Method [32] (cpm), Bisection [14] (bis), Leiden [26] (lei).
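
NMI appears in the Measure column of many rows in Tables 1, 2, and 4. As a rough illustration of how such a score can be computed from the community labels found before and after an attack, the following is a minimal sketch. It assumes the common normalization NMI = 2·I(A;B) / (H(A) + H(B)); individual surveyed papers may use a different variant. The function name `nmi` and the toy label lists are hypothetical and serve only as an example.

```python
from collections import Counter
from math import log


def nmi(labels_a, labels_b):
    """Normalized mutual information between two partitions of the same nodes.

    Assumed normalization: NMI = 2 * I(A;B) / (H(A) + H(B)).
    labels_a[i] and labels_b[i] are the community labels of node i.
    """
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("label lists must be non-empty and of equal length")

    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))

    # Shannon entropies of the two partitions.
    h_a = -sum(c / n * log(c / n) for c in count_a.values())
    h_b = -sum(c / n * log(c / n) for c in count_b.values())

    # Mutual information from the joint label counts.
    mi = sum(
        c / n * log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )

    if h_a + h_b == 0:
        # Both partitions place every node in a single community.
        return 1.0
    return 2 * mi / (h_a + h_b)


# Hypothetical labels: detected communities before and after an attack.
before = [0, 0, 0, 1, 1, 1]
after = [0, 0, 1, 1, 1, 1]
score = nmi(before, after)
```

A score near 1 means the detected structure is almost unchanged by the attack, while a lower score indicates that the perturbation moved the detected communities further from the original ones.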

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Tekin, L.; Bostanoğlu, B.E. A Qualitative Survey on Community Detection Attack Algorithms. Symmetry 2024, 16, 1272. https://doi.org/10.3390/sym16101272
