Hidden Community Detection
Jialu Bao¹, Kun He²(✉), Xiaodong Xin², Bart Selman³, and John E. Hopcroft³
¹ Department of Computer Science, University of Wisconsin-Madison, Madison, WI 53706, USA
² School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
brooklet60@hust.edu.cn
³ Department of Computer Science, Cornell University, Ithaca, NY 14853, USA
1 Introduction
The community detection problem arises in a wide range of domains, from social network analysis to biological protein-protein interactions, and numerous algorithms have been proposed based on the assumption that nodes in the same community are more likely to connect with each other. While many real-world social networks satisfy this assumption, their communities can overlap in interesting ways: communities based on schools can overlap as students attend different schools; connections of criminal activity often hide behind innocuous social connections; proteins serving multiple functions can belong to multiple functional communities. In all of these networks, communities can have more structure than random overlaps. For example, communities based on schools may be divided into primary school, middle school, high school, college and graduate school layers, where the communities within each layer are approximately disjoint. This observation inspires us to model real-world networks as having multiple layers.
J. Bao: a portion of the work was done while at Cornell University.
© Springer Nature Switzerland AG 2020
J. Chen et al. (Eds.): TAMC 2020, LNCS 12337, pp. 365–376, 2020.
https://doi.org/10.1007/978-3-030-59267-7_31
366 J. Bao et al.
To simulate real-world networks, researchers also build generative models such as the single-layer stochastic block model G(n, n1, p, q) (p > q). It can be seen as the Erdős-Rényi model with communities: G(n, n1, p, q) has n nodes that belong to n1 disjoint blocks/communities (we use the two terms interchangeably in the following), and any node pair internal to a community forms an edge with probability p, while any node pair across two communities forms an edge with probability q. We propose a multi-layer stochastic block model G(n, n1, p1, ..., nL, pL), where each layer l consists of nl disjoint communities, and communities in different layers are independent of each other. Each layer l is associated with one edge probability pl, the probability that a node pair internal to a community in that layer forms an edge. In this idealized abstraction, we assume that each node belongs to exactly one community in each layer and that edges are generated only through this process, i.e., all edges leaving the communities of one layer are generated as internal edges of some other layer. Note that our model is different from the multi-layer stochastic blockmodel proposed by Paul and Chen [6], in which there are different types of edges and each type of edge forms one layer of the network.
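A minimal Python sketch of sampling from the multi-layer model G(n, n1, p1, ..., nL, pL) may make the generative process concrete. Several details are assumptions not fixed by the text: every community in a layer has the same size (so each nl must divide n), each layer's partition is drawn independently as a uniformly random equal-size split, and a node pair receives an edge if at least one layer in which the pair shares a community generates it. The function name `sample_multilayer_sbm` is hypothetical.

```python
import itertools
import random


def sample_multilayer_sbm(n, layers, seed=None):
    """Sample a multi-layer stochastic block model (illustrative sketch).

    layers: list of (n_l, p_l) pairs, one per layer; n_l must divide n.
    Returns (membership, edges): membership[l][v] is the community of node v
    in layer l, and edges is a set of (u, v) pairs with u < v.
    """
    rng = random.Random(seed)
    membership = []
    for n_l, _ in layers:
        if n % n_l:
            raise ValueError("each n_l must divide n in this sketch")
        size = n // n_l
        order = list(range(n))
        rng.shuffle(order)  # independent random equal-size partition per layer
        comm = [0] * n
        for rank, v in enumerate(order):
            comm[v] = rank // size
        membership.append(comm)

    edges = set()
    for comm, (n_l, p_l) in zip(membership, layers):
        blocks = [[] for _ in range(n_l)]
        for v, c in enumerate(comm):
            blocks[c].append(v)  # nodes are appended in increasing order
        for block in blocks:
            # every pair inside a layer-l community gets an edge w.p. p_l;
            # edges generated by different layers are merged (set union)
            for u, v in itertools.combinations(block, 2):
                if rng.random() < p_l:
                    edges.add((u, v))
    return membership, edges
```

For instance, `sample_multilayer_sbm(600, [(15, 0.1), (12, 0.12)], seed=0)` would produce a graph of the size used in the Fig. 2 simulation, reading G(600, 15, 12, 0.1, 0.12) as n1 = 15, n2 = 12, p1 = 0.1, p2 = 0.12.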
He et al. [3,4] first introduced the concept of hidden communities, which has been remarked upon as a new graph-theoretical concept [7]. He et al. proposed the Hidden Community Detection (HICODE) algorithm for networks containing both strong and hidden layers of communities, where each layer consists of a set of disjoint or slightly overlapping communities. A hidden community is a community most of whose nodes also belong to other, stronger communities, as measured by metrics like modularity [2]. They showed through experiments that HICODE uncovers grounded communities with higher accuracy and finds hidden communities in the weak layers. However, they did not provide any theoretical support.
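Because modularity [2] is the quality measure used throughout the analysis, here is a minimal Python sketch of the standard Newman-Girvan definition for an undirected, unweighted graph, Q = Σ_c [e_c/e − (d_c/(2e))²], where e_c is the number of edges inside community c, d_c is the total degree of the nodes in c, and e is the total number of edges. The function name and the input format (an edge list plus a node-to-community mapping) are our own choices.

```python
def modularity(edges, comm):
    """Newman-Girvan modularity of the partition `comm` (node -> community id)
    for an undirected, unweighted simple graph given as (u, v) edge pairs."""
    m = len(edges)
    internal = {}  # e_c: number of edges with both endpoints in community c
    degree = {}    # d_c: total degree of the nodes in community c
    for u, v in edges:
        for c in (comm[u], comm[v]):
            degree[c] = degree.get(c, 0) + 1
        if comm[u] == comm[v]:
            internal[comm[u]] = internal.get(comm[u], 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree.items())


# Two triangles joined by one edge, split into their natural communities
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
comm = [0, 0, 0, 1, 1, 1]
print(round(modularity(edges, comm), 4))  # 0.3571 (= 5/14)
```

Putting every node in a single community gives Q = 0, so positive values indicate more internal edges than the degree-based null model expects.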
In this work, we provide a theoretical analysis that demonstrates the effectiveness of HICODE on two-layer stochastic block models. One important step in the HICODE algorithm is to reduce the strength of a partition once it is found to approximate one layer of communities in the network. Since communities in different layers unavoidably overlap, both the internal edges and the outgoing edges of the remaining layer may be reduced while one layer is being reduced, and it was unclear how the modularity of the remaining layer would change. Through a rigorous analysis of the three layer-weakening methods they suggested, we prove that applying any one of RemoveEdge, ReduceEdge and ReduceWeight to one layer increases the modularity of the grounded partition in the unreduced layer. Thus, we provide evidence that HICODE's layer reduction step makes weak layers more detectable.
Hidden Community Detection on Two-Layer Stochastic Models 367
2 Preliminary
In this section, we first introduce metrics that measure community partition quality. Then, we summarize the important components of HICODE, the iterative meta-approach we analyze, and in particular how it reduces layers of detected communities during the iterations. We also formally define the multi-layer stochastic block model and explain why it is a reasonable abstraction of the generative processes of real-world networks.
of one layer are the result of them being internal edges of some other layers. We
will detail the expected number of outgoing edges and the size of the intersection
block of layers in Lemma 1 in the next section.
For example, in G(200, 4, 5, p1, p2), layer 1 contains four communities C11 = {1, 2, ..., 50}, C12 = {51, 52, ..., 100}, C13 = {101, 102, ..., 150}, C14 = {151, 152, ..., 200}, and layer 2 contains five communities C21 = {1, 6, ..., 196}, C22 = {2, 7, ..., 197}, C23 = {3, 8, ..., 198}, C24 = {4, 9, ..., 199}, C25 = {5, 10, ..., 200}. Each community is modeled as an Erdős-Rényi graph. Each C1i in layer 1 is expected to have about 0.5 · 50² · p1 internal edges, and each C2i in layer 2 is expected to have about 0.5 · 40² · p2 internal edges.
Each community in layer 1 overlaps with each community in layer 2. Each overlap consists of 20% of the nodes of the layer-1 community and 25% of the nodes of the layer-2 community. Figure 1 (a) and (b) show the adjacency matrix when nodes are ordered by [1, ..., n] for layer 1 and by [1, 6, ..., 196, 2, 7, ..., 197, ..., 5, 10, ..., 200] for layer 2, respectively (here we set p1 = 0.12, p2 = 0.10). Fig. 1 (c) and (d) show an enlarged block for each layer. Edges in layer 1 are plotted in red, edges in layer 2 in blue, and edges in both layers in green.
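The example partitions can be reproduced in a few lines of Python. This sketch (function name hypothetical) builds C11, ..., C14 and C21, ..., C25 with nodes numbered 1..200, checks that every layer-1/layer-2 intersection has 10 nodes (20% of 50 and 25% of 40), builds the layer-2 node ordering used for Fig. 1(b), and compares the approximation 0.5 · m² · p with the exact expectation C(m, 2) · p for an Erdős-Rényi community of m nodes. It assumes p1 applies to layer 1 and p2 to layer 2, with the Fig. 1 values p1 = 0.12, p2 = 0.10.

```python
from math import comb


def example_layers(n=200, n1=4, n2=5):
    """Communities of the G(200, 4, 5, p1, p2) example, nodes numbered 1..n."""
    size1 = n // n1
    # Layer 1: contiguous blocks {1..50}, {51..100}, {101..150}, {151..200}
    layer1 = [list(range(k * size1 + 1, (k + 1) * size1 + 1)) for k in range(n1)]
    # Layer 2: nodes sharing a residue mod n2, e.g. {1, 6, ..., 196}
    layer2 = [list(range(k + 1, n + 1, n2)) for k in range(n2)]
    return layer1, layer2


layer1, layer2 = example_layers()
# Size of every intersection C1i ∩ C2j (expected: 10 = 20% of 50 = 25% of 40)
overlap = [[len(set(a) & set(b)) for b in layer2] for a in layer1]
# Node order used for layer 2 in Fig. 1(b): C21, then C22, ..., then C25
order2 = [v for community in layer2 for v in community]

p1, p2 = 0.12, 0.10
approx = (0.5 * 50 ** 2 * p1, 0.5 * 40 ** 2 * p2)  # approximation used in the text
exact = (comb(50, 2) * p1, comb(40, 2) * p2)      # C(m, 2) * p for m nodes
print(overlap[0], order2[:3], order2[-3:])
print(approx, exact)
```

The approximation overestimates slightly (150 vs. 147 expected edges for a layer-1 community, 80 vs. 78 for a layer-2 community), which is negligible for the qualitative picture in Fig. 1.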
Lemma 2. For layer l in a two-layer stochastic block model, if the layer-weakening method (e.g., RemoveEdge, ReduceEdge, ReduceWeight) reduces a larger fraction of outgoing edges than of internal edges, i.e., the expected numbers of internal and outgoing edges after weakening, $e'_{ll}$ and $e'_{l\mathrm{out}}$, satisfy
$$\frac{e'_{l\mathrm{out}}}{e_{l\mathrm{out}}} < \frac{e'_{ll}}{e_{ll}},$$
then the modularity of layer l increases after weakening.
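A short derivation shows why the condition in Lemma 2 suffices, under a simplifying assumption added here (this is a sketch, not necessarily the authors' omitted argument): every community of layer l has exactly $e_{ll}$ internal and $e_{l\mathrm{out}}$ outgoing edges, i.e., expected counts are treated as exact, e denotes the total number of edges, and $e_{ll}, e_{l\mathrm{out}}, e'_{ll} > 0$. Each outgoing edge is counted by the two communities it joins, which gives the expression for e.

```latex
\begin{aligned}
e &= n_l\Bigl(e_{ll} + \tfrac{1}{2}e_{l\mathrm{out}}\Bigr),
  \qquad d_l = 2e_{ll} + e_{l\mathrm{out}} = \frac{2e}{n_l},\\
Q_l &= n_l\left[\frac{e_{ll}}{e} - \left(\frac{d_l}{2e}\right)^2\right]
     = \frac{2e_{ll}}{2e_{ll} + e_{l\mathrm{out}}} - \frac{1}{n_l}
     = \frac{2}{2 + r_l} - \frac{1}{n_l},
  \qquad r_l = \frac{e_{l\mathrm{out}}}{e_{ll}},\\
\frac{e'_{l\mathrm{out}}}{e_{l\mathrm{out}}} < \frac{e'_{ll}}{e_{ll}}
  &\iff r'_l = \frac{e'_{l\mathrm{out}}}{e'_{ll}} < \frac{e_{l\mathrm{out}}}{e_{ll}} = r_l
  \iff Q'_l > Q_l .
\end{aligned}
```

The total edge count e cancels, so only the ratio of outgoing to internal edges matters. When no outgoing edges remain ($r'_l = 0$), the formula gives $Q'_l = 1 - 1/n_l$, which agrees with $n_2(e\cdot e_{22} - e_{22}^2)/e^2$ for $e = n_2 e_{22}$.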
$$e_{11} = \frac{|S_{12}| + |S_1|}{n_1}, \qquad e_{1\mathrm{out}} = \frac{2}{n_1}|S_2|,$$
$$e_{22} = \frac{|S_{12}| + |S_2|}{n_2}, \qquad e_{2\mathrm{out}} = \frac{2}{n_2}|S_1|.$$
We omit the proof due to space limitations.
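The identities above also hold exactly for per-community averages on any sampled graph, which a small Python check makes concrete. Because the definitions of S12, S1 and S2 fall outside this excerpt, the sketch assumes they denote, respectively, the edges internal to both layers, to layer 1 only, and to layer 2 only. The function names and the six-node toy graph are hypothetical.

```python
def edge_classes(edges, layer1, layer2):
    """Split edges into S12 (internal to both layers), S1 (internal to layer 1
    only) and S2 (internal to layer 2 only); an assumed reading of the notation."""
    s12, s1, s2 = [], [], []
    for u, v in edges:
        in1, in2 = layer1[u] == layer1[v], layer2[u] == layer2[v]
        if in1 and in2:
            s12.append((u, v))
        elif in1:
            s1.append((u, v))
        elif in2:
            s2.append((u, v))
        else:
            # the model generates every edge inside some community of some layer
            raise ValueError(f"edge {(u, v)} is internal to no layer")
    return s12, s1, s2


def per_community_counts(edges, comm, n_comm):
    """Average internal and outgoing edge counts per community of one layer.
    An edge between two communities is outgoing for both, so it counts twice."""
    internal = sum(1 for u, v in edges if comm[u] == comm[v])
    outgoing = 2 * (len(edges) - internal)
    return internal / n_comm, outgoing / n_comm


# Hypothetical toy instance with n1 = n2 = 2 communities per layer
layer1 = [0, 0, 0, 1, 1, 1]
layer2 = [0, 1, 0, 1, 0, 1]
edges = [(0, 1), (0, 2), (1, 3), (3, 5), (2, 4), (4, 5)]
n1 = n2 = 2
s12, s1, s2 = edge_classes(edges, layer1, layer2)
e11, e1out = per_community_counts(edges, layer1, n1)
e22, e2out = per_community_counts(edges, layer2, n2)
# e11 = (|S12| + |S1|)/n1, e1out = 2|S2|/n1, and symmetrically for layer 2
assert (e11, e1out) == ((len(s12) + len(s1)) / n1, 2 * len(s2) / n1)
assert (e22, e2out) == ((len(s12) + len(s2)) / n2, 2 * len(s1) / n2)
```

The check relies on the model assumption that every edge is internal to at least one layer: an edge leaving a layer-1 community must then lie in S2, and vice versa.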
Using the above three lemmas, we can prove the following theorems.
Proof. If we remove all internal edges of communities in layer 1, both |S12| and |S1| become 0, so the remaining number of internal edges per layer-2 community is
$$e'_{22} = \frac{1}{n_2}\bigl(0 + |S_2|\bigr) = \frac{|S_2|}{n_2} > 0.$$
No outgoing edges of layer 2 remain, so $e'_{2\mathrm{out}} = 0$. Thus,
$$\frac{e'_{2\mathrm{out}}}{e_{2\mathrm{out}}} = 0 < \frac{e'_{22}}{e_{22}},$$
and applying Lemma 2, the modularity of layer 2 after RemoveEdge on layer 1 satisfies $Q'_2 > Q_2$.
Similarly, the modularity of layer 1 after RemoveEdge on layer 2, $Q'_1$, is greater than $Q_1$.
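$$Q'_2 = \sum_{i \in \text{layer } 2} Q^i_2 = \sum_{i \in \text{layer } 2}\left[\frac{e^i_{22}}{e} - \left(\frac{d^i_2}{2e}\right)^2\right] = \sum_{i \in \text{layer } 2}\frac{4e\cdot e^i_{22} - (2e^i_{22})^2}{4e^2} = n_2\cdot\frac{e\cdot e_{22} - (e_{22})^2}{e^2},$$
where $d^i_2 = 2e^i_{22}$ because no outgoing edges of layer 2 remain.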
For any other partition, we can transform the layer 2 partition into it by moving a series of nodes across communities. Every time we move one node from community i to another community j, both $e^i_{2\mathrm{out}}$ and $e^j_{2\mathrm{out}}$ increase by 1 and $e^i_{22}$ decreases by 2, while $e^j_{22}$ remains the same. Let $\tilde e^i_{2\mathrm{out}}$ and $\tilde e^i_{22}$ denote the corresponding values after all movements. The following always holds, no matter how many moves we make:
$$2\sum_{i \in \text{layer } 2}\bigl(e^i_{22} - \tilde e^i_{22}\bigr) = \sum_{i \in \text{layer } 2}\tilde e^i_{2\mathrm{out}}.$$
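Now $\tilde Q_2$, the modularity of the new partition after moving, is
$$\tilde Q_2 = \sum_{i \in \text{layer } 2}\left[\frac{\tilde e^i_{22}}{e} - \left(\frac{\tilde d^i_2}{2e}\right)^2\right] = \sum_{i \in \text{layer } 2}\frac{4e\cdot\tilde e^i_{22}}{4e^2} - \sum_{i \in \text{layer } 2}\frac{\bigl(2\tilde e^i_{22} + \tilde e^i_{2\mathrm{out}}\bigr)^2}{4e^2}.$$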
Finally, we have $\tilde Q_2 \le Q'_2 + \frac{T}{4e^2} \le Q'_2$. Hence, layer 2 has the highest modularity among all possible partitions of the n nodes into n2 communities. In this way, RemoveEdge makes the unreduced layer easier for the base algorithm to detect.
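The claim that, after RemoveEdge, layer 2 attains the highest modularity among all partitions into n2 communities can be sanity-checked by exhaustive search on a tiny instance. In this sketch (toy graph and function names are hypothetical), two disjoint triangles play the role of n2 = 2 layer-2 communities with no outgoing edges left; the search enumerates all 2^6 label assignments, so it is only feasible for very small graphs.

```python
import itertools


def modularity(edges, comm):
    """Unweighted Newman-Girvan modularity of partition `comm` (node -> label)."""
    m = len(edges)
    internal, degree = {}, {}
    for u, v in edges:
        for c in (comm[u], comm[v]):
            degree[c] = degree.get(c, 0) + 1
        if comm[u] == comm[v]:
            internal[comm[u]] = internal.get(comm[u], 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree.items())


def best_partition(n, k, edges):
    """Exhaustive search over all k**n label assignments (tiny graphs only)."""
    return max((modularity(edges, labels), labels)
               for labels in itertools.product(range(k), repeat=n))


# After RemoveEdge on layer 1, only layer-2 internal edges remain; two disjoint
# triangles stand in for n2 = 2 layer-2 communities (hypothetical toy graph).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]
layer2 = (0, 0, 0, 1, 1, 1)
q_layer2 = modularity(edges, layer2)
q_best, _ = best_partition(6, 2, edges)
print(q_layer2, q_best)  # both equal 1 - 1/n2 = 0.5
```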
Theorem 3. For a two-layer stochastic blockmodel network G(n, n1 , n2 , p1 , p2 ),
the modularity of a layer increases if we apply ReduceEdge on all communities
in the other layer.
Proof. In ReduceEdge on layer 1, we keep the edges in each detected community with probability $q_1 = 1 - \frac{1-p}{1-q}$, where p is the observed edge probability within the detected community and q is the observed background noise.
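ReduceEdge on layer 1 keeps only a $q_1$ fraction of the edges in $S_{12}$ and $S_1$, so after ReduceEdge,
$$e'_{22} = \frac{1}{n_2}\bigl(|S_2| + |S_{12}|\cdot q_1\bigr) > \frac{1}{n_2}\bigl(|S_2| + |S_{12}|\bigr)\cdot q_1 = e_{22}\cdot q_1,$$
$$e'_{2\mathrm{out}} = \frac{2}{n_2}|S_1|\cdot q_1 = e_{2\mathrm{out}}\cdot q_1,$$
where the strict inequality uses $q_1 < 1$ and $|S_2| > 0$. Thus, $\frac{e'_{2\mathrm{out}}}{e_{2\mathrm{out}}} < \frac{e'_{22}}{e_{22}}$, and Lemma 2 implies that $Q_2 < Q'_2$. Similarly, for the modularity of layer 1 after ReduceEdge on layer 2, $Q'_1 > Q_1$.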
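Theorem 3 can also be illustrated by simulation. This Python sketch samples a two-layer model with the node layout of the G(200, 4, 5, p1, p2) example (layer 1 as contiguous blocks, layer 2 as residues mod n2), applies ReduceEdge to every layer-1 community by keeping each internal edge with a fixed probability, and compares the layer-2 modularity before and after. The keep probability is passed in directly rather than estimated from observed densities, and all helper names, sizes, probabilities and the random seed are illustrative assumptions. Ratios of edge totals equal ratios of per-community averages, so the last line checks the condition of Lemma 2.

```python
import itertools
import random


def two_layer_sbm(n, n1, n2, p1, p2, rng):
    """Hypothetical helper: layer 1 = contiguous blocks of size n // n1,
    layer 2 = residues mod n2 (as in the G(200, 4, 5, p1, p2) example)."""
    size1 = n // n1
    layer1 = [v // size1 for v in range(n)]
    layer2 = [v % n2 for v in range(n)]
    edges = set()
    for u, v in itertools.combinations(range(n), 2):
        # each layer in which u and v share a community may generate the edge
        if layer1[u] == layer1[v] and rng.random() < p1:
            edges.add((u, v))
        if layer2[u] == layer2[v] and rng.random() < p2:
            edges.add((u, v))
    return layer1, layer2, edges


def reduce_edge(edges, comm, keep, rng):
    """ReduceEdge on all communities of one layer: each edge inside a community
    of `comm` survives with probability `keep`; other edges are untouched."""
    return {(u, v) for u, v in sorted(edges)
            if comm[u] != comm[v] or rng.random() < keep}


def modularity(edges, comm):
    m = len(edges)
    internal, degree = {}, {}
    for u, v in edges:
        for c in (comm[u], comm[v]):
            degree[c] = degree.get(c, 0) + 1
        if comm[u] == comm[v]:
            internal[comm[u]] = internal.get(comm[u], 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree.items())


def internal_and_crossing(edges, comm):
    internal = sum(1 for u, v in edges if comm[u] == comm[v])
    return internal, len(edges) - internal


rng = random.Random(7)
layer1, layer2, edges = two_layer_sbm(120, 4, 5, 0.3, 0.3, rng)
reduced = reduce_edge(edges, layer1, keep=0.3, rng=rng)

q_before, q_after = modularity(edges, layer2), modularity(reduced, layer2)
in0, out0 = internal_and_crossing(edges, layer2)
in1, out1 = internal_and_crossing(reduced, layer2)
print(f"Q2: {q_before:.3f} -> {q_after:.3f}")
print(f"kept fraction: outgoing {out1 / out0:.2f} < internal {in1 / in0:.2f}")
```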
Theorem 4. For a synthetic two-layer block model network G(n, n1 , n2 , p1 , p2 ),
the modularity of a layer increases if we apply ReduceWeight on all communities
in the other layer.
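Proof. According to [3], ReduceWeight on layer 1 multiplies the weight of each edge in a layer-1 community by $q_1 = 1 - \frac{1-p}{1-q}$. In a weighted network, the total weight of the internal edges of a community i in layer 2 is $e^i_{22} = \frac{1}{2}\sum_{u,v\in i} w_{uv}\cdot A_{uv}$, where $w_{uv}$ is the weight of edge (u, v). By construction, ReduceWeight on layer 1 reduces the weight of all edges in $S_{12}$ or $S_1$, but does not change the weight of edges in $S_2$. Thus,
$$e'^i_{22} = \frac{1}{2}\sum_{\substack{u,v\in i\\(u,v)\in S_{12}}} w_{uv}A_{uv}\cdot q_1 + \frac{1}{2}\sum_{\substack{u,v\in i\\(u,v)\in S_2}} w_{uv}A_{uv} > \left(\frac{1}{2}\sum_{\substack{u,v\in i\\(u,v)\in S_{12}}} w_{uv}A_{uv} + \frac{1}{2}\sum_{\substack{u,v\in i\\(u,v)\in S_2}} w_{uv}A_{uv}\right)\cdot q_1 = e^i_{22}\cdot q_1.$$
Since every edge leaving a layer-2 community lies in $S_1$, all outgoing weights are scaled by $q_1$:
$$e^i_{2\mathrm{out}} = \frac{1}{2}\sum_{u\in i,\,v\notin i} w_{uv}A_{uv}, \qquad e'^i_{2\mathrm{out}} = \frac{1}{2}\sum_{u\in i,\,v\notin i} w_{uv}A_{uv}\cdot q_1 = e^i_{2\mathrm{out}}\cdot q_1.$$
Thus, $\frac{e'_{2\mathrm{out}}}{e_{2\mathrm{out}}} < \frac{e'_{22}}{e_{22}}$, and combined with Lemma 2, this proves that $Q'_2 > Q_2$: the modularity increases after ReduceWeight.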
Similarly, the modularity of layer 1 after ReduceWeight on layer 2 satisfies $Q'_1 > Q_1$.
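Since ReduceWeight is deterministic, its effect can be checked exactly on a small weighted example. This sketch (toy graph, function names and q1 = 0.5 are hypothetical) uses the weighted modularity Q = Σ_c [w_c/W − (s_c/(2W))²], where w_c is the internal edge weight of community c, s_c its total strength, and W the total edge weight. It multiplies the weight of every edge inside a layer-1 community by q1 and evaluates the layer-2 partition before and after.

```python
def weighted_modularity(weights, comm):
    """Weighted modularity of partition `comm`; `weights` maps (u, v) -> w_uv."""
    total = sum(weights.values())
    internal, strength = {}, {}
    for (u, v), w in weights.items():
        for c in (comm[u], comm[v]):
            strength[c] = strength.get(c, 0.0) + w
        if comm[u] == comm[v]:
            internal[comm[u]] = internal.get(comm[u], 0.0) + w
    return sum(internal.get(c, 0.0) / total - (s / (2 * total)) ** 2
               for c, s in strength.items())


def reduce_weight(weights, comm, q1):
    """ReduceWeight on all communities of one layer: scale internal edges by q1."""
    return {(u, v): w * q1 if comm[u] == comm[v] else w
            for (u, v), w in weights.items()}


layer1 = [0, 0, 0, 1, 1, 1]
layer2 = [0, 1, 0, 1, 0, 1]
weights = {e: 1.0 for e in [(0, 1), (0, 2), (1, 3), (3, 5), (2, 4), (4, 5)]}
q_before = weighted_modularity(weights, layer2)
q_after = weighted_modularity(reduce_weight(weights, layer1, 0.5), layer2)
print(q_before, q_after)  # about 0.1667 (= 1/6) before, 0.25 after
```

Here every edge crossing layer 2 is internal to layer 1, so all outgoing weight is halved, while only part of the layer-2 internal weight is, which is exactly the situation covered by Lemma 2.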
The analysis shows that weakening one layer with any of the three methods (RemoveEdge, ReduceEdge, ReduceWeight) increases the modularity of the other layer. These results follow naturally from Lemma 2, which is in some sense a stronger claim: the modularity of the remaining layer increases as long as a larger fraction of its outgoing edges than of its internal edges is reduced.
Fig. 2. Simulation results of ReduceEdge on G(600, 15, 12, 0.1, 0.12). (Color figure
online)
cross sign denotes where the estimated layer projects on the 2-dimensional plane.
Simulations using RemoveEdge and ReduceWeight yield similar results.
1. Initially, the two grounded layers have similar modularity values, giving rise to two local modularity peaks, one at the bottom right and the other at the top left.
2. (a): At iteration t = 0, the base algorithm finds an approximate layer 2 whose NMI similarity with layer 2 is about 0.90.
3. (b): After that partition is reduced, the modularity peak at the top left sinks and the peak at the bottom right rises, and the base algorithm finds an approximate layer 1 whose NMI similarity with layer 1 is about 0.89. ReduceEdge then reduces this approximate layer 1, making layer 2 easier to approximate.
4. (c) and (d): At t = 1, the base algorithm finds an approximate layer 2 with 0.97 NMI similarity with layer 2, a significant improvement. Once this more accurate approximation of layer 2 is reduced, the base algorithm finds a better approximation of layer 1 too; in our run, it has 0.96 NMI similarity with layer 1.
5. (e) and (f): As HICODE iterates, at t = 2, the base algorithm uncovers an approximate layer 2 with 0.98 NMI similarity and an approximate layer 1 with 0.97 NMI similarity.
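The NMI values above compare a detected partition with a grounded layer. For disjoint partitions, one common normalization is NMI(A, B) = 2 I(A; B) / (H(A) + H(B)), and the minimal Python sketch below computes it. This choice is an assumption for illustration: the text here does not specify which NMI variant produced the reported numbers, and the overlapping-community NMI of McDaid et al. [5] is defined differently and can give different values.

```python
from collections import Counter
from math import log


def nmi(labels_a, labels_b):
    """Normalized mutual information 2 I(A;B) / (H(A) + H(B)) between two
    disjoint partitions, given as per-node community labels."""
    n = len(labels_a)
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))

    def entropy(counts):
        return -sum(k / n * log(k / n) for k in counts.values())

    mutual = sum(k / n * log(k * n / (count_a[x] * count_b[y]))
                 for (x, y), k in joint.items())
    h_a, h_b = entropy(count_a), entropy(count_b)
    if h_a + h_b == 0:  # both partitions put every node in one community
        return 1.0
    return 2 * mutual / (h_a + h_b)
```

NMI is invariant to relabeling, so `nmi([0, 0, 1, 1], [1, 1, 0, 0])` is 1, while two independent splits such as `[0, 0, 1, 1]` and `[0, 1, 0, 1]` give 0.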
5 Conclusion
In this work, we provide a theoretical perspective on HICODE, a meta-approach for hidden community detection, applied to multi-layer stochastic block models. We
References
1. Blondel, V.D., Guillaume, J.L., Lambiotte, R., Lefebvre, E.: Fast unfolding of communities in large networks. J. Stat. Mech. Theory Exp. 2008(10), P10008 (2008)
2. Girvan, M., Newman, M.E.J.: Community structure in social and biological networks. Proc. Natl. Acad. Sci. 99(12), 7821–7826 (2002)
3. He, K., Li, Y., Soundarajan, S., Hopcroft, J.E.: Hidden community detection in
social networks. Inf. Sci. 425, 92–106 (2018)
4. He, K., Soundarajan, S., Cao, X., Hopcroft, J.E., Huang, M.: Revealing multiple
layers of hidden community structure in networks. CoRR abs/1501.05700 (2015)
5. McDaid, A.F., Greene, D., Hurley, N.: Normalized mutual information to evaluate
overlapping community finding algorithms. arXiv preprint arXiv:1110.2515 (2011)
6. Paul, S., Chen, Y.: Consistent community detection in multi-relational data through
restricted multi-layer stochastic blockmodel. Electron. J. Stat. 10(2), 3807–3870
(2016)
7. Teng, S.H.: Scalable algorithms for data and network analysis. Found. Trends Theor. Comput. Sci. 12(1–2), 1–274 (2016)