
Proc. Int. Cong. of Math. – 2018
Rio de Janeiro, Vol. 4 (2925–2944)

CONCENTRATION OF RANDOM GRAPHS AND APPLICATION TO COMMUNITY DETECTION

Can M. Le, Elizaveta Levina and Roman Vershynin

Abstract
Random matrix theory has played an important role in recent work on statis-
tical network analysis. In this paper, we review recent results on regimes of con-
centration of random graphs around their expectation, showing that dense graphs
concentrate and sparse graphs concentrate after regularization. We also review rel-
evant network models that may be of interest to probabilists considering directions
for new random matrix theory developments, and random matrix theory tools that
may be of interest to statisticians looking to prove properties of network algorithms.
Applications of concentration results to the problem of community detection in net-
works are discussed in detail.

1 Introduction
A lot of recent interest in concentration of random graphs has been generated by prob-
lems in network analysis, a very active interdisciplinary research area with contribu-
tions from probability, statistics, physics, computer science, and the social sciences all
playing a role. Networks represent relationships (edges) between objects (nodes), and
a network between n nodes is typically represented by its $n \times n$ adjacency matrix A.
We will focus on the case of simple undirected networks, where $A_{ij} = 1$ when nodes i
and j are connected by an edge, and 0 otherwise, which makes A a symmetric matrix
with binary entries. It is customary to assume the graph contains no self-loops, that
is, $A_{ii} = 0$ for all i, but this is not crucial. In general, networks may be directed (A
is not symmetric), weighted (the entries of A have a numerical value representing the
strength of connection), and/or signed (the entries of A have a sign representing whether
the relationship is positive or negative in some sense).
Viewing networks as random realizations from an underlying network model enables
analysis and inference, with the added difficulty that we often only observe a single re-
alization of a given network. Quantities of interest to be inferred from this realization
may include various network properties such as the node degree distribution, the net-
work radius, and community structure. Fundamental to these inferences is the question
of how close a single realization of the matrix A is to the population mean, or the true
model, E A. If A is close to E A, that is, A concentrates around its mean, then inferences
drawn from A can be transferred to the population with high probability.

MSC2010: primary 05C80; secondary 05C85, 60B20.

In this paper, we aim to answer the question “When does A concentrate around E A?”
under a number of network models and asymptotic regimes. We also show that in some
cases when the network does not concentrate, a simple regularization step can restore
concentration. While the question of concentration is interesting in its own right, we
especially focus on the implications for the problem of community detection, a prob-
lem that has attracted a lot of attention in the networks literature. When concentration
holds, in many cases a simple spectral algorithm can recover communities, and thus
concentration is of practical and not only theoretical interest.

2 Random network models


Our concentration results hold for quite general models, but, for the sake of clarity, we
provide a brief review of network models, starting from the simplest model and building
up in complexity.

The Erdős–Rényi (ER) graph. The simplest random network model is the Erdős–Rényi
graph $G(n, p)$ Erdős and Rényi [1959]. Under this model, edges are independently
drawn between all pairs of nodes according to a Bernoulli distribution with success
probability p. Although the ER model provides an important building block in network
modeling and is attractive to analyze, it almost never fits network data observed
in practice.

The stochastic block model (SBM). The SBM is perhaps the simplest network model
with community structure, first proposed by Holland, Laskey, and Leinhardt [1983].
Under this model, each node belongs to exactly one of K communities, and the node
community membership $c_i$ is drawn independently from a multinomial distribution on
$\{1, \dots, K\}$ with probabilities $\pi_1, \dots, \pi_K$. Conditional on the label vector c, edges are
drawn independently between each pair of nodes $i, j$, with

$$P(A_{ij} = 1) = B_{c_i c_j},$$

where B is a symmetric $K \times K$ matrix controlling edge probabilities. Note that each
community within SBM is an ER graph. The main question of interest in network
analysis is estimating the label vector c from A, although model parameters $\pi$ and $B$
may also be of interest.
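
The sampling scheme just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sample_sbm`, the 0-based community labels, and the parameter values are ours, not the paper's, and only the standard library is used.

```python
import random

def sample_sbm(n, pi, B, seed=0):
    """Draw (labels, A) from an SBM; names and defaults are illustrative."""
    rng = random.Random(seed)
    K = len(pi)
    # c_i is drawn independently from the multinomial distribution pi,
    # here on {0, ..., K-1} rather than {1, ..., K}.
    labels = rng.choices(range(K), weights=pi, k=n)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):          # no self-loops: A_ii = 0
            if rng.random() < B[labels[i]][labels[j]]:
                A[i][j] = A[j][i] = 1      # undirected: keep A symmetric
    return labels, A

labels, A = sample_sbm(n=60, pi=[0.5, 0.5], B=[[0.6, 0.1], [0.1, 0.6]])
```

Conditional on `labels`, the entries above the diagonal are independent Bernoulli variables, which is the property the concentration results below rely on.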

The degree-corrected stochastic block model (DCSBM). While the SBM does incorporate
community structure, the assumption that each block is an ER graph is too
restrictive for many real-world networks. In particular, ER graphs have a Poisson degree
distribution, and real networks typically fit the power law or another heavy-tailed
distribution better, since they often have “hubs”, influential nodes with many connections.
An extension removing this limitation, the degree-corrected stochastic block
model (DCSBM), was proposed by Karrer and Newman [2011]. The DCSBM is like
an SBM but with each node assigned an additional parameter $\theta_i > 0$ that controls its
expected degree, and edges drawn independently with

$$P(A_{ij} = 1) = \theta_i \theta_j B_{c_i c_j}.$$

Additional constraints need to be imposed on $\theta_i$ for model identifiability; see Karrer
and Newman [2011] and Zhao, Levina, and Zhu [2012] for options.

The latent space model (LSM). Node labels under the SBM or the DCSBM can be
thought of as latent (unobserved) node positions in a discrete space of K elements. More
generally, latent positions can be modeled as coordinates in $\mathbb{R}^d$, or another set equipped
with a distance measure. The LSM Hoff, Raftery, and Handcock [2002] assumes that
each node i is associated with an unknown position $x_i$ and edges are drawn independently
between each pair of nodes $i, j$ with probability inversely proportional to the
distance between $x_i$ and $x_j$. If latent positions $x_i$ form clusters (for example, if they
are drawn from a mixture of Gaussians), then a random network generated from this
model exhibits community structure. Inferring the latent positions can in principle lead
to insights into how the network was formed, beyond simple community assignments.

Exchangeable random networks. An analogue of de Finetti’s theorem for networks,
due to Hoover and Aldous Hoover [1979] and Aldous [1981], shows that any network
whose distribution is invariant under node permutations can be represented by

$$A_{ij} = g(\alpha, \xi_i, \xi_j, \lambda_{ij}),$$

where $\alpha$, $\xi_i$ and $\lambda_{ij}$ are independent and uniformly distributed on $[0, 1]$, and
$g(u, v, w, z) = g(u, w, v, z)$ for all $u, v, w, z$. This model covers all the previously discussed
models as special cases, and the function g, called the graphon, can be estimated up to a
permutation under additional assumptions; see Olhede and Wolfe [2013], Gao, Lu, and
Zhou [2015], and Y. Zhang, Levina, and Zhu [2017].

Network models with overlapping communities. In practice, it is often more reasonable
to allow nodes to belong to more than one community. Multiple such models have
been proposed, including the Mixed Membership Stochastic Block Model (MMSBM)
Airoldi, Blei, Fienberg, and Xing [2008], the Ball-Karrer-Newman Model (BKN) Ball,
Karrer, and Newman [2011], and the OCCAM model Y. Zhang, Levina, and Zhu [2014].
MMSBM allows different memberships depending on which node the given node inter-
acts with; the BKN models edges as a sum of multiple edges corresponding to different
communities; and OCCAM relaxes the membership vector c under the SBM to have
entries between 0 and 1 instead of exactly one “1”. All of these models are also covered
by the theory we present, because, conditional on node memberships, all these networks
are distributed according to an inhomogeneous Erdős–Rényi model, the most general
model we consider, described next.

The inhomogeneous Erdős–Rényi model. All models described above share an im-
portant property: conditioned on node latent positions, edges are formed independently.
The most general form of such a model is the inhomogeneous Erdős–Rényi model
(IERM) Bollobás, Janson, and Riordan [2007], where each edge is independently drawn,
with P (Aij = 1) = Pij , where P = (Pij ) = E A. Evidently, additional assumptions
have to be made if latent positions of nodes (however they are defined) are to be recov-
ered from a single realization of A. We will state concentration results under the IERM
as generally as possible, and then discuss additional assumptions under which latent
positions can also be estimated reliably.

Scaling. We have so far defined all the models for a fixed number of nodes n, but in
order to talk about concentration, we need to determine how the expectation $P_n = E A_n$
changes with n. Most of the literature defines

$$P_n = \rho_n P,$$

where P is a matrix with constant non-negative entries, and $\rho_n$ controls the average
expected degree of the network, $d = d_n = n \rho_n$. Different asymptotic regimes have
been studied, especially under the SBM; see Abbe [2017] for a review. Unless $\rho_n \to 0$,
the average network degree $d = \Omega(n)$, and the network becomes dense as n grows.
In the SBM literature, the regime $d_n \gg \log n$ is sometimes referred to as semi-dense;
$d_n \to \infty$ but not faster than $\log n$ is semi-sparse; and the constant degree regime $d_n = O(1)$
is called sparse. We will elaborate on these regimes and their implications later
on in the paper.

3 Useful random matrix results


We start by presenting a few powerful and general tools in random matrix theory
which can help prove concentration bounds for random graphs.

Theorem 3.1 (Bai-Yin law Bai and Y. Q. Yin [1988]; see Füredi and Komlós [1981] for
an earlier result). Let $M = (M_{ij})_{i,j=1}^{\infty}$ be an infinite, symmetric, and diagonal-free
random matrix whose entries above the diagonal are i.i.d. random variables with zero
mean and variance $\sigma^2$. Suppose further that $E M_{ij}^4 < \infty$. Let $M_n = (M_{ij})_{i,j=1}^{n}$ denote
the principal minors of M. Then, as $n \to \infty$,

$$\frac{1}{\sqrt{n}} \|M_n\| \to 2\sigma \quad \text{almost surely.} \tag{3-1}$$

Theorem 3.2 (Matrix Bernstein’s inequality). Let $X_1, \dots, X_N$ be independent, mean
zero, $n \times n$ symmetric random matrices, such that $\|X_i\| \le K$ almost surely for all i.
Then, for every $t \ge 0$ we have

$$P\Big\{ \Big\| \sum_{i=1}^{N} X_i \Big\| \ge t \Big\} \le 2n \exp\Big( - \frac{t^2/2}{\sigma^2 + Kt/3} \Big).$$

Here $\sigma^2 = \big\| \sum_{i=1}^{N} E X_i^2 \big\|$ is the norm of the “matrix variance” of the sum.

Corollary 3.3 (Expected norm of sum of random matrices). We have

$$E \Big\| \sum_{i=1}^{N} X_i \Big\| \lesssim \sigma \sqrt{\log n} + K \log n.$$

The following result gives sharper bounds on random matrices than matrix Bern-
stein’s inequality, but requires independence of entries.

Theorem 3.4 (Bandeira-van Handel Bandeira and van Handel [2016] Corollary 3.6).
Let M be an $n \times n$ symmetric random matrix with independent entries on and above
the diagonal. Then

$$E \|M\| \lesssim \max_i \Big( \sum_j \sigma_{ij}^2 \Big)^{1/2} + \sqrt{\log n} \, \max_{i,j} K_{ij},$$

where $\sigma_{ij}^2 = E M_{ij}^2$ are the variances of entries and $K_{ij} = \|M_{ij}\|_\infty$.

Theorem 3.5 (Seginer’s theorem Seginer [2000]). Let M be an $n \times n$ symmetric random
matrix with i.i.d. mean zero entries above the diagonal and arbitrary entries on the
diagonal. Then

$$E \|M\| \asymp E \max_i \|M_i\|_2,$$

where $M_i$ denote the columns of M.

The lower bound in Seginer’s theorem is trivial; it follows from the fact that the
operator norm of a matrix is always bounded below by the Euclidean norm of any of its
columns. The original paper of Seginer Seginer [ibid.] proved the upper bound for non-
symmetric matrices with independent entries. The present statement of Theorem 3.5
can be derived by a simple symmetrization argument, see Hajek, Wu, and Xu [2016,
Section 4.1].

4 Dense networks concentrate


If $A = A_n$ is the adjacency matrix of a $G(n, p)$ random graph with a constant p, then
the Bai-Yin law gives

$$\frac{1}{\sqrt{n}} \|A - E A\| \to 2 \sqrt{p(1-p)}.$$

In particular, we have

$$\|A - E A\| \le 2 \sqrt{d} \tag{4-1}$$

with probability tending to one, where $d = np$ is the expected node degree.
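
The Bai-Yin prediction can be checked numerically on a single dense graph. The sketch below is an illustration, not part of the paper: it samples $A - E A$ for $G(n, p)$ without self-loops, estimates the spectral norm by plain power iteration (which approaches the norm from below), and compares it with $2\sqrt{np(1-p)}$. The helper names, the choice $n = 200$, $p = 0.5$ and the iteration count are assumptions made for the example.

```python
import math
import random

def spectral_norm(M, iters=80, seed=0):
    """Estimate the spectral norm of a symmetric matrix M by power iteration."""
    rng = random.Random(seed)
    n = len(M)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    est = 0.0
    for _ in range(iters):
        norm_v = math.sqrt(sum(x * x for x in v))
        v = [x / norm_v for x in v]
        w = [sum(row[j] * v[j] for j in range(n)) for row in M]
        est = math.sqrt(sum(x * x for x in w))   # ||M v|| with ||v|| = 1
        v = w
    return est

def centered_er(n, p, seed=0):
    """Return A - E A for G(n, p) without self-loops (zero diagonal)."""
    rng = random.Random(seed)
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            M[i][j] = M[j][i] = (1.0 if rng.random() < p else 0.0) - p
    return M

n, p = 200, 0.5
ratio = spectral_norm(centered_er(n, p)) / (2 * math.sqrt(n * p * (1 - p)))
print(f"||A - EA|| / (2 sqrt(n p (1 - p))) = {ratio:.3f}")
```

For a dense graph of this size the printed ratio should be close to one; the exact value depends on the seed and on how far the power iteration has converged.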



Can we expect a similar concentration for sparser Erdős–Rényi graphs, where p is
allowed to decrease with n? The method of Friedman, Kahn, and Szemeredi [1989]
adapted by Feige and Ofek [2005] gives

$$\|A - E A\| = O(\sqrt{d}) \tag{4-2}$$

under the weaker condition $d \gtrsim \log n$, which is optimal, as we will see shortly. This
argument actually yields (4-2) for inhomogeneous random graphs $G(n, (p_{ij}))$ as well,
and for $d = \max_{ij} n p_{ij}$, see e.g. Lei and Rinaldo [2015] and Chin, Rao, and V. Vu
[2015].
Under the assumption $d = np \gg \log^4 n$, which is stronger than $d \gtrsim \log n$ but still
allows sparse graphs, Vu V. H. Vu [2007] proved a sharper bound for $G(n, p)$, namely

$$\|A - E A\| = (2 + o(1)) \sqrt{d}, \tag{4-3}$$

which essentially extends (4-1) to sparse random graphs. Very recently, Benaych-Georges,
Bordenave, and Knowles [2017b] were able to derive (4-3) under the optimal
condition $d \gg \log n$. More precisely, they showed that if $4 \le d \le n^{2/13}$, then

$$E \|A - E A\| \le 2 \sqrt{d} + C \sqrt{\frac{\log n}{1 + \log(\log(n)/d)}}.$$

The argument of Benaych-Georges, Bordenave, and Knowles [ibid.] applies more generally
to inhomogeneous random graphs $G(n, (p_{ij}))$ under a regularity condition on
the connection probabilities $(p_{ij})$. It even holds for more general random matrices that
may not necessarily have binary entries.
To apply Corollary 3.3 to the adjacency matrix A of an ER random graph $G(n, p)$,
decompose A into a sum of independent random matrices $A = \sum_{i \le j} X_{ij}$, where each
matrix $X_{ij}$ contains a pair of symmetric entries of A, i.e. $X_{ij} = A_{ij}(e_i e_j^T + e_j e_i^T)$ where
$(e_i)$ denotes the canonical basis in $\mathbb{R}^n$. Then apply Corollary 3.3 to the sum of mean
zero matrices $X_{ij} - E X_{ij}$. It is quick to check that $\sigma^2 \lesssim pn$ and obviously $K \le 2$, and so
we conclude that

$$E \|A - E A\| \lesssim \sqrt{d \log n} + \log n, \tag{4-4}$$

where $d = np$ is the expected degree. The same argument applies more generally to
inhomogeneous random graphs $G(n, (p_{ij}))$, and it still gives (4-4) when

$$d = \max_i \sum_j p_{ij}$$

is the maximal expected degree.
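
For the homogeneous case, the quick check of the two parameters entering Corollary 3.3 can be written out as follows. This is a worked sketch under the assumption that the graph has no self-loops, so only pairs $i < j$ contribute, and with $X_{ij} = A_{ij}(e_i e_j^T + e_j e_i^T)$ denoting the matrix that carries the symmetric pair of entries $(i, j)$ and $(j, i)$; the symbol $Y_{ij}$ is introduced here for the centered summands.

```latex
\begin{align*}
Y_{ij} &:= X_{ij} - E X_{ij} = (A_{ij} - p)\,(e_i e_j^T + e_j e_i^T), \qquad i < j, \\
(e_i e_j^T + e_j e_i^T)^2 &= e_i e_i^T + e_j e_j^T, \\
\sum_{i<j} E\, Y_{ij}^2 &= p(1-p) \sum_{i<j} (e_i e_i^T + e_j e_j^T) = p(1-p)(n-1)\, I_n, \\
\sigma^2 &= \Big\| \sum_{i<j} E\, Y_{ij}^2 \Big\| = p(1-p)(n-1) \le pn = d, \\
\|Y_{ij}\| &= |A_{ij} - p| \le 1, \quad \text{so } K = 2 \text{ is admissible.}
\end{align*}
```

Substituting $\sigma \le \sqrt{d}$ and $K \le 2$ into Corollary 3.3 gives $E \|A - E A\| \lesssim \sqrt{d \log n} + \log n$.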


The logarithmic factors in bound (4-4) are not optimal, and can be improved by
applying the result of Bandeira and van Handel (Theorem 3.4) to the centered adjacency
matrix $A - E A$ of an inhomogeneous random graph $G(n, (p_{ij}))$. In this case,
$\sigma_{ij}^2 = p_{ij}(1 - p_{ij}) \le p_{ij}$ and $K_{ij} \le 1$, so we obtain the following sharpening of (4-4).

Proposition 4.1 (Concentration of inhomogeneous random graphs). Let A be the adjacency
matrix of an inhomogeneous random graph $G(n, (p_{ij}))$. Then

$$E \|A - E A\| \lesssim \sqrt{d} + \sqrt{\log n}, \tag{4-5}$$

where $d = \max_i \sum_j p_{ij}$ is the maximal expected degree.

In particular, if the graph is not too sparse, namely $d \gtrsim \log n$, then the optimal
concentration (4-2) holds in expectation, i.e.

$$E \|A - E A\| \lesssim \sqrt{d}.$$

This recovers a result of Feige and Ofek [2005].


A similar bound can be alternatively proved using the general result of Seginer (Theorem 3.5).
If A is the adjacency matrix of $G(n, p)$, it is easy to check that
$E \max_i \|A_i\|_2 \lesssim \sqrt{d} + \sqrt{\log n}$. Thus, Seginer’s theorem implies the optimal
concentration bound (4-5) as well. Using simple convexity arguments, one can extend this to
inhomogeneous random graphs $G(n, (p_{ij}))$, and get the bound (4-5) for $d = \max_{ij} n p_{ij}$,
see Hajek, Wu, and Xu [2016, Section 4.1].
One may wonder if Seginer’s theorem holds for matrices with independent but not
identically distributed entries. Unfortunately, this is not the case in general; a simple
counterexample was found by Seginer Seginer [2000], see Bandeira and van Handel
[2016, Remark 4.8]. Nevertheless, it is an open conjecture of Latala that Seginer’s
theorem does hold if M has independent Gaussian entries, see the papers Riemer and
Schütt [2013] and van Handel [2017a] and the survey van Handel [2017b].

5 Sparse networks concentrate after regularization

5.1 Sparse networks do not concentrate. In the sparse regime $d = np \ll \log n$,
the Bai-Yin law for $G(n, p)$ fails. This is because in this case, degrees of some vertices
are much higher than the expected degree d. This causes some rows of the adjacency
matrix A to have Euclidean norms much larger than $\sqrt{d}$, which in turn gives

$$\|A - E A\| \gg \sqrt{d}.$$

In other words, concentration fails for very sparse graphs; there exist outlying eigenvalues
that escape the interval $[-2, 2]$ where the spectrum of denser graphs lies according
to (3-1). For precise description of this phenomenon, see the original paper Krivelevich
and Sudakov [2003], a discussion in Bandeira and van Handel [2016, Section 4] and
the very recent work Benaych-Georges, Bordenave, and Knowles [2017a].
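
The role of high-degree vertices is easy to see in simulation. The following sketch, which is illustrative only, samples the degrees of a sparse $G(n, p)$ with constant expected degree and compares the largest one with $d$. A row of A belonging to a vertex of degree k has Euclidean norm $\sqrt{k}$, so a maximal degree well above d signals rows much longer than $\sqrt{d}$. The function name and the values $n = 1000$, $d = 2$ are assumptions for the example.

```python
import random

def er_degrees(n, d, seed=0):
    """Degrees of G(n, p) with p = d / n, sampled pair by pair."""
    rng = random.Random(seed)
    p = d / n
    deg = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                deg[i] += 1
                deg[j] += 1
    return deg

n, d = 1000, 2.0
deg = er_degrees(n, d)
print("expected degree:", d, "max degree:", max(deg))
```

With these parameters the average degree stays near d while the maximum is typically several times larger.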

5.2 Sparse networks concentrate after regularization. One way to regularize a
random network in the sparse regime is to remove high degree vertices altogether from
the network. Indeed, Feige and Ofek [2005] showed that for $G(n, p)$, if we drop all vertices
with degrees, say, larger than 2d, then the remaining part of the network satisfies
$\|A - E A\| = O(\sqrt{d})$ with high probability. The argument in Feige and Ofek [2005]
is based on the method developed by Friedman, Kahn, and Szemeredi [1989] and it is
extended to the IERM in Lei and Rinaldo [2015] and Chin, Rao, and V. Vu [2015].
Although removal of high degree vertices restores concentration, in practice this is
a bad idea, since the loss of edges associated with “hub” nodes in an already sparse
network leads to a considerable loss of information, and in particular community de-
tection tends to break down. A more gentle regularization proposed in Le, Levina, and
Vershynin [2017] does not remove high degree vertices, but reduces the weights of their
edges just enough to keep the degrees bounded by O(d ).

Theorem 5.1 (Concentration of regularized adjacency matrices). Consider a random
graph from the inhomogeneous Erdős–Rényi model $G(n, (p_{ij}))$, and let $d = \max_{ij} n p_{ij}$.
Consider any subset of at most $10n/d$ vertices, and reduce the weights of the edges
incident to those vertices in an arbitrary way, but so that all degrees of the new (weighted)
network become bounded by 2d. For any $r \ge 1$, with probability at least $1 - n^{-r}$ the
adjacency matrix $A'$ of the new weighted graph satisfies

$$\|A' - E A\| \le C r^{3/2} \sqrt{d}.$$
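
Since the theorem allows the weights to be reduced in an arbitrary way, many concrete schemes qualify. One simple possibility, sketched below as an illustration rather than as the authors' procedure, is to rescale every edge by a factor attached to each of its endpoints. The function name `regularize_degrees` and the star-graph example are assumptions; the sketch also does not check the theorem's side condition that at most $10n/d$ vertices are modified.

```python
def regularize_degrees(A, cap):
    """Scale edges at high-degree vertices so every weighted degree is <= cap."""
    n = len(A)
    deg = [sum(row) for row in A]
    # s_i = 1 for vertices within the cap, cap / deg_i < 1 for heavy vertices
    s = [1.0 if deg[i] <= cap else cap / deg[i] for i in range(n)]
    # W_ij = s_i * s_j * A_ij is symmetric and only changes edges that
    # touch a heavy vertex; row i sums to at most s_i * deg_i <= cap.
    return [[s[i] * s[j] * A[i][j] for j in range(n)] for i in range(n)]

# Star graph on 6 vertices: vertex 0 is a hub of degree 5.
A = [[0] * 6 for _ in range(6)]
for j in range(1, 6):
    A[0][j] = A[j][0] = 1
W = regularize_degrees(A, cap=2)
print([round(sum(row), 3) for row in W])
```

In the setting of Theorem 5.1 one would take `cap = 2 * d`. Unlike vertex removal, every edge of the hub survives with a reduced weight.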

Proving concentration for this kind of general regularization requires different tools.
One key result we state next is the Grothendieck-Pietsch factorization, a general and
well-known result in functional analysis Pietsch [1980], Pisier [1986], Tomczak-Jaegermann
[1989], and Pisier [2012] which has already been used in a similar probabilistic
context Ledoux and Talagrand [1991, Proposition 15.11]. It compares two matrix
norms, the spectral norm $\ell_2 \to \ell_2$ and the $\ell_\infty \to \ell_2$ norm.

Theorem 5.2 (Grothendieck-Pietsch factorization). Let B be a $k \times m$ real matrix. Then
there exist positive weights $\mu_j$ with $\sum_{j=1}^{m} \mu_j = 1$ such that

$$\|B\|_{\infty \to 2} \le \|B D_\mu^{-1/2}\| \le 2 \|B\|_{\infty \to 2},$$

where $D_\mu = \mathrm{diag}(\mu_j)$ denotes the $m \times m$ diagonal matrix with weights $\mu_j$ on the
diagonal.

Idea of the proof of Theorem 5.1 by network decomposition. The argument in Feige
and Ofek [2005] becomes very complicated for handling the general regularization in
Theorem 5.1. A simpler alternative approach was developed by Le, Levina, and Vershynin
[2017] for proving Theorem 5.1. The main idea is to decompose the set of entries
$[n] \times [n]$ into different subsets with desirable properties. There exists a partition (see
Figure 1c for illustration)

$$[n] \times [n] = N \cup R \cup C$$

such that A concentrates on $N$ even without regularization, while restrictions of A onto
$R$ and $C$ have small row and column sums, respectively. It is easy to see that the degree
regularization does not destroy the properties of $N$, $R$ and $C$. Moreover, it creates a
new property, allowing for controlling the columns of $R$ and rows of $C$. Together with
the triangle inequality, this implies the concentration of the entire network.

[Figure 1: Constructing network decomposition iteratively. Panels: (a) First step; (b) Iterations; (c) Final decomposition.]

The network decomposition is constructed by an iterative procedure. We first establish
concentration of A in $\ell_\infty \to \ell_2$ norm using standard probability techniques. Next,
we upgrade this to concentration in the spectral norm $\|(A - E A)_{N_0}\| = O(\sqrt{d})$ on an
appropriate (large) subset $N_0 \subseteq [n] \times [n]$ using the Grothendieck-Pietsch factorization
(Theorem 5.2). It remains to control A on the complement of $N_0$. That set is small; it
can be described as a union of a block $C_0$ with a small number of rows, a block $R_0$ with
a small number of columns and an exceptional (small) block (see Figure 1a). Now we
repeat the process for the exceptional block, decomposing it into $N_1$, $R_1$, and $C_1$, and
so on, as shown in Figure 1b. At the end, we set $N = \cup_i N_i$, $R = \cup_i R_i$ and $C = \cup_i C_i$.
The cumulative error from this iterative procedure can be controlled appropriately; see
Le, Levina, and Vershynin [ibid.] for details.

5.3 Concentration of the graph Laplacian. So far, we have looked at random
graphs through the lens of their adjacency matrices. Another matrix that captures the
structure of a random graph is the Laplacian. There are several ways to define the
Laplacian; we focus on the symmetric, normalized Laplacian,

$$L(A) = D^{-1/2} A D^{-1/2}.$$

Here $D = \mathrm{diag}(d_i)$ is the diagonal matrix with degrees $d_i = \sum_{j=1}^{n} A_{ij}$ on the diagonal.
The reader is referred to F. R. K. Chung [1997] for an introduction to graph
Laplacians and their role in spectral graph theory. Here we mention just two basic facts:
the spectrum of L(A) is a subset of $[-1, 1]$, and the largest eigenvalue is always one.
In the networks literature in particular, community detection has been mainly done
through spectral clustering on the Laplacian, not on the adjacency matrix. We will
discuss this in more detail in Section 6, but the primary reason for this is degree nor-
malization: as discussed in Section 2, real networks rarely have the Poisson or mixture
of Poissons degree distribution that characterizes the stochastic block model; instead,
“hubs”, or high degree vertices, are common, and they tend to break down spectral
clustering on the adjacency matrix itself.
Concentration of Laplacians of random graphs has been studied by S. Yin [2008],
Chaudhuri, F. Chung, and Tsiatas [2012], Qin and Rohe [2013], Joseph and Yu [2016],
and Gao, Ma, A. Y. Zhang, and Zhou [2017]. Just like the adjacency matrix, the Lapla-
cian is known to concentrate in the dense regime d = Ω(log n), and it fails to concen-
trate in the sparse regime. However, the reasons it fails to concentrate are different. For
the adjacency matrix, as we discussed, concentration fails in the sparse case because of
high degree vertices. For the Laplacian, it is the low degree vertices that destroy con-
centration. In fact, it is easy to check that when d = o(log n), the probability of isolated
vertices is non-vanishing; and each isolated vertex contributes an eigenvalue of 0 to the
spectrum of L(A), which is easily seen to destroy concentration.
Multiple ways to regularize the Laplacian in order to deal with the low degree ver-
tices have been proposed. Perhaps the two most common ones are adding a small con-
stant to all the degrees on the diagonal of D Chaudhuri, F. Chung, and Tsiatas [2012],
and adding a small constant to all the entries of A before computing the Laplacian.
Here we focus on the latter regularization, proposed by Amini, Chen, Bickel, and Lev-
ina [2013] and analyzed by Joseph and Yu [2016] and Gao, Ma, A. Y. Zhang, and Zhou
[2017]. Choose $\tau > 0$ and add the same number $\tau/n$ to all entries of the adjacency
matrix A, thereby replacing it with

$$A_\tau := A + \frac{\tau}{n} \mathbf{1} \mathbf{1}^T. \tag{5-1}$$

Then compute the Laplacian as usual using this new adjacency matrix. This regularization
raises all degrees $d_i$ to $d_i + \tau$, and eliminates isolated vertices, making the entire
graph connected. The original paper Amini, Chen, Bickel, and Levina [2013] suggested
the choice $\tau = \gamma \bar{d}$, where $\bar{d}$ is the average node degree and $\gamma \in (0, 1)$ is a constant.
They showed the estimator is not particularly sensitive to $\gamma$ over a fairly wide range of
values away from 0 (too little regularization) and 1 (too much noise). The choice of
$\gamma = 0.25$ was recommended by Amini, Chen, Bickel, and Levina [ibid.] but this parameter
can also be successfully chosen by cross-validation on the network T. Li, Levina,
and Zhu [2016].
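
A minimal sketch of this regularization, assuming a dense list-of-lists adjacency matrix and a user-supplied $\tau$ (the function name and the small example graph are ours), adds $\tau/n$ to every entry and then normalizes by the new degrees $d_i + \tau$:

```python
import math

def regularized_laplacian(A, tau):
    """Return L(A_tau) = D^{-1/2} A_tau D^{-1/2} with A_tau = A + (tau/n) 11^T."""
    n = len(A)
    A_tau = [[A[i][j] + tau / n for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in A_tau]            # d_i + tau > 0 for tau > 0
    inv_sqrt = [1.0 / math.sqrt(x) for x in deg]
    return [[inv_sqrt[i] * A_tau[i][j] * inv_sqrt[j] for j in range(n)]
            for i in range(n)]

# Path 0 - 1 - 2 plus an isolated vertex 3 (degree 0 before regularization).
A = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 0],
     [0, 0, 0, 0]]
L = regularized_laplacian(A, tau=0.5)
```

Without regularization the isolated vertex would have degree zero and the normalization would divide by zero; with any $\tau > 0$ all degrees are positive, and the vector with entries $\sqrt{d_i + \tau}$ is an eigenvector of the result with eigenvalue one.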
The following consequence of Theorem 5.1 shows that regularization (5-1) indeed
forces the Laplacian to concentrate.

Theorem 5.3 (Concentration of the regularized Laplacian). Consider a random graph
drawn from the inhomogeneous Erdős–Rényi model $G(n, (p_{ij}))$, and let $d = \max_{ij} n p_{ij}$.
Choose a number $\tau > 0$. Then, for any $r \ge 1$, with probability at least $1 - e^{-r}$ we have

$$\|L(A_\tau) - L(E A_\tau)\| \le \frac{C r^2}{\sqrt{\tau}} \Big( 1 + \frac{d}{\tau} \Big)^{5/2}.$$

In the next section, we discuss why concentration of the adjacency matrix and/or its
Laplacian is important in the context of community detection, the primary application
of concentration in network analysis.

6 Application to community detection

Concentration of random graphs has been of such interest in network analysis primarily
because it relates to the problem of community detection; see Fortunato [2010],
Goldenberg, Zheng, Fienberg, Airoldi, et al. [2010], and Abbe [2017] for reviews of
community detection algorithms and results. We should specify that, perhaps in a slight
misnomer, “community detection” refers to the task of assigning each node to a commu-
nity (typically one and only one), not to the question of whether there are communities
present, which might be a more natural use of the term “detection”.
Most of the theoretical work linking concentration of random graphs to community
detection has focused on the stochastic block model (SBM), defined in Section 2, which
is one of the many special cases of the general IERM we consider. For the purpose of
this paper, we focus on the simplest version of the SBM for which the largest number
of results has been obtained so far, also known as the balanced planted partition
model $G(n, \frac{a}{n}, \frac{b}{n})$. In this model, there are K = 2 equal-sized communities with n/2
nodes each. Edges between vertices within the same community are drawn indepen-
dently with probability a/n, and edges between vertices in different communities with
probability b/n. The task is to recover the community labels of vertices from a single
realization of the adjacency matrix A drawn from this model. The large literature on
both the recovery algorithms and the theory establishing when a recovery is possible is
very nicely summarized in the recent excellent review Abbe [2017], where we refer the
reader for details and analogues for a general K (now available for most results) and
the asymmetric SBM (very few are available). In the following subsections we give a
brief summary for the symmetric K = 2 case which does not aim to be exhaustive.

6.1 Community detection phase transition. Weak recovery, sometimes also called
detection, means performing better than randomly guessing the labels of vertices. The
phase transition threshold for weak recovery was first conjectured in the physics liter-
ature by Decelle, Krzakala, Moore, and Zdeborová [2011], and proved rigorously by
Mossel, Neeman, and Sly [2013, 2015, 2014], with follow-up and related work by Abbe,
Bandeira, and Hall [2016], Massoulié [2014], and Bordenave, Lelarge, and Massoulié
[2015]. The phase transition result says that there exists a polynomial time algorithm
which can classify more than 50% of the vertices correctly as $n \to \infty$ with high probability
if and only if

$$(a - b)^2 > 2(a + b).$$

Performing better than random guessing is the weakest possible guarantee of perfor-
mance, which is of interest in the very sparse regime of d = (a + b)/2 = O(1); when
the degree grows, weak recovery becomes trivial. This regime has been mostly studied
by physicists and probabilists; in the statistics literature, consistency has been of more
interest.
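
As a quick arithmetic illustration of the weak recovery threshold $(a - b)^2 > 2(a + b)$ for the model with within-community probability $a/n$ and between-community probability $b/n$, the condition can be evaluated directly. The function name and the sample values of a and b are assumptions chosen for the example.

```python
def weak_recovery_possible(a, b):
    """Check the threshold (a - b)^2 > 2 (a + b) for G(n, a/n, b/n)."""
    return (a - b) ** 2 > 2 * (a + b)

print(weak_recovery_possible(5, 1))   # (5 - 1)^2 = 16 versus 2 * 6 = 12
print(weak_recovery_possible(3, 2))   # (3 - 2)^2 = 1 versus 2 * 5 = 10
```

Both example graphs have constant average degree; only the first lies above the threshold.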

6.2 Consistency of community detection. Two types of consistency have been dis-
cussed in the literature. Strong consistency, also known as exact recovery, means la-
beling all vertices correctly with high probability, which is, as the name suggests, a
very strong requirement. Weak consistency, or “almost exact” recovery, is the weaker
and arguably more practically reasonable requirement that the fraction of misclassified
vertices goes to 0 as $n \to \infty$ with high probability.
Strong consistency was studied first, in a seminal paper Bickel and Chen [2009], as
well as by Mossel, Neeman, and Sly [2014], McSherry [2001], Hajek, Wu, and Xu
[2016], and Cai and X. Li [2015]. Strong consistency is achievable, and achievable in
polynomial time, if

$$\Big| \sqrt{\frac{a}{\log n}} - \sqrt{\frac{b}{\log n}} \Big| > \sqrt{2}$$

and not possible if $\big| \sqrt{a/\log n} - \sqrt{b/\log n} \big| < \sqrt{2}$. In particular, strong consistency is
normally only considered in the semi-dense regime of $d / \log n \to \infty$.
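
The exact recovery condition, read as $|\sqrt{a/\log n} - \sqrt{b/\log n}| > \sqrt{2}$ for edge probabilities $a/n$ and $b/n$, can be evaluated the same way. The sketch below is illustrative; the function name and the choice of a and b as multiples of $\log n$ are assumptions made so that the arithmetic is easy to follow.

```python
import math

def exact_recovery_margin(a, b, n):
    """Return |sqrt(a/log n) - sqrt(b/log n)| - sqrt(2) for G(n, a/n, b/n)."""
    log_n = math.log(n)
    return abs(math.sqrt(a / log_n) - math.sqrt(b / log_n)) - math.sqrt(2)

n = 10_000
log_n = math.log(n)
print(exact_recovery_margin(9 * log_n, 1 * log_n, n) > 0)  # |3 - 1| = 2 vs sqrt(2)
print(exact_recovery_margin(4 * log_n, 1 * log_n, n) > 0)  # |2 - 1| = 1 vs sqrt(2)
```

A positive margin places the parameters on the achievable side of the threshold, a negative one on the impossible side.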
Weak consistency, as one would expect, requires a stronger condition than weak
recovery but a weaker one than strong consistency. Weak consistency is achievable if
and only if
$$\frac{(a - b)^2}{a + b} = \omega(1),$$

see for example Mossel, Neeman, and Sly [2014]. In particular, weak consistency is
achievable in the semi-sparse regime of $d \to \infty$.
Partial recovery, finally, refers to the situation where the fraction of misclassified
vertices does not go to 0, but remains bounded by a constant below 0.5. More specifically,
partial recovery means that for a fixed $\varepsilon > 0$ one can recover communities up to
$\varepsilon n$ mislabeled vertices. For the balanced symmetric case, this is true as long as

$$\frac{(a - b)^2}{a + b} = O(1)$$

(more precisely, as long as this ratio exceeds a sufficiently large constant depending on
$\varepsilon$; compare Corollary 6.1),
which is primarily relevant when $d = O(1)$. Several types of algorithms are known
to succeed at partial recovery in this very sparse regime, including non-backtracking
walks Mossel, Neeman, and Sly [2013], Massoulié [2014], and Bordenave, Lelarge,
and Massoulié [2015], spectral methods Chin, Rao, and V. Vu [2015] and methods
based on semidefinite programming Guédon and Vershynin [2016] and Montanari and
Sen [2016].

6.3 Concentration implies recovery. As an example application of the new concentration
results, we demonstrate how to show that regularized spectral clustering
Amini, Chen, Bickel, and Levina [2013] and Joseph and Yu [2016], one of the simplest
and most popular algorithms for community detection, can recover communities in the
sparse regime of constant degrees. In general, spectral clustering works by computing
the leading eigenvectors of either the adjacency matrix or the Laplacian, or their
regularized versions, and running the k-means clustering algorithm on the rows of the
$n \times k$ matrix of leading eigenvectors to recover the node labels. In the simplest case of
the balanced K = 2 model $G(n, \frac{a}{n}, \frac{b}{n})$, one can simply assign nodes to two communities
according to the sign of the entries of the eigenvector $v_2(A')$ corresponding to the
second largest eigenvalue of the (regularized) adjacency matrix $A'$ (assuming $a > b$).
Let us briefly explain how concentration results validate recovery from the regularized
adjacency matrix or regularized Laplacian. If concentration holds and the regularized
matrix $A'$ is shown to be close to $\mathbb{E}A$, then standard perturbation theory (i.e., the
Davis-Kahan theorem, see e.g. Bhatia [1997]) implies that $v_2(A')$ is close to $v_2(\mathbb{E}A)$,
and in particular, the signs of these two eigenvectors must agree on most vertices. An
easy calculation shows that the signs of $v_2(\mathbb{E}A)$ recover the communities exactly: for
$a > b$, the eigenvector corresponding to the second largest eigenvalue of $\mathbb{E}A$ (or the
second largest of $L(\mathbb{E}A)$) is a positive constant on one community and a negative constant on
the other. Therefore, the signs of $v_2(A')$ recover communities up to a small fraction of
misclassified vertices and, as always, up to a permutation of community labels. This
argument remains valid if we replace the regularized adjacency matrix $A'$ with the
regularized Laplacian $L(A_\tau)$.
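
The calculation and the perturbation step mentioned above can be outlined as follows. This is only a sketch under stated assumptions: $s \in \{-1, 1\}^n$ is notation introduced here for the vector of true community labels, $a > b$, the diagonal of $\mathbb{E}A$ is ignored (restoring it shifts every eigenvalue by the same amount and leaves the eigenvectors unchanged), $\delta$ denotes the gap separating the eigenvalue $(a-b)/2$ from the rest of the spectrum of $\mathbb{E}A$, and $C$ is an unspecified absolute constant. The form of the Davis-Kahan bound quoted is a common one and not necessarily the form the authors use.

```latex
% Expected adjacency matrix of the balanced two-community model (diagonal ignored):
\mathbb{E}A \;=\; \frac{a+b}{2n}\,\mathbf{1}\mathbf{1}^{T} \;+\; \frac{a-b}{2n}\, s s^{T},
\qquad \mathbf{1}^{T} s = 0 .

% Spectrum: two nonzero eigenvalues, all remaining eigenvalues are zero.
\mathbb{E}A\,\mathbf{1} = \frac{a+b}{2}\,\mathbf{1}, \qquad
\mathbb{E}A\, s = \frac{a-b}{2}\, s, \qquad
\mathbb{E}A\, x = 0 \ \text{ for } x \perp \mathbf{1},\, s .

% Hence v_2(\mathbb{E}A) = s/\sqrt{n}, with entries \pm 1/\sqrt{n}.

% Davis-Kahan type bound for unit eigenvectors (sign ambiguity \theta):
\min_{\theta = \pm 1} \bigl\| v_2(A') - \theta\, v_2(\mathbb{E}A) \bigr\|_2
  \;\le\; \frac{C\,\|A' - \mathbb{E}A\|}{\delta} .

% Each vertex with a wrong sign contributes at least 1/n to the squared distance:
\#\bigl\{\, i : \operatorname{sign} v_2(A')_i \neq \theta\, s_i \,\bigr\}
  \;\le\; n\, \bigl\| v_2(A') - \theta\, v_2(\mathbb{E}A) \bigr\|_2^{2}
  \;\le\; n\, \frac{C^{2}\,\|A' - \mathbb{E}A\|^{2}}{\delta^{2}} .
```

Read this way, a concentration bound on $\|A' - \mathbb{E}A\|$ that is small relative to $\delta$ translates directly into a small fraction of misclassified vertices.
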

Corollary 6.1 (Partial recovery from a regularized adjacency matrix for sparse graphs).
Let $\varepsilon > 0$ and $r \ge 1$. Let A be the adjacency matrix drawn from the stochastic block
model $G(n, \frac{a}{n}, \frac{b}{n})$. Assume that

$$(a - b)^2 > C (a + b)$$

where C is a constant depending only on $\varepsilon$ and r. For all nodes with degrees larger
than 2a, reduce the weights of the edges incident to them in an arbitrary way, but so
that all degrees of the new (weighted) network become bounded by 2a, resulting in a
new matrix $A'$. Then with probability at least $1 - e^{-r}$, the signs of the entries of the
eigenvector corresponding to the second largest eigenvalue of $A'$ correctly estimate
the partition into two communities, up to at most $\varepsilon n$ misclassified vertices.
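
The corollary leaves the choice of weight reduction open ("in an arbitrary way"). One possible choice, shown here purely as a sketch, scales edge (i, j) by min(s_i, s_j), where s_i = min(1, d_max / d_i) and d_i is the degree of node i. The function name `reduce_heavy_degrees` and the parameter `max_degree` are hypothetical; in the setting of the corollary one would take `max_degree = 2 * a`. The toy graph below is an assumed example chosen so that the effect is easy to check by hand.

```python
import numpy as np

def reduce_heavy_degrees(A, max_degree):
    """Shrink edge weights at heavy nodes so every weighted degree is at most max_degree.

    This is one admissible regularization among many, not the authors' specific choice.
    """
    degrees = A.sum(axis=1)
    # s_i = min(1, max_degree / d_i); isolated nodes get s_i = 1 (the guard avoids 0-division).
    scale = np.minimum(1.0, max_degree / np.maximum(degrees, 1e-12))
    # Edge (i, j) is multiplied by min(s_i, s_j): the result stays symmetric, and the
    # new degree of node i is at most s_i * d_i <= max_degree.
    return A * np.minimum(scale[:, None], scale[None, :])

# Toy graph: node 0 is a hub joined to nodes 1..10, plus one extra edge (10, 11).
n = 12
A = np.zeros((n, n))
A[0, 1:11] = 1.0
A[10, 11] = 1.0
A = A + A.T

A_reg = reduce_heavy_degrees(A, max_degree=4.0)
print("degrees before:", A.sum(axis=1))
print("degrees after: ", A_reg.sum(axis=1))
```

Only edges touching a node whose degree exceeds the cap are modified, so edges between two low-degree nodes keep their original weight.
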

Corollary 6.2 (Partial recovery from a regularized Laplacian for sparse graphs). Let
$\varepsilon > 0$ and $r \ge 1$. Let A be the adjacency matrix drawn from the stochastic block model
$G(n, \frac{a}{n}, \frac{b}{n})$. Assume that

$$(a - b)^2 > C (a + b) \tag{6-1}$$

where C is a constant depending only on $\varepsilon$ and r. Choose $\tau$ to be the average degree
of the graph, i.e. $\tau = (d_1 + \cdots + d_n)/n$. Then with probability at least $1 - e^{-r}$, the
signs of the entries of the eigenvector corresponding to the second largest eigenvalue
of $L(A_\tau)$ correctly estimate the partition into the two communities, up to at most $\varepsilon n$
misclassified vertices.
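
A minimal Python sketch of the procedure in this corollary follows. It assumes that the regularized matrix is $A_\tau = A + (\tau/n)\mathbf{1}\mathbf{1}^T$ and that the Laplacian is $L(A) = D^{-1/2} A D^{-1/2}$, which is consistent with the "second largest eigenvalue" phrasing but should be checked against the definitions given earlier in the paper. The function names, the use of NumPy, and the demo parameters are assumptions made for illustration.

```python
import numpy as np

def regularized_laplacian(A, tau):
    """Return D_tau^{-1/2} A_tau D_tau^{-1/2} with A_tau = A + (tau / n) * 1 1^T."""
    n = A.shape[0]
    A_tau = A + tau / n                # broadcasting adds tau / n to every entry
    d_tau = A_tau.sum(axis=1)          # regularized degrees d_i + tau (needs tau > 0)
    inv_sqrt = 1.0 / np.sqrt(d_tau)
    return inv_sqrt[:, None] * A_tau * inv_sqrt[None, :]

def laplacian_sign_labels(A, tau=None):
    """Label nodes by the sign of the eigenvector for the second largest eigenvalue."""
    if tau is None:
        tau = A.sum() / A.shape[0]     # average degree (d_1 + ... + d_n) / n
    L = regularized_laplacian(A, tau)
    _, eigvecs = np.linalg.eigh(L)     # eigenvalues are returned in ascending order
    return np.where(eigvecs[:, -2] >= 0, 1, -1)

# Demo on a sparse sample from G(n, a/n, b/n); the parameter values are illustrative only.
rng = np.random.default_rng(1)
n, a, b = 600, 15.0, 2.0
truth = np.where(np.arange(n) < n // 2, 1, -1)
probs = np.where(truth[:, None] == truth[None, :], a / n, b / n)
upper = np.triu(rng.random((n, n)) < probs, k=1)
A = (upper | upper.T).astype(float)

est = laplacian_sign_labels(A)
errors = int(np.sum(est != truth))
print("misclassified:", min(errors, n - errors))
```

Because every entry of $A_\tau$ is positive, all regularized degrees are positive even for isolated nodes, which is what keeps the normalization well defined in the sparse regime.
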

As we have discussed, the Laplacian is typically preferred over the adjacency matrix
in practice, because the variation in node degrees is reduced by the normalization
factor $D^{-1/2}$ Sarkar and Bickel [2015]. Figure 2 shows the effect of regularization for
the Laplacian of a random network generated from $G(n, \frac{a}{n}, \frac{b}{n})$ with n = 50, a = 5 and
b = 0.1. For plotting purposes, we order the nodes so that the first n/2 nodes belong
to one community and the rest belong to the other community. Without regularization,
the two leading eigenvectors of the Laplacian localize around a few low degree nodes,
and therefore do not contain any information about the global community structure. In
contrast, the second leading eigenvector of the regularized Laplacian (with $\tau = 0.1\bar{d}$)
clearly reflects the communities, and the signs of this eigenvector alone recover
community labels correctly for all but three nodes.

Figure 2: Three leading eigenvectors (from top to bottom) of the Laplacian (left)
and the regularized Laplacian (right). The network is generated from $G(n, \frac{a}{n}, \frac{b}{n})$
with n = 50, a = 5 and b = 0.1. Nodes are labeled so that the first 25 nodes
belong to one community and the rest to the other community. Regularized Laplacian
is computed from $A + \frac{0.1\bar{d}}{n}\mathbf{1}\mathbf{1}^T$.

7 Discussion
Great progress has been made in recent years, and yet many problems remain open.
Open questions on community detection under the SBM, in terms of exact and partial
recovery and efficient (polynomial time) algorithms, are discussed in Abbe [2017], and
likely by the time this paper comes out in print, some of them will have been solved.
Yet the focus on the SBM is unsatisfactory for many practitioners, since not many real
networks fit this model well. Some of the more general models we discussed in Sec-
tion 2 fix some of the problems of the SBM, allowing for heterogeneous degree distri-
butions and overlapping communities, for instance. A bigger problem lies in the fixed
K regime; it is not realistic to assume that as the size of the network grows, the number
of communities remains fixed. A more realistic model is the “small world” scenario,
where the size of communities remains bounded or grows very slowly with the number
of nodes, the number of communities grows, and connections between many smaller
communities happen primarily through hub nodes. Some consistency results have been
obtained for a growing K, but we are not aware of any results in the sparse constant
degree regime so far. An even bigger problem is presented by the so far nearly uni-
versal assumption of independent edges; this assumption violates commonly observed
transitivity of friendships (if A is friends with B and B is friends with C, A is more
likely to be friends with C). There are other types of network models that do not rely on
this assumption, but hardly any random matrix results apply there. Ultimately, network
analysis involves a lot more than community detection: link prediction, network de-
noising, predicting outcomes on networks, dynamic network modeling over time, and
so on. We are a long way away from establishing rigorous theoretical guarantees for
any of these problems to the extent that we have for community detection, but given
how rapid progress in the latter area has been, we are hopeful that continued interest
from the random matrix community will help shed light on other problems in network
analysis.

References
Emmanuel Abbe (Mar. 2017). “Community detection and stochastic block models: re-
cent developments”. arXiv: 1703.10146 (cit. on pp. 2928, 2935, 2938).
Emmanuel Abbe, Afonso S. Bandeira, and Georgina Hall (2016). “Exact recovery in
the stochastic block model”. IEEE Trans. Inform. Theory 62.1, pp. 471–487. arXiv:
1405.3267. MR: 3447993 (cit. on p. 2935).
Edoardo M Airoldi, David M Blei, Stephen E Fienberg, and Eric P Xing (2008). “Mixed
membership stochastic blockmodels”. Journal of Machine Learning Research 9.Sep,
pp. 1981–2014 (cit. on p. 2927).
David J. Aldous (1981). “Representations for partially exchangeable arrays of random
variables”. J. Multivariate Anal. 11.4, pp. 581–598. MR: 637937 (cit. on p. 2927).
Arash A. Amini, Aiyou Chen, Peter J. Bickel, and Elizaveta Levina (2013). “Pseudo-
likelihood methods for community detection in large sparse networks”. Ann. Statist.
41.4, pp. 2097–2122. MR: 3127859 (cit. on pp. 2934, 2936).
Z. D. Bai and Y. Q. Yin (1988). “Necessary and sufficient conditions for almost sure
convergence of the largest eigenvalue of a Wigner matrix”. Ann. Probab. 16.4, pp. 1729–
1741. MR: 958213 (cit. on p. 2928).
B. Ball, B. Karrer, and M. E. J. Newman (2011). “An efficient and principled method
for detecting communities in networks”. Physical Review E 84, p. 036103 (cit. on
p. 2927).
Afonso S. Bandeira and Ramon van Handel (2016). “Sharp nonasymptotic bounds on
the norm of random matrices with independent entries”. Ann. Probab. 44.4, pp. 2479–
2506. MR: 3531673 (cit. on pp. 2929, 2931).
Florent Benaych-Georges, Charles Bordenave, and Antti Knowles (Apr. 2017a). “Largest
eigenvalues of sparse inhomogeneous Erdős–Rényi graphs”. arXiv: 1704.02953
(cit. on p. 2931).
– (Apr. 2017b). “Spectral radii of sparse random matrices”. arXiv: 1704.02945 (cit.
on p. 2930).
Rajendra Bhatia (1997). Matrix analysis. Vol. 169. Graduate Texts in Mathematics.
Springer-Verlag, New York, pp. xii+347. MR: 1477662 (cit. on p. 2937).
Peter J Bickel and Aiyou Chen (2009). “A nonparametric view of network models and
Newman–Girvan and other modularities”. Proceedings of the National Academy of
Sciences 106.50, pp. 21068–21073 (cit. on p. 2936).
Béla Bollobás, Svante Janson, and Oliver Riordan (2007). “The phase transition in in-
homogeneous random graphs”. Random Structures Algorithms 31.1, pp. 3–122. MR:
2337396 (cit. on p. 2928).
Charles Bordenave, Marc Lelarge, and Laurent Massoulié (2015). “Non-backtracking
spectrum of random graphs: community detection and non-regular Ramanujan graphs”.
In: 2015 IEEE 56th Annual Symposium on Foundations of Computer Science—FOCS
2015. IEEE Computer Soc., Los Alamitos, CA, pp. 1347–1357. arXiv: 1501.06087.
MR: 3473374 (cit. on pp. 2935, 2936).
T. Tony Cai and Xiaodong Li (2015). “Robust and computationally feasible community
detection in the presence of arbitrary outlier nodes”. Ann. Statist. 43.3, pp. 1027–
1059. MR: 3346696 (cit. on p. 2936).
Kamalika Chaudhuri, Fan Chung, and Alexander Tsiatas (2012). “Spectral clustering of
graphs with general degrees in the extended planted partition model”. In: Proceed-
ings of Machine Learning Research, pp. 1–23 (cit. on p. 2934).
P. Chin, A. Rao, and V. Vu (2015). “Stochastic block model and community detection in
the sparse graphs : A spectral algorithm with optimal rate of recovery”. In: Proceed-
ings of Machine Learning Research. Vol. 40, pp. 391–423 (cit. on pp. 2930, 2932,
2936).
Fan R. K. Chung (1997). Spectral graph theory. Vol. 92. CBMS Regional Conference
Series in Mathematics. Published for the Conference Board of the Mathematical
Sciences, Washington, DC; by the American Mathematical Society, Providence, RI,
pp. xii+207. MR: 1421568 (cit. on p. 2933).
Aurelien Decelle, Florent Krzakala, Cristopher Moore, and Lenka Zdeborová (2011).
“Asymptotic analysis of the stochastic block model for modular networks and its
algorithmic applications”. Physical Review E 84.6, p. 066106 (cit. on p. 2935).
P. Erdős and A. Rényi (1959). “On random graphs. I”. Publ. Math. Debrecen 6, pp. 290–
297. MR: 0120167 (cit. on p. 2926).
Uriel Feige and Eran Ofek (2005). “Spectral techniques applied to sparse random graphs”.
Random Structures Algorithms 27.2, pp. 251–275. MR: 2155709 (cit. on pp. 2930–
2932).
Santo Fortunato (2010). “Community detection in graphs”. Phys. Rep. 486.3-5, pp. 75–
174. MR: 2580414 (cit. on p. 2935).
Joel Friedman, Jeff Kahn, and Endre Szemerédi (1989). “On the second eigenvalue of
random regular graphs”. In: Proceedings of the twenty-first annual ACM symposium
on Theory of computing. ACM, pp. 587–598 (cit. on pp. 2930, 2932).
Z. Füredi and J. Komlós (1981). “The eigenvalues of random symmetric matrices”.
Combinatorica 1.3, pp. 233–241. MR: 637828 (cit. on p. 2928).
Chao Gao, Yu Lu, and Harrison H. Zhou (2015). “Rate-optimal graphon estimation”.
Ann. Statist. 43.6, pp. 2624–2652. MR: 3405606 (cit. on p. 2927).
Chao Gao, Zongming Ma, Anderson Y. Zhang, and Harrison H. Zhou (2017). “Achiev-
ing optimal misclassification proportion in stochastic block models”. J. Mach. Learn.
Res. 18, Paper No. 60, 45. MR: 3687603 (cit. on p. 2934).
Anna Goldenberg, Alice X Zheng, Stephen E Fienberg, Edoardo M Airoldi, et al. (2010).
“A survey of statistical network models”. Foundations and Trends in Machine Learn-
ing 2.2, pp. 129–233 (cit. on p. 2935).
Olivier Guédon and Roman Vershynin (2016). “Community detection in sparse net-
works via Grothendieck’s inequality”. Probab. Theory Related Fields 165.3-4, pp. 1025–
1049. MR: 3520025 (cit. on p. 2936).
Bruce Hajek, Yihong Wu, and Jiaming Xu (2016). “Achieving exact cluster recov-
ery threshold via semidefinite programming”. IEEE Trans. Inform. Theory 62.5,
pp. 2788–2797. MR: 3493879 (cit. on pp. 2929, 2931, 2936).
Ramon van Handel (2017a). “On the spectral norm of Gaussian random matrices”.
Trans. Amer. Math. Soc. 369.11, pp. 8161–8178. MR: 3695857 (cit. on p. 2931).
– (2017b). “Structured random matrices”. Convexity and Concentration 161, pp. 107–
156 (cit. on p. 2931).
Peter D. Hoff, Adrian E. Raftery, and Mark S. Handcock (2002). “Latent space ap-
proaches to social network analysis”. J. Amer. Statist. Assoc. 97.460, pp. 1090–1098.
MR: 1951262 (cit. on p. 2927).
Paul W. Holland, Kathryn Blackmond Laskey, and Samuel Leinhardt (1983). “Stochastic
blockmodels: first steps”. Social Networks 5.2, pp. 109–137. MR: 718088
(cit. on p. 2926).
Douglas N Hoover (1979). “Relations on probability spaces and arrays of random vari-
ables”. Technical report, Institute for Advanced Study, Princeton, NJ 2 (cit. on p. 2927).
Antony Joseph and Bin Yu (2016). “Impact of regularization on spectral clustering”.
Ann. Statist. 44.4, pp. 1765–1791. MR: 3519940 (cit. on pp. 2934, 2936).
Brian Karrer and M. E. J. Newman (2011). “Stochastic blockmodels and community
structure in networks”. Phys. Rev. E (3) 83.1, pp. 016107, 10. MR: 2788206 (cit. on
p. 2927).
Michael Krivelevich and Benny Sudakov (2003). “The largest eigenvalue of sparse
random graphs”. Combin. Probab. Comput. 12.1, pp. 61–72. MR: 1967486 (cit. on
p. 2931).
Can M. Le, Elizaveta Levina, and Roman Vershynin (2017). “Concentration and reg-
ularization of random graphs”. Random Structures Algorithms 51.3, pp. 538–561.
MR: 3689343 (cit. on pp. 2932, 2933).
Michel Ledoux and Michel Talagrand (1991). Probability in Banach spaces. Vol. 23.
Ergebnisse der Mathematik und ihrer Grenzgebiete (3) [Results in Mathematics and
Related Areas (3)]. Isoperimetry and processes. Springer-Verlag, Berlin, pp. xii+480.
MR: 1102015 (cit. on p. 2932).
Jing Lei and Alessandro Rinaldo (2015). “Consistency of spectral clustering in stochas-
tic block models”. Ann. Statist. 43.1, pp. 215–237. MR: 3285605 (cit. on pp. 2930,
2932).
Tianxi Li, Elizaveta Levina, and Ji Zhu (Dec. 2016). “Network cross-validation by edge
sampling”. arXiv: 1612.04717 (cit. on p. 2934).
Laurent Massoulié (2014). “Community detection thresholds and the weak Ramanujan
property”. In: STOC’14—Proceedings of the 2014 ACM Symposium on Theory of
Computing. ACM, New York, pp. 694–703. MR: 3238997 (cit. on pp. 2935, 2936).
Frank McSherry (2001). “Spectral partitioning of random graphs”. In: 42nd IEEE Sym-
posium on Foundations of Computer Science (Las Vegas, NV, 2001). IEEE Computer
Soc., Los Alamitos, CA, pp. 529–537. MR: 1948742 (cit. on p. 2936).
Andrea Montanari and Subhabrata Sen (2016). “Semidefinite programs on sparse ran-
dom graphs and their application to community detection”. In: STOC’16—Proceedings
of the 48th Annual ACM SIGACT Symposium on Theory of Computing. ACM, New
York, pp. 814–827. MR: 3536616 (cit. on p. 2936).
Elchanan Mossel, Joe Neeman, and Allan Sly (Nov. 2013). “A Proof Of The Block
Model Threshold Conjecture”. arXiv: 1311.4115 (cit. on pp. 2935, 2936).
– (July 2014). “Consistency Thresholds for the Planted Bisection Model”. arXiv: 1407.1591
(cit. on pp. 2935, 2936).
– (2015). “Reconstruction and estimation in the planted partition model”. Probab. The-
ory Related Fields 162.3-4, pp. 431–461. MR: 3383334 (cit. on p. 2935).
Sofia C Olhede and Patrick J Wolfe (2013). “Network histograms and universality
of blockmodel approximation”. Proceedings of the National Academy of Sciences
111.41, pp. 14722–14727 (cit. on p. 2927).
Albrecht Pietsch (1980). Operator ideals. Vol. 20. North-Holland Mathematical Library.
Translated from German by the author. North-Holland Publishing Co., Amsterdam-
New York, p. 451. MR: 582655 (cit. on p. 2932).
Gilles Pisier (1986). Factorization of linear operators and geometry of Banach spaces.
Vol. 60. CBMS Regional Conference Series in Mathematics. Published for the Con-
ference Board of the Mathematical Sciences, Washington, DC; by the American
Mathematical Society, Providence, RI, pp. x+154. MR: 829919 (cit. on p. 2932).
– (2012). “Grothendieck’s theorem, past and present”. Bull. Amer. Math. Soc. (N.S.)
49.2, pp. 237–323. MR: 2888168 (cit. on p. 2932).
Tai Qin and Karl Rohe (2013). “Regularized spectral clustering under the degree-corrected
stochastic blockmodel”. In: Advances in Neural Information Processing Systems,
pp. 3120–3128 (cit. on p. 2934).
Stiene Riemer and Carsten Schütt (2013). “On the expectation of the norm of random
matrices with non-identically distributed entries”. Electron. J. Probab. 18, no. 29,
13. MR: 3035757 (cit. on p. 2931).
Purnamrita Sarkar and Peter J. Bickel (2015). “Role of normalization in spectral clus-
tering for stochastic blockmodels”. Ann. Statist. 43.3, pp. 962–990. MR: 3346694
(cit. on p. 2937).
Yoav Seginer (2000). “The expected norm of random matrices”. Combin. Probab. Com-
put. 9.2, pp. 149–166. MR: 1762786 (cit. on pp. 2929, 2931).
Nicole Tomczak-Jaegermann (1989). Banach-Mazur distances and finite-dimensional
operator ideals. Vol. 38. Pitman Monographs and Surveys in Pure and Applied Math-
ematics. Longman Scientific & Technical, Harlow; copublished in the United States
with John Wiley & Sons, Inc., New York, pp. xii+395. MR: 993774 (cit. on p. 2932).
Van H. Vu (2007). “Spectral norm of random matrices”. Combinatorica 27.6, pp. 721–
736. MR: 2384414 (cit. on p. 2930).
Shuhua Yin (2008). “Investigation on spectrum of the adjacency matrix and Laplacian
matrix of graph $G_l$”. WSEAS Trans. Syst. 7.4, pp. 362–372. MR: 2447295 (cit. on
p. 2934).
Yuan Zhang, Elizaveta Levina, and Ji Zhu (Dec. 2014). “Detecting Overlapping Com-
munities in Networks Using Spectral Methods”. arXiv: 1412.3432 (cit. on p. 2927).
– (2017). “Estimating network edge probabilities by neighbourhood smoothing”. Bio-
metrika 104.4, pp. 771–783. MR: 3737303 (cit. on p. 2927).
Y. Zhao, E. Levina, and J. Zhu (2012). “Consistency of community detection in net-
works under degree-corrected stochastic block models”. Annals of Statistics 40.4,
pp. 2266–2292 (cit. on p. 2927).

Received 2017-12-22.

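Can M. Le
Department of Statistics, University of California, Davis, One Shields Avenue, Davis, CA 95616, U.S.A.
canle@ucdavis.edu

Elizaveta Levina
Department of Statistics, University of Michigan, 1085 S. University Ave, Ann Arbor, MI 48109,
U.S.A.
elevina@umich.edu

Roman Vershynin
University of California Irvine, 340 Rowland Hall, Irvine, CA 92697, U.S.A.
rvershyn@uci.edu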
