DynGEM: Deep Embedding Method for Dynamic Graphs
Abstract

Embedding large graphs in low-dimensional spaces has recently attracted significant interest due to its wide applications such as graph visualization, link prediction and node classification. Existing methods focus on computing the embedding for static graphs. However, many graphs in practical applications are dynamic and evolve constantly over time. Naively applying existing embedding algorithms to each snapshot of dynamic graphs independently usually leads to unsatisfactory performance in terms of stability, flexibility and efficiency. In this work, we present DynGEM, an efficient algorithm based on recent advances in deep autoencoders for graph embeddings, to address this problem. The major advantages of DynGEM include: (1) the embedding is stable over time, (2) it can handle growing dynamic graphs, and (3) it has better running time than using static embedding methods on each snapshot of a dynamic graph. We test DynGEM on a variety of tasks including graph visualization, graph reconstruction, link prediction and anomaly detection (on both synthetic and real datasets). Experimental results demonstrate the superior stability and scalability of our approach.

∗ Nitin Kamra and Palash Goyal contributed equally to this article.

1 Introduction

Many important tasks in network analysis involve making predictions over nodes and/or edges in a graph, which demands effective algorithms for extracting meaningful patterns and constructing predictive features. Among the many attempts towards this goal, graph embedding, i.e., learning a low-dimensional representation for each node in the graph that accurately captures its relationship to other nodes, has recently attracted much attention. It has been demonstrated that graph embedding is superior to alternatives in many supervised learning tasks, such as node classification, link prediction and graph reconstruction [Ahmed et al., 2013; Perozzi et al., 2014; Cao et al., 2015; Tang et al., 2015; Grover and Leskovec, 2016; Ou et al., 2016].

Various approaches have been proposed for static graph embedding [Goyal and Ferrara, 2017]. Examples include SVD-based models [Belkin and Niyogi, 2001; Roweis and Saul, 2000; Tenenbaum et al., 2000; Cao et al., 2015; Ou et al., 2016], which decompose the Laplacian or a high-order adjacency matrix to produce node embeddings. Others include random-walk based models [Grover and Leskovec, 2016; Perozzi et al., 2014], which create embeddings from localized random walks, and many others [Tang et al., 2015; Ahmed et al., 2013; Cao et al., 2016; Niepert et al., 2016]. Recently, Wang et al. designed an innovative model, SDNE, which utilizes a deep autoencoder to handle non-linearity and generate more accurate embeddings [Wang et al., 2016]. Many other methods which handle attributed graphs and generate a unified embedding have also been proposed in the recent past [Chang et al., 2015; Huang et al., 2017a; 2017b].

However, in practical applications, many graphs, such as social networks, are dynamic and evolve over time. For example, new links are formed (when people make new friends) and old links can disappear. Moreover, new nodes can be introduced into the graph (e.g., users can join the social network) and create new links to existing nodes. Usually, we represent a dynamic graph as a collection of snapshots of the graph at different time steps [Leskovec et al., 2007].

Existing works which focus on dynamic embeddings often apply static embedding algorithms to each snapshot of the dynamic graph and then rotationally align the resulting static embeddings across time steps [Hamilton et al., 2016; Kulkarni et al., 2015]. Naively applying existing static embedding algorithms independently to each snapshot leads to unsatisfactory performance due to the following challenges:

• Stability: The embedding generated by static methods is not stable, i.e., the embedding of graphs at consecutive time steps can differ substantially even though the graphs do not change much.

• Growing Graphs: New nodes can be introduced into the graph and create new links to existing nodes as the dynamic graph grows in time. All existing approaches assume a fixed number of nodes in learning graph embeddings and thus cannot handle growing graphs.

• Scalability: Learning embeddings independently for each snapshot leads to running time linear in the number of snapshots. As learning a single embedding is already computationally expensive, the naive approach does not
scale to dynamic networks with many snapshots.

Other approaches have attempted to learn embeddings of dynamic graphs by explicitly imposing a temporal regularizer to ensure temporal smoothness over embeddings of consecutive snapshots [Zhu et al., 2016]. This approach fails for dynamic graphs where consecutive time steps can differ significantly, and hence cannot be used for applications like anomaly detection. Moreover, their approach is a Graph Factorization (abbreviated as GF hereafter) [Ahmed et al., 2013] based model, and DynGEM outperforms these models as shown by our experiments in section 5. [Dai et al., 2017] learn embeddings of dynamic graphs, although they focus on bipartite graphs, specifically for user-item interactions.

In this paper, we develop an efficient graph embedding algorithm, referred to as DynGEM, to generate stable embeddings of dynamic graphs. DynGEM employs a deep autoencoder at its core and leverages recent advances in deep learning to generate highly non-linear embeddings. Instead of learning the embedding of each snapshot from scratch, DynGEM incrementally builds the embedding of the snapshot at time t from the embedding of the snapshot at time t − 1. Specifically, we initialize the embedding from the previous time step and then carry out gradient training. This approach not only ensures stability of embeddings across time, but also leads to efficient training, as all embeddings after the first time step require very few iterations to converge. To handle dynamic graphs with a growing number of nodes, we incrementally grow the size of our neural network with our heuristic, PropSize, to dynamically determine the number of hidden units required for each snapshot. In addition to the proposed model, we also introduce rigorous stability metrics for dynamic graph embeddings.

On both synthetic and real-world datasets, experimental results demonstrate that our approach achieves similar or better accuracy in graph reconstruction and link prediction, more efficiently than existing static approaches. DynGEM is also applicable to dynamic graph visualization and anomaly detection, which are not feasible with many previous static embedding approaches.

2 Definitions and Preliminaries

We denote a weighted graph as G(V, E), where V is the vertex set and E is the edge set. The weighted adjacency matrix of G is denoted by S. If (u, v) ∈ E, we have s_{uv} > 0 denoting the weight of edge (u, v); otherwise s_{uv} = 0. We use s_i = [s_{i,1}, · · · , s_{i,|V|}] to denote the i-th row of the adjacency matrix.

Given a graph G = (V, E), a graph embedding is a mapping f : V → R^d, namely y_v = f(v) ∀v ∈ V. We require that d ≪ |V| and that the function f preserves some proximity measure defined on the graph G. Intuitively, if two nodes u and v are "similar" in graph G, their embeddings y_u and y_v should be close to each other in the embedding space. We use the notation f(G) ∈ R^{|V|×d} for the embedding matrix of all nodes in the graph G.

In this paper, we consider the problem of dynamic graph embedding. We represent a dynamic graph G as a series of snapshots, i.e., G = {G_1, · · · , G_T}, where G_t = (V_t, E_t) and T is the number of snapshots. We consider the setting with growing graphs, i.e., V_t ⊆ V_{t+1}: new nodes can join the dynamic graph and create links to existing nodes, and we treat deleted nodes as part of the graph with zero weights to the rest of the nodes. We assume no relationship between E_t and E_{t+1}; new edges can form between snapshots and existing edges can disappear.

A dynamic graph embedding extends the concept of embedding to dynamic graphs. Given a dynamic graph G = {G_1, · · · , G_T}, a dynamic graph embedding is a time-series of mappings F = {f_1, · · · , f_T} such that the mapping f_t is a graph embedding for G_t and all mappings preserve the proximity measure for their respective graphs.

A successful dynamic graph embedding algorithm should create stable embeddings over time. Intuitively, a stable dynamic embedding is one in which consecutive embeddings differ only by small amounts if the underlying graphs change only a little, i.e., if G_{t+1} does not differ from G_t by much, the embedding outputs Y_{t+1} = f_{t+1}(G_{t+1}) and Y_t = f_t(G_t) also change only by a small amount.

To be more specific, let S_t(Ṽ) be the weighted adjacency matrix of the induced subgraph of a node set Ṽ ⊆ V_t, and let F_t(Ṽ) ∈ R^{|Ṽ|×d} be the embedding of all nodes in Ṽ ⊆ V_t in snapshot t. We define the absolute stability as

$$S_{abs}(F; t) = \frac{\|F_{t+1}(V_t) - F_t(V_t)\|_F}{\|S_{t+1}(V_t) - S_t(V_t)\|_F}.$$

In other words, the absolute stability of an embedding F is the ratio of the change in the embeddings to the change in the adjacency matrices. Since this definition of stability depends on the sizes of the matrices involved, we define another measure, called relative stability, which is invariant to the sizes of the adjacency and embedding matrices:

$$S_{rel}(F; t) = \frac{\|F_{t+1}(V_t) - F_t(V_t)\|_F \,/\, \|F_t(V_t)\|_F}{\|S_{t+1}(V_t) - S_t(V_t)\|_F \,/\, \|S_t(V_t)\|_F}.$$

We further define the stability constant:

$$K_S(F) = \max_{\tau, \tau'} \left| S_{rel}(F; \tau) - S_{rel}(F; \tau') \right|.$$

We say that a dynamic embedding F is stable as long as it has a small stability constant. Clearly, the smaller K_S(F) is, the more stable the embedding F is. In our experiments, we use the stability constant as the metric to compare the stability of our DynGEM algorithm to other baselines.
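For concreteness, the following NumPy sketch shows one way these stability measures could be computed. The function names, the dense-matrix representation, and the assumption that new nodes are appended as the trailing rows of F_{t+1} and S_{t+1} are our own illustrative choices, not part of the paper's released code.

```python
import numpy as np

def relative_stability(F_t, F_t1, S_t, S_t1):
    """S_rel(F; t): relative change in embeddings divided by relative
    change in adjacency matrices, both restricted to the nodes V_t
    present at time t (assumed to occupy the leading rows/columns)."""
    n = F_t.shape[0]  # |V_t|; nodes added at t+1 are assumed appended at the end
    emb_change = np.linalg.norm(F_t1[:n] - F_t, 'fro') / np.linalg.norm(F_t, 'fro')
    adj_change = np.linalg.norm(S_t1[:n, :n] - S_t, 'fro') / np.linalg.norm(S_t, 'fro')
    return emb_change / adj_change

def stability_constant(embeddings, adjacencies):
    """K_S(F): largest pairwise gap between per-step relative stabilities.
    max - min over the series equals max_{tau,tau'} |S_rel(tau) - S_rel(tau')|."""
    s_rel = [relative_stability(embeddings[t], embeddings[t + 1],
                                adjacencies[t], adjacencies[t + 1])
             for t in range(len(embeddings) - 1)]
    return max(s_rel) - min(s_rel)
```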
3 DynGEM: Dynamic Graph Embedding Model

Recent advances in deep unsupervised learning have shown that autoencoders can successfully learn very complex low-dimensional representations of data for various tasks [Bengio et al., 2013]. DynGEM uses a deep autoencoder to map the input data to a highly nonlinear latent space and capture the connectivity trends in a graph snapshot at any time step. The model is semi-supervised and minimizes a combination of two objective functions corresponding to the first-order and second-order proximities, respectively. The autoencoder model is shown in Figure 1, and the terminology used is summarized in the table below.
[Figure 1: The deep autoencoder of DynGEM. Inputs x_i, x_j (the neighborhood vectors s_i, s_j) are encoded through hidden layers y^(1), ..., y^(K) into embeddings y_i^(K), y_j^(K) and decoded to reconstructions x̂_i, x̂_j; L_glob penalizes reconstruction error, L_loc penalizes the embedding distance between connected nodes, and the parameters θ_t are initialized from θ_{t−1}.]

Symbol                               Definition
------                               ----------
n                                    number of vertices
K                                    number of layers
S = {s_1, · · · , s_n}               adjacency matrix of G
X = {x_i}, i ∈ [n]                   input data
X̂ = {x̂_i}, i ∈ [n]                   reconstructed data
Y^(k) = {y_i^(k)}, i ∈ [n]           hidden layers
Y = Y^(K)                            embedding
θ = {W^(k), Ŵ^(k), b^(k), b̂^(k)}     weights, biases
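The two objectives are inherited from SDNE [Wang et al., 2016], on which DynGEM builds. As a rough illustration, here is a PyTorch-style sketch of such a network and its combined loss; it is a minimal reconstruction under our own assumptions (layer sizes, ReLU activations, the beta upweighting of observed edges, and the hyperparameters alpha and beta are illustrative, and dyngem_loss is our hypothetical name, not the paper's code).

```python
import torch
import torch.nn as nn

class GraphAutoencoder(nn.Module):
    """Minimal sketch of the Figure 1 autoencoder (layer sizes illustrative)."""

    def __init__(self, n_nodes, hidden=(500, 300), d=100):
        super().__init__()
        sizes = [n_nodes, *hidden, d]
        enc, dec = [], []
        for a, b in zip(sizes[:-1], sizes[1:]):          # n -> ... -> d
            enc += [nn.Linear(a, b), nn.ReLU()]
        rev = sizes[::-1]
        for a, b in zip(rev[:-1], rev[1:]):              # d -> ... -> n
            dec += [nn.Linear(a, b), nn.ReLU()]
        self.encoder = nn.Sequential(*enc)
        self.decoder = nn.Sequential(*dec)

    def forward(self, x):
        y = self.encoder(x)            # embedding y^(K)
        return y, self.decoder(y)      # embedding and reconstruction x_hat

def dyngem_loss(model, s_i, s_j, w_ij, alpha=1e-5, beta=5.0):
    """L = L_glob + alpha * L_loc on a batch of node pairs (i, j).
    s_i, s_j: rows of the adjacency matrix (neighborhood vectors);
    w_ij: edge weights s_ij for the sampled pairs."""
    y_i, xhat_i = model(s_i)
    y_j, xhat_j = model(s_j)
    # Second-order proximity (L_glob): reconstruct each neighborhood,
    # upweighting observed edges by beta as in SDNE.
    b_i = torch.where(s_i > 0, torch.full_like(s_i, beta), torch.ones_like(s_i))
    b_j = torch.where(s_j > 0, torch.full_like(s_j, beta), torch.ones_like(s_j))
    l_glob = (((xhat_i - s_i) * b_i) ** 2).sum() + (((xhat_j - s_j) * b_j) ** 2).sum()
    # First-order proximity (L_loc): connected nodes embed close together.
    l_loc = (w_ij * ((y_i - y_j) ** 2).sum(dim=1)).sum()
    return l_glob + alpha * l_loc
```

At snapshot t + 1 the same network would be reloaded with θ_t (widened with the PropSize heuristic if the graph has grown) and fine-tuned for a few iterations, which is what makes later snapshots cheap to train.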
[Figure 3: Results of anomaly detection on Enron and visualization of embeddings for weeks 93, 94 and 101 on the Enron dataset.]

… known community structure. We apply t-SNE [Maaten and Hinton, 2008] to the embedding generated by DynGEM at each time step to plot the resulting 2D embeddings. To avoid instability of the visualization over time steps, we initialize t-SNE with an identical random state for all time steps.
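As a reference point, the visualization step described here could be written with scikit-learn as below; the fixed random_state plays the role of the identical random state above, while the embedding matrix, community labels, and marker size are placeholders.

```python
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

def plot_snapshot(Y_t, communities, seed=42):
    """Project a d-dimensional snapshot embedding Y_t to 2D with a fixed
    random state so that consecutive plots stay visually comparable."""
    pts = TSNE(n_components=2, random_state=seed).fit_transform(Y_t)
    plt.scatter(pts[:, 0], pts[:, 1], c=communities, s=8)
    plt.show()
```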
Figure 2 illustrates the results of 2D visualization of the 100-dimensional embeddings for the SYN dataset when nodes change their communities over a single time step. The left (right) plot in each subfigure shows the embedding before (after) the nodes change their communities. A point in any plot represents the embedding of a node in the graph, with the color indicating the node's community. Small (big) points are nodes which didn't (did) change communities. Each big point is colored according to its final community color.

We observe that the DynGEM embeddings of the nodes which changed communities follow the changes in community structure accurately, without disturbing the embeddings of the other nodes, even when the fraction of such nodes is very high (see figure 2b, where 30% of the nodes change communities). This strongly demonstrates the stability of our technique.

5.5 Application to Anomaly Detection

Anomaly detection is an important application for detecting malicious activity in networks. We apply DynGEM on the Enron dataset to detect anomalies and compare our results with the publicly known events concerning the company, as observed by [Sun et al., 2007].

We define ∆_t as the change in embedding between times t and t + 1, ∆_t = ‖F_{t+1}(V_t) − F_t(V_t)‖_F, and this quantity can be thresholded to detect anomalies.
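Flagging anomalous snapshots from this definition is a simple thresholding exercise. The sketch below is illustrative only: the paper does not specify a threshold, so the median-based rule and the assumption that new nodes occupy the trailing rows of F_{t+1} are our own.

```python
import numpy as np

def detect_anomalies(embeddings, factor=3.0):
    """Flag time steps whose embedding change delta_t is unusually large.
    embeddings: list of F_t matrices; rows of F_{t+1} beyond |V_t| are new nodes."""
    deltas = []
    for F_t, F_t1 in zip(embeddings, embeddings[1:]):
        n = F_t.shape[0]  # restrict to nodes V_t present at time t
        deltas.append(np.linalg.norm(F_t1[:n] - F_t, 'fro'))
    deltas = np.array(deltas)
    threshold = factor * np.median(deltas)  # illustrative threshold choice
    return np.where(deltas > threshold)[0] + 1  # indices of anomalous snapshots
```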
The plot of ∆_t over time on the Enron dataset is shown in Figure 3. In the figure, we see three major spikes around weeks 45, 55 and 94, which correspond to Feb 2001, June 2001 and Jan 2002. These months were associated with the following events: Jeffrey Skilling took over as CEO in Feb 2001; Rove divested his stocks in energy in June 2001; and the CEO resignation and the FBI crime investigation began in Jan 2002. We also observe some peaks leading up to each of these time frames, which indicate the onset of these events. Figure 3 also shows embedding visualizations around week 94. A spread-out embedding can be observed for weeks 93 and 101, corresponding to low communication among employees. On the contrary, the volume of communication grew significantly in week 94 (shown by the highly compact embedding).

5.6 Effect of Layer Expansion

We evaluate the effect of layer expansion on the HEP-TH dataset. For this purpose, we run our model DynGEM with and without layer expansion. We observe that without layer expansion, the model achieves an average MAP of 0.46 and 0.19 for graph reconstruction and link prediction, respectively. Note that this is significantly lower than the performance of DynGEM with layer expansion, which obtains 0.491 and 0.26 on the respective tasks. Also note that for SDNE and SDNEalign, we select the best model at each time step. Using the PropSize heuristic obviates this need and automatically selects a good neural network size for subsequent time steps.

5.7 Scalability

We now compare the time taken to learn the different embedding models. From Table 6, we observe that DynGEM is significantly faster than SDNEalign. We do not compare it with Graph Factorization based methods because, although fast, they are vastly outperformed by the deep autoencoder based models. Assuming n_s iterations to learn a single snapshot embedding from scratch and n_i iterations to learn embeddings when initialized with the previous time step's embeddings, the expected speedup for a dynamic graph of length T is T n_s / (n_s + (T − 1) n_i), ignoring other overheads. We compare the observed speedup with this expected speedup. In Table 7, we show that our model achieves a speedup closer to the expected speedup as the number of graph snapshots increases, due to the diminished effect of overhead computations (e.g., saving, loading, expansion and initialization of the model, weights and the embedding). Our experimental results show that DynGEM achieves a consistent 2-3X speedup across a variety of different networks.
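To make the expected-speedup formula concrete with purely illustrative numbers: if a snapshot needs n_s = 30 iterations from scratch but only n_i = 10 iterations when warm-started, then for T = 10 snapshots the expected speedup is (10 × 30) / (30 + 9 × 10) = 300/120 = 2.5; as T grows the speedup approaches n_s / n_i = 3, consistent with the 2-3X range observed above.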
6 Conclusion

In this paper, we propose DynGEM, a fast and efficient algorithm to construct stable embeddings for dynamic graphs. It uses a dynamically expanding deep autoencoder to capture highly nonlinear first-order and second-order proximities of the graph nodes. Moreover, our model utilizes information from previous time steps to speed up the training process by incrementally learning embeddings at each time step. Our experiments demonstrate the stability of our technique across time and show that our method maintains its competitiveness on all evaluation tasks, e.g., graph reconstruction, link prediction and visualization. We showed that DynGEM preserves community structures accurately, even when a large fraction of nodes (∼30%) change communities across time steps. We also applied our technique to successfully detect anomalies, which is a novel application of dynamic graph embedding. DynGEM shows great potential for many other graph inference applications such as node classification and clustering, which we leave as future work.

There are several directions for future work. Our algorithm ensures stability by initializing from the weights learned at the previous time step. We plan to extend it to incorporate the stability metric explicitly, with modifications ensuring satisfactory performance on anomaly detection. We also hope to provide theoretical insight into the model and obtain bounds on its performance.

Acknowledgments

This work is supported in part by NSF Research Grants IIS-1254206 and IIS-1619458. The views and conclusions are those of the authors and should not be interpreted as representing the official policies of the funding agency or the U.S. Government. The work was also supported in part by a USC Viterbi Graduate PhD Fellowship.

References

[Ahmed et al., 2013] Amr Ahmed, Nino Shervashidze, Shravan Narayanamurthy, Vanja Josifovski, and Alexander J Smola. Distributed large-scale natural graph factorization. In 22nd Intl. World Wide Web Conference, pages 37–48, 2013.

[Belkin and Niyogi, 2001] Mikhail Belkin and Partha Niyogi. Laplacian eigenmaps and spectral techniques for embedding and clustering. In Proc. 13th Advances in Neural Information Processing Systems, pages 585–591, 2001.

[Bengio et al., 2013] Yoshua Bengio, Aaron Courville, and Pascal Vincent. Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8):1798–1828, 2013.

[Cao et al., 2015] Shaosheng Cao, Wei Lu, and Qiongkai Xu. GraRep: Learning graph representations with global structural information. In Proc. 21st Intl. Conf. on Knowledge Discovery and Data Mining, pages 891–900, 2015.

[Cao et al., 2016] Shaosheng Cao, Wei Lu, and Qiongkai Xu. Deep neural networks for learning graph representations. In AAAI, pages 1145–1152, 2016.

[Chang et al., 2015] Shiyu Chang, Wei Han, Jiliang Tang, Guo-Jun Qi, Charu C Aggarwal, and Thomas S Huang. Heterogeneous network embedding via deep architectures. In Proc. 21st Intl. Conf. on Knowledge Discovery and Data Mining, pages 119–128, 2015.

[Chen et al., 2015] Tianqi Chen, Ian Goodfellow, and Jonathon Shlens. Net2Net: Accelerating learning via knowledge transfer. arXiv preprint arXiv:1511.05641, 2015.

[Dai et al., 2017] Hanjun Dai, Yichen Wang, Rakshit Trivedi, and Le Song. Deep coevolutionary network: Embedding user and item features for recommendation. 2017.

[Gehrke et al., 2003] Johannes Gehrke, Paul Ginsparg, and Jon Kleinberg. Overview of the 2003 KDD Cup. ACM SIGKDD Explorations Newsletter, 5(2):149–151, 2003.

[Glorot et al., 2011] Xavier Glorot, Antoine Bordes, and Yoshua Bengio. Deep sparse rectifier neural networks. In Proc. 14th Intl. Conf. on Artificial Intelligence and Statistics, page 275, 2011.

[Goyal and Ferrara, 2017] Palash Goyal and Emilio Ferrara. Graph embedding techniques, applications, and performance: A survey. arXiv preprint arXiv:1705.02801, 2017.

[Grover and Leskovec, 2016] Aditya Grover and Jure Leskovec. node2vec: Scalable feature learning for networks. In Proc. 22nd Intl. Conf. on Knowledge Discovery and Data Mining, pages 855–864, 2016.

[Hamilton et al., 2016] William L Hamilton, Jure Leskovec, and Dan Jurafsky. Diachronic word embeddings reveal statistical laws of semantic change. arXiv preprint arXiv:1605.09096, 2016.
[Huang et al., 2017a] Xiao Huang, Jundong Li, and Xia Hu. Accelerated attributed network embedding. In Proc. 2017 SIAM Intl. Conf. on Data Mining, pages 633–641, 2017.

[Huang et al., 2017b] Xiao Huang, Jundong Li, and Xia Hu. Label informed attributed network embedding. In Proc. 10th Intl. Conf. on Web Search and Data Mining, pages 731–739, 2017.

[Klimt and Yang, 2004] Bryan Klimt and Yiming Yang. The Enron corpus: A new dataset for email classification research. In European Conference on Machine Learning, pages 217–226, 2004.

[Kulkarni et al., 2015] Vivek Kulkarni, Rami Al-Rfou, Bryan Perozzi, and Steven Skiena. Statistically significant detection of linguistic change. In 24th Intl. World Wide Web Conference, pages 625–635, 2015.

[Leskovec and Krevl, 2014] Jure Leskovec and Andrej Krevl. SNAP Datasets: Stanford large network dataset collection, 2014.

[Leskovec et al., 2007] Jure Leskovec, Jon Kleinberg, and Christos Faloutsos. Graph evolution: Densification and shrinking diameters. ACM Transactions on Knowledge Discovery from Data (TKDD), 1(1):2, 2007.

[Maaten and Hinton, 2008] Laurens van der Maaten and Geoffrey Hinton. Visualizing high-dimensional data using t-SNE. Journal of Machine Learning Research, 9(Nov):2579–2605, 2008.

[Niepert et al., 2016] Mathias Niepert, Mohamed Ahmed, and Konstantin Kutzkov. Learning convolutional neural networks for graphs. In International Conference on Machine Learning, pages 2014–2023, 2016.

[Ou et al., 2016] Mingdong Ou, Peng Cui, Jian Pei, Ziwei Zhang, and Wenwu Zhu. Asymmetric transitivity preserving graph embedding. In Proc. 22nd Intl. Conf. on Knowledge Discovery and Data Mining, pages 1105–1114, 2016.

[Park et al., 2009] Youngser Park, C Priebe, D Marchette, and Abdou Youssef. Anomaly detection using scan statistics on time series hypergraphs. In Link Analysis, Counterterrorism and Security (LACTS) Conference, page 9, 2009.

[Perozzi et al., 2014] Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. DeepWalk: Online learning of social representations. In Proc. 20th Intl. Conf. on Knowledge Discovery and Data Mining, pages 701–710, 2014.

[Roweis and Saul, 2000] Sam T Roweis and Lawrence K Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500):2323–2326, 2000.

[Sun et al., 2007] Jimeng Sun, Christos Faloutsos, Spiros Papadimitriou, and Philip S Yu. GraphScope: Parameter-free mining of large time-evolving graphs. In Proc. 13th Intl. Conf. on Knowledge Discovery and Data Mining, pages 687–696, 2007.

[Sutskever et al., 2013] Ilya Sutskever, James Martens, George E Dahl, and Geoffrey E Hinton. On the importance of initialization and momentum in deep learning. In Proc. 30th Intl. Conf. on Machine Learning, pages 1139–1147, 2013.

[Tang et al., 2015] Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. LINE: Large-scale information network embedding. In 24th Intl. World Wide Web Conference, pages 1067–1077, 2015.

[Tenenbaum et al., 2000] Joshua B Tenenbaum, Vin de Silva, and John C Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500):2319–2323, 2000.

[Wang and Wong, 1987] Yuchung J Wang and George Y Wong. Stochastic blockmodels for directed graphs. Journal of the American Statistical Association, 82(397):8–19, 1987.

[Wang et al., 2016] Daixin Wang, Peng Cui, and Wenwu Zhu. Structural deep network embedding. In Proc. 22nd Intl. Conf. on Knowledge Discovery and Data Mining, pages 1225–1234, 2016.

[Zhu et al., 2016] Linhong Zhu, Dong Guo, Junming Yin, Greg Ver Steeg, and Aram Galstyan. Scalable temporal latent space inference for link prediction in dynamic social networks. IEEE Transactions on Knowledge and Data Engineering, 28(10):2765–2777, 2016.