Anomaly Detection Using Graph Neural Networks


2019 International Conference on Machine Learning, Big Data, Cloud and Parallel Computing (Com-IT-Con), India, 14th-16th Feb 2019

978-1-7281-0211-5/19/$31.00 ©2019 IEEE

Anomaly Detection using Graph Neural Networks

Anshika Chaudhary, Himangi Mittal, Anuja Arora
Computer Science and Engineering
Jaypee Institute of Information Technology
Noida, India
anshika.ch412@gmail.com, himangimittal@gmail.com, anuja.arora29@gmail.com

Abstract—Conventional methods for anomaly detection include techniques based on clustering, proximity or classification. With rapidly growing social networks, outliers or anomalies find ingenious ways to obscure themselves in the network, making the conventional techniques inefficient. In this paper, we utilize the ability of deep learning over topological characteristics of a social network to detect anomalies in an email network and a Twitter network. We present a model, Graph Neural Network, which is applied on social connection graphs to detect anomalies. Combinations of various social network statistical measures are taken into account to study the graph structure and functioning of the anomalous nodes by employing deep neural networks on it. The hidden layer of the neural network plays an important role in finding the impact of statistical measure combinations in anomaly detection.

Keywords— Graph Neural Network, Anomaly Detection, Social Network, Enron dataset, Twitter dataset

I. INTRODUCTION

Anomaly or outlier detection is a procedure to find data points which have a spurious behaviour. It is applied around us in the form of fraud detection [6], network surveillance [7], public safety and security [8], intrusion detection [9], medical problems [10], false advertisement and many more. The term anomaly is used synonymously with outlier, noise, and deviation. Anomalies may occur as point anomalies or group anomalies [11]. Point anomalies can be defined as single data points having deviant behaviour from the rest of the network, whereas group anomalies are collective anomalous data points, mostly observed in fraudulent activities. Our work is focused on point anomalies.

Graph-based anomaly detection can be helpful in finding spammers [14], the outspread of any information [16], fake reviews [17] or malicious activities [18]. Thus, detecting anomalies is a vital task to ensure safety and security for the users in a network. Analysing large graphs to find the anomalies can also yield important and interesting information about the graph structure.

Detecting spam profiles is considered one of the most challenging issues in online social networks. The most recent work in this direction was done by Faris et al. in 2018, in which a hybrid SVM-WOA model [14] is introduced. This model is applied and tested on different lingual contexts, collected from Twitter in four languages (Arabic, English, Spanish, and Korean), to identify the most influential features. This model can effectively help in designing more accurate and insightful spam detection models for an online social network.

Social networks have become a hot topic today and much work has been done on social networks, including visualization [24] [25], recommendation, and link prediction on spatial as well as temporal networks [26]. However, with the increasing use of these platforms, users come under the threat of several anomalies, among which horizontal anomalies are difficult to detect and hazardous for any network. These are the anomalies caused by a user because of his/her anomalous and changing behaviour towards different sources. A self-healing neuro-fuzzy approach is used for the detection, recovery, and removal of horizontal anomalies efficiently and accurately [12]. This approach is evaluated with three datasets: the DARPA'98 benchmark dataset, a synthetic dataset, and real-time traffic. The evaluation over the DARPA'98 dataset demonstrates that the proposed approach is better than the existing solutions.

In a social network application, anomalous node detection is a challenging task. On one side, a number of utilities exist in social networking sites, and on the other, free-handed content delivery has led to extensive misuse. Many cyber-attacks through social media content show that it has become a prime source of malicious activities. These attempts are made to earn illegal profits through rumour spread, to enhance the prestige of an unknown product, etc. Therefore, in order to catch and reduce security risk in the social network, techniques are required to detect anomalous behaviour in a social network.

According to observation, malicious users follow some ambiguous patterns while sharing information. In this research work, a few hypotheses are considered to capture the behaviour of anomalous nodes. Initially, the adjacency matrix of the dataset is taken as input for the graph neural network and then topological characteristics of the targeted datasets are computed. This work is inspired by graph-based anomaly detection. A deep learning technique is utilized in order to detect nodes exhibiting anomalous behaviour [1]. The outliers are labelled as distrustful nodes with the help of social network statistical measures: Betweenness Centrality, Degree, and Closeness. These parameters are taken into account to understand the structure of the graph. These topological characteristics are used to exploit the anomalous node behaviour. The following contributions are made in this research work:

RC1: A graph neural network is used to detect anomalies in a social network.

RC2: The impact of statistical properties of a social network is tested, and the results are empirically validated on the Enron and Twitter datasets.

The paper is divided into five sections. The first section is this introduction, which discusses the problem domain. Section 2 covers the related work in the domain of anomaly detection. Following this, the third section presents the considered hypotheses, datasets and the graph neural network which is used to detect the anomalies in a social network. The fourth section shows the experimental setup and

results of anomaly detection on two datasets. Finally, Section 5 concludes the work.

II. RELATED WORK

Recently, there has been a boost in the area of data mining in graphs, which has been further extended to spatial and temporal graphs [26]. Much work has been done on finding valuable information from the structure of graphs; however, little work has been done in the area of anomaly detection.

For anomaly detection, techniques of supervised and unsupervised learning [2] require data consisting of some nodes labelled as outliers and the rest as normal nodes. Surveys [3], [4] present the various approaches to detect anomalies. Basic techniques for anomaly detection include statistical methods which find the deviation from common statistical properties such as mean, median, mode, and quantiles. Recent techniques utilize machine learning, such as density-based clustering, K-nearest neighbour, and Support Vector Machines (SVM), to detect and classify the anomalies based on an initially large set of features. Techniques using a variation of Bayesian networks [5] have been used in finding group anomalies by using point activity data of a user and pairwise communication data. An anomaly can be described in many terms, such as outlier, exception or abnormality, that represent unusual, illegal or malicious activities. The presence of anomalies is determined on the basis of a functional structure which is different from the normal model. It has been observed that attacks have a huge distributed effect through engagements in social network sites, as illegal users make use of these sites differently and follow patterns that differ from those of their peers.

The initial impetus was given by Gori et al. in 2005, when the term graph neural network (GNN) came into the limelight [19]. Further study was continued by Li et al. in 2016 on gated graph neural networks [20]. Applying neural network models like RNN and CNN to arbitrarily structured graphs is a challenging problem, so Kipf et al. adopted a somewhat similar approach starting from spectral graph convolutions [1] [22].

Applications of anomaly detection are observed widely in the Twitter network to detect spammers. In the work of [27], two algorithms, DenStream and StreamKM++, have been employed. The former is a modified density-based clustering method (DBSCAN) that generates p micro clusters. The latter uses the k-means clustering method to choose the tightest cluster. Anomaly detection in dynamic networks has also been worked upon. In the survey [28], various types of anomalies in the form of nodes, edges, subgraphs and events are discussed, along with detection techniques based on communities, probabilistic models, compression, representatives and distance. Some other recent works on anomaly detection in graphs are listed in Table 1. It shows that in recent years convolutional neural networks, graph convolution networks, and neuro-fuzzy approaches are mainly used for anomaly detection. The accuracy achieved using neural techniques varies from 81-98 % for various social network applications.

TABLE I: Latest works on Anomaly Detection in Graphs

YEAR | TITLE | TECHNIQUES | RESULTS
2018 | Web traffic anomaly detection using C-LSTM neural networks [13] | CNN+LSTM+DNN | Accuracy: 98.60%
2018 | Heterogeneous anomaly detection in social diffusion with discriminative feature discovery [15] | Parameter-free framework (HADISD) | Precision: 89.7%, Recall: 91.3%, F1-score: 90.5%
2018 | Network Anomaly Detection Using Recurrent Neural Networks [23] | LSTM | Accuracy: 84%
2017 | Neuro-Fuzzy Based Horizontal Anomaly Detection In Online Social Networks [12] | A self-healing neuro-fuzzy approach (NHAD) | Accuracy: 99.98%, Precision: 98.1%, Detection Rate: 97.97%
2017 | Semi-Supervised Classification with Graph Convolution Networks [1][21] | GCN | Accuracy: 81.50%

III. METHODOLOGY

To detect the anomalies in a graph, simply employing classification, community detection or clustering techniques will fail to capture the behaviour of the anomalies. An unusual activity or different behaviour in the graph indicates an abnormality or outlier in a social media application. Therefore, in order to find these abnormalities, a graph-based approach is applied to detect the anomalies. The aim of the work is to validate the graph neural network for anomaly detection and to find out the impact of social network statistical properties on anomaly detection.

Figure 2 depicts the overview of the graph-based approach which is used to detect anomalies. Initially, social network based data has been selected.

TABLE II: Statistics of the dataset

Dataset | Nodes | Edges
Enron | 11703 | 450813
Twitter | 76851 | 342153

This data is augmented by computing and adding its statistical properties. Further, a graph neural network is applied on the

graph adjacency matrix to classify nodes into two categories: anomalous and general (not anomalous).
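
As a rough illustration of how a graph neural network can classify nodes from the adjacency matrix, the sketch below runs a two-layer forward pass in NumPy with a ReLU hidden layer and a sigmoid output. It assumes the renormalised propagation rule of Kipf and Welling [1], one-hot node features and a toy four-node undirected graph. The helper names (normalize_adjacency, gcn_forward), the hidden size and the weight range are illustrative choices, not the implementation used in this work, and the weights are untrained.

```python
import numpy as np


def normalize_adjacency(A):
    """Return D^-1/2 (A + I) D^-1/2, the renormalised adjacency (assumed rule from [1])."""
    A_tilde = A + np.eye(A.shape[0])      # add self-loops so every node keeps its own signal
    deg = A_tilde.sum(axis=1)             # degree of each node, always >= 1 after self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def relu(x):
    return np.maximum(x, 0.0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gcn_forward(A, X, W0, W1):
    """Two-layer forward pass: ReLU hidden layer, sigmoid output (one score per node)."""
    A_hat = normalize_adjacency(A)
    H1 = relu(A_hat @ X @ W0)             # hidden representation of every node
    return sigmoid(A_hat @ H1 @ W1)       # anomaly score in (0, 1) for every node


# Toy undirected graph with 4 nodes (hypothetical data, not from Enron or Twitter).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.eye(4)                             # one-hot features: the graph structure is the only input

rng = np.random.default_rng(0)
W0 = rng.uniform(-0.5, 0.5, size=(4, 8))  # uniform random initialisation, hidden size 8 (assumed)
W1 = rng.uniform(-0.5, 0.5, size=(8, 1))

scores = gcn_forward(A, X, W0, W1)
labels = (scores[:, 0] > 0.5).astype(int) # 1 = anomalous, 0 = general
```

In practice the weights would be trained against the node labels with a binary cross-entropy loss; the sketch only shows how the adjacency matrix enters each layer.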

A. Hypothesis for Social Network Analysis

For anomaly identification, node information in the form of statistical properties is used as the feature set. The problem of detecting anomalous nodes is formulated in terms of three hypotheses about node behaviour.
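
The statistical properties used as node features (degree, betweenness centrality and closeness centrality) can be computed with breadth-first search. The sketch below is a minimal pure-Python version for an unweighted, undirected graph stored as adjacency lists, using Brandes' accumulation for betweenness. The function names, the toy star graph and the degree threshold of 3 are assumptions for illustration only; graph libraries such as NetworkX provide equivalent measures.

```python
from collections import deque


def bfs(adj, source):
    """BFS from `source`: visit order, distances, shortest-path counts, predecessors."""
    dist = {source: 0}
    sigma = {source: 1.0}                  # number of shortest paths from source
    preds = {source: []}
    order = []
    queue = deque([source])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj[v]:
            if w not in dist:              # first time w is reached
                dist[w] = dist[v] + 1
                sigma[w] = 0.0
                preds[w] = []
                queue.append(w)
            if dist[w] == dist[v] + 1:     # v lies on a shortest path to w
                sigma[w] += sigma[v]
                preds[w].append(v)
    return order, dist, sigma, preds


def centrality_features(adj):
    """Return {node: (degree, normalised betweenness, closeness)}."""
    n = len(adj)
    degree = {v: len(adj[v]) for v in adj}
    closeness = {}
    betweenness = dict.fromkeys(adj, 0.0)
    for s in adj:
        order, dist, sigma, preds = bfs(adj, s)
        total = sum(dist.values())
        # Closeness over the nodes reachable from s (0 for an isolated node).
        closeness[s] = (len(dist) - 1) / total if total > 0 else 0.0
        # Brandes' back-propagation of pair dependencies.
        delta = dict.fromkeys(order, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                betweenness[w] += delta[w]
    # Each unordered pair is counted twice, so dividing by (n-1)(n-2) normalises to [0, 1].
    scale = 1.0 / ((n - 1) * (n - 2)) if n > 2 else 1.0
    return {v: (degree[v], betweenness[v] * scale, closeness[v]) for v in adj}


# Toy star graph: node 0 is connected to nodes 1..5 (hypothetical data).
adj = {0: [1, 2, 3, 4, 5], 1: [0], 2: [0], 3: [0], 4: [0], 5: [0]}
features = centrality_features(adj)

DEGREE_THRESHOLD = 3                       # toy value chosen for this small graph
labels = {v: int(f[0] > DEGREE_THRESHOLD) for v, f in features.items()}
```

On the star graph only the hub exceeds the threshold, which mirrors the idea that a node connected to many others stands out on all three measures.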
• Hypothesis 0: An anomalous node will have a higher degree.
In a social network, an anomalous node will try to influence as many nodes as possible by connecting to them, thus having a higher degree than normal.

• Hypothesis 1: An anomalous node will have a higher betweenness centrality.
If the first hypothesis is true, the node will be peculiarly connected to a large percentage of the nodes of a social network and will lie on the shortest paths between many nodes.

• Hypothesis 2: An anomalous node will have a higher closeness centrality.
If the two hypotheses hold true, the node, being largely connected in the network, will have short paths to the other nodes and hence will be closer to the nodes in the network.

B. Datasets

The experiments are performed on two datasets: the Enron dataset and the Twitter dataset.

Enron is an email communication network which has been widely used for anomaly detection in networks. Nodes of the network are email addresses, which are changed to numeric ids in our work. An undirected edge connects two nodes if an address i sent an email to address j. The dataset is available at https://www.cs.cmu.edu/~./enron/. The given Enron dataset contains only five anomalous nodes. We have labelled around 40% of the nodes as abnormal/outlier nodes in the Enron dataset based on the hypotheses mentioned in Section III(A).

The other dataset is from the social networking site Twitter. A Twitter social network consists of followers and followings who are connected to each other through posted tweets, mentions, follows, and replies. In our work, we have utilized only the connections made by the follower network, represented by directed edges. The dataset is available at https://snap.stanford.edu/data/higgs-twitter.html. This dataset is useful to validate our work on large, directed graphs, as the Enron dataset contains fewer nodes and undirected edges. Dataset details are given in Table 2, which shows the number of nodes and edges taken in the Enron and Twitter datasets respectively.

C. Anomalous Nodes Assignment Criteria

A data augmentation technique is applied and the datasets are modified. The parameters used for this are chosen since they take into account the graph structure and help to examine the graph structure of an anomaly. Anomalous nodes are assigned in both datasets based on three statistical graph properties: Betweenness centrality α(v), Closeness centrality β(v), and Degree centrality γ(v), which are calculated using (1), (2) and (3) respectively.

These three parameters are used to define the characteristics of the anomalous nodes (Section III A). Further, all parameters, individually and in pairs, are used to tune the system and find the accuracy of anomalous node identification.

The statistics of the parameters on both datasets can be seen in Figure 1. The image shows the degree distribution, the betweenness centrality distribution and the closeness distribution. The x-axis of the graph shows the number of nodes and the y-axis shows the frequency of the nodes with the same value. Due to the large number of nodes, the parameter value of every node could not be presented; hence, the nodes are grouped into bins and the frequency of the bins is shown. The threshold for the data augmentation on the datasets is chosen in a way that 40%-60% of the nodes can be labelled as outliers. The threshold was decided with reference to these parameter distributions.

Fig 1. Parameter Statistics on Enron and Twitter Datasets

D. Graph Neural Network

An anomalous node shows peculiar behaviour which deviates greatly from that of a normal node. We consider the case that in a social network, there will be two extreme behaviours shown by the outliers.

Either they will be connected to only a few people, or they will be connected to a large percentage of the nodes in the network. We do not consider the first extreme, as a person might join the network but not be an active node. However, the second extreme shows signs of anomalous behaviour. Therefore, we consider the assumption that an anomalous node will have a large number of

connections in an attempt to influence the network as much as possible. It will try to act as a central node in a network so that it can have an impact on its neighbouring nodes. Also, having a large number of connections, it can reach the whole network in short paths, thus being close to each and every node in the network. These hypotheses can also be used in detecting the spammers in a network.

The three hypotheses take into account the graph structure of the anomalous nodes and yield features incorporating it. This will also be helpful in the application of the neural network model. Once the three parameters are calculated and the data is augmented, we find a representation vector of each node in the network with the graph neural network. For the network input and computation, we consider the following:

• Given a graph G = (V, E) having N nodes and |E| edges.

• Let A denote the matrix of size N*N, representing the adjacency matrix of the graph.

• Let W denote the weight matrix, initialized uniformly.

• Let H(l) denote the l-th hidden neural network layer.

Our goal is to accurately classify the nodes of the graph as anomalous or normal using the graph neural network. The input to the neural network will be the adjacency matrix of the graph and thereafter, we use the layer-wise propagation rule.

Here, σ denotes the activation function: ReLU for the first layer and sigmoid for the output layer. We build our neural network using the Keras library. Weights are initialized using uniform random initialization.

Fig 2: Data Preprocessing/Data-Flow and GNN Block Diagram

IV. EXPERIMENTS

We divide the dataset in an 80-20 ratio and run the graph neural network for 100 epochs. For compiling the Keras model, the Adam optimizer and binary cross-entropy are used for optimization and loss computation. Table 3 shows the classification results for the Enron dataset for the best suitable thresholds of betweenness, closeness and degree centrality.

TABLE III: Results of the parameters on Enron Dataset

Parameter | Threshold | Accuracy
Between Centrality | 1*10^-7 | 0.8615
Degree | 70 | 0.8632
Closeness | 0.25 | 0.8611

TABLE IV: Results of the parameters on Twitter Dataset

Parameter | Threshold | Accuracy
Between Centrality | 1*10^-7 | 0.8126
Degree | 70 | 0.8157
Closeness | 0.25 | 0.8118

The best achieved accuracy corresponding to an individual parameter for a specific threshold is presented in Table 3 for the Enron dataset. The same thresholds were used for the Twitter dataset to mark the nodes as anomalous and general (see Table 4). Using a threshold of 1*10^-7 on betweenness centrality, 70 for degree and 0.25 for closeness centrality, nodes having a value greater than the threshold were labelled as outliers. Thresholds were chosen by trial and error to yield around 40% - 50% of the dataset as outliers.

Comparing the accuracy of the parameters found on both datasets, we can conclude that degree is a better parameter to capture the nature of anomalous nodes and hence, hypothesis 1 holds true.

For carrying out fraudulent activities, anomalies will try to connect with and affect as many people as possible. This justifies the advantage of the degree parameter over the others. Our hypothesis gives a good accuracy and generalizes well over both datasets. Thus, we can conclude that our hypotheses hold true for detecting anomalies in a social network.

TABLE V: Results on Enron Dataset

Parameter | Threshold | Accuracy
Between Centrality, Degree | 1*10^-7, 70 | 0.9845
Between Centrality, Closeness | 1*10^-7, 0.25 | 0.9006
Closeness, Degree | 0.25, 70 | 0.9749

TABLE VI: Results on Twitter Dataset

Parameter | Threshold | Accuracy
Between Centrality, Degree | 1*10^-7, 70 | 0.9823
Closeness, Degree | 0.25, 70 | 0.9756

We conduct another experiment by combining the parameters in pairs to further study the nature of anomalies. This ranks the parameters to find out which of them is a better measure for observing the behaviour of an anomaly. Table 5 and Table 6 present the combined-parameter anomaly detection results for the Enron and Twitter datasets respectively. Results show that combined parameters help in enhancing the anomaly detection accuracy. For the combination of parameters, we can observe that degree is the best parameter which captures the behaviour of anomalous nodes. This also validates our hypothesis 1 and the assumption that anomalous nodes tend to connect to as many nodes as possible, to be a central node and to be as close to each and every node as possible. Our work outperforms the works [6] [12] in that it takes the graph structure into account by

considering the degree, closeness and betweenness. By the definition of an anomaly, the node will have a peculiar behaviour of having very many or very few connections, thus verifying our approach.

V. CONCLUSIONS

In this work, we presented a deep learning model, Graph Neural Network, to detect the anomalies and outliers in a social network. We also presented three hypotheses stating the behaviour of anomalous nodes and tried to prove them using our model. The efficiency of our model was validated on two datasets: Enron (email communication network) and Twitter (social networking site). The number of outliers in the datasets was augmented using the node properties degree, betweenness centrality and closeness centrality. These parameters were chosen since they take into account the structure of the graph. We show the results by taking these parameters individually and in combination, which achieves good accuracy over the datasets and hence supports our hypotheses.

VI. FUTURE WORK

Detecting anomalies can help to reduce fraudulent activities or spam spreading in the network. For this, efficient methods need to be developed which take into account the behaviour of anomalies to its core. A Graph Neural Network can capture the features and establish relationships between them well due to the hidden layer. Experimenting with more features and testing them on neural networks can broaden the study on the nature of anomalies.

REFERENCES

[1] Semi-Supervised Classification With Graph Convolutional Networks, Thomas N. Kipf, Max Welling, ICLR 2017
[2] Lili Zhang, Huibin Wang, Chenming Li, Qing Ye, Yehong Shao, "Unsupervised Anomaly Detection Algorithm of Graph Data Based on Graph Kernel", 2017 IEEE 4th International Conference on Cyber Security and Cloud Computing
[3] A Survey on Different Graph Based Anomaly Detection Techniques, Debajit Sensarma, and Samar Sen Sarma, Indian Journal of Science and Technology, Vol 8(31), November 2015
[4] Leman Akoglu, Hanghang Tong and Danai Koutra, "Graph-based Anomaly Detection and Description: A Survey", CoRR, abs/1404.4679, 2014
[5] GLAD: Group Anomaly Detection in Social Media Analysis, Rose Yu, Xinran He, and Yan Liu
[6] Behdad, Mohammad, Luigi Barone, Mohammed Bennamoun, and Tim French. "Nature-inspired techniques in the context of fraud detection." IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 42, no. 6 (2012): 1273-1290.
[7] Alpaydn, G. An Adaptive Deep Neural Network for Detection, Recognition of Objects with Long Range Auto Surveillance.
[8] Yang, J., Zhou, C., Yang, S., Xu, H., & Hu, B. (2018). Anomaly detection based on zone partition for security protection of industrial cyber-physical systems. IEEE Transactions on Industrial Electronics, 65(5), 4257-4267.
[9] Karami, A. (2018). An anomaly-based intrusion detection system in presence of benign outliers with visualization capabilities. Expert Systems with Applications, 108, 36-60.
[10] Kodama, T., Kamata, K., Fujiwara, K., Kano, M., Yamakawa, T., Yuki, I., & Murayama, Y. (2018). Ischemic stroke detection by analyzing heart rate variability in rat middle cerebral artery occlusion model. IEEE Transactions on Neural Systems and Rehabilitation Engineering.
[11] Ahmed, Mohiuddin, Abdun Naser Mahmood, and Jiankun Hu. "A survey of network anomaly detection techniques." Journal of Network and Computer Applications 60 (2016): 19-31.
[12] Kumar, Ravinder, et al. "NHAD: Neuro-Fuzzy Based Horizontal Anomaly Detection In Online Social Networks." IEEE Transactions on Knowledge and Data Engineering (2018).
[13] Kim, Tae-Young, and Sung-Bae Cho. "Web traffic anomaly detection using C-LSTM neural networks." Expert Systems with Applications 106 (2018): 66-76.
[14] AlaM, A. Z., Faris, H., & Hassonah, M. A. (2018). Evolving Support Vector Machines using Whale Optimization Algorithm for spam profiles detection on online social networks in different lingual contexts. Knowledge-Based Systems, 153, 91-104.
[15] Liu, Siyuan, Qiang Qu, and Shuhui Wang. "Heterogeneous anomaly detection in social diffusion with discriminative feature discovery." Information Sciences 439 (2018): 1-18.
[16] Prado-Romero, M. A., Oliva, A. F., & Hernández, L. G. (2018, September). Identifying Twitter Users Influence and Open Mindedness Using Anomaly Detection. In International Workshop on Artificial Intelligence and Pattern Recognition (pp. 166-173). Springer, Cham.
[17] Ramalingam, D., & Chinnaiah, V. (2018). Fake profile detection techniques in large-scale online social networks: A comprehensive review. Computers & Electrical Engineering, 65, 165-177.
[18] Al-Qurishi, M., Hossain, M. S., Alrubaian, M., Rahman, S. M. M., & Alamri, A. (2018). Leveraging Analysis of User Behavior to Identify Malicious Activities in Large-Scale Social Networks. IEEE Transactions on Industrial Informatics, 14(2), 799-813.
[19] Scarselli, F., Tsoi, A. C., Gori, M., & Hagenbuchner, M. (2005). A new neural network model for graph processing. Department of Information Engineering, University of Siena, Tech. Rep, 502, 01-05.
[20] Li, Y., Tarlow, D., Brockschmidt, M., & Zemel, R. (2015). Gated graph sequence neural networks. arXiv preprint arXiv:1511.05493.
[21] Monti, F., Boscaini, D., Masci, J., Rodola, E., Svoboda, J., & Bronstein, M. M. (2017, July). Geometric deep learning on graphs and manifolds using mixture model CNNs. In Proc. CVPR (Vol. 1, No. 2, p. 3).
[22] Radford, B. J., Apolonio, L. M., Trias, A. J., & Simpson, J. A. (2018). Network Traffic Anomaly Detection Using Recurrent Neural Networks. arXiv preprint arXiv:1803.10769.
[23] Aggrawal, N., & Arora, A. (2016, October). Visualization, analysis and structural pattern infusion of DBLP co-authorship network using Gephi. In Next Generation Computing Technologies (NGCT), 2016 2nd International Conference on (pp. 494-500). IEEE.
[24] Aggrawal, N., & Arora, A. (2016). Vulnerabilities issues and melioration plans for online social network over Web 2.0. Commun. Dependability Qual. Manag. Int. J, 19(1), 66-73.
[25] Miller, Zachary, et al. "Twitter spammer detection using data stream clustering." Information Sciences 260 (2014): 64-73.
[26] Ranshous, Stephen, et al. "Anomaly detection in dynamic networks: a survey." Wiley Interdisciplinary Reviews: Computational Statistics 7.3 (2015): 223-247.