Feature Selection For Machine Learning-Based Early Detection of Distributed Cyber Attacks
Kouichi Sakurai
Faculty of Informatics,
Kyushu University, Japan.
Email: sakurai@inf.kyushu-u.ac.jp
Abstract—It is well known that distributed cyber attacks launched simultaneously from many hosts have caused the most serious problems in recent years, including privacy leakage and denial of service. Thus, how to detect those attacks at an early stage has become an important and urgent topic in the cyber security community. For this purpose, recognizing the C&C (Command & Control) communication between compromised bots and the C&C server becomes a crucially important issue, because C&C communication belongs to the preparation phase of distributed attacks. Although signature-based attack detection has long been applied in practice, it is well known that it cannot efficiently deal with new kinds of attacks. In recent years, ML (machine learning)-based detection methods have been studied widely. In those methods, feature selection is obviously very important to the detection performance. We once utilized up to 55 features to pick out C&C traffic in order to accomplish early detection of DDoS attacks. In this work, we try to answer the question: "Are all of those features really necessary?" We mainly investigate how the detection performance changes as features are removed, starting from those with the lowest importance, and we try to make clear which features deserve attention for the early detection of distributed attacks. We use honeypot data collected during the period from 2008 to 2013. SVM (Support Vector Machine) and PCA (Principal Component Analysis) are utilized for feature selection, and SVM and RF (Random Forest) are used for building the classifiers. We find that the detection performance generally improves as more features are utilized. However, after the number of features reaches around 40, the detection performance hardly changes even if more features are used. It is also verified that, in some specific cases, more features do not always mean better detection performance. We also discuss the 10 important features which have the biggest influence on classification.

Keywords—distributed cyber attacks; DDoS attacks; machine learning; feature selection; early detection

I. DISTRIBUTED CYBER ATTACK AND ITS EARLY DETECTION

The problems and losses caused by cyber attacks have been increasing greatly in recent years, even though much work on avoiding and detecting cyber attacks has been done and a huge amount of money has been invested in cyber security. The main reason for this is that attackers have also become more and more sophisticated.

Distributed attacks are those launched cooperatively by many compromised hosts. Such attacks are referred to as next-generation cyber attacks in Xu's work [1], and it is well known that they are among the most sophisticated attacks. According to many reports, distributed attacks have caused the most serious problems and losses in recent years. Thus, many researchers and developers in the cyber security community have been working on how to detect and avoid such attacks.

In general, the attacker prepares or hijacks a C&C server, which is used to send attack instructions to the compromised hosts (bots). Then, the bots launch the actual distributed attack on the victim(s). The C&C communication is thus the preparation phase of a distributed attack; if such communication is recognized, the upcoming actual attack might be blocked. Therefore, detecting the C&C communication can be regarded as early detection of distributed attacks.

There have been many cases of distributed attacks. The non-profit anti-spam organization Spamhaus [2] suffered a large DDoS attack against its website, whose peak reached about 300 Gbps on March 19, 2013 [3]. That means data equivalent to roughly eight full 4.7 GB DVDs was poured onto Spamhaus's server every second, which finally knocked Spamhaus offline. Another large DDoS attack, peaking at around 400 Gbps, was reported by Cloudflare on March 3, 2016 [4]. All 13 DNS root servers suffered from a distributed attack on June 25, 2016, and this was not the first time that critical DNS infrastructure was targeted. At the end of 2015, several root servers encountered a DDoS attack, and they also experienced a DDoS attack in the middle of May of the same year that brought down services like Yelp and Alexa. The DDoS attack on June 25, 2017 lasted for around three hours, during which the average availability across all root servers dropped to around 50% of normal [5].

Thus, the early detection of distributed attacks, which can stop the upcoming attacks, obviously becomes critically significant.

1 Presently with Fujitsu Ltd., Japan.
Fig. 1. An illustration of online detection.
performance. The detailed examination results will be presented in the next section.
\[ \text{TP rate} = \text{Recall} = \frac{TP}{TP + FN}, \qquad \text{FP rate} = \frac{FP}{FP + TN}, \]
\[ \text{Precision} = \frac{TP}{TP + FP}, \qquad \text{F-Measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}. \]
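Assuming the confusion-matrix counts TP, FP, TN, and FN are available, the four metrics above can be computed directly; the helper name `detection_metrics` below is ours, not from the paper:

```python
def detection_metrics(tp, fp, tn, fn):
    """Compute the four evaluation metrics from confusion-matrix counts:
    true/false positives (tp, fp) and true/false negatives (tn, fn)."""
    tp_rate = tp / (tp + fn)        # TP rate = Recall
    fp_rate = fp / (fp + tn)        # FP rate
    precision = tp / (tp + fp)
    f_measure = 2 * precision * tp_rate / (precision + tp_rate)
    return tp_rate, fp_rate, precision, f_measure
```

For example, `detection_metrics(90, 10, 80, 20)` gives a TP rate of about 0.818 and a Precision of 0.9.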
Fig. 6. Distribution of the variance of packet sizes. The X-axis is for packets sent; the Y-axis for packets received; red points are C&C sessions and blue points are normal sessions.

Fig. 8. Our experiment process: the packet datasets are split into sessions, 55 features are extracted from each session, and the features are ranked; features are then removed gradually, starting from the lowest importance, while a classifier is built and evaluated on the test data by 10-fold cross-validation, observing when the results start changing.
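The experiment process of Fig. 8, i.e. rank the 55 features once and then drop them one at a time from the least important while re-evaluating the classifier, can be sketched as a generic backward-elimination loop. The `rank` and `evaluate` callbacks here are placeholders of our own: in the paper, ranking comes from SVM weights or PCA, and evaluation would be the mean 10-fold cross-validation score of an SVM or RF classifier.

```python
def backward_elimination(features, rank, evaluate, min_keep=1):
    """Sketch of the Fig. 8 loop.

    rank(features)     -> list of features, most important first
    evaluate(features) -> detection score (e.g. mean 10-fold CV TP rate)

    Returns a history of (number of features used, score) pairs,
    recorded as the least important feature is dropped each round.
    """
    ranked = rank(features)
    history = []
    while len(ranked) >= min_keep:
        history.append((len(ranked), evaluate(ranked)))
        ranked = ranked[:-1]          # remove the least important feature
    return history
```

Plotting the returned history (score versus number of features) yields curves like those in Figs. 9-12, which is how the "around 40 features" plateau can be observed.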
Fig. 9. Detection performance based on SVM feature selection and SVM classifier. (a) The TP rate.

Fig. 10. Detection performance based on SVM feature selection and RF classifier. (a) The TP rate.

Fig. 11. Detection performance based on PCA feature selection and RF classifier. (a) The TP rate.

Fig. 12. Detection performance based on PCA feature selection and SVM classifier. (a) The TP rate.
[8] Y. Tang, "Defending against Internet Worms: a Signature-based Approach," in Proc. the 24th IEEE Annual Joint Conference of the Computer and Communications Societies (INFOCOM), pp. 1384–1394, 2005.
[9] I. Yazid, A. Hanan and M. Aizaini, "Volume-based Network Intrusion Attacks Detection," Advanced Computer Network and Security, UTM Press, pp. 147–162, 2008.
[10] A. Kind, M. P. Stoecklin and X. Dimitropoulos, "Histogram-Based Traffic Anomaly Detection," IEEE Transactions on Network Service Management, Vol. 6, No. 2, pp. 1–12, 2009.
[11] E. Eskin and W. Lee, "Modeling System Call for Intrusion Detection with Dynamic Window Sizes," in Proc. DARPA Information Survivability Conference and Exposition (DISCEX), pp. 165–175, 2001.
[12] Y. Feng, Y. Hori, K. Sakurai and J. Takeuchi, "A Behavior-based Method for Detecting Outbreaks of Low-rate Attacks," in Proc. 3rd Workshop on Network Technologies for Security, Administration and Protection (NETSAP), pp. 267–272, 2012.
[13] Y. Xiang, K. Li and W. Zhou, "Low-Rate DDoS Attacks Detection and Traceback by Using New Information Metrics," IEEE Transactions on Information Forensics and Security, Vol. 6, No. 2, pp. 426–437, 2011.
[14] W. Lee and D. Xiang, "Information-theoretic Measures for Anomaly Detection," in Proc. IEEE Symposium on Security and Privacy, pp. 130–143, 2001.
[15] V. Chandola, A. Banerjee and V. Kumar, "Anomaly Detection: a Survey," ACM Computing Surveys, Vol. 41, No. 3, pp. 1–72, 2009.
[16] M. S. Kim, H. J. Kang and S. C. Hong, "A Flow-based Method for Abnormal Network Traffic Detection," in Proc. IEEE/IFIP Network Operations and Management Symposium, pp. 599–612, 2004.
[17] J. Treurniet, "A Network Activity Classification Schema and its Application to Scan Detection," IEEE/ACM Transactions on Networking, Vol. 19, No. 5, pp. 1396–1404, 2011.
[18] Snort Users Manual, http://www.snort.org/docs (accessed on June 20, 2018).
[19] C. Gates, "The Modeling and Detection of Distributed Port Scans: a Thesis Proposal," Technical Report CS-2003-01, Dalhousie University, 2003.
[20] V. Yegneswaran, P. Barford and J. Ullrich, "Internet Intrusions: Global Characteristics and Prevalence," in Proc. 2003 ACM Joint International Conference on Measurement and Modeling of Computer Systems, pp. 138–147, 2003.
[21] G. Gu, P. Porras, V. Yegneswaran, M. Fong and W. Lee, "BotHunter: Detecting Malware Infection Through IDS-driven Dialog Correlation," in Proc. the 16th USENIX Security Symposium, pp. 167–182, 2007.
[22] S. Kondo and N. Sato, "Botnet traffic detection techniques by C&C session classification using SVM," in Proc. 2nd International Conference on Advances in Information and Computer Security, pp. 29–31, 2007.
[23] D. Ashley, "An algorithm for HTTP bot detection," University of Texas at Austin - Information Security Office, 2011.
[24] S. Yamauchi, J. Kawamoto and K. Sakurai, "Evaluation of Machine Learning Techniques for C&C Traffic Classification," (in Japanese) IPSJ Journal, Vol. 56, No. 9, pp. 1745–1753, 2015.
[25] L. Lu, Y. Feng and K. Sakurai, "C&C session detection using random forest," in Proc. the 11th ACM International Conference on Ubiquitous Information Management and Communication (IMCOM), 2017.
[26] J. Weston et al., "Feature Selection for SVMs," http://www.cs.columbia.edu/~jebara/6772/papers/weston01feature.pdf (accessed on April 29, 2018).
[27] Y. W. Chen and C. J. Lin, "Combining SVMs with various feature selection strategies," Feature Extraction, pp. 315–324, Springer Berlin Heidelberg, 2006.
[28] I. Guyon, J. Weston, S. Barnhill and V. Vapnik, "Gene selection for cancer classification using support vector machines," Machine Learning, Vol. 46, No. 1–3, pp. 389–422, 2002.
[29] V. Sugumaran, V. Muralidharan and K. I. Ramachandran, "Feature selection using decision tree and classification through proximal support vector machine for fault diagnostics of roller bearing," Mechanical Systems and Signal Processing, Vol. 21, No. 2, pp. 930–942, 2007.
[30] F. Song, Z. Guo and D. Mei, "Feature Selection Using Principal Component Analysis," in Proc. the International Conference on System Science, Engineering Design and Manufacturing Informatization (ICSEM), pp. 27–30, 2010.
[31] T. Gärtner and P. A. Flach, "WBCSVM: Weighted Bayesian Classification based on Support Vector Machines," in Proc. the 18th International Conference on Machine Learning (ICML), pp. 154–161, 2001.
[32] J. Brank, M. Grobelnik, N. Milic-Frayling and D. Mladenić, "Feature Selection Using Linear Support Vector Machines," Technical Report MSR-TR-2002-63, Microsoft Research, Microsoft Corporation, 2002.
[33] O. Takata et al., "MWS Datasets 2016," Anti-Malware Engineering Workshop (MWS), www.iwsec.org/mws/2016/20160714-takata-dataset.pdf, 2016 (accessed on April 29, 2018).
[34] R. Kohavi, "A study of cross-validation and bootstrap for accuracy estimation and model selection," in Proc. the 14th International Joint Conference on Artificial Intelligence (IJCAI), pp. 1137–1143, 1995.