Feature Selection For Machine Learning-Based Early Detection of Distributed Cyber Attacks
Kouichi Sakurai
Faculty of Informatics,
Kyushu University, Japan.
Email: sakurai@inf.kyushu-u.ac.jp
Abstract—It is well known that distributed cyber attacks launched simultaneously from many hosts have caused the most serious problems in recent years, including privacy leakage and denial of service. Thus, how to detect those attacks at an early stage has become an important and urgent topic in the cyber security community. For this purpose, recognizing the C&C (Command & Control) communication between compromised bots and the C&C server becomes a crucially important issue, because C&C communication belongs to the preparation phase of distributed attacks. Although signature-based attack detection has long been applied in practice, it is well known that it cannot efficiently deal with new kinds of attacks. In recent years, ML (machine learning)-based detection methods have been studied widely. In those methods, feature selection is obviously very important to the detection performance. We once utilized up to 55 features to pick out C&C traffic in order to accomplish early detection of DDoS attacks. In this work, we try to answer the question: "Are all of those features really necessary?" We mainly investigate how the detection performance changes as features are removed, starting from those with the lowest importance, and we try to make clear which features deserve attention for the early detection of distributed attacks. We use honeypot data collected during the period from 2008 to 2013. SVM (Support Vector Machine) and PCA (Principal Component Analysis) are utilized for feature selection, and SVM and RF (Random Forest) are used for building the classifiers. We find that the detection performance generally improves as more features are utilized. However, after the number of features reaches around 40, the detection performance hardly changes even if more features are used. It is also verified that, in some specific cases, more features do not always mean better detection performance. We also discuss the 10 important features which have the biggest influence on classification.

Keywords—distributed cyber attacks; DDoS attacks; machine learning; feature selection; early detection

I. DISTRIBUTED CYBER ATTACK AND ITS EARLY DETECTION

The problems and losses caused by cyber attacks have been increasing greatly in recent years, even though much work on avoiding and detecting cyber attacks has been done and a huge amount of money has been invested in cyber security. The main reason for this is that attackers have also become more and more sophisticated.

Distributed attacks are those launched cooperatively by many compromised hosts. Such attacks are referred to as next-generation cyber attacks in Xu's work [1], and it is well known that they are among the most sophisticated attacks. According to many reports, distributed attacks have caused the most serious problems and losses in recent years. Thus, many researchers and developers in the cyber security community have been working on how to detect and avoid such attacks.

In general, the attacker prepares or hijacks a C&C server, which is used to send attack instructions to the compromised hosts (bots). Then, the bots launch the actual distributed attack on the victim(s). The C&C communication is thus the preparation phase of a distributed attack; if such communication is recognized, the upcoming actual attack might be blocked. Therefore, detecting the C&C communication can be regarded as early detection of distributed attacks.

There have been many cases of distributed attacks. The non-profit anti-spam organization Spamhaus [2] suffered a large DDoS attack against its website, whose peak reached about 300 Gbps on March 19, 2013 [3]. That means data equivalent to roughly eight full 4.7 GB DVDs was poured onto Spamhaus's server every second, which finally knocked Spamhaus offline. Another large DDoS attack, peaking at around 400 Gbps, was reported by Cloudflare on March 3, 2016 [4]. All 13 DNS root servers suffered from a distributed attack on June 25, 2016, and this was not the first time that critical DNS infrastructure was targeted. At the end of 2015, several root servers encountered a DDoS attack, and they also experienced a DDoS attack in the middle of May of the same year that brought down services like Yelp and Alexa. The DDoS attack on June 25, 2017 lasted for around three hours, during which the average availability across all root servers dropped to around 50% of normal [5].

Thus, the early detection of distributed attacks, which can stop the upcoming attacks, obviously becomes critically significant.

1 Presently with Fujitsu Ltd., Japan.
Fig. 1. An illustration of online detection.
performance. The detailed examination results will be presented in the next section.
\[ \text{TP rate} = \text{Recall} = \frac{TP}{TP + FN}, \qquad \text{FP rate} = \frac{FP}{FP + TN}, \]
\[ \text{Precision} = \frac{TP}{TP + FP}, \qquad \text{F-Measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}. \]
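Assuming the confusion-matrix counts TP, FP, TN, and FN are available, the four metrics above can be computed directly; the helper name `detection_metrics` below is ours, not from the paper:

```python
def detection_metrics(tp, fp, tn, fn):
    """Compute the four evaluation metrics from confusion-matrix counts:
    true/false positives (tp, fp) and true/false negatives (tn, fn)."""
    tp_rate = tp / (tp + fn)        # TP rate = Recall
    fp_rate = fp / (fp + tn)        # FP rate
    precision = tp / (tp + fp)
    f_measure = 2 * precision * tp_rate / (precision + tp_rate)
    return tp_rate, fp_rate, precision, f_measure
```

For example, `detection_metrics(90, 10, 80, 20)` gives a TP rate of about 0.818 and a Precision of 0.9.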
Fig. 6. Distribution of the variance of packet sizes. The X-axis is for packets sent; the Y-axis for packets received; red points are C&C sessions and blue points are normal sessions.

Fig. 8. Our experiment process: the packet datasets are split into sessions, 55 features are extracted from each session, and the features are ranked; features are then removed gradually, starting from the lowest importance, while a classifier is built and evaluated on the test data by 10-fold cross-validation, observing when the results start changing.
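The experiment process of Fig. 8, i.e. rank the 55 features once and then drop them one at a time from the least important while re-evaluating the classifier, can be sketched as a generic backward-elimination loop. The `rank` and `evaluate` callbacks here are placeholders of our own: in the paper, ranking comes from SVM weights or PCA, and evaluation would be the mean 10-fold cross-validation score of an SVM or RF classifier.

```python
def backward_elimination(features, rank, evaluate, min_keep=1):
    """Sketch of the Fig. 8 loop.

    rank(features)     -> list of features, most important first
    evaluate(features) -> detection score (e.g. mean 10-fold CV TP rate)

    Returns a history of (number of features used, score) pairs,
    recorded as the least important feature is dropped each round.
    """
    ranked = rank(features)
    history = []
    while len(ranked) >= min_keep:
        history.append((len(ranked), evaluate(ranked)))
        ranked = ranked[:-1]          # remove the least important feature
    return history
```

Plotting the returned history (score versus number of features) yields curves like those in Figs. 9-12, which is how the "around 40 features" plateau can be observed.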
Fig. 9. Detection performance based on SVM feature selection and SVM classifier. (a) The TP rate.

Fig. 10. Detection performance based on SVM feature selection and RF classifier. (a) The TP rate.

Fig. 11. Detection performance based on PCA feature selection and RF classifier. (a) The TP rate.

Fig. 12. Detection performance based on PCA feature selection and SVM classifier. (a) The TP rate.
[8] Y. Tang, "Defending against Internet Worms: a Signature-based Approach," in Proc. the 24th IEEE Annual Joint Conference of the Computer and Communications Societies (INFOCOM), pp. 1384–1394, 2005.
[9] I. Yazid, A. Hanan and M. Aizaini, "Volume-based Network Intrusion Attacks Detection," Advanced Computer Network and Security, UTM Press, pp. 147–162, 2008.
[10] A. Kind, M. P. Stoecklin and X. Dimitropoulos, "Histogram-Based Traffic Anomaly Detection," IEEE Transactions on Network Service Management, Vol. 6, No. 2, pp. 1–12, 2009.
[11] E. Eskin and W. Lee, "Modeling System Call for Intrusion Detection with Dynamic Window Sizes," in Proc. DARPA Information Survivability Conference and Exposition (DISCEX), pp. 165–175, 2001.
[12] Y. Feng, Y. Hori, K. Sakurai and J. Takeuchi, "A Behavior-based Method for Detecting Outbreaks of Low-rate Attacks," in Proc. 3rd Workshop on Network Technologies for Security, Administration and Protection (NETSAP), pp. 267–272, 2012.
[13] Y. Xiang, K. Li and W. Zhou, "Low-Rate DDoS Attacks Detection and Traceback by Using New Information Metrics," IEEE Transactions on Information Forensics and Security, Vol. 6, No. 2, pp. 426–437, 2011.
[14] W. Lee and D. Xiang, "Information-theoretic Measures for Anomaly Detection," in Proc. IEEE Symposium on Security and Privacy, pp. 130–143, 2001.
[15] V. Chandola, A. Banerjee and V. Kumar, "Anomaly Detection: a Survey," ACM Computing Surveys, Vol. 41, No. 3, pp. 1–72, 2009.
[16] M. S. Kim, H. J. Kang and S. C. Hong, "A Flow-based Method for Abnormal Network Traffic Detection," in Proc. IEEE/IFIP Network Operations and Management Symposium, pp. 599–612, 2004.
[17] J. Treurniet, "A Network Activity Classification Schema and its Application to Scan Detection," IEEE/ACM Transactions on Networking, Vol. 19, No. 5, pp. 1396–1404, 2011.
[18] Snort Users Manual, http://www.snort.org/docs (accessed on June 20, 2018).
[19] C. Gates, "The Modeling and Detection of Distributed Port Scans: a Thesis Proposal," Technical Report CS-2003-01, Dalhousie University, 2003.
[20] V. Yegneswaran, P. Barford and J. Ullrich, "Internet Intrusions: Global Characteristics and Prevalence," in Proc. 2003 ACM Joint International Conference on Measurement and Modeling of Computer Systems, pp. 138–147, 2003.
[21] G. Gu, P. Porras, V. Yegneswaran, M. Fong and W. Lee, "BotHunter: Detecting Malware Infection Through IDS-driven Dialog Correlation," in Proc. the 16th USENIX Security Symposium, pp. 167–182, 2007.
[22] S. Kondo and N. Sato, "Botnet traffic detection techniques by C&C session classification using SVM," in Proc. 2nd International Conference on Advances in Information and Computer Security, pp. 29–31, 2007.
[23] D. Ashley, "An algorithm for HTTP bot detection," University of Texas at Austin - Information Security Office, 2011.
[24] S. Yamauchi, J. Kawamoto and K. Sakurai, "Evaluation of Machine Learning Techniques for C&C Traffic Classification," (in Japanese) IPSJ Journal, Vol. 56, No. 9, pp. 1745–1753, 2015.
[25] L. Lu, Y. Feng and K. Sakurai, "C&C session detection using random forest," in Proc. the 11th ACM International Conference on Ubiquitous Information Management and Communication (IMCOM), 2017.
[26] J. Weston et al., "Feature Selection for SVMs," http://www.cs.columbia.edu/~jebara/6772/papers/weston01feature.pdf (accessed on April 29, 2018).
[27] Y. W. Chen and C. J. Lin, "Combining SVMs with various feature selection strategies," Feature Extraction, pp. 315–324, Springer Berlin Heidelberg, 2006.
[28] I. Guyon, J. Weston, S. Barnhill and V. Vapnik, "Gene selection for cancer classification using support vector machines," Machine Learning, Vol. 46, No. 1–3, pp. 389–422, 2002.
[29] V. Sugumaran, V. Muralidharan and K. I. Ramachandran, "Feature selection using decision tree and classification through proximal support vector machine for fault diagnostics of roller bearing," Mechanical Systems and Signal Processing, Vol. 21, No. 2, pp. 930–942, 2007.
[30] F. Song, Z. Guo and D. Mei, "Feature Selection Using Principal Component Analysis," in Proc. the International Conference on System Science, Engineering Design and Manufacturing Informatization (ICSEM), pp. 27–30, 2010.
[31] T. Gärtner and P. A. Flach, "WBCSVM: Weighted Bayesian Classification based on Support Vector Machines," in Proc. the 18th International Conference on Machine Learning (ICML), pp. 154–161, 2001.
[32] J. Brank, M. Grobelnik, N. Milic-Frayling and D. Mladenić, "Feature Selection Using Linear Support Vector Machines," Technical Report MSR-TR-2002-63, Microsoft Research, Microsoft Corporation, 2002.
[33] O. Takata et al., "MWS Datasets 2016," Anti-Malware Engineering Workshop (MWS), www.iwsec.org/mws/2016/20160714-takata-dataset.pdf, 2016 (accessed on April 29, 2018).
[34] R. Kohavi, "A study of cross-validation and bootstrap for accuracy estimation and model selection," in Proc. the 14th International Joint Conference on Artificial Intelligence (IJCAI), pp. 1137–1143, 1995.