
2018 IEEE 16th Int. Conf. on Dependable, Autonomic & Secure Comp., 16th Int. Conf. on Pervasive Intelligence & Comp., 4th Int. Conf. on Big Data Intelligence & Comp., and 3rd Cyber Sci. & Tech. Cong.

Feature Selection For Machine Learning-Based
Early Detection of Distributed Cyber Attacks

Yaokai Feng, Faculty of Advanced Information Technology, Kyushu University, Japan. Email: fengyk@ait.kyushu-u.ac.jp
Hitoshi Akiyama, Department of Informatics, Kyushu University, Japan. Email: ru78toy6x hitc@icloud.com
Liang Lu¹, Department of Informatics, Kyushu University, Japan. Email: lu.liang@jp.fujitsu.com
Kouichi Sakurai, Faculty of Informatics, Kyushu University, Japan. Email: sakurai@inf.kyushu-u.ac.jp

Abstract—It is well known that distributed cyber attacks, simultaneously launched from many hosts, have caused the most serious problems in recent years, including privacy leakage and denial of services. Thus, how to detect those attacks at an early stage has become an important and urgent topic in the cyber security community. For this purpose, recognizing the C&C (Command & Control) communication between compromised bots and the C&C server becomes a crucially important issue, because C&C communication takes place in the preparation phase of distributed attacks. Although signature-based attack detection has long been applied in practice, it is well known that it cannot efficiently deal with new kinds of attacks. In recent years, ML (Machine Learning)-based detection methods have been studied widely. In those methods, feature selection is obviously very important to the detection performance. We once utilized up to 55 features to pick out C&C traffic in order to accomplish early detection of DDoS attacks. In this work, we try to answer the question "Are all of those features really necessary?" We mainly investigate how the detection performance changes as features are removed, starting from those with the lowest importance, and we try to make clear which features deserve attention for early detection of distributed attacks. We use honeypot data collected during the period from 2008 to 2013. SVM (Support Vector Machine) and PCA (Principal Component Analysis) are utilized for feature selection, and SVM and RF (Random Forest) are used for building the classifier. We find that the detection performance generally gets better as more features are utilized. However, after the number of features has reached around 40, the detection performance does not change much even if more features are used. It is also verified that, in some specific cases, more features do not always mean better detection performance. We also discuss the 10 important features which have the biggest influence on classification.

Keywords-distributed cyber attacks; DDoS attacks; machine learning; feature selection; early detection;

I. DISTRIBUTED CYBER ATTACK AND ITS EARLY DETECTION

The problems and losses caused by cyber attacks have been increasing greatly in recent years, although much work on avoiding and detecting cyber attacks has been done and a huge amount of money has been invested in cyber security. The main reason for this is that attackers have also become more and more sophisticated.

Distributed attacks are those launched cooperatively by many compromised hosts. Such attacks are referred to as next-generation cyber attacks in Xu's work [1], and it is well known that such attacks are among the most sophisticated. According to many reports, distributed attacks have caused the most serious problems and losses in recent years. Thus, many researchers and developers in the cyber security community have been working on how to detect and avoid such attacks. In general, the attacker prepares or hijacks a C&C server, which is used to send attack instructions to the compromised hosts (bots). Then, the bots launch an actual distributed attack on the victim(s). Thus, the C&C communication is the preparation phase of a distributed attack. If such communication is recognized, the upcoming actual distributed attack might be blocked. The detection of C&C communication is therefore regarded as early detection of distributed attacks.

There have been many cases of distributed attack. The non-profit anti-spam organization Spamhaus [2] suffered a large DDoS attack against its website. That DDoS attack peaked at about 300 Gbps on March 19, 2013 [3]. That means the data of about 64 full 4.7 GB DVDs was poured onto Spamhaus's server every second, which finally knocked Spamhaus offline. Another large DDoS attack, peaking at around 400 Gbps, was reported by Cloudflare on March 3, 2016 [4]. All 13 DNS root servers suffered from a distributed attack on June 25, 2016, and this was not the first time that critical DNS infrastructure was targeted. At the end of 2015, several root servers encountered a DDoS attack, and they also experienced a DDoS attack in the middle of May of the same year that made services like Yelp and Alexa stop. The DDoS attack on June 25, 2017 lasted for around three hours, during which the average availability across all root servers dropped to around 50% of the normal level [5].

Thus, the early detection of distributed attacks, which can stop the upcoming attacks, obviously becomes critically significant. In this study, we try to predict possible upcoming distributed attacks during their preparation phase by picking out their C&C communication.

¹ Presently with Fujitsu Co., Ltd., Japan.

978-1-5386-7518-2/18/$31.00 ©2018 IEEE
DOI 10.1109/DASC/PiCom/DataCom/CyberSciTec.2018.00040

There have been many ML (Machine Learning)-based studies on detecting distributed attacks, including on how to accomplish early detection. Those methods tried to find particular features of the abnormal traffic to distinguish it from the rest. Also, we once used up to 55 traffic features to implement early detection of distributed attacks. In this study, we discuss what happens if the number of features is decreased gradually; that is, how the detection performance changes as features are removed, starting from those of low importance. Of course, the appropriate number of features differs among machine learning algorithms. Honeypot data collected during the period from 2008 to 2013 and the OSS Weka are used in our study. SVM and PCA are used for feature selection, and SVM and RF (Random Forest) are used for building the classifier. We find that the detection performance generally gets better as more features are used. However, after the number has reached around 40, the detection performance does not change much even if more features are used. Moreover, in some specific cases, it is also verified that more features do not always lead to better detection performance. We will also discuss the top-10 important features for detection of C&C communication.

II. EXISTING DETECTION TECHNOLOGIES

Many approaches have been proposed to detect cyber attacks [6], [7], including signature-based methods [8], histogram-based methods [10], [11], volume-based methods [9] and information theory-based methods [13], [14]. However, it is well known that signature-based methods cannot efficiently deal with new kinds of attacks and new variants. This is because they can only detect the anomalies stored in a pre-defined database of signatures. In histogram-based methods, many statistical histograms are built using clean traffic data, and all the histograms are mapped into a high-dimensional space. Such methods are easy to understand, but their false negative rates are often very high [12]. Volume-based methods need thresholds that must be determined in advance, which is not easy in most situations. Information theory-based methods also suffer from the following problems: their performance depends greatly on the information theoretic measure; they behave well only when a significantly large number of anomalies are present in the data; and, moreover, it is difficult to associate an anomaly score with a test instance [15]. To detect scan attacks, session information based on the actions of communication protocols (such as TCP and UDP) was also used in the work [17]. Change-point based methods have also been proposed. For example, the work [16] proposed a change-point based method to detect TCP SYN flood attacks and scan attacks by computing the characteristics of the packets.

Signature-based detection is a direct method to detect botnets in a network, and payload analysis is a common technique of signature-based detection. Gu et al. proposed a botnet detection framework based on signature-based detection, called BotHunter [21], which raises an alert when certain bot behaviors are detected by the Snort [18] intrusion detection system. Bot activity can also be detected by picking out the C&C sessions that occur before actual distributed attacks. Several methods have been proposed to pick out C&C sessions using features of those sessions. For example, the C&C server only sends commands, whose packets are of small size. The work [22] analyzes network traffic and uses machine learning to detect IRC-based C&C communication. In that work, it was found that the visible differences between regular sessions and C&C sessions lie in the packet size and the time interval between packets; that is, C&C sessions show more uniform and smaller packet sizes and a small periodicity of interval time. The work [22] uses three different feature vectors, focusing on the total session, the first 16 packets and a histogram, respectively.

HTTP-based C&C communication was confirmed around 2005. According to the investigation in the work [23], a periodicity occurs in the operation of obtaining commands, and the standard deviation of the time interval of the communication between the C&C server and HTTP-based bots is smaller compared with frequent normal communication. However, because of network interception or a command being unreceivable for some reason, a large number of exceptions may appear.

Some methods to detect C&C sessions over multiple protocols have also been proposed. The work [24] proposed a C&C traffic detection approach based on analysis of network traffic using seven features (e.g., the standard deviation of access time and the access time). That study claims that the approach is able to effectively detect C&C traffic even across multiple protocols. Our previous research [25] adopts 55 features for the same purpose, which is explained in the next section. In that study, adding some new features to the work [24], we defined a 55-dimensional feature vector to improve the detection performance.

Generally, an actual detection system based on machine learning algorithms may be online or offline. An online system must be able to report alerts for possible attacks/anomalies, and the learning result should be updated by invoking the machine learning algorithm repeatedly, while an offline system can use a training dataset and a machine learning algorithm to obtain the learning result, which is then used to pick out attacks from the dataset in which we want to detect possible attacks/anomalies. In order to evaluate the learning result and to know whether it can be used for actual detection, a dataset for testing the learning result is also used, to tune parameters and/or to decide which learning algorithm is suitable for the current application. Figures 1 and 2 depict online detection and offline detection, respectively.

III. OUR DETECTION USING 55 FEATURES

In our previous work [25], all the related features we could find, 55 features in total, were used, and some simple experimental results for verifying the detection performance were

Fig. 1. The image of Online Detection.

Fig. 3. 55 Features Used in Our Detection.


Fig. 2. The image of Offline Detection.

presented. The total size and total number of sent packets and received packets, respectively, are selected. The minimum packet size was also chosen as a feature because we consider that many C&C sessions only send commands to bots, which means the payload of the packets is almost empty. Besides, some statistical features of packet size are also selected, including the average size and the size variance in a session. The interval time between successive packets is also contained in our feature vector because it was reported to be meaningful for picking out C&C traffic in the work [22]. Similar to packet size, the total interval time, minimum interval time, average interval time and the variance of interval times in a session are also used. Furthermore, because C&C sessions just issue similar commands to the bots, the sizes of most C&C packets should be concentrated. Thus, in our research, some scale features of packet size are also used. Specifically, the possible range of packet sizes is split into 15 intervals: 0-99 bytes, 100-199 bytes, and so on, and the proportion of the packets in each interval is utilized. Figure 3 shows all the 55 features.

In this paper, after the behavior of our proposal using 55 features is investigated in more detail, we will try to remove some features to see how the detection performance changes. In this way, we can learn which features are most important for detecting C&C communication.

IV. FEATURE SELECTION AND ITS INFLUENCE ON DETECTION PERFORMANCE

Good feature selection is critically important for many practical machine learning-based attack detection systems. First, it can speed up the detection process; it can also perform feature discovery, that is, find which features are really important. Many studies have indicated that SVMs may perform badly if many irrelevant features are used [26], [27], [28], [29].

In the last section, our work using 55 traffic features to pick out C&C traffic was briefly explained. Obvious questions here are "Are all of them necessary?" and "What if some of them are removed?" To answer these questions, in this paper, after our proposal using 55 features is investigated in more detail, all the 55 features are ranked according to their degrees of importance, and then features are removed gradually to examine the change in the detection performance.

PCA is able to select a number of important individuals from all the feature components and has the potential to perform feature selection [30], and the works [31], [32] confirmed that SVM can perform feature weighting. In this paper, PCA and SVM are employed for ranking the features. The ranked features are removed 5 at a time, starting from those of the least importance, and the change in detection performance is examined using SVM and RF (Random Forest). As shown below, the TP (True Positive) rate, the FP (False Positive) rate, Precision and F-Measure are used to evaluate the detection
performance. The detailed examination result will be presented
in the next section.
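The ranking step can be illustrated with a small sketch. The paper's experiments used Weka, so the scikit-learn code below is only an assumed, minimal reconstruction: features are ranked either by the absolute weights of a linear SVM, or by PCA loadings weighted by explained variance (one plausible way to turn PCA into a feature ranking; the exact criterion used in the paper is not specified here).

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

def rank_features_svm(X, y):
    """Rank features by the magnitude of linear-SVM weights (most important first)."""
    Xs = StandardScaler().fit_transform(X)
    svm = LinearSVC(dual=False).fit(Xs, y)
    weights = np.abs(svm.coef_).ravel()
    return np.argsort(weights)[::-1]

def rank_features_pca(X):
    """Rank features by PCA loadings weighted by each component's explained variance."""
    Xs = StandardScaler().fit_transform(X)
    pca = PCA().fit(Xs)
    scores = np.abs(pca.components_.T) @ pca.explained_variance_ratio_
    return np.argsort(scores)[::-1]

# toy example: 6 sessions x 4 features, binary labels (1 = C&C, 0 = normal)
X = np.array([[40., 1, 0.1, 3], [45., 1, 0.2, 2], [42., 0, 0.1, 3],
              [900., 7, 9.0, 50], [850., 6, 8.5, 45], [870., 8, 9.2, 48]])
y = np.array([1, 1, 1, 0, 0, 0])
print(rank_features_svm(X, y))   # feature indices, most important first
print(rank_features_pca(X))
```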
TP rate = Recall = TP / (TP + FN);

FP rate = FP / (FP + TN);

Precision = TP / (TP + FP);

F-Measure = (2 × Precision × Recall) / (Precision + Recall);

where TP = the number of true positives; TN = the number of true negatives; FP = the number of false positives; FN = the number of false negatives.

Fig. 4. Distribution of average packet sizes. The X-axis is for packets sent; the Y-axis for packets received; red points are C&C sessions and blue points normal sessions.
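As a quick illustration (not part of the paper's own tooling), the four measures can be computed directly from the confusion-matrix counts:

```python
def detection_metrics(tp, fn, fp, tn):
    """Compute the four evaluation measures defined above."""
    tp_rate = tp / (tp + fn)                     # TP rate = Recall
    fp_rate = fp / (fp + tn)
    precision = tp / (tp + fp)
    f_measure = 2 * precision * tp_rate / (precision + tp_rate)
    return tp_rate, fp_rate, precision, f_measure

# hypothetical counts: 90 C&C sessions detected, 10 missed, 5 false alarms
print(detection_metrics(tp=90, fn=10, fp=5, tn=95))
```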
V. EXPERIMENT

A. Data

In our experiment, as C&C session data we used the CCC datasets [33], including the C08, C09, C10 and C13 datasets (https://www.telecom-isac.jp/ccc/). The CCC datasets were collected by honeypots in a project supported by the Japanese government and managed by the Cyber Clean Center. The normal session data was collected in our lab for two months, from Aug. 2012 to Sept. 2012. We collected traffic packets on port 6667, used by IRC, and port 80, used by the HTTP protocol. Because a bot needs to establish a connection with the C&C server, we only take sessions with bidirectional communication. In total, 1162 C&C traffic sessions and 894 normal sessions were extracted successfully.
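The per-session features of Section III can be sketched as follows. This is an illustrative reconstruction, not the authors' actual pipeline: the representation of a session as (timestamp, direction, size) tuples is an assumption, and only a subset of the 55 features is shown (names such as Spc, Svar and Sip0 follow Figure 3 and Table I).

```python
import statistics

def session_features(packets):
    """Compute a few of the per-session features described in Section III.

    `packets` is a list of (timestamp, direction, size) tuples, where
    direction is 'sent' or 'received' (a simplifying assumption).
    """
    sent = [size for _, d, size in packets if d == "sent"]
    times = sorted(t for t, _, _ in packets)
    intervals = [b - a for a, b in zip(times, times[1:])]
    feats = {
        "Spc": len(sent),                    # number of packets sent
        "Stotal": sum(sent),                 # total size of sent packets
        "Smin": min(sent),                   # minimum sent-packet size
        "Savg": statistics.mean(sent),       # average sent-packet size
        "Svar": statistics.pvariance(sent),  # variance of sent-packet sizes
        "ITtotal": sum(intervals),           # total interval time
        "ITvar": statistics.pvariance(intervals) if len(intervals) > 1 else 0.0,
    }
    # proportion of sent packets in each of 15 size intervals: 0-99 B, 100-199 B, ...
    for i in range(15):
        in_bin = [s for s in sent if 100 * i <= s < 100 * (i + 1)]
        feats[f"Sip{i}"] = len(in_bin) / len(sent)
    return feats

pkts = [(0.0, "sent", 60), (0.4, "received", 120), (1.0, "sent", 60), (1.6, "sent", 150)]
f = session_features(pkts)
print(f["Spc"], f["Sip0"], f["Sip1"])
```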
B. Verification of the availability of different features (examples)

Fig. 5. Distribution of packet numbers. The X-axis is for packets sent; the Y-axis for packets received; red points are C&C sessions and blue points normal sessions.

In order to investigate the impact of each feature on attack detection, we need to know the distribution of the normal sessions and the C&C sessions in the feature space. Here are some results of our investigation. Figure 4 shows the distribution of average packet sizes in the feature space: the X-axis is for packets sent, the Y-axis for packets received; the red points are C&C sessions and the blue points normal sessions. Figure 5 shows the distribution of packet numbers, and Figure 6 the distribution of the variance of packet sizes.

From these distribution figures, we can observe that the availability of the different features, when they are used for detecting C&C traffic, differs greatly from feature to feature.

C. Experiment process for feature selection

After the packet lists in our datasets are split into sessions, the features of each session are computed. In our experiment, 10-fold cross-validation [34] is used. That is, each of our session datasets is split into 10 folds, one of which is used as test data while the other nine are used as training data; afterwards, the next fold is selected as test data. In this way, the process is conducted 10 times, and the average is taken as the final result. In summary, our experiment has the following eight steps, which are also depicted in Figure 7. The process is also shown in simplified form in Figure 8.

• Step 1: session split. All the sessions are extracted from the packet datasets.
• Step 2: feature extraction. The 55 features are computed from the session datasets.
• Step 3: feature ranking. The 55 features are ranked using PCA and SVM, respectively.
• Step 4: decision of test data and training data.
• Step 5: building the classifier using SVM and RF, respectively.
• Step 6: testing the classifier.
• Step 7: looping to Step 4 for 10 times.
• Step 8: looping to Step 4 with a decreased number of features.

D. Experiment results on the influence of different feature selections

Two machine learning algorithms, SVM and RF, are used as classifiers, and two algorithms, PCA and SVM, are used for ranking the features by their importance. Thus, in total, the influence of different feature selections on the detection performance is investigated in the following four different cases. As mentioned

in Section IV, the TP rate, the FP rate, Precision and F-Measure are evaluated for each case.
1) SVM feature selection and SVM classifier: the experiment result is shown in Figure 9.
2) SVM feature selection and RF classifier: the experiment result is shown in Figure 10.
3) PCA feature selection and RF classifier: the experiment result is shown in Figure 11.
4) PCA feature selection and SVM classifier: the experiment result is shown in Figure 12.

Fig. 6. Distribution of the variance of packet sizes. The X-axis is for packets sent; the Y-axis for packets received; red points are C&C sessions and blue points normal sessions.

Fig. 7. The flow of our experiment process. [flowchart: start → session splitting → feature extracting → feature ranking (fn = 55; lp = 10) → changing test data → training → testing → loop until lp = 0 → removing some features (updating fn) → repeat until fn < 1 → end]

Fig. 8. Our experiment process. [flowchart: packet datasets → session split → feature extraction (55 features) → feature ranking → classifier (with 10-fold cross-validation) → result, removing features gradually]

E. Observation

From Figures 9 and 10, a slow improvement in the detection performance is observed after the number of features increases beyond 20. From Figure 11, a slow improvement appears after the number of features increases beyond 40. In the case of feature selection by PCA and classification by SVM, however, an irregularity appears (see Figure 12), which is thought to be because PCA immediately deleted features that are important for classification by SVM. Even so, Figure 11 shows that PCA feature selection also leads to good detection performance if RF is used for classification. Generally, the detection performance gets better as more features are utilized. However, after the number of features has reached around 40, the detection performance does not change much even if more features are used. It is also verified that, in some specific cases, more features do not always mean better detection performance; that is, some features may have a bad influence on the detection performance. Thus, our experiment again verified the importance of feature selection. It is necessary to investigate in detail why the results are so different for different machine learning and feature selection algorithms, which will be our future work.

Moreover, from our experiment, the 10 most important features for picking out C&C traffic were found; they are listed in Table I.

TABLE I
THE MOST IMPORTANT FEATURES FOR DETECTING C&C TRAFFIC.

Feature | Description
Spc | amount of packets sent
FlagR | proportion of packets with flag R in a session
Sip1 | proportion of the sent packets having a size of 100-199 B
Sip0 | proportion of the sent packets having a size of 0-99 B
FlagS | proportion of packets with flag S in a session
Svar | variance of the size of the sent packets
RITmax | maximum interval time of the received packets
RITvar | variance of the interval times of the received packets
Sip5 | proportion of the sent packets having a size of 500-599 B
Rvar | variance of the size of the received packets
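The overall experiment loop, 10-fold cross-validation repeated while features are removed five at a time from the least important end, might be sketched as below. This is an illustrative scikit-learn reconstruction (the paper used Weka), and the toy data stands in for the real session dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def performance_vs_feature_count(X, y, ranking, step=5):
    """Evaluate classifiers as features are dropped from the least important end.

    `ranking` lists feature indices from most to least important
    (e.g. produced by a PCA- or SVM-based ranking).
    """
    results = []
    n = len(ranking)
    while n >= 1:
        keep = ranking[:n]                       # the n most important features
        for name, clf in [("SVM", SVC()), ("RF", RandomForestClassifier())]:
            scores = cross_val_score(clf, X[:, keep], y, cv=10)  # 10-fold CV
            results.append((name, n, scores.mean()))
        n -= step
    return results

# toy data standing in for the 55-feature session dataset
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
ranking = list(range(10))                        # assume features already ranked
for name, n, acc in performance_vs_feature_count(X, y, ranking, step=5):
    print(name, n, round(acc, 3))
```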

Fig. 9. Detection performance (TP rate, FP rate, Precision, F-Measure) based on SVM feature selection and SVM classifier.

Fig. 10. Detection performance (TP rate, FP rate, Precision, F-Measure) based on SVM feature selection and RF classifier.

VI. CONCLUSION AND FUTURE WORK

In this paper, we focused on the issue of feature selection for early detection of distributed cyber attacks. We implemented the early detection by detecting the C&C communication of distributed attacks, because this communication occurs in the preparation phase of distributed attacks. Based on our previous research using 55 features to detect C&C communication, in this paper we investigated what happens if we remove features, starting from those of the least importance. We did this for the purpose of finding which features are actually critical for early detection of distributed attacks. From our experiment using traffic data collected by honeypots, we observed that the detection performance generally gets better as more features are utilized. However, after the number of features has reached around 40, the detection performance does not change much even if more features are used. We also found the top-10 important features for detecting C&C traffic. We again verified that some "bad" features can deteriorate the detection performance.

As future work, we will analyze the experiment results in more detail and find the detailed reasons why the detection performance changes in such ways. Also, we will verify our observations using other traffic datasets.

ACKNOWLEDGMENT

This work was partially supported by JSPS KAKENHI Grant Numbers JP17K00187 and JP18K11295. This work was also partially supported by the Strategic International Research Cooperative Program, Japan Science and Technology Agency (JST).

REFERENCES

[1] S. Xu, "Collaborative Attack vs. Collaborative Defense," in Proc. the 4th International Conference on Collaborative Computing (CollaborateCom), pp. 217–228, 2009.
[2] The non-profit anti-spam organization Spamhaus, https://www.spamhaus.org/ (accessed on April 19, 2018).
[3] "The DDoS That Knocked Spamhaus Offline," reported by Cloudflare on March 30, 2013. https://blog.cloudflare.com/the-ddos-that-knocked-spamhaus-offline-and-ho/ (accessed on April 19, 2018).
[4] "400Gbps: Winter of Whopping Weekend DDoS Attacks," reported by Cloudflare on March 3, 2016. https://blog.cloudflare.com/a-winter-of-400gbps-weekend-ddos-attacks/ (accessed on April 19, 2018).
[5] "DDoS Attack Has Varying Impacts on DNS Root Servers," reported by ThousandEyes on July 19, 2016. https://blog.thousandeyes.com/ddos-attack-varying-impacts-dns-root-servers/ (accessed on April 19, 2018).
[6] Y. Feng, Y. Hori, and K. Sakurai, "A Proposal for Detecting Distributed Cyber-Attacks Using Automatic Thresholding," in Proc. the 10th Asia Joint Conference on Information Security (AsiaJCIS), pp. 152–159, 2015.
[7] Y. Feng, Y. Hori, K. Sakurai and J. Takeuchi, "A Behavior-Based Method for Detecting Distributed Scan Attacks in Darknets," Journal of Information Processing, Vol. 21, No. 3, pp. 527–538, 2013.

Fig. 11. Detection performance (TP rate, FP rate, Precision, F-Measure) based on PCA feature selection and RF classifier.

Fig. 12. Detection performance (TP rate, FP rate, Precision, F-Measure) based on PCA feature selection and SVM classifier.

[8] Y. Tang, "Defending against Internet Worms: a Signature-based Approach," in Proc. the 24th IEEE Annual Joint Conference of the Computer and Communications Societies (INFOCOM), pp. 1384–1394, 2005.
[9] I. Yazid, A. Hanan and M. Aizaini, "Volume-based Network Intrusion Attacks Detection," Advanced Computer Network and Security, UTM Press, pp. 147–162, 2008.
[10] A. Kind, M. P. Stoecklin and X. Dimitropoulos, "Histogram-Based Traffic Anomaly Detection," IEEE Transactions on Network Service Management, Vol. 6, No. 2, pp. 1–12, 2009.
[11] E. Eskin and W. Lee, "Modeling System Call for Intrusion Detection with Dynamic Window Sizes," in Proc. DARPA Information Survivability Conference and Exposition (DISCEX), pp. 165–175, 2001.
[12] Y. Feng, Y. Hori, K. Sakurai and J. Takeuchi, "A Behavior-based Method for Detecting Outbreaks of Low-rate Attacks," in Proc. 3rd Workshop on Network Technologies for Security, Administration and Protection (NETSAP), pp. 267–272, 2012.
[13] Y. Xiang, K. Li and W. Zhou, "Low-Rate DDoS Attacks Detection and Traceback by Using New Information Metrics," IEEE Transactions on Information Forensics and Security, Vol. 6, No. 2, pp. 426–437, 2011.
[14] W. Lee and D. Xiang, "Information-theoretic Measures for Anomaly Detection," in Proc. IEEE Symposium on Security and Privacy, pp. 130–143, 2001.
[15] V. Chandola, A. Banerjee and V. Kumar, "Anomaly Detection: a Survey," ACM Computing Surveys, Vol. 41, No. 3, pp. 1–72, 2009.
[16] M. S. Kim, H. J. Kang and S. C. Hong, "A Flow-based Method for Abnormal Network Traffic Detection," in Proc. IEEE/IFIP Network Operations and Management Symposium, pp. 599–612, 2004.
[17] J. Treurniet, "A Network Activity Classification Schema and its Application to Scan Detection," IEEE/ACM Transactions on Networking, Vol. 19, No. 5, pp. 1396–1404, 2011.
[18] Snort Users Manual, http://www.snort.org/docs (accessed on June 20, 2018).
[19] C. Gates, "The Modeling and Detection of Distributed Port Scans: a Thesis Proposal," Technical Report CS-2003-01, Dalhousie University, 2003.
[20] V. Yegneswaran, P. Barford and J. Ullrich, "Internet Intrusions: Global Characteristics and Prevalence," in Proc. 2003 ACM Joint International Conference on Measurement and Modeling of Computer Systems, pp. 138–147, 2003.
[21] G. Gu, P. Porras, V. Yegneswaran, M. Fong and W. Lee, "BotHunter: Detecting Malware Infection Through IDS-driven Dialog Correlation," in Proc. the 16th USENIX Security Symposium, pp. 167–182, 2007.
[22] S. Kondo and N. Sato, "Botnet traffic detection techniques by C&C session classification using SVM," in Proc. 2nd International Conference on Advances in Information and Computer Security, pp. 29–31, 2007.
[23] D. Ashley, "An algorithm for HTTP bot detection," University of Texas at Austin - Information Security Office, 2011.
[24] S. Yamauchi, J. Kawamoto and K. Sakurai, "Evaluation of Machine Learning Techniques for C&C traffic Classification," (in Japanese) IPSJ Journal, Vol. 56, No. 9, pp. 1745–1753, 2015.
[25] L. Lu, Y. Feng and K. Sakurai, "C&C session detection using random forest," in Proc. the 11th ACM International Conference (IMCOM), 2017.
[26] J. Weston et al., "Feature Selection for SVMs," http://www.cs.columbia.edu/ jebara/6772/papers/weston01feature.pdf (accessed on April 29, 2018).
[27] Y. W. Chen and C. J. Lin, "Combining SVMs with various feature selection strategies," Feature Extraction, pp. 315–324, Springer Berlin Heidelberg, 2006.
[28] I. Guyon, J. Weston, S. Barnhill and V. Vapnik, "Gene selection for cancer classification using support vector machines," Machine Learning, 46(1-3), pp. 389–422, 2002.
[29] V. Sugumaran, V. Muralidharan and K. I. Ramachandran, "Feature selection using decision tree and classification through proximal support vector machine for fault diagnostics of roller bearing," Mechanical Systems and Signal Processing, Vol. 21, No. 2, pp. 930–942, 2007.
[30] F. Song, Z. Guo and D. Mei, "Feature Selection Using Principal Component Analysis," in Proc. the International Conference on System Science, Engineering Design and Manufacturing Informatization (ICSEM), pp. 27–30, 2010.
[31] T. Gärtner and P. A. Flach, "WBCSVM: Weighted Bayesian Classification based on Support Vector Machines," in Proc. the 18th International Conference on Machine Learning (ICML), pp. 154–161, 2001.
[32] J. Brank, M. Grobelnik, N. M. Frayling and D. Mladenić, "Feature Selection Using Linear Support Vector Machines," Technical Report MSR-TR-2002-63, Microsoft Research, Microsoft Corporation, 2002.
[33] O. Takata, et al., "MWS Datasets 2016," Anti-Malware Engineering Workshop (MWS), www.iwsec.org/mws/2016/20160714-takata-dataset.pdf, 2016 (accessed on April 29, 2018).
[34] R. Kohavi, "A study of cross-validation and bootstrap for accuracy estimation and model selection," in Proc. the 14th International Joint Conference on Artificial Intelligence (IJCAI), pp. 1137–1143, 1995.
