Abstract. The spread of ransomware has risen exponentially over the past decade,
causing huge financial damage to numerous organizations. Various anti-ransomware
firms have suggested methods for preventing malware threats, but the growing
pace, scale, and sophistication of malware present the anti-malware industry
with ever greater challenges. Recent literature indicates that academics and
anti-virus organizations have begun to apply machine learning as well as
fundamental modeling techniques to the study and identification of malware.
Orthodox signature-based anti-virus programs struggle to identify unfamiliar
malware and to track new forms of malware. In this study, a malware evaluation
framework based on machine learning was adopted that consists of several
modules: dataset compilation in two separate classes (malicious and benign
software), file disassembly, data processing, decision making, and updated
malware identification. The data processing module uses grayscale images, import
functions, and opcode n-grams to extract malware features. The decision-making
module detects malware and recognizes suspected malware. Different classifiers
were considered in the research methodology for the detection and classification
of malware. Their effectiveness was validated on the basis of the accuracy of
the complete process.
1 Introduction
Malware is defined as intrusive software that penetrates or destroys a system
without the permission of the user. Malware is a common concept that threatens
all sorts of devices. A basic distinction is between file infectors and
standalone malware. According to their specific behavior, malware items can be
classified into adware, viruses, trojans, spyware, rootkits, etc. Malware
detection through traditional signature-based methods (Santos, et al. [1]) is
very problematic because older as well as new malware programs use polymorphic
layers to avoid detection, and evasion mechanisms help attackers develop new
malware versions in a shorter time in order to escape antivirus detection. For
malware identification through dynamic file review in a virtual environment,
the interested reader is referred to Rieck, et al. [2]. The classical methods
for detecting metamorphic viruses are discussed in Konstantinou, et al. [3].
Received February 7th, 2021, 1st Revision May 18th, 2021, 2nd Revision September 8th, 2021, Accepted for
publication October 29th, 2021.
Copyright © 2021 Published by IRCS-ITB, ISSN: 2337-5787, DOI: 10.5614/itbj.ict.res.appl.2021.15.3.5
Saleh Abdulaziz Habtor & Ahmed Haidarah Hasan Dahah
Cyber threats become possible when criminals use malware as a primary weapon
in their operations. Therefore, one information protection issue is to detect
ransomware in time so that it can be blocked to prevent the attackers from
accomplishing their goals, or at least delay them long enough to stop them.
Various detection methods, such as rule-based or signature-based methods,
enable the analyst to manually specify rules over specific data that identify
harmful content, or to fit such descriptions automatically to the
specifications of the detection model. The automated generation of signatures
is a middle ground between these two methods. To date, manual and automated
rules and signatures have been preferred over machine learning and statistical
techniques in the information security field due to the low false positive
rates they can achieve.
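The signature idea above can be sketched minimally: a detector that flags a file only when its cryptographic hash matches a stored signature. The signature set below is a hypothetical stand-in for a real feed, not actual malware data.

```python
import hashlib

# Hypothetical signature database: SHA-256 digests of known-malicious samples.
SIGNATURES = {
    hashlib.sha256(b"malicious payload v1").hexdigest(),
    hashlib.sha256(b"malicious payload v2").hexdigest(),
}

def is_known_malware(file_bytes: bytes) -> bool:
    """Flag a file only if its hash matches a stored signature exactly."""
    return hashlib.sha256(file_bytes).hexdigest() in SIGNATURES

# A single changed byte (as in polymorphic malware) defeats the exact match:
print(is_known_malware(b"malicious payload v1"))   # True
print(is_known_malware(b"malicious payload v1!"))  # False
```

The second call illustrates exactly why polymorphic layers evade signature matching: any byte-level mutation yields a new digest, which motivates the learning-based approaches discussed next.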
In recent years, however, three advances have strengthened the potential for
progress in machine-based learning techniques, suggesting that these strategies
will attain high detection rates at low false positive rates without the pressure of
producing manual signatures. The first such trend is the rise of commercial threat
intelligence feeds that offer large quantities of new malware, which means that
the safety community has access to labeled malware for the first time. The second
trend is that processing power has become cheaper, so researchers can iterate
more easily on learning models in malware detection systems and fit larger and
more complex models to the data. Thirdly, machine learning has developed as
a discipline, which means that researchers have more resources for effective
detection models that can achieve both accuracy and scalability breakthroughs.
1.2 Malware
Malware is software that is used or designed to interrupt network processes,
capture personal information, or control private computer systems. It can be
found in JavaScript, scripts, active content, and applications. Malware is
commonly used as an umbrella term for several types of software that are
offensive, disruptive, or irritating.
Malware Use:
1. Many early infectious programs were written as experiments or pranks,
including the first Internet worm.
2. Today, malware is mostly used to capture confidential information for the
benefit of others, including personal, financial and business information.
3. Malware is often used extensively to capture or destroy secured information
from government or business websites.
4. Malware, however, is also used to obtain personal data such as credit card
numbers, social security numbers, bank accounts etc.
2 Methods of Detection
Malware identification approaches can be categorized into signature-based and
behavior-based strategies. Before discussing these methods, it is crucial to
consider the fundamentals of the two malware analysis approaches: static
analysis of malware and dynamic analysis of malware.
Machine Learning Classifiers for Malware Detection
Static analysis takes place 'statically', i.e. without executing any files. In
contrast, dynamic analysis is carried out on a virtual machine.
Static analysis is interpreted as 'reading' the source code of the malware and
attempting to deduce behavioral features from it. Various strategies can be
used in static analysis (Prasad, Annangi & Pendyala [11]):
1. File format inspection: The file metadata can be helpful. For example,
Windows PE files contain information about compile time, imported and
exported functions, etc.
2. String extraction: This means program output inspection (for example, status
or error messages) and the inference of malware process information.
3. Fingerprinting: This involves the calculation of a cryptographic hash and
the identification of environmental artifacts, such as hard-coded usernames,
passwords, or strings placed in the registry.
4. AV scanning: If the examined file is known malware, it will likely be
detected by the available anti-virus scanners. While this identification can
seem trivial, AV vendors and sandboxes use it to 'confirm' their results.
5. Disassembly: This involves reversing the program code into assembly
language to deduce the structure and purpose of the application. This is the
most widely used and accurate static analysis method.
6. Dynamic analysis: in contrast to static analysis, in dynamic analysis the
file under investigation is executed and monitored, and the features and
purposes of the file are derived from the observed behavior. The file is
normally run in a simulated environment, such as a sandbox. All behavioral
characteristics, such as opened directories, generated mutexes, etc., can be
observed during this kind of analysis, and it is often easier than static
analysis. However, dynamic analysis only reveals the behavior that applies
to the present system configuration: if the virtual machine runs Windows 7,
the results may differ from those obtained for the same malware under
Windows 8.1 (Egele, et al. [12]).
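The fingerprinting and string-extraction steps listed above can be sketched with the Python standard library alone. The sample byte blob and the minimum string length of 4 are illustrative assumptions, not values from the paper.

```python
import hashlib
import re

def fingerprint(data: bytes) -> dict:
    """Cryptographic hashes, used to identify a sample across AV feeds."""
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }

def extract_strings(data: bytes, min_len: int = 4) -> list:
    """Pull runs of printable ASCII, as a strings-extraction pass would."""
    return [s.decode("ascii") for s in re.findall(rb"[ -~]{%d,}" % min_len, data)]

# Illustrative blob mixing binary noise with embedded artifacts:
blob = b"\x00\x01MZ\x90http://example.com\x00cmd.exe /c\x00\x02"
print(extract_strings(blob))  # ['http://example.com', 'cmd.exe /c']
```

Strings such as URLs or shell commands recovered this way feed directly into the inference of malware process information described in item 2.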
When using a heuristic process, there must be a certain threshold of malware
activity, which determines the number of heuristics necessary to identify a
program as malicious. For instance, a variety of suspicious operations such as
'changed registry key', 'link created', 'changed permit', etc., may be
identified. The heuristic would then assume that every program that performs
at least five such operations is malicious. While this strategy gives some
degree of reliability, it is not necessarily valid, since some features may
carry extra 'weight' compared to others; for example, 'changed permit' usually
has more drastic effects on a device than 'changed registry key'. Moreover,
certain combinations of features may be more suspicious than the same features
taken separately (Rieck, et al. [17]).
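A weighted variant of the heuristic described above can be sketched as follows. The weights and the threshold are illustrative assumptions chosen for the demo, not values from the paper.

```python
# Each suspicious operation carries its own weight, reflecting that e.g.
# 'changed permit' is more drastic than 'changed registry key'.
# Weights and threshold below are illustrative, not from the study.
WEIGHTS = {
    "changed registry key": 1.0,
    "link created": 0.5,
    "changed permit": 3.0,
    "created mutex": 0.5,
}
THRESHOLD = 5.0

def heuristic_score(observed_ops):
    """Sum the weights of all observed suspicious operations."""
    return sum(WEIGHTS.get(op, 0.0) for op in observed_ops)

def is_suspicious(observed_ops):
    return heuristic_score(observed_ops) >= THRESHOLD

ops = ["changed registry key", "changed permit", "link created", "changed permit"]
print(heuristic_score(ops))  # 7.5
print(is_suspicious(ops))    # True
```

Replacing the flat count of five operations with per-operation weights captures the point made in the text, although combinations of features would still need their own joint weights.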
3 Related Work
In 2001, Schultz, et al. [18] introduced machine learning for the detection of
new malware, using static features such as byte n-grams and strings extracted
from program executables. In 2007, Bilar [19] investigated opcodes as a
predictor of malware by examining the distribution of opcode frequencies in
non-malicious and malicious code. In 2007, Elovici, et al. [20] used Feature
Selection and Decision Tree (5-grams, top 300, FS), Bayesian Network (5-grams),
Artificial Neural Network (5-grams, top 300, FS), Decision Tree (using the PE
header), and Bayesian Network (using the PE header) classifiers, achieving an
accuracy of 95.8%. In 2008, Moskovitch, et al. [21] used filters for feature
selection and classification with Decision Tree (DT), Naïve Bayes (NB),
Adaboost.M1 (boosted DT and NB), Artificial Neural Network (ANN), and Support
Vector Machine (SVM) classifiers; using the Fisher score and gain ratio (GR),
they reached an accuracy of 94.9%.
Again in 2008, Moskovitch, et al. [22] used opcode n-grams (2, 3, 4, 5, and 6
grams) as features and applied document frequency (DF), GR, and FS feature
selection. They used the ANN, DT, Boosted DT, NB, and Boosted NB
classification algorithms, among which ANN, DT, and BDT performed best while
retaining a low false positive rate.
Santos, et al. [23] noted in 2011 that supervised learning requires labeled
data, so semi-supervised learning was introduced to recognize unknown malware.
In 2011, opcode frequencies were again used by Santos, et al. [24]. They
applied a feature selection approach and various classifiers, i.e. DT,
K-Nearest Neighbors (KNN), Bayesian Network, and Support Vector Machine (SVM),
achieving accuracies of 92.92% and 95.90% for different opcode sequence
lengths. Shabtai, et al. used n-gram opcode pattern features in 2012 to
determine the best feature selection measure among document frequency (DF),
G-mean, and Fisher score ranking. They used several classifiers in their
method, with Random Forest exceeding 95.146% accuracy (Shabtai, et al. [25]).
In 2016, Ashu, et al. [26] proposed a new method for high-precision detection
of unknown malware. They studied the frequencies of opcodes and grouped them.
The authors tested thirteen classifiers from the WEKA machine learning
workbench, among them FT, J48, NBT, and Random Forest, and obtained over
96.28% accuracy for malware. In 2016, Sahay, et al. [27], using the Optimal
K-Means Clustering algorithm, clustered malware executables, and these
clusters were used by classifiers (FT, J48, NBT, and Random Forest) as
promising training features to identify unknown malware. They found that the
proposed solution identified unknown malware with 99.11% accuracy.
Some scholars have recently been working with the new malware dataset on
Kaggle [28]. In 2016, Ahmadi, et al. [29] collected Microsoft malware data and
used hex-dump-based characteristics (string length, metadata, entropy,
n-grams, and image depiction) as well as characteristics derived from
disassembled files (metadata, opcodes, registers, etc.) with the XGBoost
classification algorithm. They achieved an accuracy of ~99.8%. In 2017, for
the classification of polymorphic malware, Drew, et al. [30] employed the
Super Threaded Reference-Free Alignment-Free N-sequence Decoder (STRAND)
classifier. They introduced an ASM sequence model and achieved a precision of
more than 98.59% with a 10-fold cross-validation approach.
In Souri, et al. [31], a number of malware detection techniques are presented
in two categories:
1. signature-based methods, and
2. behavior-based methods.
The survey, however, did not include a study of current deep learning methods
or of the types of features used for malware detection and classification in
data-mining techniques. Ucci, et al. [32] categorized the methods according
to:
a. the objective problem they are trying to solve,
b. the types of characteristics taken from portable executable files (PEs),
and
c. the machine learning algorithms they use.
Although that research provides a full overview of the taxonomy of features,
new research trends, notably multimodal and deep learning approaches, are not
outlined.
organization, and author. However, the paper does not define the features of
malware detectors and does not consider the latest technologies in this field.
Sakhnini, et al. (2019) [44] present a bibliometric survey focusing on the
security aspects of IoT-enabled smart grids. Furthermore, the authors address
the different types of cyber attacks found in relation to this topic.
Yazdinejad, et al. (2020) [45] designed a novel RNN model in order to detect
malware threats in cryptocurrencies. The authors for this particular study
collected 500 samples of cryptocurrency malware and 200 samples of goodware.
Table 1 Recent research in machine learning-based Android malware detection.

Authors: Sahs & Khan (2012) [46]
Features: Permissions, CFG subgraphs
Algorithm: 1-class SVM
Comment: Sahs & Khan's approach yielded high recall with low precision. The
vast majority of our in-lab classifiers yielded both a high recall and a high
precision.
been collected in one single dataset or repository yet. In this study, the
publicly accessible dataset Ember was used, with a subset containing 70,140
benign and 69,860 malicious files. This dataset was randomly divided into 60%
training and 40% testing data using Scikit-learn. The training dataset
consisted of 42,140 benign files and 41,860 malicious files; the testing
dataset contained 28,000 benign files and 28,000 malicious files. These
samples were derived from VirusTotal, VirusShare, and privately collected
benign and malware samples (Kaggle [28]).
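The 60%/40% split described above can be sketched with the standard library; this mirrors what Scikit-learn's splitter does, using the class counts given in the text. The sample names and seed are illustrative.

```python
import random

# Labels: 0 = benign, 1 = malicious; counts taken from the dataset description.
random.seed(42)
samples = [("benign_%d" % i, 0) for i in range(70140)] + \
          [("mal_%d" % i, 1) for i in range(69860)]
random.shuffle(samples)

# 60% training, 40% testing.
cut = int(len(samples) * 0.6)
train, test = samples[:cut], samples[cut:]
print(len(train), len(test))  # 84000 56000
```

With 140,000 files in total, the split yields exactly 84,000 training and 56,000 testing samples, consistent with the per-class counts reported (42,140 + 41,860 and 28,000 + 28,000).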
4.2.4 n-gram
In this analysis, we used an n-gram model to extract opcode features from the
malware. It is a simple way to extract text features. The occurrence of a term
is assumed to depend only on the previous n − 1 terms, n being the length of
one feature sequence. If we have a sequence of L opcodes, it is split into
L − n + 1 attributes. This model extracts feature sequences with a sliding
window. A 3-gram model, for example, applied to the sequence call, push, mov,
add, pop, inc, xor, as shown in Figure 4, yields five short sequences of three
opcodes each.
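The sliding-window extraction can be sketched directly, using the example opcode sequence from the text:

```python
def opcode_ngrams(opcodes, n=3):
    """Slide a window of size n: L opcodes yield L - n + 1 feature sequences."""
    return [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]

# The 7-opcode example from the text with a 3-gram model:
seq = ["call", "push", "mov", "add", "pop", "inc", "xor"]
grams = opcode_ngrams(seq, n=3)
print(len(grams))  # 5, i.e. L - n + 1 = 7 - 3 + 1
print(grams[0])    # ('call', 'push', 'mov')
```

The count of 5 sequences matches the L − n + 1 formula for L = 7 opcodes and n = 3.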
p(si, y) is the joint probability distribution of si and y, while p(si) and
p(y) are the marginal probability distributions of S and Y, respectively.
Information gain is used to measure a sequence's ability to differentiate
malware. We use a two-step dimension-reduction technique.
If the condition of the function is satisfied, it is deleted. This indicates that the
characteristics are not found in the malware categories. Then, a new value of the
data is determined:
\( I(S;Y) = \sum_{s_i \in S} \sum_{y \in Y} p(s_i, y) \log \frac{p(s_i, y)}{p(s_i)\, p(y)} \)  (4)
Each feature is assigned a weight. This scheme seeks to increase the weight of
features with low frequency but high discriminative power. We keep the 500
features with the greatest information-gain values.
amount and then correctly categorizes the individual unit. Different
classifiers perform differently on different malware families. Each classifier
is therefore able to provide classification outcomes with a greater degree of
confidence.
Sample A1 A2 A3 A4 A5 A6 A7 A8 A9 A10
P1 1 1 0 0 0 0 0 0 0 0
P2 0 0 0 0 0 0 0 0 1 1
P3 0 0 0 0 1 1 1 1 1 1
This gives D(P1, P2) = 2, D(P2, P3) = 2, and D(P1, P3) = 6. Two conclusions
follow:
1. P1 and P3 have the same relation to P2, i.e. they are at the same distance
from P2.
2. Since P1 and P3 are both similar to P2, P1 should also be similar to P3.
Suppose the abovementioned three examples are instances of malware, where a
non-zero value indicates the presence of an attribute. The table shows that P1
and P2 share no attributes, while P2 and P3 share two attributes. This is why
the Euclidean distance does not necessarily reflect the similarity of samples
in a high-dimensional space.
Given the problems described above, we followed the shared nearest neighbor
(SNN) approach, which works well in high-dimensional spaces. Jarvis & Patrick
[43] first proposed this method. The similarity between two points is
characterized by the number of nearest neighbors they share, requiring an
overlap of at least k points. This approach has the advantage that it can
cluster points of varying densities. As shown in Figure 6, clusters of varying
densities are represented by circles of different sizes.
Each row of the similarity matrix stores the positional relation M between two
points: M(A, B) = 1 means that B is among the nearest neighbors of A. Each row
keeps only the k smallest values; all other values are set to 0. The matrix is
used to create the k-nearest-neighbor (K-NN) graph. In Figure 6, the points O
and P are noise or outliers, but the graph alone does not differentiate them.
The strength of a link is calculated by:

\( str(O, P) = \sum (k + 1 - m)(k + 1 - n) \)  (5)

where m and n are the ranks of a shared neighbor in the k-NN lists of O and P.
If a point's link strength is smaller than a given threshold, it loses all its
edges; points O and P are then listed as outliers. Ertoz, et al. focused on
link strength to choose the core points of each cluster, those with a higher
connection capacity. In each cluster, a point is either one of the core points
or linked to a core point.
Figure 6 (a) Near neighbor graph and (b) weighted shared near neighbor graph.
files and malware detected, where the accuracy of the classification of malicious
files was about 98%.
Table 3 Detection evaluation using K-Nearest Neighbor.
S.N. Family of Sample Correctly Classified Incorrectly Classified Accuracy
1. Benign 3499 834 82.3%
2. Dridex 1880 340 84.2%
3. Locky 1340 280 83.4%
4. TeslaCrypt 2600 40 98%
5. Vawtrak 920 160 84.5%
6. Zeus 1820 580 78%
7. DarkComet 2840 100 96%
8. CyberGate 2300 0 100%
9. Xtreme 1880 160 93%
10. CTB-Locker 1040 280 79.2%
Support Vector Machine was the next algorithm that was tested. In Table 5 and
Figure 8, the outcome of the predictions can be seen. The overall accuracy
obtained for multi-class classification was 87.6% and for binary classification
94.6%. The maximum accuracy was 100%, achieved by CTB-Locker, and the
minimum accuracy was 59.3%, achieved by Vawtrak. Table 6 shows the
classification of files as goodware or malware using Support Vector Machine. As
the results show, around 83.8% accuracy was seen for benign files and around
89.3% accuracy in the case of malware.
Table 5 Detection evaluation using Support Vector Machine.
S.N. Family of Sample Correctly Classified Incorrectly Classified Accuracy
1. Benign 3996 335 93%
2. Dridex 1940 280 87.3%
3. Locky 1280 340 78.7%
4. TeslaCrypt 2240 400 85.1%
5. Vawtrak 620 460 59.3%
6. Zeus 1880 520 79.1%
7. DarkComet 2900 40 98.2%
8. CyberGate 2240 40 98%
9. Xtreme 1880 160 92.1%
10. CTB-Locker 1320 0 100%
Table 6 Accuracy of benign and malicious files using Support Vector Machine.
Class Correctly Classified Incorrectly Classified Accuracy
Benign 3996 8335 83.8%
Malicious 16300 2240 89.37%
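The per-class accuracies in Tables 3 through 8 follow directly from the correct/incorrect counts. A small sketch, using two rows of Table 5; note the computed value can differ slightly from the rounded figure printed in the table.

```python
def class_accuracy(correct, incorrect):
    """Accuracy as used in the detection tables: correct / (correct + incorrect)."""
    return 100.0 * correct / (correct + incorrect)

# CTB-Locker row of Table 5: 1320 correctly classified, 0 incorrectly.
print(round(class_accuracy(1320, 0), 1))   # 100.0
# DarkComet row of Table 5: 2900 correct, 40 incorrect.
print(round(class_accuracy(2900, 40), 1))  # 98.6 (table reports 98.2%)
```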
Table 8 Accuracy of benign and malicious files using J48 Decision Tree.
Class Correctly Classified Incorrectly Classified Accuracy
Benign 3996 8335 83.8%
Malicious 16300 2240 99.5%
6 Conclusion
Because of the ever-growing number of malware variants and the variety of
malware activities, there is renewed interest in and need for effective
malware detectors to protect against zero-day attacks. Anti-virus firms
typically collect millions of malicious samples, which are obtained and
analyzed in the usual manner, delaying the identification of any unusual
samples that harm users. Our primary aim was to create a machine-learning
system that detects as many malware samples as possible, under the tough
constraint of a zero false positive rate. We came quite close to our goal, but
still have a non-zero false positive rate. For this method to become part of a
highly competitive commercial product, a number of deterministic exemption
mechanisms must be added. In the proposed work, the Random Forest and Naïve
Bayes classifiers showed the best results.
The system was validated using a sample of 140,000 files consisting of malware
and benign files. The malware was further divided into 9 different classes on
the basis of their properties. The complete sample list was split at a 60%/40%
ratio into a training dataset and a testing dataset for system training and
decision making, respectively. Given that most anti-virus products achieve a
detection rate of more than 90%, the increase of 3 to 4% in overall detection
rate produced by our algorithms is very significant.
7 Future Scope
In the future, more features will be considered to develop a better model that
uses a more robust deep learning technique for the detection of cyber attacks.
It should also be capable of detecting different types of malware attacks and
of automatically dealing with all types of cyber attacks.
References
[1] Santos, I., Penya, Y.K., Bringas, P.G. & Devesa, J., N-grams-based File
Signatures for Malware Detection, Proceedings of the 11th International
Conference on Enterprise Information Systems - Artificial Intelligence and
Decision Support Systems, pp. 317-320, 2009.
[2] Rieck, K., Holz, T., Willems, C., Düssel, P. & Laskov, P., Learning and
Classification of Malware Behavior, International Conference on
Detection of Intrusions and Malware, and Vulnerability Assessment, pp.
108-125, 2008.
[3] Konstantinou, E. & Wolthusen, S., Metamorphic Virus: Analysis and
Detection, Technical Report, RHUL-MA-2008-02, Royal Holloway
University of London, 2008.
[4] Horton, J. & Seberry, J., Computer Viruses: An Introduction, University of
Wollongong, 1997.
[5] Smith, C., Matrawy, A., Chow, S. & Abdelaziz, B., Computer Worms,
Architectures, Evasion Strategies, and Detection Mechanisms, Journal of
Information Assurance and Security, 4, pp. 69-83, 2009.
[6] Moffie, M., Cheng, W., Kaeli, D. & Zhao, Q., Hunting Trojan Horses,
Proceedings of the 1st Workshop on Architectural and System Support for
Improving Software Dependability, pp. 12-17, October, 2006.
[7] Chien, E., Techniques of Adware and Spyware, Proceedings of the
Fifteenth Virus Bulletin Conference, Dublin, Ireland, 47, 2005.
[8] Chuvakin, A., An Overview of Unix Rootkits, iALERT White Paper,
iDefense Labs, http://www.megasecurity.org/papers/Rootkits.pdf, 2003.
[9] Chumachenko, K., Machine Learning Methods for Malware Detection and
Classification, Department of Information Technology, University of
Applied Science, Bremen, 2017.
[10] Savage, K., Coogan, P. & Lau, H., The Evolution of Ransomware, Version
1.0, Symantec Corporation, http://www.symantec.com/content/en/us/
enterprise/media/security_response/whitepapers/the-evolution-of-
ransomware.pdf., August 6, 2015.
[11] Prasad, B.J., Annangi, H. & Pendyala, K.S., Basic Static Malware Analysis
Using Open-Source Tools, 2016.
[12] Egele, M., Scholte, T., Kirda, E. & Kruegel, C., A Survey on Automated
Dynamic Malware-analysis Techniques and Tools, ACM computing
surveys (CSUR), 44(2), pp. 1-42. 2008.
[13] Ronen, R., Radu, M., Feuerstein, C., Yom-Tov, E. & Ahmadi, M.,
Microsoft Malware Classification Challenge, arXiv preprint
arXiv:1802.10135, 2018.
[14] Gibert, D., Mateu, C. & Planes, J., The Rise of Machine Learning for
Detection and Classification of Malware: Research Developments, Trends
and Challenge, Journal of Network and Computer Applications, 153,
102526, 2020.
[15] Chu, Q., Liu, G. & Zhu, X., Visualization Feature and CNN Based
Homology Classification of Malicious Code, Chinese Journal of
Electronics, 29(1), pp. 154-160, 2020.
[16] Baskaran, B. & Ralescu, A., A Study of Android Malware Detection
Techniques and Machine Learning, MAICS, pp. 15-23, 2016.
[17] Rieck, K., Trinius, P., Willems, C. & Holz, T., Automatic Analysis of
Malware Behavior Using Machine Learning, Journal of Computer
Security, 19(4), pp. 639-668, 2011.
[18] Schultz, M.G., Eskin, E., Zadok, E. & Stolfo, S.J., Data Mining Methods
for Detection of New Malicious Executables, in Proceedings 2001 IEEE
Symposium on Security and Privacy, pp. 38-49, IEEE, 2001.
[19] Bilar, D., Opcodes as Predictor for Malware, International Journal of
Electronic Security and Digital Forensics, 1(2), pp. 156-168, 2007.
[20] Sharma, S., Krishna, C.R. & Sahay, S.K., Detection of Advanced Malware
by Machine Learning Techniques, Soft Computing: Theories and
Applications, Springer, Singapore, pp. 333-342., 2019.
[21] Shabtai, A., Moskovitch, R., Elovici, Y. & Glezer, C., Detection of
Malicious Code by Applying Machine Learning Classifiers on Static
Features: A State-of-the-Art Survey, Information Security Technical
Report, 14(1), pp. 16-29, 2009.
[22] Moskovitch, R., Feher, C., Tzachar, N., Berger, E., Gitelman, M., Dolev,
S. & Elovici, Y., Unknown Malcode Detection Using Opcode
Representation, European Conference on Intelligence and Security
Informatics, Springer, Berlin, Heidelberg, pp. 204-215, 2008.
[23] Santos, I., Nieves, J. & Bringas, P.G., Semi-supervised Learning For
Unknown Malware Detection, International Symposium on Distributed
Computing and Artificial Intelligence, Springer, Berlin, Heidelberg, 2011.
[24] Santos, I., Brezo, F., Ugarte-Pedrero, X. & Bringas, P.G., Opcode
Sequences as Representation of Executables for Data Mining-based
Unknown Malware Detection, Information Sciences, 231, pp. 64-82, 2013.
[25] Shabtai, A., Kanonov, U., Elovici, Y., Glezer, C. & Weiss, Y.,
‘Andromaly’: A Behavioral Malware Detection Framework for Android
[39] Tian, R., Batten, L., Islam, Md.R. & Versteeg, S., An Automated
Classification System Based on the Strings of Trojan and Virus Families,
2009 4th International Conference on Malicious and Unwanted Software
(MALWARE), pp. 23-30, IEEE, 2009.
[40] Ye, Y., Li, T., Chen, Y. & Jiang, Q, Automatic Malware Categorization
Using Cluster Ensemble, Proceedings of the 16th ACM SIGKDD
International Conference on Knowledge Discovery and Data Mining, pp.
95-104, July, 2010.
[41] Qinghua, H., Yu, D., Xie, Z. & Li, X., EROS: Ensemble Rough Subspaces,
Pattern Recognition, 40(12), pp. 3728-3739, 2007.
[42] Tao, H., Ma, X-P. & Qiao, M-Y., Subspace Selective Ensemble Algorithm
Based on Feature Clustering, Journal of Computers 8(2), pp. 509-516,
2013.
[43] Jarvis, R.A. & Patrick., E.A., Clustering using a Similarity Measure Based
on Shared Near Neighbors, IEEE Transactions on Computers, 100(11), pp.
1025-1034, 1973.
[44] Sakhnini, J., Karimipour, H., Dehghantanha, A., Parizi, R.M. & Srivastava,
G., Security Aspects of Internet of Things Aided Smart Grids: A
Bibliometric Survey, Elsevier’s Internet of Things, 100111, 2019.
[45] Yazdinejad, A., HaddadPajouh, H., Dehghantanha, A., Parizi, R.M.,
Srivastava, G. & Chen, M-Y., Cryptocurrency Malware Hunting: A Deep
Recurrent Neural Network Approach, Applied Soft Computing, 96,
106630, 2020.
[46] Laitner, J.A., Nadel, S., Elliott, R.N., Sachs, H. & Khan, S., The Long-Term
Energy Efficiency Potential: What The Evidence Suggests, E121, American
Council for an Energy-Efficient Economy, Washington DC, 2012.
[47] Amos, B., Turner, H. & White, J., Applying Machine Learning Classifiers
to Dynamic Android Malware Detection at Scale, 2013 9th International
Wireless Communications And Mobile Computing Conference (IWCMC),
pp. 1666-1671, IEEE, 2013.
[48] Yerima, S.Y., Sezer, S. & McWilliams, G., Analysis of Bayesian
Classification-based Approaches for Android Malware Detection, IET
Information Security, 8(1), pp. 25-36, 2013.
[49] Canfora, F., Nonlinear Superposition Law and Skyrme Crystals, Physical
Review D, 88(6), 065028, 2013.
[50] Wu, D-J., Mao, C-H., Wei, T-E., Lee, H-M. & Wu, K-P., Droidmat:
Android Malware Detection through Manifest and API Calls Tracing, 2012
Seventh Asia Joint Conference on Information Security, pp. 62-69, IEEE,
2012.