Ransomware Detection and Classification Using Ensemble Learning: A Random Forest Tree Approach

Shahid Anwar
Knowledge Unit of Systems and Technology (KUST), University of Management and Technology, Sialkot, Pakistan
21001279008@skt.umt.edu.pk

Abdul Ahad
Knowledge Unit of Systems and Technology (KUST), University of Management and Technology, Sialkot, Pakistan
Department of Electronics and Communication Engineering, Istanbul Technical University
ahad@gmail.com

Mudassar Hussain
Knowledge Unit of Systems and Technology (KUST), University of Management and Technology, Sialkot, Pakistan
mudassar.hussain@skt.umt.edu.pk

Ibraheem Shayea
Department of Electronics and Communication Engineering, Istanbul Technical University
ibr.shayea@gmail.com

Ivan Miguel Pires
Instituto de Telecomunicações, Covilhã, Portugal
impires@it.ubi.pt

Abstract—Viruses significantly threaten computer systems, potentially causing extensive damage and data loss. All users must prioritize cyber security by installing effective antivirus software, safeguarding their PCs against potential harm. Even though there are many different kinds of malware, ransomware is particularly dangerous since it prevents victims from accessing their vital data or locks files permanently unless they pay a ransom to the attackers. Recent ransomware strains must be categorized promptly. Data for the present investigation was gathered from a variety of web resources, including Kaggle and ransomware.re: benign samples were acquired from Kaggle, while ransomware samples were retrieved from ransomware.re. Several preprocessing methods, such as normalisation and imputation, are used to clean the datasets. The most recent additions to the dataset were classified using the Random Forest Tree classifier, with a final accuracy of 99.9%. Random Forest Tree fared exceptionally well compared to the KNN and SVM algorithms. We also highlight that additional preprocessing methods can enhance outcomes for SVM and KNN.

Index Terms—ransomware, viruses, Random Forest Tree, Support Vector Machine, K-Nearest Neighbors

This work is funded by FCT/MEC through national funds and co-funded by FEDER—PT2020 partnership agreement under the project UIDB/50008/2020.

I. INTRODUCTION

Antivirus software is a basic need of any computer system, protecting it from attacks by viruses. As technology advances, cyber threats are also increasing, requiring more computer system protection. Ransomware is the latest and most advanced version of this threat. With ransomware, the victim loses access to his files. This threat is also called crypto-locker because it locks all the files on the system and demands money to decrypt them again [1]. A victim of this threat has only two choices: pay the money demanded by the hacker, or erase all data from the drive and reinstall an operating system such as Windows. But is it simple to lose or erase all your important data?

First, a ransomware .exe file is downloaded from an unknown resource, through a phishing attack, or as a downloadable file sent by email. Once the user downloads it, it appears as a .exe file; when the user installs it, it takes over the system. It uses different parameters, such as the name of the computer and information related to the processor, to generate a key that is unique to each PC; this hashed data is used to uniquely identify victims [2]. Second, it encrypts files and folders, locks the entire system, or prevents normal usage. Third, it locks every important file and displays a ransom message with a payment deadline and the amount the victim must pay to unlock his files or computer.

Files targeted by a ransomware attack: once this virus enters a computer, it looks for files with extensions such as .txt, .doc, .rtf, .chm, .ppt, .cpp, .db, .zip, .jpg, .mdb, .asm, .key, .pdf, and .pgp [3].

Encryption method: when the target files are found, it encrypts them with the RSA + AES algorithm to prevent the owner from accessing them without paying the attacker. Ransomware restricts a user's access to their computer: it encrypts all files and folders on a
laptop or locks the whole computer with a password, and the attacker demands money in exchange for a key to decrypt the data or for the password that restores full access to the computer. It is a widely spreading attack. Every 13 seconds, a computer falls victim to a ransomware attack, and companies suffer losses in the millions of dollars [4] due to this threat. This research investigates previously developed techniques to detect ransomware attacks and how people or businesses can protect their data and privacy to avoid them. Moreover, the proposed model is able to detect and classify ransomware families by using an ensemble learning approach called Random Forest Tree. Results are compared with other state-of-the-art classification models (KNN and SVM) to validate the proposed model's accuracy, precision, recall, and f1-score.

A. Key Contributions

The key contributions of this work are summarized below.
1) Preparing the data for ransomware detection.
2) Normalization of the dataset to prepare it for the model to classify ransomware better.
3) Implementation of the proposed model for the classification of ransomware, and comparison of its results with other classification models to validate the performance of the proposed model.

The rest of the paper proceeds as follows: Section II presents state-of-the-art (related) work. In Section III, the proposed model is discussed. Section IV presents the results obtained from this research, and finally, Section V concludes this work and presents future research directions.

II. RELATED WORK

J. Hwang et al. [5] presented a two-stage mixed method for detecting ransomware. The proposed method combined a Markov model and a Random Forest model to capture the characteristics of ransomware. The first stage focused on Windows API call sequence patterns and used a Markov model to extract the features of ransomware. The second stage used a Random Forest machine learning model on the remaining data to control false positive and false negative error rates. The authors reported that the method achieved an overall accuracy of 97.3%, with a false positive rate of 4.8% and a false negative rate of 1.5%. This approach was presented as a promising solution that could improve current methods of ransomware detection and could further be developed for practical use.

A. Mahindru and A. L. Sangal [6] presented a framework named MLDroid for identifying Android malware using machine learning methods. The authors contend that, due to malware authors' increased ability to elude detection by changing their programs, existing signature-based malware detection techniques are losing effectiveness. To solve this issue, the authors suggested a model that detects malware using machine learning methods. They gathered a dataset of benign and malicious Android applications to train and test multiple machine-learning models for malware detection. They also suggested using approaches such as permission-based feature extraction and opcode n-grams to increase the detection rate. According to the researchers, their system used various machine learning models to obtain high detection rates of almost 97%. The framework also recorded minimal false positive rates, showing it can successfully differentiate between legitimate and malicious apps. The authors also discussed the framework's shortcomings and upcoming work to increase accuracy. In conclusion, the authors offer a viable method for identifying Android malware using machine learning techniques. The quality of the data used to train the model and the reliability of the feature extraction technique will impact how well any machine learning-based malware detection system performs.

H. Rathore et al. [7] concentrated on creating a malware detection system that effectively utilizes Machine Learning and Deep Learning techniques, because conventional signature-based approaches can no longer keep up with quickly changing malware. The research involved training and evaluating multiple Machine Learning and Deep Learning models on a dataset of benign and malicious Windows executable files. In the authors' new feature extraction technique, the assembly code of the executable file is used as input to the models. The study's findings revealed that, with an average detection rate of 97.32 percent, deep learning models outperformed machine learning models in terms of detection rate. However, this strategy's drawback was the high computational cost and memory requirement of deep learning models, which the authors suggested as a potential area of improvement in future work.

F. Khan et al. [8] presented a novel method for identifying ransomware utilizing digital DNA sequencing and machine learning methods. According to the authors, there is a need for a more sophisticated strategy because conventional signature-based ransomware detection methods are insufficient to identify new strains of ransomware. The authors put forth a new feature extraction technique that extracts features from ransomware samples using digital DNA sequencing and applies a machine learning-based classifier to determine whether the samples are malicious or benign. They gathered a dataset of benign and ransomware files and utilized it to test and refine their suggested methodology. According to the authors, their approach had minimal false positive rates and high detection rates of around 98 percent. They admitted, nevertheless, that the dataset they utilized was small and not representative of the entire ransomware ecosystem. They also suggested that future work should include a much more diverse and extensive dataset to increase the method's robustness.

H. Zhang et al. [9] proposed N-grams of opcodes as a new machine-learning technique for categorizing various ransomware families. The authors aim to solve the issue of correctly recognizing and classifying different kinds of ransomware. The study takes ransomware samples and extracts the opcodes, presenting them as N-grams. The samples are then classified using machine learning models that have been trained using these N-grams. The study's findings show that
the suggested method can accurately classify several ransomware families, with an accuracy of up to 97.44 percent. However, the authors also acknowledge the modest size of the dataset employed and the need for additional research to boost performance. The authors recommend further evaluating the strategy and enlarging the dataset in future work.

N. Udayakumar et al. [10] presented a study on classifying malware samples using machine learning methods. The paper investigated the application of several machine-learning techniques for precisely classifying and identifying malware samples. As part of the study's approach, features were extracted from malware samples, and these features were used to train a variety of machine learning algorithms, including Support Vector Machines, Random Forests, and Neural Networks. The algorithms' performance was then assessed using various criteria, including accuracy, precision, recall, and F1-score. According to the study's findings, Random Forest had the best accuracy, at 98.5%, compared to the SVM and neural network algorithms. However, the study has significant drawbacks, including the small dataset size and lack of coverage of all malware strains.

N. K. Sreelaja [11] presented a study using the Ant Colony Optimisation (ACO) algorithm to boost the effectiveness of signature matching in filtering ransomware. The research suggested a novel strategy for leveraging the ACO algorithm to enhance signature-based ransomware detection. The study's methodology utilized the ACO algorithm to speed up the signature-matching step that identifies ransomware: the ACO algorithm determines which ransomware sample and signature match best. The proposed strategy is then compared to the conventional binary search method and assessed using a variety of metrics, including false positive and true positive rates. The study's findings show that the suggested method employing the ACO algorithm had a higher true positive rate of 99.5% and a lower false positive rate of 0.1%. The study does, however, acknowledge significant limitations, including that it was only tested on a small dataset of well-known ransomware strains and could not identify unknown variants.

S. H. Kok et al. [12] presented a study on creating measures for evaluating machine learning-based methods for detecting crypto-ransomware. The paper aims to suggest a new evaluation metric for gauging the effectiveness of crypto-ransomware detection systems based on machine learning. The study's approach formulates a new evaluation metric, based on the mean average precision (MAP) and the area under the receiver operating characteristic curve (AUC-ROC), that considers both the detection and false positive rates. The performance of several machine learning-based crypto-ransomware detection systems is then assessed using the proposed evaluation metric.

The study's findings suggest that the proposed assessment metric is superior to conventional metrics like accuracy (86.5%), precision (85.2%), recall (90.7%), and F1-score (87.9%) for assessing the effectiveness of crypto-ransomware detection systems. The study does, however, admit significant limitations, including using a small dataset for testing and excluding specific scenario possibilities. The authors advise testing the performance of the suggested evaluation metric using more challenging and realistic settings and a larger and more varied dataset.

E. Berrueta et al. [13] offered a method based on file-sharing traffic analysis to find and stop crypto-ransomware activities. The paper's primary goal is to address the severe threat ransomware poses to individuals and businesses, particularly in corporate settings where one infected computer can lock access to all shared files to which it has access. The suggested method keeps track of all data sent between clients and file servers, and it uses machine learning to look for patterns in the data that reveal ransomware operations while reading and overwriting files. It is the first proposal intended to function with both encrypted and clear-text file-sharing protocols. The article aims to distinguish ransomware activity from high activity by innocuous programs by extracting features from network data that describe the activity of accessing, closing, and changing files. More than 2,400 hours of 'not infected' traffic from actual users and more than 70 ransomware files from 33 different strains were used to train and test the proposed technique.

The methodology utilized in the paper includes employing a network probe to acquire and analyze network traffic and machine learning methods to examine the captured data. The number of TCP connections, bytes exchanged, the order of messages between the client and server, packet sizes, inter-packet timings, inactivity times, connection durations, and combinations of any of these are among the features utilized in the training and testing of the model. A neural network with three hidden layers of neurons was found to be the most effective model.

The validation findings demonstrate that the suggested tool can detect all ransomware binaries listed, even those not utilized during the training phase. With more than 2,400 hours of real user traffic, the tool has a false positive rate of 0.004%. It can identify all ransomware binaries used during the training phase in an average of 30.2 seconds. It detects 100% of a batch of 10 crypto-ransomware binaries not utilized in the training phase, losing an average of only 99 MB of user data before discovery. The study also identifies various limitations of the tool, such as its focus on just Microsoft Windows operating systems and its relevance only to cases where essential files are kept on a file server. The authors mention that the suggested approach is static and that future work will concentrate on developing better adaptive training methodologies so that new ransomware strains can be added to the model and assessed for improvement or degradation in results.

III. METHODOLOGY

This section describes the approach used to investigate the problem of ransomware detection using the ensemble learning method Random Forest Tree (RFT). We look into the possibility of using RFT techniques to improve the detection of ransomware attacks in a distributed network. The proposed method is divided into three phases: data preprocessing, ransomware detection using RFT, and classification of ransomware families and their variants.
A. Pre-Processing

In the preprocessing phase, data is collected for benign and ransomware files from different websites. After collecting the dataset, we converted all the data into an Excel file and defined its features based on the requirements. After fine-tuning, the dataset is converted into useful preprocessed data. The columns include the type of file, hash code, benign or virus label, and ransomware family. The dataset consists of 20,000 samples of ransomware and almost 30,000 benign files. We also divided our dataset into training and test data with a ratio of 80:20.

Algorithm 1 An algorithm for Pre-Processing phase

Procedure: PREPROCESSDATA
1) Download the ransomware dataset in JSON format from ransomware.re
2) Convert the ransomware dataset to CSV format using https://csvjson.com/json2csv
3) Load the CSV file into the machine learning environment
4) Clean the data by handling missing values, duplicate records, and irrelevant columns
5) Process the dataset for features by selecting and extracting relevant features
6) Download a dataset of benign files from Kaggle
7) Combine the ransomware and benign datasets
8) Use the preprocessed dataset for machine learning tasks
9) Evaluate the quality of the preprocessed dataset to ensure the accuracy and reliability of the machine-learning models

Fig. 1. Preprocessing of Dataset diagram

B. Ransomware Classification using Random Forest Tree

The proposed methodology, shown in Figure 2, is intended to solve the difficult task of classifying ransomware. It works with a normalized dataset, ensuring the data is scaled or transformed to a standard range. This phase is critical for effective modeling and categorization. The model uses a hard voting strategy to categorize the ransomware samples, leveraging the aggregated judgments of several classifiers. The model makes robust and accurate classification decisions by considering the majority vote among these classifiers.

The Random Forest (RF) technique is used as the classifier in this model. Because of its ability to manage complicated and non-linear interactions within data, Random Forest has been shown to be extremely effective in classifying the most recent forms of ransomware. The use of RF contributes to the model's outstanding performance. The suggested model delivers exceptional results, achieving 99.9% accuracy on both the training and testing datasets. With such high accuracy, the model correctly identifies nearly all instances in the dataset. This accomplishment demonstrates the model's dependability and precision in identifying ransomware samples.

The preprocessing of the dataset is an important component that improves the effectiveness of the Random Forest model. Preprocessing entails multiple stages, including dealing with missing or null values, removing duplicate records, and removing extraneous columns. The model is better suited to extract relevant patterns and features after the dataset has been properly cleaned and prepared, resulting in high classification results.

An imputer library is used to address missing values in the dataset. The imputer library provides several ways of filling in or replacing missing values. Here, we use an approach based on the most frequently occurring value in the relevant column. This imputation technique ensures that the dataset remains complete, avoiding negative consequences for the classification process.

In addition, an Ordinal Encoder is used to convert the dataset to float format. Ordinal encoding assigns numerical values to categorical variables, allowing the Random Forest classifier to handle the data correctly. This encoding approach preserves the ordinal relationships between categories and enables the model to capture the inherent information in categorical variables.

Mathematically, it is explained below.

C. Normalization of Dataset

$$X_{\mathrm{norm}} = \mathrm{NormalizationFunction}(X) \tag{1}$$
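As an illustration only, the three preprocessing steps described above (most-frequent imputation, ordinal encoding to floats, and normalization as in Eq. (1)) can be sketched for a single column in plain Python. The paper does not specify NormalizationFunction, so min-max scaling to [0, 1] is an assumption here, as are the helper names, the use of None as the missing-value marker, and the sample column. Libraries such as scikit-learn offer comparable tools (SimpleImputer with strategy="most_frequent", OrdinalEncoder, MinMaxScaler); whether the authors used exactly these is not stated.

```python
from collections import Counter

MISSING = None  # hypothetical marker for a missing cell


def impute_most_frequent(column):
    """Replace missing cells with the most frequent observed value."""
    observed = [v for v in column if v is not MISSING]
    fill = Counter(observed).most_common(1)[0][0]
    return [fill if v is MISSING else v for v in column]


def ordinal_encode(column):
    """Map each category to a float code, ordered by sorted category name."""
    codes = {cat: float(i) for i, cat in enumerate(sorted(set(column)))}
    return [codes[v] for v in column]


def min_max_normalize(column):
    """Scale numeric values to [0, 1] (assumed form of NormalizationFunction)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no information; map it to 0.0.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def preprocess_column(column):
    """Impute, then encode, then normalize one categorical column."""
    return min_max_normalize(ordinal_encode(impute_most_frequent(column)))


# Hypothetical "type of file" column with one missing entry.
file_types = ["exe", "dll", None, "exe", "pdf"]
print(preprocess_column(file_types))  # [0.5, 0.0, 0.5, 0.5, 1.0]
```

In this sketch the missing entry is filled with "exe" (the most frequent value), the categories are coded dll=0.0, exe=1.0, pdf=2.0, and the codes are then rescaled to [0, 1].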
In the dataset, values are stored in rows and columns and saved as strings. In Eq. (1), X denotes the dataset, and X_norm represents the normalized values that are stored for further processing.

Imputation of Missing Values:

$$X_{\text{train-imputed}} = \mathrm{ImputeFunction}(X_{\mathrm{train}}), \qquad X_{\text{test-imputed}} = \mathrm{ImputeFunction}(X_{\mathrm{test}})$$

After normalization, we noticed that some fields had missing values. To fill in those missing values, we applied the imputation function. Here, X_train-imputed and X_test-imputed are the variables that store the dataset values after the missing values have been handled by applying ImputeFunction to the training and testing data.

Ordinal Encoding:

$$X_{\mathrm{encoded}} = \mathrm{encoder.fit\_transform}(X)$$

To convert categorical data into numerical data, we used an ordinal encoder, which is a handy tool in Python for this purpose.

Algorithm 2 Classification using Random Forest
1) X_norm ← NormalizationFunction(X) {Normalization of Dataset}
2) RF.fit(X_train, y_train)
3) RF_prediction ← RF.predict(X_test) {Predicting the labels for testing data}
4) train_accuracy ← CalculateAccuracy(y_train, RF.predict(X_train)) {Calculate accuracy on training data}
5) test_accuracy ← CalculateAccuracy(y_test, RF_prediction) {Calculate accuracy on testing data}
6) X_train_imputed ← ImputeFunction(X_train) {Imputation of Missing Values}
7) X_test_imputed ← ImputeFunction(X_test) {Imputation of Missing Values}
8) X_encoded ← encoder.fit_transform(X) {Ordinal Encoding}
9) procedure CALCULATEACCURACY(true_labels, predicted_labels)
10)   total_instances ← length(true_labels)
11)   correct_predictions ← 0
12)   for i ← 1 to total_instances do
13)     if true_labels[i] = predicted_labels[i] then
14)       correct_predictions ← correct_predictions + 1
15)     end if
16)   end for
17)   accuracy ← correct_predictions / total_instances
18)   return accuracy
19) end procedure

D. Hard Voting Technique:

We prefer to use the hard voting technique to combine the results of multiple decision trees because it is easy to implement and selects the class label by majority vote.

$$\mathrm{MajorityVote}(X) = \arg\max\bigl(\mathrm{Count}(f_1(X)), \mathrm{Count}(f_2(X)), \ldots, \mathrm{Count}(f_k(X))\bigr) \tag{2}$$
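The majority vote in Eq. (2) can be illustrated with a minimal plain-Python sketch. This is not the authors' implementation: the function names, the tie-breaking rule (the label seen first wins), and the toy tree outputs are assumptions made for the example.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the label predicted by the largest number of trees.

    tree_predictions holds one predicted label per tree for a single sample.
    Ties are broken in favour of the label seen first (an assumption).
    """
    counts = Counter(tree_predictions)
    return max(counts, key=counts.get)


def hard_vote(per_tree_outputs):
    """Apply the majority vote sample by sample.

    per_tree_outputs[k][i] is the label that tree k assigns to sample i.
    """
    return [majority_vote(votes) for votes in zip(*per_tree_outputs)]


# Hypothetical outputs of three trees on four samples
# (1 = ransomware, 0 = benign).
trees = [
    [1, 0, 1, 1],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
]
print(hard_vote(trees))  # [1, 0, 1, 1]
```

Each sample receives the label that most trees agree on; for the first sample, two of the three trees vote 1, so the ensemble predicts 1.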
We have used the argMax() function to find the class label index with the highest count, which will be our predicted class.

Random Forest (RF) Classifier:

RF(X). Finally, we trained our random forest classifier for the final prediction on the training and testing datasets.

Accuracy Calculation:

$$\mathrm{TrainAccuracy} = \frac{\text{Number of correct predictions on training dataset}}{\text{Total number of training instances}}$$

$$\mathrm{TestAccuracy} = \frac{\text{Number of correct predictions on testing dataset}}{\text{Total number of testing instances}}$$

IV. RESULTS AND DISCUSSION

In this research, we developed a new dataset after collecting information from websites like Kaggle and ransomware.re. Figure 2 shows the percentage of ransomware families and benign files included in this dataset.
Fig. 2. Percentage of Ransomware families and benign dataset

Assume a training set

$$X_{\mathrm{train}} = \{(x_1, y_1), \ldots, (x_n, y_n)\}$$

drawn randomly from a probability distribution

$$(x_i, y_i) \sim (X, Y)$$

Here X denotes the data, and Y is the target label. We built a classifier that predicts y from x based on the example dataset X.

The ensemble of classifiers is represented as $h = \{h_1(X), \ldots, h_k(X)\}$. If each $h_k(X)$ is a decision tree, then the combination of all of them is a random forest. The parameters of the decision tree for classifier $h_k(X)$ are defined to be

$$\theta_k = (\theta_{k1}, \theta_{k2}, \ldots, \theta_{kp})$$

It can be written in another way, as below.

$$h_k(X) = h(X \mid \theta_k)$$

So the final classifier becomes

$$h_k(X) = h(X \mid \theta_k)$$

The classifier was then further tested on the training and testing datasets and achieved 100% accuracy in classifying ransomware in the early stages.

To compute precision, divide the number of true positives by the sum of true positives and false positives:

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

The results show a precision value of 1.00 from the confusion matrix.

Recall is computed using the following formula:

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$

The results show a recall value of 1.00, obtained from the confusion matrix displayed.

The Random Forest classifier outperformed the other classifiers and gave 100% accuracy. Results are shown in Table I.

TABLE I
RESULTS OF THE RANDOM FOREST TREE

Class            Precision   Recall   f1-score
0                1.00        1.00     1.00
1                1.00        1.00     1.00
Accuracy
Macro Avg.       1.00        1.00     1.00
Weighted Avg.    1.00        1.00     1.00

Figure 3 shows the confusion matrix of the results.

Fig. 3. Confusion matrix

For every occurrence in the dataset, the model has produced accurate predictions. In other words, all positive and negative instances have been correctly identified, and neither false positives nor false negatives have occurred. This shows that the model has produced accurate predictions, resulting in a flawless categorization performance. K-Nearest Neighbours (KNN), Support Vector Machine (SVM), and Random Forest were the three machine learning algorithms tested and contrasted. It is usual practice to employ these algorithms for categorization problems. According to the tests, the Random Forest method fared better in prediction accuracy than KNN and SVM, classifying the data points with a higher degree of accuracy. The training and test datasets showed 99.9% accuracy for the Random Forest method. This suggests that there were virtually no false positives or negatives and that the model correctly predicted nearly every instance in both datasets.

V. CONCLUSION

In this research, we classified ransomware families using random forest trees. We applied data preprocessing techniques like normalization and simple imputation on a new dataset generated from the data available online on Kaggle and ransomware.re. We used an ensemble learning model called Random Forest. Random Forest outperformed the other classifiers and achieved 99.9% accuracy, while KNN and SVM performed less well, with 97% and 74% accuracy, respectively. These algorithms can improve accuracy by applying
more refining and preprocessing on the dataset. Preprocessing describes the preliminary processes necessary to convert unprocessed or raw data into a format appropriate for additional analysis or modeling.

The accuracy and efficiency of a model may be hampered by the noise, inconsistencies, missing values, or irrelevant information frequently present in raw data. Preprocessing ensures the dataset is trustworthy and reflects the underlying phenomenon by cleaning the data, eliminating outliers, and filling in any missing values. In short, preprocessing sets the foundation for successful data-driven tasks and significantly enhances the quality and reliability of the results obtained.

ACKNOWLEDGMENTS

This work is funded by FCT/MEC through national funds and co-funded by FEDER—PT2020 partnership agreement under the project UIDB/50008/2020.

REFERENCES

[1] A. Arabo, R. Dijoux, T. Poulain, and G. Chevalier, “Detecting ransomware using process behavior analysis,” Procedia Computer Science, vol. 168, pp. 289–296, 2020.
[2] D. Kansagra, M. Kumhar, and D. Jha, “Ransomware: A threat to cyber security,” CS Journals, vol. 7, no. 1, 2016.
[3] A. Kapoor, A. Gupta, R. Gupta, S. Tanwar, G. Sharma, and I. E. Davidson, “Ransomware detection, avoidance, and mitigation scheme: A review and future directions,” Sustainability, vol. 14, no. 1, p. 8, 2021.
[4] A. AlSabeh, H. Safa, E. Bou-Harb, and J. Crichigno, “Exploiting ransomware paranoia for execution prevention,” in ICC 2020-2020 IEEE International Conference on Communications (ICC), IEEE, 2020, pp. 1–6.
[5] J. Hwang, J. Kim, S. Lee, and K. Kim, “Two-stage ransomware detection using dynamic analysis and machine learning techniques,” Wireless Personal Communications, vol. 112, no. 4, pp. 2597–2609, 2020.
[6] A. Mahindru and A. Sangal, “Mldroid—framework for android malware detection using machine learning techniques,” Neural Computing and Applications, vol. 33, no. 10, pp. 5183–5240, 2021.
[7] H. Rathore, S. Agarwal, S. K. Sahay, and M. Sewak, “Malware detection using machine learning and deep learning,” pp. 402–411, 2018.
[8] F. Khan, C. Ncube, L. K. Ramasamy, S. Kadry, and Y. Nam, “A digital dna sequencing engine for ransomware detection using machine learning,” IEEE Access, vol. 8, pp. 119710–119719, 2020.
[9] H. Zhang, X. Xiao, F. Mercaldo, S. Ni, F. Martinelli, and A. K. Sangaiah, “Classification of ransomware families with machine learning based on n-gram of opcodes,” Future Generation Computer Systems, vol. 90, pp. 211–221, 2019.
[10] N. Udayakumar, V. J. Saglani, A. V. Cupta, and T. Subbulakshmi, “Malware classification using machine learning algorithms,” in 2018 2nd International Conference on Trends in Electronics and Informatics (ICOEI), IEEE, 2018, pp. 1–9.
[11] N. Sreelaja, “Ant colony optimization based light weight binary search for efficient signature matching to filter ransomware,” Applied Soft Computing, vol. 111, p. 107635, 2021.
[12] S. Kok, A. Azween, and N. Jhanjhi, “Evaluation metric for crypto-ransomware detection using machine learning,” Journal of Information Security and Applications, vol. 55, p. 102646, 2020.
[13] E. Berrueta, D. Morato, E. Magaña, and M. Izal, “Crypto-ransomware detection using machine learning models in file-sharing network scenario with encrypted traffic,” arXiv preprint arXiv:2202.07583, 2022.
