Report Format
ABSTRACT
2.7 TECHNOLOGIES AND FRAMEWORK TO BE USED
2.7.8 DATASET
3.4 CONCLUSION
4.3.1 TEST PLANNING
4.5 SUMMARY
6.6 CONCLUSION
REFERENCES
APPENDICES
I, Yotam Mkandawire, student number 201905063, hereby declare that this is my
original work and that it has never been submitted to any university for any award.
SIGN:
ACKNOWLEDGEMENTS
First and foremost, I would like to acknowledge my God for his grace; in times when I
faced impossible deadlines and personal challenges, God strengthened me. Secondly, I
extend my deep gratitude to my family and friends for the love and support rendered to
me throughout the life of this project; it really does take a village to raise a child. Thirdly,
this academic work would not have come to fruition without the vigilant guidance
of Dr. Aaron Zimba, whose faith in my capabilities and patience in seeing this project come
together are inspiring. Last but not least, I would like to acknowledge my lecturers for always
being there to support and guide me whenever they could.
ABSTRACT
The term ransomware has become a common headline, and the impact of this kind of
software has been expanding fast, leaving a trail of severe losses in its wake. Individuals and
businesses alike have been victims of ransomware, with victims paying millions of
dollars in ransom money. Victims have also suffered data losses as a result of failing to pay
the ransom or failing to decrypt the encrypted data. The chaos caused by ransomware has
inspired research in the field, and mitigation measures are on the rise, but the majority of them
focus on network-level detection and prevention. This leaves a significant research gap in
the field of host-based ransomware mitigation approaches. The goal of this research is
therefore to create a host-based ransomware detection model/framework that is capable of
detecting more recent ransomware variants. Incremental and spiral software development
methodologies were used to develop the two main modules of this research. Much focus was
dedicated to appropriately labeling the dataset and extracting optimum features using
conventional machine learning classifiers. Multiple classifiers were tested and the best
classifier with regard to accuracy was selected. Features were extracted from the sandbox
report and a prediction was made. The results of the experiments showed a high
success rate. The findings provide useful reference points for the field of ransomware
detection and prevention.
LIST OF FIGURES
Figure 1: Evolution of ransomware attack techniques (Zimba and Chishimba, 2019)
Figure 2: Novel attack model (Zimba and Chishimba, 2019)
Figure 3: Recovery-prevention techniques (Kharraz et al., 2015)
Figure 4: I/O access monitor in UNVEIL (Kharraz et al., 2015)
Figure 5: Shows the workflow of the API monitoring program (Honda et al., 2018)
Figure 6: Flow chart of the proposed framework.
Figure 7: Incremental model (Sommerville, 2011).
Figure 8: Waterfall Model (Sommerville, 2011)
Figure 9: Overview of dataset.
Figure 10: System overview.
Figure 11: Initial step in model training.
Figure 12: Optimize input set with Extra Tree Classifier.
Figure 13: Split dataset to train and test set.
Figure 14: Model training.
Figure 15: Use case diagram.
Figure 16: Activity diagram.
Figure 17: Unit testing of Model training module.
Figure 18: Unit testing for Report Processing module.
Figure 19: Unit testing of the Detection module.
Figure 20: Classification report of the trained and tested Logistic Regression model.
Figure 21: Confusion matrix.
Figure 22: ROC curve and AUC.
Figure 23: Host Based Ransomware Detection Web Application.
Figure 24: Upload directory of the web application.
Figure 25: Submitted sample being processed.
Figure 26: Results of submitted sample.
Figure 27: Triple Constraint model (Van Wyngaard, Pretorius and Pretorius, 2012)
Figure 28: Shows the count-total for the proposed model
Figure 29: Shows the Complexity of Weighting Factors for the proposed model.
Figure 30: Shows the LOC for the proposed model.
Figure 31: Shows the effort and duration for the proposed model.
Figure 32: Gantt chart view of software development plan.
Figure 33: Schedule for software development.
Figure 34: Model training code snippet.
Figure 35: Report processing module code snippet.
LIST OF TABLES
Table 1: Comparison of Systems
Table 2: Technologies Used.
Table 3: Test planning
Table 4: COCOMO Constants
ACRONYMS AND ABBREVIATIONS
FN - False Negative
FP - False Positive
I/O - Input/output
OS - Operating System
PE - Portable Executable
RSA - Rivest–Shamir–Adleman
TN - True Negative
TP - True Positive
CHAPTER 1 - INTRODUCTION
1.1 INTRODUCTION
Devising defense mechanisms against ransomware is impossible without
an insightful understanding of the paradigm. This chapter therefore gives a background
of ransomware, followed by the problem statement, project aim, project objectives, project
scope, project justification, and a summary.
The AIDS Trojan Horse, also known as PC-CYBORG, made its first appearance
in 1989 and was the first known instance of ransomware. The malware demanded that victims
pay a $189 ransom (Hernandez-Castro, Cartwright and Cartwright, 2020). This
ransomware not only proved the concept, but also combined it with several attack
strategies that are still in use today. "To fool the recipients, the Trojan was placed in a socially engineered package with
a floppy disk. The attacker mass-mailed the item by surface mail, addressing it to a mailing
list to which the attacker had subscribed. The creator of this spyware was arrested on blackmail
charges." (Geri, Jota and Avert, 2006).
Adam Young and Moti Yung introduced the notion of cryptovirology in the academic
literature for the first time in 1996 (Young and Yung, 1996). Employing public-key
cryptography for extortion is the central feature of the Young and Yung technique, and the
cryptographic scheme used must not be open to compromise via key reverse-engineering.
In other words, once victims have been infected, they have no choice but to communicate
with the attackers and possibly pay a ransom in order to recover their files (Young
and Yung, 1996).
Over the years, ransomware has been divided into two types: locker ransomware and
crypto-ransomware (O’Kane, Seer and Carlin, 2018). Locker ransomware disrupts
basic computer functionality while leaving the victim's data intact;
it typically locks computing devices or user interfaces and demands a
ransom payment to unlock them. Crypto-ransomware, on the other hand, encrypts the victim's
files on a computer or network and demands a ransom to decrypt them. It is worth noting
that crypto-ransomware attacks do not encrypt the entire hard disk, but rather look for
files with the extensions that matter most to victims (Human et al., 2021).
At first, ransomware was mainly a problem for the Windows platform. However,
Linux, Mac and Android systems have all since fallen prey to ransomware attacks. Ransomware
evolution appears to track technological advancement closely. For example, newer devices
such as smart watches are already targets, and in 2016 researchers wrote ransomware that
attacks a smart thermostat (Casen, Li and Williams, 2021): “If researchers can do it, so can
ransomware” (Savage, Cogan and Lau, 2015).
In this research, interest is directed towards host-based detection techniques for the
most recent ransomware attack techniques (as of the time of writing).
Data loss: some data can never be recovered once encrypted regardless of the
availability of a decryption key.
Data insecurity: ransomware does not offer a guarantee that the encrypted data will be
restored nor does it guarantee data integrity and confidentiality once a ransom is paid.
Data corruption: once encrypted, the integrity of some of the data is compromised.
Denial of service: service in this regard is having access to data, computer or network.
Extortion: ransomware is fundamentally a crime.
Abuse of Crypto-currency: although the ransom payment method is left to the discretion
of the attacker, most ransomware uses crypto-currency as the mode of payment.
Investigation Challenges: due to the incorporation of crypto-currencies in newer
ransomware attack techniques, tracking and investigation by the authorities have been
very limited and challenging.
1.3 AIM
The main aim of this project is to design a host-based ransomware detection framework
with the aid of machine learning.
1.4 OBJECTIVES
The following are the objectives of the project.
The project will be limited to host-level detection only; as such, the study will not
look at network behavior or network-based attack and mitigation techniques.
The project will focus on the Windows operating system because, as of the time of writing,
Windows remains the most used operating system globally compared
to other operating systems such as Linux, Mac and FreeBSD (Computer operating systems
market share 2012-2021 | Statista, no date).
The project will not cover mobile device ransomware detection and mitigation
frameworks, and as such will not cover mobile applications and operating systems such as
Android, iOS, Solaris etc.
In the fight against ransomware, the growth of ransomware attack tactics is a major
source of concern. Modern ransomware attack techniques have proven to be resilient due to
their encryption and recovery-prevention techniques. The newer variants of ransomware use
hybrid cryptosystems in which the malware generates symmetric (AES) keys and an asymmetric
(sub-RSA) key on the host. The AES keys are used to encrypt the data, the sub-RSA key
is used to encrypt the AES keys, and the key embedded in the malware is used to encrypt the
sub-RSA key (Zimba and Chishimba, 2019). Furthermore, newer ransomware strains have
evolved to the extent of including recovery-prevention tactics such as erasing volume shadow
copies or overwriting original target files after encryption (Zimba and Chishimba, 2019). Most
mitigation frameworks have become obsolete as a result of these novel strategies, which have
attracted research attention.
Losses in data and finances have continued to rise, and the need to devise
a detection framework that can keep up with the newer strains of ransomware cannot be
overemphasized.
Because of the stealth and evasion strategies utilized in the latest strains of ransomware,
this study focuses on host level detection. Network-level detection tools cannot directly
witness the action of a malicious software and must rely on traffic generated by the malicious
program. Host-based malware detection techniques have the advantage of being able to view
the entire set of actions that a malware program performs, allowing harmful code to be
identified before it is run at all (Kolbitsch et al., 2009).
1.7 SUMMARY
This chapter introduced the project, titled “Host-based ransomware detection model with
machine learning”, and presented the background of the study. The problem statement,
project aim and objectives were explained, as well as the project
justification.
CHAPTER 2 - LITERATURE REVIEW
2.1 INTRODUCTION
This chapter reviews ransomware detection models proposed by past researchers,
compares them, and outlines a proposed model. The literature was reviewed in
accordance with the objectives of the project.
Figure 2: Novel attack model (Zimba and Chishimba, 2019)
2.3 REVIEW OF EXISTING MODELS
2.3.1 BRIEF ON MALWARE ANALYSIS
There are two types of malware analysis, namely static and dynamic analysis. Static
analysis examines a malware file without actually running the program, while dynamic analysis
involves executing the malware and examining its behavior on a particular device. Using
static analysis for detection on a Windows operating system relies on extracting anomalies in
the code and resources embedded in a PE file structure (analogous to signature-based
detection). Although it is the safest way to analyze malware, this method is
hampered by obfuscation. Obfuscation is a technique that makes programs harder to
understand: it converts a program into a different version that is functionally
equivalent to the original. Originally, this technology aimed at protecting the intellectual
property of software developers, but it has been broadly used by malware authors to elude
detection (You and Yim, 2010).
Using dynamic analysis for detection involves extracting artifacts from the behavior
of a malware sample as it is executed. Detection techniques that build on these two types of
malware analysis fall broadly into two categories: deception-based methods
and behavior-based methods. Deception-based methods use decoy files to detect
ransomware or other malicious activity. Behavior-based methods monitor file-related
operations to determine whether a process is behaving abnormally (Canfora et al., 2014).
randomness (entropy) between read and write data buffers, or the generation of new files with
a high entropy signature.
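The entropy signal mentioned above can be illustrated with a short Python sketch. It is not taken from any of the reviewed systems; the function name `shannon_entropy` and both buffers are hypothetical, with random bytes standing in for encrypted output. It only shows why a write buffer with much higher entropy than the corresponding read buffer is a plausible indicator of encryption.

```python
import math
import os
from collections import Counter


def shannon_entropy(buffer: bytes) -> float:
    """Return the Shannon entropy of a byte buffer in bits per byte (0.0 to 8.0)."""
    if not buffer:
        return 0.0
    total = len(buffer)
    counts = Counter(buffer)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical buffers: plain text as read from a document, and random bytes
# standing in for the encrypted data written back to disk.
read_buffer = b"Quarterly report: revenue grew steadily across all regions. " * 50
write_buffer = os.urandom(len(read_buffer))

entropy_read = shannon_entropy(read_buffer)
entropy_write = shannon_entropy(write_buffer)
print(f"read: {entropy_read:.2f} bits/byte, write: {entropy_write:.2f} bits/byte")
```

Encrypted or compressed data approaches 8 bits per byte, while ordinary text sits well below that, so a monitor could flag processes whose writes show a large entropy jump. Legitimate compression tools produce the same jump, which is one reason entropy is usually combined with other indicators.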
A study by Takanari Shigeta et al. (2016) found that the Locky and CryptoWall
ransomware families use Microsoft CryptoAPI or OpenSSL. This finding prompted the authors to
monitor encryption-related API calls as a method of attack detection. In this technique,
an encryptor is detected when it attempts to start file encryption, and prevention is achieved
by having the operating system (OS) halt the API execution as soon as detection occurs. To
monitor API calls from the target software, the detection program injects a DLL (dynamic-link
library) file into each software process.
Figure 5: Shows the workflow of the API monitoring program (Honda et al., 2018)
In 2018, Takeuchi, Sakai and Fukumoto (2018) proposed a ransomware detection
scheme for Microsoft Windows computers based on support vector machines (SVM). Using Cuckoo
Sandbox, they dynamically extracted features from ransomware API invocation
sequences. The sequences were represented as 2-gram count vectors and, when tested, the
framework achieved a detection accuracy of 97%.
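The idea of turning API call sequences into 2-gram count vectors for an SVM can be sketched with scikit-learn as follows. This is only an illustration under stated assumptions, not the cited authors' implementation: the API traces and labels are invented, and `LinearSVC` is used as a stand-in because the kernel used in the original study is not stated here.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical API call traces, one space-separated sequence per sample.
traces = [
    "CreateFileW ReadFile CryptEncrypt WriteFile DeleteFileW",
    "FindFirstFileW ReadFile CryptEncrypt WriteFile MoveFileW",
    "CreateFileW ReadFile CryptEncrypt WriteFile CloseHandle",
    "RegOpenKeyExW RegQueryValueExW CreateFileW ReadFile CloseHandle",
    "CreateFileW ReadFile CloseHandle RegCloseKey ExitProcess",
    "LoadLibraryW GetProcAddress CreateFileW ReadFile CloseHandle",
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = ransomware, 0 = benign (illustrative labels)

# Count every pair of consecutive API calls (2-grams) and feed the counts to an SVM.
model = make_pipeline(
    CountVectorizer(ngram_range=(2, 2), token_pattern=r"\S+", lowercase=False),
    LinearSVC(),
)
model.fit(traces, labels)

# A new, unseen trace that contains the read-encrypt-write pattern.
new_trace = ["FindFirstFileW ReadFile CryptEncrypt WriteFile DeleteFileW"]
prediction = model.predict(new_trace)
print("ransomware" if prediction[0] == 1 else "benign")
```

Using pairs of consecutive calls rather than single calls lets the classifier pick up short behavioral patterns such as "ReadFile CryptEncrypt", which individual call counts would miss.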
Sgandurra et al. (2016) created the EldeRan model, a novel real-time analysis and
detection system for ransomware. EldeRan runs ransomware in a sandbox environment while
monitoring registry and system functions. It then selects features using a mutual
information approach and uses a Regularized Logistic Regression classifier to distinguish
ransomware from benign files.
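The combination described above, mutual-information feature selection followed by regularized logistic regression, can be sketched with scikit-learn. This is a minimal illustration on synthetic binary features, not the EldeRan implementation; the data, the number of selected features (`k=5`) and the regularization strength are all assumptions made for the example.

```python
from functools import partial

import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)

# Synthetic stand-in for sandbox features: 200 samples, 50 binary features
# (for example "API X was called"). Only the first five features carry signal.
y = rng.integers(0, 2, size=200)
X = rng.integers(0, 2, size=(200, 50))
X[:, :5] = y[:, None] ^ (rng.random((200, 5)) < 0.1)  # label copy with 10% noise

# Score each feature by its mutual information with the label, keep the top 5,
# then fit a logistic regression (scikit-learn applies L2 regularization by default).
mi_score = partial(mutual_info_classif, discrete_features=True, random_state=0)
model = make_pipeline(
    SelectKBest(score_func=mi_score, k=5),
    LogisticRegression(C=1.0, max_iter=1000),
)
model.fit(X, y)

selected = model.named_steps["selectkbest"].get_support(indices=True)
print("selected feature indices:", selected)
print("training accuracy:", model.score(X, y))
```

Selecting features before fitting keeps the classifier small and discards sandbox events that are equally common in benign and malicious samples.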
Alazab et al. (2010) proposed a malware detection method that extracts Windows APIs
from the assembly code of executable files. Using a Mutual Information (MI)–based
Maximum Relevance (MR) filter, their method selects important information from the
extracted Windows APIs. The selected information was applied to various machine learning
algorithms such as Naïve Bayes, Sequential Minimal Optimization (SMO), and K-Nearest
Neighbors (KNN), and their experimental results showed accuracies ranging from the mid
to the high nineties (percent).
In this research, we made use of the dataset produced by the EldeRan study. The
proposed model focuses on API statistics; registry key delete, open, read, and write operations;
file delete, open, read and write operations; directory create and enumerate operations; and
strings. Features such as file extensions, dropped files, and dropped file extensions were
excluded from our research scope because the Cuckoo Sandbox report contains insufficient
information on them. Our experimental samples are restricted to ransomware and benign samples.
2.4 COMPARISON OF MODELS
Table 1: Comparison of Systems
2.6 SELECTED METHODOLOGY
Considering that the project can be broadly segmented into two parts (the machine
learning framework and the web application), two software development methodologies were
applied appropriately.
Validation: At this phase, the performance of the existing functions and the additional
functionality is checked to ensure that each function works as required.
Because far less analysis and documentation is required in the incremental method than in
the waterfall model, the cost of accommodating changes in requirements is greatly reduced. As
development work is completed, the incremental approach allows software engineers to
gather input from key stakeholders, allowing users of the system to assess how much of the
requirements have been implemented. The model also allows for rapid delivery and
deployment to the user, allowing the user to benefit from the software sooner than if
any other methodology were used (Sommerville, 2011).
The waterfall approach should be utilized only when the requirements are thoroughly
comprehended and are unlikely to drastically change during the development process
(Sommerville, 2011). Below are some of the other conditions where the waterfall model is
recommended:
The following are the advantages of the waterfall methodology (Kramer, 2018):
Works well for smaller projects where requirements are clearly defined and very well
understood.
The following are the disadvantages of the waterfall methodology (Kramer, 2018):
Poor fault tolerance: the waterfall model does not facilitate backtracking, so when an
error occurs at any stage the process has to start all over again from requirement
specification.
2.7.3 STREAMLIT
Streamlit is an open-source app framework for machine learning and data science that
was formed by three industry veterans: a Zoox VP of engineering and founder of Eterna and
FoldIt; a Google Hangouts web tech lead manager and a Google X AI project; and a Stanford
MBA who directed product and operations for numerous secretive Google X initiatives
(Streamlit).
workflows in data science, scientific computing, computational journalism, and machine
learning (Horton, 2020).
2.7.5 SCIKIT-LEARN
Scikit-learn is a free software machine learning library for the Python programming
language. It features various classification, regression and clustering algorithms such as
support-vector machines, random forests, gradient boosting etc. It is designed to work with the
Python numerical and scientific libraries such as NumPy and SciPy (scikit-learn: machine
learning in Python — scikit-learn 1.1.1 documentation, no date).
2.7.6 CLASSIFIERS
The proposed system will be developed based on six classification algorithms and the
best performing algorithm will be selected as our classifier model. The six classification
algorithms are Decision Tree Classifier, Random Forest Classifier, Gradient Boosting
Classifier, Ada Boost Classifier, Gaussian Naïve Bayes and Logistic Regression.
The classifier will be selected based on how well each algorithm performs on the following
metrics:
Accuracy: the proportion of correct results among the total number of cases
examined. This metric is suitable for binary as well as multi-class classification
problems. Formally, accuracy has the following formula:
$$\text{Accuracy} = \frac{\text{Number of Correct Predictions}}{\text{Total Number of Predictions}} \qquad (1.0)$$
Precision: also called the positive predictive value, is the fraction of relevant instances
among the retrieved instances.
$$\text{Precision} = \frac{TP}{TP + FP} \qquad (1.1)$$
Recall: also known as sensitivity, is the fraction of relevant instances that have been
retrieved over the total amount of relevant instances.
$$\text{Recall} = \frac{TP}{TP + FN} \qquad (1.2)$$
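The selection procedure described in this section can be sketched as a loop that trains each of the six classifiers and records accuracy, precision and recall. This is a minimal sketch under stated assumptions rather than the project's actual training code: the data is synthetic (generated with `make_classification`), the hyperparameters are scikit-learn defaults, and the split ratio and random seeds are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary data standing in for the real feature set (1 = ransomware).
X, y = make_classification(n_samples=400, n_features=20, n_informative=8,
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

classifiers = {
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
    "AdaBoost": AdaBoostClassifier(random_state=42),
    "Gaussian Naive Bayes": GaussianNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

# Train every candidate on the same split and score it on the held-out test set.
results = {}
for name, clf in classifiers.items():
    y_pred = clf.fit(X_train, y_train).predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, zero_division=0),
        "recall": recall_score(y_test, y_pred, zero_division=0),
    }

# Pick the candidate with the highest accuracy, as the text describes.
best = max(results, key=lambda name: results[name]["accuracy"])
for name, scores in results.items():
    print(f"{name:22s} " + "  ".join(f"{k}={v:.3f}" for k, v in scores.items()))
print("Selected classifier:", best)
```

Evaluating all candidates on one fixed split keeps the comparison fair. For a detector, recall on the ransomware class deserves particular attention, since a false negative means an undetected infection.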
2.7.7 CUCKOO SANDBOX
Cuckoo Sandbox is an open-source tool that is used to launch malware in a secure and
isolated environment. The idea is to fool the malware into thinking it has infected a genuine
host, record the activity of the malware, and then generate a report on what the malware
attempted to do while in this secure environment (Jamalpur et al., 2018).
2.7.8 DATASET
The proposed model makes use of the EldeRan dataset that was created and made
accessible by (Sgandurra et al., 2016). The dataset consists of 582 working samples of
ransomware belonging to 11 different classes and 942 benign applications. The dataset
features include registry key operations, API statistics, strings, file extensions, file
operations, directory operations and dropped file extensions.
2.8 SUMMARY
This chapter reviewed related literature, analyzed and evaluated existing host-based
ransomware detection frameworks, compared them, and outlined in detail the technologies
to be used in the proposed novel framework. The selected
development methodology and the dataset used were also described.
CHAPTER 3 – SYSTEM ANALYSIS AND DESIGN
3.1 INTRODUCTION
This chapter introduces the analysis and design considerations of the developed
framework. System analysis and design are important phases in development as they provide
an avenue for solutions in the system through the various tasks involved in doing the analysis
as well as the design (Phillips and Nagle, 1985). This chapter additionally presents
graphical blueprints of the developed system that form the basis of the system's
structure.
NON-FUNCTIONAL REQUIREMENTS
Performance: The system’s performance is efficient when all dependencies are met.
Reliability: The system is dependable when provided with the right data.
Usability: The system is easy to use and does not need any programming skills.
3.3.2 MODEL TRAINING
Determine feature and label sets
The figure below shows the first step in the model training process which involves
loading the dataset into a pandas data frame and splitting the data frame into feature-set and
output / label set.
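As a rough illustration of this step, the sketch below loads a tiny table into a pandas data frame and separates the label column from the features. The column names (`sample_id`, `label` and the three feature columns) and the inline CSV are hypothetical; the layout of the real dataset file may differ.

```python
import io

import pandas as pd

# Hypothetical miniature dataset; the real file layout and column names may differ.
csv_text = """sample_id,label,api_CryptEncrypt,reg_write_Run,file_delete_doc
s1,1,1,1,1
s2,0,0,0,0
s3,1,1,0,1
s4,0,0,1,0
"""
df = pd.read_csv(io.StringIO(csv_text))

y = df["label"]                              # output / label set (1 = ransomware)
X = df.drop(columns=["sample_id", "label"])  # feature set: everything else

print(X.shape, y.tolist())
```

Dropping identifier columns along with the label matters: an ID that leaks into the feature set gives the model nothing it can generalize from.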
Optimizing feature-set
The figure below shows the steps taken to address the high dimensionality of the dataset. We
use the Extra Trees classifier as a dimensionality reduction technique (Cho and Kurup, 2011).
This reduces the number of features from 30,967 to a subset of more than 2,000 features. The
product of this process is an optimized feature-set and a feature list, which is saved in a
pickle file.
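A minimal sketch of this kind of feature reduction is shown below, assuming scikit-learn's `ExtraTreesClassifier` wrapped in `SelectFromModel` with its default threshold (the mean feature importance). The synthetic data, the feature names and the file name `features.pkl` are assumptions for illustration; the threshold actually used in the project is not stated in the text.

```python
import os
import pickle
import tempfile

from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel

# Synthetic stand-in: 300 samples with 200 features, of which only 10 are informative.
X, y = make_classification(n_samples=300, n_features=200, n_informative=10,
                           n_redundant=0, random_state=0)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

# Fit extra trees and keep the features whose importance exceeds the mean importance.
selector = SelectFromModel(ExtraTreesClassifier(n_estimators=100, random_state=0))
selector.fit(X, y)
X_optimized = selector.transform(X)
selected_features = [name for name, keep in zip(feature_names, selector.get_support())
                     if keep]

# Save the list of selected feature names so the same columns can be extracted later.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "features.pkl")  # hypothetical file name
    with open(path, "wb") as handle:
        pickle.dump(selected_features, handle)
    with open(path, "rb") as handle:
        restored = pickle.load(handle)

print(f"kept {X_optimized.shape[1]} of {X.shape[1]} features")
```

Persisting the feature list is what allows a later processing step to build a sample with exactly the columns the trained model expects.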
Dataset Splitting
The figure below shows the dataset splitting phase in model training. The optimized feature
set is split in two: the train set and the test set. 80% of the data was used for training, while
the remaining 20% was reserved for testing the model.
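An 80/20 split of this kind can be sketched with scikit-learn's `train_test_split`. The arrays below are placeholders, and the use of `stratify` and a fixed `random_state` are assumptions added for reproducibility; the text does not say whether the original split was stratified.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 100 samples with 2 features each and balanced binary labels.
X = np.arange(200).reshape(100, 2)
y = np.array([0, 1] * 50)

# Hold out 20% of the samples for testing; stratify keeps the class ratio in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

print(len(X_train), "training samples,", len(X_test), "test samples")
```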
3.3.2 WEB APPLICATION DESIGN
USE CASE DIAGRAM
This style of diagram depicts use cases, actors, and the interactions between them in
the form of an action and reaction behavior of the system from the user's perspective (Kumar
and Gupta, 2011). The diagram below shows the use case diagram, with the user as the primary
actor. The user first submits or uploads a file, the extension of the file is verified before
submitting the file to the sandbox for dynamic analysis. The user views updates from the
background processes, then finally views the detection results.
ACTIVITY DIAGRAM
This diagram graphically depicts the sequential flow of activities in a business process
or a use case, and it can also be used to describe actions that will be taken once an operation is
completed, as well as the outcomes of those actions.
3.4 CONCLUSION
The chapter presented an overview of the analysis and design considerations of the
developed system. The chapter additionally presented graphical blueprints of the developed
system that form the basis of the system's structure.
CHAPTER 4 – RESULTS ANALYSIS
4.1 INTRODUCTION
In order to determine the success of any research project, a series of tests have to be
done to determine its viability based on some performance measures. This chapter examines
the performance of the Host Based Ransomware Detection Model with Machine Learning on
five ransomware samples collected from https://github.com/ytisf/theZoo/tree/master/malware/Source/Original
As seen in Figure 17 above, the test of the model training module took 146.75 seconds
and completed successfully. At the end of the test we had a features pickle
file with the optimum features that were extracted from the dataset using the Extra Trees
classifier, and a trained and tested model with a full classification report.
As seen from the test results shown in the figure above, the time it takes for the report
to be processed and a fully prepped sample to be generated is 161.145 seconds (approximately
2.7 minutes). This processing time is quite long and puts the whole process at a disadvantage
with regard to time.
As seen from the test results shown in the figure above, the time it takes for the actual
detection is 0.195 seconds which is very fast and efficient.
A unit test for the web application module was not undertaken because the Streamlit
framework, which is used as our server, does not support performance testing (at the
time of writing). Load testing was not considered because the system is not designed to be
accessed via the internet, and the only traffic received by the application
comes from Cuckoo Sandbox.
Figure 20: Classification report of the trained and tested Logistic Regression model.
CONFUSION MATRIX
The figure below shows the confusion matrix of our selected classifier. A confusion
matrix, also known as an error matrix, is a special table structure that permits visualization of
the performance of an algorithm in the field of machine learning, specifically the problem of
statistical classification (Luque et al., 2019).
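To make the structure of a confusion matrix concrete, the sketch below counts the four cells (TP, FP, TN, FN) for a small set of invented predictions and derives accuracy, precision and recall from them. The labels are illustrative only and are not the results reported in this chapter; the helper name `confusion_counts` is hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # ransomware correctly flagged
            else:
                fp += 1  # benign sample wrongly flagged
        else:
            if actual == positive:
                fn += 1  # ransomware missed
            else:
                tn += 1  # benign sample correctly passed
    return tp, fp, tn, fn


# Invented example: 1 = ransomware, 0 = benign.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / (tp + fp + tn + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)

print(f"TP={tp} FP={fp} TN={tn} FN={fn}")
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

Reading the four cells directly shows where a model fails: here one ransomware sample is missed (FN) and one benign sample is wrongly flagged (FP), which a single accuracy figure would hide.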
Figure 24 below shows the upload button working: an upload directory is opened
and a file is selected for submission. Figure 25 shows the size of the uploaded file and the
progress bar, which helps the user see how the background processes are progressing and
how long they are taking to complete.
Figure 24: Upload directory of the web application.
Figure 26 below shows the detection results on the web interface. Samples are
either benign or ransomware, so only one outcome is expected; as seen in the figure, the
submitted sample is ransomware.
4.5 SUMMARY
This chapter outlined the necessary steps which were explored in the development and
implementation of the host-based ransomware detection framework. The system was designed,
developed and deployed successfully, meeting the aim and objectives set beforehand.
CHAPTER 5 – PROJECT MANAGEMENT
5.1 INTRODUCTION
A risk is defined as exposure to specific elements that pose a danger to accomplishing
a project's desired outcomes (Schwalbe, 2015). On this premise, risk is typically described in
software projects as the probability-weighted impact of an incident on a project. The process
of identifying and analyzing potential issues that could have a negative impact on significant
business endeavors or crucial projects in order to assist businesses in avoiding or mitigating
those risks is known as risk analysis (Schwalbe, 2015). The technique of predicting the most
realistic amount of effort (expressed in terms of person-hours) required to develop or sustain
software based on incomplete, ambiguous, and noisy data is known as effort costing in
software development (Schwalbe, 2015).
This chapter will present concepts of risk analysis and project management concerning
the proposed approach. Later on, the risk register will be presented. Calculations involving
effort costing will also be outlined. A clear structure of the development work schedule for the
proposed model will also be presented.
Figure 27: Triple Constraint model (Van Wyngaard, Pretorius and Pretorius, 2012)
Below are the six main processes involved in software risk management (Schwalbe, 2015):
Risk Management Planning: determining how to carry out risk management activities
for a project.
Identifying Risks: identifying and recording the risks that may harm the project.
Conducting Qualitative Risk Analysis: prioritizing risks for further investigation by
assessing and combining their likelihood of occurrence and impact.
Conducting Quantitative Risk Analysis: numerically examining the impact of identified
risks on overall project objectives.
Developing Risk Responses: choosing options and actions to improve opportunities and
mitigate threats to project objectives.
Risk Monitoring and Control: includes establishing risk response plans, tracking
identified risks, recognizing new risks, and assessing the efficacy of the risk process
throughout the project.
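The qualitative analysis step above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual risk register: the `Risk` class, the 1 to 5 impact scale, and every probability and impact value below are assumptions chosen for demonstration. Each risk is scored as probability multiplied by impact, and the register is sorted so that the highest exposure is reviewed first.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    probability: float  # assumed likelihood of occurrence, 0.0 to 1.0
    impact: int         # assumed severity, 1 (low) to 5 (high)

    @property
    def exposure(self) -> float:
        # Probability-weighted impact of the risk on the project.
        return self.probability * self.impact


def prioritize(risks):
    """Return the risks ordered from highest to lowest exposure."""
    return sorted(risks, key=lambda r: r.exposure, reverse=True)


# Illustrative entries only; the scores are hypothetical.
register = [
    Risk("Ambiguity in requirements", 0.4, 3),
    Risk("Loss of information", 0.2, 5),
    Risk("Unrealistic duration estimates", 0.5, 4),
]

for risk in prioritize(register):
    print(f"{risk.name}: exposure = {risk.exposure:.2f}")
```

A multiplicative score is only one common convention; a probability and impact matrix with categorical ratings would serve the same prioritizing purpose.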
$E = a \cdot (\mathrm{KLOC})^{b}$ (1)
Where ‘KLOC’ is the size of the code in thousands of lines of code, ‘E’ is the software
effort in person-months, and ‘a’ and ‘b’ are the COCOMO model parameters. The values of ‘a’
and ‘b’ depend on the mode of the software project (Boehm et al., 1995). The three COCOMO
modes are described further below.
Organic (2-50 KLOC): A project can be treated as an organic type if the project deals
with developing a well-understood program, the size of the development team is
reasonably small, and the team members are experienced in developing using
frameworks familiar to all team members (Boehm et al., 1995).
The effort is calculated using equation 3 and the duration of the project is calculated
using equation 4 below.
Where the ‘period’ of the project is the effort in ‘person-months’. The proposed model
falls under the organic project type, because the expected size is well below 50 KLOC.
Table 4 shows the values of the constants a, b, c and d.
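For reference, and assuming the standard Basic COCOMO formulation (an assumption that is consistent with the four constants a, b, c and d used in this chapter), effort and duration take the following form, where D denotes the project duration in months (the ‘period’ above):

```latex
\begin{align}
  E &= a \cdot (\mathrm{KLOC})^{b} && \text{effort in person-months} \\
  D &= c \cdot E^{d}               && \text{duration in months}
\end{align}
```

The symbol D is introduced here for illustration only.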
Table 4: COCOMO Constants
Mode            a     b      c     d
Organic         2.4   1.05   2.5   0.38
Semi-detached   3.0   1.12   2.5   0.35
The program size is estimated at 2000 lines of code, based on expert judgement.
Converting this to KLOC gives KLOC = 2000/1000 = 2. The effort is then calculated using
equation 1 with the organic constants a = 2.4 and b = 1.05:
$E = 2.4 \times (2)^{1.05} \approx 4.97$ person-months.
Project duration: Using equation 1.2 with the organic constants c = 2.5 and d = 0.38, the
estimated project duration is
$D = 2.5 \times (4.97)^{0.38} \approx 4.6$ months.
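The effort and duration figures above can be reproduced with a short script. This is a sketch under stated assumptions: the function name `estimate` and the dictionary layout are illustrative, and the duration formula D = c * E^d is the standard Basic COCOMO form, assumed here because it matches the constants c and d in Table 4.

```python
# Basic COCOMO constants (a, b, c, d) per project mode, as listed in Table 4.
COCOMO_CONSTANTS = {
    "organic": (2.4, 1.05, 2.5, 0.38),
    "semi-detached": (3.0, 1.12, 2.5, 0.35),
}


def estimate(lines_of_code, mode="organic"):
    """Return (effort in person-months, duration in months)."""
    a, b, c, d = COCOMO_CONSTANTS[mode]
    kloc = lines_of_code / 1000       # convert lines of code to KLOC
    effort = a * kloc ** b            # equation 1: E = a * (KLOC)^b
    duration = c * effort ** d        # assumed standard form: D = c * E^d
    return effort, duration


effort, duration = estimate(2000)
print(f"Effort:   {effort:.2f} person-months")
print(f"Duration: {duration:.1f} months")
```

Keeping the constants in a lookup table makes it straightforward to rerun the estimate for the semi-detached mode if the size estimate grows.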
The figures below show the cost of the project based on COCOMO.
Figure 29: Complexity weighting factors for the proposed model.
Figure 31: Effort and duration for the proposed model.
5.7 SCHEDULING AND WORK PLAN
The figures below show the Gantt chart of the software development plan. The chart
graphically illustrates the schedule of the different components in the software
development phase of our proposed model.
CHAPTER 6 – CRITICAL EVALUATION
6.1 INTRODUCTION
This chapter outlines the reasons for undertaking the project and the lessons learnt
throughout the development of the software. It then describes the challenges encountered
during the development of the system, as well as future work.
I have understood the relevance of a development community and of supervisor guidance
to projects.
6.6 CONCLUSION
This chapter presented the reasons why the project was undertaken and the results from
the developer’s perspective. It outlined the difficulties faced during development and
proposed future work for the system.
CHAPTER 7 – CONCLUSION
7.1 INTRODUCTION
Chapter one gave a brief introduction to ransomware. The chapter elaborated on the
need for a more efficient ransomware detection framework for the Microsoft Windows
operating system. Additionally, the chapter covered the problem statement, aim, objectives,
scope, and justification of the proposed project.
Chapter two reviewed the literature in the field of the proposed system, presenting brief
descriptions of key concepts as well as related past work. The chapter outlined in detail
the selected development methodologies. The incremental development methodology was
selected because it can accommodate changes in the requirements while the development of
the system is in progress. Furthermore, incremental models provide the capability to test
and debug during model iterations. The waterfall model was chosen for the development of
the web application because the user requirements were very simple and readily understood.
The chapter then looked at the technologies and frameworks to be used for the development
of the proposed system. PyCharm Community Edition was chosen as the environment in which
to develop the proposed system, and Jupyter Notebook was used as the training environment
for our machine learning model.
Chapter three gave an overview of system analysis and design, and stated the functional
and non-functional requirements. The chapter also included the UML diagrams of the two
modules of this project.
Chapter four showed the results of the development process. Unit tests and system tests
were conducted and their results presented. The chapter also outlined the results of our
machine learning model training.
Chapter five began by defining risk management, which involves identifying and
analyzing risk factors. The various risks that could influence the proposed system
were then analyzed. These include failure to complete the system as expected,
ambiguity in requirements, loss of information, inability to implement non-functional
requirements, and unrealistic duration estimates. The chapter additionally showed the
effort costing computations, made using an online COCOMO calculator, and the scheduling
for the proposed system.
Chapter six presented the reasons why the project was undertaken and the results from
the developer’s perspective. The chapter also brought to light the many challenges that
were faced during the life of this project. Future work on the system was also
proposed.
REFERENCES
Alazab, Mamoun et al. (2010) ‘Zero-day malware detection based on supervised learning
algorithms of API call signatures’.
Alshamrani, A. and Bahattab, A. (2015) ‘A comparison between three SDLC models waterfall
model, spiral model, and Incremental/Iterative model’, International Journal of Computer
Science Issues (IJCSI), 12(1), p. 106.
Bannerman, P.L. (2008) ‘Risk and risk management in software projects: A reassessment’,
Journal of systems and software, 81(12), pp. 2118–2133.
Boehm, B. et al. (1995) ‘Cost models for future software life cycle processes: COCOMO 2.0’,
Annals of software engineering, 1(1), pp. 57–94.
Canfora, G. et al. (2014) ‘Metamorphic malware detection using code metrics’, Information
Security Journal: A Global Perspective, 23(3), pp. 57–67.
Cho, J.H. and Kurup, P.U. (2011) ‘Decision tree approach for classification and dimensionality
reduction of electronic nose data’, Sensors and Actuators B: Chemical, 160(1), pp. 542–548.
doi:10.1016/j.snb.2011.08.027.
Casen, M., Li, F. and Williams, D. (2021) ‘Friend or Foe: An Investigation into Recipient
Identification of SMS-Based Phishing’, in Furnell, S. and Clarke, N. (eds) Human Aspects of
Information Security and Assurance. Cham: Springer International Publishing (IFIP Advances
in Information and Communication Technology), pp. 148–163. doi:10.1007/978-3-030-
81111-2_13.
Computer operating systems market share 2012-2021 | Statista (no date). Available at:
https://www.statista.com/statistics/268237/global-market-share-held-by-operating-systems-
since-2009/ (Accessed: 25 November 2021).
Daka, E. and Fraser, G. (2014) ‘A survey on unit testing practices and problems’, in 2014 IEEE
25th International Symposium on Software Reliability Engineering. IEEE, pp. 201–211.
Geri, B.N., Jota, N. and Avert, M. (2006) ‘The emergence of ransomware’, AVAR, Auckland
[Preprint].
Hampton, N. and Baig, Z.A. (2015) ‘Ransomware: Emergence of the cyber-extortion menace’.
Human, M. et al. (2021) ‘Internet of things and ransomware: Evolution, mitigation and
prevention’, Egyptian Informatics Journal, 22(1), pp. 105–117. doi:10.1016/j.eij.2020.05.003.
Jamalpur, S. et al. (2018) ‘Dynamic malware analysis using cuckoo sandbox’, in 2018 Second
international conference on inventive communication and computational technologies
(ICICCT). IEEE, pp. 1056–1060.
Kharraz, A. et al. (2015) ‘Cutting the gordian knot: A look under the hood of ransomware
attacks’, in International Conference on Detection of Intrusions and Malware, and
Vulnerability Assessment. Springer, pp. 3–24.
Khoirom, S. et al. (2020) ‘Comparative analysis of Python and Java for beginners’, Int. Res. J.
Eng. Technol, 7(8), pp. 4384–4407.
Kim, C.W. (2018) ‘Ntmaldetect: A machine learning approach to malware detection using
native api system calls’, arXiv preprint arXiv:1802.05412 [Preprint].
Kolbitsch, C. et al. (2009) ‘Effective and efficient malware detection at the end host.’, in
USENIX security symposium, pp. 351–366.
Kramer, M. (2018) ‘Best practices in systems development lifecycle: An analyses based on the
waterfall model’, Review of Business & Finance Studies, 9(1), pp. 77–84.
Kumar, R. and Gupta, D. (2011) ‘Object oriented design heuristics’, International Journal of
Engineering Science and Technology (IJEST), 3(1), pp. 459–463.
Luo, X. and Liao, Q. (2009) ‘Ransomware: A new cyber hijacking threat to enterprises’.
Luque, A. et al. (2019) ‘The impact of class imbalance in classification performance metrics
based on the binary confusion matrix’, Pattern Recognition, 91, pp. 216–231.
O’Kane, P., Seer, S. and Carlin, D. (2018) ‘Evolution of ransomware’, IET Networks, 7(5), pp.
321–327.
Phillips, C.R. and Nagle, N.T. (1985) ‘Digital control system analysis and design’, IEEE
Transactions on Systems, Man, and Cybernetics, SMC-15(3), pp. 452–453.
doi:10.1109/TSMC.1985.6313385.
PyCharm: the Python IDE for Professional Developers by JetBrains (no date). Available at:
https://www.jetbrains.com/pycharm/ (Accessed: 2 June 2022).
Ransomware Attackers Get Short Shrift From Zambian Central Bank - Bloomberg (no date).
Available at: https://www.bloomberg.com/news/articles/2022-05-18/ransomware-attackers-
get-short-shrift-from-zambian-central-bank (Accessed: 4 June 2022).
Satzinger, J.W., Jackson, R.B. and Burd, S.D. (2015) Systems analysis and design in a
changing world. Cengage learning.
Savage, K., Cogan, P. and Lau, H. (2015) ‘The evolution of ransomware’, Symantec, Mountain
View [Preprint].
Sawant, A.A., Bari, P.H. and Chawan, P. (2012) ‘Software testing techniques and strategies’,
International Journal of Engineering Research and Applications (IJERA), 2(3), pp. 980–986.
Takeuchi, Y., Sakai, K. and Fukumoto, S. (2018) ‘Detecting ransomware using support vector
machines’, in Proceedings of the 47th International Conference on Parallel Processing
Companion, pp. 1–6.
Trendowicz, A. and Jeffery, R. (2014) ‘Software project effort estimation’, Foundations and
Best Practice Guidelines for Success, Constructive Cost Model–COCOMO pags, 12, pp. 277–
293.
unittest — Unit testing framework — Python 3.10.4 documentation (no date). Available at:
https://docs.python.org/3/library/unittest.html (Accessed: 4 June 2022).
Van Rossum, G. and others (2007) ‘Python Programming language.’, in USENIX annual
technical conference, pp. 1–36.
Van Wyngaard, C.J., Pretorius, J.-H.C. and Pretorius, L. (2012) ‘Theory of the triple
constraint—A conceptual review’, in 2012 IEEE International Conference on Industrial
Engineering and Engineering Management. IEEE, pp. 1991–1997.
You, I. and Yim, K. (2010) ‘Malware obfuscation techniques: A brief survey’, in 2010
International conference on broadband, wireless computing, communication and applications.
IEEE, pp. 297–300.
Zimba, A. and Chishimba, M. (2019) ‘Understanding the evolution of ransomware: paradigm
shifts in attack structures’, International Journal of computer network and information
security, 11(1), p. 26.
APPENDICES
1. Model Training
2. Report Processing