Phase 1 Report Group ID CSE19-G58 Malware Detection Using ML

This project report discusses using machine learning techniques for malware detection. It aims to explore traditional and deep learning models for classifying software as malware or safe. Lightweight models with minimal computational load are sought. Phase 1 involves studying malware datasets containing Portable Executable files and extracting static and dynamic features to engineer image-based tensor data for training models to classify malware types. Tools like Keras will be used for generating tensor images and splitting datasets into test and train subsets.

Uploaded by

Kartikey Sharma


A PROJECT REPORT (CS321)

MALWARE DETECTION USING MACHINE LEARNING

A report submitted in partial fulfilment of the requirement for the award of

the degree of

BACHELOR OF TECHNOLOGY
In
COMPUTER SCIENCE AND ENGINEERING

Submitted to:
Dr. Atul Kumar Srivastava
Assistant Professor

Submitted by:
Anagh Sharma, CSE A
Kartikey Sharma, CSE A
Vishal Bora, CSE ML
Harendra Singh Bisht, CSE A

SCHOOL OF COMPUTING
DIT UNIVERSITY, DEHRADUN
(State Private University through State Legislature Act No. 10 of 2013 of Uttarakhand and approved by UGC)

Mussoorie Diversion Road, Dehradun, Uttarakhand - 248009, India.

2021
DECLARATION

We hereby certify that the work, which is being presented in the Phase 1 report, entitled
Malware Detection Using Machine Learning, in partial fulfilment of the requirement for the
award of the Degree of Bachelor of Technology and submitted to the DIT University is an
authentic record of our work carried out under the guidance of Dr. Atul Kumar Srivastava.

Date: 23-10-2021
Signature of the Candidates
1. Anagh Sharma 2. Kartikey Sharma 3. Vishal Bora 4. Harendra Singh Bisht

Signature of Guide

ACKNOWLEDGEMENT

This project is a culmination of invaluable guidance and encouragement from various people at
DIT University.

We would first like to thank our guide, Dr. Atul Kumar Srivastava for his encouragement and
guidance throughout the project. We are wholeheartedly thankful to him for giving us his valuable
time & attention and for providing us with regular feedback to help us in progressing our project
in time. Then we would also like to thank our friends and family for their support.

Anagh Sharma, Roll Number: 190102260

Kartikey Sharma, Roll Number: 190102261
Vishal Bora, Roll Number: 190178033
Harendra Singh Bisht, Roll Number: 190102041
Group ID: CSE19-G58

ABSTRACT

With the rise in complexity of cybersecurity techniques and a simultaneous growth in the
sophistication with which cybercriminals write and obfuscate malware, research and industry
investment in machine learning techniques for developing more robust security solutions
has also increased rapidly.
This project has the following two goals: (1) to explore the effectiveness of traditional machine
learning methods along with deep learning approaches to identify malicious software, and (2) to
identify lightweight model(s) that produce minimal computational load on user devices.
The static and dynamic features of a file are to be used to analyse a file’s nature using several
configurations of feature selection. The files are to be either classified as malware or safe, or further
subclassified into various malware types.

TABLE OF CONTENTS

Declaration
Acknowledgement
Abstract
Chapter 1 – Introduction
1.1. Malicious software
1.2. Machine learning
1.3. Malware detection using machine learning
Chapter 2 – Phase 1: Description
2.1. Study of datasets
2.2. Portable Executable file format
2.3. Feature engineering
Chapter 3 – Tools and Technologies
Chapter 4 – Summary: Phase 1
Chapter 5 – Project Pathway
Bibliography
Annexure – Implementation and Code
Plagiarism Report: 9% Only

List of Figures

Fig 1: Malware classes in MMCC dataset
Fig 2: Malware sample files
Fig 3: Malware classes in Malimg dataset
Fig 4: PE file structure
Fig 5: Feature engineering in machine learning pathway
Fig 6: Binary file to PNG representation
Fig 7: Generation of tensor image data using Keras
Fig 8: A sample of PNG malware image representation
Fig 9: Splitting the dataset into test and train subsets
Fig 10: Phase 1: Summary
Fig 11: Project pathway

Chapter 1 – Introduction

In recent years, the increased usage and widespread understanding of computer systems has led to
a growth in attempts to exploit these systems for data and money, or simply with the intent of
vandalism. According to Cybersecurity Ventures, ransomware attacks alone are projected to cost
over twenty billion US dollars in 2021 [1]. This threat has led to growth in the malware analysis
services sector, which seeks to offer solutions that can help classify software as malicious or not.
A study by MarketsandMarkets Research Pvt. Ltd. concluded that between 2019 and 2024, the market
for malware analysis would grow from three to twelve billion US dollars [2].

1.1. Malicious software

Malicious software, or malware for short, is a program that is designed to compromise a digital
system. Malware specimens were initially made for experimentation purposes, to study the
vulnerabilities in computer system architecture and programming. Malware has rapidly evolved
from targeting personal computers to targeting almost every device used today, from mobile phones to
ATMs. The diversity of malware and the complexity with which malware is launched are expected
to keep increasing in the future.

Various organizations broadly classify malware into the following categories:

1. Worms: Malware which self-replicates and spreads to other computers through a
computer network. Worms can corrupt and steal data, install backdoors for hackers, and
consume memory and bandwidth.
2. Virus: The term “virus” was formalised by Frederick B. Cohen, whose Ph.D. thesis (1986)
defined it as: “A program that can infect other programs by modifying them to
include a, possibly evolved, version of itself.”
3. Trojan horse: A software which disguises itself as a legitimate program and when
downloaded and installed, is used to disrupt or damage data or a network.

4. Spyware: Malware that installs itself on a computer to monitor user behaviour and to steal
sensitive information from the user. Spyware can also be used to grant attackers remote access
to the compromised system.
5. Adware: This type of software is designed to help companies generate more revenue by
automatically displaying advertisement banners and pop-ups while another program is
running.
6. Ransomware: A program which threatens to perpetually block or publish personal user data
unless a ransom is paid. The attacker does so by using a disguised link to trick the user into
downloading the malware file which then encrypts the user data. The data can then only be
unlocked through a secret key which is usually promised to be revealed upon payment.

1.2. Machine learning

Machine learning is a part of the broader subject of artificial intelligence which enables
a machine to use real-world data in order to solve a problem. It combines
statistics, applied mathematics, and computer science. Machine learning allows
computers to automatically improve their performance from experience, without any
additional programming to make those improvements. A model training algorithm is used to
generate a model which can generalise well on unseen data.

Machine learning is classified into the following three paradigms:

1. Supervised Learning: This methodology uses real-world datasets which consist of training
features and associated labels to generate an inferred function which can then be used to
label unseen and unlabelled data.
2. Unsupervised Learning: In the unsupervised learning approach, an unlabelled and
uncategorized dataset is used to train a machine learning model, which must discover
patterns and groupings in the features by itself, without any labels to guide it.
3. Reinforcement Learning: This approach involves training through reward and punishment
of the behaviour of an intelligent agent, which takes actions with the goal of maximizing
cumulative reward.

Deep learning is a subfield of machine learning wherein features are extracted from raw data
progressively, from lower to higher degrees of abstraction. Deep learning algorithms are composed
of multiple layers arranged in a hierarchical manner of increasing complexity.

1.3. Malware detection using machine learning

Earlier, the methodology used for malware detection involved manual configuration of malware
fingerprints by analysts and security experts. This signature-based malware detection was used
extensively throughout the industries and involved detection through manually configured and
regularly updated pre-execution rules. These rules were based on the features of files such as
fragments of code and several other properties of a file.

However, this methodology became increasingly impractical as the volume of malware produced
grew substantially: manual configuration of fingerprints could not keep pace, and even a small
change in a file’s properties could render these rules ineffective.
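
The brittleness described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that a "signature" is a SHA-256 hash of the whole file; real signature engines also match code fragments and other file properties, so this is a simplification. The byte strings and the `known_bad` set are hypothetical.

```python
import hashlib


def signature(data: bytes) -> str:
    """Return a SHA-256 fingerprint of the given file contents."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical byte strings standing in for two variants of one program.
original = b"MZ" + bytes(range(64))
variant = bytearray(original)
variant[-1] ^= 0x01  # flip a single bit in the last byte

# Toy signature database that knows only the original sample.
known_bad = {signature(original)}

original_detected = signature(original) in known_bad
variant_detected = signature(bytes(variant)) in known_bad
print(original_detected, variant_detected)
```

A one-bit change yields a completely different hash, so the variant is no longer matched by the stored fingerprint.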

This has led to the exploration of machine learning and deep learning-based malware detection
approaches which are able to use intelligent solutions in order to keep up with the evolution of
malware. The following methods are used in malware detection:

1. Static analysis: Features of a program are extracted and used to predict its nature without
executing the code. Such features include, but are not limited to, file format, binary data of
the file, and text strings. This method is the least computationally expensive but could fail to
detect malware that uses effective code obfuscation.
2. Dynamic analysis: This method takes a behaviour-based approach to examining an
executable file. The executable file is run and its behaviour is studied on an air-gapped
machine, a virtual machine, or a sandbox. The behaviour includes API calls,
memory writes, registry changes, etc. Because it is more expensive, this method is typically
used after static methods have been exhausted.
3. Hybrid malware detection: This method combines both static and dynamic methodologies.
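
As a rough sketch of the static features mentioned above (binary data and text strings), the following computes a byte-value histogram and extracts printable strings without executing anything. The function names and the sample bytes are assumptions made for illustration; they are not taken from the project code.

```python
import re
from collections import Counter


def byte_histogram(data: bytes) -> list:
    """Normalised frequency of each of the 256 possible byte values."""
    counts = Counter(data)
    total = len(data) or 1  # avoid division by zero on empty input
    return [counts.get(value, 0) / total for value in range(256)]


def printable_strings(data: bytes, min_len: int = 4) -> list:
    """Runs of printable ASCII characters that are at least min_len long."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [match.decode("ascii") for match in re.findall(pattern, data)]


# Hypothetical sample bytes; a real pipeline would read them from a file.
sample = b"\x00\x01MZ\x90\x00kernel32.dll\x00\xff\xffLoadLibraryA\x00"
hist = byte_histogram(sample)
strings = printable_strings(sample)
print(strings)
```

Both outputs are fixed-format values that could be fed to a traditional classifier; an obfuscated or packed file would hide most strings, which is the weakness of static analysis noted above.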

Deep learning-based approaches do not require expertly selected feature configurations based on
domain knowledge. Instead, deep learning involves approaches that include:

1. Extracting a feature vector to represent the executable.

2. Examining grayscale image pixel intensity data that is produced using the binary content
of a file.
3. Studying API call traces.
4. Examining binary representation of a file, etc.

Chapter 2 – Phase 1: Description

Following the machine learning pathway, the first step in the first phase of this project is to explore
various datasets suitable for this study. The datasets explored in this project include executable
files, binary representations of those files, and metadata information. The executable files are
studied without their execution as this study focuses on static malware detection.

This is followed by an examination of features from a dataset to assess which features could be
most useful to train a generalized machine learning model. At this stage, this is done by learning
from the results of related work in this field. In the further stages, features will be selected based
upon usefulness measured by training models and observing their performance on unseen data.

2.1. Study of datasets

This is a fundamental stage of the machine learning pathway and involves identifying appropriate
data sources and available datasets. The following are some of the attributes of a dataset which are
used to assess the usefulness of a dataset:

1. The size of the dataset
2. The source of the dataset
3. The age of the dataset
4. Feature representation
5. The number of unlabelled samples

The following datasets were studied in this project:

1. Microsoft Malware Classification Challenge (BIG 2015) [3]: This dataset was published
by Microsoft as part of a malware classification challenge hosted on Kaggle in 2015. When
uncompressed, this dataset contains nearly half a terabyte of data. The dataset consists of
bytecode and disassembly code from over twenty thousand malware files.

The malware samples which are represented in this dataset belong to nine malware
families. These malware families, or classes, are shown in Fig 1.

[Figure: the nine classes of the MMCC dataset: Ramnit, Lollipop, Kelihos_ver3, Vundo, Simda,
Tracur, Kelihos_ver1, Obfuscator.ACY, Gatak]

Fig 1: Malware classes in MMCC dataset

The MMCC malware samples have the following specifications:

a. A name: A 20-character hash value that uniquely identifies the file
b. A label: Integer representation of one of the nine malware families

Each malware sample has two files associated with it which are described in Fig 2.

.bytes file: Hexadecimal representation of a sample’s binary content. The PE header is
removed to ensure sterility.
.asm file: Metadata information of the sample. Includes function calls, strings, etc.

Fig 2: Malware sample files

2. Sophos-ReversingLabs 20 million dataset [4]: This dataset is hosted on the Amazon S3 storage
service and contains approximately eight terabytes of data. This is a large-scale dataset
comprising almost twenty million files with pre-extracted features and metadata, labels
derived from various sources, and “tags” related to every malware file which serve
as additional targets. Additionally, the dataset contains over ten million disarmed malware
samples.

The SoReL-20M dataset contains the following data for each malware sample:
a. Features of the samples that are derived in accordance with the format of the EMBER
2.0 dataset.
b. Labels for each sample which are obtained by using both external as well as internal
Sophos sources.
c. PE metadata of malware files obtained using the pefile library.
d. Binary files of malware samples.

3. Malimg (Nataraj et al., 2011) dataset [5] [6]: This dataset contains PNG image
representations of over nine thousand malware files. 25 malware families are
represented in this dataset.

The malware families represented in this dataset are shown in Fig 3.

Fig 3: Malware classes in Malimg dataset

2.2. Portable Executable file format

The Portable Executable, or PE, format is used for executable and DLL files in Windows
environments and was first used in the Windows NT operating system. The contents of the file are
laid out in a linear manner. Features are extracted from this file to train a machine learning
model which is used in static malware analysis.
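
To make the linear layout concrete, here is a minimal sketch that walks the first two structures of a PE file using only the standard library: the DOS header, whose field at offset 0x3C points to the PE signature, and the 20-byte COFF file header that follows it. The function name `read_pe_header` and the synthetic buffer are assumptions for illustration; in practice a library such as pefile would be used on real files.

```python
import struct


def read_pe_header(data: bytes) -> dict:
    """Parse the DOS header pointer and the COFF file header of a PE image."""
    if data[:2] != b"MZ":
        raise ValueError("missing DOS 'MZ' signature")
    # The 4-byte field at offset 0x3C holds the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # The COFF file header occupies the 20 bytes right after the signature.
    (machine, n_sections, timestamp, _symbol_ptr, _n_symbols,
     optional_size, characteristics) = struct.unpack_from(
        "<HHIIIHH", data, pe_offset + 4)
    return {
        "machine": machine,
        "number_of_sections": n_sections,
        "timestamp": timestamp,
        "size_of_optional_header": optional_size,
        "characteristics": characteristics,
    }


# Synthetic, non-executable buffer that only mimics the header layout.
dos = bytearray(0x80)
dos[0:2] = b"MZ"
struct.pack_into("<I", dos, 0x3C, 0x80)
coff = struct.pack("<HHIIIHH", 0x14C, 3, 0, 0, 0, 0xE0, 0x0102)
header = read_pe_header(bytes(dos) + b"PE\x00\x00" + coff)
print(header)
```

Fields such as the number of sections and the characteristics flags are typical examples of header values that can serve as static features.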

The structure of a PE file is described in Fig 4.

Fig 4: PE file structure [7]

2.3. Feature engineering

At this stage, the raw data of the samples is transformed into features which can be used to
effectively train a machine learning model.

[Figure: Raw Data -> (Feature Engineering) -> Features -> (Fitting) -> Model -> (Inference) -> Output]

Fig 5: Feature engineering in machine learning pathway

In this instance of feature engineering, the hexadecimal content of the malware file samples is
transformed to produce a PNG image representation of the malware file. These PNG files will be
used in the later phases of this project to train a deep learning model.

[Figure: Malware Binary Data -> Hexadecimal Representation -> PNG Image]

Fig 6: Binary file to PNG representation

Feature engineering is a continuous process in the machine learning pathway and newer features
will be explored throughout the later stages as this project evolves. The image representation of
malware files is only an example of possible features.
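
The hexadecimal-to-image transformation can be sketched without any third-party library. The example below assumes the `.bytes` layout used in the annexure (an address followed by sixteen hexadecimal byte values per line, with `??` for unknown bytes) and follows the same sizing rule as the annexure code: the image width is the power of two just above the square root of the byte count. The helper names and the sample text are hypothetical.

```python
from math import log2, sqrt


def parse_bytes_text(text: str) -> list:
    """Turn '.bytes'-style lines (address + 16 hex pairs) into byte values."""
    values = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 17:  # skip malformed or short lines
            continue
        # '??' marks an unknown byte; it is mapped to 0.
        values.extend(0 if f == "??" else int(f, 16) for f in fields[1:])
    return values


def image_shape(n_bytes: int) -> tuple:
    """Return (height, width); width is the power of two just above sqrt(n_bytes)."""
    if n_bytes < 1:
        raise ValueError("at least one byte is required")
    width = 2 ** (int(log2(int(sqrt(n_bytes)))) + 1)
    return n_bytes // width, width


def to_rows(values: list, shape: tuple) -> list:
    """Arrange the flat byte list into full rows of pixel intensities."""
    height, width = shape
    return [values[r * width:(r + 1) * width] for r in range(height)]


# Hypothetical sample: four complete lines and one short line that is skipped.
sample = "\n".join(
    f"{0x401000 + 16 * i:08X} " + " ".join(f"{i * 16 + j:02X}" for j in range(16))
    for i in range(4)
) + "\n00401040 4D 5A"

values = parse_bytes_text(sample)
shape = image_shape(len(values))
pixels = to_rows(values, shape)
print(shape, pixels[1][:4])
```

Each byte becomes one grayscale pixel intensity in the range 0 to 255, so the resulting grid can be saved as an image with any imaging library.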

The PNG images as such cannot be directly fed to the model since there is a great variance in sizes
of malware images. These images need to be processed first so that they are of a common scale.
This helps a deep learning model to converge faster.
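
A minimal sketch of bringing images to a common scale is given below. It assumes images are held as nested lists of 8-bit intensities and uses nearest-neighbour resizing because it is the simplest scheme to show; the project itself relies on Keras for this step, and the helper names here are hypothetical.

```python
def resize_nearest(image: list, new_height: int, new_width: int) -> list:
    """Nearest-neighbour resize of a 2D list of pixel values."""
    old_height, old_width = len(image), len(image[0])
    return [
        [image[r * old_height // new_height][c * old_width // new_width]
         for c in range(new_width)]
        for r in range(new_height)
    ]


def normalise(image: list) -> list:
    """Scale 8-bit pixel intensities from [0, 255] to [0.0, 1.0]."""
    return [[pixel / 255.0 for pixel in row] for row in image]


# Two toy images of different sizes, standing in for malware images.
small = [[0, 255], [128, 64]]              # 2 x 2
wide = [[10, 20, 30, 40, 50, 60, 70, 80]]  # 1 x 8
target = (4, 4)

batch = [normalise(resize_nearest(img, *target)) for img in (small, wide)]
print(batch[0][0])
```

After this step every image has the same shape and value range, so the images can be stacked into a single tensor for training.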

The ImageDataGenerator.flow_from_directory() method of the Keras API in the TensorFlow Python
library is used to generate this tensor image data as shown in Fig 7. This demonstration uses
only ten samples from each of the malware classes. In later stages of this project, thousands
of samples will be processed to train a deep learning model.

Fig 7: Generation of tensor image data using Keras
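
When images are loaded from a directory tree in this way, each subdirectory is treated as one class and, with the default categorical mode, every image receives a one-hot label. The sketch below reproduces only that labelling logic in plain Python, under the assumption that class indices follow the sorted order of the folder names; the folder names are those printed in the annexure, while the helper functions are hypothetical.

```python
# Class folders as listed in the annexure output.
class_dirs = [
    "1_Ramnit", "2_Lollipop", "3_Kelihos_ver3", "4_Vundo", "5_Simda",
    "6_Tracur", "7_Kelihos_ver1", "8_Obfuscator.ACY", "9_Gatak",
]


def class_indices(names: list) -> dict:
    """Map each class folder name to an integer index in sorted order."""
    return {name: index for index, name in enumerate(sorted(names))}


def one_hot(index: int, n_classes: int) -> list:
    """Vector of zeros with a single 1.0 at the position of the class."""
    return [1.0 if i == index else 0.0 for i in range(n_classes)]


indices = class_indices(class_dirs)
label = one_hot(indices["4_Vundo"], len(indices))
print(indices["4_Vundo"], label)
```

This is why the label array in the annexure has nine columns: one per malware class.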

A sample of normalized tensor image data created using the TensorFlow python library is shown
in Fig 8. This data is used to train a deep learning model.

Fig 8: A sample of PNG malware image representation

Feature engineering is the most time-consuming stage of the machine learning pathway and
involves deep statistical analysis of the data. An exhaustive study of the features on the basis of
domain knowledge and with respect to specific machine learning models is required.

Finally, after feature engineering the resultant data is split into the following two randomly
generated subsets using the train_test_split() method of the Scikit-learn python library:

1. Training set: Used to train a machine learning model to help it understand the relation
between the features and labels.
2. Testing set: The performance of a model is assessed using this subset by the use of several
evaluation metrics.

Fig 9: Splitting the dataset into test and train subsets
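
The idea behind the random split can be sketched in plain Python as follows. This is an illustration of the principle, not the Scikit-learn implementation: the function name `split_dataset`, the fixed seed, and the placeholder samples are assumptions. The sizes mirror the demonstration in the annexure (90 samples with a test fraction of 0.3).

```python
import random


def split_dataset(samples: list, labels: list, test_size: float = 0.3,
                  seed: int = 42) -> tuple:
    """Shuffle sample indices and split them into train and test subsets."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_test = round(len(samples) * test_size)
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(sequence: list, chosen: list) -> list:
        return [sequence[i] for i in chosen]

    return (pick(samples, train_idx), pick(samples, test_idx),
            pick(labels, train_idx), pick(labels, test_idx))


# Placeholder data: 90 samples spread evenly over 9 classes.
samples = [f"sample_{i}" for i in range(90)]
labels = [i % 9 for i in range(90)]

X_train, X_test, y_train, y_test = split_dataset(samples, labels)
print(len(X_train), len(X_test))
```

Because the same shuffled indices are applied to both lists, every sample stays paired with its own label, and no sample appears in both subsets.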

Chapter 3 – Tools and Technologies

Hardware Requirement:
1. CPU : Intel core i5 8th generation or better
2. GPU : Preferred
3. RAM : 8 GB or better

Software Requirements:
1. Anaconda distribution, including all essential machine learning tools.

Tools Used:

1. Anaconda: A python and R distribution which is primarily used for data science and
machine learning applications. The primary objective of this tool is to make package
management simple. It includes data-science packages adapted to Windows, Linux and
macOS.

2. Keras: An open-source Python library that works as an interface for the TensorFlow Python
library. Keras provides high-level APIs for machine learning.

3. JupyterLab: An interactive environment which helps in integrating code, data and notebook
in a single interface.

4. Amazon S3: Amazon Simple Storage Service, or Amazon S3, is a cloud service that is used
for object storage and is provided by Amazon Web Services. A practically unlimited amount
of data can be stored and accessed from anywhere through Amazon S3.

Chapter 4 – Summary: Phase 1

The first phase of this project follows the machine learning procedure as described in Fig 10.

Fig 10: Phase 1: Summary

Chapter 5 – Project Pathway

The project on malware detection using machine learning is divided into three phases as described
in Fig 11.

Fig 11: Project pathway

Bibliography

[1] D. Braue, "Global Ransomware Damage Costs Predicted To Exceed $265 Billion By 2031," 3 June
2021. [Online]. Available:
https://cybersecurityventures.com/global-ransomware-damage-costs-predicted-to-reach-250-billion-usd-by-2031/.

[2] businesswire, "Global Malware Analysis Market Expected to Grow with a CAGR of 31% During the
Forecast Period, 2019-2024 - ResearchAndMarkets.com," businesswire, 13 December 2019.
[Online]. Available:
https://www.businesswire.com/news/home/20191213005123/en/Global-Malware-Analysis-Market-Expected-to-Grow-with-a-CAGR-of-31-During-the-Forecast-Period-2019-2024---ResearchAndMarkets.com.

[3] Microsoft, "Microsoft Malware Classification Challenge (BIG 2015)," Microsoft, 2015. [Online].
Available: http://arxiv.org/abs/1802.10135.

[4] Sophos, "Sophos-ReversingLabs (SOREL) 20 Million sample malware dataset,"
Sophos-ReversingLabs, 14 December 2020. [Online]. Available:
https://ai.sophos.com/2020/12/14/sophos-reversinglabs-sorel-20-million-sample-malware-dataset/.

[5] L. Nataraj, S. Karthikeyan, G. Jacob, and B. S. Manjunath, "Malware Images: Visualization and
Automatic Classification," 2011.

[6] H. Mallet, "Malware Classification using Convolutional Neural Networks — Step by Step Tutorial,"
27 May 2020. [Online]. Available:
https://towardsdatascience.com/malware-classification-using-convolutional-neural-networks-step-by-step-tutorial-a3e8d97122f.

[7] Wikipedia, "Portable Executable," [Online]. Available:
https://en.wikipedia.org/wiki/Portable_Executable.

[8] Kaspersky, "Machine Learning for Malware Detection," [Online]. Available:
https://media.kaspersky.com/en/enterprise-security/Kaspersky-Lab-Whitepaper-Machine-Learning.pdf.

Annexure – Implementation and Code

Malware Detection Using Machine Learning, Phase 1

October 23, 2021

1 A PROJECT ON MALWARE DETECTION USING MACHINE LEARNING
Group Members: Anagh Sharma | Kartikey Sharma | Vishal Bora | Harendra Singh Bisht
Group ID: CSE19-G58

Importing relevant libraries

[1]: import os
     from math import log
     import numpy as np
     from PIL import Image
     import matplotlib.pyplot as plt
     import random

Defining the path to the dataset folder

[2]: root = ('C:\\Users\\ANAGH\\Documents\\College\\Group_Project\\Datasets\\'
             'Microsoft Malware Classification Challenge\\MMCC')

Displaying malware classes of the dataset

[3]: for subdir, dirs, files in os.walk(root):
         for d in dirs:
             print(f'Malware Class: {d}')

Malware Class: 1_Ramnit

Malware Class: 2_Lollipop
Malware Class: 3_Kelihos_ver3
Malware Class: 4_Vundo
Malware Class: 5_Simda
Malware Class: 6_Tracur
Malware Class: 7_Kelihos_ver1
Malware Class: 8_Obfuscator.ACY
Malware Class: 9_Gatak

Converting hexadecimal representation of the file’s binary content to PNG image

[4]: def convert(array, name, folder):
         """Reshape the byte array into a 2D grid and save it as a PNG image."""
         print('Converting: ' + name)
         # Total bytes = rows * 16; the width is the power of two just above sqrt(total).
         b = int((array.shape[0] * 16) ** 0.5)
         b = 2 ** (int(log(b) / log(2)) + 1)
         a = int(array.shape[0] * 16 / b)
         # Drop trailing rows that do not fill a complete image row.
         array = array[:a * b // 16, :]
         array = np.reshape(array, (a, b))
         img = Image.fromarray(np.uint8(array))
         img.save(os.path.join(folder, name + '.png'), 'PNG')
         return img

     for subdir, dirs, files in os.walk(root):
         for d in dirs:
             print(f'Iterating through folder of class: {d}')
             folder = os.path.join(subdir, d)
             for name in os.listdir(folder):
                 if not name.endswith('.bytes'):
                     continue
                 array = []
                 with open(os.path.join(folder, name)) as f:
                     for line in f:
                         c = line.split()
                         # Each valid line holds an address followed by 16 byte values.
                         if len(c) != 17:
                             continue
                         # '??' marks an unknown byte and is mapped to 0.
                         array.append([int(i, 16) if i != '??' else 0 for i in c[1:]])
                 convert(np.array(array), name, folder)

Iterating through folder of class: 1_Ramnit

Converting: 01kcPWA9K2BOxQeS5Rju.bytes
Converting: 04EjIdbPV5e1XroFOpiN.bytes
Converting: 05EeG39MTRrI6VY21DPd.bytes
Converting: 05rJTUWYAKNegBk2wE8X.bytes
Converting: 0AnoOZDNbPXIr2MRBSCJ.bytes
Converting: 0AwWs42SUQ19mI7eDcTC.bytes
Converting: 0cH8YeO15ZywEhPrJvmj.bytes
Converting: 0DNVFKwYlcjO7bTfJ5p1.bytes
Converting: 0DqUX5rkg3IbMY6BLGCE.bytes
Converting: 0eaNKwluUmkYdIvZ923c.bytes
Iterating through folder of class: 2_Lollipop
Converting: 01IsoiSMh5gxyDYTl4CB.bytes
Converting: 02JqQ7H3yEoD8viYWlmS.bytes
Converting: 02K5GMYITj7bBoAisEmD.bytes
Converting: 02MRILoE6rNhmt7FUi45.bytes
Converting: 02zcUmKV16Lya5xqnPGB.bytes
Converting: 05aiMRw13bYWqZ8OHvjl.bytes
Converting: 05IXcWGxvnkto4sq17zZ.bytes

Converting: 05Kps4iFw8mOLJZQrb1H.bytes
Converting: 065EZhxgbLRSHsB87uIF.bytes
Converting: 06aLOj8EUXMByS423sum.bytes
Iterating through folder of class: 3_Kelihos_ver3
Converting: 04BfoQRA6XEshiNuI7pF.bytes
Converting: 04cvLCVPqBMs6yn5xGlE.bytes
Converting: 04QzZ3DVdPsEp9elLR65.bytes
Converting: 04sJnMaORYc1SV5pKjrP.bytes
Converting: 06arUi9q3wHS2C8RZxeB.bytes
Converting: 06KfrF7ltESna2ZHPVp5.bytes
Converting: 06osXqPUVM1HbvBGNncT.bytes
Converting: 07nrG1cLKUPxjOlWMFiV.bytes
Converting: 09bfacpUzuBN5W3S8KTo.bytes
Converting: 0aSTGBVRXeJhx5OcpsgC.bytes
Iterating through folder of class: 4_Vundo
Converting: 0qPGt4cRVk9NoiJgubf2.bytes
Converting: 1bL4yiwCUvSOg7tBJudf.bytes
Converting: 1eOaAY4fpV38LIdhxl95.bytes
Converting: 1FacC02JPfxSdXeD7MEw.bytes
Converting: 1gx83bLB4PSsYIKCTlZt.bytes
Converting: 1PQFYMSBLAO9TmKk2Zhj.bytes
Converting: 1S9ui2XqltCJAOGUPw7v.bytes
Converting: 1yC7BzWHgtI2FibhQ0km.bytes
Converting: 2CfJMa5HIn6D1d9EXbpe.bytes
Converting: 2g4C03AeqoZR6ctiF1Qr.bytes
Iterating through folder of class: 5_Simda
Converting: 0qjuDC7Rhx9rHkLlItAp.bytes
Converting: 1IpWLz6eyhVxDAfQMKEd.bytes
Converting: 1KB3Z7gd5aN4Xmx8W0sf.bytes
Converting: 2aHfrLhcPTj5GnFZXUCN.bytes
Converting: 2pwjzv6eGEb8QmHPfxSc.bytes
Converting: 2qAtoGOuMQZdmH3y7bEY.bytes
Converting: 3m8kb5ILPrHcMC1o9Nht.bytes
Converting: 3zZpqyclD9B2v5Qas18m.bytes
Converting: 40KRbGeQZ8PwcUgt5joa.bytes
Converting: 4UTMdcZkxzLvwygO8EuK.bytes
Iterating through folder of class: 6_Tracur
Converting: 02IOCvYEy8mjiuAQHax3.bytes
Converting: 02mlBLHZTDFXGa7Nt6cr.bytes
Converting: 03nJaQV6K2ObICUmyWoR.bytes
Converting: 05LHG8fR3iPn6agIo9z7.bytes
Converting: 08BX5Slp2I1FraZWbc6j.bytes
Converting: 09CPNMYyQjSguFrE8UOf.bytes
Converting: 09sXMJUHwQWVanrhzAoT.bytes
Converting: 0BZQIJak6Pu2tyAXfrzR.bytes
Converting: 0Cq4wfhLrKBJiut1lYAZ.bytes
Converting: 0df4cbsTBCn1VGW8lQRv.bytes
Iterating through folder of class: 7_Kelihos_ver1

Converting: 09LXtWxm1EbK5uVqcQS3.bytes
Converting: 0ACDbR5M3ZhBJajygTuf.bytes
Converting: 0b5LqcWix3J4fGIEhXQu.bytes
Converting: 0BIdbVDEgmPwjYF4xzir.bytes
Converting: 0eN9lyQfwmTVk7C2ZoYp.bytes
Converting: 0hZEqJ5eMVjU21HAG7Ii.bytes
Converting: 0KgE6ksUeytoHfl2cT4r.bytes
Converting: 0LAXajqhQy7po16dw8Tx.bytes
Converting: 0M7aSiE9csDzkmfKheVt.bytes
Converting: 0PlfqyKM1JtYZx2me5FU.bytes
Iterating through folder of class: 8_Obfuscator.ACY
Converting: 01SuzwMJEIXsK7A8dQbl.bytes
Converting: 04hSzLv5s2TDYPlcgpHB.bytes
Converting: 0aKlH1MRxLmv34QGhEJP.bytes
Converting: 0aVNj3qFgEZI6Akf4Kuv.bytes
Converting: 0aVxkvmflEizUBG2rMT4.bytes
Converting: 0BFIPv1rO83whtpMYyAs.bytes
Converting: 0BY2iPso3bEmudlUzpfq.bytes
Converting: 0C4aVbN58O1nAigFJt9z.bytes
Converting: 0cTu2bkefOAJqIhYUWFK.bytes
Converting: 0fhnXI9ESr4jgWmkiaTe.bytes
Iterating through folder of class: 9_Gatak
Converting: 01azqd4InC7m9JpocGv5.bytes
Converting: 01jsnpXSAlgw6aPeDxrU.bytes
Converting: 04mcPSei852tgIKUwTJr.bytes
Converting: 07ECKjDTyQLnabNoxrIH.bytes
Converting: 0AV6MPlrTWG4fYI7NBtQ.bytes
Converting: 0bjN3Kgw5OATSreRmEdi.bytes
Converting: 0co46B8IkPt2UN3HSaw7.bytes
Converting: 0CPaAXtyswrBq83D6VEg.bytes
Converting: 0dauMIK4ATfybzqUgNLc.bytes
Converting: 0dhL8Jvcswa7U1qHiDS5.bytes

Generating normalized tensor image data using Keras

[5]: from keras.preprocessing.image import ImageDataGenerator

     # Generating the dataset
     the_data = ImageDataGenerator().flow_from_directory(
         directory=root, target_size=(512, 512), batch_size=100)

     imgs, labels = next(the_data)

Found 90 images belonging to 9 classes.

Train-Test Split
[6]: from sklearn.model_selection import train_test_split
     X_train, X_test, y_train, y_test = train_test_split(
         imgs / 255., labels, test_size=0.3)

[7]: print(f'Shape of the tuple holding image data: {imgs.shape}')
print(f'Shape of the tuple holding labels of the image data: {labels.shape}')

Shape of the tuple holding image data: (90, 512, 512, 3)

Shape of the tuple holding labels of the image data: (90, 9)

Plotting randomly selected PNG images from the generated dataset

[8]: images = imgs

[9]: if type(images[0]) is np.ndarray:
         images = np.array(images).astype(np.uint8)

[10]: fig = plt.figure(figsize=(14, 14))
      rows = 3
      columns = 3
      for i in range(0, 9):
          fig.add_subplot(rows, columns, i + 1)
          plt.axis('off')
          k = random.randint(0, 89)
          plt.title(list(the_data.class_indices.keys())[np.argmax(labels[k])])
          plt.imshow(images[k])

MALWARE DETECTION USING MACHINE LEARNING
ORIGINALITY REPORT

Similarity index: 9%
Internet sources: 6%
Publications: 2%
Student papers: 6%

PRIMARY SOURCES

1. www.coursehero.com (Internet Source): 3%
2. Submitted to DIT university (Student Paper): 2%
3. Submitted to South Bank University (Student Paper): 1%
4. Submitted to The Robert Gordon University (Student Paper): 1%
5. scholarworks.sjsu.edu (Internet Source): <1%
6. link.springer.com (Internet Source): <1%
7. speakerdeck.com (Internet Source): <1%
8. Yixuan Ma, Shuang Liu, Jiajun Jiang, Guanhong Chen, Keqiu Li. "A comprehensive study on
learning-based PE malware family classification methods", Proceedings of the 29th ACM Joint
Meeting on European Software Engineering Conference and Symposium on the Foundations of
Software Engineering, 2021 (Publication): <1%
9. Submitted to Metropolitan Community College (Student Paper): <1%
10. ai.sophos.com (Internet Source): <1%
11. arun-aiml.blogspot.com (Internet Source): <1%
12. repositori.udl.cat (Internet Source): <1%
13. "Big Data Analytics", Springer Science and Business Media LLC, 2018 (Publication): <1%
14. Sumit S. Lad, Amol C. Adamuthe. "Malware Classification with Improved Convolutional
Neural Network Model", International Journal of Computer Network and Information
Security, 2020 (Publication): <1%
15. journals.riverpublishers.com (Internet Source): <1%
16. www.hindawi.com (Internet Source): <1%

Exclude quotes: On
Exclude matches: Off
Exclude bibliography: On

MD102
No ratings yet
MD102
312 pages
Krypto - Welcome & Documentation PDF
No ratings yet
Krypto - Welcome & Documentation PDF
1 page
FinalStudyGuide Imgs
No ratings yet
FinalStudyGuide Imgs
8 pages
Changes in Supply Chain Management
No ratings yet
Changes in Supply Chain Management
23 pages
Security Plus 601 ObjectivesMap
No ratings yet
Security Plus 601 ObjectivesMap
20 pages
Question Bank-DSA Using Python (Unit-I & Unit-II)
No ratings yet
Question Bank-DSA Using Python (Unit-I & Unit-II)
4 pages
Major Project Synopsis Format
No ratings yet
Major Project Synopsis Format
20 pages
Mahdi Al Sahily M811 A Activity 2
No ratings yet
Mahdi Al Sahily M811 A Activity 2
32 pages
07 - Ai-900 71-90
No ratings yet
07 - Ai-900 71-90
6 pages
Observeit vs. Interguard: A Competitve Comparison: Supported Platforms
No ratings yet
Observeit vs. Interguard: A Competitve Comparison: Supported Platforms
2 pages
Innovation at Wipro Limited - Case Study
100% (1)