Detecting Malware Using Deep Learning Mo
© 2020, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page3162
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 07 Issue: 05 | May 2020 www.irjet.net p-ISSN: 2395-0072
execution, typically within a contained environment such as a sandbox or virtual machine. Four features are frequently extracted from Windows executables in research on automating static analysis of malware using ML: bytes, opcodes, strings, and metadata encoded in PE files.[4]

3. PROPOSED WORK
This project aimed to detect ransomware using an ensemble of ML (Machine Learning) classifiers, classifying Windows executable files as belonging to one of three possible classes: benign, malware, or ransomware. The hope is that, by releasing the source code to the public, the model can be used in practice to identify ransomware on a user’s machine, or can be integrated into an existing AV (Anti-Virus) system to facilitate or improve its ability to distinguish between ransomware and other types of malware.

3.1 System Architecture
The system architecture is given in Figure 1. Each block is described in this section.

format is used for EXE, DLL, SYS (device driver), and other file types. The Extensible Firmware Interface (EFI) specification states that PE is the standard executable format in EFI environments.

B. PE header: The following list describes the Microsoft PE executable format, with the base of the image header at the top. The section from the MS-DOS 2.0 Compatible EXE Header through to the unused section just before the PE header is the MS-DOS 2.0 Section, and is used for MS-DOS compatibility only.
● MS-DOS 2.0 Compatible EXE Header
● unused
● OEM Identifier
● OEM Information
● Offset to PE Header
● PE Header (aligned on 8-byte boundary)
● Section Headers
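The layout listed above can be walked programmatically. The following is a minimal sketch in Python, not the project's own code: it checks the MS-DOS header, follows the "Offset to PE Header" field stored at byte 0x3C, verifies the PE signature, and reads the 20-byte COFF file header that precedes the section headers. The function names `read_pe_header` and `make_minimal_image`, and the synthetic image used for the demonstration, are assumptions made for illustration.

```python
import struct

DOS_MAGIC = b"MZ"              # first two bytes of the MS-DOS 2.0 compatible header
PE_SIGNATURE = b"PE\x00\x00"   # four-byte signature that starts the PE header
E_LFANEW_OFFSET = 0x3C         # position of the "Offset to PE Header" field
COFF_FORMAT = "<HHIIIHH"       # little-endian COFF file header, 20 bytes


def read_pe_header(data):
    """Return a few COFF header fields, or raise ValueError if not a PE image."""
    if len(data) < E_LFANEW_OFFSET + 4 or data[:2] != DOS_MAGIC:
        raise ValueError("missing MS-DOS header")
    (pe_offset,) = struct.unpack_from("<I", data, E_LFANEW_OFFSET)
    if data[pe_offset:pe_offset + 4] != PE_SIGNATURE:
        raise ValueError("missing PE signature")
    coff_start = pe_offset + 4
    if len(data) < coff_start + struct.calcsize(COFF_FORMAT):
        raise ValueError("truncated COFF header")
    (machine, n_sections, timestamp, _symtab, _nsyms,
     opt_size, characteristics) = struct.unpack_from(COFF_FORMAT, data, coff_start)
    return {
        "pe_offset": pe_offset,
        "machine": machine,
        "number_of_sections": n_sections,
        "timestamp": timestamp,
        "size_of_optional_header": opt_size,
        "characteristics": characteristics,
    }


def make_minimal_image(n_sections=3):
    """Build a synthetic header-only image (hypothetical test data, not a real EXE)."""
    dos = bytearray(0x40)
    dos[:2] = DOS_MAGIC
    # Place the PE header directly after the 64-byte DOS header (8-byte aligned).
    struct.pack_into("<I", dos, E_LFANEW_OFFSET, 0x40)
    # 0x14C = x86 machine type; 0x0102 = executable image, 32-bit machine.
    coff = struct.pack(COFF_FORMAT, 0x14C, n_sections, 0, 0, 0, 0, 0x0102)
    return bytes(dos) + PE_SIGNATURE + coff


if __name__ == "__main__":
    print(read_pe_header(make_minimal_image()))
```

Fields such as the section count and characteristics flags are examples of the PE metadata that static analysis can use as features without executing the file.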
given more traction to the development of deep learning techniques. Such techniques are based on mathematical functions and parameters for achieving the desired output.

D. Classification results: The DL ensemble model developed during this project automates static analysis and classification of Windows executable files as either benign, malware, or ransomware, with a high degree of accuracy. The model is demonstrated to accurately classify 3000 files unseen in its training, suggesting generalizability to detecting malware and ransomware in the wild. For end-users of the simple terminal program developed for ransomware detection as part of this project, it is vital to the security of their own or their organization's devices that the distinction between benign and malicious files is correct, and this was tested to be 98% correct on the set of 3000 unseen samples. AV developers and security researchers can use and re-use the ensemble model in their own systems, and retrain the models with a larger dataset and newer samples to maintain its ability to classify in-the-wild executables.

4. REQUIREMENT ANALYSIS
The implementation details are given in this section.

4.1 Software
The minimal software requirement is given in Table 1.

Table 1: Minimal software requirement
Operating System        Linux/Windows
Programming Language    Python
Processor               Pentium IV and above
HDD                     250 GB
RAM                     1 GB

5. IMPLEMENTATION
Obtain samples of ransomware, other malware, and benign Windows Portable Executable (PE) files, labeled by multiple antiviruses. Gather at least 10,000 samples of each class, though more will be required if this volume proves too small to train an accurate machine learning (ML) model. Extract UTF-8 strings encoded in the binary contents of the sample files and build a numerical model of the strings extracted from all training samples. The same process will be applied to the opcodes of assembly code disassembled from the binary contents of the executable files, as well as to imported library names and exported function/library names extracted from the import and export sections of the PE files. Train a variety of machine learning classification models on a subset of the sample data and compare their accuracy in labeling another subset of the sample data that the models are not shown in training, to find the most accurate models for classifying unseen sample data. Implement the machine learning classification model in a user-friendly, GUI-featured, compiled executable program that allows users to select and analyze multiple files. This program will report back the model’s estimated probabilities of each file being ransomware, another form of malware, or benign software, and will allow the user to delete these files and report them to a popular public malware repository. Evaluate the accuracy, usability, and applicability of the solution to real-world problems, and compare it to similar existing solutions.
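The string-extraction step described above can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the project's implementation: it pulls runs of printable ASCII characters (a subset of UTF-8) out of raw bytes and turns them into count vectors over a vocabulary built from the training samples. The function names, the minimum string length of 4, the `max_features` cap, and the toy byte strings are all assumptions made for the example.

```python
import re
from collections import Counter


def extract_strings(data, min_len=4):
    """Return runs of at least `min_len` printable ASCII bytes found in `data`."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.decode("ascii") for match in pattern.findall(data)]


def build_vocabulary(samples, max_features=1000):
    """Map the strings seen in the most training samples to column indices."""
    document_frequency = Counter()
    for strings in samples:
        document_frequency.update(set(strings))  # count each string once per sample
    kept = [s for s, _ in document_frequency.most_common(max_features)]
    return {s: index for index, s in enumerate(sorted(kept))}


def vectorize(strings, vocabulary):
    """Count how often each vocabulary string occurs in one sample."""
    vector = [0] * len(vocabulary)
    for s in strings:
        index = vocabulary.get(s)
        if index is not None:  # strings outside the vocabulary are ignored
            vector[index] += 1
    return vector


# Toy "binaries" (hypothetical data standing in for real PE file contents).
train = [
    b"\x00\x01kernel32.dll\x00\xffCreateFileA\x00\x02ab\x00",
    b"\x90\x90kernel32.dll\x00CryptEncrypt\x00\x00CryptEncrypt\x00",
]
train_strings = [extract_strings(blob) for blob in train]
vocabulary = build_vocabulary(train_strings)
vectors = [vectorize(strings, vocabulary) for strings in train_strings]

if __name__ == "__main__":
    print(sorted(vocabulary))  # ['CreateFileA', 'CryptEncrypt', 'kernel32.dll']
    print(vectors)             # [[1, 0, 1], [0, 2, 1]]
```

The same `vectorize` step would apply to opcode sequences or imported library names once they have been extracted; the resulting vectors are the kind of numerical input a classifier would be trained on.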
been designed and implemented that achieved high validation accuracy, the results of their performance in classifying unseen test data were surprisingly impressive, at 96% test accuracy for the ensemble model, considering their ability to compete with contemporary research in the field, which is often performed by teams of academics. The test accuracy was even more impressive when the ensemble model was adapted to perform binary classification of executables as either benign or malicious, calculated at 98%, which is the most critical distinction for end-users of the model who are using it to decide whether or not a Windows program is safe to execute.

6. CONCLUSION
We have implemented a deep learning model that detects whether a selected file contains malware. To test it, the user can download an .exe file or upload an existing one. The deep learning model classifies the .exe files; a convolutional neural network is one of the deep learning models we have used.

7. REFERENCES
[8] SM Reasor, AJ Newman, RA Franczyk, J Garms, driven Malware Detector (2010).
[9] Shabtai et al, Uri Kapoor, Yuval Elovici, A Behavioral Malware Detection Framework for Android Devices (2009).
[10] Nwokedi Idika, Aditya Mathur, Malware Detection Technique (2007).