Matter Cti

Machine learning based Cyber Threat Intelligence 2018-2019
CHAPTER 1
INTRODUCTION
In recent years, there has been a significant increase in the number and variety of
cyber attacks, which makes it extremely difficult for security analysts and forensic
investigators to detect and defend against them. To cope with this problem, researchers
introduced the notion of “Cyber Threat Intelligence”. Cyber Threat Intelligence (CTI)
emerged to help security practitioners recognize the indicators of cyber attacks, extract
information about the attack methods, and consequently respond to the attack accurately
and in a timely manner. Cyber Threat Intelligence is defined as the ‘set of data collected,
assessed and applied regarding security threats, threat actors, exploits, malware,
vulnerabilities, compromise indicators and development of capabilities to help identify
attribution sources and take appropriate forms of protection and counter-action’.
Similar cyber incidents that reuse the same IP addresses, domains, and malicious code
are increasing. To understand the correlation between cyber attacks and to respond
promptly, it is necessary to collect data on the procedures and techniques of cyber
attacks with the aid of a cyber threat information collection system.
There is simply too much data at play, coming from multiple sources, so human
detection of cyber security events would be extremely difficult, if not impossible. A
big challenge in collecting and analyzing intelligence has always been scalability. Good,
actionable intelligence takes expertise to develop. One possible solution to this problem of
scalability and expertise is to use machine learning in cyber threat intelligence, for
example by applying machine learning algorithms to cyber attack prediction.
A typical enterprise network with thousands of IT systems generates billions of
security events per day. Generally, only a subset of these security events is ingested by
SIEMs to run through threat correlation rules. Organizations are now aiming to ingest as
many security events as possible to meet regulatory requirements, to enhance
investigative visibility, to conduct Threat Hunting (TH) and to expedite the Incident
Response (IR) process through live and historical analysis of security events. This brings
a dire need for an efficient machine learning based cyber threat detection model.
DEPT. OF CSE, NIEIT Page 1

More and more users have accepted the concept of threat intelligence and are trying to
use threat intelligence in routine security protection. Selecting appropriate threat
intelligence vendors and services has become a crucial issue. Users need a method to
evaluate the threat intelligence services of vendors scientifically and objectively. A
quality evaluation system from the user's perspective is used for this purpose.
A number of machine learning based cyber threat intelligence tools are available
which contribute greatly to the prevention and early identification of cyber attacks.
They also shed light on the recovery of data that is corrupted during an attack.
1.1 Aim
To perform a detailed analysis of machine learning based cyber threat intelligence. This
includes understanding the working of the cyber threat information collection system, the
application of various machine learning algorithms in cyber security, analysing the quality
evaluation method for cyber threat intelligence from the user's perspective, and looking
into the working of a few real-time applications of machine learning based cyber threat
intelligence.
1.2 Purpose
In the era of digital information technology and connected devices, the most challenging
issue is ensuring the security and privacy of individuals’ and organizations’ data. In
recent years, there has been a significant increase in the number and variety of cyber
attacks and malware samples, which makes it extremely difficult for security analysts and
forensic investigators to detect and defend against them. Cyber threat intelligence, which
is essentially intelligence built from an enormous amount of data related to cyber threats
and attacks, can resolve this problem to a good extent. Timely management of such a huge
amount of data is possible with the aid of machine learning algorithms and techniques.
This is the reason behind the inception of machine learning based cyber threat
intelligence.
1.3 Scope
Threat intelligence helps accelerate threat detection, prioritization and incident response
capabilities. Cyber threat intelligence can help you maintain visibility of the threat
landscape so that your security infrastructure is able to respond to the latest threats in
real time. This includes detecting malicious activity already inside your network,
analyzing it and helping your security team understand the attackers’ objectives.
1.4 Overview
In recent years, cyber threat intelligence has become an important supporting pillar in a
mature cyber security strategy. When applied well, threat intelligence can help security teams
defend against an ever more sophisticated threat landscape before, during and after an attack.
By studying adversaries and understanding their strategies and objectives, organizations can
build more effective, more refined and more robust cyber defences. Machine learning based
cyber threat intelligence helps in sorting through false positives and the reams of data that
are collected, in the face of the ever more sophisticated TTPs employed by cybercriminals. To
understand the correlation between cyber attacks and to respond promptly, it is necessary to
collect the related data, which helps in constructing efficient cyber threat intelligence. A
big challenge in collecting and analyzing intelligence has always been scalability. One
possible solution to this problem of scalability and expertise is to use machine learning in
cyber threat intelligence. Selecting appropriate threat intelligence vendors and services has
become a crucial issue, and meeting this challenge requires a comprehensive quality
evaluation system that helps in recognizing the appropriate cyber threat intelligence
vendor. A number of machine learning based cyber threat intelligence tools are available
which contribute greatly to the prevention and early identification of cyber attacks. We
shall look at such tools and the machine learning algorithms on which they rely.

CHAPTER 2
LITERATURE SURVEY
A literature survey or literature review in a project report is the section that shows
the various analyses and research made in the field of interest and the results already
published, taking into account the various parameters and the extent of the project. It is
the most important part of the report, as it gives a direction in the area of our
research. It helps to set a goal for the analysis, thus giving the problem statement.
2.1 Survey papers
A literature review or narrative review is a type of review article. A literature review
is a scholarly paper which includes the current knowledge, including substantive findings as
well as theoretical and methodological contributions to a particular topic. Literature reviews
are secondary sources, and do not report new or original experimental work. The IEEE papers
and other papers used in the seminar are as follows.
[1] Design of a Cyber Threat Information Collection System for Cyber Attack
Correlation
It was published by Nakhyun Kim, Seulgi Lee, Hyeisun Cho, Byun-ik Kim and
MoonSeog Jun at the 2018 International Conference on Platform Technology and Service
(PlatCon).
Nowadays, the number of cyber threats is increasing continuously, and attack techniques
are becoming increasingly advanced and intelligent. One important point in this situation
is the marked increase in similar cyber incidents which use the same IP, domain, and
malicious code as an earlier cyber attack. Therefore, it is essential to understand the
correlation between cyber attacks that arises from the re-use of the same attack
infrastructure (IP, domain, malicious code, etc.) for different cyber attacks, in order to
detect and respond promptly to similar cyber attacks. To understand the correlation between
cyber attacks, it is necessary to collect the related data concerning the procedures and
techniques of cyber attacks. This paper proposes the design details of a cyber threat
information collection system that meets these needs. The proposed system collects the
attack infrastructure data (IoCs) exploited for cyber attacks from various open data
sources (OSINT, Open Source INTelligence), and uses the collected data as an input value
to collect more data recursively. The relationships among the collected data are also
collected, saved, and managed, so that the data can be used to analyze groups of related
cyber attacks. The proposed system uses a virtualization structure and distributed
processing technology to collect data stably from various collection channels.
[2] Optimal Machine Learning Algorithms for Cyber Threat Detection
It was published by Hafiz M. Farooq and Naif M. Otaibi at the 2018 UKSim-AMSS 20th
International Conference on Modelling & Simulation.
With the exponential hike in cyber threats, organizations are now striving for better
data mining techniques to analyze the security logs received from their IT infrastructures
and to ensure effective and automated cyber threat detection. Machine Learning (ML) based
analytics for security machine data is the next emerging trend in cyber security, aimed at
mining security data to uncover advanced targeted cyber threat actors and at minimizing the
operational overheads of maintaining static correlation rules. However, the selection of an
optimal machine learning algorithm for security log analytics remains an impeding factor
against the success of data science in cyber security, because of the risk of a large number
of false-positive detections, especially in large-scale or global Security Operations Center
(SOC) environments. This brings a dire need for an efficient machine learning based cyber
threat detection model capable of minimizing the false detection rates. In this paper, the
authors propose optimal machine learning algorithms with their implementation framework,
based on analytical and empirical evaluations of gathered results, while using various
prediction, classification and forecasting algorithms.
[3] A Quality Evaluation Method of Cyber Threat Intelligence in User Perspective
It was published by Li Qiang, Jiang Zhengwei, Yang Zeming, Liu Baoxu, Wang Xin and
Zhang Yunan at the 2018 17th IEEE International Conference On Trust, Security And
Privacy In Computing And Communications / 12th IEEE International Conference On
Big Data Science And Engineering.
With the wide use of cyber threat intelligence, the influence of security threats and
cyber attacks has been relieved and controlled to a degree. More and more users have
accepted the concept of threat intelligence and are trying to use threat intelligence in
routine security protection. How to choose appropriate threat intelligence vendors and
services has therefore become a crucial issue. Existing research on threat intelligence
evaluation is mainly focused on one-sided threat intelligence contents and approaches, and
lacks comprehensiveness and effectiveness. To address this situation, the authors propose a
comprehensive evaluation architecture of threat intelligence from the user's perspective,
which evaluates threat intelligence services in several dimensions with a quantitative index
system. They also carried out typical experiments on threat intelligence data feeds and on a
comprehensive situation to verify the feasibility of the proposed method. The results show
that the proposed evaluation method has a clear advantage in coverage and partition degree.

CHAPTER 3
SYSTEM ARCHITECTURE
A system architecture or systems architecture is the conceptual model that defines the
structure, behaviour, and other views of a system. A system architecture can consist of
system components and sub-systems that work together to implement the overall system.
The purpose of system architecture activities is to define a comprehensive solution based
on principles, concepts, and properties that are logically related and consistent with
each other.
3.1 Flow Diagram of Cyber Threat Information Collection System
The cyber threat information collection system was designed and developed to collect
data for cyber threat intelligence analysis, and to save and manage large quantities of
collected data on a flexible and scalable platform. Fig. 3.1 represents the flow
diagram of the cyber threat information collection system.
Fig. 3.1 Flow diagram of cyber threat information collection system
It is composed of the Total Management Server (TMS) and an integrated collection
agent. A server/client structure is applied together with distributed processing
technology. The user performs the basic environment setting needed for server operation
in the total management server. Based on these settings, the server maintains a work queue
for collecting the information related to actual cyber attacks.

3.2 Machine learning analytical workflow
The workflow in Fig. 3.2 represents how machine learning can be incorporated in
cyber threat intelligence. Machine learning based security analytics is performed through an
optimal Machine Learning Analytical Workflow (MLAW) to ensure efficient pre-processing of
the data before a well-trained machine learning predictor or classifier is applied for
subsequent analytics. Such a workflow can help address these challenges by reducing the
huge volume of security events to a few outlier events and by providing security analysts
with potential indicators of malicious activity to feed into cyber threat detection and
hunting processes.
Fig. 3.2 Machine Learning Analytical Workflow

In this workflow, real-time data is stored as offline training data, to which machine
learning algorithms are applied. A machine learning data model is built by applying the
algorithms and training on this data. The model is sent to the machine learning processor,
which passes its results to the relevant alert engine or decision engine. In the meantime,
the same real-time data is also fed to the pre-processor and then to the feature extractor.
From there the data is sent to the machine learning processor.
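The flow described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the event fields (`host`, `bytes_out`), the function names and the simple mean/standard-deviation "model" are hypothetical stand-ins for the pre-processor, feature extractor, trained model and alert engine, and are not taken from the surveyed paper.

```python
from statistics import mean, pstdev


def preprocess(event):
    """Pre-processor: drop malformed events and normalise field types."""
    if "host" not in event or "bytes_out" not in event:
        return None
    return {"host": str(event["host"]).lower(), "bytes_out": float(event["bytes_out"])}


def extract_features(event):
    """Feature extractor: a single numeric feature per event (an assumption)."""
    return event["bytes_out"]


def train_model(training_events):
    """Offline training: learn a mean/std baseline from stored events."""
    cleaned = [e for e in map(preprocess, training_events) if e is not None]
    values = [extract_features(e) for e in cleaned]
    return {"mean": mean(values), "std": pstdev(values) or 1.0}


def ml_processor(model, feature, threshold=3.0):
    """ML processor: flag features far from the trained baseline (z-score rule)."""
    return abs(feature - model["mean"]) / model["std"] > threshold


def run_workflow(model, live_events):
    """Send live events through pre-processing, feature extraction and scoring."""
    alerts = []
    for raw in live_events:
        event = preprocess(raw)
        if event is None:
            continue
        if ml_processor(model, extract_features(event)):
            alerts.append(event["host"])  # hand-off point to the alert/decision engine
    return alerts


training = [{"host": "a", "bytes_out": v} for v in (100, 110, 90, 105, 95)]
model = train_model(training)
live = [{"host": "PC-1", "bytes_out": 102}, {"host": "PC-2", "bytes_out": 5000}, {"bytes_out": 1}]
alerts = run_workflow(model, live)
print(alerts)
```

Keeping training separate from scoring mirrors the two paths in the workflow: the model is built offline, while live events only pass through pre-processing, feature extraction and the processor.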
3.3 Comprehensive evaluation architecture of cyber threat intelligence
The comprehensive evaluation architecture of threat intelligence and its quantization
method give an overview of threat intelligence evaluation from the user's perspective and
of the detailed process. Fig. 3.3 represents the evaluation architecture of cyber threat
intelligence.

Fig. 3.3 Comprehensive evaluation architecture of cyber threat intelligence
Categories indicate that threat intelligence can be divided into three classes: strategic,
tactical and operational. Functions mean that threat intelligence can be used in different
security activities, including early warning, in-process detection and after-the-fact
response. Properties refer to the characteristics utilized in quality evaluation. Testing
Methods are the measuring approaches used in threat intelligence evaluation. Items are the
testing content of the evaluation from the user's perspective.

CHAPTER 4
METHODOLOGY
Methodology is the systematic, theoretical analysis of the methods applied to a field
of study. It comprises the theoretical analysis of the body of methods and principles
associated with a branch of knowledge. Typically, it encompasses concepts such as paradigm,
theoretical model, phases and quantitative or qualitative techniques.
4.1 Cyber Threat Intelligence
Nowadays, the number of cyber threats is increasing continuously, and attack techniques
are becoming increasingly advanced and intelligent.
4.1.1 Definition
Cyber Threat Intelligence is defined as:
• The set of data collected, assessed and applied regarding security threats, threat
actors, exploits, malware, vulnerabilities and compromise indicators.
• The development of capabilities to help identify attribution sources and take
appropriate forms of protection and counter-action.
4.1.2 Need for cyber threat intelligence
• In recent years, there has been a significant increase in the number and
variety of cyber attacks, which makes it extremely difficult for security analysts
and forensic investigators to detect and defend against such attacks in
near real time.
• Cyber Threat Intelligence (CTI) emerged to help security practitioners
recognize the indicators of cyber attacks, extract information about the
attack methods, and consequently respond to the attack accurately and in a
timely manner.
4.1.3 Challenges of cyber threat intelligence
• Attack Vector Reconnaissance
    ◦ Recognizing the points of attack and the system vulnerabilities that could
    be exploited by cybercriminals.
    ◦ Advancements in attack methods make recognizing the attacker and the
    attack’s point of arrival an extremely challenging issue.
• Attack Indicator Reconnaissance
    ◦ Recognizing the arrival of a cyber threat or attack.
    ◦ The use of advanced anti-forensics and evasion methods by
    cybercriminals in their malicious code has made the usual security
    assessment and analysis techniques less efficient.
4.2 Cyber threat information collection system
Similar cyber incidents that reuse the same IP addresses, domains, and malicious code
are increasing. To understand the correlation between cyber attacks and to respond
promptly, it is necessary to collect the related data concerning the procedures and
techniques of cyber attacks.
Table 4.1 Similar cyber attack cases

4.2.1 System architecture
The cyber threat information collection system was designed and developed to collect
data for cyber threat intelligence analysis. The proposed system is composed of the Total
Management Server (TMS) and an integrated collection agent.
A. Total Management Server (TMS)

The major functions of the Total Management Server (TMS) include overall
management of the cyber threat information collection system, client management, and
management of cyber attack resources and associated information. The TMS is composed of
an interface management module, an integrated management module, and a process
management module. The interface management module manages the network communication
occurring in the TMS and the internal/external interfaces. The integrated management
module manages collected resources, collection channels, settings, and web console
support. The process management module manages the stability and distributed processing
of the collection process. When the TMS specifies the environment setting of a specific
collection channel, the corresponding environment setting command is transferred to the
Total Collecting Agent (TCA), and the overall history of the jobs performed by the TCA
and the collected information are received and saved.
B. Total Collecting Agent (TCA)

The Total Collecting Agent (TCA) is a client that performs data collection work
according to TMS commands. One TCA is allocated to one virtual machine using the
virtualization structure. The allocated TCA manages a specific collection target channel
according to the TMS command. The number of allocated TCAs can be dynamically increased or
decreased, depending on the quantity of data collected from the collection target channel.
The TCA is composed of an interface management module, a command management module, and
a collection management module. The interface management module manages the interface
for communication between the TCA and the TMS, and internal data. The command
management module interprets TMS commands concerning the channel environment setting
and structures the service that executes the command in question. The collection
management module runs the actual collection process and supports service
structuralization in the command management module.

• Collection work is divided into direct data collection from the collection channel, and
retrieval work whereby more data are collected using the collected data as an input
value.
• The TCA classifies work into Crawler, API Connect, Direct Input, and Shared
Storage, depending on the type of collection work.
• The TCA applies a parser for HTML, JSON, XML, CSV, or TXT,
depending on the format of the collected data.
• The TCA can respond to various collection channels by combining the collection
work type and the collected data format.
• The cyber threat information collection system estimates the work quantity of the
TCA by managing the collection process, and decides whether distributed processing
will be performed by comparing the threshold value of the TCA with the
processing speed.
• The resource use status of the TCA and the TMS is monitored to manage and operate
the multiple virtual machines allocated to various collection channels.
• Based on the analysis of the collected data types, an integrated data schema is
designed, and the collected data are saved according to this schema.
• The TMS combines the collected data and saves them in the designated NoSQL
(MongoDB) based database.
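The collection steps listed above can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the `PARSERS` table covers only three of the listed formats, the in-memory `DNS` and `HASHES` dictionaries stand in for real collection channels, and `needs_distribution` is one possible reading of the threshold comparison, which the source does not spell out.

```python
import csv
import io
import json
from collections import deque

# Parser dispatch keyed by data format (a subset of the formats listed above).
PARSERS = {
    "json": lambda text: json.loads(text),
    "csv": lambda text: [row[0] for row in csv.reader(io.StringIO(text)) if row],
    "txt": lambda text: [line.strip() for line in text.splitlines() if line.strip()],
}


def collect_recursively(seeds, channels, max_depth=2):
    """Breadth-first collection: each newly found indicator becomes a new input.

    channels is a list of (fetch, fmt) pairs; fetch(indicator) returns raw text.
    Returns the set of indicators and the (source, related) relationships.
    """
    seen = set(seeds)
    relations = []
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        indicator, depth = queue.popleft()
        if depth >= max_depth:
            continue
        for fetch, fmt in channels:
            for related in PARSERS[fmt](fetch(indicator)):
                relations.append((indicator, related))
                if related not in seen:
                    seen.add(related)
                    queue.append((related, depth + 1))
    return seen, relations


def needs_distribution(pending_items, items_per_second, threshold_seconds):
    """Assumed rule: distribute when the estimated drain time exceeds the threshold."""
    return pending_items / items_per_second > threshold_seconds


# Hypothetical in-memory channels standing in for real OSINT sources.
DNS = {"evil.example": '["203.0.113.7"]'}
HASHES = {"203.0.113.7": "d41d8cd9\n"}
channels = [
    (lambda indicator: DNS.get(indicator, "[]"), "json"),
    (lambda indicator: HASHES.get(indicator, ""), "txt"),
]
indicators, relations = collect_recursively(["evil.example"], channels)
print(sorted(indicators), relations)
```

Storing the `(source, related)` pairs alongside the indicators is what later allows attacks that share infrastructure to be correlated.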
C. Web Console for System Management

The web console provides various functions, such as retrieving the status of the collected
data, allocating a specific collection job to an idle TCA, and monitoring the
resource use of the TMS and TCA. These functions are performed using the data interface
module, template module, channel expansion and deletion module, and agent update module.
The data interface module manages the data interfaces so that collected information can be
retrieved, changed, and added, and so that the operational status of the system and the
environment settings can be retrieved. The template module normalizes the HTML elements
that should be supported by the web console, so that they can be used in common. The
channel expansion and deletion module assigns a new collection channel to the collection
agent and deletes a collection channel when operating it is no longer meaningful. The
agent update module distributes the TCA so that the TCA itself can be updated
if modification is required due to a change in the TCA functions.
4.2.2 Results
Based on the proposed design, a cyber threat information collection system was developed
that collects twelve types of cyber threat related information using eight collection
channels and four retrieval channels. About two million items of cyber attack related data
were collected over a one-month data collection period.
Fig. 4.1 Web console GUI (Check amount of data collection)
Table 4.2 Collection and inquiry channel list

4.3 Machine learning based cyber threat intelligence
• There is simply too much cyber threat data at play, coming from multiple
sources, so human detection of such cyber security events would be
extremely difficult, if not impossible.
• Dealing with such a large number of attacks in a timely manner is not possible without
closely examining the attack features and taking corresponding defensive actions through
intelligence.
Therefore, organizations have now realized that traditional monitoring complemented by
effective and versatile machine learning based Threat Hunting will be a necessary part of
any security monitoring portfolio.
4.3.1 Data Rate Analytic using Clustering Algorithms
Analyzing internet upload and download traffic is acutely important for the initial
detection of cyber attacks. In larger enterprises, conventional statistical tools are
insufficient to detect abnormal network sessions because of the high volume of network
traffic. Since the dataset is numerical, numerical clustering is used to separate normal
and abnormal network traffic. K-Means can be used to cluster enterprise users
based on their download and upload rates.
K-Means clusters the users' traffic data from the enterprise internet gateway (firewall)
into three clusters (k=3), which mark low, medium and highly active internet users.
Fig. 4.2 K-Means clustering for upload/download rates
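As a rough illustration of the clustering step described above, the following standard-library sketch runs Lloyd's K-Means with k=3 on a few made-up (download, upload) pairs. The traffic figures and the deterministic choice of initial centroids are assumptions for this example only; they are not the data or the implementation used in the surveyed paper.

```python
def kmeans(points, k=3, iterations=20):
    """Lloyd's algorithm on 2-D points with a deterministic initialisation (k >= 2)."""
    # Initial centroids: evenly spaced picks from the points sorted by total traffic.
    ordered = sorted(points, key=lambda p: p[0] + p[1])
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: (p[0] - centroids[c][0]) ** 2 + (p[1] - centroids[c][1]) ** 2)
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                updated.append((sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members)))
            else:
                updated.append(centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


# Made-up (download MB, upload MB) totals per user.
traffic = [
    (10, 2), (12, 3), (11, 1),               # light users
    (100, 20), (110, 25), (95, 22),          # medium users
    (900, 300), (950, 320), (1000, 280),     # heavy users
]
labels, centroids = kmeans(traffic, k=3)
print(labels)
print(centroids)
```

With this initialisation, cluster 0 holds the least active users and cluster 2 the most active ones, so the heavy cluster can be handed to an analyst for closer inspection.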

Algorithm:
Ex-filtration quadrant:
Clusters positioned in quadrants II and IV pose a relatively higher risk of data
exfiltration, subject to any suspicious actions being observed.
Fig. 4.3 Ex-filtration quadrant
4.3.2 Predicting User Behaviour using Linear Regression
One potential indicator of malicious activity is an abnormal count of process
executions compared to the behavioural baseline, which would indicate abnormal activity of
systems/users. In the experiments, the linear regression algorithm worked best to model this
threat case, given the simple, numerical and linear dataset. The model takes
DayOfTheWeek and UserAccount as input features, and the dependent variable is the number
of process executions for that user on that day of the week (execount). The model was
trained on four months of known good data to cover all known scenarios. A security
analyst can use such a model to spot behavioural anomalies by zooming in on data points
where the actual number of process executions is much larger than the value predicted by
the model, indicating potentially suspicious or malicious activity.
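A minimal sketch of this idea, assuming a plain ordinary least squares fit on one-hot encoded DayOfTheWeek and UserAccount features, is shown below. The history values, the account names and the `ratio` threshold are invented for illustration; the surveyed paper does not state how "much larger" is quantified.

```python
def encode(day, user, days, users):
    """Intercept plus one-hot columns; the first level of each feature is the baseline."""
    return ([1.0]
            + [float(day == d) for d in days[1:]]
            + [float(user == u) for u in users[1:]])


def fit_linear(rows, targets):
    """Ordinary least squares by solving the normal equations (X^T X) b = X^T y."""
    n = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * t for r, t in zip(rows, targets))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(a[k][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for k in range(n):
            if k != col:
                factor = a[k][col] / a[col][col]
                a[k] = [x - factor * y for x, y in zip(a[k], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


def predict(coefficients, row):
    """Predicted execount for one encoded (day, user) row."""
    return sum(c * x for c, x in zip(coefficients, row))


def is_anomalous(actual, predicted, ratio=3.0):
    """Flag counts much larger than predicted; the ratio is a hypothetical threshold."""
    return actual > ratio * max(predicted, 1.0)


days = ["Mon", "Tue", "Sat"]
users = ["alice", "svc_backup"]
# Made-up known-good history: (DayOfTheWeek, UserAccount, execount).
history = [
    ("Mon", "alice", 100), ("Tue", "alice", 110), ("Sat", "alice", 40),
    ("Mon", "svc_backup", 300), ("Tue", "svc_backup", 310), ("Sat", "svc_backup", 240),
]
rows = [encode(d, u, days, users) for d, u, _ in history]
coefficients = fit_linear(rows, [count for _, _, count in history])

expected = predict(coefficients, encode("Sat", "alice", days, users))
print(round(expected, 2), is_anomalous(400, expected), is_anomalous(45, expected))
```

Only counts above the prediction are flagged, matching the text's focus on executions that are much larger than expected.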

Fig. 4.4 Abnormal process execution detection using linear regression
4.3.3 Anomaly detection in process execution using OCSVM classifier
Finding anomalous Windows processes among the millions of processes executed in a
large-scale data centre would need an army of SOC analysts to investigate. The Microsoft
Sysmon tool is installed locally on multiple enterprise hosts to analyze process creation
logs. The One-Class SVM (OCSVM) algorithm is relatively well suited to the novelty
detection scenario, where only a "normal" dataset is available and no outliers are
practically known.
Algorithm:
Result:
Fig. 4.5 Classification using OCSVM
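For reference, the commonly cited formulation of the One-Class SVM (due to Schölkopf et al.) is reproduced below. It is given as background on why the method needs only "normal" data; the source does not state which variant or parameters were used in the experiments. Here $\Phi$ is the kernel feature map, $n$ the number of training samples, and $\nu \in (0, 1]$ an upper bound on the fraction of training points treated as outliers.

```latex
\begin{aligned}
\min_{w,\;\xi,\;\rho}\quad & \frac{1}{2}\lVert w \rVert^{2}
  + \frac{1}{\nu n}\sum_{i=1}^{n}\xi_{i} - \rho \\
\text{subject to}\quad & \langle w, \Phi(x_{i}) \rangle \ge \rho - \xi_{i},
  \qquad \xi_{i} \ge 0, \qquad i = 1, \dots, n \\[4pt]
f(x) \;=\; & \operatorname{sgn}\bigl(\langle w, \Phi(x) \rangle - \rho\bigr)
\end{aligned}
```

A process creation record with $f(x) = -1$ falls outside the region learned from normal data and would be the one passed to an analyst.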

4.4 A Quality Evaluation Method of Cyber Threat Intelligence in User Perspective
Threat intelligence is used to strengthen security protection in several scenarios.
However, because business needs differ, there are many threat intelligence vendors and
services. Selection currently relies mainly on vendors’ advertisements and on trials. Most
advertisements declare that the product or service is the best in the threat intelligence
market, and trials of products or services are limited by the trial period. How to evaluate
the threat intelligence services of vendors scientifically and objectively is therefore a
significant question for users.
4.4.1 Comprehensive evaluation architecture and approach
A. Principles of Building Evaluation Architecture
The principles for building the evaluation architecture include systematization,
situationality, dynamics, combinability, operability, quantifiability and comparability.
Systematization means that there are logical correlations and a hierarchy among the selected
indexes. Situationality means that the various situations in which threat intelligence is
used need to be taken into account when designing the index system for different evaluation
objects and purposes. Dynamics means that possible subsequent situations need to be
considered so that the index system can be adjusted accordingly. Combinability is the
principle that the designed index system can support quantitative analysis and qualitative
analysis at the same time. Operability, quantifiability and comparability mean that the
selected indexes can use uniform quantifying and calculating methods to collect index data
stably, and that the same index can be compared across different evaluation objects.
B. Comprehensive Quality Evaluation Framework
Fig. 4.6 Comprehensive Quality Evaluation Framework

The comprehensive evaluation architecture of threat intelligence and its quantization
method give an overview of threat intelligence evaluation from the user's perspective and
of the detailed process. Categories indicate that threat intelligence can be divided into
three classes: strategic, tactical and operational. Functions mean that threat intelligence
can be used in different security activities, including early warning, in-process detection
and after-the-fact response. Properties refer to the characteristics utilized in quality
evaluation. Testing Methods are the measuring approaches used in threat intelligence
evaluation. Items are the testing content of the evaluation from the user's perspective.
C. Quantitative Index System
To use the comprehensive evaluation architecture of threat intelligence in
practical work, an index system is designed to describe and cover all items in different
dimensions and aspects by the quantization method. The designed index system includes 5
first class indexes, 19 second class indexes and more than 50 third class indexes.
Table 4.3 Index System of Quantization Method

D. Quantization and Normalization

Because the characteristics of the evaluated types and items differ, each item is
quantized by a different method. Zero-One quantization is used to judge whether something
exists or not, such as whether the vendor holds a secrecy or military qualification.
Gradient quantization is used when the data differ greatly.
To make the initial testing results easy to calculate, normalization processing is
necessary. Three kinds of normalization methods are selected: min-max normalization, z-score
normalization and the Sigmoid function. These methods can handle most cases of normalization.
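The quantization and normalization methods named above have standard textbook forms, sketched here with the Python standard library. The sample values are made up, and the use of the population standard deviation in the z-score is an assumption, since the source does not say which variant was applied.

```python
import math
from statistics import mean, pstdev


def zero_one(present):
    """Zero-One quantization: 1 if the item exists, otherwise 0."""
    return 1.0 if present else 0.0


def min_max(values):
    """Min-max normalization onto the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Z-score normalization: zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def sigmoid(values):
    """Sigmoid normalization: squashes any real value into (0, 1)."""
    return [1.0 / (1.0 + math.exp(-v)) for v in values]


raw_scores = [10, 20, 40]  # made-up raw test results for three vendors
print(zero_one(True), min_max(raw_scores), z_score(raw_scores), sigmoid([0.0, 2.0]))
```

Min-max suits bounded scores, z-score suits indexes with different spreads, and the sigmoid is useful when a few extreme values would otherwise dominate.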
E. Affirmation of Weight
Considering the characteristics of threat intelligence quality evaluation in weight
selection, the weights of properties and of experts are assigned using a selected
comprehensive weight determination method. First, to improve the accuracy of the
properties’ weights, a multi-objective optimization algorithm is used. Then, when
adjusting the decision-makers’ weights, a method based on the idea of maximizing
deviations is applied to ensure maximal discrimination and difference.
4.4.2 Experiments
1) Testing Content
Quality evaluation of threat intelligence vendors from the user's perspective mainly
focuses on price, function, performance & quality, service, reputation & qualification and
other content. Values in the score matrix were provided by experts based on the basic item
test results, quantitative approaches and experience. The weights of the second class and
third class indexes were obtained by objective evaluation, subjective analysis and
optimization. Experts’ weights were determined by the method of maximum deviation. Finally,
from the expert scoring results, the weights of the testing indexes and the experts’
weights, the result for each testing item of each vendor can be determined.
2) Testing Process and Result
Intelligence sources and gathering channels include public information from the Internet,
communication with vendors, basic item test methods and quantitative approaches. Based on
these channels and personal experience, experts provide the values of the third class and
second class indexes for each vendor. After the third class index values are merged into
the second class indexes and normalized, the scoring matrix is built.

In the experiment, the services of three threat intelligence vendors were evaluated by four
experts. Each row of the matrix holds the evaluation result for one vendor. Each column of
the matrix holds the value of one evaluation index after standardization.
a) Properties weight determination and optimization

Based on AHP (Analytic Hierarchy Process) and variation coefficient method,
original weight of each index can be confirmed. We use eq. (4.1) to calculate the weights.
Eq. (4.1)
Table 4.4 Part of Properties’ Weight Optimization
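The two weighting methods named above can be sketched as follows. This is not a reconstruction of Eq. (4.1): it assumes the row geometric-mean approximation for AHP, the coefficient of variation (standard deviation divided by mean) for the objective weights, and a multiplicative combination of the two. All function names, the pairwise comparison matrix and the score matrix are hypothetical.

```python
import math
import statistics


def ahp_weights(pairwise):
    """Approximate AHP priority weights by the row geometric-mean method."""
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def variation_coefficient_weights(matrix):
    """Objective weights: coefficient of variation per index, normalized to sum to 1."""
    n_cols = len(matrix[0])
    coeffs = []
    for j in range(n_cols):
        column = [row[j] for row in matrix]
        coeffs.append(statistics.pstdev(column) / statistics.mean(column))
    total = sum(coeffs)
    return [c / total for c in coeffs]


def combine_weights(subjective, objective):
    """Multiplicative synthesis of two weight vectors, renormalized (an assumption)."""
    products = [s * o for s, o in zip(subjective, objective)]
    total = sum(products)
    return [p / total for p in products]


# Hypothetical pairwise comparisons among three indexes (AHP input).
pairwise = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
# Hypothetical normalized scores: rows are vendors, columns are indexes.
scores = [
    [0.8, 0.6, 0.9],
    [0.6, 0.9, 0.9],
    [0.7, 0.3, 0.9],
]
subjective = ahp_weights(pairwise)
objective = variation_coefficient_weights(scores)
combined = combine_weights(subjective, objective)
```

In this sketch an index on which all vendors score the same (the third column) receives an objective weight of zero, because it does not help to tell the vendors apart.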
b) Expert weight determination

Using Eq. (4.2), we can calculate the comprehensive evaluation value of each vendor.
Eq. (4.2)

Table 4.5 Comprehensive Evaluation Value of Vendors
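A comprehensive evaluation value of this kind can be illustrated with a plain weighted sum of the normalized index values. This is a common choice and is assumed here; it is not a reconstruction of Eq. (4.2). The function name, scores and weights below are hypothetical.

```python
def comprehensive_values(scores, weights):
    """Return one weighted-sum value per vendor (one per row of `scores`)."""
    if any(len(row) != len(weights) for row in scores):
        raise ValueError("each score row needs one value per weight")
    return [sum(w * r for w, r in zip(weights, row)) for row in scores]


# Hypothetical normalized scores (rows: vendors) and index weights.
scores = [
    [1.0, 0.0, 0.5],
    [0.5, 1.0, 1.0],
    [0.0, 0.5, 1.0],
]
weights = [0.5, 0.25, 0.25]

values = comprehensive_values(scores, weights)
# Position of the vendor with the highest comprehensive value.
best_vendor = max(range(len(values)), key=values.__getitem__)
```

With these placeholder numbers the second vendor (position 1) has the highest value, since it scores well on all three indexes.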

According to Eq. (4.3) below, we can calculate the experts’ weight values for the first-class
indexes.
Eq. (4.3)
Table 4.6 Experts Weight Value of First Class Index
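The idea of maximizing deviations can be sketched as follows. This assumes one common formulation, in which an expert's weight is proportional to the total pairwise deviation among the values that expert assigns to the vendors, so that experts who discriminate more clearly between vendors count for more. It is not a reconstruction of Eq. (4.3), and the function name and evaluation values are hypothetical.

```python
def max_deviation_weights(evaluations):
    """Weight each expert by how strongly their values separate the vendors.

    evaluations[k][i] is the value expert k assigns to vendor i.
    """
    deviations = []
    for row in evaluations:
        # Sum of absolute differences over all ordered pairs of vendors.
        deviations.append(sum(abs(a - b) for a in row for b in row))
    total = sum(deviations)
    if total == 0:
        # No expert separates the vendors: fall back to equal weights.
        return [1.0 / len(evaluations)] * len(evaluations)
    return [d / total for d in deviations]


# Hypothetical values from four experts (rows) for three vendors (columns).
evaluations = [
    [0.5, 0.5, 0.5],
    [0.25, 0.5, 0.75],
    [0.0, 0.5, 1.0],
    [0.5, 0.5, 1.0],
]
expert_weights = max_deviation_weights(evaluations)
```

Here the first expert rates all vendors equally and therefore receives zero weight, while the third expert, whose values are the most spread out, receives the largest weight.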
c) Calculating the final result

Based on the results of the properties’ weight optimization, the expert weight
determination and the evaluation results, we can obtain the final result for each vendor’s
first-class indexes.

Table 4.7 Evaluation Result of Threat Intelligence

We used a radar map to show the evaluation results of threat intelligence. Fig. 4.7 shows the
radar map of the evaluation results.
Fig. 4.7 Radar Map of evaluation result

The three service providers 360, ThreatBook and IBM XForce are represented in the
radar map with respect to five parameters, one at each corner of a pentagon. They are:
• Price
• Function, performance and data
• Reputation and qualification
• Service
• Other
The radar map shows that 360 is the optimal service provider in comparison with
IBM XForce and ThreatBook: its price parameter is low, its function, performance and
data parameter is high, its reputation and qualification parameter is very high, its service
parameter is moderate and its other parameter is high. In this way, an organization can
identify the most suitable cyber threat intelligence service provider among a number of
candidates.

4.5 Real Time Applications of Machine Learning Based Cyber Threat Intelligence
4.5.1 G Suite Ransomware Protection Module:
• The G Suite ransomware protection module, created by Spinbackup, detects
anomalous behaviour that is characteristic of a ransomware infection.
• Such behaviour could be rogue processes changing large numbers of files, irregular file
extensions, or other sorts of abnormal behaviour.
• Spinbackup is a provider of SaaS data backup and security solutions
for G Suite (an integrated suite of secure, cloud-native collaboration and
productivity apps powered by Google AI).
Fig. 4.8 ML driven custom G Suite security policy to identify ransomware
Fig. 4.9 Automatic ransomware infected files restoration
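The two behavioural signals mentioned above, a process changing a large number of files and irregular file extensions, can be illustrated with a simple rule-based sketch. This is not Spinbackup's implementation, which is not described in this report. The event format, the thresholds and the extension allowlist are all hypothetical assumptions.

```python
from collections import defaultdict
from pathlib import PurePath

# Hypothetical allowlist of extensions considered normal in this environment.
KNOWN_EXTENSIONS = {".docx", ".xlsx", ".pdf", ".png", ".txt"}


def flag_suspicious_processes(events, max_changes=100, max_unknown_ratio=0.5):
    """Flag processes whose file changes look like bulk encryption.

    events: iterable of (process_name, file_path) pairs, one per file change
    observed in a single time window (an assumed event format).
    """
    changes = defaultdict(int)
    unknown = defaultdict(int)
    for process, path in events:
        changes[process] += 1
        if PurePath(path).suffix.lower() not in KNOWN_EXTENSIONS:
            unknown[process] += 1

    flagged = {}
    for process, count in changes.items():
        reasons = []
        if count > max_changes:
            reasons.append("bulk file changes")
        if unknown[process] / count > max_unknown_ratio:
            reasons.append("irregular extensions")
        if reasons:
            flagged[process] = reasons
    return flagged


# Hypothetical window: a normal editor and a process rewriting 150 files.
events = [("editor.exe", "report.docx")] * 3
events += [("rogue.exe", f"doc{i}.locked") for i in range(150)]
flagged = flag_suspicious_processes(events)
```

A machine learning based module would learn such thresholds from observed behaviour instead of fixing them by hand; the fixed rules here only show which features the detection relies on.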

Spinbackup’s machine learning enabled ransomware protection module allows for:
• Detecting ransomware infections.
• Automatically blocking ransomware encryption that is actively damaging
files.
• Accurately identifying files that have been encrypted and automatically restoring
them.
• Versioning files so that the versions from before the ransomware damage
can be restored.
• Automated administrative security alerts.
4.5.2 Data Leak Detection Module:
• One of the most catastrophic events that can happen to an organization is
a leak of sensitive data to unauthorized individuals or entities.
• Insider threats can pose significant danger to organizations.
• Using Spinbackup’s Domain Audit functionality, organizations can gain
visibility into employee actions, including suspicious downloading of data locally or
to personal cloud environments.
Fig. 4.10 Getting real-time insights in Domain Audit dashboard from Spinbackup
machine learning
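Suspicious downloading can be illustrated by comparing each user's download volume with that user's own history. The sketch below is an assumption for illustration only and does not describe how Domain Audit works. It flags a user when today's volume lies more than a chosen number of standard deviations above the user's past mean; the function name, the thresholds and the sample data are hypothetical.

```python
import statistics


def unusual_downloads(history, today, z_threshold=3.0, min_stdev=1.0):
    """Return users whose download volume today is far above their own baseline.

    history: dict mapping user -> list of past daily download volumes (e.g. MB).
    today: dict mapping user -> today's download volume in the same unit.
    min_stdev: floor on the standard deviation, so that a perfectly steady
    history does not turn a tiny increase into an alert.
    """
    flagged = []
    for user, volume in today.items():
        past = history.get(user, [])
        if len(past) < 2:
            continue  # not enough data to form a baseline
        mean = statistics.mean(past)
        stdev = max(statistics.stdev(past), min_stdev)
        if (volume - mean) / stdev > z_threshold:
            flagged.append(user)
    return flagged


# Hypothetical daily download volumes in MB.
history = {
    "alice": [10, 12, 11, 9, 13],
    "bob": [50, 55, 45, 52, 48],
}
today = {"alice": 11, "bob": 400}
suspects = unusual_downloads(history, today)
```

Only the user whose volume jumps far above the usual level is flagged in this example; a user with no history is skipped because no baseline can be formed.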

CONCLUSION
Using Machine Learning analytics in CTI enhances cyber security monitoring
and supports analysis of the optimal algorithms for common cyber threat cases. Machine
Learning analytics are well suited to analyzing huge volumes of security events and feeding
deviations from normal baselines into proactive threat hunting processes as indicators or leads
of potential malicious activity. Machine learning algorithms generally give more accurate
results when the amount of data fed to the system is large. The comprehensive
evaluation architecture of threat intelligence from the user perspective, which takes several
dimensions into account at the same time, provides a reference for users selecting appropriate
services. Machine learning based cyber threat intelligence tools are efficient and
contribute substantially to the security of an organization.

FUTURE ENHANCEMENT
• Twelve types of cyber threat related information were collected using eight collection
channels and four retrieval channels based on the proposed system. About two million
items of cyber attack-related data were collected over a one-month data collection
period. As part of future work, the number of collection channels can be increased on a
continuous basis.
• Semi-supervised (one-class classification) algorithms such as One-Class SVM (OCSVM)
are relatively easy to train and cost effective, which makes them well suited to enabling
SOC analysts to perform novelty detection and uncover new indicators of
compromise (IOCs).
• For future work, to assess the effect of the evaluation method, four criteria can be taken
into account: coverage, difficulty of acquirement, accuracy and partition
degree. Coverage indicates the completeness of indexes and properties. Difficulty of
acquirement means the feasibility of acquiring the index value by a quantitative or
qualitative method. Accuracy refers to the difference between the evaluation result of
indexes at each level and the true situation. Partition degree reflects the differences
among various evaluation systems and methods, and is an indicator of the evaluation
effect.

REFERENCES
[1] Nakhyun Kim, Seulgi Lee, Hyeisun Cho, Byun-ik Kim, MoonSeog Jun, “Design of Cyber
Threat Information Collection System for Cyber Attack Correlation”, 2018 International
Conference on Platform Technology and Service (PlatCon).
[2] Hafiz M. Farooq, Naif M. Otaibi, “Optimal Machine Learning Algorithms for Cyber Threat
Detection”, 2018 UKSim-AMSS 20th International Conference on Modelling & Simulation.
[3] Li Qiang, Jiang Zhengwei, Yang Zeming, Liu Baoxu, Wang Xin, Zhang Yunan, “A
Quality Evaluation Method for Cyber Threat Intelligence in User Perspective”, 2018 17th
IEEE International Conference On Trust, Security And Privacy In Computing And
Communications.
[4] Mauro Conti, Ali Dehghantanha, and Tooska Dargahi, “Cyber Threat Intelligence:
Challenges and Opportunities”, 2018 University of Padua, Italy.
[5] www.spinbackup.com
[6] www.gsuite.google.com


DEPT. OF CSE, NIEIT Page 1