Matter Cti
CHAPTER 1
INTRODUCTION
In recent years, there has been a significant increase in the number and variety of cyber
attacks, which makes it extremely difficult for security analysts and forensic investigators
to detect and defend against them. To cope with this problem, researchers introduced the
notion of “Cyber Threat Intelligence”. Cyber Threat Intelligence (CTI) emerged to help
security practitioners recognize the indicators of cyber attacks, extract information about
the attack methods, and consequently respond to an attack accurately and in a timely manner.
Cyber Threat Intelligence is defined as the ‘set of data collected, assessed and applied
regarding security threats, threat actors, exploits, malware, vulnerabilities and compromise
indicators’, together with the development of capabilities to help identify attribution
sources and take appropriate forms of protection and counter-action.
Similar cyber incidents that reuse the same IP, domain, and malicious code are on the rise.
To understand the correlation between cyber attacks and to respond promptly, it is necessary
to collect the related data concerning the procedures and techniques of cyber attacks with
the aid of a cyber threat information collection system.
So much data arrives from so many sources that human detection of cyber security events
would be extremely difficult, if not impossible. A big challenge in collecting and analyzing
intelligence has always been scalability. Good, actionable intelligence also takes expertise
to develop. One possible solution to this problem of scalability and expertise is to use
machine learning in cyber threat intelligence, in particular machine learning algorithms
for cyber attack prediction.
More and more users have accepted the concept of threat intelligence and are trying to use
it in routine security protection. Selecting appropriate threat intelligence vendors and
services has therefore become a crucial issue. Users need a method to evaluate the threat
intelligence services of vendors scientifically and objectively. A quality evaluation system
built from the user's perspective addresses this need.
A number of machine learning based cyber threat intelligence tools are available that
contribute to preventing and identifying cyber attacks well in advance. They also help in
recovering data that is corrupted during an attack.
1.1 Aim
1.2 Purpose
In the era of digital information technology and connected devices, the most challenging
issue is ensuring the security and privacy of individuals' and organizations' data. In recent
years, there has been a significant increase in the number and variety of cyber attacks and
malware samples, which makes it extremely difficult for security analysts and forensic
investigators to detect and defend against them. Cyber threat intelligence, which gathers an
enormous amount of data related to cyber threats and attacks, can resolve this problem to a
good extent. Managing such a huge amount of data in a timely manner is possible with the aid
of machine learning algorithms and techniques. This is the reason behind the inception of
machine learning based cyber threat intelligence.
1.3 Scope
Threat intelligence helps accelerate threat detection, prioritization and incident response.
Cyber threat intelligence can help you maintain visibility of the threat landscape so that
your security infrastructure is able to respond to the latest threats in real time. This
includes detecting malicious activity already inside your network, analyzing it and helping
your security team understand the attackers' objectives.
1.4 Overview
In recent years, cyber threat intelligence has become an important supporting pillar of a
mature cyber security strategy. When applied well, threat intelligence can help security teams
defend against an ever more sophisticated threat landscape before, during and after an attack.
By studying adversaries and understanding their strategies and objectives, organizations can
build more effective, more refined and more robust cyber defences. Machine learning based
cyber threat intelligence helps in sorting through the false positives and the reams of
collected data, a task made harder by the ever more sophisticated tactics, techniques and
procedures (TTPs) employed by cybercriminals. To understand the correlation between cyber
attacks and to respond promptly, it is necessary to collect the related data, which helps in
constructing efficient cyber threat intelligence. A big challenge in collecting and analyzing
intelligence has always been scalability, and good intelligence also takes expertise to
develop. One possible solution to this problem of scalability and expertise is to use machine
learning in cyber threat intelligence. Selection of appropriate threat intelligence vendors
and services has also become a crucial issue; meeting this challenge requires a comprehensive
quality evaluation system that helps in recognizing the appropriate cyber threat intelligence
vendor. A number of machine learning based cyber threat intelligence tools are available that
contribute to preventing and identifying cyber attacks well in advance. This report looks at
such tools and at the machine learning algorithms behind them.
CHAPTER 2
LITERATURE SURVEY
A literature survey, or literature review, is the section of a project report that presents
the analyses and research already carried out in the field of interest and the results already
published, taking into account the parameters and the extent of the project. It is one of the
most important parts of the report because it gives direction to the research and helps to
set a goal for the analysis, thus leading to the problem statement.
[1] Design of a Cyber Threat Information Collection System for Cyber Attack
Correlation
It was published by Nakhyun Kim, Seulgi Lee, Hyeisun Cho, Byun-ik Kim and MoonSeog Jun in
the 2018 International Conference on Platform Technology and Service (PlatCon).
Nowadays, the number of cyber threats is increasing continuously, and attack techniques
are becoming increasingly advanced and intelligent. One important thing that should be noted
with regard to this situation is the marked increase in similar cyber incidents which use the
same IP, domain, and malicious code for one cyber attack. Therefore, it is essential to
understand the correlation between cyber attacks that occur due to the re-use of the same
attack infrastructure (IP, domain, malicious code, etc.) for different cyber attacks, in order to
detect and respond promptly to similar cyber attacks. To understand the correlation between
cyber attacks, it is necessary to collect the related data concerning the procedures and
techniques of cyber attacks. This paper proposes the design details of the cyber threat
information collection system according to such needs. The proposed system performs the
function of collecting the attack infrastructure data (IoCs) exploited for the cyber attack from
various open data sources (OSINT, Open Source INTelligence), and uses the collected data
as an input value to collect more data recursively. The relationship of the collected data can
also be collected, saved, and managed, so that the data can be used to analyze the collection
of cyber attacks. The proposed system has used a virtualization structure and distributed
processing technology to collect data stably from various collection channels.
[2] Optimal Machine Learning Algorithms for Cyber Threat Detection
It was published by Hafiz M Farooq and Naif M. Otaibi in the 2018 UKSim-AMSS 20th
International Conference on Modelling & Simulation.
With the exponential rise in cyber threats, organizations are now striving for better data
mining techniques to analyze the security logs received from their IT infrastructures and so
ensure effective and automated cyber threat detection. Machine Learning (ML) based analytics
for security machine data is the next emerging trend in cyber security, aimed at mining
security data to uncover advanced targeted cyber threat actors and at minimizing the
operational overhead of maintaining static correlation rules. However, selecting the optimal
machine learning algorithm for security log analytics remains an impediment to the success of
data science in cyber security because of the risk of a large number of false-positive
detections, especially in large-scale or global Security Operations Center (SOC)
environments. This creates a need for an efficient machine learning based cyber threat
detection model that is capable of minimizing false detection rates. In this paper, the
authors propose optimal machine learning algorithms together with their implementation
framework, based on analytical and empirical evaluation of the gathered results, using
various prediction, classification and forecasting algorithms.
[3] A Quality Evaluation Method for Cyber Threat Intelligence in User Perspective
It was published by Li Qiang, Jiang Zhengwei, Yang Zeming, Liu Baoxu, Wang Xin and Zhang
Yunan in the 2018 17th IEEE International Conference On Trust, Security And Privacy In
Computing And Communications / 12th IEEE International Conference On Big Data Science And
Engineering.
With the wide use of cyber threat intelligence, the influence of security threats and cyber
attacks has been mitigated and controlled to some degree. More and more users have accepted
the concept of threat intelligence and are trying to use it in routine security protection.
How to choose appropriate threat intelligence vendors and services has therefore become a
crucial issue. Present research on threat intelligence evaluation focuses mainly on one-sided
threat intelligence contents and approaches, and so lacks comprehensiveness and
effectiveness. To address this, the authors proposed a comprehensive evaluation architecture
of threat intelligence in user perspective, which evaluates threat intelligence services in
several dimensions with a quantitative index system. They also carried out typical
experiments on threat intelligence data feeds and on the comprehensive situation to verify
the feasibility of the proposed method. The results show that the proposed evaluation method
has a clear advantage in coverage and partition degree.
DEPT. OF CSE, NIEIT
Machine learning based Cyber Threat Intelligence 2018-2019
CHAPTER 3
SYSTEM ARCHITECTURE
A system architecture, or systems architecture, is the conceptual model that defines the
structure, behaviour, and other views of a system. It can consist of the system components
and sub-systems that work together to implement the overall system. The purpose of system
architecture activities is to define a comprehensive solution based on principles, concepts,
and properties that are logically related to and consistent with each other.
The cyber threat information collection system was designed and developed as a flexible and
scalable platform that collects data for cyber threat intelligence analysis and saves and
manages the large quantities of data collected. Fig. 3.1 represents the flow diagram of the
cyber threat information collection system.
The workflow in Fig. 3.2 represents how Machine Learning can be incorporated in cyber threat
intelligence. Machine Learning based security analytics is performed through an optimal
Machine Learning Analytical Workflow (MLAW), which ensures that the data is pre-processed
efficiently before a well-trained machine learning predictor or classifier is applied for
subsequent analytics. Such a workflow helps by reducing the huge volume of security events to
a few outlier events and by providing security analysts with potential indicators of
malicious activity to feed into cyber threat detection and hunting processes.
In the evaluation architecture, Categories indicate the three classes of threat intelligence
that can be provided: strategic, tactical and operational. Functions indicate the security
activities in which threat intelligence can be used, including early warning, in-process
detection and after-the-fact response. Properties refer to the characteristics used in quality
evaluation. Testing Methods are the measuring approaches used in threat intelligence
evaluation. Items are the contents tested in the evaluation from the user's perspective.
CHAPTER 4
METHODOLOGY
Nowadays, the number of cyber threats is increasing continuously, and attack techniques
are becoming increasingly advanced and intelligent.
4.1.1 Definition
Cyber Threat Intelligence is defined as:
• The set of data collected, assessed and applied regarding security threats, threat
actors, exploits, malware, vulnerabilities and compromise indicators.
• Development of capabilities to help identify attribution sources and take
appropriate forms of protection and counter-action.
4.1.2 Need for cyber threat intelligence
• In recent years, there has been a significant increase in the number and variety of
cyber attacks, which makes it extremely difficult for security analysts and forensic
investigators to detect and defend against such attacks in near real time.
• Cyber Threat Intelligence (CTI) emerged to help security practitioners recognize the
indicators of cyber attacks, extract information about the attack methods, and
consequently respond to the attack accurately and in a timely manner.
4.1.3 Challenges of cyber threat intelligence
Attack Vector Reconnaissance
• Recognizing the point of attacks and the system vulnerabilities that could
be exploited by the cybercriminals.
• Advancements in attack methods make the recognition of the attacker and
attack’s point of arrival an extremely challenging issue.
Similar cyber incidents that reuse the same IP, domain, and malicious code are on the rise.
To understand the correlation between cyber attacks and to respond promptly, it is necessary
to collect the related data concerning the procedures and techniques of cyber attacks.
The cyber threat information collection system was designed and developed to collect
data for cyber threat intelligence analysis. The proposed system is composed of the Total
Management Server (TMS) and an integrated collection agent (TCA).
Collection work is divided into direct data collection from the collection channel, and
retrieval work whereby more data are collected using the collected data as an input
value.
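The split between direct collection and retrieval can be illustrated with a short sketch: indicators gathered directly act as seeds, and each one is fed back in as an input value to find related indicators. This is only a minimal illustration. The `lookup_related` function, the `SAMPLE_RELATIONS` table and the `max_depth` limit are hypothetical stand-ins, not part of the system described here.

```python
from collections import deque

# Hypothetical stand-in for the retrieval channels: maps an indicator to
# the indicators that a lookup might return for it.
SAMPLE_RELATIONS = {
    "198.51.100.7": ["bad.example", "evil.example"],
    "bad.example": ["198.51.100.7", "d41d8cd98f00b204e9800998ecf8427e"],
    "evil.example": [],
    "d41d8cd98f00b204e9800998ecf8427e": ["203.0.113.9"],
}


def lookup_related(indicator):
    """Return indicators related to `indicator` (hypothetical retrieval step)."""
    return SAMPLE_RELATIONS.get(indicator, [])


def collect_recursively(seeds, max_depth=2):
    """Breadth-first expansion of directly collected seed indicators.

    Returns the set of collected indicators and the list of
    (source, related) relationships discovered along the way.
    """
    seen = set(seeds)
    relations = []
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        indicator, depth = queue.popleft()
        if depth >= max_depth:
            continue  # too deep: do not reuse this indicator as an input value
        for related in lookup_related(indicator):
            relations.append((indicator, related))
            if related not in seen:
                seen.add(related)
                queue.append((related, depth + 1))
    return seen, relations


indicators, relations = collect_recursively(["198.51.100.7"], max_depth=2)
```

Keeping the relationships alongside the indicators mirrors the idea that the links between collected items are saved as well as the items themselves. The depth limit is an assumption added here so that the expansion terminates.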
The TCA classifies work into Crawler, API Connect, Direct Input, and Shared
Storage, depending on the type of collection work.
The TCA parses formats such as HTML, JSON, XML, CSV, and TXT, depending on the format of
the collected data.
The TCA can respond to various collection channels by combining the collection
work type and collected data format.
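One way to picture the combination of work type and data format is a dispatch table that picks a parser by format once the work type has been validated. The sketch below uses only the Python standard library. The payload layouts (an `indicator` key, column or element) and all function names are assumptions made for illustration, not the agent's actual interfaces.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the non-empty text nodes of an HTML document."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


def parse_html(payload):
    extractor = _TextExtractor()
    extractor.feed(payload)
    extractor.close()
    return extractor.chunks


def parse_json(payload):
    # Assumes a JSON array of objects with an "indicator" key (hypothetical layout).
    return [item["indicator"] for item in json.loads(payload)]


def parse_xml(payload):
    # Assumes <indicator> elements anywhere in the tree (hypothetical layout).
    root = ET.fromstring(payload)
    return [el.text.strip() for el in root.iter("indicator") if el.text]


def parse_csv(payload):
    # Assumes a header row with an "indicator" column (hypothetical layout).
    return [row["indicator"] for row in csv.DictReader(io.StringIO(payload))]


def parse_txt(payload):
    # One indicator per line; blank lines and '#' comments are skipped.
    lines = (line.strip() for line in payload.splitlines())
    return [line for line in lines if line and not line.startswith("#")]


PARSERS = {
    "html": parse_html,
    "json": parse_json,
    "xml": parse_xml,
    "csv": parse_csv,
    "txt": parse_txt,
}
WORK_TYPES = {"crawler", "api_connect", "direct_input", "shared_storage"}


def handle_collection(work_type, data_format, payload):
    """Validate the work type, then parse the payload according to its format."""
    if work_type not in WORK_TYPES:
        raise ValueError(f"unknown work type: {work_type}")
    if data_format not in PARSERS:
        raise ValueError(f"unsupported format: {data_format}")
    return PARSERS[data_format](payload)
```

With this layout, supporting a new collection channel means choosing an existing work type and format pair, or registering one more parser in `PARSERS`.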
The cyber threat information collection system estimates the work quantity of the
TCA by managing the collection process, and decides whether distributed processing
will be performed or not, by comparing the threshold value of the TCA and the
processing speed.
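The distribution decision can be sketched as a comparison between the estimated time to finish the pending work and a threshold. The text does not define the threshold precisely, so the sketch below interprets it as the longest acceptable completion time for a single agent. That interpretation, and the names `agents_needed` and `needs_distribution`, are assumptions.

```python
import math


def agents_needed(pending_items, items_per_second, threshold_seconds):
    """Estimate how many agents are needed to finish within the threshold.

    pending_items     -- estimated work quantity waiting to be collected
    items_per_second  -- measured processing speed of one agent
    threshold_seconds -- assumed longest acceptable completion time
    """
    if items_per_second <= 0 or threshold_seconds <= 0:
        raise ValueError("speed and threshold must be positive")
    estimated_seconds = pending_items / items_per_second
    return max(1, math.ceil(estimated_seconds / threshold_seconds))


def needs_distribution(pending_items, items_per_second, threshold_seconds):
    """True when a single agent cannot finish the work within the threshold."""
    return agents_needed(pending_items, items_per_second, threshold_seconds) > 1
```

For example, 1000 pending items at 10 items per second take an estimated 100 seconds. Against a 60-second threshold the work would be spread over two agents.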
The resource use status of the TCA and the TMS is monitored to manage and operate
multiple virtual machines allocated to various collection channels.
Based on the analysis of the collected data types, an integrated data schema is
designed, and the collected data are saved based on the pertinent schema.
The TMS combines the collected data and saves them in the designated NoSQL (MongoDB)
database.
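As an illustration of an integrated schema, the sketch below defines one record type for every kind of indicator and merges records that describe the same indicator before they are stored. The field names, the lower-casing of values and the helper functions are hypothetical. The actual schema is not given in the text, and no database driver is used here; `to_document` only produces the plain dictionary that a document store would accept.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class IndicatorRecord:
    """Hypothetical unified record for one collected indicator."""
    kind: str                                   # e.g. "ip", "domain", "hash"
    value: str
    sources: set = field(default_factory=set)   # collection channels that reported it
    related: set = field(default_factory=set)   # values of related indicators


def merge_records(records):
    """Combine records that refer to the same (kind, value) pair."""
    merged = {}
    for rec in records:
        key = (rec.kind, rec.value.lower())  # assumption: values compare case-insensitively
        if key not in merged:
            merged[key] = IndicatorRecord(rec.kind, rec.value.lower())
        merged[key].sources |= rec.sources
        merged[key].related |= rec.related
    return list(merged.values())


def to_document(record):
    """Convert a record to a plain dict with sorted lists instead of sets."""
    doc = asdict(record)
    doc["sources"] = sorted(doc["sources"])
    doc["related"] = sorted(doc["related"])
    return doc
```

Merging on the indicator itself keeps one document per indicator while still recording every channel that reported it.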
module manages the function of distributing the TCA so that the TCA itself can be updated,
if modification is required due to a change in the TCA functions.
4.2.2 Results
A cyber threat information collection system was developed based on the proposed design. It
collected twelve types of cyber threat related information using eight collection channels
and four retrieval channels. About two million items of cyber attack-related data were
collected over a one-month data collection period.
So much cyber threat data arrives from so many sources that human detection of such cyber
security events would be extremely difficult, if not impossible.
Dealing with such a large number of attacks in a timely manner is not possible without
closely examining the attack features and taking corresponding defensive actions through
intelligence.
Organizations have therefore realized that traditional monitoring, complemented by effective
and versatile Machine Learning based Threat Hunting, will be a necessary part of any Security
Monitoring portfolio.
Analyzing internet upload and download traffic is important for the initial detection of
cyber attacks. In larger enterprises, conventional statistical tools are insufficient to detect
abnormal network sessions because of the high volume of network traffic. Because the dataset
is numerical, numerical clustering is used to separate normal from abnormal network traffic.
K-Means can be used to cluster enterprise users based on their download and upload rates.
K-Means analysis clusters the users' traffic data from the enterprise internet gateway
(firewall) into three clusters (k=3), thus identifying low, medium and highly active internet
users.
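The clustering step described above can be sketched in plain Python as Lloyd's algorithm on (download, upload) pairs with k=3. This is a minimal illustration and not the implementation used in the cited work. The farthest-point initialisation, the function names and the sample traffic figures are all assumptions chosen to keep the example deterministic.

```python
import math


def kmeans(points, k=3, max_iter=100):
    """Lloyd's algorithm on 2-D points; returns (centroids, labels)."""
    if len(points) < k:
        raise ValueError("need at least k points")
    # Farthest-point initialisation: deterministic and spreads the seeds out.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[i])  # empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, labels


def rank_clusters(centroids):
    """Name clusters low/medium/high by the total traffic of their centroid (k=3)."""
    order = sorted(range(len(centroids)), key=lambda i: sum(centroids[i]))
    return {cluster: name for cluster, name in zip(order, ("low", "medium", "high"))}


# Invented (download, upload) volumes per user, for illustration only.
traffic = [(10, 2), (12, 3), (11, 1), (200, 40), (210, 35), (190, 45),
           (900, 300), (950, 280), (880, 320)]
centroids, labels = kmeans(traffic, k=3)
activity = [rank_clusters(centroids)[label] for label in labels]
```

On real gateway logs the features would normally be scaled first, and the initial centroids are usually chosen at random over several restarts. Both refinements are left out here for brevity.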
Algorithm:
Ex-filtration quadrant:
Clusters positioned in quadrants II and IV pose a relatively major risk of data ex-filtration
and are subject to investigation for any suspected actions.
Algorithm:
Result:
4.4.2 Experiments
1) Testing Content
Quality evaluation of a threat intelligence vendor in user perspective mainly focuses on
price, function, performance & quality, service, reputation & qualification and other content.
Values in the score matrix were provided by experts based on basic item test results,
quantitative approaches and experience. The weights of the second-class and third-class
indices were obtained through objective evaluation, subjective analysis and optimization.
The experts' weights were determined by the maximum deviation method. Finally, from the expert
scoring results, the weights of the testing indices and the experts' weights, the score of
each testing item for each vendor can be determined.
2) Testing Process and Result
Intelligence sources and gathering channels include public information from the Internet,
communication with vendors, basic item test methods and quantitative approaches. Drawing on
these channels and on personal experience, experts provide the values of the third-class and
second-class indices for each vendor. After the third-class index values are merged into the
second-class indices and normalized, the scoring matrix is built.
In the experiment, the services of three threat intelligence vendors were evaluated by four
experts. Each row of the matrix is the evaluation result for one vendor. Each column gives the
value of one evaluation index after standardization.
Eq. (4.1)
Eq. (4.2)
Eq. (4.3)
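The way expert scores, index weights and expert weights combine into one score per vendor can be sketched as a weighted sum. This is an assumed linear aggregation for illustration only: it does not reproduce Eq. (4.1) to Eq. (4.3), and the weights are passed in as given rather than derived by the maximum deviation method. The function name and the sample numbers are hypothetical.

```python
def vendor_scores(expert_matrices, index_weights, expert_weights):
    """Combine per-expert score matrices into one score per vendor.

    expert_matrices[e][v][j] is expert e's normalised score for vendor v on
    evaluation index j. Both weight vectors are expected to sum to 1.
    """
    if len(expert_matrices) != len(expert_weights):
        raise ValueError("one weight per expert is required")
    n_vendors = len(expert_matrices[0])
    totals = [0.0] * n_vendors
    for matrix, expert_weight in zip(expert_matrices, expert_weights):
        if len(matrix) != n_vendors:
            raise ValueError("every expert must score every vendor")
        for v, row in enumerate(matrix):
            if len(row) != len(index_weights):
                raise ValueError("one score per evaluation index is required")
            # Weighted sum over indices, then weighted by the expert's weight.
            totals[v] += expert_weight * sum(w * s for w, s in zip(index_weights, row))
    return totals


# Invented example: two experts, two vendors, two indices.
scores = vendor_scores(
    expert_matrices=[
        [[1.0, 0.0], [0.5, 0.5]],   # expert 1: rows are vendors, columns are indices
        [[0.0, 1.0], [0.5, 0.5]],   # expert 2
    ],
    index_weights=[0.75, 0.25],
    expert_weights=[0.25, 0.75],
)
```

Each inner matrix follows the layout described above, with one row per vendor and one column per standardized index. The vendor with the highest total would be ranked first under this assumed scheme.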
Fig. 4.10 Getting real-time insights in Domain Audit dashboard from Spinbackup
machine learning
CONCLUSION
The use of Machine Learning analytics in CTI, together with analysis of the optimal algorithms
for common cyber threat cases, will enhance cyber security monitoring. Machine Learning
analytics are best suited to analyzing huge volumes of security events and feeding deviations
from normal baselines into proactive threat hunting processes as indicators or leads of
potential malicious activity. Machine learning algorithms tend to give more accurate results
when the amount of data fed to the system is large. The comprehensive evaluation architecture
of threat intelligence in user perspective, which takes several dimensions into account at
the same time, provides a reference for users selecting appropriate services. Machine learning
based cyber threat intelligence tools are efficient and contribute substantially to the
security of an organization.
FUTURE ENHANCEMENT
• Twelve types of cyber threat related information were collected using eight collection
channels and four retrieval channels based on the proposed system. About two million items
of cyber attack-related data were collected over a one-month data collection period. As
part of future work, the number of collection channels can be increased on a continuous
basis.
• Semi-supervised (one-class classification) algorithms such as One-Class SVM (OCSVM) are
relatively easy to train and more cost effective, and may be better suited to enabling SOC
analysts to perform novelty detection and uncover new indicators of compromise (IOCs).
• For future work, to assess the effect of the evaluation method, four criteria can be taken
into account: coverage, difficulty of acquirement, accuracy and partition degree. Coverage
indicates the completeness of indexes and properties. Difficulty of acquirement means the
feasibility of acquiring the index value by a quantitative or qualitative method. Accuracy
refers to the difference between the evaluation result of indexes at each level and the
true situation. Partition degree reflects the differences among various evaluation systems
and methods, and is the sign of the evaluation effect.
REFERENCES
[1] Nakhyun Kim, Seulgi Lee, Hyeisun Cho, Byun-ik Kim, MoonSeog Jun, “Design of a Cyber
Threat Information Collection System for Cyber Attack Correlation”, 2018 International
Conference on Platform Technology and Service (PlatCon).
[2] Hafiz M Farooq, Naif M. Otaibi, “Optimal Machine Learning Algorithms for Cyber Threat
Detection”, 2018 UKSim-AMSS 20th International Conference on Modelling & Simulation.
[3] Li Qiang, Jiang Zhengwei, Yang Zeming, Liu Baoxu, Wang Xin, Zhang Yunan, “A Quality
Evaluation Method for Cyber Threat Intelligence in User Perspective”, 2018 17th IEEE
International Conference On Trust, Security And Privacy In Computing And Communications.
[4] Mauro Conti, Ali Dehghantanha, and Tooska Dargahi, “Cyber Threat Intelligence:
Challenges and Opportunities”, 2018, University of Padua, Italy.
[5] www.spinbackup.com
[6] www.gsuite.google.com