its customers a week after it realized there was an exfiltration event [27]. About 77 million user accounts were affected, possibly making this the largest credit card information leak incident ever [154].
Public departments are also valuable targets. The voter data leak in 2016 exposed 55 million Filipino voters’
fingerprints and passport information [59]. In the Office of Personnel Management (OPM) hack, 21.5 million
federal employees’ background information, including their names, addresses, social security numbers, and
5.6 million fingerprints were leaked [72]. The hacker group leveraged a compromised contractor’s credentials
to access OPM's internal network and exfiltrate valuable data. The OPM's reaction was significantly delayed; one article suggested that the hackers might have been stealing data for more than a year before OPM finally discovered the breach through a third-party company's disclosure [67].
Exfiltration events can also be launched by government agencies [85]. The Yahoo breach, one of the largest data
breach events so far, was carried out by hackers believed to be aligned with the Russian state security service [218].
Through phishing emails, these hackers successfully obtained valid credentials for the user database and details
regarding the account management tool. The database contained names, phone numbers, password challenge
questions/answers. It also stored password recovery emails and a cryptographic value unique to each account,
which later allowed the hackers to access their target victims including an assistant to the deputy chairman of
Russia, an officer in Russia’s Ministry of Internal Affairs, a trainer working in Russia’s Ministry of Sports, some
Russian journalists, and some U.S. government workers [218]. Yahoo! estimated that all of its user accounts,
roughly 3 billion, were affected by this event [201], which thus made it one of the largest events ever, in terms of
number of people/accounts affected.
In addition to user claims, companies subject to exfiltration events usually have to pay fines, settlements, and penalties relating to the 'poor handling' of cyber threats. In 2018, Yahoo was fined $35 million by the U.S. Securities and Exchange Commission (SEC), and the class action lawsuit penalty cost around $50 million. In two more recent financial company breach events, the Equifax breach (losing 150 million user records) and the Capital One breach (affecting 100 million users), Equifax agreed to pay $575 million in a settlement with the Federal Trade Commission and the Consumer Financial Protection Bureau (CFPB), whereas Capital One was fined $80 million by the Office of the Comptroller of the Currency [202].
The 2014 McAfee/Center for Strategic and International Studies report calculated that the total annual cost of cybercrime was around $400 billion, with data exfiltration as the main motivator for these attacks [127]. In recent years, cyber breach objectives have gradually shifted toward delivering/installing ransomware (which not only undermines information confidentiality, as in regular exfiltration events, but also affects system availability). Data exfiltration has consequently become a major component of ransomware attacks, where adversaries leverage the fear of sensitive data disclosure or destruction to demand a ransom [146]. The use of ransomware that leads to exfiltration threats may create much greater costs than simply losing access to proprietary data. The latest CrowdStrike global threat report revealed that some adversaries even set up marketplaces to advertise and sell potential victims' sensitive data [49].
While there have been many technical approaches to battling exfiltration threats, an earlier report (the SANS 2016 security analytics survey [178]) indicated that many organizations still rely on inadequate security, with the following problems highlighted:
• Corporations are short of skilled professionals, funding, and resources to support security analytics.
• Organizations are still having trouble baselining 'normal' behavior in their environments, a baseline necessary to accurately detect, inspect, and block anomalous behaviors.
• Only 4% of respondents consider their analytics capabilities fully automated.
• Just 22% of respondents are currently using tools that incorporate machine learning (ML), where ML offers
more insights that could help less skilled analysts with faster detection, automatic reuse of patterns detected,
and more.
The 2020 SANS Network Visibility and Threat Detection Survey [159] further reported that while conventional
rule-based and signature-based methods have been utilized in most organizations’ networks/hosts, of the
participating organizations:
• 59% still believe that lack of network visibility poses a high or very high risk to their operations.
• 64% of respondents experienced at least one compromise over the past 12 months.
The situation has not improved in recent years [49], as there is a continuing lack of skilled professionals. In fact, as corporations moved their critical assets, including sensitive data, to the cloud, protecting against exfiltration threats became even more complicated, because cloud-based assets created an additional attack surface. Organizations thus had to deal with problems arising from having too many people potentially able to access sensitive data in their cloud data repositories. Insufficient human resources dedicated to cybersecurity, combined with increasing system complexity, likely explain why insider exfiltration threat has become the second most common cloud threat [126].
Industry reports have revealed socio-technical issues that limit the effectiveness of defense perimeters in
combating exfiltration threats. In other words, a significant source of the challenge in tackling cybercrime and
data exfiltration is the complexity of the information to be analyzed by human actors. Thus in the remainder of
this survey, we review current technologies in place to defend against exfiltration incidents, set in the broader view of approaches being applied in industry, in order to reveal potential issues when considering socio-technical relationships between organizations, humans, and machines.
Table 1. Comparison between the current survey and major previous surveys on relevant topics in the past decade
Topics Covered [177] [208] [116] [8] [70] [163] [95] [26] [11] [65] This Survey
Adversary Types and Characteristics x x x x x x x x x
Attack Vectors and Campaigns x x x x x x
Threat Models and Frameworks x x x
Countermeasures x x x x x x
Countermeasure Limitations x x x x x x x
Countermeasure Human Factors x x x
ML Solutions x x x x x x x x
ML Limitations x x x
Human Role in Expert-ML Systems x
Table 2. Research questions as the foundation of this survey
these defensive countermeasures against exfiltration threats. We then review the limitations of these approaches,
focusing in particular on the human tasks that can be difficult for domain experts.
usually conducted by a legitimate user. This type of threat involves unintentional violation of norms or policies
[80, 197] and is usually detectable with customized DLP (Data Loss Prevention) systems that follow organization
policies. By contrast, adversarial threats usually come from external sources and may be carried out persistently
and covertly (and be harder to detect as a result) if the attackers have sufficient resources.
Malicious external adversaries who have established a foothold inside the perimeter are usually referred to as
masqueraders [135]. Establishing this foothold typically requires a sequence of activities [116], with a common
attack campaign involving three stages: research, attack, and exfiltration [208]. In the research stage, sometimes
referred to as the enumeration stage, attackers can leverage OSINT (Open-Source INTelligence) to search for
public-facing domains and potential disclosure of internal information. They can also choose more aggressive
approaches such as port scanning or web vulnerability scanning in order to discover unpatched vulnerabilities or bad code and misconfigured settings on public-facing servers. Attackers can then exploit discovered vulnerabilities such as local/remote file inclusion (LFI/RFI), SQL injection, insecure direct object references (IDOR), cross-site request forgery (CSRF), etc., to gain remote code execution, hijack user sessions, or obtain user credentials that may later yield remote access. The whole attack campaign may eventually lead to the exfiltration of sensitive data.
In addition, masqueraders with abundant resources, e.g., those funded by hostile state entities, may carry out more sophisticated attack campaigns targeting enterprise or government networks and are more capable of maintaining a C2 (Command and Control) channel. Such long-term threats posed by well-resourced adversaries are
typically referred to as APTs (Advanced Persistent Threats) [38].
Regardless of the TTPs (tactics, techniques, and procedures) and the sophistication of the attack campaigns that external adversaries employ in order to gain access to the internal network, they eventually impersonate internal users [165]. This often leads to a "shared" user account that is effectively owned by both the original valid user and the new malicious user, who will misuse the account credentials from time to time. Thus, defending against
exfiltration at this stage may require focusing on behavioral changes of internal users, since significant changes
in a user’s behavior may be due to the actions of malicious attackers who have captured, or are sharing, the user
account.
Since data exfiltration threats arise not only from external actors, we also consider internal actors in this review.
Internal actors may pose even greater threats to data security, with industry reports suggesting that internal
threats are increasingly serious. The proportion of exfiltration threats conducted by internal actors increased
from 17% in 2011 to 30% in 2020 [14, 211]. Internal actors may have legitimate, authorized access to an organization's internal computer systems, data, or networks, but when they act maliciously (i.e., when their actions run counter to policy or codes of conduct) they are referred to as traitors [73, 148]. In the context of data exfiltration, the
goal of these “traitors” is to “negatively affect confidentiality, integrity, or availability of some information asset”
[165] for a variety of incentives such as revenge, monetary reward, hacktivism, etc.
Most traitors depend on four main enabling resources: access to the system; the ability to represent the organization; knowledge of the system/network; and the trust of the organization [89]. Traitors can have a variety of roles
such as employees, contractors or consultants, clients or customers, joint venture partners, and vendors. However,
external actors may also recruit, or collaborate with, trusted internal personnel and thus create an insider threat
by allying with an internal user [139].
Traitors, as well as masqueraders who have successfully obtained valid credentials and sufficient knowledge,
share the following properties:
• They have access to the system
• They can represent the organization
• They have knowledge about the internal workings of the system they have infiltrated
In principle, insiders, whether traitors or masqueraders, should behave differently from other users as they
prepare a data exfiltration exploit [42, 70, 83]. Thus, the kind of analysis needed to defend inside the perimeter will
mainly depend on differentiating normal from abnormal behavior. Previous work on data exfiltration has relied
on anomalous behavior detection, often using statistical and machine learning techniques [111, 134]. However,
algorithms that seek to detect anomalies typically do not have access to the implicit human knowledge that
can recognize subtle differences between normal and abnormal behavior. It has proven difficult to detect malicious behavior accurately without generating large numbers of false alarms (false detections), because behavior tends to differ across adversaries, who have different motivations, resources, and preferred methods. Thus, in the following sections, we will treat actors with similar data exfiltration motivations as insiders, regardless of whether they were originally inside the network (traitors) or not (masqueraders).
Aspect / Definitions

General:
• A structured way to secure software design by understanding an adversary's goal in attacking a system based on the system's assets of interest [20, 200]
• Threat modeling is the process of enumerating and risk-rating malicious agents, their attacks, and those attacks' possible impacts on a system's assets [196]
• A sound analysis of potential attacks or threats in various contexts [209]

System Evaluation:
• A conceptual exercise to analyze a system's architecture or design to find security flaws and reduce architectural risk [152]
• The process of analyzing system architecture, identifying potential security threats, and selecting appropriate mitigation techniques [66, 223]
• A systematic way to identify threats that might compromise security [122]

Application Development:
• A process to analyze the security and vulnerabilities of an application or network services [51, 185]
implemented by industry [198], we review three in the remainder of this subsection, focusing on their ability to
identify potentially useful exfiltration countermeasures.
4.1.1 Microsoft STRIDE Framework. One of the earliest cybersecurity frameworks is the Microsoft STRIDE
security framework [103]. The STRIDE framework uses a 2-step approach to evaluate detailed system design in
terms of security [183]. In step one, analysts should build a data flow diagram (DFD) to identify assets, dataflow,
and the boundary of a network system in place. There are two major variants of using STRIDE [101] in this step:
• STRIDE per element [184] recommended highlighting the elements such as the external entity, the process,
the flow, and the DFD data in terms of their behavior and operations
• STRIDE per interaction [96] suggested considering elements' origins, destinations, and interactions (which can better capture threats that are only visible in interactions between systems)
Next, in step 2, an analyst should determine the potential threat category of an entity from the general threat categories after which STRIDE is named (a toy sketch of this step follows the list below). The STRIDE general threat categories are as follows [84]:
• Spoofing identity (Confidentiality/Integrity at risk)
• Tampering data (Integrity at risk)
• Repudiation (Integrity at risk)
• Information disclosure (Confidentiality at risk)
• Denial of service (Availability at risk)
• Elevation of privilege (Confidentiality/Integrity at risk)
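As a rough illustration of this categorization step, the sketch below enumerates, for each element of a toy DFD, which STRIDE categories an analyst would need to assess. The element types and the per-element category mapping are simplified assumptions for illustration, not Microsoft's canonical tables.

```python
# A toy sketch of STRIDE-per-element categorization. The mapping below
# is a simplified assumption, not Microsoft's canonical table.
STRIDE_PER_ELEMENT = {
    "external_entity": ["Spoofing", "Repudiation"],
    "process": ["Spoofing", "Tampering", "Repudiation", "Information disclosure",
                "Denial of service", "Elevation of privilege"],
    "data_flow": ["Tampering", "Information disclosure", "Denial of service"],
    "data_store": ["Tampering", "Repudiation", "Information disclosure",
                   "Denial of service"],
}

def enumerate_threats(dfd_elements):
    """Yield (element, STRIDE category) pairs an analyst must assess."""
    for name, element_type in dfd_elements:
        for category in STRIDE_PER_ELEMENT.get(element_type, []):
            yield name, category

# A toy DFD: a browser talking to a web process that writes to a user database.
dfd = [("browser", "external_entity"), ("web_app", "process"),
       ("login_request", "data_flow"), ("user_db", "data_store")]

for element, threat in enumerate_threats(dfd):
    print(f"{element}: assess {threat}")
```

Even this toy example makes the scaling problem visible: every new DFD element multiplies the threats to assess, which is why analysis time grows with organizational complexity.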
Using the STRIDE framework can be time consuming [184]. STRIDE uses the DFD to visualize every asset of an organization's network system. As the scale and complexity of the organization increase, the total number of assets to be analyzed tends to grow exponentially. One study [170] suggested that it would be difficult to detect more than about two threats per hour of analysis. Another problem, found by Scandariato et al., was that STRIDE leads to a roughly 25% false positive rate, with around a 65% chance of missing a threat.
Notwithstanding the problems noted in the previous paragraph, STRIDE is relatively easy for organizations to adopt [183], and it is effective in identifying known threats [217]. Several studies have suggested that combining STRIDE with other approaches, for instance, scores from the CWE (Common Weakness Enumeration) and CVE (Common Vulnerabilities and Exposures) databases [84], or combining STRIDE with NIST standards [123], can improve overall performance in terms of threat detectability and efficiency.
In general, the STRIDE framework provides organizations with a structure for element identification and threat modeling. This defensive framework should improve all-around security, but for large organizations the use of STRIDE can be time consuming. STRIDE also does not explicitly list approaches that can protect against particular threats. Thus, other frameworks that offer more granularity regarding the attack techniques used in exfiltration threats also need to be considered.
4.1.2 Cyber Kill Chain. One of the most well-recognized threat models in industry is the cyber kill chain,
which focuses on the offensive process. The cyber kill chain represents attack vectors as a sequence of stages,
from scouting for information to the final action on objectives, in seven phases [90, 104]: Reconnaissance;
weaponization; delivery; exploitation; installation; command and control (C2); actions on objectives.
There are different ways to implement the cyber kill chain concept. For instance, the diamond model was proposed to support "feature" exploration in each stage of the cyber kill chain [31]; it depicts the core features of an intrusion (an adversary deploying a capability over some infrastructure against a victim).
By pivoting through each stage and the core features, analysts can better identify the fundamental relationships between attack vectors and the defensive approaches that protect against them. Those relationships can also help identify countermeasures that are potentially useful at each stage of an attack campaign; for example, Table 4 shows approaches that may be useful in defending against exfiltration campaigns, including the stages involved and their action definitions.
4.1.3 MITRE ATT&CK Framework. The MITRE ATT&CK Framework for Enterprise aligns with the cyber kill
chain model, while updating it with adversary techniques as they are developed and become available [108, 199].
It evolved from the cyber kill chain, focusing on possible tactics in and after the delivery stage, as shown in
Figure 2.
Figure 2. The relationship between MITRE ATT&CK tactics and the cyber kill chain
The MITRE ATT&CK framework focuses on the TTPs (tactics, techniques, and procedures) of adversaries,
where “a tactic is a behavior that supports a strategic goal; a technique is a possible method of executing a tactic.
Each technique has a description explaining what the technique is, how it may be executed, when it may be used,
and various procedures for performing it” [6].
Given an understanding of the whole chain of attack vectors that constitute a threat, one can predict future actions along the attack chain and develop strategies to deal with them. In the present context of data exfiltration threats, the possible tactics are listed as follows [130] (a heuristic detection sketch for one of these techniques follows the list):
• Automated Exfiltration
– Traffic Duplication
• Data Transfer Size Limits
• Exfiltration Over Alternative Protocol
– Exfiltration Over Symmetric Encrypted Non-C2 Protocol
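As a hedged illustration of how one of these techniques might be detected, the sketch below implements a simple heuristic for the "Data Transfer Size Limits" technique: hosts that emit many outbound transfers of near-identical size are flagged. The thresholds and log format are assumptions for illustration, not a MITRE-provided detection rule.

```python
from collections import Counter

# Illustrative heuristic, not a MITRE-provided rule: adversaries applying
# "Data Transfer Size Limits" often emit many outbound transfers of
# near-identical size. Flag hosts that show this pattern.
def flag_uniform_transfers(transfers, min_count=20, tolerance=64):
    """transfers: iterable of (src_host, bytes_sent); returns suspicious hosts."""
    buckets = Counter()
    for host, size in transfers:
        buckets[(host, size // tolerance)] += 1  # group near-equal sizes
    return {host for (host, _), n in buckets.items() if n >= min_count}

# Toy log: host "ws-42" sends 25 chunks of roughly 1 MB each.
log = [("ws-42", 1_048_576 + i % 10) for i in range(25)]
log += [("ws-07", s) for s in (300, 42_000, 5_120)]
print(flag_uniform_transfers(log))  # {'ws-42'}
```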
Table 5. Common countermeasures against exfiltration and their functions, traits, and limitations

Perimeter Defense (passive; operates based on predefined rules or signatures):
• Firewall: blocks requests based on predefined rules/policies
• (Network) Intrusion Detection: detects unwanted traffic based on pre-stored signatures
• Access Control: blocks/grants access based on policies, roles, or attributes

Data Protection (passive proactive; provides supporting evidence but requires further alerting functions):
• Encryption: protects against data leakage for data at rest and in motion
• Data Provenance: provides evidence of data modifications and transfers
• Honeytoken: triggers alerts on data modifications and transfers

Alerting and Monitoring (proactive; constantly monitoring but can trigger a high volume of false alarms):
• (Host/Network) Intrusion Prevention: detects unwanted traffic/activity and sends out alerts
• Endpoint Protection: monitors normal/anomalous behavior on endpoints
• Data Loss Prevention: prevents unwanted traffic/processes/behavior in the intranet
campaign and to "hunt threats". The whole process is human-centered to a large extent, but scant research has studied the importance of this critical human component of human-machine security systems. Thus, in the remainder of this section, we survey studies concerning our proposed research questions 1 and 2. We review the studies and technologies proposed and implemented in detail, and we introduce problems relating to the unacknowledged human component (in human-machine systems), such as those that arise when domain experts operate, or consume information from, these technologies.
4.2.1 Firewall. Network firewalls form the outer layer of perimeter defense between the untrusted internet and the trusted intranet, or between local network segments [91, 181]. These firewalls restrict network traffic by accepting, denying, or dropping/resetting requests, and thus significantly reduce the number of potentially malicious packets passed into the organizational intranet. However, since firewalls are only effective when their rules are properly configured [219], and the rules are usually set to block known bad traffic, network firewalls are not fully effective at handling human-executed, novel exfiltration threats.
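A minimal sketch of the first-match-wins rule evaluation described above appears below; the rule fields and addresses are invented for illustration. It also shows why novel traffic is handled only by the default action: anything not matching a predefined rule falls through.

```python
import ipaddress

# Invented rules for illustration; first matching rule wins.
RULES = [
    {"proto": "tcp", "dst_port": 22,  "src": "10.0.0.0/8", "action": "accept"},
    {"proto": "tcp", "dst_port": 22,  "src": "any",        "action": "drop"},
    {"proto": "tcp", "dst_port": 443, "src": "any",        "action": "accept"},
]
DEFAULT_ACTION = "drop"  # default-deny policy

def match(rule, pkt):
    if rule["proto"] != pkt["proto"] or rule["dst_port"] != pkt["dst_port"]:
        return False
    if rule["src"] == "any":
        return True
    return ipaddress.ip_address(pkt["src"]) in ipaddress.ip_network(rule["src"])

def decide(pkt):
    for rule in RULES:  # evaluate in order; first match wins
        if match(rule, pkt):
            return rule["action"]
    return DEFAULT_ACTION  # novel traffic falls through to the default

print(decide({"proto": "tcp", "dst_port": 22, "src": "10.1.2.3"}))  # accept
print(decide({"proto": "udp", "dst_port": 53, "src": "8.8.8.8"}))   # drop (default)
```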
In addition to network and host firewalls, web application firewalls (WAFs) are crucial for protecting web servers [44]. Web servers are usually public facing to fulfill required business functions. They are consequently more vulnerable because they provide many opportunities for attack. As a result, web-based attacks such as SQL injection or cross-site scripting (XSS) are very common in modern computing environments [9]. A well-configured WAF may block web requests based on context and/or sanitize user input in the spirit of zero trust, so as to protect web servers from malicious attempts [206]. WAFs can also provide compensating controls when a major web server update cannot be deployed even though critical vulnerabilities have been published. Unfortunately, WAFs have issues similar to other types of firewalls, because they all need preset rules or policies, making them less resilient.
Researchers have suggested using interactive approaches to make setting up or reconfiguring firewalls at the personal network level more usable [176]. By creating an additional interface, either visual or auditory, between firewalls and users, these tools help improve users' efficiency. However, interactive interfaces may sacrifice technical details, especially for personal use, sometimes undermining human-technology system performance [155, 156]. At an organizational level, while experts are willing to handle complex security information and capable of doing so, it is much more difficult to configure or update multiple sets of firewall rules. Thus, interactive tools (e.g., supporting visualizations) are needed to manage complex system configurations [112, 121].
With recent advances in ML implementations, backend policy configuration and rule updating have improved significantly. ML may help reduce errors caused by misconfiguration, increase packet-dropping accuracy, and, most importantly, reduce expert workload [3, 207]. Automatic models work well with human experts in this case, since detecting anomalous rules and inspecting massive numbers of packet attributes do not involve complex human behavior detection.
Experts may use firewall logs as an initial step in forensic investigation as well as in threat hunting. Exfiltration threats, and associated malicious activities, may arise from disgruntled users who have legitimate account privileges, and whose exfiltration activity may only be detected when they attempt to transfer data out of the protected network. When data is exfiltrated, the firewall is the final opportunity to detect outgoing sensitive data. However, detecting such activities with firewalls at the perimeter may be too late. For this reason, access controls are typically used in combination with firewalls, and are configured to prevent both unwanted external users and insiders from reaching protected zones.
4.2.2 Access Control. In contrast to firewalls that control network traffic, access control systems limit user access
to protected files, databases, or network zones. Starting with the early development of the access matrix [109, 172],
various types of access control models have been proposed, with four models currently dominant in industry.
Initially, there were two major control strategies: discretionary access control (DAC) and mandatory access
control (MAC). DACs use access control lists (ACLs) to manage whether a user should be assigned access (and
define what operations can be made such as read and/or write privilege) to the requested resources [167, 169],
based on their identities registered on the system.
While DACs are simple to configure and support timely updates to fulfill business needs, they are often vulnerable to impersonation or to certain types of malware such as RATs (remote access trojans) [56]; since all DAC restrictions are based on identities, DACs will not be effective when someone impersonates another user. In addition, users may hold multiple identities across systems and request resources under each of them, making central management extremely difficult.
In contrast, MACs use labels to manage groups of resources (i.e., confidential, secret, top secret), so that only the subset of users who have matching labels (clearances) can access them. By forming a "lattice-based" control method, MACs are strongly enforceable and easier to manage centrally [140, 166]. However, if resources must be shared between groups, the highly restricted environment controlled by MACs may not be suitable. In addition, since labels are assigned to both users and resources, it may be costly to set up a central management center.
Both DACs and MACs fail to satisfy the needs of industry practitioners [93]. Due to the defects listed above, role-based access control (RBAC) systems were developed, gradually becoming the dominant access control strategy. RBACs use organizational roles as the main basis for defining user privileges [63, 168]. Based on the organizational chart, roles can easily be assigned and reassigned to a user, and only when needed, leading to a guarantee of 'least privilege' at all times [64].
Since RBACs manage roles only (instead of both resource and user identities as is done with systems like
MACs), the management cost can be significantly lower. However, in large multinational corporations with many
Table 6. A summary of the advantages and disadvantages of different types of access control models

DAC (owner-controlled)
Advantages:
• Simple configuration through ACLs
• Current-task oriented
• Supports timely updates
Disadvantages:
• A user may have excessive ACL settings
• Vulnerable to impersonation
• Difficult to control centrally

MAC (lattice-based)
Advantages:
• Centrally manageable (object and subject labels)
• Stronger enforceability
• Single configuration for a group of users
Disadvantages:
• Prone to assigning over- or under-privilege
• Less flexible when group-wise collaboration is needed
• Centralized management cost

RBAC (hierarchical)
Advantages:
• Centrally manageable (user roles)
• Least privilege yields better security
• Easier to manage user roles than item labels (better flexibility)
Disadvantages:
• Large organizations may have complex employee structures, reducing the manageability of user role assignment
• Multiple roles and access granted to one user may lead to over-privilege

ABAC (granular and scalable)
Advantages:
• Centrally manageable (user attributes)
• Dynamic and task-oriented
• Highly scalable
Disadvantages:
• Difficult to define and manage attributes at the beginning
thousands of employees, the disadvantages of RBAC became apparent. Business roles in very large organizations
are complex and the business hierarchy may be unclear, increasing the complexity of managing roles, and
increasing the chance of assigning undesirable levels of privilege to users with multiple roles.
To address the failings of other access control models, a more granular attribute-based access control (ABAC) model was proposed [87, 143, 173]. ABACs rely on a top-down, uniformly controlled framework that defines every aspect of "everything" [133]. Attributes can include the sensitivity of a resource, the identity and context of a user, or even environmental factors, as long as they can be further defined and applied as policies. If DAC, MAC, and RBAC each represent a type of filter that can screen and remove requests based on its unique filter category, ABAC contains a great number of filters including, but not limited to, these three categories.
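As a minimal sketch of how an ABAC decision point might combine subject, resource, and environment attributes, consider the following; the attribute names and policy predicates are hypothetical, not drawn from any particular product or the cited work.

```python
# Illustrative ABAC policy decision point; attribute names and policy
# predicates are hypothetical.
POLICIES = [
    # Each policy is a predicate over (subject, resource, environment).
    lambda s, r, e: (r["sensitivity"] == "confidential"
                     and s["clearance"] in ("secret", "top_secret")
                     and e["network"] == "intranet"),
    lambda s, r, e: r["sensitivity"] == "public",
]

def permit(subject, resource, environment):
    """Grant access iff any policy predicate is satisfied."""
    return any(p(subject, resource, environment) for p in POLICIES)

alice = {"id": "alice", "clearance": "secret", "department": "finance"}
report = {"name": "q3_report", "sensitivity": "confidential"}
ctx = {"time": "business_hours", "network": "intranet"}
print(permit(alice, report, ctx))  # True; would be False from an untrusted network
```

The design point the sketch illustrates is that each request is evaluated against attributes of the user, the resource, and the environment together, rather than against a single identity, label, or role.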
When constructed well, ABAC can be applied more easily and securely [93], with only marginal cost for adding instances or attributes. A summary of the advantages and disadvantages of all four types of current access control models is presented in Table 6.
Maintaining a complex attribute framework and dynamically reassigning access may be as difficult as maintaining complex, distributed firewall rules. However, ABAC systems hold a great deal of data regarding user attributes that could be extremely useful for detecting unusual behavior by cross-referencing attributes [4], forming a strong basis for detecting insiders using ML.
4.2.3 Intrusion Detection Systems. While rule-based systems can detect malicious packets based on content inspection, current approaches typically carry out that detection using network intrusion detection systems (IDSs). Network IDSs look for signature matches in web requests, emails, and other packets to detect malicious payloads that sneak through rule-based defenses [5, 45, 107, 220]. However, signature-based detection relies on a pre-existing database of known attack signatures. Since signature-based approaches are not able to detect novel threats, anomaly-based IDSs were proposed [229].
Anomaly-based IDSs perform content inspection not only by looking for signature matches but also by comparing the current profile with predefined "normal" profiles [68, 214]. An IDS then produces a numeric score, usually between 1 and 100, representing how anomalous a profile is (the higher the score, the less secure the system) [132]. In this way, anomaly-based approaches are more capable of handling novel attacks in real time. However, anomaly-based IDSs also have significant drawbacks. As shown in Figure 3, it may be difficult to match a single score of how anomalous a profile is to an attack pattern that is occurring in real time [110]. The anomaly score rises after an attack has begun and falls once the attack has ended. Since the time-sensitive nature of attack profiles makes it difficult to assign a proper score, anomaly-based IDSs are prone to false alarms.
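The sketch below illustrates, under simplifying assumptions, how such a profile comparison might be reduced to a single 1-100 score: per-feature deviations from a stored "normal" profile are computed and the worst deviation is squashed onto the score range. Real anomaly-based IDSs use far richer models; the features and thresholds here are invented.

```python
import math

# Hypothetical profile-to-score reduction: compare current feature values
# against a stored "normal" profile (mean, std per feature) and map the
# worst deviation onto a 1-100 scale.
NORMAL_PROFILE = {          # per-feature (mean, std) learned from history
    "bytes_out_per_min": (50_000, 15_000),
    "distinct_dst_ips":  (8, 3),
    "failed_logins":     (0.2, 0.5),
}

def anomaly_score(observation):
    worst_z = max(abs(observation[f] - mu) / sigma
                  for f, (mu, sigma) in NORMAL_PROFILE.items())
    # Squash the z-score into 1..100 (saturates around z of about 6).
    return round(1 + 99 * math.tanh(worst_z / 3))

print(anomaly_score({"bytes_out_per_min": 52_000,
                     "distinct_dst_ips": 9, "failed_logins": 0}))   # low score
print(anomaly_score({"bytes_out_per_min": 900_000,
                     "distinct_dst_ips": 40, "failed_logins": 6}))  # high score
```

Note how the sketch already exhibits the timing problem discussed above: the score only rises once anomalous values appear in the observation window, so any fixed alerting threshold trades missed early-stage attacks against false alarms on benign bursts.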
While numerous approaches have been proposed to solve the excessive false alarm issue, especially with the increased use of ML algorithms [7, 39], industry reports (for instance, the reports in section 1) have shown that human experts are still overwhelmed by false alarms, with no solution currently in sight. Given how little is known about the human factors of anomaly detection, research on the impact of current anomaly detection systems on human users, in terms of user-centered testing and workload assessment, is urgently needed.
Perimeter defense approaches employ a wide variety of methods to detect network-based attacks. They all, however, suffer from the disadvantages noted above. While perimeter defenses can screen out a large majority of attack attempts before they reach the intranet, they are less capable of combating exfiltration activities. As a result, defense strategies based on analysis of data usage within the intranet have become a focus for cyber-defense activity.
RSA does not disclose the original material (plaintext) even if partial pieces of the ciphertext are exposed [71, 179]. RSA and its derived algorithms are currently considered secure in industry, until such time as an adversary obtains quantum computing technologies [33].
Encryption approaches focus on either protecting data in motion or protecting data at rest. Data in motion is usually vulnerable to man-in-the-middle attacks. Encrypting data transmitted over the internet is crucial to prevent data leakage; for instance, the current TLS (Transport Layer Security) version 1.2 [54] secures web requests against eavesdropping. By contrast, protecting data at rest can be more difficult than protecting data in motion. In many cases, adversaries (especially insiders) may be more interested in stealing high volumes of sensitive data at rest than small pieces of information in motion. It is thus important to label the sensitivity of data so that access clearance and records can be properly managed. There are several ways to classify data sensitivity. For instance, Executive Order 12356 [46, 149] describes three levels of information classification:
• Top Secret, where unauthorized disclosure could cause exceptionally grave damage to the national security
• Secret, where unauthorized disclosure could cause serious damage to the national security
• Confidential, where unauthorized disclosure could cause damage to national security
These three levels are proposed as a standard. Many approaches comply with this standard when assigning data sensitivity, such as using roles and access patterns [128] to classify data, or using NLP (natural language processing) technologies to learn from text fragments and assign file sensitivities. Once data classification is complete, a data owner (usually a senior role responsible for data collection, protection, and data quality retention) can make decisions concerning the assignment of data access or editing permissions to users [224].
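As a toy illustration of content-based sensitivity labelling, the sketch below assigns one of the three levels above using keyword triggers; real systems would rely on role/access patterns or trained NLP models, and the trigger terms here are invented.

```python
# Toy keyword-based sensitivity labeller for the three classification
# levels above; the trigger terms are invented for illustration, and
# production systems would use role/access patterns or NLP models.
LEVELS = [  # checked from most to least restrictive
    ("Top Secret",   {"launch_codes", "source_identity"}),
    ("Secret",       {"troop_movement", "cipher_key"}),
    ("Confidential", {"internal_memo", "budget_draft"}),
]

def classify(text):
    tokens = set(text.lower().split())
    for level, triggers in LEVELS:
        if tokens & triggers:
            return level
    return "Unclassified"

print(classify("Draft cipher_key rotation schedule"))  # Secret
print(classify("Lunch menu for Friday"))               # Unclassified
```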
Many studies have been carried out on securing data in motion and data at rest using encryption technologies. However, cryptography by itself is not sufficient to secure data in motion against man-in-the-middle attacks, or data at rest against physical access [210]; its ability to stop exfiltration threats is limited in the following scenarios:
• Key stealing: cryptography requires that the secret key be protected securely (which usually relies on access control). Successful social-engineering attacks or impersonation can lead to key disclosure and compromise data security.
• Data in use: legitimate users need to access cleartext data for their day-to-day jobs. Spyware can easily record decrypted in-use data and thus cause data leakage.
• Insider threat: an insider with sufficient privilege can access original, unencrypted data at any time.
Sometimes a user may unintentionally print out data that is supposed to be encrypted and secured at rest,
thus leading to data exfiltration.
Thus, in the next subsection we consider data provenance as a supplement to encryption; data provenance
keeps track of sensitive data location more effectively, protecting it against exfiltration.
4.3.2 Data Provenance. Data security constitutes an important aspect of an organization's cybersecurity posture [13]. Data provenance is closely related to exfiltration threat protection, as it can provide reliable sources of evidence for domain experts as they form hypotheses, carry out investigations, and build IOCs (Indicators of Compromise).
IOCs are indicators of whether a user account has been compromised. Accurate IOCs greatly facilitate
threat hunting, allowing organizations to proactively look for malicious behaviors [125, 129]. Data provenance
(sometimes referred to as the ‘lineage’ of data) provides data “labels” that can facilitate the process of building
valid IOCs. It is thus crucial information for hunting novel or insider threats.
Implementing data provenance involves keeping track of data origins, as well as managing data arrival processes [29]. Conventionally, there are two ways of managing data provenance in a database [186] (a minimal annotation sketch follows the list):
• Annotation: data origins and transfer points are ‘annotated’ in the metadata [22]
• Inversion: queries/functions used to derive data are stored and can ‘inversely’ reproduce source and derived
data [98]
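The annotation sketch referenced above might look like the following: each read/write/transfer appends a metadata record to the file's lineage, with each record chaining to the previous one so tampering becomes detectable. The field names are illustrative assumptions.

```python
import hashlib
import json
import time

# Minimal annotation-style provenance sketch: each read/write/transfer
# appends a metadata record to the file's lineage; field names are
# illustrative assumptions, not a standard schema.
def annotate(lineage, actor, action, payload):
    record = {
        "ts": time.time(),
        "actor": actor,
        "action": action,                      # read / write / transfer
        "payload_sha256": hashlib.sha256(payload).hexdigest(),
        # Chain to the previous record so gaps or edits are detectable.
        "prev": lineage[-1]["payload_sha256"] if lineage else None,
    }
    lineage.append(record)
    return lineage

lineage = []
annotate(lineage, "alice", "write", b"q3 revenue figures")
annotate(lineage, "bob", "transfer", b"q3 revenue figures")
print(json.dumps(lineage, indent=2))
```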
While both data provenance methods are readily scalable in modern systems [88], annotation provides more complete information. Current data provenance applications orchestrate various data sources. They are combined with other security approaches so as to detect anomalous events by tracking every possible modification (read, write, execution, and transfer) of data files. Some data provenance application examples are:
• Monitoring data accesses and following the chain of processes [106, 215]
• Providing tamper-proof function (using blockchain) to secure cloud data [115]
• Establishing trust so as to retain security status in the IoT (Internet of Things) environment, where multiple
different metadata sources and formats are inevitable [57, 86]
• Integrating historical and contextual provenance data to triage false positives [1]
Data provenance can be obtained from system process calls [10], or from email, print, copy (e.g., to removable drives), and any other traceable activities at a higher application/database level [61]. The collected provenance data should be secured against tampering, for instance by using provenance-aware platforms such as the Trusted Platform Module (TPM) [203]. Implementation primitives such as encryption, hashing, signatures, or watermarking [228] should also be considered, so that analysts can rely on the information for investigation.
An interesting example of a secure provenance collection method is the Red Star system, developed by the North Korean government (according to a YouTube video cited in [120]). It is "an operating system that has been specifically enhanced to append 'watermarks' based on the specific hardware being used". The receiving system can see the thread of previous systems that opened the file. In this case, data provenance is secured and can provide non-repudiable information regarding who might be leaking files or creating "subversive" content.
With improvements in computational power, data provenance may contain more granular information (e.g., a specific workbook in a spreadsheet file or a particular selected area in a table) that can more precisely indicate the causal relationships between events [79]. This can improve the efficiency of investigations into the chain of exfiltration activities [60], and could also improve APT activity detection [92].
For large organizations, however, considering the number of files they need to secure, data provenance may create "too many" additional details. In modern threat detection, especially in a large corporate environment, the problem of having too much data is much more salient than that of having too little. Detailed data provenance can create huge amounts of data as actions are tracked through a system. Like the excessive numbers of false alarms generated in automated anomaly detection, data provenance threatens to create more information and potential threats than human analysts are able to handle.
Thus, it is believed that supporting experts who conduct investigations using provenance data with ML may help them automate repetitive screening tasks, making their investigations less burdensome. ML models may support automatic threat detection using IOCs formed from low-level provenance data, transforming that data into enriched security incident knowledge, at a higher level of abstraction, that is more suitable for human consumption [151]. However, when experts are trying to make critical decisions (e.g., determining whether an instance is malicious or not), ML outputs with low interpretability may do more harm than good. High-level abstractions may be unsuitable for people with high expertise, since the more expertise practitioners possess, the more "interpretability" they are likely to require in model output [30].
Experts need sufficient explanation of model output so that they can trust and rely on model outputs when making critical decisions, but too much explanation may be counterproductive. There is a tradeoff between the level of abstraction and the richness of explainable model outputs: too much abstraction reduces expert trust in ML-recommended decisions, while overly detailed explanation may be distracting and create inefficiencies. In addition, different experts may have varying requirements for model interpretability. Thus, the level of interpretability needs to be customized so that experts can trust the model and integrate model outputs into their decision-making processes. ML models failing to fulfill these requirements may in turn reduce detection efficiency and create excessive burdens on human experts (a more detailed discussion of expert-ML interactions is provided in section 5).
4.3.3 Honeytoken. A more aggressive way to protect sensitive data is through the use of honeytokens, which evolved from the concept of honeypots. A honeypot is a closely monitored decoy network intended to trick malicious actors into providing insight into their techniques. Honeypots have the following advantages [131, 153, 192, 194]:
• Distract or mislead adversaries from valuable real targets
• Alert domain workers in advance
• Allow investigation of the attack vectors used by adversaries
• Reduce false alarms (because activities performed in a honeypot are most likely malicious)
A honeypot acts as a decoy host containing data that looks sensitive, in order to lure adversaries into attacking it, so as to reveal the identities of the adversaries (in some rare but valuable cases) and their TTPs. A honeypot can involve either low or high interaction [216]. Low interaction honeypots emulate and monitor specific services such as known vulnerable Windows services [12] or an SSH server [47].
With low interaction honeypots, attackers cannot interact with the operating system directly. In contrast, high
interaction honeypots support a more flexible interaction environment that can provide various types of data
for investigation, such as tcpdump data, keystroke logs, file access details, and other input/output associated
with adversaries’ activities [216]. A high interaction honeypot might be insightful for analyzing comprehensive
adversary attack vectors and creating IOCs to prevent upcoming attacks.
A honeytoken is an expansion of the honeypot concept: digital items such as credit card numbers, database entries, or credentials are faked [193], made quasi-authentic, and placed in the system within the intranet [21]. Two major ways of creating honeytokens from database rules are [226] (a toy generation sketch follows the list):
• Obfuscation: substitute sensitive attributes and their values with artificial data
• Generation: completely generate artificial data from scratch
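The toy generation sketch referenced above fabricates a Luhn-valid credit card number from scratch, so the decoy passes superficial validity checks while mapping to no real customer. This is an illustrative assumption about how "generation" might be done, not a prescription from the cited work.

```python
import random

# Illustrative "generation" honeytoken: fabricate a Luhn-valid credit
# card number from scratch so the decoy looks authentic but corresponds
# to no real customer account.
def luhn_checksum_digit(digits):
    total = 0
    for i, d in enumerate(reversed(digits)):
        d = d * 2 if i % 2 == 0 else d  # double every second digit from the right
        total += d - 9 if d > 9 else d
    return (10 - total % 10) % 10

def generate_honeytoken_card(prefix="4"):
    body = [int(prefix)] + [random.randint(0, 9) for _ in range(14)]
    return "".join(map(str, body + [luhn_checksum_digit(body)]))

token = generate_honeytoken_card()
print(token)  # a 16-digit, Luhn-valid decoy number
```

In practice the generated value would be seeded into a database table or credential store, with monitoring configured so that any access to it raises an alert, as discussed below.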
High-definition honeytokens should be indistinguishable even under extensive scrutiny by domain experts [195]. Thus, they can be used to trigger an alarm when someone tries to interact with certain rarely accessed database entries [147]; to keep track of the fingerprint (similar to provenance) of an active attack campaign [195]; or even to protect two-factor authentication (2FA) by injecting honeytokens as words into credentials [142]. Whenever a honeytoken is accessed, used, modified, or transmitted, an alarm is triggered to notify relevant personnel. Proper alerting and monitoring technologies must be prepared in advance to deal with honeypot data and honeytokens.
Figure 5 summarizes countermeasures that may support detection of exfiltration threats, where each colored block represents a type of data source that can be used in further investigation and threat hunting. Among the countermeasures, UEBA provides a relatively complete profile of human behavior information that can be used in cross-endpoint EDR investigations and incident responses.
Centrally managed endpoint protection approaches require experts to work proactively with their rich functions and data sources. Analysts working with these platforms can respond to anomalous events in real time. However, for platforms focusing on human activities, this can be a disadvantage due to the unpredictable and novel nature of human behavior. Users on endpoints do not always operate with fixed patterns. Thus, numerous alerts can be generated as false positives [205]. Consequently, these platforms may fatigue, overwhelm, and reduce the situational awareness of human experts through the well-known alert fatigue phenomenon [1, 15]. Alert fatigue in turn leads to human-machine system performance degradation and undermines overall security performance; a canonical example of poor human factors outcomes due to alert fatigue is the Three Mile Island nuclear incident [25].
4.4.2 Data Loss Prevention. While large numbers of false alarms can be burdensome for human experts, one approach to reducing the number of false alarms is to lower the sensitivity of detection and focus on the final exfiltration actions. Because every exfiltration campaign has a final exfiltrating action, organizations can focus on preventing this final step by applying business functions (i.e., a Data Loss Prevention, or DLP, system) that define acceptable vs. unacceptable actions.
A DLP can inspect file contents and block policy-violating actions preceding outbound traffic, so as to prevent sensitive data from leaving the intranet [204]. This should significantly reduce the number of alerts presented in a SIEM, reducing human workload. Many vendors supply DLP solutions to organizations [78]. At a minimum, a DLP
system should provide the following functions [117]:
• Define data sensitivity to create a data inventory that contains sensitive data location
• Discover sensitive data at rest and relocate the data to logged secure inventory
• Manage data usage policies and how they are enforced, including data handling such as data cleanup and
disposal
• Monitor, understand, and visualize (make visible to the organization) sensitive data usage patterns
• Prevent sensitive data from leaving an organization by enforcing security policies proactively
• Report data loss incidents and establish incident response capability to enable corrective actions that
remediate violations
While it sounds straightforward to "block outbound sensitive data", sensitive files can be created and deleted dynamically and constantly, making it difficult to track which data is sensitive. If sensitive data is not tracked adequately, the DLP may fail to block transfers that should be blocked, undermining security, or may block too many transfers, undermining system service quality [221].
Since DLP systems operate using rules, they are subject to the same problems (noted earlier) as other rule-based systems. To block sensitive files from leaving the intranet, a DLP requires certain policies/rules to operate properly, based on how the following questions are answered (a toy policy check is sketched below):
• What kind of actions should be blocked?
• Who (which privilege), when operating what, should be blocked?
• How to block?
As the scale of the organization increases, it becomes more difficult to answer these questions, and the defined policies become more complex. As a result, a DLP system following these complex policies can in turn generate a large volume of false positives.
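A toy policy check answering the three questions above might look like the following sketch; the roles, actions, and decisions are invented for illustration, and real DLP policies are far richer.

```python
# Hypothetical DLP policy check mirroring the three questions above:
# which action, which privilege/user, and how to block. All rules and
# field names are invented for illustration.
BLOCKED_ACTIONS = {"email_attach", "usb_copy", "cloud_upload"}

def dlp_decision(event):
    """event: dict with user_role, action, file_sensitivity, destination."""
    if event["file_sensitivity"] == "public":
        return "allow"
    if event["action"] in BLOCKED_ACTIONS and event["user_role"] != "data_owner":
        # "How to block": drop the transfer and raise an incident.
        return "block_and_alert"
    if event["destination"] == "external":
        return "quarantine"   # hold for human review instead of a hard block
    return "allow"

print(dlp_decision({"user_role": "analyst", "action": "cloud_upload",
                    "file_sensitivity": "confidential",
                    "destination": "external"}))  # block_and_alert
```

Even this toy version shows why policy complexity scales badly: every new role, action, or destination multiplies the rule combinations to maintain, and each imprecise rule is a potential source of false positives.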
to identify social-engineering attacks so as to improve their awareness. Sometimes an organization may insert its own pseudo-phishing emails into user mail queues to gauge users' susceptibility to social engineering attacks. However, organizations remain susceptible to social engineering attacks whenever such attacks are feasible, due to a variety of human foibles such as over-trust, impulsiveness, or greed. The vulnerabilities of human nature have made humans "the weakest link in the security pipeline", a weak link that is easily taken advantage of [171]. Human slips/errors may weaken human-based protection and, consequently, undermine the effectiveness of computer-based countermeasures.
In recent years, social-engineering attacks have evolved. They may no longer aim to obtain access to a network system, but simply to deliver a malicious payload. The delivery process can be covert (e.g., the recent Excel macro malware attachment attack reported by Fortinet [230]), and the goal is only to install ransomware on the target system. The adversary can then demand a ransom and threaten disclosure of sensitive information, as described in the reports in section 1 [49, 126, 146]. This new type of attack is even more difficult to prevent because one negligent or careless employee can cause severe damage to the whole intranet.
Hardening the network against social-engineering attacks can be difficult. Domain experts must protect not only the computer network but also human interactions with it. This has become a socio-technical issue for which there is a lack of comprehensive guidelines to support experts' work. The cybersecurity domain urgently needs more investment in training people in order to enhance their awareness of social-engineering attacks [164]. More advanced detection countermeasures to battle social-engineering attacks are also needed.
cybersecurity in general, the issues raised will apply more broadly to human interaction with automation, and
more specifically to data exfiltration applications.
5.1 SIEM Integration with ML and Resulting Implications for Human Factors
Modern enterprise environments use a SIEM (or a SOAR) approach to integrate and centralize complex data
for the purposes of real-time attack detection and security event analytics (typically within a SOC, a Security
Operations Center). SIEM systems provide log data collection and integration functionalities, supporting expert
investigation, forensic analysis, incident response, incident mitigation, and reporting [99].
A SIEM tool works on data logs from a variety of security devices and traffic sensors [23]. These devices and sensors can be the types of countermeasures discussed in section 4, such as firewalls (including WAFs), IDSs/IPSs, authentication servers, and endpoints. There is usually an executive SIEM dashboard that shows the overall behavior and risk associated with each device and sensor. Unresolved events can then be triaged and highlighted using colors representing different threat levels [105]. In this way, a SIEM can visually guide the expert to resolve the most urgent incident. The integration of multiple data sources also helps by giving a "full picture" of the attack pathway/campaign, including other targets or areas that may be affected within the network system.
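A minimal sketch of this triage step, under assumed thresholds and event fields, might map risk scores to threat-level colors and surface the most urgent unresolved events first:

```python
# Illustrative SIEM-style triage: map risk scores to threat-level colors
# and sort unresolved events by urgency. Thresholds and event fields are
# invented for illustration.
SEVERITY = [(90, "red"), (70, "orange"), (40, "yellow"), (0, "green")]

def color(score):
    return next(c for threshold, c in SEVERITY if score >= threshold)

events = [
    {"id": 1, "source": "waf",  "risk": 45, "resolved": False},
    {"id": 2, "source": "ids",  "risk": 92, "resolved": False},
    {"id": 3, "source": "auth", "risk": 75, "resolved": True},
]

queue = sorted((e for e in events if not e["resolved"]),
               key=lambda e: e["risk"], reverse=True)
for e in queue:
    print(f"[{color(e['risk'])}] event {e['id']} from {e['source']}")
# [red] event 2 from ids
# [yellow] event 1 from waf
```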
SIEMs use visualization intensively (and not just in executive dashboards) to support experts in their search for anomalous patterns [138]. In contrast to other tools used by domain experts, SIEM tools tend to follow human factors guidelines more closely. Integrating SIEM systems with ML models may also lead to better categorization of network traffic and prediction of attack patterns [28, 227]. With the help of ML technologies, incident responders should be able both to obtain required information more efficiently and to isolate the compromised zone in a timely manner.
While studies have shown the usefulness of SIEM tools, SOC implementations in industry are often not ideal. Chamkar et al. conducted a survey of 45 SOC analysts/SOC service providers [34] and found deficiencies in automation and data orchestration (97%), visibility into the IT security infrastructure (95%), appropriate methods to handle false alarms (93%), and guidelines or playbooks (92%). They also found a general lack of training and attack simulations, knowledge of business risks, adequate evaluation metrics, etc., in the SOCs that they studied. Meanwhile, a study [? ] showed that only a few off-the-shelf SIEM systems have ML functionalities. Cybersecurity practice is currently far less automated than the systems studied in academic settings. Thus, industry faces a situation in which there is a considerable amount of manual (human) task activity in cybersecurity countermeasures, but without the requisite consideration of human factors issues.
How can we learn from this situation and develop improved methods, not just for SIEMs, but for all countermeasures dealing with the threat of data exfiltration and, more broadly, within the domain of cybersecurity? The promise of ML will not be fully realized if solutions are not engineered with the properties of humans clearly in mind. In the following discussion we consider four major human factors issues that have been prominent in a range of domains from nuclear power to aviation and healthcare. We will use SIEM tools to exemplify the problems here and will further elaborate on them in later subsections. The four key human factors problems are:
• Expert availability
• Situational awareness
• Trust and reliance
• Human-System Compatibility
Expert availability is a highly salient human factors issue for SIEMs. Experts are expensive and difficult to hire because of security knowledge shortages in the market [150]. Thus, human experts are a precious resource and their time should not be wasted. However, SIEM deployment currently relies on writing ad hoc data collectors and compromise indicators case by case, which makes it difficult for domain experts to keep track of large volumes of data [41]. In contrast, situational awareness is usually well considered in SIEM tools, which are typically constructed to promote it [62]. However, interpreting SIEM dashboard outputs can be challenging, and few studies (subsection 5.3) have covered this issue within the domain of cybersecurity. SIEM tools are widely used in attempts to automate decision-making processes [? ], but the problem of setting appropriate levels of trust and reliance for human experts has not been considered, nor have human-system compatibility issues been discussed, although they are coming to the fore in other ML application areas [17, 18].
In the remainder of this section we briefly review the role of human experts in human-model systems
as characterized in the previous research literature. This review will help identify problems associated with
implementing automation/ML in the domain of cybersecurity against exfiltration threats, and will address our
earlier research question 3 that concerns the actual benefits/limitations of countermeasures, considering human
users, organizational structures, and other socio-technical factors.
Prior to reviewing each of these human factors in the following subsections, we will briefly characterize the
opportunities for including human expertise in various stages of the ML model training process:
• In data collection: human interaction is involved in the collection of past events, in the process of use case creation in simulation technologies, in the setup of honeypots, etc.
• In data pre-processing: human interaction is involved in defense system building, cyber kill-chain design,
system patching, rules/policies creation, signature databases maintenance, data labelling, etc.
• In detection process: human interaction is involved in knowledge input, discussion between domain experts
and ML experts, and related activities
• In results and analyses: human interaction is involved in reading output, investigations, resolving alerts,
and making different types of judgements
The human role is important throughout the monitoring and detection process, but it has rarely been considered in past research, and that role has been poorly defined. As a result, the outputs provided by ML models and software countermeasures will often be ignored or misinterpreted. This deficiency should be addressed, and human factors should be considered in designing detection algorithms. While human factors issues are sometimes considered out of scope in highly automated systems, they come to the fore in strategic decision-making concerning the selection and preprocessing of data, and in model training.
While we noted four human factors issues in this section, we will conclude by recognizing that the essential
difficulty in defining the human role in combating data exfiltration, and perhaps in cybersecurity generally,
is that humans work very differently from algorithms and have very different input and output requirements.
While there may be some recognition of this fact at a conceptual level, we are a long way from dealing with it
in operational settings. The following subsections review the four human factors problems listed earlier as a
necessary step towards defining more appropriate and useful roles for humans in an interactive ML process.
As the volume of data to be monitored grows, maintaining situational awareness becomes increasingly challenging.
ML, data visualization, and other computer aiding methods can provide situation awareness and highlight the
most important features of the current situation, but that highlighting has to be done carefully, so that the
information is presented to human experts in a way that matches their needs and capabilities, as well as their
expectations in the particular context.
Providing the right information at the right time will also help manage the mental workload of domain experts.
Without proper interaction design between experts, ML algorithms, and their outputs, there is typically
a significant stream of alerts representing possibly anomalous cases, and the domain expert must prioritize
the alerts and sift through them. Prioritization is necessary because with so many alerts it is not possible
to deal with them all. Like an understaffed call center with the phones always ringing, the expert is besieged
by more alerts than can possibly be handled, leading to stress as well as high workload. Thus, it is critical to
offload the routine handling of alerts so that the expert can handle the highest priority alerts, for instance, those
that need to be interpreted with human expertise. Note that the human interaction with the ML algorithm will
involve not only sorting through high priority alerts, but also training the algorithm(s) with labelling advice,
feature weighting, and other activities.
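As a rough illustration of such offloading, the sketch below (hypothetical; the scoring rule, thresholds, and field names are our assumptions, not a deployed triage scheme) auto-resolves routine alerts and queues only the highest-priority cases for the expert.

import heapq

def priority(alert):
    # Hypothetical score: weight the model's anomaly score by asset sensitivity.
    return alert["anomaly_score"] * alert["asset_sensitivity"]

def route(alerts, expert_capacity=2, auto_resolve_below=0.2):
    scored = [(priority(a), i, a) for i, a in enumerate(alerts)]
    # Routine alerts are closed automatically; the rest are ranked.
    queue = [t for t in scored if t[0] >= auto_resolve_below]
    # Only the top few reach the human expert.
    return [a for _, _, a in heapq.nlargest(expert_capacity, queue)]

alerts = [
    {"id": 1, "anomaly_score": 0.9, "asset_sensitivity": 1.0},
    {"id": 2, "anomaly_score": 0.3, "asset_sensitivity": 0.1},
    {"id": 3, "anomaly_score": 0.7, "asset_sensitivity": 0.8},
]
print([a["id"] for a in route(alerts)])  # -> [1, 3]

In practice the expert's resolutions (confirm/dismiss) would also be fed back as labels, which is the training role noted above.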
Perhaps the greatest challenge of expert-ML systems is creating compatibility between humans and ML
algorithms [17]. In the case of deep learning, compatibility is particularly challenging because it is difficult to
translate the weights assigned to the many processing units (“neurons”) in the network into simpler concepts,
relationships, and general weightings of importance that are easily grasped by humans. The problem of
opacity in neural network outputs is well known, however, and research is ongoing into how to make approaches such
as deep learning more consumable by humans. In practice, domain experts and
managers may be willing to trade off a certain amount of model accuracy in return for greater interpretability.
Thus, there have been attempts to break down deep learning models by providing representative explanations for
insights [160]; or by utilizing local linear models to approximate detection boundaries near the input instances,
so as to help select key contributing features [74]. Regardless of the approach used, humans need to remain
in-the-loop to read results and make decisions about how to update or apply models in the future.
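As a simple illustration of the local-surrogate idea behind these approaches (our sketch of the general technique, not the implementation of [160] or [74]; the synthetic data and parameter choices are assumptions), the code below perturbs inputs around one instance, queries a black-box detector, and fits a locally weighted linear model whose coefficients rank feature contributions.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Black-box detector trained on synthetic traffic features;
# only features 0 and 2 actually matter.
X = rng.normal(size=(500, 4))
y = (X[:, 0] + 0.5 * X[:, 2] > 0).astype(int)
blackbox = RandomForestClassifier(random_state=0).fit(X, y)

def local_explanation(x, n_samples=1000, kernel_width=1.0):
    # Perturb around x, query the black box, and fit a locally weighted
    # linear surrogate; its coefficients rank feature contributions near x.
    Z = x + rng.normal(scale=0.5, size=(n_samples, x.size))
    probs = blackbox.predict_proba(Z)[:, 1]
    weights = np.exp(-np.sum((Z - x) ** 2, axis=1) / kernel_width ** 2)
    return Ridge(alpha=1.0).fit(Z, probs, sample_weight=weights).coef_

print(local_explanation(np.zeros(4)).round(2))  # features 0 and 2 dominate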
In a domain like cybersecurity, where intensive situation awareness and trust are needed, the compatibility
issue is always likely to be a problem. What is needed is an interactive machine learning (iML) approach that
directly addresses this issue by iteratively updating the training data based on human input and by making the
model’s logic more transparent, so as to hand control back to human users efficiently while avoiding the problem
of unrecognized model brittleness [190], i.e., states or cases where the model training is insufficient and the
model predictions cannot be trusted. However, further studies are required before implementing such models in
this critical domain.
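To fix ideas, the following minimal uncertainty-sampling loop (a sketch under simplifying assumptions: synthetic data, and ground-truth labels standing in for the expert's judgement) shows the kind of iteration an iML workflow implies; in each round the model queries the cases it is least certain about, and the expert's answers grow the training set.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
pool = rng.normal(size=(300, 3))
ground_truth = (pool[:, 0] > 0).astype(int)  # stands in for reality

def expert_label(i):
    # Placeholder for the human expert's judgement on a queried case.
    return ground_truth[i]

# Seed set with both classes represented.
labelled = list(np.where(ground_truth == 1)[0][:5]) + \
           list(np.where(ground_truth == 0)[0][:5])
model = LogisticRegression()

for rnd in range(5):
    model.fit(pool[labelled], [expert_label(i) for i in labelled])
    probs = model.predict_proba(pool)[:, 1]
    uncertainty = -np.abs(probs - 0.5)  # cases near the boundary score highest
    ranked = [i for i in np.argsort(uncertainty)[::-1] if i not in labelled]
    labelled += ranked[:10]  # expert input iteratively updates the training data
    print(f"round {rnd}: accuracy {model.score(pool, ground_truth):.2f}")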
7 CONCLUSION
The ever-growing threat of costly data exfiltration events has led organizations to recognize data security
as a major imperative. Unfortunately, efforts to secure the perimeters of organizational networks have not
adequately addressed the threats posed by insiders, either those who have legitimate roles inside organizations,
or masqueraders, who have obtained insider credentials (e.g., through phishing). Since there are many data
exfiltration threats and knowledge of human behavior is an essential part of analyzing these threats, previous
algorithms that have relied exclusively on ML-based detection, followed by human review of alerts, have fallen
short because they have not addressed the full complexity of data exfiltration scenarios or relevant human factors
issues. Thus, there is a need to create a more active role for human experts throughout the process of detecting
data exfiltration activities. The assistance of human experts is relevant across the exfiltration detection lifecycle,
from data logging, rule creation, and debugging, to the resolution of alerts and the performance of investigations. The
need for vigilant detection methods will continue regardless of whether sensitive data is stored in the cloud
or within a network hosted by the organization. In spite of efforts to prevent cybersecurity threats using new
approaches such as zero trust architectures [162], data exfiltration will continue to be a threat for the foreseeable
future and it is part of the fiduciary responsibility of organizations to include strong detection methods, as well
as prevention methods, in their defensive arsenal.
In a domain that is rapidly adopting state-of-the-art automation methods, the importance of expert knowledge
in detecting data exfiltration events has been overlooked. In this paper we addressed this issue by 1) surveying
industry reports and previous studies to emphasize the urgent need to place experts in-the-loop while creating
automated models/systems; 2) documenting the failings of current countermeasures and explaining why those
failings occur due to inadequate consideration of human roles; 3) describing why it is crucial to connect algorithms
and experts together, and emphasizing the need to improve the human factors of the domain expert workflow.
Cybersecurity applications that include a role for human experts are necessarily socio-technical systems and
cannot be safely and efficiently operated without considering relevant human factors issues. In this paper we
have not only provided a state-of-the-art review of data exfiltration countermeasures, but have also provided
insights into the human factors that need to be addressed in future research.
ACKNOWLEDGMENTS
Mark Chignell acknowledges support from Mitacs grant IT30559, "Detection and Investigation of Email Exfiltration
Events in Sun Life Cybersecurity Data”. David Lie acknowledges support from a Tier 1 Canada Research Chair.
REFERENCES
[1] Wajih Ul Hassan, Shengjian Guo, Ding Li, Zhengzhang Chen, Kangkook Jee, Zhichun Li, and Adam Bates. 2019. NoDoze: Combatting
threat alert fatigue with automated provenance triage. Network and Distributed Systems Security (NDSS) Symposium 2019 (2019).
[2] Islam Abdalla Mohamed Abass. 2018. Social Engineering Threat and Defense: A Literature Survey. Journal of Information Security
9 (2018), 257–264. https://doi.org/10.4236/jis.2018.94018
[3] Qasem Abu Al-Haija and Abdelraouf Ishtaiwi. 2021. Machine Learning Based Model to Identify Firewall Decisions to Improve
Cyber-Defense. International Journal on Advanced Science Engineering and Information Technology 11, 4 (2021).
[4] Majid Afshar, Saeed Samet, and Hamid Usefi. 2021. Incorporating Behavior in Attribute Based Access Control Model Using Machine
Learning. 15th Annual IEEE International Systems Conference, SysCon 2021 - Proceedings (apr 2021).
[5] Alfred V. Aho and Margaret J. Corasick. 1975. Efficient string matching: An aid to bibliographic search. Commun. ACM 18, 6 (jun 1975), 333–340.
[6] Rawan Al-Shaer, Jonathan M. Spring, and Eliana Christou. 2020. Learning the Associations of MITRE ATT&CK Adversarial Techniques.
2020 IEEE Conference on Communications and Network Security, CNS 2020 (jun 2020).
[7] Wajdi Alhakami, Abdullah Alharbi, Sami Bourouis, Roobaea Alroobaea, and Nizar Bouguila. 2019. Network Anomaly Intrusion
Detection Using a Nonparametric Bayesian Approach and Feature Selection. IEEE Access 7 (2019), 52181–52190.
[8] Sultan Alneyadi, Elankayer Sithirasenan, and Vallipuram Muthukkumarasamy. 2016. A survey on data leakage prevention systems.
Journal of Network and Computer Applications 62 (feb 2016), 137–152.
[9] Dennis Appelt, Cu D. Nguyen, and Lionel Briand. 2015. Behind an application firewall, are we safe from SQL injection attacks? 2015
IEEE 8th International Conference on Software Testing, Verification and Validation, ICST 2015 - Proceedings (may 2015).
[10] Abir Awad, Sara Kadry, Guraraj Maddodi, Saul Gill, and Brian Lee. 2016. Data leakage detection using system call provenance.
Proceedings - 2016 International Conference on Intelligent Networking and Collaborative Systems, IEEE INCoS 2016 (oct 2016), 486–491.
[11] Amos Azaria, Ariella Richardson, Sarit Kraus, and V. S. Subrahmanian. 2014. Behavioral analysis of insider threat: A survey and
bootstrapped prediction in imbalanced data. IEEE Transactions on Computational Social Systems 1, 2 (2014), 135–155.
[12] Paul Baecher, Markus Koetter, Thorsten Holz, Maximillian Dornseif, and Felix Freiling. 2006. The nepenthes platform: An efficient
approach to collect malware. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture
Notes in Bioinformatics), Vol. 4219 LNCS. Springer Verlag, 165–184.
[13] Ashutosh Bahuguna, R. K. Bisht, and Jeetendra Pande. 2020. Country-level cybersecurity posture assessment:Study and analysis of
practices. Information Security Journal 29, 5 (sep 2020), 250–266.
[14] Wade Baker, Mark Goudie, Alexander Hutton, C David Hylender, Jelle Niemantsverdriet, Christopher Novak, David Ostertag,
Christopher Porter, Mike Rosen, Bryan Sartin, et al. 2011. 2011 data breach investigations report. Verizon RISK Team. Available:
www.verizonbusiness.com/resources/reports/rp_databreach-investigationsreport-2011_en_xg.pdf (2011), 1–72.
[15] Tao Ban, Ndichu Samuel, Takeshi Takahashi, and Daisuke Inoue. 2021. Combat Security Alert Fatigue with AI-Assisted Techniques.
ACM International Conference Proceeding Series (aug 2021), 9–16.
[16] Gagan Bansal, Raymond Fok, Marco Tulio Ribeiro, Tongshuang Wu, Joyce Zhou, Ece Kamar, Daniel S Weld, and Besmira Nushi. 2021.
Does the Whole Exceed its Parts? The Effect of AI Explanations on Complementary Team Performance. Proceedings of the 2021 CHI
Conference on Human Factors in Computing Systems (2021), 1–16.
[17] Gagan Bansal, Besmira Nushi, Ece Kamar, Walter S Lasecki, Daniel S Weld, and Eric Horvitz. 2019. Beyond Accuracy: The Role of Mental
Models in Human-AI Team Performance. In Proceedings of the AAAI Conference on Human Computation and Crowdsourcing, Vol. 7. 2–11.
[18] Gagan Bansal, Besmira Nushi, Ece Kamar, Daniel S Weld, Walter S Lasecki, and Eric Horvitz. 2019. Updates in human-ai teams:
Understanding and addressing the performance/compatibility tradeoff. In 33rd AAAI Conference on Artificial Intelligence, AAAI 2019,
31st Innovative Applications of Artificial Intelligence Conference, IAAI 2019 and the 9th AAAI Symposium on Educational Advances in
Artificial Intelligence, EAAI 2019. 2429–2437.
[19] Paul Barford, Marc Dacier, Thomas G Dietterich, Matt Fredrikson, Jon Giffin, Sushil Jajodia, Somesh Jha, Jason Li, Peng Liu, Peng Ning,
Xinming Ou, Dawn Song, Laura Strater, Vipin Swarup, George Tadda, Cliff Wang, and John Yen. 2010. Cyber SA: Situational awareness
for cyber defense. Advances in Information Security 46 (2010), 3–13.
[20] Punam Bedi, Vandana Gandotra, Archana Singhal, Himanshi Narang, and Sumit Sharma. 2012. Threat-oriented security framework in
risk management using multiagent system. Wiley Online Library 43, 9 (sep 2012), 1013–1038.
[21] Maya Bercovitch, Meir Renford, Lior Hasson, Asaf Shabtai, Lior Rokach, and Yuval Elovici. 2011. HoneyGen: An automated honeytokens
generator. Proceedings of 2011 IEEE International Conference on Intelligence and Security Informatics, ISI 2011 (2011), 131–136.
[22] Deepavali Bhagwat, Laura Chiticariu, Wang-Chiew Tan, and Gaurav Vijayvargiya. 2005. An annotation management system for
relational databases. The VLDB Journal 14, 4 (oct 2005), 373–396.
[23] Sandeep Bhatt, Pratyusa K. Manadhata, and Loai Zomlot. 2014. The operational role of security information and event management
systems. IEEE Security and Privacy 12 (2014), 35–41. Issue 5. https://doi.org/10.1109/MSP.2014.103
[24] RM Blank. 2011. Guide for conducting risk assessments. (2011).
[25] James P. Bliss and Richard D. Gilson. 1998. Emergency signal failure: implications and recommendations. Ergonomics 41, 1 (jan 1998),
57–72.
[26] DJ Bodeau, CD McCollum, and DB Fox. 2018. Cyber threat modeling: Survey, assessment, and representative framework. (2018).
[27] Lance Bonner. 2012. Cyber risk: How the 2011 Sony data breach and the need for cyber risk insurance policies should direct the federal
response to rising data breaches. Wash. UJL & Pol’y 40 (2012), 257.
[28] Blake D. Bryant and Hossein Saiedian. 2020. Improving SIEM alert metadata aggregation with a novel kill-chain based classification
model. Computers & Security 94 (7 2020), 101817. https://doi.org/10.1016/J.COSE.2020.101817
[29] Peter Buneman, Sanjeev Khanna, and Wang Chiew Tan. 2001. Why and Where: A Characterization of Data Provenance. In International
Conference on Database Theory, Vol. 1973. Springer, Berlin, Heidelberg, 316–330.
[30] Peter Buneman and Wang-Chiew Tan. 2018. Data Provenance: What next? ACM SIGMOD Record 47, 3 (2018), 5–13.
[31] S Caltagirone, A Pendergast, and C Betz. 2013. The diamond model of intrusion analysis. Center For Cyber Intelligence Analysis and
Threat Research (2013).
[32] Jared J. Cash. 2009. Alert fatigue. American Journal of Health-System Pharmacy 66, 23 (2009), 2098–2101.
[33] Davide Castelvecchi. 2020. Quantum-computing pioneer warns of complacency over Internet security - Document - Gale Academic
OneFile. Nature 587, 7833 (2020), 189–190.
[34] Samir Achraf Chamkar, Yassine Maleh, and Noreddine Gherabi. 2022. The Human Factor Capabilities in Security Operation
Center (SOC). EDPACS 66, 1 (2022), 1–14. https://doi.org/10.1080/07366981.2021.1977026
[35] S Chandel, S Yu, T Yitian, Z Zhili, and H Yusheng. 2019. Endpoint protection: Measuring the effectiveness of remediation technologies
and methodologies for insider threat. 2019 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery
(CyberC) (2019), 81–89.
[36] Juan D. Chaparro, Cory Hussain, Jennifer A. Lee, Jessica Hehmeyer, Manjusri Nguyen, and Jeffrey Hoffman. 2020. Reducing
Interruptive Alert Burden Using Quality Improvement Methodology. Applied Clinical Informatics 11, 1 (2020), 46–58.
[37] Suresh N Chari and Pau-Chen Cheng. 2003. BlueBoX: A Policy-Driven, Host-Based Intrusion Detection System. ACM Transactions on
Information and System Security 6, 2 (2003), 173–200.
[38] Ping Chen, Lieven Desmet, and Christophe Huygens. 2014. A Study on Advanced Persistent Threats. Lecture Notes in Computer Science
(including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 8735 LNCS (2014), 63–72.
[39] Zouhair Chiba, Noureddine Abghour, Khalid Moussaid, Amina El Omri, and Mohamed Rida. 2018. A novel architecture combined with
optimal parameters for back propagation neural networks applied to anomaly network intrusion detection. Computers & Security 75
(jun 2018), 36–58.
[40] Mu Huan Chung, Mark Chignell, Lu Wang, Alexandra Jovicic, and Abhay Raman. 2020. Interactive Machine Learning for Data
Exfiltration Detection: Active Learning with Human Expertise. In IEEE Transactions on Systems, Man, and Cybernetics: Systems,
Vol. 2020-Octob. 280–287.
[41] Marcello Cinque, Domenico Cotroneo, and Antonio Pecchia. 2018. Challenges and Directions in Security Information and Event
Management (SIEM). Proceedings - 29th IEEE International Symposium on Software Reliability Engineering Workshops, ISSREW 2018 (11
2018), 95–99. https://doi.org/10.1109/ISSREW.2018.00-24
[42] Clearswift. 2013. The Enemy Within: an emerging threat... https://www.clearswift.com/blog/2013/05/02/enemy-within-emerging-threat
[43] Chris W. Clegg. 2000. Sociotechnical principles for system design. Applied Ergonomics 31, 5 (2000), 463–477.
[44] Victor Clincy and Hossain Shahriar. 2018. Web Application Firewall: Network Security Models and Configuration. Proceedings -
International Computer Software and Applications Conference 1 (jun 2018), 835–836.
[45] B. Commentz-Walter. 1979. A string matching algorithm fast on the average. In International Colloquium on Automata, Languages,
and Programming. Springer, 118–132.
[46] U. S. Congress. 1982. Security Classification Policy and Executive Order 12356. 13–20 pages.
[47] Jose Antonio Coret. [n.d.]. Kojoney - A honeypot for the SSH Service.
[48] Lorrie Faith Cranor. 2008. A framework for reasoning about the human in the loop. In Usability, Psychology, and Security, UPSEC 2008.
[49] CrowdStrike. 2022. 2022 Global Threat Report. (2022).
[50] Joan Daemen and Vincent Rijmen. 1999. AES Proposal: Rijndael. (1999).
[51] R. N. Dahbul, C. Lim, and J. Purnama. 2017. Enhancing Honeypot Deception Capability Through Network Service Fingerprinting.
Journal of Physics: Conference Series 801, 1 (jan 2017).
[52] K Daniel and J. Andreas. 2022. Evaluation of AI-based use cases for enhancing the cyber security defense of small and medium-sized
companies (SMEs). Electronic Imaging 34 (2022), 1–8.
[53] Ruth M. Davis. 1978. The Data Encryption Standard in Perspective. IEEE Communications Society Magazine 16, 6 (1978), 5–9.
[54] T. Dierks and E. Rescorla. 2008. The Transport Layer Security (TLS) Protocol Version 1.2. RFC 5246.
[55] W. Diffie and M. E. Hellman. 1976. New directions in cryptography. IEEE Transactions on Information Theory 22, 6 (1976), 644–654.
[56] Deborah D. Downs, Jerzy R. Rub, Kenneth C. Kung, and Carole S. Jordan. 1985. Issues in Discretionary Access Control. Proceedings -
IEEE Symposium on Security and Privacy (1985), 208–218.
[57] Mahmoud Elkhodr and Belal Alsinglawi. 2020. Data provenance and trust establishment in the Internet of Things. Security and Privacy
3, 3 (may 2020).
[58] Mica R. Endsley. 1988. Design and Evaluation for Situation Awareness Enhancement. Proceedings of the Human Factors Society Annual
Meeting 32, 2 (oct 1988), 97–101.
[59] Eden Estopace. 2016. Massive data breach exposes all Philippines voters. [Online]. Available:
https://www.telecomasia.net/content/massive-data-breach-exposes-all-philippines-voters (2016).
[60] Daren Fadolalkarim and Elisa Bertino. 2019. A-PANDDE: Advanced Provenance-based ANomaly Detection of Data Exfiltration.
Computers & Security 84 (jul 2019), 276–287.
[61] Daren Fadolalkarim, Asmaa Sallam, and Elisa Bertino. 2016. PANDDE: Provenance-based anomaly detection of data exfiltration.
CODASPY 2016 - Proceedings of the 6th ACM Conference on Data and Application Security and Privacy (mar 2016), 267–276.
[62] BS Fakiha. 2020. Effectiveness of Security Incident Event Management (SIEM) System for Cyber Security Situation Awareness. Indian
Journal of Forensic Medicine and Toxicology 14 (2020). Issue 4.
[63] D Ferraiolo, J Cugini, and DR Kuhn. 1995. Role-based access control (RBAC): Features and motivations. Proceedings of 11th computer
security application conference (1995), 241–248.
[64] David F. Ferraiolo, Ravi Sandhu, Serban Gavrila, D. Richard Kuhn, and Ramaswamy Chandramouli. 2001. Proposed NIST standard for
role-based access control. ACM Transactions on Information and System Security (TISSEC) 4, 3 (aug 2001), 224–274.
[65] Ulrik Franke and Joel Brynielsson. 2014. Cyber situational awareness – A systematic review of the literature. Computers &
Security 46 (2014), 18–31.
[66] Maxime Frydman, Guifré Ruiz, Elisa Heymann, Eduardo César, and Barton P. Miller. 2014. Automating risk analysis of software design
models. Scientific World Journal (2014).
[67] Sean Gallagher. 2015. At first cyber meeting, China claims OPM hack is “criminal case” [Updated] | Ars Technica.
https://arstechnica.com/tech-policy/2015/12/at-first-cyber-meeting-china-claims-opm-hack-is-criminal-case/
[68] P. García-Teodoro, J. Díaz-Verdejo, G. Maciá-Fernández, and E. Vázquez. 2009. Anomaly-based network intrusion detection: Techniques,
systems and challenges. Computers and Security 28, 1-2 (2009), 18–28.
[69] Jill Gerhardt-Powals. 1996. Cognitive Engineering Principles for Enhancing Human-Computer Performance. International Journal
of Human-Computer Interaction 8, 2 (1996), 189–211.
[70] Iffat A Gheyas and Ali E Abdallah. 2016. Detection and prediction of insider threats to cyber security: a systematic literature review
and meta-analysis. Big Data Analytics 1, 1 (2016), 1–29.
[71] Shafi Goldwasser and Silvio Micali. 1984. Probabilistic encryption. J. Comput. System Sci. 28, 2 (apr 1984), 270–299.
[72] Stephanie Gootman. 2016. OPM hack: The most dangerous threat to the federal government today. Journal of Applied Security Research
11, 4 (2016), 517–525.
[73] Frank L Greitzer and Deborah A Frincke. 2010. Combining traditional cyber security audit data with psychosocial data: towards
predictive modeling for insider threat mitigation. In Insider threats in cyber security. Springer, 85–113.
[74] Wenbo Guo, Dongliang Mu, Jun Xu, Purui Su, Gang Wang, and Xinyu Xing. 2018. Lemna: Explaining deep learning based security
applications. Proceedings of the ACM Conference on Computer and Communications Security (oct 2018), 364–379.
[75] Hani Hagras. 2018. Toward Human-Understandable, Explainable AI. Computer 51, 9 (sep 2018), 28–36.
[76] P A Hancock, Tara Kajaks, Jeff K Caird, Mark H Chignell, Sachi Mizobuchi, Peter C. Burns, Jing Feng, Geoff R Fernie, Martin Lavallière,
Ian Y. Noy, Donald A Redelmeier, and Brenda H. Vrkljan. 2020. Challenges to Human Drivers in Increasingly Automated Vehicles.
Human Factors 62, 2 (mar 2020), 310–328.
[77] Richard Harang and Peter Guarino. 2012. Clustering of Snort alerts to identify patterns and reduce analyst workload. In Proceedings -
IEEE Military Communications Conference MILCOM.
[78] Michael Hart, Pratyusa Manadhata, and Rob Johnson. 2011. Text Classification for Data Loss Prevention. Privacy Enhancing Technologies
(2011), 18–37.
[79] W. U. Hassan, MA Noureddine, P. Datta, and A. Bates. 2020. OmegaLog: High-fidelity attack investigation via transparent multi-layer
log analysis. In Network and Distributed System Security Symposium.
[80] Morgan Henrie. 2013. Cyber security risk management in the scada critical infrastructure environment. EMJ - Engineering Management
Journal 25, 2 (jun 2013), 38–45.
[81] Robert R Hoffman, Shane T Mueller, Gary Klein, and Jordan Litman. 2018. Metrics for Explainable AI: Challenges and Prospects.
arXiv:1812.04608
[82] Andreas Holzinger, Markus Plass, Michael Kickmeier-Rust, Katharina Holzinger, Gloria Cerasela Crişan, Camelia M. Pintea, and Vasile
Palade. 2019. Interactive machine learning: experimental evidence for the human in the algorithmic loop: A case study on Ant Colony
Optimization. Applied Intelligence 49, 7 (jul 2019), 2401–2414.
[83] Ivan Homoliak, Flavio Toffalini, Juan Guarnizo, Yuval Elovici, and Martín Ochoa. 2019. Insight into insiders and it: A survey of insider
threat taxonomies, analysis, modeling, and countermeasures. ACM Computing Surveys (CSUR) 52, 2 (2019), 1–40.
[84] Anne Honkaranta, Tiina Leppanen, and Andrei Costin. 2021. Towards Practical Cybersecurity Mapping of STRIDE and CWE - A
Multi-perspective Approach. Conference of Open Innovation Association, FRUCT (may 2021), 150–159.
[85] Feng-Yung Hu. 2016. Russian Intervention: Paranoia or Weapon for National Security? From the Perspective on Public Diplomacy.
Washington Post (2016).
[86] Rui Hu, Zheng Yan, Wenxiu Ding, and Laurence T. Yang. 2020. A survey on data provenance in IoT. World Wide Web 23, 2 (mar 2020),
1441–1463.
[87] Vincent C Hu, David Ferraiolo, Rick Kuhn, Arthur R Friedman, Alan J Lang, Margaret M Cogdell, Adam Schnitzer, Kenneth Sandlin,
Robert Miller, Karen Scarfone, et al. 2013. Guide to attribute based access control (ABAC) definition and considerations (draft). NIST
special publication 800, 162 (2013).
[88] Sebastiaan P. Huber, Spyros Zoupanos, Martin Uhrin, Leopold Talirz, Leonid Kahle, Rico Häuselmann, Dominik Gresch, Tiziano Müller,
Aliaksandr V. Yakutovich, Casper W. Andersen, Francisco F. Ramirez, Carl S. Adorf, Fernando Gargiulo, Snehal Kumbhar, Elsa Passaro,
Conrad Johnston, Andrius Merkys, Andrea Cepellotti, Nicolas Mounet, Nicola Marzari, Boris Kozinsky, and Giovanni Pizzi. 2020.
AiiDA 1.0, a scalable computational infrastructure for automated reproducible workflows and data provenance. Scientific Data 7, 1 (sep
2020), 1–18. arXiv:2003.12476
[89] Jeffrey Hunker and Christian W Probst. 2011. Insiders and Insider Threats-An Overview of Definitions and Mitigation Techniques. J.
Wirel. Mob. Networks Ubiquitous Comput. Dependable Appl. 2, 1 (2011), 4–27.
[90] Eric M Hutchins, Michael J Cloppert, Rohan M Amin, et al. 2011. Intelligence-driven computer network defense informed by analysis
of adversary campaigns and intrusion kill chains. Leading Issues in Information Warfare & Security Research 1, 1 (2011), 80.
[91] Sotiris Ioannidis, Angelos D Keromytis, Steve M Bellovin, and Jonathan M Smith. 2000. Implementing a Distributed Firewall. Proceedings
of the 7th ACM conference on Computer and communications security (2000), 190–199.
[92] Graeme Jenkinson, Lucian Carata, Nikilesh Balakrishnan, Thomas Bytheway, Ripduman Sohan, Robert N M Watson, Jonathan Anderson,
Brian Kidney, Amanda Strnad, and Arun Thomas. 2017. Applying Provenance in APT Monitoring and Analysis: Practical Challenges
for Scalable, Efficient and Trustworthy Distributed Provenance. 9th USENIX Workshop on the Theory and Practice of Provenance (2017).
[93] Xin Jin, Ram Krishnan, and Ravi Sandhu. 2012. A Unified Attribute-Based Access Control Model Covering DAC, MAC and RBAC.
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2012),
41–55.
[94] Shijoe Jose, D. Malathi, Bharath Reddy, and Dorathi Jayaseeli. 2018. A Survey on Anomaly Based Host Intrusion Detection System. In
Journal of Physics: Conference Series, Vol. 1000. Institute of Physics Publishing, 12049.
[95] Nektaria Kaloudi and Jingyue Li. 2020. The AI-based cyber threat landscape: A survey. ACM Computing Surveys (CSUR) 53, 1
(feb 2020).
[96] Adi Karahasanovic, Pierre Kleberger, and Magnus Almgren. 2017. Adapting Threat Modeling Methods for the Automotive Industry.
Unpublished manuscript (2017).
[97] Mike Karp. 2005. Keep on truckin’ your back-up tapes? You’ve got to be kidding! | Network World.
https://www.networkworld.com/article/2320740/keep-on-truckin--your-back-up-tapes--you-ve-got-to-be-kidding-.html
[98] Grigoris Karvounarakis, Zachary G. Ives, and Val Tannen. 2010. Querying data provenance. Proceedings of the ACM SIGMOD
International Conference on Management of Data (2010), 951–962.
[99] Kelly M Kavanagh, Oliver Rochford, and Toby Bussa. 2015. Magic quadrant for security information and event management. Gartner
Group Research Note (2015).
[100] Salman Khaliq, Zain Ul Abideen Tariq, and Ammar Masood. 2020. Role of User and Entity Behavior Analytics in Detecting Insider
Attacks. 1st Annual International Conference on Cyber Warfare and Security, ICCWS 2020 - Proceedings (oct 2020).
[101] Rafiullah Khan, Kieran McLaughlin, David Laverty, and Sakir Sezer. 2017. STRIDE-based threat modeling for cyber-physical systems.
2017 IEEE PES Innovative Smart Grid Technologies Conference Europe, ISGT-Europe 2017 - Proceedings (jul 2017), 1–6.
[102] Dennis Kiwia, Ali Dehghantanha, Kim Kwang Raymond Choo, and Jim Slaughter. 2018. A cyber kill chain based taxonomy of banking
Trojans for evolutionary computational intelligence. Journal of Computational Science 27 (jul 2018), 394–409.
[103] Loren Kohnfelder and Praerit Garg. 1999. The threats to our products. Microsoft.
[104] Maria Korolov and Lysa Myers. 2018. What is the Cyber Kill Chain? Why It’s Not Always the Right Approach to Cyber Attacks. CSO.
[105] Igor Kotenko and Evgenia Novikova. 2014. Visualization of security metrics for cyber situation awareness. Proceedings - 9th International
Conference on Availability, Reliability and Security, ARES 2014 (12 2014), 506–513. https://doi.org/10.1109/ARES.2014.75
[106] Srinivas Krishnan, Kevin Z. Snow, and Fabian Monrose. 2012. Trail of bytes: New techniques for supporting data provenance and
limiting privacy breaches. IEEE Transactions on Information Forensics and Security 7, 6 (2012), 1876–1889.
[107] Sailesh Kumar. 2007. Survey of Current Network Intrusion Detection Techniques. Washington Univ. in St. Louis (2007).
[108] Roger Kwon, Travis Ashley, Jerry Castleberry, Penny McKenzie, and Sri Nikhil Gupta Gourisetti. 2020. Cyber threat dictionary using
MITRE ATT&CK matrix and NIST cybersecurity framework mapping. 2020 Resilience Week, RWS 2020 (oct 2020), 106–112.
[109] Butler W. Lampson. 1974. Protection. ACM SIGOPS Operating Systems Review 8, 1 (jan 1974), 18–24.
[110] Aleksandar Lazarevic, Levent Ertoz, Vipin Kumar, Aysel Ozgur, and Jaideep Srivastava. 2003. A Comparative Study of Anomaly
Detection Schemes in Network Intrusion Detection. Proceedings of the 2003 SIAM International Conference on Data Mining (SDM) (may
2003), 25–36.
[111] Duc C. Le, Nur Zincir-Heywood, and Malcolm I. Heywood. 2020. Analyzing Data Granularity Levels for Insider Threat Detection
Using Machine Learning. IEEE Transactions on Network and Service Management 17, 1 (3 2020), 30–44. https://doi.org/10.1109/TNSM.2020.2967721
[112] Hyunjung Lee, Suryeon Lee, Kyounggon Kim, and Huy Kang Kim. 2021. HSViz: Hierarchy Simplified Visualizations for Firewall Policy
Analysis. IEEE Access 9 (2021), 71737–71753.
[113] John D. Lee and Neville Moray. 1994. Trust, self-Confidence, and operators’ adaptation to automation. International Journal of Human -
Computer Studies 40, 1 (1994), 153–184.
[114] John D. Lee and Katrina A. See. 2004. Trust in automation: Designing for appropriate reliance. Human Factors 46, 1 (2004), 50–80.
[115] Xueping Liang, Sachin Shetty, Deepak Tosh, Charles Kamhoua, Kevin Kwiat, and Laurent Njilla. 2017. ProvChain: A Blockchain-Based
Data Provenance Architecture in Cloud Environment with Enhanced Privacy and Availability. Proceedings - 2017 17th IEEE/ACM
International Symposium on Cluster, Cloud and Grid Computing, CCGRID 2017 (jul 2017), 468–477.
[116] Liu Liu, Olivier De Vel, Qing-Long Han, Jun Zhang, and Yang Xiang. 2018. Detecting and preventing cyber insider threats: A survey.
IEEE Communications Surveys & Tutorials 20, 2 (2018), 1397–1417.
[117] Simon Liu and Rick Kuhn. 2010. Data loss prevention. IT Professional 12, 2 (mar 2010), 10–13.
[118] Lockheed Martin. 2022. Cyber Kill Chain. https://www.lockheedmartin.com/en-us/capabilities/cyber/cyber-kill-chain.html
[119] Xin Luo, Richard Brody, Alessandro Seazzu, and Stephen Burd. 2011. Social Engineering: The Neglected Human Factor for Information
Security Management. Information Resources Management Journal (IRMJ) 24, 3 (2011), 1–8. https://doi.org/10.4018/IRMJ.2011070101
[120] Tyson Macaulay. 2016. RIoT Control: Understanding and Managing Risks and the Internet of Things.
[121] Florian Mansmann, Timo Göbel, and William Cheswick. 2012. Visual analysis of complex firewall configurations. ACM International
Conference Proceeding Series (2012), 1–8.
[122] Aaron Marback, Hyunsook Do, Ke He, Samuel Kondamarri, and Dianxiang Xu. 2013. A threat model-based approach to security testing.
Software: Practice and Experience 43, 2 (feb 2013), 241–258.
[123] Goncalo Martins, Sajal Bhatia, Xenofon Koutsoukos, Keith Stouffer, Cheeyee Tang, and Richard Candell. 2015. Towards a systematic
threat modeling approach for cyber-physical systems. Proceedings - 2015 Resilience Week, RSW 2015 (oct 2015), 114–119.
[124] Earl D. Matthews, Harold J. Arata III, and Brian L. Hale. 2016. Cyber situational awareness. JSTOR: The Cyber Defense Review 1, 1
(2016), 35–46.
[125] Vasileios Mavroeidis and Audun Jøsang. 2018. Data-Driven Threat Hunting Using Sysmon. Proceedings of the 2nd International
Conference on Cryptography, Security and Privacy (2018).
[126] McAfee. 2021. Advanced Threat Research Report. (2021).
[127] CSIS McAfee. 2014. Net losses: estimating the global cost of cybercrime. McAfee, Centre for Strategic & International Studies (2014).
[128] Michael Mesnier, Eno Thereska, Gregory R. Ganger, Daniel Ellard, and Margo Seltzer. 2004. File classification in self-* storage systems.
Proceedings - International Conference on Autonomic Computing (2004), 44–51.
[129] Md Nazmus Sakib Miazi, Mir Mehedi A. Pritom, Mohamed Shehab, Bill Chu, and Jinpeng Wei. 2017. The design of cyber threat hunting
games: A case study. 2017 26th International Conference on Computer Communications and Networks, ICCCN 2017 (sep 2017).
[130] MITRE ATT&CK. [n.d.]. ATT&CK Matrix for Enterprise. https://attack.mitre.org/
[131] Iyatiti Mokube and Michele Adams. 2007. Honeypots: Concepts, approaches, and challenges. In Proceedings of the Annual Southeast
Conference, Vol. 2007. 321–326.
[132] B Mukherjee, LT Heberlein, and KN Levitt. 1994. Network intrusion detection. IEEE Network (1994), 26–41.
[133] Masoud Narouei, Hamed Khanpour, Hassan Takabi, Natalie Parde, and Rodney Nielsen. 2017. Towards a top-down policy engineering
framework for attribute-based access control. Proceedings of ACM Symposium on Access Control Models and Technologies, SACMAT (jun
2017), 103–114.
[134] Rida Nasir, Mehreen Afzal, Rabia Latif, and Waseem Iqbal. 2021. Behavioral Based Insider Threat Detection Using Deep Learning. IEEE
Access 9 (2021), 143266–143274. https://doi.org/10.1109/ACCESS.2021.3118297
[135] Peter G Neumann. 2010. Combatting insider threats. In Insider Threats in Cyber Security. Springer, 17–44.
[136] Jakob Nielsen. 2004. Usability engineering. In Computer Science Handbook, Second Edition. 45–1–45–21.
[137] Kaiti Norton. 2020. Antivirus vs. EPP vs. EDR: How to Secure Your Endpoints.
[138] Evgenia Novikova and Igor Kotenko. 2013. Analytical visualization techniques for security information and event management.
Proceedings of the 2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, PDP 2013 (2013),
519–525. https://doi.org/10.1109/PDP.2013.84
[139] Jason RC Nurse, Oliver Buckley, Philip A Legg, Michael Goldsmith, Sadie Creese, Gordon RT Wright, and Monica Whitty. 2014.
Understanding insider threat: A framework for characterising attacks. In 2014 IEEE Security and Privacy Workshops. IEEE, 214–228.
[140] Sylvia Osborn. 1997. Mandatory access control and role-based access control revisited. In Proceedings of the ACM Workshop on
Role-Based Access Control. 31–40.
[141] Y Ou, Y Lin, and Y Zhang. 2010. The design and implementation of host-based intrusion detection system. (2010), 595–598.
[142] Vassilis Papaspirou, Leandros Maglaras, Mohamed Amine Ferrag, Ioanna Kantzavelou, Helge Janicke, and Christos Douligeris. 2021. A
novel Two-Factor HoneyToken Authentication Mechanism. Proceedings - International Conference on Computer Communications and
Networks, ICCCN (jul 2021). arXiv:2012.08782
[143] Jaehong Park and Ravi Sandhu. 2004. The UCONABC usage control model. ACM Transactions on Information and System Security
(TISSEC) 7, 1 (feb 2004), 128–174.
[144] Kamran Parsaye and Mark Chignell. 1988. Expert Systems for experts. (1988).
[145] Charles Perrow. 1981. Normal accident at three Mile Island. Technical Report 5. 17–26 pages.
[146] John Pescatore. 2021. SANS 2021 Top New Attacks and Threat Report. (2021). https://www.rapid7.com/info/sans-2021-new-attacks-threat-report/
[147] Robert Petrunić. 2015. Honeytokens as active defense. 38th International Convention on Information and Communication Technology,
Electronics and Microelectronics, MIPRO 2015 - Proceedings (jul 2015), 1313–1317.
[148] Shari Lawrence Pfleeger, Joel B Predd, Jeffrey Hunker, and Carla Bulford. 2009. Insiders behaving badly: Addressing bad actors and
their actions. IEEE transactions on information forensics and security 5, 1 (2009), 169–179.
[149] Charles E Phillips, T C Ting, and Steven A Demurjian. 2002. Information Sharing and Security in Dynamic Coalitions. Proceedings of
the seventh ACM symposium on Access control models and technologies - SACMAT ’02 (2002).
[150] Oskars Podzins and Andrejs Romanovs. 2019. Why SIEM is Irreplaceable in a Secure IT Environment? 2019 Open Conference of
Electrical, Electronic and Information Sciences, eStream 2019 - Proceedings (4 2019). https://doi.org/10.1109/ESTREAM.2019.8732173
[151] Davy Preuveneers and Wouter Joosen. 2021. Sharing Machine Learning Models as Indicators of Compromise for Cyber Threat
Intelligence. Journal of Cybersecurity and Privacy 2021, Vol. 1, Pages 140-163 1, 1 (feb 2021), 140–163.
[152] Danny Dhillon. 2011. Developer-driven threat modeling: Lessons learned in the trenches. IEEE Security & Privacy 9, 4 (2011), 41–47.
[153] Niels Provos. 2004. A virtual honeypot framework. Proceedings of the 13th USENIX Security Symposium (2004).
[154] Ben Quinn and Charles Arthur. 2011. PlayStation Network hackers access data of 77 million users. The Guardian 27 (2011).
[155] Fahimeh Raja, Kirstie Hawkey, and Konstantin Beznosov. 2009. Towards improving mental models of personal firewall users. Conference
on Human Factors in Computing Systems - Proceedings (2009), 4633–4638.
[156] Fahimeh Raja, Kai Le Clement Wang, Kirstie Hawkey, Konstantin Beznosov, and Steven Hsu. 2011. Promoting a physical security
mental model for personal firewall warnings. Conference on Human Factors in Computing Systems - Proceedings (2011), 1585–1590.
[157] Pedro Ramos Brandao and João Nunes. 2021. Extended Detection and Response Importance of Events Context. Kriative.tech (2021).
[158] R. Rengarajan and S. Babu. 2021. Anomaly Detection using User Entity Behavior Analytics and Data Visualization. 8th International
Conference on Computing for Sustainable Global Development (2021), 842–847.
[159] Ian Reynolds. 2020. 2020 SANS Network Visibility and Threat Detection Survey. SANS Institute April (2020).
https://www.sans.org/webcasts/network-visibility-threat-detection-survey-112595
[160] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. "Why should i trust you?" Explaining the predictions of any classifier.
Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 13-17-Augu (aug 2016), 1135–1144.
arXiv:1602.04938
[161] R. L. Rivest, A. Shamir, and L. Adleman. 1978. A Method for Obtaining Digital Signatures and Public-Key Cryptosystems. Commun.
ACM 21, 2 (feb 1978), 120–126.
[162] Scott Rose, Oliver Borchert, Stu Mitchell, and Sean Connelly. 2019. Zero Trust Architecture. Technical Report.
[163] Bushra Sabir, Faheem Ullah, M. Ali Babar, and Raj Gaire. 2021. Machine Learning for Detecting Data Exfiltration: A Review. Comput.
Surveys 54, 3 (jun 2021).
[164] Fatima Salahdine and Naima Kaabouch. 2019. Social Engineering Attacks: A Survey. Future Internet 11, 4 (2019), 89.
https://doi.org/10.3390/FI11040089
[165] Malek Ben Salem, Shlomo Hershkop, and Salvatore J Stolfo. 2008. A survey of insider attack detection research. Insider Attack and
Cyber Security (2008), 69–90.
[166] Ravi S. Sandhu. 1993. Lattice-Based Access Control Models. Computer 26, 11 (1993), 9–19.
[167] Ravi S. Sandhu. 1998. Role-based Access Control. Advances in Computers 46, C (jan 1998), 237–286.
[168] Ravi S. Sandhu, Edward J. Coyne, Hal L. Feinstein, and Charles E. Youman. 1996. Computer role-based access control models. Computer
29, 2 (feb 1996), 38–47.
[169] Ravi S. Sandhu and Pierangela Samarati. 1994. Access Control: Principles and Practice. IEEE Communications Magazine 32, 9 (1994),
40–48.
[170] Riccardo Scandariato, Kim Wuyts, and Wouter Joosen. 2015. A descriptive study of Microsoft’s threat modeling technique. Requirements
Engineering 20, 2 (mar 2015), 163–180.
[171] Peter Schaab, Kristian Beckers, and Sebastian Pape. 2017. Social engineering defence mechanisms and counteracting training strategies.
Information and Computer Security 25 (2017), 206–222. Issue 2. https://doi.org/10.1108/ICS-04-2017-0022/FULL/HTML
[172] G. Scott Graham and Peter J. Denning. 1972. Protection: Principles and practice. Proceedings of the Spring Joint Computer Conference,
AFIPS 1972 (may 1972), 417–429.
[173] Daniel Servos and Sylvia L Osborn. 2017. Current research and open problems in attribute-based access control. ACM Computing
Surveys (CSUR) 49, 4 (2017), 1–45.
[174] Burr Settles. 2009. Active learning literature survey. Technical Report (2009).
[175] Burr Settles. 2011. From theories to queries: Active learning in practice. JMLR: Workshop and Conference Proceedings 16 (2011), 1–18.
[176] William Seymour. 2019. Privacy therapy with ARETHA: What if your firewall could talk? Conference on Human Factors in Computing
Systems - Proceedings (may 2019).
[177] A Shabtai, Y Elovici, and L Rokach. 2012. A survey of data leakage detection and prevention solutions. Springer (2012).
[178] Dave Shackleford. 2016. SANS 2016 security analytics survey. SANS Institute, Swansea (2016).
[179] Adi Shamir. 1979. How to share a secret. Commun. ACM 22, 11 (nov 1979), 612–613.
[180] Balaram Sharma, Prabhat Pokharel, and Basanta Joshi. 2020. User Behavior Analytics for Anomaly Detection Using LSTM Autoencoder:
Insider Threat Detection. Proceedings of the 11th International Conference on Advances in Information Technology (2020), 1–9.
[181] Rupam Kumar Sharma, Hemanta Kumar Kalita, and Biju Issac. 2014. Different firewall techniques: A survey. 5th International Conference
on Computing Communication and Networking Technologies, ICCCNT 2014 (nov 2014).
[182] Thomas B Sheridan and Robert T Hennessy. 1984. Research and Modeling of Supervisory Control Behavior. Technical Report.
[183] N Shevchenko, TA Chick, P O’Riordan, and TP Scanlon. 2018. Threat modeling: a summary of available methods. Carnegie Mellon
University Software Engineering Institute (2018).
[184] Adam Shostack. 2008. Experiences Threat Modeling at Microsoft. (2008).
[185] Adam Shostack. 2014. Threat Modeling: Designing for Security.
[186] Yogesh L. Simmhan, Beth Plale, and Dennis Gannon. 2005. A survey of data provenance in e-science. ACM SIGMOD Record 34, 3 (sep
2005), 31–36.
[187] Jussi Simola and Jyri Rajamäki. 2017. Hybrid emergency response model: Improving cyber situational awareness. In European Conference
on Information Warfare and Security, ECCWS. 442–451. www.laurea.fi
[188] Michael Sivak, Daniel J. Weintraub, and Michael Flannagan. 1991. Nonstop Flying Is Safer Than Driving. Risk Analysis 11, 1 (1991),
145–148.
[189] Miles E. Smid and Dennis K. Branstad. 1988. The Data Encryption Standard: Past and Future. Proc. IEEE 76, 5 (1988), 550–559.
[190] Philip J. Smith, C. Elaine McCoy, and Charles Layton. 1997. Brittleness in the design of cooperative problem-solving systems: The
effects on user performance. IEEE Transactions on Systems, Man, and Cybernetics Part A:Systems and Humans. 27, 3 (1997), 360–371.
[191] Luke S. Snyder, Yi Shan Lin, Morteza Karimzadeh, Dan Goldwasser, and David S. Ebert. 2019. Interactive learning for identifying
relevant tweets to support real-time situational awareness.
[192] Lance Spitzner. 2003. Honeypots: Catching the insider threat. Proceedings - Annual Computer Security Applications Conference, ACSAC
2003-Janua (2003), 170–179.
[193] L. Spitzner. 2003. Honeytokens: The other honeypot.
[194] Lance Spitzner. 2003. The honeynet project: Trapping the hackers. IEEE Security and Privacy 1, 2 (2003), 15–23.
[195] Shreyas Srinivasa, Jens Myrup Pedersen, and Emmanouil Vasilomanolakis. 2020. Towards systematic honeytoken fingerprinting. 13th
International Conference on Security of Information and Networks (2020).
[196] J Steven. 2010. Threat modeling-perhaps it’s time. IEEE Security & Privacy (2010).
[197] SJ Stolfo, SM Bellovin, S Hershkop, and AD Keromytis. 2008. Insider attack and cyber security: beyond the hacker. (2008).
[198] Jeremy Straub. 2020. Modeling Attack, Defense and Threat Trees and the Cyber Kill Chain, ATT&CK and STRIDE Frameworks as
Blackboard Architecture Networks. Proceedings - 2020 IEEE International Conference on Smart Cloud, SmartCloud (nov 2020), 148–153.
[199] B. E. Strom, A. Applebaum, D. P. Miller, K. C. Nickels, A. G. Pennington, and C. B. Thomas. 2018. MITRE ATT&CK: Design and philosophy.
Technical report (2018).
[200] Frank Swiderski and Window Snyder. 2004. Threat modeling. Microsoft Press.
[201] Dan Swinhoe. 2019. The biggest data breach fines, penalties and settlements so far. CSO, Framingham, jul (2019).
[202] Dan Swinhoe. 2020. The 15 biggest data breaches of the 21st century. CSO. Last modified (2020).
[203] Mohammad M. Bany Taha, Sivadon Chaisiri, and Ryan K.L. Ko. 2015. Trusted tamper-evident data provenance. Proceedings - 14th IEEE
International Conference on Trust, Security and Privacy in Computing and Communications, TrustCom 2015 1 (dec 2015), 646–653.
[204] Radwan Tahboub and Yousef Saleh. 2014. Data leakage/loss prevention systems (DLP). 2014 World Congress on Computer Applications
and Information Systems, WCCAIS 2014 (oct 2014).
[205] Baoming Tang, Qiaona Hu, and Derek Lin. 2017. Reducing false positives of user-to-entity first-access alerts for user behavior analytics.
IEEE International Conference on Data Mining Workshops, ICDMW (dec 2017), 804–811.
[206] Adem Tekerek, Cemal Gemci, and Omer Faruk Bay. 2014. Development of a hybrid web application firewall to prevent web based
attacks. 8th IEEE International Conference on Application of Information and Communication Technologies, AICT 2014 - Conference
Proceedings (2014).
[207] Erdem Ucar and Erkan Ozhan. 2017. The Analysis of Firewall Policy Through Machine Learning and Data Mining. Wireless Personal
Communications 96, 2 (sep 2017), 2891–2909.
[208] Faheem Ullah, Matthew Edwards, Rajiv Ramdhany, Ruzanna Chitchyan, M Ali Babar, and Awais Rashid. 2018. Data exfiltration: A
review of external attack vectors and countermeasures. Journal of Network and Computer Applications 101 (2018), 18–54.
[209] AV Uzunov and EB Fernandez Interfaces. 2014. An extensible pattern-based library and taxonomy of security threats for distributed
systems. Elsevier - Computer Standards (2014).
[210] Antonio Varriale, Paolo Prinetto, Alberto Carelli, and Pascal Trotta. 2016. SEcube ™ : Data at Rest and Data in Motion Protection.
International Conference Security and Management (2016), 138–145.
[211] Verizon. 2020. 2020 Data Breach Investigations Report. https://enterprise.verizon.com/resources/reports/dbir/
[212] Rakesh Verma, Murat Kantarcioglu, David Marchette, Ernst Leiss, and Thamar Solorio. 2015. Security analytics: Essential data analytics
knowledge for cybersecurity professionals and students. IEEE Security and Privacy 13, 6 (2015), 60–65.
[213] Luca Vigano and Daniele Magazzeni. 2020. Explainable Security. In Proceedings - 5th IEEE European Symposium on Security and Privacy
Workshops, Euro S and PW 2020. 293–300. arXiv:1807.04178
[214] Ke Wang and Salvatore J. Stolfo. 2004. Anomalous Payload-Based Network Intrusion Detection. Lecture Notes in Computer Science
(including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 3224 (2004), 203–222.
[215] Qi Wang, Wajih Ul Hassan, Ding Li, Kangkook Jee, Xiao Yu, Kexuan Zou, Junghwan Rhee, Zhengzhang Chen, Wei Cheng, Carl A
Gunter, and Haifeng Chen. 2020. You Are What You Do: Hunting Stealthy Malware via Data Provenance Analysis. Network and
Distributed Systems Security (NDSS) Symposium 2020 (2020).
[216] David Watson and Jamie Riden. 2008. The honeynet project: Data collection tools, infrastructure, archives and analysis. Technical Report.
24–30 pages.
[217] Imano Williams and Xiaohong Yuan. 2015. Evaluating the Effectiveness of Microsoft Threat Modeling Tool. Proceedings of the 2015
Information Security Curriculum Development Conference (2015).
[218] Martyn Williams. 2017. Inside the Russian hack of Yahoo: How they did it.
[219] Avishai Wool. 2004. A Quantitative Study of Firewall Configuration Errors. Computer 37, 6 (2004), 62–67.
[220] Sun Wu and Udi Manber. 1994. A fast algorithm for multi-pattern searching. (1994).
[221] Tobias Wüchner and Alexander Pretschner. 2012. Data loss prevention based on data-driven usage control. Proceedings - International
Symposium on Software Reliability Engineering, ISSRE (2012), 151–160.
[222] Wenjun Xiong, Emeline Legrand, Oscar Åberg, and Robert Lagerström. 2022. Cyber security threat modeling based on the MITRE
Enterprise ATT&CK Matrix. Software and Systems Modeling 21, 1 (feb 2022), 157–177.
[223] W Xiong and R Lagerström Security. 2019. Threat modeling–A systematic literature review. Elsevier Computers & security (2019).
[224] Kaiping Xue, Weikeng Chen, Wei Li, Jianan Hong, and Peilin Hong. 2018. Combining Data Owner-Side and Cloud-Side Access Control
for Encrypted Cloud Storage. IEEE Transactions on Information Forensics and Security 13, 8 (aug 2018), 2062–2074.
[225] T Yadav and AM Rao. 2015. Technical aspects of cyber kill chain. International Symposium on Security in Computing and Communication
(2015), 438–452.
[226] Ran Yahalom, Erez Shmueli, and Tomer Zrihen. 2010. Constrained Anonymization of Production Data: A Constraint Satisfaction
Problem Approach. Secure Data Management (2010), 41–53.
[227] Jae yeol Kim and Hyuk Yoon Kwon. 2022. Threat classification model for security information event management focusing on model
efficiency. Computers & Security 120 (9 2022), 102789. https://doi.org/10.1016/J.COSE.2022.102789
[228] Faheem Zafar, Abid Khan, Saba Suhail, Idrees Ahmed, Khizar Hameed, Hayat Mohammad Khan, Farhana Jabeen, and Adeel Anjum.
2017. Trustworthy data: A survey, taxonomy and future trends of secure provenance schemes. Journal of Network and Computer
Applications 94 (sep 2017), 50–68.
[229] Marzia Zaman and Chung Horng Lung. 2018. Evaluation of machine learning techniques for network intrusion detection. IEEE/IFIP
Network Operations and Management Symposium: Cognitive Management in a Cyber World, NOMS 2018 (jul 2018), 1–5.
[230] Xiaopeng Zhang. 2022. Phishing Campaign Delivering Three Fileless Malware: AveMariaRAT / BitRAT / PandoraHVNC – Part I |
FortiGuard Labs.
[231] Xinyou Zhang, Chengzhong Li, and Wenbin Zheng. 2004. Intrusion prevention system design. Proceedings - The Fourth International
Conference on Computer and Information Technology (CIT 2004) (2004), 386–390.