
Enhancements to Threat, Vulnerability, and Mitigation Knowledge for Cyber Analytics, Hunting, and Simulations

Published: 21 March 2024

Abstract

Cross-linked threat, vulnerability, and defensive mitigation knowledge is critical in defending against diverse and dynamic cyber threats. Cyber analysts consult it deductively or inductively, creating a chain of reasoning from observed indicators to a threat or vice versa. Cyber hunters reason with it abductively when hypothesizing specific threats. Threat modelers use it to explore threat postures. We aggregate five public sources of threat knowledge and three public sources of knowledge describing cyber defensive mitigations, analytics, and engagements, which share some unidirectional links between them. We unify the sources into a graph, and in the graph we make all unidirectional cross-source links bidirectional. This enhancement of the knowledge makes the questions that analysts and automated systems formulate easier to answer. We demonstrate this in the context of various cyber analytic and hunting tasks as well as modeling and simulations. Because the existing cross-source links are sparse, to further increase the analytic utility of the data, we use natural language processing and supervised machine learning to identify new links. These two contributions demonstrably increase the value of the knowledge sources for cyber security activities.

1 Introduction

Numerous technical approaches defend computer networks from security threats. At least three benefit from drawing upon interconnected, online, and professionally curated sources that offer extensive threat, target, and defensive knowledge extracted from security reports: Cyber analysts follow entries that span the sources to trace connections among exposures, threats, and defensive solutions [2, 4, 6, 10, 13, 14, 15, 16, 17, 25, 28, 52]. In cyber hunting [33], the hunter consults threat and vulnerability knowledge to characterize a particular threat and pinpoint where they would find evidence indicative of it being active. Finally, in modeling and simulation, the knowledge sources are consulted to simulate adapting threats, offer different defenses, and evaluate threat-hardening solutions [64].
Our interests lie in making this sort of knowledge more accessible to cyber analysts, cyber hunters, and threat modelers, while enhancing it for their purposes. We work with an existing comprehensive aggregation of public sources of threat and vulnerability knowledge and of public sources that describe cyber defensive countermeasures. Specifically1 (see also Section A.2.2), this aggregation consists of:
Knowledge about the behavior of advanced persistent threat (APT)2 tactics, techniques, and procedures (TTPs), as they are classified and described in the MITRE Adversarial Tactics, Techniques, and Common Knowledge (ATT&CK) [36] matrix. This is an evolving knowledge base.
Knowledge about APT attack patterns as they are described in the MITRE Common Attack Pattern Enumeration and Classification (CAPEC) [37] dictionary. Attack patterns connect specific sets of TTPs to specific software weaknesses.
Software and hardware weaknesses as they are listed in the MITRE Common Weakness Enumeration (CWE) [39] dictionary.
Common vulnerabilities and exposures that are in the NIST National Vulnerability Database (CVE) [38].
Descriptions of cyber tools for exploitation, like Metasploit [55], which is used for penetration testing, as they are listed in ExploitDB (EDB) [46].
Descriptions of cyber defensive counter measures as they are listed in a knowledge graph named D3FEND [30].
Knowledge of deployable defensive analytics as listed in the Cyber Analytics Repository (CAR) [34].
A framework for adversary engagements (Engage) [35] that provides resources for effective and safe defensive denial and deception of adversaries.
These internet-based sources are interconnected by unidirectional hyperlinks to web addresses (URLs) starting from an entry in one source and linking to an entry in another source. A link between a specific pair of sources has a specific meaning. For example, a link from an attack pattern entry in CAPEC to an entry in ATT&CK indicates that the attack pattern named and described in the CAPEC entry uses the technique described in the ATT&CK entry. Table 1’s “Relationship in Sources” column shows the different types of directional links found between pairs of the knowledge sources.
Table 1.
From, To Sources | Relationship in sources | Transposed relationship
CAPEC, ATT&CK | technique uses attack-pattern | attack-pattern used by technique (BRON)
CWE, CVE | weakness allows vulnerability | vulnerability allowed by weakness
CWE, CAPEC | weakness enables attack-pattern | attack-pattern enabled by weakness
D3FEND, ATT&CK | countermeasure alleviates technique | technique alleviated by countermeasure (BRON)
Engage, ATT&CK | engagement counters technique | technique countered by engagement (BRON)
EDB, CVE | tool accomplishes vulnerability | vulnerability accomplished by tool (BRON)
Table 1. The Unidirectional Links (middle) within the Knowledge Sources (Left Column) and Their Transpositions (Right Column)
Some links are added by the BRON project (marked BRON). The resulting BRON property graph has bidirectional edges. Note, at the date of publication, one-third of the transpositions exist in the sources.
Our goals are to:
(1) Enhance the sources we have chosen in support of cyber analytics and hunting by transposing the unidirectional links to create bidirectional relational knowledge, and to demonstrate use cases.
(2) Fill gaps in the relational knowledge linking the data sources.
We introduce the reader to an existing aggregative representation, a property graph3 named BRON.4 The BRON project offers software that unifies these sources by creating the BRON graph. Each entry is a node, and links between pairs of entries are edges without any direction, i.e., they are bidirectional [3, 20] (for details, see Section 2). Significantly, and with a modest implementation cost, BRON enhances the knowledge sources with its bidirectional links. The general outcome is that BRON finds [A relates to B] and adds [B relates to A]. More specifically, BRON takes [CAPEC-ENTRY uses ATT&CK-ENTRY, i.e., attack-pattern uses technique] and adds [ATT&CK-ENTRY used by CAPEC-ENTRY, i.e., technique used by attack-pattern], e.g., CAPEC-148: Content Spoofing uses T1491: Defacement and T1491: Defacement used by CAPEC-148: Content Spoofing. The same happens for the other source links (see Table 1 for them and their transpositions). Note that the sources are constantly being curated and some links added by BRON now exist in the sources. This enhancement of the knowledge in BRON makes complex inquiries by analysts and automated systems easier to conduct and also unifies navigation through the sources. We provide some example use cases that demonstrate how the BRON enhancements support hypothesis-driven threat hunting and modeling and simulation with parameterized red and blue agents.
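The transposition can be sketched in a few lines of Python (a toy illustration with a hypothetical link-record shape, not the BRON implementation):

```python
# Toy sketch of BRON-style link transposition (hypothetical data shapes):
# every unidirectional cross-source link (src, relation, dst) gains an
# inverse link (dst, inverse relation, src).

FORWARD = [
    ("CAPEC-148: Content Spoofing", "uses", "T1491: Defacement"),
    ("CWE-287: Improper Authentication", "enables", "CAPEC-60: Reusing Session IDs"),
]

INVERSE = {"uses": "used by", "enables": "enabled by", "allows": "allowed by",
           "alleviates": "alleviated by", "counters": "countered by",
           "accomplishes": "accomplished by"}

def transpose(links):
    """Return the original links plus their transposed counterparts."""
    out = list(links)
    for src, rel, dst in links:
        out.append((dst, INVERSE[rel], src))
    return out

edges = transpose(FORWARD)
assert ("T1491: Defacement", "used by", "CAPEC-148: Content Spoofing") in edges
```

With the transposed links in place, a query can start from either end of a relationship without first inverting the sources' link direction by hand.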
Our second goal is to fill another gap in relational knowledge, which will further enhance the collective value of the sources and improve cyber analytics and hunting. This gap is the sparse connectivity between entries, even with the transposed links added in BRON. There are pairs of entries between the sources that are arguably related but not linked (in either direction). Obviously, this gap is not a fault of the property graph, BRON, because the graph preserves every link in the sources. The gap stems from the challenge of curating a set of existing and constantly updating information sources. As a result of the gap, experts may find that important links between information in the eight sources are unavailable. We show how we use machine learning to infer links. We use a language embedding technique to encode the text of entries into machine-understandable representations. We then use supervised learning, with encoded and labeled examples that we can obtain from BRON, to train link inference models. These techniques are embedded within a workflow that starts with existing examples of linked entries for different relationships that span the knowledge sources. The workflow then uses different language embedding models (LEMs) to encode the text of pairwise related entries. Finally, with a relationship label, these examples form positive examples that are combined with negative examples to train a suite of predictive models. After training, the workflow predicts unseen candidate pairs of linked entries, filters a short-list, and passes these to human experts who rate them to provide assurance. We test this workflow, and experts verify it has found novel, plausible, and interesting relationships.
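The data flow of this workflow can be sketched as follows (a pure-Python toy: a bag-of-words embedding and a nearest-neighbor similarity scorer stand in for the GloVe/BERT embeddings and random forests used in the article; all texts and labels are illustrative):

```python
# Toy sketch of the link-inference workflow:
# 1) embed the concatenated text of a candidate entry pair,
# 2) learn from labeled linked/unlinked pairs,
# 3) score unseen pairs so a curator can threshold a short-list for experts.
from collections import Counter
import math

def embed(text):
    """Bag-of-words embedding: word -> frequency."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Labeled examples: concatenated pair text, 1 = linked in BRON, 0 = not linked.
train = [("improper authentication enables reusing session ids", 1),
         ("use of hard-coded credentials try default usernames and passwords", 1),
         ("path traversal subverting environment variable values", 0)]

def score(candidate):
    """Similarity to linked vs. unlinked training examples, in [0, 1]."""
    pos = [cosine(embed(candidate), embed(t)) for t, y in train if y == 1]
    neg = [cosine(embed(candidate), embed(t)) for t, y in train if y == 0]
    p, n = max(pos, default=0.0), max(neg, default=0.0)
    return p / (p + n) if p + n else 0.5

s = score("improper authentication reusing session ids")
assert 0.0 <= s <= 1.0
```

The actual workflow replaces the similarity scorer with trained classifiers and sorts candidates by predicted probability before expert review.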
Our contributions are:
(1) Enhancing the sources we have chosen in support of cyber analytics and hunting by transposing the unidirectional links to create and add bidirectional relational knowledge. For this publication, we extend BRON [20, 21, 22, 23] by adding CAR, Engage, and EDB, and we expand Reference [61] with further reinforcement learning (RL) experiments. The original BRON paper [20] deals only with offensive data sources; References [21, 22, 23] introduced defensive information from the existing sources and additional defensive data sources (D3FEND).
(2) Demonstrating how the BRON graph with its bidirectional enhancements makes it more convenient to conduct hypothesis-driven cyber hunting. We map different kinds of hypothetical threats, those that arise from an APT perspective and those that arise from the perspective of a potential target. In addition, we show how the BRON graph can prepare information for a red team exercise. Furthermore, we demonstrate a research use of BRON: supplying network defense postures to a machine-learning-enhanced modeling and simulation framework. The modeling entails parameterized red agents drawing upon behavioral threat and target information within the sources in the BRON graph. It allows us to investigate how different machine learning algorithms find different defensive postures.
(3) Filling a gap in relational knowledge to further enhance the collective value of the sources and improve cyber analytics and hunting. We show how we use machine learning to infer links. We test this workflow, and experts verify it has found novel, plausible, and interesting relationships.
We now proceed to the rest of the article. Note that BRON [20] was extended to include CAR, Engage, and EDB. This article consolidates, clarifies, and elaborates upon References [21, 22, 23] with red team planning and analysis including the BRON extensions. It also expands upon Reference [61], reporting further reinforcement learning experiments. In Section 2, we provide a description of BRON so references to its usage are clear throughout the remainder of the article. In Section 3, we employ BRON to launch inquiries that call upon the knowledge sources of our interest. In Section 4, we present a human-centered workflow that generates machine-learned relationships for the knowledge sources. In Section 5, we discuss our results. In Section 6, we present related work. In Section 7, we present conclusions and future work.

2 Unified Cyber Security Knowledge Sources

The BRON property graph unifies and enhances cyber knowledge. We motivate its existence in Section 2.1, explain how the implementation enhances its sources in Section 2.2, and catalog its sources and explain why they were selected in Section 2.3. In Section 2.4, we present descriptive statistics of the data in BRON. Additional information about BRON is provided in Appendix A.2 and in Reference [20].

2.1 Motivation behind BRON

Without BRON, within the knowledge sources we work with, gathering the threat, target, and defensive knowledge that informs incident analysis, hunting exercises, and modeling experiments requires multiple searches involving trial-and-error link tracing between their entries. An example of this effort, from early 2022, is gathering knowledge around a Log4j vulnerability with identifier CVE-2021-44228. To learn about the vulnerability, we could read its entry in the CVE database. This tells us that the vulnerability is related to the Log4j library, which is popular for logging events in Java applications. To continue and find out what APTs might target this vulnerability and what techniques and procedures the APT might use, we encounter a problem. This query is not directly possible because, while the ATT&CK matrix [36] has this information, there is no direct link between CVE and ATT&CK entries. Instead, we need to, one-by-one, follow each of the links from the CVE entry to weaknesses in CWE, then follow all of these links, one-by-one, to CAPEC for attack patterns [37], and finally, one-by-one, look for any links to the ATT&CK matrix from the attack patterns. Then, among all the possible navigations and readings of the content, one possible path between CVE and ATT&CK is found. The BRON project was started in 2018 to ease this kind of effort and streamline these sorts of inquiries by unifying the knowledge sources and the links between their entries into one representation, a property graph, and making each link bidirectional. In practice, BRON is built by downloading data from each source, finding the linked entries in the downloaded data, and then constructing the property graph (see Reference [3]).
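The hop-by-hop search described above can be sketched with toy link tables (all identifier mappings below are illustrative placeholders, not the actual entries for this CVE):

```python
# Sketch of the manual, link-by-link search the paragraph describes.
# Without BRON, reaching ATT&CK techniques from a CVE means following
# CVE -> CWE -> CAPEC -> ATT&CK links one-by-one (toy, hypothetical data).
cve_to_cwe = {"CVE-2021-44228": ["CWE-502", "CWE-917"]}
cwe_to_capec = {"CWE-502": ["CAPEC-586"], "CWE-917": []}
capec_to_technique = {"CAPEC-586": ["T1059"]}

def techniques_for_cve(cve_id):
    """Enumerate every path's endpoint, hop by hop."""
    found = set()
    for cwe in cve_to_cwe.get(cve_id, []):              # hop 1: CVE -> CWE
        for capec in cwe_to_capec.get(cwe, []):         # hop 2: CWE -> CAPEC
            for t in capec_to_technique.get(capec, []):  # hop 3: CAPEC -> ATT&CK
                found.add(t)
    return found

print(techniques_for_cve("CVE-2021-44228"))  # {'T1059'}
```

Even in this toy, the search fans out at every hop; against the real sources, each hop is a separate web lookup, which is the effort BRON eliminates.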

2.2 BRON’s Property Graph

BRON’s graph is built by scripts within the BRON project [3] that download text entries and link data from the knowledge sources. Our data source notation is: Tactics \(\tau\), Techniques \(\epsilon\), Attack Patterns \(\alpha\), Weakness \(\omega\), Vulnerabilities \(\nu\), Exploits \(\chi\), Mitigations \(\delta\), Engagements \(\eta\), Analytics \(\kappa\) (see Table 12; for a more formal description, see Appendix A.2.1). The scripts in the BRON project consolidate by mapping the sources’ entries to nodes in the graph (typed by source) and the links between entries to bidirectional edges in the graph (typed by relationship). Implementing the property graph with a database further provides a powerful query interface to the knowledge. Implementing bidirectional edges for unidirectional links adds inverse meaning to the links, while it also extends navigational capabilities across the sources. The need for enumerative search requiring link inversion is eliminated. The graph offers search capability starting from any entry and provides traversal along any edge relation, starting and ending at any two entries of different sources.
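The effect of the bidirectional edges on navigation can be sketched with a minimal in-memory property graph (toy nodes and links with illustrative pairings; BRON itself stores the graph in a database with a query interface):

```python
# Minimal property-graph sketch: nodes typed by source, links stored once
# but traversable in both directions, so a query can start from any entry,
# e.g., from an ATT&CK technique back to CVEs, without inverting links.
from collections import deque

nodes = {"T1190": "technique", "CAPEC-137": "attack_pattern",
         "CWE-20": "weakness", "CVE-2018-11776": "vulnerability"}
links = [("CAPEC-137", "T1190"), ("CWE-20", "CAPEC-137"),
         ("CWE-20", "CVE-2018-11776")]  # unidirectional in the sources

adj = {n: set() for n in nodes}
for a, b in links:            # one undirected edge per link = bidirectional
    adj[a].add(b)
    adj[b].add(a)

def path(start, goal):
    """Breadth-first search along the bidirectional edges."""
    q, seen = deque([[start]]), {start}
    while q:
        p = q.popleft()
        if p[-1] == goal:
            return p
        for n in adj[p[-1]] - seen:
            seen.add(n)
            q.append(p + [n])

print(path("T1190", "CVE-2018-11776"))
# ['T1190', 'CAPEC-137', 'CWE-20', 'CVE-2018-11776']
```

Because the edges are undirected, the same query works in reverse, from the vulnerability to the technique, with no extra bookkeeping.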
Table 2.
Relationship (Src-Dst) | Src | Dst | # Edges | # Src | # Dst | E/P | Dst Ratio | Src Ratio
EDB-CVE | 45,094 | 204,736 | 29,919 | 26,350 | 24,193 | 3.2E-06 | 1.2E-01 | 5.8E-01
CVE-CPE | 204,736 | 246,436 | 4,582,192 | 191,624 | 246,436 | 9.1E-05 | 1.0E+00 | 9.4E-01
Technique-CAPEC | 594 | 555 | 117 | 101 | 91 | 3.5E-04 | 1.6E-01 | 1.7E-01
Technique-D3FEND mitigation | 594 | 170 | 155 | 42 | 35 | 1.5E-03 | 2.1E-01 | 7.1E-02
Technique-Technique detection | 594 | 578 | 578 | 578 | 578 | 1.7E-03 | 1.0E+00 | 9.7E-01
CAPEC-CWE | 555 | 933 | 1,157 | 412 | 329 | 2.2E-03 | 3.5E-01 | 7.4E-01
CWE-CVE | 933 | 204,736 | 447,663 | 326 | 158,484 | 2.3E-03 | 7.7E-01 | 3.5E-01
CWE-CWE mitigation | 933 | 661 | 1,641 | 661 | 661 | 2.7E-03 | 1.0E+00 | 7.1E-01
CAPEC-CAPEC detection | 555 | 54 | 92 | 54 | 54 | 3.1E-03 | 1.0E+00 | 9.7E-02
CAR-Technique | 101 | 594 | 263 | 99 | 110 | 4.4E-03 | 1.9E-01 | 9.8E-01
CWE-CWE detection | 933 | 115 | 482 | 115 | 115 | 4.5E-03 | 1.0E+00 | 1.2E-01
CAPEC-CAPEC mitigation | 555 | 395 | 1,153 | 395 | 395 | 5.3E-03 | 1.0E+00 | 7.1E-01
CAR-D3FEND mitigation | 101 | 170 | 101 | 99 | 17 | 5.9E-03 | 1.0E-01 | 9.8E-01
Technique-Engage | 594 | 31 | 596 | 175 | 23 | 3.2E-02 | 7.4E-01 | 2.9E-01
Technique-Technique mitigation | 594 | 43 | 1,137 | 489 | 43 | 4.5E-02 | 1.0E+00 | 8.2E-01
CAR-Tactic | 101 | 14 | 126 | 99 | 12 | 8.9E-02 | 8.6E-01 | 9.8E-01
Tactic-Technique | 14 | 594 | 770 | 14 | 594 | 9.3E-02 | 1.0E+00 | 1.0E+00
Table 2. BRON Description
Relationship is the name of the Src-Dst collections. Src is the number of nodes in Src. Dst is the number of nodes in Dst. # Edges is the number of edges. # Src is the number of unique nodes in Src with an edge. # Dst is the number of unique nodes in Dst with an edge. E/P is the ratio of existing edges to possible edges. Dst Ratio and Src Ratio are the ratios of nodes with at least one edge to total nodes in the collection.
Table 3.
Tactic | Technique | CAPEC | CWE | CVE | Metasploit
Persistence | Dynamic Linker Hijacking | Subverting Environment Variable Values | Exposure of Sensitive Information to an Unauthorized Actor | CVE-2019-1653 | Cisco RV320 and RV325
Persistence | Web Shell | Upload a Web Shell to a Web Server | Improper Authentication | CVE-2018-12613 | phpMyAdmin
Persistence | Dynamic Linker Hijacking | Subverting Environment Variable Values | Improper Input Validation | CVE-2018-11776 | Apache Struts 2
Persistence | Default Accounts | Try Common or Default Usernames and Passwords | Use of Hard-coded Credentials | CVE-2018-10575 | Watchguard AP100 AP102 AP200 1.2.9.15
Initial-access | Default Accounts | Try Common or Default Usernames and Passwords | Use of Hard-coded Credentials | CVE-2018-10575 | Watchguard AP100 AP102 AP200 1.2.9.15
Table 3. Example of Paths to Metasploit Exploits for the Initial-access, Persistence Retrieved from BRON
Ordered by CVE ID.
Table 4.
Collection | Entries | Count
Tactic | [Initial-access, Persistence] | 2
CAR | [Simultaneous Logins on a Host, Service Outlier Executables, Create local admin accounts using net exe, ...] | 16
CAPEC mitigation | [Try Common or Default Usernames and Passwords, Run Software at Logon, ...] | 6
CWE mitigation | [Use of Hard-coded Credentials, Improper Access Control, ...] | 8
Technique detection | [Default Accounts, Boot or Logon Initialization Scripts, ...] | 6
CAPEC detection | [Try Common or Default Usernames and Passwords] | 1
CWE detection | [Use of Hard-coded Credentials, Improper Authentication, ...] | 6
D3FEND mitigation | [Process Self-Modification Detection, Process Termination, ...] | 6
Engage | [Baseline] | 1
Table 4. Entries for Defense on Paths to Metasploit Exploits for Initial-access and Persistence Tactics Retrieved from BRON
Table 5.
Dataset | Neighbors | Attack Pattern | Technique | Tactic | Weakness | Mitigative | Detections
N | Direct | ✓ | ✓ | – | – | – | –
NO | Direct and Offensive | ✓ | ✓ | ✓ | ✓ | – | –
NOM | Direct, Offensive, and Mitigative | ✓ | ✓ | ✓ | ✓ | ✓ | –
NOD | Direct, Offensive, and Detections | ✓ | ✓ | ✓ | ✓ | – | ✓
NOMD | All | ✓ | ✓ | ✓ | ✓ | ✓ | ✓
Table 5. Datasets and Their Sources
Mitigative consists of records from D3FEND, Engage, ATT&CK, CAPEC, and CWE.
Table 6.
LEM | Context | Semantics | Dimensionality | Training
Bag-of-Words | No | No | Varying | None
GloVE | No | Yes | Fixed | Pre
BERT | Yes | Yes | Fixed | Pre
F-BERT | Yes | Yes | Fixed | Fine-tuned
Table 6. Comparison of Language Embedding Models (LEM)
Context indicates whether the embeddings capture context in our text inputs. Semantics indicates whether the embeddings capture word meaning. Dimensionality refers to embedding size. Training indicates how the Language Embedding Model was trained.
Table 7.
Relationship | LEM | N Acc | N F1 | NO Acc | NO F1 | NOD Acc | NOD F1 | NOM Acc | NOM F1 | NOMD Acc | NOMD F1
countermeasure alleviates technique \((\delta , \epsilon)\) | BERT | 0.937 | 0.937 | 0.952 | 0.952 | 0.961 | 0.963 | 0.957 | 0.959 | 0.971 | 0.972
 | BoW | 0.923 | 0.927 | 0.947 | 0.949 | 0.957 | 0.959 | 0.947 | 0.949 | 0.952 | 0.951
 | F-BERT | 0.937 | 0.937 | 0.952 | 0.952 | 0.961 | 0.963 | 0.957 | 0.959 | 0.971 | 0.972
 | GloVE | 0.952 | 0.957 | 0.957 | 0.962 | 0.961 | 0.962 | 0.971 | 0.972 | 0.971 | 0.972
engagement counters technique \((\eta , \epsilon)\) | BERT | 0.861 | 0.858 | 0.873 | 0.881 | 0.888 | 0.888 | 0.884 | 0.892 | 0.876 | 0.884
 | BoW | 0.861 | 0.866 | 0.884 | 0.887 | 0.878 | 0.883 | 0.869 | 0.872 | 0.867 | 0.875
 | F-BERT | 0.861 | 0.858 | 0.873 | 0.881 | 0.888 | 0.888 | 0.884 | 0.892 | 0.876 | 0.884
 | GloVE | 0.850 | 0.863 | 0.876 | 0.885 | 0.895 | 0.897 | 0.888 | 0.893 | 0.891 | 0.893
technique uses attack-pattern \((\alpha , \epsilon)\) | BERT | 0.859 | 0.853 | 0.873 | 0.857 | 0.817 | 0.812 | 0.803 | 0.821 | 0.803 | 0.800
 | BoW | 0.789 | 0.783 | 0.831 | 0.835 | 0.803 | 0.821 | 0.803 | 0.821 | 0.817 | 0.840
 | F-BERT | 0.859 | 0.853 | 0.873 | 0.857 | 0.817 | 0.812 | 0.803 | 0.821 | 0.803 | 0.800
 | GloVE | 0.831 | 0.829 | 0.817 | 0.827 | 0.845 | 0.861 | 0.817 | 0.827 | 0.831 | 0.842
weakness enables attack-pattern \((\omega , \alpha)\) | BERT | 0.831 | 0.828 | 0.840 | 0.834 | 0.838 | 0.839 | 0.865 | 0.867 | 0.857 | 0.858
 | BoW | 0.840 | 0.849 | 0.836 | 0.846 | 0.815 | 0.818 | 0.821 | 0.828 | 0.802 | 0.813
 | F-BERT | 0.831 | 0.828 | 0.840 | 0.834 | 0.838 | 0.839 | 0.865 | 0.867 | 0.857 | 0.858
 | GloVE | 0.847 | 0.846 | 0.852 | 0.848 | 0.833 | 0.830 | 0.868 | 0.864 | 0.853 | 0.852
weakness allows vulnerability \((\omega , \nu)\) | BERT | 0.938 | 0.935 | 0.930 | 0.930 | 0.938 | 0.938 | 0.937 | 0.937 | 0.935 | 0.935
 | BoW | 0.877 | 0.875 | 0.920 | 0.916 | 0.912 | 0.908 | 0.927 | 0.925 | 0.933 | 0.932
 | F-BERT | 0.938 | 0.935 | 0.930 | 0.930 | 0.938 | 0.938 | 0.937 | 0.937 | 0.935 | 0.935
 | GloVE | 0.940 | 0.939 | 0.935 | 0.935 | 0.940 | 0.939 | 0.937 | 0.937 | 0.935 | 0.937
Table 7. Best Accuracy (Acc) and F1 Results on Test Data, over the Four Classifiers for Each Language Embedding Model (LEM), Relationship, and Dataset before the Curator Sorts and Thresholds
Accuracy and F1: higher is better; the range is \([0, 1]\). Classifiers for each Language Embedding Model with the best F1 for each relationship are selected for the workflow (highlighted in pink).
Table 8.
Relationship | Plausible | Implausible | Interesting | Undecided | Total
weakness enables attack-pattern \((\omega , \alpha)\) | 4 | 0 | 2 | 4 | 10
engagement counters technique \((\eta , \epsilon)\) | 2 | 0 | 1 | 6 | 9
technique uses attack-pattern \((\alpha , \epsilon)\) | 1 | 3 | 2 | 6 | 12
weakness allows vulnerability \((\omega , \nu)\) | 0 | 3 | 1 | 6 | 10
countermeasure alleviates technique \((\delta , \epsilon)\) | 0 | 0 | 0 | 12 | 12
Table 8. Number of Plausible, Implausible, Interesting, and Undecided Consensus Labels for the Candidates for Each Relationship
Table 9.
CAPEC \((\alpha)\) | Technique \((\epsilon)\) | Relation | \(\mathbf {E_1}\) | \(\mathbf {E_2}\) | \(\mathbf {E_3}\) | \(\mathbf {E_4}\) | Class
561 | T1078.002 | Windows Admin Shares with Stolen Credentials uses Domain Accounts | 1 | 1 | 1 | 1 | P
652 | T1555 | Use of Known Kerberos Credentials uses Credentials from Password Stores | 1 | 0.5 | 0.5 | 1 | IN
653 | T1558 | Use of Known Windows Credentials uses Steal or Forge Kerberos Tickets | 1 | 0.5 | 0.5 | 0.5 | IN
CWE \((\omega)\) | CAPEC \((\alpha)\) | Relation | \(\mathbf {E_1}\) | \(\mathbf {E_2}\) | \(\mathbf {E_3}\) | \(\mathbf {E_4}\) | Class
181 | 71 | Incorrect Behavior Order: Validate Before Filter enables Using Unicode Encoding to Bypass Validation Logic | 1 | 1 | 0.5 | 1 | IN
20 | 174 | Improper Input Validation enables Flash Parameter Injection | 1 | 1 | 1 | 1 | P
287 | 60 | Improper Authentication enables Reusing Session IDs (a.k.a. Session Replay) | 1 | 1 | 1 | 0.5 | IN
74 | 73 | Improper Neutralization of Special Elements in Output Used by a Downstream Component (“Injection”) enables User-controlled Filename | 1 | 1 | 1 | 0.5 | IN
Engage \((\eta)\) | Technique \((\epsilon)\) | Relation | \(\mathbf {E_1}\) | \(\mathbf {E_2}\) | \(\mathbf {E_3}\) | \(\mathbf {E_4}\) | Class
EAC0016 | T1083 | Network Manipulation counters File and Directory Discovery | 1 | 1 | 1 | 0.5 | IN
EAC0018 | T1025 | Security Controls counters Data from Removable Media | 1 | 1 | 1 | 0.5 | IN
EAC0022 | T1033 | Artifact Diversity counters System Owner/User Discovery | 1 | 1 | 1 | 1 | P
CWE \((\omega)\) | CVE \((\nu)\) | Relation | \(\mathbf {E_1}\) | \(\mathbf {E_2}\) | \(\mathbf {E_3}\) | \(\mathbf {E_4}\) | Class
20 | CVE-2021-32507 | Improper Input Validation allows Absolute Path Traversal vulnerability in FileDownload in QSAN Storage Manager allows remote authenticated attackers to download arbitrary files via the URL path parameter. The referred vulnerability has been solved with the updated version of QSAN Storage Manager v3.3.3 | 1 | 1 | 0.5 | 0.5 | IN
Table 9. Pairs of Relational Link Candidates
\(E_*\) are the experts who labeled the edges. Highlighted (pink) rows indicate relevant undetected links existing along the Top 25 CWE weakness’ externally linked path, the node on the path is in italic. Classes are Implausible (IM), Interesting (IN), Plausible (P), and Undecided (U).
Table 10.
Paper | Problem | Input | Output | Downstream Task
This Article | Detect plausible unknown relationships between threat, weakness, and defensive knowledge | BRON tactic, technique, attack pattern, weakness, mitigations and detections text | Probability of the relationship | Threat intelligence and cyber hunting
[21] | CAPEC to Technique edge prediction | BRON tactic, technique, attack pattern, and weakness text | Boolean: edge existence | Detect undocumented techniques and attack pattern relationships
[42] | Provide intention from alerts | Logs from Suricata alerts | 1 of 11 Intents | Identify campaigns
[51] | Construct Knowledge Graph (KG) | Named entities from malware text descriptions | 1 of 6 relationships in KG | KG reasoning
[4] | Provide ATT&CK Tactic from CVE | CVE text description | 1 of 10 tactics | Stakeholders add preliminary ATT&CK information to CVEs
Table 10. Related Work, Part 1
Italic highlights the difference with Hemberg and O’Reilly [21].
Table 11.
Paper | Feature | Representation | Modeling | Training text of Language Model
This Article | Concatenated text of all connected entries | Word Frequency, Word2Vec, Transformer | Random Forest, Ensembles and Expert labels | Pretrained, fine-tuned with BRON text entries
[21] | Concatenated text of all connected entries | Word Frequency, Transformer | Random Forest | Pretrained
[42] | Text string | Word2Vec | Pseudo-active Learning with Neural Network | Cybersecurity and other sources; see Reference [42]
[51] | Two entities | Word2Vec | Feed Forward NN | Cybersecurity Technical Reports, CVE and STIX
[4] | Text string | Word Frequency, Word2Vec, Transformer | NN | CVE description
Table 11. Related Work, ML Aspects, Part 2
Italic highlights the difference with Hemberg and O’Reilly [21].

2.3 Knowledge Sources in BRON and Their Selection

We are interested in knowledge that assists with threat hunting, threat analysis, and the selection of defensive measures. The knowledge sources aggregated in BRON have been deliberately selected because they collectively match these purposes. At a minimum, each source in BRON is reputable, curated, and actively updated. In terms of topics, the sources provide information on the different elements of a threat narrative: threat behavior (in particular of APTs), threat targets, as well as defensive mitigation and detection analytics. In terms of supporting security actions, the sources each serve specific purposes. In combination, their relational connections with each other are vital to connecting the “dots” when trying to form a complete narrative from the perspective of a threat or a defense, or when searching for vulnerabilities that could be targeted. Links explicitly make some relational connections between entries across sources, while other connections are latent and implicit. A person must read many entries to find implicit links.
BRON integrates eight sources (five before this work and the additions of CAR, Engage, and the EDB with this work), which are logically grouped into: (1) sources describing threats, i.e., attackers and attacks, (2) sources describing the possible targets of threats, and (3) sources with defensive mitigation and detection knowledge.
Table 12 shows information sources and types, organization, and descriptions of the selected knowledge sources. Note that the BRON project downloads the data for these data sources programmatically. This might result in different data compared to manually accessing the webpages of the data sources, due to curation factors beyond our control.
Threat-related Knowledge Sources. There are two: ATT&CK and CAPEC. Threat behavior5 information is provided in ATT&CK, a knowledge base of adversary tactics and techniques based on real-world observations [36]. It serves as a foundation for the development of specific threat models and methodologies. ATT&CK is focused on describing the operational phases in an attack campaign, pre- and post-exploit, and contains the specific TTPs APTs use to execute their objectives while operating on a network.
Attack pattern identification is provided by CAPEC, a dictionary of attack patterns known to have been employed by adversaries to exploit weaknesses in cyber-enabled capabilities. It is intended to capture the “design patterns of attackers”6 [37]. CAPEC is focused on application threats and describes common attributes and techniques these use. Attack patterns link to multiple techniques in ATT&CK and weaknesses in CWE.
Target-related Knowledge Sources. There are three: CWE, CVE, and EDB. CWE is a list of software- and hardware-weakness types [39]. It provides a common language, a measure for security tools, and a baseline for weakness identification. A weakness is a condition in a software, firmware, hardware, or service component that can give rise to vulnerabilities. NVD CVE is used to identify cybersecurity vulnerabilities [38] in computational logic, e.g., code found in software and hardware components. Vulnerabilities can be exploited, resulting in negative impacts to confidentiality, integrity, or availability. The common platform enumeration (CPE) is used to identify the vulnerable artifact, i.e., software or hardware [43]. The EDB [46] is a collection of public exploits and corresponding vulnerable software gathered through direct submissions, mailing lists, and public sources. One example of an exploit tool is Metasploit, widely used for penetration testing.
Defense Mitigation and Detection Knowledge Sources. There are three: D3FEND, Engage, and CAR. D3FEND is a knowledge graph [30]. The knowledge graph contains semantically rigorous types and relations that define both the key concepts for cybersecurity countermeasures and the relations necessary to link those concepts to each other. This linking connects offensive and defensive techniques.
Engage is a framework for communicating and planning cyber adversary engagement, deception, and denial activities [35]. Security analysts use it to implement defensive strategies for previously observed adversarial threat behavior. Adversary engagement and deception operations can reduce the cost of a data breach, waste an attacker’s time, and improve detection.
CAR is focused on providing a set of validated and well-explained analytics that help detect and deter threats [34]. CAR is a knowledge base of intrusion detection system rules for known techniques in ATT&CK. It includes pseudocode representations and code implementations directly targeted at specific tools in its analytics.
ATT&CK, CAPEC, and CWE sometimes have fields in their entries for possible mitigations and detections related to an entry. In contrast, CVE mitigations can typically be generalized to take the form of a configuration change or software update based on vendor recommendations. A CVE can also include specification changes or even specification deprecations.
BRON limitations.
BRON is limited to returning knowledge in its sources. BRON is built by downloading data from a URL for each source, finding the linked entries in the downloaded data and then constructing the property graph (see Reference [3]). Note that BRON solely uses edges reported in the original data sources and their transposes. The original data sources are continually updated, and the web page (URL) content can also be different from downloaded data.

2.4 BRON Descriptive Statistics

In this section, we present some descriptive statistics of BRON. We briefly compare the extensions of the BRON dataset to the original [20]. The added data sources are the defensive D3FEND, CAR, Engage, as well as mitigations and detections listed in ATT&CK, CAPEC, CWE. In addition, EDB is added. Note that the data sources are constantly updated.
Table 2 shows quantitative statistics on the numbers of nodes and edges from a snapshot of BRON. In total, there are hundreds of thousands of nodes in BRON. The defensive data sources have on the order of a thousand nodes (entries): D3FEND (170), CAR (101), Engage (31), as well as mitigations listed in ATT&CK (43), CAPEC (395), and CWE (661), and detections in ATT&CK (578), CAPEC (54), and CWE (115). Further, there are on the order of tens of thousands of offensive nodes from EDB (45,094). Regarding edges, the defensive data sources have on the order of a thousand edges. There are also on the order of tens of thousands of offensive edges from EDB (26,350).
In Table 2, we see that the relationships in the BRON graph are sparse by looking at the ratio of existing edges to possible edges (E/P). These values range from 3.2E-06 to 9.3E-02, depending on the relationship. Note that some sparsity is expected, purely based on the definition of the entries in the data sources. Other measures of connectivity give similar indications, e.g., the ratios of unique linked nodes to total nodes in the relationships (Dst Ratio and Src Ratio). The Src Ratio values are between 7.1E-02 and 1.0E+00, and the Dst Ratio values are between 1.0E-01 and 1.0E+00. A value of one (1.0E+00) indicates that all nodes have at least one edge. Note that some relations have one-to-one mappings by definition.
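As a concrete check, the sparsity measures can be recomputed from the counts in one row of Table 2 (here the Technique-CAPEC row):

```python
# Recompute the Table 2 sparsity measures for the Technique-CAPEC row.
n_src, n_dst = 594, 555        # total nodes in the Src and Dst collections
n_edges = 117                  # existing edges
uniq_src, uniq_dst = 101, 91   # nodes with at least one edge

ep = n_edges / (n_src * n_dst)   # existing edges / possible edges
src_ratio = uniq_src / n_src     # fraction of Src nodes with an edge
dst_ratio = uniq_dst / n_dst     # fraction of Dst nodes with an edge

print(f"{ep:.1E}", f"{src_ratio:.1E}", f"{dst_ratio:.1E}")
# 3.5E-04 1.7E-01 1.6E-01
```

The recomputed values match the E/P, Src Ratio, and Dst Ratio entries in the table.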
Centrality graph measures also indicate a low degree of connectivity in the BRON graph. For example, the degree centrality median for the BRON graph is 1.836E-04. Another centrality measure, the eigenvector centrality, which computes the centrality for a node based on the centrality of its neighbors, has a median value of 1.316E-18. Finally, we observe that there are few isolated nodes, i.e., nodes with no edges. This is a change compared to what was observed in the original BRON paper [20]. We can only speculate that the data sources may have removed isolated entries and/or added relationships since then.
We now proceed with demonstrations of using the knowledge sources and extended relationships within BRON.

3 Cyber Hunting and Analytics with BRON

Our goal in this section is to demonstrate the use of the knowledge sources and BRON in support of the inductive, deductive, or abductive reasoning involved in cyber hunting, analytics, red teaming, and threat modeling and simulation. In Section 3.1, we describe how a hunter can navigate between knowledge sources to reason about hypotheses and questions requiring following links. In Section 3.2, we show how the knowledge sources can be consulted to set up a red team exercise. In Section 3.3, we investigate cyber threat modeling simulations that explore defensive postures.

3.1 Source Navigation for Threat Mapping

3.1.1 Mapping APTs to Targets and Vice Versa.

If a hunter’s hypothesis starts from a potential target, e.g., “Our Web Servers are under attack and this will lead to Persistence,” then they will start with CVE search, looking for instances of web server vulnerabilities. Information from within CVE entries that list web server vulnerabilities potentially allows the hunter to take some actions, perhaps checking what versions are on their system. Then, the knowledge within CWE entries that are linked from these same CVE entries informs them of how to leverage tools and logging to give visibility into activities that reveal privilege escalation. As a final step, links from the CWE entry to Attack Pattern (CAPEC) and Technique informs them, e.g., to look at activity involving files that would be dangerous in the wrong hands (access control).

3.1.2 Mapping Tactics and Exploit Tools.

Let us presume that a cyber hunter has determined that the network perimeter has been compromised. They want to identify whether a hosted product is targeted by a specific APT tactic, such as persistence, and is vulnerable because of a particular weakness. Moreover, they want to find which Attack Patterns use any of the tactic’s techniques, weaknesses, and different vulnerabilities. They also want to know if any tools can be used to enable the tactic.
To support performing these tasks, the knowledge sources provide paths (easily traceable using BRON) that connect a Tactic to a tool for a known exploit. One finds multiple paths that link Persistence (a Tactic) to CVEs with exploits targeting Apache Struts 2 that are enabled by a Metasploit module.
One such path is (with abbreviated text):
Tactic (TA0003) Persistence: The adversary is trying to maintain a foothold. Persistence consists of techniques that adversaries use to keep access to systems across restarts, changed credentials, and other interruptions that could cut off access.
Technique (T1574) Hijack Execution Flow: Adversaries may execute their payloads by hijacking the way operating systems run programs. Hijacking execution flow can be for the purposes of persistence, since this hijacked execution may reoccur over time.
Sub-Technique (T1574.006) Dynamic Linker Hijacking: Adversaries may execute their own malicious payloads by hijacking environment variables the dynamic linker uses to load shared libraries.
Attack Pattern (CAPEC-13) Subverting Environment Variable Values: The adversary directly or indirectly modifies environment variables used by or controlling the target software. The adversary’s goal is to cause the target software to deviate from its expected operation in a manner that benefits the adversary.
Weakness (CWE-20) Improper Input Validation: The product receives input or data, but it does not validate or incorrectly validates that the input has the properties that are required to process the data safely and correctly.
Vulnerability CVE-2018-11776: Apache Struts versions 2.3 to 2.3.34 and 2.5 to 2.5.16 suffer from possible Remote Code Execution when alwaysSelectFullNamespace is true (either by user or a plugin like Convention Plugin) and then: results are used with no namespace and, at the same time, its upper package has no or wildcard namespace; and, similar to results, the same possibility exists when using a url tag that does not have value and action set while, at the same time, its upper package has no or wildcard namespace. The Known Affected Hardware or Software Configuration CPE is cpe:2.3:a:apache:struts:*:*:*:*:*:*:*:*, from 2.5.0 up to 2.5.16. The vendor and product, extracted as the 3rd and 4th fields of the Known Affected Hardware or Software Configuration, are Apache Struts.
Exploit Tool Metasploit module for Apache Struts 2.
This path in one direction is: Given an attacker’s objective is Persistence, an attack subverting environment variable values, by means of exploiting dynamic linker hijacking technique, could be used to hijack execution flow and run a malicious binary due to improper input validation weaknesses in Apache Struts 2.5.0-2.5.16 with a Metasploit module.
An interpretation of this path in the other direction is: If any of a given network’s computers are running Apache Struts 2 versions 2.5.0-2.5.16, then the administrators need to be alert for the invocation of a Metasploit module that will hijack it to execute a malicious payload that can achieve persistence by exploiting improper input validation weaknesses that allow the attack to subvert environment variable values by hijacking environment variables used by a dynamic linker to load shared libraries.
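Traversing such a path can be sketched programmatically. The following is a minimal illustration, assuming the BRON graph is available as an adjacency list with hypothetical "type/ID" node identifiers (the real BRON schema and storage differ); because cross-source links are made bidirectional, the same breadth-first search serves both the attacker's and the defender's direction.

```python
from collections import deque

# A toy fragment of the BRON graph as an adjacency list. Node
# identifiers are illustrative ("type/ID"), not BRON's actual schema.
edges = [
    ("tactic/TA0003", "technique/T1574"),
    ("technique/T1574", "technique/T1574.006"),
    ("technique/T1574.006", "capec/CAPEC-13"),
    ("capec/CAPEC-13", "cwe/CWE-20"),
    ("cwe/CWE-20", "cve/CVE-2018-11776"),
]
# BRON makes every cross-source link bidirectional, so the transpose
# of each edge is also added.
graph = {}
for u, v in edges:
    graph.setdefault(u, []).append(v)
    graph.setdefault(v, []).append(u)

def shortest_path(graph, start, goal):
    """Breadth-first search returning one shortest path, or None."""
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

# Attacker's direction: from the tactic down to an exploitable CVE.
forward = shortest_path(graph, "tactic/TA0003", "cve/CVE-2018-11776")
# Defender's direction: from an observed CVE back up to the tactic.
backward = shortest_path(graph, "cve/CVE-2018-11776", "tactic/TA0003")
```

The bidirectional edges are what make the defender's reverse query a plain path search rather than a separate traversal over transposed data.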

3.2 Red Team Planning

One part of a red team campaign involves launching an attack to gain access to a network and establish a persistent back door. MITRE ATT&CK itemizes TTPs for the tactics of initial access and persistence. By querying BRON, a red team not only retrieves publicly available techniques related to this campaign, but also attack patterns (CAPEC), weaknesses (CWE), vulnerabilities (CVE), product configurations, and Metasploit modules (see Tables 3 and 13).
During a campaign, the red team constantly tries to acquire more environmental information about the network, e.g., the configurations (in CPE format) of devices. To do this, they need tools such as Metasploit. The red team can filter Metasploit modules and can access information about these tools by querying BRON (see Table 14). If they want to know the potential severity of using a Metasploit tool, command, or module, then they can navigate from the tactic and technique they are using to CVE entries. This CVE information for persistence is shown in Figure 1, which shows that most Metasploit modules can target a vulnerability with potentially severe consequences. Note that Metasploit is commonly available, so many of these exploits would be defended against by updates or mitigations. A red team finding them would likely indicate that some impediment is preventing the update or mitigation.
Fig. 1.
Fig. 1. The y-axis is the CVSS value ([0, 10]); the plot is a violin plot of the distribution of severity scores for Vulnerabilities linked to the Persistence tactic with Metasploit exploits. Note that the estimated distribution produces visualization artifacts outside the valid CVSS value range.
The defender can find alternatives among defenses (D3FEND), analytics (CAR), mitigations (ATT&CK, CAPEC, CWE, D3FEND), detections (ATT&CK, CAPEC, CWE), and engagements (Engage) from BRON (see Table 4). There are multiple possible types of defenses for the Metasploit modules related to initial access and persistence tactics. Analytics found in CAR occur with the highest frequency.

3.3 Modeling Defensive Postures

We have also used the knowledge in the BRON property graph for threat modeling and simulation. In this context, we use it to provide action spaces and defensive configurations for the modeling and simulation of red (attack) versus blue (defense) sides. Additionally, the modeling and simulation can consult BRON to determine a performance score for its engagements between specific threats and defenses. To introduce this new use case of the knowledge in BRON, we next present a modeling and simulation that compares reinforcement learning with an evolutionary learning method. This modeling and simulation framework also analyzes equilibria of the threats and defenses.
The framework models a cyberattack as a zero-sum game played between an attacker and defender on a network environment [60]. Specifically, with the use of BRON, the attacker selects one or more attack patterns to deploy against the defender’s network. Simultaneously, the defender selects software configuration(s) to patch, or upgrade, to the next version. The graph in BRON is used to calculate the reward. The reward for the attacker is computed as the sum of the risk scores of the product configurations affected by their attack; the defender’s reward is the negation of this. Executions of the framework compare two methods for finding this game’s Nash Equilibria: multi-agent reinforcement learning (MARL), i.e., a competition between two RL agents, versus competitive coevolution (CCA), i.e., an evolutionary competition between two populations, threats and defenses.
We present a brief background on Nash Equilibria (NE) and ML methods for finding them.

3.3.1 Background on Using Machine Learning to Find Nash Equilibria.

A Nash equilibrium occurs when every player is playing a best response to the strategies of their opponent(s), and no player can deviate to achieve a higher payoff. Let \((S, r)\) be a game of \(n\) players, where \(S = (S_1 \dots S_n)\) contains the strategy set \(S_i\) for each player and \(r = (r_1 \dots r_n)\), with \(r_i : S \rightarrow \mathbb {R}\), is the reward function. Each player selects a strategy \(x_i \in S_i\) from their strategy set. We denote by \(x_{-i}\) the selected strategies of all players except player \(i\). A set of chosen strategies \(x^*\) is a Nash equilibrium if \(\forall i, \forall x_i \in S_i, r_i(x_i, x^*_{-i}) \le r_i(x^*_i, x^*_{-i})\). In other words, no player can deviate from their equilibrium strategy \(x_i^*\) and receive a higher payoff.
To find a player’s best response strategy exactly, one could draw a game tree, enumerate all possible branches, and find the optimal action via backwards induction. The problem then becomes a brute force search over the entire state space. Here, this amounts to search over all combinations of attack patterns and software patches. Heuristics such as alpha-beta pruning improve the search time by pruning branches, but solving still requires search and full knowledge of the opponent’s actions [19]. Thus, this approach becomes infeasible when the number of possible states or actions becomes too large.
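The exhaustive search described above can be illustrated on a toy game. This sketch enumerates all pure strategy profiles of a tiny zero-sum game with invented payoffs (not derived from BRON) and checks the equilibrium condition directly:

```python
import itertools

# A toy zero-sum game with invented payoffs. Rows are attacker
# actions, columns are defender actions; each entry is the attacker's
# reward, and the defender receives its negation.
payoff = [
    [3, 1],
    [2, 2],
]

def pure_nash_equilibria(payoff):
    """Enumerate all strategy profiles and keep the saddle points."""
    rows, cols = len(payoff), len(payoff[0])
    equilibria = []
    for i, j in itertools.product(range(rows), range(cols)):
        # The attacker cannot improve by deviating from row i ...
        attacker_best = all(payoff[i][j] >= payoff[k][j] for k in range(rows))
        # ... and the defender (a minimizer) cannot improve by
        # deviating from column j.
        defender_best = all(payoff[i][j] <= payoff[i][k] for k in range(cols))
        if attacker_best and defender_best:
            equilibria.append((i, j))
    return equilibria
```

Here the profile (row 1, column 1) is the unique saddle point; the cost of this approach grows with the product of the action-space sizes, which is what makes it infeasible at the scale of attack-pattern and patch combinations.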
Reinforcement learning (RL) consists of one or more agents acting as players who interact with an environment consisting of a set of states \(\mathcal {S}\), actions \(\mathcal {A}\), and a reward function \(r : \mathcal {S} \times \mathcal {A} \rightarrow \mathbb {R}\). The goal of the agent is to maximize its reward. RL environments typically rely on the Markov assumption that the environment is essentially memoryless and that the present state and reward depend only on the previous state and action. RL agents are often evaluated based on their regret R, which is defined as \(R = \sum _{t=1}^T r_t^{\pi ^*} - \sum _{t=1}^T r_t^{\pi _t}\), where the agent acts in \(t=1 \dots T\) rounds, \(r_t^{\pi ^*}\) is the average reward achieved in round t from following the optimal policy \(\pi ^*\), and \(r_t^{\pi _t}\) is the average reward achieved in round t from following policy \(\pi _t\). Intuitively, regret is the difference in expected rewards between following some optimal policy \(\pi ^*\) and following the learned policy. Our work uses multi-agent RL to find best response policies of a two-player game. Our two-player game models a preventative defense in competition with a threat (attack). We call the policies or strategies of the competitions in this game “adversarial postures.”
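As a small worked example of the regret definition (the per-round reward values are illustrative, and a fixed optimal policy is assumed):

```python
# Illustrative per-round rewards; the optimal policy is assumed fixed.
optimal_rewards = [1.0, 1.0, 1.0, 1.0]  # r_t under the optimal policy pi*
learned_rewards = [0.2, 0.6, 0.9, 1.0]  # r_t under the learned policy pi_t
# Regret: cumulative optimal reward minus cumulative achieved reward.
regret = sum(optimal_rewards) - sum(learned_rewards)
```

A learner whose per-round reward approaches the optimal reward, as in the later rounds here, accumulates regret ever more slowly.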
For comparison, we use a CCA that evolves two populations using selection and variation (crossover and mutation) techniques. One population comprises attacks and the other defenses. In each generation, competitions are held by pairing an attack and a defense. CCA differs from RL in several ways, including requiring no gradient updates or value function estimation [57]. In addition, CCAs are flexible and pose few restrictions on the types of environments that may be used. We will compare CCAs to RL in terms of their abilities to find NE and the quality of their adversarial strategies (postures) at equilibrium.

3.3.2 Modeling Attack Patterns and Patches.

We begin by detailing the modeling assumptions of the environment. We then describe each algorithm in turn and the modifications to the environment necessary for its function.
Environment. We define our simulation environment as follows:
Scenario. The attacker may not know the network structure before launching an attack, and the defender certainly cannot know which attack patterns will be selected.
Time. The simulation proceeds in rounds. In a single round, both an attack and a defense are made simultaneously, independently, and without knowledge of the other player’s actions. At the end of the round, both players see their own reward but do not learn the other player’s actions.
Attack decision. An attack is the selection of three attack patterns from the CAPEC repository [37]. There may be 0, 1, or more than 1 software configuration affected by each attack pattern. There were 546 CAPECs, resulting in 162,771,336 potential combinations.
Defense decision. A defense is the selection of three software configurations, as identified by CPE, to patch on the network [43]. A patch increments every instance of the particular software to the next available version, similar to how a network administrator would roll out a system-wide update. Note that there is no guarantee the upgrade will fix any security risks from particular attack patterns, and it may introduce new ones. We used a network with 20 unique software configurations for a total of 8,000 patch combinations.
Network environment. We model an enterprise network. It contains a map between each software configuration and the number of occurrences on the network. We experimented with a single network.
Reward. The reward is the sum of the CVSS scores for every software configuration on the network affected by the selected attack patterns, as identified in BRON. The CVSS score is retrieved from BRON by tracing a path from a CAPEC to a CWE, and then from the CWE to a CVE. The CVSS score is returned for the CPEs of the CVE that match the network. Note that this is a minimax game, so the attacker receives a positive reward with increased risk, while the defender receives the negation of that reward. The defender’s maximum reward is 0, while the attacker’s maximum reward depends on the network.
Nash equilibrium strategy. A distribution over attack patterns for the attacker and over patches for the defender that results in the average reward that is highest for the attacker and lowest for the defender, given the opponent’s strategy.
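The reward computation in the environment definition above can be sketched as follows. The mappings are illustrative stand-ins for the BRON lookups (CAPEC to CWE to CVE to matching CPE), and weighting each affected configuration by its occurrence count is our assumption:

```python
# Illustrative stand-ins for BRON lookups: each CAPEC maps to the CPEs
# it can affect (via CWE and CVE paths) with the associated CVSS score.
affected_cpes = {
    "CAPEC-13": {"cpe:apache:struts": 8.1},
    "CAPEC-66": {"cpe:openssl:openssl": 7.5, "cpe:apache:struts": 8.1},
}
# The network maps each configuration to its number of occurrences.
network = {"cpe:apache:struts": 3, "cpe:openssl:openssl": 5}

def attacker_reward(attack_patterns, network):
    total = 0.0
    for capec in attack_patterns:
        for cpe, cvss in affected_cpes.get(capec, {}).items():
            # Assumption: each occurrence of an affected configuration
            # contributes its CVSS score to the risk sum.
            total += cvss * network.get(cpe, 0)
    return total

r = attacker_reward(["CAPEC-13", "CAPEC-66"], network)
defender_reward = -r  # zero-sum: the defender receives the negation
```

A patch would change the version fields of a configuration's CPE, which can move it in or out of the affected set for subsequent rounds.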
Competitive Coevolutionary Algorithm (CCA). Figure 2(a) shows an overview of a CCA. Each attacker–defender pair in the competition is assigned a CVSS score. Fitness is then calculated as the mean expected utility of competition scores (the average of all competitions). The populations are evolved in alternating steps: First, the attack population is selected, varied, updated, and evaluated against the defenses, and then, the same for the defense population.
Fig. 2.
Fig. 2. A representation of BRON with the ML methods. Both CCA (Figure 2(a)) and MARL (Figure 2(b)) interact with the environment (a network) and learn through observed rewards. The network model receives input in the form of actions from both the CCA and RL agents. It returns the mean population reward to the CCA and the reward to the RL agents, respectively.
We treat each population as an agent, i.e., a player or adversary of one type or the other (either attacker or defender) and its members as strategies in a mixed strategy NE [61]. This formulation has two advantages. First, the evolutionary dynamics allocate members of the population to maximize individual fitness, which in turn maximizes the collective fitness. Thus, the dynamics are present to select members of the population that approximate the optimal mixed strategy NE. Second, the mean population reward obtained from the CCA is equal to the expected reward from playing a round with a single randomly selected strategy from the population. This means we have an exact performance measure and a fair way to compare a CCA population to a single RL agent operating under its own learned distribution over actions.
Reinforcement Learning. We developed two separate OpenAI gym environments—one for each agent—and updated each with changes to the opponent’s strategy [7, 61].
We designed the action spaces of our environment to ensure compatibility with the RL algorithms, as RL algorithms can perform poorly in large discrete action spaces [12]. To handle the large number of attack pattern combinations, we converted the action space for the selection of attack pattern triplets into a continuous action space represented by a cube of edge length 2 centered at the origin. The agent’s action was represented by the selection of a coordinate within this cube. To convert this point into three attack patterns, we partitioned each axis into equal-sized intervals, with each interval representing a single attack pattern. Thus, we could determine the attack pattern \(c(x_i)\) for coordinate \(x_i\) of dimension i by taking \(c(x_i) = \mathbf {\alpha } [ \lfloor \frac{\scriptstyle x_i + 1}{\scriptstyle 2} |\mathbf {\alpha }| \rfloor ]\), where \(\mathbf {\alpha }\) is some zero-indexed list of CAPECs, \(|\mathbf {\alpha }|\) denotes the cardinality of the list, and \(\mathbf {\alpha }[i]\) denotes the ith element of list \(\mathbf {\alpha }\). Note that this is equivalent to partitioning the size-2 cube into \(|\mathbf {\alpha }|^3\) smaller cubes of side length \(2/|\mathbf {\alpha }|\), where each represents a unique combination of 3 CAPECs.
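The coordinate-to-attack-pattern conversion can be sketched directly from the formula above, using a small illustrative CAPEC list in place of the full 546-entry list:

```python
import math

# A small illustrative CAPEC list standing in for the full 546 entries.
alpha = ["CAPEC-13", "CAPEC-66", "CAPEC-112", "CAPEC-242"]

def to_attack_patterns(point, alpha):
    """Map a point in the cube [-1, 1]^3 to a triple of CAPECs via
    c(x_i) = alpha[floor((x_i + 1) / 2 * |alpha|)]."""
    patterns = []
    for x in point:
        idx = math.floor((x + 1) / 2 * len(alpha))
        # Clamp the boundary case x == 1.0, which would otherwise
        # index one past the end of the list.
        patterns.append(alpha[min(idx, len(alpha) - 1)])
    return patterns
```

Each axis interval of width \(2/|\mathbf{\alpha}|\) selects one CAPEC, so a continuous policy over the cube induces a distribution over attack-pattern triples.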
Our MARL training algorithm is shown in Figure 2(b). We trained both the attack and defense agents in an alternating schedule using a single environmental sample (batch size of 1) for each agent per episode. Each agent would learn using an environmental sample, update its internal model, make a prediction, and update their opponent’s reward function based on their adversarially chosen action. Our training algorithm is presented in Algorithm 1.

3.3.3 Experiments for Finding and Evaluating Defense Postures.

The experimental details are in Appendix A.4. The MARL and CCA attack agents achieved markedly similar equilibria (see Figure 3) when searching for attack patterns and defensive postures in the form of patches. Our results suggest that, not only with MARL but also with the CCA, the exploration-exploitation tradeoff seems to be a key factor determining an agent’s convergence towards a possible equilibrium. This suggests that, for each algorithm, there is an optimal ratio of exploration versus exploitation; this ratio is based on the quality of the samples drawn through random chance. Each algorithm has its own method for approaching this tradeoff, but hyperparameter tuning can have a significant impact on the final outcome. For example, we investigated the breadth of threat behavior, in the form of attack patterns (CAPECs), against which the defensive postures were evaluated with the different ML methods. We observed that the CCA’s attacker used 108 unique CAPECs while the MARL attacker used 248.
Fig. 3.
Fig. 3. RL reward compared to average CCA (GE) low mutation reward over five runs with min and max ranges shaded. Although the equilibria are remarkably similar, in most cases, the RL attacker reward does not meet the CCA (GE) average reward.
This demonstration and comparison of ML for NE identification and modeling simulation for finding and evaluating defensive postures used the BRON property graph to find paths from attack patterns (CAPECs) to vulnerabilities (CVEs).
Next, we will show how supervised ML can be used to infer novel cybersecurity knowledge in the form of relationships between data sources.

4 Inference of Novel Relationships with Machine Learning

We have demonstrated uses of linked cybersecurity data for deductive, inductive, and abductive reasoning in cyber hunting, analysis, and threat modeling. However, one challenge when investigating the security implications of a threat or vulnerability is that links between entries in different sources are sparse (given the number of entries in each source), and some could be missing. Links may be missing either because threat forensics has not documented the relationship or because the human curators who read related entries lacked the cyber knowledge to add plausible links. Adding complexity to the knowledge curation, the lack of a link does not always imply one is missing; sometimes no valid relationship exists. For example:
Consider a security analyst processing the Cybersecurity and Infrastructure Security Agency (CISA) Alert AA20-239A entitled “FASTCash 2.0: North Korea’s BeagleBoyz Robbing Banks.” The alert reports on the BeagleBoyz APT and mentions that, during the discovery phase of FASTCash 2.0 campaigns, technique T1033: System Owner/User Discovery in the ATT&CK matrix is used. To respond, an analyst could seek a suitable countermeasure. In May 2022, while there was a suitable countermeasure in ENGAGE, EAC0022: Artifact Diversity, this entry was not linked to the technique.
BRON provides consultation of known relationships; however, it is not able to present currently undetected relationships that could be novel or variations of known ones. Yet, this is an important facet of cyber hunting and analytics: it is critical to infer potential outcomes from existing information. Figure 4 shows some possible relationships between nodes in BRON as well as link quantities. Attackers may have changed their behavior, and their activities may not yet have been observed. This unsupported capability motivates the inference of information. We want to present plausible, but undetected, relationships to cyber hunters and analysts.
Fig. 4.
Fig. 4. Drawing of some BRON relationships between collections. The edges are labeled, and in parentheses is the ratio of existing to possible links in the data sources, e.g., Engagements counters Techniques.
Specifically, we investigate how ML techniques within a semi-automated curation workflow can address inferring novel relationships. This is challenging because entries are written in free-form text with inconsistent documentation standards. Some entries are quite sparse (see CWE-200, for example).
CWE-200: Exposure of Sensitive Information to an Unauthorized Actor; The product exposes sensitive information to an actor that is not explicitly authorized to have access to that information.
In such cases, following links from entries is helpful in encoding more information about them. These linked entries themselves have links that also could contribute helpful information.
Software-based reasoning about the meaning of free-form text can be a challenge for conventional tools, which often only reference the meanings of field names and otherwise perform keyword search. ML has been used in Natural Language Processing (NLP) to improve the automated processing of free-form text from these sources and other security information, such as logs, alerts, and reports [15, 17, 25, 28, 52].
In this section, we demonstrate how ML research could in practice serve digital threat knowledge-base curators, threat hunters, and cyber security analysts. We present an ML-based workflow that addresses the overwhelming quantity of text entries that have to be read and assimilated by cyber hunters and analysts to infer a plausible relationship between two entries from different threat, vulnerability, and mitigation sources. In Section 4.1, we present the relationship inference method. In Section 4.2, we describe the workflow setup. In Section 4.3, we present experimental results from the ML workflow.

4.1 Relationship Inference Method

This section describes the ML workflow (Section 4.1.1) and provides a stepwise training method for the different parts of the ML workflow (Section 4.1.2).

4.1.1 Relationship Inference Workflow.

We create a workflow for each relationship (see Figure 5). Each workflow is the same, except it integrates different, optimally selected, Language Embedding Models and classifiers from a preliminary training and testing stage. In this stage, we take three steps to obtain the machine learning components of a workflow and note what neighboring and indirect text entries will be used. A summary of this supervised training stage (details in Section 4.1.2) is:
Fig. 5.
Fig. 5. Inference workflow for a relationship. Pairs of text entries (1) not linked by the relationship in BRON are encoded (2) and then passed to classifiers. Each classifier outputs the probability of the pair being linked by the relationship (3). The candidates from all classifiers are aggregated and sorted according to the curator (4). The curator then down selects the sorted candidates and passes those above a threshold to experts. Each expert independently examines each candidate and designates a label Unlinked, Interesting, or Linked (5). Then the curator uses some aggregation rules (6) to determine consensus among the experts for a final label of the candidate as Implausible, Interesting, Plausible, or Undecided.
In the dataset construction (1. in Figure 5) step, we first assemble five datasets.
In the text encoding (2. in Figure 5) of entries, for each dataset, we use four different Language Embedding Models to translate the text of its records, obtaining different feature encodings based on each Language Embedding Model.
In the prediction (3. in Figure 5) step, we train one classifier per encoding for each dataset: four classifiers are trained independently, each with the features encoded by one of the four Language Embedding Models and their labels.
After training, using F1 measures obtained from test data, we select the best set of four classifiers and Language Embedding Models, noting the record format of each dataset used in training the selected models. These are inserted at the beginning of the workflow for working with unlabeled data.
The workflow progresses with inputs assembled like the records of the noted training dataset. We exhaustively sample pairs of entries that are not linked and trace their other cross-source links to match the dataset’s format. These pairs are encoded and passed to the classifiers, one per Language Embedding Model. Each classifier outputs the probability of the pair being linked by the relationship. The curator tunes the classifier parameters to obtain some quantity of candidate linked pairs from each classifier (4. in Figure 5). The candidates from all classifiers are then combined and ranked as the curator decides (here, ordered by the sum of probabilities). The curator then sets a threshold and passes ranked candidates above it to experts.
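The ranking and thresholding in step (4) can be sketched as follows; the probability values are illustrative, and the second candidate pair is a hypothetical placeholder:

```python
# Each candidate pair of entries has one link probability per
# classifier (four classifiers, one per Language Embedding Model).
# Probabilities are illustrative; the second pair is a hypothetical
# placeholder.
candidates = {
    ("T1033", "EAC0022"): [0.91, 0.85, 0.77, 0.88],
    ("T1059", "EAC0005"): [0.40, 0.35, 0.52, 0.31],
}
threshold = 2.0  # curator-chosen cutoff on the summed probabilities
ranked = sorted(candidates.items(), key=lambda kv: sum(kv[1]), reverse=True)
for_experts = [pair for pair, probs in ranked if sum(probs) >= threshold]
```

Summing probabilities is one simple aggregation; a curator could equally rank by maximum or by how many classifiers exceed a per-classifier cutoff.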
The experts label (5. in Figure 5) the candidates using their text descriptions as a starting point for their assessment. Their labels are: Unlinked, no relationship between the two entries. Interesting, the relationship is not certain, yet interesting. Linked, there is a relationship between the two entries.
Finally, the curator derives a final label from consensus rules using the experts’ labels (6. in Figure 5). Here, we use the curation rules: Implausible (IM): All expert labels are Unlinked. Interesting (IN): All expert labels are either Interesting or Linked, and not all are Linked. Plausible (P): All expert labels are Linked. Undecided (U): Any candidate whose labels match none of IM, IN, or P.
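The curation rules can be stated compactly in code; this sketch mirrors the four rules above:

```python
def consensus(expert_labels):
    """Aggregate independent expert labels (Unlinked, Interesting,
    Linked) into a final candidate label per the curation rules."""
    if all(label == "Unlinked" for label in expert_labels):
        return "Implausible"
    if all(label == "Linked" for label in expert_labels):
        return "Plausible"
    if all(label in ("Interesting", "Linked") for label in expert_labels):
        return "Interesting"  # mixed Interesting/Linked, not all Linked
    return "Undecided"
```

Checking the all-Linked case before the mixed case enforces the "not all are Linked" condition of the Interesting rule.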

4.1.2 Preliminary Stage - Supervised Training.

The supervised training comprises steps 1., 2., and 3. in Figure 5.
(1) – Dataset construction. A dataset “Neighbors,” denoted by N, is drawn from the two sources (collections) with directly linked entries. It is an aggregation of positively labeled, directly linked entries and negatively labeled pairs of entries (one from each source) that have no link between them. Because we do not know how much direct and indirect textual information best informs link inference, we designate sources (collections) in BRON as offensive (O), defensive detection (D), and mitigations (M). This allows us to create four additional datasets by following directly linked “neighbor” (N) entries’ external links to collect additional text entries that may help with inferring the relationship. Details are in Table 5.
(2) – Encode Text Entries in Records. For each of a dataset’s exemplars, we extract the text as one (concatenated) segment and tokenize it. Tokenization includes word stemming and removal of common or connecting words [24]. Because the meaning of the text is critical to inferring a relational link, we use four different Language Embedding Models to explore feature representations. For choosing a Language Embedding Model, we consider whether the embeddings capture context in text, word meaning (semantics), dimensionality, and training requirements (see Table 6). The Language Embedding Models are Bag-of-Words (BoW), GloVe, BERT, and a BERT fine-tuned on BRON’s domain-specific text, F-BERT. In Section 4.2, we provide additional details on how we used these Language Embedding Models. Each Language Embedding Model yields an updated dataset replacing the text with numerical features as input to a classifier.
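As a minimal illustration of the simplest encoding, Bag-of-Words, here is a sketch that counts vocabulary terms in a concatenated text segment (real runs also apply stemming and stop-word removal, and the other Language Embedding Models produce dense vectors instead):

```python
def bag_of_words(text, vocabulary):
    """Count each vocabulary term in a naively tokenized text segment."""
    tokens = text.lower().split()
    return [tokens.count(term) for term in vocabulary]

# Illustrative vocabulary and a short concatenated entry-pair segment.
vocab = ["input", "validation", "adversary", "payload"]
pair_text = "Improper Input Validation The adversary modifies environment variables"
features = bag_of_words(pair_text, vocab)
```

The zero counts for background-corpus terms absent from the segment ("payload" here) are what give every record the same fixed feature dimensionality.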
(3) – Train One Classifier per Encoding. For each dataset and each Language Embedding Model, we train a RandomForestClassifier from Scikit-learn [49] to infer link probability. The random forest is a meta estimator that fits a number of decision trees on sub-samples of the dataset and uses averaging to improve the predictive accuracy and reduce over-fitting.
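This training step can be sketched with Scikit-learn; the features and labels are tiny illustrative stand-ins for the encoded datasets, and the class weights are hypothetical:

```python
from sklearn.ensemble import RandomForestClassifier

# Tiny illustrative stand-ins for the encoded datasets: each row is an
# encoded pair of entries; label 1 means linked, 0 means unlinked.
X = [[1, 1, 1, 0], [0, 0, 0, 1], [1, 1, 0, 0], [0, 0, 1, 1]]
y = [1, 0, 1, 0]
# Hypothetical class weights reflecting the emphasis on minimizing
# false positives (penalize mistakes on the negative class more).
clf = RandomForestClassifier(
    n_estimators=50, random_state=0, class_weight={0: 2.0, 1: 1.0}
)
clf.fit(X, y)
# Probability that a new encoded pair is linked by the relationship.
link_probability = clf.predict_proba([[1, 1, 1, 0]])[0][1]
```

The `predict_proba` output is what the workflow uses downstream as the per-classifier link probability for candidate ranking.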

4.2 Relationship Inference Workflow Setup

With the five datasets for each relationship, we proceeded through the preliminary three steps of the workflow training stage. We selected four classifiers for each relationship’s workflow and note the corresponding dataset, i.e., the extent of indirectly, externally linked, text entries to be assembled for each candidate pair of entries. For details, see Appendix A.5.
For 1. – dataset creation, Figure 4 shows, for each relationship, how many links it had in December 2021 in our BRON version. It also shows the quantity and ratio of potential undetected relationships and how many positive and potential direct neighbor examples there are in BRON. Note that, because of computational expense, we considered only CVEs from 2021 when working with the weakness allows vulnerability relationship, and we sampled 1,000 links (a 0.068 fraction) randomly as positive examples.
For 2. – translating text to features, experimental details for each Language Embedding Model follow: Bag-of-Words (BoW): A piece of text is represented as a vector containing the count of each token in it, along with zero counts for tokens within a background corpus that do not appear in the piece of text. GloVe: Token embeddings are created with the GloVe model [50]. BERT [9]: BERT (Bidirectional Encoder Representations from Transformers) has been pre-trained by Google to consider the context of tokens within a piece of text. Fine-tuned BERT (F-BERT): We fine-tuned BERT on BRON text data, including weaknesses, attack patterns, techniques, tactics, mitigations, and detections entries.
For 3. – training one classifier per feature representation, to emphasize the minimization of false positives, we empirically tuned the class error weights of the cost matrices and used the RandomForestClassifier. Per Figure 4, examples of positive relationships, i.e., related entries, are vastly outnumbered by unrelated ones. Thus, we under-sample the negative class and train on a smaller but balanced training set.
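The under-sampling step can be sketched as follows (the class sizes are toy values, not BRON’s actual counts):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy labels: positives (linked pairs) are vastly outnumbered by negatives.
y = np.array([1] * 20 + [0] * 1000)
X = rng.normal(size=(len(y), 8))

pos_idx = np.flatnonzero(y == 1)
neg_idx = np.flatnonzero(y == 0)

# Under-sample the negative class to match the number of positives.
sampled_neg = rng.choice(neg_idx, size=len(pos_idx), replace=False)
balanced = np.concatenate([pos_idx, sampled_neg])
rng.shuffle(balanced)

X_bal, y_bal = X[balanced], y[balanced]
```

The resulting balanced set keeps every positive example while discarding most negatives, trading dataset size for class balance.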

4.3 Relationship Inference Results

Here, we present the results from our ML workflow with BRON regarding training, testing, and exploration of the inferred relationships.

4.3.1 Relationship Inference Model Training Results.

Table 7 shows the results of training the classifiers for each relationship, Language Embedding Model, and dataset. At this stage, we had, for each relationship, a set of 4 Language Embedding Models used to express 5 different sets of text knowledge, 20 combinations in total. Each of these 20 experiments culminated in the training of its own RandomForestClassifier and was repeated 100 times. Since we are interested in picking a high-performing trained classifier for our workflow, we select (and report) the best results from testing in Table 7. We see that performance differs by both Language Embedding Model and dataset for each relationship. The best technique uses attack-pattern model is the lowest performing among all five relationship models. The best countermeasure alleviates technique model is superior among all five relationship models. In close second is the best weakness allows vulnerability model. The Language Embedding Model that supports the most best-performing models is GloVe (from spaCy).
The datasets used in training for these best-models varied depending on the relationship. Over all relationships, NOM is used 6 times (out of 20), NOM and NOMD 5 times, and N (just the text of the pairs of entries) only once. Recall that O represents links to offensive entries. Every relationship had at least one Language Embedding Model that used cross-source entries linked to offensive entries; however, different combinations of datasets were used in different relationships. This points to the importance of considering multiple options for features and Language Embedding Models.

4.3.2 Relationship Inference Workflow Results.

We next use the workflow for each relationship, acting as the curator. For each relationship, we tune a threshold, C, on the RandomForestClassifiers’ output so that each workflow classifier provides \(\approx 10\) candidates. We combine the candidates of the four models and sort them by their probabilities. Then, we down-select to the top 10. Finally, we pass these 10 candidates to four experts: \(E_1,E_2,E_3,E_4\). The experts have varying academic and industrial experience in cyber security, hunting, and Security Operation Centers. They were presented with the text of the candidates’ entries and links to the URLs of the entries. They could also do their own research regarding the proposed entries and the relationship.
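This candidate-generation step can be sketched as follows, with random numbers standing in for the four classifiers’ link probabilities (counts and pair identifiers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n_pairs, k = 500, 10

# Hypothetical link probabilities from the four per-encoding classifiers.
model_probs = {m: rng.random(n_pairs) for m in ["BoW", "GloVe", "BERT", "F-BERT"]}

candidates = []
for model, probs in model_probs.items():
    # Choose the threshold C so that roughly k candidates pass it.
    C = np.sort(probs)[-k]
    for pair_id in np.flatnonzero(probs >= C):
        candidates.append((probs[pair_id], pair_id, model))

# Pool the four models' candidates, sort by probability, keep the top k.
top10 = sorted(candidates, reverse=True)[:k]
```

The pooled, probability-sorted top 10 is what the experts then label.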
Again, acting as the curator, we set up the consensus rules for the final label of the relationship. Our rules are the strict ones stated at the end of Section 4.1.1. Table 8 provides a count of candidates by each final consensus label for each relationship. For three relationships, there was at least one candidate that was Plausible. weakness enables attack-pattern and engagement counters technique yielded the highest number of Plausible candidates. weakness enables attack-pattern and technique uses attack-pattern relationships had the highest number of Interesting candidates. weakness allows vulnerability and technique uses attack-pattern each had a few Implausible candidates. There were no plausible or interesting candidates for the countermeasure alleviates technique relationship. This is reasonable, because D3FEND entries are known to be directly mapped to Techniques [30].
Table 9 shows pairs of candidate entries with plausible or interesting consensus labels, how the experts rated them, and the final label. Overall, the examples and counts indicate the framework can yield Plausible or Interesting candidates. Another way to put the results in context is to examine the impact of a new plausible link. We found that the new plausible link for engagement counters technique, between Artifact Diversity (EAC0022) and System Owner/User Discovery (T1033), offers up another possible mitigation to the BeagleBoyz APT.9

4.3.3 Exploration Examples with Inferred Links.

Acting as cyber hunters, we next explored using the workflow’s results. We consult the 2020 Common Weakness Enumeration (CWE) Top 25 Most Dangerous Software Weaknesses list [40] and check whether any plausible link that the workflow identified is relevant to any weakness on the list. The list is compiled by MITRE and highlights “the most frequent and critical errors that can lead to serious vulnerabilities in software” [40]. For example, an attacker can exploit these weaknesses to take control of a system, obtain sensitive information, or cause a denial-of-service. BRON already provides the ability to trace the ATT&CK techniques, mitigations, and detections that are relationally linked to the top 25 CWEs, not only the entries in the CWE, CVE, and CAPEC data sources.
We started by checking how much relational knowledge was already linked to the CWEs in the Top 25 list. We found that all Top 25 CWEs of 2020 have CWE mitigation text, but mitigations along the path of indirectly linked entries are not available. Specifically:
(a) Only six CWE entries are connected to ATT&CK techniques, only four are connected externally to D3FEND entries, and only five to Engage.
(b) Seven CWE entries have no CWE detection.
(c) Only three CWE entries are not connected to attack patterns.
(d) Only CWE-200 is connected to all the BRON sources.
(e) One CWE, CWE-416, has just two mitigations in total.
That these critical weaknesses are supported only by a sparse set of relational knowledge increases the relevance of potential undetected links. We therefore tried relationship inference on the top 25 CWEs. For each CWE and the text in BRON, we obtained from the best RandomForestClassifier the probability of links between currently unlinked weaknesses and other entries. We then compared the links that were assessed with high probability to the ones labeled by our experts. We found that four overlap: two direct relationships with CAPECs, e.g., Improper Authentication enables Reusing Session IDs (a.k.a. Session Replay), and two indirect relationships, both Engage counters Technique relationships. Table 9 highlights the rows of these relevant undetected relationships. Repeating our process, these findings were valid for the top 25 CWEs for 2021, which only differ in three CWE entries.10
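The comparison of high-probability inferred links against expert labels reduces to scoring currently unlinked pairs and intersecting sets. A sketch with a stand-in classifier and hypothetical pair identifiers:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(3)

# A stand-in trained classifier (synthetic features, not BRON encodings).
X_train = rng.normal(size=(100, 8))
y_train = (X_train[:, 0] > 0).astype(int)
clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Currently unlinked (top-25 CWE, entry) pairs; identifiers are hypothetical.
pair_ids = [("CWE-287", "CAPEC-60"), ("CWE-79", "CAPEC-63"),
            ("CWE-20", "EAC0022"), ("CWE-416", "T1033")]
features = rng.normal(size=(len(pair_ids), 8))

probs = clf.predict_proba(features)[:, 1]
high_probability = {p for p, pr in zip(pair_ids, probs) if pr > 0.5}

# Pairs the experts labeled as Plausible (also hypothetical).
expert_plausible = {("CWE-287", "CAPEC-60"), ("CWE-20", "EAC0022")}

overlap = high_probability & expert_plausible
```

The size of `overlap` is the count of expert-confirmed, high-probability undetected links, analogous to the four overlapping links reported above.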
We have investigated how cybersecurity knowledge sources can help with finding, comparing, and improving cybersecurity information.
Next, we discuss the comparison of cybersecurity information, ML inference of novel relationships, and ML for finding defensive postures.

5 Discussion

We have demonstrated how to support cyber threat analytics, hunting, and simulations with enhanced threat, vulnerability, and mitigation knowledge, as well as how to infer novel relationships with ML. In Section 5.1, we discuss our overall observations. In Section 5.2, we discuss limitations and threats to validity.

5.1 Observations

In terms of the findings of our inquiries, we found uneven availability of knowledge, both “local” to the data sources and on the graph over the data sources. This places a caveat on the particular query responses we found. The same public data is also used to build predictive models, so the caveat carries over to that domain. Examples of modeling include addressing situational awareness [31], predicting missing edges between CVE, CWE, and CAPEC [63], and investigating data breaches with semantic analysis of ATT&CK [45]. Finally, for research-oriented, multi-agent threat modeling simulations that model Red vs. Blue teams [18], i.e., attack and defense dynamics, BRON also offers all existing countermeasure knowledge in the public domain that a blue agent can draw upon.
In our threat-modeling experiment, both the CCA and MARL algorithms converged to markedly similar equilibria after hyperparameter tuning. For the MARL, we recognize that the representation structure could imply some spatial relation between CAPECs that may adversely affect the model’s learning. However, this spatial relation could be advantageous, since CAPEC identifiers are often grouped by similarity in the CAPEC taxonomy [37]. Finally, without a specialized tuning network, CCA may be a reasonable alternative to compare and set the MARL hyperparameters. This could prove useful with, e.g., Moving Target and other dynamic enterprise networks, where constant updates could make frequent re-tuning of the hyperparameters infeasible. In such cases, training these two approaches simultaneously provides a benchmark for their performance.

5.2 Limitations & Threats to Validity

There are several limitations to cyber hunting and analytics, as well as to the ML enhancement of BRON’s graph.
Limitations to cyber hunting and analytics. BRON relies only on publicly reported data from NVD, and numerous vulnerabilities exist that do not have CVE IDs. Additionally, the data has biases due to reporting by diverse sources with different interests and product offerings, and the sources are continuously updated and altered. All the data biases of each individual source exist in BRON.
Furthermore, given that BRON is reliant on the quality of the available public data, a major threat to this work’s validity is the integrity of that data. There could be curation errors such as inconsistent severity scores, and we have uncovered gaps. No public resource in this context will ever be complete. The sources are also sensitive to data aging and heterogeneity: Not all products use the same versioning standard, and older products can accumulate newly disclosed vulnerabilities. The risk of the data breaching confidentiality is low, since the data is public. The threat to data availability depends on the availability of the data sources we amalgamated. Directly ascribable to BRON is the risk that BRON’s snapshot of the sources is out of date and misses new updates to them.
For the modeling and simulation, the absence of concrete knowledge of the local equilibria of our reward function, to which we could compare our solutions, makes it impossible to rank our approaches and qualify their performance. However, this was not our goal; in fact, what is better for the attacker is worse for the defender and vice versa. We use BRON to compare algorithms and create benchmarks. We rely on indicators of optimality: the reward to which the algorithm converged after many iterations, the cumulative reward as a proxy for (negative) regret, and the characteristics of the solutions themselves. Moreover, for the CCA, one limitation is that a finite-size population is only a rough approximation for a distribution over strategies. Another limitation is the possibility of the variation creating strategies that are less optimal (such as duplicating attack patterns), but this seems to be outweighed by the benefits toward convergence. The key result is that the CCA allows coevolution to create mixed, probabilistic strategies in a manner similar to MARL while requiring fewer assumptions and with greater robustness to hyperparameter misspecification in our environment. Finally, we note that other heuristics for comparison could have been used, including the number of environment calls and CPU time.
Limitations to relationship inference. Please note these results are for data from December 2021. Due to the dynamic nature of the data, repeating the inspection on newer data might produce different results as the data sources are updated. Note, however, that we have used curated data from reliable sources that are hopefully less sensitive to data poisoning, e.g., to the generation of fake cyber threat intelligence with transformer-based models shown in Reference [54].
There are several limitations in our implementation of the machine learning (Language Embedding Models and classifiers) methodology. These could impact the quality of our results. First, other dataset creation, sorting, and down-sampling methods could be substituted and may be superior. For example, the datasets we used to train the models were relatively small in size due to the low number of positive examples. We under-sampled the negative examples, and our random sampling was very sparse. This may have introduced a bias toward less accuracy. One approach to mitigate this limitation is to use more labeled data, perhaps by means of an active learning paradigm or few-shot learning with newer language models.
Another example is that, for the weakness allows vulnerability inference, we only used CVEs from 2021. Our results may have been better if we instead included more data in these steps. Additionally, when generating candidates for pairs of entries for the weakness allows vulnerability relationship, we only examined 100,000 pairs out of 13 million. Finally, while our workflow is general in terms of its components and roles for humans, our implementation may introduce limitations. We have not tried different classifiers. We could have fine-tuned BERT further. Plus, our human-made choices could have differed. The consensus rules obviously are subjective and could have been less strict. More and better experts could have been consulted to improve confidence in the final results.

6 Related Work

In Section 6.1, we describe work related to cyber hunting and analytics with knowledge sources and work related to modeling and simulation with ML for finding robust defensive postures. In Section 6.2, we present work related to inference with cybersecurity knowledge sources.

6.1 Cyber Hunting, Analytics, and Modeling & Simulation

Some of the sources this contribution focuses upon have also been used previously in research, e.g., References [5, 26, 29, 58]. In terms of referencing different sources, other works study inconsistencies in public security vulnerability reports [11] and threat intelligence [56]. There is also work that applies machine learning to find inconsistency in security information from unstructured text and derive quality metrics for cyber threat intelligence [28]. The text of Threat Intelligence reports has also been used for the automated extraction of threat actions from unstructured CTI sources [25].
ML offers attack planning, defensive modeling, threat prediction, anomaly detection, and simulation of adversarial dynamics in support of cyber security [2, 6, 10, 13, 14, 16, 25]. The breadth of actions utilized in prior research can be placed on a spectrum from very concrete actions on specific systems to more general, network-wide abstractions. One class of simulations has focused on operating on particular code bases and could prove useful in training models for threat detection and monitoring: Coev-Malware and ArmsRace involved injecting lines of malicious code while a vendor aimed for early detection of the security bug [8, 59]. In the middle of the spectrum lies RIVALS-DDOS, which simulates a DDOS environment while not actually running any malware itself [48]. Finally, prior work on finding Nash equilibria in cyber simulations has focused on finding strategies that are evolutionarily stable and therefore correspond to Nash equilibria [66].
Multi-agent reinforcement learning (MARL) includes problems in which two or more agents either cooperatively or competitively act in an environment [65]. A special case arises when agents neither observe their opponents’ actions nor their impact on the agent’s state. In this case, the problem of finding a best response can be reduced to solving a Markov decision process with bandit feedback and adversarial rewards [27, 32]. An alternative is to use a neural network to approximate each entry of the value function, an approach known as deep reinforcement learning. Some work has been done on training robust deep reinforcement learning agents, but it has mainly focused on bounded (and not adversarial) changes in the reward function, so using this approach currently means setting aside theoretical guarantees on the regret bound [47].

6.2 Inference of Relationships with Machine Learning

ML techniques for cybersecurity that work at the behavioral level are emerging. They typically use threat information that abstractly describes an attacker’s TTPs as well as vulnerability knowledge such as exposed product configurations, system weaknesses, and exploits. These information sources are typically independent, though they sometimes have external links to one another. ML has been used to improve the automated processing of free-form text from these sources and other cybersecurity information, such as logs, alerts, and reports [15, 17, 25, 28, 52].
We summarize work that, like this contribution, uses text knowledge for cyber security purposes in Tables 10 and 11. Table 10 describes the problem, the problem’s input and output, and any downstream tasks into which the solution feeds. Table 11 describes the same works from a machine learning perspective. It describes the problem features, their NLP technique for obtaining Language Embedding Models, and the second-stage inference modeling technique. The right-most column states the text sources on which the Language Embedding Models were trained. Of note, Moskal inferred intent from alerts to help scale campaign identification, referencing the text lines of Suricata logs. Pingle et al. inferred the relationship between pairs of entities, classified on the basis of classes from a set of six pre-identified relationships, to construct knowledge graph triplets and ultimately used the knowledge graph for reasoning about, e.g., vulnerabilities. Ampel et al. used text from CVEs to predict a link to one of 10 ATT&CK TACTICS, allowing stakeholders to add preliminary ATT&CK information to CVEs. They experimented with Language Embedding Models ranging from simpler to complex, and, like Reference [51], even employed a task-specific vocabulary to fine-tune the Language Embedding Model. This shows that this contribution overlaps with other works using text knowledge for cyber security purposes, but is also distinctly innovative. Similar to References [4, 21, 42], the downstream task is to find variations and assist cyber hunters and threat intelligence analysts. Distinctively, this contribution provides a wider range of inferred relationships and seeks expert labels and consensus, while emphasizing the combination of machine learning and human judgment.

7 Conclusions & Future Work

Defense against diverse and dynamic cyber threats requires cross-linked threat, vulnerability, and defensive mitigation knowledge. Cyber analysts consult it to form a chain of reasoning to identify a threat starting from indicators they observe or vice versa. Cyber hunters use it when seeking specific threats. Threat modelers apply it to explore different defensive postures for evolving threats. We aggregated five public sources of threat knowledge and three public sources of knowledge that describe cyber defensive mitigations, analytics, and engagements, with some unidirectional links between them. We consolidated the sources into a graph, BRON, in which all unidirectional cross-source links are bidirectional. This enhancement of the knowledge made it easier to answer the questions that analysts and automated systems pose. We demonstrated this for threat mappings, red team planning, and ML-based modeling and simulation by providing automated red and blue agents. Finally, because the linked data is very sparse and relies on expensive human curation, we demonstrated how an ML workflow can help access the semi-structured text descriptions within it. Combined with supervised machine learning and expert knowledge, we found novel relationships.
In future work, we plan analyses that take additional care to align comparisons along the date or age of a product. CAPEC and CWE have information regarding similar entries that can be utilized as well. We have not studied data source entity similarity (connections), only similarity between data sources. Additional data sources can be added, such as the CISA known exploited vulnerabilities catalog,11 as well as text sources, such as reports. The language embedding models can be updated, and fine-tuning could be extended to include well-chosen training text, as well as training multi-class classifiers. Another direction of future work is enhancing the knowledge structure; currently, we use a property graph. Upgrading it to a knowledge graph would provide edge-inference training with more complex relationship knowledge. Finally, for the modeling and simulation, we can expand upon the environments to more realistically model cyber attack progressions. The option of using other defensive measures as a form of mitigation on certain software could also be added.

Footnotes

1. There is a glossary in Appendix A.1.
2. Advanced persistent threat refers to a purposeful actor with nation state capabilities that gains unauthorized access to computer networks and evades detection for an extended period of time.
3. A property graph uses nodes, relationships, and labels.
4. Bron means “the bridge” in Swedish, referring to the unification of the knowledge sources through their linkages with one another.

A Appendix

A.1 Glossary

Advanced Persistent Threat (APT) Advanced persistent threat refers to a purposeful actor with nation state capabilities that gains unauthorized access to computer networks and evades detection for an extended period of time.
Adversarial Tactics, Techniques, and Common Knowledge (ATT&CK) a knowledge base of classifications and descriptions of cyberattacks and intrusions from MITRE [36].
Artificial Intelligence (AI) intelligence demonstrated by machines.
Bag-of-Words (BoW) a text is represented as a set of words (tokens).
Bidirectional Encoder Representations from Transformers (BERT) a family of masked-language models introduced in Reference [9].
BRON Bron means “the bridge” in Swedish, referring to how it links data sources. A property graph. The following notation is used: \(\tau ^O\) MITRE ATT&CK Tactics [36]; \(\epsilon ^{O,M,D}\) MITRE ATT&CK Techniques [36]; \(\alpha ^{O,M,D}\) MITRE CAPEC Attack Patterns [37]; \(\omega ^{O,M,D}\) MITRE CWE Common Weakness Enumeration [39]; \(\nu ^O\) National Vulnerability Database CVE Common Vulnerabilities and Exposures [44]; \(\chi ^O\) Offensive-Security Exploit Database exploits [46]; \(\delta ^M\) MITRE D3FEND Mitigations [30]; \(\kappa ^D\) MITRE CAR Detections [34]; \(\eta ^M\) MITRE ENGAGE Mitigations [35].
Common Attack Pattern Enumeration and Classification (CAPEC) a dictionary of known patterns of attack employed by adversaries to exploit known weaknesses in cyber-enabled capabilities from MITRE [37].
Common Vulnerabilities and Exposures (CVE) a reference method for publicly known information-security vulnerabilities and exposures [38].
Common Weakness Enumeration (CWE) a list of software and hardware weakness types from MITRE [39].
Common Vulnerability Scoring System (CVSS) an open standard for assessing the severity of computer system security vulnerabilities, \(\text {CVSS} \in [0,10]\) [1].
Common Platform Enumeration (CPE) a format for software and hardware configurations [43].
Competitive Coevolutionary Algorithm (CCA) evaluation, selection, and variation of competing populations of strategies, i.e., a stochastic, coupled, multi-point heuristic.
Cyber Analytics Repository (CAR) a knowledge base of analytics developed by MITRE based on the MITRE ATT&CK adversary model [34].
D3FEND A knowledge graph of cybersecurity countermeasures from MITRE [30].
Detection (D) BRON collection with defensive information for detection.
Engage a framework for planning adversary engagement operations from MITRE [35].
Exploit-DB (EDB) a database of exploits from Offensive-Security [46].
Fine-tuned BERT (F-BERT). BERT fine-tuned on a cyber security text corpus.
Implausible (IM). All expert labels are Unlinked.
Interesting (IN). All expert labels are either Interesting or Linked, and not all are Linked.
GloVe a language embedding model that encodes words as real-valued vectors learned from global word co-occurrence statistics of a training corpus [50].
Language Embedding Model (LEM). Model for transforming from language to some other codomain, e.g., a function that takes a sentence in a language \(\mathcal {L}\) (e.g., English) and maps it to a floating point vector, \(f: \mathcal {L} \rightarrow \mathbb {R}^n, f(\mathbf {x}) = y\).
Machine Learning (ML) methods that use data to improve performance on some tasks.
Metasploit a computer security project and tools that aid in penetration testing from Rapid7.
Mitigation (M). BRON collection with defensive information for mitigation.
Multi Agent Reinforcement Learning (MARL) multiple learning agents that coexist in a shared environment.
Nash Equilibrium (NE) when two players play their best response to the strategies of their opponent(s), and no player can deviate to achieve a higher payoff.
Natural Language Processing (NLP) a subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language.
Neighbor (N) two collections in BRON with directly linked entries.
Network mapper (nmap) a network scanner for discovering hosts and services on a computer network by sending packets and analyzing the responses.
Offensive (O). BRON collection with offensive information.
Plausible (P). All expert labels are Linked.
Property graph a graph \(G = (N, E)\), N are nodes, and E are edges. Both nodes and edges can have labels (properties).
\(\mathbf {r}\) reward from a reward function \(r : \mathcal {S} \times \mathcal {A} \rightarrow \mathbb {R}\) from an action in a state.
\(\mathbf {R}\) regret, which is the difference between learned policy \(\pi\) and optimal policy \(\pi ^*\), \(R = \sum _{t=1}^T r_t^{\pi _t} - \sum _{t=1}^T r_t^{\pi ^*}\).
Random Forest Classifier (RF) A random forest classifier from Scikit-learn [49].
Reinforcement Learning (RL) methods for how agents take sequential actions in an environment to maximize some reward.
\(\mathbf {S}\) strategy in a game (threat modeling simulation), \(S = (S_1 \dots S_n)\).
SpaCy open-source software library for natural language processing [24].
Tactics, Techniques and Procedures (TTP) identify patterns of behavior of a particular cyber adversary.
Undecided (U). All candidates not labeled IM, IN, or P.

A.2 BRON

We introduce property graph notation, describe how the BRON property graph is constructed, and how it can be extended.

A.2.1 Notation.

Formally, BRON is a graph \(G = (N, E)\), where N are nodes and E are edges. Both nodes and edges can have labels. Nodes, N, are denoted based on data source name l and category c, \(l^c\); e.g., \(\tau ^O\) is the Tactics data source (\(\tau\)) of the Offensive category (O). The source category c can be Offensive O, Mitigative M, or Detection D, \(c \in \lbrace O, M, D\rbrace\). The data source name l can be Tactics \(\tau\), Techniques \(\epsilon\), Attack Patterns \(\alpha\), Weaknesses \(\omega\), Vulnerabilities \(\nu\), Exploits \(\chi\), Mitigations \(\delta\), Engagements \(\eta\), or Analytics \(\kappa\), \(l \in \lbrace \tau , \epsilon , \alpha , \omega , \nu , \chi , \delta , \eta , \kappa \rbrace\) (see Table 12).
Edges are bidirectional between nodes; see Figure 6 for the edges that exist in the data sources. The number of possible edges between two node sets is \(|N_i| \times |N_j|\) when \(N_i \ne N_j\).
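A minimal sketch of this structure (the entry identifiers are examples of real ATT&CK IDs, but the specific link shown is illustrative):

```python
from collections import defaultdict

# Nodes keyed by (data source name l, category c), as in the notation above.
nodes = {
    ("tactic", "O"): ["TA0001"],            # l = tactic, c = Offensive
    ("technique", "O"): ["T1078", "T1033"],
}

edges = defaultdict(set)

def add_edge(u, v):
    # Every cross-source link is stored in both directions (bidirectional).
    edges[u].add(v)
    edges[v].add(u)

add_edge("TA0001", "T1078")

# Upper bound on edges between two distinct node sets: |N_i| x |N_j|.
max_edges = len(nodes[("tactic", "O")]) * len(nodes[("technique", "O")])
```

Storing each edge in both directions is what makes traversal possible from either endpoint, e.g., from a technique back to its tactic.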
Fig. 6.
Fig. 6. Illustration of the BRON property graph. Nodes are denoted based on data source name l and category c, \(l^c\). The source category c can be: Offensive O, Mitigative M or Detection D. The data source name l can be: Tactics \(\tau\), Techniques \(\epsilon\), Attack Patterns \(\alpha\), Weakness \(\omega\), Vulnerabilities \(\nu\), Exploits \(\chi\), Mitigations \(\delta\), Engagements \(\eta\), Analytics \(\kappa\). Edges are bidirectional between nodes for edges that exist in the data sources.

A.2.2 Property Graph Construction from Knowledge Sources.

The information in the BRON property graph is as accurate as the links in the public data sources that it relies on.
Table 12 shows the symbols, information sources and types, organization, and descriptions of the selected knowledge sources. They are visualized with an example in Figure 7.
Fig. 7.
Fig. 7. BRON data sources 7(a) and an example of entries and paths in BRON 7(b). Red indicates offensive data sources, and blue indicates defensive data sources.

A.3 Red Team Planning

Entries on paths to Metasploit exploits for the Initial-access and Persistence tactics retrieved from BRON are shown in Table 13.
Filtered Metasploit module paths are shown in Table 14.
Entries on paths to Metasploit exploits for the Initial-access and Persistence tactics retrieved from BRON are shown in Table 15.

A.4 Modeling and Simulation of Defensive Postures

The hyper-parameter combinations for evaluation are displayed in Table 16 and are based upon prior research [60, 61, 66].
We selected the A2C model from the Stable Baselines3 package [41, 53] and used its default learning rate of 0.0007 [53].

A.5 Relationship Inference Workflow Setup Details

For 1. – dataset creation, note that, because of computational expense, we considered only CVEs from 2021 when working with the weakness allows vulnerability relationship. There were 174,835 CVE entries in total and 14,613 entries in 2021. For inference, 1,000 links were randomly sampled as positive examples.
For 2. – translating text to features, experimental details for each Language Embedding Model follow:
Bag-of-Words (BoW): The CountVectorizer module generated the vectors of word counts [49].
GloVe: We use the en_core_web_lg pipeline in the spaCy library [24].
BERT [9]: We obtain BERT and a tokenizer from the Hugging Face Transformers library [62]. We use the BertModel so that both the pooler output and the [CLS] final hidden state can be accessed.
Fine-tuned BERT (F-BERT): We fine-tuned BERT on BRON text data, including weaknesses, attack patterns, techniques, tactics, mitigations, and detections entries, using Hugging Face’s BertForMaskedLM masked language modeling objective. A 90/10 train-validation split was used in fine-tuning. Hugging Face’s DataCollatorForLanguageModeling [62] was used to batch the training/validation data, pad the sequences to the maximum length of the batch, and randomly mask 15% of the tokens in the sequences. The model was fine-tuned for 50 epochs with a batch size of eight.
To account for bias in the model algorithm and the data, trials using different seeds were performed for both the classifier and the train-test split. One hundred trials were performed for each (embedding, dataset) pair, with seeds in \([0,\dots ,99]\) used in the random_state parameters of the classifier and the data split. After fine-tuning, we selected the model with the lowest validation loss across all epochs to use going forward.
Table 12.
Symbol | Source and Type of Entry | Description
\(\tau ^O\) | MITRE ATT&CK Tactics [36] | Common tactics of attack staging; the columns of the ATT&CK matrix.
\(\epsilon ^{O,M,D}\) | MITRE ATT&CK Techniques [36] | Means of achieving a tactical objective, organized by Tactic; the row elements of the ATT&CK matrix.
\(\alpha ^{O,M,D}\) | MITRE CAPEC Attack Patterns [37] | Relates the abstract why and how (Tactic and Technique) of an attack objective to the target where (Weakness) of the attack.
\(\omega ^{O,M,D}\) | MITRE CWE Common Weakness Enumeration [39] | Security-related flaws in architecture, design, or code.
\(\nu ^O\) | NVD CVE Common Vulnerabilities and Exposures [44] | Security-related flaws in software and applications, with the specific software application or hardware platform releases that are affected. The Common Platform Enumeration (CPE) is used for Affected Product Configurations [43].
\(\chi ^O\) | Offensive-Security Exploit Database Exploits [46] | The Exploit Database provides scripts (tools) for exploits.
\(\delta ^M\) | MITRE D3FEND Mitigations [30] | A knowledge graph of cybersecurity countermeasures.
\(\kappa ^D\) | MITRE CAR Detections [34] | A knowledge base of analytics based on the ATT&CK adversary model.
\(\eta ^M\) | MITRE ENGAGE Mitigations [35] | Cybersecurity mitigation goals, approaches, and activities.
Table 12. BRON Symbols, Organization, Information Sources, and Short Descriptions
For step 3 – training one classifier per feature representation – we empirically tuned the class error weights of the cost matrix to emphasize minimizing false positives. The RandomForestClassifier [49] supports class error weighting via the class_weight attribute. Using a class weight of five on negative examples (and one on positive examples) with BoW reduced the false positive rate while also increasing accuracy: when inferring links, the class weight reduced the proportion of results with probability above 0.5 from 29% to 19%. However, when we increased the class weight of negative examples and used the RandomForestClassifier on input embeddings from GloVe, BERT, or F-BERT, the false positive rate increased. Ultimately, for link inference we used a class weight of five on negative examples when BoW embeddings were inputs, and the default class weight of one when GloVe, BERT, and F-BERT embeddings were used to train the RandomForestClassifier.
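The class-error weighting described above can be sketched as follows. The feature matrix is synthetic and stands in for BoW vectors of entry-pair text; the weight of five on the negative class matches the setting reported above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for BoW feature vectors of candidate entry pairs.
rng = np.random.default_rng(0)
X = rng.random((200, 50))
y = (X[:, 0] + 0.1 * rng.random(200) > 0.5).astype(int)  # 1 = related pair

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# class_weight penalizes errors on the negative (0) class five times more,
# pushing the classifier toward fewer false-positive inferred links.
clf = RandomForestClassifier(class_weight={0: 5, 1: 1}, random_state=0)
clf.fit(X_tr, y_tr)

# Probability that each held-out pair is a genuine link.
probs = clf.predict_proba(X_te)[:, 1]
```

With the weighting in place, fewer pairs cross the 0.5 probability threshold, mirroring the drop from 29% to 19% reported above.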
Table 13.
Type | Entries | Count
Tactic | [Initial-access, Persistence] | 2
Technique | [Default Accounts, Boot or Logon Initialization Scripts, Web Shell, Shortcut Modification, Dynamic Linker Hijacking, Services File Permissions Weakness] | 6
CAPEC | [Try Common or Default Usernames and Passwords, Run Software at Logon, Upload a Web Shell to a Web Server, Symlink Attack, Subverting Environment Variable Values, Using Malicious Files] | 6
CWE | [Use of Hard-coded Credentials, Improper Access Control, Improper Authentication, Improper Link Resolution Before File Access (“Link Following”), Improper Neutralization of Special Elements in Output Used by a Downstream Component (“Injection”), Improper Input Validation, Exposure of Sensitive Information to an Unauthorized Actor, Incorrect Permission Assignment for Critical Resource] | 8
CVE | [CVE-2017-14143, CVE-2018-10575, CVE-2015-2509, CVE-2015-4624, CVE-2016-1543, CVE-2016-9722, CVE-2009-0695, CVE-2010-4279, CVE-2013-1080, CVE-2013-6117, CVE-2014-3139, CVE-2015-1486, CVE-2017-12477, CVE-2017-12478, CVE-2017-13872, CVE-2017-17560, CVE-2018-12613, CVE-2018-20735, CVE-2010-3847, CVE-2015-3315, CVE-2016-6253, CVE-2013-3214, CVE-2015-7309, CVE-2006-4842, CVE-2008-2683, CVE-2008-6791, CVE-2010-3904, CVE-2011-2763, CVE-2011-3496, CVE-2012-0267, CVE-2012-3399, CVE-2012-3485, CVE-2012-6554, CVE-2013-1362, CVE-2013-1892, CVE-2013-2143, CVE-2013-5045, CVE-2013-5576, CVE-2013-6282, CVE-2014-0038, CVE-2014-0257, CVE-2014-0476, CVE-2014-4114, CVE-2014-4971, CVE-2014-8361, CVE-2015-3245, CVE-2015-6567, CVE-2016-0792, CVE-2016-2098, CVE-2016-3087, CVE-2016-3088, CVE-2016-3714, CVE-2016-6433, CVE-2017-0143, CVE-2017-11346, CVE-2017-11394, CVE-2017-12500, CVE-2017-17562, CVE-2017-5638, CVE-2017-5816, CVE-2017-5817, CVE-2017-6316, CVE-2017-6516, CVE-2017-9791, CVE-2018-1000049, CVE-2018-11776, CVE-2018-5955, CVE-2018-7600, CVE-2011-3829, CVE-2012-3996, CVE-2013-0632, CVE-2015-2433, CVE-2016-4655, CVE-2016-9349, CVE-2017-17692, CVE-2018-6849, CVE-2018-9948, CVE-2019-1653, CVE-2011-3923] | 79
Metasploit | [Kaltura, Watchguard AP100 AP102 AP200 1.2.9.15, Microsoft Windows Media Center, Hak5 WiFi Pineapple 2.4, BMC Server Automation RSCD Agent, IBM QRadar SIEM, Wyse, Pandora FMS 3.1, Novell ZENworks Configuration Management 10 SP3/11 SP2, Dahua DVR 2.608.0000.0/2.608.GV00.0, Unitrends Enterprise Backup 7.3.0, Symantec Endpoint Protection Manager, Unitrends UEB 9, Unitrends UEB, Apple macOS 10.13.1 (High Sierra), Western Digital MyCloud, phpMyAdmin, BMC Patrol Agent, glibc, ABRT, NetBSD, vTiger CRM 5.4.0 SOAP, CMS Bolt, Solaris, Black Ice Cover Page SDK, PumpKIN TFTP Server 2.7.2.0, Linux 2.6.30 < 2.6.36, LifeSize Room, Measuresoft ScadaPro 4.0.0, NTR, Basilic 1.5.14, Tunnelblick, Active Collab ’chat module’ < 2.3.8, Nagios Remote Plugin Executor, MongoDB, Katello (RedHat Satellite), Microsoft Registry Symlink, Joomla! Component Media Manager, Google Android, Linux Kernel 3.13.1, Microsoft .NET Deployment Service, Chkrootkit, Microsoft Windows, Microsoft Bluetooth Personal Area Networking, Realtek SDK, Libuser, Wolf CMS 0.8.2, Jenkins, Ruby on Rails ActionPack Inline ERB, Apache Struts, ActiveMQ < 5.14.0, ImageMagick 6.9.3, Cisco Firepower Management Console 6.0, ManageEngine Desktop Central 10 Build 100087, Trend Micro OfficeScan 11.0/XG (12.0), HPE iMC 7.3, GoAhead Web Server 2.5 < 3.6.5, Apache Struts 2.3.5 < 2.3.31 / 2.5 < 2.5.10, HPE iMC, Netscaler SD, MagniComp SysInfo, Apache Struts 2, Nanopool Claymore Dual Miner, GitStack, Drupal < 8.3.9 / < 8.4.6 / < 8.5.1, Support Incident Tracker 3.65, Tiki Wiki CMS Groupware 8.3, Adobe ColdFusion 9, WebKit, Advantech SUSIAccess < 3.0, Samsung Internet Browser, WebRTC, Foxit PDF Reader 9.0.1.1049, Cisco RV320 and RV325] | 74
Table 13. Entries on Paths to Metasploit Exploits for the Initial-access and Persistence Tactics Retrieved from BRON
Table 14.
Tactic | Technique | CAPEC | CWE | CVE | Metasploit
Persistence | Dynamic Linker Hijacking | Subverting Environment Variable Values | Exposure of Sensitive Information to an Unauthorized Actor | CVE-2018-6849 | WebRTC
Persistence | Dynamic Linker Hijacking | Subverting Environment Variable Values | Improper Input Validation | CVE-2017-5817 | HPE iMC
Persistence | Dynamic Linker Hijacking | Subverting Environment Variable Values | Improper Input Validation | CVE-2017-5816 | HPE iMC
Initial-access | Default Accounts | Try Common or Default Usernames and Passwords | Use of Hard-coded Credentials | CVE-2017-14143 | Kaltura
Persistence | Default Accounts | Try Common or Default Usernames and Passwords | Use of Hard-coded Credentials | CVE-2017-14143 | Kaltura
Persistence | Dynamic Linker Hijacking | Subverting Environment Variable Values | Improper Input Validation | CVE-2017-11394 | Trend Micro OfficeScan 11.0/XG (12.0)
Table 14. Example Paths to Metasploit Exploits for the Initial-access and Persistence Tactics Retrieved from BRON
Ordered by CVE ID.
Table 15.
Type | Entries | Count
Tactic | [Initial-access, Persistence] | 2
Technique | [Default Accounts, Dynamic Linker Hijacking] | 2
CAPEC | [Try Common or Default Usernames and Passwords, Subverting Environment Variable Values] | 2
CWE | [Use of Hard-coded Credentials, Improper Input Validation, Exposure of Sensitive Information to an Unauthorized Actor] | 3
CVE | [CVE-2017-14143, CVE-2017-11394, CVE-2017-5816, CVE-2017-5817, CVE-2018-6849] | 5
Metasploit | [Kaltura, Trend Micro OfficeScan 11.0/XG (12.0), HPE iMC, WebRTC] | 4
Table 15. Entries on Paths to Metasploit Exploits for the Initial-access and Persistence Tactics Retrieved from BRON
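Path queries like those behind Tables 13–15 can be answered by traversing the unified graph from a Tactic node down to an exploit node. The sketch below uses a tiny hand-made stand-in graph whose nodes follow Table 14's rows; the real BRON graph and its node naming scheme are not reproduced here.

```python
import networkx as nx

# Miniature stand-in for the BRON graph (directed, offensive direction).
g = nx.DiGraph()
g.add_edges_from([
    ("tactic:Persistence", "technique:Dynamic Linker Hijacking"),
    ("technique:Dynamic Linker Hijacking",
     "capec:Subverting Environment Variable Values"),
    ("capec:Subverting Environment Variable Values",
     "cwe:Improper Input Validation"),
    ("cwe:Improper Input Validation", "cve:CVE-2017-5817"),
    ("cve:CVE-2017-5817", "exploit:HPE iMC"),
])

# Enumerate every Tactic-to-exploit path, as in Table 14's rows.
paths = list(nx.all_simple_paths(g, "tactic:Persistence", "exploit:HPE iMC"))
for p in paths:
    print(" -> ".join(p))
```

Because the cross-source links are made bidirectional, the same traversal can equally be run in reverse, from an exploit back to the tactics it serves.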
Table 16.
Name | Mutation probability | Crossover probability | Elite size | Tournament size | Population size
GE | 0.1 | 0.8 | 0 | 2 | 10
Table 16. GE(CCA) Hyperparameters
All experiments were conducted with a population size of 10 and tournament size of 2.
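The selection step implied by the Table 16 settings (tournament size 2, population size 10) can be sketched as follows. The genome encoding and fitness values are synthetic; GE(CCA)'s actual representation and evaluation are not reproduced here.

```python
import random

random.seed(0)

# Synthetic population of 10 individuals with random genomes and fitnesses.
population = [{"genome": [random.randint(0, 255) for _ in range(8)],
               "fitness": random.random()} for _ in range(10)]

def tournament_select(pop, size=2):
    """Draw `size` individuals uniformly at random; return the fittest."""
    contenders = random.sample(pop, size)
    return max(contenders, key=lambda ind: ind["fitness"])

# One parent per offspring slot; selected parents then undergo crossover
# (probability 0.8) and mutation (probability 0.1) per Table 16.
parents = [tournament_select(population) for _ in range(len(population))]
```

A tournament size of 2 keeps selection pressure mild, which suits the small population of 10 used here.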
Per Figure 4, examples of positive relationships, i.e., related entries, are vastly outnumbered by unrelated ones; for example, there are only 117 examples of the Technique uses Attack Pattern relationship. This introduced a class imbalance for training the RandomForestClassifier, which we addressed by under-sampling the negative class and training on a smaller but balanced training set.
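The under-sampling step above can be sketched as follows. The feature vectors are synthetic; the count of 117 positives mirrors the Technique uses Attack Pattern example mentioned above, and the negative count is an arbitrary stand-in.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X_pos = rng.random((117, 50))   # related (positive) entry pairs
X_neg = rng.random((5000, 50))  # unrelated (negative) entry pairs

# Randomly keep as many negatives as there are positives.
keep = rng.choice(len(X_neg), size=len(X_pos), replace=False)
X = np.vstack([X_pos, X_neg[keep]])
y = np.concatenate([np.ones(len(X_pos)), np.zeros(len(X_pos))])

# Train on the smaller but balanced set.
clf = RandomForestClassifier(random_state=0).fit(X, y)
```

Under-sampling discards information in the negative class, but avoids the classifier trivially predicting "unrelated" for every candidate link.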

References

[1]
NIST. 2022. NVD - Vulnerability Metrics. Retrieved from https://nvd.nist.gov/vuln-metrics/cvss
[2]
Neda AfzaliSeresht, Yuan Miao, Qing Liu, Assefa Teshome, and Wenjie Ye. 2020. Investigating cyber alerts with graph-based analytics and narrative visualization. In 24th International Conference Information Visualisation (IV’20). IEEE, 521–529.
[3]
ALFA Group. 2022. BRON repository. Retrieved from https://github.com/ALFA-group/BRON
[4]
Benjamin Ampel, Sagar Samtani, Steven Ullman, and Hsinchun Chen. 2021. Linking common vulnerabilities and exposures to the MITRE ATT&CK framework: A self-distillation approach. arXiv preprint arXiv:2108.01696 (2021).
[5]
Afsah Anwar, Ahmed A. Abusnaina, Songqing Chen, Frank H. Li, and David A. Mohaisen. 2021. Cleaning the NVD: Comprehensive quality assessment, improvements, and analyses. In 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks - Supplemental Volume (DSN-S’21). 1–2.
[6]
Frederico Araujo, Dhilung Kirat, Xiaokui Shu, Teryl Taylor, and Jiyong Jang. 2021. Evidential Cyber Threat Hunting.
[7]
Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. 2016. OpenAI Gym.
[8]
Raphael Bronfman-Nadas, Nur Zincir-Heywood, and John T. Jacobs. 2018. An artificial arms race: Could it improve mobile malware detectors? In Network Traffic Measurement and Analysis Conference (TMA’18). 1–8. DOI:
[9]
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.
[10]
Neil Dhir, Henrique Hoeltgebaum, Niall Adams, Mark Briers, Anthony Burke, and Paul Jones. 2021. Prospective Artificial Intelligence Approaches for Active Cyber Defence.
[11]
Ying Dong, Wenbo Guo, Yueqi Chen, Xinyu Xing, Yuqing Zhang, and Gang Wang. 2019. Towards the detection of inconsistencies in public security vulnerability reports. In 28th USENIX Security Symposium (USENIX Security’19). USENIX Association, 869–885.
[12]
Gabriel Dulac-Arnold, Richard Evans, Peter Sunehag, and Ben Coppin. 2015. Reinforcement learning in large discrete action spaces. CoRR abs/1512.07679 (2015).
[13]
Aviad Elitzur, Rami Puzis, and Polina Zilberman. 2019. Attack hypothesis generation. In European Intelligence and Security Informatics Conference (EISIC’19). IEEE, 40–47.
[14]
Gregory Falco, Arun Viswanathan, Carlos Caldera, and Howard Shrobe. 2018. A master attack methodology for an AI-based automated attack planner for smart cities. IEEE Access 6 (2018), 48360–48373.
[15]
Peng Gao, Fei Shao, Xiaoyuan Liu, Xusheng Xiao, Zheng Qin, Fengyuan Xu, Prateek Mittal, Sanjeev R. Kulkarni, and Dawn Song. 2021. Enabling efficient cyber threat hunting with cyber threat intelligence. In IEEE 37th International Conference on Data Engineering (ICDE’21). IEEE, 193–204.
[16]
Ibrahim Ghafir, Mohammad Hammoudeh, Vaclav Prenosil, Liangxiu Han, Robert Hegarty, Khaled Rabie, and Francisco J. Aparicio-Navarro. 2018. Detection of advanced persistent threat using machine-learning correlation analysis. Fut. Gen. Comput. Syst. 89 (2018), 349–359.
[17]
Yongyan Guo, Zhengyu Liu, Cheng Huang, Jiayong Liu, Wangyuan Jing, Ziwang Wang, and Yanghao Wang. 2021. CyberRel: Joint entity and relation extraction for cybersecurity concepts. In International Conference on Information and Communications Security. Springer, 447–463.
[18]
Kim Hammar and Rolf Stadler. 2022. Learning security strategies through game play and optimal stopping. arXiv preprint arXiv:2205.14694 (2022).
[19]
Timothy Hart and Daniel Edwards. 1963. The Alpha-Beta Heuristic. Massachusetts Institute of Technology, USA.
[20]
Erik Hemberg, Jonathan Kelly, Michal Shlapentokh-Rothman, Bryn Reinstadler, Katherine Xu, Nick Rutar, and Una-May O’Reilly. 2020. BRON–linking attack tactics, techniques, and patterns with defensive weaknesses, vulnerabilities and affected platform configurations. arXiv preprint arXiv:2010.00533 (2020).
[21]
Erik Hemberg and Una-May O’Reilly. 2021. Using a collated cybersecurity dataset for machine learning and artificial intelligence. ArXiv abs/2108.02618 (2021).
[22]
Erik Hemberg, Ashwin Srinivasan, Nick Rutar, and Una-May O’Reilly. 2022. Sourcing language models and text information for inferring cyber threat, vulnerability and mitigation relationships. In AI4Cyber/MLHat: AI-enabled Cybersecurity Analytics and Deployable Defense at KDD.
[23]
Erik Hemberg, Ashwin Srinivasan, Nick Rutar, and Una-May O’Reilly. 2022. Using machine learning to infer plausible and undetected cyber threat, vulnerability and mitigation relationships. In ML4Cyber Workshop at ICML 2022.
[24]
Matthew Honnibal, Ines Montani, Sofie Van Landeghem, and Adriane Boyd. 2020. spaCy: Industrial-strength natural language processing in Python. DOI:
[25]
Ghaith Husari, Ehab Al-Shaer, Mohiuddin Ahmed, Bill Chu, and Xi Niu. 2017. TTPDrill: Automatic and accurate extraction of threat actions from unstructured text of CTI sources. In 33rd Annual Computer Security Applications Conference. 103–115.
[26]
Yuning Jiang, M. Jeusfeld, and Jianguo Ding. 2021. Evaluating the data inconsistency of open-source vulnerability repositories. In 16th International Conference on Availability, Reliability and Security.
[27]
Chi Jin, Tiancheng Jin, Haipeng Luo, Suvrit Sra, and Tiancheng Yu. 2020. Learning adversarial Markov decision processes with bandit feedback and unknown transition. In International Conference on Machine Learning. PMLR, 4860–4869.
[28]
Hyeonseong Jo, Jinwoo Kim, Phillip A. Porras, V. Yegneswaran, and Seungwon Shin. 2021. GapFinder: Finding inconsistency of security information from unstructured text. IEEE Trans. Inf. Forens. Secur. 16 (2021), 86–99.
[29]
P. Johnson, Robert Lagerström, M. Ekstedt, and U. Franke. 2018. Can the common vulnerability scoring system be trusted? A Bayesian analysis. IEEE Trans. Depend. Sec. Comput. 15 (2018), 1002–1015.
[30]
Peter E. Kaloroumakis and Michael J. Smith. 2021. Toward a knowledge graph of cybersecurity countermeasures. The MITRE Corporation (2021).
[31]
Bin Liu, Xixi Zhu, Junfeng Wu, and Li Yao. 2020. Rule reduction after knowledge graph mining for cyber situational awareness analysis. Procedia Comput. Sci. 176 (2020), 22–30.
[32]
Haipeng Luo, Chen-Yu Wei, and Chung-Wei Lee. 2021. Policy optimization in adversarial MDPs: Improved exploration via dilated bonuses. Adv. Neural Inf. Process. Syst. 34 (2021), 22931–22942.
[33]
Sadegh M. Milajerdi, Birhanu Eshete, Rigel Gjomemo, and V. N. Venkatakrishnan. 2019. POIROT: Aligning attack behavior with kernel audit records for cyber threat hunting. In ACM SIGSAC Conference on Computer and Communications Security. 1795–1812.
[34]
MITRE. 2021. MITRE Cyber Analytics Repository. Retrieved from https://car.mitre.org/
[35]
MITRE. 2021. MITRE Engage. Retrieved from https://engage.mitre.org/
[36]
MITRE. 2022. ATT&CK Matrix for Enterprise. Retrieved from https://attack.mitre.org/
[37]
MITRE. 2022. Common Attack Pattern Enumeration and Classification. Retrieved from https://capec.mitre.org/
[38]
MITRE. 2022. Common Vulnerabilities and Exposure. Retrieved from https://cve.mitre.org/
[39]
MITRE. 2022. Common Weakness Enumeration. Retrieved from https://cwe.mitre.org/
[40]
[41]
Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. 2016. Asynchronous methods for deep reinforcement learning. In 33rd International Conference on Machine Learning (Proceedings of Machine Learning Research), Maria Florina Balcan and Kilian Q. Weinberger (Eds.), Vol. 48. PMLR, New York, New York, 1928–1937. Retrieved from https://proceedings.mlr.press/v48/mniha16.html
[42]
Stephen Frank Moskal. 2021. HeAt PATRL: Network-agnostic Cyber Attack Campaign Triage with Pseudo-active Transfer Learning. Ph.D. Dissertation. RIT.
[43]
NIST. 2022. Common Platform Enumeration. Retrieved from https://nvd.nist.gov/products/cpe
[44]
NIST. 2022. National Vulnerability Database. Retrieved from https://nvd.nist.gov
[45]
Umara Noor, Zahid Anwar, Asad Waqar Malik, Sharifullah Khan, and Shahzad Saleem. 2019. A machine learning framework for investigating data breaches based on semantic analysis of adversary’s attack patterns in threat intelligence repositories. Fut. Gen. Comput. Syst. 95 (2019), 467–487.
[46]
Offensive Security. 2022. Exploit Database. Retrieved from https://www.exploit-db.com/
[47]
Tuomas Oikarinen, Wang Zhang, Alexandre Megretski, Luca Daniel, and Tsui-Wei Weng. 2021. Robust deep reinforcement learning through adversarial loss. In Advances in Neural Information Processing Systems, M. Ranzato, A. Beygelzimer, Y. Dauphin, P. S. Liang, and J. Wortman Vaughan (Eds.), Vol. 34. Curran Associates, Inc., 26156–26167. Retrieved from https://proceedings.neurips.cc/paper/2021/file/dbb422937d7ff56e049d61da730b3e11-Paper.pdf
[48]
Una-May O’Reilly, Jamal Toutouh, Marcos Pertierra, Daniel Prado Sanchez, Dennis Garcia, Anthony Erb Luogo, Jonathan Kelly, and Erik Hemberg. 2020. Adversarial genetic programming for cyber security: A rising application domain where GP matters. Genet. Program. Evolv. Mach. 21, 1–2 (June2020), 219–250. DOI:
[49]
F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 12 (2011), 2825–2830.
[50]
Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global vectors for word representation. In Conference on Empirical Methods in Natural Language Processing (EMNLP’14). 1532–1543. Retrieved from http://www.aclweb.org/anthology/D14-1162
[51]
Aditya Pingle, Aritran Piplai, Sudip Mittal, Anupam Joshi, James Holt, and Richard Zak. 2019. RelExt: Relation extraction using deep learning approaches for cybersecurity knowledge graph improvement. In IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining. 879–886.
[52]
Aritran Piplai, Sudip Mittal, Anupam Joshi, Tim Finin, James Holt, and Richard Zak. 2020. Creating cybersecurity knowledge graphs from malware after action reports. IEEE Access 8 (2020), 211691–211703.
[53]
Antonin Raffin, Ashley Hill, Adam Gleave, Anssi Kanervisto, Maximilian Ernestus, and Noah Dormann. 2021. Stable-Baselines3: Reliable reinforcement learning implementations. J. Mach. Learn. Res. 22, 268 (2021), 1–8. Retrieved from http://jmlr.org/papers/v22/20-1364.html
[54]
Priyanka Ranade, Aritran Piplai, Sudip Mittal, Anupam Joshi, and Tim Finin. 2021. Generating fake cyber threat intelligence using transformer-based models. arXiv preprint arXiv:2102.04351 (2021).
[55]
Rapid7. 2022. Metasploit. Retrieved from https://www.metasploit.com/
[56]
Nidhi Rastogi, Sharmishtha Dutta, Mohammed J. Zaki, Alex Gittens, and Charu Aggarwal. 2020. MALOnt: An ontology for malware threat intelligence. In International Workshop on Deployable Machine Learning for Security Defense. Springer, 28–44.
[57]
Tim Salimans, Jonathan Ho, Xi Chen, Szymon Sidor, and Ilya Sutskever. 2017. Evolution Strategies as a Scalable Alternative to Reinforcement Learning. DOI:
[58]
Brian Schweigler, Oscar Nierstrasz, and Pascal Gadient. 2020. An Investigation into Vulnerability Databases. Master’s thesis. University of Bern, Switzerland.
[59]
Sevil Sen, Emre Aydogan, and Ahmet I. Aysan. 2018. Coevolution of mobile malware and anti-malware. IEEE Trans. Inf. Forens. Secur. 13, 10 (2018), 2563–2574. DOI:
[60]
Michal Shlapentokh-Rothman, Jonathan Kelly, Avital Baral, Erik Hemberg, and Una-May O’Reilly. 2021. Coevolutionary Modeling of Cyber Attack Patterns and Mitigations Using Public Datasets. Association for Computing Machinery, New York, NY, 714–722. DOI:
[61]
Matthew J. Turner, Erik Hemberg, and Una-May O’Reilly. 2022. Analyzing multi-agent reinforcement learning and coevolution in cybersecurity. In Genetic and Evolutionary Computation Conference. 1290–1298.
[62]
Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, and Alexander M. Rush. 2020. Transformers: State-of-the-art natural language processing. In Conference on Empirical Methods in Natural Language Processing: System Demonstrations. Association for Computational Linguistics, 38–45. Retrieved from https://www.aclweb.org/anthology/2020.emnlp-demos.6
[63]
Hongbo Xiao, Zhenchang Xing, Xiaohong Li, and Hao Guo. 2019. Embedding and predicting software security entity relationships: A knowledge graph based approach. In Neural Information Processing, Tom Gedeon, Kok Wai Wong, and Minho Lee (Eds.). Springer International Publishing, Cham, 50–63.
[64]
Wenjun Xiong and Robert Lagerström. 2019. Threat modeling—A systematic literature review. Comput. Secur. 84 (2019), 53–69.
[65]
Kaiqing Zhang, Zhuoran Yang, and Tamer Başar. 2021. Multi-agent reinforcement learning: A selective overview of theories and algorithms. In Handbook of Reinforcement Learning and Control. Springer International Publishing, Cham, 321–384.
[66]
Linda Zhang and Erik Hemberg. 2019. Investigating algorithms for finding Nash equilibria in cyber security problems. 1659–1667. DOI:

Published In

Digital Threats: Research and Practice  Volume 5, Issue 1
March 2024
253 pages
EISSN:2576-5337
DOI:10.1145/3613525
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 21 March 2024
Online AM: 11 August 2023
Accepted: 27 July 2023
Revised: 22 July 2023
Received: 01 December 2022
Published in DTRAP Volume 5, Issue 1


Author Tags

  1. Cyber security
  2. threat hunting
  3. machine learning
  4. natural language processing
  5. information retrieval
  6. reinforcement learning
  7. coevolutionary algorithm

Qualifiers

  • Research-article

Funding Sources

  • Defense Advanced Research Projects Agency (DARPA)
  • Naval Warfare Systems Center, Pacific (SSC Pacific)

Cited By

  • (2025)AI-driven fusion with cybersecurity: Exploring current trends, advanced techniques, future directions, and policy implications for evolving paradigms– A comprehensive reviewInformation Fusion10.1016/j.inffus.2024.102922118(102922)Online publication date: Jun-2025
  • (2024)Generative AI for Threat Hunting and Behaviour AnalysisUtilizing Generative AI for Cyber Defense Strategies10.4018/979-8-3693-8944-7.ch007(235-286)Online publication date: 13-Sep-2024
  • (2024)TOWARDS IMPROVED THREAT MITIGATION IN DIGITAL ENVIRONMENTS: A COMPREHENSIVE FRAMEWORK FOR CYBERSECURITY ENHANCEMENTInternational Journal of Research -GRANTHAALAYAH10.29121/granthaalayah.v12.i5.2024.565512:5Online publication date: 14-Jun-2024
  • (2024)Automating Cyber Defense: Enhancing Threat Intelligence with AI-Driven Annotation2024 7th International Conference on Signal Processing and Information Security (ICSPIS)10.1109/ICSPIS63676.2024.10812585(1-5)Online publication date: 12-Nov-2024
  • (2024)Securing the Virtual Realm: Strategies for Cybersecurity in Augmented Reality (AR) and Virtual Reality (VR) Applications2024 8th International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC)10.1109/I-SMAC61858.2024.10714591(520-526)Online publication date: 3-Oct-2024
  • (2024)AWEB to Bridge Cybersecurity Attack Patterns and Weaknesses2024 IEEE International Conference on Big Data (BigData)10.1109/BigData62323.2024.10825621(5567-5576)Online publication date: 15-Dec-2024
  • (2024)Evolving techniques in cyber threat huntingJournal of Network and Computer Applications10.1016/j.jnca.2024.104004232:COnline publication date: 1-Dec-2024
