1 Introduction
The practice of
Coordinated Vulnerability Disclosure (
CVD) emerged as part of a growing consensus to develop normative behaviors in response to the persistent fact of vulnerable software. Yet while the basic principles of CVD have been established [
13,
22,
23,
30], to date there has been limited work to measure the efficacy of CVD programs, especially at the scale of industry benchmarks.
ISO 29147 [
23] sets out the goals of vulnerability disclosure:
(a) ensuring that identified vulnerabilities are addressed;
(b) minimizing the risk from vulnerabilities;
(c) providing users with sufficient information to evaluate risks from vulnerabilities to their systems;
(d) setting expectations to promote positive communication and coordination among involved parties.
Meanwhile, use of third party libraries and shared code components across vendors and their products creates a need to coordinate across those parties whenever a vulnerability is found in a shared component.
Multi-Party Coordinated Vulnerability Disclosure (
MPCVD) is a more complex form of CVD, as illustrated by the Senate hearings about the Meltdown and Spectre vulnerabilities [
31]. The need for MPCVD arises from the complexities of the software supply chain [
18]. Nevertheless, the goals of CVD apply to MPCVD, as the latter is merely a special case of the former.
The difficulty of MPCVD derives from the diversity of its stakeholders: Software vendors have different development budgets, schedules, tempos, and analysis capabilities with which to isolate, understand, and fix vulnerabilities. Additionally, they face diverse customer support expectations and obligations, as well as an increasing variety of regulatory regimes governing some stakeholders but not others. For these and many other reasons, practitioners of MPCVD highlight
fairness as a core difficulty in coordinating disclosures across vendors [
22].
With the goal of minimizing the societal harm that results from the existence of a vulnerability in multiple products spread across multiple vendors, our motivating question is “What does fair mean in MPCVD?” Optimizing MPCVD directly is not currently possible because we lack a utility function to map from the events that occur in a given case to the impact that case has on the world. While this article does not fully address the problem, it takes a number of steps toward a solution. We seek a way to sort MPCVD cases into better or worse outcomes. Ideally, the sorting criteria should be agreed upon based on unambiguous principles that are intelligible to all interested parties. Furthermore, we seek a way to measure relevant features across MPCVD cases. Feature observability is a key factor: Our measurement needs to be simple and repeatable without relying too heavily on proprietary or easily hidden information.
While a definition of
fair in MPCVD is a responsibility for the broader community, we focus on evaluating the skill of the coordinator. We expect this to contribute to fairness, based on the EthicsfIRST principles of ethics for incident response teams promoted by the
Forum of Incident Response and Security Teams (
FIRST) [
39].
To that end, our research questions are:
RQ1: Construct a model of CVD states amenable to analysis and also future generalization to MPCVD.
RQ2: What is a reasonable baseline expectation for ordering of events in the model of CVD?
RQ3: Given this baseline and model, does CVD as observed “in the wild” demonstrate skillful behavior?
This article primarily focuses on the simpler case of CVD. This focus provides an opportunity for incremental analysis of the success of the model; MPCVD modeling can follow in future work.
1.1 Contributions
The contributions of this article are as follows:
—
We define a simple yet comprehensive model of possible disclosure histories in Section
3 and a set of criteria to order them in Section
4, which will address
RQ1.
—
We explore the implications of our model with respect to expected outcomes in Section
5, which will address
RQ2.
—
We propose a method for measuring the relative contribution of both skill and luck to observations of CVD outcomes over time in Section
6.
—
We demonstrate the application of these techniques to analyze the efficacy of observed CVD processes in Section
7, which will address
RQ3.
A discussion of how the model could be applied to benchmarks and multiparty coordination follows in Section
8. Section
9 describes the limitations of the approach and lays out future work to improve it. Section
10 surveys related work, and Section
11 summarizes and concludes.
2 Events in a Vulnerability Lifecycle
The goal of this section is to establish a model of events that affect the outcomes of vulnerability disclosure.
Our model builds on previous models of the vulnerability lifecycle, specifically those of Arbaugh et al.[
1], Frei et al.[
19], and Bilge et al. [
8]. A more thorough literature review of vulnerability lifecycle models can be found in [
27].
Because we are modeling only the disclosure process, we assume the vulnerability both exists and is known to at least someone. Therefore, we ignore the
birth (
creation,
introduced) and
discovery states as they are implied at the beginning of all possible vulnerability disclosure histories. We also omit the
anti-virus signatures released event from [
8] since we are not attempting to model vulnerability management operations in detail.
The first event we are interested in modeling is Vendor Awareness (
\(V\)). This event corresponds to
Disclosure in [
1] and
vulnerability discovered by vendor in [
8] (this event is not modeled in [
19]). We are not concerned with
how the vendor came to find out about the vulnerability’s existence, whether it was found via internal testing, reported by a security researcher, or noticed as the result of incident analysis.
The second event we include is Public Awareness (
\(P\)) of the vulnerability. This event corresponds to
Publication in [
1],
time of public disclosure in [
19], and
vulnerability disclosed publicly in [
8]. The public might find out about a vulnerability through the vendor’s announcement of a fix, a news report about a security breach, a conference presentation by a researcher, by comparing released software versions as in [
40,
41], or any of a variety of other means. As above, we are primarily concerned with the occurrence of the event itself rather than the details of
how the
\(P\) event arises.
The third event we address is Fix Readiness (
\(F\)), by which we refer to the vendor’s creation and possession of a fix that
could be deployed to a vulnerable system,
if the system owner knew of its existence. Here we differ somewhat from [
1,
8,
19] in that their models address the
release of the fix rather than its
readiness for release.
The reason for this distinction will be made clear shortly, but first we must mention that Fix Deployed (\(D\)) is simply that: the fix exists, and it has been deployed.
We chose to include the Fix Ready (
\(F\)), Fix Deployed (
\(D\)), and Public Awareness (
\(P\)) events so that our model could better accommodate two common modes of modern software deployment:
—
shrinkwrap —The traditional distribution mode in which the vendor and deployer are distinct entities and deployers must be made aware of the fix before it can be deployed. In this case, which corresponds to the previously mentioned fix release event, both fix readiness (\(F\)), and public awareness (\(P\)) are necessary for the fix to be deployed (\(D\)).
—
SaaS—A more recent delivery mode in which the vendor also plays the role of deployer. In this distribution mode, fix readiness (\(F\)) can lead directly to fix deployed (\(D\)) with no dependency on public awareness (\(P\)).
We note that so-called silent fixes by vendors can sometimes result in a fix being deployed without public awareness even if the vendor is not the deployer. Thus, it is possible, albeit unlikely, for \(D\) to occur before \(P\) even in the shrinkwrap case above. It is also possible, and somewhat more likely, for \(P\) to occur before \(D\) in the SaaS case as well.
We diverge from [
1,
8,
19] again in our treatment of exploits and attacks. Because attacks and exploit availability are often discretely observable events, the broader concept of
exploit automation in [
1] is insufficiently precise for our use. Both [
8,
19] focus on the availability of exploits rather than attacks, but the observability of their chosen events is hampered by attackers’ incentives to maintain stealth. Frei et al. [
19] use
exploit availability, whereas Bilge et al.[
8] call it
exploit released in wild. Both refer to the state in which an exploit is known to exist, but this can arise for at least two distinct reasons, which we wish to discriminate:
—
exploit public (\(X\))—when the method of exploitation for a vulnerability has been made public in sufficient detail to be reproduced by others. Proof of concept (POC) code posted to a widely available site or inclusion of the exploit in a commonly available exploit tool meets this criterion, whereas privately held exploits do not.
—
attacks observed (\(A\))—when the vulnerability has been observed to be exploited in attacks. In this case, one has evidence that the vulnerability has been exploited and can presume the existence of an exploit regardless of its availability to the public. Analysis of malware from an incident might meet \(A\) but not \(X\) depending on how closely held the malware is thought to be to the attacker. Use of an already public exploit in an attack meets both \(X\) and \(A\).
Therefore, while we acknowledge that a hidden exploit exists event is a causal predecessor of both exploit public and attacks, for our model we assert no causal relationship between \(X\) and \(A\). We make this choice in the interest of observability. The exploit exists event is difficult to observe consistently and independently of the two events we have chosen to use; its occurrence is nearly always inferred from the observation of either exploit public or attacks.
A summary of this model comparison is shown in Table
1. Further discussion of related work can be found in Section
10.
2.1 Definitions and Notation
Before we discuss either possible histories (Section
3) or desirable histories (Section
4) in the vulnerability life cycle, we need to formally define our terms. In all these definitions, we assume standard Zermelo–Fraenkel set theory, extended with sequences to provide a concept of ordered sets. From these, we adopt the following notation:
—
\(\lbrace \dots \rbrace\) An unordered set, which makes no assertions about sequence.
—
The normal proper subset (\(\subset\)), equality (\(=\)), and subset (\(\subseteq\)) relations between sets.
—
\((\dots)\) An ordered set in which the events \(e\) occur in that sequence.
—
The precedes (
\(\prec\)) relation on members of an ordered set:
\(e_i \prec e_j \textrm { if and only if } e_i,e_j \in s \textrm { and } i \lt j\) where
\(s\) is as defined in (
2).
From Table
1, we define the set of events
\(E\):
\(E \stackrel{\mathsf {def}}{=}\lbrace V, F, D, P, X, A \rbrace \qquad \textrm{(1)}\)
A sequence
\(s\) is an ordered set of some number of events
\(e_i \in E\) for
\(1 \le i \le n\):
\(s \stackrel{\mathsf {def}}{=}\left( e_1, e_2, \ldots , e_n \right) \qquad \textrm{(2)}\)
and the length of
\(s\) is
\(|s| \stackrel{\mathsf {def}}{=}n\).
A valid vulnerability coordination history
\(h\) is a sequence
\(s\) containing one and only one of each of the event types in
\(E\); by definition
\(|h| = |E| = 6\). Note this is a slight abuse of notation;
\(|\textrm { }|\) represents both sequence length and the cardinality of a set
where two members of the set
\(E\) are equal if they are represented by the same symbol and not equal otherwise. The set of all possible histories,
\(S_H\), is a set of all the sequences
\(h\) that satisfy this definition.
3 The Possible Histories of CVD
Given that a history
\(h\) contains all six events
\(E\) in some order, there are at most 720 (
\(_{6} \mathrm{P}_{6} = 6! = 720\)) possible histories. That is,
\(|S_H| = 720\). However, we can apply causal constraints as follows:
—
vendor awareness must precede fix ready (\(V \prec F\));
—
fix ready must precede fix deployed (\(F \prec D\)).
In symbols, this puts the following constraints on the possible set of histories, which we call
\(H_0\):
\(H_0 \stackrel{\mathsf {def}}{=}\lbrace h \in S_H : V \prec F \textrm { and } F \prec D \rbrace \qquad \textrm{(4)}\)
We further impose two simplifying assumptions. The first is that vendors know at least as much as the public does. In other words, all histories must meet one of two criteria: either
Vendor Awareness \(V\) precedes
Public Awareness \(P\) or else
Vendor Awareness must immediately follow it:
\(V \prec P \quad \textrm {or} \quad e_{i+1} = V \textrm { whenever } e_i = P \qquad \textrm{(5)}\)
The second is that the public can be informed about a vulnerability by a public exploit. Therefore, either
public awareness precedes
exploit public or must immediately follow it:
\(P \prec X \quad \textrm {or} \quad e_{i+1} = P \textrm { whenever } e_i = X \qquad \textrm{(6)}\)
Combining them, we arrive at our formal definition of valid possible histories \(H\) as all sequences meeting the three constraining assumptions (4), (5), and (6).
Once these constraints are applied, only 70 possible histories \(h \in S_H\) remain viable (\(|H| = 70\)). This model is amenable to analysis of CVD, but we need to add a way to express preferences before it is complete. Thus, we are partway through RQ1. Section 8.2 will address how this model can generalize from CVD to MPCVD.
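The constrained enumeration described above is easy to reproduce mechanically. The following Python sketch (illustrative code of ours, not from the article; event symbols follow Table 1 and the comments refer to constraints (4) through (6)) confirms that exactly 70 of the 720 permutations survive:

```python
from itertools import permutations

EVENTS = ("V", "F", "D", "P", "X", "A")

def is_valid(h):
    """Apply the causal constraints (4)-(6) to a candidate history."""
    pos = {e: i for i, e in enumerate(h)}
    return (
        # (4): vendor awareness precedes fix ready precedes fix deployed
        pos["V"] < pos["F"] < pos["D"]
        # (5): V precedes P, or V immediately follows P
        and (pos["V"] < pos["P"] or pos["V"] == pos["P"] + 1)
        # (6): P precedes X, or P immediately follows X
        and (pos["P"] < pos["X"] or pos["P"] == pos["X"] + 1)
    )

H = [h for h in permutations(EVENTS) if is_valid(h)]
print(len(H))  # 70 of the 6! = 720 permutations remain
```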
These histories are listed exhaustively in Table
2. The skill ranking function on the histories will be defined in Section
5.5. The desirability of the history (
\(\mathbb {D}^h\)) will be defined in Section
4. The expected frequency of each history
\(f_h\) is explained in Section
5.2.
4 On the Desirability of Possible Histories
Not all histories are equally preferable. Some are quite bad—for example, those in which attacks precede vendor awareness (\(A \prec V\))—while others are very desirable, for example, those in which fixes are deployed before either an exploit is made public (\(D \prec X\)) or attacks occur (\(D \prec A\)).
In pursuit of a way to reason about our preferences for some histories over others, we define the following preference criteria: history
\(h_a\) is preferred over history
\(h_b\) if, all else being equal, a more desirable event
\(e_1\) precedes a less desirable event
\(e_2\). In notation,
\(e_1 \prec e_2\). We define the following ordering preferences:
—
\(V \prec P\), \(V \prec X\), or \(V \prec A\) – Vendors can take no action to produce a fix if they are unaware of the vulnerability. Public awareness prior to vendor awareness can cause increased support costs for vendors at the same time they are experiencing increased pressure to prepare a fix. If public awareness of the vulnerability prior to vendor awareness is bad, then a public exploit is at least as bad because it encompasses the former and makes it readily evident that adversaries have exploit code available for use. Attacks prior to vendor awareness represent a complete failure of the vulnerability remediation process because they indicate that adversaries are far ahead of defenders.
—
\(F \prec P\), \(F \prec X\), or \(F \prec A\) – As noted above, the public can take no action until a fix is ready. Because public awareness also implies adversary awareness, the vendor/adversary race becomes even more critical if this condition is unmet. When fixes exist before exploits or attacks, defenders are better able to protect their users.
—
\(D \prec P\), \(D \prec X\), or \(D \prec A\) – Even better than vendor awareness and fix availability prior to public awareness, exploit publication, or attacks are scenarios in which fixes are deployed prior to one or more of those events.
—
\(P \prec X\) or \(P \prec A\) – In many cases, \(D\) requires system owners to take action. We therefore prefer histories in which public awareness happens prior to either exploit publication or attacks.
—
\(X \prec A\) – This criterion is not about whether exploits should be published or not. It is about whether we should prefer histories in which exploits are published
and then attacks happen over histories in which attacks happen
and then an exploit is published. Our position is that attackers have more advantages in the latter case than the former, and therefore we should prefer histories in which
\(X \prec A\).
Equation (
8) formalizes our definition of desired orderings
\(\mathbb {D}\):
\(\mathbb {D} \stackrel{\mathsf {def}}{=}\lbrace V \prec P, V \prec X, V \prec A, F \prec P, F \prec X, F \prec A, D \prec P, D \prec X, D \prec A, P \prec X, P \prec A, X \prec A \rbrace \qquad \textrm{(8)}\)
Table
3 displays all 36 possible orderings of paired events and whether they are considered impossible, required (as defined by Equation (
4)), desirable (as defined by Equation (
8)), or undesirable (the complement of the set defined in Equation (
8)).
Before proceeding, we note that our model focuses on the ordering of events, not their timing. We acknowledge that in some situations, the interval between events may be of more interest than merely the order of those events, as a rapid tempo of events can alter the options available to stakeholders in their response. We discuss this limitation further in Section
9; however, the following model posits event sequence timing on a human-oriented timescale measured in minutes to weeks.
An element
\(d \in \mathbb {D}\) is of the form
\(e_i \prec e_j\). More formally,
\(d\) is a relation of the form
\(d\left(e_1, e_2, \prec \right)\).
\(\mathbb {D}\) is a set of such relations.
Given the desired preferences over orderings of events (
\(\mathbb {D}\) in Equation (
8)), we can construct a partial ordering over all possible histories
\(H\), as defined in Equation (
10). This partial order requires a formal definition of which desiderata are met by a given history, provided by (
9):
\(\mathbb {D}^{h} \stackrel{\mathsf {def}}{=}\lbrace d \in \mathbb {D} : d \textrm { holds in } h \rbrace \qquad \textrm{(9)}\)
\(h_b \le _{H} h_a \textrm { if and only if } \mathbb {D}^{h_b} \subseteq \mathbb {D}^{h_a} \qquad \textrm{(10)}\)
A visualization of the resulting partially ordered set, or poset,
\((H,\le _{H})\) is shown as a Hasse Diagram in Figure
1. Hasse Diagrams represent the transitive reduction of a poset. Each node in the diagram represents an individual history
\(h_a\) from Table
2; labels correspond to the index of the table. Figure
1 follows Equation (
10), in that
\(h_a\) is higher in the order than
\(h_b\) when
\(h_a\) contains all the desiderata from
\(h_b\) and at least one more. Histories that do not share a path are incomparable (formally, two histories are incomparable if both
\(\mathbb {D}^{h_a} \not\supset \mathbb {D}^{h_b}\) and
\(\mathbb {D}^{h_a} \not\subset \mathbb {D}^{h_b}\)). The diagram flows from least desirable histories at the bottom to most desirable at the top. This model satisfies
RQ1; Sections
5 and
6 will demonstrate that the model is amenable to analysis and Section
8.2 will lay out the criteria for extending it to cover MPCVD.
The poset \((H,\le _{H})\), has as its upper bound \(h_{69} = (V, F, D, P, X, A)\), while its lower bound is \(h_{0} = (A, X, P, V, F, D)\).
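These bounds can be checked directly. The Python sketch below (illustrative; the 12 orderings are transcribed from the desiderata listed earlier in this section) computes the set of desiderata met by each bounding history:

```python
# The 12 desired orderings from Section 4, as (earlier, later) event pairs.
DESIDERATA = [
    ("V", "P"), ("V", "X"), ("V", "A"),
    ("F", "P"), ("F", "X"), ("F", "A"),
    ("D", "P"), ("D", "X"), ("D", "A"),
    ("P", "X"), ("P", "A"),
    ("X", "A"),
]

def desiderata_met(h):
    """Return the subset of desiderata satisfied by history h."""
    pos = {e: i for i, e in enumerate(h)}
    return {(a, b) for a, b in DESIDERATA if pos[a] < pos[b]}

best = ("V", "F", "D", "P", "X", "A")   # the upper bound h_69
worst = ("A", "X", "P", "V", "F", "D")  # the lower bound h_0

print(len(desiderata_met(best)), len(desiderata_met(worst)))  # 12 0
```

The upper bound meets all 12 desiderata and the lower bound meets none, so every other history falls between them in the partial order.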
Thus far, we have made no assertions about the relative desirability of any two desiderata (that is,
\(d_i,d_j \in \mathbb {D}\) where
\(i \ne j\)). In the next section, we will expand the model to include a partial order over our desiderata, but for now it is sufficient to note that any simple ordering over
\(\mathbb {D}\) would remain compatible with the partial order given in Equation (
10). In fact, a total order on
\(\mathbb {D}\) would create a linear extension of the poset defined here, whereas a partial order on
\(\mathbb {D}\) would result in a more constrained poset of which this poset would be a subset.
5 Reasoning Over Possible Histories
Our goal in this section is to formulate a way to rank our undifferentiated desiderata
\(\mathbb {D}\) from Section
4 in order to develop the concept of CVD skill and its measurement in Section
6. This will provide a baseline expectation about events (
RQ2).
In order to begin to differentiate skill from chance in Section
6, we need a model of what the CVD world would look like without any skill. We cannot derive this model by observation. Even when CVD was first practiced in the 1980s, some people may have had social, technical, or organizational skills that transferred to better CVD. We follow the principle of indifference, as stated in [
17]:
Principle of Indifference: Let \(X = \lbrace x_1,x_2,\ldots ,x_n\rbrace\) be a partition of the set \(W\) of possible worlds into \(n\) mutually exclusive and jointly exhaustive possibilities. In the absence of any relevant evidence pertaining to which cell of the partition is the true one, a rational agent should assign an equal initial credence of \(\frac{1}{n}\) to each cell.
While the principle of indifference is rather strong, it is inherently a bit difficult to reason about absolutely skill-less CVD when the work of CVD is, by its nature, a skilled job. We will use the principle of indifference to define a baseline against which measurement can be meaningful. For additional analysis of the application of the principle of indifference to this problem, see [
21, Section 3.3].
5.1 Event Frequency Analysis
We model event frequency with a simple state-based model of the possible histories
\(h \in H\) in which each state is a binary vector indicating which events
\(e \in E\) have occurred prior to reaching that state. The events
\(e \in E\) therefore represent state transitions, and the histories
\(h \in H\) are paths (traces) through the states. This meets the definition above because each
\(e \in E\) is unique (mutually exclusive) and the set of available
\(e\) at each step of the way is exhaustive. Let
\(E^h_{i+1}\) be the set of possible next events following the
\(i\)th event in history
\(h\), which is a subset of all possible events:
\(E^h_{i+1} \subseteq E\). The fragment of a history
\(h\) up to its
\(i\)th element is a sequence, which contains the first
\(i\) events of
\(h\), denoted as
\(h_i\). The initial case
\(h_0 \stackrel{\mathsf {def}}{=}\emptyset\). The probability of transition from
\(e_i\) to any of the possible next events
\(e_{i+1}\), where
\(e_{i+1}\in E^h_{i+1}\), is defined, based on the principle of indifference, as the inverse of the cardinality of the set of possible next events, namely:
\(p \left( e_{i+1} | h_i \right) \stackrel{\mathsf {def}}{=}\frac{1}{\left| E^h_{i+1} \right|} \qquad \textrm{(11)}\)
For example, because Equation (
4) requires
\(V \prec F\) and
\(F \prec D\), only four of the six events in
\(E\) are possible at the beginning of a history:
\(\lbrace V,P,X,A\rbrace\). Therefore,
\(p(F|\emptyset) = p(D|\emptyset) = 0\). Since the principle of indifference assigns each possible transition event as equally probable in this model of unskilled CVD, we assign an initial probability of 0.25 to each possible event (
\(p(V|\emptyset) = p(P|\emptyset) = p(X|\emptyset) = p(A|\emptyset) = 0.25\)). From there, we see that the other rules dictate possible transitions from each subsequent state. For example, Equation (
5) says that any
\(h\) starting with
\(P\) must start with
\(PV\). And Equation (
6) requires any
\(h\) starting with
\(X\) must proceed through
\(XP\) and again Equation (
5) gets us to
\(XPV\). Therefore, we expect histories starting with
\(PV\) or
\(XPV\) to occur with frequency 0.25 as well.
5.2 History Frequency Analysis
We apply the principle of indifference to the available events (
\(E^h_{i+1} \subseteq E\)) at each state
\(i\) for each of the possible histories to compute the expected frequency of each history, which we denote as
\(f_h\). The frequency of a history
\(f_h\) is the cumulative product of the probability
\(p\) of each event
\(e\) in the history
\(h\). We are only concerned with histories that meet our sequence constraints, namely,
\(h \in H\):
\(f_h \stackrel{\mathsf {def}}{=}\prod _{i=1}^{|h|} p \left( e_i | h_{i-1} \right) \qquad \textrm{(12)}\)
Table
2 displays the value of
\(f_h\) for each history. Having an expected frequency (
\(f_h\)) for each history
\(h\) will allow us to examine how often we might expect our desiderata
\(d \in \mathbb {D}\) to occur across
\(H\).
Choosing uniformly over event transitions is more useful than treating the six-element histories as uniformly distributed. For example, \(P \prec A\) holds in 59% of valid histories, but when histories are weighted by the assumption of uniform state transitions, \(P \prec A\) is expected to occur 67% of the time. These differences arise from the dependencies between some states. Since CVD practice comprises a sequence of events, each informed by the last, a uniform distribution over events is a more useful baseline than a uniform distribution over histories.
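The 59% versus 67% comparison can be verified by direct computation. This self-contained Python sketch (illustrative, not the authors' implementation) enumerates the valid histories, computes each history's expected frequency, and compares the unweighted and weighted frequencies of \(P \prec A\):

```python
from itertools import permutations

EVENTS = ("V", "F", "D", "P", "X", "A")

def is_valid(h):
    """Constraints (4)-(6) from Section 3."""
    pos = {e: i for i, e in enumerate(h)}
    return (pos["V"] < pos["F"] < pos["D"]
            and (pos["V"] < pos["P"] or pos["V"] == pos["P"] + 1)
            and (pos["P"] < pos["X"] or pos["P"] == pos["X"] + 1))

def f_h(h):
    """Expected frequency of history h under indifference over transitions."""
    p, occurred, last = 1.0, set(), None
    for e in h:
        if last == "P" and "V" not in occurred:
            options = {"V"}           # constraint (5) forces V right after P
        elif last == "X" and "P" not in occurred:
            options = {"P"}           # constraint (6) forces P right after X
        else:
            options = {x for x in EVENTS if x not in occurred}
            if "V" not in occurred:
                options.discard("F")  # constraint (4): no fix before V
            if "F" not in occurred:
                options.discard("D")  # constraint (4): no deployment before F
        p *= 1 / len(options)
        occurred.add(e)
        last = e
    return p

H = [h for h in permutations(EVENTS) if is_valid(h)]
unweighted = sum(1 for h in H if h.index("P") < h.index("A")) / len(H)
weighted = sum(f_h(h) for h in H if h.index("P") < h.index("A"))
print(round(unweighted, 2), round(weighted, 2))  # 0.59 0.67
```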
5.3 Event Order Frequency Analysis
Each of the event pair orderings in Table
3 can be treated as a Boolean condition that either holds or does not hold in any given history.
In Section
5.2, we described how to compute the expected frequency of each history (
\(f_h\)) given the presumption of indifference to possible events at each step. We can use
\(f_h\) as a weighting factor to compute the expected frequency of event orderings (
\(e_i \prec e_j\)) across all possible histories
\(H\). Equations (
13) and (
14) define the frequency of an ordering
\(f_{e_i \prec e_j}\) as the sum over all histories in which the ordering occurs (
\(H^{e_i \prec e_j}\)) of the frequency of each such history (
\(f_h\)) as shown in Table
2:
\(H^{e_i \prec e_j} \stackrel{\mathsf {def}}{=}\lbrace h \in H : e_i \prec e_j \rbrace \qquad \textrm{(13)}\)
\(f_{e_i \prec e_j} \stackrel{\mathsf {def}}{=}\sum _{h \in H^{e_i \prec e_j}} f_h \qquad \textrm{(14)}\)
Table
4 displays the results of this calculation. Required event orderings have an expected frequency of 1, while impossible orderings have an expected frequency of 0. As defined in Section
4, each desideratum
\(d \in \mathbb {D}\) is specified as an event ordering of the form
\(e_i \prec e_j\). We use
\(f_d\) to denote the expected frequency of a given desideratum
\(d \in \mathbb {D}\). The values for the relevant
\(f_d\) appear in the upper right of Table
4. Some event orderings have higher expected frequencies than others. For example, vendor awareness precedes attacks in 3 out of 4 histories in a uniform distribution of event transitions (
\(f_{V \prec A} = 0.75\)), whereas fix deployed prior to public awareness holds in less than 1 out of 25 (
\(f_{D \prec P} = 0.037\)) histories generated by a uniform distribution over event transitions.
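These expected frequencies can also be computed analytically by recursing over states rather than enumerating histories. A Python sketch (illustrative code of ours) that reproduces the two values just cited:

```python
from functools import lru_cache

EVENTS = ("V", "F", "D", "P", "X", "A")

@lru_cache(maxsize=None)
def order_freq(first, second, occurred=frozenset(), last=None):
    """Probability that `first` precedes `second` under indifference,
    starting from a state where the events in `occurred` have happened."""
    if first in occurred:
        return 1.0
    if second in occurred:
        return 0.0
    # Possible next events under the constraints of Section 3.
    if last == "P" and "V" not in occurred:
        options = {"V"}
    elif last == "X" and "P" not in occurred:
        options = {"P"}
    else:
        options = {e for e in EVENTS if e not in occurred}
        if "V" not in occurred:
            options.discard("F")
        if "F" not in occurred:
            options.discard("D")
    # Average over the equally likely next events (principle of indifference).
    return sum(order_freq(first, second, occurred | {e}, e)
               for e in options) / len(options)

print(order_freq("V", "A"))            # 0.75
print(round(order_freq("D", "P"), 3))  # 0.037 (= 1/27)
```

Required orderings such as \(V \prec F\) come out to exactly 1 under the same recursion, matching the description of Table 4.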
5.4 A Partial Order on Desiderata
Any observations of phenomena in which we measure the performance of human actors can attribute some portion of the outcome to skill and some portion to chance [
15,
26]. It is reasonable to wonder whether good outcomes in CVD are the result of luck or skill. How can we tell the difference?
We begin with a simple model in which outcomes
\(o\) are a combination of luck and skill:
\(o_{observed} = o_{luck} + o_{skill} \qquad \textrm{(15)}\)
In other words, outcomes due to skill are what remain when you subtract the outcomes due to luck from the outcomes you observe. In this model, we treat
luck as a random component: the contribution of chance. In a world where neither attackers nor defenders held any advantage and events were chosen uniformly from
\(E\) whenever they were possible, we would expect to see the preferred orderings occur with probability equivalent to their frequency
\(f_d\) as shown in Table
4.
Skill, on the other hand, accounts for the outcomes once luck has been accounted for. So the more likely an outcome is due to luck, the less skill we can infer when it is observed. As an example, from Table
4 we see that fix deployed before the vulnerability is public is the rarest of our desiderata with
\(f_{D \prec P} = 0.037\), and thus exhibits the most skill when observed. On the other hand, vendor awareness before attacks is expected to be a common occurrence with
\(f_{V \prec A} = 0.75\).
We can therefore use the set of
\(f_d\) to construct a partial order over
\(\mathbb {D}\) in which we prefer desiderata
\(d,\) which are more rare (and therefore imply more skill when observed) over those that are more common. We create the partial order on
\(\mathbb {D}\) as follows: for any pair
\(d_1,d_2 \in \mathbb {D}\), we say that
\(d_2\) exhibits less skill than
\(d_1\) if
\(d_2\) occurs more frequently in
\(H\) than
\(d_1\):
\(d_2 \le _{\mathbb {D}} d_1 \textrm { if and only if } f_{d_2} \stackrel{\mathbb {R}}{\ge } f_{d_1} \qquad \textrm{(16)}\)
Note that the inequalities on the left and right sides of Equation (
16) are flipped because skill is inversely proportional to luck. Also, while
\(\le _{\mathbb {D}}\) on the left side of Equation (
16) defines a preorder over the poset
\(H\), the
\(\stackrel{\mathbb {R}}{\ge }\) is the usual ordering over the set of real numbers. The result is a partial order
\((\mathbb {D},\le _{\mathbb {D}})\) because a few
\(d\) have the same
\(f_d\) (
\(f_{F \prec X} = f_{V \prec P} = 0.333,\) for example). The full Hasse Diagram for the partial order
\((\mathbb {D},\le _{\mathbb {D}})\) is shown in Figure
2.
5.5 Ordering Possible Histories by Skill
Next we develop a new partial order on \(H\) given the partial order \((\mathbb {D},\le _{\mathbb {D}})\) just described. We observe that \(\mathbb {D}^{h}\) acts as a Boolean vector of desiderata met by a given \(h\). Since \(0 \le f_d \le 1\), simply taking its inverse could in the general case lead to some large values for rare events, so for convenience we use \(log(1/f_d)\) as our proxy for skill. Taking the dot product of \(\mathbb {D}^h\) with the set of \(log(1/f_d)\) represented as a vector, we arrive at a single value representing the skill exhibited for each history \(h\). Careful readers may note that this value is equivalent to the Term Frequency—Inverse Document Frequency (TF-IDF) score for a search for the “skill terms” represented by \(\mathbb {D}\) across the corpus of possible histories \(H\).
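The scoring just described can be sketched end to end. The following Python code (illustrative; it recomputes the \(f_d\) values from the model rather than reading Table 4) scores each history by the dot product of its desiderata vector with \(\log (1/f_d)\):

```python
from itertools import permutations
from math import log

EVENTS = ("V", "F", "D", "P", "X", "A")
DESIDERATA = [("V","P"), ("V","X"), ("V","A"), ("F","P"), ("F","X"), ("F","A"),
              ("D","P"), ("D","X"), ("D","A"), ("P","X"), ("P","A"), ("X","A")]

def is_valid(h):
    """Constraints (4)-(6) from Section 3."""
    pos = {e: i for i, e in enumerate(h)}
    return (pos["V"] < pos["F"] < pos["D"]
            and (pos["V"] < pos["P"] or pos["V"] == pos["P"] + 1)
            and (pos["P"] < pos["X"] or pos["P"] == pos["X"] + 1))

def f_h(h):
    """Expected frequency of h under indifference over transitions (Section 5.2)."""
    p, occurred, last = 1.0, set(), None
    for e in h:
        if last == "P" and "V" not in occurred:
            options = {"V"}
        elif last == "X" and "P" not in occurred:
            options = {"P"}
        else:
            options = {x for x in EVENTS if x not in occurred}
            if "V" not in occurred:
                options.discard("F")
            if "F" not in occurred:
                options.discard("D")
        p *= 1 / len(options)
        occurred.add(e)
        last = e
    return p

H = [h for h in permutations(EVENTS) if is_valid(h)]

# f_d: expected frequency of each desired ordering (Section 5.3).
f_d = {(a, b): sum(f_h(h) for h in H if h.index(a) < h.index(b))
       for a, b in DESIDERATA}

def skill(h):
    """Dot product of the Boolean desiderata vector with log(1/f_d)."""
    pos = {e: i for i, e in enumerate(h)}
    return sum(log(1 / f_d[d]) for d in DESIDERATA if pos[d[0]] < pos[d[1]])

ranked = sorted(H, key=skill)
print(ranked[-1])        # ('V', 'F', 'D', 'P', 'X', 'A') scores highest
print(skill(ranked[0]))  # 0: the lowest-ranked history meets no desiderata
```

Rarer desiderata contribute larger terms to the score, so histories achieving them sort higher, as described above.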
We have now computed a skill value for every
\(h \in H\), which allows us to sort
\(H\) and assign a rank to each history
\(h\) contained therein. The rank is shown in Table
2. Rank values start at 1 for least skill up to a maximum of 62 for most skill. Owing to the partial order
\((\mathbb {D},\le _{\mathbb {D}})\), some
\(h\) have the same computed skill values, and these are given the same rank.
The ranks for
\(h \in H\) lead directly to a new poset
\((H,\le _{\mathbb {D}})\), which is an extension of and fully compatible with
\((H,\le _{H})\) as developed in Section
4. The resulting Hasse Diagram would be too large to reproduce here. Instead, we include the resulting rank for each
\(h\) as a column in Table
2. In the table, rank is ordered from least desirable and skillful histories to most. Histories having identical rank are incomparable to each other within the poset. The refined poset
\((H,\le _{\mathbb {D}})\) is much closer to a total order on
\(H\), as indicated by the relatively few histories having duplicate ranks.
The remaining incomparable histories are the direct result of the incomparable
\(d\) in
\((\mathbb {D},\le _{\mathbb {D}})\), corresponding to the branches in Figure
2. Achieving a total order on
\(\mathbb {D}\) would require answering the following: Assuming you could achieve only one, and without regard to any other
\(d \in \mathbb {D}\), would you prefer
—
that fix ready precede exploit publication (\(F \prec X\)) or that vendor awareness precede public awareness (\(V \prec P\))?
—
that public awareness precede exploit publication (\(P \prec X\)) or that exploit publication precede attacks (\(X \prec A\))?
—
that public awareness precede attacks (\(P \prec A\)) or vendor awareness precede exploit publication (\(V \prec X\))?
Recognizing that readers may have diverse opinions on all three questions, we leave further analysis of the answers to these as future work.
This is just one example of how poset refinements might be used to order \(H\). Different posets on \(\mathbb {D}\) would lead to different posets on \(H\). For example, one might construct a different poset if certain \(d\) were considered to have much higher financial value when achieved than others.
6 Discriminating Skill and Luck in Observations
This section defines a method for measuring skillful behavior in CVD, which we will need to answer RQ3 about measuring and evaluating CVD “in the wild.” The measurement method makes use of all the modeling tools and baselines established thus far: A comprehensive set of possible histories \(H\), a partial order over them in terms of the presence of desired event precedence \(\mathbb {D}\), and the a priori expected frequency of each desideratum \(d \in \mathbb {D}\).
If we expected to be able to observe all events in all CVD cases, we could be assured of having complete histories and could be done here. But the real world is messy. Not all events \(e \in E\) are always observable. We need to develop a way to make sense of what we can observe, regardless of whether we are ever able to capture complete histories. Continuing towards our goal of measuring efficacy, we return to considering the balance between skill and luck in determining our observed outcomes.
Of course, there are any number of conceivable reasons why we should expect our observations to differ from the expected frequencies we established in Section
5. Adversaries might be rare, or conversely very well equipped. Vendors might be very good at releasing fixes faster than adversaries can discover vulnerabilities and develop exploits for them. System owners might be diligent at applying patches. We did say
might, did we not? Regardless, for now we will lump all of those possible explanations into a single attribute we will call “skill.”
In a world of pure skill, one would expect that a player could achieve all 12 desiderata \(d \in \mathbb {D}\) consistently. That is, a maximally skillful player could consistently achieve the specific ordering \(h=(V,F,D,P,X,A)\) with probability \(p_s = 1\).
Thus, we construct the following model: For each of our preferred orderings
\(d \in \mathbb {D}\), we model their occurrence due to luck using the binomial distribution with parameter
\(p_l = f_d\) taken from Table
4.
Recall that the mean of a binomial distribution is simply the probability of success
\(p\), and that the mean of a weighted mixture of two binomial distributions is simply the weighted mixture of the individual means. Therefore, our model adds a parameter
\(\alpha _d\) to represent the weighting between our success rates arising from skill
\(p_s\) and luck
\(p_l\). Because there are 12 desiderata
\(d \in \mathbb {D}\), each
\(d\) will have its own observations and corresponding value for
\(\alpha _d\) for each history \(h_a\):
\[f_d^{obs} = \alpha _d \, p_s + (1 - \alpha _d) \, p_l \qquad (17)\]
where \(f_d^{obs}\) is the observed frequency of successes for desideratum \(d\). Because \(p_s = 1\), one of those binomial distributions is degenerate. Substituting \(p_s = 1\), \(p_l = f_d\) and solving Equation (17) for \(\alpha _d\), we get
\[\alpha _d = \frac{f_d^{obs} - f_d}{1 - f_d} \qquad (18)\]
The value of \(\alpha _d\) therefore gives us a measure of the observed skill normalized against the background success rate provided by luck \(f_d\).
We denote the set of
\(\alpha _d\) values for a given history as
\(\alpha _\mathbb {D}\). When we refer to the
\(\alpha _d\) coefficient for a specific
\(d\) we will use the specific ordering as the subscript, for example:
\(\alpha _{F \prec P}\). The concept embodied by \(f_d\) is founded on the idea that if attackers and defenders are in a state of equilibrium, the frequency of observed outcomes (i.e., how often each desideratum \(d\) and history \(h\) actually occurs) will appear consistent with that predicted by chance. So another way of interpreting \(\alpha _d\) is as a measure of the degree to which a set of observed histories is out of equilibrium.
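Equation (18) is simple enough to state as a one-line function. The sketch below (ours, for illustration) uses two baseline values quoted in this section, \(f_{V \prec A} = 0.75\) and \(f_{D \prec P} = 0.037\), to show how the sign of \(\alpha _d\) depends on whether observations beat the luck baseline:

```python
def alpha(f_obs, f_d):
    """Skill coefficient alpha_d (Equation (18)): the observed success
    rate f_obs for a desideratum, normalized against its luck baseline f_d."""
    return (f_obs - f_d) / (1 - f_d)

# f_{V<A} = 0.75: observing 0.7 is worse than luck, so alpha is negative.
assert abs(alpha(0.7, 0.75) + 0.2) < 1e-9

# f_{D<P} = 0.037: with a small baseline, even a modest observed
# frequency implies positive skill.
assert alpha(0.1, 0.037) > 0

# Matching the baseline exactly yields alpha = 0 (no evidence of skill).
assert alpha(0.5, 0.5) == 0
```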
The following are a few comments on how
\(\alpha _d\) behaves. Note that
\(\alpha _d \lt 0\) when
\(0 \le f_d^{obs} \lt f_d\) and
\(0 \le \alpha _d \le 1\) when
\(f_d \le f_d^{obs} \le 1\). The implication is that a negative value for
\(\alpha _d\) indicates that our observed outcomes are actually
worse than those predicted by pure luck. In other words, we can only infer positive skill when the observations are higher (
\(f_d^{obs} \gt f_d\)). That makes intuitive sense: If you are likely to win purely by chance, then you have to attribute most of your wins to luck rather than skill. The highest value for any
\(\mathbb {D}\) in Table
4 is
\(f_{V \prec A}=0.75\), implying that even if a vendor only knows about 7 out of 10 vulnerabilities before attacks occur (
\(f_{V \prec A}^{obs} = 0.7\)), they are still not doing better than random.
On the other hand, when
\(f_d\) is small it is easier to infer skill should we observe anything better than
\(f_d\). However, it takes larger increments of observations
\(f_d^{obs}\) to infer growth in skill when
\(f_d\) is small than when it is large. The smallest
\(f_d\) we see in Table
4 is
\(f_{D \prec P} = 0.037\).
Inherent to the binomial distribution is the expectation that the variance of results is lower for both extremes (as
\(p\) approaches 0 or 1) and highest at
\(p=0.5\). Therefore, we should generally be less certain of our observations when they fall in the middle of the distribution. We address uncertainty further in Section
7.2.
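The shape of that variance is easy to check numerically. In the sketch below (ours), the sample size \(n\) is an arbitrary illustrative choice:

```python
import math

def stderr(p, n):
    """Standard error of an observed proportion under the binomial model."""
    return math.sqrt(p * (1 - p) / n)

n = 100  # illustrative sample size
# Uncertainty peaks at p = 0.5 and shrinks toward both extremes.
assert stderr(0.5, n) > stderr(0.1, n) > stderr(0.01, n)
assert stderr(0.5, n) > stderr(0.9, n)
```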
8 Discussion
The observational analysis in Section
7 supports an affirmative response to
RQ3: vulnerability disclosure as currently practiced demonstrates skill. In both datasets examined, our estimated
\(\alpha _d\) is positive for most
\(d \in \mathbb {D}\). However, there is uncertainty in our estimates due to the application of the principle of indifference to unobserved data. This principle assumes a uniform distribution across event transitions in the absence of CVD, which is an assumption we cannot readily test. The spread of the estimates in Figures
5 and
7 represents the variance in our samples, not this assumption-based uncertainty. Our interpretation of
\(\alpha _d\) values near 0 is therefore that they reflect an absence of evidence rather than evidence that skill is absent. While we cannot definitively rule out luck or low skill, values of
\(\alpha _d \gt 0.9\) should reliably indicate skillful defenders.
If, as seems plausible from the evidence, it turns out that further observations of
\(h\) are significantly skewed toward the higher end of the poset
\((H,\le _{\mathbb {D}})\), then it may be useful to empirically calibrate our metrics rather than using the
a priori frequencies in Table
4 as our baseline. This analysis baseline would provide context on “more skillful than the average for some set of teams” rather than more skillful than blind luck. Section
8.1 discusses this topic, which should be viewed as an examination of what “reasonable” in
RQ2 should mean in the context of “reasonable baseline expectation.”
Section
8.2 suggests how the model might be applied to establish benchmarks for CVD processes involving any number of participants, which closes the analysis of
RQ1 in relation to MPCVD. Section
8.3 surveys the stakeholders in CVD and how they might use our model; the stakeholders are vendors, system owners, the security research community, coordinators, and governments. In particular, we focus on how these stakeholders might respond to the affirmative answer to RQ3 and on how a skill measurement might be tailored to each stakeholder group.
8.1 Benchmarks
As described above, in an ideal CVD situation, each observed history would achieve all 12 desiderata
\(\mathbb {D}\). Realistically, this is unlikely to happen. We can at least state that we would prefer that most cases reach fix ready before attacks (
\(F \prec A\)). Per Table
4, even in a world without skill we would expect
\(F \prec A\) to hold in 73% of cases. This means that
\(\alpha _{F \prec A} \lt 0\) for anything less than a 0.73 success rate. Doing just barely better than random (
\(\epsilon \gt \alpha _d \gt 0\) for
\(\epsilon \approx 0\)) is not terribly satisfying, so we would like to seek outcomes in which
\(\alpha _{F \prec A} \ge c_{F \prec A} \ge 0\) for some benchmark constant
\(c_{F \prec A}\). In fact, we propose to generalize this for any
\(d \in \mathbb {D}\), such that
\(\alpha _d\) should be greater than some benchmark constant
\(c_d\), that is, \(\alpha _d \ge c_d\), where \(c_d\) is based on observations of \(\alpha _d\) collected across some collection of CVD cases.
We propose as a starting point a naïve benchmark of \(c_d = 0\). This is a low bar, as it only requires that CVD actually do better than possible events which are independent and identically distributed (i.i.d.) within each case. For example, given a history in which \((V, F, P)\) have already happened, \(D\), \(X\), or \(A\) are equally likely to occur next.
The i.i.d. assumption may not be warranted. We anticipate that event ordering probabilities might be conditional on history: for example,
\(p(X|P) \gt p(X|\lnot P)\) or
\(p(A|X) \gt p(A|\lnot X)\). If the i.i.d. assumption fails to hold for
\(e \in E\), observed frequencies of
\(h \in H\) could differ significantly from the rates predicted by the uniform probability assumption behind Table
4.
Some example suggestive observations are:
—
There is reason to suspect that only a fraction of vulnerabilities ever reach the
exploit public event
\(X\), and fewer still reach the
attack event
\(A\). Recent work by the Cyentia Institute found that “5% of all CVEs are both observed within organizations AND known to be exploited” [
14], which suggests that
\(f_{D \prec A}^{obs} \approx 0.95\).
—
Likewise,
\(D \prec X\) holds in 28 of 70 (0.4)
\(h\). However, Cyentia found that “15.6% of all open vulnerabilities observed across organizational assets in our sample have known exploits” [
14], which suggests that
\(f_{D \prec X}^{obs} \approx 0.844\).
On their own these observations can equally well support the idea that we are broadly observing skill in vulnerability response, rather than that the world is biased from some other cause. However, we could choose a slightly different goal than differentiating skill and “blind luck” as represented by the i.i.d. assumption. One could aim at measuring “more skillful than the average for some set of teams” rather than more skillful than blind luck.
If this were the “reasonable” baseline expectation (RQ2), the primary limitation would be the availability of observations. This model helps overcome that limitation because it provides a clear path toward collecting relevant observations. For example, by collecting dates for the six \(e \in E\) for a large sample of vulnerabilities, we can get better estimates of the relative frequency of each history \(h\) in the real world. It seems as though better data would serve more to improve benchmarks than to change expectations about the role of chance.
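As a sketch of how such data collection could feed the model (the case records below are invented for illustration), the observed frequency \(f_d^{obs}\) of any desideratum follows from simple date comparisons:

```python
from datetime import date

# Invented example data: per-case dates for the six events.
cases = [
    {"V": date(2020, 1, 2), "F": date(2020, 2, 1), "D": date(2020, 3, 1),
     "P": date(2020, 2, 15), "X": date(2020, 4, 1), "A": date(2020, 5, 1)},
    {"V": date(2021, 6, 1), "F": date(2021, 8, 1), "D": date(2021, 9, 1),
     "P": date(2021, 5, 20), "X": date(2021, 7, 1), "A": date(2021, 6, 15)},
]

def f_obs(cases, first, second):
    """Observed frequency with which event `first` precedes event `second`."""
    hits = sum(1 for c in cases if c[first] < c[second])
    return hits / len(cases)

# F precedes A in the first invented case but not the second.
assert f_obs(cases, "F", "A") == 0.5
# P precedes X in both invented cases.
assert f_obs(cases, "P", "X") == 1.0
```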
As an applied example, if we take the first item in the above list as a broad observation of
\(f_{D \prec A}^{obs} = 0.95\), we can plug into Equation (
18) to get a potential benchmark of
\(\alpha _{D \prec A} = 0.94\), which is considerably higher than the naïve generic benchmark
\(\alpha _d = 0\). It also implies that we should expect actual observations of
\(h \in H\) to skew toward the 19
\(h\) in which
\(D \prec A\) around 19x as often as the 51
\(h\) in which
\(D \not\prec A\). Similarly, if we interpret the second item as a broad observation of
\(f_{D \prec X}^{obs} = 0.844\), we can then compute a benchmark
\(\alpha _{D \prec X} = 0.81\), which is again a significant improvement over the naïve
\(\alpha _d = 0\) benchmark.
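The arithmetic behind these benchmark values can be checked by inverting Equation (18). In the sketch below (ours), the baseline is back-solved from the reported pair (0.95, 0.94) rather than read from Table 4, which is not reproduced here:

```python
def alpha(f_obs, f_d):
    """Skill coefficient (Equation (18))."""
    return (f_obs - f_d) / (1 - f_d)

def implied_baseline(f_obs, a):
    """Invert Equation (18): the luck baseline consistent with (f_obs, alpha)."""
    return (f_obs - a) / (1 - a)

# The reported pair for D < A: f_obs = 0.95 yields a benchmark alpha of 0.94,
# which implies a Table 4 baseline of 1/6.
f_d = implied_baseline(0.95, 0.94)
assert abs(f_d - 1 / 6) < 1e-9

# Round trip: plugging the implied baseline back in recovers alpha = 0.94.
assert abs(alpha(0.95, f_d) - 0.94) < 1e-9
```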
8.2 Applicability to MPCVD
MPCVD occurs when multiple vendors or stakeholders are involved in the disclosure process. The need for MPCVD arises due to the inherent nature of the software supply chain [
22]. A vulnerability that affects a low-level component (such as a library or operating system API) can require fixes from both the originating vendor and any vendor whose products incorporate the affected component. Alternatively, vulnerabilities are sometimes found in protocol specifications or other design-time issues where multiple vendors may have each implemented their own components based on a vulnerable design.
A common problem in MPCVD is that of fairness: Coordinators are often motivated to optimize the CVD process to maximize the deployment of fixes to as many end users as possible while minimizing the exposure of users of other affected products to unnecessary risks.
The model presented in this article provides a way for coordinators to assess the effectiveness of their MPCVD cases. In an MPCVD case, each vendor/product pair effectively has its own 6-event history
\(h_a\). We can therefore recast MPCVD as a set of histories
\(M\) drawn from the possible histories
\(H\):
\[M = \{ h_1, h_2, \dots , h_m \}, \qquad h_i \in H,\]
where \(m = |M| \ge 1\). The edge case when
\(|M| = 1\) is simply the regular (non-multiparty) case.
We can then set desired criteria for the set
\(M\), as in the benchmarks described in Section
8.1. In the MPCVD case, we propose to generalize the benchmark concept such that the median
\(\tilde{\alpha _d}\) should be greater than some benchmark constant
\(c_d\). In real-world cases where some outcomes across different vendor/product pairs will necessarily be lower than others, we can also add the criterion that the variance of each \(\alpha _d\) should be low. An MPCVD case having a high median \(\alpha _d\) with low variance across the vendors and products involved will mean that most vendors achieved acceptable outcomes.
To summarize:
—
The median
\(\alpha _d\) for all histories
\(h \in M\) should be positive and preferably above some benchmark constant
\(c_d\), which may be different for each
\(d \in \mathbb {D}\).
—
The variance of each
\(\alpha _d\) for all histories
\(h \in M\) should be low, below some small constant \(\varepsilon\).
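These two criteria translate directly into code. In the sketch below (ours), the \(\alpha _d\) values are invented for illustration, and `c_d` and `eps` stand in for the stakeholder-chosen constants:

```python
from statistics import median, pvariance

def meets_benchmark(alphas, c_d, eps):
    """MPCVD criteria for one desideratum d: the median alpha_d across all
    vendor/product histories should exceed the benchmark c_d, and the
    variance across vendors should stay below a small constant eps."""
    return median(alphas) >= c_d and pvariance(alphas) <= eps

# Invented alpha_d values for five vendor/product pairs in one MPCVD case:
# uniformly positive outcomes with little spread pass the criteria...
assert meets_benchmark([0.62, 0.55, 0.70, 0.58, 0.64], c_d=0.0, eps=0.05)

# ...while a high median with badly uneven outcomes fails on variance.
assert not meets_benchmark([0.9, -0.5, 0.9, -0.6, 0.9], c_d=0.0, eps=0.05)
```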
8.3 Reflections on the Influence of CVD Roles
CVD stakeholders include vendors, system owners, the security research community, coordinators, and governments [
22]. Different stakeholders might want different things, although most benevolent parties will likely seek some subset of
\(\mathbb {D}\). Because
\(H\) is the same for all stakeholders, the expected frequencies shown in Table
4 will be consistent across any such variations in desiderata. A discussion of some stakeholder preferences is given below, while a summary can be found in Table
5. We notate these variations of the set of desiderata
\(\mathbb {D}\) with subscripts:
\(\mathbb {D}_v\) for vendors,
\(\mathbb {D}_s\) for system owners,
\(\mathbb {D}_c\) for coordinators, and
\(\mathbb {D}_g\) for governments. In Table
3, we defined a preference ordering between every possible pairing of events; therefore,
\(\mathbb {D}\) is the largest possible set of desiderata. We thus expect the desiderata of benevolent stakeholders to be a subset of
\(\mathbb {D}\) in most cases. That said, we note a few exceptions in the text that follows.
8.3.1 Vendors.
As shown in Table
5, we expect vendors’ desiderata
\(\mathbb {D}_v\) to be a subset of
\(\mathbb {D}\). It seems reasonable to expect vendors to prefer that a fix is ready before either exploit publication or attacks (
\(F \prec X\) and
\(F \prec A\), respectively). Fix availability implies vendor awareness (
\(V \prec F\)), so we would expect vendors’ desiderata to include those orderings as well (
\(V \prec X\) and
\(V \prec A\), respectively).
Vendors typically want to have a fix ready before the public finds out about a vulnerability (\(F \prec P\)). We surmise that a vendor’s preference for this item could be driven by at least two factors: the vendor’s tolerance for potentially increased support costs (e.g., fielding customer support calls while the fix is being prepared), and the perception that public awareness without an available fix leads to a higher risk of attacks.
When a vendor has control over fix deployment (
\(D\)), it will likely prefer that deployment precede public awareness, exploit publication, and attacks (
\(D \prec P\),
\(D \prec X\), and
\(D \prec A\), respectively).
4 However, when fix deployment depends on system owners to take action, the feasibility of
\(D \prec P\) is limited.
5 Regardless of the vendor’s ability to deploy fixes or influence their deployment, it would not be unreasonable for them to prefer that public awareness precedes both public exploits and attacks (
\(P \prec X\) and
\(P \prec A\), respectively).
Ensuring the ease of patch deployment by system owners remains a likely concern for vendors. Conscientious vendors might still prefer \(D \prec X\) and \(D \prec A\) even if they have no direct control over those factors. However, vendors may be indifferent to \(X \prec A\).
Although our model only addresses event ordering, not timing, a few comments about timing of events are relevant here since they reflect the underlying process from which
\(H\) arises. Vendors have significant influence over the speed of
\(V\) to
\(F\) based on their vulnerability handling, remediation, and development processes [
24]. They can also influence how early
\(V\) happens based on promoting a cooperative atmosphere with the security researcher community [
23]. Vendor architecture and business decisions affect the speed of
\(F\) to
\(D\). Cloud-based services and automated patch delivery can shorten the lag between
\(F\) and
\(D\). Vendors that leave deployment contingent on system owner action can be expected to have longer lags, making it harder to achieve the
\(D \prec P\),
\(D \prec X\), and
\(D \prec A\) objectives, respectively.
8.3.2 System Owners.
System owners ultimately determine the lag from \(F\) to \(D\) based on their processes for system inventory, scanning, prioritization, patch testing, and deployment—in other words, their vulnerability management (VM) practices. In cases where the vendor and system owner are distinct entities, system owners should optimize to minimize the lag between \(F\) and \(D\) in order to improve the chances of meeting the \(D \prec X\) and \(D \prec A\) objectives, respectively.
System owners might select a different subset of desiderata than vendors: \(\mathbb {D}_s \subseteq \mathbb {D}\), \(\mathbb {D}_s \ne \mathbb {D}_v\). In general, system owners are primarily concerned with the \(F\) and \(D\) events relative to \(X\) and \(A\). Therefore, we expect system owners to be concerned about \(F \prec X\), \(F \prec A\), \(D \prec X\), and \(D \prec A\). As discussed above, \(D \prec P\) is only possible when the vendor controls \(D\). Depending on the system owner’s risk tolerance, \(F \prec P\) and \(D \prec X\) may or may not be preferred. Some system owners may find \(X \prec A\) useful for testing their infrastructure, while others might prefer that no public exploits be available.
8.3.3 Security Researchers.
The “friendly” offensive security community (i.e., those who research vulnerabilities, report them to vendors, and sometimes release proof-of-concept exploits for system security evaluation purposes) can do their part to ensure that vendors are aware of vulnerabilities as early as possible prior to public disclosure (\(V \prec P\)). They can also delay the publication of exploits until after fixes exist (\(F \prec X\)) and possibly even until most system owners have deployed the fix (\(D \prec X\)). This does not preclude adversaries from doing their own exploit development on the way to \(A\), but it avoids providing them with unnecessary assistance.
8.3.4 Coordinators.
Coordinators have been characterized as seeking to balance the social good across both vendors and system owners [
7]. This implies that they are likely interested in the union of the vendors’ and system owners’ preferences. In other words, coordinators want the full set of desiderata (
\(\mathbb {D}_c = \mathbb {D}\)).
We pause for a brief aside about the design of the model with respect to the coordination role. We considered adding a Coordinator Awareness (\(C\)) event, but this would expand \(|H|\) from 70 to 452 because it could occur at any point in any \(h\). There is not much for a coordinator to do once the fix is deployed, however, so we could potentially reduce \(|H|\) to 329 by only including positions in \(H\) that precede the \(D\) event. This is still too large and unwieldy for meaningful analysis within our scope; instead, we simply provide the following comment.
The goal of coordination is this: regardless of which stage a coordinator becomes involved in a case, the objective is to choose actions that make preferred histories more likely and non-preferred histories less likely. A careful reading of the Hasse Diagram in Figure
1 or the ranking in Table
2 can suggest available actions to improve outcomes. Namely this means focusing coordination efforts as needed on vendor awareness, fix availability, fix deployment, and the appropriately timed public awareness of vulnerabilities and their exploits.
8.3.5 Governments.
In their defensive roles, governments are essentially acting as a combination of system owners, vendors, and—increasingly—coordinators. Therefore, we might anticipate \(\mathbb {D}_g = \mathbb {D}_c = \mathbb {D}\).
However, governments sometimes also have an adversarial role to play for national security, law enforcement, or other reasons. The model presented in this article could be adapted to that role as well by drawing some desiderata from the lower left triangle of Table
3. While defining such adversarial desiderata (
\(\mathbb {D}_a\)) is out of scope for this article, we leave the topic with our expectation that
\(\mathbb {D}_a \not\subseteq \mathbb {D}\).
9 Limitations and Future Work
This section highlights some limitations of the current work and lays out a path for improving on those limitations in future work. Broadly, the opportunities for expanding the model include modeling multiple agents, gathering more data about CVD in the world, working to account for fairness and MPCVD, addressing the importance of duration between events, options for modeling attacker behavior, and managing the impact of partial information.
Modeling Multiple Agents. We agree with the reviewer who suggested that an agent-based model could allow deeper examination of the interactions between stakeholders in MPCVD. Many of the mechanisms and proximate causes underlying the events this model describes are hidden from the model, and would be difficult to observe or measure even if they were included. Nevertheless, in order to reason about different stakeholders’ strategies and approaches to MPCVD, we need a way to measure and compare outcomes. The model we present here gives us such a framework, but it does so by making a tradeoff in favor of generality over causal specificity. We anticipate that future agent-based models of MPCVD will be better positioned to address process mechanisms, whereas this model will be useful to assess outcomes independently of the mechanisms by which they arise.
Gather Data About CVD. Section
8.1 discusses how different benchmarks and “reasonable baseline expectations” might change the results of a skill assessment. It also proposes how to use observations of the actions a certain team or teams perform to create a baseline, which compares other CVD practitioners to the skill of that team or teams. Such data could also inform causal reasoning about certain event orderings and help identify effective interventions. For example, might causing
\(X \prec F\) be an effective method to improve the chances of
\(D \prec A\) in cases where the vendor is slow to produce a fix? Whether it is better to compare the skill of a team to blind luck via the i.i.d. assumption or to other teams via measurement remains an open question.
To address the question, a future research effort must collect and collate a large amount of data about the timing and sequence of events in the model for a variety of stakeholder groups and a variety of vulnerabilities. Then deeper analysis using joint probabilities could continue, if the modeling choice is to base skill on a measure derived from past observations.
While there is a modeling choice about using the uniformity assumption versus observations from past CVD (see Section
8.1), the model does not depend on whether the uniformity assumption actually holds. We have provided a means to calculate from observations a deviation from the desired “reasonable baseline,” whether this is based on the i.i.d. assumption or not. Although, via our research questions, we have provided a method for evaluating skill in CVD, evaluating the overarching question of
fairness in MPCVD requires a much broader sense of CVD practices.
MPCVD Criteria Do Not Account for Equitable Resilience. The proposed criteria for MPCVD in Section
8.2 fail to account for either user populations or their relative importance. For example, suppose an MPCVD case had a total of 15 vendors, with 5 vendors representing 95% of the total userbase achieving highly preferred outcomes and 10 vendors with poor outcomes representing the remaining 5% of the userbase. The desired criteria (high median
\(\alpha\) score with low variance) would likely be unmet even though most users were protected.
Similarly, it is possible that a smaller set of vendor/product pairs represents a disproportionate concentration of the total risk posed by a vulnerability,
6 and again aggregation across all vendor/product pairs may be misleading. In fact, risk concentration within a particular user population may lead to a need for strategies that appear inequitable at the vendor level while achieving greater outcome equity at a larger scale.
The core issue is that we lack a utility function to map from observed case histories to harm reduction. Potential features of such a function include aggregation across vendors and/or users. Alternatively, it may be possible to devise a method for weighting the achieved histories in an MPCVD case by some proxy for total user risk. Other approaches remain possible as well, such as a heuristic to avoid catastrophic outcomes for all, then apply a weighted sum over the impact to remaining users. Future work might also consider whether criteria other than high median and low variance could be applied. Regardless, achieving accurate estimates of such parameters is likely to remain challenging.
The Model Has No Sense of Timing. There is no concept of time in this model, but delays between events can make a big difference in outcomes. Two cases in which \(F \prec A\) holds would be quite different if the time gap between these two events were 1 week versus 3 months, as this gap directly bears on the need for speed in deploying fixes. Organizations may wish to extend this model by setting timing expectations in addition to simple precedence preferences. For example, organizations may wish to specify service level objectives for \(V \prec F\), \(F \prec D\), \(F \prec A\), and so forth.
Furthermore, in the long run the elapsed time for \(F \prec A\) essentially dictates the response time requirements for vulnerability management (VM) processes for system owners. Neither system owners nor vendors get to choose when attacks happen, so we should expect stochasticity to play a significant role in this timing. However, if an organization cannot consistently achieve a shorter lag between \(F\) and \(D\) than between \(F\) and \(A\) (i.e., achieving \(D \prec A\)) for a sizable fraction of the vulnerability cases they encounter, it is difficult to imagine that organization being satisfied with the effectiveness of their VM program.
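A sketch of this lag comparison over invented per-case dates (ours, for illustration):

```python
from datetime import date

# Invented per-case dates for fix ready (F), fix deployed (D), and attacks (A).
vm_cases = [
    {"F": date(2022, 1, 1), "D": date(2022, 1, 20), "A": date(2022, 3, 1)},
    {"F": date(2022, 5, 1), "D": date(2022, 8, 1), "A": date(2022, 6, 1)},
]

def achieves_d_before_a(case):
    """True when the F-to-D lag beats the F-to-A lag, i.e., D precedes A."""
    return (case["D"] - case["F"]) < (case["A"] - case["F"])

# The first invented case patches fast enough; the second does not.
fraction = sum(achieves_d_before_a(c) for c in vm_cases) / len(vm_cases)
assert fraction == 0.5
```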
Similarly, the model casts each event \(e \in E\) as a singular point event, even though some—such as fix deployed \(D\)—would be more accurately described as diffusion or multi-agent (as above) processes. To apply this model to real world observations, it may be pragmatic to adapt the event definition to include some defined threshold criteria. A fixed quantile appears to be a reasonable approach. For example, a stakeholder might decide that their goal is for 80% of known systems to be patched. They then could observe the deployed fix ratio for their constituency and mark the event \(D\) as having occurred when that threshold is reached.
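A sketch of such a threshold rule, using the 80% quantile from the example above and invented deployment observations:

```python
def d_event_reached(deployed_ratios, threshold=0.8):
    """Return the index of the first observation at which the deployed-fix
    ratio meets the stakeholder's threshold, marking event D; None if never."""
    for i, ratio in enumerate(deployed_ratios):
        if ratio >= threshold:
            return i
    return None

# Invented weekly deployment ratios for a constituency: D occurs in week 4.
ratios = [0.10, 0.35, 0.60, 0.78, 0.83, 0.91]
assert d_event_reached(ratios) == 4

# If deployment stalls below the threshold, D never occurs.
assert d_event_reached([0.10, 0.20]) is None
```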
Attacks as random events. In the model presented here, attacks are modeled as random events, but they are not. At an individual or organization level, attackers are intelligent adversaries and can be expected to follow their own objectives and processes to achieve their ends. However, modeling the details of various attackers is beyond the scope of this model. Thus, we believe that a stochastic approach to adversarial actions is reasonable from the perspective of a vendor or system owner. Furthermore, if attacks were easily predicted, we would be having a very different conversation.
Observation may be limited. Not all events
\(e \in E\), and therefore not all desiderata
\(d \in \mathbb {D}\) will be observable by all interested parties. But in many cases at least some are, which can still help to infer reasonable limits on the others, as shown in Section
7.3. Vendors are of course well placed to observe most of the events in each case, even more so if they have good sources of threat information to bolster their awareness of the
\(X\) and
\(A\) events. A vigilant public can also be expected to eventually observe most of the events, although
\(V\) might not be observable unless vendors, researchers, and/or coordinators are forthcoming with their notification timelines (as many increasingly are).
\(D\) is probably the hardest event to observe for all parties, for the reasons described in the timing discussion above.
10 Related Work
Numerous models of the vulnerability life cycle and CVD have been proposed. Arbaugh, Fithen, and McHugh provide a descriptive model of the life cycle of vulnerabilities from inception to attacks and remediation [
1], which we refined with those of Frei et al. [
19], and Bilge and Dumitraş [
8] to form the basis of this model as described in Section
2. We also found Lewis’ literature review of vulnerability lifecycle models to be useful [
27].
Prescriptive models have also been proposed. Christey and Wysopal’s 2002 IETF draft laid out a process for responsible disclosure geared towards prescribing roles, responsibilities for researchers, vendors, customers, and the security community [
13]. The NIAC Vulnerability Disclosure Framework also laid out a prescriptive process for coordinating the disclosure and remediation of vulnerabilities [
12]. The CERT Guide to Coordinated Vulnerability Disclosure provides a practical overview of the CVD process [
22]. ISO/IEC 29147 describes standard externally-facing processes for vulnerability disclosure from the perspective of a vendor receiving vulnerability reports, while ISO/IEC 30111 describes internal vulnerability handling processes within a vendor [
23,
24]. The FIRST
Product Security Incident Response Team (
PSIRT) Services Framework provides a practical description of the capabilities common to vulnerability response within vendor organizations. FIRST also provides a number of scenarios for MPCVD [
18]. Many of these scenarios can be mapped directly to the histories
\(h \in H\) described here.
Benchmarking CVD capability is the topic of the
Vulnerability Coordination Maturity Model (
VCMM) from Luta Security [
28]. The Vulnerability Coordination Maturity Model (VCMM) addresses five capability areas: organizational, engineering, communications, analytics, and incentives. Of these, our model is perhaps most relevant to the analytics capability, and could be used to inform an organization’s assessment of their progress in this dimension.
Economic analysis of CVD has also been done. Arora et al. explored the CVD process from an economic and social welfare perspective [
2,
3,
4,
5,
6,
7]. More recently, so did Silfversten et al. [
37]. Cavusoglu and Cavusoglu model the mechanisms involved in motivating vendors to produce and release patches [
10]. Ellis et al. examine the dynamics of labor market for bug bounties both within and across CVD programs [
16]. Lewis highlights systemic themes within the vulnerability discovery and disclosure system [
27]. Pupillo et al. explore the policy implications of CVD in Europe [
35]. Moore and Householder modeled the interactions of VM and MPCVD processes [
29]. A model for prioritizing vulnerability response that considers
\(X\) and
\(A\), among other impact factors, is found in Spring et al. [
38].
Other work has examined the timing of events in the lifecycle, sometimes with implications for forecasting. Ozment and Schechter examine the rate of vulnerability reports as software ages [
34]. Bilge and Dumitraş study 18 vulnerabilities in which
\(A \prec P\), finding a lag of over 300 days [
8]. Jacobs et al. propose an Exploit Prediction Scoring System [
25], which could provide insight into
\(V \prec A\),
\(F \prec A\), and other desiderata
\(d \in \mathbb {D}\). Householder et al. find that while only about 5% of vulnerabilities have public exploits available via commodity tools, for those that do the median lag between
\(P\) and
\(X\) was 2 days [
20].
Frei et al. describe the timing of many of the events here, including
\(F\),
\(D\),
\(X\), and
\(P\) and the
\({\Delta }t\) between them for the period 2000–2007 across a wide swath of industry [
19]. In their analysis, they note that
\(X \prec P\) in 15% of vulnerabilities they analyzed. That means
\(f^{obs}_{P \prec X}=0.85\), but from Table
4 we already expect
\(f_{P \prec X}=0.5\) from chance alone. So we therefore compute
\(\alpha _{P \prec X}=0.7\). Similarly, they report that a patch is available on or before the date of public awareness in 43% of vulnerabilities (our interpretation of their “zero day patch” where
\(t_{P}-t_{F}=0\) is that in order for P and F to happen in the same day the patch existed before the
\(P\) event). In other words,
\(f^{obs}_{F \prec P}=0.43\), giving us an
\(\alpha _{F \prec P}=0.36\) once we factor in
\(f_{F \prec P}=0.111\).
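Both of these computations can be reproduced directly from Equation (18):

```python
def alpha(f_obs, f_d):
    """Skill coefficient (Equation (18))."""
    return (f_obs - f_d) / (1 - f_d)

# Frei et al.: X < P in 15% of cases, so f_obs for P < X is 0.85;
# chance alone predicts 0.5.
assert round(alpha(0.85, 0.5), 2) == 0.7

# "Zero day patch" in 43% of cases: f_obs for F < P is 0.43;
# chance alone predicts 0.111.
assert round(alpha(0.43, 0.111), 2) == 0.36
```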