Use of SysML for the Creation of FMEAs for Reliability, Safety, and Cybersecurity for Critical Infrastructure
Copyright © 2019 by Myron Hecht and David Baum. Published and used by INCOSE with permission.
Abstract. A method for producing a Failure Modes and Effects Analysis (FMEA) from SysML is
presented together with a simple critical infrastructure system example. The significance of the
method is the modeling of failure propagation which enables not only an automated approach but
significant additional analysis results that can be used to support reliability, safety, and cyberse-
curity. The analysis products from a tool implementing this method together with the insight and
analytical benefits they provide are discussed.
Introduction
Failure Modes and Effects Analyses (FMEAs) are widely used in safety and mission critical
applications. The U.S. Department of Defense originated the technique in the 1960s and
institutionalized it in multiple reliability and safety standards. These include Defense
(MIL STD 1629A, GEIA STD 009, and MIL STD 882E); Avionics (SAE ARP 4754, ARP 4761, and
ARP 5580); Automotive (SAE J1739); and Medical Devices (ISO 14971 Risk Management, IEC 60812
FMEA, FDA Guidance for Industry, Q9 Quality Risk Management). FMEAs are also used in Nuclear
Power Reactors, Space Systems, and Industrial Process Control.
The systematic and thorough analysis approach of FMEAs has resulted in their application to
cybersecurity in information systems, often using variations such as F(I)MEA (Schmittner et al.
2014; Gorbenko et al. 2006; Ramanan 2008). For network and system management, FMEAs provide a
method of correlating end effects with causes and, as such, can be an important aid to intrusion
detection as well as incident response.
Ideally, FMEAs should be done at multiple stages in the development process to identify failure
detection and recovery deficiencies as early as possible so that economically feasible corrective
actions can be taken. However, this practice has not been feasible using traditional manual
techniques because of their cost and skilled labor requirements. Hence, there has been significant
interest in the development of automated techniques based on SysML and other design languages
for generation of FMEAs. Research in methods development has been ongoing for nearly a decade.
An early example used SysML Internal Block Diagrams (IBDs) and Sequence Diagrams to model
propagation, and AltaRica to model transformation (David, Idasiak, and Kratz, 2010). Another
group demonstrated use of the Architecture Analysis and Design Language (AADL) error model to
generate functional hazard analyses for a medical device (Larson et al., 2013). With respect to
failure models, Feiler (2013) defined a failure class hierarchy and Wallace (2005) defined a
Failure Propagation and Transformation Calculus that is used in our approach.
The Object Management Group (Biggs et al., 2019) is now developing a standardized Safety and
Reliability profile that could be used to mark up items for incorporation into this approach to
the development of FMEAs.
This paper combines (1) application of FMEAs for integrated cybersecurity, reliability, and safety
analysis and (2) use of SysML for the automatic generation of such analyses. It demonstrates the
application of our approach to failure propagation using a Supervisory Control and Data
Acquisition (SCADA) information network. Finally, it describes the analysis products generated by
a tool (a Java plug-in to a SysML modeling product) and the insights these products provide.
Figure 1 shows a row of a conventional FMEA for a software component. The column on the left
is the component under analysis, which is paired with the failure mode. This pair is the
fundamental unit of analysis in a typical FMEA. The table then shows the effect on the component,
the next level effect, and the end effect. The next two columns describe the handling of the
failure and effects through detection and mitigation. The second column from the right is the
severity rating (in this case, 1 through 5 with 5 being most severe), and the rightmost column is
a comment, which might be a recommendation for a system improvement, a notation of uncertainty in
the analysis requiring further resolution, or an assumption. The comment column is often the most
important product of the FMEA because it may lead to the addition of new mitigations that result
in a safer and more secure system (if the analysis is done at a time when design changes are still
feasible). There are variations on this general format, including a separate column for a
probability assessment which, when combined with the severity, provides a measure of risk, or
Criticality, in which case the FMEA is sometimes referred to as an FMECA.
| Item | Failure Mode | Cause | Immediate Effect | Next Level Effect | End Effect | Detection | Mitigation | Severity | Comment |
|---|---|---|---|---|---|---|---|---|---|
| ABC | Incorrect result | Algorithm, design, or coding error | ABC outputs wrong result to persistent database | Incorrect result affects components DEF, HIJ, LMN | Incorrect results displayed to user | None | None | 3 | Insert code to check for reasonableness and substitute default value |
A conventional FMEA of this form has two limitations:
• Only three levels of effects (immediate, next higher, and end) are considered; there might
be 5 or 10 additional effects that are not explicitly considered in this analysis.
• Failure propagation paths are only partially considered (usually one propagation path) for
detections and mitigations. An FMEA that considers only one failure propagation path for each
component and failure mode will not suffice as a tool for cybersecurity analysis.
The approach described in this paper overcomes these limitations by enumerating and analyzing
all propagation paths for each failure mode/component/cause triple. For each propagation path,
the software that implements this approach calculates the total length and the earliest point of
detection and mitigation (i.e., closest to the postulated failure origin). It also tabulates the
components subject to the most failure modes and the symptoms linked to each failure mode, and
lists all propagation paths. This listing and enumeration of all propagation paths enables the
complete identification of cyberattack paths and of where defenses (either detections and
mitigations or preventative/protective measures) can be evaluated.
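The path enumeration described above can be illustrated with a minimal sketch. The Python fragment below assumes a hypothetical component graph (`GRAPH`) and a hypothetical set of detecting components (`DETECTORS`); neither is taken from the tool, which is a Java plug-in whose internal algorithm is not given in this paper. It enumerates every simple propagation path from a failure origin with a depth-first search and reports, for each path, its length and the earliest detection point.

```python
from typing import Dict, List, Optional, Set

# Hypothetical component graph: each key propagates failures to the listed neighbours.
GRAPH: Dict[str, List[str]] = {
    "adversary": ["vpn"],
    "vpn": ["firewall"],
    "firewall": ["computer"],
    "computer": ["pump", "valve"],
    "pump": [],
    "valve": [],
}

# Hypothetical set of components at which a propagated failure can be detected.
DETECTORS: Set[str] = {"firewall", "computer"}


def enumerate_paths(graph: Dict[str, List[str]], origin: str) -> List[List[str]]:
    """Return every simple path that starts at origin and cannot be extended further."""
    paths: List[List[str]] = []
    stack: List[List[str]] = [[origin]]
    while stack:
        path = stack.pop()
        # Skip successors already on the path so that cycles terminate.
        successors = [n for n in graph.get(path[-1], []) if n not in path]
        if not successors:
            paths.append(path)
            continue
        for nxt in successors:
            stack.append(path + [nxt])
    return paths


def first_detection(path: List[str], detectors: Set[str]) -> Optional[int]:
    """Return the number of propagations before the first detecting component, or None."""
    for steps, component in enumerate(path):
        if component in detectors:
            return steps
    return None


if __name__ == "__main__":
    for p in enumerate_paths(GRAPH, "adversary"):
        print(" -> ".join(p), "| length:", len(p) - 1,
              "| propagations to first detection:", first_detection(p, DETECTORS))
```

In this sketch the path length is counted in propagation steps (hops) from the origin; that counting convention is an assumption made for the example.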
An additional significant advantage of an automated approach is that it enables multiple analyses
to be performed during each design stage, integrating the FMEA evaluation into the design
activity.
Figure 3. Defining the System Components to be Included in the Analysis with a SysML Block
Definition Diagram
The major concepts depicted on this diagram are
• The water supply system is composed of 5 major system component types: actuators, sensors,
computers, firewalls, and a VPN. For the purposes of analysis, we have added a 6th “com-
ponent” type: an adversary that will be used for the cybersecurity analysis
• These component definitions will be instantiated into components using the names (roles)
shown at the ends of the arrows that join the top-level block (water supply system) to the lower
level blocks (e.g., the actuator at the level of the diagram)
The next step is to define the failure propagations within each of the component block definitions
that we defined in step 1. Figure 4 shows the overall relationship between the BDD and the more
detailed failure propagations and transformations for the purposes of the FMEA that we will
describe below.
Two kinds of SysML ports are defined: (1) generic failure propagation ports called sink ports and
source ports and (2) specific source and sink failure mode ports that are nested within their
respective propagation ports. It is not necessary to create these ports manually. The model and
tool we have created can perform this action using predefined stereotypes.
When creating designs using SysML, the blocks identified in the Block Definition Diagram (BDD)
are defined with many different properties (e.g., weight, power, capacity). It is also necessary
to define their failure modes to perform the FMEA. We do this by defining two failure propagation
ports: an input or “sink” port, and an output or “source” port. As we will show in the next chart,
failure propagation paths are defined by connections from sources to sinks.
Figure 4. Defining the Failure Propagations and Transformations within a Component
Another key point is that these failure propagations are included in the block definitions. That
means that they can be reused in future analyses, thereby not only saving labor but also enabling
standardization.
Once the input (sink) and output (source) ports have been defined, the failure propagations are
defined using a SysML internal block diagram (upper diagram) that represents both the component
and the outside ports of the block definition (lower diagrams). Horizontal lines connecting the
input and output ports represent propagation. Diagonal lines represent transformations. In this
diagram, interfered transmission on the VPN can be transformed into low signal integrity.
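The distinction between propagation (horizontal lines) and transformation (diagonal lines) within a component can be sketched as a lookup table from sink (input) failure modes to source (output) failure modes. In the fragment below, the table `VPN_RULES` and the helper names are hypothetical; only the transformation of interfered transmission into low signal integrity comes from the text, and the unchanged propagation of interfered transmission is an assumption added for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical rule table for a VPN block: sink failure mode -> source failure modes.
# The first entry in the list is an assumed unchanged propagation; the second is the
# transformation mentioned in the text.
VPN_RULES: Dict[str, List[str]] = {
    "interfered transmission": ["interfered transmission", "low signal integrity"],
}


def classify(incoming: str, outgoing: str) -> str:
    """A failure mode that leaves unchanged is a propagation; otherwise a transformation."""
    return "propagation" if incoming == outgoing else "transformation"


def outputs_for(rules: Dict[str, List[str]], incoming: str) -> List[Tuple[str, str]]:
    """List each source failure mode produced by an incoming sink failure mode."""
    return [(out, classify(incoming, out)) for out in rules.get(incoming, [])]


if __name__ == "__main__":
    for mode, kind in outputs_for(VPN_RULES, "interfered transmission"):
        print("interfered transmission ->", mode, "(" + kind + ")")
```

Because the rules live with the component definition, the same table can be reused for every instance of that component type, which mirrors the reuse benefit of placing failure propagations in the block definitions.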
For the third step, we describe the failure propagation paths among the components using a SysML
Internal Block Diagram (IBD). Figure 5 shows the relationship between the component definitions
in the BDD and their instantiations in the IBD, both of which were shown earlier. Notice that the
IBD shows the failure propagation ports.
Figure 5. Defining the Propagation Paths with a SysML Internal Block Diagram
The last step is defining the failure propagations and transformations between components. Figure
6 describes the SysML association block, the modeling construct used to represent the details of
the failure propagation. The top diagram shows the definition of the connection, in which the
source port (VPN in this case) is connected to the sink port (the computer in this case). The name
of the connection (or “association” using SysML terminology) is VPN-Computer. The lower diagram
shows the propagations and transformations between the components represented in a SysML internal
block diagram. As was the case with the intra-component failure propagations and transformations,
failure propagations are represented as horizontal lines connecting source and sink ports; failure
transformations are represented as diagonal lines. Figure 6 also shows how the association block
relates to the internal block diagram below, and how the propagation source ports on the left-hand
component relate to the propagation sink ports for the right-hand component.
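As a rough illustration of the association block, the sketch below represents a connection between two components as a small data class that maps failure modes leaving the source component to the failure modes arriving at the sink component. The class `Association`, its fields, and the specific mode mappings are hypothetical; only the connection name VPN-Computer and its direction come from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Association:
    """Hypothetical stand-in for an association block between two components."""
    name: str
    source: str  # component owning the propagation source port
    sink: str    # component owning the propagation sink port
    # Source failure mode -> failure modes it arrives as at the sink (assumed mappings).
    links: Dict[str, List[str]] = field(default_factory=dict)

    def arriving_modes(self, source_mode: str) -> List[str]:
        """Return the sink failure modes caused by a given source failure mode."""
        return list(self.links.get(source_mode, []))


vpn_computer = Association(
    name="VPN-Computer",
    source="VPN",
    sink="Computer",
    links={
        "low signal integrity": ["corrupt data"],  # transformation (assumed)
        "late data": ["late data"],                # propagation (assumed)
    },
)

if __name__ == "__main__":
    print(vpn_computer.name, vpn_computer.arriving_modes("low signal integrity"))
```

Only nearest-neighbor links need to be entered in this form; longer paths follow from chaining associations with the intra-component rules.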
Tool Output
The method described in the previous section was implemented as a Java plug-in for Cameo
Systems Modeler. This section discusses the five tables that the tool produces as tabs in a
Microsoft Excel spreadsheet to enable subsequent formatting, summarization, and visualization,
and shows the increased insight and actionable information (relative to a conventional FMEA)
that they provide. Because of automation, this information can be acquired early enough in the
design that it is possible to make changes that can improve the design's robustness and
resiliency.
Figure 7 shows a portion of the full FMEA that is generated by the tool (7 of the 22 columns are
shown). For the example water control system described here, there are 1110 propagation paths
with unique originating components, failure modes, causes, propagation steps, and end effects
(with a conventional manually generated FMEA, there would be only 37 rows).
Figure 7. Failure Modes and Effects Analysis (excerpt)
The complete set of columns in the table is:
• Failed Component: Identification of the specific component and component type
• Failure Mode: Identification of the failure mode
• Cause: Cause of the failure mode. If there are multiple causes, each cause is listed on a sep-
arate line because the protection/prevention measures would differ
• Intermediate Effects: Identification of each of the effects (secondary failures) as the primary
failure propagates through the system until its end effect. Note that this propagation path is
further detailed in the Propagation Description table
• Intermediate Causes: The causes associated with each of these intermediate effects
• End Component: The component at which the failure propagation terminates (end effect)
• End Cause: The cause of the failure at the end component
• Severity: The severity of the end effect
• Severity Comment: Explanation or uncertainty in determining the severity
• Detection: Detection of the end effect
• Mitigation: Mitigation of the end effect
• Protection: A protective or preventative measure to prevent the failure or cyberattack effect
from occurring
• Comment: Explanation of the protective measure or documentation of uncertainty
• # Propagations: Number of components involved in the propagation from the primary failure
mode to the end effect
• First Known Detection: First component in the propagation path at which the failure can be
detected
• # Propagations to Detection: Number of components affected by the failure until it is detected
• First Known Mitigation: First component at which a mitigation of the failure can occur
• # Propagations to Mitigation: Number of propagations to the mitigation
• First Known Protection: First protective measure along the propagation path that can prevent
failure propagation from occurring (particularly relevant to cybersecurity)
• # Propagations to Protection: Number of components involved in the propagation until the
protective measure is reached
• Intermediate Detections: List of all detection mechanisms other than the primary and end
effect detections along the propagation path
• Intermediate Mitigations: List of all mitigations other than the primary and end effect miti-
gations along the propagation path
• Intermediate Protections: List of all protection measures other than the primary and end
effect protection measures along the propagation path
• Intermediate Comments: Explanation or uncertainties on the intermediate detections, miti-
gations, or protections
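Several of the columns listed above are derived from the propagation path itself. The sketch below shows one way such columns could be computed, assuming a hypothetical `Step` record per component on the path with optional detection, mitigation, and protection annotations. It counts propagations as hops from the originating component, which is an assumption; the tool's exact counting convention is not stated in the text.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Step:
    """Hypothetical record for one component on a propagation path."""
    component: str
    detection: Optional[str] = None
    mitigation: Optional[str] = None
    protection: Optional[str] = None


def first_known(path: List[Step], attr: str) -> Tuple[Optional[str], Optional[int]]:
    """Return (component, hops from origin) for the first step where attr is set."""
    for hops, step in enumerate(path):
        if getattr(step, attr) is not None:
            return step.component, hops
    return None, None


def derived_columns(path: List[Step]) -> Dict[str, object]:
    """Compute the path-derived FMEA columns for one propagation path."""
    columns: Dict[str, object] = {"# Propagations": len(path) - 1}
    for attr, label in (("detection", "Detection"),
                        ("mitigation", "Mitigation"),
                        ("protection", "Protection")):
        component, hops = first_known(path, attr)
        columns["First Known " + label] = component
        columns["# Propagations to " + label] = hops
    return columns


if __name__ == "__main__":
    example = [
        Step("vpn"),
        Step("firewall", detection="packet inspection", protection="filtering"),
        Step("computer", mitigation="substitute default value"),
    ]
    for name, value in derived_columns(example).items():
        print(name + ":", value)
```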
The second output table, the failure modes and effects summary (FMES) shown in Figure 8, is
one of the two most useful outputs from our plug-in because it enables a rapid identification of
the failure modes that lead to the most severe effects, components with the most failure modes,
the most used detection and mitigation effects, and the distribution of failure modes by severity.
For the water supply example, this table showed that (1) Malicious Data is the failure mode that
is most often not detected and has the highest-severity effects and (2) actuators (pump and valve)
and the control processor are also significant contributors to Severity 1 failure modes.
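The kinds of summaries that the FMES provides can be approximated with simple counting over the FMEA rows. The sketch below assumes a hypothetical `FmeaRow` record and made-up example rows; it is not the tool's implementation and the values are not results from the water supply model.

```python
from collections import Counter
from typing import Dict, List, NamedTuple, Optional


class FmeaRow(NamedTuple):
    """Hypothetical, reduced FMEA row used only for this illustration."""
    component: str
    failure_mode: str
    severity: int
    detection: Optional[str]


def severity_distribution(rows: List[FmeaRow]) -> Counter:
    """Count rows per severity level."""
    return Counter(row.severity for row in rows)


def undetected_by_mode(rows: List[FmeaRow]) -> Counter:
    """Count rows with no detection, grouped by failure mode."""
    return Counter(row.failure_mode for row in rows if row.detection is None)


def modes_per_component(rows: List[FmeaRow]) -> Dict[str, int]:
    """Count distinct failure modes for each component."""
    modes: Dict[str, set] = {}
    for row in rows:
        modes.setdefault(row.component, set()).add(row.failure_mode)
    return {component: len(found) for component, found in modes.items()}


# Made-up rows for demonstration only.
EXAMPLE_ROWS = [
    FmeaRow("control processor", "malicious data", 1, None),
    FmeaRow("control processor", "bad data", 2, "range check"),
    FmeaRow("pump", "stuck off", 1, None),
    FmeaRow("vpn", "malicious data", 1, None),
    FmeaRow("vpn", "late data", 3, "timeout"),
]

if __name__ == "__main__":
    print(severity_distribution(EXAMPLE_ROWS))
    print(undetected_by_mode(EXAMPLE_ROWS).most_common(1))
    print(modes_per_component(EXAMPLE_ROWS))
```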
The fourth analysis product, the Diagnostics table shown in Figure 10, enables an assessment of
which item is most likely to have failed given the externally observable system effect. The
number of rows is equal to the number of component/end effect combinations; the number of
columns is the number of components (plus the adversary block). Using the top data row of the
table as an example and starting from the left, 27% of the failure modes that could lead to the
effect of the sensor receiving bad data are from the control processor, 13% are from each of the
sensors themselves (flow, pressure, and level sensors), and 13% are from the adversary.
This table can be used as an aid in assessing the likely causes for a given symptom. For example,
the table shows that 100% of the failure modes that contribute to the sensor receiving malicious
data originate in the control processor. Hence, we call this a diagnostics table because it
provides a measure of the relative likelihood of each component being the root cause leading to
the system end effect.
For the water supply example, the complete table showed that (1) the VPN is the single component
most likely to be the cause of malfunctions in the actuators and (2) the control processor can be
a cause of all system level effects identified thus far.
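The percentages in a diagnostics table of this kind can be reproduced by counting, for each end effect, how many contributing failure modes originate in each component and normalizing each row to 100%. The sketch below assumes the input is a list of hypothetical (originating component, end effect) pairs; the function name and the sample data are illustrative and are not taken from the tool or from Figure 10.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def diagnostics_table(records: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """Map each end effect to the percentage of contributions from each origin."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for origin, end_effect in records:
        counts[end_effect][origin] += 1
    table: Dict[str, Dict[str, float]] = {}
    for end_effect, by_origin in counts.items():
        total = sum(by_origin.values())
        table[end_effect] = {origin: 100.0 * n / total for origin, n in by_origin.items()}
    return table


# Made-up (originating component, end effect) pairs for demonstration only.
EXAMPLE_RECORDS = [
    ("control processor", "sensor receives bad data"),
    ("control processor", "sensor receives bad data"),
    ("flow sensor", "sensor receives bad data"),
    ("adversary", "sensor receives bad data"),
    ("control processor", "sensor receives malicious data"),
]

if __name__ == "__main__":
    for effect, row in diagnostics_table(EXAMPLE_RECORDS).items():
        print(effect, row)
```

Each row gives a relative ranking of candidate root causes for one symptom; it is a share of modeled failure modes, not a probability derived from failure rates.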
rigorous defect detection in the code. For the late data, such a protection would not be effective
(because the attack could affect the networking software outside of the application – it was as-
sumed for the purposes of this example that such code is commercially procured and would not be
part of a coding analysis). Where protection measures are absent, the analysts should examine the
existing mitigation and detection measures intended to support reliability and safety and determine
whether these are sufficient for cybersecurity purposes. For the water system example, the com-
plete table showed that there are multiple propagation paths for which there is no protection
against a cyberattack; measures for failure detection and mitigation should be evaluated to
determine if there is any effect.
Discussion
The methodology described in the previous section reduces the labor and time required of system
engineers and analysts by reducing the repetitive tasks needed to manually create an FMEA, but it
does not eliminate the need for expert and knowledgeable input. Failure modes and internal
transformations for each component must still be originated by a knowledgeable engineer or analyst
and manually entered, but only once. In a conventional FMEA, they would be repeated on each row.
Similarly, propagations and transformations must be identified manually by a knowledgeable
individual, but only between components and their nearest neighbors. The propagation algorithms
automatically generate the component to system effect by traversing the paths (hence, our
characterization of this tool as an “automated” FMEA generation technique). Finally, when design
changes result in a system model change, the FMEA reflecting the changes can be regenerated
automatically. As a result, the impact of design changes on reliability, safety, and
cybersecurity can be immediately assessed, and a superior system or product can be produced.
1 The failure mode designated as “corrupt data” means data that has been altered such that it can
no longer be interpreted by the software (i.e., no longer meets its actual or implied interface
specification); “bad data” means data that is interpretable by the software but has incorrect
values. Both corrupt and bad data are non-malicious failure modes. “Malicious data”, on the other
hand, is a failure mode in which the data have been altered with intent as part of a cyberattack.
Because all propagation paths are identified completely, the outputs produced by this FMEA ap-
proach contain far more information than using the traditional methodology. The additional re-
porting formats defined in Figures 8 through 10 enable a summarization of the information so that
informed engineering decisions can be made despite the significantly larger number of “rows” that
are produced by an FMEA. For example, the distribution of failure mode severities by component
in the FMES (Figure 8) enables the identification of vulnerable or critical components and helps
prioritize design and analysis resources; the System Effects Table (Figure 9) can be used to
identify the best placement of detection and recovery provisions. The diagnostics table (Figure
10) is an aid to the creation of maintenance manuals and troubleshooting procedures by enabling
the association of an observable system effect with the most likely component. Users can develop
their own queries as additional needs arise. For example, if a traditional FMEA with only the
immediate, next higher level, and end effects is required, a query can be made to find unique
records of the first two propagations and the end effects, ignoring the additional propagations
that are identified.
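A query of the kind just described can be sketched as a de-duplication over the full set of propagation paths. The fragment below assumes each row is a dictionary with hypothetical keys `component`, `failure_mode`, and `effects` (the ordered list of effects along the path, ending with the end effect, and containing at least one entry). These field names and the sample failure modes are illustrative and are not the tool's actual column names.

```python
from typing import Dict, List, Optional, Tuple


def traditional_view(rows: List[dict]) -> List[Dict[str, Optional[str]]]:
    """Keep one record per unique (component, mode, immediate, next level, end effect)."""
    seen: set = set()
    result: List[Dict[str, Optional[str]]] = []
    for row in rows:
        effects = row["effects"]  # assumed non-empty
        immediate = effects[0]
        next_level = effects[1] if len(effects) > 1 else None
        end = effects[-1]
        key: Tuple = (row["component"], row["failure_mode"], immediate, next_level, end)
        if key in seen:
            continue  # same three-level view as an earlier, longer or shorter path
        seen.add(key)
        result.append({
            "component": row["component"],
            "failure_mode": row["failure_mode"],
            "immediate_effect": immediate,
            "next_level_effect": next_level,
            "end_effect": end,
        })
    return result


# Made-up paths for demonstration only.
EXAMPLE_PATHS = [
    {"component": "vpn", "failure_mode": "interfered transmission",
     "effects": ["low signal integrity", "corrupt data", "pump stops"]},
    {"component": "vpn", "failure_mode": "interfered transmission",
     "effects": ["low signal integrity", "corrupt data", "bad command", "pump stops"]},
    {"component": "vpn", "failure_mode": "interfered transmission",
     "effects": ["low signal integrity", "late data", "pump stops"]},
]

if __name__ == "__main__":
    for record in traditional_view(EXAMPLE_PATHS):
        print(record)
```

In this example the first two paths collapse into a single traditional row because they differ only in an intermediate propagation that the three-level format does not show.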
Conclusions
This paper describes an automated FMEA generation capability using the SysML modeling language
and its application to a simple SCADA computer network. It also presents the outputs produced by
the tool (implemented as a SysML plug-in) from this analysis and shows the insights into the
design that can be achieved.
The fundamental innovation in our approach is the identification and enumeration of all failure
propagation paths and the detailed documentation of the failure transformations, detection
measures, mitigation measures and protective measures that can be applied to these devices to
prevent or mitigate the impact of the anomaly. By doing so, we can expand the traditional FMEA
approach to analysis of cyberattack vectors.
Because our approach is automated and can be readily integrated into a system development effort
using Model Based Systems Engineering (MBSE), the analysis can be readily repeated throughout
the design and can be used frequently to assess a system design, identify weaknesses, and take
corrective actions to create a more resilient and robust system.
References
Biggs, Geoffrey, Andrius Armonas, Tomas Juknevicius, Kyle Post, Nataliya Yakymets, Axel
Berres, “OMG standard for integrating safety and reliability into MBSE: Core Concepts
and Applications”, Submitted to IS 2019
David, Pierre, Vincent Idasiak, Frederic Kratz, “Reliability study of complex physical systems
using SysML”, Reliability Engineering and System Safety 95 (2010) 431–450
Feiler, P. AADL Error Model version, available online at
https://wiki.sei.cmu.edu/aadl/index.php/Standardization (registration required)
Gorbenko, A., Kharchenko, V., Tarasyuk, O., Furmanov, A.: F(I)MEA-technique of web services
analysis and dependability ensuring. In: Butler, M., Jones, C.B., Romanovsky, A.,
Troubitsyna, E. (eds.) Rigorous Development of Complex Fault-Tolerant Systems. LNCS,
vol. 4157, pp. 153–167. Springer, Heidelberg (2006)
Larson, Brian, John Hatcliff, Kim Fowler, and Julien Delange, “Illustrating the AADL Error
Modeling Annex (v. 2) Using a Simple Safety-Critical Medical Device”, Proc. ACM 2013
High Integrity Language Technology Conference, Pittsburgh PA
Ramanan, B. “An illustration of the application of Failure Modes and Effects Analysis (FMEA)
techniques to the analysis of information security risks”, August 2008, available online at
www.iso27001security.com/ISO27k_FMEA_spreadsheet_1v1.xls
Schmittner C., Gruber T., Puschner P., Schoitsch E. (2014) Security Application of Failure Mode
and Effect Analysis (FMEA). In: Bondavalli A., Di Giandomenico F. (eds) Computer
Safety, Reliability, and Security. SAFECOMP 2014. Lecture Notes in Computer Science,
vol 8666. Springer, Cham
Wallace, M. Modular Architectural Representation and Analysis of Fault Propagation and
Transformation, Proc. European Joint Conf. Theory and Practice of Software (ETAPS),
Elsevier Electronic Notes in Theoretical Computer Science (ENTCS), vol. 141, no. 3, 2005,
pp. 53–71