Safe Automotive Software Architecture (Safe)
WP3
Deliverable D331b: Methodology and Tool
specification for analysis of qualitative and
quantitative cut-sets issued from error failure
propagation analyses
1 Table of contents
1 Table of contents
2 List of figures
3 List of tables
4 Executive Summary
5 Introduction and overview of the document
5.1 Introduction
5.2 Scope of deliverable D331b
5.3 Structure of the document
6 Proposal of a Global Safety Analysis Methodology
6.1 Knowledge sharing between partners on practiced safety analysis methods
6.2 Terminology used in the deliverable for physical elements
6.3 Global Safety Analysis Methodology Proposal
6.4 Use case presentation for illustration
6.5 System Safety Analyses [Design Phase]
6.5.1 STEP 1A: Perform Qualitative System FMEA [Mandatory] [System Safety Analysis] [Design Phase]
6.5.1.1 Qualitative SFMEA: Application Rules
6.5.1.2 Qualitative SFMEA: Introduction
6.5.1.3 Qualitative SFMEA: Main Purpose
6.5.1.4 Qualitative SFMEA: Standards applicable
6.5.1.5 Qualitative SFMEA: Input
6.5.1.6 Qualitative SFMEA: Main Principles
6.5.1.7 Qualitative SFMEA: Output
6.5.1.8 Qualitative SFMEA: Illustration via our example
6.5.2 STEP 1B: Perform Qualitative System FTA [ASIL dependent] [System Safety Analysis] [Design Phase]
6.5.2.1 Qualitative System FTA: Application Rules
6.5.2.2 Qualitative System FTA: Main Purpose
6.5.2.3 Qualitative System FTA: Standards applicable
6.5.2.4 Qualitative System FTA: Input
6.5.2.5 Qualitative System FTA: Main Principles
6.5.2.6 Qualitative System FTA: Output
6.5.2.7 Qualitative System FTA: Illustration via our example
6.5.3 STEP 1C: Perform Quantitative System FTA for residual risk allocation [ASIL dependent] [System Safety Analysis] [Design Phase]
6.5.3.1 Quantitative System FTA: Application Rules
6.5.3.2 Quantitative System FTA: Main Purpose
6.5.3.3 Quantitative System FTA: Standards applicable
6.5.3.4 Quantitative System FTA: Input
6.5.3.5 Quantitative System FTA: Main Principle
6.5.3.6 Quantitative System FTA: Output
6.5.3.7 Quantitative System FTA: Illustration via our example
6.5.4 STEP 1D: Allocate HW Architectural Metrics to components (Optional) [System Safety Analysis] [Design Phase]
6.5.4.1 HW Architectural Metrics Allocation: Application Rules
6.5.4.2 HW Architectural Metrics Allocation: Main Purpose
6.5.4.3 HW Architectural Metrics Allocation: Standards applicable
6.5.4.4 HW Architectural Metrics Allocation: Input
6.5.4.5 HW Architectural Metrics Allocation: Main Principle
6.5.4.6 HW Architectural Metrics Allocation: Output
6.6 Component Safety Analyses: Design Phase
6.6.1 STEP 2A: Perform Qualitative Component FMEDA [Mandatory] [Component Safety Analysis] [Design Phase]
6.6.1.1 Qualitative Component FMEDA: Application Rules
6.6.1.2 Qualitative Component FMEDA: Introduction
6.6.1.3 Qualitative Component FMEDA: Main Purpose
6.6.1.4 Qualitative Component FMEDA: Standards applicable
6.6.1.5 Qualitative Component FMEDA: Input
6.6.1.6 Qualitative Component FMEDA: Main Principles
6.6.1.7 Qualitative Component FMEDA: Output
6.6.1.8 Qualitative Component FMEDA: Illustration via our example
6.6.2 STEP 2B: Perform Qualitative Component FTA [Optional] [Component Safety Analysis] [Design Phase]
6.6.2.1 Qualitative Component FTA: Application Rules
6.6.2.2 Qualitative Component FTA: Main Purpose
2 List of figures
Figure 1: Scope of deliverable D331b
Figure 2: General Safety Analysis Process Proposal by WT331 Partners
Figure 3: General flowchart of the Global Safety Analysis Process
Figure 4: Physical architecture of the lighting control system
Figure 5: Example of qualitative system FTA for the lighting control system
Figure 6: Example of quantitative System FTA with possible target allocation for our example
Figure 7: Example of parallel architecture
Figure 8: Example of physical architecture description for the Top Column Module
Figure 9: Example of qualitative Component FTA for the Top Column Module ECU
Figure 10: Analog interface schematics
Figure 11: Example of flow diagram for failure mode classification
Figure 12: Analog Interface Schematics
Figure 13: Example of flow diagram for failure mode classification
Figure 14: F(t) and w(t) plot with λ_SR,HW,SPF/RF = 50 FIT
Figure 15: F(t) and w(t) plot with 2 latent faults combined (λ1 = 50 FIT and λ2 = 30 FIT)
Figure 16: Example of a single-point fault FTA pattern
Figure 17: Example of 2 possible residual-fault FTA patterns
Figure 18: Example of 2 possible dual-point fault FTA patterns resulting from safety-mechanism failure
Figure 19: Example of 2 possible FTA patterns to model safety mechanism effect
Figure 20: Example of a dual-point failure pattern in a FTA
Figure 21: Example for quantitative component FTA for the TCM_ECU for SG_01
Figure 22: Example of quantitative System FTA for verification
Figure 23: Illustration of a combination of FTA and FMEA [2]
Figure 24: Domains of interest for tool requirements for daily use
3 List of tables
Table 1: Type of analysis methods required or recommended by ISO26262 [1]
Table 2: Example of recognized analysis methods listed by ISO26262 [1]
Table 3: Example of partial SFMEA performed on our use case
Table 4: Metrics allocation required or recommended by ISO26262 [1]
Table 5: HW Architectural Metrics allocation required or recommended by ISO26262 [1]
Table 6: Example of partial qualitative FMEDA performed on our use case
Table 7: Example of partial eFMEA for analog interface performed on our use case
Table 8: Example of partial quantitative FMEDA at HW part level for analog interface performed on our use case
Table 9: Example of different FRC tables assuming different rationales for dangerous faults
Table 10: Targets of failure rate classes of HW parts regarding single-point faults
Table 11: Maximum failure rate classes for a given diagnostic coverage of HW parts regarding residual faults
Table 12: Targets of failure rate class and coverage of HW parts regarding dual-point faults
Table 13: Example of partial quantitative FMEDA at HW Part level for analog interface performed on our use case
Table 14: Example of architectural metrics data synthesis for different components
Table 15: Example of residual risk data synthesis for different components
4 Executive Summary
The main goal of deliverable D331b is to give readers guidance on how to perform safety
analyses when developing a safety-related product.
The need for this guidance emerged from exchanges between the WT331 partners: ISO26262
recommends or requires certain kinds of safety analyses (qualitative or quantitative), but
does not clearly state what is expected or how the different safety analyses interact.
Therefore a global safety analysis process is proposed, from system design down to detailed
design, and possible alternatives are highlighted. This process serves only as an example. In
addition, new methods are introduced for “safety” FMEAs and for calculating hardware
architectural metrics at a high abstraction level of the hardware architecture.
For each safety analysis considered, the deliverable states when it is applicable, which
standards can be referred to, which inputs are needed and which outputs are provided.
Moreover, this deliverable documents how to perform each safety analysis and illustrates it
with a concrete example.
From this global safety analysis process proposal, we identified gaps between end-user
needs and what state-of-the-art tools supporting the FMEA and FTA methods can actually
deliver. This leads to a list of requirements that would ease the application of ISO26262 by
strongly improving tool usability in daily use.
Finally, a first attempt to define the ontology of malfunctions at different abstraction levels
in the SAFE meta-model is proposed for harmonization. By standardizing the description of
malfunctions, it would facilitate the use and sharing of the error model defined in
deliverable D331a.
5.1 Introduction
As already explained in deliverable D331a, ISO26262 recommends or requires safety analyses
throughout the concept and development phases of the safety lifecycle, depending on the
criticality of the items or elements to be developed, as shown below:
Table 1: Type of analysis methods required or recommended by ISO26262 [1]

                    ASIL A                            ASIL B        ASIL C     ASIL D
Inductive methods   Required                          Required      Required   Required
Deductive methods   Nothing required or recommended   Recommended   Required   Required
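The table above can also be read as a small lookup. The following Python sketch is only an illustration of that reading: the dictionary layout, the level names and the helper `methods_to_perform` are assumptions made for this example and are not defined by ISO26262 or by this deliverable.

```python
# Requirement level of each analysis type per ASIL, transcribed from the table above.
# The string labels ("required", "recommended", "none") are illustrative names.
REQUIREMENT = {
    "inductive": {"A": "required", "B": "required", "C": "required", "D": "required"},
    "deductive": {"A": "none", "B": "recommended", "C": "required", "D": "required"},
}


def methods_to_perform(asil, include_recommended=True):
    """Return the analysis types to perform for a given ASIL (hypothetical helper)."""
    asil = asil.upper()
    if asil not in ("A", "B", "C", "D"):
        raise ValueError(f"unknown ASIL: {asil!r}")
    wanted = {"required", "recommended"} if include_recommended else {"required"}
    return [kind for kind, levels in REQUIREMENT.items() if levels[asil] in wanted]


if __name__ == "__main__":
    for level in "ABCD":
        print(level, methods_to_perform(level))
```

For ASIL B, for instance, the helper returns both analysis types when recommendations are followed, and only the inductive type when restricted to strict requirements.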
The main objective of safety analyses is to support the derivation of safety requirements
from the safety goals, and to validate and verify their effectiveness and completeness.
Safety analyses are either inductive (starting from known causes and forecasting possible
effects) or deductive (starting from a known effect and identifying possible causes).
In a first step, qualitative analyses are appropriate and, in most cases, sufficient to
identify malfunctions.
In a second step, quantitative analyses extend the qualitative safety analyses, mostly to
assess the effect of random hardware failures. They are used to calculate the hardware
architectural metrics and the residual risk of violating the safety goals. Software failures,
being systematic failures, do not require quantitative analyses but only qualitative ones.
ISO26262 does not impose a specific analysis method but lists all recognized methods as
follows:
Some of these safety analysis methods are well known and well defined in standards (e.g.
FTA, Markov, FMEA, etc.). Others, such as FMEA, can be used in very different ways and
were already practiced outside the safety analysis context before the publication of ISO26262.
ISO26262 provides very few examples of how to perform safety analyses. In addition,
Part 10 [2], which is supposed to be a guideline covering such methodology gaps, is also
imprecise.
Moreover, the tools used to perform safety analyses are not dedicated to ISO26262 and
require extensions to be used in practice. This deliverable aims to address this gap.
The main scope of deliverable D331b is, in a first step, to define a methodology for
performing safety analyses when developing a safety-related product. In a second step, it
provides the related list of new requirements for the safety analysis tools available on the
market.
First, we present the result of the knowledge sharing on safety analysis methodologies
between partners, and then propose a possible global safety analysis process from system
level down to detailed design.
Second, we benchmark state-of-the-art safety tools against the proposed safety analyses.
Third, we propose a list of requirements to close the gap identified between our needs and
the features available in tools performing FMEA and FTA safety analyses.
In the context of deliverable D331b, the main task of WT331, the partners exchanged on
how they perform safety analyses when developing a new safety-related product.
From this knowledge sharing we identified some gaps in safety analysis methodology, as
several points are not necessarily well explained in ISO26262 [1], nor even in ISO26262
Part 10 [2], which is supposed to be a guideline. The gaps are:
- Performing the PMHF (Probabilistic Metric for random Hardware Failures) calculation at
  HW Part level from a given FTA result is unrealistic, because even a slight update of the
  HW schematics requires an update of the FTA. So at which architecture level can we
  build the FTA to calculate the PMHF?
- How to allocate metrics from the system to its different components in the case of
  distributed development?
- How to rebuild residual risk metrics at system level when the different component
  suppliers have provided residual risk results using different methods (PMHF or
  Failure Rate Class, as proposed in ISO26262)?
- People from different companies do not necessarily use the same vocabulary
  (system, component, element, sub-system, part, etc.), which can lead to
  misunderstanding. Clarification is therefore needed.
In the next chapters we aim to answer these questions by proposing a global safety analysis
process and by explaining selected technical points that are not clear enough in ISO26262.
A system (most of the time confused with an item) is made of several components and
delivers one or several functionalities at vehicle level. A component can be, for example, an
ECU, a smart sensor or an actuator. A system can interact with other systems.
This architecture description is mapped to the EAST-ADL Analysis Level.
[Figure: example of component architecture at LEVEL (N-1), showing HW blocks such as the 12V
battery input, the sensor analog input interface, CAN, and the watchdog module with its reset
output]
A component (e.g. ECU01) can be decomposed into different HW architectural elements, or
HW blocks, each fulfilling a particular functionality. This architecture description is mapped
to the EAST-ADL Design Level, on the Hardware Design Architecture. Note that, depending on
the accuracy required of the safety analysis, important HW block features can also be
decomposed at this level.
[Figure: example of HW block description at LEVEL (N-2), showing a regulator with its battery
pin (battery input, +12V), reset and ground pin]
This global safety analysis process is the core of the methodology part of deliverable D331b.
Hazard and Risk Analysis is not explained here because it is already covered by deliverable
D311b [3]. We therefore take the safety goals as the first inputs from which safety
requirements are derived and then verified with the help of safety analyses.
As the global safety analysis process of Figure 2 is not easy to understand, the following
flowchart represents the corresponding chapter structure of deliverable D331b. Direct links
to the relevant chapters are also provided for the reader:
[Figure 3: General flowchart of the Global Safety Analysis Process, with direct links 1A to 6B.
Content of the flowchart:
Start: Item definition + Hazard & Risk Analysis -> Safety Goal + Initiation of Safety Lifecycle
[System Design]
  STEP 1A: Perform Qualitative System FMEA
  STEP 1B: Perform Qualitative System FTA
  STEP 1C: Perform Quantitative System FTA for residual risk allocation to components
  STEP 1D: Allocate HW Architectural Metrics to components
[Component Design] (N components)
  STEP 2A: Perform Qualitative Component FMEDA
  STEP 2B: Perform Qualitative Component FTA
  STEP 2C: Perform Quantitative Component FTA for residual risk target allocation (very rare)
  STEP 2D: Allocate HW Architectural Metrics (very rare)
[Component Verification] (N components; each component has 2 possible alternatives for
verification, [Alternative 1] and [Alternative 2])
  STEP 3A: Perform eFMEA at HW Part level
  STEP 4A: Perform Quantitative Component FMEDA at HW Part level
  STEP 4B: Calculate Component Residual Risk at HW Part level
  STEP 5A: Perform Quantitative Component FMEDA at HW Block level
  STEP 5B: Calculate Component Residual Risk at HW Block level using PMHF
[System level]
  STEP 6A: Verifying Architectural Metrics at System level
  STEP 6B: Verifying Residual Risk at System level using PMHF]
The Hazard and Risk Analysis (HA&RA) of the top level system malfunction MF01 leads to
the definition of one safety goal:
SG01: The system shall not spuriously cut off the low beams [ASIL B]
Safe State: Low beam always ON
Fault Tolerant Time Interval (FTTI): 400 ms
The other top level malfunctions (MF02 & MF03) do not lead to the definition of a safety
goal and are rated for severity according to the FMEA scale (1 to 8).
In the system architecture example shown below, a Top Column Module (TCM) ECU
acquires the driver's lighting request from the position of a mechanical switch, then sends
the lighting command (LOW BEAM ON, PARKING LIGHT ON or OFF) on a CAN
communication bus, and a Body Controller Management (BCM) ECU executes it.
[Figure: Lighting System made of a Switch, the Top Column Module ECU (TCM_ECU), the CAN bus,
the Body Control Management ECU (BCM_ECU), Light Module 01 and Light Module 02]
Figure 4: Physical architecture of the lighting control system
In the VDA standard [4], the SFMEA is equivalent to a Product FMEA at system level. In
addition, a new Mechatronic FMEA has been introduced in order to cover fault tolerance and
to give a clear visualization of the mitigation effect.
As the VDA approach is covered by only some of the tools on the market, this chapter
proposes a possible alternative that is used by Valeo.
Later in the document, requirements are addressed to the tools on the market so that they
can support this new method (see chapter 8.3).
The qualitative SFMEA is also a design support tool that helps to define safety mechanisms
at the right place in the system. Moreover, it helps to evaluate and consolidate the
Functional Safety Concept.
- A model describing the functional (also called logical) architecture of the system and its
  physical implementation. The description of the functional and physical architecture
  may be a block diagram showing the atomic components of the system, their inputs
  and outputs, and the interconnections between the components. The description of
  the functional architecture must describe the behavior of the system functions and
  their split-up into sub-functions. An allocation of the sub-functions onto the physical
  components of the architecture is necessary as well.
- The list of top level system malfunctions with their maximum associated criticality.
Note: Sometimes safety and unavailability studies are mixed together. Therefore, for top
level system malfunctions that were rated S0 during hazard and risk analysis, meaning that
the consequences of the malfunctioning behavior are clearly limited to material damage and
do not involve harm to persons, it can be relevant to refine this S0 with the 1 to 8 severity
scale used in classical FMEA methods.
As already mentioned in the Note of chapter 6.5.1.5, for system effects that are not
safety-related (severity = S0) but are very annoying for the driver because of an availability
problem, it can be relevant to assess the severity with the classical FMEA scale (1 to 8).
The analysis has to be done in all relevant life phases. Car assembly, long-term parking and
decommissioning may be relevant life phases; the “use” phase is always relevant. In the
“use” life phase, the different operation modes, such as parking, ignition on, engine running
and vehicle running, have to be considered. The relevant life phases and vehicle situations
are at least those identified in the hazard and risk analysis.
Some life phases and operation modes may be grouped in a single analysis. The criterion
for grouping is that the functions, and therefore the functional architecture, are the same in
the different life phases and operation modes.
During the SFMEA analysis, it is also possible to identify new top level system
malfunctions that were not considered in the hazard and risk analysis. In this case they must
be provided to the people in charge of the hazard and risk analysis for impact analysis.
The results of the qualitative SFMEA are shown in the table below:
Table 3: Example of partial SFMEA performed on our use case

Column groups: STEP 1 = malfunction and system effect without safety mechanism; STEP 2 = safety mechanism; STEP 3 = system effect with safety mechanism.

| Component | Function | Potential malfunction | System effect without safety mechanism | Severity or criticality without safety mechanism | Safety mechanism | System effect with safety mechanism | Severity or criticality with safety mechanism |
|---|---|---|---|---|---|---|---|
| TCM | TCM_F1 | MF1001: No lighting command sent on the CAN bus by the TCM_ECU | MF02: No low beam when required | Severity = 8 | SM01: If BCM receives no lighting command on the CAN bus from TCM, BCM switches LOW BEAM ON when ignition switch is ON | MF03: Low beams always ON | Severity = 4 |
| TCM | TCM_F1 | MF1003: Lighting command on the CAN bus erroneously switches from LOWBEAM ON to another valid position (OFF or PARKING LIGHT ON) by the TCM_ECU | MF01: Loss of the low beams | ASIL B | SM03: If internal failure detected by TCM, TCM put lighting command at INVALID on the CAN bus & SM02: If BCM receives an INVALID command on the CAN bus from TCM, BCM switches LOW BEAM ON when ignition switch is ON | MF03: Low beams always ON | Severity = 4 |
| TCM | TCM_F1 | MF1004: Lighting command on the CAN bus always put at OFF by the TCM_ECU | MF02: No low beam when required | Severity = 8 | SM03: If internal failure detected by TCM, TCM put lighting command at INVALID on the CAN bus & SM02: If BCM receives an INVALID command on the CAN bus from TCM, BCM switches LOW BEAM ON when ignition switch is ON | MF03: Low beams always ON | Severity = 4 |
| TCM | TCM_F1 | MF1005: Lighting command on the CAN bus always put at PARKING LIGHT ON by the TCM_ECU | MF02: No low beam when required | Severity = 8 | SM03: If internal failure detected by TCM, TCM put lighting command at INVALID on the CAN bus & SM02: If BCM receives an INVALID command on the CAN bus from TCM, BCM switches LOW BEAM ON when ignition switch is ON | MF03: Low beams always ON | Severity = 4 |
| TCM | TCM_F1 | MF1006: Low beam CAN parameter put always at ON value by the TCM_ECU | MF03: Low beams always ON | Severity = 4 | | | |
This is a typical example in which malfunctions that are not safety relevant were also
considered. The SFMEA table clearly shows that if the TCM_ECU erroneously switches the
lighting command from LOWBEAM ON to another valid position [MF1003], then, without a
safety mechanism, this leads directly to the violation of safety goal SG01, which is ASIL B.
Note: As components are considered here as black boxes, we do not investigate the
potential failure causes as is done in a classical FMEA [5].
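To make the reading of the SFMEA table concrete, the rows can be held in a small data structure and screened automatically. The sketch below is only an illustration: the `SfmeaRow` class, its field names and the two helper functions are hypothetical, and the placement of MF1006 in the "without safety mechanism" columns is an assumption about the flattened table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SfmeaRow:
    """One SFMEA line; field names are illustrative, not taken from a tool."""
    malfunction: str
    effect_without_sm: str
    rating_without_sm: str              # e.g. "ASIL B" or "Severity = 8"
    safety_mechanism: Optional[str]     # None when no mechanism is defined
    effect_with_sm: Optional[str]
    rating_with_sm: Optional[str]


# Rows transcribed from the SFMEA table of the lighting use case.
ROWS = [
    SfmeaRow("MF1001", "MF02", "Severity = 8", "SM01", "MF03", "Severity = 4"),
    SfmeaRow("MF1003", "MF01", "ASIL B", "SM03 & SM02", "MF03", "Severity = 4"),
    SfmeaRow("MF1004", "MF02", "Severity = 8", "SM03 & SM02", "MF03", "Severity = 4"),
    SfmeaRow("MF1005", "MF02", "Severity = 8", "SM03 & SM02", "MF03", "Severity = 4"),
    SfmeaRow("MF1006", "MF03", "Severity = 4", None, None, None),  # assumed placement
]


def is_safety_related(rating):
    """A rating expressed as an ASIL marks a safety-goal violation."""
    return rating is not None and rating.startswith("ASIL")


def safety_goal_violations(rows):
    """Malfunctions whose effect without safety mechanism carries an ASIL."""
    return [r.malfunction for r in rows if is_safety_related(r.rating_without_sm)]


def unmitigated(rows):
    """Safety-related malfunctions with no mechanism, or still ASIL-rated after it."""
    return [
        r.malfunction
        for r in rows
        if is_safety_related(r.rating_without_sm)
        and (r.safety_mechanism is None or is_safety_related(r.rating_with_sm))
    ]


if __name__ == "__main__":
    print("violating SG01 without mechanism:", safety_goal_violations(ROWS))
    print("still unmitigated:", unmitigated(ROWS))
```

With the rows of the example, only MF1003 is flagged as violating the safety goal without a safety mechanism, and no safety-related malfunction remains unmitigated once SM02 and SM03 are in place.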
Its main purpose is to start from the top level system malfunctions that can violate a safety
goal (the top event) and to analyze all possible component malfunctions, or combinations of
component malfunctions (the causes), that can lead to the top event.
System FTA helps to define safety mechanisms at the right place in the system, in order to
detect and mitigate a component malfunction. Therefore, in addition to system FMEA,
system FTA helps to evaluate and consolidate the Functional Safety Concept (FSC) using a
second analysis method.
- A model describing the functional (also called logical) architecture of the system and its
  physical implementation. The description of the functional and physical architecture
  may be a block diagram showing the atomic components of the system, their inputs
  and outputs, and the interconnections between the components. The description of
  the functional architecture must describe the behavior of the system functions and
  their split-up into sub-functions. An allocation of the sub-functions onto the physical
  components of the architecture is necessary as well.
A top event is generally a top level system malfunction leading to the violation of a given
safety goal. Therefore there is one qualitative system FTA per considered safety goal.
Qualitative system FTA starts from the top event and analyzes all necessary preconditions
that could cause the top event to occur. These conditions can be combined in any number of
ways using logical gates (OR, AND, etc.). Events in a qualitative system FTA are expanded
until component malfunctions appear.
Qualitative system FTA can be used to determine whether a top level system malfunction
would occur, but also to prevent its occurrence by inserting a safety mechanism that
mitigates the local component malfunction.
Boolean logic is then used to reduce the system FTA structure into the combinations of
events leading to the top event, generally referred to as Minimal Cut Sets (MCS).
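The Boolean reduction into minimal cut sets can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the algorithm of any particular FTA tool: the tree encoding (a basic event is a string, a gate is a `(gate, children)` tuple), the function names and the restriction to OR and AND gates are all choices made for this example. The event names TCM_E001 and BCM_E001 are borrowed from the lighting use case.

```python
from itertools import product


def cut_sets(node):
    """Return all cut sets of a fault-tree node as a set of frozensets of basic events.

    A node is either a basic event (str) or a tuple (gate, children) with gate in
    {"OR", "AND"}; this encoding is an assumption of the sketch.
    """
    if isinstance(node, str):
        return {frozenset([node])}
    gate, children = node
    child_sets = [cut_sets(child) for child in children]
    if gate == "OR":
        # Any cut set of any input triggers the gate.
        return set().union(*child_sets)
    if gate == "AND":
        # One cut set per input is needed; merge every combination.
        return {frozenset().union(*combo) for combo in product(*child_sets)}
    raise ValueError(f"unsupported gate: {gate!r}")


def minimal_cut_sets(node):
    """Drop every cut set that strictly contains another one (absorption law)."""
    all_sets = cut_sets(node)
    minimal = [s for s in all_sets if not any(other < s for other in all_sets)]
    return sorted(minimal, key=lambda s: (len(s), sorted(s)))


if __name__ == "__main__":
    # Serial architecture: either undeveloped event alone causes the top event.
    lighting = ("OR", ["TCM_E001", "BCM_E001"])
    for mcs in minimal_cut_sets(lighting):
        print(len(mcs), sorted(mcs))
```

For a purely serial architecture such as the lighting example, every minimal cut set returned has order 1. The enumeration grows quickly with AND gates, so this direct expansion is only suitable for small trees.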
- The possible common cause failures that would then feed the complete list.
- The description and position of safety mechanisms with regard to each related
  component malfunction.
In our lighting system, all components are in a serial configuration. Therefore the system
FTA has only minimal cut sets of order 1, given that non Electrical and/or Electronic (E/E)
components such as the light modules or the switch are not considered.
For our example the qualitative system FTA is very simple as shown hereafter:
[Figure: an OR gate with two undeveloped input events, IE TCM_E001 and IE BCM_E001]
Figure 5: Example of qualitative system FTA for the lighting control system
The events below the OR gate have a diamond shape meaning in the FTA graphical
representation that they are undeveloped events.
At these steps of the design phase, as components are considered as black box, there is no
need to go further in details.
The above undeveloped events are then extended by the different component developers
during component safety analysis.
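The Boolean reduction of a fault tree into minimal cut sets used in this step can be sketched as follows. This is only a minimal Python illustration under stated assumptions, not a tool specification: the nested-tuple encoding of the tree and the function names `cut_sets` and `minimal_cut_sets` are hypothetical and chosen for this example. Applied to the lighting system FTA, it yields the two minimal cut sets of order 1.

```python
from itertools import product

def cut_sets(node):
    """Return the cut sets of a fault tree node as a set of frozensets.

    A node is either an event name (str) or a tuple (gate, children)
    with gate equal to "OR" or "AND" (hypothetical encoding).
    """
    if isinstance(node, str):  # basic or undeveloped event
        return {frozenset([node])}
    gate, children = node
    child_sets = [cut_sets(child) for child in children]
    if gate == "OR":   # any cut set of any child causes the gate output
        return set().union(*child_sets)
    if gate == "AND":  # one cut set of every child must occur together
        return {frozenset().union(*combo) for combo in product(*child_sets)}
    raise ValueError(f"unknown gate: {gate}")

def minimal_cut_sets(node):
    """Drop every cut set that strictly contains another one (absorption)."""
    sets = cut_sets(node)
    return {s for s in sets if not any(other < s for other in sets)}

# Qualitative system FTA of the lighting example: OR of two undeveloped events
lighting_fta = ("OR", ["TCM_E001", "BCM_E001"])
lighting_mcs = minimal_cut_sets(lighting_fta)

# Hypothetical tree showing absorption: A OR (A AND B) OR (B AND C)
demo_mcs = minimal_cut_sets(("OR", ["A", ("AND", ["A", "B"]), ("AND", ["B", "C"])]))

for mcs in sorted(lighting_mcs, key=sorted):
    print(f"order {len(mcs)}: {sorted(mcs)}")
```

The order of a minimal cut set is simply its number of events, which is the information needed for the target allocation in the next step.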
6.5.3 STEP 1C: Perform Quantitative System FTA for residual risk allocation [ASIL
dependent] [System Safety Analysis] [Design Phase]
At this step of the development phase, the main purpose of quantitative system FTA is to
derive the residual risk target defined for each considered safety goal for each relevant
electronics component. It is particularly useful for distributed developments.
In addition, the SAE ARP4761 [9] standard from the aeronautic field describes best practices
for residual risk target allocation.
1. The first step is to define the residual risk target for each considered safety goal. This
value will be also the target to be reached for the top level event of the FTA.
To determine this target at top level, people can either use the standard targets given
in ISO26262 Part 5 Clause 9.4.2.1 and represented in the following table (most
commonly used):
An alternative is to derive the targets from the calculated values of the residual risk
on a similar well trusted design. (Two similar designs have similar functionalities and
similar safety goals with the same ASIL. A well trusted design has a sufficient service
history with no safety issues).
2. The second step is then to start from the qualitative system FTA and to allocate
residual risk targets to each event that can cause the top event malfunction to occur.
It is strongly recommended to start with events that are minimal cut sets of order 1,
meaning that they can directly cause the top event to occur.
3. A simple rule to allocate residual risk targets to events that are minimal cut sets of order 1
is to divide the value defined for the top event by the total number of minimal cut
sets of order 1. The same residual risk target is then distributed uniformly to
each event that is a minimal cut set of order 1. This allocation is not mandatory, as
other distributions can be imagined for components reused from a well-known
physical architecture. The decision shall be taken case by case; no standard rule is
provided here because this subject is context dependent and not simple.
Note: For events that are minimal cut sets of order 2, meaning that 2 independent
events must be combined for the top event to occur, the allocation of residual
risk targets is less straightforward.
On one hand, if we focus only on safety issues, the residual risk target of each event can
be much less stringent than the target recommended by ISO26262 (see Table 4), as the
probabilities of failure of both events are combined. Note that the
independence of the two events shall be ensured later during the component design.
On the other hand, if too high a residual risk target is allowed for each event, it
might lead to a high probability of unavailability. This means that during the vehicle life
(often 15 years) a function of the system has a high probability of not being available
to the driver, or that the system has a high probability of being switched to a degraded
state. These 2 situations are not safety related but very annoying for the driver.
Therefore in this particular case, it is highly recommended to do the allocation in
close collaboration with people from Quality Management.
Our low beam system is ASIL B. According to Table 4, allocation of a residual risk target is
only recommended for ASIL B, but we still allocate a residual risk target of 10^-7 /h to the
system top event “the loss of the low beams”.
The qualitative system FTA is simple as we have only 2 minimal cut sets of order 1.
Therefore each event receives a residual risk target of 5.0 x 10^-8 /h (failure rate) as shown
hereafter:
[Fault tree: top event MF01 "Loss of the low beams", leading to violation of Safety Goal 01 "The system shall not spuriously cut off the low beams" [ASIL B], with Q=0.0015 and w=1e-7; an OR gate combines the two events (IE) TCM_E001 and BCM_E001]
Figure 6: Example of quantitative System FTA with possible target allocation for our example
These residual risk targets can then be transformed into non functional safety requirements.
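The uniform allocation rule for minimal cut sets of order 1 can be illustrated with a short sketch. The function names `allocate_uniform` and `to_fit` are hypothetical and introduced only for this example; the rule itself is the simple division described in this step, and other distributions remain possible.

```python
def allocate_uniform(top_target_per_hour, order1_events):
    """Split the top event target equally between the order 1 minimal cut sets."""
    if not order1_events:
        raise ValueError("at least one minimal cut set of order 1 is required")
    share = top_target_per_hour / len(order1_events)
    return {event: share for event in order1_events}

def to_fit(rate_per_hour):
    """Convert a failure rate in 1/h into FIT (1 FIT = 1e-9 /h)."""
    return rate_per_hour * 1e9

# Lighting example: top event target of 1e-7 /h, two cut sets of order 1
targets = allocate_uniform(1e-7, ["TCM_E001", "BCM_E001"])
for event, target in targets.items():
    print(f"{event}: {target:.1e} /h ({to_fit(target):.0f} FIT)")
```

For the lighting example this gives the 5.0 x 10^-8 /h per event stated above, which corresponds to the 50 FIT allocated to the Top Column Module ECU later in the document.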
The HW architectural metrics are the single-point fault metric (SPFM) and the latent fault
metric (LFM). They only address random HW failures, not systematic failures.
An alternative is to derive the targets from the calculated values of the residual risk
on a similar well trusted design. (Two similar designs have similar functionalities and
similar safety goals with the same ASIL. A well trusted design has a sufficient service
history with no safety issues).
2. The second step is strongly linked to the physical architecture and where safety
mechanisms are implemented (internal or external).
Therefore in this chapter we only provide examples, not standard rules to
be applied:
[Block diagram of the lighting system: Switch, Top Column Module_ECU (TCM_ECU), CAN Bus, Body Control Management_ECU (BCM_ECU), Light Module 01, Light Module 02]
The ECUs are in a serial configuration. As seen during SFMEA and qualitative system FTA,
in such a configuration, a malfunction of each component (TCM_ECU or BCM_ECU) may
directly violate the system safety goal.
The TCM_ECU may send a wrong order to cut off the low beams, and the BCM_ECU may
wrongly cut off the low beams on its own. Moreover, a coherent wrong order coming from
the TCM_ECU cannot be covered by a safety mechanism in the BCM_ECU. For these
malfunctions, both ECUs can integrate safety mechanisms to mitigate their effects. In such
a case, it makes sense to allocate a local metric target to both ECUs for these malfunctions.
As the safety goal is ASIL B, it is recommended that the single-point fault metric be better
than 90%. A first allocation can be 90% for both ECUs for these particular malfunctions. This
means that we require the TCM_ECU to control, using safety mechanisms, at least 90%
of the faults that would otherwise have resulted in a coherent wrong order to cut off the low
beams.
For an ASIL B safety goal, it is also recommended that the latent fault metric be
better than 60%, and therefore a first allocation can be 60% for both ECUs.
[Block diagram of the ESCL example: actuator power, command request, ESCL actuator]
The ECUs are in a true parallel configuration. In this configuration, a single malfunction of
any component (ESCL_ECU or BCM_ECU) cannot directly violate the system safety goal.
The BCM_ECU may power the ESCL_ECU when the vehicle speed is above 6 km/h, or the
ESCL_ECU may spuriously lock the steering column actuator, but neither of these
malfunctions can violate the system safety goal directly.
Therefore, there is neither a single-point fault nor a residual fault in the system. The single-point
fault metric (SPFM) is implicitly 100% because of the system architecture; for that reason
no SPFM requirement is allocated to the ESCL_ECU.
Note: The actuator here is an electrical device without electronics. It cannot violate the
safety goal alone because it is powered by the ESCL_ECU.
In the VDA standard [4], qualitative FMEDA is equivalent to a Product FMEA at sub-system
level. A new Mechatronic FMEA has also been introduced in order to cover fault
tolerance and to give a clear visualization of the mitigation effect.
As the VDA approach is supported by only some of the tools on the market, this chapter
proposes a possible alternative that is used by Valeo. Later in the document, some
requirements are addressed to the tools on the market so that they can
support this new method (see chapter 8.3).
Its principle is to identify the critical malfunctions of the HW functional blocks of the
component and the way they propagate to cause the component malfunctions identified in
the SFMEA. Therefore it allows defining adequate safety mechanisms at component level.
FMEA has been common practice for many years in many industry domains. Nevertheless, even though
many standards are available, such as [5][6], all of them address fault avoidance and not fault
tolerance. For fault tolerance, the only standard to which we can refer is the VDA [4], and more
particularly its chapter 2.1 on Mechatronic FMEA.
The model describing the functional and physical architectures of the considered
component. The description of physical architecture as a block diagram showing the
atomic functional blocks (HW blocks) of the component, their inputs and outputs and
the interconnections between the functional blocks. The description of the functional
architecture must describe the behavior of the component functions and also their
split-up into sub-functions. An allocation of the sub-functions to the HW blocks is
necessary as well. At this stage functional blocks are supported by HW but we do not
know yet how they are realized (could be a mix of HW/SW).
List of component malfunctions with their associated criticality or severity (if not
safety-related) from SFMEA results.
Note: For component effects that are not safety related but very annoying for the user, it can
be relevant to assess the severity with the classical FMEA scale (1 to 8).
The analysis has to be done in all relevant life phases. Car assembly, long-term parking as
well as decommissioning may be relevant life phases. The life phase that is always relevant is of
course the “use” phase. In the “use” life phase, the different operation modes such as
parking, ignition on, engine running and vehicle running have to be considered. Relevant
life phases and vehicle situations are at least those identified in SFMEA. Some life phases
and operation modes may be grouped in a single analysis. The criterion for grouping
different life phases and vehicle situations in a single analysis is that the functions, and
therefore the functional architecture, are the same in these life phases and operation
modes.
The main outputs are the list of HW block critical malfunctions, with their corresponding
component effect (component malfunction) and potentially with the safety mechanism
(internal or external to the components) to be implemented.
[Block diagram: the lighting system (Switch, Top Column Module_ECU (TCM_ECU), CAN Bus, Body Control Management_ECU (BCM_ECU), Light Module 01, Light Module 02) with the HW blocks of the Top Column Module labelled Digital Output, Watchdog Reset, Battery, Power Supply 5V and µC]
Figure 8: Example of physical architecture description for the Top Column Module
For the illustration of the qualitative component FMEDA, only the analog interface HW
block is considered.
The proposed qualitative FMEDA considering only the analog interface is the following:
Analog Interface Malfunction | Component effect [TCM_ECU] | Severity
MF504: Wrong output value provided: Low Beam ON instead of OFF | MF1006: Low beam CAN parameter put always at ON value by the TCM_ECU | Severity = 4
MF504: Wrong output value provided: Low Beam ON instead of Parking light ON | MF1006: Low beam CAN parameter put always at ON value by the TCM_ECU | Severity = 4
Note: In Table 6, the safety mechanism SM05 would then be refined into HWSR
(Hardware Safety Requirement) and SWSR (Software Safety Requirement) in the Technical
Safety Concept.
Its main purpose is to start from the critical component malfunctions that were identified
during system design (top event) and analyze all possible HW block malfunctions or
combination of HW block malfunctions (the causes) that can lead to the top event.
Qualitative component FTA allows defining safety mechanisms, to detect and mitigate a HW
block malfunction in the considered component. Therefore additionally to qualitative FMEDA,
qualitative component FTA helps to evaluate and consolidate the Technical Safety Concept.
The model describing the functional and physical architectures of the considered
component. The description of physical architecture as a block diagram showing the
atomic functional blocks of the component (HW blocks), their inputs and outputs and
the interconnections between the functional blocks. The description of the functional
architecture must describe the behavior of the component functions and potentially
also their split-up into sub-functions. An allocation of the sub-functions to the HW
blocks is necessary as well. At this stage functional blocks are supported by HW but
we do not know yet how they are realized (could be a mix of HW/SW).
A top event in qualitative component FTA is a component malfunction that has the potential
to violate a given safety goal and that was previously identified as an undeveloped event in
qualitative system FTA. Therefore there is one new qualitative component FTA per
considered component malfunction.
Qualitative component FTA starts from the top event and analyses all necessary pre-
conditions that could cause the top event to occur. These conditions can be combined in any
number of ways using logical gates (OR, AND, etc...). Events in a qualitative component
FTA are expanded until HW blocks malfunctions appear.
Qualitative component FTA can be used to determine whether a top level component malfunction
will occur, but also to prevent the top level component malfunction from occurring by inserting
a safety mechanism that mitigates the local HW block malfunction.
Boolean logic is then used to reduce the qualitative component FTA structure into
combinations of events leading to the top event, generally referred to as Minimal Cut Sets
(MCS).
The list of the causes (HW block malfunctions) or combinations of causes (HW
block malfunctions) that can lead to the considered component malfunction.
The possible common cause failures that would then feed the complete list.
The description and position of safety mechanisms with regard to each related HW
blocks malfunction.
The critical component malfunction that was identified for the Top Column Module ECU is:
MF1003: Lighting command on the CAN bus erroneously switches from LOWBEAM ON to
another valid position (OFF or PARKING LIGHT ON) by the TCM_ECU [ASIL B].
Qualitative component FMEDA has shown that the following HW block malfunctions can
lead to MF1003:
A low voltage provided by the power supply
Analog interface can provide an erroneous value instead of Low Beam ON
Microcontroller can wrongly elaborate the light command from analog interface inputs
or send a wrong light command on the CAN bus.
[Fault tree: an OR gate combines the following six events (IE):
[Power Supply Unit] MF401: Low voltage (< 5V) provided
[Analog Interface] MF501: Wrong output value provided: out of range
[Analog Interface] MF502: Wrong output value provided: OFF instead of Low Beam ON
[Analog Interface] MF503: Wrong output value provided: Parking light ON instead of Low Beam ON
[Microcontroller] MF901: Wrong elaboration of the CAN Light command (another valid position instead of LOW BEAM ON)
[Microcontroller] MF902: Wrong Light command (another valid position instead of LOW BEAM ON) sent on the CAN bus]
Figure 9: Example of qualitative Component FTA for the Top Column Module ECU
No new cause or combination of causes was highlighted by the qualitative FTA.
Allocation of residual risk targets to elements of a component is not recommended but could
occur depending on the business scenario. For example, in the case of an ECU integrating complex
elements under the responsibility of different suppliers, an allocation to the
integrated complex element would be relevant.
To facilitate the allocation, the same principles as those described in chapter 6.5.3 can be used,
and therefore this chapter is not developed further.
Similar to STEP 2C, the allocation principles are not developed further; please refer to
chapter 6.5.4.
At this step of the design phase, as explained in the introduction of chapter 6.3, two alternatives
are proposed, depending mainly on the architecture level (HW part or HW block) at which people
want to perform quantitative FMEDA.
6.7.1 STEP 3A: Perform eFMEA at HW Part level (Optional) [HW Safety Analysis]
[Design Phase] [Alternative 1]
In an eFMEA, the effects of the failure modes of all HW parts (resistors, ICs …) are systematically
analyzed considering the effects at the outputs of the HW block (HW block malfunction from
qualitative FMEDA). Each failure mode of a HW part is given a failure rate derived from a
reliability database and a failure mode distribution database.
Then, a synthesis of the total failure rate per HW block malfunction is done and is used
as input for quantitative FMEDA.
The analog interface of our Top Column Module ECU is shown hereafter:
[Figure 10: schematic of the analog interface: +5V supply, resistors R1 to R6, capacitor C1, lighting switch positions Low Beam, Parking light and OFF, analog output to the µC]
For the analog interface from Figure 10, the corresponding eFMEA is the following:
HW block name: Analog Interface
Function description: Provide 3 analog values (corresponding to OFF or Parking light ON or Low Beam ON lighting switch position) to the microcontroller
Part id. | Part failure rate (FIT) | Part failure mode | Part failure mode distribution (%) | Part failure mode failure rate (FIT) | Worst case effect at block output (Block Malfunction)
R1 | 0.5 | Parameter change + | 30 % | 0.15 | MF502: Wrong output value provided: OFF instead of Low Beam ON
R1 | 0.5 | Parameter change - | 30 % | 0.15 | MF501: Wrong output value provided: out of range
R1 | 0.5 | Open | 40 % | 0.2 | MF501: Wrong output value provided: out of range
R2 | 0.5 | Parameter change + | 30 % | 0.15 | MF502: Wrong output value provided: OFF instead of Low Beam ON
R2 | 0.5 | Parameter change - | 30 % | 0.15 | MF501: Wrong output value provided: out of range
R2 | 0.5 | Open | 40 % | 0.2 | MF501: Wrong output value provided: out of range
R3 | 0.5 | Parameter change + | 30 % | 0.15 | MF502: Wrong output value provided: OFF instead of Low Beam ON
R3 | 0.5 | Parameter change - | 30 % | 0.15 | MF501: Wrong output value provided: out of range
R3 | 0.5 | Open | 40 % | 0.2 | MF501: Wrong output value provided: out of range
R4 | 0.5 | Parameter change + | 30 % | 0.15 | No Effect
R4 | 0.5 | Parameter change - | 30 % | 0.15 | No Effect
R4 | 0.5 | Open | 40 % | 0.2 | No Effect
R5 | 0.5 | Parameter change + | 30 % | 0.15 | No Effect
R5 | 0.5 | Parameter change - | 30 % | 0.15 | No Effect
R5 | 0.5 | Open | 40 % | 0.2 | No Effect
R6 | 0.5 | Parameter change + | 30 % | 0.15 | MF502: Wrong output value provided: OFF instead of Low Beam ON
R6 | 0.5 | Parameter change - | 30 % | 0.15 | MF501: Wrong output value provided: out of range
R6 | 0.5 | Open | 40 % | 0.2 | MF501: Wrong output value provided: out of range
C1 | 2 | Short | 100 % | 2 | MF501: Wrong output value provided: out of range
NOTE: In this example the worst case [OFF instead of Low beam ON] was considered and not [Parking light ON instead of Low
beam ON] because the effect at component level is the same.
Table 7 : Example of partial eFMEA for analog interface performed on our use case
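The synthesis of the total failure rate per HW block malfunction, which feeds the quantitative FMEDA, can be sketched as follows. The data are the values of Table 7; the data layout and the function name `failure_rate_per_malfunction` are assumptions made for this illustration only.

```python
from collections import defaultdict

MF501 = "MF501: Wrong output value provided: out of range"
MF502 = "MF502: Wrong output value provided: OFF instead of Low Beam ON"
NO_EFFECT = "No Effect"

# Failure modes as (name, distribution, worst case effect at block output)
RESISTOR_CRITICAL = [
    ("Parameter change +", 0.30, MF502),
    ("Parameter change -", 0.30, MF501),
    ("Open", 0.40, MF501),
]
RESISTOR_NO_EFFECT = [(mode, share, NO_EFFECT) for mode, share, _ in RESISTOR_CRITICAL]

# Part id -> (part failure rate in FIT, failure modes), values from Table 7
PARTS = {
    "R1": (0.5, RESISTOR_CRITICAL),
    "R2": (0.5, RESISTOR_CRITICAL),
    "R3": (0.5, RESISTOR_CRITICAL),
    "R4": (0.5, RESISTOR_NO_EFFECT),
    "R5": (0.5, RESISTOR_NO_EFFECT),
    "R6": (0.5, RESISTOR_CRITICAL),
    "C1": (2.0, [("Short", 1.00, MF501)]),
}

def failure_rate_per_malfunction(parts):
    """Sum the failure mode rates (FIT) per effect at the HW block output."""
    totals = defaultdict(float)
    for part_rate, modes in parts.values():
        for _mode, share, effect in modes:
            totals[effect] += part_rate * share
    return dict(totals)

synthesis = failure_rate_per_malfunction(PARTS)
for effect, rate in synthesis.items():
    print(f"{rate:.2f} FIT  {effect}")
```

The sum over all effects equals the total failure rate of the block parts, which gives a simple consistency check of the eFMEA table.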
GO TO NEXT STEP
Chapter 6.9.1
Qualitative component FMEDA with the list of HW block malfunctions, their effect at
component level, their criticality or severity, and the safety mechanisms implemented
to control the malfunction propagation.
A description of the physical and functional architecture of the component used for
this analysis. It can be a block diagram showing the internal blocks of the component,
their physical inputs and outputs and the interconnections between the internal
blocks of the component.
During quantitative FMEDA, each individual part failure mode must be classified carefully as
safe fault, single point fault, residual fault or multiple point faults (detected, perceived, latent)
as illustrated hereafter with a flow diagram :
[Flow diagram: classification of each failure mode with failure rate FR of a HW element or HW part
Start
Which fraction is prevented by a safety mechanism from violating the safety goals? The fraction 1-DC1 is a Residual Fault: RF = FR x (1-DC1). The fraction DC1 is a Multiple Point Fault and continues.
Which fraction is detected by another safety mechanism and notified to the driver (possible failure of the safety mechanism before the failure mode occurs)? The fraction DC2 is detected (End). The fraction 1-DC2 continues.
Which fraction is perceived by the driver? The fraction Fper is perceived (End). The fraction 1-Fper is MPF, Latent: MPF,Latent = (FR-RF) x (1-DC2) x (1-Fper) (End).]
In order to link quantitative FMEDA at HW part level with quantitative FTA at
HW block level (used later to verify whether residual risk targets are reached), a new optional
step (shown in blue in the diagram above) is introduced compared to the analysis process of
ISO26262 Part 5 Annex B Figure B.2 [1]. It provides the link with safety analyses performed
at a higher abstraction level (HW block level).
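The two formulas of the flow diagram, RF = FR x (1-DC1) and MPF,Latent = (FR-RF) x (1-DC2)(1-Fper), can be sketched for one failure mode as follows. The function name `classify_failure_mode` and the use of fractions between 0 and 1 for DC1, DC2 and Fper are assumptions made for this illustration; the sketch only covers failure modes that have the potential to violate the safety goal (safe faults are not handled).

```python
def classify_failure_mode(fr, dc1=0.0, dc2=0.0, f_per=0.0):
    """Split the failure rate FR (FIT) of one failure mode.

    dc1:   fraction prevented by a safety mechanism from violating the safety goal
    dc2:   fraction detected by another safety mechanism and notified to the driver
    f_per: fraction perceived by the driver
    Returns (single-point or residual fault rate, latent multiple-point fault rate).
    """
    for name, value in (("dc1", dc1), ("dc2", dc2), ("f_per", f_per)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be a fraction between 0 and 1")
    rf = fr * (1.0 - dc1)                               # RF = FR x (1-DC1)
    latent = (fr - rf) * (1.0 - dc2) * (1.0 - f_per)    # MPF,Latent
    return rf, latent

# Hypothetical usage: failure mode of 0.15 FIT, DC1 = 90 %, DC2 = 60 %, Fper = 0
rf, latent = classify_failure_mode(0.15, dc1=0.90, dc2=0.60)
print(f"RF = {rf:.3f} FIT, MPF,Latent = {latent:.3f} FIT")

# Without any safety mechanism (DC1 = 0) the whole rate is a single-point fault
spf, no_latent = classify_failure_mode(0.15)
```

With DC1 = 0 the whole failure rate stays in the first term, which corresponds to a single-point fault, and the latent part is zero.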
Classifying the different failure modes as safe fault, single-point fault, residual fault or
multiple-point fault (detected, perceived, latent) is not sufficient to calculate the architectural
metrics SPFM and LFM, because the total sum of the failure rates of the safety-related parts
(the denominator in the following formulas) also needs to be determined:
$$\mathrm{SPFM} = 1 - \frac{\sum_{SR,HW} (\lambda_{SPF} + \lambda_{RF})}{\sum_{SR,HW} \lambda}$$
$$\mathrm{LFM} = 1 - \frac{\sum_{SR,HW} \lambda_{MPF,Latent}}{\sum_{SR,HW} (\lambda - \lambda_{SPF} - \lambda_{RF})}$$
With
$\sum_{SR,HW} (\lambda_{SPF} + \lambda_{RF})$ = Sum of (Residual Fault and Single Point Fault failure rates)
$\sum_{SR,HW} \lambda_{MPF,Latent}$ = Sum of Multiple Point Fault Latent failure rates
$\sum_{SR,HW} \lambda$ = Sum of the failure rates of the safety-related elements. Here elements are HW parts,
a HW part being safety related if one of its failure modes can be SPF, RF or MPF,Latent for the
considered safety goal or for the considered component malfunction.
Then, knowing $\sum_{SR,HW} \lambda$, it is possible to calculate the SPFM and LFM and to verify
whether the architectural metrics targets are reached for each considered safety goal or critical
component malfunction.
If the SPFM and LFM targets are not reached, the main contributors to SPFM and LFM can
be identified, and new safety mechanisms can be put in place or existing ones improved if their
diagnostic coverage was very low (60%). The recommendation here is not to play with
numbers to reach the target. This must be done carefully, and if the architectural metrics values are
close to the targets, the best approach is to bring these values to the system responsible, who
will be able to verify that at safety goal level the final architectural metrics targets are
reached when integrating the results from all components.
Remarks when performing quantitative FMEDA:
1. HW parts whose faults are multiple-point faults with n > 2 can be omitted from the
calculations unless shown to be relevant in the technical safety concept.
2. It is important to understand that the Diagnostic Coverages (DC) given in
ISO26262 Part 5 Annex D [1] are average values that consider all failure modes of a
HW block. The maximum DC considered there is 99%. This does not mean that a DC
of 100% is not reachable, but it has to be demonstrated with a specific analysis.
3. The same failure mode can be placed in different fault classes when
considered for different safety goals or critical component malfunctions.
Also it can be interesting to have the list of the main contributors for each metric with
potential actions identified when architectural metrics targets are not reached.
The following synthesis is extracted from the qualitative FMEDA for the analog interface:
Analog Interface Malfunction | Component effect [TCM] | Severity or Criticality without safety mechanism | Potential safety mechanism
MF501: Wrong output value provided: out of range | MF1003: Lighting command on the CAN bus erroneously switches from LOWBEAM ON to another valid position (OFF or PARKING LIGHT ON) by the TCM_ECU | ASIL B | SM05: Detection of out of range values by the µC. CAN light command put at INVALID
MF502: Wrong output value provided: OFF instead of Low Beam ON | MF1003: Lighting command on the CAN bus erroneously switches from LOWBEAM ON to another valid position (OFF or PARKING LIGHT ON) by the TCM_ECU | ASIL B | Not Detectable
MF503: Wrong output value provided: Parking light ON instead of Low Beam ON | MF1003: Lighting command on the CAN bus erroneously switches from LOWBEAM ON to another valid position (OFF or PARKING LIGHT ON) by the TCM_ECU | ASIL B | Not Detectable
During qualitative FMEDA analysis, it was identified that a wrong output value of type [out of
range] provided by the analog interface could lead to the considered critical malfunction
[MF1003]. Nevertheless, it can be detected by the safety mechanism SM05, and the system
is then put in the safe state [LOWBEAM ON].
It was also identified that if the analog interface provides a wrong output value of type [OFF
instead of LOWBEAM ON] or [PARKING LIGHT ON instead of LOWBEAM ON], this could
also lead to the considered critical malfunction [MF1003], and this is not detectable.
The corresponding quantitative FMEDA at HW part level for analog interface HW block is
proposed hereafter:
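HW Block: Analog Interface
Part Name | Part failure rate (FIT) | Failure Mode | Distribution (%) | Failure mode failure rate (FIT) | Failure mode effect at output of the HW block | Can violate the safety goal? | Safety mechanism preventing violation of the safety goal | DC (%) | SPF/RF failure rate (FIT) | Can be latent? | Safety mechanism preventing latency | DC latent (%) | MPF,Latent failure rate (FIT) | Safety-related failure mode? | Safety-related part?
R1 | 0.5 | Parameter change + | 30 | 0.15 | MF502: Wrong output value provided: OFF instead of Low Beam ON | Yes | No | 0 | 0.15 | No | N/A | N/A | N/A | Yes | Yes
R1 | 0.5 | Parameter change - | 30 | 0.15 | MF501: Wrong output value provided: out of range | Yes | SM05: Detection of out of range values by the µC. CAN light command put at INVALID | 90 | 0.015 | Yes | SM06: ADC test (reference voltage, reference ground) at each power up | 60 | 0.054 | Yes | Yes
R1 | 0.5 | Open | 40 | 0.2 | MF501: Wrong output value provided: out of range | Yes | SM05: Detection of out of range values by the µC. CAN light command put at INVALID | 90 | 0.02 | Yes | SM06: ADC test (reference voltage, reference ground) at each power up | 60 | 0.072 | Yes | Yes
… | … | Parameter change + | 30 | 0.15 | MF502: Wrong output value provided: OFF instead of Low Beam ON | Yes | No | 0 | 0.15 | No | N/A | N/A | N/A | Yes | …
…
R4 | 0.5 | Parameter change - | 30 | 0.15 | No Effect | No | - | - | - | No | - | - | - | No | No
R4 | 0.5 | Open | 40 | 0.2 | No Effect | No | - | - | - | No | - | - | - | No | No
R5 | 0.5 | Parameter change + | 30 | 0.15 | No Effect | No | - | - | - | No | - | - | - | No | No
R5 | 0.5 | Parameter change - | 30 | 0.15 | No Effect | No | - | - | - | No | - | - | - | No | No
R5 | 0.5 | Open | 40 | 0.2 | No Effect | No | - | - | - | No | - | - | - | No | No
R6 | Same as R1 (not shown in the example because of lack of space) | Yes
C1 | 2 | Short | 100 | 2 | MF501: Wrong output value provided: out of range | Yes | SM05: Detection of out of range values by the µC. CAN light command put at INVALID | 90 | 0.2 | Yes | SM06: ADC test (reference voltage, reference ground) at each power up | 60 | 0.72 | Yes | Yes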
Table 8 : Example of partial quantitative FMEDA at HW part level for analog interface performed on our use case
As can be seen above, the FMEDA table can quickly become unreadable. An
ECU with 300 HW parts, each having 3 failure modes, gives a quantitative FMEDA
table with 900 lines. It is therefore better to have a tool that supports quantitative FMEDA
and allows the people in charge of the analysis to quickly extract the relevant information.
Adding a new column for “the failure mode effect at output of the HW block”
helps to define, for each failure mode (using the qualitative FMEDA output), whether the considered
safety goal or the considered critical component malfunction can be violated and whether safety
mechanisms are put in place to control this failure mode. This additional column is also a
means to link quantitative FMEDA performed at HW part level with quantitative component
FTA performed at HW block level during residual risk verification (see chapter 6.9.2).
It can also be noticed that the first 7 columns of the FMEDA table above are equivalent to
the eFMEA approach from [Alternative 1] shown in chapter 6.7.1.7.
Back to our example, we now calculate the local architectural metrics for the analog
interface.
For the analog interface, the parts R1, R2, R3, R6 and C1 are safety related.
Therefore the total safety-related failure rate is
$$\sum_{SR,HW} \lambda = 0.5 + 0.5 + 0.5 + 0.5 + 2 = 4 \text{ FIT}$$
and
$$\sum_{SR,HW} (\lambda_{SPF} + \lambda_{RF}) = 0.94 \text{ FIT}$$
and
$$\sum_{SR,HW} \lambda_{MPF,Latent} = 1.224 \text{ FIT}$$
All necessary data to determine the local architectural metrics for the analog interface are now
available. Results are:
$$\mathrm{SPFM} = 1 - \frac{\sum_{SR,HW} (\lambda_{SPF} + \lambda_{RF})}{\sum_{SR,HW} \lambda} = 1 - \frac{0.94}{4} = 76.5\%$$
$$\mathrm{LFM} = 1 - \frac{\sum_{SR,HW} \lambda_{MPF,Latent}}{\sum_{SR,HW} (\lambda - \lambda_{SPF} - \lambda_{RF})} = 1 - \frac{1.224}{4 - 0.94} = 60\%$$
Compared to the initial architectural metrics targets defined in chapter 6.5.4.5 (SPFM = 90%;
LFM = 60%), the SPFM result does not reach its target, but only the analog interface was considered
here to simplify the example. In real life, the other HW parts of the TCM_ECU component also have to
be considered for the global architectural metrics calculation.
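The calculation of the local architectural metrics for the analog interface can be reproduced with a short sketch. The failure mode rates and diagnostic coverages are those of Table 8 (R2, R3 and R6 are assumed identical to R1, as in Table 7, and Fper is taken as 0); the data layout and the function names `classify` and `architectural_metrics` are assumptions made for this illustration.

```python
def classify(fr, dc1, dc2, f_per=0.0):
    """Return (SPF or RF rate, latent MPF rate) in FIT for one failure mode."""
    rf = fr * (1.0 - dc1)
    latent = (fr - rf) * (1.0 - dc2) * (1.0 - f_per)
    return rf, latent

def architectural_metrics(parts):
    """parts: {part id: (part failure rate, [(mode rate, dc1, dc2), ...])}.

    Only safety-related parts must be passed in.
    Returns (total, spf_rf, latent, spfm, lfm), rates in FIT, metrics as fractions.
    """
    total = sum(part_rate for part_rate, _ in parts.values())
    spf_rf = 0.0
    latent = 0.0
    for _part_rate, modes in parts.values():
        for fr, dc1, dc2 in modes:
            rf, lat = classify(fr, dc1, dc2)
            spf_rf += rf
            latent += lat
    spfm = 1.0 - spf_rf / total
    lfm = 1.0 - latent / (total - spf_rf)
    return total, spf_rf, latent, spfm, lfm

# Modes with an effect at block output: (mode rate in FIT, DC1, DC2)
RESISTOR = (0.5, [(0.15, 0.0, 0.0),    # Parameter change +, MF502, not detectable
                  (0.15, 0.9, 0.6),    # Parameter change -, MF501, SM05 / SM06
                  (0.20, 0.9, 0.6)])   # Open, MF501, SM05 / SM06
SAFETY_RELATED_PARTS = {
    "R1": RESISTOR, "R2": RESISTOR, "R3": RESISTOR, "R6": RESISTOR,
    "C1": (2.0, [(2.0, 0.9, 0.6)]),    # Short, MF501, SM05 / SM06
}

total, spf_rf, latent, spfm, lfm = architectural_metrics(SAFETY_RELATED_PARTS)
print(f"total = {total:.3f} FIT, SPF+RF = {spf_rf:.3f} FIT, latent = {latent:.3f} FIT")
print(f"SPFM = {spfm:.1%}, LFM = {lfm:.1%}")
```

R4 and R5 are left out because their failure modes have no effect at the block output, so they are not safety related for this malfunction.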
GO TO NEXT STEP
Chapter 6.8.2
6.8.2 STEP 4B: Calculate Component Residual Risk (Optional) at HW Part level [HW
Safety Analysis] [Verification Phase] [Alternative 2]
ISO26262 Part 5 Chapter 9 [1] proposes two alternative methods to evaluate whether the
risk of violation of a safety goal is sufficiently low:
1st Method called “Probabilistic Metric for random Hardware Failures” (PMHF)
consists in using a probabilistic metric to evaluate the violation of the considered
safety goal using, for example, quantified FTA and to compare the result of this
quantification with a target value. [STEP 4B1]
2nd Method called “Evaluation of each cause of safety goal violation” consists in
evaluating individually each single-point fault or residual fault or dual-point fault that
can lead to the violation of the considered safety goal. [STEP 4B2]
Applicability: The calculation of residual risk is required for ASIL C and ASIL D safety goals
and recommended for ASIL B safety goals.
In both methods, multiple-point faults can also be considered if shown to be relevant when
building the technical safety concept.
The first method using quantified FTA can be performed at different architectural levels, from
HW parts to system level, whereas the second method shall be used at HW part level as
stated in ISO26262 Part 5 Clause 9.4.3.2 [1].
6.8.2.1 STEP 4B1: Calculate Component Residual Risk at HW Part level using Method
1: Probabilistic Metric for random Hardware Failures (PMHF) [Alternative 2]
Considering the experience of WT331 partners, it is not recommended to use method 1 with
quantitative FTA performed at HW part level, because it leads to huge fault trees that must
be updated each time the HW schematics are modified.
Moreover, the resulting fault trees are hard to read and error prone.
The only case where it could be relevant to perform quantified FTA at HW part level is for
complex HW parts such as microcontrollers. The resulting FTA (coming often from suppliers)
could then be integrated in a quantified FTA performed at HW block level as an example.
Therefore if you want to use the method 1 proposed by ISO26262 Part 5 [1] to calculate the
PMHF, please go directly to chapter 6.9.2.
A simplified method could be to consider, as a first approximation, only the minimal cut sets
of order 1, i.e. the sum of the SPF and RF failure rates, given by the formula:
$$\mathrm{PMHF} \approx \sum_{SR,HW} (\lambda_{SPF} + \lambda_{RF})$$
In most cases this assumption is sufficient for a pessimistic approximation.
GO TO NEXT STEP
Chapter 6.9.2
6.8.2.2 STEP 4B2: Calculate Component Residual Risk at Part level using Method 2:
Evaluation of each cause of safety goal violation [Alternative 2]
Method 2 is sometimes also called the “Failure Rate Class method”. In this D331b deliverable we
recall only the main principles and highlight the points that are not so obvious when reading
ISO26262 Part 5 Clause 9.4.3 [1].
A complete explanation of method 2 is also provided by [11], [12] and [13].
Unlike method 1, where it is simply required to verify that a global budget for the residual
risk target is not exceeded considering all dangerous faults together, the basic idea of
method 2 is to spread the residual risk target among all the dangerous faults, each of which
then has the same “local” target.
To illustrate the comparison let us consider an example of a component that has a total of no
more than 5 single point faults (no other faults) and a residual risk target of 10 FIT. Using
method 1, it is permissible for 4 of the SPF to have 1 FIT and the last one to have 6 FIT. But
using method 2, no SPF is allowed to have more than 2 FIT.
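The comparison above can be expressed as two simple checks. This sketch only mirrors the simplified illustration of this paragraph (a global budget versus an identical local target per fault); it does not implement the failure rate class tables of method 2, and the function names `method1_ok` and `method2_ok` are hypothetical.

```python
def method1_ok(fault_rates_fit, target_fit):
    """Method 1 view: the sum of all dangerous fault rates stays within the budget."""
    return sum(fault_rates_fit) <= target_fit

def method2_ok(fault_rates_fit, target_fit):
    """Method 2 view: each dangerous fault stays within the same local target."""
    local_target = target_fit / len(fault_rates_fit)
    return all(rate <= local_target for rate in fault_rates_fit)

# 5 single-point faults, residual risk target of 10 FIT
spf_rates = [1, 1, 1, 1, 6]
print("method 1:", method1_ok(spf_rates, 10))
print("method 2:", method2_ok(spf_rates, 10))
```

The distribution with one SPF at 6 FIT is accepted by the global budget check but rejected by the local target check, since the local target is 10 / 5 = 2 FIT.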
With the philosophy of method 2 understood, the method is easy to use,
provided that a few points are highlighted.
Basically, when the residual risk target is known (resulting from the allocation in chapter
6.5.3), the first step of method 2 is to construct a failure rate class table (FRC1 to FRC
n) that is applied to each dangerous fault.
The threshold for FRC1 is first determined by dividing the residual risk target allowed to the
component for the considered safety goal by 100 (we assume 100 relevant
dangerous faults or cut sets for the safety goal).
This rationale of 100 can be modified to a number lower than 100 (as noted in
ISO26262 Part 5 chapter 9.4.3.4 [1]), but also, in the SAFE Meta model, to a number higher
than 100 (even if this is not addressed by ISO26262 because 100 was considered pessimistic).
FRC2, FRC3 … FRC n are then derived from FRC1, each class being one order of magnitude
higher than the previous one (FRC i = FRC 1 x 10^(i-1)).
If we take our example, which is an ASIL B system with a residual risk target of 50 FIT
allocated to the Top Column Module ECU, and assume different rationales for the number of
dangerous faults, the corresponding FRC tables to apply to our component are:
        FRC table assuming 100    FRC table assuming 50     FRC table assuming 125
        dangerous faults          dangerous faults          dangerous faults
FRC     HW Part                   HW Part                   HW Part
FRC1    0.5 FIT                   1 FIT                     0.4 FIT
FRC2    0.5 FIT < 5 FIT           1 FIT < 10 FIT            0.4 FIT < 4 FIT
FRC3    5 FIT < 50 FIT            10 FIT < 100 FIT          4 FIT < 40 FIT
FRC4    50 FIT < 500 FIT          100 FIT < 1000 FIT        40 FIT < 400 FIT
FRC5    500 FIT < 5000 FIT        1000 FIT < 10000 FIT      400 FIT < 4000 FIT
Table 9 : Example of different FRC tables assuming different rationales for dangerous faults
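The construction of such a table can be sketched as follows. The function names are hypothetical, and the treatment of a failure rate that is exactly equal to a class bound (here assigned to the next class) is an assumption of this sketch, not a statement of the standard.

```python
def frc_upper_bounds(target_fit, assumed_faults=100, classes=5):
    """Return the upper bounds (in FIT) of FRC1..FRCn.

    FRC1 is target / assumed number of dangerous faults; each further
    class is one order of magnitude higher, as in Table 9.
    """
    frc1 = target_fit / assumed_faults
    return [frc1 * 10 ** i for i in range(classes)]


def failure_rate_class(rate_fit, bounds):
    """Return the 1-based class of a failure rate, or None above the table.

    Assumption: a rate equal to a bound falls into the next class.
    """
    for index, bound in enumerate(bounds, start=1):
        if rate_fit < bound:
            return index
    return None


# 50 FIT target with the three rationales used in Table 9
for assumed in (100, 50, 125):
    print(assumed, frc_upper_bounds(50.0, assumed))
```

For example, with the 100-fault rationale a dangerous fault of 3 FIT would be classified in FRC2 by `failure_rate_class(3.0, frc_upper_bounds(50.0))`.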
Once the relevant FRC table is built, for each safety goal, each single-point fault, residual
fault or latent fault of an HW part must be assessed against the following targets:
For single-point faults, the targets for failure rate class are the following depending on
the ASIL level :
ASIL of the
safety goal    Failure Rate class Target
D              FRC1 + dedicated measures*
C              FRC2 + dedicated measures* OR FRC1
B              FRC2 OR FRC1
Table 10 : Targets of failure rate classes of HW parts regarding single-point faults
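As an illustration, the targets of Table 10 can be encoded as a small lookup. This is a sketch under assumptions: the names are hypothetical, a lower class number is assumed to satisfy a target expressed for a higher class (FRC1 satisfies an FRC2 target), and only single-point faults are covered.

```python
# (highest acceptable class, dedicated measures required) per ASIL,
# transcribed from Table 10 (single-point faults only).
SPF_TARGETS = {
    "D": [(1, True)],
    "C": [(2, True), (1, False)],
    "B": [(2, False), (1, False)],
}


def spf_target_met(asil, frc, has_dedicated_measures):
    """Check a single-point fault against the Table 10 target for its ASIL."""
    for max_class, needs_measures in SPF_TARGETS[asil]:
        if frc <= max_class and (has_dedicated_measures or not needs_measures):
            return True
    return False


print(spf_target_met("B", 2, False))  # True: FRC2 is acceptable for ASIL B
print(spf_target_met("C", 2, False))  # False: FRC2 needs dedicated measures for ASIL C
print(spf_target_met("D", 1, True))   # True: FRC1 with dedicated measures
```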
For residual faults, the targets for failure rate class are the following depending on
the ASIL level and on the diagnostic coverage :
The diagnostic coverage (DC) of an HW part must not be confused with the diagnostic
coverage of a safety mechanism covering a certain failure mode of an HW part.
$$DC_{HWPart} = 1 - \frac{\lambda_{SPF,RF,\,HWPart}}{\lambda_{HWPart}}$$
The calculation of the DC regarding residual faults of an HW part is done analogously to the
calculation of the single-point fault metrics as stated in ISO26262 Part 5 Clause 9.4.3.6 Note
4 [1].
For dual-point fault, the targets for failure rate class are the following depending on
the ASIL level and on the diagnostic coverage :
The diagnostic coverage (DC) regarding latent faults of an HW part is the following:
$$DC_{HWPart} = 1 - \frac{\lambda_{MPF,Latent,\,HWPart}}{\lambda_{HWPart} - \lambda_{SPF,RF,\,HWPart}}$$
The calculation of the DC of an HW part is done analogously to the calculation of the
latent-fault metric as stated in ISO26262 Part 5 Chapter 9.4.3.11 Note 4 [1].
*Dedicated Measures
As stated in the ISO26262 Part 5 Clause 9.4.2.4 [1] a dedicated measure can be:
a) design features such as hardware part over design (e.g. electrical or thermal stress
rating) or physical separation (e.g. spacing of contacts on a printed circuit board) or
b) a special sample test of incoming material to reduce the risk of occurrence of this failure
mode or
c) a burn-in test or
d) a dedicated control set as part of the control plan and
e) assignment of safety-related special characteristics.
GO TO NEXT STEP
Chapter 6.10.1
The model describing the functional and physical architecture of the component.
The description of physical architecture is a block diagram showing the internal
HW blocks of the component, their physical inputs and outputs and the
interconnections between the internal HW blocks of the component.
The description of the functional architecture must give the behavior of the
component functions and their split-up into sub-functions. An allocation of the
sub-functions to the HW blocks of the component is necessary as well, meaning
that physical and functional architectures must fit together (every sub-function
must fit to a unique HW block).
Qualitative component FMEDA with the list of HW block malfunction, their effect
at component level, their criticality or severity, and the safety mechanism
implemented to control the malfunction propagation.
[Flowchart: split of the failure rate FR of a failure mode]
Start: Which fraction is prevented by safety mechanism from violating the safety goals?
  Fraction 1-DC1: Residual Fault, RF = FR x (1-DC1). End.
  Fraction DC1: Multiple Point Fault (possible failure of the safety mechanism prior
  failure mode occurs).
    Which fraction is detected by another safety mechanism and notified to the driver?
      Fraction DC2: End.
      Fraction 1-DC2: Which fraction is perceived by the driver?
        Fraction Fper: End.
        Fraction 1-Fper: MPF, Latent = (FR-RF) x (1-DC2)(1-Fper). End.
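The two formulas of the flowchart can be applied as in the following sketch. The function name and the numerical values are hypothetical and only serve to show the arithmetic.

```python
def split_failure_rate(fr, dc1, dc2, f_per):
    """Split a failure-mode failure rate FR (in FIT) following the flowchart.

    dc1   : fraction prevented by the safety mechanism from violating the safety goal
    dc2   : fraction of the multiple-point part detected by another safety
            mechanism and notified to the driver
    f_per : fraction of the remaining part perceived by the driver
    """
    rf = fr * (1.0 - dc1)                                  # RF = FR x (1-DC1)
    mpf_latent = (fr - rf) * (1.0 - dc2) * (1.0 - f_per)   # (FR-RF) x (1-DC2)(1-Fper)
    return rf, mpf_latent


# Hypothetical failure mode: 10 FIT, DC1 = 90 %, DC2 = 60 %, nothing perceived
rf, mpf_latent = split_failure_rate(10.0, 0.9, 0.6, 0.0)
print(f"RF = {rf:.2f} FIT, MPF latent = {mpf_latent:.2f} FIT")
```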
To compute the architectural metrics SPFM and LFM, the total sum of the failure rates of
safety-related failure modes (which appears in the denominator of the following formulae)
needs to be determined:
$$SPFM = 1 - \frac{\sum_{SR,HW} \lambda_{SPF,RF}}{\sum_{SR,HW} \lambda}$$
$$LFM = 1 - \frac{\sum_{SR,HW} \lambda_{MPF,Latent}}{\sum_{SR,HW} \lambda - \sum_{SR,HW} \lambda_{SPF,RF}}$$
With
$\sum_{SR,HW} \lambda_{SPF,RF}$ = Sum of (Residual Fault and Single Point Fault failure rates)
$\sum_{SR,HW} \lambda_{MPF,Latent}$ = Sum of Multiple Point Fault Latent failure rates
Then, knowing $\sum_{SR,HW} \lambda$, it is possible to calculate the SPFM and LFM and verify
whether the architectural metrics targets are reached for each considered safety goal or
critical component malfunction.
The total safety-related failure rate $\sum_{SR,HW} \lambda$ cannot be determined accurately
when considering HW blocks, because this approach does not allow taking into account the
failure rate of safe failure modes of safety-related HW parts, as illustrated hereafter:
When a quantitative FMEDA is performed at HW part level, if one of the failure modes belonging
to an HW part violates the safety goal, then the complete failure rate of the part is
considered for the $\sum_{SR,HW} \lambda$ calculation.
A proposal was made in the context of deliverable D322a [16] to estimate accurately the amount
of safety-related failure rate (same result as at HW part level), but it has to be verified
with concrete examples.
If the SPFM and LFM targets are not reached, the main contributors to SPFM and LFM can
be identified and new safety mechanisms put in place, or existing ones improved if their
diagnostic coverage was very low (60%). Be careful: the recommendation here is not to play
with numbers to reach the target. Changes must be made carefully, and if the architectural
metrics values are close to the targets, the best approach is to bring these values to the
system responsible, who will be able to verify that, at safety goal level, the final
architectural metrics target is reached when integrating the results from all components.
An example is given in chapter 6.10.1.
It can also be useful to have the list of the main contributors for each metric, with
potential actions identified, when architectural metrics targets are not reached.
6.9.1.7 Quantitative Component FMEDA at HW Block Level: Illustration via our example
The qualitative FMEDA was performed at HW block level for the Top Column Module
(TCM) ECU component. Together with the eFMEA synthesis, it is our main input for performing
the quantitative FMEDA.
The following synthesis is extracted from the qualitative FMEDA for the analog interface:
Analog Interface         Potential Component effect               Severity or        Safety mechanism
Malfunction              without safety mechanism [TCM]           Criticality
                                                                  Without Safety
                                                                  mechanism

MF501 : Wrong output     MF1003 : Lighting command on the CAN     ASIL B             SM05: Detection of out of
value provided: out of   bus erroneously switches from LOWBEAM                       range values by the µC.
range                    ON to another valid position (OFF or                        CAN light command put at
                         PARKING LIGHT ON) by the TCM_ECU                            INVALID

MF502 : Wrong output     MF1003 : Lighting command on the CAN     ASIL B             Not Detectable
value provided: OFF      bus erroneously switches from LOWBEAM
instead of Low Beam      ON to another valid position (OFF or
ON                       PARKING LIGHT ON) by the TCM_ECU

MF503 : Wrong output     MF1003 : Lighting command on the CAN     ASIL B             Not Detectable
value provided: Parking  bus erroneously switches from LOWBEAM
light ON instead of Low  ON to another valid position (OFF or
Beam ON                  PARKING LIGHT ON) by the TCM_ECU
During the qualitative FMEDA, it was identified that if the analog interface provides a wrong
output value of type [out of range], it could violate our considered critical malfunction
[MF1003]. Nevertheless, this can be detected by the safety mechanism SM05 and the system
put in a safe state [LOWBEAM ON].
It was also identified that a wrong output value of type [OFF instead of LOWBEAM ON] or
[PARKING LIGHT ON instead of LOWBEAM ON] provided by the analog interface could also violate
our considered critical malfunction [MF1003], and that this is not detectable.
The following synthesis is extracted from the eFMEA for the analog interface (see chapter
6.7.1.7):
For the analog interface, the quantitative FMEDA at HW block level is the following:
Table 13 : Example of partial quantitative FMEDA at HW Part level for analog interface performed on our use case
As can easily be seen here, the quantitative FMEDA table performed at HW block level is
much more readable than the equivalent quantitative FMEDA done at HW part level in
chapter 6.8.1.
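For the analog interface all potential HW block malfunctions are safety-related, so that
$\sum_{SR,HW} \lambda$ = 4 FIT (the value used in the denominators of the calculation that follows)
and
$\sum_{SR,HW} \lambda_{SPF,RF}$ = 0.94 FIT
and
$\sum_{SR,HW} \lambda_{MPF,Latent}$ = 1.224 FIT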
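All necessary data to determine the local architectural metrics for the analog interface are
now available. Results are:
$$SPFM = 1 - \frac{\sum_{SR,HW} \lambda_{SPF,RF}}{\sum_{SR,HW} \lambda} = 1 - \frac{0.94}{4} = 76.5\%$$
$$LFM = 1 - \frac{\sum_{SR,HW} \lambda_{MPF,Latent}}{\sum_{SR,HW} \lambda - \sum_{SR,HW} \lambda_{SPF,RF}} = 1 - \frac{1.224}{4 - 0.94} = 60\%$$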
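The same calculation can be reproduced with a short script, which may help when checking other HW blocks. The function names are hypothetical; the numerical values are those of the analog interface given above (all in FIT).

```python
def spfm(sum_spf_rf, sum_sr_hw):
    """Single-point fault metric from the summed failure rates (FIT)."""
    return 1.0 - sum_spf_rf / sum_sr_hw


def lfm(sum_mpf_latent, sum_spf_rf, sum_sr_hw):
    """Latent-fault metric from the summed failure rates (FIT)."""
    return 1.0 - sum_mpf_latent / (sum_sr_hw - sum_spf_rf)


# Analog interface: 4 FIT safety-related, 0.94 FIT SPF/RF, 1.224 FIT latent MPF
print(f"SPFM = {spfm(0.94, 4.0):.1%}")        # SPFM = 76.5%
print(f"LFM  = {lfm(1.224, 0.94, 4.0):.1%}")  # LFM  = 60.0%
```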
Compared to the initial architectural metrics targets defined in chapter 6.5.4.5 (SPFM = 90%;
LFM = 60%), the SPFM target is not met, but only the analog interface is considered here in
order to keep the example simple. In real life, the other parts of the TCM component also
have to be considered for the global architectural metrics calculation.
It can be noticed that here the results for SPFM and LFM are exactly the same as the
results of the quantitative FMEDA performed at HW part level. This will not always be the case
because, as already mentioned in 6.9.1.5, the amount of safety-related failure rate
$\sum_{SR,HW} \lambda$ is not defined accurately at HW block level. If a microcontroller has a
failure rate of 100 FIT, equally distributed between one safety-related failure mode and one
non-safety-related failure mode, $\sum_{SR,HW} \lambda$ is 100 FIT when the FMEDA is performed
at HW part level but only 50 FIT when the FMEDA is performed at HW block level. Nevertheless,
this should no longer be the case with the new approach proposed in the deliverable D322a [16].
GO TO NEXT STEP
Chapter 6.9.2
6.9.2 STEP 5B: Calculate Component Residual Risk at HW Block level using
Method 2 / PMHF [Component Safety Analysis] [Verification Phase] [Alternative 1]
& [Alternative 2]
The second definition from ISO26262 is more in line with the “probability of failure per
hour” that is required by IEC61508 [12] for E/E safety-related systems operating in continuous
or high demand mode. This definition of “probability of failure per hour” was also
not clear in edition 1 of IEC61508 and was therefore much discussed at that time, as
can be found in [15].
In edition 2.0 of IEC61508 [12], PFH is defined as “the average of the so called
unconditional failure intensity (also called failure frequency) w(t) over the period of interest”:
$$PFH = \frac{1}{T} \int_0^T w(t)\,dt$$
where T is the period of interest. For non repairable components, w(t) is also defined as
$w(t) = \frac{dF(t)}{dt}$, where F(t) is the probability of failure versus time, often also
called unreliability.
Let’s now take some simplified examples, calculate the unconditional failure intensity
over time and discuss its maximum and average values.
Example 01: Our component has single-point faults (SPF) or residual faults (RF) that have
failure rates constant over time and a probability of failure F following an exponential law.
The probability of failure F(t) would then be linear versus time (when
λ.t << 0.01) and therefore the unconditional failure intensity w(t) would be constant over
time.
neglected even if they have high failure rate. So calculating this time an average value of the
unconditional failure intensity over the period of interest or considering the maximum value
over the period of interest produces the same result.
Example 02: Our component has neither single-point faults (SPF) nor residual faults (RF) but
only latent dual-point faults (remaining latent over the lifetime) that have failure rates
constant over time and a probability of failure F following an exponential law.
In this case the resulting probability of failure is no longer linear (when λ.t << 0.01) but
follows a polynomial law of degree 2 (F(t) ≈ λ1.λ2.t²).
Therefore the unconditional failure intensity is no longer constant over time but
linear (w(t) ≈ 2.λ1.λ2.t). Considering the maximum value or the average value does not give
the same result: in this case the average value is optimistic by a factor 2 compared to the
maximum value.
Figure 15: F(t) and w(t) plot with 2 latent faults combined (λ1 = 50 FIT and λ2 = 30 FIT)
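The factor 2 between the maximum and the average value can be checked numerically. The sketch below assumes two independent, non repairable faults with exponential laws and no testing or repair; the lifetime of 10 000 hours and the function names are hypothetical choices made for the illustration.

```python
import math

FIT = 1e-9  # failures per hour


def unreliability(t, lam1, lam2):
    """F(t): probability that both independent latent faults are present at time t."""
    return (-math.expm1(-lam1 * t)) * (-math.expm1(-lam2 * t))


def failure_intensity(t, lam1, lam2):
    """w(t) = dF/dt for the non repairable combination of the two faults."""
    e1, e2 = math.exp(-lam1 * t), math.exp(-lam2 * t)
    return lam1 * e1 * (1.0 - e2) + lam2 * e2 * (1.0 - e1)


lam1, lam2 = 50 * FIT, 30 * FIT
lifetime = 10_000.0  # hours (hypothetical period of interest)

# w(t) grows with t while lambda.t is small, so its maximum is reached at end of life.
w_max = failure_intensity(lifetime, lam1, lam2)
# The average of w(t) over [0, T] is F(T) / T because F(0) = 0.
w_avg = unreliability(lifetime, lam1, lam2) / lifetime

print(f"w_max = {w_max:.3e} /h, w_avg = {w_avg:.3e} /h, ratio = {w_max / w_avg:.3f}")
```

The printed ratio is close to 2, in line with the statement that the average value is optimistic compared to the maximum value for latent dual-point faults.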
These two simple examples show that calculating the maximum value of the
unconditional failure intensity during the lifetime and taking its average value do not
provide the same results. If we also consider periodic testing, repair when a critical fault
is detected, etc., the situation is much more complex. Therefore the best and safest
definition is to calculate the evolution of the unconditional failure intensity during the
lifetime and take its maximum value.
So we only consider faults that directly violate a considered safety goal, or dual-point
faults which remain latent during the whole lifetime of the vehicle.
We also propose some possible FTA patterns to represent these different categories
of fault, as proposed in ISO26262 Part 10 figure B.4 [2].
As stated by ISO26262 [1], a single-point fault is “the fault in an element that is not
covered by a safety mechanism and that leads directly to the violation of a safety goal”.
Therefore a single-point fault will always be a minimal cut set of order 1 in an FTA. It is
represented by a simple event in the FTA under an OR gate, with its failure rate defined.
As stated by the ISO26262 [1] a residual fault is “the portion of a fault that by itself leads to
the violation of a safety goal occurring in a hardware element where that portion of the fault
is not covered by safety mechanisms.”
There are several ways to represent a residual fault. Here 2 possibilities are shown.
Pattern 1 (left): The residual fault can be represented as in ISO26262 Part 10 figure B.4 [2]
by a simple event under an OR gate, whose failure rate is the residual-fault failure rate.
With this representation a residual fault will be a minimal cut set of order 1.
Pattern 2 (right): The residual fault can also be represented as the undetected portion of a
fault. It is then modeled by combining an event representing the total FIT rate of the
fault (1) with a NOT gate (2) applied to an event modeling the availability of the
diagnostic coverage (Q = DC1) (3). As this global pattern is built from an AND gate, the
final residual fault will appear as a cut set of order 2.
Figure 17: Example of 2 possible residual-fault FTA patterns
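Both patterns are expected to lead to the same residual-fault failure rate; only the place where the multiplication by (1 - DC1) is done differs. The following sketch illustrates this with hypothetical function names and illustrative values (a fault of 3.4e-9 per hour with a diagnostic coverage of 90 %). Treating the diagnostic coverage event as a constant probability is an assumption of the sketch.

```python
def residual_rate_pattern1(residual_rate):
    """Pattern 1: a single basic event whose rate is already the residual-fault rate."""
    return residual_rate


def residual_rate_pattern2(total_rate, dc):
    """Pattern 2: AND(fault event, NOT(diagnostic coverage event with Q = DC))."""
    q_not_covered = 1.0 - dc           # NOT gate applied to the DC event
    return total_rate * q_not_covered  # AND gate: rate event x constant probability


total_rate = 3.4e-9  # per hour, illustrative
dc1 = 0.9

# With pattern 1 the analyst computes the residual rate by hand before entering it.
by_hand = total_rate * (1.0 - dc1)
print(residual_rate_pattern1(by_hand), residual_rate_pattern2(total_rate, dc1))
```

With pattern 2, changing the diagnostic coverage only requires updating the DC event, which is the flexibility mentioned for the second pattern.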
The second pattern (right) has the advantage of modeling the diagnostic coverage as
a parameter, which is more flexible when this parameter changes. Nevertheless the
resulting FTA is more complex than the one generated with the first pattern (left), and the
NOT gate must be properly handled by the FTA tool used.
There are several ways to represent a dual-point failure resulting from a safety mechanism
failure. Here 2 possibilities are shown:
Pattern 1 (left): The dual-point failure resulting from a safety mechanism failure can be
represented as in ISO26262 Part 10 figure B.4 [2] by a simple event which represents the
safety mechanism failure, combined with an event representing the portion of the fault of an
HW element that is normally detected. With this representation a dual-point failure
resulting from a safety mechanism failure will be a minimal cut set of order 2.
Pattern 2 (right): The dual-point failure resulting from a safety mechanism failure can also
be represented as a combination of 3 events under an AND gate. One of them represents the
fault of the HW element with its failure rate (1), another the failure of the safety
mechanism with its failure rate (2), and the last event models the availability of the
diagnostic coverage (Q = DC1) (3). With this representation the final dual-point failure
will not appear as a cut set of order 2 but as a cut set of order 3.
Figure 18: Example of 2 possible dual-point fault FTA patterns resulting from safety-mechanism
failure
The second pattern (right) has the advantage of modeling the diagnostic coverage as
a parameter, which is more flexible when this parameter changes. Nevertheless the resulting
FTA is more complex than the one generated with the first pattern (left).
Pattern 01: as proposed in the SAE ARP 4761 figure D14 [9] and the ISO26262 Part 10 figure B.4 [2]
Pattern 02
Figure 19: Example of 2 possible FTA patterns to model safety mechanism effect
Pattern 02 looks more complex, but using Altarica dataflow we were able to generate it
automatically (refer to [17]). Moreover, each diagnostic coverage (DC) value and each failure
rate is represented by an independent event and does not need an intermediate calculation
as in pattern 01.
Nevertheless, for a complete component, an FTA built using pattern 01 can be expected to be
much more readable than one built with pattern 02.
As stated by ISO26262 [1], a dual-point fault is “an individual fault that in combination
with another independent fault leads to dual-point failure”. Therefore a dual-point failure
(when no safety mechanism is involved) will always be a minimal cut set of order 2 in an
FTA and will be represented by a combination of two independent events under an
AND gate, with their failure rates defined.
The unconditional failure intensity has to be calculated over the lifetime and the maximum
value considered for the PMHF.
6.9.2.8 PMHF calculation using Quantitative Component FTA: Illustration via our
example
Let’s consider the light system example and here more particularly the Top Column Module
(TCM) ECU.
The Component FTA was performed in chapter 6.6.2.7 from which we got the following
results:
[Figure: qualitative component FTA for the TCM_ECU]
Top event: MF1003 : Lighting command on the CAN bus erroneously switches from
LOWBEAM ON to another valid position (OFF or PARKING LIGHT ON) by the TCM_ECU
OR gate over the following events:
  [Analog Interface] MF501 : Wrong output value provided: out of range
  [Analog Interface] MF502 : Wrong output value provided: OFF instead of Low Beam ON
  [Analog Interface] MF503 : Wrong output value provided: Parking light ON instead of Low Beam ON
  [Power Supply Unit] MF401 : Low voltage (< 5V) provided
  [Microcontroller] MF901 : Wrong elaboration of the CAN Light command (another valid position instead of LOW BEAM ON)
  [Microcontroller] MF902 : Wrong Light command (another valid position instead of LOW BEAM ON) sent on the CAN bus
For the malfunction of the analog interface [MF501], a safety mechanism was implemented to
detect a portion of it.
The safety mechanisms controlling the other malfunctions [MF401] and [MF902] were not
shown in the previous examples.
The resulting quantitative FTA modeling the different safety mechanisms could be:
[Figure: quantitative component FTA for the TCM_ECU, with safety mechanisms]
Top event: MF1003 : Lighting command on the CAN bus erroneously switches from LOWBEAM ON to
another valid position (OFF or PARKING LIGHT ON) by the TCM_ECU (Q=9.7e-9, w=9.7e-9)
OR gate over the following events:
  [Analog Interface] MF501 : Wrong output value provided: out of range without [TCM_SM01] detection (Q=3.4e-10, w=3.4e-10)
  [Analog Interface] MF502 : Wrong output value provided: OFF instead of Low Beam ON (Q=6e-10, w=6e-10)
  [Analog Interface] MF503 : Wrong output value provided: Parking light ON instead of Low Beam ON (Q=0.0, w=0.0)
  [Power Supply Unit] MF401 : Low voltage (< 5V) provided without [TCM_SM09] detection (Q=2e-9, w=2e-9)
  [Microcontroller] MF902 : Wrong Light command (another valid position instead of LOW BEAM ON) sent on the CAN bus without [TCM_SM11] detection (Q=2.5e-9, w=2.5e-9)
  [Microcontroller] MF902 : Wrong Light command (another valid position instead of LOW BEAM ON) sent on the CAN bus (Q=4.3e-9, w=4.3e-9)
For the events covered by a safety mechanism, the lower levels of the tree combine the
portion of fault NOT detected by the safety mechanism, the failure of the safety mechanism
prior to the malfunction ([TCM_SM05] prior [MF501], [TCM_SM09] prior [MF401], [TCM_SM11]
prior [MF901]) and the diagnostic coverage of the safety mechanism (Q=0.9 for [TCM_SM05],
Q=0.6 for [TCM_SM09], Q=0.9 for [TCM_SM11]).
Figure 21: Example for quantitative component FTA for the TCM_ECU for SG_01
As can be seen, such an FTA performed at HW block level is much more readable than those
performed at HW part level.
The final calculation gives an unconditional failure intensity or failure frequency of
9.7 FIT. We are well below the initial target of 50 FIT that was allocated to our component.
This result can be predicted: as there are single-point faults and residual faults in our
component, the PMHF can easily be approximated by $\sum_{SR,HW} \lambda_{SPF,RF}$.
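As a quick plausibility check, the top event value can be compared with the sum of the first-level failure frequencies shown in Figure 21. The sketch below assumes the usual rare-event approximation for an OR gate (frequency close to the sum of its inputs); the variable and function names are hypothetical.

```python
# First-level failure frequencies read from Figure 21, in FIT (1 FIT = 1e-9 per hour)
first_level_w_fit = [0.34, 0.6, 0.0, 2.0, 2.5, 4.3]
TARGET_FIT = 50.0  # residual risk target allocated to the TCM ECU


def pmhf_estimate_fit(contributions_fit):
    """Rare-event approximation: OR gate frequency ~ sum of the input frequencies."""
    return sum(contributions_fit)


estimate = pmhf_estimate_fit(first_level_w_fit)
print(f"estimated PMHF = {estimate:.2f} FIT, target met: {estimate <= TARGET_FIT}")
```

The estimate of about 9.7 FIT matches the value reported for the top event of the quantitative FTA.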
When single-point faults and residual faults remain in a component for a
considered safety goal, a good means of checking consistency between the quantitative
FMEDA and the quantitative FTA is to compare their results, as they will be very similar.
Therefore the significant effort spent to model dual-point faults with possibly different
exposure times, etc., brings most of the time very limited added value.
When the quantitative FMEDA is performed at HW part level (see chapter 6.8.1), adding
a new column with the potential failure mode effect at the output of the HW block provides
the link with the quantitative FTA performed at HW block level.
6.10.1 STEP 6A: Verifying Architectural Metrics at System level (Optional) [System
Safety Analysis] [Verification Phase]
6.10.1.5 Verifying Architectural Metrics at System level: Main principles using our
example
Let us take the lighting system to illustrate how the verification of architectural metrics at
system level could be achieved based on architectural metrics calculated for each relevant
component, here the Top Column Module (TCM) ECU and the Body Control Management
(BCM) ECU.
[Figure: Lighting System, composed of Switch, Top Column Module_ECU, CAN Bus, Body Control
Management_ECU, Light Module 01 and Light Module 02]
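From the data of the different ECU suppliers, the following synthesis of intermediate results
is built (the SPF/RF values are those used in the system-level calculation):
                                        TCM ECU       BCM ECU
$\sum_{SR,HW} \lambda_{SPF,RF}$         9.7 FIT       22.05 FIT
$\sum_{SR,HW} \lambda_{MPF,Latent}$     10 FIT        62.4 FIT
$\sum_{SR,HW} \lambda$                  77.65 FIT     245 FIT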
For the single-point fault metric, the TCM ECU does not meet the individual target. That is
why the verification at system level is very important.
For the latent-fault metric, both ECUs individually meet the target. So it should obviously
be the same at system level, but for the sake of the example we performed the verification.
For this verification we could also use the same kind of template as proposed for the
quantitative FMEDA. Nevertheless, as we have only a few components in our system, we use
the architectural metrics formulae directly, as follows:
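$$SPFM = 1 - \frac{\sum_{SR,HW} \lambda_{SPF,RF}}{\sum_{SR,HW} \lambda} = 1 - \frac{9.7 + 22.05}{77.65 + 245} = 90.15\% \quad \text{OK}$$
$$LFM = 1 - \frac{\sum_{SR,HW} \lambda_{MPF,Latent}}{\sum_{SR,HW} \lambda - \sum_{SR,HW} \lambda_{SPF,RF}} = 1 - \frac{10 + 62.4}{(77.65 + 245) - (9.7 + 22.05)} = 75\% \quad \text{OK}$$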
So finally, even if the local SPFM target was not reached for the Top Column Module ECU,
the final verification at system level showed that the global SPFM target was met. The global
LFM target is met at system level, as expected.
If we had had an external safety mechanism in the BCM_ECU able to detect
a portion of the single-point faults and residual faults generated by the TCM_ECU,
it would have to be taken into account in the global calculation.
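The system-level verification can be sketched as a simple aggregation of the per-ECU failure rate sums. The data structure and function names are hypothetical; the values (in FIT) are the ones used in the calculation above, and the targets are the ones recalled earlier in this deliverable (SPFM = 90%, LFM = 60%).

```python
ecus = {
    "TCM_ECU": {"spf_rf": 9.7, "mpf_latent": 10.0, "sr_hw": 77.65},
    "BCM_ECU": {"spf_rf": 22.05, "mpf_latent": 62.4, "sr_hw": 245.0},
}
SPFM_TARGET, LFM_TARGET = 0.90, 0.60


def architectural_metrics(parts):
    """Return (SPFM, LFM) computed on the summed failure rates of the given parts."""
    spf_rf = sum(p["spf_rf"] for p in parts)
    mpf_latent = sum(p["mpf_latent"] for p in parts)
    sr_hw = sum(p["sr_hw"] for p in parts)
    return 1.0 - spf_rf / sr_hw, 1.0 - mpf_latent / (sr_hw - spf_rf)


for name, data in ecus.items():
    spfm_value, lfm_value = architectural_metrics([data])
    print(f"{name}: SPFM = {spfm_value:.1%}, LFM = {lfm_value:.1%}")

system_spfm, system_lfm = architectural_metrics(list(ecus.values()))
print(f"System: SPFM ok = {system_spfm >= SPFM_TARGET}, LFM ok = {system_lfm >= LFM_TARGET}")
```

Running the sketch shows the TCM ECU alone below the SPFM target while the combined system meets both targets, which is the situation described above.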
Adequate measures with action plan if we are not compliant with targets for a
considered safety goal.
6.10.2 STEP 6B: Verifying Residual Risk at System level (Optional) [System Safety
Analysis] [Verification Phase]
6.10.2.5 Verifying Residual Risk at System level: Main principles using our example
Let’s take the lighting system to illustrate how the verification of residual risk for each
considered safety goal at system level could be achieved having all residual risk results
(using method 1 or method 2) already available for the relevant components, here the Top
Column Module ECU and the Body Control Management ECU.
[Figure: Lighting System, composed of Switch, Top Column Module_ECU, CAN Bus, Body Control
Management_ECU, Light Module 01 and Light Module 02]
As we have single-point faults and residual faults in the system (see chapter 6.10.1.5),
the residual risk at system level for our considered safety goal is basically the sum of the
failure rates of these main contributors. So normally the residual risk at system level can be
approximated by $\sum_{SR,HW} \lambda_{SPF,RF}$, and in our case, as the targets were 50 FIT
for both ECUs, we would meet the targets at system level.
Let us now imagine a fictitious example to show how both alternative methods could be
combined and also to highlight some difficulties in reaching the residual risk target for our
considered safety goal.
In this example, on the one hand the Top Column Module ECU residual risk was evaluated
successfully using the failure rate class method. On the other hand the Body Control
Management ECU residual risk was evaluated using a quantified component FTA and
unfortunately it does not meet the target (60 FIT instead of the 50 FIT allowed).
Therefore if we use the system FTA that was used for residual risk allocation we have:
[Figure: system FTA used for residual risk allocation]
Top event: MF01: Loss of the low beams leads to Violation of Safety Goal 01: The system
shall not spuriously cut off the low beams [ASIL B] (Q=0.0016, w=1.1e-7)
OR gate over the events TCM_E001 and BCM_E001
Not surprisingly, the residual risk target for the considered safety goal at system level is
not reached (110 FIT instead of the 100 FIT allowed).
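The 110 FIT figure can be reproduced with the sketch below. It assumes that, because the Failure Rate Class method gives no measured value, the Top Column Module ECU can only be counted at its full 50 FIT allocation, while the Body Control Management ECU contributes the 60 FIT obtained from its quantified FTA. This interpretation and the names used are assumptions made for the illustration.

```python
SYSTEM_TARGET_FIT = 100.0

# Contribution of each event of the system FTA, in FIT
contributions_fit = {
    "TCM_E001": 50.0,  # FRC method passed: only the allocated target is known (assumption)
    "BCM_E001": 60.0,  # value obtained from the quantified component FTA
}


def system_residual_risk(contributions):
    """OR gate at system level, approximated by the sum of the contributions."""
    return sum(contributions.values())


risk = system_residual_risk(contributions_fit)
print(f"system residual risk = {risk:.0f} FIT, target met: {risk <= SYSTEM_TARGET_FIT}")
```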
In the example from chapter 6.10.1.5 it was shown that sometimes, when local metrics
targets are not met at component level, the final global results can still meet the target
when all results are combined.
But in this scenario we cannot benefit from the fact that the Top Column Module ECU is
better than its residual risk target: the Failure Rate Class method was used successfully,
but it does not provide an accurate measurement to compare with the target
(this is inherent to the FRC method).
So in real life we could be in a situation where we have to modify the design of the Body
Control Management ECU to reach the residual risk target at system level although we had a
lot of margin with the Top Column Module ECU.
As a short conclusion, as shown above, it is possible to combine the residual risk coming
from different components evaluated with the 2 alternative methods (PMHF or Failure Rate
Class) proposed by ISO26262 (refer to chapter 6.8.2). Nevertheless the use of the Failure
Rate Class method is not recommended because the margin between the residual risk achieved
and the target cannot be estimated.
Compliance status for the residual risk metrics for each safety goal at system level
compared to targets.
Adequate measures with action plan if we are not compliant with targets for a
considered safety goal.
The different safety FMEAs (SFMEA, FMEDA, eFMEA) are part of the safety process. They
all aim at controlling propagation of internal failures (fault tolerance).
Classic Design FMEA (DFMEA) and Process FMEA (PFMEA) are fundamentally different as
they aim at fault avoidance (the design faults and process faults). DFMEA and PFMEA are
not considered as safety activities. As these activities ensure the robustness of the design
and the conformity of the production, they are considered as pre-requisites for safety
activities. As these activities are important to reach an acceptable level of safety for a
component development, they are part of the safety case.
So gaps are clearly identified between our needs for safety FMEAs and what is seen today
in most tools of the market that are mainly addressing FMEA for fault avoidance only.
Therefore some requirements for such tools will be addressed in chapter 8.3.
In ISO26262 Part 9 Clause 8.2 [1] about safety analyses it is written that “Quantitative
safety analyses complement qualitative safety analyses”.
This may be true for FTA tools, where one starts from a qualitative FTA and then uses the
quantification features of the tool to obtain a quantified FTA, by defining classical
probabilistic occurrence laws on the basic events of the FTA.
As shown in chapter 5.1, ISO26262 [1] recommends or requires, depending on the ASIL of
the safety goals not to be violated, to perform inductive (bottom-up) and deductive (top-down)
safety analyses. The main goal of performing safety analyses with different reasoning
approaches is to ensure exhaustiveness, so that the risk of forgetting a failure
propagation scenario up to the safety goal is very limited.
Therefore, when using inductive methods such as FMEAs (qualitative and
quantitative) together with deductive methods such as FTAs (qualitative and quantitative),
it is good to connect the events that appear in both types of safety analyses, as
illustrated in ISO26262 Part 10 figure B.3 [2] as follows:
In most tools seen on the market today, there are clearly gaps when different analysis
types have to be combined, and no consistency check is performed. Therefore some
requirements for such tools will be addressed in chapter 8.3.
Method 1 is based on a quantified FTA to calculate the PMHF. For more explanation of
the method please refer to chapter 6.9.2. There are clearly gaps in the FTA tools on the
market when it comes to calculating this PMHF value accurately, with a formal representation
of safety mechanisms and of the impact of their diagnostic coverage. Therefore some
requirements for such tools will be addressed in chapter 8.3.
Method 2 is based on Failure Rate Classes and is performed during the quantified FMEDA.
For more explanation of the method please refer to chapter 6.8.2.2. Most tools available on
the market can calculate architectural metrics but cannot perform the residual risk
calculation based on method 2. Therefore some requirements for such tools will be addressed
in chapter 8.3. In particular, the Failure Rate classification is not implemented and the
individual part criteria are not checked.
8 Tool specification
The safety analysis types considered in this deliverable are the qualitative and quantitative
FMEAs (including SFMEA, qualitative FMEDA, and quantitative FMEDA) and qualitative and
quantitative FTA. The details of how these safety analysis types were originally selected can
be found in the deliverable D331a under chapter 6.4 [17].
8.2 WT331 Added Value and topics of interest derived from ISO26262
FMEA and FTA tools are not new on the market. Therefore the goal here is not to specify
requirements for elementary features that are already implemented in most of the tools
available on the market.
We concentrate only on features that are missing today with respect to ISO26262, or features
in different domains of interest that could be relevant for users in daily use, as
illustrated hereafter:
Figure 24: Domains of interest for tool requirements for daily use
Req.  Domain       Tool           Req. description                                          Link or comment
#                  applicability                                                            for more explanations

20    Export       All            The tool shall allow to export results in an XML          http://www.open-psa.org/
                                  capable to be read by tools (XML SAFE or OpenPSA)
21    Display      FTA            The tool shall allow displaying qualitative and
                                  quantitative FTA with and without safety mechanism.
22    Building     FMEA           The tool shall permit to perform quantitative FMEDA at
                                  any level of architecture (Part, HW Block, Component
                                  level, ...)
23    Building     FMEA           The tool shall permit to initiate quantitative FMEDA at
                                  HW Block level from qualitative FMEDA.
24    Building     FTA            The tool shall permit to transform a malfunction
                                  identified in qualitative FMEA in event for FTA.
25    Calculation  FMEA           As generally implemented in FTA, the tool shall
                                  calculate in quantitative FMEDA importance factor
                                  versus (SPF or Residual) contribution and (MPF,
                                  Latent).
26    Calculation  FMEA           The tool shall permit to calculate automatically FRC
                                  scale to evaluate residual risk.
27    Calculation  FMEA           The tool shall permit to apply FRC method when
                                  performing the quantitative FMEDA at Part level
28    Building     FMEA           The tool shall permit to re-assess the new effect in
                                  qualitative FMEA when a safety mechanism is
                                  implemented.
29    Display      All            The tool shall permit to display results using filtering
                                  request from user (ex. safety-related, not-safety
                                  related, SPF, Residual, MPF, Latent).
30    Debugging    FMEA           The tool shall be able to detect inconsistency between
                                  qualitative and quantitative FMEDA.
31    Debugging    All            The tool shall be able to detect inconsistency between
                                  qualitative FMEA and qualitative FTA for minimal cut
                                  sets of order 1.
32    Building     FMEA           The tool shall be able to import Bill of Material to
                                  simplify eFMEA / Quantitative FMEDA at Part level.
33    Building     FMEA           The tool shall interface eFMEA and quantitative
                                  FMEDA at Part level with failure mode database
                                  (in-house, external).
34    Building     FMEA           The tool shall be able to have a database of which
                                  safety mechanisms are selectable when performing
                                  Safety Analyses.
35    Export       All            The tool shall permit to export the safety analysis
                                  results in reports that are customizable by users.
36    Display      All            The tool shall permit to customize and rearrange
                                  displayed results easily.
37    Display      All            The tool shall permit to display relationships between
                                  different elements of the model.
38    Display      All            The tool shall permit to navigate easily between the
                                  different elements of safety analyses.
39    Debugging    All            The tool shall permit to identify malfunctions that were
                                  not considered in safety analyses.
40    Building     All            When a new malfunction is highlighted during safety
                                  analysis the upper safety analysis shall show a need
                                  to update the analysis.
41    Calculation  FTA            The FTA tool shall permit to calculate the                Used for PMHF
                                  unconditional failure intensity or failure frequency over
                                  time and get the maximum value over time.
The FTA tool shall permit to display the unconditional
42 Display FTA failure intensity or failure frequency over time.
Used for PMHF
The FTA tool shall permit to model event that are
43 Building FTA periodically tested with a certain coverage and
exposure time.
The FTA tool shall permit to create our own patterns to
44 Building FTA model a HW failure covered with a certain coverage by
a safety mechanism.
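The consistency check between qualitative FMEA and qualitative FTA for minimal cut sets of order 1 can be pictured as a set comparison. The following is a minimal sketch, assuming that the qualitative FMEA marks each malfunction that leads to the top-level effect on its own, and that the qualitative FTA delivers its minimal cut sets as sets of event names. The data layout, the event names and the function name `check_order1_consistency` are hypothetical and do not describe any existing tool.

```python
def check_order1_consistency(fmea_single_point, fta_min_cut_sets):
    """Compare FMEA single-point malfunctions with FTA cut sets of order 1.

    fmea_single_point: iterable of malfunction names that, according to the
        qualitative FMEA, lead directly to the top-level effect.
    fta_min_cut_sets: iterable of minimal cut sets, each an iterable of names.
    Returns two sorted lists: names found only in the FMEA, and names found
    only as order-1 cut sets in the FTA.
    """
    fmea = set(fmea_single_point)
    # Keep only cut sets with exactly one event and extract that event.
    order1 = {
        next(iter(cut_set))
        for cut_set in map(frozenset, fta_min_cut_sets)
        if len(cut_set) == 1
    }
    return sorted(fmea - order1), sorted(order1 - fmea)


only_fmea, only_fta = check_order1_consistency(
    ["SensorA.Omission", "Ecu.Commission"],
    [
        {"SensorA.Omission"},
        {"SensorB.Omission", "Monitor.Omission"},
        {"Actuator.Error"},
    ],
)
print(only_fmea)  # ['Ecu.Commission']
print(only_fta)   # ['Actuator.Error']
```

Both returned lists being empty would indicate that the two analyses agree on the single-point malfunctions; any remaining name points to an analysis that needs to be updated.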
The aim of this section is to facilitate the use of safety analyses by defining a malfunction
ontology as a library of malfunction types to be used and exchanged within the supply chain.
This malfunction ontology can be used per engineering level or across the abstraction
views of the safety analysis, depending on the accuracy required in the safety analysis.
The first part of the proposed malfunction ontology, for the representation of the functional
architecture, is recommended to be simple. Based on the above classification, the elements
defined are:
Structure Malfunction
- Error: data Error as generic fault/failure visible at port level.
- Limp Home: Limp Home data visible at port level (see Note below).
Behaviour Malfunction
- Sensor Error: Error as internal or process fault of a sensor.
- Actuator Error: Error as internal or process fault of an actuator.
- Function Error: Error as internal or process fault of a function.
Depending on the precision required of the analysis, the definitions below from the software
domain can be used, as they relate to the functional approach from the software perspective.
This part of the ontology reflects the problem of faults and failures linked to the hardware
and software architecture. It is more precise than the elements used in the error model of the
functional safety concept and is split across the software and hardware domains.
Structure Malfunction
Software architecture domain (Function Port):
- Omission: data not delivered.
- Commission: data delivered erroneously.
- High Value: data stuck at a high value.
- Low Value: data stuck at a low value.
- Fixed Value: data stuck at a fixed value.
- Drift Value: data value is drifting.
- Latency Value: data latency.
- Limp Home Value: limp home data.
Behaviour Malfunction
Software architecture domain (Design Function):
- Process Software: process fault as wrong use of specification (calibration ...).
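The two port-level lists above (functional architecture and software architecture) can be captured as enumerations, so that a tool can offer them as a shared library of malfunction types. The sketch below is only an illustration: the class names and the `REFINEMENT` mapping from the generic functional malfunctions to the more precise software ones are assumptions of this example, since the text states only that the software-level list is more precise.

```python
from enum import Enum


class FunctionalPortMalfunction(Enum):
    """Structure malfunctions visible at port level (functional architecture)."""
    ERROR = "Error"
    LIMP_HOME = "Limp Home"


class SoftwarePortMalfunction(Enum):
    """Structure malfunctions at function ports (software architecture)."""
    OMISSION = "Omission"
    COMMISSION = "Commission"
    HIGH_VALUE = "High Value"
    LOW_VALUE = "Low Value"
    FIXED_VALUE = "Fixed Value"
    DRIFT_VALUE = "Drift Value"
    LATENCY_VALUE = "Latency Value"
    LIMP_HOME_VALUE = "Limp Home Value"


# Assumed refinement: the generic Error covers every detailed malfunction
# except the limp home value, which refines the Limp Home malfunction.
REFINEMENT = {
    FunctionalPortMalfunction.LIMP_HOME: [SoftwarePortMalfunction.LIMP_HOME_VALUE],
    FunctionalPortMalfunction.ERROR: [
        m for m in SoftwarePortMalfunction
        if m is not SoftwarePortMalfunction.LIMP_HOME_VALUE
    ],
}


def refine(malfunction):
    """Return the detailed software malfunctions assumed for a functional one."""
    return list(REFINEMENT[malfunction])
```

With such a mapping, an analysis started at the functional level with a generic Error could be detailed later without renaming the malfunctions exchanged between partners.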
These composite malfunctions shall be filled with the following detailed malfunctions:
- Internal Drift: Internal drift of the E/E value delivered.
- Internal Latency: Latency of E/E value delivered.
- Internal Open Circuit: No E/E value delivered.
- Internal Short Circuit to Ground: E/E Fixed Ground value delivered.
- Internal Short Circuit to Battery: E/E Fixed Battery value delivered.
It can be noticed that Core and Peripheral Error can be specialized at the level of
implementation.
The related ontology for implementation described below is limited to the malfunctions used
in the error model for AUTOSAR safety analysis, that is, malfunctions at interfaces and
malfunctions visible from the execution infrastructure platform.
They represent the malfunctions visible at the port level of the operations and services
provided by the AUTOSAR infrastructure (the RTE level for communication) and the visible
malfunctions of the resources that execute software (the microcontroller core for
computing).
Structure Malfunction
Computing anomalies:
- Omission: software unit is not executed at all.
- Commission: software unit is executed too often.
- Too Late: execution of software unit terminates too late.
- Too Early: execution of software unit terminates too early.
- Memory Error: software unit access to memory fails.
- Execution Error: execution of software is not performed correctly.
Communication anomalies:
- Omission: application environment omits the provision of incoming data.
- Commission: application environment provides incoming data too often.
- Too Late: data arrives too late.
- Too Early: data arrives too early.
- Value Error: received data is manipulated by the application environment.
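The timing-related communication anomalies listed above can be told apart by comparing the observed arrivals with an expected reception window. The following minimal sketch assumes one expected data item per cycle and a window given by an earliest and a latest acceptable arrival time; the function name `classify_arrivals` and its parameters are hypothetical. Value Error is not covered, because it cannot be detected from timing alone.

```python
def classify_arrivals(arrival_times, earliest, latest):
    """Classify the arrivals observed in one expected reception cycle.

    arrival_times: list of arrival times observed in the cycle.
    earliest, latest: bounds of the acceptable reception window.
    Returns the anomaly name, or None when exactly one arrival is on time.
    """
    if not arrival_times:
        return "Omission"        # no data was provided at all
    if len(arrival_times) > 1:
        return "Commission"      # data was provided too often
    arrival = arrival_times[0]
    if arrival < earliest:
        return "Too Early"
    if arrival > latest:
        return "Too Late"
    return None


print(classify_arrivals([], 9.0, 11.0))          # Omission
print(classify_arrivals([12.5], 9.0, 11.0))      # Too Late
print(classify_arrivals([10.0], 9.0, 11.0))      # None
```

The same classification could be applied to the computing anomalies by replacing arrival times with the termination times of a software unit.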
10 Conclusions
In this deliverable, we did not claim to answer all the questions that could arise from the
exchange between partners and that are not clearly explained in ISO26262.
Nevertheless, we have tried to propose a complete safety development cycle from system
design to detailed design, with some possible ways to perform safety analyses.
We have also shown that it is possible to calculate the architectural metrics at different
architectural levels, which is not obvious to practitioners because it is not clearly
highlighted in ISO26262.
Moreover, we have illustrated via an example that the system responsible can mix residual
risk evaluation results coming from the different methods (PMHF, FRC), as proposed by
ISO26262.
We then provide a first list of requirements that can be addressed to tool vendors from the
SAFE project. With these requirements implemented, we can expect to cover all safety
analyses proposed in the global safety analysis process and also to improve the usability
of such tools in daily use.
The next step is to generate the FMEA and FTA views from the automatic execution of the
error model in the SAFE tool, with possible quantification for random HW failures.
12 References
[1] International Organization for Standardization: ISO 26262 Road vehicles - Functional
safety. Part 1 to 9 (2011)
[2] International Organization for Standardization: ISO 26262 Road vehicles – Functional
safety. Guideline Part 10 (2012)
[3] SAFE Deliverable D311b: Final proposal for extension of meta-model for hazard and
environment modeling ; http://www.safe-project.eu/SAFE-Publications/SAFE_D3.1.1.b.pdf
[4] VDA Volume 4 Chapter: Product- and Process-FMEA
[5] IEC 60812 ed.2.0, Analysis techniques for system reliability – Procedure for failure
mode and effects analysis (FMEA). (2006)
[6] SAE J1739, Potential Failure Mode and Effects Analysis in Design (Design FMEA),
Potential Failure Mode and Effects Analysis in Manufacturing and Assembly Processes
(Process FMEA). (2009)
[7] IEC 61025 ed.2.0, Fault Tree Analysis. (2006)
[8] NUREG-0492: Fault Tree Handbook from US Nuclear Regulatory Commission. (1981)
[9] SAE ARP4761: Guidelines and Methods for Conducting the Safety Assessment Process
on Civil Airborne Systems and Equipment. (1996)
[10] MIL-STD-1629A: Military Standard, Procedure for Performing a Failure Mode, Effect
and Criticality Analysis. (1980)
[11] Experience with the second method for EPS hardware analysis: Evaluation of each
cause of safety goal violation due to random hardware failures; K.Svancara &
W.Forbes & J.Priddy & M.Kudanowski & T. Lovric & J. Miller; VDA Automotive Sys
conference May 2012.
[12] Advantages of the alternative method for random hardware failures quantitative
evaluation – A practical survey for EPS, K.Svancara & W.Forbes & J.Priddy &
M.Kudanowski & T. Lovric & J. Miller, SAE conference April 2013.
[13] Adler, N., Otten, S., Cuenot, P., and Müller-Glaser, K., "Performing Safety Evaluation
on Detailed Hardware Level according to ISO 26262," SAE Int. J. Passeng. Cars –
Electron. Electr. Syst. 6(1):102-113, 2013, doi:10.4271/2013-01-0182.
[14] IEC 61508 standard: Functional safety of electrical/electronic/programmable electronic
safety-related systems, Parts 6, 2010 (International Electrotechnical Commission,
Geneva, Switzerland).
[15] New insight into the average probability of failure on demand and the probability of
dangerous failure per hour of safety instrumented systems, F Innal & Y Dutuit & A
Rauzy & J-P Signoret, Proc. IMechE Vol. 224 Part O: J. Risk and Reliability.
[16] SAFE Deliverable D322a : Proposal for extension of Meta model for hardware
modeling ; http://www.safe-project.eu/SAFE-Publications/SAFE_D3.2.2.pdf
[17] SAFE Deliverable D331a : Proposal for extension of metamodel for error failure and
propagation analysis ; http://www.safe-project.eu/SAFE-Publications/SAFE_D3.3.1.a.pdf
13 Acknowledgments
This document is based on the SAFE project in the framework of the ITEA2, EUREKA
cluster program Σ! 3674. The work has been funded by the German Ministry for Education
and Research (BMBF) under the funding ID 01IS11019, and by the French Ministry of the
Economy and Finance (DGCIS). The responsibility for the content rests with the authors.