3. Conachey2008_1
Originally presented at the 2005 SNAME Marine Technology Conference & Expo. Reprinted with the permission of the
Society of Naval Architects and Marine Engineers (SNAME).
Material originally appearing in SNAME publications cannot be reprinted without written permission from the Society,
601 Pavonia Ave., Jersey City, NJ 07306.
ABSTRACT
This paper discusses the technical background of the American Bureau of Shipping (ABS) “Guide for
Survey Based on Reliability-centered Maintenance” to improve reliability for vessels’ machinery
systems and receive credit towards certain machinery survey requirements. Risk assessment
techniques and RCM analysis are used to provide a process to optimize maintenance tasks and achieve
optimal reliability. A process for spare holding requirements incorporating risk is discussed. A
sustainment process was developed so the operator can keep the preventative maintenance tasks
current as the system ages and as new failure modes or system modifications occur. The paper also
discusses the approaches taken to address the lack of quantitative data for equipment failures, the
consistency of analyses among operators, and the types of consequences and the descriptions of their severity.
Since 1978, ABS has cooperated with owners/operators on developing and implementing preventative maintenance programs. The Bureau recognized that an effective program improved machinery reliability. ABS issued its first Guide for Survey Based on Preventative Maintenance Techniques (PM Guide) in 1984, which listed the requirements for a preventative maintenance program and provided credit towards a vessel’s Special Periodical Machinery Survey for equipment enrolled in the program. Unlike the PM Guide, the RCM Guide provides owners/operators with a process to create an effective preventative maintenance program by applying risk principles and a maintenance task methodology.

[...] classification. A companion document to the RCM Guide, Guidance Notes on Reliability-centered Maintenance, was published to provide additional information related to maintenance and risk analysis (ABS 2004). This paper discusses the ABS approach taken to apply the principles of this maintenance philosophy.

OVERVIEW OF RCM PRINCIPLES
RCM is a process of systematically analyzing an engineered system to understand:
• system functions and impact of functional failures
• equipment failure modes and causes that can result in functional failures
• optimal strategy for managing potential failures, including maintenance to prevent the failures from occurring or to detect potential failures before a failure occurs, and
• spares holding requirements.
ABS requires the following analytical tools to be employed when performing the RCM analyses:
• Failure modes, effects, and criticality analysis (FMECA),
• RCM task selection flow diagram,
• Risk-based decision making tools (e.g., risk matrix).
In addition, the following system expertise is needed to successfully and efficiently perform the analysis:
• Design, engineering, and operational knowledge of the system,
• Condition-monitoring techniques, planned maintenance actions, failure finding techniques,
• Other proactive maintenance practices (e.g., lubrication).

Equipment failure basics
The RCM analysis process uses these tools and expertise to help establish the cause-and-effect relationship between equipment failures and system performance (e.g., the FMECA) and then determine an effective failure management strategy (e.g., RCM task selection). A combination of one or more equipment failures and/or human errors causes a loss of system function.
Specifically, one of the focuses of reliability improvement is to manage the equipment failures that impact system performance (e.g., losses of system function). Therefore, an understanding of the factors that influence equipment failures is needed. The following factors usually influence equipment failure:
• Design error
• Faulty material
• Improper fabrication and construction
• Improper operation
• Inadequate maintenance
• Maintenance errors
Therefore, maintenance is merely one of the many approaches to improving equipment reliability and hence system reliability. RCM analyses focus on reducing failures resulting from inadequate maintenance. In addition, during the RCM analysis process, some equipment failures may be identified as the result of maintenance errors. In these cases the results of RCM analyses may suggest improvements for specific maintenance activities, such as improving the manner in which the maintenance procedures are carried out, improving worker performance through additional training or required skill level, or adding quality assurance/quality control tasks during the maintenance procedure to verify correct performance of critical maintenance tasks. Furthermore, RCM analyses may recommend design changes and/or operational improvements when equipment reliability cannot be ensured through maintenance.

Equipment failure rate and patterns
One of the key concepts of RCM is that not all equipment failures are the same; therefore, the maintenance tasks necessary to prevent failures may require different strategies in order to successfully manage them. In fact, depending on the dominant system failure mechanisms, system operation, system operating environment, and system maintenance, specific equipment failure modes exhibit a variety of failure rates and patterns.
First, let’s discuss the failure rate. The conditional failure rate, or lambda (λ), is the probability that a failure occurs during the next instant of time given that the failure has not already occurred before that time. The conditional failure rate, therefore, provides additional information about the survival life and is used to illustrate failure patterns.
For most equipment failure modes, the specific failure patterns are not known and, fortunately, detailed knowledge is not needed to make maintenance decisions. Nevertheless, certain failure characteristic information is needed to make maintenance decisions. These characteristics are:
• Wear-in failure – dominated by “weak” members related to problems such as manufacturing defects and installation/maintenance/startup errors. Also known as “burn in” or “infant mortality” failures.
• Random failure – dominated by chance failures caused by sudden stresses, extreme conditions, random human errors, etc. (i.e., failure is not predictable by time) during the “useful life” of the equipment.
• Wear-out failure – dominated by end-of-useful life issues for equipment.
These failure characteristics are best illustrated by the failure pattern identified in Figure 1. By simply identifying which of the three equipment failure characteristics is representative of the equipment failure mode, one gains insight into the proper maintenance strategy.
Understanding that equipment failure modes can exhibit different failure patterns has important implications when determining appropriate maintenance strategies. The literature has indicated there are six different failure patterns as shown in Table 1 (Nowlan/Heap 1978, Moubray 1997, and Smith 1993). We have also listed the failure characteristic(s) along with some representative examples:
• Pattern A – Bathtub Curve – Wear-In, Random Failure, Wear-Out
• Pattern B – Traditional Wear-Out – Random Failure, Wear-Out
• Pattern C – Gradual Rise with no Distinctive Wear-Out Zone – Random Failure
• Pattern D – Initial Increase with a Leveling Off – Random Failure
• Pattern E – Random – Random Failure
• Pattern F – Infant Mortality – Wear-In, Random Failure
Those patterns that do not have distinctive wear-out regions (i.e., patterns C through F) may not benefit from maintenance tasks of rebuilding or replacing equipment items. There may actually be an increase in failures as a result of infant mortality (pattern F) and/or human errors during maintenance tasks. If an equipment failure mode exhibits a wear-out pattern, rebuilding or replacing the equipment item may be an appropriate strategy.
Finally, a basic understanding of failure rate helps in determining whether maintenance or equipment redesign is necessary and provides insight into the frequency of maintenance tasks. Once one begins to understand how equipment fails and its failure rate and pattern, an understanding of maintenance task types and their relationship to the failure characteristics is needed.

Overview of Maintenance Task Types
One of the primary objectives of the RCM analysis is to define a set of proactive maintenance tasks needed to manage potential equipment failures that can impact critical system performance. These tasks can manage these potential failures by:
• Detecting onset of failure with sufficient time to allow corrective action before the failure occurs, e.g., condition monitoring tasks,
• Preventing the failures before they occur, which are referred to in the RCM Program as planned maintenance tasks,
• Discovering and correcting hidden failures before they impact system performance, e.g., failure finding tasks,
• Applying operational restrictions or some other action, e.g., any applicable and effective task.
In addition, the RCM analysis might indicate the failure does not warrant any proactive maintenance and run-to-failure is acceptable. Also, RCM analyses should include routine servicing tasks to ensure the assumed failure rate and failure pattern are valid (e.g., the failure rate and pattern for an un-lubricated bearing are drastically different from those of a lubricated bearing).
[Figure 1: Failure pattern. Labels recoverable from the original: Burn in, Wear out, Time.]
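The relationship between the conditional failure rate and the three failure characteristics (wear-in, random, wear-out) can be illustrated with a short sketch. It assumes a Weibull life distribution, which is a common reliability model but is not prescribed by the text above; the function names and parameter values are hypothetical and chosen only for illustration.

```python
def weibull_hazard(t: float, shape: float, scale: float = 1.0) -> float:
    """Conditional failure rate lambda(t) = f(t) / (1 - F(t)) for a Weibull model.

    Assumption: equipment life follows a Weibull distribution. For that model
    the ratio simplifies to (shape / scale) * (t / scale) ** (shape - 1).
    """
    if t <= 0 or shape <= 0 or scale <= 0:
        raise ValueError("t, shape and scale must be positive")
    return (shape / scale) * (t / scale) ** (shape - 1)


def failure_characteristic(shape: float) -> str:
    """Map the Weibull shape parameter to the failure characteristic it mimics."""
    if shape < 1:
        return "wear-in"    # failure rate decreases with time
    if shape == 1:
        return "random"     # failure rate is constant
    return "wear-out"       # failure rate increases with time


if __name__ == "__main__":
    for shape in (0.5, 1.0, 3.0):  # illustrative values only
        early = weibull_hazard(0.5, shape)
        late = weibull_hazard(2.0, shape)
        print(f"shape={shape}: {failure_characteristic(shape)}, "
              f"lambda(0.5)={early:.3f}, lambda(2.0)={late:.3f}")
```

Under this assumed model, a falling rate corresponds to the burn-in region, a flat rate to the useful life, and a rising rate to the wear-out region of the failure pattern.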
modeled as a hierarchy for the purposes of performing the FMECA. An example is shown in Figure 2. For consistency, ABS has named these hierarchy levels, in descending order, as follows: functional group, system, sub-system, equipment item, and component. A component is defined as:
• the lowest level that can be identified for its contribution to the overall functions of the functional group,
• being identifiable for its failure modes, and
• the most convenient physical unit that can be considered for the preventative maintenance plan.
The system block diagrams serve as an aid to visualize the hierarchical structure and identify the various system functions as shown in the example in Figure 3. Then, the various functional failures associated with each function are identified.
Fortunately, the arrangement of the components within many systems, such as fuel oil, cooling water, and lubricating oil, is similar among vessels because of classification and statutory requirements. Accordingly, ABS has created “templates” for participants in the ABS RCM Program, which are partially completed FMECAs for various systems subject to our Rules and Special Periodical Machinery Survey requirements on vessels. These systems are primarily associated with propulsion, directional control, vessel safety and cargo handling. The templates permit those performing the analysis to reduce the analysis time and ensure consistency of analyses submitted to ABS.

Step No. 5 – Conduct FMECA
For Step No. 5, ABS requires the application of a bottom-up FMECA. An example format is shown in Table 2. We selected the bottom-up format instead of the top-down format because, during development of the preventative maintenance plan for each system component, there is less of a chance a component will be omitted. The top-down format is useful when designing new systems to determine the risk associated with various functional failures. If one chooses to apply the top-down format for existing systems, it is necessary to identify all system functions; otherwise, it is likely that a component that contributes to that system function may be omitted.
To ensure consistency among the analyses received, we have listed suggested failure modes for each component in the developed templates. For some operating modes/contexts these failure modes may not be applicable, and this can be indicated in the analysis. In some cases the failure modes listed may not have been considered by the analysis team. ABS has provided a list of suggested failure modes for ten groups of equipment and components in Appendix 2 of the RCM Guide.
For consistency among analyses received, ABS decided to require the End Effect descriptions to be the effect on the functional group(s). A consolidated example format from the RCM Guide is shown in Table 3. Severity is defined for at least four levels: no effect, two progressive functional degradations, and complete loss of function. Four levels is the minimum to ensure meaningful risk ranking. An additional severity level or two may be considered, but greater care is necessary in severity level definition. The traditional approach is to define severity levels based on an order of magnitude in economic terms (e.g., $10,000, $100,000, etc.). Some would consider the approach in the RCM Guide as determining the intermediate effect, not the end effect. However, as part of the certification process it is straightforward to determine failure effects on functional groups when a component failure occurs. Attempting to determine the ultimate end effects from a vessel’s complete loss of propulsion, such as grounding, and considering other end effects such as pollution caused by rupture of the fuel oil storage tanks, loss of revenue, etc., is much more subjective and therefore difficult to evaluate. Such end effects would be dependent on the operating mode of the vessel, geographic location, etc. If desired, the owner/operator can extend the analysis to assess business risks. A brief example of this approach along with an estimate of risk reduction is provided in Appendix 1. However, the RCM Program does not require analyses addressing business risks to be submitted because of the proprietary business information they will likely include.
The other element of risk is the likelihood or frequency of the failure mode. Many efforts have been made and are currently underway to collect failure rate data for machinery. Obtaining quantitative failure data is problematic: published data is scant, reliability databases are available only to subscribing members of an industry, the manner of data collection is unknown, failure modes are not identified, etc. ABS decided to take a qualitative approach by recommending frequency ranges as shown in Table 4. As the FMECA is developed, we believe the team members can estimate failure mode frequencies based on events occurring within their operating fleet or collective memory. We expect that with time, as failure data are collected electronically in maintenance and repair software, quantitative data can be determined and compared with the estimated qualitative data in the initial analysis.

Step No. 6 – Select Failure Management Tasks
There are several RCM task selection flow diagrams in the literature (Ministry of Defence (UK) 1999, Moubray 1997, Naval Air Systems Command (USA) 2001, Society of Automotive Engineers 1999). ABS considered all of them and adapted the appropriate features from them for application to the marine industry. The RCM Task Selection Flow Diagram is shown in Figure 4.
The ABS task selection process is similar to other selection processes with respect to requiring one-time changes for failure modes with the highest risk, and a run-to-failure strategy for failure modes with the lowest risk. For failure modes with risks between the extremes, maintenance task types are considered in the following order: condition monitoring, planned maintenance, combination of condition monitoring and planned maintenance, any applicable and effective task, or one-time change. For hidden failure modes, failure finding tasks are specified.
Unlike other published task selection flow processes, we have included additional procedures as shown in the continuation of Figure 4. These include a procedure to specify a maintenance task(s) to address all causes associated with the failure mode under evaluation. The risk is re-evaluated for the selected maintenance tasks and any one-time changes associated with a failure mode. If the risk level meets the acceptance criteria, the next failure mode is evaluated. If not, the maintenance tasks and one-time changes are re-evaluated to seek a reduction in the risk to an acceptable level. The acceptance criteria would include: a reduction in, or at least the same level of, risk compared to no maintenance or present maintenance tasks; and the failure mode does not result in the highest risk occurring.
[Figure 2: Example hierarchy. Functional groups recoverable from the original: Propulsion, Navigation & Communications, Maneuvering, Electrical, Vessel Service.]
[Figure 3: Example system block diagram. Labels recoverable from the original include Control Systems, Barring Interlock, Air Starting System, Governor, Crankcase Vapor System, Central Cooling Water System, Instrumentation & Alarms, and Cylinder Lubricating Oil.]
Table 3: Example severity level definitions

Severity Level | Example Descriptors for Severity Level | Directional Control, Propulsion, etc. | Explosion/Fire | Loss of Containment | Safety (Note 1)
--- | --- | --- | --- | --- | ---
1 | Minor, Negligible | Function is not affected, no significant operational delays. Nuisance. | No damage to affected equipment or compartment, no significant operational delays. | Little or no response necessary | Minor impact on personnel/No impact on public
2 | Major, Marginal, Moderate | Function is not affected, however, failure detection/corrective measures not functional. OR Function is reduced, resulting in operational delays. | Affected equipment is damaged, operational delays | Limited response of short duration | Professional medical treatment for personnel/No impact on public
3 | Critical, Hazardous, Major, Significant | Function is reduced, or damaged machinery, significant operational delays | An occurrence adversely affecting the vessel’s seaworthiness or fitness for service or route | Serious/significant commitment of resources and personnel | Serious injury to personnel/Limited impact on public
4 | Catastrophic, Critical | Complete loss of function | Loss of vessel or results in total constructive loss | Complete loss of containment. Full scale response of extended duration to mitigate effects on environment. | Fatalities to personnel/Serious impact on public

Notes:
1 Safety losses are not intended to be compared to other losses to determine monetary equivalency.
Likelihood | Frequency range
--- | ---
Improbable | Fewer than 0.001 events, or < 1 event per 1000 vessels per year
Remote | 0.001 to 0.01 events, or 1 event per 100 to 1000 vessels per year
Occasional | 0.01 to 0.1 events, or 1 event per 10 to 100 vessels per year
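As a hedged illustration of how a team might map an estimated failure frequency onto the qualitative ranges listed above, the sketch below classifies a frequency expressed in events per vessel per year. The function name is hypothetical, the treatment of values exactly on a boundary is an assumption, and only the three ranges listed above are encoded; higher frequencies are reported without a category name because their ranges are not given here.

```python
def likelihood_category(events_per_vessel_year: float) -> str:
    """Classify an estimated failure frequency into a qualitative likelihood.

    Assumption: a value exactly on a boundary (e.g. 0.001) is assigned to the
    higher category; the ranges above do not say which side it belongs to.
    """
    if events_per_vessel_year < 0:
        raise ValueError("frequency cannot be negative")
    if events_per_vessel_year < 0.001:
        return "Improbable"
    if events_per_vessel_year < 0.01:
        return "Remote"
    if events_per_vessel_year < 0.1:
        return "Occasional"
    return "above Occasional (range not listed above)"


if __name__ == "__main__":
    # One event observed across a hypothetical fleet of 500 vessels in a year.
    fleet_size = 500
    print(likelihood_category(1 / fleet_size))
```

The fleet-based estimate in the usage example mirrors the "1 event per N vessels per year" phrasing of the ranges: one event per 500 vessels per year is 0.002 events per vessel per year.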
[Figure 4: RCM Task Selection Flow Diagram. Decision points recoverable from the original include: “Is the failure mode risk in the highest or lowest risk categories?”, “Is there high confidence in the failure mode risk ranking?”, “Is there a task(s) that is applicable and effective?” and “Is there a failure finding task that is applicable and effective?”. Recoverable actions include: one-time change required (highest risk), specify run-to-failure strategy (lowest risk), select a cause for evaluation, specify planned maintenance at the appropriate life limit, specify failure-finding task at the appropriate interval, and reevaluate the maintenance tasks and one-time changes for the failure mode.]
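The task selection order described in Step No. 6 can be sketched in code. This is a simplified illustration and not the ABS flow diagram itself: the class and function names are hypothetical, the confidence check on the risk ranking is omitted, and falling back to a one-time change when no failure finding task applies to a hidden failure is an assumption.

```python
from dataclasses import dataclass, field
from typing import List

# Order in which task types are considered for intermediate-risk failure modes.
TASK_PREFERENCE = [
    "condition monitoring",
    "planned maintenance",
    "combination of condition monitoring and planned maintenance",
    "any applicable and effective task",
]


@dataclass
class FailureMode:
    name: str
    risk: str                      # "highest", "lowest" or "intermediate"
    hidden: bool = False
    applicable_tasks: List[str] = field(default_factory=list)


def select_strategy(fm: FailureMode) -> str:
    """Return a failure management strategy following the order in the text."""
    if fm.risk == "highest":
        return "one-time change"
    if fm.risk == "lowest":
        return "run-to-failure"
    if fm.hidden:
        # Hidden failure modes get failure finding tasks when one applies.
        if "failure finding" in fm.applicable_tasks:
            return "failure finding"
        return "one-time change"   # assumption: fallback when none applies
    for task in TASK_PREFERENCE:
        if task in fm.applicable_tasks:
            return task
    return "one-time change"       # last option in the stated order


if __name__ == "__main__":
    bearing = FailureMode("pump bearing wear", "intermediate",
                          applicable_tasks=["planned maintenance",
                                            "condition monitoring"])
    print(select_strategy(bearing))
```

In the usage example both condition monitoring and planned maintenance are applicable, so the sketch selects condition monitoring because it comes first in the stated order.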
Step No. 7 – Spare Parts Holdings
An additional feature of the ABS RCM Program is a requirement for the selection of spare parts applying risk principles. We adapted the diagram shown in Figure 5 from Figure 14.1 of NES 45 (Ministry of Defence (UK) 1999). We have shown the spare parts decision diagram with an example of determination of the spare parts for a fuel oil supply pump. As with the FMECA, the operating context of the equipment is an important factor in determining spare parts holdings. In the example, the two supply pumps are operated alternately on a weekly basis so that after a period of time, both pumps will have roughly the same number of operating hours. The example indicates that holding a bearing replacement kit onboard can reduce the risk associated with the vessel being out of service because of two inoperable pumps.
If the operating context were changed to running one pump until maintenance was required, then operating the spare pump until it is shut down for maintenance, a different conclusion could be reached. In this case, since the duty pump will have more operating hours than the standby pump, the vessel operator would have cause to believe that the frequency of the standby pump being inoperable at the time the duty pump is shut down is much lower than in the example above. The higher availability of the standby pump could be confirmed by satisfactory failure finding tests over a “long” period of time. For this case, provided spare parts can be delivered to the vessel “quickly”, ordering spare parts instead of holding them onboard may be an acceptable spare parts strategy.
[Figure 5: Spare parts decision diagram. Labels recoverable from the original: Yes, No, Revise/Review RCM Tasks.]
Example Operating Context and Analysis. A Fuel Oil piping system is provided with two fuel oil
supply pumps arranged in parallel redundancy. Each pump is sized so as to supply heavy fuel oil to
the main propulsion engine and two of the three diesel generator engines operating at their maximum
continuous rating. The pumps are operated as follows: the No. 1 pump is operated for one week at a
time with the No. 2 pump on standby. After one week, the No. 1 pump is secured and put on standby
and the No. 2 pump is operated for one week. Anticipated annual service hours for both pumps are the
same.
Step No. 8 – Sustainment Process
Any successful maintenance program needs to be dynamic to address modifications to systems and their respective equipment and the effects of aging for the life of the machinery. A process for providing feedback is necessary and is referred to as RCM sustainment.
The objective of the sustainment process is to:
• Continually monitor and optimize the current maintenance program
• Delete unnecessary requirements
• Identify adverse failure trends
• Improve overall efficiency and effectiveness of the RCM and maintenance programs
ABS has listed several sustainment tools in the RCM Guide as an aid to the vessel owner/operator when conducting the sustainment process. These are:
• Trend analysis
• Maintenance requirements document reviews
• Task packaging reviews
• Age exploration tasks
• Failures
• Relative ranking analysis
• Other activities
For example, in the case of unexpected machinery failure, ABS would recommend use of the Failures tool based on Figure 5-5 of Naval Air Systems Command (USA), Guidelines for the Naval Aviation Reliability-centered Maintenance Process, NAVAIR 00-25-403 (Naval Air Systems Command (USA) 2001). This process is illustrated in Figure 6. A root cause analysis is performed first to develop an understanding of the failure and includes these steps:
• Identifying the failure or potential failure
• Classifying the event and convening a trained team suitable for addressing the issues posed by this event
• Gathering data to understand how the event happened
• Performing a root cause failure analysis to understand why it happened
• Generating corrective actions to keep it (and similar events) from recurring
• Verifying that corrective actions are implemented
• Putting all of the data related to this event into an information system for trending purposes
The failure may be addressed by corrective actions for which an RCM analysis is not necessary. Examples of non-RCM corrective actions include technical publication changes and design changes.
The root cause analysis may reveal problems that need immediate attention. Issuing inspection bulletins, applying temporary operational restrictions and implementing operating safety measures are examples of interim actions.
The results produced from reviewing the RCM analysis will be a factor that should be considered in determining a response to the failure. It is necessary that an RCM review be part of the overall methodology. The RCM review and update, if required, will determine if changes in maintenance requirements are necessary. The review will indirectly aid in determining if corrective actions are necessary. Decisions not to update the RCM analysis should be documented for audit purposes. During the RCM review, the following questions should be addressed:
• Is the failure mode already covered?
• Are the failure consequences correct?
• Are the reliability data accurate?
• Is the existing task (or requirement for no task) adequate?
• Are the related costs accurate?
When new failure modes, or failure modes previously thought unlikely to occur, are determined to be significant, the RCM analysis is to be updated. The existing analysis for a failure mode may also be determined to be correct or inadequate. Inadequate analyses can result for any number of reasons, such as revision of mission requirements, changes to operating context or changes to maintenance procedures.
Information on failures and other unpredicted events is available from several sources, including the following examples:
• Defect reports issued by maintenance engineering or the vessel’s crew
• Defects discovered during routine vessel repairs in a shipyard
• Vendor and original equipment manufacturer reports related to inspections, rework or overhauls
• Design changes, which may be in the form of a single item change or a major system modification
• Results of tests (such as certification tests or tests performed during the course of a failure investigation or some other unrelated event) that may require RCM review and update
[Figure 6: Failures sustainment process. Labels recoverable from the original: Failure, Root Cause Failure Analysis, RCM Review, Document Results.]
Likelihood of Failure
Severity Level
Improbable Remote Occasional Probable Frequent
4 PR – 4, CR – 3 CR – 1
3 PR – 10, CR – 6 PR – 5, CR – 6 CR – 3
2 PR – 2 CR – 2
1 PR – 2 CR – 2
Risk Shade
High
Medium
Low
No.: Description:
Item | Failure Mode | Failure Char. | H/E | Effects: Local | Effects: Functional failure | Effects: End | Risk Characterization: S | Risk Characterization: CL | Risk Characterization: CR

No.: Description:
Item | Task Selection: Proposed Action(s) | Task Selection: PL | Task Selection: PR | Task Selection: Disposition

Symbol | Description
--- | ---
Failure Characteristic | Enter failure description such as wear-in, random or wear-out failure or combination
H/E | Hidden failure/evident failure
S | Severity level
CL | Current likelihood (frequency) of failure
CR | Current risk
Proposed Action(s) | Proposed maintenance for failure mode
PL | Proposed likelihood of failure applying proposed maintenance
PR | Risk after applying proposed maintenance
Disposition | Note as to whether proposed maintenance will be applied
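A worksheet row using the symbols above could be represented as a simple record, as in the following sketch. The class name, field names, the three-level risk ordering (Low, Medium, High) and the example values are illustrative assumptions; the acceptance check encodes only the two criteria stated in Step No. 6 (the proposed risk is no higher than the current risk, and the proposed risk is not the highest risk).

```python
from dataclasses import dataclass

# Assumed ordering of qualitative risk levels, lowest to highest.
RISK_ORDER = {"Low": 0, "Medium": 1, "High": 2}


@dataclass
class FmecaRow:
    item: str
    failure_mode: str
    failure_characteristic: str      # wear-in, random, wear-out or combination
    hidden: bool                     # H/E: True for hidden, False for evident
    local_effect: str
    functional_failure: str
    end_effect: str
    severity: int                    # S, 1 to 4
    current_likelihood: str          # CL
    current_risk: str                # CR
    proposed_actions: str = ""
    proposed_likelihood: str = ""    # PL
    proposed_risk: str = ""          # PR
    disposition: str = ""

    def meets_acceptance_criteria(self) -> bool:
        """True if PR is no higher than CR and PR is not the highest risk."""
        if not self.proposed_risk:
            return False             # nothing proposed yet
        pr = RISK_ORDER[self.proposed_risk]
        cr = RISK_ORDER[self.current_risk]
        return pr <= cr and self.proposed_risk != "High"


if __name__ == "__main__":
    # Hypothetical entry, for illustration only.
    row = FmecaRow(
        item="Fuel oil supply pump", failure_mode="Bearing seizure",
        failure_characteristic="wear-out", hidden=False,
        local_effect="Pump stops", functional_failure="No fuel oil supply",
        end_effect="Loss of propulsion", severity=3,
        current_likelihood="Occasional", current_risk="Medium",
        proposed_actions="Vibration monitoring",
        proposed_likelihood="Remote", proposed_risk="Low",
    )
    print(row.meets_acceptance_criteria())
```

Keeping the current and proposed values side by side in one record makes it straightforward to re-run the acceptance check whenever the proposed maintenance is revised.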
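Severity Categories

Quantity | Bound | Severity Level 1 | Severity Level 2 | Severity Level 3 | Severity Level 4
--- | --- | --- | --- | --- | ---
Current Events/yr | Upper Bound | 0.02 | 0.2 | 0.366 | 0.013
Current Events/yr | Lower Bound | 0.002 | 0.02 | 0.036 | 0.001
Projected Events/yr | Upper Bound | 0.002 | 0.02 | 0.06 | 0.004
Projected Events/yr | Lower Bound | 0 | 0.002 | 0.005 | 0.0
Frequency Reduction Events/yr | Upper Bound | 0.018 | 0.18 | 0.306 | 0.009
Frequency Reduction Events/yr | Lower Bound | 0.002 | 0.018 | 0.031 | 0.001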
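The frequency reduction figures in the table above are consistent with subtracting the projected events per year from the current events per year, bound by bound. The short sketch below reproduces that arithmetic from the tabulated values; the variable and function names are hypothetical, and rounding to three decimal places is an assumption made to match the precision of the table.

```python
from typing import Dict, List

# Events per year for Severity Levels 1 to 4, copied from the table.
current: Dict[str, List[float]] = {
    "upper": [0.02, 0.2, 0.366, 0.013],
    "lower": [0.002, 0.02, 0.036, 0.001],
}
projected: Dict[str, List[float]] = {
    "upper": [0.002, 0.02, 0.06, 0.004],
    "lower": [0.0, 0.002, 0.005, 0.0],
}


def frequency_reduction(cur: List[float], proj: List[float]) -> List[float]:
    """Current minus projected frequency for each severity level."""
    return [round(c - p, 3) for c, p in zip(cur, proj)]


if __name__ == "__main__":
    for bound in ("upper", "lower"):
        print(bound, frequency_reduction(current[bound], projected[bound]))
```

Running the sketch gives the same upper and lower frequency reduction values as the table, which confirms that each reduction row pairs the current and projected values of the same bound.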