Te 1477 Web
November 2005
IAEA-TECDOC-1477
FOREWORD
The IAEA Safety Fundamentals publication, Safety of Nuclear Installations, Safety Series No.
110, states the need for operating organizations to establish a programme for the collection
and analysis of operating experience in nuclear power plants. Such a programme ensures that
operating experience is analysed, events important to safety are reviewed in depth, and
lessons learned are disseminated to the staff of the organization and to relevant national and
international organizations.
As a result of the effort to enhance safety in operating organizations, incidents are
progressively decreasing in number and significance. This means that the amount of data
collected in accordance with international reporting requirements becomes less sufficient to
draw meaningful statistical conclusions. This is where the collection and trend analysis of low
level events and near misses can prove to be very useful. These trends can show which of the
safety barriers are weak or failing more frequently. Evaluation and trending of low level
events and near misses will help to prevent major incidents because latent weaknesses have
been identified and corrective actions taken to prevent recurrence. This leads to improved
safety and production.
Low level events and near misses, which may reach several thousand per reactor operating
year, need to be treated by the organizations as learning opportunities. A system for capturing
these low level events and near misses truly needs to be an organization-wide system in which
all levels of the organization, including contractors, participate.
It is desirable that the overall operational experience feedback (OEF) process should integrate
the lessons learned and the associated data from significant events with those of lower level
events and near misses. To be able to effectively implement a process dealing with low level
events and near misses, it is necessary that the organization have a well established OEF
process for significant events.
The IAEA wishes to thank all participants and their Member States for their valuable
contributions. The IAEA officer responsible for the preparation of this publication was
H. Werdine of the Division of Nuclear Installation Safety.
EDITORIAL NOTE
The use of particular designations of countries or territories does not imply any judgement by the
publisher, the IAEA, as to the legal status of such countries or territories, of their authorities and
institutions or of the delimitation of their boundaries.
The mention of names of specific companies or products (whether or not indicated as registered) does
not imply any intention to infringe proprietary rights, nor should it be construed as an endorsement
or recommendation on the part of the IAEA.
CONTENTS
1. INTRODUCTION
1.1. Overview
1.2. Objective
1.3. Scope
1.4. Structure
2.
2.1. Organization
2.2. Policies and guidance
2.3. Knowledge and skills
3.
4.
4.6. Corrective actions
5. INDICATORS
6.
7. PROGRAMME EFFECTIVENESS
7.1. Self-assessment
7.2. Independent assessment including international organizations
7.3. Audits
8. CONCLUSIONS
1. INTRODUCTION
1.1. Overview
It is well recognized that organizational factors have a major impact on the behaviour of
individuals and therefore on their performance in safety issues. Human errors are therefore an
indicator of the performance of an organization. Since most of the near misses and many of
the low level events are caused by human errors, they are a very valuable source for the
evaluation of organizational performance. A trend of the number of low level events may be
used as an indicator for the early detection of a change in organizational performance.
This publication is intended to complement the IAEA Safety Fundamentals on the safety of
nuclear installations [1] and other IAEA publications which address the improvement of the
safety performance of nuclear power plants [2–17]. It illustrates the advantages of using the
operating experience with low level events and near misses for the identification and
prevention at an early stage of degraded plant performance.
1.2. Objective
The objective of this publication is to provide guidance for the enhancement of operational
safety performance through the use of the low level events and near misses as an important
input to the operational experience feedback (OEF) process. In addition to improving
operational safety performance, the same methods can be applied to advance the overall
performance of the plant and organization.
This guidance is provided to assist organizations in dealing with these types of events:
not only to investigate individual items separately, but also to perform a total assessment. This
assessment should include trend analysis to assist organizations in the prevention of declining
safety performance.
1.3. Scope
This publication is intended to aid organizations in collecting, evaluating and trending low
level events and near misses. This information can be used for establishing and/or enhancing
their current system of operating experience evaluation regarding these events. In addition,
this publication can be used as a reference for regulators.
1.4. Structure
This publication contains eight sections and six appendices. The operating organization, and
the policies and knowledge necessary for an efficient use of information derived from low
level events and near misses are discussed in Section 2. Additional information on detecting
and reporting criteria on low level events and near misses is given in Section 3. Section 4
presents the process of using operating experience feedback for systematically addressing low
level events and near misses. Section 5 addresses indicators, followed by internal and external
lessons learned in Section 6. The need to verify programme effectiveness is addressed in
Section 7, which also highlights the need and possibilities for conducting self-assessments
and independent reviews of the OEF process. Conclusions for this publication are provided in
Section 8.
2.
2.1. Organization
The OEF process, which should include the low level events and near misses, should have a
structure and organization in which functions and responsibilities are clearly defined. Human
and financial resources should be adequate. Tools and equipment should be user friendly and
should allow the organization to process and analyse a possible high number of reports and
present results in a proper manner.
2.2. Policies and guidance
The OEF process that includes low level events and near misses should take into account the
national legislation, regulatory requirements and good national and international practices.
The regulatory requirements should be regarded as minimum criteria.
A clear policy should commit staff and management to an ongoing effective prevention of
events. Written guidance should exist to document the process, e.g., how internal event
information, including low level events and near misses, is processed and staff is informed of
trends and associated corrective actions. The formal integration of the OEF process into the
organization should be clear, and formal communication links should exist with internal and
external organizations.
2.3. Knowledge and skills
The OEF process staff should be sufficient in number, experienced, well qualified and capable
of effectively performing associated activities. Qualification should include excellent
knowledge of the plant (components and systems). These qualifications should also include
training in the OEF process, analysis of data from low level events and near misses, principles
of safety culture, human and organizational factors and root cause analysis.
The use of a permanent or temporary multidisciplinary group of personnel should be
considered in composing the OEF group. In this case, professionals from different areas are
qualified to investigate and derive results from reported low level events and near misses.
The staff of the organization, mainly operators and maintenance and technical support (e.g.
engineering, chemistry and radiation protection) personnel, should possess the knowledge and
skills required to identify and report low level events and near misses but also to understand
the benefit provided to the organization when the results of trends are acted upon.
3.
[FIG. 1. Figure not reproduced. Labels: event severity (significant events: few; consequential
events: several) set against reporting requirement (to international community; to national
regulator; within utility), with the lowest level marked as currently lost learning opportunities.]
In addition to showing event severity, Fig. 1 illustrates reporting requirements. The figure also
shows an existing threshold for identifying low level events and near misses. The events
below this threshold can be viewed as currently lost learning opportunities.
To better identify the threshold between significant events and low level events and near
misses, Fig. 2 illustrates the relation between an identified deficiency and its consequence.
[Figure not reproduced. Flow chart labels: identified deficiency; decision points "Impact plant
operation or safety" and "Safety related" (YES/NO); consequence levels: no consequence, low
consequence, high consequence; resulting categories: near miss, low level event, significant event.]
FIG. 2. Relationship between near misses, low level and significant events.
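
The relationship in Fig. 2 can be sketched as a small lookup. The pairing used below (no consequence with near miss, low consequence with low level event, high consequence with significant event) is an assumed reading of the figure labels, and the names `Consequence` and `classify` are hypothetical. An organization would substitute its own thresholds and decision points.

```python
from enum import Enum


class Consequence(Enum):
    """Consequence levels named in the labels of Fig. 2."""
    NONE = "no consequence"
    LOW = "low consequence"
    HIGH = "high consequence"


def classify(consequence: Consequence) -> str:
    """Map the consequence of an identified deficiency to an event category.

    Assumption: the pairing below is one reading of Fig. 2, not an official rule.
    """
    mapping = {
        Consequence.NONE: "near miss",
        Consequence.LOW: "low level event",
        Consequence.HIGH: "significant event",
    }
    return mapping[consequence]


if __name__ == "__main__":
    for level in Consequence:
        print(f"{level.value} -> {classify(level)}")
```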
3.2. Strong reporting culture
It is generally agreed that an essential element of the safe operation of nuclear power plants is
having a strong and continuously improving safety culture. The IAEA's INSAG series
publications on Safety Culture [10], Defence in Depth in Nuclear Safety [11], Basic Safety
Principles for Nuclear Power Plants [12], Management of Operational Safety in Nuclear
Power Plants [13] and Key Practical Issues in Strengthening Safety Culture [14] provide
discussions of the universal features and tangible evidence of a strong safety culture.
Those organizations that recognize the need for a strong safety culture and continually strive
to improve will generally find and correct their problems before they develop into serious
performance issues. However, it remains a difficult task to recognize, evaluate and correct
low level events or deficiencies before they deteriorate safety performance.
The management of a nuclear power plant should establish a policy and reinforce a culture of
open communication between management and plant staff. This policy should encourage staff
to continuously look for ways to minimize human error and improve the quality of work and
plant performance. Open communication means that problems are brought to light and are not
minimized. In order to make this possible, an atmosphere of mutual trust and confidence shall
be established, maintained and supported by a blame- and sanction-free culture.
In order to become proactive and maintain control of emerging problems, management must
be aware of what problems are developing. Management involvement should be visible, and
cannot stop at the identification of problems, even if minor. Only then will emergent problems
be anticipated and thus prevented by systematically examining trends and symptoms.
In the environment of a continuously improving safety culture, low level events, small
degradations and near misses are discovered and reported by all organization personnel. Plant
staff should understand and believe that the self-reporting of errors will not have negative
consequences for the reporting person, that the information gained will never be misused,
e.g. for the assessment of individuals, and that no punitive action will be taken where there was
no violation or malevolent interest.
Often, human errors are immediately corrected by the person who has committed the error.
Therefore these errors may no longer be accessible for analysis if they are not reported, and a
wealth of information is lost.
There are two major advantages in using this information:
- Since nothing serious has happened in a near miss or a low level event, a free discussion
  about the origin of the error is possible.
- The person who has committed the error may have knowledge about the causal factors.
4.
A systematic approach to the use of information on low level events and near misses is
required to ensure consistent improvement in operational safety and overall performance.
Application of this approach involves the processing of an increased number of reported
events, which includes coding and trending of data generated by the analysis of low level
events and near misses (e.g., situational context, causes, consequences, means of detection).
This analysis allows for the validation of potential trends with common characteristics and/or
underlying organizational weaknesses (see Fig. 3).
[Figure not reproduced. Labels: low level events; coding (A. Communication problems,
B. Procedures deficiency, C. Qualification of worker; bar chart scale 50 to 100); trend analysis;
proactive actions resulting from trend analysis; reduction of significant events and improved
overall safety and plant performance. Trends of low level events and near misses represent the
effectiveness of organizational programmes and processes.]
FIG. 3. Demonstration of the use of low level event and near miss data to improve the
organization through reduction of significant events and improved plant performance.
4.1. Preliminary analysis and screening
Screening of event information is undertaken to ensure that all significant safety relevant
matters are considered and that all applicable lessons learned are taken into account. Not all
events require a full root cause analysis. For low level events and near misses, categorization
(coding) of the events should be completed by a designated experienced person, without a
need for in-depth analysis. The screening process should select events for further detailed
investigation and analysis. This should include prioritization according to safety significance,
and recognition of adverse trends. Only in the case of more significant or complicated events
is additional analysis and detailed investigation necessary.
Because many of the basic causes of events contain an element of human factor
issues, event information should be collected and acted upon in a timely manner. The quality
of screening depends on the use of highly experienced and knowledgeable personnel,
including those with a specific knowledge of human performance and individual and
organizational behaviours. Individuals without the appropriate knowledge and skills might
render the data unusable or cause corrective actions to be developed for problems that may
not exist.
4.2. Event processing and coding
All events, independent of their consequences, should be coded and collected in the event
database. In order to deal efficiently with a large number of events, the event information
should be properly assessed and coded to include important facts from the event (status of the
plant, date, time, involved equipment and persons) and information about the causes of the
event. One possible, widely used coding system based on event causes is described in
Appendix III.
The coding of events, i.e. giving the correct attributes (codes) to each event record, is an
essential part of the information flow, since this step reduces the event information to the few
essential characteristics that are used for further analysis. Proper coding is the basis for a
valuable set of data. This structured information is fed into the event database, which
simplifies further analysis.
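
As an illustration of what a coded event record might look like, the following minimal sketch stores the facts listed above and counts cause codes across records. The field names and the helper `count_cause_codes` are hypothetical. The codes A, B and C follow the example categories shown in Fig. 3, and the coding system of Appendix III is not reproduced here.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class EventRecord:
    """One coded event; all field names are illustrative assumptions."""
    occurred_at: datetime          # date and time of the event
    plant_status: str              # e.g. "power operation", "outage"
    equipment: str                 # equipment involved
    cause_codes: List[str] = field(default_factory=list)  # e.g. ["A", "B"]


def count_cause_codes(events: List[EventRecord]) -> Counter:
    """Count how often each cause code appears across all event records."""
    counts: Counter = Counter()
    for event in events:
        counts.update(event.cause_codes)
    return counts


if __name__ == "__main__":
    records = [
        EventRecord(datetime(2005, 3, 1, 8, 30), "power operation", "pump", ["A"]),
        EventRecord(datetime(2005, 3, 4, 14, 0), "outage", "valve", ["A", "B"]),
    ]
    print(count_cause_codes(records))
```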
The main point of using codes for trending is to allow for the identification of specific aspects
on which the organization could act. Evaluating low level events and near misses for
organizational factors and then applying organizational factor codes allows for the detection
of organizational weaknesses through adverse trends.
Special attention should be given to ensure that events are coded consistently by different
individuals; this is known as rater reliability. Good rater reliability will minimize variation
within the data and help to provide valid analyses and appropriately focused corrective
actions.
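
One way to quantify rater reliability is Cohen's kappa, which compares the observed agreement between two coders with the agreement expected by chance. This publication does not prescribe a particular measure, so the choice of kappa and the function name below are assumptions made for illustration.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(codes_a: Sequence[str], codes_b: Sequence[str]) -> float:
    """Cohen's kappa for two raters who coded the same events in the same order."""
    if len(codes_a) != len(codes_b) or len(codes_a) == 0:
        raise ValueError("both raters must code the same non-empty set of events")
    n = len(codes_a)
    # Fraction of events on which the two raters chose the same code.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Agreement expected by chance, from each rater's code frequencies.
    freq_a = Counter(codes_a)
    freq_b = Counter(codes_b)
    expected = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one and the same code throughout
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    rater_1 = ["A", "A", "B", "B"]
    rater_2 = ["A", "A", "B", "A"]
    print(cohens_kappa(rater_1, rater_2))
```

A value near 1 indicates consistent coding. Values near 0 suggest that agreement is no better than chance and that coding guidance or training may need attention.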
4.3. Data collection system
The operational experience database should include information based on accessible sources.
The following guidance is given to develop a structure of information sources for low level
events and near misses: Data collection should include all deficiencies and/or events,
including the related organizational and human factor issues. Equipment failures and other
such deficiencies may also be reported in other systems, such as a work management system
for maintenance. If they are reported in another system, these systems should be linked
together. For example, the OEF database could have related fields that reference other data
collection systems. All data collection systems within the plant should use a consistent set of
codes. Additional qualities of a good data collection system are described in Appendix IV.
4.3.1.
- reports and data from operation, maintenance, testing, quality assurance, etc.;
- event reports provided by other plants in the same country, from countries with plants
  that have the same vendor, and at international levels from organizations such as the
  World Association of Nuclear Operators (WANO);
- experiences of other utilities with their safety programmes (e.g. quality assurance,
  ageing, surveillance);
A single low level event or near miss is not safety significant. An accumulation of low level
events in the same area or with a similar pattern, however, may indicate a lack of or a failure
in a programme. While the event itself is insignificant, the deficiency in the programme in
question may be a more serious issue, which has to be analysed and corrected. Such a failure
would not be detected by single low level events; only the accumulation and/or a trend may
indicate a need for appropriate action. An effective process must have management
involvement and be proactive instead of reactive in nature.
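
To make "accumulation and/or a trend" concrete, one option is a control chart on the number of low level events per period in a given area. The sketch below uses the textbook c-chart limits (mean plus or minus three times the square root of the mean). This is an assumption, since the formulas of Appendix V are not reproduced here, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def c_chart_limits(baseline_counts: Sequence[int]) -> Tuple[float, float, float]:
    """Return (lower limit, centre line, upper limit) from baseline period counts."""
    mean = sum(baseline_counts) / len(baseline_counts)
    spread = 3.0 * math.sqrt(mean)
    # Counts cannot be negative, so the lower limit is clipped at zero.
    return max(0.0, mean - spread), mean, mean + spread


def flag_out_of_control(counts: Sequence[int], lower: float, upper: float) -> List[int]:
    """Return the indices of periods whose count falls outside the control limits."""
    return [i for i, c in enumerate(counts) if c < lower or c > upper]


if __name__ == "__main__":
    lower, centre, upper = c_chart_limits([9, 10, 11, 10])
    print(flag_out_of_control([10, 21, 8, 0], lower, upper))
```

A period far below the lower limit deserves the same attention as one above the upper limit, since it may point to a drop in reporting rather than a real improvement.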
4.4.1.
The organization can accomplish additional analysis by correlating data and identifying a
clustering of certain causes, consequences and detection mechanisms (see Fig. 3) in the
different areas. Declining trends indicate that there might be a generic problem that has not
been detected. One effective tool to support trend analysis is the barrier analysis technique.
Appendix V contains several techniques for analysing data, such as Pareto analysis, Pareto
charting, confidence factors and control charts. These techniques are not the only available
methods, but they represent proven techniques for analysing data. The mathematical formulas
presented might be too stringent and limiting, depending on the circumstances of the trend
analyses. Trend analysts are encouraged to discuss and verify any questionable or doubtful
trend results and conclusions with the appropriate knowledgeable parties. This discussion may
identify any special causes or contributors. The results of such an analysis are:
- on a physical barrier level: the identification of failed or missing barriers that were not
  recognized as important in the analysis of individual events;
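
As a sketch of the Pareto analysis mentioned above, the following code ranks cause codes by frequency and picks the smallest leading group that accounts for a chosen share of all coded causes. The 80 per cent threshold and the function names are assumptions for illustration, not values taken from Appendix V.

```python
from typing import Dict, List, Tuple


def pareto_table(cause_counts: Dict[str, int]) -> List[Tuple[str, int, float]]:
    """Rank causes by count and attach the cumulative percentage of the total."""
    total = sum(cause_counts.values())
    # Sort by descending count; ties are broken alphabetically for stable output.
    ranked = sorted(cause_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    table = []
    running = 0
    for code, count in ranked:
        running += count
        table.append((code, count, 100.0 * running / total))
    return table


def vital_few(cause_counts: Dict[str, int], threshold_percent: float = 80.0) -> List[str]:
    """Return the leading causes that together reach the threshold share."""
    selected = []
    for code, _count, cumulative in pareto_table(cause_counts):
        selected.append(code)
        if cumulative >= threshold_percent:
            break
    return selected


if __name__ == "__main__":
    counts = {"A": 50, "B": 30, "C": 15, "D": 5}
    print(pareto_table(counts))
    print(vital_few(counts))
```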
Basically a trend analysis shows the strength of the concept of defence in depth, and the care
that is taken to keep the concept of defence in depth effective. The following information is
useful for trending:
- the means and the persons through which latent failures were detected (this allows the
  identification of groups of persons who are specifically sensitive to certain types of
  failures, and the identification of effective detection means) [18];
- steps in the work process (this may give an indication of steps in the work process that
  need a better surveillance or pre-job briefing) [19]; and
- information related to organizational factors that may be specific to the plant and that
  may not be listed in the coding list in Appendix III.
Appendix I contains case studies of the evaluation of operating experience illustrating the
necessity and usefulness of total impact assessment. One important step is the application of
an appropriate trending programme. One example of such a trending programme including a
categorization system is given in Appendix III, illustrated with examples of low level events
and near misses.
Corrective actions resulting from event analysis¹ and trending should be defined and
implemented, and should address not only the specific causes of particular events but also
common problems identified by trend analysis. The final stage of any OEF process should be
assessment of the effectiveness of the corrective actions, which is done in accordance with
established criteria. Recurrence of events is an important indicator in the evaluation of the
OEF process effectiveness.
Figure 4 provides detailed guidance for the screening, investigation and trend review
necessary for a comprehensive process dealing with the information captured through the
operating experience feedback programme.
4.5. Cause analysis
The screening process determines events that need a deeper analysis. The IAEA Safety
Requirements for the operation of nuclear power plants [2] state in para. 2.21 that "Operating
experience at the plant shall be evaluated in a systematic way. Abnormal events with
significant safety implications shall be investigated to establish the direct and root causes. The
investigation shall, where appropriate, result in clear recommendations to the plant
management, which shall take appropriate corrective action without undue delay. Information
resulting from such evaluations and investigations shall be fed back to the plant personnel."
Effective event analysis results in the determination of causes and failed barriers. There are
many well known methods for determining the causes of an event. The
following paragraphs provide short descriptions of some of these methods.
The analysis of any event should be performed by an appropriate method. It is common
practice that organizations regularly involved in the evaluation process use standard methods
to achieve a consistent approach for the assessment of all events. These standard methods
normally make use of different techniques. Each technique may have its particular advantages
for cause analysis, depending on the type of failure or error. Therefore, it is not possible to
recommend any single technique. The use of one or a combination of techniques in event
analysis should ensure identification of the relevant causes and contributing factors² which aid
in the development of effective corrective actions.
¹ Event analysis is the process that is used to identify the root causes and contributing factors. It is an analysis of
an event or condition to determine what events and conditions led to the outcome, as well as a determination of
how to prevent an undesired outcome and obtain the desired result.
² A contributing factor is an event or condition that is not directly responsible for the problem but whose
existence has complicated the problem or made the consequences more severe than if only the root cause had
existed.
[FIG. 4. Figure not reproduced. Flow chart labels: internal information and external information;
operating experience; "Initial screening requires further investigation?" (yes/no); database;
detecting trends; understanding event and implications; "In-depth analysis necessary?" (yes/no);
perform appropriate analysis; "Criteria/barriers exceeded?" (yes/no); generic implications,
corrective actions.]
4.5.1.
This method of event analysis is performed by describing the problem, asking the question
why? and asking the question again of each successive answer until the question can no
longer be answered meaningfully (endpoints are reached). Using cause and effect analysis,
broken barriers are identified and analysed. If displayed in a flow chart, the results proceed
from right to left in reverse chronological order.
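
The repeated "why?" questioning can be sketched as a walk through a tree of recorded answers, stopping at causes for which no further answer exists. The data structure, the function name and the example causes below are all hypothetical. A real analysis would record its answers in whatever form the organization's procedure requires.

```python
from typing import Dict, List


def find_endpoints(why_tree: Dict[str, List[str]], problem: str) -> List[str]:
    """Return the causes for which no further 'why?' answer is recorded."""
    endpoints = []
    stack = [problem]
    seen = set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # guard against a cause that is listed more than once
        seen.add(node)
        causes = why_tree.get(node, [])
        if causes:
            stack.extend(causes)    # keep asking "why?" of each answer
        else:
            endpoints.append(node)  # the question can no longer be answered
    return sorted(endpoints)


if __name__ == "__main__":
    # Hypothetical example: each key is an effect, each value lists its causes.
    tree = {
        "valve left in wrong position": ["step skipped in procedure"],
        "step skipped in procedure": [
            "procedure layout unclear",
            "pre-job briefing omitted",
        ],
    }
    print(find_endpoints(tree, "valve left in wrong position"))
```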
4.5.2. Change analysis
This method of event analysis compares a successfully performed activity to one that was
unsuccessful. The method consists of developing a matrix to address who, what, when, where,
and the associated conditions that existed when the problem occurred. Then these elements, as
they existed before the problem occurred, are compared to those after its occurrence. Known
differences can be identified and evaluated to identify causes of the problem. This method
also is used to provide input for a cause and effect or event and causal factors analysis.
4.5.3.
Events and causal factors charting was developed as a way to investigate accidents. The
primary purpose of this method is to provide the entire background behind a problem.
Answers to who, what, when, where, why and how are developed in pictorial form using a
variety of symbols. The method consists of developing a sequence diagram using event,
causal factor and condition symbols that describe chronologically how the problem developed
and occurred.
4.5.4.
The HPES process addresses the many ways in which human performance factors affect
personnel when their actions cause, or contribute to, a problem. Selection of the correct causal
factors for a human performance problem is dependent upon determining the internal and
external factors that affected the behaviour of the performer. The method is implemented by
completing a number of survey forms, which inquire and categorize human performance
problem conditions, causal factors and contributing factors. The survey forms by themselves
are not a stand-alone method. The survey generally is used to identify conditions that can be
input into cause and effect analysis or an event and causal factors chart. This investigative
method was developed by the Institute of Nuclear Power Operations (INPO).
4.5.5. Task analysis
Task analysis is a tool that can be used to clearly identify all facets of a particular problem
related activity. Conducting a task analysis typically includes re-enacting an unsuccessful
activity step by step to determine why the activity failed. Individual tasks that were performed
are analysed either on paper or by observing a re-enactment of the task, preferably by those
who were involved in the original problem. Task analysis normally is performed to provide
input for cause and effect or event and causal factors analysis.
4.5.6. Barrier analysis
A method closely connected to the concept of defence in depth and specially designed to
analyse and evaluate failed barriers is barrier analysis, which is described below.
4.5.6.1.
Barriers are designed and established to prevent the propagation of an unexpected, undesired
situation into a more severe situation. The concept of defence in depth is the placement of
multiple barriers to protect personnel, equipment and the environment, and to enhance the
safety and performance of human-machine systems. Barriers can be either physical or
administrative. Physical barriers, such as system interlocks, locked doors and valves, and
automatic protective systems, prevent inappropriate actions by design. They will always work
unless they are misused, bypassed or allowed to degrade. Administrative barriers can be
grouped into hard administrative barriers which enforce a desired behaviour, such as
procedures, checklists and administrative controls, and soft administrative barriers which
promote the desired behaviour, such as training, communication and supervision.
Administrative barriers generally work by requiring or promoting desired actions or by
discouraging undesired actions. They can also be used to detect or compensate for
inappropriate actions or conditions. Because of the concept of defence in depth, the failure
of one barrier usually has no consequences. For an event to occur, it is necessary that
several barriers fail in a series path (Fig. 5). If one barrier remains intact, it will prevent or
mitigate the consequences of the event.
4.5.6.2.
Simple logic tells us that every causal factor that affected, or allowed another contributory
causal factor to affect, a safety target, is an indication of a barrier that did not exist or failed.
Barrier analysis can therefore be of assistance in conducting causal factor analysis. When
conducting root cause analysis, we are likely to focus on the last barrier and not on all barriers
that failed. Subsequently our corrective actions focus on prevention of the specific event and
not on the prevention of similar events. To establish effective corrective actions, we must
track and trend all the causal factors, i.e. all barriers that failed.
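
Tracking all failed barriers, rather than only the last one, can be sketched as a simple tally across events. The event identifiers, barrier names and function name below are hypothetical examples.

```python
from collections import Counter
from typing import Dict, List, Tuple


def failed_barrier_trend(events: Dict[str, List[str]]) -> List[Tuple[str, int]]:
    """Count in how many events each barrier failed, most frequent first."""
    counts: Counter = Counter()
    for failed_barriers in events.values():
        # A barrier is counted once per event, even if it is listed twice.
        counts.update(set(failed_barriers))
    # Sort by descending count; ties are broken alphabetically for stable output.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Hypothetical data: every barrier that failed in each event, not only the last.
    events = {
        "E1": ["procedure", "supervision"],
        "E2": ["procedure", "interlock"],
        "E3": ["procedure", "supervision"],
    }
    print(failed_barrier_trend(events))
```

In this example a tally of last barriers alone could miss that the procedure barrier failed in every event.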
4.5.6.3.
Corrective actions should focus on addressing the causes, and should be incorporated into the
organization's corrective action process or programme. Subsequent follow-up should be
conducted to verify that the adverse trend has been corrected, or to modify the original
corrective actions.
4.6. Corrective actions
Actions taken in response to events constitute the main objective of the OEF process to
enhance NPP safety. They are aimed generally at correcting a situation, preventing recurrence
or enhancing safety. The safety significance of an event determines the depth of the cause
analysis needed and, subsequently, determines the type, the number and the time limit for
implementation of corrective actions. It should be noted that the designation of safety
significance can be changed during the analysis of the event. The regulatory body should be
kept informed of any such changes so that it can fulfill its duties and responsibilities, e.g.
making information available on incidents.
4.6.1.
bodies. One element of the screening process carried out centrally or at plant level should be
to consider the applicability of corrective actions taken by other plants in response to an event
investigation. Where such a corrective action is screened and found to be relevant, it should
be included in the plant's own corrective action plan. A number of important factors should
be taken into account when determining corrective actions. These should include the need to
restore or maintain the desired level of nuclear safety, to address human and organizational
factors, and to consider the overall impact of the action on existing documentation and
operational aspects and the changes to the planning and scheduling of work and/or of the
individuals assigned to particular duties.
Corrective actions should be compared against actions that are in progress or are part of other
action plans. Generating too many actions may overwhelm the intended beneficiary and leave
some important ones outstanding for too long. Corrective actions can be immediate,
interim or long term, with the last necessitating detailed evaluation. Examples of immediate actions are
measures to recover from a plant transient or to isolate contaminated areas.
4.6.2.
A tracking process should be implemented to ensure that all approved corrective actions are
completed in a timely manner, and that those actions with a long lead time to completion
remain valid at the time of their implementation in the light of later experience or more recent
discoveries. A periodic evaluation should be carried out to check the effectiveness of actions
implemented. Primarily, implementation and tracking of corrective actions should be
performed by the NPP management. The regulatory body may monitor the progress of certain
recommended actions. This may be done by requiring plants or operating organizations to
provide periodic progress reports.
In addition to the documentation and tracking of actions associated with each single event, a
systematic compilation of actions should be made which may, taken over a number of years,
provide an information base of lessons learned. When these actions are compiled and sorted
on the basis of the systems affected or safety issues raised, they can serve as solutions for
similar problems which may arise in the future, or at other plants.
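The tracking and compilation described above can be illustrated with a minimal sketch. The record fields (`event_id`, `system`, `due`, `completed`) and the sample entries are hypothetical and chosen only for illustration; an actual plant system would carry far more information.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class CorrectiveAction:
    event_id: str                     # hypothetical identifier of the originating event
    system: str                       # system affected, used to sort the compilation
    description: str
    due: date                         # target completion date
    completed: Optional[date] = None  # None while the action is still open


def overdue(actions, today):
    """Return the open actions whose target completion date has passed."""
    return [a for a in actions if a.completed is None and a.due < today]


def compile_by_system(actions):
    """Group action descriptions by the system affected."""
    grouped = defaultdict(list)
    for a in actions:
        grouped[a.system].append(a.description)
    return dict(grouped)


# Illustrative entries only, not taken from any plant record.
actions = [
    CorrectiveAction("E-001", "airlock", "Revise blank flange procedure", date(2005, 3, 1)),
    CorrectiveAction("E-002", "airlock", "Pre-job briefing for first-time tasks",
                     date(2005, 2, 1), completed=date(2005, 1, 20)),
    CorrectiveAction("E-003", "feedwater", "Document terminal disconnections", date(2005, 6, 1)),
]

late = overdue(actions, today=date(2005, 4, 1))
lessons = compile_by_system(actions)
```

Sorting on the system affected is only one possible key; the same grouping could be made on the safety issue raised.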
5. INDICATORS
The plant should have indicators of operational performance and the effectiveness of
corrective actions taken in response to identified deficiencies. Most nuclear power plants
collect and publish a standard set of performance indicators such as radiation exposure,
number of unplanned reactor trips, forced outage rates, plant availability, human performance,
ratio between number of low level events and total number of events, number of corrective
actions delayed and so forth.
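As an illustration of one of the indicators listed above, the ratio between the number of low level events and the total number of events could be computed as sketched below. The function name and the sample counts are assumptions made for the example, not values from this publication.

```python
def low_level_ratio(low_level_events: int, total_events: int) -> float:
    """Share of all reported events that were classified as low level."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    if not 0 <= low_level_events <= total_events:
        raise ValueError("low_level_events must lie between 0 and total_events")
    return low_level_events / total_events


# Illustrative counts for one reporting period.
ratio = low_level_ratio(low_level_events=1900, total_events=2000)
```

A high ratio is generally read as a sign of a low reporting threshold, but how the value is interpreted depends on the plant's own classification criteria.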
Numerical indicators to monitor operational performance are used by operators and regulatory
bodies worldwide, and these existing indicators should also be used for trending low level
events and near misses where possible. However, since these performance indicators are
measured at such a high level, by the time a negative trend in performance is recognized the
plant may be on its way to declining performance. A key contributor to this declining
performance is the fact that these high level performance indicators may fail to identify
organizational weaknesses causing the decline in performance.
Therefore, it is important that nuclear power plants have the capability to trend, analyse and
recognize early warning signs of deteriorating performance. It is necessary that plants be
sensitive to these warning signs, which may not be immediately evident.
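One simple way to make such warning signs visible is to fit a straight line to periodic event counts and flag a rising slope. The sketch below assumes monthly counts and an arbitrary threshold of one additional event per month; both are illustrative assumptions, and other trending methods may be more appropriate in practice.

```python
def slope(values):
    """Least-squares slope of the values against their index 0..n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two points are needed")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def adverse_trend(counts, threshold=1.0):
    """True when the fitted increase per period exceeds the threshold."""
    return slope(counts) > threshold


# Illustrative monthly counts of low level events related to human error.
monthly_counts = [4, 5, 5, 7, 8, 10]
rising = adverse_trend(monthly_counts)
```

The threshold would need to be set against the plant's own reporting volume; a fixed value is used here only to keep the example short.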
Experience has shown that a relationship exists between those events affecting nuclear safety,
plant performance, plant reliability and individual events that have no significant impact on
plant performance (see Fig. 1).
Many of the human performance challenges that affect the strength of barriers, which in turn
affect the margin to safety, are classified as low level events. However, continuously
challenging the barriers that keep these events at this low level will eventually result in
failure. Some examples of organizational issues are given in Appendix I. The knowledge that
minor events are precursors of more significant events indicating missing barriers or failures
in barriers, and the fact that human errors continue to occur, should encourage the
management of nuclear power plants to systematically collect data on human errors in low
level events³ and near misses.
Some examples of indicators for monitoring low level events and near misses are given
below:
Low level industrial safety events (not resulting in lost work time),
Number of low level events and near misses related to human error,
³ A low level event is an undesirable occurrence or series of occurrences with minimal consequences which do
not reach the threshold of a significant event.
6.
Lessons learned from the analysis of the OEF process are an important component of a plant
safety management programme. Even though organizations are performing at high levels of
safety and reliability, there still remains a need for learning from a dynamic operational
experience process to continue to improve nuclear safety, plant performance and quality.
A good organization is a learning organization, willing and able to learn from its own
experience. Operational events are learning opportunities but the events affecting safety are
infrequent, as national and international records show. This positive fact may also result in
complacency, which could lead to a loss of attention, which in turn may result in a loss of
safety awareness and staff competence. Vigilance may decrease if staff is seldom triggered
by the occurrence of events⁴.
Therefore, organizations ignoring low level events and near misses do not fully benefit from
the OEF process. By taking lessons from low level events and near misses, the plant staff can
prevent the development of significant events at early stages.
It is important to integrate the analysis of low level events and near misses into the overall
event analysis and use these results to identify existing precursors to the infrequent significant
events. For example, during the root cause analysis of a significant event, the data associated
with low level events and near misses should be assessed for any common cause(s).
6.2. External lessons learned
The use of external operational experience allows each organization to learn from the
experience of other organizations and implement corrective actions to preclude similar events
from occurring at their nuclear facilities. The following aspects should be considered:
A systematic review of significant external information, along with trends from low
level events and near misses, should be made;
Personnel qualified to determine applicability to the NPP should perform the review;
A periodic assessment of the use of the external OEF process should be conducted to
monitor effectiveness.
The process of learning from external operational experience should include national and
international experience. National (e.g. INPO, VNIIAES-Rosenergoatom) and international
organizations (IAEA, NEA and WANO) provide a system of information exchange and
recommendations for significant events.
⁴ In this publication: any variance, such as an operational deviation, abnormality, equipment failure or human
error, or procedure non-conformities.
7. PROGRAMME EFFECTIVENESS
A periodic review of all stages of the OEF process should be undertaken to ensure that all of
its elements are performed effectively. Continuous improvement of the OEF process should
be the objective of the review. An effective OEF process can significantly contribute to
minimizing the recurrence of events. In general, there are three approaches to undertaking
such a review: a self-assessment by the operating organization, independent assessments, and
audits. These reviews may assist in identifying the early signs of declining performance.
Examples of early signs of declining performance can be found in Appendix VIII.
7.1. Self-assessment
The operating organization should periodically review the effectiveness of the OEF process.
The purpose of this review is to evaluate the overall process effectiveness and to recommend
remedial measures to resolve any weaknesses identified. Indicators of process effectiveness
should be developed. These may include the number, severity and recurrence rate of events,
causes of different events, etc. This self-assessment review should also:
(a) verify that corrective actions arising from the OEF process are implemented in a timely
manner;
(b) evaluate the effectiveness of solving the original problems and preventing recurrence;
and
(c) review recurring events to identify whether improvements in the OEF process can be
made.
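A recurrence rate of the kind mentioned above could be estimated as sketched below. The definition used (the fraction of events whose cause code has already appeared in an earlier event) and the cause codes themselves are assumptions made for illustration; organizations may define recurrence differently.

```python
def recurrence_rate(cause_codes):
    """Fraction of events whose cause code matches that of an earlier event."""
    seen = set()
    repeats = 0
    for code in cause_codes:
        if code in seen:
            repeats += 1
        seen.add(code)
    return repeats / len(cause_codes) if cause_codes else 0.0


# Illustrative sequence of cause codes in chronological order.
codes = ["procedure", "supervision", "procedure", "labelling", "procedure"]
rate = recurrence_rate(codes)
```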
The operating organization should issue a periodic report, at least annually, which
summarizes the activities performed within the framework of the OEF process. Such
a report should list the internal and external experience that has been analysed, the corrective
actions approved and the status of their implementation. A target completion date should be
assigned for those corrective actions still under way.
In order to provide an effective OEF process, and to improve safety culture and plant
performance, it is desirable that the organization have in place a systematic self-assessment of
operational experience. The findings of the self-assessment process may range from
significant to low level events and near misses. The self-assessment process should also
include a feedback mechanism to correct any self-assessment deficiencies.
The self-assessment process should permeate through all levels of the organization by being
an integral part of the work pattern. In scope, it should cover all areas important to safe
operation.
Experience has shown that the establishment and implementation of the self-assessment
process are not sufficient steps in themselves to ensure its success. Success depends on
continued application of well established principles and on maintaining the self-assessment
process. Improving operational performance is a relatively slow process with no shortcuts.
The detailed information on self-assessment scope, methods and international practices is
available in Ref. [8] and in the IAEA Safety Guide: A System for Feedback of Experience
from Events in Nuclear Installations [9]. The IAEA guidelines for PROSPER [10] also
provide instructions and mechanisms for a sound plant self-assessment programme.
Although a systematic self-assessment process is very useful for monitoring current trends in
operational performance, it does not guarantee that all operational areas are properly covered
and the self-assessment complies with the best international practices. That is why
independent assessments should be made on a periodic basis.
There are several possible options to conduct independent assessments of operational
experience and the effectiveness of the plant's self-assessment process, such as peer review
and technical review missions. Examples of independent assessments are the IAEA
PROSPER and OSART missions/peer reviews and WANO peer reviews.
The main purposes of a peer review are to determine whether the OEF process meets
internationally accepted standards and whether the plant's self-assessment is effective and
comprehensive, and to identify areas for improvement. For example, the peer review should
compare the OEF process of an operating organization/licensee with guidance and equivalent
good practices elsewhere.
7.3. Audits
An experienced group not directly involved in the OEF process itself should audit the OEF
process at regular intervals. This audit team is usually made up of quality assurance staff
belonging to the same operating organization. A good practice would involve at least one
member from a different organization.
The OEF process may also be periodically reviewed by external audit or inspection, e.g. by
the regulatory authority or external organizations. The regulatory body may consider the OEF
process, including low level events and near misses, as an item for regulatory inspection.
8. CONCLUSIONS
Industry experience has shown that the precursors of a significant event⁵ are present long
before the significant event occurs. The precursors are advance messengers of the event and
must be embraced by the utility to prevent the more significant events. Many of the precursors
present as degrading safety culture and degrading plant material condition. These degraded
conditions are often the result of ineffective change management, e.g. implementation of
staffing cuts without fully evaluating the risk and impact associated with this action. As the
workload increases, the ability of personnel to focus on the task at hand decreases. This lack
of focus will start to surface as minor problems or events that have no real impact on the
plant. Left unattended, these minor problems or events will accumulate and contribute to more
significant events or failures.
Effective trending and analysis will provide early identification of the accumulating less
significant, low impact events (low level events and near misses) and provide the opportunity
to take effective corrective actions prior to the occurrence of more significant events.
It is strongly suggested that nuclear power plants increase the use of feedback from low level
events in their day-to-day activities, as this is an important contributor in improving safety
performance. There are strong indications that timely corrective actions on trends of declining
performance help to avoid further degradation (i.e. occurrences of safety significant events).
⁵ Significant events are defined as events that are consequential and in most cases have an impact on safety. The
level of event considered significant varies between nuclear power plants and regulatory bodies.
APPENDIX I.
EXAMPLES OF FACTS ARISING FROM LOW LEVEL EVENTS AND NEAR MISSES
[Table; only the following entries are recoverable.]
Column headings: TYPE; BRIEF DESCRIPTION; CONSEQUENCES; FAILED (heading truncated); CORRECTIVE ACTIONS; LESSONS LEARNED.
Type categories: MANAGEMENT; OPERATIONS; MAINTENANCE; PROGRAMMES AND PROCEDURES; TRAINING; EQUIPMENT FAILURE.
Consequences: Spreading of low levels of contamination and a small increase in internal radiation uptake by individuals.
Other entries: Maintenance programme for trending equipment failure history. Procedural adherence. Management expectations not clearly defined, no supervision. Equipment unavailable.
[Table; only the following entries are recoverable.]
Column headings: TYPE; BRIEF DESCRIPTION; POTENTIAL CONSEQUENCES; LESSONS LEARNED.
Type categories: MANAGEMENT; OPERATIONS; MAINTENANCE; OUTAGES; TRAINING; PROGRAMMES AND PROCEDURES.
Lessons learned: Maintaining good configuration management of the plant is essential to safe operation. The plant on paper must represent the physical plant and must be supported by safety analyses.
Lessons learned: There is a need to convey to workers the importance to safety of the system/equipment they are working on, as well as to provide adequate training. This is an example of non-conservative operation.
Other entries: Worker safety. Pre-job briefing, self-checking and supervision. Reinforcement of the self-check programme.
I.2. Examples of significant events that could have been avoided with corrective actions
developed from the analysis of previously identified low level events and near misses
This section attempts to illustrate, through real life examples from operating experience, that
declining safety performance can be detected and corrected at an early stage through the use
of low level events and near misses as amplifiers of weak signals of emerging problems. The
five examples provided show significant events from different plants that, retrospectively,
could have been avoided with earlier corrective actions ensuing from previous low level
events. Nevertheless, once put in perspective with the information available from those low
level events, which amplified the underlying root causes, they were still acted upon
efficiently.
I.2.1. Example 1
Breach of containment following maintenance activities on the equipment and emergency
airlocks
Event description
Following a pressure test for the emergency airlock, it was required that a blank flange be put
on a pipe going through the airlock. The purpose of the blank flange was to maintain the
airlock containment function. This pipe was used to pressurize the airlock with air from a
flexible joint linked to the plant air system. Since the procedure did not specify the location of
the flange, the mechanic reasoned it was to be attached to the air supply line (a logical thing to
do, since he normally put blanks downstream of air valves). So he re-opened the valve,
confirmed that the air was coming from the flexible joint, and installed the blank on the air
supply line. With the air supply now disconnected from the airlock and the blank flange in the
wrong location, the containment was breached. The situation existed for about 15 hours, and
each opening of the airlock door on the reactor building side was creating a breach of
containment. This was detected when the mechanics discussed the job on a shift turnover.
Another mechanic had performed the job before, so was aware of the correct installation and
the consequence of not reinstalling the blank correctly. The information was immediately
communicated to the shift supervisor and the situation was corrected. The total unavailability
of the containment function of the reactor building was about 30 minutes (18 openings of
about 100 seconds each).
Two days later, another pipe modification was required in preparation for the reactor building
pressure test that was planned for two weeks later in the coming outage. The job was to install
a tee downstream (auxiliary building side) of an air supply valve connected to the equipment
airlock. Because the work documents did not specify clearly what the tee looked like or where
it should be installed, there was some confusion as to the expected installation. The tee the
planners intended to be installed was in fact a long pipe with two outlets that were connected
to the ventilation system. The workers understood from the evaluation of the existing
configuration that the tee was to be installed where an existing spool piece was installed
upstream (containment side) of the valve connected to the airlock. They contacted the system
engineer to say that the tee was missing and that they needed a drawing. No drawing was
found, and because there was no particular requirement (other than holding air at 125 kPag),
the system engineer suggested they fabricate a new tee and install it. The workers did
manufacture a tee, as they understood it, to replace the spool piece, and installed it. Again, in
this situation, an opening of the airlock door on the reactor building side was causing a breach
of containment. The situation was detected after 2 hours by an operator who heard about the
other event on the emergency airlock and was puzzled by the installation he saw on the front
of the airlock. He followed the piping and found the anomaly. The situation was corrected
right away.
Consequences
The immediate consequence was to create two breaches of containment for a total duration of
about 40 minutes. Even if the reliability target is set at 10⁻³ y/y (8.8 h/y), this was considered
as a serious breach in safety provisions. Another potential consequence if the mistakes had
passed undetected is that, during the high pressure test of the reactor building (done at
125 kPag), it would have been impossible to enter the reactor building in an emergency,
invalidating some scenarios credited in outage safety analysis.
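The durations quoted for this example can be checked with a short calculation. The sketch below uses only the figures given in the text (18 openings of about 100 seconds, a total of about 40 minutes, and a target of 10⁻³ y/y); the variable names and the use of a 365 day year are assumptions.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 h, assuming a 365 day year


def annual_budget_hours(target_fraction):
    """Allowed unavailability in hours per year for a fractional target."""
    return target_fraction * HOURS_PER_YEAR


# First breach: 18 airlock openings of about 100 s each, in minutes.
first_breach_min = 18 * 100 / 60

# Reliability target of 1e-3 y/y expressed in hours per year.
budget_h = annual_budget_hours(1e-3)

# Both breaches together lasted about 40 minutes.
total_breach_h = 40 / 60
fraction_of_budget = total_breach_h / budget_h
```

The budget evaluates to 8.76 h/y, which rounds to the 8.8 h/y quoted in the text. The two breaches together used less than a tenth of that annual figure, yet they were still treated as a serious breach in safety provisions.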
Causes
The main cause identified for both events is that the procedures used for the job were not
precise enough to fulfil the needs of a new user. These procedures were used adequately for
years because normally the job was done with qualified personnel who had done the job
before. In this event, however, the two mechanics, even though experienced, were not
familiar with the task. They demonstrated a good questioning attitude, but the
communications were ineffective. They started the job without any pre-job briefing, as the
task was not identified as critical. In the second case, the work authorization described a
modification on the air system (not the containment system), so no need for a particular
qualification was identified by the shift supervisor.
In numerous events that occurred in the past two years, it appeared that procedures that had
been used for years suddenly seemed to be less than adequate, because new personnel or
personnel not familiar with the task had difficulty performing optimally with those documents.
The lack of personnel with proper training/qualification/knowledge of the task was creating a
breach in the first line of defence; this weakness highlighted a second weak barrier not
identified previously (written procedures). The tendency was significant enough that, in a
trending exercise, it was underlined as a precursor to a more serious event. In the past two
years, 35 event analyses (on ~300 events) identified procedures as the direct cause or
contributing factor. Had this been recognized earlier, pre-job review of the procedures with
knowledgeable personnel could have precluded the breaches of containment.
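A trending exercise of the kind described above can be sketched as a count of cause codes across event analyses. The data layout (one set of cause codes per analysed event) and the sample codes are hypothetical; only the figure of 35 analyses out of about 300 events comes from the text.

```python
from collections import Counter


def rank_causes(analyses):
    """Rank cause codes by the number of event analyses that cite them.

    analyses: list of sets of cause codes, one set per analysed event.
    Returns (code, count, share of analyses) tuples, most frequent first.
    """
    counts = Counter(code for codes in analyses for code in codes)
    total = len(analyses)
    return [(code, n, n / total) for code, n in counts.most_common()]


# Illustrative analyses only.
sample = [
    {"procedure", "training"},
    {"procedure"},
    {"supervision"},
    {"procedure", "supervision"},
]
ranking = rank_causes(sample)

# Share quoted in the text: 35 analyses citing procedures out of about 300 events.
procedure_share = 35 / 300
```

With the figures from the text, procedures were cited in roughly 12% of the events, which is the kind of share a ranking like this would bring to attention.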
Corrective actions
Generic corrective actions that are suggested by these low level events, which could have
prevented this significant event and that were finally taken after the event, are: systematic
pre-job briefing before executing a safety system procedure for the first time (to mitigate the
procedure problem); training/qualification on a special safety system; revision of some
procedures; and supervision.
I.2.2. Example 2
Electric cable fire while drying a filter in the reactor building
Event description
Following concerns about tritium release in the low level waste storage area, it was decided
that all the filters coming from heavy water systems (coolant and moderator) should be dried
before being sent to the low level waste storage area. A cask and a dryer were designated for
this activity. The filter was successfully dried, but it took more than 10 days. To accelerate the
drying process for subsequent filters, the heater power was to be increased. Workers were
requested to install equipment with increased drying power. However, no precise guidance
was provided.
The new arrangement consisted of a larger cask and heaters with a total power level of 2 kW.
The previous heaters had provided only 200 W. The new cask completely covered the heaters.
The workers did place spacers between the heaters and the cask but did not provide for
significant airflow. A flexible air duct was provided to ventilate the top of the cask, which
was connected directly to the reactor building ventilation system. The installation was then
left to be used for filters, without any surveillance requirements established.
After a few hours, tritium levels were rising at the gaseous effluents monitor. The shift
supervisor directed that the drying activity be stopped and the flexible air duct disconnected.
He did not know that there were heaters under the cask. Because of the continued heat input,
the filter dried out and spread dry particulate contamination. A radiological emergency (not an
emergency action level) was declared and the heaters were unplugged. After gaining an
understanding of the installation and of what had occurred, the shift supervisor asked for a
redesign of the installation to make sure that heavy water vapour would go through the
recovery system instead of directly to the ventilation (reducing any release of tritium outside,
which was in fact the initial concern). After some modification, the drying was restarted. As
radiological protection workers were cleaning, one of them noticed that there was a smell of
something overheating. It was explained to him that it was normal because it was the filter
that was drying.
After a couple of hours, some flames were seen coming from underneath the installation and a
fire emergency was declared. Once inside, a radiological protection technician evaluated the
radiation field and it was judged acceptable to go near and try to extinguish the fire. After two
unsuccessful attempts, they decided to remove the cask from the heater. After this was done,
they succeeded in extinguishing the fire. It seemed that the fire was of electrical origin. The
cable powering the heaters was overheated by the heaters and caused the electrical insulation
to burn.
Consequences
There were no serious consequences because there was no combustion loading in the room.
Nevertheless, the spread of contamination could have been significant if the filter had caught
fire.
Causes
Numerous causes were identified in this sequence, but the main high level contributors were
lack of supervision, no pre-job briefing and no co-ordination. The absence of a procedure
covering all steps of the job was considered a failed barrier, particularly since this was not a
routine task. Responsibilities for each of the steps of the job were not known or defined.
Multiple low level events seemed to point to a significant decrease of supervisory activity
since the last re-organization, leading to lack of organizational clarity in new or infrequently
performed activities or in activities involving more than one unit. In addition to that, clear
expectations were frequently missing and pre-job briefings were not conducted. Before this
event, trending on low level events and licensee event reports identified supervision of
activities as one major area where improvement was needed. Some facts encountered were:
thermal treatment done without the presence of a fire watch; workers from an outside
company who did not know the fire protection requirements; a work area left without fire
detection while the automatic fire detection system was disabled; a work supervisor who had
to leave the job site and did not transfer responsibility; and no validation that equipment
released for maintenance was adequately isolated or protected. It was clear that the design
modification process was poor.
In the past two years, 40 event analyses identified supervision/co-ordination as a cause or
contributing factor. When looking specifically at organizational clarity, 31 event analyses
found a weakness in either roles and responsibilities (the decision on who does what),
formalization (formal documentation) or organizational knowledge (communication and
internalization of those responsibilities).
Corrective actions
Generic corrective actions that are suggested by these low level events, which could have
prevented this significant event and that were taken after the event, are: increased supervision
in the field (first line managers); requirement for a supervisor for any new or non-routine
activity; designation of a co-ordinator for activities involving more than one group or unit;
and revision of the existing design modification process.
I.2.3. Example 3
Two valves in the reactor pressure vessel head cooling system were inoperable due to wrong
connection of instrument air hoses during modification work
Event description
The original installation of two valves in the reactor pressure vessel head cooling system was
recognized as deficient regarding environmental conditions (heat) and maintainability (heat,
radiation). The valves are located in the containment of a boiling water reactor (BWR) plant.
The valves were to be moved to a less harsh environment. The modification work was
performed during the annual refuelling outage. As part of the work, the instrument air hoses
for both valves had to be removed and the copper lines were to be lengthened. After the
valves were relocated outside the containment, the instrument air hoses were connected to the
wrong valves. In this configuration, the controller for the valve that delivered cooling water to
the head now operated a bypass valve and the controller for the bypass valve now operated
the cooling supply valve. The connection failure was not detected until after the valves were
placed in service.
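The crossed connection described above is the kind of fault that a post-modification functional check is meant to reveal: each controller is operated in turn and the valve that actually moves is compared with the design intent. The sketch below is a simplified illustration; the controller and valve names are hypothetical labels, not plant identifiers.

```python
def mismatches(expected, observed):
    """Return controllers whose observed valve differs from the design intent.

    Both arguments map a controller name to the valve it actuates.
    The result maps each faulty controller to (expected valve, observed valve).
    """
    return {
        ctrl: (expected[ctrl], observed.get(ctrl))
        for ctrl in expected
        if observed.get(ctrl) != expected[ctrl]
    }


# Design intent (hypothetical names).
design = {
    "cooling_controller": "cooling_supply_valve",
    "bypass_controller": "bypass_valve",
}

# Configuration after the hoses were connected to the wrong valves.
as_installed = {
    "cooling_controller": "bypass_valve",
    "bypass_controller": "cooling_supply_valve",
}

faults = mismatches(design, as_installed)
```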
Consequences
Causes identified
The instrument air hoses were connected to the wrong valves. The mechanic did not assure
himself that the hoses were connected to the right valves. The removed hoses were not
labelled. Post-modification testing was inadequate.
In numerous events that occurred in the past years, it appeared that incorrect installation and
omitted testing after maintenance work led to inoperability of components. Around 40
examples of low level events with the same identified causes were found.
Corrective actions
Generic corrective actions that are suggested by these low level events, which could have
prevented this significant event and that were finally taken after the event, were: complete and
comprehensive planning and installation of modification work; improving the self-checking
methods; labelling of components; improving the maintenance instructions; and complete
testing after maintenance and/or modification work.
I.2.4. Example 4
Two out of four channels of a reactor protection system were inoperable due to failure to
properly restore the system subsequent to testing during a refuelling outage
Event description
Two transmitters measuring the conductivity of the feedwater were detected to be inoperable.
These measurements are part of the reactor protection system. The function of the
measurements is to isolate the feedwater system in the case of seawater leakage in the turbine
condenser. The four measurement channels were calibrated during the refuelling outage and
the terminal strips of two channels were left disconnected after the calibration. Disconnection
of the channels was not documented. After the work, the terminal cubicle was independently
inspected and the disconnected terminals were detected, but the supervisor did not question
the cubicle inspection report. The lack of vigilance of the instrument technician was the main
contributor to this event. The procedures were not comprehensive enough and the coming
weekend influenced the supervisor's and the technician's review of the cubicle inspection
report.
Consequences
There were no serious consequences because the protection system works with two out of
four channels. The two operable channels would have triggered the isolation of the feedwater
system if needed.
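The two-out-of-four logic referred to above can be illustrated with a simplified voting function. The sketch assumes that an inoperable channel contributes no trip vote; how failed channels are actually treated is plant specific and is not described in this publication.

```python
def isolation_demanded(channel_states, required=2):
    """Vote over channel states.

    Each state is True (tripped), False (not tripped) or None (inoperable).
    Isolation is demanded when at least `required` channels are tripped.
    """
    tripped = sum(1 for state in channel_states if state is True)
    return tripped >= required


# Two channels inoperable, both remaining channels detect high conductivity.
both_remaining_trip = isolation_demanded([True, True, None, None])

# Same configuration, but only one of the remaining channels trips.
one_remaining_trips = isolation_demanded([True, None, None, None])
```

Under this assumption the function was still available, but without margin: with two channels disconnected, both remaining channels had to trip for isolation to occur.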
Causes identified
Numerous causes were identified in this sequence, but the main contributors were lack of
vigilance of the technician and inadequate procedures. The procedure did not require
documentation of disconnection or connection of the terminals. Schedule pressure had an
influence on the attentiveness of the supervisor.
In numerous events that occurred in the past years, it appeared that careless restoration of the
systems after work led to inoperability of safety systems. Some examples of low level events
with the same identified causes are:
(a) Omitted restoration of a shut-off valve in the residual heat removal (RHR) system after
maintenance work;
(b) Automatic opening of a recirculation valve of the main cooling system was prevented
due to disconnected terminals after testing;
(c) An emergency diesel generator stopped due to improper manoeuvring of the reset
screw of overspeed protection;
(d) A feedwater pump control actuator inoperable after maintenance;
(e) Pressure indication of a safety relief valve inoperable after outage work;
(f) Several connectors left unlocked after maintenance work;
(g) The earth cable of an electric motor left disconnected after outage work;
(h) A connection was not removed during modification work.
Corrective actions
Generic corrective actions that are suggested by these low level events, which could have
prevented this significant event and that were finally taken after the event, were: improving
the self-checking methods; increased supervision in the field; improving the maintenance
instructions; and complete testing after maintenance and/or modification work.
I.2.5. Example 5
Unplanned withdrawal of a control rod during performance of control rod surveillance
testing
Event description
On a day shift, the day shift superintendent was exiting the control room to perform plant
duties. Prior to leaving he held an informal briefing with the senior reactor operator (SRO)
and reactor operator (RO), providing overall guidance for the conduct of the control rod
testing. The SRO regarded this briefing as an informal notification and preliminary discussion
of the test to be performed. The RO thought that it was the pre-job briefing for the test. There
were 133 control rods to be tested; 128 were in fully withdrawn position, 24 were fully
inserted, and one was in an intermediate position. The normal practice was to test all the fully
withdrawn control rods, then the intermediate control rods and finish with the fully inserted
rods. During this test, a second RO was assigned to perform peer checking of the activity.
Contrary to the normal practice, the decision was made to test intermediate and inserted
control rods prior to completion of testing of the withdrawn control rods. This was due to the
availability of an additional operator, necessary to support the testing of the intermediate and
inserted control rods. After completing the testing on 105 of the fully withdrawn rods, the RO
selected and tested the intermediate position rod, then selected a fully inserted rod and moved
it in the outward direction. Following that mispositioning event, appropriate levels of control
room supervision were notified and action was taken to replace the control rod in the correct
position and verify core thermal limits.
Consequences
This event had no real impact on safety, but could potentially have led to violation of core
thermal limits.
Causes identified
The causes and error precursors6 identified were related to monotony of the task, board
repetitive actions from an established pattern, and failure to adequately self-check, allowing
the intended component to be manipulated incorrectly.
Some similar events occurred at the plant in the past five years, in addition to different events
with similar causes. Those events were pointing to improvement opportunities in work
practices and procedure adherence in operation activities. Examples are:
Inappropriate control rod movement during power reduction; the direct cause was
related to work practices (inadequate self-checking and attention to details).
Control rod mispositioning event while performing rod operability testing.
Inadequate self-checking was identified as the first weakness.
Pressure drop of condensate header and automatic startup of condensate pumps
following inadequate positioning of the pumps hand switches.
The manoeuvre was done during shift turnover and without a procedure.
Control rod withdrawal in violation of a procedure.
One rod was forgotten in a previous withdrawal and was pulled a longer distance than
initially authorized.
Starting the wrong diesel generator during a surveillance test.
The activity necessitated peer checking, but the supervisor allowed the surveillance to
continue while the second operator was absent.
Valve mispositioning on the cooling water system.
The industry also reported many similar events through INPO Report INPO/SOER 84-02 on
Control Rod Mispositioning [20].
Corrective actions
The corrective actions suggested by those events, which could have prevented this significant
event and that were taken after this event, were: communication and training on proper
self-checking methods and expectations; review of supervisor workload; training on procedure
use and adherence; and training on efficient pre-job briefing.
6 Unfavourable factors embedded in the work site that increase the chances of error during
performance of a specific task by a particular individual.
After those corrective actions were put in place, it was observed that events identifying
self-checking deficiencies as a cause decreased by half, and supervisory activities in the field
increased by 20%.
I.3. Examples of organizational issues
Significant events, low level events and near misses all share something in common: latent
weaknesses that result in failed barriers and root causes. All these types of events differ only
in their resulting consequences.
As far as defence in depth is concerned, the analysis of any level of event can be used to
provide information for an effective corrective action programme. The challenge here is to
identify which latent weaknesses contributed most to the common causes and then to
implement effective corrective action(s). This is where the use of coding and trending is
helpful.
As far as single human performance events are concerned, reporting and discussing the events
may be quite sufficient, as in isolation they do not indicate that a widespread
improvement is needed. On the other hand, recurrence of a high number of such
events which share some common pattern or causal factors should result in a more generic
corrective action.
Significant events normally develop through deficiencies in barriers that were not detected
during normal operation. We may operate believing that a strong defence in depth is in place,
with multiple barriers preventing events, when in fact there may be significant weaknesses
that go undetected. The benefit of low level event analysis is that we can find deficiencies in
barriers that normally go unchallenged and may be ineffective in stopping a significant event.
In addition, large numbers of low level events and near misses may increase the probability of
occurrence of a significant event, which in itself is a sufficient reason for addressing these
types of events.
Past accidents, either in the nuclear industry or in other types of facilities, have occurred when
a series of small latent weaknesses combined with an additional failure which resulted in a
serious event. Defence in depth normally ensures that a single failure does not degenerate into
an event with serious consequences for either the public, the personnel or the nuclear power
plant itself.
The foregoing illustrates that it would be poor management to leave low level events
and near misses unreported. In all probability, they contain factors that may be present in
significant events or even in incidents or accidents and should therefore be seen as precursors.
Reporting and analysing those low level events and near misses allows detection of latent
weaknesses that may indicate the need for improvement and definition of corrective actions to
prevent more significant failures.
Latent weaknesses in the programmes
Cumbersome processes (that force people to work around the process, e.g. work
management, engineering design).
There are cases where procedures are never applied. For example, the procedure for
calculation of a leak balance sheet in power operation was never used, since it was long and
complex. Instead of this, the operators used another, implicit, unwritten one.
Latent weaknesses in operations
The term "near miss" comes from aviation, where it describes two aircraft approaching each
other during flight at a distance less than that usually considered to be safe, but where
nothing actually happens. This may be a result of an action preventing a serious event from
happening.
Figure II.1 illustrates how an inappropriate action develops either into a near miss (fortunate
situation) or into an event. Figure II.2 shows the progression of a low level event to a more
significant event to a severe accident, depending on how many lines of defence are breached.
Lines of defence include physical barriers, administrative barriers (procedures, checklists) and
good work practices (as a result of training, safety culture, etc.). The examples given below
describe typical near misses in nuclear power plants. Further examples of near misses are
contained in Appendix III.
Examples of near misses that occurred:
An operator places his hand on the wrong switch; however, immediately prior to
actuating the switch he recognizes his mistake.
A craftsman in the turbine building sees a fellow craftsman not utilizing proper safety
equipment and practices. He points out this problem to his peer and a potential industrial
safety accident is avoided.
An operator is walking through the plant when he looks up and sees a craftsman kick a
tool off a scaffold. Quick action on the part of the operator prevents injury.
These examples show near misses as something that happened without consequences. Had the
timing of the individual actions been different, the outcome of the event might have been
significantly different.
At some utilities near misses are reported as "good catches", to point out the fact that
a significant event was prevented by timely detection and appropriate action by the
individuals involved.
[Figure II.1, labels recovered from the flow chart: Action; Inappropriate action; Fortunate
situation? Yes: Near miss; No.]
[Figure II.2, labels recovered from the severity scale: Low level event; 1 failure; 2 event,
significant incident; 3 PSA initiating event; 4 unacceptable consequence (level 1);
5 unacceptable releases (level 2). Annotations on the lines of defence (LODs):
LODs are activated and they are generally successful. They are only studied when they fail.
They are of the organizational and design type.
LODs are not activated. It is not known when they would have been successful: this is the
objective of the PSA. They are above all of the design type.
LODs are not activated. Not much is known about them and they are not studied much. Little
is known of their chances of success. They are mostly of the design type.]
As illustrated in Fig. II.2, a serious accident may develop from a low level event in the
following manner: a failure or a set of failures is revealed through a declared incident
(incident report) because the lines of defence set up (maintenance, monitoring, operation,
design quality, etc.) have been broken. The incident can then degenerate into an
increasingly serious accident as other lines of defence are successively broken. This process
of aggravation of the situation is stopped when a line of defence has been activated and has
correctly played its role. It is noteworthy that these last, successful lines of defence are not
generally studied.
In this situation, the process of escalation from a near miss to a low level event and then to
an actual failure of the installation is the same, and the lines of defence activated in a
successful good catch are generally not documented or studied. Nevertheless, studying them
could be very useful, because the failure mechanisms of the different types of lines of defence
do not depend on the strength of these lines. For a severe accident to occur, many lines of
defence must be broken. Thus, the origin of the accident is a simultaneous failure of several
lines of defence, which may be of very different types.
In order to illustrate the general approach shown in Fig. II.2, an example is given below (see
Fig. II.3).
The initial conditions of the nuclear power plant are as follows:
The plant has been operating at 100% power for 250 days;
3 main condensate pumps are running;
3 main feed pumps are running.
[Figure II.3, content recovered as text. Each event label is listed with the group of barriers
printed next to it; asterisks are reproduced as they appear in the original figure.
Unit operating at 100% power (3 condensate pumps, 3 feed pumps, on-line 250 days): normal
operating procedures (NOPs), supervision, training.
Condensate pump trips: worker practice (WP)*, off-normal operating procedures (ONOPs), NOPs,
supervision*, training*.
Reactor trips on low steam generator level: emergency operating procedures (EOPs)*, automatic
safety function*, plant design*, WP*, ONOPs*, NOPs, supervision*, training*.
Feed pump trips (low suction pressure trip): plant design*, WP, ONOPs*, NOPs, supervision*,
training*.
Primary relief valve lifts: environmental release procedures, EOPs*, automatic safety function,
plant design, WP*, ONOPs*, NOPs*, supervision*, training*.
Other labels: reactor coolant temperature and pressure increases; operating envelope; nuclear
safety envelope; safety margin; radioactive release to public; I, II, III.]
FIG. II.3. Example of lines of defence.
Best international practices demonstrate that the most frequently identified performance
problems fall into several categories. As an example of how to implement a trending
code, seventeen categories have been selected. Identifying each category by a letter and each
subcategory by a number gives an identification system that is very useful for trend analysis:
A. Communication: the presentation and exchange of information
1. type of communication
a. verbal,
b. written;
2. information not provided to end user;
3. too much information provided to end user;
4. no feedback provided to message initiator.
B. Procedures and documents: the written presentation or exchange of information
1. type of procedure/document
a. permanent,
b. temporary;
2. format confusing to end user;
3. errors in technical content of the document;
4. not properly co-ordinated with change implementation.
C. Displays and labels: the design of equipment used to communicate information from
the plant to the person
1. equipment layout and usability;
2. equipment labels unreadable or incorrect.
D. Environmental conditions: physical condition of the work area
1. poor working conditions
a. lighting/temperature/noise,
b. cramped/overcrowded conditions;
2. protective equipment required
a. industrial safety equipment,
b. radiological protection equipment.
E. Workers' schedule: factors that contribute to the ability of the workers to perform their
assigned task
1. worker fatigued;
2. interruptions of work in progress;
3. poor co-ordination of related activities.
F.
G. Work planning and scheduling: how planning, scoping, assignment and scheduling of
the task to be performed is accomplished
1. insufficient time allotted for worker to prepare for task;
2. insufficient time and/or personnel assigned to task;
3. task planning/scoping did not identify all conditions;
4. personnel assigned to task not qualified.
H. Supervision: methods used to direct workers in the performance of tasks
1. supervision not interfacing with workers;
2. task progress not adequately tracked;
3. direct supervisor involvement resulted in loss of overview role;
4. emphasis on schedule exceeds emphasis on doing work correctly.
I.
J.
K. Personnel and materials management: how personnel and materials are assigned to a
task
1. insufficient supervising resources provided;
2. insufficient personnel assigned to task;
3. inadequate materials provided to complete task.
L.
M. Design configuration: the layout of systems and subsystems to support operations and
maintenance
1. design changes not implemented in a timely fashion;
2. misapplication/misinterpretation of design requirements;
3. unauthorized design changes implemented;
4. design change not properly co-ordinated with design change implementation.
N.
O.
P.
Q.
Deficiencies and/or events, including the related organizational and human factor issues,
should be reported, evaluated, and appropriate information entered.
Who can report: anybody, including contractors.
How to report: defined organizational structure; clear procedure, communication to
those individuals involved.
Criteria for dealing with event: low level event is below or outside of the INES scale, so
this scale is not appropriate for use.
Categorization with respect to safety significance.
Simple; easy to understand.
Feedback to: person reporting the event; communicate immediate corrective action plan
to other affected units and to all involved plant personnel.
Encouragement to report; readily accessible, simple, non-punitive.
Meeting regulatory requirements.
A good reporting system may include a few serious reports and maybe thousands of low
level events.
A database with appropriate coding, maintained by knowledgeable, experienced
individuals.
Information periodically provided to senior management.
One system for all low level events, near misses and deficiencies.
Identified (central) group to manage process, co-ordinate decisions and recommend
actions to senior management.
Example criteria:
Clear criteria for further investigation per unit:
Full investigation/root cause investigations: <6/year.
Partial investigation: not more than 60 events.
Corrective actions to eliminate the consequence of ~600 events.
Trending on ~6000 low level events.
The techniques described below are not the only methods available, but represent proven
techniques for analysing data. The mathematical formulas presented might be too stringent
and limiting, depending on the circumstances of the trend analyses.
V.1. Pareto charts and analysis
V.1.1. Pareto principle
The Pareto principle is a mathematical model used to describe unequal distributions. It is
named after the Italian economist V. Pareto, who observed that 80% of the wealth in Italy was
held by 20% of the people. The Pareto principle, or 80/20 rule, is a pattern that recurs in
many fields. It helps us find the "big hitters": the few categories that create the most
problems. Concentrating our limited resources on resolving/improving the big hitters
maximizes the effectiveness of our improvement efforts.
There are six steps to conducting a Pareto analysis:
(1)
(2)
(3)
(4)
(5)
(6)
V.1.2. Pareto chart
A Pareto chart depicts the problem or issue categories as bars, and is a snapshot in time (e.g.,
a month). Problem categories are plotted in descending order (see the chart). The chart does
not show how the data are changing over time. Therefore, once the big hitter categories are
identified, each category should be plotted for the past 6 to 12 months as appropriate.
In Figure V.1, the first three categories constitute 80% of total issues. Therefore, the Pareto
principle (or 80/20 rule) can be applied. The first three categories are the big hitters, and
should be reviewed further. In addition, time series data should be plotted for these categories
to see how each category trends over time (for example, see sample chart C in Attachment 5).
If the issues are not concentrated and are evenly distributed (as shown in Fig. V.2), the Pareto
principle may not apply.
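As a minimal sketch of the Pareto screening described above, the following Python snippet sorts issue categories by count, accumulates their share of the total and returns the categories needed to reach a cumulative threshold. The function name, the category names, the counts and the 80% default threshold are hypothetical illustration values, not data from this publication.

```python
from typing import Dict, List, Tuple


def pareto_big_hitters(counts: Dict[str, int],
                       threshold: float = 0.80) -> List[Tuple[str, int, float]]:
    """Return (category, count, cumulative share) rows in descending order of
    count, up to and including the row that reaches the threshold."""
    total = sum(counts.values())
    if total == 0:
        return []
    rows = []
    cumulative = 0
    # Sort by descending count; ties are broken alphabetically for repeatability.
    for category, count in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        cumulative += count
        rows.append((category, count, cumulative / total))
        if cumulative / total >= threshold:
            break
    return rows


# Hypothetical issue counts for one month (illustration only).
monthly_counts = {
    "Procedures": 40,
    "Work practices": 25,
    "Communication": 15,
    "Supervision": 8,
    "Training": 7,
    "Design": 5,
}

for category, count, share in pareto_big_hitters(monthly_counts):
    print(f"{category:15s} {count:3d} {share:6.1%}")
```

When the counts are concentrated, only a few categories are returned; when they are evenly spread, almost all categories are needed to reach the threshold, which is the situation in which the Pareto principle may not apply.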
(1) For each parameter (e.g., configuration control in the operations or design process, in
engineering or material conditions) to be evaluated, determine the mean (average)
number of CRs over a period of time (e.g., 6 months).
(2) Determine the standard deviation of the CR data set. The standard deviation is a
measure of how widely values are dispersed from the average (mean) value. A
statistical function is available in Excel for calculating the standard deviation of a
population (STDEVP).
(3) When the number of CRs for a parameter is more than 2 standard deviations
above (or below) the mean, this area should be further reviewed/analysed to determine
potential causal factors for the deviation.
(4) If there are insufficient data to determine a reasonable mean and a standard deviation,
professional judgment should be used to make a determination of a trend.
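The steps above can be sketched in Python as follows. This is an illustration under stated assumptions rather than a prescribed implementation: the function name, the monthly counts and the `min_points` cut-off for "insufficient data" are hypothetical. `statistics.pstdev` computes the population standard deviation, which corresponds to the Excel population function mentioned in step (2).

```python
import statistics
from typing import Sequence


def classify_cr_count(history: Sequence[int], current: int,
                      n_sigma: float = 2.0, min_points: int = 3) -> str:
    """Compare the current CR count with the mean +/- n_sigma band of the history."""
    if len(history) < min_points:
        # Too few points for a meaningful mean and standard deviation;
        # the min_points value is an assumption made for this sketch.
        return "insufficient data"
    mean = statistics.mean(history)
    sigma = statistics.pstdev(history)  # population standard deviation
    if current > mean + n_sigma * sigma:
        return "above band"
    if current < mean - n_sigma * sigma:
        return "below band"
    return "within band"


# Hypothetical monthly CR counts for one parameter over six months.
history = [10, 12, 9, 11, 10, 8]
for count in (14, 10, 7):
    print(count, classify_cr_count(history, count))
```

A result outside the band only marks the parameter for further review; it does not by itself identify the causal factors.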
The trend analyst may utilize appropriate charting and/or calculation methods to trend data to
determine if a potential trend exists. The trend analyst analyses the available data for these
areas, to identify which areas require more detailed investigation. If a system, programme or
organizational trend is identified, the analyst should review the available data to focus
attention on specific areas.
If the trend analyst normalizes trend data in order to provide for a more realistic analysis, the
normalization method utilized should be clearly identified. For example, the analyst may
choose to normalize a human performance error rate taking into account the total number of
person-hours worked when analysing a department's event history.
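One possible normalization, sketched in Python below, expresses human performance errors per 10 000 person-hours worked. The function name, the 10 000 hour basis and the department figures are assumptions chosen for illustration; any basis can be used provided that it is stated with the results.

```python
def errors_per_hours(error_count: int, person_hours: float,
                     basis_hours: float = 10_000) -> float:
    """Return the number of errors per basis_hours person-hours worked."""
    if person_hours <= 0:
        raise ValueError("person_hours must be positive")
    return error_count * basis_hours / person_hours


# Hypothetical departments: (errors reported, person-hours worked).
departments = {
    "Maintenance": (12, 40_000),
    "Operations": (6, 10_000),
}

for name, (errors, hours) in departments.items():
    print(f"{name}: {errors_per_hours(errors, hours):.1f} errors per 10 000 person-hours")
```

In this made-up case the department with fewer reported errors has the higher normalized rate, which is the kind of difference that raw counts can hide.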
The purpose of this appendix is to illustrate by a small selection of actual operating cases the
importance of reporting, evaluating and trending low level events and near misses in order to
prevent the occurrence of events which can be safety significant and may lead to unwanted
radiological doses or have a negative impact on the economic performance of the plant.
The sections of this appendix comprise:
VI.1. Examples of the use of trend codes
VI.2. Example of a decision instrument for screening information
VI.3. Case studies
VI.4. Example of a method to analyse potential consequences of low level events and near
misses.
Section VI.3. of this appendix contains case studies. In case study 1, an evaluation of twelve
reported events of not following work procedures could have prevented the seven day delay in
operation startup. Case 2 describes the lucky situation that the same operator who detected the
failure of a light-bulb was in the daily morning meeting. Thus, he could give advice that further
analysis of the event was needed. After performing the analysis, a serious modification error was
revealed. In case study 3, trending of twelve delayed maintenance activities could have
prevented the (almost) violations of technical specifications. Additional examples of illustrative
cases are given.
VI.1. Examples of the use of trend codes
This section describes the necessity to determine a limited number of categories to classify
low level events and near misses, and provides good practice for trend codes. To illustrate the
application of trend codes, a set of examples is provided. Appendix III is used as the
reference.
Example 1:
A craftsman is assigned to repack a valve on the auxiliary steam system. He assembles the
required tool and materials and goes to perform the work. During disassembly of the valve,
after the packing gland has been removed, the packing is blown out of the valve and the
craftsman receives a steam burn on his hands. Investigation of this event showed the
following:
The valve to be worked on was not properly isolated for performance of the
maintenance activity: Trend codes F.4, O.3.
The supervisor did not discuss the task assignment with the craftsman: Trend codes
A.1.a, A.2, H.1.
The craftsman did not use the appropriate safety equipment to prevent burns: Trend
code F.3.
No written instructions that defined the task, safety precautions or work techniques were
provided to the craftsman: Trend codes A.1.b, A.2, F.4.
Example 2:
A long standing material condition problem resulted in replacement of a safety related pump.
The new pump was installed but failed within the first 30 days of operation. Investigation of
this event showed the following:
During plant operation, the replacement pump was not operating within its normal
operating parameters: Trend codes M.2, N.1.
The pump was installed in a vertical position instead of the horizontal position
required by the manufacturer: Trend code N.2.
Repair parts for the new pump had not been ordered to support the new
installation: Trend codes J.3, K.3.
Training on maintenance of the new equipment was not provided: Trend codes I.2,
G.4.
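Once trend codes have been assigned, they can be counted by category to feed a trend analysis. The short Python sketch below tallies the codes listed in Examples 1 and 2 by their leading category letter; the data structure and function names are hypothetical, and category letters are converted to upper case before counting so that differences in letter case do not split a category.

```python
from collections import Counter
from typing import Dict, List

# Trend codes as assigned to the findings of Examples 1 and 2.
findings: Dict[str, List[str]] = {
    "Example 1": ["F.4", "O.3", "A.1.a", "A.2", "H.1", "F.3", "A.1.b", "A.2", "F.4"],
    "Example 2": ["M.2", "N.1", "N.2", "J.3", "K.3", "I.2", "G.4"],
}


def category_of(code: str) -> str:
    """Return the category letter of a trend code such as 'A.1.a'."""
    return code.split(".")[0].strip().upper()


def tally_categories(coded_findings: Dict[str, List[str]]) -> Counter:
    """Count how often each category letter appears across all findings."""
    counts: Counter = Counter()
    for codes in coded_findings.values():
        counts.update(category_of(code) for code in codes)
    return counts


tally = tally_categories(findings)
for category, count in tally.most_common():
    print(category, count)
```

With only two events the counts carry little statistical weight; the same tally becomes meaningful when applied to the large number of low level events and near misses that a reporting system is expected to collect.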
Example 3:
The operator and his supervisor conducted a pre-job briefing. After the brief, the operator
reviewed the procedure and walked through the evolution. At that time, the telephone rang
and the operator answered, then returned to the evolution in progress. He immediately put his
hand on the wrong switch, operating the wrong valve.
Investigation of this event showed the following:
The operator became distracted when he answered the telephone. The operator failed to
self-check his performance of the operation in progress when the telephone rang.
Subsequently, he did not verify that his hand was on the proper switch prior to actuating the
component. Trend codes D.1.a, E.2, F.2.a, F.3 and A.1.a, A.2, P.1.
Example 4: Malfunction of pressure limiting valves (low level event)
Through an in-service inspection during an outage it was found that a pressure limiting valve
did not open completely. A root cause analysis revealed that the lubrication grease had
hardened due to high temperature. Further inspections showed the same situation in other
comparable valves.
Generally these valves are additional redundancies to safety and relief valves. In the case of
an initiating event, only 3 out of 6 valves would have operated well, whereas the plant
specific PSA shows that 4 valves are necessary when all 8 safety and relief valves fail to
open (one of these valves is already sufficient). Use of a specified type of grease,
appropriate for the environmental conditions, would have avoided this situation. Trend code
O.1.
Example 5: Loss of control of steam generator level and recovery (near miss)
The reactor was in shutdown condition. The auxiliary feedwater was in service. The operator
adjusted the levels on the three steam generators and left to have a coffee. There was another
engineer who saw that the levels had slowly drifted, and he called the operator. When the
operator jumped to adjust the levels, they were very close to the boundary of automatic scram.
Trend code E.2.
Example 6:
A control technician had to do tests on the four protective trains of the core. He began his test
at train 1, which is located near the door. As he was called, he went out for a moment, and
when he came back, he was about to open train 4 instead of train 1.
Explanation: He usually works on the other reactor of the same unit, and in this other reactor
train 4 is located near the door. He remembered this just before opening the drawer of train 4
and stopped his action. Otherwise, the plant would have run into a reactor trip initiated by two
drawers being opened simultaneously. Trend codes E.2, F.3.
Example 7: Near loss of site power (near miss)
After a loss of off-site power on reactor 3, application of the procedure requires the operator
to go out of the control room, to the area where the contactors are situated. There he has to
switch on the contactor manually, using a switcher.
He did not find the switcher in the area of reactor 3; therefore, he went into the area of reactor
4, found a switcher and connected it to the contactor of reactor 4. He was just pressing the
button, but not closing the switch, when another operator said: "What are you doing in the
area of reactor 4?" Trend codes E.2, F.3, I.1.
Example 8: Loss of main feedwater (near miss)
The reactor operated at 30% power. The field operator was testing one turbine driven main
feedwater pump. The second turbine driven main feed pump was in operation. The last part of
the test was to switch off manually the pump being tested. When the field operator turned the
pump off, the pump did not stop because the operator in the control room had overridden the
test. The field operator went to the control room to investigate the problem. After the field
operator and the control room operator had resolved their problems, the field operator
returned to the pumping station. When the field operator arrived at the pumping station, he did
not initially verify the pump he was about to stop. As he prepared to push the stop button, he
realized he was about to stop the only operating turbine driven main feed pump. Trend code
F.3.
VI.2. Example of a decision instrument for screening information
One example of a decision instrument for initial screening of reported information is shown
below. In this example, all incoming information, called a condition report (CR), is
checked against a decision tree. A matrix, given as Table VI.1, supports the decision making
and can be used when reviewing and resolving conditions. Some typical condition level
guidelines and examples to illustrate Fig. VI.1 are given in Figs VI.2 to VI.5.
FIG. VI.1. Example of a decision tree for screening information for further detailed
investigations.
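CNAQ: Condition not adverse to quality
CAQ-D: Condition adverse to quality, department level
CAQ-S: Condition adverse to quality, station level
SCAQ: Significant condition adverse to quality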
TABLE VI.1. EXPECTATIONS CORRELATED TO DIFFERENT TYPES OF CONDITIONS
EXPECTATION    CNAQ(a)    CAQ-D(b)    CAQ-S(c)    SCAQ(d)
Evaluate the effect of the recurrences on the plant, and whether the
recurrences are indicative of a problem that warrants additional
action or escalation of the CR level
Consider condition applicability across departmental lines where
appropriate; consider generic implications
Compensatory actions are taken until final corrective action can be
taken
Ensure that corrective actions are selected specific to the cause and
will effectively prevent recurrence of the condition
(a) CNAQ: Condition not adverse to quality; (b) CAQ-D: Condition adverse to quality, department level;
(c) CAQ-S: Condition adverse to quality, station level; (d) SCAQ: Significant condition adverse to quality.
Examples
FIG. VI.2. Condition not adverse to quality (CNAQ): guidelines and examples.
Procedure non-adherence that does not affect safe, reliable operation of the
plant.
Procedure non-adherence that does not affect personnel safety.
A deficiency in equipment, material, documentation or procedure that does not
affect the safe, reliable operation of the plant.
A deficiency in equipment, material, documentation or procedure that does not
affect personnel safety.
A recordable injury.
Repeat issues.
Examples
FIG. VI.3. Condition adverse to quality, department level (CAQ-D): guidelines and examples.
Examples
FIG. VI.4. Condition adverse to quality, station level (CAQ-S): guidelines and examples.
Examples
While moving the fuel assembly, a thimble plug was knocked over.
Adverse trend involving inadequate operational procedure accuracy.
Unexpected objects found on lower core support plate.
Individual radiation exposure greater than administrative or regulatory limits.
During power ascension, main steam to de-aerator valve drifted open unexpectedly
resulting in an uncontrolled power increase of 2%.
Seven workers were severely burned when a steam line they were working on burst.
The equipment clearance order event resulted in personnel injury.
The equipment clearance order event resulted in equipment damage.
FIG. VI.5. Significant condition adverse to quality (SCAQ): guidelines and examples.
The purpose of this appendix is to illustrate by a small selection of actual operating cases the
importance of reporting, evaluating and trending low level events and near misses in order to
prevent the occurrence of events which can be safety significant, lead to unwanted
radiological doses or have a negative impact on the economic performance of the plant.
In case study 1, an evaluation of 12 reported events of not following work procedures could
have prevented the seven day delay in operation startup.
Case 2 describes the lucky situation that the same operator who detected the failure of a
light-bulb was in the daily morning meeting. Thus, he could give advice that further analysis
of the event was needed. After performing the analysis, a serious modification error was revealed.
In case study 3, trending of 12 delayed maintenance activities could have prevented the
(almost) violations of the technical specifications.
CASE STUDY 1
Initial plant conditions
Plant is in a refuelling outage.
Reactor core reload is complete.
Reactor vessel reassembly is complete.
Reactor coolant system is at reduced inventory to facilitate the removal of steam generator
nozzle dams.
Safety injection surveillance testing on train B is in progress.
Event
At 5:30 p.m. a safety injection initiation signal is received. The plant emergency diesel
generators start and the associated electrical busses are de-energized, then re-energized as the
diesels load electrically.
The initial investigation showed that a train A safety injection initiation signal had been
received from the reactor protection system.
Analysis of the event
Analysis of this event showed that the reactor operators performing the safety injection
system surveillance test on train B were actually working in the train A safety injection
system cabinet.
When the operators recognized their mistake, they attempted to back out of the test following
the procedure steps in reverse order. While repositioning the train A safety injection test
switches with safety injection train B in test, the safety injection logic was satisfied and
initiation took place.
Further analysis of this event identified:
During the previous six weeks operators had performed surveillance tests on the wrong
system or component four times.
The need to perform self-checking prior to starting any test or evolution was not stressed at
the pre-evolution briefing. This failure to self-check had resulted in seven valve and four
breaker mispositionings during the previous month.
Procedure non-adherence had been a recurring problem throughout the refuelling outage.
Twelve events had been reported during the previous six weeks.
Because no significant consequence was associated with the previously mentioned events,
these events were considered as isolated cases or near misses, and therefore no corrective
action was taken.
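When the utility looked at the above mentioned information, it realized that in the previous
six weeks 27 opportunities to prevent this event (previous low level/near miss events: 4
surveillance tests on the wrong system or component, 11 valve and breaker mispositionings
and 12 procedure non-adherence events) had been overlooked.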
Had these events been tracked and trended using a programme similar to that identified in
Appendix III of this publication, the need to initiate corrective actions would have been
identified and action taken.
This action would have strengthened/reinforced self-checking techniques and expectations,
which might have prevented additional wrong train and valve mispositioning events. Had
procedure adherence been discussed during the pre-evolution briefing, the operators would
have informed the control room that they were working on the wrong train, and appropriate
compensatory action would have been discussed.
The restart of the reactor was delayed approximately seven days while additional corrective
actions were taken to resolve the noted problems. The lesson learned by the utility is that
tracking and trending of the lower tier events will identify adverse trends in human
performance, which, if corrected, will help to prevent more significant events.
CASE STUDY 2
Many plants in eastern Europe were built according to Soviet design. The norms (technical
requirements) of that design differ, or differed, from the national norms. As a result, these
nuclear power plants have some exceptions to the rules.
In one of them (a PWR), the essential electricity supply system was modified.
Old equipment was replaced with new equipment. All changes were prepared carefully and
on time. The work appeared to be successful and everybody was satisfied. Nevertheless,
an event occurred. It is described below.
Initial plant conditions
Event
During a routine periodic test, an I&C worker tried to check the condition of the
electrical bulbs (fortunately or not, the train was in test condition) and the whole train was
switched off. There were no consequences.
At the morning meeting it was announced that the whole train had been switched off without any
consequences during the routine test.
As a result, a work order was written for the maintenance department to check the condition.
Analysis of the event
A detailed analysis showed that the work had been done according to the norm, but according
to the national norm rather than the original design norm. Consequently, the connections at
the interface between the new and the old parts of the equipment differed from the previous
ones. The old part was still supplied according to the original standard scheme. When the
train was in normal operation status, the event did not occur. The discrepancy was discovered
a few months after the work had been finished.
Discussion with I&C staff revealed that a similar situation had occurred a month earlier (a
short description was found in the I&C log; no special event report had been written).
The next day, once the result of the maintenance activity was known:
a special meeting of I&C, electrical, operations and maintenance staff was
organized,
staff from the technical department were invited,
corrective actions for the modification process were proposed.
The necessary corrective actions were taken immediately. The investigation of the event found
that there were many low level discrepancies in the modification process. These had
been found twice before, but no countermeasures had been taken.
A reporting system without low level barriers and criteria did not catch a single
non-safety-significant event (the burning fuse) that concealed weaknesses in the
modification process.
CASE STUDY 3
Initial plant conditions
The plant is operating at 100% power and has taken train A of essential cooling water out of
service for maintenance. This places the plant in a 72 hour shutdown action statement.
This maintenance activity has been on the plant maintenance schedule for 12 weeks. The final
schedule was locked down three weeks prior to the scheduled start date.
Following plant policy, during the three week period prior to the scheduled activity the
maintenance supervisor walked down the work, verifying that all the necessary repair parts
and work documents were correct and available.
Event
Work on the train A essential cooling water pump was in progress. The pump packing had
been removed and the mechanic was preparing to install the new packing. When the mechanic
removed the packing from the material staging area, he noted that he had removed 12 rings of
packing from the pump and only 8 rings of packing were available to be re-installed.
He immediately informed his supervisor. Additional packing was obtained and the pump was
returned to service prior to expiration of the limiting condition for operation action
statement. The pump was scheduled to be out of service for 48 hours; the actual return to
service of the pump was delayed for approximately 8 hours. (Total out of service time for the
pump was 56 hours.)
Analysis of the event
Investigation of this event showed that the supervisor had not done a physical verification of
the necessary repair parts; instead, he had relied on the material requisition form to verify
that the necessary repair parts were available.
Follow-up investigation in the condition reporting database, using trending codes, as
explained in Appendix III, identified 12 instances in the preceding three months where the
return to service of technical specification related equipment had been delayed because the
necessary repair parts were not available. Additionally, 32 non-technical-specification-related
equipment schedule performance problems, related to repair parts, were identified.
Therefore, further investigation was necessary to understand the event and the generic
implications. The subsequent investigation showed the following:
In 8 of the 13 events identified in this report, the supervisors had relied on the material
requisition form to ensure that the necessary repair parts were available. (The remaining
five delays were the result of increased work scope.)
Six supervisors were involved in the eight limiting condition for operation related events
being reviewed.
80% of the remaining shop supervisors were involved in the 32
non-technical-specification-related equipment events identified.
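The kind of trend-code query described above can be sketched as follows. This is a minimal illustration only: the trend codes, the record layout and the threshold of 8 are assumptions made for the example, not the coding scheme of Appendix III or of any real condition reporting database. The event counts mirror the figures quoted in the text.

```python
from collections import Counter

# Hypothetical (category, cause code) records; counts mirror the text above.
events = (
    [("TS", "parts_not_verified")] * 8      # supervisor relied on the requisition form
    + [("TS", "increased_work_scope")] * 5  # delay caused by increased work scope
    + [("NON_TS", "repair_parts")] * 32     # non-technical-specification-related problems
)

by_category = Counter(category for category, _ in events)
by_cause = Counter(cause for _, cause in events)


def adverse_trends(counts, threshold):
    """Return, in sorted order, the codes whose count meets or exceeds the threshold."""
    return sorted(code for code, n in counts.items() if n >= threshold)


print(by_category["TS"])            # 13
print(adverse_trends(by_cause, 8))  # ['parts_not_verified', 'repair_parts']
```

Grouping by cause code rather than by individual event is what exposes the common factor: most of the delays trace back to repair parts that were not physically verified.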
The results of this more in-depth investigation showed that the major contributor to this event
was the fact that the plant was going through a downsizing effort. As a result of this effort,
additional administrative burdens such as time keeping, maintenance of personnel logs and
development of training schedules had been assigned to the maintenance supervisors.
The use of trending showed that this was not an isolated case. Consequently, no disciplinary
action was taken; instead, corrective action was taken to reduce the administrative workload
on the maintenance supervisors, allowing them to focus on their primary assignment of
ensuring that scheduled maintenance activities are ready to be worked as scheduled.
Cause of the event
The root cause of this event was ineffective change management; the impact of the
reduction in the administrative workforce and the shifting of the administrative workload to
the maintenance supervisors was not fully evaluated prior to implementation.
APPENDIX VII.
EXAMPLE OF ENHANCING SAFETY CULTURE BY
ENCOURAGING ALL PERSONNEL TO REPORT LOW LEVEL EVENTS
As indicated in Section 3, the general approach to safety culture can be enhanced if low level
events, small degradation and near misses are discovered and reported by all plant personnel.
As an example, all maintenance workers should internally report any low level events.
However, these workers are usually not trained to identify precursors to safety problems.
They are generally well trained to identify the precursors of work accidents, but the
precursors of work accidents and those of nuclear safety events are different.
There are two items which must be taken into account: work safety and nuclear safety during
operation.
Work safety type analysis: The problem is essentially LOCAL (limited to the work area).
Nuclear safety type analysis: The problem occurs at the GLOBAL level of the plant (extended
to the interactions of all the systems of the plant).
[Further table entries are incomplete in the source: "... of analysis based on probabilistic safety ..."]
Some of the current methods of preliminary work analysis contain elements that can be used
for nuclear safety analysis. However, taken as a whole, they do not seem to be suited to the
problem at hand. It would be a mistake simply to transpose what is usually done for work
safety to nuclear safety. Work safety methods are generally designed for resolving the
problem of risk from the point of view of work safety (agents, the job site, or the system
directly concerned by the job at hand). Nuclear safety analysis poses a completely different
problem and can even lead to contradictory conclusions. For nuclear and work safety together,
the problem is more difficult and compromises are often impossible.
As far as nuclear safety analysis is concerned, it seems that the problem is essentially caused
by a fault in the global view of the system. However, this global view of the system is very
difficult to perceive. In particular, incident analyses have shown that even nuclear
safety specialists have missed this point, although PSA models can provide them with the
solution.
Therefore, there are two methods and two different objectives, which are explained in the
following example.
A maintenance craftsman identified a material deficiency on a charging pump suction flange.
Based on his experience he knew that fixing this deficiency would take about 5 minutes.
Therefore, he gathered his tools and started to work. The control room was not aware that
maintenance was in progress on the charging pump. So the control room operator performed
an automatic test of the pump suction valve. When the suction valve opened, water from a
refuelling water storage tank flooded the building.
The investigation of this event showed that the craftsman wanted to do a good job, but did not
see how his actions could be the origin of a safety event. This event identifies the need for
risk awareness and a good safety culture that must be supported by a global vision of the plant.
APPENDIX VIII.
Prompt assessment of OEF process effectiveness is also possible by monitoring the early
signs of declining performance. Industry experience has shown that areas to consider when
looking for the early signs of declining performance are:
Corporate coverage, supervision and support.
Does the board of directors receive reports that accurately reflect plant performance?
Are problems detected or issues raised prior to prompting by outside organizations?
When the outcome of such an assessment suggests the onset of degraded safety performance,
the regulator may provide guidance for those who are supervising the nuclear power plant.
DEFINITIONS
event. In the context of the reporting and analysis of events, an event is any unintended
occurrence, including operating error, equipment failure or other mishap, the consequences or
potential consequences of which are not negligible from the point of view of protection or
safety.
near miss. A potentially significant event that could have occurred as the consequence of a
sequence of actual occurrences but did not occur owing to the plant conditions prevailing at
the time.
root cause. The fundamental cause of an initiating event which, if corrected, will prevent its
recurrence, i.e. failure to detect and correct the relevant latent weakness(es) and the reasons
for that failure.
CONTRIBUTORS TO DRAFTING AND REVIEW
Balmisa, J.M.
Berg, H.P.
Betak, A.
Coovert, C.
Coovert, R.
Cui, Z.
Czak, S.
Diaz Francisco, J.
Eletronuclear, Brazil
Domenech, M.
Dubsk, L.
Dusic, M.
Frischknecht, A.
Foltov, I.
François, P.
EDFDER, France
Frischknecht, A.
Ganchev, T.
Glazunov, A.
Guha, G.
Gubler, R.
Guyer, H.
Hanski, O.
Hansson, B.
Haschke, D.M.
Hashmi, J.
Heather, C.
Henderson, N.R.
Iqleem, J.
Janei, A.
Johansson, A.
Kawano, A
Kinoshita, K.
Klimo, J.
Koltakov, V.
Kriz, Z.
Krizek, J.
Kubanyi, J.
Lan, Z.
Leijonberg, A.M.C.
Lipr, M.
Loiselle, G.
Malkhasyan, H.
Atomservice, Armenia
Mamani Alegria, Y.
Nichols, R.
Parsons, C.B.
Perez, S.
Perramon, F.
Poilpret, P.
Reis, T.
Rooseboom, A.J.
Serbanescu, D.
Sivokon, V.
Soldeus, U.
Srinivasan, G.R.
Szke, A.
Teodor, V.
Tóth, A.
Tolstykh, V.
Verduras Ruiz, E.
Waddell, T.
Werdine, H.
Wu, B.Q.
Yang, C.
Zhang, T.
Zellbi, I.
Consultants Meetings
Vienna, Austria:
15–19 December 1997
7–11 December 1998
15–19 November 1999
8–12 May 2000
21–25 August 2000
2–6 June 2003
Vienna, Austria:
18–22 October 1999
7–11 October 2002