
Copyright © IFAC Information Control Problems in Manufacturing, Salvador, Brazil, 2004

www.elsevier.com/locate/ifac

IMPROVING AUTOMATION SOFTWARE DEPENDABILITY:


A ROLE FOR FORMAL METHODS?

Timothy L. Johnson, PhD

GE Global Research, K-1, 5C30A
P.O. Box 8
Schenectady, NY 12301
(johnsontl@research.ge.com)

Abstract: The growth of manufacturing control software from simple NC and PLC-based
systems to concurrent networked systems incorporating PC's, PLC's, CNC's, and
enterprise databases has created new challenges to the design, implementation, and
maintenance of safe and dependable manufacturing systems. Key milestones in this
evolution, and the prospects for the use of formal verification methods in achieving
enhanced dependability of future manufacturing software, are examined in this paper and
presentation. Copyright © 2004 IFAC

Keywords: Automation, manufacturing systems, system engineering, programming


theory, reliability theory, safety analysis, computer software, testability.

1. INTRODUCTION

In the US, the Denver International Airport is often used by the Federal Aviation Administration (FAA) as a test site for new technologies. A perennial dread of the air traveler is the baggage handling system: lost bags, delayed bags, and worst of all, bags transferred to the wrong airline and ending up in remote places like Brazil (well, at least remote from Denver!). So, the FAA and the Denver businesses and politicians decided that the brand new airport would be a wonderful place to showcase new baggage handling technology. The system requirements were duly prepared, the contract awarded, and millions of dollars committed to a network of computer-controlled conveyors that would whisk luggage immediately to its intended destination (deNeufville, 1994). But then came the control system. The initial indication that something was wrong occurred when the rest of the airport, and the conveyors, were in place, but the software design had barely begun. The project became a laughing-stock when it was over two years late on delivery (the rest of the airport could not be used without it). Finally, the time for initial testing arrived: it was a disaster! The system could not do even the most basic luggage transport correctly. Patience wore thin. Political and business reputations were ruined. Finally, the system was scrapped and a "semi-automated" (viz., conventional) system was used instead! From the control engineer's perspective, the most serious consequence of this type of failure is that the public was left with the impression that automation itself was at fault, and not that (as was undoubtedly the case) the project was mis-managed. Dozens of airports around the country will now opt for less automated systems in place of more automated ones, and control engineers will have less to do.

As a whole, only about 30-40% of large software projects that are initiated will run to completion (Brooks, 1995), and this was one that didn't. Even though the record in manufacturing systems - which are highly structured - is probably better than this average, it still could benefit from substantial improvement (Place and Kang, 1993 - selected references from older literature have also been repeated here). Start-ups of new manufacturing and process plants are often notoriously delayed. And increasingly, software development is at the heart of most of the problems. With the rapid decrease in cost, and even more rapid increase in the capabilities, of computers over the last decade, the computing hardware components of automation have become less costly, more versatile, and more reliable. So the drive to shift hardware functions into software has accelerated over the last decade. Manufacturing software itself has expanded from isolated, carefully designed PLC logic systems that operate for months without interruption, to PC-based platforms where, even in the absence of an application, the operating systems must be rebooted every few days!

Not only have manufacturing control applications become rapidly more complex, but the expectations of timely response have also grown increasingly demanding. At the same time, other design requirements have grown more demanding as well. Availability targets have expanded from 95% to 99.99% or higher in some applications (e.g., network broadcasting). The numbers of measurement and control points and the size of control programs have
exploded. Networks are a part of almost every system (Perrow, 1984). Enterprise integration, as well as sensor-level integration, is expected.

In spite of the increasing level of dependence of manufacturers on automation software that is expected to be safe and reliable, very little rigorous statistical data concerning manufacturing software mishaps is available. The best publicly available data in the US appears to be in the area of Occupational Health and Safety incident investigations, and in documented court cases involving software failures. However, in the case of Occupational Health and Safety, many accident root causes may be traced to process, sensing, or display irregularities - even when software is involved. In court cases, e.g., those involving personal injury in manufacturing operations, the legal profession is frequently challenged to differentiate between error on the part of a software user and errors in the software itself: until very recently, end responsibility for safety-related functions has often been delegated by the courts to users or operators of software, even in cases of software malfunction. The vast majority of unscheduled outages are "routine": the appropriate unit or subunit is investigated, and then reset or restarted within a few minutes; nevertheless, part production runs below capacity during this time interval.

The advent of web-based and distributed software, often with multithreaded (concurrent) operation, and the contemplated use of wireless links for factory networks will create system-level fault modes of a complexity that could only be imagined a few years ago. In spite of this, it is not likely that computing progress will be reversed by these considerations. Instead, what may be required is a host of much more powerful verification and validation methods. The study of more powerful verification methods is bound to become more important as software becomes more complex. The purpose of this presentation is to review some of the fundamental factors underlying manufacturing software dependability, to survey the state of the art in current products and research related to verification, validation, and safety of such systems, and to provide a brief preview of some more recent research that shows promise in improving the quality of service (QOS) of manufacturing automation software. With an understanding of feedback processes and manufacturing system dynamics, control engineers and scientists are well qualified to play a vital role in the future of dependable manufacturing systems.

2. MANUFACTURING CONTROL BACKGROUND

Many of us are familiar with the development of manufacturing automation equipment - but have we ever thought about the evolution of test and verification for such equipment? The purpose of this brief sprint through history is to trace the growth of the test and verification processes and issues that have accompanied the better-known improvements in manufacturing automation and computation. This provides a context in which to assess the prospects for growth of formal methods in this field.

The earliest machine tool control languages, such as APT, which were invented in the late 1940's (Alford and Sledge, 1976), were deliberately designed for ease of test. The core elements of the language were instructions such as:

START
MOVE (position)
GO TO (line number)
STOP
RESET

In fact, sometimes START and STOP were instantiated in hardware! But even in these relatively simple cases, program verification could be difficult. The developer would be required to compute any coordinate transformations with a slide rule! The early programs were tested by running them through the machine tool, determining if the part was right, and then modifying the program if necessary. The data and logic of the programs were combined in the MOVE statement. The position data for the MOVE command was a set of coordinate values specifying the next set-point for a servo; the entire mechanical structure and coordinate system of the machine tool was presumed to be known to the user. The only transfer of control was via an unconditional GO TO. Instructions were executed according to a fixed, precisely timed clock cycle. No safety checking was performed, so the machining head could collide with any jig or guide, or indeed with a part of the machining table itself. Such machines were commonly found in a state of considerable damage after a few years of use! Nevertheless, numerically controlled machine tools became very popular and were used to make very complex parts with a much higher degree of consistency than human operators could produce. The machine and software could be verified, and even concurrently calibrated, by running a set of simple calibration routines and then gauging the resulting work-pieces to compare them with their intended values. Today's machine tools, of course, support a much higher degree of complexity, and may include built-in 3-D simulators with collision detection capabilities.
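The verification hazard described above - unconditional GO TOs and MOVE set-points with no safety checking - can be illustrated with a toy interpreter for a hypothetical APT-like instruction set. The instruction names follow the list above, but the syntax, geometry, and the workspace-limit check are inventions for illustration (historical controllers performed no such check):

```python
# Toy interpreter for a hypothetical APT-like language (illustrative only;
# real APT syntax differed).  Instructions: START, MOVE, GO TO, STOP.
# Unlike the historical controllers, this sketch flags moves that leave
# a safe envelope, to show what "safety checking" would have meant.

def run(program, limit=100.0, max_steps=1000):
    """Execute a list of (opcode, arg) pairs; return the trace of positions."""
    pc, pos, trace, running = 0, (0.0, 0.0), [], False
    for _ in range(max_steps):               # guard against GO TO loops
        if pc >= len(program):
            break
        op, arg = program[pc]
        if op == "START":
            running = True
        elif op == "STOP":
            break
        elif op == "MOVE" and running:
            pos = arg                        # servo set-point, data and logic combined
            if abs(pos[0]) > limit or abs(pos[1]) > limit:
                raise RuntimeError(f"collision risk at {pos}")
            trace.append(pos)
        elif op == "GO TO":
            pc = arg                         # unconditional jump, the only control transfer
            continue
        pc += 1
    return trace

prog = [("START", None), ("MOVE", (10.0, 0.0)),
        ("MOVE", (10.0, 5.0)), ("STOP", None)]
print(run(prog))                             # [(10.0, 0.0), (10.0, 5.0)]
```

Even in this toy setting, the only way to "verify" a program is to run it and inspect the trace - which is exactly the gauge-the-workpiece verification style the text describes.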
Figure 1: Machine Tool with CNC (early models used APT)

In the late 1950's and throughout the 1960's and 1970's, a complementary machine was born: the programmable logic controller (PLC). Such machines originally came about to replace racks of relays that had been developed in the 1920's to 1940's to govern the sequential control of complex operations such as telephone line switching, railroad signalling, and some early automation equipment. The development of the PLC was stimulated by several problems with relay racks:
• Relays would fail mechanically after 30,000 or 40,000 operations.
• Reprogramming required re-wiring and knowledge of the original logic!
• Large systems became limited by the reliability of individual components and wiring.
Early PLC's, like CNC equipment, used punched paper tapes (or front panel switch settings) to read in programs. Their role in the factory was different, and in some sense complementary to that of the CNC machine:
• Their inputs and outputs were normally logical (binary) rather than numerical (integer).
• They operated asynchronously with respect to the process.
• They used a programmed sequence of logic steps, normally executed in a loop repeated at a rate much higher than the rate of occurrence of new events in the controlled manufacturing process.
• The essence of the program was in the cycle-to-cycle changes in the results of each ladder logic "rung", not in perfect or unconditional repeatability.
Rather than becoming expert at the wiring of a relay rack, the PLC programmer became expert at "Relay Ladder Logic" (RLL) programming, which was normally interpretive rather than pre-compiled (since programs frequently were modified on the shop floor). Still, the program provided only very implicit reference to the desired properties of the system under control, making it extremely difficult to debug. Often, such programs would be developed first for the desired "correct" sequence of operations, and then conditional sections would be added to the program (often after controller installation) to provide for correct operation during unexpected events such as part jams or power failures. The "main execution loop" structure of such programs was also frequently confounding to the uninitiated, since in its pure form, the PLC would rely on the controlled system to store "state" information. For instance, an external relay would be set on one cycle and then its state could be read on the next cycle as a primitive form of "state transition". This fact, that the developer of an RLL program might scheme to store part of the system state in the plant itself, was often a big impediment to verification, particularly when the verifier was not intimately familiar with the equipment under control. At times, dummy states were even used to delay a transition, based on the known timing of a loop (even though PLC's later contained timers that could be set and tested with a similar effect).

The PLC is also of great interest because it was developed as a "universal" controller, where the program itself was the control law and could be changed directly without any intervening "re-design" process. Soon programs having thousands of lines of PLC code were being written. Verification of PLC programs became a cottage industry! As the programs became large and complex, the "side effects" of changing even a single line of code, rather than the likelihood of a bug occurring in the change itself, came to dominate the decision to make program changes; experienced PLC programmers would become more and more wary of changing code as programs became larger. Various methods of simulating plant behavior came into use; in some cases manufacturing equipment itself might contain "test" modes that could be used to assist in verification, since "open loop" testing of the PLC often was not meaningful. Testing in the presence of timing changes in the plant (or the controller) is also quite difficult with PLC's, and usually requires an actual manufacturing equipment installation to verify. For this reason, many "programming" systems were eventually made portable, so that they could be attached to equipment in situ. Methods for formal verification of PLC code are now coming into commercial use, and will be discussed a bit later.

Figure 2: Relay Ladder Programming, Ladder Diagram (LD) (GE Fanuc LogicMaster 90)
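The plant-stored-state trick described above can be sketched in a few lines. This is a minimal simulation assuming a single external relay as the program's only memory element; the scan structure (read inputs, evaluate rungs, write outputs) is the standard PLC cycle, but the rung logic itself is invented for illustration:

```python
# Minimal sketch of a PLC scan cycle in which the program keeps its only
# "state" bit in the plant itself (an external relay), as described above.
# The rung logic is hypothetical, for illustration only.

class Plant:
    """The controlled equipment: one external relay and one actuator."""
    def __init__(self):
        self.relay = False      # written by the PLC, read back on the next scan
        self.actuator = False

def scan(plant, start_button):
    """One PLC scan: read inputs, evaluate rungs, write outputs."""
    relay_was_set = plant.relay          # read inputs, including the relay
    # Rung 1: latch the external relay while the button is held.
    plant.relay = start_button
    # Rung 2: fire the actuator only on the scan AFTER the relay was set;
    # the relay thus acts as a primitive one-cycle "state transition".
    plant.actuator = relay_was_set and start_button

plant = Plant()
scan(plant, start_button=True)   # scan 1: relay latched, actuator still off
first = plant.actuator
scan(plant, start_button=True)   # scan 2: relay read back -> actuator fires
print(first, plant.actuator)     # False True
```

Nothing in the program text says "wait one cycle" - the delay exists only in the round trip through the hardware, which is precisely why a verifier unfamiliar with the equipment could not reconstruct the intended state machine from the code alone.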
Several subsequent developments have now also occurred:
• Factory Local Area Networks (LANs) were developed in the 1970's and 1980's, initially to transmit real-time control data between different PLC's.
• A "supervisory" or "enterprise" layer, initially based on mainframe computers, was added "above" the PLC layer, to support program download and factory-level synchronization and performance data collection.
• A (typically separate, and often non-real-time) monitoring and fault-reporting network was added, and tied in separately to the supervisory layer.
The supervisory and monitoring functions, being essentially "open loop", were less critical to operation, and in many cases would operate for years with serious deficiencies. They were often only tested carefully when someone actually needed to make use of the data that they produced! This was not true, though, for the Factory LAN system, which was typically required to synchronize operations between different work areas in a plant (e.g., two ends of a conveyor system!), so the hardware, software, and timing verification of such LANs became extremely important. Although these systems are being widely replaced with Internet technology today, the Factory LAN evolution during the 1980's and 1990's made important contributions to reliability, data encoding, and protocol development that eventually benefited Ethernet-based technology. Since PLC's were by nature asynchronous in concept, the use of Factory LANs was a natural development (it was as if I/O bits were simply set from a more remote location in the factory), and most PLC systems could still be verified on a machine-by-machine basis. In other words, little additional effort was normally required to establish tight concurrency between multiple PLCs ("islands of automation"): where this was required, an input or output line could simply be run directly to both systems. At the same time, factory-level start-up now required more attention when large PLC subsystems became coupled. A common verification approach would be to work from the end of a line backwards, in order to verify proper interactions of PLC subsystems, e.g., during re-start of a plant following an unscheduled outage.

Today, many individual items of manufacturing equipment have become quite a bit more accurate and sophisticated, often incorporating their own PC-based controllers rather than relying on a PLC to perform all of the necessary non-mechanical logic. Factory LANs may link PC's as well as PLC's, and there is ample opportunity for inconsistency in these interconnections. In the late 1980's and early 1990's, the "data driven" factory became a popular concept. In the extreme interpretation of this concept, all measurement data is deposited in a database (nominally, the present "state" of the factory), and any local controller can use any subset of the data to determine the outcome of a control action.

Figure 3: Factory LAN Hierarchy

This paradigm could be successfully applied to factory data collection networks, but was not usually successful when applied to PLC data, since sequencing constraints along the production line were difficult to implement; it would be easy to create control loops with non-local feedback that would oscillate at frequencies dependent on material transport delays through the system. Hence, the "data driven" paradigm was rarely applied to plant-wide control systems (with the possible exception of "open loop" controls at the enterprise level). The most common reflections of inconsistent network connections of this type are limit cycles (set/reset of the inconsistent variables) or deadlocks (e.g., two types of equipment each waiting for the other to initiate an action). A common practice for avoiding such conflicts is to force all synchronizing actions to occur through PLCs, while having a separate data, monitoring, and/or "enterprise" network which is merely used for data collection, polling, or remote diagnostics of PC-based equipment controllers, but never for control.

The most significant development within the last decade has been the advent of web-based applications. The ability to remotely view, re-program, re-configure, and sometimes even service equipment from a desktop has enormously improved factory productivity: manufacturing control engineers do not have to personally visit each item of equipment during an outage in order to reprogram, reset, or restart it, resulting in much shorter outages, and routine fault monitoring and diagnostics can be done at the desktop. With web-based tools, the remote diagnostics and servicing of equipment by OEMs also eliminates travel and delays associated with unexplained trips of more complex equipment (Locy, 2001). Standards for security and e-business interfaces to support remote monitoring and diagnosis are under development in several industries, as OEM's take on more responsibility to diagnose complex equipment faults. The advent of more complex patterns of data transfer has made system
level verification still more difficult, even when dependability at the subsystem level may have improved (sometimes due to very reliable computing hardware replacing much less reliable mechanical relays and interconnects).

Figure 4: e-Diagnostics Logical Layer

The next generation of factory computing technology can be expected to include wireless links, particularly for monitoring and diagnostics. Generally, control decisions can be made on the basis of much more complex combinations of measured variables. The role of PLC's in the factory can be expected to continue to decline, or at least the re-programming of PLC's can be expected to occur via remote "programmer" interfaces. Model-based control of discrete manufacturing processes can be expected to expand as the ability to visualize equipment state at a remote location is perfected. From a control perspective, two of the most substantial threats to dependability may become external network security threats (particularly on wireless links) and the advent of "applet" style (or downloadable) dynamic software modifications. Rigorous factory software configuration management will be necessary if the impact of these changes is to be confined. Factory-level regression test suites will probably become a reality. Robustness to interruptions of wireless transmission will pose new challenges to controls (and verification) designers!

The replacement of mechanical with electronic controls during this revolution in manufacturing technology has been driven largely by the need for controls to be more reliable than the controlled process, and by the fact that electronics typically have (hardware) failure rates about three orders of magnitude lower than mechanical equipment. The threat, however, is that software dependability may now limit further automation progress at the enterprise level. Even today, in some industries such as semiconductors and automobiles, system level availability remains moderate, in spite of very high dependability at the unit operation level. Enterprise systems accumulate so much data that they may be forced to "stumble" from one data deficiency to another, only operating as intended for brief periods of time!

Figure 5: Software Complexity Growth and System Availability Decline (illustration only)

3. SYSTEM ENGINEERING AND DEPENDABILITY

Dependability is a system engineering requirement, and normally involves performance requirements on both the severity and frequency of occurrence of unscheduled outages or production interruptions due to failures of production equipment, and particularly of automation equipment. Severe outages usually involve occupational safety, environmental damage, or equipment damage (Crocker, 1987). A recent summary of root causes of industrial accidents based on US Occupational Health and Safety incidents reported that failure to perform adequate Preliminary Hazard Analysis and Change Management studies (particularly for environmental control systems) was among the most common root causes of severe accidents (Belke, 1998). Less severe, but much more frequent, outages are caused by loss of primary facility power or by nuisance trips and erroneous fault indications of manufacturing equipment or controls. Commonly, the controller instigates such trips either when production part data or timing is out of specification, or when manufacturing equipment triggers a fault condition. In these cases, primary effort is invested in resolving or re-setting the fault trigger in a minimum time, and then in re-establishing production rates (Farrell, et al., 1993). Recent advances in system level control of supply chains and equipment buffers have suggested that "hedging point" strategies (implemented, typically, in the design and control of material handling systems) are an optimum way to accommodate temporary disruptions to production (Bullock and Hendrickson, 1994; Mourani, Hannequin & Xie, 2003).

Standards are playing a major role in the advance of dependability in public systems in general, and manufacturing software in particular. International standards organizations such as the ISO and IEC; professional societies such as IFAC, IEEE, ASME, and SAE; and privately sponsored industrial entities such as EPRI, International Sematech, Underwriters
Laboratories, and many others have successfully promoted the development of, and adherence to, standards. Standards have a significant impact on the formulation of performance specifications and are a means by which the manufacturer (by specifying that certain standards shall be met when production equipment is purchased) and the public (through occupational and safety standards) can assure safe operation and high quality products.

Either product or process specifications may affect the dependability of a manufacturing system. A "well toleranced" product design will match part dimensional specification accuracy to manufacturing system capability (Phillips, 1994), so that the dimensional tolerance is not tighter than the capability of cutting equipment, for instance. If this is not done, then standard quality tracking methods such as statistical process control (SPC) will generate gauging alarms more often than necessary. Among numerous process specifications are common variables such as power quality, ambient temperature, vibration levels, humidity, vapor pressures, and particulates in air, which represent potential "common mode" sources for large numbers of out-of-tolerance events. Measurement and control of these system level variables is critical for the reduction of nuisance alarms.

As our focus of interest will be on unit-level control and operations, the primary interest in system level methods concerns the manner in which unit level requirements are developed from system level dependability-related requirements, and potential areas for improvement of current design processes. Of course, unit level requirements are based on a functional flow-down from system level requirements. Commercial manufacturing practices, at this level, are usually less rigorous than corresponding practices in safety-critical military or transportation products, and a few of the differences are instructive (Neuman, 1995; DoD, 1984b). Practices such as (software) requirements traceability and Preliminary Hazard Analysis (PHA) are becoming common in safety-critical systems (Parnas, et al, 1990), but are not yet common in manufacturing systems. Also, performability, statistical reliability/availability, and life cycle cost analyses are now becoming common practice in critical systems, but are not yet common in manufacturing control systems - except for part gauging and statistical quality control, both of which are frequently done off-line rather than on-line (Abbott, 1988). The use of a system level simulation during preliminary design, particularly to validate requirements, is also missing from manufacturing design practice, or if present may be done only on parts of the system (e.g., material flow balance at the top level, individual unit operation sequencing at the unit level). Finally, consistent standards are often missing for acceptance testing, with the focus being on a "working demonstration" rather than a coherent approach toward extreme-value testing. These deficiencies in system level design practice often result in very long start-up times for new plants, where many equipment items and much software must be re-designed.

One benefit enjoyed by many manufacturing systems is the common practice of standards that apply to various classes of equipment: communications, voltage and current levels, terms of reference for PLC equipment (IEEE, 2000; IEC, 1986; CENELEC, 1997; Suyama, 2003). To some extent, standards circumvent the need for "custom" testing of components, with "plug and play" compatibility being a common objective. At the same time, there is a danger that many standards are only specified "down to some level" or "under certain conditions", beyond which implementation details are left to the supplier's discretion. A frequent consequence of this is that equipment that nominally meets certain standards will in fact fail to interface correctly due to differences in OEM supplier practices in using lower level options. Standards such as the ISO 7-layer protocol model of communications, and data exchange standards, are notorious for these problems. A more insidious version of this problem occurs when equipment appears to interface correctly, but fails to communicate when an exception condition is raised - i.e., in precisely the condition where communication is most critical!

Recent developments in the automotive and semiconductor industries (Locy, 2001; Schoop, et al, 2001) have motivated the development of improved methods for equipment monitoring and diagnostics. As noted above, these innovations are motivated by the higher frequency of brief, non-disruptive outages in manufacturing equipment, and by the rapidly increasing costs of diagnosis and repair for very sophisticated precision manufacturing equipment. The Internet provides a significant opportunity for remote monitoring, diagnostics, and even repair of equipment. Expert designers of OEM equipment can remotely access, inspect, and diagnose equipment condition, and in some cases instigate software repairs remotely, or provide remote guidance to on-site repair staff. E-Diagnostics has become an important new technology for reducing false alarms and downtime due to nuisance faults. With higher availability targets for automated factories, brief, repeated outages could significantly impact system availability (Chen & Trivedi, 1999).

At the unit operation level, top-level requirements are often reduced to the sequencing of sub-steps, accommodation of faults and alternate modes of operation, on-line part monitoring, and plant level synchronization and reporting. As indicated previously, two forms of hardware are most common for the implementation of control logic: the PLC and the PC. We now proceed to our main task of addressing dependability at this level.
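The sensitivity of availability to brief but repeated outages, noted above, is easy to quantify with the standard steady-state formula A = MTBF / (MTBF + MTTR). The numbers below are illustrative only, not drawn from the cited studies:

```python
# Steady-state availability A = MTBF / (MTBF + MTTR), applied to the
# brief-but-frequent outage pattern discussed above.  All figures are
# illustrative, not taken from the cited references.

def availability(mtbf_hr, mttr_hr):
    """Fraction of time the system is up, in steady state."""
    return mtbf_hr / (mtbf_hr + mttr_hr)

def downtime_hr_per_year(a):
    """Expected downtime per year (8760 h) at availability a."""
    return (1.0 - a) * 8760.0

# One six-minute nuisance trip every 10 hours of operation:
a = availability(mtbf_hr=10.0, mttr_hr=0.1)
print(round(a, 4))                        # 0.9901 -- barely two nines
print(round(downtime_hr_per_year(a), 1)) # 86.7 hours lost per year

# By contrast, a 99.99% target allows under an hour per year:
print(round(downtime_hr_per_year(0.9999), 2))  # 0.88
```

The point is that moving from 95% to 99.99% availability is not a matter of eliminating rare catastrophic failures: even very short, routinely cleared trips dominate the budget once the target exceeds about two nines.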
4. DEPENDABILITY AT THE UNIT OPERATION LEVEL

At the unit level, the dependability of control hardware, except in safety-critical applications, has nearly ceased to be an issue (Bryan and Siegel, 1988). Dedicated, embedded code runs in millions of applications daily, with digital hardware failure rates, even for entire CPU boards, commonly in the range of 10^-6 to 10^-8 per hour, or better. This is the result of many semiconductor and circuit design innovations that are too numerous to begin to describe here. A few system level hardware developments during the 1990's are noteworthy:
• The development of highly reliable, low-cost, non-volatile memories
• The advent of the "safety PLC" (Allen-Bradley, 2001)
• The development of highly reliable multi-layer high-speed asynchronous communication protocols and devices.
While hardware dependability has improved dramatically, software complexity has exploded, and may be on the verge of driving many applications - while very fully functioned - toward lower levels of dependability!

In first examining general purpose embedded applications, frequently implemented on the PC or via PLC co-processors, certain dependability issues become apparent at the "unit operation" control level (Boasson, 1993). The first issue is the lack of good validation models. The manufacturing "plant" is usually described either in qualitative terms or sometimes by reference to process equipment supplied by a particular supplier. Even in the rare instances when a nominal operating sequence is pre-specified, the absence of a model of the equipment operation makes extreme performance limits or fault conditions difficult to determine (e.g., Frachet, et al., 1997). Occasionally tools such as the fault tree, FMEA, or FMECA are used to provide qualitative information about fault conditions and corrective actions, but even in these cases, the likelihood of occurrence of various faults is rarely known in advance (DoD, 1984a; Greenberg, 1986). Similarly, queuing models may have been used to establish operating limits and baseline product flow rates, but this is relatively rare except in cases such as large new plants. Due to the lack of such models, a system may meet formal specifications but fail to meet informal or unstated requirements. Requirements in commercial applications tend to focus on the intended normal operation of a system, and not on how it is expected to behave in the event of specific types of resource limitations. Requirements often fail to address transient performance (e.g., during start-up, restart, or shutdown), and may also omit to mention timing synchronization errors between subsystems, or even within a subsystem (Gorski, 1986). This has become more significant as concurrent but loosely synchronized software subsystems have come to characterize most manufacturing applications.

Improvements can be made in several areas, but with reference to the above discussion, some specific topics that require additional research are as follows: (1) notification of "hardware" designers of operating system performance and certification requirements; (2) use of "ontologies" to capture the extended "meaning" of a formal requirement statement; (3) use of software-integrated fault tolerance, inspectability, and built-in test; (4) specification of opportunities to use standardized, configurable, pre-tested "middleware" libraries. More advanced design methods, based on UML, for instance, lead the designer through event sequence diagrams or similar paradigms to identify the end results of design decisions (Grimson and Kugler, 2000). But as often as not, the PLC programmer will be required to generate PLC code "on the fly" without proper opportunity for test and debugging.

5. OPPORTUNITIES FOR FORMAL VERIFICATION

The IEC 61508 standard has provided for the possibility of pre-certification of certain product software (IEC, 2000). For safety critical systems, this may require Independent Verification and Validation by an external authority, in addition to the submission and review of design and test documentation (RTCA, 1985; CENELEC, 1997; IEEE, 2000; MoD, 1991). The standard also, for the first time, allows formal verification methods to be used in the certification process, particularly where the collection of operational test data is not feasible (e.g., on a space flight, or in a system based on new technology (Lions, 1996)). The same is true for corresponding military standards such as the British 00-55, and safety critical system
perfonnance limits and system-level models of standards such as CENELEC. In general, such
desired perfonnance, preliminary hazard analyses methods require a fonnal requirements analysis, use
(PHA) also cannot be performed completely, so that of certified operating systems, and then fonnal
extreme test cases are very difficult to define and verification of code (Pilaud, 1990).
construct (Gruman, 1989).
The Grafcet standards and subsequent specification
Verification (i.e., that a proposed design will meet methods (Arzen, 2002; Silva, et aI., 200 I) have
requirements) is also subject to limitations. In most attempted to extend logic verification to address
cases, fonnal requirements only "cover" a very small certain elements of data flow and timing. The use of
subset of anticipated operating conditions, and while IEC 61499 function block representations as a
this makes verification easier (in the formal sense), it starting point for fonnal verification methods has
often leads to manufacturing control systems that been considered by Schnackenbourg, et al. (2003).

159
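The fault-tree analyses mentioned above become quantitative once probabilities are assigned to the basic events. A minimal sketch of fault-tree gate evaluation, assuming independent basic events and purely illustrative (invented) probabilities:

```python
# Minimal fault-tree quantification sketch (hypothetical gates and events).
# AND gates multiply probabilities of independent causes; OR gates combine
# them as 1 - prod(1 - p), the union probability for independent events.

def and_gate(*probs):
    p = 1.0
    for x in probs:
        p *= x
    return p

def or_gate(*probs):
    q = 1.0
    for x in probs:
        q *= (1.0 - x)
    return 1.0 - q

# Illustrative basic-event probabilities (assumed, not from any real plant):
p_sensor_stuck   = 1e-4   # position sensor fails high
p_valve_jammed   = 5e-5   # relief valve jammed
p_plc_output_err = 1e-6   # PLC output card fault

# Top event "unsafe motion": (sensor stuck AND PLC fault) OR valve jammed.
p_top = or_gate(and_gate(p_sensor_stuck, p_plc_output_err), p_valve_jammed)
print(f"P(top event) = {p_top:.3e}")
```

As the text notes, the weak point in practice is not this arithmetic but the fact that the basic-event likelihoods are rarely known in advance.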
Rausch and Krogh (1998) reported some early results in formal verification of PLC programs. A broader overview of prior work is provided by Frey and Litz (2000). De Smet and Rossi (2003) consider formal controller verification of RLL's with and without model checking, and find model checking to be significant in reducing the computational effort required to verify realistic safety and liveness properties for a realistic pick-and-place case study. Encouraging results on robustness of concurrent computational algorithms to inter-process timing variations have been proposed (Ushio and Wonham, 2001). Formal methods for developing and verifying diagnostic codes have been proposed (Paoli and Lafortune, 2003). Results for Petri nets illustrate the possibility of deducing system-level properties, such as liveness and safeness, from subsystem properties. However, even for networks consisting exclusively of interconnected PLC's, system-level verification remains elusive (Rushby, 1986; Sennett, 1989). The SCADE tools (see next section) allow a system-level Stateflow™ diagram to be selectively verified between certain points in the diagram.

Certain operating systems such as the OSPM operating system (www.ose.com) and the VxWorks™ operating system have been certified for certain applications, although certification of applications based on these operating systems cannot in most cases be obtained purely on the basis of the operating system certification.

To date, very few opportunities for formal verification have been realized in commercial software. Although formal verification has been used in certain critical military and space applications, the process still involves many manual stages, a good deal of expert knowledge, and an order-of-magnitude increase in time and/or cost in comparison to commercial applications. Within the commercial domain, formal verification of the correctness of communication protocols is perhaps the most widely used application; and this is often performed at the theoretical level, rather than on the software that implements a protocol. Although communication protocols may form a (small) part of manufacturing automation systems, the vast majority of such systems rely on heritage code that (even when the source code can still be found, after years of use) is very difficult to subject to formal verification; from the historical background of Section 2, the reasons for this are evident. Manufacturing applications may be seen to have two properties that are favorable for formal verification - the use of relatively simple programming languages (such as RLL), and the relatively high cost of logical and coding errors. They lack one important pre-requisite for early use of formal verification: both the products and processes of developing manufacturing software are extremely cost-sensitive, and the substitute of low-cost manual programming is readily available.

Initial attempts at PLC program verification, in particular De Smet and Rossi (2003) and related efforts, illustrate some important limitations of presently available verification methods:
• The PLC code itself does not capture enough of the system definition (or "model") or requirements to restrict the verification problem to a practical size. This requires the manual formulation of a number of additional constraints, which at this time requires deep knowledge of both the application and of formal verification methods.
• Presently available formal verification methods do not readily distinguish between errors in logic, errors in coding, and inconsistencies in requirements, although all of these sources may lead to verification failures and counterexamples.

Two factors that may accelerate the resolution of these issues are: (1) improvements in system specification methods, including several cited previously, that ultimately provide a complete set of constraints, requirements, and system models that will permit the complete automation of the verification process in practical applications, and (2) the adoption of more rigorous certification requirements for safety-critical systems by public agencies and standards organizations.

6. PRESENTLY AVAILABLE VERIFICATION TOOLS

A fledgling commercial industry has begun to develop around verification and validation needs, and both commercial and well-tested university software is available. In this section, we provide a brief synopsis of some recent tools that are suitable for improving the dependability of manufacturing and closely related critical systems. This list is representative, and not in any way comprehensive: it contains primarily tools that are suitable for commercial applications.

The following tools are summarized:

Praxis Critical Systems - SPARK:
(http://www.praxis-cs.co.uk/flashcontent/our-unique-products-1.htm) SPARK is a subset of Ada used for high-integrity program development. It has been used in some commercial applications such as railway interlock safety programming.

Reqtify:
(http://www.tni-world.com/reqtify.asp) The Reqtify toolset allows a user to mark up a requirements document and to construct a requirements traceability matrix from a natural-language requirements document.
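At its core, model checking of the kind applied to RLL programs above reduces to exhaustive exploration of a finite state space against a safety property, returning a counterexample trace if the property can be violated. A minimal sketch on a toy two-actuator interlock (an invented example, not the cited pick-and-place study):

```python
# Exhaustive state exploration of a toy PLC-style interlock (hypothetical
# model). State = (clamp_on, drill_on); inputs are scan-cycle commands.
# Safety property: the drill is never energized while the part is unclamped.

COMMANDS = ["clamp_on", "clamp_off", "drill_on", "drill_off"]

def step(state, cmd):
    clamp, drill = state
    if cmd == "clamp_on":
        clamp = True
    elif cmd == "clamp_off" and not drill:  # interlock: no release while drilling
        clamp = False
    elif cmd == "drill_on" and clamp:       # interlock: drill only when clamped
        drill = True
    elif cmd == "drill_off":
        drill = False
    return (clamp, drill)

def check_safety(initial=(False, False)):
    """Exhaustive exploration; returns a counterexample trace or None."""
    frontier, seen = [(initial, [])], {initial}
    while frontier:
        state, trace = frontier.pop()
        if state[1] and not state[0]:       # drill on while unclamped: unsafe
            return trace
        for cmd in COMMANDS:
            nxt = step(state, cmd)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, trace + [cmd]))
    return None

print("counterexample:", check_safety())    # None means the property holds
```

Real PLC verifiers face the same search, but over state spaces exponential in the number of coils and timers, which is why the manually supplied constraints discussed above matter so much.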
Reactis:
(http://www.reactive-systems.com/products.msp) Reactis is a new product that allows automatic model checking of Matlab™/Simulink diagrams.

SCADE:
(http://www.esterel-technologies.com/v3/?id=13281) SCADE is a code generator for safety-critical military applications that is part of a life-cycle code generation and maintenance system.

SliceMDL:
(http://www.ece.cmu.edu/cecs/main/projects.html) This program-slicing tool developed at CMU operates on Matlab™/Simulink diagrams to trace data dependencies forward or backward from a given point in a data flow diagram.

C-BMC:
(http://www-2.cs.cmu.edu/~modelcheck/cbmc/) C-Bounded Model Checker, also developed at CMU, is an extension of the SMV (hardware) verification concepts to programs written in the ANSI "C" language. See also Kroening, et al., 2003.

Codecheck:
(http://www.abxsoft.com/) Codecheck uses an extended first-pass compiler analysis to flag violations of programming rules at the program development stage.

SPIN:
(http://spinroot.com/spin/whatispin.html) SPIN is one of the earliest and most mature formal verification tools, and can now be applied to distributed computing applications.

Codewizard:
(http://www.parasoft.com/jsp/products/) Codewizard can check coding standards such as IEEE 1483 for railway interlocking, and supports predefined rule bases.

Polyspace Auditor:
(http://www.polyspace.com/) This award-winning product can be applied to formal verification of C source language programs.

As experience is gained with these tools through selective applications, typically by universities or industrial research laboratories, certain leading concepts are expected to emerge. We can certainly look forward to many success stories where formally validated software "saves the day" and avoids safety and environmental hazards, while facilitating the next level of automation - diagnostics and maintenance.

7. A FUTURE VISION FOR DEPENDABLE MANUFACTURING CONTROL

A vision for future advances in dependable manufacturing software is beginning to emerge, but it has many gaps that are not addressed by formal methods alone.

The development of formal methods for the statement and application of system requirements is still a key bottleneck (Grimson and Kugler, 2000; Jaffe, et al., 1991; Svedung, 2002; Machado, et al., 2003). A key issue is that natural-language statements of system-level requirements are incomplete and may be inconsistent. Not only are the "terminals" (nouns) in the grammars used to express such requirements undefined, but many of the relationships expressed by the requirements are also difficult to translate into quantitative terms. When a requirement is expressed in natural language, the author often infers a host of relationships that are implicit in the language. Not only do the nouns need to be associated with physical entities on the manufacturing floor, but relationships also need to be related to unit operations or modes. A manufacturing control ontology is needed so that the inferred relationships among entities in a requirements document can be explored automatically (e.g., by traversing relationship graphs and inferring implied requirements from them).

Ironically, the controller or controlled object may not be mentioned at all in requirements documents: it is assumed to exist! This gap may be filled by the use of standardized manufacturing simulation languages, practices, or graphic representation paradigms (Vain and Kyttner, 2001). For instance, many libraries of process control primitives, and application-specific modeling packages, now exist for steam pipe systems, power plants, motors, batch process operations, circuit design, and signal processing. By associating the entities and relationships in these packages with ontologies, one can infer the existence of certain system components (e.g., a boiler control) when only the (controlled) system is cited in the requirements. In this way, a requirement can be associated with a set of preconditions on a high-level plant model, a set of operations (or controlled modes), and a set of outcomes. By using inference (or perhaps fuzzy inference; Holmes and Ray, 2001) to traverse such graphs or ontologies, a much more complete set of inferred requirements (and also test cases) can be automatically generated. Another irony is that in spite of the existence of qualitative requirements, it is often difficult to define the precise quantitative behaviors that are expected for specific quantified inputs to a system: not only are the test cases difficult to derive, but the expected performance in any specific test case may also be difficult to derive. In fact, tracing through the various requirements that apply in a given test case may allow one to define - by intersecting a number of qualitative performance conditions, each derived from a different path through the requirements network - a much more precise statement of expected behavior.
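The graph-traversal inference described above can be sketched directly: transitively closing an "implies" relation over the entities a requirement cites yields the components and requirements it leaves unstated. The toy ontology below is invented for illustration; a real manufacturing control ontology would be far richer:

```python
# Sketch of inferring implied requirements by traversing a relationship
# graph. The "implies" relation here is a hypothetical toy example.

from collections import deque

# "A implies B": citing entity A in a requirement implies entity/requirement B.
ONTOLOGY = {
    "boiler":          ["boiler control", "pressure sensor"],
    "boiler control":  ["overpressure interlock", "operator alarm"],
    "pressure sensor": ["sensor calibration requirement"],
    "conveyor":        ["motor drive", "e-stop circuit"],
}

def implied_entities(cited):
    """Transitively close the 'implies' relation over the cited entities."""
    queue, inferred = deque(cited), set(cited)
    while queue:
        node = queue.popleft()
        for nxt in ONTOLOGY.get(node, []):
            if nxt not in inferred:
                inferred.add(nxt)
                queue.append(nxt)
    return inferred - set(cited)   # only the newly inferred items

# A requirement that mentions only the controlled system:
print(sorted(implied_entities({"boiler"})))
```

Here a requirement citing only "boiler" is found to imply a boiler control, an overpressure interlock, an operator alarm, and a sensor calibration requirement, none of which it mentions explicitly.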
At the same time, this approach may prompt developers to state requirements with greater precision and consistency than can now be accomplished.

A system strategy is needed for certification itself! Verification and validation activities today require a combination of process certification, formal verification (which is still optional), traditional case-by-case testing (even when this is known to be incomplete), and system-level validation tests (Musa, 1993). This process is very costly and not very strategic. If problems are encountered during system-level validation, one must often undertake extensive re-design and re-validation before the system can progress. The case of long start-up delays for new manufacturing plants was mentioned previously. Incomplete requirements, particularly hazard requirements, as indicated above, are the single most serious and most costly sources of loss of dependability (Kletz, 1982; Knutson and Carmichael, 2003). Detailed attention is needed to provide the documentation and information structures during the early stages of design that will lead to very low defect rates in later stages of design. Concepts such as late-point identification and design of experiments need to be applied to the determination of system validation test plans. Since system demonstration and validation are generally very expensive, particularly if fault conditions must be verified, great care should be taken to optimize the testing that is done at this stage, so that a maximum amount of critical information is extracted during the system validation process, and so that sufficient "tunability" exists in the system-level parameters that a complete redesign is only very rarely needed.

A third area worthy of note is design for serviceability and life-cycle management of a manufacturing facility. The life cycles of most products today are much shorter than that of the equipment needed to manufacture them. This technological progress has led to enormous waste, and a glut of slightly used but highly specialized manufacturing equipment (with the rapid growth of an attendant worldwide market for capital equipment re-use). Before it is placed in service, almost every manufacturing facility is already scheduled for capital equipment upgrades. Dependability is no longer a static concept: it is a dynamic requirement that needs to be updated and re-interpreted throughout the life cycle of a plant. Thus, verification needs to be integrated with the development of diagnostic methods that can be dynamically adapted as equipment is rearranged, tuned, or reconfigured during the life of a plant (e.g., extensions of fault-tree analysis, as explored in Henry and Faure, 2003). The continuity of data from the initial concept through operation, and finally disposition of a plant, should be given active consideration. This includes the possibility of recovery, reconfiguration, and disassembly or recycling of OEM equipment as it reaches the end of its useful life. Built-in diagnostics and test software, based on logical concepts that are independent of a particular controller architecture, are needed. The advanced concepts developed for diagnostics and remote servicing are very promising, but they require too large an investment of engineering effort to be cost-effective when equipment has short life cycles. The preceding design and verification steps, if properly formulated, can provide much data that can be used during later stages of the manufacturing life cycle, so that dependability is not a characteristic that only characterizes "the bottom of the bathtub curve" for a plant. In many cases, most of the profit is made by a business either during the falling (early) or rising (late) stage of the bathtub curve - so this is where attention is needed!

Returning to the example cited in the introduction, there is a sister example that is a success story: the recently implemented, agent-based system at Daimler-Chrysler (Schoop, et al., 2001). This is a well-designed system with demonstrated availability benefits, based on rigorous and consistent application of modern agent-based factory software integrated with PLC-based factory automation software. Although formal verification was not used explicitly in this system, the analysis suggests that appropriate design documentation and practices could be applied to verify important parts of the system, such as the messaging protocol and PLC code. This is an appropriate challenge upon which to close this discussion!

8. REFERENCES

Abbott, H. (1988) Safer by Design: The Management of Product Design Risks Under Strict Liability. The Design Council, London.

Alford, C. O., and R. B. Sledge (1976), Microprocessor Architecture for Discrete Manufacturing Control, Part I: History and Problem Definition, IEEE Trans. Manufacturing Technology, Vol. MFT-5, No. 2, pp. 43-49 (et seq).

Allen-Bradley Co. (2001) Safety PLCs: How They Differ from their Traditional Counterparts. White Paper (1755-WP001A-EN-E), Rockwell Automation.

Arzen, K-E., Rasmus Olsson, and Johan Akesson (2002). Grafcet for Procedural Operator Support Tasks. Proc. 15th IFAC Congress, Barcelona, July.

Belke, J. C. (1998) Recurring Causes of Recent Chemical Accidents, Proc. Intl. Conf. and Workshop on Reliability and Risk Management, San Antonio, TX, Sept.

Boasson, M. (1993) Control Systems Software. IEEE Trans. Auto. Control, Vol. 38, No. 7, pp. 1094-1106.

Brooks, F. P. (1995) The Mythical Man Month, 20th Anniversary Edition. Reading, MA: Addison-Wesley.
Bryan, W. and S. Siegel. (1988) Software Product Assurance - Reducing Software Risk in Critical Systems. In COMPASS '88 Computer Assurance, pages 67-74, Gaithersburg, MD, July.

Bullock, D. and C. Hendrickson (1994) Roadway Traffic Control Software, IEEE Trans. on Control Systems Technology, Vol. 2, No. 3, pp. 255-264.

CENELEC (1997) Railway Applications: Software for railway, control, and protections systems, Standard EN 50128 (June).

Chen, D. and K. S. Trivedi (2002), Reliability Engineering and System Safety. RESS 3012 (in press).

Crocker, S.D. (1987) Techniques for Assuring Safety - Lessons from Computer Security. In COMPASS '87 Computer Assurance, pages 67-69, Washington, D.C., July.

De Smet, O. and O. Rossi. (2002). "Verification of a controller for a flexible manufacturing line written in Ladder Diagram via model-checking," Proc. 21st American Control Conference, pp. 4147-4152.

DeNeufville, R. (1994) The Baggage System at Denver: Prospects and Lessons. Journal of Air Transport Management, Vol. 1, No. 4, Dec., pp. 229-236.

Department of Defense (US). (1984a) Procedures for Performing a Failure Mode, Effect and Criticality Analysis. Military Standard 1629A.

Department of Defense (US). (1984b) System Safety Program Requirements. Military Standard 882B.

Ehrenberger, W. D. (1987) Fail-Safe Software - Some Principles and a Case Study. In B.K. Daniels, editor, Achieving Safety and Reliability with Computer Systems, pages 76-88.

Farrell, J., T. Berger, and B. Appleby. (1993) Using Learning Techniques to Accommodate Unanticipated Faults, IEEE Control Systems Magazine, pp. 40-49 (June).

Frachet, J.-P., Lamperiere, S., and Faure, J.-M. (1997), "Modeling discrete event systems behaviour using the hyperfinite signal", European Journal of Automation, Vol. 31, No. 3, pp. 453-470.

Frey, G., and L. Litz, (2000). "Formal methods in PLC Programming", Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (SMC2000), pp. 2431-2436.

Gorski, J. (1986) Design for Safety Using Temporal Logic. In IFAC SAFECOMP '86, pages 149-155, Sarlat, France.

Grimson, J. B., and H-J. Kugler. (2000) Software Needs Engineering - A Position Paper, Proc. ICSE, ACM, Limerick, Ireland, pp. 541-544.

Greenberg, R. (1986) Software Safety Using FTA Techniques. In Safety and Reliability of Programmable Electronic Systems, pages 86-95, Essex, England, Elsevier.

Gruman, G. (1989) Software Safety Focus of New British Standard, Def. Std. 00-55. IEEE Software, 6(3): 95-97.

Henry, S., and J. M. Faure. (2003), Elaboration of invariants safety properties from fault-tree analysis, Proc. IMACS-IEEE Computational Engineering in Systems Applications (CESA'03), Paper S2-1-04-0372.

Holmes, M. and A. Ray. (2001) Fuzzy Damage-Mitigating Control of a Fossil Power Plant, IEEE Trans. on Control Systems Technology, Vol. 9, No. 1, pp. 140-147.

IEEE. (2000) Verification of Vital Functions in Processor-Based Systems Used in Rail Transit Systems. STD 1483-2000.

International Electro-technical Commission. (1998-2000) IEC 61508: Functional safety of electrical/electronic/programmable electronic safety related systems.

Jaffe, M. S., N.G. Leveson, M. Heimdahl, and B. Melhart. (1991) Software Requirements Analysis for Real-Time Process-Control Systems. IEEE Transactions on Software Engineering, March.

Kletz, T.A. (1982) Hazard Analysis - A Review of Criteria. Reliability Engineering, 3(4): 325-338.

Knutson, C., and S. Carmichael. (2003) Safety First: Avoiding Software Mishaps, in Embedded Systems Programming.

Kroening, D. R., Clarke, E., and Yorav, K. (2003) Behavioral Consistency of C and Verilog Programs Using Bounded Model Checking, Proc. DAC 2003, pp. 368-371, ACM Press.

Krogh, B. (2003) SliceMDL. URL: http://www.ece.cmu.edu/cecs/main/projects.html

Lamperiere-Couffin, S., Rossi, O., Roussel, J.-M., and Lesage, J.-J. (1999), "Formal verification of PLC programs: a survey", Proc. ECC '99, paper no. 741.

Lamperiere-Couffin, S. and Lesage, J.-J. (2002), Formal verification of the Sequential Part of PLC programs. Proc. IFAC 2002 World Congress, Barcelona.
Leveson, N. G. (1991) Software Safety in Embedded Computer Systems. Communications of the ACM, 34(2): 34-46.

Leveson, N. G. (1995) Safeware: System Safety and Computers. Reading, MA: Addison-Wesley.

Leveson, N. (2002) A New Accident Model for Engineering Safer Systems, MIT Engineering Systems Division Symposium, Cambridge, May 2002 (to appear in Safety Systems).

Lions, J.-L. (1996) Ariane 5, Flight 501 Failure, Report of the Inquiry Board. http://www.esrin.esa.it/htdocs/tidc/Press/Press96/ariane5rep.html

Locy, M. (2001). The impact of e-diagnostics - one year later. Proc. 2001 IEEE Intl. Semiconductor Manufacturing Symposium, pp. 435-438, San Jose, CA.

Machado, J.M., B. Denis, J. Lesage, J.M. Faure, and J. F. DeSilva. (2003). "Model of the Mechanism Behavior of PLC Programs," 17th Intl. Congress of Mechanical Engineering (COBEM), Paper 0831.

Ministry of Defence (UK). (1991) Hazard Analysis and Safety Classification of the Computer and Programmable Electronic System Elements of Defence Equipment. Defence Standard 00-56, Ministry of Defence, Great Britain, April.

Mourani, I., Hannequin, S., and Xie, X. (2003). Optimal discrete-flow control of a single-stage failure-prone manufacturing system. Proc. 42nd IEEE Conf. on Decision and Control, pp. 5462-5467, Maui, HI.

Musa, J. (1993) Operational Profiles in Software-Reliability Engineering. IEEE Software, March.

Neumann, P. (1995) Computer-Related Risks. Addison-Wesley.

Paoli, A., and S. Lafortune (2003). Safe diagnosability of discrete event systems. Proc. 42nd IEEE Conference on Decision and Control, pp. 2658-2664, Maui, HI.

Parnas, D. L., G.J.K. Asmis, and J. Madey. (1990) Assessment of Safety-Critical Software. Technical Report 90-295, Queens University, Kingston, Ontario, Canada, December.

Perrow, C. (1984) Normal Accidents: Living with High Risk Technologies. Basic Books.

Phillips, R.G. (1994) Use of Redundant Sensory Information for Fault Isolation in Manufacturing Cells, IEEE Trans. Industry Applications, Vol. 30, No. 5, pp. 1413-1425.

Pilaud, E. (1990) Some Experiences of Critical Software Development. In 12th International Conference on Software Engineering, pages 225-226, Nice, France, March.

Place, P.R.H., and K. C. Kang. (1993) Safety-Critical Software: Status Report and Annotated Bibliography. CMU/SEI-92-TR-5.

Radio Technical Commission for Aeronautics. (1985). Software Considerations in Airborne Systems and Equipment Certification, Standard DO-178a, Washington, D.C.

Rausch, M., and B.H. Krogh (1998), "Formal Verification of PLC Programs," Proc. 1998 American Control Conference.

Rushby, J. M. (1986) Kernels for Safety? In T. Anderson, Ed., Safe and Secure Computing Systems, pages 210-220, Glasgow, Scotland, October.

Schnakenbourg, C., J.-M. Faure, and J.-J. Lesage. (2002). "Towards IEC61499 Function Blocks Diagrams Verification", Proc. IEEE Int'l. Conference on Systems, Man & Cybernetics, Paper TA1C2.

Schoop, R., R. Neubert, and B. Suessmann (2001). Flexible manufacturing control with PLC, CNC and Software Agents, Proc. 5th IEEE Intl. Symp. on Autonomous Decentralized Systems, pp. 265-371, Dallas, TX.

Sennett, C. T. (1989) High Integrity Software. Pitman.

Silva, B. I., O. Stursberg, B. H. Krogh and S. Engell (2001). An assessment of the current status of algorithmic approaches to the verification of hybrid systems. Proc. 40th IEEE Conference on Decision and Control.

Suyama, K. (2003). Safety integrity analysis framework for a controller according to IEC 61508. Proc. 42nd IEEE Conference on Decision and Control.

Svedung, I. (2002) Graphic representation of accident scenarios: Mapping system structure and the causation of accidents, Safety Science, vol. 40, Elsevier Science Ltd., pages 397-417.

Ushio, T., Y. Li, and W. M. Wonham. (1992) Concurrency and State Feedback in Discrete-Event Systems, IEEE Trans. Auto. Control, Vol. 38, No. 8, pp. 1180-1184.

Vain, J. and R. Kyttner (2001) Model Checking - A New Challenge for Design of Complex Computer-Controlled Systems, Proc. 5th Int'l Conf. on Engineering Design and Automation, Las Vegas, pp. 593-598.
