Design for Reliability
By Dev G. Raheja and Louis J. Gullo
About this ebook
A unique, design-based approach to reliability engineering
Design for Reliability provides engineers and managers with a range of tools and techniques for incorporating reliability into the design process for complex systems. It clearly explains how to design for zero failure of critical system functions, leading to enormous savings in product life-cycle costs and a dramatic improvement in the ability to compete in global markets.
Readers will find a wealth of design practices not covered in typical engineering books, allowing them to think outside the box when developing reliability requirements. They will learn to address high failure rates associated with systems that are not properly designed for reliability, avoiding expensive and time-consuming engineering changes, such as excessive testing, repairs, maintenance, inspection, and logistics.
Special features of this book include:
- A unified approach that integrates ideas from computer science and reliability engineering
- Techniques applicable to reliability as well as safety, maintainability, system integration, and logistic engineering
- Chapters on design for extreme environments, developing reliable software, design for trustworthiness, and HALT influence on design
Design for Reliability is a must-have guide for engineers and managers in R&D, product development, reliability engineering, product safety, and quality assurance, as well as anyone who needs to deliver high product performance at a lower cost while minimizing system failure.
Contributors
Steven S. Austin
Missile Defense Agency
Department of Defense
Huntsville, Alabama
Lawrence Bernstein
Stevens Institute of Technology
Hoboken, New Jersey
Joseph A. Childs
Missiles and Fire Control
Lockheed Martin Corporation
Orlando, Florida
Jack Dixon
Dynamics Research Corporation
Orlando, Florida
Louis J. Gullo
Missile Systems
Raytheon Company
Tucson, Arizona
Samuel Keene
Keene and Associates, Inc.
Lyons, Colorado
Brian Moriarty
Engility Corporation
Lake Ridge, Virginia
Dev Raheja
Raheja Consulting, Inc.
Laurel, Maryland
Robert W. Stoddard
Six Sigma IDS, LLC
Venetia, Pennsylvania
C.M. Yuhas
Foreword
The importance of quality and reliability to a system cannot be disputed. Product failures in the field inevitably lead to losses in the form of repair cost, warranty claims, customer dissatisfaction, product recalls, loss of sales, and in extreme cases, loss of life. Thus, quality and reliability play a critical role in modern science and engineering and so enjoy various opportunities and face a number of challenges.
As quality and reliability science evolves, it reflects the trends and transformations of technological support. A device utilizing a new technology, whether it be a solar power panel, a stealth aircraft, or a state-of-the-art medical device, needs to function properly and without failure throughout its mission life. New technologies bring about new failure mechanisms (chemical, electrical, physical, mechanical, structural, etc.), new failure sites, and new failure modes. Therefore, continuous advancement of the physics of failure, combined with a multi-disciplinary approach, is essential to our ability to address those challenges in the future.
In addition to the transformations associated with changes in technology, the field of quality and reliability engineering has been going through its own evolution: developing new techniques and methodologies aimed at process improvement and reduction of the number of design- and manufacturing-related failures.
The concept of design for reliability (DFR) has been gaining popularity in recent years and its development is expected to continue for years to come. DFR methods shift the focus from reliability demonstration and the outdated "test-analyze-fix" philosophy to designing reliability into products and processes using the best available science-based methods. These concepts intertwine with probabilistic design and design for six sigma (DFSS) methods, focusing on reducing variability at the design and manufacturing levels. As such, the industry is expected to increase the use of simulation techniques, enhance the applications of reliability modeling, and integrate reliability engineering earlier and earlier in the design process. DFR also transforms the role of the reliability engineer from being focused primarily on product test and analysis to being a mentor to the design team, which is responsible for finding and applying the best design methods to achieve reliability. A properly applied DFR process ensures that pursuit of reliability is an enterprise-wide activity.
Several other emerging and continuing trends in quality and reliability engineering are also worth mentioning here. For an increasing number of applications, risk assessment will enhance reliability analysis, addressing not only the probability of failure but also the quantitative consequences of that failure. Life-cycle engineering concepts are expected to find wider applications in reducing life-cycle risks and minimizing the combined cost of design, manufacturing, quality, warranty, and service. Advances in prognostics and health management will bring about the development of new models and algorithms that can predict the future reliability of a product by assessing the extent of degradation from its expected operating conditions. Other advancing areas include human and software reliability analysis.
Additionally, continuous globalization and outsourcing affect most industries and complicate the work of quality and reliability professionals. Having various engineering functions distributed around the globe adds a layer of complexity to design coordination and logistics. Moving design and production into regions with little knowledge depth regarding design and manufacturing processes, with a less robust quality system in place and where low cost is often the primary driver of product development, affects a company's ability to produce reliable and defect-free parts.
Despite its obvious importance, quality and reliability education is paradoxically lacking in today's engineering curriculum. Few engineering schools offer degree programs or even a sufficient variety of courses in quality or reliability methods. Therefore, a majority of quality and reliability practitioners receive their professional training from colleagues, professional seminars, and from a variety of publications and technical books. The lack of formal education opportunities in this field greatly emphasizes the importance of technical publications for professional development.
The real objective of the Wiley Series in Quality & Reliability Engineering is to provide a solid educational foundation for both practitioners and researchers in quality and reliability and to expand the reader's knowledge base to include the latest developments in this field. This series continues Wiley's tradition of excellence in technical publishing and provides a lasting and positive contribution to the teaching and practice of engineering.
Andre Kleyner
Editor
Wiley Series in Quality & Reliability Engineering
Preface
Design for reliability (DFR) has become a worldwide goal, regardless of the industry and market. The best organizations around the world have become increasingly intent on harvesting the value proposition for competing globally while significantly lowering life cycle costs. DFR principles and methods aim to prevent faults, failures, and product malfunctions proactively, which results in cheaper, faster, and better products. In Japan, this tool is used to gain customer loyalty and customer trust. However, we still face some challenges. Very few engineering managers and design engineers understand the value added by design for reliability; they often fail to see savings in warranty costs, increased customer satisfaction, and gain in market share.
These facts, combined with the current worldwide economic challenges, have created perfect conditions for this science of engineering. It is also an art, because many decisions have to be based not only on evidence-based data but also on engineering creativity to design out failure at lower cost. Readers will be delighted with the wealth of knowledge because all contributors to this book have at least 20 years of hands-on experience with these methods.
The idea for this book was conceived during our participation in the IEEE Design for Reliability Technical Committee. We saw the need for a DFR volume not only for hardware engineers, but also for software and system engineers. The traditional books on reliability engineering are written for reliability engineers who rely more on statistical analysis than on improvements in inherent design to mitigate hardware and software failures. Our book attempts to fill a gap in the published body of knowledge by communicating the tremendous advantages of designing for reliability during the very early development phase of a new product or system. This volume fulfills the needs of entry-level design engineers, experienced design engineers, engineering managers, as well as the reliability engineers/managers who are looking for hands-on knowledge on how to work collaboratively on design engineering teams.
ACKNOWLEDGMENTS
We would like to thank the IEEE Reliability Society for sowing the seed for this book, especially the encouragement from a former society president, Dr. Samuel Keene, who also contributed chapters in the book. We would like to recognize a few of the authors for conducting peer reviews of several chapters: Joe Childs, Jack Dixon, Larry Bernstein, and Sam Keene. We also thank the guest editors—Tim Adams, at NASA Kennedy Center, and Dr. Nat Jambulingam, at NASA Goddard Space Flight Center—who helped edit several chapters. We are grateful to Diana Gialo, at Wiley, who has always been gracious in helping and guiding us.
We acknowledge the contributions of the following:
Steve Austin (Chapter 12)
Larry Bernstein (Chapter 13)
Joe Childs (Chapters 2, 6, and 15)
Jack Dixon (Chapters 9 and 16)
Lou Gullo (Chapters 4, 5, 10, 11, 14, and 18)
Sam Keene (Chapters 3 and 8)
Brian Moriarty (Chapter 17)
Dev Raheja (Chapter 1)
Bob Stoddard (Chapter 7)
C. M. Yuhas (Chapter 13)
Dev Raheja
Louis J. Gullo
Introduction: What You Will Learn
Chapter 1 Design for Reliability Paradigms (Raheja)
This chapter introduces what it means to design for reliability. It shows the technical gaps between the current state of the art and what it takes to design reliability as a value proposition for new products. It gives real examples of how to get a high return on investment and how to understand the art of design for reliability. The chapter introduces readers to the deeper-level topics with eight practical paradigms for best practices.
Chapter 2 Reliability Design Tools (Childs)
This chapter summarizes reliability tools that exist throughout the product's life cycle from creation, requirements, development, design, production, testing, use and end of life. The need for tools in understanding and communicating reliability performance is also explained. Many of these tools are explained in further detail in the chapters that follow.
Chapter 3 Developing Reliable Software (Keene)
This chapter describes good design practices for developing reliable software embedded in most high-technology products. It shows how to prevent software faults and failures often inherent in the design by applying evidence-based reliability tools to software, such as FMEA, capability maturity modeling, and software reliability modeling. It introduces the most popular software reliability estimation tool, CASRE (Computer Aided Software Reliability Estimation).
Chapter 4 Reliability Models (Gullo)
This chapter is on reliability modeling, one of the most important tools for design for reliability in the early stages of design, to determine the strategy for overall reliability. The chapter covers models for system reliability and component reliability, and shows the use of block diagrams in modeling. It discusses the reliability growth process, similarity analysis used for physical modeling, and widely used models for simulation.
Chapter 5 Design Failure Modes, Effects, and Criticality Analysis (Gullo)
This chapter on FMECA contains the core knowledge for reliability analysis at the system, subsystem, and component levels. The chapter shows how to perform risk assessment using a risk index called the Risk Priority Number and shows how to eliminate single-point failures, making a design significantly less vulnerable. It explains the difference between FMEA and FMECA and how to use them to improve product performance and maintenance effectiveness.
Chapter 6 Process Failure Modes, Effects, and Criticality Analysis (Childs)
The previous chapter showed how to make a design more robust. This chapter applies the FMEA tool to analyze a process for robustness so that manufacturing defects are eliminated before they show up in production. The end result is improved product reliability with lower manufacturing costs. It covers a step-by-step procedure for performing the analysis, including the risk assessment using the Risk Priority Number.
Chapter 7 FMECA Applied to Software Development (Stoddard)
The FMEA tool is just as applicable to software design. There is very little literature on how to apply it to software. This chapter shows the details of how to use it to improve software reliability. It covers the lessons learned and shows different ways of integrating the FMECA into the most widely used software development model, known as the "V" model. The chapter describes roles and responsibilities for proper use of this tool.
Chapter 8 Six Sigma Approach to Requirements Development (Keene)
In this chapter the author explains why Design of Experiments (DOE) is a sweet spot for identifying the key input variables to a Six Sigma program. The chapter covers the origin of this program, the meaning of Six Sigma measurements, and how it is applied to improve the design. It then proceeds to cover the tools for designing the product for Six Sigma performance to reduce failure rates as close to zero as possible.
Chapter 9 Human Factors in Reliable Design (Dixon)
Humans are often blamed for many product failures when in fact the fault lies in insufficient attention to human factors engineering. This chapter covers the principles of human-centered design to make the man-machine interface robust and error-tolerant. It covers how to perform the human factors analysis and how to integrate it to make the product design user-friendly.
Chapter 10 Stress Analysis During Design to Eliminate Failures (Gullo)
This chapter explains why it is critical to reduce the design stress to improve durability as well as reliability. It introduces the concept of derating as a design tool. The author includes examples on electrical and mechanical stress analysis including how to apply this theory to software design. The chapter also shows how to apply Finite Element Analysis, a numerical technique, to solve specific design problems.
Chapter 11 Highly Accelerated Life Testing (Gullo)
Usually designers cannot predict what failures will occur for a new design. This chapter shows how highly accelerated life tests and highly accelerated stress tests can reveal the failure modes quickly. It covers how to design these tests and how to estimate the design margin from the test results. It shows different methods of accelerating the stresses.
Chapter 12 Design for Extreme Environments (Austin)
When a product is used in extreme cold or extreme heat, such as in Alaska or in a desert in Arizona, we must design for such environments to ensure that the product can last long enough. This chapter shows what factors need to be considered and how to design for each condition. It shows how lessons learned from space programs and overseas experience can help make products durable, reliable, and safe.
Chapter 13 Design for Trustworthiness (Bernstein and Yuhas)
This is a very important chapter because software design methods for reliability are not standardized yet. This chapter goes beyond reliability to design software such that it is also safe, and secure from errors in engineering changes, which are very frequent. This chapter covers design methods and offers suggestions for improving the architecture, modules, and interfaces, and for using the right policies for reusing software. The chapter offers good design practices.
Chapter 14 Prognostics and Health Management Capabilities to Improve Reliability (Gullo)
Design for reliability practices should include detecting an impending malfunction before the product actually malfunctions. This chapter covers prognostics and product health monitoring principles that can be designed into the product. The result is enhanced system reliability. The chapter includes condition-based maintenance and time-based maintenance, use of failure precursors to signal an imminent failure event, and automatic stress monitoring to enhance prognosis.
Chapter 15 Reliability Management (Childs)
This chapter provides both motivation and guidance in outlining the importance of good reliability management. Management participation is the key to any successful reliability in design. It shows how to manage, plan, execute, and document the needs of the program during early design. It describes the important tasks, and closing the feedback loops after reliability assessment, problem solving, and reliability growth testing.
Chapter 16 Risk Management, Exception Handling, and Change Management (Dixon)
Many risks are overlooked in a product design. This chapter defines what risk is in engineering terms and how to predict, assess, and mitigate it. It highlights the role of risk management culture in mitigating risks and the critical role of configuration management in avoiding new risks from design changes. Included in this chapter is how to minimize oversights and omissions, including requirements creep.
Chapter 17 Integrating Design for Reliability with Design for Safety (Moriarty)
This chapter integrates reliability with safety, including how to design for safety. It covers several safety analysis techniques that apply equally to reliability. It shows how a risk assessment code matrix is used widely in aerospace and many commercial products to make risk management decisions. It includes examples of risk reduction.
Chapter 18 Organizational Reliability Capability Assessment (Gullo)
This chapter describes the benefits of using IEEE 1624–2008 standard to describe how reliability capability of an organizational entity is determined by assessing eight key reliability practices and associated metrics. Management should know the capability of an organization to deliver a reliable product, which is defined as organizational reliability capability. It describes the process in detail with case studies.
Chapter 1
Design for Reliability Paradigms
Dev Raheja
Why Design for Reliability?
The science of reliability has not kept pace with user expectations. Many corporations still use MTBF (mean time between failures) as a measure of reliability, which, depending on the statistical distribution of failure data, implies acceptance of roughly 50 to 70% failures during the time indicated by the MTBF. No user today can tolerate such a high number of failures. Ideally, a user does not want any failures for the entire expected life! The life expected is determined by the life inferred by users, such as 100,000 miles or 10 years for an automobile, at least 10 years for kitchen appliances, and at least 20 years for a commercial airliner. Most commercial companies, such as automotive and medical device manufacturers, have stopped using the MTBF measure and aim at 1 to 10% failures during a self-defined time. This is still not in line with users' dreams. The real question is: Why not design for zero failures if we can increase profits and gain more market share? Zero failures implies zero mission-critical failures or zero safety-critical system failures. As a minimum, systems in which failures can lead to catastrophic consequences must be designed for zero failures. There are companies that are able to do this. Toyota, Apple, Gillette, Honda, Boeing, Johnson & Johnson, Corning, and Hewlett-Packard are a few examples.
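The "roughly 50 to 70%" figure can be checked with a short calculation. The sketch below is illustrative only and is not taken from the book: it assumes a Weibull life distribution (the function names and the shape values are hypothetical choices) and treats the MTBF as the mean life of that distribution. With a shape parameter of 1 the Weibull reduces to the exponential distribution, for which about 63% of units have failed by the MTBF.

```python
import math


def fraction_failed_by_mtbf_exponential() -> float:
    """Fraction of units failed by t = MTBF under an exponential life model.

    F(t) = 1 - exp(-t / MTBF), so F(MTBF) = 1 - exp(-1).
    """
    return 1.0 - math.exp(-1.0)


def fraction_failed_by_mean_weibull(shape: float) -> float:
    """Fraction of units failed by the mean life under a Weibull model.

    Assumption: the MTBF is taken to be the Weibull mean,
    mean = scale * gamma(1 + 1/shape). The scale cancels out, so
    F(mean) = 1 - exp(-(gamma(1 + 1/shape)) ** shape).
    """
    if shape <= 0:
        raise ValueError("shape must be positive")
    mean_over_scale = math.gamma(1.0 + 1.0 / shape)
    return 1.0 - math.exp(-(mean_over_scale ** shape))


if __name__ == "__main__":
    print(f"exponential: {fraction_failed_by_mtbf_exponential():.3f}")
    # Hypothetical shape values: <1 early-life failures, 1 random, >1 wear-out.
    for shape in (0.7, 1.0, 2.0, 3.5):
        print(f"Weibull shape {shape}: {fraction_failed_by_mean_weibull(shape):.3f}")
```

Under these assumptions the fraction failed by the mean life moves from about one-half for strongly wear-out-dominated shapes toward roughly 70% for early-life-dominated shapes, which is consistent with the range quoted above.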
The aim of design for reliability (DFR) is to design-out failures of critical system functions in a system. The number of such failures should be zero for the expected life of the product. Some components may be allowed to fail, such as in redundant systems. For example, in aerospace, as long as a system can function at least for the duration of the mission and the failed components are replaced prior to the next mission to maintain redundancy, certain failures can be tolerated. This is, however, insufficient for complex systems where thousands of software interactions, hundreds of wiring connections, and hundreds of human factors affect the systems' reliability. Then there are issues of compatibility [1] among components and materials, among subsystems, and among hardware and software interactions. Therefore, for complex systems we may find it impossible to have zero failures, but we must at least prevent the potential failures we know about. Since failures can come from unknown and unexpected interactions, we should try to design-in fallback modes for unexpected events. A "what-if" analysis usually points to some events of this type. To minimize failures in complex systems, in this book we describe techniques for improving software and interface reliability.
As indicated earlier, some companies have built a strong and long-lasting reputation for reliability based on aiming at zero failures. Toyota and Sony built their world leadership mostly on high reliability; and Hyundai has been offering a 10-year warranty and increasing its market share steadily. Progress has been made since then. In 1974, when nobody in the world gave a warranty longer than one year, Cooper Industries gave a 15-year warranty to electric power utilities on high-voltage transformer components and stood out as the leader in profitability among all Fortune 500 electrical companies. Raytheon has established a culture at the highest level in the corporation of providing customers with mission assurance through a "no doubt" mindset. Says Bill Swanson, chairman and CEO of Raytheon: "[T]here must be no doubt that our products will work in the field when they are needed" (Raytheon Company, Technology Today, 2005, Issue 4). Similarly, with its new lifetime power train warranty, Chrysler is creating new standards for reliability.
Reflections on the Current State of the Art
Reliability is defined as the probability of performing all the functions (including safety functions) satisfactorily for a specified time and specified use conditions. The functions and use conditions come from the specification. If a specification misses or is vague 60% or more of the time, the reliability predictions are of very little value. This is usually the case [2]. The second big issue is: How many failures should be tolerable? Some readers may not agree that we can design for zero critical failures, but the evidence supports the contrary conclusion. We may not be able to prevent failures that we did not foresee, but we can design out all the critical failure modes that we discover during the requirements analysis and in the failure mode and effects analysis (FMEA). In over 30 years' experience, I have yet to encounter a failure mode that cannot be designed-out. The cost is usually not an issue if the FMEA is conducted and the improvements are made during the early design stage. The time specified for critical failures in the reliability definition should be the entire lifetime expected.
In this chapter we address how to write a good system specification and how to design so as not to fail. We make it clear that the design for reliability should concentrate on the critical and major failures. This prevents us from solving easy problems and ignoring the complex ones. The following incident raises issues that are central to designing for reliability.
The lessons learned from the collapse of the Interstate 35 bridge in Minnesota into the Mississippi River on August 1, 2007, which killed 13 people, give us some clues about what needs to be done. Similar failure mechanisms can be found in many large electrical and mechanical systems, such as aircraft and electric power plants.
The bridge was expanded from four lanes to six, and eventually to eight. Some wonder whether that might have played a role in its collapse. Investigators said the failure resulted because of a flaw in its design. The designers had specified a metal plate that was too thin to serve as a junction of several girders.
Like many products, it gradually got exposed to higher loads, adding strain to the weak spot. At the time of the collapse, the maintenance crews had brought tons of equipment and material onto the deck for a repair job. The bridge was of a design known as a nonredundant structure, meaning that if a single part failed, the entire structure could collapse. Experts say that the pigeon dung all over the steel could have caused faster corrosion than was predicted.
This case history challenges the fundamentals of engineering taught in the universities.
Should the design margin be 100% or 800%? How does the designer determine the design margin?
Should we design for pigeons doing their dirty job? What about designing for all the other environmental stressors, such as chemicals sprayed during snow emergencies, tornados, and earthquakes?
Should we design-in redundancy on large mechanical systems to avoid disasters? The wisdom says that redundancy delays failures but may not avoid disasters. The failure could occur in both the redundant paths, such as in an aircraft accident where the flying debris cut through all three redundant hydraulic lines.
Should we design for sudden shocks experienced by the bridge during repair and maintenance?
These concerns apply to any product, such as electronics, electrical power systems, and even a complex software design. In software, corrosion can be a metaphor for applying too many patches without understanding their interactions. Call it software corrosion.
The answers to the questions above should be a resounding yes.
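The redundancy question above can be made concrete with a small calculation. The following is only a minimal sketch: the function name, the assumption that channels fail independently unless a single common-cause event removes all of them, and every number in it are illustrative assumptions, not data from the bridge or the aircraft accident.

```python
def probability_all_channels_lost(p_channel, n_channels, p_common_cause=0.0):
    """Probability that every redundant channel is lost.

    p_channel      -- assumed chance that one channel fails on its own
    n_channels     -- number of redundant channels
    p_common_cause -- assumed chance of one event (e.g. flying debris)
                      that removes all channels at once
    """
    # Independent failures must hit every channel for the function to be lost.
    independent_loss = p_channel ** n_channels
    # A common-cause event defeats the redundancy regardless of n_channels.
    return p_common_cause + (1.0 - p_common_cause) * independent_loss


# Illustrative numbers only (hypothetical, not taken from the text).
single = probability_all_channels_lost(1e-3, 1)
triple = probability_all_channels_lost(1e-3, 3)
triple_cc = probability_all_channels_lost(1e-3, 3, p_common_cause=1e-5)

print(f"single channel:              {single:.3e}")
print(f"three independent channels:  {triple:.3e}")
print(f"three channels, common cause: {triple_cc:.3e}")
```

Under these assumptions the common-cause term dominates the result for the triple-redundant case, which is one way to read the statement that redundancy delays failures but may not avoid disasters.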
An engineering team should foresee all these and many more failure scenarios before starting to design. The obvious strategy is to write a good system specification by first predicting all major potential failures and avoiding them by writing robust requirements. Oversights and omissions in specifications are the biggest weakness in the design for reliability. Typically, 200 to 300 requirements are missing or vague for a reasonably complex system such as an automotive transmission.
The analysis techniques covered in this book for hardware and software help us discover many missing requirements, and a good brainstorming session for overlooked requirements always results in discovering many more. What we really need, perhaps, are paradigms based on lessons learned.
The Paradigms for Design for Reliability
Reliability is a process. If the right process is followed, results are likely to be right. The opposite is also true in the absence of the right process. There is a saying: "If we don't know where we are going, that's where we will go."
It is difficult enough to do the right things, but it is even more difficult to know what the right things are!
Knowledge of the right things comes from practicing the use of lessons learned. Just having all the facts at your fingertips does not work. One must use the accumulated knowledge to arrive at correct decisions. Theory is not enough. One must keep getting better by practicing. Take the example of swimming. One cannot learn to swim from books alone; one must practice swimming. It is okay to fail as long as mistakes are the stepping stones to failure prevention. Thomas Edison was reminded that he failed 2000 times before the success of the light bulb. His answer: "I never failed. There were 2000 steps in this process."
One of the best techniques is to use lessons learned in the form of paradigms. They are easy to remember and they make good topics for brainstorming during design reviews.
Paradigm 1: Learn To Be Lean Instead of Mean
When engineers say that a component's life is five years, they usually mean the calculated mean value, which implies roughly a 50% chance of failure within those five years. In other words, either the supplier or the customer has to pay for 50% failures during the product cycle. This is expensive for both: a lose–lose situation. Besides, there are many indirect expenses: for warranties, production testing, and more inventories to replace failed parts. This is mean management. It has a negative return on investment. It is mean to the supplier because of loss of future business and mean to the customer in putting up with the frustrations of downtime and the cost of business interruptions. Therefore, our failure rate goal should be as lean as possible. Engineers should promise minimum life to customers, not mean life. Never use averages in reliability; they are of no use to anyone.
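The gap between mean life and a minimum life promise can be illustrated numerically. The sketch below assumes a two-parameter Weibull life model, which the text does not specify; the shape values, the choice of a 1% failure fraction (often called B1 life), and all function names are assumptions made for illustration. Only the five-year mean comes from the paragraph above. The 50% figure is exact when the life distribution is symmetric about its mean; for a constant failure rate (Weibull shape 1) the fraction failed by the mean life is about 63%.

```python
import math


def weibull_fraction_failed(t, shape, scale):
    """Fraction of units expected to have failed by time t."""
    return 1.0 - math.exp(-((t / scale) ** shape))


def weibull_b_life(fraction, shape, scale):
    """Time by which `fraction` of units are expected to fail (0.01 -> B1 life)."""
    return scale * (-math.log(1.0 - fraction)) ** (1.0 / shape)


def scale_for_mean(mean_life, shape):
    """Weibull scale parameter that yields the requested mean life."""
    return mean_life / math.gamma(1.0 + 1.0 / shape)


MEAN_LIFE_YEARS = 5.0  # the "five-year life" quoted in the text

# Shape values are hypothetical: 1.0 = constant failure rate,
# larger values = wear-out behavior.
for shape in (1.0, 2.0, 3.5):
    scale = scale_for_mean(MEAN_LIFE_YEARS, shape)
    failed_by_mean = weibull_fraction_failed(MEAN_LIFE_YEARS, shape, scale)
    b1_years = weibull_b_life(0.01, shape, scale)
    print(
        f"shape={shape:>3}: failed by mean life = {failed_by_mean:.1%}, "
        f"B1 life = {b1_years:.2f} years"
    )
```

In every case the life by which only 1% of units have failed is far shorter than the five-year mean, which is the practical argument for quoting a minimum life rather than an average.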
Paradigm 2: Spend a Lot of Time on Requirement Analysis
It is worth repeating that the sources of most failures are incomplete, ambiguous, and poorly defined requirements. That is why we introduce unnecessary design changes and write deviations when we are in a hurry to ship a product. Look particularly for missing functions in the specifications. There is often practically nothing in a specification about modularity, reliability, safety, serviceability, logistics, human factors, reduction of "no faults found," diagnostics capability, and prevention of warranty failures. Very few specifications address even obvious requirements, such as internal interface, external interface, user–hardware interface, user–software interface, and how the product should behave if and when a sneak failure occurs. Developing a good specification is an iterative process with inputs from the customer and the entities that are downstream in the process. Those who are trying to build reliability around a faulty specification should only expect a faulty product. Unfortunately, most companies think of reliability when the design is already approved. At this stage there is no budget and no time for major design changes. The only thing a company can do is to hope for reasonable reliability and commit to do better the next time.
To identify missing functions, a cross-functional team is necessary. At least one member from each discipline should be present, such as manufacturing, field service, and marketing, as well as a customer representative. If the specification contains only 50% of the necessary features, how can one even think of reliability? Reliability is not possible without accurate and comprehensive specifications. Therefore, writing accurate performance specifications is a prerequisite for reliability. Such specifications should aim at zero failures for the modes that result in product recalls, high downtime, and inability to diagnose. My interviews with those attending my reliability courses reveal that the dealers are unable to diagnose about 65% of the problems (no faults found). Obviously, fault isolation requirements in the specifications are necessary to reduce downtime.
To ensure the accuracy and completeness of a specification, only those who have knowledge of what makes a good specification should approve it. They must ensure that the specification is clear on what the product should never do, however stupid it may sound. For example: "There shall be no sudden acceleration during landing" for an aircraft. In addition, the marketing and sales experts should participate in writing the specification to make sure that old warranty problems "shall not" be in the new product and that there is enough gain in reliability to give the product a competitive edge.
The "shall not" specification is not limited to failures. That would be too simple. We must be able to see the complexity in this simplicity. This is called interconnectedness. We need to know that reliability is intertwined with many elements of life-cycle costs. The costs of downtime, repairs, preventive maintenance, amount of logistics support required, safety, diagnostics, and serviceability are dependent on the level of reliability. In the same spirit, we should also analyze product friendliness and modularity, which are interconnected with reliability. For example, General Motors is designing its hydrogen cars to have a single chassis for all models instead of 80 different chassis as is the case with current production. This action influences reliability in many ways. Similarly, an analysis of downtime should be conducted by service engineering staff to ensure that each fault will be diagnosed in a timely manner, repairs will be quick, and life-cycle costs will be reduced by extending the maintenance cycles or eliminating the need for maintenance altogether. The specification should be critiqued for quick serviceability and ease of access. Until the specification is written thoroughly and approved, no design work should begin.
An example of the need to identify missing requirements is that nearly 1000 people around the world lost their lives while the kinks were being removed from the 290-ton McDonnell Douglas DC-10 during the 1970s. Blown-out cargo doors, shredded hydraulic lines, and engines dropped during the flight were just a few of the behemoth's early problems. It is obvious that the company did not have the right system performance specification. We rely on customers to tell us what they want, but they themselves don't know many requirements until there is a breakdown. Customers are not going to tell us that the cargo doors should not blow out during a crowded flight. It is