Comparison Modeling of System Reliability for Future NASA Projects

Amanda M. Gillespie, ASQ CRE, SAIC
Mark W. Monaghan, Ph.D., SAIC
Yuan Chen, Ph.D., NASA LaRC

Key Words: RBD, Importance Measure, Cut Set, Fussell-Vesely, Comparison Modeling

SUMMARY & CONCLUSIONS

A National Aeronautics and Space Administration (NASA) supported Reliability, Maintainability, and Availability (RMA) analysis team developed an RMA analysis methodology that uses cut set and importance measure analyses to model and compare proposed avionics computing architectures. In this paper we present an effective and efficient application of the RMA analysis methodology for importance measures that includes Reliability Block Diagram (RBD) analysis, comparison modeling, cut set analysis, and importance measure analysis. In addition, we demonstrate that integrating RMA early in the system design process provides a key and fundamental decision metric that supports design selection.

The RMA analysis methodology presented in this paper and applied to the avionics architectures enhances the usual way of predicting the need for redundancy based on failure rates or subject matter expert opinion. Typically, RBDs and minimal cut sets, along with the Fussell-Vesely (FV) method, are used to calculate importance measures for each functional element in the architecture [1]. This paper presents an application of the FV importance measure as a methodology for using importance measures in success space to compare architectures. These importance measures are used to identify which functional element is most likely to cause a system failure, thus quickly identifying the path to increase overall system reliability by either procuring more reliable functional elements or adding redundancy [2].
This methodology, which used RBD analysis, cut set analysis, and the FV importance measure, allowed the avionics design team to better understand and compare the vulnerabilities in each scenario of the architectures. It also enabled the design team to address the deficiencies in the design architectures more efficiently, while balancing the need to design for optimum weight and space allocations.

1 INTRODUCTION

A trade study was performed to evaluate various avionics computing architectures from the perspectives of reliability, mass, power, data integrity, software implementation, and hardware and software integration for future NASA programs. A set of RBD models was developed to analyze the reliability of, and rank, the various computing system architectures. These reliability analysis modules allowed for ease and consistency in calculating reliability, cut sets, and importance measures. First, the RBD modules were created and the reliability of each architecture was calculated. Second, cut set analyses were performed to determine the functional elements most likely to fail in each architecture (i.e., which functional elements had the largest unreliability). Third, FV importance measures were calculated for each functional element in each of the architectures. Last, identical functional elements were grouped to allow for comparison between the architectures and to provide an understanding of which functional elements had the most significant impact on system reliability.

2 SCOPE

This paper describes the reliability engineering methodology developed for the RBD comparison, cut set analysis, importance analysis, and improvement recommendations for the architectures for future NASA launch vehicles.

3 ASSUMPTIONS

To ensure that the RBD modules for each scenario of the architecture were comparable, repeatable, and auditable, assumptions were documented, which included functional element failure rates, fault tolerance, mission duration, and cable and connector details.
3.1 Functional Element Failure Rates

The failure rates for the functional elements in the architectures were estimated based on existing avionics system reliability databases. To facilitate comparison, the same failure rates were assumed for the same functional elements, interconnects, and topologies across the various architectures. All functional elements were assumed to follow an exponential failure distribution: not only is this the most common approach, but the exponential distribution models the useful life period, during which failures occur randomly at a constant rate. This useful life period was the focus of the avionics architecture trade study. Due to the proprietary nature of these failure rates, they are not listed in this paper. The functional element failure rates were based on existing avionics architectures used in aircraft, and a failure rate environmental conversion was made to the Spaceflight (SF) environment. The conversions were made in accordance with the System Reliability Center (SRC) environmental matrix [3].

3.2 Fault Tolerance

Avionics systems have evolved over time to incorporate fault tolerance within the system architecture. The capability to survive a functional element fault has driven the design of multiple avionics system configurations [4]. In the Delta family of launch vehicles, the avionics systems have evolved over time based on reliability and fault tolerance [5]. Several examples of various avionics system architectures were considered. The NASA avionics architecture team selected the systems to be evaluated based on fault tolerance capability. The reliability analysis for the selected avionics systems assumed one-fault tolerance for each functional element, i.e., more than one failure in any single functional element was deemed to be a system failure. The configuration of the functional elements provided the fault tolerance capability.
For example, the failure of more than one of the three Inertial Navigation Units (INU) would result in a system failure. In addition, only hard or non-recoverable failures of the functional elements were considered in the analysis; the impact of a functional element operating in a degraded state was not taken into consideration. For a triplex voter (TV), the functional unit configuration was assumed to consist of three functional units plus majority voting logic, with the functional elements in a parallel configuration for reliability calculations (2-of-3 in agreement for success). Figure 1 shows the RBD configuration for a TV. For a self-checking pair (SCP) functional element configuration, it was assumed that, for the flight computers, switches, or buses, the self-checking pair consists of two flight computers, two switches, or two buses, which needed to have data agreement to be successful. Therefore, the SCP functional elements were in a series configuration for reliability calculations (2-of-2 in agreement for success). Figure 2 shows the RBD configuration for an SCP.

Figure 1 – RBD Example of TV and Fully-Cross Strapped Architecture

3.3 Channelized and Fully-Cross Strapped Configurations

A fully-cross strapped configuration was assumed to be such that all functional elements could share data, i.e., FC-1 could share data with all switches, and all switches could share data with all instrumentation/sensors and effectors/actuators. Figure 1 shows the RBD configuration of the fully-cross strapped architecture. A channelized configuration was assumed to be such that only functional elements of the same channel could share data, i.e., Flight Computer-1 (FC-1) only shared data with Switch-1 (SW-1), and SW-1 could only share data with Data Acquisition Unit-1 (DAU-1), Main Propulsion System-1 (MPS-1), etc. Figure 2 shows the RBD configuration of the channelized architecture.

Figure 2 – RBD Example of SCP and Channelized Architecture

3.4 Mission Duration

The reliability analysis was performed for a time period of up to nine months (6,480 hours) for mission scenarios that could potentially require an Earth departure stage and a long mission duration.

3.5 Cables and Cable Connectors (C&C)

The different architectures were modeled with and without cabling as part of the RBD analysis. The request to model cabling was made in order to identify a potential difference when cabling is installed into the models. Each cable input to a functional element was modeled as having its own failure properties. The cabling assembly RBD "block" was assumed to comprise the cable, supports, and connectors. Each cable assembly was assumed to be routed individually from source to destination.

4 RELIABILITY BLOCK DIAGRAMS (RBD)

An RBD provides a pictorial representation of the architecture's reliability performance. The RBD demonstrates the logical connection of functional elements needed for system success. The RBD does not identify the avionics system topology but rather the functional element logical connections [6]. The particular assemblies identified by the RBD blocks identify system operational functions. The RBD model does not demonstrate physical configuration, cannot predict mass, does not estimate power consumption, and cannot guarantee that the reliability values demonstrated are capable of being achieved. However, when RBDs from different architectures or systems share the same assumptions (failure rates, fault tolerance, etc.), they provide the ability to rank architectures by order-of-magnitude comparison, in which case the design engineer can determine the most reliable system architecture [7].
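Under the Section 3.1 exponential assumption, a functional element's mission reliability is R(t) = e^(-λt), and the TV and SCP block configurations of Section 3.2 reduce to simple algebra. The sketch below assumes independent, identical elements, omits the voter logic's own failure contribution, and uses a hypothetical failure rate rather than the study's proprietary values:

```python
import math

def element_reliability(failure_rate_per_hour: float, hours: float) -> float:
    """Exponential (useful-life) model: R(t) = exp(-lambda * t)."""
    return math.exp(-failure_rate_per_hour * hours)

def triplex_voter(r: float) -> float:
    """2-of-3 majority voting (parallel): R = 3r^2 - 2r^3."""
    return 3 * r**2 - 2 * r**3

def self_checking_pair(r: float) -> float:
    """2-of-2 data agreement (series): R = r^2."""
    return r * r

# Hypothetical element: lambda = 1e-5 failures/hour over the
# 9-month (6,480-hour) mission duration used in the study.
r = element_reliability(1e-5, 6480)
print(f"single element:     {r:.4f}")
print(f"triplex voter:      {triplex_voter(r):.4f}")
print(f"self-checking pair: {self_checking_pair(r):.4f}")
```

Note that the 2-of-2 SCP is less reliable than a single element; the series pair provides fault detection through data agreement rather than raw reliability, which is consistent with the SCP architectures in this study carrying two pairs of flight computers.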
The architecture RBD calculations take into account the objectives and related engineering-defined aspects of each system configuration from an assessment of operational success. The RBD is assembled as a success path for the system. A series representation indicates a system whose success depends on the success of each block. Parallel block configurations indicate a group of blocks that provide active or standby redundancy.

PTC Windchill Quality Solutions (formerly RELEX) was used as the primary reliability modeling tool. The various architectures were modeled into different RBD configurations using the failure rates described in Section 3.1. By using the same failure rates, the only variance in results would be due to the RBD configurations, reflecting the differences in the configuration of the architectures. This allows for a normalized comparison of the architectures. In the RELEX software, the Optimization Simulation (OpSim) module was used to depict the RBDs. Results were calculated using both analytical and Monte-Carlo simulation calculations with 1,000,000 iterations. For the Monte-Carlo simulations, the confidence level was set at 95%.

5 RBD ANALYSIS AND RELIABILITY RESULTS

Listed below are the various avionics architectures that were evaluated. Following this, the quantity of functional elements, the redundancy configuration for each architecture, and the architecture fault tolerance are provided.

1. Fully Cross-Strapped Switched Triplex Voter (FCSSTV)
2. Partially Cross-Strapped Switched Triplex Voter (PCSSTV)
3. Channelized Bussed Triplex Voter (CBTV)
4. Fully Cross-Strapped Self-Checking (FCSSC)
5. Fully Cross-Strapped Bussed Self-Checking (FCSBSC)
6. Channelized Bussed Self-Checking (CBSC)

5.1 Architecture RBD Reliability Results Summary

Figure 3 and Table 1 show the reliability results for the six architectures. The reliability results are calculated at 9 months (6,480 hours) and include the failure contribution from cabling. These reliability results were considered to be a best estimate of the architecture reliability. The results may not be the actual operational reliability; however, they allow a ranking of the architectures to determine if one is significantly better than the others. With these results, two of the architectures (CBTV and CBSC) were eliminated. However, additional RBD comparison data was needed to ensure the vulnerabilities of each architecture were understood.

Figure 3: Architecture Reliability Comparison Results: Reliability versus Mission Elapsed Time (MET)

Architecture   R (6,480 hrs)
FCSSTV         0.666999
PCSSTV         0.613596
CBTV           0.464581
FCSSC          0.648547
FCSBSC         0.646730
CBSC           0.389427

Table 1: Architecture Reliability Results

6 CUT SET ANALYSIS

Cut set analysis provides a clear indication of where the most likely failure paths would be, depending on the accuracy of the RBD and the accuracy of the failure data of the functional elements. A cut set is a set of basic events [failures] whose joint occurrence results in the failure of the system [2]. Each cut set can contain anywhere from one to all functional elements, depending on the system architecture. A minimal cut set is defined as a set that "cannot be reduced without losing its status as a cut set" [8]. Once the minimal cut sets were identified, a comparison was made to determine whether the system with the least reliability contained the most failure paths, and thus whether a recommendation for the most reliable architecture could be made based on the number of minimal failure paths. It was determined that, for the architectures in this study, the more cut sets an architecture had, the less reliable it generally tended to be. However, as the difference in reliability between the architectures became smaller, a conclusion as to the most reliable architecture could not be drawn from the number of cut sets alone. This was due to the differences in the number of functional elements and their configuration in each of the architectures.

Table 2 shows the architectures ranked from most reliable to least reliable and the number of minimal cut sets calculated for each, which led the authors to the conclusion that, for the architectures compared in this study, the number of cut sets had no inherent tie to system reliability; rather, it was more a function of the modeling detail and the number of components in each architecture.

Architecture   R (6,480 hrs)   # of Minimal Cut Sets
FCSSTV         0.666999         75
FCSSC          0.648547         67
FCSBSC         0.646730         73
PCSSTV         0.613596        195
CBTV           0.464581        267
CBSC           0.389427        304

Table 2: Architecture Ranking and Number of Minimal Cut Sets

7 IMPORTANCE MEASURES IN SUCCESS SPACE

Part of the decision analysis in selecting a specific architecture includes determining which of the functional elements can lead to high-risk scenarios. In order to assess the importance of functional elements in the architecture, or the sensitivity of the architecture reliability to changes in a functional element's input failure rate, several importance (or sensitivity) measures are available [2]. "Importance measures quantify the criticality of a particular functional element within a system design. They have been widely used as tools for identifying system weakness, and to prioritize reliability improvement activities." [9] The various measures are based on slightly different interpretations of the concept of functional element importance.
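Returning briefly to the cut set analysis of Section 6: for a small RBD, minimal cut sets can be enumerated by brute force. The four-element system below (two channels, each a flight computer and switch in series) is a hypothetical stand-in, not one of the study's architectures, which were analyzed in RELEX:

```python
from itertools import combinations

ELEMENTS = ["FC-1", "SW-1", "FC-2", "SW-2"]

def system_up(failed: set) -> bool:
    """Hypothetical RBD: success = (FC-1 and SW-1) or (FC-2 and SW-2)."""
    channel_1 = "FC-1" not in failed and "SW-1" not in failed
    channel_2 = "FC-2" not in failed and "SW-2" not in failed
    return channel_1 or channel_2

def minimal_cut_sets() -> list:
    """Examine failure combinations smallest-first; keep a failing
    combination only if it contains no smaller cut set already found."""
    cuts = []
    for k in range(1, len(ELEMENTS) + 1):
        for combo in combinations(ELEMENTS, k):
            failed = set(combo)
            if system_up(failed) or any(c <= failed for c in cuts):
                continue
            cuts.append(failed)
    return cuts

for cut in minimal_cut_sets():
    print(sorted(cut))
```

This yields the four two-element minimal cut sets pairing one element from each channel, illustrating why cut set counts grow with modeling detail rather than tracking reliability directly.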
Intuitively, the importance of a functional element should depend on its location in the system, the reliability of the functional element in question, and the uncertainty in the estimate of the functional element's reliability [10]. For this study, each individual functional element importance measure was divided by the sum of all importance measures to determine which element represented the most significant contribution to architecture reliability. The importance measure reflects how much relative improvement may be available from improving the performance of a specific functional element. Changing the failure rates of the functional elements with the highest importance measure percent contribution (or adding redundancy to account for a high failure rate) will have the most significant effect on increasing system reliability.

Typically, importance measures are used in failure space, i.e., for Fault Tree Analysis (FTA). However, these importance factors can be defined in success space (RBD analysis) by calculating the measures based on the total success of the system instead of the total risk. Some importance factors do not preserve their meaning in success space and therefore fail to rank the functional elements appropriately. However, all importance measures provide a single number for each functional element that can be used as part of a comparative analysis. There are five importance measures generally accepted for use: Birnbaum (BM), Criticality, Fussell-Vesely (FV), Risk Reduction Worth (RRW), and Risk Achievement Worth (RAW) [11]. The FV is the fractional contribution of the event to the overall architecture reliability. The RAW is the decrease in the architecture reliability if the functional element has always failed. The RRW is the increase in the architecture reliability if the functional element never fails.
The BM is the rate of change in the architecture reliability as a result of a change in the reliability of a given event. The BM can also be expressed as BM = RAW + RRW [12]. The Criticality measure is a weighted version of the BM, which takes into account the ratio of the functional element failure probability to the system reliability. Due to the limitation of the RELEX software program in calculating importance measures in success space, all importance measures had to be calculated by hand. The BM, Criticality, RAW, and RRW measures proved significantly time consuming, as the conditional reliability calculations had to be made by performing multiple runs in RELEX to obtain values (20 or more runs per architecture). However, RELEX calculations quickly provided the minimal cut sets and their corresponding unreliability values, which could then be used to calculate the FV for all functional elements. Figure 4 shows the comparison of the BM, Criticality, FV, RRW, and RAW results for the functional elements in the FCSSTV architecture.

Figure 4: Comparison of Various Importance Measure Results

Table 3 details the comparison of the RRW, RAW, and FV measures for the FCSSTV architecture. The RRW and RAW were calculated on an interval scale to allow for comparison of architectures [13]. The FV and RAW measures yielded very similar results, while the FV measure differed slightly in comparison to the RRW measure. Table 4 details the comparison of the BM, Criticality, and FV measures for the FCSSTV architecture. The RRW, BM, and Criticality measures yielded nearly identical results and rankings. The FV and RAW measures differed slightly from the BM and Criticality measures, but the primary contributor to the unreliability of the architecture (over 21% of the unreliability in all three measures) remained the same.
RRW            RAW            FV
FC    21.32%   FC    24.89%   FC    27.36%
INU   14.92%   PIC   20.41%   PIC   18.54%
RGA   14.92%   ECU   14.68%   ECU   13.63%
PIC   11.01%   HCU   13.54%   HCU   12.62%
ECU    9.74%   INU   10.00%   INU   10.74%
HCU    9.45%   RGA   10.00%   RGA   10.74%
DAU    4.72%   DAU    2.61%   DAU    2.54%
SW     4.63%   RCS    1.50%   RCS    1.47%
RCS    3.67%   MPS    1.50%   MPS    1.47%
MPS    3.67%   SW     0.74%   SW     0.76%
TVC    1.94%   TVC    0.12%   TVC    0.12%

Table 3: Comparison of RRW, RAW, and FV Importance Measure Results for FCSSTV Architecture

BM             Criticality    FV
FC    21.93%   FC    21.32%   FC    27.36%
INU   14.07%   INU   14.92%   PIC   18.54%
RGA   14.07%   RGA   14.92%   ECU   13.63%
PIC   12.62%   PIC   11.01%   HCU   12.62%
ECU   10.59%   ECU    9.74%   INU   10.74%
HCU   10.15%   HCU    9.45%   RGA   10.74%
DAU    4.36%   DAU    4.72%   DAU    2.54%
SW     3.97%   SW     4.63%   RCS    1.47%
RCS    3.30%   RCS    3.67%   MPS    1.47%
MPS    3.30%   MPS    3.67%   SW     0.76%
TVC    1.63%   TVC    1.94%   TVC    0.12%

Table 4: Comparison of BM, Criticality, and FV Importance Measure Results for FCSSTV Architecture

Although all importance measures provided useful and similar information on functional elements in success space, when weighing the time needed to perform the importance measure analysis against the benefits to be achieved, the FV was chosen in order to efficiently obtain the quantitative importance measures for each functional element for a comparative analysis.

7.1 Fussell-Vesely (FV) Importance for Functional Elements

The FV is the probability that at least one minimal cut set containing the functional element (i) has failed at time (t), given that the system has failed at time (t) [10]. In other words, a functional element's FV is the sum of the unreliabilities of the minimal cut sets containing that functional element, divided by the sum of the unreliabilities of all of the system's minimal cut sets. Once the functional element FV percent contributions were calculated for all functional elements and all architectures, a comparison of functional element importance between the various architectures could be made.
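As a sketch of the measures above: RAW, RRW, and BM can be obtained in success space by re-evaluating the system model with one element forced failed or forced perfect (the "multiple runs" performed in RELEX), while the FV follows directly from the minimal cut set unreliabilities as defined in Section 7.1. The 2-of-3 system and all numbers below are hypothetical illustrations, not the study's proprietary values:

```python
def system_reliability(p: dict) -> float:
    """Success probability of a hypothetical 2-of-3 voting system."""
    a, b, c = p["INU-1"], p["INU-2"], p["INU-3"]
    return a*b*c + a*b*(1-c) + a*(1-b)*c + (1-a)*b*c

def conditional_measures(p: dict, element: str) -> tuple:
    """RAW, RRW, and BM via conditional reliability (success space)."""
    base = system_reliability(p)
    failed = system_reliability({**p, element: 0.0})   # always failed
    perfect = system_reliability({**p, element: 1.0})  # never fails
    raw = base - failed    # reliability decrease if the element always fails
    rrw = perfect - base   # reliability increase if the element never fails
    bm = perfect - failed  # Birnbaum; note bm == raw + rrw
    return raw, rrw, bm

def fussell_vesely(element: str, cut_sets: list, q: dict) -> float:
    """FV: summed unreliability of cut sets containing the element over the
    total; cut set unreliability is the product of its elements'
    unreliabilities (independent failures assumed)."""
    def unrel(cut):
        prob = 1.0
        for e in cut:
            prob *= q[e]
        return prob
    total = sum(unrel(c) for c in cut_sets)
    return sum(unrel(c) for c in cut_sets if element in c) / total

p = {"INU-1": 0.95, "INU-2": 0.95, "INU-3": 0.90}  # hypothetical reliabilities
print(conditional_measures(p, "INU-3"))

# Minimal cut sets of a 2-of-3 system: any two elements failing.
q = {e: 1 - r for e, r in p.items()}
cut_sets = [{"INU-1", "INU-2"}, {"INU-1", "INU-3"}, {"INU-2", "INU-3"}]
print(f"FV(INU-3) = {fussell_vesely('INU-3', cut_sets, q):.3f}")
```

With these interval-scale (difference) definitions, BM = RAW + RRW holds by construction, matching the relation quoted from [12].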
The implementation of the FV to compare functional elements within the different architectures proved somewhat more involved than first anticipated. Most architectures contained redundancy, with two or more of the same functional element functions within each; however, there was not a one-to-one correspondence. For example, the TV architectures (FCSSTV, PCSSTV, and CBTV) contained three FCs, while the SCP architectures (FCSSC, FCSBSC, and CBSC) contained four FCs. Therefore, the functional element functions were grouped for comparison by summing the contributions of all like functional elements in the architecture. Table 5 shows an example of how this was done for the FCs.

Architecture   FC-#    FV %     Total FV %
FCSSTV         FC-1     9.12%
               FC-2     9.12%   27.36%
               FC-3     9.12%
PCSSTV         FC-1     6.66%
               FC-2     6.66%   19.98%
               FC-3     6.66%
CBTV           FC-1    15.83%
               FC-2     7.63%   39.29%
               FC-3    15.83%
FCSSC          FC-1A    8.35%
               FC-1B    8.35%
               FC-2A    8.35%   33.39%
               FC-2B    8.35%
FCSBSC         FC-1A    8.51%
               FC-1B    8.51%
               FC-2A    8.51%   34.04%
               FC-2B    8.51%
CBSC           FC-1A   11.08%
               FC-1B   11.08%
               FC-2A   11.08%   44.33%
               FC-2B   11.08%

Table 5: Example of Functional Element Grouping for Importance Measure Comparison

Once all functional element functions were grouped, a few functions could not be compared. For example, CBTV and CBSC had Bus functional element contributions, while the other architectures did not. Overall, though, this grouping worked to determine which functional elements were most likely to cause a failure. The results are shown in Figure 5.

Figure 5: Functional Element FV Calculations per Architecture

The functional element importance measures provide the designer with more information than a single reliability calculation comparison can provide. Architecture selection is based not only on reliability calculations but also on weight, space, cost, risk to the mission, etc., and the additional importance measure calculations provide comparison data that allows the designer to make more informed trade decisions in a more efficient and effective manner.
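The like-element grouping behind Table 5 is a straightforward summation of per-instance contributions by element function. The sketch below uses the FCSSC flight computer row of Table 5; the instance naming convention is illustrative:

```python
from collections import defaultdict

# Per-instance FV percent contributions (FCSSC flight computers, Table 5).
per_instance_fv = {"FC-1A": 8.35, "FC-1B": 8.35, "FC-2A": 8.35, "FC-2B": 8.35}

# Sum contributions by element function so architectures with different
# redundancy counts (three FCs vs. four FCs) can be compared.
grouped = defaultdict(float)
for instance, fv in per_instance_fv.items():
    function = instance.split("-")[0]   # "FC-1A" -> "FC"
    grouped[function] += fv

print({f: round(total, 2) for f, total in grouped.items()})
```

The summed value (33.40%) agrees with Table 5's 33.39% to within the rounding of the per-instance percentages.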
The difference in vulnerabilities of the architectures can easily be compared using the importance measure analysis. Different distributions of the functional elements' failure contributions indicate different reliability improvement paths for the architectures. Intermediate states can be modeled to include the impact of failure data integrity on the reliability and integrity of the architectures. These results can be used to determine the most efficient and cost-effective way to increase the reliability of the architecture.

REFERENCES

1. W.S. Gough, J. Riley, J.M. Koren, "A New Approach to the Analysis of Reliability Block Diagrams," Proc. Ann. Reliability & Maintainability Symp., (Jan.) 1990, pp. 456-464.
2. B.M. Ayyub, Risk Analysis in Engineering and Economics, New York, Chapman & Hall/CRC, 2003.
3. D. Nicholls, ed., System Reliability Toolkit, Reliability Information Analysis Center (RIAC), 2005.
4. R. Hammett, "Design by Extrapolation: An Evaluation of Fault Tolerant Avionics," IEEE AESS Systems Magazine, (April) 2002.
5. J.M. Palsulich, B.J. Schinnerer, Launch Vehicles for Low-Cost Planetary Missions, Elsevier Science Ltd., 2002.
6. P.D.T. O'Connor, Practical Reliability Engineering, 4th Edition, Wiley, 2006.
7. M.L.O. e Souza, T.R. de Carvalho, The Fault Avoidance and The Fault Tolerance Approaches for Increasing the Reliability of Aerospace and Automotive Systems, Society of Automotive Engineers Inc., 2005.
8. W.R. Blischke, D.N. Prabhakar Murthy, Reliability: Modeling, Prediction, and Optimization, Wiley, 2000.
9. J.E. Ramirez-Marquez, D.W. Coit, "Composite Importance Measures for Multi-State Systems with Multi-State Components," IEEE Transactions on Reliability, vol. 54, (Sept.) 2005, p. 517.
10. M. Rausand, A. Hoyland, System Reliability Theory: Models, Statistical Methods, and Applications, 2nd Edition, Wiley, 2003.
11. J. Apt, "Human Spaceflight Risk Management," Encyclopedia of Aerospace Engineering, Wiley, 2010.
12. M. Stamatelatos, W. Vesely, et al., Fault Tree Handbook with Aerospace Applications, NASA Office of Safety and Mission Assurance, August 2002.
13. W.E. Vesely, T.C. Davis, R.S. Denning, N. Saltos, Measures of Risk Importance and Their Applications, Division of Risk Analysis, Office of Nuclear Regulatory Research, NUREG/CR-3385, 1986.

ACKNOWLEDGMENTS

The authors would like to thank Glen (Spence) Hatfield, Duane H. Pettit, Joseph M. Schuh, and Dr. Robert Hodson, who contributed to the development and review of the RMA analysis methodology described in this paper.

BIOGRAPHIES

Amanda M. Gillespie, ASQ CRE
SAIC-LX-O3
Operations & Checkout Bldg, M7-0355
Kennedy Space Center, FL 32899 USA
e-mail: amanda.gillespie-1@nasa.gov

Amanda received her BS in Applied Mathematics from the Georgia Institute of Technology in 2000. Amanda M. Gillespie is a Reliability Engineer with SAIC at NASA KSC, FL. As a part of the KSC RMA team, Amanda works with multiple engineering teams to evaluate and increase the operational and inherent availability of the systems. Amanda is a member of the American Society for Quality (ASQ) Reliability and Statistics Societies. Amanda received her ASQ Certified Reliability Engineer (CRE) certification in January 2011.

Mark W. Monaghan, Ph.D.
SAIC-LX-O3
Operations & Checkout Bldg, M7-0355
Kennedy Space Center, FL 32899 USA
e-mail: mark.w.monaghan@nasa.gov

Mark W. Monaghan received his Ph.D. in Applied Decision Science from Walden University in 2008. Mark is a Reliability Engineer with SAIC at NASA KSC, FL. As a part of the KSC RMA team, Dr. Monaghan works with multiple engineering teams to evaluate and increase the operational and inherent availability of the systems. He is a senior member of the Institute of Electrical and Electronics Engineers (IEEE) Industry Applications Society (IAS). He is also a member of the American Institute of Aeronautics and Astronautics (AIAA) and the ASQ Reliability Society.

Yuan Chen, Ph.D.
NASA LaRC
Electronic Systems Branch
5 N. Dryden Street, MS 488
Hampton, VA 23681 USA
e-mail: yuan.chen@nasa.gov

Yuan Chen received her Ph.D. in Reliability Engineering from the University of Maryland at College Park, Maryland, in 1998, with a Graduate Fellowship from the National Institute of Standards and Technology. She is currently a senior member of the technical staff with the Electronic Systems Branch, NASA Langley Research Center, Hampton, Virginia. Dr. Chen's research has focused on the development of reliability methodologies for microelectronic devices/systems for space applications. She has authored and co-authored over 40 technical papers, and is a senior member of IEEE and AIAA.