A Comparison of Reliability and Resource Utilization of Radiation Fault Tolerance Mechanisms in Spaceborne Electronic Systems

Kim, Changhyeon; Lee, Dongmin; Na, Jongwhoa

doi:10.3390/aerospace12020152


by Changhyeon Kim ¹, Dongmin Lee ² and Jongwhoa Na ¹,*

¹ School of Electronics and Information Engineering, Korea Aerospace University, 76 Hanggongdaehang-ro, Deogyang-gu, Goyang-si 10540, Republic of Korea
² School of Smart Air Mobility, Korea Aerospace University, Goyang-si 10540, Republic of Korea
* Author to whom correspondence should be addressed.

Aerospace 2025, 12(2), 152; https://doi.org/10.3390/aerospace12020152

Submission received: 31 December 2024 / Revised: 13 February 2025 / Accepted: 14 February 2025 / Published: 17 February 2025

(This article belongs to the Special Issue On-Board Systems Design for Aerospace Vehicles (2nd Edition))


Abstract:

The advent of the New Space Era has significantly accelerated the development of space equipment systems using commercial off-the-shelf components. Field Programmable Gate Arrays are increasingly favored for their ability to be easily modified, which substantially reduces both development time and costs. However, their high susceptibility to space radiation poses a considerable risk of mission failure, potentially compromising system reliability in harsh space environments. To mitigate this vulnerability, the implementation of fault-tolerant mechanisms is essential. In this study, we applied eight distinct fault-tolerant mechanisms to various circuits and conducted a comparative analysis between two different categories: hardware redundancy and informational redundancy. This comparison was based on consistent criteria, specifically the Architectural Vulnerability Factor and resource consumption. Utilizing statistical fault injection tests and specialized software, we quantitatively measured structural vulnerability, power consumption, delay, and area. The results revealed that while the Hamming Code achieved the lowest structural vulnerability, it resulted in approximately fourfold increases in resource consumption. Conversely, Triple Modular Redundancy provided high reliability with relatively minimal resource usage. This research elucidates the trade-offs between reliability and resource overhead among different fault-tolerant mechanisms, highlighting the critical importance of selecting appropriate mechanisms based on system requirements to optimize the balance between reliability and resource utilization. Our analysis offers new insights essential for optimizing fault-tolerant mechanisms in space applications. Future work should explore more complex circuit architectures and diverse fault models to refine the selection criteria for fault-tolerant mechanisms tailored to real-world space missions.

Keywords:

fault tolerance mechanism; reliability; architectural vulnerability factor; Power-Delay-Area Product (PDAP); TMR; Hamming

1. Introduction

With the advent of the New Space Era, the development of space equipment systems utilizing commercial off-the-shelf electronic components has surged. Unlike the state-led space missions of the Old Space Era, modern commercial space development operates on a smaller scale and demands faster data processing [1,2]. Consequently, commercial off-the-shelf electronic components are increasingly favored for their cost efficiency, compact size, and high processing performance compared to traditional space-grade components. Among these, field-programmable gate arrays have gained significant popularity due to their ability to undergo hardware modifications without requiring physical redesign, thereby reducing development time and costs, particularly in small-scale production [3].

However, commercial field-programmable gate arrays are more vulnerable to space radiation than their space-grade counterparts. While space-grade components incorporate radiation-hardening techniques such as Semiconductor on Insulator and Semiconductor on Sapphire to mitigate radiation effects [4,5], commercial field-programmable gate arrays lack such protective measures. As a result, they are more susceptible to mission failures caused by space radiation-induced faults [6]. To enhance reliability and prevent such failures, the implementation of fault-tolerant mechanisms is essential [7,8,9].

Selecting the optimal fault-tolerant mechanism requires a comprehensive understanding of system architecture and technology to maximize reliability while minimizing resource consumption. Redundancy-based fault-tolerant mechanisms are categorized into four types: hardware, software, informational, and temporal. This study specifically focuses on hardware redundancy and informational redundancy due to their direct impact on system reliability and resource efficiency in field-programmable gate array-based implementations [10,11]. While these mechanisms enhance system robustness, they require additional resources such as power, processing time, and area. Therefore, achieving a balance between reliability and resource utilization is crucial.

Previous studies have compared the reliability and resource consumption of various fault-tolerant mechanisms; however, these comparisons have largely been limited to mechanisms within the same redundancy category [3,10,11,12,13]. There is a relative lack of research systematically comparing fault-tolerant mechanisms across different redundancy categories, which hinders a comprehensive understanding of their integrated effects and resource consumption aspects. Therefore, this study aims to provide a unified evaluation of mechanisms from different redundancy categories to maximize the reliability of space equipment systems while optimizing resource efficiency.

To that end, this study provides a cross-category analysis that compares hardware redundancy and informational redundancy using identical criteria: the Architectural Vulnerability Factor and resource consumption. By applying eight fault-tolerant mechanisms (DMR, DMR+, TMR, QMR, Parity Code, Two-Dimensional Parity Check Code, Hamming Code, and Hamming and TMR) to an Advanced Encryption Standard security circuit, we analyze their respective reliability and resource trade-offs. Statistical fault injection tests, conducted using the Verilog Fault Injector on the Icarus Verilog Simulator, were used to calculate the Architectural Vulnerability Factor. Resource usage was quantified in terms of power, delay, and area using Xilinx Vivado (2024.2). Power consumption was measured using the “Report Power” feature; delay was assessed using “Timing Analysis” to determine the circuit’s critical path; and area was measured through “Report Utilization” by counting the number of Look-Up Tables, flip-flops, and Block RAMs used in the implementation. These metrics were then combined into a Power-Delay-Area Product for a holistic evaluation of resource consumption.

The research results indicated that while the Hamming Code achieved the lowest Architectural Vulnerability Factor, its resource consumption increased by approximately fourfold. In contrast, Triple Modular Redundancy provided high reliability with relatively lower resource usage. Specifically, within the category of hardware redundancy, Triple Modular Redundancy offered the most efficient balance between reliability and resource consumption, resulting in a roughly 140% reduction in the Power-Delay-Area Product compared to QMR, which achieved the lowest Architectural Vulnerability Factor but with higher resource consumption. In the domain of informational redundancy, the 2D-Parity Code demonstrated the highest reliability, with a 75.22% decrease in the Architectural Vulnerability Factor compared to the baseline, although the Power-Delay-Area Product increased by 122.02%. Overall, informational redundancy techniques provide superior reliability, while hardware redundancy methods excel in resource efficiency.

This study emphasizes the importance of selecting fault-tolerant mechanisms that optimize the balance between reliability and resource utilization based on system requirements by elucidating the trade-offs between reliability and resource overhead among various fault-tolerant mechanisms. Additionally, by enabling quick and easy comparisons of reliability and resource efficiency during the early Register Transfer Level (RTL) development stage, this analysis method can significantly reduce overall development costs and time. These innovative analysis techniques offer valuable insights for engineers aiming to optimize fault-tolerant mechanisms for applications not only in space but also in Urban Air Mobility (UAM), Advanced Air Mobility (AAM), aircraft, and high-altitude electronic systems. Future research should investigate more complex circuit architectures and diverse fault models to refine the selection criteria for fault-tolerant mechanisms tailored to real-world applications.

The remainder of this paper is organized as follows. Section 2 discusses issues that arise in space environments when using COTS electronic equipment, as well as related research aimed at addressing them. Section 3 presents representative redundancy-based FTMs applied in this study, along with methods for measuring reliability and resource usage. Section 4 compares and analyzes the reliability and resource utilization of the eight FTMs, all applied to the AES-128 target circuit. Finally, Section 5 concludes the paper.

2. Background

2.1. Trends in Using Commercial Off-the-Shelf Components in Space Electronic Systems

With the dawn of the New Space Era, in which private companies rather than governments are spearheading space development, commercial space development has become increasingly active, as shown in Figure 1. Compared to traditional state-led space projects, commercial space development faces tighter constraints in terms of scale, budget, and development timeframe.

In recent years, the small satellite market has been steadily growing, driven in part by the increasing role of private companies in space development. Small satellites can reduce overall mission costs due to short development times and low expenses, which is crucial for private enterprises focusing on commercial viability [14,15].

In the past, space development demanded extremely high reliability for electronic equipment due to the high cost of missions and the impracticality of maintenance in space. As a result, engineers were reluctant to use commercial electronic components. However, with the advent of the New Space Era, the budget and human resources available for satellite development are more limited, and development cycles have become shorter. Consequently, COTS electronic components, known for low cost and short development times, are being adopted more actively.

Figure 1. Number of launched LEO satellites (Data from the UCS Satellite Database [16]).

2.2. Single Event Upset (SEU)

Single Event Upset (SEU) refers to the phenomenon where data bits stored in memory are altered when an electronic system is exposed to a radiation environment. The primary causes include primary cosmic rays, which are the direct impact of cosmic and solar energy, and secondary cosmic rays, which are generated when primary cosmic rays scatter in the Earth’s atmosphere. Due to these factors, SEUs can occur in various environments ranging from outer space to aircraft altitudes and ground level, albeit with varying frequencies [17]. When these high-energy particles collide with sensitive parts of electronic circuits, such as memory cells, they change the state of the circuit, thereby causing errors. The process by which SEUs occur is illustrated in Figure 2.

SEUs are a major cause of reduced reliability and stability in electronic devices operating in space environments, potentially leading to system malfunctions, performance degradation, or even complete system failure. For example, the communication module of a satellite may experience data transmission issues due to the effects of an SEU, and in severe cases, a malfunction in the attitude control system could lead to mission failure [18]. Additionally, there have been reports of incidents where a spacecraft’s navigation system processed erroneous data due to an SEU, leading to failures in orbit correction, or where command signals between the spacecraft and the ground station were distorted, resulting in the execution of incorrect commands [19]. Given that such issues can ultimately cause mission failure, it is essential to address SEU-related challenges.

One of the representative countermeasures is shielding, which protects electronic devices by using radiation-blocking materials [20]. Shielding employs high-density materials such as lead, tungsten, or aluminum to form a physical barrier around electronic devices, effectively blocking various types of radiation encountered in space. These materials exhibit excellent absorption capabilities against different radiation particles, including gamma rays, neutrons, and protons, thereby protecting sensitive components and enhancing system reliability. However, achieving sufficient radiation shielding while minimizing weight and volume remains a significant challenge in design, necessitating careful material selection and optimized placement.

Another approach involves using radiation-hardened components and implementing radiation-tolerant design techniques to enhance circuit resilience against radiation [4]. Radiation-hardened design typically includes the selection of radiation-resistant semiconductor devices, circuit layout optimization, and the simplification of signal paths to minimize electromagnetic interference and physical damage caused by radiation. For example, the use of semiconductor materials such as silicon carbide (SiC) or gallium nitride (GaN) can improve device durability, while adopting high-purity materials such as sapphire substrates can reduce the incidence of radiation-induced defects. Additionally, by simulating radiation effects during the circuit design phase and optimizing the arrangement and placement of devices accordingly, the overall radiation tolerance of the system can be significantly enhanced. These radiation-tolerant design techniques not only strengthen the fundamental reliability of electronic devices but also play a crucial role in preemptively mitigating radiation-related issues during long-term missions. However, the use of high-radiation-resistant materials often comes with the drawback of increased system weight and cost.

Fault-tolerant techniques, on the other hand, include the ability to detect and correct errors in real time that are caused by single event effects. Such techniques encompass error detection and correction codes (ECC), redundancy designs, and reset mechanisms, ensuring that the system continues to operate normally despite transient errors induced by radiation [21]. Fault-tolerant technologies are essential for maintaining the stability of electronic devices during long-term missions and significantly enhance overall system reliability [8]. Moreover, compared to traditional methods, such as shielding or radiation-hardened design, fault-tolerant techniques offer the advantage of improving system reliability without incurring additional weight or cost.

Figure 2. Single event upset occurrence process [22].

2.3. Reliability and Resource Utilization of Fault Tolerance Mechanisms in Electronic Systems

In designing electronic equipment for environments prone to high fault occurrence, such as space, it is essential to apply fault tolerance mechanisms (FTMs) to ensure reliability and stability. However, each FTM exhibits different levels of reliability and resource utilization. Consequently, system designers must compare and evaluate these mechanisms based on the demands of the target system to select the most suitable one [23].

Figure 3 categorizes fault recovery methods in electronic equipment according to their FTM types. Representative FTMs include hardware redundancy, software redundancy, information redundancy, and time redundancy. These are design techniques that employ additional data, hardware, or time to enhance system reliability and constitute a core concept for fault tolerance. Redundancy ensures normal operation or enables error detection and recovery, even when faults occur.

Hardware Redundancy: This approach adds extra hardware with the same functionality to detect or mitigate faults in hardware. A primary example is triple modular redundancy (TMR), a form of static parallel redundancy. In TMR, three identical hardware modules produce outputs that are sent to a majority voter, which masks faults through majority voting. Although hardware redundancy improves system reliability, it requires significant resources;
Software Redundancy: This technique mitigates software faults by using multiple independently developed software versions designed not to fail simultaneously under the same input. It can be implemented by running multiple versions concurrently using additional hardware or by sequentially running them using time redundancy;
Information Redundancy: This approach protects data by adding extra bits to the original data. Common techniques include error detection and error correction coding, which ensure data integrity in memory and communication channels;
Time Redundancy: This method detects transient faults by repeating or rerunning the same task. It leverages the fact that hardware faults are often transient, making it less likely for the same fault to recur upon re-execution. Although time redundancy can be implemented with fewer resources, it may degrade system performance;
Hybrid redundancy is a technique that tolerates faults by combining various fault tolerance mechanisms. It takes advantage of the fact that each fault tolerance mechanism exhibits strengths in different fault models. Since multiple fault tolerance mechanisms are employed, it consumes more resources while providing a higher level of fault tolerance.

With the recent advancements in artificial intelligence (AI)-based fault tolerance techniques, AI-Based Fault Tolerance—leveraging machine learning and cognitive computing—has garnered significant attention. Machine Learning-Based Fault Prediction techniques learn from historical system failure patterns to predict and preempt faults and are particularly effective in anomaly detection using sensor data [26]. Furthermore, Cognitive Computing for System Resilience analyzes system behavior under various conditions and employs cognitive-based decision-making to automatically select the optimal response strategy when faults occur [27]. Finally, Deep Learning for Error Correction techniques learn error correction patterns from large-scale datasets, enabling the maintenance of high reliability even in complex environments [28]. These AI-based techniques, when combined with traditional hardware and software redundancy methods, can offer more effective fault tolerance mechanisms for aerospace and embedded systems. However, since these approaches are primarily applied as software-based responses after system synthesis, they are not addressed in the case study presented in this work.

The reliability of an FTM refers to its ability to prevent system failure even when exposed to faults such as radiation-induced errors. In this study, we use the Architectural Vulnerability Factor (AVF) to compare the reliability of different FTMs. AVF indicates the probability that faults occurring in hardware components will affect the system output and is commonly used to measure susceptibility to soft errors like Single Event Upsets (SEUs) [29]. AVF values range from 0 to 1, with lower values indicating better fault tolerance.
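As a rough illustration of how an AVF value can be obtained from a fault injection campaign, the sketch below assumes that AVF is estimated as the fraction of injected faults whose effect reaches the system output. The function name `estimate_avf` and the boolean outcome list are hypothetical and are not taken from the tools used in this study.

```python
def estimate_avf(outcomes):
    """Estimate AVF from fault injection outcomes.

    `outcomes` is an iterable of booleans, one per injected fault:
    True when the faulty run's output differed from the fault-free
    (golden) output, False when the fault was masked.
    Assumption: AVF is approximated by failures / injections.
    """
    outcomes = list(outcomes)
    if not outcomes:
        raise ValueError("at least one fault injection outcome is required")
    return sum(outcomes) / len(outcomes)


# Hypothetical campaign: 1 output corruption out of 4 injections.
example_avf = estimate_avf([True, False, False, False])
```

The result always lies between 0 and 1, and a lower value means that more injected faults were masked.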

Resource utilization refers to the power, delay, and area necessary for the electronic equipment to function. In this study, we use the Power-Delay-Area Product (PDAP) to provide a comprehensive comparison of various resources:

Power: The amount of power consumed while the circuit is operating;
Delay: The time from when a signal is input until the circuit produces an output;
Area: The physical space on the chip used to implement the circuit.

A lower PDAP value indicates a more efficient design, helping system designers balance power efficiency, operating speed, and chip area. Studies that utilize PDAP to assess design efficiency include research proposing a design methodology for optimizing digital circuits [30] and a study that introduced a high-performance inverter architecture, evaluated in terms of low power, low delay, and high density, and ultimately compared using PDAP [31].
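The PDAP calculation can be sketched as a simple product of the three measured quantities. The example below is illustrative only: the `Resources` class, its field names, the units, and the use of a single scalar for area are assumptions made for this sketch, and the numbers in the usage lines are placeholders rather than measured results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    power_w: float   # assumed unit: watts
    delay_ns: float  # assumed unit: nanoseconds (critical path)
    area: float      # assumed: one scalar resource count

    def pdap(self) -> float:
        """Power-Delay-Area Product: lower means a more efficient design."""
        return self.power_w * self.delay_ns * self.area


def pdap_overhead_percent(design: Resources, baseline: Resources) -> float:
    """Relative PDAP increase of `design` over `baseline`, in percent."""
    return (design.pdap() / baseline.pdap() - 1.0) * 100.0


# Placeholder values, not measurements from this study.
baseline = Resources(power_w=1.0, delay_ns=10.0, area=100.0)
protected = Resources(power_w=2.0, delay_ns=10.0, area=100.0)
overhead = pdap_overhead_percent(protected, baseline)
```

Expressing each design relative to the unprotected baseline makes the overheads of different FTMs directly comparable, regardless of the units chosen for each factor.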

In this study, we reviewed three metrics for evaluating the performance of FTMs: EDP (Energy-Delay Product), EDAP (Energy-Delay-Area Product), and PDAP. EDP reflects energy and delay factors but does not include area information, which can result in a somewhat limited evaluation in terms of hardware redundancy. EDAP is primarily obtained through simulation or mathematical modeling after synthesis, which may impose some limitations when applied at the RTL-stage evaluation [32,33,34,35]. In contrast, PDAP allows us to acquire key parameters—power, delay, and area—at the RTL stage using available support tools [36,37,38,39]. It can provide a more comprehensive performance evaluation for real-time applications, such as FPGA logic circuits. These aspects are presented in Table 1. Consequently, PDAP was adopted as the evaluation metric in this study.

2.4. Comparative Studies on Fault Tolerance Mechanisms

Comparing Fault-Tolerant Mechanisms (FTMs) is a crucial factor in evaluating the efficiency of electronic devices. Accordingly, numerous studies have compared the performance and resource requirements of various FTMs, as shown in Table 2. For example, a study that compared the speed, cost, and reliability of redundancy techniques applied in real-time applications [40] focused on analyzing reliability for HW Redundancy and Time Redundancy, primarily through mathematical modeling. Similarly, another study [11] compared the area overhead of Triple Modular Redundancy (TMR) and Duplication with Comparison (DWC) for SEU tolerance by evaluating HW Redundancy and Information Redundancy at the register level using RTL analysis.

However, most previous studies have either compared the advantages and disadvantages of each FTM type in isolation or, in experimental comparisons, evaluated only specific metrics. For instance, in [41], all three redundancy types (HW, Information, and Time Redundancy) were compared, yet only delay was measured using processor-based simulations. In [42], while power, delay, and area were analyzed via RTL analysis, only HW Redundancy was considered, excluding Information and Time Redundancy. In [43], power, delay, and area were measured through FPGA synthesis, but the comparison was limited to Information Redundancy without incorporating HW Redundancy. Notably, [44] evaluated reliability and area for a 36-bit adder, a simple FSM, and a combinational circuit via FPGA-based fault injection experiments; however, the analysis of power and delay was insufficient.

Table 2. Comparison of used redundancy, evaluation metrics, target circuit, and testing methods in related works.

Reference No.	HW Redundancy	Information Redundancy	Time Redundancy	Measured Metrics	Target Circuit	Method
[40]	O	X	O	Reliability	-	Mathematical analysis
[11]	O	O	X	Reliability, Power, Area	Reed Solomon Decoder	FPGA fault injection
[41]	O	O	O	Delay	Simulation program	Processor-based simulation
[42]	O	X	X	Power, Delay, Area	-	RTL analysis
[43]	X	O	X	Power, Delay, Area	-	FPGA synthesis
[44]	O	O	O	Reliability, Area	36-bit adder; simple FSM; simple combinational circuit	FPGA fault injection
This work	O	O	O	Reliability, Area, Delay, Power	AES 128	RTL simulation and fault injection

To address these limitations, the present study applies various types of FTMs to an AES-128 security circuit and conducts a comprehensive analysis of reliability, area, delay, and power consumption for circuits employing HW Redundancy, Information Redundancy, and Time Redundancy. Furthermore, by leveraging RTL simulation and fault injection techniques, our analysis provides an integrated performance evaluation that overcomes the shortcomings of previous studies—namely, the focus on specific metrics or isolated circuit components—thus offering a more practical comparison for real-world system design.

3. Methodology for Comparative Analysis of FTM Performance

3.1. Fault Tolerance Mechanism (FTM)

Fault tolerance refers to techniques that enable a system to continue performing its intended functions even if some of its components fail or encounter errors. These techniques are critical in domains requiring high availability, mission-critical, or life-critical operations due to the potentially enormous losses associated with mission failure.

A variety of fault tolerance techniques can be adopted to mitigate space-radiation-induced failures in space electronic systems. For instance, in ASICs, one can use radiation-hardened flip-flops to mitigate faults caused by SEUs in memory elements [45]. However, in FPGAs, it is difficult for users to change the underlying hardware, making it impossible to use radiation-hardened flip-flops. Thus, FPGAs have more limited fault tolerance options. Generally, for FPGAs, designers apply mechanisms such as TMR at the circuit design stage. This section illustrates the block diagrams of the FTMs discussed in this paper (Figure 4, Figure 5 and Figure 6) and introduces their operational principles and characteristics.

3.1.1. TMR (Triple Modular Redundancy)

TMR is one of the most widely used hardware redundancy-based fault tolerance mechanisms. A TMR circuit triplicates the module to be protected, processes signals in parallel in three identical modules, and uses a voter to determine the result via majority voting. With triple redundancy, even if one module fails, the remaining two modules can still produce the correct result; since the voter selects the output shared by the majority of modules, the single failing module is masked. The primary advantages of TMR are its simplicity and short delay. However, since two additional copies of the module plus a voter are needed, TMR adds an overhead of more than twice the original module's resources, resulting in additional area and power consumption.
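The masking behavior described above can be sketched in software as a bitwise 2-of-3 majority function. This is a behavioral illustration only, not the RTL used in this study; the `tmr` wrapper, the `fault_mask` parameter, and the `increment` module are hypothetical names introduced for the example.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority: each output bit takes the value
    held by at least two of the three inputs."""
    return (a & b) | (b & c) | (a & c)


def tmr(module, x: int, fault_mask: int = 0) -> int:
    """Run three copies of `module` on the same input and vote.

    `fault_mask` is XORed into the second copy's output to emulate
    a fault confined to a single module (an assumption of this sketch).
    """
    outputs = (module(x), module(x) ^ fault_mask, module(x))
    return majority_vote(*outputs)


def increment(x: int) -> int:
    """Hypothetical protected module: 8-bit increment."""
    return (x + 1) & 0xFF


# One corrupted copy is outvoted by the two healthy copies.
masked = tmr(increment, 41, fault_mask=0b0001_0000)
```

If two copies were corrupted in the same bit position, the voter would select the wrong value, which is why TMR only guarantees masking of faults confined to one module.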

Figure 4. Block diagram: TMR.

3.1.2. DMR+ (Dual Modular Redundancy+)

DMR+ is a hardware redundancy-based FTM designed to reduce area usage relative to TMR while maintaining similar delay [46]. The DMR+ circuit uses two sets of duplicated bits as well as parity bits to detect and correct errors. Unlike DMR (which only detects errors), DMR+ can correct them. It also consumes fewer resources than TMR, but it has the drawback of slightly longer delay compared to TMR.

Figure 5. Block diagram: DMR+.

3.1.3. Hamming Code

Hamming Code is a binary linear code designed to detect and correct errors in data communication and storage systems. It consists of data bits and parity bits. As depicted in Figure 6, the original data (Msg_in d0-d3) must be encoded into a Hamming Code and then later decoded back into the original data (Msg_out d0-d3). During encoding, parity bits are generated and added to the data bits such that each parity bit checks a specific subset of bit positions. During decoding, the circuit recomputes the parity checks (Decoder s0-s2) and derives the error location for the associated data bits (Decoder e0-e3) to detect and correct errors, outputting the corrected original data bits. Because its minimum distance is three, a Hamming Code can correct any one-bit error or, when used for detection only, detect errors of up to two bits. As the number of data bits increases, the overhead ratio decreases. However, a downside is that the encoding and decoding processes add extra delay.
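The encode and decode steps can be illustrated with a minimal Hamming(7,4) sketch, which matches the four data bits (d0-d3) and three syndrome bits (s0-s2) named in Figure 6. The bit ordering, function names, and list representation below are assumptions of this sketch and may differ from the circuit in the figure.

```python
def hamming74_encode(data):
    """Encode [d0, d1, d2, d3] into a 7-bit codeword.

    Codeword positions 1..7 hold: p1, p2, d0, p4, d1, d2, d3.
    """
    d0, d1, d2, d3 = data
    p1 = d0 ^ d1 ^ d3  # checks positions 1, 3, 5, 7
    p2 = d0 ^ d2 ^ d3  # checks positions 2, 3, 6, 7
    p4 = d1 ^ d2 ^ d3  # checks positions 4, 5, 6, 7
    return [p1, p2, d0, p4, d1, d2, d3]


def hamming74_decode(codeword):
    """Return (data bits, syndrome) after single-error correction.

    The syndrome is 0 when all checks pass; otherwise it is the
    1-based position of the bit assumed to be in error.
    """
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 | (s2 << 1) | (s4 << 2)
    if syndrome:
        c[syndrome - 1] ^= 1  # flip the bit the syndrome points to
    return [c[2], c[4], c[5], c[6]], syndrome


word = hamming74_encode([1, 0, 1, 1])
word[4] ^= 1  # inject a single-bit fault at position 5
recovered, position = hamming74_decode(word)
```

In this correcting configuration, a two-bit error yields a nonzero syndrome that points at a third position, so the decoder flags an error but "corrects" the wrong bit; distinguishing the two cases requires an extra overall parity bit, which this sketch omits.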

Figure 6. Block diagram: Hamming Code.

3.1.4. Additional Fault Tolerance Mechanisms for Comparison

The following four FTMs were used for comparisons of reliability and resource usage, but detailed explanations have been omitted as they are well-known FTMs. Comprehensive descriptions can be found in references A, B, and C.

DMR (Dual Modular Redundancy): Uses two identical modules and a comparator to detect faults; however, it cannot correct errors;
QMR (Quintuple Modular Redundancy): Uses five identical modules and majority voting for fault detection and correction, providing better fault masking than TMR but at the cost of significantly higher resource usage;
Parity Check Code: Adds a single parity bit to a data block, ensuring either even or odd parity. It can detect any single-bit error but cannot correct it and cannot detect or correct multi-bit errors;
Two-Dimensional-Parity Check Code: Extends the 1D-parity method by adding both row and column parity bits. It can detect and correct all single-bit errors and detect (but not correct) double-bit or triple-bit errors.
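To make the row and column parity idea concrete, the following sketch locates a single flipped data bit at the intersection of the failing row check and the failing column check. It assumes even parity, a rectangular block of data bits, and fault-free stored parity bits; all names are hypothetical.

```python
def parities(block):
    """Return (row parities, column parities) using even parity."""
    rows = [sum(row) % 2 for row in block]
    cols = [sum(col) % 2 for col in zip(*block)]
    return rows, cols


def check_and_correct(block, stored_rows, stored_cols):
    """Compare recomputed parities with the stored ones.

    Returns (block, status) where status is "ok", "corrected",
    or "detected" (an error was seen but cannot be located).
    Assumption: the stored parity bits themselves are fault-free.
    """
    rows, cols = parities(block)
    bad_rows = [i for i, (a, b) in enumerate(zip(rows, stored_rows)) if a != b]
    bad_cols = [j for j, (a, b) in enumerate(zip(cols, stored_cols)) if a != b]
    if not bad_rows and not bad_cols:
        return block, "ok"
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        fixed = [list(row) for row in block]
        fixed[bad_rows[0]][bad_cols[0]] ^= 1  # flip the located bit back
        return fixed, "corrected"
    return block, "detected"


original = [[1, 0, 1, 1], [0, 1, 0, 0], [1, 1, 1, 0], [0, 0, 0, 1]]
row_par, col_par = parities(original)
damaged = [list(row) for row in original]
damaged[2][1] ^= 1  # single-bit fault
repaired, status = check_and_correct(damaged, row_par, col_par)
```

With two flipped bits in the same row, the row check passes while two column checks fail, so the error is detected but its position cannot be resolved.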

3.2. Reliability and Resource Utilization Analysis of Fault Tolerance Mechanisms

3.2.1. Reliability of Fault Tolerance Mechanisms

To evaluate the fault tolerance reliability, we calculated the Architectural Vulnerability Factor (AVF) by conducting statistical fault injection tests. The statistical fault injection test is a test in which faults are injected during the RTL simulation of circuits written in Verilog to verify how the circuit behaves when a fault occurs. The RTL simulation was conducted using Icarus Verilog (v11.0), an open-source Verilog simulator, and the fault injection was conducted using Verilog Fault Injector (v1.0).

(i): Fault Model

In this study, we applied a fault model that reflects the actual radiation-induced fault patterns (SEUs) observed in SRAM-based FPGAs. Based on radiation test results reported in previous studies, faults in a space radiation environment mainly manifest as single-bit flips, and when two-bit faults are also considered, approximately 95% of the overall fault patterns are represented [47]. Building on these findings, our fault model is designed to include both single-bit and multi-bit faults.

The fault model is broadly categorized into Single-Bit Faults and Multi-Bit Faults.

Single-Bit Faults: Stuck-at 0, Stuck-at 1, Bit Flip
Multi-Bit Faults: Stuck-at 00, Stuck-at 01, Stuck-at 10, Stuck-at 11, 2-Bit Flip

This fault model is developed to mirror, as closely as possible within the RTL-level implementation scope, the SEU patterns observed in an actual radiation environment. In particular, single-bit faults model errors that typically occur within individual SRAM cells, while multi-bit faults capture the possibility of simultaneous errors in adjacent cells caused by high-energy particle collisions. In our study, we conducted over 10,000 statistical fault injection experiments based on these various fault types, thereby quantitatively evaluating the reliability and resource consumption characteristics of the fault-tolerant mechanisms. Unlike previous studies that assumed a uniform fault distribution, our fault injection model, which takes actual SEU patterns into account, enables a more realistic reliability assessment. This approach ultimately contributes to the optimization of reliability and resource efficiency during the early design stages of FPGA-based space electronic equipment.

The validity of these fault models can be confirmed by previous radiation tests and simulation results. For example, Di Natale et al. reported that in radiation tests on an SRAM-based FPGA, single-bit faults accounted for 72.29% and multi-bit (2-bit) faults for 23.53% of the total faults [47]. Additionally, Neale and Sachdev demonstrated that neutron radiation tests on 45, 32, and 22 nm SRAM revealed that single-bit and 2-bit faults accounted for approximately 97–98% of all faults [48]. These findings support that the fault models defined in our study effectively reflect the main SEU patterns observed in actual radiation environments, thereby enabling a more realistic reliability evaluation.
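The fault types listed above can be expressed as simple bit operations on a register value. The sketch below is a behavioral illustration, not the Verilog Fault Injector implementation; it assumes that multi-bit faults affect two adjacent bits, and all function names are hypothetical.

```python
def stuck_at_0(value: int, bit: int) -> int:
    """Force the selected bit to 0."""
    return value & ~(1 << bit)


def stuck_at_1(value: int, bit: int) -> int:
    """Force the selected bit to 1."""
    return value | (1 << bit)


def bit_flip(value: int, bit: int) -> int:
    """Invert the selected bit."""
    return value ^ (1 << bit)


def two_bit_stuck(value: int, low_bit: int, pattern: int) -> int:
    """Force bits (low_bit + 1, low_bit) to a 2-bit pattern 0b00..0b11.

    Assumption: the two faulty bits are adjacent.
    """
    if not 0 <= pattern <= 0b11:
        raise ValueError("pattern must be a 2-bit value")
    return (value & ~(0b11 << low_bit)) | (pattern << low_bit)


def two_bit_flip(value: int, low_bit: int) -> int:
    """Invert two adjacent bits starting at low_bit."""
    return value ^ (0b11 << low_bit)
```

Note that a stuck-at fault leaves the value unchanged when the bit already holds the forced value, whereas a bit flip always changes it.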

(ii): Test Environment

The fault model used in this study consists of transient single-bit stuck-at-0, stuck-at-1, and bit-flip faults. These faults were injected throughout the entire AES-128 security circuit, including the 32-bit registers in the Key Expansion module protected by each FTM.

The required number of fault injections was determined by applying a 95% confidence level and a ±1% margin of error. The number of tests ($n$) is calculated using the population size ($N$), margin of error ($e$), confidence level constant ($t$ = 1.96), and estimated failure proportion ($p$ = 0.5, the most conservative choice because it maximizes $p(1 - p)$), as shown in Equation (1).

$n = \frac{N}{1 + e^{2} \times \frac{N - 1}{t^{2} \times p \times (1 - p)}}$

(1)

Table 3 summarizes the number of fault injection points, simulation time, total fault space (the product of the number of fault injection points and the simulation time), and the total number of test iterations ($n$) calculated using Equation (1). The number of fault injection points equals the total number of signals (wires and registers) in the Verilog code that can be injected with faults; this value increases when an FTM is added to the circuit. The fault injection simulation time refers to the total duration of each circuit simulation under the test bench (510,000 ns for every target circuit). Faults are randomly injected at a random time and location during the simulation. Therefore, the population size ($N$), i.e., the total fault space, is the product of the number of fault injection points and the simulation time. Once this population size surpasses a certain threshold, the number of required tests converges to 9604.
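
The convergence to 9604 can be checked numerically. The sketch below evaluates Equation (1) in Python; the function name and the example count of 1000 injection points are assumptions for illustration (the actual counts are those listed in Table 3), while the 510,000 ns simulation time, $e$ = 0.01, $t$ = 1.96, and $p$ = 0.5 come from the text.

```python
import math


def sample_size(population, e=0.01, t=1.96, p=0.5):
    """Required number of fault injections according to Equation (1)."""
    return population / (1 + e**2 * (population - 1) / (t**2 * p * (1 - p)))


# Hypothetical circuit with 1000 injection points, simulated for 510,000 ns.
fault_space = 1000 * 510_000
required_tests = math.ceil(sample_size(fault_space))

# For a very large fault space, n approaches t^2 * p * (1 - p) / e^2.
limit = 1.96**2 * 0.5 * 0.5 / 0.01**2
```

For small populations the formula returns a value close to the population itself; once $N$ is several orders of magnitude larger than $t^{2} p (1 - p) / e^{2}$, the result is effectively independent of $N$.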

(iii): Statistical Fault Injection Process

This study follows the process below to measure the fault tolerance reliability of the target circuits:

Fault-Tolerant Circuit Design: Implement fault-tolerant versions of the AES-128 security circuit by protecting the 32-bit registers in the Key Expansion module using the following mechanisms: DMR, DMR+, TMR, QMR, Parity Code, 2D-Parity Code, and Hamming Code;
Fault Injection Simulation: Use the Verilog Fault Injector (VFI) tool to conduct fault injection simulations on both the fault-tolerant and baseline circuits. Fault locations are randomly selected across the fault space, and the test is repeated according to the calculated number of iterations;
Success/Failure Determination: After each simulation, determine whether the circuit successfully tolerated the fault. If the circuit still produces the correct output despite the injected fault, it is considered a “success”; otherwise, it is a “failure”;
$AVF$ Calculation: The $AVF$ (Architectural Vulnerability Factor) is the ratio of failed cases to the total number of injected faults, as shown in Equation (2).

$AVF = \frac{\text{Number of Failures}}{\text{Number of Fault Injections}} \times 100\ [\%]$

(2)

Figure 7 illustrates the statistical fault injection process. In this study, 10,000 fault injection experiments were conducted independently for each fault model (stuck-at 0, stuck-at 1, and bit-flip). Consequently, the $AVF$ for each fault type was analyzed independently in the simulation environment.
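
The process described above can be sketched as a short Python loop: pick a random injection point and time, run one simulation, classify the outcome, and compute the $AVF$ as in Equation (2). This is only a schematic of the flow, not the VFI tool; `run_campaign`, `toy_simulate`, and the toy circuit with 100 injection points (25 of which are unprotected) are hypothetical stand-ins.

```python
import random


def run_campaign(simulate, num_points, sim_time_ns, trials, seed=0):
    """Return the AVF in percent over `trials` random fault injections.

    `simulate(point, time_ns)` must return True when the circuit output
    is still correct after the fault (a "success") and False otherwise.
    """
    rng = random.Random(seed)
    failures = 0
    for _ in range(trials):
        point = rng.randrange(num_points)      # random fault location
        time_ns = rng.randrange(sim_time_ns)   # random injection time
        if not simulate(point, time_ns):
            failures += 1
    return 100.0 * failures / trials


def toy_simulate(point, time_ns):
    """Hypothetical circuit: faults at the first 25 of 100 points cause failure."""
    return point >= 25


avf_percent = run_campaign(toy_simulate, num_points=100,
                           sim_time_ns=510_000, trials=9604)
```

With 9604 trials, the estimate for this toy circuit should land close to its true failure proportion of 25%.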

3.2.2. Resource Utilization of Fault Tolerance Mechanisms

The resources required by each FTM were measured using the Xilinx VIVADO (2024.2) tool, and the measurements were used to calculate PDAP. PDAP is the product of area usage, power consumption, and delay [36].

(i): Resource Measurement Environment

For resource measurements, we used fault-tolerant circuits implemented in the AES-128 security circuit. The evaluated circuits include seven FTM variants with DMR, DMR+, TMR, QMR, 1D-Parity Code, 2D-Parity Code, and Hamming Code applied on a 32-bit register, as well as a Baseline circuit without any FTM. The resource measurement was performed using Xilinx’s VIVADO Design Suite, and the metrics considered were power consumption, delay, and area.

Figure 7. Statistical fault injection test flowchart.

This study was carried out at the early register transfer level (RTL) development stage, which plays a crucial role in identifying and mitigating potential reliability and resource consumption issues at the initial design phase. By applying FTMs at the RTL stage, we can utilize the functionalities of VIVADO to perform rapid and accurate evaluations of power, delay, and area without a full board-level synthesis. This early analysis facilitates iterative design optimization and ensures that design decisions are based on reliable metrics.

Furthermore, numerous studies in the FPGA-based system design domain have demonstrated the effectiveness and accuracy of using the Xilinx VIVADO tool for evaluating power, delay, and area [49,50,51]. Building on these established works, our study adheres to best practices by employing a validated tool for similar analyses. Nonetheless, we acknowledge that the VIVADO tool itself may have inherent measurement errors or limitations. To minimize these effects, we maintained a consistent measurement environment and conducted multiple repeated experiments. Future work should consider additional validation through comparisons with alternative measurement methods.

(ii): Resource Analysis Process

Circuit Synthesis: Use Xilinx VIVADO Design Suite to synthesize the eight different circuits;
Power Measurement: Use the Report Power feature to sum dynamic and static power, representing total power utilization;
Delay Measurement: Use Timing Analysis to derive the circuit’s maximum data path delay;
Area Measurement: Use Report Utilization to record the number of flip-flops, LUTs, and BRAMs. These values are summed to represent the circuit’s area usage;
$PDAP$ Calculation: Compute $PDAP$ using Equation (3) [36].

$PDAP = \text{Power} \times \text{Delay} \times \text{Area}.$

(3)
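
The resource analysis steps can be summarized in a few lines of Python. The sketch assumes that the power, delay, and utilization figures have already been read from the tool reports; the function name, the units, and the two example circuits with their numbers are hypothetical and are not results from this study.

```python
def pdap(power_w, delay_ns, luts, ffs, brams=0):
    """Power-Delay-Area Product as in Equation (3).

    Area is the sum of LUTs, flip-flops, and BRAMs, as described in the
    Area Measurement step.
    """
    area = luts + ffs + brams
    return power_w * delay_ns * area


# Hypothetical measurements: (power [W], delay [ns], LUTs, FFs, BRAMs).
circuits = {
    "Baseline": (0.10, 5.0, 1000, 500, 0),
    "FTM_A": (0.12, 5.5, 1400, 900, 0),
}

# Rank circuits from lowest to highest resource usage.
ranked = sorted(circuits, key=lambda name: pdap(*circuits[name]))
```

Because the three factors are multiplied, the absolute PDAP value depends on the chosen units; only comparisons between circuits measured in the same units are meaningful.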

4. Case Study

4.1. Target System: Advanced Encryption Standard (AES-128)

AES-128 is an encryption method designated as a Federal Information Processing Standard (FIPS) by the U.S. National Institute of Standards and Technology (NIST). It implements the Rijndael algorithm and performs the four-step procedure illustrated in Figure 8 (KeyExpansion, 0 Round, 1–9 Round, Last Round) to encrypt a data block.

The fault tolerance mechanism (FTM) is applied to the 32-bit registers within the circuit that performs the Key Expansion process of AES-128. During a single encryption operation, AES-128 undergoes 10 Key Expansion processes, with each process utilizing four 32-bit registers, resulting in a total of 40 32-bit registers where FTM is applied. Additionally, to enhance the accuracy of our study on Key Expansion, we also applied FTM to the 32-bit registers within the circuit responsible for the Round process. The Round process is repeated nine times during a single encryption operation, with each iteration involving four 32-bit registers, leading to an additional 36 32-bit registers being equipped with FTM.

AES is also used in space electronic systems. For example, the European Space Agency (ESA) introduced the AES algorithm when upgrading the Spacecraft Computer Unit (SCU) of the Eurostar 3000 satellite platform to ensure secure telecommand (TC) links and data integrity between ground and space [52].

Figure 8. AES-128 operation flow.

4.2. Fault Tolerance Mechanism–Applied Target Circuit

Figure 9 shows the Key Expansion section of the AES-128 circuit, which serves as the target circuit for FTM application. The target circuit includes four 32-bit registers.

Figure 10 illustrates the target circuit where Triple Modular Redundancy (TMR) is applied as the FTM. In this configuration, four 32-bit registers are replaced by a TMR module consisting of triplicated 32-bit registers and a voter. Since the triplicated registers are inherently fault-tolerant for all fault models, their AVF is 0%. The voter is implemented with three 2-input AND gates and one 3-input OR gate that takes the outputs of these AND gates. Consequently, the voter exhibits stronger fault tolerance against Stuck-at 0 faults compared to Stuck-at 1 faults. This is because, if one of the three AND gates outputs a 0 due to a Stuck-at 0 fault instead of a 1, the remaining two AND gates can still output 1, thereby allowing the OR gate to mask the fault. In contrast, if a Stuck-at 1 fault occurs and an AND gate outputs a 1 instead of a 0, even if the other two gates output 0, the OR gate will reflect the faulty value.
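
The asymmetry of the AND-OR voter can be reproduced with a small bit-level model. The sketch below is an illustrative Python model, not the Verilog used in the study: it forces the output of one AND gate to a fixed value and counts, over all input combinations and all three gates, how often the voter output is unchanged. Weighting all eight input combinations equally is an assumption.

```python
from itertools import product


def voter(a, b, c, stuck=None):
    """AND-OR majority voter for one bit.

    `stuck` is None or a pair (gate_index, value) that forces the output
    of one of the three AND gates to 0 or 1.
    """
    and_outputs = [a & b, b & c, a & c]
    if stuck is not None:
        gate_index, value = stuck
        and_outputs[gate_index] = value
    return and_outputs[0] | and_outputs[1] | and_outputs[2]


def masked_fraction(stuck_value):
    """Fraction of (input, faulty gate) cases whose output stays correct."""
    total = masked = 0
    for a, b, c in product((0, 1), repeat=3):
        golden = voter(a, b, c)
        for gate_index in range(3):
            total += 1
            masked += voter(a, b, c, (gate_index, stuck_value)) == golden
    return masked / total


stuck_at_0_masked = masked_fraction(0)
stuck_at_1_masked = masked_fraction(1)
```

Under these assumptions, a stuck-at 0 fault on an AND output is masked in 21 of 24 cases, while a stuck-at 1 fault is masked in only 12 of 24, because it drives the OR gate to 1 regardless of the inputs.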

Figure 11 shows the target circuit with the DMR+ FTM applied. The DMR+ module processes a 32-bit input by dividing it into two 16-bit segments and consists of parity logic for generating parity bits as well as redundancy logic. In this configuration, four target registers are replaced by DMR+ modules. Each DMR+ module is composed of a parity bit generator, a register, a parity bit checker, and a voter. The parity bit generator and checker do not affect the module’s output when no error exists in the protected register, ensuring that their AVF remains 0% across all fault models. Additionally, even if a fault causes a register value to change, the parity bit checker can correct the error, resulting in an AVF of 0%. Similarly to TMR, the voter in the DMR+ module masks faults arising from Stuck-at 0 errors via the AND-OR configuration, whereas a Stuck-at 1 fault causes the OR gate to pass the erroneous value. Therefore, the fault tolerance against Stuck-at 0 faults is higher than that against Stuck-at 1 faults.

Figure 12 depicts the target circuit using the Hamming Code as the FTM. The Hamming Code module is composed of an encoder, a register, and a decoder. The encoder includes a generator logic that creates parity codes based on the input message, while the decoder consists of a parity checker to detect error bits using the parity codes and a corrector to fix any detected errors. In this scheme, each of the four target registers is replaced by a Hamming Code module. The encoder generates parity bits in such a way that the decoder can correct a single error and detect two errors, and these parity bits are transmitted along with the input data to the register. Consequently, even if a 1-bit fault occurs in the register or the encoder, the decoder is capable of correcting the error, thus preventing system failure. As a result, the Hamming Code module does not exhibit any difference in fault tolerance between Stuck-at 0 and Stuck-at 1 faults, unlike the TMR and DMR+ methods.
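
The encoder/decoder principle can be illustrated with the smallest Hamming code. The following Python sketch uses Hamming(7,4), which protects 4 data bits with 3 parity bits, rather than the 32-bit code of the target circuit, and it omits the additional overall parity bit needed for double-error detection. Both simplifications, and all function names, are assumptions made to keep the example short.

```python
from itertools import combinations, product


def encode(data):
    """Encode 4 data bits into a 7-bit codeword (positions 1..7)."""
    c = [0] * 8                       # index 0 is unused
    c[3], c[5], c[6], c[7] = data     # data bits
    c[1] = c[3] ^ c[5] ^ c[7]         # parity over positions 1, 3, 5, 7
    c[2] = c[3] ^ c[6] ^ c[7]         # parity over positions 2, 3, 6, 7
    c[4] = c[5] ^ c[6] ^ c[7]         # parity over positions 4, 5, 6, 7
    return c[1:]


def decode(code):
    """Correct at most one flipped bit and return the 4 data bits."""
    c = [0] + list(code)
    syndrome = 0
    for pos in range(1, 8):
        if c[pos]:
            syndrome ^= pos           # XOR of the positions holding a 1
    if syndrome:
        c[syndrome] ^= 1              # syndrome points at the faulty bit
    return [c[3], c[5], c[6], c[7]]


def count_recovered(num_flips):
    """Return (recovered, total) over all data words and flip patterns."""
    recovered = total = 0
    for data in product((0, 1), repeat=4):
        codeword = encode(data)
        for positions in combinations(range(7), num_flips):
            faulty = list(codeword)
            for p in positions:
                faulty[p] ^= 1
            total += 1
            recovered += decode(faulty) == list(data)
    return recovered, total
```

In this reduced model every single-bit flip is corrected, while every 2-bit flip is miscorrected into a different valid codeword, which is why single-error-correcting codes are more exposed to multi-bit faults.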

4.3. Efficiency Analysis Results

4.3.1. Reliability Analysis of Fault Tolerance Mechanisms

In this study, simulation-based fault injection was employed to evaluate fault tolerance mechanisms at the early Register Transfer Level (RTL) development stage. This approach enables rapid and cost-efficient optimization during the initial design phase, and it facilitates reliability assessment without the need for physical testing. To this end, 10,000 statistical fault injection trials were conducted for each fault model, allowing for a comprehensive analysis of the performance of fault tolerance mechanisms under various fault conditions.

Considering the potential random bias that may arise during the fault injection process in real operational environments, we performed additional validation to ensure the stability and reliability of the experimental results. For each fault model, AVF values were obtained by conducting 200 fault injection tests, and this process was repeated 50 times to secure a sufficient data sample. Based on these data, an analysis of variance (ANOVA) was conducted. During the analysis, the null hypothesis was set as “the means of all groups (AVF value groups) are equal,” and a significance level of 0.05 was applied. As a result, in all repeated tests, the p-values exceeded 0.05, confirming that there were no statistically significant differences in AVF values among the groups. The ANOVA results can be found in Table A1 in Appendix A.
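
For readers unfamiliar with the quantities reported in Table A1, the sketch below computes the one-way ANOVA F-value from grouped samples in plain Python. The function name and the toy groups are illustrative assumptions; obtaining the p-value additionally requires the F-distribution, for which a statistics library routine such as SciPy's `scipy.stats.f_oneway` could be used.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of sample groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    # Variation of the samples around their own group mean (residual).
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

    df_between, df_within = k - 1, n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Toy data: three groups of three hypothetical AVF measurements.
f_value, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])

# Consistency check on the first row of Table A1: F = mean square ratio.
table_a1_f = 8.6743e-4 / 4.8375e-4
```

The last line reproduces the F-value of 1.7931 listed for BF_Baseline from its two mean squares.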

Table 4 summarizes the reliability analysis results for each circuit. The “Redundancy Type” and “Target Circuit” columns indicate the type of FTM applied to the baseline circuit. In the “$AVF$ per Fault Model” column, the $AVF$ (Architectural Vulnerability Factor) for each type of injected fault is shown based on the statistical fault injection tests. $AVF_{avg}$ represents the average value of the $AVF$ obtained from applying each fault injection model to the target circuit under analysis. In this paper, we assume that all fault models occur with equal probability [53]. For example, in Table 4, we address single-bit faults. Since the single-bit fault model comprises three cases (stuck-at 0, stuck-at 1, and bit-flip), we have $n$ = 3. The average AVF, denoted by $AVF_{avg}$, is computed by summing the AVF values for each of the three fault models and dividing by 3. To evaluate the performance of the developed FTM, we use $1 - AVF_{avg}$, which represents the complement of AVF, that is, the probability that no failure occurs due to a fault injection.

$1 - AVF_{avg} = 1 - \frac{\sum_{i=1}^{n} AVF_{i}}{n}, \quad (n = \text{number of fault models})$

(4)
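
Equation (4) amounts to an unweighted mean over the fault models. A minimal Python sketch follows; the function name and the three AVF values are hypothetical and are not taken from Table 4. AVF values are given in percent, so the complement is taken with respect to 100.

```python
def reliability_percent(avf_by_model):
    """Return 1 - AVF_avg in percent, assuming equally likely fault models."""
    avf_avg = sum(avf_by_model.values()) / len(avf_by_model)
    return 100.0 - avf_avg


# Hypothetical single-bit results for one circuit (AVF in percent).
example = {"stuck_at_0": 10.0, "stuck_at_1": 20.0, "bit_flip": 30.0}
score = reliability_percent(example)
```

If the fault models were not equally likely, the plain mean would have to be replaced by a weighted mean; the equal-probability assumption is the one stated in the text.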

Among hardware redundancy methods, QMR achieved the highest reliability (85.92%); among information redundancy methods, Hamming Code had the highest reliability (88.85%); and among all redundancy methods, Hamming and TMR showed the highest reliability (95.82%). QMR’s strong result stems from its quintuple-modular design, which provides more extensive redundancy than other hardware-based FTMs. Hamming Code’s strong result is due to the larger number of parity bits added during its encoding/decoding stages compared to other information redundancy FTMs. Hamming and TMR achieved the highest $1 - AVF_{avg}$ value among all redundancy types because it combines TMR from hardware redundancy, which shows a relatively high $1 - AVF_{avg}$ value, with Hamming Code from information redundancy, which exhibits the highest $1 - AVF_{avg}$ value. On the other hand, DMR and Parity, which lack error-correction features, showed only small AVF improvements compared to the baseline.

In the hardware redundancy category, the circuit’s tolerance for stuck-at-1 faults is generally higher than for stuck-at-0 faults. This characteristic arises from the structure of the majority voter: by design, AND gates tend to mask stuck-at-1 faults, while OR gates tend to mask stuck-at-0 faults. In a TMR majority voter, there are three 2-input AND gates but only one 3-input OR gate, leaving it more vulnerable to stuck-at-0 faults.

Within information redundancy, Hamming Code uses parity checks that treat stuck-at-0 and stuck-at-1 faults equally. However, its bit-flip AVF was about twice as high as either stuck-at fault, since a bit-flip always inverts the stored value, whereas a stuck-at fault alters it only when the stored bit differs from the stuck-at level.

Hybrid redundancy combines the characteristics of its constituent FTMs. In the case of Hamming and TMR, the influence of the Hamming Code results in equal fault tolerance performance for both stuck-at 0 and stuck-at 1 faults.

Figure 13 presents the fault tolerance performance of various target circuits using the $1 - AVF_{avg}$ metric, with error bars representing the 95% confidence intervals. This graphical representation enables a direct comparison of different fault tolerance mechanisms (FTMs). Notably, FTMs without error correction capabilities, such as DMR and 1D-Parity, exhibit significantly lower fault tolerance compared to those with correction functionality. Among the FTMs incorporating fault correction, Hamming and TMR demonstrates the highest fault tolerance performance, achieving a $1 - AVF_{avg}$ value of 95.82%. In contrast, DMR+, which has the lowest performance among the fault-correcting FTMs, records a $1 - AVF_{avg}$ of 79.56%, a gap of 16.26 percentage points, or 20.44% relative to DMR+. However, this difference is relatively small when considering resource utilization, which will be discussed in later sections. Additionally, the performance differences among FTMs, up to Hamming and TMR, are relatively close, suggesting that factors beyond fault tolerance, such as area overhead, power consumption, and design complexity, should also be considered when selecting an appropriate FTM.

Table 5 shows the fault injection test results for 2-bit faults. The fault models used in this experiment include stuck-at 00/01/10/11 and 2-bit bit-flip faults. The fault injection targets and the number of fault injection trials were kept identical to those in the 1-bit fault injection experiment for consistency.

Comparing the results with the 1-bit fault injection experiment, all target circuits exhibit lower $1 - AVF_{avg}$ values, indicating that 2-bit faults generally have a higher likelihood of inducing failures than 1-bit faults. Additionally, a distinction between hardware redundancy and information redundancy reveals that hardware redundancy exhibits greater resilience against 2-bit faults. This effect is particularly pronounced in 2-bit bit-flip faults, where masking is not as effective. Specifically, when compared to the 1-bit fault injection results, hardware redundancy mechanisms such as DMR+, TMR, and QMR exhibit 1.39%, 8.06%, and 3.44% higher AVF values, respectively. In contrast, information redundancy mechanisms, such as 2D-Parity and Hamming, show significantly higher increases of 15.85% and 45.46%, respectively. This is because hardware redundancy mechanisms utilize voters that mask entire register-level faults, whereas 2D-Parity and Hamming only guarantee correction for 1-bit faults, making them more vulnerable to 2-bit faults.

For the hybrid FTM (Hamming and TMR), the AVF increases by 27.95%, which is higher than TMR but lower than Hamming. This result suggests that while Hamming and TMR experiences a notable increase in AVF under 2-bit faults, it still benefits from the advantages of both hardware and information redundancy, mitigating the impact more effectively than information redundancy alone.

In order to further verify the results obtained by applying FTM to the Key Expansion module and conducting fault injection tests, we applied FTM to the Round module and performed fault injection tests. As shown in Table 6 and Table 7, which summarize the test results for the target circuits Baseline, DMR+, TMR, and Hamming, the results are similar to those obtained from the tests on the Key Expansion module, confirming that our approach can be applied to other targets as well.

4.3.2. Resource Utilization Analysis of Fault Tolerance Mechanisms

Table 8 summarizes the resource utilization analysis for each circuit. “Redundancy Type” and “Target Circuit” again indicate the FTM used in the baseline circuit. “Power” represents the circuit’s maximum required power, “Delay” is the critical path delay, and “Area” is the sum of LUTs, flip-flops, and BRAMs needed to implement the circuit. PDAP (Power-Delay-Area Product) is an integrated metric of resource utilization obtained by multiplying these three factors.

The “Rank” column sorts the circuits based on 1/PDAP in ascending order of resource usage (i.e., smaller PDAP ranks higher). DMR and Parity, which rank 1 and 2, have PDAP values only slightly higher than the baseline because they provide minimal fault detection functionality and no error correction. Among FTMs with error correction, TMR showed the smallest PDAP (39.28) among hardware redundancy methods, while 2D-Parity showed the smallest PDAP (56.09) among information redundancy methods. Both TMR and 2D-Parity require relatively less additional power and delay compared to the other FTMs.

Comparing hardware redundancy and information redundancy, hardware redundancy typically requires less power and delay due to its relatively simple majority voter logic. In contrast, information redundancy involves encoding and decoding processes, which consume more power and introduce additional delay relative to a simple voter.

Figure 14 presents the PDAP values for each target circuit, providing a visual comparison of resource utilization. FTMs without fault correction capabilities, such as DMR and 1D-Parity, exhibit minimal differences in PDAP compared to the Baseline. This suggests that when fault detection alone is sufficient, without the need for correction, DMR or 1D-Parity can be employed to achieve fault tolerance with minimal resource overhead. Among FTMs capable of fault correction, Hamming records the highest PDAP, showing a 189.70% increase compared to TMR, which has the lowest PDAP, highlighting a significant disparity. Notably, this difference is substantially larger than the 20.44% increase in $1 - AVF_{avg}$ observed in Hamming and TMR compared to DMR+, which exhibited the lowest fault tolerance performance. These findings indicate that a moderate reduction in fault tolerance performance can lead to a substantial decrease in resource usage, offering valuable insights for FTM selection in resource-constrained environments.

In this study, to examine the efficiency of PDAP with respect to parameter variations, we conducted a PDAP analysis based on optimization options. Table 9 presents the results obtained by defining three logical optimization options within the overall design flow and calculating the resource usage and PDAP values for three design variants—Baseline, TMR, and Hamming—for each option. Specifically, we introduced the Flow_Area option, focusing on area optimization, the Flow_Perf option, targeting performance enhancement, and the Flow_Alternate option, considering routing ease (routability).

First, under the Flow_AreaOptimized_high option, which is designed to minimize area usage, the power consumption of the Baseline, TMR, and Hamming circuits decreased by 0.19%, 0.94%, and 0.51%, respectively, compared to the Default option. Furthermore, the delay decreased by 3.38% and 5.40% for the Baseline and TMR circuits, respectively, while it increased by 6.39% for the Hamming circuit. The area was reduced by 0.69%, 0.44%, and 1.14% for the Baseline, TMR, and Hamming circuits, respectively. Consequently, the PDAP decreased by 4.24% and 6.69% for the Baseline and TMR circuits but increased by 4.63% for the Hamming circuit. In the Baseline and TMR cases, simultaneous reductions in power, delay, and area led to an overall decrease in PDAP, whereas in the Hamming circuit, the increase in delay outweighed the reductions in power and area, resulting in an increased PDAP.

Second, under the performance-oriented Flow_PerfOptimized_high option, the power consumption of the Baseline, TMR, and Hamming circuits decreased by 0.19%, 0.16%, and 0.13%, respectively, relative to the Default option. The delay decreased by 1.06% for the Baseline circuit but increased by 2.24% and 1.14% for the TMR and Hamming circuits, respectively. Additionally, the area increased by 0.69%, 0.39%, and 0.82% for the Baseline, TMR, and Hamming circuits, respectively. As a result, PDAP decreased by 0.57% for the Baseline circuit, while it increased by 2.47% and 1.84% for the TMR and Hamming circuits, respectively. In this case, the Baseline circuit exhibited a net PDAP decrease because the reduction in delay exceeded the increase in area, whereas in the TMR and Hamming circuits, the increase in delay dominated over the power reduction, leading to an overall PDAP increase.

Third, in the Flow_AlternateRoutability option—which employs an enhanced routing algorithm to remove complex logic elements such as MUX and Carry—the power consumption of the Baseline, TMR, and Hamming circuits increased by 2.72%, 0.16%, and 1.40%, respectively, compared to the Default option. The delay increased by 3.54% for the Baseline circuit, while it decreased by 6.94% and 3.94% for the TMR and Hamming circuits, respectively. The area decreased by 1.34%, 0.85%, and 1.27% for the Baseline, TMR, and Hamming circuits, respectively, leading to an overall PDAP increase of 11.60% for the Baseline circuit and decreases of 4.09% and 0.81% for the TMR and Hamming circuits, respectively. In the Baseline circuit, increases in power, delay, and area collectively resulted in a higher PDAP; however, in the TMR and Hamming circuits, the reduction in delay was sufficiently large to offset the increases in power and area, resulting in a net PDAP decrease.

Thus, PDAP varies depending on which parameter (power, delay, or area) exhibits the most significant change. Because PDAP is calculated as the product of power, delay, and area, a large change in a single parameter can dominate the overall PDAP value. Therefore, it can be concluded that PDAP is influenced proportionally by each parameter and reflects their combined effects.
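
Because PDAP is a product, relative changes in the three factors combine multiplicatively. The Python sketch below applies this to the percentage changes quoted above for the Flow_AreaOptimized_high option; the function name is an assumption, and small deviations from the reported PDAP changes are expected because the quoted inputs are rounded.

```python
def pdap_change_pct(power_pct, delay_pct, area_pct):
    """Combined PDAP change in percent from per-factor changes in percent."""
    factor = ((1 + power_pct / 100)
              * (1 + delay_pct / 100)
              * (1 + area_pct / 100))
    return (factor - 1) * 100


# Changes relative to the Default option under Flow_AreaOptimized_high.
baseline_change = pdap_change_pct(-0.19, -3.38, -0.69)   # reported: -4.24%
tmr_change = pdap_change_pct(-0.94, -5.40, -0.44)        # reported: -6.69%
hamming_change = pdap_change_pct(-0.51, +6.39, -1.14)    # reported: +4.63%
```

In the Hamming case the 6.39% delay increase outweighs the two small reductions, which is the dominance effect described in the text.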

4.3.3. Comparative Analysis of Reliability and Resource Utilization

Table 10 summarizes both the reliability and resource utilization results for each circuit. “Redundancy Type” and “Target Circuit” again refer to the type of FTM applied to the baseline circuit. The reliability metric is the AVF improvement rate, calculated as in Equation (5). A larger value indicates better reliability. The resource utilization metric is the absolute value of the PDAP increase rate, as in Equation (6). A larger value indicates higher resource usage.

$AVF\ \text{Improvement} = \frac{AVF_{Baseline} - AVF_{FTM}}{AVF_{Baseline}}$

(5)

$PDAP\ \text{Increase} = \left|\frac{PDAP_{Baseline} - PDAP_{FTM}}{PDAP_{Baseline}}\right|$

(6)
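
The two comparison metrics defined above are simple ratios against the baseline. The following Python sketch implements them directly; the function names and the numeric inputs in the example are hypothetical and do not correspond to entries in Table 10.

```python
def avf_improvement(avf_baseline, avf_ftm):
    """Relative AVF reduction achieved by an FTM (larger is better)."""
    return (avf_baseline - avf_ftm) / avf_baseline


def pdap_increase(pdap_baseline, pdap_ftm):
    """Absolute relative PDAP change caused by an FTM (larger is costlier)."""
    return abs((pdap_baseline - pdap_ftm) / pdap_baseline)


# Hypothetical circuit: AVF drops from 40% to 10%, PDAP grows from 20 to 50.
improvement = avf_improvement(40.0, 10.0)
overhead = pdap_increase(20.0, 50.0)
```

Comparing the two ratios for each FTM gives a quick sense of how much resource overhead is paid per unit of reliability gained.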

In terms of reliability, Hamming Code, QMR, 2D-Parity, TMR, and DMR+ ranked highest in that order. If reliability were the sole consideration, Hamming Code would be the top choice. However, in terms of resource utilization, Hamming Code, QMR, and 2D-Parity require 524%, 253%, and 118% more resources than TMR, respectively. These increases far exceed their respective reliability gains (25%, 15%, and 7% improvement over TMR). Thus, TMR is the most efficient choice when balancing both reliability and resource usage.

5. Conclusions

This study analyzed the reliability and resource consumption of eight distinct fault-tolerant mechanisms applied to target circuits before synthesis on commercial Field Programmable Gate Arrays. These eight mechanisms were categorized based on their structure and characteristics into Hardware Redundancy, including Dual Modular Redundancy (DMR), Enhanced Dual Modular Redundancy (DMR+), Triple Modular Redundancy (TMR), and Quintuple Modular Redundancy (QMR), and Information Redundancy, including 1D-Parity, 2D-Parity, and Hamming Code. Reliability was assessed using the Architectural Vulnerability Factor through statistical fault injection tests, while resource consumption was evaluated using the Power-Delay-Area Product with measurements obtained via Xilinx VIVADO (2024.2) software, specifically power consumption, delay time, and area. Overall, Information Redundancy techniques offered superior reliability, while Hardware Redundancy methods excelled in resource efficiency.

This study underscores the critical trade-offs between reliability and resource overhead among different fault-tolerant mechanisms, highlighting the importance of selecting appropriate mechanisms to optimize the balance based on specific system requirements. Additionally, the innovative analysis method employed allows for rapid and straightforward comparisons of reliability and resource efficiency during the early Register Transfer Level (RTL) development stage, significantly reducing overall development costs and time. These advantages extend the applicability of the proposed methods beyond space applications to fields such as Urban Air Mobility (UAM), Advanced Air Mobility (AAM), aircraft, and high-altitude electronic systems. By providing valuable insights and facilitating informed decision-making early in the development process, this research offers substantial benefits for engineers aiming to optimize fault-tolerant mechanisms across various high-stakes environments. Future work should explore more complex circuit architectures and diverse fault models to further refine the selection criteria for fault-tolerant mechanisms tailored to real-world applications.

Author Contributions

Data review and visualization were performed by C.K. and paper writing and editing were managed by C.K., D.L. and J.N. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partly supported by the GRRC program of Gyeonggi province (GRRC Korea Aerospace University 2023-B02) and the National Research Foundation of Korea (NRF), grant funded by the Korean government (MSIT) (No. 2022K1A3A1A2001493).

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

Abbreviation	Full Name	Description
AES	Advanced Encryption Standard	A symmetric key encryption standard (AES-128 used)
AAM	Advanced Air Mobility	Next-generation air mobility systems
AVF	Architectural Vulnerability Factor	A reliability metric for structural vulnerability assessment
BRAM	Block RAM	Block memory in FPGA
COTS	Commercial Off-The-Shelf	Commercially available electronic components
DMR	Dual Modular Redundancy	Fault-tolerant mechanism with dual module redundancy (error detection)
DMR+	Enhanced Dual Modular Redundancy	Improved dual modular redundancy (error detection and correction)
ECC	Error Correction Code	A technique for detecting and correcting errors in data
ESC	Electronic Speed Control	Electronic system for motor speed control
FPGA	Field Programmable Gate Array	A reconfigurable semiconductor device
FSM	Finite State Machine	A computational model with finite states
FTM	Fault Tolerance Mechanism	Techniques for ensuring system fault tolerance
FF	Flip-Flop	A basic digital memory element in sequential circuits
HW	Hardware	Physical electronic components
LEO	Low Earth Orbit	Orbit region for satellites (typically < 2000 km altitude)
LUT	Look-Up Table	A digital logic component used in FPGA
NASA	National Aeronautics and Space Administration	The U.S. space agency
NIST	National Institute of Standards and Technology	A U.S. federal agency for technology and standardization
PDAP	Power-Delay-Area Product	A metric evaluating resource efficiency in electronic designs
QMR	Quintuple Modular Redundancy	Fault-tolerant mechanism with five redundant modules (error masking)
RTL	Register Transfer Level	A digital circuit design abstraction level
SEU	Single Event Upset	A radiation-induced bit flip in memory
SER	Soft Error Rate	The rate of radiation-induced transient errors in electronics
SRAM	Static Random Access Memory	A type of volatile memory with faster access times
TC	Telecommand	Remote command transmission for satellite control
TMR	Triple Modular Redundancy	A fault-tolerant mechanism using three redundant modules (error masking)
UAM	Urban Air Mobility	Air transportation in urban environments
VFI	Verilog Fault Injector	A tool for fault injection testing in Verilog-based simulations

Appendix A

Table A1. ANOVA Results.

FTM	Source	Sum of Squares	Degrees of Freedom	Mean Square	F-Value	p-Value
BF_Baseline	AVF	0.0035	4	8.6743 × 10⁻⁴	1.7931	0.1317
	Residual	0.0943	195	4.8375 × 10⁻⁴	–	–
	Total	0.0978	199	–	–	–
BF_DMR	AVF	0.0042	4	0.0011	2.6235	0.0361
	Residual	0.0785	195	4.0252 × 10⁻⁴	–	–
	Total	0.0827	199	–	–	–
BF_DMRplus	AVF	0.0037	4	9.3453 × 10⁻⁴	2.2260	0.0676
	Residual	0.0819	195	4.1983 × 10⁻⁴	–	–
	Total	0.0856	199	–	–	–
BF_Hamming	AVF	9.3872 × 10⁻⁴	4	2.3468 × 10⁻⁴	0.5635	0.6895
	Residual	0.0812	195	4.1650 × 10⁻⁴	–	–
	Total	0.0822	199	–	–	–
BF_HammingTMR	AVF	0.0011	4	2.8195 × 10⁻⁴	1.3475	0.2538
	Residual	0.0408	195	2.0923 × 10⁻⁴	–	–
	Total	0.0419	199	–	–	–
BF_Parity1D	AVF	0.0047	4	0.0012	2.0845	0.0843
	Residual	0.1101	195	5.6443 × 10⁻⁴	–	–
	Total	0.1148	199	–	–	–
BF_Parity2D	AVF	0.0023	4	5.8132 × 10⁻⁴	1.4311	0.2251
	Residual	0.0792	195	4.0622 × 10⁻⁴	–	–
	Total	0.0815	199	–	–	–
BF_QMR	AVF	3.9548 × 10⁻⁴	4	9.8870 × 10⁻⁵	0.2422	0.9141
	Residual	0.0796	195	4.0827 × 10⁻⁴	–	–
	Total	0.0800	199	–	–	–
BF_TMR	AVF	0.0027	4	6.6887 × 10⁻⁴	1.8429	0.1222
	Residual	0.0708	195	3.6294 × 10⁻⁴	–	–
	Total	0.0734	199	–	–	–
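
As a reading aid for Table A1, the short sketch below recomputes an F-value as the ratio of the factor mean square to the residual mean square, and lists the circuits whose p-value falls below a significance level. The threshold of 0.05 is an assumption (the conventional choice), not a value stated in the table, and the names `f_value`, `P_VALUES`, and `ALPHA` are illustrative only.

```python
# p-values copied from Table A1 (one per fault-tolerant mechanism).
P_VALUES = {
    "BF_Baseline": 0.1317,
    "BF_DMR": 0.0361,
    "BF_DMRplus": 0.0676,
    "BF_Hamming": 0.6895,
    "BF_HammingTMR": 0.2538,
    "BF_Parity1D": 0.0843,
    "BF_Parity2D": 0.2251,
    "BF_QMR": 0.9141,
    "BF_TMR": 0.1222,
}

ALPHA = 0.05  # assumed significance level; not stated in Table A1


def f_value(ms_factor: float, ms_residual: float) -> float:
    """One-way ANOVA F statistic: factor mean square over residual mean square."""
    return ms_factor / ms_residual


# Recompute F for two rows whose mean squares are printed at full precision.
f_baseline = round(f_value(8.6743e-4, 4.8375e-4), 4)
f_qmr = round(f_value(9.8870e-5, 4.0827e-4), 4)

# Circuits whose AVF differs significantly between groups at the assumed ALPHA.
significant = [name for name, p in P_VALUES.items() if p < ALPHA]

print(f_baseline, f_qmr, significant)
```

Under the assumed threshold, only BF_DMR falls below 0.05; every other row would be read as showing no significant difference between groups.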

References

1. Golkar, A.; Salado, A. Definition of New Space—Expert Survey Results and Key Technology Trends. IEEE J. Miniaturization Air Space Syst. 2021, 2, 2–9.
2. George, A.D.; Wilson, C.M. Onboard Processing with Hybrid and Reconfigurable Computing on Small Satellites. Proc. IEEE 2018, 106, 458–470.
3. Shaker, M.N.; Hussien, A.; Alkady, G.I.; Amer, H.H.; Adly, I. FPGA-Based Reliable Fault Secure Design for Protection against Single and Multiple Soft Errors. Electronics 2020, 9, 12.
4. Lovelly, T.M.; George, A.D. Comparative Analysis of Present and Future Space-Grade Processors with Device Metrics. J. Aerosp. Inf. Syst. 2017, 14, 184–197.
5. Belous, A.; Saladukha, V.; Shvedau, S. Space Microelectronics Volume 2: Integrated Circuit Design for Space Applications; Artech House: Houston, TX, USA, 2017; Volume 2.
6. Wirthlin, M. High-Reliability FPGA-Based Systems: Space, High-Energy Physics, and Beyond. Proc. IEEE 2015, 103, 379–389.
7. Jacobs, A.; Cieslewski, G.; George, A.D.; Gordon-Ross, A.; Lam, H. Reconfigurable Fault Tolerance: A Comprehensive Framework for Reliable and Adaptive FPGA-Based Space Computing. ACM Trans. Reconfigurable Technol. Syst. 2012, 5, 1–30.
8. van Harten, L.D.; Mousavi, M.; Jordans, R.; Pourshaghaghi, H.R. Determining the necessity of fault tolerance techniques in FPGA devices for space missions. Microprocess. Microsyst. 2018, 63, 1–10.
9. Perez-Celis, J.A.; Ferrer-Pérez, J.A.; Santillán-Gutierrez, S.D.; de la Rosa Nieves, S. Simulation of Fault-Tolerant Space Systems Based on COTS Devices With GPSS. IEEE Syst. J. 2016, 10, 53–58.
10. Smith, F. Overhead and performance comparison of SET fault tolerant circuits used in flash-based FPGAs. Int. J. Electr. Electron. Eng. Telecommun. 2021, 10, 76–82.
11. Gao, Z.; Shi, J.; Liu, Q.; Ullah, A.; Reviriego, P. Reliability Evaluation and Fault Tolerance Design for FPGA Implemented Reed Solomon (RS) Erasure Decoders. IEEE Trans. Very Large Scale Integr. VLSI Syst. 2023, 31, 142–146.
12. Ismail, M.M.C.; Halim, I.S.A.; Hassan, S.L.M.; Rahim, A.A.A.; Abdullah, N.E. Fault Tolerant Design Comparison Study of TMR and 5MR. In Proceedings of the 2021 IEEE Symposium on Industrial Electronics & Applications (ISIEA), Langkawi Island, Malaysia, 10–11 July 2021; pp. 1–6.
13. Farman, S.K.A.; Duggineni, C.; Khasim, K.N.V.; Valiveti, H.B. Optimization of Energy and Area of a Randshift: Fault-Tolerant Technique using FPGA design flow. In Proceedings of the 2021 6th International Conference on Inventive Computation Technologies (ICICT), Coimbatore, India, 20–22 January 2021; pp. 404–409.
14. Denis, G.; Alary, D.; Pasco, X.; Pisot, N.; Texier, D.; Toulza, S. From new space to big space: How commercial space dream is becoming a reality. Acta Astronaut. 2020, 166, 431–443.
15. Sweeting, M.N. Modern Small Satellites-Changing the Economics of Space. Proc. IEEE 2018, 106, 343–361.
16. Union of Concerned Scientists (UCS) Satellite Database. Available online: https://www.ucsusa.org/resources/satellite-database (accessed on 16 February 2025).
17. Tang, H.H.K.; Rodbell, K.P. Single-Event Upsets in Microelectronics: Fundamental Physics and Issues. MRS Bull. 2003, 28, 111–116.
18. Leach, R. Spacecraft system failures and anomalies attributed to the natural space environment. In Space Programs and Technologies Conference; American Institute of Aeronautics and Astronautics: Huntsville, AL, USA, 1995.
19. Jacklin, S.A. Small-Satellite Mission Failure Rates. NASA/TM-2018-220034; 2019. Available online: https://ntrs.nasa.gov/citations/20190002705 (accessed on 6 February 2025).
20. Daneshvar, H.; Milan, K.G.; Sadr, A.; Sedighy, S.H.; Malekie, S.; Mosayebi, A. Multilayer radiation shield for satellite electronic components protection. Sci. Rep. 2021, 11, 20657.
21. Koren, I.; Krishna, C.M. Fault-Tolerant Systems; Morgan Kaufmann: Burlington, MA, USA, 2020.
22. Dodd, P.E.; Massengill, L.W. Basic mechanisms and modeling of single-event upset in digital microelectronics. IEEE Trans. Nucl. Sci. 2003, 50, 583–602.
23. Hentschke, R.; Marques, F.; Lima, F.; Carro, L.; Susin, A.; Reis, R. Analyzing area and performance penalty of protecting different digital modules with Hamming code and triple modular redundancy. In Proceedings of the 15th Symposium on Integrated Circuits and Systems Design, Porto Alegre, Brazil, 14 September 2002; pp. 95–100.
24. Siewiorek, D.P.; Swarz, R.S. Reliable Computer Systems: Design and Evaluation, 3rd ed.; A K Peters/CRC Press: New York, NY, USA, 1998.
25. Kameli, M.; Pissoort, D.; Claeys, T. Analyzing Bit Error Propagation in Multilayer Communication Networks: Codebook Method to Combine TMR and Hamming. In Proceedings of the 2024 International Symposium on Electromagnetic Compatibility–EMC Europe, Brugge, Belgium, 2–5 September 2024; pp. 739–744.
26. Khalil, K.; Eldash, O.; Kumar, A.; Bayoumi, M. Machine Learning-Based Approach for Hardware Faults Prediction. IEEE Trans. Circuits Syst. Regul. Pap. 2020, 67, 3880–3892.
27. Zhu, Y.; Reddi, V.J. Cognitive computing safety: The new horizon for reliability. IEEE Micro 2017, 37, 15–21.
28. Assiri, B.; Sheneamer, A. Fault tolerance in distributed systems using deep learning approaches. PLoS ONE 2025, 20, e0310657.
29. Mukherjee, S.S.; Weaver, C.T.; Emer, J.; Reinhardt, S.K.; Austin, T. Measuring architectural vulnerability factors. IEEE Micro 2003, 23, 70–75.
30. Kim, S.-H.; Lee, J.-A.; Kim, D. Design methodology adopting normalized power-delay-and-area product (N-PDAP) for digital-circuit optimization. Curr. Appl. Phys. 2004, 4, 87–90.
31. Singh, K.; Tiwari, S.C.; Gupta, M. A Modified Implementation of Tristate Inverter Based Static Master-Slave Flip-Flop with Improved Power-Delay-Area Product. Sci. World J. 2014, 2014, 453675.
32. Pei, Z.; Liu, H.H.; Mayahinia, M.; Tahoori, M.B.; Catthoor, F.; Tőkei, Z.; Abdi, D.B.; Myers, J.; Pan, C. Ultra-Scaled E-Tree-Based SRAM Design and Optimization with Interconnect Focus. IEEE Trans. Circuits Syst. Regul. Pap. 2024, 71, 4597–4610.
33. Shang, L.; Naeemi, A.; Pan, C. Towards Area Efficient Logic Circuit: Exploring Potential of Reconfigurable Gate by Generic Exact Synthesis. IEEE Open J. Comput. Soc. 2023, 4, 50–61.
34. Liu, H.H.; Gilardi, C.; Salahuddin, S.M.; Pei, Z.; Schuddinck, P.; Xiang, Y.; Weckx, P.; Hellings, G.; Bardon, M.G.; Ryckaert, J.; et al. Future Design Direction for SRAM Data Array: Hierarchical Subarray with Active Interconnect. IEEE Trans. Circuits Syst. Regul. Pap. 2024, 71, 6495–6506.
35. Li, C.; Sun, C.; Yang, J.; Ni, K.; Gong, X.; Zhuo, C.; Yin, X. Multibit Content Addressable Memory Design and Optimization Based on 3-D NAND-Compatible IGZO Flash. IEEE Trans. Very Large Scale Integr. VLSI Syst. 2024, 32, 1380–1388.
36. Van Toan, N.; Lee, J.-G. FPGA-Based Multi-Level Approximate Multipliers for High-Performance Error-Resilient Applications. IEEE Access 2020, 8, 25481–25497.
37. Zhang, M.; Nishizawa, S.; Kimura, S. Area Efficient Approximate 4–2 Compressor and Probability-Based Error Adjustment for Approximate Multiplier. IEEE Trans. Circuits Syst. II Express Briefs 2023, 70, 1714–1718.
38. Paul, B.; Nath, A.; Krishnaswamy, S.; Pidanic, J.; Nemec, Z.; Trivedi, G. Tensor Based Multivariate Polynomial Modulo Multiplier for Cryptographic Applications. IEEE Trans. Comput. 2023, 72, 1581–1594.
39. Balasubramanian, P.; Maskell, D.L. FAC: A Fault-Tolerant Design Approach Based on Approximate Computing. Electronics 2023, 12, 18.
40. Neves, F.G.R.; Saotome, O. Comparison between Redundancy Techniques for Real Time Applications. In Proceedings of the Fifth International Conference on Information Technology: New Generations (ITNG 2008), Las Vegas, NV, USA, 7–9 April 2008; pp. 1299–1300.
41. Abhyankar, A. Performance/Cost Analysis of Software Implemented Hardware Fault Tolerance Techniques. 2010. Available online: https://minds.wisconsin.edu/handle/1793/45562 (accessed on 20 September 2024).
42. Balasubramanian, P.; Mastorakis, N.E. Power, Delay and Area Comparisons of Majority Voters relevant to TMR Architectures. arXiv 2016, arXiv:1603.07964.
43. Chennakesavulu, M.; Prasad, T.J.; Sumalatha, V. Improved Performance of Error Controlling Codes Using Pass Transistor Logic. Circuits Syst. Signal Process. 2018, 37, 1145–1161.
44. Morgan, K.S.; McMurtrey, D.L.; Pratt, B.H.; Wirthlin, M.J. A Comparison of TMR With Alternative Fault-Tolerant Design Techniques for FPGAs. IEEE Trans. Nucl. Sci. 2007, 54, 2065–2072.
45. Knudsen, J.E.; Clark, L.T. An Area and Power Efficient Radiation Hardened by Design Flip-Flop. IEEE Trans. Nucl. Sci. 2006, 53, 3392–3399.
46. Reviriego, P.; Demirci, M.; Tabero, J.; Regadío, A.; Maestro, J.A. DMR+: An efficient alternative to TMR to protect registers in Xilinx FPGAs. Microelectron. Reliab. 2016, 63, 314–318.
47. Di Natale, G.; Gizopoulos, D.; Di Carlo, S.; Bosio, A.; Canal, R. Cross-Layer Reliability of Computing Systems. IET-the Institution of Engineering and Technology. 2020. Available online: https://iris.polito.it/handle/11583/2854763 (accessed on 5 February 2025).
48. Neale, A.; Sachdev, M. Neutron radiation induced soft error rates for an adjacent-ECC protected SRAM in 28 nm CMOS. IEEE Trans. Nucl. Sci. 2016, 63, 1912–1917.
49. Madasu, V.; Kumres, L. Enhancing Efficiency and Effectiveness: An Innovative Design for Multipliers on FPGA. In Proceedings of the 2024 IEEE International Students’ Conference on Electrical, Electronics and Computer Science (SCEECS), Bhopal, India, 24–25 February 2024; pp. 1–6.
50. Yao, S.; Zhang, L. Hardware-Efficient FPGA-Based Approximate Multipliers for Error-Tolerant Computing. In Proceedings of the 2022 International Conference on Field-Programmable Technology (ICFPT), Hong Kong, China, 5–9 December 2022; pp. 1–8.
51. Franceschi, M.; Camus, V.; Ibrahim, A.; Enz, C.; Valle, M. Approximate FPGA Implementation of CORDIC for Tactile Data Processing Using Speculative Adders. In Proceedings of the 2017 New Generation of CAS (NGCAS), Genova, Italy, 6–9 September 2017; pp. 41–44.
52. AES (Advanced Encryption Standard). Available online: https://connectivity.esa.int/projects/aes-advanced-encryption-standard (accessed on 6 February 2025).
53. Tăbăcaru, B.-A. On Fault-Effect Analysis at the Virtual-Prototype Abstraction Level. Ph.D. Thesis, Technische Universität München, Munich, Germany, 2020. Available online: https://mediatum.ub.tum.de/1471529 (accessed on 6 February 2025).

Figure 3. Classification of fault-tolerant mechanisms for electronic equipment [21,24,25].

Figure 9. Target circuit with FTM applied.

Figure 10. Target circuit with TMR applied.

Figure 11. Target circuit with DMR+ applied.

Figure 12. Target circuit with Hamming Code applied.

Figure 13. $1 - \mathrm{AVF}_{\mathrm{avg}}$ per target circuit (%).

Figure 14. Power-delay-area product by target circuit.

Table 1. Summary of journal studies on EDAP and PDAP metrics: evaluation methods, target systems, and references.

Metric	Ref.	Target System	Method
EDAP	[32]	SRAM Cache System	Designed the circuit layout and analyzed simulation results using the Cadence Spectre tool.
	[33]	Reconfigurable Logic Circuit	Simulated 300 randomly sampled function pairs and analyzed the results.
	[34]	SRAM Array	Applied an IGZO transistor model and a 45 nm predictive technology MOSFET model for simulation.
	[35]	Content Addressable Memory	Performed SPICE simulation (applying the IGZO transistor model and the 45 nm predictive technology MOSFET model), followed by result analysis.
PDAP	[36]	Multiplier	The Verilog code was synthesized using Xilinx ISE, and the synthesis results were analyzed.
	[37]	Multiplier	The Verilog code was synthesized using Synopsys Design Compiler, and the results were analyzed.
	[38]	Multiplier	Calculations were performed on the synthesis report output by Xilinx Vivado.
	[39]	Adder	Synthesis was carried out using Synopsys Design Compiler; power, delay, and area were obtained with the tool’s analysis features and then used to calculate the metric.

Table 3. Number of fault injection tests per target circuit (Simulation Time = 510,000 ns).

Redundancy Type	Target Circuit	Number of Fault Injection Locations (ea)	Total Fault Space (ea)	Number of Test Iterations (ea)
-	Baseline	2440	1,244,400,000	9604
Hardware Redundancy	DMR	2760	1,407,600,000	9604
	DMR+	3160	1,611,600,000	9604
	TMR	2800	1,428,000,000	9604
	QMR	3080	1,570,800,000	9604
Information Redundancy	1D-Parity	2720	1,387,200,000	9604
	2D-Parity	2760	1,407,600,000	9604
	Hamming	2720	1,387,200,000	9604
Hybrid Redundancy	Hamming and TMR	3040	1,550,400,000	9604
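
The columns of Table 3 are related by simple arithmetic, sketched below under two assumptions that the table itself does not state: that the total fault space is the number of injection locations multiplied by the 510,000 ns simulation time (one candidate injection instant per nanosecond), and that the 9604 test iterations follow from the standard sample-size formula for a proportion at 95% confidence with a 1% margin of error and worst-case p = 0.5. The function names are illustrative only.

```python
import math

SIM_TIME_NS = 510_000  # simulation time from the Table 3 caption


def total_fault_space(locations: int, sim_time_ns: int = SIM_TIME_NS) -> int:
    """Fault space = injection locations x candidate injection times (assumed 1 per ns)."""
    return locations * sim_time_ns


def sample_size(z: float = 1.96, margin: float = 0.01, p: float = 0.5) -> int:
    """Sample size for estimating a proportion: n = z^2 * p * (1 - p) / margin^2.

    The finite-population correction is omitted because the fault space
    (over 10^9 points) is far larger than n.
    """
    n = z * z * p * (1.0 - p) / (margin * margin)
    # Round first so floating-point noise cannot push ceil() up by one.
    return math.ceil(round(n, 6))


print(total_fault_space(2440))  # Baseline row
print(sample_size())
```

With these assumptions, the Baseline row gives 1,244,400,000 fault points and the formula gives 9604 iterations, matching the table.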

Table 4. AVF by target circuit: 1-bit fault injection for key expansion.

Redundancy Type	Target Circuit	AVF, Stuck-at 0 (%)	AVF, Stuck-at 1 (%)	AVF, Bit-Flip (%)	Reliability ($1 - \mathrm{AVF}_{\mathrm{avg}}$) (%)	Rank
-	Baseline	34.59	36.46	68.17	53.59	9
Hardware Redundancy	DMR	31.68	35.30	67.56	55.15	8
	DMR+	12.41	16.98	28.46	79.56	6
	TMR	4.88	18.90	23.69	81.81	5
	QMR	4.51	12.45	17.59	85.92	3
Information Redundancy	1D-Parity	32.34	32.92	64.43	56.77	7
	2D-Parity	0.00	15.71	17.04	83.69	4
	Hamming	10.59	11.58	21.04	88.85	2
Hybrid Redundancy	Hamming and TMR	3.20	3.33	6.01	95.82	1
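
The reliability column can be read as 100% minus the unweighted mean of the per-fault-model AVF values. The sketch below checks that reading against two rows of Table 4 (Baseline, and Hamming and TMR). The unweighted mean is an assumption inferred from those rows, and the function name `reliability` is illustrative.

```python
def reliability(avf_percent: list[float]) -> float:
    """Return 100 - mean(AVF), in percent, over the given fault models."""
    return 100.0 - sum(avf_percent) / len(avf_percent)


# AVF values (%) for stuck-at 0, stuck-at 1, and bit-flip, copied from Table 4.
baseline = [34.59, 36.46, 68.17]
hamming_tmr = [3.20, 3.33, 6.01]

print(round(reliability(baseline), 2))
print(round(reliability(hamming_tmr), 2))
```

The same function applies unchanged to the 2-bit tables, where the mean is taken over five fault models instead of three.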

Table 5. AVF by target circuit: 2-bit fault injection for key expansion.

Redundancy Type	Target Circuit	AVF, Stuck-at 00 (%)	AVF, Stuck-at 01 (%)	AVF, Stuck-at 10 (%)	AVF, Stuck-at 11 (%)	AVF, Bit-Flip (%)	$1 - \mathrm{AVF}_{\mathrm{avg}}$ (%)	Rank
-	Baseline	69.06	66.64	67.38	67.26	67.22	32.49	9
Hardware Redundancy	DMR	68.45	68.01	66.35	65.79	64.59	33.36	8
	DMR+	33.93	30.26	30.26	32.22	34.15	67.84	6
	TMR	9.23	29.17	28.41	42.72	35.28	71.04	4
	QMR	14.25	22.47	21.44	26.69	24.48	78.13	2
Information Redundancy	1D-Parity	63.31	59.19	62.09	61.48	64.19	37.95	7
	2D-Parity	25.38	26.71	27.31	29.69	40.41	70.10	5
	Hamming	24.67	9.94	9.67	22.27	61.76	74.34	3
Hybrid Redundancy	Hamming and TMR	7.95	5.08	4.98	8.48	33.96	87.91	1

Table 6. AVF by target circuit: 1-bit fault injection for round.

Redundancy Type	Target Circuit	AVF, Stuck-at 0 (%)	AVF, Stuck-at 1 (%)	AVF, Bit-Flip (%)	Reliability ($1 - \mathrm{AVF}_{\mathrm{avg}}$) (%)	Rank
-	Baseline	33.19	34.52	67.97	54.77	4
Hardware Redundancy	DMR+	13.30	23.95	32.88	76.62	3
Hardware Redundancy	TMR	4.30	24.03	28.83	80.95	2
Information Redundancy	Hamming	8.08	8.80	16.81	88.77	1

Table 7. AVF by target circuit: 2-bit fault injection for round.

Redundancy Type	Target Circuit	AVF, Stuck-at 00 (%)	AVF, Stuck-at 01 (%)	AVF, Stuck-at 10 (%)	AVF, Stuck-at 11 (%)	AVF, Bit-Flip (%)	$1 - \mathrm{AVF}_{\mathrm{avg}}$ (%)	Rank
-	Baseline	70.42	64.74	65.96	69.41	68.91	32.11	4
Hardware Redundancy	DMR+	34.85	30.99	31.11	34.21	35.66	66.64	3
Hardware Redundancy	TMR	9.80	29.08	28.92	47.37	36.01	69.76	2
Information Redundancy	Hamming	25.38	9.82	9.84	26.10	62.27	73.32	1

Table 8. Resource utilization analysis per target circuit.

Redundancy Type	Target Circuit	Power (W)	Delay ×10⁻³ (μs)	Area, LUT (ea)	Area, FF (ea)	Area, BRAM (ea)	Area, SUM (ea)	PDAP	Rank (1/PDAP)
-	Baseline	0.51	5.50	4368	4480	70	8918	25.01	-
Hardware Redundancy	DMR	0.51	5.93	4704	4480	70	9254	27.99	1
	DMR+	0.59	6.02	5806	6400	70	12,276	43.60	4
	TMR	0.57	5.54	5329	7040	70	12,439	39.28	3
	QMR	0.73	6.42	6428	9600	70	16,098	75.44	6
Information Redundancy	1D-Parity	0.52	6.15	4704	4480	70	9254	29.59	2
	2D-Parity	0.70	6.69	7007	4900	70	11,977	56.09	5
	Hamming	1.02	9.07	7778	4480	70	12,328	114.05	7
Hybrid Redundancy	Hamming and TMR	0.97	9.00	11,821	7760	70	19,651	171.98	8
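
The PDAP column can be reproduced from the other columns of Table 8. The sketch below multiplies power, the tabulated delay value, and the summed area (LUT + FF + BRAM), then divides by 1000. That scale factor is an assumption inferred from the tabulated magnitudes, and the function name `pdap` is illustrative.

```python
def pdap(power_w: float, delay: float, lut: int, ff: int, bram: int) -> float:
    """Power-delay-area product using the column values as printed in Table 8.

    `delay` is the tabulated delay figure (in units of 10^-3 us).
    The division by 1000 is an assumed display scale, not a stated definition.
    """
    area = lut + ff + bram
    return power_w * delay * area / 1000.0


# Rows copied from Table 8: (power, delay, LUT, FF, BRAM).
print(round(pdap(0.51, 5.50, 4368, 4480, 70), 2))  # Baseline
print(round(pdap(0.57, 5.54, 5329, 7040, 70), 2))  # TMR
```

Because power and delay are printed to two decimals, some rows recomputed this way can differ from the tabulated PDAP in the last digits.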

Table 9. Resource usage according to synthesis options.

Synthesis Option	Target Circuit	Power (W)	Delay ×10⁻³ (μs)	Area, LUT (ea)	Area, FF (ea)	Area, BRAM (ea)	Area, SUM (ea)	PDAP
Default	Baseline	0.51	5.77	4704	4480	70	9254	27.42
	TMR	0.64	6.52	8207	7040	70	15,317	63.94
	Hamming	0.79	7.51	7440	4720	70	12,230	72.14
Flow_AreaOptimized_high	Baseline	0.51	5.57	4640	4480	70	9190	26.26
	TMR	0.63	6.17	8140	7040	70	15,250	59.66
	Hamming	0.78	7.99	7300	4720	70	12,090	75.48
Flow_PerfOptimized_high	Baseline	0.51	5.70	4768	4480	70	9318	27.27
	TMR	0.64	6.67	8266	7040	70	15,376	65.52
	Hamming	0.78	7.60	7540	4720	70	12,330	73.47
Flow_alternateRoutability	Baseline	0.53	5.97	5220	4420	70	9710	30.60
	TMR	0.64	6.07	8713	6980	70	15,763	61.33
	Hamming	0.80	7.22	7724	4660	70	12,454	71.55

Table 10. Reliability and resource utilization analysis per target circuit.

Redundancy Type	Target Circuit	Reliability (AVF Improvement, %)	Resource Utilization (|PDAP Increase|, %)
-	Baseline	-	-
Hardware Redundancy	DMR	3.36	11.88
	DMR+	55.96	74.30
	TMR	60.80	57.03
	QMR	69.65	201.60
Information Redundancy	1D-Parity	6.84	18.31
	2D-Parity	64.85	124.22
	Hamming	75.97	355.93
Hybrid Redundancy	Hamming and TMR	91.00	587.64
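
Both columns of Table 10 appear to be changes relative to the Baseline circuit. The sketch below assumes that AVF improvement is the relative reduction in average AVF (with AVF taken as 100 minus the reliability in Table 4) and that PDAP increase is the relative growth of the PDAP in Table 8. These definitions are inferred from the tabulated numbers and are not stated in this section; the function names are illustrative.

```python
def avf_improvement(rel_base: float, rel_ftm: float) -> float:
    """Relative AVF reduction (%) versus baseline, from reliabilities in percent."""
    avf_base = 100.0 - rel_base
    avf_ftm = 100.0 - rel_ftm
    return (avf_base - avf_ftm) / avf_base * 100.0


def pdap_increase(pdap_base: float, pdap_ftm: float) -> float:
    """Relative PDAP growth (%) versus baseline."""
    return (pdap_ftm - pdap_base) / pdap_base * 100.0


# DMR row: reliabilities from Table 4, PDAP values from Table 8.
print(round(avf_improvement(53.59, 55.15), 2))
print(round(pdap_increase(25.01, 27.99), 2))
```

Because the inputs here are already rounded to two decimals, the recomputed values agree with Table 10 only to within rounding (for DMR, about 3.36 against 3.36 and about 11.9 against 11.88).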

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Kim, C.; Lee, D.; Na, J. A Comparison of Reliability and Resource Utilization of Radiation Fault Tolerance Mechanisms in Spaceborne Electronic Systems. Aerospace 2025, 12, 152. https://doi.org/10.3390/aerospace12020152

