Open access

Scalable and Accelerated Self-healing Control Circuit Using Evolvable Hardware

Authors:

Deepanjali S.,

Noor Mahammad SK

ACM Transactions on Design Automation of Electronic Systems, Volume 29, Issue 2

Article No.: 31, Pages 1 - 29

https://doi.org/10.1145/3634682

Published: 15 February 2024

Abstract

Controllers are mission-critical components of any electronic design. By sending control signals, they decide which data path elements operate and when. Faults in these components, especially Single Event Upsets (SEUs), can lead to functional or mission failure of the system when deployed in harsh environments. Hence, the ability to self-heal from SEUs is essential in the control path of a digital system. Reconfiguration is critical for recovering from a faulty state to a non-faulty state. Compared to native reconfiguration, the Virtual Reconfigurable Circuit (VRC) is an FPGA-generic reconfiguration mechanism. The non-partial reconfiguration in VRC and its extensive architecture are considered hindrances in extending VRC-based Evolvable Hardware (EHW) to real-time fault mitigation. To confront this challenge, we propose an intrinsic constrained evolution to improve scalability and accelerate the evolution process for VRC-based fault mitigation in mission-critical applications. Experimentation is conducted on complex ACM/SIGDA benchmark circuits and real-time circuits used in space missions, which are not included in related works. In addition, a comparative study is made between existing and proposed methodologies for brushless DC motor control circuits. The hardware utilization in the multiplexer has been significantly reduced, by up to 77% relative to the existing VRC architecture. The proposed methodology employs a fault localization approach to narrow the search space effectively. This approach has yielded an 87% improvement on average in convergence speed, as measured by the evolution time, compared to the existing work.

1 Introduction

In the current era, digital devices are typically operated at voltages lower than the environmental voltage, resulting in Single Event Transient (SET) errors, which are more likely to be latched in memory elements and cause the bits stored in them to flip, known as a Single Event Upset (SEU). The faults addressed in the proposed work are transient errors, which result in single-bit or multiple-bit flips in the memory elements of the deployed digital system. Galactic cosmic rays, including \(\alpha\), \(\beta\), and \(\gamma\), are the source of these faults, introducing high-energy particle strikes on semiconductor devices. These strikes can induce SEUs during the operation of the digital system, leading to changes in the stored bit states of the memory element. The control circuit, which includes the combinational path and memory element, is particularly vulnerable to this transient error type. The control unit is a crucial component of any digital system that serves as a liaison between various data path elements and peripherals to facilitate system operation. Any fault in the control circuit can lead to significant functional failures. Consider a control circuit in a processor environment where instructions stored in the instruction register are fetched and decoded based on the opcode. In the event of an SEU occurring in the opcode, an erroneous control signal can be generated, activating inappropriate data path elements and, in turn, causing functional failure. Therefore, the design of a fault-tolerant control circuit is of utmost importance to ensure reliable system performance. Traditional fault mitigation methodologies in safety-critical systems incorporate redundancy in temporal and spatial dimensions, such as N-modular redundancy. However, this approach can be hindered by the redundancy rate and delay, regardless of whether a fault has occurred [20].

By drawing inspiration from biological systems, Evolvable Hardware (EHW) aims to design electronic systems that can adapt to changing conditions and repair faults independently without human intervention. EHW is an application of evolutionary algorithms to reconfigurable fabrics, such as Field Programmable Gate Array (FPGA) and Field Programmable Transistor Array (FPTA), with two main objectives. The first objective is to use evolutionary design to optimize digital circuits regarding parameters such as area, power, and delay. Because evolutionary algorithms are search-based algorithms, they are well suited for autonomously designing digital circuits from scratch. This category encompasses the design of various circuits, including a 2-bit multiplier and an 8-bit parity generator [41], as well as 2-bit and 4-bit adders and multipliers, along with a 6-bit parity generator [37]. Additionally, a four-to-one even parity generator and 2-bit adders and multipliers are designed from scratch in [34] and evolved antenna in [12]. The works of [32] have utilized an online evolution of hybrid evolutionary algorithms such as Genetic algorithm (GA) and simulated annealing. This work uses small circuits such as a 2-bit full adder to complex circuits of size 4-bit input and 8-bit output to verify the proposed hybrid algorithm’s efficacy compared to standard GA. The second objective falls in the category of evolutionary design in adaptive hardware systems wherein the digital circuit under operation has to identify the changes in the working environment and self-rectify itself [14]. The primary requirement of self-adaptive hardware is to reconfigure itself to the changing environment. In FPGA, the reconfiguration ability is achieved by overwriting the in-built bitstream using native reconfiguration tools like JTAG controller or ICAP [4] or PCAP [40] tools. This method of reconfiguration is termed Dynamic Partial Reconfiguration (DPR) in the EHW community [2, 26, 35]. 
In another recent work [5], direct bitstream modification was achieved in the Lattice iCE40 FPGA family. This was achievable because the bitstream format was documented in Project IceStorm [11]. The works presented in [9] also fall under the above-mentioned category, employing three strategies. Initially, it involves directly modifying the bitstream to avoid extra synthesis phases for circuit evolution. Subsequently, it incorporates FPGA low-level management, removing the need for slow external program invocations. Lastly, it combines bitstream compression with direct FPGA reconfiguration. In contrast to existing methods, experiments reveal that utilizing CoBEA for evolutionary processes on various FPGA devices can yield a remarkable speed boost of over 130 times. The DPR method reconfigures the bitstream based on the frame address, where the frame is the fundamental unit of the configuration bit. This methodology is highly acceptable in the following scenarios:

—

To perform partial reconfiguration while the hardware is operational, it is necessary to have the bitstream format. Regrettably, only a few FPGA vendors will provide information about the configuration bitstream format. Moreover, the publication of the bitstream format has mostly ceased since the Xilinx 6200 series [6, 10].

—

The presence of the DPR mechanism is crucial for facilitating reconfiguration. However, certain highly secure military-grade FPGAs do not support these mechanisms. Furthermore, in specialized security-focused FPGAs, the bitstream undergoes extensive encryption [36].

When DPR is not feasible, an equivalent alternative reconfiguration technology is necessary to perform the native reconfiguration method at the application level. This alternative methodology is termed the Virtual Reconfigurable Circuit (VRC) [28, 29]. This methodology implements all possible hardware circuit functionalities and inputs using multiplexers. Switching between these functionalities is possible by reconfiguring the multiplexers’ select lines, which are stored in the configuration register. Since reconfiguration is performed on the same FPGA but not using native reconfiguration, it is called a virtual reconfigurable circuit or virtual overlay.

Using Hardware Description Language (HDL), the designer can design a VRC of a digital circuit comprising a two-dimensional programming element architecture and a configuration register housing select lines for multiplexers within the programming elements. By writing the appropriate configuration register values, the designer can choose the desired inputs and functionality and subsequently deploy the circuit on the FPGA. This technique has been applied in various applications, such as evolving image operators [8], combinational circuits [30], and implementing the Evolutionary Algorithm (EA) for hardware security from attacks such as side-channel security [16] and hardware trojans [17, 18]. Despite its usefulness in autonomous electronic design and adaptive hardware, this approach faces specific challenges, which include:

—

Extensive Architecture for VRC Implementation. The two-dimensional programming element (PE) architecture in VRC is fundamental for emulating circuit functionality with multiplexers. However, its size is considerably larger than that of the standard implementation. As the architecture serves as the phenotype of EHW, increasing the number of PEs lengthens the genotype and enlarges the search space. For instance, even for a less complex circuit like a 3-bit multiplier [30], the VRC uses 80 PEs, amounting to 880 bits of configuration.

—

Non-Partial Genotype Representation. In the DPR methodology for FPGA design, faults occurring in the configuration memory can be effectively isolated because the memory is broken down into frames. A portion of the bitstream can be reconfigured for the circuit to regain functionality. In VRC-based reconfiguration, however, the configuration bits stored in the configuration register mimic the configuration memory bitstream. Despite the usefulness of this approach, partial configuration or fault localization still needs to be explored. Furthermore, in the event of a single event upset, the entire set of configuration bits is evolved, resulting in an inefficient evolution.

—

Unaccelerated Convergence. Convergence is a critical metric for assessing the effectiveness of fault mitigation in terms of generations. Its value can be affected by various factors, including the method of phenotype representation, the efficacy of the selected genetic operators, and the search space size. In mission-critical applications, convergence is a crucial parameter. For most circuits, convergence is slow because the above two challenges have yet to be addressed.

This article addresses the initial challenges related to accelerating the deployment of hardware circuits. To achieve this, we propose a compact virtual overlay architecture that utilizes fewer multiplexers and implements a reduced number of functions through a heuristic approach facilitated by the deterministic control circuit. This results in a reduced representation of chromosomes, lowering the evaluation time. We introduce a constrained evolution technique that enables the reverse engineering of faults, particularly single-event upsets in the virtual overlay structure of the deployed circuit. This approach involves positioning fewer bits, unlike the traditional VRC-based evolution process, where the entire genotype is evolved. The search space is constrained by reducing the chromosome range, leading to faster and more efficient evolution. In summary, our fault-tolerant technique leverages EHW modules for fault identification and correction, employing an improved VRC-based reconfiguration technology with the above-mentioned amendments.

1.1 Contribution of This Article

The proposed work mainly focuses on the following:

—

Development of a generic and efficient method to address faults in complex control circuits. The proposed technique can be applied to commercial and military-grade FPGAs using a Virtual Reconfigurable mechanism.

—

A compact VRC structure is employed to minimize hardware usage and the two-dimensional architectural footprint, as opposed to the existing virtual-based reconfiguration.

—

We present the integration of partial reconfiguration into VRC by accurately localizing the potential error locations, in contrast to the existing non-partial reconfiguration approach in VRC.

—

The proposed constrained evolution-based genetic algorithm is deployed on the same FPGA as a digital circuit to reduce further the time required in communicating the evolved circuit to the target platform, as in the case of extrinsic EHW.

—

The proposed solution of fault tolerance uses a VRC-based EHW and is being compared to the existing VRC-based EHW solution for Brushless DC (BLDC) motors [47]. Furthermore, SEU fault injection is being conducted on complex control circuits in the ACM/SIGDA benchmark circuits to evaluate scalability and other fault mitigation parameters.

The rest of the article is organized as follows: Section 2 explains related work, Section 3 describes the proposed scalable solutions and genetic unit with its application to two case studies, Section 4 details the implementation of proposed constrained evolution with the genetic unit, Section 5 summarizes the efficacy of proposed methodologies in terms of fault mitigation parameters, and Section 6 outlines the conclusion.

2 Related Work

This section discusses the existing research to tackle scalability issues in EHW, including architecture and evaluation time. Furthermore, it examines the challenges and potential areas for improvement in VRC-based EHW systems, focusing on important EHW parameters such as VRC array size, genotype length, type of EHW, and convergence generation. Regarding the architecture scalability, there are two main topologies: Cartesian Genetic Programming (CGP) and Systolic Array (SA) architecture. The Cartesian-based topology is generally used when virtual overlay architecture is used. It contains a two-dimensional multiplexer-based programming structure where inputs to each column can be from external input or previous column programming structure. This architecture is highly flexible in routing and has a lesser path length from input to output [21, 24, 45]. On the contrary, the SA architecture defines a restriction in the routing of programming structure only to its neighbor. This method is highly suitable for DPR-based reconfiguration. The SA architecture has reduced usage of multiplexer when compared to CGP since routing is less complex when compared to CGP [13, 22, 23].

The scalability of representation revolves around the issue of genotype and phenotype mapping. This issue, when solved, can reduce the search space for converging to the solution. The initial work concerned with scalability was migrating from gate-level genotype representation to function-level representation [3, 25, 46]. This approach helped consider a decent hardware size for evolution, such as up to a 3-bit multiplexer. The alternate solution in earlier EHW days was to introduce modular evolution [7]. The target circuit considered for evolution was divided into multiple modules of lesser size and evolved parallelly [42]. The greatest challenge in this concept was circuit decomposition, primarily when hardware applications utilize special circuits instead of conventional design. In these methodologies where decomposition, incremental [33], and modularity have applied, the scalability of the circuit is achievable up to the size of 20 inputs and more than 100 gates. A binary decision representation-based encoding was followed in [39]. This methodology could converge for circuits higher than 40 inputs in a few milliseconds. Though the scalability and acceleration are high, the proposed work was theoretical and not generalized. The solution was tested in benchmark circuits wherein only 23 out of 28 circuits evolved to a solution. The search space can be further reduced when proper understanding and phenotype mapping are available. Apart from these scalable solutions, other specific solutions include [38], where a formal verification methodology called combinational equivalence is followed to find the evolved circuit’s fitness. This change in fitness evaluation has drastically accelerated the evolution speed and increased scalability. The audio CODEC engine with 2,176 inputs and 2,136 outputs was the most complex circuit that evolved using this methodology. 
Multiple VRC-based EHWs are proposed for the automated design of various applications, whereas the evolution of the control circuits is limited. [27] has proposed a VRC-based EHW on a Xilinx XC6216 FPGA for an Image Filter with a VRC size of eight rows and seven columns. The EA was hosted extrinsically utilizing 64.2% of CLB and 70.0% of Slices. The convergence ratio was accounted as 10,340. The work has recorded a high utilization rate for implementing the two-dimensional structure for VRC for the chosen application.

Similarly, an automated design of the sorting network was proposed in [15]. Various input lengths were experimented, ranging from the largest input size of 36, which occupied a \(18\times 108\) VRC array size, to the smallest input size of 10 with a VRC array size of \(5\times 30\) . The sorting network circuit was implemented as intrinsic EHW on a Xilinx XC6216 FPGA. The genotype length for the above range was noted as 3,421 to 343. The CLB Slices of size \(18\times 108\) required 13,483 iterations and took approximately 21 hours to converge, and the CLB Slices of size \(5\times 30\) required 6,378 iterations and took approximately 20.48 microseconds to converge. The challenge of this work involved the evolution of the largest sorting networks, which require 10 hours in FPGA running at 100 MHz. [30] devised an extrinsic EHW for combinational circuits like a 3-bit multiplier, adder, multiplexer, and parity encoder with an average genotype length of 880 bits. The CLB utilization was recorded as 19.26%. The unit’s implementation cost is relatively high compared to the evolved circuit’s size. The same author has proposed a polymorphic combinational circuit [31] for a 2-bit Multiplier, 4-bit sorting Network, and 4-bit-Plus1/Plus7 circuits on the Virtex FPGA XC2V3000bf957. This work’s challenge has recorded that the evolutionary design of a complex multi-functional circuit was intractable using a standard PC or a cluster of workstations. The other circuits implemented using VRC methodology involve the IIR Filter [8] and Packet Classifier [43]. The convergence of the genetic algorithm is also considered unaccelerated because chromosome length is huge for complex circuits. To focus on this factor, [19] has upgraded the standard genetic algorithm with the elite-partheno capability. With this approach, though the length of the chromosome is huge, the number of such populations is reduced. Hence, the evolution speed was accelerated by 144 times.

The control circuits chosen for fault tolerance based on VRC and DPR are very few [20, 47], respectively, with less complexity and high evaluation time. Both works have utilized extrinsic evolution strategies, where evolutionary algorithms are hosted in the processor of the SoC-based commercial FPGA. Also, the works have proposed a self-healing nature in less complex circuits, like brushless DC motor [47] with 3-bit input and 6-bit output and quadrature decoder [20] with 2-bit input and output. Irrespective of the lower complexity, the evolution time is vast, owing to the considerable architecture and unconstrained evolution. Hence, based on all the previous work analyses, the identified challenges, namely Extensive Architecture for VRC Implementation, Non-partial Reconfiguration in VRC, and Unaccelerated Convergence, have been recognized as crucial areas for improvement in the existing research. Notably, methodologies for partial evolution or localizing errors are lacking in VRC, similar to Dynamic Partial Reconfiguration. Therefore, our work aims to introduce a scalable solution for evolving control circuits, incorporating the concept of constrained evolution.

3 Proposed Scalable Architecture and Representation for Evolvable Hardware

The standard EHW consists of the target circuit to be evolved and an EA hosted in an external PC or a processor in the same FPGA (extrinsic). On the contrary, the GA in our proposed EHW system is drafted as a digital circuit with point-to-point communication between the modules and VRC. Since the internal system bus is used for communication, fault identification and recovery time are highly reduced. The communication parameters of the proposed EHW approach with VRC are shown in Figure 1. The length of the configuration register is shown as \(len(chr)\) content and is sent as the primary input to the GA. The randomly generated chromosome is then transmitted to the VRC for fitness evaluation. Genetic operations are performed based on the fitness value of the chromosome, and ultimately, fault-free chromosomes are written back to the configuration register. The other modules used to implement a genetic algorithm are explained in detail in Section 3.5. The following section explains two proposed solutions to aid faster convergence in the VRC architecture with suitable theorem, proof, examples, and case studies.

Fig. 1.

3.1 Design of Compact VRC Architecture

For any digital circuit, the VRC comprises two primary parts. The first part involves a two-dimensional architecture with PEs comprising three multiplexers each. These multiplexers are responsible for selecting inputs and choosing the desired function for the selected inputs, as shown in Figure 2. The second part consists of a configuration register that holds the select line bits for all the multiplexers in the PEs; by reconfiguring the configuration register, multiple versions of the circuits can be obtained. To accurately define the inputs and functionality of the digital circuit components, designers use an HDL to specify the configuration register values. Subsequently, the designed components are implemented on the FPGA platform.

Fig. 2.
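As a rough illustration of the PE described above, the following Python sketch models one PE as two input-select multiplexers feeding one function-select multiplexer, all driven by a slice of the configuration register. The function set, the field widths `R_BITS` and `F_BITS`, and every name used here are assumptions made for this sketch; the article does not specify them at this point.

```python
# Behavioural sketch of one VRC programming element (PE).
# Assumption: 4 candidate inputs (2-bit routing selects) and 4 functions
# (2-bit function select); the PE function set in the article may differ.
R_BITS = 2
F_BITS = 2

FUNCTIONS = [
    lambda a, b: a & b,        # AND
    lambda a, b: a | b,        # OR
    lambda a, b: a ^ b,        # XOR
    lambda a, b: 1 - (a & b),  # NAND
]


def bits_to_int(bits):
    """Interpret a list of 0/1 values as an unsigned integer (MSB first)."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value


def evaluate_pe(inputs, config_bits):
    """Evaluate one PE from its slice of the configuration register."""
    assert len(config_bits) == 2 * R_BITS + F_BITS
    sel_a = bits_to_int(config_bits[:R_BITS])            # first input mux
    sel_b = bits_to_int(config_bits[R_BITS:2 * R_BITS])  # second input mux
    sel_f = bits_to_int(config_bits[2 * R_BITS:])        # function mux
    return FUNCTIONS[sel_f](inputs[sel_a], inputs[sel_b])


inputs = [1, 0, 1, 1]
and_config = [0, 0, 0, 1, 0, 0]  # route inputs 0 and 1, select AND
or_config = [0, 0, 0, 1, 0, 1]   # same routing, select OR
print(evaluate_pe(inputs, and_config), evaluate_pe(inputs, or_config))
```

Rewriting only the last two bits of the register slice changes the function applied to the same routed inputs, which is the sense in which the circuit is reconfigured without touching the FPGA bitstream.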

In the case of a VRC-based circuit, the genotype space is determined by the length of the configuration register, denoted as N. N is calculated using the formula \(N=\sum _{i=1}^{p} n_i\), where p represents the total number of PEs, and \(n_i\) represents the number of configuration bits of the i-th PE. In turn, \(n_i\) can be calculated as \((2\times r)+f\), where r and f represent the number of bits in the select line influencing the routing and functionality of each PE, respectively. The \(n_i\) bits of each PE are placed contiguously to drive the output of each sub-expression in the Boolean function of the circuit. From the calculation \(N=\sum _{i=1}^{p} n_i\), it is evident that the genotype space is influenced by the parameters p, r, and f. The number of PEs, p, is influenced by the span of rows and columns of the entire VRC. The number of rows m increases with the number of external inputs, and the number of columns n increases with the expression’s path length. In situations with higher path lengths, there are usually higher chances of redundant gate-level or function-level operations. In previous research, one of the main reasons for the extensive architecture of VRC is the lack of analysis of redundancy in the gate-level operation of the circuit’s phenotype or Boolean expression before constructing the VRC. The phenotype space, on the other hand, is influenced by the input and output configuration spaces, represented as \(I=2^{i}\) and \(O=2^{o}\), where i and o represent the number of inputs and outputs, respectively. The number of Boolean functions realizable in the I and O configuration space, i.e., the phenotype space, can be calculated as \(O^{I}\). The VRC representation is generally compact and redundancy-free if the genotype and phenotype spaces are nearly equal. Constructing a compact VRC structure with a redundancy-free representation is essential to ensure a smaller search space for the EA and faster convergence.
The following theorem and its proof are proposed to emphasize the above explanation further.

Theorem 3.1.

Suppose the genotype space is denoted as G and the phenotype space as P. If the genotype space, determined by the length of the configuration register, is greater than the phenotype space in a logic function representation of the standard circuit, then redundancy exists in the representation of the logic function.

Proof.

Let G be the set of all possible genotypes (the genotype space), represented as binary strings of length \(|G|\), so that the genotype space has size \(2^{|G|}\). Let P be the set of all possible phenotypes, represented as binary strings of length \(|P|\), so that the phenotype space has size \(2^{|P|}\). Let \(f: G \rightarrow P\) denote the mapping that assigns each genotype its phenotype, and assume, for contradiction, that f is one-to-one, i.e., each genotype uniquely corresponds to a phenotype.

Since \(|G| \gt |P|\), we have \(2^{|G|} \gt 2^{|P|}\), so by the pigeonhole principle multiple genotypes in G must map to the same phenotype in P, violating the one-to-one correspondence assumption. Formally, there exist \(g_1, g_2 \in G\) such that \(g_1 \ne g_2\) and \(f(g_1) = f(g_2)\); that is, two distinct genotypes produce the same phenotype. This demonstrates that the mapping f is not injective.

Therefore, redundancy exists in the logic function representation, as multiple genotypes produce the same phenotype. Hence, when \(G\gt P\) holds for the logic function and routing representation encoded in the configuration bits of the VRC, redundancy arises in the representation of the logic function. □
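The pigeonhole argument can be checked by brute force on a toy PE. The sketch below assumes a hypothetical single PE with two external inputs, two 1-bit input selects, and a 3-bit function select (a 5-bit genotype), and groups all genotypes by the 4-bit truth table they produce. The function set is an assumption chosen only for illustration.

```python
from itertools import product

# Hypothetical 8-entry function set addressed by a 3-bit select line.
FUNCTIONS = [
    lambda a, b: a & b,        # AND
    lambda a, b: a | b,        # OR
    lambda a, b: a ^ b,        # XOR
    lambda a, b: 1 - (a & b),  # NAND
    lambda a, b: 1 - (a | b),  # NOR
    lambda a, b: 1 - (a ^ b),  # XNOR
    lambda a, b: a,            # pass first operand
    lambda a, b: 1 - a,        # invert first operand
]


def truth_table(sel_a, sel_b, sel_f):
    """Phenotype of one genotype: the output for every (A, B) combination."""
    rows = []
    for a, b in product([0, 1], repeat=2):
        pins = [a, b]
        rows.append(FUNCTIONS[sel_f](pins[sel_a], pins[sel_b]))
    return tuple(rows)


# Genotype = (1-bit select, 1-bit select, 3-bit select) -> 2**5 genotypes.
genotypes = list(product(range(2), range(2), range(8)))

# Group genotypes by the phenotype (truth table) they realize.
phenotypes = {}
for genotype in genotypes:
    phenotypes.setdefault(truth_table(*genotype), []).append(genotype)

redundant = {t: g for t, g in phenotypes.items() if len(g) > 1}
print(len(genotypes), "genotypes map to", len(phenotypes), "phenotypes")
```

Because only \(2^{4}\) truth tables exist for a 2-input, 1-output function, the \(2^{5}\) genotypes cannot all be distinct, and the `redundant` dictionary lists the genotypes that collide.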

To further illustrate the above theorem, consider the VRC structure for a less complex Boolean expression \((AB^{\prime }+CD^{\prime }E).FG^{\prime }\), where S1, S2, and S3 denote the sub-expressions \(AB^{\prime }\), \(CD^{\prime }E\), and \(FG^{\prime }\), respectively. The sub-expression S1 can be implemented using the VRC method, as demonstrated in Figure 3. The S1 sub-expression uses two input terms, requiring two PEs to select the inputs and one PE to perform the operation. Similarly, the number of input terms in the S2 and S3 sub-expressions determines the number of PEs needed, which is three and two, respectively. Consequently, the total number of PEs required to provide external input to S1, S2, and S3 in the VRC architecture will be 2 + 3 + 2 = 7. This discussion suggests that the number of terms in each sub-expression determines the number of rows in the external column of the VRC. The number of operations in the circuit, known as the path length of the expression, determines the number of columns in the VRC. The size of the VRC array can vary based on the circuit’s functionality and expression. Returning to the example expression \((AB^{\prime }+CD^{\prime }E).FG^{\prime }\), the sub-expression S1 is constructed using a VRC with a 21-bit configuration and can be matched with the above proof: S1 has a 2-bit input (A and B) and a single-bit output. The phenotype space P can be calculated as 16 because \(2^{4}\) different Boolean functions can be devised; here, 2 is the configuration space of the 1-bit output, and 4 is the configuration space of the 2-bit input.
The configuration size or genotype length \(|G|\) of 21 is required to represent \(3\times 2\times 2=12\) bits of routing information (three PEs with two multiplexers each and a 2-bit select line for each multiplexer) and \(3\times 1\times 3=9\) bits of functionality information (three PEs with one function multiplexer each and a 3-bit select line), thereby creating a genotype space of \(2^{21}\). Based on Theorem 3.1, redundancy is observed because the genotype space is larger than the phenotype space. It is evident from the expression analysis that the NOT gate appears in every sub-expression. Hence, this redundancy can be reduced by moving the NOT gate out of the VRC structure as shown in Figure 4, resulting in a 5-bit configuration register and nearly equal genotype space (\(2^{5}\)) and phenotype space (\(2^{4}\)). In addition, the number of PEs and multiplexers is reduced from 3 to 1 and from 9 to 3, respectively.

Fig. 3.

Fig. 4.
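The arithmetic of the S1 example can be reproduced with a few lines of Python. This is only a sketch of the formulas \(N=\sum n_i\) with \(n_i=(2\times r)+f\) and \(O^{I}\); the helper names are hypothetical, and the first helper assumes every PE has the same r and f.

```python
def genotype_length(num_pes, r, f):
    """Configuration register length N for num_pes identical PEs,
    each with two r-bit routing selects and one f-bit function select."""
    return num_pes * (2 * r + f)


def phenotype_space(num_inputs, num_outputs):
    """Number of realizable Boolean functions, O**I with I = 2**i, O = 2**o."""
    return (2 ** num_outputs) ** (2 ** num_inputs)


# S1 = AB': three PEs, 2-bit routing selects, 3-bit function select.
n_bits = genotype_length(num_pes=3, r=2, f=3)
p_size = phenotype_space(num_inputs=2, num_outputs=1)

original_ratio = 2 ** n_bits // p_size  # genotypes per phenotype before compaction
compact_ratio = 2 ** 5 // p_size        # after moving the NOT gate out (5-bit register)

print(n_bits, p_size, original_ratio, compact_ratio)
```

The ratio of genotype space to phenotype space is a simple indicator of how much redundancy the EA has to search through, and it drops sharply for the compact structure.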

Hence, it is evident from the above example and Theorem 3.1 that to eliminate redundancy in the configuration register and achieve equivalence between the genotype and phenotype space of the circuit, a comprehensive analysis of each sub-expression in the Boolean function is necessary. This analysis examines the structure and logic of each sub-expression to identify redundant components or operations. By examining the Boolean function in detail, designers can detect the redundant logic and move it outside the VRC. This prevents the repeated use of columns for redundant operations, directly reducing the number of \(n_i\) bits within the configuration register. This process ensures that the genotype space, determined by the configuration register length, aligns closely with the phenotype space, representing the circuit’s Boolean expression. The resulting VRC circuit achieves a more compact and efficient representation by removing redundancy from the two-dimensional space. This modification is of utmost importance, as it reduces the search space for the EA and facilitates faster convergence toward non-faulty solutions.

3.2 Design of Partial Genotype Representation for Fault Localization and Constrained Evolution

Non-partial reconfiguration and the extensive VRC architecture have been identified in various related studies as the main factors that limit convergence speed. They also contribute to scalability issues. Even for less complex circuits, the number of PEs in the VRC architecture is relatively high. While the architectural problem can be addressed with the solution proposed in the previous section, non-partial reconfiguration leads to the unnecessary exploration of a large search space. Although only a single or a few PEs may be faulty, the configuration bits of all PEs are sent as input to the EA. Unlike the methodologies used for DPR that reconfigure only a part of the frame, the entire set of configuration bits is utilized in VRC. Therefore, this section presents a methodology and algorithm to localize the configuration bits of faulty PEs so that only constrained evolution is performed.

The VRC architecture is composed of multiple PEs, where \(PE=\lbrace P_{1},P_{2}, \ldots , P_{n}\rbrace\), and each \(P_{i}\) in PE is configured by n-bit select lines. The length of the configuration register can be calculated as \(|PE| \times n\), where |PE| is the number of programming elements. In a standard VRC-based EHW system, the entire \(|PE| \times n\) bits are communicated as the input of the EA. For a complex circuit, the search space increases exponentially with an increase in \(|PE| \times n\); hence, a methodology is proposed to locate the faulty bits and send only part of the \(|PE| \times n\) bits. The VRC circuit produces an output signal Y, represented as \(Y=\lbrace y_{1},y_{2}, \ldots , y_{n}\rbrace\). Any \(y_{i}\) in Y is influenced by a subset of PE, represented as \(p \subseteq PE\). If a faulty output \({\bar{y}_{i}}\) in Y is recognized, then the possible fault locations can be constrained to the configuration bits of p; i.e., only \(|p|\times n\) bits are sent as the input to the EA instead of \(|PE| \times n\), thereby eliminating the irrelevant configuration bits of the other PEs from the search space. Since the configuration bits of each PE are placed contiguously in the register, a window of configuration bits can be localized with the start and end addresses of the frame, as shown in Algorithm 1.
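To make the idea of a constrained search concrete, the following Python sketch repairs a single injected bit flip by mutating only the bits inside a given window of the configuration register. It is a simplified (1+1) bit-flip search, not the genetic algorithm of this article. The fitness function is a toy stand-in that compares the candidate with a known-good register, whereas the actual system evaluates the VRC outputs against stored training data. The register size, window bounds, and all names are assumptions.

```python
import random


def constrained_repair(config, start, end, fitness, target, rng, max_gens=10_000):
    """(1+1) bit-flip search restricted to config[start:end + 1]."""
    best = list(config)
    best_fit = fitness(best)
    for generation in range(max_gens):
        if best_fit == target:
            return best, generation
        child = list(best)
        child[rng.randint(start, end)] ^= 1  # mutate inside the window only
        child_fit = fitness(child)
        if child_fit >= best_fit:            # keep the child if it is no worse
            best, best_fit = child, child_fit
    return best, max_gens


# Toy setup: 8 PEs x 5 bits = 40-bit configuration register (assumed sizes).
rng = random.Random(0)
golden = [rng.randint(0, 1) for _ in range(40)]


def fitness(candidate):
    """Stand-in fitness: number of bits that agree with the golden register."""
    return sum(c == g for c, g in zip(candidate, golden))


faulty = list(golden)
faulty[12] ^= 1  # inject one SEU inside the window of PE 2 (bits 10..14)

repaired, generations = constrained_repair(
    faulty, start=10, end=14, fitness=fitness, target=len(golden), rng=rng
)
print(repaired == golden, generations)
```

Because only the five bits of the suspect PE are candidates for mutation, the search explores at most \(2^{5}\) configurations instead of \(2^{40}\), and all bits outside the window are guaranteed to stay untouched.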

Algorithm 1 outlines a sequence of steps to determine the potential location of faults in the configuration register. It takes as input a set of acceptable inputs represented as X and outputs represented as Y for the circuit. The training data, consisting of expected input-output combinations, is stored in the FlashROM of the FPGA. The VRC structure, represented by a two-dimensional PE structure and the configuration register ( \(Config\_Reg\) ), is synthesized on the FPGA platform. The algorithm calculates the start and end indices, which indicate possible fault locations. Lines 2–4 in Algorithm 1 involve applying inputs to the circuit and recording the corresponding outputs from the VRC, denoted as \(Y_{VRC}\) for all acceptable input-output combinations. Lines 6–8 perform a bitwise comparison between the obtained VRC outputs and the expected outputs stored in the FlashROM. A position in comparison output with a value of 1 indicates a faulty output. The configuration bits influencing this output position need to be identified. Therefore, the potential faulty bits in the configuration register begin at position \(p \times n\) , where p represents the position of the error output bit, and n is the number of select line bits supplied to each programming element. The window expands to cover the successive \(n-1\) positions for the respective PE. As a result, only the portion of \(Config\_Reg\) from the \(start\_index\) to the \(end\_index\) is supplied to GA for evolution.
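The localization step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' Algorithm 1: the function name `localize_fault_windows`, the integer encoding of output words, and the convention that output position p is bit p of the word (least significant bit first) are all assumptions made for the example.

```python
def localize_fault_windows(y_vrc, y_ref, n):
    """Return one (start_index, end_index) window per erroneous output bit.

    y_vrc, y_ref: lists of integer output words, one per test vector
                  (assumed encoding: output position p is bit p of the word).
    n:            number of configuration bits per programming element.
    """
    # Bitwise XOR marks mismatching output positions with 1; OR the
    # mismatches of all test vectors together.
    mismatch = 0
    for got, want in zip(y_vrc, y_ref):
        mismatch |= got ^ want

    windows = []
    p = 0
    while mismatch:
        if mismatch & 1:
            # Window starts at p * n and covers the next n - 1 positions.
            windows.append((p * n, p * n + n - 1))
        mismatch >>= 1
        p += 1
    return windows


if __name__ == "__main__":
    # Output bit 4 is wrong for one test vector; n = 6 bits per PE.
    print(localize_fault_windows([0b010000], [0b000000], 6))
```

Only the bits inside the returned windows would then be handed to the GA, which is the point of the constrained evolution.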

3.3 Case Study I: Implementation of Proposed Scalable Solution in Brushless DC Motor Control Circuit

This section uses a BLDC motor controller, which is employed in various space applications, to demonstrate the compact two-dimensional arrangement of programming elements and the operation of constrained evolution within the VRC circuit. Table 1 displays the state transition table of the control circuit, where \(S_{i}\) represents input sensing signals and \(C_{i}\) denotes output control signals. The behavior of the control circuit is defined by the Boolean expressions in Equations (1) to (6):

\begin{equation} \begin{aligned}C0=\overline{S1}.S2, \end{aligned} \end{equation}

(1)

\begin{equation} \begin{aligned}C1=S0.\overline{S2}, \end{aligned} \end{equation}

(2)

\begin{equation} \begin{aligned}C2=\overline{S0}.S1, \end{aligned} \end{equation}

(3)

\begin{equation} \begin{aligned}C3=S1.\overline{S2}, \end{aligned} \end{equation}

(4)

\begin{equation} \begin{aligned}C4=\overline{S0}.S2, \end{aligned} \end{equation}

(5)

\begin{equation} \begin{aligned}C5=S0.\overline{S1}. \end{aligned} \end{equation}

(6)

Table 1.

Position Signal			Driving Signal
S0	S1	S2	C0	C1	C2	C3	C4	C5
1	0	0	0	1	0	0	0	1
1	1	0	0	1	0	1	0	0
0	1	0	0	0	1	1	0	0
0	1	1	0	0	1	0	1	0
0	0	1	1	0	0	0	1	0
1	0	1	1	0	0	0	0	1

Table 1. Functionality and Relationship between the Input and Output Signals in the Controller of BLDC
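As a quick consistency check, Equations (1) to (6) can be evaluated against the rows of Table 1. The sketch below is only illustrative; the names `TABLE_1` and `controller` are introduced here and do not come from the paper.

```python
# Rows of Table 1: (S0, S1, S2) -> (C0, C1, C2, C3, C4, C5)
TABLE_1 = [
    ((1, 0, 0), (0, 1, 0, 0, 0, 1)),
    ((1, 1, 0), (0, 1, 0, 1, 0, 0)),
    ((0, 1, 0), (0, 0, 1, 1, 0, 0)),
    ((0, 1, 1), (0, 0, 1, 0, 1, 0)),
    ((0, 0, 1), (1, 0, 0, 0, 1, 0)),
    ((1, 0, 1), (1, 0, 0, 0, 0, 1)),
]


def controller(s0, s1, s2):
    """Evaluate Equations (1)-(6): each output is an AND of two literals."""
    n0, n1, n2 = 1 - s0, 1 - s1, 1 - s2  # complemented inputs
    return (
        n1 & s2,  # C0
        s0 & n2,  # C1
        n0 & s1,  # C2
        s1 & n2,  # C3
        n0 & s2,  # C4
        s0 & n1,  # C5
    )


# Collect any row where the equations disagree with the table.
mismatches = [row for row in TABLE_1 if controller(*row[0]) != row[1]]

if __name__ == "__main__":
    print("mismatching rows:", mismatches)
```

Every output is the AND of exactly two literals drawn from the inputs and their complements, which is the property the compact architecture later exploits.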

The virtual overlay structure of the brushless DC motor controller is described in [47]. It contains 18 PEs (P1–P18) arranged in six rows and three columns, as shown in Figure 5. The first column (P1–P6) receives input from external sources, so its input multiplexers (M1, M2) have 2-bit select lines. In contrast, the input multiplexers of the other two columns (P7–P18) accept input from the six PEs of the preceding column; hence, their select lines are 3 bits wide. The configuration bits stored in the configuration register for input routing therefore amount to \(6\times (2+2)+12\times (3+3)=96\) bits. A single multiplexer (F1) performs the function selection in each PE with a 3-bit select line, contributing \(18\times 3=54\) bits. Together with the last 3 bits, which select the output from the last column of PEs, 153 bits are stored in the configuration register. These bits define the routing and functional logic of the brushless DC motor controller.
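The 153-bit figure follows directly from the per-PE select-line widths. The short sketch below reproduces the arithmetic; the function name and the returned dictionary keys are illustrative only.

```python
def standard_bldc_config_bits():
    """Configuration-bit budget of the standard 6x3 BLDC overlay."""
    first_column_pes = 6   # P1-P6, two 2-bit routing MUXes each
    other_pes = 12         # P7-P18, two 3-bit routing MUXes each
    routing = first_column_pes * (2 + 2) + other_pes * (3 + 3)
    function = 18 * 3      # one 3-bit function MUX per PE
    output_select = 3      # final output selection bits
    return {
        "routing": routing,
        "function": function,
        "output_select": output_select,
        "total": routing + function + output_select,
    }


if __name__ == "__main__":
    print(standard_bldc_config_bits())
```

Routing dominates the budget (96 of 153 bits), which is why the multiplexers are the main target of the architectural reduction.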

Fig. 5.

The configuration register is 153 bits wide for BLDC and holds the correct select-line configuration bits that define the expected functionality of the control circuit. However, when this register is implemented on an FPGA and used in a harsh environment, a single-event effect can flip any configuration bit of a select line that defines the function or routing of the circuit. Therefore, it is essential to provide the configuration register with a built-in self-healing capability to mitigate such faults. In the existing system [47], the authors used a GA to evolve the bits in the configuration register. However, since the entire content of the configuration register was evolved, it required a maximum of 9,856 generations to address a single-bit flip occurring in three PEs. In addition, based on Theorem 3.1, the genotype space \(2^{153}\) is larger than the phenotype space, calculated as \(64^{8}\) . Here, 8 is the number of input combinations of the 3-bit input, and 64 is the number of possible values of the 6-bit output. In our proposed methodology, we use the same circuit as the first example to demonstrate the solutions discussed in Sections 3.1 and 3.2. The proposed constrained evolution approach rests on understanding how the configuration register bits map to the phenotype of the virtual overlay structure. This mapping is crucial for implementing the EA effectively.

In Figure 5, the BLDC circuit is visualized with three inputs, and Boolean expressions (1) to (6) consist of two operations, AND and NOT; therefore, the VRC architecture is composed of three columns and six rows. In general, the number of columns grows with the number of individual functions in the Boolean expression, which in turn increases the number of multiplexers. For instance, in BLDC [47], multiplexers account for 89% of the total utilization. A deeper analysis of the Boolean expressions of the BLDC circuit shows that the NOT gate is applied uniformly to all the inputs. Hence, the separate column that implements the NOT gate can be removed by supplying the complemented signals as external inputs. The original inputs S0, S1, and S2 are extended with their complements \(\overline{S0}\) , \(\overline{S1}\) , and \(\overline{S2}\) . This modification reduces the \(6\times 3\) platform to a \(6\times 1\) architecture. Exposing the NOT gate as an external function also reduces the number of select-line bits in the function multiplexer of each PE. Because this reduction relies on observing a common gate or function applied to all inputs, the above solution for architecture scalability is circuit specific. The compact VRC architecture of BLDC is shown in Figure 6, where the inputs S0, S1, S2, \(\overline{S0}\) , \(\overline{S1}\) , and \(\overline{S2}\) are applied externally. The two routing MUXes in each PE select among these six inputs, and the PE applies the AND operation to the selected pair. The compact VRC architecture thus consists of a single column of six PEs, each with two MUXes that have 3-bit select lines. Hence, after employing the proposed scalable architecture, the total number of configuration bits is 6 PEs \(\times\) (2 MUX \(\times\) 3 bit) = 36 bits, and the genotype space is reduced from \(2^{153}\) to \(2^{36}\) . The reduced genotype space is approximately \(6.9\times 10^{10}\) . Hence, the redundancy in genotype-to-phenotype mapping is substantially reduced. In [47], the BLDC-VRC architecture used 153 bits to select the routing and functionality of the circuit. The search space of this architecture is huge ( \(2^{153}\) ), which drastically slows fault recovery. The search space is this large because the complete genotype is evolved without a deeper understanding of the phenotype-to-genotype mapping. In our proposed method, the genotype mapping with respect to fault tolerance is studied for the BLDC circuit and reported in Table 2.

Table 2.

Error Injection Position	Erroneous Output Bit	Input Affected
0–2	Q0	001,101
3–5	Q0	001,101
6–8	Q1	110,100
9–11	Q1	110,100
12–14	Q2	010,011
15–17	Q2	010,011
18–20	Q3	010,011
21–23	Q3	010,011
24–26	Q4	011,101
27–29	Q4	011,101
30–32	Q5	001,100
33–35	Q5	001,100

Table 2. Correlation Error Analysis between Erroneous Output Bit and Error Position

Fig. 6.

From Table 2, we can observe that an SEU in any bit of the configuration register necessarily produces a fault in one of the outputs, and that a pattern exists between the position of the SEU and the erroneous output. This pattern is used to partially locate the SEU and to perform a constrained evolution on part of the configuration bits instead of the entire configuration register. The constrained evolution is made possible by inferring the location of the SEU from the faulty output bit. A frame encoder is added in hardware to the proposed EHW system to support the constrained evolution and to target the window for evolution, as explained in Algorithm 1. The role of this frame encoder is to locate the window’s starting and ending addresses. Consider the configuration register after applying the scalable architecture solution: it is 36 bits long and indexed from 0 to 35. If the BLDC circuit exhibits a fault in output bit Q4, then Table 2 shows that the SEU must have occurred within bits 24–29. From this observation, it can be deduced that the start index of the window is \(E\times n\) , where E is the index of the erroneous output bit and n is the number of configuration bits for each PE. For BLDC, each PE accounts for 6 bits, which places the start index at 4 \(\times\) 6 = 24. The end of the window is \((E\times n)+(n-1)\) , which is 29 in this example, so the window length equals n, the number of routing-MUX select-line bits in each PE.
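The relationship between SEU position and erroneous output can be reproduced with a small software model of the compact \(6\times 1\) overlay. The sketch below is hypothetical in several respects: the select-line encoding (0 to 2 for S0 to S2, 3 to 5 for their complements, 6 and 7 tied to constant 0), the bit ordering inside each PE, and all function names are assumptions, so the specific "Input Affected" entries of Table 2 are not reproduced. What the model does show is that a flip at position pos can only disturb output pos // 6, and that the frame window derived from that output contains the flipped bit.

```python
# Six valid sensor vectors (S0, S1, S2) from Table 1.
INPUT_VECTORS = [(1, 0, 0), (1, 1, 0), (0, 1, 0), (0, 1, 1), (0, 0, 1), (1, 0, 1)]
BITS_PER_PE = 6  # two routing MUXes x 3 select bits per PE

# ASSUMED select encoding: 0..2 -> S0, S1, S2; 3..5 -> ~S0, ~S1, ~S2; 6, 7 -> 0.
# Fault-free (mux_a, mux_b) selections for C0..C5, read off Equations (1)-(6).
GOLDEN_SELECTS = [(4, 2), (0, 5), (3, 1), (1, 5), (3, 2), (0, 4)]


def encode(selects):
    """Pack per-PE selections into a flat 36-bit configuration register."""
    bits = []
    for mux_a, mux_b in selects:
        for sel in (mux_a, mux_b):
            bits += [(sel >> 2) & 1, (sel >> 1) & 1, sel & 1]
    return bits


def evaluate(bits, s):
    """Each PE ANDs the two inputs chosen by its routing MUXes."""
    sources = [s[0], s[1], s[2], 1 - s[0], 1 - s[1], 1 - s[2], 0, 0]
    outputs = []
    for k in range(6):
        f = bits[k * BITS_PER_PE:(k + 1) * BITS_PER_PE]
        a = f[0] * 4 + f[1] * 2 + f[2]
        b = f[3] * 4 + f[4] * 2 + f[5]
        outputs.append(sources[a] & sources[b])
    return outputs


def inject_seu(pos):
    """Return the golden configuration with the bit at `pos` flipped."""
    bits = encode(GOLDEN_SELECTS)
    bits[pos] ^= 1
    return bits


def faulty_outputs(bits):
    """Indices of outputs that differ from the golden circuit on any vector."""
    golden = encode(GOLDEN_SELECTS)
    bad = set()
    for s in INPUT_VECTORS:
        for k, (want, got) in enumerate(zip(evaluate(golden, s), evaluate(bits, s))):
            if want != got:
                bad.add(k)
    return bad


def frame_window(e, n=BITS_PER_PE):
    """Start and end index of the window for erroneous output bit e."""
    return e * n, e * n + n - 1


if __name__ == "__main__":
    for pos in (0, 17, 26):
        for e in sorted(faulty_outputs(inject_seu(pos))):
            print("SEU at", pos, "-> output", e, "-> window", frame_window(e))
```

Because each PE's output depends only on its own six configuration bits, the window never needs to be wider than one PE for a single-PE fault.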

In general, a VRC is designed for the circuit chosen for evolution. Since the structure of this architecture is entirely user designed, mapping the genotype to the phenotype is achievable, and thus a target circuit can be evolved for fault recovery at accelerated speed. Together, the scalable architecture and the frame encoder reduce the search space of the configuration bits from 153 bits to 36 bits and subsequently to 6 bits, which is an approximately 90% smaller search space.

3.4 Case Study II: Implementation of Proposed Scalable Solution in RISC-V Processor Control Circuit

RISC-V is an instruction set architecture used in many processors deployed on the SoC-based FPGAs of Xilinx and Microsemi. Since these FPGAs are used in multiple space-based missions and the control circuit is complex, with a longer path length in its Boolean expressions as shown in Equations (7) to (10), we have chosen this control circuit to demonstrate the efficacy of our proposed scalable solution. One input of the RISC-V control circuit is the current state, which represents the different stages of the control circuit, such as instruction fetch, execution, and memory access. The other input is the instruction’s opcode, which represents the operation, such as ADD or MUL, performed by the instruction. On accepting an opcode, the circuit moves from the current state to the next state.

The VRC architecture, with the appropriate configuration register content and PE array dimensions, is designed in Verilog (HDL) and deployed on the FPGA. The control circuit has a 10-bit input and a 4-bit output and is realized using a VRC architecture of 35 rows and 7 columns, as shown in Figure 7. The first-column PEs, numbered PE11 to PE135, accept the 10 external inputs. Following our proposed architecture solution, the inputs, including the opcode and the current state, are also negated and made available as inputs to PE11 to PE135, as shown in Figure 7. Hence, two 5-bit select lines are used for routing the inputs in each PE of the first column, accounting for 35 \(\times\) 10 = 350 bits. The PEs of the remaining six columns, indexed PE21 to PE735, accept input from the 35 PEs of the previous column through two 6-bit select lines each, which amounts to 2,520 configuration bits, as shown in Figure 8. Therefore, 2,870 configuration bits are used for routing the inputs in the VRC architecture.

Fig. 7.

Fig. 8.

Regarding functionality, 16 gate-level operations are implemented in each PE, requiring a 4-bit select line. Hence, the VRC architecture requires 980 bits in total for selecting the appropriate functions. The total number of configuration bits stored in the configuration register for the RISC-V processor is 3,850. In standard genetic evolution, all chromosome bits are evolved for fault mitigation, which requires more generations for fault recovery. In contrast, in our proposed system, the VRC architecture is designed to support constrained evolution based on the erroneous bit position. For instance, if the error occurs in NS3, the frame encoder sets the start and end indexes to 0 and 445. Restricting the search space to 446 bits instead of 3,850 shortens the evolution time, since fewer generations are needed for fault recovery. Similarly, if NS2, NS1, or NS0 is in error, then bits 446 to 1,495, 1,496 to 2,683, or 2,684 to 3,849 are evolved, respectively.
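The bit counts for the RISC-V overlay, and the way a frame encoder could map an erroneous next-state bit to its window, can be sketched as follows. The lookup table `FRAMES` and the function names are illustrative; the windows are assumed to be 0-indexed and non-overlapping, so the last one is taken as 2,684 to 3,849 in order for the four windows to tile the 3,850-bit register exactly.

```python
ROWS, COLS = 35, 7


def riscv_config_bits():
    """Configuration-bit budget of the 35x7 RISC-V overlay."""
    first_col_routing = ROWS * (2 * 5)            # two 5-bit selects per PE
    other_routing = (COLS - 1) * ROWS * (2 * 6)   # two 6-bit selects per PE
    function = ROWS * COLS * 4                    # one 4-bit function select per PE
    return {
        "first_col_routing": first_col_routing,
        "other_routing": other_routing,
        "routing": first_col_routing + other_routing,
        "function": function,
        "total": first_col_routing + other_routing + function,
    }


# ASSUMED frame table: erroneous output -> (start_index, end_index), inclusive.
FRAMES = {
    "NS3": (0, 445),
    "NS2": (446, 1495),
    "NS1": (1496, 2683),
    "NS0": (2684, 3849),
}


def frame_length(output_name):
    """Number of configuration bits evolved when `output_name` is in error."""
    start, end = FRAMES[output_name]
    return end - start + 1


if __name__ == "__main__":
    print(riscv_config_bits())
    print({name: frame_length(name) for name in FRAMES})
```

Unlike the BLDC case, the windows here differ in length because each next-state output depends on a different number of PEs.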

\begin{equation} NS3= \overline{S3}\overline{S2}\overline{S1}S0\overline{P5}\overline{P4}\overline{P3}\overline{P0}(P2\oplus P1) \end{equation}

(7)

\begin{equation} \begin{aligned}NS2= (\overline{S2}(\overline{P4}\overline{P2}(\overline{S1}S0\overline{P5}\overline{P3} \overline{P1}\overline{P0}+S1\bar{S0}P5P3P1P0)+S1S0)) \end{aligned} \end{equation}

(8)

\begin{equation} \begin{aligned}NS1= \overline{S3}(\overline{S2}(\overline{P4}\overline{P2}(\overline{S1}S0(P5P1P0+\overline{P5}\overline{P3}\overline{P1}\overline{P0})+S1\overline{S0}P5\overline{P3}P1P0))+S2S1\overline{S0}) \end{aligned} \end{equation}

(9)

\begin{equation} \begin{aligned}NS0= \overline{S3}(\overline{S2}(\overline{P4}\overline{P2}\overline{P1}(S1\overline{S0}P5P0+\overline{S1}S0\overline{P5}\overline{P3}\overline{P0})+\overline{S1}\overline{S0})+S2S1\overline{S0}) \end{aligned} \end{equation}

(10)

Based on these two case studies, it is clear that the VRC architecture can be constructed for any circuit. As shown in the case studies, the compact VRC architecture described in Section 3.1 has been implemented in both cases, resulting in a reduced architectural footprint. Comprehensive fault injection in the configuration register has shown that faults can be localized for constrained evolution, as explained in Section 3.2 and Algorithm 1. The analysis of the case studies demonstrates that the proposed solutions can be applied to control circuits in general.

3.5 Genetic Unit

The genetic unit implements the evolutionary algorithm for fault mitigation as a digital circuit on the same FPGA; it is therefore entirely intrinsic. The modules of the genetic unit correspond to the phases of the genetic algorithm: random population generation, fitness evaluation, selection, and reproduction. The interface between the VRC and the genetic modules is shown in Figure 9. The configuration content acts as the genotype, or chromosome, of the genetic algorithm. Unlike in the standard genetic algorithm, only part of the chromosome is evolved: the frame encoder restricts evolution to a frame of the configuration register. The frame’s start and end addresses are determined from the erroneous output bit position, which is obtained by comparing the VRC’s output with the reference output. The length of this frame is passed as the primary input to the genetic algorithm. The random population generator module contains a Linear Feedback Shift Register (LFSR) that generates random values in the range \([0, 2^{len(frame)}-1]\) . The binary values in this range are stored as chromosomes in a register (Ran_reg). Each chromosome C_i from this register is written into the restricted frame of the configuration register. The configuration register bits are then passed to the respective multiplexers of each PE in the VRC, and the output is recorded.
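A software model of the random population generator might look like the sketch below. The LFSR width and feedback taps are not given in the text, so a common 16-bit Fibonacci LFSR (taps 16, 14, 13, 11) is assumed here purely for illustration; the names `lfsr16_step` and `random_frames` are likewise hypothetical.

```python
def lfsr16_step(state):
    """Advance a 16-bit Fibonacci LFSR (assumed taps 16, 14, 13, 11) by one step."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)


def random_frames(seed, frame_len, count):
    """Generate `count` candidate chromosomes of `frame_len` bits each.

    Each candidate is the low `frame_len` bits of the next LFSR state, so
    every value lies in [0, 2**frame_len - 1].
    """
    if not (0 < seed < (1 << 16)):
        raise ValueError("seed must be a non-zero 16-bit value")
    if not (0 < frame_len <= 16):
        raise ValueError("frame_len must be between 1 and 16")
    mask = (1 << frame_len) - 1
    frames, state = [], seed
    for _ in range(count):
        state = lfsr16_step(state)
        frames.append(state & mask)
    return frames


if __name__ == "__main__":
    # Ten 6-bit candidates for the BLDC frame, seeded with an arbitrary value.
    print(random_frames(0xACE1, 6, 10))
```

Taking the low bits of consecutive states keeps the model simple, but successive candidates are correlated; a hardware design could advance the register several steps between samples.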

Fig. 9.

Fitness evaluation is designed as an individual module, similar to the random generator. The module compares the output of the VRC architecture with the reference output stored in the FlashROM of the FPGA. All input combinations are stored in this FlashROM as vectors together with their correct outputs. A counter in this module continuously sends the input vectors to the VRC and observes the output. The results of the bitwise XOR comparison are accumulated into the fitness value: the fitness of chromosome C_i is the number of 0s in the comparison output, and it is stored in the register named Fit_Reg. The chromosomes with the highest number of 0s are stored in the selection register (Sel_Reg). The reproduction stage of the genetic algorithm continues the evolutionary process by generating a new population from the fittest chromosomes. The crossover module takes two chromosomes and exchanges alleles between them at each fixed crossover point. The mutation unit operates on a single chromosome: the mutation points are chosen randomly with the LFSR, and a bitwise XOR is applied at the chosen points. The newly generated chromosomes are stored in New Pop_Reg and are written as new restricted frames into the configuration register. This process continues until convergence. Convergence is verified by the fitness evaluation module: when the bitwise XOR comparison between output_VRC and output_Ref produces all 0s, the signal of the genetic unit is set to the OFF state.
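The three operators described above can be modeled as small pure functions. This is a minimal sketch under stated assumptions: chromosomes and output words are plain integers, crossover is shown with a single fixed point, and the function names are introduced here for illustration rather than taken from the design.

```python
def fitness(vrc_outputs, ref_outputs, width):
    """Count the 0s in the XOR of VRC and reference outputs over all vectors."""
    mask = (1 << width) - 1
    return sum(
        width - bin((v ^ r) & mask).count("1")
        for v, r in zip(vrc_outputs, ref_outputs)
    )


def has_converged(vrc_outputs, ref_outputs, width):
    """Converged when every XOR comparison produces all 0s."""
    return fitness(vrc_outputs, ref_outputs, width) == width * len(ref_outputs)


def crossover(a, b, point, length):
    """Single-point crossover: the two parents swap their low `point` bits."""
    low = (1 << point) - 1
    full = (1 << length) - 1
    child_1 = ((a & ~low) | (b & low)) & full
    child_2 = ((b & ~low) | (a & low)) & full
    return child_1, child_2


def mutate(chromosome, points):
    """Flip the bits at the given positions with a bitwise XOR."""
    for p in points:
        chromosome ^= 1 << p
    return chromosome


if __name__ == "__main__":
    print(fitness([0b111, 0b000], [0b101, 0b000], 3))
    print(crossover(0b111000, 0b000111, 3, 6))
    print(bin(mutate(0b000000, [0, 5])))
```

In the constrained setting, `length` is the frame length reported by the frame encoder rather than the full configuration register width.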

The convergence of the GA is influenced by several parameters that can be tuned to optimize its performance. These parameters include the population size, the selection method, and the genetic operation strategies. Table 3 summarizes how these parameters affect the convergence of the GA. To analyze their effect, a series of experiments was conducted with 10 trials for each strategy. The experiments focused on three string lengths, 153, 36, and 6, which correspond to the chromosome lengths of the existing self-healing BLDC circuit [47] with standard VRC architecture and non-partial reconfiguration, proposed solution-1 with the compact VRC structure for the BLDC circuit, and proposed solution-2 with partial genotype reconfiguration, respectively. The convergence acceleration of the proposed compact and partial VRC reconfiguration is evident in Table 3. The impact of the GA parameters on convergence depends strongly on the specific circuit being considered, and the optimal settings may differ between circuits. Moreover, interactions between the parameters can also influence convergence. Therefore, experiments and careful parameter adjustment are essential to determine the most suitable configuration for a particular EHW system with a zero fault occurrence rate. After the strategies are refined, the GA can be evaluated through fault simulation. This involves testing the algorithm with the chosen parameters, including Rank-Based selection, Uniform crossover, and uniform bit-flipping mutation, with crossover and mutation rates of 0.50 and 0.1, respectively. The performance of the GA under these settings is measured using the fault simulation techniques discussed in Section 5.

Table 3.

String Length	System under Consideration	Population Size	Selection: Rank-based Crossover: Single-point Mutation: Bit Flipping Mutation Pool Rate: 50%	Selection: Rank-based Crossover: Multi-point Mutation: Uniform Bit Flipping Mutation Pool Rate: 50%	Selection: Tournament Size :4 Crossover: Uniform Rate: 0.5 Mutation: Uniform Rate: 0.1
String Length	System under Consideration	Population Size	Average Generation
153	Existing Standard BLDC Circuit [47]	5,000	40	36	17
		1,000	54	42	18
		500	66	57	19
		100	144	149	35
36	Proposed Solution for Compact VRC Architecture	5,000	10	16	5
		1,000	12	9	6
		500	12	10	6
		100	20	17	6
6	Proposed Solution for Partial VRC Reconfiguration	30	1	1	1
		20	2	2	1
		10	3	3	2
		5	9	6	3

Table 3. Effect of GA Parameter and Strategies on Convergence for Existing and Proposed BLDC Circuit

3.5.1 Timing Analysis.

To estimate the time it takes for the suggested genetic algorithm to run, we need to examine two different operating scenarios of the circuit. The first scenario occurs when the circuit is undergoing repair, while the second occurs when the circuit is functioning normally. The genetic algorithm comes into play exclusively when the circuit is in repair mode. During this phase, the candidate control circuit’s configuration bits are generated during each generation \((\alpha _{ran})\) , which is evaluated for its fitness value \((\alpha _{fit})\) . After selecting \((\alpha _{sel})\) the fittest configuration bit, GA performs the genetic operation, like mutation and crossover \((\alpha _{gen})\) . These processes continue until convergence is achieved over multiple generations \((N_{gen})\) . The time taken for fitness evaluation, denoted as \((\alpha _{fit})\) , includes the process of reconfiguring \((\alpha _{rec})\) the configuration register using the generated population. This process also involves selecting the appropriate configuration bit for routing and supplying functions for each PE within the two-dimensional architecture. Finally, the control circuit generates the control signal \((\alpha _{VRC})\) from the VRC architecture to be compared with the reference output.

Therefore, the overall runtime of the genetic unit for fault repair, \(T_{GA}\) in Equation (12), is the cumulative time taken by the modules mentioned above. PS and s in Equation (11) denote the size of the random population and the selection pressure applied by the designer, respectively:

\begin{equation} t_{gen}\approx PS\times (\alpha _{ran}+\alpha _{rec}+\alpha _{VRC})+(PS-s)\times (\alpha _{sel}+\alpha _{gen}) \end{equation}

(11)

\begin{equation} T_{GA} \approx t_{gen}\times N_{gen}. \end{equation}

(12)

The runtime analysis reveals two important points. First, it highlights that much time is allocated to reconfiguring the configuration register and generating control signals through the VRC architecture. Second, the duration occupied by \((\alpha _{fit})\) depends on the population’s total number of mutated genes. Timing analysis conducted using the Vivado IDE indicates a maximum operating frequency of 300 MHz and a reconfiguration time of \(4.6 \times 10^{-6}\) s for a single processing element affected by SEU.
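Equations (11) and (12) translate directly into a small runtime estimator. In the sketch below, only the reconfiguration time of \(4.6 \times 10^{-6}\) s comes from the text; every other per-module time, the population size, the selection pressure, and the generation count in the usage example are placeholder values chosen for illustration.

```python
def generation_time(ps, s, a_ran, a_rec, a_vrc, a_sel, a_gen):
    """Equation (11): approximate time of one generation.

    ps: population size, s: selection pressure,
    a_*: per-module times (random generation, reconfiguration,
         VRC evaluation, selection, genetic operation) in seconds.
    """
    return ps * (a_ran + a_rec + a_vrc) + (ps - s) * (a_sel + a_gen)


def ga_runtime(t_gen, n_gen):
    """Equation (12): total runtime over n_gen generations."""
    return t_gen * n_gen


if __name__ == "__main__":
    # Placeholder module times except a_rec, which is the reported 4.6e-6 s.
    t_gen = generation_time(
        ps=10, s=2,
        a_ran=1e-8, a_rec=4.6e-6, a_vrc=1e-7,
        a_sel=1e-8, a_gen=1e-8,
    )
    print("t_gen =", t_gen, "s; T_GA =", ga_runtime(t_gen, 3), "s")
```

With values of this shape, the PS-weighted reconfiguration term dominates the generation time, which is consistent with the first observation of the runtime analysis.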

4 Implementation

The proposed scalable solution is implemented on the A3PE3000 FPGA because it is widely used in space-based missions and offers the required performance and a secure platform. For experimentation, the A3PE3000 FPGA is mounted on the RTAX adaptor ACT-H3Qi356 with 356 I/O pins. However, the crucial advantage of the proposed scalable approach is that any FPGA with sufficient capacity can be used, even one that offers neither bitstream access nor DPR tools.

The genetic unit and VRC are modeled in Verilog (HDL) and deployed using the Libero SoC suite on the FPGA. After simulation, the synthesized modules were programmed onto the FPGA using the FlashPro Express tool. The genetic algorithm required 3,024 gates on the FPGA, while the VRC architecture used 6,754 gates. In our method, the chromosome population is stored as a register bank in BlockRAM. The proposed EHW, combined with VRC, achieves an operational frequency of 300 MHz and a maximum combinational delay of 8.7 msec. The minimum input arrival time and output required time before and after the clock are 3.45 ns and 2.78 ns, respectively. The implementation details of the proposed solution for the BLDC and RISC-V control circuits used in space missions are summarized in Tables 4 and 5. In addition to the above circuits, the ACM/SIGDA benchmark circuits are implemented to demonstrate the efficacy of our system. We chose this benchmark because details of complex control circuits are scarce in the research community. The corpus consists of the finite state machines, or behavioral models, of the control circuits presented at the LGSynth-91 workshop [44]. Among the various benchmark circuits, we selected five FSMs and used the methodology summarized in [1] to convert the KISS2 format to VHDL code and implement the VRC architecture, as shown in Table 5.

Table 4.

Resources	Available	Utilized (Proposed)	% of Utilization (Proposed)	% of Utilization [47]
IOs	620	12	1.61	1
CLB Slices	75,264	87	0.11	2
BlockRAM	112	5	4.4	6
FlashROM	1,024	67	6.54	Not used
DFFs	2,594	269	10.37	-

Table 4. Resource Utilization of Proposed Solution for BLDC Motor in Comparison with Existing System [47]

Table 5.

Circuit	Inputs	Outputs	# Transitions	# States	% of Utilization
Circuit	Inputs	Outputs	# Transitions	# States	IOs	CLB	BlockRAM	FlashROM	DFFs
RISC-V	10	4	54	10	7.35	6.67	37.54	21.46	31.97
S1494	8	19	250	48	8.65	4.67	47.09	19.56	27.32
S510	9	7	77	47	9.34	5.07	46.93	20.01	26.96
S832	18	19	245	25	17.74	13.29	60.07	27.56	35.96
S420	19	2	137	18	15.31	11.31	56.23	24.57	32.97
S820	18	19	232	25	23.52	19.89	69.76	31.21	45.31

Table 5. Resource Utilization of Proposed Solution for LGSynth-91 FSM Benchmark Circuits Chosen Based on Circuit I/O and Path Length Complexity

5 Results and Discussion

This section summarizes and analyzes the experimental results for the following circuits: BLDC, RISC-V, S1494, S510, S832, S420, and S820. Of these, a comparative study is available only for the BLDC control circuit [47]. The authors of [47] proposed a hybrid EHW approach in which the evolutionary algorithm is hosted in the processor of the FPGA. That work reports this approach as time-consuming and identifies it as a subject for future improvement. Therefore, our proposed approach uses a fully intrinsic strategy in which the genetic algorithm is implemented as a digital circuit alongside the target circuit. The following metrics were considered in comparing the works: (1) resource utilization, (2) fault detection efficiency, and (3) fault recovery rate. In the upcoming discussion, the Proposed Fault Tolerance (Proposed FT) technique is compared with Standard Fault Tolerance (Standard FT) [47] and Triple Modular Redundancy (TMR) on these metrics. The Proposed FT uses the solutions discussed in Sections 3.1 and 3.2 to improve on the Standard FT in terms of hardware utilization and fault recovery time and is combined with the EHW (GA modules discussed in Section 3.5) to give the control circuits a self-healing capability.

Regarding resource utilization, the resources used by the proposed BLDC implementation are summarized in Table 4 of the previous section. The utilization rates of the existing system [47] are reported as 1%, 2%, 7%, 1%, 8%, 15%, and 4% for register, LUT, block memory, DFF, clock manager, global clock buffer, and I/O, respectively. The processor-related metrics, such as clock manager and clock buffer, are not considered since the intrinsic EHW approach is followed. For LUT, BRAM, and I/O, the resource utilization is reduced by 1.99%, 2.6%, and 2.39%, respectively. The register utilization in our approach is 0.7%, which is higher when compared to the existing system. The existing system uses \(18\times 3=54\) MUXes, whereas our proposed system uses \(6\times 2=12\) MUXes. The number of generations is much lower in our proposed system because constrained evolution is adopted. Only 6 out of 36 bits are evolved in the case of an SEU; in contrast, 153 bits are evolved in [47] for fault recovery. Hence, the search space is reduced from \(2^{153}\) to \(2^{6}\) . This reduction in search space has lowered the number of generations from 1,525 in the existing system to 2 in the proposed system.

Compared to TMR, which incurs an area overhead of approximately 200%, both Standard FT and Proposed FT exhibit a lower area overhead of around 56%. However, TMR consumes less power because its mitigation is concurrent, whereas the GA-based mitigation procedures recover from faults iteratively and sequentially. The power utilization of the Proposed FT solution is only 33% higher than that of TMR, an improvement over Standard FT [47], which recorded 63% higher power usage than TMR. Power usage is lower in the proposed solution than in the standard solution because fewer generations are required for SEU correction when specific faulty bits are targeted instead of all bits. Regarding latency, an 87% speedup is achieved in the Proposed FT compared to Standard FT. In summary, compared with TMR, the Proposed FT solution reduces area utilization but increases power and latency by 33.72% and 22.97%, respectively. However, it is also worth noting three advantages of the Proposed FT solution compared to TMR:

(1) TMR can correct only a single-bit upset. In contrast, the Proposed FT solution increases the upset-correcting capability to a minimum of 6 adjacent bits (BLDC) and a maximum of 13 adjacent bits (S1494).

(2) TMR is an active fault tolerance mitigation technique that requires the two duplicated circuits to operate throughout the circuit operation, irrespective of fault occurrence. In contrast, the GA modules are enabled only when a fault has been identified; when no fault has occurred, the GA modules are not operative.

(3) In the case of TMR, periodic scrubbing is needed to restore the original configuration bits. This restoration is automated in the proposed approach, which makes it well suited to any deployment environment (FPGA generic).

The area, power, and latency analysis of the Proposed FT compared to Standard FT and TMR is shown in Tables 6 and 7. The values in the table show the percentage of area utilization with respect to the control logic block. The resource utilization report in the Vivado tool has been utilized to record the values. The latency in each circuit was calculated by recording the maximum delay of 3.48 ns and the minimum delay of 2.78 ns in timing constraints.

Table 6.

Circuit	TMR	Standard FT [BLDC [47]]	Proposed FT	% of Area Reduction in Proposed FT w.r.t TMR
Circuit	Area (CLB Slices)
BLDC	3%	2.60%	0.11%	2.9%
RISC V	11.23%	7.87%	6.67%	4.6%
S1494	15.45%	7.43%	4.67%	10.8%
S510	10.72%	10.72%	5.07%	5.7%
S832	23.56%	16.89%	13.29%	10.3%
S420	25.60%	15.78%	11.31%	14.3%
S820	39.89%	24.72%	19.89%	20.0%

Table 6. Area Utilization Profile for Control Circuits in Comparison to TMR, Standard FT [47], and Proposed FT
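The last column of Table 6 appears to be the difference, in percentage points, between the TMR and Proposed FT columns rather than a relative reduction. That reading is an assumption; the sketch below checks it against every row, using integer hundredths of a percent to avoid floating-point rounding. The dictionary `TABLE_6` and the function name are illustrative.

```python
# circuit: (TMR area, Proposed FT area, reported reduction), all in
# hundredths of a percent (e.g. 11.23% -> 1123, 4.6% -> 460).
TABLE_6 = {
    "BLDC":   (300, 11, 290),
    "RISC V": (1123, 667, 460),
    "S1494":  (1545, 467, 1080),
    "S510":   (1072, 507, 570),
    "S832":   (2356, 1329, 1030),
    "S420":   (2560, 1131, 1430),
    "S820":   (3989, 1989, 2000),
}


def reduction_points(tmr, proposed):
    """Area reduction as a percentage-point difference (hundredths)."""
    return tmr - proposed


def rows_consistent(table, tolerance=5):
    """True if each reported value matches the difference to one decimal place."""
    return all(
        abs(reduction_points(tmr, prop) - reported) <= tolerance
        for tmr, prop, reported in table.values()
    )


if __name__ == "__main__":
    for name, (tmr, prop, _) in TABLE_6.items():
        print(name, reduction_points(tmr, prop) / 100)
    print("consistent:", rows_consistent(TABLE_6))
```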

Table 7.

Circuit	Power (On Chip, µW)				Latency (Delay, ps)
Circuit	TMR	Standard FT [47]	Proposed FT	% of Increase in Proposed FT w.r.t TMR	TMR	Standard FT [47]	Proposed FT	% of Increase in Proposed FT w.r.t TMR
BLDC	210	342.3	264.6	26.56	112	182.56	141.12	26
RISC V	780	1271.4	982.8	31.09	437	437	550.62	14.21
S1494	670	1092.1	844.2	43.26	726	726	914.76	17.97
S510	565	920.95	711.9	51.67	679	679	855.54	25.78
S832	678	1,105.14	854.28	28.9	560	560	705.6	23.45
S420	545	888.35	686.7	32.89	389	389	490.14	29.67
S820	437	712.31	550.62	21.67	980	980	1,234.8	23.72
Average % of Increase in Power (Proposed FT vs. TMR)				33.72	Average % of Increase in Latency (Proposed FT vs. TMR)			22.97

Table 7. Power Utilization and Latency Profile for Control Circuits in Comparison to TMR, Standard FT [47], and Proposed FT

The time for evolving the system can be calculated from Equation (13), where t_g is the number of generations, t_p is the population size, t_v is the number of test vectors, t_c is the overhead, and f_m is the maximum frequency. The speedup achieved in evolving the BLDC circuit is obtained by comparing the time taken by the existing system (21.2) with the time taken by the proposed system (0.92), which gives 9.2 times. The values of t_p, t_v, t_c, and f_m are assumed to be the same for both methodologies: 4, 54 ((3-bit input + 6-bit output) \(\times\) 6 combinations), 8, and 100 MHz, respectively. Hence, the acceleration achieved by the Proposed FT solution over the Standard FT [47] is approximately 92%.

\begin{equation} \frac{t_{g} \ast t_{p}(t_{v}+t_{c})}{f_{m}} \tag{13} \end{equation}
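As a rough illustration of Equation (13), the sketch below evaluates the evolution time with the parameter values quoted in the text (t_p = 4, t_v = 54, t_c = 8, f_m = 100 MHz). The function names are hypothetical, and the generation counts it derives are back-calculated from the quoted times under the assumption that those times are in milliseconds; they are not figures reported by the authors.

```python
def evolution_time(t_g, t_p, t_v, t_c, f_m):
    """Evolution time in seconds, following Equation (13)."""
    return t_g * t_p * (t_v + t_c) / f_m


def generations_for(time_s, t_p, t_v, t_c, f_m):
    """Invert Equation (13): generations implied by a given evolution time."""
    return time_s * f_m / (t_p * (t_v + t_c))


# Parameter values quoted in the text, common to both methodologies.
T_P, T_V, T_C, F_M = 4, 54, 8, 100e6

# Time spent on a single generation (t_g = 1).
per_generation = evolution_time(1, T_P, T_V, T_C, F_M)

# Assumption: the quoted 21.2 and 0.92 are milliseconds.
gens_standard = generations_for(21.2e-3, T_P, T_V, T_C, F_M)
gens_proposed = generations_for(0.92e-3, T_P, T_V, T_C, F_M)

print(f"time per generation: {per_generation * 1e6:.2f} us")
print(f"implied generations, Standard FT: {gens_standard:.0f}")
print(f"implied generations, Proposed FT: {gens_proposed:.0f}")
```

Under these assumptions, each generation costs a few microseconds, so the difference in evolution time is driven almost entirely by the number of generations t_g.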

The fault mitigation efficiency is shown in Figure 10. The number of faulty PEs indicates how many PEs have error-injected bits. For instance, if the number of PEs injected with error is one, then the bits of a single PE are injected with SEU. As the number of faulty PEs increases, the number of generations required by the existing system [47] grows drastically, to approximately 25,000. In our proposed system, faults in all six PEs can be mitigated within a few hundred generations, which showcases its efficiency in correcting multiple-bit upsets. This achievement is attributed to the frame encoder, which identifies faulty positions in the configuration register bits based on the output pattern. By considering the output, the corresponding faulty bits in all six PEs’ configuration registers are selected even when all six output bits are in error. In contrast, [47] has no mechanism for locating faulty configuration bits from erroneous output bits; consequently, all of its configuration bits undergo the evolutionary algorithm.
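The localization idea described above can be sketched as follows. This is a minimal illustration, not the authors' frame encoder: it assumes that each of the six BLDC output bits is driven by exactly one PE of the 6*1 array and that each PE owns a contiguous 6-bit frame in the configuration register. The function names and the example bit patterns are hypothetical.

```python
BITS_PER_PE = 6  # chromosome bits per PE for BLDC (from the text)
NUM_PES = 6      # 6*1 VRC array for BLDC


def faulty_pes(expected, observed, num_outputs=NUM_PES):
    """Indices of output bits that differ (assumed to map 1:1 onto PEs)."""
    diff = expected ^ observed
    return [i for i in range(num_outputs) if (diff >> i) & 1]


def search_mask(pe_indices, bits_per_pe=BITS_PER_PE):
    """Mask over the configuration register covering only the faulty PEs' frames."""
    frame = (1 << bits_per_pe) - 1
    mask = 0
    for pe in pe_indices:
        mask |= frame << (pe * bits_per_pe)
    return mask


# Hypothetical reference and observed output words differing in three bits.
expected = 0b010001
observed = 0b010110

pes = faulty_pes(expected, observed)
mask = search_mask(pes)
search_space_bits = bin(mask).count("1")

print("faulty PEs:", pes)
print("bits handed to constrained evolution:", search_space_bits)
```

With three erroneous output bits, the mask covers 3 × 6 = 18 configuration bits, so only those bits would be evolved rather than the full genotype.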

Fig. 10.

To demonstrate the scalability of the proposed system, circuits such as RISC-V, S1494, S510, S832, S420, and S820 are selected alongside BLDC because of their intricate I/O and extended path length. However, a comparative study is not possible for these circuits because no related work is available. The resource utilization of the above circuits is summarized in Table 5. On average, the utilization of BlockRAM is the highest among all the resources. Each transition of the control circuit is stored in the FlashROM as a reference for fitness evaluation; hence, memory occupancy is higher for the complex circuits than for the BLDC circuit. The convergence, or fault recovery, for the above circuits is shown in Figure 11. Each graph represents the convergence of an individual circuit, where the x-axis represents the generation and the y-axis indicates the maximum fitness value attained. The maximum fitness value is the highest fitness achieved by any chromosome (genotype) in a given generation. It indicates whether the genetic algorithm is progressing smoothly toward a globally optimal solution: if the value consistently fails to increase from one generation to the next, the algorithm is trapped in a local optimum. In each circuit, convergence with Constrained Evolution (CE) is faster than with Standard Genetic Evolution (SGE). On average, convergence is reached 56% sooner with constrained evolution, since the circuit is evolved over a restricted portion of the chromosome instead of its entire length; the size of the search space largely determines the convergence speed. The flat segment of each curve marks the convergence point, at which the SEU in the configuration register has been mitigated.
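To make the notions of reference-based fitness and convergence point concrete, the sketch below scores a candidate circuit against the BLDC position-to-driving-signal table given in this article. Counting matching output bits is an assumed fitness measure chosen for illustration; the names `fitness`, `convergence_generation`, and `flipped_c0` are hypothetical.

```python
# Reference transitions for the BLDC controller: position signal (S0 S1 S2)
# mapped to driving signal (C0..C5), taken from the article's table.
REFERENCE = {
    "100": "010001",
    "110": "010100",
    "010": "001100",
    "011": "001010",
    "001": "100010",
    "101": "100001",
}

# Highest attainable score: every output bit correct for every test vector.
MAX_FITNESS = sum(len(outputs) for outputs in REFERENCE.values())


def fitness(candidate):
    """Count output bits that match the reference over all test vectors."""
    score = 0
    for position, expected in REFERENCE.items():
        produced = candidate(position)
        score += sum(p == e for p, e in zip(produced, expected))
    return score


def convergence_generation(max_fitness_per_generation):
    """First generation whose best chromosome reaches MAX_FITNESS, else None."""
    for generation, best in enumerate(max_fitness_per_generation):
        if best == MAX_FITNESS:
            return generation
    return None


def healthy(position):
    """Fault-free circuit: reproduces the reference exactly."""
    return REFERENCE[position]


def flipped_c0(position):
    """Hypothetical faulty circuit whose C0 output is always inverted."""
    out = REFERENCE[position]
    return ("1" if out[0] == "0" else "0") + out[1:]


print(fitness(healthy), fitness(flipped_c0), MAX_FITNESS)
print(convergence_generation([30, 33, 33, 36, 36]))
```

A fitness trace that plateaus below `MAX_FITNESS` corresponds to the local-optimum case described above, while the first generation reaching `MAX_FITNESS` is the convergence point.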

Fig. 11.

Table 8 summarizes the fault injection profile and the corresponding convergence points for SEU and Multiple Bit Upset (MBU). The SEU and MBU are simulated on the configuration register of the virtual reconfigurable circuit. For SEU, a random bit in the configuration bits is selected and XORed with a high signal, which flips a functionality or routing bit. Similarly, for MBU, multiple positions within the range of a single PE are selected and XORed with a high signal. The PEs injected with error are selected randomly so that nearly 50% of the total PEs receive errors. The search space shown in Table 8 denotes the number of chromosome bits selected for constrained evolution by the frame encoder. For instance, the number of PEs injected with error for the BLDC circuit is three. Hence, the search space constitutes 3 \(\times\) 6 = 18 bits, where 6 represents the chromosome bit length of a single PE. The table lists the minimum and maximum number of generations taken to mitigate the SEU and MBU faults over 10 trial runs. The generation at which a particular frame of a single PE is mitigated depends on the constrained evolution; hence, the convergence point differs from case to case. Unlike faults in the functionality bits, faults injected in the routing bits of a single PE can propagate to other PEs in the corresponding column. Consequently, routing faults take more generations to converge than functionality faults. Table 8 thus shows that our proposed system can mitigate both SEU and MBU with a minimal increase in hardware utilization. Compared with standard genetic evolution, fewer generations are needed for mitigation under constrained evolution because the search space is restricted according to the erroneous output bit positions.
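The injection procedure described above (XOR of selected configuration bits with a high signal) can be sketched in software as follows. This is only an illustrative model of the configuration register as an integer; the 36-bit genotype value, the seed, and the function names are hypothetical and not taken from the authors' setup.

```python
import random


def inject_seu(config, total_bits, rng):
    """Flip one randomly chosen configuration bit (XOR with 1)."""
    position = rng.randrange(total_bits)
    return config ^ (1 << position), position


def inject_mbu(config, pe_index, bits_per_pe, n_flips, rng):
    """Flip n_flips distinct bits inside the frame of a single PE."""
    base = pe_index * bits_per_pe
    positions = sorted(rng.sample(range(base, base + bits_per_pe), n_flips))
    for position in positions:
        config ^= 1 << position
    return config, positions


rng = random.Random(2024)  # fixed seed so a run can be repeated

# Illustrative 36-bit genotype: 6 PEs with 6 configuration bits each.
golden = 0b101101_010011_111000_000111_110010_001100

seu_config, seu_pos = inject_seu(golden, 36, rng)
mbu_config, mbu_pos = inject_mbu(golden, pe_index=2, bits_per_pe=6,
                                 n_flips=3, rng=rng)

print("SEU flipped bit:", seu_pos)
print("MBU flipped bits in PE 2:", mbu_pos)
```

XORing the faulty and golden words recovers exactly the flipped positions, which is a convenient way to check an injection campaign.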

Table 8.

Circuit	Fault Location	# of PEs Injected	Search Space	SEU			MBU
Circuit	Fault Location	# of PEs Injected	Search Space	Min Generation	Max Generation	Avg Generation	Min Generation	Max Generation	Avg Generation
BLDC	Routing	3	18	13	21	23	28	42	35
RISC-V		34	374	43	61	74	51	76	64
S1494		11	187	21	31	37	31	58	45
S510		19	228	49	73	61	62	85	74
S832		23	322	53	101	77	65	127	96
S420		31	341	74	131	103	95	171	133
S820		65	1,040	231	483	357	232	502	367
BLDC	Functionality	3	18	9	13	11	21	34	28
RISC-V		34	374	31	54	58	42	63	53
S1494		11	187	15	23	26	29	47	38
S510		19	228	32	65	65	49	73	61
S832		23	322	46	97	95	52	102	77
S420		31	341	52	112	82	71	131	101
S820		65	1,040	189	456	417	201	489	345

Table 8. Fault Injection Profile and Convergence Points for the Test Circuits

While the discussion above summarizes the results of the Proposed FT, it remains essential to analyze the methodology against the size parameters that influence the phenotype and genotype of the control circuit. This analysis is given in Table 9. The size-related factors of the two-dimensional VRC architecture are outlined in Table 10.

Table 9.

Circuit	Phenotype-influencing Parameters		Genotype-influencing Parameters
Circuit	# Inputs	# Outputs	# States	Max # Terms in Subexpression	Max # Subexpressions	# Gate-level Operations
BLDC	3	6	4	2	1	2
RISC-V	10	4	10	11	5	4
S1494	8	19	48	7	4	4
S510	9	7	47	5	3	4
S832	18	19	25	7	4	4
S420	19	2	18	8	4	4
S820	18	19	25	9	5	4

Table 9. Phenotype and Genotype Size Parameters for Control Circuit under Consideration

Table 10.

Circuit	Standard VRC [47]					Proposed VRC					Reduction % in Proposed VRC (# Multiplexers)
Circuit	VRC Array Size	# PEs	# Multiplexers	Configuration Size per PE	Genotype Length	VRC Array Size	# PEs	# Multiplexer	Configuration Size per PE	Genotype Length	Reduction % in Proposed VRC (# Multiplexers)
BLDC	6*3	18	54	8	153	6*1	6	12	6	36	77.8
RISC-V	7*78	546	1,638	35	19,110	7*35	245	735	15	3,850	55.1
S1494	5*52	260	780	26	6,760	5*36	180	540	12	2,160	30.8
S510	6*11	66	198	21	1,386	6*6	36	108	14	504	45.5
S832	8*56	448	1,344	32	14,336	8*26	208	624	18	3,744	53.6
S420	7*4	28	84	12	336	7*2	14	42	7	98	50.0
S820	10*21	210	630	34	6,300	10*13	130	390	23	2,990	38.1

Table 10. VRC Array Parameters in Terms of PE and Configuration Bits for Control Circuit under Consideration
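The multiplexer reduction percentages in Table 10 follow directly from the standard and proposed multiplexer counts. The short sketch below recomputes them; the dictionary and function names are illustrative, and the counts are copied from the table.

```python
# (standard VRC, proposed VRC) multiplexer counts from Table 10.
MULTIPLEXERS = {
    "BLDC": (54, 12),
    "RISC-V": (1638, 735),
    "S1494": (780, 540),
    "S510": (198, 108),
    "S832": (1344, 624),
    "S420": (84, 42),
    "S820": (630, 390),
}


def reduction_percent(standard, proposed):
    """Relative reduction of the proposed count with respect to the standard one."""
    return 100.0 * (standard - proposed) / standard


reductions = {
    circuit: round(reduction_percent(standard, proposed), 1)
    for circuit, (standard, proposed) in MULTIPLEXERS.items()
}

for circuit, value in reductions.items():
    print(f"{circuit}: {value}%")
```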

Circuit complexity is captured through parameters such as the number of inputs, outputs, and states alongside the maximum number of terms within a sub-expression. These factors collectively define the intricacies of the circuit’s functional behavior. The phenotype space of the corresponding control circuit is determined by both the input and output parameters, as elaborated upon in Section 3.1. Parameters associated with the functional expression, such as the maximum number of terms within the sub-expression and the count of sub-expressions, play a crucial role in determining the size of the VRC array concerning path length and functionality. A thorough analysis of gate-level operations within each sub-expression concerning circuit behavior becomes essential to moderate the redundancy between genotype and phenotype spaces.

Table 10 compares the sizes of the parameters that shape the VRC architecture. Examining phenotype and genotype parameters is pivotal in designing a compact VRC system. Notably, reducing the number of gate-level operations significantly reduces the column space required for each control circuit within the VRC architecture. Likewise, the maximum count of subexpressions strongly affects the number of PEs. Together, these effects reduce both the genotype length and the overall number of PEs. The reductions are substantial across the circuits: the reduction in genotype length spans from 30% in the case of S820 to 76.47% in BLDC, and the reduction in PE configuration size varies between 15% for S820 and 66.67% for BLDC. The reduction in the number of multiplexers likewise varies from 30% to 77%. This showcases the efficiency and adaptability of the proposed approach across diverse circuits.

The average SEU speedup is obtained from Table 11 by summing the individual SEU speedup values of the control circuits and dividing by the total number of control circuits. For the BLDC circuit, the execution time of the Standard FT (21.2 msec) is compared with that of the Proposed FT (0.92 msec), giving a decrease in fault recovery time of approximately 95.7%. This process is repeated for all the other control circuits, and the accumulated values are divided by the number of control circuits to obtain the average SEU speedup, which is approximately 93.69%. This average represents the improved efficiency of the Proposed FT approach over the Standard FT approach in handling SEU across all control circuits. Similar calculations for the MBU cases yield an average MBU speedup of approximately 43.21%. Together, these averages demonstrate the enhanced resilience of the Proposed FT approach in mitigating upsets in various control circuits.
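The per-circuit calculation described above can be sketched as follows, using the SEU recovery times listed in Table 11. The names are illustrative. The mean computed here is taken over the rounded table entries, so it need not coincide exactly with the average quoted in the text.

```python
# (Standard FT, Proposed FT) SEU fault recovery times from Table 11.
SEU_TIMES = {
    "BLDC": (21.2, 0.92),
    "RISC-V": (41.6, 2.96),
    "S1494": (26.8, 1.48),
    "S510": (36.4, 2.44),
    "S832": (42.8, 3.08),
    "S420": (23.5, 4.12),
    "S820": (302.8, 14.28),
}


def percent_decrease(standard, proposed):
    """Decrease in recovery time of the Proposed FT relative to the Standard FT."""
    return 100.0 * (standard - proposed) / standard


def speedup_ratio(standard, proposed):
    """How many times faster the Proposed FT recovers."""
    return standard / proposed


decreases = {
    circuit: round(percent_decrease(standard, proposed), 1)
    for circuit, (standard, proposed) in SEU_TIMES.items()
}
mean_decrease = sum(decreases.values()) / len(decreases)

print(decreases)
print(f"BLDC speedup ratio: {speedup_ratio(*SEU_TIMES['BLDC']):.1f}x")
print(f"mean of tabulated decreases: {mean_decrease:.2f}%")
```

Reporting both the percentage decrease and the ratio avoids ambiguity: a 95.7% decrease in recovery time for BLDC corresponds to a roughly 23-fold speedup.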

Table 11.

Circuit	SEU		% of Decrease in Fault Recovery Time for Proposed FT	MBU		% of Decrease in Fault Recovery Time for Proposed FT
Circuit	Standard FT [47]	Proposed FT	% of Decrease in Fault Recovery Time for Proposed FT	Standard FT [47]	Proposed FT	% of Decrease in Fault Recovery Time for Proposed FT
BLDC	21.2	0.92	95.7	18.1	2.34	5.2
RISC-V	41.6	2.96	92.9	67.5	0.86	66.2
S1494	26.8	1.48	94.5	65.14	14.53	42.8
S510	36.4	2.44	93.3	220	123.45	43.9
S832	42.8	3.08	92.8	370	85.89	76.8
S420	23.5	4.12	82.5	33	29.65	10.2
S820	302.8	14.28	95.3	1,250.4	189.9	84.8

Table 11. Performance of Proposed EHW System (Proposed FT) in Terms of Fault Recovery Time for the Control Circuits under Consideration with Varying Size

To assess the efficacy of fault mitigation, we inject random faults into the control circuits and observe how the Standard FT and Proposed FT approaches handle them. Random configuration bits responsible for routing and functionality are deliberately flipped. For instance, in the BLDC control circuit, we iteratively inject 100 random faults. The Standard FT mitigates 83 faults, while the Proposed FT mitigates 94. Notably, the Proposed FT handles faults that the Standard VRC fails to address. In particular, the Proposed FT architecture successfully mitigates faults located at multiple random positions separated by greater distances, which the standard VRC does not detect.

This process is replicated for each control circuit (RISC-V, S1494, S510, S832, S420, S820), yielding distinct fault mitigation efficiencies. Averaging these efficiencies gives the mean effectiveness of each approach. For example, with the efficiencies for all circuits calculated as Standard VRC \(\approx\) 81%, 84%, 67%, 78%, 94%, 59%, 78% and Proposed FT \(\approx\) 98%, 89%, 81%, 88%, 92%, 73%, 85%, the average fault mitigation efficiency for the Standard FT is computed at \(\approx\) 78.14%, whereas for the Proposed FT it improves significantly to \(\approx\) 91.71%. This analysis demonstrates that the proposed approach attains superior fault mitigation efficiency compared to the Standard FT approach, a distinction attributed to its enhanced mechanisms for detecting and addressing faults. However, a subset of faults, accounting for 8.29%, is linked to timing-related issues arising from race conditions in the clocking circuits. Identifying these errors is intricate, and they may not consistently trigger the fault detection mechanisms.

The preceding analysis captures the impact of the different size parameters on performance outcomes. The circuit’s dimensions, encompassing phenotype, genotype, and VRC structure, emerge as pivotal factors dictating the operational efficiency of the control circuit. Thus, crafting a control circuit through VRC implementation requires a thorough grasp of these size parameters, which is essential for devising a compact VRC architecture and expediting convergence. By leveraging these investigations in our proposed FT, improvements in fault mitigation performance, specifically reduced recovery time and increased recovery rate, become attainable with less area utilization.

6 Conclusion

Adaptability and reliability are critical for electronic components used in radiation-prone environments, particularly in mission-critical applications. However, the conventional approach of employing system-level redundancy can be resource intensive and costly. Therefore, this study focuses on developing alternative self-healing control circuits to address these challenges. While the VRC methodology is commonly used for autonomous circuit design, its effectiveness in developing adaptable systems is limited by its extensive architecture and lack of partial reconfiguration capabilities. To address these limitations, our proposed work reduces the circuit architecture and localizes errors to expedite the fault mitigation process. To demonstrate the feasibility of this approach, we conducted a comparative study against the available hybrid-evolutionary methodology for the BLDC circuit. The number of multiplexers was reduced from 54 to 12, a reduction in hardware utilization of up to 77%. Furthermore, the proposed methodology significantly accelerated convergence, reducing the evolution time by 95.7% owing to a reduction in the search space from \(2^{153}\) to \(2^{6}\). To validate the functionality of the proposed system, we implemented it on an A3PE3000 FPGA and injected errors into the routing and functionality bits of the circuit. Furthermore, our study examined the fault mitigation metrics of complex benchmark circuits that had not been previously examined and explored how their performance varies with the size parameters of the control circuit. The results indicate that the proposed methodology can achieve scalability and acceleration for circuits regardless of their complexity. This research advances adaptable and reliable electronic components for use in radiation-prone environments. The reduced architecture and localized error mitigation approach presented in this work demonstrates its potential for improving the design of mission-critical circuits. Future work involves investigating advanced optimization algorithms or techniques to further streamline the search space and improve the convergence speed, potentially achieving a more significant speedup in evolution time.

References

[1] Amr T. Abdel-Hamid, Mohamed Zaki, and Sofiene Tahar. 2004. A tool converting finite state machine to VHDL. In Canadian Conference on Electrical and Computer Engineering 2004 (IEEE Cat. No. 04CH37513), Vol. 4. IEEE, 1907–1910.

Position Signal			Driving Signal
S0	S1	S2	C0	C1	C2	C3	C4	C5
1	0	0	0	1	0	0	0	1
1	1	0	0	1	0	1	0	0
0	1	0	0	0	1	1	0	0
0	1	1	0	0	1	0	1	0
0	0	1	1	0	0	0	1	0
1	0	1	1	0	0	0	0	1
