
Open access

ProtFe: Low-Cost Secure Power Side-Channel Protection for General and Custom FeFET-Based Memories

Authors: Taixin Li, Boran Sun, Hongtao Zhong, Yixin Xu, Vijaykrishnan Narayanan, Liang Shi, Tianyi Wang, Yao Yu, Thomas Kämpfe, Kai Ni, Huazhong Yang, Xueqing Li

ACM Transactions on Design Automation of Electronic Systems, Volume 29, Issue 1

Article No.: 3, Pages 1 - 18

https://doi.org/10.1145/3604589

Published: 15 November 2023


Abstract

Ferroelectric field-effect transistors (FeFETs) have attracted increasing interest for both memory and computing applications, thanks to their CMOS compatibility, low-power operation, and high scalability. However, FeFET-based memories also face new security threats. A major threat is the power analysis side-channel attack (P-SCA), which exploits the power traces of memory accesses to extract information about the data. Several effective countermeasures exist for resistive nonvolatile memories (NVMs), but they fail to meet the requirements of secure FeFET-based memories because FeFETs present a capacitive rather than resistive load. Directly applying these existing countermeasures to protect FeFETs against P-SCA poses significant challenges, especially in balancing power side-channel resistance against the corresponding overheads.

To address this issue, we leverage the unique features of FeFETs and propose ProtFe, a set of protection methods for FeFET-based memories comprising the pipelined multi-step write strategy (PiMWrite) and the split array design (SpA). PiMWrite targets general FeFET-based memories: it inserts specially designed intermediate states to mitigate information leakage and pipelines the write steps to reduce overheads. SpA targets custom FeFET-based memories: it writes two split portions of the array simultaneously using shared, minimized peripherals, pushing beyond the usual trade-off between security and overheads. Simulation results show that PiMWrite expands the search space of a single power trace by 21× with nearly zero hardware penalties. SpA achieves a 33× search space improvement with negligible latency, 0.6% area, and only 7.1% energy overhead. ProtFe achieves a better balance between security and overheads than state-of-the-art works.

1 Introduction

Nonvolatile memories (NVMs) are promising for higher density and lower-power operation, and are being widely explored for both memory and computing applications [6, 41]. Among various NVMs, ferroelectric field-effect transistors (FeFETs) [3, 11, 19, 23, 24, 28, 29, 37, 39, 40] have received tremendous attention owing to their complementary metal-oxide-semiconductor (CMOS) compatibility, large ON/OFF ratio, low-power operation, and high scalability. With a doped-HfO\(_2\) high-k ferroelectric (FE) layer, FeFETs are readily compatible with CMOS technology. In addition, the three-terminal structure gives FeFETs flexibility in circuit design and enables non-destructive read operations. Moreover, many recent works have shown that FeFET-based memories consume much less power than resistive NVMs such as magnetic RAM (MRAM) [13, 18], spin-transfer torque RAM (STT-RAM) [38], phase-change memory (PCM) [1, 33], and resistive RAM (RRAM) [5, 12]. This improves energy efficiency when FeFET memories are used in various applications, including emerging ones such as compute-in-memory (CiM), neuromorphic computing, and hyperdimensional computing (HDC) [8, 9, 15, 20, 25, 27, 32].

However, the physical features of FeFETs introduce new threats to data privacy. As illustrated in Figure 1(a), FeFETs exhibit asymmetric write energy and read current: writing the datum ‘1’ consumes a different amount of energy than writing the datum ‘0’, and the same asymmetry appears in the read current. This gives attackers side-channel information about sensitive data, which can be exploited to launch a power analysis side-channel attack (P-SCA).

Fig. 1.

P-SCA has become a severe threat to memories, both volatile and nonvolatile, in cryptographic chips with asymmetric power consumption, and has therefore gained great importance over the last decade [10, 14, 16]. The real-time power traces, which serve as the side-channel information, can be probed at the power supply (VDD) pin. With knowledge of the device features and circuit structures, the attacker can build a power consumption model and match it against the extracted power trace to reveal the Hamming Weight (HW) of the processed data. The sensitive data can then be revealed by testing multiple inputs and performing correlation analysis, with fewer than 300 traces [17].

To address this issue, prior works have proposed several countermeasures for resistive NVMs [14, 21], as shown in Figure 1(b). However, these countermeasures face challenges when applied to FeFET-based memories, including the need for fully custom circuit design and a poor balance between P-SCA resistance and the corresponding overheads. For example, constant-current write [14] can successfully hide the power asymmetry of resistive NVMs through current mirrors, but cannot be applied to FeFETs because of their capacitive load. Parity encoding [14] is a simple, low-cost method that has been widely used in error detection [34]. However, a single redundancy bit limits the security improvement, and sensitive data can still be revealed with a few more attempts. The capacitor power bank [21] provides high P-SCA resistance and is applicable to all emerging NVMs, at the cost of extra power and latency.

To overcome these inherent challenges of existing countermeasures for secure FeFET-based memories, we exploit the unique features of FeFETs and propose ProtFe, a set of protection methods for FeFET-based memories comprising the pipelined multi-step write strategy (PiMWrite) and the split array design (SpA). As shown in Figure 1(c), ProtFe fully leverages the FeFET characteristics and introduces the time and space dimensions into protection techniques, significantly improving the security of both general and custom FeFET-based memories with low overheads. The specific contributions are as follows:

•

We propose PiMWrite to mitigate P-SCA in the time dimension. PiMWrite supports general FeFET-based memories with ultra-low overheads and considerable security. The write operation is divided into multiple steps that consume equal energy, which obfuscates what the attacker observes. Pipelining is further adopted to reduce latency and energy, taking advantage of the capacitive load of FeFETs.

•

We propose SpA, a fully custom countermeasure that exploits the space dimension to achieve high P-SCA security and low overheads. Complementary data are written into two split wings to provide consistent power consumption regardless of data patterns.

•

We benchmark the proposed PiMWrite and SpA techniques, which achieve a 21 \(\times\) search space expansion for a single power trace with nearly zero penalties, and a 33 \(\times\) search space expansion with negligible latency, 0.6% area, and 7.1% energy overheads, respectively.

The rest of this article is organized as follows: Section 2 reviews FeFET background, P-SCA vulnerability of NVMs, and existing countermeasures. Section 3 introduces the proposed ProtFe and optimizations in detail. Section 4 evaluates ProtFe against prior works. Section 5 presents more discussions on PiMWrite. Section 6 concludes the work.

2 Background

This section reviews the FeFET device and array basics, and illustrates the P-SCA vulnerability of FeFET memories. Existing methods are also discussed afterward.

2.1 FeFET Device and Array Basics

Figures 2(a) and 2(b) show the symbol and structure of a FeFET, respectively. A FeFET is a MOSFET (either a planar FET or a FinFET) with an FE layer integrated into the gate stack [3, 11, 19, 23, 24, 28, 29, 37, 39, 40]. Although FeFETs were proposed long ago [40], the recent discovery of doped-HfO\(_2\) high-k ferroelectric materials has opened new scaling opportunities, enabling FeFET devices with smaller dimensions, lower operating voltage, higher endurance, etc. Recent reports have demonstrated FeFETs fabricated on 20-nm-thick SOI technology [3] and FeFETs with an ON/OFF ratio beyond \(10^{6}\) [24], which are promising for large memory arrays. Another recent work has even reported FeFETs with 10 ns read-after-write programming time at 1.8 V operating voltage and an endurance of \(10^{12}\) cycles [29]. Moreover, the FeFET physical mechanism has been widely explored by many modeling works (e.g., the multi-domain Preisach model [22], the Landau-Khalatnikov model [2], and the Monte Carlo model [7]), which provide more accurate models for circuit designers to simulate the behavior of FeFET-based circuits.

Fig. 2.

The fundamental difference between a MOSFET and a FeFET originates from the polarization behavior of the FE layer: FeFETs use the polarization direction in the FE layer to store the memory state. For a typical n-type FeFET, applying a positive gate-to-source voltage (\(V_{GS}\), usually 1.8 V to 4 V [39]) across the FE layer accumulates electrons in the channel, and the domains in the FE layer switch to the positive state probabilistically (the switching being nucleation-dominated), which reduces the device threshold voltage (\(V_{TH}\)). Conversely, applying a negative \(V_{GS}\) switches the domains to the negative state probabilistically, which increases \(V_{TH}\). This programming process exhibits hysteresis, and multi-level memory states can be achieved by carefully tuning \(V_{TH}\), as shown in Figure 2(c). Reading a FeFET means sensing the drain-to-source current (\(I_{DS}\)), which is determined by \(V_{TH}\), under a proper read voltage (\(V_R\)). Note that the drain voltage can equal the source voltage during the write operation (e.g., both 0 V), so no DC power is consumed. This capacitive load makes low-power write operations possible.
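To make the hysteresis and multi-level behavior concrete, the following Python sketch implements a toy Preisach-style multi-domain model. It is not the Verilog-A model used in this work: each of the 50 domains (the domain number in Table 1) is an ideal hysteron that flips once \(|V_{GS}|\) exceeds its own coercive voltage, the probabilistic switching is represented only by a random spread of coercive voltages, and the class name, the 1.0 V to 2.4 V coercive range, and the 0.2 V/1.2 V threshold levels are hypothetical.

```python
import random


class MultiDomainFeFET:
    """Toy Preisach-style FeFET: each domain is an ideal hysteron with its own
    coercive voltage; V_TH falls linearly as more domains point 'positive'."""

    def __init__(self, n_domains=50, vth_low=0.2, vth_high=1.2, seed=0):
        rng = random.Random(seed)
        # Hypothetical coercive voltages, spread uniformly between 1.0 V and 2.4 V
        self.vc = [rng.uniform(1.0, 2.4) for _ in range(n_domains)]
        self.pol = [-1] * n_domains  # start fully in the negative (high-V_TH) state
        self.vth_low, self.vth_high = vth_low, vth_high

    def pulse(self, vgs):
        """Apply one gate-to-source write pulse of amplitude vgs (volts)."""
        for i, vc in enumerate(self.vc):
            if vgs >= vc:
                self.pol[i] = +1   # positive pulse switches this domain up
            elif vgs <= -vc:
                self.pol[i] = -1   # negative pulse switches this domain down
            # |vgs| < vc: the domain keeps its polarization (hysteresis)

    def vth(self):
        """Threshold voltage interpolated from the fraction of positive domains."""
        frac_pos = self.pol.count(+1) / len(self.pol)
        return self.vth_high - frac_pos * (self.vth_high - self.vth_low)


if __name__ == "__main__":
    f = MultiDomainFeFET()
    f.pulse(2.5)   # V_W from Table 1: every domain switches -> lowest V_TH
    print("after +2.5 V:", round(f.vth(), 3))
    f.pulse(-2.5)  # full erase -> highest V_TH
    print("after -2.5 V:", round(f.vth(), 3))
    f.pulse(1.7)   # partial switching -> an intermediate, multi-level state
    print("after +1.7 V:", round(f.vth(), 3))
```

A pulse smaller than every coercive voltage leaves the state unchanged; this memory of past pulses is what lets a partial pulse store an intermediate \(V_{TH}\) level in such a model.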

Several FeFET-based memories have been designed to take advantage of the features above. Existing FeFET arrays include the 1T AND array [23], the 1T NOR array [28], and the 2T/3T arrays [11, 19, 39]; Figure 3 shows the corresponding cells. The 1T AND array has the highest density. However, because of its shared wordline (WL) structure, writing a vector must be carried out in two phases and may disturb other memory cells. Figures 3(a) and 3(b) show the different write operations for ‘1’ and ‘0’: a positive or negative write voltage (\(V_W\)) sets the FeFET \(V_{TH}\) to different states, so two cells in the same row can only be written in two phases because they require different WL voltages. The 1T NOR array is also dense but consumes more energy than the 1T AND array because of its large channel current [23]. The 2T/3T designs use switch transistors to isolate the WLs from the FeFETs and eliminate write disturbance; in the gate-select 2T array, a vector can be written in one phase. The 2T/3T arrays therefore provide higher reliability and shorter write delay at the cost of more area.

Fig. 3.

2.2 P-SCA Vulnerability of FeFET Arrays

P-SCA reveals the HW of the processed data by analyzing differences between power traces during read or write operations in the array. The attacker can test multiple inputs and exploit the algorithm-specific relationship between the input data (known) and the stored data (unknown) to steal stored data encrypted by cryptographic methods such as the Advanced Encryption Standard (AES). Prior works have shown the P-SCA vulnerability of several resistive emerging NVMs due to their high and asymmetric write and read currents [16], and several attacks on AES have succeeded [17]. Although P-SCA protection of resistive NVMs has been well researched, the P-SCA vulnerability of FeFET-based memories has not been fully explored, especially given that the unique features of FeFETs differ greatly from those of resistive NVMs. Here we illustrate the FeFET vulnerability with simulation results on asymmetric write energy and read current. The simulation setup is presented in Section 4.1.
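The correlation analysis behind such attacks can be sketched as a minimal correlation power analysis (CPA) in Python. All of the following are assumptions for illustration: the stored value is the known input XORed with a secret key byte, each trace is reduced to a single number equal to the HW of the stored value plus Gaussian noise, and the function names are hypothetical. Practical attacks on AES usually target a nonlinear S-box output instead; with the linear XOR target used here, guesses close to the key in Hamming distance also correlate strongly (and the complementary key anti-correlates), so the guess with the largest signed correlation is taken.

```python
import random


def hw(x):
    """Hamming weight of an integer."""
    return bin(x).count("1")


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate_traces(secret_key, n_traces, noise_sigma, seed=1):
    """Known random inputs and one leakage sample per write: HW(input ^ key) + noise."""
    rng = random.Random(seed)
    inputs = [rng.randrange(256) for _ in range(n_traces)]
    leaks = [hw(p ^ secret_key) + rng.gauss(0.0, noise_sigma) for p in inputs]
    return inputs, leaks


def cpa_recover_key(inputs, leaks):
    """Return the key guess whose predicted HW correlates best (signed) with the leakage."""
    best_guess, best_corr = None, float("-inf")
    for guess in range(256):
        predicted = [hw(p ^ guess) for p in inputs]
        corr = pearson(predicted, leaks)
        if corr > best_corr:
            best_guess, best_corr = guess, corr
    return best_guess


if __name__ == "__main__":
    inputs, leaks = simulate_traces(secret_key=0x3C, n_traces=300, noise_sigma=0.5)
    print(f"recovered key byte: {cpa_recover_key(inputs, leaks):#04x}")
```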

Because no DC power is involved, FeFET arrays do not draw asymmetric current over the whole write window, but may exhibit instantaneous current spikes of different amplitudes. Moreover, the stored state has little impact on the power trace, since the original \(V_{TH}\) barely affects the charging process. These issues distinguish P-SCA on FeFET-based memories from P-SCA on resistive NVMs during the write operation. Nevertheless, despite the capacitive load, FeFETs still consume asymmetric write power because writing different data involves different charging processes. Figure 4(a) shows the power profiles and energy consumption of writing ‘1’ and ‘0’; the peaks of the profiles are distinctly different. This is because the number of lines to be charged, and hence the capacitance, varies with the written data. This makes it simple for an adversary to observe the trace and infer the represented data.
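The data dependence of the write energy can be illustrated with a first-order capacitive model, in which charging a line of capacitance \(C\) from 0 to \(V_W\) draws \(C V_W^2\) from the supply. The sketch below rests on stated assumptions: it reuses the BL/WL capacitances and \(V_W\) from Section 4.1 and Table 1, assumes (hypothetically) that the BL of every cell whose target bit equals `charged_value` is charged while the selected WL is always charged, and ignores the stored state, consistent with its small impact noted above.

```python
C_BL = 51.2e-15  # BL parasitic capacitance (F), Section 4.1
C_WL = 3.2e-15   # WL parasitic capacitance (F), Section 4.1
V_W = 2.5        # write voltage (V), Table 1


def write_energy(bits, charged_value=0):
    """First-order supply energy of one row write, E = sum(C * V_W^2) over charged lines.

    Hypothetical assumption: the BL of every cell whose target bit equals
    `charged_value` is charged to V_W, and the selected WL is always charged.
    The previously stored state is ignored.
    """
    n_charged_bls = sum(1 for b in bits if b == charged_value)
    return (n_charged_bls * C_BL + C_WL) * V_W ** 2


if __name__ == "__main__":
    for weight in range(9):
        word = [1] * weight + [0] * (8 - weight)
        print(f"HW={weight}: {write_energy(word) * 1e15:.1f} fJ")
```

Under these assumptions the energy is an affine function of the HW of the written word, which is exactly the quantity a P-SCA attacker recovers.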

Fig. 4.

The read operation of FeFET arrays is similar to that of other NVMs and also suffers from asymmetric read currents: ON-state transistors conduct a much larger \(I_{DS}\), which can be distinguished from that of OFF-state transistors. Figure 4(b) shows the current of reading a 1-bit ‘1’ and ‘0’. Given the high ON/OFF ratio of FeFETs, this asymmetry causes severe vulnerability.

2.3 Existing P-SCA Countermeasures

Recent research has proposed several countermeasures for write operation protection in NVMs [14, 21]. They are based on masking techniques and can be further categorized into balancing, encoding, and power isolation as follows:

The balancing technique adds extra circuits or dummy cells to balance the power consumption. A typical example is constant-current write [14], which adopts a current mirror to compensate for the write current. This countermeasure provides high security at the cost of much higher power consumption. When applied to FeFET arrays, the current mirror causes even higher relative energy overheads, since the unprotected FeFET memory itself consumes little energy. The method is therefore of limited use for FeFETs.

Encoding methods, such as parity encoding, try to obfuscate the adversary's observation at a lower area and energy cost [14]. However, the limited number of redundancy bits cannot provide significant security improvement, as sensitive data can still be revealed with a few more attempts.

In [21], an on-chip capacitor and low-dropout regulators (LDOs) are adopted to supply power to the arrays. The voltage supply is isolated during the write operation, so attackers can only observe the power traces of the capacitor charging process. The method eliminates P-SCA threats and can be applied to all emerging NVMs, but it is costly in both power (due to the energy wasted in charging and discharging the extra capacitor, plus the LDO input-output voltage difference) and latency (due to the capacitor charging time).

It can be observed that the state-of-the-art P-SCA countermeasures for NVMs do not balance security and overheads well and cannot provide a secure solution for FeFET-based memories. Besides, these works mainly focus on custom memory designs, leaving general memories (especially those already fabricated) unprotected. Therefore, both general and custom methods against P-SCA for FeFET-based memories are worth investigating.

3 Proposed ProtFe

This section introduces the proposed ProtFe in detail, including PiMWrite for general FeFET-based memories and SpA for custom FeFET-based memories. Further optimizations for higher security and lower overheads are also discussed.

3.1 Proposed PiMWrite

As discussed in Section 2.3, making the power consumption constant for different data patterns can provide high P-SCA resistance. However, prior works based on balancing cause significant overheads due to the utilization of extra circuits to narrow the difference. To address this issue, the proposed PiMWrite exploits a new time dimension to balance the power consumption and achieves considerable security improvement with ultra-low overheads.

Figure 5 compares the conventional one-step write with the proposed two-step write that uses the time dimension, to illustrate the P-SCA security improvement of PiMWrite. Conventionally, writing “0001” or “0011” takes a single write operation, which may leak the written data pattern because the two target data produce different power traces. In contrast, PiMWrite adds an intermediate state for each target data pattern and completes each write in two steps. In the example of Figure 5(b), to write “0011”, the intermediate state “0-1-” is written in the first step, and the final state “0- -1” is written in the second step (‘-’ indicates that no write pulse is applied to the corresponding bitline). In each step, the applied write pulses always contain exactly one ‘0’ and one ‘1’. With the intermediate state, PiMWrite guarantees a constant set of write transitions (one ‘0’ and one ‘1’) applied to the bitlines (BLs) in every step, independent of the target data pattern. A more rigorous analysis covering all data patterns is given later in this section; from the attacker's point of view, the power trace is now always the same, even for data with different HWs. In other words, the power trace no longer carries information about the written data.

Fig. 5.

It is also noted that writing a vector in two steps may degrade the energy efficiency and the throughput of the array. Figure 6(a) shows the operations of writing A and B consecutively. It takes 2N steps or cycles to complete N writes, which halves the throughput of the array. To solve this issue, PiMWrite proposes pipelined write, which combines two adjacent write operations in the array and performs them simultaneously, as shown in Figure 6(b). With the intermediate states properly set, the second step of writing A and the first step of writing B are now performed in two separate rows (Row i and Row j) in the same cycle. The pipeline continues as B and C are written similarly.

Fig. 6.

As shown in Figure 6(b), the pipelined write reduces the required write cycles from 2N (two-step write) to \(N+1\) for a pipeline of N writes, because writing in parallel hides the extra write cycle. Moreover, thanks to the capacitive write load of FeFETs, writing two rows simultaneously only costs the extra energy of charging one more WL and introduces no data conflicts. Since the parasitic capacitance of a WL is usually much smaller than that of the BLs, the pipelined write consumes nearly \(N+1\) times the energy of a single write operation instead of 2N times. As N grows, the energy and latency overheads of the two-step write shrink significantly, as shown in Figure 6(c). Therefore, by combining multi-step writes with pipelining, which exploits both the time and space dimensions, PiMWrite significantly improves P-SCA resistance while reducing the energy and latency overheads of multi-step writing.
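The cycle and energy counts above can be tabulated with a short Python sketch. It is a simplified accounting under our own assumptions: `e_bl` is the BL-charging energy of one write cycle, `e_wl` is the energy to charge one WL, BLs are charged once per cycle, and every step of every write activates its WL once. The function name and the number of BLs charged per cycle in the usage example are hypothetical.

```python
def write_costs(n_writes, e_bl, e_wl):
    """Return {scheme: (write cycles, energy)} for n_writes row writes."""
    one_step = (n_writes, n_writes * (e_bl + e_wl))
    two_step = (2 * n_writes, 2 * n_writes * (e_bl + e_wl))
    # Pipelined: step 2 of write i shares a cycle with step 1 of write i + 1,
    # so BLs are charged N + 1 times, while WLs are still activated 2N times.
    pipelined = (n_writes + 1, (n_writes + 1) * e_bl + 2 * n_writes * e_wl)
    return {"one-step": one_step, "two-step": two_step, "pipelined": pipelined}


if __name__ == "__main__":
    V_W, C_BL, C_WL = 2.5, 51.2e-15, 3.2e-15  # values from Section 4.1 and Table 1
    e_bl = 16 * C_BL * V_W ** 2               # hypothetical: 16 BLs charged per cycle
    e_wl = C_WL * V_W ** 2
    for n in (1, 8, 64):
        costs = write_costs(n, e_bl, e_wl)
        base_cycles, base_energy = costs["one-step"]
        for scheme in ("two-step", "pipelined"):
            cycles, energy = costs[scheme]
            print(f"N={n:3d} {scheme:9s} latency x{cycles / base_cycles:.2f} "
                  f"energy x{energy / base_energy:.2f}")
```

With a WL energy much smaller than the BL energy, the pipelined cost in this model approaches that of N one-step writes as N grows, consistent with the trend described for Figure 6(c).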

Moreover, PiMWrite needs no modifications to hardware and thus supports general FeFET-based memories. For different FeFET arrays, detailed algorithms and selections of the intermediate states may be adjusted. The algorithms can be executed by a secure co-processor so the scheme can be applied to general FeFET-based memories. In this work, a detailed algorithm is designed for the 1T AND FeFET array as an example without loss of generality. The method can also be applied to FeFET arrays using other cell topologies by applying modifications to the algorithm when necessary.

Algorithm 1 shows the two-step write algorithm for the 1T AND FeFET array. As mentioned in Section 2.2, writing a vector in the 1T AND FeFET array takes two separate phases with different charging schemes, one for 1’s and one for 0’s. The two-step write therefore involves four phases, each writing half of the cells in a row. The order of the phases (1’s first or 0’s first) depends on the row currently in its second step. To bring the intermediate states closer to the input data in terms of HW, some phases may leave the stored data unchanged (e.g., writing ‘0’ to a ‘0’-state cell). Once the operation of each phase is determined, each intermediate state and the final input data are easily reached. Within the algorithm, the two possible phase orders are the only variable observable in the power traces for any data, so security is greatly improved.

Figure 7 shows an example of the algorithm. The original states stored in Row A and Row B have no difference. In the first step, 1’s and 0’s are written into Row A successively to reach the intermediate state. The first phase of the second step starts the pipeline and writes 1’s to make sure that more 0’s exist in the final state of Row A; this operation corresponds to the data pattern of A. In Case 1, both rows are written in the second phase of the second step to guarantee at least four 0’s in the intermediate state. In Case 2, Row B writes nothing to prevent excessive 0’s. Finally, B is written into Row B in the third step. In this way, the second step of writing A runs in parallel with the first step of writing B. Each step writes four 0’s and four 1’s, and three steps complete the two write operations.
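The following Python sketch illustrates how a two-step write can keep the per-phase pulse counts fixed. It is our own simplified, non-pipelined construction, not a reproduction of Algorithm 1, and it assumes an even word width n, that every phase drives exactly n/2 cells to one value, and that undriven cells keep their value. Step 1 overwrites every cell with a balanced intermediate state, so no read of the stored data is needed; in step 2, the later phase overwrites the surplus bits of the earlier one, and the only data-dependent observable is whether 1’s or 0’s are written first. The function names are hypothetical.

```python
def plan_two_step(target):
    """Return (step1, step2); each step is two phases (value, cells),
    and every phase drives exactly n/2 cells."""
    n = len(target)
    assert n % 2 == 0, "sketch assumes an even word width"
    half = n // 2
    ones = [i for i, b in enumerate(target) if b == 1]
    zeros = [i for i, b in enumerate(target) if b == 0]
    if len(ones) <= half:
        # More 0's: write 1's first, then let the 0-phase overwrite the surplus 1's.
        s0 = zeros[:half]
        untouched = zeros[half:]           # must already hold 0 after step 1
        s1 = ones + s0[:half - len(ones)]
        step2 = [(1, s1), (0, s0)]
        keep = 0
    else:
        # More 1's: write 0's first, then let the 1-phase overwrite the surplus 0's.
        s1 = ones[:half]
        untouched = ones[half:]            # must already hold 1 after step 1
        s0 = zeros + s1[:len(ones) - half]
        step2 = [(0, s0), (1, s1)]
        keep = 1
    # Step 1: a balanced intermediate state (n/2 ones, n/2 zeros) written over
    # every cell, so its result does not depend on the unknown stored data.
    others = [i for i in range(n) if i not in untouched]
    same = untouched + others[:half - len(untouched)]
    rest = [i for i in range(n) if i not in same]
    ones1, zeros1 = (same, rest) if keep == 1 else (rest, same)
    step1 = [(1, ones1), (0, zeros1)]      # fixed phase order in step 1
    return step1, step2


def apply_phases(state, phases):
    """Apply write phases in order; later phases overwrite earlier ones."""
    state = list(state)
    for value, cells in phases:
        for i in cells:
            state[i] = value
    return state


if __name__ == "__main__":
    target = [0, 0, 1, 1, 0, 0, 0, 0]
    step1, step2 = plan_two_step(target)
    print("step 1:", step1)
    print("step 2:", step2)
    print("reached target:", apply_phases([1] * 8, step1 + step2) == target)
```

For 8-bit words, every phase in this construction drives four cells, and the phase order of step 2 only separates words with HW at most 4 from those with HW above 4.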

Fig. 7.

PiMWrite fixes the written data patterns and presents the same power trace for different vectors with the two-step write operation. The attackers can hardly reveal the correct HW from the power traces, and thus the security is significantly improved. Thanks to the unique features of FeFETs, different steps of write can be pipelined in different rows to improve energy efficiency and the overall throughput. Moreover, PiMWrite needs no modification to the hardware circuits and can be applied to general FeFET-based memories. With these advantages, the PiMWrite performance in practice can be further improved by the following optimizations.

3.2 Optimizations on the Proposed PiMWrite

No Read Operation Involved. For the selection of the intermediate states, having the knowledge of the stored states usually helps reach the final state, which, however, requires a read operation. As is well known, reading the stored vector before writing would cost extra energy and latency overheads. Fortunately, for FeFET arrays, the asymmetric write energy has little dependence on the stored states. The FeFET device state switching characteristic provides the possibility to remove the read operation in PiMWrite. Through carefully designing the operation orders in Algorithm 1, all steps are only controlled by the input data and involve no read operation, so both performance and efficiency are improved.

Trade-Off between Steps and Types. In scenarios with relaxed security requirements, the constraint on the number of observable data-pattern types may be loosened to reduce the number of write steps, trading security for efficiency. With more steps, each step can make smaller changes, so fewer types of power traces are needed to reach the intermediate states. For attackers, fewer types mean more obfuscation and a harder attack, so more security is provided. However, more steps also induce latency and energy overheads. Therefore, optimizing the trade-off between steps and types to balance security and overheads is important.

For the 1T AND array, our algorithm achieves the fewest steps together with the fewest types. The array writes 1’s and 0’s in two separate phases. Considering all-one and all-zero vectors, both orders of the two phases are required, so at least two types of power traces will be observed. Under this constraint, the algorithm reduces the number of steps to two (the minimum) and reaches the best trade-off by overwriting some bits written in the previous phase. Other FeFET arrays that do not write a vector in two phases may require more types, but finding the best trade-off is still necessary.

Continuous Pipelining. As in other pipelined systems, the length of the pipeline affects the overheads of PiMWrite, and continuous pipelining can significantly reduce the overheads of the multi-step write. To lengthen the pipeline, a write operation may pause at its intermediate state until the next input data arrive. When the interval between two write requests to the same subarray exceeds a certain threshold, the data are written without pipelining to release the registers. Since the memory is writing into other subarrays during that interval, this operation induces no additional latency. The interval can be reduced by memory access scheduling, and the threshold can be co-optimized with the application to reduce energy overheads. Successive operations on the same row also interrupt continuous pipelining, but this can be alleviated by a software bypass. Therefore, PiMWrite can be performed continuously to reduce the latency and energy overheads.
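The threshold policy described above can be sketched as follows. The grouping rule, function names, and cycle accounting (a run of k pipelined writes costs k + 1 write cycles, as in Section 3.1) are our own illustrative assumptions; arrival times are in write-cycle units, and a real controller would also track register occupancy and same-row conflicts.

```python
def pipeline_runs(arrival_cycles, threshold):
    """Group the write requests of one subarray into pipeline runs.

    A request joins the current run if it arrives within `threshold` cycles of the
    previous request; otherwise the held intermediate state is flushed and a new
    run starts.
    """
    runs = []
    for t in sorted(arrival_cycles):
        if runs and t - runs[-1][-1] <= threshold:
            runs[-1].append(t)
        else:
            runs.append([t])
    return runs


def total_write_cycles(runs):
    """A run of k pipelined writes takes k + 1 write cycles."""
    return sum(len(run) + 1 for run in runs)


if __name__ == "__main__":
    arrivals = [0, 2, 3, 50, 51, 53]
    for threshold in (0, 5, 100):
        runs = pipeline_runs(arrivals, threshold)
        print(threshold, runs, total_write_cycles(runs))
```

A larger threshold keeps more writes in one run and saves flush cycles, at the price of holding intermediate states in registers for longer, which is the trade-off the threshold is meant to tune.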

Other Write Schemes and Array Structures. In the 1T AND array, write operations can also be carried out by resetting the entire row and selectively writing certain cells to ‘1’. To perform PiMWrite with this scheme, the target rows are first reset to 0’s. The algorithm is almost the same as Algorithm 1. The only difference is that, for the data whose HW is no smaller than half of the bit width, the phase of writing 0’s can be omitted. Compared with the original write scheme, the modified PiMWrite costs fewer cycles but consumes more energy due to the reset operation.

The 1T NOR array, the drain-select 2T array in Figure 3(d), and the 3T array also require two phases to write a vector, similar to the 1T AND array, so Algorithm 1 can be directly implemented in these arrays. For the gate-select 2T array in Figure 3(c), writing a vector needs only one phase. To apply PiMWrite to the gate-select 2T array, the number of steps is increased to 3, and the data pattern ‘01’ is written in each step for every four bits. However, for each 4-bit set, all-zero and all-one vectors have to be written directly, and 4–5 steps are required if the Hamming distance (HD) between adjacent data exceeds 2. The sets can be grouped flexibly to reduce the occurrence of these cases. Following the discussion of the trade-off between steps and types, the modified algorithm offers comparable security but induces more overheads than Algorithm 1. Figure 8 shows an example of the PiMWrite algorithm in the gate-select 2T array. In each step, the data pattern “01” is written. As the HD between A and B is 3, Row B takes 4 steps to finish the write operation. Finally, Row C is written successfully.

Fig. 8.

3.3 Proposed SpA

The proposed PiMWrite provides a software approach to enhancing the security of FeFET memory arrays that are already fabricated. For new FeFET memory designs, other methods may avoid software changes and achieve higher efficiency. Following the idea of making power consumption constant, one route to high P-SCA security with low latency overhead is to add dummy circuits that are as simple as possible. As shown in Figure 9(a), writing complementary data patterns consumes complementary energy. This observation suggests a straightforward idea: write the complementary data into a dummy array so that the total power consumption is constant, as illustrated in Figure 9(b). However, the added dummy array would double the power consumption and the area. To address this issue, the proposed SpA splits the array into two portions. Because FeFET write energy is linear in the parasitic capacitance, halving the BL length significantly saves energy while providing high security through balanced power consumption.

Fig. 9.

Figure 9(c) shows the circuit structure and write operation of SpA, which can be applied to all FeFET arrays. The array is split into two portions, and a dummy row is added to each portion. The BL controllers and sense amplifiers (SAs) are placed between the two portions and shared by both of them. Inverters and two-way switches are placed in the middle to generate complementary data input. The WL controller is partially modified and controls the switches.

Figure 9(d) shows an example of the write operation in SpA. The input data are written into the specified row of one portion, while the bit-reversed data are simultaneously written into the dummy row of the other portion. The WL voltages are set to be complementary (i.e., the WLs of the specified row and the dummy row are set to \(V_W\) and GND, respectively). As illustrated in Section 2.2, the write energy is nearly linear in the line capacitance and thus in the line length. In each column, half of the total BL length is always charged and the other half is never charged, so the total energy consumed by the array is the same for all data patterns. Because SpA simply sums the complementary power consumption of writing ‘0’ and ‘1’ to obtain a constant total, it is applicable to the existing types of FeFET arrays [11, 19, 28, 39].
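The constant-energy argument can be checked with the same first-order model, \(C V_W^2\) per charged BL. In this Python sketch, which bit value requires its BL to be charged (`charged_value`) is a hypothetical parameter, WL energy is omitted, and each SpA portion is assumed to have half the BL capacitance of the unsplit array (51.2 fF from Section 4.1). The function names are illustrative.

```python
C_BL = 51.2e-15  # full-length BL capacitance (F), Section 4.1
V_W = 2.5        # write voltage (V), Table 1


def bl_energy(bits, c_bl, charged_value=0):
    """Energy to charge, to V_W, the BL of every bit equal to charged_value (hypothetical rule)."""
    return sum(c_bl * V_W ** 2 for b in bits if b == charged_value)


def baseline_energy(bits):
    """Unsplit array: one full-length BL per column."""
    return bl_energy(bits, C_BL)


def spa_energy(bits):
    """SpA: data in one portion, complement in the other portion's dummy row,
    each portion having half-length (half-capacitance) BLs."""
    complement = [1 - b for b in bits]
    return bl_energy(bits, C_BL / 2) + bl_energy(complement, C_BL / 2)


if __name__ == "__main__":
    for word in ([0] * 8, [1, 0] * 4, [1] * 8):
        print(word, f"baseline {baseline_energy(word) * 1e15:.0f} fJ",
              f"SpA {spa_energy(word) * 1e15:.0f} fJ")
```

In this model every column charges exactly one half-length BL, so the SpA energy is identical for all patterns, whereas the baseline energy tracks the number of bits equal to `charged_value`.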

SpA balances the power consumption with two dummy rows and reduces the overheads by splitting the circuit structure in half, providing high P-SCA resistance. SpA resembles a subarray design, but with significant differences. In [26] and [30], subarray designs are proposed to improve the throughput of read and write operations; such designs have independent controllers and SAs operating in parallel [26]. Despite the structural similarity, SpA is fundamentally different from subarrays, as shown in Figure 10(a): SpA targets security rather than throughput. Its controllers and SAs are shared, so accesses occur in only one portion at a time. The design occupies less area and significantly improves P-SCA resistance without degrading throughput. Further optimizations of SpA can be made to meet the requirements of different applications.

Fig. 10.

3.4 Optimizations on the Proposed SpA

WL Voltage of the Dummy Rows. In the original design, the WL voltages are set to be complementary to fully balance the power consumption. However, in 2T/3T FeFET arrays the WL voltage is the same for writing 1’s and 0’s and leaks no side-channel information. In this case, the WLs of the two dummy rows can be grounded to save energy with the same security improvement, as illustrated in Figure 10(b). For 1T FeFET arrays in applications with lower security requirements, this method can also be adopted to reduce overheads.

Read Operation Protection. Figure 10(c) shows how SpA protects read operations. For applications requiring high security, SpA can also mitigate P-SCA on reads by leveraging the dummy rows. Both portions perform the read operation, but only the valid portion is connected to the SAs. As the dummy rows store the most recently written data, which are either irrelevant or complementary to the accessed data, obfuscating read currents are added to the read currents of the valid portion. Therefore, the attackers can hardly recover the HW of the read-out data from the read currents, and the read operation is protected.
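A minimal sketch of why the dummy-row read obfuscates the HW: the supply sees the sum of the two rows' read currents. The per-cell ON/OFF currents, the mapping of ‘1’ to the ON state, and the function names are hypothetical, and we assume the other portion's dummy row is read together with the accessed row.

```python
I_ON, I_OFF = 10e-6, 1e-9  # hypothetical per-cell read currents (A)


def row_read_current(bits):
    """Total read current of one row: ON cells ('1') conduct I_ON, OFF cells I_OFF."""
    return sum(I_ON if b else I_OFF for b in bits)


def spa_read_current(accessed_bits, dummy_bits):
    """Supply-side current when the accessed row and the other portion's dummy row
    are read together; only the accessed portion would drive the SAs."""
    return row_read_current(accessed_bits) + row_read_current(dummy_bits)


if __name__ == "__main__":
    data = [1, 1, 0, 0, 0, 0, 0, 0]
    complement = [1 - b for b in data]
    print("unprotected:", row_read_current(data))
    print("SpA, dummy = complement:", spa_read_current(data, complement))
```

If the accessed row was the most recently written one, its dummy copy is the complement and the summed current is data-independent; otherwise the attacker only observes the sum of two HWs, which does not pin down the HW of the accessed data.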

4 ProtFe Benchmarking

4.1 Benchmark Settings and Attack Model

We comprehensively evaluate ProtFe in the 1T AND FeFET array in terms of security performance and overheads, and compare it with parity encoding [14] and power bank [21]. A FeFET Verilog-A model based on [7] and a commercial 65 nm CMOS process are adopted. The key device and simulation parameters are listed in Table 1. With the array size set to 512 \(\times\) 32, parasitic capacitances of 51.2 fF and 3.2 fF are assumed for the BLs and WLs, respectively. All security improvements and overheads are normalized to the corresponding original unprotected design. Because the PiMWrite algorithm determines the actual write operations issued to the memory array, it can be applied automatically at compile time. The memory array then simply executes the PiMWrite instructions, with negligible run-time overhead for implementing the algorithm.

Table 1.

Device parameter	\(W\)	\(L\)	\(T_{FE}\)	Domain number
Value	400 nm	100 nm	100 nm	50
Simulation parameter	\(V_W\)	\(V_R\)	\(V_{precharge}\)	Write duration
Value	2.5 V	0.4 V	0.1 V	1 μs

Table 1. Device and Simulation Parameters of FeFETs

In this work, we make the following assumptions about the attackers. These assumptions are typical of P-SCA on cryptographic chips, and the ProtFe benchmarking is based on them.

(1) The attacker can sample the power traces of the memory arrays with high accuracy. The power consumed by other peripherals in the memory can be fully separated out, and the impact of stochastic noise can be minimized by averaging once enough traces have been collected.

(2) The attacker can extract the HW of the written data during the write operation through repeated testing.

(3) The attacker has sufficient knowledge of the write strategy or the custom design adopted in the array.
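Assumptions (1) and (2) together describe a standard Hamming-weight leakage attack. The toy sketch below illustrates it, assuming, hypothetically, that each write's energy is a linear function of the HW plus Gaussian measurement noise; ALPHA, BETA, and SIGMA are placeholder values, not measured FeFET figures. Averaging many traces suppresses the noise, after which the attacker inverts the linear model to recover the HW.

```python
import random

ALPHA = 1.0   # assumed energy per written '1' (arbitrary units)
BETA = 5.0    # assumed data-independent baseline energy
SIGMA = 2.0   # assumed standard deviation of the measurement noise


def measure_trace(word: int, rng: random.Random) -> float:
    """One noisy energy sample under a linear Hamming-weight leakage model."""
    hw = bin(word).count("1")
    return BETA + ALPHA * hw + rng.gauss(0.0, SIGMA)


def estimate_hw(word: int, n_traces: int, seed: int = 0) -> int:
    """Average repeated measurements (assumption 1), then invert the model (assumption 2)."""
    rng = random.Random(seed)
    mean = sum(measure_trace(word, rng) for _ in range(n_traces)) / n_traces
    return round((mean - BETA) / ALPHA)


secret = 0b1011_0010
print(estimate_hw(secret, n_traces=1000))  # 4
```

In this toy model, an ideal power-balancing countermeasure corresponds to driving ALPHA toward zero, at which point averaging more traces no longer reveals the HW.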

4.2 Security Performance Evaluation

To comprehensively analyze the security performance, we evaluate ProtFe from two aspects, as illustrated in Figure 11(a). The normalized energy deviation (NED) quantifies the degree of asymmetry among similar power traces: the smaller the NED, the more obfuscated similar traces become, so that attackers can regard them as the same trace. The search space quantifies the range of data patterns that generate the same trace within a small NED. A larger search space means that a single trace corresponds to more data patterns and correlates more weakly with the HW, which improves security. The evaluation results show that ProtFe achieves an NED as small as that of existing power balancing methods and a search space that nearly reaches the upper limit.

Fig. 11.

Normalized Energy Deviation. Let \(E\) denote the energies consumed by the write operations that produce similar power traces. The NED is then defined as follows:

\begin{equation} \mathrm{NED}=\frac{\max(E) - \min(E)}{\max(E)}\cdot 100\% \tag{1} \end{equation}
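Equation (1) translates directly into code. The sketch below computes the NED of a group of write energies; the example energies are hypothetical placeholders, not values from the simulations.

```python
def ned_percent(energies: list[float]) -> float:
    """Normalized energy deviation, Eq. (1): (max(E) - min(E)) / max(E) * 100%."""
    if not energies:
        raise ValueError("need at least one energy value")
    e_max, e_min = max(energies), min(energies)
    return (e_max - e_min) / e_max * 100.0


# Hypothetical write energies (fJ) for a group of similar traces.
print(round(ned_percent([100.0, 99.0, 99.5]), 2))  # 1.0
```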

Because ProtFe provides security by balancing the power consumption, we compare it with existing power balancing designs for logic circuits in terms of power consumption consistency, as shown in Figure 11(b). The NEDs of the proposed PiMWrite (0.02%) and SpA (0.03%) are lower than those of SABL (3.2%) [35], WDDL (1.1%) [36], TDPL (0.4%) [4], and Dual-Rail (2.1%) [31]. This is owing to the highly consistent write operation for all data patterns and the negligible impact of history states. As mentioned in Section 2.2, the previous FeFET states do not affect the consistency. Moreover, the array is reset to the hold state after each operation to prevent potential disturbance, so previously accessed data have no impact either.

Considering the device-to-device and cycle-to-cycle variations of FeFETs, we further perform Monte Carlo simulations, adopting a ferroelectric capacitor and a MOSFET as the equivalent model of a FeFET. Device-to-device variation induces asymmetry in the power consumption of different devices and thus increases the NED. In contrast, cycle-to-cycle variation may further obfuscate the traces in practice, since the power consumption varies from one write cycle to the next. Simulation results show an NED of 2.5% for PiMWrite and 3.7% for SpA, which is comparable to existing works [35, 36]. Referring to the P-SCA resistance evaluations in these works, this deviation is sufficiently small to obfuscate the power traces in practice.
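The effect of device-to-device variation on NED can be illustrated with a much simpler Monte Carlo than the one used here. The sketch below is not the ferroelectric-capacitor-plus-MOSFET model: it assumes, hypothetically, that each device's write energy scales linearly with a Gaussian-distributed capacitance factor of mean 1 and relative spread rel_sigma. The spreads are placeholders chosen only to show that a larger spread yields a larger NED.

```python
import random


def ned_percent(energies):
    """Normalized energy deviation, Eq. (1)."""
    e_max, e_min = max(energies), min(energies)
    return (e_max - e_min) / e_max * 100.0


def monte_carlo_ned(n_devices: int, rel_sigma: float, seed: int = 1) -> float:
    """NED of write energies when each device's effective capacitance varies.

    Hypothetical model: nominal energy 1.0 per write, scaled by a Gaussian
    factor with mean 1 and standard deviation rel_sigma (device-to-device only).
    """
    rng = random.Random(seed)
    energies = [1.0 * rng.gauss(1.0, rel_sigma) for _ in range(n_devices)]
    return ned_percent(energies)


for sigma in (0.0, 0.002, 0.005):
    print(sigma, round(monte_carlo_ned(1000, sigma), 2))
```

With zero spread the NED is exactly 0%; any nonzero spread raises it, which is the trend described above. Cycle-to-cycle variation is deliberately left out of this sketch.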

Search Space. Because the NED is small enough to obfuscate similar traces, the search space of a single trace is expanded. Figure 11(c) compares the search space with that of existing countermeasures on NVMs, normalized to the average case of conventional write. ProtFe shows increasingly superior performance as the bit width grows, reaching 21.1\(\times\) (PiMWrite) and 33.0\(\times\) (SpA) search space at a 32-bit width, thanks to the significantly reduced number of distinct power traces. ProtFe also maintains consistent performance regardless of the data pattern, whereas the search space of conventional write and parity encoding degrades when the HW of the data is close to 0 or to the bit width. Therefore, attackers cannot obtain comprehensive knowledge of the data, and the P-SCA vulnerability is mitigated.
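Under the HW leakage model, a conventional write whose trace reveals Hamming weight k leaves C(n, k) candidate patterns, while a perfectly balanced trace leaves all 2^n patterns. The sketch below assumes (our reading, not stated explicitly in the text) that the "average case of conventional write" baseline is the mean of C(n, k) over the n + 1 possible HW values, i.e., 2^n / (n + 1). Under that assumption, full obfuscation of a 32-bit word normalizes to 33.0×, which coincides with the value listed for SpA and power bank in Table 2.

```python
from math import comb


def conventional_search_space(n_bits: int, hw: int) -> int:
    """Patterns sharing one trace when the trace reveals only the HW."""
    return comb(n_bits, hw)


def baseline(n_bits: int) -> float:
    """Assumed normalization: mean of C(n, k) over the n + 1 possible HW values."""
    return sum(comb(n_bits, k) for k in range(n_bits + 1)) / (n_bits + 1)


def normalized(search_space: float, n_bits: int) -> float:
    return search_space / baseline(n_bits)


n = 32
print(round(normalized(2 ** n, n), 1))                                # 33.0: one trace for all patterns
print(round(normalized(conventional_search_space(n, n // 2), n), 2))  # 4.62: conventional, HW = 16
print(normalized(conventional_search_space(n, 0), n) < 1e-6)          # True: conventional, HW = 0
```

The last two lines reproduce the qualitative trend in the text: conventional write is most exposed when the HW is 0 (or, symmetrically, equal to the bit width), where a trace identifies a single pattern.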

4.3 Area and Latency Comparison

Figure 12(a) evaluates the area and latency overheads of ProtFe. The proposed PiMWrite, the proposed SpA, parity encoding, and power bank occupy \(\sim\)0%, 0.6%, 3.3%, and \(\sim\)0% more area, respectively. This is because PiMWrite makes no modification to the hardware circuits, and SpA mainly adds low-cost dummy rows. In terms of latency, PiMWrite, SpA, and parity encoding all exhibit \(\sim\)0% extra latency, whereas power bank incurs 28.3% more delay due to its charging process. For PiMWrite, the latency of the multiple steps is hidden by pipelining, leading to a nearly 0% penalty under continuous pipelining.
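The latency-hiding argument can be illustrated with generic pipeline arithmetic. The sketch below is not the PiMWrite schedule itself; it assumes, hypothetically, that each write is split into n_steps steps, that each step occupies one slot as long as a conventional write, and that a new write can enter the pipeline every slot. Under these assumptions, N pipelined writes take N + n_steps - 1 slots instead of N · n_steps, so the relative overhead (n_steps - 1)/N vanishes as the write stream grows.

```python
def pipelined_slots(n_writes: int, n_steps: int) -> int:
    """Slots to finish n_writes when a new write enters the pipeline every slot."""
    return 0 if n_writes == 0 else n_writes + n_steps - 1


def sequential_slots(n_writes: int, n_steps: int) -> int:
    """Slots when each multi-step write must finish before the next one starts."""
    return n_writes * n_steps


def latency_overhead(slots: int, n_writes: int) -> float:
    """Overhead relative to an assumed baseline of one slot per conventional write."""
    return slots / n_writes - 1.0


n_steps = 3  # hypothetical number of steps per write
for n_writes in (1, 10, 1000):
    print(n_writes,
          latency_overhead(sequential_slots(n_writes, n_steps), n_writes),
          latency_overhead(pipelined_slots(n_writes, n_steps), n_writes))
```

Without pipelining, the overhead stays at n_steps - 1 regardless of the stream length; with continuous pipelining, it approaches zero.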

Fig. 12.

4.4 Energy Comparison

Figure 12(b) shows the energy comparison between ProtFe and existing methods. Simulation results show average energy overheads of 0.2% and 7.1% for PiMWrite and SpA, respectively. The extra energy of both designs is mostly due to the charging of WLs. The overhead of ProtFe is close to that of parity encoding (3.2%) and much lower than that of power bank (106.2%), since the capacitor charging-discharging and the LDOs adopted in [21] consume considerable energy. Thanks to the pipelining in PiMWrite, the half-split circuit structure in SpA, and the exploitation of FeFET characteristics, both ProtFe methods achieve high energy efficiency.
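For scale, the charging energy of a single line can be estimated from the parasitic capacitances in Section 4.1 and the write voltage in Table 1, using the standard result that charging a capacitance C from 0 V to V draws C·V² from the supply. The sketch assumes a full 0 to V_W swing and that a BL spans the 512 rows while a WL spans the 32 columns; it ignores FeFET polarization switching, the actual per-phase line voltages, and peripheral energy, so the results are order-of-magnitude figures, not the reported overheads.

```python
C_BL = 51.2e-15   # BL parasitic capacitance (F), from Section 4.1
C_WL = 3.2e-15    # WL parasitic capacitance (F), from Section 4.1
V_W = 2.5         # write voltage (V), from Table 1
ROWS, COLS = 512, 32


def supply_energy(c: float, v: float) -> float:
    """Energy drawn from the supply to charge capacitance c from 0 V to v: C * V^2."""
    return c * v * v


# Per-cell contribution implied by the array size (assumed line orientation).
print(f"BL per cell: {C_BL / ROWS * 1e15:.2f} fF, WL per cell: {C_WL / COLS * 1e15:.2f} fF")
# Hypothetical full-swing charge of one line to V_W.
print(f"one WL: {supply_energy(C_WL, V_W) * 1e15:.1f} fJ")
print(f"one BL: {supply_energy(C_BL, V_W) * 1e15:.1f} fJ")
```

Both line capacitances correspond to 0.1 fF per cell, and one WL costs roughly 20 fJ per full swing, which is small relative to a BL (about 320 fJ); this is consistent with dummy-row WL charging being a modest energy cost.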

Finally, we summarize the comparison in Table 2. ProtFe achieves the best balance between security and overheads among the compared methods. Moreover, PiMWrite can be adopted in general FeFET-based memories without custom FeFET circuits. ProtFe is therefore promising for P-SCA protection in FeFET-based memories.

Table 2.

Methods	Area	Latency	Energy	Normalized Search Space
Parity Encoding [14]	3.3%	\(\sim\) 0%	3.2%	9.0 \(\times\) (max)
Power Bank [21]	\(\sim\) 0% \(^{\mathrm{a}}\)	28.3%	\(\sim\) 0%/106.2% \(^{\mathrm{b}}\)	33.0 \(\times\)
PiMWrite (This work)	\(\sim\) 0%	\(\sim\) 0%	0.2%	21.1 \(\times\)
SpA (This work)	0.6%	\(\sim\) 0%	7.1%	33.0 \(\times\)

Table 2. Overall Comparison between ProtFe and Existing Methods

\(\mathrm{^a}\) Using metal-insulator-metal capacitors in higher metal layers to save area.

\(\mathrm{^b}\) Energy of capacitor charging-discharging and LDOs is considered.

5 More Discussions

5.1 Algorithm Comparison

We implement AES-128 and the PiMWrite algorithm in C++ and run both on the same hardware. Since both algorithms aim at security and can be adopted in general memories, we compare their execution times. As shown in Figure 13, the execution time of both algorithms scales linearly with the input data length. However, PiMWrite has a smaller coefficient and runs 8 times faster because it requires fewer calculations. Moreover, PiMWrite requires only a few registers to store the written data and the intermediate states, whereas the S-box look-up table of AES-128 occupies more memory. Therefore, the proposed method has a lower cost than AES-128 in both time and space. Meanwhile, ProtFe can be combined with AES or other encryption methods to further enhance data privacy.
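Comparing the two coefficients amounts to fitting a line to execution time versus input length for each algorithm and taking the ratio of the slopes. The sketch below shows that computation with ordinary least squares on synthetic, noise-free placeholder timings; it is not the C++ benchmark used here, and the numbers are not measurements.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical timings (ms) versus input length (KB); replace with measured data.
lengths = [1, 2, 4, 8, 16]
t_aes = [0.8 * x + 0.1 for x in lengths]
t_pim = [0.1 * x + 0.05 for x in lengths]

slope_aes, _ = fit_line(lengths, t_aes)
slope_pim, _ = fit_line(lengths, t_pim)
print(f"speedup from slopes: {slope_aes / slope_pim:.1f}x")
```

Using slopes rather than a single timing ratio separates the per-byte cost from fixed start-up costs captured by the intercept.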

Fig. 13.

5.2 Future Work

In this work, we deliver a PiMWrite algorithm for the 1T AND FeFET array that leverages its two-phase write feature. Write algorithms for other FeFET arrays have not yet been fully studied or optimized and are also promising. Moreover, since PiMWrite leverages the capacitive load of FeFETs and embedded DRAM (eDRAM) shares the same feature, the opportunity of applying PiMWrite to eDRAM-based memories can be further explored.

6 Conclusion

In this article, we propose ProtFe to protect FeFET-based memories from P-SCA risks. Fully exploiting the capacitive load characteristic of FeFETs, we explore the opportunity of splitting the write operation in the time and space dimensions for security designs and propose two specific methods, i.e., PiMWrite and SpA. Further optimizations, such as the trade-off between write steps and permitted power trace types, continuous pipelining, and the WL voltage selection of the dummy rows, are analyzed and discussed. Simulation results show that ProtFe is promising, offering higher security, lower overheads, and coverage extended beyond custom memories.

References

[1] F. Arnaud, P. Zuliani, J. P. Reynard, A. Gandolfo, F. Disegni, P. Mattavelli, E. Gomiero, G. Samanni, C. Jahan, R. Berthelon, et al. 2018. Truly innovative 28nm FDSOI technology for automotive micro-controller applications embedding 16MB phase change memory. In 2018 IEEE International Electron Devices Meeting (IEDM). IEEE, 18–4.
