With advances in the bit density and stacking technologies of NAND flash memory, storage capacity keeps increasing, but reliability is becoming an increasingly prominent issue. Low-density parity-check (LDPC) codes, as robust error-correcting codes, are extensively employed in flash memory. However, when the raw bit error rate (RBER) is very high, LDPC decoding can introduce long latency. To study how LDPC performs on the latest 3D NAND flash memory, we conduct a comprehensive analysis of LDPC decoding performance using both a threshold voltage distribution derived through modeling (Modeling-based method) and the actual voltage distribution measured on-chip through testing (Ideal case). Based on LDPC decoding results under various interference conditions, we summarize four findings that help us better understand the characteristics of LDPC decoding in 3D NAND flash memory. Our characterization reveals differences in LDPC decoding performance between the Modeling-based method and the Ideal case: the modeled threshold voltage distribution deviates to some degree from the actual distribution, which reduces the accuracy of the initial probability information supplied to the decoder and causes a performance gap. By observing abnormal behaviors in decoding with the Modeling-based method, we introduce an Offsetted Read Voltage (ΔRV) method that optimizes LDPC decoding performance by offsetting the read voltage in each layer of a flash block. The evaluation results show that our ΔRV method improves the LDPC decoding performance of the Modeling-based method, reducing the total number of sensing levels needed for LDPC decoding by 0.67% to 18.92% on average across different interference conditions, under P/E cycles from 3,000 to 7,000.

1 Introduction

3D NAND flash memory is the primary storage medium in many storage systems, such as mobile devices, IoT devices, and data centers. To meet applications’ demands for large capacity and low cost, flash memories have increased their density in multiple ways, including multi-level cell technology and intensive cell stacking [24, 26, 34, 45]. These trends degrade the reliability of the raw data stored in NAND flash cells. Because they provide strong error correction capability, low-density parity-check (LDPC) codes are currently the most popular error correction code (ECC) in NAND flash storage systems [18, 31, 56]. Because LDPC decoding is probability-based, its error correction capability depends heavily on the input probability information [12, 32, 44]. Therefore, it is critical to properly transform the raw flash information into the inputs used for LDPC decoding.

In the literature, different strategies have been introduced that exploit the error characteristics of NAND flash memory to enhance LDPC error correction capability. The first strategy modifies the sensing-level placement scheme to place more sensing levels at positions with higher error rates. Dong et al. [12] proposed a nonuniform sensing strategy based on the observation that more errors occur in the overlapping region between two neighboring voltage states. Li et al. [32] proposed an asymmetric sensing-level placement scheme based on the asymmetric error characteristics among voltage states in NAND flash memory. The second strategy uses the error characteristics of flash memory to optimize the LDPC decoding process and thus speed up decoding. Zhang et al. [55] leveraged the correlation between errors in different pages of flash memory and added this information to the LDPC decoding process, accelerating the decoding iterations. The third strategy applies specific quantization methods to the initial information from the flash memory channel to improve its accuracy and speed up decoding. Ouyang et al. [39] proposed a non-uniform quantization method based on the distribution characteristics of the initial channel log-likelihood ratio (LLR) in MLC flash memory, which achieves better decoding performance. However, all of these strategies rely on analysis of a theoretical threshold voltage distribution model of flash memory, which may not fully reflect the actual data distribution, leading to a discrepancy between theory and practice. Therefore, it is essential to base the analysis on the actual data distribution in flash memory.

In this work, we first characterize LDPC performance in 3D NAND flash memory. We obtain threshold voltage distributions in two ways: one from the actual data distribution (Ideal case) and the other from a Gaussian model (Modeling-based method). We then study LDPC decoding performance in 3D NAND based on these two distributions. From extensive experiments, we derive four findings. First, in actual decoding, soft-decision decoding is needed only when interference is severe; in most cases, hard-decision decoding is sufficient to decode the data correctly. Second, the Ideal case and the Modeling-based method exhibit a performance gap in decoding, especially under severe interference. Third, because of inter-layer variations in 3D NAND flash memory, LDPC decoding performance differs significantly among layers. Fourth, when decoding with the Modeling-based method, using the optimal read voltage does not always yield the best results; instead, better decoding performance can be achieved by applying a certain offset to the optimal read voltage.

In light of the performance gap we observe between the Ideal case and the Modeling-based method, we introduce our Offsetted Read Voltage (ΔRV) method based on these findings. Comparing the threshold voltage distributions established by the Ideal case and the Modeling-based method, we find two cases of distribution differences for certain wordlines. In both cases, the Modeling-based method does not achieve its best results at the optimal read voltage. We adjust the optimal read voltage position for each wordline according to the characteristics of each case and perform LDPC decoding accordingly. The experimental results show that our ΔRV method significantly reduces the performance gap, leaving the Modeling-based method only slightly behind the Ideal case. Specifically, on average, our ΔRV method reduces the number of sensing levels required for LDPC decoding by 0.67% to 18.92% under various interference conditions when Program/Erase (P/E) cycles range from 3,000 to 7,000. Moreover, the ΔRV method decreases the number of pages with LDPC decoding failures in a block by 1.55%. This article makes the following contributions.

We conduct extensive experiments on multiple real chips to characterize the LDPC decoding performance of both the Ideal case and the Modeling-based method in 3D NAND.

To bridge the performance gap between the Ideal case and the Modeling-based method, we introduce our ΔRV method. It greatly improves the LDPC decoding performance by adjusting the optimal read voltage, which is inaccurately modeled in the Modeling-based approach.

The rest of this article is organized as follows: Section 2 describes the background and motivation. Section 3 presents our characterization results and analysis. How our ΔRV method works is explained in Section 4. Section 5 concludes this article.

2 Background and Motivation

2.1 3D NAND Flash

The storage medium of a solid-state drive (SSD) is primarily composed of flash memory. As illustrated in Figure 1, an SSD consists of multiple flash channels, with each channel consisting of several flash chips. Within a chip, there are multiple dies, which can be further divided into multiple planes. A plane typically contains several hundred to thousands of blocks. In SSDs, a block is the smallest erasable unit, while a page is the smallest unit for reading and writing. A block comprises numerous wordlines, and a single wordline corresponds to several pages depending on the type of flash cell. NAND flash memory comprises flash cells that store data by storing electric charges. Flash cells can be categorized into different types, including Single-Level Cell (SLC), Multi-Level Cell (MLC), Triple-Level Cell (TLC), Quad-Level Cell (QLC), and more, depending on the number of bits each storage cell can store. The amount of charge stored in a flash cell, represented as the threshold voltage ( \(V_{th}\) ), determines the data stored in flash cells. Taking TLC as an example, this unit can store 3 bits of data, ranging from “000” to “111,” with each 3-bit value being assigned a threshold voltage window (i.e., state). The threshold voltages of cells programmed to the same state on the same wordline can be approximated as a Gaussian distribution [1, 3, 36, 42], and the probability density curve is shown in Figure 2.

Fig. 1.

Fig. 2.

Different states store 3 bits, with adjacent states differing in only 1 bit. These 3 bits correspond to three different pages: Least Significant Bit (LSB), Center Significant Bit (CSB), and Most Significant Bit (MSB). When reading a specific page, the controller applies one or more read reference voltages ( \(V_{rf}\) ) to determine whether the bits within that page are 0 or 1. For example, when retrieving data from the LSB, we only need to compare the \(V_{th}\) of the flash cell with \(V_3\) and \(V_7\) . If it falls between \(V_3\) and \(V_7\) , the LSB of that flash cell stores a bit of 0; otherwise, it stores a bit of 1. Similarly, we only need to compare the \(V_{th}\) of all flash cells on the wordline with different reference voltages to read data from different pages. However, throughout the lifecycle of NAND flash memory, the distribution of threshold voltages is influenced by various sources of errors, including program interference [2, 6, 28], read interference [4, 17], and retention loss [5, 7, 8], resulting in charge leakage or accumulation in the flash memory cells and causing shifts in \(V_{th}\) . If the same \(V_{rf}\) is used for subsequent reads, there is a possibility of an erroneous determination of the cell’s state, resulting in reading errors. Furthermore, phenomena like Temporary Read Errors [50, 53] can lead to the occurrence of a high raw bit error rate (RBER) during the initial read operations, because of the distortion of the distribution of threshold voltages.
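The page-read rule above can be sketched in a few lines. This is a minimal illustration, not the controller's actual logic: only the LSB rule (compare against \(V_3\) and \(V_7\)) comes from the text; the CSB and MSB columns of the Gray code, the use of V1/V5 and V2/V4/V6 for those pages, and placing each read reference halfway between adjacent state means are assumptions for illustration. The state means are the P/E = 1 row of Table 2.

```python
# Hypothetical TLC Gray code for states P0..P7, bits ordered (LSB, CSB, MSB).
# Only the LSB column follows the text (0 between V3 and V7, 1 elsewhere);
# the CSB/MSB assignments are an illustrative assumption.
GRAY = ["111", "110", "100", "000", "010", "011", "001", "101"]

# 1-based indices of the read reference voltages V1..V7 each page needs
# (LSB from the text; CSB and MSB assumed to match the Gray code above).
PAGE_REFS = {"LSB": (3, 7), "CSB": (2, 4, 6), "MSB": (1, 5)}


def read_refs_from_means(means):
    """Place V1..V7 halfway between adjacent state means (a simplification)."""
    return [(a + b) / 2 for a, b in zip(means, means[1:])]


def read_page_bit(vth, page, vrefs):
    """Return one cell's bit for `page` by sensing only that page's references.

    Every reference of the page that the cell's Vth reaches toggles the bit
    once, starting from the erased state P0, whose bits are all 1.
    """
    crossings = sum(1 for i in PAGE_REFS[page] if vth >= vrefs[i - 1])
    return 1 - crossings % 2


# Usage: state means for P/E = 1 from Table 2, read back page by page.
means = [-40.3, 13.4, 52.1, 86.3, 126.1, 162.0, 204.4, 249.3]
vrefs = read_refs_from_means(means)
for state, mu in enumerate(means):
    bits = "".join(str(read_page_bit(mu, p, vrefs)) for p in ("LSB", "CSB", "MSB"))
    print(f"P{state}: {bits}")  # matches GRAY[state]
```

Because adjacent states differ in exactly one bit, each read reference toggles exactly one page, which is why a page needs only its own subset of references.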

As the demand for higher storage density continues to rise, planar NAND has faced significant challenges in further shrinking cell sizes [4, 5, 6]. To address this, manufacturers have turned to the vertical stacking technology of 3D NAND, in which multiple wordlines constitute a layer, as shown in Figure 1. Fundamentally, planar NAND flash typically employs a floating-gate cell structure, which stores charge in a conductor. In contrast, 3D NAND flash adopts a cylindrical charge trap (CT) structure, which captures charge in an insulator [51]. At a macroscopic level, 3D NAND can be regarded as a vertical stack of multiple planar NANDs [23], interconnected through channel holes formed by etching [21]. However, variability in etched hole formation causes significant variations in error characteristics among different layers within the same 3D flash block [20, 48]. Moreover, the stacked architecture introduces additional vertical inter-layer interference, subjecting 3D NAND to more complex interference and changes than planar NAND. In conclusion, owing to differences in overall structure and basic cell units, 3D NAND differs substantially from planar NAND in its physical and electrical characteristics, and many NAND characteristics manifest in more complex ways in 3D NAND [34].

2.2 LDPC Codes

Due to their excellent error correction performance, LDPC codes, first presented by Gallager [15], are now widely used in storage systems, including NAND flash memory. In this section, we will first introduce the basics of LDPC codes, and then describe how they are utilized in NAND flash memory.

2.2.1 Basics of LDPC Codes.

LDPC codes are linear block codes and can be described by a parity-check matrix H [55], as in Equation (1). The columns of H fall into two parts: the information bits (\(d_1, \ldots, d_6\)) and the parity bits (\(p_1, \ldots, p_5\)). Given a set of information bits, the parity bits are generated so that the resulting word satisfies the parity-check matrix; the information and parity bits together form a codeword. The 1s in each row, across both parts, define a parity equation that constrains certain bits of the codeword. For example, the first row of the matrix yields Equation (2). Because H has five rows, every codeword generated with this parity-check matrix must satisfy five parity equations, which protect the correctness of the information bits.

\begin{equation} {\bf H} = \left(\begin{array}{cccccc|ccccc} \stackrel{d_1}{0} & \stackrel{d_2}{1} & \stackrel{d_3}{1} & \stackrel{d_4}{1} & \stackrel{d_5}{0} & \stackrel{d_6}{0} & \stackrel{p_1}{1} & \stackrel{p_2}{0} & \stackrel{p_3}{1} & \stackrel{p_4}{0} & \stackrel{p_5}{0} \\ 0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 1 \\ 1 & 0 & 0 & 1 & 0 & 1 & 0 & 1 & 0 & 0 & 1 \end{array} \right) \tag{1} \end{equation}

\begin{equation} d_2 \oplus d_3 \oplus d_4 \oplus p_1 \oplus p_3 = 0 \tag{2} \end{equation}
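Checking the parity equations amounts to computing the syndrome \(H\mathbf{c}^T\) over GF(2). The sketch below uses H from Equation (1) verbatim; the example codeword (\(d = 010000\), \(p = 01101\)) was found by solving the five equations by hand and is only an illustration. Row 1 reproduces Equation (2), and flipping one bit makes the syndrome equal to that bit's column of H.

```python
# Parity-check matrix H from Equation (1): columns d1..d6 followed by p1..p5.
H = [
    [0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
    [1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0],
    [0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1],
    [1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1],
]


def syndrome(H, word):
    """Evaluate every parity equation (one per row of H) over GF(2).

    Row r is the XOR of the bits whose column holds a 1; row 1 gives
    d2 ^ d3 ^ d4 ^ p1 ^ p3 as in Equation (2). An all-zero syndrome means
    the word satisfies every parity equation.
    """
    return [sum(h & b for h, b in zip(row, word)) % 2 for row in H]


# Illustrative codeword: d = (0,1,0,0,0,0) with p = (0,1,1,0,1).
codeword = [0, 1, 0, 0, 0, 0] + [0, 1, 1, 0, 1]
print(syndrome(H, codeword))   # [0, 0, 0, 0, 0]

corrupted = codeword[:]
corrupted[2] ^= 1              # flip d3
print(syndrome(H, corrupted))  # [1, 0, 1, 1, 0], i.e., column d3 of H
```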

2.2.2 LDPC Decoding in Flash Memory.

Currently, LDPC codes have been widely applied in flash memory, and there have been many research studies focusing on the application of LDPC codes in flash memory [13, 14, 29, 30, 52, 55, 56].

In flash memory, LDPC decoding methods fall into two types: hard-decision decoding and soft-decision decoding. As shown in Figure 3, hard-decision decoding places a single hard-decision sensing level between each pair of adjacent states and obtains the initial information by comparing the \(V_{th}\) of each flash cell with that level. Soft-decision decoding, in contrast, applies more sensing levels to obtain probability information for flash cells. In the uniform sensing strategy, sensing levels are uniformly distributed across the threshold voltage distribution. However, current practice mostly follows a nonuniform sensing strategy [12], in which more sensing levels are placed in the overlapping region of two adjacent voltage states, because this region is prone to read errors and therefore requires higher sensing precision. Specifically, we first identify the boundaries of the overlapping region according to Equation (3), where \({B}^{(i)}_{l}\) and \({B}^{(i)}_{r}\) represent the left and right boundaries of the overlapping region, and \({p}^{(i)}\) and \({p}^{(i+1)}\) denote the probability densities of two adjacent voltage states at a given voltage. The size of the overlapping region is controlled by the ratio R between the densities of states \(P_{i}\) and \(P_{i+1}\): at the left boundary, \(P_{i}\) is R times as likely as \(P_{i+1}\), and at the right boundary, \(P_{i+1}\) is R times as likely as \(P_{i}\). In our experiments, R is set to 512. Next, within these boundaries, we gradually add soft-decision sensing levels on both sides of the hard-decision sensing level to improve the accuracy of LDPC decoding.

\begin{equation} \frac{p^{(i)}(B^{(i)}_{l})}{p^{(i+1)}(B^{(i)}_{l})} = \frac{p^{(i+1)}(B^{(i)}_{r})}{p^{(i)}(B^{(i)}_{r})} = R \tag{3} \end{equation}
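Under a Gaussian model of each state, Equation (3) can be solved numerically. The sketch below finds \(B_l\) and \(B_r\) by bisection on the log-density ratio between the two state means, assuming the ratio actually crosses R inside that interval. The means are P3 and P4 from Table 2 (P/E = 1); the standard deviation of 8 is a hypothetical value, since Table 2 lists only means.

```python
import math


def log_gauss_pdf(x, mu, sigma):
    """Natural log of the N(mu, sigma^2) density at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))


def bisect(f, lo, hi, iters=200):
    """Find a root of f in [lo, hi], assuming f(lo) and f(hi) differ in sign."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid  # root lies in the upper half
        else:
            hi = mid               # root lies in the lower half
    return 0.5 * (lo + hi)


def overlap_boundaries(mu_i, sigma_i, mu_j, sigma_j, ratio=512.0):
    """Solve Equation (3) for (B_l, B_r) between P_i (left) and P_{i+1} (right)."""
    log_r = math.log(ratio)
    # Left boundary: p_i(x) / p_{i+1}(x) = R.
    f_left = lambda x: log_gauss_pdf(x, mu_i, sigma_i) - log_gauss_pdf(x, mu_j, sigma_j) - log_r
    # Right boundary: p_{i+1}(x) / p_i(x) = R.
    f_right = lambda x: log_gauss_pdf(x, mu_j, sigma_j) - log_gauss_pdf(x, mu_i, sigma_i) - log_r
    return bisect(f_left, mu_i, mu_j), bisect(f_right, mu_i, mu_j)


# Usage: P3/P4 means from Table 2 with a hypothetical sigma of 8 for both states.
b_l, b_r = overlap_boundaries(86.3, 8.0, 126.1, 8.0)
print(f"B_l = {b_l:.2f}, B_r = {b_r:.2f}")
```

With equal standard deviations the boundaries sit symmetrically around the midpoint of the two means, at a distance of \(\sigma^2 \ln R / (\mu_{i+1} - \mu_i)\); unequal deviations shift them, which bisection handles without a closed form.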

Fig. 3.

If the threshold voltage distribution is divided into more different regions by more sensing levels, the obtained initial information will be more accurate. However, more sensing levels mean longer sensing latency, so blindly adopting more sensing levels to ensure decoding accuracy is not advisable. The decoding latency also needs to be taken into consideration. The current progressive LDPC decoding method [56] integrates both hard-decision and soft-decision decoding, and the entire decoding process is illustrated in Figure 4.

Fig. 4.

In the initial decoding stage, the process starts with hard-decision decoding, where only one hard-decision sensing level is placed between adjacent states. If decoding fails, the approach switches to soft-decision decoding, which starts with two sensing levels, including the hard-decision sensing level. The number of soft-decision sensing levels (k in Figure 4, initially 2) is then gradually increased on both sides of the hard-decision sensing level until decoding succeeds. If the maximum number of soft-decision sensing levels (n in Figure 4) is reached and decoding still fails, a decoding failure is declared. This method works well when the RBER is low, because most pages are then decoded by the first, hard-decision attempt. However, as P/E cycles increase and retention time lengthens, the RBER rises and the method incurs significant cumulative delay. Specifically, when reading a page with a high RBER, the progressive LDPC decoding method repeatedly increases the number of sensing levels until the data are successfully decoded. Because each additional sensing level requires another read, this iterative process quickly accumulates read latency. Table 1 shows the LDPC read latency for different numbers of sensing levels [14]. With one sensing level, the latency is 85 \(\mu\)s, the lowest of all settings; with seven sensing levels, it increases to 229 \(\mu\)s. More importantly, the accumulated latency from one to seven sensing levels reaches 1,099 \(\mu\)s, a substantial overall increase.

Table 1.

Number of sensing levels	1	2	3	4	5	6	7
Latency (\(\mu\)s)	85	109	133	157	181	205	229
Accumulated latency (\(\mu\)s)	85	194	327	484	665	870	1099

Table 1. The Read Latency for LDPC Decoding with Different Numbers of Sensing Levels [14]
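The latency accumulation of progressive decoding can be sketched as a simple loop over Table 1. This is a minimal model under stated assumptions: `try_decode` is a hypothetical callback standing in for "sense with this many levels, then run the LDPC decoder", each retry adds exactly one sensing level (as Table 1's 1-to-7 progression suggests), and every attempt pays its full per-read latency.

```python
# Per-attempt read latency (microseconds) for 1..7 sensing levels, from Table 1.
LATENCY_US = [85, 109, 133, 157, 181, 205, 229]


def progressive_decode(try_decode, max_levels=7):
    """Progressive LDPC read: hard decision first, then more soft sensing levels.

    `try_decode(levels)` is a hypothetical callback that senses the page with
    `levels` sensing levels, runs the decoder, and returns True on success.
    Returns (success, levels_used, accumulated_latency_us).
    """
    total = 0
    for levels in range(1, max_levels + 1):
        total += LATENCY_US[levels - 1]  # each attempt pays its own read latency
        if try_decode(levels):
            return True, levels, total
    return False, max_levels, total


# Usage: a page that first decodes with four sensing levels.
print(progressive_decode(lambda levels: levels >= 4))  # (True, 4, 484)
```

A page that needs soft-decision decoding therefore pays for every failed attempt before it, which is why reducing the number of sensing levels a page needs translates directly into lower read latency.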

2.2.3 Log-likelihood Ratio Calculation.

LDPC decoding algorithms, such as the Sum-Product Algorithm and the Min-Sum Algorithm, are based on belief propagation. After receiving the initial input information, the decoder iteratively passes and updates messages between variable nodes and check nodes, ultimately correcting the errors. This initial information is the LLR, whose accuracy determines how quickly decoding converges.

\begin{equation} LLR(b_{i}) = \log\frac{\int_{R_{l}}^{R_{r}} \sum_{P_{j} \in S_{i}} p^{(P_{j})}(x)\,dx}{\int_{R_{l}}^{R_{r}} \sum_{P_{j} \in P} p^{(P_{j})}(x)\,dx - \int_{R_{l}}^{R_{r}} \sum_{P_{j} \in S_{i}} p^{(P_{j})}(x)\,dx} \tag{4} \end{equation}

Once sensing levels divide the threshold voltage distribution into multiple regions, we can determine which region a flash cell with threshold voltage \(V_{th}\) falls into, as described in Section 2.2.2. If the cell falls into the region \((R_{l}, R_{r})\), where \(R_{l}\) and \(R_{r}\) are two adjacent sensing levels, the LLR of each bit \(b_i\) of the cell is calculated by Equation (4) [12]. Here, P is the set of all voltage states, \(P_j\) denotes the \(j\)th voltage state in P (for a TLC cell, \(j \in \lbrace 0, 1, 2, 3, 4, 5, 6, 7 \rbrace\)), \(p^{(P_j)}(x)\) is the probability density function of the threshold voltage of state \(P_j\), and \(S_{i}\) is the set of states whose \(i\)th bit is 0. When a specific page is read, an LDPC decoding algorithm decodes the LLR values calculated for the entire page; if decoding succeeds, the desired data are obtained.
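Equation (4) can be evaluated directly when each state is modeled as a Gaussian, since each integral reduces to a difference of CDF values. The sketch below assumes equiprobable states, a hypothetical Gray code in which only the LSB column (bit 0 for P3 to P6) follows Section 2.1, and a small floor `tiny` that keeps the logarithm finite when a region carries no numerical mass for one bit value. The equally spaced means in the usage line are hypothetical.

```python
import math

# Hypothetical Gray code (LSB, CSB, MSB) for P0..P7; only the LSB column is
# fixed by the text (bit 0 for P3..P6). S_i = states whose i-th bit is "0".
GRAY = ["111", "110", "100", "000", "010", "011", "001", "101"]
BIT_INDEX = {"LSB": 0, "CSB": 1, "MSB": 2}


def gauss_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def region_mass(mu, sigma, r_l, r_r):
    """Integral of one state's Gaussian pdf over the sensing region (r_l, r_r)."""
    return gauss_cdf(r_r, mu, sigma) - gauss_cdf(r_l, mu, sigma)


def region_llr(page, r_l, r_r, means, sigmas, tiny=1e-300):
    """Equation (4): log of P(bit = 0) over P(bit = 1) inside (r_l, r_r)."""
    b = BIT_INDEX[page]
    masses = [region_mass(m, s, r_l, r_r) for m, s in zip(means, sigmas)]
    zero = sum(mass for mass, code in zip(masses, GRAY) if code[b] == "0")  # S_i
    total = sum(masses)                                                     # all of P
    return math.log(max(zero, tiny) / max(total - zero, tiny))


# Usage: hypothetical equally spaced states; regions around the P2/P3 boundary.
means, sigmas = [40.0 * k for k in range(8)], [8.0] * 8
for region in [(75.0, 85.0), (95.0, 105.0), (115.0, 125.0)]:
    print(region, round(region_llr("LSB", *region, means, sigmas), 2))
```

A region centered on the P2/P3 crossing yields an LLR near zero (the bit is uncertain), while regions near either state's mean yield large-magnitude LLRs, which is exactly the confidence information the decoder exploits.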

2.3 Motivation

LDPC technology is widely used to address the reliability and performance issues of NAND, and its decoding process relies on the threshold voltage model. However, with stacking technology, 3D NAND introduces some interference factors that were not present in planar NAND.

Early Retention Loss. To improve programming speed, the tunnel oxide layer of 3D NAND is made thinner [41], and because of the stacked structure, cells on the same wordline in 3D NAND share a charge trap layer, which makes charge leakage more likely [10]. Charge leakage already occurs within 2 hours after programming, lowering the optimal read reference voltage of the cells. The RBER of 3D NAND increases rapidly, rising by an order of magnitude within 3 hours and by another order of magnitude within 11 days [37]. This phenomenon does not occur in planar NAND [9, 38].

Layer-to-layer Process Variation. Because of technological limitations, the channel holes that connect different layers of 3D NAND cannot be fabricated with consistent shapes and sizes. As a result, the mean and variance of the threshold voltage distribution of flash cells vary significantly as the number of layers increases [20, 40, 43]. The RBER of the strongest and weakest layers can differ by an order of magnitude.

Retention Interference. In 3D NAND, flash cells on the same wordline share the same charge trap layer. When two adjacent cells have different threshold voltages, charge leaks from the cell with the higher threshold voltage to the cell with the lower threshold voltage, changing the threshold voltage distributions of both cells. In MLC NAND chips, over 24 days, the average threshold voltage shift of cells whose adjacent cells are in a higher-voltage state differs from that of cells whose adjacent cells are in a lower-voltage state by \(5\%\) to \(20\%\) [37].

These factors make the threshold voltage distribution of 3D NAND more complex, and previous studies have indicated that it should follow a skewed distribution [32, 34]. However, traditional modeling methods typically use existing distribution models or consider a single interference factor when establishing the threshold voltage model [3, 42, 46, 47, 57, 58], which limits the modeling accuracy they can achieve. Considering the complexity of 3D flash memory and the limitations of existing modeling methods, this article investigates the disparities between the theoretical threshold voltage distribution model and the actual threshold voltage distribution in flash memory.

3 Characterizing LDPC Performance in 3D NAND

In this section, we conduct a series of experiments to investigate the decoding performance of LDPC on real 3D NAND flash devices.

3.1 Characterization Methodology

3.1.1 The Settings of the Characterized Strategies.

In SSD products equipped with LDPC, the LLR values are predefined based on a modeled voltage distribution of NAND flash. We call this approach the Modeling-based method; it reflects what SSDs use in practice. For comparison, we introduce an Ideal case that uses the ground-truth voltage distribution to tune the LDPC decoding parameters, which is infeasible in normal use. We mainly compare the LDPC decoding performance obtained with the Modeling-based and Ideal threshold voltage distributions. This subsection describes how these two \(V_{th}\) distributions are obtained.

Ideal case: In the first experimental method, decoding data in flash chips requires the real threshold voltage distribution of the 3D NAND flash, from which the LLR values are obtained. Only with this distribution can we divide the voltage range into regions based on the hard-decision and soft-decision sensing levels and obtain the LLR of each region. Therefore, we use the read-retry method [7] to sweep each read reference voltage in fine steps over a certain range when reading a given flash block. From the results of reading at all reference voltages within the range, we compute the threshold voltage distribution of a specific wordline, shown in Figure 5, where the bar chart depicts the Ideal voltage distribution. Although the complete distribution of the erased state cannot be obtained because of its wide voltage range, the erased-state cells that are not counted lie outside the overlapping region between adjacent states and therefore do not affect decoding. Notably, although the Ideal case yields the exact data distribution through extensive testing, its cost makes it impractical in real-world applications; we present it solely for experimental comparison, to underscore the limitations of other approaches. Based on the obtained Ideal threshold voltage distribution, we divide the voltage range into multiple quantization regions using hard-decision or soft-decision sensing levels and calculate the LLR of each region according to Equation (4). To decode a specific page on a wordline, we then only need to compare the voltage of each flash cell with the sensing levels of that page to determine its region. This yields the LLR values for the entire page, which are used directly in the subsequent LDPC decoding.
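The text does not spell out how the read-retry results become a distribution; one common approach is sketched here as an assumption. Reading at a reference voltage reveals how many cells lie below it, so a sweep of ascending references gives an empirical CDF, and differencing consecutive counts gives the histogram. The `sense` function is a hypothetical stand-in for a chip read, and the cell voltages are made up; the cell at 9.0 lies beyond the sweep and, like the uncounted erased-state cells above, is simply absent from the histogram.

```python
def sense(vths, ref):
    """Hypothetical chip read at `ref`: a cell reads as 1 if its Vth is below ref."""
    return [1 if v < ref else 0 for v in vths]


def vth_histogram(vths, refs):
    """Rebuild a Vth histogram from reads at ascending reference voltages.

    counts[k] is the number of cells below refs[k] (an empirical CDF); the
    difference of consecutive counts is the number of cells whose Vth lies
    in [refs[k], refs[k + 1]).
    """
    counts = [sum(sense(vths, r)) for r in refs]
    return [hi - lo for lo, hi in zip(counts, counts[1:])]


# Usage with made-up cell voltages and a sweep from 0 to 5 in unit steps.
print(vth_histogram([1.5, 2.5, 2.7, 4.1, 9.0], [0, 1, 2, 3, 4, 5]))  # [0, 1, 2, 0, 1]
```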

Fig. 5.

Modeling-based method: In the second experimental method, we likewise need a threshold voltage distribution model before decoding. Existing modeling methods, such as the Gaussian-based Model, the Normal-Laplace-based Model, and the Student’s t-based Model [35], are static distribution models built from test data of real flash chips. Although the Gaussian-based Model may be slightly less accurate than the other two, it is simple to compute and does not require extensive computational resources. It is therefore a suitable choice for decoding analysis and has been adopted in recent literature [11, 49, 54, 55]. To construct theoretical threshold voltage distribution models under different P/E cycles, retention times, and read disturbances, we use real test data from flash chips under these scenarios. By calculating the mean and standard deviation of each state in each scenario (partial results are listed in Table 2), we build the corresponding theoretical threshold voltage distribution models, as shown in Figure 5. To maximize model accuracy, we fine-tune the variance of the theoretical model to minimize its Kullback–Leibler (KL) divergence [35, 36] from the actual threshold voltage distribution. Table 3 lists the KL divergence between the threshold voltage distribution of Wordline 1300 and the Modeling-based model, which is the minimum value after optimization. The KL divergences of all wordlines are tuned to their minimum in the same way.

Table 2.

P/E	P0	P1	P2	P3	P4	P5	P6	P7
1	\(-\)40.3	13.4	52.1	86.3	126.1	162.0	204.4	249.3
1,000	\(-\)39.7	12.1	50.6	84.9	124.9	160.8	203.5	248.7
3,000	\(-\)41.1	11.7	50.0	84.1	123.8	159.4	201.9	246.9
5,000	\(-\)40.4	11.8	50.1	84.2	123.7	159.3	201.7	246.8

Table 2. The Mean of Theoretical Threshold Voltage Distribution Model When the Baking Time is 9 Hours

Table 3.

P0	P1	P2	P3	P4	P5	P6	P7
0.155	0.089	0.024	0.139	0.198	0.178	0.656	1.197

Table 3. The KL Divergence between the Threshold Voltage Distribution of Wordline 1300 and the Modeling-based Model
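The per-state fitting step described above (mean from measured data, variance tuned to minimize the KL divergence) can be sketched as follows. The details are assumptions: the KL direction is taken as KL(measured ‖ model) over histogram bins, the variance is tuned by a simple grid search rather than the authors' unspecified procedure, and the histogram is synthetic.

```python
import math


def gaussian_bin_probs(edges, mu, sigma):
    """Mass of N(mu, sigma^2) in each bin, renormalized to the binned range."""
    cdf = [0.5 * (1.0 + math.erf((e - mu) / (sigma * math.sqrt(2.0)))) for e in edges]
    masses = [hi - lo for lo, hi in zip(cdf, cdf[1:])]
    total = sum(masses)
    return [m / total for m in masses]


def kl_divergence(p, q, eps=1e-12):
    """Discrete KL(p || q); empty bins of p contribute nothing."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def fit_state(edges, counts, sigma_grid):
    """Fit one state's Gaussian: histogram mean, sigma chosen by minimum KL."""
    n = sum(counts)
    p = [c / n for c in counts]
    centers = [(lo + hi) / 2 for lo, hi in zip(edges, edges[1:])]
    mu = sum(pi * x for pi, x in zip(p, centers))

    def kl_for(sigma):
        return kl_divergence(p, gaussian_bin_probs(edges, mu, sigma))

    best_sigma = min(sigma_grid, key=kl_for)
    return mu, best_sigma, kl_for(best_sigma)


# Usage: a synthetic histogram from N(100, 5^2); the grid search recovers sigma = 5.
edges = [70.0 + 2.0 * k for k in range(31)]
counts = [round(1e6 * p) for p in gaussian_bin_probs(edges, 100.0, 5.0)]
print(fit_state(edges, counts, [3.0, 4.0, 5.0, 6.0, 7.0]))
```

On real chip data the Gaussian cannot match skewed or heavy-tailed states exactly, so the minimum KL stays well above zero, as the larger values for P6 and P7 in Table 3 suggest.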

Figure 5 plots the fitted Modeling-based model together with the Ideal voltage distribution: the bar chart represents the Ideal voltage distribution, and the line chart represents the Modeling-based model. Because of inter-layer variations in flash memory, different layers of the same block also have somewhat different voltage distributions. Therefore, when fitting the theoretical model, we exclude some layers that are strongly affected by the layered structure of 3D NAND flash and average the model parameters of the remaining wordlines to create a single fitted model. As shown in Figure 5, this fitted model matches the voltage distributions of different wordlines to different degrees; for example, state 7 (P7) is fitted better in Figure 5(a) than in Figure 5(b). We then use this unified Modeling-based threshold voltage distribution model to decode all pages in an entire block.

3.1.2 Experimental Setup.

The experiments consist of two main procedures. First, we use MATLAB to encode random bit sequences into QC-LDPC codewords with a code rate of 8/9, following the settings in recent literature [11, 19, 54]. A key property of LDPC codes is that their decoding performance is tightly coupled to the accuracy of the LLR information obtained through sensing. Our optimization does not target the LDPC decoding performance of a particular code rate; rather, it improves the accuracy of the LLR information, which is beneficial regardless of the code rate.

Second, we use a 3D NAND flash testing platform to write the generated codewords into all pages of a specific block of a 3D TLC flash chip. The main flash chip used for testing has 176 layers per block, with 8 wordlines per layer, resulting in 1,408 wordlines per block. For TLC chips, each wordline contains 3 pages, so each block has a total of 4,224 pages, each 18 KB in size. After the chip is subjected to external interference such as P/E cycling, retention time, and read disturbance, we use MATLAB to decode and analyze its data. The maximum number of sensing levels is set to 7. To study the decoding performance of LDPC in 3D NAND flash memory under different situations, we mainly vary two sets of parameters. First, according to the Arrhenius model with an activation energy \(E_a\) of 1.1 eV, baking for 2.19 hours at 120 \(^{\circ}\)C is equivalent to a retention time of 1 year at 40 \(^{\circ}\)C [33].
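The 2.19-hour figure follows from the Arrhenius acceleration factor \(AF = \exp\left(\frac{E_a}{k}\left(\frac{1}{T_{use}} - \frac{1}{T_{bake}}\right)\right)\) with temperatures in kelvin. The sketch below reproduces it and the conversions in Table 4; the choice of a 365.25-day year is an assumption (a 365-day year gives the same rounded bake time but shifts one Table 4 entry by 0.01).

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K
HOURS_PER_YEAR = 365.25 * 24         # assumption: 365.25-day year


def acceleration_factor(ea_ev, t_use_c, t_bake_c):
    """Arrhenius acceleration of baking at t_bake_c relative to storage at t_use_c."""
    t_use = t_use_c + 273.15
    t_bake = t_bake_c + 273.15
    return math.exp(ea_ev / BOLTZMANN_EV_PER_K * (1.0 / t_use - 1.0 / t_bake))


af = acceleration_factor(1.1, 40.0, 120.0)  # roughly 4.0e3
print(f"bake hours per year at 40 C: {HOURS_PER_YEAR / af:.2f}")  # 2.19
for bake_h in (1, 3, 5, 7, 9, 11, 13):
    print(bake_h, "h ->", round(bake_h * af / HOURS_PER_YEAR, 2), "years")
```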

Therefore, for blocks that have undergone 0, 1k, 3k, and 5k P/E cycles, we apply a series of baking times (from 1 to 11 hours) at 120 °C to accelerate the increase in RBER, while keeping the read count at 0. The corresponding retention times at 40 °C are listed in Table 4. On the other hand, for blocks that have undergone 1, 1k, 3k, and 5k P/E cycles, we apply a series of read disturbances with read counts of 1k, 3k, 5k, 7k, 9k, 11k, and 13k, while keeping the baking time at 0 hours. Moreover, as described in Figure 6, we configure four different initial read voltages for decoding in the overlapping region between adjacent states \(P_i\) and \(P_{i+1}\), including a default read voltage (called “DRV” in this article) and an optimal read voltage (called “ORV” in this article). The DRV is the read voltage initially set within the chip; it remains unchanged regardless of the disturbances the chip undergoes. The ORV lies at the intersection of the two adjacent voltage states and therefore yields the lowest RBER in the overlapping region; in the Ideal case, its value is determined through exhaustive experiments. We define the best voltage offset (called “BVO” in this article) as the distance between these two voltages, obtained by subtracting the DRV from the ORV, and use it to derive the remaining two initial read voltages: “DRV + 0.5 \(\times\) BVO” and “ORV + 0.5 \(\times\) BVO.” Note that “ORV + 0.5 \(\times\) BVO” equals “DRV + 1.5 \(\times\) BVO.” After a decoding failure, the initial read voltage must be adjusted through read retry to reduce the RBER and thereby increase the probability of successful decoding.
However, during actual data reading, it is often difficult to obtain the optimal read voltage precisely, and the initial read voltage used typically deviates from it by some distance. Therefore, we additionally set “DRV + 0.5 \(\times\) BVO” and “ORV + 0.5 \(\times\) BVO” to analyze and compare the impact of different initial read voltage positions on LDPC decoding.
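To make the relationship among the four initial read voltages concrete, the following minimal Python sketch derives them from a given DRV and ORV, expressing each as DRV + x × BVO. The function name and the voltage values are hypothetical and in arbitrary units; on a real chip, the DRV comes from the chip's default settings and the ORV from the characterization experiments.

```python
def initial_read_voltages(drv: float, orv: float) -> dict:
    """Return the four initial read voltages, each expressed as DRV + x * BVO."""
    bvo = orv - drv  # best voltage offset (BVO): ORV minus DRV
    multipliers = {
        "DRV": 0.0,
        "DRV + 0.5 x BVO": 0.5,
        "ORV": 1.0,              # identical to DRV + 1.0 x BVO
        "ORV + 0.5 x BVO": 1.5,  # identical to DRV + 1.5 x BVO
    }
    return {name: drv + x * bvo for name, x in multipliers.items()}


# Hypothetical voltages in arbitrary units: DRV = 100, ORV = 120, so BVO = 20.
for name, voltage in initial_read_voltages(100.0, 120.0).items():
    print(f"{name:<16} {voltage:6.1f}")
```

With these inputs, the four voltages are 100, 110, 120, and 130, which shows directly that “ORV + 0.5 × BVO” coincides with “DRV + 1.5 × BVO.”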

Table 4. The Retention Time at 40 °C Corresponding to Different Baking Hours at 120 °C

Baking Time at 120 °C (hours)	1	3	5	7	9	11	13
Retention Time at 40 °C (years)	0.46	1.37	2.28	3.20	4.11	5.02	5.94
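The baking-to-retention equivalence follows from the standard Arrhenius acceleration factor, AF = exp((E_a / k) × (1/T_use − 1/T_bake)), with temperatures in kelvin and k the Boltzmann constant. The following Python sketch checks that E_a = 1.1 eV gives roughly 2.19 baking hours at 120 °C per year at 40 °C, then reproduces Table 4 by dividing the baking hours by 2.19. The 365-day year and the helper names are assumptions of this sketch.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def acceleration_factor(ea_ev: float, t_use_c: float, t_bake_c: float) -> float:
    """Arrhenius acceleration factor of baking at t_bake_c relative to t_use_c."""
    t_use_k = t_use_c + 273.15
    t_bake_k = t_bake_c + 273.15
    return math.exp(ea_ev / BOLTZMANN_EV_PER_K * (1.0 / t_use_k - 1.0 / t_bake_k))


def bake_hours_per_year(ea_ev: float = 1.1, t_use_c: float = 40.0,
                        t_bake_c: float = 120.0) -> float:
    """Baking hours at t_bake_c equivalent to one 365-day year at t_use_c."""
    return 365 * 24 / acceleration_factor(ea_ev, t_use_c, t_bake_c)


def equivalent_retention_years(bake_hours: float, hours_per_year: float = 2.19) -> float:
    """Retention time at 40 °C for a bake at 120 °C, using the 2.19 h/year equivalence."""
    return bake_hours / hours_per_year


print(f"{bake_hours_per_year():.2f} h at 120 °C ~ 1 year at 40 °C")  # prints 2.19 h ...
for hours in (1, 3, 5, 7, 9, 11, 13):
    print(hours, f"{equivalent_retention_years(hours):.2f}")
```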

Fig. 6.

Then, based on the chip data under different interference factors, we conduct the decoding analysis using the two methods. The analysis results are presented in the following section.

3.2 Characterization Results and Analysis

In this section, we present and analyze the experimental results from various perspectives and summarize our findings.

3.2.1 LDPC Hard vs. Soft Decoding.

First, we present an experimental analysis of LDPC decoding using two threshold voltage distributions: the distribution collected from real-chip testing of the flash memory (the Ideal voltage distribution) and the distribution modeled by the Gaussian-based model (the Modeling-based model). For pages with a low bit error rate, hard-decision decoding is sufficient to read the data successfully, whereas pages with more errors may require soft-decision decoding. In actual testing with different error sources, we find that hard-decision decoding succeeds in most cases. Soft-decision decoding is required only for MSB pages under excessive read disturbance and for LSB pages under long retention time; for CSB pages, it rarely happens. This phenomenon can be explained by the error sources. The two major interference factors in 3D NAND flash, retention time and read disturbance, affect different overlapping regions of the threshold voltage distribution. Errors caused by longer retention time mainly concentrate in the overlapping region of P6 and P7 (which corresponds to a read voltage required for reading the LSB), while errors caused by intensified read disturbance are distributed in the overlapping region of P0 and P1 (which corresponds to a read voltage required for reading the MSB). Therefore, for the different read disturbance conditions, our analysis focuses solely on the MSB, which has a relatively high bit error rate. Similarly, for the different retention time conditions, we focus only on the LSB, which suffers extensive retention errors.

Then, we decode the LSB and MSB separately for both scenarios using the Ideal threshold voltage distribution and the Modeling-based threshold voltage distribution model, and we conduct a statistical analysis of their decoding results. In LDPC decoding, hard-decision decoding has much lower latency than soft-decision decoding; however, for data with higher error rates, only soft-decision decoding can read the data correctly. We therefore report how the percentage of pages in the entire block that can be decoded with hard-decision decoding changes as the interference intensifies, under the two methods and the four initial read voltages. For the Ideal case, Figures 7(a), 7(b), and 7(c) show that as the baking time increases, the proportion of pages in the block that can be decoded with hard-decision decoding decreases. This downward trend is particularly evident with the “DRV” initial read voltage, because the RBER keeps rising with retention time, making hard-decision decoding increasingly likely to fail and necessitating soft-decision decoding. For the other three initial read voltages, the proportion of pages using hard-decision decoding decreases only slightly when the baking time is between 7 and 9 hours. Figure 7(d) shows the proportion of pages correctly read by hard-decision decoding under read disturbance; similarly, the proportion decreases as the read count increases, and we present only the results for 5k P/E cycles. For the Modeling-based method, Figure 8 shows that the percentage of pages using hard-decision decoding follows a trend similar to that of the Ideal case. The main difference is that, with the Modeling-based method, the decoding performance differs significantly between “DRV + 0.5 \(\times\) BVO” and “ORV + 0.5 \(\times\) BVO” as initial read voltages.
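The read flow behind these statistics can be sketched as follows, assuming that decoding starts with one sensing level (hard-decision decoding) and that each failed attempt adds one sensing level until decoding succeeds or the maximum of 7 is reached. In this minimal Python sketch, `try_decode` is a hypothetical callback standing in for sensing the page with a given number of levels and running the LDPC decoder, and `hard_decision_ratio` computes the kind of per-block proportion reported in Figures 7 and 8.

```python
from typing import Callable, Optional, Sequence

MAX_SENSING_LEVELS = 7  # maximum number of sensing levels used in the characterization


def read_with_escalation(try_decode: Callable[[int], bool],
                         max_levels: int = MAX_SENSING_LEVELS) -> Optional[int]:
    """Return the number of sensing levels needed to decode a page, or None if it fails.

    One sensing level corresponds to hard-decision decoding; every further level is a
    soft-decision attempt. `try_decode(n)` is a hypothetical callback that senses the
    page with n levels and reports whether LDPC decoding converged.
    """
    for levels in range(1, max_levels + 1):
        if try_decode(levels):
            return levels
    return None  # uncorrectable within max_levels sensing levels


def hard_decision_ratio(levels_per_page: Sequence[Optional[int]]) -> float:
    """Fraction of pages decoded by hard-decision decoding alone (one sensing level)."""
    if not levels_per_page:
        raise ValueError("no pages given")
    return sum(1 for n in levels_per_page if n == 1) / len(levels_per_page)


# Toy example: a page that needs three sensing levels before the decoder converges.
print(read_with_escalation(lambda n: n >= 3))  # 3
print(hard_decision_ratio([1, 1, 2, None]))     # 0.5
```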

Fig. 7.

Fig. 8.

In summary, regardless of whether the Ideal case or the Modeling-based method is used, soft-decision decoding occurs frequently only when interference factors are extremely severe. In such cases, if the default read voltages are used, the voltage shift caused by interference increases the RBER and triggers more soft-decision decoding, which significantly increases the decoding latency. However, if an initial read voltage closer to the optimal read voltage, such as “DRV + 0.5 \(\times\) BVO,” “ORV,” or “ORV + 0.5 \(\times\) BVO,” is used, the RBER is significantly reduced, and most wordlines can be decoded successfully with hard-decision decoding alone. This means that the choice of initial read voltage significantly affects LDPC decoding performance. With the optimal read voltage, the data can be correctly decoded by LDPC hard-decision decoding even under high P/E cycles, long retention time, and severe read disturbance. Conversely, when the initial read voltage is far from the optimal read voltage, the data is more likely to be readable only with soft-decision decoding, and in some cases decoding fails and the data cannot be retrieved. Therefore, finding the optimal read voltage, or an initial read voltage close to it, will significantly reduce the latency caused by soft-decision decoding.

3.2.2 Gap between Modeling-based Method and Ideal Case.

Because the optimal read voltage calculated by the Modeling-based method can deviate from the actual optimal value, and LDPC performance depends closely on the accuracy of the LLR values, the LDPC performance of the Modeling-based method deteriorates. To quantify the effect of the differences between the Ideal voltage distribution and the Modeling-based voltage distribution model, we compare the proportion of pages that use soft-decision decoding. As shown in Figure 9, in all scenarios except those using “ORV + 0.5 \(\times\) BVO” as the initial read voltage, the proportion of pages using soft-decision decoding is higher with the Modeling-based voltage distribution model than with the Ideal voltage distribution. This is because the Modeling-based voltage distribution model cannot fit the actual voltage distribution of each wordline as precisely as the Ideal voltage distribution does. As a result, the “ORV” of the Modeling-based model cannot minimize the RBER of the actual data as effectively as the “ORV” of the Ideal voltage distribution. When “ORV + 0.5 \(\times\) BVO” is used as the initial read voltage, the Modeling-based method exhibits slightly better decoding performance than the Ideal case: because “ORV” lies at different positions in the two models, “ORV + 0.5 \(\times\) BVO” in the Modeling-based method reduces the RBER more significantly, leading to better decoding performance. To better explore the discrepancy caused by the Modeling-based model’s inability to replicate the actual voltage distribution precisely, we concentrate on the differences in decoding performance between the two methods when “ORV” is used as the initial read voltage. As shown in Figures 9(c) and 9(g), as the P/E cycles increase, the proportion of soft-decision decoding rises for the Modeling-based method, while it increases only gradually in the Ideal case.
This further widens the performance gap between the two methods.
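To illustrate how a Gaussian-based model turns an estimated distribution into an ORV and LLR values, and why a misestimated state mean shifts the ORV, the following minimal Python sketch models two adjacent states as Gaussians. It assumes equal priors, a single PDF crossing between the two state means, and LLRs defined as ln(P(region | lower state) / P(region | upper state)). The means, standard deviations, and helper names are hypothetical and are not the article's fitted model.

```python
import math


def normal_pdf(v: float, mu: float, sigma: float) -> float:
    return math.exp(-0.5 * ((v - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def normal_cdf(v: float, mu: float, sigma: float) -> float:
    return 0.5 * (1.0 + math.erf((v - mu) / (sigma * math.sqrt(2.0))))


def optimal_read_voltage(mu_a, sd_a, mu_b, sd_b, iters=60):
    """Crossing of the two state PDFs between their means (mu_a < mu_b), by bisection.
    With equal priors, reading at this point minimizes the expected RBER."""
    def diff(v):
        return normal_pdf(v, mu_a, sd_a) - normal_pdf(v, mu_b, sd_b)

    lo, hi = mu_a, mu_b
    if diff(lo) <= 0 or diff(hi) >= 0:
        raise ValueError("no single PDF crossing between the state means")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if diff(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def region_llr(lo, hi, mu_a, sd_a, mu_b, sd_b, eps=1e-300):
    """LLR of cells sensed in [lo, hi): ln(P(region | state a) / P(region | state b))."""
    pa = normal_cdf(hi, mu_a, sd_a) - normal_cdf(lo, mu_a, sd_a)
    pb = normal_cdf(hi, mu_b, sd_b) - normal_cdf(lo, mu_b, sd_b)
    return math.log(max(pa, eps) / max(pb, eps))  # eps guards against log(0)


# Hypothetical states: actual P0 ~ N(0, 1), P1 ~ N(4, 1); the model places P0 at 0.5.
true_orv = optimal_read_voltage(0.0, 1.0, 4.0, 1.0)
model_orv = optimal_read_voltage(0.5, 1.0, 4.0, 1.0)
print(f"{true_orv:.2f} {model_orv:.2f}")  # 2.00 2.25
print(region_llr(-math.inf, true_orv, 0.0, 1.0, 4.0, 1.0))  # positive: favors P0
```

In this toy setting, a 0.5-unit error in the estimated P0 mean moves the modeled ORV by 0.25 units, so cells between the two crossings receive LLRs computed from the wrong boundary.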

Fig. 9.

Based on the above results, we observe that when the optimal read voltage is used as the initial read voltage, both the Ideal case and the Modeling-based method can significantly reduce the number of sensing levels. However, when the flash memory chip experiences more severe interference, adjacent voltage states overlap more, and the data may not be decoded correctly even with the optimal read voltage. To further analyze the performance differences between the Ideal case and the Modeling-based method, we conduct a series of similar experiments under more severe interference conditions. All chips are baked for 13 hours at 120 °C; for each of three P/E cycle counts (3k, 5k, and 7k), we apply five read-count conditions ranging from 13k to 21k. Under the combined influence of these interference factors, the voltage states overlap even more. Here, the most severe overlap is between P0 and P1, so we focus our analysis on the MSB.

Then, we analyze the decoding performance of the two methods under these conditions, using the optimal read voltage as the initial read voltage to decode the MSB data. After decoding the data in the real flash chips under each condition with both methods, we calculate the total number of sensing levels required to decode an entire block of data, as shown in Figure 10. As the P/E cycles and read count increase, the number of sensing levels required for successful decoding keeps rising, which means that the latency of correctly reading data from flash memory increases. Meanwhile, the Ideal case requires fewer sensing levels than the Modeling-based method under all conditions. At 3k P/E cycles (Figure 10(a)), the Modeling-based method requires 46.9% more sensing levels than the Ideal case; at 5k (Figure 10(b)) and 7k (Figure 10(c)) P/E cycles, it requires 45.4% and 13.9% more, respectively.

Fig. 10.

Specifically, as shown in Figure 11, the Modeling-based method results in significantly more pages that fail to decode than the Ideal case. In the three P/E cycle scenarios, the Modeling-based method has 76.3%, 63.7%, and 7.1% more failed pages than the Ideal case, respectively. This difference arises because the Ideal case provides a more accurate model and more accurate initial probability information, resulting in better decoding performance. It is worth noting that as the P/E cycles increase to 7k, the number of sensing levels required for decoding increases significantly, while the gap between the Ideal case and the Modeling-based method narrows. This is because at 7k P/E cycles the RBER of the data increases significantly, even exceeding the error correction capability of LDPC, which substantially increases the number of sensing levels; Figure 11(c) confirms this. Although the Ideal case provides more accurate LLRs, it cannot compensate for the limited error correction capability of LDPC. Nevertheless, a performance gap between the Ideal case and the Modeling-based method remains, so there is still room to optimize the Modeling-based method.

Fig. 11.

3.2.3 Impacts from Inter-layer Variations.

Due to the inter-layer variations in 3D NAND flash, different layers exhibit distinct characteristics, including different RBERs, optimal read voltages, and threshold voltage distributions. To better understand the usage of soft-decision decoding in flash memory, we further analyze scenarios with severe interference, such as baking times between 7 and 11 hours and read counts from 9,000 to 13,000, and present the differences in soft-decision decoding usage among wordlines in 3D NAND flash. As shown in Figures 12(a), 12(b), and 12(c), under different baking times, different wordlines require noticeably different numbers of sensing levels. When “DRV” is used as the initial read voltage, some wordlines cannot be decoded correctly within seven sensing levels, the maximum configured for the LDPC decoder, leading to a large number of decoding failures. However, when the initial read voltage is close to or equal to the optimal read voltage, most wordlines can be decoded successfully with hard-decision decoding. Similarly, Figure 12(d) shows that differences among wordlines also exist under read disturbance; we present only the results for 5k P/E cycles. Different layers exhibit varying tolerance to interference factors, resulting in distinct inter-layer variations. The decoding results above were obtained with the Ideal voltage distribution. For the Modeling-based voltage distribution model, we perform experiments with identical configurations, and the results are shown in Figure 13. With the Modeling-based method, the variations are smaller than in the Ideal case. For the same baking time or read count, the variation among wordlines increases with the P/E cycles. We conclude that even with the same initial read voltage, the two methods require different numbers of sensing levels.
Nevertheless, in both methods, most wordlines requiring more sensing levels are located at similar positions. This observation indicates that the differences between layers are responsible for the varying numbers of sensing levels required by each layer.
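Per-layer comparisons like those in Figures 12 and 13 amount to grouping per-wordline sensing levels by layer. The following minimal Python sketch does this for the tested chip geometry (8 wordlines per layer). It assumes that wordline w lies on layer w // 8 and that a failed wordline (None) is charged the full budget of 7 sensing levels; both are assumptions of the sketch, as are the helper names.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence

WORDLINES_PER_LAYER = 8  # tested chip: 176 layers x 8 wordlines = 1,408 wordlines per block
MAX_SENSING_LEVELS = 7   # failed wordlines are charged the full budget (assumption)


def sensing_levels_per_layer(levels_per_wordline: Sequence[Optional[int]]) -> Dict[int, int]:
    """Sum the sensing levels of each layer's wordlines.

    Assumes wordline w sits on layer w // WORDLINES_PER_LAYER; None marks a decoding failure.
    """
    totals: Dict[int, int] = defaultdict(int)
    for wl, levels in enumerate(levels_per_wordline):
        cost = MAX_SENSING_LEVELS if levels is None else levels
        totals[wl // WORDLINES_PER_LAYER] += cost
    return dict(totals)


def worst_layers(totals: Dict[int, int], k: int = 3) -> List[int]:
    """Layers with the highest total sensing levels, worst first."""
    return sorted(totals, key=lambda layer: totals[layer], reverse=True)[:k]


# Toy block of three layers: hard-decision only, one soft level, and all failures.
toy = [1] * 8 + [2] * 8 + [None] * 8
per_layer = sensing_levels_per_layer(toy)
print(per_layer)                   # {0: 8, 1: 16, 2: 56}
print(worst_layers(per_layer, 2))  # [2, 1]
```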

Fig. 12.

Fig. 13.

3.2.4 Abnormal Behaviors in Optimal Read Voltages.

In theory, using the optimal read voltage should achieve the best performance, because it yields the most accurate estimate of the probability information of the read data. However, Figure 9 shows that using “ORV + 0.5 \(\times\) BVO” as the initial read voltage requires less soft-decision decoding than using “ORV.” This phenomenon might be caused by disparities between the data estimated by the Modeling-based method and the actual data. The estimated optimal read voltages are derived from averaged data, whereas the actual optimal read voltages can differ considerably because of complex disturbances. This variation may contribute to the discrepancies in decoding performance between the Modeling-based method and the Ideal case. Therefore, we conduct additional tests with various initial read voltages and examine the total number of sensing levels required for the entire block. In Figure 14, the horizontal axis represents the value of x in the initial read voltage “DRV + \(x \times\) BVO.” With the Ideal voltage distribution, the fewest sensing levels are required when “ORV” is the initial read voltage (x = 1.0 in the figure). In contrast, the Modeling-based method achieves its best performance not at x = 1.0 but closer to x = 1.4. The best value of x may fluctuate under different interference conditions, typically ranging between 1 and 1.5.
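The sweep behind Figure 14 can be sketched as a one-dimensional search over x. In this minimal Python sketch, `toy_block_cost` is a hypothetical stand-in for decoding the whole block with initial read voltage DRV + x × BVO and summing the sensing levels; its minimum at x = 1.4 is built in purely for illustration and is not measured data.

```python
from typing import Callable, Iterable


def best_offset_factor(total_sensing_levels: Callable[[float], int],
                       candidates: Iterable[float]) -> float:
    """Return the x (in DRV + x * BVO) that minimizes the block's total sensing levels."""
    return min(candidates, key=total_sensing_levels)


# Candidate x values from 0.5 to 1.5 in steps of 0.1 (rounded to avoid float drift).
xs = [round(0.5 + 0.1 * i, 1) for i in range(11)]


def toy_block_cost(x: float) -> int:
    """Hypothetical block cost with a built-in minimum at x = 1.4 (illustration only)."""
    return 1408 + round(1000 * abs(x - 1.4))


print(best_offset_factor(toy_block_cost, xs))  # 1.4
```

On real data, each evaluation of the cost function is a full block decode, which is why the article performs this sweep offline rather than at read time.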

Fig. 14.

This result implies that, because of inter-layer variations, the optimal read voltage of the Modeling-based model cannot minimize the number of sensing levels required for decoding; instead, it needs a relative offset. When the Modeling-based voltage model deviates from the Ideal voltage distribution, as shown in Figure 5(b), the optimal read voltage between the P6 and P7 states may place some cells belonging to the P7 state on the left side of the read voltage. As a result, the LLR values of these cells become less accurate, leading to increased decoding latency. Therefore, when the Modeling-based voltage distribution model is used for wordline 1300, the optimal read voltage should be shifted to the left by a certain amount. Because the overlap of voltage states also varies across wordlines, the required offset of the optimal read voltage changes accordingly, and its magnitude should be adjusted based on the voltage distribution of each wordline.

3.2.5 Observation Summary.

Based on the experimental and statistical results mentioned above, we can draw the following findings.

Finding 1: In general, LDPC hard-decision decoding can read data correctly in most cases. In practice, however, soft-decision decoding is frequently triggered when reading with the default read voltages, whereas with the optimal read voltage it is rarely triggered.

Finding 2: A noticeable gap exists between the LDPC performance of the Modeling-based method and that of the Ideal case. Because the Ideal case uses the actual threshold voltage distribution, its decoding performance serves as the ideal reference. If the threshold voltage distribution estimated by the Modeling-based model is close to the actual distribution, its LDPC decoding performance will also be close to that of the Ideal case.

Finding 3: For 3D NAND flash memory, due to its inter-layer variation characteristics, the soft-decision decoding usage and the number of required sensing levels differ significantly among layers. As a result, a single unified Modeling-based voltage distribution model for the entire block may not accurately fit the actual voltage distribution of each wordline. Therefore, decoding performance can be improved by adopting more refined and adaptable models that account for these variations.

Finding 4: Because the Modeling-based method cannot accurately estimate the real voltage distribution in some cases, its estimated optimal read voltages deviate from the real ones. Therefore, LDPC decoding at the estimated optimal read voltages does not achieve the best performance, and reading with suitably offset read voltages can perform better.

4 Optimizing LDPC Performance in 3D NAND

In this section, we aim to bridge the performance gap between the Modeling-based method and the Ideal case. This becomes especially crucial when interference factors become complex and severe, making high LDPC decoding performance harder to achieve. To address this challenge, we introduce the ΔRV method, which calculates an offset value (Δ) for each layer to enhance LDPC decoding performance.

4.1 Analysis of Performance Gap

As elaborated in Section 3.2.2, the Modeling-based method requires more sensing levels for data decoding than the Ideal case. This is because the threshold voltage distribution estimated by the Modeling-based method cannot accurately match the actual threshold voltage distribution used in the Ideal case. Furthermore, due to the inter-layer variations inherent in 3D NAND flash memory, the actual threshold voltage distributions differ among wordlines within the same block, which makes it difficult to fit the voltage distributions of all wordlines with a single theoretical model. Hence, we analyze the common characteristics of the inter-layer variations. Since our experiments focus on decoding MSB data, our analysis of model differences is limited to the overlapping regions corresponding to the MSB (between P0 and P1 and between P4 and P5).

First, by comparing the voltage distributions obtained in the Ideal case and by the Modeling-based method, we classify the differences at the P0 and P1 states into two cases, while the P4 and P5 states differ little between the two distributions. In Case 1 (Figure 15(a)), the P0 state in the Modeling-based voltage distribution is shifted slightly to the right of the P0 state in the Ideal case, and the P1 state behaves similarly. In this case, the overlapping region between the P0 and P1 states in the actual data distribution is relatively small, so the corresponding RBER is also relatively low. The decoding results show that, compared with the Modeling-based method, the Ideal case allows some pages that originally required soft-decision decoding to be decoded with hard-decision decoding. In Case 2 (Figure 15(b)), only the P0 state estimated by the Modeling-based method is shifted slightly to the left of the P0 state in the Ideal voltage distribution, while the P1 states in both distributions have a similar shape. The overlapping region between the P0 and P1 states in the actual data distribution is substantial, leading to a relatively high RBER. Consequently, compared with the Modeling-based method, the Ideal case reduces the number of sensing levels required to decode certain pages, and it can also successfully decode pages that fail with the Modeling-based method.

Fig. 15.

Second, as found in Section 3.2.4, the theoretical optimal read voltage of the Modeling-based method does not yield the best decoding performance when used as the initial read voltage. Therefore, we analyze the two cases above to determine the read voltage that achieves the best decoding performance with the Modeling-based method. In Case 1, as shown in Figure 15, the intersection of the P0 and P1 states in the Modeling-based model lies to the right of the intersection in the Ideal voltage distribution. In this scenario, the Modeling-based model’s optimal read voltage assigns LLR values favoring the P0 state to some cells that actually belong to the P1 state. Shifting the optimal read voltage to the left by a certain distance can mitigate this effect. Similarly, in Case 2, better decoding results can be achieved by shifting the Modeling-based model’s optimal read voltage to the right. We then verify decoding separately for wordlines in these two cases. For the first case, we shift the optimal read voltage to the left by using “DRV + \(x \times\) BVO” with x smaller than 1. As shown in Figure 16(a), when x is set to 0.6, most wordlines require only one sensing level instead of two. For the second case, we shift the optimal read voltage to the right by using “DRV + \(x \times\) BVO” with x larger than 1. As shown in Figure 16(b), when x is set to 1.5, the decoding performance of most wordlines improves significantly. Adjusting x reduces the number of sensing levels more in the second case than in the first case.
This is why, in Section 3.2.4, when the same value of x is used in the initial read voltage “DRV + \(x \times\) BVO” for all wordlines, the decoding performance is best when x is greater than 1 rather than less than 1. However, to achieve the best results, x needs to be adjusted based on the voltage distribution characteristics of each wordline.

Fig. 16.

4.2 Offsetted Read Voltage Method

As presented above, LDPC decoding performance can be improved by adjusting the initial read voltage to counteract the deviations between the voltage distributions of the Modeling-based method and the Ideal case. Due to the inter-layer variations in 3D NAND flash memory, the optimal read voltage values must be adjusted for different layers. For each read voltage between two voltage states, the adjustment is determined by the conditions of the corresponding overlapping region, and the values of x are calculated for different scenarios to achieve proper tuning under various disturbances. It is worth noting that, in our experiments, the differences in voltage states between the Modeling-based method and the Ideal case do not change significantly across layers and disturbance cases, except in the overlapping regions between P0 and P1 and between P6 and P7. Therefore, we adjust the optimal read voltages only in these two overlapping regions. A specific page type, such as the MSB, corresponds to read voltages V1 and V5; since V5 lies in the P4/P5 region, which shows no significant difference, layer-specific adjustments for the MSB need to be made only for V1. Specifically, we quantify the offset by measuring the mean difference Δ between the P0 and P1 states in the Ideal voltage distribution and the Modeling-based voltage distribution according to Equation (5), where \(ut_i\) denotes the mean voltage of the \(P_i\) state in the Ideal voltage distribution and \(um_i\) denotes the mean voltage of the \(P_i\) state in the Modeling-based model. When the voltage state offset matches Case 1, as in Figure 17(a), the calculated Δ-value is negative, and a larger offset leads to a larger absolute value of Δ. Conversely, when the offset matches Case 2, as in Figure 17(b), the calculated Δ-value is positive.
Similarly, the absolute value of Δ grows with the offset. Thus, the Δ-value of the overlapping region between P0 and P1 identifies whether the shift belongs to Case 1 or Case 2.

\begin{equation} \Delta = (ut_0 - um_0) - (ut_1 - um_1) \tag{5} \end{equation}
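Equation (5) and the sign rule that separates Case 1 from Case 2 can be sketched in a few lines of Python. The helper names, the optional tolerance `tol`, and the mean values in the example are hypothetical; only the formula and the sign convention come from the text.

```python
def delta(ut0: float, um0: float, ut1: float, um1: float) -> float:
    """Equation (5): Delta = (ut0 - um0) - (ut1 - um1).

    ut_i / um_i are the mean voltages of state P_i in the Ideal and the
    Modeling-based distributions, respectively.
    """
    return (ut0 - um0) - (ut1 - um1)


def classify_shift(d: float, tol: float = 0.0) -> str:
    """Map the sign of Delta to the two cases described for the P0/P1 region."""
    if d < -tol:
        return "Case 1"   # modeled states lie right of the actual ones: move the read voltage left
    if d > tol:
        return "Case 2"   # modeled P0 lies left of the actual P0: move the read voltage right
    return "aligned"      # keep ORV as the initial read voltage


# Hypothetical means in arbitrary units.
print(classify_shift(delta(ut0=10.0, um0=13.0, ut1=50.0, um1=51.0)))  # Case 1 (Delta = -2)
print(classify_shift(delta(ut0=10.0, um0=8.0, ut1=50.0, um1=50.0)))   # Case 2 (Delta = 2)
```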

Fig. 17.

Therefore, we adjust the initial read voltage, i.e., the value of x in “DRV + \(x \times\) BVO,” based on the Δ-values of the different overlapping regions according to Equation (6), where \(\alpha\) and \(\beta\) are two parameters related to the flash characteristics. For our evaluated flash chips, we set \(\alpha\) and \(\beta\) to 1 and 0.1, respectively, for the overlapping region between P0 and P1. When the Δ-value of an overlapping region is 0, decoding with the Modeling-based method only needs “DRV + \(1 \times\) BVO,” i.e., “ORV,” as the initial read voltage for the cells in that region. If Δ is smaller than 0, x decreases below 1 accordingly, and if Δ is greater than 0, x increases above 1. As a result, the initial read voltage shifts to the left or right of “ORV.” When reading an MSB page, the initial read voltages V1 and V5 are adjusted based on the Δ-values calculated for the overlapping regions in which they lie. We then use the Modeling-based model to perform LDPC decoding after adjusting the optimal read voltage position for each wordline. Similarly, when decoding other pages such as the LSB and CSB, we calculate the Δ-values of their overlapping regions and adjust the optimal read voltages accordingly. If a page corresponds to multiple overlapping regions exhibiting different cases, the positions of multiple optimal read voltages can be adjusted simultaneously.

\begin{equation} x = \alpha + \beta \times \Delta \tag{6} \end{equation}
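Combining Equation (6) with the definition of the initial read voltage gives the adjusted voltage DRV + (α + β × Δ) × BVO. The following Python sketch uses α = 1 and β = 0.1 as stated for the P0/P1 region; the helper names and voltages are hypothetical, and because the article does not state the unit in which Δ is measured, the sketch simply assumes Δ is in whatever unit β was calibrated for. In the example BVO is positive, so x < 1 places the read voltage left of ORV and x > 1 places it to the right.

```python
def offset_factor(d: float, alpha: float = 1.0, beta: float = 0.1) -> float:
    """Equation (6): x = alpha + beta * Delta (alpha = 1, beta = 0.1 for the P0/P1 region)."""
    return alpha + beta * d


def adjusted_read_voltage(drv: float, orv: float, d: float,
                          alpha: float = 1.0, beta: float = 0.1) -> float:
    """Initial read voltage DRV + x * BVO, with x taken from Equation (6)."""
    bvo = orv - drv  # best voltage offset
    return drv + offset_factor(d, alpha, beta) * bvo


# Hypothetical values: DRV = 100, ORV = 120 (BVO = 20); Delta from Equation (5).
for d in (-4.0, 0.0, 5.0):
    x = offset_factor(d)
    print(f"Delta={d:+.0f}  x={x:.2f}  V={adjusted_read_voltage(100.0, 120.0, d):.1f}")
```

A Δ of 0 reproduces “ORV” exactly, while negative (Case 1) and positive (Case 2) values move the read voltage to the left and right of it, matching the behavior described above.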

A further question is how these values change under different error conditions. Based on the experiments in Section 3.2.2, we collect the Δ-values of the overlapping region between P0 and P1 for different wordlines under various interference conditions. To check whether the Δ-values exhibit similar characteristics under lot-to-lot and wafer-to-wafer variations, we test multiple sets of chips of different models and observe the trends in Δ-value variation. Figures 18 and 19 display the variation trends of the Δ-values in the overlapping region between P0 and P1 for two sets of chips, chip 1 and chip 2, respectively, under different P/E cycles (from 3k to 7k) and read counts (from 13k to 21k). Although the Δ-value trends differ between the two chip models, chips of the same model exhibit similar Δ-value trends even under different interference conditions. This means that we only need to obtain the Δ-values of each wordline under one arbitrary condition to adjust the optimal read voltage positions of the 3D NAND flash memory under various interference conditions.

Fig. 18.

Fig. 19.

4.3 Performance Evaluation of ΔRV Method

We evaluate the ΔRV method by conducting decoding experiments on real flash chip data under different interference conditions and performing a statistical analysis of the results obtained after decoding all the data. Furthermore, to reflect the impact of different numbers of sensing levels on SSD read performance during LDPC decoding, we conduct corresponding simulations with the SimpleSSD simulator [22], which has been widely adopted in previous work [16, 25, 27].

Figure 20 compares the decoding performance under different P/E cycles and read counts. Both the Modeling-based method and the ΔRV method are affected by the interference conditions: as the P/E cycles and read count increase, the number of sensing levels required for successful decoding also increases, and decoding performance varies across layers. The total numbers of sensing levels needed for successful decoding with the two methods are 1,659 and 1,495 in Figure 20(a), and 6,056 and 6,046 in Figure 20(f). The latter totals may be limited by the maximum number of sensing levels, which is set to 7. For both methods, the wordlines in the first half of a block require, on average, about half as many sensing levels as those in the second half. Although the ΔRV method does not completely eliminate inter-layer differences in decoding performance, it improves decoding performance over the Modeling-based method.
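The percentages reported in this section are relative changes in total sensing levels. As a small worked check, the following Python lines compute the reduction implied by the two pairs of totals given above for Figures 20(a) and 20(f); the helper name is ours.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Percentage by which `improved` is lower than `baseline`."""
    return 100.0 * (baseline - improved) / baseline


# Totals reported for Figure 20(a) and Figure 20(f): Modeling-based vs. ΔRV.
print(f"{relative_reduction(1659, 1495):.2f}%")  # 9.89%
print(f"{relative_reduction(6056, 6046):.2f}%")  # 0.17%
```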

Fig. 20.

In Figure 21, we compare the total number of sensing levels required by the three methods. Our ΔRV method requires significantly fewer sensing levels than the Modeling-based method under various interference conditions, though slightly more than the Ideal case. Specifically, in the three P/E cycle scenarios, the ΔRV method requires 18.92%, 11.93%, and 0.67% fewer sensing levels than the Modeling-based method, respectively.

Fig. 21.

Furthermore, we statistically analyze the number of wordlines within a block that experience decoding failures under different interference conditions with the three methods. As shown in Figure 22, our ΔRV method significantly reduces the number of wordlines experiencing decoding failures and, to some extent, enhances the endurance of flash memory. In the three P/E cycle scenarios, the ΔRV method has 28.13%, 17.14%, and 1.55% fewer failed pages than the Modeling-based method, respectively. Note, however, that at 7k P/E cycles the performance differences among the three methods are minimal: the extremely high RBER exceeds the error correction capability of the LDPC code, so none of the three methods can correct pages with many errors, which narrows the performance gap.

Fig. 22.

Lastly, we configure SimpleSSD with the latency incurred by different numbers of sensing levels. Based on the chip measurements of the number of sensing levels needed to decode each page in every block under various interference conditions, we set the required number of sensing levels for each page in the simulation. We then measure the average response time of the three methods under a random-read trace. As shown in Figure 23, consistent with the previous results, the ΔRV method reduces the average response time by 6.73%, 4.64%, and 1.89% compared with the Modeling-based method under the three P/E cycle conditions, respectively, while its average response time is 7.62%, 5.35%, and 2.2% higher than that of the Ideal case under the same conditions.

Fig. 23.

Although the Ideal case yields the best results, it incurs substantial overhead because it must perform read-retry to obtain the actual voltage distribution, which is impractical in real scenarios. In contrast, our ΔRV method only needs to adjust the optimal read voltage from the Modeling-based method by the Δ-value. Moreover, the Δ-value is calculated offline in advance, so it adds no runtime overhead.
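The split between offline Δ-value computation and online voltage adjustment can be sketched as a precomputed lookup table. This is a minimal sketch, assuming the Δ-value for each layer is the mean difference between the measured and the modeled optimal read voltages over characterization samples; the averaging estimator, the function names, and the voltage values are hypothetical, and this excerpt does not specify how the paper quantifies the offsets. A real table would likely be keyed by a (layer, read-reference) pair, since a TLC wordline has several read voltages.

```python
from collections import defaultdict
from statistics import mean


def build_delta_table(samples):
    """Offline step: per-layer mean offset of measured vs. modeled optimal voltage.

    `samples` holds (layer, v_opt_measured, v_opt_modeled) tuples collected
    during characterization. Averaging is an assumed estimator.
    """
    offsets = defaultdict(list)
    for layer, v_measured, v_modeled in samples:
        offsets[layer].append(v_measured - v_modeled)
    return {layer: mean(diffs) for layer, diffs in offsets.items()}


def offsetted_read_voltage(v_opt_modeled, layer, delta_table):
    """Online step: shift the Modeling-based optimal voltage by the stored Δ-value.

    Layers missing from the table fall back to the unmodified modeled voltage.
    """
    return v_opt_modeled + delta_table.get(layer, 0.0)


# Usage with hypothetical samples (volts): (layer, measured optimum, modeled optimum).
table = build_delta_table([(0, 2.10, 2.05), (0, 2.12, 2.06), (1, 1.98, 2.01)])
print(offsetted_read_voltage(2.04, 0, table))
```

Because the table is built once before deployment, the online cost is a single lookup and addition per read, which is consistent with the claim that the Δ-value adds no runtime overhead.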

5 Conclusion

In this article, we use the Ideal case and the Modeling-based method to characterize LDPC decoding performance on 3D NAND flash memory and summarize a series of findings that help us better understand how LDPC performs in 3D NAND flash memory. To address the performance gap between the Modeling-based method and the Ideal case, we exploit the two types of discrepancies between the estimated voltage distributions and those of real chips. Based on these observations, we propose the ΔRV method, which adjusts the optimal read voltage of the Modeling-based method by a Δ-value obtained by quantifying the voltage state offsets of different wordlines. The proposed method significantly narrows the performance gap between the Modeling-based method and the Ideal case at minimal cost.

References

[1] Yu Cai, Saugata Ghose, Erich F. Haratsch, Yixin Luo, and Onur Mutlu. 2017. Error characterization, mitigation, and recovery in flash-memory-based solid-state drives. Proceedings of the IEEE 105, 9 (2017), 1666–1704.
