1. Introduction
Considering the successful development of software-implemented Artificial Neural Networks (ANN) and their increasing integration in the commercial market, it is of special interest to consider the subsequent drawbacks that the performance of these kind of neuro-inspired networks entails. The gap between the off-chip memory units and the processing units is a clear example of one of the main shortcomings that the von Neuman architectures present. Owing to this fact, for instance, the downscaling of the electronic devices and the implementation of learning algorithms for mobile applications could be compromised. Among others, the reduction of the overall energy consumption and computation time within these networks are key aspects that the artificial intelligence research community has had in its scope for several years now [
1]. As an alternative, the hardware-based neuromorphic networks have shown to be the precursor elements to step up onto a new stage in the integration of biological-inspired systems. Particularly, the RRAM technology has been taken into consideration, not only because of its integration as CMOS-compatible non-volatile memory (NVM) arrays but also because of its synaptic-like analog properties [
2,
3,
4,
5,
6,
7].
The implementation of RRAM cells as synaptic unions between artificial neurons allows the possibility to design and carry out ANNs featured by low-power consumption and low integration area [
8]. This is possible due to the multilevel approach also known as Multi-Level Cell (MLC) behavior, consisting of modulating in multiple states the resistance/conductance of the dielectric layer in the RRAM cell. To obtain the mentioned MLC behavior, different programming techniques have been reported lately, such as gradual Reset process by consecutive identical pulses [
9], applying positive and negative voltage sweeps with different stop values during the Set and Reset operations [
10,
11,
12], modifying the compliance current imposed to the cell during the Set transition [
13,
14,
15] and implementing multilevel incremental step pulses with verify algorithm (M-ISPVA) [
16,
17], among others.
Nevertheless, the existence of a gap between the device and circuit/system levels challenges the implementation of hardware-based ANN, thus the development of accurate and time-efficient RRAM models for circuit design simulations is an issue that must be tackled.
Several RRAM compact models have been developed and reviewed throughout the history of the memristor [
18,
19,
20,
21,
22,
23] also in the MLC approach [
24]. Regarding the switching behavior of the RRAM devices, different phenomena and equations govern the implementation of each reported model and therefore, they account for different computational cost, yield and accuracy.
This work is focused on modeling the MLC by using the change in the compliance current in order to switch the RRAM device between various low resistance states (LRS) and a single high resistance state (HRS). This is accomplished by connecting a NMOS select transistor in series with the memristor, constituting the so called one-transistor-one-resistor (1T-1R) structure [
17]. Furthermore, three different compact models implemented in Verilog-A are integrated in the mentioned 1T-1R structure to give insight into their viability in the multilevel approach. First of all, the Stanford-PKU model [
25] including the modification provided by Reuben et al. for multilevel operation [
26] is taken into account. Based on similar physical aspects, the Valence Change Memory model with Cylindrical shaped Filament (UGR-VCMCF) and Valence Change Memory model with Truncated-Cone shaped Filament (UGR-VCMTCF) developed by Gonzalez-Cordero et al. [
27] are analyzed as well. For other modeling approaches, a revision paper was published recently [
28], where the different memristor models were compared and described in depth. The experimental verification is driven by the RRAM devices fabricated using the 130 nm CMOS technology of IHP.
The abovementioned compact models are briefly detailed in
Section 2, while the experimental samples characteristics are described in
Section 3. The experimental verification and the subsequent modeling results are presented and discussed within
Section 4. Later, an ANN study to assess the presented multilevel approach is described in
Section 5. Finally, a set of conclusions are drawn in
Section 6.
4. Modeling Results and Discussion
In order to validate the capability of the three described models to reproduce the MLC by changing the compliance current, we obtained a set of experimental measurements using a single 1T-1R structure detailed in
Section 3. From the modeling point of view, in order to adapt the model parameters to the experimental measurements we are focused on the DC response of the devices with respect to the mentioned MLC behavior. This behavior is featured by the transition from a single HRS to three different LRSs and vice versa by modifying the compliance current imposed by the NMOS transistor during the Set operation.
The experiments were carried out by applying positive voltage sweeps between 0 V and 1.5 V with steps of 0.05 V to the top electrode (TE) of the 1T-1R cell to perform the Set operations and negative voltage sweeps between 0 V and −1.5 V with steps of −0.05 V to perform the Reset operations. The voltage sweep rate used was 0.6 V/s. The compliance current imposed to the cell was controlled by setting the gate voltage of the NMOS transistor to different values. Thereof, four conductive levels were accomplished: three LRSs corresponding to = 1 V, 1.2 V, 1.6 V, respectively, and one HRS in which = 2.7 V. We chose a gate voltage of 2.7 V to reduce as much as possible the resistance of the transistor during the Reset operation, easing the transition from the LRSs to the sole HRS. In total, 10 Set–Reset cycles were accomplished for each LRS level within the same sample.
The observed experimental results present typical RRAM behaviors such as abrupt Set operations at
and a gradual Reset transition whose
value depends on the accomplished LRS level, i.e., the more conductive the LRS is, the higher the voltage amplitude required to Reset the device.
Table 1 gathers the most remarkable electrical characteristics observed for each of the possible transitions between states.
Concerning the simulated results, the NMOS transistor model was provided by [
31] and every RRAM compact model presented within this work is tuned in to fit the median DC characteristics of the experimental measurements. All the simulations were ran in Cadence Virtuoso Analog Design Environment (ADE) [
32]. The aim of the fitting procedure is to provide with one set of parameters to each model to reproduce the MLC behavior indicated above. For this purpose, different steps were accomplished based on the methodology indicated in [
25,
26,
27]:
First, one of the LRS levels is chosen to be the reference for the fitting procedure. In this case, the median values for LRS2 are taken as a reference to establish the initial set of parameters since it is the medium level of the LRSs under study. This first step can be accomplished following the methodology proposed in [
25,
26]. If the reference LRS is not the most conductive one, the minimum gap distance achieved during this first step cannot reach the atomic distance
(0.25 nm). Otherwise, the multilevel approach cannot be accomplished for more conductive levels. Once the model is able to reproduce the reference LRS (in our case LRS2) and the single HRS, the rest of LRS levels are evaluated by modifying the gate voltage of the transistor during the Set operation and thus, the compliance current imposed to the cell. Further adjustments might be applied to the previous set of parameters in order to match the rest of the median curves for the different resistance levels. The fitting parameters
and
have special influence in the conductance difference between levels (see (
3)). It is mandatory to achieve different minimum gap distances for every single LRS level to reproduce the MLC behavior due to the fact that the shorter the gap distance, the more conductive the resistance state is.
Table 2 exposes the achieved gap distances for each conductive level for the three models.
Figure 3 shows the comparison between the experimental and the simulated results extracted from the different models for each of the LRSs considered within this work. More precisely,
Figure 3a–d make reference to the Stanford-PKU model extended with multilevel capability configured with the set of parameters exposed in
Table 3.
Figure 3e–h account for the UGR-VCMCF model fed with the parameters exposed in
Table 4.
Figure 3i–l make reference to the UGR-VCMTCF model configured as indicated in
Table 5. All the simulations were carried out setting the HRS as the initial state (
) thus, a Set operation is initially performed followed by a Reset of the cell.
In general terms, despite the limited complexity of the models, the experimental results are reproduced in an acceptable way. The gap distances exposed in
Table 2 match the required current levels for both, the LRSs and the sole HRS showing that this parameter is the key aspect to be considered in order to reproduce the MLC by changing the compliance current. Regarding the knee point of the Reset transition, it can be correctly simulated in all the conductive states except for LRS1 (see
Figure 3b,f,j), where the maximum voltage difference for the simulated and experimental
is up to 0.15 V. Reducing
helps to minimize the mentioned voltage difference, keeping the knee point of the Reset transition in LRS2 (reference state). However, due to the linear dependence of
I with respect this parameter (see (
3)), this would lead to a reduction of the
in LRS3 as well, enlarging the voltage difference between the simulated and the experimental results in this concrete point.
Due to the cycle-to-cycle variability of the experimental data measured in LRS3 (see the grey lines in
Figure 3d,h,l), the HRS curve for positive voltage values presents lower median current levels as well as a higher
with respect to LRS1 and LRS2. Thus, even though the simulated results fulfill the ideal behavior of LRS3, they differ from the experimental results in both mentioned aspects.
Lastly, since the memristor charge conduction is filamentary, device-to-device (D-D) variability, usually linked to technological differences in the fabrication process, is not studied. It is the cycle-to-cycle (C-C) variability during programming the factor that was considered in the experimental measurements. Modeling approaches linked to time series have been given in the literature [
33,
34] that can be implemented. Other modeling schemes linked to Gaussian distribution functions and Monte Carlo simulations within circuit simulators are also possible (in this case for D-D and C-C variability). The device-to-device variability that shows up during programming could be also modeled along the line described in [
35], as well as the cycle-to-cycle variability. From the memory characterization point of view, retention time and endurance properties were previously studied both experimentally [
16,
36] and by simulation [
37] from another perspective in our technology.
5. A Neural Network Study to Assess the Multilevel Approach
As it was mentioned in
Section 1, the implementation of RRAM cells as synaptic unions within hardware-based ANNs is gaining momentum [
2,
3,
4,
5,
6,
8]. Regarding the MLC behavior taken into account in this work, it is convenient to study the performance of ANNs in this context. That implies the reduction of the precision of the synaptic weights from floating point numbers (used in conventional software-based ANNs) to a limited number of values, that is, the resistance levels defined in RRAM devices as shown in
Section 4.
First, we used the SciKit-Learn [
38] software tool to implement and train a Multilayer Perceptron (MLP) in order to recognize and classify handwritten digits. The selected dataset for the experiment is the well-known MNIST image dataset [
39]. It is composed of
pixel images of 70,000 handwritten digits (labeled in the interval
), divided into a training set of size 60,000 digits and a test set of size 10,000 digits. This classifier optimizes the log-loss function using stochastic gradient descent (with the back-propagation algorithm) and rectified linear unit functions (e.g.,
) [
40]. The training process was performed during 500 epochs (cycles through the full training dataset) or until the loss (or score, a prediction error of the ANN) improves by a factor of
.
We have considered different quantization strategies making use of 2, 4 and 8 levels in order to explore the accuracy of the ANN inference. In this respect, we have employed a quantization approach in line with the multilevel implementation described above for our RRAM devices (4 levels), although we also included other multilevel possibilities for the sake of completeness (2 and 8 levels). The options considered here were: Uniform-ASYMM and Uniform-SYMM, as suggested in [
41]. Both methods are linear, range-based and quantify a given input data
into
using
n bits. For instance, for
they transform the original data into
levels, which is the case we have considered at the device resistance level in
Section 4. Uniform-ASYMM (see (
8)) is an asymmetric approach that maps the minimum and maximum of the float range to an integer range with a quantization bias term. Uniform-SYMM (see (
9)) is a symmetric approach that maps the original data to the quantized range with the maximum absolute value of the minimum or maximum of the original data. In this latter case, there is no quantization bias term and it is symmetric around zero.
The architecture of the ANN adapted to the MNIST dataset is depicted in
Figure 4. In order to simulate the usage of our ANN in a RRAM-based environment, all the synaptic weights were quantized by using the functions (
8) and (
9) and the number of levels (four) determined in our study, after the training process and before the final prediction on the performed tests.
The resistance levels of the memristors are connected to the use of Equations (
8) and (
9) for the quantization of the ANN floating point weights obtained after training. In this respect, taking into consideration that weights can be positive and negative, the implementation could be performed using two memristive devices per synapsis as demonstrated in a previous work [
42].
In summary, taking into account the mentioned quantization schemes, the inference was carried out with four synaptic levels (2 bits). In total, 1, 2 and up to 3 hidden layers and 32, 64, 128, 256 and 512 perceptrons (neurons) included within each hidden layer are considered as well.
The balanced accuracies (i.e., the number of true positives divided by the total number of elements that actually belong to a class digit in our case) obtained by using the described set of parameters are shown in
Table 6. The Uniform SYMM works better than the Uniform ASYMM when using four levels of quantization by an average difference of
, for all the number of hidden layers and perceptrons per layer. Nevertheless, the accuracy obtained by using the Uniform SYMM scheme with 4 levels is lower than a conventional MLP with no quantized synaptic weights by an average difference of
.
Finally, in order to assess the impact of the number of conductive levels per device on the performance of the ANN, two additional number of levels, namely, 2 (1 bit) and 8 (3 bits), were tested. Considering the implementation of RRAM devices as synaptic unions between artificial neurons, it can be assumed that these numbers of levels correspond to the number of resistance levels achieved by the basic digital behavior of the RRAM devices (two levels) and by the MLC behavior with eight resistance levels.
Table 6 shows the balanced accuracies obtained with the three mentioned number of levels.
The Uniform ASYMM scheme works better than the Uniform SYMM scheme only when the number of quantizied levels is 2 by an average difference of in the balanced accuracy. However, it seems that the Uniform ASYMM improves slower than its counterpart with the increase of the number of levels.
Both quantization methods achieve almost the same results as the original ANN (without quantization) when the number of levels increases, as it is also observed in [
41]. More precisely, when the number of levels is 8, the quantized version of the ANN may obtain better results than the conventional ANN. This is due to the fact that the ANN without quantization is overfitting the training data, a common consequence observed in ANN architectures which report small training error [
43]. On the contrary, due to the downgrade of the weight precision in the quantized versions, the overfitting issue is mitigated during the training period and the balanced accuracies are slightly enhanced with respect to the conventional ANN. This highlights the advantages of increasing the number of levels from 4 to 8 in the MLC behavior of the RRAM cells.
An interesting observation is the fact that by using more perceptrons in each hidden layer the balanced accuracy tends to improve in all cases, while the number of hidden layers seems to mitigate the low number of perceptrons per layer (e.g., 32 perceptrons per layer).
While investigating the coefficient (weight) distribution in each level, we discovered that not all the levels were used for quantization in the Uniform SYMM quantization scheme (data not shown). This feature could be further studied to select a lower number of levels, not in the form of , which could be useful for tryouts.
Concerning device variability, further research in the ANN context indicates that it does not affect the outcome of ANNs in the same way as in other hardware systems [
44]. Due to the particular features of neural networks, in some cases, variability in the synaptic weights produces better accuracy than the ideally quantized ANN without variability. This behavior is caused by the random nature of the variability introduced, which, in some cases may shift the values of the synaptic weights closer to the optimal ANN without quantization (data not shown). Furthermore, the variability introduced in the synaptic weights may cope with a small amount of overfitting. Finally, variability do not prevent the technology from being used in a multilevel scheme in different Artificial Intelligence hardware accelerators since some prototypes have been fabricated based on these type of devices [
45,
46].