Compute In-Memory platforms such as memristive crossbars are gaining traction as they facilitate the acceleration of Deep Neural Networks (DNNs) with high area and compute efficiencies. However, the intrinsic non-idealities associated with the analog nature of computing in crossbars limit the performance of the deployed DNNs. Furthermore, DNNs are shown to be vulnerable to adversarial attacks, leading to severe security threats in their large-scale deployment. Thus, finding adversarially robust DNN architectures for non-ideal crossbars is critical to the safe and secure deployment of DNNs on the edge. This work proposes a two-phase algorithm-hardware co-optimization approach called XploreNAS that searches for hardware-efficient and adversarially robust neural architectures for non-ideal crossbar platforms. We use the one-shot Neural Architecture Search approach to train a large Supernet with crossbar-awareness and sample adversarially robust Subnets therefrom, maintaining competitive hardware efficiency. Our experiments on crossbars with benchmark datasets (SVHN, CIFAR10, CIFAR100) show up to ~8–16% improvement in the adversarial robustness of the searched Subnets against a baseline ResNet-18 model subjected to crossbar-aware adversarial training. We benchmark our robust Subnets for Energy-Delay-Area-Products (EDAPs) using the Neurosim tool and find that with additional hardware efficiency–driven optimizations, the Subnets attain ~1.5–1.6× lower EDAPs than the ResNet-18 baseline.
1 Introduction
Today, Neural Architecture Search (NAS) has achieved great feats in reducing human effort for designing neural architectures specific to a plethora of tasks, such as image classification, image segmentation, and natural language processing, among others [7, 28, 35]. In recent years, there has been a growing interest in using NAS as a tool to explore high-performance and hardware-efficient deep neural networks (DNNs) tailored for platforms such as CPUs, GPUs, FPGAs, and Compute In-Memory (CIM) architectures [8, 9, 20, 27, 33]. The goal is to maximize the accuracy of the NAS-derived DNN models on these hardware platforms while achieving increased energy efficiency, throughput, and area utilization.
Memristive crossbar-based CIM has become popular as an alternative platform addressing the “memory-wall” bottleneck of von Neumann architectures. Although memristive crossbars have the ability to perform compact and energy-efficient Multiply-and-Accumulate (MAC) operations with high throughput, they are susceptible to various non-idealities owing to their analog nature of computing [3, 4, 10, 17, 18, 19]. These non-idealities include interconnect parasitics, memristor device variations, and so on, that degrade the performance (accuracy) of the DNN architecture mapped onto crossbars. In fact, the adversarial security of DNNs is highly compromised on CIM platforms that are prone to non-idealities [2, 6, 26, 30]. Adversarial attacks are structured, yet small, perturbations on the input that fool a DNN into high-confidence misclassification [14, 15]. Today, addressing adversarial vulnerabilities of DNNs has become an active area of research [15, 24]. There have been recent works that suggest mitigation of non-idealities to improve the adversarial robustness of crossbar-mapped DNN models [2, 30]. However, these strategies are specific to the DNN architecture and the underlying crossbar platform, requiring expensive re-training of the model when the crossbar size or related specifications change. No prior work has focused on obtaining DNN architectures that are adversarially robust in the presence of non-idealities for multiple crossbar sizes, while maintaining competitive hardware efficiency.
In this work, we propose XploreNAS, a crossbar-aware NAS method that searches for optimal hardware-efficient and adversarially robust neural architectures (Subnets) for CIM platforms from a parent Supernet. We propose a two-phase hardware-aware Supernet training methodology that identifies a series of operations or paths within the Supernet architecture that collectively provide resilience against the impact of intrinsic crossbar-level noise and white-box adversarial perturbations. In addition to unleashing non-ideality-aware adversarial robustness, XploreNAS allows efficiency-driven hardware optimizations during the architecture search phase to find Subnets that are resource efficient on crossbar-based platforms and expend low Energy-Delay-Area-Product (EDAP) on hardware (see Figure 1).
Fig. 1.
In summary, the key contributions of our work are as follows:
• We propose a two-phase NAS approach based on algorithm-hardware co-optimization, called XploreNAS, to build adversarially robust DNN architectures suited for non-ideal memristive crossbar arrays.
• We conduct our experiments on non-ideal crossbars with sizes ranging from 32 \(\times\) 32 to 128 \(\times\) 128 and show that our Subnets are up to \(\sim\)8–16\(\%\) more robust than a baseline adversarially trained ResNet-18 model against strong adversarial attacks.
• We introduce hardware efficiency–driven optimizations during XploreNAS to implement highly efficient Subnets on crossbars, achieving \(\sim\)1.5–1.6\(\times\) lower EDAPs (benchmarked using the Neurosim tool [12]) than a baseline ResNet-18 architecture, in addition to being adversarially robust.
• We also find that XploreNAS-derived Subnets can show significant adversarial resilience over a range of crossbar sizes and device variations during inference without the need for re-training.
The remainder of the work is organized as follows. Section 2 discusses related works in the areas of NAS-based solutions to improve the adversarial robustness of DNNs in software, as well as non-NAS works based on non-ideality mitigation in CIM platforms to improve the robustness of crossbar-mapped DNNs. In Section 3, we present some background about adversarial attack generation, followed by a depiction of dot-product computations in non-ideal memristive crossbars. Thereafter, we provide a detailed description of the XploreNAS methodology, referred to in Figure 1, followed by Subnet selection and fine-tuning in Section 4. Section 5 lists all the evaluation metrics used to assess the performance of our XploreNAS-derived Subnets against corresponding baselines. Section 6 presents our experimental setup, followed by results and discussion. In Section 7, we present the contributions of XploreNAS in regard to related works in a tabular form and, finally, conclude our work in Section 8.
2 Related Works
Works such as Guo et al. [16] have proposed a fully software-based solution to train an over-parameterized network (called Supernet) using one-shot NAS [23] and sample multiple Subnets therefrom. These Subnets are fine-tuned to yield state-of-the-art adversarial robustness on benchmark datasets such as CIFAR10, Imagenet, and so forth. However, this approach involves the cost of training or fine-tuning a large number of Subnets drawn from the parent Supernet. The NAS methodology in Reference [16] is completely hardware-agnostic, and, hence, such models will suffer a huge degradation in their adversarial performance when deployed on crossbars with non-idealities.
Concerning NAS for analog CIM platforms, a recent work called NAX [27] explores the design space to determine appropriate kernel and corresponding crossbar sizes for each DNN layer, achieving optimal tradeoffs between hardware efficiency (EDAP) and application accuracy in the presence of non-idealities. The DNN architectures derived using NAX have heterogeneous crossbar sizes across different convolutional layers. However, such heterogeneous architectures are not practical and are challenging to manufacture, since different-sized crossbars will require peripheral circuits to be custom designed [21, 34]. Moreover, no prior work in connection with NAS for analog crossbars has considered adversarial robustness as a key goal for optimization.
There have been several previous works such as References [2, 5, 6, 30] that have used non-ideality-driven techniques to improve the adversarial robustness of pretrained DNN models on memristive crossbar arrays. Recent works such as SwitchX [6] and NEAT [2] propose mapping DNNs onto crossbars in a manner that increases the proportion of low conductance synapses in each crossbar array. This helps in non-ideality mitigation, whereby the adversarial robustness of the DNN models is significantly improved on hardware. However, all these approaches have considered a fixed DNN model and do not propose architectural modifications to mitigate the effects of crossbar noise or adversarial noise. In contrast, our XploreNAS approach is driven toward architecture search to obtain an optimal tradeoff between crossbar-based adversarial accuracy and EDAP.
3 Background
3.1 Adversarial Attacks
Adversarial attacks have been shown to degrade a DNN’s performance by introducing structured but small perturbations to the clean inputs, leading to high confidence misclassification. In this work, we use one of the strongest known gradient-based adversarial attacks in literature, Projected Gradient Descent (PGD). The PGD attack, shown in Equation (1), is an iterative attack over \(n\) steps. In each step \(i\), perturbations of strength \(\alpha\) are added to \(x_{adv}^{i-1}\). Note that \(x_{adv}^{0}\) is created by adding random noise to the clean inputs \(x\), \(\theta\) denotes the DNN model’s weight parameters, \(y_{true}\) denotes the correct prediction labels for the inputs \(x\), and \(\mathcal {L}\) the cross-entropy loss function. Additionally, for each step, \(x_{adv}^{i}\) is projected on a Norm ball [24], of radius \(\epsilon\). In other words, we ensure that the maximum pixel difference between the clean and adversarial inputs is \(\epsilon\):
\(x_{adv}^{i} = \Pi_{\epsilon}\left(x_{adv}^{i-1} + \alpha \cdot sign\left(\nabla_{x_{adv}^{i-1}} \mathcal {L}(\theta, x_{adv}^{i-1}, y_{true})\right)\right),\)    (1)
where \(\Pi_{\epsilon}\) denotes the projection onto the Norm ball of radius \(\epsilon\) around the clean inputs \(x\).
In this work, we will use the notation PGD-\(n\) to denote a PGD attack iterated over \(n\) steps. Further, we consider white-box attacks in our experiments, where the attacker is assumed to have full knowledge of the target model and dataset.
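As a concrete illustration, the following is a minimal PyTorch sketch of the white-box PGD-\(n\) attack described above; the function name and the clamping of inputs to the [0, 1] range are assumptions made for illustration, not the exact implementation used in this work.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y_true, epsilon=8/255, alpha=2/255, n=7):
    """Iterative white-box PGD-n attack; x is assumed to lie in [0, 1]."""
    # x_adv^0: clean input plus random noise inside the epsilon-ball
    x_adv = x + torch.empty_like(x).uniform_(-epsilon, epsilon)
    x_adv = torch.clamp(x_adv, 0.0, 1.0)

    for _ in range(n):
        x_adv = x_adv.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y_true)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Ascend the loss with a perturbation of strength alpha
        x_adv = x_adv.detach() + alpha * grad.sign()
        # Project back onto the norm ball of radius epsilon around x
        x_adv = torch.clamp(x_adv, x - epsilon, x + epsilon)
        x_adv = torch.clamp(x_adv, 0.0, 1.0)
    return x_adv.detach()
```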
3.2 Memristive Crossbars and Non-Idealities
Memristive crossbars consist of two-dimensional (2D) arrays of Non-Volatile-Memory (NVM) devices, Digital-to-Analog Converters (DACs), Analog-to-Digital Converters (ADCs), and a write circuit. The synaptic devices at the cross-points are programmed to a particular value of conductance (between \(G_{MIN}\) and \(G_{MAX}\)) during inference. The MAC operations are performed by converting the digital inputs to the DNN into analog voltages on the Read lines using DACs and sensing the output current flowing through the Bitlines using the ADCs [2, 18, 19]. In other words, the activations of the DNNs are fed in as analog voltages \(V_i\) to each row and weights are programmed as synaptic device conductances (\(G_{ij}\)) at the cross-points as shown in Figure 2. For an ideal crossbar array, during inference, the voltages interact with the device conductances and produce a current (governed by Ohm’s law). Consequently, by Kirchhoff’s current law, the net output current sensed at each column \(j\) is the sum of currents through each device, i.e., \(I_{j(ideal)} = \Sigma _{i}^{}{G_{ij} * V_i}\). We term the matrix \(G_{ideal}\) as the collection of all \(G_{ij}\)’s for a crossbar.
Fig. 2.
In reality, the analog nature of the computation leads to various hardware noise or non-idealities, such as interconnect parasitic resistances and NVM device-level variations [10, 19, 25]. This results in a \(G_{non-ideal}\) matrix, with each element \(G_{ij}^{\prime }\) incorporating the impact of the non-idealities. Consequently, the net output current sensed at each column \(j\) in a non-ideal scenario becomes \(I_{j(non-ideal)} = \Sigma _{i}^{}{G_{ij}^{\prime } * V_i}\), which deviates from its ideal value. This manifests as accuracy degradation for DNNs mapped onto crossbars. Larger crossbars entail greater non-idealities, resulting in higher accuracy losses [5, 10, 19].
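The following sketch contrasts the ideal column currents \(I_{j(ideal)}\) with a non-ideal computation in which every conductance is perturbed; the multiplicative Gaussian form of the perturbation and the \(\sigma/\mu\) value are simplifying assumptions used only for illustration.

```python
import torch

def crossbar_mac(V, G, sigma_over_mu=0.0):
    """Column currents of an n x n crossbar: I_j = sum_i G'_ij * V_i.

    V: (n,) input voltages on the rows.
    G: (n, n) programmed conductances G_ij (rows x columns).
    sigma_over_mu: relative spread of the device-level variation (0 = ideal).
    """
    if sigma_over_mu > 0:
        # G' = G * (1 + delta), delta ~ N(0, (sigma/mu)^2): non-ideal conductances
        G = G * (1.0 + sigma_over_mu * torch.randn_like(G))
    return V @ G  # (n,) column currents

# Example: deviation of the non-ideal output currents from the ideal ones
V = torch.rand(64)
G = torch.rand(64, 64) * 9e-6 + 1e-6            # conductances between 1 uS and 10 uS
I_ideal = crossbar_mac(V, G)                     # sigma/mu = 0
I_noisy = crossbar_mac(V, G, sigma_over_mu=0.35)
print((I_noisy - I_ideal).abs().mean())
```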
4 Methodology of XploreNAS
Our NAS methodology is based on the conventional one-shot learning using DARTS [1, 23], which is adapted to include the impact of crossbar noise along with adversarial training [24]. The entire process of training an over-parameterized network (Supernet) that includes a number of operations, and ultimately obtaining an optimal neural network configuration (Subnet), is described as follows:
4.1 Supernet Architecture
Figure 3 presents our Supernet architecture. There are four residual blocks, highlighted in green, from R-I to R-IV. At the end of the network, after R-IV, there is an average pooling layer with a stride of 2 and a kernel size of 3 \(\times\) 3, followed by a fully-connected classifier layer. In each residual block, we have Op-1 and Op-2 operation choices, each followed by a batchnorm layer and a ReLU function. The Op-1 operation choice constitutes convolution operations with 3 \(\times\) 3 and 5 \(\times\) 5 kernel sizes (i.e., Conv3x3 and Conv5x5). The Op-2 operation choice constitutes average pooling with a stride of 1 and a kernel size of 3 \(\times\) 3 (AvgPool), Conv3x3, Conv5x5, and a skip-connection (implying no operation). Note that a downsampling block (marked as \(D\)) is used at the end of residual blocks R-I, R-II, and R-III, since the feature sizes of the corresponding inputs and outputs of these residual blocks are unequal. The downsampling block includes a Conv3x3 operation (with a stride of 2) followed by a batchnorm layer.
Fig. 3.
Each constituent operation choice in Op-1 and Op-2 is associated with a parameter \(\alpha\) called the architecture parameter in the Supernet. Let us suppose that a given Op-1 operation in the Supernet is associated with the architecture parameters \(\alpha _{1}\) and \(\alpha _{2}\) for the constituent Conv3x3 (\(o_1\)) and Conv5x5 (\(o_2\)) operations, respectively. If \(p_{1}\) and \(p_{2}\) are, respectively, the softmax of \(\alpha _{1}\) and \(\alpha _{2}\), then the output at the end of the Op-1 operation (\(m_{op-1}\)) is computed using DARTS as follows:
\(m_{op-1} = p_{1} \cdot o_{1}(x) + p_{2} \cdot o_{2}(x),\)    (2)
where \(x\) denotes the input to the operation.
Here, \(p_j \in [0,1]\) pertaining to each constituent operation is referred to as a probability coefficient. Similarly, for a given Op-2 operation that has four constituent operations (AvgPool (\(o_1\)), Conv3x3 (\(o_2\)), Conv5x5 (\(o_3\)), and skip-connection (\(o_4\))), the output (\(m_{op-2}\)) is computed as follows:
\(m_{op-2} = \sum_{j=1}^{4} p_{j} \cdot o_{j}(x).\)    (3)
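A minimal PyTorch sketch of such a mixed operation, following the Op-2 search space above, is given below; the module structure and parameter initialization are illustrative assumptions (the batchnorm and ReLU that follow each choice in the actual Supernet are omitted for brevity).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    """DARTS-style mixed operation: m = sum_j p_j * o_j(x), with p = softmax(alpha)."""
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.AvgPool2d(3, stride=1, padding=1),                     # AvgPool (o_1)
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),  # Conv3x3 (o_2)
            nn.Conv2d(channels, channels, 5, padding=2, bias=False),  # Conv5x5 (o_3)
            nn.Identity(),                                            # skip-connection (o_4)
        ])
        # Architecture parameters alpha, one per candidate operation
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x):
        p = F.softmax(self.alpha, dim=0)   # probability coefficients p_j
        return sum(p_j * op(x) for p_j, op in zip(p, self.ops))
```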
4.2 Mapping DNN Weights onto Non-ideal Crossbars
Here, we describe the manner in which the weight matrices of different DNN layers are mapped onto crossbar arrays. The entire procedure is carried out in Python for better integration between the software model and the hardware mapping framework. As shown in Figure 4, we build a Python wrapper to unroll each and every convolution operation in the software DNN model into MAC operations between input activation matrices and their corresponding weight matrices. For a given convolutional layer, its 4D weight matrix is reshaped to a 2D matrix \(W\). The matrices obtained are then zero-padded (in case the dimensions of the 2D weight matrix \(W\) are not exact multiples of the crossbar size \(n \times n\)) and then partitioned into multiple \(n \times n\) crossbar arrays consisting of DNN weights at the synapses (modelled as conductances). Here, we assume that the NVM devices at the synapses in the crossbars can be programmed to a resistance level between \(R_{MIN}=100\) k\(\Omega\) and \(R_{MAX}=1\) M\(\Omega\), typical for ReRAM devices. For this range of resistance, the impact of crossbar non-idealities due to parasitic interconnect resistances is minimized [31]. Thus, the NVM device-level non-linearities or stochasticity are the key players that impact the accuracy of DNNs deployed on such crossbars. Next, we add synaptic device variations to the weights mapped in the crossbar arrays. The device variations for each \(n \times n\) crossbar are modelled using a Gaussian distribution with \(\sigma /\mu\) of 35% (assuming 8-bit precision of DNN weights when mapped to the NVM synapses in the crossbars) [11, 32]. Note that this noise-profile is specific to the crossbar size under consideration. Finally, these noisy weights (\(W_{noisy}\)) are then integrated into the original Pytorch-based DNN model to facilitate crossbar-aware evaluation.
Fig. 4.
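A simplified sketch of this mapping flow (reshape, zero-pad to multiples of the crossbar size \(n\), tile into \(n \times n\) arrays, and inject Gaussian device variations) is given below; the multiplicative noise form and the helper-function signature are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def map_to_noisy_crossbars(weight_4d, n=64, sigma_over_mu=0.35):
    """Return a crossbar-noise-injected copy of a conv layer's weights.

    weight_4d: (out_ch, in_ch, k, k) PyTorch convolution weight.
    n: crossbar size (n x n).
    """
    out_ch = weight_4d.shape[0]
    W = weight_4d.reshape(out_ch, -1).t()            # 2D matrix: (in_ch*k*k, out_ch)
    rows, cols = W.shape
    r_pad = (-rows) % n                               # zero-padding along rows
    c_pad = (-cols) % n                               # zero-padding along columns
    W_padded = F.pad(W, (0, c_pad, 0, r_pad))         # pad columns right, rows bottom

    W_noisy = W_padded.clone()
    for i in range(0, W_padded.shape[0], n):          # partition into n x n tiles
        for j in range(0, W_padded.shape[1], n):
            tile = W_padded[i:i + n, j:j + n]
            # Each synapse perturbed with a Gaussian variation of sigma/mu
            W_noisy[i:i + n, j:j + n] = tile * (1 + sigma_over_mu * torch.randn_like(tile))
    return W_noisy[:rows, :cols].t().reshape_as(weight_4d)
```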
4.3 Crossbar-aware Training of the Supernet
Having proposed the Supernet architecture, we now describe the methodology adopted for carrying out crossbar-aware NAS (XploreNAS). One epoch of training the Supernet architecture occurs in two phases (see Algorithm 1) as follows.
4.3.1 Phase-1: Training the Weight Parameters.
A batch of clean inputs with a large batch-size (1,000 in this work) is randomly sampled from the training dataset and forwarded through the Supernet once. Thereafter, the weight parameters for all the layers in the Supernet are updated using backpropagation. Note that during Phase-1, we do not include any hardware-related parameter during the weight update. Also, the architectural parameters (\(\alpha\)) are unaffected in Phase-1.
4.3.2 Phase-2: Training the Architecture Parameters with Hardware-awareness.
After Phase-1, the weight matrices for the different convolutional layers in the Supernet are partitioned into numerous \(n \times n\) non-ideal crossbar arrays and integrated with crossbar-level noise using the methodology described in Section 4.2 (see Figure 4). Note that in Phase-2, the noisy weights are frozen and only the architecture parameters (\(\alpha\)) are trained using backpropagation. The Supernet integrated with crossbar-level noise (for a given crossbar size) is subjected to an ensemble of adversarial images from the validation dataset (a subset of the original training dataset) sent in batches. For a given batch, adversarial images are generated using either a PGD-7 or a PGD-20 attack, chosen randomly. The architecture parameters are updated to assign higher probability coefficients to the series of operations or paths through the Supernet that are more resilient to the impact of crossbar noise as well as adversarial noise. We repeat the above steps and train the Supernet for a few epochs, each epoch consisting of Phase-1 followed by Phase-2 training.
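A condensed sketch of one such two-phase epoch is shown below, assuming separate optimizers for the weight and architecture parameters; inject_crossbar_noise and pgd_attack are placeholder names for the routines described in Section 4.2 and Section 3.1.

```python
import random
import torch
import torch.nn.functional as F

def train_supernet_epoch(supernet, train_loader, val_loader,
                         w_optimizer, alpha_optimizer, crossbar_size=64):
    """One epoch of two-phase Supernet training (cf. Algorithm 1)."""
    supernet.train()

    # Phase-1: update only the weight parameters on clean inputs (no hardware noise).
    for x, y in train_loader:                            # large clean batches
        w_optimizer.zero_grad()
        F.cross_entropy(supernet(x), y).backward()
        w_optimizer.step()

    # Phase-2: inject crossbar noise, freeze the weights, and update only the
    # architecture parameters (alpha) on adversarial validation batches.
    clean_state = {k: v.clone() for k, v in supernet.state_dict().items()}
    inject_crossbar_noise(supernet, n=crossbar_size)     # Section 4.2 noise model (placeholder)
    for x, y in val_loader:
        steps = random.choice([7, 20])                   # PGD-7 or PGD-20, chosen at random
        x_adv = pgd_attack(supernet, x, y, n=steps)      # white-box attack on the noisy Supernet
        alpha_optimizer.zero_grad()
        F.cross_entropy(supernet(x_adv), y).backward()
        alpha_optimizer.step()                           # only alpha is stepped; noisy weights stay frozen
    supernet.load_state_dict(clean_state)                # restore the noise-free weights
```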
4.4 Deriving the Optimal Subnet from the Trained Supernet
Finally, based on the trained architecture parameters, we sample the optimal Subnet from the over-parameterized Supernet by modifying Equations (2) and (3) based on the following criteria:
\(m_{op-1} = \sum_{j:\, p_{j} > th} o_{j}(x),\)    (4)
\(m_{op-2} = \sum_{j:\, p_{j} > th} o_{j}(x).\)    (5)
In other words, if the probability coefficient for a constituent operation of Op-1 or Op-2 is greater than a threshold \(th\), then we retain the operation in the computation of the output at the end of Op-1 or Op-2. Otherwise, the operation is skipped. In this way, we derive a single Subnet with an optimal number of operations from the trained Supernet architecture. In this work, the value of \(th\) is heuristically chosen to be 0.2, for the Subnet to achieve substantially higher robustness than the baselines.
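A sketch of this thresholding step, assuming each mixed operation exposes its architecture parameters through an alpha attribute (as in the earlier MixedOp sketch), could look as follows.

```python
import torch.nn.functional as F

def sample_subnet_ops(supernet, th=0.2):
    """Return, for every mixed operation, the indices of constituent ops with p_j > th."""
    selected = {}
    for name, module in supernet.named_modules():
        if hasattr(module, "alpha"):                     # a MixedOp-style block
            p = F.softmax(module.alpha, dim=0)           # probability coefficients
            keep = [j for j, p_j in enumerate(p) if p_j.item() > th]
            selected[name] = keep                        # retained operations for this block
    return selected
```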
4.5 Fine-tuning the Subnet
Having obtained the Subnet architecture, we train the weights of the model using backpropagation by feeding an ensemble of clean and adversarial images (from the training dataset). Note that, similarly to Phase-2, we include crossbar-specific noise in the weights of the Subnet during the fine-tuning phase. Further, the adversarial images used in this stage are generated using either a white-box PGD-7 or PGD-20 attack.
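One possible form of a fine-tuning step consistent with this description is sketched below; inject_crossbar_noise and pgd_attack are again placeholder routines, and mixing clean and adversarial images within a single batch is an illustrative choice rather than the exact recipe used in this work.

```python
import random
import torch
import torch.nn.functional as F

def finetune_step(subnet, x, y, optimizer, crossbar_size=64):
    """One fine-tuning step on an ensemble of clean and adversarial images."""
    inject_crossbar_noise(subnet, n=crossbar_size)        # crossbar-specific noise in the weights
    steps = random.choice([7, 20])                        # white-box PGD-7 or PGD-20
    x_adv = pgd_attack(subnet, x, y, n=steps)
    x_batch = torch.cat([x, x_adv])                       # ensemble of clean + adversarial inputs
    y_batch = torch.cat([y, y])
    optimizer.zero_grad()
    F.cross_entropy(subnet(x_batch), y_batch).backward()
    optimizer.step()                                      # the Subnet weights are updated
```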
4.6 Deriving Three Types of Subnet Models
In this work, we derive three types of Subnet models, namely the following:
Model Xbar: This is the standard scenario wherein the noise-profile specific to a given crossbar size is integrated with the model weights during Phase-2 of Supernet training and during Subnet fine-tuning.
Model Xbar_Ar: Here, in addition to adding the crossbar-specific noise-profile to the weights in the Supernet, we also regularize the loss function during Phase-2 so as to search for a Subnet architecture that is optimized to have minimal resource-utilization on memristive crossbars. The simplest way to accomplish this is to optimize the architectures with respect to crossbar-area consumption. We follow the differentiable approach proposed in Reference [9] to optimize crossbar-specific area utilization during Phase-2. Let \(\phi\) denote the hardware parameter (area) that needs to be optimized. Thus, for Op-1 or Op-2 constituting multiple paths or operations, each with a given probability coefficient, we have the following expression for the expected value of \(\phi\), i.e., \(E[\phi ]\):
\(E[\phi ]_{op-1~or~op-2} = \sum_{j} p_{j} \cdot \phi _{j},\)
where \(\phi _j\) denotes the area estimate for the \(j{\rm th}\) constituent operation in Op-1 or Op-2, and all the possible values of \(\phi _j\) for a given crossbar size can be found in a lookup table containing the area estimates for all the operations in the search space. Now, by summing \(E[\phi ]_{op-1~or~op-2}\) over all the Op-1 and Op-2 operations in the Supernet model, we get the cumulative expected value of \(\phi\) for the Supernet (denoted as \(E[\phi ]_{total}\)), which needs to be minimized. Hence, the loss function in Phase-2 (\(\mathcal {L}_{Phase-2}\)) is written as follows:
\(\mathcal {L}_{Phase-2} = \mathcal {L}_{CE} + \lambda \cdot E[\phi ]_{total}.\)
Here, \(\mathcal {L}_{CE}\) denotes the cross-entropy loss and \(\lambda\) is a hyperparameter that controls the relative importance given to \(E[\phi ]_{total}\) with respect to \(\mathcal {L}_{CE}\).
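A sketch of this area regularization is given below, assuming each mixed operation records the names of its candidate operations in an op_names attribute; the lookup-table values and the value of \(\lambda\) are hypothetical placeholders.

```python
import torch
import torch.nn.functional as F

# Hypothetical lookup table: crossbar-area estimate phi_j (arbitrary units) per candidate operation
AREA_LUT = {"avgpool": 0.0, "conv3x3": 1.0, "conv5x5": 2.8, "skip": 0.0}

def expected_area(supernet, area_lut=AREA_LUT):
    """E[phi]_total = sum over all Op-1/Op-2 blocks of sum_j p_j * phi_j."""
    e_total = 0.0
    for module in supernet.modules():
        if hasattr(module, "alpha"):                       # a MixedOp-style block
            p = F.softmax(module.alpha, dim=0)             # probability coefficients
            phi = torch.tensor([area_lut[name] for name in module.op_names],
                               device=p.device)            # op_names is an assumed attribute
            e_total = e_total + (p * phi).sum()            # differentiable w.r.t. alpha
    return e_total

def phase2_loss(logits, targets, supernet, lam=0.1):
    """L_Phase-2 = L_CE + lambda * E[phi]_total."""
    return F.cross_entropy(logits, targets) + lam * expected_area(supernet)
```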
Model MultiXbar: Here, our objective is to search for a Subnet architecture that is not crossbar specific but rather robust across multiple crossbar sizes. Let us suppose a scenario wherein we want to obtain a Subnet that is robust against hardware-noise pertaining to 32 \(\times\) 32, 64 \(\times\) 64, and 128 \(\times\) 128 crossbars. Thus, during Phase-2, for a batch of adversarial inputs fed into the Supernet, we first add the noise-profile for 32 \(\times\) 32 crossbars to the weights, freeze them, update the architecture parameters, and restore the noise-free weights. Again, for the same batch of inputs, we add the noise-profile for 64 \(\times\) 64 crossbars to the weights, freeze them, update the architecture parameters and restore the noise-free weights. Finally, the same process is repeated for 128 \(\times\) 128 crossbars. In this manner, we ultimately arrive at a Subnet architecture that is resilient to the noise-profiles of multiple crossbar sizes. Thereafter, at the end of Phase-2 training, we derive the optimal Subnet (Equations (4) and (5)) and then fine-tune (see Section 4.5) our Subnet using an ensemble of noise-profiles pertaining to different crossbar sizes.
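The noise-profile cycling of Model MultiXbar during Phase-2 can be sketched as follows, with the same placeholder routines as in the earlier training sketch.

```python
import random
import torch
import torch.nn.functional as F

def phase2_multixbar_step(supernet, x, y, alpha_optimizer,
                          crossbar_sizes=(32, 64, 128)):
    """One Phase-2 step for Model MultiXbar: cycle through the noise-profiles
    of several crossbar sizes for the same batch of inputs."""
    clean_state = {k: v.clone() for k, v in supernet.state_dict().items()}
    for xbar_size in crossbar_sizes:
        inject_crossbar_noise(supernet, n=xbar_size)       # add the noise-profile for this size
        x_adv = pgd_attack(supernet, x, y, n=random.choice([7, 20]))
        alpha_optimizer.zero_grad()
        F.cross_entropy(supernet(x_adv), y).backward()
        alpha_optimizer.step()                             # update the architecture parameters
        supernet.load_state_dict(clean_state)              # restore the noise-free weights
```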
5 Metrics for Assessment
In this section, we define the metrics that are used to evaluate the Subnets as well as our baseline models on memristive crossbar arrays. These are as follows:
Adversarial accuracy on crossbars: This denotes the PGD-\(n\) accuracy of the trained Subnets and baselines on non-ideal crossbar arrays during inference. The higher the value of adversarial accuracy, the better the robustness of the DNN deployed on crossbars.
Hardware efficiency using EDAP: We compute the overall EDAP for a DNN during inference on a memristive crossbar-based hardware platform using the Neurosim tool. Neurosim [12] is a Python-based hardware-evaluation platform that performs a holistic energy-latency-area evaluation of analog crossbar-based DNN accelerators. The EDAP evaluations using Neurosim include contributions of the dot-product processing engines (i.e., the crossbars) as well as the peripheral circuits (DACs, ADCs, buffers, and so forth) and on-chip interconnects. Note that in this work, all EDAPs are shown in the units of mJ ms mm\(^2\). The lower the EDAP, the better the hardware efficiency of the DNN model. We calibrate the Neurosim environment for ReRAM crossbar arrays with specifications listed in the table inscribed within Figure 4.
Average crossbar underutilization: As discussed in Section 4.2, if the dimensions of the 2D weight matrix of a DNN layer are not exact multiples of the crossbar size, then the weight matrix is zero-padded and partitioned into multiple crossbar arrays. This zero-padding effectively results in the underutilization of certain crossbar arrays, i.e., additional hardware area, energy, latency, and leakage power are expended in the computation and processing of non-useful dot-products. Let us assume that the dimensions of a 2D weight matrix (\(in\_ch*k^2\), \(out\_ch\)) are not multiples of the crossbar size \(n \times n\). Then, the amount of zero-padding along the rows is given by \(r_{pad} = n - (in\_ch*k^2)\,\%\,n\), where the operator % denotes the remainder operation. Likewise, the amount of zero-padding along the columns is given by \(c_{pad} = n - out\_ch\,\%\,n\). Thus, we define the value of crossbar-underutilization for the given 2D weight matrix as the fraction of mapped crossbar cells occupied by zero-padded (non-useful) weights:
\(\text{Crossbar-underutilization} = 1 - \frac{(in\_ch*k^2) \times out\_ch}{((in\_ch*k^2)+r_{pad}) \times (out\_ch+c_{pad})}.\)
If there are \(l\) convolutional layers in a DNN whose weight matrices are to be zero-padded, then average crossbar-underutilization for the DNN is defined as the mean of the values of crossbar-underutilization for the \(l\) layers.
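Assuming crossbar-underutilization is the fraction of zero-padded cells as defined above, it can be computed as in the following sketch.

```python
def crossbar_underutilization(in_ch, out_ch, k, n):
    """Fraction of mapped crossbar cells that hold zero-padded (non-useful) weights."""
    rows, cols = in_ch * k * k, out_ch                 # dimensions of the 2D weight matrix
    r_pad = (-rows) % n                                 # zero-padding along the rows
    c_pad = (-cols) % n                                 # zero-padding along the columns
    return 1.0 - (rows * cols) / ((rows + r_pad) * (cols + c_pad))

def average_underutilization(padded_layers, n):
    """Mean crossbar-underutilization over the l layers that require zero-padding."""
    vals = [crossbar_underutilization(in_ch, out_ch, k, n)
            for (in_ch, out_ch, k) in padded_layers]
    return sum(vals) / len(vals)

# Example: 64 input channels, 100 output channels, 3x3 kernels, 64 x 64 crossbars
print(crossbar_underutilization(in_ch=64, out_ch=100, k=3, n=64))   # ~0.22
```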
6 Experiments and Results
We conduct our experiments with the benchmark datasets SVHN [29], CIFAR10, and CIFAR100 [22], using Pytorch. We construct the validation set by selecting 5,000 images randomly from the overall training dataset. The remainder of the images constitute our training dataset. Note that the test dataset used to evaluate the inference accuracy of the DNN models on hardware is completely different from the training and validation sets and is never used during the Supernet training or Subnet fine-tuning stages. We perform the two-phase crossbar-aware training of the Supernet for 60 epochs using Algorithm 1. The crossbar sizes considered in this work include 32 \(\times\) 32, 64 \(\times\) 64, and 128 \(\times\) 128. We generate Model Xbar, Model Xbar_Ar, and Model MultiXbar using XploreNAS. These Subnets are fine-tuned using the Adam optimizer for 40 epochs (see Section 4.5) and used for inference on crossbars. Unless otherwise stated, the crossbar-related parameters used for training and inference are specified in the table inscribed within Figure 3(b). Note that all PGD-\(n\) adversarial attacks are white-box in nature with \(\alpha =2/255\) and \(\epsilon =8/255\). Our baseline is a ResNet-18 architecture adversarially trained for 40 epochs using the Adam optimizer with an ensemble of PGD-7 and PGD-20 inputs in the presence of crossbar-specific noise for a given crossbar size. The crossbar-aware clean accuracies (i.e., inference accuracy in the absence of adversarial attack) of the XploreNAS-derived Subnets and ResNet-18 baselines on 64 \(\times\) 64 crossbars are listed in Table 1. It should be noted that all of these models have been adversarially trained in the presence of crossbar-specific noise profiles. The rigorous adversarial training in the presence of an ensemble of noise-profiles pertaining to different crossbar sizes for Model MultiXbar limits its clean accuracy compared to the other XploreNAS-derived Subnets, although we will see in the following sections that Model MultiXbar achieves the best hardware-aware adversarial robustness across multiple crossbar sizes with competitive EDAP benefits.
Table 1. Hardware-aware Clean Accuracies of the XploreNAS-derived Subnets and ResNet-18 Baseline on 64 \(\times\) 64 Crossbars for CIFAR10 and CIFAR100 Datasets

Dataset   | Model                | Xbar Clean Accuracy (%)
----------|----------------------|------------------------
CIFAR10   | ResNet-18 (Baseline) | 80.06
CIFAR10   | Model Xbar           | 84.32
CIFAR10   | Model Xbar_Ar        | 83.87
CIFAR10   | Model MultiXbar      | 81.03
CIFAR100  | ResNet-18 (Baseline) | 51.36
CIFAR100  | Model Xbar           | 55.82
CIFAR100  | Model Xbar_Ar        | 54.62
CIFAR100  | Model MultiXbar      | 51.89
Now, we present Figure 5, which helps us understand the usefulness of our proposed hardware-aware XploreNAS approach. The results are for 64 \(\times\) 64 crossbars on the CIFAR10 dataset. In Figure 5(a), we find that Model Xbar achieves a high degree of adversarial robustness on hardware compared to a purely software-based adversarially trained DNN model derived via NAS as per Reference [16], when inferred on crossbars with non-idealities. Note that the purely software-based model (referred to as SW-NAS) achieves state-of-the-art adversarial accuracy (\(\sim\)52–56\(\%\)) in software on the CIFAR10 dataset against white-box PGD attacks. However, on crossbars, for a weaker attack (PGD-2), the adversarial accuracy of SW-NAS witnesses a huge drop to \(\sim 20\%\), while our Model Xbar achieves \(\sim 74\%\) adversarial accuracy. For a stronger attack (PGD-20), the accuracy of SW-NAS is at \(\sim 14\%\), while Model Xbar still achieves \(\sim 13\%\) higher accuracy than SW-NAS.
Fig. 5.
Figure 5(b) presents a radar chart that helps us understand the usefulness of our proposed NAS approach across multiple dimensions. Here, we look at the EDAP per inference estimated using the Neurosim tool, normalized with respect to the baseline ResNet-18 model. We find that the SW-NAS model has \(\sim 4.5\times\) higher EDAP than Model Xbar, while the ResNet-18 baseline is slightly more hardware efficient than Model Xbar (which is not as optimized as its Model Xbar_Ar counterpart). Although Model MultiXbar achieves only marginally higher clean accuracy than the ResNet-18 baseline, as shown in Table 1, it provides the best tradeoff between adversarial accuracy and EDAP on hardware, achieving \(\sim 30\%\) higher accuracy than SW-NAS for a stronger PGD attack (PGD-20) and \(\sim 2\times\) lower EDAP than the ResNet-18 baseline. Another dimension to look at in Figure 5(b) is the average crossbar-underutilization. As discussed in Section 5, the higher the crossbar-underutilization, the greater the hardware energy and area expended on the processing of non-useful dot-product computations (or MAC operations). SW-NAS, being optimized in a hardware-agnostic manner, has a layerwise architecture that is not crossbar-friendly and leads to a huge average crossbar-underutilization of \(\sim 71\%\). For our Model Xbar, the crossbar-underutilization is reduced to \(\sim 49\%\) and is restricted only to the Op-1 operation prior to the R-I residual-block of our Supernet. This is attributed to our crossbar-centric choice of kernel sizes and input/output channels in the convolutional layers of the Supernet model.
6.1 Analysis of Adversarial Accuracy of Subnets
The results in Figure 6(a) and (c) are for the CIFAR10 dataset, Figure 6(b) is for the CIFAR100 dataset, and Figure 6(d) is for the SVHN dataset. In Figure 6(a), we find that the Model Xbar architectures corresponding to crossbar sizes ranging from 32 \(\times\) 32 to 128 \(\times\) 128 are more adversarially robust (by \(\sim\)2–4\(\%\)) than their Model Xbar_Ar counterparts. This is because Model Xbar is purely optimized to maximize adversarial accuracy on noisy crossbars, while Model Xbar_Ar is optimized to yield crossbar-efficient architectures for hardware-constrained scenarios in addition to maximizing adversarial robustness. However, both Model Xbar and Model Xbar_Ar outperform their corresponding ResNet-18 baselines in terms of adversarial robustness across all the crossbar sizes considered.
Fig. 6.
Furthermore, in Figure 6(c) we find that Model MultiXbar, optimized for crossbar noise pertaining to sizes ranging from 32 \(\times\) 32 to 128 \(\times\) 128, attains significantly higher robustness than its Model Xbar counterparts, specifically for stronger PGD-20 attacks (\(\sim\)15–16\(\%\) better robustness). This is because for each training epoch during Subnet fine-tuning of Model MultiXbar, we update the weights thrice (corresponding to three different crossbar noise-profiles) for a given batch of inputs. Thus, the Subnet attains inherently higher hardware noise-aware adversarial robustness than the three Model Xbar architectures, each pertaining to a specific crossbar size. This analysis also shows that XploreNAS can derive a Subnet architecture that can be inferred on crossbar arrays of varying sizes with a high degree of adversarial resilience across weak and strong PGD attacks. Similar results can be seen for the Subnets trained with the CIFAR100 dataset, as depicted in Figure 6(b), with Model MultiXbar outperforming the corresponding Model Xbar by a margin of \(\sim\)4–8\(\%\) across the PGD-10 and PGD-20 adversarial attacks. For the plot in Figure 6(d) for the SVHN dataset, we show results for weak attacks (PGD-2 and PGD-5), since stronger attacks on a simple dataset like SVHN lead to random test accuracy for both the NAS-derived and ResNet-18 models. We see trends similar to those of the CIFAR10 dataset. Note that here Model Xbar and Model Xbar_Ar comprise three corresponding Subnets/DNNs for the three different crossbar sizes, while Model MultiXbar is a single Subnet/DNN optimized for all three crossbar sizes.
6.2 Analysis of EDAP Results of Subnets
In this section, we benchmark our Subnet architectures derived using XploreNAS for overall hardware efficiency using the Neurosim tool. In Figure 6(a), we plot the EDAP per inference for various DNNs on a logarithmic scale with respect to the crossbar size. The trends pertain to the CIFAR10 dataset. We find that although Model Xbar outperforms the corresponding baseline ResNet-18 model in terms of adversarial robustness on crossbars, it expends a higher EDAP on hardware (\(\sim 1.62\times\) higher on 64 \(\times\) 64 crossbars). This scenario changes when considering Model Xbar_Ar, which is optimized to yield better hardware efficiency in addition to adversarial robustness. On 64 \(\times\) 64 (128 \(\times\) 128) crossbars, Model Xbar_Ar outperforms the corresponding ResNet-18 baseline by achieving \(\sim 1.60\times\) (\(\sim 1.53\times\)) lower EDAP on hardware. In fact, the EDAP of Model Xbar_Ar on 64 \(\times\) 64 crossbars is lower than that of Model Xbar on 128 \(\times\) 128 crossbars.
The EDAP trends in Figure 6(b) pertain to the CIFAR100 dataset. The plot corroborates that Model MultiXbar, which is more robust than the corresponding Model Xbar, is also inherently more hardware efficient without any additional optimizations during training. This result implies that XploreNAS inherently performs a performance vs. EDAP tradeoff when searching for a Subnet in Phase-2 training that is amenable to all three crossbar size-specific noise profiles (the Model MultiXbar scenario). On 128 \(\times\) 128 crossbars, Model MultiXbar even achieves \(\sim 1.16\times\) lower EDAP than the corresponding Model Xbar_Ar, which is specifically optimized for better hardware efficiency.
6.3 How Do the Searched Architectures Differ?
There are a total of nine Op-1 and Op-2 operations in the Supernet (see Figure 3). Let us collectively refer to them as Select operations. In this section, we sequentially arrange the Select operations of the different Subnets determined by our XploreNAS approach and analyze the architectures (see Table 2). It is evident that a DNN layer with a 3 \(\times\) 3 weight-kernel would require a lower number of crossbars to be mapped than one with a 5 \(\times\) 5 weight-kernel. Further, the latter convolutional layers in the searched Subnets entail a larger number of crossbars for mapping owing to an increase in the number of input/output channels. Based on the results in Table 2, we carry out our analysis as follows:
Table 2. Architectures of the Different XploreNAS-derived Subnets
Comparison between Model Xbar and Model Xbar_Ar: For a given crossbar size, when we compare the architectures of Model Xbar and Model Xbar_Ar, we find that Model Xbar_Ar chooses an architecture that requires an overall lower number of crossbars to be mapped. This includes preferring convolutions with lower weight-kernel sizes in the latter residual blocks or replacing convolutional layers with AvgPool or skip-connections. For example, in the case of the CIFAR10 results for crossbar sizes ranging from 32 \(\times\) 32 to 128 \(\times\) 128, we find that the last Select operation is a skip-connection for Model Xbar_Ar instead of Conv3x3 for Model Xbar (highlighted in red in Table 2). The trends are similar for the CIFAR100 dataset (data not shown for brevity). The reduction in the overall crossbar count brings higher hardware efficiency (lower EDAP) to Model Xbar_Ar, as seen in Section 6.2. For the CIFAR10 dataset, we plot the blockwise EDAP for the four residual blocks in our Subnets (see Figure 7) and find that the major reduction in EDAP for Model Xbar_Ar arises from the Select operations in the latter residual blocks (R-II and R-IV for 64 \(\times\) 64 crossbars and R-III and R-IV for 128 \(\times\) 128 crossbars). Additionally, since XploreNAS inherently performs a performance vs. EDAP tradeoff when searching for a MultiXbar Subnet, we find that Model MultiXbar on 128 \(\times\) 128 crossbars undergoes a reduction in EDAP primarily for the R-I and R-III residual blocks compared to its Model Xbar counterpart, achieving \(\sim 1.15\times\) lower overall EDAP.
Fig. 7.
Comparison between different crossbar sizes: For Model Xbar with the CIFAR10 dataset, on moving from a crossbar size of 32 \(\times\) 32 to 64 \(\times\) 64, we find that for the sixth and eighth Select operations, Conv5x5 gets replaced with the Conv3x3 operation. Similarly, for the second Select operation, the ensemble of Conv3x3 & Conv5x5 operations gets replaced with a single Conv5x5 operation. This has been highlighted in blue in Table 2. Similar trends are also seen in the case of Model Xbar_Ar on moving across the crossbar sizes of 32 \(\times\) 32, 64 \(\times\) 64, and 128 \(\times\) 128. In other words, XploreNAS generally opts for convolution operations with smaller weight-kernel sizes to reduce the dimensions of the weight matrices when searching for the optimal architecture on larger crossbar sizes. However, on moving to very large crossbar sizes such as 128 \(\times\) 128 (having a greater impact of non-idealities), very small weight matrices may result in non-ideality dominance that can adversely affect the performance (robustness) of the DNN model [3]. Thus, we find that XploreNAS chooses a mix of small and large kernel operations to balance the hardware cost alongside performance on 128 \(\times\) 128 crossbars (highlighted in magenta in Table 2). Interestingly, XploreNAS takes the best of both worlds for Model MultiXbar and yields an optimal mix of smaller and larger kernels, corroborating its balanced EDAP vs. performance tradeoff.
6.4 Results on Varying NVM Device Noise in Crossbars during Inference
So far in all our experiments (for XploreNAS and baselines), we have assumed that the NVM devices in the crossbars have synaptic variations with \(\sigma /\mu =0.35\), both during training and inference as specified in Section 6. In this section, we infer the models with different levels of crossbar noise by altering the synaptic device variations. Figure 8(a) and (b) show the plots between PGD-10 adversarial accuracy and device variations for CIFAR10 and CIFAR100 datasets, respectively. The device variations are varied from 0.1 to 0.5 for DNNs mapped on 64 \(\times\) 64 crossbars during inference.
Fig. 8.
We find our XploreNAS-derived models to be highly resilient across this range of device noise without re-training; specifically, the Model MultiXbar architecture maintains adversarial accuracy in the range of \(\sim\)48–55\(\%\) for the CIFAR10 dataset. In contrast, the standard ResNet-18 baseline model is relatively less robust and suffers higher accuracy losses for device variations over 30%. Its adversarial accuracy lies in the range of \(\sim\)41–52\(\%\) for the CIFAR10 dataset. Similar trends can also be seen for the CIFAR100 dataset, with our XploreNAS-derived Model MultiXbar maintaining its adversarial accuracy in the range of \(\sim\)29–36\(\%\), while the standard ResNet-18 baseline achieves lower adversarial robustness in the range of \(\sim\)20–30\(\%\).
7 Comparison with Related Works
We provide a qualitative and quantitative comparison between XploreNAS and previous works in Table 3. Non-NAS works such as References [2, 5, 6, 30] have used algorithm-hardware co-design to achieve adversarial robustness with competitive hardware efficiencies. Other NAS-based co-design works such as References [13, 27] have shown significant EDAP reductions during inference on their respective hardware platforms. However, none of these have focused on optimizing neural architectures for adversarial robustness. Contrarily, Reference [16] is a purely hardware-agnostic approach to boost the adversarial robustness of DNNs in software and hence does not deal with hardware metrics such as EDAP. As pointed out in Section 6, SW-NAS models derived via Reference [16] are highly vulnerable on crossbar platforms and lose their adversarial robustness owing to the non-idealities. XploreNAS explores the best of both worlds and achieves a balanced tradeoff between adversarial robustness and EDAP.
All quantitative values denote percentage changes with respect to the respective baselines used in these works. Here, “\(\times\)” denotes non-applicability and — denotes metrics that are not reported in the respective works.
8 Conclusion
In the quest for achieving adversarial security for DNNs deployed on memristive crossbar accelerators, we propose an algorithm-hardware co-optimization approach called XploreNAS. It uses one-shot NAS for searching adversarially robust DNNs for non-ideal crossbar platforms. Our experiments show XploreNAS-derived Subnets can achieve a balanced tradeoff between adversarial robustness and hardware efficiency in terms of lower EDAP. Furthermore, we also find the Model MultiXbar architectures derived by XploreNAS to have adversarial resilience over a range of crossbar sizes and device-level variations without the requirement for re-training. Our XploreNAS work can further motivate future studies toward better methods integrated with the Subnet fine-tuning stage for adversarially training or adapting the sampled Subnets to strong adversarial attacks on memristive crossbars.
References
[1] Gabriel Bender, Pieter-Jan Kindermans, Barret Zoph, Vijay Vasudevan, and Quoc Le. 2018. Understanding and simplifying one-shot architecture search. In International Conference on Machine Learning. PMLR, 550–559.
Abhiroop Bhattacharjee, Lakshya Bhatnagar, and Priyadarshini Panda. 2022. Examining and mitigating the impact of crossbar non-idealities for accurate implementation of sparse deep neural networks. In Proceedings of the Design, Automation & Test in Europe Conference & Exhibition (DATE’22). IEEE, 1119–1122.
Abhiroop Bhattacharjee, Youngeun Kim, Abhishek Moitra, and Priyadarshini Panda. 2022. Examining the robustness of spiking neural networks on non-ideal memristive crossbars. In Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design. 1–6.
Abhiroop Bhattacharjee, Abhishek Moitra, and Priyadarshini Panda. 2021. Efficiency-driven hardware optimization for adversarially robust neural networks. In Proceedings of the Design, Automation & Test in Europe Conference & Exhibition (DATE’21). IEEE, 884–889.
Abhiroop Bhattacharjee and Priyadarshini Panda. 2020. Switchx: Gmin-gmax switching for energy-efficient and robust implementation of binary neural networks on reram xbars. arXiv:2011.14498. Retrieved from https://arxiv.org/abs/2011.14498.
Han Cai, Tianyao Chen, Weinan Zhang, Yong Yu, and Jun Wang. 2018. Efficient architecture search by network transformation. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 32.
Han Cai, Chuang Gan, Tianzhe Wang, Zhekai Zhang, and Song Han. 2019. Once-for-all: Train one network and specialize it for efficient deployment. arXiv:1908.09791. Retrieved from https://arxiv.org/abs/1908.09791.
Han Cai, Ligeng Zhu, and Song Han. 2018. Proxylessnas: Direct neural architecture search on target task and hardware. arXiv:1812.00332. Retrieved from https://arxiv.org/abs/1812.00332.
Indranil Chakraborty, Mustafa Fayez Ali, Dong Eun Kim, Aayush Ankit, and Kaushik Roy. 2020. Geniex: A generalized approach to emulating non-ideality in memristive xbars using neural networks. In Proceedings of the 57th ACM/IEEE Design Automation Conference (DAC’20). IEEE, 1–6.
Gouranga Charan, Jubin Hazra, Karsten Beckmann, Xiaocong Du, Gokul Krishnan, Rajiv V. Joshi, Nathaniel C. Cady, and Yu Cao. 2020. Accurate inference with inaccurate RRAM devices: Statistical data, model transfer, and on-line adaptation. In Proceedings of the 57th ACM/IEEE Design Automation Conference (DAC’20). IEEE, 1–6.
Ian Goodfellow et al. 2014. Generative adversarial nets. In Proceedings of the Conference and Workshop on Neural Information Processing Systems (NeurIPS’14).
Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. 2014. Explaining and harnessing adversarial examples. arXiv:1412.6572. Retrieved from https://arxiv.org/abs/1412.6572.
Minghao Guo, Yuzhe Yang, Rui Xu, Ziwei Liu, and Dahua Lin. 2020. When nas meets robustness: In search of robust architectures against adversarial attacks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 631–640.
Miao Hu, Catherine E. Graves, Can Li, Yunning Li, Ning Ge, Eric Montgomery, Noraica Davila, Hao Jiang, R Stanley Williams, J. Joshua Yang, et al. 2018. Memristor-based analog computation and neural network classification with a dot product engine. Adv. Mater. 30, 9 (2018), 1705914.
Miao Hu, John Paul Strachan, Zhiyong Li, Emmanuelle M. Grafals, Noraica Davila, Catherine Graves, Sity Lam, Ning Ge, Jianhua Joshua Yang, and R. Stanley Williams. 2016. Dot-product engine for neuromorphic computing: Programming 1T1M crossbar to accelerate matrix-vector multiplication. In Proceedings of the 53nd ACM/EDAC/IEEE Design Automation Conference (DAC’16). IEEE, 1–6.
Gokul Krishnan, Sumit K. Mandal, Chaitali Chakrabarti, Jae-sun Seo, Umit Y. Ogras, and Yu Cao. 2020. Interconnect-aware area and energy optimization for in-memory acceleration of DNNs. IEEE Des. Test 37, 6 (2020), 79–87.
Zhuang Liu, Jianguo Li, Zhiqiang Shen, Gao Huang, Shoumeng Yan, and Changshui Zhang. 2017. Learning efficient convolutional networks through network slimming. In Proceedings of the IEEE International Conference on Computer Vision. 2736–2744.
Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. 2017. Towards deep learning models resistant to adversarial attacks. arXiv:1706.06083. Retrieved from https://arxiv.org/abs/1706.06083.
Shubham Negi, Indranil Chakraborty, Aayush Ankit, and Kaushik Roy. 2022. NAX: Neural architecture and memristive xbar based accelerator co-design. In Proceedings of the 59th ACM/IEEE Design Automation Conference. 451–456.
Vladimir Nekrasov, Hao Chen, Chunhua Shen, and Ian Reid. 2020. Architecture search of dynamic cells for semantic video segmentation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 1970–1979.
Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y. Ng. 2011. Reading digits in natural images with unsupervised feature learning. In NIPS Workshop on Deep Learning and Unsupervised Feature Learning.
Deboleena Roy, Indranil Chakraborty, Timur Ibrayev, and Kaushik Roy. 2020. Robustness hidden in plain sight: Can analog computing defend against adversarial attacks? arXiv:2008.1201. Retrieved from http://arxiv.org/abs/2008.1201.
Xiaoyu Sun and Shimeng Yu. 2019. Impact of non-ideal characteristics of resistive synaptic devices on implementing convolutional neural networks. IEEE J. Emerg. Select. Top. Circ. Syst. 9, 3 (2019), 570–579.
Zhenhua Zhu, Jilan Lin, Ming Cheng, Lixue Xia, Hanbo Sun, Xiaoming Chen, Yu Wang, and Huazhong Yang. 2018. Mixed size crossbar based RRAM CNN accelerator with overlapped mapping method. In Proceedings of the IEEE/ACM International Conference on Computer-Aided Design (ICCAD’18). IEEE, 1–8.
Barret Zoph, Vijay Vasudevan, Jonathon Shlens, and Quoc V. Le. 2018. Learning transferable architectures for scalable image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 8697–8710.