1 Introduction
In recent years, machine learning has shown successful results in artificial intelligence applications such as computer vision, natural language processing, pattern classification and speech recognition [1]. The rapid expansion of data volume and neural network training durations over the past decade has posed significant challenges for CPUs and GPUs in terms of energy efficiency [2]. In response, specialized artificial intelligence (AI) accelerators such as Eyeriss and Google TPUs have been devised to address these concerns [3]. However, the constant shuttling of data between the accelerator and memory is often limited by the memory bandwidth and contributes to significant power and energy consumption.
Computing-in-memory (CIM) has emerged as a promising paradigm for AI accelerators in which the computations are performed inside the memory [4]. The conventional CMOS memory technologies encompass dynamic random access memory (DRAM), static random access memory (SRAM) and flash, all of which rely on the charge storage phenomenon. Compared to these traditional technologies, however, emerging non-volatile memory technologies (eNVMs) are better suited to CIM operations in the field of neuromorphic hardware because of their capability to closely imitate the neurons and synapses of biological systems [5]. The memristor, a specific eNVM, has several advantageous properties, including low power consumption, scalability, compatibility with CMOS technology, and analog conductance modulation, making it a viable replacement for traditional CMOS memory technologies [5]. These characteristics allow the memristor to be used in building CIM architectures, supplanting the usage of analog-to-digital and digital-to-analog converters and resulting in substantial reductions in area and power consumption. Several memristor-based AI accelerators for Convolutional Neural Networks (CNNs) have previously been built, including PRIME, ISAAC, PipeLayer and AtomLayer, achieving significant improvements in energy and power compared to CMOS-based accelerators [6–9].
The integration of Spiking Neural Networks (SNNs) in memristor-based machine learning can further improve the efficiency of these neural networks. SNNs emulate biological neurons more closely via the transmission of spiking signals. The information in SNNs is encoded in both the timing and the ordering of the spikes, giving them a spatiotemporal information processing capability [10]. While the spatial aspect arises from the arrangement of neurons relative to one another, the temporal aspect stems from the timing characteristics of SNNs [11]. Once a threshold is exceeded, a spike is fired, capturing the information in a binary format, which further aids the energy and power efficiency of these neural networks in hardware implementations. These properties make SNNs superior to ANNs, and they are therefore regarded as the third generation of neural networks [12]. In memristor-based neural networks, the integration of SNNs also ensures improvements in noise margins and increased tolerance to device variation [13]. There have been several implementations of memristor-based SNNs over the past years. RESPARC, an energy-efficient and reconfigurable architecture built on memristive crossbars for deep SNNs, uses a hierarchical reconfigurable design to incorporate data-flow patterns in SNNs [14]. Zhao et al. [15] fabricated a memristor-based SNN with leaky integrate-and-fire (LIF) neurons capable of performing spike-timing-dependent plasticity (STDP) learning; however, the network was not evaluated on any image or pattern recognition task. In 2021, a multilayer memristor-based SNN was developed with a temporal-order information encoding and STDP for weight updating [16]. Although this work achieves relatively high accuracy on the MNIST dataset, the LIF neurons used in [16] incorporate multiple operational amplifiers, consuming significant area and power. A fully hardware implementation of a memristor-based SNN was also developed in [17], realizing the LIF neuron functions with threshold switching. However, the use of digital circuitry in that work leads to a notable increase in hardware overhead.
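For reference, the threshold-and-fire behavior described above can be summarized in a few lines. The sketch below is a minimal discrete-time LIF model; the leak factor, threshold and reset values are illustrative assumptions, not parameters taken from any of the cited designs.

```python
import numpy as np

def lif_step(v, i_in, beta=0.9, v_th=1.0, v_reset=0.0):
    """One discrete-time step of a leaky integrate-and-fire neuron."""
    v = beta * v + i_in                  # leaky integration of the input current
    spike = (v >= v_th).astype(float)    # binary spike once the threshold is crossed
    v = np.where(spike > 0, v_reset, v)  # reset the neurons that fired
    return v, spike

# A constant drive makes the neuron integrate, fire, and reset periodically.
v = np.zeros(1)
spike_train = []
for _ in range(12):
    v, s = lif_step(v, 0.3)
    spike_train.append(int(s[0]))
```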
In our work, we aim to build a low-power, area- and energy-efficient CIM-based SNN architecture. To achieve this goal, we combine the SNN with our fabricated robust two-layer memristor crossbar array, which has an enhanced heat dissipation capability [18]. Two major problems with memristor devices are resistance variation and leakage currents [19]. With the added heat dissipation layer, our device reduces the resistance variation and increases the inference accuracy by ∼30% [20]. The issue of leakage current is minimized by the high on/off ratio of our device, allowing it to reach stable high and low states [20]. A single-layer crossbar, however, suffers from high latency, large area, and substantial power consumption [21]. Using monolithic three-dimensional integration technology, two crossbars are stacked on top of each other, reducing the area, power consumption and latency by 2×, 1.48× and 1.58×, respectively [18]. In addition to these improvements, the enhanced heat dissipation capability of our memristor crossbar array improves the robustness of the architecture. By effectively dissipating heat, the impact of temperature fluctuations is minimized, ensuring the reliability and stability of the device during prolonged operation. This robustness is crucial for maintaining the accuracy and performance of the SNN.
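For intuition, the crossbar performs vector-matrix multiplication in the analog domain: voltages applied to the rows produce column currents proportional to V·G, and a signed weight can be mapped onto a pair of conductances, one per stacked layer. The sketch below is a minimal behavioral model of this differential mapping; the conductance range and scaling are illustrative assumptions, not parameters of our fabricated device.

```python
import numpy as np

G_MIN, G_MAX = 1e-6, 1e-4  # illustrative on/off conductance range (S)

def map_weights(W):
    """Map a signed weight matrix onto two non-negative conductance
    matrices, one per crossbar layer (positive and negative parts)."""
    scale = (G_MAX - G_MIN) / np.abs(W).max()
    G_pos = G_MIN + scale * np.clip(W, 0, None)
    G_neg = G_MIN + scale * np.clip(-W, 0, None)
    return G_pos, G_neg

def crossbar_vmm(v_in, G_pos, G_neg):
    """Column currents follow Ohm's and Kirchhoff's laws; subtracting
    the two layers' currents yields a signed result proportional to v_in @ W."""
    return v_in @ G_pos - v_in @ G_neg

W = np.array([[0.5, -0.2], [-0.4, 0.8]])
G_pos, G_neg = map_weights(W)
i_out = crossbar_vmm(np.array([0.1, 0.3]), G_pos, G_neg)
```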
To convert our data into spikes, we investigate and implement a type of temporal encoding scheme known as inter-spike interval (ISI) encoding. Rate and temporal encoding stand out as the two prominent neural coding schemes in the literature. While rate encoding represents data in the frequency of the spikes, temporal encoding utilizes the timing aspect of the signal and encodes information in the precise times of the spikes [22,23]. Because temporal encoding generates fewer spikes, it is superior to its rate counterpart in terms of power and energy efficiency, especially in hardware implementations [23]. Among the temporal encoding schemes, the widely known ones are phase, burst and time-to-first-spike (TTFS) encoding [24–26]. The major advantage of our ISI encoding scheme lies in its high information density, which stems from the fact that information is encoded in both the spacing and the timing of the spikes, influenced by the strength of the incoming signal.
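To make the contrast concrete, the sketch below encodes a normalized input intensity under all three schemes. The linear gap rule used for ISI here is only an illustrative placeholder, not the exact mapping of our encoder, which is detailed later; the window length and spike counts are likewise assumptions.

```python
import numpy as np

T = 100  # time steps per input window

def rate_encode(x, seed=0):
    """Rate coding: intensity sets the firing probability per step."""
    rng = np.random.default_rng(seed)
    return (rng.random(T) < x).astype(int)

def ttfs_encode(x):
    """Time-to-first-spike: a single spike, earlier for stronger inputs."""
    s = np.zeros(T, dtype=int)
    s[int((1.0 - x) * (T - 1))] = 1
    return s

def isi_encode(x, n_spikes=4, d_min=2, d_max=20):
    """ISI coding (illustrative rule): a stronger input shrinks the gap
    between consecutive spikes, so information lives in both the spike
    times and their spacing."""
    gap = int(d_max - x * (d_max - d_min))
    s = np.zeros(T, dtype=int)
    s[np.arange(n_spikes) * gap] = 1
    return s

x = 0.7  # normalized pixel intensity
trains = {f.__name__: f(x) for f in (rate_encode, ttfs_encode, isi_encode)}
```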
We propose an energy-efficient SNN by combining the merits of our ISI encoding scheme and our two-layer memristor crossbar. The input and output processing units of the memristor crossbar are discussed in detail in our work, as is the construction of the hidden units, demonstrating the scalability of the network. A small-scale hardware model is proposed using a three-layer SNN architecture, and its accuracy is demonstrated using pixelated images of handwritten digits as inputs. To demonstrate the potential of our design, a large-scale three-layer CIM-based SNN model is then developed in PyTorch and tested on the MNIST dataset, where it achieves much higher accuracy than its TTFS and rate counterparts.
The key contributions of our work are listed below:
(1) We design and optimize our SNN with an ISI encoding scheme for high information density and spatiotemporal information processing capability, memristor crossbars for CIM operations, and a TTFS-based classification scheme in the output layer for energy efficiency.
(2) Both positive and negative weight matrices are implemented using our two-layer memristor crossbar for improvements in latency, area and power consumption.
(3) We successfully fabricated our robust memristor crossbar array with an enhanced heat dissipation capability and high on/off ratio to be used for CIM-based vector-matrix multiplication operations.
(4) We demonstrate the classification of handwritten digits using a TTFS classification scheme on hardware with pixelated images of digits 0–9, consuming merely 2.9 mW of power at an inference speed of 2 μs/image (a minimal sketch of this readout follows the list).
(5) We evaluate our large-scale CIM-based SNN on the MNIST dataset in PyTorch; our results show a ∼10% increase in accuracy with the ISI encoding scheme compared to the rate and TTFS encoding schemes.
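As referenced in contribution (4), the TTFS classification rule is simple: the predicted class is the output neuron that fires first. The sketch below is a minimal readout under the assumption that output spike trains are available as a binary (classes × time steps) array; it is not the hardware readout circuit itself.

```python
import numpy as np

def ttfs_classify(out_spikes):
    """Predict the class whose output neuron fires first.

    out_spikes : binary array of shape (n_classes, n_steps).
    Neurons that never fire are treated as firing after the window.
    """
    n_classes, n_steps = out_spikes.shape
    first = np.where(out_spikes.any(axis=1),
                     out_spikes.argmax(axis=1),  # index of the first 1
                     n_steps)                    # silent neuron: latest
    return int(first.argmin())

# Example: neuron 2 fires at step 1, before the others, so class 2 wins.
spikes = np.array([[0, 0, 1, 0],
                   [0, 0, 0, 1],
                   [0, 1, 0, 0]])
assert ttfs_classify(spikes) == 2
```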