Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                

Essay

Download as pdf or txt
Download as pdf or txt
You are on page 1of 12

Neural Networks 170 (2024) 190–201

Contents lists available at ScienceDirect

Neural Networks
journal homepage: www.elsevier.com/locate/neunet

Full Length Article

Exploiting nonlinear dendritic adaptive computation in training deep


Spiking Neural Networks
Guobin Shen a,c ,1 , Dongcheng Zhao a ,1 , Yi Zeng a,b,c ,∗
a Brain-Inspired Cognitive Intelligence Lab, Institute of Automation, Chinese Academy of Sciences (CAS), Beijing, China
b
Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences (CAS), Shanghai, China
c
School of Future Technology, University of Chinese Academy of Sciences, Beijing, China

ARTICLE INFO ABSTRACT

Keywords: Inspired by the information transmission process in the brain, Spiking Neural Networks (SNNs) have gained
Dendritic Nonlinearity considerable attention due to their event-driven nature. However, as the network structure grows complex,
Dendritic Spatial Gating Module managing the spiking behavior within the network becomes challenging. Networks with excessively dense or
Dendritic Temporal Adjust Module
sparse spikes fail to transmit sufficient information, inhibiting SNNs from exhibiting superior performance.
Spiking Neural Networks
Current SNNs linearly sum presynaptic information in postsynaptic neurons, overlooking the adaptive adjust-
ment effect of dendrites on information processing. In this study, we introduce the Dendritic Spatial Gating
Module (DSGM), which scales and translates the input, reducing the loss incurred when transforming the
continuous membrane potential into discrete spikes. Simultaneously, by implementing the Dendritic Temporal
Adjust Module (DTAM), dendrites assign different importance to inputs of different time steps, facilitating
the establishment of the temporal dependency of spiking neurons and effectively integrating multi-step time
information. The fusion of these two modules results in a more balanced spike representation within the
network, significantly enhancing the neural network’s performance. This approach has achieved state-of-the-
art performance on static image datasets, including CIFAR10 and CIFAR100, as well as event datasets like
DVS-CIFAR10, DVS-Gesture, and N-Caltech101. It also demonstrates competitive performance compared to
the current state-of-the-art on the ImageNet dataset.

1. Introduction Plasticity (STP) (Zucker & Regehr, 2002) have solved the training
problem of SNNs to a certain extent. However, the performance of
As the third generation of Artificial Neural Networks (ANNs) (Maass, complex tasks is still poor. The surrogate gradient enables the SNNs
1997), Spiking Neural Networks (SNNs) consist of biologically inspired to be trained using the backpropagation algorithm, which significantly
spiking neurons and synaptic connections. Compared with the real- promotes the development of SNNs. However, as the network deepens,
valued artificial neural network, the SNNs use binary spike sequences the SNNs gradually become challenging to converge. Although there
to transmit information, which is more in line with the information are some algorithms such as neuron normalization (Wu et al., 2019;
processing mechanism of the brain. Furthermore, it is more convenient Zheng, Wu, Deng, Hu, & Li, 2021), deep residual learning (Fang, Yu,
to combine with neuromorphic sensors to perform efficient computing Chen, Huang, et al., 2021), and network architecture search (Kim, Li,
on neuromorphic hardware (Furber et al., 2012). Also, the rich spatio- Park, Venkatesha, & Panda, 2022) have improved the training of SNNs
temporal dynamic characteristics of spiking neurons make SNNs show
from various aspects, most of these are borrowing experience from the
great potential in brain-inspired artificial intelligence (Shen, Zhao, &
optimization of ANNs, while ignoring the characteristics of the SNNs.
Zeng, 2022a; Wu, Deng, Li, Zhu, & Shi, 2018; Zhao, Zeng, & Li, 2022)
Although a single spiking neuron can achieve the performance of
and brain simulation (Zeng, Zhao, Zhao, Shen, Dong, Lu, Zhang, Sun,
deep artificial neural networks (Beniaguev, Segev, & London, 2021),
Liang, Zhao, et al., 2023).
due to hardware limitations, the Leaky Integrate-and-Fire (LIF) and
The non-differentiable characteristics of SNNs make the training
IF spiking neurons are often used when constructing large-scale deep
of SNNs difficult, which attracts many researchers to study. Some
brain-inspired synaptic plasticity learning algorithms such as Spike- SNNs. These neurons only passively sum the presynaptic neuron infor-
Timing-Dependent Plasticity (STDP) (Bi & Poo, 1998) and Short Term mation linearly, accumulate the membrane potential, and fire spikes

∗ Corresponding author at: Brain-Inspired Cognitive Intelligence Lab, Institute of Automation, Chinese Academy of Sciences (CAS), Beijing, China.
E-mail address: yi.zeng@ia.ac.cn (Y. Zeng).
1
Guobin Shen, Dongcheng Zhao contributed equally to this work.

https://doi.org/10.1016/j.neunet.2023.10.056
Received 21 November 2022; Received in revised form 22 August 2023; Accepted 31 October 2023
Available online 10 November 2023
0893-6080/© 2023 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-
nc-nd/4.0/).
G. Shen et al. Neural Networks 170 (2024) 190–201

Fig. 1. Illustration of the relationship between different neuronal models and informativeness in the features.

when the membrane potentials reach the threshold. They often share the proposed DSGM and DTAM can help balance the spiking firing
the same hyperparameters and cannot dynamically adjust themselves rates and achieve state-of-the-art accuracy with very low latency.
to the input current to achieve the optimal fire rate. Fig. 1 shows the For the ImageNet dataset, our algorithm shows comparable per-
feature maps corresponding to the DVS data processed by different formance with the current best algorithms under lighter structures
types of neurons. Different excitation patterns of neurons affect the and lower latency. Codes and data have been deposited in GitHub
informativeness of the features and greatly influence the final accuracy. https://github.com/BrainCog-X/Brain-Cog.
We find that the plain linear synaptic model affects the excitatory
pattern of neurons. When the number of layers deepens, the deep 2. Related works
spiking neurons are under-activated for a long time and are in the spike
vanishing state. In this state, the network cannot transmit sufficient The design of high-performance deep SNNs is mainly divided into
information in the forward propagation, and the weights cannot be two aspects, conversion-based SNNs and the directly trained SNNs
effectively updated in the backpropagation process. By simply lower- based on surrogate gradient.
ing the threshold or strengthening the input of the spiking neurons,
although the deep spiking neurons will be activated, it will cause the 2.1. Conversion based
shallow spiking neurons to release too many spikes and be in a state
of spike explosion. A large number of spikes will make it difficult The conversion-based SNNs usually apply the parameter constraints
for the networks to extract useful information, thus destroying the on the well-trained ANNs, so the ReLU activation function in ANNs
performance of SNNs. A spike-balanced state will enable the SNNs to can be directly replaced with IF neurons. In order to improve the
learn richer representations and converge rapidly and stably. performance after conversion, many methods have been proposed, such
In addition to the computational properties of the soma, dendrites, as soft reset (Han, Srinivasan, & Roy, 2020), synchronous neurons (Li,
as powerful computational units, play a significant role in the informa- Zhao, & Zeng, 2022), burst spikes (Li & Zeng, 2022). In order to reduce
tion processing in the brain Acharya et al. (2021), Bicknell and Häusser the conversion error, Bu et al. (2021) replaces the ReLU activation
(2021). Inspired by dendrites’ powerful spatiotemporal information function of the ANN before training with the clip-floor-shift function.
processing abilities, we construct the adaptive dendritic computing Although the conversion-based SNNs can achieve competitive results on
module from spatial and temporal dimensions to adjust the input so large-scale datasets, most of these methods are limited to constructing
that the neuron firing states are more balanced, thus enriching the
SNNs using simple IF neurons and ignore the temporally rich dynamics
feature extraction ability. To illustrate the superiority of our model,
of SNNs. These lead to SNNs obtained by conversion methods that do
we conduct experiments on several datasets, our model dramatically
not handle event data well and have high latency.
improves the performance of SNNs, and our contributions can be
summarized as follows:
2.2. Directly trained
• Inspired by the dendrites’ spatial information processing mecha-
nism, we design the dendritic spatial gating module (DSGM) to To exploit the dynamic nature of SNNs in the temporal domain and
translate and scale the input at the current moment adaptively. to train SNNs end-to-end, some researchers have attempted to train
DSGM achieves multimode membrane potential distribution with- SNNs directly from scratch. The main problem faced in training SNNs
out external constraints and reduces the gap between membrane from scratch via gradient descent is the non-differentiability of the
potential and spike distribution. binary spikes. Wu et al. (2018, 2019) proposed the STBP algorithm to
• Inspired by the temporal information processing mechanism of speed up the training process of SNNs based on the spatio-temporal
dendrites, we design the dendritic temporal adjustment module dynamic properties of SNNs. Then the learnable neuron hyperparame-
(DTAM) to adaptively adjust the contribution of the input current ters, such as the membrane time constant (Fang, Yu, Chen, Masquelier,
at each moment to the membrane potential. et al., 2021), firing threshold (Shaban, Bezugam, & Suri, 2021). Then
• We evaluate our method on the static image datasets such as the attention mechanism (Liu, Xing, Feng, Tang, & Pan, 2022; Yao
CIFAR10 and CIFAR100 and the event datasets such as DVS- et al., 2021) is introduced to enhance the spatiotemporal informa-
CIFAR10, N-Caltech101, and DVS-Gesture. Experiments show that tion processing capability of SNNs. By adding an additional loss, Guo

191
G. Shen et al. Neural Networks 170 (2024) 190–201

et al. (Guo et al., 2022) achieved the conversion of membrane potential 3.2. Dendritic spatial gating module
from single-peak to multi-peak distribution, reducing the gap between
the distribution of membrane potential and spike and improving the The spiking neurons in SNNs control whether the neurons are
performance of SNNs. Duan et al. (Duan, Ding, Chen, Yu, & Huang, excited and output spikes by comparing the membrane potential with
2022) proposed Temporal Effective Batch Normalization (TEBN) to the threshold. As shown in Eq. (1), converting continuous membrane
make the temporal distribution of SNNs smoother and more even by potentials into discrete spikes makes SNNs more biologically plausible
assigning different weights to each moment. Yao et al. Yao, Li, Mo, and energy efficient. However, this conversion process is inherently
and Cheng (2022) introduced gating factors to fuse different biological nonlinear and can lead to information losses, which negatively affect
features and expand the characterization space of SNNs. However, in the performance of SNNs. When the input current is negative, the
these methods, spiking neurons often passively receive presynaptic in- membrane potential of the neuron also becomes negative, keeping the
neuron in an inhibited state and preventing it from firing. Therefore,
formation while ignoring the regulatory role of dendrites as a powerful
the transformation error between the membrane potential and the spike
computing unit in information transmission.
of a neuron can be measured by Eq. (3):

= [𝐻(𝑢𝑡 − 𝑣𝑡ℎ ) − 𝑢𝑡 ]2 ⋅ (𝑢𝑡 ≥ 0) (3)
3. Methods 𝑡

Instead of directly imposing external constraints on the model, we


In this section, first, we will give an introduction to the spiking neu- hope to empower neurons with the ability to adapt to complex distri-
ron model. Then the dendritic spatial gating module and the dendritic butions by more biologically plausible means to reduce transformation
temporal adjustment module will be discussed in detail. errors.
Inspired by the nonlinear dendrites in biological neurons, we pro-
pose the Dendritic Spatial Gating Module (DSGM), which can be de-
3.1. Spiking neuron model scribed as:
1
𝑢𝑡+1 = ( (𝑓 (𝑥𝑡 ) − 𝑢𝑡 ) + 𝑢𝑡 )(1 − 𝑠𝑡 ) + 𝑣𝑟𝑠𝑡 𝑠𝑡 (4)
The most commonly used LIF spiking neuron model is adopted in 𝜏
our network. The LIF model is transformed into an efficient iterative As shown in Eq. (4), in contrast to the basic LIF model in Eq. (1), we
expression for better simulation by solving the first-order differential implement nonlinear modeling of dendrites by mapping the input cur-
equation with Euler’s method. rents nonlinearly with 𝑓 (⋅). To achieve a continuous, smooth nonlinear
response for different input currents, 𝑓 (⋅) can be described as:
1 ∑
𝑁
{
𝑢𝑡+1,𝑖 = ( ( 𝑤𝑖𝑗 𝑠𝑡,𝑗 − 𝑢𝑡,𝑖 ) + 𝑢𝑡 )(1 − 𝑠𝑡,𝑖 ) + 𝑣𝑟𝑠𝑡 𝑠𝑡,𝑖 𝑥 − 𝛾 + 𝜉, 𝑥⩾𝛾
𝜏 𝑗=1 (1) 𝑓 (𝑥) = (5)
𝛽(𝑒𝑥−𝛾 − 1) + (1 − 𝛽)(𝑥 − 𝛾) + 𝜉, 𝑥 < 𝛾
𝑠𝑡+1,𝑖 = 𝐻(𝑢𝑡+1,𝑖 − 𝑣𝑡ℎ )
As shown in Eq. (5), the 𝛾 and 𝜉 control the translations in the
In Eq. (1), 𝑠𝑡,𝑗 represents the spike of the presynaptic neuron, 𝑤𝑖𝑗 input and output dimensions, respectively, and 𝛽 controls the decay
is the synaptic weight between neuron 𝑗 and neuron 𝑖, and 𝑥𝑡,𝑖 = of inhibitory inputs. The learnable parameters in the DSGM enable
∑𝑁
𝑗=1 𝑤𝑖𝑗 𝑠𝑡,𝑗 represents the input current calculated from presynaptic nonlinear modeling of synapses and enable SNNs to achieve spike
spikes. 𝑢𝑡 denotes the membrane potential of the neuron at 𝑡, 𝜏 ∈ balance in response to external stimuli (see Figs. 2 and 3).
(1, +∞) denotes the membrane time constant, and 𝑠𝑡 ∈ {0, 1} denotes The DSGM facilitates a nonlinear reaction to presynaptic input and
the spike of the neuron at 𝑡. After a neuron delivers a spike, the tailors the input current, which becomes critical for SNNs that only
membrane potential is reset to 𝑣𝑟𝑠𝑡 . 𝐻(⋅) denotes the Heaviside step provide binary spike outputs. As described in Eq. (5), parameter 𝛾
function, which is used to realize the spike process of the neuron. 𝑣𝑡ℎ adjusts the distribution of input currents, while parameter 𝛽 modifies
denotes the firing threshold. the inhibitory stimulus. It achieves this by scaling down the inhibitory
As shown in Eq. (1), the process of neurons firing spikes is non- input, which mitigates the intensity of such input, thus addressing the
differentiable. Applying the surrogate gradient (Bohte, 2011) allows prevalent issue of spike disappearance that often plagues deep SNNs.
On the other hand, 𝜉 serves to transform the modulated input
the backpropagation (Rumelhart, Hinton, & Williams, 1986) to be
currents, resulting in a shift in the output distribution. These pa-
successfully applied to the optimization process of the network. The
rameters together effectuate a nonlinear mapping of neuronal input
surrogate gradient we use is the same as Wu et al. (2018) and can be
currents. This mapping permits neurons to convert the traditionally
expressed as:
single-peaked membrane potential distribution into a more represen-
𝜕𝑠𝑡 1 tative multi-peaked distribution, aligning better with the features of a
= (𝛼 − 𝛼 2 ||𝑢𝑡 ||)𝑠𝑖𝑔𝑛( − ||𝑢𝑡 ||) (2)
𝜕𝑢𝑡 𝛼 discrete spike.
This proposed synaptic model, boasting a more comprehensive char-
In Eq. (2), 𝛼 controls the width of the surrogate gradient function,
acterization capacity, is designed to minimize transformation errors
which is set to 2 in our experiments.
typical in spiking neurons. In essence, it introduces a more dynamic,
Neurons predominantly receive their inputs through dendrites. In adaptable response to the synaptic input, enhancing the overall effi-
the LIF model, dendrites play a linear summation function, ignoring ciency and accuracy of SNNs.
the critical computational role played by dendrites. In real biological
neurons, dendrites respond nonlinearly to the input stimuli of different 3.3. Dendritic temporal adjustment module
intensities and can perform many functions such as temporal filter-
ing (RoSE & Call, 1992), amplification (Szalai, Kevwitch, & McGrath, Dendrites possess the unique capability of processing information
2003). Here we take inspiration from the adaptive dendritic compu- spatially as well as temporally, exhibiting varied responses to inputs at
tation and propose the Dendritic Spatial Gating Module (DSGM) and different time steps. The Leaky Integrate-and-Fire (LIF) neuron model,
Dendritic Temporal Adjust Module (DTAM) to translate and scale the however, employs a fixed membrane time constant, preventing any
input and adjust different importance to the input at different time dynamic changes during inference. This invariability in the membrane
steps. time constant leads to an exponential decay of input current over time,

192
G. Shen et al. Neural Networks 170 (2024) 190–201

Fig. 2. Illustration of the distribution of postsynaptic membrane potential. DSGM achieves reshaping of membrane potential by nonlinear dendritic modeling.

Fig. 3. The linear dendrities, excitatory dendrities, and our DSGM.

Fig. 4. Schematic of our Dendritic Temporal Adjustment Module (DTAM). This module enables attributing different significance to inputs at various time steps.

with the decay rate remaining the same across all time steps. Conse- 4. Experiments
quently, neurons are hindered from attributing significance to pivotal
moments in time. This decay also inhibits neurons from establishing
long-term dependencies and effectively integrating information from To verify the effectiveness and efficiency of the proposed method,
various moments. we apply DSGM and DTAM to deep SNNs and validate them on clas-
To address these issues, we introduce the Dendritic Temporal Ad- sification tasks with static image and event datasets. We train deep
justment Module (DTAM) as depicted in Fig. 4. DTAM is designed to SNNs using STBP (Wu et al., 2018) and use AdamW as the optimizer,
establish a correlation between dendrites and input currents at different
with the initial learning rate set to 5 × 10−3 , and cosine decay to 0.
moments. Upon employing DTAM, the dendritic state can be modulated
The batch size is set to 128, and the training epoch is set to 600. For
by the input current at diverse time steps, thereby allowing the attribu-
tion of varying levels of importance to inputs at these different steps. static datasets, the images are converted to spike sequences using direct
encoding (Rathi & Roy, 2021; Wu et al., 2019), while for all datasets,
1
𝑢𝑡+1 = ( (𝑔𝒘 (𝑓 (𝒙𝑡 )) − 𝑢𝑡 ) + 𝑢𝑡 )(1 − 𝑠𝑡 ) + 𝑣𝑟𝑠𝑡 𝑠𝑡 the membrane potential of the last layer of neurons is used for direct
𝜏 (6)
s. t. 𝒙𝑡 = [𝑥0 , 𝑥1 , … , 𝑥𝑡 , 0, … , 0, 0]𝑇 ∈ R𝑇 output (Wu et al., 2019; Zheng et al., 2021). We use the LIF neuron
as a baseline and apply DSGM and DTAM to the LIF neuron to test
In Eq. (6), for an individual neuron, 𝒙𝑡 encapsulates the input
currents from time step 0 to 𝑡, while the remaining elements from 𝑡 + 1 their effects on the performance of SNNs. The threshold voltage is set
to 𝑇 are zero-padded. 𝑇 denotes the total number of simulation steps. to 0.5, and the initial membrane time constant is set to 2. The surrogate
Additionally, 𝑔𝒘 (⋅) represents a function of the input current, which is gradient is applied to backpropagation to overcome the problem of the
weighted by an adjustable parameter. The role of 𝑓 (𝒙𝑡 ) in this equation non-differentiability of binary spikes. As in Wu et al. (2018) we use
is to summarize the input currents over time. triangular pulses to approximate the gradient of neurons. We assigned
In our DTAM, the introduction of 𝑔𝒘 allows us to adjust the weights independent learnable parameters to the DSGMs in different channels.
of input currents from different moments, allowing neurons to assign The 𝛾 and 𝜉 in the DSGM are initialized to 0, and 𝛽 is initialized to 0.75.
different levels of importance to inputs at various time steps, thus en-
For DTAM, the lower triangular matrix is used to describe the effects
abling the establishment of long-term dependencies and more effective
integration of information from different moments. The DTAM offers a of synaptic currents at different moments and to ensure that previous
novel approach to managing the temporal dynamics of input currents, moments were not influenced by subsequent moments. Moreover, each
enhancing the performance of SNNs by providing a more flexible and layer of neurons shares the same DTAM weights to prevent DTAM from
adaptive temporal processing mechanism. significantly affecting the number of network parameters.

193
G. Shen et al. Neural Networks 170 (2024) 190–201

Table 1
Compare with existing works on static image datasets.
Dataset Model Methods Architecture Step Accuracy
Rathi, Srinivasan, Panda, and Roy (2019) Hybrid training ResNet-20 250 92.22
Rathi and Roy (2021) Diet-SNN ResNet-20 10 92.54
Wu et al. (2019) STBP NeuNorm CIFARNet 12 90.53
Zhang and Li (2020) TSSL-BP CIFARNet 5 91.41
Shen et al. (2022a) STBP 7-layer-CNN 8 92.15
CIFAR10 Na et al. (2022) STBP NAS 16 93.15
Zheng et al. (2021) STBP-tdBN ResNet-19 6 93.16
Deng, Li, Zhang, and Gu (2021) TET ResNet-19 6 94.50
Duan et al. (2022) TEBN ResNet-19 6 94.71
Yao et al. (2022) GLIF ResNet-19 6 95.03
our method DSGM+DTAM ResNet-19 4 96.41 ± 0.03
Rathi et al. (2019) Hybrid training VGG-11 125 67.87
Rathi and Roy (2021) Diet-SNN ResNet-20 5 64.07
Shen et al. (2022a) STBP ResNet34 8 69.32
Kim et al. (2022) STBP NAS 5 73.04
CIFAR100
Deng et al. (2021) TET ResNet-19 6 74.72
Duan et al. (2022) TEBN ResNet-19 6 76.41
Yao et al. (2022) GLIF ResNet-19 6 77.35
our method DSGM+DTAM ResNet-19 4 78.87 ± 0.18
Rathi et al. (2019) Hybrid training ResNet-34 250 61.48
Sengupta, Ye, Wang, Liu, and Roy (2019) SPIKE-NORM ResNet-34 2500 69.96
Zheng et al. (2021) STBP-tdBN Spiking-ResNet-34 6 63.72
Fang, Yu, Chen, Masquelier, et al. (2021) SEW ResNet SEW-ResNet-34 4 67.04
ImageNet
Deng et al. (2021) TET SEW-ResNet-34 4 68.00
Fang, Yu, Chen, Masquelier, et al. (2021) SEW ResNet SEW-ResNet-152 4 69.26
Yao et al. (2022) GLIF ResNet-34 6 69.09
Yao et al. (2022) TEBN ResNet-34 4 64.29
DSGM+DTAM Spiking-ResNet-34 4 68.47
our method
DSGM+DTAM SEW-ResNet-34 4 69.36

4.1. Static datasets 4.2. Event datasets

To verify the performance of our DSGM and DTAM, we first con- We further conduct experiments on the event datasets, DVS-CIFAR10
duct experiments on the commonly used static classification datasets, (Li, Liu, Ji, Li, & Shi, 2017), DVS-Gesture (Amir et al., 2017), and
CIFAR10, CIFAR100, and ImageNet. All static datasets are based on the N-Caltech101 (Orchard, Jayawant, Cohen, & Thakor, 2015) to demon-
direct input encoding used in Wu et al. (2019). strate the powerful information representation ability.

4.1.1. CIFAR 4.2.1. DVS-CIFAR10


The CIFAR series mainly include two categories: CIFAR10 DVS-CIFAR10 is the event stream version of CIFAR10. There are
(Krizhevsky et al., 2009) and CIFAR100 dataset (Xu, Wang, Chen, & ten categories, including 10,000 samples in CIFAR10. Since the original
Li, 2015). They are widely used to verify multiple visual recognition DVS-CIFAR10 does not divide the training set and test set, we use the
algorithms. The CIFAR10 dataset contains ten categories of image sam- first 90% of each category as the training set and the remaining 10% as
ples, with a total of 50,000 training samples and 10,000 test samples. the test set. The samples are resized to 48 × 48, and random horizontal
CIFAR100 is based on CIFAR10 to classify images more finely, with 100
flips, rotations, and crops are applied for data argumentation. We set
categories, making it more challenging. The samples in the dataset are
the simulation set 𝑇 = 10.
color datasets of size 32 × 32. 𝑇 = 4 are used for training and validation
on the CIFAR dataset.
4.2.2. DVS-gesture
DVS-Gesture is a real-world gesture recognition dataset collected by
4.1.2. ImageNet
the dynamic vision sensor (DVS). It contains 11 categories of gestures,
ImageNet (Krizhevsky, Sutskever, & Hinton, 2012) is a large com-
comprising 1342 different samples. The samples are resized to 48 × 48
puter vision dataset with a total of 1000 categories, containing more
than 1250,000 training samples and 50,000 validation samples. The with no data augmentation applied. The training set and validation set
samples are cropped to size of 224 × 224, and apply standard data are divided by 8:2.
augmentation for training. Similar to CIFAR10, the simulation step is
𝑇 = 4. 4.2.3. N-Caltech101
The results are compared with several existing works on the struc- The N-Caltech101 is a neuromorphic version of the Caltech101,
tures ResNet19 (Zheng et al., 2021), Spiking-ResNet34 (Zheng et al., obtained by capturing images of the Caltech101 displayed on an LCD
2021), SNN5 (Deng et al., 2021) and VGGSNN (Deng et al., 2021). monitor with a freely moving event camera. We resize samples to
As shown in Table 1, for CIFAR10 and CIFAR100 datasets, our model 48 × 48, and use the same data argumentation strategy as DVS-
achieved performance with 96.41% and 78.87%, which achieves state- CIFAR10. The training and validation sets are split 9:1. The simulation
of-the-art performance compared with other superior work (Deng et al., step is set to 𝑇 = 10, and 𝑇 = 16 to test the performance of SNNs with
2021; Zheng et al., 2021) on the same network architecture. For the different latencies.
more complex dataset CIFAR100, our model improves by nearly 4.4% As shown in Table 2, our model shows competitive results on the
over the current state-of-the-art algorithm with the same simulation event dataset. On DVS-CIFAR10, we achieve an accuracy improvement
length and network architecture. On ImageNet, we also achieve com- of 1.05% compared to other methods. After applying EventMix (Shen,
petitive results using a simpler network structure than other methods. Zhao, & Zeng, 2022b), we achieved an accuracy improvement of 2.35%
We achieve a 69.36% accuracy compared to methods using the same compared with SNNs of the same structure. Using only 80% of the train-
structure on ImageNet. ing data, we achieved an accuracy that exceeded those of other methods

194
G. Shen et al. Neural Networks 170 (2024) 190–201

Table 2
Compare with existing works on event datasets.
Dataset Model Methods Architecture Step Accuracy
Zheng et al. (2021) STBP-tdBN ResNet-19 10 67.8
Kugele, Pfeil, Pfeiffer, and Chicca (2020) Streaming Rollout DenseNet 10 66.8
Wu et al. (2021) Conv3D LIAF-Net 10 71.70
Wu et al. (2021) LIAF LIAF-Net 10 70.40
DVS-CIFAR10 Na et al. (2022) STBP NAS 16 72.50
Shen et al. (2022a) STBP 5-layer-CNN 16 78.95
Duan et al. (2022) TEBN 7-layer CNN 10 75.10
Deng et al. (2021) TET VGGSNN 10 83.17
Yao et al. (2022) GLIF 7B-wideNet 16 78.10
DSGM+DTAM VGGSNN 10 84.23 ± 0.48
our method
DSGM+DTAM VGGSNN 10 85.53b ± 0.39
Xing, Di Caterina, and Soraghan (2020) SLAYER 5-Layer-CNN 20 92.01
Shrestha and Orchard (2018) SLAYER 16-layer-CNN 300 93.64
Fang, Yu, Chen, Masquelier, et al. (2021) STBP BP 20 97.57
DVS-Gesture Na et al. (2022) STBP NAS 16 96.53
DSGM + DTAM SNN5 10 98.36 ± 0.53
our method DSGM + DTAM SNN5 10 96.69a ± 0.40
DSGM + DTAM SNN5 16 98.58a ± 0.49
Kugele et al. (2020) STBP VGG11 20 55.0
Ramesh et al. (2019) N/A N/A N/A 66.8
N-Caltech101
DSGM + DTAM VGGSNN 10 74.48 ± 0.31
DSGM + DTAM VGGSNN 10 76.39b ± 0.39
our method
DSGM + DTAM VGGSNN 16 84.42 ± 0.43
DSGM + DTAM VGGSNN 16 85.26b ± 0.14
a
Denotes that the training and test sets are divided by 8:2, while other works uses 9:1.
b Indicates using EventMix.

using 90% data for training. On the more complex N-Caltech101, we To verify the adaptability of DSGM, we also test on SNNs using
achieved an accuracy of 85.26%. soft-reset neurons, as shown in Eq. (7).
1
𝑢𝑡+1,𝑖 = ( (𝑥𝑡 − 𝑢𝑡,𝑖 ) + 𝑢𝑡 ) − 𝑣𝑡ℎ 𝑠𝑡,𝑖 (7)
4.3. Result analysis 𝜏
Unlike the hard-reset neuron in Eq. (1), the soft-reset neuron is soft-
In this section, we present an advanced analysis of the proposed reset by subtracting the threshold voltage after the spike is delivered. As
method in terms of the membrane potential distribution, spike sparsity, shown in the last row of Fig. 5, we find that for the soft-reset neurons,
and the effect of different dendritic models on the model performance by applying DSGM, the membrane potential can also show a multi-peak
and loss landscape. distribution and reduce the transformation error of SNNs.
As shown in Fig. 6, we analyzed the effect of different dendritic
models on the transformation error of SNNs for all channels in the
4.3.1. Membrane potential distribution
model corresponding to Fig. 5. For different depth neurons, DSGM
In SNNs, neurons use binary spikes to represent different infor-
brings less transformation error to the SNNs.
mation. When the membrane potential is above the threshold, the
neuron is excited and outputs 1, while when the membrane potential is
below the threshold, the neuron is resting and outputs 0. Therefore, 4.3.2. Spike sparsity analysis
if the membrane potential can be concentrated around the neuron’s Fig. 7 shows the firing rate of neurons in different layers related
resting potential and threshold voltage, then the binary spikes can more to the dendrite type during training on the DVS-Gesture dataset. In
accurately approximate the membrane potential distribution. To further the initial stage of training, the shallow layer neurons have a higher
illustrate that nonlinear dendrites can bring better characterization firing rate. This is due to the random initialization of the synaptic
ability and more minor transformation errors to SNNs, we analyze the weights, which makes it difficult to activate the deep neurons, while
distribution of membrane potentials of neurons at different depths on the nonlinear design of DSGM enables the deep neurons of SNNs to
the DVS-Gesture dataset and the corresponding transformation errors. have higher firing rates, which facilitates the rapid convergence of the
As shown in Fig. 5, we conduct experiments on DVS-Gesture with network. As the training progresses, in the fifth epoch, the neurons
SNN5 structure, and we demonstrate the effect of our DSGM and DTAM of different layers have a higher firing rate than the initial stage to
on the distribution of membrane potentials in different layers. Such pass the information from the shallow layers to the deeper neurons.
distribution makes it impossible to efficiently characterize the distri- Subsequently, the firing rate of the shallow neurons decreases to better
bution of membrane potentials by binary spikes, which creates a gap adapt to the external stimuli.
between membrane potentials and spikes and affects the performance At different stages of training, excitatory dendritic models result in
of SNNs. When using linear and excitatory synaptic models, the mem- higher firing rates, which can lead to more significant energy overhead.
brane potential tends to show a single-peaked distribution. In contrast, The linear dendritic model, like the excitatory dendritic model, has
neurons with DSGM and DTAM show more excellent adaptation to a considerable variation in the firing rate of different layers as the
external stimuli, with the membrane potential showing a multimodal training progresses, which can easily lead to spike disappearance and
distribution and concentrated near the resting potential and threshold spike explosion, which can affect the performance of SNNs. The DSGM,
voltage. The adaptability of neurons with DSGM and DTAM enables on the other hand, has a more flexible dendritic model design, which
binary spikes to characterize the membrane potential of neurons better can ensure a relatively stable firing rate of neurons in different layers
and improve the information transmission efficiency of SNNs. at different training stages and thus performs better.

195
G. Shen et al. Neural Networks 170 (2024) 190–201

Fig. 5. Distribution of postsynaptic neuronal membrane potential at different depths on the DVS-Gesture.

Fig. 6. Transformation errors of neurons with different dendritic types at different depths.

4.3.3. Comparison of dendrite models membrane potential of the postsynaptic neuron. The other nonlinear
The DSGM can give the spiking neuron a larger representation space functions do not significantly improve the model’s performance com-
and realize the reshaping of the input current. To further illustrate pared to the linear dendritic model. This is because these functions
the effectiveness of DSGM, we compared DSGM with the nonlinear cannot adjust to the gain of the input current, making the neuron more
activation functions commonly used in ANNs such as ReLU (Glorot, likely to deliver more spikes, thus affecting the performance of the
Bordes, & Bengio, 2011), ELU (Clevert, Unterthiner, & Hochreiter, model.
2015), and GeRU (Hendrycks & Gimpel, 2016), and the results are
shown in Table 3.
Table 3 shows the performance of dendritic models with differ- 4.3.4. Loss landscapes
ent activation functions on the CIFAR10 and DVS-CIFAR10 datasets. To further examine the 2D landscapes (Li, Xu, Taylor, Studer, &
Compared with the other dendritic models, DSGM confers greater Goldstein, 2018) of SNNs near the local minima to demonstrate the
adaptability to the spiking neuron, allowing it to better adapt to the effect of DSGM and DTAM on the model’s generalization ability. As
input current’s shape and make more targeted adjustments to the shown in Fig. 8, DSGM and DTAM can help SNNs find flatter minima.

196
G. Shen et al. Neural Networks 170 (2024) 190–201

Fig. 7. Firing rates of different dendritic models at different training stages.

Fig. 8. Loss landscape of VGGSNN with/without DSGM and DTAM. Validation is performed on N-Caltech101 and DVS-CIFAR10.

Table 3 4.4.1. Qualitative analysis


Comparison of different dendrite models.
Our model includes DSGM, which is guided by three adjustable
Dataset Model Linear ReLU ELU GeLU DSGM
parameters: 𝛾, 𝜉, and 𝛽. Each layer and channel in the network is
CIFAR10 ResNet-19 95.98 96.01 95.86 96.12 96.41 assigned its unique DSGM configuration. To obtain robust evidence for
DVS-CIFAR10 VGGSNN 84.80 84.53 84.86 84.72 85.53 the changing trends of DSGM during the training process, we conducted
five runs using the SNN5 architecture on the DVS-CIFAR10 dataset and
visualized the variations of different parameters across all channels, as
depicted in Fig. 9.
4.4. Ablation studies
Throughout the training process, parameters 𝛾 and 𝜉 converge
swiftly, suggesting that the DSGM rapidly adjusts the polarity of the
We perform ablation studies on several datasets to better illustrate input current at the initial stages of training. Subsequently, 𝛽 is del-
the role of DSGM and DTAM. The qualitative and quantitative analyses icately adjusted to regulate the inhibitory input’s stimulus intensity
will be given. over time. These recurring observations demonstrate the ability of these

197
G. Shen et al. Neural Networks 170 (2024) 190–201

Fig. 9. Variation of parameters in DSGM during training.

Fig. 10. DTAM weight matrices on the DVS-CIFAR10 and CIFAR10 datasets.

parameters to maintain spike distributions in a relatively balanced operation in the temporal dimension, resulting in relatively low compu-
state. tational overhead. Furthermore, each layer of neurons in the network
As depicted in Fig. 10, we present the visualizations of the DTAM shares the same DTAM matrix, leading to improved energy efficiency.
matrices on the CIFAR10 and DVS-CIFAR10 datasets. The DTAM dy- Importantly, DTAM is designed with consideration for the temporal
namically adjusts the influence of input currents on membrane poten- characteristics of SNNs, where early time steps should not be influenced
tials for different time steps, endowing significance to inputs at various by subsequent ones. Therefore, DTAM better simulates the information
moments. This enables a more precise encoding of spike representation. processing mechanisms in the brain, thus enhancing the performance
To further demonstrate the attention mechanism of DTAM in the and efficiency of the model.
temporal dimension, we retrain the model on DVS-Gesture with a
longer sequence of length 64, 128, 256, and 512, and visualize the DTAM 4.4.2. Quantitative analysis
weights, as shown in Fig. 11. As shown in Table 4, to investigate the effect of DSGM and DTAM
Fig. 11 shows the distribution of DTAM weights for different lengths on the performance and convergence speed of SNNs, we conducted an
of sequences excluding the current moment. At longer sequences, the ablation study on CIFAR10, DVS-CIFAR10, and N-Caltech101.
distribution of DTAM is more pronounced, and the neurons pay more Both DSGM and DTAM contribute to the improvement of model
attention to the input currents at closer moments. performance. Due to less dynamic characteristics and shorter simula-
The attention mechanism is one of the common advanced cognitive tion time, DTAM does not contribute significantly to the classification
functions in the brain, and it plays a crucial role in many deep learning task of SNNs on static datasets due to less dynamic characteristics and
models such as Transformer (Vaswani et al., 2017). Transformer utilizes shorter simulation time. Both DSGM and DTAM effectively improve the
a self-attention mechanism to adaptively adjust the weights of different model performance for event datasets with richer temporal dynamics.
spatial positions by calculating matrices for Query, Key, and Value. Fig. 12 shows the training curve when using different modules.
This method enhances representational power without significantly DSGM and DTAM enable the neuron’s membrane potential to adapt
increasing the number of parameters. However, it incurs computational to different input currents at different moments, thus enabling faster
costs during inference due to the need for online computation of adaptation to external stimuli.
self-attention matrices. The nonlinear mapping of the DSGM for the input current can be
In contrast, DTAM assigns dynamic importance to different time divided into two parts: linear and exponential. To further verify the role
steps by utilizing a pre-defined matrix. This matrix is updated con- of these two components in DSGM, we perform an ablation analysis on
currently with other parameters during model training, but remains the components of DSGM, and the results are shown in Table 5.
fixed during inference. This enables DTAM to assign different weights We decompose the mapping of DSGM for inhibitory input currents
to input currents at different historical time steps, thereby influencing into linear and exponential components and conduct ablation experi-
the behavior of neurons. As shown in Eq. (6), DTAM is a linear ments on the CIFAR10 and DVS-CIFAR10 datasets, as shown in Table 5.

198
G. Shen et al. Neural Networks 170 (2024) 190–201

Fig. 11. DTAM weight matrices on the DVS-Gesture dataset with long sequences.

Fig. 12. Training curves on CIFAR10 and DVS-CIFAR10.

Table 4 The experimental results show that the DSGM outperforms the dendritic
Ablation study on DSGM and DTAM. model using only exponential and linear mappings on both static and
Method CIFAR10 DVS-CIFAR10 N-Caltech101 DVS datasets. This is because DSGM achieves a continuous and smooth
Baseline 96.03 83.4 84.37 transformation of the input currents while maintaining the nonlinearity
DSGM 96.32 84.6 85.23 of the dendritic model.
DTAM 95.98 84.8 85.11
DSGM+DTAM 96.41 85.53 85.26
5. Conclusion

Table 5 The SNNs is widely used in modeling various cognitive functions


Ablation analysis on the components of DSGM. because of its binary spike train information transmission. The sparse
Dataset Model 𝛽(𝑥 − 𝛾) + 𝜉 𝛽(𝑒𝑥−𝛾 ) + 𝜉 DSGM information in the network makes SNNs in the spike vanishing state
CIFAR10 ResNet-19 96.11 95.92 96.41 when it is extended to the deep structure and cannot transmit enough
DVS-CIFAR10 VGGSNN 85.23 84.96 85.53 information. When the input stimulus is strengthened, there will be a
state of spike explosion, which cannot transmit adequate information.
Dendrites perform a simple linear summation of presynaptic informa-
tion in the current information transmission of SNNs, ignoring the

199
G. Shen et al. Neural Networks 170 (2024) 190–201

nonlinear computing role. This paper proposes the dendritic spatial gat- Han, B., Srinivasan, G., & Roy, K. (2020). Rmp-snn: Residual membrane potential
ing mechanism and dendritic temporal adjust mechanism due to den- neuron for enabling deeper high-accuracy and low-latency spiking neural network.
In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition
drites’ adaptive spatiotemporal processing ability for input information. (pp. 13558–13567).
Through the spatial translation and scaling of the input information Hendrycks, D., & Gimpel, K. (2016). Gaussian error linear units (gelus). arXiv preprint
and assigning different importance to the input information at different arXiv:1606.08415.
Kim, Y., Li, Y., Park, H., Venkatesha, Y., & Panda, P. (2022). Neural architecture search
time steps, the spikes of the network are in a relatively balanced state.
for spiking neural networks. In European Conference on Computer Vision (pp. 36–56).
We conduct experiments on the CIFAR10, CIFAR100, DVS-CIFAR10, Springer.
DVS-Gesture, and N-Caltech101 datasets and achieve state-of-the-art Krizhevsky, A., Hinton, G., et al. (2009). Learning multiple layers of features from tiny
performance. At the same time, on ImageNet, it achieves a performance images. Citeseer.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep
comparable to the current state-of-the-art algorithm under a lower convolutional neural networks. In Advances in neural information processing systems,
latency and lighter network structure. vol. 25.
Kugele, A., Pfeil, T., Pfeiffer, M., & Chicca, E. (2020). Efficient processing of spatio-
temporal data streams with spiking neural networks. Frontiers in Neuroscience, 14,
Declaration of competing interest 439.
Li, H., Liu, H., Ji, X., Li, G., & Shi, L. (2017). Cifar10-dvs: An event-stream dataset for
The authors declare that they have no known competing finan- object classification. Frontiers in Neuroscience, 11, 309.
Li, H., Xu, Z., Taylor, G., Studer, C., & Goldstein, T. (2018). Visualizing the loss
cial interests or personal relationships that could have appeared to
landscape of neural nets. In Advances in neural information processing systems, vol.
influence the work reported in this paper. 31.
Li, Y., & Zeng, Y. (2022). Efficient and accurate conversion of spiking neural network
Data availability with burst spikes. arXiv preprint arXiv:2204.13271.
Li, Y., Zhao, D., & Zeng, Y. (2022). BSNN: Towards faster and better conversion of
artificial neural networks to spiking neural networks with bistable neurons. Frontiers
Data will be made available on request. in neuroscience, 16, 991851.
Liu, Q., Xing, D., Feng, L., Tang, H., & Pan, G. (2022). Event-based multimodal spiking
neural network with attention mechanism. In ICASSP 2022-2022 IEEE international
Acknowledgments conference on acoustics, speech and signal processing (pp. 8922–8926). IEEE.
Maass, W. (1997). Networks of spiking neurons: The third generation of neural network
This work was supported by the National Key Research and Devel- models. Neural Networks, 10(9), 1659–1671.
Na, B., Mok, J., Park, S., Lee, D., Choe, H., & Yoon, S. (2022). Autosnn: Towards energy-
opment Program (Grant No. 2020AAA0107800). efficient spiking neural networks. In International Conference on Machine Learning
(pp. 16253–16269). PMLR.
References Orchard, G., Jayawant, A., Cohen, G. K., & Thakor, N. (2015). Converting static image
datasets to spiking neuromorphic datasets using saccades. Frontiers in Neuroscience,
9, 437.
Acharya, J., Basu, A., Legenstein, R., Limbacher, T., Poirazi, P., & Wu, X. (2021).
Ramesh, B., Yang, H., Orchard, G., Le Thi, N. A., Zhang, S., & Xiang, C. (2019). Dart:
Dendritic computing: Branching deeper into machine learning. Neuroscience. Distribution aware retinal transform for event-based cameras. IEEE Transactions on
Amir, A., Taba, B., Berg, D., Melano, T., McKinstry, J., Di Nolfo, C., et al. (2017). A Pattern Analysis and Machine Intelligence, 42(11), 2767–2780.
low power, fully event-based gesture recognition system. In Proceedings of the IEEE Rathi, N., & Roy, K. (2021). Diet-SNN: A low-latency spiking neural network with direct
conference on computer vision and pattern recognition (pp. 7243–7252). input encoding and leakage and threshold optimization. IEEE Transactions on Neural
Beniaguev, D., Segev, I., & London, M. (2021). Single cortical neurons as deep artificial Networks and Learning Systems.
neural networks. Neuron, 109(17), 2727–2739. Rathi, N., Srinivasan, G., Panda, P., & Roy, K. (2019). Enabling deep spiking neural
Bi, G.-q., & Poo, M.-m. (1998). Synaptic modifications in cultured hippocampal neurons: networks with hybrid conversion and spike timing dependent backpropagation. In
Dependence on spike timing, synaptic strength, and postsynaptic cell type. Journal International conference on learning representations.
of Neuroscience, 18(24), 10464–10472. RoSE, G. J., & Call, S. J. (1992). Evidence for the role of dendritic spines in the temporal
Bicknell, B. A., & Häusser, M. (2021). A synaptic learning rule for exploiting nonlinear filtering properties of neurons: The decoding problem and beyond. Proceedings of
dendritic computation. Neuron, 109(24), 4001–4017. the National Academy of Sciences, 89(20), 9662–9665.
Bohte, S. M. (2011). Error-backpropagation in networks of fractionally predictive Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by
spiking neurons. In Artificial neural networks and machine learning–ICANN 2011: back-propagating errors. Nature, 323(6088), 533–536.
21st international conference on artificial neural networks, Espoo, Finland, June 14-17, Sengupta, A., Ye, Y., Wang, R., Liu, C., & Roy, K. (2019). Going deeper in spiking
neural networks: VGG and residual architectures. Frontiers in Neuroscience, 13, 95.
2011, proceedings, Part I 21 (pp. 60–68). Springer.
Shaban, A., Bezugam, S. S., & Suri, M. (2021). An adaptive threshold neuron for
Bu, T., Fang, W., Ding, J., Dai, P., Yu, Z., & Huang, T. (2021). Optimal ANN-SNN
recurrent spiking neural networks with nanodevice hardware implementation.
conversion for high-accuracy and ultra-low-latency spiking neural networks. In
Nature Communications, 12(1), 1–11.
International conference on learning representations.
Shen, G., Zhao, D., & Zeng, Y. (2022a). Backpropagation with biologically plausible
Clevert, D.-A., Unterthiner, T., & Hochreiter, S. (2015). Fast and accurate deep network spatiotemporal adjustment for training deep spiking neural networks. Patterns,
learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289. Article 100522.
Deng, S., Li, Y., Zhang, S., & Gu, S. (2021). Temporal efficient training of spiking Shen, G., Zhao, D., & Zeng, Y. (2022b). Eventmix: An efficient augmentation strategy
neural network via gradient re-weighting. In International conference on learning for event-based data. arXiv preprint arXiv:2205.12054.
representations. Shrestha, S. B., & Orchard, G. (2018). Slayer: Spike layer error reassignment in time.
Duan, C., Ding, J., Chen, S., Yu, Z., & Huang, T. (2022). Temporal effective batch In Advances in neural information processing systems, vol. 31.
normalization in spiking neural networks. Adv. Neural Inf. Process. Syst., 35, Szalai, M. L., Kevwitch, R. M., & McGrath, D. V. (2003). Geometric disassembly
34377–34390. of dendrimers: Dendritic amplification. Journal of the American Chemical Society,
Fang, W., Yu, Z., Chen, Y., Huang, T., Masquelier, T., & Tian, Y. (2021). Deep residual 125(51), 15688–15689.
learning in spiking neural networks. Advances in Neural Information Processing Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., et al.
Systems, 34, 21056–21069. (2017). Attention is all you need. In Advances in neural information processing
Fang, W., Yu, Z., Chen, Y., Masquelier, T., Huang, T., & Tian, Y. (2021). Incorporating systems, vol. 30.
learnable membrane time constant to enhance learning of spiking neural networks. Wu, Y., Deng, L., Li, G., Zhu, J., & Shi, L. (2018). Spatio-temporal backpropagation for
In Proceedings of the IEEE/CVF international conference on computer vision (pp. training high-performance spiking neural networks. Frontiers in Neuroscience, 12,
331.
2661–2671).
Wu, Y., Deng, L., Li, G., Zhu, J., Xie, Y., & Shi, L. (2019). Direct training for spiking
Furber, S. B., Lester, D. R., Plana, L. A., Garside, J. D., Painkras, E., Temple, S., et
neural networks: Faster, larger, better. In Proceedings of the AAAI conference on
al. (2012). Overview of the SpiNNaker system architecture. IEEE Transactions on
artificial intelligence, vol. 33, no. 01 (pp. 1311–1318).
Computers, 62(12), 2454–2467.
Wu, Z., Zhang, H., Lin, Y., Li, G., Wang, M., & Tang, Y. (2021). Liaf-net: Leaky integrate
Glorot, X., Bordes, A., & Bengio, Y. (2011). Deep sparse rectifier neural networks. and analog fire network for lightweight and efficient spatiotemporal information
In Proceedings of the fourteenth international conference on artificial intelligence and processing. IEEE Transactions on Neural Networks and Learning Systems.
statistics (pp. 315–323). JMLR Workshop and Conference Proceedings. Xing, Y., Di Caterina, G., & Soraghan, J. (2020). A new spiking convolutional recurrent
Guo, Y., Tong, X., Chen, Y., Zhang, L., Liu, X., Ma, Z., et al. (2022). RecDis-SNN: neural network (SCRNN) with applications to event-based hand gesture recognition.
Rectifying membrane potential distribution for directly training spiking neural Frontiers in Neuroscience, 14, 1143.
networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern Xu, B., Wang, N., Chen, T., & Li, M. (2015). Empirical evaluation of rectified activations
recognition (pp. 326–335). in convolutional network. arXiv preprint arXiv:1505.00853.

200
G. Shen et al. Neural Networks 170 (2024) 190–201

Yao, M., Gao, H., Zhao, G., Wang, D., Lin, Y., Yang, Z., et al. (2021). Temporal-wise Zhang, W., & Li, P. (2020). Temporal spike sequence learning via backpropagation for
attention spiking neural networks for event streams classification. In Proceedings of deep spiking neural networks. Advances in Neural Information Processing Systems, 33,
the IEEE/CVF international conference on computer vision (pp. 10221–10230). 12022–12033.
Yao, X., Li, F., Mo, Z., & Cheng, J. (2022). Glif: A unified gated leaky integrate- Zhao, D., Zeng, Y., & Li, Y. (2022). BackEISNN: A deep spiking neural network with
and-fire neuron for spiking neural networks. Adv. Neural Inf. Process. Syst., 35, adaptive self-feedback and balanced excitatory–inhibitory neurons. Neural Networks.
32160–32171. Zheng, H., Wu, Y., Deng, L., Hu, Y., & Li, G. (2021). Going deeper with directly-trained
Zeng, Y., Zhao, D., Zhao, F., Shen, G., Dong, Y., Lu, E., et al. (2023). Braincog: a spiking larger spiking neural networks. In Proceedings of the AAAI conference on artificial
neural network based, brain-inspired cognitive intelligence engine for brain-inspired intelligence, vol. 35, no. 12 (pp. 11062–11070).
ai and brain simulation. Patterns, 4(8). Zucker, R. S., & Regehr, W. G. (2002). Short-term synaptic plasticity. Annual Review of
Physiology, 64(1), 355–405.

201

You might also like