Keywords: Dendritic Nonlinearity; Dendritic Spatial Gating Module; Dendritic Temporal Adjust Module; Spiking Neural Networks

Abstract

Inspired by the information transmission process in the brain, Spiking Neural Networks (SNNs) have gained considerable attention due to their event-driven nature. However, as the network structure grows complex, managing the spiking behavior within the network becomes challenging. Networks with excessively dense or sparse spikes fail to transmit sufficient information, inhibiting SNNs from exhibiting superior performance. Current SNNs linearly sum presynaptic information in postsynaptic neurons, overlooking the adaptive adjustment effect of dendrites on information processing. In this study, we introduce the Dendritic Spatial Gating Module (DSGM), which scales and translates the input, reducing the loss incurred when transforming the continuous membrane potential into discrete spikes. Simultaneously, by implementing the Dendritic Temporal Adjust Module (DTAM), dendrites assign different importance to inputs of different time steps, facilitating the establishment of the temporal dependency of spiking neurons and effectively integrating multi-step time information. The fusion of these two modules results in a more balanced spike representation within the network, significantly enhancing the neural network's performance. This approach has achieved state-of-the-art performance on static image datasets, including CIFAR10 and CIFAR100, as well as event datasets like DVS-CIFAR10, DVS-Gesture, and N-Caltech101. It also demonstrates competitive performance compared to the current state of the art on the ImageNet dataset.
1. Introduction

As the third generation of Artificial Neural Networks (ANNs) (Maass, 1997), Spiking Neural Networks (SNNs) consist of biologically inspired spiking neurons and synaptic connections. Compared with real-valued artificial neural networks, SNNs use binary spike sequences to transmit information, which is more in line with the information processing mechanism of the brain. Furthermore, they are more convenient to combine with neuromorphic sensors to perform efficient computing on neuromorphic hardware (Furber et al., 2012). Also, the rich spatio-temporal dynamic characteristics of spiking neurons give SNNs great potential in brain-inspired artificial intelligence (Shen, Zhao, & Zeng, 2022a; Wu, Deng, Li, Zhu, & Shi, 2018; Zhao, Zeng, & Li, 2022) and brain simulation (Zeng, Zhao, Zhao, Shen, Dong, Lu, Zhang, Sun, Liang, Zhao, et al., 2023).

The non-differentiable characteristics of SNNs make their training difficult, which has attracted many researchers. Brain-inspired synaptic plasticity learning algorithms such as Spike-Timing-Dependent Plasticity (STDP) (Bi & Poo, 1998) and Short-Term Plasticity (STP) (Zucker & Regehr, 2002) have solved the training problem of SNNs to a certain extent, but their performance on complex tasks is still poor. The surrogate gradient enables SNNs to be trained with the backpropagation algorithm, which has significantly promoted the development of SNNs. However, as the network deepens, SNNs gradually become challenging to converge. Although algorithms such as neuron normalization (Wu et al., 2019; Zheng, Wu, Deng, Hu, & Li, 2021), deep residual learning (Fang, Yu, Chen, Huang, et al., 2021), and network architecture search (Kim, Li, Park, Venkatesha, & Panda, 2022) have improved the training of SNNs from various aspects, most of them borrow experience from the optimization of ANNs while ignoring the characteristics of SNNs.
Although a single spiking neuron can achieve the performance of deep artificial neural networks (Beniaguev, Segev, & London, 2021), due to hardware limitations, Leaky Integrate-and-Fire (LIF) and IF spiking neurons are often used when constructing large-scale deep SNNs. These neurons only passively sum the presynaptic neuron information linearly, accumulate the membrane potential, and fire spikes when the membrane potential reaches the threshold. They often share the same hyperparameters and cannot dynamically adjust themselves to the input current to achieve the optimal firing rate. Fig. 1 shows the feature maps corresponding to DVS data processed by different types of neurons. Different excitation patterns of neurons affect the informativeness of the features and greatly influence the final accuracy.

Fig. 1. Illustration of the relationship between different neuronal models and informativeness in the features.

We find that the plain linear synaptic model affects the excitatory pattern of neurons. When the number of layers deepens, the deep spiking neurons are under-activated for a long time and are in a spike-vanishing state. In this state, the network cannot transmit sufficient information in the forward propagation, and the weights cannot be effectively updated in the backpropagation process. By simply lowering the threshold or strengthening the input of the spiking neurons, the deep spiking neurons will be activated, but the shallow spiking neurons will release too many spikes and enter a state of spike explosion. A large number of spikes makes it difficult for the network to extract useful information, thus degrading the performance of SNNs. A spike-balanced state enables SNNs to learn richer representations and converge rapidly and stably.

In addition to the computational properties of the soma, dendrites, as powerful computational units, play a significant role in information processing in the brain (Acharya et al., 2021; Bicknell & Häusser, 2021). Inspired by dendrites' powerful spatiotemporal information processing abilities, we construct adaptive dendritic computing modules along the spatial and temporal dimensions to adjust the input so that the neuron firing states are more balanced, thus enriching the feature extraction ability. To illustrate the superiority of our model, we conduct experiments on several datasets; our model dramatically improves the performance of SNNs. Our contributions can be summarized as follows:

• Inspired by the dendrites' spatial information processing mechanism, we design the Dendritic Spatial Gating Module (DSGM) to translate and scale the input at the current moment adaptively. DSGM achieves a multimodal membrane potential distribution without external constraints and reduces the gap between the membrane potential and spike distributions.

• Inspired by the temporal information processing mechanism of dendrites, we design the Dendritic Temporal Adjustment Module (DTAM) to adaptively adjust the contribution of the input current at each moment to the membrane potential.

• We evaluate our method on static image datasets such as CIFAR10 and CIFAR100 and event datasets such as DVS-CIFAR10, N-Caltech101, and DVS-Gesture. Experiments show that the proposed DSGM and DTAM help balance the spike firing rates and achieve state-of-the-art accuracy with very low latency. For the ImageNet dataset, our algorithm shows comparable performance with the current best algorithms under lighter structures and lower latency. Codes and data have been deposited in GitHub: https://github.com/BrainCog-X/Brain-Cog.

2. Related works

The design of high-performance deep SNNs is mainly divided into two aspects: conversion-based SNNs and directly trained SNNs based on the surrogate gradient.

2.1. Conversion based

Conversion-based SNNs usually apply parameter constraints to well-trained ANNs so that the ReLU activation function in the ANN can be directly replaced with IF neurons. Many methods have been proposed to improve the performance after conversion, such as soft reset (Han, Srinivasan, & Roy, 2020), synchronous neurons (Li, Zhao, & Zeng, 2022), and burst spikes (Li & Zeng, 2022). To reduce the conversion error, Bu et al. (2021) replace the ReLU activation function of the ANN before training with the clip-floor-shift function. Although conversion-based SNNs can achieve competitive results on large-scale datasets, most of these methods are limited to constructing SNNs with simple IF neurons and ignore the temporally rich dynamics of SNNs. As a result, SNNs obtained by conversion methods do not handle event data well and have high latency.

2.2. Directly trained

To exploit the dynamic nature of SNNs in the temporal domain and to train SNNs end-to-end, some researchers have attempted to train SNNs directly from scratch. The main problem faced in training SNNs from scratch via gradient descent is the non-differentiability of the binary spikes. Wu et al. (2018, 2019) proposed the STBP algorithm to speed up the training of SNNs based on their spatio-temporal dynamic properties. Learnable neuron hyperparameters were then introduced, such as the membrane time constant (Fang, Yu, Chen, Masquelier, et al., 2021) and the firing threshold (Shaban, Bezugam, & Suri, 2021). Attention mechanisms (Liu, Xing, Feng, Tang, & Pan, 2022; Yao et al., 2021) have also been introduced to enhance the spatiotemporal information processing capability of SNNs.
By adding an additional loss, Guo et al. (2022) achieved the conversion of the membrane potential from a single-peak to a multi-peak distribution, reducing the gap between the distributions of membrane potential and spikes and improving the performance of SNNs. Duan, Ding, Chen, Yu, and Huang (2022) proposed Temporal Effective Batch Normalization (TEBN) to make the temporal distribution of SNNs smoother and more even by assigning different weights to each moment. Yao, Li, Mo, and Cheng (2022) introduced gating factors to fuse different biological features and expand the characterization space of SNNs. However, in these methods, spiking neurons often passively receive presynaptic information while ignoring the regulatory role of dendrites as a powerful computing unit in information transmission.

3. Methods

3.2. Dendritic spatial gating module

The spiking neurons in SNNs control whether the neurons are excited and output spikes by comparing the membrane potential with the threshold. As shown in Eq. (1), converting continuous membrane potentials into discrete spikes makes SNNs more biologically plausible and energy efficient. However, this conversion process is inherently nonlinear and can lead to information losses, which negatively affect the performance of SNNs. When the input current is negative, the membrane potential of the neuron also becomes negative, keeping the neuron in an inhibited state and preventing it from firing. Therefore, the transformation error between the membrane potential and the spike of a neuron can be measured by Eq. (3):

$\sum_{t} \left[ H(u_t - v_{th}) - u_t \right]^2 \cdot (u_t \geq 0) \qquad (3)$

where $H(\cdot)$ is the Heaviside step function and $v_{th}$ is the firing threshold.
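To make the error measure concrete, the following is a minimal sketch of how Eq. (3) can be evaluated for membrane potentials recorded over the simulation steps. PyTorch is assumed (the released BrainCog code is PyTorch-based); the function name and tensor layout are illustrative choices, not the paper's implementation.

```python
import torch

def transformation_error(u: torch.Tensor, v_th: float = 0.5) -> torch.Tensor:
    """Transformation error of Eq. (3): squared gap between the binary spike
    H(u_t - v_th) and the membrane potential u_t, accumulated over time and
    counted only where u_t >= 0 (negative potentials cannot fire anyway)."""
    spikes = (u >= v_th).float()   # Heaviside step H(u_t - v_th)
    mask = (u >= 0).float()        # indicator (u_t >= 0)
    return ((spikes - u) ** 2 * mask).sum(dim=0)

# Toy usage: one neuron's membrane potential over T = 4 simulation steps.
u = torch.tensor([0.2, -0.1, 0.6, 0.9])
print(transformation_error(u))     # per-neuron error summed over time
```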
Fig. 2. Illustration of the distribution of postsynaptic membrane potential. DSGM achieves reshaping of membrane potential by nonlinear dendritic modeling.
Fig. 4. Schematic of our Dendritic Temporal Adjustment Module (DTAM). This module enables attributing different significance to inputs at various time steps.
… with the decay rate remaining the same across all time steps. Consequently, neurons are hindered from attributing significance to pivotal moments in time. This decay also inhibits neurons from establishing long-term dependencies and effectively integrating information from various moments.

To address these issues, we introduce the Dendritic Temporal Adjustment Module (DTAM) as depicted in Fig. 4. DTAM is designed to establish a correlation between dendrites and input currents at different moments. Upon employing DTAM, the dendritic state can be modulated by the input current at diverse time steps, thereby allowing the attribution of varying levels of importance to inputs at these different steps.

$u_{t+1} = \Big( \tfrac{1}{\tau}\big(g_{\boldsymbol{w}}(f(\boldsymbol{x}_t)) - u_t\big) + u_t \Big)(1 - s_t) + v_{rst}\, s_t, \quad \text{s.t. } \boldsymbol{x}_t = [x_0, x_1, \ldots, x_t, 0, \ldots, 0]^{T} \in \mathbb{R}^{T} \qquad (6)$

In Eq. (6), for an individual neuron, $\boldsymbol{x}_t$ encapsulates the input currents from time step 0 to $t$, while the remaining elements from $t+1$ to $T$ are zero-padded. $T$ denotes the total number of simulation steps. Additionally, $g_{\boldsymbol{w}}(\cdot)$ represents a function of the input current, weighted by an adjustable parameter. The role of $f(\boldsymbol{x}_t)$ in this equation is to summarize the input currents over time.

In our DTAM, the introduction of $g_{\boldsymbol{w}}$ allows us to adjust the weights of input currents from different moments, allowing neurons to assign different levels of importance to inputs at various time steps, thus enabling the establishment of long-term dependencies and more effective integration of information from different moments. The DTAM offers a novel approach to managing the temporal dynamics of input currents, enhancing the performance of SNNs by providing a more flexible and adaptive temporal processing mechanism.

4. Experiments

To verify the effectiveness and efficiency of the proposed method, we apply DSGM and DTAM to deep SNNs and validate them on classification tasks with static image and event datasets. We train deep SNNs using STBP (Wu et al., 2018) and use AdamW as the optimizer, with the initial learning rate set to 5 × 10⁻³ and decayed to 0 with a cosine schedule. The batch size is set to 128, and the number of training epochs is set to 600. For static datasets, the images are converted to spike sequences using direct encoding (Rathi & Roy, 2021; Wu et al., 2019), while for all datasets, the membrane potential of the last layer of neurons is used for direct output (Wu et al., 2019; Zheng et al., 2021). We use the LIF neuron as a baseline and apply DSGM and DTAM to the LIF neuron to test their effects on the performance of SNNs. The threshold voltage is set to 0.5, and the initial membrane time constant is set to 2. The surrogate gradient is applied in backpropagation to overcome the non-differentiability of binary spikes. As in Wu et al. (2018), we use triangular pulses to approximate the gradient of neurons. We assign independent learnable parameters to the DSGMs in different channels. The γ and ξ in the DSGM are initialized to 0, and β is initialized to 0.75. For DTAM, a lower triangular matrix is used to describe the effects of synaptic currents at different moments and to ensure that earlier moments are not influenced by subsequent ones. Moreover, each layer of neurons shares the same DTAM weights to prevent DTAM from significantly affecting the number of network parameters. A minimal sketch of such a DTAM-modulated neuron is given below.
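The sketch below illustrates one way Eq. (6) and the lower-triangular DTAM matrix could be realized in PyTorch. It is a simplified reading of the description above, not the released implementation: the learnable matrix `W` stands in for $g_{\boldsymbol{w}}(f(\cdot))$, it is masked so that later time steps cannot influence earlier ones, and the same matrix is shared by all neurons of a layer.

```python
import torch
import torch.nn as nn

class DTAMNeuron(nn.Module):
    """Sketch of a DTAM-modulated LIF layer following Eq. (6).

    A learnable T x T matrix, masked to be lower-triangular, mixes the history
    of input currents x_0..x_t; the mixed current then drives an ordinary
    leaky integrate-and-fire update with hard reset. All names are illustrative.
    """

    def __init__(self, T: int, tau: float = 2.0, v_th: float = 0.5, v_rst: float = 0.0):
        super().__init__()
        self.tau, self.v_th, self.v_rst = tau, v_th, v_rst
        self.W = nn.Parameter(torch.eye(T))                 # shared within a layer
        self.register_buffer("mask", torch.tril(torch.ones(T, T)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:     # x: [T, batch, features]
        T = x.shape[0]
        W = (self.W * self.mask)[:T, :T]                    # later steps cannot affect earlier ones
        g = torch.einsum("ts,sbf->tbf", W, x)               # g_w(f(x_t)) for every t
        u = torch.zeros_like(x[0])
        spikes = []
        for t in range(T):
            u = u + (g[t] - u) / self.tau                   # leaky integration
            s = (u >= self.v_th).float()                    # fire when threshold is reached
            u = u * (1 - s) + self.v_rst * s                # hard reset, as in Eq. (6)
            spikes.append(s)
        return torch.stack(spikes)

# Toy usage: 4 time steps, batch of 2, 8 features.
out = DTAMNeuron(T=4)(torch.rand(4, 2, 8))
```

In an actual directly trained SNN, the threshold comparison would be paired with a surrogate gradient (e.g., the triangular pulse mentioned above) so the module remains trainable end-to-end.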
Table 1
Comparison with existing works on static image datasets.

Dataset | Model | Methods | Architecture | Step | Accuracy
CIFAR10 | Rathi, Srinivasan, Panda, and Roy (2019) | Hybrid training | ResNet-20 | 250 | 92.22
CIFAR10 | Rathi and Roy (2021) | Diet-SNN | ResNet-20 | 10 | 92.54
CIFAR10 | Wu et al. (2019) | STBP NeuNorm | CIFARNet | 12 | 90.53
CIFAR10 | Zhang and Li (2020) | TSSL-BP | CIFARNet | 5 | 91.41
CIFAR10 | Shen et al. (2022a) | STBP | 7-layer-CNN | 8 | 92.15
CIFAR10 | Na et al. (2022) | STBP | NAS | 16 | 93.15
CIFAR10 | Zheng et al. (2021) | STBP-tdBN | ResNet-19 | 6 | 93.16
CIFAR10 | Deng, Li, Zhang, and Gu (2021) | TET | ResNet-19 | 6 | 94.50
CIFAR10 | Duan et al. (2022) | TEBN | ResNet-19 | 6 | 94.71
CIFAR10 | Yao et al. (2022) | GLIF | ResNet-19 | 6 | 95.03
CIFAR10 | our method | DSGM+DTAM | ResNet-19 | 4 | 96.41 ± 0.03
CIFAR100 | Rathi et al. (2019) | Hybrid training | VGG-11 | 125 | 67.87
CIFAR100 | Rathi and Roy (2021) | Diet-SNN | ResNet-20 | 5 | 64.07
CIFAR100 | Shen et al. (2022a) | STBP | ResNet34 | 8 | 69.32
CIFAR100 | Kim et al. (2022) | STBP | NAS | 5 | 73.04
CIFAR100 | Deng et al. (2021) | TET | ResNet-19 | 6 | 74.72
CIFAR100 | Duan et al. (2022) | TEBN | ResNet-19 | 6 | 76.41
CIFAR100 | Yao et al. (2022) | GLIF | ResNet-19 | 6 | 77.35
CIFAR100 | our method | DSGM+DTAM | ResNet-19 | 4 | 78.87 ± 0.18
ImageNet | Rathi et al. (2019) | Hybrid training | ResNet-34 | 250 | 61.48
ImageNet | Sengupta, Ye, Wang, Liu, and Roy (2019) | SPIKE-NORM | ResNet-34 | 2500 | 69.96
ImageNet | Zheng et al. (2021) | STBP-tdBN | Spiking-ResNet-34 | 6 | 63.72
ImageNet | Fang, Yu, Chen, Masquelier, et al. (2021) | SEW ResNet | SEW-ResNet-34 | 4 | 67.04
ImageNet | Deng et al. (2021) | TET | SEW-ResNet-34 | 4 | 68.00
ImageNet | Fang, Yu, Chen, Masquelier, et al. (2021) | SEW ResNet | SEW-ResNet-152 | 4 | 69.26
ImageNet | Yao et al. (2022) | GLIF | ResNet-34 | 6 | 69.09
ImageNet | Yao et al. (2022) | TEBN | ResNet-34 | 4 | 64.29
ImageNet | our method | DSGM+DTAM | Spiking-ResNet-34 | 4 | 68.47
ImageNet | our method | DSGM+DTAM | SEW-ResNet-34 | 4 | 69.36
To verify the performance of our DSGM and DTAM, we first conduct experiments on the commonly used static classification datasets CIFAR10, CIFAR100, and ImageNet. All static datasets are based on the direct input encoding used in Wu et al. (2019).

We further conduct experiments on the event datasets DVS-CIFAR10 (Li, Liu, Ji, Li, & Shi, 2017), DVS-Gesture (Amir et al., 2017), and N-Caltech101 (Orchard, Jayawant, Cohen, & Thakor, 2015) to demonstrate the powerful information representation ability of our method.
Table 2
Comparison with existing works on event datasets.

Dataset | Model | Methods | Architecture | Step | Accuracy
DVS-CIFAR10 | Zheng et al. (2021) | STBP-tdBN | ResNet-19 | 10 | 67.8
DVS-CIFAR10 | Kugele, Pfeil, Pfeiffer, and Chicca (2020) | Streaming Rollout | DenseNet | 10 | 66.8
DVS-CIFAR10 | Wu et al. (2021) | Conv3D | LIAF-Net | 10 | 71.70
DVS-CIFAR10 | Wu et al. (2021) | LIAF | LIAF-Net | 10 | 70.40
DVS-CIFAR10 | Na et al. (2022) | STBP | NAS | 16 | 72.50
DVS-CIFAR10 | Shen et al. (2022a) | STBP | 5-layer-CNN | 16 | 78.95
DVS-CIFAR10 | Duan et al. (2022) | TEBN | 7-layer CNN | 10 | 75.10
DVS-CIFAR10 | Deng et al. (2021) | TET | VGGSNN | 10 | 83.17
DVS-CIFAR10 | Yao et al. (2022) | GLIF | 7B-wideNet | 16 | 78.10
DVS-CIFAR10 | our method | DSGM+DTAM | VGGSNN | 10 | 84.23 ± 0.48
DVS-CIFAR10 | our method | DSGM+DTAM | VGGSNN | 10 | 85.53ᵇ ± 0.39
DVS-Gesture | Xing, Di Caterina, and Soraghan (2020) | SLAYER | 5-Layer-CNN | 20 | 92.01
DVS-Gesture | Shrestha and Orchard (2018) | SLAYER | 16-layer-CNN | 300 | 93.64
DVS-Gesture | Fang, Yu, Chen, Masquelier, et al. (2021) | STBP | BP | 20 | 97.57
DVS-Gesture | Na et al. (2022) | STBP | NAS | 16 | 96.53
DVS-Gesture | our method | DSGM+DTAM | SNN5 | 10 | 98.36 ± 0.53
DVS-Gesture | our method | DSGM+DTAM | SNN5 | 10 | 96.69ᵃ ± 0.40
DVS-Gesture | our method | DSGM+DTAM | SNN5 | 16 | 98.58ᵃ ± 0.49
N-Caltech101 | Kugele et al. (2020) | STBP | VGG11 | 20 | 55.0
N-Caltech101 | Ramesh et al. (2019) | N/A | N/A | N/A | 66.8
N-Caltech101 | our method | DSGM+DTAM | VGGSNN | 10 | 74.48 ± 0.31
N-Caltech101 | our method | DSGM+DTAM | VGGSNN | 10 | 76.39ᵇ ± 0.39
N-Caltech101 | our method | DSGM+DTAM | VGGSNN | 16 | 84.42 ± 0.43
N-Caltech101 | our method | DSGM+DTAM | VGGSNN | 16 | 85.26ᵇ ± 0.14

ᵃ Denotes that the training and test sets are divided 8:2, while other works use 9:1.
ᵇ Indicates using EventMix (Shen et al., 2022b).
… using 90% of the data for training. On the more complex N-Caltech101, we achieved an accuracy of 85.26%.

4.3. Result analysis

In this section, we present a detailed analysis of the proposed method in terms of the membrane potential distribution, spike sparsity, the effect of different dendritic models on model performance, and the loss landscape.

4.3.1. Membrane potential distribution

In SNNs, neurons use binary spikes to represent different information. When the membrane potential is above the threshold, the neuron is excited and outputs 1, while when the membrane potential is below the threshold, the neuron is resting and outputs 0. Therefore, if the membrane potential can be concentrated around the neuron's resting potential and threshold voltage, then the binary spikes can more accurately approximate the membrane potential distribution. To further illustrate that nonlinear dendrites bring better characterization ability and smaller transformation errors to SNNs, we analyze the distribution of membrane potentials of neurons at different depths on the DVS-Gesture dataset and the corresponding transformation errors.

As shown in Fig. 5, we conduct experiments on DVS-Gesture with the SNN5 structure and demonstrate the effect of our DSGM and DTAM on the distribution of membrane potentials in different layers. When using linear and excitatory synaptic models, the membrane potential tends to show a single-peaked distribution. Such a distribution makes it impossible to efficiently characterize the membrane potentials by binary spikes, which creates a gap between membrane potentials and spikes and affects the performance of SNNs. In contrast, neurons with DSGM and DTAM show much better adaptation to external stimuli, with the membrane potential showing a multimodal distribution concentrated near the resting potential and the threshold voltage. This adaptability enables binary spikes to characterize the membrane potential of neurons better and improves the information transmission efficiency of SNNs.

To verify the adaptability of DSGM, we also test it on SNNs using soft-reset neurons, as shown in Eq. (7):

$u_{t+1,i} = \tfrac{1}{\tau}\big(x_t - u_{t,i}\big) + u_{t,i} - v_{th}\, s_{t,i} \qquad (7)$

Unlike the hard-reset neuron in Eq. (1), the soft-reset neuron is reset by subtracting the threshold voltage after the spike is delivered. As shown in the last row of Fig. 5, we find that for soft-reset neurons, applying DSGM also allows the membrane potential to show a multi-peak distribution and reduces the transformation error of SNNs.

As shown in Fig. 6, we analyze the effect of different dendritic models on the transformation error of SNNs for all channels in the model corresponding to Fig. 5. For neurons at different depths, DSGM brings less transformation error to the SNNs.

4.3.2. Spike sparsity analysis

Fig. 7 shows the firing rate of neurons in different layers for each dendrite type during training on the DVS-Gesture dataset. In the initial stage of training, the shallow-layer neurons have a higher firing rate. This is due to the random initialization of the synaptic weights, which makes it difficult to activate the deep neurons, while the nonlinear design of DSGM enables the deep neurons of SNNs to have higher firing rates, which facilitates the rapid convergence of the network. As training progresses, by the fifth epoch, the neurons of different layers have a higher firing rate than in the initial stage so that information is passed from the shallow layers to the deeper neurons. Subsequently, the firing rate of the shallow neurons decreases to better adapt to the external stimuli.

At different stages of training, excitatory dendritic models result in higher firing rates, which can lead to a more significant energy overhead. The linear dendritic model, like the excitatory dendritic model, shows considerable variation in the firing rate of different layers as training progresses, which can easily lead to spike vanishing and spike explosion and thus degrade the performance of SNNs. The DSGM, on the other hand, has a more flexible dendritic design, which ensures a relatively stable firing rate of neurons in different layers at different training stages and thus performs better.
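For readers who want to reproduce this kind of firing-rate analysis, the following is a hedged recipe: record the spike tensors emitted by each spiking layer for a batch and average them, since a binary spike tensor's mean is the fraction of (neuron, step) pairs that fired. The hook-based bookkeeping and the `LIFNode` class in the usage comment are illustrative stand-ins, not the instrumentation used for Fig. 7.

```python
import torch
import torch.nn as nn

def layer_firing_rates(model: nn.Module, x: torch.Tensor, spiking_types):
    """Return the mean firing rate of every spiking layer for one input batch.

    `spiking_types` lists the spiking-neuron classes used in the model; their
    forward outputs are assumed to be binary spike tensors.
    """
    rates, handles = {}, []

    def make_hook(name):
        def hook(_module, _inp, out):
            rates[name] = out.detach().float().mean().item()
        return hook

    for name, module in model.named_modules():
        if isinstance(module, spiking_types) and len(list(module.children())) == 0:
            handles.append(module.register_forward_hook(make_hook(name)))
    with torch.no_grad():
        model(x)
    for h in handles:
        h.remove()
    return rates

# Usage (hypothetical): rates = layer_firing_rates(snn, batch, spiking_types=(LIFNode,))
# where LIFNode stands for whatever spiking-neuron class the model uses.
```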
Fig. 5. Distribution of postsynaptic neuronal membrane potential at different depths on the DVS-Gesture.
Fig. 6. Transformation errors of neurons with different dendritic types at different depths.
4.3.3. Comparison of dendrite models

The DSGM gives the spiking neuron a larger representation space and reshapes the input current. To further illustrate the effectiveness of DSGM, we compare it with the nonlinear activation functions commonly used in ANNs, such as ReLU (Glorot, Bordes, & Bengio, 2011), ELU (Clevert, Unterthiner, & Hochreiter, 2015), and GELU (Hendrycks & Gimpel, 2016); the results are shown in Table 3.

Table 3 shows the performance of dendritic models with different activation functions on the CIFAR10 and DVS-CIFAR10 datasets. Compared with the other dendritic models, DSGM confers greater adaptability to the spiking neuron, allowing it to better adapt to the input current's shape and make more targeted adjustments to the membrane potential of the postsynaptic neuron. The other nonlinear functions do not significantly improve the model's performance compared to the linear dendritic model. This is because these functions cannot adjust the gain of the input current, making the neuron more likely to deliver more spikes and thus affecting the performance of the model.

4.3.4. Loss landscapes

We further examine the 2D loss landscapes (Li, Xu, Taylor, Studer, & Goldstein, 2018) of SNNs near local minima to demonstrate the effect of DSGM and DTAM on the model's generalization ability. As shown in Fig. 8, DSGM and DTAM help SNNs find flatter minima.
Fig. 8. Loss landscape of VGGSNN with/without DSGM and DTAM. Validation is performed on N-Caltech101 and DVS-CIFAR10.
Fig. 10. DTAM weight matrices on the DVS-CIFAR10 and CIFAR10 datasets.
… parameters to maintain spike distributions in a relatively balanced state.

As depicted in Fig. 10, we visualize the DTAM matrices on the CIFAR10 and DVS-CIFAR10 datasets. The DTAM dynamically adjusts the influence of input currents on membrane potentials for different time steps, endowing inputs at various moments with different significance. This enables a more precise encoding of the spike representation.

To further demonstrate the attention mechanism of DTAM in the temporal dimension, we retrain the model on DVS-Gesture with longer sequences of length 64, 128, 256, and 512, and visualize the DTAM weights, as shown in Fig. 11. Fig. 11 shows the distribution of DTAM weights for different sequence lengths, excluding the current moment. For longer sequences, the distribution of DTAM is more pronounced, and the neurons pay more attention to the input currents at closer moments.

The attention mechanism is one of the common advanced cognitive functions in the brain, and it plays a crucial role in many deep learning models such as the Transformer (Vaswani et al., 2017). The Transformer utilizes a self-attention mechanism to adaptively adjust the weights of different spatial positions by calculating Query, Key, and Value matrices. This method enhances representational power without significantly increasing the number of parameters. However, it incurs computational costs during inference due to the need for online computation of self-attention matrices.

In contrast, DTAM assigns dynamic importance to different time steps by utilizing a pre-defined matrix. This matrix is updated concurrently with the other parameters during model training but remains fixed during inference. This enables DTAM to assign different weights to input currents at different historical time steps, thereby influencing the behavior of neurons. As shown in Eq. (6), DTAM is a linear operation in the temporal dimension, resulting in relatively low computational overhead. Furthermore, each layer of neurons in the network shares the same DTAM matrix, leading to improved energy efficiency. Importantly, DTAM is designed with consideration for the temporal characteristics of SNNs, where early time steps should not be influenced by subsequent ones. Therefore, DTAM better simulates the information processing mechanisms of the brain, thus enhancing the performance and efficiency of the model.

4.4.2. Quantitative analysis

As shown in Table 4, to investigate the effect of DSGM and DTAM on the performance and convergence speed of SNNs, we conduct an ablation study on CIFAR10, DVS-CIFAR10, and N-Caltech101. Both DSGM and DTAM contribute to the improvement of model performance. Due to the less pronounced dynamic characteristics and shorter simulation time of static datasets, DTAM does not contribute significantly to the classification of SNNs on them. Both DSGM and DTAM effectively improve the model performance on event datasets with richer temporal dynamics.

Fig. 12 shows the training curves when using the different modules. DSGM and DTAM enable the neuron's membrane potential to adapt to different input currents at different moments, thus enabling faster adaptation to external stimuli.

The nonlinear mapping of the DSGM for the input current can be divided into two parts: linear and exponential. To further verify the role of these two components, we perform an ablation analysis on the components of DSGM. We decompose the mapping of DSGM for inhibitory input currents into linear and exponential components and conduct ablation experiments on the CIFAR10 and DVS-CIFAR10 datasets, with the results shown in Table 5.
Fig. 11. DTAM weight matrices on the DVS-Gesture dataset with long sequences.
Table 4
Ablation study on DSGM and DTAM.

Method | CIFAR10 | DVS-CIFAR10 | N-Caltech101
Baseline | 96.03 | 83.4 | 84.37
DSGM | 96.32 | 84.6 | 85.23
DTAM | 95.98 | 84.8 | 85.11
DSGM+DTAM | 96.41 | 85.53 | 85.26

The experimental results show that the DSGM outperforms the dendritic models using only exponential or only linear mappings on both static and DVS datasets. This is because DSGM achieves a continuous and smooth transformation of the input currents while maintaining the nonlinearity of the dendritic model.
5. Conclusion
… nonlinear computing role. This paper proposes the dendritic spatial gating mechanism and the dendritic temporal adjustment mechanism, motivated by dendrites' adaptive spatiotemporal processing of input information. Through the spatial translation and scaling of the input information and the assignment of different importance to the input information at different time steps, the spikes of the network are kept in a relatively balanced state. We conduct experiments on the CIFAR10, CIFAR100, DVS-CIFAR10, DVS-Gesture, and N-Caltech101 datasets and achieve state-of-the-art performance. At the same time, on ImageNet, our method achieves performance comparable to the current state-of-the-art algorithms with lower latency and a lighter network structure.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Data will be made available on request.

Acknowledgments

This work was supported by the National Key Research and Development Program (Grant No. 2020AAA0107800).

References

Acharya, J., Basu, A., Legenstein, R., Limbacher, T., Poirazi, P., & Wu, X. (2021). Dendritic computing: Branching deeper into machine learning. Neuroscience.
Amir, A., Taba, B., Berg, D., Melano, T., McKinstry, J., Di Nolfo, C., et al. (2017). A low power, fully event-based gesture recognition system. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 7243–7252).
Beniaguev, D., Segev, I., & London, M. (2021). Single cortical neurons as deep artificial neural networks. Neuron, 109(17), 2727–2739.
Bi, G.-q., & Poo, M.-m. (1998). Synaptic modifications in cultured hippocampal neurons: Dependence on spike timing, synaptic strength, and postsynaptic cell type. Journal of Neuroscience, 18(24), 10464–10472.
Bicknell, B. A., & Häusser, M. (2021). A synaptic learning rule for exploiting nonlinear dendritic computation. Neuron, 109(24), 4001–4017.
Bohte, S. M. (2011). Error-backpropagation in networks of fractionally predictive spiking neurons. In Artificial neural networks and machine learning – ICANN 2011: 21st international conference on artificial neural networks, Espoo, Finland, June 14-17, 2011, proceedings, Part I (pp. 60–68). Springer.
Bu, T., Fang, W., Ding, J., Dai, P., Yu, Z., & Huang, T. (2021). Optimal ANN-SNN conversion for high-accuracy and ultra-low-latency spiking neural networks. In International conference on learning representations.
Clevert, D.-A., Unterthiner, T., & Hochreiter, S. (2015). Fast and accurate deep network learning by exponential linear units (ELUs). arXiv preprint arXiv:1511.07289.
Deng, S., Li, Y., Zhang, S., & Gu, S. (2021). Temporal efficient training of spiking neural network via gradient re-weighting. In International conference on learning representations.
Duan, C., Ding, J., Chen, S., Yu, Z., & Huang, T. (2022). Temporal effective batch normalization in spiking neural networks. Advances in Neural Information Processing Systems, 35, 34377–34390.
Fang, W., Yu, Z., Chen, Y., Huang, T., Masquelier, T., & Tian, Y. (2021). Deep residual learning in spiking neural networks. Advances in Neural Information Processing Systems, 34, 21056–21069.
Fang, W., Yu, Z., Chen, Y., Masquelier, T., Huang, T., & Tian, Y. (2021). Incorporating learnable membrane time constant to enhance learning of spiking neural networks. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 2661–2671).
Furber, S. B., Lester, D. R., Plana, L. A., Garside, J. D., Painkras, E., Temple, S., et al. (2012). Overview of the SpiNNaker system architecture. IEEE Transactions on Computers, 62(12), 2454–2467.
Glorot, X., Bordes, A., & Bengio, Y. (2011). Deep sparse rectifier neural networks. In Proceedings of the fourteenth international conference on artificial intelligence and statistics (pp. 315–323). JMLR Workshop and Conference Proceedings.
Guo, Y., Tong, X., Chen, Y., Zhang, L., Liu, X., Ma, Z., et al. (2022). RecDis-SNN: Rectifying membrane potential distribution for directly training spiking neural networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 326–335).
Han, B., Srinivasan, G., & Roy, K. (2020). RMP-SNN: Residual membrane potential neuron for enabling deeper high-accuracy and low-latency spiking neural network. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 13558–13567).
Hendrycks, D., & Gimpel, K. (2016). Gaussian error linear units (GELUs). arXiv preprint arXiv:1606.08415.
Kim, Y., Li, Y., Park, H., Venkatesha, Y., & Panda, P. (2022). Neural architecture search for spiking neural networks. In European conference on computer vision (pp. 36–56). Springer.
Krizhevsky, A., Hinton, G., et al. (2009). Learning multiple layers of features from tiny images. Citeseer.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in neural information processing systems, vol. 25.
Kugele, A., Pfeil, T., Pfeiffer, M., & Chicca, E. (2020). Efficient processing of spatio-temporal data streams with spiking neural networks. Frontiers in Neuroscience, 14, 439.
Li, H., Liu, H., Ji, X., Li, G., & Shi, L. (2017). CIFAR10-DVS: An event-stream dataset for object classification. Frontiers in Neuroscience, 11, 309.
Li, H., Xu, Z., Taylor, G., Studer, C., & Goldstein, T. (2018). Visualizing the loss landscape of neural nets. In Advances in neural information processing systems, vol. 31.
Li, Y., & Zeng, Y. (2022). Efficient and accurate conversion of spiking neural network with burst spikes. arXiv preprint arXiv:2204.13271.
Li, Y., Zhao, D., & Zeng, Y. (2022). BSNN: Towards faster and better conversion of artificial neural networks to spiking neural networks with bistable neurons. Frontiers in Neuroscience, 16, 991851.
Liu, Q., Xing, D., Feng, L., Tang, H., & Pan, G. (2022). Event-based multimodal spiking neural network with attention mechanism. In ICASSP 2022 – 2022 IEEE international conference on acoustics, speech and signal processing (pp. 8922–8926). IEEE.
Maass, W. (1997). Networks of spiking neurons: The third generation of neural network models. Neural Networks, 10(9), 1659–1671.
Na, B., Mok, J., Park, S., Lee, D., Choe, H., & Yoon, S. (2022). AutoSNN: Towards energy-efficient spiking neural networks. In International conference on machine learning (pp. 16253–16269). PMLR.
Orchard, G., Jayawant, A., Cohen, G. K., & Thakor, N. (2015). Converting static image datasets to spiking neuromorphic datasets using saccades. Frontiers in Neuroscience, 9, 437.
Ramesh, B., Yang, H., Orchard, G., Le Thi, N. A., Zhang, S., & Xiang, C. (2019). DART: Distribution aware retinal transform for event-based cameras. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(11), 2767–2780.
Rathi, N., & Roy, K. (2021). Diet-SNN: A low-latency spiking neural network with direct input encoding and leakage and threshold optimization. IEEE Transactions on Neural Networks and Learning Systems.
Rathi, N., Srinivasan, G., Panda, P., & Roy, K. (2019). Enabling deep spiking neural networks with hybrid conversion and spike timing dependent backpropagation. In International conference on learning representations.
Rose, G. J., & Call, S. J. (1992). Evidence for the role of dendritic spines in the temporal filtering properties of neurons: The decoding problem and beyond. Proceedings of the National Academy of Sciences, 89(20), 9662–9665.
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323(6088), 533–536.
Sengupta, A., Ye, Y., Wang, R., Liu, C., & Roy, K. (2019). Going deeper in spiking neural networks: VGG and residual architectures. Frontiers in Neuroscience, 13, 95.
Shaban, A., Bezugam, S. S., & Suri, M. (2021). An adaptive threshold neuron for recurrent spiking neural networks with nanodevice hardware implementation. Nature Communications, 12(1), 1–11.
Shen, G., Zhao, D., & Zeng, Y. (2022a). Backpropagation with biologically plausible spatiotemporal adjustment for training deep spiking neural networks. Patterns, Article 100522.
Shen, G., Zhao, D., & Zeng, Y. (2022b). EventMix: An efficient augmentation strategy for event-based data. arXiv preprint arXiv:2205.12054.
Shrestha, S. B., & Orchard, G. (2018). SLAYER: Spike layer error reassignment in time. In Advances in neural information processing systems, vol. 31.
Szalai, M. L., Kevwitch, R. M., & McGrath, D. V. (2003). Geometric disassembly of dendrimers: Dendritic amplification. Journal of the American Chemical Society, 125(51), 15688–15689.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., et al. (2017). Attention is all you need. In Advances in neural information processing systems, vol. 30.
Wu, Y., Deng, L., Li, G., Zhu, J., & Shi, L. (2018). Spatio-temporal backpropagation for training high-performance spiking neural networks. Frontiers in Neuroscience, 12, 331.
Wu, Y., Deng, L., Li, G., Zhu, J., Xie, Y., & Shi, L. (2019). Direct training for spiking neural networks: Faster, larger, better. In Proceedings of the AAAI conference on artificial intelligence, vol. 33, no. 01 (pp. 1311–1318).
Wu, Z., Zhang, H., Lin, Y., Li, G., Wang, M., & Tang, Y. (2021). LIAF-Net: Leaky integrate and analog fire network for lightweight and efficient spatiotemporal information processing. IEEE Transactions on Neural Networks and Learning Systems.
Xing, Y., Di Caterina, G., & Soraghan, J. (2020). A new spiking convolutional recurrent neural network (SCRNN) with applications to event-based hand gesture recognition. Frontiers in Neuroscience, 14, 1143.
Xu, B., Wang, N., Chen, T., & Li, M. (2015). Empirical evaluation of rectified activations in convolutional network. arXiv preprint arXiv:1505.00853.
Yao, M., Gao, H., Zhao, G., Wang, D., Lin, Y., Yang, Z., et al. (2021). Temporal-wise attention spiking neural networks for event streams classification. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 10221–10230).
Yao, X., Li, F., Mo, Z., & Cheng, J. (2022). GLIF: A unified gated leaky integrate-and-fire neuron for spiking neural networks. Advances in Neural Information Processing Systems, 35, 32160–32171.
Zeng, Y., Zhao, D., Zhao, F., Shen, G., Dong, Y., Lu, E., et al. (2023). BrainCog: A spiking neural network based, brain-inspired cognitive intelligence engine for brain-inspired AI and brain simulation. Patterns, 4(8).
Zhang, W., & Li, P. (2020). Temporal spike sequence learning via backpropagation for deep spiking neural networks. Advances in Neural Information Processing Systems, 33, 12022–12033.
Zhao, D., Zeng, Y., & Li, Y. (2022). BackEISNN: A deep spiking neural network with adaptive self-feedback and balanced excitatory–inhibitory neurons. Neural Networks.
Zheng, H., Wu, Y., Deng, L., Hu, Y., & Li, G. (2021). Going deeper with directly-trained larger spiking neural networks. In Proceedings of the AAAI conference on artificial intelligence, vol. 35, no. 12 (pp. 11062–11070).
Zucker, R. S., & Regehr, W. G. (2002). Short-term synaptic plasticity. Annual Review of Physiology, 64(1), 355–405.