SGSAFormer: Spike Gated Self-Attention Transformer and Temporal Attention
Abstract
1. Introduction
- We propose a Temporal Attention (TA) module for SNNs that offers low latency and low power consumption during both training and inference.
- We bring the gating mechanism into the SNN domain by proposing the Spike Gated Linear Unit (SGLU) and Spike Gated Self-Attention (SGSA).
- Building on SGLU and SGSA, we propose SGSAFormer, an improvement on Spikformer that combines the gating mechanism with the spiking Transformer to improve the network's control over information flow. We validate it on the CIFAR10-DVS, DVS128-Gesture, and N-Caltech101 neuromorphic datasets, where it reaches 85.0%, 99.0%, and 83.9% top-1 accuracy, respectively.
2. Related Works
2.1. Learning Methods
2.2. Gating Mechanism
2.3. Attention Spiking Neural Networks
3. Method
3.1. Leaky Integrate-and-Fire Neuron Model
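As a point of reference for this section, a minimal sketch of one common discrete-time LIF update (leaky integration, threshold firing, hard reset) is given here. The function name `lif_simulate`, the parameter names, and the default values (`tau = 2.0`, `v_threshold = 1.0`, `v_reset = 0.0`) are illustrative assumptions, not values taken from this paper, and the paper's own formulation may differ.

```python
def lif_simulate(inputs, tau=2.0, v_threshold=1.0, v_reset=0.0):
    """Simulate a single discrete-time LIF neuron over a sequence of inputs.

    All parameter names and defaults are hypothetical, chosen for illustration.
    Returns a list of 0/1 spikes, one per time step.
    """
    v = v_reset  # membrane potential starts at the reset value
    spikes = []
    for x in inputs:
        # Leaky integration: the potential moves toward the input,
        # and decays back toward v_reset at a rate set by tau.
        v = v + (x - (v - v_reset)) / tau
        # Fire when the membrane potential reaches the threshold.
        spike = 1 if v >= v_threshold else 0
        spikes.append(spike)
        # Hard reset: the potential returns to v_reset after a spike.
        if spike:
            v = v_reset
    return spikes


if __name__ == "__main__":
    print(lif_simulate([1.5, 1.5, 0.0, 3.0, 0.2]))
```

With these assumed defaults, the first input of 1.5 only charges the neuron to 0.75, so it fires on the second step rather than the first; a surrogate gradient would be needed to train through the non-differentiable threshold, which this sketch does not cover.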
3.2. Temporal Attention
3.3. Spike Gated Linear Unit
3.4. Spike Gated Self-Attention
4. Results
4.1. Datasets
4.2. Temporal Attention Experimental Result
4.3. SGSAFormer Experimental Result
4.4. Ablation Experiment Result
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Maass, W. Networks of spiking neurons: The third generation of neural network models. Neural Netw. 1997, 10, 1659–1671.
- Merolla, P.A.; Arthur, J.V.; Alvarez-Icaza, R.; Cassidy, A.S.; Sawada, J.; Akopyan, F.; Jackson, B.L.; Imam, N.; Guo, C.; Nakamura, Y.; et al. A million spiking-neuron integrated circuit with a scalable communication network and interface. Science 2014, 345, 668–673.
- Davies, M.; Srinivasa, N.; Lin, T.H.; Chinya, G.; Cao, Y.; Choday, S.H.; Dimou, G.; Joshi, P.; Imam, N.; Jain, S.; et al. Loihi: A neuromorphic manycore processor with on-chip learning. IEEE Micro 2018, 38, 82–99.
- Pei, J.; Deng, L.; Song, S.; Zhao, M.; Zhang, Y.; Wu, S.; Wang, G.; Zou, Z.; Wu, Z.; He, W.; et al. Towards artificial general intelligence with hybrid Tianjic chip architecture. Nature 2019, 572, 106–111.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. arXiv 2017, arXiv:1706.03762.
- Brown, T.B. Language models are few-shot learners. arXiv 2020, arXiv:2005.14165.
- Devlin, J. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805.
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16 × 16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
- Perrett, T.; Masullo, A.; Burghardt, T.; Mirmehdi, M.; Damen, D. Temporal-relational crosstransformers for few-shot action recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 475–484.
- He, X.; Zhou, Y.; Zhao, J.; Zhang, D.; Yao, R.; Xue, Y. Swin transformer embedding UNet for remote sensing image semantic segmentation. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–15.
- Zhou, Z.; Zhu, Y.; He, C.; Wang, Y.; Yan, S.; Tian, Y.; Yuan, L. Spikformer: When spiking neural network meets transformer. arXiv 2022, arXiv:2209.15425.
- Diehl, P.U.; Neil, D.; Binas, J.; Cook, M.; Liu, S.C.; Pfeiffer, M. Fast-classifying, high-accuracy spiking deep networks through weight and threshold balancing. In Proceedings of the 2015 International Joint Conference On Neural Networks (IJCNN), Killarney, Ireland, 12–17 July 2015; IEEE: Piscataway Township, NJ, USA, 2015; pp. 1–8.
- Wang, Y.; Zhang, M.; Chen, Y.; Qu, H. Signed Neuron with Memory: Towards Simple, Accurate and High-Efficient ANN-SNN Conversion. In Proceedings of the IJCAI, Vienna, Austria, 23–29 July 2022; pp. 2501–2508.
- Diehl, P.U.; Cook, M. Unsupervised learning of digit recognition using spike-timing-dependent plasticity. Front. Comput. Neurosci. 2015, 9, 99.
- Kheradpisheh, S.R.; Ganjtabesh, M.; Thorpe, S.J.; Masquelier, T. STDP-based spiking deep convolutional neural networks for object recognition. Neural Netw. 2018, 99, 56–67.
- Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning representations by back-propagating errors. Nature 1986, 323, 533–536.
- Kim, Y.; Panda, P. Optimizing deeper spiking neural networks for dynamic vision sensing. Neural Netw. 2021, 144, 686–698.
- Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
- Cho, K. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation. arXiv 2014, arXiv:1406.1078.
- Dauphin, Y.N.; Fan, A.; Auli, M.; Grangier, D. Language modeling with gated convolutional networks. In Proceedings of the International Conference on Machine Learning, PMLR, Sydney, Australia, 6–11 August 2017; pp. 933–941.
- Qiu, X.; Zhu, R.J.; Chou, Y.; Wang, Z.; Deng, L.j.; Li, G. Gated attention coding for training high-performance and efficient spiking neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 20–27 February 2024; Volume 38, pp. 601–610.
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 10012–10022.
- Yao, M.; Gao, H.; Zhao, G.; Wang, D.; Lin, Y.; Yang, Z.; Li, G. Temporal-wise attention spiking neural networks for event streams classification. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 10221–10230.
- Yao, M.; Zhao, G.; Zhang, H.; Hu, Y.; Deng, L.; Tian, Y.; Xu, B.; Li, G. Attention spiking neural networks. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 9393–9410.
- Zhou, C.; Yu, L.; Zhou, Z.; Ma, Z.; Zhang, H.; Zhou, H.; Tian, Y. Spikingformer: Spike-driven residual learning for transformer-based spiking neural network. arXiv 2023, arXiv:2304.11954.
- Yao, M.; Hu, J.; Zhou, Z.; Yuan, L.; Tian, Y.; Xu, B.; Li, G. Spike-driven transformer. arXiv 2024, arXiv:2307.01694.
- Zhou, C.; Zhang, H.; Zhou, Z.; Yu, L.; Huang, L.; Fan, X.; Yuan, L.; Ma, Z.; Zhou, H.; Tian, Y. QKFormer: Hierarchical Spiking Transformer using QK Attention. arXiv 2024, arXiv:2403.16552.
- Izhikevich, E.M. Simple model of spiking neurons. IEEE Trans. Neural Netw. 2003, 14, 1569–1572.
- Deng, L.; Wu, Y.; Hu, X.; Liang, L.; Ding, Y.; Li, G.; Zhao, G.; Li, P.; Xie, Y. Rethinking the performance comparison between SNNs and ANNs. Neural Netw. 2020, 121, 294–307.
- Fang, W.; Yu, Z.; Chen, Y.; Masquelier, T.; Huang, T.; Tian, Y. Incorporating learnable membrane time constant to enhance learning of spiking neural networks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 2661–2671.
- Jiang, C.; Zhang, Y. Klif: An optimized spiking neuron unit for tuning surrogate gradient slope and membrane potential. arXiv 2023, arXiv:2302.09238.
- Li, H.; Liu, H.; Ji, X.; Li, G.; Shi, L. Cifar10-dvs: An event-stream dataset for object classification. Front. Neurosci. 2017, 11, 309.
- Amir, A.; Taba, B.; Berg, D.; Melano, T.; McKinstry, J.; Di Nolfo, C.; Nayak, T.; Andreopoulos, A.; Garreau, G.; Mendoza, M.; et al. A low power, fully event-based gesture recognition system. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7243–7252.
- Orchard, G.; Jayawant, A.; Cohen, G.K.; Thakor, N. Converting static image datasets to spiking neuromorphic datasets using saccades. Front. Neurosci. 2015, 9, 437.
- Fang, W.; Chen, Y.; Ding, J.; Yu, Z.; Masquelier, T.; Chen, D.; Huang, L.; Zhou, H.; Li, G.; Tian, Y. Spikingjelly: An open-source machine learning infrastructure platform for spike-based intelligence. Sci. Adv. 2023, 9, eadi1480.
- Horowitz, M. 1.1 computing’s energy problem (and what we can do about it). In Proceedings of the 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), San Francisco, CA, USA, 9–13 February 2014; IEEE: Piscataway Township, NJ, USA, 2014; pp. 10–14.
- Bi, Y.; Chadha, A.; Abbas, A.; Bourtsoulatze, E.; Andreopoulos, Y. Graph-based spatio-temporal feature learning for neuromorphic vision sensing. IEEE Trans. Image Process. 2020, 29, 9084–9098.
- Gao, S.; Guo, G.; Huang, H.; Cheng, X.; Chen, C.P. An end-to-end broad learning system for event-based object classification. IEEE Access 2020, 8, 45974–45984.
- Wu, Z.; Zhang, H.; Lin, Y.; Li, G.; Wang, M.; Tang, Y. Liaf-net: Leaky integrate and analog fire network for lightweight and efficient spatiotemporal information processing. IEEE Trans. Neural Netw. Learn. Syst. 2021, 33, 6249–6262.
- Deng, Y.; Chen, H.; Li, Y. MVF-Net: A multi-view fusion network for event-based object classification. IEEE Trans. Circuits Syst. Video Technol. 2021, 32, 8275–8284.
- Zheng, H.; Wu, Y.; Deng, L.; Hu, Y.; Li, G. Going deeper with directly-trained larger spiking neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021; Volume 35, pp. 11062–11070.
- Fang, W.; Yu, Z.; Chen, Y.; Huang, T.; Masquelier, T.; Tian, Y. Deep residual learning in spiking neural networks. Adv. Neural Inf. Process. Syst. 2021, 34, 21056–21069.
- Wang, Y.; Shi, K.; Lu, C.; Liu, Y.; Zhang, M.; Qu, H. Spatial-Temporal Self-Attention for Asynchronous Spiking Neural Networks. In Proceedings of the IJCAI, Macao, China, 19–25 August 2023; pp. 3085–3093.
- Zhang, H.; Zhou, C.; Yu, L.; Huang, L.; Ma, Z.; Fan, X.; Zhou, H.; Tian, Y. SGLFormer: Spiking Global-Local-Fusion Transformer with high performance. Front. Neurosci. 2024, 18, 1371290.
- Kaiser, J.; Mostafa, H.; Neftci, E. Synaptic plasticity dynamics for deep continuous local learning (DECOLLE). Front. Neurosci. 2020, 14, 424.
- Jiang, B.; Li, Z.; Asif, M.S.; Cao, X.; Ma, Z. Event transformer. arXiv 2022, arXiv:2204.05172.
- Wu, X.; Song, Y.; Zhou, Y.; Jiang, Y.; Bai, Y.; Li, X.; Yang, X. STCA-SNN: Self-attention-based temporal-channel joint attention for spiking neural networks. Front. Neurosci. 2023, 17, 1261543.
- Zhu, R.J.; Zhang, M.; Zhao, Q.; Deng, H.; Duan, Y.; Deng, L.J. Tcja-snn: Temporal-channel joint attention for spiking neural networks. IEEE Trans. Neural Netw. Learn. Syst. 2024.
Model | Architecture | T | Top-1 Acc (%) | Power (mJ) |
---|---|---|---|---|
SCNN | 7-layer SCNN | 16 | 94.79 | 1.272 |
SCNN | 7-layer SCNN | 5 | 90.32 | 0.486 |
SCNN-TA | 7-layer SCNN | 16 | 94.1 | 0.601 |
Spikformer | Spikformer-2-256 | 16 | 98.3 | 3.055 |
Spikformer | Spikformer-2-256 | 5 | 94.79 | 0.954 |
Spikformer-TA | Spikformer-2-256 | 16 | 96.53 | 1.253 |
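
To make the trade-off in the table above easier to read, the following sketch computes the relative saving in the Power (mJ) column when TA is added at T = 16, for each architecture. The numbers are copied from the table; the helper name `relative_saving` and the dictionary layout are illustrative assumptions.

```python
# Values from the "Power (mJ)" column at T = 16:
# (baseline without TA, same architecture with TA).
power_mj = {
    "7-layer SCNN": (1.272, 0.601),
    "Spikformer-2-256": (3.055, 1.253),
}


def relative_saving(baseline, reduced):
    """Return the fraction of the baseline value that is saved."""
    return (baseline - reduced) / baseline


if __name__ == "__main__":
    for architecture, (base, with_ta) in power_mj.items():
        saving = 100 * relative_saving(base, with_ta)
        print(f"{architecture}: {saving:.1f}% lower with TA")
```

Under these table values the saving is roughly 52.8% for the 7-layer SCNN and 59.0% for Spikformer-2-256, at a cost of 0.69 and 1.77 points of top-1 accuracy, respectively.
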
Dataset | Model | Method | Architecture | T | Top-1 Acc (%) |
---|---|---|---|---|---|
CIFAR10-DVS | RG-CNNs [37] | ANN | - | - | 54.0 |
CIFAR10-DVS | MLS [38] | ANN | - | - | 58.8 |
CIFAR10-DVS | LIAF-Net [39] | ANN | 7-layer CNN | 10 | 70.4 |
CIFAR10-DVS | MVF-Net [40] | ANN | ResNet-34 | - | 76.2 |
CIFAR10-DVS | tdBN [41] | SNN | ResNet-17 | 10 | 67.8 |
CIFAR10-DVS | SEW-ResNet [42] | SNN | 9-layer SCNN | 16 | 74.4 |
CIFAR10-DVS | Spikformer [11] | SNN | Spikformer-2-256 | 16 | 80.9 |
CIFAR10-DVS | Spikingformer [25] | SNN | Spikingformer-2-256 | 16 | 81.3 |
CIFAR10-DVS | S-Transformer [26] | SNN | S-Transformer-2-256 | 16 | 80.0 |
CIFAR10-DVS | STSA [43] | SNN | STSformer-2-256 | 16 | 79.9 |
CIFAR10-DVS | SGLFormer [44] | SNN | SGLFormer-2-256 | 16 | 82.6 |
CIFAR10-DVS | This work | SNN | SGSAFormer-1-512 | 16 | 85.0 ± 0.5 |
DVS128-Gesture | LIAF-Net [39] | ANN | 7-layer CNN | 60 | 97.4 |
DVS128-Gesture | DECOLLE [45] | SNN | 8-layer SCNN | 500 | 95.5 |
DVS128-Gesture | tdBN [41] | SNN | ResNet-17 | 10 | 96.9 |
DVS128-Gesture | SEW-ResNet [42] | SNN | 9-layer SCNN | 16 | 97.9 |
DVS128-Gesture | Spikformer [11] | SNN | Spikformer-2-256 | 16 | 98.3 |
DVS128-Gesture | Spikingformer [25] | SNN | Spikingformer-2-256 | 16 | 98.3 |
DVS128-Gesture | S-Transformer [26] | SNN | S-Transformer-2-256 | 16 | 99.3 |
DVS128-Gesture | STSA [43] | SNN | STSformer-2-256 | 16 | 98.7 |
DVS128-Gesture | SGLFormer [44] | SNN | SGLFormer-2-256 | 16 | 98.6 |
DVS128-Gesture | This work | SNN | SGSAFormer-2-256 | 16 | 99.0 ± 0.03 |
N-Caltech101 | RG-CNNs [37] | ANN | - | - | 65.7 |
N-Caltech101 | MLS [38] | ANN | - | - | 72.7 |
N-Caltech101 | MVF-Net [40] | ANN | ResNet-34 | - | 87.1 |
N-Caltech101 | EventTransformer [46] | SNN | Event Transformer | 16 | 78.9 |
N-Caltech101 | STCA-SNN [47] | SNN | VGG11 | 16 | 80.88 |
N-Caltech101 | TCJA-SNN [48] | SNN | SCNN | 16 | 82.5 |
N-Caltech101 | This work | SNN | SGSAFormer-2-256 | 16 | 83.9 ± 0.2 |
Method | Architecture | Time Step | Power (mJ) |
---|---|---|---|
ANN | Transformer-8-512 | 4 | 38.34 |
Spikformer | Spikformer-8-512 | 4 | 11.58 |
SGSAFormer | SGSAFormer-8-512 | 4 | 12.68 |
Model | Architecture | Top-1 Acc (%), T = 4 | Top-1 Acc (%), T = 16 | Top-1 Acc (%), T = 20 |
---|---|---|---|---|
Spikformer | Spikformer-2-256 | 76.1 | 79.4 | 81.2 |
Spikformer-SGLU | Spikformer-SGLU-2-256 | 76.3 | 80.4 | 82.0 |
SGSAFormer | SGSAFormer-2-256 | 77.0 | 83.6 | 84.8 |
Spikformer | Spikformer-1-512 | 77.5 | 80.0 | 80.6 |
Spikformer-SGLU | Spikformer-SGLU-1-512 | 77.6 | 84.6 | 84.8 |
SGSAFormer | SGSAFormer-1-512 | 78.1 | 85.5 | 85.3 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Gao, S.; Qin, Y.; Zhu, R.; Zhao, Z.; Zhou, H.; Zhu, Z. SGSAFormer: Spike Gated Self-Attention Transformer and Temporal Attention. Electronics 2025, 14, 43. https://doi.org/10.3390/electronics14010043