Abstract
Recently, the Transformer has achieved state-of-the-art results in several research areas such as Natural Language Processing and Computer Vision. Because the Transformer has a very large number of parameters and its core module, Multi-Head Attention, has a complex structure, optimizing Multi-Head Attention for the Transformer has become a research hotspot. However, most current work focuses on either software model optimization or hardware accelerator design; one-sided optimization from algorithms or hardware alone is poorly adapted to the characteristics of Multi-Head Attention and cannot realize its full performance. To solve this problem, we propose a Software and Hardware Fusion Multi-Head Attention structure, which achieves lower inference latency with only a tiny accuracy loss compared with existing software optimization methods and hardware accelerators. We implement this design on a Xilinx ZCU102 and validate its accuracy and inference time on the CIFAR-10 dataset, obtaining accuracy within 1% of the baseline and a 15.19x speedup in inference over the baseline.
This paper is supported by the National Natural Science Foundation of China under Grant No. 61972293.
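For readers unfamiliar with the module being optimized, the sketch below shows standard multi-head attention (Vaswani et al., "Attention is all you need") in PyTorch. It illustrates only the baseline computation that such accelerators target, not the authors' fused software/hardware design, whose details are in the paper; the names d_model and num_heads are illustrative.

```python
# Minimal sketch of standard multi-head attention in PyTorch.
# NOT the paper's fused design; shown only to clarify the baseline module.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadAttention(nn.Module):
    def __init__(self, d_model: int, num_heads: int):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads = num_heads
        self.d_head = d_model // num_heads
        # One linear projection each for queries, keys, values, and output.
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_o = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        b, n, _ = x.shape
        # Project, then split into heads: (batch, heads, seq_len, d_head).
        q = self.w_q(x).view(b, n, self.num_heads, self.d_head).transpose(1, 2)
        k = self.w_k(x).view(b, n, self.num_heads, self.d_head).transpose(1, 2)
        v = self.w_v(x).view(b, n, self.num_heads, self.d_head).transpose(1, 2)
        # Scaled dot-product attention, computed independently per head.
        scores = q @ k.transpose(-2, -1) / (self.d_head ** 0.5)
        out = F.softmax(scores, dim=-1) @ v
        # Merge heads back together and apply the output projection.
        out = out.transpose(1, 2).reshape(b, n, self.num_heads * self.d_head)
        return self.w_o(out)
```

For example, MultiHeadAttention(d_model=256, num_heads=8) applied to a tensor of shape (1, 64, 256) returns a tensor of the same shape. The per-head matrix products and softmax in the forward pass are the operations that hardware accelerators for attention typically restructure.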
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Hu, W., Xu, D., Liu, F., Fan, Z. (2022). Software and Hardware Fusion Multi-Head Attention. In: Memmi, G., Yang, B., Kong, L., Zhang, T., Qiu, M. (eds) Knowledge Science, Engineering and Management. KSEM 2022. Lecture Notes in Computer Science, vol 13370. Springer, Cham. https://doi.org/10.1007/978-3-031-10989-8_51
DOI: https://doi.org/10.1007/978-3-031-10989-8_51
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-10988-1
Online ISBN: 978-3-031-10989-8
eBook Packages: Computer Science, Computer Science (R0)