Abstract
The rapid development of deep learning has led to the wide use of deep learning models in image processing, speech recognition, and target tracking. However, models keep growing larger, which makes them difficult to deploy on a stand-alone device, so they are usually deployed on distributed computing platforms. HXDSP, a high-performance digital signal processor developed by the 38th Research Institute of China Electronics Technology Group, has strong computing power and rich computing resources and is well suited to compute-intensive applications such as deep learning. A many-core virtual platform is designed on top of the HXDSP simulator, together with a parallel communication interface, MPIRIO, that provides fast communication and task synchronization between HXDSPs and lays the groundwork for deploying deep learning models. The parallel computing capability and pipeline mechanism of the virtual platform are also used to accelerate model execution. To address the problem that the traditional gradient descent algorithm has to be configured manually, a meta-learning optimization algorithm is used to adaptively fine-tune the model on the virtual platform, forming a deep learning optimization framework for the CPU/HXDSP heterogeneous system.
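To make the idea of removing manual tuning from gradient descent concrete, the sketch below adapts the step size online instead of fixing it by hand. It is not the chapter's meta-learning algorithm, which the abstract does not specify. It uses hypergradient descent as a simple stand-in, and the function names, the toy quadratic objective, and the values of `alpha` and `beta` are all assumptions chosen for illustration.

```python
def hypergradient_descent(grad_fn, theta, alpha=0.01, beta=0.001, steps=100):
    """Gradient descent whose step size alpha is itself adapted at every step.

    Illustrative stand-in only: names and default values are hypothetical.
    """
    prev_grad = [0.0] * len(theta)
    for _ in range(steps):
        grad = grad_fn(theta)
        # Grow alpha when successive gradients point the same way,
        # shrink it when they point in opposite directions.
        alpha += beta * sum(g * p for g, p in zip(grad, prev_grad))
        theta = [t - alpha * g for t, g in zip(theta, grad)]
        prev_grad = grad
    return theta, alpha


def quadratic_grad(theta):
    # Gradient of the toy objective f(x, y) = (x - 3)^2 + (y + 1)^2.
    return [2.0 * (theta[0] - 3.0), 2.0 * (theta[1] + 1.0)]


theta, alpha = hypergradient_descent(quadratic_grad, [0.0, 0.0])
print(theta, alpha)
```

On this toy problem the step size grows from its deliberately small starting value while the parameters approach the minimum at (3, -1), so the initial choice of `alpha` matters less than it would with a fixed step size.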
Supported by the Core Electronic Devices, High-end Generic Chips and Basic Software of National Science and Technology Major Projects of China under Grant No. 2012ZX01034-001-001.
Acknowledgments
This work was supported by the Core Electronic Devices, High-end Generic Chips and Basic Software of National Science and Technology Major Projects of China under Grant No. 2012ZX01034-001-001. We also thank the Anhui Province Key Laboratory of High Performance Computing at USTC in Hefei for supporting our research.
Copyright information
© 2021 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Cai, H., Ning, C., Zheng, Q. (2021). Deep Learning Optimization for Many-Core Virtual Platforms. In: Ning, L., Chau, V., Lau, F. (eds) Parallel Architectures, Algorithms and Programming. PAAP 2020. Communications in Computer and Information Science, vol 1362. Springer, Singapore. https://doi.org/10.1007/978-981-16-0010-4_3
DOI: https://doi.org/10.1007/978-981-16-0010-4_3
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-0009-8
Online ISBN: 978-981-16-0010-4
eBook Packages: Computer Science, Computer Science (R0)