Abstract
Long short-term memory (LSTM) is an important model for sequential data processing. However, the large amount of matrix computation in the LSTM unit severely slows down training as models grow larger and deeper and more data become available. In this work, we propose an efficient distributed duration-aware LSTM (D-LSTM) for large-scale sequential data analysis. We improve LSTM's training performance from two aspects. First, the duration of each sequence item is exploited to design a computationally efficient cell, called the duration-aware LSTM (D-LSTM) unit. With an additional mask gate, the D-LSTM cell perceives the duration of a sequence item and adopts an adaptive memory update accordingly. Second, on the basis of the D-LSTM unit, a novel distributed training algorithm is proposed, in which the D-LSTM network is divided logically and multiple distributed neurons are introduced to perform the simpler linear calculations concurrently. Unlike the physical division in model parallelism, this logical split based on hidden neurons greatly reduces communication overhead, which is a major bottleneck in distributed training. We evaluate the effectiveness of the proposed method on two video datasets. The experimental results show that our distributed D-LSTM greatly reduces the training time and improves training efficiency for large-scale sequence analysis.
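The D-LSTM equations are not reproduced on this page, so the following is only a minimal NumPy sketch of the idea stated in the abstract: a standard LSTM step augmented with a hypothetical mask gate driven by the duration d_t of the current sequence item, which interpolates between keeping the previous memory and performing the full update. The class and parameter names (DurationAwareLSTMCell, w_d, b_d) are illustrative assumptions, not the authors' implementation.

```python
# Minimal NumPy sketch of a duration-aware LSTM step (illustrative assumption,
# not the authors' exact D-LSTM formulation).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class DurationAwareLSTMCell:
    """Standard LSTM cell plus a hypothetical duration-driven mask gate."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        z_dim = input_dim + hidden_dim
        # One weight matrix and bias per gate: input (i), forget (f),
        # output (o) and candidate memory (c).
        self.W = {k: 0.1 * rng.standard_normal((hidden_dim, z_dim)) for k in "ifoc"}
        self.b = {k: np.zeros(hidden_dim) for k in "ifoc"}
        # Parameters of the assumed mask gate, a scalar function of duration.
        self.w_d, self.b_d = 1.0, 0.0

    def step(self, x_t, d_t, h_prev, c_prev):
        z = np.concatenate([x_t, h_prev])
        i = sigmoid(self.W["i"] @ z + self.b["i"])
        f = sigmoid(self.W["f"] @ z + self.b["f"])
        o = sigmoid(self.W["o"] @ z + self.b["o"])
        c_tilde = np.tanh(self.W["c"] @ z + self.b["c"])
        c_full = f * c_prev + i * c_tilde           # ordinary LSTM memory update
        # Assumed mask gate: the longer the item lasts, the more it is allowed
        # to change the memory; short items leave the cell state mostly intact.
        g = sigmoid(self.w_d * d_t + self.b_d)
        c_new = g * c_full + (1.0 - g) * c_prev     # adaptive memory update
        h_new = o * np.tanh(c_new)
        return h_new, c_new

# Example: run five items of differing duration through the cell.
cell = DurationAwareLSTMCell(input_dim=8, hidden_dim=16)
h = c = np.zeros(16)
for x_t, d_t in zip(np.random.default_rng(1).standard_normal((5, 8)),
                    [3.0, 0.5, 1.0, 2.0, 4.0]):
    h, c = cell.step(x_t, d_t, h, c)
```

The distributed training scheme is likewise only described at a high level here. The sketch below assumes that "dividing the network logically" means partitioning the hidden neurons, i.e. the rows of each gate's weight matrix, across workers so that each worker computes its share of the linear map in parallel and only the small activation slices are exchanged; the worker pool is simulated in-process and the helper names are hypothetical.

```python
# Minimal sketch of the assumed logical split: hidden neurons (rows of a gate's
# weight matrix) are partitioned across workers, each worker computes its rows
# of W @ z, and only the resulting activation slices are gathered. The worker
# pool is simulated in-process; this is not the paper's actual system.
import numpy as np

def split_by_hidden_neurons(W, num_workers):
    """Give each worker a contiguous block of output rows (hidden neurons)."""
    return np.array_split(W, num_workers, axis=0)

def distributed_linear(W_parts, z):
    """Each 'worker' computes its rows of W @ z; the slices are concatenated."""
    return np.concatenate([W_k @ z for W_k in W_parts])

# Example: 4 workers share a 128-unit hidden layer over a 96-dim [x_t; h_prev].
rng = np.random.default_rng(0)
W = rng.standard_normal((128, 96))
z = rng.standard_normal(96)
parts = split_by_hidden_neurons(W, num_workers=4)
assert np.allclose(distributed_linear(parts, z), W @ z)
```

Under this assumption the per-step matrix products, which dominate the cost, parallelize without replicating the full weight matrices on every worker, which is consistent with the reduced communication overhead claimed in the abstract.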
Acknowledgment
This work was partly supported by the National Natural Science Foundation of China under Grant No. 61806086 and the China Postdoctoral Science Foundation under Grant No. 2016M601737.