
Learning deep hierarchical and temporal recurrent neural networks with residual learning

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

Learning both hierarchical and temporal dependencies can be crucial for recurrent neural networks (RNNs) to deeply understand sequences. To this end, a unified RNN framework is required that eases the learning of both deep hierarchical and temporal structures by allowing gradients to propagate back from both ends without vanishing. Residual learning (RL) has emerged as an effective and inexpensive way to facilitate the backward propagation of gradients. However, the significance of RL has so far been shown separately for learning deep hierarchical representations and for learning temporal dependencies, and little effort has been made to unify these findings in a single framework for learning deep RNNs. In this study, we aim to show that approximating identity mappings is crucial for optimizing both hierarchical and temporal structures. We propose a framework, called hierarchical and temporal residual RNNs, that learns RNNs by approximating identity mappings across hierarchical and temporal structures. To validate the proposed method, we explore the efficacy of shortcut connections for training deep RNN structures on sequence learning problems. Experiments on the Penn Treebank, Hutter Prize and IAM-OnDB datasets demonstrate the utility of the framework in terms of accuracy and computational complexity. We show that, even for large datasets, spending parameters on network depth rather than on the size of the RNN "state" can yield computational benefits.
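As a rough illustration of the idea (not the authors' exact formulation), the sketch below adds identity shortcut connections to a stacked recurrent cell along both the depth (hierarchical) and time (temporal) directions. The plain tanh recurrence, the shared layer width, and all names are illustrative assumptions.

```python
import numpy as np

def residual_rnn_step(x, h_prev, Wx, Wh, b):
    """One time step of a stacked RNN with identity shortcuts.

    Hierarchical (depth) shortcut: each layer adds its input to its output.
    Temporal shortcut: each layer adds its previous hidden state to its output.
    Assumes the input and all layers share the same width, so the identity
    mappings need no projection.
    """
    h_new = []
    inp = x
    for l in range(len(h_prev)):
        # Candidate update from the usual recurrent transformation.
        h_tilde = np.tanh(inp @ Wx[l] + h_prev[l] @ Wh[l] + b[l])
        # Add the temporal shortcut (h_prev[l]) and the hierarchical shortcut (inp).
        h = h_tilde + h_prev[l] + inp
        h_new.append(h)
        inp = h  # the output of this layer feeds the next layer up
    return h_new

# Toy usage: 3 layers of width 8, run over a random 5-step sequence.
rng = np.random.default_rng(0)
num_layers, width = 3, 8
Wx = [rng.normal(scale=0.1, size=(width, width)) for _ in range(num_layers)]
Wh = [rng.normal(scale=0.1, size=(width, width)) for _ in range(num_layers)]
b = [np.zeros(width) for _ in range(num_layers)]
h = [np.zeros(width) for _ in range(num_layers)]
for x in rng.normal(size=(5, width)):
    h = residual_rnn_step(x, h, Wx, Wh, b)
```

Because the shortcut terms are identity mappings, the gradient of each hidden state with respect to earlier time steps and lower layers contains an additive path that bypasses the tanh nonlinearity, which is the mechanism the abstract appeals to for keeping gradients from vanishing along both axes.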


Notes

  1. These notations are used consistently throughout the paper unless specified otherwise.

  2. https://www.fit.vutbr.cz/imikolov/rnnlm/simple-examples.tgz.

  3. Notations are used.

Author information

Corresponding author

Correspondence to Tehseen Zia.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zia, T., Abbas, A., Habib, U. et al. Learning deep hierarchical and temporal recurrent neural networks with residual learning. Int. J. Mach. Learn. & Cyber. 11, 873–882 (2020). https://doi.org/10.1007/s13042-020-01063-0

  • DOI: https://doi.org/10.1007/s13042-020-01063-0

Keywords