
On the Initialization of Long Short-Term Memory Networks

Conference paper
Neural Information Processing (ICONIP 2019)

Abstract

Weight initialization is important for faster convergence and stable training of deep neural networks. In this paper, a robust initialization method is developed to address training instability in long short-term memory (LSTM) networks. It is based on a normalized random initialization of the network weights that aims to keep the variances of the network input and output within the same range. The method is applied to standard LSTMs for univariate time series regression and to LSTMs robust to missing values for multivariate disease progression modeling. The results show that, in all cases, the proposed initialization method outperforms state-of-the-art initialization techniques in terms of training convergence and the generalization performance of the obtained solution.
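The abstract does not give the exact scaling factors of the proposed scheme, so the sketch below only illustrates the general idea of variance-preserving random initialization for a single LSTM layer. The Glorot-style uniform bound, the i, f, g, o gate ordering, and the positive forget-gate bias are illustrative assumptions, not the paper's prescribed method.

import numpy as np

def normalized_lstm_init(input_dim, hidden_dim, rng=None):
    """Variance-preserving random initialization for one LSTM layer (sketch).

    Assumes a Glorot-style uniform bound sqrt(6 / (fan_in + fan_out)) for
    the stacked input-to-hidden and hidden-to-hidden weights of the four
    gates; the paper's exact normalization may differ.
    """
    rng = np.random.default_rng() if rng is None else rng

    def glorot_uniform(fan_in, fan_out, shape):
        limit = np.sqrt(6.0 / (fan_in + fan_out))
        return rng.uniform(-limit, limit, size=shape)

    # Input-to-hidden weights, stacked for the input, forget, cell, and
    # output gates (assumed i, f, g, o ordering).
    W = glorot_uniform(input_dim, hidden_dim, (4 * hidden_dim, input_dim))
    # Hidden-to-hidden (recurrent) weights, same stacking.
    U = glorot_uniform(hidden_dim, hidden_dim, (4 * hidden_dim, hidden_dim))
    # Zero biases, with the common positive forget-gate bias as an example.
    b = np.zeros(4 * hidden_dim)
    b[hidden_dim:2 * hidden_dim] = 1.0
    return W, U, b

W, U, b = normalized_lstm_init(input_dim=8, hidden_dim=32)
print(W.shape, U.shape, b.shape)  # (128, 8) (128, 32) (128,)

With fan-in/fan-out scaling of this kind, the variance of each gate's pre-activations at initialization stays in roughly the same range as that of the inputs, which is the kind of property the abstract describes.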

Acknowledgments

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 721820.

Author information

Correspondence to Mostafa Mehdipour Ghazi.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Mehdipour Ghazi, M. et al. (2019). On the Initialization of Long Short-Term Memory Networks. In: Gedeon, T., Wong, K., Lee, M. (eds) Neural Information Processing. ICONIP 2019. Lecture Notes in Computer Science, vol 11953. Springer, Cham. https://doi.org/10.1007/978-3-030-36708-4_23

  • DOI: https://doi.org/10.1007/978-3-030-36708-4_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-36707-7

  • Online ISBN: 978-3-030-36708-4

  • eBook Packages: Computer Science, Computer Science (R0)
