
On the Initialization of Long Short-Term Memory Networks

Conference paper
Neural Information Processing (ICONIP 2019)

Abstract

Weight initialization is important for faster convergence and stable training of deep neural networks. In this paper, a robust initialization method is developed to address training instability in long short-term memory (LSTM) networks. It is based on a normalized random initialization of the network weights that aims to keep the variances of the network input and output within the same range. The method is applied to standard LSTMs for univariate time series regression and to LSTMs robust to missing values for multivariate disease progression modeling. The results show that, in all cases, the proposed initialization method outperforms state-of-the-art initialization techniques in terms of training convergence and the generalization performance of the obtained solution.
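The abstract does not give the exact scaling factors of the proposed scheme, so the sketch below only illustrates the general idea of variance-preserving random initialization for a single LSTM layer. The Glorot-style uniform bound, the i, f, g, o gate ordering, and the positive forget-gate bias are illustrative assumptions, not the paper's prescribed method.

import numpy as np

def normalized_lstm_init(input_dim, hidden_dim, rng=None):
    """Variance-preserving random initialization for one LSTM layer (sketch).

    Assumes a Glorot-style uniform bound sqrt(6 / (fan_in + fan_out)) for
    the stacked input-to-hidden and hidden-to-hidden weights of the four
    gates; the paper's exact normalization may differ.
    """
    rng = np.random.default_rng() if rng is None else rng

    def glorot_uniform(fan_in, fan_out, shape):
        limit = np.sqrt(6.0 / (fan_in + fan_out))
        return rng.uniform(-limit, limit, size=shape)

    # Input-to-hidden weights, stacked for the input, forget, cell, and
    # output gates (assumed i, f, g, o ordering).
    W = glorot_uniform(input_dim, hidden_dim, (4 * hidden_dim, input_dim))
    # Hidden-to-hidden (recurrent) weights, same stacking.
    U = glorot_uniform(hidden_dim, hidden_dim, (4 * hidden_dim, hidden_dim))
    # Zero biases, with the common positive forget-gate bias as an example.
    b = np.zeros(4 * hidden_dim)
    b[hidden_dim:2 * hidden_dim] = 1.0
    return W, U, b

W, U, b = normalized_lstm_init(input_dim=8, hidden_dim=32)
print(W.shape, U.shape, b.shape)  # (128, 8) (128, 32) (128,)

With fan-in/fan-out scaling of this kind, the variance of each gate's pre-activations at initialization stays in roughly the same range as that of the inputs, which is the kind of property the abstract describes.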

Acknowledgments

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 721820.

Author information

Correspondence to Mostafa Mehdipour Ghazi.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Mehdipour Ghazi, M. et al. (2019). On the Initialization of Long Short-Term Memory Networks. In: Gedeon, T., Wong, K., Lee, M. (eds) Neural Information Processing. ICONIP 2019. Lecture Notes in Computer Science, vol 11953. Springer, Cham. https://doi.org/10.1007/978-3-030-36708-4_23

  • DOI: https://doi.org/10.1007/978-3-030-36708-4_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-36707-7

  • Online ISBN: 978-3-030-36708-4

  • eBook Packages: Computer Science, Computer Science (R0)
