Abstract
In this chapter, we describe the basic concepts behind the functioning of recurrent neural networks and explain the general properties that are common to several existing architectures. We introduce the basics of their training procedure, backpropagation through time, as a general way to propagate and distribute the prediction error to previous states of the network. The learning procedure consists of updating the model parameters by minimizing a suitable loss function, which includes the error achieved on the target task and, usually, one or more regularization terms. We then discuss several ways of regularizing the system, highlighting their advantages and drawbacks. Besides the standard stochastic gradient descent procedure, we also present several additional optimization strategies proposed in the literature for updating the network weights. Finally, we illustrate the vanishing gradient problem, an issue inherent to gradient-based optimization techniques that occurs in several situations when training neural networks. We conclude by discussing the most recent and successful approaches proposed in the literature to limit the vanishing of the gradients.
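The training procedure outlined in the abstract can be summarized in a short sketch. The following example is illustrative and not taken from the chapter; it assumes PyTorch and arbitrary placeholder sizes, data, and hyperparameters. It trains a small recurrent network with backpropagation through time, minimizes a loss combining the task error with an L2 regularization term, updates the weights with plain stochastic gradient descent, and applies gradient clipping as one common way to keep gradient magnitudes under control.

```python
# Minimal sketch of the training loop described in the abstract.
# All dimensions, data, and hyperparameters below are illustrative assumptions.
import torch
import torch.nn as nn

torch.manual_seed(0)

input_size, hidden_size, output_size = 4, 16, 1
seq_len, batch_size = 20, 8
l2_lambda, learning_rate = 1e-4, 1e-2

rnn = nn.RNN(input_size, hidden_size, batch_first=True)
readout = nn.Linear(hidden_size, output_size)
params = list(rnn.parameters()) + list(readout.parameters())
optimizer = torch.optim.SGD(params, lr=learning_rate)
criterion = nn.MSELoss()

# Synthetic data standing in for a real sequence-prediction task.
x = torch.randn(batch_size, seq_len, input_size)
y = torch.randn(batch_size, output_size)

for step in range(100):
    optimizer.zero_grad()
    states, _ = rnn(x)                    # hidden states for every time step
    prediction = readout(states[:, -1])   # read out from the last hidden state
    task_loss = criterion(prediction, y)  # error on the target task
    reg_loss = sum(p.pow(2).sum() for p in params)  # L2 regularization term
    loss = task_loss + l2_lambda * reg_loss
    loss.backward()                       # backpropagation through time
    # Clip the gradient norm to limit the effect of exploding gradients.
    torch.nn.utils.clip_grad_norm_(params, max_norm=1.0)
    optimizer.step()                      # stochastic gradient descent update
```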
Copyright information
© 2017 The Author(s)
About this chapter
Cite this chapter
Bianchi, F.M., Maiorino, E., Kampffmeyer, M.C., Rizzi, A., Jenssen, R. (2017). Properties and Training in Recurrent Neural Networks. In: Recurrent Neural Networks for Short-Term Load Forecasting. SpringerBriefs in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-319-70338-1_2
DOI: https://doi.org/10.1007/978-3-319-70338-1_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-70337-4
Online ISBN: 978-3-319-70338-1