
Properties and Training in Recurrent Neural Networks

Chapter in: Recurrent Neural Networks for Short-Term Load Forecasting

Abstract

In this chapter, we describe the basic concepts behind the functioning of recurrent neural networks and explain the general properties that are common to several existing architectures. We introduce the basis of their training procedure, backpropagation through time, as a general way to propagate and distribute the prediction error to previous states of the network. The learning procedure consists of updating the model parameters by minimizing a suitable loss function, which includes the error achieved on the target task and, usually, one or more regularization terms. We then discuss several ways of regularizing the system, highlighting their advantages and drawbacks. Besides the standard stochastic gradient descent procedure, we also present several additional optimization strategies proposed in the literature for updating the network weights. Finally, we illustrate the vanishing gradient effect, an inherent problem of gradient-based optimization techniques that occurs in several situations when training neural networks. We conclude by discussing the most recent and successful approaches proposed in the literature to limit the vanishing of the gradients.
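
As a minimal illustration of the concepts summarized above, the following sketch trains a small vanilla RNN with backpropagation through time, minimizing a squared prediction error plus an L2 regularization term and updating the weights by plain gradient descent with a crude clipping step. The code is not taken from the chapter; the network sizes, the toy sine-prediction task, and all variable names are illustrative assumptions.

```python
# Minimal sketch (not from the chapter): a vanilla RNN trained by
# backpropagation through time (BPTT) with an L2-regularized squared-error
# loss and plain gradient descent. All names and sizes are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, T = 1, 16, 20          # input size, hidden size, sequence length

# Parameters: input-to-hidden, hidden-to-hidden, hidden-to-output weights.
W_in = rng.normal(0, 0.1, (n_hid, n_in))
W_hh = rng.normal(0, 0.1, (n_hid, n_hid))
W_out = rng.normal(0, 0.1, (1, n_hid))

def forward(x):
    """Run the RNN over a sequence x of shape (T, n_in); return states and outputs."""
    h = np.zeros((T + 1, n_hid))
    y = np.zeros(T)
    for t in range(T):
        h[t + 1] = np.tanh(W_in @ x[t] + W_hh @ h[t])
        y[t] = (W_out @ h[t + 1]).item()
    return h, y

def bptt(x, target, lam=1e-4):
    """Return the loss and its gradients, unrolling the error backwards in time."""
    h, y = forward(x)
    err = y - target
    # Loss = prediction error on the target task + L2 regularization term.
    loss = 0.5 * np.sum(err ** 2) + 0.5 * lam * sum(
        np.sum(W ** 2) for W in (W_in, W_hh, W_out))
    dW_in, dW_hh, dW_out = lam * W_in, lam * W_hh, lam * W_out
    dh_next = np.zeros(n_hid)
    for t in reversed(range(T)):
        dW_out += err[t] * h[t + 1][None, :]
        dh = err[t] * W_out.ravel() + dh_next   # error from the output and from future states
        dz = dh * (1 - h[t + 1] ** 2)           # back through the tanh nonlinearity
        dW_in += np.outer(dz, x[t])
        dW_hh += np.outer(dz, h[t])
        dh_next = W_hh.T @ dz                   # distribute the error to the previous state
    return loss, (dW_in, dW_hh, dW_out)

# Toy task: predict a sine wave one step ahead.
xs = np.sin(np.linspace(0, 4 * np.pi, T + 1))
x, target = xs[:-1].reshape(T, 1), xs[1:]
for step in range(500):
    loss, grads = bptt(x, target)
    for W, dW in zip((W_in, W_hh, W_out), grads):
        np.clip(dW, -1.0, 1.0, out=dW)          # crude guard against exploding gradients
        W -= 0.01 * dW                          # plain gradient-descent update
print(f"final loss: {loss:.4f}")
```

The repeated multiplication by W_hh.T when the error is unrolled backwards is also where the vanishing (and exploding) gradient effect discussed at the end of the chapter originates.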



Author information

Corresponding author: Filippo Maria Bianchi


Copyright information

© 2017 The Author(s)

About this chapter


Cite this chapter

Bianchi, F.M., Maiorino, E., Kampffmeyer, M.C., Rizzi, A., Jenssen, R. (2017). Properties and Training in Recurrent Neural Networks. In: Recurrent Neural Networks for Short-Term Load Forecasting. SpringerBriefs in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-319-70338-1_2


  • DOI: https://doi.org/10.1007/978-3-319-70338-1_2

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-70337-4

  • Online ISBN: 978-3-319-70338-1

  • eBook Packages: Computer Science, Computer Science (R0)
