A theoretically grounded application of dropout in recurrent neural networks

Published: 05 December 2016

Abstract

Recurrent neural networks (RNNs) stand at the forefront of many recent developments in deep learning. Yet a major difficulty with these models is their tendency to overfit, with dropout shown to fail when applied to recurrent layers. Recent results at the intersection of Bayesian modelling and deep learning offer a Bayesian interpretation of common deep learning techniques such as dropout. This grounding of dropout in approximate Bayesian inference suggests an extension of the theoretical results, offering insights into the use of dropout with RNN models. We apply this new variational-inference-based dropout technique in LSTM and GRU models, assessing it on language modelling and sentiment analysis tasks. The new approach outperforms existing techniques, and to the best of our knowledge improves on the single-model state of the art in language modelling with the Penn Treebank (73.4 test perplexity). This extends our arsenal of variational tools in deep learning.
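
The variational dropout scheme described in this paper ties the dropout noise across time: rather than sampling a fresh dropout mask at every time step, one mask for the inputs and one for the recurrent state are sampled once per sequence and reused at each step (the same idea also applies to the embedding layer). The sketch below is a minimal, illustrative NumPy forward pass based on that reading; the names (variational_lstm_forward, dropout_mask, keep_prob) and the toy shapes are illustrative placeholders, not the authors' implementation, which trains full LSTM/GRU language and sentiment models rather than a single untrained layer.

```python
import numpy as np

rng = np.random.default_rng(0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def dropout_mask(size, keep_prob, rng):
    # Inverted dropout: Bernoulli(keep_prob) mask rescaled by 1/keep_prob.
    return rng.binomial(1, keep_prob, size=size) / keep_prob


def variational_lstm_forward(x_seq, params, keep_prob=0.75, rng=rng):
    # Variational dropout: the SAME masks zx (inputs) and zh (recurrent
    # state) are sampled once per sequence and reused at every time step,
    # instead of being resampled at each step as in naive dropout.
    W, U, b = params                    # W: (D, 4H), U: (H, 4H), b: (4H,)
    hidden = U.shape[0]
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    zx = dropout_mask(x_seq.shape[1], keep_prob, rng)
    zh = dropout_mask(hidden, keep_prob, rng)
    outputs = []
    for x_t in x_seq:
        gates = (x_t * zx) @ W + (h * zh) @ U + b
        i, f, o, g = np.split(gates, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        c = f * c + i * np.tanh(g)
        h = o * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs)


# Toy usage: 10 time steps, 8 input features, 16 hidden units.
T, D, H = 10, 8, 16
params = (0.1 * rng.standard_normal((D, 4 * H)),
          0.1 * rng.standard_normal((H, 4 * H)),
          np.zeros(4 * H))
x_seq = rng.standard_normal((T, D))
print(variational_lstm_forward(x_seq, params).shape)  # -> (10, 16)
```

Resampling zx and zh inside the time loop would recover the naive per-step dropout that the abstract notes fails for recurrent layers; keeping them fixed over the sequence is what corresponds to the variational interpretation.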

Published In

NIPS'16: Proceedings of the 30th International Conference on Neural Information Processing Systems
December 2016, 5100 pages

Publisher

Curran Associates Inc., Red Hook, NY, United States
