A theoretically grounded application of dropout in recurrent neural networks

Published: 05 December 2016

Abstract

Recurrent neural networks (RNNs) stand at the forefront of many recent developments in deep learning. Yet a major difficulty with these models is their tendency to overfit, with dropout shown to fail when applied to recurrent layers. Recent results at the intersection of Bayesian modelling and deep learning offer a Bayesian interpretation of common deep learning techniques such as dropout. This grounding of dropout in approximate Bayesian inference suggests an extension of the theoretical results, offering insights into the use of dropout with RNN models. We apply this new variational-inference-based dropout technique in LSTM and GRU models, assessing it on language modelling and sentiment analysis tasks. The new approach outperforms existing techniques, and to the best of our knowledge improves on the single-model state of the art in language modelling with the Penn Treebank (73.4 test perplexity). This extends our arsenal of variational tools in deep learning.
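
The variational dropout scheme described in this paper ties the dropout noise across time: rather than sampling a fresh dropout mask at every time step, one mask for the inputs and one for the recurrent state are sampled once per sequence and reused at each step (the same idea also applies to the embedding layer). The sketch below is a minimal, illustrative NumPy forward pass based on that reading; the names (variational_lstm_forward, dropout_mask, keep_prob) and the toy shapes are illustrative placeholders, not the authors' implementation, which trains full LSTM/GRU language and sentiment models rather than a single untrained layer.

```python
import numpy as np

rng = np.random.default_rng(0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def dropout_mask(size, keep_prob, rng):
    # Inverted dropout: Bernoulli(keep_prob) mask rescaled by 1/keep_prob.
    return rng.binomial(1, keep_prob, size=size) / keep_prob


def variational_lstm_forward(x_seq, params, keep_prob=0.75, rng=rng):
    # Variational dropout: the SAME masks zx (inputs) and zh (recurrent
    # state) are sampled once per sequence and reused at every time step,
    # instead of being resampled at each step as in naive dropout.
    W, U, b = params                    # W: (D, 4H), U: (H, 4H), b: (4H,)
    hidden = U.shape[0]
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    zx = dropout_mask(x_seq.shape[1], keep_prob, rng)
    zh = dropout_mask(hidden, keep_prob, rng)
    outputs = []
    for x_t in x_seq:
        gates = (x_t * zx) @ W + (h * zh) @ U + b
        i, f, o, g = np.split(gates, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        c = f * c + i * np.tanh(g)
        h = o * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs)


# Toy usage: 10 time steps, 8 input features, 16 hidden units.
T, D, H = 10, 8, 16
params = (0.1 * rng.standard_normal((D, 4 * H)),
          0.1 * rng.standard_normal((H, 4 * H)),
          np.zeros(4 * H))
x_seq = rng.standard_normal((T, D))
print(variational_lstm_forward(x_seq, params).shape)  # -> (10, 16)
```

Resampling zx and zh inside the time loop would recover the naive per-step dropout that the abstract notes fails for recurrent layers; keeping them fixed over the sequence is what corresponds to the variational interpretation.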

Published In

NIPS'16: Proceedings of the 30th International Conference on Neural Information Processing Systems
December 2016, 5100 pages

Publisher

Curran Associates Inc., Red Hook, NY, United States
