Variational Dropout Sparsifies Deep Neural Networks

Dmitry Molchanov; Arsenii Ashukha; Dmitry Vetrov

Proceedings of the 34th International Conference on Machine Learning, PMLR 70:2498-2507, 2017.

Abstract

We explore a recently proposed Variational Dropout technique that provided an elegant Bayesian interpretation of Gaussian Dropout. We extend Variational Dropout to the case when dropout rates are unbounded, propose a way to reduce the variance of the gradient estimator, and report the first experimental results with individual dropout rates per weight. Interestingly, it leads to extremely sparse solutions in both fully-connected and convolutional layers. This effect is similar to the automatic relevance determination effect in empirical Bayes but has a number of advantages. We reduce the number of parameters by up to 280 times on LeNet architectures and by up to 68 times on VGG-like networks with a negligible decrease in accuracy.

Cite this Paper

BibTeX


@InProceedings{pmlr-v70-molchanov17a,
  title     = {Variational Dropout Sparsifies Deep Neural Networks},
  author    = {Dmitry Molchanov and Arsenii Ashukha and Dmitry Vetrov},
  booktitle = {Proceedings of the 34th International Conference on Machine Learning},
  pages     = {2498--2507},
  year      = {2017},
  editor    = {Precup, Doina and Teh, Yee Whye},
  volume    = {70},
  series    = {Proceedings of Machine Learning Research},
  month     = {06--11 Aug},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v70/molchanov17a/molchanov17a.pdf},
  url       = {https://proceedings.mlr.press/v70/molchanov17a.html},
  abstract  = {We explore a recently proposed Variational Dropout technique that provided an elegant Bayesian interpretation to Gaussian Dropout. We extend Variational Dropout to the case when dropout rates are unbounded, propose a way to reduce the variance of the gradient estimator and report first experimental results with individual dropout rates per weight. Interestingly, it leads to extremely sparse solutions both in fully-connected and convolutional layers. This effect is similar to automatic relevance determination effect in empirical Bayes but has a number of advantages. We reduce the number of parameters up to 280 times on LeNet architectures and up to 68 times on VGG-like networks with a negligible decrease of accuracy.}
}

Endnote

%0 Conference Paper
%T Variational Dropout Sparsifies Deep Neural Networks
%A Dmitry Molchanov
%A Arsenii Ashukha
%A Dmitry Vetrov
%B Proceedings of the 34th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2017
%E Doina Precup
%E Yee Whye Teh
%F pmlr-v70-molchanov17a
%I PMLR
%P 2498--2507
%U https://proceedings.mlr.press/v70/molchanov17a.html
%V 70
%X We explore a recently proposed Variational Dropout technique that provided an elegant Bayesian interpretation to Gaussian Dropout. We extend Variational Dropout to the case when dropout rates are unbounded, propose a way to reduce the variance of the gradient estimator and report first experimental results with individual dropout rates per weight. Interestingly, it leads to extremely sparse solutions both in fully-connected and convolutional layers. This effect is similar to automatic relevance determination effect in empirical Bayes but has a number of advantages. We reduce the number of parameters up to 280 times on LeNet architectures and up to 68 times on VGG-like networks with a negligible decrease of accuracy.

APA


Molchanov, D., Ashukha, A. & Vetrov, D. (2017). Variational Dropout Sparsifies Deep Neural Networks. Proceedings of the 34th International Conference on Machine Learning, in Proceedings of Machine Learning Research 70:2498-2507. Available from https://proceedings.mlr.press/v70/molchanov17a.html.
