DOI: 10.5555/2999792.2999852

Predicting parameters in deep learning

Published: 05 December 2013
Abstract

    We demonstrate that there is significant redundancy in the parameterization of several deep learning models. Given only a few weight values for each feature, it is possible to accurately predict the remaining values. Moreover, we show that not only can the parameter values be predicted, but many of them need not be learned at all. We train several different architectures by learning only a small number of weights and predicting the rest. In the best case, we are able to predict more than 95% of the weights of a network without any drop in accuracy.
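
    The intuition behind this claim is that learned features tend to be spatially smooth, so the unobserved entries of a weight vector can be interpolated from a few observed ones, for example with kernel ridge regression over the weights' spatial coordinates. Below is a minimal NumPy sketch of that idea; the function names and the squared-exponential kernel choice are illustrative assumptions, not taken from the paper's own code.

```python
import numpy as np

def se_kernel(A, B, lengthscale=2.0):
    # Squared-exponential kernel between coordinate sets A (n,d) and B (m,d).
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * lengthscale ** 2))

def predict_weights(coords, obs_idx, w_obs, lengthscale=2.0, ridge=1e-4):
    """Predict all n weights of one feature from a small observed subset.

    coords  : (n, d) spatial position of each weight (e.g. pixel coordinates)
    obs_idx : indices of the weights that were actually learned
    w_obs   : their learned values
    Returns the full length-n weight vector via kernel ridge regression.
    """
    K_oo = se_kernel(coords[obs_idx], coords[obs_idx], lengthscale)
    K_no = se_kernel(coords, coords[obs_idx], lengthscale)
    alpha = np.linalg.solve(K_oo + ridge * np.eye(len(obs_idx)), w_obs)
    return K_no @ alpha

# Toy check: a smooth 16x16 filter with only ~20% of its entries "learned".
ys, xs = np.mgrid[0:16, 0:16]
coords = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)
true_w = np.exp(-((ys - 8.0) ** 2 + (xs - 8.0) ** 2) / 30.0).ravel()
obs_idx = np.random.default_rng(0).choice(256, size=51, replace=False)
w_hat = predict_weights(coords, obs_idx, true_w[obs_idx])
print("relative error:", np.linalg.norm(w_hat - true_w) / np.linalg.norm(true_w))
```

    On this toy filter, the 51 observed entries suffice to reconstruct the remaining 80% to within a small relative error; the same smoothness-based redundancy is what the abstract exploits at much larger scale.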





    Published In

    NIPS'13: Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2
    December 2013
    3236 pages

    Publisher

Curran Associates Inc., Red Hook, NY, United States


    Qualifiers

    • Article


    Cited By

    • (2024) Characterizing Parameter Scaling with Quantization for Deployment of CNNs on Real-Time Systems. ACM Transactions on Embedded Computing Systems 23(3), 1-35. https://doi.org/10.1145/3654799
    • (2024) Filter pruning for convolutional neural networks in semantic image segmentation. Neural Networks 169(C), 713-732. https://doi.org/10.1016/j.neunet.2023.11.010
    • (2023) Every vote counts. Proceedings of the 32nd USENIX Conference on Security Symposium, 1721-1738. https://doi.org/10.5555/3620237.3620334
    • (2022) Data-efficient structured pruning via submodular optimization. Proceedings of the 36th International Conference on Neural Information Processing Systems, 36613-36626. https://doi.org/10.5555/3600270.3602923
    • (2022) Sparsity in continuous-depth neural networks. Proceedings of the 36th International Conference on Neural Information Processing Systems, 901-914. https://doi.org/10.5555/3600270.3600336
    • (2022) Compression of Deep Learning Models for Text: A Survey. ACM Transactions on Knowledge Discovery from Data 16(4), 1-55. https://doi.org/10.1145/3487045
    • (2022) BiRe-ID: Binary Neural Network for Efficient Person Re-ID. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1s), 1-22. https://doi.org/10.1145/3473340
    • (2022) Neural Network Pruning by Recurrent Weights for Finance Market. ACM Transactions on Internet Technology 22(3), 1-23. https://doi.org/10.1145/3433547
    • (2021) Synergistically Exploiting CNN Pruning and HLS Versioning for Adaptive Inference on Multi-FPGAs at the Edge. ACM Transactions on Embedded Computing Systems 20(5s), 1-26. https://doi.org/10.1145/3476990
    • (2021) CG-GAN. Proceedings of the 29th ACM International Conference on Multimedia, 5391-5399. https://doi.org/10.1145/3474085.3475666
