DOI: 10.5555/2999792.2999852

Predicting parameters in deep learning

Published: 05 December 2013
Abstract

    We demonstrate that there is significant redundancy in the parameterization of several deep learning models. Given only a few weight values for each feature, it is possible to accurately predict the remaining values. Moreover, we show that not only can the parameter values be predicted, but many of them need not be learned at all. We train several different architectures by learning only a small number of weights and predicting the rest. In the best case, we are able to predict more than 95% of the weights of a network without any drop in accuracy.
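
    The intuition behind this claim is that learned features tend to be spatially smooth, so the unobserved entries of a weight vector can be interpolated from a few observed ones, for example with kernel ridge regression over the weights' spatial coordinates. Below is a minimal NumPy sketch of that idea; the function names and the squared-exponential kernel choice are illustrative assumptions, not taken from the paper's own code.

```python
import numpy as np

def se_kernel(A, B, lengthscale=2.0):
    # Squared-exponential kernel between coordinate sets A (n,d) and B (m,d).
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * lengthscale ** 2))

def predict_weights(coords, obs_idx, w_obs, lengthscale=2.0, ridge=1e-4):
    """Predict all n weights of one feature from a small observed subset.

    coords  : (n, d) spatial position of each weight (e.g. pixel coordinates)
    obs_idx : indices of the weights that were actually learned
    w_obs   : their learned values
    Returns the full length-n weight vector via kernel ridge regression.
    """
    K_oo = se_kernel(coords[obs_idx], coords[obs_idx], lengthscale)
    K_no = se_kernel(coords, coords[obs_idx], lengthscale)
    alpha = np.linalg.solve(K_oo + ridge * np.eye(len(obs_idx)), w_obs)
    return K_no @ alpha

# Toy check: a smooth 16x16 filter with only ~20% of its entries "learned".
ys, xs = np.mgrid[0:16, 0:16]
coords = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)
true_w = np.exp(-((ys - 8.0) ** 2 + (xs - 8.0) ** 2) / 30.0).ravel()
obs_idx = np.random.default_rng(0).choice(256, size=51, replace=False)
w_hat = predict_weights(coords, obs_idx, true_w[obs_idx])
print("relative error:", np.linalg.norm(w_hat - true_w) / np.linalg.norm(true_w))
```

    On this toy filter, the 51 observed entries suffice to reconstruct the remaining 80% to within a small relative error; the same smoothness-based redundancy is what the abstract exploits at much larger scale.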





    Published In

    NIPS'13: Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2
    December 2013
    3236 pages

    Publisher

Curran Associates Inc., Red Hook, NY, United States


    Qualifiers

    • Article


    Cited By

    • (2024) Characterizing Parameter Scaling with Quantization for Deployment of CNNs on Real-Time Systems. ACM Transactions on Embedded Computing Systems 23(3), 1-35. https://doi.org/10.1145/3654799
    • (2024) Filter pruning for convolutional neural networks in semantic image segmentation. Neural Networks 169(C), 713-732. https://doi.org/10.1016/j.neunet.2023.11.010
    • (2023) Every vote counts. Proceedings of the 32nd USENIX Conference on Security Symposium, 1721-1738. https://doi.org/10.5555/3620237.3620334
    • (2022) Data-efficient structured pruning via submodular optimization. Proceedings of the 36th International Conference on Neural Information Processing Systems, 36613-36626. https://doi.org/10.5555/3600270.3602923
    • (2022) Sparsity in continuous-depth neural networks. Proceedings of the 36th International Conference on Neural Information Processing Systems, 901-914. https://doi.org/10.5555/3600270.3600336
    • (2022) Compression of Deep Learning Models for Text: A Survey. ACM Transactions on Knowledge Discovery from Data 16(4), 1-55. https://doi.org/10.1145/3487045
    • (2022) BiRe-ID: Binary Neural Network for Efficient Person Re-ID. ACM Transactions on Multimedia Computing, Communications, and Applications 18(1s), 1-22. https://doi.org/10.1145/3473340
    • (2022) Neural Network Pruning by Recurrent Weights for Finance Market. ACM Transactions on Internet Technology 22(3), 1-23. https://doi.org/10.1145/3433547
    • (2021) Synergistically Exploiting CNN Pruning and HLS Versioning for Adaptive Inference on Multi-FPGAs at the Edge. ACM Transactions on Embedded Computing Systems 20(5s), 1-26. https://doi.org/10.1145/3476990
    • (2021) CG-GAN. Proceedings of the 29th ACM International Conference on Multimedia, 5391-5399. https://doi.org/10.1145/3474085.3475666
