DOI: 10.1145/3366750.3366751
Research Article

Using Distillation to Improve Network Performance after Pruning and Quantization

Published: 18 September 2019

Abstract

As the tasks they are asked to solve grow more complex, deep neural networks require ever more computing and storage resources. At the same time, researchers have found that deep neural networks contain a great deal of redundancy, which wastes resources unnecessarily, so the network model can be optimized further. Motivated by this, recent work has turned to building more compact and efficient models, so that deep neural networks can be deployed on resource-constrained nodes and make them more intelligent. The main model compression methods at present are weight pruning, weight quantization, and knowledge distillation. Each has its own characteristics: they are mutually independent and self-contained, and they can be further improved by combining them effectively. This paper constructs a deep neural network model compression framework based on weight pruning, weight quantization, and knowledge distillation. First, the model undergoes double coarse-grained compression through pruning and quantization; then the original network is used as the teacher network to guide training of the compressed student network, improving the student network's accuracy. In this way the model is further accelerated and compressed while keeping the loss of accuracy small. Experimental results show that the combination of the three algorithms reduces FLOPs by 80% with an accuracy drop of only 1%.
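To make the described pipeline concrete, the sketch below chains coarse-grained pruning, uniform weight quantization, and teacher-guided distillation in a PyTorch-style workflow. It is an illustrative outline, not the authors' implementation: the pruning ratio, bit width, temperature, and loss weights are assumed placeholder values, and the abstract does not specify the exact pruning, quantization, or distillation schemes used.

# Sketch of the three-stage compression pipeline (assumed settings).
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.nn.utils.prune as prune


def prune_model(model: nn.Module, amount: float = 0.5) -> nn.Module:
    # Coarse-grained pruning: zero out whole convolution filters by L2 norm.
    # (Filters are only zeroed here; realizing the FLOP savings would also
    # require rebuilding the network without the pruned filters.)
    for module in model.modules():
        if isinstance(module, nn.Conv2d):
            prune.ln_structured(module, name="weight", amount=amount, n=2, dim=0)
            prune.remove(module, "weight")  # bake the mask into the weights
    return model


def quantize_weights(model: nn.Module, num_bits: int = 8) -> nn.Module:
    # Simulated uniform symmetric quantization of conv/linear weights.
    qmax = 2 ** (num_bits - 1) - 1
    with torch.no_grad():
        for module in model.modules():
            if isinstance(module, (nn.Conv2d, nn.Linear)):
                w = module.weight
                scale = w.abs().max().clamp(min=1e-8) / qmax
                module.weight.copy_(torch.round(w / scale).clamp(-qmax, qmax) * scale)
    return model


def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    # Hinton-style distillation: soften the teacher's logits with temperature T
    # and mix the KL term with the ordinary hard-label cross-entropy loss.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

In a fine-tuning loop, the frozen original network would supply teacher_logits on each batch while the pruned-and-quantized student is updated with distillation_loss; in practice the quantization step would be re-applied after updates or made quantization-aware so the weights stay on the quantization grid.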

    Published In

    MLMI '19: Proceedings of the 2019 2nd International Conference on Machine Learning and Machine Intelligence
    September 2019
    76 pages
ISBN: 9781450372480
DOI: 10.1145/3366750

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 18 September 2019

    Author Tags

    1. CNN
    2. Knowledge distillation
    3. Model compression
    4. Pruning
    5. Quantization
