DOI: 10.1145/3366750.3366751
Research Article

Using Distillation to Improve Network Performance after Pruning and Quantization

Published: 18 September 2019

Abstract

As the tasks they are asked to solve grow more complex, deep neural networks require ever more computing and storage resources. At the same time, researchers have found that deep neural networks contain a great deal of redundancy, which wastes resources unnecessarily, so the network model can be optimized further. Motivated by this, recent work has turned to building more compact and efficient models, so that deep neural networks can be deployed on resource-constrained nodes and make them more intelligent. The main model compression methods at present are weight pruning, weight quantization, and knowledge distillation. Each has its own characteristics: they are mutually independent and self-contained, and they can be further improved by combining them effectively. This paper constructs a deep neural network model compression framework based on weight pruning, weight quantization, and knowledge distillation. First, the model undergoes double coarse-grained compression through pruning and quantization; then the original network is used as the teacher network to guide training of the compressed student network, improving the student network's accuracy. In this way the model is further accelerated and compressed while keeping the loss of accuracy small. Experimental results show that the combination of the three algorithms reduces FLOPs by 80% with an accuracy drop of only 1%.
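To make the described pipeline concrete, the sketch below chains coarse-grained pruning, uniform weight quantization, and teacher-guided distillation in a PyTorch-style workflow. It is an illustrative outline, not the authors' implementation: the pruning ratio, bit width, temperature, and loss weights are assumed placeholder values, and the abstract does not specify the exact pruning, quantization, or distillation schemes used.

# Sketch of the three-stage compression pipeline (assumed settings).
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.nn.utils.prune as prune


def prune_model(model: nn.Module, amount: float = 0.5) -> nn.Module:
    # Coarse-grained pruning: zero out whole convolution filters by L2 norm.
    # (Filters are only zeroed here; realizing the FLOP savings would also
    # require rebuilding the network without the pruned filters.)
    for module in model.modules():
        if isinstance(module, nn.Conv2d):
            prune.ln_structured(module, name="weight", amount=amount, n=2, dim=0)
            prune.remove(module, "weight")  # bake the mask into the weights
    return model


def quantize_weights(model: nn.Module, num_bits: int = 8) -> nn.Module:
    # Simulated uniform symmetric quantization of conv/linear weights.
    qmax = 2 ** (num_bits - 1) - 1
    with torch.no_grad():
        for module in model.modules():
            if isinstance(module, (nn.Conv2d, nn.Linear)):
                w = module.weight
                scale = w.abs().max().clamp(min=1e-8) / qmax
                module.weight.copy_(torch.round(w / scale).clamp(-qmax, qmax) * scale)
    return model


def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    # Hinton-style distillation: soften the teacher's logits with temperature T
    # and mix the KL term with the ordinary hard-label cross-entropy loss.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

In a fine-tuning loop, the frozen original network would supply teacher_logits on each batch while the pruned-and-quantized student is updated with distillation_loss; in practice the quantization step would be re-applied after updates or made quantization-aware so the weights stay on the quantization grid.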

    Published In

    MLMI '19: Proceedings of the 2019 2nd International Conference on Machine Learning and Machine Intelligence
    September 2019
    76 pages
ISBN: 9781450372480
DOI: 10.1145/3366750

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 18 September 2019

    Author Tags

    1. CNN
    2. Knowledge distillation
    3. Model compression
    4. Pruning
    5. Quantization
