DOI: 10.1145/3449301.3449332

Entropy Targets for Adaptive Distillation

Published: 09 June 2021

Abstract

This paper focuses on the problem of targets in knowledge distillation. Compared with hard targets, soft targets provide extra information that compensates for the lack of supervision signals in classification problems, but they still have defects, such as the chaos introduced by high entropy. We address this problem by controlling the information entropy, which lets the student network adapt to the targets. After introducing the concepts of the system and interference labels, we propose an entropy transformation that reduces the information entropy of the system by means of the interference labels while maintaining the supervision signal. Through entropy analysis and entropy transformation, entropy targets are generated from soft targets and added to the loss function. Owing to the decrease in entropy, the student network can better learn inter-class similarity from this adaptive knowledge and can potentially lower the risk of over-fitting. Our experiments on the MNIST and DISTRACT datasets demonstrate the benefits of entropy targets over soft targets.
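As a rough illustration of the approach described above, the sketch below (in PyTorch, which is an assumption; the abstract does not specify a framework) takes temperature-scaled soft targets from a teacher, lowers their information entropy by suppressing low-probability "interference labels" and renormalizing, and combines the resulting entropy targets with hard-label cross-entropy in the student's loss. The thresholding rule and the names and values of tau, T, and alpha are illustrative assumptions, not the paper's exact entropy transformation.

import torch
import torch.nn.functional as F

def soft_targets(teacher_logits, T=4.0):
    # Temperature-scaled soft targets from the teacher (standard distillation).
    return F.softmax(teacher_logits / T, dim=1)

def entropy(p, eps=1e-12):
    # Shannon entropy of each row of a batch of probability vectors.
    return -(p * (p + eps).log()).sum(dim=1)

def entropy_targets(p, tau=0.05):
    # Reduce the entropy of soft targets by zeroing low-probability
    # "interference labels" (classes below tau) and renormalizing.
    # The thresholding is an illustrative assumption, not necessarily
    # the paper's exact entropy transformation.
    q = torch.where(p >= tau, p, torch.zeros_like(p))
    return q / q.sum(dim=1, keepdim=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # Hard-label cross-entropy plus KL divergence to the entropy targets,
    # with the usual T^2 scaling of the distillation term.
    targets = entropy_targets(soft_targets(teacher_logits, T))
    log_q = F.log_softmax(student_logits / T, dim=1)
    kd = F.kl_div(log_q, targets, reduction="batchmean") * (T * T)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * ce + (1.0 - alpha) * kd

if __name__ == "__main__":
    torch.manual_seed(0)
    teacher_logits = 3.0 * torch.randn(8, 10)   # fake teacher logits, 10 classes
    student_logits = torch.randn(8, 10)
    labels = torch.randint(0, 10, (8,))
    p = soft_targets(teacher_logits)
    print("mean entropy of soft targets:   ", entropy(p).mean().item())
    print("mean entropy of entropy targets:", entropy(entropy_targets(p)).mean().item())
    print("loss:", distillation_loss(student_logits, teacher_logits, labels).item())

Running the sketch prints a lower mean entropy for the entropy targets than for the raw soft targets, which is the property the method relies on: under this assumed transformation, the dominant inter-class similarities are kept while the high-entropy contribution of near-zero classes is removed.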


Cited By

  • (2023) F-ALBERT: A Distilled Model from a Two-Time Distillation System for Reduced Computational Complexity in ALBERT Model. Applied Sciences 13(17), 9530. DOI: 10.3390/app13179530. Online publication date: 23-Aug-2023.

      Published In

      ICRAI '20: Proceedings of the 6th International Conference on Robotics and Artificial Intelligence
      November 2020
      288 pages
      ISBN:9781450388597
      DOI:10.1145/3449301
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 09 June 2021

      Author Tags

      1. CNN
      2. Knowledge distillation
      3. entropy targets
      4. information entropy

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

      ICRAI 2020
