Knowledge Distillation Based on Transformed Teacher Matching

Zheng, Kaixiang; Yang, En-Hui

arXiv:2402.11148v2 (cs)

[Submitted on 17 Feb 2024 (v1), last revised 7 Mar 2024 (this version, v2)]

Abstract: As a technique to bridge logit matching and probability distribution matching, temperature scaling plays a pivotal role in knowledge distillation (KD). Conventionally, temperature scaling is applied to both the teacher's logits and the student's logits in KD. Motivated by some recent works, in this paper we instead drop temperature scaling on the student side and systematically study the resulting variant of KD, dubbed transformed teacher matching (TTM). By reinterpreting temperature scaling as a power transform of a probability distribution, we show that, in comparison with the original KD, TTM has an inherent Rényi entropy term in its objective function, which serves as an extra regularization term. Extensive experimental results demonstrate that, thanks to this inherent regularization, TTM leads to trained students with better generalization than the original KD. To further enhance the student's capability to match the teacher's power-transformed probability distribution, we introduce a sample-adaptive weighting coefficient into TTM, yielding a novel distillation approach dubbed weighted TTM (WTTM). Comprehensive experiments show that, although WTTM is simple, it is effective, improves upon TTM, and achieves state-of-the-art accuracy. Our source code is available at this https URL.
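
The two ideas in the abstract that lend themselves to a concrete illustration are (1) temperature scaling viewed as a power transform of a probability distribution, and (2) the difference between applying temperature to both sides (conventional KD) and to the teacher side only (TTM). The sketch below is an illustration written from the abstract alone, not the authors' released code. The function names, the example logits, and the choice of KL divergence as the matching term are assumptions, and the cross-entropy term and loss weights usually combined with the distillation term are left out. The sample-adaptive coefficient of WTTM is not specified in the abstract, so it is not sketched here.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax of logits / temperature, computed in a numerically stable way."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def power_transform(probs, gamma):
    """Raise each probability to the power gamma, then renormalize to sum to 1."""
    powered = [p ** gamma for p in probs]
    total = sum(powered)
    return [p / total for p in powered]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def kd_distill_term(teacher_logits, student_logits, temperature):
    """Conventional KD matching term: temperature applied to BOTH sides."""
    p_t = softmax(teacher_logits, temperature)
    q_t = softmax(student_logits, temperature)
    return kl_divergence(p_t, q_t)


def ttm_distill_term(teacher_logits, student_logits, temperature):
    """TTM-style matching term (assumed form): temperature on the teacher only."""
    p_t = softmax(teacher_logits, temperature)
    q = softmax(student_logits)  # student side is left untempered
    return kl_divergence(p_t, q)


# Hypothetical logits for a 3-class problem, chosen only for illustration.
teacher_logits = [4.0, 1.0, 0.5]
student_logits = [2.0, 1.5, 0.2]
T = 4.0

# Temperature scaling of logits equals a power transform (exponent 1/T)
# of the untempered softmax output, up to floating-point error.
via_temperature = softmax(teacher_logits, T)
via_power = power_transform(softmax(teacher_logits), 1.0 / T)

print("tempered teacher:   ", via_temperature)
print("power-transformed:  ", via_power)
print("KD matching term:   ", kd_distill_term(teacher_logits, student_logits, T))
print("TTM matching term:  ", ttm_distill_term(teacher_logits, student_logits, T))
```

The identity between `via_temperature` and `via_power` follows directly from the softmax definition: raising `exp(z_i) / Z` to the power `1/T` gives `exp(z_i / T)` up to a constant that cancels on renormalization. With `T = 1` the two matching terms coincide, since neither side is transformed.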

Comments:	Published as a conference paper at ICLR 2024
Subjects:	Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2402.11148 [cs.LG]
	(or arXiv:2402.11148v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2402.11148

Submission history

From: Kaixiang Zheng
[v1] Sat, 17 Feb 2024 00:28:06 UTC (558 KB)
[v2] Thu, 7 Mar 2024 22:41:33 UTC (558 KB)
