
Enhancing deep feature representation in self-knowledge distillation via pyramid feature refinement

Published: 17 April 2024

Abstract

In recent years, various self-knowledge distillation approaches have been proposed to reduce the cost of training teacher networks. However, these methods often overlook the significance of deep features. To address this limitation, we propose Self-Knowledge Distillation via Pyramid Feature Refinement (PR-SKD), which strengthens deep features while preserving the capability of shallow features. Inspired by the representation learning characteristics of deep neural networks, PR-SKD builds a cohort of sub-networks with a pyramid architecture that hierarchically transfers refined information to the target network. Accounting for the different contributions and functions of deep and shallow feature maps, PR-SKD fully exploits feature information to improve deep feature representation without compromising the shallow feature maps. Extensive experiments on various image classification datasets demonstrate the superiority of our proposed method over widely used state-of-the-art knowledge distillation methods. The code is available at: https://github.com/wo16pao/PR-SKD.
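The abstract does not spell out the training objective, but the mechanism it describes, an auxiliary pyramid of sub-networks whose refined output supervises the target network's deep features, can be illustrated with a short sketch. The PyTorch code below is a minimal illustration under stated assumptions: the PyramidRefiner module, the 1x1 lateral convolutions, the MSE feature loss, and the temperature-scaled KL term are hypothetical choices made for exposition, not the authors' exact design (see the repository linked above for that).

# Minimal sketch of pyramid-style self-distillation in the spirit of PR-SKD,
# based only on the abstract. All module names and loss choices below are
# illustrative assumptions, not the paper's exact formulation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidRefiner(nn.Module):
    """Hypothetical auxiliary pyramid: fuses shallow backbone stages into a
    refined deep feature that serves as the distillation target."""
    def __init__(self, stage_channels, num_classes):
        super().__init__()
        # 1x1 convs project every backbone stage to the deepest stage's width.
        self.laterals = nn.ModuleList(
            nn.Conv2d(c, stage_channels[-1], kernel_size=1) for c in stage_channels
        )
        self.head = nn.Linear(stage_channels[-1], num_classes)

    def forward(self, feats):
        # feats: list of stage feature maps, ordered shallow -> deep.
        refined = self.laterals[-1](feats[-1])
        for lat, f in zip(reversed(self.laterals[:-1]), reversed(feats[:-1])):
            # Project each shallow map, downsample it to the deep resolution,
            # and fuse it into the running refined feature.
            f = F.adaptive_avg_pool2d(lat(f), refined.shape[-2:])
            refined = refined + f
        logits = self.head(refined.mean(dim=(2, 3)))  # GAP + linear head
        return refined, logits

def self_distill_loss(student_feat, refined_feat, student_logits,
                      refined_logits, targets, T=4.0, alpha=1.0, beta=1.0):
    """Cross-entropy on both branches, plus feature and logit transfer from
    the (detached) refined pyramid branch to the target network."""
    ce = F.cross_entropy(student_logits, targets) + \
         F.cross_entropy(refined_logits, targets)
    feat_kd = F.mse_loss(student_feat, refined_feat.detach())
    logit_kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(refined_logits.detach() / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    return ce + alpha * feat_kd + beta * logit_kd

# Usage with dummy backbone features (three stages, shallow -> deep):
feats = [torch.randn(2, c, s, s) for c, s in zip((64, 128, 256), (32, 16, 8))]
refiner = PyramidRefiner([64, 128, 256], num_classes=10)
refined, aux_logits = refiner(feats)
student_logits = torch.randn(2, 10)          # stand-in for the backbone's own head
targets = torch.randint(0, 10, (2,))
loss = self_distill_loss(feats[-1], refined, student_logits, aux_logits, targets)

In this sketch the refined branch is detached inside the transfer terms, so gradients from the distillation losses flow only into the target network, mimicking a teacher-student relationship within a single model; at inference the auxiliary pyramid would be discarded.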

Highlights

The method recognizes the varying contributions of different feature maps in self-distillation.
A new hierarchical refinement distillation technique built on a pyramid architecture.
The approach improves the deep feature representation capability of the target network.
The proposed technique achieves highly competitive performance on recent benchmarks by focusing on deep feature representation.



Published In

Pattern Recognition Letters, Volume 178, Issue C (February 2024), 223 pages

Publisher

Elsevier Science Inc.

United States


Author Tags

  1. Self-knowledge distillation
  2. Feature representation
  3. Pyramid structure
  4. Deep neural networks

Qualifiers

  • Research-article
