
Enhancing deep feature representation in self-knowledge distillation via pyramid feature refinement

Published: 17 April 2024

Abstract

In recent years, various self-knowledge distillation approaches have been proposed to reduce the cost of training teacher networks. However, these methods often overlook the significance of deep features. To address this limitation, we propose Self-Knowledge Distillation via Pyramid Feature Refinement (PR-SKD), which strengthens deep features while preserving the capability of shallow features. Inspired by the representation learning characteristics of deep neural networks, PR-SKD builds a cohort of sub-networks with a pyramid architecture that hierarchically transfers refined information to the target network. Accounting for the different contributions and functions of deep and shallow feature maps, PR-SKD fully exploits feature information to improve deep feature representation without compromising the shallow feature maps. Extensive experiments on various image classification datasets demonstrate the superiority of our proposed method over widely used state-of-the-art knowledge distillation methods. The code is available at: https://github.com/wo16pao/PR-SKD.
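The abstract does not spell out the training objective, but the mechanism it describes, an auxiliary pyramid of sub-networks whose refined output supervises the target network's deep features, can be illustrated with a short sketch. The PyTorch code below is a minimal illustration under stated assumptions: the PyramidRefiner module, the 1x1 lateral convolutions, the MSE feature loss, and the temperature-scaled KL term are hypothetical choices made for exposition, not the authors' exact design (see the repository linked above for that).

# Minimal sketch of pyramid-style self-distillation in the spirit of PR-SKD,
# based only on the abstract. All module names and loss choices below are
# illustrative assumptions, not the paper's exact formulation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidRefiner(nn.Module):
    """Hypothetical auxiliary pyramid: fuses shallow backbone stages into a
    refined deep feature that serves as the distillation target."""
    def __init__(self, stage_channels, num_classes):
        super().__init__()
        # 1x1 convs project every backbone stage to the deepest stage's width.
        self.laterals = nn.ModuleList(
            nn.Conv2d(c, stage_channels[-1], kernel_size=1) for c in stage_channels
        )
        self.head = nn.Linear(stage_channels[-1], num_classes)

    def forward(self, feats):
        # feats: list of stage feature maps, ordered shallow -> deep.
        refined = self.laterals[-1](feats[-1])
        for lat, f in zip(reversed(self.laterals[:-1]), reversed(feats[:-1])):
            # Project each shallow map, downsample it to the deep resolution,
            # and fuse it into the running refined feature.
            f = F.adaptive_avg_pool2d(lat(f), refined.shape[-2:])
            refined = refined + f
        logits = self.head(refined.mean(dim=(2, 3)))  # GAP + linear head
        return refined, logits

def self_distill_loss(student_feat, refined_feat, student_logits,
                      refined_logits, targets, T=4.0, alpha=1.0, beta=1.0):
    """Cross-entropy on both branches, plus feature and logit transfer from
    the (detached) refined pyramid branch to the target network."""
    ce = F.cross_entropy(student_logits, targets) + \
         F.cross_entropy(refined_logits, targets)
    feat_kd = F.mse_loss(student_feat, refined_feat.detach())
    logit_kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(refined_logits.detach() / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    return ce + alpha * feat_kd + beta * logit_kd

# Usage with dummy backbone features (three stages, shallow -> deep):
feats = [torch.randn(2, c, s, s) for c, s in zip((64, 128, 256), (32, 16, 8))]
refiner = PyramidRefiner([64, 128, 256], num_classes=10)
refined, aux_logits = refiner(feats)
student_logits = torch.randn(2, 10)          # stand-in for the backbone's own head
targets = torch.randint(0, 10, (2,))
loss = self_distill_loss(feats[-1], refined, student_logits, aux_logits, targets)

In this sketch the refined branch is detached inside the transfer terms, so gradients from the distillation losses flow only into the target network, mimicking a teacher-student relationship within a single model; at inference the auxiliary pyramid would be discarded.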

Highlights

The method recognizes the varying contributions of different feature maps in self-distillation.
A new hierarchical refinement distillation technique built on a pyramid architecture.
The approach improves the deep feature representation capability of the target network.
The proposed technique achieves highly competitive performance on recent benchmarks by focusing on deep feature representation.



Published In

Pattern Recognition Letters, Volume 178, Issue C (February 2024), 223 pages

Publisher

Elsevier Science Inc.

United States


Author Tags

  1. Self-knowledge distillation
  2. Feature representation
  3. Pyramid structure
  4. Deep neural networks

Qualifiers

  • Research-article
