
Structured Knowledge Distillation for Accurate and Efficient Object Detection

Published: 01 December 2023

Abstract

Knowledge distillation, which aims to transfer the knowledge learned by a cumbersome teacher model to a lightweight student model, has become one of the most popular and effective techniques in computer vision. However, many previous knowledge distillation methods are designed for image classification and fail on more challenging tasks such as object detection. In this paper, we first suggest that this failure is mainly caused by two factors: (1) the imbalance between foreground and background pixels, and (2) the lack of distillation of the relations among different pixels. We then propose a structured knowledge distillation scheme, consisting of attention-guided distillation and non-local distillation, to address these two issues, respectively. Attention-guided distillation locates the crucial pixels of foreground objects with an attention mechanism and makes the student devote more effort to learning their features. Non-local distillation enables the student to learn not only the features of individual pixels but also the relations among different pixels captured by non-local modules. Experimental results demonstrate the effectiveness of our method on thirteen object detection models against twelve comparison methods, for both object detection and instance segmentation. For instance, Faster R-CNN with our distillation achieves 43.9 mAP on MS COCO 2017, 4.1 mAP higher than its baseline. We also show that our method improves the robustness and domain generalization ability of detectors. Code and model weights have been released on GitHub.
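To make the two loss terms concrete, the following PyTorch sketch gives one plausible reading of the abstract. It assumes the usual feature-distillation setup in which student and teacher feature maps have matching shapes (e.g., after a 1x1 adaptation layer on the student side). The function names (attention_mask, attention_guided_loss, relation_loss), the temperature, the embedded-Gaussian affinity, and the example shapes are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of attention-guided and relation (non-local style) distillation.
# Assumptions noted above; this is not the authors' released code.
import torch
import torch.nn.functional as F


def attention_mask(feat: torch.Tensor, temperature: float = 0.5) -> torch.Tensor:
    """Spatial attention from a (N, C, H, W) feature map: channel-wise mean of
    absolute activations, softened into a per-pixel weighting that sums to H*W."""
    spatial = feat.abs().mean(dim=1)                                   # (N, H, W)
    n, h, w = spatial.shape
    mask = F.softmax(spatial.view(n, -1) / temperature, dim=1) * h * w
    return mask.view(n, 1, h, w)                                       # broadcast over channels


def attention_guided_loss(feat_s: torch.Tensor, feat_t: torch.Tensor) -> torch.Tensor:
    """Feature imitation weighted by the teacher's attention, so pixels of
    foreground objects dominate and background pixels are down-weighted."""
    with torch.no_grad():
        mask = attention_mask(feat_t)
    return (mask * (feat_s - feat_t) ** 2).mean()


def relation_loss(feat_s: torch.Tensor, feat_t: torch.Tensor) -> torch.Tensor:
    """Distill pairwise pixel relations: an embedded-Gaussian affinity matrix
    stands in here for the relation information captured by a non-local module."""
    def affinity(feat: torch.Tensor) -> torch.Tensor:
        n, c = feat.shape[0], feat.shape[1]
        x = feat.view(n, c, -1)                                        # (N, C, HW)
        return F.softmax(x.transpose(1, 2) @ x / c ** 0.5, dim=-1)     # (N, HW, HW)
    return F.mse_loss(affinity(feat_s), affinity(feat_t))


# Usage on one feature level (hypothetical shapes; loss weights are placeholders):
feat_t = torch.randn(2, 256, 32, 32)   # teacher backbone features
feat_s = torch.randn(2, 256, 32, 32)   # student features after channel adaptation
distill_loss = attention_guided_loss(feat_s, feat_t) + relation_loss(feat_s, feat_t)
```

In practice such terms would be added, with suitable weights, to the student detector's original training loss.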


Cited By

  • (2024) Slim-YOLO-PR_KD: An efficient pose-varied object detection method for underground coal mine. Journal of Real-Time Image Processing, vol. 21, no. 5. DOI: 10.1007/s11554-024-01539-0. Online publication date: 28-Aug-2024.
  • (2024) Multi-Scale Cross Distillation for Object Detection in Aerial Images. Computer Vision – ECCV 2024, pp. 452–471. DOI: 10.1007/978-3-031-72967-6_25. Online publication date: 29-Sep-2024.


      Information & Contributors

      Information

      Published In

      IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 45, Issue 12
      Dec. 2023
      1966 pages

      Publisher

      IEEE Computer Society

      United States

      Publication History

      Published: 01 December 2023

      Qualifiers

      • Research-article

      Contributors


      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months): 0
      • Downloads (Last 6 weeks): 0
      Reflects downloads up to 28 Jan 2025

