Surface Defect Detection of Industrial Parts Based on YOLOv5
ABSTRACT Industrial product quality inspection is a key procedure in industrial production and is essential for ensuring product yield. Safety and quality inspections on industrial assembly lines are still predominantly manual, and safe, dependable automated inspection techniques remain scarce. To improve the quality inspection of industrial production parts, an improved surface defect detection approach based on YOLOv5 is proposed for the problem of surface flaws in industrial components. To improve dense object detection, image features extracted by the convolutional network are enhanced with coordinate attention. A BiFPN is utilized to fuse multi-scale features in order to lower the missed-detection and false-detection rates for small target samples. For the difficult problem of fine-grained detection, detectors with a Transformer structure are added to improve the prediction of challenging instances. Experimental results on an industrial parts defect dataset show that the proposed network increases the recall of the original algorithm on abnormal classes by 5.3%, reaching 91.6%, while its inference speed can approach 95 FPS, indicating good real-time detection performance.
INDEX TERMS Defect detection, YOLOv5, transformer, deep learning, fine-grained detection.
of industrial products. Yang et al. [9] locate various defect positions through the object detection method and distinguish defect categories using an improved classification network. Zhao et al. [10] extract defect information by means of the instance segmentation method, output the prediction results through a subsequent network, and enrich the training data with weakly supervised learning. Still, real-time detection cannot be guaranteed because of the slow recognition speed. The industrial field requires real-time detection performance, ideally even on small embedded devices, so this application scenario calls for a lightweight model with a high detection frame rate.

To solve the problems of low detection accuracy and the inability to perform real-time detection in traditional methods [11], [12], a mechanical product defect detection system for industrial assembly lines is proposed in this paper based on the object detection method. Parts with defects such as deformation and contamination are marked when their appearance is detected to be defective, which facilitates subsequent early warning and rejection of defective products. Different from other object detection methods for defect detection of industrial parts, this paper detects different abnormal states of the same category at the instance level, which belongs to fine-grained detection and is characterized by the distinction between different abnormal categories. Because the differences between samples in different states are small, it is more difficult to identify samples correctly. Using a lightweight model with high real-time performance, the proposed method can provide a computer vision-based solution for current industrial production through deployment on embedded devices, so as to enhance the quality of products on industrial assembly lines. On the premise of ensuring real-time detection performance, and given the difficulty the existing YOLOv5s method has in classifying defects for small samples with low recall rates and slight sample differences, the model is improved in this paper to make it more suitable for tiny targets and difficult samples. The main contributions are:
1) Coordinate attention is added to the feature extraction module, which significantly improves the detection performance of the model with minimal computational overhead.
2) A bidirectional multi-scale fusion module is used to optimize the model hierarchy and fuse additional layers of features without extra computation. It also enhances the feature fusion ability of the network and raises the recall rate for small target samples.
3) Aiming at the missed detection of fine-grained samples in the dataset, a detector with a Transformer structure is proposed to enhance the feature extraction capability of the model and effectively increase the recognition accuracy of difficult target samples.
The remainder of this paper is organized as follows. In Section 2, we introduce related work on object detection in recent years. Section 3 presents the details of the proposed method. The implementation of the proposed method and its comparison with previous methods are presented in Section 4. Section 5 summarizes the conclusions of this work and suggests future research directions.

II. RELATED WORK
Object detection includes two parts, classification and localization, and its application fields are broad, including face detection, pedestrian detection, and vehicle detection [13]. Traditional object detection algorithms adopt sliding windows to detect objects without any pertinence, which is inefficient and inaccurate, and the manually selected features are less robust to irregular objects with different shapes [14]. With the advancement of deep learning technology, image feature extraction by convolutional neural networks has become a common approach [15], [16], [17]. Meanwhile, object detection, as one of the hot spots in machine vision research, has stimulated the appearance of numerous excellent algorithms [18], [19], [20]. The emergence of abundant networks has played a critical role in promoting the development of deep learning. For example, ResNet [21] proposed the concept of residual blocks, which significantly increased the practical depth of networks. New feature extraction methods have also been provided for image detection with the help of the attention mechanism [22]. Methods such as the assemblable attention module proposed by SENet [23] bring accuracy improvements to convolutional networks. DETR [24] uses a classical convolution structure to encode the extracted image features and completes both classification and localization through the Transformer structure; for detection, an innovative Hungarian loss function is used to match the decoded target classes, rather than the initial anchor design of general detection networks. Similar to natural language processing, ViT [25] encodes segmented and serialized image patches as input to the Transformer and directly obtains the coordinate positions and categories of targets through encoding and decoding. Swin Transformer [26] improves on ViT and addresses its enormous computational cost through hierarchical feature mapping and shifted-window attention.

So far, two main branches of deep learning-based object detection methods have emerged: two-stage object detection models based on a region proposal network and one-stage object detection models that directly perform position regression [27]. YOLOv5 is an efficient and stable one-stage object detection method with greatly enhanced speed and accuracy, and it can quickly adapt to new tasks after transfer learning. The input of YOLOv5 is an RGB image with a size of 640*640. Its overall network design is divided into a backbone network based on CSPNet [28], a multi-scale feature fusion module based on the FPN [29]+PAN [30] structure, and detectors for output classification and bounding box regression.

The backbone of YOLOv5 includes Focus, BottleneckCSP and SPP. The first two components mainly undertake image feature extraction.
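As background for the baseline described above, the sketch below runs the stock YOLOv5s model through the public Ultralytics torch.hub entry point at the 640*640 input size. It is illustrative only: the image path and thresholds are placeholders, and the paper's improved network is not reproduced here.

```python
import torch

# Load the stock YOLOv5s model from the public Ultralytics hub entry point
# (illustrative baseline only; not the improved network proposed in the paper).
model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)

# Confidence and NMS IoU thresholds are attributes on the hub wrapper;
# the values below are placeholders, not the paper's settings.
model.conf = 0.25
model.iou = 0.45

# Run inference on one RGB image; the wrapper letterboxes it to 640x640.
results = model("part_image.jpg", size=640)
results.print()          # summary of detections
boxes = results.xyxy[0]  # tensor rows: [x1, y1, x2, y2, confidence, class]
```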
FIGURE 1. Structure of the proposed network. In the inference stage, the input is an RGB image from the camera, and the output prediction is the original picture with marker boxes. CSP_CA represents the CSP module with Coordinate Attention.
III. METHODOLOGY
The proposed network structure is shown in Figure 1. Some
deep-level features are extracted by adding the CSP unit
with the CA module. The BiFPN is utilized to integrate the
features, simplify a portion of the network structure, and pay close attention to features at different levels. To locate and classify targets, the fused features are transmitted to the corresponding detectors according to their resolutions.

FIGURE 2. Structure of Coordinate Attention. It applies average pooling along the horizontal and vertical directions, transforms the pooled features to encode spatial information, and finally fuses the spatial information by weighting on the channels.
A. COORDINATE ATTENTION
Images of industrial parts and the mechanical components they contain are usually accompanied by a complex background environment. The YOLOv5 network stacks multiple CSP residual modules for feature extraction, which can continuously accumulate redundant information during network iteration and reduce the detection accuracy. In view of the confusion of targets during dense detection, this paper optimizes the overall feature extraction ability of the model by adding Coordinate Attention (CA) [31] into the CSP structure, embedding position information into the attention module.

Attention mechanisms in computer vision, which aim to mimic the human visual system, can efficiently capture salient regions in complex scenes and have brought progress to multiple vision tasks. Through the attention mechanism, the input image features can be dynamically weighted. SENet improves the recognition performance of the convolutional network by using attention to optimize the feature extraction capability of the model at the level of feature channels and spatial information. However, the attention modules in methods such as SENet and CBAM [32] only consider internal channel information and ignore the importance of location information. It is undeniable that the spatial structure of objects in vision is of great significance. Based on CBAM, coordinate attention is simplified, as shown in Figure 2. Given an input X, a pooling window of size (H, 1) or (1, W) is set along the horizontal and vertical coordinates. By using the two parallel one-dimensional feature encodings obtained for each channel, spatial coordinate information is integrated efficiently, and the coordinate attention acquired through the subsequent convolution structure is used to re-weight the input features. This ensures that the network's feature extraction ability is enhanced with little computational overhead, while obtaining more receptive field information.

The combination of upsampling and downsampling on the feature pyramid for multi-scale feature fusion can obtain deeper semantic information. However, the shallow features of the neck will be diluted, hindering the full combination of image features between deep layers and shallow layers. Considering the many small instances in the defect detection dataset of industrial parts, and the difficulty of distinguishing features at the deep level, shallow features are combined with in-depth features by the BiFPN [33]. The attention computation enhances the flow of shallow feature information, making the model assign weights with a bias toward small target samples rather than using direct summation, as in PANet.
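To make the pooling-and-reweighting flow described for coordinate attention concrete, here is a minimal PyTorch sketch of a coordinate-attention block in the spirit of [31]. It is a simplified illustration rather than the authors' released module; the class name and reduction ratio are our own choices.

```python
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    """Simplified coordinate attention: pool along H and W separately,
    encode the two directions jointly, then re-weight the input."""
    def __init__(self, channels: int, reduction: int = 32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn1 = nn.BatchNorm2d(mid)
        self.act = nn.ReLU(inplace=True)
        # Separate 1x1 convs produce the horizontal and vertical attention maps.
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, h, w = x.shape
        # (H, 1) and (1, W) average pooling -> two direction-aware descriptors.
        x_h = x.mean(dim=3, keepdim=True)                       # N x C x H x 1
        x_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)   # N x C x W x 1
        y = torch.cat([x_h, x_w], dim=2)                        # N x C x (H+W) x 1
        y = self.act(self.bn1(self.conv1(y)))                   # shared transform
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))                             # N x C x H x 1
        a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))         # N x C x 1 x W
        return x * a_h * a_w   # position- and channel-wise reweighting of the input
```

In the proposed network such a block would sit inside the CSP unit (the CSP_CA module of Figure 1); for example, `CoordinateAttention(256)(torch.randn(1, 256, 40, 40))` keeps the feature-map shape unchanged.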
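Likewise, the following small sketch shows the learnable weighted fusion that BiFPN-style necks use in place of PANet's plain summation. The epsilon-normalized "fast fusion" follows the idea in [33]; the class and variable names are illustrative and not taken from the paper.

```python
import torch
import torch.nn as nn

class WeightedFusion(nn.Module):
    """Fuse same-resolution feature maps with learnable, normalized weights
    (fast normalized fusion, as in BiFPN) instead of plain addition."""
    def __init__(self, num_inputs: int, eps: float = 1e-4):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(num_inputs))
        self.eps = eps

    def forward(self, inputs):
        # Keep the weights non-negative, then normalize so they sum to ~1.
        w = torch.relu(self.weights)
        w = w / (w.sum() + self.eps)
        return sum(w[i] * feat for i, feat in enumerate(inputs))

# Example: fuse a shallow feature map with an upsampled deep one of equal shape.
fuse = WeightedFusion(num_inputs=2)
p_shallow = torch.randn(1, 256, 80, 80)
p_deep_up = torch.randn(1, 256, 80, 80)
fused = fuse([p_shallow, p_deep_up])   # shape: 1 x 256 x 80 x 80
```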
FIGURE 5. The Transformer module. The top part is the standard convolutional prediction head; the bottom part is the Transformer prediction head, which consists of Multi-Head Attention, MLP and other modules. L represents the Linear layer.
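As a rough illustration of the kind of Transformer prediction head sketched in Figure 5 (multi-head self-attention followed by an MLP over flattened feature-map tokens), here is a minimal PyTorch encoder block. It follows the general idea used in TPH-YOLOv5 [34] and is not the authors' exact module; the head count and MLP ratio are placeholders.

```python
import torch
import torch.nn as nn

class TransformerHeadBlock(nn.Module):
    """One encoder block applied to a CNN feature map: the H*W positions are
    treated as tokens and passed through multi-head self-attention and an MLP."""
    def __init__(self, channels: int, num_heads: int = 4, mlp_ratio: int = 2):
        super().__init__()
        self.norm1 = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(channels)
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels * mlp_ratio),
            nn.GELU(),
            nn.Linear(channels * mlp_ratio, channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)                         # N x (H*W) x C
        t = self.norm1(tokens)
        tokens = tokens + self.attn(t, t, t, need_weights=False)[0]   # attention + residual
        tokens = tokens + self.mlp(self.norm2(tokens))                # MLP + residual
        return tokens.transpose(1, 2).reshape(n, c, h, w)

# The block preserves the feature-map shape, so the usual 1x1 detection conv
# (class scores + box offsets) can follow it unchanged.
out = TransformerHeadBlock(256)(torch.randn(1, 256, 20, 20))   # 1 x 256 x 20 x 20
```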
TABLE 3. Detection results of the different proposed structures on an industrial part defect dataset. Recall(abnormal) denotes the recall rate over all classes except the normal class.
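For clarity, the short sketch below shows one plausible way to compute a Recall(abnormal)-style metric from per-class true positives and false negatives by pooling every class except the normal one; the paper does not spell out whether it pools or macro-averages, and the counts and class names here are hypothetical.

```python
def abnormal_recall(tp: dict, fn: dict, normal_class: str = "normal") -> float:
    """Recall over the abnormal classes only: TP / (TP + FN),
    pooled across every class except the normal one (one possible convention)."""
    tp_sum = sum(v for k, v in tp.items() if k != normal_class)
    fn_sum = sum(v for k, v in fn.items() if k != normal_class)
    return tp_sum / (tp_sum + fn_sum) if (tp_sum + fn_sum) else 0.0

# Hypothetical counts for illustration only (not the paper's data).
tp = {"normal": 480, "twist": 88, "dirty": 91, "incomplete": 85}
fn = {"normal": 12, "twist": 10, "dirty": 7, "incomplete": 9}
print(f"Recall(abnormal) = {abnormal_recall(tp, fn):.3f}")
```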
FIGURE 8. Validation curves over training epochs. It compares the validation mAP of RetinaNet, YOLOv5, EfficientDet-D0 and our method during the training stage.

As can be seen from Figure 8, the proposed model achieves higher detection accuracy than YOLOv5s, and the mAP reaches 0.756, which is 2.2 percentage points higher than the original network. It can be seen from Table 2 that the proposed method achieves almost the highest detection accuracy among methods of the same type, reaching 93.6%, while its detection speed is also at a high level. The inference speed on an A100 graphics card reaches 95 FPS, which still meets the needs of real-time detection. The experiments demonstrate that our model remains competitive on the dataset.

To verify the optimization effect of each proposed module in the network, ablation experiments are carried out on the proposed method. The experimental results are summarized in Table 3. Recall(abnormal) is the primary evaluation indicator for the abnormal detection of industrial parts. After adding the coordinate attention module, the average precision of the model is increased by 0.5%. After using the bidirectional multi-scale fusion module for feature integration, the abnormal recall rate of the detection model is increased by 2.3%, indicating that the prediction accuracy of the method for small target samples is significantly improved. The addition of the Transformer detector brings a further clear improvement in both recall and precision. Figure 9 shows the precision-recall curve of the detection performance of the proposed model. The detection speed decreases due to the increased number of parameters and computation brought by its structure. Overall, the results show that the proposed detection model is superior to the original YOLOv5.

FIGURE 10. Feature dimensionality reduction visualization in t-SNE. The features are taken from the feature maps of the penultimate layer of the network and reduced to two dimensions by PCA. Different colored dots represent different categories.

D. ANALYSIS
Figure 10 presents a comparison of the prototype distribution of classification features learned by the original and proposed models. It indicates that the model with the bidirectional multi-scale fusion module still faces a small amount of sample confusion after network fine-tuning, but its classification interval is more apparent. The model with the added Transformer detector clearly distinguishes the vast majority of samples, reduces the overlap between categories, and realizes a more balanced overall spacing of features, proving that the proposed method can improve the representation ability of the feature space for effective object detection.

Specifically, to compare the results of the proposed network more intuitively, some pictures in the test dataset and real pictures were selected for testing. For a clearer comparison, the confidence thresholds of the two networks were set to 0.45, and the non-maximum suppression IoU threshold was set to 0.3.

FIGURE 11. Prediction results with marker boxes. The left side of each predicted picture is from YOLOv5, and the right side is from the proposed model. Part (a) compares results in a small-target detection situation; part (b) compares detection results in a dense situation.

Figure 11 shows the detection results of the YOLOv5s model and the proposed model on the left and right sides, respectively. In Figure 11(a), owing to the long distance from the detection target to the image acquisition device, the detected targets tend to be tiny overall; on the left the confidence is lower and there are some false detections, while small target objects are detected more accurately on the right. In Figure 11(b), dense targets make some of the prediction boxes on the left inaccurate or missed, while the detection results on the right are improved.

Experiments are also carried out to verify the effectiveness of the proposed modules in the actual production environment. We set up control groups in the factory based on different environments. Each control group contains 20 batches of samples from the abnormal category, with four different abnormal samples in each batch. On an assembly line, samples from the same batch were photographed in different environments. In the normal control group, the camera is situated between 65 and 85 cm away from the object, the indoor illumination is around 170 lx, and the camera is brand-new. The samples were also placed in two experimental distance settings, with camera-to-object distances of 100 cm and 120 cm. The illuminance comparison group was set up with two distinct illuminance environments, namely a low-light group with an illuminance of approximately 100 lx and a high-light group with an illuminance of about 220 lx. A dirty-camera control group was established that shot the samples with the same model of camera, one that had been in use in the factory for about 14 months. Within each environment, every batch was shot five times with fine-tuning of the shooting angle, and a total of 400 samples were collected. The dataset was then enlarged using data augmentation methods including horizontal flip, vertical flip, and random cropping. The resulting 1600 samples were summarized and sorted into an industrial parts environmental comparison dataset.
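The flip-and-crop augmentation described above is standard; a minimal torchvision sketch of such a pipeline is shown below. The probabilities and crop size are placeholders rather than the paper's settings, and in a detection setting the bounding boxes would have to be transformed together with the image (e.g., with a box-aware augmentation library), which this image-only sketch does not do.

```python
from torchvision import transforms

# Illustrative augmentation pipeline: horizontal flip, vertical flip, random crop.
# Probabilities and the crop size are placeholders, not the paper's settings.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomVerticalFlip(p=0.5),
    transforms.RandomResizedCrop(size=640, scale=(0.8, 1.0)),
    transforms.ToTensor(),
])

# Applying the pipeline several times to each photo enlarges the dataset,
# e.g. 400 collected images -> 1600 augmented samples as described above.
```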
TABLE 4. Prediction results in different environments.

The prediction results of the proposed model on this dataset are shown in Table 4. It can be seen that the prediction recall rate of the model is affected to different degrees in the different environments. Among them, the Twist samples are more significantly affected when the camera is far away or dirty, with a maximum drop of about 5 percentage points. In low-light conditions, the Dirty class samples are more affected, and their recall rate decreases by 6.7 percentage points. The recall rate of the remaining categories is only slightly influenced by the environment. Additionally, it was observed during the experiment that samples from the Twist and Incomplete categories were marginally impacted by the shooting angle. According to the comparative experiments above, the proposed method loses a small amount of recall on abnormal samples when the illumination and camera height of the real production environment change slightly, but it can still meet the detection requirements.

The proposed model is also suitable for use with embedded devices. After converting the model to ONNX, we migrate it to an NVIDIA Jetson NX and build a detection system on it. Due to the limited computing power of the device, the detection speed on the Jetson NX after porting is about 35 FPS. When the detection system uses monitors to output the detection videos, the detection speed of the model decreases, because rendering the videos takes up part of the computation; it declines to 31 FPS in our experimental environment. The above data are measured under the condition that the detection accuracy is unaffected and the model is ported without quantization. The model can be quantized to reduce the number of parameters and calculations and to speed up inference on embedded devices for real-time detection, but the quantization operation will result in some recall loss that is related to the model compression rate.
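A minimal sketch of the ONNX conversion step mentioned above, using PyTorch's standard exporter; the checkpoint and file names, input size, and opset version are illustrative, and the YOLOv5 repository also ships its own export script that wraps this step.

```python
import torch

# `model` is assumed to be the trained detection network in eval mode;
# loading it from a checkpoint path like this is illustrative only.
model = torch.load("improved_yolov5s.pt", map_location="cpu").float().eval()

dummy = torch.zeros(1, 3, 640, 640)   # one 640x640 RGB input, as used by the network
torch.onnx.export(
    model, dummy, "improved_yolov5s.onnx",
    opset_version=12,
    input_names=["images"],
    output_names=["predictions"],
)
# The resulting .onnx file can then be run on the Jetson NX, for example through
# ONNX Runtime or after conversion to a TensorRT engine.
```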
V. CONCLUSION
In this study, we propose an end-to-end lightweight defect detection model for industrial parts based on an improved YOLOv5. The detector achieves excellent detection accuracy and real-time detection on edge computing devices. Our contributions concentrate on three aspects: applying coordinate attention to the feature extraction module to improve the detection performance of the model, optimizing the model hierarchy through the BiFPN to reduce the false detection rate and missed detection of small target samples, and adding the Transformer detector to increase the recognition accuracy of difficult samples. The experimental results demonstrate that the proposed algorithm improves the performance of defect detection for industrial parts under the premise of real-time detection and can help improve the yield in industrial production, transportation, and other scenarios. Currently, the algorithm still falls slightly short in detecting the defects of parts with occlusion under a fixed shooting angle. Future research will further adjust the structure and investigate how to improve the recognition accuracy through multi-angle collaborative detection to achieve better detection performance.

A. ABBREVIATIONS
AP                Averaged AP at IoUs from 0.5 to 0.95 with an interval of 0.05
AP50              AP at IoU threshold 0.5
AP75              AP at IoU threshold 0.75
BiFPN             Bi-directional feature pyramid network
CA                Coordinate attention
CBAM              Convolutional block attention module
end-to-end        The input is the original data, and the output is the final result
FLOPs             Floating-point operations per second
FPN               Feature pyramid network
IoU               Intersection over union
lx                Lux, the unit of illuminance
Recall(abnormal)  Recall rate of abnormal samples
SSD               Single Shot multibox Detector
YOLO              You Only Look Once

REFERENCES
[1] H. Wang, J. Wang, G. Zhang, X. Ouyang, and F. Luo, "Improved FPN's mask R-CNN for industrial surface defect detection," Manuf. Automat., vol. 42, no. 12, pp. 35-40 and 97, Dec. 2020.
[2] Y. Chen, Y. Ding, F. Zhao, E. Zhang, Z. Wu, and L. Shao, "Surface defect detection methods for industrial products: A review," Appl. Sci., vol. 11, no. 16, p. 7657, Aug. 2021.
[3] H.-Y. Lee and T.-E. Lee, "Scheduling single-armed cluster tools with reentrant wafer flows," IEEE Trans. Semicond. Manuf., vol. 19, no. 2, pp. 226-240, May 2006.
[4] D. V. Slavov and V. D. Hristov, "3D machine vision system for defect inspection and robot guidance," in Proc. 57th Int. Sci. Conf. Inf., Commun. Energy Syst. Technol. (ICEST), Jun. 2022, pp. 1-5.
[5] M. Foumani, M. Y. Ibrahim, and I. Gunawan, "Scheduling dual gripper robotic cells with a hub machine," in Proc. IEEE Int. Symp. Ind. Electron., May 2013, pp. 1-6.
[6] T. Czimmermann, G. Ciuti, M. Milazzo, M. Chiurazzi, S. Roccella, C. M. Oddo, and P. Dario, "Visual-based defect detection and classification approaches for industrial applications—A survey," Sensors, vol. 20, no. 5, p. 1459, Mar. 2020.
[7] H. Chang, J. Gou, and X. Li, "Application of faster R-CNN in image defect detection of industrial CT," J. Image Graph., vol. 23, no. 7, pp. 1061-1071, 2018.
[8] X. Yue, Q. Wang, L. He, Y. Li, and D. Tang, "Research on tiny target detection technology of fabric defects based on improved Yolo," Appl. Sci., vol. 12, no. 13, p. 6823, Jul. 2022.
[9] Z. Li, X. Tian, X. Liu, Y. Liu, and X. Shi, "A two-stage industrial defect detection framework based on improved-YOLOv5 and optimized-inception-ResNetV2 models," Appl. Sci., vol. 12, no. 2, p. 834, Jan. 2022.
[10] J. Božič, D. Tabernik, and D. Skočaj, "Mixed supervision for surface-defect detection: From weakly to fully supervised learning," Comput. Ind., vol. 129, Aug. 2021, Art. no. 103459.
[11] Q. Luo, X. Fang, L. Liu, C. Yang, and Y. Sun, "Automated visual defect detection for flat steel surface: A survey," IEEE Trans. Instrum. Meas., vol. 69, no. 3, pp. 626-644, Mar. 2020.
[12] J. Yang, S. Li, Z. Wang, and G. Yang, "Real-time tiny part defect detection system in manufacturing using deep learning," IEEE Access, vol. 7, pp. 89278-89291, 2019.
[13] Z.-Q. Zhao, P. Zheng, S.-T. Xu, and X. Wu, "Object detection with deep learning: A review," IEEE Trans. Pattern Anal. Mach. Intell., vol. 30, no. 11, pp. 3212-3232, Nov. 2019.
[14] X. Wang, M. Yang, S. Zhu, and Y. Lin, "Regionlets for generic object detection," in Proc. IEEE Int. Conf. Comput. Vis., Dec. 2013, pp. 17-24.
[15] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," Commun. ACM, vol. 60, no. 2, pp. 84-90, Jun. 2012.
[16] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," 2014, arXiv:1409.1556.
[17] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2015, pp. 1-9.
[18] J. Redmon and A. Farhadi, "YOLOv3: An incremental improvement," 2018, arXiv:1804.02767.
[19] A. Bochkovskiy, C.-Y. Wang, and H.-Y. M. Liao, "YOLOv4: Optimal speed and accuracy of object detection," 2020, arXiv:2004.10934.
[20] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, "SSD: Single shot multibox detector," in Proc. Eur. Conf. Comput. Vis. Cham, Switzerland: Springer, 2016, pp. 21-37.
[21] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 770-778.
[22] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, "Attention is all you need," in Proc. Adv. Neural Inf. Process. Syst., vol. 30, 2017, pp. 1-11.
[23] J. Hu, L. Shen, and G. Sun, "Squeeze-and-excitation networks," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 7132-7141.
[24] N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, and S. Zagoruyko, "End-to-end object detection with transformers," in Proc. Eur. Conf. Comput. Vis. Cham, Switzerland: Springer, 2020, pp. 213-229.
[25] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, "An image is worth 16×16 words: Transformers for image recognition at scale," 2020, arXiv:2010.11929.
[26] Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo, "Swin Transformer: Hierarchical vision transformer using shifted windows," in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2021, pp. 10012-10022.
[27] H. T. Lu and Q. C. Zhang, "Applications of deep convolutional neural network in computer vision," J. Data Acquisition Process., vol. 31, no. 1, pp. 1-17, 2016.
[28] C.-Y. Wang, H.-Y. M. Liao, Y.-H. Wu, P.-Y. Chen, J.-W. Hsieh, and I.-H. Yeh, "CSPNet: A new backbone that can enhance learning capability of CNN," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. Workshops (CVPRW), Jun. 2020, pp. 390-391.
[29] T.-Y. Lin, P. Dollar, R. Girshick, K. He, B. Hariharan, and S. Belongie, "Feature pyramid networks for object detection," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 2117-2125.
[30] S. Liu, L. Qi, H. Qin, J. Shi, and J. Jia, "Path aggregation network for instance segmentation," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 8759-8768.
[31] Q. Hou, D. Zhou, and J. Feng, "Coordinate attention for efficient mobile network design," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2021, pp. 13713-13722.
[32] S. Woo, J. Park, J.-Y. Lee, and I. S. Kweon, "CBAM: Convolutional block attention module," in Proc. Eur. Conf. Comput. Vis. (ECCV), 2018, pp. 3-19.
[33] M. Tan, R. Pang, and Q. V. Le, "EfficientDet: Scalable and efficient object detection," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 10781-10790.
[34] X. Zhu, S. Lyu, X. Wang, and Q. Zhao, "TPH-YOLOv5: Improved YOLOv5 based on transformer prediction head for object detection on drone-captured scenarios," in Proc. IEEE/CVF Int. Conf. Comput. Vis. Workshops (ICCVW), Oct. 2021, pp. 2778-2788.

HAI FENG LE received the bachelor's degree from the College of Urban Rail Transit and Logistics, Beijing Union University, in 2019. He is currently pursuing a graduate degree with the School of Robotics, Beijing Union University. His research interests include deep learning and its applications, and computer graphics.

LU JIA ZHANG was born in Beijing, China, in 1996. She received the bachelor's degree in computer science and technology from the Smart City College, Beijing Union University, in 2019. She is currently pursuing a graduate degree with the School of Robotics, Beijing Union University. Her research interests include image recognition, deep learning, and their applications.

YAN XIA LIU received the Ph.D. degree from the School of Automation and Electrical Engineering, University of Science and Technology Beijing, in 2013. She is a Professor with the College of Urban Rail Transit and Logistics, Beijing Union University. Her current research interests include pattern recognition, computer vision, deep learning, and intelligent instruments.