
A_Small-Sized_Object_Detection_Oriented_Multi-Scale_Feature_Fusion_Approach_With_Application_to_Defect_Detection

This document presents a novel multi-scale feature fusion method, named atrous spatial pyramid pooling-balanced-feature pyramid network (ABFPN), aimed at improving small object detection, particularly for surface defects in printed circuit boards (PCBs). The proposed method integrates atrous convolution with skip connections to enhance feature fusion and is embedded in an improved PCB defect detection framework (IPDD), which outperforms existing state-of-the-art methods. Experimental results validate the effectiveness of the ABFPN in various benchmark datasets and demonstrate its practical application in defect detection tasks.


IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, VOL. 71, 2022 3507014

A Small-Sized Object Detection Oriented Multi-Scale Feature Fusion Approach With Application to Defect Detection

Nianyin Zeng, Peishu Wu, Zidong Wang, Fellow, IEEE, Han Li, Weibo Liu, and Xiaohui Liu

Manuscript received January 14, 2022; revised February 5, 2022; accepted February 13, 2022. Date of publication February 24, 2022; date of current version March 10, 2022. This work was supported in part by the National Natural Science Foundation of China under Grant 62073271, in part by the International Science and Technology Cooperation Project of Fujian Province of China under Grant 2019I0003, and in part by the Independent Innovation Foundation of AECC under Grant ZZCX-2018-017. The Associate Editor coordinating the review process was Dr. Damodar Reddy Edla. (Corresponding author: Nianyin Zeng.)
Nianyin Zeng, Peishu Wu, and Han Li are with the Department of Instrumental and Electrical Engineering, Xiamen University, Xiamen, Fujian 361005, China (e-mail: zny@xmu.edu.cn).
Zidong Wang, Weibo Liu, and Xiaohui Liu are with the Department of Computer Science, Brunel University London, Uxbridge UB8 3PH, U.K. (e-mail: zidong.wang@brunel.ac.uk).
Digital Object Identifier 10.1109/TIM.2022.3153997

Abstract— Object detection is a well-known task in the field of computer vision, especially the small target detection problem that has aroused great academic attention. In order to improve the detection performance of small objects, in this article, a novel enhanced multiscale feature fusion method is proposed, namely, the atrous spatial pyramid pooling-balanced-feature pyramid network (ABFPN). In particular, the atrous convolution operators with different dilation rates are employed to make full use of context information, where the skip connection is applied to achieve sufficient feature fusions. In addition, there is a balanced module to integrate and enhance features at different levels. The performance of the proposed ABFPN is evaluated on three public benchmark datasets, and experimental results demonstrate that it is a reliable and efficient feature fusion method. Furthermore, in order to validate the applicational potential in small objects, the developed ABFPN is utilized to detect surface tiny defects of the printed circuit board (PCB), which acts as the neck part of an improved PCB defect detection (IPDD) framework. While designing the IPDD, several powerful strategies are also employed to further improve the overall performance, which is evaluated via extensive ablation studies. Experiments on a public PCB defect detection database have demonstrated the superiority of the designed IPDD framework against the other seven state-of-the-art methods, which further validates the practicality of the proposed ABFPN.

Index Terms— Atrous spatial pyramid pooling (ASPP), defect detection, feature fusion, object detection, printed circuit board (PCB).

I. INTRODUCTION

COMPUTER vision is a simulation of biological vision using computers and related equipment. Recently, computer vision has attracted enormous attention in various fields, such as industrial production, agriculture, and medical health. It is known that computer vision tasks can be divided into four categories that are image classification, object detection, semantic segmentation, and instance segmentation [7]. Due to its wide application potential in image processing and pattern recognition, object detection has received an ever-increasing research interest from both academic and industrial communities during the past few decades. With the rapid development of deep learning techniques, object detection algorithms can be divided into two groups: the one-stage object detection algorithms and the two-stage ones. The one-stage object detection algorithms can directly obtain the category probability and position coordinate values of objects, e.g., you only look once (YOLO) models, the single-shot multibox detector (SSD), and the corner network [1], [27], [37], [40]–[42]. The two-stage ones need to obtain the region proposals with rough location information and then classify the candidate regions into different groups. Some representative two-stage object detection algorithms are the region convolutional neural network (RCNN) [13], the fast RCNN [14], the faster RCNN [43], the mask RCNN [15], and the spatial pyramid pooling network [16].

Due to their strong abilities in defect detection and fault diagnosis, object detection algorithms have been successfully applied to a wide range of areas such as transportation, electrical and electronic engineering, biomedical engineering, and so on [2], [12], [22], [23], [51], [58]. It should be pointed out that the size of the object plays a critical role in object detection, especially in industrial applications. In fact, the performance of the conventional object detection algorithms is poor by using low-level features (e.g., edge information) for small object detection. In addition, it is difficult to extract high-level semantic features of small objects. As such, it is challenging to accurately position and classify small objects by using conventional object detection algorithms.

During the past few years, tremendous efforts have been devoted to small object detection [20], [30], [32], [33], [35], [36]. To summarize, the recently developed small object detection methods can be divided into three types: 1) using context information; 2) applying feature fusion; and 3) generating enhanced features. For example, a fully end-to-end object detector has been proposed in [20], where an object relation module has been designed to integrate the context information of the features. In [33], a feature pyramid network (FPN) has been proposed to merge the feature maps at different stages. Recently, a path aggregation network has been introduced
1557-9662 © 2022 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.

Authorized licensed use limited to: Dr. D. Y. Patil Educational Complex Akurdi. Downloaded on September 13,2024 at 09:04:08 UTC from IEEE Xplore. Restrictions apply.

in [36] by designing a bottom-up path enhancement branch, which could integrate the information from high-level features and low-level ones in a sufficient manner. To deal with the inconsistency among different feature scales, an adaptive spatial feature fusion method has been proposed in [35] by learning the weighting parameters. Very recently, a trident network has been presented in [32] for detecting objects in distinct sizes, where the atrous convolution method with multiple dilation rates has been employed to generate different receptive fields in parallel.

Unfortunately, the aforementioned small object detection methods still have some limitations, which do not fully mine latent information, such as more accurate location information and stronger semantic features from the feature maps. For instance, most context information-based object detection methods only concatenate high- and low-level features in a simple manner; however, such fusion stage with rough stacking may cause the increase of redundant information, such as noise information, which may decrease the detection performance. In this case, existing small object detection methods may not be suitable for complex small object detection tasks in real-world applications, such as surface defect detection for the printed circuit board (PCB) [9], tiny target detection for remote sensing images [31], and long-distance motion target detection [6]. A seemingly natural idea is to develop an advanced small object detection framework by making full use of context information and enhanced feature fusion together.

In this article, a novel feature fusion method, an atrous spatial pyramid pooling (ASPP) balanced FPN (ABFPN), is put forward for small object detection. The developed ABFPN makes full use of the advantages of the aforementioned three types of small object detection methods. Specifically, a skip-ASPP module is developed to enhance feature fusion and expand the receptive field, where the ASPP with different dilation rate D is set in a skip-connection manner [7]. Besides, a balanced module consisting of three blocks (i.e., the resize & average block, the space nonlocal block, and the residual block) is applied to learn the semantic and detailed information more effectively. The features fused by the balanced module can have balanced information from each feature map with different resolutions, which can avoid the semantic information in nonadjacent layers being weakened with lateral connections. Notice that the FPN is selected as the basis of the proposed ABFPN due to its capability in dealing with multiscale changes through the integration of low- and high-level features. It should be emphasized that the proposed ABFPN method is a competitive feature fusion approach, which can be embedded in any existing object detection framework.

As a typical small object detection task, PCB surface defect detection is very important in electrical and electronic engineering. Generally speaking, the surface defects in PCB can be classified into six categories: missing holes, mouse bite, open circuit, short circuit, spur, and spurious copper [9]. In the public datasets, it is found that the PCB surface defects normally lie in a concealed area, and some of them even exist in the tiny wiring part, which greatly increases the difficulty of surface defect detection.

Motivated by the above discussions, there is a need to develop an advanced object detection framework for PCB surface defect detection. In this article, an improved PCB defect detection (IPDD) framework is put forward for defect detection, where the proposed ABFPN is embedded as the feature fusion method in the IPDD framework. In summary, the main contributions of this article are outlined as follows.

1) A novel feature fusion method, the ABFPN, is proposed for small object detection, where a skip-ASPP module with diverse dilation rates is designed to enlarge the receptive field. A balanced module is deployed to extract latent features for feature fusion. Experimental results demonstrate the effectiveness of the ABFPN on benchmark datasets.
2) An IPDD framework is put forward for PCB surface defect detection, where the developed ABFPN is embedded as the feature fusion method in the IPDD framework. An ablation study is conducted to verify the effectiveness of the IPDD framework.
3) The proposed IPDD framework is successfully applied to a public PCB tiny defect detection task. Experimental results demonstrate the superiority of the IPDD framework over seven state-of-the-art methods [including the improved YOLOv3 (Impro YOLOv3), the improved faster RCNN (Impro faster RCNN), the fully convolutional one-stage object detection algorithm (FCOS), the PaddlePaddle-YOLO (PP-Yolo), the tiny defect detection network (TDD-Net), the efficient multiscale training method (sniper), and the deformable detection transformer (deformable DETR)] in terms of detection precision and recall.

The remainder of this article is organized as follows. The proposed enhanced feature fusion method ABFPN and applied robustness enhancement strategies are elaborated in Section II. Comprehensive benchmark evaluations of the ABFPN are performed in Section III with an in-depth analysis of adopted strategies. In Section IV, the proposed ABFPN is further used to develop the IPDD framework, which is applied to the PCB surface tiny defect detection task. Finally, conclusions and an outlook of future works are presented in Section V.

II. METHODOLOGY

In this section, the structure of a typical object detection framework is first illustrated. Then, the developed ABFPN is presented where the skip-ASPP module and the balanced module are analyzed with details, which is a multiscale feature fusion approach for small-sized object detection tasks. Meanwhile, some robustness enhancement strategies are introduced for further improving the overall performance.

A. Structure of a Typical Object Detection Framework

In a typical object detection network, there are generally four basic components that are the input layer, the backbone, the neck, and the detection head [1]. The architecture of the typical object detection network is shown in Fig. 1.

In general, the input of the object detection framework requires data augmentation to boost the robustness of the

ZENG et al.: SMALL-SIZED OBJECT DETECTION ORIENTED MULTISCALE FEATURE FUSION APPROACH 3507014

Fig. 1. General object detection framework.

Fig. 2. Diagram of the enhanced feature fusion method ASPP-balanced-FPN (ABFPN).

training model, especially for industrial applications. Some commonly used data augmentation techniques include spatial transformations (such as random scaling, cropping, and flipping) and color distortions (e.g., changing transparency, brightness, and saturation). The backbone part is set for extracting features from the input layer. Some widely used models include the visual geometry group [44], the residual network (ResNet) [17], and the dark network [42]. The neck part is of vital importance in object detection. To be specific, feature fusion is carried out in the neck part to reprocess the extracted features and study the latent features according to different requirements. For example, the SSD proposed in [37] is applied for up and down sampling. The FPN can be used for path aggregation [33]. The last component of the object detection framework is the detection head, which is utilized for localization and classification. It should be mentioned that there is always a postprocessing module in the detection head, which usually refers to the nonmaximum suppression (NMS) method [25] and its improved versions, such as the soft NMS method [3] and the weighted NMS method [26].

B. ABFPN: An Enhanced Feature Fusion Approach

The diagram of the proposed ABFPN is depicted in the red dashed box of Fig. 2, where the proposed ABFPN is the neck part of the object detection framework. In the ABFPN, there are two designed modules (which are the skip-ASPP module and the balanced module) for feature fusion.
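Since the skip-ASPP module is built on atrous convolution, a small numeric sketch may help show why a larger dilation rate widens the area a kernel covers without adding weights. It uses the standard relation k + (k - 1)(d - 1) for the span of a single dilated kernel and the dilation rates the article reports for its five D-ASPP blocks (3, 6, 12, 18, and 24). The helper names are illustrative assumptions, and the span is that of one layer only, not the cumulative receptive field of the chained blocks.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span covered by one dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def weight_count(kernel_size: int) -> int:
    """Weights in one 2-D kernel slice; the dilation rate does not change it."""
    return kernel_size * kernel_size


# Dilation rates reported in the article for the five D-ASPP blocks.
DILATION_RATES = (3, 6, 12, 18, 24)

for d in DILATION_RATES:
    span = effective_kernel_size(3, d)
    print(f"dilation={d:2d}: 3x3 kernel spans {span}x{span}, "
          f"weights per slice={weight_count(3)}")
```

The number of weights stays at nine for every dilation rate, while the covered span grows linearly with the rate.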


In Fig. 2, $C_1$ denotes the feature map obtained by downsampling the input image; $C = \{C_2, C_3, C_4, C_5\}$ denote the feature maps obtained by the corresponding residual block in the backbone at each stage. In this context, $C_5$ is the output feature map of the last residual block at the final stage of the backbone, which is the input of the skip-ASPP module.

Compared with the traditional convolution operator, the atrous convolution operator could obtain a larger receptive field without increasing the number of kernel parameters. In this article, $D$ in the $D$-ASPP block stands for the dilation rate. Notice that the larger the dilation rate, the larger the corresponding receptive field. As a result, five different $D$-ASPP blocks are employed in the developed ABFPN, which enables the model to capture multiscale context information. In the simulation, the values of $D$ in five $D$-ASPP blocks are set to be 3, 6, 12, 18, and 24, respectively, which are the same as DenseASPP [50]. It should be pointed out that [50] adopts dense connection, which works well in networks with deeper layers, while, in the proposed skip-ASPP module as a part of the ABFPN, skip connection has been employed, which could also reduce the computational complexity so as to speed up the convergence and inference.

The skip connection is applied in the skip-ASPP module to enhance the interaction of the preoutput and postoutput features of each $D$-ASPP block and enhance the feature fusion. The work principle of the whole skip-ASPP module is formulated as follows:

$$
\mathrm{out}_i =
\begin{cases}
C_5 \oplus S_i(C_5), & \text{if } i = 1 \\
\mathrm{out}_{i-1} \oplus S_i(\mathrm{out}_{i-1}), & \text{if } i = 2, 3, 4 \\
S_i(\mathrm{out}_{i-1}), & \text{if } i = 5
\end{cases}
\tag{1}
$$

where $S_i(\cdot)$ $(i = 1, 2, 3, 4, 5)$ stands for the operation of corresponding $D$-ASPP blocks; each $D$-ASPP block contains $1 \times 1$ and $3 \times 3$ atrous convolution operator with dilation_rate = 3, 6, 12, 18, 24, respectively; $\oplus$ is the concatenate operation; and $\mathrm{out}_i$ $(i = 1, 2, 3, 4, 5)$ is the obtained result in each $D$-ASPP, as marked in Fig. 2.

The final output of the skip-ASPP module is calculated by

$$
\mathrm{Out} = S_1(C_5) \oplus S_2(\mathrm{out}_1) \oplus S_3(\mathrm{out}_2) \oplus S_4(\mathrm{out}_3) \oplus S_5(\mathrm{out}_4). \tag{2}
$$

As shown in Fig. 2, the final output of the skip-ASPP module is then added with $C_5$ in the elementwise manner after the $1 \times 1$ convolution operator to obtain the feature map $P_5$. Similar to the conventional FPN, $P = \{P_2, P_3, P_4, P_5\}$ shares a concatenated path from $P_5$, which is combined with $C = \{C_2, C_3, C_4, C_5\}$ through lateral connection. In particular, the upsampled $P_5$, $P_4$, and $P_3$ are merged with the corresponding feature maps $C_4$, $C_3$, and $C_2$ in the elementwise manner. Note that the $1 \times 1$ convolution operation is performed on $\{C_2, C_3, C_4\}$ to reduce the channel dimension before merging with feature maps. After that, the obtained feature maps $P$ (including $P_2$, $P_3$, $P_4$, and $P_5$) are fed into the balanced module. In order to balance detailed and semantic information on small target detection tasks and improve the overall detection performance, the utilized balanced module contains three blocks, which are the resize & average block, the space nonlocal block, and the residual block [39], [48]. The work [39] proposed Libra RCNN that solves the problem of imbalance image sampling and feature selection, especially the balanced operation of different layers. Inspired by this, the utilized “balanced module” in the ABFPN handles the further refined feature maps from FPN and skip-ASPP, which enables detection models to achieve the balance of enhanced feature fusion and obtain more sufficient image context and receptive field information. To be specific, the resize & average block $B_1$ is designed for gathering multilevel features in $P$ by resizing and averaging $P_2$, $P_3$, and $P_5$ to the same size as $P_4$. The output of $B_1$ is

$$
x = \frac{\mathrm{pool}(P_2, 4) + \mathrm{pool}(P_3, 2) + \mathrm{intp}(P_5, 2)}{3} \tag{3}
$$

where $\mathrm{pool}(P_2, 4)$ and $\mathrm{pool}(P_3, 2)$ represent the max pooling operations with stride equaling to 4 and 2 for $P_2$ and $P_3$, respectively; $\mathrm{intp}(P_5, 2)$ denotes the nearest neighbor interpolation for $P_5$ with multiplier factors of height and width equaling to 2.

In general, once the scale of convolutional kernels is determined, the generated receptive field will be restricted to some local regions of the feature map. To overcome the limitation of local information, the space nonlocal module $B_2$ is employed to gather global information of the feature map. Based on the output of $B_1$, the nonlocal output $y_i$ is obtained by

$$
y_i = \frac{\sum_{\forall j} f(x_i, x_j)\, c(x_j)}{\sum_{\forall j} f(x_i, x_j)} \tag{4}
$$

where $x_i \in x$ indicates the information of the current focused location; $x_j$ represents the global information of the output of $B_1$; $c(\cdot)$ is the $1 \times 1$ convolution operator; and $f(\cdot)$ is the Embedded Gaussian function used to calculate the similarity of $x_i$ and $x_j$. $f(\cdot)$ is defined by

$$
f(x_i, x_j) = e^{\theta(x_i)^{T} \cdot \phi(x_j)} \tag{5}
$$

where $\theta(\cdot)$ and $\phi(\cdot)$ both stand for $1 \times 1$ convolution operator. According to [48], the output of block $B_2$ is

$$
z_i = c(y_i) + x_i \tag{6}
$$

where $c(\cdot)$ is the $1 \times 1$ convolution operator.

The residual block $B_3$ scatters refined features from the output of $B_2$ in a multilevel manner through a residual path. To be specific, the operation of block $B_3$ can be expressed by the following formula:

$$
F_k = P_k + \mathrm{intp}\left(z_k, 0.5^{k-4}\right), \quad (k = 2, 3, 4, 5) \tag{7}
$$

where $\mathrm{intp}(\cdot)$ resizes the output of the space nonlocal block $z_k$ to be identical with the corresponding feature maps $P_k$ $(k = 2, 3, 4, 5)$. Finally, via a skip connection, the output feature maps $F = \{F_2, F_3, F_4, F_5\}$ of the entire ABFPN approach are obtained.

C. Robustness Enhancement Strategies for Object Detection

It is worth pointing out that the proposed ABFPN serves as the neck part in an object detection framework.
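The space nonlocal step of the balanced module, Eqs. (4) to (6) in Section II-B, amounts to a softmax-weighted average over all positions followed by a residual addition. The sketch below is a toy illustration under a loud simplifying assumption: the learned 1 × 1 convolutions θ, φ, and c are replaced by identity maps, and a feature map is flattened into a list of per-position vectors. The function names are hypothetical, and this is not the authors' implementation.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equally sized vectors."""
    return sum(u * v for u, v in zip(a, b))


def nonlocal_block(x: List[Vector]) -> List[Vector]:
    """Eqs. (4)-(6) with theta, phi, and c taken as identity maps (an assumption)."""
    z = []
    for x_i in x:
        # Exponent of Eq. (5): similarity of position i to every position j.
        scores = [dot(x_i, x_j) for x_j in x]
        peak = max(scores)  # shifting by the max leaves the ratio in Eq. (4) unchanged
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        # Eq. (4): similarity-weighted average over all positions.
        y_i = [sum(w * x_j[d] for w, x_j in zip(weights, x)) / total
               for d in range(len(x_i))]
        # Eq. (6): residual connection back to the input position.
        z.append([y + v for y, v in zip(y_i, x_i)])
    return z


features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three positions, two channels
for z_i in nonlocal_block(features):
    print([round(v, 3) for v in z_i])
```

Each output position mixes information from every other position, which is how the block gathers global context that a fixed-size kernel cannot reach.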


The developed ABFPN aims to sufficiently merge abundant context information with the hope to achieve satisfactory detection accuracy for small-size objects. To further improve the generalization ability and detection accuracy of the framework, some existing robustness enhancement strategies are employed in other components of the object detection framework.
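As a reminder of how the neck merges context, the wiring of the skip-ASPP module in Eqs. (1) and (2) of Section II-B can be traced with a sketch that tracks only channel labels. Everything here is an illustrative assumption: `make_block` is a hypothetical stand-in for a D-ASPP block that emits placeholder labels instead of applying atrous convolutions, and the toy channel counts (4 for C5, 2 per block) are not taken from the article.

```python
from typing import Callable, List, Tuple

Feature = List[str]  # a feature map reduced to a list of channel labels


def make_block(index: int, out_channels: int) -> Callable[[Feature], Feature]:
    """Hypothetical stand-in for S_i: emits `out_channels` new channel labels."""
    def block(inputs: Feature) -> Feature:
        # A real D-ASPP block would apply 1x1 and 3x3 atrous convolutions to `inputs`.
        return [f"S{index}[{ch}]" for ch in range(out_channels)]
    return block


def skip_aspp(c5: Feature,
              blocks: List[Callable[[Feature], Feature]]
              ) -> Tuple[List[Feature], Feature]:
    """Follow Eq. (1) for each out_i and Eq. (2) for the final output."""
    outs: List[Feature] = []
    branch_outputs: List[Feature] = []
    current = c5
    last = len(blocks)
    for i, block in enumerate(blocks, start=1):
        s_i = block(current)  # S_1(C5) for i = 1, S_i(out_{i-1}) otherwise
        branch_outputs.append(s_i)
        # Eq. (1): concatenate the block input with its output, except for the last block.
        current = s_i if i == last else current + s_i
        outs.append(current)
    # Eq. (2): concatenate the outputs of all blocks.
    final = [ch for s in branch_outputs for ch in s]
    return outs, final


c5 = [f"C5[{ch}]" for ch in range(4)]             # toy 4-channel input
blocks = [make_block(i, 2) for i in range(1, 6)]  # each toy block emits 2 channels
outs, final = skip_aspp(c5, blocks)
print([len(o) for o in outs])  # channel counts of out_1 .. out_5
print(len(final))              # channel count of Out
```

In this reading, the input of each block grows by one block's worth of channels at every step, whereas a dense connection would feed every earlier output to every later block.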
Fig. 3. Illustration of the enhanced ResNeXt block.

In this article, two well-known data augmentation techniques, the AutoAugmetImage method [8] and the Mixup method [52], are applied in the input part of the model training process. Both of them can enhance the model performance, and to be specific, AutoAugmetImage can automatically select the optimal combination of enhancement strategies for different datasets, which customizes a data-specific augmentation scheme, whereas the Mixup method can enrich the database via randomly mixing two samples, including their labels. In this way, the influence of samples with the wrong label can be greatly reduced so that the model robustness is improved.
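The mixing step of Mixup can be sketched in a few lines. This is a minimal classification-style illustration, assuming flattened inputs, one-hot labels, and a mixing coefficient drawn from a Beta(alpha, alpha) distribution. The function name and the default `alpha` are assumptions rather than values from the article, and the article does not spell out how bounding-box labels are combined for detection.

```python
import random
from typing import List, Optional, Tuple


def mixup(x_a: List[float], y_a: List[float],
          x_b: List[float], y_b: List[float],
          alpha: float = 1.0,
          rng: Optional[random.Random] = None
          ) -> Tuple[List[float], List[float], float]:
    """Blend two samples and their labels with a shared coefficient lam."""
    rng = rng or random.Random()
    lam = rng.betavariate(alpha, alpha)  # mixing coefficient in [0, 1]
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y, lam


# Toy usage: two 2-pixel "images" with one-hot labels for two classes.
mixed_x, mixed_y, lam = mixup([1.0, 1.0], [1.0, 0.0],
                              [0.0, 0.0], [0.0, 1.0],
                              rng=random.Random(0))
print(mixed_x, mixed_y, lam)
```

Because the label is blended with the same coefficient as the input, a single mislabeled sample contributes only a fraction of the training target, which is in line with the robustness argument above.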
In the backbone part, the ResNet proposed in [17] has become a popular network structure. In this article, the ResNeXt structure [49] is adopted as the backbone, which includes stacked bottleneck paths with the same topology and one shortcut pooling path. It should be highlighted that each bottleneck path contains a squeeze-and-excitation (SE) attention mechanism, which is denoted as the attention bottleneck path in this article. Unimportant channel features are suppressed via an SE operator in each path, and the SE operator essentially consists of one global average pooling layer and two fully connected layers with the sigmoid function [21]. Moreover, the deformable convolution operator [54] is employed as a substitution of the traditional convolution operator so that the receptive field can be adaptively adjusted according to size, posture, and other geometric changes of the objects. Furthermore, a stride equaling to 2 is shifted from the first 1 × 1 convolution operator to the 3 × 3 one in each attention bottleneck path. In addition, a stride equaling to 2 is shifted from the 1 × 1 convolution operator to the 2 × 2 average pooling operator in the shortcut pooling path. The operation of shifting the position with a stride size of 2 could prevent the loss of a large amount of feature information. The diagram of the enhanced ResNeXt block is displayed in Fig. 3.

The cascade RCNN introduced in [4] is selected as the detection head in this article, which is denoted by cascade RCNN* in Fig. 2. Specifically, the complete intersection over union (CIoU) loss proposed in [57] is applied to evaluate the predicted bounding box. The DIoU-NMS serves as the postprocessing method [57]. The CIoU loss and DIoU-NMS are utilized in the postprocessing stage of the head part in the object detection framework. It is remarkable that the employment of CIoU loss and DIoU-NMS considers: 1) the overlap areas between the predicted box and ground truth; 2) the distance between the center points of the predicted box and ground truth; and 3) the aspect ratio of the bounding box, which would lead to a more reliable prediction result than traditional methods.

III. EVALUATIONS OF THE PROPOSED ASPP-BALANCED-FPN ON BENCHMARK DATASETS

In this section, sufficient ablation studies are conducted on three public benchmark datasets for verifying the performance of the proposed ABFPN, which are the COCO [34], the VOC [11], and the VisDrone detection dataset [56]. A brief introduction of adopted datasets and experimental settings is presented. Meanwhile, the faster RCNN with ResNet50 [43] is selected as the baseline of detection method to verify the effectiveness and generalization ability of the proposed ABFPN along with the utilized robustness enhancement strategies. A series of ablation studies are performed under the same condition for evaluation.

A. Experiment Settings and Datasets

In this work, three well-known benchmark datasets in object detection, the MS COCO2017, the Pascal VOC07+12, and the VisDrone2019 detection datasets, are applied for performance evaluation. The COCO2017 dataset is a large-scale image dataset consisting of 330 000 images of which more than 200 000 are labeled. In COCO2017, there are 1.5 million object instances belonging to 80 categories. The Pascal VOC07+12 dataset contains two mutually exclusive image datasets (i.e., VOC2007 and VOC2012), which covers 20 kinds of objects, and the number of instances in Pascal VOC07+12 is over 20 000. The VisDrone2019 dataset contains ten classes and 54 200 instances of remotely sensed objects collected by drones, which covers complex scenes under different weather and lighting conditions, and the detected targets are relatively small in size, which makes the detection more challenging.

In the experiment on the COCO2017 dataset, the numbers of training and testing samples are 118 287 and 5000, respectively. For the VOC07+12 dataset, 16 551 images are used for training, and 4952 images are utilized for testing. The training and validation sets of the VisDrone2019 detection dataset have 7018 and 1609 images, respectively. All models are trained on the PaddlePaddle 1.8.4 framework with a single


GPU TeslaV100 (16 GB memory). Detailed information of experimental settings on three datasets is presented in Table I.

TABLE I
EXPERIMENTAL SETTINGS

TABLE II
ABLATION STUDY ON THE MS COCO2017 DATASET

TABLE III
ABLATION STUDY ON THE PASCAL VOC07+12 TESTING DATASET

B. Experimental Results

As aforementioned, evaluations of the proposed ABFPN and several designed strategies are performed mainly in the form of an ablation study on three benchmarks, where the two-stage network faster RCNN is selected as the baseline method.

1) Validation on COCO2017: Table II presents the experimental results of the ablation study on the COCO2017 dataset, where the evaluation metrics include average precision and its extensions. To be specific, AP is average precision over IoU at [0.5:0.95:0.05] (from 0.5 to 0.95 with the interval of 0.05). AP@50 is AP over IoU at 0.5. $AP_s$, $AP_m$, and $AP_l$ refer to average detection precision on small-, medium-, and large-scale objects, respectively. As shown in Table II, the AP of the proposed ABFPN is 0.9% larger than that of the FPN. On the $AP_s$ and $AP_m$ metrics, the results of the ABFPN are 3.7% and 0.7% larger than those of the FPN, respectively, while the $AP_l$ result of the ABFPN is slightly smaller than that of the FPN, and the AP@50 of both methods is the same. Furthermore, experimental results of the model (that combines the ABFPN with the robustness enhancement strategies) are better than those of the faster RCNN with the neck of the FPN on all metrics. Specifically, the AP, AP@50, $AP_s$, $AP_m$, and $AP_l$ of the faster RCNN with the ABFPN and strategies are larger than those with the FPN by 4.7%, 4.4%, 6.6%, 4.3%, and 4.1%, respectively.

According to the experimental results, the proposed ABFPN is a reliable feature fusion method, which greatly increases the detection precision of small-size objects. However, the proposed ABFPN does not perform well on the indicator $AP_l$, which may be caused by overfitting because the ABFPN concentrates on the latent context information. By introducing a series of robustness enhancement strategies, the deficiency of the ABFPN on the indicator $AP_l$ is overcome. Other indicators have been significantly increased as well, indicating improved overall performance. As such, the combination of the ABFPN and robustness enhancement strategies performs better than the ABFPN-based faster RCNN and the traditional faster RCNN based on experimental results on the COCO2017 dataset.

2) Validation on VOC07+12: Experimental results on the VOC07+12 testing set are displayed in Table III. The popular metric mAP (0.50, 11point) is employed on the VOC07+12 dataset, where mAP (0.50, 11point) stands for the mean average precision values of 11 points with IoU greater than 0.5 and recall in the range of [0:1:0.1] (from 0 to 1 with the interval of 0.1).

In Table III, it can be clearly observed that the mAP of the ABFPN is 84.06%, which is nearly 1% larger than that of the standard FPN. After introducing the robustness enhancement strategies, the mAP value of the modified ABFPN-based object detection framework is further increased to 85.59%, which indicates that the applied robustness enhancement strategies, indeed, improve the overall performance of the framework.

Furthermore, the performance comparison of the faster RCNN [43], the hierarchical shot detector (HSD) [5], the Perona Malik [24], the intertwiner network (InterNet) [28], the refinement detector (RefineDet) [53], the Blitz Network (BlitzNet) [10], the early exit evolutionary architecture network (EEEA-Net) [46], and our method on the VOC07+12 dataset is shown in Table IV. Notice that the data of the utilized methods are directly obtained from the corresponding literature, which is marked in Table IV. Experimental results demonstrate the effectiveness of the proposed ABFPN for small-size object detection compared with some state-of-the-art algorithms. It is noteworthy that the comparison algorithms


used are architecturally designed to be suitable for small target detection tasks; hence, the results are fully comparable. Specifically, the proposed method achieves the best result in terms of mAP.

TABLE IV
DETECTION EVALUATION RESULTS OF DIFFERENT ALGORITHMS ON THE PASCAL VOC07+12 TESTING DATASET

3) Validation on VisDrone2019: In this part, the results of the ABFPN-based object detection framework with the robustness enhancement strategies on the VisDrone2019 detection dataset are presented together with the ablation studies in Table V. It is worth mentioning that detection on the VisDrone2019 dataset is a difficult small-sized target detection task, and the chosen evaluation metrics are the same as those used on the COCO2017 dataset. As can be seen from Table V, the ABFPN also yields a 1% improvement in average precision on complex detection tasks compared to the FPN, and the better performance is especially noticeable on smaller targets. When the related strategies are further introduced, the improvement in the five metrics AP, AP@50, APs, APm, and APl is 2.5%, 3.2%, 2.3%, 3.5%, and 2.7%, respectively, compared to the original FPN.

The validation of the ablation experiments on the above three challenging public datasets demonstrates the effectiveness of the proposed ABFPN and related strategies, which are particularly suitable for small-sized detection tasks; meanwhile, the generalization ability of the ABFPN is also proven on multiple databases. To further validate the practicality of the ABFPN, in the next section, it is applied to detect tiny surface defects of PCBs.

IV. APPLICATION IN PCB DEFECT DETECTION

In this section, an IPDD framework is designed to detect tiny surface defects in PCBs, where the proposed ABFPN is incorporated with the aforementioned robustness enhancement strategies. To verify its effectiveness and practicality, the developed IPDD framework is tested on the public PCB defect dataset.

A. IPDD Framework

The proposed IPDD framework consists of the input layer, the backbone, the neck, and the detection head. The diagram of the IPDD framework is displayed in Fig. 4. The enhancement strategies used in each part of the IPDD framework are described in Section II-C. It is worth emphasizing that the enhanced ResNeXt structure (including 152 layers with 50 blocks) is selected as the backbone, which is denoted as Enhanced-ResNeXt-152. Meanwhile, the proposed ABFPN is chosen as the neck part, and the cascade RCNN* is selected as the detection head.

Fig. 4. Diagram of the proposed IPDD framework.

In object detection, localization and classification are the most significant tasks, by which the object bounding box and the corresponding category are determined. In Fig. 4, the localization and classification are highlighted within a blue box. In the proposed IPDD framework, the head part employs a region proposal network (RPN) to obtain regions of interest (RoI). In addition, the RPN is applied to distinguish the foreground (i.e., the PCB surface defects) from the background. As stated previously, the feature maps F = {F2, F3, F4, F5} are the final output of the neck part and are also the input of the detection head. Then, multiple proposals with different sizes and aspect ratios are generated at each position of the feature map. Each proposal is matched with the corresponding ground truth and filtered by an IoU threshold, which distinguishes positive and negative samples.
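As a rough illustration of the matching step described above, the following sketch labels each proposal by its best IoU with the ground-truth boxes. The function names `iou` and `label_proposals`, the (x1, y1, x2, y2) box format, and the thresholds 0.7 and 0.3 are assumptions made for this example only; the thresholds actually used in the IPDD framework are not stated in this section.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_proposals(proposals, gt_boxes, pos_thr=0.7, neg_thr=0.3):
    """Return 1 (positive), 0 (negative) or -1 (ignored) for each proposal.

    pos_thr and neg_thr are hypothetical values chosen for illustration.
    """
    labels = []
    for proposal in proposals:
        # Best overlap of this proposal with any ground-truth box.
        best = max((iou(proposal, gt) for gt in gt_boxes), default=0.0)
        if best >= pos_thr:
            labels.append(1)
        elif best < neg_thr:
            labels.append(0)
        else:
            labels.append(-1)
    return labels


proposals = [(0, 0, 10, 10), (0, 0, 10, 5), (20, 20, 30, 30)]
gt_boxes = [(0, 0, 10, 10)]
labels = label_proposals(proposals, gt_boxes)
```

In this toy case the first proposal coincides with the defect, the second overlaps it only partially and is ignored, and the third is treated as background.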


TABLE V
ABLATION STUDY ON THE VISDRONE2019 DETECTION DATASET
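Once proposals are labelled, the RPN is trained with the regression term of (8) and the classification term of (9). A minimal sketch of both is given here, assuming that (8) takes the usual Huber form (a quadratic branch 0.5·dif² for dif < σ and a linear branch σ·dif − 0.5·σ² otherwise) and that the label l in (9) is one-hot, so that only the score of the true class j enters the first term. The function names and the list-based inputs are illustrative and are not part of the original framework.

```python
import math


def rpn_bbox_loss(loc_pred, loc_target, sigma=3.0):
    """Mean Huber-style regression loss, assumed form of Eq. (8).

    sigma = 3 follows the threshold quoted in the text.
    """
    terms = []
    for p, t in zip(loc_pred, loc_target):
        dif = abs(p - t)
        if dif < sigma:
            terms.append(0.5 * dif ** 2)
        else:
            terms.append(sigma * dif - 0.5 * sigma ** 2)
    return sum(terms) / len(terms)


def rpn_cls_loss(scores, true_classes):
    """Mean softmax cross-entropy, assumed reading of Eq. (9).

    scores[n] holds the class scores of sample n and true_classes[n]
    is the index j of its real label (one-hot assumption).
    """
    total = 0.0
    for s, j in zip(scores, true_classes):
        m = max(s)  # subtract the maximum for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in s))
        total += -s[j] + log_sum_exp
    return total / len(scores)


bbox_loss = rpn_bbox_loss([0.0, 5.0], [1.0, 0.0])
cls_loss = rpn_cls_loss([[0.0, 0.0]], [0])
```

The two branches of the regression loss meet at dif = σ, so large localization errors grow linearly rather than quadratically.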

The bounding box loss function $L_{\text{rpn\_bbox}}$ in the RPN is expressed by

$$
L_{\text{rpn\_bbox}} =
\begin{cases}
M\left(0.5\,(\mathrm{loc}_p - \mathrm{loc}_t)^2\right), & \text{if } \mathrm{dif} < \sigma \\
M\left(\sigma\,|\mathrm{loc}_p - \mathrm{loc}_t| - 0.5\,\sigma^2\right), & \text{otherwise}
\end{cases}
\tag{8}
$$

where $M$ is the average operation; $\mathrm{loc}_p$ and $\mathrm{loc}_t$ represent the location of the bbox (short for bounding box) predicted by the RPN and the target bbox, respectively; $\mathrm{dif} = |\mathrm{loc}_p - \mathrm{loc}_t|$ is the absolute value of the difference between $\mathrm{loc}_p$ and $\mathrm{loc}_t$; and $\sigma$ is the threshold parameter, which is set to 3 in this simulation.

Besides, the classification loss function of the RPN is

$$
L_{\text{rpn\_cls}} = M\left(-s_{\text{cls}}^{j} \cdot l_j + \log \sum_{i=0}^{K} \exp\left(s_{\text{cls}}^{i}\right)\right), \quad j = 1, 2, \ldots, K
\tag{9}
$$

where $s_{\text{cls}}$ denotes the prediction score, $l$ is the real label, and $K$ represents the total number of categories.

It should be highlighted that the RPN only produces rough proposals, which need further refinement. In fact, a single PCB image may contain more than one defect. As such, it is of vital significance to further identify each type precisely from the proposals. Both the feature map $F$ and the generated RoIs then pass through a series of cascade operations, denoted by the RoI align and the Bbox head blocks, as shown in Fig. 4.

Three cascade levels are used, in which the proposals are resampled to increase their IoU value stage by stage. The "RoI align" blocks adjust features of the candidate areas to a fixed size through the pooling operation. The "Bbox head" blocks obtain the prediction bounding box $B_{\text{pre}}$ and classification score $S_{\text{cls}}$. Each cascade stage is trained by using the positive and negative samples with different IoUs, and the output of the previous stage serves as the input of the next stage. If the IoU of the generated RoI increases, the next cascade stage will focus on a certain area in the updated proposal, so as to improve the detection accuracy.

For loss functions of the detection head, the classification loss function $L_{\text{head\_cls}}$ adopts the cross-entropy loss function, as shown in (9), and the CIoU loss mentioned in Section II-C is used for the bounding box loss $L_{\text{head\_bbox}}$. The bounding box loss $L_{\text{head\_bbox}}$ of the head is calculated by

$$
L_{\text{head\_bbox}} = M\left(1 - \mathrm{IoU} + \mathrm{dist}(b_p, b_t) + \alpha\nu\right)
\tag{10}
$$

where $b_p$ and $b_t$ represent the predicted box and the real bounding box, respectively; $\alpha$ and $\nu$ are two influence factors with respect to the aspect ratio of $b_p$ and $b_t$. $\mathrm{dist}(\cdot)$ calculates the distance between $b_p$ and $b_t$, which is defined by

$$
\mathrm{dist}(b_p, b_t) = \frac{\rho^2(b_p, b_t)}{c^2}
\tag{11}
$$

where $c$ is the diagonal distance of the smallest bounding rectangle that can cover both $b_p$ and $b_t$; $\rho(\cdot)$ stands for the Euclidean distance.

The total loss function of the cascade RCNN* is given by

$$
L_{\text{total}} = L_{\text{rpn\_cls}} + L_{\text{rpn\_bbox}} + \sum_{i=1}^{3}\left(L^{i}_{\text{head\_cls}} + L^{i}_{\text{head\_bbox}}\right)
\tag{12}
$$

where $L_{\text{head\_cls}}$ is the cross-entropy loss function, as shown in (9).

The DIoU-NMS method is applied for further refining the prediction results to preserve the best bounding box, as there may be other redundant PCB tiny defects. It is worth mentioning that the DIoU-NMS method considers not only the IoU value but also the distance between the center points of two bounding boxes. The DIoU-NMS method provides a score as reference, and the process of the method is

$$
\mathrm{score}_i =
\begin{cases}
\mathrm{score}_i, & \text{if } |\mathrm{IoU} - \mathrm{dist}(b_M, b_i)| < \varepsilon \\
0, & \text{otherwise}
\end{cases}
\tag{13}
$$

where $\varepsilon$ is the threshold of the DIoU-NMS method, which is set to 0.5 in this work; $b_M$ is the bounding box with the highest confidence value, and $b_i$ stands for nearby boxes. If the score is set to 0, the corresponding box is redundant for a certain defect and will be filtered out. Otherwise, a small value of $|\mathrm{IoU} - \mathrm{dist}(b_M, b_i)|$ implies that the obtained box may belong to another defect, which should not be eliminated arbitrarily.

The pseudocode of the proposed IPDD framework is provided in Algorithm 1.

B. Evaluation Results and Discussions of the IPDD Framework

To evaluate the performance of the proposed IPDD framework, the PKU public PCB defect detection dataset has been adopted [9]. Some existing defect detection algorithms have been utilized for performance evaluation, including the Impro YOLOv3 [29], the FCOS [47], the PP-YOLO [38], the Impro faster RCNN [19], the TDD-Net [9], the deformable DETR [55], and the SNIPER [45]. Among the utilized methods, the Impro faster RCNN, the TDD-Net, the deformable DETR, and the SNIPER are two-stage methods, which are similar to our IPDD framework.

The utilized dataset contains 693 images with six different types of defects (including missing hole, mouse bite, open


circuit, short, spur, and spurious copper). The dataset is visualized in Fig. 5, where the number of each defect type is plotted in Fig. 5(a). The quantity area_ratios is the proportion of the ground-truth bounding box to the entire image, which reflects the relative size of the objects to be detected. In Fig. 5(b), it is clear that almost all defects occupy only a tiny area of an image, which makes it challenging to achieve accurate positioning and classification results.

Fig. 5. Partial visualization information of the PCB defect detection dataset. (a) Number of each PCB defect type. (b) Frequency of various area_ratios.

Algorithm 1 Pseudocode of the Proposed IPDD Framework
Require: RGB images with PCB surface defects
Ensure: The predicted bounding boxes and corresponding classification results of PCB defects
1: Use the AutoAugmentImage and mixup techniques for data augmentation;
2: The backbone part Enhanced-ResNeXt-152 returns feature maps C = {C2, C3, C4, C5};
3: The neck part ABFPN outputs feature maps F = {F2, F3, F4, F5} based on Eq. 1 to Eq. 7;
4: Enter the region proposal network (RPN) in the head part to generate regions of interest (RoI);
5: Calculate the loss of the RPN, including $L_{\text{rpn\_bbox}}$ by Eq. 8 and $L_{\text{rpn\_cls}}$ by Eq. 9;
6: For i from 1 to 3:
     Perform RoI align feature extraction;
     Obtain the updated bounding box $B_{\text{pre}}$ and classification score $S_{\text{cls}}$;
     Calculate $L_{\text{head\_bbox}}$ referring to Eq. 10 and Eq. 11, and $L_{\text{head\_cls}}$ referring to Eq. 9;
   Endfor
7: Calculate the total loss $L_{\text{total}}$ of the head part according to Eq. 12;
8: Apply the DIoU-NMS method for further refinement;
9: Get the final prediction bounding boxes and corresponding classification scores.

In the simulation, the proposed IPDD framework is trained for 50 000 iterations, and the initial learning rate is 0.00125. The decay factor is 0.1 in the iteration interval of [42 000, 48 000]. A total of 593 PCB images are randomly selected as the training samples, whereas the remaining 100 images are used for testing. Other experimental settings and environments remain the same as presented in Section III.

1) Algorithm Verification and Comparison: The change curves of five loss functions (i.e., $L_{\text{rpn\_cls}}$, $L_{\text{rpn\_bbox}}$, $L_{\text{head\_cls}}$, $L_{\text{head\_bbox}}$, and $L_{\text{total}}$) are shown in Fig. 6, where each loss value is calculated every 50 iterations. It is observed that, after nearly 880 × 50 = 44 000 iterations, the oscillation of $L_{\text{total}}$ is restricted to a small range, which can be deemed a stable state. Besides, Fig. 6(f) presents the change of AP, where AP@50 and AP@75 denote AP at IoU thresholds of 0.5 and 0.75, respectively. The precision value is sampled every 2000 iterations, and when the number of evaluations reaches 22, i.e., the number of iterations is 22 × 2000 = 44 000, the curves tend to converge, with AP, AP@50, and AP@75 at 56.4%, 98.8%, and 57.8%, respectively.

Table VI displays the comparison results of the proposed IPDD framework and the other seven state-of-the-art detection methods. It is noteworthy that Impro YOLOv3, FCOS, deformable DETR, and SNIPER are all detection methods with excellent performance in small-sized object detection tasks, and TDD-Net is a method proposed specifically for PCB small defect detection. The evaluation metrics are the same as presented in Table II, with two extra ones, namely AP@75 and the average recall (AR) rate. The larger the AR rate, the more positive samples are classified correctly. As shown in Table VI, the proposed IPDD framework achieves the best results on all performance indicators, which demonstrates the effectiveness of the IPDD framework for PCB defect detection. In particular, the IPDD framework outperforms the suboptimal method SNIPER on all evaluation metrics of AP, AP@50, AP@75, APs, APm, APl, and AR. Compared with Impro YOLOv3 (which ranks second on APs), the indicator APs is improved by 1.7% when using the IPDD framework, which indicates the superiority of the proposed IPDD framework in detecting small defects.

In addition, TDD-Net, a dedicated algorithm proposed for PCB tiny defect detection, is selected in this article as a comparison method for visualization and subsequent error analysis. For an intuitive view, experimental results of the proposed IPDD framework and the TDD-Net are visualized in Fig. 7. The first two columns are results obtained by the IPDD framework, where images are enlarged for a clear view. Similarly, the last two columns are results obtained by the TDD-Net. It should be highlighted that, for the mouse_bite defect shown in line 3, the TDD-Net outputs a


redundant prediction bounding box. With the proposed IPDD framework, the positioning is more accurate than that of the TDD-Net, with a higher confidence value of 0.99, which shows that the proposed IPDD framework demonstrates better overall performance than TDD-Net in terms of both localization and classification of small objects.

Fig. 6. Iteration curves of loss functions and precision. (a) $L_{\text{rpn\_cls}}$. (b) $L_{\text{rpn\_bbox}}$. (c) $L_{\text{head\_cls}}$. (d) $L_{\text{head\_bbox}}$. (e) $L_{\text{total}}$. (f) Precision.

TABLE VI
COMPARISONS OF DIFFERENT METHODS FOR PCB DEFECT DETECTION
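A redundant box such as the one reported above for TDD-Net is the kind of output that the DIoU-NMS step of (13) is meant to remove. The sketch below is a hedged illustration rather than the authors' implementation: it computes the normalized center distance of (11) and applies a greedy suppression loop. It uses the signed criterion IoU − dist < ε from the DIoU-NMS formulation of [57], whereas (13) is written with an absolute value, so the two can differ for boxes that are far apart. The function names, the (x1, y1, x2, y2) box format, and the greedy loop are assumptions made for this example; ε = 0.5 follows the text.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def center_dist(a, b):
    """Eq. (11): squared center distance over squared enclosing diagonal."""
    rho2 = ((a[0] + a[2]) / 2 - (b[0] + b[2]) / 2) ** 2 \
        + ((a[1] + a[3]) / 2 - (b[1] + b[3]) / 2) ** 2
    cw = max(a[2], b[2]) - min(a[0], b[0])
    ch = max(a[3], b[3]) - min(a[1], b[1])
    c2 = cw ** 2 + ch ** 2
    return rho2 / c2 if c2 > 0 else 0.0


def diou_nms(boxes, scores, eps=0.5):
    """Greedy DIoU-NMS; returns the indices of the boxes that are kept."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        m = order.pop(0)  # box with the highest remaining confidence
        keep.append(m)
        # Keep a neighbour only if it is not a near-duplicate of box m.
        order = [i for i in order
                 if iou(boxes[m], boxes[i]) - center_dist(boxes[m], boxes[i]) < eps]
    return keep


boxes = [(0, 0, 10, 10), (1, 0, 11, 10), (50, 50, 60, 60)]
scores = [0.99, 0.80, 0.90]
kept = diou_nms(boxes, scores)
```

Here the second box almost coincides with the first and is suppressed, while the distant third box is retained as a separate defect.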
Fig. 7. Comparison of visualization results of our IPDD framework (left two columns) and TDD-Net (right two columns).

Furthermore, to comprehensively evaluate the detection performance of the proposed IPDD framework on each type of defect, the precision–recall (PR) and score–recall (SR) curves are employed. Experimental results are shown in Fig. 8, where the IoU threshold is fixed to 0.5. The PR curve reflects a tradeoff between classification accuracy and the capability to cover positive samples (i.e., recall). The SR curve shows the confidence scores under different recall values. Generally, the precision and confidence scores decline monotonically as recall increases. Thus, an effective and practical model is supposed to keep the precision and confidence scores stable even when the recall is increased. As a result, the larger the area enclosed by the PR and SR curves and the coordinate axes, the better the performance of the model. Fig. 8 shows that the IPDD framework is able to keep the precision and confidence score at a high level with growing recall, which validates the robustness and reliability of the IPDD framework on PCB defect detection.

2) Ablation Study and Error Analysis: To further validate the effectiveness of our proposed IPDD framework, an ablation study has been conducted in this article, where two


variant IPDD frameworks (i.e., IPDD-Nv1 and IPDD-Nv2) are adopted. To be specific, the IPDD framework employs both the designed ABFPN and other robustness enhancement strategies. In the variant IPDD-Nv1, the input layer is the original one without data augmentation techniques, the backbone is the conventional ResNeXt-152, the neck part is the proposed ABFPN, and the cascade RCNN is used as the detection head. The only difference between IPDD-Nv2 and IPDD-Nv1 is that the neck part in IPDD-Nv2 is a conventional FPN.

Fig. 8. Precision–recall and score–recall curves of each defect type (IoU = 0.5).

TABLE VII
ABLATION STUDY RESULTS OF THE IPDD FRAMEWORK

The ablation study results are shown in Table VII. It can be seen in Table VII that IPDD-Nv1 outperforms IPDD-Nv2 on all indicators, particularly on APs. The APs of IPDD-Nv1 is 3.1% larger than that of IPDD-Nv2, which indicates the competitiveness of the designed ABFPN as a feature fusion method. By further introducing robustness enhancement strategies, it is found that, except for APs, the IPDD framework has increased by 0.6%, 0.6%, 3.6%, 0.6%, 2.9%, and 0.7%, respectively, on AP, AP@50, AP@75, APm, APl, and AR compared with IPDD-Nv1. On the APs metric, the value of the proposed IPDD framework is slightly smaller than that of IPDD-Nv1, mainly because the applied strategies focus on objects of medium or large size. As such, the proposed IPDD framework achieves satisfactory overall detection performance. Improvements on the other six indicators have demonstrated that other introduced strategies can effectively enhance the robustness of the model.

Fig. 9 is the scatter plot of the precision and recall, including eight PCB defect detection methods and the two variant IPDD frameworks. Based on the relationship between precision and recall, a point in the upper right corner indicates that the model is robust. As can be seen in Fig. 9, the proposed IPDD framework is the best out of ten methods. It should also be noticed that the variant IPDD-Nv1, which only employs the proposed ABFPN, ranks second, which implies that the introduced ABFPN is competitive in small object detection.

In addition, the PR curve is used for error analysis [18]. Fig. 10(a) and (b) shows the PR curves of the TDD-Net and the IPDD framework, where seven colored areas are marked. To be specific, C75 and C50 stand for the area enclosed by the PR curve and coordinate axes at IoU = 0.75 and IoU = 0.5, respectively. Compared with the TDD-Net, the proposed IPDD framework improves the AP by 3.6% on C50 and 14.1% on C75, which indicates the effectiveness and superiority of the proposed ABFPN in positioning. After removing location errors, the obtained new area is denoted by the indicator Loc. Notice that the AP of the IPDD framework on Loc is further increased from 98.8% to 99.3%, whereas the AP of TDD-Net on Loc is changed from 95.2% to 96.5%, which indicates that inaccurate localization is a common reason


that causes the low detection performance. To conclude, the proposed IPDD framework performs better than the TDD-Net. The indicator Oth is the value of AP after eliminating all misclassification results; furthermore, when all false-positive samples are removed, the AP value is characterized by BG. It is found that both Oth and BG remain unchanged in Fig. 10(b), which shows that the IPDD framework could achieve precise classifications. The results on AP regarding Oth and BG in the TDD-Net demonstrate that the classification accuracy of the TDD-Net is worse than that of the proposed IPDD framework. The last indicator FN is the AP value after eliminating all kinds of mistakes. Based on the above discussions, the proposed IPDD framework demonstrates remarkable classification accuracy, and the main reason for inaccurate detection is imperfect positioning performance.

Fig. 9. Scatter plot of the precision–recall relationship of each algorithm.

Fig. 10. Error analysis via precision–recall curves. (a) TDD-Net. (b) IPDD framework.

V. CONCLUSION

In this article, an IPDD framework has been put forward for PCB surface defect detection, where an ABFPN has been designed as the neck part of the IPDD framework for feature fusion. In the developed ABFPN, atrous convolution operators with different dilation rates have been utilized to enlarge the receptive field. Skip connections have been adopted for the atrous convolution operators, which enhance the interactions among features at different levels. In addition, a balanced module has been introduced in the ABFPN for studying the semantic information of the obtained features. The performance of the ABFPN has been evaluated on three public datasets, and the ablation studies demonstrate the effectiveness of the ABFPN, especially for small-sized objects. The designed IPDD framework has been successfully applied to PCB surface defect detection, a small object detection task. Several robustness enhancement strategies have been employed in the IPDD framework to further improve the overall detection performance. Experimental results have demonstrated the superiority of the proposed IPDD framework over seven state-of-the-art methods in terms of both localization and classification.

In the future, we aim to: 1) apply the proposed IPDD framework to other small object detection tasks, such as defect detection of industrial components and object detection in pastoral landscapes; 2) investigate a precise localization method to improve the positioning performance of the IPDD framework; and 3) utilize evolutionary computation algorithms to tune the hyperparameters of the proposed IPDD framework.

REFERENCES

[1] A. Bochkovskiy, C.-Y. Wang, and H.-Y. Mark Liao, “YOLOv4: Optimal speed and accuracy of object detection,” 2020, arXiv:2004.10934.
[2] Y. Bao et al., “Triplet-graph reasoning network for few-shot metal generic surface defect segmentation,” IEEE Trans. Instrum. Meas., vol. 70, pp. 1–11, 2021.
[3] N. Bodla, B. Singh, R. Chellappa, and L. S. Davis, “Soft-NMS–improving object detection with one line of code,” in Proc. IEEE Int. Conf. Comput. Vis., Oct. 2017, pp. 5562–5570.
[4] Z. Cai and N. Vasconcelos, “Cascade R-CNN: High quality object detection and instance segmentation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 43, no. 5, pp. 1483–1498, May 2021.
[5] J. Cao, Y. Pang, J. Han, and X. Li, “Hierarchical shot detector,” in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2019, pp. 9704–9713.
[6] E. Chen, O. Haik, and Y. Yitzhaky, “Online spatio-temporal action detection in long-distance imaging affected by the atmosphere,” IEEE Access, vol. 9, pp. 24531–24545, 2021.
[7] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, “DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 40, no. 4, pp. 834–848, Apr. 2018.
[8] E. D. Cubuk, B. Zoph, D. Mane, V. Vasudevan, and Q. V. Le, “AutoAugment: Learning augmentation strategies from data,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 113–123.


[9] R. Ding, L. Dai, G. Li, and H. Liu, “TDD-Net: A tiny defect detection network for printed circuit boards,” CAAI Trans. Intell. Technol., vol. 4, no. 2, pp. 110–116, 2019.
[10] N. Dvornik, K. Shmelkov, J. Mairal, and C. Schmid, “BlitzNet: A real-time deep network for scene understanding,” in Proc. 16th IEEE Int. Conf. Comput. Vis. (ICCV), Venice, Italy, Oct. 2017, pp. 4174–4182.
[11] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman, “The Pascal visual object classes (VOC) challenge,” Int. J. Comput. Vis., vol. 88, no. 2, pp. 303–338, Sep. 2009.
[12] H. Geng, H. Liu, L. Ma, and X. Yi, “Multi-sensor filtering fusion meets censored measurements under a constrained network environment: Advances, challenges and prospects,” Int. J. Syst. Sci., vol. 52, no. 16, pp. 3410–3436, Dec. 2021.
[13] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Columbus, OH, USA, Jun. 2014, pp. 580–587.
[14] R. Girshick, “Fast R-CNN,” in Proc. ICCV, Sep. 2015, pp. 1440–1448.
[15] K. He, G. Gkioxari, P. Dollár, and R. Girshick, “Mask R-CNN,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 42, no. 2, pp. 386–397, Feb. 2020.
[16] K. He, X. Zhang, S. Ren, and J. Sun, “Spatial pyramid pooling in deep convolutional networks for visual recognition,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 37, no. 9, pp. 1904–1916, Sep. 2014.
[17] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Apr. 2016, pp. 770–778.
[18] D. Hoiem, Y. Chodpathumwan, and Q. Dai, “Diagnosing error in object detectors,” in Proc. Eur. Conf. Comput. Vis. (ECCV), Oct. 2012, pp. 340–353.
[19] B. Hu and J. Wang, “Detection of PCB surface defects with improved faster-RCNN and feature pyramid network,” IEEE Access, vol. 8, pp. 108335–108345, 2020.
[20] H. Hu, J. Gu, Z. Zhang, J. Dai, and Y. Wei, “Relation networks for object detection,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2018, pp. 3588–3597.
[21] J. Hu, L. Shen, S. Albanie, G. Sun, and E. Wu, “Squeeze-and-excitation networks,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 42, no. 8, pp. 2011–2023, Aug. 2020.
[22] J. Hu, H. Zhang, H. Liu, and X. Yu, “A survey on sliding mode control for networked control systems,” Int. J. Syst. Sci., vol. 52, no. 6, pp. 1129–1147, Apr. 2021.
[23] Y. Ju, X. Tian, H. Liu, and L. Ma, “Fault detection of networked dynamical systems: A survey of trends and techniques,” Int. J. Syst. Sci., vol. 52, no. 16, pp. 3390–3409, Dec. 2021.
[24] S. Mishra et al., “Learning visual representations for transfer learning by suppressing texture,” 2020, arXiv:2011.01901.
[25] A. Neubeck and L. J. Van Gool, “Efficient non-maximum suppression,” in Proc. IEEE Int. Conf. Pattern Recognit., Aug. 2006, pp. 850–855.
[26] C. Ning, H. Zhou, Y. Song, and J. Tang, “Inception single shot MultiBox detector for object detection,” in Proc. IEEE Int. Conf. Multimedia Expo. Workshops (ICMEW), Jul. 2017, pp. 549–554.
[27] H. Law and J. Deng, “CornerNet: Detecting objects as paired keypoints,” in Proc. Eur. Conf. Comput. Vis., Sep. 2018, pp. 734–750.
[28] H. Li, B. Dai, S. Shi, W. Ouyang, and X. Wang, “Feature intertwiner for object detection,” 2019, arXiv:1903.11851.
[29] J. Li, J. Gu, Z. Huang, and J. Wen, “Application research of improved YOLO v3 algorithm in PCB electronic component detection,” Appl. Sci., vol. 9, pp. 3738–3750, 2019.
[30] J. Li, X. Liang, Y. Wei, T. Xu, J. Feng, and S. Yan, “Perceptual generative adversarial networks for small object detection,” in Proc. IEEE CVPR, Jul. 2017, pp. 1951–1959.
[31] J. Li and Z. Liu, “Self-measurements of point-spread function for remote sensing optical imaging instruments,” IEEE Trans. Instrum. Meas., vol. 69, no. 6, pp. 3679–3686, Jun. 2020.
[32] Y. Li, Y. Chen, N. Wang, and Z.-X. Zhang, “Scale-aware trident networks for object detection,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2019, pp. 6053–6062.
[33] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, “Feature pyramid networks for object detection,” in Proc. CVPR, Jul. 2017, pp. 936–944.
[34] T. Lin et al., “Microsoft COCO: Common objects in context,” in Proc. Eur. Conf. Comput. Vis., Oct. 2014, pp. 740–755.
[35] S. Liu, D. Huang, and Y. Wang, “Learning spatial fusion for single-shot object detection,” 2019, arXiv:1911.09516.
[36] S. Liu, L. Qi, H. Qin, J. Shi, and J. Jia, “Path aggregation network for instance segmentation,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 8759–8768.
[37] W. Liu et al., “SSD: Single shot multibox detector,” in Proc. Eur. Conf. Comput. Vis., Aug. 2016, pp. 21–37.
[38] X. Long et al., “PP-YOLO: An effective and efficient implementation of object detector,” 2020, arXiv:2007.12099.
[39] J. Pang, K. Chen, J. Shi, H. Feng, W. Ouyang, and D. Lin, “Libra R-CNN: Towards balanced learning for object detection,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., no. 3, Jun. 2019, pp. 821–830.
[40] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2016, pp. 779–788.
[41] J. Redmon and A. Farhadi, “YOLO9000: Better, faster, stronger,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jul. 2017, pp. 6517–6525.
[42] J. Redmon and A. Farhadi, “YOLOV3: An incremental improvement,” in Proc. IEEE Conf. CVPR, Apr. 2017, pp. 1–6.
[43] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 6, pp. 1137–1149, Jun. 2017.
[44] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” in Proc. Int. Conf. Learn. Represent. (ICLR), San Diego, CA, USA, May 2015, pp. 1–14.
[45] B. Singh, M. Najibi, and L. S. Davis, “SNIPER: Efficient multi-scale training,” in Proc. Adv. Neural Inf. Process. Syst., Montréal, QC, Canada, Dec. 2018, pp. 9333–9343.
[46] C. Termritthikun, Y. Jamtsho, J. Ieamsaard, P. Muneesawang, and I. Lee, “EEEA-Net: An early exit evolutionary neural architecture search,” Eng. Appl. Artif. Intell., vol. 104, Sep. 2021, Art. no. 104397, doi: 10.1016/j.engappai.2021.104397.
[47] Z. Tian, C. Shen, H. Chen, and T. He, “FCOS: Fully convolutional one-stage object detection,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2019, pp. 9626–9635.
[48] X. Wang, R. Girshick, A. Gupta, and K. He, “Non-local neural networks,” in Proc. IEEE Int. Conf. Comput. Vis., Jun. 2018, pp. 7794–7803.
[49] S. Xie, R. Girshick, P. Dollar, Z. Tu, and K. He, “Aggregated residual transformations for deep neural networks,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2017, pp. 5987–5995.
[50] M. Yang, K. Yu, C. Zhang, Z. Li, and K. Yang, “DenseASPP for semantic segmentation in street scenes,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Salt Lake City, UT, USA, Jun. 2018, pp. 3684–3692.
[51] W. Yue, Z. Wang, J. Zhang, and X. Liu, “An overview of recommendation techniques and their applications in healthcare,” IEEE/CAA J. Automatica Sinica, vol. 8, no. 4, pp. 701–717, Apr. 2021.
[52] H. Zhang, M. Cisse, Y. N. Dauphin, and D. Lopez-Paz, “Mixup: Beyond empirical risk minimization,” 2017, arXiv:1710.09412.
[53] S. Zhang, L. Wen, X. Bian, Z. Lei, and S. Z. Li, “Single-shot refinement neural network for object detection,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 4203–4212.
[54] X. Zhu, H. Hu, S. Lin, and J. Dai, “Deformable ConvNets v2: More deformable, better results,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 9300–9308.
[55] X. Zhu, W. Su, L. Lu, B. Li, X. Wang, and J. Dai, “Deformable DETR: Deformable transformers for end-to-end object detection,” in Proc. 9th Int. Conf. Learn. Represent. (ICLR), Vienna, Austria, May 2021.
[56] P. Zhu et al., “Detection and tracking meet drones challenge,” IEEE Trans. Pattern Anal. Mach. Intell., early access, Oct. 14, 2021, doi: 10.1109/TPAMI.2021.3119563.
[57] Z. Zheng, P. Wang, W. Liu, J. Li, R. Ye, and D. Ren, “Distance-IoU loss: Faster and better learning for bounding box regression,” in Proc. AAAI Conf. Artif. Intell., Feb. 2020, pp. 12993–13000.
[58] L. Zou, Z. Wang, J. Hu, Y. Liu, and X. Liu, “Communication-protocol-based analysis and synthesis of networked systems: Progress, prospects and challenges,” Int. J. Syst. Sci., vol. 52, no. 14, pp. 3013–3034, Oct. 2021.


Nianyin Zeng was born in Fujian, China, in 1986. He received the B.Eng. degree in electrical engineering and automation and the Ph.D. degree in electrical engineering from Fuzhou University, Fuzhou, China, in 2008 and 2013, respectively.
From October 2012 to March 2013, he was an RA with the Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong. From September 2017 to August 2018, he was an ISEF Fellow funded by the Korea Foundation for Advanced Studies, Seoul, South Korea, and also a Visiting Professor at the Korea Advanced Institute of Science and Technology, Daejeon, South Korea. He is currently an Associate Professor with the Department of Instrumental and Electrical Engineering, Xiamen University, Xiamen, China. He is the author or coauthor of several technical papers and also a very active reviewer for many international journals and conferences. His current research interests include intelligent data analysis, computational intelligence, time-series modeling, and applications.
Dr. Zeng is currently serving as an Associate Editor for Neurocomputing, Evolutionary Intelligence, and Frontiers in Medical Technology, and also an Editorial Board Member of Computers in Biology and Medicine, Biomedical Engineering Online, and Mathematical Problems in Engineering.

Han Li received the bachelor’s degree in measurement and control technology and instrumentation from Xiamen University, Xiamen, China, in 2018, where he is currently pursuing the Ph.D. degree in measuring and testing technologies and instruments.
His research interests include intelligent optimization algorithms and deep learning techniques.

Peishu Wu received the bachelor’s degree in measurement and control technology and instrumentation from the Tianjin University of Science and Technology, Tianjin, China, in 2020. He is currently pursuing the master’s degree in measuring and testing technologies and instruments with Xiamen University, Xiamen, China.
His research interests include computer vision and deep learning techniques.

Weibo Liu received the B.S. degree in electrical engineering from the Department of Electrical Engineering and Electronics, University of Liverpool, Liverpool, U.K., in 2015, and the Ph.D. degree in computer science from Brunel University London, Uxbridge, U.K., in 2019.
He is currently a Lecturer with the Department of Computer Science, Brunel University London, Uxbridge, U.K. His research interests include big data analysis and deep learning techniques.

Zidong Wang (Fellow, IEEE) was born in Jiangsu, China, in 1966. He received the B.Sc. degree in mathematics from Suzhou University, Suzhou, China, in 1986, and the M.Sc. degree in applied mathematics and the Ph.D. degree in electrical engineering from the Nanjing University of Science and Technology, Nanjing, China, in 1990 and 1994, respectively.
He is currently a Professor of dynamical systems and computing with the Department of Computer Science, Brunel University London, Uxbridge, U.K. From 1990 to 2002, he held teaching and research appointments in universities in China, Germany, and the U.K. His research interests include dynamical systems, signal processing, bioinformatics, control theory, and applications. He has published more than 600 articles in international journals. He is a holder of the Alexander von Humboldt Research Fellowship of Germany, the JSPS Research Fellowship of Japan, and the William Mong Visiting Research Fellowship of Hong Kong.
Prof. Wang is a member of the Academia Europaea, the European Academy of Sciences and Arts, and the program committee for many international conferences; an Academician of the International Academy for Systems and Cybernetic Sciences; and a fellow of the Royal Statistical Society. He serves (or has served) as the Editor-in-Chief for International Journal of Systems Science, Neurocomputing, and Systems Science & Control Engineering and an Associate Editor for 12 international journals, including IEEE Transactions on Automatic Control, IEEE Transactions on Control Systems Technology, IEEE Transactions on Neural Networks and Learning Systems, IEEE Transactions on Signal Processing, and IEEE Transactions on Systems, Man, and Cybernetics—Part C: Applications and Reviews.

Xiaohui Liu received the B.Eng. degree in computing from Hohai University, Nanjing, China, in 1982, and the Ph.D. degree in computer science from Heriot-Watt University, Edinburgh, U.K., in 1988.
He is currently a Professor of computing at Brunel University London, Uxbridge, U.K., where he conducts research in artificial intelligence and intelligent data analysis, with applications in diverse areas, including biomedicine and engineering.
