Search Results (184)

Search Parameters:
Keywords = pyramidal representation

21 pages, 10968 KiB  
Article
Multi-Scale Expression of Coastal Landform in Remote Sensing Images Considering Texture Features
by Ruojie Zhang and Yilang Shen
Remote Sens. 2024, 16(20), 3862; https://doi.org/10.3390/rs16203862 - 17 Oct 2024
Abstract
The multi-scale representation of remote sensing images is crucial for information extraction, data analysis, and image processing. However, traditional methods such as image pyramid and image filtering often result in the loss of image details, particularly edge information, during the simplification and merging processes at different scales and resolutions. Furthermore, when applied to coastal landforms with rich texture features, such as biologically diverse areas covered with vegetation, these methods struggle to preserve the original texture characteristics. In this study, we propose a new method, multi-scale expression of coastal landforms considering texture features (METF-C), based on computer vision techniques. This method combines superpixel segmentation and texture transfer technology to improve the multi-scale representation of coastal landforms in remote sensing images. First, coastal landform elements are segmented using superpixel technology. Then, global merging is performed by selecting different classes of superpixels, with boundaries smoothed using median filtering and morphological operators. Finally, texture transfer is applied to create a fusion image that maintains both scale and level consistency. Experimental results demonstrate that METF-C outperforms traditional methods by effectively simplifying images while preserving important geomorphic features and maintaining global texture information across multiple scales. This approach offers significant improvements in edge preservation and texture retention, making it a valuable tool for analyzing coastal landforms in remote sensing imagery. Full article
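The boundary-smoothing step described above (median filtering plus morphological operators on the merged superpixel regions) can be sketched on a binary landform mask with NumPy alone. The function names, the 3×3 window, and the use of a closing (dilate then erode) are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def median_smooth(mask, k=3):
    """k x k median (majority) filter on a binary mask, via shifted stacking."""
    r = k // 2
    p = np.pad(mask, r, mode="edge")
    h, w = mask.shape
    shifts = np.stack([p[i:i + h, j:j + w] for i in range(k) for j in range(k)])
    return (np.median(shifts, axis=0) >= 0.5).astype(mask.dtype)

def dilate(mask, k=3):
    r = k // 2
    p = np.pad(mask, r, mode="constant", constant_values=0)
    h, w = mask.shape
    return np.stack([p[i:i + h, j:j + w] for i in range(k) for j in range(k)]).max(axis=0)

def erode(mask, k=3):
    r = k // 2
    p = np.pad(mask, r, mode="constant", constant_values=1)
    h, w = mask.shape
    return np.stack([p[i:i + h, j:j + w] for i in range(k) for j in range(k)]).min(axis=0)

def smooth_boundary(mask):
    """Median filter to remove specks, then a morphological closing (dilate -> erode)."""
    return erode(dilate(median_smooth(mask)))
```

The median filter removes isolated pixels and fills pinholes, while the closing smooths jagged region boundaries without shifting them far.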

25 pages, 27745 KiB  
Article
Infrared and Visible Image Fusion via Sparse Representation and Guided Filtering in Laplacian Pyramid Domain
by Liangliang Li, Yan Shi, Ming Lv, Zhenhong Jia, Minqin Liu, Xiaobin Zhao, Xueyu Zhang and Hongbing Ma
Remote Sens. 2024, 16(20), 3804; https://doi.org/10.3390/rs16203804 - 13 Oct 2024
Abstract
The fusion of infrared and visible images can fully leverage the respective advantages of each, providing a more comprehensive and richer set of information. This is applicable in various fields such as military surveillance, night navigation, and environmental monitoring. In this paper, a novel infrared and visible image fusion method based on sparse representation and guided filtering in the Laplacian pyramid (LP) domain is introduced. The source images are decomposed into low- and high-frequency bands by the LP. Sparse representation has achieved significant effectiveness in image fusion and is used to process the low-frequency band; guided filtering has excellent edge-preserving effects and can effectively maintain the spatial continuity of the high-frequency band. Therefore, guided filtering combined with the weighted sum of eight-neighborhood-based modified Laplacian (WSEML) is used to process the high-frequency bands. Finally, the inverse LP transform is used to reconstruct the fused image. We conducted simulation experiments on the publicly available TNO dataset to validate the superiority of the proposed algorithm in fusing infrared and visible images. Our algorithm preserves both the thermal radiation characteristics of the infrared image and the detailed features of the visible image. Full article
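The LP decomposition and inverse transform that the method builds on can be sketched in NumPy. The 1-2-1 binomial blur and the function names are simplifying assumptions (the abstract does not specify the kernel), but reconstruction is exact by construction, since each level stores the residual against its own upsampled coarse version:

```python
import numpy as np

def _blur(img):
    """Separable 1-2-1 binomial blur, a small stand-in for the usual Gaussian."""
    k = np.array([1.0, 2.0, 1.0]) / 4.0
    conv = lambda v: np.convolve(np.pad(v, 1, mode="edge"), k, mode="valid")
    return np.apply_along_axis(conv, 1, np.apply_along_axis(conv, 0, img))

def _down(img):
    return _blur(img)[::2, ::2]

def _up(img, shape):
    big = np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)[:shape[0], :shape[1]]
    return _blur(big)

def lp_decompose(img, levels=3):
    """Laplacian pyramid: band-pass residuals plus a final low-frequency base."""
    bands, cur = [], img
    for _ in range(levels):
        small = _down(cur)
        bands.append(cur - _up(small, cur.shape))  # high-frequency residual
        cur = small
    bands.append(cur)  # low-frequency base
    return bands

def lp_reconstruct(bands):
    """Inverse LP transform: upsample and add residuals, coarse to fine."""
    cur = bands[-1]
    for band in reversed(bands[:-1]):
        cur = _up(cur, band.shape) + band
    return cur
```

In a fusion pipeline of the kind described, the base bands of the two source images would be merged by the sparse-representation rule and the residual bands by the guided-filtering/WSEML rule before calling `lp_reconstruct`.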

36 pages, 17153 KiB  
Article
YOLO-RWY: A Novel Runway Detection Model for Vision-Based Autonomous Landing of Fixed-Wing Unmanned Aerial Vehicles
by Ye Li, Yu Xia, Guangji Zheng, Xiaoyang Guo and Qingfeng Li
Drones 2024, 8(10), 571; https://doi.org/10.3390/drones8100571 - 10 Oct 2024
Abstract
In scenarios where global navigation satellite systems (GNSSs) and radio navigation systems are denied, vision-based autonomous landing (VAL) for fixed-wing unmanned aerial vehicles (UAVs) becomes essential. Accurate and real-time runway detection in VAL is vital for providing precise positional and orientational guidance. However, existing research faces significant challenges, including insufficient accuracy, inadequate real-time performance, poor robustness, and high susceptibility to disturbances. To address these challenges, this paper introduces a novel single-stage, anchor-free, and decoupled vision-based runway detection framework, referred to as YOLO-RWY. First, an enhanced data augmentation (EDA) module is incorporated to perform various augmentations, enriching image diversity, and introducing perturbations that improve generalization and safety. Second, a large separable kernel attention (LSKA) module is integrated into the backbone structure to provide a lightweight attention mechanism with a broad receptive field, enhancing feature representation. Third, the neck structure is reorganized as a bidirectional feature pyramid network (BiFPN) module with skip connections and attention allocation, enabling efficient multi-scale and across-stage feature fusion. Finally, the regression loss and task-aligned learning (TAL) assigner are optimized using efficient intersection over union (EIoU) to improve localization evaluation, resulting in faster and more accurate convergence. Comprehensive experiments demonstrate that YOLO-RWY achieves AP50:95 scores of 0.760, 0.611, and 0.413 on synthetic, real nominal, and real edge test sets of the landing approach runway detection (LARD) dataset, respectively. Deployment experiments on an edge device show that YOLO-RWY achieves an inference speed of 154.4 FPS under FP32 quantization with an image size of 640. 
The results indicate that the proposed YOLO-RWY model possesses strong generalization and real-time capabilities, enabling accurate runway detection in complex and challenging visual environments, and providing support for the onboard VAL systems of fixed-wing UAVs. Full article
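For reference, the standard EIoU formulation (an IoU term plus center-distance, width, and height penalties, each normalized by the smallest enclosing box) can be sketched as below; that YOLO-RWY uses exactly this variant of the penalty terms is an assumption based on the abstract's wording:

```python
import numpy as np

def eiou_loss(box_a, box_b, eps=1e-9):
    """EIoU loss for [x1, y1, x2, y2] boxes: 1 - IoU + center-distance,
    width, and height penalties normalized by the enclosing box."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # intersection and union
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    iou = inter / (area_a + area_b - inter + eps)
    # smallest enclosing box
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    c2 = cw ** 2 + ch ** 2 + eps
    # squared center distance
    rho2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 + \
           ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2
    # width / height differences
    dw2 = ((ax2 - ax1) - (bx2 - bx1)) ** 2
    dh2 = ((ay2 - ay1) - (by2 - by1)) ** 2
    return 1 - iou + rho2 / c2 + dw2 / (cw ** 2 + eps) + dh2 / (ch ** 2 + eps)
```

Unlike plain IoU loss, the extra terms keep the gradient informative even when the predicted and ground-truth boxes do not overlap, which is what speeds up convergence.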

19 pages, 5556 KiB  
Article
AFMSFFNet: An Anchor-Free-Based Feature Fusion Model for Ship Detection
by Yuxin Zhang, Chunlei Dong, Lixin Guo, Xiao Meng, Yue Liu and Qihao Wei
Remote Sens. 2024, 16(18), 3465; https://doi.org/10.3390/rs16183465 - 18 Sep 2024
Abstract
This paper aims to improve a small-scale object detection model to achieve detection accuracy matching or even surpassing that of complex models, keeping the parameter count as small as possible in the module design phase to enable rapid detection of maritime targets. Here, this paper introduces an innovative Anchor-Free-based Multi-Scale Feature Fusion Network (AFMSFFNet), which mitigates missed detections and false positives, particularly in inshore or small-target scenarios. Leveraging YOLOX-tiny as the foundational architecture, the proposed AFMSFFNet incorporates a novel Adaptive Bidirectional Fusion Pyramid Network (AB-FPN) for efficient multi-scale feature fusion, enhancing the saliency representation of targets and reducing interference from complex backgrounds. Simultaneously, the designed Multi-Scale Global Attention Detection Head (MGAHead) utilizes a larger receptive field to learn object features, generating high-quality reconstructed features for enhanced semantic information integration. Extensive experiments conducted on publicly available Synthetic Aperture Radar (SAR) image ship datasets demonstrate that AFMSFFNet outperforms the traditional baseline models in detection performance, improving detection accuracy by 2.32% over the YOLOX-tiny model. Additionally, AFMSFFNet achieves 78.26 Frames Per Second (FPS) on SSDD, a 4.7- to 6.7-fold efficiency improvement over well-established networks such as Faster R-CNN and CenterNet. This research provides a valuable solution for efficient ship detection in complex backgrounds, demonstrating the efficacy of AFMSFFNet through quantitative improvements in accuracy and efficiency compared to existing models. Full article
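AB-FPN's exact weighting scheme is not given in the abstract; as a point of reference, bidirectional pyramid networks in the BiFPN family combine same-shape feature maps with learned non-negative weights via "fast normalized fusion", which can be sketched as:

```python
import numpy as np

def fast_normalized_fusion(features, weights, eps=1e-4):
    """Weighted fusion of same-shape feature maps:
    w_i' = relu(w_i) / (sum_j relu(w_j) + eps), output = sum_i w_i' * F_i."""
    w = np.maximum(np.asarray(weights, dtype=float), 0.0)
    w = w / (w.sum() + eps)
    return sum(wi * f for wi, f in zip(w, features))
```

The ReLU keeps each weight non-negative and the normalization keeps the output scale stable, at a fraction of the cost of a softmax over the weights.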

17 pages, 4838 KiB  
Article
Improved Detection of Multi-Class Bad Traffic Signs Using Ensemble and Test Time Augmentation Based on Yolov5 Models
by Ibrahim Yahaya Garta, Shao-Kuo Tai and Rung-Ching Chen
Appl. Sci. 2024, 14(18), 8200; https://doi.org/10.3390/app14188200 - 12 Sep 2024
Abstract
Various factors such as natural disasters, vandalism, weather, and environmental conditions can affect the physical state of traffic signs. The proposed model aims to improve the detection of traffic signs affected by partial occlusion from overgrown vegetation, displaced signs (knocked down or bent), perforated signs (damaged with holes), faded signs (color degradation), rusted signs (corroded surfaces), and defaced signs (e.g., graffiti placed by vandals). This research improves the detection of bad traffic signs using three approaches. In the first approach, Spatial Pyramid Pooling-Fast (SPPF) and C3TR modules are introduced into the architecture of the YOLOv5 models. SPPF provides a multi-scale representation of the input feature map by pooling at different scales, which improves the quality of feature maps and the detection of bad traffic signs of various sizes and perspectives. The C3TR module uses convolutional layers to enhance local feature extraction and transformers to boost understanding of the global context. Secondly, we use the predictions of the YOLOv5 models as base models to implement a mean ensemble and improve performance. Thirdly, test-time augmentation (TTA) is applied at test time, using scaling and flipping to improve accuracy. Some signs are generated using stable diffusion techniques to augment certain classes. We test the proposed models on the CCTSDB2021, TT100K, GTSDB, and GTSRB datasets to ensure generalization and use k-fold cross-validation to further evaluate the performance of the models. The proposed models outperform other state-of-the-art models in comparison. Full article
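The TTA step can be illustrated with a minimal flip-averaging sketch. The paper also averages over rescaled views; `tta_classify` and the classifier interface (a callable returning class probabilities) are assumptions for illustration:

```python
import numpy as np

def tta_classify(model, image):
    """Average class probabilities over the original image and its horizontal flip.
    Extending `views` with rescaled crops gives the scaling part of TTA."""
    views = [image, image[:, ::-1]]      # identity + horizontal flip
    return np.mean([model(v) for v in views], axis=0)
```

Averaging over transformed views tends to cancel view-dependent errors, at the cost of one extra forward pass per view.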

17 pages, 6059 KiB  
Article
ECF-Net: Enhanced, Channel-Based, Multi-Scale Feature Fusion Network for COVID-19 Image Segmentation
by Zhengjie Ji, Junhao Zhou, Linjing Wei, Shudi Bao, Meng Chen, Hongxing Yuan and Jianjun Zheng
Electronics 2024, 13(17), 3501; https://doi.org/10.3390/electronics13173501 - 3 Sep 2024
Abstract
Accurate segmentation of COVID-19 lesion regions in lung CT images aids physicians in analyzing and diagnosing patients’ conditions. However, the varying morphology and blurred contours of these regions make this task complex and challenging. Existing methods utilizing Transformer architecture lack attention to local features, leading to the loss of detailed information in tiny lesion regions. To address these issues, we propose a multi-scale feature fusion network, ECF-Net, based on channel enhancement. Specifically, we leverage the learning capabilities of both CNN and Transformer architectures to design parallel channel extraction blocks in three different ways, effectively capturing diverse lesion features. Additionally, to minimize irrelevant information in the high-dimensional feature space and focus the network on useful and critical information, we develop adaptive feature generation blocks. Lastly, a bidirectional pyramid-structured feature fusion approach is introduced to integrate features at different levels, enhancing the diversity of feature representations and improving segmentation accuracy for lesions of various scales. The proposed method is tested on four COVID-19 datasets, demonstrating mIoU values of 84.36%, 87.15%, 83.73%, and 75.58%, respectively, outperforming several current state-of-the-art methods and exhibiting excellent segmentation performance. These findings provide robust technical support for medical image segmentation in clinical practice. Full article
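The reported mIoU values can be reproduced from a confusion matrix as follows. This is a generic sketch, not ECF-Net's evaluation code; note that classes absent from both prediction and target count as IoU 0 here, where some toolkits ignore them instead:

```python
import numpy as np

def miou(pred, target, num_classes):
    """Mean intersection-over-union from a class confusion matrix."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    for p, t in zip(np.ravel(pred), np.ravel(target)):
        cm[t, p] += 1                       # rows: ground truth, cols: prediction
    inter = np.diag(cm).astype(float)       # per-class intersection
    union = cm.sum(axis=0) + cm.sum(axis=1) - np.diag(cm)
    return (inter / np.maximum(union, 1)).mean()
```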
(This article belongs to the Special Issue Biomedical Image Processing and Classification, 2nd Edition)

14 pages, 9214 KiB  
Article
End-to-End Implicit Object Pose Estimation
by Chen Cao, Baocheng Yu, Wenxia Xu, Guojun Chen and Yuming Ai
Sensors 2024, 24(17), 5721; https://doi.org/10.3390/s24175721 - 3 Sep 2024
Abstract
To accurately estimate the 6D pose of objects, most methods employ a two-stage algorithm. While such two-stage algorithms achieve high accuracy, they are often slow. Additionally, many approaches utilize encoding–decoding to obtain the 6D pose, with many employing bilinear sampling for decoding. However, bilinear sampling tends to sacrifice the accuracy of precise features. In our research, we propose a novel solution that utilizes implicit representation as a bridge between discrete feature maps and continuous feature maps. We represent the feature map as a coordinate field, where each coordinate pair corresponds to a feature value. These feature values are then used to estimate feature maps of arbitrary scales, replacing upsampling for decoding. We apply the proposed implicit module to a bidirectional fusion feature pyramid network. Based on this implicit module, we propose three network branches: a class estimation branch, a bounding box estimation branch, and the final pose estimation branch. For this pose estimation branch, we propose a miniature dual-stream network, which estimates object surface features and complements the relationship between 2D and 3D. We represent the rotation component using the SVD (Singular Value Decomposition) representation method, resulting in a more accurate object pose. We achieved satisfactory experimental results on the widely used 6D pose estimation benchmark dataset Linemod. This innovative approach provides a more convenient solution for 6D object pose estimation. Full article
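The SVD rotation representation mentioned above maps an unconstrained 3×3 network output onto the nearest rotation matrix. This is the standard orthogonal-Procrustes projection (the paper's exact parameterization may differ), sketched in NumPy:

```python
import numpy as np

def svd_rotation(m):
    """Project an arbitrary 3x3 matrix onto SO(3):
    R = U diag(1, 1, det(U V^T)) V^T, where m = U S V^T.
    The det factor guards against reflections (det = -1)."""
    u, _, vt = np.linalg.svd(m)
    d = np.sign(np.linalg.det(u @ vt))
    return u @ np.diag([1.0, 1.0, d]) @ vt
```

Because the projection is differentiable almost everywhere, the network can regress nine unconstrained numbers and still output a valid rotation.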
(This article belongs to the Section Physical Sensors)

15 pages, 6149 KiB  
Article
A Deformable and Multi-Scale Network with Self-Attentive Feature Fusion for SAR Ship Classification
by Peng Chen, Hui Zhou, Ying Li, Bingxin Liu and Peng Liu
J. Mar. Sci. Eng. 2024, 12(9), 1524; https://doi.org/10.3390/jmse12091524 - 2 Sep 2024
Abstract
The identification of ships in Synthetic Aperture Radar (SAR) imagery is critical for effective maritime surveillance. The advent of deep learning has significantly improved the accuracy of SAR ship classification and recognition. However, distinguishing features between different ship categories in SAR images remains a challenge, particularly as the number of categories increases. The key to achieving high recognition accuracy lies in effectively extracting and utilizing discriminative features. To address this, we propose DCN-MSFF-TR, a novel recognition model inspired by the Transformer encoder–decoder architecture. Our approach integrates a deformable convolutional module (DCN) within the backbone network to enhance feature extraction. Additionally, we introduce multi-scale self-attention processing from the Transformer into the feature hierarchy and fuse these representations at appropriate levels using a feature pyramid strategy. This enables each layer to leverage both its own information and synthesized features from other layers, enhancing feature representation. Extensive evaluations on the OpenSARShip-3-Complex and OpenSARShip-6-Complex datasets demonstrate the effectiveness of our method. DCN-MSFF-TR achieves average recognition accuracies of 78.1% and 66.7% on the three-class and six-class datasets, respectively, outperforming existing recognition models and showcasing its superior capability in accurately identifying ship categories in SAR images. Full article
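The multi-scale self-attention borrowed from the Transformer reduces, at each pyramid level, to scaled dot-product attention over that level's feature vectors. A single-head NumPy sketch (the weight matrices and function name are placeholders, not the paper's configuration):

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    """Scaled dot-product self-attention over a set of feature vectors (rows of x).
    Returns the attended features and the attention matrix."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability for softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)      # rows sum to 1
    return attn @ v, attn
```

In a feature-pyramid setting, each level's attended features would then be fused with resampled features from the neighboring levels.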
(This article belongs to the Section Ocean Engineering)

18 pages, 3834 KiB  
Article
Improved Tomato Leaf Disease Recognition Based on the YOLOv5m with Various Soft Attention Module Combinations
by Yong-Suk Lee, Maheshkumar Prakash Patil, Jeong Gyu Kim, Seong Seok Choi, Yong Bae Seo and Gun-Do Kim
Agriculture 2024, 14(9), 1472; https://doi.org/10.3390/agriculture14091472 - 29 Aug 2024
Abstract
To reduce production costs, environmental effects, and crop losses, tomato leaf disease recognition must be accurate and fast. Early diagnosis and treatment are necessary to cure and control illnesses and ensure tomato output and quality. The YOLOv5m was improved by using C3NN modules and Bidirectional Feature Pyramid Network (BiFPN) architecture. The C3NN modules were designed by integrating several soft attention modules into the C3 module: the Convolutional Block Attention Module (CBAM), Squeeze and Excitation Network (SE), Efficient Channel Attention (ECA), and Coordinate Attention (CA). The C3 modules in the Backbone and Head of YOLOv5 model were replaced with the C3NN to improve feature representation and object detection accuracy. The BiFPN architecture was implemented in the Neck of the YOLOv5 model to effectively merge multi-scale features and improve the accuracy of object detection. Among the various combinations for the improved YOLOv5m model, the C3ECA-BiFPN-C3ECA-YOLOv5m achieved a precision (P) of 87.764%, a recall (R) of 87.201%, an F1 of 87.482, an mAP.5 of 90.401%, and an mAP.5:.95 of 68.803%. In comparison with the YOLOv5m and Faster-RCNN models, the improved models showed improvement in P by 1.36% and 7.80%, R by 4.99% and 5.51%, F1 by 3.18% and 6.86%, mAP.5 by 1.74% and 2.90%, and mAP.5:.95 by 3.26% and 4.84%, respectively. These results demonstrate that the improved models have effective tomato leaf disease recognition capabilities and are expected to contribute significantly to the development of plant disease detection technology. Full article
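Of the soft attention modules listed, SE is the simplest to sketch: global-average-pool each channel, pass the statistics through a small two-layer bottleneck, and rescale the channels by the resulting gates. Shapes and names here are illustrative, not the C3NN wiring:

```python
import numpy as np

def se_block(x, w1, b1, w2, b2):
    """Squeeze-and-Excitation. x: (C, H, W); w1: (C, C//r); w2: (C//r, C),
    where r is the bottleneck reduction ratio."""
    s = x.mean(axis=(1, 2))                        # squeeze: per-channel statistic
    z = np.maximum(s @ w1 + b1, 0.0)               # excitation, hidden layer (ReLU)
    g = 1.0 / (1.0 + np.exp(-(z @ w2 + b2)))       # channel gates in (0, 1)
    return x * g[:, None, None]                    # rescale channels
```

CBAM, ECA, and CA follow the same gate-and-rescale pattern but derive the gates differently (spatial maps, 1-D convolutions, or coordinate-wise pooling).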
(This article belongs to the Special Issue Machine Vision Solutions and AI-Driven Systems in Agriculture)

19 pages, 4082 KiB  
Article
Real-Time Detection and Counting of Wheat Spikes Based on Improved YOLOv10
by Sitong Guan, Yiming Lin, Guoyu Lin, Peisen Su, Siluo Huang, Xianyong Meng, Pingzeng Liu and Jun Yan
Agronomy 2024, 14(9), 1936; https://doi.org/10.3390/agronomy14091936 - 28 Aug 2024
Abstract
Wheat is one of the most crucial food crops globally, with its yield directly impacting global food security. The accurate detection and counting of wheat spikes is essential for monitoring wheat growth, predicting yield, and managing fields. However, the current methods face challenges, such as spike size variation, shading, weed interference, and dense distribution. Conventional machine learning approaches have partially addressed these challenges, yet they are hampered by limited detection accuracy, complexities in feature extraction, and poor robustness under complex field conditions. In this paper, we propose an improved YOLOv10 algorithm that significantly enhances the model’s feature extraction and detection capabilities. This is achieved by introducing a bidirectional feature pyramid network (BiFPN), a separated and enhancement attention module (SEAM), and a global context network (GCNet). BiFPN leverages both top-down and bottom-up bidirectional paths to achieve multi-scale feature fusion, improving performance in detecting targets of various scales. SEAM enhances feature representation quality and model performance in complex environments by separately augmenting the attention mechanism for channel and spatial features. GCNet captures long-range dependencies in the image through the global context block, enabling the model to process complex information more accurately. The experimental results demonstrate that our method achieved a precision of 93.69%, a recall of 91.70%, and a mean average precision (mAP) of 95.10% in wheat spike detection, outperforming the benchmark YOLOv10 model by 2.02% in precision, 2.92% in recall, and 1.56% in mAP. Additionally, the coefficient of determination (R2) between the detected and manually counted wheat spikes was 0.96, with a mean absolute error (MAE) of 3.57 and a root-mean-square error (RMSE) of 4.09, indicating strong correlation and high accuracy. 
The improved YOLOv10 algorithm effectively solves the difficult problem of wheat spike detection under complex field conditions, providing strong support for agricultural production and research. Full article
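The agreement statistics quoted above (R², MAE, and RMSE between detected and manually counted spikes) are standard and can be computed as follows (`count_metrics` is an illustrative name):

```python
import numpy as np

def count_metrics(pred, true):
    """Coefficient of determination (R^2), mean absolute error, and
    root-mean-square error between predicted and reference counts."""
    pred, true = np.asarray(pred, float), np.asarray(true, float)
    err = pred - true
    ss_res = np.sum(err ** 2)                      # residual sum of squares
    ss_tot = np.sum((true - true.mean()) ** 2)     # total sum of squares
    return 1 - ss_res / ss_tot, np.abs(err).mean(), np.sqrt((err ** 2).mean())
```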
(This article belongs to the Section Precision and Digital Agriculture)

21 pages, 11293 KiB  
Article
DFS-DETR: Detailed-Feature-Sensitive Detector for Small Object Detection in Aerial Images Using Transformer
by Xinyu Cao, Hanwei Wang, Xiong Wang and Bin Hu
Electronics 2024, 13(17), 3404; https://doi.org/10.3390/electronics13173404 - 27 Aug 2024
Abstract
Object detection in aerial images plays a crucial role across diverse domains such as agriculture, environmental monitoring, and security. Aerial images present several challenges, including dense small objects, intricate backgrounds, and occlusions, necessitating robust detection algorithms. This paper addresses the critical need for accurate and efficient object detection in aerial images using a Transformer-based approach enhanced with specialized methodologies, termed DFS-DETR. The core framework leverages RT-DETR-R18, integrating the Cross Stage Partial Reparam Dilation-wise Residual Module (CSP-RDRM) to optimize feature extraction. Additionally, the introduction of the Detail-Sensitive Pyramid Network (DSPN) enhances sensitivity to local features, complemented by the Dynamic Scale Sequence Feature-Fusion Module (DSSFFM) for comprehensive multi-scale information integration. Moreover, Multi-Attention Add (MAA) is utilized to refine feature processing, which enhances the model’s capacity for understanding and representation by integrating various attention mechanisms. To improve bounding box regression, the model employs MPDIoU with normalized Wasserstein distance, which accelerates convergence. Evaluation across the VisDrone2019, AI-TOD, and NWPU VHR-10 datasets demonstrates significant improvements in the mean average precision (mAP) values: 24.1%, 24.0%, and 65.0%, respectively, surpassing RT-DETR-R18 by 2.3%, 4.8%, and 7.0%, respectively. Furthermore, the proposed method achieves real-time inference speeds. This approach can be deployed on drones to perform real-time ground detection. Full article
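In the tiny-object-detection literature, the normalized Wasserstein distance paired with an IoU-type loss is computed by modelling each box as a 2-D Gaussian and exponentiating the 2-Wasserstein distance between them. Whether DFS-DETR uses exactly this form is an assumption, and the constant `c` below (here 12.8) is a dataset-dependent placeholder:

```python
import numpy as np

def nwd(box_a, box_b, c=12.8):
    """Normalized Wasserstein distance between [cx, cy, w, h] boxes.
    Each box is modelled as N(center, diag((w/2)^2, (h/2)^2)); for such
    Gaussians the squared 2-Wasserstein distance is the squared Euclidean
    distance between (cx, cy, w/2, h/2) vectors."""
    a, b = np.asarray(box_a, float), np.asarray(box_b, float)
    d = np.array([a[0] - b[0], a[1] - b[1],
                  (a[2] - b[2]) / 2, (a[3] - b[3]) / 2])
    return np.exp(-np.sqrt(np.sum(d ** 2)) / c)
```

Unlike IoU, this similarity stays smooth and non-zero for small, non-overlapping boxes, which is why it helps regression on tiny objects.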

22 pages, 16731 KiB  
Article
Advanced Global Prototypical Segmentation Framework for Few-Shot Hyperspectral Image Classification
by Kunming Xia, Guowu Yuan, Mengen Xia, Xiaosen Li, Jinkang Gui and Hao Zhou
Sensors 2024, 24(16), 5386; https://doi.org/10.3390/s24165386 - 21 Aug 2024
Abstract
With the advancement of deep learning, related networks have shown strong performance for Hyperspectral Image (HSI) classification. However, these methods face two main challenges in HSI classification: (1) the inability to capture global information of HSI due to the restriction of patch input and (2) insufficient utilization of information from limited labeled samples. To overcome these challenges, we propose an Advanced Global Prototypical Segmentation (AGPS) framework. Within the AGPS framework, we design a patch-free feature extractor segmentation network (SegNet) based on a fully convolutional network (FCN), which processes the entire HSI to capture global information. To enrich the global information extracted by SegNet, we propose a Fusion of Lateral Connection (FLC) structure that fuses the low-level detailed features of the encoder output with the high-level features of the decoder output. Additionally, we propose an Atrous Spatial Pyramid Pooling-Position Attention (ASPP-PA) module to capture multi-scale spatial positional information. Finally, to explore more valuable information from limited labeled samples, we propose an advanced global prototypical representation learning strategy. Building upon the dual constraints of the global prototypical representation learning strategy, we introduce supervised contrastive learning (CL), which optimizes our network with three different constraints. The experimental results of three public datasets demonstrate that our method outperforms the existing state-of-the-art methods. Full article
(This article belongs to the Section Sensing and Imaging)

17 pages, 1195 KiB  
Article
Separable CenterNet Detection Network Based on MobileNetV3—An Optimization Approach for Small-Object and Occlusion Issues
by Zhengkuo Jiao, Heng Dong and Naizhe Diao
Mathematics 2024, 12(16), 2524; https://doi.org/10.3390/math12162524 - 15 Aug 2024
Abstract
This paper proposes a novel object detection method to address the challenges posed by small objects and occlusion in object detection. This work is performed within the CenterNet framework, leveraging the MobileNetV3 backbone to model the input image’s abstract representation in a lightweight manner. A sparse convolutional skip connection is introduced in the bottleneck of MobileNetV3, specifically designed to adaptively suppress redundant and interfering information, thus enhancing feature extraction capabilities. A Dual-Path Bidirectional Feature Pyramid Network (DBi-FPN) is incorporated, allowing for high-level feature fusion through bidirectional flow and significantly improving the detection capabilities for small objects and occlusions. Task heads are applied within the feature space of multi-scale information merged by DBi-FPN, facilitating comprehensive consideration of multi-level representations. A bounding box-area loss function is also introduced, aimed at enhancing the model’s adaptability to object morphologies and geometric distortions. Extensive experiments on the PASCAL VOC 2007 and MS COCO 2017 datasets validate the competitiveness of our proposed method, particularly in real-time applications on resource-constrained devices. Our contributions offer promising avenues for enhancing the accuracy and robustness of object detection systems in complex scenarios. Full article

20 pages, 6060 KiB  
Article
Lightweight Frequency Recalibration Network for Diabetic Retinopathy Multi-Lesion Segmentation
by Yinghua Fu, Mangmang Liu, Ge Zhang and Jiansheng Peng
Appl. Sci. 2024, 14(16), 6941; https://doi.org/10.3390/app14166941 - 8 Aug 2024
Abstract
Automated segmentation of diabetic retinopathy (DR) lesions is crucial for assessing DR severity and diagnosis. Most previous segmentation methods overlook the detrimental impact of texture information bias, resulting in suboptimal segmentation results. Additionally, the role of lesion shape is not thoroughly considered. In this paper, we propose a lightweight frequency recalibration network (LFRC-Net) for simultaneous multi-lesion DR segmentation, which integrates a frequency recalibration module into the bottleneck layers of the encoder to analyze texture information and shape features together. The module utilizes a Gaussian pyramid to generate features at different scales, constructs a Laplacian pyramid using a difference-of-Gaussians filter, and then analyzes object features in different frequency domains with the Laplacian pyramid. The high-frequency component handles texture information, while the low-frequency area focuses on learning the shape features of DR lesions. By adaptively recalibrating these frequency representations, our method can differentiate the objects of interest. In the decoder, we introduce a residual attention module (RAM) to enhance lesion feature extraction and efficiently suppress irrelevant information. We evaluate the proposed model’s segmentation performance on two public datasets, IDRiD and DDR, and a private dataset of ultra-wide-field fundus images. Extensive comparative experiments and ablation studies are conducted across multiple datasets. With minimal model parameters, our approach achieves an mAP_PR of 60.51%, 34.83%, and 14.35% for the segmentation of EX, HE, and MA on the DDR dataset and also obtains excellent results for EX and SE on the IDRiD dataset, which validates the effectiveness of our network. Full article
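The frequency split at the heart of the module (low-pass for shape, residual for texture) can be sketched so that the two bands sum back to the input exactly; the Gaussian kernel size and names are illustrative assumptions:

```python
import numpy as np

def gaussian_blur(img, sigma=1.0, radius=3):
    """Separable 1-D Gaussian blur with edge padding."""
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    k /= k.sum()
    conv = lambda v: np.convolve(np.pad(v, radius, mode="edge"), k, mode="valid")
    return np.apply_along_axis(conv, 1, np.apply_along_axis(conv, 0, img))

def frequency_split(img, sigma=1.0):
    """Difference-of-Gaussians style band split: a low-frequency (shape-like)
    band and a high-frequency (texture-like) residual, with low + high == img."""
    low = gaussian_blur(img, sigma)
    high = img - low
    return low, high
```

Stacking this split across the levels of a Gaussian pyramid yields the Laplacian-pyramid bands the abstract describes; recalibration then reweights the two bands before recombining them.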

20 pages, 1928 KiB  
Article
An Automated Diagnosis Method for Lung Cancer Target Detection and Subtype Classification-Based CT Scans
by Lingfei Wang, Chenghao Zhang, Yu Zhang and Jin Li
Bioengineering 2024, 11(8), 767; https://doi.org/10.3390/bioengineering11080767 - 30 Jul 2024
Abstract
When dealing with small targets in lung cancer detection, the YOLO V8 algorithm may encounter false positives and misses. To address this issue, this study proposes an enhanced YOLO V8 detection model. The model integrates a large separable kernel attention mechanism into the C2f module to expand the information retrieval range, strengthens the extraction of lung cancer features in the Backbone section, and achieves effective interaction between multi-scale features in the Neck section, thereby enhancing feature representation and robustness. Additionally, depth-wise convolution and Coordinate Attention mechanisms are embedded in the Fast Spatial Pyramid Pooling module to reduce feature loss and improve detection accuracy. This study introduces a Minimum Point Distance-based IOU loss to enhance correlation between predicted and ground truth bounding boxes, improving adaptability and accuracy in small target detection. Experimental validation demonstrates that the improved network outperforms other mainstream detection networks in terms of average precision values and surpasses other classification networks in terms of accuracy. These findings validate the outstanding performance of the enhanced model in the localization and recognition aspects of lung cancer auxiliary diagnosis. Full article
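The depth-wise convolution embedded in the pooling module filters each channel with its own kernel instead of mixing channels. A naive NumPy sketch, loop-based for clarity rather than efficiency (CNN-style cross-correlation, "same" padding, stride 1):

```python
import numpy as np

def depthwise_conv2d(x, kernels):
    """Depth-wise convolution. x: (C, H, W); kernels: (C, k, k), one k x k
    filter per channel, so cost is O(C k^2) weights vs O(C_in C_out k^2)."""
    c, h, w = x.shape
    k = kernels.shape[-1]
    r = k // 2
    out = np.zeros_like(x, dtype=float)
    xp = np.pad(x, ((0, 0), (r, r), (r, r)))
    for ch in range(c):
        for i in range(h):
            for j in range(w):
                out[ch, i, j] = np.sum(xp[ch, i:i + k, j:j + k] * kernels[ch])
    return out
```

Pairing depth-wise filters with 1×1 point-wise convolutions (depth-wise separable convolution) recovers channel mixing at a fraction of the parameter count, which is what makes such modules attractive inside pooling blocks.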
(This article belongs to the Section Biomedical Engineering and Biomaterials)
