Search Results (1,305)

Search Parameters:
Keywords = feature pyramid

33 pages, 26737 KiB  
Article
EcoDetect-YOLO: A Lightweight, High-Generalization Methodology for Real-Time Detection of Domestic Waste Exposure in Intricate Environmental Landscapes
by Shenlin Liu, Ruihan Chen, Minhua Ye, Jiawei Luo, Derong Yang and Ming Dai
Sensors 2024, 24(14), 4666; https://doi.org/10.3390/s24144666 - 18 Jul 2024
Viewed by 144
Abstract
In response to the challenges of accurate identification and localization of garbage in intricate urban street environments, this paper proposes EcoDetect-YOLO, a garbage exposure detection algorithm based on the YOLOv5s framework, utilizing an intricate environment waste exposure detection dataset constructed in this study. Initially, a convolutional block attention module (CBAM) is integrated between the second level of the feature pyramid network (P2) and the third level of the feature pyramid network (P3) layers to optimize the extraction of relevant garbage features while mitigating background noise. Subsequently, a P2 small-target detection head enhances the model’s efficacy in identifying small garbage targets. Lastly, a bidirectional feature pyramid network (BiFPN) is introduced to strengthen the model’s capability for deep feature fusion. Experimental results demonstrate EcoDetect-YOLO’s adaptability to urban environments and its superior small-target detection capabilities, effectively recognizing nine types of garbage, such as paper and plastic trash. Compared to the baseline YOLOv5s model, EcoDetect-YOLO achieved a 4.7% increase in mAP0.5, reaching 58.1%, with a compact model size of 15.7 MB and an FPS of 39.36. Notably, even in the presence of strong noise, the model maintained an mAP0.5 exceeding 50%, underscoring its robustness. In summary, EcoDetect-YOLO, as proposed in this paper, boasts high precision, efficiency, and compactness, rendering it suitable for deployment on mobile devices for real-time detection and management of urban garbage exposure, thereby advancing urban automation governance and digital economic development. Full article
(This article belongs to the Section Physical Sensors)
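Several entries in these results build on the convolutional block attention module (CBAM). As a concrete reference, here is a minimal PyTorch sketch of the standard CBAM formulation (channel attention followed by spatial attention); the reduction ratio of 16 and the module's exact placement between P2 and P3 in EcoDetect-YOLO are assumptions, since the abstract does not specify them.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))   # global average pooling branch
        mx = self.mlp(x.amax(dim=(2, 3)))    # global max pooling branch
        return torch.sigmoid(avg + mx).view(b, c, 1, 1)

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)
        return torch.sigmoid(self.conv(s))

class CBAM(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.ca = ChannelAttention(channels)
        self.sa = SpatialAttention()

    def forward(self, x):
        x = x * self.ca(x)      # reweight channels first
        return x * self.sa(x)   # then reweight spatial locations
```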

24 pages, 10938 KiB  
Article
Segmentation and Coverage Measurement of Maize Canopy Images for Variable-Rate Fertilization Using the MCAC-Unet Model
by Hailiang Gong, Litong Xiao and Xi Wang
Agronomy 2024, 14(7), 1565; https://doi.org/10.3390/agronomy14071565 - 18 Jul 2024
Viewed by 125
Abstract
Excessive fertilizer use has led to environmental pollution and reduced crop yields, underscoring the importance of research into variable-rate fertilization (VRF) based on digital image technology in precision agriculture. Current methods, which rely on spectral sensors for monitoring and prescription mapping, face significant technical challenges, high costs, and operational complexities, limiting their widespread adoption. This study presents an automated, intelligent, and precise approach to maize canopy image segmentation using the multi-scale attention and Unet model to enhance VRF decision making, reduce fertilization costs, and improve accuracy. A dataset of maize canopy images under various lighting and growth conditions was collected and subjected to data augmentation and normalization preprocessing. The MCAC-Unet model, built upon the MobilenetV3 backbone network and integrating the convolutional block attention module (CBAM), atrous spatial pyramid pooling (ASPP) multi-scale feature fusion, and content-aware reassembly of features (CARAFE) adaptive upsampling modules, achieved a mean intersection over union (mIOU) of 87.51% and a mean pixel accuracy (mPA) of 93.85% in maize canopy image segmentation. Coverage measurements at a height of 1.1 m indicated a relative error ranging from 3.12% to 6.82%, averaging 4.43%, with a determination coefficient of 0.911, meeting practical requirements. The proposed model and measurement system effectively address the challenges in maize canopy segmentation and coverage assessment, providing robust support for crop monitoring and VRF decision making in complex environments. Full article
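For readers reproducing the reported 87.51% mIOU and 93.85% mPA, here is a minimal sketch of how these two segmentation metrics are computed from a confusion matrix; the per-class averaging shown is the common definition and is assumed to match the paper's.

```python
import numpy as np

def confusion_matrix(pred, gt, num_classes):
    """Rows index ground truth, columns index predictions."""
    mask = (gt >= 0) & (gt < num_classes)
    idx = num_classes * gt[mask].astype(int) + pred[mask].astype(int)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def miou_mpa(cm):
    tp = np.diag(cm)
    iou = tp / (cm.sum(axis=0) + cm.sum(axis=1) - tp + 1e-10)  # per-class IoU
    pa = tp / (cm.sum(axis=1) + 1e-10)                          # per-class pixel accuracy
    return iou.mean(), pa.mean()

# Toy example: binary canopy-vs-background segmentation on four pixels.
cm = confusion_matrix(np.array([0, 1, 1, 0]), np.array([0, 1, 0, 0]), 2)
print(miou_mpa(cm))
```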

23 pages, 7788 KiB  
Article
A Novel Mamba Architecture with a Semantic Transformer for Efficient Real-Time Remote Sensing Semantic Segmentation
by Hao Ding, Bo Xia, Weilin Liu, Zekai Zhang, Jinglin Zhang, Xing Wang and Sen Xu
Remote Sens. 2024, 16(14), 2620; https://doi.org/10.3390/rs16142620 - 17 Jul 2024
Viewed by 256
Abstract
Real-time remote sensing segmentation technology is crucial for unmanned aerial vehicles (UAVs) in battlefield surveillance, land characterization observation, earthquake disaster assessment, etc., and can significantly enhance the application value of UAVs in military and civilian fields. To realize this potential, it is essential to develop real-time semantic segmentation methods that can be applied to resource-limited platforms, such as edge devices. The majority of mainstream real-time semantic segmentation methods rely on convolutional neural networks (CNNs) and transformers. However, CNNs cannot effectively capture long-range dependencies, while transformers have high computational complexity. This paper proposes a novel remote sensing Mamba architecture for real-time segmentation tasks in remote sensing, named RTMamba. Specifically, the backbone utilizes a Visual State-Space (VSS) block to extract deep features and maintains linear computational complexity, thereby capturing long-range contextual information. Additionally, a novel Inverted Triangle Pyramid Pooling (ITP) module is incorporated into the decoder. The ITP module can effectively filter redundant feature information and enhance the perception of objects and their boundaries in remote sensing images. Extensive experiments were conducted on three challenging aerial remote sensing segmentation benchmarks, including Vaihingen, Potsdam, and LoveDA. The results show that RTMamba achieves competitive performance advantages in terms of segmentation accuracy and inference speed compared to state-of-the-art CNN and transformer methods. To further validate the deployment potential of the model on embedded devices with limited resources, such as UAVs, we conducted tests on the Jetson AGX Orin edge device. The experimental results demonstrate that RTMamba achieves impressive real-time segmentation performance. Full article

12 pages, 868 KiB  
Article
Trademark Text Recognition Combining SwinTransformer and Feature-Query Mechanisms
by Boxiu Zhou, Xiuhui Wang, Wenchao Zhou and Longwen Li
Electronics 2024, 13(14), 2814; https://doi.org/10.3390/electronics13142814 - 17 Jul 2024
Viewed by 245
Abstract
The task of trademark text recognition is a fundamental component of scene text recognition (STR), which currently faces a number of challenges, including the presence of unordered, irregular or curved text, as well as text that is distorted or rotated. In applications such as trademark infringement detection and analysis of brand effects, the diversification of artistic fonts in trademarks and the complexity of the product surfaces where the trademarks are located pose major challenges for relevant research. To tackle these issues, this paper proposes a novel recognition framework named SwinCornerTR, which aims to enhance the accuracy and robustness of trademark text recognition. Firstly, a novel feature-extraction network based on SwinTransformer with EFPN (enhanced feature pyramid network) is proposed. By incorporating SwinTransformer as the backbone, efficient capture of global information in trademark images is achieved through the self-attention mechanism and enhanced feature pyramid module, providing more accurate and expressive feature representations for subsequent text extraction. Then, during the encoding stage, a novel feature point-retrieval algorithm based on corner detection is designed: an OTSU-based fast corner detector generates a corner map, achieving efficient and accurate corner detection, and the retrieved key-point regions are selected with priority, eliminating character-to-character lines and suppressing background interference. Finally, we conducted extensive experiments on two open-access benchmark datasets, SVT and CUTE80, as well as a self-constructed trademark dataset, to assess the effectiveness of the proposed method. Our results showed that the proposed method achieved accuracies of 92.9%, 92.3% and 84.8%, respectively, on these datasets. These results demonstrate the effectiveness and robustness of the proposed method in the analysis of trademark data. Full article
(This article belongs to the Section Artificial Intelligence)
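The abstract names an "OTSU-based fast corner detector" without further detail. One plausible reading, sketched below with OpenCV, is to binarize the trademark image with Otsu's threshold and run the FAST detector on the result to build a corner map; the file path and FAST threshold are hypothetical, and the paper's actual detector may differ.

```python
import cv2
import numpy as np

img = cv2.imread("trademark.png", cv2.IMREAD_GRAYSCALE)  # hypothetical path
# Otsu picks the binarization threshold automatically from the histogram.
_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

fast = cv2.FastFeatureDetector_create(threshold=20)  # assumed threshold
keypoints = fast.detect(binary, None)

# Rasterize detected key points into a corner map for the encoder.
corner_map = np.zeros_like(img)
for kp in keypoints:
    x, y = map(int, kp.pt)
    corner_map[y, x] = 255
```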

16 pages, 9904 KiB  
Article
Improved Chinese Giant Salamander Parental Care Behavior Detection Based on YOLOv8
by Zhihao Li, Shouliang Luo, Jing Xiang, Yuanqiong Chen and Qinghua Luo
Animals 2024, 14(14), 2089; https://doi.org/10.3390/ani14142089 - 17 Jul 2024
Viewed by 237
Abstract
Optimizing the breeding techniques and increasing the hatching rate of Andrias davidianus offspring necessitates a thorough understanding of its parental care behaviors. However, A. davidianus’ nocturnal and cave-dwelling tendencies pose significant challenges for direct observation. To address this problem, this study constructed a dataset for the parental care behavior of A. davidianus, applied the target detection method to this behavior for the first time, and proposed a detection model for A. davidianus’ parental care behavior based on the YOLOv8s algorithm. Firstly, a multi-scale feature fusion convolution (MSConv) is proposed and combined with a C2f module, which significantly enhances the feature extraction capability of the model. Secondly, the large separable kernel attention is introduced into the spatial pyramid pooling fast (SPPF) layer to effectively reduce the interference factors in the complex environment. Thirdly, to address the problem of low quality of captured images, Wise-IoU (WIoU) is used to replace CIoU in the original YOLOv8 to optimize the loss function and improve the model’s robustness. The experimental results show that the model achieves 85.7% in the mAP50-95, surpassing the YOLOv8s model by 2.1%. Compared with other mainstream models, the overall performance of our model is much better and can effectively detect the parental care behavior of A. davidianus. Our research method not only offers a reference for the behavior recognition of A. davidianus and other amphibians but also provides a new strategy for the smart breeding of A. davidianus. Full article
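The paper modifies the spatial pyramid pooling fast (SPPF) layer; for context, the sketch below is the baseline SPPF as used in YOLOv5/YOLOv8 — three chained 5x5 max-pools whose outputs are concatenated (batch norm and activation omitted for brevity). The large separable kernel attention the authors insert into this layer is not reproduced here.

```python
import torch
import torch.nn as nn

class SPPF(nn.Module):
    def __init__(self, c_in, c_out, k=5):
        super().__init__()
        c_mid = c_in // 2
        self.cv1 = nn.Conv2d(c_in, c_mid, 1)
        self.cv2 = nn.Conv2d(c_mid * 4, c_out, 1)
        self.pool = nn.MaxPool2d(k, stride=1, padding=k // 2)

    def forward(self, x):
        x = self.cv1(x)
        y1 = self.pool(x)    # ~5x5 receptive field
        y2 = self.pool(y1)   # ~9x9
        y3 = self.pool(y2)   # ~13x13
        return self.cv2(torch.cat([x, y1, y2, y3], dim=1))
```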

20 pages, 28729 KiB  
Article
Unmanned Aerial Vehicle Object Detection Based on Information-Preserving and Fine-Grained Feature Aggregation
by Jiangfan Zhang, Yan Zhang, Zhiguang Shi, Yu Zhang and Ruobin Gao
Remote Sens. 2024, 16(14), 2590; https://doi.org/10.3390/rs16142590 - 15 Jul 2024
Viewed by 251
Abstract
General deep learning methods achieve high-level semantic feature representation by aggregating hierarchical features, which performs well in object detection tasks. However, issues arise with general deep learning methods in UAV-based remote sensing image object detection tasks. Firstly, general feature aggregation methods such as stride convolution may lead to information loss in input samples. Secondly, common FPN methods introduce conflicting information by directly fusing feature maps from different levels. These shortcomings limit the model’s detection performance on small and weak targets in remote sensing images. In response to these concerns, we propose an unmanned aerial vehicle (UAV) object detection algorithm, IF-YOLO. Specifically, our algorithm leverages the Information-Preserving Feature Aggregation (IPFA) module to construct semantic feature representations while preserving the intrinsic features of small objects. Furthermore, to filter out irrelevant information introduced by direct fusion, we introduce the Conflict Information Suppression Feature Fusion Module (CSFM) to improve the feature fusion approach. Additionally, the Fine-Grained Aggregation Feature Pyramid Network (FGAFPN) facilitates interaction between feature maps at different levels, reducing the generation of conflicting information during multi-scale feature fusion. The experimental results on the VisDrone2019 dataset demonstrate that in contrast to the standard YOLOv8-s, our enhanced algorithm achieves a mean average precision (mAP) of 47.3%, with precision and recall rates enhanced by 6.3% and 5.6%, respectively. Full article
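The IPFA module's internals are not given in the abstract, but its information-loss argument against stride convolution is the same one behind space-to-depth downsampling (as in SPD-Conv): rearrange pixels into channels instead of discarding them. The sketch below is an illustrative stand-in under that assumption, not the authors' module.

```python
import torch
import torch.nn as nn

class SpaceToDepthDown(nn.Module):
    """Downsample 2x while preserving all spatial information."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.conv = nn.Conv2d(4 * c_in, c_out, 1)  # fuse the rearranged channels

    def forward(self, x):
        # pixel_unshuffle is lossless: (B, C, H, W) -> (B, 4C, H/2, W/2)
        x = nn.functional.pixel_unshuffle(x, 2)
        return self.conv(x)

x = torch.randn(1, 64, 80, 80)
print(SpaceToDepthDown(64, 128)(x).shape)  # torch.Size([1, 128, 40, 40])
```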
16 pages, 7412 KiB  
Article
An Identification Method for Mixed Coal Vitrinite Components Based on An Improved DeepLabv3+ Network
by Fujie Wang, Fanfan Li, Wei Sun, Xiaozhong Song and Huishan Lu
Energies 2024, 17(14), 3453; https://doi.org/10.3390/en17143453 - 13 Jul 2024
Viewed by 319
Abstract
To address the high complexity and low accuracy issues of traditional methods in mixed coal vitrinite identification, this paper proposes a method based on an improved DeepLabv3+ network. First, MobileNetV2 is used as the backbone network to reduce the number of parameters. Second, an atrous convolution layer with a dilation rate of 24 is added to the ASPP (atrous spatial pyramid pooling) module to further increase the receptive field. Meanwhile, a CBAM (convolutional block attention module) attention mechanism with a channel multiplier of 8 is introduced at the output part of the ASPP module to better filter out important semantic features. Then, a corrective convolution module is added to the network’s output to ensure the consistency of each channel’s output feature map for each type of vitrinite. Finally, images of 14 single vitrinite components are used as training samples for network training, and a validation set is used for identification testing. The results show that the improved DeepLabv3+ achieves 6.14% and 3.68% improvements in MIOU (mean intersection over union) and MPA (mean pixel accuracy), respectively, compared to the original DeepLabv3+; 12% and 5.3% improvements compared to U-Net; 9.26% and 4.73% improvements compared to PSPNet with ResNet as the backbone; 5.4% and 9.34% improvements compared to PSPNet with MobileNetV2 as the backbone; and 6.46% and 9.05% improvements compared to HRNet. Additionally, the improved ASPP module increases MIOU and MPA by 3.23% and 1.93%, respectively, compared to the original module. The CBAM attention mechanism with a channel multiplier of 8 improves MIOU and MPA by 1.97% and 1.72%, respectively, compared to the original channel multiplier of 16. The data indicate that the proposed identification method significantly improves recognition accuracy and can be effectively applied to mixed coal vitrinite identification. Full article
(This article belongs to the Special Issue Factor Analysis and Mathematical Modeling of Coals)
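For reference, a minimal ASPP sketch including the extra rate-24 atrous branch described above; DeepLabv3+'s standard branches are a 1x1 convolution plus dilations 6, 12, and 18, and the image-level pooling branch is omitted here for brevity.

```python
import torch
import torch.nn as nn

class ASPP(nn.Module):
    def __init__(self, c_in, c_out, rates=(6, 12, 18, 24)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(c_in, c_out, 1)] +  # 1x1 branch
            # padding == dilation keeps the 3x3 output the same size
            [nn.Conv2d(c_in, c_out, 3, padding=r, dilation=r) for r in rates]
        )
        self.project = nn.Conv2d(c_out * (len(rates) + 1), c_out, 1)

    def forward(self, x):
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))
```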

17 pages, 10982 KiB  
Article
Automatic Identification of Sea Rice Grains in Complex Field Environment Based on Deep Learning
by Ruoling Deng, Weilin Cheng, Haitao Liu, Donglin Hou, Xiecheng Zhong, Zijian Huang, Bingfeng Xie and Ningxia Yin
Agriculture 2024, 14(7), 1135; https://doi.org/10.3390/agriculture14071135 - 12 Jul 2024
Viewed by 364
Abstract
The number of grains per sea rice panicle is an important parameter directly related to rice yield, and it is also a very important agronomic trait in research related to sea rice breeding. However, the grain number per sea rice panicle still mainly relies on manual calculation, which has the disadvantages of being time-consuming, error-prone, and labor-intensive. In this study, a novel method was developed for the automatic calculation of the grain number per rice panicle based on a deep convolutional neural network. Firstly, some sea rice panicle images were collected in complex field environment and annotated to establish the sea rice panicle image data set. Then, a sea grain detection model was developed using the Faster R-CNN embedded with a feature pyramid network (FPN) for grain identification and location. Also, ROI Align was used to replace ROI pooling to solve the problem of relatively large deviations in the prediction frame when the model detected small grains. Finally, the mAP (mean Average Precision) and accuracy of the sea grain detection model were 90.1% and 94.9%, demonstrating that the proposed method had high accuracy in identifying and locating sea grains. The sea rice grain detection model can quickly and accurately predict the number of grains per panicle, providing an effective, convenient, and low-cost tool for yield evaluation, crop breeding, and genetic research. It also has great potential in assisting phenotypic research. Full article
(This article belongs to the Special Issue Application of Machine Learning and Data Analysis in Agriculture)
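The detector described — Faster R-CNN with an FPN backbone and RoI Align — matches what torchvision provides out of the box, so the fine-tuning setup can be sketched directly. Treating grains as a single foreground class (num_classes=2 including background) is an assumption.

```python
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes=2)
# torchvision's roi_heads use MultiScaleRoIAlign, i.e. RoI Align rather than
# RoI pooling -- the same substitution the abstract makes for small grains.
```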

16 pages, 32240 KiB  
Article
A Novel Tongue Coating Segmentation Method Based on Improved TransUNet
by Jiaze Wu, Zijian Li, Yiheng Cai, Hao Liang, Long Zhou, Ming Chen and Jing Guan
Sensors 2024, 24(14), 4455; https://doi.org/10.3390/s24144455 - 10 Jul 2024
Viewed by 263
Abstract
Background: As an important part of the tongue, the tongue coating is closely associated with different disorders and has major diagnostic benefits. This study aims to construct a neural network model that can perform complex tongue coating segmentation. This addresses the issue of tongue coating segmentation in intelligent tongue diagnosis automation. Method: This work proposes an improved TransUNet to segment the tongue coating. We introduced a transformer as a self-attention mechanism to capture the semantic information in the high-level features of the encoder. At the same time, the subtraction feature pyramid (SFP) and visual regional enhancer (VRE) were constructed to minimize the redundant information transmitted by skip connections and improve the spatial detail information in the low-level features of the encoder. Results: Comparative and ablation experiments indicate that our model has an accuracy of 96.36%, a precision of 96.26%, a dice of 96.76%, a recall of 97.43%, and an IoU of 93.81%, achieving the best segmentation results among the compared models. Conclusion: The improved TransUNet proposed here can achieve precise segmentation of complex tongue images. This provides an effective technique for the automatic extraction of the tongue coating from tongue images, contributing to the automation and accuracy of tongue diagnosis. Full article
(This article belongs to the Section Sensing and Imaging)
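A hedged sketch of one reading of the subtraction feature pyramid (SFP): pass the difference between an encoder level and its upsampled deeper neighbor through the skip connection, so information the deep path already carries is not re-transmitted. Everything beyond what the abstract states is an assumption.

```python
import torch
import torch.nn.functional as F

def subtraction_skip(shallow, deep):
    """shallow: (B, C, H, W); deep: (B, C, H/2, W/2) from the next encoder stage."""
    deep_up = F.interpolate(deep, size=shallow.shape[-2:], mode="bilinear",
                            align_corners=False)
    return shallow - deep_up  # keep only what the deep path lacks
```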

18 pages, 5924 KiB  
Article
Multi-Scale Marine Object Detection in Side-Scan Sonar Images Based on BES-YOLO
by Quanhong Ma, Shaohua Jin, Gang Bian and Yang Cui
Sensors 2024, 24(14), 4428; https://doi.org/10.3390/s24144428 - 9 Jul 2024
Viewed by 360
Abstract
Aiming at the problem of low accuracy of multi-scale seafloor target detection in side-scan sonar images with high noise and complex background texture, a model for multi-scale target detection using the BES-YOLO network is proposed. First, an efficient multi-scale attention (EMA) mechanism is used in the backbone of the YOLOv8 network, and a bi-directional feature pyramid network (BiFPN) is introduced to merge information across scales; finally, a Shape_IoU loss function is introduced to continuously optimize the model and improve its accuracy. Before training, the dataset is preprocessed using 2D discrete wavelet decomposition and reconstruction to enhance the robustness of the network. The experimental results show that the BES-YOLO network achieves a mean average precision of 92.4% at an IoU of 0.5 (mAP@0.5) and 67.7% at IoUs from 0.5 to 0.95 (mAP@0.5:0.95), increases of 5.3% and 4.4% over the YOLOv8n model. The research results can effectively improve the detection accuracy and efficiency of multi-scale targets in side-scan sonar images and can be applied to AUVs and other underwater platforms to implement intelligent detection of undersea targets. Full article
(This article belongs to the Section Intelligent Sensors)
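The wavelet decompose/reconstruct preprocessing can be sketched with PyWavelets; the wavelet family ('db2') and the choice to attenuate the diagonal detail band, where sonar speckle tends to concentrate, are assumptions since the abstract gives no parameters.

```python
import numpy as np
import pywt

def dwt_denoise(img, wavelet="db2", detail_gain=0.5):
    # Single-level 2-D DWT: approximation + horizontal/vertical/diagonal details.
    cA, (cH, cV, cD) = pywt.dwt2(img.astype(np.float32), wavelet)
    cD *= detail_gain  # damp the high-frequency band (assumed denoising choice)
    return pywt.idwt2((cA, (cH, cV, cD)), wavelet)
```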

15 pages, 1293 KiB  
Article
An Improved Lightweight YOLOv5s-Based Method for Detecting Electric Bicycles in Elevators
by Ziyuan Zhang, Xianyu Yang and Chengyu Wu
Electronics 2024, 13(13), 2660; https://doi.org/10.3390/electronics13132660 - 7 Jul 2024
Viewed by 372
Abstract
The increase in fire accidents caused by the indoor charging of electric bicycles (EBs) has raised public concern. Monitoring EBs in elevators is challenging, and current object detection methods, typically YOLOv5 variants, face problems with computational load and detection rate. To address this issue, this paper presents an improved lightweight method based on YOLOv5s to detect EBs in elevators. This method introduces the MobileNetV2 module to achieve the lightweight performance of the model. By introducing the CBAM attention mechanism and the Bidirectional Feature Pyramid Network (BiFPN) into the YOLOv5s neck network, the detection precision is improved. To verify that the model can be deployed at the edge of an elevator, this article deploys it on a Raspberry Pi 4B embedded development board connected to a buzzer for application verification. The experimental results demonstrate that the model's parameter count is reduced by 58.4%, the computational complexity is reduced by 50.6%, the detection precision reaches 95.9%, and real-time detection of EBs in elevators is achieved. Full article
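A hedged sketch of the described edge deployment: run the detector on camera frames on the Raspberry Pi 4B and drive a buzzer over GPIO when an EB is detected. The GPIO pin, confidence threshold, weight file, and use of the YOLOv5 torch.hub API are assumptions; the paper does not publish its deployment code.

```python
import cv2
import torch
import RPi.GPIO as GPIO

GPIO.setmode(GPIO.BCM)
GPIO.setup(18, GPIO.OUT)  # buzzer pin (assumed)

# Hypothetical trained weights, loaded via the YOLOv5 custom-model hub entry.
model = torch.hub.load("ultralytics/yolov5", "custom", path="eb_yolov5s.pt")

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # hub model expects RGB
    det = model(rgb).xyxy[0]                      # (N, 6): x1,y1,x2,y2,conf,cls
    eb_found = bool((det[:, 4] > 0.5).any())      # assumed confidence threshold
    GPIO.output(18, GPIO.HIGH if eb_found else GPIO.LOW)  # sound buzzer on detection
```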

17 pages, 20371 KiB  
Article
YOLOv8 Model for Weed Detection in Wheat Fields Based on a Visual Converter and Multi-Scale Feature Fusion
by Yinzeng Liu, Fandi Zeng, Hongwei Diao, Junke Zhu, Dong Ji, Xijie Liao and Zhihuan Zhao
Sensors 2024, 24(13), 4379; https://doi.org/10.3390/s24134379 - 5 Jul 2024
Viewed by 362
Abstract
Accurate weed detection is essential for the precise control of weeds in wheat fields, but weeds and wheat are sheltered from each other, and there is no clear size specification, making it difficult to accurately detect weeds in wheat. To achieve the precise identification of weeds, wheat weed datasets were constructed, and a wheat field weed detection model, YOLOv8-MBM, based on improved YOLOv8s, was proposed. In this study, a lightweight visual converter (MobileViTv3) was introduced into the C2f module to enhance the detection accuracy of the model by integrating input, local (CNN), and global (ViT) features. Secondly, a bidirectional feature pyramid network (BiFPN) was introduced to enhance the performance of multi-scale feature fusion. Furthermore, to address the weak generalization and slow convergence speed of the CIoU loss function for detection tasks, the bounding box regression loss function (MPDIOU) was used instead of the CIoU loss function to improve the convergence speed of the model and further enhance the detection performance. Finally, the model performance was tested on the wheat weed datasets. The experiments show that the YOLOv8-MBM proposed in this paper is superior to Fast R-CNN, YOLOv3, YOLOv4-tiny, YOLOv5s, YOLOv7, YOLOv9, and other mainstream models in regards to detection performance. The accuracy of the improved model reaches 92.7%. Compared with the original YOLOv8s model, the precision, recall, mAP1, and mAP2 are increased by 10.6%, 8.9%, 9.7%, and 9.3%, respectively. In summary, the YOLOv8-MBM model successfully meets the requirements for accurate weed detection in wheat fields. Full article
(This article belongs to the Special Issue Sensor and AI Technologies in Intelligent Agriculture: 2nd Edition)
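BiFPN recurs throughout these results; its defining operation is EfficientDet's fast normalized fusion, in which each input feature map gets a learnable non-negative weight. A minimal sketch of one fusion node follows (channel counts and the plain 3x3 output convolution are placeholders).

```python
import torch
import torch.nn as nn

class WeightedFusion(nn.Module):
    """Fuse same-shape feature maps with learnable non-negative weights."""
    def __init__(self, n_inputs, channels):
        super().__init__()
        self.w = nn.Parameter(torch.ones(n_inputs))
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, *feats):
        w = torch.relu(self.w)            # keep weights >= 0
        w = w / (w.sum() + 1e-4)          # fast normalized fusion
        fused = sum(wi * f for wi, f in zip(w, feats))
        return self.conv(fused)
```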

17 pages, 2362 KiB  
Article
Reducing Model Complexity in Neural Networks by Using Pyramid Training Approaches
by Şahım Giray Kıvanç, Baha Şen, Fatih Nar and Ali Özgün Ok
Appl. Sci. 2024, 14(13), 5898; https://doi.org/10.3390/app14135898 - 5 Jul 2024
Viewed by 395
Abstract
Throughout the evolution of machine learning, the size of models has steadily increased as researchers strive for higher accuracy by adding more layers. This escalation in model complexity necessitates enhanced hardware capabilities. Today, state-of-the-art machine learning models have become so large that effectively training them requires substantial hardware resources, which may be readily available to large companies but not to students or independent researchers. To make the research on machine learning models more accessible, this study introduces a size reduction technique that leverages stages in pyramid training and similarity comparison. We conducted experiments on classification, segmentation, and object detection tasks using various network configurations. Our results demonstrate that pyramid training can reduce model complexity by up to 70% while maintaining accuracy comparable to conventional full-sized models. These findings offer a scalable and resource-efficient solution for researchers and practitioners in hardware-constrained environments. Full article
(This article belongs to the Special Issue Recent Advances in Automated Machine Learning: 2nd Edition)

28 pages, 7404 KiB  
Article
Context-Aggregated and SAM-Guided Network for ViT-Based Instance Segmentation in Remote Sensing Images
by Shuangzhou Liu, Feng Wang, Hongjian You, Niangang Jiao, Guangyao Zhou and Tingtao Zhang
Remote Sens. 2024, 16(13), 2472; https://doi.org/10.3390/rs16132472 - 5 Jul 2024
Viewed by 315
Abstract
Instance segmentation of remote sensing images can provide not only object-level but also pixel-level positioning information. This pixel-level annotation has a wide range of uses in the field of remote sensing and is of great value for environmental detection and resource management. Because optical images generally contain complex terrain environments and changeable object shapes, and SAR images are affected by complex scattering phenomena, the mask quality obtained by traditional instance segmentation methods on remote sensing images is not high, and improving it is a challenging task. Since the traditional two-stage instance segmentation pipeline consists of a backbone, neck, bbox head, and mask head, the final mask quality depends on the quality of all preceding stages. We therefore target improvements to the neck, bbox head, and mask head to address the difficulties that optical and SAR images pose for instance segmentation, and we propose the Context-Aggregated and SAM-Guided Network (CSNet). In this network, the plain feature fusion pyramid network (PFFPN) generates a pyramid for the plain feature and provides feature maps at the appropriate instance scale for detection and segmentation. The network also includes a context aggregation bbox head (CABH), which uses the context and instance information around each instance to reduce missed and false detections. Finally, a SAM-guided mask head (SGMH) learns with SAM as a teacher and uses the learned knowledge to refine mask edges. Experimental results show that CSNet significantly improves the quality of masks generated on optical and SAR images, achieving 5.1% and 3.2% AP increments compared with other SOTA models. Full article
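A hedged sketch of "learning with SAM as a teacher": supervise the student mask head with SAM's mask probabilities alongside the ground truth. The plain BCE losses and the balancing weight are assumptions; the paper's SGMH may combine its terms differently.

```python
import torch
import torch.nn.functional as F

def sam_guided_loss(student_logits, gt_mask, sam_probs, alpha=0.5):
    """student_logits, gt_mask, sam_probs: (B, 1, H, W) tensors."""
    loss_gt = F.binary_cross_entropy_with_logits(student_logits, gt_mask)
    loss_sam = F.binary_cross_entropy_with_logits(student_logits, sam_probs)
    return loss_gt + alpha * loss_sam  # soft SAM targets sharpen mask edges
```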

19 pages, 18726 KiB  
Article
A Small-Object Detection Model Based on Improved YOLOv8s for UAV Image Scenarios
by Jianjun Ni, Shengjie Zhu, Guangyi Tang, Chunyan Ke and Tingting Wang
Remote Sens. 2024, 16(13), 2465; https://doi.org/10.3390/rs16132465 - 5 Jul 2024
Viewed by 487
Abstract
Small object detection for unmanned aerial vehicle (UAV) image scenarios is a challenging task in the computer vision field. Some problems should be further studied, such as the dense small objects and background noise in high-altitude aerial photography images. To address these issues, an enhanced YOLOv8s-based model for detecting small objects is presented. The proposed model incorporates a parallel multi-scale feature extraction module (PMSE), which enhances the feature extraction capability for small objects by generating adaptive weights with different receptive fields through parallel dilated convolution and deformable convolution, and integrating the generated weight information into shallow feature maps. Then, a scale compensation feature pyramid network (SCFPN) is designed to integrate the spatial feature information derived from the shallow neural network layers with the semantic data extracted from the higher layers of the network, thereby enhancing the network’s capacity for representing features. Furthermore, the largest-object detection layer is removed from the original detection layers, and an ultra-small-object detection layer is applied, with the objective of improving the network’s detection performance for small objects. Finally, the WIOU loss function is employed to balance high- and low-quality samples in the dataset. The results of the experiments conducted on the two public datasets illustrate that the proposed model can enhance the object detection accuracy in UAV image scenarios. Full article