Search Results (1,557)

Search Parameters:
Keywords = multi-scale feature fusion

17 pages, 3316 KiB  
Article
AMSformer: A Transformer for Grain Storage Temperature Prediction Using Adaptive Multi-Scale Feature Fusion
by Qinghui Zhang, Weixiang Zhang, Quanzhen Huang, Chenxia Wan and Zhihui Li
Agriculture 2025, 15(1), 58; https://doi.org/10.3390/agriculture15010058 (registering DOI) - 29 Dec 2024
Viewed by 273
Abstract
Grain storage temperature prediction is crucial for silo safety and can effectively prevent the mold and mildew caused by rising grain temperature and the condensation caused by falling grain temperature. However, current prediction methods lead to information redundancy when capturing temporal and spatial dependencies, which diminishes prediction accuracy. To tackle this issue, this paper introduces an adaptive multi-scale feature fusion transformer model (AMSformer). Firstly, the model utilizes the adaptive channel attention (ACA) mechanism to adjust the weights of different channels according to the input data characteristics and suppress irrelevant or redundant channels. Secondly, AMSformer employs the multi-scale attention mechanism (MSA) to more accurately capture dependencies at different time scales. Finally, the ACA and MSA layers are integrated by a hierarchical encoder (HED) to efficiently utilize adaptive multi-scale information, enhancing prediction accuracy. In this study, actual grain temperature data and six publicly available datasets are used for validation and performance comparison with nine existing models. The results demonstrate that AMSformer outperforms the competing models in 36 of the 58 test cases, highlighting its significant advantages in prediction accuracy and efficiency. Full article
(This article belongs to the Section Digital Agriculture)
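
The listing names ACA, MSA, and a hierarchical encoder but gives no implementation detail. Below is a minimal PyTorch-style sketch of the two attention ideas as described: squeeze-and-excitation-style channel gating standing in for ACA, and attention over average-pooled copies of the sequence standing in for MSA. The module names, the (batch, time, channels) layout, and the pooling scales are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveChannelAttention(nn.Module):
    """ACA-style sketch: gate sensor channels by learned importance."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                      # x: (batch, time, channels)
        weights = self.mlp(x.mean(dim=1))      # one weight per channel
        return x * weights.unsqueeze(1)        # damp redundant channels

class MultiScaleAttention(nn.Module):
    """MSA-style sketch: self-attention over the sequence pooled at several scales."""
    def __init__(self, channels: int, scales=(1, 2, 4), heads: int = 4):
        super().__init__()
        self.scales = scales
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x):                      # x: (batch, time, channels)
        outs = []
        for s in self.scales:
            pooled = F.avg_pool1d(x.transpose(1, 2), s).transpose(1, 2)
            out, _ = self.attn(pooled, pooled, pooled)
            out = F.interpolate(out.transpose(1, 2), size=x.shape[1]).transpose(1, 2)
            outs.append(out)
        return x + sum(outs) / len(outs)       # residual fusion across scales

x = torch.randn(8, 96, 16)                     # 96 time steps, 16 temperature sensors
y = MultiScaleAttention(16)(AdaptiveChannelAttention(16)(x))
print(y.shape)                                 # torch.Size([8, 96, 16])
```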

24 pages, 12644 KiB  
Article
Vehicle Flow Detection and Tracking Based on an Improved YOLOv8n and ByteTrack Framework
by Jinjiang Liu, Yonghua Xie, Yu Zhang and Haoming Li
World Electr. Veh. J. 2025, 16(1), 13; https://doi.org/10.3390/wevj16010013 (registering DOI) - 28 Dec 2024
Viewed by 228
Abstract
Vehicle flow detection and tracking are crucial components of intelligent transportation systems. However, traditional methods often struggle with challenges such as the poor detection of small objects and low efficiency when processing large-scale data. To address these issues, this paper proposes a vehicle flow detection and tracking method that integrates an improved YOLOv8n model with the ByteTrack algorithm. In the detection module, we introduce the innovative MSN-YOLO model, which combines the C2f_MLCA module, the Detect_SEAM module, and the NWD loss function to enhance feature fusion and improve cross-scale information processing. These enhancements significantly boost the model’s ability to detect small objects and handle complex backgrounds. In the tracking module, we incorporate the ByteTrack algorithm and train unique vehicle re-identification (Re-ID) features, ensuring robust multi-object tracking in complex environments and improving the stability and accuracy of vehicle flow tracking. The experimental results demonstrate that the proposed method achieves a mean Average Precision (mAP) of 62.8% at IoU = 0.50 and a Multiple Object Tracking Accuracy (MOTA) of 72.16% in real-time tracking. These improvements represent increases of 2.7% and 3.16%, respectively, compared to baseline algorithms. This method provides effective technical support for intelligent traffic management, traffic flow monitoring, and congestion prediction. Full article
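
The NWD loss named here presumably refers to the Normalized Gaussian Wasserstein Distance commonly used for small-object detection; if so, the published formulation can be sketched as below. The (cx, cy, w, h) box format and the normalizing constant C are assumptions, and the paper may combine this with IoU-based terms differently.

```python
import torch

def nwd_loss(pred, target, c: float = 12.8):
    """Normalized Wasserstein Distance loss for boxes in (cx, cy, w, h) format.

    Each box is modelled as a 2D Gaussian; the squared 2-Wasserstein distance
    between two such Gaussians has the closed form below, and the loss is
    1 - exp(-sqrt(W2^2) / C), which stays informative even for tiny boxes.
    """
    cx1, cy1, w1, h1 = pred.unbind(-1)
    cx2, cy2, w2, h2 = target.unbind(-1)
    dist_sq = (cx1 - cx2) ** 2 + (cy1 - cy2) ** 2 \
            + ((w1 - w2) / 2) ** 2 + ((h1 - h2) / 2) ** 2
    nwd = torch.exp(-torch.sqrt(dist_sq.clamp(min=1e-7)) / c)
    return (1.0 - nwd).mean()

pred   = torch.tensor([[50.0, 50.0, 10.0, 8.0]])   # predicted box
target = torch.tensor([[52.0, 49.0, 12.0, 8.0]])   # ground-truth box
print(nwd_loss(pred, target))                       # small loss: boxes nearly overlap
```
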
19 pages, 8480 KiB  
Article
HAD-YOLO: An Accurate and Effective Weed Detection Model Based on Improved YOLOV5 Network
by Long Deng, Zhonghua Miao, Xueguan Zhao, Shuo Yang, Yuanyuan Gao, Changyuan Zhai and Chunjiang Zhao
Agronomy 2025, 15(1), 57; https://doi.org/10.3390/agronomy15010057 (registering DOI) - 28 Dec 2024
Viewed by 254
Abstract
Weeds significantly impact crop yields and quality, necessitating strict control. Effective weed identification is essential to precision weeding in the field. Existing detection methods struggle with the inconsistent size scales of weed targets and the issue of small targets, making it difficult to achieve efficient detection, and they are unable to satisfy both the speed and accuracy requirements for detection at the same time. Therefore, this study, focusing on three common types of weeds in the field—Amaranthus retroflexus, Eleusine indica, and Chenopodium—proposes the HAD-YOLO model. With the purpose of improving the model’s capacity to extract features and making it more lightweight, this algorithm employs the HGNetV2 as its backbone network. The Scale Sequence Feature Fusion Module (SSFF) and Triple Feature Encoding Module (TFE) from the ASF-YOLO are introduced to improve the model’s capacity to extract features across various scales, and on this basis, to improve the model’s capacity to detect small targets, a P2 feature layer is included. Finally, a target detection head with an attention mechanism, Dynamic head (Dyhead), is utilized to improve the detection head’s capacity for representation. Experimental results show that on the dataset collected in the greenhouse, the mAP for weed detection is 94.2%; using this as the pre-trained weight, on the dataset collected in the field environment, the mAP for weed detection is 96.2%, and the detection FPS is 30.6. Overall, the HAD-YOLO model has effectively addressed the requirements for accurate weed identification, offering both theoretical and technical backing for automatic weed control. Future efforts will involve collecting more weed data from various agricultural field scenarios to validate and enhance the generalization capabilities of the HAD-YOLO model. Full article
(This article belongs to the Section Weed Science and Weed Management)
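
The SSFF module borrowed from ASF-YOLO stacks pyramid levels along an extra "scale" axis and fuses them with a 3D convolution; the sketch below reproduces that idea only in outline. Channel counts and the way the scale axis is collapsed are illustrative assumptions, not the HAD-YOLO implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaleSequenceFusion(nn.Module):
    """SSFF-style sketch: resize pyramid levels to one resolution, stack them
    along a new 'scale' axis and fuse across scales with a 3D convolution."""
    def __init__(self, channels: int):
        super().__init__()
        self.fuse = nn.Conv3d(channels, channels, kernel_size=3, padding=1)

    def forward(self, feats):                    # list of (N, C, Hi, Wi), fine -> coarse
        h, w = feats[0].shape[-2:]
        resized = [F.interpolate(f, size=(h, w), mode="nearest") for f in feats]
        stack = torch.stack(resized, dim=2)      # (N, C, num_scales, H, W)
        fused = self.fuse(stack)                 # mix information across scales
        return fused.mean(dim=2)                 # collapse scale axis -> (N, C, H, W)

p3 = torch.randn(1, 64, 80, 80)                  # finest level
p4 = torch.randn(1, 64, 40, 40)
p5 = torch.randn(1, 64, 20, 20)                  # coarsest level
print(ScaleSequenceFusion(64)([p3, p4, p5]).shape)   # torch.Size([1, 64, 80, 80])
```
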
25 pages, 19869 KiB  
Article
PMDNet: An Improved Object Detection Model for Wheat Field Weed
by Zhengyuan Qi and Jun Wang
Agronomy 2025, 15(1), 55; https://doi.org/10.3390/agronomy15010055 (registering DOI) - 28 Dec 2024
Viewed by 152
Abstract
Efficient and accurate weed detection in wheat fields is critical for precision agriculture to optimize crop yield and minimize herbicide usage. A dataset for weed detection in wheat fields was created, encompassing 5967 images across eight well-balanced weed categories, and it comprehensively covers the entire growth cycle of spring wheat as well as the associated weed species observed throughout this period. Based on this dataset, PMDNet, an improved object detection model built upon the YOLOv8 architecture, was introduced and optimized for wheat field weed detection tasks. PMDNet incorporates the Poly Kernel Inception Network (PKINet) as the backbone, the self-designed Multi-Scale Feature Pyramid Network (MSFPN) for multi-scale feature fusion, and Dynamic Head (DyHead) as the detection head, resulting in significant performance improvements. Compared to the baseline YOLOv8n model, PMDNet increased mAP@0.5 from 83.6% to 85.8% (+2.2%) and mAP@0.5:0.95 from 65.7% to 69.6% (+5.9%). Furthermore, PMDNet outperformed several classical single-stage and two-stage object detection models, achieving the highest precision (94.5%, 14.1% higher than Faster-RCNN) and mAP@0.5 (85.8%, 5.4% higher than RT-DETR-L). Under the stricter mAP@0.5:0.95 metric, PMDNet reached 69.6%, surpassing Faster-RCNN by 16.7% and RetinaNet by 13.1%. Real-world video detection tests further validated PMDNet’s practicality, achieving 87.7 FPS and demonstrating high precision in detecting weeds in complex backgrounds and small targets. These advancements highlight PMDNet’s potential for practical applications in precision agriculture, providing a robust solution for weed management and contributing to the development of sustainable farming practices. Full article
(This article belongs to the Section Precision and Digital Agriculture)
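
MSFPN is described only as a self-designed multi-scale fusion neck, so the sketch below shows just the generic top-down pyramid fusion that such modules build on; it is not the authors' MSFPN. The channel widths and nearest-neighbour upsampling are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopDownFusion(nn.Module):
    """Generic top-down pyramid fusion: project every level to a shared width,
    then add each upsampled coarser level into the next finer one."""
    def __init__(self, in_channels=(64, 128, 256), width: int = 96):
        super().__init__()
        self.lateral = nn.ModuleList([nn.Conv2d(c, width, 1) for c in in_channels])
        self.smooth = nn.ModuleList([nn.Conv2d(width, width, 3, padding=1)
                                     for _ in in_channels])

    def forward(self, feats):                        # fine -> coarse order
        lat = [conv(f) for conv, f in zip(self.lateral, feats)]
        for i in range(len(lat) - 2, -1, -1):        # coarse flows into fine
            lat[i] = lat[i] + F.interpolate(lat[i + 1], size=lat[i].shape[-2:])
        return [conv(x) for conv, x in zip(self.smooth, lat)]

feats = [torch.randn(1, 64, 80, 80), torch.randn(1, 128, 40, 40),
         torch.randn(1, 256, 20, 20)]
outs = TopDownFusion()(feats)
print([tuple(o.shape) for o in outs])   # [(1, 96, 80, 80), (1, 96, 40, 40), (1, 96, 20, 20)]
```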

22 pages, 4773 KiB  
Article
GFN: A Garbage Classification Fusion Network Incorporating Multiple Attention Mechanisms
by Zhaoqi Wang, Wenxue Zhou and Yanmei Li
Electronics 2025, 14(1), 75; https://doi.org/10.3390/electronics14010075 - 27 Dec 2024
Viewed by 244
Abstract
With the increasing global attention to environmental protection and the sustainable use of resources, waste classification has become a critical issue that needs urgent resolution in social development. Compared with the traditional manual waste classification methods, deep learning-based waste classification systems offer significant advantages. This paper proposes an innovative deep learning framework, Garbage FusionNet (GFN), aimed at tackling the waste classification challenge. GFN enhances classification performance by integrating the local feature extraction strengths of ResNet with the global information processing capabilities of the Vision Transformer (ViT). Furthermore, GFN incorporates the Pyramid Pooling Module (PPM) and the Convolutional Block Attention Module (CBAM), which collectively improve multi-scale feature extraction and emphasize critical features, thereby increasing the model’s robustness and accuracy. The experimental results on the Garbage Dataset and Trashnet demonstrate that GFN achieves superior performance compared with other comparison models. Full article
(This article belongs to the Section Artificial Intelligence)
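
Of the modules GFN combines, CBAM has a well-known public formulation; a compact sketch of it follows (the ResNet and ViT branches and the PPM are omitted). The reduction ratio and 7x7 spatial kernel follow the common defaults, which may differ from the paper's settings.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Convolutional Block Attention Module: a channel gate followed by a spatial gate."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels))
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):                                 # x: (N, C, H, W)
        avg = self.mlp(x.mean(dim=(2, 3)))                # channel attention branch
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx)[..., None, None]
        stats = torch.cat([x.mean(1, keepdim=True),       # spatial attention branch
                           x.amax(1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(stats))

print(CBAM(32)(torch.randn(2, 32, 28, 28)).shape)         # torch.Size([2, 32, 28, 28])
```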

22 pages, 9808 KiB  
Article
An Efficient Group Convolution and Feature Fusion Method for Weed Detection
by Chaowen Chen, Ying Zang, Jinkang Jiao, Daoqing Yan, Zhuorong Fan, Zijian Cui and Minghua Zhang
Agriculture 2025, 15(1), 37; https://doi.org/10.3390/agriculture15010037 - 27 Dec 2024
Viewed by 269
Abstract
Weed detection is a crucial step in achieving intelligent weeding for vegetables. Currently, research on vegetable weed detection technology is relatively limited, and existing detection methods still face challenges due to complex natural conditions, resulting in low detection accuracy and efficiency. This paper proposes the YOLOv8-EGC-Fusion (YEF) model, an enhancement based on the YOLOv8 model, to address these challenges. This model introduces plug-and-play modules: (1) The Efficient Group Convolution (EGC) module leverages convolution kernels of various sizes combined with group convolution techniques to significantly reduce computational cost. Integrating this EGC module with the C2f module creates the C2f-EGC module, strengthening the model’s capacity to grasp local contextual information. (2) The Group Context Anchor Attention (GCAA) module strengthens the model’s capacity to capture long-range contextual information, contributing to improved feature comprehension. (3) The GCAA-Fusion module effectively merges multi-scale features, addressing shallow feature loss and preserving critical information. Leveraging GCAA-Fusion and PAFPN, we developed an Adaptive Feature Fusion (AFF) feature pyramid structure that amplifies the model’s feature extraction capabilities. To ensure effective evaluation, we collected a diverse dataset of weed images from various vegetable fields. A series of comparative experiments was conducted to verify the detection effectiveness of the YEF model. The results show that the YEF model outperforms the original YOLOv8 model, Faster R-CNN, RetinaNet, TOOD, RTMDet, and YOLOv5 in detection performance. The detection metrics achieved by the YEF model are as follows: precision of 0.904, recall of 0.88, F1 score of 0.891, and mAP0.5 of 0.929. In conclusion, the YEF model demonstrates high detection accuracy for vegetable and weed identification, meeting the requirements for precise detection. Full article
(This article belongs to the Special Issue Intelligent Agricultural Machinery Design for Smart Farming)
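
The EGC module is described only as combining kernels of several sizes with group convolution; the sketch below is one plausible reading: channels are split into groups, each group gets a different depthwise kernel, and a pointwise convolution re-mixes them. The group count and kernel sizes are assumptions, not the YEF configuration.

```python
import torch
import torch.nn as nn

class EfficientGroupConv(nn.Module):
    """EGC-style sketch: split channels into groups, give each group a different
    depthwise kernel size, then re-mix with a pointwise convolution."""
    def __init__(self, channels: int, kernel_sizes=(1, 3, 5, 7)):
        super().__init__()
        assert channels % len(kernel_sizes) == 0
        split = channels // len(kernel_sizes)
        self.branches = nn.ModuleList([
            nn.Conv2d(split, split, k, padding=k // 2, groups=split)
            for k in kernel_sizes])
        self.mix = nn.Conv2d(channels, channels, 1)       # cheap channel re-mixing

    def forward(self, x):
        chunks = torch.chunk(x, len(self.branches), dim=1)
        out = torch.cat([branch(c) for branch, c in zip(self.branches, chunks)], dim=1)
        return self.mix(out)

print(EfficientGroupConv(64)(torch.randn(1, 64, 56, 56)).shape)   # torch.Size([1, 64, 56, 56])
```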

14 pages, 3595 KiB  
Article
HandFI: Multilevel Interacting Hand Reconstruction Based on Multilevel Feature Fusion in RGB Images
by Huimin Pan, Yuting Cai, Jiayi Yang, Shaojia Niu, Quanli Gao and Xihan Wang
Sensors 2025, 25(1), 88; https://doi.org/10.3390/s25010088 - 27 Dec 2024
Viewed by 305
Abstract
Interacting hand reconstruction presents significant opportunities in various applications. However, it currently faces challenges such as the difficulty in distinguishing the features of both hands, misalignment of hand meshes with input images, and modeling the complex spatial relationships between interacting hands. In this paper, we propose a multilevel feature fusion interactive network for hand reconstruction (HandFI). Within this network, the hand feature separation module utilizes attentional mechanisms and positional coding to distinguish between left-hand and right-hand features while maintaining the spatial relationship of the features. The hand fusion and attention module promotes the alignment of hand vertices with the image by integrating multi-scale hand features while introducing cross-attention to help determine the complex spatial relationships between interacting hands, thereby enhancing the accuracy of two-hand reconstruction. We evaluated our method with existing approaches using the InterHand 2.6M, RGB2Hands, and EgoHands datasets. Extensive experimental results demonstrated that our method outperformed other representative methods, with performance metrics of 9.38 mm for the MPJPE and 9.61 mm for the MPVPE. Additionally, the results obtained in real-world scenes further validated the generalization capability of our method. Full article
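
The cross-attention between interacting hands described here can be illustrated with two standard multi-head attention blocks in which each hand queries the other; the sketch below shows only that pattern. Token count, embedding width, and head count are placeholders, not HandFI's configuration.

```python
import torch
import torch.nn as nn

class HandCrossAttention(nn.Module):
    """Cross-attention sketch: each hand's tokens query the other hand's tokens,
    so the spatial relationship between interacting hands is modelled explicitly."""
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.l2r = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.r2l = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, left, right):                    # (N, tokens, dim) each
        left_out, _ = self.l2r(left, right, right)     # left attends to right
        right_out, _ = self.r2l(right, left, left)     # right attends to left
        return left + left_out, right + right_out      # residual fusion

left = torch.randn(2, 49, 256)                         # e.g. a 7x7 feature map as tokens
right = torch.randn(2, 49, 256)
fused_left, fused_right = HandCrossAttention()(left, right)
print(fused_left.shape, fused_right.shape)             # torch.Size([2, 49, 256]) twice
```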

19 pages, 7424 KiB  
Article
Residual Vision Transformer and Adaptive Fusion Autoencoders for Monocular Depth Estimation
by Wei-Jong Yang, Chih-Chen Wu and Jar-Ferr Yang
Sensors 2025, 25(1), 80; https://doi.org/10.3390/s25010080 - 26 Dec 2024
Viewed by 226
Abstract
Precision depth estimation plays a key role in many applications, including 3D scene reconstruction, virtual reality, autonomous driving and human–computer interaction. Through recent advancements in deep learning technologies, monocular depth estimation, with its simplicity, has surpassed traditional stereo camera systems, bringing new possibilities in 3D sensing. In this paper, using a single camera, we propose an end-to-end supervised monocular depth estimation autoencoder, which contains an encoder that mixes a convolutional neural network with vision transformers and an effective adaptive fusion decoder to obtain high-precision depth maps. In the encoder, we construct a multi-scale feature extractor by mixing residual configurations of vision transformers to enhance both local and global information. In the adaptive fusion decoder, we introduce adaptive fusion modules to effectively merge the features of the encoder and the decoder. Lastly, the model is trained using a loss function that aligns with human perception to enable it to focus on the depth values of foreground objects. The experimental results demonstrate the effective prediction of the depth map from a single-view color image by the proposed autoencoder, which increases the first accuracy rate by about 28% and reduces the root mean square error by about 27% compared to an existing method on the NYU dataset. Full article
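
The adaptive fusion modules are not specified in the listing; the sketch below shows one generic form of such a block, a learned per-pixel gate that blends the upsampled decoder feature with the encoder skip feature. It is an assumption about the general idea, not the authors' module.

```python
import torch
import torch.nn as nn

class AdaptiveFusion(nn.Module):
    """Generic adaptive fusion sketch: a learned per-pixel gate blends the
    upsampled decoder feature with the encoder skip feature."""
    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1), nn.Sigmoid())
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)

    def forward(self, decoder_feat, encoder_skip):
        decoder_feat = self.up(decoder_feat)               # match the skip resolution
        g = self.gate(torch.cat([decoder_feat, encoder_skip], dim=1))
        return g * encoder_skip + (1 - g) * decoder_feat   # convex per-pixel blend

skip = torch.randn(1, 64, 60, 80)                          # encoder feature
deep = torch.randn(1, 64, 30, 40)                          # coarser decoder feature
print(AdaptiveFusion(64)(deep, skip).shape)                # torch.Size([1, 64, 60, 80])
```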

22 pages, 4557 KiB  
Article
LKAFFNet: A Novel Large-Kernel Attention Feature Fusion Network for Land Cover Segmentation
by Bochao Chen, An Tong, Yapeng Wang, Jie Zhang, Xu Yang and Sio-Kei Im
Sensors 2025, 25(1), 54; https://doi.org/10.3390/s25010054 - 25 Dec 2024
Viewed by 205
Abstract
The accurate segmentation of land cover in high-resolution remote sensing imagery is crucial for applications such as urban planning, environmental monitoring, and disaster management. However, traditional convolutional neural networks (CNNs) struggle to balance fine-grained local detail with large-scale contextual information. To tackle these challenges, we combine large-kernel convolutions, attention mechanisms, and multi-scale feature fusion to form a novel LKAFFNet framework that introduces the following three key modules: LkResNet, which enhances feature extraction through parameterizable large-kernel convolutions; Large-Kernel Attention Aggregation (LKAA), integrating spatial and channel attention; and Channel Difference Features Shift Fusion (CDFSF), which enables efficient multi-scale feature fusion. Experimental comparisons demonstrate that LKAFFNet outperforms previous models on both the LandCover dataset and WHU Building dataset, particularly in cases with diverse scales. Specifically, it achieved a mIoU of 0.8155 on the LandCover dataset and 0.9326 on the WHU Building dataset. These findings suggest that LKAFFNet significantly improves land cover segmentation performance, offering a more effective tool for remote sensing applications. Full article
(This article belongs to the Section Environmental Sensing)
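
Large-kernel attention is commonly realised as a depthwise convolution, a depthwise dilated convolution, and a pointwise convolution whose output re-weights the input; the sketch below uses that standard decomposition as a stand-in for the spatial half of the LKAA module. The 5/7/dilation-3 kernel choice is an assumption the paper may not share.

```python
import torch
import torch.nn as nn

class LargeKernelAttention(nn.Module):
    """Common large-kernel attention decomposition: depthwise conv, depthwise
    dilated conv, and a pointwise conv whose output re-weights the input."""
    def __init__(self, channels: int):
        super().__init__()
        self.dw = nn.Conv2d(channels, channels, 5, padding=2, groups=channels)
        self.dw_dilated = nn.Conv2d(channels, channels, 7, padding=9,
                                    dilation=3, groups=channels)
        self.pw = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        attn = self.pw(self.dw_dilated(self.dw(x)))   # large effective receptive field
        return x * attn                               # spatial re-weighting

print(LargeKernelAttention(48)(torch.randn(1, 48, 64, 64)).shape)   # torch.Size([1, 48, 64, 64])
```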

25 pages, 7795 KiB  
Article
Change Detection and Incremental Updates for Multi-Source Road Networks Considering Topological Consistency
by Xiaodong Wang, Dongbao Zhao, Xingze Li, Nan Jia and Li Guo
ISPRS Int. J. Geo-Inf. 2025, 14(1), 2; https://doi.org/10.3390/ijgi14010002 - 24 Dec 2024
Viewed by 273
Abstract
Vector road networks are vital components of intelligent transportation systems and electronic navigation maps. There is a pressing need for efficient and rapid dynamic updates for road network data. In this paper, we propose a series of methods designed specifically for geometric change detection and the topologically consistent updating of multi-source vector road networks without relying on complicated road network matching. For geometric change detection, we employ buffer analysis to compare various sources of vector road networks, differentiating between newly added, deleted, and unchanged road features. Furthermore, we utilize road shape similarity analysis to detect and recognize partial matching relationships between different road network sources. For incremental updates, we define topology consistency and propose three distinct methods for merging road nodes, aiming to preserve the topological integrity of the road network to the greatest extent possible. To address geometric conflicts and topological inconsistencies, we present a fusion and update method specifically tailored for partially matched road features. In order to verify the proposed methods, a road centerline network at a scale of 1:10,000 from an official institution is employed to geometrically update a commercial navigation road network of a similar scale in a remote area. The experimental results indicate that our method achieves an impressive 91.7% automation rate in detecting geometric changes for road features. For the remaining 8.3% of road features, our method provides suggestions on potential geometric changes, albeit necessitating manual verification and assessment. In terms of the incremental updating of the road network, approximately 89.2% of the data can be seamlessly updated automatically using our methods, while a minor 10.8% requires manual intervention for road updates. Collectively, our methods expedite the updating cycle of vector road network data and facilitate the seamless sharing and integrated utilization of multi-source road network data. Full article
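
The geometric change detection step is described as buffer analysis between the two road networks; the sketch below illustrates that idea with shapely. The buffer width, the 90% length-coverage threshold, and the three-way classification rule are illustrative assumptions, not the paper's parameters.

```python
from shapely.geometry import LineString
from shapely.ops import unary_union

def classify_roads(reference, candidate, buffer_m=15.0, keep_ratio=0.9):
    """Buffer-analysis sketch: a road counts as 'unchanged' if most of its length
    lies inside the buffer of the other network, otherwise as added or deleted."""
    ref_zone = unary_union([r.buffer(buffer_m) for r in reference])
    cand_zone = unary_union([c.buffer(buffer_m) for c in candidate])
    unchanged = [c for c in candidate
                 if c.intersection(ref_zone).length >= keep_ratio * c.length]
    added = [c for c in candidate if c not in unchanged]
    deleted = [r for r in reference
               if r.intersection(cand_zone).length < keep_ratio * r.length]
    return unchanged, added, deleted

reference = [LineString([(0, 0), (100, 0)]), LineString([(0, 50), (100, 50)])]
candidate = [LineString([(0, 2), (100, 2)]), LineString([(50, -40), (50, 40)])]
unchanged, added, deleted = classify_roads(reference, candidate)
print(len(unchanged), len(added), len(deleted))   # 1 1 1
```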

21 pages, 1132 KiB  
Article
Lightweight Multi-Scale Feature Fusion Network for Salient Object Detection in Optical Remote Sensing Images
by Jun Li and Kaigen Huang
Electronics 2025, 14(1), 8; https://doi.org/10.3390/electronics14010008 - 24 Dec 2024
Viewed by 256
Abstract
Salient object detection in optical remote sensing images (ORSI-SOD) encounters notable challenges, mainly because of the small scale of salient objects and the similarity between these objects and their backgrounds in images captured by satellite and aerial sensors. Conventional approaches frequently struggle to efficiently leverage multi-scale and multi-stage features. Moreover, these methods usually rely on sophisticated and resource-heavy architectures, which can limit their practicality and efficiency in real-world applications. To overcome these limitations, this paper proposes a novel lightweight network called the Multi-scale Feature Fusion Network (MFFNet). Specifically, a Multi-stage Information Fusion (MIF) module is created to improve the detection of salient objects by effectively integrating features from multiple stages and scales. Additionally, we design a Semantic Guidance Fusion (SGF) module to specifically alleviate the problem of semantic dilution often observed in U-Net architecture. Comprehensive evaluations on two benchmark datasets show that the MFFNet attains outstanding performance in four out of eight evaluation metrics while only having 12.14M parameters and 2.75G FLOPs. These results highlight significant advancements over 31 state-of-the-art models, underscoring the efficiency of MFFNet in salient object-detection tasks. Full article
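
The SGF module is described only as counteracting the semantic dilution seen in U-Net decoders; the sketch below shows one generic way to do that, turning the deepest feature into a gate over a shallow feature. It is a reading of the idea, not MFFNet's module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SemanticGuidanceFusion(nn.Module):
    """Semantic-guidance sketch: the deepest (most semantic) feature is upsampled
    and turned into a gate over a shallow feature, so high-level semantics are
    not diluted on the way up the decoder."""
    def __init__(self, shallow_ch: int, deep_ch: int):
        super().__init__()
        self.to_gate = nn.Sequential(nn.Conv2d(deep_ch, shallow_ch, 1), nn.Sigmoid())

    def forward(self, shallow, deep):
        gate = self.to_gate(F.interpolate(deep, size=shallow.shape[-2:],
                                          mode="bilinear", align_corners=False))
        return shallow + shallow * gate            # residual plus semantically gated path

shallow = torch.randn(1, 32, 88, 88)               # early-stage feature
deep = torch.randn(1, 128, 11, 11)                 # deepest-stage feature
print(SemanticGuidanceFusion(32, 128)(shallow, deep).shape)   # torch.Size([1, 32, 88, 88])
```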

24 pages, 9347 KiB  
Article
RDAU-Net: A U-Shaped Semantic Segmentation Network for Buildings near Rivers and Lakes Based on a Fusion Approach
by Yipeng Wang, Dongmei Wang, Teng Xu, Yifan Shi, Wenguang Liang, Yihong Wang, George P. Petropoulos and Yansong Bao
Remote Sens. 2025, 17(1), 2; https://doi.org/10.3390/rs17010002 - 24 Dec 2024
Viewed by 260
Abstract
The encroachment of buildings into the waters of rivers and lakes can lead to increased safety hazards, but current semantic segmentation algorithms have difficulty accurately segmenting buildings in such environments. The specular reflection of the water and boats with similar features to the buildings in the environment can greatly affect the performance of the algorithm. Effectively eliminating their influence on the model and further improving the segmentation accuracy of buildings near water will be of great help to the management of river and lake waters. To address the above issues, the present study proposes RDAU-Net, a U-shaped building segmentation network that fuses a convolutional neural network and a transformer for feature extraction. First, we designed a residual dynamic short-cut down-sampling (RDSC) module to minimize the interference of complex building shapes and building scale differences on the segmentation results; second, we reduced the semantic and resolution gaps between multi-scale features using a multi-channel cross fusion transformer module (MCCT); finally, a double-feature channel-wise fusion attention (DCF) was designed to improve the model’s ability to depict building edge details and to reduce the influence of similar features on the model. Additionally, an HRI Building dataset was constructed, comprising water-edge buildings situated in a riverine and lacustrine regulatory context. This dataset encompasses a plethora of water-edge building sample scenarios, offering a comprehensive representation of the subject matter. The experimental results indicated that the statistical metrics achieved by RDAU-Net on the HRI and WHU Building datasets are better than those of other methods, and that it can effectively solve the building segmentation problems in the management of river and lake waters. Full article
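
The RDSC block is described only as a residual dynamic short-cut down-sampling module; the sketch below shows a plain residual down-sampling block (a strided conv path plus a pooled 1x1 short-cut) as a simplified stand-in that omits the "dynamic" part.

```python
import torch
import torch.nn as nn

class ResidualDownsample(nn.Module):
    """Simplified residual down-sampling block: a strided conv path plus a pooled
    1x1 short-cut, so detail lost by striding can still flow through the block."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.main = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch))
        self.shortcut = nn.Sequential(nn.AvgPool2d(2), nn.Conv2d(in_ch, out_ch, 1))
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(self.main(x) + self.shortcut(x))

print(ResidualDownsample(64, 128)(torch.randn(1, 64, 128, 128)).shape)   # torch.Size([1, 128, 64, 64])
```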

24 pages, 7396 KiB  
Article
Smoke Detection Transformer: An Improved Real-Time Detection Transformer Smoke Detection Model for Early Fire Warning
by Baoshan Sun and Xin Cheng
Fire 2024, 7(12), 488; https://doi.org/10.3390/fire7120488 - 23 Dec 2024
Viewed by 370
Abstract
As smoke is one of the important features in the early stage of a fire, detecting it can provide a faster early warning and thus suppress the spread of the fire in time. However, the features of smoke are not distinctive, its shape is not fixed, and it is easily confused with the background outdoors, all of which make smoke difficult to detect. Therefore, this study proposes a model called Smoke Detection Transformer (Smoke-DETR) for smoke detection, which is based on a Real-Time Detection Transformer (RT-DETR). Considering the limited computational resources of smoke detection devices, Enhanced Channel-wise Partial Convolution (ECPConv) is introduced to reduce the number of parameters and the amount of computation. This approach improves Partial Convolution (PConv) by using a selection strategy that selects channels containing more information for each convolution, thereby increasing the network’s ability to learn smoke features. To cope with smoke images with inconspicuous features and irregular shapes, the Efficient Multi-Scale Attention (EMA) module is used to strengthen the feature extraction capability of the backbone network. Additionally, in order to overcome the problem of smoke being easily confused with the background, the Multi-Scale Foreground-Focus Fusion Pyramid Network (MFFPN) is designed to strengthen the model’s attention to the foreground of images, which improves the accuracy of detection in situations where smoke is not well differentiated from the background. Experimental results demonstrate that Smoke-DETR achieves significant improvements in smoke detection. On the self-built dataset, compared to RT-DETR, Smoke-DETR reaches a precision of 86.2%, an increase of 3.6 percentage points, and a recall of 80%, an improvement of 3.6 percentage points. In terms of mAP50, it reaches 86.2%, a 3.8 percentage point increase, and mAP50:95 reaches 53.9%, a 3.6 percentage point increase. Full article
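
ECPConv is described as a partial convolution whose convolved channels are chosen by information content; the sketch below convolves only the top-k channels ranked by mean activation magnitude. The ranking rule and the 0.25 ratio are illustrative assumptions, not the paper's selection strategy.

```python
import torch
import torch.nn as nn

class ChannelSelectivePartialConv(nn.Module):
    """Partial-convolution sketch: only a subset of channels is convolved and the
    rest pass through unchanged; the subset is chosen by activation energy here."""
    def __init__(self, channels: int, ratio: float = 0.25):
        super().__init__()
        self.k = max(1, int(channels * ratio))
        self.conv = nn.Conv2d(self.k, self.k, 3, padding=1)

    def forward(self, x):                                  # x: (N, C, H, W)
        energy = x.abs().mean(dim=(0, 2, 3))               # per-channel information proxy
        idx = energy.topk(self.k).indices                  # channels worth convolving
        out = x.clone()
        out[:, idx] = self.conv(x[:, idx])                 # untouched channels keep identity
        return out

print(ChannelSelectivePartialConv(64)(torch.randn(2, 64, 32, 32)).shape)   # torch.Size([2, 64, 32, 32])
```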

19 pages, 3359 KiB  
Article
MS-CLSTM: Myoelectric Manipulator Gesture Recognition Based on Multi-Scale Feature Fusion CNN-LSTM Network
by Ziyi Wang, Wenjing Huang, Zikang Qi and Shuolei Yin
Biomimetics 2024, 9(12), 784; https://doi.org/10.3390/biomimetics9120784 - 23 Dec 2024
Viewed by 581
Abstract
Surface electromyography (sEMG) signals reflect the local electrical activity of muscle fibers and the synergistic action of the overall muscle group, making them useful for gesture control of myoelectric manipulators. In recent years, deep learning methods have increasingly been applied to sEMG gesture recognition due to their powerful automatic feature extraction capabilities. sEMG signals contain rich local details and global patterns, but single-scale convolutional networks are limited in their ability to capture both comprehensively, which restricts model performance. This paper proposes a deep learning model based on multi-scale feature fusion—MS-CLSTM (MS Block-ResCBAM-Bi-LSTM). The MS Block extracts local details, global patterns, and inter-channel correlations in sEMG signals using convolutional kernels of different scales. The ResCBAM, which integrates CBAM and Simple-ResNet, enhances attention to key gesture information while alleviating overfitting issues common in small-sample datasets. Experimental results demonstrate that the MS-CLSTM model achieves recognition accuracies of 86.66% and 83.27% on the Ninapro DB2 and DB4 datasets, respectively, and the accuracy can reach 89% in real-time myoelectric manipulator gesture prediction experiments. The proposed model exhibits superior performance in sEMG gesture recognition tasks, offering an effective solution for applications in prosthetic hand control, robotic control, and other human–computer interaction fields. Full article
(This article belongs to the Special Issue Human-Inspired Grasp Control in Robotics)
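
The MS Block plus Bi-LSTM pipeline can be illustrated with parallel 1D convolutions of different kernel sizes feeding a bidirectional LSTM; the sketch below shows that skeleton only. The 12 sEMG channels, 49 classes, kernel sizes, and hidden width are placeholders roughly matching Ninapro-style data, not the MS-CLSTM configuration, and the ResCBAM stage is omitted.

```python
import torch
import torch.nn as nn

class MultiScaleCNNBiLSTM(nn.Module):
    """Multi-scale CNN + Bi-LSTM skeleton for sEMG windows: parallel 1D convolutions
    with different kernel sizes capture local detail and broader patterns, and a
    bidirectional LSTM models the temporal context before classification."""
    def __init__(self, channels: int = 12, classes: int = 49, kernels=(3, 7, 15)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv1d(channels, 32, k, padding=k // 2) for k in kernels])
        self.lstm = nn.LSTM(32 * len(kernels), 64, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * 64, classes)

    def forward(self, x):                                        # x: (N, channels, time)
        feats = torch.cat([branch(x) for branch in self.branches], dim=1)
        seq, _ = self.lstm(feats.transpose(1, 2))                # (N, time, 128)
        return self.head(seq[:, -1])                             # last step -> class logits

print(MultiScaleCNNBiLSTM()(torch.randn(4, 12, 200)).shape)      # torch.Size([4, 49])
```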

19 pages, 6995 KiB  
Article
A Classification Model for Fine-Grained Silkworm Cocoon Images Based on Bilinear Pooling and Adaptive Feature Fusion
by Mochen Liu, Xin Hou, Mingrui Shang, Eunice Oluwabunmi Owoola, Guizheng Zhang, Wei Wei, Zhanhua Song and Yinfa Yan
Agriculture 2024, 14(12), 2363; https://doi.org/10.3390/agriculture14122363 - 22 Dec 2024
Viewed by 445
Abstract
The quality of silkworm cocoons affects the quality and cost of silk processing. It is necessary to sort silkworm cocoons prior to silk production. Cocoon images consist of fine-grained images with large intra-class differences and small inter-class differences. The subtle intra-class features pose a serious challenge in accurately locating the effective areas and classifying silkworm cocoons. To improve the perception of intra-class features and the classification accuracy, this paper proposes a bilinear pooling classification model (B-Res41-ASE) based on adaptive multi-scale feature fusion and enhancement. B-Res41-ASE consists of three parts: a feature extraction module, a feature fusion module, and a feature enhancement module. Firstly, the backbone network, ResNet41, is constructed based on the bilinear pooling algorithm to extract complete cocoon features. Secondly, the adaptive spatial feature fusion module (ASFF) is introduced to fuse different semantic information to solve the problem of fine-grained information loss in the process of feature extraction. Finally, the squeeze and excitation module (SE) is used to suppress redundant information, enhance the weight of distinguishable regions, and reduce classification bias. Compared with the widely used classification network, the proposed model achieves the highest classification performance in the test set, with accuracy of 97.0% and an F1-score of 97.5%. The accuracy of B-Res41-ASE is 3.1% and 2.6% higher than that of the classification networks AlexNet and GoogLeNet, respectively, while the F1-score is 2.5% and 2.2% higher, respectively. Additionally, the accuracy of B-Res41-ASE is 1.9% and 7.7% higher than that of the Bilinear CNN and HBP, respectively, while the F1-score is 1.6% and 5.7% higher. The experimental results show that the proposed classification model without complex labelling outperforms other cocoon classification algorithms in terms of classification accuracy and robustness, providing a theoretical basis for the intelligent sorting of silkworm cocoons. Full article
(This article belongs to the Section Digital Agriculture)
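
Bilinear pooling has a standard formulation (spatially averaged outer product of channel features, signed square root, L2 normalisation); the short sketch below implements that classic step, which B-Res41-ASE builds on before its ASFF and SE stages. Feature-map sizes in the example are placeholders.

```python
import torch
import torch.nn.functional as F

def bilinear_pool(feat: torch.Tensor) -> torch.Tensor:
    """Classic bilinear pooling: spatially averaged outer product of channel
    features, followed by signed square-root and L2 normalisation. The result
    encodes pairwise channel interactions useful for fine-grained classification."""
    n, c, h, w = feat.shape
    flat = feat.reshape(n, c, h * w)
    gram = torch.bmm(flat, flat.transpose(1, 2)) / (h * w)   # (N, C, C)
    vec = gram.reshape(n, c * c)
    vec = torch.sign(vec) * torch.sqrt(vec.abs() + 1e-10)    # signed sqrt
    return F.normalize(vec, dim=1)                            # L2 normalisation

feat = torch.randn(2, 64, 14, 14)                             # e.g. backbone feature map
print(bilinear_pool(feat).shape)                               # torch.Size([2, 4096])
```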
