Deep Learning and Machine Learning in Image Processing and Pattern Recognition

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: 20 February 2025 | Viewed by 16865

Special Issue Editors


Prof. Dr. Haitao Zhao
Guest Editor
Automation Department, School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
Interests: neural networks; machine learning; information fusion; deep learning

Dr. Meng Wang
Guest Editor
Automation Department, School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
Interests: control theory; fuzzy systems; complex systems; robot control systems

Special Issue Information

Dear Colleagues,

With the rapid advance of science and technology, pattern recognition and image processing have grown in importance within the field of artificial intelligence, and the field has developed especially quickly in recent years thanks to the growing use of machine learning and deep learning. The goal of this Special Issue is to examine the most recent developments in, and promising future directions for, machine learning and deep learning in image processing and pattern recognition.

Prof. Dr. Haitao Zhao
Dr. Meng Wang
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • image processing
  • machine learning
  • pattern recognition
  • neural network
  • artificial intelligence

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (19 papers)


Research

19 pages, 1133 KiB  
Article
M2Tames: Interaction and Semantic Context Enhanced Pedestrian Trajectory Prediction
by Xu Gao, Yanan Wang, Yaqian Zhao, Yilong Li and Gang Wu
Appl. Sci. 2024, 14(18), 8497; https://doi.org/10.3390/app14188497 - 20 Sep 2024
Viewed by 328
Abstract
Autonomous driving pays considerable attention to pedestrian trajectory prediction as a crucial task. Constructing effective pedestrian trajectory prediction models depends heavily on utilizing the motion characteristics of pedestrians, along with their interactions among themselves and between themselves and their environment. However, traditional trajectory prediction models often fall short of capturing complex real-world scenarios. To address these challenges, this paper proposes an enhanced pedestrian trajectory prediction model, M2Tames, which incorporates comprehensive motion, interaction, and semantic context factors. M2Tames provides an interaction module (IM), which consists of an improved multi-head mask temporal attention mechanism (M2Tea) and an Interaction Inference Module (I2). M2Tea thoroughly characterizes the historical trajectories and potential interactions, while I2 determines the precise interaction types. Then, IM adaptively aggregates useful neighbor features to generate a more accurate interactive feature map and feeds it into the final layer of the U-Net encoder to fuse with the encoder’s output. Furthermore, by adopting the U-Net architecture, M2Tames can learn and interpret scene semantic information, enhancing its understanding of the spatial relationships between pedestrians and their surroundings. These innovations improve the accuracy and adaptability of the model for predicting pedestrian trajectories. Finally, M2Tames is evaluated on the ETH/UCY and SDD datasets for short- and long-term settings, respectively. The results demonstrate that M2Tames outperforms the state-of-the-art model MSRL by 2.49% (ADE) and 8.77% (FDE) in the short-term setting and surpasses the optimum Y-Net by 6.89% (ADE) and 1.12% (FDE) in the long-term prediction. Excellent performance is also shown on the ETH/UCY datasets. Full article
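The ADE and FDE percentages quoted above refer to the two standard trajectory-prediction metrics, average and final displacement error. A minimal sketch of how they are computed, on hypothetical toy trajectories rather than the paper's data:

```python
import math

def ade_fde(pred, gt):
    """Average / Final Displacement Error for one trajectory.

    pred, gt: equal-length lists of (x, y) positions per time step.
    ADE averages the Euclidean error over all steps; FDE keeps only
    the error at the final step.
    """
    dists = [math.dist(p, g) for p, g in zip(pred, gt)]
    return sum(dists) / len(dists), dists[-1]

# Toy trajectories: the prediction drifts one unit off on the last two steps.
pred = [(0, 0), (1, 0), (2, 1), (3, 1)]
gt = [(0, 0), (1, 0), (2, 0), (3, 0)]
ade, fde = ade_fde(pred, gt)
```

ADE rewards staying close over the whole horizon, while FDE scores only the endpoint, which is why the two improvements are reported separately.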

19 pages, 7602 KiB  
Article
EGS-YOLO: A Fast and Reliable Safety Helmet Detection Method Modified Based on YOLOv7
by Jianfeng Han, Zhiwei Li, Guoqing Cui and Jingxuan Zhao
Appl. Sci. 2024, 14(17), 7923; https://doi.org/10.3390/app14177923 - 5 Sep 2024
Viewed by 474
Abstract
Wearing safety helmets at construction sites is a major measure to prevent safety accidents, so it is essential to supervise and ensure that workers wear safety helmets. This requires a high degree of real-time performance. We improved the network structure based on YOLOv7. To enhance real-time performance, we introduced GhostModule after comparing various modules to create a new efficient structure that generates more feature mappings with fewer linear operations. SE blocks were introduced after comparing several attention mechanisms to highlight important information in the image. The EIOU loss function was introduced to speed up the convergence of the model. Eventually, we constructed the efficient model EGS-YOLO. EGS-YOLO achieves a mAP of 91.1%, 0.2% higher than YOLOv7, and the inference time is 13.3% faster than YOLOv7 at 3.9 ms (RTX 3090). The parameters and computational complexity are reduced by 37.3% and 33.8%, respectively. The enhanced real-time performance while maintaining the original high precision can meet actual detection requirements. Full article
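The EIoU loss introduced here augments the plain IoU term with explicit center-distance and width/height penalties. A minimal pure-Python sketch of the standard EIoU formulation (not the authors' implementation):

```python
def eiou_loss(box, gt):
    """EIoU loss for axis-aligned boxes given as (x1, y1, x2, y2).

    Extends the IoU loss with penalties on center distance and on the
    width/height gaps, each normalised by the smallest enclosing box.
    """
    x1, y1, x2, y2 = box
    g1, g2, g3, g4 = gt
    # Intersection and union
    iw = max(0.0, min(x2, g3) - max(x1, g1))
    ih = max(0.0, min(y2, g4) - max(y1, g2))
    inter = iw * ih
    union = (x2 - x1) * (y2 - y1) + (g3 - g1) * (g4 - g2) - inter
    iou = inter / union
    # Smallest enclosing box
    cw = max(x2, g3) - min(x1, g1)
    ch = max(y2, g4) - min(y1, g2)
    # Center-distance penalty, normalised by the enclosing diagonal
    dx = (x1 + x2) / 2 - (g1 + g3) / 2
    dy = (y1 + y2) / 2 - (g2 + g4) / 2
    dist_pen = (dx * dx + dy * dy) / (cw * cw + ch * ch)
    # Width / height gap penalties
    w_pen = ((x2 - x1) - (g3 - g1)) ** 2 / (cw * cw)
    h_pen = ((y2 - y1) - (g4 - g2)) ** 2 / (ch * ch)
    return 1.0 - iou + dist_pen + w_pen + h_pen
```

Identical boxes give zero loss; any misalignment in position or shape increases it, which is what speeds up convergence relative to IoU alone.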

17 pages, 6911 KiB  
Article
A Deep-Learning-Based Approach to the Classification of Fire Types
by Eshrag Ali Refaee, Abdullah Sheneamer and Basem Assiri
Appl. Sci. 2024, 14(17), 7862; https://doi.org/10.3390/app14177862 - 4 Sep 2024
Viewed by 545
Abstract
The automatic detection of fires and the determination of their causes play a crucial role in mitigating the catastrophic consequences of such events. The literature reveals substantial research on automatic fire detection using machine learning models. However, once a fire is detected, there is a notable gap in the literature concerning the automatic classification of fire types, such as solid-material fires, flammable-gas fires, and electrical fires. This classification is essential for firefighters to quickly and effectively determine the most appropriate fire suppression method. This work introduces a publicly released benchmark multiclass dataset comprising over 1353 manually annotated images, classified into five categories according to the type of fire origin. This work also presents a system incorporating eight deep-learning models evaluated for fire detection and fire-type classification. In fire-type classification, this work focuses on four fire types: solid-material, chemical, electrical, and oil-based fires. Under the single-level, five-way classification setting, our system achieves its best performance with an accuracy score of 94.48%. Meanwhile, under the two-level classification setting, our system achieves its best performance with accuracy scores of 98.16% for fire detection and 97.55% for fire-type classification, using the DenseNet121 and EfficientNet-b0 models, respectively. The results also indicate that electrical and oil-based fires are the most challenging to detect. Full article
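The two-level setting described here (detect fire first, then classify its type) amounts to a gating pipeline; a minimal sketch, where `fire_detector` and `type_classifier` are hypothetical stand-ins for the trained DenseNet121 and EfficientNet-b0 models:

```python
def two_level_classify(image, fire_detector, type_classifier):
    """Two-level scheme: a binary fire/no-fire detector gates the
    fire-type classifier, so type errors cannot occur on non-fire images."""
    if not fire_detector(image):
        return "no-fire"
    return type_classifier(image)

# Toy stand-ins: any non-empty "image" counts as fire, typed by its first tag.
detector = lambda img: len(img) > 0
classifier = lambda img: img[0]
```

Under this cascade the detection stage (98.16%) and the type stage (97.55%) are evaluated separately, as in the paper's reported numbers.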

17 pages, 3620 KiB  
Article
Image Registration Algorithm for Stamping Process Monitoring Based on Improved Unsupervised Homography Estimation
by Yujie Zhang and Yinuo Du
Appl. Sci. 2024, 14(17), 7721; https://doi.org/10.3390/app14177721 - 2 Sep 2024
Viewed by 472
Abstract
Homography estimation is a crucial task in aligning template images with target images in stamping monitoring systems. To enhance the robustness and accuracy of homography estimation against random vibrations and lighting variations in stamping environments, this paper proposes an improved unsupervised homography estimation model. The model takes as input the channel-stacked template and target images and outputs the estimated homography matrix. First, a specialized deformable convolution module and Group Normalization (GN) layer are introduced to expand the receptive field and enhance the model’s ability to learn rotational invariance when processing large, high-resolution images. Next, a multi-scale, multi-stage unsupervised homography estimation network structure is constructed to improve the accuracy of homography estimation by refining the estimation through multiple stages, thereby enhancing the model’s resistance to scale variations. Finally, stamping monitoring image data is incorporated into the training through data fusion, with data augmentation techniques applied to randomly introduce various levels of perturbation, brightness, contrast, and filtering to improve the model’s robustness to complex changes in the stamping environment, making it more suitable for monitoring applications in this specific industrial context. Compared to traditional methods, this approach provides better homography matrix estimation when handling images with low texture, significant lighting variations, or large viewpoint changes. Compared to other deep-learning-based homography estimation methods, it reduces estimation errors and performs better on stamping monitoring images, while also offering broader applicability. Full article
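The homography the model regresses is a 3x3 matrix; applying it to image coordinates requires the projective division shown below. A generic sketch of the warp, not the paper's network:

```python
def warp_point(H, pt):
    """Apply a 3x3 homography H (row-major nested lists) to a 2-D point."""
    x, y = pt
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (xh / w, yh / w)  # projective division back to 2-D

# A pure translation homography shifts every point by (tx, ty) = (5, -2).
H = [[1, 0, 5],
     [0, 1, -2],
     [0, 0, 1]]
```

Estimating the eight free parameters of `H` between the template and target image is exactly the alignment task the unsupervised network solves.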

19 pages, 34854 KiB  
Article
A Raisin Foreign Object Target Detection Method Based on Improved YOLOv8
by Meng Ning, Hongrui Ma, Yuqian Wang, Liyang Cai and Yiliang Chen
Appl. Sci. 2024, 14(16), 7295; https://doi.org/10.3390/app14167295 - 19 Aug 2024
Viewed by 610
Abstract
During the drying and processing of raisins, the presence of foreign matter such as fruit stems, branches, stones, and plastics is a common issue. To address this, we propose an enhanced real-time detection approach leveraging an improved YOLOv8 model. This novel method integrates the multi-head self-attention mechanism (MHSA) from BoTNet into YOLOv8’s backbone. In the model’s neck layer, selected C2f modules have been strategically replaced with RFAConv modules. The model also adopts an EIoU loss function in place of the original CIoU. Our experiments reveal that the refined YOLOv8 boasts a precision of 94.5%, a recall rate of 89.9%, and an F1-score of 0.921, with a mAP reaching 96.2% at the 0.5 IoU threshold and 81.5% across the 0.5–0.95 IoU range. For this model, comprising 13,177,692 parameters, the average time required for detecting each image on a GPU is 7.8 milliseconds. In contrast to several prevalent models of today, our enhanced model excels in mAP0.5 and demonstrates superiority in F1-score, parameter economy, computational efficiency, and speed. This study conclusively validates the capability of our improved YOLOv8 model to execute real-time foreign object detection on raisin production lines with high efficacy. Full article
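The reported F1-score ties together the precision and recall quoted above; a one-line check of that relationship:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# The reported precision (94.5%) and recall (89.9%) reproduce the
# reported F1-score of 0.921 to three decimal places.
f1 = f1_score(0.945, 0.899)
```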

16 pages, 1160 KiB  
Article
BSTCA-HAR: Human Activity Recognition Model Based on Wearable Mobile Sensors
by Yan Yuan, Lidong Huang, Xuewen Tan, Fanchang Yang and Shiwei Yang
Appl. Sci. 2024, 14(16), 6981; https://doi.org/10.3390/app14166981 - 9 Aug 2024
Viewed by 595
Abstract
Sensor-based human activity recognition has been widely used in various fields; however, recognizing complex daily human activities from sensor data still poses challenges. To address the problems of timeliness and homogeneity of recognition functions in human activity recognition models, we propose a human activity recognition model called BSTCA-HAR, based on a long short-term memory (LSTM) network. The approach proposed in this paper combines an attention mechanism and a temporal convolutional network (TCN). The learning and prediction units in the model can efficiently learn important action data while capturing long time-dependent information as well as features at different time scales. Our series of experiments on three public datasets (WISDM, UCI-HAR, and ISLD) with different data features confirm the feasibility of the proposed method. This method excels in dynamically capturing action features while maintaining a low number of parameters and achieving a remarkable average accuracy of 93%, proving that the model has good recognition performance. Full article

18 pages, 16213 KiB  
Article
A Lightweight CER-YOLOv5s Algorithm for Detection of Construction Vehicles at Power Transmission Lines
by Pingping Yu, Yuting Yan, Xinliang Tang, Yan Shang and He Su
Appl. Sci. 2024, 14(15), 6662; https://doi.org/10.3390/app14156662 - 30 Jul 2024
Cited by 1 | Viewed by 687
Abstract
In power-line scenarios characterized by complex backgrounds and targets of diverse scales and shapes, and to address issues such as large model parameter sizes, insufficient feature extraction, and the tendency to miss small targets in engineering-vehicle detection tasks, a lightweight detection algorithm termed CER-YOLOv5s is proposed. First, the C3 module was restructured by embedding a lightweight Ghost bottleneck structure and a convolutional attention module, enhancing the model's ability to extract key features while reducing computational costs. Second, an E-BiFPN feature pyramid network is proposed, utilizing channel attention mechanisms to effectively suppress background noise and enhance the model's focus on important regions. Bidirectional connections were introduced to optimize the feature fusion paths, improving the efficiency of multi-scale feature fusion. In addition, an ERM (enhanced receptive module) was added in the feature fusion part to expand the receptive field of shallow feature maps through repeated convolutions, enhancing the perception of global information for small targets. Finally, a Soft-DIoU-NMS suppression algorithm is proposed to improve the candidate-box selection mechanism, addressing the issue of suboptimal detection of occluded targets. The experimental results indicated that, compared with the baseline YOLOv5s algorithm, the improved algorithm reduced parameters and computations by 27.8% and 31.9%, respectively. The mean average precision (mAP) increased by 2.9%, reaching 98.3%. This improvement surpasses recent mainstream algorithms and suggests stronger robustness across various scenarios. The algorithm meets the lightweight requirements for embedded devices in power-line scenarios. Full article
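The Soft-DIoU-NMS step replaces hard suppression with score decay, which helps with occluded targets. A minimal Gaussian Soft-NMS sketch using plain IoU for the overlap term (the paper additionally folds the DIoU center-distance term into the decay; this is the generic version):

```python
import math

def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0

def soft_nms(boxes, scores, sigma=0.5, thresh=0.1):
    """Gaussian Soft-NMS: decay overlapping scores instead of discarding.

    Returns surviving (box, score) pairs, highest score first.
    """
    pairs = sorted(zip(boxes, scores), key=lambda p: -p[1])
    kept = []
    while pairs:
        best = pairs.pop(0)
        kept.append(best)
        # Decay remaining scores by their overlap with the kept box.
        pairs = [(b, s * math.exp(-iou(best[0], b) ** 2 / sigma))
                 for b, s in pairs]
        pairs = [(b, s) for b, s in pairs if s >= thresh]
        pairs.sort(key=lambda p: -p[1])
    return kept
```

An occluded box that overlaps a stronger detection keeps a reduced score rather than being deleted outright, so it can still survive the final threshold.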

18 pages, 10474 KiB  
Article
Adaptive Frame Sampling and Feature Alignment for Multi-Frame Infrared Small Target Detection
by Chuanhong Yao and Haitao Zhao
Appl. Sci. 2024, 14(14), 6360; https://doi.org/10.3390/app14146360 - 22 Jul 2024
Viewed by 641
Abstract
In recent years, infrared images have attracted widespread attention, due to their extensive application in low-visibility search and rescue, forest fire monitoring, ground target monitoring, and other fields. Infrared small target detection technology plays a vital role in these applications. Although there has been significant research over the years, accurately detecting infrared small targets in complex backgrounds remains a significant challenge. Multi-frame detection methods can significantly improve detection performance in these cases. However, current multi-frame methods face difficulties in balancing the number of input frames and detection speed, and cannot effectively handle the background motion caused by movement of the infrared camera. To address these issues, we propose an adaptive frame sampling method and a detection network aligned at the feature level. Our adaptive frame sampling method uses mutual information to measure motion changes between adjacent frames, construct a motion distribution, and sample frames with uniform motion based on the averaged motion distribution. Our detection network handles background motion by predicting a homography flow matrix that aligns features at the feature level. Extensive evaluation of all components showed that the proposed method can more effectively perform multi-frame infrared small target detection. Full article
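The sampling step above scores motion between adjacent frames with mutual information. A histogram-based estimator over flattened pixel sequences, a generic sketch assuming discrete intensity values rather than the paper's exact procedure:

```python
import math
from collections import Counter

def mutual_information(a, b):
    """Mutual information (in nats) between two equal-length pixel
    sequences, estimated from their joint histogram."""
    n = len(a)
    joint = Counter(zip(a, b))
    pa = Counter(a)
    pb = Counter(b)
    mi = 0.0
    for (x, y), c in joint.items():
        pxy = c / n
        mi += pxy * math.log(pxy / ((pa[x] / n) * (pb[y] / n)))
    return mi

# Identical frames share maximal information; low MI between adjacent
# frames signals large motion, which guides where to sample.
frame = [0, 0, 1, 1, 2, 2]
```

Frames with high mutual information are nearly static, so the sampler can skip them and spend its frame budget where the background or target actually moves.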

24 pages, 10433 KiB  
Article
Glass Defect Detection with Improved Data Augmentation under Total Reflection Lighting
by Pengfei Ding and Liangen Yang
Appl. Sci. 2024, 14(13), 5658; https://doi.org/10.3390/app14135658 - 28 Jun 2024
Viewed by 694
Abstract
To address the technical challenge of identifying tiny defects, especially dust and point defects, on mobile phone flat glass, an automatic optical inspection system is established. The system investigates algorithms including imaging principles, target detection models, data augmentation, foreground segmentation, and image fusion. The system builds an automatic optical inspection platform to collect glass defect samples. It illuminates the glass samples with a combined total reflection–grazing light source, collects the defect sample data, segments the background and defects of the collected data, generates the defect mask, and extracts the complete defects of the cell phone flat glass. The system then seamlessly integrates the extracted defects with a flawless background using Poisson editing and outputs the location information of the defects and the labels, automatically generating the dataset. The deep-learning network YOLOv5 works as the core algorithm framework, into which the Convolutional Block Attention Module and a small-target detection layer are added to enhance the capability of the model to detect small defects. According to the experimental results, the combined lighting effectively improves the precision of detecting dust and bright spots. Additionally, with the adoption of novel data augmentation techniques, the enhanced YOLOv5 model is capable of effectively addressing the challenges posed by insufficient sample data and non-uniform distributions, thus mitigating network generalization issues. Furthermore, this data augmentation approach facilitates the rapid adaptation of the same detection tasks to diverse environmental scenarios, enabling the expedited and efficient deployment of the model across various industrial settings. The mean average precision (mAP) of the optimal model on the validation set reached 98.36%, 2.62% higher than that of the original YOLOv5. In addition, its false acceptance rate (FAR) was 1.27%, its false rejection rate (FRR) was 2.47%, its detection speed was 64 fps, and its correct detection rate on the validation set was 98.75%, which by and large meets current industrial detection requirements. In this way, this paper achieved the automated inspection of mobile phone flat glass with high robustness, high precision, and low false acceptance and false rejection rates, significantly reducing material losses in factories and the likelihood of errors in follow-on products. This method can be applied to the multi-scale, multi-type detection of glass defects. Full article

15 pages, 5037 KiB  
Article
Aerial Image Segmentation of Nematode-Affected Pine Trees with U-Net Convolutional Neural Network
by Jiankang Shen, Qinghua Xu, Mingyang Gao, Jicai Ning, Xiaopeng Jiang and Meng Gao
Appl. Sci. 2024, 14(12), 5087; https://doi.org/10.3390/app14125087 - 11 Jun 2024
Viewed by 722
Abstract
Pine wood nematode disease, commonly referred to as pine wilt, poses a grave threat to forest health, leading to profound ecological and economic impacts. Originating from the pine wood nematode, this disease not only causes the demise of pine trees but also casts a long shadow over the entire forest ecosystem. The accurate identification of infected trees stands as a pivotal initial step in developing effective prevention and control measures for pine wilt. Nevertheless, existing identification methods face challenges in precisely determining the disease status of individual pine trees, impeding early detection and efficient intervention. In this study, we leverage the capabilities of unmanned aerial vehicle (UAV) remote sensing technology and integrate the VGG classical small convolutional kernel network with U-Net to detect diseased pine trees. This cutting-edge approach captures the spatial and characteristic intricacies of infected trees, converting them into high-dimensional features through multiple convolutions within the VGG network. This method significantly reduces the parameter count while enhancing the sensing range. The results obtained from our validation set are remarkably promising, achieving a Mean Intersection over Union (MIoU) of 81.62%, a Mean Pixel Accuracy (MPA) of 85.13%, an Accuracy of 99.13%, and an F1 Score of 88.50%. These figures surpass those obtained using other methods such as ResNet50 and DeepLab v3+. The methodology presented in this research facilitates rapid and accurate monitoring of pine trees infected with nematodes, offering invaluable technical assistance in the prevention and management of pine wilt disease. Full article
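The MIoU figure reported above averages per-class IoU computed from a pixel-level confusion matrix. A minimal sketch on a hypothetical two-class (healthy vs. infected) case:

```python
def mean_iou(conf):
    """Mean Intersection over Union from a square confusion matrix,
    where conf[i][j] counts pixels of true class i predicted as class j."""
    n = len(conf)
    ious = []
    for c in range(n):
        tp = conf[c][c]
        fp = sum(conf[r][c] for r in range(n)) - tp
        fn = sum(conf[c]) - tp
        denom = tp + fp + fn
        if denom:  # skip classes absent from both prediction and truth
            ious.append(tp / denom)
    return sum(ious) / len(ious)

# Hypothetical two-class pixel counts: healthy vs. infected.
conf = [[50, 10],
        [5, 35]]
```

Unlike plain pixel accuracy (99.13% here), MIoU is not inflated by the dominant background class, which is why segmentation papers report both.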

22 pages, 5984 KiB  
Article
Markov-Embedded Affinity Learning with Connectivity Constraints for Subspace Clustering
by Wenjiang Shao and Xiaowei Zhang
Appl. Sci. 2024, 14(11), 4617; https://doi.org/10.3390/app14114617 - 27 May 2024
Viewed by 551
Abstract
Subspace clustering algorithms have demonstrated remarkable success across diverse fields, including object segmentation, gene clustering, and recommendation systems. However, they often face challenges, such as omitting cluster information and the neglect of higher-order neighbor relationships within the data. To address these issues, a novel subspace clustering method named Markov-Embedded Affinity Learning with Connectivity Constraints for Subspace Clustering is proposed. This method seamlessly embeds Markov transition probability information into the self-expression, leveraging a fine-grained neighbor matrix to uncover latent data structures. This matrix preserves crucial high-order local information and complementary details, ensuring a comprehensive understanding of the data. To effectively handle complex nonlinear relationships, the method learns the underlying manifold structure from a cross-order local neighbor graph. Additionally, connectivity constraints are applied to the affinity matrix, enhancing the group structure and further improving the clustering performance. Extensive experiments demonstrate the superiority of this novel method over baseline approaches, validating its effectiveness and practical utility. Full article
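The Markov transition probabilities embedded in the self-expression come from row-normalising an affinity matrix; repeated Markov steps then expose the higher-order neighbour relations the abstract mentions. A generic sketch with a hypothetical affinity matrix `A`:

```python
def transition_matrix(affinity):
    """Row-normalise a non-negative affinity matrix into Markov
    transition probabilities P[i][j] = A[i][j] / sum_j A[i][j]."""
    return [[v / sum(row) for v in row] for row in affinity]

def step(p, P):
    """One Markov step: propagate a distribution p through P."""
    n = len(P)
    return [sum(p[i] * P[i][j] for i in range(n)) for j in range(n)]

# Cross-order neighbour information comes from repeated steps (P, P^2, ...).
A = [[0, 2, 2],
     [1, 0, 1],
     [3, 1, 0]]
P = transition_matrix(A)
```

Starting a walk at one sample and taking several steps spreads probability mass over multi-hop neighbours, which is the "fine-grained neighbor matrix" idea in miniature.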

14 pages, 3618 KiB  
Article
DBCW-YOLO: A Modified YOLOv5 for the Detection of Steel Surface Defects
by Jianfeng Han, Guoqing Cui, Zhiwei Li and Jingxuan Zhao
Appl. Sci. 2024, 14(11), 4594; https://doi.org/10.3390/app14114594 - 27 May 2024
Viewed by 819
Abstract
In steel production, defect detection is crucial for preventing safety risks, and improving the accuracy of steel defect detection in industrial environments remains challenging due to the variable types of defects, cluttered backgrounds, low contrast, and noise interference. Therefore, this paper introduces a steel surface defect detection model, DBCW-YOLO, based on YOLOv5. Firstly, a new feature fusion strategy is proposed that uses the BiFPN method to fuse feature-map information at multiple scales, and CARAFE up-sampling is introduced to expand the receptive field of the network and make more effective use of surrounding information. Secondly, the WIoU loss function, with its dynamic non-monotonic focusing mechanism, is introduced to address the accuracy degradation caused by sample inhomogeneity. This approach improves the learning ability for small-target steel defects and accelerates network convergence. Finally, we use dynamic heads in the network prediction phase, which improves the scale-aware, spatial-aware, and task-aware performance of the algorithm. Experimental results on the NEU-DET dataset show that the average detection accuracy is 81.1%, about 6% higher than that of the original YOLOv5 model, while satisfying real-time detection requirements. Therefore, DBCW-YOLO has good overall performance in the steel surface defect detection task. Full article

16 pages, 3930 KiB  
Article
Prediction of Kiwifruit Sweetness with Vis/NIR Spectroscopy Based on Scatter Correction and Feature Selection Techniques
by Chang Wan, Rong Yue, Zhenfa Li, Kai Fan, Xiaokai Chen and Fenling Li
Appl. Sci. 2024, 14(10), 4145; https://doi.org/10.3390/app14104145 - 14 May 2024
Viewed by 836
Abstract
Sweetness is an important parameter of the quality of Cuixiang kiwifruit. A quick and accurate assessment of sweetness is necessary for farmers to carry out timely orchard management and for consumers to make purchasing choices. The objective of this study was to propose an effective physical method for determining the sweetness of fresh kiwifruit based on fruit hyperspectral reflectance in the 400–2500 nm range. In this study, the visible and near-infrared (Vis/NIR) spectral reflectance and sweetness values of kiwifruit were measured at different time periods after the fruit matured in 2021 and 2022. Multiplicative scatter correction (MSC) and the standard normal variate (SNV) transformation were used for spectral denoising. The successive projections algorithm (SPA) and competitive adaptive reweighted sampling (CARS) methods were employed to select the features most sensitive to sweetness, and these features were then used as the inputs of partial least squares (PLS), least squares support vector machine (LSSVM), back-propagation neural network (BP), and multiple linear regression (MLR) models to explore the best way of predicting sweetness. The study indicated that the most sensitive features lay in the blue and red regions and around 970, 1200, and 1400 nm. The sweetness estimation model constructed using data from the whole harvest period, from August to October, performed better than the models constructed for each harvest period. Overall, the results indicated that hyperspectral reflectance incorporated with MSC-SPA-LSSVM could explain up to 79% of the variability in kiwifruit sweetness, and could be applied as a fast, accurate, and non-destructive alternative for determining the sweetness of kiwifruit. This research could partially provide a theoretical basis for the development of non-destructive instrumentation for the detection of kiwifruit sweetness. Full article
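The MSC preprocessing step regresses each spectrum against the mean spectrum and removes the fitted multiplicative and additive scatter. A minimal pure-Python sketch on toy spectra (not the study's data or code):

```python
def msc(spectra):
    """Multiplicative scatter correction: regress each spectrum on the
    mean spectrum (x ~ a * ref + b) and return (x - b) / a."""
    n = len(spectra[0])
    ref = [sum(s[i] for s in spectra) / len(spectra) for i in range(n)]
    ref_mean = sum(ref) / n
    corrected = []
    for s in spectra:
        s_mean = sum(s) / n
        cov = sum((r - ref_mean) * (x - s_mean) for r, x in zip(ref, s))
        var = sum((r - ref_mean) ** 2 for r in ref)
        a = cov / var          # multiplicative scatter (slope)
        b = s_mean - a * ref_mean  # additive scatter (offset)
        corrected.append([(x - b) / a for x in s])
    return corrected

# Toy check: scaled/shifted copies of one spectrum collapse together.
base = [1.0, 2.0, 4.0, 3.0]
spectra = [[2 * x + 1 for x in base], [0.5 * x - 1 for x in base]]
corrected = msc(spectra)
```

After correction, spectra that differ only by scatter effects become nearly identical, so the downstream regression models see chemistry rather than optics.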

12 pages, 4471 KiB  
Article
Dual Enhancement Network for Infrared Small Target Detection
by Xinyi Wu, Xudong Hu, Huaizheng Lu, Chaopeng Li, Lei Zhang and Weifang Huang
Appl. Sci. 2024, 14(10), 4132; https://doi.org/10.3390/app14104132 - 13 May 2024
Viewed by 887
Abstract
Infrared small target detection (IRSTD) is crucial for applications in security surveillance, unmanned aerial vehicle identification, military reconnaissance, and other fields. However, small targets in infrared images often suffer from resolution limitations and background complexity, which pose a great challenge to IRSTD, especially given noise interference and the presence of tiny, low-luminance targets. In this paper, we propose a novel dual enhancement network (DENet) to suppress background noise and enhance dim small targets. Specifically, to address the problem of complex backgrounds in infrared images, we designed the residual sparse enhancement (RSE) module, which sparsely propagates a number of representative pixels between adjacent feature pyramid layers instead of using a simple summation. To handle infrared targets that are extremely dim and small, we developed a spatial attention enhancement (SAE) module to adaptively enhance and highlight the features of dim small targets. In addition, we evaluated the effectiveness of the modules in the DENet model through ablation experiments. Extensive experiments on three public infrared datasets demonstrated that our approach can greatly enhance dim small targets, with average values of intersection over union (IoU), probability of detection (Pd), and false alarm rate (Fa) reaching 77.33%, 97.30%, and 9.299%, respectively, a performance superior to state-of-the-art IRSTD methods.
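As context for the metrics reported in this abstract, simplified pixel-level versions of IoU and false alarm rate for binary masks can be written as follows. This is an illustrative sketch only; the paper's exact (e.g., target-level) definitions of Pd and Fa may differ:

```python
import numpy as np

def pixel_iou(pred, gt):
    """Pixel-level intersection over union for two binary masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 1.0

def false_alarm_rate(pred, gt):
    """Fraction of image pixels predicted as target but absent from ground truth."""
    return np.logical_and(pred, ~gt).sum() / pred.size
```

IoU rewards precise segmentation of the (very few) target pixels, which is why it is a demanding metric for small, dim targets.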

19 pages, 8343 KiB  
Article
Precision-Boosted Forest Fire Target Detection via Enhanced YOLOv8 Model
by Zhaoxu Yang, Yifan Shao, Ye Wei and Jun Li
Appl. Sci. 2024, 14(6), 2413; https://doi.org/10.3390/app14062413 - 13 Mar 2024
Cited by 2 | Viewed by 1758
Abstract
Forest fires present a significant challenge to ecosystems, particularly because factors like tree cover complicate fire detection. While fire detection technologies such as YOLO are widely used in forest protection, capturing diverse and complex flame features remains challenging. We therefore propose an enhanced YOLOv8 multiscale forest fire detection method. This involves adjusting the network structure and integrating Deformable Convolution and SCConv modules to better adapt to the complexities of forest fires. Additionally, we introduce the Coordinate Attention mechanism into the Detection module to capture feature information more effectively and enhance model accuracy. We adopt the WIoU v3 loss function and implement a dynamically non-monotonic mechanism to optimize gradient allocation strategies. Our experimental results demonstrate that our model achieves a mAP of 90.02%, approximately 5.9% higher than the baseline YOLOv8 network. This method significantly improves forest fire detection accuracy, reduces false positive rates, and demonstrates excellent applicability in real forest fire scenarios.

23 pages, 4888 KiB  
Article
SFFNet: Staged Feature Fusion Network of Connecting Convolutional Neural Networks and Graph Convolutional Neural Networks for Hyperspectral Image Classification
by Hao Li, Xiaorui Xiong, Chaoxian Liu, Yong Ma, Shan Zeng and Yaqin Li
Appl. Sci. 2024, 14(6), 2327; https://doi.org/10.3390/app14062327 - 10 Mar 2024
Cited by 3 | Viewed by 1679
Abstract
The immense representation power of deep learning frameworks has kept them in the spotlight in hyperspectral image (HSI) classification. Graph Convolutional Neural Networks (GCNs) can be used to compensate for the lack of spatial information in Convolutional Neural Networks (CNNs). However, most GCNs construct graph data structures based on pixel points, which requires building neighborhood matrices over all the data. Meanwhile, the way GCNs typically construct similarity relations from spatial structure is not fully suited to HSIs. To make the network more compatible with HSIs, we propose a staged feature fusion model called SFFNet, a neural network framework connecting CNN and GCN models. The CNN performs the first stage of feature extraction, assisted by neighboring features to overcome the limitations of local convolution; the GCN then performs the second stage for classification, with the graph structure constructed from spectral similarity, optimizing the original connectivity relationships. In addition, the framework enables batch training of the GCN by using the extracted spectral features as nodes, which greatly reduces the hardware requirements. Experimental results on three publicly available benchmark hyperspectral datasets show that our proposed framework outperforms other relevant deep learning models, with an overall classification accuracy of over 97%.
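The abstract's key idea, building graph structure from spectral similarity rather than spatial position, can be illustrated with a toy cosine-similarity k-nearest-neighbor adjacency. This is a generic sketch under our own assumptions, not the authors' SFFNet implementation:

```python
import numpy as np

def spectral_knn_adjacency(features, k=2):
    """Build a symmetric 0/1 adjacency matrix from cosine similarity of
    node feature vectors, keeping the top-k most similar neighbors per node."""
    norm = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = norm @ norm.T
    np.fill_diagonal(sim, -np.inf)       # exclude self-loops from the top-k step
    adj = np.zeros_like(sim)
    for i, row in enumerate(sim):
        for j in np.argsort(row)[-k:]:   # indices of the k most similar nodes
            adj[i, j] = adj[j, i] = 1.0
    return adj
```

Connecting nodes by spectral similarity lets pixels of the same material class share information even when they are spatially far apart, which is the property the abstract argues spatial graphs miss for HSIs.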

18 pages, 7048 KiB  
Article
U-ETMVSNet: Uncertainty-Epipolar Transformer Multi-View Stereo Network for Object Stereo Reconstruction
by Ning Zhao, Heng Wang, Quanlong Cui and Lan Wu
Appl. Sci. 2024, 14(6), 2223; https://doi.org/10.3390/app14062223 - 7 Mar 2024
Viewed by 1048
Abstract
The Multi-View Stereo (MVS) model, which utilizes 2D images from multiple perspectives for 3D reconstruction, is a crucial technique in the field of 3D vision. To address the poor correlation between 2D features and 3D space in existing MVS models, as well as the high sampling rate required by static sampling, we propose U-ETMVSNet in this paper. Initially, we employ an integrated epipolar transformer (ET) module to establish 3D spatial correlations along epipolar lines, thereby enhancing the reliability of aggregated cost volumes. Subsequently, we devise a sampling module based on probability volume uncertainty to dynamically adjust the depth sampling range for the next stage. Finally, we utilize a multi-stage joint learning method based on multi-depth-value classification to evaluate and optimize the model. Experimental results demonstrate that on the DTU dataset, our method achieves relative performance improvements of 27.01% and 11.27% in completeness error and overall error, respectively, compared to CasMVSNet, even at lower depth sampling rates. Moreover, our method achieves an excellent score of 58.60 on the Tanks & Temples dataset, highlighting its robustness and generalization capability.

20 pages, 7918 KiB  
Article
Lightweight Non-Destructive Detection of Diseased Apples Based on Structural Re-Parameterization Technique
by Bo Han, Ziao Lu, Luan Dong and Jingjing Zhang
Appl. Sci. 2024, 14(5), 1907; https://doi.org/10.3390/app14051907 - 26 Feb 2024
Cited by 5 | Viewed by 1134
Abstract
This study addresses the challenges in the non-destructive detection of diseased apples, specifically the high complexity and poor real-time performance of the classification model for detecting diseased fruits in apple grading. Research is conducted on a lightweight model for apple defect recognition, and an improved VEW-YOLOv8n method is proposed. The backbone network incorporates a lightweight, re-parameterized VanillaC2f module, reducing both complexity and the number of parameters, and it employs an extended activation function to enhance the model's nonlinear expression capability. In the neck network, an Efficient-Neck lightweight structure, built from lightweight modules and augmented with a channel shuffling strategy, decreases the computational load while ensuring comprehensive feature fusion. The model's robustness and generalization ability are further enhanced by employing the WIoU bounding box loss function, evaluating anchor frame quality with outlier metrics, and incorporating a dynamically updated gradient gain assignment strategy. Experimental results indicate that the improved model surpasses the YOLOv8n model, achieving a 2.7% increase in average accuracy, a 24.3% reduction in parameters, a 28.0% decrease in computational volume, and an 8.5% improvement in inference speed. This technology offers a novel, effective method for the non-destructive detection of diseased fruits in apple grading procedures.
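The channel shuffling strategy mentioned in this abstract is, in its standard ShuffleNet-style form, a reshape-transpose-reshape over the channel dimension; a minimal NumPy sketch of that generic operation (not the authors' code) for an `(N, C, H, W)` tensor:

```python
import numpy as np

def channel_shuffle(x, groups):
    """ShuffleNet-style channel shuffle: interleave channels across groups
    so information can mix between grouped convolutions (x: N, C, H, W)."""
    n, c, h, w = x.shape
    assert c % groups == 0, "channel count must be divisible by group count"
    return (x.reshape(n, groups, c // groups, h, w)
             .transpose(0, 2, 1, 3, 4)   # swap the group and per-group axes
             .reshape(n, c, h, w))
```

Because the operation is a pure permutation of channels, it adds feature mixing between groups at essentially zero parameter and FLOP cost, which is why it suits lightweight necks.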

20 pages, 4385 KiB  
Article
A Multi-Task Learning and Knowledge Selection Strategy for Environment-Induced Color-Distorted Image Restoration
by Yuan Ding and Kaijun Wu
Appl. Sci. 2024, 14(5), 1836; https://doi.org/10.3390/app14051836 - 23 Feb 2024
Cited by 2 | Viewed by 955
Abstract
Existing methods for restoring color-distorted images in specific environments typically focus on a single type of distortion, making it challenging to generalize them across various types of color-distorted images. If the intrinsic connections between different types of color-distorted images could be leveraged, and their interactions coordinated during model training, this would simultaneously enhance generalization, address potential overfitting and underfitting during data fitting, and consequently yield a performance boost. In this paper, our approach primarily addresses three distinct types of color-distorted images: dust-laden images, hazy images, and underwater images. By thoroughly exploiting the unique characteristics and interrelationships of these types, we achieve the objective of multitask processing. Within this endeavor, identifying appropriate correlations is pivotal. To this end, we propose a knowledge selection and allocation strategy that optimally distributes the features and correlations acquired by the network to the different tasks, enabling a more refined task differentiation. Moreover, given the difficulty of pairing datasets, we employ unsupervised learning techniques and introduce novel Transformer blocks, feedforward networks, and hybrid modules to enhance context relevance. Through extensive experimentation, we demonstrate that our proposed method significantly enhances the performance of color-distorted image restoration.
