Search Results (1,618)

Search Parameters:
Keywords = VGG19

19 pages, 1315 KiB  
Article
Towards Failure-Aware Inference in Harsh Operating Conditions: Robust Mobile Offloading of Pre-Trained Neural Networks
by Wenjing Liu, Zhongmin Chen and Yunzhan Gong
Electronics 2025, 14(2), 381; https://doi.org/10.3390/electronics14020381 - 19 Jan 2025
Abstract
Pre-trained neural networks like GPT-4 and Llama2 have revolutionized intelligent information processing, but their deployment in industrial applications faces challenges, particularly in harsh environments. To address these challenges, model offloading, which distributes the computational load of pre-trained models across edge devices, has emerged as a promising solution. While this approach enables the use of more powerful models, it faces significant challenges in harsh environments, where reliability, connectivity, and resilience are critical. This paper introduces failure-resilient inference in mobile networks (FRIM), a framework that ensures robust offloading and inference without the need for model retraining or reconstruction. FRIM leverages graph theory to optimize partition redundancy and incorporates an adaptive failure detection mechanism for mobile inference with efficient fault tolerance. Experimental results on DNN models (AlexNet, ResNet, VGG-16) show that FRIM improves inference performance and resilience, enabling more reliable mobile applications in harsh operating environments.
(This article belongs to the Special Issue New Advances in Distributed Computing and Its Applications)
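As a rough illustration of the offloading idea this abstract describes, the sketch below splits a pre-trained VGG-16 into sequential partitions, as a framework like FRIM might before placing them on separate edge devices. The cut points and the sequential loop are assumptions for illustration, not FRIM's actual mechanism, which additionally handles partition redundancy and failure detection.

```python
import torch
from torchvision import models

# Illustrative only: slice a pre-trained VGG-16 into sequential partitions,
# as a model-offloading framework might before placing them on edge devices.
vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT).eval()

def split_features(model, cut_points):
    """Slice model.features into consecutive partitions at the given layer indices."""
    layers = list(model.features)
    bounds = [0, *cut_points, len(layers)]
    return [torch.nn.Sequential(*layers[a:b]) for a, b in zip(bounds, bounds[1:])]

partitions = split_features(vgg, cut_points=[10, 20])  # hypothetical cut points

x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    for part in partitions:   # in practice, each hop would cross the network
        x = part(x)
    x = vgg.avgpool(x).flatten(1)
    logits = vgg.classifier(x)
print(logits.shape)  # torch.Size([1, 1000])
```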
15 pages, 3242 KiB  
Article
Deep Transfer Learning for Classification of Late Gadolinium Enhancement Cardiac MRI Images into Myocardial Infarction, Myocarditis, and Healthy Classes: Comparison with Subjective Visual Evaluation
by Amani Ben Khalifa, Manel Mili, Mezri Maatouk, Asma Ben Abdallah, Mabrouk Abdellali, Sofiene Gaied, Azza Ben Ali, Yassir Lahouel, Mohamed Hedi Bedoui and Ahmed Zrig
Diagnostics 2025, 15(2), 207; https://doi.org/10.3390/diagnostics15020207 - 17 Jan 2025
Abstract
Background/Objectives: To develop a computer-aided diagnosis (CAD) method for the classification of late gadolinium enhancement (LGE) cardiac MRI images into myocardial infarction (MI), myocarditis, and healthy classes using a fine-tuned VGG16 model hybridized with a multi-layer perceptron (MLP) (VGG16-MLP), and to assess our model’s performance in comparison to various pre-trained base models and MRI readers. Methods: This study included 361 LGE images for MI, 222 for myocarditis, and 254 for the healthy class. The left ventricle was extracted automatically using a U-net segmentation model on LGE images. A fine-tuned VGG16 was used for feature extraction. A spatial attention mechanism was implemented as a part of the neural network architecture. The MLP architecture was used for the classification. The evaluation metrics were calculated using a separate test set. To compare the VGG16 model’s performance in feature extraction, various pre-trained base models were evaluated: VGG19, DenseNet121, DenseNet201, MobileNet, InceptionV3, and InceptionResNetV2. The Support Vector Machine (SVM) classifier was evaluated and compared to MLP for the classification task. The performance of the VGG16-MLP model was compared with a subjective visual analysis conducted by two blinded independent readers. Results: The VGG16-MLP model allowed high-performance differentiation between MI, myocarditis, and healthy LGE cardiac MRI images. It outperformed the other tested models with 96% accuracy, 97% precision, 96% sensitivity, and 96% F1-score. Our model surpassed the accuracy of Reader 1 by 27% and Reader 2 by 17%. Conclusions: Our study demonstrated that the VGG16-MLP model permits accurate classification of MI, myocarditis, and healthy LGE cardiac MRI images and could be considered a reliable computer-aided diagnosis approach, specifically for radiologists with limited experience in cardiovascular imaging.
(This article belongs to the Special Issue Diagnostic AI and Cardiac Diseases)
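A minimal sketch of the VGG16-plus-MLP pattern this abstract describes, assuming a torchvision VGG16 backbone with a small MLP head for the three classes; the layer sizes are assumptions, and the paper's spatial attention mechanism is omitted for brevity.

```python
import torch
import torch.nn as nn
from torchvision import models

class VGG16MLP(nn.Module):
    """Illustrative VGG16 feature extractor with an MLP classification head
    (MI vs. myocarditis vs. healthy); layer sizes are assumed, not the paper's."""
    def __init__(self, num_classes=3):
        super().__init__()
        backbone = models.vgg16(weights=models.VGG16_Weights.DEFAULT)
        self.features = backbone.features       # convolutional feature extractor
        self.pool = nn.AdaptiveAvgPool2d((7, 7))
        self.mlp = nn.Sequential(               # MLP head replacing VGG's classifier
            nn.Flatten(),
            nn.Linear(512 * 7 * 7, 256), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(256, num_classes),
        )

    def forward(self, x):
        return self.mlp(self.pool(self.features(x)))

model = VGG16MLP()
print(model(torch.randn(2, 3, 224, 224)).shape)  # torch.Size([2, 3])
```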
21 pages, 853 KiB  
Article
Decoding Pollution: A Federated Learning-Based Pollution Prediction Study with Health Ramifications Using Causal Inferences
by Snehlata Beriwal and John Ayeelyan
Electronics 2025, 14(2), 350; https://doi.org/10.3390/electronics14020350 - 17 Jan 2025
Abstract
Unprecedented levels of air pollution in our cities due to rapid urbanization have caused major health concerns, severely affecting the population, especially children and the elderly. A steady loss of ecological balance, without remedial measures like phytoremediation, coupled with alarming vehicular and industrial pollution, has pushed the Air Quality Index (AQI) and particulate matter (PM) to dangerous levels, especially in the metropolitan cities of India. Monitoring and accurate prediction of inhalable Particulate Matter 2.5 (PM2.5) and Particulate Matter 10 (PM10) levels, which increase the risks of asthma, respiratory inflammation, bronchitis, high blood pressure, compromised lung function, and lung cancer, have become more critical than ever. To that end, the authors of this work have proposed a federated learning (FL) framework for monitoring and predicting PM2.5 and PM10 across multiple locations, with a resultant impact analysis with respect to key health parameters. The proposed FL approach encompasses four stages: client selection for processing and model updates, aggregation for global model updates, a pollution prediction model with necessary explanations, and finally, the health impact analysis corresponding to the PM levels. This framework employs a VGG-19 deep learning model and leverages Causal Inference for interpretability, enabling accurate impact analysis across a host of health conditions. This research employed datasets specific to India, Nepal, and China for the purposes of model prediction, explanation, and impact analysis. The approach was found to achieve an overall accuracy of 92.33%, with the causal inference-based impact analysis producing an accuracy of 84% for training and 72% for testing with respect to PM2.5, and an accuracy of 79% for training and 74% for testing with respect to PM10. Compared to previous studies undertaken in this field, the proposed approach has demonstrated better accuracy and is the first of its kind to analyze health impacts corresponding to PM2.5 and PM10 levels.
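The aggregation stage of such an FL pipeline can be illustrated with a generic FedAvg step; this is a textbook sketch under the assumption of data-size-weighted averaging, not the paper's actual aggregation code.

```python
import copy
import torch

def fed_avg(client_states, client_sizes):
    """Generic FedAvg sketch: average client model weights (e.g., of the
    VGG-19 predictor), weighted by each client's local data size. The
    surrounding pipeline (client selection, causal inference) is omitted."""
    total = sum(client_sizes)
    global_state = copy.deepcopy(client_states[0])
    for key in global_state:
        global_state[key] = sum(
            state[key].float() * (n / total)
            for state, n in zip(client_states, client_sizes)
        )
    return global_state

# usage: model.load_state_dict(fed_avg([m.state_dict() for m in clients], sizes))
```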
24 pages, 9651 KiB  
Article
Fault Detection in Induction Machines Using Learning Models and Fourier Spectrum Image Analysis
by Kevin Barrera-Llanga, Jordi Burriel-Valencia, Angel Sapena-Bano and Javier Martinez-Roman
Sensors 2025, 25(2), 471; https://doi.org/10.3390/s25020471 - 15 Jan 2025
Abstract
Induction motors are essential components in industry due to their efficiency and cost-effectiveness. This study presents an innovative methodology for automatic fault detection by analyzing images generated from the Fourier spectra of current signals using deep learning techniques. A new preprocessing technique incorporating a distinctive background to enhance spectral feature learning is proposed, enabling the detection of four types of faults: healthy motor coupled to a generator with a broken bar (HGB), broken rotor bar (BRB), race bearing fault (RBF), and bearing ball fault (BBF). The dataset was generated from three-phase signals of an induction motor controlled by a Direct Torque Controller under various operating conditions (20–1500 rpm with 0–100% load), resulting in 4251 images. The model, based on a Visual Geometry Group (VGG) architecture with 19 layers, achieved an overall accuracy of 98%, with specific accuracies of 99% for RAF, 100% for BRB, 100% for RBF, and 95% for BBF. Model interpretability was assessed using explainability techniques, which allowed for the identification of specific learning patterns. This analysis introduces a new approach by demonstrating how different convolutional blocks capture particular features: the first convolutional block captures signal shape, while the second identifies background features. Additionally, distinct convolutional layers were associated with each fault type: layer 9 for RAF, layer 13 for BRB, layer 16 for RBF, and layer 14 for BBF. This methodology offers a scalable solution for predictive maintenance in induction motors, effectively combining signal processing, computer vision, and explainability techniques.
(This article belongs to the Special Issue Feature Papers in Fault Diagnosis & Sensors 2024)
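As a toy illustration of the preprocessing step this abstract describes, the snippet below turns a simulated stator-current signal into a Fourier-spectrum image of the kind a VGG-19 could classify; the signal, sampling rate, frequency window, and rendering are all assumptions, not the paper's setup.

```python
import numpy as np
import matplotlib.pyplot as plt

fs = 10_000                                  # sampling rate (Hz), assumed
t = np.arange(0, 1.0, 1 / fs)
# Toy current signal: 50 Hz fundamental plus a small fault-like sideband.
current = np.sin(2 * np.pi * 50 * t) + 0.05 * np.sin(2 * np.pi * 37 * t)

spectrum = np.abs(np.fft.rfft(current)) / len(current)
freqs = np.fft.rfftfreq(len(current), 1 / fs)

fig, ax = plt.subplots(figsize=(2.24, 2.24), dpi=100)   # ~224x224 px canvas
ax.plot(freqs[freqs < 200], spectrum[freqs < 200])
ax.axis("off")                               # save as an image, not a labeled plot
fig.savefig("spectrum_sample.png", bbox_inches="tight", pad_inches=0)
plt.close(fig)
```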
12 pages, 3235 KiB  
Article
Predicting Semen Analysis Parameters from Testicular Ultrasonography Images Using Deep Learning Algorithms: An Innovative Approach to Male Infertility Diagnosis
by Lutfullah Sagir, Esat Kaba, Merve Huner Yigit, Filiz Tasci and Hakki Uzun
J. Clin. Med. 2025, 14(2), 516; https://doi.org/10.3390/jcm14020516 - 15 Jan 2025
Abstract
Objectives: Semen analysis is universally regarded as the gold standard for diagnosing male infertility, while ultrasonography plays a vital role as a complementary diagnostic tool. This study aims to assess the effectiveness of artificial intelligence (AI)-driven deep learning algorithms in predicting semen analysis parameters based on testicular ultrasonography images. Materials and Methods: This study included male patients aged 18–54 who sought evaluation for infertility at the Urology Outpatient Clinic of our hospital between February 2022 and April 2023. All patients underwent comprehensive assessments, including blood hormone profiling, semen analysis, and scrotal ultrasonography, with each procedure being performed by the same operator. Longitudinal-axis images of both testes were obtained and subsequently segmented. Based on the semen analysis results, the patients were categorized into groups according to sperm concentration, progressive motility, and morphology. Following the initial classification, each semen parameter was further subdivided into “low” and “normal” categories. The testicular images from both the right and left sides of all patients were organized into corresponding folders based on their associated laboratory parameters. Three distinct datasets were created from the segmented images, which were then augmented. The datasets were randomly partitioned into an 80% training set and a 20% test set. Finally, the images were classified using the VGG-16 deep learning architecture. Results: The area under the curve (AUC) values for the classification of sperm concentration (oligospermia versus normal), progressive motility (asthenozoospermia versus normal), and morphology (teratozoospermia versus normal) were 0.76, 0.89, and 0.86, respectively. Conclusions: In our study, we successfully predicted semen analysis parameters using data derived from testicular ultrasonography images through deep learning algorithms, representing an innovative application of artificial intelligence. Given the limited published research in this area, our study makes a significant contribution to the field and provides a foundation for future validation studies.
(This article belongs to the Special Issue Clinical Advances in Artificial Intelligence in Urology)
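The final evaluation step, an AUC per binary parameter on the 20% test split, can be sketched with scikit-learn; the label and score arrays below are synthetic placeholders standing in for the held-out set and the VGG-16's predicted probabilities, since the paper's outputs are not available.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Placeholder data: 0 = "low", 1 = "normal" for one semen parameter.
rng = np.random.default_rng(42)
y_true = rng.integers(0, 2, size=100)
y_score = np.clip(0.3 * y_true + rng.uniform(0, 0.7, size=100), 0, 1)

print(f"AUC: {roc_auc_score(y_true, y_score):.2f}")
```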
25 pages, 9176 KiB  
Article
Dynamic Ensemble Learning with Gradient-Weighted Class Activation Mapping for Enhanced Gastrointestinal Disease Classification
by Chih Mao Tsai and Jiann-Der Lee
Electronics 2025, 14(2), 305; https://doi.org/10.3390/electronics14020305 - 14 Jan 2025
Abstract
Gastrointestinal (GI) disease classification through endoscopic images is a critical yet challenging task, due to inter-class variability and subtle feature overlaps. This study introduces a novel ensemble learning framework that combines case-specific dynamic weighting with Gradient-weighted Class Activation Mapping (Grad-CAM) to enhance both accuracy and interpretability. Three state-of-the-art convolutional neural networks (DenseNet201, InceptionV3, and VGG19) were fine-tuned on the Kvasir v2 dataset and integrated using optimized weights (0.4, 0.4, and 0.2, respectively). The ensemble achieved an accuracy of 0.91, outperforming individual models, particularly in complex classes such as Esophagitis and Normal-Z-Line. Grad-CAM visualizations confirmed the ensemble’s focus on clinically relevant features, highlighting its potential to improve diagnostic interpretability. While the dynamic ensemble approach significantly enhanced classification performance, further refinement is needed to address subtle and ambiguous cases. These results underscore the promise of dynamic ensemble learning as an explainable and clinically applicable tool in medical imaging.
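A minimal sketch of the fixed-weight part of such an ensemble, combining the three networks' softmax outputs with the 0.4/0.4/0.2 weights the abstract reports; the case-specific dynamic weighting and the Grad-CAM step are omitted, and the eight-class Kvasir v2 shapes are used for the dummy usage example.

```python
import torch
import torch.nn.functional as F

def weighted_ensemble(logits_dense, logits_incep, logits_vgg, w=(0.4, 0.4, 0.2)):
    """Combine DenseNet201, InceptionV3, and VGG19 softmax outputs with the
    abstract's 0.4/0.4/0.2 weights and return the argmax class per sample."""
    probs = (
        w[0] * F.softmax(logits_dense, dim=1)
        + w[1] * F.softmax(logits_incep, dim=1)
        + w[2] * F.softmax(logits_vgg, dim=1)
    )
    return probs.argmax(dim=1), probs

# usage with dummy logits for 4 images and 8 Kvasir v2 classes:
preds, _ = weighted_ensemble(torch.randn(4, 8), torch.randn(4, 8), torch.randn(4, 8))
```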
21 pages, 8009 KiB  
Article
Explainable AI for DeepFake Detection
by Nazneen Mansoor and Alexander I. Iliev
Appl. Sci. 2025, 15(2), 725; https://doi.org/10.3390/app15020725 - 13 Jan 2025
Abstract
The surge in technological advancements has resulted in concerns over their misuse in politics and entertainment, making reliable detection methods essential. This study introduces a deepfake detection technique that enhances interpretability using the network dissection algorithm. This research consists of two stages: (1) detection of forged images using advanced convolutional neural networks such as ResNet-50, Inception V3, and VGG-16, and (2) applying the network dissection algorithm to understand the models’ internal decision-making processes. The CNNs’ performance is evaluated through F1-scores ranging from 0.8 to 0.9, demonstrating their effectiveness. By analyzing the facial features learned by the models, this study provides explainable results for classifying images as real or fake. This interpretability is crucial in understanding how deepfake detection models operate. Although numerous detection models exist, they often lack transparency in their decision-making processes. This research fills that gap by offering insights into how these models distinguish real from manipulated images. The findings highlight the importance of interpretability in deep neural networks, providing a better understanding of their hierarchical structures and decision processes.
(This article belongs to the Special Issue New Advances in Computer Security and Cybersecurity)
21 pages, 2363 KiB  
Article
Smart Defect Detection in Aero-Engines: Evaluating Transfer Learning with VGG19 and Data-Efficient Image Transformer Models
by Samira Mohammadi, Vahid Rahmanian, Sasan Sattarpanah Karganroudi and Mehdi Adda
Machines 2025, 13(1), 49; https://doi.org/10.3390/machines13010049 - 13 Jan 2025
Abstract
This study explores the impact of transfer learning on enhancing deep learning models for detecting defects in aero-engine components. We focused on metrics such as accuracy, precision, recall, and loss to compare the performance of the VGG19 and DeiT (data-efficient image transformer) models. RandomSearchCV was used for hyperparameter optimization, and we selectively froze some layers during training to better tailor the models to our dataset. We conclude that the difference in performance across all metrics can be attributed to the DeiT model’s transformer-based architecture, which excels at capturing complex patterns in data. This research demonstrates that transformer models hold promise for improving the accuracy and efficiency of defect detection within the aerospace industry, which will, in turn, contribute to cleaner and more sustainable aviation activities.
(This article belongs to the Section Machines Testing and Maintenance)
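The selective layer freezing mentioned in the abstract can be sketched as follows for the VGG19 branch; which layers are frozen (here, the first 20 feature layers) and the binary defect head are assumptions, not the paper's configuration.

```python
import torch.nn as nn
from torchvision import models

# Illustrative selective freezing: keep early VGG19 convolutional layers fixed
# and fine-tune the remainder plus a new classification head.
model = models.vgg19(weights=models.VGG19_Weights.DEFAULT)

for layer in list(model.features)[:20]:      # freeze early feature layers (assumed)
    for p in layer.parameters():
        p.requires_grad = False

model.classifier[6] = nn.Linear(4096, 2)     # defect vs. no-defect head (assumed)
trainable = [p for p in model.parameters() if p.requires_grad]
```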
20 pages, 7167 KiB  
Article
Accelerating Deep Learning-Based Morphological Biometric Recognition with Field-Programmable Gate Arrays
by Nourhan Zayed, Nahed Tawfik, Mervat M. A. Mahmoud, Ahmed Fawzy, Young-Im Cho and Mohamed S. Abdallah
AI 2025, 6(1), 8; https://doi.org/10.3390/ai6010008 - 9 Jan 2025
Abstract
Convolutional neural networks (CNNs) are increasingly recognized as an important and potent artificial intelligence approach, widely employed in many computer vision applications, such as facial recognition. Their importance resides in their capacity to acquire hierarchical features, which is essential for recognizing complex patterns. Nevertheless, the intricate architectural design of CNNs leads to significant computing requirements. To tackle these issues, it is essential to construct a system based on field-programmable gate arrays (FPGAs) to speed up CNNs. FPGAs provide fast development capabilities, energy efficiency, decreased latency, and advanced reconfigurability. A facial recognition solution that leverages deep learning and is subsequently deployed on an FPGA platform is proposed. The system detects whether a person has the necessary authorization to enter/access a place. The FPGA is responsible for processing this system with utmost security and without any internet connectivity. Several facial recognition networks were implemented, including AlexNet, ResNet, and VGG-16. The findings of the proposed method show that the GoogLeNet network is the best fit due to its lower computational resource requirements, speed, and accuracy. The system was deployed on three hardware kits to appraise the performance of different programming approaches in terms of accuracy, latency, cost, and power consumption. The software programming on the Raspberry Pi-3B kit had a recognition accuracy of around 70–75% and relied on a stable internet connection for processing. This dependency on internet connectivity increases bandwidth consumption and fails to meet the required security criteria, contrary to the ZYBO-Z7 board's hardware programming. Nevertheless, the hardware/software co-design on the PYNQ-Z2 board achieved an accuracy rate of 85% to 87%. It operates independently of an internet connection, making it a standalone system and saving costs.
(This article belongs to the Special Issue Artificial Intelligence-Based Image Processing and Computer Vision)
26 pages, 5178 KiB  
Article
Estimating Age and Sex from Dental Panoramic Radiographs Using Neural Networks and Vision–Language Models
by Salem Shamsul Alam, Nabila Rashid, Tasfia Azrin Faiza, Saif Ahmed, Rifat Ahmed Hassan, James Dudley and Taseef Hasan Farook
Oral 2025, 5(1), 3; https://doi.org/10.3390/oral5010003 - 8 Jan 2025
Abstract
Purpose: The purpose of this study was to compare multiple deep learning models for estimating age and sex using dental panoramic radiographs and to identify the most successful deep learning models for the specified tasks. Methods: The dataset of 437 panoramic radiographs was divided into training, validation, and testing sets. Random oversampling was used to balance the class distributions in the training data and address the class imbalance in sex and age. The models studied were neural network models (CNN, VGG16, VGG19, ResNet50, ResNet101, ResNet152, MobileNet, DenseNet121, DenseNet169) and vision–language models (Vision Transformer and Moondream2). Binary classification models were built for sex classification, while regression models were developed for age estimation. Sex classification was evaluated using precision, recall, F1 score, accuracy, area under the curve (AUC), and a confusion matrix. For age regression, performance was evaluated using mean squared error (MSE), mean absolute error (MAE), root mean squared error (RMSE), R2, and mean absolute percentage error (MAPE). Results: In sex classification, neural networks achieved accuracies of 85% and an AUC of 0.85, while Moondream2 had much lower accuracy (49%) and AUC (0.48). DenseNet169 performed better than the other models for age regression, with an R2 of 0.57 and an MAE of 7.07. Among sex classes, the CNN model achieved the highest precision, recall, and F1 score for both males and females. Vision Transformers that specialise in identifying objects from images demonstrated weaker performance on dental panoramic radiographs, with an inference time of 4.5 s per image. Conclusions: The CNN and DenseNet169 were the most effective models for sex classification and age regression, respectively, performing better than the other models at estimating sex and age from dental panoramic radiographs.
(This article belongs to the Special Issue Artificial Intelligence in Oral Medicine: Advancements and Challenges)
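A sketch of how one backbone can serve both tasks, with a binary head for sex and a linear head for age; sharing a single DenseNet169 backbone across both heads is this sketch's simplification (the study trained separate models per task), and the dummy targets are placeholders.

```python
import torch
import torch.nn as nn
from torchvision import models

# Illustrative dual-head setup: DenseNet169 features feed a sex-classification
# logit and an age-regression output. Shapes and targets are assumptions.
backbone = models.densenet169(weights=models.DenseNet169_Weights.DEFAULT)
backbone.classifier = nn.Identity()          # expose the 1664-d feature vector

sex_head = nn.Linear(1664, 1)                # binary sex classification (logit)
age_head = nn.Linear(1664, 1)                # age regression (years)

x = torch.randn(2, 3, 224, 224)              # resized panoramic radiographs
feats = backbone(x)
sex_logit, age_pred = sex_head(feats), age_head(feats)
loss = (nn.BCEWithLogitsLoss()(sex_logit, torch.ones(2, 1))
        + nn.MSELoss()(age_pred, torch.full((2, 1), 30.0)))  # dummy targets
```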
25 pages, 4771 KiB  
Article
Leveraging Deep Learning for Real-Time Coffee Leaf Disease Identification
by Opeyemi Adelaja and Bernardi Pranggono
AgriEngineering 2025, 7(1), 13; https://doi.org/10.3390/agriengineering7010013 - 8 Jan 2025
Abstract
Agriculture is vital for providing food and economic benefits, but crop diseases pose significant challenges, including to coffee cultivation. Traditional methods for disease identification are labor-intensive and lack real-time capabilities. This study aims to address existing methods’ limitations and provide a more efficient, reliable, and cost-effective solution for coffee leaf disease identification. It presents a novel approach to the real-time identification of coffee leaf diseases using deep learning. We implemented several transfer learning (TL) models, including ResNet101, Xception, CoffNet, and VGG16, to evaluate the feasibility and reliability of our solution. The experimental results show that the models achieved high accuracy rates of 97.30%, 97.60%, 97.88%, and 99.89%, respectively. CoffNet, our proposed model, showed a notable processing speed of 125.93 frames per second (fps), making it suitable for real-time applications. Using a diverse dataset of mixed images from multiple devices, our approach reduces the workload of farmers and simplifies the disease detection process. The findings lay the groundwork for the development of practical and efficient systems that can assist coffee growers in disease management, promoting sustainable farming practices and food security.
(This article belongs to the Special Issue The Future of Artificial Intelligence in Agriculture)
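A frames-per-second figure like the 125.93 fps quoted above is typically measured with a timed inference loop; the sketch below makes assumptions about warm-up count, batch size, and input size, and is not the paper's benchmarking code.

```python
import time
import torch

@torch.no_grad()
def measure_fps(model, n_frames=200, size=(1, 3, 224, 224)):
    """Illustrative throughput measurement: time repeated single-image
    forward passes after a short warm-up."""
    model.eval()
    x = torch.randn(size)
    for _ in range(10):            # warm-up iterations, excluded from timing
        model(x)
    start = time.perf_counter()
    for _ in range(n_frames):
        model(x)
    return n_frames / (time.perf_counter() - start)

# usage: print(f"{measure_fps(torchvision.models.vgg16(weights=None)):.1f} fps")
```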
17 pages, 19075 KiB  
Article
A Channel Attention-Driven Optimized CNN for Efficient Early Detection of Plant Diseases in Resource Constrained Environment
by Sana Parez, Naqqash Dilshad and Jong Weon Lee
Agriculture 2025, 15(2), 127; https://doi.org/10.3390/agriculture15020127 - 8 Jan 2025
Abstract
Agriculture is a cornerstone of economic prosperity, but plant diseases can severely impact crop yield and quality. Identifying these diseases accurately is often difficult due to limited expert availability and ambiguous information. Early detection and automated diagnosis systems are crucial to mitigate these challenges. To address this, we propose a lightweight convolutional neural network (CNN) designed for resource-constrained devices, termed LeafNet. LeafNet draws inspiration from the block-wise VGG19 architecture but incorporates several optimizations, including a reduced number of parameters, a smaller input size, and a faster inference time, while maintaining competitive accuracy. The proposed LeafNet leverages small, uniform convolutional filters to capture fine-grained details of plant disease features, with an increasing number of channels to enhance feature extraction. Additionally, it integrates channel attention mechanisms to prioritize disease-related features effectively. We evaluated the proposed method on four datasets: the benchmark PlantVillage (PV) dataset, the Data Repository of Leaf Images (DRLIs), the newly curated Plant Composite (PC) dataset, and the BARI Sunflower (BARI-Sun) dataset, which includes diverse and challenging real-world images. The results show that the proposed LeafNet performs comparably to state-of-the-art methods in terms of accuracy, false positive rate (FPR), model size, and runtime, highlighting its potential for real-world applications.
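One plausible form of the channel attention LeafNet integrates is a squeeze-and-excitation-style gate; the block below is a generic sketch rather than LeafNet's exact module, and the reduction ratio is an assumption.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Generic squeeze-and-excitation-style channel attention: globally pool
    each channel, learn per-channel weights, and reweight the feature map so
    disease-relevant channels are emphasized."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):
        w = self.gate(x).view(x.size(0), -1, 1, 1)   # per-channel weights in (0, 1)
        return x * w

x = torch.randn(2, 64, 56, 56)
print(ChannelAttention(64)(x).shape)  # torch.Size([2, 64, 56, 56])
```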
15 pages, 7120 KiB  
Article
Identifying Tomato Growth Stages in Protected Agriculture with StyleGAN3–Synthetic Images and Vision Transformer
by Yao Huo, Yongbo Liu, Peng He, Liang Hu, Wenbo Gao and Le Gu
Agriculture 2025, 15(2), 120; https://doi.org/10.3390/agriculture15020120 - 7 Jan 2025
Abstract
In protected agriculture, accurately identifying the key growth stages of tomatoes plays a significant role in achieving efficient management and high-precision production. However, traditional approaches often face challenges like non-standardized data collection, unbalanced datasets, low recognition efficiency, and limited accuracy. This paper proposes an innovative solution combining generative adversarial networks (GANs) and deep learning techniques to address these challenges. Specifically, the StyleGAN3 model is employed to generate high-quality images of tomato growth stages, effectively augmenting the original dataset with a broader range of images. This augmented dataset is then processed using a Vision Transformer (ViT) model for intelligent recognition of tomato growth stages within a protected agricultural environment. The proposed method was tested on 2723 images, demonstrating that the generated images are nearly indistinguishable from real images. The combined training approach incorporating both generated and original images produced superior recognition results compared to training with only the original images. The validation set achieved an accuracy of 99.6%, while the test set achieved 98.39%, marking improvements of 22.85%, 3.57%, and 3.21% over AlexNet, DenseNet50, and VGG16, respectively. The average detection speed was 9.5 ms. This method provides a highly effective means of identifying tomato growth stages in protected environments and offers valuable insights for improving the efficiency and quality of protected crop production.
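The combined-training idea, real tomato images plus StyleGAN3-generated ones merged into a single training set for the ViT classifier, can be sketched with a concatenated dataset; the folder paths and transform below are hypothetical.

```python
from torch.utils.data import ConcatDataset, DataLoader
from torchvision import datasets, transforms

# Illustrative merge of real and GAN-generated growth-stage images into one
# training set; directory layout and preprocessing are assumptions.
tf = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])

real = datasets.ImageFolder("data/tomato/real", transform=tf)        # hypothetical path
synthetic = datasets.ImageFolder("data/tomato/stylegan3", transform=tf)  # hypothetical path

train_set = ConcatDataset([real, synthetic])
loader = DataLoader(train_set, batch_size=32, shuffle=True)
```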
14 pages, 3521 KiB  
Article
Attention Score-Based Multi-Vision Transformer Technique for Plant Disease Classification
by Eu-Tteum Baek
Sensors 2025, 25(1), 270; https://doi.org/10.3390/s25010270 - 6 Jan 2025
Abstract
This study proposes an advanced plant disease classification framework leveraging the Attention Score-Based Multi-Vision Transformer (Multi-ViT) model. The framework introduces a novel attention mechanism to dynamically prioritize relevant features from multiple leaf images, overcoming the limitations of single-leaf-based diagnoses. Building on the Vision Transformer (ViT) architecture, the Multi-ViT model aggregates diverse feature representations by combining outputs from multiple ViTs, each capturing unique visual patterns. This approach allows for a holistic analysis of spatially distributed symptoms, crucial for accurately diagnosing diseases in trees. Extensive experiments conducted on apple, grape, and tomato leaf disease datasets demonstrate the model’s superior performance, achieving over 99% accuracy and significantly improving F1 scores compared to traditional methods such as ResNet, VGG, and MobileNet. These findings underscore the effectiveness of the proposed model for precise and reliable plant disease classification.
(This article belongs to the Special Issue Artificial Intelligence and Key Technologies of Smart Agriculture)
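The attention-score aggregation can be illustrated as a softmax-weighted sum of per-leaf logits, so informative leaves dominate the plant-level prediction; the scoring function is assumed to be produced elsewhere in the model, and the shapes are illustrative.

```python
import torch
import torch.nn.functional as F

def aggregate_leaf_logits(per_leaf_logits, attention_scores):
    """Sketch of attention-score weighting in the spirit of Multi-ViT:
    normalize per-leaf relevance scores and combine the per-leaf logits
    into one plant-level prediction."""
    w = F.softmax(attention_scores, dim=0)            # (num_leaves,)
    return (w.unsqueeze(1) * per_leaf_logits).sum(0)  # (num_classes,)

# usage: 5 leaf images of one plant, 4 disease classes, dummy scores
logits = aggregate_leaf_logits(torch.randn(5, 4), torch.randn(5))
```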
19 pages, 4622 KiB  
Article
Lightweight Deep Learning Model, ConvNeXt-U: An Improved U-Net Network for Extracting Cropland in Complex Landscapes from Gaofen-2 Images
by Shukuan Liu, Shi Cao, Xia Lu, Jiqing Peng, Lina Ping, Xiang Fan, Feiyu Teng and Xiangnan Liu
Sensors 2025, 25(1), 261; https://doi.org/10.3390/s25010261 - 5 Jan 2025
Abstract
Extracting fragmented cropland is essential for effective cropland management and sustainable agricultural development. However, extracting fragmented cropland presents significant challenges due to its irregular and blurred boundaries, as well as the diversity in crop types and distribution. Deep learning methods are widely used for land cover classification. This paper proposes ConvNeXt-U, a lightweight deep learning network that efficiently extracts fragmented cropland while reducing computational requirements and saving costs. ConvNeXt-U retains the U-shaped structure of U-Net but replaces the encoder with a simplified ConvNeXt architecture. The decoder remains unchanged from U-Net, and the lightweight CBAM (Convolutional Block Attention Module) is integrated. This module adaptively adjusts the channel and spatial dimensions of feature maps, emphasizing key features and suppressing redundant information, which enhances the capture of edge features and improves extraction accuracy. The case study area is Hengyang County, Hunan Province, China, analyzed using GF-2 remote sensing imagery. The results show that ConvNeXt-U outperforms existing methods such as Swin Transformer (Acc = 85.1%, IoU = 79.1%), MobileNetV3 (Acc = 83.4%, IoU = 77.6%), VGG16 (Acc = 80.5%, IoU = 74.6%), and ResUnet (Acc = 81.8%, IoU = 76.1%), achieving an IoU of 79.5% and an Acc of 85.2%. Under the same conditions, ConvNeXt-U has a faster inference speed of 37 images/s, compared to 28 images/s for Swin Transformer, 35 images/s for MobileNetV3, and 0.43 and 0.44 images/s for VGG16 and ResUnet, respectively. Moreover, ConvNeXt-U outperforms the other methods in processing the boundaries of fragmented cropland, producing clearer and more complete boundaries. The results indicate that the ConvNeXt and CBAM modules significantly enhance the accuracy of fragmented cropland extraction, and that ConvNeXt-U is an effective method for extracting fragmented cropland from remote sensing imagery.
(This article belongs to the Section Remote Sensors)
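CBAM pairs a channel gate (similar in spirit to the channel-attention sketch earlier in this list) with a spatial gate; below is a sketch of the spatial half, the part most relevant to the edge capture the abstract emphasizes. The 7x7 kernel over concatenated average- and max-pooled maps follows the standard CBAM design, but this is a generic block, not the paper's code.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Sketch of CBAM's spatial attention: pool across channels, then let a
    7x7 convolution produce a per-pixel gate that can sharpen boundary features."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)             # channel-wise average pool
        mx, _ = x.max(dim=1, keepdim=True)            # channel-wise max pool
        gate = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * gate

print(SpatialAttention()(torch.randn(2, 64, 128, 128)).shape)
```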