
Search Results (606)

Search Parameters:
Keywords = multi-model inference

19 pages, 22292 KiB  
Article
An Efficient and Accurate Quality Inspection Model for Steel Scraps Based on Dense Small-Target Detection
by Pengcheng Xiao, Chao Wang, Liguang Zhu, Wenguang Xu, Yuxin Jin and Rong Zhu
Processes 2024, 12(8), 1700; https://doi.org/10.3390/pr12081700 - 14 Aug 2024
Viewed by 224
Abstract
Scrap steel serves as the primary alternative raw material to iron ore, exerting a significant impact on production costs for steel enterprises. With the annual growth in scrap resources, concerns regarding traditional manual inspection methods, including issues of fairness and safety, gain increasing prominence. Enhancing scrap inspection processes through digital technology is imperative. In response to these concerns, we developed CNIL-Net, a scrap-quality inspection network model based on object detection, and trained and validated it using images obtained during the scrap inspection process. Initially, we deployed a multi-camera integrated system at a steel plant for acquiring scrap images of diverse types, which were subsequently annotated and employed for constructing an enhanced scrap dataset. Then, we enhanced the YOLOv5 model to improve the detection of small-target scraps in inspection scenarios. This was achieved by adding a small-object detection layer (P2) and streamlining the model through the removal of detection layer P5, resulting in the development of a novel three-layer detection network structure termed the Improved Layer (IL) model. A Coordinate Attention mechanism was incorporated into the network to dynamically learn feature weights from various positions, thereby improving the discernment of scrap features. Substituting the traditional non-maximum suppression algorithm (NMS) with Soft-NMS enhanced detection accuracy in dense and overlapping scrap scenarios, thereby mitigating instances of missed detections. Finally, the model underwent training and validation utilizing the augmented dataset of scraps. Throughout this phase, assessments encompassed metrics like mAP, number of network layers, parameters, and inference duration. Experimental findings illustrate that the developed CNIL-Net scrap-quality inspection network model boosted the average precision across all categories from 88.8% to 96.5%. Compared to manual inspection, it demonstrates notable advantages in accuracy and detection speed, rendering it well suited for real-world deployment and addressing issues in scrap inspection like real-time processing and fairness. Full article
(This article belongs to the Special Issue Advanced Ladle Metallurgy and Secondary Refining)
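The Soft-NMS substitution described in this abstract can be sketched in a few lines. The Gaussian decay with sigma = 0.5, the (x1, y1, x2, y2) box format, and the score threshold below are illustrative assumptions, not the paper's actual settings.

```python
import math

def iou(a, b):
    # boxes given as (x1, y1, x2, y2)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS: decay overlapping scores instead of discarding boxes."""
    dets = list(zip(boxes, scores))
    kept = []
    while dets:
        i = max(range(len(dets)), key=lambda k: dets[k][1])  # best remaining box
        box, score = dets.pop(i)
        if score < score_thresh:
            break  # every remaining score has decayed below the threshold
        kept.append((box, score))
        # decay neighbours in proportion to their overlap with the kept box
        dets = [(b, s * math.exp(-iou(box, b) ** 2 / sigma)) for b, s in dets]
    return kept

# two heavily overlapping detections: hard NMS would simply drop the second one
kept = soft_nms([(0, 0, 10, 10), (1, 1, 11, 11)], [0.9, 0.8])
```

Unlike hard NMS, the second box survives with a decayed score, which is the behaviour that reduces missed detections in dense, overlapping scrap piles.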

17 pages, 906 KiB  
Article
VCC-DiffNet: Visual Conditional Control Diffusion Network for Remote Sensing Image Captioning
by Qimin Cheng, Yuqi Xu and Ziyang Huang
Remote Sens. 2024, 16(16), 2961; https://doi.org/10.3390/rs16162961 - 12 Aug 2024
Viewed by 306
Abstract
Pioneering remote sensing image captioning (RSIC) works use autoregressive decoding for fluent and coherent sentences but suffer from high latency and high computation costs. In contrast, non-autoregressive approaches improve inference speed by predicting multiple tokens simultaneously, though at the cost of performance due to a lack of sequential dependencies. Recently, diffusion model-based non-autoregressive decoding has shown promise in natural image captioning with iterative refinement, but its effectiveness is limited by the intrinsic characteristics of remote sensing images, which complicate robust input construction and affect the description accuracy. To overcome these challenges, we propose an innovative diffusion model for RSIC, named the Visual Conditional Control Diffusion Network (VCC-DiffNet). Specifically, we propose a Refined Multi-scale Feature Extraction (RMFE) module to extract the discernible visual context features of RSIs as input of the diffusion model-based non-autoregressive decoder to conditionally control a multi-step denoising process. Furthermore, we propose an Interactive Enhanced Decoder (IE-Decoder) utilizing dual image–description interactions to generate descriptions finely aligned with the image content. Experiments conducted on four representative RSIC datasets demonstrate that our non-autoregressive VCC-DiffNet performs comparably to, or even better than, popular autoregressive baselines in classic metrics, achieving around an 8.22× speedup in Sydney-Captions, an 11.61× speedup in UCM-Captions, a 15.20× speedup in RSICD, and an 8.13× speedup in NWPU-Captions. Full article

30 pages, 2658 KiB  
Article
SecuriDN: A Modeling Tool Supporting the Early Detection of Cyberattacks to Smart Energy Systems
by Davide Cerotti, Daniele Codetta Raiteri, Giovanna Dondossola, Lavinia Egidi, Giuliana Franceschinis, Luigi Portinale, Davide Savarro and Roberta Terruggia
Energies 2024, 17(16), 3882; https://doi.org/10.3390/en17163882 - 6 Aug 2024
Viewed by 493
Abstract
SecuriDN v. 0.1 is a tool for the representation of the assets composing the IT and the OT subsystems of Distributed Energy Resources (DERs) control networks and the possible cyberattacks that can threaten them. It is part of a platform that allows the evaluation of the security risks of DER control systems. SecuriDN is a multi-formalism tool, meaning that it manages several types of models: architecture graph, attack graphs and Dynamic Bayesian Networks (DBNs). In particular, each asset in the architecture is characterized by an attack graph showing the combinations of attack techniques that may affect the asset. By merging the attack graphs according to the asset associations in the architecture, a DBN is generated. Then, the evidence-based and time-driven probabilistic analysis of the DBN permits the quantification of the system security level. Indeed, the DBN probabilistic graphical model can be analyzed through inference algorithms, suitable for forward and backward assessment of the system’s belief state. In this paper, the features and the main goals of SecuriDN are described and illustrated through a simplified but realistic case study. Full article
(This article belongs to the Special Issue Model Predictive Control-Based Approach for Microgrids)
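The evidence-based, time-driven DBN analysis mentioned in this abstract amounts to forward filtering. A minimal two-state sketch is shown below; the transition and alert-likelihood probabilities are hypothetical numbers for illustration, not SecuriDN's generated models.

```python
# Two-state DBN for one asset: 0 = safe, 1 = compromised (hypothetical numbers).
TRANS = [[0.95, 0.05],   # P(next state | currently safe)
         [0.10, 0.90]]   # P(next state | currently compromised)
OBS = [[0.9, 0.1],       # P(alert = 0 or 1 | safe)
       [0.3, 0.7]]       # P(alert = 0 or 1 | compromised)

def filter_step(belief, alert):
    # predict: push the belief state through the transition model
    pred = [sum(belief[s] * TRANS[s][t] for s in (0, 1)) for t in (0, 1)]
    # update: weight by the likelihood of the observed alert, then normalise
    upd = [pred[t] * OBS[t][alert] for t in (0, 1)]
    z = sum(upd)
    return [u / z for u in upd]

belief = [0.99, 0.01]            # prior: asset almost certainly safe
for alert in (0, 1, 1, 1):       # a quiet step followed by repeated IDS alerts
    belief = filter_step(belief, alert)
# belief[1] is now the filtered probability that the asset is compromised
```

Repeated alerts drive the belief toward the compromised state, which is the "forward assessment of the system's belief state" the abstract refers to; backward assessment would run smoothing over the same model.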

14 pages, 605 KiB  
Article
A Hierarchical Multi-Task Learning Framework for Semantic Annotation in Tabular Data
by Jie Wu and Mengshu Hou
Entropy 2024, 26(8), 664; https://doi.org/10.3390/e26080664 - 4 Aug 2024
Viewed by 414
Abstract
To optimize the utilization and analysis of tables, it is essential to recognize and understand their semantics comprehensively. This requirement is especially critical given that many tables lack explicit annotations, necessitating the identification of column types and inter-column relationships. Such identification can significantly augment data quality, streamline data integration, and support data analysis and mining. Current table annotation models often address each subtask independently, which may result in the neglect of constraints and contextual information, causing relational ambiguities and inference errors. To address this issue, we propose a unified multi-task learning framework capable of concurrently handling multiple tasks within a single model, including column named entity recognition, column type identification, and inter-column relationship detection. By integrating these tasks, the framework exploits their interrelations, facilitating the exchange of shallow features and the sharing of representations. Their cooperation enables each task to leverage insights from the others, thereby improving the performance of individual subtasks and enhancing the model’s overall generalization capabilities. Notably, our model is designed to employ only the internal information of tabular data, avoiding reliance on external context or knowledge graphs. This design ensures robust performance even with limited input information. Extensive experiments demonstrate the superior performance of our model across various tasks, validating the effectiveness of unified multi-task learning framework in the recognition and comprehension of table semantics. Full article
(This article belongs to the Special Issue Natural Language Processing and Data Mining)

16 pages, 3078 KiB  
Article
Multimodal Machine Translation Based on Enhanced Knowledge Distillation and Feature Fusion
by Erlin Tian, Zengchao Zhu, Fangmei Liu, Zuhe Li, Ran Gu and Shuai Zhao
Electronics 2024, 13(15), 3084; https://doi.org/10.3390/electronics13153084 - 4 Aug 2024
Viewed by 365
Abstract
Existing research on multimodal machine translation (MMT) has typically enhanced bilingual translation by introducing additional aligned visual information. However, the image requirements of multimodal datasets pose an important constraint on the development of MMT, because they demand alignment between image, source text, and target text. This limitation is compounded by the fact that aligned images are usually not available at the inference phase in a conventional neural machine translation (NMT) setup. Therefore, we propose an innovative MMT framework called the DSKP-MMT model, which supports machine translation by enhancing knowledge distillation and feature refinement methods in the absence of images. Our model first generates multimodal features from the source text. Then, the purified features are obtained through the multimodal feature generator and knowledge distillation module. The features generated through image feature enhancement are subsequently further purified. Finally, the image–text fusion features are generated as input in the transformer-based machine translation reasoning task. In tests on the Multi30K dataset, the DSKP-MMT model achieved a BLEU of 40.42 and a METEOR of 58.15, showing its ability to improve translation effectiveness and facilitate communication. Full article
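The knowledge distillation component can be illustrated with the classic temperature-scaled distillation loss. This generic Hinton-style formulation (KL divergence between softened teacher and student distributions) is a sketch of the technique, not the exact DSKP-MMT objective.

```python
import math

def softmax(logits, T=1.0):
    # numerically stable temperature-scaled softmax
    m = max(logits)
    exps = [math.exp((x - m) / T) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def kd_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on temperature-softened distributions,
    scaled by T^2 so gradient magnitudes stay comparable as T grows."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return T * T * sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

matched = kd_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0])   # student == teacher
mismatched = kd_loss([0.1, 0.2, 0.3], [3.0, -1.0, 0.0])
```

A student that reproduces the teacher's distribution incurs zero loss; any disagreement is penalised, which is how distillation transfers the teacher's (here, image-informed) knowledge into an image-free student.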

17 pages, 23072 KiB  
Article
Fire-Net: Rapid Recognition of Forest Fires in UAV Remote Sensing Imagery Using Embedded Devices
by Shouliang Li, Jiale Han, Fanghui Chen, Rudong Min, Sixue Yi and Zhen Yang
Remote Sens. 2024, 16(15), 2846; https://doi.org/10.3390/rs16152846 - 2 Aug 2024
Viewed by 341
Abstract
Forest fires pose a catastrophic threat to Earth’s ecology as well as to human beings. Timely and accurate monitoring of forest fires can significantly reduce potential casualties and property damage. To address these problems, this paper proposes a lightweight forest fire recognition model for unmanned aerial vehicle (UAV) imagery, Fire-Net, which has a multi-stage structure and incorporates cross-channel attention following the fifth stage. This enables the model to perceive features at various scales, particularly small-scale fire sources in wild forest scenes. Through training and testing on a real-world dataset, various lightweight convolutional neural networks were evaluated on embedded devices. The experimental outcomes indicate that Fire-Net attained an accuracy of 98.18%, a precision of 99.14%, and a recall of 98.01%, surpassing the current leading methods. Furthermore, the model achieves an average inference time of 10 milliseconds per image and operates at 86 frames per second (FPS) on embedded devices. Full article

14 pages, 1193 KiB  
Article
Use of Machine Learning to Improve Additive Manufacturing Processes
by Izabela Rojek, Jakub Kopowski, Jakub Lewandowski and Dariusz Mikołajewski
Appl. Sci. 2024, 14(15), 6730; https://doi.org/10.3390/app14156730 - 1 Aug 2024
Viewed by 411
Abstract
Rapidly developing artificial intelligence (AI) can help machines and devices to perceive, analyze, and even make inferences in a way similar to human reasoning. The aim of this article is to present applications of AI methods, including machine learning (ML), in the design and supervision of processes used in the field of additive manufacturing. This approach allows specific tasks to be solved as if they were performed by a human expert in the field. Applying AI in the development of additive manufacturing technologies makes it possible to draw on the knowledge of experienced operators, acquired automatically, in the design and supervision of processes. This reduces the risk of human error and simplifies and automates the production of products and parts. AI in 3D printing creates a wide range of possibilities for generating 3D objects and enables a machine equipped with a vision system, used in ML processes, to analyze data in a way similar to human thought processes. Printing with such a printer allows the production of objects of ever-increasing quality from several materials simultaneously. The process itself is also precise and fast. An accuracy of 97.56% means that the model is precise and makes very few errors. A 3D printing system with artificial intelligence allows the device to adapt to, for example, different material properties, as the printer examines the 3D-printed surface and automatically adjusts the printing. AI/ML-based solutions similar to ours, once the learning sets are modified or extended, are easily adaptable to other technologies, materials, or multi-material 3D printing. They also allow the creation of dedicated ML solutions that adapt to the specifics of a production line, including self-learning solutions that improve as production progresses. Full article
(This article belongs to the Section Additive Manufacturing Technologies)

11 pages, 945 KiB  
Article
VOGDB—Database of Virus Orthologous Groups
by Lovro Trgovec-Greif, Hans-Jörg Hellinger, Jean Mainguy, Alexander Pfundner, Dmitrij Frishman, Michael Kiening, Nicole Suzanne Webster, Patrick William Laffy, Michael Feichtinger and Thomas Rattei
Viruses 2024, 16(8), 1191; https://doi.org/10.3390/v16081191 - 25 Jul 2024
Viewed by 507
Abstract
Computational models of homologous protein groups are essential in sequence bioinformatics. Due to the diversity and rapid evolution of viruses, the grouping of protein sequences from virus genomes is particularly challenging. The low sequence similarities of homologous genes in viruses require specific approaches for sequence- and structure-based clustering. Furthermore, the annotation of virus genomes in public databases is not as consistent and up to date as for many cellular genomes. To tackle these problems, we have developed VOGDB, which is a database of virus orthologous groups. VOGDB is a multi-layer database that progressively groups viral genes into groups connected by increasingly remote similarity. The first layer is based on pairwise sequence similarities, the second layer is based on sequence profile alignments, and the third layer uses predicted protein structures to find the most remote similarity. VOGDB groups allow for more sensitive homology searches of novel genes and increase the chance of predicting annotations or inferring phylogeny. VOGDB uses all virus genomes from RefSeq and partially reannotates them. VOGDB is updated with every RefSeq release. The unique feature of VOGDB is the inclusion of both prokaryotic and eukaryotic viruses in the same clustering process, which makes it possible to explore old evolutionary relationships of the two groups. VOGDB is freely available at vogdb.org under the CC BY 4.0 license. Full article
(This article belongs to the Section General Virology)
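The multi-layer grouping idea, where groups only ever grow as the similarity criterion is progressively relaxed, can be sketched with union-find over pairwise similarities. The protein names, similarity scores, and thresholds below are invented for illustration; VOGDB's actual layers use sequence, profile, and structure comparisons.

```python
import itertools

def merge_layers(items, sim, thresholds):
    """Union-find grouping, relaxing the similarity threshold layer by layer
    so that groups from one layer are merged, never split, in the next."""
    parent = {x: x for x in items}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x
    layers = []
    for t in thresholds:   # e.g. a strict sequence layer, then a looser profile layer
        for a, b in itertools.combinations(items, 2):
            if sim[frozenset((a, b))] >= t:
                parent[find(a)] = find(b)
        groups = {}
        for x in items:
            groups.setdefault(find(x), set()).add(x)
        layers.append(sorted(sorted(g) for g in groups.values()))
    return layers

items = ["p1", "p2", "p3", "p4"]
sim = {frozenset(p): s for p, s in [
    (("p1", "p2"), 0.90),  # close homologs
    (("p1", "p3"), 0.50),  # remote homologs
    (("p2", "p3"), 0.45),
    (("p1", "p4"), 0.10), (("p2", "p4"), 0.10), (("p3", "p4"), 0.10),
]}
layers = merge_layers(items, sim, thresholds=[0.8, 0.4])
```

The strict layer groups only the close pair; the relaxed layer pulls in the remote homolog while the unrelated protein stays alone, mirroring how deeper VOGDB layers connect groups by increasingly remote similarity.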

18 pages, 16041 KiB  
Article
Dynamic Inversion Method of Calculating Large-Scale Urban Building Height Based on Cooperative Satellite Laser Altimetry and Multi-Source Optical Remote Sensing
by Haobin Xia, Jianjun Wu, Jiaqi Yao, Nan Xu, Xiaoming Gao, Yubin Liang, Jianhua Yang, Jianhang Zhang, Liang Gao, Weiqi Jin and Bowen Ni
Land 2024, 13(8), 1120; https://doi.org/10.3390/land13081120 - 24 Jul 2024
Viewed by 453
Abstract
Building height is a crucial indicator when studying urban environments and human activities, necessitating accurate, large-scale, and fine-resolution calculations. However, mainstream machine learning-based methods for inferring building heights face numerous challenges, including limited sample data and slow update frequencies. Alternatively, satellite laser altimetry technology offers a reliable means of calculating building heights with high precision. Here, we initially calculated building heights along satellite orbits based on building-rooftop contour vector datasets and ICESat-2 ATL03 photon data from 2019 to 2022. By integrating multi-source passive remote sensing observation data, we used the inferred building height results as reference data to train a random forest model, regressing building heights at a 10 m scale. Compared with ground-measured heights, building height samples constructed from ICESat-2 photon data outperformed methods that indirectly infer building heights using total building floor number. Moreover, the simulated building heights strongly correlated with actual observations at a single-city scale. Finally, using several years of inferred results, we analyzed building height changes in Tianjin from 2019 to 2022. Combined with the random forest model, the proposed model enables large-scale, high-precision inference of building heights with frequent updates, which has significant implications for global dynamic observation of urban three-dimensional features. Full article
(This article belongs to the Special Issue GeoAI for Urban Sustainability Monitoring and Analysis)
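The photon-based height estimate can be illustrated as a robust difference between rooftop and ground photon elevations within a building footprint. The percentile choice and the synthetic elevations below are assumptions for illustration, not the paper's actual ATL03 processing chain.

```python
# Toy sketch: estimate a building's height from laser-photon elevations that
# fall inside its rooftop footprint versus nearby ground (synthetic numbers).
def building_height(roof_photons_m, ground_photons_m, pct=0.9):
    """Robust roof elevation (upper percentile, to resist stray noise photons)
    minus a robust ground elevation (median)."""
    roof = sorted(roof_photons_m)
    ground = sorted(ground_photons_m)
    roof_elev = roof[int(pct * (len(roof) - 1))]
    ground_elev = ground[len(ground) // 2]
    return roof_elev - ground_elev

h = building_height(
    roof_photons_m=[31.8, 32.1, 32.0, 31.9, 45.0],   # one noise photon at 45 m
    ground_photons_m=[2.0, 2.1, 1.9, 2.0, 2.2],
)
```

Heights derived this way along the satellite track can then serve as training labels for a regression model (the paper uses a random forest over multi-source optical features) that fills in the areas between orbits.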

25 pages, 1497 KiB  
Article
sBERT: Parameter-Efficient Transformer-Based Deep Learning Model for Scientific Literature Classification
by Mohammad Munzir Ahanger, Mohd Arif Wani and Vasile Palade
Knowledge 2024, 4(3), 397-421; https://doi.org/10.3390/knowledge4030022 - 18 Jul 2024
Viewed by 569
Abstract
This paper introduces a parameter-efficient transformer-based model designed for scientific literature classification. By optimizing the transformer architecture, the proposed model significantly reduces memory usage, training time, inference time, and the carbon footprint associated with large language models. The proposed approach is evaluated against various deep learning models and demonstrates superior performance in classifying scientific literature. Comprehensive experiments conducted on datasets from Web of Science, ArXiv, Nature, Springer, and Wiley reveal that the proposed model’s multi-headed attention mechanism and enhanced embeddings contribute to its high accuracy and efficiency, making it a robust solution for text classification tasks. Full article

18 pages, 1434 KiB  
Article
Scalable and Interpretable Forecasting of Hydrological Time Series Based on Variational Gaussian Processes
by Julián David Pastrana-Cortés, Julian Gil-Gonzalez, Andrés Marino Álvarez-Meza, David Augusto Cárdenas-Peña and Álvaro Angel Orozco-Gutiérrez
Water 2024, 16(14), 2006; https://doi.org/10.3390/w16142006 - 15 Jul 2024
Viewed by 478
Abstract
Accurate streamflow forecasting is crucial for effectively managing water resources, particularly in countries like Colombia, where hydroelectric power generation significantly contributes to the national energy grid. Although highly interpretable, traditional deterministic, physically-driven models often suffer from complexity and require extensive parameterization. Data-driven models like Linear Autoregressive (LAR) and Long Short-Term Memory (LSTM) networks offer simplicity and performance but cannot quantify uncertainty. This work introduces Sparse Variational Gaussian Processes (SVGPs) for forecasting streamflow contributions. The proposed SVGP model reduces computational complexity compared to traditional Gaussian Processes, making it highly scalable for large datasets. The methodology employs optimal hyperparameters and shared inducing points to capture short-term and long-term relationships among reservoirs. Training, validation, and analysis of the proposed approach consider the streamflow dataset from 23 geographically dispersed reservoirs recorded during twelve years in Colombia. Performance assessment reveals that the proposal outperforms baseline Linear Autoregressive (LAR) and Long Short-Term Memory (LSTM) models in three key aspects: adaptability to changing dynamics, provision of informative confidence intervals through Bayesian inference, and enhanced forecasting accuracy. Therefore, the SVGP-based forecasting methodology offers a scalable and interpretable solution for multi-output streamflow forecasting, thereby contributing to more effective water resource management and hydroelectric planning. Full article

19 pages, 4490 KiB  
Article
Drug–Target Interaction Prediction Based on an Interactive Inference Network
by Yuqi Chen, Xiaomin Liang, Wei Du, Yanchun Liang, Garry Wong and Liang Chen
Int. J. Mol. Sci. 2024, 25(14), 7753; https://doi.org/10.3390/ijms25147753 - 15 Jul 2024
Viewed by 671
Abstract
Drug–target interactions underlie the actions of chemical substances in medicine. Moreover, drug repurposing can expand use profiles while reducing costs and development time by exploiting potential multi-functional pharmacological properties based upon additional target interactions. Nonetheless, drug repurposing relies on the accurate identification and validation of drug–target interactions (DTIs). In this study, a novel drug–target interaction prediction model was developed. The model, based on an interactive inference network, contains embedding, encoding, interaction, feature extraction, and output layers. In addition, this study used Morgan and PubChem molecular fingerprints as additional information for drug encoding. The interaction layer in our model simulates the drug–target interaction process, which assists in understanding the interaction by representing the interaction space. Our method achieves high levels of predictive performance, as well as interpretability of drug–target interactions. Additionally, we predicted and validated 22 Alzheimer’s disease-related targets, suggesting our model is robust and effective and thus may be beneficial for drug repurposing. Full article
(This article belongs to the Collection Feature Papers in Molecular Pharmacology)

17 pages, 452 KiB  
Article
Bootstrap Approximation of Model Selection Probabilities for Multimodel Inference Frameworks
by Andres Dajles and Joseph Cavanaugh
Entropy 2024, 26(7), 599; https://doi.org/10.3390/e26070599 - 15 Jul 2024
Viewed by 430
Abstract
Most statistical modeling applications involve the consideration of a candidate collection of models based on various sets of explanatory variables. The candidate models may also differ in terms of the structural formulations for the systematic component and the posited probability distributions for the random component. A common practice is to use an information criterion to select a model from the collection that provides an optimal balance between fidelity to the data and parsimony. The analyst then typically proceeds as if the chosen model was the only model ever considered. However, such a practice fails to account for the variability inherent in the model selection process, which can lead to inappropriate inferential results and conclusions. In recent years, inferential methods have been proposed for multimodel frameworks that attempt to provide an appropriate accounting of modeling uncertainty. In the frequentist paradigm, such methods should ideally involve model selection probabilities, i.e., the relative frequencies of selection for each candidate model based on repeated sampling. Model selection probabilities can be conveniently approximated through bootstrapping. When the Akaike information criterion is employed, Akaike weights are also commonly used as a surrogate for selection probabilities. In this work, we show that the conventional bootstrap approach for approximating model selection probabilities is impacted by bias. We propose a simple correction to adjust for this bias. We also argue that Akaike weights do not provide adequate approximations for selection probabilities, although they do provide a crude gauge of model plausibility. Full article
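The bootstrap approximation of selection probabilities, and the Akaike weights it is compared against, can be sketched on a toy two-model problem (mean-only versus simple linear regression on synthetic data). Note this is the conventional, uncorrected bootstrap the paper analyzes; the proposed bias correction is not reproduced here.

```python
import math
import random

def aic(rss, n, k):
    # Gaussian AIC up to an additive constant: n * log(RSS / n) + 2k
    return n * math.log(rss / n) + 2 * k

def rss_mean(y):
    m = sum(y) / len(y)
    return sum((v - m) ** 2 for v in y)

def rss_line(x, y):
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((a - mx) * (v - my) for a, v in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return sum((v - (my + b * (a - mx))) ** 2 for a, v in zip(x, y))

random.seed(0)
n = 40
x = [i / n for i in range(n)]
y = [1.0 + 0.8 * a + random.gauss(0, 0.5) for a in x]   # the true model has a slope

# conventional bootstrap: refit and reselect on each resample of (x, y) pairs
picks, B = [0, 0], 200
for _ in range(B):
    idx = [random.randrange(n) for _ in range(n)]
    xb, yb = [x[i] for i in idx], [y[i] for i in idx]
    a_mean = aic(rss_mean(yb), n, k=2)      # mean + error variance
    a_line = aic(rss_line(xb, yb), n, k=3)  # intercept + slope + error variance
    picks[0 if a_mean < a_line else 1] += 1
sel_probs = [p / B for p in picks]

# Akaike weights from the original sample: the common surrogate for selection
# probabilities that the paper argues is only a crude gauge of plausibility
a = [aic(rss_mean(y), n, 2), aic(rss_line(x, y), n, 3)]
rel = [math.exp(-(v - min(a)) / 2) for v in a]
akaike_weights = [r / sum(rel) for r in rel]
```

Comparing `sel_probs` with `akaike_weights` on examples like this is exactly the kind of exercise the paper formalizes when demonstrating the bootstrap's bias.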

17 pages, 56438 KiB  
Article
Lightweight Network of Multi-Stage Strawberry Detection Based on Improved YOLOv7-Tiny
by Chenglin Li, Haonan Wu, Tao Zhang, Jiahuan Lu and Jiehao Li
Agriculture 2024, 14(7), 1132; https://doi.org/10.3390/agriculture14071132 - 12 Jul 2024
Viewed by 459
Abstract
The color features of strawberries at different growth stages vary only slightly, and the fruit are often occluded during growth. To address these challenges, this study proposes a lightweight multi-stage detection method based on You Only Look Once version 7-tiny (YOLOv7-tiny) for strawberries in complex environments. First, the size of the model is reduced by replacing the ordinary convolution of the neck network used for deep feature extraction and fusion with lightweight Ghost convolution. Then, by introducing the Coordinate Attention (CA) module, the model’s focus on the target detection area is enhanced, thereby improving the detection accuracy of strawberries. The Wise Intersection over Union (WIoU) loss function is integrated to accelerate model convergence and enhance the recognition accuracy of occluded targets. The advanced Adaptive Nesterov momentum algorithm (Adan) is utilized for gradient descent, processing averaged sample data. Additionally, considering the small size of strawberry targets, a detection head specifically for small targets is added, performing detection on a 160 × 160 × 64 feature map, which significantly improves the detection performance for small strawberries. Experimental results demonstrate that the improved network model achieves an mAP@0.5 of 88.2% for multi-stage strawberry detection, which is 2.44% higher than the original YOLOv7-tiny algorithm. Meanwhile, GFLOPs and Params are reduced by 1.54% and 12.10%, respectively. In practical detection and inference, the improved model outperforms current mainstream target detection models, enabling quicker and more accurate identification of strawberries at different growth stages, thus providing technical support for intelligent strawberry picking. Full article
(This article belongs to the Section Digital Agriculture)
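The parameter saving from swapping ordinary convolutions for Ghost convolutions can be checked by simple counting. The layer sizes, the ratio s = 2, and the 3 × 3 cheap operations below are illustrative assumptions about a GhostNet-style module, not the paper's exact layer configuration.

```python
def conv_params(c_in, c_out, k):
    # weight count of an ordinary k x k convolution (bias terms ignored)
    return c_in * c_out * k * k

def ghost_params(c_in, c_out, k, s=2, d=3):
    """Ghost module: a primary conv produces c_out / s intrinsic channels,
    then cheap d x d depthwise ops generate the remaining (s - 1) ghosts each."""
    primary = c_in * (c_out // s) * k * k
    cheap = (c_out // s) * (s - 1) * d * d
    return primary + cheap

full = conv_params(128, 256, 3)      # a plausible neck-layer size (assumed)
ghost = ghost_params(128, 256, 3)    # roughly half the parameters
```

With s = 2 the module keeps the same output width at about half the weights, which is why replacing neck convolutions with Ghost convolution shrinks the model with little accuracy cost.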

18 pages, 61546 KiB  
Article
Research on Improved Lightweight YOLOv5s for Multi-Scale Ship Target Detection
by Peng Zhang, Peiqiao Zhu, Ze Sun, Jun Ding, Jiale Zhang, Junwei Dong and Wei Guo
Appl. Sci. 2024, 14(14), 6075; https://doi.org/10.3390/app14146075 - 12 Jul 2024
Viewed by 400
Abstract
Fast and accurate ship target detection technology plays an important role in improving driving safety, rescue at sea, marine environmental protection, and sea traffic control. It is also one of the key technologies for the development of ship informatization and intelligence. However, current ship target detection models used at different scales in multiple scenarios exhibit high complexity and slow inference speed. The trade-off between model detection speed and accuracy limits the deployment of ship target detection models on edge devices. This study proposes a lightweight multi-scale ship target detection model based on the YOLOv5s model. In the proposed model, the lightweight EfficientNetV2 and C3Ghost networks are integrated into the backbone and neck networks of the YOLOv5s model to compress the computational and parametric quantities of the model and improve the detection speed. The Shuffle Attention mechanism is embedded in the neck network component of the model to enhance the representation of important feature information, suppress irrelevant feature information, and improve the model’s detection performance. The improved method is trained and verified on a dataset collected and labeled by the authors. Compared with the baseline model, the inference speed of the proposed model increased by 29.58%, mAP@0.5 improved by 0.1%, and the parameters and floating-point operations decreased by 42.82% and 68.35%, respectively. The file size of the model is 8.02 MB, which is 41.46% lower than the baseline model. Compared with other lightweight models, the method proposed in this study is more favorable for edge computing. Full article
