Search Results (1,057)

Search Parameters:
Keywords = scene classification

16 pages, 2602 KiB  
Article
Multi-Scale and Multi-Network Deep Feature Fusion for Discriminative Scene Classification of High-Resolution Remote Sensing Images
by Baohua Yuan, Sukhjit Singh Sehra and Bernard Chiu
Remote Sens. 2024, 16(21), 3961; https://doi.org/10.3390/rs16213961 - 24 Oct 2024
Abstract
The advancement in satellite image sensors has enabled the acquisition of high-resolution remote sensing (HRRS) images. However, interpreting these images accurately and obtaining the computational power needed to do so are challenging due to the complexity involved. This manuscript proposes a multi-stream convolutional neural network (CNN) fusion framework that involves multi-scale and multi-CNN integration for HRRS image recognition. The pre-trained CNNs were used to learn and extract semantic features from multi-scale HRRS images. Feature extraction using pre-trained CNNs is more efficient than training a CNN from scratch or fine-tuning a CNN. Discriminative canonical correlation analysis (DCCA) was used to fuse deep features extracted across CNNs and image scales. DCCA reduced the dimension of the features extracted from CNNs while providing a discriminative representation by maximizing the within-class correlation and minimizing the between-class correlation. The proposed model has been evaluated on the NWPU-RESISC45 and UC Merced datasets. The accuracy associated with DCCA was 10% and 6% higher than that of discriminant correlation analysis (DCA) in the NWPU-RESISC45 and UC Merced datasets, respectively. The advantage of DCCA was better demonstrated in the NWPU-RESISC45 dataset due to the incorporation of richer within-class variability in this dataset. While both DCA and DCCA minimize between-class correlation, only DCCA maximizes the within-class correlation and, therefore, attains better accuracy. The proposed framework achieved higher accuracy than all state-of-the-art frameworks involving unsupervised learning and pre-trained CNNs, and 2–3% higher than the majority of fine-tuned CNNs. The proposed framework offers computational time advantages, requiring only 13 s for training on NWPU-RESISC45, compared to a day for fine-tuning the existing CNNs. Thus, the proposed framework achieves a favourable balance between efficiency and accuracy in HRRS image recognition. Full article
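As a rough illustration of the fusion step described in this abstract, the sketch below extracts pooled features from two ImageNet-pretrained CNNs at different input scales and fuses them with plain canonical correlation analysis; scikit-learn's CCA stands in for the paper's DCCA, and the backbones, scales, and dataset loader are assumptions.

```python
# Sketch of multi-scale, multi-CNN feature extraction followed by a CCA-based
# fusion step. Plain CCA from scikit-learn stands in for the paper's DCCA;
# backbone choices, scales, and the dataset loader are illustrative assumptions.
import numpy as np
import torch
import torch.nn as nn
from torchvision import models
from sklearn.cross_decomposition import CCA
from sklearn.svm import LinearSVC

def make_extractor(backbone):
    # Drop the classification head, keep the globally pooled convolutional features.
    return nn.Sequential(*list(backbone.children())[:-1]).eval()

resnet = make_extractor(models.resnet50(weights="IMAGENET1K_V1"))
vgg = models.vgg16(weights="IMAGENET1K_V1").features.eval()

def extract(extractor, images, scale):
    # images: float tensor (N, 3, H, W), already normalized to ImageNet statistics.
    resized = nn.functional.interpolate(images, size=(scale, scale), mode="bilinear")
    with torch.no_grad():
        feats = extractor(resized)
        feats = nn.functional.adaptive_avg_pool2d(feats, 1).flatten(1)
    return feats.numpy()

def fuse_cca(x, y, n_components=64):
    # CCA projects both feature streams into a shared correlated subspace;
    # DCCA would additionally use class labels to shape the projection.
    x_c, y_c = CCA(n_components=n_components).fit_transform(x, y)
    return np.concatenate([x_c, y_c], axis=1)

# images, labels = load_scene_dataset(...)        # hypothetical loader
# f1 = extract(resnet, images, scale=224)
# f2 = extract(vgg, images, scale=192)
# clf = LinearSVC().fit(fuse_cca(f1, f2), labels)
```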

22 pages, 3096 KiB  
Article
Training by Pairing Correlated Samples Improves Deep Network Generalization
by Duc H. Phan and Douglas L. Jones
Electronics 2024, 13(21), 4169; https://doi.org/10.3390/electronics13214169 - 24 Oct 2024
Abstract
Deep neural networks (DNNs) have been widely applied in different application domains. The DNN was first studied intensively in vision applications before being adapted to other fields. Migrating a DNN solution from the vision domain to another application domain may call for new network structures or improved training processes. This article focuses on training process improvements. We propose a pairing technique that simultaneously optimizes the performance of the models on pairs of training samples during the training process. A pair includes an original training example and a corresponding modified version. Our pairing techniques adapt the similarity part of the contrastive loss as an additional regularization term for the loss function of a given machine learning task. The pairing techniques show at least a 1% improvement in network accuracy on top of mix-up augmentation for the CIFAR10 dataset and a 2% increase in accuracy for DCASE 2020 Task 1A data. We show that the proposed training-by-pairs provides parameter regularization for ReLU deep networks. As a result, the technique can potentially be applied to many other machine learning applications. Full article
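A minimal sketch of the pairing idea described above, assuming a model that returns both an embedding and class logits: the task loss is computed on the original sample and its modified copy, and the similarity part of a contrastive loss is added as a regularizer. The weighting factor and the way pairs are built are illustrative, not the paper's exact settings.

```python
# Minimal sketch of training with paired samples: the usual task loss is
# computed on both the original and a modified copy, plus a similarity
# regularizer that pulls their embeddings together. The weight `lam` and the
# augmentation used to build the pair are illustrative assumptions.
import torch
import torch.nn.functional as F

def paired_loss(model, x, x_aug, targets, lam=0.1):
    # model is assumed to return (embedding, logits) for a batch of inputs.
    emb, logits = model(x)
    emb_aug, logits_aug = model(x_aug)

    # Standard task loss on both members of each pair.
    task = F.cross_entropy(logits, targets) + F.cross_entropy(logits_aug, targets)

    # Similarity part of the contrastive loss used as a regularizer:
    # maximize cosine similarity between paired embeddings.
    sim = F.cosine_similarity(emb, emb_aug, dim=1).mean()
    return task - lam * sim
```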

17 pages, 2458 KiB  
Article
Data Augmentation Method Using Room Transfer Function for Monitoring of Domestic Activities
by Minhan Kim and Seokjin Lee
Appl. Sci. 2024, 14(21), 9644; https://doi.org/10.3390/app14219644 - 22 Oct 2024
Abstract
Monitoring domestic activities helps us to understand user behaviors in indoor environments, which has garnered interest as it aids in understanding human activities in context-aware computing. In the field of acoustics, this goal has been achieved through studies employing machine learning techniques, which are widely used for classification tasks involving sound recognition and other objectives. Machine learning typically achieves better performance with large amounts of high-quality training data. Given the high cost of data collection, development datasets often suffer from imbalanced data or lack high-quality samples, leading to performance degradations in machine learning models. The present study aims to address this data issue through data augmentation techniques. Specifically, since the proposed method targets indoor activities in domestic activity detection, room transfer functions were used for data augmentation. The results show that the proposed method achieves a 0.59% improvement in the F1-Score (micro) from that of the baseline system for the development dataset. Additionally, test data including microphones that were not used during training achieved an F1-Score improvement of 0.78% over that of the baseline system. This demonstrates the enhanced model generalization performance of the proposed method on samples having different room transfer functions to those of the trained dataset. Full article
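A minimal sketch of room-transfer-function augmentation as described above, assuming access to recorded room impulse responses: the dry clip is convolved with an RIR so it appears to have been captured through a different transfer function. The file names are placeholders.

```python
# Sketch of room-transfer-function augmentation: a dry recording is convolved
# with a room impulse response (RIR) so that it sounds as if it were captured
# in a different room or microphone position. File names are placeholders and
# mono (1-D) signals are assumed.
import numpy as np
import soundfile as sf
from scipy.signal import fftconvolve

def augment_with_rir(audio, rir):
    wet = fftconvolve(audio, rir, mode="full")[: len(audio)]
    # Keep roughly the original level so the model sees comparable loudness.
    wet *= np.max(np.abs(audio)) / (np.max(np.abs(wet)) + 1e-9)
    return wet

dry, sr = sf.read("vacuum_cleaning.wav")       # placeholder activity clip
rir, sr_rir = sf.read("living_room_rir.wav")   # placeholder impulse response
assert sr == sr_rir
sf.write("vacuum_cleaning_aug.wav", augment_with_rir(dry, rir), sr)
```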

35 pages, 16179 KiB  
Article
Vegetative Index Intercalibration Between PlanetScope and Sentinel-2 Through a SkySat Classification in the Context of “Riserva San Massimo” Rice Farm in Northern Italy
by Christian Massimiliano Baldin and Vittorio Marco Casella
Remote Sens. 2024, 16(21), 3921; https://doi.org/10.3390/rs16213921 - 22 Oct 2024
Abstract
Rice farming in Italy accounts for about 50% of the EU’s rice area and production. Precision agriculture has entered the scene to enhance sustainability, cut pollution, and ensure food security. Various studies have used remote sensing tools like satellites and drones for multispectral imaging. While Sentinel-2 is highly regarded for precision agriculture, it falls short for specific applications, like at the “Riserva San Massimo” (Gropello Cairoli, Lombardia, Northern Italy) rice farm, where irregularly shaped crops need higher resolution and frequent revisits to deal with cloud cover. A prior study that compared Sentinel-2 and the higher-resolution PlanetScope constellation for vegetative indices found a seasonal miscalibration in the Normalized Difference Vegetation Index (NDVI) and in the Normalized Difference Red Edge Index (NDRE). Dr. Agr. G.N. Rognoni, a seasoned agronomist working with this farm, stresses the importance of studying the radiometric intercalibration between the PlanetScope and Sentinel-2 vegetative indices so that the knowledge gained from Sentinel-2 can be leveraged for variable rate application (VRA). A high-resolution SkySat image, taken almost simultaneously with a pair of Sentinel-2 and PlanetScope images, offered a chance to examine if the irregular distribution of vegetation and barren land within rice fields might be a factor in the observed miscalibration. Using an unsupervised pixel-based image classification technique on SkySat imagery, it is feasible to split rice into two subclasses and intercalibrate them separately. The results indicated that combining histograms and agronomists’ expertise could confirm the SkySat classification. Moreover, the uneven spatial distribution of rice does not affect the seasonal miscalibration examined in past studies, which can be adjusted using the methods described here, even with images taken four days apart: the first method emphasizes accuracy using linear regression, histogram shifting, and histogram matching, whereas the second method is faster and utilizes only histogram matching. Full article
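A rough sketch of the faster, histogram-matching-only intercalibration variant mentioned above, assuming NDVI rasters covering the same area; file paths and band indices are placeholders, and the subclass split derived from the SkySat classification is omitted.

```python
# Sketch of histogram-matching-style intercalibration: the PlanetScope NDVI
# distribution is mapped onto the Sentinel-2 NDVI distribution over the same
# area. File names and band indices are illustrative placeholders.
import rasterio
from skimage.exposure import match_histograms

def ndvi(red, nir):
    return (nir - red) / (nir + red + 1e-9)

with rasterio.open("planetscope.tif") as src:   # placeholder path; bands 3/4 = red/NIR
    ps_red, ps_nir = src.read(3).astype(float), src.read(4).astype(float)
with rasterio.open("sentinel2.tif") as src:     # placeholder path; bands 1/2 = red/NIR
    s2_red, s2_nir = src.read(1).astype(float), src.read(2).astype(float)

ps_ndvi = ndvi(ps_red, ps_nir)
s2_ndvi = ndvi(s2_red, s2_nir)

# Map the PlanetScope NDVI histogram onto the Sentinel-2 reference.
ps_ndvi_calibrated = match_histograms(ps_ndvi, s2_ndvi)
```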

22 pages, 7929 KiB  
Article
Remote Sensing LiDAR and Hyperspectral Classification with Multi-Scale Graph Encoder–Decoder Network
by Fang Wang, Xingqian Du, Weiguang Zhang, Liang Nie, Hu Wang, Shun Zhou and Jun Ma
Remote Sens. 2024, 16(20), 3912; https://doi.org/10.3390/rs16203912 - 21 Oct 2024
Abstract
The rapid development of sensor technology has made multi-modal remote sensing data valuable for land cover classification due to its diverse and complementary information. Many feature extraction methods for multi-modal data, combining light detection and ranging (LiDAR) and hyperspectral imaging (HSI), have recognized the importance of incorporating multiple spatial scales. However, effectively capturing both long-range global correlations and short-range local features simultaneously on different scales remains a challenge, particularly in large-scale, complex ground scenes. To address this limitation, we propose a multi-scale graph encoder–decoder network (MGEN) for multi-modal data classification. The MGEN adopts a graph model that maintains global sample correlations to fuse multi-scale features, enabling simultaneous extraction of local and global information. The graph encoder maps multi-modal data from different scales to the graph space and completes feature extraction in the graph space. The graph decoder maps the features of multiple scales back to the original data space and completes multi-scale feature fusion and classification. Experimental results on three HSI-LiDAR datasets demonstrate that the proposed MGEN achieves considerable classification accuracies and outperforms state-of-the-art methods. Full article
(This article belongs to the Special Issue 3D Scene Reconstruction, Modeling and Analysis Using Remote Sensing)

18 pages, 41079 KiB  
Article
Research on Target Image Classification in Low-Light Night Vision
by Yanfeng Li, Yongbiao Luo, Yingjian Zheng, Guiqian Liu and Jiekai Gong
Entropy 2024, 26(10), 882; https://doi.org/10.3390/e26100882 - 21 Oct 2024
Abstract
In extremely dark conditions, low-light imaging may offer spectators a rich visual experience, which is important for both military and civic applications. However, the images taken in ultra-micro light environments usually have inherent defects such as extremely low brightness and contrast, a high noise level, and serious loss of scene details and colors, which poses great challenges for research on low-light image and object detection and classification. The low-light night vision images used as the study objects in this work are excessively dim overall and contain very little discernible feature information. Three algorithms, histogram equalization (HE), adaptive histogram equalization (AHE), and contrast-limited adaptive histogram equalization (CLAHE), were used to enhance and highlight the images. The effectiveness of these image enhancement methods is evaluated using metrics such as the peak signal-to-noise ratio and mean square error, and CLAHE was selected after comparison. The target images include vehicles, people, license plates, and other objects. The gray-level co-occurrence matrix (GLCM) was used to extract the texture features of the enhanced images, and the extracted image texture features were used as input to construct a backpropagation (BP) neural network classification model. Then, low-light image classification models were developed based on VGG16 and ResNet50 convolutional neural networks combined with low-light image enhancement algorithms. The experimental results show that the overall classification accuracy of the VGG16 convolutional neural network model is 92.1%. Compared with the BP and ResNet50 neural network models, the classification accuracy was increased by 4.5% and 2.3%, respectively, demonstrating its effectiveness in classifying low-light night vision targets. Full article
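A condensed sketch of the enhancement and texture steps described above: CLAHE contrast enhancement with OpenCV, GLCM texture descriptors with scikit-image, and a small MLP standing in for the BP classifier. The clip limit, distances, angles, and the dataset loader are illustrative assumptions.

```python
# Sketch of the enhancement + texture pipeline: CLAHE contrast enhancement,
# GLCM texture descriptors, and a small MLP standing in for the BP network.
# Parameter values (clip limit, distances, angles) are illustrative.
import cv2
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from sklearn.neural_network import MLPClassifier

clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))

def glcm_features(gray):
    # gray: uint8 image, already contrast-enhanced with CLAHE.
    glcm = graycomatrix(gray, distances=[1, 2], angles=[0, np.pi / 2],
                        levels=256, symmetric=True, normed=True)
    props = ["contrast", "homogeneity", "energy", "correlation"]
    return np.hstack([graycoprops(glcm, p).ravel() for p in props])

def image_to_features(path):
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    return glcm_features(clahe.apply(gray))

# paths, labels = load_night_vision_dataset(...)   # hypothetical loader
# X = np.stack([image_to_features(p) for p in paths])
# clf = MLPClassifier(hidden_layer_sizes=(64,)).fit(X, labels)
```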

23 pages, 6173 KiB  
Article
Scene Classification of Remote Sensing Image Based on Multi-Path Reconfigurable Neural Network
by Wenyi Hu, Chunjie Lan, Tian Chen, Shan Liu, Lirong Yin and Lei Wang
Land 2024, 13(10), 1718; https://doi.org/10.3390/land13101718 - 20 Oct 2024
Viewed by 221
Abstract
Land image recognition and classification and land environment detection are important research fields in remote sensing applications. Because of the diversity and complexity of land environment recognition and classification tasks, it is difficult for researchers to use a single model to achieve the best performance in scene classification of multiple remote sensing land images. Therefore, to determine which model is the best for the current recognition and classification tasks, it is often necessary to select and experiment with many different models. However, searching for the optimal model increases trial-and-error costs, wastes researchers’ time, and often still fails to identify the right model quickly. To address the issue of existing models being too large for easy selection, this paper proposes a multi-path reconfigurable network structure and takes the multi-path reconfigurable residual network (MR-ResNet) model as an example. The reconfigurable neural network model allows researchers to selectively choose the required modules and reassemble them to generate customized models by splitting the trained models and connecting them through modules with different properties. At the same time, by introducing the concept of a multi-path input network, the optimal path is selected by inputting different modules, which shortens the training time of the model and allows researchers to easily find the network model suitable for the current application scenario. This saves substantial training data, computational resources, and model-tuning effort. Three public datasets, NWPU-RESISC45, RSSCN7, and SIRI-WHU, were used for the experiments. The experimental results demonstrate that the proposed model surpasses the classic residual network (ResNet) in terms of both parameters and performance. Full article
(This article belongs to the Special Issue GeoAI for Land Use Observations, Analysis and Forecasting)

20 pages, 1584 KiB  
Article
Hyperspectral Image Classification Algorithm for Forest Analysis Based on a Group-Sensitive Selective Perceptual Transformer
by Shaoliang Shi, Xuyang Li, Xiangsuo Fan and Qi Li
Appl. Sci. 2024, 14(20), 9553; https://doi.org/10.3390/app14209553 - 19 Oct 2024
Viewed by 336
Abstract
Substantial advancements have been achieved in hyperspectral image (HSI) classification through contemporary deep learning techniques. Nevertheless, the incorporation of an excessive number of irrelevant tokens in large-scale remote sensing data results in inefficient long-range modeling. To overcome this hurdle, this study introduces the Group-Sensitive Selective Perception Transformer (GSAT) framework, which builds upon the Vision Transformer (ViT) to enhance HSI classification outcomes. The innovation of the GSAT architecture is primarily evident in several key aspects. Firstly, the GSAT incorporates a Group-Sensitive Pixel Group Mapping (PGM) module, which organizes pixels into distinct groups. This allows the global self-attention mechanism to function within these groupings, effectively capturing local interdependencies within spectral channels. This grouping tactic not only boosts the model’s spatial awareness but also lessens computational complexity, enhancing overall efficiency. Secondly, the GSAT addresses the detrimental effects of superfluous tokens on model efficacy by introducing the Sensitivity Selection Framework (SSF) module. This module selectively identifies the most pertinent tokens for classification purposes, thereby minimizing distractions from extraneous information and bolstering the model’s representational strength. Furthermore, the SSF refines local representation through multi-scale feature selection, enabling the model to more effectively encapsulate feature data across various scales. Additionally, the GSAT architecture adeptly represents both global and local features of HSI data by merging global self-attention with local feature extraction. This integration strategy not only elevates classification precision but also enhances the model’s versatility in navigating complex scenes, particularly in urban mapping scenarios where it significantly outclasses previous deep learning methods. The advent of the GSAT architecture not only rectifies the inefficiencies of traditional deep learning approaches in processing extensive remote sensing imagery but also markedly enhances the performance of HSI classification tasks through the deployment of group-sensitive and selective perception mechanisms. It presents a novel viewpoint within the domain of hyperspectral image classification and is poised to propel further advancements in the field. Empirical testing on six standard HSI datasets confirms the superior performance of the proposed GSAT method in HSI classification, especially within urban mapping contexts, where it exceeds the capabilities of prior deep learning techniques. In essence, the GSAT architecture markedly refines HSI classification by pioneering group-sensitive pixel group mapping and selective perception mechanisms, heralding a significant breakthrough in hyperspectral image processing. Full article

22 pages, 3605 KiB  
Article
Instance-Level Scaling and Dynamic Margin-Alignment Knowledge Distillation for Remote Sensing Image Scene Classification
by Chuan Li, Xiao Teng, Yan Ding and Long Lan
Remote Sens. 2024, 16(20), 3853; https://doi.org/10.3390/rs16203853 - 17 Oct 2024
Viewed by 372
Abstract
Remote sensing image (RSI) scene classification aims to identify semantic categories in RSI using neural networks. However, high-performance deep neural networks typically demand substantial storage and computational resources, making practical deployment challenging. Knowledge distillation has emerged as an effective technique for developing compact models that maintain high classification accuracy in RSI tasks. Existing knowledge distillation methods often overlook the high inter-class similarity in RSI scenes, leading to low-confidence soft labels from the teacher model, which can mislead the student model. Conversely, overly confident soft labels may discard valuable non-target information. Additionally, the significant intra-class variability in RSI contributes to instability in the model’s decision boundaries. To address these challenges, we propose an efficient method called instance-level scaling and dynamic margin-alignment knowledge distillation (ISDM) for RSI scene classification. To balance the target and non-target class influence, we apply an entropy regularization loss to scale the teacher model’s target class at the instance level. Moreover, we introduce dynamic margin alignment between the student and teacher models to improve the student’s discriminative capability. By optimizing soft labels and enhancing the student’s ability to distinguish between classes, our method reduces the effects of inter-class similarity and intra-class variability. Experimental results on three public RSI scene classification datasets (AID, UCMerced, and NWPU-RESISC) demonstrate that our method achieves state-of-the-art performance across all teacher–student pairs with lower computational costs. Additionally, we validate the generalization of our approach on general datasets, including CIFAR-100 and ImageNet-1k. Full article
(This article belongs to the Section AI Remote Sensing)
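For context on what the ISDM method modifies, here is a minimal sketch of the conventional temperature-scaled distillation loss that instance-level scaling approaches build on; the instance-wise target scaling and dynamic margin alignment described in the paper are not reproduced, and the temperature and mixing weight are arbitrary.

```python
# Sketch of the conventional temperature-scaled knowledge distillation loss
# that methods such as ISDM build on. This is the standard baseline form,
# not the paper's instance-level scaling or margin-alignment variant.
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, targets, T=4.0, alpha=0.5):
    # Soft-label term: KL divergence between temperature-softened distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-label term: ordinary cross-entropy on the ground truth.
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1.0 - alpha) * hard
```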

22 pages, 1654 KiB  
Article
A New Scene Sensing Model Based on Multi-Source Data from Smartphones
by Zhenke Ding, Zhongliang Deng, Enwen Hu, Bingxun Liu, Zhichao Zhang and Mingyang Ma
Sensors 2024, 24(20), 6669; https://doi.org/10.3390/s24206669 - 16 Oct 2024
Viewed by 274
Abstract
Smartphones with integrated sensors play an important role in people’s lives, and in advanced multi-sensor fusion navigation systems, the use of individual sensor information is crucial. Because of the different environments, the weights of the sensors will be different, which will also affect the method and results of multi-source fusion positioning. Based on the multi-source data from smartphone sensors, this study explores five types of information—Global Navigation Satellite System (GNSS), Inertial Measurement Units (IMUs), cellular networks, optical sensors, and Wi-Fi sensors—characterizing the temporal, spatial, and mathematical statistical features of the data, and it constructs a multi-scale, multi-window, and context-connected scene sensing model to accurately detect the environmental scene in indoor, semi-indoor, outdoor, and semi-outdoor spaces, thus providing a good basis for multi-sensor positioning in a multi-sensor navigation system. Detecting environmental scenes provides an environmental positioning basis for multi-sensor fusion localization. This model is divided into four main parts: multi-sensor-based data mining, a multi-scale convolutional neural network (CNN), a bidirectional long short-term memory (BiLSTM) network combined with contextual information, and a meta-heuristic optimization algorithm. Full article
(This article belongs to the Special Issue Smart Sensor Systems for Positioning and Navigation)
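A toy sketch of the multi-scale CNN plus BiLSTM arrangement outlined above, applied to windowed sensor streams stacked as channels; the channel count, window length, layer sizes, and the four scene classes (indoor, semi-indoor, outdoor, semi-outdoor) follow the abstract loosely, but all concrete values are assumptions.

```python
# Sketch of a CNN + BiLSTM scene classifier over windowed smartphone sensor
# streams (e.g., GNSS/IMU/Wi-Fi statistics stacked as channels). Window length,
# channel count, and layer sizes are illustrative assumptions.
import torch
import torch.nn as nn

class SceneSenseNet(nn.Module):
    def __init__(self, in_channels=16, n_classes=4):
        super().__init__()
        # Two parallel convolution branches with different kernel sizes act as
        # a simple multi-scale feature extractor over the time axis.
        self.branch_small = nn.Conv1d(in_channels, 32, kernel_size=3, padding=1)
        self.branch_large = nn.Conv1d(in_channels, 32, kernel_size=7, padding=3)
        self.bilstm = nn.LSTM(64, 64, batch_first=True, bidirectional=True)
        self.head = nn.Linear(128, n_classes)

    def forward(self, x):
        # x: (batch, channels, time)
        feats = torch.cat([torch.relu(self.branch_small(x)),
                           torch.relu(self.branch_large(x))], dim=1)
        seq = feats.transpose(1, 2)            # (batch, time, features)
        out, _ = self.bilstm(seq)
        return self.head(out[:, -1])           # classify from the last time step

# logits = SceneSenseNet()(torch.randn(8, 16, 200))  # 8 windows, 16 channels, 200 steps
```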

19 pages, 8953 KiB  
Article
Leveraging Multimodal Large Language Models (MLLMs) for Enhanced Object Detection and Scene Understanding in Thermal Images for Autonomous Driving Systems
by Huthaifa I. Ashqar, Taqwa I. Alhadidi, Mohammed Elhenawy and Nour O. Khanfar
Automation 2024, 5(4), 508-526; https://doi.org/10.3390/automation5040029 - 10 Oct 2024
Viewed by 918
Abstract
The integration of thermal imaging data with multimodal large language models (MLLMs) offers promising advancements for enhancing the safety and functionality of autonomous driving systems (ADS) and intelligent transportation systems (ITS). This study investigates the potential of MLLMs, specifically GPT-4 Vision Preview and Gemini 1.0 Pro Vision, for interpreting thermal images for applications in ADS and ITS. Two primary research questions are addressed: the capacity of these models to detect and enumerate objects within thermal images, and to determine whether pairs of image sources represent the same scene. Furthermore, we propose a framework for object detection and classification by integrating infrared (IR) and RGB images of the same scene without requiring localization data. This framework is particularly valuable for enhancing the detection and classification accuracy in environments where both IR and RGB cameras are essential. By employing zero-shot in-context learning for object detection and the chain-of-thought technique for scene discernment, this study demonstrates that MLLMs can recognize objects such as vehicles and individuals with promising results, even in the challenging domain of thermal imaging. The results indicate a high true positive rate for larger objects and moderate success in scene discernment, with a recall of 0.91 and a precision of 0.79 for similar scenes. The integration of IR and RGB images further enhances detection capabilities, achieving an average precision of 0.93 and an average recall of 0.56. This approach leverages the complementary strengths of each modality to compensate for individual limitations. This study highlights the potential of combining advanced AI methodologies with thermal imaging to enhance the accuracy and reliability of ADS, while identifying areas for improvement in model performance. Full article

19 pages, 9016 KiB  
Article
Semi-Supervised Subcategory Centroid Alignment-Based Scene Classification for High-Resolution Remote Sensing Images
by Nan Mo and Ruixi Zhu
Remote Sens. 2024, 16(19), 3728; https://doi.org/10.3390/rs16193728 - 7 Oct 2024
Viewed by 507
Abstract
It is usually hard to obtain adequate annotated data for delivering satisfactory scene classification results. Semi-supervised scene classification approaches can transfer the knowledge learned from previously annotated data to remote sensing images with scarce samples for satisfactory classification results. However, due to the differences between sensors, environments, seasons, and geographical locations, cross-domain remote sensing images exhibit feature distribution deviations. Therefore, semi-supervised scene classification methods may not achieve satisfactory classification accuracy. To address this problem, a novel semi-supervised subcategory centroid alignment (SSCA)-based scene classification approach is proposed. The SSCA framework is made up of two components, namely the rotation-robust convolutional feature extractor (RCFE) and the neighbor-based subcategory centroid alignment (NSCA). The RCFE aims to suppress the impact of rotation changes on remote sensing image representation, while the NSCA aims to decrease the impact of intra-category variety across domains on cross-domain scene classification. The SSCA algorithm and several competitive approaches are validated on two datasets to demonstrate its effectiveness. The results prove that the proposed SSCA approach performs better than most competitive approaches by no less than 2% overall accuracy. Full article
(This article belongs to the Special Issue Deep Transfer Learning for Remote Sensing II)

19 pages, 5897 KiB  
Article
Tracking and Behavior Analysis of Group-Housed Pigs Based on a Multi-Object Tracking Approach
by Shuqin Tu, Jiaying Du, Yun Liang, Yuefei Cao, Weidian Chen, Deqin Xiao and Qiong Huang
Animals 2024, 14(19), 2828; https://doi.org/10.3390/ani14192828 - 30 Sep 2024
Viewed by 486
Abstract
Smart farming technologies to track and analyze pig behaviors in natural environments are critical for monitoring the health status and welfare of pigs. This study aimed to develop a robust multi-object tracking (MOT) approach named YOLOv8 + OC-SORT(V8-Sort) for the automatic monitoring of the different behaviors of group-housed pigs. We addressed common challenges such as variable lighting, occlusion, and clustering between pigs, which often lead to significant errors in long-term behavioral monitoring. Our approach offers a reliable solution for real-time behavior tracking, contributing to improved health and welfare management in smart farming systems. First, the YOLOv8 is employed for the real-time detection and behavior classification of pigs under variable light and occlusion scenes. Second, the OC-SORT is utilized to track each pig to reduce the impact of pigs clustering together and occlusion on tracking. And, when a target is lost during tracking, the OC-SORT can recover the lost trajectory and re-track the target. Finally, to implement the automatic long-time monitoring of behaviors for each pig, we created an automatic behavior analysis algorithm that integrates the behavioral information from detection and the tracking results from OC-SORT. On the one-minute video datasets for pig tracking, the proposed MOT method outperforms JDE, Trackformer, and TransTrack, achieving the highest HOTA, MOTA, and IDF1 scores of 82.0%, 96.3%, and 96.8%, respectively. And, it achieved scores of 69.0% for HOTA, 99.7% for MOTA, and 75.1% for IDF1 on sixty-minute video datasets. In terms of pig behavior analysis, the proposed automatic behavior analysis algorithm can record the duration of four types of behaviors for each pig in each pen based on behavior classification and ID information to represent the pigs’ health status and welfare. These results demonstrate that the proposed method exhibits excellent performance in behavior recognition and tracking, providing technical support for prompt anomaly detection and health status monitoring for pig farming managers. Full article
(This article belongs to the Section Pigs)

21 pages, 9523 KiB  
Article
A Hybrid Framework for Referring Image Segmentation: Dual-Decoder Model with SAM Complementation
by Haoyuan Chen, Sihang Zhou, Kuan Li, Jianping Yin and Jian Huang
Mathematics 2024, 12(19), 3061; https://doi.org/10.3390/math12193061 - 30 Sep 2024
Viewed by 551
Abstract
In the realm of human–robot interaction, the integration of visual and verbal cues has become increasingly significant. This paper focuses on the challenges and advancements in referring image segmentation (RIS), a task that involves segmenting images based on textual descriptions. Traditional approaches to RIS have primarily focused on pixel-level classification. These methods, although effective, often overlook the interconnectedness of pixels, which can be crucial for interpreting complex visual scenes. Furthermore, while the PolyFormer model has shown impressive performance in RIS, its large number of parameters and high training data requirements pose significant challenges. These factors restrict its adaptability and optimization on standard consumer hardware, hindering further enhancements in subsequent research. Addressing these issues, our study introduces a novel two-branch decoder framework with SAM (segment anything model) for RIS. This framework incorporates an MLP decoder and a KAN decoder with a multi-scale feature fusion module, enhancing the model’s capacity to discern fine details within images. The framework’s robustness is further bolstered by an ensemble learning strategy that consolidates the insights from both the MLP and KAN decoder branches. More importantly, we collect the segmentation target edge coordinates and bounding box coordinates as input cues for the SAM model. This strategy leverages SAM’s zero-sample learning capabilities to refine and optimize the segmentation outcomes. Our experimental findings, based on the widely recognized RefCOCO, RefCOCO+, and RefCOCOg datasets, confirm the effectiveness of this method. The results not only achieve state-of-the-art (SOTA) performance in segmentation but are also supported by ablation studies that highlight the contributions of each component to the overall improvement in performance. Full article
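A small sketch of the SAM-complementation step described above, using Meta's segment-anything package: the bounding box and edge coordinates produced by the dual-decoder branch are passed to SAM as prompts to refine the mask. The checkpoint, image path, and coordinates are placeholders.

```python
# Sketch of prompting SAM with the box and edge points produced by the
# dual-decoder branch to refine the referring mask. Checkpoint path, image
# path, and prompt coordinates are placeholders.
import numpy as np
import cv2
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("scene.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

box = np.array([120, 80, 340, 260])               # x0, y0, x1, y1 from the decoder
edge_points = np.array([[130, 90], [330, 250]])   # sampled target-edge coordinates
masks, scores, _ = predictor.predict(
    point_coords=edge_points,
    point_labels=np.ones(len(edge_points), dtype=int),  # 1 = foreground prompt
    box=box,
    multimask_output=False,
)
refined_mask = masks[0]
```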

21 pages, 14147 KiB  
Article
Few-Shot Object Detection for Remote Sensing Imagery Using Segmentation Assistance and Triplet Head
by Jing Zhang, Zhaolong Hong, Xu Chen and Yunsong Li
Remote Sens. 2024, 16(19), 3630; https://doi.org/10.3390/rs16193630 - 29 Sep 2024
Viewed by 1580
Abstract
The emergence of few-shot object detection provides a new approach to address the challenge of poor generalization ability due to data scarcity. Currently, extensive research has been conducted on few-shot object detection in natural scene datasets, and notable progress has been made. However, in the realm of remote sensing, this technology is still lagging behind. Furthermore, many established methods rely on two-stage detectors, prioritizing accuracy over speed, which hinders real-time applications. Considering both detection accuracy and speed, in this paper, we propose a simple few-shot object detection method based on the one-stage detector YOLOv5 with transfer learning. First, we propose a Segmentation Assistance (SA) module to guide the network’s attention toward foreground targets. This module assists in training and enhances detection accuracy without increasing inference time. Second, we design a novel detection head called the Triplet Head (Tri-Head), which employs a dual distillation mechanism to mitigate the issue of forgetting base-class knowledge. Finally, we optimize the classification loss function to emphasize challenging samples. Evaluations on the NWPUv2 and DIOR datasets showcase the method’s superiority. Full article
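The abstract mentions optimizing the classification loss to emphasize challenging samples; a focal-loss-style weighting is one standard way to do this and is sketched below purely as an illustration, not as the paper's exact formulation.

```python
# Illustrative focal-loss-style weighting that down-weights easy examples and
# emphasizes hard ones. The gamma and alpha values are conventional defaults,
# not values taken from the paper.
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    ce = F.cross_entropy(logits, targets, reduction="none")
    p_t = torch.exp(-ce)                 # probability assigned to the true class
    # Well-classified samples (p_t near 1) are down-weighted by (1 - p_t)^gamma.
    return (alpha * (1.0 - p_t) ** gamma * ce).mean()
```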
