Search Results (1,745)

Search Parameters:
Keywords = complex scenes

16 pages, 7515 KiB  
Article
Maneuvering Trajectory Synthetic Aperture Radar Processing Based on the Decomposition of Transfer Functions in the Frequency Domain Using Average Blurred Edge Width Assessment
by Chenguang Yang, Duo Wang, Fukun Sun and Kaizhi Wang
Electronics 2024, 13(20), 4100; https://doi.org/10.3390/electronics13204100 - 17 Oct 2024
Viewed by 236
Abstract
With the rapid development of synthetic aperture radar (SAR), delivery platforms are gradually becoming diversified and miniaturized. The SAR flight process is susceptible to external influences, resulting in unsatisfactory imaging results, so it is necessary to optimize imaging processing in combination with the SAR imaging quality assessment (IQA) index. Based on the principle of SAR imaging, this paper analyzes the impact of defocusing on imaging results caused by mismatched filters and draws on the assessment algorithm of motion blur, proposing a SAR IQA index based on average blurred edge width (ABEW) in the salient area. In addition, the idea of decomposing the transfer function in the frequency domain and fitting the matched filter with a polynomial is also proposed. The estimation of the flight trajectory is changed to a correction of the matched filter, avoiding the precise estimation of Doppler parameters and complex calculations during the time–frequency conversion process. The effectiveness of ABEW was verified by using SAR images of real scenes, and the results were highly consistent with the actual image quality. The imaging process was tested using echo signals generated with errors introduced during the flight, and correcting the filter with ABEW as the index produced more satisfactory imaging results. Full article
(This article belongs to the Special Issue Radar Signal Processing Technology)
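
The blurred-edge-width idea behind the IQA index can be illustrated with a minimal Marziliano-style sketch in Python (an assumed simplification of the paper's salient-area ABEW: edge widths are measured only along image rows of an 8-bit magnitude image):

```python
import cv2
import numpy as np

def average_blurred_edge_width(gray, canny_lo=50, canny_hi=150):
    """Rough no-reference blur score: mean width (in pixels) of the intensity
    ramp around detected edge points, measured along image rows only.
    A simplified stand-in for the paper's salient-area ABEW index."""
    img = gray.astype(np.float32)
    gx = cv2.Sobel(img, cv2.CV_32F, 1, 0, ksize=3)   # horizontal gradient sign
    edges = cv2.Canny(gray, canny_lo, canny_hi)      # gray must be 8-bit
    h, w = img.shape
    widths = []
    for r, c in zip(*np.nonzero(edges)):
        sign = 1.0 if gx[r, c] >= 0 else -1.0
        left = c
        while left > 0 and (img[r, left - 1] - img[r, left]) * sign < 0:
            left -= 1                                # still descending the ramp leftwards
        right = c
        while right < w - 1 and (img[r, right + 1] - img[r, right]) * sign > 0:
            right += 1                               # still climbing the ramp rightwards
        widths.append(right - left)
    return float(np.mean(widths)) if widths else 0.0

# Wider average edges indicate a more defocused image, e.g.:
# score = average_blurred_edge_width(cv2.imread("sar_magnitude.png", cv2.IMREAD_GRAYSCALE))
```

A defocused (mismatched-filter) image yields a larger score, which is the quantity a correction loop of the kind described would try to drive down.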

16 pages, 8612 KiB  
Article
Deep Learning-Based Approximated Observation Sparse SAR Imaging via Complex-Valued Convolutional Neural Network
by Zhongyuan Ji, Lingyu Li and Hui Bi
Remote Sens. 2024, 16(20), 3850; https://doi.org/10.3390/rs16203850 - 16 Oct 2024
Viewed by 240
Abstract
Sparse synthetic aperture radar (SAR) imaging has demonstrated excellent potential in image quality improvement and data compression. However, conventional observation matrix-based methods suffer from high computational overhead, which makes them hard to use for real data processing. The approximated observation sparse SAR imaging method relieves the computation pressure, but it requires manually setting parameters to solve the optimization problem. Thus, several deep learning (DL) SAR imaging methods have been used for scene recovery, but many of them employ dual-path networks. To better leverage the complex-valued characteristics of echo data, in this paper, we present a novel complex-valued convolutional neural network (CNN)-based approximated observation sparse SAR imaging method, which is a single-path DL network. Firstly, we present the approximated observation-based model via the chirp-scaling algorithm (CSA). Next, we map the process of the iterative soft thresholding (IST) algorithm into the deep network form, and design the symmetric complex-valued CNN block to achieve the sparse recovery of large-scale scenes. In comparison to matched filtering (MF), the approximated observation sparse imaging method, and the existing DL SAR imaging methods, our complex-valued network model shows excellent performance in image quality improvement, especially when the used data are down-sampled. Full article
(This article belongs to the Section Remote Sensing Image Processing)
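
The iterative soft thresholding recursion that such networks unroll can be sketched for complex-valued data as below (a generic ISTA loop with abstract forward/adjoint operators; the paper's CSA-based approximated observation and learned thresholds are not reproduced):

```python
import numpy as np

def soft_threshold_complex(x, tau):
    """Complex soft thresholding: shrink the magnitude, keep the phase."""
    mag = np.abs(x)
    return np.where(mag > tau, (1.0 - tau / np.maximum(mag, 1e-12)) * x, 0.0)

def ista_complex(y, A, AH, lam=0.05, mu=0.5, n_iter=100):
    """Generic ISTA for min_x ||y - A(x)||_2^2 + lam*||x||_1 with complex data.
    A and AH are callables for the forward operator and its adjoint; in the
    paper's setting they would correspond to the chirp-scaling-based
    approximated observation and its inverse."""
    x = AH(y)                                   # back-projection as the initial guess
    for _ in range(n_iter):
        x = soft_threshold_complex(x - mu * AH(A(x) - y), mu * lam)
    return x
```

Unrolling fixes the number of iterations as network depth and replaces the hand-set step size and threshold with learned, here complex-valued convolutional, parameters, which is the step the single-path network described above takes.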

22 pages, 17993 KiB  
Article
Research on Global Off-Road Path Planning Based on Improved A* Algorithm
by Zhihong Lv, Li Ni, Hongchun Peng, Kefa Zhou, Dequan Zhao, Guangjun Qu, Weiting Yuan, Yue Gao and Qing Zhang
ISPRS Int. J. Geo-Inf. 2024, 13(10), 362; https://doi.org/10.3390/ijgi13100362 - 16 Oct 2024
Viewed by 264
Abstract
In field driving activities, off-road areas usually lack existing paths that can be directly driven on by ground vehicles, but their surface environments can still satisfy the planning and passage requirements of some off-road vehicles. Additionally, the existing path planning methods face limitations in complex field environments characterized by undulating terrains and diverse land cover types. Therefore, this study introduces an improved A* algorithm and constructs an adapted 3D model of real field scenes. A velocity curve is fitted in the evaluation function to reflect the comprehensive influences of different slopes and land cover types on the traffic speed, and the algorithm not only takes the shortest distance as the basis for selecting extension nodes but also considers the minimum traffic speed. The 8-neighborhood search method of the traditional A* algorithm is improved to a dynamic 14-neighborhood search method, which effectively reduces the number of turning points encountered along the path. In addition, corner thresholds and slope thresholds are incorporated into the algorithm to ensure the accessibility of path planning, and some curves and steep slopes are excluded, thus improving the usability and safety of the path. Experimental results show that this algorithm can carry out global path planning in complex field environments, and the planned path has better passability and a faster speed than those of the existing approaches. Compared with the traditional A* algorithm, the improved algorithm reduces the path length by 23.30%, the number of turning points by 33.16%, and the travel time by 38.92%. This approach is conducive to the smooth progress of various off-road activities and has certain guiding significance for ensuring the efficient and safe operations of vehicles in field environments. Full article
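
The travel-time evaluation function can be illustrated with a compact grid A* sketch (8-neighborhood only; the dynamic 14-neighborhood search, corner/slope thresholds, and the fitted velocity curve are condensed into an assumed per-cell speed value):

```python
import heapq
import math

def astar_travel_time(speed_map, start, goal):
    """Grid A* that minimizes travel time instead of distance: each step costs
    distance / local speed. speed_map[r][c] is an assumed traversal speed
    derived from slope and land cover; 0 marks impassable cells."""
    rows, cols = len(speed_map), len(speed_map[0])
    vmax = max(max(row) for row in speed_map)

    def h(n):  # admissible heuristic: straight-line distance at the best possible speed
        return math.hypot(n[0] - goal[0], n[1] - goal[1]) / vmax

    counter = 0                      # tie-breaker so heap entries never compare nodes
    frontier = [(h(start), counter, 0.0, start, None)]
    came_from, best_g = {}, {start: 0.0}
    while frontier:
        _, _, g, node, parent = heapq.heappop(frontier)
        if node in came_from:
            continue                 # already expanded with an equal or better cost
        came_from[node] = parent
        if node == goal:
            break
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = node[0] + dr, node[1] + dc
                if (dr, dc) == (0, 0) or not (0 <= nr < rows and 0 <= nc < cols):
                    continue
                if speed_map[nr][nc] == 0:
                    continue
                ng = g + math.hypot(dr, dc) / speed_map[nr][nc]
                if ng < best_g.get((nr, nc), math.inf):
                    best_g[(nr, nc)] = ng
                    counter += 1
                    heapq.heappush(frontier, (ng + h((nr, nc)), counter, ng, (nr, nc), node))
    path, node = [], goal            # walk parents back from the goal
    while node is not None:
        path.append(node)
        node = came_from.get(node)
    return path[::-1]

# Toy map: open ground (fast), a slow steep band, and one impassable cell.
speed_map = [[2.0, 2.0, 2.0, 2.0],
             [2.0, 0.5, 0.0, 2.0],
             [2.0, 0.5, 0.5, 2.0],
             [2.0, 2.0, 2.0, 2.0]]
print(astar_travel_time(speed_map, (0, 0), (3, 3)))
```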

22 pages, 6362 KiB  
Article
CGADNet: A Lightweight, Real-Time, and Robust Crosswalk and Guide Arrow Detection Network for Complex Scenes
by Guangxing Wang, Tao Lin, Xiwei Dong, Longchun Wang, Qingming Leng and Seong-Yoon Shin
Appl. Sci. 2024, 14(20), 9445; https://doi.org/10.3390/app14209445 - 16 Oct 2024
Viewed by 469
Abstract
In the context of edge environments with constrained resources, realizing real-time and robust crosswalk and guide arrow detection poses a significant challenge for autonomous driving systems. This paper proposes a crosswalk and guide arrow detection network (CGADNet), a lightweight visual neural network derived from YOLOv8. Specifically designed for the swift and accurate detection of crosswalks and guide arrows within the field of view of the vehicle, the CGADNet can seamlessly be implemented on the Jetson Orin Nano device to achieve real-time processing. In this study, we incorporated a novel C2f_Van module based on VanillaBlock, employed depth-separable convolution to reduce the parameters efficiently, utilized partial convolution (PConv) for lightweight FasterDetect, and utilized a bounding box regression loss with a dynamic focusing mechanism (WIoUv3) to enhance the detection performance. In complex scenarios, the proposed method maintained the stability of the mAP@0.5 and achieved a 4.1% improvement in the mAP@0.5:0.95. The network parameters, floating point operations (FLOPs), and weights were reduced by 63.81%, 70.07%, and 63.11%, respectively. Ultimately, a detection speed of 50.35 FPS was achieved on the Jetson Orin Nano. This research provides practical methodologies for deploying crosswalk and guide arrow detection networks on edge computing devices. Full article
(This article belongs to the Special Issue Future Information & Communication Engineering 2024)
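
Depth-separable convolution, one of the parameter-reduction measures listed above, can be sketched in PyTorch as a generic block (not the exact C2f_Van or FasterDetect layout):

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise 3x3 convolution followed by a pointwise 1x1 convolution.
    Parameter count drops from k*k*Cin*Cout to k*k*Cin + Cin*Cout."""
    def __init__(self, c_in, c_out, stride=1):
        super().__init__()
        self.depthwise = nn.Conv2d(c_in, c_in, 3, stride, 1, groups=c_in, bias=False)
        self.pointwise = nn.Conv2d(c_in, c_out, 1, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

# A 256->256 block needs roughly 68k parameters instead of ~590k for a plain 3x3 conv.
x = torch.randn(1, 256, 40, 40)
print(DepthwiseSeparableConv(256, 256)(x).shape)  # torch.Size([1, 256, 40, 40])
```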

17 pages, 6135 KiB  
Article
Research on Improved Image Segmentation Algorithm Based on GrabCut
by Shangzhen Pang, Tzer Hwai Gilbert Thio, Fei Lu Siaw, Mingju Chen and Yule Xia
Electronics 2024, 13(20), 4068; https://doi.org/10.3390/electronics13204068 - 16 Oct 2024
Viewed by 250
Abstract
The classic interactive image segmentation algorithm GrabCut achieves segmentation through iterative optimization. However, GrabCut requires multiple iterations, resulting in slower performance. Moreover, relying solely on a rectangular bounding box can sometimes lead to inaccuracies, especially when dealing with complex shapes or intricate object boundaries. To address these issues in GrabCut, an improvement is introduced by incorporating appearance overlap terms to optimize segmentation energy function, thereby achieving optimal segmentation results in a single iteration. This enhancement significantly reduces computational costs while improving the overall segmentation speed without compromising accuracy. Additionally, users can directly provide seed points on the image to more accurately indicate foreground and background regions, rather than relying solely on a bounding box. This interactive approach not only enhances the algorithm’s ability to accurately segment complex objects but also simplifies the user experience. We evaluate the experimental results through qualitative and quantitative analysis. In qualitative analysis, improvements in segmentation accuracy are visibly demonstrated through segmented images and residual segmentation results. In quantitative analysis, the improved algorithm outperforms GrabCut and min_cut algorithms in processing speed. When dealing with scenes where complex objects or foreground objects are very similar to the background, the improved algorithm will display more stable segmentation results. Full article
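
Seed-point interaction of the kind described, marking definite foreground and background strokes in addition to the rectangle, can be sketched with OpenCV's stock GrabCut (the standard iterative version, not the paper's single-iteration energy with appearance-overlap terms):

```python
import cv2
import numpy as np

def grabcut_with_seeds(img, rect, fg_points, bg_points, iters=5):
    """Run OpenCV GrabCut initialized from a rectangle (x, y, w, h), then refine
    with user seed points marked as definite foreground / background."""
    mask = np.zeros(img.shape[:2], np.uint8)
    bgd = np.zeros((1, 65), np.float64)
    fgd = np.zeros((1, 65), np.float64)
    cv2.grabCut(img, mask, rect, bgd, fgd, iters, cv2.GC_INIT_WITH_RECT)
    for x, y in fg_points:
        cv2.circle(mask, (x, y), 3, cv2.GC_FGD, -1)   # definite foreground seeds
    for x, y in bg_points:
        cv2.circle(mask, (x, y), 3, cv2.GC_BGD, -1)   # definite background seeds
    cv2.grabCut(img, mask, None, bgd, fgd, iters, cv2.GC_INIT_WITH_MASK)
    fg = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 1, 0).astype(np.uint8)
    return img * fg[:, :, None]
```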

18 pages, 14966 KiB  
Article
UNeXt: An Efficient Network for the Semantic Segmentation of High-Resolution Remote Sensing Images
by Zhanyuan Chang, Mingyu Xu, Yuwen Wei, Jie Lian, Chongming Zhang and Chuanjiang Li
Sensors 2024, 24(20), 6655; https://doi.org/10.3390/s24206655 - 16 Oct 2024
Viewed by 282
Abstract
The application of deep neural networks for the semantic segmentation of remote sensing images is a significant research area within the field of the intelligent interpretation of remote sensing data. The semantic segmentation of remote sensing images holds great practical value in urban planning, disaster assessment, the estimation of carbon sinks, and other related fields. With the continuous advancement of remote sensing technology, the spatial resolution of remote sensing images is gradually increasing. This increase in resolution brings about challenges such as significant changes in the scale of ground objects, redundant information, and irregular shapes within remote sensing images. Current methods leverage Transformers to capture global long-range dependencies. However, the use of Transformers introduces higher computational complexity and is prone to losing local details. In this paper, we propose UNeXt (UNet+ConvNeXt+Transformer), a real-time semantic segmentation model tailored for high-resolution remote sensing images. To achieve efficient segmentation, UNeXt uses the lightweight ConvNeXt-T as the encoder and a lightweight decoder, Transnext, which combines a Transformer and CNN (Convolutional Neural Networks) to capture global information while avoiding the loss of local details. Furthermore, in order to more effectively utilize spatial and channel information, we propose a SCFB (SC Feature Fuse Block) to reduce computational complexity while enhancing the model’s recognition of complex scenes. A series of ablation experiments and comprehensive comparative experiments demonstrate that our method not only runs faster than state-of-the-art (SOTA) lightweight models but also achieves higher accuracy. Specifically, our proposed UNeXt achieves 85.2% and 82.9% mIoUs on the Vaihingen and Gaofen5 (GID5) datasets, respectively, while maintaining 97 fps for 512 × 512 inputs on a single NVIDIA GTX 4090 GPU, outperforming other SOTA methods. Full article
(This article belongs to the Section Remote Sensors)

22 pages, 1174 KiB  
Article
Dual Stream Encoder–Decoder Architecture with Feature Fusion Model for Underwater Object Detection
by Mehvish Nissar, Amit Kumar Mishra and Badri Narayan Subudhi
Mathematics 2024, 12(20), 3227; https://doi.org/10.3390/math12203227 (registering DOI) - 15 Oct 2024
Viewed by 357
Abstract
Underwater surveillance is an emerging and fascinating exploratory domain, particularly in monitoring aquatic ecosystems. This field offers valuable insights into underwater behavior and activities, which have broad applications across various domains. Specifically, underwater surveillance involves detecting and tracking moving objects within aquatic environments. However, the complex properties of water make object detection a challenging task. Background subtraction is a commonly employed technique for detecting local changes in video scenes by segmenting images into the background and foreground to isolate the object of interest. Within this context, we propose an innovative dual-stream encoder–decoder framework based on the VGG-16 and ResNet-50 models for detecting moving objects in underwater frames. The network includes a feature fusion module that effectively extracts multi-level features. Using a limited set of images and performing training in an end-to-end manner, the proposed framework yields accurate results without post-processing. The efficacy of the proposed technique is confirmed through visual and quantitative comparisons with eight cutting-edge methods using two standard databases. The first one employed in our experiments is the Underwater Change Detection Dataset, which includes five challenges, each challenge comprising approximately 1000 frames. The categories in this dataset were recorded under various underwater conditions. The second dataset used for practical analysis is the Fish4Knowledge dataset, where we considered five challenges. Each category, recorded in different aquatic settings, contains a varying number of frames, typically exceeding 1000 per category. Our proposed method surpasses all methods used for comparison by attaining an average F-measure of 0.98 on the Underwater Change Detection Dataset and 0.89 on the Fish4Knowledge dataset. Full article
(This article belongs to the Section Mathematics and Computer Science)

20 pages, 14331 KiB  
Article
Stable Walking of a Biped Robot Controlled by Central Pattern Generator Using Multivariate Linear Mapping
by Yao Wu, Biao Tang, Jiawei Tang, Shuo Qiao, Xiaobing Pang and Lei Guo
Biomimetics 2024, 9(10), 626; https://doi.org/10.3390/biomimetics9100626 (registering DOI) - 15 Oct 2024
Viewed by 362
Abstract
In order to improve the walking stability of a biped robot in multiple scenarios and reduce the complexity of the Central Pattern Generator (CPG) model, a new CPG walking controller based on multivariate linear mapping was proposed. At first, in order to establish a dynamics model, the lower limb mechanical structure of the biped robot was designed. According to the Lagrange and angular momentum conservation method, the hybrid dynamic model of the biped robot was established. The initial value of the robot’s passive walking was found by means of Poincaré mapping and cell mapping methods. Then, a multivariate linear mapping model was established to form a new lightweight CPG model based on a Hopf oscillator. According to the parameter distribution of the new CPG model, a preliminary parameter-tuning idea was proposed. At last, the joint simulation of MATLAB and V-REP shows that the biped robot based on the new CPG control has a stable periodic gait in flat and uphill scenes. The proposed method could improve the stability and versatility of bipedal walking in various environments and can provide general CPG generation and a tuning method reference for robotics scholars. Full article
(This article belongs to the Section Locomotion and Bioinspired Robotics)
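
The rhythmic core of such a controller can be illustrated by integrating a single Hopf oscillator (the multivariate linear mapping onto joint trajectories and the parameter-tuning procedure are assumed away):

```python
import numpy as np

def hopf_cpg(alpha=10.0, mu=1.0, omega=2 * np.pi, dt=0.001, t_end=5.0):
    """Integrate one Hopf oscillator with forward Euler; it converges to a limit
    cycle of radius sqrt(mu) and angular frequency omega, giving the smooth
    rhythmic signal a CPG controller maps to joint angles."""
    steps = int(t_end / dt)
    x, y = 0.1, 0.0                    # small offset so the state leaves the origin
    xs = np.empty(steps)
    for k in range(steps):
        r2 = x * x + y * y
        dx = alpha * (mu - r2) * x - omega * y
        dy = alpha * (mu - r2) * y + omega * x
        x, y = x + dx * dt, y + dy * dt
        xs[k] = x
    return xs

signal = hopf_cpg()   # rhythmic output; amplitude -> sqrt(mu) = 1, period -> 1 s
```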

25 pages, 19372 KiB  
Article
TSAE-UNet: A Novel Network for Multi-Scene and Multi-Temporal Water Body Detection Based on Spatiotemporal Feature Extraction
by Shuai Wang, Yu Chen, Yafei Yuan, Xinlong Chen, Jinze Tian, Xiaolong Tian and Huibin Cheng
Remote Sens. 2024, 16(20), 3829; https://doi.org/10.3390/rs16203829 (registering DOI) - 15 Oct 2024
Viewed by 364
Abstract
The application of remote sensing technology in water body detection has become increasingly widespread, offering significant value for environmental monitoring, hydrological research, and disaster early warning. However, the existing methods face challenges in multi-scene and multi-temporal water body detection, including the diverse variations in water body shapes and sizes that complicate detection; the complexity of land cover types, which easily leads to false positives and missed detections; the high cost of acquiring high-resolution images, limiting long-term applications; and the lack of effective handling of multi-temporal data, making it difficult to capture the dynamic changes in water bodies. To address these challenges, this study proposes a novel network for multi-scene and multi-temporal water body detection based on spatiotemporal feature extraction, named TSAE-UNet. TSAE-UNet integrates convolutional neural networks (CNN), depthwise separable convolutions, ConvLSTM, and attention mechanisms, significantly improving the accuracy and robustness of water body detection by capturing multi-scale features and establishing long-term dependencies. The Otsu method was employed to quickly process Sentinel-1A and Sentinel-2 images, generating a high-quality training dataset. In the first experiment, five rectangular areas of approximately 37.5 km2 each were selected to validate the water body detection performance of the TSAE-UNet model across different scenes. The second experiment focused on Jining City, Shandong Province, China, analyzing the monthly water body changes from 2020 to 2022 and the quarterly changes in 2022. The experimental results demonstrate that TSAE-UNet excels in multi-scene and long-term water body detection, achieving a precision of 0.989, a recall of 0.983, an F1 score of 0.986, and an IoU of 0.974, significantly outperforming FCN, PSPNet, DeepLabV3+, ADCNN, and MECNet. Full article
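
The Otsu labeling step used to build the training set can be sketched as follows (thresholding one Sentinel-1 backscatter band in decibels; the band choice and any speckle filtering are assumptions, not the paper's exact preprocessing):

```python
import numpy as np
from skimage.filters import threshold_otsu

def otsu_water_mask(vv_db):
    """Water appears dark (low backscatter) in Sentinel-1 VV imagery, so pixels
    below the Otsu threshold are labeled as water (1) and the rest as land (0)."""
    valid = np.isfinite(vv_db)
    t = threshold_otsu(vv_db[valid])
    mask = np.zeros(vv_db.shape, dtype=np.uint8)
    mask[valid & (vv_db < t)] = 1
    return mask

# e.g. mask = otsu_water_mask(vv_band_db); use as a pseudo-label for training TSAE-UNet-style models
```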

21 pages, 7110 KiB  
Article
Pose Tracking and Object Reconstruction Based on Occlusion Relationships in Complex Environments
by Xi Zhao, Yuekun Zhang and Yaqing Zhou
Appl. Sci. 2024, 14(20), 9355; https://doi.org/10.3390/app14209355 - 14 Oct 2024
Viewed by 356
Abstract
For the reconstruction of objects during hand–object interactions, accurate pose estimation is indispensable. By improving the precision of pose estimation, the accuracy of the 3D reconstruction results can be enhanced. Recently, pose tracking techniques are no longer limited to individual objects, leading to advancements in the reconstruction of objects interacting with other objects. However, most methods struggle to handle incomplete target information in complex scenes and mutual interference between objects in the environment, leading to a decrease in pose estimation accuracy. We proposed an improved algorithm building upon the existing BundleSDF framework, which enables more robust and accurate tracking by considering the occlusion relationships between objects. First of all, for detecting changes in occlusion relationships, we segment the target and compute dual-layer masks. Secondly, rough pose estimation is performed through feature matching, and a keyframe pool is introduced for pose optimization, which is maintained based on occlusion relationships. Lastly, the estimated results of historical frames are used to train an object neural field to assist in the subsequent pose-tracking process. Experimental verification shows that on the HO-3D dataset, our method can significantly improve the accuracy and robustness of object tracking in frequent interactions, providing new ideas for object pose-tracking tasks in complex scenes. Full article
(This article belongs to the Special Issue Technical Advances in 3D Reconstruction)

21 pages, 4510 KiB  
Article
Pedestrian Trajectory Prediction in Crowded Environments Using Social Attention Graph Neural Networks
by Mengya Zong, Yuchen Chang, Yutian Dang and Kaiping Wang
Appl. Sci. 2024, 14(20), 9349; https://doi.org/10.3390/app14209349 - 14 Oct 2024
Viewed by 560
Abstract
Trajectory prediction is a key component in the development of applications such as mixed urban traffic management and public safety. Traditional models have struggled with the complexity of modeling dynamic crowd interactions, the intricacies of spatiotemporal dependencies, and environmental constraints. Addressing these challenges, this paper introduces the innovative Social Attention Graph Neural Network (SA-GAT) framework. Utilizing Long Short-Term Memory (LSTM) networks, SA-GAT encodes pedestrian trajectory data to extract temporal correlations, while Graph Attention Networks (GAT) are employed to precisely capture the subtle interactions among pedestrians. The SA-GAT framework boosts its predictive accuracy with two key innovations. First, it features a Scene Potential Module that utilizes a Scene Tensor to dynamically capture the interplay between crowds and their environment. Second, it incorporates a Transition Intention Module with a Transition Tensor, which interprets latent transfer probabilities from trajectory data to reveal pedestrians’ implicit intentions at specific locations. Based on AnyLogic modeling of the metro station on Line 10 of Chengdu Shuangliu Airport, China, numerical studies reveal that the SA-GAT model achieves a substantial reduction in ADE and FDE metrics by 34.22% and 38.04% compared to baseline models. Full article

25 pages, 6736 KiB  
Article
LFIR-YOLO: Lightweight Model for Infrared Vehicle and Pedestrian Detection
by Quan Wang, Fengyuan Liu, Yi Cao, Farhan Ullah and Muxiong Zhou
Sensors 2024, 24(20), 6609; https://doi.org/10.3390/s24206609 - 14 Oct 2024
Viewed by 512
Abstract
The complexity of urban road scenes at night and the inadequacy of visible light imaging in such conditions pose significant challenges. To address the issues of insufficient color information, texture detail, and low spatial resolution in infrared imagery, we propose an enhanced infrared detection model called LFIR-YOLO, which is built upon the YOLOv8 architecture. The primary goal is to improve the accuracy of infrared target detection in nighttime traffic scenarios while meeting practical deployment requirements. First, to address challenges such as limited contrast and occlusion noise in infrared images, the C2f module in the high-level backbone network is augmented with a Dilation-wise Residual (DWR) module, incorporating multi-scale infrared contextual information to enhance feature extraction capabilities. Secondly, at the neck of the network, a Content-guided Attention (CGA) mechanism is applied to fuse features and re-modulate both initial and advanced features, catering to the low signal-to-noise ratio and sparse detail features characteristic of infrared images. Third, a shared convolution strategy is employed in the detection head, replacing the decoupled head strategy and utilizing shared Detail Enhancement Convolution (DEConv) and Group Norm (GN) operations to achieve lightweight yet precise improvements. Finally, loss functions, PIoU v2 and Adaptive Threshold Focal Loss (ATFL), are integrated into the model to better decouple infrared targets from the background and to enhance convergence speed. The experimental results on the FLIR and multispectral datasets show that the proposed LFIR-YOLO model achieves an improvement in detection accuracy of 4.3% and 2.6%, respectively, compared to the YOLOv8 model. Furthermore, the model demonstrates a reduction in parameters and computational complexity by 15.5% and 34%, respectively, enhancing its suitability for real-time deployment on resource-constrained edge devices. Full article
(This article belongs to the Section Sensing and Imaging)

17 pages, 2043 KiB  
Article
Rethinking the Non-Maximum Suppression Step in 3D Object Detection from a Bird’s-Eye View
by Bohao Li, Shaojing Song and Luxia Ai
Electronics 2024, 13(20), 4034; https://doi.org/10.3390/electronics13204034 - 14 Oct 2024
Viewed by 348
Abstract
In camera-based bird’s-eye view (BEV) 3D object detection, non-maximum suppression (NMS) plays a crucial role. However, traditional NMS methods become ineffective in BEV scenarios where the predicted bounding boxes of small object instances often have no overlapping areas. To address this issue, this paper proposes a BEV intersection over union (IoU) computation method based on relative position and absolute spatial information, referred to as B-IoU. Additionally, a BEV circular search method, called B-Grouping, is introduced to handle prediction boxes of varying scales. Utilizing these two methods, a novel NMS strategy called BEV-NMS is developed to handle the complex prediction boxes in BEV perspectives. This BEV-NMS strategy is implemented in several existing algorithms. Based on the results from the nuScenes validation set, there was an average increase of 7.9% in mAP when compared to the strategy without NMS. The NDS also showed an average increase of 7.9% under the same comparison. Furthermore, compared to the Scale-NMS strategy, the mAP increased by an average of 3.4%, and the NDS saw an average improvement of 3.1%. Full article
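
The motivation for B-IoU, that conventional IoU-based suppression fails when small BEV boxes never overlap, can be illustrated with a minimal center-distance NMS sketch (greedy suppression by BEV center distance; the paper's B-IoU and B-Grouping computations are not reproduced):

```python
import numpy as np

def bev_distance_nms(centers, scores, radius=1.0):
    """Greedy NMS in the BEV plane using center distance instead of box IoU:
    a lower-scoring detection is suppressed if its BEV center lies within
    `radius` meters of an already kept, higher-scoring detection."""
    order = np.argsort(-scores)
    kept = []
    for i in order:
        if all(np.linalg.norm(centers[i] - centers[j]) > radius for j in kept):
            kept.append(int(i))
    return kept

centers = np.array([[10.0, 3.0], [10.3, 3.2], [25.0, -4.0]])   # x, y in meters
scores = np.array([0.9, 0.6, 0.8])
print(bev_distance_nms(centers, scores))   # [0, 2]: the 0.6 duplicate is suppressed
```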

19 pages, 12366 KiB  
Article
An Effective Yak Behavior Classification Model with Improved YOLO-Pose Network Using Yak Skeleton Key Points Images
by Yuxiang Yang, Yifan Deng, Jiazhou Li, Meiqi Liu, Yao Yao, Zhaoyuan Peng, Luhui Gu and Yingqi Peng
Agriculture 2024, 14(10), 1796; https://doi.org/10.3390/agriculture14101796 - 12 Oct 2024
Viewed by 437
Abstract
Yak behavior is a valuable indicator of their welfare and health. Information about important statuses, including fattening, reproductive health, and diseases, can be reflected and monitored through several indicative behavior patterns. In this study, an improved YOLOv7-pose model was developed to detect six yak behavior patterns in real time using labeled yak key-point images. The model was trained using labeled key-point image data of six behavior patterns including walking, feeding, standing, lying, mounting, and eliminative behaviors collected from seventeen 18-month-old yaks for two weeks. Another four YOLOv7-pose series models were trained as comparison methods for yak behavior pattern detection. The improved YOLOv7-pose model achieved the best detection performance with precision, recall, mAP0.5, and mAP0.5:0.95 of 89.9%, 87.7%, 90.4%, and 76.7%, respectively. The limitation of this study is that the YOLOv7-pose model detected behaviors under complex conditions, such as scene variation, subtle leg postures, and different light conditions, with relatively lower precision, which impacts its detection performance. Future developments in yak behavior pattern detection will enlarge the sample size of the dataset and will utilize data streams such as optical and video streams for real-time yak monitoring. Additionally, the model will be deployed on edge computing devices for large-scale agricultural applications. Full article

21 pages, 3845 KiB  
Article
Semantic Segmentation of Satellite Images for Landslide Detection Using Foreground-Aware and Multi-Scale Convolutional Attention Mechanism
by Chih-Chang Yu, Yuan-Di Chen, Hsu-Yung Cheng and Chi-Lun Jiang
Sensors 2024, 24(20), 6539; https://doi.org/10.3390/s24206539 - 10 Oct 2024
Viewed by 280
Abstract
Advancements in satellite and aerial imagery technology have made it easier to obtain high-resolution remote sensing images, leading to widespread research and applications in various fields. Remote sensing image semantic segmentation is a crucial task that provides semantic and localization information for target objects. In addition to the large-scale variation issues common in most semantic segmentation datasets, aerial images present unique challenges, including high background complexity and imbalanced foreground–background ratios. However, general semantic segmentation methods primarily address scale variations in natural scenes and often neglect the specific challenges in remote sensing images, such as inadequate foreground modeling. In this paper, we present a foreground-aware remote sensing semantic segmentation model. The model introduces a multi-scale convolutional attention mechanism and utilizes a feature pyramid network architecture to extract multi-scale features, addressing the multi-scale problem. Additionally, we introduce a Foreground–Scene Relation Module to mitigate false alarms. The model enhances the foreground features by modeling the relationship between the foreground and the scene. In the loss function, a Soft Focal Loss is employed to focus on foreground samples during training, alleviating the foreground–background imbalance issue. Experimental results indicate that our proposed method outperforms current state-of-the-art general semantic segmentation methods and transformer-based methods on the LS dataset benchmark. Full article
(This article belongs to the Section Sensing and Imaging)
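
The foreground-focusing idea in the loss can be illustrated with a standard binary focal-loss sketch in PyTorch (the ordinary focal loss; the paper's Soft Focal Loss variant is not reproduced):

```python
import torch
import torch.nn.functional as F

def binary_focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Focal loss down-weights easy (well-classified) pixels so training
    concentrates on hard foreground samples, easing class imbalance."""
    p = torch.sigmoid(logits)
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = p * targets + (1 - p) * (1 - targets)              # probability of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)  # per-class weighting
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()

# e.g. per-pixel logits and {0,1} masks of shape (B, 1, H, W)
loss = binary_focal_loss(torch.randn(2, 1, 64, 64),
                         torch.randint(0, 2, (2, 1, 64, 64)).float())
```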
