Search Results (620)

Search Parameters:
Keywords = frame fusion

17 pages, 6829 KiB  
Article
Lightweight Deep Learning Model for Fire Classification in Tunnels
by Shakhnoza Muksimova, Sabina Umirzakova, Jushkin Baltayev and Young-Im Cho
Fire 2025, 8(3), 85; https://doi.org/10.3390/fire8030085 - 20 Feb 2025
Abstract
Tunnel fires pose a severe threat to human safety and infrastructure, necessitating the development of advanced and efficient fire detection systems. This paper presents a novel lightweight deep learning (DL) model specifically designed for real-time fire classification in tunnel environments. This model integrates MobileNetV3 for spatial feature extraction, Temporal Convolutional Networks (TCNs) for temporal sequence analysis, and advanced attention mechanisms, including Convolutional Block Attention Modules (CBAMs) and Squeeze-and-Excitation (SE) blocks, to prioritize critical features such as flames and smoke patterns while suppressing irrelevant noise. The model is trained on a newly prepared custom dataset containing real tunnel fire incidents. This approach enhances the model's generalization capabilities, enabling it to handle diverse fire scenarios, including those with low visibility, high smoke density, and variable ventilation conditions. Deployment optimizations, such as quantization and layer fusion, ensure computational efficiency, achieving an average inference time of 12 ms/frame and making it suitable for resource-constrained environments like IoT and edge devices. The experimental results demonstrate that the proposed model achieves an accuracy of 96.5%, a precision of 95.7%, and a recall of 97.2%, significantly outperforming state-of-the-art (SOTA) models such as ResNet50 and YOLOv5 in both accuracy and real-time performance. Robustness tests under challenging conditions validate the model's reliability and adaptability, marking it as a critical advancement in tunnel fire detection systems. This study provides valuable insights into the design and deployment of efficient fire classification systems for safety-critical applications. The proposed model offers a scalable, high-performance solution for tunnel fire monitoring and establishes a benchmark for future research in real-time video-based classification under complex environmental conditions. Full article
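
The abstract names Squeeze-and-Excitation (SE) blocks among its attention components; below is a minimal PyTorch sketch of a standard SE block (the channel count and reduction ratio are illustrative assumptions, not values from the paper).

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Standard Squeeze-and-Excitation block: reweights channels by global context."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # squeeze: global spatial average
        self.fc = nn.Sequential(                     # excitation: bottleneck MLP
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                 # per-channel reweighting

# Example: reweight a batch of per-frame features before temporal (TCN) modeling.
feats = torch.randn(2, 64, 32, 32)
print(SEBlock(64)(feats).shape)   # torch.Size([2, 64, 32, 32])
```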

21 pages, 4811 KiB  
Article
YOLO-AMM: A Real-Time Classroom Behavior Detection Algorithm Based on Multi-Dimensional Feature Optimization
by Yi Cao, Qian Cao, Chengshan Qian and Deji Chen
Sensors 2025, 25(4), 1142; https://doi.org/10.3390/s25041142 - 13 Feb 2025
Abstract
Classroom behavior detection is a key task in constructing intelligent educational environments. However, the existing models are still deficient in detail feature capture capability, multi-layer feature correlation, and multi-scale target adaptability, making it challenging to realize high-precision real-time detection in complex scenes. This paper proposes an improved classroom behavior detection algorithm, YOLO-AMM, to solve these problems. Firstly, we constructed the Adaptive Efficient Feature Fusion (AEFF) module to enhance the fusion of semantic information between different features and improve the model’s ability to capture detailed features. Then, we designed a Multi-dimensional Feature Flow Network (MFFN), which fuses multi-dimensional features and enhances the correlation information between features through the multi-scale feature aggregation module and contextual information diffusion mechanism. Finally, we proposed a Multi-Scale Perception and Fusion Detection Head (MSPF-Head), which significantly improves the adaptability of the head to different scale targets by introducing multi-scale feature perception, feature interaction, and fusion mechanisms. The experimental results showed that compared with the YOLOv8n model, YOLO-AMM improved the mAP@0.5 and mAP@0.5-0.95 by 3.1% and 4.0%, significantly improving the detection accuracy. Meanwhile, YOLO-AMM increased the detection speed (FPS) by 12.9 frames per second to 169.1 frames per second, which meets the requirement for real-time detection of classroom behavior. Full article
(This article belongs to the Special Issue Sensor-Based Behavioral Biometrics)
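
The Adaptive Efficient Feature Fusion (AEFF) module is not specified here; as a rough illustration of adaptive feature fusion, the sketch below combines two feature maps with learnable, normalized weights (BiFPN-style), which is an assumption rather than the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveWeightedFusion(nn.Module):
    """Fuses two same-shaped feature maps with learnable, normalized weights
    ('fast normalized fusion'); a stand-in for the general AEFF idea."""
    def __init__(self, channels: int):
        super().__init__()
        self.w = nn.Parameter(torch.ones(2))          # one scalar weight per input
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        w = F.relu(self.w)
        w = w / (w.sum() + 1e-4)                      # normalize so weights sum to ~1
        return self.conv(w[0] * a + w[1] * b)

fused = AdaptiveWeightedFusion(128)(torch.randn(1, 128, 40, 40),
                                    torch.randn(1, 128, 40, 40))
print(fused.shape)   # torch.Size([1, 128, 40, 40])
```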

18 pages, 39910 KiB  
Article
DyGS-SLAM: Realistic Map Reconstruction in Dynamic Scenes Based on Double-Constrained Visual SLAM
by Fan Zhu, Yifan Zhao, Ziyu Chen, Chunmao Jiang, Hui Zhu and Xiaoxi Hu
Remote Sens. 2025, 17(4), 625; https://doi.org/10.3390/rs17040625 - 12 Feb 2025
Abstract
Visual SLAM is widely applied in robotics and remote sensing. The fusion of Gaussian radiance fields and Visual SLAM has demonstrated astonishing efficacy in constructing high-quality dense maps. While existing methods perform well in static scenes, they are prone to the influence of dynamic objects in real-world dynamic environments, thus making robust tracking and mapping challenging. We introduce DyGS-SLAM, a Visual SLAM system that employs dual constraints to achieve high-fidelity static map reconstruction in dynamic environments. We extract ORB features within the scene, and use open-world semantic segmentation models and multi-view geometry to construct dual constraints, forming a zero-shot dynamic information elimination module while recovering backgrounds occluded by dynamic objects. Furthermore, we select high-quality keyframes and use them for loop closure detection and global optimization, constructing a foundational Gaussian map through a set of determined point clouds and poses and integrating repaired frames for rendering new viewpoints and optimizing 3D scenes. Experimental results on the TUM RGB-D, Bonn, and Replica datasets, as well as real scenes, demonstrate that our method has excellent localization accuracy and mapping quality in dynamic scenes. Full article
(This article belongs to the Special Issue 3D Scene Reconstruction, Modeling and Analysis Using Remote Sensing)
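
As a simplified illustration of the dual-constraint idea (semantic segmentation plus multi-view geometry), the sketch below drops keypoints that either fall inside a movable-object mask or show a large reprojection error; the array layouts and threshold are assumptions, not taken from the paper.

```python
import numpy as np

def filter_dynamic_keypoints(kps, seg_mask, reproj_err, err_thresh=2.0):
    """Drop keypoints that are inside a segmented movable-object region (semantic
    constraint) or violate multi-view geometry via a large reprojection error
    (geometric constraint).

    kps        : (N, 2) array of pixel coordinates (x, y)
    seg_mask   : (H, W) bool array, True where a potentially dynamic object is segmented
    reproj_err : (N,) reprojection error of each point against the previous frame's pose
    """
    xs = kps[:, 0].astype(int)
    ys = kps[:, 1].astype(int)
    semantic_dynamic = seg_mask[ys, xs]             # inside a movable-object mask
    geometric_dynamic = reproj_err > err_thresh     # inconsistent with a static scene
    keep = ~(semantic_dynamic | geometric_dynamic)  # keep points passing both constraints
    return kps[keep]

kps = np.array([[10, 12], [50, 60], [100, 5]], dtype=float)
mask = np.zeros((120, 160), dtype=bool); mask[55:70, 45:60] = True
err = np.array([0.5, 0.8, 4.1])
print(filter_dynamic_keypoints(kps, mask, err))     # only the first point survives
```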

23 pages, 5392 KiB  
Article
A Sliding Window-Based CNN-BiGRU Approach for Human Skeletal Pose Estimation Using mmWave Radar
by Yuquan Luo, Yuqiang He, Yaxin Li, Huaiqiang Liu, Jun Wang and Fei Gao
Sensors 2025, 25(4), 1070; https://doi.org/10.3390/s25041070 - 11 Feb 2025
Abstract
In this paper, we present a low-cost, low-power millimeter-wave (mmWave) skeletal joint localization system. High-quality point cloud data are generated using the self-developed BHYY_MMW6044 59–64 GHz mmWave radar device. A sliding window mechanism is introduced to extend the single-frame point cloud into multi-frame time-series data, enabling the full utilization of temporal information. This is combined with convolutional neural networks (CNNs) for spatial feature extraction and a bidirectional gated recurrent unit (BiGRU) for temporal modeling. The proposed spatio-temporal information fusion framework for multi-frame point cloud data fully exploits spatio-temporal features, effectively alleviates the sparsity issue of radar point clouds, and significantly enhances the accuracy and robustness of pose estimation. Experimental results demonstrate that the proposed system accurately detects 25 skeletal joints, particularly improving the positioning accuracy of fine joints, such as the wrist, thumb, and fingertip, highlighting its potential for widespread application in human–computer interaction, intelligent monitoring, and motion analysis. Full article
(This article belongs to the Section Radar Sensors)
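
A minimal sketch of the sliding-window plus CNN-BiGRU idea described above; the window length, layer sizes, and joint-regression head are illustrative assumptions rather than the paper's architecture.

```python
import torch
import torch.nn as nn

def sliding_window(frames: torch.Tensor, window: int = 8) -> torch.Tensor:
    """Stack consecutive radar frames into overlapping windows.
    frames: (T, C, H, W) -> (T - window + 1, window, C, H, W)."""
    return frames.unfold(0, window, 1).permute(0, 4, 1, 2, 3).contiguous()

class CNNBiGRU(nn.Module):
    """Per-frame CNN features followed by a bidirectional GRU over the window,
    regressing 3-D positions of 25 joints for the latest frame."""
    def __init__(self, in_ch=1, feat=128, hidden=128, joints=25):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(32 * 4 * 4, feat),
        )
        self.gru = nn.GRU(feat, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, joints * 3)

    def forward(self, x):                               # x: (B, window, C, H, W)
        b, t = x.shape[:2]
        f = self.cnn(x.flatten(0, 1)).view(b, t, -1)    # per-frame spatial features
        out, _ = self.gru(f)                            # temporal modeling
        return self.head(out[:, -1]).view(b, -1, 3)

windows = sliding_window(torch.randn(20, 1, 32, 32))    # 13 windows of 8 frames
print(CNNBiGRU()(windows[:2]).shape)                    # torch.Size([2, 25, 3])
```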

15 pages, 3184 KiB  
Article
A Lightweight Single-Image Super-Resolution Method Based on the Parallel Connection of Convolution and Swin Transformer Blocks
by Tengyun Jing, Cuiyin Liu and Yuanshuai Chen
Appl. Sci. 2025, 15(4), 1806; https://doi.org/10.3390/app15041806 - 10 Feb 2025
Abstract
In recent years, with the development of deep learning technologies, Vision Transformers combined with Convolutional Neural Networks (CNNs) have made significant progress in the field of single-image super-resolution (SISR). However, existing methods still face issues such as incomplete high-frequency information reconstruction, training instability caused by residual connections, and insufficient cross-window information exchange. To address these problems and better leverage both local and global information, this paper proposes a super-resolution reconstruction network based on the Parallel Connection of Convolution and Swin Transformer Block (PCCSTB) to model the local and global features of an image. Specifically, through a parallel structure of channel feature-enhanced convolution and Swin Transformer, the network extracts, enhances, and fuses the local and global information. Additionally, this paper designs a fusion module to integrate the global and local information extracted by CNNs. The experimental results show that the proposed network effectively balances SR performance and network complexity, achieving good results in the lightweight SR domain. For instance, in the 4× super-resolution experiment on the Urban100 dataset, the network achieves an inference speed of 55 frames per second under the same device conditions, which is more than seven times as fast as the state-of-the-art network Shifted Window-based Image Restoration (SwinIR). Moreover, the network’s Peak Signal-to-Noise Ratio (PSNR) outperforms SwinIR by 0.29 dB at a 4× scale on the Set5 dataset, indicating that the network efficiently performs high-resolution image reconstruction. Full article
(This article belongs to the Special Issue Application of Artificial Intelligence in Image Processing)
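
As a rough sketch of the parallel local/global structure, the block below runs a convolutional branch alongside plain multi-head self-attention (a simplified stand-in for a Swin Transformer block) and fuses the two by concatenation and a 1x1 convolution; it is not the paper's exact PCCSTB.

```python
import torch
import torch.nn as nn

class ParallelLocalGlobalBlock(nn.Module):
    """Illustrative parallel structure: a convolutional branch for local detail and a
    self-attention branch for global context, fused by concat + 1x1 conv."""
    def __init__(self, channels=64, heads=4):
        super().__init__()
        self.local = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, x):                         # x: (B, C, H, W)
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)     # (B, H*W, C) token sequence
        g, _ = self.attn(tokens, tokens, tokens)  # global branch
        g = g.transpose(1, 2).view(b, c, h, w)
        l = self.local(x)                         # local branch
        return x + self.fuse(torch.cat([l, g], dim=1))   # residual fusion

print(ParallelLocalGlobalBlock()(torch.randn(1, 64, 24, 24)).shape)  # (1, 64, 24, 24)
```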

20 pages, 6545 KiB  
Article
RFCS-YOLO: Target Detection Algorithm in Adverse Weather Conditions via Receptive Field Enhancement and Cross-Scale Fusion
by Gang Liu, Yingzheng Huang, Shuguang Yan and Enxiang Hou
Sensors 2025, 25(3), 912; https://doi.org/10.3390/s25030912 - 3 Feb 2025
Abstract
The paper proposes a model based on receptive field enhancement and cross-scale fusion (RFCS-YOLO). It addresses challenges such as complex backgrounds and the missed detection and mis-detection of traffic targets in bad weather. First, an efficient feature extraction module (EFEM) is created. It reconfigures the backbone network, enlarging the receptive field and improving its ability to extract features of targets at different scales. Next, a cross-scale fusion module (CSF) is introduced. It uses the receptive field coordinate attention mechanism (RFCA) to fuse information from different scales effectively while filtering out interfering noise and background information. Also, a new Focaler-Minimum Point Distance Intersection over Union (F-MPDIoU) loss function is proposed, which makes the model converge faster and mitigates missed and false detections. Experiments were conducted on the expanded Vehicle Detection in Adverse Weather Nature dataset (DWAN). The results show significant improvements compared to the conventional You Only Look Once v7 (YOLOv7) model. The mean Average Precision (mAP@0.5), precision, and recall are enhanced by 4.2%, 8.3%, and 1.4%, respectively. The mean Average Precision is 86.5%. The frame rate is 68 frames per second (FPS), which meets the requirements for real-time detection. A generalization experiment was conducted using the autonomous driving dataset SODA10M. The mAP@0.5 achieved 56.7%, which is a 3.6% improvement over the original model. This result demonstrates the good generalization ability of the proposed method. Full article
(This article belongs to the Section Remote Sensors)
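
The sketch below shows an MPDIoU-style box loss (IoU minus penalties on the distances between corresponding corners, normalized by the image size); the Focaler re-weighting used in the paper's F-MPDIoU is not reproduced, and the normalization is an assumption.

```python
import torch

def mpdiou_style_loss(pred, target, img_w, img_h):
    """Rough sketch of an MPDIoU-style box loss. Boxes are (..., 4) tensors in
    (x1, y1, x2, y2) pixel coordinates; lower loss is better."""
    x1 = torch.max(pred[..., 0], target[..., 0])
    y1 = torch.max(pred[..., 1], target[..., 1])
    x2 = torch.min(pred[..., 2], target[..., 2])
    y2 = torch.min(pred[..., 3], target[..., 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area_p = (pred[..., 2] - pred[..., 0]) * (pred[..., 3] - pred[..., 1])
    area_t = (target[..., 2] - target[..., 0]) * (target[..., 3] - target[..., 1])
    iou = inter / (area_p + area_t - inter + 1e-7)

    norm = img_w ** 2 + img_h ** 2                      # image-size normalization (assumed)
    d_tl = (pred[..., 0] - target[..., 0]) ** 2 + (pred[..., 1] - target[..., 1]) ** 2
    d_br = (pred[..., 2] - target[..., 2]) ** 2 + (pred[..., 3] - target[..., 3]) ** 2
    return 1.0 - (iou - d_tl / norm - d_br / norm)

pred = torch.tensor([[50., 50., 150., 150.]])
gt = torch.tensor([[60., 55., 160., 145.]])
print(mpdiou_style_loss(pred, gt, img_w=640, img_h=640))
```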

21 pages, 10344 KiB  
Article
Efficient Deployment of Peanut Leaf Disease Detection Models on Edge AI Devices
by Zekai Lv, Shangbin Yang, Shichuang Ma, Qiang Wang, Jinti Sun, Linlin Du, Jiaqi Han, Yufeng Guo and Hui Zhang
Agriculture 2025, 15(3), 332; https://doi.org/10.3390/agriculture15030332 - 2 Feb 2025
Abstract
The intelligent transformation of crop leaf disease detection has driven the use of deep neural network algorithms to develop more accurate disease detection models. In resource-constrained environments, the deployment of crop leaf disease detection models on the cloud introduces challenges such as communication latency and privacy concerns. Edge AI devices offer lower communication latency and enhanced scalability. To achieve the efficient deployment of crop leaf disease detection models on edge AI devices, a dataset of 700 images depicting peanut leaf spot, scorch spot, and rust diseases was collected. The YOLOX-Tiny network was utilized to conduct deployment experiments with the peanut leaf disease detection model on the Jetson Nano B01. The experiments initially focused on three aspects of efficient deployment optimization: the fusion of rectified linear unit (ReLU) and convolution operations, the integration of Efficient Non-Maximum Suppression for TensorRT (EfficientNMS_TRT) to accelerate post-processing within the TensorRT model, and the conversion of model formats from number of samples, channels, height, width (NCHW) to number of samples, height, width, and channels (NHWC) in the TensorFlow Lite model. Additionally, experiments were conducted to compare the memory usage, power consumption, and inference latency between the two inference frameworks, as well as to evaluate the real-time video detection performance using DeepStream. The results demonstrate that the fusion of ReLU activation functions with convolution operations reduced the inference latency by 55.5% compared to the use of the Sigmoid linear unit (SiLU) activation alone. In the TensorRT model, the integration of the EfficientNMS_TRT module accelerated post-processing, leading to a reduction in the inference latency of 19.6% and an increase in the frames per second (FPS) of 20.4%. In the TensorFlow Lite model, conversion to the NHWC format decreased the model conversion time by 88.7% and reduced the inference latency by 32.3%. These three efficient deployment optimization methods effectively decreased the inference latency and enhanced the inference efficiency. Moreover, a comparison between the two frameworks revealed that TensorFlow Lite exhibited memory usage reductions of 15% to 20% and power consumption decreases of 15% to 25% compared to TensorRT. Additionally, TensorRT achieved inference latency reductions of 53.2% to 55.2% relative to TensorFlow Lite. Consequently, TensorRT is deemed suitable for tasks requiring strong real-time performance and low latency, whereas TensorFlow Lite is more appropriate for scenarios with constrained memory and power resources. Additionally, the integration of DeepStream and EfficientNMS_TRT was found to optimize memory and power utilization, thereby enhancing the speed of real-time video detection. A detection rate of 28.7 FPS was achieved at a resolution of 1280 × 720. These experiments validate the feasibility and advantages of deploying crop leaf disease detection models on edge AI devices. Full article
(This article belongs to the Section Digital Agriculture)
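
The paper's operator fusion is performed with TensorRT and TensorFlow Lite tooling; purely to illustrate the idea of fusing activation and convolution into a single operator, here is a PyTorch sketch using its built-in fusion utility on a toy Conv-BN-ReLU stack.

```python
import torch
import torch.nn as nn
from torch.ao.quantization import fuse_modules

class TinyBackbone(nn.Module):
    """Toy Conv-BN-ReLU stack, standing in for a detector backbone."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(16)
        self.relu1 = nn.ReLU()

    def forward(self, x):
        return self.relu1(self.bn1(self.conv1(x)))

model = TinyBackbone().eval()                             # fusion requires eval mode
fused = fuse_modules(model, [["conv1", "bn1", "relu1"]])  # folds BN and merges ReLU
x = torch.randn(1, 3, 64, 64)
print(torch.allclose(model(x), fused(x), atol=1e-5))      # numerically equivalent
print(fused.conv1)                                        # fused Conv(+BN folded)+ReLU op
```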

21 pages, 17223 KiB  
Article
Line-YOLO: An Efficient Detection Algorithm for Power Line Angle
by Chuanjiang Wang, Yuqing Chen, Zecong Wu, Baoqi Liu, Hao Tian, Dongxiao Jiang and Xiujuan Sun
Sensors 2025, 25(3), 876; https://doi.org/10.3390/s25030876 - 31 Jan 2025
Abstract
To address the heavy workload and large errors involved in manually judging the power line tilt angle, this paper proposes Line-YOLO, an improved algorithm based on YOLOv8s-seg. Firstly, the introduction of deformable convolution (DCNv4) handles the variable shape of power lines and improves detection accuracy. The BiFPN structure is also introduced in the Neck layer, which shortens the time required for feature fusion and improves detection efficiency. After that, the EMA attention mechanism module is added behind the second and third C2f modules of the original model, which improves the model's ability to recognize the target and effectively addresses the loss and errors that occur when power line targets overlap. Finally, a small target detection head is added after the first EMA attention mechanism module for detecting small or occluded targets in the image, which improves the model's ability to detect small targets. In this paper, we conduct experiments by collecting relevant power line connection images and building our own dataset. The experimental results show that the mAP@0.5 of Line-YOLO is improved by 6.2% compared to the benchmark model, the number of parameters is reduced by 28.2%, the floating-point operations per second are increased by 35.3%, and the number of detected frames per second is improved by 14 FPS. The experiments prove that the enhanced Line-YOLO model produces better detection results and can efficiently complete the power line angle detection task. Full article
(This article belongs to the Section Electronic Sensors)
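
The paper targets power line angle detection via segmentation; as a hypothetical post-processing step (not described in the abstract), the sketch below estimates a tilt angle from a segmented line mask by fitting its principal axis.

```python
import numpy as np

def line_tilt_angle(mask: np.ndarray) -> float:
    """Estimate the tilt angle (degrees from horizontal) of a segmented power line
    by fitting the principal axis of its mask pixels."""
    ys, xs = np.nonzero(mask)                      # pixel coordinates of the line mask
    pts = np.stack([xs, ys], axis=1).astype(float)
    pts -= pts.mean(axis=0)                        # center the points
    cov = pts.T @ pts / len(pts)                   # covariance of (x, y)
    eigvals, eigvecs = np.linalg.eigh(cov)
    dx, dy = eigvecs[:, np.argmax(eigvals)]        # principal direction
    return float(np.degrees(np.arctan2(dy, dx)) % 180.0)

mask = np.zeros((100, 100), dtype=bool)
rr = np.arange(100)
mask[rr, rr] = True                                # a 45-degree synthetic "line"
print(round(line_tilt_angle(mask), 1))             # 45.0
```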

16 pages, 8462 KiB  
Article
Wavelet-Based, Blur-Aware Decoupled Network for Video Deblurring
by Hua Wang, Pornntiwa Pawara and Rapeeporn Chamchong
Appl. Sci. 2025, 15(3), 1311; https://doi.org/10.3390/app15031311 - 27 Jan 2025
Abstract
Video deblurring faces a fundamental challenge, as blur degradation comprehensively affects frames by not only causing detail loss but also severely distorting structural information. This dual degradation across low- and high-frequency domains makes it challenging for existing methods to simultaneously restore both structural and detailed information through a unified approach. To address this issue, we propose a wavelet-based, blur-aware decoupled network (WBDNet) that innovatively decouples structure reconstruction from detail enhancement. Our method decomposes features into multiple frequency bands and employs specialized restoration strategies for different frequency domains. In the low-frequency domain, we construct a multi-scale feature pyramid with optical flow alignment. This enables accurate structure reconstruction through bottom-up progressive feature fusion. For high-frequency components, we combine deformable convolution with a blur-aware attention mechanism. This allows us to precisely extract and merge sharp details from multiple frames. Extensive experiments on benchmark datasets demonstrate the superior performance of our method, particularly in preserving structural integrity and detail fidelity. Full article
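
A minimal PyWavelets sketch of the frequency-band split that motivates the decoupled design: a single-level 2-D DWT separates a frame into one low-frequency (structure) band and three high-frequency (detail) bands. The wavelet choice and single-level decomposition are assumptions, not the paper's exact setup.

```python
import numpy as np
import pywt

def wavelet_split(frame: np.ndarray, wavelet: str = "haar"):
    """Split a grayscale frame into a low-frequency band and three high-frequency bands."""
    low, (lh, hl, hh) = pywt.dwt2(frame, wavelet)
    return low, (lh, hl, hh)

def wavelet_merge(low, highs, wavelet: str = "haar"):
    """Inverse transform: recombine separately restored bands into one frame."""
    return pywt.idwt2((low, highs), wavelet)

frame = np.random.rand(128, 128)
low, highs = wavelet_split(frame)
print(low.shape, highs[0].shape)                   # (64, 64) (64, 64)
restored = wavelet_merge(low, highs)
print(np.allclose(restored, frame))                # True (perfect reconstruction)
```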

16 pages, 6551 KiB  
Article
Steel Surface Defect Detection Technology Based on YOLOv8-MGVS
by Kai Zeng, Zibo Xia, Junlei Qian, Xueqiang Du, Pengcheng Xiao and Liguang Zhu
Metals 2025, 15(2), 109; https://doi.org/10.3390/met15020109 - 23 Jan 2025
Viewed by 348
Abstract
Surface defects have a serious detrimental effect on the quality of steel. To address the problems of low efficiency and poor accuracy in the manual inspection process, intelligent detection technology based on machine learning has been gradually applied to the detection of steel surface defects. An improved YOLOv8 steel surface defect detection model called YOLOv8-MGVS is designed to address these challenges. The MLCA mechanism in the C2f module is applied to increase the feature extraction ability of the backbone network. The lightweight GSConv and VoVGSCSP cross-stage fusion modules are added to the neck network to reduce the loss of semantic information and achieve effective information fusion. A self-attention mechanism is incorporated into the detection network to improve the detection of small targets. Defect detection experiments were carried out on the NEU-DET dataset. The experimental results show that, compared with YOLOv8n, the average accuracy, recall rate, and frames per second of the improved model were improved by 5.2%, 10.5%, and 6.4%, respectively, while the number of parameters and the computational cost were reduced by 5.8% and 14.8%, respectively. Furthermore, defect detection generalization experiments on the GC-10 and SDD DET datasets confirmed that the YOLOv8-MGVS model offers higher detection accuracy, a lighter design, and faster speed. Full article
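
As an illustration of the lightweight GSConv idea named above, the sketch below produces half the output channels with a standard convolution and the other half with a depthwise convolution, then shuffles the concatenated channels; it is a sketch of the general pattern, not the exact module used in the paper.

```python
import torch
import torch.nn as nn

def channel_shuffle(x: torch.Tensor, groups: int = 2) -> torch.Tensor:
    """Standard channel shuffle (as in ShuffleNet): interleave channel groups."""
    b, c, h, w = x.shape
    return x.view(b, groups, c // groups, h, w).transpose(1, 2).reshape(b, c, h, w)

class GSConvStyle(nn.Module):
    """GSConv-style lightweight convolution: dense conv for half the channels,
    depthwise conv for the other half, then concat + channel shuffle."""
    def __init__(self, c_in: int, c_out: int, k: int = 3, s: int = 1):
        super().__init__()
        c_half = c_out // 2
        self.dense = nn.Sequential(
            nn.Conv2d(c_in, c_half, k, s, k // 2, bias=False),
            nn.BatchNorm2d(c_half), nn.SiLU(),
        )
        self.depthwise = nn.Sequential(
            nn.Conv2d(c_half, c_half, 5, 1, 2, groups=c_half, bias=False),
            nn.BatchNorm2d(c_half), nn.SiLU(),
        )

    def forward(self, x):
        d = self.dense(x)
        return channel_shuffle(torch.cat([d, self.depthwise(d)], dim=1))

print(GSConvStyle(64, 128)(torch.randn(1, 64, 40, 40)).shape)  # (1, 128, 40, 40)
```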

19 pages, 1575 KiB  
Article
FIFA3D: Flow-Guided Feature Aggregation for Temporal Three-Dimensional Object Detection
by Ruiqi Ma, Chunwei Wang, Chi Chen, Yihan Zeng, Bijun Li, Qin Zou, Qingqiu Huang, Xinge Zhu and Hang Xu
Remote Sens. 2025, 17(3), 380; https://doi.org/10.3390/rs17030380 - 23 Jan 2025
Viewed by 363
Abstract
Detecting accurate 3D bounding boxes from LiDAR point clouds is crucial for autonomous driving. Recent studies have shown the superiority of the performance of multi-frame 3D detectors, yet eliminating the misalignment across frames and effectively aggregating spatiotemporal information are still challenging problems. In this paper, we present a novel flow-guided feature aggregation scheme for 3D object detection (FIFA3D) to align cross-frame information. FIFA3D first leverages optical flow with supervised signals to model the pixel-to-pixel correlations between sequential frames. Considering the sparse nature of bird’s-eye-view feature maps, an additional classification branch is adopted to provide explicit pixel-wise clues. Meanwhile, we utilize multi-scale feature maps and predict flow in a coarse-to-fine manner. With guidance from the estimated flow, historical features can be well aligned to the current situation, and a cascade fusion strategy is introduced to benefit the following detection. Extensive experiments show that FIFA3D surpasses the single-frame baseline with remarkable margins of +10.8% mAPH and +6.8% mAP on the Waymo and nuScenes validation datasets and performs well compared with state-of-the-art methods. Full article
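
A generic sketch of flow-guided feature alignment: a historical feature map is warped toward the current frame with a predicted per-pixel flow field via bilinear sampling. The BEV specifics, flow supervision, and cascade fusion of FIFA3D are not reproduced here.

```python
import torch
import torch.nn.functional as F

def warp_with_flow(feat: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Warp a historical feature map using a per-pixel flow field (in pixels).
    feat: (B, C, H, W); flow: (B, 2, H, W) holding (dx, dy) displacements."""
    b, _, h, w = feat.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().unsqueeze(0).expand(b, -1, -1, -1)
    coords = base + flow                                   # where to sample from
    coords_x = 2.0 * coords[:, 0] / (w - 1) - 1.0          # normalize to [-1, 1]
    coords_y = 2.0 * coords[:, 1] / (h - 1) - 1.0
    grid = torch.stack((coords_x, coords_y), dim=-1)       # (B, H, W, 2)
    return F.grid_sample(feat, grid, mode="bilinear", align_corners=True)

hist = torch.randn(1, 64, 128, 128)
flow = torch.zeros(1, 2, 128, 128); flow[:, 0] = 3.0        # shift 3 px along x
print(warp_with_flow(hist, flow).shape)                     # torch.Size([1, 64, 128, 128])
```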

19 pages, 4699 KiB  
Article
Spatio-Temporal Feature Aware Vision Transformers for Real-Time Unmanned Aerial Vehicle Tracking
by Hao Zhang, Hengzhou Ye, Xiaoyu Guo, Xu Zhang, Yao Rong and Shuiwang Li
Drones 2025, 9(1), 68; https://doi.org/10.3390/drones9010068 - 17 Jan 2025
Viewed by 448
Abstract
Driven by the rapid advancement of Unmanned Aerial Vehicle (UAV) technology, the field of UAV object tracking has witnessed significant progress. This study introduces an innovative single-stream UAV tracking architecture, dubbed NT-Track, which is dedicated to enhancing the efficiency and accuracy of real-time tracking tasks. Addressing the shortcomings of existing tracking systems in capturing temporal relationships between consecutive frames, NT-Track meticulously analyzes the positional changes in targets across frames and leverages the similarity of the surrounding areas to extract feature information. Furthermore, our method integrates spatial and temporal information seamlessly into a unified framework through the introduction of a temporal feature fusion technique, thereby bolstering the overall performance of the model. NT-Track also incorporates a spatial neighborhood feature extraction module, which focuses on identifying and extracting features within the neighborhood of the target in each frame, ensuring continuous focus on the target during inter-frame processing. By employing an improved Transformer backbone network, our approach effectively integrates spatio-temporal information, enhancing the accuracy and robustness of tracking. Our experimental results on several challenging benchmark datasets demonstrate that NT-Track surpasses existing lightweight and deep learning trackers in terms of precision and success rate. It is noteworthy that, on the VisDrone2018 benchmark, NT-Track achieved a precision rate of 90% for the first time, an accomplishment that not only showcases its exceptional performance in complex environments, but also confirms its potential and effectiveness in practical applications. Full article

27 pages, 20827 KiB  
Article
Three-Dimensional Reconstruction of Space Targets Utilizing Joint Optical-and-ISAR Co-Location Observation
by Wanting Zhou, Lei Liu, Rongzhen Du, Ze Wang, Ronghua Shang and Feng Zhou
Remote Sens. 2025, 17(2), 287; https://doi.org/10.3390/rs17020287 - 15 Jan 2025
Viewed by 361
Abstract
With traditional three-dimensional (3-D) reconstruction methods for space targets, it is difficult to achieve 3-D structure and attitude reconstruction simultaneously. To tackle this problem, a 3-D reconstruction method for space targets is proposed, and the alignment and fusion of optical and ISAR images are investigated. Firstly, multiple pairs of optical and ISAR images are acquired in the joint optical-and-ISAR co-location observation system (COS). Then, key points of space targets on the images are used to solve for the Doppler information and the 3-D attitude. Meanwhile, the image offsets of each pair are further aligned based on Doppler co-projection between optical and ISAR images. The 3-D rotational offset relationship and the 3-D translational offset relationship are next deduced to align the spatial offset between pairs of images based on attitude changes in neighboring frames. Finally, a voxel trimming mechanism based on growth learning (VTM-GL) is designed to obtain the reserved voxels where mask features are used. Experimental results verify the effectiveness and robustness of the OC-V3R-OI method. Full article

57 pages, 2891 KiB  
Review
Event-Based Visual Simultaneous Localization and Mapping (EVSLAM) Techniques: State of the Art and Future Directions
by Mohsen Shahraki, Ahmed Elamin and Ahmed El-Rabbany
J. Sens. Actuator Netw. 2025, 14(1), 7; https://doi.org/10.3390/jsan14010007 - 14 Jan 2025
Viewed by 479
Abstract
Recent advances in event-based cameras have led to significant developments in robotics, particularly in visual simultaneous localization and mapping (VSLAM) applications. This technique enables real-time camera motion estimation and simultaneous environment mapping using visual sensors on mobile platforms. Event cameras offer several distinct advantages over frame-based cameras, including a high dynamic range, high temporal resolution, low power consumption, and low latency. These attributes make event cameras highly suitable for addressing performance issues in challenging scenarios such as high-speed motion and environments with high-range illumination. This review paper delves into event-based VSLAM (EVSLAM) algorithms, leveraging the advantages inherent in event streams for localization and mapping endeavors. The exposition commences by explaining the operational principles of event cameras, providing insights into the diverse event representations applied in event data preprocessing. A crucial facet of this survey is the systematic categorization of EVSLAM research into three key parts: event preprocessing, event tracking, and sensor fusion algorithms in EVSLAM. Each category undergoes meticulous examination, offering practical insights and guidance for comprehending each approach. Moreover, we thoroughly assess state-of-the-art (SOTA) methods, emphasizing conducting the evaluation on a specific dataset for enhanced comparability. This evaluation sheds light on current challenges and outlines promising avenues for future research, emphasizing the persisting obstacles and potential advancements in this dynamically evolving domain. Full article
(This article belongs to the Section Actuators, Sensors and Devices)
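
As a concrete example of one common event representation covered by such surveys, the sketch below accumulates an event stream into a signed 2-D polarity histogram; the (x, y, timestamp, polarity) layout is an assumption.

```python
import numpy as np

def events_to_frame(events: np.ndarray, height: int, width: int) -> np.ndarray:
    """Accumulate an event stream into a 2-D polarity histogram.
    Each event is (x, y, timestamp, polarity) with polarity in {-1, +1}."""
    frame = np.zeros((height, width), dtype=np.float32)
    xs = events[:, 0].astype(int)
    ys = events[:, 1].astype(int)
    pol = events[:, 3]
    np.add.at(frame, (ys, xs), pol)        # signed accumulation handles repeated pixels
    return frame

events = np.array([[10, 5, 0.001, +1],
                   [10, 5, 0.002, +1],
                   [40, 20, 0.003, -1]])
img = events_to_frame(events, height=64, width=64)
print(img[5, 10], img[20, 40])             # 2.0 -1.0
```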

27 pages, 19274 KiB  
Article
Enhancing Underwater Video from Consecutive Frames While Preserving Temporal Consistency
by Kai Hu, Yuancheng Meng, Zichen Liao, Lei Tang and Xiaoling Ye
J. Mar. Sci. Eng. 2025, 13(1), 127; https://doi.org/10.3390/jmse13010127 - 12 Jan 2025
Viewed by 646
Abstract
Current methods for underwater image enhancement primarily focus on single-frame processing. While these approaches achieve impressive results for static images, they often fail to maintain temporal coherence across frames in underwater videos, which leads to temporal artifacts and frame flickering. Furthermore, existing enhancement methods struggle to accurately capture features in underwater scenes. This makes it difficult to handle challenges such as uneven lighting and edge blurring in complex underwater environments. To address these issues, this paper presents a dual-branch underwater video enhancement network. The network synthesizes short-range video sequences by learning and inferring optical flow from individual frames. It effectively enhances temporal consistency across video frames through predicted optical flow information, thereby mitigating temporal instability within frame sequences. In addition, to address the limitations of traditional U-Net models in handling complex multiscale feature fusion, this study proposes a novel underwater feature fusion module. By applying both max pooling and average pooling, this module separately extracts local and global features. It utilizes an attention mechanism to adaptively adjust the weights of different regions in the feature map, thereby effectively enhancing key regions within underwater video frames. Experimental results indicate that when compared with the existing underwater image enhancement baseline method and the consistency enhancement baseline method, the proposed model improves the consistency index by 30% and shows a marginal decrease of only 0.6% in enhancement quality index, demonstrating its superiority in underwater video enhancement tasks. Full article
(This article belongs to the Section Ocean Engineering)
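
The abstract describes a fusion module built from max pooling, average pooling, and attention; the sketch below shows a generic CBAM-style spatial attention that uses exactly those ingredients, as an illustration rather than the paper's exact module.

```python
import torch
import torch.nn as nn

class PooledSpatialAttention(nn.Module):
    """Spatial attention from channel-wise max and average pooling: the two pooled
    maps are concatenated, passed through a conv, and used to reweight the features."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg_map = x.mean(dim=1, keepdim=True)          # global context per location
        max_map, _ = x.max(dim=1, keepdim=True)        # strongest local response
        attn = self.sigmoid(self.conv(torch.cat([avg_map, max_map], dim=1)))
        return x * attn                                # emphasize key regions

frames = torch.randn(2, 48, 64, 64)                    # fused underwater frame features
print(PooledSpatialAttention()(frames).shape)          # torch.Size([2, 48, 64, 64])
```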
