Search Results (7)

Search Parameters:
Keywords = hierarchical semantic guidance

18 pages, 6500 KiB  
Article
NSVDNet: Normalized Spatial-Variant Diffusion Network for Robust Image-Guided Depth Completion
by Jin Zeng and Qingpeng Zhu
Electronics 2024, 13(12), 2418; https://doi.org/10.3390/electronics13122418 - 20 Jun 2024
Viewed by 360
Abstract
Depth images captured by low-cost three-dimensional (3D) cameras are subject to low spatial density, requiring depth completion to improve 3D imaging quality. Image-guided depth completion aims at predicting dense depth images from extremely sparse depth measurements captured by depth sensors with the guidance of aligned Red–Green–Blue (RGB) images. Recent approaches have achieved a remarkable improvement, but the performance will degrade severely due to the corruption in input sparse depth. To enhance robustness to input corruption, we propose a novel depth completion scheme based on a normalized spatial-variant diffusion network incorporating measurement uncertainty, which introduces the following contributions. First, we design a normalized spatial-variant diffusion (NSVD) scheme to apply spatially varying filters iteratively on the sparse depth conditioned on its certainty measure for excluding depth corruption in the diffusion. In addition, we integrate the NSVD module into the network design to enable end-to-end training of filter kernels and depth reliability, which further improves the structural detail preservation via the guidance of RGB semantic features. Furthermore, we apply the NSVD module hierarchically at multiple scales, which ensures global smoothness while preserving visually salient details. The experimental results validate the advantages of the proposed network over existing approaches with enhanced performance and noise robustness for depth completion in real-use scenarios. Full article
(This article belongs to the Special Issue Image Sensors and Companion Chips)
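The core NSVD idea, iteratively filtering sparse depth with spatially varying, confidence-gated weights, can be illustrated with a minimal PyTorch sketch. The fixed 3x3 neighbourhood, tensor shapes, and function name below are assumptions for demonstration, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def nsvd_step(depth, confidence, affinity, eps=1e-6):
    """One confidence-weighted, spatially variant diffusion step (illustrative).

    depth, confidence: (B, 1, H, W); affinity: (B, 9, H, W) per-pixel 3x3 weights.
    Each output pixel is a normalized average of its 3x3 neighbours, where
    low-confidence neighbours (e.g. corrupted sparse measurements) contribute less.
    """
    b, _, h, w = depth.shape
    # Gather the 3x3 neighbourhood of every pixel: (B, 9, H, W)
    d_nb = F.unfold(depth, kernel_size=3, padding=1).view(b, 9, h, w)
    c_nb = F.unfold(confidence, kernel_size=3, padding=1).view(b, 9, h, w)
    weights = affinity.softmax(dim=1) * c_nb           # spatially varying, confidence-gated
    out = (weights * d_nb).sum(dim=1, keepdim=True)
    norm = weights.sum(dim=1, keepdim=True).clamp_min(eps)
    return out / norm                                  # normalization keeps the filter mass at 1
```

In the network described above, the affinity kernels and confidence would themselves be predicted end-to-end and applied hierarchically at multiple scales; here they are simply inputs.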

18 pages, 5494 KiB  
Article
Hierarchical Semantic-Guided Contextual Structure-Aware Network for Spectral Satellite Image Dehazing
by Lei Yang, Jianzhong Cao, Hua Wang, Sen Dong and Hailong Ning
Remote Sens. 2024, 16(9), 1525; https://doi.org/10.3390/rs16091525 - 25 Apr 2024
Viewed by 490
Abstract
Haze or cloud always shrouds satellite images, obscuring valuable geographic information for military surveillance, natural calamity surveillance and mineral resource exploration. Satellite image dehazing (SID) provides the possibility for better applications of satellite images. Most of the existing dehazing methods are tailored for natural images and are not very effective for satellite images with non-homogeneous haze since the semantic structure information and inconsistent attenuation are not fully considered. To tackle this problem, this study proposes a hierarchical semantic-guided contextual structure-aware network (SCSNet) for spectral satellite image dehazing. Specifically, a hybrid CNN–Transformer architecture integrated with a hierarchical semantic guidance (HSG) module is presented to learn semantic structure information by synergetically complementing local representation from non-local features. Furthermore, a cross-layer fusion (CLF) module is specially designed to replace the traditional skip connection during the feature decoding stage so as to reinforce the attention to the spatial regions and feature channels with more serious attenuation. The results on the SateHaze1k, RS-Haze, and RSID datasets demonstrated that the proposed SCSNet can achieve effective dehazing and outperforms existing state-of-the-art methods. Full article
(This article belongs to the Special Issue Remote Sensing Cross-Modal Research: Algorithms and Practices)
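A rough sketch of what a cross-layer fusion block that replaces a plain skip connection might look like, re-weighting channels and spatial positions so heavily attenuated regions receive more attention. The layer sizes and attention form are assumptions, not SCSNet's actual design.

```python
import torch
import torch.nn as nn

class CrossLayerFusion(nn.Module):
    """Illustrative stand-in for a CLF-style skip connection: fuse an encoder
    feature with the decoder feature, then re-weight channels and spatial
    positions with simple attention gates (assumed design, not SCSNet's)."""

    def __init__(self, channels):
        super().__init__()
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)
        # Channel attention: squeeze-and-excitation style gating.
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // 4, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, channels, 1), nn.Sigmoid(),
        )
        # Spatial attention: a single-channel map over H x W.
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=7, padding=3), nn.Sigmoid(),
        )

    def forward(self, enc_feat, dec_feat):
        x = self.fuse(torch.cat([enc_feat, dec_feat], dim=1))
        x = x * self.channel_gate(x)   # emphasise informative channels
        x = x * self.spatial_gate(x)   # emphasise strongly attenuated regions
        return x
```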

18 pages, 7301 KiB  
Article
Visual Clue Guidance and Consistency Matching Framework for Multimodal Named Entity Recognition
by Li He, Qingxiang Wang, Jie Liu, Jianyong Duan and Hao Wang
Appl. Sci. 2024, 14(6), 2333; https://doi.org/10.3390/app14062333 - 10 Mar 2024
Viewed by 927
Abstract
The goal of multimodal named entity recognition (MNER) is to detect entity spans in given image–text pairs and classify them into corresponding entity types. Despite the success of existing works that leverage cross-modal attention mechanisms to integrate textual and visual representations, we observe three key issues. Firstly, models are prone to misguidance when fusing unrelated text and images. Secondly, most existing visual features are not enhanced or filtered. Finally, due to the independent encoding strategies employed for text and images, a noticeable semantic gap exists between them. To address these challenges, we propose a framework called visual clue guidance and consistency matching (GMF). To tackle the first issue, we introduce a visual clue guidance (VCG) module designed to hierarchically extract visual information from multiple scales. This information is utilized as an injectable visual clue guidance sequence to steer text representations for error-insensitive prediction decisions. Furthermore, by incorporating a cross-scale attention (CSA) module, we successfully mitigate interference across scales, enhancing the image’s capability to capture details. To address the third issue of semantic disparity between text and images, we employ a consistency matching (CM) module based on the idea of multimodal contrastive learning, facilitating the collaborative learning of multimodal data. To validate the effectiveness of our proposed framework, we conducted comprehensive experimental studies, including extensive comparative experiments, ablation studies, and case studies, on two widely used benchmark datasets, demonstrating the efficacy of the framework. Full article
(This article belongs to the Special Issue Cross-Applications of Natural Language Processing and Text Mining)
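The consistency matching idea rests on multimodal contrastive learning. A minimal symmetric InfoNCE-style loss, a common formulation rather than necessarily the paper's exact one, looks like this:

```python
import torch
import torch.nn.functional as F

def consistency_matching_loss(text_emb, image_emb, temperature=0.07):
    """Symmetric InfoNCE-style contrastive loss over pooled text and image
    embeddings of matched pairs; shown only to illustrate the idea behind a
    CM-style module. text_emb, image_emb: (B, D)."""
    text_emb = F.normalize(text_emb, dim=-1)
    image_emb = F.normalize(image_emb, dim=-1)
    logits = text_emb @ image_emb.t() / temperature        # (B, B) similarity matrix
    targets = torch.arange(text_emb.size(0), device=text_emb.device)
    # Matched pairs sit on the diagonal; pull them together, push others apart.
    loss_t2i = F.cross_entropy(logits, targets)
    loss_i2t = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_t2i + loss_i2t)
```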

23 pages, 3418 KiB  
Article
Semantic Attention and Structured Model for Weakly Supervised Instance Segmentation in Optical and SAR Remote Sensing Imagery
by Man Chen, Kun Xu, Enping Chen, Yao Zhang, Yifei Xie, Yahao Hu and Zhisong Pan
Remote Sens. 2023, 15(21), 5201; https://doi.org/10.3390/rs15215201 - 1 Nov 2023
Viewed by 945
Abstract
Instance segmentation in remote sensing (RS) imagery aims to predict the locations of instances and represent them with pixel-level masks. Thanks to the more accurate pixel-level information for each instance, instance segmentation has enormous potential applications in resource planning, urban surveillance, and military reconnaissance. However, current RS imagery instance segmentation methods mostly follow the fully supervised paradigm, relying on expensive pixel-level labels. Moreover, remote sensing imagery suffers from cluttered backgrounds and significant variations in target scales, making segmentation challenging. To accommodate these limitations, we propose a semantic attention enhancement and structured model-guided multi-scale weakly supervised instance segmentation network (SASM-Net). Building upon the modeling of spatial relationships for weakly supervised instance segmentation, we further design the multi-scale feature extraction module (MSFE module), semantic attention enhancement module (SAE module), and structured model guidance module (SMG module) for SASM-Net to enable a balance between label production costs and visual processing. The MSFE module adopts a hierarchical approach similar to the residual structure to establish equivalent feature scales and to adapt to the significant scale variations of instances in RS imagery. The SAE module is a dual-stream structure with semantic information prediction and attention enhancement streams. It can enhance the network’s activation of instances in the images and reduce cluttered backgrounds’ interference. The SMG module can assist the SAE module in the training process to construct supervision with edge information, which can implicitly lead the model to a representation with structured inductive bias, reducing the impact of the low sensitivity of the model to edge information caused by the lack of fine-grained pixel-level labeling. Experimental results indicate that the proposed SASM-Net is adaptable to optical and synthetic aperture radar (SAR) RS imagery instance segmentation tasks. It accurately predicts instance masks without relying on pixel-level labels, surpassing the segmentation accuracy of all weakly supervised methods. It also shows competitiveness when compared to hybrid and fully supervised paradigms. This research provides a low-cost, high-quality solution for the instance segmentation task in optical and SAR RS imagery. Full article
(This article belongs to the Section AI Remote Sensing)
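The dual-stream notion behind the SAE module, one stream predicting semantics and the other converting them into an attention map that suppresses cluttered background, can be sketched as follows; channel counts and the sigmoid gating are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SemanticAttentionEnhancement(nn.Module):
    """Rough sketch of a dual-stream design in the spirit of the SAE module:
    a semantic prediction stream and an attention enhancement stream that
    amplifies instance regions (assumed layout, not SASM-Net's exact module)."""

    def __init__(self, channels, num_classes):
        super().__init__()
        self.semantic_head = nn.Conv2d(channels, num_classes, kernel_size=1)
        self.attention_head = nn.Sequential(
            nn.Conv2d(num_classes, 1, kernel_size=1), nn.Sigmoid(),
        )

    def forward(self, feat):
        sem_logits = self.semantic_head(feat)        # semantic prediction stream
        attn = self.attention_head(sem_logits)       # attention enhancement stream
        enhanced = feat * (1.0 + attn)               # boost activated instance regions
        return enhanced, sem_logits                  # logits receive the (weak) supervision
```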

15 pages, 5975 KiB  
Article
DFEN: Dual Feature Enhancement Network for Remote Sensing Image Caption
by Weihua Zhao, Wenzhong Yang, Danny Chen and Fuyuan Wei
Electronics 2023, 12(7), 1547; https://doi.org/10.3390/electronics12071547 - 25 Mar 2023
Cited by 4 | Viewed by 1258
Abstract
The remote sensing image caption can acquire ground objects and the semantic relationships between different ground objects. Existing remote sensing image caption algorithms do not acquire enough ground object information from remote-sensing images, resulting in inaccurate captions. As a result, this paper proposes a codec-based Dual Feature Enhancement Network (“DFEN”) to enhance ground object information from both image and text levels. We build the Image-Enhancement module at the image level using the multiscale characteristics of remote sensing images. Furthermore, more discriminative image context features are obtained through the Image-Enhancement module. The hierarchical attention mechanism aggregates multi-level features and supplements the ground object information ignored due to large-scale differences. At the text level, we use the image’s potential visual features to guide the Text-Enhance module, resulting in text guidance features that correctly focus on the information of the ground objects. Experiment results show that the DFEN model can enhance ground object information from images and text. Specifically, the BLEU-1 index increased by 8.6% in UCM-caption, 2.3% in Sydney-caption, and 5.1% in RSICD. The DFEN model has promoted the exploration of advanced semantics of remote sensing images and facilitated the development of remote sensing image caption. Full article
(This article belongs to the Topic Computer Vision and Image Processing)
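A toy version of hierarchical attention that aggregates multi-level image features under the guidance of a caption decoder state; the dimensions and dot-product scoring below are assumptions for illustration, not the DFEN implementation.

```python
import torch
import torch.nn as nn

class HierarchicalAttention(nn.Module):
    """Aggregate features from several scales with attention driven by the
    decoder state, in the spirit of the hierarchical attention described above."""

    def __init__(self, feat_dim, hidden_dim):
        super().__init__()
        self.query = nn.Linear(hidden_dim, feat_dim)

    def forward(self, level_feats, decoder_state):
        # level_feats: list of (B, N_i, feat_dim) features from different scales.
        feats = torch.cat(level_feats, dim=1)            # (B, sum N_i, feat_dim)
        q = self.query(decoder_state).unsqueeze(1)       # (B, 1, feat_dim)
        scores = (feats * q).sum(dim=-1)                 # (B, sum N_i)
        weights = scores.softmax(dim=-1).unsqueeze(-1)
        context = (weights * feats).sum(dim=1)           # (B, feat_dim) fused context
        return context
```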

23 pages, 5857 KiB  
Article
CLHF-Net: A Channel-Level Hierarchical Feature Fusion Network for Remote Sensing Image Change Detection
by Jinming Ma, Di Lu, Yanxiang Li and Gang Shi
Symmetry 2022, 14(6), 1138; https://doi.org/10.3390/sym14061138 - 1 Jun 2022
Cited by 3 | Viewed by 1839
Abstract
Remote sensing (RS) image change detection (CD) is the procedure of detecting the change regions that occur in the same area in different time periods. A lot of research has extracted deep features and fused multi-scale features by convolutional neural networks and attention mechanisms to achieve better CD performance, but these methods do not result in well-fused feature pairs of the same scale and features of different layers. To solve this problem, a novel CD network with symmetric structure called the channel-level hierarchical feature fusion network (CLHF-Net) is proposed. First, a channel-split feature fusion module (CSFM) with symmetric structure is proposed, which consists of three branches. The CSFM integrates feature information of the same scale feature pairs more adequately and effectively solves the problem of insufficient communication between feature pairs. Second, an interaction guidance fusion module (IGFM) is designed to fuse the feature information of different layers more effectively. IGFM introduces the detailed information from shallow features into deep features and deep semantic information into shallow features, and the fused features have more complete feature information of change regions and clearer edge information. Compared with other methods, CLHF-Net improves the F1 scores by 1.03%, 2.50%, and 3.03% on the three publicly available benchmark datasets: season-varying, WHU-CD, and LEVIR-CD datasets, respectively. Experimental results show that the performance of the proposed CLHF-Net is better than other comparative methods. Full article
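One way to picture channel-level fusion of a bi-temporal feature pair is three simple branches merged back into a single change map; the branch choice here is an assumption for demonstration, not CLHF-Net's exact CSFM.

```python
import torch
import torch.nn as nn

class BiTemporalFusion(nn.Module):
    """Illustrative three-branch fusion of same-scale bi-temporal features
    for change detection (difference, concatenation, element-wise product)."""

    def __init__(self, channels):
        super().__init__()
        self.concat_branch = nn.Conv2d(2 * channels, channels, kernel_size=1)
        self.merge = nn.Conv2d(3 * channels, channels, kernel_size=3, padding=1)

    def forward(self, feat_t1, feat_t2):
        diff = torch.abs(feat_t1 - feat_t2)                          # change cue
        cat = self.concat_branch(torch.cat([feat_t1, feat_t2], 1))   # joint context
        prod = feat_t1 * feat_t2                                     # shared (unchanged) content
        return self.merge(torch.cat([diff, cat, prod], dim=1))
```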

15 pages, 1463 KiB  
Article
Dual Attention-Guided Multiscale Dynamic Aggregate Graph Convolutional Networks for Skeleton-Based Human Action Recognition
by Zeyuan Hu and Eung-Joo Lee
Symmetry 2020, 12(10), 1589; https://doi.org/10.3390/sym12101589 - 24 Sep 2020
Cited by 7 | Viewed by 2410
Abstract
Traditional convolution neural networks have achieved great success in human action recognition. However, it is challenging to establish effective associations between different human bone nodes to capture detailed information. In this paper, we propose a dual attention-guided multiscale dynamic aggregate graph convolution neural network (DAG-GCN) for skeleton-based human action recognition. Our goal is to explore the best correlation and determine high-level semantic features. First, a multiscale dynamic aggregate GCN module is used to capture important semantic information and to establish dependence relationships for different bone nodes. Second, the higher level semantic feature is further refined, and the semantic relevance is emphasized through a dual attention guidance module. In addition, we exploit the relationship of joints hierarchically and the spatial temporal correlations through two modules. Experiments with the DAG-GCN method result in good performance on the NTU-60-RGB+D and NTU-120-RGB+D datasets. The accuracy is 95.76% and 90.01%, respectively, for the cross (X)-View and X-Sub on the NTU-60 dataset. Full article
(This article belongs to the Section Computer)
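A bare-bones spatial graph convolution over skeleton joints, the kind of building block that multiscale dynamic aggregate GCNs extend with attention; the normalization and tensor shapes follow common ST-GCN practice and are assumptions here.

```python
import torch
import torch.nn as nn

class SkeletonGraphConv(nn.Module):
    """Minimal graph convolution over V skeleton joints with a symmetrically
    normalized adjacency matrix (illustrative, not the DAG-GCN module)."""

    def __init__(self, in_channels, out_channels, adjacency):
        super().__init__()
        # adjacency: (V, V) joint connectivity including self-loops.
        deg = adjacency.sum(dim=1)
        d_inv_sqrt = deg.clamp_min(1e-6).pow(-0.5)
        norm_adj = d_inv_sqrt.unsqueeze(1) * adjacency * d_inv_sqrt.unsqueeze(0)
        self.register_buffer("adj", norm_adj)
        self.proj = nn.Linear(in_channels, out_channels)

    def forward(self, x):
        # x: (B, T, V, C) joint features over time; mix information along joints.
        x = torch.einsum("uv,btvc->btuc", self.adj, x)
        return self.proj(x)
```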
