Search Results (130)

Search Parameters:
Keywords = hierarchical semantic feature

18 pages, 6500 KiB  
Article
NSVDNet: Normalized Spatial-Variant Diffusion Network for Robust Image-Guided Depth Completion
by Jin Zeng and Qingpeng Zhu
Electronics 2024, 13(12), 2418; https://doi.org/10.3390/electronics13122418 - 20 Jun 2024
Viewed by 280
Abstract
Depth images captured by low-cost three-dimensional (3D) cameras are subject to low spatial density, requiring depth completion to improve 3D imaging quality. Image-guided depth completion aims at predicting dense depth images from extremely sparse depth measurements captured by depth sensors with the guidance of aligned Red–Green–Blue (RGB) images. Recent approaches have achieved remarkable improvements, but performance degrades severely when the input sparse depth is corrupted. To enhance robustness to input corruption, we propose a novel depth completion scheme based on a normalized spatial-variant diffusion network incorporating measurement uncertainty, which introduces the following contributions. First, we design a normalized spatial-variant diffusion (NSVD) scheme that iteratively applies spatially varying filters to the sparse depth, conditioned on its certainty measure, to exclude corrupted depth from the diffusion. In addition, we integrate the NSVD module into the network design to enable end-to-end training of filter kernels and depth reliability, which further improves structural detail preservation via the guidance of RGB semantic features. Furthermore, we apply the NSVD module hierarchically at multiple scales, which ensures global smoothness while preserving visually salient details. The experimental results validate the advantages of the proposed network over existing approaches, with enhanced performance and noise robustness for depth completion in real-use scenarios.
(This article belongs to the Special Issue Image Sensors and Companion Chips)
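
The central NSVD operation, iteratively applying spatially varying filters to the sparse depth while gating each filter tap by a per-pixel certainty measure, can be sketched in a few lines of PyTorch. This is a minimal illustration of the idea under assumed tensor shapes and an assumed renormalization scheme, not the authors' implementation:

```python
import torch
import torch.nn.functional as F

def nsvd_step(depth, conf, kernels, eps=1e-8):
    """One diffusion iteration: certainty-gated, renormalized local filtering.

    depth:   (B, 1, H, W) current (sparse or intermediate) depth map
    conf:    (B, 1, H, W) per-pixel certainty in [0, 1]
    kernels: (B, K*K, H, W) spatially varying filter weights
    """
    k = int(kernels.shape[1] ** 0.5)
    d_nbr = F.unfold(depth, k, padding=k // 2).view_as(kernels)  # neighborhoods
    c_nbr = F.unfold(conf, k, padding=k // 2).view_as(kernels)
    w = kernels * c_nbr                              # gate filter taps by certainty
    w = w / (w.sum(dim=1, keepdim=True) + eps)       # renormalize taps to sum to 1
    new_depth = (w * d_nbr).sum(dim=1, keepdim=True)
    new_conf = (kernels * c_nbr).sum(dim=1, keepdim=True).clamp(max=1.0)
    return new_depth, new_conf
```

Renormalizing by the certainty-weighted kernel mass is what keeps empty or corrupted measurements from dragging the diffused depth toward zero.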

24 pages, 4810 KiB  
Article
APTrans: Transformer-Based Multilayer Semantic and Locational Feature Integration for Efficient Text Classification
by Gaoyang Ji, Zengzhao Chen, Hai Liu, Tingting Liu and Bing Wang
Appl. Sci. 2024, 14(11), 4863; https://doi.org/10.3390/app14114863 - 4 Jun 2024
Viewed by 362
Abstract
Text classification is not only a prerequisite for natural language processing work, such as sentiment analysis and natural language reasoning, but is also of great significance for screening massive amounts of information in daily life. However, the performance of classification algorithms is always [...] Read more.
Text classification is not only a prerequisite for natural language processing work, such as sentiment analysis and natural language reasoning, but is also of great significance for screening massive amounts of information in daily life. However, the performance of classification algorithms is always affected due to the diversity of language expressions, inaccurate semantic information, colloquial information, and many other problems. We identify three clues in this study, namely, core relevance information, semantic location associations, and the mining characteristics of deep and shallow networks for different information, to cope with these challenges. Two key insights about the text are revealed based on these three clues: key information relationship and word group inline relationship. We propose a novel attention feature fusion network, Attention Pyramid Transformer (APTrans), which is capable of learning the core semantic and location information from sentences using the above-mentioned two key insights. Specially, a hierarchical feature fusion module, Feature Fusion Connection (FFCon), is proposed to merge the semantic features of higher layers with positional features of lower layers. Thereafter, a Transformer-based XLNet network is used as the backbone to initially extract the long dependencies from statements. Comprehensive experiments show that APTrans can achieve leading results on the THUCNews Chinese dataset, AG News, and TREC-QA English dataset, outperforming most excellent pre-trained models. Furthermore, extended experiments are carried out on a self-built Chinese dataset theme analysis of teachers’ classroom corpus. We also provide visualization work, further proving that APTrans has good potential in text classification work. Full article
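
The FFCon idea, merging higher-layer semantic features into lower-layer positional features, can be illustrated with a gated fusion sketch. The module structure, layer widths, and sigmoid gate below are assumptions for illustration, not the paper's exact design:

```python
import torch
import torch.nn as nn

class FFCon(nn.Module):
    """Gated fusion of high-layer semantics into low-layer positional tokens
    (assumed structure for illustration)."""
    def __init__(self, d_low, d_high, d_out):
        super().__init__()
        self.proj_low = nn.Linear(d_low, d_out)
        self.proj_high = nn.Linear(d_high, d_out)
        self.gate = nn.Linear(d_out, d_out)

    def forward(self, low_tokens, high_summary):
        # low_tokens: (B, T, d_low); high_summary: (B, d_high)
        l = self.proj_low(low_tokens)
        h = self.proj_high(high_summary).unsqueeze(1)   # (B, 1, d_out)
        g = torch.sigmoid(self.gate(h))                 # semantic gate in (0, 1)
        return g * l + h                                # gated tokens + context

# Usage with hypothetical widths:
fused = FFCon(256, 768, 256)(torch.randn(2, 128, 256), torch.randn(2, 768))
```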

26 pages, 21449 KiB  
Article
Automated Multi-Type Pavement Distress Segmentation and Quantification Using Transformer Networks for Pavement Condition Index Prediction
by Zaiyan Zhang, Weidong Song, Yangyang Zhuang, Bing Zhang and Jiachen Wu
Appl. Sci. 2024, 14(11), 4709; https://doi.org/10.3390/app14114709 - 30 May 2024
Viewed by 310
Abstract
Pavement distress detection is a crucial task when assessing pavement performance conditions. Here, a novel deep-learning method based on a transformer network, referred to as ISTD-DisNet, is proposed for multi-type pavement distress semantic segmentation. In this methodology, a mix transformer (MiT) based on a hierarchical transformer structure is chosen as the backbone to obtain multi-scale feature information on pavement distress, and a mixed attention module (MAM) is introduced at the decoding stage to capture the pavement distress features across different channels and spatial locations. A learnable transposed convolution upsampling module (TCUM) enhances the model's ability to restore multi-scale distress details. Subsequently, a novel parameter, the distress pixel density ratio (PDR), is introduced based on the segmentation results. Analyzing the intrinsic correlation between the PDR and the pavement condition index (PCI), we propose a new pavement damage index prediction model. Finally, the experimental results reveal that the F1 and mIoU of the proposed method are 95.51% and 91.67%, respectively, and the segmentation performance is better than that of the other seven mainstream segmentation models. Further PCI prediction model validation experiments also indicate that the PDR enables the quantitative evaluation of pavement damage conditions for each assessment unit, holding promising engineering application potential.
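
A PDR-style statistic is straightforward to compute from a segmentation mask. The sketch below assumes a simple pixel-count definition per assessment unit; the paper's exact formula may weight distress classes differently:

```python
import numpy as np

def pixel_density_ratio(mask, distress_class_ids):
    """Fraction of pixels labeled as distress in one assessment unit.
    mask: (H, W) integer class map produced by the segmentation model."""
    return np.isin(mask, distress_class_ids).mean()

# Example: classes 1 (crack) and 2 (pothole) are hypothetical distress labels.
unit_mask = np.random.randint(0, 3, size=(512, 512))
print(pixel_density_ratio(unit_mask, [1, 2]))
```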

15 pages, 6519 KiB  
Article
FF-HPINet: A Flipped Feature and Hierarchical Position Information Extraction Network for Lane Detection
by Xiaofeng Zhou and Peng Zhang
Sensors 2024, 24(11), 3502; https://doi.org/10.3390/s24113502 - 29 May 2024
Viewed by 316
Abstract
Effective lane detection technology plays an important role in current autonomous driving systems. Although deep learning models, with their intricate network designs, have proven highly capable of detecting lanes, key areas still require attention. Firstly, the symmetry inherent in visuals captured by forward-facing automotive cameras is an underexploited resource. Secondly, the vast potential of position information remains untapped, which can undermine detection precision. In response to these challenges, we propose FF-HPINet, a novel approach for lane detection. We introduce the Flipped Feature Extraction module, which models pixel-wise pairwise relationships between the flipped feature and the original feature. This module allows us to capture symmetrical features and obtain high-level semantic feature maps from different receptive fields. Additionally, we design the Hierarchical Position Information Extraction module to meticulously mine the position information of the lanes, vastly improving target identification accuracy. Furthermore, the Deformable Context Extraction module is proposed to distill vital foreground elements and contextual nuances from the surrounding environment, yielding focused and contextually apt feature representations. Our approach achieves excellent performance, with F1 scores of 97.00% on the TuSimple dataset and 76.84% on the CULane dataset.
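
Exploiting the left–right symmetry of forward-facing road imagery can be as simple as comparing a feature map with its horizontal mirror. The following sketch shows one plausible form of such a pixel-pairwise comparison; the actual module is more elaborate:

```python
import torch
import torch.nn.functional as F

def flipped_feature_similarity(feat):
    """Cosine similarity between a feature map and its horizontal mirror.
    feat: (B, C, H, W) -> (B, 1, H, W) symmetry response map."""
    flipped = torch.flip(feat, dims=[3])        # mirror along the width axis
    return F.cosine_similarity(feat, flipped, dim=1).unsqueeze(1)
```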

18 pages, 5494 KiB  
Article
Hierarchical Semantic-Guided Contextual Structure-Aware Network for Spectral Satellite Image Dehazing
by Lei Yang, Jianzhong Cao, Hua Wang, Sen Dong and Hailong Ning
Remote Sens. 2024, 16(9), 1525; https://doi.org/10.3390/rs16091525 - 25 Apr 2024
Viewed by 454
Abstract
Haze or cloud often shrouds satellite images, obscuring valuable geographic information for military surveillance, natural calamity monitoring, and mineral resource exploration. Satellite image dehazing (SID) enables better use of satellite images. Most existing dehazing methods are tailored for natural images and are not very effective for satellite images with non-homogeneous haze, since semantic structure information and inconsistent attenuation are not fully considered. To tackle this problem, this study proposes a hierarchical semantic-guided contextual structure-aware network (SCSNet) for spectral satellite image dehazing. Specifically, a hybrid CNN–Transformer architecture integrated with a hierarchical semantic guidance (HSG) module is presented to learn semantic structure information by synergistically complementing local representations with non-local features. Furthermore, a cross-layer fusion (CLF) module is specially designed to replace the traditional skip connection during the feature decoding stage, so as to reinforce attention to the spatial regions and feature channels with more severe attenuation. Results on the SateHaze1k, RS-Haze, and RSID datasets demonstrate that the proposed SCSNet achieves effective dehazing and outperforms existing state-of-the-art methods.
(This article belongs to the Special Issue Remote Sensing Cross-Modal Research: Algorithms and Practices)
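
A skip connection replaced by attenuation-aware re-weighting, as the CLF module does conceptually, might look like the following sketch, with channel and spatial gates standing in for the paper's specific design (assumed structure):

```python
import torch.nn as nn

class CrossLayerFusion(nn.Module):
    """Sketch of a CLF-style skip replacement: re-weight encoder features by
    channel and spatial attention before adding them to the decoder stream."""
    def __init__(self, channels):
        super().__init__()
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),            # global channel statistics
            nn.Conv2d(channels, channels, 1),
            nn.Sigmoid(),
        )
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(channels, 1, 7, padding=3),  # coarse spatial saliency
            nn.Sigmoid(),
        )

    def forward(self, enc_feat, dec_feat):
        gated = enc_feat * self.channel_gate(enc_feat)
        gated = gated * self.spatial_gate(gated)
        return dec_feat + gated
```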

21 pages, 1001 KiB  
Article
CCFNet: Collaborative Cross-Fusion Network for Medical Image Segmentation
by Jialu Chen and Baohua Yuan
Algorithms 2024, 17(4), 168; https://doi.org/10.3390/a17040168 - 21 Apr 2024
Viewed by 945
Abstract
The Transformer architecture has gained widespread acceptance in image segmentation. However, it sacrifices local feature details and necessitates extensive data for training, posing challenges to its integration into computer-aided medical image segmentation. To address the above challenges, we introduce CCFNet, a collaborative cross-fusion network, which continuously fuses a CNN and a Transformer interactively to exploit context dependencies. In particular, when integrating CNN features into the Transformer, the correlations between local and global tokens are adaptively fused through collaborative self-attention fusion to minimize the semantic disparity between these two types of features. When integrating Transformer features into the CNN, a spatial feature injector reduces the spatial information gap between features caused by the asymmetry of the extracted features. In addition, CCFNet implements the parallel operation of the Transformer and the CNN and independently encodes hierarchical global and local representations when effectively aggregating different features, which can preserve global representations and local features. Experimental findings on two public medical image segmentation datasets reveal that our approach exhibits competitive performance in comparison to current state-of-the-art methods.
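
Injecting Transformer context into the CNN stream is naturally expressed as cross-attention in which CNN pixels query global tokens. The sketch below is an assumed form of such a spatial feature injector, not the paper's exact module:

```python
import torch.nn as nn

class SpatialFeatureInjector(nn.Module):
    """Sketch: CNN pixels attend to global Transformer tokens, and the
    attended context is added back residually (assumed structure)."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, cnn_feat, global_tokens):
        # cnn_feat: (B, C, H, W); global_tokens: (B, N, C)
        b, c, h, w = cnn_feat.shape
        q = cnn_feat.flatten(2).transpose(1, 2)          # (B, H*W, C) queries
        out, _ = self.attn(q, global_tokens, global_tokens)
        return cnn_feat + out.transpose(1, 2).reshape(b, c, h, w)
```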

20 pages, 10556 KiB  
Article
HSAA-CD: A Hierarchical Semantic Aggregation Mechanism and Attention Module for Non-Agricultural Change Detection in Cultivated Land
by Fangting Li, Fangdong Zhou, Guo Zhang, Jianfeng Xiao and Peng Zeng
Remote Sens. 2024, 16(8), 1372; https://doi.org/10.3390/rs16081372 - 13 Apr 2024
Viewed by 564
Abstract
Cultivated land plays a fundamental role in the sustainable development of the world. Monitoring non-agricultural changes is important for the development of land-use policies. A bitemporal image transformer (BIT) can achieve high accuracy for change detection (CD) tasks and can also serve as a key scientific tool to support decision-making. Because of the diversity of high-resolution remote sensing images (RSIs) in series, the complexity of agricultural types, and the irregularity of hierarchical semantics in different types of changes, the accuracy of non-agricultural CD is far below what is needed for land management and resource planning. In this paper, we propose a novel non-agricultural CD method to improve the accuracy of machine processing. First, multi-resource surveying data are collected to produce a well-tagged dataset with cultivated land and non-agricultural changes. Secondly, a bitemporal image transformer method with a hierarchical semantic aggregation mechanism and attention module (HSAA), named HSAA-CD, is applied to non-agricultural CD in cultivated land. The proposed HSAA-CD adds a hierarchical semantic aggregation mechanism for clustering the input data for U-Net as the backbone network, and an attention module to improve feature edges. Experiments were performed on the open-source LEVIR-CD and WHU Building-CD datasets as well as on a self-built RSI dataset. Across these three datasets, the F1-score was 88.56%, 84.29%, and 68.50%; the intersection over union (IoU) was 79.84%, 73.41%, and 59.29%; and the overall accuracy (OA) was 98.83%, 98.39%, and 93.56%, respectively. The results indicate that the proposed HSAA-CD method outperforms the BIT and some other state-of-the-art methods, achieving accuracy suitable for non-agricultural CD in cultivated land.
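
The reported F1-score, IoU, and OA are standard confusion-matrix statistics for binary change maps and can be reproduced as follows (0/1 arrays assumed):

```python
import numpy as np

def change_detection_metrics(pred, gt, eps=1e-12):
    """F1, IoU, and overall accuracy for binary (0/1) change maps."""
    tp = np.sum((pred == 1) & (gt == 1))
    fp = np.sum((pred == 1) & (gt == 0))
    fn = np.sum((pred == 0) & (gt == 1))
    tn = np.sum((pred == 0) & (gt == 0))
    f1 = 2 * tp / (2 * tp + fp + fn + eps)
    iou = tp / (tp + fp + fn + eps)
    oa = (tp + tn) / (tp + fp + fn + tn + eps)
    return f1, iou, oa
```

Note that F1 and IoU are monotonically related (IoU = F1 / (2 - F1)), which is consistent with the pairs reported above.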

18 pages, 7301 KiB  
Article
Visual Clue Guidance and Consistency Matching Framework for Multimodal Named Entity Recognition
by Li He, Qingxiang Wang, Jie Liu, Jianyong Duan and Hao Wang
Appl. Sci. 2024, 14(6), 2333; https://doi.org/10.3390/app14062333 - 10 Mar 2024
Viewed by 886
Abstract
The goal of multimodal named entity recognition (MNER) is to detect entity spans in given image–text pairs and classify them into the corresponding entity types. Despite the success of existing works that leverage cross-modal attention mechanisms to integrate textual and visual representations, we observe three key issues. Firstly, models are prone to misguidance when fusing unrelated text and images. Secondly, most existing visual features are not enhanced or filtered. Finally, due to the independent encoding strategies employed for text and images, a noticeable semantic gap exists between them. To address these challenges, we propose a framework called visual clue guidance and consistency matching (GMF). To tackle the first issue, we introduce a visual clue guidance (VCG) module designed to hierarchically extract visual information at multiple scales. This information is used as an injectable visual clue guidance sequence to steer text representations toward error-insensitive prediction decisions. Furthermore, by incorporating a cross-scale attention (CSA) module, we mitigate interference across scales, enhancing the image's capability to capture details. To address the third issue, the semantic disparity between text and images, we employ a consistency matching (CM) module based on multimodal contrastive learning, facilitating the collaborative learning of multimodal data. To validate the effectiveness of our proposed framework, we conducted comprehensive experimental studies, including extensive comparative experiments, ablation studies, and case studies, on two widely used benchmark datasets.
(This article belongs to the Special Issue Cross-Applications of Natural Language Processing and Text Mining)
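
Consistency matching via multimodal contrastive learning is commonly realized as a symmetric InfoNCE loss over paired text and image embeddings. The sketch below shows that standard form; the CM module's exact formulation may differ:

```python
import torch
import torch.nn.functional as F

def contrastive_matching_loss(text_emb, img_emb, temperature=0.07):
    """Symmetric InfoNCE over paired (B, d) text and image embeddings:
    each text should match its own image and vice versa."""
    t = F.normalize(text_emb, dim=1)
    v = F.normalize(img_emb, dim=1)
    logits = t @ v.T / temperature                  # (B, B) similarity matrix
    targets = torch.arange(t.size(0), device=t.device)
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.T, targets)) / 2
```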

19 pages, 1296 KiB  
Article
Precision-Driven Product Recommendation Software: Unsupervised Models, Evaluated by GPT-4 LLM for Enhanced Recommender Systems
by Konstantinos I. Roumeliotis, Nikolaos D. Tselikas and Dimitrios K. Nasiopoulos
Software 2024, 3(1), 62-80; https://doi.org/10.3390/software3010004 - 29 Feb 2024
Cited by 1 | Viewed by 1509
Abstract
This paper presents a methodology for refining product recommender systems, introducing a synergistic integration of three unsupervised models, namely K-means clustering, content-based filtering (CBF), and hierarchical clustering, with the GPT-4 large language model (LLM). Its innovation lies in utilizing GPT-4 for model evaluation, harnessing its advanced natural language understanding capabilities to enhance the precision and relevance of product recommendations. A Flask-based API simplifies its implementation for e-commerce owners, allowing for the seamless training and evaluation of the models using CSV-formatted product data. The unique aspect of this approach lies in its ability to equip e-commerce sites with sophisticated unsupervised recommender algorithms, while the GPT model significantly contributes to refining the semantic context of product features, resulting in a more personalized and effective product recommendation system. The experimental results underscore the effectiveness of this integrated framework, marking an advancement in the field of recommender systems and providing businesses with an efficient and scalable solution to optimize their product recommendations.
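
The model mix described here, unsupervised clustering plus content-based filtering over CSV product data, can be prototyped in a few lines of scikit-learn. Column names, the cluster count, and the file name below are hypothetical; the paper additionally uses hierarchical clustering and GPT-4-based evaluation on top of a pipeline of this kind:

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical schema: the paper only states CSV-formatted product data.
products = pd.read_csv("products.csv")          # columns: id, title, description
tfidf = TfidfVectorizer(stop_words="english")
X = tfidf.fit_transform(products["description"].fillna(""))

# K-means groups products into coarse segments.
clusters = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(X)

def recommend(idx, top_k=5):
    """Content-based filtering restricted to the query product's cluster."""
    same = (clusters == clusters[idx]).nonzero()[0]
    sims = cosine_similarity(X[idx], X[same]).ravel()
    ranked = same[sims.argsort()[::-1]]
    return [products["id"].iloc[i] for i in ranked if i != idx][:top_k]
```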

12 pages, 1219 KiB  
Article
Hierarchical Perceptual Graph Attention Network for Knowledge Graph Completion
by Wenhao Han, Xuemei Liu, Jianhao Zhang and Hairui Li
Electronics 2024, 13(4), 721; https://doi.org/10.3390/electronics13040721 - 9 Feb 2024
Viewed by 889
Abstract
Knowledge graph completion (KGC), the process of predicting missing knowledge through known triples, is a primary focus of research in the field of knowledge graphs. As an important graph representation technique in deep learning, graph neural networks (GNNs) perform well in KGC, but most existing GNN-based KGC methods aggregate neighborhood information directly and individually, ignoring the rich hierarchical semantic structure of KGs. As a result, effectively handling multi-level complex relations remains an open problem. In this study, we present a hierarchical knowledge graph completion technique that combines relation-level and entity-level attention and incorporates a weight matrix to enhance the significance of the embedded information under different semantic conditions. Furthermore, it propagates neighborhood information to the central entity using a hierarchical aggregation approach. The proposed model enhances the capacity to capture hierarchical semantic feature information and is adaptable to various scoring functions as decoders, thus yielding robust results. We conducted experiments on a public benchmark dataset and compared our model with several state-of-the-art models; the results indicate that it outperforms existing models in several aspects, demonstrating its superior performance and validating its effectiveness.
(This article belongs to the Special Issue Natural Language Processing and Information Retrieval, 2nd Edition)
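
Combining relation-level and entity-level attention before aggregating neighbors into the central entity might look like the following single-entity sketch; the scoring functions and normalization are assumptions for illustration, not the paper's formulation:

```python
import torch
import torch.nn.functional as F

def hierarchical_aggregate(center, nbr_ent, nbr_rel):
    """Aggregate neighbors into a central entity with two attention levels.
    center: (d,); nbr_ent, nbr_rel: (n, d) neighbor entity/relation embeddings."""
    rel_att = F.softmax(nbr_rel @ center, dim=0)    # relation-level attention
    ent_att = F.softmax(nbr_ent @ center, dim=0)    # entity-level attention
    alpha = rel_att * ent_att
    alpha = alpha / alpha.sum()                     # combined hierarchical weights
    return center + (alpha.unsqueeze(1) * nbr_ent).sum(dim=0)
```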

22 pages, 13940 KiB  
Article
A Tolerance Specification Automatic Design Method for Screening Geometric Tolerance Types
by Guanghao Liu, Meifa Huang and Wenbo Su
Appl. Sci. 2024, 14(3), 1302; https://doi.org/10.3390/app14031302 - 5 Feb 2024
Viewed by 1042
Abstract
At present, the automatic generation of tolerance types based on rule-based reasoning has an obvious drawback: for the same assembly feature, every tolerance item that satisfies the feature's characteristics is recommended, yielding a large number of recommendations. Consequently, automatically selecting tolerance types while reducing the designer's manual workload remains a challenging task, especially for complex mechanical products designed using heterogeneous CAD systems. This article proposes a tolerance specification design method for the automatic selection of assembly tolerance types. Building on a hierarchical representation model of assembly tolerance information with tolerance-zone degrees of freedom (DOFs), a semantic model of geometric tolerance information with tolerance-zone DOFs and a meta-ontology model for representing assembly tolerance information are constructed. Description logic is used to express the attribute relationships between different classes in the assembly tolerance information meta-ontology model, and screening inference rules are constructed based on the mechanism for selecting assembly tolerance types from tolerance-zone DOFs. On this basis, a process for selecting assembly geometric tolerance types based on the ontology of tolerance-zone DOFs is formed. Finally, the effectiveness and feasibility of this method are verified through examples.
(This article belongs to the Special Issue Advances in Structural Optimization)
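
Screening tolerance types by tolerance-zone DOFs reduces, at its core, to a subset test between the DOFs a tolerance zone controls and those the assembly feature needs constrained. Below is a toy sketch with hypothetical rule entries; the real method encodes such rules in an ontology and evaluates them with description logic reasoning:

```python
# Hypothetical rule table: DOFs controlled by each tolerance type's zone.
TOLERANCE_ZONE_DOFS = {
    "flatness":         {"Tz", "Rx", "Ry"},
    "perpendicularity": {"Rx", "Ry"},
    "position":         {"Tx", "Ty"},
}

def screen_tolerance_types(feature_dofs):
    """Keep only tolerance types whose zone DOFs the feature can constrain."""
    return sorted(t for t, dofs in TOLERANCE_ZONE_DOFS.items()
                  if dofs <= feature_dofs)

print(screen_tolerance_types({"Tz", "Rx", "Ry"}))
# -> ['flatness', 'perpendicularity']
```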

18 pages, 41901 KiB  
Article
SVS-VPR: A Semantic Visual and Spatial Information-Based Hierarchical Visual Place Recognition for Autonomous Navigation in Challenging Environmental Conditions
by Saba Arshad and Tae-Hyoung Park
Sensors 2024, 24(3), 906; https://doi.org/10.3390/s24030906 - 30 Jan 2024
Viewed by 818
Abstract
Robust visual place recognition (VPR) enables mobile robots to identify previously visited locations. For this purpose, the extracted visual information and the place matching method play a significant role. In this paper, we critically review existing VPR methods and group them into three major categories based on the visual information used, i.e., handcrafted features, deep features, and semantics. Focusing on the benefits of convolutional neural networks (CNNs) and semantics, and on the limitations of existing research, we propose a robust appearance-based place recognition method, termed SVS-VPR, which is implemented as a hierarchical model consisting of two major components: global scene-based and local feature-based matching. The global scene semantics are extracted and compared with previously visited images to filter the match candidates, reducing the search space and computational cost. The local feature-based matching involves the extraction of robust local features from a CNN, which possess invariant properties against environmental conditions, and a place matching method that utilizes semantic, visual, and spatial information. SVS-VPR is evaluated on publicly available benchmark datasets using the true positive detection rate, recall at 100% precision, and area under the curve. Experimental findings demonstrate that SVS-VPR surpasses several state-of-the-art deep learning-based methods, boosting robustness against significant changes in viewpoint and appearance while maintaining efficient matching time performance.
(This article belongs to the Section Sensing and Imaging)
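
The two-stage hierarchy, global semantic filtering followed by local feature re-ranking, can be outlined as below. The descriptors and the placeholder local score are assumptions; the paper's matcher also exploits spatial consistency:

```python
import numpy as np

def hierarchical_match(q_global, db_global, q_local, db_local, top_n=10):
    """Stage 1: shortlist database places by global descriptor similarity.
    Stage 2: re-rank the shortlist with local feature matching.
    q_global: (d,); db_global: (N, d); q_local: (M, d); db_local: list of (Mi, d)."""
    shortlist = np.argsort(db_global @ q_global)[::-1][:top_n]

    def local_score(i):
        # Best-match similarity of each query feature, averaged (placeholder).
        return float((db_local[i] @ q_local.T).max(axis=1).mean())

    return max(shortlist, key=local_score)
```

Filtering with cheap global descriptors first is what keeps the expensive local matching stage tractable as the map grows.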

14 pages, 5412 KiB  
Article
Swin-Net: A Swin-Transformer-Based Network Combing with Multi-Scale Features for Segmentation of Breast Tumor Ultrasound Images
by Chengzhang Zhu, Xian Chai, Yalong Xiao, Xu Liu, Renmao Zhang, Zhangzheng Yang and Zhiyuan Wang
Diagnostics 2024, 14(3), 269; https://doi.org/10.3390/diagnostics14030269 - 26 Jan 2024
Viewed by 1559
Abstract
Breast cancer is one of the most common cancers in the world, especially among women. Breast tumor segmentation is a key step in the identification and localization of the breast tumor region, which has important clinical significance. Inspired by the Swin Transformer model with its powerful global modeling ability, we propose a semantic segmentation framework named Swin-Net for breast ultrasound images, which combines a Transformer and convolutional neural networks (CNNs) to effectively improve the accuracy of breast ultrasound segmentation. Firstly, our model utilizes a Swin Transformer encoder with stronger learning ability, which can extract image features more precisely. In addition, two new modules are introduced in our method, the feature refinement and enhancement module (RLM) and the hierarchical multi-scale feature fusion module (HFM), given that the effects of ultrasound image acquisition and the characteristics of tumor lesions are difficult to capture. The RLM module further refines and enhances the feature maps learned by the Transformer encoder. The HFM module processes multi-scale high-level semantic features and low-level details to achieve effective cross-layer feature fusion, suppress noise, and improve model segmentation performance. Experimental results show that Swin-Net performs significantly better than the most advanced methods on two public benchmark datasets. In particular, it achieves an absolute improvement of 1.4–1.8% on Dice. Additionally, we provide a new dataset of breast ultrasound images on which we test our model, further demonstrating the validity of our method. In summary, the proposed Swin-Net framework makes significant advances in breast ultrasound image segmentation, providing valuable exploration for research and applications in this domain.
(This article belongs to the Section Medical Imaging and Theranostics)

21 pages, 5618 KiB  
Article
ResU-Former: Advancing Remote Sensing Image Segmentation with Swin Residual Transformer for Precise Global–Local Feature Recognition and Visual–Semantic Space Learning
by Hanlu Li, Lei Li, Liangyu Zhao and Fuxiang Liu
Electronics 2024, 13(2), 436; https://doi.org/10.3390/electronics13020436 - 20 Jan 2024
Cited by 1 | Viewed by 1080
Abstract
In the field of remote sensing image segmentation, achieving high accuracy and efficiency in diverse and complex environments remains a challenge. Additionally, there is a notable imbalance between the underlying features and the high-level semantic information embedded within remote sensing images, and improvements in both global and local recognition are limited by multi-scale remote sensing scenery and imbalanced class distributions. These challenges are further compounded by inaccurate local segmentation and the oversight of small-scale features. To balance visual space and semantic space, to increase both global and local recognition accuracy, and to enhance the flexibility of input-scale features while supplementing global contextual information, in this paper, we propose a U-shaped hierarchical structure called ResU-Former. The incorporation of the Swin Residual Transformer block allows for the efficient segmentation of objects of varying sizes against complex backgrounds, a common scenario in remote sensing datasets. With the specially designed Swin Residual Transformer block as its fundamental unit, ResU-Former accomplishes the full utilization and evolution of information and the maximum optimization of semantic segmentation in complex remote sensing scenarios. Standard experimental results on benchmark datasets such as Vaihingen (an overall accuracy of 81.5%) show ResU-Former's potential to improve segmentation tasks across various remote sensing applications.

16 pages, 4024 KiB  
Article
Features Split and Aggregation Network for Camouflaged Object Detection
by Zejin Zhang, Tao Wang, Jian Wang and Yao Sun
J. Imaging 2024, 10(1), 24; https://doi.org/10.3390/jimaging10010024 - 18 Jan 2024
Viewed by 1610
Abstract
Camouflaged objects are not visually distinct from their surroundings, making it easy to overlook the difference between background and foreground, and thereby setting higher standards for detection systems. In this paper, we present a new framework for camouflaged object detection (COD) named FSANet, which consists mainly of three operations: spatial detail mining (SDM), cross-scale feature combination (CFC), and a hierarchical feature aggregation decoder (HFAD). The framework simulates the three-stage detection process of the human visual mechanism when observing a camouflaged scene. Specifically, we extract five feature layers using the backbone and divide them into two parts, with the second layer as the boundary. The SDM module simulates the human cursory inspection of camouflaged objects to gather spatial details (such as edge and texture) and fuses the features to create a cursory impression. The CFC module is used to observe high-level features from various viewing angles and extracts the same features by thoroughly filtering features of various levels; we also design side-join multiplication in the CFC module to avoid detail distortion and use feature element-wise multiplication to filter out noise. Finally, we construct an HFAD module to deeply mine effective features from these two stages, direct the fusion of low-level features using high-level semantic knowledge, and improve the camouflage map using hierarchical cascade technology. Compared with nineteen deep-learning-based methods on seven widely used metrics, our proposed framework has clear advantages on four public COD datasets, demonstrating the effectiveness and superiority of our model.
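
Element-wise multiplication as a noise filter in cross-scale combination can be sketched as gating low-level features by an upsampled high-level map. Channel counts are assumed to match; this illustrates the general pattern, not FSANet's exact CFC design:

```python
import torch
import torch.nn.functional as F

def cross_scale_gate(high, low):
    """Gate low-level features by an upsampled high-level map via element-wise
    multiplication, suppressing responses lacking semantic support.
    high: (B, C, h, w) coarse semantic map; low: (B, C, H, W) detail map."""
    high_up = F.interpolate(high, size=low.shape[-2:], mode="bilinear",
                            align_corners=False)
    return low * torch.sigmoid(high_up)
```

Multiplicative gating zeroes out low-level activations wherever the semantic map is silent, which is why it acts as a noise filter rather than a blend.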