Search Results (1,620)

Search Parameters:
Keywords = CNN–Transformer

9 pages, 4309 KiB  
Communication
Attention Mechanism-Based Glaucoma Classification Model Using Retinal Fundus Images
by You-Sang Cho, Ho-Jung Song, Ju-Hyuck Han and Yong-Suk Kim
Sensors 2024, 24(14), 4684; https://doi.org/10.3390/s24144684 - 19 Jul 2024
Abstract
This paper presents a classification model for eye diseases utilizing attention mechanisms to learn features from fundus images and structures. The study focuses on diagnosing glaucoma by extracting retinal vessels and the optic disc from fundus images using a ResU-Net-based segmentation model and Hough Circle Transform, respectively. The extracted structures and preprocessed images were fed into a CNN-based multi-input model for training. Comparative evaluations demonstrated that our model outperformed other research models in classifying glaucoma, even with a smaller dataset. Ablation studies confirmed that using attention mechanisms to learn fundus structures significantly enhanced performance. The study also highlighted the challenges in normal case classification due to potential feature degradation during structure extraction. Future research will focus on incorporating additional fundus structures such as the macula, refining extraction algorithms, and expanding the types of classified eye diseases.
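
As a rough illustration of the optic-disc step, a Hough Circle Transform can be applied to a smoothed fundus image with OpenCV; the sketch below is a minimal, assumption-laden version (the function name and all parameter values are illustrative, not the authors' settings):

```python
import cv2
import numpy as np

def locate_optic_disc(fundus_bgr: np.ndarray):
    """Roughly localize the optic disc as the strongest circle in a fundus image.

    Returns (x, y, radius) in pixels, or None if no circle is found.
    All parameter values are illustrative assumptions, not the paper's settings.
    """
    gray = cv2.cvtColor(fundus_bgr, cv2.COLOR_BGR2GRAY)
    gray = cv2.medianBlur(gray, 5)  # suppress vessel texture before the transform
    circles = cv2.HoughCircles(
        gray,
        cv2.HOUGH_GRADIENT,
        dp=1.2,       # inverse accumulator resolution
        minDist=200,  # only one disc expected per image
        param1=100,   # Canny high threshold
        param2=30,    # accumulator threshold: lower = more candidates
        minRadius=40,
        maxRadius=120,
    )
    if circles is None:
        return None
    x, y, r = np.round(circles[0, 0]).astype(int)
    return int(x), int(y), int(r)
```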

12 pages, 1800 KiB  
Article
Research on Public Service Request Text Classification Based on BERT-BiLSTM-CNN Feature Fusion
by Yunpeng Xiong, Guolian Chen and Junkuo Cao
Appl. Sci. 2024, 14(14), 6282; https://doi.org/10.3390/app14146282 - 18 Jul 2024
Abstract
Convolutional neural networks (CNNs) face challenges in capturing long-distance text correlations, and Bidirectional Long Short-Term Memory (BiLSTM) networks exhibit limited feature extraction capabilities for text classification of public service requests. To address these problems, this work uses an ensemble learning approach to integrate model elements efficiently. This study presents a method for classifying public service request text using a hybrid neural network model called BERT-BiLSTM-CNN. First, BERT (Bidirectional Encoder Representations from Transformers) is used for preprocessing to obtain text vector representations. Then, contextual and sequential information is captured through a BiLSTM. Next, local features in the text are captured through a CNN. Finally, classification results are obtained through a softmax layer. Comparative analysis shows that fusing these three models outperforms other hybrid neural network architectures on multiple classification tasks and is particularly effective for public service request text classification.
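
For a concrete picture of the pipeline, a minimal PyTorch sketch of a BERT → BiLSTM → CNN → softmax classifier might look as follows; the checkpoint name and all layer sizes are assumptions, since the abstract gives no hyperparameters:

```python
import torch
import torch.nn as nn
from transformers import BertModel

class BertBiLstmCnn(nn.Module):
    """BERT embeddings -> BiLSTM sequence encoding -> CNN local features -> softmax.

    The checkpoint and dimensions are illustrative assumptions, not the paper's.
    """
    def __init__(self, num_classes: int, lstm_hidden: int = 128, conv_channels: int = 100):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-chinese")  # assumed checkpoint
        self.bilstm = nn.LSTM(
            input_size=self.bert.config.hidden_size,
            hidden_size=lstm_hidden,
            batch_first=True,
            bidirectional=True,
        )
        # 1D convolution over the BiLSTM outputs captures local n-gram features.
        self.conv = nn.Conv1d(2 * lstm_hidden, conv_channels, kernel_size=3, padding=1)
        self.fc = nn.Linear(conv_channels, num_classes)

    def forward(self, input_ids, attention_mask):
        hidden = self.bert(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        seq, _ = self.bilstm(hidden)                         # (B, T, 2*lstm_hidden)
        feats = torch.relu(self.conv(seq.transpose(1, 2)))   # (B, C, T)
        pooled = feats.max(dim=2).values                     # global max pooling over time
        return torch.log_softmax(self.fc(pooled), dim=-1)
```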

18 pages, 1603 KiB  
Article
SPT-UNet: A Superpixel-Level Feature Fusion Network for Water Extraction from SAR Imagery
by Teng Zhao, Xiaoping Du, Chen Xu, Hongdeng Jian, Zhipeng Pei, Junjie Zhu, Zhenzhen Yan and Xiangtao Fan
Remote Sens. 2024, 16(14), 2636; https://doi.org/10.3390/rs16142636 - 18 Jul 2024
Abstract
Extracting water bodies from synthetic aperture radar (SAR) images plays a crucial role in the management of water resources, flood monitoring, and other applications. Recently, transformer-based models have been extensively utilized in the remote sensing domain. However, due to regular patch-partition and weak inductive bias, transformer-based models face challenges such as edge serration and high data dependency when used for water body extraction from SAR images. To address these challenges, we introduce a new model, the Superpixel-based Transformer (SPT), based on the adaptive characteristic of superpixels and knowledge constraints of the adjacency matrix. (1) To mitigate edge serration, the SPT replaces regular patch partition with superpixel segmentation to fully utilize the internal homogeneity of superpixels. (2) To reduce data dependency, the SPT incorporates a normalized adjacency matrix between superpixels into the Multi-Layer Perceptron (MLP) to impose knowledge constraints. (3) Additionally, to integrate superpixel-level learning from the SPT with pixel-level learning from the CNN, we combine these two deep networks to form SPT-UNet for water body extraction. The results show that our SPT-UNet is competitive compared with other state-of-the-art extraction models, both in terms of quantitative metrics and visual effects.
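
To make the superpixel front end concrete, the following sketch builds SLIC superpixels and a row-normalized adjacency matrix with scikit-image and NumPy; the segmentation parameters and the self-loop normalization are assumptions, not the paper's exact construction:

```python
import numpy as np
from skimage.segmentation import slic

def superpixel_adjacency(image: np.ndarray, n_segments: int = 500):
    """Segment an image (H x W x 3) into superpixels and build a row-normalized
    adjacency matrix between them. Parameters are illustrative assumptions.
    """
    labels = slic(image, n_segments=n_segments, compactness=10, start_label=0)
    n = labels.max() + 1
    adj = np.zeros((n, n), dtype=np.float32)
    # Two superpixels are adjacent if their labels touch horizontally or vertically.
    for a, b in [(labels[:, :-1], labels[:, 1:]), (labels[:-1, :], labels[1:, :])]:
        mask = a != b
        adj[a[mask], b[mask]] = 1.0
        adj[b[mask], a[mask]] = 1.0
    adj += np.eye(n, dtype=np.float32)   # self-loops, GCN-style
    deg = adj.sum(axis=1, keepdims=True)
    return labels, adj / deg             # label map and row-normalized adjacency
```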
17 pages, 7982 KiB  
Article
Deep Dynamic Weights for Underwater Image Restoration
by Hafiz Shakeel Ahmad Awan and Muhammad Tariq Mahmood
J. Mar. Sci. Eng. 2024, 12(7), 1208; https://doi.org/10.3390/jmse12071208 - 18 Jul 2024
Abstract
Underwater imaging presents unique challenges, notably color distortions and reduced contrast due to light attenuation and scattering. Most underwater image enhancement methods first use linear transformations for color compensation and then enhance the image. We observed that linear transformation for color compensation is not suitable for certain images; for such images, non-linear mapping is a better choice. This paper introduces a unique underwater image restoration approach leveraging a streamlined convolutional neural network (CNN) that learns dynamic weights for linear and non-linear mappings. In the first phase, a classifier is applied that classifies the input images as Type I or Type II. In the second phase, we use the Deep Line Model (DLM) for Type-I images and the Deep Curve Model (DCM) for Type-II images. For mapping an input image to an output image, the DLM creatively combines color compensation and contrast adjustment in a single step and uses deep lines for transformation, whereas the DCM employs higher-order curves. Both models utilize lightweight neural networks that learn per-pixel dynamic weights based on the input image's characteristics. Comprehensive evaluations on benchmark datasets using metrics like peak signal-to-noise ratio (PSNR) and root mean square error (RMSE) affirm our method's effectiveness in accurately restoring underwater images, outperforming existing techniques.
(This article belongs to the Special Issue Application of Deep Learning in Underwater Image Processing)
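
A loose sketch of the Deep Line Model idea, in which a lightweight CNN predicts per-pixel slope and intercept maps for a single affine ("line") transformation, is shown below; the network depth and channel widths are assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

class DeepLineModel(nn.Module):
    """Per-pixel linear mapping out = a(x) * x + b(x), with a and b predicted
    per pixel by a lightweight CNN. A loose sketch of the Deep Line Model idea;
    all layer sizes are assumptions.
    """
    def __init__(self, channels: int = 16):
        super().__init__()
        self.weights = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, 6, 3, padding=1),  # per-pixel slope and intercept per RGB channel
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a, b = self.weights(x).chunk(2, dim=1)  # each (B, 3, H, W)
        # Deep line: color compensation and contrast adjustment in one affine step.
        return torch.clamp(a * x + b, 0.0, 1.0)
```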

21 pages, 3747 KiB  
Article
ViT-PSO-SVM: Cervical Cancer Predication Based on Integrating Vision Transformer with Particle Swarm Optimization and Support Vector Machine
by Abdulaziz AlMohimeed, Mohamed Shehata, Nora El-Rashidy, Sherif Mostafa, Amira Samy Talaat and Hager Saleh
Bioengineering 2024, 11(7), 729; https://doi.org/10.3390/bioengineering11070729 - 18 Jul 2024
Abstract
Cervical cancer (CCa) is the fourth most prevalent and common cancer affecting women worldwide, with increasing incidence and mortality rates. Hence, early detection of CCa plays a crucial role in improving outcomes. Non-invasive imaging procedures with good diagnostic performance are desirable and have the potential to lessen the degree of intervention associated with the gold standard, biopsy. Recently, artificial intelligence-based diagnostic models such as Vision Transformers (ViT) have shown promising performance in image classification tasks, rivaling or surpassing traditional convolutional neural networks (CNNs). This paper studies the effect of applying a ViT to predict CCa using different image benchmark datasets. A newly developed approach (ViT-PSO-SVM) is presented for boosting the results of the ViT by integrating it with particle swarm optimization (PSO) and a support vector machine (SVM). First, the proposed framework extracts features from the Vision Transformer. Then, PSO is used to reduce the complexity of the extracted features and optimize their representation. Finally, the softmax classification layer is replaced with an SVM classification model to precisely predict CCa. The models are evaluated using two benchmark cervical cell image datasets, namely SipakMed and Herlev, with different classification scenarios: two, three, and five classes. The proposed approach achieved 99.112% accuracy and 99.113% F1-score for SipakMed with two classes, and 97.778% accuracy and 97.805% F1-score for Herlev with two classes, outperforming other Vision Transformers, CNN models, and pre-trained models. Finally, GradCAM is used as an explainable artificial intelligence (XAI) tool to visualize and understand the regions of a given image that are important for a model's prediction. The obtained experimental results demonstrate the feasibility and efficacy of the developed ViT-PSO-SVM approach and hold the promise of providing a robust, reliable, accurate, and non-invasive diagnostic tool that will lead to improved healthcare outcomes worldwide.
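
As a toy version of the PSO-plus-SVM stage, the following sketch runs a hand-rolled particle swarm over feature-subset masks, scoring each subset by the cross-validated accuracy of an RBF SVM; the input X is assumed to be ViT-extracted feature vectors, and all swarm hyperparameters are illustrative:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def pso_feature_selection(X, y, n_particles=20, n_iters=30, seed=0):
    """Select a feature subset for an SVM with a simple particle swarm.

    Positions in [0, 1] are thresholded at 0.5 to pick features; fitness is
    3-fold CV accuracy of an RBF SVM. A sketch of the ViT-PSO-SVM idea, not
    the paper's implementation.
    """
    rng = np.random.default_rng(seed)
    dim = X.shape[1]
    pos = rng.random((n_particles, dim))
    vel = np.zeros_like(pos)
    pbest, pbest_fit = pos.copy(), np.full(n_particles, -np.inf)
    gbest, gbest_fit = pos[0].copy(), -np.inf

    def fitness(p):
        mask = p > 0.5
        if not mask.any():
            return -np.inf
        return cross_val_score(SVC(kernel="rbf"), X[:, mask], y, cv=3).mean()

    for _ in range(n_iters):
        for i in range(n_particles):
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest_fit[i], pbest[i] = fit, pos[i].copy()
            if fit > gbest_fit:
                gbest_fit, gbest = fit, pos[i].copy()
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, 0.0, 1.0)
    return gbest > 0.5, gbest_fit  # selected-feature mask and its CV accuracy
```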

20 pages, 560 KiB  
Article
Deep Learning Soft-Decision GNSS Multipath Detection and Mitigation
by Fernando Nunes and Fernando Sousa
Sensors 2024, 24(14), 4663; https://doi.org/10.3390/s24144663 - 18 Jul 2024
Abstract
A technique is proposed to detect the presence of the multipath effect in Global Navigation Satellite System (GNSS) signals using a convolutional neural network (CNN) as the building block. The network is trained and validated, for a wide range of C/N0 values, with a realistic dataset consisting of the synthetic noisy outputs of a 2D grid of correlators associated with different Doppler frequencies and code delays (time-domain dataset). Multipath-disturbed signals are generated in agreement with the various scenarios encompassed by the adopted multipath model. It was found that pre-processing the outputs of the correlator grid with the two-dimensional Discrete Fourier Transform (frequency-domain dataset) enables the CNN to improve the accuracy relative to the time-domain dataset. Depending on the kind of CNN outputs, two strategies can then be devised to solve the equation of navigation: either remove the disturbed signal from the equation (hard decision) or process the pseudoranges with a weighted least-squares algorithm, where the entries of the weighting matrix are computed using the analog outputs of the neural network (soft decision).
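
The soft-decision branch can be illustrated with a single weighted least-squares step in which the CNN's per-satellite "clean" probabilities serve directly as weights; using the probabilities unchanged as the weighting matrix is an assumption for the sketch, not necessarily the paper's exact scheme:

```python
import numpy as np

def wls_position_fix(G: np.ndarray, rho_residual: np.ndarray, p_clean: np.ndarray) -> np.ndarray:
    """One Gauss-Newton step of a weighted least-squares navigation solution.

    G: (n_sats, 4) geometry matrix (unit line-of-sight vectors plus a clock column).
    rho_residual: pseudorange residuals.
    p_clean: CNN's per-satellite probability that the signal is multipath-free
    (the 'soft decision'); used here directly as weights, an illustrative choice.
    """
    W = np.diag(p_clean)  # down-weight satellites likely affected by multipath
    return np.linalg.solve(G.T @ W @ G, G.T @ W @ rho_residual)  # [dx, dy, dz, d(clock)]
```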

25 pages, 43361 KiB  
Article
DFFNet: A Rainfall Nowcasting Model Based on Dual-Branch Feature Fusion
by Shuxian Liu, Yulong Liu, Jiong Zheng, Yuanyuan Liao, Guohong Zheng and Yongjun Zhang
Electronics 2024, 13(14), 2826; https://doi.org/10.3390/electronics13142826 - 18 Jul 2024
Abstract
Timely and accurate rainfall prediction is crucial to social life and economic activities. Because numerous factors influence rainfall, making precise predictions is challenging. In this study, the northern Xinjiang region of China is selected as the research area. Based on the pattern of rainfall in the local area and the needs of real life, rainfall is divided into four levels, namely 'no rain', 'light rain', 'moderate rain', and 'heavy rain and above', for rainfall level nowcasting. To address the problem that existing models can extract only a single type of time dependence and thus lose valuable information in rainfall data, this paper proposes DFFNet, a prediction model based on dual-branch feature fusion. The two branches of the model are composed of a Transformer and a CNN, which are used to extract time dependence and feature interaction in meteorological data, respectively. The features extracted from the two branches are fused for prediction. To verify the performance of DFFNet, the Indian public rainfall dataset and several sub-datasets from the UEA archive are chosen for comparison. Compared with the baseline models, DFFNet achieves the best prediction performance on all the selected datasets; compared with the single-branch model, the training time of DFFNet on the two rainfall datasets is reduced by 21% and 9.6%, respectively, and it converges faster. The experimental results show that DFFNet has both theoretical and practical value for rainfall nowcasting.
(This article belongs to the Special Issue Application of Big Data Mining and Analysis)
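
A loose PyTorch sketch of such a dual-branch design, with a Transformer encoder for temporal dependence and a 1D CNN for feature interaction fused by concatenation, is given below; every size in it is an assumption:

```python
import torch
import torch.nn as nn

class DualBranchFusion(nn.Module):
    """Transformer branch for time dependence, CNN branch for feature interaction,
    fused by concatenation for 4-level rainfall classification. A loose sketch of
    the DFFNet idea; all dimensions are assumptions.
    """
    def __init__(self, n_features: int, d_model: int = 64, n_classes: int = 4):
        super().__init__()
        self.embed = nn.Linear(n_features, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
        self.cnn = nn.Sequential(
            nn.Conv1d(n_features, d_model, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.head = nn.Linear(2 * d_model, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, features) of meteorological variables
        t = self.transformer(self.embed(x)).mean(dim=1)  # temporal branch
        c = self.cnn(x.transpose(1, 2)).squeeze(-1)      # feature-interaction branch
        return self.head(torch.cat([t, c], dim=1))       # fused prediction logits
```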

16 pages, 9223 KiB  
Article
NATCA YOLO-Based Small Object Detection for Aerial Images
by Yicheng Zhu, Zhenhua Ai, Jinqiang Yan, Silong Li, Guowei Yang and Teng Yu
Information 2024, 15(7), 414; https://doi.org/10.3390/info15070414 - 18 Jul 2024
Abstract
The object detection model in UAV aerial image scenes faces challenges such as significant scale changes of certain objects and the presence of complex backgrounds. This paper aims to address the detection of small objects in aerial images using NATCA (neighborhood attention Transformer coordinate attention) YOLO. Specifically, the feature extraction network incorporates a neighborhood attention transformer (NAT) into the last layer to capture global context information and extract diverse features. Additionally, the feature fusion network (Neck) incorporates a coordinate attention (CA) module to capture channel information and longer-range positional information. Furthermore, the activation function in the original convolutional block is replaced with Meta-ACON. The NAT serves as the prediction layer in the new network, which is evaluated using the VisDrone2019-DET object detection dataset as a benchmark, and tested on the VisDrone2019-DET-test-dev dataset. To assess the performance of the NATCA YOLO model in detecting small objects in aerial images, other detection networks, such as Faster R-CNN, RetinaNet, and SSD, are employed for comparison on the test set. The results demonstrate that the NATCA YOLO detection achieves an average accuracy of 42%, which is a 2.9% improvement compared to the state-of-the-art detection network TPH-YOLOv5.
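
Coordinate attention itself is a published, well-defined module (Hou et al., CVPR 2021); a generic PyTorch implementation, not taken from this paper's code, looks like this:

```python
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    """Coordinate attention (Hou et al., CVPR 2021): pool along each spatial axis,
    encode the two strips jointly, then re-weight the feature map per channel and
    position. A generic implementation, not extracted from the paper's code.
    """
    def __init__(self, channels: int, reduction: int = 32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.conv1 = nn.Conv2d(channels, mid, 1)
        self.bn = nn.BatchNorm2d(mid)
        self.act = nn.Hardswish()
        self.conv_h = nn.Conv2d(mid, channels, 1)
        self.conv_w = nn.Conv2d(mid, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        pool_h = x.mean(dim=3, keepdim=True)                  # (B, C, H, 1)
        pool_w = x.mean(dim=2, keepdim=True).transpose(2, 3)  # (B, C, W, 1)
        y = self.act(self.bn(self.conv1(torch.cat([pool_h, pool_w], dim=2))))
        y_h, y_w = y.split([h, w], dim=2)
        attn_h = torch.sigmoid(self.conv_h(y_h))                  # (B, C, H, 1)
        attn_w = torch.sigmoid(self.conv_w(y_w.transpose(2, 3)))  # (B, C, 1, W)
        return x * attn_h * attn_w
```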

23 pages, 7788 KiB  
Article
A Novel Mamba Architecture with a Semantic Transformer for Efficient Real-Time Remote Sensing Semantic Segmentation
by Hao Ding, Bo Xia, Weilin Liu, Zekai Zhang, Jinglin Zhang, Xing Wang and Sen Xu
Remote Sens. 2024, 16(14), 2620; https://doi.org/10.3390/rs16142620 - 17 Jul 2024
Abstract
Real-time remote sensing segmentation technology is crucial for unmanned aerial vehicles (UAVs) in battlefield surveillance, land characterization observation, earthquake disaster assessment, etc., and can significantly enhance the application value of UAVs in military and civilian fields. To realize this potential, it is essential to develop real-time semantic segmentation methods that can be applied to resource-limited platforms, such as edge devices. The majority of mainstream real-time semantic segmentation methods rely on convolutional neural networks (CNNs) and transformers. However, CNNs cannot effectively capture long-range dependencies, while transformers have high computational complexity. This paper proposes a novel remote sensing Mamba architecture for real-time segmentation tasks in remote sensing, named RTMamba. Specifically, the backbone utilizes a Visual State-Space (VSS) block to extract deep features and maintains linear computational complexity, thereby capturing long-range contextual information. Additionally, a novel Inverted Triangle Pyramid Pooling (ITP) module is incorporated into the decoder. The ITP module can effectively filter redundant feature information and enhance the perception of objects and their boundaries in remote sensing images. Extensive experiments were conducted on three challenging aerial remote sensing segmentation benchmarks, including Vaihingen, Potsdam, and LoveDA. The results show that RTMamba achieves competitive performance advantages in terms of segmentation accuracy and inference speed compared to state-of-the-art CNN and transformer methods. To further validate the deployment potential of the model on embedded devices with limited resources, such as UAVs, we conducted tests on the Jetson AGX Orin edge device. The experimental results demonstrate that RTMamba achieves impressive real-time segmentation performance.
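
On the edge-deployment angle, real-time claims are usually backed by a throughput measurement like the generic sketch below; the input resolution and iteration counts are assumptions, not the paper's protocol:

```python
import time
import torch

@torch.no_grad()
def benchmark_fps(model: torch.nn.Module, size=(1, 3, 512, 512), warmup=10, iters=100):
    """Measure inference throughput (frames per second) of a segmentation model.

    A generic benchmarking sketch in the spirit of the paper's Jetson tests;
    the input resolution is an assumption.
    """
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device).eval()
    x = torch.randn(size, device=device)
    for _ in range(warmup):       # warm up kernels and caches
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()  # ensure timing excludes queued GPU work
    start = time.perf_counter()
    for _ in range(iters):
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()
    return iters / (time.perf_counter() - start)
```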

14 pages, 968 KiB  
Article
MambaReID: Exploiting Vision Mamba for Multi-Modal Object Re-Identification
by Ruijuan Zhang, Lizhong Xu, Song Yang and Li Wang
Sensors 2024, 24(14), 4639; https://doi.org/10.3390/s24144639 - 17 Jul 2024
Abstract
Multi-modal object re-identification (ReID) is a challenging task that seeks to identify objects across different image modalities by leveraging their complementary information. Traditional CNN-based methods are constrained by limited receptive fields, whereas Transformer-based approaches are hindered by high computational demands and a lack of convolutional biases. To overcome these limitations, we propose a novel fusion framework named MambaReID, integrating the strengths of both architectures with the effective VMamba. Specifically, our MambaReID consists of three components: Three-Stage VMamba (TSV), Dense Mamba (DM), and Consistent VMamba Fusion (CVF). TSV efficiently captures global context information and local details with low computational complexity. DM enhances feature discriminability by fully integrating inter-modality information with shallow and deep features through dense connections. Additionally, with well-aligned multi-modal images, CVF provides more granular modal aggregation, thereby improving feature robustness. The MambaReID framework, with its innovative components, not only achieves superior performance in multi-modal object ReID tasks, but also does so with fewer parameters and lower computational costs. Our proposed MambaReID's effectiveness is validated by extensive experiments conducted on three multi-modal object ReID benchmarks.
(This article belongs to the Section Sensing and Imaging)

18 pages, 11917 KiB  
Article
Exploring Spectrogram-Based Audio Classification for Parkinson’s Disease: A Study on Speech Classification and Qualitative Reliability Verification
by Seung-Min Jeong, Seunghyun Kim, Eui Chul Lee and Han Joon Kim
Sensors 2024, 24(14), 4625; https://doi.org/10.3390/s24144625 - 17 Jul 2024
Abstract
Patients with Parkinson's disease commonly suffer from voice impairment. In this study, we introduce models to classify normal and Parkinson's patients using their speech. We used an AST (audio spectrogram transformer), a transformer-based speech classification model that has recently outperformed CNN-based models in many fields, and a CNN-based PSLA (pretraining, sampling, labeling, and aggregation), a high-performance model in the existing speech classification field, for the study. This study compares and analyzes the models from both quantitative and qualitative perspectives. First, quantitatively, PSLA outperformed AST by more than 4% in accuracy, and the AUC was also higher, with 94.16% for AST and 97.43% for PSLA. Furthermore, we qualitatively evaluated the ability of the models to capture the acoustic features of Parkinson's through various CAM (class activation map)-based XAI (eXplainable AI) models such as GradCAM and EigenCAM. Based on PSLA, we found that the model focuses well on the muffled frequency band of Parkinson's speech, and heatmap analysis of false positives and false negatives shows that the speech features are also visually represented when the model makes incorrect predictions. The contribution of this paper is that we not only found a suitable model for diagnosing Parkinson's through speech using two different types of models but also validated the predictions of the model in practice.
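
Both AST- and PSLA-style models consume log-mel spectrograms; a minimal torchaudio sketch of that front end, with typical rather than paper-specific parameters, is:

```python
import torch
import torchaudio

def log_mel_spectrogram(wav_path: str, n_mels: int = 128) -> torch.Tensor:
    """Load a speech recording and convert it to a log-mel spectrogram, the common
    input for both AST- and PSLA-style classifiers. Parameter values are typical
    defaults, not the paper's configuration.
    """
    waveform, sr = torchaudio.load(wav_path)
    waveform = waveform.mean(dim=0, keepdim=True)  # mix down to mono
    mel = torchaudio.transforms.MelSpectrogram(
        sample_rate=sr, n_fft=1024, hop_length=160, n_mels=n_mels
    )(waveform)
    return torch.log(mel + 1e-6)  # (1, n_mels, frames), ready for a CNN or patch embedding
```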

41 pages, 33915 KiB  
Article
Four Transformer-Based Deep Learning Classifiers Embedded with an Attention U-Net-Based Lung Segmenter and Layer-Wise Relevance Propagation-Based Heatmaps for COVID-19 X-ray Scans
by Siddharth Gupta, Arun K. Dubey, Rajesh Singh, Mannudeep K. Kalra, Ajith Abraham, Vandana Kumari, John R. Laird, Mustafa Al-Maini, Neha Gupta, Inder Singh, Klaudija Viskovic, Luca Saba and Jasjit S. Suri
Diagnostics 2024, 14(14), 1534; https://doi.org/10.3390/diagnostics14141534 - 16 Jul 2024
Abstract
Background: Diagnosing lung diseases accurately is crucial for proper treatment. Convolutional neural networks (CNNs) have advanced medical image processing, but challenges remain in their accurate explainability and reliability. This study combines U-Net with attention and Vision Transformers (ViTs) to enhance lung disease segmentation and classification. We hypothesize that Attention U-Net will enhance segmentation accuracy and that ViTs will improve classification performance. The explainability methodologies will shed light on model decision-making processes, aiding in clinical acceptance. Methodology: A comparative approach was used to evaluate deep learning models for segmenting and classifying lung illnesses using chest X-rays. The Attention U-Net model is used for segmentation, and architectures consisting of four CNNs and four ViTs were investigated for classification. Methods like Gradient-weighted Class Activation Mapping plus plus (Grad-CAM++) and Layer-wise Relevance Propagation (LRP) provide explainability by identifying crucial areas influencing model decisions. Results: The results support the conclusion that ViTs are outstanding in identifying lung disorders. Attention U-Net obtained a Dice Coefficient of 98.54% and a Jaccard Index of 97.12%. ViTs outperformed CNNs in classification tasks by 9.26%, reaching an accuracy of 98.52% with MobileViT. An 8.3% increase in accuracy was seen while moving from raw data classification to segmented image classification. Techniques like Grad-CAM++ and LRP provided insights into the decision-making processes of the models. Conclusions: This study highlights the benefits of integrating Attention U-Net and ViTs for analyzing lung diseases, demonstrating their importance in clinical settings. Emphasizing explainability clarifies deep learning processes, enhancing confidence in AI solutions and perhaps enhancing clinical acceptance for improved healthcare results.
(This article belongs to the Special Issue Artificial Intelligence in Biomedical Image Analysis—2nd Edition)
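
The two segmentation metrics quoted above are simple to compute from binary masks; a small NumPy sketch:

```python
import numpy as np

def dice_and_jaccard(pred: np.ndarray, truth: np.ndarray):
    """Dice coefficient and Jaccard index for binary segmentation masks,
    the two metrics reported for the Attention U-Net lung segmenter.
    """
    pred, truth = pred.astype(bool), truth.astype(bool)
    inter = np.logical_and(pred, truth).sum()
    dice = 2 * inter / (pred.sum() + truth.sum())
    jaccard = inter / np.logical_or(pred, truth).sum()
    return float(dice), float(jaccard)
```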

16 pages, 1341 KiB  
Article
DSCEH: Dual-Stream Correlation-Enhanced Deep Hashing for Image Retrieval
by Yulin Yang, Huizhen Chen, Rongkai Liu, Shuning Liu, Yu Zhan, Chao Hu and Ronghua Shi
Mathematics 2024, 12(14), 2221; https://doi.org/10.3390/math12142221 - 16 Jul 2024
Abstract
Deep hashing is widely used for large-scale image retrieval tasks to speed up the retrieval process. Current deep hashing methods are mainly based on Convolutional Neural Networks (CNNs) or Vision Transformers (ViT). They use only local or global features for low-dimensional mapping and only a similarity loss function to optimize the correlation between pairwise or triplet images, so the effectiveness of deep hashing methods is limited. In this paper, we propose a dual-stream correlation-enhanced deep hashing framework (DSCEH), which uses both the local and global features of the image for low-dimensional mapping and optimizes the correlation of images from the model architecture. DSCEH consists of two main steps: model training and deep-hash-based retrieval. During the training phase, a dual-network structure comprising a CNN and a ViT is employed for feature extraction. Subsequently, feature fusion is achieved through a concatenation operation, followed by similarity evaluation based on the class token acquired from the ViT to establish edge relationships. A Graph Convolutional Network is then utilized to enhance correlation optimization between images, resulting in the generation of high-quality hash codes. This stage facilitates the development of an optimized hash model for image retrieval. In the retrieval stage, all images within the database and the to-be-retrieved images are first mapped to hash codes using the aforementioned hash model. The retrieval results are then determined based on the Hamming distance between the hash codes. We conduct experiments on three datasets: CIFAR-10, MSCOCO, and NUSWIDE. Experimental results show the superior performance of DSCEH, which enables fast and accurate image retrieval.
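
The retrieval stage reduces to ranking by Hamming distance between binary codes; a minimal NumPy sketch of that step (not the paper's implementation):

```python
import numpy as np

def hamming_retrieve(query_code: np.ndarray, db_codes: np.ndarray, top_k: int = 10) -> np.ndarray:
    """Rank database images by Hamming distance to a query hash code.

    Codes are {0, 1} vectors, as produced by a sign-thresholded deep hash model;
    a sketch of the retrieval stage, not the paper's exact implementation.
    """
    dists = np.count_nonzero(db_codes != query_code, axis=1)  # per-row Hamming distance
    return np.argsort(dists, kind="stable")[:top_k]           # indices of the nearest codes
```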

20 pages, 3532 KiB  
Article
A Fault Identification Method of Hybrid HVDC System Based on Wavelet Packet Energy Spectrum and CNN
by Yan Liang, Junwei Zhang, Zheng Shi, Haibo Zhao, Yao Wang, Yahong Xing, Xiaowei Zhang, Yujin Wang and Haixiao Zhu
Electronics 2024, 13(14), 2788; https://doi.org/10.3390/electronics13142788 - 16 Jul 2024
Abstract
To address the shortcomings of traditional fault identification methods in fault information acquisition, a new fault identification method for hybrid HVDC transmission systems is proposed using the wavelet packet energy spectrum and a convolutional neural network (CNN), which effectively solves the problem of complex fault feature extraction in such systems and improves identification accuracy. First, the frequency-domain characteristics of the fault transient signal are extracted by the wavelet packet transform, and feature differences are represented in the form of an energy spectrum. Second, from the extracted energy features, the faulted line and fault type are identified by the CNN. Finally, example verification and algorithm comparison show that the proposed model has a strong ability to identify faults and is robust to noise interference and transition resistance.
(This article belongs to the Special Issue Emerging Technologies in Computational Intelligence)
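
The wavelet packet energy spectrum itself is straightforward to compute with PyWavelets; in the sketch below, the wavelet family ('db4') and decomposition level are assumptions, not the paper's settings:

```python
import numpy as np
import pywt

def wavelet_packet_energy(signal: np.ndarray, wavelet: str = "db4", level: int = 3) -> np.ndarray:
    """Decompose a fault transient with the wavelet packet transform and return the
    normalized energy per frequency band: the 'energy spectrum' fed to the CNN.
    The wavelet family and decomposition level are assumptions.
    """
    wp = pywt.WaveletPacket(data=signal, wavelet=wavelet, mode="symmetric", maxlevel=level)
    nodes = wp.get_level(level, order="freq")            # leaf nodes, low to high frequency
    energy = np.array([np.sum(np.square(n.data)) for n in nodes])
    return energy / energy.sum()                         # 2**level band energies, normalized
```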

22 pages, 80762 KiB  
Article
Super-Resolution Image Reconstruction of Wavefront Coding Imaging System Based on Deep Learning Network
by Xueyan Li, Haowen Yu, Yijian Wu, Lieshan Zhang, Di Chang, Xuhong Chu and Haoyuan Du
Electronics 2024, 13(14), 2781; https://doi.org/10.3390/electronics13142781 - 15 Jul 2024
Abstract
Wavefront Coding (WFC) is an innovative technique aimed at extending the depth of focus (DOF) of optical imaging systems. In digital imaging systems, super-resolution digital reconstruction close to the diffraction limit of the optical system has long been a hot research topic. With a point spread function (PSF) generated by a suitably designed phase mask, WFC can also be used in super-resolution image reconstruction. In this paper, we use a deep learning network combined with WFC as a general framework for image reconstruction and verify its feasibility and effectiveness. Considering blur and additive noise simultaneously, we propose three super-resolution image reconstruction procedures utilizing convolutional neural networks (CNN) based on mean square error (MSE) loss, conditional Generative Adversarial Networks (CGAN), and Swin Transformer Networks (SwinIR) based on mean absolute error (MAE) loss, and verify their effectiveness by simulation experiments. A comparison of experimental results shows that the SwinIR deep residual network structure based on the MAE loss optimization criterion can generate more realistic super-resolution images with more details. In addition, we used a WFC camera to obtain a resolution test target and real-scene images for experiments. Using the resolution test target, we demonstrated that the spatial resolution could be improved from 55.6 lp/mm to 124 lp/mm by the proposed super-resolution reconstruction procedure. The reconstruction results show that the proposed deep learning network model is superior to the traditional method in reconstructing high-frequency details and effectively suppressing noise, with the resolution approaching the diffraction limit.
(This article belongs to the Section Networks)
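
The two fidelity metrics mentioned above, RMSE and PSNR, can be computed as in this small NumPy sketch:

```python
import numpy as np

def rmse_and_psnr(reference: np.ndarray, restored: np.ndarray, peak: float = 255.0):
    """Root mean square error and peak signal-to-noise ratio between a reference
    image and a reconstruction, the two fidelity metrics used in the evaluation.
    """
    err = reference.astype(np.float64) - restored.astype(np.float64)
    rmse = float(np.sqrt(np.mean(err ** 2)))
    psnr = float(20 * np.log10(peak / rmse)) if rmse > 0 else float("inf")
    return rmse, psnr
```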
