Search Results (1,472)

Search Parameters:
Keywords = self-attention mechanism

18 pages, 3230 KiB  
Article
Autism Identification Based on the Intelligent Analysis of Facial Behaviors: An Approach Combining Coarse- and Fine-Grained Analysis
by Jingying Chen, Chang Chen, Ruyi Xu and Leyuan Liu
Children 2024, 11(11), 1306; https://doi.org/10.3390/children11111306 (registering DOI) - 28 Oct 2024
Abstract
Background: Facial behavior has emerged as a crucial biomarker for autism identification. However, heterogeneity among individuals with autism poses a significant obstacle to traditional feature extraction methods, which often lack the necessary discriminative power. While deep-learning methods hold promise, they are often criticized for their lack of interpretability. Methods: To address these challenges, we developed a facial behavior characterization model that integrates coarse- and fine-grained analyses for intelligent autism identification. The coarse-grained analysis provides a holistic view by computing statistical measures of facial behavior characteristics. The fine-grained component, in contrast, uncovers subtle temporal fluctuations by employing a long short-term memory (LSTM) model to capture the temporal dynamics of head pose, facial expression intensity, and expression types. To fully harness the strengths of both analyses, we implemented a feature-level attention mechanism, which not only enhances the model’s interpretability but also highlights the most influential features through its attention weights. Results: In three-fold cross-validation on a self-constructed autism dataset, our integrated approach achieved an average recognition accuracy of 88.74%, surpassing the standalone coarse-grained analysis by 8.49%. Conclusions: This result underscores the improved generalizability of the facial behavior features and shows that the approach mitigates the complexity stemming from the pronounced intragroup variability among individuals with autism, contributing to more accurate and interpretable autism identification.
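The feature-level attention step described above can be illustrated with a minimal sketch. It assumes the coarse- and fine-grained features are already flattened into plain lists of floats and scores each feature with a single hypothetical weight (`score_weights`); a trained model would learn these scores, and the authors' exact scoring function is not given in the abstract. The softmax-normalized weights are what make the fusion inspectable.

```python
import math


def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def feature_attention_fusion(coarse, fine, score_weights):
    """Fuse coarse- and fine-grained features with feature-level attention.

    coarse, fine: lists of floats (hypothetical feature vectors).
    score_weights: one hypothetical scoring weight per concatenated feature
    (learned in a real model).
    Returns (weighted_features, attention_weights).
    """
    features = list(coarse) + list(fine)
    if len(score_weights) != len(features):
        raise ValueError("need one score weight per feature")
    # Score each feature, then normalize the scores into attention weights.
    scores = [w * x for w, x in zip(score_weights, features)]
    attn = softmax(scores)
    # Re-weight the features; the weights themselves indicate feature influence.
    weighted = [a * x for a, x in zip(attn, features)]
    return weighted, attn


weighted, attn = feature_attention_fusion([0.2, 0.5], [1.0, -0.3], [1.0, 1.0, 2.0, 0.5])
most_influential = max(range(len(attn)), key=lambda i: attn[i])
```

Here the third feature receives the largest weight, so it would be reported as the most influential one.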

25 pages, 11268 KiB  
Article
Capsule Attention Network for Hyperspectral Image Classification
by Nian Wang, Aitao Yang, Zhigao Cui, Yao Ding, Yuanliang Xue and Yanzhao Su
Remote Sens. 2024, 16(21), 4001; https://doi.org/10.3390/rs16214001 (registering DOI) - 28 Oct 2024
Abstract
While many neural networks have been proposed for hyperspectral image (HSI) classification, current backbones cannot achieve accurate results because scalar features provide insufficient representation, and they typically incur a heavy computational burden. To solve this problem, we propose the capsule attention network (CAN), which combines activity vectors with an attention mechanism to improve HSI classification. In particular, we consider two attention mechanisms to improve the effectiveness of the activity vectors. First, an attention-based feature extraction (AFE) module is proposed to preprocess the spectral-spatial features of HSI data, which effectively mines useful information before the activity vectors are generated. Second, we propose a self-weighted mechanism (SWM) to distinguish the importance of different capsule convolutions, which enhances the representation of the primary activity vectors. Experiments on four well-known HSI datasets show that CAN surpasses state-of-the-art (SOTA) methods on three widely used metrics with a much lower computational burden.
(This article belongs to the Special Issue Hyperspectral Image Processing: Anomaly Detection and Classification)
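For readers unfamiliar with activity vectors, the sketch below shows the standard capsule-network squash nonlinearity, which maps a vector's length into [0, 1) while keeping its direction, plus a hypothetical softmax weighting over several capsule-convolution outputs. The abstract does not specify how CAN computes its activity vectors or how the SWM assigns weights; `self_weighted_sum` and its `weights` argument are illustrative assumptions, not the authors' design.

```python
import math


def squash(s, eps=1e-9):
    """Capsule squash: keep the direction of s, map its length into [0, 1)
    so that the length can be read as an existence probability."""
    sq_norm = sum(x * x for x in s)
    norm = math.sqrt(sq_norm)
    scale = sq_norm / (1.0 + sq_norm) / (norm + eps)
    return [scale * x for x in s]


def self_weighted_sum(capsule_outputs, weights):
    """Hypothetical self-weighting: combine the outputs of several capsule
    convolutions with softmax-normalized importance weights, then squash."""
    m = max(weights)
    exps = [math.exp(w - m) for w in weights]
    total = sum(exps)
    alphas = [e / total for e in exps]
    dim = len(capsule_outputs[0])
    combined = [
        sum(a * out[i] for a, out in zip(alphas, capsule_outputs))
        for i in range(dim)
    ]
    return squash(combined), alphas
```

A vector of length 5 is squashed to length 25/26, while short vectors shrink toward zero, so only confidently detected entities keep long activity vectors.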

23 pages, 1081 KiB  
Article
Implementing Real-Time Image Processing for Radish Disease Detection Using Hybrid Attention Mechanisms
by Mengxue Ji, Zizhe Zhou, Xinyue Wang, Weidong Tang, Yan Li, Yilin Wang, Chaoyu Zhou and Chunli Lv
Plants 2024, 13(21), 3001; https://doi.org/10.3390/plants13213001 (registering DOI) - 27 Oct 2024
Abstract
This paper developed a radish disease detection system based on a hybrid attention mechanism, significantly enhancing precision and real-time performance in identifying disease characteristics. By integrating spatial and channel attention, the system demonstrated superior performance across numerous metrics, notably achieving 93% precision and 91% accuracy in detecting radish virus disease and outperforming existing technologies. In ablation experiments, the hybrid attention mechanism also outperformed standard self-attention and the convolutional block attention module. The study also introduced a hybrid loss function that combines cross-entropy loss and Dice loss, effectively addressing class imbalance and further enhancing the detection of rare diseases. These experimental results not only validate the effectiveness of the proposed method but also provide robust technical support for the rapid and accurate detection of radish diseases, demonstrating its strong potential in agricultural applications. Future research will continue to optimize the model structure and computational efficiency to accommodate a broader range of agricultural disease detection needs.
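A combined cross-entropy and Dice loss of the kind described above can be sketched for binary per-pixel predictions as follows. The equal mixing weight `alpha=0.5` and the smoothing constant are assumptions, since the abstract does not state how the two terms are balanced.

```python
import math


def bce_loss(probs, targets, eps=1e-7):
    # Mean binary cross-entropy over all pixels; clamp to avoid log(0).
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(probs)


def dice_loss(probs, targets, smooth=1.0):
    # 1 - soft Dice coefficient: measures overlap relative to region sizes.
    inter = sum(p * t for p, t in zip(probs, targets))
    denom = sum(probs) + sum(targets)
    return 1.0 - (2.0 * inter + smooth) / (denom + smooth)


def hybrid_loss(probs, targets, alpha=0.5):
    # alpha is a hypothetical mixing weight between the two terms.
    return alpha * bce_loss(probs, targets) + (1 - alpha) * dice_loss(probs, targets)
```

The Dice term depends on overlap relative to the size of the predicted and true regions rather than on the number of background pixels, which is why it is commonly paired with cross-entropy when positive examples are rare.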

28 pages, 8210 KiB  
Article
Molecular Docking, Bioinformatic Analysis, and Experimental Verification for the Effect of Naringin on ADHD: Possible Inhibition of GSK-3β and HSP90
by Hatem I. Mokhtar, Sawsan A. Zaitone, Karima El-Sayed, Rehab M. Lashine, Nada Ahmed, Suzan M. M. Moursi, Shaimaa A. Shehata, Afaf A. Aldahish, Mohamed A. Helal, Mohamed K. El-Kherbetawy, Manal S. Fawzy and Noha M. Abd El-Fadeal
Pharmaceuticals 2024, 17(11), 1436; https://doi.org/10.3390/ph17111436 (registering DOI) - 26 Oct 2024
Abstract
Background/Objectives: Attention deficit hyperactivity disorder (ADHD) is one of the most prevalent neurodevelopmental disorders, and its prevalence has grown in recent decades. Many trials have investigated drugs for improving ADHD signs. This study aimed to detect the possible interaction of naringin with Wnt/β-catenin signaling and its putative anti-inflammatory and protective effects in a mouse ADHD model based on bioinformatic, behavioral, and molecular investigations. Furthermore, molecular docking was applied to investigate possible interactions with the GSK-3β and HSP90 proteins. Methods: Male Swiss albino mice were divided into four groups: a normal control group, a monosodium glutamate (SGL) control group, SGL + naringin 50 mg/kg, and SGL + naringin 100 mg/kg. The psychomotor activity of the mice was assessed using the self-grooming test, the rope crawling test, and the attentional set-shifting task (ASST). In addition, biochemical analyses were performed on brain samples. Results: The SGL group showed prolonged grooming time (2.47-fold), a lower percentage of mice successfully crawling on the rope (only 16.6%), and a higher number of trials for compound discrimination in the ASST (12.83 ± 2.04 trials versus 5.5 ± 1.88 trials in the normal group). Treatment with naringin (50 or 100 mg/kg) significantly shortened the grooming time (31% and 27% reductions) and increased the percentage of mice succeeding in rope crawling (50% and 83%, respectively). Moreover, ELISA assays indicated decreased dopamine levels (0.36-fold) and increased TNF-α (2.85-fold) in the SGL control group compared to the normal mice, while dopamine levels improved in the naringin (50 or 100 mg/kg)-treated groups (1.58-fold and 1.97-fold). 
Similarly, PCR showed significant declines in the expression of the Wnt (0.36-fold) and β-catenin (0.33-fold) genes but increased expression of the caspase-3 (3.54-fold) and BAX (5.36-fold) genes in the SGL group; all these parameters improved in both the 50 and 100 mg/kg naringin groups. Furthermore, molecular docking indicated possible inhibition of HSP90 and GSK-3β. Conclusions: Overall, we conclude that naringin is a promising agent for alleviating ADHD symptoms, and further investigations are required to elucidate its mechanism of action.
(This article belongs to the Section Natural Products)

17 pages, 14021 KiB  
Article
Influence of Al and Ti Alloying and Annealing on the Microstructure and Compressive Properties of Cr-Fe-Ni Multi-Principal Element Alloy
by Keyan An, Tailin Yang, Junjie Feng, Honglian Deng, Xiang Zhang, Zeyu Zhao, Qingkun Meng, Jiqiu Qi, Fuxiang Wei and Yanwei Sui
Metals 2024, 14(11), 1223; https://doi.org/10.3390/met14111223 (registering DOI) - 26 Oct 2024
Abstract
This study examines the influence of aluminum (Al) and titanium (Ti) on the formation of self-generated ordered phases in high-entropy alloys (HEAs), a class of materials that has garnered considerable attention owing to its exceptional multifunctionality and versatile compositional palette. By carefully tuning the concentrations of Al and Ti, this research investigates how the quantity and distribution of the in situ self-generated ordered phases within the alloy matrix can be modulated. The annealing heat treatment results revealed that adding Al and Ti drives a phase transformation in the Cr-Fe-Ni medium-entropy alloy from a BCC (body-centered cubic) phase to a BCC + FCC (face-centered cubic) phase. Concurrently, new phases emerge, including B2, L2₁, and σ. This phase evolution produces a synergistic enhancement in mechanical properties through second-phase strengthening and solid solution strengthening, markedly improving the compressive properties of the HEA.
(This article belongs to the Special Issue Processing Technology and Properties of Light Metals)

22 pages, 3751 KiB  
Article
Two-Dimensional Coherent Polarization–Direction-of-Arrival Estimation Based on Sequence-Embedding Fusion Transformer
by Zihan Wu, Jun Wang and Zhiquan Zhou
Remote Sens. 2024, 16(21), 3977; https://doi.org/10.3390/rs16213977 - 25 Oct 2024
Abstract
To address the inadequate convergence and suboptimal accuracy of classical data-driven algorithms for coherent polarization–direction-of-arrival (DOA) estimation, a high-precision two-dimensional coherent polarization–DOA estimation method based on a sequence-embedding fusion (SEF) transformer is proposed here for the first time. Drawing inspiration from natural language processing (NLP), this approach employs transformer-based multitask text inference to jointly estimate polarization and DOA. The method leverages the transformer’s multi-head self-attention mechanism to capture the multi-dimensional features within the spatial-polarization domain of the covariance matrix data. Additionally, an SEF module is proposed to fuse spatial-polarization domain features from different dimensions. The module combines a convolutional neural network (CNN), which extracts local information, with a feature dimension transformation function, improving the model’s ability to fuse feature information in the spatial-polarization domain. Moreover, to enhance the model’s expressive capacity, we designed a multi-task parallel output mode and a multi-task weighted loss function. Simulation results demonstrate that our method outperforms classical data-driven approaches in both accuracy and generalization and also achieves higher estimation accuracy than the traditional model-driven algorithm.
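The multi-head self-attention mechanism mentioned above follows the standard scaled dot-product form, softmax(QKᵀ/√d)V, computed separately per head. The pure-Python sketch below is a simplification: it uses the input itself as queries, keys, and values instead of learned projections and omits the output projection, so it illustrates the attention arithmetic rather than the SEF transformer itself.

```python
import math


def matmul(a, b):
    # (n x k) @ (k x m) for lists of lists.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax_rows(m):
    out = []
    for row in m:
        mx = max(row)
        exps = [math.exp(v - mx) for v in row]
        s = sum(exps)
        out.append([e / s for e in exps])
    return out


def multi_head_self_attention(x, num_heads):
    """Scaled dot-product self-attention split across heads.

    x: seq_len x d_model list of lists. Simplification: Q = K = V = x
    (no learned projections, no output projection).
    """
    d_model = len(x[0])
    if d_model % num_heads:
        raise ValueError("d_model must be divisible by num_heads")
    d_head = d_model // num_heads
    heads = []
    for h in range(num_heads):
        # Slice this head's feature subspace.
        sub = [row[h * d_head:(h + 1) * d_head] for row in x]
        # sub @ sub^T gives seq_len x seq_len similarity scores.
        scores = matmul(sub, [list(c) for c in zip(*sub)])
        scores = [[s / math.sqrt(d_head) for s in row] for row in scores]
        weights = softmax_rows(scores)
        heads.append(matmul(weights, sub))
    # Concatenate the head outputs back to d_model features per position.
    return [sum((head[i] for head in heads), []) for i in range(len(x))]
```

Each output position is a convex combination of the value vectors, weighted by how similar its query is to every key; different heads compute these weights in different feature subspaces.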

14 pages, 2724 KiB  
Article
Improved Real-Time Detection Transformer-Based Rail Fastener Defect Detection Algorithm
by Wei Song, Bin Liao, Keqing Ning and Xiaoyu Yan
Mathematics 2024, 12(21), 3349; https://doi.org/10.3390/math12213349 - 25 Oct 2024
Abstract
To address several issues of the Real-Time DEtection TRansformer (RT-DETR) object detection model in rail fastener defect detection, namely poor defect feature extraction, inefficient use of computational resources, and suboptimal channel attention in the self-attention mechanism, the following improvements were made. First, a Super-Resolution Convolutional Module (SRConv) was designed as a separate component and integrated into the backbone network; it enhances image detail and clarity while preserving the original image structure and semantic content, improving the model’s ability to extract defect features. Second, a channel attention mechanism was integrated into the self-attention module of RT-DETR to strengthen the focus on feature map channels, addressing the sparse attention maps caused by the lack of channel attention while saving computational resources. Experimental results show that, compared to the original model, the improved RT-DETR-based rail fastener defect detection algorithm, with only 0.4 MB of additional parameters, achieved higher accuracy: a 2.8 percentage point increase in the mean Average Precision (mAP) across IoU thresholds from 0.5 to 0.9 and a 1.7 percentage point increase in the Average Recall (AR) across the same thresholds.
(This article belongs to the Special Issue Complex Process Modeling and Control Based on AI Technology)
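The abstract does not detail how channel attention is built into RT-DETR's self-attention module. As a stand-in, the sketch below implements squeeze-and-excitation style channel attention, a widely used generic technique: pool each channel, pass the descriptors through a small bottleneck, and rescale channels with sigmoid weights. The weight matrices `w1` and `w2` are hypothetical placeholders for learned parameters.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def channel_attention(feature_map, w1, w2):
    """Squeeze-and-excitation style channel attention (generic sketch).

    feature_map: C x N list (C channels, N flattened spatial positions).
    w1: hidden x C and w2: C x hidden weight matrices (hypothetical, learned
    in practice).
    Returns (reweighted_feature_map, per_channel_scales).
    """
    # Squeeze: global average pooling reduces each channel to one descriptor.
    z = [sum(ch) / len(ch) for ch in feature_map]
    # Excitation: bottleneck MLP, ReLU then sigmoid, gives one weight per channel.
    hidden = [max(0.0, sum(w * zi for w, zi in zip(row, z))) for row in w1]
    scale = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]
    # Reweight: multiply every position of channel c by scale[c].
    return [[s * v for v in ch] for s, ch in zip(scale, feature_map)], scale


out, scale = channel_attention([[1.0, 1.0], [0.0, 0.0]], [[1.0, 0.0]], [[1.0], [-1.0]])
```

Because the scales lie strictly between 0 and 1, informative channels are kept close to their original magnitude while less useful ones are suppressed.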

20 pages, 15897 KiB  
Article
EMB-YOLO: A Lightweight Object Detection Algorithm for Isolation Switch State Detection
by Haojie Chen, Lumei Su, Riben Shu, Tianyou Li and Fan Yin
Appl. Sci. 2024, 14(21), 9779; https://doi.org/10.3390/app14219779 - 25 Oct 2024
Abstract
In power inspection, it is crucial to monitor the status of isolation switches accurately and regularly to ensure the stable operation of power systems. However, current image-recognition-based methods for detecting the open and closed states of isolation switches still suffer from low accuracy and high edge deployment costs. In this paper, we propose a lightweight object detection model, EMB-YOLO, to address this challenge. First, we propose an efficient mobile inverted bottleneck convolution (EMBC) module for the backbone network. This module has a lightweight structure designed to reduce computational complexity and parameter count, thereby improving the model’s computational efficiency. Furthermore, an ELA attention mechanism is used in the EMBC module to enhance the extraction of horizontal and vertical isolation switch features in complex environments. Finally, we propose an efficient-RepGDFPN fusion network, which integrates feature maps from different levels to detect isolation switches at multiple scales in monitoring scenarios. We built our own isolation switch dataset to evaluate the performance of EMB-YOLO. Experimental results demonstrate that the proposed method achieves superior detection performance on this dataset, with a mean average precision (mAP) of 87.2%, while maintaining a computational cost of only 6.5 × 10⁹ FLOPs and a parameter size of just 2.8 × 10⁶ bytes.
(This article belongs to the Special Issue Deep Learning for Object Detection)

22 pages, 16144 KiB  
Article
Study of Five-Hundred-Meter Aperture Spherical Telescope Feed Cabin Time-Series Prediction Studies Based on Long Short-Term Memory–Self-Attention
by Shuai Peng, Minghui Li, Benning Song, Dongjun Yu, Yabo Luo, Qingliang Yang, Yu Feng, Kaibin Yu and Jiaxue Li
Sensors 2024, 24(21), 6857; https://doi.org/10.3390/s24216857 - 25 Oct 2024
Abstract
The Five-hundred-meter Aperture Spherical Telescope (FAST), the world’s most sensitive single-dish radio telescope, requires highly accurate positioning of its feed cabin to realize its full observational potential. Traditional positioning methods rely on GNSS and IMU integrated with TS devices, but the GNSS and TS devices are vulnerable to interference from other signals and to environmental disruptions, which can significantly reduce positioning accuracy and even halt observations. To address these challenges, this study introduces a novel time-series prediction model that integrates Long Short-Term Memory (LSTM) networks with a Self-Attention mechanism. This model can maintain the precision of feed cabin positioning when the measurement devices fail. Experimental results show that our LSTM-Self-Attention model achieves a Mean Absolute Error (MAE) of less than 10 mm and a Root Mean Square Error (RMSE) of approximately 12 mm, with the errors on each axis following a near-normal distribution. This performance meets the FAST measurement precision requirement of 15 mm, a standard derived from the engineering practice of setting measurement accuracy at one-third of the control accuracy, which is around 48 mm (according to the official threshold analysis of the FAST focus cabin). This result not only compensates for the shortcomings of traditional methods in consistently maintaining feed cabin positioning but also demonstrates the model’s ability to handle complex time-series data under specific conditions, such as sensor failures, thus providing a reliable tool for the stable operation of highly sensitive astronomical observations.
(This article belongs to the Section Sensor Networks)
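The two error metrics reported above can be computed as follows, together with a check against the stated 15 mm requirement. The function names are illustrative, and any sample errors passed to them (such as those in the usage line) are made-up values, not data from the study.

```python
import math


def mae(errors):
    # Mean Absolute Error: average magnitude of the position errors.
    return sum(abs(e) for e in errors) / len(errors)


def rmse(errors):
    # Root Mean Square Error: penalizes large errors more heavily than MAE.
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def within_requirement(errors_mm, limit_mm=15.0):
    """True if both MAE and RMSE stay within the 15 mm measurement requirement."""
    return mae(errors_mm) <= limit_mm and rmse(errors_mm) <= limit_mm


ok = within_requirement([3.0, -4.0])  # illustrative errors in mm
```

RMSE is always at least as large as MAE, so a reported RMSE of about 12 mm alongside an MAE under 10 mm is consistent with a few moderately larger errors rather than uniform ones.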

28 pages, 8494 KiB  
Article
Visitors’ Behaviors and Perceptions of Spatial Factors of Uncultivated Internet-Famous Sites in Urban Riverfront Public Spaces: Case Study in Changsha, China
by Bohong Zheng, Yuanyuan Huang and Rui Guo
Buildings 2024, 14(11), 3385; https://doi.org/10.3390/buildings14113385 - 25 Oct 2024
Abstract
This article takes representative uncultivated riverfront internet-famous sites (uncultivated RIFSs) in Changsha, China, as examples to explore the internal mechanism of their formation, and finds that they are closely related to the “urban subculture” and the “informality of urban public space”. Methodologically, this study uses questionnaire surveys and in-depth interviews to investigate the behavioral characteristics of onsite visitors, their overall perceptions of and satisfaction with the public spaces, and their perceptions of spatial and humanistic elements. The main findings are as follows: ① Onsite visitors are mainly male, and local tourists and nearby residents account for over 80% of them. Furthermore, over half of the visitors have limited understanding of the uncultivated RIFSs. ② People’s overall attitudes towards the uncultivated RIFSs are positive, and the ability to carry out meaningful activities and to find comfort and safety are of the greatest concern to onsite visitors. ③ Among the reasons for visiting, leisure stays accounted for the highest proportion, followed by sightseeing, sports stays, and social stays. ④ Which spatial and humanistic elements onsite visitors focus on differs from site to site; however, visitors’ dissatisfaction is mainly reflected in poor site safety and sanitation conditions, inadequate facilities, and poor surrounding environments. This paper also compares the online–offline differences in spatial perceptions of the uncultivated RIFSs between this study and previous research: instead of focusing on the urban physical spaces, online social media users pay more attention to their self-presentation, whereas onsite visitors place greater emphasis on the functionality, practicality, and experiential activities of the urban physical spaces. 
Finally, this article proposes optimization strategies for uncultivated RIFSs from the perspectives of planning and governance and of public space design, aiming to protect and strengthen the composite utilization of space and thereby enhance diverse vitality.

30 pages, 8185 KiB  
Review
A Review of Abnormal Crowd Behavior Recognition Technology Based on Computer Vision
by Rongyong Zhao, Feng Hua, Bingyu Wei, Cuiling Li, Yulong Ma, Eric S. W. Wong and Fengnian Liu
Appl. Sci. 2024, 14(21), 9758; https://doi.org/10.3390/app14219758 - 25 Oct 2024
Abstract
Abnormal crowd behavior recognition is a research hotspot in computer vision. Its goal is to use computer vision technology and abnormal behavior detection models to accurately perceive, predict, and intervene in potential abnormal crowd behaviors and to monitor the status of crowds in public places in real time, so as to effectively prevent and respond to public security risks and ensure public safety and social order. To this end, this paper presents a systematic review of the theory and cutting-edge technology of abnormal crowd behavior recognition in computer vision systems. First, crowd levels and abnormal behaviors in public places are defined, and the challenges facing abnormal crowd behavior recognition are described. Then, the mainstream abnormal behavior recognition technologies are discussed in two categories, traditional methods and deep-learning-based methods, and the design ideas, advantages, and limitations of each are analyzed. Next, mainstream software tools are introduced to provide a comprehensive reference for the technical framework. After that, typical domestic and international abnormal behavior datasets are collected, their scale, characteristics, and uses are compared in detail, and the performance of different algorithms on these datasets is compared and analyzed. Finally, the review is summarized, and future development directions for abnormal crowd behavior recognition technology are discussed.

16 pages, 7008 KiB  
Article
Improving Top-Down Attention Network in Speech Separation by Employing Hand-Crafted Filterbank and Parameter-Sharing Transformer
by Aye Nyein Aung and Jeih-weih Hung
Electronics 2024, 13(21), 4174; https://doi.org/10.3390/electronics13214174 - 24 Oct 2024
Abstract
The “cocktail party problem”, the challenge of isolating individual speech signals from a noisy mixture, has traditionally been addressed using statistical methods. However, deep neural networks (DNNs), with their ability to learn complex patterns, have emerged as superior solutions. DNNs excel at capturing intricate relationships between mixed audio signals and their respective speech sources, enabling them to effectively separate overlapping speech signals in challenging acoustic environments. Recent advances in speech separation systems have drawn inspiration from the brain’s hierarchical sensory information processing, incorporating top-down attention mechanisms. The top-down attention network (TDANet) employs an encoder–decoder architecture with top-down attention to enhance feature modulation and separation performance. By leveraging attention signals from multi-scale input features, TDANet effectively modifies features across different scales using a global attention (GA) module in the encoder–decoder design. Local attention (LA) layers then convert these modulated signals into high-resolution auditory characteristics. In this study, we propose two key modifications to TDANet. First, we substitute the fully trainable convolutional encoder with a deterministic hand-crafted multi-phase gammatone filterbank (MP-GTF), which mimics human hearing. Experimental results demonstrated that this substitution yielded comparable or even slightly superior performance to the original TDANet with a trainable encoder. Second, we replace the single multi-head self-attention (MHSA) layer in the global attention module with a transformer encoder block consisting of multiple MHSA layers. To optimize GPU memory utilization, we introduce a parameter sharing mechanism, dubbed “Reverse Cycle”, across layers in the transformer-based encoder. 
Our experimental findings indicated that these proposed modifications enabled TDANet to achieve competitive separation performance, rivaling state-of-the-art techniques, while maintaining superior computational efficiency.
(This article belongs to the Special Issue Natural Language Processing Method: Deep Learning and Deep Semantics)
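The abstract names the “Reverse Cycle” parameter-sharing scheme without defining it. Purely as an assumption, the sketch below maps transformer layers to shared parameter sets with cycle-style sharing (layer i reuses set i mod M) and reverses the order of the sets over the last M layers. It shows why sharing saves memory, since only M parameter sets are stored for N layers, but it should not be read as the authors' exact assignment; `shared_layer_indices` is a hypothetical helper.

```python
def shared_layer_indices(num_layers, num_unique, reverse_tail=True):
    """Map each of num_layers layers to one of num_unique parameter sets.

    Cycle sharing: layer i reuses parameter set i % num_unique.
    With reverse_tail=True (an assumed reading of a "reverse cycle" scheme),
    the last num_unique layers use the parameter sets in reverse order.
    """
    if not 0 < num_unique <= num_layers:
        raise ValueError("need 0 < num_unique <= num_layers")
    indices = [i % num_unique for i in range(num_layers)]
    if reverse_tail:
        tail_start = num_layers - num_unique
        for j in range(num_unique):
            indices[tail_start + j] = num_unique - 1 - j
    return indices


plan = shared_layer_indices(6, 3)  # [0, 1, 2, 2, 1, 0]
```

With six layers and three parameter sets, the stored weights are halved relative to an unshared six-layer stack, while every layer still performs its own forward computation.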

20 pages, 1150 KiB  
Article
MPSA-Conformer-CTC/Attention: A High-Accuracy, Low-Complexity End-to-End Approach for Tibetan Speech Recognition
by Changlin Wu, Huihui Sun, Kaifeng Huang and Long Wu
Sensors 2024, 24(21), 6824; https://doi.org/10.3390/s24216824 - 24 Oct 2024
Abstract
This study addresses the challenges of low accuracy and high computational demands in Tibetan speech recognition by investigating the application of end-to-end networks. We propose a decoding strategy that integrates Connectionist Temporal Classification (CTC) and Attention mechanisms, capitalizing on the benefits of automatic alignment and attention weight extraction. The Conformer architecture is utilized as the encoder, leading to the development of the Conformer-CTC/Attention model. This model first extracts global features from the speech signal using the Conformer, followed by joint decoding of these features through CTC and Attention mechanisms. To mitigate convergence issues during training, particularly with longer input feature sequences, we introduce a Probabilistic Sparse Attention mechanism within the joint CTC/Attention framework. Additionally, we implement a maximum entropy optimization algorithm for CTC, effectively addressing challenges such as increased path counts, spike distributions, and local optima during training. We designate the proposed method as the MaxEnt-Optimized Probabilistic Sparse Attention Conformer-CTC/Attention Model (MPSA-Conformer-CTC/Attention). Experimental results indicate that our improved model achieves a word error rate reduction of 10.68% and 9.57% on self-constructed and open-source Tibetan datasets, respectively, compared to the baseline model. Furthermore, the enhanced model not only reduces memory consumption and training time but also improves generalization capability and accuracy.
(This article belongs to the Special Issue New Trends in Biometric Sensing and Information Processing)
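Two ingredients of the joint CTC/attention framework can be sketched in a few lines: best-path CTC decoding, which collapses repeated frame labels and then drops blanks, and a weighted interpolation of the CTC and attention losses. The weight `ctc_weight=0.3` is a hypothetical default, and neither function reproduces the paper's Probabilistic Sparse Attention or maximum entropy optimization.

```python
def ctc_greedy_decode(frame_labels, blank=0):
    """Collapse a best-path CTC output: merge repeated labels, then drop blanks."""
    decoded = []
    prev = None
    for label in frame_labels:
        # Emit a label only when it differs from the previous frame and is not blank.
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return decoded


def joint_ctc_attention_loss(ctc_loss, attention_loss, ctc_weight=0.3):
    """Interpolate the two training objectives; ctc_weight is hypothetical."""
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must be in [0, 1]")
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * attention_loss


tokens = ctc_greedy_decode([0, 1, 1, 0, 1, 2, 2, 0])
```

Here the frame labels decode to [1, 1, 2]: the blank between the two runs of label 1 is what keeps them as separate tokens, while the repeated 2s merge into one.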

22 pages, 27370 KiB  
Article
Dynamic Temporal Denoise Neural Network with Multi-Head Attention for Fault Diagnosis Under Noise Background
by Zhongzhi Li, Rong Fan, Jinyi Ma, Jianliang Ai and Yiqun Dong
Sensors 2024, 24(21), 6813; https://doi.org/10.3390/s24216813 - 23 Oct 2024
Abstract
Fault diagnosis plays a crucial role in maintaining the operational safety of mechanical systems. As intelligent data-driven approaches evolve, deep learning (DL) has emerged as a pivotal technique in fault diagnosis research. However, vibration signals collected from mechanical systems are usually corrupted by unrelated noise arising from complicated transfer-path modulation and component coupling. To solve this problem, this paper proposes the dynamic temporal denoise neural network with multi-head attention (DTDNet). First, the model transforms one-dimensional signals into two-dimensional tensors based on their periodic self-similarity, employing multi-scale two-dimensional convolution kernels to extract signal features both within and across periods. Second, to address the lack of a denoising structure in traditional convolutional neural networks, a temporal variable denoise (TVD) module with dynamic nonlinear processing is proposed to filter the noise. Lastly, a multi-head attention fusion (MAF) module weights the denoised features of signals with different periods. Evaluation on two datasets, the Case Western Reserve University bearing dataset (single sensor) and the Real aircraft sensor dataset (multiple sensors), demonstrates that DTDNet suppresses irrelevant noise in the signals and achieves a remarkable improvement in classification performance compared with the state-of-the-art method. DTDNet thus offers a high-performance solution for the noise that may arise in real fault diagnosis tasks and has considerable practical value.
(This article belongs to the Section Fault Diagnosis & Sensors)
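The first DTDNet step, folding a one-dimensional vibration signal into a two-dimensional tensor according to its period, can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: it picks the dominant period(s) from the FFT amplitude spectrum, similar in spirit to TimesNet, and the name `fold_by_period` and the parameter `k` are hypothetical. Each row of the result holds one period, so a 2-D convolution sees variation within a period along the rows and variation across periods along the columns.

```python
import numpy as np

def fold_by_period(signal: np.ndarray, k: int = 1) -> list:
    """Fold a 1-D signal into 2-D arrays, one per dominant period (assumed approach)."""
    n = signal.shape[0]
    amplitude = np.abs(np.fft.rfft(signal))
    amplitude[0] = 0.0                               # ignore the DC component
    top_freqs = np.argsort(amplitude)[::-1][:k]      # k strongest frequency bins
    folded = []
    for f in top_freqs:
        if f == 0:
            continue                                 # no meaningful period for DC
        period = max(1, n // int(f))                 # samples per cycle
        n_rows = -(-n // period)                     # ceiling division
        padded = np.pad(signal, (0, n_rows * period - n))  # zero-pad to a whole number of periods
        # Rows = successive periods, columns = phase within a period.
        folded.append(padded.reshape(n_rows, period))
    return folded
```

For a pure sine with a 64-sample period over 1,024 samples, this yields a single 16 × 64 array whose rows are identical, which is the periodic self-similarity the 2-D kernels are meant to exploit.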

25 pages, 2849 KiB  
Article
Enhanced Hybrid U-Net Framework for Sophisticated Building Automation Extraction Utilizing Decay Matrix
by Ting Wang, Zhuyi Gong, Anqi Tang, Qian Zhang and Yun Ge
Buildings 2024, 14(11), 3353; https://doi.org/10.3390/buildings14113353 - 23 Oct 2024
Abstract
Automatically extracting buildings from remote sensing imagery using deep learning techniques has become essential for various real-world applications. However, mainstream methods often struggle to accurately extract and reconstruct fine-grained features because building appearances are heterogeneous and vary in scale. To address these challenges, we propose LDFormer, an advanced building segmentation model based on linear decay. LDFormer introduces a multi-scale detail fusion bridge (MDFB), which dynamically integrates shallow features to enhance the representation of local details and effectively capture fine-grained local features. To improve global feature extraction, the model incorporates linear decay self-attention (LDSA) and depthwise large separable kernel multi-layer perceptron (DWLSK-MLP) optimizations in the decoder. Specifically, LDSA employs a linear decay matrix within the self-attention mechanism to address long-distance dependency issues, while DWLSK-MLP uses step-wise convolutions to achieve a large receptive field. The proposed method was evaluated on the Massachusetts, Inria, and WHU building datasets, achieving IoU scores of 76.10%, 82.87%, and 91.86%, respectively. LDFormer outperforms existing state-of-the-art methods in building segmentation, showcasing its significant potential for automated building extraction.
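The abstract states only that LDSA places a linear decay matrix inside self-attention. One plausible reading, sketched below purely as an assumption, is a matrix whose entries fall linearly with the spatial distance (here Manhattan distance) between patch positions, multiplied into the attention weights before renormalization so that nearby tokens carry more weight. The helpers `linear_decay_matrix` and `decayed_self_attention`, the distance metric, and the normalization constant are all hypothetical choices, not LDFormer's published design.

```python
import numpy as np

def linear_decay_matrix(h, w, max_dist=None):
    """Hypothetical decay matrix for an h x w token grid: 1 at distance 0, falling linearly."""
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords = np.stack([ys.ravel(), xs.ravel()], axis=1)                 # (h*w, 2) grid positions
    dist = np.abs(coords[:, None, :] - coords[None, :, :]).sum(-1)      # pairwise Manhattan distance
    if max_dist is None:
        max_dist = h + w - 1       # exceeds the largest distance, so every entry stays positive
    return np.clip(1.0 - dist / max_dist, 0.0, 1.0)

def decayed_self_attention(x, wq, wk, wv, decay):
    """Single-head self-attention whose softmax weights are rescaled by `decay` (assumed form)."""
    q, k, v = x @ wq, x @ wk, x @ wv
    logits = q @ k.T / np.sqrt(q.shape[-1])
    logits -= logits.max(axis=-1, keepdims=True)        # numerical stability
    weights = np.exp(logits) * decay                    # damp attention to distant tokens
    weights /= weights.sum(axis=-1, keepdims=True)      # renormalize each row to sum to 1
    return weights @ v, weights
```

Multiplying after the exponential, rather than adding a bias to the logits, keeps the decay interpretable as a direct down-weighting factor; an additive distance bias in the logits would be an equally reasonable alternative.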
