1. Introduction
With the continuous advancement of the manufacturing industry, China’s transition from a large manufacturing country to a strong manufacturing power has become a significant task for the nation’s economic progress in the modern era [1]. Rolling bearings are used extensively across industrial equipment and machinery. When a bearing fails, it typically produces a sequence of intricate, dynamic, and noise-obscured vibration signals, which makes extracting fault-related information challenging [2].
With the proliferation of monitoring devices and the rise in sampling frequencies, bearing fault monitoring has entered the era of “big data”. Consequently, combining monitoring data with artificial intelligence for fault diagnosis has become a focal point of research. Hu et al. [3] developed an enhanced three-layer Laplace wavelet convolutional neural network that gives the network a physical interpretation and thereby improves its interpretability. This network exhibits notable accuracy and generalization across different types of bearing fault scenarios. However, in real-world industrial environments, high-quality training data for intelligent diagnostic models are scarce, because fault incidents are brief compared with the long periods of normal operation of rotating mechanical equipment [4]. Furthermore, existing deep learning algorithms need a large number of samples to yield a high-performance model. To address these concerns, Xu et al. [5] proposed a Vision Transformer (ViT) model that leverages multi-information fusion, enabling bearing fault diagnosis with limited data samples. Additionally, Chen et al. [6] introduced a conditional deep convolutional generative adversarial network (C-DCGAN) capable of augmenting small-sample, multi-category data. The vibration signals from bearings in mechanical equipment are large in volume but low in information density: because bearings spend most of their service life in normal working conditions, the collected monitoring data are highly redundant and have a low value density. In this context, the digital twin (DT) concept provides a viable solution to these challenges [7].
The DT is a technology rooted in computer modeling and simulation. It couples a physical system with its virtual counterpart, using data and information to mirror the behavior of both the real and the virtual environment [8]. Using data acquired from sensors and generated within the virtual space, DT technology captures the present state of a system, constructs precise digital models, and runs real-time simulations and optimizations. The rapid progress of information technology, particularly the emergence of next-generation technologies such as the industrial IoT, cloud computing, big data, and machine learning, has propelled DT technology to the forefront of industrial research [9,10,11,12]. The DT concept can be traced back to a 2003 proposal by Professor Michael Grieves at the University of Michigan in the United States [13]. Initially, DT technology found applications in the military and aerospace sectors. The US Air Force Research Laboratory and the National Aeronautics and Space Administration (NASA) employed DT technology to simulate and assess extreme scenarios, testing the resilience of future aerospace vehicles against higher loads and more demanding operational conditions [14]. Recognizing its significance, Gartner, a leading global information technology consulting company, has listed DT technology among the top ten strategic trends and emerging technologies for the next 5–10 years [15].
Guo et al. used DT technology to construct comprehensive DT models spanning the entire lifespan of bearings. They used neural networks to obtain dynamic response outcomes from the mechanical model of bearings, thereby uncovering the evolutionary patterns of their life cycles [16]. Piltan et al. combined DT technology with machine learning to detect abnormal bearings and recognize crack sizes [17]. Zhao et al. employed DT technology to establish a model for wind turbine gearboxes, leveraging deep learning networks to accurately classify the operating conditions of these gearboxes [18]. Jahangiri et al. developed a mechanical model of a wind turbine transmission system using a DT approach, enabling the monitoring and identification of changes in structural model parameters for damage assessment [19]. Moreover, DT technology has recently found application in various fields, including construction [20], medical care [21], and communication [22]. Within rolling bearing fault diagnosis, DT technology plays a pivotal role. It replicates rolling bearings in the digital realm and generates sample datasets that exhibit the same characteristic distribution. By simulating multidimensional, multi-field, high-fidelity twin models, it becomes feasible to emulate bearing conditions under diverse operating circumstances and achieve fault diagnosis. DT technology also offers a new way to address the challenge of limited sample sizes in rolling bearing fault diagnosis, opening a new direction for research on the identification and diagnosis of bearings in rotating mechanical equipment.
Because the feature distributions of training and testing data often differ, some researchers have brought transfer learning into bearing fault diagnosis. Transfer learning leverages knowledge acquired from related source domains to make predictions in a target domain, which helps the model learn target-domain features and improves its generalization. Zhou et al. [23] introduced a Transfer Learning Residual Network model (TL-ResNet) that combines residual networks and transfer learning. This approach converts one-dimensional vibration data into time-frequency images and then transfers the model trained on the source domain dataset to the target domain bearings, enabling fault diagnosis of rolling bearings in the target domain. Huang et al. [24] proposed a deep transfer learning model that first selects a suitable source domain dataset using the maximum mean discrepancy to support model training. Domain features are then extracted using domain-specific feature extractors, and the classifier outputs are aligned via the Wasserstein distance. This approach proves effective in diagnosing bearing faults under diverse operating conditions. At present, the prevailing transfer learning approach builds the fault diagnosis model using test bench operating data as the source domain dataset. However, the physical attributes of test bench bearings differ from those of main bearings in real working conditions, and a test bench can only partially reproduce real operating conditions and environments; both factors significantly affect the accuracy of fault diagnosis.
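As a rough illustration of the source-selection idea attributed to Huang et al. [24] above, the following NumPy sketch scores candidate source feature sets by their squared maximum mean discrepancy (MMD) to the target features and keeps the closest one. The Gaussian kernel, the bandwidth `gamma`, the function names, and the synthetic feature sets are all assumptions made for illustration; none of them is taken from [24].

```python
import numpy as np

def rbf_kernel(a, b, gamma):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq = (np.sum(a**2, axis=1)[:, None]
          + np.sum(b**2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clip tiny negative round-off

def mmd2(source, target, gamma=1.0):
    """Biased estimate of the squared MMD between two sample sets."""
    k_ss = rbf_kernel(source, source, gamma).mean()
    k_tt = rbf_kernel(target, target, gamma).mean()
    k_st = rbf_kernel(source, target, gamma).mean()
    return k_ss + k_tt - 2.0 * k_st

# Synthetic stand-ins for extracted features (rows = samples, columns = features).
rng = np.random.default_rng(0)
target = rng.normal(0.0, 1.0, size=(200, 8))
candidates = {
    "near": rng.normal(0.1, 1.0, size=(200, 8)),  # distribution close to the target
    "far": rng.normal(2.0, 1.0, size=(200, 8)),   # strongly shifted distribution
}

# A smaller MMD means the candidate source domain is closer to the target domain.
scores = {name: mmd2(x, target, gamma=1.0 / 8) for name, x in candidates.items()}
best = min(scores, key=scores.get)
print(scores, "->", best)
```

In practice the features would come from a feature extractor rather than a random generator, and the kernel bandwidth would need tuning for the data at hand.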
The aim of DT technology is to reduce the dependence on experimental datasets as the source domain by creating high-fidelity twin models and acquiring a comprehensive and balanced sample dataset. Incorporating transfer learning into the diagnostic model framework also reduces the disparity in data distribution between the source and target domains, which helps to alleviate errors caused by imbalanced data distribution during the transfer of features and hyperparameters. In the research framework of rolling bearing fault diagnosis based on DT data, the choice of feature extraction network is of paramount importance. Wang et al. [25] introduced a multi-scale attention mechanism residual network model (MSA-ResNet) that increases feature sensitivity by integrating attention mechanisms into each residual module. This model uses multi-scale convolution kernels to extract features from non-linear vibration signals and shows notable advantages in bearing fault classification accuracy. Huang et al. [26] proposed a Channel Attention Mechanism Multi-Scale Convolutional Neural Network (CA-MCNN) model, which enhances the feature learning capabilities of the convolutional layers by introducing attention mechanisms and captures multi-scale information via a one-dimensional convolutional network. Experimental results validate the fault diagnosis performance of the model across various operating conditions. Zhang et al. proposed a bearing fault detection method based on an improved denoising autoencoder (DAE) and the bottleneck layer self-attention mechanism (MDAE-SAMB) [27], achieving high-accuracy online bearing fault classification using only a limited number of fault samples for offline training. Hou et al. presented a bearing fault diagnosis method that combines the Transformer and Residual Neural Network (ResNet) for joint feature extraction [28]. They employed a transfer learning strategy with fine-tuning to ease the training of the proposed method on new tasks, and the results showed superior prediction accuracy in high-noise environments compared to traditional deep learning networks. Zhao et al. proposed a dynamic capsule network with adaptive shared weights (DCCN) that adaptively adjusts convolutional weights using attention mechanisms [29]. The method was validated through experiments on bearing faults under noisy and variable-load conditions, demonstrating a certain degree of generalizability. Wang et al. introduced a dual-stream hybrid generative data-based dual-attention feature fusion network (DAFFN) [30]. They designed a feature fusion network with dual attention mechanisms to learn channel-level and layer-level weights for features. The results showed that the method maintained a certain diagnostic performance even with imbalanced datasets.
These studies show that deep learning networks are widely used in bearing fault diagnosis. However, their deep-layered structure can cause vanishing or exploding gradients, so that the network converges slowly or inefficiently, which in turn reduces the accuracy of bearing fault diagnosis. To tackle this challenge, this article proposes an enhanced ConvNext approach for bearing fault classification. As a next-generation convolutional neural network, ConvNext incorporates design choices from ResNet and Swin Transformer, which have achieved remarkable success in computer vision. Furthermore, the architectural design of ConvNext yields smoother network gradients, enabling faster convergence. To further improve the basic network model, this article modifies the Block module of the ConvNext network by introducing a SimAM attention module after the depthwise convolution. This module infers an importance weight for every neuron in the feature map from an energy function, without introducing additional parameters, thereby improving the overall performance of the basic network. In addition, an ECA attention module is inserted before the Layer Scale to allocate greater attention to fault features and make fault feature extraction more targeted, so that the fault features are used as fully as possible. Consequently, this paper employs the enhanced ConvNext network to construct a fault recognition model for rolling bearings.
Lastly, this article presents a fault diagnosis model framework for rolling bearings that incorporates DT data, transfer learning, and an enhanced ConvNext network. More specifically, the DT system for the rolling bearing is established by constructing a coupled reduced-order model (ROM) that encompasses the multi-physics field of the main bearing. This model is used to enrich the source domain sample dataset by introducing different faults and varying environmental parameters within a specific range. The enhanced ConvNext network model is then built and pre-trained on the source domain dataset, and its parameters are transferred to the target domain rolling bearing through weight and feature transfer. Finally, the enhanced ConvNext deep learning network performs accurate fault recognition on the defective bearing. The specific contributions are as follows:
- (1) A digital twin system has been devised for rolling bearings, incorporating the integration of multiple physics domains and employing model order reduction techniques. This system facilitates the creation of a substantial and well-balanced dataset, effectively mitigating the challenge posed by limited samples in fault diagnosis. Such an approach not only ensures cost-effectiveness but also enhances convenience.
- (2) The ECA-SimAM-ConvNext network model is introduced as an innovative classification framework for detecting rolling bearing faults. This model utilizes the ConvNext convolutional neural network as its foundation and integrates a parameter-free attention module (SimAM) and an efficient channel attention feature module (ECA) at strategic positions. These augmentations significantly enhance the network’s ability to extract fault features, resulting in improved performance.
- (3) An innovative methodology is presented for the identification of rolling bearing faults, integrating digital twin data, transfer learning principles, and deep learning algorithms. The efficacy, precision, and superiority of this approach have been substantiated through experimental validation.
The remainder of the paper is organized as follows. Section 2 discusses the digital twin system for rolling bearings, covering the construction of coupled reduced-order models for the multi-physics field and the establishment of the digital twin model. It also introduces the fundamentals of ConvNext, a key theoretical component. Section 3 presents the TL-ECA-SimAM-ConvNext method proposed in this study and its integration into the digital twin system, forming a novel framework for fault diagnosis and recognition. Section 4 demonstrates the feasibility of the proposed fault diagnosis method, where two commonly used bearing datasets are combined; the experimental results are presented and compared with alternative intelligent fault diagnosis approaches. Finally, Section 5 concludes the paper, summarizing the findings and implications.
3. DT-TL-ECA-SimAM ConvNext Model Bearing Fault Diagnosis Framework
This paper presents a framework for the fault diagnosis and identification of rolling bearings, as depicted in Figure 7. The proposed approach can be summarized as follows:
Step 1: By varying the input parameters of the digital twin model and recording the rolling bearing’s X- and Y-direction vibration displacement signals through virtual sensors, source domain datasets of simulated rolling bearing data under various operating conditions are generated. These datasets are then transformed into time-frequency maps using the continuous wavelet transform in MATLAB. Subsequently, preliminary training of the ECA-SimAM-ConvNext network model is conducted.
Step 2: The ECA-SimAM-ConvNext model is transferred to the target domain rolling bearings through weight and feature transfer.
Step 3: The DT-TL-ECA-SimAM-ConvNext network model is employed to perform fault diagnosis and identification of rolling bearings.
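The time-frequency conversion in Step 1 is performed in MATLAB in this work. For readers without MATLAB, the following NumPy sketch shows one way such a map could be computed. The Morlet mother wavelet, the parameter `w0 = 6`, the sampling rate, the frequency grid, and the synthetic test tone are assumptions for illustration only; the wavelet and settings actually used are not restated here.

```python
import numpy as np

def morlet_scalogram(signal, fs, freqs, w0=6.0):
    """Magnitude of a Morlet CWT; rows follow freqs, columns follow time."""
    signal = np.asarray(signal, dtype=float)
    rows = []
    for f in freqs:
        scale = w0 * fs / (2.0 * np.pi * f)  # wavelet width in samples for centre frequency f
        half = int(np.ceil(4.0 * scale))     # truncate the Gaussian envelope at 4 sigma
        if 2 * half + 1 > signal.size:
            raise ValueError("signal too short for the lowest analysis frequency")
        u = np.arange(-half, half + 1) / scale
        wavelet = np.exp(1j * w0 * u - 0.5 * u**2) / np.sqrt(scale)
        # The Morlet wavelet is Hermitian-symmetric, so plain convolution gives the CWT.
        rows.append(np.abs(np.convolve(signal, wavelet, mode="same")))
    return np.vstack(rows)

# Synthetic stand-in for a vibration displacement signal: a 1 kHz tone.
fs = 12_000                                  # assumed sampling rate in Hz
t = np.arange(2048) / fs
vibration = np.sin(2.0 * np.pi * 1000.0 * t)

freqs = np.linspace(500.0, 3000.0, 26)       # analysis frequencies in Hz (100 Hz steps)
scalogram = morlet_scalogram(vibration, fs, freqs)

# Frequency row with the strongest response in the middle of the record.
peak_hz = freqs[np.argmax(scalogram[:, 1024])]
print(scalogram.shape, peak_hz)
```

The resulting two-dimensional array can be rendered as an image and resized to the input resolution expected by the network. For the test tone, the strongest mid-record response falls on the 1000 Hz row.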
This article introduces an enhanced Block module within the ConvNext foundational network, referred to as the ECA-SimAM-ConvNext network model and illustrated in Figure 8. Including the ECA and SimAM attention modules within the Block module improves the model’s ability to extract fault features from images. Specifically, the two attention modules sharpen the model’s perception of crucial features, emphasizing essential fault characteristics while suppressing noise. This strengthens the network’s feature representation and makes it easier to differentiate among bearing states. By adaptively selecting the frequency ranges or spatial regions of interest, the model can effectively capture fault-related signal information, which improves its adaptability to different types of bearing faults and ultimately its generalization performance.
3.1. SimAM
Research has shown that the human brain uses attention mechanisms to process complex information effectively. In deep learning, attention mechanisms assign different weights to different parts of the input data, so that the model focuses on pertinent information and pays less attention to extraneous details. Drawing inspiration from neuroscience theory, researchers have introduced SimAM [32], a parameter-free attention module, as depicted in Figure 9.
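To identify important neurons, the authors of SimAM [32] define the following energy function, which measures the linear separability between a target neuron and the other neurons in the same channel:
$$e_t(w_t, b_t, \mathbf{y}, x_i) = (y_t - \hat{t})^2 + \frac{1}{M-1}\sum_{i=1}^{M-1}(y_o - \hat{x}_i)^2 \tag{1}$$
where $\hat{t} = w_t t + b_t$ and $\hat{x}_i = w_t x_i + b_t$ are linear transforms of the target neuron $t$ and of the other neurons $x_i$ in the same channel, $M$ is the number of neurons in that channel, and $w_t$ and $b_t$ are the weight and bias of the transform. Using binary labels ($y_t = 1$, $y_o = -1$) and adding a regularization term, the final energy function is defined as follows:
$$e_t(w_t, b_t, \mathbf{y}, x_i) = \frac{1}{M-1}\sum_{i=1}^{M-1}\left(-1 - (w_t x_i + b_t)\right)^2 + \left(1 - (w_t t + b_t)\right)^2 + \lambda w_t^2 \tag{2}$$
The minimum energy can be obtained by the following formula:
$$e_t^{*} = \frac{4(\hat{\sigma}^2 + \lambda)}{(t - \hat{\mu})^2 + 2\hat{\sigma}^2 + 2\lambda} \tag{3}$$
where $t$ is the target neuron, $\hat{\mu}$ and $\hat{\sigma}^2$ are the mean and variance of the remaining neurons, and $\lambda$ is the regularization coefficient. It can be seen from Formula (3) that the lower the energy, the greater the difference between neuron $t$ and the surrounding neurons, and the higher its importance. Therefore, the importance of each neuron can be obtained by $1/e_t^{*}$.
According to the definition of the attention mechanism, the features are then enhanced:
$$\tilde{X} = \mathrm{sigmoid}\left(\frac{1}{E}\right) \odot X \tag{4}$$
where $E$ groups all $e_t^{*}$ across the channel and spatial dimensions, $X$ is the input feature map, and $\odot$ denotes element-wise multiplication.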
Integrating the SimAM module into the network strengthens its feature representation, speeds up convergence, reduces overfitting to the training data, and thereby improves image recognition performance.
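The SimAM weighting described above can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the implementation used in this work: it operates on a single feature map of shape (C, H, W), it follows the common simplification of computing the mean and variance over all neurons of a channel, and the value `lam = 1e-4` for the regularization coefficient is an assumption. It relies on the identity that the inverse of the minimum energy equals the squared deviation divided by four times the regularized variance, plus 0.5.

```python
import numpy as np

def simam_weights(x, lam=1e-4):
    """Per-neuron attention weights sigmoid(1 / e_t*) for a (C, H, W) feature map."""
    n = x.shape[1] * x.shape[2] - 1                 # M - 1 neurons besides the target
    mu = x.mean(axis=(1, 2), keepdims=True)         # channel-wise mean
    d = (x - mu) ** 2                               # squared deviation of every neuron
    var = d.sum(axis=(1, 2), keepdims=True) / n     # channel-wise variance
    inv_energy = d / (4.0 * (var + lam)) + 0.5      # equals 1 / e_t*
    return 1.0 / (1.0 + np.exp(-inv_energy))        # sigmoid

def simam(x, lam=1e-4):
    """Reweight the feature map; no learnable parameters are involved."""
    return x * simam_weights(x, lam)

# Synthetic feature map with one neuron that stands out from its channel.
rng = np.random.default_rng(0)
features = rng.normal(size=(2, 8, 8))
features[0, 3, 3] = 10.0

weights = simam_weights(features)
enhanced = simam(features)
# Position of the largest weight in channel 0: the outlier neuron.
print(np.unravel_index(np.argmax(weights[0]), weights[0].shape))
```

Neurons that deviate most from their channel mean receive the largest weights, which matches the interpretation that low energy indicates high importance.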
3.2. ECA
ECA-Net is a channel attention module that was introduced at CVPR 2020 [33]. It enhances the channel features of the input feature map while preserving its original size. The structure of the ECA module is shown in Figure 10.
The ECA attention module begins by performing global average pooling on the input feature maps, resulting in a 1 × 1 × C feature map. It then learns a weight for each channel by applying a one-dimensional convolution across neighboring channels, followed by a sigmoid function. These channel weights are applied to the corresponding channels of the input feature map through element-wise multiplication, giving the channel-weighted feature map. The output feature map, with channel attention, has the same size as the original feature map. Incorporating the ECA module into the ConvNext network can improve model performance while adding very little model complexity. The module adaptively adjusts the importance of each channel and suppresses unnecessary information, thereby enhancing the model’s capacity to capture key features in the image.
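The ECA steps described above (global average pooling, a one-dimensional convolution across channels, a sigmoid, and channel-wise rescaling) can be sketched in NumPy as follows. This is an illustration only, not the implementation used in this work. The kernel values are random stand-ins for weights that would be learned during training, and the adaptive kernel-size rule with `gamma = 2` and `b = 1` follows the ECA-Net paper [33]; whether this work uses that rule or a fixed kernel size is not stated, so it is an assumption here.

```python
import numpy as np

def eca_kernel_size(channels, gamma=2, b=1):
    """Adaptive 1D kernel size from the channel count, rounded to an odd number."""
    t = int(abs((np.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1

def eca(x, kernel):
    """Apply efficient channel attention to a (C, H, W) feature map."""
    pooled = x.mean(axis=(1, 2))                   # global average pooling -> (C,)
    pad = len(kernel) // 2
    padded = np.pad(pooled, pad)                   # zero padding keeps C outputs
    # Cross-correlation over neighbouring channels, as a 1D conv layer would compute.
    logits = np.correlate(padded, kernel, mode="valid")
    weights = 1.0 / (1.0 + np.exp(-logits))        # sigmoid -> one weight per channel
    return x * weights[:, None, None], weights

rng = np.random.default_rng(0)
features = rng.normal(size=(64, 14, 14))           # synthetic feature map
k = eca_kernel_size(64)
kernel = rng.normal(scale=0.5, size=k)             # stand-in for learned weights
out, channel_weights = eca(features, kernel)
print(k, out.shape, channel_weights.shape)
```

Because the only learnable quantity is the short one-dimensional kernel, the module adds just `k` parameters per block, which is why it is considered lightweight.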