Research on a Bearing Fault Diagnosis Method Based on an Improved Wasserstein Generative Adversarial Network

Zhu, Chengshun; Lin, Wei; Zhang, Hongji; Cao, Youren; Fan, Qiming; Zhang, Hui

doi:10.3390/machines12080587


by Chengshun Zhu, Wei Lin, Hongji Zhang, Youren Cao, Qiming Fan and Hui Zhang *

School of Mechanical Engineering, Jiangsu University of Science and Technology, Zhenjiang 212100, China

* Author to whom correspondence should be addressed.

Machines 2024, 12(8), 587; https://doi.org/10.3390/machines12080587

Submission received: 25 June 2024 / Revised: 14 August 2024 / Accepted: 21 August 2024 / Published: 22 August 2024

(This article belongs to the Section Machines Testing and Maintenance)


Abstract:

In this paper, an advanced Wasserstein generative adversarial network (WGAN)-based bearing fault diagnosis approach is proposed to bolster the diagnostic efficacy of conventional WGANs and tackle the challenge of selecting optimal hyperparameters while reducing the reliance on sample labeling. Raw vibration signals undergo continuous wavelet transform (CWT) processing to generate time–frequency images that align with the model’s input dimensions. Subsequently, these images are incorporated into a region-based fully convolutional network (R-FCN), substituting the traditional discriminator for feature capturing. The WGAN model is refined through the utilization of the Bayesian optimization algorithm (BOA) to optimize the generator and discriminator’s semi-supervised learning loss function. This approach is verified using the Case Western Reserve University (CWRU) dataset and a centrifugal pump failure experimental dataset. The results showed improvements in data input generalization and fault feature extraction capabilities. By avoiding the need to label large quantities of sample data, the diagnostic accuracy was improved to 98.9% and 97.4%.

Keywords:

Bayesian optimization algorithms; fault diagnosis; WGAN

1. Introduction

As critical components of machinery, bearings are susceptible to wear, fractures, and other phenomena [1] due to vibrations and impacts during operation, potentially leading to malfunctions or accidents. Consequently, bearing fault diagnosis research is of paramount importance.

Contemporary diagnostic methods encompass time-domain, frequency-domain, machine learning, and deep learning techniques, with general procedures involving signal acquisition and processing, feature extraction, and fault classification [2,3,4,5,6]. However, diagnostic precision is impacted by the non-linear and non-stationary characteristics of signals, which lead to the obtained signals failing to fully express the bearing’s state [7]. Yan et al. employed multi-scale enveloping spectrum (MuSEnE) technology to visually ascertain bearing fault locations based on defect characteristic frequencies [8]. Hoang, Duy-Tang et al. proposed a method for diagnosing bearing faults using a deep convolutional neural network. Through the direct use of vibration signals as the input, they eliminated the need for feature extraction and achieved a high accuracy and robustness in noisy environments with the CWRU dataset [9]. Mohammadreza Ghorvei et al. proposed a novel deep subdomain adaptation graph convolutional neural network (DSAGCN), addressing unsupervised domain adaptation, data geometric structure, and subdomain relation issues in fault diagnosis [10]. Qiang Wang et al. introduced an adaptive denoising convolutional neural network (ADCNN) to tackle the common challenges of noise interference and varying load conditions in rolling bearing fault diagnosis, thereby enhancing diagnostic accuracy and robustness to meet the challenges of complex industrial environments [11]. Spyridon Plakias proposed a new deep learning framework for unsupervised fault detection. Through the integration of multiple types of autoencoders and employment of a soft voting mechanism, the framework effectively addressed the issue of identifying abnormal samples with only normal training data, significantly improving fault detection accuracy and robustness [12]. Jinchuan Qian et al. presented an industrial monitoring method based on autoencoders (AEs), addressing the fault detection and diagnosis requirements of complex processes. Their approach advanced the intelligence of monitoring systems, enhancing both performance and efficiency [13]. Jiazhen Zhu et al. proposed a load weighted denoising autoencoder (LWDAE), which highlighted useful information from both training and online data through weighted loading matrices. Through the modification of the loss function, the addition of regularization terms, and revisions to the calculation logic, the LWDAE resolved issues of mixed and neutralized process variables in the fault diagnosis process [14].

Despite advances in bearing fault diagnostic methods, the non-linear and non-stationary nature of signals continues to pose challenges to diagnostic accuracy. Generative adversarial networks (GANs) have been introduced as an innovative approach to improve effectiveness. GANs learn complex data distributions through training generators and discriminators, thereby extracting deeper fault characteristics directly from raw signals without the need for manual feature extraction. Liang et al. merged generative adversarial networks (GANs) with convolutional neural networks (CNNs) to accomplish end-to-end feature extraction, minimizing information loss and enhancing diagnostic efficiency [15]. Ren, Z. et al. proposed a multi-domain GAN with a self-reasoning training strategy to generate diverse, reliable samples. Separate domain models ensured training stability and similarity to real samples, addressing overfitting in small sample conditions and improving fault diagnosis accuracy [16]. Zhou, K. introduced a semi-supervised DCGAN method to diagnose gear faults with limited training data, combining labeled and unlabeled data to enhance performance. The approach ensures model stability, prevents collapse, and effectively explains the correlations between known and unseen faults [17]. Wang, R. et al. proposed the GFMGAN method to address data scarcity in fault diagnosis via expansion of the original dataset, which improved the reliability of vibration data and diagnostic accuracy [18].

The above methods not only solve the problem of non-linear and non-stationary signals but also solve the problem of missing signals, ensuring that the fault characteristics of the original vibration signal are retained. The WGAN, an extension of the GAN [19], was developed to address the limitations of both JS and KL divergence inherent in GANs and partially alleviate the vanishing gradient issue. Wang et al. employed data augmentation for under-represented classes and a WGAN to bolster small sample datasets, thereby enhancing the precision of fault diagnosis [20]. Wu et al. used a WGAN to enhance the number of fault samples in planetary gearboxes. He et al. integrated a WGAN with an attention mechanism to augment the model’s multi-fault identification capabilities [21]. Nonetheless, the WGAN discriminator, still grounded in GANs, exhibits incomplete fault feature extraction in convolutional layers, leading to suboptimal model training accuracy. This study proposes the implementation of the R-FCN as a substitute for the conventional discriminator network and the adoption of a two-dimensional convolutional layer for fault feature extraction as a means of refining the WGAN.

As the number of network layers grows, optimizing the network hyperparameters becomes increasingly challenging. The prevailing optimization techniques are primarily grid search (GS) [22], random search (RS) [23], genetic algorithms (GAs) [24], and Bayesian optimization (BO) [25]. Nv et al. employed GS optimization for maximum correlation kurtosis deconvolution and successfully extracted feeble initial-stage bearing fault vibration signals [26]. Chen et al. used random search (RS) to optimize long short-term memory (LSTM) models, accurately pinpointing various bearing fault locations and damage levels [27]. Xu et al. utilized a quantum genetic algorithm (QGA) for the global optimization of support vector machines (SVMs) to accurately identify and classify bearing faults [28]. Although the aforementioned methodologies can optimize networks to a certain degree, they remain inadequate for obtaining network hyperparameters in intricate problems with unknown objective functions [29]. Conversely, the Bayesian optimization algorithm (BOA) needs only a limited number of objective function evaluations to achieve optimal outcomes, streamlining network optimization.

In conclusion, existing methods have limitations when dealing with the non-linear and non-stationary characteristics of bearing fault signals, leading to a decrease in diagnostic accuracy. Furthermore, many current methods rely on a single dataset for validation and lack extensive data support to fully demonstrate their effectiveness and robustness. Although the traditional generative adversarial network (GAN) can partially address the issue of small sample sizes, deficiencies in fault feature extraction still impact the training accuracy and diagnostic efficacy of the model. Given this situation, this study proposes an enhanced WGAN fault diagnosis method by replacing the conventional discriminator network with R-FCN and utilizing two-dimensional convolutional layers for improved fault feature extraction. Additionally, the Bayesian optimization algorithm (BOA) is employed to optimize the enhanced WGAN by obtaining optimal network hyperparameters, while a semi-supervised learning loss function is formulated to optimize both the generator and discriminator to perform bearing fault diagnosis.

2. Methodology

2.1. WGAN-Related Theory

In this study, the WGAN is adopted. It builds on the GAN, whose training objective is based on the KL and JS divergences. However, when these divergences are used to quantify the similarity between synthetic data and actual data, the GAN’s training process is unstable. According to the formula derivation, as long as the generated data and the real data do not overlap, their JS divergence is always log 2 and their KL divergence is always infinite, no matter how close or far apart the two are in space. The generator gradient therefore vanishes: even when the discriminator is well trained, the generator parameters cannot be optimized. The KL divergence is defined as follows:

D_{KL}(p \| q) = H(p, q) - H(p) = -\int p(x) \log q(x) \, dx - \left(-\int p(x) \log p(x) \, dx\right)

(1)

where p and q are probability distributions of the real and predicted labels, respectively; H(p) is the information entropy; H(p,q) is the cross-entropy; p(x) is the weight of each x; and −log q(x) is the amount of Shannon information per point of x. The JS divergence also measures the difference between two distributions; unlike the KL divergence, it is symmetric, which remedies the fact that the KL divergence is not a true distance. The JS divergence is defined as follows:

JSD(p \| q) = \frac{1}{2} D_{KL}(p \| m) + \frac{1}{2} D_{KL}(q \| m), \quad m = \frac{1}{2}(p + q)

(2)

To address the problem that the JS and KL divergence cannot accurately measure the similarity between synthetic data and actual data in GANs, scholars have proposed using the Earth Mover’s (EM) distance in the WGAN to replace the KL and JS divergence. This allows for a more accurate measurement of the similarity between synthetic and actual data, enabling the generator and discriminator to be accurately trained. Figure 1 shows the WGAN structure determined using the Wasserstein distance.
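The contrast between the three measures can be checked numerically. The following is a minimal sketch, not taken from the paper: it assumes two uniform histograms on a shared 1-D grid (the helper `shifted_block` and the bin counts are hypothetical) and uses the standard fact that the 1-D Earth Mover's distance equals the summed absolute difference of the cumulative distributions.

```python
import numpy as np

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; infinite where q = 0 but p > 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    if np.any(q[mask] == 0):
        return np.inf
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def js_divergence(p, q):
    """Jensen-Shannon divergence built from two KL terms against the mixture m."""
    m = 0.5 * (np.asarray(p, dtype=float) + np.asarray(q, dtype=float))
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def wasserstein_1d(p, q, dx=1.0):
    """Earth Mover's distance between two histograms on the same uniform grid."""
    return float(np.sum(np.abs(np.cumsum(p) - np.cumsum(q))) * dx)

def shifted_block(n_bins, start, width=4):
    """Hypothetical helper: uniform histogram over `width` bins from `start`."""
    h = np.zeros(n_bins)
    h[start:start + width] = 1.0 / width
    return h

n_bins = 40
real = shifted_block(n_bins, 0)
results = {}
for shift in (5, 10, 20):
    fake = shifted_block(n_bins, shift)  # support never overlaps with `real`
    results[shift] = (kl_divergence(real, fake),
                      js_divergence(real, fake),
                      wasserstein_1d(real, fake))
    print(shift, results[shift])
```

For every non-overlapping shift the KL divergence is infinite and the JS divergence stays at log 2, so neither tells the generator whether it is getting closer; the Earth Mover's distance grows with the shift and therefore still carries a usable signal.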

Compared to the KL and JS divergence, the Wasserstein distance can reflect the distance between two distributions even when their support sets do not overlap [30], or they overlap very little, which improves the stability of the network. It is defined as follows:

W(P_{1}, P_{2}) = \inf_{\gamma \in \Pi(P_{1}, P_{2})} E_{(x, y) \sim \gamma} [\|x - y\|]

(3)

In this formula, $\Pi(P_{1}, P_{2})$ is the set of all joint distributions $\gamma$ whose marginal distributions are $P_{1}$ and $P_{2}$, and $W(P_{1}, P_{2})$ is the infimum (greatest lower bound) over this set of the expected value $E_{(x, y) \sim \gamma} [\|x - y\|]$. A network model needs appropriate loss functions to drive forward propagation and backpropagation, and the WGAN likewise needs loss functions to judge whether the generator and discriminator are optimal.

The generator loss function:

J(G) = -E_{z \sim P_{z}} [f_{W}(G(z))]

(4)

The discriminator loss function:

J(D) = E_{z \sim P_{z}} [f_{W}(G(z))] - E_{x \sim P_{r}} [f_{W}(x)]

(5)

To ensure that the data generated by the generator are distributed along the gradient direction of the discriminator, the 1-Lipschitz condition is introduced in this paper to prevent the gradient from disappearing during training. The formula is as follows:

\|f(x_{t}) - f(y_{t})\| \leq K \|x_{t} - y_{t}\|

(6)

where $x_{t}$ and $y_{t}$ are two points in the domain of the function and K is a constant; K = 1 gives the 1-Lipschitz constraint. The infimum in Equation (3) cannot be computed directly without significant error; however, the Kantorovich–Rubinstein duality removes this difficulty by rewriting the Wasserstein distance as a supremum over 1-Lipschitz functions, as follows:

W(P_{1}, P_{2}) = \sup_{\|D\|_{L} \leq 1} E_{x \sim p(x)} [D(x)] - E_{z \sim p(z)} [D(G(z))]

(7)

Here, $\|D\|_{L} \leq 1$ is the constraint condition: the supremum is taken only over discriminators D that are 1-Lipschitz, which bounds the loss value of the discriminator. To optimize the discriminator, a regularization term is added to the discriminator loss function to implement the gradient penalty, and the formula is as follows:

\Omega_{\bar{x}} = 1 - \lambda E_{\bar{x} \sim p(\bar{x})} \|\nabla_{\bar{x}} D(\bar{x})\|^{2}

(8)

where λ is the regularization coefficient; $\|\cdot\|$ is the norm; and $\bar{x}$ represents a random data point sampled between two points.

2.2. Bayesian Optimization Algorithm

The surrogate model approximates the objective function. It is refined iteratively: the parameters evaluated in the preceding iteration are used to update the model parameters and to correct the probabilistic model. The expression is as follows:

f(x) \sim GP(m(x), k(x, x^{'}))

(9)

where $f(x)$ represents the evaluation value of the objective function corresponding to the sample; $x$ represents a sample in the training set; $X = (x_{1}, x_{2}, \dots, x_{n})$ represents the sample set; $m(x)$ represents the mean function of the Gaussian process; and $k(x, x^{'})$ represents its covariance function. The objective function is expressed as follows:

H = -\sum_{b=1}^{B} \sum_{c=1}^{C} \tau_{bc} \ln \omega_{bc} + \lambda \sum_{j=1}^{2} \rho_{j}^{2}

(10)

where B represents the number of samples; C represents the number of classes; $\tau_{bc}$ indicates whether the b-th sample belongs to the c-th class; $\omega_{bc}$ represents the classification output; λ represents the regularization coefficient; $\rho_{j}$ represents the parameters to be learned in the network layer; and j indexes the feature mapping. The mean function is as follows:

m(x) = E[f(x)]

(11)

where E represents the expectation under the Gaussian distribution.

To counteract the issue of unbalanced sampling due to repetitive sampling at local optima, the acquisition function is employed to select an optimal sample from the surrogate model. Through the maximization of the acquisition function, the subsequent satisfactory sample point can be identified. The expression is as follows:

I^{*} = \arg\max_{i \in I} \varphi_{i}(x, A)

(12)

where $\varphi_{i}$ represents the acquisition function and A represents the set of observed data.

To achieve optimal sample acquisition, the upper confidence bound (UCB) acquisition function is utilized in this paper, with the expression as follows:

\varphi_{UCB}(x, A) = \mu(x) + \beta \delta(x)

(13)

where $\mu(x)$ and $\delta(x)$ represent the mean and covariance functions, respectively, of the posterior distribution of the surrogate model, and $\beta$ represents the tuning parameter.
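For reference, the posterior quantities entering Equation (13) have a closed form under standard Gaussian process regression. The expressions below are the textbook result rather than something stated in this paper. They assume a zero prior mean, an observation noise variance $\sigma_{n}^{2}$, observed inputs $x_{1}, \dots, x_{n}$ with evaluation values collected in a vector $\mathbf{u}$, and they read $\delta(x)$ as the posterior standard deviation, as is common for UCB.

```latex
\mu(x) = \mathbf{k}_{*}^{\top} (K + \sigma_{n}^{2} I)^{-1} \mathbf{u},
\qquad
\delta^{2}(x) = k(x, x) - \mathbf{k}_{*}^{\top} (K + \sigma_{n}^{2} I)^{-1} \mathbf{k}_{*},
\qquad
K_{ij} = k(x_{i}, x_{j}), \quad
\mathbf{k}_{*} = [k(x, x_{1}), \dots, k(x, x_{n})]^{\top}
```

Under this reading, a larger $\beta$ weights the uncertainty term more heavily and favors exploring untested hyperparameters, while a smaller $\beta$ favors points where the predicted mean is already high.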

2.3. R-FCN

The R-FCN structure is described in [31]. R-FCN is an improvement over both R-CNN and Faster R-CNN, where the ResNet101 network is used to replace VGG16. Compared to CNNs and related networks, more of the convolutional computation is shared, which improves feature extraction. Spatial pyramid pooling (SPP) is used to standardize the image size, avoiding dimensional errors caused by different image sizes and improving the overall generalization of the network. Region of interest (RoI) candidate boxes are generated by the region proposal network (RPN); each RoI is then mapped onto the feature map produced by the shared convolutional layers, and the required features are calculated. Finally, a convolutional layer extracts the regional features from the RoI feature map and performs classification. Since the R-FCN trains multiple tasks on the feature map at the same time, the cross-entropy classification loss and the bounding-box regression loss are merged into a single multi-task loss function. The expression is as follows:

L(s, t_{x, y, w, h}) = L_{cls}(s_{c^{*}}) + \lambda [c^{*} > 0] L_{reg}(t, t^{*})

(14)

L_{cls}(s_{c^{*}}) = -\log(s_{c^{*}})

(15)

L_{reg}(t, t^{*}) = \sum_{i \in \{x, y, w, h\}} \mathrm{smooth}_{L1}(t_{i} - t_{i}^{*})

(16)

\mathrm{smooth}_{L1}(x) = \begin{cases} 0.5 x^{2}, & \text{if } |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}

(17)

where $c^{*}$ is the ground-truth class label; $L_{reg}$ is the bounding-box regression loss; $t^{*}$ is the ground-truth box; $L(s, t_{x, y, w, h})$ is the final loss value; $[c^{*} > 0]$ is an indicator equal to 1 if its argument is true and 0 otherwise; λ is the balance weight; and $L_{cls}(s_{c^{*}})$ is the cross-entropy loss function.
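Equations (14) to (17) can be evaluated directly for a single region of interest. The sketch below is illustrative only: the function names, the example scores, and the convention that class 0 denotes background (so that the regression term is switched off when $c^{*} = 0$) are assumptions made for the example.

```python
import numpy as np

def smooth_l1(x):
    """Equation (17): quadratic near zero, linear for |x| >= 1."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) < 1.0, 0.5 * x ** 2, np.abs(x) - 0.5)

def rfcn_loss(scores, c_star, t, t_star, lam=1.0):
    """Equation (14) for one RoI.

    scores : class probabilities (assumed already softmax-normalized)
    c_star : ground-truth class index, 0 assumed to mean background
    t, t_star : predicted and ground-truth box offsets (x, y, w, h)
    """
    l_cls = -np.log(scores[c_star])                                       # Eq. (15)
    l_reg = np.sum(smooth_l1(np.asarray(t) - np.asarray(t_star)))         # Eq. (16)
    return float(l_cls + lam * (c_star > 0) * l_reg)

scores = np.array([0.1, 0.7, 0.2])
t_pred = [0.5, 0.2, 2.0, 0.0]
t_true = [0.0, 0.0, 0.0, 0.0]

loss_fg = rfcn_loss(scores, 1, t_pred, t_true)  # foreground: both terms active
loss_bg = rfcn_loss(scores, 0, t_pred, t_true)  # background: classification only
print(loss_fg, loss_bg)
```

The offset of 2.0 falls in the linear branch of the smooth L1 function, which is what keeps large box errors from dominating the gradient.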

3. Implementation

3.1. Construction of the Improved WGAN Model

In R-FCN, the convolutional kernels of the one-dimensional convolutional layers can only perform sliding window operations in the length direction, resulting in a limited receptive field. Moreover, each convolutional kernel produces only one value at each position, and the number of output channels depends on the number of convolutional kernels. In contrast, two-dimensional convolutional kernels slide in both the length and width directions while spanning the input channels, expanding the receptive field of individual elements in deeper CNN feature maps and thereby capturing larger features in the input signal. Additionally, two-dimensional convolutional layers perform two-dimensional cross-correlation operations and are more effective at recognizing image features than one-dimensional convolutional layers. Therefore, the original convolutional layers in R-FCN are replaced with two-dimensional convolutional layers.
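The two-dimensional cross-correlation mentioned above can be written out in a few lines. This is a minimal single-channel sketch with no padding or stride, using a hypothetical 4 × 4 input and a 3 × 3 averaging kernel; it is meant to show the sliding-window operation, not the layers used in the paper.

```python
import numpy as np

def cross_correlate_2d(image, kernel):
    """Valid-mode 2-D cross-correlation: slide the kernel over rows and columns."""
    image = np.asarray(image, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Each output element sees a kh x kw patch of the input.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.ones((3, 3)) / 9.0          # 3 x 3 averaging kernel
feature_map = cross_correlate_2d(image, kernel)
print(feature_map)
```

Each output element here depends on a 3 × 3 neighborhood in both directions; stacking such layers widens that neighborhood further, which is the receptive-field argument made in the paragraph.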

The WGAN originally consists of a discriminator and a generator. However, when the generator produces data or images, the training effectiveness of the 1D convolutional layer is poor, leading to the generation of data or images that cannot be accurately recognized by the discriminator. Consequently, the discriminator fails to extract deep features, impairing its ability to determine whether the data or images are “real” or “fake”.

The workflow of the generator in this paper includes vibration signal acquisition, feature extraction, and feature fusion. Acquiring vibration signals involves the first layer of the generator, where a fully connected layer and batch normalization process the randomly generated signals. The aim is to match the random signal size with the input size of the generative network, principally using 128-pixel RGB images. Feature extraction identifies prominent signal features, using four 2D depth convolutions (Conv2D) with four BN layers, four LeakyReLU activation functions, and dropout layers alternately iterating to remove redundant features and retain the effective feature information. Feature fusion is also achieved using 2D convolution, employing large 3 × 3 convolutional kernels and small 1 × 1 convolutional kernels for feature fusion. Finally, the tanh activation function outputs the final result created by the generator. The upper part of Figure 2 shows the network structure of the generator.

The discriminator is replaced by an improved R-FCN. Initially, five 2D convolutional layers are used to obtain the feature map, with the first four convolutional layers having a kernel size of 3 × 3 and the last one having a kernel size of 1 × 1. Batch normalization (BN) layers and LeakyReLU activation functions are employed between the convolutional layers. Next, 2D convolution is applied to the feature map using 1 × 1 convolutional kernels to extract information, resulting in a new feature map. This repeated convolutional operation performs different convolutional operations on the input image, deeply mining the multi-scale feature information of the image, removing redundant signal features, and reducing the loss of effective feature information. Finally, the feature map containing this information is input into the R-FCN to determine the authenticity of the image and to output the fault type and diagnostic accuracy. The lower part of Figure 2 shows the network structure of the discriminator.

Both the generator and discriminator contain BN layers and dropout layers to prevent the WGAN model from experiencing gradient vanishing problems. These layers ensure that the dimensionality of the generator’s output and the discriminator’s input signals remain consistent, effectively speeding up the convergence of the 2D convolutional layers and reducing the drawbacks of slow fitting. Compared to CNNs, the difference with fully convolutional networks (FCNs) is that FCNs replace the fully connected layers at the end of CNNs with convolutional layers. Fully convolutional networks can progressively extract deep hidden features from the input information. When handling large amounts of high-dimensional RGB image data, the number and activation level of neurons between convolutional layers determine the speed of network training.

Finally, the improved WGAN diagnosis model first refines the discriminator network structure based on the traditional WGAN model and then combines it with the generator. The specific network architecture is shown in Figure 2.

3.2. Hyperparameter Optimization Based on the BOA

To optimize the improved WGAN model using the Bayesian optimization algorithm (BOA), the following steps are performed:

(1): Initialize the network hyperparameters. Obtain the hyperparameters to be optimized, forming an initial parameter set $F = [F_{1}, F_{2}, \dots, F_{n}]$;
(2): Establish the objective function. This function is used to calculate the posterior probability for updating the optimization function;
(3): Use the objective function from step (2) to evaluate the parameter sets from step (1), obtaining the corresponding function evaluation values $U_{n}$, and construct the dataset $A = [(F_{1}, U_{1}), \dots, (F_{n}, U_{n})]$ comprising $F_{n}$ and $U_{n}$;
(4): Employ dataset A to build a Gaussian regression model. Use the first set of data from step (3) to determine if the model meets the requirements. If the generator and discriminator fitting conditions are met, the hyperparameters for that set are satisfactory; otherwise, proceed to step (5);
(5): Continuously iterate and update the loss function parameters through the Gaussian regression model to refine the probability model;
(6): Acquire the next set of hyperparameters using the acquisition function, and evaluate it with the objective function from step (2) to obtain a new evaluation value;
(7): Determine if the accuracy requirements are met; if so, confirm the hyperparameters $F_{n}$ and $U_{n}$; otherwise, continue executing steps (2)–(6).
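The steps above can be sketched as a small loop. This is a toy illustration under several assumptions that are not from the paper: a single hyperparameter (the base-10 logarithm of a learning rate), a hypothetical `toy_objective` standing in for the expensive WGAN training run, an RBF kernel with fixed length scale, a zero-mean Gaussian process, and a fixed iteration budget in place of the accuracy check in step (7).

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.5):
    """Squared-exponential covariance between two 1-D arrays of points."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_obs, y_obs, x_query, noise=1e-4):
    """Posterior mean and standard deviation of a zero-mean GP."""
    K = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    K_s = rbf_kernel(x_obs, x_query)
    mu = K_s.T @ np.linalg.solve(K, y_obs)
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s), axis=0)
    return mu, np.sqrt(np.clip(var, 0.0, None))

def toy_objective(log_lr):
    """Hypothetical stand-in for validation accuracy versus log10(learning rate)."""
    return float(np.exp(-(log_lr + 3.0) ** 2))

def bayes_opt(objective, bounds=(-5.0, -1.0), n_init=3, n_iter=12, beta=2.0, seed=0):
    rng = np.random.default_rng(seed)
    grid = np.linspace(bounds[0], bounds[1], 201)             # candidate values
    x_obs = rng.uniform(bounds[0], bounds[1], size=n_init)    # step (1)
    y_obs = np.array([objective(x) for x in x_obs])           # steps (2)-(3)
    for _ in range(n_iter):
        mu, sigma = gp_posterior(x_obs, y_obs, grid)          # steps (4)-(5)
        x_next = grid[np.argmax(mu + beta * sigma)]           # step (6), UCB
        x_obs = np.append(x_obs, x_next)
        y_obs = np.append(y_obs, objective(x_next))
    best = int(np.argmax(y_obs))
    return x_obs[best], y_obs[best]

best_x, best_y = bayes_opt(toy_objective)
print(best_x, best_y)
```

Each pass through the loop costs one objective evaluation, which is the reason the approach suits cases where every evaluation means training a network.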

3.3. Semi-Supervised Learning Optimization to Improve the WGAN Model

The loss function of the WGAN is not strictly limited to unsupervised learning. Although its core goal is to estimate the Wasserstein distance between the real data distribution and the generated data distribution in an unsupervised setting, unsupervised learning does have certain limitations. For instance, it cannot exploit label information during training, which may result in erroneous outputs and failure to meet the high requirements of fault diagnosis. On the other hand, supervised learning can improve accuracy but requires a large quantity of labeled data, which makes it time-consuming and difficult to apply to fault diagnosis with a small dataset [32]. Therefore, in this paper, we propose the integration of a semi-supervised learning loss function into the WGAN loss function to optimize the network model. The loss function improvement method is as follows:

(1): Generator Loss Function

To optimize the network model of the generator, we adopt the mean squared error as the generator’s loss function. The MSE weight is normalized and multiplied by a weight scale factor to effectively reduce the impact of random noise. The expression is as follows:

L_{G} = c \alpha L_{MSE}[D(G(z))]

(18)

where c is the weight, ranging over [−1, 1]; α is the scale factor; $L_{MSE}$ is the mean squared error; and z is the random noise.

(2): Discriminator Loss Function

The loss functions of supervised learning, unsupervised learning, and semi-supervised learning are integrated into a new loss function for the discriminator in this study. This approach not only addresses the drawbacks of long training times and dependency on a large quantity of labeled sample data but also eliminates inaccuracies in model diagnosis caused by potential errors from unlabeled data.

First, the supervised learning loss function is established as follows:

L_{s} = E_{u, v \sim P_{data}(u, v)} [P_{fake}(v | u)]

(19)

where E represents the mathematical expectation; $P_{data}(u, v)$ is the probability distribution of the real data u and v; and $P_{fake}(v | u)$ is the probability that the discriminator determines that the data are fake.

Next, the unsupervised learning loss function is established, including the loss of the real data and the generated data, as follows:

L_{us\text{-}r} = E_{u \sim P_{data}(u)} D(u)

(20)

L_{us\text{-}f} = E_{z \sim noise} (1 - D(G(z)))

(21)

where $L_{us\text{-}r}$ is the unsupervised learning loss on real data; $L_{us\text{-}f}$ is the unsupervised learning loss on generated data; D(u) is the discriminator’s evaluation of real data u; G(z) is the generated data; z is random noise; $u \sim P_{data}(u)$ denotes the real data; D(G(z)) is the discriminator’s evaluation of generated data G(z); and $z \sim noise$ denotes the noise from which the false data are generated.

Finally, the discriminator loss function ($L_{D}$) is established as follows:

L_{D} = L_{s} + L_{us\text{-}r} + L_{us\text{-}f}

(22)

To improve the performance of the discriminator loss function, weights are added to the loss function, resulting in the total discriminator loss $L_{D}$:

L_{D} = \lambda_{1} L_{s} + \lambda_{2} L_{us\text{-}r} + \lambda_{3} L_{us\text{-}f}

(23)

where $\lambda_{1}$ is the weight of the $L_{s}$ loss; $\lambda_{2}$ is the weight of the $L_{us\text{-}r}$ loss; and $\lambda_{3}$ is the weight of the $L_{us\text{-}f}$ loss.
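Taken literally, Equations (19) to (23) reduce to three batch averages and a weighted sum. The sketch below transcribes them as written; the function name, the argument names, and the example probabilities are hypothetical, and the sign conventions used during actual training are not specified here.

```python
import numpy as np

def discriminator_loss(p_fake_labeled, d_real, d_fake, weights=(1.0, 1.0, 1.0)):
    """Weighted semi-supervised discriminator loss, Equation (23).

    p_fake_labeled : P_fake(v | u) for labeled real samples
    d_real         : D(u) for unlabeled real samples
    d_fake         : D(G(z)) for generated samples
    weights        : (lambda_1, lambda_2, lambda_3)
    """
    l_s = float(np.mean(p_fake_labeled))                 # Eq. (19)
    l_us_r = float(np.mean(d_real))                      # Eq. (20)
    l_us_f = float(np.mean(1.0 - np.asarray(d_fake)))    # Eq. (21)
    lam1, lam2, lam3 = weights
    return lam1 * l_s + lam2 * l_us_r + lam3 * l_us_f

p_fake_labeled = [0.1, 0.3]
d_real = [0.8, 0.6]
d_fake = [0.2, 0.4]

unweighted = discriminator_loss(p_fake_labeled, d_real, d_fake)                 # Eq. (22)
weighted = discriminator_loss(p_fake_labeled, d_real, d_fake, (0.5, 0.25, 0.25))
print(unweighted, weighted)
```

With unit weights the function reproduces Equation (22), so the weighting can be seen as a generalization of the unweighted sum.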

The weight expression is as follows:

where $w(t + 1)$ is the current weight value; $w(t)$ is the previous weight value; Lr is the learning rate; $x - \hat{x}$ is the net output value of the sample; and y is the sample feature value. When the loss function is optimized, λ₁, λ₂, and λ₃ are the optimal weight values.

3.4. Improved WGAN Model Diagnosis Framework

As can be seen from the first three sub-sections of Section 3, the improvements to the traditional WGAN include enhancing the R-FCN, replacing the traditional discriminator with the improved R-FCN, using the BOA to optimize the WGAN’s hyperparameters, and replacing the original WGAN loss function with a semi-supervised learning loss function. The improved WGAN model is shown in Figure 2. The bearing fault diagnosis process based on the improved WGAN is shown in Figure 3, and the diagnosis process is as follows:

Step 1: Use MATLAB to process the vibration signals with continuous wavelet transform (CWT) to obtain a time–frequency dataset, eliminating the impact of noise and significantly different data.

Step 2 and Step 3: Process the dataset; split it into training and testing sets in an 8:2 ratio; and label the one-hot encoded sample data with labeling rates of ε₁ = 0, ε₂ = 0.2, ε₃ = 0.4, ε₄ = 0.5, and ε₅ = 1 (where ε₁ = 0 and ε₅ = 1 represent unsupervised learning and supervised learning, respectively).

Step 4: Randomly initialize the model parameters, and input the number of iterations, learning rate, fault categories, batch size, loss function, etc.

Step 5: Train the discriminator using the training set input into the R-FCN. Update the discriminator model through backpropagation, then use the updated discriminator model parameters to update the generator model. The generator model is considered optimal when the discriminator can no longer distinguish between the real and generated signals. At this point, stop the iteration and output the improved WGAN diagnosis model.

Step 6: Test the WGAN model using the test set. Check if the loss functions of the generator and discriminator gradually fit and if the training accuracy meets the requirements. If so, output the fault types and accuracy; otherwise, continue training.
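The data preparation in Steps 2 and 3 can be sketched as follows. This is an assumed implementation, not the authors' code: the function `split_and_mask`, the random shuffling, and the choice to mark the first fraction of the shuffled training set as labeled are all illustrative.

```python
import numpy as np

def split_and_mask(labels, n_classes, label_rate, train_frac=0.8, seed=0):
    """Split sample indices 8:2 and mark a fraction of the training set as labeled."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    order = rng.permutation(len(labels))
    n_train = int(round(train_frac * len(labels)))
    train_idx, test_idx = order[:n_train], order[n_train:]

    one_hot = np.eye(n_classes)[labels]           # one-hot encoding of every sample

    # train_idx is already shuffled, so flagging its first entries is a random choice.
    n_labeled = int(round(label_rate * n_train))
    is_labeled = np.zeros(n_train, dtype=bool)
    is_labeled[:n_labeled] = True
    return train_idx, test_idx, one_hot, is_labeled

labels = np.repeat(np.arange(10), 100)            # 1000 samples, 10 classes
train_idx, test_idx, one_hot, is_labeled = split_and_mask(labels, 10, label_rate=0.2)
print(len(train_idx), len(test_idx), int(is_labeled.sum()))
```

Setting `label_rate` to 0 leaves every training sample unlabeled and setting it to 1 labels all of them, which corresponds to the unsupervised and supervised extremes listed in the step.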

4. Experimental Datasets and Validation

4.1. Different Experimental Data Verification

To validate the effectiveness of the improved WGAN model, the Case Western Reserve University (CWRU) dataset [33] and an experimental dataset were employed.

From the CWRU dataset, ten classes, comprising normal data and faults with damage diameters of 0.007, 0.014, and 0.021 in. on the inner race, outer race, and rolling element, were selected from the drive end at 12 kHz. One-hot encoding was employed to label the ten classes from 0 to 9, as illustrated in Table 1 (where B represents rolling element faults, IR signifies inner race faults, and OR@6 denotes outer race faults in the 6 o’clock direction).

From the experimental dataset, seven classes, comprising normal data and damage to the inner ring, outer ring, and rolling element with diameters of 0.15 mm and 0.3 mm, were selected from the drive end at 12.8 kHz. One-hot encoding was used to label the seven classes from 0 to 6, as shown in Table 2.

4.2. Preparation of Different Experimental Data

A.: Dataset of bearing faults from Case Western Reserve University

The vibration signal data used in this study were sourced from the Case Western Reserve University (CWRU) Bearing Data Center. The test platform, depicted in Figure 4, comprised an electric motor, a torque transducer with an encoder, a dynamometer, and an electronic controller. The motor shaft was supported by two bearings: one at the fan end and another at the drive end. Single-point faults were introduced to the test bearings using electrical discharge machining (EDM) with fault diameters of 0.007 in., 0.014 in., and 0.021 in. The fault types included inner race faults (IFs), outer race faults (OFs), and ball bearing faults (BFs). The motor’s load power ranged from 0 to 17 kW, with operational speeds of 1797 rpm, 1772 rpm, and 1750 rpm. Accelerometers mounted on the motor housing with magnetic bases captured the vibration data. The signals were sampled at 12 kHz (12,000 samples per second) at the drive end and 48 kHz (48,000 samples per second) at the fan end. Speed and horsepower data were also recorded using the torque transducer/encoder.

For each fault class in the CWRU data, samples of 1200 consecutive points were extracted. MATLAB was used to plot each sample as a waveform, which was then transformed by the continuous wavelet transform (CWT) into an RGB-format time–frequency map of size 128. In total, 12,000 time–frequency maps were obtained for the ten classes. Figure 5 and Figure 6 show the vibration signal waveform and the time–frequency analysis diagram, respectively.
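The sampling step can be sketched as follows. The text specifies 1200 consecutive points per sample but not whether neighboring samples overlap, so the non-overlapping windows used here are an assumption; the function name `segment_signal` and the synthetic sine signal are hypothetical stand-ins, and the CWT step (performed in MATLAB in this study) is not reproduced.

```python
import math
from typing import List, Sequence


def segment_signal(signal: Sequence[float], window: int = 1200) -> List[List[float]]:
    """Split a 1-D signal into consecutive, non-overlapping windows.

    Trailing points that do not fill a complete window are dropped.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    n_windows = len(signal) // window
    return [list(signal[i * window:(i + 1) * window]) for i in range(n_windows)]


# Synthetic stand-in for a recording: 1 s of a 50 Hz sine sampled at 12 kHz.
fs = 12_000
signal = [math.sin(2 * math.pi * 50 * n / fs) for n in range(fs)]

samples = segment_signal(signal)
print(len(samples), len(samples[0]))  # 10 1200
```

With 12,000 maps spread over ten classes, each class contributes 1200 samples on average, so a real recording would need to be considerably longer than this one-second stand-in.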

B.: Bearing data collected in real experiments

Bearing selection: Because collecting full-lifecycle data for the inner ring, outer ring, and ball body takes a long time, artificial damage was used in this study to introduce local faults into the inner ring, outer ring, and ball body. The type and position of each fault were chosen to mimic naturally occurring faults, and the cuts were made by the manufacturer with precision instruments, so that neither the operation of the centrifugal pump nor the data acquisition was disturbed. Seven types of data were obtained by running seven different bearings. The customized bearing model is shown in Figure 7a, and the single damage model of the outer ring, inner ring, and ball body is shown in Figure 7b. Table 3 shows the damage parameters of the bearing samples.

In addition to the bearings and sensors, the experimental platform included a water reservoir and pipes to ensure a water flow source, lubricating oil to reduce the heat generated by the high-speed operation of the bearings, a Donghua data acquisition instrument for collecting the data, and accompanying PC software for analyzing the experimental data. The setup of the experimental platform is shown in Figure 8.

The test data were collected using the DH5902 system. The specific connection steps were as follows:

(1): Connect the Donghua data acquisition instrument and PC terminal via a network cable port;
(2): Connect the Donghua data acquisition instrument and bearing surface through a CRZ-401 type sensor, with the sensor connected to ports 1-09 and 1-10 of the instrument;
(3): Open the Donghua test software on the PC and set the data collection path for saving the required data;
(4): Set the test parameters. The relevant channel parameters are as follows: the measurement type is voltage measurement, the measurement quantity is acceleration, the measurement range is 10,000 m/s², the input method is ICP, the sensor zero is 10,000 mV/(m/s²), and the sampling frequency is 12.8 kHz;
(5): Turn on and connect each instrument, and click “Measurement” in the DHDAS dynamic signal acquisition and analysis system to obtain the test data.

For the test dataset, the pulse wavelet basis function was used in the CWT, and the seven vibration signals obtained in this study were denoised. During the test, 12,800 data points were acquired per second. To display more of the test data, a 0.68 s segment was selected, and 6963 consecutive points were taken as one sample. The processed time-domain waveform of the original vibration signal is shown in Figure 9, where the horizontal axis is time and the vertical axis is amplitude.

4.3. Hyperparameter Selection

The hyperparameters of the improved WGAN model were obtained on a 64-bit Windows 10 machine with an Intel i7-9700K CPU, using Python 3.7 (in PyCharm) and TensorFlow 2.8 as the deep learning framework. The diagnostic model was considered optimal once the loss functions of the generator and discriminator had converged. The number of training epochs was 350, the learning rate (Lr) was 0.0001, and the batch size was 32. The generator and discriminator used the SGD and Adam optimizers, respectively, with the momenta set to 0.9 and 0.999 and the weight decay set to 0.0002. The network model’s hyperparameters after BOA implementation are presented in Table 4 and Table 5, with each Conv2D in the R-FCN containing five convolutional layers.
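For reference, the training settings listed above can be gathered into a single configuration object. This is only a sketch: the class `TrainConfig` and its field names are hypothetical, and the way the two momentum values are assigned to the SGD and Adam optimizers is an assumption, since the text does not state the mapping explicitly.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TrainConfig:
    """Training settings reported in the text (field names are illustrative)."""
    epochs: int = 350
    learning_rate: float = 1e-4
    batch_size: int = 32
    generator_optimizer: str = "SGD"
    discriminator_optimizer: str = "Adam"
    sgd_momentum: float = 0.9                     # assumed mapping of "0.9"
    adam_betas: Tuple[float, float] = (0.9, 0.999)  # assumed mapping of "0.9 and 0.999"
    weight_decay: float = 2e-4

    def steps_per_epoch(self, n_samples: int) -> int:
        """Number of mini-batches needed to pass over n_samples once."""
        return math.ceil(n_samples / self.batch_size)


cfg = TrainConfig()
# Example only: if all 12,000 CWRU time-frequency maps were used for training.
print(cfg.steps_per_epoch(12_000))  # 375
```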

4.4. Analysis of WGAN Results with Different Datasets

With the established parameters, the processed datasets were fed into both the traditional discriminator model and the model where R-FCN replaced the discriminator for training. To guarantee the validity of the training accuracy, multiple training rounds were conducted and the accuracy was averaged. Figure 10 and Figure 11 show the overall model training accuracy curves after R-FCN replaced the discriminator in both the CWRU and experimental datasets.

As depicted in Figure 10 and Figure 11, for both the CWRU and experimental datasets, the training accuracy of the traditional discriminator stabilizes after 320 iterations without significant fluctuations, remaining at approximately 88.2%. Conversely, the R-FCN-based discriminator reaches a stable training accuracy of roughly 98.9% after 330 iterations. The superior accuracy of the R-FCN can be attributed not only to its deeper shared convolutional layers, which enable more abstract feature representations, but also to the elimination of the separate sub-network structure for RoI processing, thereby achieving a true fully convolutional network architecture.

However, the original ResNet-101 backbone of the R-FCN contains numerous convolutional layers. To speed up the discriminator’s detection of images without affecting the detection accuracy, we reduced the number of bins in the RoI region from 7 × 7 to 3 × 3. Subsequently, the first 25, 35, 55, and 85 layers, as well as all the convolutional layers of ResNet-101, were individually selected for experimental comparison under the R-FCN model. The learning rate and batch size of the experimental settings were consistent with those used in this study, and the number of iterations was 20. The experimental results were evaluated using the common ${AP}_{50}$, ${AP}_{75}$, and AP evaluation metrics employed in the R-FCN. The evaluation results for the CWRU and experimental datasets are shown in Table 6 and Table 7.
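As a brief illustration of these metrics: in common object-detection practice, the subscripts 50 and 75 denote intersection-over-union (IoU) thresholds of 0.50 and 0.75 for counting a detection as correct, and plain AP is averaged over a range of thresholds. The text does not spell out its exact evaluation protocol, so this reading is an assumption. The sketch below shows only the IoU test itself; the boxes and the helper `iou` are hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


truth = (0.0, 0.0, 10.0, 10.0)       # hypothetical ground-truth region
predicted = (2.0, 0.0, 12.0, 10.0)   # hypothetical detection, shifted by 2

score = iou(predicted, truth)        # intersection 80, union 120
for threshold in (0.50, 0.75):
    print(threshold, score >= threshold)
```

Here the detection would count as correct under the 0.50 threshold but not under the stricter 0.75 threshold, which is why ${AP}_{75}$ values are typically lower than ${AP}_{50}$ values.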

As the experimental results in Table 6 show, when the number of ResNet-101 layers increases from 25 to 35 and up to 101, the detection accuracy continues to improve beyond the 25th layer, but ${AP}_{50}$, ${AP}_{75}$, and AP increase by only 0.33%, 0.05%, and 0.1%, respectively. The detection accuracy therefore remains essentially unchanged, and the structural properties of the WGAN remain unaltered. Consequently, the first 25 convolutional layers were selected for feature information extraction on both the CWRU and experimental datasets, and the mean training accuracy of the discriminator was obtained, as shown in Figure 12 and Figure 13. As can be seen from these figures, the first 25 convolutional layers not only yield better training accuracy than the full set of convolutional layers but also converge faster during training.
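The quoted gains can be checked directly against the first and last rows of Table 6 and Table 7. The short sketch below copies those table values; the variable names are illustrative only.

```python
# (AP50, AP75, AP) in percent for 25 layers and for the full 101 layers.
table6_cwru = {25: (97.20, 88.30, 68.30), 101: (97.53, 88.35, 68.40)}
table7_pump = {25: (96.42, 86.55, 67.25), 101: (96.63, 86.76, 67.36)}


def gains(table):
    """Difference (in percentage points) between the 101-layer and 25-layer rows."""
    return tuple(round(full - shallow, 2)
                 for shallow, full in zip(table[25], table[101]))


cwru_gains = gains(table6_cwru)
pump_gains = gains(table7_pump)
print(cwru_gains)  # (0.33, 0.05, 0.1)
print(pump_gains)  # (0.21, 0.21, 0.11)
```

In both datasets, quadrupling the backbone depth changes each metric by well under half a percentage point, which supports truncating the backbone at 25 layers.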

4.5. Impact of Semi-Supervised Learning

To validate the feasibility of semi-supervised learning, the data were fed into the enhanced WGAN model for training, with the training parameters consistent with those in Section 3. Based on the calculation formula of the semi-supervised learning loss function, the generator (G-Loss) and discriminator (D-Loss) loss function curves were obtained, and the results for the different datasets are depicted in Figure 14 and Figure 15.

Figure 14 reveals that, before the semi-supervised learning loss function is improved, the loss stabilizes only after 325 iterations and still exhibits notable gradient vanishing and mode collapse. After the loss function is enhanced, the training curve stabilizes after 100 iterations with a smaller fluctuation amplitude, with the generator loss settling at 0.0318 and the discriminator loss at 0.0305. The loss curves for the experimental dataset in Figure 15 behave in the same way, with both the fitting speed and the degree of fit better than before the improvement. This indicates that the enhanced loss function yields a more stable model and more accurate training outcomes.

To further elucidate the role of semi-supervised learning in fault diagnosis, four label rates were applied to the one-hot-labeled real sample data of the two datasets: $\varepsilon_{1} = 0.2$, $\varepsilon_{2} = 0.4$, $\varepsilon_{3} = 0.5$, and $\varepsilon_{4} = 1$ ($\varepsilon_{4} = 1$ representing supervised learning with labels). Samples at these four label rates were fed into both the WGAN and enhanced WGAN models for training. The mean training results are presented in Table 8, Table 9, Table 10 and Table 11.
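One simple way to realize a given label rate is to keep the labels of a randomly chosen fraction of the samples and hide the rest. The sketch below illustrates this idea under stated assumptions: uniform random selection, the helper name `mask_labels`, the fixed seed, and the 1000-sample toy label list are all hypothetical, and the selection procedure actually used in the study is not described in the text.

```python
import random
from typing import List, Optional, Sequence


def mask_labels(labels: Sequence[int], label_rate: float,
                seed: int = 0) -> List[Optional[int]]:
    """Keep a fraction `label_rate` of the labels; replace the others with None."""
    if not 0.0 <= label_rate <= 1.0:
        raise ValueError("label_rate must lie in [0, 1]")
    rng = random.Random(seed)
    n_keep = round(label_rate * len(labels))
    keep = set(rng.sample(range(len(labels)), n_keep))
    return [lab if i in keep else None for i, lab in enumerate(labels)]


# Toy label list: 1000 samples spread evenly over ten classes.
labels = [i % 10 for i in range(1000)]

labeled_counts = {}
for rate in (0.2, 0.4, 0.5, 1.0):
    masked = mask_labels(labels, rate)
    labeled_counts[rate] = sum(m is not None for m in masked)

print(labeled_counts)  # {0.2: 200, 0.4: 400, 0.5: 500, 1.0: 1000}
```

A label rate of 1 leaves every sample labeled (the supervised case), while lower rates leave the remaining samples to contribute only through the unsupervised part of the loss.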

As evidenced in Table 8, Table 9, Table 10 and Table 11, when the label rate is 0.5, the mean accuracy and the mean generator and discriminator losses of both the WGAN and the enhanced WGAN models are close to those obtained with a label rate of 1. Compared to unsupervised learning with a label rate of 0, the mean accuracy is higher and the losses are lower. Finally, the discriminator, generator, and corresponding average training accuracy of the enhanced WGAN model under the two datasets are shown in Figure 16 and Figure 17.
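The closeness of the half-labeled and fully labeled results can be quantified from the accuracy columns of Table 8, Table 9, Table 10 and Table 11. The sketch below copies the rows for label rates 0.5 and 1; the dictionary keys are illustrative names only.

```python
# Accuracy (%) at label rate 0.5 and at label rate 1, copied from Tables 8-11.
accuracy = {
    "WGAN, CWRU (Table 8)": (97.70, 97.90),
    "Improved WGAN, CWRU (Table 9)": (98.7, 98.9),
    "WGAN, experimental (Table 10)": (95.93, 96.14),
    "Improved WGAN, experimental (Table 11)": (97.12, 97.31),
}

# Gap in percentage points between full labeling and 50% labeling.
gaps = {name: round(full - half, 2) for name, (half, full) in accuracy.items()}

for name, gap in gaps.items():
    print(f"{name}: {gap}")
```

In all four cases the gap is roughly 0.2 percentage points, so labeling only half of the samples costs very little accuracy.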

4.6. Analysis of Bayesian Optimization Results

To validate the optimization effect of the BOA on the enhanced WGAN model, it was compared with grid search (GS), random search (RS), and the genetic algorithm (GA) for hyperparameter selection. The experimental data and network model were kept the same, and the training results for the two datasets are displayed in Table 12 and Table 13.
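To make the two simplest baselines concrete, the sketch below contrasts grid search and random search on a toy problem. Everything in it is an assumption made for illustration: `toy_score` is a hypothetical stand-in for validation accuracy and is not derived from the experiments, and the candidate learning rates and batch sizes are arbitrary. The GA and the BOA are not reproduced here.

```python
import itertools
import math
import random

LEARNING_RATES = [1e-2, 1e-3, 1e-4, 1e-5]
BATCH_SIZES = [16, 32, 64, 128]


def toy_score(lr: float, batch_size: int) -> float:
    """Hypothetical objective that peaks at lr = 1e-4 and batch_size = 32."""
    return 1.0 - 0.1 * abs(math.log10(lr) + 4.0) - abs(batch_size - 32) / 320.0


def grid_search():
    """Evaluate every combination on a fixed grid (16 evaluations here)."""
    return max(itertools.product(LEARNING_RATES, BATCH_SIZES),
               key=lambda p: toy_score(*p))


def random_search(n_trials: int, seed: int = 0):
    """Evaluate n_trials random points: log-uniform lr, batch size from the list."""
    rng = random.Random(seed)
    trials = [(10.0 ** rng.uniform(-5.0, -2.0), rng.choice(BATCH_SIZES))
              for _ in range(n_trials)]
    return max(trials, key=lambda p: toy_score(*p))


best_grid = grid_search()
best_random = random_search(n_trials=8)
print("grid  :", best_grid, round(toy_score(*best_grid), 3))
print("random:", best_random, round(toy_score(*best_random), 3))
```

Grid search is exhaustive over its grid and therefore slow as the grid grows, whereas random search spends fewer evaluations but may miss the best region, which is consistent with the time and accuracy ordering of GS and RS in Table 12 and Table 13.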

To demonstrate the advantages of the proposed enhanced WGAN model, it was compared with prevalent models such as LSTM, CNN, GAN, and WGAN. LSTM represents the ML model. The CNN, the WGAN, and the proposed model use identical two-dimensional convolutional layers, while the traditional GAN is built from fully connected layers. In contrast to the CNN and GAN, the WGAN and its enhanced structure incorporate BN layers and LeakyReLU activation functions, and their network layers and hyperparameters differ from those of the GAN; however, the generator parameter settings of the GAN coincide with those of the WGAN. Moreover, the BOA was employed to optimize the enhanced WGAN model. The experimental comparison of these network models on the CWRU dataset and the experimental dataset is illustrated in Figure 18 and Figure 19.

5. Conclusions

In this study, we explored the impact of the continuous wavelet transform, the Wasserstein generative adversarial network (WGAN), semi-supervised learning, and hyperparameter optimization on diagnostic models, and we proposed an enhanced WGAN-based method for bearing fault diagnosis. By integrating the R-FCN, 2D convolutional layers, normalization, shared convolutional layers, and a region-independent processing sub-network structure, the new method helps extract features from deep layers. Compared with the traditional WGAN, our method improved diagnostic accuracy by about 6.6%.

In terms of optimization methods, we compared methods such as the genetic algorithm (GA), grid search (GS), and random search (RS), and used the Bayesian optimization algorithm (BOA) to identify a better combination of hyperparameters. Through the implementation of the BOA, we obtained a more refined network model, which significantly improved the experimental results on different datasets. Specifically, we achieved a 98.9% diagnostic accuracy on the CWRU dataset and a 97.4% accuracy on the real experimental dataset.

To sum up, our research achieved satisfactory results in the field of bearing fault diagnosis: improvements were made to both the model structure and the optimization method, and significant performance improvements were demonstrated on different datasets.

Author Contributions

Conceptualization, C.Z. and W.L.; data curation, H.Z. (Hongji Zhang); funding acquisition, H.Z. (Hui Zhang); investigation, Y.C.; resources, H.Z. (Hui Zhang); software, H.Z. (Hui Zhang); supervision, C.Z.; validation, Y.C. and Q.F.; visualization, W.L.; writing – original draft, W.L.; writing – review and editing, C.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the University–Industry Cooperation Research Project in Jiangsu under Grant BY20221435.

Data Availability Statement

Data is contained within the article.

Acknowledgments

We would like to thank Jian Zhang for providing the vibration signal collection experimental equipment.

Conflicts of Interest

The authors declare no conflicts of interest.


Figure 1. WGAN structure.

Figure 2. The improved WGAN diagnosis model.

Figure 3. Bayesian optimization improved WGAN diagnosis process.

Figure 4. The experimental data acquisition platform at Case Western Reserve University.

Figure 5. The waveform of the original vibration signal.

Figure 6. The time–frequency analysis diagram.

Figure 7. The experimental bearing samples. (a) The customized bearing model. (b) The single damage model.

Figure 8. The bearing failure test platform involving a centrifugal pump.

Figure 9. The time-domain waveform diagram for the real experimental bearing vibration signal.

Figure 10. The training curve of the CWRU dataset discriminator model.

Figure 11. The training curve of the real experimental dataset discriminator model.

Figure 12. The training accuracy curve before and after the discriminator for the CWRU dataset.

Figure 13. The training accuracy curve before and after the discriminator for the real experimental dataset.

Figure 14. The loss function curve for the CWRU dataset generator and discriminator.

Figure 15. The loss function curve for the real experimental dataset generator and discriminator.

Figure 16. The average training curve for the CWRU dataset discriminator, generator, and improved WGAN model.

Figure 17. The average training curve for the real experimental dataset discriminator, generator, and improved WGAN model.

Figure 18. Comparison results of different test methods for the CWRU dataset.

Figure 19. Comparison results of different test methods for the real experimental datasets.

Table 1. The CWRU dataset for one-hot coding.

Fault Type	Original Label	One-Hot Coding
Normal	0	${[1 0 0 0 0 0 0 0 0 0]}^{T}$
B007	1	${[0 1 0 0 0 0 0 0 0 0]}^{T}$
IR007	2	${[0 0 1 0 0 0 0 0 0 0]}^{T}$
OR007@6	3	${[0 0 0 1 0 0 0 0 0 0]}^{T}$
B014	4	${[0 0 0 0 1 0 0 0 0 0]}^{T}$
IR014	5	${[0 0 0 0 0 1 0 0 0 0]}^{T}$
OR014@6	6	${[0 0 0 0 0 0 1 0 0 0]}^{T}$
B021	7	${[0 0 0 0 0 0 0 1 0 0]}^{T}$
IR021	8	${[0 0 0 0 0 0 0 0 1 0]}^{T}$
OR021@6	9	${[0 0 0 0 0 0 0 0 0 1]}^{T}$

Table 2. Real experimental dataset for one-hot coding.

Fault Type	Original Label	One-Hot Coding
Normal	0	${[1 0 0 0 0 0 0]}^{T}$
B0.15mm	1	${[0 1 0 0 0 0 0]}^{T}$
IR0.15mm	2	${[0 0 1 0 0 0 0]}^{T}$
OR0.15mm	3	${[0 0 0 1 0 0 0]}^{T}$
B0.3mm	4	${[0 0 0 0 1 0 0]}^{T}$
IR0.3mm	5	${[0 0 0 0 0 1 0]}^{T}$
OR0.3mm	6	${[0 0 0 0 0 0 1]}^{T}$

Table 3. Damage status of bearing samples.

Fault Type	Outer Ring	Inner Ring	Ball Body
Fault depth 1	0.15 mm	0.15 mm	0.15 mm
Fault depth 2	0.3 mm	0.3 mm	0.3 mm
Fault width	0.3 mm	0.3 mm	0.3 mm

Table 4. The parameters of the generator model.

Layer	Dimension	Number of Kernels	Kernel Size	Stride	Activation
Input	128	/	/	/	/
Dense	256	/	/	/	/
Conv2D-1	256	32	5 × 5	1	LeakyReLU
Conv2D-2	512	64	5 × 5	2	LeakyReLU
Conv2D-3	1024	128	5 × 5	2	LeakyReLU
Conv2D-4	2048	128	5 × 5	2	LeakyReLU
Conv2D-5	2048	128	5 × 5	2	/
Output	2048	/	/	/	/

Table 5. The R-FCN parameters.

Layer	Dimension	Number of Kernels	Kernel Size	Stride	Activation
Input	2048	/	/	/	/
Conv2D-1	1024	128	3 × 3	2	LeakyReLU
Conv2D-2	512	64	3 × 3	2	LeakyReLU
Conv2D-3	256	64	3 × 3	2	LeakyReLU
Conv2D-4	128	32	3 × 3	1	LeakyReLU
Conv2D-5	64	16	1 × 1	1	LeakyReLU
MaxPooling	32	/	/	/	/
Dense	10	/	/	/	Softmax

Table 6. The AP evaluation results for the CWRU dataset.

Layers	${A P}_{50}$ /%	${A P}_{75}$ /%	$A P$ /%
25	97.20	88.30	68.30
35	97.40	88.50	68.43
55	97.50	88.38	68.42
85	97.53	88.35	68.40
101	97.53	88.35	68.40

Table 7. The AP evaluation results for the real experimental dataset.

Layers	${A P}_{50}$ /%	${A P}_{75}$ /%	$A P$ /%
25	96.42	86.55	67.25
35	96.52	86.62	67.29
55	96.58	86.75	67.35
85	96.61	86.76	67.36
101	96.63	86.76	67.36

Table 8. Training results of the WGAN for the CWRU dataset.

Label Rate	Accuracy Rate/%	D-Loss	G-Loss
0	96.20	0.0421	0.0434
0.2	96.90	0.0406	0.0411
0.4	97.20	0.0394	0.0396
0.5	97.70	0.0376	0.0381
1	97.90	0.0364	0.0375

Table 9. Training results of the improved WGAN for the CWRU dataset.

Label Rate	Accuracy Rate/%	D-Loss	G-Loss
0	97.8	0.0337	0.0346
0.2	98.1	0.0337	0.0346
0.4	98.6	0.0332	0.0341
0.5	98.7	0.0314	0.0325
1	98.9	0.0305	0.0318

Table 10. Training results of the WGAN for the real experimental dataset.

Label Rate	Accuracy Rate/%	D-Loss	G-Loss
0	94.23	0.0486	0.0475
0.2	94.91	0.0452	0.0461
0.4	95.75	0.0422	0.0415
0.5	95.93	0.0401	0.0398
1	96.14	0.0394	0.0382

Table 11. Training results of the improved WGAN for the real experimental dataset.

Label Rate	Accuracy Rate/%	D-Loss	G-Loss
0	95.25	0.0439	0.0573
0.2	95.64	0.0415	0.0529
0.4	96.17	0.0355	0.0452
0.5	97.12	0.0327	0.0429
1	97.31	0.0322	0.0425

Table 12. Training results of the CWRU dataset hyperparameter optimization method.

Method	Accuracy Rate/%	Time/min
GS	97.2	88.4
RS	92.8	57.5
GA	98.1	73.5
BOA	98.9	74.2

Table 13. Training results of the real experimental dataset hyperparameter optimization method.

Method	Accuracy Rate/%	Time/min
GS	95.36	84.2
RS	89.63	58.1
GA	96.90	70.5
BOA	97.35	71.7

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
