1. Introduction
Rolling bearings are integral components of high-speed and high-power systems and are prone to failure due to fatigue, wear, excessive loads, and other factors. Among the various bearing types, ceramic bearings stand out for their performance in high-speed environments, owing to their low density, high hardness, wear resistance, and low thermal expansion. These characteristics not only reduce frictional energy loss but also extend the service life of machinery components, making ceramic bearings an attractive choice for many advanced engineering applications. Given the susceptibility of rolling bearings to failure, accurate fault diagnosis is essential to prevent unplanned downtime and minimize economic losses [1]. Shi et al. [2] introduced a discrete-time model of ceramic bearings that accounts for incipient faults and proposed a modified observer with greater design freedom than traditional Luenberger observers. Gao et al. [3] developed an ultra-high-speed triboelectric bearing with hybrid ceramic rolling elements that enables real-time monitoring of dynamic behavior and stability. In industrial settings, faults typically manifest as weak periodic shocks in bearing signals; these signals are non-stationary and nonlinear, and the shocks can be obscured by background noise [4]. Signal processing techniques are therefore crucial for denoising vibration signals. Empirical mode decomposition (EMD) [5], ensemble empirical mode decomposition (EEMD) [6,7], wavelet threshold denoising (WTD) [8], and local mean decomposition (LMD) [9] have all been applied to remove noise from vibration signals with demonstrated efficacy. However, EMD and LMD suffer from mode mixing and endpoint effects, and although EEMD alleviates mode mixing by adding white noise at each decomposition, it does not entirely resolve either problem. The denoising performance of WTD depends strongly on the choice of threshold. Variational mode decomposition (VMD) [10] is a non-recursive, adaptive technique that decomposes a signal into modes with finite bandwidths concentrated around specific center frequencies, overcoming the limitations of EMD and related methods. Jiang et al. [11] improved the effectiveness of VMD for bearing fault diagnosis by adaptively determining the signal modes and optimizing the initial center frequencies. Wang et al. [12] studied the equivalent filtering characteristics of VMD and applied their findings to diagnosing rotor and stator faults. Li et al. [13] combined VMD with a kernel extreme learning machine to improve rolling bearing fault diagnosis. However, the effectiveness of VMD depends heavily on its parameters, such as the number of decomposition layers K and the penalty factor α [14,15], and a considerable body of research has accordingly addressed parameter-optimized VMD [16,17,18]. Incorrect settings can result in over- or under-decomposition, which highlights the importance of selecting a suitable objective function and optimizing the parameters to obtain reliable results. The key objective functions in this domain include envelope entropy [19], ensemble kurtosis [20], and envelope spectrum kurtosis [21]; despite their effectiveness in guiding the decomposition, they are sensitive to noise. Correlated kurtosis (CK) [22,23], which accounts for the periodicity of bearing fault signals and effectively suppresses non-periodic components, offers a potential solution to this problem. In addition, the correlation coefficient quantifies the similarity between signals [24]. Combining the strengths of correlated kurtosis and the correlation coefficient, weighted correlated kurtosis (WCK) has been devised as a comprehensive objective function. Nevertheless, the computational cost of VMD and its sensitivity to the number of modes mean that careful parameter selection and optimization remain essential for accurate and reliable fault diagnosis.
To address these challenges, Nazari et al. [25] proposed variational mode extraction (VME), a method derived from VMD. Given an accurately determined center frequency ω and penalty factor α, VME efficiently isolates the desired mode component and significantly reduces the computation time. This method has been effectively applied to the fault diagnosis of rolling bearings. Ye et al. [26] combined VME with an improved one-dimensional convolutional neural network for the intelligent diagnosis of rolling bearings; however, the VME parameters were selected by empirical judgment, which raises concerns about reliability. Yan et al. [27] employed the whale optimization algorithm to tune the VME parameters and combined the improved method with the k-nearest neighbor (KNN) algorithm. Liu et al. [28] proposed a window fusion strategy that adaptively determines the center frequency ω and penalty factor α. Despite this innovation, their method still requires manual intervention for fault identification, leaving a gap in the development of fully automated, intelligent diagnostic systems.
In recent years, intelligent fault diagnosis has emerged as a novel and increasingly popular approach. Common methods in this domain include the back-propagation neural network (BPNN), support vector machine (SVM), and random forest (RF). These techniques can determine the health status of bearings when combined with effective feature extraction and selection. However, manual feature extraction is time-consuming, and identifying features that are highly sensitive to faults in vibration signals while filtering out noise remains a significant challenge. Deep learning, a subset of machine learning, has attracted growing research interest, driven by advances in computational capability and sensor technology [29]. Saucedo-Dorantes et al. [30] used stacked autoencoders for feature extraction and fusion to achieve enhanced condition assessment and demonstrated their effectiveness for fault diagnosis across various bearing technologies. The convolutional neural network (CNN) [31,32], best known for its applications in image processing, relies on local receptive fields, shared weights, and spatial subsampling. These properties substantially reduce the computational demands of the network, lower the risk of overfitting, and enable the automatic extraction of key signal features [33]. CNNs are typically applied to pattern recognition in one of two ways: using the raw vibration signal directly as the input, or preprocessing the vibration signal into a two-dimensional image that serves as the model input. The latter often involves transforming the vibration signal into a grayscale image [34,35] or into a time-frequency image via the continuous wavelet transform or short-time Fourier transform [36,37,38,39]. Despite the success of two-dimensional CNNs in intelligent fault diagnosis [40,41], vibration signals are inherently one-dimensional sequences. Converting them into two-dimensional images requires extra preprocessing, which may exaggerate the impact of periodic shock signals and reduce diagnostic performance under noisy conditions. Consequently, some researchers have used one-dimensional vibration signals directly as the input to a 1D-CNN. For example, Wang et al. [42] fed vibration and acoustic signals into a 1D-CNN diagnostic model; Habbouche et al. [43] used VMD-preprocessed vibration signals with a 1D-CNN for diagnosis; and Shao et al. [44] applied a 1D-CNN for fault feature extraction and trained an SVM for rolling bearing fault diagnosis. However, conventional 1D-CNNs still face high computational demands and limited noise immunity. To overcome these issues, this study proposes a one-dimensional multi-scale residual convolutional neural network (1D-MRCNN) designed to lower the computational cost and enhance noise robustness.
In summary, we propose an intelligent rolling bearing diagnosis scheme that combines VME, with parameters optimized by the Archimedes optimization algorithm (AOA), and a 1D-MRCNN. First, WCK serves as the objective function for optimizing the penalty factor α and center frequency ω of VME, with the aim of extracting the desired mode component and suppressing noise in the vibration signals. The processed vibration signals are then fed into the 1D-MRCNN for fault diagnosis. Fusing VME with the 1D-MRCNN exploits the advantages of both techniques and thereby improves recognition accuracy.
The remainder of this paper is organized as follows:
Section 2 presents related works (pertaining to AOA, VME, and 1D-MRCNN);
Section 3 uses simulated signals to validate the feasibility of parameter-optimized VME with WCK as the objective function;
Section 4 analyzes and compares experimental signals to demonstrate the practicality and superiority of the proposed method; and
Section 5 presents our conclusions.
2. Related Works
2.1. Variational Mode Extraction
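The VME algorithm is derived from VMD and shares a similar mathematical principle. Unlike VMD, however, VME extracts only a single specific component, which makes it more efficient. In VME, the input signal $f(t)$ is decomposed into two parts:
$$f(t) = u_d(t) + f_r(t) \qquad (1)$$
In Equation (1), $u_d(t)$ represents the desired mode component, and $f_r(t)$ represents the residual signal.
Moreover, after the Hilbert transform, $u_d(t)$ should be compact around its center frequency $\omega_d$, and its spectrum should overlap as little as possible with that of $f_r(t)$. The compactness requirement is expressed by minimizing the following criterion:
$$J_1 = \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_d(t) \right] e^{-j\omega_d t} \right\|_2^2 \qquad (2)$$
In Equation (2), $\delta(t)$ represents the Dirac distribution, $\omega_d$ denotes the center frequency of the mode component $u_d(t)$, and $*$ denotes the convolution operation.
To minimize the spectral overlap between $u_d(t)$ and $f_r(t)$, a penalty function is introduced:
$$J_2 = \left\| \beta(t) * f_r(t) \right\|_2^2, \qquad \hat{\beta}(\omega) = \frac{1}{\alpha (\omega - \omega_d)^2} \qquad (3)$$
In Equation (3), $\beta(t)$ represents the impulse response of the filter whose frequency response is $\hat{\beta}(\omega)$.
Therefore, combining Equations (2) and (3), the mode extraction problem can be formulated as a constrained minimization problem:
$$\min_{\omega_d,\, u_d,\, f_r} \left\{ \alpha J_1 + J_2 \right\} \quad \text{s.t.} \quad u_d(t) + f_r(t) = f(t) \qquad (4)$$
In Equation (4), $\alpha$ denotes the parameter that balances $J_1$ and $J_2$.
This constrained optimization problem is transformed into an unconstrained one by incorporating a quadratic penalty term and the Lagrange multiplier $\lambda(t)$, which yields the augmented Lagrangian:
$$\mathcal{L}(\omega_d, u_d, f_r, \lambda) = \alpha \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_d(t) \right] e^{-j\omega_d t} \right\|_2^2 + \left\| \beta(t) * f_r(t) \right\|_2^2 + \left\| f(t) - \left( u_d(t) + f_r(t) \right) \right\|_2^2 + \left\langle \lambda(t),\, f(t) - \left( u_d(t) + f_r(t) \right) \right\rangle \qquad (5)$$
The alternating direction method of multipliers (ADMM) is used to find the saddle point of the augmented Lagrangian.
The steps are as follows:
(1) Initialize $\hat{u}_d^1$ and $\hat{\lambda}^1$, set n = 1, and estimate the initial value of the center frequency $\omega_d^1$.
(2) Update $\hat{u}_d$ according to Equation (6):
$$\hat{u}_d^{n+1}(\omega) = \frac{\hat{f}(\omega) + \alpha^2 (\omega - \omega_d^n)^4 \, \hat{u}_d^n(\omega) + \hat{\lambda}^n(\omega)/2}{\left[ 1 + \alpha^2 (\omega - \omega_d^n)^4 \right] \left[ 1 + 2\alpha (\omega - \omega_d^n)^2 \right]} \qquad (6)$$
(3) Update $\omega_d$ according to Equation (7):
$$\omega_d^{n+1} = \frac{\int_0^{\infty} \omega \left| \hat{u}_d^{n+1}(\omega) \right|^2 d\omega}{\int_0^{\infty} \left| \hat{u}_d^{n+1}(\omega) \right|^2 d\omega} \qquad (7)$$
(4) Update the Lagrange multiplier for all ω > 0 according to Equation (8), where τ is the dual-ascent step size and $\hat{f}_r^{n+1}(\omega)$ is the current estimate of the residual spectrum:
$$\hat{\lambda}^{n+1}(\omega) = \hat{\lambda}^n(\omega) + \tau \left[ \hat{f}(\omega) - \left( \hat{u}_d^{n+1}(\omega) + \hat{f}_r^{n+1}(\omega) \right) \right] \qquad (8)$$
(5) Repeat steps (2) to (4) until the stopping condition is satisfied:
$$\frac{\left\| \hat{u}_d^{n+1} - \hat{u}_d^n \right\|_2^2}{\left\| \hat{u}_d^n \right\|_2^2} < \varepsilon \qquad (9)$$
(6) End the loop and obtain the desired mode $u_d(t)$.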
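The iteration described above can be sketched in NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: it works on the one-sided spectrum with normalized frequencies (cycles/sample, 0 to 0.5), replaces the integrals of Equation (7) with discrete sums, and fixes τ = 0 so that the Lagrange multiplier stays at zero and the dual-ascent step of Equation (8) is skipped. The function name `vme` and its arguments are hypothetical.

```python
import numpy as np


def vme(f, omega_init, alpha, tol=1e-7, max_iter=500):
    """Hypothetical VME sketch with tau = 0 (Lagrange multiplier kept at zero).

    f          : real-valued 1-D signal
    omega_init : initial centre frequency, normalized (cycles/sample, 0..0.5)
    alpha      : penalty factor
    Returns the extracted mode u_d (time domain) and its centre frequency.
    """
    f = np.asarray(f, dtype=float)
    n = f.size
    f_hat = np.fft.rfft(f)              # spectrum for omega >= 0
    freqs = np.fft.rfftfreq(n)          # normalized frequency axis
    u_hat = np.zeros_like(f_hat)
    omega_d = float(omega_init)
    for _ in range(max_iter):
        d2 = (freqs - omega_d) ** 2
        a = alpha**2 * d2**2            # alpha^2 (omega - omega_d)^4
        # Eq. (6) with lambda = 0: band-pass update of the mode spectrum
        u_new = (f_hat + a * u_hat) / ((1.0 + a) * (1.0 + 2.0 * alpha * d2))
        power = np.abs(u_new) ** 2
        # Eq. (7): centre of gravity of the mode's power spectrum
        omega_d = float(np.sum(freqs * power) / np.sum(power))
        # Eq. (9): relative change of the mode spectrum
        prev = np.sum(np.abs(u_hat) ** 2)
        change = np.sum(np.abs(u_new - u_hat) ** 2)
        u_hat = u_new
        if prev > 0 and change / prev < tol:
            break
    return np.fft.irfft(u_hat, n), omega_d
```

In a parameter search, a candidate center frequency given in Hz would be divided by the sampling frequency before being passed as `omega_init`.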
2.2. Archimedes Optimization Algorithm for Optimizing VME
2.2.1. Weighted Correlated Kurtosis
In this study, correlated kurtosis, which is particularly sensitive to periodic impact components, forms the basis of the objective function for optimizing the center frequency ω and penalty factor α of VME. Correlated kurtosis is defined as follows:
$$CK_M(T) = \frac{\sum_{n=1}^{N} \left( \prod_{m=0}^{M} y_{n-mT} \right)^2}{\left( \sum_{n=1}^{N} y_n^2 \right)^{M+1}} \qquad (10)$$
In Equation (10), $y_n$ is the signal of length N, M represents the number of shifts, and T represents the period of the impact signal (in samples).
When T = 0 and M = 1, correlated kurtosis reduces to kurtosis (up to a constant factor of 1/N). Its advantage lies in its heightened sensitivity to periodic impacts, which makes it more effective for extracting the fault features of rotating machinery such as rolling bearings.
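Equation (10) can be evaluated directly from its definition. The sketch below is illustrative only: the function name is hypothetical, T is an integer period in samples, and samples before the start of the signal are assumed to be zero when forming the shifted products.

```python
import numpy as np


def correlated_kurtosis(y, T, M=1):
    """Correlated kurtosis CK_M(T) of Eq. (10); T is the period in samples."""
    y = np.asarray(y, dtype=float)
    N = y.size
    prod = np.ones(N)
    for m in range(M + 1):
        shift = m * T
        if shift >= N:
            return 0.0                     # the shifted product is zero everywhere
        shifted = np.zeros(N)
        shifted[shift:] = y[:N - shift]    # y_{n - mT}, zero before the signal starts
        prod *= shifted
    return float(np.sum(prod**2) / np.sum(y**2) ** (M + 1))
```

For an impulse train with a period of 50 samples, CK evaluated at T = 50 is positive, whereas at a mismatched T = 37 the shifted impulses never coincide and CK is zero, which illustrates its selectivity to the fault period.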
The correlation coefficient quantifies the similarity between two signals; maximizing the similarity between the original and decomposed signals helps retain as much useful information as possible. The correlation coefficient is defined as follows:
$$C = \frac{E\left[ (x - E[x])(y - E[y]) \right]}{\sqrt{E\left[ (x - E[x])^2 \right] E\left[ (y - E[y])^2 \right]}} \qquad (11)$$
In Equation (11), $E[\cdot]$ represents the mathematical expectation, and C represents the correlation coefficient between signals x and y.
However, the correlation coefficient is susceptible to noise interference. This paper therefore combines the distinct advantages of correlated kurtosis and the correlation coefficient and uses WCK as the objective function for parameter-optimized VME. The expression of WCK is shown as follows:
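The way the two indices can be combined is sketched below. The correlation coefficient follows Equation (11), with sample means standing in for expectations. The weighting itself is an assumption: the sketch uses a hypothetical product form, WCK = |C| × CK, which is not necessarily the weighting used in this paper. The function names are illustrative.

```python
import numpy as np


def correlation_coefficient(x, y):
    """Correlation coefficient C of Eq. (11), with sample means as expectations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float(np.mean(xc * yc) / np.sqrt(np.mean(xc**2) * np.mean(yc**2)))


def weighted_ck(original, mode, ck_value):
    """HYPOTHETICAL weighting: |C(original, mode)| * CK(mode).

    ck_value is the correlated kurtosis of the extracted mode (Eq. (10));
    the product form is an assumption made for illustration only.
    """
    return abs(correlation_coefficient(original, mode)) * ck_value
```

Taking the absolute value of C keeps a sign-inverted but otherwise faithful mode from being penalized.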
2.2.2. Archimedes Optimization Algorithm
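The AOA, a recent metaheuristic algorithm proposed by Hashim et al. [45], is designed for complex optimization problems that are prone to local optima. In this algorithm, each individual is represented as an immersed object whose acceleration is updated based on collisions with neighboring objects. The new position of an individual is determined by its density, volume, and acceleration.
The initial location of an individual is defined as follows:
$$x_i = lb_i + rand \times (ub_i - lb_i), \quad i = 1, 2, \ldots, N \qquad (13)$$
In Equation (13), $lb_i$ and $ub_i$ denote the lower and upper bounds of the search range, rand denotes a random number in the range [0, 1], and N is the population size.
Acceleration, density, and volume are initialized as follows:
$$den_i = rand, \quad vol_i = rand, \quad acc_i = lb_i + rand \times (ub_i - lb_i) \qquad (14)$$
In this step, the initial population is evaluated, and the individual with the best fitness is selected; its position, density, volume, and acceleration are assigned to $x_{best}$, $den_{best}$, $vol_{best}$, and $acc_{best}$.
The density and volume are updated as follows:
$$den_i^{t+1} = den_i^t + rand \times (den_{best} - den_i^t), \qquad vol_i^{t+1} = vol_i^t + rand \times (vol_{best} - vol_i^t) \qquad (15)$$
In Equation (15), t represents the current iteration, $den_{best}$ and $vol_{best}$ represent the density and volume of the best individual found so far, and $acc_{best}$ denotes the acceleration of that best individual.
The transition from collision to equilibrium between individuals, that is, from the exploration phase to the exploitation phase, is controlled by the balance factor TF:
$$TF = \exp\left( \frac{t - t_{max}}{t_{max}} \right) \qquad (16)$$
In Equation (16), $t_{max}$ represents the maximum number of iterations. TF increases monotonically with t and reaches 1 at $t = t_{max}$; the algorithm explores while TF ≤ 0.5 and exploits afterwards.
The density decreasing factor d determines the position state of the individual and helps the AOA shift from global to local search:
$$d^{t+1} = \exp\left( \frac{t_{max} - t}{t_{max}} \right) - \frac{t}{t_{max}} \qquad (17)$$
The acceleration of individual i at iteration t + 1 is updated as follows:
$$acc_i^{t+1} = \begin{cases} \dfrac{den_{mr} + vol_{mr} \times acc_{mr}}{den_i^{t+1} \times vol_i^{t+1}}, & TF \le 0.5 \\[2ex] \dfrac{den_{best} + vol_{best} \times acc_{best}}{den_i^{t+1} \times vol_i^{t+1}}, & TF > 0.5 \end{cases} \qquad (18)$$
In Equation (18), $den_{mr}$, $vol_{mr}$, and $acc_{mr}$ denote the density, volume, and acceleration, respectively, of a random individual.
The acceleration is normalized to guide the search. When an individual is far from the global optimum, its acceleration is large, corresponding to the exploration stage; as it approaches the global optimum, its acceleration decreases, signifying the transition to the exploitation stage:
$$acc_{i\text{-}norm}^{t+1} = u \times \frac{acc_i^{t+1} - \min(acc)}{\max(acc) - \min(acc)} + l \qquad (19)$$
In Equation (19), u and l are the normalization parameters and are set to 0.9 and 0.1, respectively.
The position of individual i at iteration t + 1 is updated as follows:
$$x_i^{t+1} = \begin{cases} x_i^t + C_1 \times rand \times acc_{i\text{-}norm}^{t+1} \times d \times (x_{rand} - x_i^t), & TF \le 0.5 \\[1ex] x_{best}^t + F \times C_2 \times rand \times acc_{i\text{-}norm}^{t+1} \times d \times (T \times x_{best} - x_i^t), & TF > 0.5 \end{cases} \qquad (20)$$
In Equation (20), $C_1$ is set to 2, $C_2$ is set to 6, $x_{rand}$ is the position of a random individual, $T = C_3 \times TF$ increases with the number of iterations in the range $[C_3 \times 0.3, 1]$, and F, which represents the direction factor, is defined as follows:
$$F = \begin{cases} +1, & P \le 0.5 \\ -1, & P > 0.5 \end{cases} \qquad (21)$$
In Equation (21), $P = 2 \times rand - C_4$, and $C_4$ is set to 0.5.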
A flowchart of the process for optimizing the VME parameters using the AOA is shown in Figure 1.
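The update rules of Equations (13) to (21) can be assembled into a compact optimizer. This is a minimal sketch under stated assumptions rather than the authors' code: it maximizes the objective (as required for WCK), normalizes the acceleration per dimension over the population, clips positions to the search bounds, keeps the best individual greedily, and uses C3 = 2, a value the text does not state. The function `aoa` and its arguments are hypothetical.

```python
import numpy as np


def aoa(objective, lb, ub, n_pop=20, t_max=50, c1=2.0, c2=6.0, c3=2.0,
        c4=0.5, u_norm=0.9, l_norm=0.1, seed=None):
    """Archimedes optimization sketch that MAXIMIZES objective(x) on [lb, ub]."""
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, dtype=float), np.asarray(ub, dtype=float)
    shape = (n_pop, lb.size)
    # Eqs. (13)-(14): random positions, densities, volumes, accelerations
    x = lb + rng.random(shape) * (ub - lb)
    den, vol = rng.random(shape), rng.random(shape)
    acc = lb + rng.random(shape) * (ub - lb)
    fit = np.array([objective(xi) for xi in x])
    b = int(np.argmax(fit))
    x_best, den_best, vol_best, acc_best = x[b].copy(), den[b].copy(), vol[b].copy(), acc[b].copy()
    f_best = fit[b]
    for t in range(1, t_max + 1):
        # Eq. (15): density and volume drift toward the best object
        den = den + rng.random(shape) * (den_best - den)
        vol = vol + rng.random(shape) * (vol_best - vol)
        tf = np.exp((t - t_max) / t_max)                 # Eq. (16)
        d = np.exp((t_max - t) / t_max) - t / t_max      # Eq. (17)
        explore = tf <= 0.5
        # Eq. (18): collision with a random object, or equilibrium with the best
        if explore:
            mr = rng.integers(0, n_pop, n_pop)
            acc = (den[mr] + vol[mr] * acc[mr]) / (den * vol)
        else:
            acc = (den_best + vol_best * acc_best) / (den * vol)
        # Eq. (19): min-max normalization into [l_norm, u_norm + l_norm]
        lo, hi = acc.min(axis=0), acc.max(axis=0)
        acc_norm = u_norm * (acc - lo) / np.where(hi > lo, hi - lo, 1.0) + l_norm
        r = rng.random(shape)
        if explore:  # Eq. (20), exploration branch
            x_rand = x[rng.integers(0, n_pop, n_pop)]
            x = x + c1 * r * acc_norm * d * (x_rand - x)
        else:        # Eq. (20), exploitation branch
            p = 2.0 * rng.random((n_pop, 1)) - c4
            F = np.where(p <= 0.5, 1.0, -1.0)          # Eq. (21): direction factor
            T = c3 * tf
            x = x_best + F * c2 * r * acc_norm * d * (T * x_best - x)
        x = np.clip(x, lb, ub)
        fit = np.array([objective(xi) for xi in x])
        b = int(np.argmax(fit))
        if fit[b] > f_best:  # greedy update of the best object
            x_best, den_best, vol_best, acc_best = x[b].copy(), den[b].copy(), vol[b].copy(), acc[b].copy()
            f_best = fit[b]
    return x_best, float(f_best)
```

For VME, `objective` would run the extraction with a candidate [α, ω] and return the WCK of the extracted mode; the usage below uses a simple quadratic instead so that the optimum is known.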
2.3. One-Dimensional Convolutional Neural Network
The convolutional neural network, a prominent deep learning algorithm, has exceptional feature learning capabilities. Following its breakthroughs in image processing, it has attracted considerable attention for fault diagnosis applications.
A standard one-dimensional CNN consists of four key components: a convolutional layer, a pooling layer, a fully connected layer, and a classification layer.
In the convolutional layer, the input signal is convolved with a convolution kernel at a specified stride:
$$y_i^l = f\left( \sum_{j=1}^{k} w_j^l \, x_j^{l(i)} + b_i^l \right) \qquad (22)$$
In Equation (22), $y_i^l$ represents the output value of the ith neuron in the lth convolutional layer, $f(\cdot)$ represents the activation function, $w_j^l$ represents the jth weight of the convolution kernel in the lth layer, $x_j^{l(i)}$ represents the jth input value within the receptive field of the ith neuron, $b_i^l$ represents the bias parameter of the ith neuron in the lth convolutional layer, and k represents the size of the convolution kernel.
The activation function captures the nonlinear characteristics of the input signal and thereby increases the representational capacity of the network. The ReLU activation function is notable for accelerating network training; it also alleviates the vanishing gradient problem, and its sparse activations can help mitigate overfitting. In the network architecture, the ReLU function is typically applied after the convolutional layer. The ReLU function is expressed as follows:
$$f(x) = \max(0, x) \qquad (23)$$
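Equations (22) and (23) for a single input channel can be illustrated as follows. This is a sketch with a hypothetical function name; like most CNN libraries, it slides the kernel without flipping it (cross-correlation) and uses no padding.

```python
import numpy as np


def conv1d_relu(x, w, b, stride=1):
    """Single-channel 'valid' convolution of Eq. (22) followed by ReLU, Eq. (23).

    As in most CNN libraries, the kernel is slid without flipping
    (cross-correlation); k = len(w) is the kernel size.
    """
    x, w = np.asarray(x, dtype=float), np.asarray(w, dtype=float)
    k = w.size
    n_out = (x.size - k) // stride + 1
    z = np.array([w @ x[i * stride:i * stride + k] + b for i in range(n_out)])
    return np.maximum(z, 0.0)
```

With the kernel [-1, 1] and a bias of 0.5, the input [1, 2, 3, 4] produces [1.5, 1.5, 1.5]; reversing the kernel makes every pre-activation negative, so ReLU outputs zeros.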
The pooling layer primarily extracts features and reduces the dimensionality of the data produced by the convolutional layer. The two prevalent pooling methods are average pooling and maximum pooling. Maximum pooling takes the maximum value within a specified region as the representative value of that region after pooling. The expression of maximum pooling is as follows:
$$p_i^l = \max_{(i-1)W < t \le iW} \left\{ q^l(t) \right\} \qquad (24)$$
In Equation (24), $p_i^l$ represents the output feature after the maximum pooling operation, $q^l(t)$ represents the value of the tth neuron in the lth layer, and W represents the pooling region size.
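Non-overlapping maximum pooling as in Equation (24) reduces to a reshape and a row-wise maximum. In this sketch (hypothetical function name), the stride equals the region size W, and trailing samples that do not fill a whole region are dropped.

```python
import numpy as np


def max_pool1d(x, size):
    """Non-overlapping max pooling, Eq. (24): region size W = size, stride W.

    Trailing samples that do not fill a whole region are dropped.
    """
    x = np.asarray(x, dtype=float)
    n = x.size // size
    return x[:n * size].reshape(n, size).max(axis=1)
```

For example, pooling [1, 3, 2, 5, 4, 0, 7] with W = 2 yields [3, 5, 4], halving the feature length.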
For multi-class classification problems, the output layer typically employs the Softmax function to perform the categorization. The Softmax function is formulated as follows:
$$p_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}, \quad j = 1, 2, \ldots, K \qquad (25)$$
In Equation (25), K represents the number of categories, $z_j$ represents the input of the jth neuron, and $p_j$ is the predicted probability of the jth category.
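In practice, Equation (25) is usually evaluated after subtracting the largest input from every $z_j$. This does not change the result, because the common factor cancels in the ratio, but it prevents overflow in the exponential. A minimal sketch (hypothetical function name):

```python
import numpy as np


def softmax(z):
    """Softmax of Eq. (25); subtracting max(z) leaves the result unchanged
    but prevents overflow in exp for large inputs."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()
```

For inputs [0, ln 3], the probabilities are [0.25, 0.75].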
2.3.1. GAP Layer
In traditional CNNs, fully connected layers flatten and concatenate the features derived from the convolution and pooling operations into a one-dimensional vector. A GAP layer instead averages each feature map into a single value and has no trainable parameters. It therefore substantially reduces the parameter count, which improves both the training speed and the generalization capability of the model. The GAP layer is expressed as follows:
$$g^l = \frac{1}{N} \sum_{i=1}^{N} y_i^l \qquad (26)$$
In Equation (26), $g^l$ represents the value after global average pooling of the lth layer, N denotes the number of neurons, and $y_i^l$ represents the output value of the ith neuron in the lth convolutional layer.
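The parameter saving can be checked with a small calculation. The layer sizes below (64 feature maps of length 128 feeding a 10-class output layer) are hypothetical and are not taken from Table 1; the comparison counts the weights and biases of the dense output layer when it receives either the flattened feature maps or the GAP output.

```python
import numpy as np


def global_average_pool(feature_maps):
    """Eq. (26): average each channel's feature map to one value.

    feature_maps has shape (channels, length); the result has shape (channels,).
    """
    return np.asarray(feature_maps, dtype=float).mean(axis=1)


def dense_params(n_in, n_out):
    """Trainable parameters (weights + biases) of a fully connected layer."""
    return n_in * n_out + n_out


# Hypothetical sizes: 64 channels of length 128 feeding a 10-class classifier
channels, length, classes = 64, 128, 10
flatten_params = dense_params(channels * length, classes)   # flatten -> dense: 81930
gap_params = dense_params(channels, classes)                # GAP -> dense: 650
```

Under these assumed sizes, GAP reduces the classifier parameters by more than two orders of magnitude.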
2.3.2. Residual Structure
In neural networks, a residual structure introduces shortcut connections that skip one or more layers, so that the input is added directly to the output of those layers. The stacked layers then only need to learn a residual correction to the input rather than an entirely new mapping, which considerably eases training and mitigates gradient vanishing and explosion, problems that are common in networks with many convolutional and activation layers. Consequently, this approach improves training convergence and simplifies the training of deep networks. The typical configuration of a residual structure is depicted in Figure 2.
Depending on whether the input and output dimensions match, the shortcut uses one of two mappings: identity mapping, employed when the dimensions match, or projection mapping, used to reconcile mismatched dimensions. In a shallow network, removing the BN layers from the residual block reduces the parameter count and accelerates training without increasing computational complexity, which makes the model better suited to small-scale datasets.
Figure 3 illustrates the residual structure used in the subsequent experiments.
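A residual block without BN layers, with both shortcut types, can be sketched as below. This is an illustrative sketch, not the exact block of Figure 3: the branch is assumed to be conv, ReLU, conv with odd-sized kernels and zero "same" padding, and the projection shortcut is assumed to be a 1 × 1 convolution. The function names and weight shapes are hypothetical.

```python
import numpy as np


def conv1d_same(x, w, b):
    """Multi-channel 'same' convolution: x (c_in, L), w (c_out, c_in, k), b (c_out,)."""
    c_out, c_in, k = w.shape
    pad = k // 2                                   # k is assumed to be odd
    xp = np.pad(x, ((0, 0), (pad, pad)))           # zero padding
    out = np.empty((c_out, x.shape[1]))
    for i in range(x.shape[1]):
        out[:, i] = np.tensordot(w, xp[:, i:i + k], axes=([1, 2], [0, 1])) + b
    return out


def residual_block(x, w1, b1, w2, b2, w_proj=None):
    """Conv-ReLU-Conv branch plus shortcut, without BN layers.

    Identity shortcut when w_proj is None (dimensions match); otherwise a
    1 x 1 projection convolution with w_proj of shape (c_out, c_in, 1).
    """
    x = np.asarray(x, dtype=float)
    h = np.maximum(conv1d_same(x, w1, b1), 0.0)
    h = conv1d_same(h, w2, b2)
    shortcut = x if w_proj is None else conv1d_same(x, w_proj, np.zeros(w_proj.shape[0]))
    return np.maximum(h + shortcut, 0.0)
```

If the branch weights are all zero, the block reduces to ReLU applied to the shortcut, which shows how the stacked layers only need to learn a correction to the input.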
2.3.3. One-Dimensional Multi-Scale Residual Convolutional Neural Network
Using convolution kernels of different sizes to construct a multi-scale fusion CNN enables information to be integrated across scales. The network first employs a large convolution kernel in its first layer to extract basic features from the vibration signal. Convolution kernels of sizes 7 × 1, 5 × 1, and 3 × 1 are then applied in parallel to extract features at multiple scales. This design not only fuses information across scales but also incorporates modified residual blocks to speed up convergence. The GAP layer then processes the extracted features, and the Softmax layer performs the final classification. The structure of the one-dimensional multi-scale residual convolutional neural network is shown in Figure 4, and the parameters of the 1D-MRCNN are listed in Table 1. The vibration signal extracted using parameter-optimized VME is converted into a dataset that serves as the input to the 1D-MRCNN.
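The parallel multi-scale stage can be sketched for a single input channel. Only the kernel sizes 7, 5, and 3 come from the text. The kernel weights, the use of one filter per branch, and stacking as the fusion step are assumptions made for illustration, and the large-kernel first layer and the residual blocks are omitted. The function name is hypothetical.

```python
import numpy as np


def multiscale_features(x, kernels, biases):
    """Parallel single-channel branches with different kernel sizes.

    Each branch uses 'same' correlation so all outputs keep the input
    length and can be stacked (fused) along a new channel axis.
    """
    x = np.asarray(x, dtype=float)
    branches = [np.maximum(np.correlate(x, np.asarray(w, dtype=float), mode="same") + b, 0.0)
                for w, b in zip(kernels, biases)]
    return np.stack(branches)          # shape: (number of scales, len(x))


sizes = (7, 5, 3)                                  # kernel sizes from the text
kernels = [np.ones(k) / k for k in sizes]          # hypothetical averaging kernels
```

Because every branch preserves the signal length, the three outputs can be fused channel-wise before being passed to the subsequent layers.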
3. Fault Simulation Signal Analysis
To illustrate the effectiveness of the proposed algorithm, a rolling bearing fault model is constructed to simulate an inner race fault, and random shocks, periodic harmonics, and Gaussian white noise are added to it. The simulated signal is constructed as follows:
$$x(t) = x_1(t) + x_2(t) + x_3(t) + n(t) = \sum_{i} A_i \, s(t - iT - \tau_i) + \sum_{j} B_j \, s(t - T_j) + P \cos(2\pi f_h t + \theta) + n(t) \qquad (27)$$
In Equation (27), x(t) represents the mixed signal; $x_1(t)$ represents the bearing inner race fault; $x_2(t)$ represents the random shocks caused by electromagnetic interference or the external environment, occurring at times $T_j$; $x_3(t)$ represents the periodic harmonics; n(t) represents Gaussian white noise; $A_i$, $B_j$, and P denote the amplitudes of the inner race fault impulses, the random shocks, and the harmonics, respectively; T denotes the interval between two adjacent pulses; $\tau_i$ represents the small fluctuation caused by random sliding of the rolling elements, which accounts for 1% of T; $f_h$ denotes the frequency of the periodic harmonics; θ denotes the phase of the periodic harmonics; and s(t) denotes the impulse response function. The expression of $A_i$ is as follows:
$$A_i = A_0 \cos(2\pi f_r t) \qquad (28)$$
In Equation (28), $A_0$ is the base amplitude of the inner race fault impulses, and $f_r$ denotes the rotational frequency.
The expression of s(t) is as follows:
$$s(t) = e^{-\xi t} \sin(2\pi f_n t + \varphi) \qquad (29)$$
In Equation (29), $f_n$ denotes the resonant frequency, φ denotes the phase, and ξ denotes the attenuation coefficient.
The inner race fault simulation signal constructed in this paper is shown in Figure 5. The key parameters are as follows: the sampling frequency is 16 kHz, the number of samples is 8192, the rotational frequency is 30 Hz, and the inner race fault characteristic frequency is $f_i$, the reciprocal of the interval between adjacent fault pulses. The resonance frequencies of the inner race fault and the random shocks are 3500 Hz and 4000 Hz, respectively, and the corresponding attenuation coefficients are 800 and 2000. The amplitude of the inner race fault impulses is set to 0.3, the amplitudes of the random shocks are random numbers in the range [0.25, 1.25], the harmonic amplitude is set to 0.15, the harmonic frequency is a random number in the range [100, 200], and the Gaussian white noise is set to −10 dB.
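A signal of this kind can be generated for experimentation. This sketch rests on several loudly stated assumptions and is not the authors' generator. The fault characteristic frequency is a required argument because its value is not given above. The mapping of 0.3 to the inner-race amplitude and 0.15 to the harmonic amplitude is assumed. The ±1% uniform slip, the number and timing of the random shocks, the random harmonic phase, and the reading of −10 dB as the signal-to-noise ratio of the noise-free signal are all hypothetical. The function names are illustrative.

```python
import numpy as np


def decaying_sine(t, fn, xi, phi=0.0):
    """Impulse response exp(-xi t) sin(2 pi fn t + phi) for t >= 0, zero before."""
    tc = np.maximum(t, 0.0)            # avoid overflow of exp for t < 0
    return np.where(t >= 0, np.exp(-xi * tc) * np.sin(2 * np.pi * fn * tc + phi), 0.0)


def simulate_inner_fault(fi, fs=16000.0, n=8192, fr=30.0, a0=0.3, p=0.15,
                         snr_db=-10.0, n_shocks=4, seed=None):
    """Hypothetical generator for the simulated inner-race fault signal."""
    rng = np.random.default_rng(seed)
    t = np.arange(n) / fs
    T = 1.0 / fi
    # x1: impulses every T with +/-1 % slip; amplitude a0 cos(2 pi fr t_i),
    # evaluated at each impulse arrival time t_i (assumption)
    x1 = np.zeros(n)
    for i in range(int(t[-1] / T) + 1):
        ti = i * T + rng.uniform(-0.01, 0.01) * T
        x1 += a0 * np.cos(2 * np.pi * fr * ti) * decaying_sine(t - ti, 3500.0, 800.0)
    # x2: random shocks with amplitudes drawn from [0.25, 1.25]
    x2 = np.zeros(n)
    for tj in rng.uniform(0.0, t[-1], n_shocks):
        x2 += rng.uniform(0.25, 1.25) * decaying_sine(t - tj, 4000.0, 2000.0)
    # x3: harmonic with random frequency in [100, 200] Hz and random phase
    x3 = p * np.cos(2 * np.pi * rng.uniform(100.0, 200.0) * t + rng.uniform(0.0, 2 * np.pi))
    clean = x1 + x2 + x3
    # n(t): white Gaussian noise scaled to the requested SNR
    noise_power = np.mean(clean**2) / 10 ** (snr_db / 10)
    return clean + rng.normal(0.0, np.sqrt(noise_power), n)
```

A fixed `seed` makes the random shocks, harmonic, and noise reproducible, which is convenient when comparing optimizers on the same signal.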
The signal and envelope spectrum of the inner race fault are shown in Figure 6. To accurately identify the type of rolling bearing fault, parameter-optimized VME was employed for signal analysis. First, the parameters α and ω of the VME algorithm were optimized; the variation of WCK with the number of iterations is shown in Figure 7.
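The envelope spectra used to read the fault frequency can be computed from an FFT-based analytic signal, the same construction used by `scipy.signal.hilbert`. In this sketch (hypothetical function name), the mean of the envelope is removed so that the DC component does not mask the fault peak.

```python
import numpy as np


def envelope_spectrum(x, fs):
    """One-sided amplitude spectrum of the Hilbert envelope of x."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = x.size
    # Analytic signal: keep DC (and Nyquist), double positive, zero negative bins
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    envelope = np.abs(np.fft.ifft(np.fft.fft(x) * h))
    envelope -= envelope.mean()        # remove DC so the fault peak stands out
    return np.fft.rfftfreq(n, d=1.0 / fs), np.abs(np.fft.rfft(envelope)) / n
```

For a 2000 Hz carrier amplitude-modulated at 50 Hz, the envelope spectrum peaks at 50 Hz. Likewise, a fault-induced shock train modulating a structural resonance appears at its repetition frequency and harmonics.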
Using the AOA, the best parameter combination was determined to be [265, 4271]. To further validate the efficiency of the AOA for parameter optimization, its iterative optimization was compared with that of the grey wolf optimizer (GWO), the whale optimization algorithm (WOA), and particle swarm optimization (PSO). After the 4th, 12th, 11th, and 6th generations, respectively, the fitness values of the AOA, GWO, WOA, and PSO reached 7.31, 7.15, 7.21, and 7.02 (unit: $10 \times 10^{-12}$).
These results indicate that the AOA converges faster than the other algorithms, reaches the highest fitness value, and is less prone to becoming trapped in local optima. The results of parameter-optimized VME are shown in Figure 8, where the blue portion indicates the original signal and the red portion indicates the extracted signal. The envelope spectrum clearly reveals the inner race fault frequency together with prominent harmonics, which confirms that WCK is a reasonable objective function for optimizing the VME parameters.