1. Introduction
Vibration signal analysis of rotating machinery has been widely used in fault diagnosis because the signals carry information about the operational state of the equipment [1,2]. However, when the number and installation locations of sensors are restricted, the information that can be obtained from the signals is limited [3,4]. Moreover, the non-stationary nature of the collected signals, the interference between multi-source fault signals, and environmental noise often cause feature information to be lost. Therefore, the separation and extraction of compound faults based on vibration analysis is of great significance [5,6].
There are many analysis methods based on vibration signals, such as feature extraction, pattern recognition and deep learning. For example, Wang et al. [7] proposed a fault diagnosis method based on the sparsity-guided empirical wavelet transform, which can detect single and multiple bearing faults of railway axles. Lu et al. [8] introduced a method combining the wavelet transform and K-means clustering to predict the battery state of health. Alimardani et al. [9] presented an approach based on vibration signals to diagnose rotor eccentricity faults. Zhang et al. [10] developed a method based on the local outlier factor and improved adaptive matching pursuit, which can detect and recover anomalous vibration signals. Li et al. [11] presented an adaptive data fusion strategy based on deep learning with a convolutional neural network, which was validated on an industrial fan system with non-manufacturing faults and on a centrifugal pump. Łuczak [12] proposed a method named CWTx6-CNN, which offered a clear representation of fault-related features. Wang et al. [13] introduced a fault recognition method based on multi-sensor data fusion and a bottleneck layer optimized convolutional neural network (MB-CNN), which identified and classified multiple bearing faults. However, analysis methods based on vibration signals mostly focus on low-dimensional analysis [14], so the information obtained from the original signal is limited. It is therefore necessary to apply a dimensionality transformation to the one-dimensional vibration signal and to examine the resulting multi-dimensional representation in order to reveal information that is otherwise unclear. At the same time, local feature information can be enhanced significantly by dimensionality transformation [15,16].
In the past few decades, many dimensionality transformation methods have been proposed and widely applied in fields such as signal separation, image clustering, biological information extraction, behavior feature recognition, and environmental perception and prediction [17,18,19,20]. These methods can not only reduce the dimensionality of data but also extract salient features from high-dimensional data effectively. They also benefit subsequent data processing and enable low-dimensional visualization of data. Traditional dimensionality transformation algorithms seek the intrinsic linear structure of the data in a low-dimensional space [21,22]. However, most internal structures of data are complex and show nonlinear characteristics. In addition, the dimensions of various types of data continue to grow at an extremely fast pace. Therefore, extracting effective features and improving the ability to analyze such data is valuable. Machine learning algorithms based on matrix factorization are key technologies for several types of problems in this field, including dictionary learning, non-negative matrix factorization (NMF), concept factorization, matrix completion, etc. [23,24,25]. Among them, the NMF algorithm has attracted much attention in feature extraction because of its interpretability and scalability [26]. For example, Zhang et al. [27] proposed a weighted NMF algorithm, which achieved image clustering by optimizing three parameters in the algorithm. Gu et al. [28] introduced a method combining an improved NMF algorithm and the global positioning system to identify the sources driving ground deformation. Luo et al. [29] developed an approach based on the robust ensemble manifold projective NMF algorithm for image representation. Saha et al. [30] used a privacy-preserving NMF algorithm to provide privacy guarantees. Li et al. [31] adopted a deep autoencoder-like NMF method for link prediction. In addition, the NMF algorithm performs well in the field of biomedicine. Marta et al. [32] proposed a negative binomial NMF algorithm, which can capture the variation across patients to extract mutational signatures. Tu et al. [33] proposed a hypergraph regularized joint deep semi-NMF algorithm to identify biomarkers of Alzheimer’s disease. Nasrin et al. [34] put forward a model based on an improved NMF algorithm that can recognize native decoys in protein structure prediction.
The NMF algorithm has thus been applied in many fields and has achieved remarkable results since it was proposed. However, there is still room to improve it, especially for the blind source separation problem that arises in the diagnosis of compound faults in rotating machinery. Therefore, to separate multi-source signals and detect their features from a single channel, a signal separation method based on a multi-constraint NMF algorithm is proposed. By utilizing the flexibility of the β-divergence and the uniqueness provided by the determinant constraint on the feature matrix, the objective function of non-negative matrix factorization can be driven to its minimum smoothly, quickly and stably. By combining the dimensionality transformation of the STFT algorithm, the multi-constraint NMF algorithm, and the constructed parameter WK, the proposed method separates multi-source signals and diagnoses bearing faults, which makes fault diagnosis easier and more reliable. As rolling bearings are important components of rotating machinery, this paper takes rolling bearings as the research object.
The remaining sections are organized as follows: Section 2 describes the basic principle of the NMF algorithm. The STFT algorithm, the multi-constraint NMF algorithm and the parameter WK are introduced in Section 3. In Section 4, the specific separation of compound fault signals based on the proposed method is presented. The simulated and experimental results are discussed in Section 5. Finally, the conclusions are summarized in Section 6.
2. Principle of Non-Negative Matrix Factorization
The basic idea of the non-negative matrix factorization algorithm can be represented as follows: for any non-negative matrix $V \in \mathbb{R}_{+}^{m \times n}$, the NMF algorithm constructs an approximate factorization into two non-negative matrices $W \in \mathbb{R}_{+}^{m \times r}$ and $H \in \mathbb{R}_{+}^{r \times n}$ [35], namely:
$$V \approx WH \tag{1}$$
where $V$ denotes the data matrix, $m$ is the dimension of each sample, and $n$ is the number of samples. $W$ denotes a basis matrix that can be regarded as a series of basis vectors. $H$ denotes a coefficient matrix that can be regarded as the coordinates of each sample with respect to these basis vectors. In order to achieve dimensionality reduction, the parameter $r$ (the rank of the factorization) is chosen to be smaller than both $m$ and $n$. The model of the NMF algorithm is shown in Figure 1. In the field of signal processing, it can be interpreted as follows: if each column of the matrix $V$ is considered an observed signal, each group of observed signals contains different features (mixed features, single features, or redundant information), represented by green squares and red triangles. Each column of the matrix $W$ contains a feature separated from the observed signals by the NMF algorithm, and the original signal can be reconstructed by multiplying $W$ by the coefficient matrix $H$. This reflects the idea of representing the whole based on its parts.
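At present, a variety of cost functions are widely used for the optimization, and the Euclidean distance is one of the most popular, which can be represented as:
$$D_E(V \,\|\, WH) = \lVert V - WH \rVert_F^2 = \sum_{i,j} \left( V_{ij} - (WH)_{ij} \right)^2 \tag{2}$$
The cost function of Equation (2) leads to the following optimization problem:
$$\min_{W,H} D_E(V \,\|\, WH) \quad \text{s.t.} \quad W \ge 0,\ H \ge 0 \tag{3}$$
The above problem can be solved with a gradient descent algorithm until convergence. The update rules, with the products and quotients between matrices of equal size taken element-wise, are:
$$W \leftarrow W \odot \frac{V H^{T}}{W H H^{T}}, \qquad H \leftarrow H \odot \frac{W^{T} V}{W^{T} W H} \tag{4}$$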
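As a minimal sketch of the Euclidean-distance NMF described above, the following Python code assumes the widely used multiplicative form of the update rules by Lee and Seung, which can be read as gradient descent with an element-wise adaptive step size. The function name `nmf_euclidean`, the small constant `eps`, the iteration count and the toy data are illustrative assumptions and are not taken from this paper.

```python
import numpy as np

def nmf_euclidean(V, r, n_iter=200, eps=1e-12, seed=0):
    """Approximate a non-negative V (m x n) by W (m x r) @ H (r x n)."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    # Random non-negative initialization (illustrative choice).
    W = rng.random((m, r)) + eps
    H = rng.random((r, n)) + eps
    errors = []
    for _ in range(n_iter):
        # Multiplicative updates: every factor is non-negative, so W and H
        # stay non-negative without any explicit projection step.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        # Squared Euclidean (Frobenius) reconstruction error.
        errors.append(float(np.sum((V - W @ H) ** 2)))
    return W, H, errors

# Toy data: a non-negative matrix that has an exact rank-2 factorization.
rng = np.random.default_rng(1)
V = rng.random((20, 2)) @ rng.random((2, 30))
W, H, errors = nmf_euclidean(V, r=2)
print("first error:", errors[0], "last error:", errors[-1])
```

The reconstruction error recorded in `errors` is expected to shrink as the iterations proceed, while `W` and `H` remain non-negative throughout.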
3. Basic Principle
3.1. Parameter Selection of Short Time Fourier Transform
Signals can be transformed into the frequency domain, a sparse domain, or other combined domains for processing and analysis. Features that are indistinct in the time domain can be revealed by such transformations. The traditional Fourier transform is a global transformation based on the combination of different frequency components, so it cannot express time–frequency localization. In order to describe the time–frequency properties of signals, the short-time Fourier transform (STFT) was proposed.
STFT is a joint time–frequency analysis method for non-stationary signals. Its basic idea is to truncate the signal with a window function of fixed length and to apply the Fourier transform to each truncated segment to obtain its local frequency spectrum. Its model can be presented as [36]:
$$S(\tau, f) = \int_{-\infty}^{+\infty} x(t)\, h(t - \tau)\, e^{-j 2 \pi f t}\, dt \tag{5}$$
where $t$ is the time, $f$ is the frequency, $x(t)$ is the time-domain signal, $\tau$ denotes a shift in time, $h(t)$ is the window function, and $j$ is the imaginary unit. By shifting $\tau$ continuously, Fourier transforms at different times can be obtained. The set of these Fourier transforms is $S(\tau, f)$.
As an important tool in time–frequency analysis, the short-time Fourier transform has the advantages of a simple principle and good localization. Weak local feature information can be captured by the two-dimensional representation of vibration signals in the time–frequency domain, and the resulting higher-dimensional matrix makes it easier to exploit non-negative matrix factorization algorithms, which in turn makes compound fault diagnosis easier to implement.
Two main parameters, the type and the length of the window function, affect the effectiveness of the short-time Fourier transform. The window function truncates the signal, and a suitable choice reduces the effect of spectral leakage. The length of the window function affects the time–frequency resolution: the longer the window, the higher the frequency resolution but the lower the time resolution. Therefore, the type and length of the window function need to be determined based on the specific signal type and processing environment.
In order to reduce the effects of windowing and improve diagnostic accuracy, it is necessary to choose an appropriate window function. The wider the main lobe of the window function, the smoother the spectral peak of the signal and the more strongly the picket-fence effect is suppressed, but the lower the spectral resolution. From the perspective of spectrum analysis, the main lobe of the window function spectrum should be as narrow as possible to improve the spectral resolution. At the same time, the side lobes of the window function spectrum should be as small as possible and decay rapidly with frequency, which reduces leakage distortion. Therefore, after comparing the performance of several common window functions with respect to the coupling characteristics of compound fault signals in rotating machinery, the sine-bell window is selected in this paper. The sine-bell window performs well in side lobe suppression and concentrates spectral energy in the main lobe. If an overlap length is specified for the sliding window, the overlapping window segments further compensate for the signal attenuation at the window edges. The waveform and frequency response of the sine-bell window are shown in Figure 2. The window length is 128 samples, and the overlap is half of the window length.
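The windowing settings above (128-sample sine-bell window, 50% overlap) can be illustrated with a short NumPy sketch. The exact sine-bell definition used here, $w[k] = \sin(\pi (k + 0.5) / N)$, the sampling rate `fs`, the test tone and the function names are assumptions made for illustration; the paper does not state them.

```python
import numpy as np

def sine_bell_window(N):
    # ASSUMED definition: half a sine period across the window,
    # w[k] = sin(pi * (k + 0.5) / N). The paper may use another variant.
    return np.sin(np.pi * (np.arange(N) + 0.5) / N)

def stft_magnitude(x, fs, win_len=128, hop=None):
    """Return (freqs, times, |S|); each column of |S| is one windowed frame."""
    hop = win_len // 2 if hop is None else hop  # default: 50% overlap
    w = sine_bell_window(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + win_len] * w for i in range(n_frames)], axis=1
    )
    S = np.fft.rfft(frames, axis=0)  # one-sided spectrum of each frame
    freqs = np.fft.rfftfreq(win_len, d=1.0 / fs)
    times = (np.arange(n_frames) * hop + win_len / 2) / fs  # frame centers
    return freqs, times, np.abs(S)

fs = 12_000                       # hypothetical sampling rate in Hz
t = np.arange(fs) / fs            # one second of samples
x = np.sin(2 * np.pi * 1500 * t)  # hypothetical 1500 Hz test tone
freqs, times, V = stft_magnitude(x, fs)
print("amplitude spectrum shape:", V.shape)
print("frequency resolution (Hz):", freqs[1] - freqs[0])
print("window duration (ms):", 1000 * 128 / fs)
```

The non-negative amplitude matrix `V` is the kind of two-dimensional input that an NMF algorithm can factorize. With this assumed window definition, the squared windows of two frames overlapping by half sum to one, which is one way to see how overlapping segments compensate for the attenuation at the window edges. The printed resolution and duration show the trade-off: at the assumed sampling rate, a 128-sample window gives bins `fs / 128` Hz apart while each frame spans `128 / fs` seconds.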
3.2. Multi-Constraint Non-Negative Matrix Factorization
The selection of the cost function for the non-negative matrix factorization algorithm is determined by the type of data and the application environment. Although NMF has been proven to be a useful tool in source separation, one drawback is that the separation performance tends to be poor in the presence of noise. Moreover, NMF risks degrading the separation performance for compound fault signals because of the lack of prior knowledge. Meanwhile, in the process of feature extraction for multi-source fault signals, the weaker the correlation between source signals, the more obvious the locality displayed, and the better the effect of dimensionality reduction. Otherwise, redundant components that fail to describe the fault characteristics appear during the decomposition. Therefore, dual constraints based on the β-divergence and the determinant are selected as the cost function for the non-negative matrix factorization algorithm according to the characteristics of the fault signal. The β-divergence constraint relaxes the restrictions on the data structure, and the determinant constraint ensures the uniqueness of the base matrix W during the decomposition. The dual constraints enhance local features effectively, which benefits subsequent signal reconstruction. The model of the β-divergence [37] can be presented as:
$$d_{\beta}(x \mid y) = \begin{cases} \dfrac{x^{\beta} + (\beta - 1)\, y^{\beta} - \beta\, x\, y^{\beta - 1}}{\beta (\beta - 1)}, & \beta \in \mathbb{R} \setminus \{0, 1\} \\[2ex] x \ln \dfrac{x}{y} - x + y, & \beta = 1 \\[2ex] \dfrac{x}{y} - \ln \dfrac{x}{y} - 1, & \beta = 0 \end{cases} \tag{6}$$
From Equation (6), it is easy to prove that the β-divergence is continuous at β = 0 and β = 1, and for any β, the following Equation (7) holds:
$$d_{\beta}(\lambda x \mid \lambda y) = \lambda^{\beta}\, d_{\beta}(x \mid y) \tag{7}$$
When β = 0, it can be seen that Equation (7) has the property of scale invariance, which is independent of λ. The property of scale invariance indicates that energy components in the amplitude spectrum V have equal weight values during the decomposition. When β = 1, however, it overly relies on the higher energy components in the amplitude spectrum V, which is not conducive to the separation of coupled signals. Therefore, β = 0 is chosen in this paper.
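The scaling behavior discussed above, namely that scaling both arguments by λ multiplies the β-divergence by λ^β, can be checked numerically. The sketch below uses the standard element-wise definition of the β-divergence (Itakura-Saito for β = 0, generalized Kullback-Leibler for β = 1); the function name and the test values are illustrative assumptions.

```python
import numpy as np

def beta_divergence(x, y, beta):
    """Element-wise beta-divergence d_beta(x | y) for strictly positive x, y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if beta == 0:   # Itakura-Saito divergence
        return x / y - np.log(x / y) - 1.0
    if beta == 1:   # generalized Kullback-Leibler divergence
        return x * np.log(x / y) - x + y
    return (x**beta + (beta - 1) * y**beta - beta * x * y ** (beta - 1)) / (
        beta * (beta - 1)
    )

x, y, lam = 3.0, 2.0, 10.0  # arbitrary positive test values
for beta in (0, 1, 2):
    lhs = beta_divergence(lam * x, lam * y, beta)
    rhs = lam**beta * beta_divergence(x, y, beta)
    print(f"beta={beta}: d(lam*x | lam*y)={lhs:.6f}, lam^beta * d(x | y)={rhs:.6f}")
```

For β = 0 the two sides agree with the unscaled divergence, so low-energy and high-energy time–frequency bins contribute on an equal footing; for β = 1 and β = 2 the divergence grows with λ and λ², which is why larger β values emphasize high-energy components.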
In order to ensure the uniqueness of the base matrix W and achieve better reconstruction results during the decomposition, the determinant constraint is introduced into the objective function of the NMF algorithm. The space spanned by the n m-dimensional column vectors of W is defined as P(W), and the volume of P(W) can be represented by the following Equation (8):
When the volume of P(W) is at its minimum value, the corresponding vectors can be determined uniquely.
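To give a feeling for the volume of the space spanned by the columns of W, the sketch below uses one common measure, the square root of the Gram determinant, $\sqrt{\det(W^{T} W)}$. Whether Equation (8) takes exactly this form is an assumption; the function name and the two small example matrices are hypothetical.

```python
import numpy as np

def parallelepiped_volume(W):
    """Volume of the parallelepiped spanned by the columns of W.

    ASSUMPTION: measured as sqrt(det(W^T W)) (Gram determinant); the paper's
    Equation (8) may use a different but related expression.
    """
    gram = W.T @ W
    # Clip tiny negative values caused by floating-point round-off.
    return float(np.sqrt(max(np.linalg.det(gram), 0.0)))

# Two hypothetical basis matrices with m = 3 rows and 2 basis vectors each.
W_spread = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])  # orthogonal columns
W_close = np.array([[1.0, 0.9], [0.0, 0.1], [0.0, 0.0]])   # nearly parallel columns

print("spread basis volume:", parallelepiped_volume(W_spread))
print("close basis volume: ", parallelepiped_volume(W_close))
```

The orthogonal pair spans a unit area, whereas the nearly parallel pair spans an area of about 0.1. A penalty on this quantity therefore pulls the basis vectors toward the most compact set that still explains the data, which is the intuition behind using a determinant constraint to single out one base matrix.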
The β-divergence constraint and the determinant constraint are combined into a new objective function for the non-negative matrix factorization algorithm, which can be represented as:
where α is the balance parameter, generally taken as α = 1, which balances the weight of the determinant constraint on the matrix W against the reconstruction error.
According to the gradient descent method, we derive the iterative update rule for the objective function as follows:
When the objective function converges, the optimization with dual constraints can be achieved. The specific steps of Algorithm 1 are as follows:
Algorithm 1 Multi-constraint Non-Negative Matrix Factorization
Step 1. Initialize the non-negative matrices W and H randomly.
Step 2. Calculate the initial value of the objective function according to Equation (9).
Step 3. Update the matrices W and H alternately and iteratively based on Equation (10).
Step 4. If the objective function (Equation (9)) converges, stop the iteration and output the matrices W and H; otherwise, repeat Steps 2 and 3.
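The control flow of Algorithm 1 can be sketched in Python as follows. This is not the update rule of Equation (10): the determinant term is omitted, and only the β-divergence term with β = 0 is minimized, using the standard majorization-minimization multiplicative update for the Itakura-Saito divergence (exponent 1/2) as a stand-in. The function names, the tolerance `tol`, the constant `eps` and the toy data are assumptions made for illustration.

```python
import numpy as np

def is_divergence(V, V_hat):
    """Itakura-Saito divergence (beta = 0) summed over all matrix entries."""
    R = V / V_hat
    return float(np.sum(R - np.log(R) - 1.0))

def nmf_beta0(V, r, max_iter=500, tol=1e-6, eps=1e-12, seed=0):
    """Alternating multiplicative updates for beta = 0 (no determinant term)."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    # Step 1: random non-negative initialization.
    W = rng.random((m, r)) + 0.1
    H = rng.random((r, n)) + 0.1
    # Step 2: initial value of the (stand-in) objective function.
    history = [is_divergence(V, W @ H + eps)]
    for _ in range(max_iter):
        # Step 3: update H, then W, each with the other factor held fixed.
        V_hat = W @ H + eps
        H *= ((W.T @ (V / V_hat**2)) / (W.T @ (1.0 / V_hat) + eps)) ** 0.5
        V_hat = W @ H + eps
        W *= (((V / V_hat**2) @ H.T) / ((1.0 / V_hat) @ H.T + eps)) ** 0.5
        history.append(is_divergence(V, W @ H + eps))
        # Step 4: stop once the relative change of the objective is small.
        if abs(history[-2] - history[-1]) <= tol * max(history[-2], eps):
            break
    return W, H, history

# Toy strictly positive "amplitude spectrum" built from two hidden components.
rng = np.random.default_rng(1)
V = rng.random((16, 2)) @ rng.random((2, 40)) + 1e-3
W, H, history = nmf_beta0(V, r=2)
print("objective: start", history[0], "end", history[-1], "iterations", len(history) - 1)
```

To follow Algorithm 1 fully, the objective in Step 2 would be replaced by Equation (9) and the two update lines in Step 3 by Equation (10); the loop structure and stopping test would stay the same.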
The advantage of the multi-constraint NMF algorithm is that the β-divergence and determinant constraints are introduced into the objective function, so that the decomposition results are closer to the source signals and redundant components are reduced during the decomposition.
3.3. Construction of Parameter WK
The kurtosis index is a numerical statistic that reflects the distribution characteristics of random variables. It is the normalized fourth-order central moment, a dimensionless parameter that is particularly sensitive to impact signals. The correlation coefficient characterizes the degree of similarity between two signals. Considering the advantages and disadvantages of the two indicators, we construct a comprehensive parameter called Weighted Kurtosis (WK) in this paper, which is defined as follows:
where C is the correlation coefficient between the signals x and y, E represents the mathematical expectation, and K is the kurtosis value of the signal. According to the Schwarz inequality, $|C| \le 1$ can be inferred. Thus, the correlation coefficient acts as a weight on the kurtosis value, and the resulting parameter WK is called Weighted Kurtosis. Early failures of rolling bearings are mostly characterized by impacts; kurtosis is used to detect the impact components in the reconstructed signal, while the correlation coefficient reflects the similarity between the reconstructed signal and the original signal. Meanwhile, according to Equation (11), when the signal is processed by the multi-constraint NMF algorithm, the larger the parameter WK of a reconstructed signal, the richer the feature information it contains and the better it represents the fault characteristic signal. Therefore, the parameter WK is constructed as a criterion for screening the reconstructed signals in this paper.
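The screening idea can be illustrated with a small Python sketch. It assumes the simple product form WK = K · |C|, that is, the kurtosis of a reconstructed signal weighted by the magnitude of its correlation with the original signal; the exact expression of Equation (11) may differ. The function names and the synthetic signals are hypothetical.

```python
import numpy as np

def kurtosis(x):
    """Normalized fourth-order central moment E[(x - mu)^4] / sigma^4."""
    d = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.mean(d**4) / np.mean(d**2) ** 2)

def correlation(x, y):
    """Pearson correlation coefficient between two equal-length signals."""
    dx = np.asarray(x, dtype=float) - np.mean(x)
    dy = np.asarray(y, dtype=float) - np.mean(y)
    return float(np.mean(dx * dy) / np.sqrt(np.mean(dx**2) * np.mean(dy**2)))

def weighted_kurtosis(component, original):
    # ASSUMPTION: WK = K * |C|. The paper's Equation (11) may use another form.
    return kurtosis(component) * abs(correlation(component, original))

# Synthetic example: a smooth sinusoid plus a sparse train of impacts.
n = 2000
t = np.arange(n) / n
sinusoid = np.sin(2 * np.pi * 50 * t)   # stands in for a non-fault component
impacts = np.zeros(n)
impacts[::200] = 5.0                    # stands in for bearing-fault impulses
original = sinusoid + impacts

for name, comp in (("sinusoid", sinusoid), ("impacts", impacts)):
    print(name, "K =", kurtosis(comp), "WK =", weighted_kurtosis(comp, original))
```

In this toy case the impact train receives a much larger WK than the sinusoid even though the sinusoid is more strongly correlated with the mixture, which mirrors the use of WK to pick out the reconstructed signals that carry fault-related impacts.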
6. Conclusions
In this paper, a novel single-channel blind source separation method based on the multi-constraint NMF is proposed. The main research content and corresponding conclusions are as follows: (1) The performance of several common window functions is compared for compound fault signals, the sine-bell window is selected as the processing method, and its length is selected iteratively. (2) The β-divergence and determinant constraints are introduced into the objective function of the traditional NMF algorithm, which enhances local feature information and reduces redundant components during the decomposition. The iterative update rules for the multi-constraint NMF algorithm have been derived, and the convergence and practicality of the algorithm have been demonstrated in experiments. (3) The parameter Weighted Kurtosis (WK) is constructed as a criterion for screening the reconstructed signals, and it has been shown to separate redundant signals effectively. (4) The simulated and experimental results indicate the effectiveness of the proposed approach, which separates multi-source signals and extracts fault features. Meanwhile, compared with the NMF algorithm with the traditional objective function, the proposed method is more applicable to compound fault diagnosis.
Some deficiencies still exist, such as the random initialization of the algorithm in this paper. Therefore, future work will concentrate on optimizing the initialization of the non-negative matrix factorization algorithm.