Abstract
Prognostics and health management (PHM) is critical for enhancing equipment reliability and reducing maintenance costs, and research on intelligent PHM has made significant progress driven by big data and deep learning techniques in recent years. However, complex working conditions and high-cost data collection inherent in real-world scenarios pose small-data challenges for the application of these methods. Given the urgent need for data-efficient PHM techniques in academia and industry, this paper aims to explore the fundamental concepts, ongoing research, and future trajectories of small data challenges in the PHM domain. This survey first elucidates the definition, causes, and impacts of small data on PHM tasks, and then analyzes the current mainstream approaches to solving small data problems, including data augmentation, transfer learning, and few-shot learning techniques, each of which has its advantages and disadvantages. In addition, this survey summarizes benchmark datasets and experimental paradigms to facilitate fair evaluations of diverse methodologies under small data conditions. Finally, some promising directions are pointed out to inspire future research.
1 Introduction
Prognostics and health management (PHM), an increasingly important framework for realizing condition awareness and intelligent maintenance of mechanical equipment by analyzing collected monitoring data, is being applied in a growing spectrum of industries, such as aerospace (Randall 2021), transportation (Li et al. 2023a), and wind turbines (Han et al. 2023). According to a survey conducted by the National Science Foundation (NSF) (Gray et al. 2012), PHM technologies have created economic benefits of $855 million over the past decade. It is precisely this great application potential that continues to attract sustained attention and research from different academic communities, including but not limited to reliability analysis, mechanical engineering, and computer science.
Functionally, PHM covers the entire monitoring lifecycle of equipment, fulfilling roles across four key dimensions: anomaly detection (AD), fault diagnosis (FD), remaining useful life (RUL) prediction, and maintenance execution (ME) (Zio 2022). First, AD aims to discern rare events that deviate significantly from standard patterns, and the crux lies in accurately differentiating a handful of anomalous data from an extensive volume of normal data (Li et al. 2022a). The focus of FD is to classify diverse faults, and the difficulty is to extract effective fault features under complex working conditions. RUL prediction focuses on estimating the time remaining before a component or system fails, and its main challenge is to construct comprehensive health indicators capable of characterizing trends in health degradation. Finally, ME optimizes maintenance decisions based on diagnostic and prognostic results (Lee and Mitici 2023).
Methodologically, the techniques employed to execute the PHM tasks of AD, FD, and RUL prediction can be classified into physics model-based, data-driven, and hybrid methods (Lei et al. 2018). Physics model-based methods utilize mathematical models to describe failure mechanisms and signal relationships; representative techniques include state observers (Choi et al. 2020), parameter estimation (Schmid et al. 2020), and some signal processing approaches (Gangsar and Tiwari 2020). In contrast, data-driven methods involve manual or adaptive extraction of features from sensor signals, including statistical methods (Wang et al. 2022), machine learning (ML) (Huang et al. 2021), and deep learning (DL) (Fink et al. 2020). Hybrid approaches (Zhou et al. 2023a) combine elements from both physics model-based and data-driven techniques. Among these methods, DL-based techniques have gained widespread interest in PHM tasks spanning from AD to ME, owing to their pronounced advantages over conventional techniques in automatic feature extraction and pattern recognition.
Figure 1 depicts the intelligent PHM cycle based on DL models (Omri et al. 2020); its steps include data collection and processing, model construction, feature extraction, task execution, and model deployment. Evidently, monitoring data forms the foundation of this cycle, and its volume and quality wield a decisive influence on the eventual performance of DL models in industrial contexts. However, gathering substantial datasets consisting of diverse anomaly and fault patterns with precise labels under different working conditions is time-consuming, dangerous, and costly, leading to small data problems that challenge models’ performance in PHM tasks. A recent investigation conducted by Dimensional Research underscores this quandary, revealing that 96% of companies have encountered small data issues when implementing industrial ML and DL projects (D. Research 2019).
To address the small data issues in intelligent PHM, organizations have started to shift their focus from big data to small data to enhance the efficiency and robustness of artificial intelligence (AI) models, as strongly evidenced by the rapid growth of academic publications in recent years. To provide a comprehensive overview, we applied the preferred reporting items for systematic reviews and meta-analyses (PRISMA) method (Huang et al. 2024; Kumar et al. 2023) for paper investigation and selection. As shown in Fig. 2, the PRISMA technique includes three steps: defining the scope, databases, and keywords; screening search results; and identifying articles for analysis. First, the search scope was limited to articles published in the IEEE Xplore, Elsevier, and Web of Science databases from 2018 to 2023, and the keywords consisted of topic terms such as “small/limited/imbalanced/incomplete data”; technical terms such as “data augmentation”, “deep learning”, “transfer learning”, “few-shot learning”, and “meta-learning”; and application-specific terminologies such as “intelligent PHM”, “anomaly detection”, “fault diagnosis”, “RUL prediction”, etc. Second, the databases were searched for articles whose title, abstract, and keywords contain the predefined terms, resulting in 139, 1232, and 281 papers from IEEE Xplore, Elsevier, and Web of Science, respectively. To eliminate duplicates and select the literature most relevant to small data problems in PHM, the first 100 non-duplicate studies from each database (300 papers in total) were screened according to the inclusion and exclusion criteria listed in Table 1. Finally, we further refined the results through thorough review and evaluation, and a total of 201 representative papers were chosen for the analysis presented in this survey.
Despite the growing number of studies, the statistics highlight that there are few review articles on the topic of small data challenges. The first related review is the report entitled “Small data’s big AI potential”, which was released by the Center for Security and Emerging Technology (CSET) at Georgetown University in September 2021 (Chahal et al. 2021), and it emphasized the benefits of small data and introduced some typical approaches. Then, Adadi (2021) reviewed and discussed four categories of data-efficient algorithms for tackling data-hungry problems in ML. More recently, a study (Cao et al. 2023) theoretically analyzed learning on small data, followed an agnostic active sampling theory and reviewed the aspects of generalization, optimization and challenges. Since 2021, scholars in the PHM community have been focusing on the small data problem in intelligent FD and have conducted some review studies. Pan et al. (2022) reviewed the applications of generative adversarial network (GAN)-based methods, Zhang et al. (2022a) outlined solutions from the perspective of data processing, feature extraction, and fault classification, and Li et al. (2022b) organized a comprehensive survey on transfer learning (TL) covering theoretical foundations, practical applications, and prevailing challenges.
It is worth noting that existing studies provide valuable guidance, but they have yet to delve into the foundational concepts of small data and exhibit certain limitations in their analyses. For instance, some reviews studied small data problems from a macro perspective without considering the application characteristics of PHM tasks (Chahal et al. 2021; Adadi 2021; Cao et al. 2023). Others concentrated solely on particular methodologies for addressing small data challenges in FD tasks (Pan et al. 2022; Zhang et al. 2022a; Li et al. 2022b) and lack systematic research on solutions for the AD and RUL prediction tasks, which seriously limits the development and industrial application of intelligent PHM. Therefore, an in-depth exploration of the small data challenges in the PHM domain is necessary to provide guidance for the successful application of intelligent models in industry.
This review is a direct response to the contemporary demand for addressing the small data challenges in PHM, and it aims to clarify the following three key questions: (1) What is small data in PHM? (2) Why solve the small data challenges? and (3) How to address small data challenges effectively? These fundamental issues distinguish our work from existing surveys and demonstrate the major contributions:
1. Small data challenges for intelligent PHM are studied for the first time, and their definition, causes, and impacts are analyzed in detail.
2. An overview of various state-of-the-art methods for solving small data problems is presented, and the specific issues and remaining challenges for each PHM task are discussed.
3. The commonly used benchmark datasets and experimental settings are summarized to provide a reference for developing and evaluating data-efficient models in PHM.
4. Finally, promising directions are indicated to facilitate future research on small data.
Consequently, this paper is organized according to the hierarchical architecture shown in Fig. 3. Section 2 discusses the definition of small data in the PHM domain and analyzes the corresponding causes and impacts. Section 3 provides a comprehensive overview of representative approaches—including data augmentation (DA) methods (Sect. 3.1), transfer learning (TL) methods (Sect. 3.2), and few-shot learning (FSL) methods (Sect. 3.3). The problems in PHM applications are discussed in Sect. 4. Section 5 summarizes the datasets and experimental settings for model evaluation. Finally, potential research directions are given in Sect. 6 and the conclusions are drawn in Sect. 7. In addition, the abbreviations of notations used in this paper are summarized in Table 2.
2 Analysis of small data challenges in PHM
The excellent performance of DL models in executing PHM tasks is intricately tied to the premise of abundant, high-quality labeled data. However, this assumption is unlikely to be satisfied in industry, where small data is often the reality; such data exhibit distinct distributions and may hinder model learning. Therefore, this section first analyzes the definition, causes, and impacts of small data in PHM.
2.1 What is “small data”?
Before answering the question of what small data is, let us first review the related term, “big data”, which has garnered distinct interpretations among scholars since its birth in 2012. Ward and Barker (2013) regarded big data as a phrase that “describes the storage and analysis of large or complex datasets using a series of techniques”. Another perspective, presented in Suthaharan (2014), focused on the data’s cardinality, continuity, and complexity. Among the various definitions, the most widely accepted one is characterized by the “5 V” attributes: volume, variety, value, velocity, and veracity (Jin et al. 2015).
After long-term research, some experts have discovered the fact that big data is not ubiquitous, and the paradigm of small data has emerged as a novel area worthy of thorough investigation in the field of AI (Vapnik 2013; Berman 2013; Baeza-Yates 2024; Kavis 2015). Vapnik (2013) stands among the pioneers in this pursuit, having defined small data as a scenario where “the ratio of the number of training samples to the Vapnik–Chervonenkis (VC) dimensions of a learning machine is less than 20.” Berman (2013) considered small data as being used to solve discrete questions based on limited and structured data that come from one institution. Another study defines small data as “data in a volume and format that makes it accessible, informative and actionable.” (Baeza-Yates 2024). In an industrial context, Kavis (2015) described small data as “The small set of specific attributes produced by the Internet of Things, these are typically a small set of sensor data such as temperature, wind speed, vibration and status”.
Considering the distinctive attributes of equipment signals within industries, a new definition for small data in PHM is given here: small data refers to datasets consisting of equipment or system status information collected from sensors that are characterized by a limited quantity or quality of samples. Taking the FD task as an example, the corresponding mathematical expression is: given a dataset \(D = \left\{ F_{I} \right\}_{I = 1}^{N}\) with \(F_{I} = \left\{ (x_{i}^{I}, y_{i}^{I}) \right\}_{i = 1}^{n_{I}}\), where \((x_{i}^{I}, y_{i}^{I})\) are the samples and labels (if any) of the Ith fault \(F_{I}\), \(N\) represents the number of fault classes in \(D\), and each fault set has a sample size of \(n_{I}\). Notably, the term “small” carries two connotations: (i) on a quantitative scale, “small” signifies a limited dataset volume, a limited sample size \(n_{I}\), or a minimal total number of fault types \(N\); and (ii) from a qualitative perspective, “small” indicates a scarcity of valuable information within \(D\) due to a substantial proportion of anomalous, missing, unlabeled, or noisy-labeled data in \((x_{i}^{I}, y_{i}^{I})\). There is no fixed threshold for “small” in either quantity or quality; this remains an open question that depends on the specific PHM task to be performed, the equipment analyzed, the chosen methodology, and the desired performance. To further clarify the meaning of small data, Table 3 presents a comprehensive comparison with big data.
2.2 Causes of small data problems in PHM
Rapid advances in sensors and industrial Internet technology have simplified the process of collecting monitoring data from equipment. However, only large companies currently have the ability to acquire data on a large scale. Moreover, since most of the collected data are normal samples with limited abnormal or faulty data, they cannot provide enough information for model training. As illustrated in Fig. 4, four main causes of small data challenges in PHM are analyzed.
2.2.1 Heavy investment
When deploying an intelligent PHM system, return on investment (ROI) is the top concern of companies. The substantial investment comes from two main aspects, as shown in the first quadrant of Fig. 4: (i) factories need to digitally upgrade existing old equipment to collect monitoring data; and (ii) data labeling and processing require manual operation and domain expertise. Although the costs of sensors and labeling outsourcing are relatively low today, installing sensors across numerous machines and processing terabytes of data is still beyond the reach of most manufacturers.
2.2.2 Data accessibility restrictions
Illustrated in the second quadrant, this factor is underscored by the following: (i) the sensitivity, security, or privacy of the data often leads to strict access controls; an example is data collection for military equipment. (ii) For data transfers and data sharing, individuals, corporations, and nations need to comply with laws and supervisory ordinances, especially after the release of the General Data Protection Regulation (Zarsky 2016).
2.2.3 Complex working conditions
The contents depicted in the third quadrant of Fig. 4 include: (i) data distributions in PHM inherently display significant variability across diverse production tasks, machines, and operating conditions (Zhang et al. 2023), making it impossible to collect data under all potential conditions; (ii) acquiring data within specialized service environments, such as high radiation, carries inherent risks; and (iii) the degradation of equipment from a healthy state to eventual failure is a lengthy process.
2.2.4 Multi-factor coupling
As equipment becomes more intricately integrated, correlation and coupling effects have steadily intensified. As shown in the fourth quadrant of Fig. 4, couplings exist between (i) multiple components, (ii) multiple systems, and (iii) diverse processes. Such interactions are commonly characterized by nonlinearity, temporal variability, and uncertainty, further increasing the complexity of data acquisition.
2.3 Impacts of small data on PHM tasks
The availability of labeled, high-quality data remains limited, which affects the performance of PHM tasks at both the data and model levels (Wang et al. 2020a). As shown on the left side of Fig. 5, the effects at the data level primarily include incomplete data and imbalanced distributions, which in turn lead to poor generalization at the model level. This section analyzes these impacts, together with corresponding evaluation metrics, using the FD task as an example.
2.3.1 Incomplete data
Data integrity refers to the “breadth, depth, and scope of information contained in the data” (Chen et al. 2023a). However, the obtained small dataset often exhibits a low density of supervised information owing to restricted fault categories or sample size. Furthermore, missing values, missing labels, or outliers in the incomplete data exacerbate the scarcity of valuable information. Data incompleteness in PHM can be measured by the following metrics:

$$I_{D} = \frac{n_{D}}{N_{D}}, \quad (1) \qquad I_{C_{i}} = \frac{n_{C_{i}}}{N_{C_{i}}}, \quad (2)$$

where \(I_{D}\) represents the incompleteness of the dataset \(D\), and \(n_{D}\) and \(N_{D}\) are the number of incomplete samples and the total number of samples in \(D\), respectively. Similarly, this metric can also assess the incompleteness of the samples within a certain class \(C_{i}\) in line with Eq. (2), where \(n_{C_{i}}\) and \(N_{C_{i}}\) count the incomplete and total samples of that class. When either \(I_{D}\) or \(I_{C_{i}}\) approaches 0, it indicates a relatively complete dataset or class; conversely, a higher value represents a severe degree of data incompleteness and a substantial loss of information within the data.
2.3.2 Imbalanced data distribution
The second impact is the imbalanced data distribution. The fault classes containing larger or smaller numbers of samples are called the majority and minority classes, respectively. Depending on whether the imbalance exists between different classes or within a single class, the phenomenon of inter-class or intra-class imbalance arises accordingly. Considering a dataset with two distinct fault types, each comprising two subclasses, the degrees of inter-class imbalance \(IR\) and intra-class imbalance \(IR_{C_{i}}\) can be quantified as (Ren et al. 2023):

$$IR = \frac{N_{\text{maj}}}{N_{\min}}, \quad (3) \qquad IR_{C_{i}} = \frac{n_{\text{maj}}}{n_{\min}}, \quad (4)$$

where \(N_{\text{maj}}\) and \(N_{\min}\) represent the sample counts of the majority and minority classes within the dataset, and \(n_{\text{maj}}\) and \(n_{\min}\) signify the respective sample sizes of the two subclasses within class \(C_{i}\). These values span the interval [1, ∞) and describe the extent of the imbalance: a value of \(IR\) or \(IR_{C_{i}}\) equal to 1 indicates a balanced inter-class or intra-class case, whereas a value of 50 is typically regarded by domain experts as a highly imbalanced task (Triguero et al. 2015).
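The imbalance ratio can likewise be computed directly from class counts; the following sketch uses hypothetical labels (100 healthy samples versus 2 faulty samples) to reproduce the "highly imbalanced" threshold of 50:

```python
import numpy as np

def imbalance_ratio(labels):
    """Inter-class imbalance ratio: majority count / minority count.

    Applying the same ratio to the subclass counts inside one class
    gives the intra-class measure.
    """
    counts = np.bincount(np.asarray(labels))
    counts = counts[counts > 0]          # ignore classes absent from the data
    return counts.max() / counts.min()

labels = [0] * 100 + [1] * 2             # 100 healthy vs. 2 faulty samples
IR = imbalance_ratio(labels)             # -> 50.0, a highly imbalanced task
```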
2.3.3 Poor model generalization
Technically, the principle of supervised DL is to build a model \(f\) that learns the underlying patterns from a training set \(D_{train}\) and predicts the labels of previously unseen test data \(D_{test}\). The empirical error \(E_{emp}\) on the training set and the expected error \(E_{exp}\) on the test set are derived by calculating the discrepancy between the true labels \(Y\) and the predicted labels \(\hat{Y}\), respectively. The difference between these two errors, i.e., the generalization error \(G(f,D_{train},D_{test})\), is commonly used to measure the generalizability of the trained model on a test set. The generalization error is bounded by the model’s complexity \(h\) and the training data size \(P\) as follows (LeCun et al. 1998):

$$G(f, D_{train}, D_{test}) \le k \left( \frac{h}{P} \right)^{\alpha}, \quad (5)$$

where \(k\) is a constant and \(\alpha\) is a coefficient with a value range of [0.5, 1.0]. The bound shows that the parameter \(P\) determines the model’s generalization: when \(P\) is large enough, \(G(f,D_{train},D_{test})\) for a model with a given \(h\) will converge to 0. However, small, incomplete, or unbalanced data often result in a larger \(G(f,D_{train},D_{test})\) and poor generalization.
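The dependence of the bound on \(P\) can be illustrated numerically; the constants \(k = 1\) and \(\alpha = 0.5\) below are hypothetical values chosen only for illustration:

```python
# Numerical illustration of the bound G <= k * (h / P) ** alpha,
# with hypothetical constants k = 1 and alpha = 0.5 (alpha must lie
# in [0.5, 1.0]).
def generalization_bound(h, P, k=1.0, alpha=0.5):
    """Upper bound on the generalization error for complexity h and P samples."""
    return k * (h / P) ** alpha

h = 100                                          # fixed model complexity
bound_big = generalization_bound(h, P=10_000)    # plentiful data: bound ~ 0.1
bound_small = generalization_bound(h, P=100)     # small-data regime: bound ~ 1.0
# Shrinking the training set by 100x widens the bound by 10x when alpha = 0.5.
```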
3 Overview of approaches to small data challenges in PHM
This section provides a structured overview of the latest advancements in tackling small data challenges in representative PHM tasks such as AD, FD, and RUL prediction. As depicted on the right-hand side of Fig. 5, three main strategies have been extracted from the current literature: DA, TL, and FSL. In the upcoming subsections, we delve into the relevant theories and proposed methodologies for each category, followed by a brief summary.
3.1 Data augmentation methods
DA methods provide data-level solutions to address small data issues, and their efficacy has been verified in many studies. The basic principle is to improve the quantity or quality of the training dataset by creating copies or new synthetic samples of existing data (Gay et al. 2023). Depending on how the auxiliary data are generated, transform-based, sampling-based, and deep generative models-based DA methods are analyzed.
3.1.1 Transform-based DA
Transform-based methods constitute one of the earliest classes of DA; they increase the size of small datasets by applying geometric transformations to existing samples without changing their labels. These transformations are diverse and flexible, including random cropping, vertical and horizontal flipping, and noise injection. However, most of them were initially designed for two-dimensional (2-D) images and cannot be directly applied to the one-dimensional (1-D) signals of equipment (Iglesias et al. 2023).
Considering the sequential nature of monitoring data, scholars have devised transformation methods applicable to 1-D data (Meng et al. 2019; Li et al. 2020a; Zhao et al. 2020a; Fu et al. 2020; Sadoughi et al. 2019; Gay et al. 2022). For example, Meng et al. (2019) proposed a DA approach for FD of rotating machinery that equally divides the original sample and then randomly reorganizes the two segments to form a new fault sample. In Li et al. (2020a) and Zhao et al. (2020a), various transformation techniques, such as Gaussian noise, random scaling, time stretching, and signal translation, were applied simultaneously, as illustrated in Fig. 6. It is worth noting that all the aforementioned techniques are global transformations imposed on the entire signal, potentially overlooking local fault properties. Consequently, some studies have combined local and global transforms (Zhang et al. 2020a; Yu et al. 2020, 2021a) to modify both segments and the entirety of the original signal, yielding more authentic samples. For instance, Yu et al. (2020) simultaneously used local and global signal amplification, noise addition, and data exchange to improve the diversity of fault samples.
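A minimal sketch of such 1-D transforms is given below; the parameter values are hypothetical, and the segment-reordering function follows the spirit of Meng et al. (2019) rather than their exact procedure:

```python
import numpy as np

rng = np.random.default_rng(0)

# Global transforms applied to an entire 1-D vibration signal
def add_gaussian_noise(x, sigma=0.01):
    return x + rng.normal(0.0, sigma, size=x.shape)

def random_scaling(x, low=0.9, high=1.1):
    return x * rng.uniform(low, high)

def translation(x, max_shift=50):
    return np.roll(x, rng.integers(-max_shift, max_shift + 1))

# Local transform: split the signal into segments and reorder them,
# in the spirit of the segment-reorganization of Meng et al. (2019)
def segment_shuffle(x, n_segments=2):
    segments = np.array_split(x, n_segments)
    rng.shuffle(segments)
    return np.concatenate(segments)

signal = np.sin(np.linspace(0, 8 * np.pi, 1024))   # toy vibration signal
augmented = [f(signal) for f in (add_gaussian_noise, random_scaling,
                                 translation, segment_shuffle)]
```

Each transform preserves the signal length and the class label, so the augmented copies can be added directly to the training set.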
3.1.2 Sampling-based DA
Sampling-based DA methods are usually applied to solve data imbalance problems under small data conditions. Among them, under-sampling techniques address imbalance by reducing the sample size of the majority class, while over-sampling methods achieve DA by expanding the samples of the minority class. Over-sampling can be further classified into random over-sampling and the synthetic minority over-sampling technique (SMOTE) (Chawla et al. 2002), depending on whether new synthetic samples are created. As shown in Fig. 7, random over-sampling copies the data of a minority class n times to increase the data size, whereas SMOTE creates synthetic samples by interpolating between samples of the minority class and their k nearest neighbors, thus enhancing both the quantity and diversity of samples.
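The interpolation at the heart of SMOTE can be sketched in a few lines; this is a simplified reading of Chawla et al. (2002) with hypothetical parameters, not the reference implementation:

```python
import numpy as np

def smote(X_min, n_new, k=5, rng=None):
    """Minimal SMOTE sketch: synthesize minority-class samples by
    interpolating between each sample and one of its k nearest neighbors."""
    if rng is None:
        rng = np.random.default_rng(0)
    X_min = np.asarray(X_min, dtype=float)
    # pairwise Euclidean distances within the minority class
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                  # exclude self-matches
    neighbors = np.argsort(d, axis=1)[:, :k]     # k nearest neighbors per sample
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))             # pick a minority sample
        j = rng.choice(neighbors[i])             # pick one of its neighbors
        gap = rng.random()                       # random interpolation factor
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.stack(synthetic)

minority = np.random.default_rng(1).normal(size=(10, 4))  # 10 faulty samples
new_samples = smote(minority, n_new=20, k=3)              # 20 synthetic samples
```

Because each synthetic sample lies on the line segment between two real minority samples, the new data stay inside the minority class's feature range while adding diversity that plain copying cannot.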
To address data imbalance arising from abundant healthy samples and scarce faulty samples in monitoring data, some studies (Yang et al. 2020a; Hu et al. 2020) have introduced enhanced random over-sampling methods for the augmentation of small data. For example, Yang et al. (2020a) enhanced the random over-sampling method by introducing a variable-scale sampling strategy for unbalanced and incomplete data in the FD task, and Hu et al. (2020) used a resampling method to simulate data under different working conditions and decrease domain bias. In comparison, the SMOTE technique has gained widespread utilization in PHM tasks due to its inherent advantages (Hao and Liu 2020; Mahmoodian et al. 2021). Hao and Liu (2020) combined SMOTE with the Euclidean distance to achieve better over-sampling of minority class samples. To address the difficulty of selecting appropriate nearest neighbors for synthetic samples, Zhu et al. (2022) calculated both the Euclidean and Mahalanobis distances of the nearest neighbors, and Wang et al. (2023) used the characteristics of the neighborhood distribution to equilibrate samples. Moreover, the studies of Liu and Zhu (2020), Fan et al. (2020), and Dou et al. (2023) further improved the adaptability of SMOTE by employing weighted distributions to shift the importance of classification boundaries toward the challenging minority classes, demonstrating effectiveness in resolving data imbalance issues.
3.1.3 Deep generative models-based DA
In addition, deep generative models have emerged as highly promising solutions to small data since 2017; the autoencoder (AE) and the generative adversarial network (GAN) are two prominent representatives (Moreno-Barea et al. 2020). An AE is a special type of neural network characterized by encoding its input to the output in an unsupervised manner (Hinton and Zemel 1994), where the optimization goal is to learn an effective representation of the input data. The fundamental architecture of an AE, illustrated in Fig. 8a, comprises two symmetric parts encompassing a total of five shallow layers. The first half, known as the encoder, transforms input data into a latent space, while the second half, the decoder, deciphers this latent representation to reconstruct the data. Likewise, a GAN is composed of two fundamental components, as shown in Fig. 8b. The first is the generator, responsible for creating fake samples from input random noise, and the second is the discriminator, which judges the authenticity of the generated samples. These two components engage in an adversarial training process, progressively moving towards a state of Nash equilibrium.
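To make the encode-reconstruct principle concrete, the following is a deliberately tiny linear autoencoder trained on toy signals with plain NumPy; real PHM models use deep nonlinear encoders and richer data, so this is only an illustrative sketch (all data and sizes are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 noisy sine snippets standing in for 1-D vibration samples
t = np.linspace(0, 2 * np.pi, 32)
X = np.sin(t) + 0.1 * rng.normal(size=(200, 32))

# One-hidden-layer linear autoencoder: 32 -> 8 (latent) -> 32
W_enc = 0.1 * rng.normal(size=(32, 8))   # encoder weights
W_dec = 0.1 * rng.normal(size=(8, 32))   # decoder weights
lr = 0.01

losses = []
for epoch in range(300):
    Z = X @ W_enc                  # encode into an 8-D latent space
    X_hat = Z @ W_dec              # decode: reconstruct the input
    err = X_hat - X
    losses.append((err ** 2).mean())
    # gradient descent on the mean-squared reconstruction error
    grad_dec = Z.T @ err / len(X)
    grad_enc = X.T @ (err @ W_dec.T) / len(X)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

# Training drives the reconstruction error down: the 8-D latent code has
# learned a compact representation of the 32-D signals.
```

A generative variant such as a VAE adds a sampling step in the latent space so that new signals can be drawn from it, which is what makes AE-family models usable for data augmentation.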
The unique advantages of GANs in generating diverse samples make them superior to traditional over-sampling DA methods, especially in tackling data imbalance problems in PHM tasks (Behera et al. 2023). Various innovative models have emerged, including the variational auto-encoder (VAE) (Qi et al. 2023), deep convolutional GAN (DCGAN) (Zheng et al. 2019), Wasserstein GAN (Yu et al. 2019), etc. These methods can be classified into two groups based on their input types. The first group commonly generates data from 1-D inputs such as raw signals (Zheng et al. 2019; Yu et al. 2019; Dixit and Verma 2020; Ma et al. 2021; Zhao et al. 2021a, 2020b; Liu et al. 2022; Guo et al. 2020; Wan et al. 2021; Huang et al. 2020, 2022; Zhang et al. 2020b; Behera and Misra 2021; Wenbai et al. 2021; Jiang et al. 2023) and frequency features (Ding et al. 2019; Miao et al. 2021; Mao et al. 2019), which capture the inherent temporal information in signals without complex pre-processing. For instance, Dixit and Verma (2020) proposed an improved conditional VAE to generate synthetic samples from raw vibration signals, yielding remarkable FD performance despite limited data availability. The work in Mao et al. (2019) applied the fast Fourier transform (FFT) to convert original signals into the frequency domain as inputs for a GAN and obtained higher-quality generated samples. On the other hand, some studies (Du et al. 2019; Yan et al. 2022; Liang et al. 2020; Zhao and Yuan 2021; Zhao et al. 2022; Sun et al. 2021; Zhang et al. 2022b; Bai et al. 2023) leveraged the strengths of AEs and GANs in the image domain, aiming to generate images from 2-D time–frequency representations. For instance, Bai et al. (2023) employed an intertemporal return plot to transform time-series data into 2-D images as inputs for a Wasserstein GAN, and this method reduced data imbalance and improved the diagnostic accuracy for bearing faults.
3.1.4 Epilog
Table 4 summarizes the diverse DA-based solutions for addressing small data problems in PHM, covering the specific issues tackled by each technique as well as its merits and drawbacks. Evidently, DA approaches focus on mitigating small data challenges at the data level, including problems characterized by insufficient labeled training data, class imbalance, incomplete data, and samples contaminated with noise. Transform-based methods primarily increase the size of the training dataset by imposing transformations onto signals, but their effectiveness depends on the quality of the raw signals. Sampling-based approaches excel at dealing with imbalance problems in PHM tasks, and SMOTE methods demonstrate proficiency in both augmenting minority class samples and diversifying their composition, but refining nearest neighbor selection and bolstering adaptability to high levels of class imbalance remain open research areas. Deep generative model-based DA provides a flexible and promising tool capable of generating samples for various equipment under different working conditions, but more in-depth research is needed on integrating the characteristics of PHM tasks, assessing the quality of the generated data, and training generative models efficiently.
3.2 Transfer learning methods
Traditional DL models assume that training and test data originate from an identical domain; however, changes in operating conditions inevitably cause divergences in data distributions. TL is a technique that eliminates the requirement of identical data distributions by transferring and reusing data or knowledge from related domains, ultimately solving small data problems in the target domain. TL is defined in terms of domains and tasks: each domain \(D\) consists of a feature space and a corresponding marginal distribution, and the task \(T\) associated with each domain contains a label space and a learning function (Yao et al. 2023). Within the PHM context, TL can be concisely defined as follows: given a source domain \(D_{S}\) with a task \(T_{S}\), and a target domain \(D_{T}\) with a task \(T_{T}\), the goal of TL is to exploit the knowledge of certain equipment learned from \(D_{S}\) and \(T_{S}\) to enhance the learning process for \(T_{T}\) within \(D_{T}\) under the setting of \(D_{S} \ne D_{T}\) or \(T_{S} \ne T_{T}\), where the data volume of the source domain is considered much larger than that of the target domain. A range of categorization criteria for TL methods exists in the literature. From the perspective of “what to transfer” during the implementation phase, TL can be categorized into three types: instance-based TL, feature-based TL, and parameter-based TL. Among these categories, the former two are affiliated with solutions operating at the data level, while the latter belongs to the realm of model-level approaches. These classifications are visually represented in Fig. 9.
3.2.1 Instance-based TL
The premise of applying TL is that the source domain contains sufficient labeled data, whereas the target domain either lacks sufficient labeled data or predominantly consists of unlabeled data. A straightforward way would be to train a model for the target domain directly on samples from the source domain, but this proves impractical due to the inherent distribution disparities between the two domains. Therefore, the key is to find and apply labeled instances in the source domain whose data distribution is similar to that of the target domain. For this purpose, various methods have been proposed to minimize the distribution divergence, among which weighting strategies are the most widely used.
Dynamic weight adjustment (DWA) is a popular strategy whose novelty lies in reweighting the source and target domain samples based on their contributions to the learning of the target model. The well-known TrAdaBoost algorithm (Dai et al. 2007) is a representative example: it increases the weights of source samples that are similar to the target domain and reduces the weights of irrelevant source instances. The effectiveness of TrAdaBoost has been validated in FD for wind turbines (Chen et al. 2021), bearings (Miao et al. 2020), and induction motors (Xiao et al. 2019). Evolving from this foundational research, scholars have also introduced multi-objective optimization (Lee et al. 2021) and DL theories (Jamil et al. 2022; Zhang et al. 2020c) into TrAdaBoost to improve model training efficiency. However, DWA requires labeled target samples; otherwise, weight adjustment methods based on kernel mapping techniques are needed to estimate the key weight parameters, such as matching the means of source and target domain samples in the reproducing kernel Hilbert space (RKHS) (Tang et al. 2023a). For example, Chen et al. (2020) designed a whitened cosine similarity criterion based on kernel principal component analysis to determine the weight parameters for data in the source and target domains, boosting the diagnostic performance for gears under limited data and varying working conditions. More research can be found in Liu and Ren (2020), Xing et al. (2021), Ruan et al. (2022).
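The opposite weight updates that TrAdaBoost applies to the two domains can be sketched as a single round with toy 0/1 error indicators; this is a simplified illustration of the update rule in Dai et al. (2007), not the full boosting loop.

```python
import math

def tradaboost_update(w_src, w_tgt, err_src, err_tgt, t_err, n_rounds):
    """One weight-update round in the style of TrAdaBoost: misclassified
    source instances are down-weighted (assumed unlike the target
    distribution), while misclassified target instances are up-weighted.
    err_* are 0/1 indicators (1 = weak learner misclassified sample i);
    t_err is the weighted error rate on the target domain (must be < 0.5)."""
    beta_src = 1.0 / (1.0 + math.sqrt(2.0 * math.log(len(w_src)) / n_rounds))
    beta_tgt = t_err / (1.0 - t_err)
    new_src = [w * beta_src ** e for w, e in zip(w_src, err_src)]
    new_tgt = [w * beta_tgt ** (-e) for w, e in zip(w_tgt, err_tgt)]
    return new_src, new_tgt

w_src, w_tgt = [1.0, 1.0, 1.0], [1.0, 1.0]
new_src, new_tgt = tradaboost_update(
    w_src, w_tgt,
    err_src=[1, 0, 0],   # first source sample misclassified -> down-weighted
    err_tgt=[0, 1],      # second target sample misclassified -> up-weighted
    t_err=0.25, n_rounds=10,
)
```

Over successive rounds, source instances that keep disagreeing with the target labels lose influence, which is exactly the "borrowed augmentation" behaviour described above.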
3.2.2 Feature-based TL
Unlike instance-based TL, which finds similarities between domains in the space of raw samples, feature-based methods perform knowledge transfer within a feature space shared between the source and target domains. As demonstrated in Fig. 10, feature-based TL is widely applied in domain-adaptation and domain-generalization scenarios: the former focuses on migrating knowledge from the source domain to the target domain, while the latter aims to develop a model that is robust across multiple source domains and can therefore generalize to any new domain. The key to feature-based TL is to reduce the disparities between the marginal and conditional distributions of different domains through operations such as discrepancy-based methods and feature reduction methods, which eventually enable the model to achieve excellent adaptation and generalization on target tasks (Qin et al. 2023).
The main challenge for discrepancy-based methods is to accurately quantify the distributional similarity between domains, which relies on specific distance metrics. Table 5 lists the popular metrics (Borgwardt et al. 2006; Kullback and Leibler 1951; Gretton et al. 2012; Sun and Saenko 2016; Arjovsky et al. 2017) and the algorithms applied to PHM tasks (Yang et al. 2018, 2019a; Cheng et al. 2020; Zhao et al. 2020c; Xia et al. 2021; Zhu et al. 2023a; Li et al. 2020b, c, 2021a; He et al. 2021). Maximum mean discrepancy (MMD) is based on the distance between instance means in the RKHS, and the Wasserstein distance assesses the likeness of probability distributions by considering geometric properties; both are widely used. For example, Yang et al. (2018) devised a convolutional adaptation network with multi-kernel MMD to minimize the discrepancy between the feature distributions derived from laboratory and real-machine failure data, and the integration of the Wasserstein distance in Cheng et al. (2020) greatly enhanced the domain adaptation capability of the proposed model. Moreover, Fan et al. (2023a) proposed a domain-based discrepancy metric for domain-generalization fault diagnosis under unseen conditions, which helps the model balance the intra- and inter-domain distances across multiple source domains. On the other hand, feature reduction approaches aim to automatically capture general representations across different domains, mainly using unsupervised methods such as clustering (Michau and Fink 2021; He et al. 2020a; Mao et al. 2021) and AE models (Tian et al. 2020; Lu and Yin 2021; Hu et al. 2021a; Mao et al. 2020). For instance, Mao et al. (2021) integrated time series clustering into TL, and used the meta-degradation information obtained from each cluster for temporal domain adaptation in bearing RUL prediction.
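As a concrete illustration of MMD, the sketch below computes the (biased) squared MMD between two toy feature sets with an RBF kernel, i.e. the squared distance between the two sample means in the kernel's RKHS; the data and kernel width are illustrative only.

```python
import math

def rbf(x, y, gamma=1.0):
    """RBF (Gaussian) kernel between two feature vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def mmd2(X, Y, gamma=1.0):
    """Biased squared Maximum Mean Discrepancy between sample sets X and Y:
    E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)]."""
    kxx = sum(rbf(a, b, gamma) for a in X for b in X) / len(X) ** 2
    kyy = sum(rbf(a, b, gamma) for a in Y for b in Y) / len(Y) ** 2
    kxy = sum(rbf(a, b, gamma) for a in X for b in Y) / (len(X) * len(Y))
    return kxx + kyy - 2.0 * kxy

# Toy feature vectors from a source condition and two target conditions
src = [(0.0, 0.1), (0.1, 0.0), (-0.1, 0.1)]
tgt_near = [(0.05, 0.05), (0.0, 0.0)]   # similar working condition
tgt_far = [(2.0, 2.0), (2.1, 1.9)]      # shifted working condition
```

In discrepancy-based TL this quantity is added to the training loss, so minimizing it pulls the source and target feature distributions together.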
To improve model performance for imbalanced and transferable FD, Lu and Yin (2021) designed a weakly supervised convolutional AE (CAE) model to learn representations from multi-domain data. Liao et al. (2020) presented a deep semi-supervised domain generalization network, which showed excellent generalization performance in rotary machinery fault diagnosis under unseen speeds.
3.2.3 Parameter-based TL
The third category of TL is parameter-based TL, which assumes that the source and target tasks share certain knowledge at the model level, encoded in the architecture and parameters of the model pre-trained on the source domain. It is motivated by the fact that retraining a model from scratch requires substantial data and time, whereas directly transferring pre-trained parameters and fine-tuning them in the target domain is far more efficient. Accordingly, there are two main implementations depending on how the transferred parameters are used in target model training: full fine-tuning (or freezing) and partial fine-tuning (or freezing), as shown in Fig. 11.
Full fine-tuning (or freezing) means that all parameters transferred from the source domain are fine-tuned with limited labelled data from the target domain, or that those parameters are frozen without updating during the training of the target model. Conversely, partial fine-tuning (or freezing) is the selective fine-tuning of only specific upper layers or parameters, keeping the lower-layer parameters consistent with the pre-trained model. In both cases, the classifier or predictor of the target model needs to be retrained with randomly initialized parameters to align with the number of classes or data distribution of the target task. The full fine-tuning (or freezing) approach is particularly applicable when the source and target domain samples exhibit a high degree of similarity, so that general features can be extracted from the target domain using the pre-trained parameters (Cho et al. 2020; He et al. 2019, 2020b; Zhiyi et al. 2020; Wu and Zhao 2020; Peng et al. 2021; Zhang et al. 2018; Che et al. 2020; Cao et al. 2018; Wen et al. 2020, 2019). From the perspective of pre-trained model size and fine-tuning time, the full fine-tuning and full freezing strategies are suitable for small and large models, respectively. For example, He et al. (Zhiyi et al. 2020) achieved knowledge transfer between bearings mounted on different machines by fully fine-tuning the pre-trained parameters with few target training samples. In Wen et al. (2020, 2019), researchers applied deep convolutional neural networks (CNNs), namely ResNet-50 (a 50-layer CNN) and VGG-19 (a 19-layer CNN) pre-trained on ImageNet, as feature extractors, and trained target FD models using full freezing methods. In contrast, partial fine-tuning (or freezing) strategies are more suitable for handling cases with significant domain differences (Wu et al. 2020; Zhang et al. 2020d; Yang et al. 2021; Brusa et al. 2021; Li et al. 2021b), such as transfer between complex working conditions (Wu et al. 2020) and multimodal data sources (Brusa et al. 2021). In addition, Kim and Youn (2019) introduced an innovative approach known as selective parameter freezing (SPF), in which only a portion of the parameters within each layer is frozen; this enables explicit selection of output-sensitive parameters from the source model, reducing the risk of overfitting the target model under limited data conditions.
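The freezing-versus-fine-tuning contrast can be illustrated with a deliberately tiny two-parameter "network": the lower-layer weight stays frozen at its pre-trained value while only the task head is updated on a few labelled target samples. This is a conceptual sketch under assumed toy values, not any of the cited architectures.

```python
def predict(x, w1, w2):
    """Two-layer linear 'network': h = w1 * x (frozen extractor), y = w2 * h."""
    return w2 * (w1 * x)

def partial_finetune(data, w1_pretrained, w2_init, lr=0.05, epochs=100):
    """Partial fine-tuning sketch: the lower-layer parameter w1 is frozen at
    its pre-trained value; only the head w2 is updated by gradient descent
    on the squared loss over the few labelled target samples."""
    w1, w2 = w1_pretrained, w2_init
    for _ in range(epochs):
        for x, y in data:
            y_hat = predict(x, w1, w2)
            grad_w2 = 2.0 * (y_hat - y) * (w1 * x)  # dL/dw2 for squared loss
            w2 -= lr * grad_w2                      # w1 is never updated
    return w1, w2

# Few labelled target samples following y = 6x; pre-trained extractor w1 = 2,
# so the retrained head should converge to w2 = 3
target_data = [(1.0, 6.0), (2.0, 12.0)]
w1, w2 = partial_finetune(target_data, w1_pretrained=2.0, w2_init=0.0)
```

Only one scalar had to be learned from the two target samples, which mirrors why freezing lower layers reduces the data needed for target training.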
3.2.4 Epilog
The TL framework breaks the assumption in traditional DL that training and test data are identically distributed, and compensates for the lack of labeled data in the target domain by acquiring and transferring knowledge from large amounts of easily collected data. As summarized in Table 6, instance-based TL can be regarded as a form of borrowed augmentation, wherein other datasets with similar distributions are utilized to enrich the samples in the target domain. Among these techniques, DWA strategies demonstrate superiority in handling insufficient labeled target data and imbalanced data, whereas their drawbacks of high computational cost and strong dependence on similar distributions need further optimization. By comparison, feature-based TL performs knowledge transfer by learning general fault representations and can handle domain-adaptation and domain-generalization tasks with large distribution differences, such as transfers between distinct working conditions (He et al. 2020a), transfers between diverse components (Yang et al. 2019a), or even transfers from simulated to physical processes (Li et al. 2020b). Weakly supervised feature reduction techniques are capable of adaptively discovering better feature representations and show great potential in open domain generalization problems. Finally, parameter-based TL saves the target model from being retrained from scratch, but the effectiveness of the transferred parameters hinges on the size and quality of the source samples, and model pre-training on multi-source domain data can be considered (Li et al. 2023b; Tang et al. 2021).
3.3 Few-shot learning methods
DA and TL methods both require that the training dataset contain a certain number (ranging from dozens to hundreds) of labeled samples. However, in some industrial cases, samples of specific classes (such as incipient failures or compound faults) may be exceptionally rare and inaccessible, with only a handful of samples (e.g., 5–10) per category available for DL model training, resulting in poor model performance on such “few-shot” problems (Song et al. 2022). Inspired by the human ability to learn and reuse prior knowledge from previous tasks, which Jürgen Schmidhuber initially named meta-learning (Schmidhuber 1987), FSL methods have been proposed to learn a model that can be trained and quickly adapted to tasks with only a few examples. As shown in Fig. 12, there are some differences between traditional DL, TL, and FSL methods: (1) traditional DL and TL are trained and tested on data points from a single task, while FSL methods often learn at the task level; (2) traditional DL requires large amounts of labeled training and test data, and TL requires large amounts of labeled training data in the source domain, while FSL methods perform meta-training and meta-testing with limited data. The organization of FSL tasks follows the “N-way K-shot Q-query” protocol (Thrun and Pratt 2012), where N categories are randomly selected, and K support samples and Q query samples are randomly drawn from each category for each task. The objective of FSL is to combine previously acquired knowledge from multiple meta-training tasks with a few support samples to predict the classes of query samples during the meta-test. Based on the way prior knowledge is learned, metric-, optimization-, and attribute-based FSL methods are primarily discussed below.
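The "N-way K-shot Q-query" protocol can be sketched as a simple episode sampler; the fault-class names and pool sizes below are illustrative.

```python
import random

def make_episode(dataset, n_way=3, k_shot=2, q_query=2, seed=0):
    """Build one N-way K-shot Q-query task: sample N classes, then K support
    and Q query examples per class (disjoint within each class)."""
    rng = random.Random(seed)
    classes = rng.sample(sorted(dataset), n_way)
    support, query = [], []
    for label in classes:
        samples = rng.sample(dataset[label], k_shot + q_query)
        support += [(x, label) for x in samples[:k_shot]]
        query += [(x, label) for x in samples[k_shot:]]
    return support, query

# Toy pool: 4 hypothetical fault classes with 6 signal IDs each
pool = {c: [f"{c}_{i}" for i in range(6)]
        for c in ["normal", "inner", "outer", "ball"]}
support, query = make_episode(pool)
```

Meta-training repeatedly draws such episodes from seen classes, so the model practices the same "classify queries from K supports" problem it will face at meta-test time.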
3.3.1 Metric-based FSL
Metric-based FSL learns prior knowledge by measuring sample similarities and consists of two components: a feature embedding module responsible for mapping samples to feature vectors, and a metric module that computes similarity (Li et al. 2021). Siamese Neural Networks are among the pioneering approaches, initially proposed by Koch et al. (2015) for one-shot image recognition; they used two parallel CNNs and the L1 distance to determine whether paired inputs are identical. Subsequently, Vinyals et al. (2016) introduced Matching Networks, which employ long short-term memory (LSTM) with attention mechanisms for effective assessment of multi-class similarity; Snell et al. (2017) developed Prototypical Networks to calculate distances to class prototype representations; and Relation Networks (Sung et al. 2018) utilized an adaptive neural network in place of traditional distance functions. Table 7 lists the differences between these representative approaches in terms of embedding modules and metric functions.
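The Prototypical Networks decision rule can be sketched in a few lines, assuming some feature extractor has already embedded the support and query samples (the 2-D vectors below are toy values, not real embeddings).

```python
def prototype(vectors):
    """Class prototype: the mean of the embedded support vectors."""
    n = len(vectors)
    return tuple(sum(v[i] for v in vectors) / n for i in range(len(vectors[0])))

def classify(query_vec, prototypes):
    """Prototypical-Networks-style decision: assign the query embedding to
    the class whose prototype is nearest in squared Euclidean distance."""
    return min(
        prototypes,
        key=lambda c: sum((q - p) ** 2 for q, p in zip(query_vec, prototypes[c])),
    )

# Embedded support samples for two hypothetical bearing fault classes
support = {"inner_race": [(0.9, 0.1), (1.1, -0.1)],
           "outer_race": [(-1.0, 0.2), (-1.2, 0.0)]}
protos = {c: prototype(v) for c, v in support.items()}
pred = classify((0.8, 0.0), protos)
```

Because only the prototypes depend on the support set, adding a new fault class at meta-test time requires computing one mean vector rather than retraining the network.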
According to current studies, two forms of metric-based FSL methods are applied in PHM tasks. The first utilizes fixed metrics (e.g., cosine distance) for measuring similarity, while the second leverages learnable metrics, such as the neural network of Relation Networks. For example, Zhang et al. (2019) first introduced a wide-kernel deep CNN-based Siamese Network for the FD of rolling bearings, which achieved excellent performance with limited data under different working conditions. Since then, various FSL algorithms based on Siamese networks (Li et al. 2022c; Zhao et al. 2023; Wang and Xu 2021), matching networks (Xu et al. 2020; Wu et al. 2023; Zhang et al. 2020e), and prototypical networks (Lao et al. 2023; Jiang et al. 2022; Long et al. 2023; Zhang et al. 2022c) have been developed for PHM tasks. Zhang et al. (2020e) designed an iterative matching network combined with a selective signal reuse strategy for the few-shot FD of wind turbines. Jiang et al. (2022) developed a two-branch prototype network (TBPN) model, which integrated both time and frequency domain signals to enhance fault classification accuracy. Relation Networks have shown superiority over fixed-metric FSL methods when measuring samples from different domains, and they are therefore widely applied to cross-domain few-shot tasks (Lu et al. 2021; Wang et al. 2020b; Luo et al. 2022; Yang et al. 2023a; Tang et al. 2023b). To illustrate, Lu et al. (2021) treated the FD of rotating machinery with limited data as a similarity metric learning problem and introduced Relation Networks into the TL framework as a solution. Luo et al. (2022) proposed a Triplet Relation Network method for cross-component few-shot FD tasks, and Tang et al. (2023b) designed a novel lightweight relation network for performing cross-domain few-shot FD tasks with high efficiency. Furthermore, to address domain shift issues resulting from diverse working conditions, Feng et al. (2021) integrated a similarity-based meta-learning network with domain-adversarial training for cross-domain fault identification.
3.3.2 Optimization-based FSL
Optimization-based FSL methods adhere to the “learning to optimize” principle to solve overfitting problems arising from small samples. Specifically, these techniques learn good global initialization parameters across various tasks, allowing the model to quickly adapt to new few-shot tasks during the meta-test (Parnami and Lee 2022). Taking the best-known model-agnostic meta-learning (MAML) algorithm (Finn et al. 2017) as an example, optimization-based FSL typically follows a two-loop learning process: a task-specific model (base learner) is first learned for a given task in the inner loop, and a meta-learner is then learned over a distribution of tasks in the outer loop, with the meta-knowledge embedded in the model parameters and subsequently used as the initialization for meta-test tasks. MAML is compatible with any model trained using gradient descent, allowing models to generalize well to new few-shot tasks without overfitting.
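The two-loop process can be sketched with a first-order MAML variant on a toy one-parameter regression family (tasks of the form y = a·x); this simplification drops the second-order gradients of full MAML, and all data below are invented for illustration.

```python
def grad_loss(theta, data):
    """Gradient of the mean squared error of the model y_hat = theta * x."""
    return sum(2.0 * (theta * x - y) * x for x, y in data) / len(data)

def fomaml(tasks, theta=0.0, inner_lr=0.05, outer_lr=0.02, steps=300):
    """First-order MAML sketch: the inner loop adapts theta to each task with
    one gradient step on its support set; the outer loop moves the shared
    initialization using the adapted parameter's query-set gradient."""
    for _ in range(steps):
        for support, query in tasks:
            theta_task = theta - inner_lr * grad_loss(theta, support)  # inner loop
            theta -= outer_lr * grad_loss(theta_task, query)           # outer loop
    return theta

# Two toy 'degradation' tasks, y = 2x and y = 4x, as (support, query) pairs
tasks = [([(1.0, 2.0)], [(2.0, 4.0)]),
         ([(1.0, 4.0)], [(2.0, 8.0)])]
theta0 = fomaml(tasks)
```

The learned initialization settles between the two task optima, so a single inner-loop step on one support sample moves it markedly closer to whichever new task is presented, which is the fast-adaptation behaviour MAML is designed for.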
Recent literature highlights the potential of MAML in PHM, mainly focusing on meta-classification and meta-regression. Meta-classification methods aim to learn an optimized classification model from multiple meta-training tasks that can accurately classify novel classes in the meta-test with only a few support samples, and they are typically used for AD (Chen et al. 2022) and FD tasks (Li et al. 2021c, 2023c; Hu et al. 2021b; Lin et al. 2023; Yu et al. 2021b; Chen et al. 2023b; Zhang et al. 2021; Ren et al. 2024). For example, Li et al. (2021c) proposed a MAML-based meta-learning FD technique for bearings under new conditions by exploiting the prior knowledge of known working conditions. To further improve meta-learning capabilities, advanced models such as task-sequencing MAML (Hu et al. 2021b) and meta-transfer MAML (Li et al. 2023c) have been designed for few-shot FD tasks, and a meta-learning based domain generalization framework was proposed to alleviate both low-resource and domain shift problems (Ren et al. 2024). On the other hand, meta-regression methods target prediction tasks in PHM, with the goal of predicting continuous variables from limited input samples using meta-optimized models derived from analogous regression tasks (Li et al. 2019, 2022d; Ding et al. 2021, 2022a; Mo et al. 2022; Ding and Jia 2021). Li et al. (2019) first explored the application of MAML to RUL prediction with small-size data in 2019, designing a fully connected neural network (FCNN)-based meta-regression model for predicting tool wear under varying cutting conditions. In addition, MAML has also been integrated into reinforcement learning for fault control under degraded conditions; more insights can be found in Dai et al. (2022), Yu et al. (2023).
3.3.3 Attribute-based FSL
There is also a unique paradigm of FSL known as “zero-shot learning” (Yang et al. 2022), where models are used to predict classes for which no samples were seen during meta-training. In this setup, auxiliary information is necessary to bridge the information gap for unseen classes caused by the absence of training data. The supplementary information must be valid, unique, and representative so that it can effectively differentiate the various classes, such as attribute information for images in computer vision. As shown in Fig. 13, the classes of unseen animals are inferred by transferring between-class attributes, such as semantic descriptions of the animals’ shape, voice, or habitat, whose effectiveness has been validated in many zero-shot tasks (Zhou et al. 2023b).
Attribute-based FSL offers a potential solution to the zero-sample problem in PHM tasks. However, visual attributes cannot be used directly because they do not match the physical meaning of sensor signals; for this reason, scholars have worked on defining effective fault attributes. Given that fault-related semantic descriptions can be easily obtained from maintenance records and can be defined for specific faults in practice, semantic attributes are widely used in current research (Zhuo and Ge 2021; Feng and Zhao 2020; Xu et al. 2022; Chen et al. 2023c; Xing et al. 2022). For example, Feng and Zhao (2020) pioneered the implementation of zero-shot FD based on the transfer of fault description attributes, which included failure position, fault causes, and consequences, providing auxiliary knowledge for the target faults. Xu et al. (2022) devised a zero-shot learning framework for compound FD, whose semantic descriptor can define distinct fault semantics for singular and compound faults. Fan et al. (2023b) proposed an attribute fusion transfer method for zero-shot FD with new fault modes. Despite the strides made in description-driven semantic attributes, certain limitations exist, including reliance on expert insights and inaccurate information sources. More recently, attributes without semantic information (termed non-semantic attributes) have also been explored in Lu et al. (2022), Lv et al. (2020). Lu et al. (2022) developed a zero-shot intelligent FD system by employing statistical attributes extracted from the time and frequency domains of signals.
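The attribute-transfer idea can be sketched as nearest-signature matching: a hypothetical attribute predictor (trained on seen classes) outputs continuous attribute scores, and the unseen class with the closest human-defined attribute signature is chosen. The attribute names and values below are invented for illustration only.

```python
def predict_class(pred_attributes, attribute_table):
    """Zero-shot decision sketch: return the unseen class whose attribute
    signature is nearest (L1 distance) to the predicted attribute scores."""
    def l1(a, b):
        return sum(abs(x - y) for x, y in zip(a, b))
    return min(attribute_table, key=lambda c: l1(pred_attributes, attribute_table[c]))

# Hypothetical binary fault attributes: (inner-race position, impulsive, high-frequency)
unseen_classes = {
    "inner_race_fault": (1, 1, 1),
    "outer_race_fault": (0, 1, 0),
}
# Attribute scores predicted from a signal of an unseen fault class
pred = predict_class((0.9, 0.8, 0.7), unseen_classes)
```

No training sample of either class is needed at decision time; all class-specific knowledge enters through the expert-defined attribute table, which is both the strength and the limitation noted above.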
3.3.4 Epilog
FSL methods are advantageous in solving small data problems with extremely limited samples, such as only five, one, or even zero samples per class in each task. As listed in Table 8, metric-based FSL methods are concise in their principles and computation, and they shift the focus from sample quantity to intrinsic similarity, but their reliance on labeled data for training feature embeddings confines them to supervised settings. Optimization-based FSL methods, particularly those underpinned by MAML, boast broader applications including fault classification, RUL prediction, and fault control, but these techniques need substantial computational resources for the gradient optimization of deep networks, and balancing the optimization of parameters against model training speed is the key (Hu et al. 2023). Attribute-based FSL is an emerging but promising research topic with huge potential to reduce the cost of data collection in industry, since zero-shot learning enables models to generalize to new failure modes or conditions without retraining, achieving intelligent prognostics for complex systems even with “zero” abnormal or fault samples. In industry, few-shot problems are often accompanied by domain shift caused by varying speed and load conditions. This is a more difficult setting that challenges traditional FSL methods to learn fault features representative enough to adapt and generalize to unseen data distributions, and research in this area has only recently begun (Liu et al. 2023).
4 Discussion of problems in PHM applications
Different PHM tasks have distinct goals and characteristics, thus producing various forms of small data problems and requiring corresponding solutions. Therefore, based on the methods discussed in Sect. 3, this section further explores the specific issues and remaining challenges from the perspective of PHM applications. The distribution of specific issues and corresponding methods for each task is shown in Fig. 14.
4.1 Small data problems in AD tasks
4.1.1 Main problems and corresponding solutions
In industrial applications, the amount of abnormal data is much less than normal data, which seriously hinders the development of accurate anomaly detectors. According to the statistics in Fig. 14, current research on small data in AD tasks focuses on three core issues: class imbalance, incomplete data, and poor model generalization, which have different impacts on AD tasks. Specifically, class imbalance may cause the model to be biased towards the normal class, which reduces the sensitivity of detecting rare anomalies; incomplete data can make it difficult for the model to distinguish between normal variations and true anomalies when key features are missing; and the problem of poor model generalization may lead to false-positives or false-negatives, which reduces the overall reliability of the anomaly detection system.
To address the above class imbalance problems, existing studies demonstrate that directly increasing the samples of minority classes through DA techniques yields positive results. In our survey of the literature, two papers (Fan et al. 2020; Rajagopalan et al. 2023) introduced optimized SMOTE algorithms, and one study applied GAN-based DA methods to AD tasks for wind turbines. These methods facilitated the generation of additional anomalous samples and enhanced model accuracy while minimizing false positive rates. Another prevalent challenge in AD tasks is incomplete data, stemming from faulty sensors, inaccurate measurements, or different sampling rates. Deep generative models, with their superior learning capabilities, have been widely used to improve the information density of incomplete data (Guo et al. 2020; Yan et al. 2022). To address inadequate model generalization when confronted with limited labeled training samples, Michau and Fink (2021) proposed an unsupervised TL framework. Notably, the majority of AD methods advanced in current research are rooted in unsupervised learning models, such as AE, with wide applications involving electric motors (Rajagopalan et al. 2023), process equipment (Guo et al. 2020), and wind turbines (Liu et al. 2019).
4.1.2 Remaining challenges
AD is an integral and fundamental task in equipment health monitoring, where the difficulty lies in dealing with a complex set of data and various anomalies (Pang et al. 2021). Though existing research has provided valuable insights into addressing small data challenges, certain unresolved issues warrant further exploration.
4.1.2.1 Adaptability of detection models
The majority of AD algorithms are domain-dependent and designed for specific anomalies and conditions. However, industrial production constitutes a dynamic and nonlinear process, where changes in variables such as environment, speed, or load may lead to data drift and novel anomalies. For small datasets, even minor changes in the underlying patterns can have a pronounced impact on the dataset’s characteristics, thus degrading anomaly detection performance of models. To address these issues, it is imperative to improve the adaptability of detection models by using adaptive optimizers and learners, such as the online adaptive Recurrent Neural Network proposed in Fekri et al. (2021), which had the capability to learn from newly arriving data and adapt to novel patterns.
4.1.2.2 Real-time anomaly detection
Real-time capability is always a desirable property of detection models: it ensures that anomalies are detected and reported to the operator in a timely manner so that corresponding decisions can be made quickly, which is especially important for complex equipment such as UAVs (Yang et al. 2023b). The deployment of lightweight network architectures and edge computing technologies holds promise for realizing real-time detection capabilities.
4.2 Small data problems in FD tasks
4.2.1 Main problems and corresponding solutions
Accuracy is one of the most important metrics for evaluating models’ performance in classifying different types of faults, but it is strongly influenced by the size of the fault data. As shown in Fig. 14, the small data challenge in FD has received the most extensive research attention compared to AD and RUL prediction tasks. The small data problem also manifests itself in a richer variety of ways, including limited labeled training data, class imbalance, incomplete data, low data quality, and poor generalization. Specifically, limited labeled training data increases the risk of model overfitting and poses a challenge in capturing variations in fault conditions; class imbalance leads to lower sensitivity to rare and unseen faults; incomplete data leads to incomplete extraction of fault features; low-quality data misleads the diagnostic model into generating false positives or false negatives; and poor generalization capability limits the applicability of the model to different operating conditions and equipment.
To address the scarcity of labeled training data, two practical solutions emerge: using samples within existing datasets, and borrowing from external data sources. The former involves employing already acquired signals to generate samples that adhere to the same data distribution; following this idea, five and eight surveyed papers have utilized transform-based and deep generative model-based DA methods, respectively, with 1-D vibration signals as input. The latter involves three main techniques: reusing samples from other domains through instance-based TL, obtaining available features via feature-based TL, and utilizing attribute representations through attribute-based FSL. According to the statistics of the surveyed papers, feature-based approaches were employed 15 times for cross-domain scenarios, and attribute-based methods were chosen 7 times for predicting novel classes with zero training samples. Data imbalance is another common problem in FD, with 16 articles retrieved on this topic, most of which apply deep generative models to address inter-class imbalance problems. In addition to the issues discussed above, data quality problems such as incomplete data and noisy labels have also gained attention, with two and three papers based on deep generative models being presented, respectively.
Secondly, for the issues caused by limited data at the model level, such as overfitting, diminished accuracy, and weakened generalization, researchers have also proposed various solutions. These include 12 papers using parameter-based TL methods, 14 papers applying metric-based FSL methods, and 8 papers using MAML-based FSL approaches. Among these, parameter-based TL methods leverage the knowledge within the structure and parameters of models to decrease training time, metric-based FSL alleviates the requirement for sample size by learning category similarities, and MAML-based FSL achieves fast adaptation to novel FD tasks by using meta-learned knowledge. These successful applications also demonstrate the potential of integrating the TL and FSL paradigms to improve model accuracy and generalizability.
4.2.2 Remaining challenges
The data-level and model-level approaches proposed above have made significant progress in solving the small data problems in FD tasks. However, there are still some challenges that need to be addressed urgently.
4.2.2.1 Quality of small data
In our survey of 107 studies on FD tasks, most focused on solving sample size problems; only five papers investigated data quality issues in small data challenges. It is important for researchers to realize that a voluminous collection of irrelevant samples is far inferior to a small yet high-quality dataset for FD tasks. The poor quality of small data stems from both samples and labels, including but not limited to missing data, noise and outliers in signal measurement, and errors during labeling. Consequently, there is a large research gap in factor analysis, data quality assessment, and data enhancement.
4.2.2.2 System-level FD with limited data
The majority of current algorithms for handling small data problems focus on component-level FD, as evidenced by their applications to bearings (Zhang et al. 2020a; Yu et al. 2020, 2021a) and gears (Zhao et al. 2020b). However, these methods cannot meet the diagnostic demands of intricate industrial systems composed of multiple components. Thus, developing intelligent models to perform system-level FD with limited data requires more exploration.
4.3 Small data problems in RUL prediction tasks
4.3.1 Main problems and corresponding solutions
The paradox inherent in RUL prediction lies in its aim to estimate the degradation trends of equipment based on historical monitoring data, whereas run-to-failure data are difficult to obtain. This paradox has motivated scholars to recognize the significance of small data issues within prognostic tasks. Among the 27 reviewed papers, the problems of limited labeled training data, class imbalance, incomplete data, and poor model generalization are mainly studied. While these issues are similar to those in FD tasks, they have different implications for RUL prediction due to its continuous label space (Ding et al. 2023a). Specifically, limited labeled training data makes it difficult to learn sufficiently robust representations of health indicators; class imbalance may lead to more frequent prediction of non-failure events and produce conservative estimates; missing information in incomplete data further increases prediction uncertainty; and poor generalization capability reduces the compatibility of the model with different operating conditions or devices.
RUL prediction is a typical regression task, wherein the quantity of training data profoundly influences the feature learning and nonlinear fitting abilities of DL models. To address the challenge of limited labeled training data, solutions include transform-based (Fu et al. 2020; Sadoughi et al. 2019; Gay et al. 2022) and generative model-based (Zhang et al. 2020b) DA methods, alongside instance-based (Zhang et al. 2020c; Ruan et al. 2022) and feature-based (Xia et al. 2021; Mao et al. 2021, 2020) TL methods. Among the reviewed papers, three sampling-based DA methods and one deep generative model-based DA approach have been reported to alleviate class imbalance problems; for instance, an adaptive synthetic over-sampling strategy was proposed in Liu and Zhu (2020) for tool wear prediction with imbalanced truncation data. Another major challenge of RUL prediction is incomplete time-series data, which is treated as an imputation problem: the GAN-based methods proposed in Huang et al. (2022), Wenbai et al. (2021) achieved the imputation of missing data by automatically learning the correlations within time series. Model-level solutions based on TL and FSL have also been employed to enhance the generalization of predictive models across domains when faced with limited time series samples. Notably, MAML-based few-shot prognostics (Li et al. 2019, 2022d; Ding et al. 2021, 2022a; Mo et al. 2022; Ding and Jia 2021) have recently demonstrated substantial advancements within the PHM field. In addition, LSTM has become a popular benchmark model for RUL prediction tasks due to its proficiency in capturing long-term dependencies, and the combination of LSTM with CNN has extended the capability of learning degradation patterns (Wenbai et al. 2021; Xia et al. 2021).
4.3.2 Remaining challenges
Significant strides have been made in addressing the challenges of limited data in RUL prediction. However, many of the proposed methods rely, to varying degrees, on assumptions that might not hold in real-world conditions. To achieve more reliable forecasts, several major challenges must be addressed.
4.3.2.1 Interpretability of prognostic models
Although numerous prognostic models have shown impressive predictive performance, many remain poorly interpretable. The inherent "black box" nature of DL models limits the desired interpretability, transparency, and causal insight into both the models and their outcomes. Consequently, within RUL prediction, interpretability is much needed to reveal the underlying degradation mechanisms hidden in the monitoring data, thereby increasing the level of "trust" in intelligent models.
4.3.2.2 Uncertainty quantification in small data conditions
Uncertainty quantification (UQ) is an important dimension of the PHM framework that can improve the quality of RUL prediction through risk assessment and management. Uncertainty in RUL predictions can be categorized into aleatory and epistemic uncertainty (Kiureghian and Ditlevsen 2009): the first type often results from noise inherent in the data, such as noise in signal measurements, while the second is attributed to deficient model knowledge, including the model architecture and parameters. As discussed above, the impacts of the small data challenge at the data level (incomplete data and unbalanced distributions) and the model level (poor generalization) both further increase the uncertainty of predictive results, yet few studies have examined UQ under small data conditions. Existing research on the UQ of intelligent RUL predictions mainly applies Gaussian process regression and Bayesian neural networks. For example, Ding et al. (2023b) designed a Bayesian approximation enhanced probabilistic meta-learning method to reduce parameter uncertainty in few-shot prognostics. A recent study (Nemani et al. 2023) demonstrates that physics-informed ML, which combines physics-based and data-driven modeling, is promising for the UQ of RUL predictions under small data conditions.
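To make the flavour of epistemic UQ concrete, the following sketch estimates a prediction interval for a RUL-style extrapolation. A bootstrap ensemble of linear fits stands in for the Gaussian process and Bayesian models cited above, and all names and the toy degradation data are hypothetical.

```python
import random

def fit_linear(xs, ys):
    """Ordinary least-squares fit of y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    den = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / den if den else 0.0
    return a, my - a * mx

def bootstrap_rul_interval(xs, ys, x_query, n_boot=200, seed=0):
    """Estimate epistemic uncertainty of a prediction at x_query with a
    bootstrap ensemble of linear degradation fits: returns the ensemble
    mean and an approximate 90% interval."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_boot):
        idx = [rng.randrange(len(xs)) for _ in xs]  # resample with replacement
        a, b = fit_linear([xs[i] for i in idx], [ys[i] for i in idx])
        preds.append(a * x_query + b)
    preds.sort()
    return (sum(preds) / n_boot,
            preds[int(0.05 * n_boot)],
            preds[int(0.95 * n_boot)])

# Noisy linear degradation: a health index falls steadily toward failure.
cycles = list(range(20))
health = [100 - 2 * t + 0.5 * (-1) ** t for t in cycles]
mean, lo, hi = bootstrap_rul_interval(cycles, health, x_query=25)
```

With fewer monitoring points, the ensemble disagrees more and the interval widens, which is exactly the behaviour UQ methods exploit under small data conditions.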
5 Datasets and experimental settings
A growing number of methods have been proposed for small data problems in the PHM domain, but unified criteria for their fair and valid evaluation are still lacking, largely because of the complexity and variability of the equipment and working conditions under study. To this end, we analyze and distill two key elements of model evaluation in current studies, namely datasets and small data settings, which are summarized in this section to guide the effective evaluation of existing models.
5.1 Datasets
In the past decade, the PHM community has released many diagnostic and prognostic benchmarks covering different mechanical objects, such as bearings (Smith and Randall 2015; Lessmeier et al. 2016; Qiu et al. 2006; Nectoux et al. 2012; Wang et al. 2018a; Bechhoefer 2013), gearboxes (Shao et al. 2018; Xie et al. 2016), turbofan engines (Saxena et al. 2008), and cutting tools (Agogino and Goebel 2007). Table 9 lists several datasets that have been widely used in existing research on small data problems, outlining their signal types, failure modes, numbers of operating conditions, applicable PHM tasks, and key features.
Different datasets exhibit distinct characteristics and are therefore suited to studying different problems. Depending on how the fault data are generated, these datasets can be broadly categorized into simulated fault datasets, real fault datasets, and hybrid datasets. The simulated fault datasets (Smith and Randall 2015; Qiu et al. 2006; Wang et al. 2018a; Shao et al. 2018; Xie et al. 2016; Saxena et al. 2008; Bronz et al. 2020; Downs and Vogel 1993) obtain fault samples through artificially induced faults or simulation software; because the experimental process involves few, human-controlled variables, the fault characteristics and degradation modes in the data are relatively simple, and DL models often achieve excellent performance on them. A typical example is the Case Western Reserve University (CWRU) dataset (Smith and Randall 2015), a well-known benchmark of this kind that is widely used for small data problems in AD, FD, and RUL prediction tasks. The CWRU dataset features multiple failure modes, unbalanced classes, different bearings, and various operating conditions, providing opportunities to study limited labeled training data (Ding et al. 2019), class imbalance (Mao et al. 2019), incomplete data (Yang et al. 2020a), and equipment degradation under various conditions (Kim and Youn 2019; Li et al. 2021c).
In contrast, real fault datasets (Nectoux et al. 2012; Agogino and Goebel 2007) collect failure samples from equipment undergoing natural degradation, which is often accompanied by many uncontrollable factors from the equipment itself and the external environment, resulting in more complex data distributions. These datasets are generally used to validate the robustness of small-data solutions under practical conditions (Sadoughi et al. 2019). Hybrid datasets (Lessmeier et al. 2016; Bechhoefer 2013) contain both artificially damaged and naturally damaged fault data and are used to validate transfer across objects, across working conditions, and from laboratory to real environments (Wang et al. 2020b).
Further, in terms of the types of signals contained in the datasets, vibration signals, sound signals, electric currents, and temperatures are the most common. These varied signals open up avenues for developing multi-source data fusion techniques (Yan et al. 2023). In addition, some datasets include not only single faults but also compound faults, facilitating the study of compound-fault diagnosis and prognosis. Moreover, as shown in Table 9, most datasets collect signals from individual components; samples from subsystems (Shao et al. 2018) or entire systems (Downs and Vogel 1993) are still needed for system-level diagnostics and prognostics.
5.2 Experimental setups
When executing intelligent PHM tasks, the general experimental procedure for DL models is to first divide the dataset into training, validation, and test sets according to a certain ratio. Simulating limited-data scenarios, however, requires deliberately constructing "small data" settings. Two popular strategies are used in current studies, as shown in Table 10.
5.2.1 Setting a small sample size
The most direct and commonly employed setup for studying small data problems is to reduce the number of training or test samples to a few or a few dozen by selecting a tiny subset of the entire dataset. For example, 2.5% of the dataset was used for training in Xing et al. (2022), meaning only five fault samples per class were provided to the model, far fewer than the hundreds or thousands of samples required by traditional DL methods. Owing to its ease of implementation and interpretation, this strategy has been widely used in AD, FD, and RUL prediction tasks with limited data, and it appears in most experiments using DA and TL methods. However, what counts as a "small sample" is relative to the total dataset size and lacks a unified standard, so it should be kept consistent when comparing methods.
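A minimal sketch of this subset selection follows; the function and variable names are illustrative rather than drawn from any cited study.

```python
import random

def make_small_subset(samples, labels, k_per_class, seed=0):
    """Simulate a limited-data regime by drawing k labelled samples
    per class from the full dataset."""
    rng = random.Random(seed)
    by_class = {}
    for s, y in zip(samples, labels):
        by_class.setdefault(y, []).append(s)
    subset, subset_labels = [], []
    for y in sorted(by_class):
        subset.extend(rng.sample(by_class[y], k_per_class))
        subset_labels.extend([y] * k_per_class)
    return subset, subset_labels

# 300 samples over 3 fault classes, reduced to 5 samples per class,
# mirroring the "five fault samples of each class" setup cited above.
samples = list(range(300))
labels = [i % 3 for i in range(300)]
small_x, small_y = make_small_subset(samples, labels, k_per_class=5)
```

Fixing the seed keeps the "small" subset identical across method comparisons, which is the consistency requirement noted above.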
5.2.2 Following the N-way K-shot protocol
Another strategy is to treat PHM tasks under limited data conditions as few-shot classification or regression problems. This strategy draws on the organization of FSL methods, which extend the input from the level of individual data points to a task space. As described in Sect. 3.3, each N-way K-shot task consists of N (N ≤ 20) classes, with each class containing K (K ≤ 10) support samples; creating multiple N-way K-shot subtasks provides the training and test episodes for FSL models. On the CWRU dataset, for example, 10-way 1-shot and 10-way 5-shot FD tasks are frequently designed. This setting aligns better with the principles of the FSL framework and proves beneficial for detecting novel faults under unseen conditions. However, tasks must be sampled from a sufficiently large number of categories, otherwise the tasks become homogeneous and model performance degrades.
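An episode sampler following this protocol might look as follows; this is a minimal sketch whose class structure and names are illustrative, not taken from any cited implementation.

```python
import random

def sample_episode(data_by_class, n_way, k_shot, q_query, seed=None):
    """Sample one N-way K-shot episode: K support and Q query samples
    for each of N randomly chosen classes."""
    rng = random.Random(seed)
    classes = rng.sample(sorted(data_by_class), n_way)
    support, query = [], []
    for c in classes:
        picked = rng.sample(data_by_class[c], k_shot + q_query)
        support += [(x, c) for x in picked[:k_shot]]
        query += [(x, c) for x in picked[k_shot:]]
    return support, query

# Ten fault classes with 20 samples each, as in a 10-way 5-shot setup.
data = {c: [f"sig_{c}_{i}" for i in range(20)] for c in range(10)}
support, query = sample_episode(data, n_way=10, k_shot=5, q_query=5, seed=1)
```

Repeatedly calling `sample_episode` yields the stream of subtasks on which an FSL model is meta-trained and then evaluated.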
6 Future research directions
Currently, most intelligent PHM tasks still operate in the small data regime and will continue to do so for a long time. The methods proposed in existing research have made significant progress, but there is still a long way to go to realize data-efficient PHM. For this reason, we propose several directions for further research on small data challenges.
6.1 Data governance
Existing research on the limited data challenge focuses on the quantity of monitoring data, with relatively little attention paid to the quality of the samples. In fact, monitoring data serve as the "raw material" for implementing PHM tasks, and their quality strongly affects the performance of intelligent models as well as the accuracy of maintenance decisions. It is therefore imperative to research theories and methodologies for the governance of industrial data, which involves the quantification, assessment, and enhancement of data quality (Karkošková 2023). An in-depth exploration of these topics ensures that the collected monitoring data meet the data quality requirements set out in the ISO/IEC 25012 standard (Gualo et al. 2021), thereby minimizing the adverse effects of factors such as sensor drift, measurement errors, environmental noise, and label inaccuracies. Data governance is a key lever for steering intelligent PHM from the prevalent model-centric paradigm towards a data-centric one (Zha et al. 2023).
6.2 Multimodal learning
Multimodal learning is a paradigm for training models on multiple data modalities, and it provides a potential means to solve small data problems in PHM. Industry produces monitoring data in rich forms, including but not limited to surveillance videos, equipment images, and maintenance records. These data contain a wealth of intra-modal and cross-modal information, which can be fused by multimodal learning techniques (Xu et al. 2023) to compensate for the low information density of limited unimodal data. Meanwhile, multimodal data from different systems and equipment can help perceive their health status more comprehensively, thus improving intelligent diagnosis and forecasting for an entire fleet of equipment (Jose et al. 2023).
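As one simple illustration of combining information across modalities, the sketch below implements late fusion by weighted score averaging; it is a toy baseline under assumed inputs, not a method from the cited works.

```python
def late_fusion(modality_scores, weights=None):
    """Combine per-modality class scores (e.g., from vibration, current,
    and temperature channels) by weighted averaging, a simple late-fusion
    baseline for multimodal health assessment."""
    if weights is None:
        weights = [1.0 / len(modality_scores)] * len(modality_scores)
    n_classes = len(modality_scores[0])
    return [sum(w * scores[c] for w, scores in zip(weights, modality_scores))
            for c in range(n_classes)]

# Vibration-based scores disagree with current-based scores; fusing them
# resolves the conflict toward the jointly more likely health state.
fused = late_fusion([[0.8, 0.2], [0.4, 0.6]])
```

Richer fusion schemes learn the weights or fuse intermediate features, but even this fixed-weight baseline shows how a second modality can compensate for an uncertain unimodal prediction.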
6.3 Physics-informed data-driven approaches
Existing studies have demonstrated that data-driven approaches, especially those based on DL, excel at capturing underlying patterns in multivariate data but are susceptible to small dataset sizes. Physics model-based methods, by contrast, incorporate mechanisms or expert knowledge during modeling but have limited data processing capabilities. Given the complementary strengths and weaknesses of these two paradigms, an emerging trend is to develop hybrid frameworks that integrate domain knowledge with implicit knowledge extracted from data (Ma et al. 2023), which offers two clear advantages for small data problems. On the one hand, introducing physical knowledge reduces the black-box character of DL models to some extent and enhances the interpretability of PHM decision-making under small samples (Weikun et al. 2023). On the other hand, physical modeling takes known physical laws and principles as prior knowledge, which can reduce the uncertainty and domain bias introduced by small-sample data under complex working conditions; for example, Shi et al. (2022) validated the effectiveness of introducing multibody dynamics simulation into data augmentation for robustness enhancement.
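A minimal sketch of such a hybrid objective follows: a data-fit loss is combined with a hypothetical physics residual term that penalizes deviation from a known degradation rate law. Names and numbers are illustrative only.

```python
def physics_informed_loss(pred, target, d_pred_dt, physics_rate, lam=0.5):
    """Hybrid loss: a data-fit MSE term plus a penalty on deviation of the
    predicted degradation rate from a known physical rate (the rate values
    here stand in for, e.g., a crack-growth model's output)."""
    n = len(pred)
    data_loss = sum((p - t) ** 2 for p, t in zip(pred, target)) / n
    phys_loss = sum((dp - r) ** 2 for dp, r in zip(d_pred_dt, physics_rate)) / n
    return data_loss + lam * phys_loss

loss = physics_informed_loss(
    pred=[1.0, 2.0], target=[1.0, 2.5],             # data-fit residuals
    d_pred_dt=[0.1, 0.1], physics_rate=[0.1, 0.2],  # physics residuals
)
```

The weight `lam` controls how strongly the physical prior constrains the fit; with few samples, a larger weight lets the prior substitute for missing data.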
6.4 Weakly supervised learning
DL-based models have demonstrated much potential in numerous PHM tasks, but their performance relies heavily on supervised learning, and the need for abundant annotated data is a significant barrier to deploying these models in industry. Obtaining high-quality labeled data is time-intensive and expensive, whereas unlabeled data are far more readily available in practice. This reality has spurred the exploration of unsupervised and self-supervised learning techniques that build models autonomously from unlabeled data. Weakly supervised strategies have been successfully employed in computer vision and natural language processing, and their potential in PHM tasks has been explored by Zhao et al. (2021b) and Ding et al. (2022b), whose results illustrate that these methods excel at addressing open-set diagnostic and prognostic problems with small data.
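One simple weakly supervised strategy, confidence-based pseudo-labeling, can be sketched as follows; the classifier and threshold here are toy placeholders, not taken from the cited studies.

```python
def pseudo_label(unlabeled, classify, threshold=0.9):
    """Weakly supervised enlargement of the training set: keep only the
    unlabeled samples whose predicted class probability clears a
    confidence threshold, paired with that predicted label."""
    labelled = []
    for x in unlabeled:
        probs = classify(x)
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= threshold:
            labelled.append((x, best))
    return labelled

# A toy two-class "model": confident on positive inputs, unsure otherwise.
def toy_classifier(x):
    return [0.95, 0.05] if x > 0 else [0.6, 0.4]

selected = pseudo_label([1, 2, -1], toy_classifier)
```

The confidently predicted samples are then merged with the scarce labeled set for a further round of training, letting unlabeled monitoring data reduce annotation cost.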
6.5 Federated learning
Federated learning (FL) (Yang et al. 2019b) is a promising framework for developing DL models with low resources that adheres to the unique principle of "data stays put, models move". FL allows decentralized models to be trained on the data generated by each manufacturing company separately, without aggregating the data from all manufacturers into a centralized repository, which yields two significant benefits. First, from a cost perspective, FL reduces the expenses associated with large-scale data collection, transmission, storage, and model training. Second, from a data privacy standpoint, FL directly leverages locally held data without data sharing, eliminating data owners' concerns about data sovereignty and business secrets. Moreover, the distributed training process exchanges only partial model parameters, reducing the risk of malicious attacks on PHM models in industrial applications (Arunan et al. 2023). Representative models include federated averaging (FedAvg) (McMahan et al. 2017), federated proximal (FedProx) (Mishchenko et al. 2022), federated transfer learning (Kevin et al. 2021), and federated meta-learning (Fallah et al. 2020), which provide valuable guidance for developing reliable and responsible intelligent PHM. Owing to the complexity of equipment composition and working conditions, issues such as device heterogeneity and data imbalance in FL applications to PHM require more attention and research (Berghout et al. 2022).
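The FedAvg aggregation rule cited above can be sketched in a few lines; this is a minimal illustration with toy parameter vectors, not the original implementation.

```python
def fed_avg(client_weights, client_sizes):
    """FedAvg aggregation: average client model parameters weighted by
    local dataset size, so raw monitoring data never leaves each site."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [sum(w[j] * s for w, s in zip(client_weights, client_sizes)) / total
            for j in range(n_params)]

# Two factories contribute locally trained models; the factory with the
# larger dataset contributes proportionally more to the global model.
global_model = fed_avg([[1.0, 1.0], [3.0, 3.0]], client_sizes=[1, 3])
```

In a full FL round, the server broadcasts `global_model` back to the clients, which resume local training, and the cycle repeats; only these parameter vectors ever cross organizational boundaries.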
6.6 Large-scale models
Since the release of GPT-3 (Brown et al. 2020) and ChatGPT (Scheurer et al. 2023), large-scale models have become a hot topic in academia and industry, triggering a new wave of innovation. Technically, large-scale models are an evolution and extension of traditional DL models in that they require large amounts of data and computing resources to train hundreds of millions of parameters, and they demonstrate remarkable abilities in data understanding, multi-task performance, logical reasoning, and domain generalization. Considering the remaining challenges of traditional DL models in performing PHM tasks with small data, developing large-scale models for the PHM domain is a promising direction. A pre-trained large-scale model is first chosen based on the target PHM task and signal type; for example, a pre-trained BERT has been reused for RUL prediction (Zhu et al. 2023b). The model is then adapted by freezing most layers and fine-tuning only the top layers with small amounts of data, with regularization and architecture adjustment techniques applied as needed to alleviate overfitting. The study in Li et al. (2023d) has validated that large-scale models pretrained on multi-modal data from related equipment and working conditions can generalize to cross-task and cross-domain tasks in a zero-shot manner.
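The freeze-and-fine-tune procedure described above can be sketched as a single parameter update; the layer names ("backbone", "head") and values are hypothetical placeholders for a pretrained model's components.

```python
def finetune_step(layers, grads, lr=0.1, trainable=("head",)):
    """One gradient step that updates only the unfrozen top layers
    (here a hypothetical "head"), leaving the pretrained backbone fixed."""
    return {
        name: ([w - lr * g for w, g in zip(ws, grads[name])]
               if name in trainable else list(ws))
        for name, ws in layers.items()
    }

# Toy one-parameter "layers": the frozen backbone keeps its pretrained
# weight, while the task head moves against its gradient.
layers = {"backbone": [1.0], "head": [1.0]}
grads = {"backbone": [1.0], "head": [1.0]}
updated = finetune_step(layers, grads)
```

Freezing the backbone keeps the number of trainable parameters small relative to the adaptation data, which is the mechanism that lets a large pretrained model be specialized with only a handful of PHM samples.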
7 Conclusions
Intelligent PHM is a key component of Industry 4.0 and is closely linked to big data and AI models. To address the difficulties of developing DL models with limited data, we have provided the first comprehensive overview of small data challenges in PHM. The definition, causes, and impacts of small data were first systematically analyzed to answer the "what" and "why" of solving data scarcity problems. We then comprehensively summarized the proposed solutions along three technical lines to report how small data issues have been addressed in existing studies. Furthermore, the problems and remaining challenges within each specific PHM task were explored. Finally, available benchmark datasets, experimental settings, and promising directions were discussed to offer valuable references for future research on more intelligent, data-efficient, and explainable PHM methods. Learning from small data is critical to advancing intelligent PHM, as well as to the development of General Industrial AI.
References
Adadi A (2021) A survey on data-efficient algorithms in big data era. J Big Data 8:1–54
Agogino A, Goebel K (2007) BEST lab, UC Berkeley, Milling Data Set. NASA Ames Prognostics Data Repository, NASA Ames Research Center, Moffett Field
Arjovsky M, Chintala S, Bottou L (2017) Wasserstein generative adversarial networks. In: International Conference on Machine Learning, PMLR, pp 214–223
Arunan A, Qin Y, Li X, Yuen C (2023) A federated learning-based industrial health prognostics for heterogeneous edge devices using matched feature extraction. IEEE Trans Autom Sci Eng. https://doi.org/10.1109/TASE.2023.3274648
Baeza-Yates R (2024) BIG, small or right data: which is the proper focus?
Bai G, Sun W, Cao C, Wang D, Sun Q, Sun L (2023) GAN-based bearing fault diagnosis method for short and imbalanced vibration signal. IEEE Sens J 24:1894–1904
Bechhoefer E (2013) Condition based maintenance fault database for testing diagnostics and prognostic algorithms. MFPT Data
Behera S, Misra R (2021) Generative adversarial networks based remaining useful life estimation for IIoT. Comput Electr Eng 92:107195
Behera S, Misra R, Sillitti A (2023) GAN-based multi-task learning approach for prognostics and health management of IIoT. IEEE Trans Autom Sci Eng. https://doi.org/10.1109/TASE.2023.3267860
Berghout T, Benbouzid M, Bentrcia T, Lim WH, Amirat Y (2022) Federated learning for condition monitoring of industrial processes: a review on fault diagnosis methods challenges, and prospects. Electronics 12:158
Berman JJ (2013) Principles of big data: preparing, sharing, and analyzing complex information. Newnes
Borgwardt KM, Gretton A, Rasch MJ, Kriegel H-P, Schölkopf B, Smola AJ (2006) Integrating structured biological data by kernel maximum mean discrepancy. Bioinformatics 22:e49–e57
Bronz M, Baskaya E, Delahaye D, Puechmore S (2020) Real-time fault detection on small fixed-wing UAVs using machine learning. In: 2020 AIAA/IEEE 39th Digital Avionics Systems Conference (DASC), IEEE, San Antonio, TX, USA, pp 1–10
Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, Neelakantan A, Shyam P, Sastry G, Askell A, Agarwal S, Herbert-Voss A, Krueger G, Henighan T, Child R, Ramesh A, Ziegler D, Wu J, Winter C, Hesse C, Chen M, Sigler E, Litwin M, Gray S, Chess B, Clark J, Berner C, McCandlish S, Radford A, Sutskever I, Amodei D (2020) Language models are few-shot learners. Advances in neural information processing systems. Curran Associates Inc, pp 1877–1901
Brusa E, Delprete C, Di Maggio LG (2021) Deep transfer learning for machine diagnosis: from sound and music recognition to bearing fault detection. Appl Sci 11:11663
Cao P, Zhang S, Tang J (2018) Preprocessing-free gear fault diagnosis using small datasets with deep convolutional neural network-based transfer learning. IEEE Access 6:26241–26253
Cao X, Bu W, Huang S, Zhang M, Tsang IW, Ong YS, Kwok JT (2023) A survey of learning on small data: generalization, optimization, and challenge
Chahal H, Toner H, Rahkovsky I (2021) Small data’s big AI potential. Center for Security and Emerging Technology
Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 16:321–357
Che C, Wang H, Ni X, Fu Q (2020) Domain adaptive deep belief network for rolling bearing fault diagnosis. Comput Ind Eng 143:106427
Chen C, Shen F, Xu J, Yan R (2020) Domain adaptation-based transfer learning for gear fault diagnosis under varying working conditions. IEEE Trans Instrum Meas 70:1–10
Chen W, Qiu Y, Feng Y, Li Y, Kusiak A (2021) Diagnosis of wind turbine faults with transfer learning algorithms. Renew Energy 163:2053–2067
Chen J, Hu W, Cao D, Zhang Z, Chen Z, Blaabjerg F (2022) A meta-learning method for electric machine bearing fault diagnosis under varying working conditions with limited data. IEEE Trans Indus Inform 19:2552–2564
Chen X, Liu H, Nikitas N (2023a) Internal pump leakage detection of the hydraulic systems with highly incomplete flow data. Adv Eng Inform 56:101974
Chen J, Tang J, Li W (2023b) Industrial edge intelligence: federated-meta learning framework for few-shot fault diagnosis. IEEE Trans Netw Sci Eng. https://doi.org/10.1109/TNSE.2023.3266942
Chen X, Zhao C, Ding J (2023c) Pyramid-type zero-shot learning model with multi-granularity hierarchical attributes for industrial fault diagnosis. Reliab Eng Syst Saf 240:109591
Cheng C, Zhou B, Ma G, Wu D, Yuan Y (2020) Wasserstein distance based deep adversarial transfer learning for intelligent fault diagnosis with unlabelled or insufficient labelled data. Neurocomputing 409:35–45
Cho SH, Kim S, Choi J-H (2020) Transfer learning-based fault diagnosis under data deficiency. Appl Sci 10:7768
Choi K, Kim Y, Kim S-K, Kim K-S (2020) Current and position sensor fault diagnosis algorithm for PMSM drives based on robust state observer. IEEE Trans Industr Electron 68:5227–5236
D Research (2019) Artificial intelligence and machine learning projects are obstructed by data issues
Dai W, Yang Q, Xue GR, Yu Y (2007) Boosting for transfer learning. In: Proceedings of the 24th International Conference on Machine Learning
Dai H, Chen P, Yang H (2022) Metalearning-based fault-tolerant control for skid steering vehicles under actuator fault conditions. Sensors 22:845
Der Kiureghian A, Ditlevsen O (2009) Aleatory or epistemic? Does it matter? Struct Saf 31:105–112
Ding P, Jia M (2021) Mechatronics equipment performance degradation assessment using limited and unlabeled data. IEEE Trans Industr Inf 18:2374–2385
Ding Y, Ma L, Ma J, Wang C, Lu C (2019) A generative adversarial network-based intelligent fault diagnosis method for rotating machinery under small sample size conditions. IEEE Access 7:149736–149749
Ding P, Jia M, Zhao X (2021) Meta deep learning based rotating machinery health prognostics toward few-shot prognostics. Appl Soft Comput 104:107211
Ding P, Jia M, Ding Y, Cao Y, Zhao X (2022a) Intelligent machinery health prognostics under variable operation conditions with limited and variable-length data. Adv Eng Inform 53:101691
Ding Y, Zhuang J, Ding P, Jia M (2022b) Self-supervised pretraining via contrast learning for intelligent incipient fault detection of bearings. Reliab Eng Syst Saf 218:108126
Ding P, Zhao X, Shao H, Jia M (2023a) Machinery cross domain degradation prognostics considering compound domain shifts. Reliab Eng Syst Saf 239:109490
Ding P, Jia M, Ding Y, Cao Y, Zhuang J, Zhao X (2023b) Machinery probabilistic few-shot prognostics considering prediction uncertainty. IEEE/ASME Trans Mechatron 29:106–118
Dixit S, Verma NK (2020) Intelligent condition-based monitoring of rotary machines with few samples. IEEE Sens J 20:14337–14346
Dou J, Wei G, Song Y, Zhou D, Li M (2023) Switching triple-weight-smote in empirical feature space for imbalanced and incomplete data. IEEE Trans Autom Sci Eng 21:1–17
Downs JJ, Vogel EF (1993) A plant-wide industrial process control problem. Comput Chem Eng 17:245–255
Du Y, Zhang W, Wang J, Wu H (2019) DCGAN based data generation for process monitoring. In: IEEE, pp 410–415
Fallah A, Mokhtari A, Ozdaglar A (2020) Personalized federated learning with theoretical guarantees: a model-agnostic meta-learning approach. Adv Neural Inf Process Syst 33:3557–3568
Fan Y, Cui X, Han H, Lu H (2020) Chiller fault detection and diagnosis by knowledge transfer based on adaptive imbalanced processing. Sci Technol Built Environ 26:1082–1099
Fan Z, Xu Q, Jiang C, Ding SX (2023a) Deep mixed domain generalization network for intelligent fault diagnosis under unseen conditions. IEEE Trans Industr Electron 71:965–974
Fan L, Chen X, Chai Y, Lin W (2023b) Attribute fusion transfer for zero-shot fault diagnosis. Adv Eng Inform 58:102204
Fekri MN, Patel H, Grolinger K, Sharma V (2021) Deep learning for load forecasting with smart meter data: online adaptive recurrent neural network. Appl Energy 282:116177
Feng L, Zhao C (2020) Fault description based attribute transfer for zero-sample industrial fault diagnosis. IEEE Trans Industr Inf 17:1852–1862
Feng Y, Chen J, Yang Z, Song X, Chang Y, He S, Xu E, Zhou Z (2021) Similarity-based meta-learning network with adversarial domain adaptation for cross-domain fault identification. Knowl-Based Syst 217:106829
Fink O, Wang Q, Svensen M, Dersin P, Lee W-J, Ducoffe M (2020) Potential, challenges and future directions for deep learning in prognostics and health management applications. Eng Appl Artif Intell 92:103678
Finn C, Abbeel P, Levine S (2017) Model-agnostic meta-learning for fast adaptation of deep networks. In: 34th International Conference on Machine Learning, ICML 2017 3:1856–1868
Fu B, Yuan W, Cui X, Yu T, Zhao X, Li C (2020) Correlation analysis and augmentation of samples for a bidirectional gate recurrent unit network for the remaining useful life prediction of bearings. IEEE Sens J 21:7989–8001
Gangsar P, Tiwari R (2020) Signal based condition monitoring techniques for fault detection and diagnosis of induction motors: a state-of-the-art review. Mech Syst Signal Process 144:106908
Gay A, Voisin A, Iung B, Do P, Bonidal R, Khelassi A (2022) Data augmentation-based prognostics for predictive maintenance of industrial system. CIRP Ann 71:409–412
Gay A, Voisin A, Iung B, Do P, Bonidal R, Khelassi A (2023) A study on data augmentation optimization for data-centric health prognostics of industrial systems. IFAC-PapersOnLine 56:1270–1275
Gray DO, Rivers D, Vermont G (2012) Measuring the economic impacts of the NSF Industry/University Cooperative Research Centers Program: a feasibility study, Arlington, Virginia
Gretton A, Sejdinovic D, Strathmann H, Balakrishnan S, Pontil M, Fukumizu K, Sriperumbudur BK (2012) Optimal kernel choice for large-scale two-sample tests. Adv Neural Inf Process Syst 25
Gualo F, Rodríguez M, Verdugo J, Caballero I, Piattini M (2021) Data quality certification using ISO/IEC 25012: industrial experiences. J Syst Softw 176:110938
Guo C, Hu W, Yang F, Huang D (2020) Deep learning technique for process fault detection and diagnosis in the presence of incomplete data. Chin J Chem Eng 28:2358–2367
Han T, Xie W, Pei Z (2023) Semi-supervised adversarial discriminative learning approach for intelligent fault diagnosis of wind turbine. Inf Sci 648:119496
Hao W, Liu F (2020) Imbalanced data fault diagnosis based on an evolutionary online sequential extreme learning machine. Symmetry 12:1204
He Z, Shao H, Zhang X, Cheng J, Yang Y (2019) Improved deep transfer auto-encoder for fault diagnosis of gearbox under variable working conditions with small training samples. IEEE Access 7:115368–115377
He Y, Hu M, Feng K, Jiang Z (2020a) An intelligent fault diagnosis scheme using transferred samples for intershaft bearings under variable working conditions. IEEE Access 8:203058–203069
He Z, Shao H, Wang P, (Jing) Lin J, Cheng J, Yang Y (2020b) Deep transfer multi-wavelet auto-encoder for intelligent fault diagnosis of gearbox with few target training samples. Knowl-Based Syst 191:105313
He J, Li X, Chen Y, Chen D, Guo J, Zhou Y (2021) Deep transfer learning method based on 1d-cnn for bearing fault diagnosis. Shock Vib 2021:1–16
Hinton GE, Zemel RS (1994) Autoencoders, minimum description length, and Helmholtz free energy. Adv Neural Inf Process Syst 6:3–10
Hu T, Tang T, Lin R, Chen M, Han S, Wu J (2020) A simple data augmentation algorithm and a self-adaptive convolutional architecture for few-shot fault diagnosis under different working conditions. Measurement 156:107539
Hu C, Zhou Z, Wang B, Zheng W, He S (2021a) Tensor transfer learning for intelligence fault diagnosis of bearing with semisupervised partial label learning. J Sens 2021:1–11
Hu Y, Liu R, Li X, Chen D, Hu Q (2021b) Task-sequencing meta learning for intelligent few-shot fault diagnosis with limited data. IEEE Trans Industr Inf 18:3894–3904
Hu Z, Shen L, Wang Z, Liu T, Yuan C, Tao D (2023) Architecture, dataset and model-scale agnostic data-free meta-learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 7736–7745
Huang N, Chen Q, Cai G, Xu D, Zhang L, Zhao W (2020) Fault diagnosis of bearing in wind turbine gearbox under actual operating conditions driven by limited data with noise labels. IEEE Trans Instrum Meas 70:1–10
Huang F, Sava A, Adjallah KH, Wang Z (2021) Fuzzy model identification based on mixture distribution analysis for bearings remaining useful life estimation using small training data set. Mech Syst Signal Process 148:107173
Huang Y, Tang Y, VanZwieten J, Liu J (2022) Reliable machine prognostic health management in the presence of missing data. Concurr Comput Pract Exp 34:e5762
Huang C, Bu S, Lee HH, Chan KW, Yung WKC (2024) Prognostics and health management for induction machines: a comprehensive review. J Intell Manuf 35:937–962
Iglesias G, Talavera E, González-Prieto Á, Mozo A, Gómez-Canaval S (2023) Data Augmentation techniques in time series domain: a survey and taxonomy. Neural Comput Appl 35:10123–10145
Jamil F, Verstraeten T, Nowé A, Peeters C, Helsen J (2022) A deep boosted transfer learning method for wind turbine gearbox fault detection. Renew Energy 197:331–341
Jiang C, Chen H, Xu Q, Wang X (2022) Few-shot fault diagnosis of rotating machinery with two-branch prototypical networks. J Intell Manuf. https://doi.org/10.1007/s10845-021-01904-x
Jiang Y, Drescher B, Yuan G (2023) A GAN-based multi-sensor data augmentation technique for CNC machine tool wear prediction. IEEE Access 11:95782–95795
Jin X, Wah BW, Cheng X, Wang Y (2015) Significance and challenges of big data research. Big Data Res 2:59–64
Jose S, Nguyen KTP, Medjaher K (2023) Multimodal machine learning in prognostics and health management of manufacturing systems. Artificial intelligence for smart manufacturing: methods, applications, and challenges. Springer, pp 167–197
Karkošková S (2023) Data governance model to enhance data quality in financial institutions. Inf Syst Manag 40:90–110
Kavis M (2015) Forget big data–small data is driving the Internet of Things, https://www.Forbes.Com/Sites/Mikekavis/2015/02/25/Forget-Big-Datasmall-Data-Is-Driving-the-Internet-of-Things
Kevin I, Wang K, Zhou X, Liang W, Yan Z, She J (2021) Federated transfer learning based cross-domain prediction for smart manufacturing. IEEE Trans Industr Inf 18:4088–4096
Kim H, Youn BD (2019) A new parameter repurposing method for parameter transfer with small dataset and its application in fault diagnosis of rolling element bearings. IEEE Access 7:46917–46930
Koch G, Zemel R, Salakhutdinov R (2015) Siamese neural networks for one-shot image recognition. In: ICML Deep Learning Workshop
Kullback S, Leibler RA (1951) On information and sufficiency. Ann Math Stat 22:79–86
Kumar P, Raouf I, Kim HS (2023) Review on prognostics and health management in smart factory: from conventional to deep learning perspectives. Eng Appl Artif Intell 126:107126
Lao Z, He D, Jin Z, Liu C, Shang H, He Y (2023) Few-shot fault diagnosis of turnout switch machine based on semi-supervised weighted prototypical network. Knowl-Based Syst 274:110634
LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86:2278–2324
Lee YO, Jo J, Hwang J (2017) Application of deep neural network and generative adversarial network to industrial maintenance: a case study of induction motor fault detection. In: Proceedings—2017 IEEE International Conference on Big Data, Big Data 2017 2018-Janua, pp 3248–3253
Lee J, Mitici M (2023) Deep reinforcement learning for predictive aircraft maintenance using probabilistic remaining-useful-life prognostics. Reliab Eng Syst Saf 230:108908
Lee K, Han S, Pham VH, Cho S, Choi H-J, Lee J, Noh I, Lee SW (2021) Multi-objective instance weighting-based deep transfer learning network for intelligent fault diagnosis. Appl Sci 11:2370
Lei Y, Li N, Guo L, Li N, Yan T, Lin J (2018) Machinery health prognostics: a systematic review from data acquisition to RUL prediction. Mech Syst Signal Process 104:799–834
Lessmeier C, Kimotho JK, Zimmer D, Sextro W (2016) Condition monitoring of bearing damage in electromechanical drive systems by using motor current signals of electric motors: a benchmark data set for data-driven classification, 17
Li Y, Liu C, Hua J, Gao J, Maropoulos P (2019) A novel method for accurately monitoring and predicting tool wear under varying cutting conditions based on meta-learning. CIRP Ann 68:487–490
Li X, Zhang W, Ding Q, Sun JQ (2020a) Intelligent rotating machinery fault diagnosis based on deep learning using data augmentation. J Intell Manuf 31:433–452
Li W, Gu S, Zhang X, Chen T (2020b) Transfer learning for process fault diagnosis: knowledge transfer from simulation to physical processes. Comput Chem Eng 139:106904
Li X, Zhang W, Ding Q, Li X (2020c) Diagnosing rotating machines with weakly supervised data using deep transfer learning. IEEE Trans Industr Inf 16:1688–1697
Li F, Tang T, Tang B, He Q (2021a) Deep convolution domain-adversarial transfer learning for fault diagnosis of rolling bearings. Measurement 169:108339
Li Y, Jiang W, Zhang G, Shu L (2021b) Wind turbine fault diagnosis based on transfer learning and convolutional autoencoder with small-scale data. Renew Energy 171:103–115
Li C, Li S, Zhang A, He Q, Liao Z, Hu J (2021c) Meta-learning for few-shot bearing fault diagnosis under complex working conditions. Neurocomputing 439:197–211
Li X, Yang X, Ma Z, Xue JH (2021d) Deep metric learning for few-shot image classification: a selective review, arXiv Preprint https://arXiv.org/2105.08149
Li Z, Sun Y, Yang L, Zhao Z, Chen X (2022a) Unsupervised machine anomaly detection using autoencoder and temporal convolutional network. IEEE Trans Instrum Meas 71:1–13
Li W, Huang R, Li J, Liao Y, Chen Z, He G, Yan R, Gryllias K (2022b) A perspective survey on deep transfer learning for fault diagnosis in industrial scenarios: theories, applications and challenges. Mech Syst Signal Process 167:108487
Li C, Li S, Zhang A, Yang L, Zio E, Pecht M, Gryllias K (2022c) A Siamese hybrid neural network framework for few-shot fault diagnosis of fixed-wing unmanned aerial vehicles. J Comput Design Eng 9:1511–1524
Li Y, Wang J, Huang Z, Gao RX (2022d) Physics-informed meta learning for machining tool wear prediction. J Manuf Syst 62:17–27
Li Y, Yang Y, Feng K, Zuo MJ, Chen Z (2023a) Automated and adaptive ridge extraction for rotating machinery fault detection. IEEE/ASME Trans Mechatron 28:2565
Li K, Lu J, Zuo H, Zhang G (2023b) Source-free multi-domain adaptation with fuzzy rule-based deep neural networks. IEEE Trans Fuzzy Syst. https://doi.org/10.1109/TFUZZ.2023.3276978
Li C, Li S, Wang H, Gu F, Ball AD (2023c) Attention-based deep meta-transfer learning for few-shot fine-grained fault diagnosis. Knowl-Based Syst 264:110345
Li Y-F, Wang H, Sun M (2023d) ChatGPT-like large-scale foundation models for prognostics and health management: a survey and roadmaps. Reliab Eng Syst Saf 243:109850
Liang P, Deng C, Wu J, Yang Z, Zhu J, Zhang Z (2020) Single and simultaneous fault diagnosis of gearbox via a semi-supervised and high-accuracy adversarial learning framework. Knowl-Based Syst 198:105895
Liao Y, Huang R, Li J, Chen Z, Li W (2020) Deep semisupervised domain generalization network for rotary machinery fault diagnosis under variable speed. IEEE Trans Instrum Meas 69:8064–8075
Lin J, Shao H, Zhou X, Cai B, Liu B (2023) Generalized MAML for few-shot cross-domain fault diagnosis of bearing driven by heterogeneous signals. Expert Syst Appl 230:120696
Liu J, Ren Y (2020) A general transfer framework based on industrial process fault diagnosis under small samples. IEEE Trans Industr Inf 3203:1–11
Liu C, Zhu L (2020) A two-stage approach for predicting the remaining useful life of tools using bidirectional long short-term memory. Measurement 164:108029
Liu J, Qu F, Hong X, Zhang H (2019) A small-sample wind turbine fault detection method with synthetic fault data using generative adversarial nets. IEEE Trans Industr Inf 15:3877–3888
Liu S, Jiang H, Wu Z, Li X (2022) Data synthesis using deep feature enhanced generative adversarial networks for rolling bearing imbalanced fault diagnosis. Mech Syst Signal Process 163:108139
Liu S, Chen J, He S, Shi Z, Zhou Z (2023) Few-shot learning under domain shift: attentional contrastive calibrated transformer of time series for fault diagnosis under sharp speed variation. Mech Syst Signal Process 189:110071
Long J, Chen Y, Huang H, Yang Z, Huang Y, Li C (2023) Multidomain variance-learnable prototypical network for few-shot diagnosis of novel faults. J Intell Manuf. https://doi.org/10.1007/s10845-023-02123-2
Lu N, Yin T (2021) Transferable common feature space mining for fault diagnosis with imbalanced data. Mech Syst Signal Process 156:107645
Lu N, Hu H, Yin T, Lei Y, Wang S (2021) Transfer relation network for fault diagnosis of rotating machinery with small data. IEEE Trans Cybern 52:11927–11941
Lu N, Zhuang G, Ma Z, Zhao Q (2022) A zero-shot intelligent fault diagnosis system based on EEMD. IEEE Access 10:54197–54207
Luo M, Xu J, Fan Y, Zhang J (2022) TRNet: a cross-component few-shot mechanical fault diagnosis. IEEE Trans Indus Inform. https://doi.org/10.1109/TII.2022.3204554
Lv H, Chen J, Pan T, Zhou Z (2020) Hybrid attribute conditional adversarial denoising autoencoder for zero-shot classification of mechanical intelligent fault diagnosis. Appl Soft Comput 95:106577
Ma L, Ding Y, Wang Z, Wang C, Ma J, Lu C (2021) An interpretable data augmentation scheme for machine fault diagnosis based on a sparsity-constrained generative adversarial network. Expert Syst Appl 182:115234
Ma Z, Liao H, Gao J, Nie S, Geng Y (2023) Physics-informed machine learning for degradation modelling of an electro-hydrostatic actuator system. Reliab Eng Syst Saf 229:108898
Mahmoodian A, Durali M, Saadat M, Abbasian T (2021) A life clustering framework for prognostics of gas turbine engines under limited data situations. Int J Eng Trans C: Aspects 34:728–736
Mao W, Liu Y, Ding L, Li Y (2019) Imbalanced fault diagnosis of rolling bearing based on generative adversarial network: a comparative study. IEEE Access 7:9515–9530
Mao W, He J, Zuo MJ (2020) Predicting remaining useful life of rolling bearings based on deep feature representation and transfer learning. IEEE Trans Instrum Meas 69:1594–1608
Mao W, He J, Sun B, Wang L (2021) Prediction of bearings remaining useful life across working conditions based on transfer learning and time series clustering. IEEE Access 9:135285–135303
McMahan B, Moore E, Ramage D, Hampson S, Arcas BAY (2017) Communication-efficient learning of deep networks from decentralized data. Artificial intelligence and statistics. PMLR, pp 1273–1282
Meng Z, Guo X, Pan Z, Sun D, Liu S (2019) Data segmentation and augmentation methods based on raw data using deep neural networks approach for rotating machinery fault diagnosis. IEEE Access 7:79510–79522
Miao Y, Jiang Y, Huang J, Zhang X, Han L (2020) Application of fault diagnosis of seawater hydraulic pump based on transfer learning. Shock Vib 2020:1–8
Miao J, Wang J, Zhang D, Miao Q (2021) Improved generative adversarial network for rotating component fault diagnosis in scenarios with extremely limited data. IEEE Trans Instrum Meas 71:1–13
Michau G, Fink O (2021) Unsupervised transfer learning for anomaly detection: application to complementary operating condition transfer. Knowl-Based Syst 216:106816
Mishchenko K, Khaled A, Richtárik P (2022) Proximal and federated random reshuffling. In: International Conference on Machine Learning, PMLR, pp 15718–15749
Mo Y, Li L, Huang B, Li X (2022) Few-shot RUL estimation based on model-agnostic meta-learning. J Intell Manuf 34:1–14
Moreno-Barea FJ, Jerez JM, Franco L (2020) Improving classification accuracy using data augmentation on small data sets. Expert Syst Appl 161:113696
Nectoux P, Gouriveau R, Medjaher K, Ramasso E, Chebel-Morello B, Zerhouni N, Varnier C (2012) PRONOSTIA: an experimental platform for bearings accelerated degradation tests. In: IEEE International Conference on Prognostics and Health Management, PHM’12, pp 1–8
Nemani V, Biggio L, Huan X, Hu Z, Fink O, Tran A, Wang Y, Zhang X, Hu C (2023) Uncertainty quantification in machine learning for engineering design and health prognostics: a tutorial. Mech Syst Signal Process 205:110796
Omri N, Al-Masry Z, Mairot N, Giampiccolo S, Zerhouni N (2020) Industrial data management strategy towards an SME-oriented PHM. J Manuf Syst 56:23–36
Pan T, Chen J, Zhang T, Liu S, He S, Lv H (2022) Generative adversarial network in mechanical fault diagnosis under small sample: a systematic review on applications and future perspectives. ISA Trans 128:1–10
Pang G, Cao L, Aggarwal C (2021) Deep learning for anomaly detection: challenges, methods, and opportunities, pp 1127–1130
Parnami A, Lee M (2022) Learning from few examples: a summary of approaches to few-shot learning. arXiv Preprint https://arXiv.org/2203.04291
Peng C, Li L, Chen Q, Tang Z, Gui W, He J (2021) A fault diagnosis method for rolling bearings based on parameter transfer learning under imbalance data sets. Energies 14:944
Qi L, Ren Y, Fang Y, Zhou J (2023) Two-view LSTM variational auto-encoder for fault detection and diagnosis in multivariable manufacturing processes. Neural Comput Appl 35:1–20
Qin A, Mao H, Zhong J, Huang Z, Li X (2023) Generalized transfer extreme learning machine for unsupervised cross-domain fault diagnosis with small and imbalanced samples. IEEE Sens J 23:15831–15843
Qiu H, Lee J, Lin J, Yu G (2006) Wavelet filter-based weak signature detection method and its application on rolling element bearing prognostics. J Sound Vib 289:1066–1090
Rajagopalan S, Singh J, Purohit A (2023) VMD-based ensembled SMOTEBoost for imbalanced multi-class rotor mass imbalance fault detection and diagnosis under industrial noise. J Vib Eng Technol 12:1–22
Randall RB (2021) Vibration-based condition monitoring: industrial, automotive and aerospace applications. Wiley
Ren Z, Lin T, Feng K, Zhu Y, Liu Z, Yan K (2023) A systematic review on imbalanced learning methods in intelligent fault diagnosis. IEEE Trans Instrum Meas 72:3508535
Ren L, Mo T, Cheng X (2024) Meta-learning based domain generalization framework for fault diagnosis with gradient aligning and semantic matching. IEEE Trans Ind Inf 20:754–764
Ruan D, Wu Y, Yan J, Gühmann C (2022) Fuzzy-membership-based framework for task transfer learning between fault diagnosis and RUL prediction. IEEE Trans Reliab 72:989–1002
Sadoughi M, Lu H, Hu C (2019) A deep learning approach for failure prognostics of rolling element bearings. In: IEEE, pp 1–7
Saxena A, Goebel K, Simon D, Eklund N (2008) Damage propagation modeling for aircraft engine run-to-failure simulation. In: IEEE, pp 1–9
Scheurer J, Campos JA, Korbak T, Chan JS, Chen A, Cho K, Perez E (2023) Training language models with language feedback at scale, arXiv Preprint https://arXiv.org/2303.16755
Schmid M, Gebauer E, Hanzl C, Endisch C (2020) Active model-based fault diagnosis in reconfigurable battery systems. IEEE Trans Power Electron 36:2584–2597
Schmidhuber J (1987) Evolutionary principles in self-referential learning, or on learning how to learn: the meta-meta-... hook
Shao S, McAleer S, Yan R, Baldi P (2018) Highly accurate machine fault diagnosis using deep transfer learning. IEEE Trans Industr Inf 15:2446–2455
Shi D, Ye Y, Gillwald M, Hecht M (2022) Robustness enhancement of machine fault diagnostic models for railway applications through data augmentation. Mech Syst Signal Process 164:108217
Smith WA, Randall RB (2015) Rolling element bearing diagnostics using the Case Western Reserve University data: a benchmark study. Mech Syst Signal Process 64:100–131
Snell J, Swersky K, Zemel R (2017) Prototypical networks for few-shot learning. Adv Neural Inform Process Syst 30
Song Y, Wang T, Mondal SK, Sahoo JP (2022) A comprehensive survey of few-shot learning: evolution applications, challenges, and opportunities. ACM Comput Surv 271:1–40
Sun B, Saenko K (2016) Deep coral: correlation alignment for deep domain adaptation. Springer, pp 443–450
Sun Y, Zhao T, Zou Z, Chen Y, Zhang H (2021) Imbalanced data fault diagnosis of hydrogen sensors using deep convolutional generative adversarial network with convolutional neural network. Rev Sci Instrum 92:095007
Sung F, Yang Y, Zhang L, Xiang T, Torr PH, Hospedales TM (2018) Learning to compare: relation network for few-shot learning. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp 1199–1208
Suthaharan S (2014) Big data classification: problems and challenges in network intrusion prediction with machine learning. ACM SIGMETRICS Perform Eval Rev 41:70–73
Tang Z, Bo L, Liu X, Wei D (2021) An autoencoder with adaptive transfer learning for intelligent fault diagnosis of rotating machinery. Meas Sci Technol 32:55110
Tang Y, Xiao X, Yang X, Lei B (2023a) Research on a small sample feature transfer method for fault diagnosis of reciprocating compressors. J Loss Prev Process Ind 85:105163
Tang T, Qiu C, Yang T, Wang J, Zhao J, Chen M, Wu J, Wang L (2023b) A novel lightweight relation network for cross-domain few-shot fault diagnosis. Measurement 213:112697
Thrun S, Pratt L (2012) Learning to learn. Springer Science Business Media
Tian Y, Tang Y, Peng X (2020) Cross-task fault diagnosis based on deep domain adaptation with local feature learning. IEEE Access 8:127546–127559
Triguero I, Del Río S, López V, Bacardit J, Benítez JM, Herrera F (2015) ROSEFW-RF: the winner algorithm for the ECBDL’14 big data competition: an extremely imbalanced big data bioinformatics problem. Knowl-Based Syst 87:69–79
Vapnik V (2013) The nature of statistical learning theory. Springer Science Business Media
Vinyals O, Blundell C, Lillicrap T, Kavukcuoglu K, Wierstra D (2016) Matching networks for one shot learning. Adv Neural Inform Process Syst 29:3637–3645
Wan W, He S, Chen J, Li A, Feng Y (2021) QSCGAN: an un-supervised quick self-attention convolutional GAN for LRE bearing fault diagnosis under limited label-lacked data. IEEE Trans Instrum Meas 70:1–16
Wang C, Xu Z (2021) An intelligent fault diagnosis model based on deep neural network for few-shot fault diagnosis. Neurocomputing 456:550–562
Wang B, Lei Y, Li N, Li N (2018a) A hybrid prognostics approach for estimating remaining useful life of rolling element bearings. IEEE Trans Reliab 69:401–412
Wang Z, Wang J, Wang Y (2018b) An intelligent diagnosis scheme based on generative adversarial learning deep neural networks and its application to planetary gearbox fault pattern recognition. Neurocomputing 310:213–222
Wang Y, Yao Q, Kwok JT, Ni LM (2020a) Generalizing from a few examples: a survey on few-shot learning. ACM Comput Surv (CSUR) 53:1–34
Wang S, Wang D, Kong D, Wang J, Li W, Zhou S (2020b) Few-shot rolling bearing fault diagnosis with metric-based meta learning. Sensors (switzerland) 20:1–15
Wang D, Zhang M, Xu Y, Lu W, Yang J, Zhang T (2021) Metric-based meta-learning model for few-shot fault diagnosis under multiple limited data conditions. Mech Syst Signal Process 155:107510
Wang Z, Yang J, Guo Y (2022) Unknown fault feature extraction of rolling bearings under variable speed conditions based on statistical complexity measures. Mech Syst Signal Process 172:108964
Wang S, Ma L, Wang J (2023) Fault diagnosis method based on CND-SMOTE and BA-SVM algorithm. J Phys Conf Ser 2493:012008
Ward JS, Barker A (2013) Undefined by data: a survey of big data definitions. arXiv Preprint https://arXiv.org/1309.5821
Weikun D, Nguyen KT, Medjaher K, Christian G, Morio J (2023) Physics-informed machine learning in prognostics and health management: state of the art and challenges. Appl Math Model 124:325–352
Wen L, Li X, Li X, Gao L (2019) A new transfer learning based on VGG-19 network for fault diagnosis. In: IEEE, pp 205–209
Wen L, Li X, Gao L (2020) A transfer convolutional neural network for fault diagnosis based on ResNet-50. Neural Comput Appl 32:6111–6124
Wenbai C, Chang L, Weizhao C, Huixiang L, Qili C, Peiliang W (2021) A prediction method for the RUL of equipment for missing data. Complexity 2021:2122655
Wu H, Zhao J (2020) Fault detection and diagnosis based on transfer learning for multimode chemical processes. Comput Chem Eng 135:106731
Wu J, Zhao Z, Sun C, Yan R, Chen X (2020) Few-shot transfer learning for intelligent fault diagnosis of machine. Measurement 166:108202
Wu K, Yukang N, Wu J, Yuanhang W (2023) Prior knowledge-based self-supervised learning for intelligent bearing fault diagnosis with few fault samples. Meas Sci Technol 34:105104
Xia P, Huang Y, Li P, Liu C, Shi L (2021) Fault knowledge transfer assisted ensemble method for remaining useful life prediction. IEEE Trans Industr Inf 18:1758–1769
Xiao D, Huang Y, Qin C, Liu Z, Li Y, Liu C (2019) Transfer learning with convolutional neural networks for small sample size problem in machinery fault diagnosis. Proc Inst Mech Eng C J Mech Eng Sci 233:5131–5143. https://doi.org/10.1177/0954406219840381
Xie J, Zhang L, Duan L, Wang J (2016) On cross-domain feature fusion in gearbox fault diagnosis under various operating conditions based on transfer component analysis. In: IEEE, pp 1–6
Xing S, Lei Y, Yang B, Lu N (2021) Adaptive knowledge transfer by continual weighted updating of filter kernels for few-shot fault diagnosis of machines. IEEE Trans Industr Electron 69:1968–1976
Xing S, Lei Y, Wang S, Lu N, Li N (2022) A label description space embedded model for zero-shot intelligent diagnosis of mechanical compound faults. Mech Syst Signal Process 162:108036
Xu J, Xu P, Wei Z, Ding X, Shi L (2020) DC-NNMN: across components fault diagnosis based on deep few-shot learning. Shock Vib 2020:3152174
Xu J, Zhou L, Zhao W, Fan Y, Ding X, Yuan X (2022) Zero-shot learning for compound fault diagnosis of bearings. Expert Syst Appl 190:116197
Xu P, Zhu X, Clifton DA (2023) Multimodal learning with transformers: a survey. IEEE Trans Pattern Anal Mach Intell. https://doi.org/10.1109/TPAMI.2023.3275156
Yan H, Wang J, Chen J, Liu Z, Feng Y (2022) Virtual sensor-based imputed graph attention network for anomaly detection of equipment with incomplete data. J Manuf Syst 63:52–63
Yan H, Liu Z, Chen J, Feng Y, Wang J (2023) Memory-augmented skip-connected autoencoder for unsupervised anomaly detection of rocket engines with multi-source fusion. ISA Trans 133:53–65
Yang B, Lei Y, Jia F, Xing S (2018) A transfer learning method for intelligent fault diagnosis from laboratory machines to real-case machines. In: IEEE, pp 35–40
Yang B, Lei Y, Jia F, Xing S (2019a) An intelligent fault diagnosis approach based on transfer learning from laboratory bearings to locomotive bearings. Mech Syst Signal Process 122:692–706
Yang Q, Liu Y, Cheng Y, Kang Y, Chen T, Yu H (2019b) Federated learning, synthesis lectures on artificial intelligence and machine learning 13, pp 1–207
Yang J, Xie G, Yang Y (2020a) An improved ensemble fusion autoencoder model for fault diagnosis from imbalanced and incomplete data. Control Eng Pract 98:104358
Yang Y, Wang H, Liu Z, Yang Z (2020b) Few-shot learning for rolling bearing fault diagnosis via Siamese two-dimensional convolutional neural network. In: Proceedings—11th International Conference on Prognostics and System Health Management, PHM-Jinan 2020, pp 373–378
Yang X, Bai M, Liu J, Liu J, Yu D (2021) Gas path fault diagnosis for gas turbine group based on deep transfer learning. Measurement 181:109631
Yang G, Ye Z, Zhang R, Huang K (2022) A comprehensive survey of zero-shot image classification: methods, implementation, and fair evaluation. ACI 2:1–31
Yang C, Zhang J, Chang Y, Zou J, Liu Z, Fan S (2023a) A novel deep parallel time-series relation network for fault diagnosis. IEEE Trans Instrum Meas 72:1–13
Yang L, Li S, Li C, Zhu C, Zhang A, Liang G (2023b) Data-driven unsupervised anomaly detection and recovery of unmanned aerial vehicle flight data based on spatiotemporal correlation. Sci China Technol Sci 66:1–13
Yao S, Kang Q, Zhou M, Rawa MJ, Abusorrah A (2023) A survey of transfer learning for machinery diagnostics and prognostics. Artif Intell Rev 56:2871–2922
Yin H, Li Z, Zuo J, Liu H, Yang K, Li F (2020) Wasserstein generative adversarial network and convolutional neural network (WG-CNN) for bearing fault diagnosis. Math Probl Eng 2020:2604191
Yu Y, Tang B, Lin R, Han S, Tang T, Chen M (2019) CWGAN: conditional Wasserstein generative adversarial nets for fault data generation. In: IEEE, pp 2713–2718
Yu K, Ma H, Lin TR, Li X (2020) A consistency regularization based semi-supervised learning approach for intelligent fault diagnosis of rolling bearing. Measurement 165:107987
Yu K, Lin TR, Ma H, Li X, Li X (2021a) A multi-stage semi-supervised learning approach for intelligent fault diagnosis of rolling bearing using data augmentation and metric learning. Mech Syst Signal Process 146:107043
Yu C, Ning Y, Qin Y, Su W, Zhao X (2021b) Multi-label fault diagnosis of rolling bearing based on meta-learning. Neural Comput Appl 33:5393–5407
Yu Q, Luo L, Liu B, Hu S (2023) Re-planning of quadrotors under disturbance based on meta reinforcement learning. J Intell Rob Syst 107:13
Zarsky TZ (2016) Incompatible: the GDPR in the age of big data. Seton Hall L Rev 47:995
Zha D, Bhat ZP, Lai K-H, Yang F, Hu X (2023) Data-centric AI: perspectives and challenges. In: Proceedings of the 2023 SIAM International Conference on Data Mining (SDM), SIAM, pp 945–948
Zhang A, Wang H, Li S, Cui Y, Liu Z, Yang G, Hu J (2018) Transfer learning with deep recurrent neural networks for remaining useful life estimation. Appl Sci 8:2416
Zhang A, Li S, Cui Y, Yang W, Dong R, Hu J (2019) Limited data rolling bearing fault diagnosis with few-shot learning. IEEE Access 7:110895–110904
Zhang Y, Ren Z, Zhou S (2020a) An intelligent fault diagnosis for rolling bearing based on adversarial semi-supervised method. IEEE Access 8:149868–149877
Zhang X, Qin Y, Yuen C, Jayasinghe L, Liu X (2020b) Time-series regeneration with convolutional recurrent generative adversarial network for remaining useful life estimation. IEEE Trans Industr Inf 17:6820–6831
Zhang L, Guo L, Gao H, Dong D, Fu G, Hong X (2020c) Instance-based ensemble deep transfer learning network: a new intelligent degradation recognition method and its application on ball screw. Mech Syst Signal Process 140:106681
Zhang H, Zhang Q, Shao S, Niu T, Yang X, Ding H (2020d) Sequential network with residual neural network for rotatory machine remaining useful life prediction using deep transfer learning. Shock Vib 2020:1–16
Zhang K, Chen J, Zhang T, He S, Pan T, Zhou Z (2020e) Intelligent fault diagnosis of mechanical equipment under varying working condition via iterative matching network augmented with selective Signal reuse strategy. J Manuf Syst 57:400–415
Zhang S, Ye F, Wang B, Habetler TG (2021) Few-shot bearing fault diagnosis based on model-agnostic meta-learning. IEEE Trans Ind Appl 57:4754–4764
Zhang T, Chen J, Li F, Zhang K, Lv H, He S, Xu E (2022a) Intelligent fault diagnosis of machines with small & imbalanced data: a state-of-the-art review and possible extensions. ISA Trans 119:152–171
Zhang X, Wu B, Zhang X, Zhou Q, Hu Y, Liu J (2022b) A novel assessable data augmentation method for mechanical fault diagnosis under noisy labels. Measurement 198:111114
Zhang X, Wang J, Han B, Zhang Z, Yan Z, Jia M, Guo L (2022c) Feature distance-based deep prototype network for few-shot fault diagnosis under open-set domain adaptation scenario. Measurement 201:111522
Zhang T, Chen J, Liu S, Liu Z (2023) Domain discrepancy-guided contrastive feature learning for few-shot industrial fault diagnosis under variable working conditions. IEEE Trans Industr Inf 19:10277–10287
Zhao B, Yuan Q (2021) Improved generative adversarial network for vibration-based fault diagnosis with imbalanced data. Measurement 169:108522
Zhao Z, Li T, Wu J, Sun C, Wang S, Yan R, Chen X (2020a) Deep learning algorithms for rotating machinery intelligent diagnosis: an open source benchmark study. ISA Trans 107:224–255
Zhao X, Jia M, Lin M (2020b) Deep Laplacian auto-encoder and its application into imbalanced fault diagnosis of rotating machinery. Measurement 152:107320
Zhao K, Jiang H, Wu Z, Lu T (2020c) A novel transfer learning fault diagnosis method based on manifold embedded distribution alignment with a little labeled data. J Intell Manuf 33:1–15
Zhao B, Niu Z, Liang Q, Xin Y, Qian T, Tang W, Wu Q (2021a) Signal-to-signal translation for fault diagnosis of bearings and gears with few fault samples. IEEE Trans Instrum Meas 70:1–10
Zhao Z, Zhang Q, Yu X, Sun C, Wang S, Yan R, Chen X (2021b) Applications of unsupervised deep transfer learning to intelligent fault diagnosis: a survey and comparative study. IEEE Trans Instrum Meas 70:1–28
Zhao K, Jiang H, Liu C, Wang Y, Zhu K (2022) A new data generation approach with modified Wasserstein auto-encoder for rotating machinery fault diagnosis with limited fault data. Knowl-Based Syst 238:107892
Zhao J, Yuan M, Cui J, Huang J, Zhao F, Dong S, Qu Y (2023) A novel hierarchical training architecture for Siamese Neural Network based fault diagnosis method under small sample. Measurement 215:112851
Zheng T, Song L, Guo B, Liang H, Guo L (2019) An efficient method based on conditional generative adversarial networks for imbalanced fault diagnosis of rolling bearing. In: IEEE, pp 1–8
Zhiyi H, Haidong S, Lin J, Junsheng C, Yu Y (2020) Transfer fault diagnosis of bearing installed in different machines using enhanced deep auto-encoder. Measurement 152:107393
Zhou K, Diehl E, Tang J (2023a) Deep convolutional generative adversarial network with semi-supervised learning enabled physics elucidation for extended gear fault diagnosis under data limitations. Mech Syst Signal Process 185:109772
Zhou L, Liu Y, Bai X, Li N, Yu X, Zhou J, Hancock ER (2023b) Attribute subspaces for zero-shot learning. Pattern Recogn 144:109869
Zhu QX, Zhang N, He YL, Xu Y (2022) Novel imbalanced fault diagnosis method based on CSMOTE integrated with LSDA and LightGBM for industrial process. In: IEEE, pp 326–331
Zhu R, Peng W, Wang D, Huang C-G (2023a) Bayesian transfer learning with active querying for intelligent cross-machine fault prognosis under limited data. Mech Syst Signal Process 183:109628
Zhu J, Long Z, Ma X, Luan F (2023b) Bearing remaining useful life prediction based on BERT fine-tuning. In: 2023 Global Reliability and Prognostics and Health Management Conference (PHM-Hangzhou), IEEE, pp 1–6
Zhuo Y, Ge Z (2021) Auxiliary information guided industrial data augmentation for any-shot fault learning and diagnosis. IEEE Trans Industr Inf 3203:1–11
Zio E (2022) Prognostics and health management (PHM): where are we and where do we (need to) go in theory and practice. Reliab Eng Syst Saf 218:108119
Acknowledgements
This work was supported in part by the National Key Research and Development Program of China [No. 2023YFB3308800]; in part by National Natural Science Foundation of China [No. 52275480]; in part by the Guizhou Province Higher Education Project [No. QJH KY [2020]005], in part by the Guizhou University Natural Sciences Special Project (Guida Tegang Hezi (2023) No.61).
Author information
Authors and Affiliations
Contributions
Chuanjiang Li: Conceptualization, Investigation, Methodology, Software, Data curation, Writing-Original draft preparation. Shaobo Li: Conceptualization, Supervision, Funding support. Yixiong Feng: Investigation, Writing-review. Konstantinos Gryllias: Methodology, Writing-review. Fengshou Gu: Methodology, Writing-review & editing. Michael Pecht: Methodology, Writing-review & editing.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Li, C., Li, S., Feng, Y. et al. Small data challenges for intelligent prognostics and health management: a review. Artif Intell Rev 57, 214 (2024). https://doi.org/10.1007/s10462-024-10820-4
Accepted:
Published:
DOI: https://doi.org/10.1007/s10462-024-10820-4