Vibration Analysis For IoT Enabled Predictive Maintenance
Abstract—Vibration sensors are becoming an essential part of the Internet of Things (IoT), fueled by quickly evolving technology that improves measurement accuracy and lowers hardware cost. Vibration sensors physically attach to core equipment in control and manufacturing systems, e.g., motors and tubes, providing key insight into the running status of these devices. Massive readings from vibration sensors, however, pose new technical challenges to the analytical system, due to the non-continuous sampling strategy used to save sensor energy, as well as the difficulty of data interpretation. To maximize the utility and minimize the operational overhead of vibration sensors, we propose a new analytical framework, designed specifically for vibration analysis based on its unique characteristics. In particular, our data engine targets Remaining Useful Lifetime (RUL) estimation, known as one of the most important problems in cyber-physical system maintenance, to optimize the replacement scheduling of the equipment under monitoring. Our empirical evaluations on real manufacturing sites show that scalable and accurate analysis of the vibration data prolongs the average lifetime of the tubes by 1.2x and reduces the replacement cost by 20%.

I. INTRODUCTION

We are now on the doorstep of a new industrial era, with the adoption of the Internet of Things (IoT) in almost every Cyber Physical System (CPS). The key vision of the new era is the possibilities enabled by IoT for more accurate monitoring of, and real-time optimal response to, the physical devices in the system. Device failure is one of the most important classes of events that a CPS is expected to prevent, detect and fix, resorting to the sensor and data analysis techniques in IoT. In a typical semiconductor fabrication plant (FAB), for example, thousands of vacuum pumps endlessly evacuate air from chambers in semiconductor wafer fabrication, to constantly meet the stringent vacuum condition. To maintain a required yield rate of wafers, factory operators usually adopt a conservative and risk-averse strategy for pump replacement, by which every pump must be replaced after it reaches a maximal working period even if the pump remains in perfect condition. As we will present in our empirical evaluations, although a fraction of the pumps stay healthy for over 24 months, they are now replaced within a time frame of 6 months, because of the large variance in the lifetime distribution. While most manufacturers are aware of the waste, such a replacement strategy is the only applicable option for the decision makers, due to the limited analytical capability over the pumps. Any minor problem with a pump in operation could incur a huge cost from defective products and stoppage of the pipeline.

A new generation of vibration sensors, using MEMS accelerometers as the measurement device, has been reshaping the equipment servicing and replacement workflow in the last few years. It overcomes the limitations of conventional vibration sensors based on piezoelectric accelerometers, which are bulky, energy hungry and expensive. These new vibration sensors are now sufficiently light for easy installation, sustainable for months under battery power, and cheap to purchase and replace. For the first time, vibration analysis is now available to system administrators to unveil a handful of key insights into the system. Firstly, vibration directly reflects the operational status of the equipment, as it is mostly generated by the mechanical behavior of the equipment, such as the rotation of motors and the running flow in pipes. Secondly, vibration is fairly independent of other external factors, e.g., temperature and humidity. The vibration readings over the target devices provide more reliable information than conventional sensors could offer. Thirdly, vibration behavior is believed to evolve quickly as the equipment ages. It is thus a perfect indicator of early signs of failures and problems, well before the actual problem occurs in the devices. With all these enticing features, vibration sensors are becoming an essential part of IoT in CPS, especially to support predictive maintenance of replaceable components. Instead of replacing the equipment at a fixed period, vibration-based predictive maintenance allows the system to monitor and evaluate the equipment much more closely in real time, and to alert the system administrator with servicing or replacement suggestions only when the equipment is about to fail or degrade.

Predictive maintenance with vibration sensor data, however, demands new processing and analysis techniques in the existing Manufacturing Execution Systems (MES) [1], [3], [4], [2]. The challenges are twofold. Firstly, due to the spectral nature of vibration data, the readings from vibration sensors span a high-dimensional space, only part of which is useful to the maintenance prediction objective. Such high dimensionality requires effective feature selection tools based on the spectral properties of vibration behaviors, in order to identify key dimensions for accurate device failure prediction. Secondly, the bandwidth between a vibration sensor and the base station is limited, due to the hardware and energy consumption constraints on the sensors. The back-end analytical engine is expected to handle asynchronous and incomplete observations from multiple sensors for predictive modelling.

In this paper, we present our novel solution to vibration analysis and lessons learnt from our practice. We build a
Fig. 5: Trade-off among sampling frequency, report periods, and node lifetime (horizontal axis: Sampling Frequency (Hz)).
III. PROBLEM DESCRIPTION

In this section, we discuss the model of the data and formulate the mathematical problem of vibration analysis based on the data representation. We also highlight the challenges behind our data analysis problem, which motivate the analysis methodologies introduced in detail in subsequent sections.

A. Problem Overview

In Fig. 6, we present an example analysis over a number of homogeneous pieces of equipment, say pumps in a semiconductor factory, to illustrate the general workflow of vibration analysis. Generally speaking, given the periodic samples of acceleration readings from the vibration sensors, the ultimate goal of our analysis is to estimate the Remaining Useful Lifetime (RUL for short) of the equipment, based on the historical records in the database system. From an operational perspective, it is more important to identify the equipment that is close to the end of its lifetime and about to fail in the near future. In the following, we list a number of basic assumptions on the data and the equipment, in plain language, in order to justify our data and analysis model in the rest of the section.

Based on the data collection mechanism introduced in Sec. II, we make a number of assumptions on the information coming from the vibration sensors. Firstly, the data only provides indirect information about the vibrations, i.e., samples of the acceleration of the target equipment in three dimensions. Secondly, samples from different sensors may cover various intervals in the time domain. In Fig. 4, for example, the samples from five pieces of equipment are collected at completely different time points. Therefore, directly comparing the acceleration data from two vibration sensors does not generate any meaningful results. Thirdly, there is always significant noise in the data. Although advances in sensor technology have greatly reduced the error, the acceleration measured by the sensors may not accurately reflect the vibration phenomenon of the physical equipment. The analysis algorithm is supposed to be robust to the noise, to ensure the usefulness of the analysis outcomes.

On the other hand, the unique characteristics of cyber-physical systems also bring up a few assumptions on the equipment, which must be taken into consideration when designing the data and analysis model. Firstly, the pieces of equipment under monitoring have different ages when the vibration sensors are attached to them. Basically, it means that the initial status of the equipment could be completely different. When plotting the lifetime of the equipment in Fig. 6, for example, they are in very different positions in the time-feature visualization on the right part of the figure. Secondly, the lifetime of the target equipment has huge variance, which depends on a group of external factors. The expected lifetime of a pump, for example, depends on the manufacturing process where the pump is installed during its operation. On the right part of Fig. 6, there are two general groups of equipment, which evolve in two different directions in the feature space over usage time. The analysis model is supposed to distinguish between such equipment classes, in order to accomplish accurate predictions.

As a short summary, we design the analysis engine for the vibration sensor network by addressing the following major challenges:

1) Indirect Measurement: The acceleration in three-dimensional space is an indirect and approximate measurement of the vibration phenomenon of the target equipment.
2) Noisy and Unaligned Observations: The readings of acceleration data from different sensors cannot be well aligned and may contain a huge amount of noise.
3) Variance on Initial Status: The initial status of the target equipment could be completely different from one piece of equipment to another.
4) Diversity on Lifetime Model: The usage and lifetime model of the target equipment depends on a number of unknown and external factors.

B. Data Representation

In this part of the section, we discuss the data representation in the analysis system. There are three types of data involved in the analysis process: sensor data, human label data and analysis metadata.

Sensor Data We assume that there is one and only one vibration sensor attached to each piece of equipment of interest. We leave the extension from a single sensor to multiple sensors to our future research work. The notation used throughout the section is summarized in Tab. II. Following convention, we use bold lowercase letters to denote vectors and bold uppercase letters to denote matrices. Given a vector $\mathbf{a}$, we use $\|\mathbf{a}\|$ to denote the L2-norm of the vector $\mathbf{a}$.

TABLE II: Table of notation descriptions.

Symbol                        Description
$M$                           number of vibration sensors
$N$                           number of measurements
$K$                           number of samples in each measurement
$m$                           $m$-th vibration sensor
$n$                           $n$-th measurement
$k$                           $k$-th sample in a measurement
$\mathbf{a}_{mnk}$            3-dimensional vector in $\{x, y, z\}$ space
$\mathbf{a}^l_{mn}$           $K$-dimensional vector of readings over $l \in \{x, y, z\}$
$\hat{\mathbf{a}}^l_{mn}$     normalized vector of $\mathbf{a}^l_{mn}$
$\|\mathbf{a}\|$              L2-norm of the vector $\mathbf{a}$
$r_{mn}$                      RMS feature of measurement $n$ over equipment $m$
$\mathbf{s}_{mn}$             feature vector of measurement $n$ over equipment $m$
$C$                           set of all candidate labels
$C_i$                         a particular label for every $i \in \{1, 2, 3, 4\}$
$(\mathbf{s}_{mn}, q_{mn})$   a training sample with label $q_{mn} \in C$
$T_s, T_e$                    starting and ending time of data used in analysis

Assume that we have $M$ pieces of equipment in the cyber-physical system, each of which is monitored by a vibration sensor. We use $m$ to indicate the id of the equipment as well as of the corresponding vibration sensor. Each vibration sensor has taken $N$ measurements, and each measurement contains $K$ samples of acceleration readings on the three dimensions $(x, y, z)$ independently. Given the measurement id $n$ and the sample id $k$, the reading from vibration sensor $m$ is thus a 3-dimensional vector $\mathbf{a}_{mnk} = (a^x_{mnk}, a^y_{mnk}, a^z_{mnk})$.
Similarly, given $l \in \{x, y, z\}$, we use $\mathbf{a}^l_{mn}$ to denote the vector of length $K$ containing all samples on direction $l$ in the $n$-th measurement from vibration sensor $m$, i.e., $\mathbf{a}^l_{mn} = (a^l_{mn1}, a^l_{mn2}, \ldots, a^l_{mnK})$.

Due to the Earth's gravity, the original acceleration readings may contain a bias irrelevant to the behavior of the target equipment. To eliminate the bias, we apply normalization over the original readings, by subtracting the average over all readings from one single measurement. Therefore, the normalized readings are calculated as $\hat{\mathbf{a}}^l_{mn} = \mathbf{a}^l_{mn} - \mathbf{1} \cdot \frac{\sum_{k=1}^{K} a^l_{mnk}}{K}$, in which $\mathbf{1}$ is the all-ones vector of length $K$.

Human Label Data To accurately capture the running status of the equipment, we also employ human experts to contribute labels on the running status of the equipment. There are four options for the experts when they are asked to label the usability of the equipment. We use $C$ to denote the complete set of labels. In our problem setting, there are four labels $C = \{C_1, \ldots, C_4\}$, with details of the labels listed below:

• $C_1$: known as Zone A, vibration of newly commissioned machines.
• $C_2$: known as Zone B, acceptable for unrestricted long-term operation.
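The gravity-bias normalization described above can be illustrated with a minimal Python sketch. It handles the K samples of one axis from one measurement. The function names `normalize_axis` and `rms` and the sample values are hypothetical; the root-mean-square helper uses the standard definition, which is an assumption here because the exact definition of the RMS feature $r_{mn}$ is not given in this part of the text.

```python
import math


def normalize_axis(readings):
    """Subtract the mean of one measurement from each of its K samples.

    This mirrors a_hat = a - 1 * (sum_k a_k / K) for a single axis l.
    """
    if not readings:
        raise ValueError("a measurement needs at least one sample")
    mean = sum(readings) / len(readings)
    return [a - mean for a in readings]


def rms(readings):
    """Standard root-mean-square of a list of samples (assumed definition)."""
    return math.sqrt(sum(a * a for a in readings) / len(readings))


# Hypothetical z-axis samples (in g) carrying a gravity offset of about 1 g.
raw_z = [1.02, 0.98, 1.01, 0.99]
centered = normalize_axis(raw_z)
print(centered)       # values scattered around zero
print(rms(centered))  # vibration energy with the offset removed
```

After centering, the samples of a measurement sum to zero (up to floating-point error), so a constant offset such as gravity no longer dominates an energy-type summary of the measurement.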
system to support periodical update, unless explicitly specified by the system administrator. For instance, the system's analysis period could be set as long as an hour and refreshed when $T_e$ is passed, i.e., $T_s^j = T_s^{j-1}$ and $T_e^j = T_e^{j-1} + 1$ hour, which forces the analytical engine to update the results every hour. To simplify our notation, we assume that $T_s$ and $T_e$ are predefined and unchanged, when the context is clear, in the rest of the paper. During the analysis period, the vibration sensors report their vibration measurements during their assigned time slot in every round, as shown in Fig. 4. To simplify our problem without loss of generality, we assume that all sensors send the same total number $N$ of measurements during the analysis period.
C. Problem Formulation

Given the abstract problem in III-A and the data model in the previous subsection, we provide a mathematical formulation for the predictive maintenance problem.

We use $\mathbf{t} = (t_1, t_2, t_3, t_4)$ to denote the class label vector, in which $t_k$ ($k = 1, 2, 3, 4$) is a binary indicator for the class label $C_k$. Since each piece of equipment of interest is attached with exactly one label at any time, we have $\sum_{k=1}^{4} t_k = 1$.
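As a small illustration of the class label vector, the following Python sketch builds the binary indicator vector for a label and recovers the label from it. The names `LABELS`, `one_hot` and `decode` are hypothetical and only serve this example.

```python
LABELS = ("C1", "C2", "C3", "C4")  # the four candidate labels in C


def one_hot(label):
    """Return t = (t1, t2, t3, t4) with t_k = 1 only for the given label."""
    if label not in LABELS:
        raise ValueError(f"unknown label: {label!r}")
    return tuple(1 if label == c else 0 for c in LABELS)


def decode(t):
    """Recover the label from an indicator vector, enforcing sum(t) == 1."""
    if len(t) != len(LABELS) or sum(t) != 1 or any(v not in (0, 1) for v in t):
        raise ValueError("t must contain exactly one 1 and otherwise 0s")
    return LABELS[t.index(1)]


t = one_hot("C3")
print(t, sum(t), decode(t))
```

The check in `decode` corresponds to the constraint that exactly one label applies to a piece of equipment at any time.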
Let $z_{mn}$ denote a feature mapping of $\mathbf{s}_{mn}$ by a mapping function $\phi(\mathbf{s})$. Given the training data in the database, $D = \{(\mathbf{s}_{mn}, q_{mn})\}$, the goal of model training is to construct an evaluation of the probability over any class label $C_k \in C$, i.e.,

$$P(q_{mn} = C_k \mid z_{mn}, D) \qquad (1)$$

Given the probability evaluation function, our model predicts the class label for an unseen sample $z_{mn}$ by finding the estimated class label $\hat{q}$ with the maximal likelihood, as

$$\hat{q} = \arg\max_{C_k \in C} P(q_{mn} = C_k \mid z_{mn}, D) \qquad (2)$$

In order to find the optimal function $\phi^*$ for the class label prediction task, we aim to optimize the following objective function:

$$\phi^* = \arg\max_{(z_{mn}, q_{mn}) \in D} P(q_{mn} \mid z_{mn}, D) \qquad (3)$$

Let $x_{mn}$ denote the service time of equipment $m$ at the $n$-th measurement since its installation. Then let $\mathbf{X}$ denote an $MN \times 1$ matrix of $x_{mn}$ sorted in increasing order, and $\mathbf{Z}$ denote an $MN \times 1$ matrix of $z_{mn}$ corresponding to $\mathbf{X}$.

Let $L$ denote the number of underlying equipment lifetime models. Then we assume that a linear model $z = \theta_i x$ holds for each lifetime model $i = 1, \cdots, L$. Let $\mathbf{w} = (w_1, \cdots, w_L)$ denote the label vector of binary indicators for the lifetime model, where $w_l \in \{0, 1\}$ and $\sum_{l=1}^{L} w_l = 1$. Then let $\mathbf{W}$ denote an $MN \times L$ matrix of $\mathbf{w}$ that assigns $z_{mn}$ to one of the lifetime models $l = 1, \cdots, L$.

Finally, our RUL estimation is done by first solving the following objective function:

$$\hat{\mathbf{W}}, \hat{\mathbf{\Theta}}, \hat{L} = \arg\min_{\mathbf{W}, \mathbf{\Theta}, L} \left\| \mathbf{X}^T \mathbf{W} \mathbf{\Theta} - \mathbf{Z} \right\| \qquad (4)$$

where $\mathbf{\Theta} = [\theta_1, \cdots, \theta_L]^T$, and then evaluating (2) for $\mathbf{Z}_{pred} = \mathbf{X}_{pred}^T \hat{\mathbf{W}} \hat{\mathbf{\Theta}}$, where $\mathbf{X}_{pred} = \mathbf{X} + \mathbf{X}_{RUL}$ and $\mathbf{X}_{RUL}$ is a remaining useful lifetime matrix for $m, n$.

In the following section, we will discuss how we run data cleaning, feature extraction and model optimization, based on the formulation above.

Fig. 7: Analytical workflow overview.

IV. METHODOLOGY

In this section, we describe our overall methodology and the algorithms developed for our predictive analytics. In Fig. 7, we present a layered architecture of the vibration analysis engine. The data retrieval layer provides a common RESTful API for the data transformation layer to retrieve data from the factory database and the sensor database given an analysis period. The data transformation layer performs various conversion functions to transform unitless raw data into various measurement data, such as acceleration in g (1 g ≈ 9.81 m/s²), power spectral density (g²/Hz), etc. The data preprocessing layer performs a number of tasks, including 1) outlier detection to remove invalid measurements and a moving average with a user-defined time window (1 day by default) to reduce noise in the sensor measurements, and 2) matrix construction for the various variables in Table II, while eliminating invalid measurements to prevent unwanted computations in the following layers. The feature matrix extraction layer extracts a feature matrix from the data matrix and stores it in the data analytics server. Finally, the RUL model layer aggregates the pumps' feature matrices and builds the RUL model. More details on these processing steps are elaborated in the following section.

A. Data Preprocessing

Low-cost MEMS-based sensors often suffer from long-term offset drift, by which the zero-offset of the acceleration measurement gradually increases or decreases over time. In our problem setting, it does affect our analysis results, because
a harmonic peak feature by defining a feature space with a group of pairs of significant peaks' values and frequencies in the PSD. The harmonic peak feature of $\mathbf{s}_n$ is formally defined by $\mathbf{p}_n = \{(f_{nk}, p_{nk})\}_{k=1,\cdots,n_p}$, where $p_{nk}$, $f_{nk}$, and $n_p$ are the peak value, the peak frequency, and the maximum number of peaks to be searched, respectively.

Fig. 9: Peak harmonic feature distance comparison. (Top) A PSD sample and its peak harmonic feature taken from a healthy condition, Zone A (baseline peak harmonic feature, with the detected harmonic peaks marked). (Second to bottom) Comparison of peak harmonic distances from the baseline feature (top) for PSD samples randomly chosen from other zones. Each panel plots power spectral density amplitude against frequency (Hz), from 0 to 2500 Hz.
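The harmonic peak feature defined above, a set of at most $n_p$ (peak frequency, peak value) pairs, can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the paper's peak search: a "significant" peak is taken here to be a local maximum of the PSD above a hypothetical `min_height`, and the `n_p` largest such peaks are kept. The function name `harmonic_peaks` and the toy PSD values are also assumptions.

```python
def harmonic_peaks(freqs, psd, n_p, min_height=0.0):
    """Return up to n_p (frequency, value) pairs for the largest PSD peaks.

    A peak is a bin that is strictly higher than its left neighbour, not
    lower than its right neighbour, and at least min_height (assumed rule).
    The result is sorted by frequency.
    """
    if len(freqs) != len(psd):
        raise ValueError("freqs and psd must have the same length")
    candidates = [
        (freqs[i], psd[i])
        for i in range(1, len(psd) - 1)
        if psd[i] > psd[i - 1] and psd[i] >= psd[i + 1] and psd[i] >= min_height
    ]
    # Keep the n_p strongest peaks, then order them by frequency.
    strongest = sorted(candidates, key=lambda fp: fp[1], reverse=True)[:n_p]
    return sorted(strongest)


# Toy PSD with peaks at 100 Hz and 400 Hz (hypothetical values).
freqs = [0, 100, 200, 300, 400, 500, 600]
psd = [0.1, 0.9, 0.2, 0.3, 2.5, 0.4, 0.1]
print(harmonic_peaks(freqs, psd, n_p=2))  # [(100, 0.9), (400, 2.5)]
print(harmonic_peaks(freqs, psd, n_p=1))  # [(400, 2.5)]
```

Capping the number of peaks at `n_p` keeps the feature small and of bounded size even when the spectrum is noisy, which is the role the maximum number of peaks plays in the definition.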
domain expert for Fab equipment and a Fab manager. To simplify the annotation process, we do not distinguish between Zone B and C. Therefore, we use Zone BC to indicate the [...] the human-generated labels include physical inspection, both audial (almost noise only) and visual, for pumps, and event logs from FICS as well. We collect labels for 2800 measurements, containing 700 labels on Zone A, 1400 labels on Zone BC, and 700 labels on Zone D, respectively. In some rare cases, the label could be invalid due to human mistakes. We simply discard such a label and the corresponding measurement from our data for model building and evaluation.

Note that all the selected pumps are of an identical model and come from the same pump manufacturer. Their expected lifetime follows the same distribution before their operation over the target equipment. However, their remaining useful lifetime (RUL) varies dramatically during their operation, due to the different environment and behavior of the target equipment. After 3 months of experiment, a domain expert and a Fab manager estimate the RUL of these 12 pumps by extensive diagnostics, and compare the diagnostic results against our analysis outputs.

B. Performance

In this part of the section, we report the results on classification and prediction tasks in our analysis system. For classification, we use only 50 labels for training and all other labels for testing. For prediction, we learn RUL model on entire

Fig. 10: 100 sample traces for Zone A, Zone BC, and Zone D (PSD amplitude against frequency (Hz), one panel per zone).

1) Classification: In Fig. 10 we show the PSD of 100 sample measurements in the frequency domain for Zone A, Zone BC, and Zone D, respectively. From the results, it is clear that the overall amplitude, shape and peak location in the frequency domain all differ from zone to zone. Note that zone BC and zone D become nearly indistinguishable when the random noise grows to cover each frequency area in the PSD measurement domain. Furthermore, from zone BC to zone D, the variance of the PSD at each frequency area increases proportionally, and the fluctuation of the PSD grows accordingly in zone D. The following results show that the harmonic peak distance allows us to capture the most distinguishing features in the presence of large signal fluctuation.

In Fig. 11, we show histograms of $D_a$, the peak harmonic distance from training samples assigned to zone A, for all 2800 labelled measurements given the pump's condition in zone A, zone BC and zone D. The corresponding probability density functions for each histogram are estimated using Gaussian kernel density estimation as well. We can easily find that the class-conditional density $Prob(D_a \mid \text{Zone } x)$ is clearly separated for all three types of labels. After simple calculation, we identify the optimal boundary between zone D and the other two zones at 0.21 in the peak harmonic distance domain.

Fig. 11: Probability density function estimates of $D_a$ given Zone $x \in \{A, BC, D\}$; the decision boundary of $D_a$ for Zone D is 0.21 (horizontal axis: peak harmonic distance from Zone A, $D_a$; vertical axis: number of samples).

Fig. 12: Precision (panels per zone; horizontal axis: number of training samples; compared metrics: peak harmonic distance, Euclidean distance, Mahalanobis distance, temperature).

Fig. 12, Fig. 13 and Fig. 14 present the precision, recall and accuracy of our classification algorithm using the peak harmonic feature over different numbers of training samples. We compare the performance against other feature metrics, including the Euclidean/Mahalanobis distance of vibration measurements and the temperature measurement from FICS. These results imply that our metric is much more stable, leading to better classification performance than the other metrics. In particular, they show that temperature data does not work for classification at all. This is because the equipment's temperature is greatly affected by the factory control system rather than by the equipment's inherent condition. Tab. III shows the confusion table given 15 training samples. In the table, the Euclidean and Mahalanobis distance features often misclassify measurements from zone D as zone BC. Such errors are mostly fatal to the Fab factory, and thus
TABLE III: Confusion table at number of training samples = 15.

time. The threshold value of $D_a$ between zone D and zone BC is plotted at 0.21 over service time as well. A pump with $D_a$ greater than 0.21 is very likely to be operating under the health condition of zone D, and hence its RUL becomes 0. The figure

Fig. 15: Two lifetime models ($D_a = b_{j1} D_x + b_{j0}$ for RUL Model I and Model II) found by our recursive RANSAC algorithm.
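To make the link between a linear lifetime model and the zone D threshold concrete, the following Python sketch reads the RUL off a fitted line $D_a = b_1 D_x + b_0$ as the time left until the line reaches 0.21. This reading is an assumption made for illustration and is not confirmed by the text as the exact estimation rule; the function name `estimate_rul`, the coefficient values and the requirement $b_1 > 0$ are likewise hypothetical.

```python
ZONE_D_THRESHOLD = 0.21  # decision boundary of D_a for zone D, from the text


def estimate_rul(service_time, b1, b0, threshold=ZONE_D_THRESHOLD):
    """Time left until the line D_a = b1 * D_x + b0 reaches the threshold.

    Assumes a degrading model (b1 > 0). Returns 0.0 once the modelled D_a
    is at or above the threshold, in the same time unit as service_time.
    """
    if b1 <= 0:
        raise ValueError("expected a degrading lifetime model with b1 > 0")
    current_da = b1 * service_time + b0
    if current_da >= threshold:
        return 0.0
    crossing_time = (threshold - b0) / b1
    return crossing_time - service_time


# Hypothetical model: D_a grows by 0.001 per time unit from an offset of 0.01.
print(estimate_rul(service_time=100, b1=0.001, b0=0.01))  # about 100 units left
print(estimate_rul(service_time=250, b1=0.001, b0=0.01))  # 0.0, already in zone D
```

With two lifetime models, as in Fig. 15, the same calculation would be applied with the coefficients of whichever model a pump is assigned to.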
network on the problem of degradation modeling. While all the approaches above work well in their problem settings, they share one common drawback. They are all based on experiments over well-controlled equipment samples, and are therefore impractical when used in real-world applications. Our proposal tackles this problem with a purely data-driven solution, without any assumption on the equipment under monitoring.

In data mining community, predictive maintenance is re-

Technology (KIAT) grant funded by the Korea government Ministry of Trade, Industry and Energy (No. N050600080). Zhang and Winslett are supported by the research grant for the HCCS Programme at the Advanced Digital Sciences Center from Singapore's Agency for Science, Technology and Research (A*STAR).
