1. Introduction
The decline in cost and the increase in use of sensors have led to both an abundance of sensor data and the development of sensor data analytics (SDA) [1,2,3]. A critical problem in SDA is the evaluation of sensor quality, especially in a network of sensors. This article describes an analytical tool in MATLAB (Version R2016b, MathWorks, Natick, MA, USA) that is designed to quickly and effectively identify faulty sensors out of a set of possibly hundreds involved in an experiment, study or sensor system.
The original motivation for the development of our software comes from a problem experienced by naval engineers who perform acoustic tests aimed at detecting undersea objects. Such acoustic tests are expensive, time-consuming, and rely on accurate, well-functioning microphones. However, verification of microphone data validity by specialized engineers would not take place for days, or sometimes weeks, after the tests were conducted. Inevitably, some verifications would indicate microphone failure, but, by the time of verification, re-testing would be impractical, if not impossible. This meant that when microphone errors were discovered, some data were deemed compromised and, potentially, unusable. Such large sensor systems are common in a number of settings, including wind tunnel systems where microphone arrays are used for beamforming. As analysis of data from natural systems may involve additional problems associated with identification of objects, we focus here on problems associated with microphone arrays in laboratory systems such as wind tunnels.
A variety of methods have been proposed for identifying outliers, anomalous observations and failing sensors. In a recent paper, for example, Liu and colleagues [4] developed an entropy-based method to model the rate at which sensor readings degrade, using data collected on a turbofan engine. Their first concern was to develop a method, described in [5], for selecting a subset of the sensors with which to train their model. This subsetting reduces the computational burden of working with the entire data set. Liu et al. later extended this simple framework to a more complete one that incorporates correlation between individual sensors. As a measure of correlation, Liu et al. used the mutual information between two sensors, which measures the extent to which knowledge about one sensor reduces uncertainty about the other. Owing to its greater simplicity, our method does not require subsetting. Thus, we are able to consider the entire output from our sensor system.
Similar to the work of [4,5], Pang et al. [6] applied Gaussian process (GP) regression to anomaly detection in time series data. Pang's approach can be described as a moving-window, one-step-ahead prediction. Given an initial collection of data points, they train a GP model and then use it to predict the next value in the series. As their method is a one-step-ahead prediction based on a Gaussian model, it can be likened to a Kalman filter (see [7]). As with the work of Liu and colleagues, that of Pang et al. relies on a rather involved model to perform predictions. GPs scale poorly with data, which precludes their general application to datasets of the size we consider.
Hayes and Capretz [8] outlined a general approach to anomaly detection in big data settings. They were concerned both with quick, scalable detection and with taking good account of correlations between multiple sensors for more robust detection. Their detector is essentially a univariate Gaussian predictor that uses historical data to estimate a mean and variance. While simple, it is fast and lightweight, allowing numerous sensors to be monitored individually. Methods based on means and variances may not be appropriate in cases where unusual observations significantly affect estimation of these quantities. The approach described below is similar in spirit to their method; however, it is nonparametric, using interquantile ranges in place of standard deviations to measure discrepancy from the central signal.
Unlike any of the other work considered, our method is implemented in a lightweight and user-friendly MATLAB interface. Our context also involves more user involvement with the data than the settings considered in the above papers [4,6,8]. While those papers consider industrial process monitoring applications, our data come from a research application. Although this macro might be used for automated anomaly detection, our current focus is on pointing out unexpected patterns in data for further inspection by researchers, and on identifying sensors that have failed.
2. Microphone Arrays
Modern wind tunnel testing involves multiple diverse sensor systems with anywhere from hundreds to thousands of individual sensors. These large-scale tests can be quite expensive and are usually time-sensitive. As such, any delays due to faulty instrumentation can have serious consequences. Equally serious is the possibility of discovering a sensor failure after the test has been completed, because time and effort will have been spent collecting useless information. Although such failures are sometimes correctable, the time involved in correcting them distracts the experimenter from achieving the experimental goals.
As such, any large scale sensor system needs methods to ensure that all the individual sensors are working as intended. Most commercial sensor systems contain rudimentary error detection for sensors within a given system, but these methods typically have no way of incorporating information about the ambient conditions under which they were run, or, more importantly, information from the output of other systems which are used in conjunction. This is a significant problem, as most large sensor systems in wind tunnel tests are made of smaller, unique sensor subsets. By combining the information from diverse sensor systems into a global error detection process, we can measure the extent to which sensors across systems are correlated and use that correlation to produce powerful predictions and error detection capabilities.
Thus, our work is motivated by two questions: (1) Can we detect large-scale microphone issues during or immediately following tests? and (2) If a small percentage of microphones are malfunctioning, can we use the surrounding microphones to infer the true values of the bad microphones with a range and degree of certainty useful for analytics?
Question 1 focuses on prevention; if faulty microphones can be discovered on location, it is much less costly to replace the bad microphones and retest than if the bad microphones are detected at a later time. Question 2 focuses on salvaging imperfect data.
The analytical tool described in this paper aims to address the first issue by automating much of the error identification process. The structure of the tool is described and evaluated using data from microphone arrays. Although the methods were developed specifically for microphone arrays, the program and tools can be useful in other situations where rapid assessment of sensor quality is required. We next discuss the statistical and graphical methods and their implementation. Results of a simulation are used to evaluate the error rate of the method for time series models. An example using data from a wind tunnel experiment is given to illustrate the package. Finally, some extensions are described. An appendix describes the macro in more detail and uses simulated data to illustrate the inputs and graphics.
3. Statistical Methods
Common microphone failures include complete failure, in which the microphone turns off, resulting in a flat signal with low variability; drift of the signal; and groups of signals that differ from regular signals. In addition, some microphones exhibit over-range and clipping. Over-range is defined here as having values that exceed the normal range of the microphone. Clipping occurs when a value, usually the uppermost or lowermost value, occurs multiple times.
The data set used in this analysis is viewed as consisting of measurements on K microphones or sensors. Each microphone is assumed to have the same starting and ending times and the same number of measurements. The measurements are indexed by a time sequence. Thus, we take $y_{ij}$ to be the measurement from microphone $i$ at time $j$, for $i = 1, \dots, K$ and $j = 1, \dots, T$.
A standard approach for evaluating unusual observations in statistical analysis makes use of means and standard deviations of the data at each time. However, this approach is limited because odd values can inflate standard deviations and bias means, resulting in reduced sensitivity. In addition, the approach requires a large number of sensors relative to the number of observations, and its use may be limited by correlations between measurements on the same sensor. For microphone data, it is not uncommon to have over 500,000 data values for a single microphone over a 30 s interval. Thus, we are typically dealing with a relatively small number of long series. In this work, the preference is for a nonparametric approach that is less sensitive to extreme values and reduces correlation over time. Our approach is now described.
The first step involves segmenting the series, determining common behavior in each segment, and identifying outliers. The approach is to divide the series for a sensor into M segments of equal or roughly equal length. If L is the length of a segment, then $M = T/L$. It is convenient in many applications to take the length of the segment in proportion to the sampling rate, as this will often result in all segments having the same length. We then re-index the data for microphone $i$ as $y_{imt}$, with $m = 1, \dots, M$ and $t = 1, \dots, L$. The second step is to define a statistic that summarizes the segment. For this purpose, we use a range that contains a specified percentage of observations, i.e., a range percentile. The range percentile is based on the quantiles of the data in each segment. Specifically, define the range percentile level $\gamma$, which determines the upper and lower quantiles and hence the quantile range, $IQR_{im}$. The value of $IQR_{im}$ is defined as the difference between the upper quantile $q_{im}(1 - (1-\gamma)/2)$ and the lower quantile $q_{im}((1-\gamma)/2)$ within each segment, where $q_{im}(p)$ is the value in the segment chosen so that a proportion $p$ of the values are less than $q_{im}(p)$.
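To make the segmentation concrete, the following MATLAB sketch computes the segment quantile ranges for a single simulated series. The variable names are ours rather than the GUI's, and $\gamma = 0.90$ reproduces the 5th/95th percentile range used below.

```matlab
% Per-segment quantile ranges for one series (illustrative sketch).
rng(1);
y     = randn(1e6, 1);         % stand-in for one microphone's series
L     = 1000;                  % segment length
M     = floor(numel(y) / L);   % number of segments
gamma = 0.90;                  % range percentile level
p     = [(1 - gamma)/2, 1 - (1 - gamma)/2];   % 0.05 and 0.95

Y     = reshape(y(1:M*L), L, M);   % one column per segment
q     = quantile(Y, p, 1);         % 2-by-M matrix of segment quantiles
IQRm  = q(2, :) - q(1, :);         % quantile range IQR_im per segment
```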
For example, for a $\gamma = 0.90$ range percentile, use a quantile interval based on the 5th and 95th percentiles of the $y_{imt}$ for each microphone $i$ and segment $m$. The $IQR_{im}$ is then the difference between the 95th and 5th percentiles. In our applications, the $\gamma = 0.90$ quantile interval is used; however, in implementation, other values of $\gamma$ may be used. A larger value of $\gamma$ would result in identifying individual observations, while a smaller value would focus on groups of observations. In addition, there are M segments per microphone; thus, there are M interquantile ranges, which we denote as $IQR_{i1}, \dots, IQR_{iM}$. With the range of typical data points provided by each $IQR_{im}$, outliers are determined based on the median and pseudo standard deviation (PSD) of the set of $IQR_{im}$ values for each microphone. Extreme values are defined as values that are more extreme than the median plus or minus a critical value times the PSD. When the number of segments is large, the critical value might be the appropriate quantile from the standard normal distribution (i.e., the critical z value). Other options are to use a critical t value or an adjusted t value as described below. The critical value is user defined and is associated with a desired level of certainty in correctly identifying sensor oddities.
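A minimal version of the resulting outlier rule, continuing the sketch above, might look as follows. We take the PSD to be the interquartile range of the segment ranges divided by 1.349 (its value for a standard normal), which is one common definition; the exact estimator is left to the implementation.

```matlab
% Flag segments whose quantile range is extreme relative to the others.
medR  = median(IQRm);
psd   = iqr(IQRm) / 1.349;     % pseudo standard deviation (assumed form)
zcrit = 3.0;                   % user-defined critical value (see Section 4)

isHigh = IQRm > medR + zcrit * psd;   % unusually variable segments
isLow  = IQRm < medR - zcrit * psd;   % unusually flat segments
```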
In addition to identifying outliers, checks are also made to identify over-range data and clipped data. Over-range is determined by identifying observations that exceed the specified limits of the microphone, while clipping is identified when the observed minimum or maximum value is duplicated, as if the data were censored for reasons that are unclear. For example, real data collected from accurate microphones often appear to be governed by a Gaussian distribution. Histograms of clipped data present large spikes in the tails of an otherwise Gaussian-shaped histogram, and over-ranged data present two or more distributional modes. For this paper, we select candidate data for over-ranging and clipping by analyzing tail patterns and searching for histogram bins near the tails that are unusually large. Then, because data from clipped microphones are more similar to Gaussian distributions than data from over-ranged microphones, we run an Anderson–Darling test for normality [9,10] on samples from the microphones identified in the previous step, and we sort the microphones based on the results of this test.
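The clipping screen could be prototyped along the following lines. The bin count, tail width and the factor of 5 are arbitrary illustrative choices, and adtest is the Anderson–Darling test in the Statistics and Machine Learning Toolbox.

```matlab
% Screen one channel (y, as in the sketch above) for clipping: look for
% tall histogram bins at the tails, then rank candidates by normality.
counts   = histcounts(y, 100);
tails    = [counts(1:3), counts(end-2:end)];
interior = counts(4:end-3);
suspect  = any(tails > 5 * median(interior));   % ad hoc threshold

if suspect
    s = y(randperm(numel(y), 5000));   % subsample before testing
    [h, pval] = adtest(s);             % h == 1 rejects normality
end
```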
The final check we make to assess data quality is whether microphones, as a whole, are malfunctioning. Microphones that are not functioning may still emit a low signal, and thus may be determined to be off when a large number of observations fall below some user-determined threshold. Graphical displays are used to help distinguish different types of patterns and reduce false signals. These include power spectrum plots of microphones and histograms of observations for individual microphones. A heatmap is used to display the full set of oddities for comparison across microphones to identify patterns. The heatmap is a colored matrix that has M rows associated with segments and K columns associated with sensors. Four colors are used in the matrix, corresponding to a high extreme value, a low extreme value, a value that is off, and a normal value.
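A bare-bones rendering of such a heatmap, with an assumed integer coding of the four states and colors of our own choosing, could be:

```matlab
% Oddity heatmap: M segment rows by K microphone columns.  The codes
% 0/1/2/3 (normal/high/low/off) and the colors are illustrative.
M      = 20;                       % segments per channel (example size)
K      = 31;                       % number of microphones (example size)
status = zeros(M, K);              % filled in by the checks above

imagesc(status);
colormap([1 1 1; 1 0 0; 0 0 1; 0.5 0.5 0.5]);  % white/red/blue/gray
caxis([-0.5 3.5]);                 % map the four codes to the four colors
xlabel('Microphone'); ylabel('Segment');
```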
4. Simulation Results and Sensitivity Measure
One critical decision in outlier and anomaly detection is the numerical criterion for deciding if an observation is sufficiently extreme to be deemed an outlier or potential outlier. While the actual determination might involve other factors and checks on the microphone, the determination of the cutoff is critical, as too small a value will result in numerous false signals and too high a value in numerous missed outliers. To evaluate the method and estimate potential error rates, a simulation is used. Data were generated to mimic the microphone data in terms of the number of data points and segment size. Ten thousand series were generated from a variety of time series models (AR(1), AR(2) and ARMA), assuming normally distributed errors. Each series was of length 250,000, and segments of size 1000 were used, resulting in 250 evaluations. The statistical approach above was used to determine if a value was an outlier using two criteria: first, a cutoff of 3.0 PSDs (common in quality control methods) and, second, a cutoff chosen by the Bonferroni method. The cutoff of 3.0 is chosen from a normal distribution corresponding to an upper tail probability of approximately 0.0015; with this cutoff, the probability of observing a high outlier or something more extreme would be 0.0015 (similarly for a lower tail outlier). While conservative for a single calculation, when there are many observations to assess, a fixed cutoff may result in a number of false signals. The Bonferroni method adjusts the cutoff so that the tail probability accounts for the number of tests that are considered. Suppose the interest is in an overall error rate $\alpha$ (e.g., 5%). Then, simply divide $\alpha$ by the number of tests to specify the error rate for each test. The adjusted cutoff corresponds to the quantile from the normal or t-distribution associated with an upper tail probability of $\alpha/(2T)$, where T is the number of tests.
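For the normal case, the adjusted cutoff is a single call to norminv; with $\alpha = 0.05$ and T = 20 tests, this reproduces the z = 3.02 used in Section 5.

```matlab
% Bonferroni-adjusted normal cutoff for T two-sided tests.
alpha = 0.05;                       % overall error rate
T     = 20;                         % number of tests (segments)
zcut  = norminv(1 - alpha/(2*T));   % = 3.02 for alpha = 0.05, T = 20
```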
The results of the simulation are summarized in Table 1. When a cutoff of 3.0 is used and no correction is made, an excessive number of observations are flagged as odd. In the case of AR(1) models, the error rate increases as the autocorrelation increases. A similar pattern holds for data generated from other time series models, and error rates can be quite high. The Bonferroni method controls the error rate and results in a small rate of rejections, as expected. As the autocorrelation increases, the Bonferroni rejection rate increases slightly, suggesting that the associated error rate may be affected by autocorrelation. Note that these results are for a situation with a large number of tests (250 in this case); when the number is smaller, the cutoff of 3.0 may be adequate. Thinning of the series may also be used to reduce correlation; however, this results in a loss of observations. Alternatively, an adjustment to the sample size may be helpful (i.e., adjust the degrees of freedom by multiplying by one minus the autocorrelation). In simulations, we found a sample-size adjustment to result in fewer outliers, although the approach tended to be conservative. Clearly, graphical displays and scientific judgment should be used to make final decisions about the quality of the observations and microphones. Simulated data are also used to illustrate the program and graphical displays in the appendix.
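One cell of such a simulation might be sketched as below; the AR(1) coefficient is illustrative, and the flag rate is the fraction of segments exceeding the uncorrected 3.0 PSD rule.

```matlab
% One simulation cell: AR(1) series with Gaussian errors, 250 segments.
phi  = 0.5;                         % illustrative autocorrelation
x    = filter(1, [1 -phi], randn(250000, 1));   % x_t = phi*x_{t-1} + e_t
Q    = quantile(reshape(x, 1000, []), [0.05 0.95], 1);
R    = Q(2, :) - Q(1, :);           % 250 segment quantile ranges
rate = mean(abs(R - median(R)) > 3.0 * iqr(R) / 1.349);  % flag rate
```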
5. Evaluation of Microphone Array Data
This section describes an experiment that will be used as an example. The experiment is part of a study that focuses on the acoustic effects of a discontinuity or step in an otherwise smooth surface. For example, the surface of a ship may be mostly smooth except for discontinuities in the hull where plates are joined. Such steps may be acoustically loud, and interest lies in the study of the effect of steps on the acoustic characteristics of flow [11] and in the design of efficient low-noise vehicles. The program was used to evaluate an experimental study using data collected in the Stability Wind Tunnel at Virginia Tech (Blacksburg, VA, USA) (Figure 1). The wind tunnel allows for experiments with anechoic (free from echo) or semi-anechoic conditions. The data evaluated below are part of a study on pressure fluctuations produced by forward steps immersed in a turbulent boundary layer [11].
Figure 2 shows a schematic of the test section, test wall and anechoic chambers. In a typical study, sidewalls are constructed of Kevlar cloth; however, for this study, one wall was removed and replaced with an adjustable wall suitable for growing high-Reynolds-number two-dimensional turbulent boundary layers (Figure 3).
In the overall experiment, step sizes of 3.6, 14.6 and 58.3 mm are used with flow speeds of 30 and 60 m/s. A variety of measurements are made, including the mean wall Cp, mean velocity profiles, 3D turbulence profiles and wall pressure spectra. Below, we evaluate one part of the experiment with a focus on wall pressure spectra. Measurements are made using 31 wall-mounted microphones placed forward and aft of a sharp step at a particular location (Figure 4). A sampling rate of 51,200 Hz is used. To segment the data, segments of size 51,200 are used, resulting in 20 segments. A cutoff of z = 3.02 is used for the critical value, based on the Bonferroni method adjusting for 20 tests.
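In code, the settings for this example amount to the following (a sketch; in practice these are entered through the GUI):

```matlab
% Settings used for the wind tunnel example.
fs   = 51200;                    % sampling rate (Hz)
L    = 51200;                    % segment length: one second of data
M    = 20;                       % segments per channel (20 s of data)
zcut = norminv(1 - 0.05/(2*M));  % Bonferroni cutoff for 20 tests: 3.02
```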
Diagnostic plots of the 31 channels are presented in Figure 5 and suggest problems with several microphones. Three microphones are identified as being off, and five channels suggest clipping. The three off channels have power spectra that are considerably different from those of the other microphones. This is also illustrated in the segment bar chart and the heatmap. A variety of other channels have minor violations. A plot of channel 4 illustrates clipping (Figure 6). Note that the histogram plotted in the lower left panel shows peaks in the distribution at the upper and lower values. Channel 6 is one channel identified as being off, as indicated by the flat time series, the narrow histogram and the simple power spectrum (Figure 7).
6. Conclusions
The MATLAB Graphical User Interface (GUI) allows rapid evaluation of microphone data quality and identifies extreme segments and microphones that are not functioning, as well as clipping and over-range. In an example with 31 microphones, we identified three microphones that were not functioning and several microphones with clipping. Identification of these issues may lead to improved analysis of the information in the microphone system. The GUI provides some flexibility for a user, including selection of the critical threshold for identifying unusual groups of observations. It is recommended that the user exercise the GUI on data of known quality to identify the threshold prior to use on experimental data. We have found the threshold of three to be somewhat liberal with real data, i.e., it tends to identify unusual observations that do not affect resulting analyses. Evaluation of individual series using a Bonferroni correction can also be helpful in selecting the critical threshold.
The GUI can be extended in a number of ways. First, more interactive plots can be added. Although not included in this version of the GUI, we have developed a feature that permits interactive evaluation of the data by letting the user click on a suspected spectrum to visualize the suspect series. A future implementation will focus on systems with multiple sets of sensors and correlated systems. The current analysis treats the microphones as providing independent information. To incorporate correlation, a Gaussian process approach is being developed that allows for correlation between microphones of the same type and between different types of sensors while maintaining the nonparametric nature of the analysis.