Outlier detection is a well-established area of study for statistical data. However, most existing outlier detection techniques target the regular data model, in which the entire dataset is available for random access. Typical techniques construct a standard data distribution or model from the entire dataset and then run their detection algorithms over each data point. Such techniques are not suitable for online data streams, where the entire dataset, due to its unbounded volume, is not available for random access. Moreover, the data distribution in a data stream changes over time, which challenges existing techniques that assume a constant standard data distribution for the entire dataset. In addition, data streams are characterized by uncertainty, which imposes further complexity. In this work we propose two outlier detection techniques, called Distance-Based Outlier Detection for Data Streams (DB-ODDS) and Automatic Outlier Detection for Data Streams (A-ODDS). Both techniques are based on a novel, continuously adaptive data distribution function that addresses all of these issues of data streams.