Unsupervised outlier detection in sensor networks using aggregation tree

K Zhang, S Shi, H Gao, J Li - … conference on advanced data mining and …, 2007 - Springer
K Zhang, S Shi, H Gao, J Li
International Conference on Advanced Data Mining and Applications, 2007. Springer
Abstract
Outlier detection has attracted growing attention in sensor network applications. Identifying outliers can be used to filter false data, find faulty nodes, and discover interesting events. A few papers have addressed this problem. However, some of them incur high communication cost, some require the user to pre-set correct thresholds, and some produce approximate rather than exact results. In this paper, a new unsupervised approach is proposed to detect the global top n outliers in the network. The approach can answer both snapshot queries and continuous queries. Two novel concepts, the modifier set and the candidate set for the global outliers, are defined. A commit-disseminate-verify mechanism for outlier detection in an aggregation tree is also provided. Using this mechanism and these two concepts, the global top n outliers can be detected by exchanging short messages across the whole tree. We prove theoretically that the results generated by our approach are exact. The experimental results show that our approach is the most communication-efficient of the methods compared. Moreover, our approach does not need any pre-specified threshold. It can easily be extended to multi-dimensional data and is suitable for detecting outliers under various definitions.
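To make "global top n outliers" concrete, the following is a minimal Python sketch of a naive centralized baseline: every reading is shipped up the aggregation tree to the sink, which then ranks them. It is not the commit-disseminate-verify mechanism of the paper, whose details the abstract does not give. The outlier score (distance to the k-th nearest neighbour), the tree layout, and the names `collect` and `top_n_outliers` are all assumptions chosen for illustration.

```python
import heapq


def collect(tree, node):
    """Gather (node_id, reading) pairs from the subtree rooted at `node`.

    `tree` maps a node id to (list_of_readings, list_of_child_ids).
    This layout is hypothetical, not taken from the paper.
    """
    readings, children = tree[node]
    pairs = [(node, r) for r in readings]
    for child in children:
        pairs.extend(collect(tree, child))
    return pairs


def top_n_outliers(pairs, n, k=1):
    """Rank readings by distance to their k-th nearest neighbour.

    Assumed outlier definition: a larger k-NN distance means "more outlying".
    Requires more than k readings in total.
    """
    scored = []
    for i, (node, x) in enumerate(pairs):
        # Distances from this reading to every other reading, ascending.
        dists = sorted(abs(x - y) for j, (_, y) in enumerate(pairs) if j != i)
        scored.append((dists[k - 1], node, x))
    # Keep the n readings with the largest scores.
    return [(node, x) for _, node, x in heapq.nlargest(n, scored)]


# Hypothetical four-node aggregation tree rooted at "sink".
tree = {
    "sink": ([20.1], ["a", "b"]),
    "a": ([20.3, 35.7], ["c"]),
    "b": ([19.9, 20.0], []),
    "c": ([20.2, 5.0], []),
}
result = top_n_outliers(collect(tree, "sink"), n=2)
```

This baseline needs no threshold and returns an exact answer, but it forwards every reading to the sink, which is the kind of communication cost the abstract says the proposed approach avoids.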