DOI: 10.5555/1182635.1164145

Online outlier detection in sensor data using non-parametric models

Published: 01 September 2006

Abstract

Sensor networks have recently found many popular applications in a number of different settings. Sensors at different locations can generate streaming data, which can be analyzed in real time to identify events of interest. In this paper, we propose a framework that computes, in a distributed fashion, an approximation of multi-dimensional data distributions in order to enable complex applications in resource-constrained sensor networks. We motivate our technique in the context of the problem of outlier detection. We demonstrate how our framework can be extended in order to identify either distance- or density-based outliers in a single pass over the data, and with limited memory requirements. Experiments with synthetic and real data show that our method is efficient and accurate, and compares favorably to other proposed techniques. We also demonstrate the applicability of our technique to other related problems in sensor networks.
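
To make the idea of single-pass, bounded-memory, distance-based outlier detection concrete, the following is a minimal sketch and not the method of the paper. It assumes one-dimensional readings, a fixed-size sliding window, and an Epanechnikov kernel estimator. The class name `WindowedOutlierDetector` and the parameters `window`, `radius`, `min_neighbours`, and `bandwidth` are hypothetical names chosen for illustration. A reading is flagged when the estimated number of window values within `radius` of it falls below `min_neighbours`.

```python
import random
from collections import deque


def epanechnikov_cdf(u: float) -> float:
    """CDF of the Epanechnikov kernel K(u) = 0.75 * (1 - u^2) on [-1, 1]."""
    if u <= -1.0:
        return 0.0
    if u >= 1.0:
        return 1.0
    return 0.5 + 0.75 * u - 0.25 * u ** 3


class WindowedOutlierDetector:
    """Flags 1-D readings with few estimated neighbours in a sliding window.

    Illustrative assumption: the whole window is kept in memory; a real
    sensor deployment would likely keep only a sample of it.
    """

    def __init__(self, window: int, radius: float,
                 min_neighbours: float, bandwidth: float):
        self.values = deque(maxlen=window)  # bounded memory: last `window` readings
        self.radius = radius
        self.min_neighbours = min_neighbours
        self.bandwidth = bandwidth

    def estimated_neighbours(self, p: float) -> float:
        """Kernel estimate of how many window values lie in [p - radius, p + radius]."""
        lo, hi = p - self.radius, p + self.radius
        h = self.bandwidth
        # Each stored value contributes the mass of its kernel inside [lo, hi].
        return sum(
            epanechnikov_cdf((hi - v) / h) - epanechnikov_cdf((lo - v) / h)
            for v in self.values
        )

    def observe(self, p: float) -> bool:
        """Score p against the current window, then add it. True means outlier."""
        is_outlier = (
            len(self.values) == self.values.maxlen  # wait until the window is full
            and self.estimated_neighbours(p) < self.min_neighbours
        )
        self.values.append(p)
        return is_outlier


if __name__ == "__main__":
    random.seed(0)
    detector = WindowedOutlierDetector(window=200, radius=0.5,
                                       min_neighbours=5, bandwidth=0.3)
    for _ in range(200):
        detector.observe(random.gauss(20.0, 1.0))  # fill the window with typical readings
    print(detector.observe(35.0))  # a reading far from everything in the window
```

Each reading is scored once against a summary of recent data and then folded into that summary, so the stream is processed in a single pass and memory stays bounded by `window`. Replacing the full window with a small random sample would reduce memory further at the cost of a noisier estimate.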

Published In

VLDB '06: Proceedings of the 32nd international conference on Very large data bases
September 2006
1269 pages

Sponsors

  • SIGMOD: ACM Special Interest Group on Management of Data
  • K.I.S.S. SIG on Databases
  • AJU Information Technology Co., Ltd
  • US Army ITC-PAC Asian Research Office
  • Google Inc.
  • The Database Society of Japan
  • Samsung SOS
  • Advanced Information Technology Research Center
  • Naver
  • Microsoft
  • Korea Information Science Society
  • SK telecom
  • Systems Applications Products
  • ORACLE
  • International Business Management
  • Air Force Office of Scientific Research/Asian Office of Aerospace R&D
  • Kosef
  • Kaist
  • LG Electronics
  • CCF-DBS

Publisher

VLDB Endowment
