DOI: 10.1145/1807167.1807187
research-article

PODS: a new model and processing algorithms for uncertain data streams

Published: 06 June 2010

Abstract

Uncertain data streams, where data is incomplete, imprecise, and even misleading, have been observed in many environments. Feeding such data streams to existing stream systems produces results of unknown quality, which is of paramount concern to monitoring applications. In this paper, we present the PODS system that supports stream processing for uncertain data naturally captured using continuous random variables. PODS employs a unique data model that is flexible and allows efficient computation. Built on this model, we develop evaluation techniques for complex relational operators, i.e., aggregates and joins, by exploring advanced statistical theory and approximation. Evaluation results show that our techniques can achieve high performance while satisfying accuracy requirements, and significantly outperform a state-of-the-art sampling method. A case study further shows that our techniques can enable a tornado detection system (for the first time) to produce detection results at stream speed and with much improved quality.
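As a rough illustration of what evaluating an aggregate over continuous random variables can look like, the sketch below sums a window of uncertain readings in closed form. It is not the PODS algorithm: the abstract does not specify the data model, so the sketch assumes each reading is an independent Gaussian, and the names `Gaussian`, `sum_independent`, and `prob_greater` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Gaussian:
    """An uncertain value modeled as N(mean, var). Assumption, not from the paper."""
    mean: float
    var: float


def sum_independent(values):
    """SUM of independent Gaussians: means add and variances add."""
    return Gaussian(sum(v.mean for v in values), sum(v.var for v in values))


def prob_greater(g, threshold):
    """P(X > threshold) for X ~ N(g.mean, g.var), via the complementary error function."""
    z = (threshold - g.mean) / math.sqrt(g.var)
    return 0.5 * math.erfc(z / math.sqrt(2))


# A window of three uncertain readings (illustrative numbers only).
window = [Gaussian(10.0, 4.0), Gaussian(12.0, 1.0), Gaussian(8.0, 4.0)]
total = sum_independent(window)

print(total)                      # distribution of the windowed SUM
print(prob_greater(total, 33.0))  # chance the SUM exceeds a monitoring threshold
```

Under these assumptions the result is itself a distribution, so a monitoring query can report how likely a threshold is exceeded instead of a single number of unknown quality. A sampling-based approach would instead draw many values per reading and aggregate each draw, which is the kind of cost a closed-form computation avoids.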



    Published In

    SIGMOD '10: Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
    June 2010
    1286 pages
    ISBN:9781450300322
    DOI:10.1145/1807167
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. data models
    2. relational processing
    3. uncertain data streams

    Conference

    SIGMOD/PODS '10: International Conference on Management of Data
    June 6 - 10, 2010
    Indianapolis, Indiana, USA

    Acceptance Rates

    Overall Acceptance Rate 785 of 4,003 submissions, 20%


