DOI: 10.1145/1807167.1807242
research-article

Evaluation of probabilistic threshold queries in MCDB

Published: 06 June 2010

Abstract

MCDB is a prototype database system for managing stochastic models for uncertain data. In this paper, we study the problem of how to use MCDB to answer statistical queries that search for database objects which satisfy some filter condition with probability greater than (or less than) a user-specified threshold. For example: "Which packages will arrive late with > 5% probability?" "Which regions will see more than a 2% decline in sales with > 50% probability?" "What items will be out of stock by Friday with > 20% probability?" We consider both the systems aspects and the statistical aspects of the problem.
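
To make the query type concrete, here is a minimal sketch of a probabilistic threshold query answered by naive fixed-sample Monte Carlo estimation. It is an illustration only and is not the evaluation method developed in the paper or MCDB's interface. The function name threshold_query, the package names, the normal delivery-time model, and the sample count are all assumptions introduced for this example.

```python
import random

def threshold_query(objects, sample_fn, predicate, p, n_samples=1000, seed=0):
    """Return objects whose estimated P(predicate holds) exceeds p.

    Hypothetical helper: a naive estimator that draws a fixed number of
    Monte Carlo samples per object and compares the hit fraction to p.
    """
    rng = random.Random(seed)  # seeded so that runs are reproducible
    result = {}
    for key, params in objects.items():
        # Count how many simulated outcomes satisfy the filter condition.
        hits = sum(predicate(sample_fn(params, rng)) for _ in range(n_samples))
        estimate = hits / n_samples
        if estimate > p:
            result[key] = estimate
    return result

# Assumed toy model: delivery time in days ~ Normal(mean, std).
packages = {"pkg_fast": (2.0, 0.5), "pkg_slow": (5.0, 1.0)}
deadline = 4.0

# "Which packages will arrive late with > 5% probability?"
late = threshold_query(
    packages,
    sample_fn=lambda params, rng: rng.gauss(*params),
    predicate=lambda days: days > deadline,
    p=0.05,
)
print(late)
```

A fixed sample count is the simplest choice but it ignores the statistical question of how many samples are needed to decide, with controlled error, whether an object's probability lies above or below the threshold.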


    Published In

    SIGMOD '10: Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
    June 2010
    1286 pages
    ISBN:9781450300322
    DOI:10.1145/1807167
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tag

    1. algorithms

    Qualifiers

    • Research-article

    Conference

    SIGMOD/PODS '10: International Conference on Management of Data
    June 6 - 10, 2010
    Indianapolis, Indiana, USA

    Acceptance Rates

    Overall Acceptance Rate 785 of 4,003 submissions, 20%

    Cited By

    • (2021) Efficiently Answering Durability Prediction Queries. Proceedings of the 2021 International Conference on Management of Data, pages 591-604. DOI: 10.1145/3448016.3457305. Online publication date: 9-Jun-2021
    • (2014) Relative Accuracy Evaluation. PLoS ONE, 9(8):e103853. DOI: 10.1371/journal.pone.0103853. Online publication date: 18-Aug-2014
    • (2014) Model-data Ecosystems. Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pages 76-87. DOI: 10.1145/2594538.2594562. Online publication date: 18-Jun-2014
    • (2013) Efficient and scalable monitoring and summarization of large probabilistic data. Proceedings of the 2013 SIGMOD/PODS Ph.D. symposium, pages 61-66. DOI: 10.1145/2483574.2483586. Online publication date: 23-Jun-2013
    • (2013) Dealing with Uncertainty. IEEE Transactions on Knowledge and Data Engineering, 25(11):2463-2482. DOI: 10.1109/TKDE.2012.179. Online publication date: 1-Nov-2013
    • (2013) A survey of queries over uncertain data. Knowledge and Information Systems, 37(3):485-530. DOI: 10.1007/s10115-013-0638-6. Online publication date: 30-Apr-2013
    • (2012) Accuracy-Aware Uncertain Stream Databases. Proceedings of the 2012 IEEE 28th International Conference on Data Engineering, pages 174-185. DOI: 10.1109/ICDE.2012.96. Online publication date: 1-Apr-2012
    • (2012) Efficient Threshold Monitoring for Distributed Probabilistic Data. Proceedings of the 2012 IEEE 28th International Conference on Data Engineering, pages 1120-1131. DOI: 10.1109/ICDE.2012.34. Online publication date: 1-Apr-2012
    • (2011) Probabilistic threshold join over distributed uncertain data. Proceedings of the 12th international conference on Web-age information management, pages 68-80. DOI: 10.5555/2035562.2035573. Online publication date: 14-Sep-2011
    • (2011) The monte carlo database system. ACM Transactions on Database Systems, 36(3):1-41. DOI: 10.1145/2000824.2000828. Online publication date: 26-Aug-2011
