DOI: 10.1145/1807167.1807203
research-article

Threshold query optimization for uncertain data

Published: 06 June 2010

Abstract

The probabilistic threshold query (PTQ) is one of the most common queries in uncertain databases: it returns all results whose probability of satisfying the query meets the threshold requirement. PTQ semantics are widely used in nearest-neighbor, range, and ranking queries, among others. In this paper, we investigate the general PTQ for arbitrary SQL queries that involve selections, projections and joins. Our uncertain database model combines attribute and tuple uncertainty and also supports correlations between arbitrary attribute sets. We address the PTQ optimization problem, which aims to make PTQ execution more efficient by enabling the enumeration of alternative query plans. We propose general optimization rules as well as rules specific to selections, projections and joins. We introduce a threshold operator (τ-operator) into the query plan and show that it is generally desirable to push the τ-operator down as far as possible.
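The central idea of the abstract, a threshold operator τ that can be moved down a query plan, can be illustrated with a small Python sketch. It assumes a simplified tuple-independent model, not the paper's richer model with attribute correlations, and the names `UTuple`, `select`, `tau`, and `join` are hypothetical. Pushing τ below a join is safe here because a joined result corresponds to a conjunction of events, and a conjunction can never be more probable than either of its parts: P(A ∧ B) ≤ min(P(A), P(B)). An input tuple below the threshold therefore cannot contribute to a joined answer above it. The product formula in `join` is only the independence assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical, simplified model (assumption: tuple independence, no correlations).
# Each tuple exists with probability `p`; its uncertain attribute `val` is a
# discrete distribution mapping value -> probability.
@dataclass
class UTuple:
    key: str
    p: float
    val: Dict[int, float]

def select(rel: List[UTuple], pred: Callable[[int], bool]) -> Dict[str, float]:
    """Selection: P(t in result) = P(t exists) * P(pred holds on t.val)."""
    out: Dict[str, float] = {}
    for t in rel:
        prob = t.p * sum(q for v, q in t.val.items() if pred(v))
        if prob > 0:
            out[t.key] = prob
    return out

def tau(res: Dict, threshold: float) -> Dict:
    """Threshold operator: keep only results with probability >= threshold."""
    return {k: pr for k, pr in res.items() if pr >= threshold}

def join(left: Dict[str, float], right: Dict[str, float],
         cond: Callable[[str, str], bool]) -> Dict[Tuple[str, str], float]:
    """Join under the independence assumption: probabilities multiply."""
    return {(a, b): pa * pb
            for a, pa in left.items()
            for b, pb in right.items()
            if cond(a, b)}

# Hypothetical example data and query.
R = [UTuple("r1", 0.9, {10: 0.5, 20: 0.5}),
     UTuple("r2", 0.4, {20: 1.0}),
     UTuple("r3", 1.0, {5: 0.2, 30: 0.8})]
S = [UTuple("s1", 0.8, {1: 1.0}),
     UTuple("s2", 0.3, {1: 0.5, 2: 0.5})]
T = 0.42                           # query threshold
pred_r = lambda v: v >= 15         # selection predicate on R
pred_s = lambda v: v == 1          # selection predicate on S
cross = lambda a, b: True          # join condition: plain cross product

# Plan 1: a single tau at the root of the plan.
late = tau(join(select(R, pred_r), select(S, pred_s), cross), T)

# Plan 2: tau pushed below the join onto both inputs (the root tau is kept).
r_in = tau(select(R, pred_r), T)
s_in = tau(select(S, pred_s), T)
pushed = tau(join(r_in, s_in, cross), T)

print(late == pushed, sorted(late))  # True [('r3', 's1')]
print(len(select(R, pred_r)) * len(select(S, pred_s)), len(r_in) * len(s_in))  # 6 2
```

Both plans return the same answer, but the pushed-down plan feeds 2 candidate pairs into the join instead of 6. The sketch deliberately leaves out projection: a projection that merges duplicates combines alternatives disjunctively, so its output probability can exceed that of any single input, and a τ placed below it could discard tuples that would have contributed to a qualifying answer.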

    Published In

    SIGMOD '10: Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
    June 2010
    1286 pages
    ISBN:9781450300322
    DOI:10.1145/1807167

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. probabilistic data
    2. query optimization
    3. threshold queries
    4. uncertain data

    Conference

    SIGMOD/PODS '10: International Conference on Management of Data
    June 6-10, 2010
    Indianapolis, Indiana, USA

    Acceptance Rates

    Overall Acceptance Rate 785 of 4,003 submissions, 20%

    Article Metrics

    • Downloads (last 12 months): 12
    • Downloads (last 6 weeks): 5
    Reflects downloads up to 02 Sep 2024.

    Cited By

    • (2024) Space-Efficient Indexes for Uncertain Strings. 2024 IEEE 40th International Conference on Data Engineering (ICDE), pp. 4828-4842. DOI: 10.1109/ICDE60146.2024.00367. Online publication date: 13-May-2024.
    • (2022) Probabilistic Databases. DOI: 10.1007/978-3-031-01879-4. Online publication date: 2-Mar-2022.
    • (2020) A survey of uncertain data management. Frontiers of Computer Science: Selected Publications from Chinese Universities 14(1), pp. 162-190. DOI: 10.1007/s11704-017-7063-z. Online publication date: 1-Feb-2020.
    • (2018) Efficient histogram-based range query estimation for dirty data. Frontiers of Computer Science: Selected Publications from Chinese Universities 12(5), pp. 984-999. DOI: 10.1007/s11704-016-5551-1. Online publication date: 1-Oct-2018.
    • (2016) SMe. Geoinformatica 20(1), pp. 19-58. DOI: 10.1007/s10707-015-0230-1. Online publication date: 1-Jan-2016.
    • (2014) Quasi-SLCA Based Keyword Query Processing over Probabilistic XML Data. IEEE Transactions on Knowledge and Data Engineering 26(4), pp. 957-969. DOI: 10.1109/TKDE.2013.67. Online publication date: 1-Apr-2014.
    • (2014) ProbKS: Keyword Search on Probabilistic Spatial Data. Database Systems for Advanced Applications, pp. 388-402. DOI: 10.1007/978-3-662-43984-5_30. Online publication date: 11-Jul-2014.
    • (2014) Range Queries on Uncertain Data. Algorithms and Computation, pp. 326-337. DOI: 10.1007/978-3-319-13075-0_26. Online publication date: 8-Nov-2014.
    • (2013) Top-k query processing in probabilistic databases with non-materialized views. Proceedings of the 2013 IEEE International Conference on Data Engineering (ICDE 2013), pp. 122-133. DOI: 10.1109/ICDE.2013.6544819. Online publication date: 8-Apr-2013.
    • (2013) A survey of queries over uncertain data. Knowledge and Information Systems 37(3), pp. 485-530. DOI: 10.1007/s10115-013-0638-6. Online publication date: 30-Apr-2013.