research-article

Probabilistic ranked queries in uncertain databases

Authors:

Lei Chen

EDBT '08: Proceedings of the 11th international conference on Extending database technology: Advances in database technology

Pages 511 - 522

https://doi.org/10.1145/1353343.1353406

Published: 25 March 2008

Abstract

Recently, many new applications, such as sensor data monitoring and mobile device tracking, have raised the issue of uncertain data management. Unlike "certain" data, the data in an uncertain database are not exact points; instead, each object often lies somewhere within a region. In this paper, we study ranked queries over uncertain data. Ranked queries have been studied extensively in the traditional database literature because of their popularity in many applications, such as decision making, recommendation, and data mining. Many proposals have been made to improve the efficiency of answering ranked queries. However, the existing approaches all assume that the underlying data are exact (or certain). Because of the intrinsic differences between uncertain and certain data, these methods are designed only for ranked queries in certain databases and cannot be applied directly to the uncertain case. Motivated by this, we propose novel solutions to speed up the probabilistic ranked query (PRank) over uncertain databases. Specifically, we introduce two effective pruning methods, spatial and probabilistic, that reduce the PRank search space. We then seamlessly integrate these pruning heuristics into the PRank query procedure. Extensive experiments demonstrate the efficiency and effectiveness of our approach in answering PRank queries, in terms of both wall clock time and the number of candidates to be refined.
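The abstract does not define PRank formally. As an illustration only, the following Python sketch assumes one plausible formalization: each uncertain object is a finite set of equally likely sample points inside its uncertainty region, objects are mutually independent, objects are ranked by a linear score f(x) = w · x (higher ranks first), and a PRank query with parameters k and threshold returns the objects whose probability of ranking within the top-k exceeds the threshold. The bound-based filter in `prank` is a simplified stand-in for spatial pruning, not the paper's technique; probabilistic pruning is omitted and the top-k probability is computed exactly over the samples. All names (`UncertainObject`, `score_bounds`, `topk_probability`, `prank`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class UncertainObject:
    """Hypothetical model: an uncertain object is a set of equally likely samples."""
    name: str
    samples: List[Point]


def score(w: Sequence[float], x: Point) -> float:
    """Assumed linear ranking function f(x) = w . x; higher scores rank first."""
    return sum(wi * xi for wi, xi in zip(w, x))


def score_bounds(w: Sequence[float], obj: UncertainObject) -> Tuple[float, float]:
    """Min/max of f over the bounding box (uncertainty region) of the object's samples."""
    dims = range(len(w))
    lo = [min(s[i] for s in obj.samples) for i in dims]
    hi = [max(s[i] for s in obj.samples) for i in dims]
    # A linear function attains its extremes at box corners, chosen per dimension.
    smin = sum(min(w[i] * lo[i], w[i] * hi[i]) for i in dims)
    smax = sum(max(w[i] * lo[i], w[i] * hi[i]) for i in dims)
    return smin, smax


def prob_at_most(ps: Sequence[float], m: int) -> float:
    """Pr[at most m of the independent events with probabilities ps occur]."""
    dist = [1.0] + [0.0] * m  # dist[c] = Pr[exactly c events so far], c <= m
    for p in ps:
        # Update in descending c so dist[c - 1] still holds the previous value.
        for c in range(m, 0, -1):
            dist[c] = dist[c] * (1 - p) + dist[c - 1] * p
        dist[0] *= 1 - p
    return sum(dist)


def topk_probability(w, objs: List[UncertainObject], idx: int, k: int) -> float:
    """Pr[objs[idx] ranks within the top-k], assuming independent objects.

    Ties are broken in favour of objs[idx] (only strictly higher scores outrank it).
    """
    target = objs[idx]
    others = [o for j, o in enumerate(objs) if j != idx]
    total = 0.0
    for s in target.samples:
        sigma = score(w, s)
        # Probability that each other object strictly outscores this sample.
        ps = [sum(score(w, t) > sigma for t in o.samples) / len(o.samples)
              for o in others]
        # The target is in the top-k iff at most k - 1 others outscore it.
        total += prob_at_most(ps, k - 1)
    return total / len(target.samples)


def prank(w, objs: List[UncertainObject], k: int, threshold: float):
    """Return (name, probability) for objects whose top-k probability exceeds threshold."""
    bounds = [score_bounds(w, o) for o in objs]
    result = []
    for i, obj in enumerate(objs):
        # Bound-based filter: an object whose lowest possible score beats obj's
        # highest possible score always outranks obj; k such objects rule obj out.
        sure_better = sum(1 for j, (lo_j, _) in enumerate(bounds)
                          if j != i and lo_j > bounds[i][1])
        if sure_better >= k:
            continue
        p = topk_probability(w, objs, i, k)
        if p > threshold:
            result.append((obj.name, p))
    return result


if __name__ == "__main__":
    objs = [
        UncertainObject("A", [(9.0, 9.0), (8.0, 8.0)]),
        UncertainObject("B", [(5.0, 5.0), (7.0, 7.0)]),
        UncertainObject("C", [(1.0, 1.0), (2.0, 2.0)]),
        UncertainObject("D", [(8.5, 8.5), (6.0, 6.0)]),
    ]
    print(prank((1.0, 1.0), objs, k=1, threshold=0.5))  # [('A', 0.75)]
```

Ties are broken in the target's favour, so top-1 probabilities can sum to more than 1 when scores tie; in the tie-free example, A (0.75) and D (0.25) sum to 1, while B and C are filtered out without refinement because A's lowest possible score (16) exceeds their highest (14 and 4). The exact computation compares every sample of a candidate against every sample of every other object, which is why cutting down candidates before refinement matters.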

Cited By

Yang S, Sun Y, Liu J, Xiao X, Li R, Wei Z (2022). Approximating probabilistic group steiner trees in graphs. Proceedings of the VLDB Endowment 16(2):343-355. DOI: 10.14778/3565816.3565834. Online publication date: 1-Oct-2022.
https://dl.acm.org/doi/10.14778/3565816.3565834
Wahab R, Mohd Rum S, Ibrahim H, Sidi F, Ishak I (2021). A Method for Processing Top-k Continuous Query on Uncertain Data Stream in Sliding Window Model. WSEAS Transactions on Systems and Control 16:261-269. DOI: 10.37394/23203.2021.16.22. Online publication date: 25-May-2021.
https://doi.org/10.37394/23203.2021.16.22
Ding X, Sheng S, Liu J, Zhou P (2021). Efficient Probabilistic K-NN Computation in Uncertain Sensor Networks. IEEE Transactions on Network Science and Engineering 8(3):2575-2587. DOI: 10.1109/TNSE.2021.3099864. Online publication date: 1-Jul-2021.
https://doi.org/10.1109/TNSE.2021.3099864

Index Terms

Probabilistic ranked queries in uncertain databases
1. Information systems
  1. Data management systems
    1. Database administration
    2. Database management system engines
      1. Database query processing
  2. Information systems applications
    1. Data mining
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Database theory
      1. Database query processing and optimization (theory)

Information

Published In

ACM Other conferences

EDBT '08: Proceedings of the 11th international conference on Extending database technology: Advances in database technology

March 2008

762 pages

ISBN:9781595939265

DOI:10.1145/1353343

Conference Chair: Noureddine Mouaddib
General Chair: Patrick Valduriez
Program Chairs: Alfons Kemper (Technische Universität München, Germany), Mokrane Bouzeghoub, Volker Markl, Laurent Amsaleg, Ioana Manolescu
Publications Chair: Jens Teubner

Copyright © 2008 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 25 March 2008

Qualifiers

Research-article

Conference

EDBT '08

EDBT '08: 11th International Conference on Extending Database Technology

March 25 - 29, 2008

Nantes, France

Acceptance Rates

Overall Acceptance Rate 7 of 10 submissions, 70%

Bibliometrics & Citations

Article Metrics

Total Citations: 66

Total Downloads: 826

Downloads (Last 12 months): 36

Downloads (Last 6 weeks): 13

Reflects downloads up to 09 Sep 2024

Citations

Züfle A (2020). Uncertain Spatial Data Management: An Overview. Handbook of Big Geospatial Data, pp. 355-397. DOI: 10.1007/978-3-030-55462-0_14. Online publication date: 17-Dec-2020.
https://doi.org/10.1007/978-3-030-55462-0_14
Davari M, Edalat A, Lieutier A, Bouyer P (2019). The convex hull of finitely generable subsets and its predicate transformer. Proceedings of the 34th Annual ACM/IEEE Symposium on Logic in Computer Science, pp. 1-14. DOI: 10.5555/3470152.3470178. Online publication date: 24-Jun-2019.
https://dl.acm.org/doi/10.5555/3470152.3470178
Schmid K, Züfle A (2019). Representative Query Answers on Uncertain Data. Proceedings of the 16th International Symposium on Spatial and Temporal Databases, pp. 140-149. DOI: 10.1145/3340964.3340974. Online publication date: 19-Aug-2019.
https://dl.acm.org/doi/10.1145/3340964.3340974
Davari M, Edalat A, Lieutier A (2019). The convex hull of finitely generable subsets and its predicate transformer. 2019 34th Annual ACM/IEEE Symposium on Logic in Computer Science (LICS), pp. 1-14. DOI: 10.1109/LICS.2019.8785680. Online publication date: Jun-2019.
https://doi.org/10.1109/LICS.2019.8785680
Dai C, Nutanong S, Chow C, Cheng R (2018). Entropy-Based Scheduling Policy for Cross Aggregate Ranking Workloads. IEEE Transactions on Services Computing 11(3):507-520. DOI: 10.1109/TSC.2016.2586062. Online publication date: 1-May-2018.
https://doi.org/10.1109/TSC.2016.2586062
Chen J, Feng L (2017). Efficient pruning for top-K ranking queries on attribute-wise uncertain datasets. Journal of Intelligent Information Systems 48(1):215-242. DOI: 10.1007/s10844-016-0403-x. Online publication date: 1-Feb-2017.
https://dl.acm.org/doi/10.1007/s10844-016-0403-x
Xiao G, Wu F, Zhou X, Li K, Xiao Z, Li K (2016). Probabilistic top-k range query processing for uncertain databases. Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology 31(2):1109-1120. DOI: 10.3233/JIFS-169040. Online publication date: 1-Jan-2016.
https://dl.acm.org/doi/10.3233/JIFS-169040