DOI: 10.1145/1353343.1353406

Probabilistic ranked queries in uncertain databases

Published: 25 March 2008

Abstract

Recently, many new applications, such as sensor data monitoring and mobile device tracking, have raised the issue of uncertain data management. Unlike "certain" data, objects in an uncertain database are not exact points; instead, each object is often located somewhere within a region. In this paper, we study ranked queries over uncertain data. Ranked queries have been studied extensively in the traditional database literature because of their popularity in applications such as decision making, recommendation, and data mining. Many proposals have been made to improve the efficiency of answering ranked queries. However, all existing approaches assume that the underlying data are exact (or certain). Because of the intrinsic differences between uncertain and certain data, these methods are designed only for ranked queries in certain databases and cannot be applied directly to the uncertain case. Motivated by this, we propose novel solutions to speed up the probabilistic ranked query (PRank) over uncertain databases. Specifically, we introduce two effective pruning methods, spatial and probabilistic, to reduce the PRank search space. We then seamlessly integrate these pruning heuristics into the PRank query procedure. Extensive experiments demonstrate the efficiency and effectiveness of our approach in answering PRank queries, in terms of both wall clock time and the number of candidates to be refined.
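The abstract does not spell out the PRank definition or the two pruning rules, so the following is only a minimal Python sketch under stated assumptions, not the authors' method. It assumes each uncertain object is a discrete set of (point, probability) samples, objects are independent, the ranking function is a linear weighted sum where a smaller score ranks higher, and PRank returns every object whose probability of being in the top-k is at least a threshold `alpha`. The names `prank`, `prob_beats`, `prob_fewer_than_k`, and `alpha` are hypothetical, and the two pruning tests are simple sound bounds chosen for illustration: a bound-based test standing in for spatial pruning, and a cheap probability upper bound standing in for probabilistic pruning.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical data model (an assumption, not taken from the paper):
# an uncertain object is a list of (point, probability) samples whose
# probabilities sum to 1, and different objects are independent.
Sample = Tuple[Tuple[float, ...], float]


def score(point: Sequence[float], weights: Sequence[float]) -> float:
    """Linear ranking function; in this sketch a SMALLER score ranks higher."""
    return sum(w * x for w, x in zip(weights, point))


def prob_beats(samples: List[Sample], weights: Sequence[float], s: float) -> float:
    """Probability that an object's score is strictly below s (it outranks s)."""
    return sum(p for pt, p in samples if score(pt, weights) < s)


def prob_fewer_than_k(probs: List[float], k: int) -> float:
    """Pr{fewer than k of the independent events occur} (Poisson-binomial DP)."""
    dist = [1.0] + [0.0] * k  # dist[c] for c < k is Pr{exactly c}; dist[k] is Pr{>= k}
    for p in probs:
        new = [0.0] * (k + 1)
        for c, mass in enumerate(dist):
            new[c] += mass * (1.0 - p)
            new[min(c + 1, k)] += mass * p
        dist = new
    return sum(dist[:k])


def prank(objects: Dict[str, List[Sample]], weights: Sequence[float],
          k: int, alpha: float) -> Dict[str, float]:
    """Hypothetical PRank sketch: objects whose top-k probability is >= alpha."""
    # Score bounds per object (min/max over its samples).
    bounds = {}
    for oid, samples in objects.items():
        scores = [score(pt, weights) for pt, _ in samples]
        bounds[oid] = (min(scores), max(scores))

    answers = {}
    for oid, samples in objects.items():
        best = bounds[oid][0]
        others = [j for j in objects if j != oid]

        # Bound-based pruning: if at least k other objects score below oid
        # in every possible world, oid can never be in the top-k.
        if sum(1 for j in others if bounds[j][1] < best) >= k:
            continue

        # Probability-based pruning: evaluating at oid's best score gives an
        # upper bound on its top-k probability; drop it if below alpha.
        upper = prob_fewer_than_k([prob_beats(objects[j], weights, best) for j in others], k)
        if upper < alpha:
            continue

        # Refinement: exact top-k probability, summed over oid's samples.
        exact = 0.0
        for pt, p in samples:
            s = score(pt, weights)
            exact += p * prob_fewer_than_k([prob_beats(objects[j], weights, s) for j in others], k)
        if exact >= alpha:
            answers[oid] = exact
    return answers


if __name__ == "__main__":
    objs = {
        "A": [((1, 1), 0.5), ((3, 3), 0.5)],  # scores 2 or 6
        "B": [((2, 2), 1.0)],                 # score 4
        "C": [((5, 5), 1.0)],                 # score 10, always outranked by A and B
    }
    print(prank(objs, (1.0, 1.0), k=1, alpha=0.3))  # {'A': 0.5, 'B': 0.5}
```

In this toy example, C is discarded by the bound test alone because A and B always score lower than it, while A and B each rank first with probability 0.5. The upper bound needs one probability computation per object, whereas refinement needs one per sample, so pruning first keeps the number of candidates sent to refinement small.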

Information

Published In

ACM Other conferences
EDBT '08: Proceedings of the 11th international conference on Extending database technology: Advances in database technology
March 2008
762 pages
ISBN:9781595939265
DOI:10.1145/1353343
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article

Conference

EDBT '08

Acceptance Rates

Overall Acceptance Rate 7 of 10 submissions, 70%

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 36
  • Downloads (last 6 weeks): 13
Reflects downloads up to 09 Sep 2024

Cited By

  • (2022) Approximating probabilistic group steiner trees in graphs. Proceedings of the VLDB Endowment 16:2 (343-355). DOI: 10.14778/3565816.3565834. Online publication date: 1-Oct-2022.
  • (2021) A Method for Processing Top-k Continuous Query on Uncertain Data Stream in Sliding Window Model. WSEAS Transactions on Systems and Control 16 (261-269). DOI: 10.37394/23203.2021.16.22. Online publication date: 25-May-2021.
  • (2021) Efficient Probabilistic K-NN Computation in Uncertain Sensor Networks. IEEE Transactions on Network Science and Engineering 8:3 (2575-2587). DOI: 10.1109/TNSE.2021.3099864. Online publication date: 1-Jul-2021.
  • (2020) Uncertain Spatial Data Management: An Overview. Handbook of Big Geospatial Data (355-397). DOI: 10.1007/978-3-030-55462-0_14. Online publication date: 17-Dec-2020.
  • (2019) The convex hull of finitely generable subsets and its predicate transformer. Proceedings of the 34th Annual ACM/IEEE Symposium on Logic in Computer Science (1-14). DOI: 10.5555/3470152.3470178. Online publication date: 24-Jun-2019.
  • (2019) Representative Query Answers on Uncertain Data. Proceedings of the 16th International Symposium on Spatial and Temporal Databases (140-149). DOI: 10.1145/3340964.3340974. Online publication date: 19-Aug-2019.
  • (2019) The convex hull of finitely generable subsets and its predicate transformer. 2019 34th Annual ACM/IEEE Symposium on Logic in Computer Science (LICS) (1-14). DOI: 10.1109/LICS.2019.8785680. Online publication date: Jun-2019.
  • (2018) Entropy-Based Scheduling Policy for Cross Aggregate Ranking Workloads. IEEE Transactions on Services Computing 11:3 (507-520). DOI: 10.1109/TSC.2016.2586062. Online publication date: 1-May-2018.
  • (2017) Efficient pruning for top-K ranking queries on attribute-wise uncertain datasets. Journal of Intelligent Information Systems 48:1 (215-242). DOI: 10.1007/s10844-016-0403-x. Online publication date: 1-Feb-2017.
  • (2016) Probabilistic top-k range query processing for uncertain databases. Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology 31:2 (1109-1120). DOI: 10.3233/JIFS-169040. Online publication date: 1-Jan-2016.
