research-article

Asymptotically efficient algorithms for skyline probabilities of uncertain data

Published: 02 June 2011

Abstract

Skyline computation is widely used in multicriteria decision making. As research on uncertain databases draws increasing attention, skyline queries over uncertain data have also been studied. Some earlier work focused on probabilistic skylines with a given threshold; Atallah and Qi [2009] studied the problem of computing skyline probabilities for all instances of uncertain objects without the use of thresholds, and proposed an algorithm with subquadratic time complexity. In this work, we propose a new algorithm for computing all skyline probabilities that is asymptotically faster: worst-case $O(n\sqrt{n}\log n)$ time and $O(n)$ space for 2D data, and $O(n^{2-1/d}\log^{d-1} n)$ time and $O(n\log^{d-2} n)$ space for $d$-dimensional data. Furthermore, we study the online version of the problem: given any query point $p$ (unknown until query time), return the probability that no instance in the given data set dominates $p$. We propose an algorithm that answers such an online query for $d$-dimensional data in $O(n^{1-1/d}\log^{d-1} n)$ time after preprocessing the data in $O(n^{2-1/d}\log^{d-1} n)$ time and space.
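To make the two quantities in the abstract concrete, the following is a naive quadratic-time sketch. It is not the authors' algorithm and makes no attempt at their bounds; it only computes the same quantities by brute force under assumed definitions. Assumed here: each uncertain object $U_i$ has instances whose probabilities sum to at most 1, distinct objects are independent, smaller values are better in every dimension, and the skyline probability of an instance $u \in U_i$ is $p(u)\prod_{j \ne i}\bigl(1 - \sum_{v \in U_j,\ v \prec u} p(v)\bigr)$, the usual instance-level definition in the probabilistic-skyline literature. The names `Instance`, `dominates`, `non_domination_prob`, and `all_skyline_probabilities` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical representation: (object id, point, probability of this instance).
Instance = Tuple[str, Tuple[float, ...], float]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a dominates b (assumes smaller is better in every dimension)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_domination_prob(p: Sequence[float], instances: List[Instance],
                        skip_obj: Optional[str] = None) -> float:
    """Probability that no instance, ignoring object skip_obj, dominates p.

    Instances of one object are treated as mutually exclusive, so object j
    dominates p with probability equal to the summed probability of its
    instances that dominate p; distinct objects are treated as independent.
    """
    dom_mass: Dict[str, float] = defaultdict(float)
    for obj, q, prob in instances:
        if obj != skip_obj and dominates(q, p):
            dom_mass[obj] += prob
    result = 1.0
    for mass in dom_mass.values():
        result *= 1.0 - mass
    return result


def all_skyline_probabilities(instances: List[Instance]) -> List[float]:
    """Naive O(n^2 d) baseline: skyline probability of every instance."""
    return [prob * non_domination_prob(p, instances, skip_obj=obj)
            for obj, p, prob in instances]


# Usage on a toy data set with two objects, A and B.
data = [("A", (1, 3), 0.5), ("A", (3, 1), 0.5),
        ("B", (2, 2), 0.6), ("B", (4, 4), 0.4)]
print(all_skyline_probabilities(data))        # → [0.5, 0.5, 0.6, 0.0]
print(non_domination_prob((2.5, 2.5), data))  # online query; → 0.4
```

Calling `non_domination_prob` without `skip_obj` answers the online query for an arbitrary point $p$, while `all_skyline_probabilities` excludes each instance's own object. Both cost $O(nd)$ per evaluation, so the offline problem takes $O(n^2 d)$ time here; Atallah and Qi's subquadratic algorithm and the algorithms described above improve on this baseline.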

References

[1]
Abrahamson, K. 1987. Generalized string matching. SIAM J. Comput. 16, 1039--1051.
[2]
Afshani, P., Agarwal, P. K., Arge, L., Larsen, K. G., and Phillips, J. M. 2011. (Approximate) uncertain skylines. In Proceedings of the 14th International Conference on Database Theory.
[3]
Antova, L., Jansen, T., Koch, C., and Olteanu, D. 2008. Fast and simple relational processing of uncertain data. In Proceedings of the IEEE 24th International Conference on Data Engineering. IEEE Computer Society, Los Alamitos, CA, 983--992.
[4]
Atallah, M. J. and Qi, Y. 2009. Computing all skyline probabilities for uncertain data. In Proceedings of the 28th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS). ACM, New York, NY, 279--287.
[5]
Benjelloun, O., Sarma, A. D., Halevy, A., and Widom, J. 2006. ULDBs: Databases with uncertainty and lineage. In Proceedings of the 32nd International Conference on Very Large Data Bases (VLDB). VLDB Endowment, 953--964.
[6]
Bentley, J. L. 1980. Multidimensional divide-and-conquer. Comm. ACM 23, 214--229.
[7]
Beskales, G., Soliman, M. A., and Ilyas, I. F. 2008. Efficient search for the top-k probable nearest neighbors in uncertain databases. Proc. VLDB Endow. 1, 326--339.
[8]
Börzsönyi, S., Kossmann, D., and Stocker, K. 2001. The skyline operator. In Proceedings of the 17th International Conference on Data Engineering. IEEE Computer Society, Los Alamitos, CA, 421--430.
[9]
Boulos, J., Dalvi, N., Mandhani, B., Mathur, S., Re, C., and Suciu, D. 2005. MYSTIQ: a system for finding more answers by using probabilities. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD). ACM, New York, NY, 891--893.
[10]
Chen, B.-C., LeFevre, K., and Ramakrishnan, R. 2007. Privacy skyline: Privacy with multidimensional adversarial knowledge. In Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB). VLDB Endowment, 770--781.
[11]
Cheng, R., Chen, J., Mokbel, M., and Chow, C.-Y. 2008. Probabilistic verifiers: Evaluating constrained nearest-neighbor queries over uncertain data. In Proceedings of the IEEE 24th International Conference on Data Engineering. IEEE Computer Society, Los Alamitos, CA, 973--982.
[12]
Cheng, R., Kalashnikov, D. V., and Prabhakar, S. 2003. Evaluating probabilistic queries over imprecise data. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD). ACM, New York, NY, 551--562.
[13]
Cheng, R., Kalashnikov, D. V., and Prabhakar, S. 2004a. Querying imprecise data in moving object environments. IEEE Trans. Knowl. Data Eng. 16, 1112--1127.
[14]
Cheng, R., Xia, Y., Prabhakar, S., Shah, R., and Vitter, J. S. 2004b. Efficient indexing methods for probabilistic threshold queries over uncertain data. In Proceedings of the 30th International Conference on Very Large Data Bases (VLDB). Vol. 30, VLDB Endowment, 876--887.
[15]
Cormode, G., Li, F., and Yi, K. 2009. Semantics of ranking queries for probabilistic data and expected ranks. In Proceedings of the IEEE International Conference on Data Engineering. IEEE Computer Society, Los Alamitos, CA, 305--316.
[16]
Dellis, E. and Seeger, B. 2007. Efficient computation of reverse skyline queries. In Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB). VLDB Endowment, 291--302.
[17]
Hua, M., Pei, J., Zhang, W., and Lin, X. 2008. Ranking queries on uncertain data: A probabilistic threshold approach. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD). ACM, New York, NY, 673--686.
[18]
Kung, H. T., Luccio, F., and Preparata, F. P. 1975. On finding the maxima of a set of vectors. J. ACM 22, 4, 469--476.
[19]
Li, J., Saha, B., and Deshpande, A. 2009. A unified approach to ranking in probabilistic databases. Proc. VLDB Endow. 2, 502--513.
[20]
Lian, X. and Chen, L. 2008a. Monochromatic and bichromatic reverse skyline search over uncertain databases. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD). ACM, New York, NY, 213--226.
[21]
Lian, X. and Chen, L. 2008b. Probabilistic ranked queries in uncertain databases. In Proceedings of the 11th International Conference on Extending Database Technology: Advances in Database Technology (EDBT). ACM, New York, NY, 511--522.
[22]
Lin, X., Yuan, Y., Zhang, Q., and Zhang, Y. 2007. Selecting stars: The k most representative skyline operator. In Proceedings of the IEEE 23rd International Conference on Data Engineering (ICDE). 86--95.
[23]
Ljosa, V. and Singh, A. K. 2008. Top-k spatial joins of probabilistic objects. In Proceedings of the IEEE 24th International Conference on Data Engineering. IEEE Computer Society, Los Alamitos, CA, 566--575.
[24]
Mehlhorn, K. 1984. Data Structures and Algorithms 3: Multi-dimensional Searching and Computational Geometry. Springer-Verlag, Berlin.
[25]
Morse, M., Patel, J. M., and Jagadish, H. V. 2007. Efficient skyline computation over low-cardinality domains. In Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB). VLDB Endowment, 267--278.
[26]
Pei, J., Fu, A. W.-C., Lin, X., and Wang, H. 2007a. Computing compressed multidimensional skyline cubes efficiently. In Proceedings of the IEEE 23rd International Conference on Data Engineering (ICDE). 96--105.
[27]
Pei, J., Jiang, B., Lin, X., and Yuan, Y. 2007b. Probabilistic skylines on uncertain data. In Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB). VLDB Endowment, 15--26.
[28]
Preparata, F. and Shamos, M. 1985. Computational Geometry: An Introduction. Springer-Verlag.
[29]
Sarma, A. D., Theobald, M., and Widom, J. 2008. Exploiting lineage for confidence computation in uncertain and probabilistic databases. In Proceedings of the IEEE 24th International Conference on Data Engineering. IEEE Computer Society, Los Alamitos, CA, 1023--1032.
[30]
Singh, S., Mayfield, C., Shah, R., Prabhakar, S., Hambrusch, S., Neville, J., and Cheng, R. 2008. Database support for probabilistic attributes and tuples. In Proceedings of the IEEE 24th International Conference on Data Engineering. IEEE Computer Society, Los Alamitos, CA, 1053--1061.
[31]
Soliman, M. A., Ilyas, I. F., and Chang, K. C.-C. 2007. URank: formulation and efficient evaluation of top-k queries in uncertain databases. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD). ACM, New York, NY, 1082--1084.
[32]
Tao, Y., Cheng, R., Xiao, X., Ngai, W. K., Kao, B., and Prabhakar, S. 2005. Indexing multi-dimensional uncertain data with arbitrary probability density functions. In Proceedings of the 31st International Conference on Very Large Data Bases (VLDB). VLDB Endowment, 922--933.
[33]
Vlachou, A., Doulkeridis, C., Kotidis, Y., and Vazirgiannis, M. 2007. SKYPEER: Efficient subspace skyline computation over distributed data. In Proceedings of the IEEE 23rd International Conference on Data Engineering (ICDE). 416--425.
[34]
Willard, D. E. 1985. New data structures for orthogonal range queries. SIAM J. Comput. 14, 232--253.
[35]
Wu, P., Agrawal, D., Egecioglu, O., and El Abbadi, A. 2007. DeltaSky: Optimal maintenance of skyline deletions without exclusive dominance region generation. In Proceedings of the IEEE 23rd International Conference on Data Engineering (ICDE). 486--495.
[36]
Zhang, W., Lin, X., Zhang, Y., Wang, W., and Yu, J. X. 2009. Probabilistic skyline operator over sliding windows. In Proceedings of the IEEE International Conference on Data Engineering. IEEE Computer Society, Los Alamitos, CA, 1060--1071.
[37]
Zhang, X. and Chomicki, J. 2008. Semantics and evaluation of top-k queries in probabilistic databases. Distrib. Datab. 26, 67--126.
[38]
Zhu, L., Zhou, S., and Guan, J. 2007. Efficient skyline retrieval on peer-to-peer networks. In Proceedings of the Future Generation Communication and Networking—Volume 02 (FGCN). IEEE Computer Society, Los Alamitos, CA, 309--314.

Information

Published In

ACM Transactions on Database Systems  Volume 36, Issue 2
May 2011
257 pages
ISSN:0362-5915
EISSN:1557-4644
DOI:10.1145/1966385
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 02 June 2011
Accepted: 01 February 2011
Revised: 01 December 2010
Received: 01 May 2010
Published in TODS Volume 36, Issue 2

Author Tags

  1. Uncertain data
  2. probabilistic skyline

Qualifiers

  • Research-article
  • Research
  • Refereed

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 3
  • Downloads (last 6 weeks): 0
Reflects downloads up to 12 Sep 2024.
