Abstract
We consider the maximal vector problem on uncertain data, recently posed by work on processing skyline queries over probabilistic data streams in the database context. Let D_n be a set of n points in a d-dimensional space, each of which occurs with some probability, and let q (0 < q ⩽ 1) be a probability threshold. The problem is to estimate the expected size of the probabilistic skyline, which consists of all points whose probability of not being dominated by any other point in D_n is at least q. We prove that the expected size is O(min{n, (−ln q)(ln n)^{d−1}}), assuming that the value distributions of the dimensions are independent and that the values of the points along each dimension are distinct. The main idea of the proof is to derive a recurrence for the expected size and solve it. Our results reveal the relationship between the probability threshold q and the expected size of the probabilistic skyline, and show that the upper bound is poly-logarithmic in n when q is not extremely small.
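To make the definition concrete, the following is a minimal Python sketch of a probabilistic skyline computation. It is an illustration, not the authors' method. It rests on several assumptions that the abstract does not state: a point dominates another when it is strictly larger in every dimension, points occur independently of one another, and a point's skyline probability is its own occurrence probability multiplied by the probability that none of its dominators occurs. All function names are hypothetical, and `average_skyline_size` draws occurrence probabilities uniformly at random purely for demonstration.

```python
import random
from typing import List, Sequence, Tuple

# A point is (coordinates, occurrence probability).
Point = Tuple[Sequence[float], float]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Assumed convention: a dominates b if a is strictly larger in every dimension."""
    return all(x > y for x, y in zip(a, b))


def skyline_probability(i: int, points: List[Point]) -> float:
    """Probability that point i occurs and no point dominating it occurs.

    Assumes points occur independently, so the probabilities multiply.
    """
    coords_i, p_i = points[i]
    prob = p_i
    for j, (coords_j, p_j) in enumerate(points):
        if j != i and dominates(coords_j, coords_i):
            prob *= 1.0 - p_j
    return prob


def probabilistic_skyline(points: List[Point], q: float) -> List[int]:
    """Indices of points whose skyline probability is at least the threshold q."""
    return [i for i in range(len(points)) if skyline_probability(i, points) >= q]


def average_skyline_size(n: int, d: int, q: float, trials: int, seed: int = 0) -> float:
    """Monte Carlo average of the skyline size over random data sets.

    Coordinates are drawn independently and uniformly per dimension;
    occurrence probabilities are also uniform (a demonstration choice only).
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        pts = [([rng.random() for _ in range(d)], rng.random()) for _ in range(n)]
        total += len(probabilistic_skyline(pts, q))
    return total / trials


if __name__ == "__main__":
    demo = [((3, 3), 0.5), ((2, 2), 0.9), ((1, 4), 0.8)]
    print(probabilistic_skyline(demo, 0.5))
    print(average_skyline_size(n=200, d=3, q=0.3, trials=20))
```

In the small demo, the point (2, 2) is dominated only by (3, 3), so under the stated assumptions its skyline probability is 0.9 × (1 − 0.5) = 0.45, which falls below a threshold of q = 0.5. Running `average_skyline_size` for growing n gives an empirical quantity that can be set beside the bound in the abstract, bearing in mind that the bound is asymptotic and hides a constant factor.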
Cite this article
Yang, Y., Wang, Y. Towards estimating expected sizes of probabilistic skylines. Sci. China Inf. Sci. 54, 2554–2564 (2011). https://doi.org/10.1007/s11432-011-4387-4
DOI: https://doi.org/10.1007/s11432-011-4387-4