
Domination in the Probabilistic World: Computing Skylines for Arbitrary Correlations and Ranking Semantics

Published: 26 May 2014

Abstract

In a probabilistic database, deciding whether a tuple u is better than another tuple v has no single answer: it depends on the specific Probabilistic Ranking Semantics (PRS) one adopts to combine tuples' scores and probabilities.
In deterministic databases, skyline queries are known to be a remarkable alternative to (top-k) ranking queries, because they relieve the user of the burden of specifying a scoring function that combines the values of different attributes into a single score. The skyline of a deterministic relation R is the set of undominated tuples in R, where tuple u dominates tuple v iff u is better than or equal to v on all the attributes of interest and strictly better on at least one of them. Domination is equivalent to having s(u) ≥ s(v) for all monotone scoring functions s().
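The deterministic definition above can be sketched in a few lines of Python. This is a minimal illustration, not the article's algorithm: it assumes that higher attribute values are better, and the names `dominates`, `skyline`, and the sample relation `R` are hypothetical. It uses a naive quadratic scan rather than any of the optimized skyline algorithms from the literature.

```python
from typing import List, Sequence


def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """True iff u is >= v on every attribute and > v on at least one.

    Assumption: larger values are better on every attribute.
    """
    at_least_as_good = all(a >= b for a, b in zip(u, v))
    strictly_better = any(a > b for a, b in zip(u, v))
    return at_least_as_good and strictly_better


def skyline(relation: Sequence[Sequence[float]]) -> List[Sequence[float]]:
    """Return the undominated tuples, in input order (naive O(n^2) scan)."""
    return [u for u in relation if not any(dominates(v, u) for v in relation)]


# Hypothetical two-attribute relation.
R = [(3, 5), (4, 4), (2, 2), (4, 5), (1, 6)]
print(skyline(R))  # [(4, 5), (1, 6)]
```

Here (4, 5) dominates (3, 5), (4, 4), and (2, 2), while (1, 6) survives because no other tuple matches it on the second attribute.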
The skyline of a probabilistic relation Rp can be defined similarly as the set of P-undominated tuples in Rp, where u P-dominates v iff, whatever monotone scoring function is used to combine the skyline attributes, the PRS at hand deems u better than v. This definition, which applies to arbitrary ranking semantics and probabilistic correlation models, is parametric in the adopted PRS, and thus ensures that ranking and skyline queries always return consistent results.
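To make the quantification over scoring functions concrete, the sketch below tests P-domination against a finite sample of monotone scoring functions. Everything in it is an assumption made for illustration: the toy PRS (ranking by expected score, p(t) · s(t), with a non-strict comparison), the names `expected_score_prefers`, `may_p_dominate`, and `SAMPLE`, and the data. Because the sample is finite, the check can only refute P-domination; a `True` result means "not refuted by this sample", not a proof. It is not one of the P-domination rules derived in the article.

```python
from typing import Callable, Iterable, Sequence, Tuple

# A probabilistic tuple: (attribute values, existence probability).
ProbTuple = Tuple[Sequence[float], float]
Scoring = Callable[[Sequence[float]], float]


def expected_score_prefers(u: ProbTuple, v: ProbTuple, s: Scoring) -> bool:
    """Toy PRS (assumption): u is at least as good as v by expected score."""
    return u[1] * s(u[0]) >= v[1] * s(v[0])


def may_p_dominate(
    u: ProbTuple,
    v: ProbTuple,
    scoring_functions: Iterable[Scoring],
    prefers: Callable[[ProbTuple, ProbTuple, Scoring], bool] = expected_score_prefers,
) -> bool:
    """False as soon as one sampled scoring function ranks v above u."""
    return all(prefers(u, v, s) for s in scoring_functions)


# A small, hypothetical sample of monotone scoring functions.
SAMPLE = [
    lambda x: x[0] + x[1],
    lambda x: min(x),
    lambda x: max(x),
    lambda x: x[0],
    lambda x: x[1],
]

u = ((4.0, 5.0), 0.9)
v = ((3.0, 5.0), 0.6)
w = ((4.0, 5.0), 0.5)  # same attributes as u, but much less likely to exist

print(may_p_dominate(u, v, SAMPLE))  # True: no sampled function prefers v
print(may_p_dominate(w, v, SAMPLE))  # False: the sum gives 4.5 < 4.8
```

The second call shows why deterministic domination alone is not enough: w dominates v on the attributes, yet its low probability lets v overtake it under this semantics.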
In this article we provide an overall view of the problem of computing the skyline of a probabilistic relation. We show how, under mild conditions that hold for all known PRSs, checking P-domination can be cast as an optimization problem, whose complexity we characterize for a variety of combinations of ranking semantics and correlation models. For each case analyzed we also provide specific P-domination rules, which are exploited by the algorithm we detail for the case where the probabilistic model is known to the query processor. We also consider the case in which the probability of tuple events can only be obtained through an oracle, and describe another skyline algorithm for this loosely integrated scenario. Our experimental evaluation of P-domination rules and skyline algorithms confirms the theoretical analysis.

Supplementary Material

a14-bartolini-apndx.pdf (bartolini.zip)
Supplemental movie, appendix, image, and software files for Domination in the Probabilistic World: Computing Skylines for Arbitrary Correlations and Ranking Semantics.


    Published In

    ACM Transactions on Database Systems, Volume 39, Issue 2
    May 2014
    336 pages
    ISSN:0362-5915
    EISSN:1557-4644
    DOI:10.1145/2627748
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 26 May 2014
    Accepted: 01 February 2014
    Revised: 01 February 2014
    Received: 01 June 2013
    Published in TODS Volume 39, Issue 2


    Author Tags

    1. Skyline queries
    2. probabilistic database
    3. ranking semantics

    Qualifiers

    • Research-article
    • Research
    • Refereed

