Flexible Skylines: Dominance for Arbitrary Sets of Monotone Functions

Published: 10 December 2020

Abstract

Skyline and ranking queries are two popular, alternative ways of discovering interesting data in large datasets. Skyline queries are simple to specify, as they just return the set of all non-dominated tuples, thereby providing an overall view of potentially interesting results. However, they are not equipped with any means to accommodate user preferences or to control the cardinality of the result set. Ranking queries adopt, instead, a specific scoring function to rank tuples, and can easily control the output size. While specifying a scoring function allows one to give different importance to different attributes by means of, e.g., weight parameters, choosing the “right” weights to use is known to be a hard problem.
In this article, we embrace the skyline approach by introducing an original framework able to capture user preferences by means of constraints on the weights used in a scoring function, which is typically much easier than specifying precise weight values. To this end, we introduce the novel concept of F-dominance, i.e., dominance with respect to a family of scoring functions F: a tuple t is said to F-dominate tuple s when t is always better than or equal to s according to all the functions in F.
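To make the definition concrete, here is a minimal Python sketch of an F-dominance test. It rests on assumptions not stated in the abstract: F is taken to be the family of weighted sums whose weight vectors lie in a convex polytope given by its vertices, lower scores are treated as better, and dominance additionally requires a strictly better score for at least one function in F. Because the score difference between two tuples is linear in the weights, checking the vertices is enough. The names `score`, `f_dominates`, and `VERTICES` are illustrative only.

```python
from typing import Sequence

Vector = Sequence[float]


def score(weights: Vector, tup: Vector) -> float:
    """Weighted-sum score of a tuple (assumption: lower is better)."""
    return sum(w * x for w, x in zip(weights, tup))


def f_dominates(t: Vector, s: Vector, vertices: Sequence[Vector]) -> bool:
    """Return True if t F-dominates s for every weight vector in the polytope.

    The difference score(w, t) - score(w, s) is linear in w, so its sign
    over the whole polytope is determined by its values at the vertices.
    """
    diffs = [score(w, t) - score(w, s) for w in vertices]
    never_worse = all(d <= 0 for d in diffs)
    somewhere_better = any(d < 0 for d in diffs)  # assumed strictness condition
    return never_worse and somewhere_better


# Hypothetical constraint: w1 >= w2, with w1 + w2 = 1 and w1, w2 >= 0.
# The feasible weights form a segment with these two vertices.
VERTICES = [(1.0, 0.0), (0.5, 0.5)]

print(f_dominates((1.0, 3.0), (2.0, 2.0), VERTICES))  # → True
print(f_dominates((2.0, 2.0), (1.0, 3.0), VERTICES))  # → False
print(f_dominates((1.0, 4.0), (2.0, 2.0), VERTICES))  # → False
```

In this example the tuple (1, 3) does not dominate (2, 2) in the classical sense, since it is worse on the second attribute, yet it F-dominates (2, 2) once the weights are constrained so that the first attribute counts at least as much as the second.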
Based on F-dominance, we present two flexible skyline (F-skyline) operators, both returning a subset of the skyline: nd, characterizing the set of non-F-dominated tuples; po, referring to the tuples that are also potentially optimal, i.e., best according to some function in F. While nd and po coincide and reduce to the traditional skyline when F is the family of all monotone scoring functions, their behaviors differ when subsets thereof are considered. We discuss the formal properties of these new operators, show how to implement them efficiently, and evaluate them on both synthetic and real datasets.
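The difference between nd and po can be illustrated with a small Python sketch. It is a simplification under stated assumptions, not the implementation evaluated in the article: tuples have two attributes, lower scores are better, F is the family of scores w*x + (1-w)*y with the single weight w restricted to an interval [lo, hi], tuples are distinct, and "best" means strictly lower than every other tuple. All function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def score(w: float, p: Point) -> float:
    """Score w*x + (1-w)*y of a two-attribute tuple (lower is better)."""
    return w * p[0] + (1 - w) * p[1]


def f_dominates(t: Point, s: Point, lo: float, hi: float) -> bool:
    """t is never worse than s for w in [lo, hi] and strictly better somewhere."""
    # The score difference is linear in w, so the two endpoints suffice.
    diffs = [score(w, t) - score(w, s) for w in (lo, hi)]
    return all(d <= 0 for d in diffs) and any(d < 0 for d in diffs)


def nd(points: List[Point], lo: float, hi: float) -> List[Point]:
    """Tuples that no other tuple F-dominates."""
    return [t for t in points
            if not any(f_dominates(s, t, lo, hi) for s in points if s != t)]


def potentially_optimal(t: Point, points: List[Point],
                        lo: float, hi: float) -> bool:
    """True if some w in [lo, hi] makes t strictly better than all others."""
    lower, upper = float("-inf"), float("inf")  # open bounds on w
    for s in points:
        if s == t:
            continue
        # score(w, t) - score(w, s) = a*w + b, and we need a*w + b < 0.
        b = t[1] - s[1]
        a = (t[0] - s[0]) - b
        if a == 0:
            if b >= 0:
                return False          # t never beats s
        elif a > 0:
            upper = min(upper, -b / a)  # requires w < -b/a
        else:
            lower = max(lower, -b / a)  # requires w > -b/a
    # The open interval (lower, upper) must intersect the closed [lo, hi].
    return lower < upper and lower < hi and upper > lo


def po(points: List[Point], lo: float, hi: float) -> List[Point]:
    """Tuples that are strictly best for at least one admissible weight."""
    return [t for t in points if potentially_optimal(t, points, lo, hi)]


points = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0), (3.0, 3.0), (1.5, 3.25)]
print(nd(points, 0.5, 1.0))  # → [(1.0, 4.0), (2.0, 2.0), (1.5, 3.25)]
print(po(points, 0.5, 1.0))  # → [(1.0, 4.0), (2.0, 2.0)]
```

With w restricted to [0.5, 1], the tuple (1.5, 3.25) is not F-dominated by any other tuple, but no admissible weight ranks it first, so it belongs to nd and not to po. For families with more attributes or general linear constraints, the interval test would have to be replaced by a linear-programming check.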

Supplementary Material

a18-ciaccia-apndx.pdf (ciaccia.zip)
Supplemental movie, appendix, image, and software files for Flexible Skylines: Dominance for Arbitrary Sets of Monotone Functions

    Published In

    ACM Transactions on Database Systems  Volume 45, Issue 4
    SIGMOD 2019 Best Paper, PODS 2019 Best Paper, and Regular Papers
    December 2020
    170 pages
    ISSN:0362-5915
    EISSN:1557-4644
    DOI:10.1145/3441631
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 10 December 2020
    Accepted: 01 June 2020
    Revised: 01 May 2020
    Received: 01 May 2019
    Published in TODS Volume 45, Issue 4

    Author Tags

    1. Skyline queries
    2. monotone functions
    3. ranking queries

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Article Metrics

    • Downloads (Last 12 months): 23
    • Downloads (Last 6 weeks): 6
    Reflects downloads up to 25 Jan 2025

    Cited By

    • (2025) Parallelizing the Computation of Grid Resistance to Measure the Strength of Skyline Tuples. Algorithms 18:1 (29). DOI: 10.3390/a18010029. Online publication date: 7-Jan-2025
    • (2024) Marrying Top-k with Skyline Queries: Operators with Relaxed Preference Input and Controllable Output Size. ACM Transactions on Database Systems. DOI: 10.1145/3705726. Online publication date: 22-Nov-2024
    • (2024) Directional Queries: Making Top-k Queries More Effective in Discovering Relevant Results. Proceedings of the ACM on Management of Data 2:6 (1-26). DOI: 10.1145/3698807. Online publication date: 20-Dec-2024
    • (2024) Efficient Skyline Keyword-Based Tree Retrieval on Attributed Graphs. IEEE Transactions on Knowledge and Data Engineering 36:11 (6056-6070). DOI: 10.1109/TKDE.2024.3388988. Online publication date: Nov-2024
    • (2024) Hybrid Regret Minimization: A Submodular Approach. IEEE Transactions on Knowledge and Data Engineering (1-14). DOI: 10.1109/TKDE.2023.3328596. Online publication date: 2024
    • (2023) rkHit: Representative Query with Uncertain Preference. Proceedings of the ACM on Management of Data 1:2 (1-26). DOI: 10.1145/3589271. Online publication date: 20-Jun-2023
    • (2021) Efficient processing of k-regret minimization queries with theoretical guarantees. Information Sciences. DOI: 10.1016/j.ins.2021.11.080. Online publication date: Dec-2021
