Parallel computation of skyline and reverse skyline queries using MapReduce

Published: 01 September 2013

Abstract

The skyline operator and its variants, such as the dynamic skyline and reverse skyline operators, have attracted considerable attention because of their broad applications. Computing these operators is increasingly challenging, however, as more applications must handle big data. The MapReduce framework has been widely adopted for such data-intensive applications.
In this paper, we propose efficient parallel algorithms for processing the skyline and its variants using MapReduce. We first build histograms to prune non-skyline (non-reverse-skyline) points in advance. We next partition the data by the regions that the histograms define and compute candidate (reverse) skyline points for each region independently using MapReduce. Finally, we check, again independently for every region, whether each candidate point is actually a (reverse) skyline point. Our performance study confirms the effectiveness and scalability of the proposed algorithms.
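
The three phases named in the abstract (prune with a histogram, compute candidates per region, verify candidates) can be illustrated with a small single-process sketch. This is not the paper's algorithm: it assumes points lie in the unit hypercube, that smaller values are better in every dimension, and that a uniform grid of occupied cells stands in for the histogram. The names `dominates`, `cell_of`, `local_skyline`, and `grid_skyline` are hypothetical, and the per-region loops only mimic what separate MapReduce tasks would do.

```python
from collections import defaultdict


def dominates(p, q):
    """True if p is no worse than q in every dimension and strictly better in one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def cell_of(p, cells_per_dim):
    """Grid cell index of a point assumed to lie in [0, 1]^d."""
    return tuple(min(int(x * cells_per_dim), cells_per_dim - 1) for x in p)


def local_skyline(points):
    """Brute-force skyline of a small point list."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def grid_skyline(points, cells_per_dim=4):
    # Phase 1: the set of occupied cells plays the role of the histogram.
    buckets = defaultdict(list)
    for p in points:
        buckets[cell_of(p, cells_per_dim)].append(p)
    occupied = set(buckets)

    def pruned(cell):
        # If another occupied cell has a strictly smaller index in every
        # dimension, any point in it dominates every point in this cell.
        return any(all(o < c for o, c in zip(other, cell)) for other in occupied)

    live = {c: pts for c, pts in buckets.items() if not pruned(c)}

    # Phase 2: each surviving region computes its candidates independently.
    candidates = {c: local_skyline(pts) for c, pts in live.items()}

    # Phase 3: each region checks its candidates against candidates from the
    # only regions that can hold dominators (index <= in every dimension).
    result = []
    for cell, pts in candidates.items():
        rivals = [
            q
            for other, qs in candidates.items()
            if other != cell and all(o <= c for o, c in zip(other, cell))
            for q in qs
        ]
        result.extend(p for p in pts if not any(dominates(q, p) for q in rivals))
    return sorted(result)


if __name__ == "__main__":
    sample = [(0.1, 0.9), (0.2, 0.6), (0.3, 0.3), (0.6, 0.2),
              (0.9, 0.1), (0.5, 0.5), (0.8, 0.8), (0.35, 0.7)]
    print(grid_skyline(sample))
```

In this sketch the cell-level test discards whole regions without looking at their points, which is the intuition behind pruning before the data is partitioned; phases 2 and 3 touch only the data of one region plus a small set of rival candidates, so each could run as an independent task.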

Published In

Proceedings of the VLDB Endowment  Volume 6, Issue 14
September 2013
384 pages

Publisher

VLDB Endowment

Qualifiers

  • Article

Article Metrics

  • Downloads (last 12 months): 33
  • Downloads (last 6 weeks): 4
Reflects downloads up to 17 Feb 2025

Cited By

  • (2025) Mining area skyline objects from map-based big data using Apache Spark framework. Array, 25 (100373). DOI: 10.1016/j.array.2024.100373. Online publication date: Mar-2025
  • (2024) Efficient and Privacy-Preserving Skyline Queries Over Encrypted Data Under a Blockchain-Based Audit Architecture. IEEE Transactions on Knowledge and Data Engineering, 36:9 (4603-4617). DOI: 10.1109/TKDE.2024.3373602. Online publication date: 8-Mar-2024
  • (2023) Efficient Location-Based Skyline Queries With Secure R-Tree Over Encrypted Data. IEEE Transactions on Knowledge and Data Engineering, 35:10 (10436-10450). DOI: 10.1109/TKDE.2023.3253883. Online publication date: 1-Oct-2023
  • (2023) Distributed probabilistic top-k dominating queries over uncertain databases. Knowledge and Information Systems, 65:11 (4939-4965). DOI: 10.1007/s10115-023-01917-3. Online publication date: 1-Jul-2023
  • (2023) An Enhanced Distributed Algorithm for Area Skyline Computation Based on Apache Spark. Knowledge Science, Engineering and Management (35-43). DOI: 10.1007/978-3-031-40292-0_4. Online publication date: 16-Aug-2023
  • (2022) Research on Reverse Skyline Query Algorithm Based on Decision Set. Journal of Database Management, 33:1 (1-28). DOI: 10.4018/JDM.313971. Online publication date: 17-Nov-2022
  • (2022) Parallel Skyline Processing Using Space Pruning on GPU. Proceedings of the 31st ACM International Conference on Information & Knowledge Management (1074-1083). DOI: 10.1145/3511808.3557414. Online publication date: 17-Oct-2022
  • (2022) Spatial-Keyword Skyline Publish/Subscribe Query Processing Over Distributed Sliding Window Streaming Data. IEEE Transactions on Computers, 71:10 (2659-2674). DOI: 10.1109/TC.2022.3140884. Online publication date: 1-Oct-2022
  • (2022) ReSKY: Efficient Subarray Skyline Computation in Array Databases. Distributed and Parallel Databases, 40:2-3 (261-298). DOI: 10.1007/s10619-022-07419-5. Online publication date: 1-Sep-2022
  • (2022) Weighted spatial skyline queries with distributed dominance tests. Cluster Computing, 25:5 (3249-3264). DOI: 10.1007/s10586-022-03559-6. Online publication date: 1-Oct-2022
