DOI: 10.1145/3274895.3274923

Detecting skewness of big spatial data in SpatialHadoop

Published: 06 November 2018

Abstract

In recent years, several extensions of the Hadoop system have been proposed for dealing with spatial data, and SpatialHadoop belongs to this group. In the MapReduce paradigm, a task is parallelized by partitioning the data into chunks, performing the same operation on each chunk, and combining the partial results at the end. The partitioning technique therefore strongly affects the performance of a parallel execution, since it is the key to obtaining balanced map tasks. However, when a dataset has a skewed distribution, a regular grid might not be the right choice, and other techniques, which are in turn more expensive to build, have to be applied. This paper illustrates an approach for detecting the degree of skewness of a spatial dataset, based on the box counting function. Moreover, given the degree of skewness and some experimental observations, a heuristic is sketched for deciding which partitioning technique to apply so as to improve the performance of subsequent operations as much as possible.
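
The abstract does not spell out the box counting function or the heuristic, so the following is only a minimal sketch under stated assumptions. It assumes the box counting function is the sum, over the cells of a regular grid with side r, of the per-cell point count raised to a power q, and that the degree of skewness can be read from the slope of log S_q(r) against log r. The names `box_counting_sum`, `skewness_exponent`, and `choose_partitioner`, the threshold 1.5, and the synthetic datasets are all hypothetical and are not taken from the paper or from SpatialHadoop.

```python
import math
import random
from collections import Counter


def box_counting_sum(points, cell_size, q=2):
    """Sum over occupied grid cells of (points in the cell) ** q."""
    counts = Counter(
        (math.floor(x / cell_size), math.floor(y / cell_size)) for x, y in points
    )
    return sum(c ** q for c in counts.values())


def skewness_exponent(points, cell_sizes, q=2):
    """Least-squares slope of log S_q(r) versus log r, divided by (q - 1)."""
    xs = [math.log(r) for r in cell_sizes]
    ys = [math.log(box_counting_sum(points, r, q)) for r in cell_sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance / (q - 1)


def choose_partitioner(exponent, threshold=1.5):
    """Hypothetical rule: the threshold is an assumption, not a value from the paper."""
    return "grid" if exponent >= threshold else "data-aware"


rng = random.Random(42)
cell_sizes = [1 / 4, 1 / 8, 1 / 16, 1 / 32]

# Uniform points in the unit square: the exponent should be close to 2.
uniform = [(rng.random(), rng.random()) for _ in range(20000)]
# One tight Gaussian cluster: the exponent should be much lower.
clustered = [(rng.gauss(0.5, 0.02), rng.gauss(0.5, 0.02)) for _ in range(20000)]

for name, points in [("uniform", uniform), ("clustered", clustered)]:
    e = skewness_exponent(points, cell_sizes)
    print(name, round(e, 2), choose_partitioner(e))
```

In this sketch an exponent near 2 suggests that a regular grid would yield roughly balanced cells, while a markedly lower exponent suggests that a data-aware technique may be worth its higher construction cost. Where the cut-off should lie is exactly what the experimental observations mentioned in the abstract would have to determine.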

Published In

SIGSPATIAL '18: Proceedings of the 26th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems
November 2018
655 pages
ISBN: 9781450358897
DOI: 10.1145/3274895

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. BigData
  2. MapReduce
  3. SpatialHadoop
  4. partitioning
  5. skewed data

Qualifiers

  • Poster

Conference

SIGSPATIAL '18

Acceptance Rates

SIGSPATIAL '18 Paper Acceptance Rate 30 of 150 submissions, 20%;
Overall Acceptance Rate 257 of 1,238 submissions, 21%
