SATO: a spatial data partitioning framework for scalable query processing

H Vo, A Aji, F Wang - Proceedings of the 22nd ACM SIGSPATIAL …, 2014 - dl.acm.org
Scalable spatial query processing relies on effective spatial data partitioning for query parallelization, data pruning, and load balancing. These goals are often challenged by the intrinsic characteristics of spatial data, such as high skew in data distribution and the high complexity of irregular multi-dimensional objects. In this demo, we present SATO, a spatial data partitioning framework that can quickly analyze and partition spatial data with an optimal spatial partitioning strategy for scalable query processing. SATO works in the following steps: 1) Sample, which samples a small fraction of the input data for analysis; 2) Analyze, which quickly analyzes the sampled data to find an optimal partitioning strategy; 3) Tear, which provides data-skew-aware partitioning and supports MapReduce-based scalable partitioning; and 4) Optimize, which collects succinct partition statistics for potential query optimization. SATO also provides multi-level partitioning, which can be used to significantly improve window-based queries in cloud-based spatial query processing systems. SATO comes with a visualization component that provides heat maps and histograms for qualitative evaluation. SATO has been implemented within Hadoop-GIS, a high-performance spatial data warehousing system over MapReduce. SATO is also released as an independent software package to support various scalable spatial query processing systems. Our experiments have demonstrated that SATO can generate much more balanced partitioning, which can significantly improve spatial query performance with MapReduce compared to traditional spatial partitioning approaches.
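The four steps named in the abstract (Sample, Analyze, Tear, Optimize) can be illustrated with a minimal single-process sketch in Python. This is not SATO's implementation or API. Every function name below is hypothetical, the objects are simplified to 2D points, the only candidate strategies are a fixed grid and a median-split binary space partitioning, and the "skew" score (largest cell count divided by mean cell count) is an assumed stand-in for whatever criteria the real Analyze step uses. The MapReduce-based partitioning mentioned in the abstract is not modeled.

```python
import random

def sample(points, fraction, seed=0):
    """Sample: draw a small random fraction of the input (at least one point)."""
    rng = random.Random(seed)
    k = max(1, int(len(points) * fraction))
    return rng.sample(points, k)

def grid_cells(bounds, n=4):
    """Candidate 1: a fixed n x n grid of (xmin, ymin, xmax, ymax) cells."""
    xmin, ymin, xmax, ymax = bounds
    dx, dy = (xmax - xmin) / n, (ymax - ymin) / n
    return [(xmin + i * dx, ymin + j * dy, xmin + (i + 1) * dx, ymin + (j + 1) * dy)
            for i in range(n) for j in range(n)]

def bsp_cells(points, bounds, capacity, depth=0):
    """Candidate 2: split at the median, alternating axes, until a cell
    holds at most `capacity` points."""
    if len(points) <= capacity:
        return [bounds]
    axis = depth % 2
    cut = sorted(p[axis] for p in points)[len(points) // 2]
    left = [p for p in points if p[axis] < cut]
    right = [p for p in points if p[axis] >= cut]
    if not left or not right:  # all coordinates equal on this axis: stop splitting
        return [bounds]
    xmin, ymin, xmax, ymax = bounds
    if axis == 0:
        lb, rb = (xmin, ymin, cut, ymax), (cut, ymin, xmax, ymax)
    else:
        lb, rb = (xmin, ymin, xmax, cut), (xmin, cut, xmax, ymax)
    return (bsp_cells(left, lb, capacity, depth + 1)
            + bsp_cells(right, rb, capacity, depth + 1))

def locate(cells, p):
    """Index of the first cell whose closed rectangle contains point p."""
    for i, (xmin, ymin, xmax, ymax) in enumerate(cells):
        if xmin <= p[0] <= xmax and ymin <= p[1] <= ymax:
            return i
    raise ValueError(f"point {p} lies outside the partitioned bounds")

def skew(cells, points):
    """Assumed balance score: largest cell count divided by the mean cell count."""
    counts = [0] * len(cells)
    for p in points:
        counts[locate(cells, p)] += 1
    return max(counts) * len(cells) / len(points)

def analyze(sampled, bounds, target=16):
    """Analyze: build each candidate on the sample and keep the least skewed one."""
    candidates = {
        "grid": grid_cells(bounds, n=int(target ** 0.5)),
        "bsp": bsp_cells(sampled, bounds, capacity=max(1, len(sampled) // target)),
    }
    name = min(candidates, key=lambda k: skew(candidates[k], sampled))
    return name, candidates[name]

def tear(points, cells):
    """Tear: assign every input point to a partition using the chosen cells."""
    parts = [[] for _ in cells]
    for p in points:
        parts[locate(cells, p)].append(p)
    return parts

def optimize(parts):
    """Optimize: collect compact per-partition statistics (count and tight MBR)."""
    stats = []
    for pts in parts:
        mbr = None
        if pts:
            xs, ys = zip(*pts)
            mbr = (min(xs), min(ys), max(xs), max(ys))
        stats.append({"count": len(pts), "mbr": mbr})
    return stats

def run_pipeline(points, bounds, fraction=0.1):
    """Chain the four steps and return the chosen strategy name and the statistics."""
    sampled = sample(points, fraction)
    name, cells = analyze(sampled, bounds)
    return name, optimize(tear(points, cells))
```

Calling `run_pipeline(points, (0.0, 0.0, 1.0, 1.0), fraction=0.2)` on a list of `(x, y)` tuples inside the unit square returns the name of the selected strategy and one statistics record per partition. On heavily clustered input the median-split candidate tends to score better than the fixed grid, which is the kind of skew sensitivity the abstract describes.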