A Performance Study of Big Spatial Data Systems

Published: 06 November 2018


With the accelerated growth in the volume of spatial data being generated from a wide variety of sources, efficient storage, retrieval, processing, and analysis of spatial data are more important than ever. Hence, spatial data processing systems have become an important field of research. In recent years, researchers around the world have proposed a number of Big Spatial Data systems. These systems can be roughly categorized into Apache Hadoop-based systems and in-memory systems based on Apache Spark. The features supported by these systems vary widely. However, there has not been any comprehensive study evaluating these systems in terms of performance, scalability, and functionality. To address this need, we propose a benchmark to evaluate Big Spatial Data systems.
Although Spark is a very popular framework, its performance is limited by the overhead associated with distributed resource management and coordination, and the Big Spatial Data systems built on Spark are constrained by the same overhead. We introduce SpatialIgnite, a Big Spatial Data system that we have developed based on Apache Ignite. We investigate the present status of Big Spatial Data systems by conducting a comprehensive feature analysis and a performance evaluation of a few representative systems with our benchmark. Our study shows that SpatialIgnite performs better than the Hadoop- and Spark-based systems that we have evaluated.


BigSpatial '18: Proceedings of the 7th ACM SIGSPATIAL International Workshop on Analytics for Big Geospatial Data
November 2018
