DOI: 10.1145/3282834.3282841

A Performance Study of Big Spatial Data Systems

Published: 06 November 2018

Abstract

With the accelerating growth in the volume of spatial data generated from a wide variety of sources, efficient storage, retrieval, processing, and analysis of spatial data are more important than ever. Spatial data processing systems have therefore become an important field of research. In recent years, researchers around the world have proposed a number of Big Spatial Data systems. These systems can be roughly categorized into Apache Hadoop-based systems and in-memory systems based on Apache Spark. The features supported by these systems vary widely. However, there has not been any comprehensive evaluation of these systems in terms of performance, scalability, and functionality. To address this need, we propose a benchmark to evaluate Big Spatial Data systems.
Although Spark is a very popular framework, its performance is limited by the overhead associated with distributed resource management and coordination. Big Spatial Data systems based on Spark are subject to the same constraints. We introduce SpatialIgnite, a Big Spatial Data system that we have developed on top of Apache Ignite. We investigate the current state of Big Spatial Data systems by conducting a comprehensive feature analysis and performance evaluation of a few representative systems with our benchmark. Our study shows that SpatialIgnite performs better than the Hadoop-based and Spark-based systems that we evaluated.

Published In

BigSpatial '18: Proceedings of the 7th ACM SIGSPATIAL International Workshop on Analytics for Big Geospatial Data
November 2018
68 pages
ISBN:9781450360418
DOI:10.1145/3282834

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Benchmark
  2. Big Spatial Data
  3. Hadoop
  4. Ignite
  5. In-Memory
  6. Performance Evaluation
  7. Spark

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

SIGSPATIAL '18

Acceptance Rates

Overall Acceptance Rate 32 of 58 submissions, 55%

Cited By

  • (2023) Scalable Spatial Analytics and In Situ Query Processing in DaskDB. Proceedings of the 18th International Symposium on Spatial and Temporal Data, 189-193. DOI: 10.1145/3609956.3609978. Online publication date: 23-Aug-2023
  • (2023) A mediation system for continuous spatial queries on a unified schema using Apache Spark. Big Earth Data 8:1, 115-141. DOI: 10.1080/20964471.2023.2275854. Online publication date: 9-Nov-2023
  • (2022) REIP: A Reconfigurable Environmental Intelligence Platform and Software Framework for Fast Sensor Network Prototyping. Sensors 22:10, 3809. DOI: 10.3390/s22103809. Online publication date: 17-May-2022
  • (2022) Emerging Technologies for Smart Cities’ Transportation: Geo-Information, Data Analytics and Machine Learning Approaches. ISPRS International Journal of Geo-Information 11:2, 85. DOI: 10.3390/ijgi11020085. Online publication date: 24-Jan-2022
  • (2022) A Survey on Spatio-temporal Data Analytics Systems. ACM Computing Surveys 54:10s, 1-38. DOI: 10.1145/3507904. Online publication date: 10-Nov-2022
  • (2022) Big Spatial Data Systems - A review. 2022 5th International Conference on Engineering Technology and its Applications (IICETA), 147-152. DOI: 10.1109/IICETA54559.2022.9888481. Online publication date: 31-May-2022
  • (2021) A Survey on Big Data Processing Frameworks for Mobility Analytics. ACM SIGMOD Record 50:2, 18-29. DOI: 10.1145/3484622.3484626. Online publication date: 31-Aug-2021
  • (2020) Analyzing spatial analytics systems based on Hadoop and Spark: A user perspective. Software: Practice and Experience 50:12, 2121-2144. DOI: 10.1002/spe.2882. Online publication date: 31-Aug-2020
  • (2019) GeoYCSB: A Benchmark Framework for the Performance and Scalability Evaluation of NoSQL Databases for Geospatial Workloads. 2019 IEEE International Conference on Big Data (Big Data), 3666-3675. DOI: 10.1109/BigData47090.2019.9005570. Online publication date: Dec-2019