research-article

LocationSpark: a distributed in-memory data management system for big spatial data

Authors:

Qutaibah M. Malluhi,

Mourad Ouzzani,

Walid G. Aref

Proceedings of the VLDB Endowment, Volume 9, Issue 13

Pages 1565 - 1568

https://doi.org/10.14778/3007263.3007310

Published: 01 September 2016

Abstract

We present LocationSpark, a spatial data processing system built on top of Apache Spark, a widely used distributed data processing system. LocationSpark offers a rich set of spatial query operators, e.g., range search, kNN, spatio-textual operations, spatial join, and kNN join. To achieve high performance, LocationSpark employs various spatial indexes for in-memory data and guarantees that immutable spatial indexes have low overhead with fault tolerance. In addition, we build two new layers over Spark, namely a query scheduler and a query executor. The query scheduler is responsible for mitigating skew in spatial queries, while the query executor selects the best plan based on the indexes and the nature of the spatial queries. Furthermore, to avoid unnecessary network communication overhead when processing overlapped spatial data, we embed an efficient spatial Bloom filter into LocationSpark's indexes. Finally, LocationSpark tracks frequently accessed spatial data and dynamically flushes less frequently accessed data to disk. We evaluate our system on real workloads and demonstrate that it achieves an order of magnitude performance gain over a baseline framework.
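The abstract does not describe how the spatial Bloom filter is built, so the following is only a rough illustration of the general idea: a compact summary of where a partition holds data can answer "this query cannot match anything here" without shipping the query over the network. The sketch is a plain Bloom filter over grid-cell identifiers and is not claimed to be LocationSpark's actual structure. The class name `GridBloomFilter`, its parameters, and the cell-hashing scheme are all assumptions made for this example.

```python
import hashlib
from typing import Iterator, Tuple

Cell = Tuple[int, int]
Rect = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


class GridBloomFilter:
    """Bloom filter over occupied grid cells.

    Hypothetical illustration only; not the structure used by LocationSpark.
    """

    def __init__(self, cell_size: float, num_bits: int = 1024, num_hashes: int = 3):
        self.cell_size = cell_size
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per logical bit, chosen for readability over compactness.
        self.bits = bytearray(num_bits)

    def _cell(self, x: float, y: float) -> Cell:
        # Map a coordinate to the integer grid cell that contains it.
        return (int(x // self.cell_size), int(y // self.cell_size))

    def _positions(self, cell: Cell) -> Iterator[int]:
        # Derive num_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            key = f"{seed}:{cell[0]}:{cell[1]}".encode()
            digest = hashlib.sha256(key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add_point(self, x: float, y: float) -> None:
        for pos in self._positions(self._cell(x, y)):
            self.bits[pos] = 1

    def might_contain_cell(self, cell: Cell) -> bool:
        # False means "definitely empty"; True means "possibly occupied".
        return all(self.bits[pos] for pos in self._positions(cell))

    def might_overlap(self, rect: Rect) -> bool:
        # Test every grid cell touched by the query rectangle.
        min_x, min_y, max_x, max_y = rect
        cx0, cy0 = self._cell(min_x, min_y)
        cx1, cy1 = self._cell(max_x, max_y)
        return any(
            self.might_contain_cell((cx, cy))
            for cx in range(cx0, cx1 + 1)
            for cy in range(cy0, cy1 + 1)
        )
```

Under these assumptions, a worker that holds the filter of a remote partition could skip forwarding a range query whenever `might_overlap` returns `False`. A `True` result may be a false positive, so the query would still need to be checked against the partition's real index.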

Index Terms

LocationSpark: a distributed in-memory data management system for big spatial data
1. Information systems
  1. Information retrieval
  2. Information storage systems

Published In

Proceedings of the VLDB Endowment Volume 9, Issue 13

September 2016

378 pages

ISSN:2150-8097

Editor:
Surajit Chaudhuri
Microsoft Research

Publisher

VLDB Endowment
