research-article

A parallel spatial data analysis infrastructure for the cloud

Published: 05 November 2013

Abstract

Spatial data analysis applications are emerging from a wide range of domains such as building information management, environmental assessments and medical imaging. Time-consuming computational geometry algorithms make these applications slow, even for medium-sized datasets. At the same time, there is a rapid expansion in available processing cores, through multicore machines and Cloud computing. The confluence of these trends demands effective parallelization of spatial query processing. Unfortunately, traditional parallel spatial databases are ill-equipped to deal with the performance heterogeneity that is common in the Cloud.
We introduce Niharika, a parallel spatial data analysis infrastructure that exploits all available cores in a heterogeneous cluster. Niharika first uses a declustering technique that creates balanced spatial partitions. Then, Niharika adapts to performance heterogeneity and processing skew in the spatial dataset using dynamic load-balancing. We evaluate Niharika with three load-balancing algorithms and two different spatial datasets (both from TIGER) using Amazon EC2 instances. Niharika adapts to the performance heterogeneity in the EC2 nodes, thereby achieving excellent speedups (e.g., 63.6X using 64 cores on 16 4-core EC2 nodes, in the best case) and outperforming an approach that does not adapt.
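
To illustrate why adapting to performance heterogeneity matters, the following minimal sketch compares a fixed up-front assignment of spatial partitions with a pull-based dynamic one. It is not Niharika's implementation: the abstract does not describe its three load-balancing algorithms, so the function names, the partition costs, and the per-core speeds below are all hypothetical values chosen for illustration.

```python
import heapq


def static_makespan(costs, speeds):
    """Assign partitions round-robin up front, ignoring worker speed.

    costs  -- hypothetical processing cost of each partition
    speeds -- hypothetical relative speed of each worker core
    Returns the time at which the slowest worker finishes.
    """
    finish = [0.0] * len(speeds)
    for i, cost in enumerate(costs):
        worker = i % len(speeds)
        finish[worker] += cost / speeds[worker]
    return max(finish)


def dynamic_makespan(costs, speeds):
    """Pull-based scheduling: the first worker to become free takes the next partition.

    This is a generic greedy scheduler, used here only as a stand-in for
    dynamic load balancing in general.
    """
    # Min-heap of (time at which the worker becomes free, worker id).
    free_at = [(0.0, worker) for worker in range(len(speeds))]
    heapq.heapify(free_at)
    end = 0.0
    for cost in costs:
        t, worker = heapq.heappop(free_at)
        t += cost / speeds[worker]
        end = max(end, t)
        heapq.heappush(free_at, (t, worker))
    return end


# Illustrative inputs only: skewed partition costs and uneven core speeds.
costs = [4.0, 1.0, 1.0, 6.0, 2.0, 2.0, 3.0, 1.0]
speeds = [1.0, 1.0, 0.5, 2.0]

if __name__ == "__main__":
    print("static :", static_makespan(costs, speeds))
    print("dynamic:", dynamic_makespan(costs, speeds))
```

With these made-up inputs the static assignment is held back by the slow core, while the pull-based variant finishes sooner because fast cores simply take more partitions. Greedy pulling is not guaranteed to win for every input; it is shown only as the simplest form of adapting to observed worker speed. For scale, the reported best case of 63.6X on 64 cores corresponds to a parallel efficiency of 63.6 / 64, or roughly 99%.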


Published In

SIGSPATIAL'13: Proceedings of the 21st ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems
November 2013
598 pages
ISBN: 9781450325219
DOI: 10.1145/2525314

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. cloud
  2. load balancing
  3. performance heterogeneity
  4. spatial join

Qualifiers

  • Research-article

Conference

SIGSPATIAL'13
Acceptance Rates

Overall Acceptance Rate 257 of 1,238 submissions, 21%

