Poster, SIGSPATIAL '22
DOI: 10.1145/3557915.3560960

Spatial embedding: a generic machine learning model for spatial query optimization

Published: 22 November 2022

Abstract

Machine learning and deep learning techniques are increasingly applied to build efficient query optimizers, particularly for big data systems. Optimizing spatial operations, such as spatial joins and range queries, is even more challenging because of their inherent complexity and the peculiarities of spatial data. Although a few ML-based spatial query optimizers have been proposed in the literature, their design limits their applicability: each one is tailored to a specific collection of datasets, a specific operation, or specific hardware. Changing any of these requires building and training a completely new model, which in turn entails collecting a new, very large training set to obtain a good model.

This paper proposes a new approach to ML-based query optimization that exploits the novel notion of spatial embedding to overcome these limitations. In particular, a preliminary model is defined that captures the relevant features of spatial datasets in an unsupervised manner, independently of the operation to be optimized. Given this model, a specialized model for optimizing each spatial operation can be trained using spatial embeddings as input. In this way, the cost of building the first model is amortized, and the specialized models require smaller training sets.
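The two-stage idea (one unsupervised, operation-agnostic embedding model, then small operation-specific models trained on its output) can be illustrated with a dependency-free Python sketch. It is not the authors' implementation: the abstract does not specify the dataset summary, the unsupervised model, or the downstream model, so every choice below is an assumption. Each dataset of boxes in the unit square is summarized as a normalized grid histogram of box centroids; principal component analysis, computed by power iteration, stands in for the unsupervised embedding model; and a k-nearest-neighbour regressor that predicts the selectivity of one fixed range query plays the role of a specialized model. All function names (`grid_histogram`, `fit_embedding`, `embed`, `knn_predict`, `make_dataset`, `selectivity`) and the synthetic data are hypothetical.

```python
import math
import random


def grid_histogram(rects, n=4):
    """Summarize boxes (xmin, ymin, xmax, ymax), assumed to lie in the unit
    square, as a flattened n x n histogram of centroids that sums to 1."""
    counts = [0.0] * (n * n)
    for xmin, ymin, xmax, ymax in rects:
        cx, cy = (xmin + xmax) / 2, (ymin + ymax) / 2
        i = min(int(cx * n), n - 1)  # column index, clamped at the right edge
        j = min(int(cy * n), n - 1)  # row index, clamped at the top edge
        counts[j * n + i] += 1
    total = sum(counts) or 1.0
    return [c / total for c in counts]


def fit_embedding(hists, k=2, iters=200):
    """Stage 1 (unsupervised, operation-agnostic): approximate the top-k
    principal components of the histograms by power iteration with deflation.
    PCA is only a stand-in for whatever embedding model is actually used."""
    d, m = len(hists[0]), len(hists)
    mean = [sum(h[t] for h in hists) / m for t in range(d)]
    centered = [[h[t] - mean[t] for t in range(d)] for h in hists]
    cov = [[sum(r[a] * r[b] for r in centered) / m for b in range(d)]
           for a in range(d)]
    rng = random.Random(0)  # fixed seed keeps the sketch deterministic
    comps = []
    for _ in range(k):
        v = [rng.random() for _ in range(d)]
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        # Rayleigh quotient estimates the eigenvalue; deflate that direction
        lam = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d))
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
        comps.append(v)
    return mean, comps


def embed(hist, mean, comps):
    """Project a histogram onto the learned components (the 'spatial embedding')."""
    return [sum((hist[t] - mean[t]) * c[t] for t in range(len(hist))) for c in comps]


def knn_predict(train_emb, train_y, query_emb, k=3):
    """Stage 2 (operation-specific): average the targets of the k training
    datasets whose embeddings are closest to the query dataset's embedding."""
    nearest = sorted((math.dist(e, query_emb), y) for e, y in zip(train_emb, train_y))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def make_dataset(rng, clustered, size=200):
    """Hypothetical synthetic data: uniform or Gaussian-clustered 0.01-wide boxes."""
    cx, cy = rng.random(), rng.random()
    rects = []
    for _ in range(size):
        if clustered:
            x = min(max(rng.gauss(cx, 0.08), 0.0), 0.99)
            y = min(max(rng.gauss(cy, 0.08), 0.0), 0.99)
        else:
            x, y = rng.random() * 0.99, rng.random() * 0.99
        rects.append((x, y, x + 0.01, y + 0.01))
    return rects


def selectivity(rects, window):
    """Ground-truth label: fraction of boxes intersecting the query window."""
    qx0, qy0, qx1, qy1 = window
    hits = sum(1 for x0, y0, x1, y1 in rects
               if x0 <= qx1 and x1 >= qx0 and y0 <= qy1 and y1 >= qy0)
    return hits / len(rects)


if __name__ == "__main__":
    rng = random.Random(42)
    window = (0.0, 0.0, 0.5, 0.5)  # hypothetical fixed range query
    data = [make_dataset(rng, clustered=(i % 2 == 0)) for i in range(40)]
    hists = [grid_histogram(d) for d in data]
    mean, comps = fit_embedding(hists, k=2)  # fit once, reuse for any operation
    embs = [embed(h, mean, comps) for h in hists]
    ys = [selectivity(d, window) for d in data]  # labels for this operation only
    new = make_dataset(rng, clustered=True)
    pred = knn_predict(embs, ys, embed(grid_histogram(new), mean, comps))
    print(f"predicted={pred:.3f} actual={selectivity(new, window):.3f}")
```

The split is the point of the sketch: `fit_embedding` runs once over many datasets without any labels, while each specialized model needs labels only for the operation it serves (here, selectivities of one window) and works on short embedding vectors instead of raw data.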

Cited By

  • (2024) Augmentation Techniques for Balancing Spatial Datasets in Machine and Deep Learning Applications. Proceedings of the 32nd ACM International Conference on Advances in Geographic Information Systems, 91-101. DOI: 10.1145/3678717.3691230. Online publication date: 29 Oct 2024.
  • (2024) A Generic Machine Learning Model for Spatial Query Optimization based on Spatial Embeddings. ACM Transactions on Spatial Algorithms and Systems 10, 4, 1-33. DOI: 10.1145/3657633. Online publication date: 13 Apr 2024.
  • (2023) MaaSDB: Spatial Databases in the Era of Large Language Models (Vision Paper). Proceedings of the 31st ACM International Conference on Advances in Geographic Information Systems, 1-4. DOI: 10.1145/3589132.3625597. Online publication date: 13 Nov 2023.

Published In

SIGSPATIAL '22: Proceedings of the 30th International Conference on Advances in Geographic Information Systems
November 2022
806 pages
ISBN: 9781450395298
DOI: 10.1145/3557915

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

1. big data
2. machine learning
3. query optimizer
4. range query

Conference

SIGSPATIAL '22

Acceptance Rates

Overall Acceptance Rate: 257 of 1,238 submissions (21%)

Article Metrics

• Downloads (last 12 months): 29
• Downloads (last 6 weeks): 3

Reflects downloads up to 13 Jan 2025.
