Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
10.1145/3318464.3389701acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article
Public Access

Architecting a Query Compiler for Spatial Workloads

Published: 31 May 2020 Publication History

Abstract

Modern location-based applications rely extensively on the efficient processing of spatial data and queries. Spatial query engines are commonly engineered as an extension to a relational database or a cluster-computing framework. Large parts of the spatial processing runtime is spent on evaluating spatial predicates and traversing spatial indexing structures. Typical high-level implementations of these spatial structures incur significant interpretive overhead, which increases latency and lowers throughput. A promising idea to improve the performance of spatial workloads is to leverage native code generation techniques that have become popular in relational query engines. However, architecting a spatial query compiler is challenging since spatial processing has fundamentally different execution characteristics from relational workloads in terms of data dimensionality, indexing structures, and predicate evaluation. In this paper, we discuss the underlying reasons why standard query compilation techniques are not fully effective when applied to spatial workloads, and we demonstrate how a particular style of query compilation based on techniques borrowed from partial evaluation and generative programming manages to avoid most of these difficulties by extending the scope of custom code generation into the data structures layer. We extend the LB2 main-memory query compiler, a relational engine developed in this style, with spatial data types, predicates, indexing structures, and operators. We show that the spatial extension matches the performance of specialized library code and outperforms relational and map-reduce extensions.

Supplementary Material

MP4 File (3318464.3389701.mp4)
Presentation Video

References

[1]
Ablimit Aji, Fusheng Wang, Hoang Vo, Rubao Lee, Qiaoling Liu, Xiaodong Zhang, and Joel Saltz. 2013. Hadoop gis: a high performance spatial data warehousing system over mapreduce. VLDB, Vol. 6, 11 (2013), 1009--1020.
[2]
Ahmed M Aly, Ahmed R Mahmood, Mohamed S Hassan, Walid G Aref, Mourad Ouzzani, Hazem Elmeleegy, and Thamir Qadah. 2015. AQWA: adaptive query workload aware partitioning of big spatial data. PVLDB, Vol. 8, 13 (2015), 2062--2073.
[3]
Michael Armbrust, Reynold S Xin, Cheng Lian, Yin Huai, Davies Liu, Joseph K Bradley, Xiangrui Meng, Tomer Kaftan, Michael J Franklin, Ali Ghodsi, et al. 2015. Spark sql: Relational data processing in spark. In SIGMOD. ACM, 1383--1394.
[4]
Morton M. Astrahan, Mike W. Blasgen, Donald D. Chamberlin, Kapali P. Eswaran, Jim N Gray, Patricia P. Griffiths, W Frank King, Raymond A. Lorie, Paul R. McJones, James W. Mehl, et al. 1976. System R: relational approach to database management. TODS, Vol. 1, 2 (1976), 97--137.
[5]
Furqan Baig, Hoang Vo, Tahsin Kurc, Joel Saltz, and Fusheng Wang. 2017. SparkGIS: Resource Aware Efficient In-Memory Spatial Query Processing. In SIGSPATIAL. ACM, 28.
[6]
Kevin J. Brown, HyoukJoong Lee, Tiark Rompf, Arvind K. Sujeeth, Christopher De Sa, Christopher Aberger, and Kunle Olukotun. 2016. Have Abstraction and Eat Performance, Too: Optimized Heterogeneous Computing with Parallel Patterns. In CGO. ACM, 194--205.
[7]
Dennis Butterstein and Torsten Grust. 2016. Precision performance surgery for CostgreSQL: LLVM-based Expression Compilation, Just in Time. VLDB, Vol. 9, 13 (2016), 1517--1520.
[8]
Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C Hsieh, Deborah A Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E Gruber. 2008. Bigtable: A distributed storage system for structured data. ACM Transactions on Computer Systems (TOCS), Vol. 26, 2 (2008), 4.
[9]
Charles Consel and Olivier Danvy. 1993. Partial evaluation: Principles and perspectives. In Journees Francophones des Langages Applicatifs. 493--501.
[10]
Judith R Davis. 1998. IBM'S DB2 SPATIAL EXTENDER: MANAGING GEO-SPATIAL INFORMATION WITHIN THE DBMS. (1998).
[11]
Zhenhong Du, Xianwei Zhao, Xinyue Ye, Jingwei Zhou, Feng Zhang, and Renyi Liu. 2017. An Effective high-performance multiway spatial join algorithm with Spark. ISPRS International Journal of Geo-Information, Vol. 6, 4 (2017), 96.
[12]
Christian Duta, Denis Hirn, and Torsten Grust. 2019. Compiling PL/SQL Away. CIDR (2019).
[13]
Ahmed Eldawy and Mohamed F Mokbel. 2015. Spatialhadoop: A mapreduce framework for spatial data. In ICDE. 1352--1363.
[14]
Gregory Essertel, Ruby Tahboub, James Decker, Kevin Brown, Kunle Olukotun, and Tiark Rompf. 2018. Flare: Optimizing apache spark with native compilation for scale-up architectures and medium-size data. In OSDI. 799--815.
[15]
Grégory Essertel, Ruby Y Tahboub, Fei Wang, James Decker, and Tiark Rompf. 2019. Flare & lantern: efficiently swapping horses midstream. Proceedings of the VLDB Endowment, Vol. 12, 12 (2019), 1910--1913.
[16]
Yi Fang, Marc Friedman, Giri Nair, Michael Rys, and Ana-Elisa Schmid. 2008. Spatial indexing in microsoft SQL server 2008. SIGMOD, 1207--1216.
[17]
Anthony Fox, Chris Eichelberger, James Hughes, and Skylar Lyon. 2013. Spatio-temporal indexing in non-relational distributed databases. In Big Data. IEEE, 291--299.
[18]
Yoshihiko Futamura. 1971. Partial Evaluation of Computation Process -- An approach to a Compiler-Compiler. Transactions of the Institute of Electronics and Communication Engineers of Japan, Vol. 54-C, 8 (1971), 721--728.
[19]
Yoshihiko Futamura. 1999. Partial evaluation of computation process, revisited. Higher-Order and Symbolic Computation, Vol. 12, 4 (1999), 377--380.
[20]
Goetz Graefe. 1994. Volcano-an extensible and parallel query evaluation system. Knowledge and Data Engineering, IEEE Transactions on, Vol. 6, 1 (1994), 120--135.
[21]
Rick Greer. 1999. Daytona and the fourth-generation language Cymbal. In ACM SIGMOD Record, Vol. 28. ACM, 525--526.
[22]
Ralf Hartmut Güting, Thomas Behr, Victor Almeida, Zhiming Ding, Frank Hoffmann, Markus Spiekermann, and LG Datenbanksysteme für neue Anwendungen. [n. d.]. SECONDO: An extensible DBMS architecture and prototype.
[23]
Stefan Hagedorn, Philipp Götze, and Kai-Uwe Sattler. 2017. The STARK framework for spatio-temporal data analytics on spark. BTW (2017).
[24]
Neil D Jones. 1996. An introduction to partial evaluation. ACM Computing Surveys (CSUR), Vol. 28, 3 (1996), 480--503.
[25]
Yannis Klonatos, Christoph Koch, Tiark Rompf, and Hassan Chafi. 2014. Building efficient query engines in a high-level language. PVLDB, Vol. 7, 10 (2014), 853--864.
[26]
Konstantinos Krikellas, Stratis D Viglas, and Marcelo Cintra. 2010. Generating code for holistic query evaluation. In ICDE. IEEE, 613--624.
[27]
Jiamin Lu and Ralf Hartmut Guting. 2012. Parallel secondo: boosting database engines with hadoop. In Parallel and Distributed Systems (ICPADS), 2012 IEEE 18th International Conference on. IEEE, 738--743.
[28]
Ahmed R Mahmood, Sri Punni, and Walid G Aref. 2019. Spatio-temporal access methods: a survey (2010--2017). GeoInformatica, Vol. 23, 1 (2019), 1--36.
[29]
Frank McSherry, Michael Isard, and Derek G. Murray. 2015. Scalability! But at what COST? Technical Report. Microsoft Research.
[30]
Mohamed F. Mokbel, Thanaa M. Ghanem, and Walid G. Aref. 2003. Spatio-temporal access methods. IEEE Data Eng. Bull. (2003).
[31]
Thomas Neumann. 2011. Efficiently Compiling Efficient Query Plans for Modern Hardware. PVLDB, Vol. 4, 9 (2011), 539--550.
[32]
Long-Van Nguyen-Dinh, Walid G Aref, and Mohamed Mokbel. 2010. Spatio-temporal access methods: Part 2 (2003--2010). IEEE Data Eng. Bull. (2010).
[33]
Shoji Nishimura, Sudipto Das, Divyakant Agrawal, and Amr El Abbadi. 2011. MD-HBase: A scalable multi-dimensional data infrastructure for location aware services. In MDM, Vol. 1. IEEE, 7--16.
[34]
Peter Ogden, David Thomas, and Peter Pietzuch. 2016. AT-GIS: Highly Parallel Spatial Query Processing with Associative Transducers. In SIGMOD. ACM, 1041--1054.
[35]
B Ooi, Ron Sacks-Davis, and Jiawei Han. 1993. Indexing in spatial databases. (1993).
[36]
Shoumik Palkar, James J Thomas, Anil Shanbhag, Deepak Narayanan, Holger Pirk, Malte Schwarzkopf, Saman Amarasinghe, Matei Zaharia, and Stanford InfoLab. 2017. Weld: A Common Runtime for High Performance Data Analytics. In CIDR.
[37]
Varun Pandey, Andreas Kipf, Thomas Neumann, and Alfons Kemper. 2018. How good are modern spatial analytics systems? Proceedings of the VLDB Endowment, Vol. 11, 11 (2018), 1661--1673.
[38]
Varun Pandey, Andreas Kipf, Dimitri Vorona, Tobias Mü hlbauer, Thomas Neumann, and Alfons Kemper. 2016. High-Performance Geospatial Analytics in HyPerSpace. In SIGMOD. 2145--2148.
[39]
Holger Pirk, Oscar Moll, Matei Zaharia, and Sam Madden. 2016. Voodoo - a vector algebra for portable database performance on modern hardware. VLDB, Vol. 9, 14 (2016), 1707--1718.
[40]
Jun Rao, Hamid Pirahesh, C Mohan, and Guy Lohman. 2006. Compiled query execution engine using JVM. In ICDE. IEEE, 23--23.
[41]
Tiark Rompf and Nada Amin. 2015. Functional pearl: a SQL to C compiler in 500 lines of code. In ICFP. ACM, 2--9.
[42]
Tiark Rompf and Martin Odersky. 2012. Lightweight modular staging: a pragmatic approach to runtime code generation and compiled DSLs. Commun. ACM, Vol. 55, 6 (2012), 121--130.
[43]
Amir Shaikhha, Ioannis Klonatos, Lionel Emile Vincent Parreaux, Lewis Brown, Mohammad Dashti Rahmat Abadi, and Christoph Koch. 2016. How to Architect a Query Compiler. In SIGMOD.
[44]
Eugene Sharygin, Ruben Buchatskiy, Roman Zhuykov, and Arseny Sher. 2017. Runtime Specialization of PostgreSQL Query Executor. In International Andrei Ershov Memorial Conference on Perspectives of System Informatics. Springer, 375--386.
[45]
Darius vS idlauskas and Christian S Jensen. 2014. Spatial joins in main memory: Implementation matters! PVLDB, Vol. 8, 1 (2014), 97--100.
[46]
Benjamin Sowell, Marcos Vaz Salles, Tuan Cao, Alan Demers, and Johannes Gehrke. 2013. An experimental analysis of iterated spatial joins in main memory. VLDB, Vol. 6, 14 (2013), 1882--1893.
[47]
Spark. [n. d.]. Tuning Spark. https://spark.apache.org/docs/latest/tuning.html.
[48]
Arvind K Sujeeth, Kevin J Brown, Hyoukjoong Lee, Tiark Rompf, Hassan Chafi, Martin Odersky, and Kunle Olukotun. 2014. Delite: A compiler architecture for performance-oriented embedded domain-specific languages. ACM TECS, Vol. 13, 4s (2014), 134.
[49]
Arvind K. Sujeeth, HyoukJoong. Lee, Kevin J. Brown, Tiark Rompf, Michael Wu, Anand R. Atreya, Martin Odersky, and Kunle Olukotun. 2011. OptiML: an Implicitly Parallel Domain-Specific Language for Machine Learning. In ICML.
[50]
Ruby Y. Tahboub, Grégory M. Essertel, and Tiark Rompf. 2018. How to Architect a Query Compiler, Revisited. In SIGMOD. 307--322.
[51]
Ruby Y Tahboub and Tiark Rompf. 2016. On supporting compilation in spatial query engines:(Vision paper). In SIGSPATIAL.
[52]
Ruby Y Tahboub, Xilun Wu, Grégory M Essertel, and Tiark Rompf. 2019. Towards compiling graph queries in relational engines. In Proceedings of the 17th ACM SIGPLAN International Symposium on Database Programming Languages. 30--41.
[53]
Mingjie Tang, Yongyang Yu, Qutaibah M Malluhi, Mourad Ouzzani, and Walid G Aref. 2016. Locationspark: a distributed in-memory data management system for big spatial data. VLDB, Vol. 9, 13 (2016), 1565--1568.
[54]
Maarten Vermeij, Wilko Quak, Martin Kersten, and Niels Nes. 2008. MonetDB, a novel spatial column-store DBMS. (2008).
[55]
Fei Wang, Daniel Zheng, James Decker, Xilun Wu, Grégory M Essertel, and Tiark Rompf. 2019. Demystifying differentiable programming: Shift/reset the penultimate backpropagator. PACMPL, Vol. 3, ICFP (2019), 1--31.
[56]
Dong Xie, Feifei Li, Bin Yao, Gefei Li, Liang Zhou, and Minyi Guo. 2016. Simba: Efficient In-Memory Spatial Analytics. (2016).
[57]
Simin You, Jianting Zhang, and Le Gruenwald. 2015. Large-scale spatial join query processing in cloud. In 2015 31st IEEE International Conference on Data Engineering Workshops (ICDEW). IEEE, 34--41.
[58]
Jia Yu, Jinxuan Wu, and Mohamed Sarwat. 2015. GeoSpark: A cluster computing framework for processing large-scale spatial data. In SIGSPATIAL. ACM, 70.
[59]
Shubin Zhang, Jizhong Han, Zhiyong Liu, Kai Wang, and Zhiyong Xu. 2009. Sjmr: Parallelizing spatial join with mapreduce on clusters. In Cluster Computing and Workshops, 2009. CLUSTER'09. IEEE international conference on. IEEE, 1--8.

Cited By

View all
  • (2024)Compilation of Shape Operators on Sparse ArraysProceedings of the ACM on Programming Languages10.1145/36897528:OOPSLA2(1162-1188)Online publication date: 8-Oct-2024
  • (2024)Query Compilation Without RegretsProceedings of the ACM on Management of Data10.1145/36549682:3(1-28)Online publication date: 30-May-2024
  • (2023) Deep Game Location Acquisition for Pokemon GO on Mobile Devices IEEE Transactions on Games10.1109/TG.2022.321705715:3(440-449)Online publication date: Sep-2023
  • Show More Cited By

Index Terms

  1. Architecting a Query Compiler for Spatial Workloads

    Recommendations

    Comments

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    SIGMOD '20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data
    June 2020
    2925 pages
    ISBN:9781450367356
    DOI:10.1145/3318464
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 31 May 2020

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. code generation
    2. extensible compilers
    3. futamura projections
    4. query compilation
    5. spatial query processing
    6. staging

    Qualifiers

    • Research-article

    Funding Sources

    Conference

    SIGMOD/PODS '20
    Sponsor:

    Acceptance Rates

    Overall Acceptance Rate 785 of 4,003 submissions, 20%

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)132
    • Downloads (Last 6 weeks)20
    Reflects downloads up to 03 Oct 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)Compilation of Shape Operators on Sparse ArraysProceedings of the ACM on Programming Languages10.1145/36897528:OOPSLA2(1162-1188)Online publication date: 8-Oct-2024
    • (2024)Query Compilation Without RegretsProceedings of the ACM on Management of Data10.1145/36549682:3(1-28)Online publication date: 30-May-2024
    • (2023) Deep Game Location Acquisition for Pokemon GO on Mobile Devices IEEE Transactions on Games10.1109/TG.2022.321705715:3(440-449)Online publication date: Sep-2023
    • (2022)BabelfishProceedings of the VLDB Endowment10.14778/3489496.348950115:2(196-210)Online publication date: 4-Feb-2022
    • (2021)Towards a polyglot framework for factorized MLProceedings of the VLDB Endowment10.14778/3476311.347637214:12(2918-2931)Online publication date: 28-Oct-2021
    • (2020)Compiling symbolic execution with staging and algebraic effectsProceedings of the ACM on Programming Languages10.1145/34282324:OOPSLA(1-33)Online publication date: 13-Nov-2020
    • (2020)How Good Are Modern Spatial Libraries?Data Science and Engineering10.1007/s41019-020-00147-96:2(192-208)Online publication date: 7-Nov-2020

    View Options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Get Access

    Login options

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media