
Budget-Conscious Fine-Grained Configuration Optimization for Spatio-Temporal Applications

Published: 01 September 2022

Abstract

Modern spatio-temporal data mining applications have demanding performance requirements, so in-memory database systems are often used to store and process their data. To use scarce DRAM capacity efficiently, modern database systems offer various tuning options that reduce the memory footprint (e.g., data compression) or increase performance (e.g., additional indexes). However, selecting configurations that balance cost and performance is challenging because the vast number of possible setups consists of mutually dependent individual decisions. In this paper, we introduce a novel approach to jointly optimize the compression, sorting, indexing, and tiering configuration for spatio-temporal workloads. Further, we consider horizontal data partitioning, which enables the independent application of different tuning options on a fine-grained level. We propose different linear programming (LP) models that address cost dependencies at different levels of accuracy to compute optimized tuning configurations for a given workload and memory budget. To yield maintainable and robust configurations, we extend our LP-based approach to incorporate reconfiguration costs as well as a worst-case optimization for potential workload scenarios. Finally, we demonstrate on a real-world dataset that, compared to existing tuning heuristics, our models significantly reduce the memory footprint at equal performance or increase performance at equal memory size.
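To make the optimization problem in the abstract concrete, the following is a minimal sketch of choosing one tuning option per horizontal partition so that total runtime is minimized under a memory budget. It is not the paper's LP formulation: it replaces the LP solver with brute-force enumeration, assumes that the costs of different partitions are independent, and uses hypothetical partition names, option names, and cost figures.

```python
from itertools import product

# Hypothetical options per partition: option name -> (memory in MB, runtime in ms).
# All names and numbers are invented for illustration only.
OPTIONS = {
    "chunk_0": {
        "unencoded_indexed": (100, 10),
        "dictionary": (50, 20),
        "lz4": (20, 60),
    },
    "chunk_1": {
        "unencoded_indexed": (80, 8),
        "dictionary": (40, 15),
        "lz4": (15, 50),
    },
}


def best_configuration(options, memory_budget):
    """Pick exactly one option per partition, minimizing total runtime
    while keeping total memory within memory_budget.

    Returns (runtime, memory, {partition: option}) or None if no
    combination fits the budget. Exhaustive search, so only suitable
    for tiny instances.
    """
    partitions = sorted(options)
    best = None
    # Each element of `choice` is (option_name, (memory, runtime)).
    for choice in product(*(options[p].items() for p in partitions)):
        memory = sum(cost[0] for _, cost in choice)
        runtime = sum(cost[1] for _, cost in choice)
        if memory > memory_budget:
            continue  # violates the memory budget constraint
        if best is None or runtime < best[0]:
            assignment = {p: name for p, (name, _) in zip(partitions, choice)}
            best = (runtime, memory, assignment)
    return best


if __name__ == "__main__":
    for budget in (30, 120, 140):
        print(budget, best_configuration(OPTIONS, budget))
```

Enumeration grows exponentially with the number of partitions, which is one reason a solver-based formulation is attractive for realistic problem sizes. The sketch also ignores the cost dependencies between decisions that the abstract highlights.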

Cited By

  • (2023) AWARE: Workload-aware, Redundancy-exploiting Linear Algebra. Proceedings of the ACM on Management of Data 1:1 (1-28). DOI: 10.1145/3588682. Online publication date: 30-May-2023

Published In

Proceedings of the VLDB Endowment, Volume 15, Issue 13
September 2022
278 pages
ISSN:2150-8097

Publisher

VLDB Endowment

Publication History

Published: 01 September 2022
Published in PVLDB Volume 15, Issue 13

Qualifiers

  • Research-article
