Open access

High-Performance Query Processing with NVMe Arrays: Spilling without Killing Performance

Published: 20 December 2024

Abstract

This paper aims to bridge the gap between fast in-memory query engines and slow but robust engines that can utilize external storage. We find that current systems have to choose between fast in-memory operators and slower out-of-memory operators. We present a solution that leverages two independent but complementary techniques: First, we propose adaptive materialization, which can turn any hash-based in-memory operator into an out-of-memory operator without reducing in-memory performance. Second, we introduce self-regulating compression, which optimizes the throughput of spilling operators based on the current workload and available hardware. We evaluate these techniques using the prototype query engine Spilly, which matches the performance of state-of-the-art in-memory systems, but also efficiently executes large out-of-memory workloads by spilling to NVMe arrays.


Published In

Proceedings of the ACM on Management of Data, Volume 2, Issue 6
SIGMOD
December 2024
792 pages
EISSN:2836-6573
DOI:10.1145/3709598
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published in PACMMOD Volume 2, Issue 6

Author Tags

  1. high-performance
  2. nvme
  3. olap
  4. out-of-core
  5. out-of-memory
  6. spilling
  7. ssd

Qualifiers

  • Research-article
