High-Performance Query Processing with NVMe Arrays: Spilling without Killing Performance

Authors: Maximilian Kuschewski, Thomas Neumann, Viktor Leis

Proceedings of the ACM on Management of Data, Volume 2, Issue 6

Article No.: 238, Pages 1 - 27

https://doi.org/10.1145/3698813

Published: 20 December 2024

Abstract

This paper aims to bridge the gap between fast in-memory query engines and slow but robust engines that can utilize external storage. We find that current systems have to choose between fast in-memory operators and slower out-of-memory operators. We present a solution that leverages two independent but complementary techniques: First, we propose adaptive materialization, which can turn any hash-based in-memory operator into an out-of-memory operator without reducing in-memory performance. Second, we introduce self-regulating compression, which optimizes the throughput of spilling operators based on the current workload and available hardware. We evaluate these techniques using the prototype query engine Spilly, which matches the performance of state-of-the-art in-memory systems, but also efficiently executes large out-of-memory workloads by spilling to NVMe arrays.

Index Terms

Information systems

Published In

Proceedings of the ACM on Management of Data Volume 2, Issue 6

SIGMOD

December 2024

792 pages

EISSN:2836-6573

DOI:10.1145/3709598

Editor:
Divyakant Agrawal
University of California, Santa Barbara, United States

Copyright © 2024 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

European Research Council
