research-article

FlexPushdownDB: hybrid pushdown and caching in a cloud DBMS

Authors:

Matthew Woicik,

Marco Serafini,

Ashraf Aboulnaga,

Michael Stonebraker

Proceedings of the VLDB Endowment, Volume 14, Issue 11

Pages 2101 - 2113

https://doi.org/10.14778/3476249.3476265

Published: 01 July 2021

Abstract

Modern cloud databases adopt a storage-disaggregation architecture that separates the management of computation and storage. A major bottleneck in such an architecture is the network connecting the computation and storage layers. Two solutions have been explored to mitigate the bottleneck: caching and computation pushdown. While both techniques can significantly reduce network traffic, existing DBMSs consider them as orthogonal techniques and support only one or the other, leaving potential performance benefits unexploited.

In this paper, we present FlexPushdownDB (FPDB), an OLAP cloud DBMS prototype that supports fine-grained hybrid query execution to combine the benefits of caching and computation pushdown in a storage-disaggregation architecture. We build a hybrid query executor based on a new concept called separable operators to combine data from the cache with results from pushdown processing. We also propose a novel Weighted-LFU cache replacement policy that takes into account the cost of pushdown computation. Our experimental evaluation on the Star Schema Benchmark shows that hybrid execution outperforms both the conventional caching-only architecture and the pushdown-only architecture by 2.2×. In the hybrid architecture, our experiments show that Weighted-LFU can outperform the baseline LFU by 37%.
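To make the idea of a cost-aware replacement policy concrete, the following is a minimal sketch of a weighted LFU cache. The abstract does not give the weighting formula, so everything here is an assumption for illustration: the class `WeightedLFUCache`, its methods, and the `weight` argument (a stand-in for an estimated pushdown cost) are hypothetical names, not FPDB's implementation. The only point it illustrates is that a segment's score grows by a cost-dependent weight per access instead of by 1, so segments that are expensive to serve through pushdown are favored for caching.

```python
from dataclasses import dataclass, field


@dataclass
class WeightedLFUCache:
    """Toy segment cache that evicts the lowest weighted-frequency segment.

    Hypothetical sketch: the weighting scheme is an assumption, not the
    policy defined in the paper.
    """
    capacity_bytes: int
    used_bytes: int = 0
    sizes: dict = field(default_factory=dict)   # segment key -> size in bytes
    scores: dict = field(default_factory=dict)  # segment key -> weighted frequency

    def record_access(self, key, weight=1.0):
        # Plain LFU adds 1 per access. Here the increment is a weight that
        # stands in for the estimated cost of serving the access by pushdown.
        self.scores[key] = self.scores.get(key, 0.0) + weight

    def admit(self, key, size_bytes):
        """Try to cache a segment; evict only segments with a lower score."""
        if key in self.sizes:
            return True
        if size_bytes > self.capacity_bytes:
            return False
        score = self.scores.get(key, 0.0)
        free = self.capacity_bytes - self.used_bytes
        victims, freed = [], 0
        # Walk cached segments from least to most valuable.
        for k in sorted(self.sizes, key=lambda k: self.scores.get(k, 0.0)):
            if free + freed >= size_bytes:
                break
            if self.scores.get(k, 0.0) >= score:
                return False  # would have to evict a more valuable segment
            victims.append(k)
            freed += self.sizes[k]
        if free + freed < size_bytes:
            return False
        for k in victims:
            self.used_bytes -= self.sizes.pop(k)
        self.sizes[key] = size_bytes
        self.used_bytes += size_bytes
        return True


cache = WeightedLFUCache(capacity_bytes=100)
for _ in range(3):
    cache.record_access("seg_a", weight=1.0)  # cheap to push down: score 3.0
for _ in range(2):
    cache.record_access("seg_b", weight=2.0)  # costly to push down: score 4.0
cache.admit("seg_a", 60)
cache.admit("seg_b", 60)  # seg_a is evicted to make room
print(sorted(cache.sizes))  # ['seg_b']
```

With all weights set to 1.0 the sketch reduces to ordinary LFU, in which case `seg_a` (3 accesses) would stay cached and `seg_b` (2 accesses) would be rejected.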


Published In

Proceedings of the VLDB Endowment, Volume 14, Issue 11

July 2021

732 pages

ISSN: 2150-8097

Editors: Xin Luna Dong (Amazon), Felix Naumann (HPI, University of Potsdam)

Publisher: VLDB Endowment
