research-article

Choosing a cloud DBMS: architectures and tradeoffs

Published: 01 August 2019

Abstract

As analytic (OLAP) applications move to the cloud, DBMSs have shifted from a pure shared-nothing design with locally attached storage to a hybrid design that combines shared storage (e.g., AWS S3) with shared-nothing query execution mechanisms. This paper sheds light on the resulting tradeoffs, which have not been properly identified in previous work. To this end, it evaluates the TPC-H benchmark across a variety of DBMS offerings running in a cloud environment (AWS) on fast 10Gb+ networks, specifically database-as-a-service offerings (Redshift, Athena), query engines (Presto, Hive), and a traditional cloud-agnostic OLAP database (Vertica). While these comparisons cannot be apples-to-apples in all cases due to cloud configuration restrictions, we nonetheless identify patterns and design choices that are advantageous. These include prioritizing low-cost object stores like S3 for data storage; using system-agnostic yet still performant columnar formats like ORC, which allow easy switching to other systems for different workloads; and making features that benefit subsequent runs, such as query precompilation and caching remote data to faster storage, optional rather than required, because they disadvantage ad hoc queries.

    Published In

    Proceedings of the VLDB Endowment, Volume 12, Issue 12
    August 2019
    547 pages

    Publisher

    VLDB Endowment

    Article Metrics

    • Downloads (Last 12 months): 105
    • Downloads (Last 6 weeks): 6
    Reflects downloads up to 23 Dec 2024

    Cited By

    • (2024) Why TPC is Not Enough: An Analysis of the Amazon Redshift Fleet. Proceedings of the VLDB Endowment 17:11 (3694-3706). DOI: 10.14778/3681954.3682031. Online publication date: 30-Aug-2024
    • (2024) Saving Money for Analytical Workloads in the Cloud. Proceedings of the VLDB Endowment 17:11 (3524-3537). DOI: 10.14778/3681954.3682018. Online publication date: 1-Jul-2024
    • (2024) So Far and yet so Near - Accelerating Distributed Joins with CXL. Proceedings of the 20th International Workshop on Data Management on New Hardware (1-9). DOI: 10.1145/3662010.3663449. Online publication date: 10-Jun-2024
    • (2024) SkyPIE: A Fast & Accurate Oracle for Object Placement. Proceedings of the ACM on Management of Data 2:1 (1-27). DOI: 10.1145/3639310. Online publication date: 26-Mar-2024
    • (2024) Intelligent Scaling in Amazon Redshift. Companion of the 2024 International Conference on Management of Data (269-279). DOI: 10.1145/3626246.3653394. Online publication date: 9-Jun-2024
    • (2024) FlexpushdownDB: rethinking computation pushdown for cloud OLAP DBMSs. The VLDB Journal — The International Journal on Very Large Data Bases 33:5 (1643-1670). DOI: 10.1007/s00778-024-00867-8. Online publication date: 1-Sep-2024
    • (2023) Efficient Data Transfer in Shared-storage Cloud Data Processing Systems with OPTICS. Proceedings of the 33rd Annual International Conference on Computer Science and Software Engineering (230-234). DOI: 10.5555/3615924.3623630. Online publication date: 11-Sep-2023
    • (2023) Exploiting Cloud Object Storage for High-Performance Analytics. Proceedings of the VLDB Endowment 16:11 (2769-2782). DOI: 10.14778/3611479.3611486. Online publication date: 24-Aug-2023
    • (2023) Cloud Analytics Benchmark. Proceedings of the VLDB Endowment 16:6 (1413-1425). DOI: 10.14778/3583140.3583156. Online publication date: 1-Feb-2023
    • (2023) Raven: Benchmarking Monetary Expense and Query Efficiency of OLAP Engines on the Cloud. Database Systems for Advanced Applications (593-605). DOI: 10.1007/978-3-031-30678-5_45. Online publication date: 17-Apr-2023
