DOI: 10.1145/3267809.3267822
research-article

Pay One, Get Hundreds for Free: Reducing Cloud Costs through Shared Query Execution

Published: 11 October 2018
    Abstract

    Cloud-based data analysis is now common practice because of its lower system management overhead and pay-as-you-go pricing model. This pricing model, however, is not always suitable for query processing, as heavy use results in high costs. For example, in query-as-a-service systems, where users are charged per processed byte, collections of queries that frequently access the same data can become expensive. The problem is compounded by the limited options users have for optimizing query execution when using declarative interfaces such as SQL. In this paper, we show how, without modifying existing systems and without the involvement of the cloud provider, it is possible to significantly reduce the overhead, and hence the cost, of query-as-a-service systems. Our approach is based on query rewriting, so that multiple concurrent queries are combined into a single query. Our experiments show that the aggregate amount of work done by the shared execution is smaller than in a query-at-a-time approach. Since queries are charged per byte processed, the cost of executing a group of queries is often the same as executing a single one of them. As an example, we demonstrate how the shared execution of the TPC-H benchmark is up to 100x and 16x cheaper in Amazon Athena and Google BigQuery, respectively, than using a query-at-a-time approach, while achieving a higher throughput.
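
    The cost argument in the abstract can be illustrated with a minimal sketch. It is not the rewriting scheme from the paper: it uses plain conditional aggregation as a stand-in, SQLite in place of a query-as-a-service system, and a hypothetical `orders` table. The price per terabyte and the table size are assumed values chosen only to make the arithmetic visible.

```python
import sqlite3

# Assumed values for illustration only; not taken from any provider's price list.
PRICE_PER_TB = 5.0      # hypothetical USD charged per terabyte scanned
TABLE_BYTES = 10**12    # hypothetical size of the scanned table (1 TB)


def scan_cost(num_scans, table_bytes=TABLE_BYTES, price_per_tb=PRICE_PER_TB):
    """Cost under per-byte billing when every scan reads the whole table."""
    return num_scans * table_bytes / 10**12 * price_per_tb


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("EU", 10), ("US", 20), ("EU", 30), ("ASIA", 40), ("US", 50)],
)

regions = ["EU", "US", "ASIA"]

# Query-at-a-time: each query scans `orders` on its own.
separate = [
    conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM orders WHERE region = ?", (r,)
    ).fetchone()[0]
    for r in regions
]

# Shared execution: one rewritten query answers all three in a single scan.
select_list = ", ".join(
    "COALESCE(SUM(CASE WHEN region = ? THEN amount END), 0)" for _ in regions
)
shared = list(conn.execute(f"SELECT {select_list} FROM orders", regions).fetchone())

# Both strategies must return the same answers; only the number of scans differs.
assert separate == shared
print("results:", shared)
print("query-at-a-time cost:", scan_cost(len(regions)))
print("shared cost:", scan_cost(1))
```

    Under these assumptions, the three separate queries are billed for three full scans, while the combined query is billed for one, which is the effect the abstract describes for a group of queries over the same data.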

    Information

    Published In

    SoCC '18: Proceedings of the ACM Symposium on Cloud Computing
    October 2018
    546 pages
    ISBN: 9781450360111
    DOI: 10.1145/3267809

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Cloud Computing
    2. Data Warehouse
    3. Query Processing
    4. Serverless
    5. Shared Workload Execution

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    SoCC '18: ACM Symposium on Cloud Computing
    October 11 - 13, 2018
    Carlsbad, CA, USA

    Acceptance Rates

    Overall Acceptance Rate 169 of 722 submissions, 23%

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 21
    • Downloads (Last 6 weeks): 1
    Reflects downloads up to 27 Jul 2024

    Cited By

    • (2023) tf.data service. Proceedings of the 2023 ACM Symposium on Cloud Computing, pp. 358-375. DOI: 10.1145/3620678.3624666. Online publication date: 30-Oct-2023
    • (2023) Secure query processing and optimization in cloud environment: a review. Information Security Journal: A Global Perspective 33:2, pp. 172-191. DOI: 10.1080/19393555.2023.2270976. Online publication date: 20-Dec-2023
    • (2022) ProRes: Proactive re-selection of materialized views. Computer Science and Information Systems 19:2, pp. 735-762. DOI: 10.2298/CSIS210606003M. Online publication date: 2022
    • (2021) A Unified Approach to Spatial Proximity Query Processing in Dynamic Spatial Networks. Sensors 21:16 (5258). DOI: 10.3390/s21165258. Online publication date: 4-Aug-2021
    • (2021) Database technology for the masses. Proceedings of the VLDB Endowment 14:11, pp. 2483-2490. DOI: 10.14778/3476249.3476296. Online publication date: 27-Oct-2021
    • (2021) Sharing opportunities for OLTP workloads in different isolation levels. Proceedings of the VLDB Endowment 13:10, pp. 1696-1708. DOI: 10.14778/3401960.3401967. Online publication date: 10-Mar-2021
    • (2020) Workload merging potential in SAP Hybris. Proceedings of the workshop on Testing Database Systems, pp. 1-6. DOI: 10.1145/3395032.3395326. Online publication date: 19-Jun-2020
    • (2020) Group Processing of Multiple k-Farthest Neighbor Queries in Road Networks. IEEE Access 8, pp. 110959-110973. DOI: 10.1109/ACCESS.2020.3002263. Online publication date: 2020
    • (2019) Security and Cost Optimization Auditing for Amazon Web Services. Proceedings of the 2nd International Conference on Software Engineering and Information Management, pp. 44-48. DOI: 10.1145/3305160.3305181. Online publication date: 10-Jan-2019
