DOI:10.1145/3209950.3209958
research-article

Snowtrail: Testing with Production Queries on a Cloud Database

Published: 15 June 2018

Abstract

Database-as-a-service offerings on cloud computing platforms have rapidly gained popularity in recent years. The Snowflake Elastic Data Warehouse (henceforth Snowflake) is a cloud database service provided by Snowflake Computing. The cloud-native capabilities of new database services such as Snowflake open new opportunities for database testing. First, Snowflake maintains extensive knowledge of historical customer queries, including both the query text and the corresponding system configurations. Second, Snowflake is multi-tenant, which provides easy access to the metadata and data needed to rerun customer queries from a privileged role. Third, the elastic nature of Snowflake's data warehouse service allows testing with these queries on a separate set of resources, without impacting the customer's production workload.
This paper presents Snowtrail, an infrastructure developed within Snowflake for testing with customer production queries while obfuscating their results. Running tests with production queries gives us direct insight into how improvements and new features affect customer workloads. It also enables testing on queries with more varied shapes and greater complexity than developers can construct manually. Snowtrail is also used to help ensure the stability of the system's online upgrade process.

Published In

DBTest '18: Proceedings of the Workshop on Testing Database Systems
June 2018
49 pages
ISBN:9781450358262
DOI:10.1145/3209950
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Automation
  2. Database-as-a-service
  3. Snowflake
  4. Snowtrail
  5. Testing
  6. Workload Selection

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

SIGMOD/PODS '18
Acceptance Rates

Overall Acceptance Rate 31 of 56 submissions, 55%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 27
  • Downloads (Last 6 weeks): 3
Reflects downloads up to 25 Feb 2025

Citations

Cited By

  • (2024) Testing Graph Database Systems via Graph-Aware Metamorphic Relations. Proceedings of the VLDB Endowment 17:4 (836-848). DOI:10.14778/3636218.3636236. Online publication date: 5-Mar-2024
  • (2024) Keep It Simple: Testing Databases via Differential Query Plans. Proceedings of the ACM on Management of Data 2:3 (1-26). DOI:10.1145/3654991. Online publication date: 30-May-2024
  • (2024) DoppelGanger++: Towards Fast Dependency Graph Generation for Database Replay. Proceedings of the ACM on Management of Data 2:1 (1-26). DOI:10.1145/3639322. Online publication date: 26-Mar-2024
  • (2024) CERT: Finding Performance Issues in Database Systems Through the Lens of Cardinality Estimation. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering (1-13). DOI:10.1145/3597503.3639076. Online publication date: 20-May-2024
  • (2024) Mimicking Production Behavior With Generated Mocks. IEEE Transactions on Software Engineering 50:11 (2921-2946). DOI:10.1109/TSE.2024.3458448. Online publication date: Nov-2024
  • (2023) FASTune: Towards Fast and Stable Database Tuning System with Reinforcement Learning. Electronics 12:10 (2168). DOI:10.3390/electronics12102168. Online publication date: 10-May-2023
  • (2023) Testing Database Engines via Query Plan Guidance. Proceedings of the 45th International Conference on Software Engineering (2060-2071). DOI:10.1109/ICSE48619.2023.00174. Online publication date: 14-May-2023
  • (2023) Smart Query Sampling with Feature Coverage and Unsupervised Machine Learning. 2023 10th International Conference on Future Internet of Things and Cloud (FiCloud) (193-198). DOI:10.1109/FiCloud58648.2023.00036. Online publication date: 14-Aug-2023
  • (2022) DIAMETRICS. Communications of the ACM 65:12 (105-112). DOI:10.1145/3567464. Online publication date: 22-Nov-2022
  • (2022) Journey of Migrating Millions of Queries on The Cloud. Proceedings of the 2022 workshop on 9th International Workshop of Testing Database Systems (10-16). DOI:10.1145/3531348.3532177. Online publication date: 17-Jun-2022
