Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
10.1145/3514221.3526050acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

Intelligent Automated Workload Analysis for Database Replatforming

Published: 11 June 2022 Publication History
  • Get Citation Alerts
  • Abstract

    Performing a detailed workload analysis is a crucial step in determining the feasibility, timeline and cost of a major data warehouse replatforming project, i.e., migration from one platform to another. A large company's data warehouse applications may include millions of queries, some of which will use features that are unsupported or have different semantics in the new warehouse, or may have poor performance there.
    In this paper we present qInsight, a workload analyzer that Datometry has used in data warehouse replatforming efforts for dozens of major clients. qInsight leverages Datometry's Hyper-Q to obtain insights from a workload, including SQL features and workload structural information that could not be obtained without deep query analysis. qInsight uses the identified features and a weighting scheme based on human expert judgments to assess the difficulty of rewriting each application in the workload via traditional migration methods. Datometry's clients find this information useful in planning their projects, including the order in which to migrate applications. We present a qInsight-based data warehouse usage analysis of over 1.7 billion queries from real-world workloads.

    References

    [1]
    Sanjay Agrawal, Surajit Chaudhuri, Lubor Kollár, Arunprasad P. Marathe, Vivek R. Narasayya, and Manoj Syamala. 2004. Database Tuning Advisor for Microsoft SQL Server 2005. In (e)Proceedings of the Thirtieth International Conference on Very Large Data Bases, VLDB 2004, Toronto, Canada, August 31 - September 3 2004, Mario A. Nascimento, M. Tamer Özsu, Donald Kossmann, Renée J. Miller, José A. Blakeley, and K. Bernhard Schiefer (Eds.). Morgan Kaufmann, 1110--1121. https://doi.org/10.1016/B978-012088469--8.50097--8
    [2]
    Mert Akdere, Ugur Çetintemel, Matteo Riondato, Eli Upfal, and Stanley B. Zdonik. 2012. Learning-based Query Performance Modeling and Prediction. In IEEE 28th International Conference on Data Engineering (ICDE 2012), Washington, DC, USA (Arlington, Virginia), 1--5 April, 2012, Anastasios Kementsietsidis and Marcos Antonio Vaz Salles (Eds.). IEEE Computer Society, 390--401. https://doi.org/10.1109/ICDE.2012.64
    [3]
    Amazon. 2021. Amazon Redshift - Correlated subqueries. Retrieved February, 2021 from https://docs.aws.amazon.com/redshift/latest/dg/r_correlated_subqueries.html
    [4]
    Amazon. 2021. Amazon Redshift - Defining table constraints. Retrieved February, 2021 from https://docs.aws.amazon.com/redshift/latest/dg/t_Defining_constraints.html
    [5]
    Amazon. 2021. AWS Schema Conversion Tool. Retrieved February, 2021 from https://docs.aws.amazon.com/SchemaConversionTool/index.html
    [6]
    Lyublena Antova, Derrick Bryant, Tuan Cao, Michael Duller, Mohamed A Soliman, and F Michael Waas. 2018. Rapid Adoption of Cloud Data Warehouse Technology Using Datometry Hyper-Q. In SIGMOD.
    [7]
    Carlo Curino, Yang Zhang, Evan P. C. Jones, and Samuel Madden. 2010. Schism: a Workload-Driven Approach to Database Replication and Partitioning. Proc. VLDB Endow. 3, 1 (2010), 48--57. https://doi.org/10.14778/1920841.1920853
    [8]
    Benoît Dageville, Thierry Cruanes, Marcin Zukowski, Vadim Antonov, Artin Avanes, Jon Bock, Jonathan Claybaugh, Daniel Engovatov, Martin Hentschel, Jiansheng Huang, Allison W. Lee, Ashish Motivala, Abdul Q. Munir, Steven Pelley, Peter Povinec, Greg Rahn, Spyridon Triantafyllis, and Philipp Unterbrunner. 2016. The Snowflake Elastic Data Warehouse. In SIGMOD.
    [9]
    Datametica. 2021. Datametica migration documentation. Retrieved February, 2021 from https://www.datametica.com/migration-to-gcp/
    [10]
    Datametica. 2021. Raven. Retrieved February, 2021 from https://www.datametica.com/raven
    [11]
    C de Groote. 1989. Software portability: J Henderson Gower Technical Press, Aldershot, UK (1988) 151 (+ xix) pp£ 22.50 hardback.
    [12]
    David Flater. 2018. Software Science Revisited: Rationalizing Halstead's System Using Dimensionless Units. US Department of Commerce, National Institute of Standards and Technology.
    [13]
    Andre Freitas, Juliano Sales, Siegfried Handschuh, and Edward Curry. 2015. How hard is this query? Measuring the Semantic Complexity of Schema-agnostic Queries.
    [14]
    Hamza Ghandorh, Abdulfattah Noorwali, Ali Bou Nassif, Luiz Fernando Capretz, and Roy Eagleson. 2020. A Systematic Literature Review for Software Portability Measurement: Preliminary Results. In Proceedings of the 2020 9th International Conference on Software and Computer Applications. Association for Computing Machinery, New York, NY, USA, 152--157.
    [15]
    Google. 2021. Bigquery Data Transfer Service. Retrieved February, 2021 from https://cloud.google.com/bigquery-transfer/docs/data-warehouse-migration-overview
    [16]
    Google. 2021. Google BigQuery. Retrieved February, 2021 from https://cloud.google.com/bigquery/
    [17]
    Anurag Gupta, Deepak Agarwal, Derek Tan, Jakub Kulesza, Rahul Pathak, Stefano Stefani, and Vidhya Srinivasan. 2015. Amazon Redshift and the Case for Simpler Data Warehouses. In SIGMOD.
    [18]
    Mitsuari Hakuta and Masato Ohminami. 1997. A study of software portability evaluation. J. Syst. Softw. 38, 2 (Aug. 1997), 145--154.
    [19]
    Maurice Howard Halstead. 1977. Elements of Software Science. Elsevier.
    [20]
    Impetus. 2021. ETL, EDW and Analytics Conversion to Cloud. Retrieved February, 2021 from https://www.impetus.com/data-warehouse-modernization/automated-workload-transformation
    [21]
    Ispirer. 2021. Ispirer Database Migration. Retrieved February, 2021 from https://www.ispirer.com/products/database-migration
    [22]
    Ispirer. 2021. Ispirer MnMTK Customization Case Study. Retrieved February, 2021 from https://bit.ly/3pOyTYR
    [23]
    Shrainik Jain, Dominik Moritz, Daniel Halperin, Bill Howe, and Ed Lazowska. 2016. SQLShare: Results from a Multi-Year SQL-as-a-Service Experiment. In Proceedings of the 2016 International Conference on Management of Data, SIGMOD Conference 2016, San Francisco, CA, USA, June 26 - July 01, 2016, Fatma Özcan, Georgia Koutrika, and Sam Madden (Eds.). ACM, 281--293. https://doi.org/10.1145/2882903.2882957
    [24]
    Shrainik Jain, Jiaqi Yan, Thierry Cruanes, and Bill Howe. 2019. Database-Agnostic Workload Management. In CIDR 2019, 9th Biennial Conference on Innovative Data Systems Research, Asilomar, CA, USA, January 13--16, 2019, Online Proceedings. www.cidrdb.org. http://cidrdb.org/cidr2019/papers/p110-jain-cidr19.pdf
    [25]
    J Lenhard and G Wirtz. 2013. Measuring the Portability of Executable Service-Oriented Processes. In 2013 17th IEEE International Enterprise Distributed Object Computing Conference. 117--126.
    [26]
    Jiexing Li, Arnd Christian König, Vivek Narasayya, and Surajit Chaudhuri. 2012. Robust Estimation of Resource Consumption for SQL Queries using Statistical Techniques. arXiv:1208.0278 [cs.DB]
    [27]
    Lin Ma, Dana Van Aken, Ahmed Hefny, Gustavo Mezerhane, Andrew Pavlo, and Geoffrey J. Gordon. 2018. Query-based Workload Forecasting for Self-Driving Database Management Systems. In Proceedings of the 2018 International Conference on Management of Data, SIGMOD Conference 2018, Houston, TX, USA, June 10--15, 2018, Gautam Das, Christopher M. Jermaine, and Philip A. Bernstein (Eds.). ACM, 631--645. https://doi.org/10.1145/3183713.3196908
    [28]
    T. J. McCabe. 1976. A Complexity Measure. IEEE Transactions on Software Engineering SE-2, 4 (1976), 308--320. https://doi.org/10.1109/TSE.1976.233837
    [29]
    Microsoft. 2021. Azure Database Migration Service. Retrieved February, 2021 from https://azure.microsoft.com/en-us/services/database-migration
    [30]
    Microsoft. 2021. Microsoft Azure Synapse Analytics. Retrieved February 2021 from https://docs.microsoft.com/en-us/azure/sql-data-warehouse/massively-parallel-processing-mpp-architecture
    [31]
    Microsoft. 2021. Primary key, foreign key, and unique key using dedicated SQL pool in Azure Synapse Analytics. Retrieved February, 2021 from https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-table-constraints
    [32]
    Panos Parchas, Yonatan Naamad, Peter Van Bouwel, Christos Faloutsos, and Michalis Petropoulos. 2020. Fast and Effective Distribution-Key Recommendation for Amazon Redshift. Proc. VLDB Endow. 13, 11 (2020), 2411--2423. http://www.vldb.org/pvldb/vol13/p2411-parchas.pdf
    [33]
    Youcef Remil, Anes Bendimerad, Romain Mathonat, Philippe Chaleat, and Mehdi Kaytoue. 2021. "What makes my queries slow?": Subgroup Discovery for SQL Workload Analysis.
    [34]
    Mohamed A. Soliman, Lyublena Antova, Marc Sugiyama, Michael Duller, Amirhossein Aleyasen, Gourab Mitra, Ehab Abdelhamid, Mark Morcos, Michele Gage, Dmitri Korablev, and Florian M. Waas. 2020. A Framework for Emulating Database Operations in Cloud Data Warehouses. In SIGMOD.
    [35]
    Ramakrishna Varadarajan, Vivek Bharathan, Ariel Cary, Jaimin Dave, and Sreenath Bodagala. 2014. DBDesigner: A customizable physical design tool for Vertica Analytic Database. In IEEE 30th International Conference on Data Engineering, Chicago, ICDE 2014, IL, USA, March 31 - April 4, 2014, Isabel F. Cruz, Elena Ferrari, Yufei Tao, Elisa Bertino, and Goce Trajcevski (Eds.). IEEE Computer Society, 1084--1095. https://doi.org/10.1109/ICDE.2014.6816725
    [36]
    Aditya Vashistha and Shrainik Jain. 2016. Measuring query complexity in SQL- Share workload. In Proceedings of the 2019 international conference on management of data.
    [37]
    Wentao Wu, Y. Chi, Hakan Hacigümüs, and J. Naughton. 2013. Towards Predicting Query Execution Time for Concurrent and Dynamic Database Workloads. Proc. VLDB Endow. 6 (2013), 925--936.
    [38]
    Zainab Zolaktaf, Mostafa Milani, and Rachel Pottinger. 2020. Facilitating SQL Query Composition and Analysis. In Proceedings of the 2020 International Conference on Management of Data, SIGMOD Conference 2020, online conference [Portland, OR, USA], June 14--19, 2020, David Maier, Rachel Pottinger, AnHai Doan, Wang-Chiew Tan, Abdussalam Alawini, and Hung Q. Ngo (Eds.). ACM, 209--224. https://doi.org/10.1145/3318464.3380602

    Cited By

    View all
    • (2024)TRAP: Tailored Robustness Assessment for Index Advisors via Adversarial Perturbation2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00011(42-55)Online publication date: 13-May-2024
    • (2023)S/C: Speeding up Data Materialization with Bounded Memory2023 IEEE 39th International Conference on Data Engineering (ICDE)10.1109/ICDE55515.2023.00393(1981-1994)Online publication date: Apr-2023
    • (2023)How Global Retailer ADEO Migrated to Google BigQuery with Database Virtualization2023 IEEE International Conference on Big Data (BigData)10.1109/BigData59044.2023.10386640(1895-1898)Online publication date: 15-Dec-2023

    Index Terms

    1. Intelligent Automated Workload Analysis for Database Replatforming

        Recommendations

        Comments

        Information & Contributors

        Information

        Published In

        cover image ACM Conferences
        SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data
        June 2022
        2597 pages
        ISBN:9781450392495
        DOI:10.1145/3514221
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Sponsors

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        Published: 11 June 2022

        Permissions

        Request permissions for this article.

        Check for updates

        Author Tags

        1. adaptive data virtualization
        2. data warehousing
        3. database replatforming
        4. porting complexity
        5. workload analysis

        Qualifiers

        • Research-article

        Conference

        SIGMOD/PODS '22
        Sponsor:

        Acceptance Rates

        Overall Acceptance Rate 785 of 4,003 submissions, 20%

        Contributors

        Other Metrics

        Bibliometrics & Citations

        Bibliometrics

        Article Metrics

        • Downloads (Last 12 months)65
        • Downloads (Last 6 weeks)3
        Reflects downloads up to

        Other Metrics

        Citations

        Cited By

        View all
        • (2024)TRAP: Tailored Robustness Assessment for Index Advisors via Adversarial Perturbation2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00011(42-55)Online publication date: 13-May-2024
        • (2023)S/C: Speeding up Data Materialization with Bounded Memory2023 IEEE 39th International Conference on Data Engineering (ICDE)10.1109/ICDE55515.2023.00393(1981-1994)Online publication date: Apr-2023
        • (2023)How Global Retailer ADEO Migrated to Google BigQuery with Database Virtualization2023 IEEE International Conference on Big Data (BigData)10.1109/BigData59044.2023.10386640(1895-1898)Online publication date: 15-Dec-2023

        View Options

        Get Access

        Login options

        View options

        PDF

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader

        Media

        Figures

        Other

        Tables

        Share

        Share

        Share this Publication link

        Share on social media