DOI: 10.1145/2882903.2915218
research-article

DBSherlock: A Performance Diagnostic Tool for Transactional Databases

Published: 26 June 2016
    Abstract

    Running an online transaction processing (OLTP) system is one of the most daunting tasks required of database administrators (DBAs). As businesses rely on OLTP databases to support their mission-critical and real-time applications, poor database performance directly impacts their revenue and user experience. As a result, DBAs constantly monitor, diagnose, and rectify any performance decays. Unfortunately, the manual process of debugging and diagnosing OLTP performance problems is extremely tedious and non-trivial. Rather than being caused by a single slow query, performance problems in OLTP databases are often due to a large number of concurrent and competing transactions adding up to compounded, non-linear effects that are difficult to isolate. Sudden changes in request volume, transactional patterns, network traffic, or data distribution can cause previously abundant resources to become scarce, and the performance to plummet.
    This paper presents a practical tool for assisting DBAs in quickly and reliably diagnosing performance problems in an OLTP database. By analyzing hundreds of statistics and configurations collected over the lifetime of the system, our algorithm quickly identifies a small set of potential causes and presents them to the DBA. The root cause established by the DBA is reincorporated into our algorithm as a new causal model to improve future diagnoses. Our experiments show that this algorithm is substantially more accurate than the state-of-the-art algorithm in finding correct explanations.

    Published In

    SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data
    June 2016
    2300 pages
    ISBN: 9781450335317
    DOI: 10.1145/2882903
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. OLTP
    2. anomaly detection
    3. performance diagnosis
    4. transactions

    Qualifiers

    • Research-article

    Funding Sources

    • National Science Foundation

    Conference

    SIGMOD/PODS'16: International Conference on Management of Data
    June 26 - July 1, 2016
    San Francisco, California, USA

    Acceptance Rates

    Overall Acceptance Rate 785 of 4,003 submissions, 20%

    Cited By

    • (2024) Multivariate time series collaborative compression for monitoring systems in securing cloud-based digital twin. Journal of Cloud Computing, 13:1. DOI: 10.1186/s13677-023-00579-4. Online publication date: 10-Jan-2024
    • (2024) Enabling Runtime Verification of Causal Discovery Algorithms with Automated Conditional Independence Reasoning. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, pages 1-13. DOI: 10.1145/3597503.3623348. Online publication date: 20-May-2024
    • (2023) FASTune: Towards Fast and Stable Database Tuning System with Reinforcement Learning. Electronics, 12:10, article 2168. DOI: 10.3390/electronics12102168. Online publication date: 10-May-2023
    • (2023) Real-Time Workload Pattern Analysis for Large-Scale Cloud Databases. Proceedings of the VLDB Endowment, 16:12, pages 3689-3701. DOI: 10.14778/3611540.3611557. Online publication date: 1-Aug-2023
    • (2023) ShapleyIQ: Influence Quantification by Shapley Values for Performance Debugging of Microservices. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 4, pages 287-323. DOI: 10.1145/3623278.3624771. Online publication date: 25-Mar-2023
    • (2023) XInsight: eXplainable Data Analysis Through The Lens of Causality. Proceedings of the ACM on Management of Data, 1:2, pages 1-27. DOI: 10.1145/3589301. Online publication date: 20-Jun-2023
    • (2023) DBPA: A Benchmark for Transactional Database Performance Anomalies. Proceedings of the ACM on Management of Data, 1:1, pages 1-26. DOI: 10.1145/3588926. Online publication date: 30-May-2023
    • (2023) QEVIS: Multi-grained Visualization of Distributed Query Execution. IEEE Transactions on Visualization and Computer Graphics, pages 1-11. DOI: 10.1109/TVCG.2023.3326930. Online publication date: 2023
    • (2023) CODEC: Cost-Effective Duration Prediction System for Deadline Scheduling in the Cloud. 2023 IEEE 34th International Symposium on Software Reliability Engineering (ISSRE), pages 298-308. DOI: 10.1109/ISSRE59848.2023.00069. Online publication date: 9-Oct-2023
    • (2023) Aegis: Attribution of Control Plane Change Impact across Layers and Components for Cloud Systems. 2023 IEEE/ACM 45th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP), pages 222-233. DOI: 10.1109/ICSE-SEIP58684.2023.00026. Online publication date: May-2023
