Performance of database workloads on shared-memory systems with out-of-order processors

Published: 01 October 1998

    Abstract

    Database applications such as online transaction processing (OLTP) and decision support systems (DSS) constitute the largest and fastest-growing segment of the market for multiprocessor servers. However, most current system designs have been optimized to perform well on scientific and engineering workloads. Given the radically different behavior of database workloads (especially OLTP), it is important to re-evaluate key system design decisions in the context of this important class of applications.

    This paper examines the behavior of database workloads on shared-memory multiprocessors with aggressive out-of-order processors, and considers simple optimizations that can provide further performance improvements. Our study is based on detailed simulations of the Oracle commercial database engine. The results show that the combination of out-of-order execution and multiple instruction issue is indeed effective in improving the performance of database workloads, providing gains of 1.5 and 2.6 times over an in-order single-issue processor for OLTP and DSS, respectively. In addition, speculative techniques enable optimized implementations of memory consistency models that significantly improve the performance of stricter consistency models, bringing the performance to within 10-15% of the performance of more relaxed models.

    The second part of our study focuses on the more challenging OLTP workload. We show that an instruction stream buffer is effective in reducing the remaining instruction stalls in OLTP, providing a 17% reduction in execution time (approaching a perfect instruction cache to within 15%). Furthermore, our characterization shows that a large fraction of the data communication misses in OLTP exhibit migratory behavior; our preliminary results show that software prefetch and writeback/flush hints can be used for this data to further reduce execution time by 12%.
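The instruction stream buffer mentioned in the abstract can be illustrated with a toy model. The sketch below is not the paper's simulator: it assumes a small fully associative LRU instruction cache, a single sequential stream buffer that is checked only at its head, and prefetches that always arrive in time. The function name `count_stalls` and its parameters are hypothetical, chosen only for this illustration.

```python
from collections import OrderedDict, deque


def count_stalls(line_addrs, cache_lines=4, buffer_depth=0):
    """Count instruction-line fetches that miss both the cache and the stream buffer.

    Toy model (assumptions, not the paper's configuration):
    - the instruction cache is fully associative with LRU replacement;
    - the stream buffer holds the next `buffer_depth` sequential lines
      after the most recent stalling miss and is checked only at its head;
    - prefetched lines are always available when requested (no timing).
    """
    cache = OrderedDict()   # keys are line addresses, oldest entry first
    stream = deque()        # prefetched sequential line addresses
    stalls = 0

    for addr in line_addrs:
        if addr in cache:
            cache.move_to_end(addr)          # refresh LRU position
            continue

        if stream and stream[0] == addr:
            # Stream-buffer hit: consume the head and prefetch one more line.
            stream.popleft()
            stream.append(addr + buffer_depth)
        else:
            # Miss everywhere: stall, then restart the buffer after this line.
            stalls += 1
            stream = deque(range(addr + 1, addr + 1 + buffer_depth))

        cache[addr] = True
        if len(cache) > cache_lines:
            cache.popitem(last=False)        # evict the least recently used line

    return stalls


if __name__ == "__main__":
    # Two sequential passes over a 16-line code region that overflows a 4-line cache.
    trace = list(range(16)) * 2
    print("stalls without stream buffer:", count_stalls(trace, 4, 0))
    print("stalls with 4-entry stream buffer:", count_stalls(trace, 4, 4))
```

In this model, a code footprint larger than the cache misses on every line when fetched sequentially, while the stream buffer absorbs all but the first miss of each sequential run. Real OLTP instruction streams contain branches and calls, so the benefit in practice is smaller than this idealized trace suggests.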


    Published In

    ACM SIGPLAN Notices, Volume 33, Issue 11
    Nov. 1998
    309 pages
    ISSN:0362-1340
    EISSN:1558-1160
    DOI:10.1145/291006
    • ACM Conferences
      ASPLOS VIII: Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
      October 1998
      326 pages
      ISBN:1581131070
      DOI:10.1145/291069
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

