
Performance of database workloads on shared-memory systems with out-of-order processors

Authors:

Parthasarathy Ranganathan,

Kourosh Gharachorloo,

Sarita V. Adve,

Luiz André Barroso

ACM SIGPLAN Notices, Volume 33, Issue 11

Pages 307-318

https://doi.org/10.1145/291006.291067

Published: 01 October 1998

Abstract

Database applications such as online transaction processing (OLTP) and decision support systems (DSS) constitute the largest and fastest-growing segment of the market for multiprocessor servers. However, most current system designs have been optimized to perform well on scientific and engineering workloads. Given the radically different behavior of database workloads (especially OLTP), it is important to re-evaluate key system design decisions in the context of this important class of applications.

This paper examines the behavior of database workloads on shared-memory multiprocessors with aggressive out-of-order processors, and considers simple optimizations that can provide further performance improvements. Our study is based on detailed simulations of the Oracle commercial database engine. The results show that the combination of out-of-order execution and multiple instruction issue is indeed effective in improving performance of database workloads, providing gains of 1.5 and 2.6 times over an in-order single-issue processor for OLTP and DSS, respectively. In addition, speculative techniques enable optimized implementations of memory consistency models that significantly improve the performance of stricter consistency models, bringing the performance to within 10-15% of the performance of more relaxed models.

The second part of our study focuses on the more challenging OLTP workload. We show that an instruction stream buffer is effective in reducing the remaining instruction stalls in OLTP, providing a 17% reduction in execution time (approaching a perfect instruction cache to within 15%). Furthermore, our characterization shows that a large fraction of the data communication misses in OLTP exhibit migratory behavior; our preliminary results show that software prefetch and writeback/flush hints can be used for this data to further reduce execution time by 12%.
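To make the instruction stream buffer idea concrete, here is a minimal sketch of a classic sequential stream buffer (in the style of Jouppi's 1990 design) sitting in front of a direct-mapped instruction cache. Everything in it is an illustrative assumption rather than the configuration simulated in the paper: the class and function names (`DirectMappedICache`, `count_fetch_misses`) are hypothetical, only the head of the buffer is checked, prefetches are assumed to complete instantly, and the traces are toy sequences of cache-line addresses.

```python
from collections import deque


class DirectMappedICache:
    """Hypothetical toy direct-mapped I-cache indexed by cache-line address."""

    def __init__(self, num_lines: int) -> None:
        self.num_lines = num_lines
        self.tags = [None] * num_lines  # one resident line address per set

    def lookup(self, line_addr: int) -> bool:
        return self.tags[line_addr % self.num_lines] == line_addr

    def fill(self, line_addr: int) -> None:
        self.tags[line_addr % self.num_lines] = line_addr


def count_fetch_misses(trace, cache_lines: int, stream_depth: int) -> int:
    """Count fetches that miss in both the I-cache and the stream buffer.

    stream_depth == 0 models a cache with no stream buffer.
    Assumption: prefetches complete instantly (no timing model).
    """
    cache = DirectMappedICache(cache_lines)
    stream = deque()  # FIFO of prefetched sequential line addresses
    misses = 0
    for line in trace:
        if cache.lookup(line):
            continue
        if stream and stream[0] == line:
            # Hit at the head of the stream buffer: move the line into the
            # cache and prefetch the next sequential line to keep it full.
            stream.popleft()
            cache.fill(line)
            stream.append(stream[-1] + 1 if stream else line + 1)
            continue
        # Miss in both: fetch from the next level, then flush the buffer and
        # restart it at the lines that follow the missing one.
        misses += 1
        cache.fill(line)
        stream = deque(range(line + 1, line + 1 + stream_depth))
    return misses


if __name__ == "__main__":
    sequential = list(range(32))
    print(count_fetch_misses(sequential, 8, 0))  # 32: every line is new
    print(count_fetch_misses(sequential, 8, 4))  # 1: only the first fetch misses
```

In this model every non-sequential jump flushes and restarts the buffer, so how much it helps depends on how long the sequential runs in the instruction trace are.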



Published In


ACM SIGPLAN Notices Volume 33, Issue 11

Nov. 1998

309 pages

ISSN:0362-1340

EISSN:1558-1160

DOI:10.1145/291006

Chairmen:
Dileep Bhandarkar, Intel
Anant Agarwal, Massachusetts Institute of Technology, Cambridge


ASPLOS VIII: Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
October 1998
326 pages
ISBN:1581131070
DOI:10.1145/291069
Chairmen:
Dileep Bhandarkar, Intel
Anant Agarwal, Massachusetts Institute of Technology, Cambridge

Copyright © 1998 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States
