Article

Compiler optimization of scalar value communication between speculative threads

Authors:

Christopher B. Colohan,

J. Gregory Steffan,

Todd C. Mowry

ASPLOS X: Proceedings of the 10th international conference on Architectural support for programming languages and operating systems

Pages 171 - 183

https://doi.org/10.1145/605397.605416

Published: 01 October 2002

Abstract

While there have been many recent proposals for hardware that supports Thread-Level Speculation (TLS), there has been relatively little work on compiler optimizations to fully exploit this potential for parallelizing programs optimistically. In this paper, we focus on one important limitation of program performance under TLS, which is stalls due to forwarding scalar values between threads that would otherwise cause frequent data dependences. We present and evaluate dataflow algorithms for three increasingly aggressive instruction scheduling techniques that reduce the critical forwarding path introduced by the synchronization associated with this data forwarding. In addition, we contrast our compiler techniques with related hardware-only approaches. With our most aggressive compiler and hardware techniques, we improve performance under TLS by 6.2-28.5% for 6 of 14 applications, and by at least 2.7% for half of the other applications.
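To illustrate why the length of the critical forwarding path matters, the following is a minimal timing sketch. It is not taken from the paper: the function `simulate` and its parameters (`epoch_len`, `wait_at`, `signal_at`, `fwd_latency`, `spawn_gap`) are hypothetical names, and the model assumes one processor per speculative thread (epoch), a single forwarded scalar, and no misspeculation. Each epoch waits for the scalar from its predecessor at offset `wait_at` and signals the new value at offset `signal_at`; scheduling the instructions that produce the value earlier corresponds to a smaller `signal_at`.

```python
def simulate(num_epochs, epoch_len, wait_at, signal_at, fwd_latency, spawn_gap=1):
    """Finish time of the last epoch in a toy TLS timing model (an assumption,
    not the paper's evaluation methodology).

    Every epoch runs epoch_len cycles of work. It waits for the forwarded
    scalar at offset wait_at and signals the next value at offset signal_at.
    """
    assert 0 <= wait_at <= signal_at <= epoch_len
    prev_signal = None  # time at which the previous epoch signalled its value
    finish = 0
    for i in range(num_epochs):
        start = i * spawn_gap          # epochs are spawned one after another
        reach_wait = start + wait_at   # time at which this epoch needs the value
        if prev_signal is None:
            resume = reach_wait        # first epoch has no predecessor to wait for
        else:
            # Stall until the predecessor's value has arrived.
            resume = max(reach_wait, prev_signal + fwd_latency)
        # Work between the wait and the signal lies on the critical forwarding path.
        prev_signal = resume + (signal_at - wait_at)
        finish = max(finish, resume + (epoch_len - wait_at))
    return finish


if __name__ == "__main__":
    late = simulate(4, epoch_len=100, wait_at=0, signal_at=100, fwd_latency=10)
    early = simulate(4, epoch_len=100, wait_at=0, signal_at=10, fwd_latency=10)
    print(late, early)
```

In this model, signalling at the end of a 100-cycle epoch serializes the four epochs (430 cycles), whereas signalling at cycle 10 lets them overlap (160 cycles). The real benefit depends on how much of the producing computation the compiler can actually move ahead of the signal.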

Published In

ASPLOS X: Proceedings of the 10th international conference on Architectural support for programming languages and operating systems

October 2002

318 pages

ISBN:1581135742

DOI:10.1145/605397

Conference Chair: Kourosh Gharachorloo (Compaq Western Research Lab)

Program Chair: David A. Wood

ACM SIGARCH Computer Architecture News Volume 30, Issue 5
Special Issue: Proceedings of the 10th annual conference on Architectural Support for Programming Languages and Operating Systems
December 2002
296 pages
ISSN:0163-5964
DOI:10.1145/635506
ACM SIGOPS Operating Systems Review Volume 36, Issue 5
December 2002
296 pages
ISSN:0163-5980
DOI:10.1145/635508
ACM SIGPLAN Notices Volume 37, Issue 10
October 2002
296 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/605432

Copyright © 2002 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

ASPLOS02

ASPLOS02: Tenth International Conference on Architectural Support for Programming Languages and Operating Systems

October 5 - 9, 2002

San Jose, California

Acceptance Rates

ASPLOS X Paper Acceptance Rate 24 of 175 submissions, 14%;

Overall Acceptance Rate 535 of 2,713 submissions, 20%
