DOI: 10.1145/605397.605416

Compiler optimization of scalar value communication between speculative threads

Published: 01 October 2002

Abstract

While there have been many recent proposals for hardware that supports Thread-Level Speculation (TLS), there has been relatively little work on compiler optimizations to fully exploit this potential for parallelizing programs optimistically. In this paper, we focus on one important limitation of program performance under TLS: stalls due to forwarding scalar values between threads that would otherwise cause frequent data dependences. We present and evaluate dataflow algorithms for three increasingly aggressive instruction scheduling techniques that reduce the critical forwarding path introduced by the synchronization associated with this data forwarding. In addition, we contrast our compiler techniques with related hardware-only approaches. With our most aggressive compiler and hardware techniques, we improve performance under TLS by 6.2% to 28.5% for 6 of 14 applications, and by at least 2.7% for half of the other applications.
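
To make the "critical forwarding path" concrete, the following is a minimal sketch of a toy timing model, not the paper's dataflow algorithms. It assumes every epoch (speculative thread) starts at time 0 on its own processor, each instruction takes one cycle, and each epoch consumes one forwarded scalar at `wait_at` and produces the next value at `signal_at`. The function name `tls_finish_time` and all of its parameters are hypothetical and chosen only for illustration.

```python
def tls_finish_time(num_epochs, epoch_len, wait_at, signal_at, forward_latency=1):
    """Finish time of the last epoch under a toy TLS timing model.

    Assumptions (illustrative, not from the paper): all epochs start at
    time 0, one instruction per cycle, one forwarded scalar per epoch.
    """
    if not 0 <= wait_at <= signal_at <= epoch_len:
        raise ValueError("need 0 <= wait_at <= signal_at <= epoch_len")
    prev_signal = None  # time at which the previous epoch sent its value
    finish = 0
    for _ in range(num_epochs):
        if prev_signal is None:
            # The first epoch has no predecessor, so its wait never stalls.
            ready = wait_at
        else:
            # Stall until the forwarded value arrives from the previous epoch.
            ready = max(wait_at, prev_signal + forward_latency)
        # Instructions between wait and signal sit on the critical forwarding path.
        prev_signal = ready + (signal_at - wait_at)
        # The rest of the epoch runs after the signal has been sent.
        finish = prev_signal + (epoch_len - signal_at)
    return finish


# Signal placed near the end of each epoch: the epochs nearly serialize.
late = tls_finish_time(num_epochs=4, epoch_len=10, wait_at=1, signal_at=9)
# Signal scheduled right after the wait: the epochs overlap almost fully.
early = tls_finish_time(num_epochs=4, epoch_len=10, wait_at=1, signal_at=2)
print(late, early)
```

In this model the only thing that changes between the two calls is how many instructions sit between the wait and the signal, which is the quantity that instruction scheduling of the kind described in the abstract aims to shrink.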

    Published In

    ASPLOS X: Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
    October 2002
    318 pages
    ISBN: 1581135742
    DOI: 10.1145/605397
    • ACM SIGARCH Computer Architecture News, Volume 30, Issue 5
      Special Issue: Proceedings of the 10th annual conference on Architectural Support for Programming Languages and Operating Systems
      December 2002
      296 pages
      ISSN: 0163-5964
      DOI: 10.1145/635506
    • ACM SIGOPS Operating Systems Review, Volume 36, Issue 5
      December 2002
      296 pages
      ISSN: 0163-5980
      DOI: 10.1145/635508
    • ACM SIGPLAN Notices, Volume 37, Issue 10
      October 2002
      296 pages
      ISSN: 0362-1340
      EISSN: 1558-1160
      DOI: 10.1145/605432
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • Article

    Conference

    ASPLOS02

    Acceptance Rates

    ASPLOS X Paper Acceptance Rate 24 of 175 submissions, 14%;
    Overall Acceptance Rate 535 of 2,713 submissions, 20%

    Cited By

    • (2020) T4: Compiling Sequential Code for Effective Speculative Parallelization in Hardware. 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA), pp. 159-172. DOI: 10.1109/ISCA45697.2020.00024. Online publication date: May-2020
    • (2018) Performance Improvement Techniques in Tightly Coupled Multicore Architectures for Single-Thread Applications. Journal of Information Processing 26, pp. 445-460. DOI: 10.2197/ipsjjip.26.445. Online publication date: 2018
    • (2016) Performance Estimation of Task Graphs Based on Path Profiling. International Journal of Parallel Programming 44:4, pp. 735-771. DOI: 10.1007/s10766-015-0372-7. Online publication date: 1-Aug-2016
    • (2014) HELIX-RC. Proceeding of the 41st annual international symposium on Computer architecuture, pp. 217-228. DOI: 10.5555/2665671.2665705. Online publication date: 14-Jun-2014
    • (2014) HELIX-RC. ACM SIGARCH Computer Architecture News 42:3, pp. 217-228. DOI: 10.1145/2678373.2665705. Online publication date: 14-Jun-2014
    • (2014) HELIX-RC: An architecture-compiler co-design for automatic parallelization of irregular programs. 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA), pp. 217-228. DOI: 10.1109/ISCA.2014.6853215. Online publication date: Jun-2014
    • (2014) A Dynamically Adaptive Approach for Speculative Loop Execution in SMT Architectures. Proceedings of the 2014 IEEE Intl Conf on High Performance Computing and Communications, 2014 IEEE 6th Intl Symp on Cyberspace Safety and Security, 2014 IEEE 11th Intl Conf on Embedded Software and Syst (HPCC,CSS,ICESS), pp. 1024-1031. DOI: 10.1109/HPCC.2014.171. Online publication date: 20-Aug-2014
    • (2014) Dynamic Core Allocation for Energy-Efficient Thread-Level Speculation. Proceedings of the 2014 IEEE 17th International Conference on Computational Science and Engineering, pp. 682-689. DOI: 10.1109/CSE.2014.145. Online publication date: 19-Dec-2014
    • (2013) The design and implementation of heterogeneous multicore systems for energy-efficient speculative thread execution. ACM Transactions on Architecture and Code Optimization 10:4, pp. 1-29. DOI: 10.1145/2541228.2541233. Online publication date: 1-Dec-2013
    • (2012) Disjoint out-of-order execution processor. ACM Transactions on Architecture and Code Optimization 9:3, pp. 1-32. DOI: 10.1145/2355585.2355592. Online publication date: 5-Oct-2012
