DOI: 10.1145/605397.605416

Compiler optimization of scalar value communication between speculative threads

Published: 01 October 2002

    Abstract

    While there have been many recent proposals for hardware that supports Thread-Level Speculation (TLS), there has been relatively little work on compiler optimizations to fully exploit this potential for parallelizing programs optimistically. In this paper, we focus on one important limitation of program performance under TLS: stalls caused by forwarding scalar values between threads, values that would otherwise cause frequent data dependences. We present and evaluate dataflow algorithms for three increasingly aggressive instruction scheduling techniques that reduce the critical forwarding path introduced by the synchronization associated with this data forwarding. In addition, we contrast our compiler techniques with related hardware-only approaches. With our most aggressive compiler and hardware techniques, we improve performance under TLS by 6.2-28.5% for 6 of 14 applications, and by at least 2.7% for half of the other applications.
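
    As a rough illustration of why the length of the critical forwarding path matters, the toy timing model below may help. It is not the paper's dataflow algorithm. It is a minimal sketch that assumes one processor per speculative thread (epoch), a fixed spawn delay, a fixed forwarding latency, and a single forwarded scalar per epoch. The function name `total_time` and all of its parameters are hypothetical and chosen only for this example.

```python
def total_time(num_epochs, epoch_len, wait_at, signal_at,
               spawn_delay=1, forward_latency=1):
    """Finish time of the last epoch in a toy TLS timing model.

    Assumptions (illustrative only, not taken from the paper):
      - every epoch runs on its own processor and starts spawn_delay
        cycles after its predecessor;
      - each epoch consumes the forwarded scalar at offset wait_at and
        produces the value for its successor at offset signal_at;
      - a forwarded value arrives forward_latency cycles after the signal.
    The distance signal_at - wait_at plays the role of the critical
    forwarding path within one epoch.
    """
    assert 0 <= wait_at <= signal_at <= epoch_len
    prev_signal_time = None  # when the predecessor signalled its value
    finish = 0
    for i in range(num_epochs):
        start = i * spawn_delay
        # The epoch reaches its wait point, then stalls until the
        # predecessor's value has arrived (the first epoch never stalls).
        wait_time = start + wait_at
        if prev_signal_time is not None:
            wait_time = max(wait_time, prev_signal_time + forward_latency)
        # Work between the wait and the signal sits on the critical path.
        prev_signal_time = wait_time + (signal_at - wait_at)
        # The remainder of the epoch runs after the wait completes.
        finish = wait_time + (epoch_len - wait_at)
    return finish


if __name__ == "__main__":
    # Wait at the start and signal at the end: epochs are serialized.
    print(total_time(4, 10, wait_at=0, signal_at=10))
    # Wait and signal scheduled close together: epochs overlap.
    print(total_time(4, 10, wait_at=4, signal_at=6))
```

    In this model, moving the wait later and the signal earlier shortens the stretch of each epoch that successors must wait on, so more of the work in consecutive epochs overlaps. That is the effect the scheduling techniques in the abstract aim for, although the paper's algorithms and its hardware model are considerably more detailed.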

      Published In

      ASPLOS X: Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
      October 2002
      318 pages
      ISBN: 1581135742
      DOI: 10.1145/605397
      • ACM SIGARCH Computer Architecture News, Volume 30, Issue 5
        Special Issue: Proceedings of the 10th annual conference on Architectural Support for Programming Languages and Operating Systems
        December 2002
        296 pages
        ISSN: 0163-5964
        DOI: 10.1145/635506
      • ACM SIGOPS Operating Systems Review, Volume 36, Issue 5
        December 2002
        296 pages
        ISSN: 0163-5980
        DOI: 10.1145/635508
      • ACM SIGPLAN Notices, Volume 37, Issue 10
        October 2002
        296 pages
        ISSN: 0362-1340
        EISSN: 1558-1160
        DOI: 10.1145/605432
      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • Article

      Conference

      ASPLOS02

      Acceptance Rates

      ASPLOS X Paper Acceptance Rate 24 of 175 submissions, 14%;
      Overall Acceptance Rate 535 of 2,713 submissions, 20%
