research-article
Open access

Mixed speculative multithreaded execution models

Published: 05 October 2012
  • Abstract

    The current trend toward multicore architectures has placed great pressure on programmers and compilers to generate thread-parallel programs. Improved execution performance can no longer be obtained via traditional single-thread instruction level parallelism (ILP), but, instead, via multithreaded execution. One notable technique that facilitates the extraction of parallel threads from sequential applications is thread-level speculation (TLS). This technique allows programmers/compilers to generate threads without checking for inter-thread data and control dependences, which are then transparently enforced by the hardware. Most prior work on TLS has concentrated on thread selection and mechanisms to efficiently support the main TLS operations, such as squashes, data versioning, and commits.
    This article seeks to enhance TLS functionality by combining it with other speculative multithreaded execution models. The main idea is that TLS already requires extensive hardware support, which, when slightly augmented, can accommodate other speculative multithreaded techniques. Because bottlenecks differ across applications, and even across phases of the same program, a more versatile system should be able to execute a given program more efficiently.
    To this end, we first show that mixed execution models that combine TLS with Helper Threads (HT), RunAhead execution (RA), and MultiPath execution (MP) perform better than any of the models alone. Based on a simple model that we propose, we show that the benefits come from being able to extract additional ILP without harming the TLP extracted by TLS. We then show that by combining all of these speculative multithreaded models into a single unified execution model, ILP can be further enhanced with only minimal additional hardware cost.
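The TLS operations named in the abstract (data versioning, squashes, and commits) can be illustrated with a small software model. The sketch below is only a toy under stated assumptions, not the hardware mechanism evaluated in the article: `SpecTask`, `run_tls`, and the three example task bodies are hypothetical names, memory is a plain dictionary, and every task initially speculates against the same pre-commit memory state. Tasks commit in sequential program order, and a later task is squashed and re-executed whenever an earlier task commits a write to an address the later task has already read.

```python
"""Toy software model of thread-level speculation (TLS); all names are illustrative."""


class SpecTask:
    """One speculative thread with a private write buffer and a read set."""

    def __init__(self, body):
        self.body = body        # callable(load, store) holding the task's code
        self.write_buf = {}     # speculative versions, invisible until commit
        self.read_set = set()   # addresses read from committed memory

    def execute(self, memory):
        # (Re)start from scratch: a squash discards all speculative state.
        self.write_buf, self.read_set = {}, set()

        def load(addr):
            if addr in self.write_buf:   # forward the task's own earlier store
                return self.write_buf[addr]
            self.read_set.add(addr)      # record the exposed read
            return memory[addr]

        def store(addr, value):
            self.write_buf[addr] = value  # buffered, not yet visible to others

        self.body(load, store)

    def commit(self, memory):
        # Make the buffered writes architecturally visible.
        memory.update(self.write_buf)
        return set(self.write_buf)


def run_tls(initial_memory, bodies):
    """Speculate on all bodies, commit in program order, squash on violation.

    Returns the final memory and the number of squashes that occurred.
    """
    memory = dict(initial_memory)
    tasks = [SpecTask(body) for body in bodies]
    for task in tasks:
        task.execute(memory)             # optimistic execution, no dependence checks
    squashes = 0
    for i, task in enumerate(tasks):
        written = task.commit(memory)    # oldest task is safe to commit
        for later in tasks[i + 1:]:
            if later.read_set & written:  # read-after-write dependence violated
                later.execute(memory)     # squash and restart on committed state
                squashes += 1
    return memory, squashes


# Example task bodies (hypothetical): the second depends on the first.
def inc_a(load, store):
    store("a", load("a") + 1)


def b_from_a(load, store):
    store("b", load("a") * 10)


def set_c(load, store):
    store("c", 5)
```

Calling `run_tls({"a": 1, "b": 0, "c": 0}, [inc_a, b_from_a, set_c])` gives the same final memory as running the three bodies sequentially, at the cost of one squash for the dependent task. In-order commit with read-set checking is the simplest policy that preserves sequential semantics; real TLS proposals typically detect violations eagerly and forward values between speculative threads, which this sketch deliberately omits.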

      Published In

      ACM Transactions on Architecture and Code Optimization, Volume 9, Issue 3
      September 2012
      313 pages
      ISSN: 1544-3566
      EISSN: 1544-3973
      DOI: 10.1145/2355585
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 05 October 2012
      Accepted: 01 March 2012
      Revised: 01 July 2011
      Received: 01 December 2010
      Published in TACO Volume 9, Issue 3

      Author Tags

      1. Speculative parallelization
      2. helper threads
      3. multipath execution
      4. runahead execution
      5. speculative multithreading

      Qualifiers

      • Research-article
      • Research
      • Refereed

      Bibliometrics & Citations

      Article Metrics

      • Downloads (Last 12 months): 59
      • Downloads (Last 6 weeks): 18
      Reflects downloads up to 10 Aug 2024

      Cited By

      • (2016) Exhaustive analysis of thread-level speculation. Proceedings of the 3rd International Workshop on Software Engineering for Parallel Systems, 25-34. DOI: 10.1145/3002125.3002127. Online publication date: 21-Oct-2016
      • (2016) A Survey on Thread-Level Speculation Techniques. ACM Computing Surveys 49:2, 1-39. DOI: 10.1145/2938369. Online publication date: 30-Jun-2016
      • (2016) Proceedings of the 3rd International Workshop on Software Engineering for Parallel Systems. Online publication date: 21-Oct-2016
      • (2014) Exploiting Thread-Level Parallelism Based on Balancing Load for Speculative Multithreading. Applied Mechanics and Materials 678, 8-11. DOI: 10.4028/www.scientific.net/AMM.678.8. Online publication date: Oct-2014
      • (2014) SCaLeM. Proceedings of the 20 Years of Beowulf Workshop on Honor of Thomas Sterling's 65th Birthday, 34-43. DOI: 10.1145/2737909.2737910. Online publication date: 13-Oct-2014
      • (2014) A Dynamically Adaptive Approach for Speculative Loop Execution in SMT Architectures. Proceedings of the 2014 IEEE Intl Conf on High Performance Computing and Communications, 2014 IEEE 6th Intl Symp on Cyberspace Safety and Security, 2014 IEEE 11th Intl Conf on Embedded Software and Syst (HPCC,CSS,ICESS), 1024-1031. DOI: 10.1109/HPCC.2014.171. Online publication date: 20-Aug-2014
