DOI: 10.1145/2155620.2155654
research-article

Complementing user-level coarse-grain parallelism with implicit speculative parallelism

Published: 03 December 2011

Abstract

Multi-core and many-core systems are the norm in contemporary processor technology and are expected to remain so for the foreseeable future. Programs using parallel programming primitives like PThreads or OpenMP often exploit coarse-grain parallelism, because it offers a good trade-off between programming effort and performance gain. Some parallel applications show limited or no scaling beyond a certain number of cores. Given the abundance of cores expected in future many-cores, several cores would remain idle in such cases while execution performance stagnates. This paper proposes using cores that do not contribute to performance improvement for running implicit fine-grain speculative threads. In particular, we present a many-core architecture and protocol that allow applications with coarse-grain explicit parallelism to further exploit implicit speculative parallelism within each thread. Implicit speculative parallelism frees the programmer from the additional effort of explicitly partitioning the work into finer and properly synchronized tasks. Our results show that, for a many-core comprising 128 cores supporting implicit speculative parallelism in clusters of 2 or 4 cores, performance improves on top of the highest scalability point by 41% on average for the 4-core cluster and by 27% on average for the 2-core cluster. These performance improvements come with an energy consumption that is close to, and sometimes better than, the baseline. This approach often leads to better performance and energy efficiency compared to existing alternatives such as Core Fusion and Frequency Boosting. We also investigate the trade-offs between explicit and implicit threads as input dataset sizes vary. Finally, we present a dynamic mechanism to choose the number of explicit and implicit threads, which performs within 6% of the static oracle selection of threads.
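
The choice between explicit and implicit threads described above can be pictured as a search over configurations that fit a fixed core budget. The sketch below is only an illustration and is not the paper's mechanism: the function names, the power-of-two thread counts, the exhaustive search, and the toy speedup model are all assumptions. Only the 128-core budget and the cluster sizes of 2 and 4 come from the abstract, and the 1.27 and 1.41 factors reuse the reported average gains purely as placeholder numbers.

```python
from typing import Callable, Iterable, List, Tuple

# (explicit threads, cores per implicit-speculation cluster)
Config = Tuple[int, int]


def enumerate_configs(total_cores: int = 128,
                      cluster_sizes: Iterable[int] = (1, 2, 4)) -> List[Config]:
    """List (explicit_threads, cluster_size) pairs that fit the core budget.

    Assumption: explicit thread counts are powers of two. A cluster size
    of 1 means the explicit thread runs without implicit speculative threads.
    """
    configs: List[Config] = []
    for cluster in cluster_sizes:
        explicit = 1
        while explicit * cluster <= total_cores:
            configs.append((explicit, cluster))
            explicit *= 2
    return configs


def static_oracle(configs: List[Config],
                  measure: Callable[[Config], float]) -> Config:
    """Return the configuration with the highest measured speedup."""
    return max(configs, key=measure)


def toy_speedup(config: Config) -> float:
    """Hypothetical model: explicit scaling saturates at 32 threads."""
    explicit, cluster = config
    explicit_gain = min(explicit, 32)
    # Placeholder factors borrowed from the abstract's average gains.
    implicit_gain = {1: 1.0, 2: 1.27, 4: 1.41}[cluster]
    return explicit_gain * implicit_gain


if __name__ == "__main__":
    best = static_oracle(enumerate_configs(), toy_speedup)
    print("best (explicit threads, cluster size):", best)
```

In this toy model the application stops scaling at 32 explicit threads, so the remaining cores are best spent on 4-core implicit clusters. A real selection would replace `toy_speedup` with measured execution times.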



      Published In

      MICRO-44: Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
      December 2011
      519 pages
      ISBN: 9781450310536
      DOI: 10.1145/2155620
      Conference Chair: Carlo Galuzzi
      General Chair: Luigi Carro
      Program Chairs: Andreas Moshovos, Milos Prvulovic
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. many-core architecture
      2. thread-level parallelism
      3. thread-level speculation

      Qualifiers

      • Research-article

      Conference

      MICRO-44
      Acceptance Rates

      Overall Acceptance Rate 484 of 2,242 submissions, 22%

      Cited By

      • (2023) Sustainable Data Dependency Resolution Architectural Framework to Achieve Energy Efficiency Using Speculative Parallelization. 2023 3rd International Conference on Innovative Sustainable Computational Technologies (CISCT), pp. 1-6. DOI: 10.1109/CISCT57197.2023.10351343. Online publication date: 8-Sep-2023
      • (2022) An efficient hardware supported and parallelization architecture for intelligent systems to overcome speculative overheads. International Journal of Intelligent Systems 37:12, pp. 11764-11790. DOI: 10.1002/int.23062. Online publication date: 8-Sep-2022
      • (2016) A Survey on Thread-Level Speculation Techniques. ACM Computing Surveys 49:2, pp. 1-39. DOI: 10.1145/2938369. Online publication date: 30-Jun-2016
      • (2015) Coherence domain restriction on large scale systems. Proceedings of the 48th International Symposium on Microarchitecture, pp. 686-698. DOI: 10.1145/2830772.2830832. Online publication date: 5-Dec-2015
      • (2015) Position-aware thread-level speculative parallelization for large-scale chip-multiprocessor. Proceedings of the 12th ACM International Conference on Computing Frontiers, pp. 1-8. DOI: 10.1145/2742854.2742866. Online publication date: 6-May-2015
      • (2014) Improving Speculation Accuracy with Inter-thread Fetching Value Prediction. Algorithms and Architectures for Parallel Processing, pp. 245-258. DOI: 10.1007/978-3-319-11194-0_19. Online publication date: 2014
      • (2013) HEUSPEC. Proceedings of the 2013 42nd International Conference on Parallel Processing, pp. 621-630. DOI: 10.1109/ICPP.2013.76. Online publication date: 1-Oct-2013
      • (2012) Mixed speculative multithreaded execution models. ACM Transactions on Architecture and Code Optimization 9:3, pp. 1-26. DOI: 10.1145/2355585.2355591. Online publication date: 5-Oct-2012
