research-article

Accurate branch prediction for short threads

Published: 01 March 2008

Abstract

Multi-core processors, with low communication costs and high availability of execution cores, will increase the use of execution and compilation models that use short threads to expose parallelism. Current branch predictors seek to incorporate large amounts of control flow history to maximize accuracy. However, when that history is absent the predictor fails to work as intended. Thus, modern predictors are almost useless for threads below a certain length.
Using a Speculative Multithreaded (SpMT) architecture as an example of a system which generates shorter threads, this work examines techniques to improve branch prediction accuracy when a new thread begins to execute on a different core. This paper proposes a minor change to the branch predictor that gives virtually the same performance on short threads as an idealized predictor that incorporates unknowable pre-history of a spawned speculative thread. At the same time, strong performance on long threads is preserved. The proposed technique sets the global history register of the spawned thread to the initial value of the program counter. This novel and simple design reduces branch mispredicts by 29% and provides as much as a 13% IPC improvement on selected SPEC2000 benchmarks.
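The history-seeding idea in the abstract can be illustrated with a minimal sketch. The abstract only states that the spawned thread's global history register is set to the initial program counter; everything else here is an assumption made for illustration: the gshare-style indexing, the 12-bit history length, the 2-bit saturating counters, the word-aligned PC shift, and all names (`GsharePredictor`, `start_thread`, and so on) are hypothetical and not taken from the paper.

```python
HISTORY_BITS = 12                 # assumed history length, for illustration only
TABLE_SIZE = 1 << HISTORY_BITS
MASK = TABLE_SIZE - 1


class GsharePredictor:
    """Toy gshare-style predictor (an assumed baseline, not the paper's design)."""

    def __init__(self):
        # 2-bit saturating counters, initialised to "weakly not taken".
        self.counters = [1] * TABLE_SIZE
        self.ghr = 0              # global history register

    def _index(self, pc):
        # Assumes 4-byte aligned instructions, so the low 2 PC bits are dropped.
        return ((pc >> 2) ^ self.ghr) & MASK

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        # Shift the branch outcome into the history.
        self.ghr = ((self.ghr << 1) | int(taken)) & MASK

    def start_thread(self, start_pc):
        # Seed the history with the thread's starting PC instead of
        # inheriting whatever stale history the core happens to hold.
        self.ghr = (start_pc >> 2) & MASK


if __name__ == "__main__":
    p = GsharePredictor()
    p.start_thread(0x1000)
    p.update(0x1010, True)        # first branch of the thread is taken
    p.start_thread(0x1000)        # the same thread is spawned again
    print(p.predict(0x1010))      # the trained entry is found again
```

In this sketch, seeding makes the early table indices of a thread depend only on its start address, so a thread spawned repeatedly at the same point reuses the entries it trained earlier, even though its true pre-spawn history is unknown.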

Published In

ACM SIGOPS Operating Systems Review  Volume 42, Issue 2
ASPLOS '08
March 2008
339 pages
ISSN:0163-5980
DOI:10.1145/1353535
  • ASPLOS XIII: Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
    March 2008
    352 pages
    ISBN:9781595939586
    DOI:10.1145/1346281
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 March 2008
Published in SIGOPS Volume 42, Issue 2

Author Tags

  1. branch prediction
  2. chip multiprocessors

Qualifiers

  • Research-article

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 35
  • Downloads (Last 6 weeks): 3
Reflects downloads up to 28 Dec 2024

Citations

Cited By

  • (2017) DualStack: A High Efficient Dynamic Page Scheduling Scheme in Hybrid Main Memory. 2017 International Conference on Networking, Architecture, and Storage (NAS), pp. 1-6. DOI: 10.1109/NAS.2017.8026855. Online publication date: Aug-2017
  • (2017) Branch Prediction Migration for Multi-Core Architectures. 2017 International Conference on Networking, Architecture, and Storage (NAS), pp. 1-2. DOI: 10.1109/NAS.2017.8026848. Online publication date: Aug-2017
  • (2017) Improving Branch Prediction for Thread Migration on Multi-core Architectures. Network and Parallel Computing, pp. 87-99. DOI: 10.1007/978-3-319-68210-5_8. Online publication date: 20-Oct-2017
  • (2021) Novel and Prevalent Techniques for Resolving Control Hazard. Data Engineering for Smart Systems, pp. 235-244. DOI: 10.1007/978-981-16-2641-8_22. Online publication date: 14-Nov-2021
  • (2018) A survey of techniques for dynamic branch prediction. Concurrency and Computation: Practice and Experience, 31:1. DOI: 10.1002/cpe.4666. Online publication date: 2-Sep-2018
  • (2015) An Energy-Efficient Branch Prediction with Grouped Global History. Proceedings of the 2015 44th International Conference on Parallel Processing (ICPP), pp. 140-149. DOI: 10.1109/ICPP.2015.23. Online publication date: 1-Sep-2015
  • (2013) Multithreading Architecture. Synthesis Lectures on Computer Architecture, 8:1, pp. 1-109. DOI: 10.2200/S00458ED1V01Y201212CAC021. Online publication date: 15-Jan-2013
  • (2012) Energy-efficient branch prediction with compiler-guided history stack. Proceedings of the Conference on Design, Automation and Test in Europe, pp. 449-454. DOI: 10.5555/2492708.2492823. Online publication date: 12-Mar-2012
  • (2012) Disjoint out-of-order execution processor. ACM Transactions on Architecture and Code Optimization, 9:3, pp. 1-32. DOI: 10.1145/2355585.2355592. Online publication date: 5-Oct-2012
  • (2012) Energy-efficient branch prediction with Compiler-guided History Stack. 2012 Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 449-454. DOI: 10.1109/DATE.2012.6176513. Online publication date: Mar-2012
