Towards the adoption of Local Branch Predictors in Modern Out-of-Order Superscalar Processors

Authors:

Niranjan Soundararajan,

Ragavendra Natarajan,

Lihu Rappoport,

Sreenivas Subramoney

MICRO '52: Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture

Pages 519 - 530

https://doi.org/10.1145/3352460.3358315

Published: 12 October 2019

Abstract

Branch prediction accuracy plays a dominant role in the performance of modern Out-of-Order (OOO) superscalar processors. While global history-based branch predictors are more popular, local history-based predictors offer an additional dimension for improving overall branch prediction accuracy. Integrating local predictors into modern cores, though, comes with non-trivial challenges in managing the local predictor's state, and repairing this state on every branch misprediction is essential for the local predictor to operate effectively. Using a highly accurate, industry-standard simulator modeling a Skylake-like OOO core and workloads spanning diverse categories, including Server, High Performance Computing (HPC) and personal computing suites in addition to SPEC, we methodically highlight the issues that need to be tackled, why local predictor repair is non-trivial, and the performance opportunity that is lost when local predictor repair is not handled efficiently. We discuss the issues with prior techniques and quantify their limitations when they are used in current OOO cores. Further, we propose three practical, implementable and efficient repair techniques with minimal storage requirements that provide significant performance gains for local predictors. Unlike prior repair techniques, which can attain only 50% of the oracular gains, our realistic repair techniques retain about 80% of the oracular gains, resulting in significantly better application performance.
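
To make the repair problem in the abstract concrete, the following is a minimal sketch of a textbook two-level local-history predictor whose per-branch history is updated speculatively at prediction time. It is not one of the three repair techniques proposed in the paper. The class name LocalPredictor, the table sizes, and the checkpoint queue walked on a misprediction are all assumptions chosen for illustration.

```python
from collections import deque


class LocalPredictor:
    """Illustrative two-level local predictor (hypothetical, not the paper's design)."""

    def __init__(self, hist_bits=4, entries=64):
        self.mask = (1 << hist_bits) - 1
        self.entries = entries
        self.bht = [0] * entries            # per-branch local history registers
        self.pht = [2] * (1 << hist_bits)   # 2-bit counters, start weakly taken
        self.inflight = deque()             # (seq, bht index, history before update)
        self.next_seq = 0

    def predict(self, pc):
        """Predict a branch and speculatively shift the prediction into its history."""
        i = pc % self.entries
        old = self.bht[i]
        taken = self.pht[old] >= 2
        self.bht[i] = ((old << 1) | int(taken)) & self.mask
        seq = self.next_seq
        self.next_seq += 1
        self.inflight.append((seq, i, old))  # checkpoint needed for later repair
        return seq, taken

    def repair(self, seq, actual):
        """On a misprediction of branch `seq`, undo younger updates, then fix its own."""
        while self.inflight and self.inflight[-1][0] > seq:
            _, i, old = self.inflight.pop()  # squash wrong-path branches, youngest first
            self.bht[i] = old
        _, i, old = self.inflight[-1]        # the mispredicted branch itself
        self.bht[i] = ((old << 1) | int(actual)) & self.mask

    def retire(self, actual):
        """Retire the oldest in-flight branch and train its pattern counter."""
        _, _, old = self.inflight.popleft()
        c = self.pht[old]
        self.pht[old] = min(3, c + 1) if actual else max(0, c - 1)


# Three in-flight instances of one branch are predicted before the first resolves.
p = LocalPredictor()
first, _ = p.predict(0x10)
p.predict(0x10)
p.predict(0x10)
polluted = p.bht[0x10 % p.entries]   # history now holds three speculative bits
p.repair(first, actual=False)        # the first instance was actually not taken
repaired = p.bht[0x10 % p.entries]
```

In this sketch every in-flight branch carries a checkpoint of the history it overwrote, and a misprediction walks those checkpoints back one by one. That storage and walk-back cost per in-flight branch gives one intuition for why the abstract calls repair non-trivial in a deep OOO pipeline.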

Index Terms

Towards the adoption of Local Branch Predictors in Modern Out-of-Order Superscalar Processors
1. Computer systems organization
  1. Architectures
    1. Serial architectures
      1. Pipeline computing
      2. Superscalar architectures

Published In

MICRO '52: Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture

October 2019

1104 pages

ISBN:9781450369381

DOI:10.1145/3352460

Copyright © 2019 ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

MICRO '52

Sponsor:

SIGMICRO

MICRO '52: The 52nd Annual IEEE/ACM International Symposium on Microarchitecture

October 12 - 16, 2019

Columbus, OH, USA

Acceptance Rates

Overall Acceptance Rate 484 of 2,242 submissions, 22%
