DOI: 10.1145/3352460.3358315
Research article

Towards the adoption of Local Branch Predictors in Modern Out-of-Order Superscalar Processors

Published: 12 October 2019

Abstract

Branch prediction accuracy plays a dominant role in the performance of modern out-of-order (OOO) superscalar processors. While global history-based branch predictors are more popular, local history-based predictors offer an additional dimension for improving overall branch prediction accuracy. Integrating local predictors into modern cores, though, comes with non-trivial challenges in managing the local predictor's state: repairing this state on any branch misprediction is essential for the local predictor to operate effectively. Using a highly accurate, industry-standard simulator modeling a Skylake-like OOO core, and workloads spanning diverse categories including server, high-performance computing (HPC), and personal computing suites in addition to SPEC, we methodically highlight the issues that need to be tackled, why local predictor repair is non-trivial, and the performance opportunity that is lost when local predictor repair is not handled efficiently. We discuss the issues with prior techniques and quantify their limitations in current OOO cores. Further, we propose three practical, implementable, and efficient repair techniques with minimal storage requirements that provide significant performance gains for local predictors. Unlike prior repair techniques, which attain only 50% of the oracular gains, our realistic repair techniques retain about 80% of the oracular gains, resulting in significantly better application performance.
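To make the idea of local predictor state and its repair concrete, here is a minimal Python sketch of a generic two-level local-history predictor that updates its history speculatively and restores it on a misprediction. It is not one of the three repair techniques the abstract proposes, which are not described on this page. The class name `LocalPredictor`, the table sizes, the checkpoint format, and the `run` helper are all assumptions made for illustration.

```python
class LocalPredictor:
    """Toy two-level local-history predictor (illustrative sizes, not from the paper)."""

    def __init__(self, entries=16, hist_bits=4):
        self.entries = entries
        self.mask = (1 << hist_bits) - 1
        # Per-branch local history registers, updated speculatively at predict time.
        self.history = [0] * entries
        # 2-bit saturating counters indexed by local history; 1 = weakly not-taken.
        self.counters = [1] * (1 << hist_bits)

    def predict(self, pc):
        idx = pc % self.entries
        old_hist = self.history[idx]
        taken = self.counters[old_hist] >= 2
        # Speculative update: shift the *predicted* outcome into the history.
        self.history[idx] = ((old_hist << 1) | int(taken)) & self.mask
        # The checkpoint carries what is needed to train and repair later.
        return taken, (idx, old_hist)

    def resolve(self, checkpoint, predicted, actual):
        idx, old_hist = checkpoint
        # Train the counter that produced the prediction.
        c = self.counters[old_hist]
        self.counters[old_hist] = min(3, c + 1) if actual else max(0, c - 1)
        if predicted != actual:
            # Repair: rebuild the history from the checkpoint and the real outcome.
            self.history[idx] = ((old_hist << 1) | int(actual)) & self.mask


def run(predictor, pc, outcomes):
    """Predict and immediately resolve each outcome; return the predictions."""
    predictions = []
    for actual in outcomes:
        predicted, checkpoint = predictor.predict(pc)
        predictor.resolve(checkpoint, predicted, actual)
        predictions.append(predicted)
    return predictions
```

This sketch repairs only the entry of the mispredicting branch, and it resolves each branch before the next prediction. In an out-of-order core many younger branches can be in flight and may already have updated their own entries speculatively, which is one way to see why the abstract calls repair non-trivial.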

      Published In

      MICRO '52: Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture
      October 2019
      1104 pages
      ISBN: 9781450369381
      DOI: 10.1145/3352460
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. Branch Prediction
      2. Local predictors
      3. Performance
      4. Superscalar cores

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

      MICRO '52
      Acceptance Rates

      Overall Acceptance Rate 484 of 2,242 submissions, 22%

      Article Metrics

      • Downloads (last 12 months): 77
      • Downloads (last 6 weeks): 6

      Reflects downloads up to 12 Sep 2024.

      Cited By

      • (2023) BeKnight: Guarding Against Information Leakage in Speculatively Updated Branch Predictors. 2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD), pp. 01-09. DOI: 10.1109/ICCAD57390.2023.10323658. Online publication date: 28-Oct-2023
      • (2021) Leveraging Targeted Value Prediction to Unlock New Hardware Strength Reduction Potential. MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 792-803. DOI: 10.1145/3466752.3480050. Online publication date: 18-Oct-2021
      • (2021) NOREBA: a compiler-informed non-speculative out-of-order commit processor. Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 182-193. DOI: 10.1145/3445814.3446726. Online publication date: 19-Apr-2021
      • (2021) COBRA: A Framework for Evaluating Compositions of Hardware Branch Predictors. 2021 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 310-320. DOI: 10.1109/ISPASS51385.2021.00053. Online publication date: Mar-2021
      • (2020) CHiRP: Control-Flow History Reuse Prediction. 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 131-145. DOI: 10.1109/MICRO50266.2020.00023. Online publication date: Oct-2020
