DOI: 10.1109/MICRO.2012.38

Control-Flow Decoupling

Published: 01 December 2012

Abstract

Mobile and PC/server class processor companies continue to roll out flagship core microarchitectures that are faster than their predecessors. Meanwhile, placing more cores on a chip, coupled with constant supply voltage, puts per-core energy consumption at a premium. Hence, the challenge is to find future microarchitecture optimizations that not only increase performance but also conserve energy. Eliminating branch mispredictions, which waste both time and energy, is valuable in this respect. We first explore the control-flow landscape by characterizing mispredictions in four benchmark suites. We find that a third of mispredictions-per-1K-instructions (MPKI) come from what we call separable branches: branches with large control-dependent regions (not suitable for if-conversion), whose backward slices either do not depend on their control-dependent instructions or have only a short dependence on them. We propose control-flow decoupling (CFD) to eradicate mispredictions of separable branches. The idea is to separate the loop containing the branch into two loops: the first contains only the branch's predicate computation, and the second contains the branch and its control-dependent instructions. The first loop communicates branch outcomes to the second loop through an architectural queue. Microarchitecturally, the queue resides in the fetch unit to drive timely, non-speculative fetching or skipping of successive dynamic instances of the control-dependent region. Either the programmer or the compiler can transform a loop for CFD, and we evaluate both. On a microarchitecture configured similarly to Intel's Sandy Bridge core, CFD increases performance by up to 43% and reduces energy consumption by up to 41%. Moreover, for some applications, CFD is a necessary catalyst for future complexity-effective large-window architectures to tolerate memory latency.
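The loop separation described in the abstract can be sketched in plain C. This is only an illustration under stated assumptions, not the paper's implementation: the function names, `heavy_work`, the threshold predicate, and the `BQ_SIZE` capacity are all hypothetical, and an ordinary array stands in for the architectural queue. In real CFD the queue is accessed through ISA extensions and consumed by the fetch unit, so this software emulation shows the shape of the transformation and that it preserves results, not the misprediction savings themselves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BQ_SIZE 128  /* assumed queue capacity; not taken from the paper */

/* Hypothetical stand-in for a large control-dependent region. */
static long heavy_work(long x) { return x * 3 + 1; }

/* Original loop: predicate computation and control-dependent region
 * are interleaved, so every iteration has a data-dependent branch. */
long sum_original(const long *a, size_t n, long threshold) {
    assert(a != NULL || n == 0);
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (a[i] > threshold)
            sum += heavy_work(a[i]);
    }
    return sum;
}

/* Decoupled form. The branch is separable here because its predicate
 * (a[i] > threshold) does not depend on the control-dependent region.
 * The loop is strip-mined so each chunk of outcomes fits in the queue. */
long sum_decoupled(const long *a, size_t n, long threshold) {
    assert(a != NULL || n == 0);
    bool bq[BQ_SIZE];  /* software stand-in for the branch-outcome queue */
    long sum = 0;
    for (size_t base = 0; base < n; base += BQ_SIZE) {
        size_t len = (n - base < BQ_SIZE) ? n - base : BQ_SIZE;

        /* First loop: only the predicate computation; pushes outcomes. */
        for (size_t i = 0; i < len; i++)
            bq[i] = a[base + i] > threshold;

        /* Second loop: pops outcomes in order and runs (or skips)
         * the control-dependent region accordingly. */
        for (size_t i = 0; i < len; i++) {
            if (bq[i])
                sum += heavy_work(a[base + i]);
        }
    }
    return sum;
}
```

Strip-mining by `BQ_SIZE` reflects that any queue has finite capacity, so the first loop can only run a bounded number of iterations ahead of the second.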

      Published In

      MICRO-45: Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
      December 2012
      487 pages
      ISBN: 9780769549248

      Publisher

      IEEE Computer Society, United States

      Author Tags

      1. ISA extensions
      2. branch prediction
      3. hardware/software codesign
      4. pre-execution
      5. predication
      6. superscalar processor

      Acceptance Rates

      Overall Acceptance Rate 484 of 2,242 submissions, 22%

      Cited By

      • (2021) Fast Key-Value Lookups with Node Tracker. ACM Transactions on Architecture and Code Optimization 18:3 (1-26). DOI: 10.1145/3452099. Online publication date: 8-Jun-2021
      • (2020) Opportunistic Early Pipeline Re-steering for Data-dependent Branches. Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques (305-316). DOI: 10.1145/3410463.3414628. Online publication date: 30-Sep-2020
      • (2018) Cimple: instruction and memory level parallelism. Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques (1-16). DOI: 10.1145/3243176.3243185. Online publication date: 1-Nov-2018
      • (2018) Architectural support for probabilistic branches. Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture (108-120). DOI: 10.1109/MICRO.2018.00018. Online publication date: 20-Oct-2018
      • (2015) Branch vanguard. ACM SIGARCH Computer Architecture News 43:3S (323-335). DOI: 10.1145/2872887.2750400. Online publication date: 13-Jun-2015
      • (2015) Branch vanguard. Proceedings of the 42nd Annual International Symposium on Computer Architecture (323-335). DOI: 10.1145/2749469.2750400. Online publication date: 13-Jun-2015
