DOI: 10.1145/1183401.1183411
Article

Wide and efficient trace prediction using the local trace predictor

Published: 28 June 2006

Abstract

High prediction bandwidth enables both performance improvements and power-reduction techniques. This paper explores a mechanism that increases prediction width (instructions per prediction) by predicting instruction traces. Our analysis shows that predicting traces that include multiple branches is not significantly less accurate than predicting single branches. We propose a novel Local Trace Predictor organization. Relative to a Basic Block Predictor, it increases prediction width without reducing the ratio of prediction accuracy to memory resources.

Compared to the previously proposed Next-Trace Predictor, the Local Trace Predictor reduces memory requirements by encoding trace predictions and by limiting the number of traces that start at the same instruction to 2 or 4. This limit reduces prediction width only slightly and does not affect prediction accuracy. Overall, the Local Trace Predictor outperforms the Next-Trace Predictor for predictor sizes larger than 12 KB.
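The memory saving from encoding trace predictions and capping the number of traces per start instruction can be illustrated with a small sketch. It assumes, for illustration only, that a trace is identified by its start address plus the taken/not-taken outcomes of its conditional branches, and that each start address owns a list of at most `ways` (2 or 4) traces. A stored prediction is then just an index into that list, costing ⌈log2(ways)⌉ bits instead of a full trace identifier. The names `Trace`, `TraceIndexTable`, `encode`, and `decode` are hypothetical. This is not the paper's actual table organization, indexing, or replacement policy, none of which the abstract describes.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Trace:
    """Hypothetical trace ID: start address plus conditional-branch outcomes."""
    start: int
    outcomes: tuple[bool, ...]  # True = taken, in program order


class TraceIndexTable:
    """Hypothetical per-start-address table holding at most `ways` traces.

    A prediction is stored as an index into the list for its start address,
    so it needs only ceil(log2(ways)) bits rather than a full trace ID.
    """

    def __init__(self, ways: int = 4) -> None:
        if ways < 2:
            raise ValueError("ways must be at least 2")
        self.ways = ways
        self.bits_per_prediction = math.ceil(math.log2(ways))
        self._traces: dict[int, list[Trace]] = {}

    def encode(self, trace: Trace) -> int | None:
        """Return the compact index of `trace`, allocating a slot if one is free.

        Returns None when `ways` distinct traces already start at this address.
        """
        slots = self._traces.setdefault(trace.start, [])
        if trace in slots:
            return slots.index(trace)
        if len(slots) < self.ways:
            slots.append(trace)
            return len(slots) - 1
        return None  # placeholder: no replacement or fallback policy is modeled

    def decode(self, start: int, index: int) -> Trace:
        """Expand a compact index back into the full trace it stands for."""
        return self._traces[start][index]


table = TraceIndexTable(ways=2)
a = Trace(0x400100, (True, False))
b = Trace(0x400100, (False,))
c = Trace(0x400100, (True, True))
print(table.encode(a), table.encode(b), table.encode(c))  # 0 1 None
print(table.bits_per_prediction)  # 1
```

With `ways=2`, a third distinct trace starting at the same address cannot be encoded, and the sketch simply returns `None`. That stands in for whatever fallback the real design uses; the abstract only states that the limit reduces prediction width slightly.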

    Published In

    ICS '06: Proceedings of the 20th annual international conference on Supercomputing
    June 2006
    385 pages
    ISBN:1595932828
    DOI:10.1145/1183401

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. branch prediction
    2. high bandwidth fetch mechanism

    Conference

    ICS06: International Conference on Supercomputing 2006
    June 28 - July 1, 2006
    Cairns, Queensland, Australia

    Acceptance Rates

    ICS '06 Paper Acceptance Rate 37 of 141 submissions, 26%;
    Overall Acceptance Rate 629 of 2,180 submissions, 29%

    Bibliometrics

    • Total Citations: 0
    • Total Downloads: 252
    • Downloads (Last 12 months): 2
    • Downloads (Last 6 weeks): 1

    Reflects downloads up to 23 Feb 2025
