Wide and efficient trace prediction using the local trace predictor

Authors: Domingo Benítez, Dolores I. Rexachs, Emilio Luque

ICS '06: Proceedings of the 20th annual international conference on Supercomputing

Pages 55 - 65

https://doi.org/10.1145/1183401.1183411

Published: 28 June 2006

Abstract

High prediction bandwidth enables performance improvements and power reduction techniques. This paper explores a mechanism to increase prediction width (instructions per prediction) by predicting instruction traces. Our analysis shows that predicting traces that include multiple branches is not significantly less accurate than predicting single branches. A novel Local Trace Predictor organization is proposed. Relative to a Basic Block Predictor, it increases prediction width without reducing the ratio of prediction accuracy to memory resources. Compared to the previously proposed Next-Trace Predictor, the Local Trace Predictor reduces memory requirements by codifying trace predictions and by limiting the number of traces starting at the same instruction to 2 or 4. The limit lessens prediction width only slightly and does not affect prediction accuracy. The overall result is that the Local Trace Predictor outperforms the Next-Trace Predictor for predictor sizes larger than 12 KBytes.
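
The two memory-saving ideas named in the abstract (codified predictions and a cap of 2 or 4 traces per starting instruction) can be illustrated with a toy model. The sketch below is not the authors' design: the class name `ToyLocalTracePredictor`, the per-address history of recent codes, and the dictionary-based tables are all assumptions made for illustration, since the abstract does not describe the indexing or table organization. It only shows why a capped trace list lets each prediction be stored as a 1-bit or 2-bit code instead of a full trace identifier.

```python
from math import ceil, log2


def code_bits(max_traces_per_start: int) -> int:
    """Bits needed to select one of the traces sharing a start address."""
    return max(1, ceil(log2(max_traces_per_start)))


class ToyLocalTracePredictor:
    """Hypothetical toy model, not the organization proposed in the paper.

    For each start address it keeps at most `limit` traces and predicts a
    small code (an index into that list) rather than a full trace id.
    """

    def __init__(self, limit: int = 4, history_len: int = 2):
        self.limit = limit
        self.history_len = history_len  # assumed >= 1 in this sketch
        self.traces = {}   # start_pc -> list of trace ids (length <= limit)
        self.history = {}  # start_pc -> tuple of the most recent codes
        self.table = {}    # (start_pc, history) -> predicted code

    def predict(self, start_pc):
        """Return the predicted trace id, or None if nothing is known."""
        hist = self.history.get(start_pc, ())
        code = self.table.get((start_pc, hist), 0)
        known = self.traces.get(start_pc, [])
        return known[code] if code < len(known) else None

    def update(self, start_pc, actual_trace) -> bool:
        """Train on the trace that actually executed.

        Returns False when the trace cannot be tracked because the
        per-address limit is already reached (the toy model drops it).
        """
        known = self.traces.setdefault(start_pc, [])
        if actual_trace not in known:
            if len(known) >= self.limit:
                return False
            known.append(actual_trace)
        code = known.index(actual_trace)
        hist = self.history.get(start_pc, ())
        self.table[(start_pc, hist)] = code
        self.history[start_pc] = (hist + (code,))[-self.history_len:]
        return True


# Usage: two traces alternate from the same start address.
predictor = ToyLocalTracePredictor(limit=2, history_len=2)
for trace in ["A", "B", "A", "B"]:
    predictor.update(0x100, trace)
next_trace = predictor.predict(0x100)
```

In this toy model a limit of 2 needs `code_bits(2)` = 1 bit per stored prediction and a limit of 4 needs 2 bits. A third distinct trace from the same start address is rejected when the limit is 2, which mirrors the trade-off the abstract describes: some prediction width is given up in exchange for smaller entries.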

Index Terms

Wide and efficient trace prediction using the local trace predictor
1. Computer systems organization
  1. Architectures
    1. Serial architectures

Published In

ICS '06: Proceedings of the 20th annual international conference on Supercomputing

June 2006

385 pages

ISBN:1595932828

DOI:10.1145/1183401

General Chairs: Greg Egan (Monash University), Yoichi Muraoka (Waseda University)

Copyright © 2006 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

ICS06: International Conference on Supercomputing 2006

June 28 - July 1, 2006

Cairns, Queensland, Australia

Acceptance Rates

ICS '06 Paper Acceptance Rate: 37 of 141 submissions, 26%

Overall Acceptance Rate: 629 of 2,180 submissions, 29%
