Parallelism in the front-end

Authors:

Paramjit S. Oberoi,

Gurindar S. Sohi

ISCA '03: Proceedings of the 30th annual international symposium on Computer architecture

Pages 230 - 240

https://doi.org/10.1145/859618.859645

Published: 01 May 2003

Abstract

As processor back-ends get more aggressive, front-ends will have to scale as well. Although the back-ends of superscalar processors have continued to become more parallel, the front-ends remain sequential. This paper describes techniques for fetching and renaming multiple non-contiguous portions of the dynamic instruction stream in parallel using multiple fetch and rename units. It demonstrates that parallel front-ends are a viable alternative to high-performance sequential front-ends. Compared with an equivalently sized trace cache, our technique increases cache bandwidth utilization by 17%, front-end throughput by 20%, and performance by 5%. Parallelism also enhances latency tolerance: a parallel front-end loses only 6% performance as the cache size is decreased from 128 KB to 8 KB, compared with a 50-65% performance loss for sequential fetch mechanisms.
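The latency-tolerance claim can be made concrete with a toy scheduling model. The sketch below is not the paper's mechanism or its simulator: the function names, the fragment latencies, and the greedy assignment of fragments to fetch units are all assumptions chosen for illustration, and the model ignores renaming and the prediction of where each fragment starts. It only shows why overlapping the fetch of several fragments hides cache-miss latency that a single sequential fetch unit must absorb in full.

```python
import heapq


def sequential_cycles(latencies):
    """Cycles needed when one fetch unit handles fragments one after another."""
    return sum(latencies)


def parallel_cycles(latencies, num_units):
    """Cycles needed when several fetch units work on different fragments.

    Hypothetical model: fragments are handed out in program order, each to
    the fetch unit that becomes free first. The result is the cycle at which
    the last fragment has finished fetching.
    """
    if num_units < 1:
        raise ValueError("num_units must be at least 1")
    free_at = [0] * num_units  # min-heap: cycle at which each unit is free
    heapq.heapify(free_at)
    finish = 0
    for latency in latencies:
        start = heapq.heappop(free_at)   # earliest available fetch unit
        done = start + latency
        heapq.heappush(free_at, done)
        finish = max(finish, done)
    return finish


# Assumed per-fragment fetch latencies: 1 cycle on a hit, 10 on a miss.
fragments = [1, 10, 1, 1, 10, 1, 1, 1]
print(sequential_cycles(fragments))   # 26
print(parallel_cycles(fragments, 2))  # 13
```

In this made-up trace, the second fetch unit keeps delivering short fragments while the first waits on a miss, so the two misses cost far less than their combined latency.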

Published In

ISCA '03: Proceedings of the 30th annual international symposium on Computer architecture

June 2003

432 pages

ISBN:0769519458

DOI:10.1145/859618

Conference Chair: Allan Gottlieb, New York University & NEC Laboratories America

Program Chair: Kai Li, Princeton University

ACM SIGARCH Computer Architecture News Volume 31, Issue 2
ISCA 2003
May 2003
422 pages
ISSN:0163-5964
DOI:10.1145/871656

Copyright © 2003 Authors.

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

ISCA03

Sponsor:

SIGARCH

ISCA03: International Symposium on Computer Architecture

June 9 - 11, 2003

San Diego, California

Acceptance Rates

ISCA '03 Paper Acceptance Rate 36 of 184 submissions, 20%;

Overall Acceptance Rate 543 of 3,203 submissions, 17%

Cited By

Sharma N, Pandey K (2010). Graceful degradation in performance of wavescalar. 2010 International Conference on Computer and Communication Technology (ICCCT), pp. 693-697. DOI: 10.1109/ICCCT.2010.5640456. Online publication date: Sep-2010.
https://doi.org/10.1109/ICCCT.2010.5640456

Hilton A, Roth A (2007). Ginger. ACM SIGARCH Computer Architecture News 35:2, pp. 436-447. DOI: 10.1145/1273440.1250716. Online publication date: 9-Jun-2007.
https://dl.acm.org/doi/10.1145/1273440.1250716

Hilton A, Roth A, Tullsen D, Calder B (2007). Ginger. Proceedings of the 34th annual international symposium on Computer architecture, pp. 436-447. DOI: 10.1145/1250662.1250716. Online publication date: 9-Jun-2007.
https://dl.acm.org/doi/10.1145/1250662.1250716

Zhang A, Helal S (2007). SuperCache. Proceedings of the International Conference on Information Technology, pp. 908-914. DOI: 10.1109/ITNG.2007.189. Online publication date: 2-Apr-2007.
https://dl.acm.org/doi/10.1109/ITNG.2007.189

Moure J, Benítez D, Rexachs D, Luque E, Egan G, Muraoka Y (2006). Wide and efficient trace prediction using the local trace predictor. Proceedings of the 20th annual international conference on Supercomputing, pp. 55-65. DOI: 10.1145/1183401.1183411. Online publication date: 28-Jun-2006.
https://dl.acm.org/doi/10.1145/1183401.1183411

Sangireddy R (2006). Reducing Rename Logic Complexity for High-Speed and Low-Power Front-End Architectures. IEEE Transactions on Computers 55:6, pp. 672-685. DOI: 10.1109/TC.2006.88. Online publication date: 1-Jun-2006.
https://dl.acm.org/doi/10.1109/TC.2006.88

Sangireddy R (2006). Fast and low-power processor front-end with reduced rename logic circuit complexity. 2006 IEEE International Symposium on Circuits and Systems, (4). DOI: 10.1109/ISCAS.2006.1692520. Online publication date: 2006.
https://doi.org/10.1109/ISCAS.2006.1692520

Swanson S, Michelson K, Schwerin A, Oskin M (2003). WaveScalar. Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture. DOI: 10.5555/956417.956546. Online publication date: 3-Dec-2003.
https://dl.acm.org/doi/10.5555/956417.956546

Adve V, Lattner C, Brukman M, Shukla A, Gaeke B (2003). LLVA. Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture. DOI: 10.5555/956417.956545. Online publication date: 3-Dec-2003.
https://dl.acm.org/doi/10.5555/956417.956545

Swanson S, Michelson K, Schwerin A, Oskin M (2003). WaveScalar. 22nd Digital Avionics Systems Conference. Proceedings (Cat. No.03CH37449), pp. 291-302. DOI: 10.1109/MICRO.2003.1253203. Online publication date: 2003.
https://doi.org/10.1109/MICRO.2003.1253203
