DOI: 10.1145/859618.859645
Article

Parallelism in the front-end

Published: 01 May 2003

Abstract

As processor back-ends get more aggressive, front-ends will have to scale as well. Although the back-ends of superscalar processors have continued to become more parallel, the front-ends remain sequential. This paper describes techniques for fetching and renaming multiple non-contiguous portions of the dynamic instruction stream in parallel using multiple fetch and rename units. It demonstrates that parallel front-ends are a viable alternative to high-performance sequential front-ends. Compared with an equivalently-sized trace cache, our technique increases cache bandwidth utilization by 17%, front-end throughput by 20%, and performance by 5%. Parallelism also enhances latency tolerance: a parallel front-end loses only 6% performance as the cache size is decreased from 128 KB to 8 KB, compared with a 50-65% performance loss for sequential fetch mechanisms.


Published In

ISCA '03: Proceedings of the 30th annual international symposium on Computer architecture
June 2003
432 pages
ISBN: 0769519458
DOI: 10.1145/859618
  • Conference Chair: Allan Gottlieb
  • Program Chair: Kai Li
  • ACM SIGARCH Computer Architecture News, Volume 31, Issue 2
    ISCA 2003
    May 2003
    422 pages
    ISSN: 0163-5964
    DOI: 10.1145/871656

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

ISCA03: International Symposium on Computer Architecture
June 9-11, 2003
San Diego, California

Acceptance Rates

ISCA '03 Paper Acceptance Rate 36 of 184 submissions, 20%;
Overall Acceptance Rate 543 of 3,203 submissions, 17%

Cited By

  • (2010) Graceful degradation in performance of wavescalar. 2010 International Conference on Computer and Communication Technology (ICCCT) (693-697). DOI: 10.1109/ICCCT.2010.5640456. Online publication date: Sep-2010
  • (2007) Ginger. ACM SIGARCH Computer Architecture News 35:2 (436-447). DOI: 10.1145/1273440.1250716. Online publication date: 9-Jun-2007
  • (2007) Ginger. Proceedings of the 34th annual international symposium on Computer architecture (436-447). DOI: 10.1145/1250662.1250716. Online publication date: 9-Jun-2007
  • (2007) SuperCache. Proceedings of the International Conference on Information Technology (908-914). DOI: 10.1109/ITNG.2007.189. Online publication date: 2-Apr-2007
  • (2006) Wide and efficient trace prediction using the local trace predictor. Proceedings of the 20th annual international conference on Supercomputing (55-65). DOI: 10.1145/1183401.1183411. Online publication date: 28-Jun-2006
  • (2006) Reducing Rename Logic Complexity for High-Speed and Low-Power Front-End Architectures. IEEE Transactions on Computers 55:6 (672-685). DOI: 10.1109/TC.2006.88. Online publication date: 1-Jun-2006
  • (2006) Fast and low-power processor front-end with reduced rename logic circuit complexity. 2006 IEEE International Symposium on Circuits and Systems (4). DOI: 10.1109/ISCAS.2006.1692520. Online publication date: 2006
  • (2003) WaveScalar. Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture. DOI: 10.5555/956417.956546. Online publication date: 3-Dec-2003
  • (2003) LLVA. Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture. DOI: 10.5555/956417.956545. Online publication date: 3-Dec-2003
  • (2003) WaveScalar. 22nd Digital Avionics Systems Conference. Proceedings (Cat. No.03CH37449) (291-302). DOI: 10.1109/MICRO.2003.1253203. Online publication date: 2003
