Increasing the instruction fetch rate via block-structured instruction set architectures

Authors:

Yale N. Patt

MICRO 29: Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture

Pages 191 - 200

Published: 02 December 1996

Abstract

To exploit larger amounts of instruction level parallelism, processors are being built with wider issue widths and larger numbers of functional units. Instruction fetch rate must also be increased in order to effectively exploit the performance potential of such processors. Block-structured ISAs provide an effective means of increasing the instruction fetch rate. We define an optimization, called block enlargement, that can be applied to a block-structured ISA to increase the instruction fetch rate of a processor that implements that ISA. We have constructed a compiler that generates block-structured ISA code, and a simulator that models the execution of that code on a block-structured ISA processor. We show that for the SPECint95 benchmarks, the block-structured ISA processor executing enlarged atomic blocks outperforms a conventional ISA processor by 12% while using simpler microarchitectural mechanisms to support wide-issue and dynamic scheduling.
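
The abstract's argument can be illustrated with a minimal Python sketch of why merging atomic blocks raises the fetch rate. This is a toy model, not the paper's compiler or simulator. The `Block` type, the `enlarge` function, the `max_size` limit, the block sizes, and the assumption that the processor fetches exactly one atomic block per cycle are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """A hypothetical atomic block: all of its instructions are fetched together."""
    name: str
    size: int                 # number of instructions in the block
    successors: tuple = ()    # names of the blocks that may execute next


def enlarge(block, blocks, max_size):
    """Merge `block` with each of its successors, yielding one enlarged block
    per successor. Returns [block] unchanged if it has no successors or if any
    merged block would exceed `max_size` (an assumed hardware limit)."""
    succs = [blocks[name] for name in block.successors]
    if not succs or any(block.size + s.size > max_size for s in succs):
        return [block]
    return [
        Block(f"{block.name}+{s.name}", block.size + s.size, s.successors)
        for s in succs
    ]


def instructions_per_fetch(fetched_blocks):
    """Average block size, assuming one atomic block is fetched per cycle."""
    return sum(b.size for b in fetched_blocks) / len(fetched_blocks)


blocks = {
    "A": Block("A", 5, ("B", "C")),
    "B": Block("B", 4),
    "C": Block("C", 6),
}

# Dynamic trace A, B, A, C fetched as four separate blocks.
baseline = [blocks[n] for n in ("A", "B", "A", "C")]

# The same work fetched as two enlarged blocks, A+B and A+C.
a_b, a_c = enlarge(blocks["A"], blocks, max_size=16)
enlarged = [a_b, a_c]

print(instructions_per_fetch(baseline))  # 5.0
print(instructions_per_fetch(enlarged))  # 10.0
```

Under these assumptions the same 20 instructions are delivered in two fetches instead of four. The cost in this model is code duplication, since block A is copied into both enlarged blocks.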

Index Terms

Increasing the instruction fetch rate via block-structured instruction set architectures
1. Computer systems organization
  1. Architectures
    1. Parallel architectures
      1. Very long instruction word
    2. Serial architectures
      1. Complex instruction set computing
      2. Reduced instruction set computing
2. Hardware
  1. Electronic design automation
    1. Methodologies for EDA
  2. Integrated circuits
    1. Logic circuits
      1. Arithmetic and datapath circuits
      2. Design modules and hierarchy

Published In

MICRO 29: Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture

December 1996

359 pages

ISBN: 0818676418

Chairmen: Stephen Melvin (Zytek Communications Corp.), Steve Beaty (Hewlett-Packard Corp.)

Copyright © 1996 Institute of Electrical and Electronics Engineers, Inc. All rights reserved.

Sponsors

SIGMICRO: ACM Special Interest Group on Microarchitectural Research and Processing
IEEE-CS\TCMM: TC on Microprocessors & Microcomputers

Publisher

IEEE Computer Society

United States

Qualifiers

Article

Conference

MICRO96: 29th Annual International Symposium on Microarchitecture

December 2 - 4, 1996

Paris, France

Acceptance Rates

Overall Acceptance Rate 484 of 2,242 submissions, 22%

Bibliometrics & Citations

Article Metrics

Total Citations: 14
Total Downloads: 398
Downloads (Last 12 months): 52
Downloads (Last 6 weeks): 14

Reflects downloads up to 13 Sep 2024

Cited By

Lakshminarayana N, Hyesoon Kim (2015). Block-Precise Processors: Low-Power Processors with Reduced Operand Store Accesses and Result Broadcasts. IEEE Transactions on Computers, 64:11, pages 3102-3114. Online publication date: 1-Nov-2015.
https://dl.acm.org/doi/10.1109/TC.2015.2395436
Maher B, Smith A, Burger D, McKinley K (2006). Merging Head and Tail Duplication for Convergent Hyperblock Formation. Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture, pages 65-76. Online publication date: 9-Dec-2006.
https://dl.acm.org/doi/10.1109/MICRO.2006.34
Sankaralingam K, Nagarajan R, McDonald R, Desikan R, Drolia S, Govindan M, Gratz P, Gulati D, Hanson H, Kim C, Liu H, Ranganathan N, Sethumadhavan S, Sharif S, Shivakumar P, Keckler S, Burger D (2006). Distributed Microarchitectural Protocols in the TRIPS Prototype Processor. Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture, pages 480-491. Online publication date: 9-Dec-2006.
https://dl.acm.org/doi/10.1109/MICRO.2006.19
Nagarajan R, Sankaralingam K, Burger D, Keckler S, Patt Y, Fisher J, Faraboschi P, Skadron K (2001). A design space evaluation of grid processor architectures. Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture, pages 40-51. Online publication date: 1-Dec-2001.
https://dl.acm.org/doi/10.5555/563998.564005
Reinman G, Calder B, Austin T (2001). Optimizations Enabled by a Decoupled Front-End Architecture. IEEE Transactions on Computers, 50:4, pages 338-355. Online publication date: 1-Apr-2001.
https://dl.acm.org/doi/10.1109/12.919279
Reinman G, Austin T, Calder B (1999). A scalable front-end architecture for fast instruction delivery. ACM SIGARCH Computer Architecture News, 27:2, pages 234-245. Online publication date: 1-May-1999.
https://dl.acm.org/doi/10.1145/307338.300999
Black B, Rychlik B, Shen J (1999). The block-based trace cache. ACM SIGARCH Computer Architecture News, 27:2, pages 196-207. Online publication date: 1-May-1999.
https://dl.acm.org/doi/10.1145/307338.300996
Reinman G, Austin T, Calder B, Gottlieb A, Dally W (1999). A scalable front-end architecture for fast instruction delivery. Proceedings of the 26th annual international symposium on Computer architecture, pages 234-245. Online publication date: 2-May-1999.
https://dl.acm.org/doi/10.1145/300979.300999
Black B, Rychlik B, Shen J, Gottlieb A, Dally W (1999). The block-based trace cache. Proceedings of the 26th annual international symposium on Computer architecture, pages 196-207. Online publication date: 2-May-1999.
https://dl.acm.org/doi/10.1145/300979.300996
Patel S, Friendly D, Patt Y (1999). Evaluation of Design Options for the Trace Cache Fetch Mechanism. IEEE Transactions on Computers, 48:2, pages 193-204. Online publication date: 1-Feb-1999.
https://dl.acm.org/doi/10.1109/12.752661
