DOI: 10.5555/243846.243885

Increasing the instruction fetch rate via block-structured instruction set architectures

Published: 02 December 1996

Abstract

To exploit larger amounts of instruction level parallelism, processors are being built with wider issue widths and larger numbers of functional units. Instruction fetch rate must also be increased in order to effectively exploit the performance potential of such processors. Block-structured ISAs provide an effective means of increasing the instruction fetch rate. We define an optimization, called block enlargement, that can be applied to a block-structured ISA to increase the instruction fetch rate of a processor that implements that ISA. We have constructed a compiler that generates block-structured ISA code, and a simulator that models the execution of that code on a block-structured ISA processor. We show that for the SPECint95 benchmarks, the block-structured ISA processor executing enlarged atomic blocks outperforms a conventional ISA processor by 12% while using simpler microarchitectural mechanisms to support wide-issue and dynamic scheduling.
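
The link between block size and fetch rate can be illustrated with a toy model. The sketch below is not the paper's compiler or simulator. It assumes a front end that fetches exactly one atomic block per cycle, and it merges neighboring blocks in a straight line, ignoring branches, mispredictions, and blocks with several successors. The block sizes, the `max_size` limit, and the helper names `fetch_rate` and `enlarge` are all invented for illustration.

```python
# Toy model: assume the front end fetches exactly one atomic block per cycle.
# Block sizes below are invented for illustration, not taken from the paper.

def fetch_rate(block_sizes, issue_width):
    """Average instructions fetched per cycle, capping each block at the issue width."""
    fetched = sum(min(size, issue_width) for size in block_sizes)
    return fetched / len(block_sizes)


def enlarge(block_sizes, max_size):
    """Greedily fold each block into the previous one while the result fits in max_size."""
    merged = []
    for size in block_sizes:
        if merged and merged[-1] + size <= max_size:
            merged[-1] += size      # combine with the preceding (possibly enlarged) block
        else:
            merged.append(size)     # start a new block
    return merged


blocks = [4, 6, 3, 5, 8, 2]                    # hypothetical atomic block sizes
enlarged = enlarge(blocks, max_size=16)        # hypothetical size limit

print(enlarged)                                # [13, 15]
print(fetch_rate(blocks, issue_width=16))      # about 4.67 instructions per cycle
print(fetch_rate(enlarged, issue_width=16))    # 14.0
```

Under these assumptions the same 28 instructions are fetched in 2 cycles instead of 6, which is the effect the abstract attributes to enlarged atomic blocks. The 12% figure reported in the abstract comes from the authors' simulations and cannot be derived from a model this simple.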

Cited By

  • (2015) Block-Precise Processors: Low-Power Processors with Reduced Operand Store Accesses and Result Broadcasts. IEEE Transactions on Computers 64:11 (3102-3114). DOI: 10.1109/TC.2015.2395436. Online publication date: 1-Nov-2015
  • (2006) Merging Head and Tail Duplication for Convergent Hyperblock Formation. Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture (65-76). DOI: 10.1109/MICRO.2006.34. Online publication date: 9-Dec-2006
  • (2006) Distributed Microarchitectural Protocols in the TRIPS Prototype Processor. Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture (480-491). DOI: 10.1109/MICRO.2006.19. Online publication date: 9-Dec-2006

Published In

MICRO 29: Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
December 1996
359 pages
ISBN:0818676418

Publisher

IEEE Computer Society

United States

Qualifiers

  • Article

Conference

MICRO96

Acceptance Rates

Overall Acceptance Rate 484 of 2,242 submissions, 22%
