Improving performance of small on-chip instruction caches

Authors:

A. R. Pleszkun

ACM SIGARCH Computer Architecture News, Volume 17, Issue 3

Pages 234 - 241

https://doi.org/10.1145/74926.74952

Published: 01 April 1989

Abstract

Most current single-chip processors employ an on-chip instruction cache to improve performance. A miss in this instruction cache will cause an external memory reference which must compete with data references for access to the external memory, thus affecting the overall performance of the processor. One common way to reduce the number of off-chip instruction requests is to increase the size of the on-chip cache. An alternative approach is presented in this paper, in which a combination of an instruction cache, instruction queue and instruction queue buffer is used to achieve the same effect with a much smaller instruction cache size. Such an approach is significant for emerging technologies where high circuit densities are initially difficult to achieve yet a high level of performance is desired, or for more mature technologies where chip area can be used to provide more functionality. The viability of this approach is demonstrated by its implementation in an existing single-chip processor.
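
The baseline the abstract describes, reducing off-chip instruction requests by enlarging the on-chip cache, can be illustrated with a toy trace-driven model. This is a hypothetical sketch, not the paper's simulator: the 4-byte instructions, 16-byte lines, the `offchip_requests` and `loop_trace` helpers, and the synthetic 48-instruction loop are all assumptions. The instruction queue and instruction queue buffer proposed in the paper are not modeled.

```python
# Toy trace-driven model of a direct-mapped on-chip instruction cache.
# Assumptions (not from the paper): 4-byte instructions, 16-byte lines,
# one off-chip request per missed line, and a synthetic loop-only trace.

INSTR_BYTES = 4
LINE_BYTES = 16


def offchip_requests(trace, cache_lines, line_bytes=LINE_BYTES):
    """Return the number of off-chip line fetches (cache misses) for `trace`."""
    tags = [None] * cache_lines          # None marks an invalid (empty) line
    misses = 0
    for addr in trace:
        line = addr // line_bytes        # memory line holding this instruction
        index = line % cache_lines       # direct-mapped: exactly one slot per line
        tag = line // cache_lines        # distinguishes lines that share a slot
        if tags[index] != tag:
            misses += 1                  # miss: fetch the line from external memory
            tags[index] = tag
    return misses


def loop_trace(body_instrs, iterations, base=0):
    """Instruction addresses for a straight-line loop body executed repeatedly."""
    return [base + i * INSTR_BYTES
            for _ in range(iterations)
            for i in range(body_instrs)]


if __name__ == "__main__":
    trace = loop_trace(body_instrs=48, iterations=10)  # 192-byte loop, 480 fetches
    for lines in (4, 8, 16):
        n = offchip_requests(trace, lines)
        print(f"{lines * LINE_BYTES:4d}-byte cache: {n:3d} off-chip requests "
              f"({n / len(trace):.1%} of fetches)")
```

In this toy, the 64-, 128- and 256-byte caches incur 120, 84 and 12 off-chip requests respectively. Once the whole 192-byte loop fits, only the first pass misses. The paper's alternative aims for a comparable reduction with a much smaller cache, and its queue and queue-buffer policy is not reproduced here.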

Index Terms

Improving performance of small on-chip instruction caches
1. Computer systems organization
  1. Architectures
    1. Parallel architectures
      1. Very long instruction word
    2. Serial architectures
2. Hardware
  1. Very large scale integration design


Published In

ACM SIGARCH Computer Architecture News Volume 17, Issue 3

Special Issue: Proceedings of the 16th annual international symposium on Computer Architecture

June 1989

400 pages

ISSN: 0163-5964

DOI: 10.1145/74926

Editor:
Jean-Claude Syre

ISCA '89: Proceedings of the 16th annual international symposium on Computer architecture
April 1989
426 pages
ISBN: 0897913191
DOI: 10.1145/74925
Chairman:
Jean-Claude Syre

Copyright © 1989 Authors.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 April 1989

Published in SIGARCH Volume 17, Issue 3

Qualifiers

Article

Bibliometrics

Article Metrics

Total citations: 22
Total downloads: 475
Downloads (last 12 months): 73
Downloads (last 6 weeks): 20

Reflects downloads up to 15 Oct 2024.

Cited By

Uhlig R, Nagle D, Mudge T, Sechrest S, Emer J (1995). Instruction fetching. ACM SIGARCH Computer Architecture News 23:2, pp. 345-356. DOI: 10.1145/225830.224445. Online publication date: 1-May-1995.
https://dl.acm.org/doi/10.1145/225830.224445
Farrens M, Pleszkun A (1991). Implementation of the PIPE Processor. Computer 24:1, pp. 65-70. DOI: 10.1109/2.67195. Online publication date: 1-Jan-1991.
https://dl.acm.org/doi/10.1109/2.67195
Farrens M, Pleszkun A, Papachristou C, Allan V (1990). An evaluation of functional unit lengths for single-chip processors. Proceedings of the 23rd annual workshop and symposium on Microprogramming and microarchitecture, pp. 209-215. DOI: 10.5555/255237.255278. Online publication date: 30-Nov-1990.
https://dl.acm.org/doi/10.5555/255237.255278
Jouppi N (1998). Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers. 25 years of the international symposia on Computer architecture (selected papers), pp. 388-397. DOI: 10.1145/285930.285998. Online publication date: 1-Aug-1998.
https://dl.acm.org/doi/10.1145/285930.285998
Pierce J, Mudge T, Melvin S, Beaty S (1996). Wrong-path instruction prefetching. Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture, pp. 165-175. DOI: 10.5555/243846.243882. Online publication date: 2-Dec-1996.
https://dl.acm.org/doi/10.5555/243846.243882
Lipasti M, Schmidt W, Kunkel S, Roediger R, Mudge T, Ebcioğlu K (1995). SPAID. Proceedings of the 28th annual international symposium on Microarchitecture, pp. 231-236. DOI: 10.5555/225160.225197. Online publication date: 1-Dec-1995.
https://dl.acm.org/doi/10.5555/225160.225197
Uhlig R, Nagle D, Mudge T, Sechrest S, Emer J, Patterson D (1995). Instruction fetching. Proceedings of the 22nd annual international symposium on Computer architecture, pp. 345-356. DOI: 10.1145/223982.224445. Online publication date: 1-Jul-1995.
https://dl.acm.org/doi/10.1145/223982.224445
Nagle D, Uhlig R, Mudge T, Sechrest S (1994). Optimal allocation of on-chip memory for multiple-API operating systems. ACM SIGARCH Computer Architecture News 22:2, pp. 358-369. DOI: 10.1145/192007.192070. Online publication date: 1-Apr-1994.
https://dl.acm.org/doi/10.1145/192007.192070
Nagle D, Uhlig R, Mudge T, Sechrest S, Patterson D (1994). Optimal allocation of on-chip memory for multiple-API operating systems. Proceedings of the 21st annual international symposium on Computer architecture, pp. 358-369. DOI: 10.1145/191995.192070. Online publication date: 18-Apr-1994.
https://dl.acm.org/doi/10.1145/191995.192070
