Performance optimization of pipelined primary cache

Authors:

Kunle Olukotun,

Richard Brown

ACM SIGARCH Computer Architecture News, Volume 20, Issue 2

Pages 181 - 190

https://doi.org/10.1145/146628.139726

Published: 01 April 1992

Abstract

The CPU cycle time of a high-performance processor is usually determined by the access time of the primary cache. As processor speeds increase, designers will have to increase the number of pipeline stages used to fetch data from the cache in order to reduce the dependence of CPU cycle time on cache access time. This paper studies the performance advantages of a pipelined cache for a GaAs implementation of the MIPS-based architecture, using a design methodology that includes long traces of multiprogrammed applications and detailed timing analysis. The study evaluates instruction and data caches with various pipeline depths, cache sizes, block sizes, and refill penalties. The impact of these alternatives on CPU cycle time is also factored into the evaluation. Hardware-based and software-based strategies are considered for hiding the branch and load delays that may be required to avoid pipeline hazards. The results show that software-based methods for mitigating the penalty of branch delays can be as successful as the hardware-based branch-target buffer approach, despite the code expansion inherent in the software methods. The situation is similar for load delays: while hardware-based dynamic methods hide more delay cycles than static approaches do, they may give up the advantage by extending the cycle time. Because these methods are quite successful at hiding small numbers of branch and load delays, and because processors with pipelined caches also have shorter CPU cycle times and larger caches, a significant performance advantage is gained by using two to three pipeline stages to fetch data from the cache.
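The tradeoff described in the abstract (more cache pipeline stages shorten the cycle time but expose more branch and load delay cycles) can be illustrated with a small back-of-the-envelope model. The sketch below is not the authors' methodology, which relies on long traces and detailed timing analysis. Every name and number in it (`PipelineConfig`, the cycle times, instruction frequencies, and the fraction of delay cycles hidden) is a hypothetical assumption chosen only to show the arithmetic, as is the rule that each cache stage beyond the first exposes one extra delay cycle per branch and per load.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineConfig:
    """One hypothetical design point; no value here comes from the paper."""
    cache_stages: int       # pipeline stages used to access the primary cache
    cycle_time_ns: float    # assumed CPU cycle time at this pipeline depth
    base_cpi: float         # assumed CPI excluding branch/load delay stalls
    branch_freq: float      # assumed fraction of instructions that are branches
    load_freq: float        # assumed fraction of instructions that are loads
    hidden_fraction: float  # assumed fraction of delay cycles that get hidden


def delay_cycles(cfg: PipelineConfig) -> int:
    # Simplifying assumption: each cache stage beyond the first adds one
    # delay cycle after every branch and every load.
    return cfg.cache_stages - 1


def effective_cpi(cfg: PipelineConfig) -> float:
    # Only the delay cycles that are NOT hidden (by scheduling, prediction,
    # and so on) add to the cycles per instruction.
    exposed = delay_cycles(cfg) * (1.0 - cfg.hidden_fraction)
    return cfg.base_cpi + (cfg.branch_freq + cfg.load_freq) * exposed


def time_per_instruction_ns(cfg: PipelineConfig) -> float:
    # Performance metric: average time per instruction = CPI * cycle time.
    return effective_cpi(cfg) * cfg.cycle_time_ns


if __name__ == "__main__":
    # Illustrative design points only; the cycle times are made up.
    designs = [
        PipelineConfig(1, 4.0, 1.5, 0.15, 0.25, 0.7),
        PipelineConfig(2, 3.0, 1.5, 0.15, 0.25, 0.7),
        PipelineConfig(3, 2.6, 1.5, 0.15, 0.25, 0.7),
    ]
    for cfg in designs:
        print(cfg.cache_stages, effective_cpi(cfg), time_per_instruction_ns(cfg))
```

In this toy model a deeper cache pipeline wins whenever the relative reduction in cycle time outweighs the relative increase in CPI from exposed delay cycles, which is the qualitative argument the abstract makes. The model ignores cache misses, refill penalties, and code expansion, all of which the study does take into account.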

Published In

ACM SIGARCH Computer Architecture News Volume 20, Issue 2

Special Issue: Proceedings of the 19th annual international symposium on Computer architecture (ISCA '92)

May 1992

429 pages

ISSN:0163-5964

DOI:10.1145/146628

Editor:
Allan Gottlieb
New York Univ., New York, NY

ISCA '92: Proceedings of the 19th annual international symposium on Computer architecture
May 1992
439 pages
ISBN:0897915097
DOI:10.1145/139669
Chairman:
Allan Gottlieb
New York Univ., New York, NY

Copyright © 1992 Authors.

Publisher

Association for Computing Machinery

New York, NY, United States
