Performance optimization of pipelined primary cache

Published: 01 April 1992

Abstract

The CPU cycle time of a high-performance processor is usually determined by the access time of the primary cache. As processor speeds increase, designers will have to increase the number of pipeline stages used to fetch data from the cache in order to reduce the dependence of CPU cycle time on cache access time. This paper studies the performance advantages of a pipelined cache for a GaAs implementation of a MIPS-based architecture, using a design methodology that includes long traces of multiprogrammed applications and detailed timing analysis. The study evaluates instruction and data caches with various pipeline depths, cache sizes, block sizes, and refill penalties. The impact of these alternatives on CPU cycle time is also factored into the evaluation. Hardware-based and software-based strategies are considered for hiding the branch and load delays that may be required to avoid pipeline hazards. The results show that software-based methods for mitigating the penalty of branch delays can be as successful as the hardware-based branch-target buffer approach, despite the code expansion inherent in the software methods. The situation is similar for load delays: while hardware-based dynamic methods hide more delay cycles than static approaches do, they may give up that advantage by extending the cycle time. Because these methods are quite successful at hiding small numbers of branch and load delays, and because processors with pipelined caches also have shorter CPU cycle times and larger caches, a significant performance advantage is gained by using two to three pipeline stages to fetch data from the cache.
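
The trade-off described in the abstract can be illustrated with a first-order model: time per instruction is cycles per instruction multiplied by cycle time, so a deeper cache pipeline pays off only if the shorter cycle outweighs the extra unhidden branch and load delay cycles. The sketch below is not the paper's methodology (the paper relies on long multiprogrammed traces and detailed timing analysis). The class `CacheDesign`, the function names, the instruction-mix fractions, and every numeric value are hypothetical assumptions chosen only to make the arithmetic concrete.

```python
from dataclasses import dataclass


@dataclass
class CacheDesign:
    """Hypothetical design point; every field is an illustrative assumption."""
    cycle_time_ns: float          # CPU cycle time
    branch_delay_cycles: float    # unhidden delay cycles per branch
    load_delay_cycles: float      # unhidden delay cycles per load
    misses_per_instr: float       # cache misses per instruction
    refill_penalty_cycles: float  # cycles lost per cache miss


def cycles_per_instruction(d, branch_frac=0.15, load_frac=0.25, base_cpi=1.0):
    """First-order CPI: base CPI plus branch, load, and miss stall cycles.

    branch_frac and load_frac are assumed instruction-mix fractions,
    not figures taken from the paper.
    """
    return (base_cpi
            + branch_frac * d.branch_delay_cycles
            + load_frac * d.load_delay_cycles
            + d.misses_per_instr * d.refill_penalty_cycles)


def time_per_instruction_ns(d, **mix):
    """Average time per instruction = CPI * cycle time."""
    return cycles_per_instruction(d, **mix) * d.cycle_time_ns


# Two made-up design points: an unpipelined cache with a longer cycle,
# and a pipelined cache with a shorter cycle but some unhidden delays
# and a refill penalty that costs more (shorter) cycles.
unpipelined = CacheDesign(5.0, 0.0, 0.0, 0.02, 10.0)
pipelined = CacheDesign(4.0, 0.5, 0.5, 0.02, 12.0)

for name, design in [("unpipelined", unpipelined), ("pipelined", pipelined)]:
    print(f"{name}: CPI={cycles_per_instruction(design):.2f}, "
          f"TPI={time_per_instruction_ns(design):.2f} ns")
```

With these assumed numbers the pipelined design has the higher CPI but the lower time per instruction. The result depends entirely on how many delay cycles remain unhidden and how much the cycle time actually shrinks, which is the question the study addresses with measured data.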

Published In

ACM SIGARCH Computer Architecture News, Volume 20, Issue 2
Special Issue: Proceedings of the 19th annual international symposium on Computer architecture (ISCA '92)
May 1992
429 pages
ISSN: 0163-5964
DOI: 10.1145/146628

ISCA '92: Proceedings of the 19th annual international symposium on Computer architecture
May 1992
439 pages
ISBN: 0897915097
DOI: 10.1145/139669

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Article
