Hardware-software trade-offs in a direct Rambus implementation of the RAMpage memory hierarchy

Published: 01 October 1998

Abstract

The RAMpage memory hierarchy is an alternative to the traditional division between cache and main memory: main memory is moved up a level and DRAM is used as a paging device. The idea behind RAMpage is to reduce hardware complexity, albeit at the cost of software complexity, with a view to allowing more flexible memory system design. This paper investigates some issues in choosing between RAMpage and a conventional cache architecture, with a view to illustrating trade-offs which can be made in choosing whether to place complexity in the memory system in hardware or in software. Performance results in this paper are based on a simple Rambus implementation of DRAM, with performance characteristics of Direct Rambus, which should be available in 1999. This paper explores the conditions under which it becomes feasible to perform a context switch on a miss in the RAMpage model, and the conditions under which RAMpage is a win over a conventional cache architecture: as the CPU-DRAM speed gap grows, RAMpage becomes more viable.
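The feasibility condition mentioned in the abstract can be illustrated with a deliberately simple back-of-envelope model. This is a sketch under stated assumptions, not the paper's simulation: it assumes that switching context on a miss is only worthwhile when the DRAM miss penalty, measured in CPU cycles, exceeds the cost of the switch itself. The function names, the 50 ns latency and the 400-cycle switch cost are all hypothetical values chosen for illustration.

```python
def miss_penalty_cycles(dram_latency_ns: float, cpu_clock_mhz: float) -> float:
    """Convert a DRAM access latency in nanoseconds into CPU cycles."""
    return dram_latency_ns * cpu_clock_mhz / 1000.0


def switch_on_miss_pays_off(dram_latency_ns: float,
                            cpu_clock_mhz: float,
                            switch_cost_cycles: float) -> bool:
    """Assumed rule of thumb: a context switch on a miss is worthwhile
    only if stalling would waste more cycles than the switch costs."""
    return miss_penalty_cycles(dram_latency_ns, cpu_clock_mhz) > switch_cost_cycles


if __name__ == "__main__":
    # Hypothetical numbers: DRAM latency is held fixed while the CPU clock
    # rises, which models a growing CPU-DRAM speed gap.
    DRAM_LATENCY_NS = 50.0
    SWITCH_COST_CYCLES = 400.0
    for clock_mhz in (200.0, 1000.0, 4000.0, 16000.0):
        penalty = miss_penalty_cycles(DRAM_LATENCY_NS, clock_mhz)
        verdict = switch_on_miss_pays_off(DRAM_LATENCY_NS, clock_mhz,
                                          SWITCH_COST_CYCLES)
        print(f"{clock_mhz:8.0f} MHz: miss costs {penalty:6.1f} cycles, "
              f"switch pays off: {verdict}")
```

With the latency fixed, the miss penalty in cycles grows linearly with clock rate, so the inequality eventually flips in favour of switching. A real evaluation would also have to account for miss rates, transfer time for a full page, and whether another process is ready to run.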

Published In

ACM SIGPLAN Notices, Volume 33, Issue 11
Nov. 1998
309 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/291006

ASPLOS VIII: Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
October 1998
326 pages
ISBN: 1581131070
DOI: 10.1145/291069
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States
