Missing the memory wall: the case for processor/memory integration

Authors:

Ashley Saulsbury,

Andreas Nowatzyk

ISCA '96: Proceedings of the 23rd annual international symposium on Computer architecture

Pages 90 - 101

https://doi.org/10.1145/232973.232984

Published: 01 May 1996

Abstract

Current high-performance computer systems use complex, large superscalar CPUs that interface to the main memory through a hierarchy of caches and interconnect systems. These CPU-centric designs invest a lot of power and chip area to bridge the widening gap between CPU and main memory speeds. Yet many large applications do not operate well on these systems and are limited by the memory subsystem performance.

This paper argues for an integrated system approach that uses less-powerful CPUs that are tightly integrated with advanced memory technologies to build competitive systems with greatly reduced cost and complexity. Based on a design study using the next-generation 0.25µm, 256Mbit dynamic random-access memory (DRAM) process and on the analysis of existing machines, we show that processor/memory integration can be used to build competitive, scalable and cost-effective MP systems.

We present results from execution-driven uni- and multi-processor simulations showing that the benefits of lower latency and higher bandwidth can compensate for the restrictions on the size and complexity of the integrated processor. In this system, small direct-mapped instruction caches with long lines are very effective, as are column buffer data caches augmented with a victim cache.

Index Terms

Missing the memory wall: the case for processor/memory integration
1. Computing methodologies
  1. Modeling and simulation
2. Hardware

Published In

ISCA '96: Proceedings of the 23rd annual international symposium on Computer architecture

May 1996

318 pages

ISBN:0897917863

DOI:10.1145/232973

Chairman:
Jean-Loup Baer
Univ. of Washington, Seattle

ACM SIGARCH Computer Architecture News Volume 24, Issue 2
Special Issue: Proceedings of the 23rd annual international symposium on Computer architecture (ISCA '96)
May 1996
303 pages
ISSN:0163-5964
DOI:10.1145/232974

Copyright © 1996 Authors.

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture
IEEE-CS\TCCA: TC on Computer Architecture

Publisher

Association for Computing Machinery

New York, NY, United States

ISCA96: International Conference on Computer Architecture

May 22 - 24, 1996

Philadelphia, Pennsylvania, USA

Acceptance Rates

Overall Acceptance Rate 543 of 3,203 submissions, 17%
