SCIMA-SMP: on-chip memory processor architecture for SMP

Authors: Chikafumi Takahashi, Daisuke Takahashi, Hiroshi Nakamura, Mitsuhisa Sato

WMPI '04: Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture

Pages 121 - 128

https://doi.org/10.1145/1054943.1054960

Published: 20 June 2004

Abstract

In this paper, we propose SCIMA-SMP (Software Controlled Integrated Memory Architecture for SMP), a processor architecture with programmable on-chip memory for high-performance SMP (symmetric multiprocessor) nodes, intended to address the performance gap between the processor and off-chip memory. Special instructions enable explicit data transfer between on-chip and off-chip memory, so the application program can control both the timing and the granularity of data transfers, and the SMP bus is used more efficiently than in a traditional cache-only architecture. Through performance evaluation based on clock-level simulation of various HPC applications, we confirmed that this architecture substantially reduces bus access cycles by avoiding redundant data transfers and by controlling the granularity of data movement between on-chip and off-chip memory.
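The granularity argument can be illustrated with a rough toy model in Python. This is only a sketch and not the paper's clock-level simulator: the function names, the cold-miss-only cache model (no capacity or coherence effects), and the parameter values are all assumptions chosen for illustration. It counts how many words cross the bus when a strided array is read, once with whole cache lines fetched on every miss and once with software requesting only the elements it needs.

```python
# Toy model (hypothetical, not the SCIMA-SMP simulator): count the words
# moved over the bus for a strided read under two transfer policies.

def cache_words_transferred(n_elems, stride_words, line_words):
    """Words fetched when each cold miss pulls in a whole cache line.

    Assumes an unbounded cache, so each distinct line is fetched once.
    """
    lines_touched = {(i * stride_words) // line_words for i in range(n_elems)}
    return len(lines_touched) * line_words


def explicit_words_transferred(n_elems):
    """Words fetched when software transfers only the elements it uses."""
    return n_elems


if __name__ == "__main__":
    n, stride, line = 1024, 16, 8  # assumed sizes, in words
    print("line-granularity transfer:", cache_words_transferred(n, stride, line))
    print("explicit strided transfer:", explicit_words_transferred(n))
```

Under these assumed parameters, the line-granularity policy moves 8192 words to deliver 1024 useful ones, while the explicit transfer moves 1024. With a unit stride the two policies move the same amount, so in this model the benefit comes entirely from access patterns that use only part of each line.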

Index Terms

1. Computer systems organization
  1. Architectures
    1. Parallel architectures
      1. Multiple instruction, multiple data
2. Hardware
  1. Integrated circuits
    1. Semiconductor memory

Published In

WMPI '04: Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture

June 2004

146 pages

ISBN: 159593040X

DOI: 10.1145/1054943

Conference Chairs: John Carter (University of Utah), Lixin Zhang (IBM Austin Research Lab)

Copyright © 2004 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States
