DOI: 10.1145/1054943.1054960
Article

SCIMA-SMP: on-chip memory processor architecture for SMP

Published: 20 June 2004
    Abstract

    In this paper, we propose SCIMA-SMP (Software Controlled Integrated Memory Architecture for SMP), a processor architecture with programmable on-chip memory for high-performance SMP (symmetric multiprocessor) nodes, intended to narrow the performance gap between the processor and off-chip memory. Special instructions enable explicit data transfer between on-chip and off-chip memory, so the application program can control both the timing and the granularity of each transfer, and the SMP bus is used more efficiently than in a traditional cache-only architecture. Through clock-level simulation of various HPC applications, we confirmed that this architecture substantially reduces bus access cycles by avoiding redundant data transfers and by controlling the granularity of data movement between on-chip and off-chip memory.
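
    The granularity argument in the abstract can be illustrated with a rough back-of-the-envelope model. The sketch below is not the paper's simulator or instruction set: the function names, the 64-byte line size, and the 8-byte element size are all assumptions chosen for illustration. It compares the bytes a line-granularity cache would pull over the bus for a strided access against the bytes moved when software requests exactly the elements it needs.

```python
# Illustrative traffic model only. All names and sizes here are assumptions,
# not taken from the SCIMA-SMP paper.

LINE_BYTES = 64   # assumed cache-line size
ELEM_BYTES = 8    # assumed element size (double precision)


def cache_traffic_bytes(n_elems, stride_elems,
                        line_bytes=LINE_BYTES, elem_bytes=ELEM_BYTES):
    """Bytes moved if every distinct cache line touched is fetched whole, once."""
    lines = {(i * stride_elems * elem_bytes) // line_bytes
             for i in range(n_elems)}
    return len(lines) * line_bytes


def explicit_transfer_bytes(n_elems, elem_bytes=ELEM_BYTES):
    """Bytes moved if software transfers only the requested elements."""
    return n_elems * elem_bytes


def strided_load(off_chip, start, stride, count):
    """Software stand-in for an explicit off-chip to on-chip strided transfer."""
    return [off_chip[start + i * stride] for i in range(count)]


if __name__ == "__main__":
    # One column of a 256 x 256 row-major matrix of doubles.
    n, stride = 256, 256
    print("cache-line traffic :", cache_traffic_bytes(n, stride), "bytes")
    print("explicit transfer  :", explicit_transfer_bytes(n), "bytes")
    # Copy the column into a buffer standing in for on-chip memory.
    matrix = list(range(n * stride))
    on_chip = strided_load(matrix, start=3, stride=stride, count=n)
    print("elements staged    :", len(on_chip))
```

    Under these assumptions a unit-stride access moves the same number of bytes either way, while a large stride makes the line-granularity model fetch a full line per element. The model ignores reuse, prefetching, and coherence traffic, so it only hints at the kind of redundant transfer the abstract refers to.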

    Published In

    WMPI '04: Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture
    June 2004
    146 pages
    ISBN:159593040X
    DOI:10.1145/1054943
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

