research-article

A portable runtime interface for multi-level memory hierarchies

Authors:

Timothy Knight,

Kayvon Fatahalian,

Pat Hanrahan

PPoPP '08: Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming

Pages 143 - 152

https://doi.org/10.1145/1345206.1345229

Published: 20 February 2008

Abstract

We present a platform-independent runtime interface for moving data and computation through parallel machines with multi-level memory hierarchies. We show that this interface can be used as a compiler target and can be implemented easily and efficiently on a variety of platforms. The interface design allows us to compose multiple runtimes, achieving portability across machines with multiple memory levels. We demonstrate portability of programs across machines with two memory levels using runtime implementations for multi-core/SMP machines, the STI Cell Broadband Engine, a distributed-memory cluster, and disk systems. We also demonstrate portability across machines with multiple memory levels by composing runtimes and running on a cluster of SMP nodes, running out-of-core algorithms on a Sony PlayStation 3 that pull data from disk, and running on a cluster of Sony PlayStation 3s. With this uniform interface, we achieve good performance for our applications and maximize utilization of the available bandwidth and computational resources on these system configurations.
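The central idea of the abstract, a runtime that moves data and tasks only between adjacent memory levels and that can be nested to span deeper hierarchies, can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the paper's interface: the names `Level`, `Runtime`, `copy_to_child`, `copy_from_child`, `call_child_task`, `scale`, and `nested_task` are hypothetical, memory spaces are modeled as Python dictionaries of lists, and child tasks run synchronously instead of on separate processors.

```python
from typing import Callable, Dict, List


class Level:
    """One memory space in the hierarchy: a set of named arrays (hypothetical model)."""

    def __init__(self) -> None:
        self.arrays: Dict[str, List[float]] = {}


class Runtime:
    """Hypothetical two-level runtime: one parent memory and num_children child memories.

    Data moves between levels only through explicit copies; computation moves by
    launching a task on a child. Illustration only, not the paper's API.
    """

    def __init__(self, parent: Level, num_children: int) -> None:
        self.parent = parent
        self.children = [Level() for _ in range(num_children)]

    def copy_to_child(self, child: int, src: str, start: int, count: int, dst: str) -> None:
        # Copy parent[src][start:start+count] into a new array named dst in the child's memory.
        self.children[child].arrays[dst] = self.parent.arrays[src][start:start + count]

    def copy_from_child(self, child: int, src: str, dst: str, start: int) -> None:
        # Write the child's array back into parent[dst] beginning at offset start.
        data = self.children[child].arrays[src]
        self.parent.arrays[dst][start:start + len(data)] = data

    def call_child_task(self, child: int, task: Callable[[Level], None]) -> None:
        # A real runtime would run this asynchronously on the child's processor;
        # this sketch runs it synchronously in the calling thread.
        task(self.children[child])


def scale_leaf(mem: Level) -> None:
    """Leaf task: double every element of the local array 'x'."""
    mem.arrays["x"] = [2.0 * v for v in mem.arrays["x"]]


def scale(rt: Runtime, name: str, n: int, task: Callable[[Level], None]) -> None:
    """Split parent array `name` (length n) into one block per child and run `task` on each."""
    k = len(rt.children)
    block = (n + k - 1) // k  # ceiling division so every element is covered
    for c in range(k):
        start = c * block
        count = max(0, min(block, n - start))
        if count == 0:
            continue  # more children than blocks of work
        rt.copy_to_child(c, name, start, count, "x")
        rt.call_child_task(c, task)
        rt.copy_from_child(c, "x", name, start)


def nested_task(mem: Level) -> None:
    """Composition: the child's memory becomes the parent level of an inner runtime."""
    inner = Runtime(mem, num_children=4)
    scale(inner, "x", len(mem.arrays["x"]), scale_leaf)


# A two-way outer runtime (e.g. nodes of a cluster) composed with a
# four-way inner runtime (e.g. cores within each node).
top = Level()
top.arrays["a"] = [float(i) for i in range(10)]
outer = Runtime(top, num_children=2)
scale(outer, "a", 10, nested_task)
print(top.arrays["a"])  # [0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0]
```

Because `nested_task` treats its child memory as an ordinary parent `Level`, the same `scale` routine drives both the outer and the inner level. This mirrors, in miniature, the abstract's point that composing two-level runtimes yields portability across machines with more than two memory levels; a real implementation would also need asynchronous transfers and task launches, which this sketch omits.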


Cited By

Greenspan D (2019). LLAMA - automatic memory allocations. Proceedings of the International Symposium on Memory Systems, 363-372. Online publication date: 30-Sep-2019.
https://dl.acm.org/doi/10.1145/3357526.3357534
Pawlick J, Colbert E, Zhu Q (2019). A Game-theoretic Taxonomy and Survey of Defensive Deception for Cybersecurity and Privacy. ACM Computing Surveys 52:4, 1-28. Online publication date: 30-Aug-2019.
https://dl.acm.org/doi/10.1145/3337772
Tang D, Subramanian V (2019). Random Walk Based Sampling for Load Balancing in Multi-Server Systems. Proceedings of the ACM on Measurement and Analysis of Computing Systems 3:1, 1-44. Online publication date: 26-Mar-2019.
https://dl.acm.org/doi/10.1145/3322205.3311085

Index Terms

A portable runtime interface for multi-level memory hierarchies
1. Computer systems organization
  1. Architectures
    1. Parallel architectures
2. Software and its engineering
  1. Software notations and tools
    1. Compilers
      1. Runtime environments

Information

Published In

PPoPP '08: Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming

February 2008

308 pages

ISBN:9781595937957

DOI:10.1145/1345206

General Chair: Siddhartha Chatterjee, IBM Research, USA

Program Chair: Michael L. Scott, University of Rochester, USA

Copyright © 2008 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

PPoPP08

Sponsor:

PPoPP08: ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

February 20 - 23, 2008

Salt Lake City, UT, USA

Acceptance Rates

Overall Acceptance Rate 230 of 1,014 submissions, 23%

Bibliometrics & Citations

Article Metrics

Total Citations: 37

Total Downloads: 752

Downloads (last 12 months): 7

Downloads (last 6 weeks): 0

Reflects downloads up to 12 Jan 2025

Citations

Su L, Xu J (2019). Securing Distributed Gradient Descent in High Dimensional Statistical Learning. Proceedings of the ACM on Measurement and Analysis of Computing Systems 3:1, 1-41. Online publication date: 26-Mar-2019.
https://dl.acm.org/doi/10.1145/3322205.3311083
Yu H, Wei E, Berry R (2019). Analyzing Location-Based Advertising for Vehicle Service Providers Using Effective Resistances. Proceedings of the ACM on Measurement and Analysis of Computing Systems 3:1, 1-35. Online publication date: 26-Mar-2019.
https://dl.acm.org/doi/10.1145/3322205.3311077
Jia Z, Treichler S, Shipman G, McCormick P, Aiken A (2018). Isometry. Proceedings of the 2018 International Conference on Supercomputing, 295-306. Online publication date: 12-Jun-2018.
https://dl.acm.org/doi/10.1145/3205289.3205301
Jia Z, Treichler S, Shipman G, Bauer M, Watkins N, Maltzahn C, McCormick P, Aiken A (2017). Integrating External Resources with a Task-Based Programming Model. 2017 IEEE 24th International Conference on High Performance Computing (HiPC), 307-316. Online publication date: Dec-2017.
https://doi.org/10.1109/HiPC.2017.00043
Watkins N, Jia Z, Shipman G, Maltzahn C, Aiken A, McCormick P, Butt A, Lofstead J (2015). Automatic and transparent I/O optimization with storage integrated application runtime support. Proceedings of the 10th Parallel Data Storage Workshop, 49-54. Online publication date: 15-Nov-2015.
https://dl.acm.org/doi/10.1145/2834976.2834983
Pinho L, Nélis V, Yomsi P, Quiñones E, Bertogna M, Burgio P, Marongiu A, Scordino C, Gai P, Ramponi M, Mardiak M (2015). P-SOCRATES. Microprocessors & Microsystems 39:8, 1190-1203. Online publication date: 1-Nov-2015.
https://dl.acm.org/doi/10.1016/j.micpro.2015.06.004
Treichler S, Bauer M, Aiken A, Amaral J, Torrellas J (2014). Realm. Proceedings of the 23rd international conference on Parallel architectures and compilation, 263-276. Online publication date: 24-Aug-2014.
https://dl.acm.org/doi/10.1145/2628071.2628084
