DOI: 10.1145/263764.263772

Compiler and software distributed shared memory support for irregular applications

Published: 21 June 1997

Abstract

We investigate the use of a software distributed shared memory (DSM) layer to support irregular computations on distributed memory machines. Software DSM supports irregular computation through demand fetching of data in response to memory access faults. With the addition of a very limited form of compiler support, namely the identification of the section of the indirection array accessed by each processor, many of these on-demand page fetches can be aggregated into a single message and prefetched before the access fault occurs.

We have measured the performance of this approach for two irregular applications, moldyn and nbf, using the TreadMarks DSM system on an 8-processor IBM SP2. We find that it performs similarly to the inspector-executor method supported by the CHAOS run-time library, while requiring much simpler compile-time support. For moldyn, it is up to 23% faster than CHAOS, depending on the characteristics of the input problem; for nbf, it is at most 14% slower. If we include the execution time of the inspector, the software DSM-based approach is always faster than CHAOS. The advantage of this approach grows as the indirection array changes more frequently. Its disadvantage is the potential for false sharing overhead when the data set is small or has poor spatial locality.
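The aggregation idea in the abstract can be illustrated with a small sketch. It is not the TreadMarks or CHAOS API. The page size, element size, the helper names (`pages_for_section`, `demand_fetch_messages`, `aggregated_requests`), and the fixed page-to-owner mapping are all assumptions made for illustration. Given the section of the indirection array that one processor will traverse, the sketch computes which data pages that section touches. It then compares one request per faulting page against one batched request per remote processor.

```python
from collections import defaultdict

PAGE_SIZE = 4096                        # bytes per page (assumed)
ELEM_SIZE = 8                           # bytes per data element (assumed)
ELEMS_PER_PAGE = PAGE_SIZE // ELEM_SIZE


def pages_for_section(indirection, lo, hi):
    """Sorted data pages touched by x[indirection[i]] for lo <= i < hi."""
    return sorted({idx // ELEMS_PER_PAGE for idx in indirection[lo:hi]})


def demand_fetch_messages(pages, local_pages):
    """On-demand fetching: one request per non-local page, sent at fault time."""
    return sum(1 for p in pages if p not in local_pages)


def aggregated_requests(pages, local_pages, owner_of):
    """Aggregated prefetch: batch non-local pages into one request per owner.

    owner_of is a hypothetical page -> processor mapping; a real DSM may
    have to contact several processors for a single page.
    """
    by_owner = defaultdict(list)
    for p in pages:
        if p not in local_pages:
            by_owner[owner_of(p)].append(p)
    return dict(by_owner)


# Hypothetical example: this processor owns page 0 and traverses the
# whole indirection array below.
indirection = [0, 5, 600, 601, 1500, 1501, 3000, 512]
local_pages = {0}


def owner_of(page):
    # Assumed static layout: page 0 -> proc 0, pages 1-2 -> proc 1, rest -> proc 2.
    return 0 if page == 0 else (1 if page <= 2 else 2)


pages = pages_for_section(indirection, 0, len(indirection))
on_demand = demand_fetch_messages(pages, local_pages)
batched = aggregated_requests(pages, local_pages, owner_of)
print(pages, on_demand, batched)
```

In this toy layout the section touches four pages, three of them remote. Demand fetching would issue three separate requests, while the batched version issues two, one per remote processor. The compiler's only job in this scheme is to supply the bounds `lo` and `hi` of the indirection array section.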

Published In

PPOPP '97: Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
June 1997
287 pages
ISBN: 0897919068
DOI: 10.1145/263764
Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

  • Article

Conference

PPoPP97: Principles & Practices of Parallel Programming
June 18-21, 1997
Las Vegas, Nevada, USA

Acceptance Rates

PPOPP '97 Paper Acceptance Rate 26 of 86 submissions, 30%;
Overall Acceptance Rate 230 of 1,014 submissions, 23%

Cited By

  • (2024) TrackFM: Far-out Compiler Support for a Far Memory World. Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1, pp. 401-419. DOI: 10.1145/3617232.3624856. Online publication date: 27-Apr-2024
  • (2022) Cache-coherent CLAM (WIP). Proceedings of the 23rd ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems, pp. 111-115. DOI: 10.1145/3519941.3535073. Online publication date: 14-Jun-2022
  • (2017) Optimizing locality in graph computations using reuse distance profiles. 2017 IEEE 36th International Performance Computing and Communications Conference (IPCCC), pp. 1-8. DOI: 10.1109/PCCC.2017.8280444. Online publication date: Dec-2017
  • (2009) Programming matrix algorithms-by-blocks for thread-level parallelism. ACM Transactions on Mathematical Software 36:3, pp. 1-26. DOI: 10.1145/1527286.1527288. Online publication date: 23-Jul-2009
  • (2008) Optimizing irregular shared-memory applications for clusters. Proceedings of the 22nd annual international conference on Supercomputing, pp. 256-265. DOI: 10.1145/1375527.1375566. Online publication date: 7-Jun-2008
  • (2007) Supermatrix out-of-order scheduling of matrix operations for SMP and multi-core architectures. Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures, pp. 116-125. DOI: 10.1145/1248377.1248397. Online publication date: 9-Jun-2007
  • (2007) Satisfying your dependencies with SuperMatrix. Proceedings of the 2007 IEEE International Conference on Cluster Computing, pp. 91-99. DOI: 10.1109/CLUSTR.2007.4629221. Online publication date: 17-Sep-2007
  • (2006) Runtime address space computation for SDSM systems. Proceedings of the 19th international conference on Languages and compilers for parallel computing, pp. 330-344. DOI: 10.5555/1757112.1757145. Online publication date: 2-Nov-2006
  • (2006) Optimizing irregular shared-memory applications for distributed-memory systems. Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming, pp. 119-128. DOI: 10.1145/1122971.1122990. Online publication date: 29-Mar-2006
  • (2005) Shared Memory Parallelization of Data Mining Algorithms. IEEE Transactions on Knowledge and Data Engineering 17:1, pp. 71-89. DOI: 10.1109/TKDE.2005.18. Online publication date: 1-Jan-2005
