Article

Contention elimination by replication of sequential sections in distributed shared memory programs

Authors:

Willy Zwaenepoel

PPoPP '01: Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming

Pages 53-61

https://doi.org/10.1145/379539.379568

Published: 18 June 2001

Abstract

In shared memory programs, contention often occurs at the transition between a sequential and a parallel section of the code. As all threads start executing the parallel section, they often access data just modified by the thread that executed the sequential section, causing a flurry of data requests to converge on that processor.
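
The contention pattern described above can be illustrated with a toy count. This is a sketch under stated assumptions, not the paper's measurement method: it assumes a page-based system in which the processor that ran the sequential section (called `master` here, a hypothetical name) holds the only up-to-date copy of each page it modified, and every other processor fetches each such page it reads with exactly one request.

```python
def requests_at_transition(num_procs, pages_read_by_proc, modified_pages, master=0):
    """Count page requests arriving at each processor when the parallel
    section starts. Toy model (assumption): only `master` holds fresh copies
    of `modified_pages`, and each remote read of such a page costs one request."""
    arrivals = [0] * num_procs
    for proc in range(num_procs):
        if proc == master:
            continue  # the master already holds its own modifications
        for page in pages_read_by_proc[proc]:
            if page in modified_pages:
                arrivals[master] += 1  # every fetch converges on the master
    return arrivals


# Hypothetical workload: 8 processors, each reading the same 4 freshly written pages.
modified = {0, 1, 2, 3}
reads = [set(modified) for _ in range(8)]
print(requests_at_transition(8, reads, modified))
# prints [28, 0, 0, 0, 0, 0, 0, 0]
```

In this model the load on the master grows with the number of processors times the number of freshly modified pages each one reads, while every other processor receives no requests at all.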

We address this problem in a software distributed shared memory system by replicating the execution of the sequential sections on all processors. Communication during this replicated sequential execution is reduced by using multicast.
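
A minimal sketch of the replication idea follows, assuming the sequential section is deterministic and depends only on state that every processor already holds. All function names and the message-counting model are hypothetical illustrations, not the OpenMP/NOW implementation.

```python
def sequential_section(state):
    """Deterministic sequential work: build the data the parallel section reads."""
    state["table"] = [i * i for i in range(state["n"])]


def run_master_only(num_procs, n):
    """Baseline: processor 0 runs the section; the others fetch its result."""
    local = [{"n": n} for _ in range(num_procs)]
    sequential_section(local[0])
    fetches_from_master = 0
    for proc in range(1, num_procs):
        local[proc]["table"] = list(local[0]["table"])  # stands in for a remote fetch
        fetches_from_master += 1
    return local, fetches_from_master


def run_replicated(num_procs, n):
    """Replicated: every processor runs the section on its own copy."""
    local = [{"n": n} for _ in range(num_procs)]
    for proc in range(num_procs):
        sequential_section(local[proc])
    return local, 0  # nothing left to fetch at the transition


def replies_for_shared_read(num_procs, multicast):
    """Toy model (assumption): all processors read one page owned by a single
    processor during the replicated section. The owner sends either one
    multicast reply or one unicast reply per other processor."""
    return 1 if multicast else num_procs - 1


baseline_state, baseline_fetches = run_master_only(4, 5)
replicated_state, replicated_fetches = run_replicated(4, 5)
print(baseline_state == replicated_state, baseline_fetches, replicated_fetches)
```

Both variants leave every processor with the same data. The replicated one trades redundant computation for the removal of the fetches that would otherwise converge on a single processor, and the multicast model shows why one reply can serve all replicas when they read the same remote data.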

We have implemented replicated sequential execution with multicast support in OpenMP/NOW, a version of OpenMP that runs on networks of workstations. We do not rely on compile-time data analysis, and therefore we can handle irregular and pointer-based applications. We show significant improvement for two pointer-based applications that suffer from severe contention without replicated sequential execution.

Index Terms

Contention elimination by replication of sequential sections in distributed shared memory programs
1. Software and its engineering
  1. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        Memory management
        Distributed memory
2. Theory of computation
  1. Models of computation
    1. Concurrency
      1. Parallel computing models

Published In

PPoPP '01: Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming

June 2001

142 pages

ISBN:1581133464

DOI:10.1145/379539

Chairmen:
Michael Heath (Univ. of Illinois)
Andrew Lumsdaine (Indiana Univ.)

ACM SIGPLAN Notices Volume 36, Issue 7
July 2001
143 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/568014

Copyright © 2001 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGPLAN: ACM Special Interest Group on Programming Languages

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

PPoPP01

Sponsor:

SIGPLAN

PPoPP01: ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Snowbird, Utah, USA

Acceptance Rates

Overall Acceptance Rate 230 of 1,014 submissions, 23%
