DOI: 10.1145/379539.379568
Article

Contention elimination by replication of sequential sections in distributed shared memory programs

Published: 18 June 2001
    Abstract

    In shared memory programs, contention often occurs at the transition between a sequential and a parallel section of the code. As all threads start executing the parallel section, they often access data just modified by the thread that executed the sequential section, causing a flurry of data requests to converge on that processor.
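The hot spot described above can be made concrete with a toy request-counting model. This is a minimal sketch under stated assumptions, not the paper's measurement or OpenMP/NOW code: there are `num_nodes` processors, one `writer` node modifies `pages_written` shared pages during the sequential section, and every other node then reads all of those pages at the start of the parallel section, fetching each one from the writer. The function `count_requests` and its parameters are hypothetical.

```python
from collections import Counter


def count_requests(num_nodes: int, pages_written: int, writer: int = 0) -> Counter:
    """Return how many page requests each node serves (hypothetical toy model).

    Assumes `writer` modified `pages_written` shared pages in the sequential
    section and every other node reads all of them when the parallel section
    starts, fetching each page from the writer.
    """
    served = Counter({node: 0 for node in range(num_nodes)})
    for reader in range(num_nodes):
        if reader == writer:
            continue  # the writer already holds valid copies of its own pages
        for _page in range(pages_written):
            served[writer] += 1  # every fetch converges on the same node
    return served


if __name__ == "__main__":
    load = count_requests(num_nodes=8, pages_written=100)
    print("writer:", load[0], "busiest other node:", max(load[n] for n in range(1, 8)))
```

With 8 nodes and 100 written pages, the writer serves 700 requests and every other node serves none. In this model the writer's load is (num_nodes - 1) * pages_written, so adding processors deepens the hot spot instead of spreading it.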
    We address this problem in a software distributed shared memory system by replicating the execution of the sequential sections on all processors. Communication during this replicated sequential execution is reduced by using multicast.
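The effect of replication and multicast can be sketched with the same kind of toy model, this time counting only the page messages sent by the busiest node. The assumptions here are illustrative and not taken from OpenMP/NOW: node 0 holds the only valid copies of the `pages_read` input pages of the sequential section and, in the baseline, is also the node that executes it; under replication every node executes the section on its own copy, so the `pages_written` output pages are already valid everywhere and need no fetches afterwards. The function `busiest_node_sends` and the strategy names are hypothetical.

```python
def busiest_node_sends(num_nodes: int, pages_written: int, pages_read: int,
                       strategy: str) -> int:
    """Page messages sent by node 0, the busiest node in this toy model.

    Assumes node 0 owns the only valid copies of the section's input pages.
    """
    others = num_nodes - 1
    if strategy == "baseline":
        # Node 0 runs the sequential section alone (its inputs are local);
        # afterwards every other node fetches each written page from node 0.
        return others * pages_written
    if strategy == "replicated_unicast":
        # Every node runs the section itself, so written pages need no fetch,
        # but each other node requests every input page from node 0 separately.
        return others * pages_read
    if strategy == "replicated_multicast":
        # Node 0 sends each input page once and all other nodes receive it.
        return pages_read
    raise ValueError(f"unknown strategy: {strategy!r}")


if __name__ == "__main__":
    for s in ("baseline", "replicated_unicast", "replicated_multicast"):
        print(s, busiest_node_sends(8, 100, 20, s))
```

With 8 nodes, 100 written pages, and 20 input pages, this prints 700, 140, and 20 for the three strategies. The model also suggests why multicast matters: without it, replication only helps when the section reads fewer pages than it writes, whereas multicast removes the factor of (num_nodes - 1) altogether.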
    We have implemented replicated sequential execution with multicast support in OpenMP/NOW, a version of OpenMP that runs on networks of workstations. We do not rely on compile-time data analysis, and therefore we can handle irregular and pointer-based applications. We show significant improvement for two pointer-based applications that suffer from severe contention without replicated sequential execution.
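The point about pointer-based applications rests on a simple property: re-executing the same deterministic code on identical inputs yields identical results, even when the memory it touches is only discovered at run time. The sketch below illustrates that property on private per-node copies of a linked list. It is an illustration under assumptions, not OpenMP/NOW's page-level mechanism; `Heap`, `sequential_section`, `replicate`, and `values` are hypothetical names, and the section is assumed to be deterministic (no random numbers, timers, or I/O whose results could differ between nodes).

```python
import copy
from typing import Dict, List, Optional, Tuple

# Hypothetical heap model: node id -> (value, id of next node or None).
Heap = Dict[int, Tuple[int, Optional[int]]]


def sequential_section(heap: Heap, head: int) -> None:
    """Insert value + 1 after every even value in the list.

    Which entries are read and written depends on the values at run time,
    so the access pattern is not known before execution.
    """
    next_id = max(heap) + 1
    cur: Optional[int] = head
    while cur is not None:
        value, nxt = heap[cur]
        if value % 2 == 0:
            heap[next_id] = (value + 1, nxt)  # new node points to cur's successor
            heap[cur] = (value, next_id)      # splice the new node in after cur
            next_id += 1
        cur = nxt  # continue with the original successor, skipping the new node


def replicate(heap: Heap, head: int, num_nodes: int) -> List[Heap]:
    """Run the same deterministic section on every node's private copy."""
    replicas = [copy.deepcopy(heap) for _ in range(num_nodes)]
    for replica in replicas:
        sequential_section(replica, head)
    return replicas


def values(heap: Heap, head: int) -> List[int]:
    """Walk the list from head and collect its values."""
    out: List[int] = []
    cur: Optional[int] = head
    while cur is not None:
        value, cur = heap[cur]
        out.append(value)
    return out


if __name__ == "__main__":
    shared = {0: (2, 1), 1: (3, 2), 2: (4, None)}
    copies = replicate(shared, head=0, num_nodes=4)
    print(all(c == copies[0] for c in copies), values(copies[0], 0))
```

Every replica ends in the same state without any modified data being shipped between nodes; the price is that each node must hold the section's inputs, which in this sketch are simply duplicated up front.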


    Cited By

    • (2019) Renaissance: benchmarking suite for parallel applications on the JVM. Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 31-47. DOI: 10.1145/3314221.3314637. Online publication date: 8-Jun-2019.
    • (2003) Quantifying contention and balancing memory load on hardware DSM multiprocessors. Journal of Parallel and Distributed Computing, 63(9):866-886. DOI: 10.1016/S0743-7315(03)00105-9. Online publication date: 1-Sep-2003.
    • (2002) Quantifying and Resolving Remote Memory Access Contention on Hardware DSM Multiprocessors. Proceedings of the 16th International Parallel and Distributed Processing Symposium. DOI: 10.5555/645610.660885. Online publication date: 15-Apr-2002.



    Published In

    PPoPP '01: Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
    June 2001
    142 pages
    ISBN: 1581133464
    DOI: 10.1145/379539
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Qualifiers

    • Article

    Conference

    PPoPP '01

    Acceptance Rates

    Overall Acceptance Rate 230 of 1,014 submissions, 23%


    Article Metrics

    • Downloads (last 12 months): 3
    • Downloads (last 6 weeks): 0
    Reflects downloads up to 10 Aug 2024
