Article. DOI: 10.1145/263764.263778

Tradeoffs between false sharing and aggregation in software distributed shared memory

Published: 21 June 1997

    Abstract

    Software Distributed Shared Memory (DSM) systems based on virtual memory techniques traditionally use the hardware page as the consistency unit. The large size of the hardware page is considered to be a performance bottleneck because of the implied false sharing overheads. Instead, we show that in the presence of a relaxed consistency model and a multiple writer protocol, a large consistency unit is generally not detrimental to performance. We study the tradeoffs between false sharing and aggregation effects when using large consistency units. In this context, this paper makes three separate contributions:

    1. We document the cost of false sharing in terms of extra messages and extra data being communicated. We find that, for the applications considered, when the virtual memory page is used as the consistency unit, the number of extra messages is small, while the amount of extra data can be substantial.
    2. We evaluate the performance when the consistency unit is increased to a multiple of the virtual memory page size. For most applications and data sets, the performance improves, except when the false sharing effects include extra messages or a large amount of extra data.
    3. We present a new algorithm for dynamically aggregating pages. In our algorithm, the aggregated pages do not necessarily need to be contiguous. In all cases, the performance of our dynamic aggregation algorithm is similar to that achieved with the best static page size.

    These results were obtained by measuring the performance of eight applications on the TreadMarks distributed shared memory system. The hardware platform used is a network of 166 MHz Pentiums connected by a switched 100 Mbps Ethernet network.
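
    The tradeoff described in the abstract can be illustrated with a toy cost model. The sketch below is not the TreadMarks protocol and not the paper's measurement method. It assumes a single writer and a single reader, one request message per consistency unit that the reader touches and that contains modifications, and byte-granular diffs, so that only modified bytes are transferred. The names `PAGE` and `fetch_cost` are hypothetical.

```python
PAGE = 4096  # assumed virtual memory page size in bytes (hypothetical)


def fetch_cost(modified, read, unit_size):
    """Toy model of one reader fetching updates made by one writer.

    modified, read: sets of byte addresses written by the writer and
    later read by the reader. unit_size: consistency unit in bytes.

    Returns (messages, useful_bytes, extra_bytes, useless_messages):
      messages         - units fetched (one request per unit, by assumption)
      useful_bytes     - modified bytes the reader actually reads
      extra_bytes      - modified bytes transferred but never read
      useless_messages - fetches that carried no useful byte at all
    """
    units_read = {addr // unit_size for addr in read}

    # Group the writer's modifications by consistency unit.
    mod_by_unit = {}
    for addr in modified:
        mod_by_unit.setdefault(addr // unit_size, set()).add(addr)

    messages = useful = extra = useless = 0
    for unit in sorted(units_read):
        diff = mod_by_unit.get(unit)
        if not diff:
            continue  # nothing changed in this unit, so no fetch
        messages += 1
        needed = diff & read
        useful += len(needed)
        extra += len(diff) - len(needed)
        if not needed:
            useless += 1  # the fetch was caused purely by false sharing
    return messages, useful, extra, useless


if __name__ == "__main__":
    written = set(range(4 * PAGE))

    # Aggregation: the reader needs everything, so a larger unit
    # cuts the message count without adding extra data.
    print(fetch_cost(written, set(range(4 * PAGE)), PAGE))      # (4, 16384, 0, 0)
    print(fetch_cost(written, set(range(4 * PAGE)), 4 * PAGE))  # (1, 16384, 0, 0)

    # False sharing as extra data: the reader needs 1 KB only.
    print(fetch_cost(written, set(range(1024)), PAGE))          # (1, 1024, 3072, 0)
    print(fetch_cost(written, set(range(1024)), 4 * PAGE))      # (1, 1024, 15360, 0)

    # False sharing as an extra message: disjoint bytes, same page.
    print(fetch_cost(set(range(2048, 4096)), set(range(1024)), PAGE))  # (1, 0, 2048, 1)
```

    Under these assumptions, enlarging the unit helps when the reader needs most of what was written (fewer messages, no extra data) and hurts when it needs little of it (the same message count, more extra data), which mirrors the qualitative tradeoff the abstract states.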


    Cited By

    • (2013) Exploiting Locality in Lease-Based Replicated Transactional Memory via Task Migration. Distributed Computing (121-133). DOI: 10.1007/978-3-642-41527-2_9. Online publication date: 2013
    • (2012) An Accurate Prefetch Technique for Dynamic Paging Behaviour for Software Distributed Shared Memory. Proceedings of the 2012 41st International Conference on Parallel Processing (209-218). DOI: 10.1109/ICPP.2012.16. Online publication date: 10-Sep-2012
    • (2011) Java Support Packages and Benchmarks for Multi-core Processors. Proceedings of the 2011 IEEE International Conference on High Performance Computing and Communications (528-535). DOI: 10.1109/HPCC.2011.75. Online publication date: 2-Sep-2011


    Published In

    PPOPP '97: Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
    June 1997
    287 pages
    ISBN: 0897919068
    DOI: 10.1145/263764
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Qualifiers

    • Article

    Conference

    PPoPP97: Principles & Practices of Parallel Programming
    June 18 - 21, 1997
    Las Vegas, Nevada, USA

    Acceptance Rates

    PPOPP '97 Paper Acceptance Rate 26 of 86 submissions, 30%;
    Overall Acceptance Rate 230 of 1,014 submissions, 23%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 35
    • Downloads (Last 6 weeks): 7
    Reflects downloads up to 10 Aug 2024

    Citations

    • (2010) Adaptive conflict unit size for distributed optimistic synchronization. Proceedings of the 16th international Euro-Par conference on Parallel processing: Part I (547-559). DOI: 10.5555/1887695.1887755. Online publication date: 31-Aug-2010
    • (2010) Region-Based Prefetch Techniques for Software Distributed Shared Memory Systems. Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing (113-122). DOI: 10.1109/CCGRID.2010.16. Online publication date: 17-May-2010
    • (2004) Performance analysis of methods that overcome false sharing effects in software DSMs. Journal of Parallel and Distributed Computing 64:8 (887-907). DOI: 10.1016/j.jpdc.2004.02.003. Online publication date: 1-Aug-2004
    • (2002) Optimistic Synchronization and Transactional Consistency. 2nd IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGRID'02) (331-331). DOI: 10.1109/CCGRID.2002.1017155. Online publication date: 2002
    • (2001) Multiple-writer entry consistency. Cluster computing (97-108). DOI: 10.5555/770406.770416. Online publication date: 1-Jan-2001
    • (2001) Transparent adaptation of sharing granularity in multiview-based DSM systems. Proceedings 15th International Parallel and Distributed Processing Symposium. IPDPS 2001 (10). DOI: 10.1109/IPDPS.2001.924974. Online publication date: 2001
    • (2001) A DSM cluster architecture supporting aggressive computation in active networks. Proceedings First IEEE/ACM International Symposium on Cluster Computing and the Grid (547-554). DOI: 10.1109/CCGRID.2001.923241. Online publication date: 2001
