Shasta: a low overhead, software-only approach for supporting fine-grain shared memory

Published: 01 September 1996

Abstract

This paper describes Shasta, a system that supports a shared address space in software on clusters of computers with physically distributed memory. A unique aspect of Shasta compared to most other software distributed shared memory systems is that shared data can be kept coherent at a fine granularity. In addition, the system allows the coherence granularity to vary across different shared data structures in a single application. Shasta implements the shared address space by transparently rewriting the application executable to intercept loads and stores. For each shared load or store, the inserted code checks to see if the data is available locally and communicates with other processors if necessary. The system uses numerous techniques to reduce the run-time overhead of these checks. Since Shasta is implemented entirely in software, it also provides tremendous flexibility in supporting different types of cache coherence protocols. We have implemented an efficient cache coherence protocol that incorporates a number of optimizations, including support for multiple communication granularities and use of relaxed memory models. This system is fully functional and runs on a cluster of Alpha workstations.

The primary focus of this paper is to describe the techniques used in Shasta to reduce the checking overhead for supporting fine granularity sharing in software. These techniques include careful layout of the shared address space, scheduling the checking code for efficient execution on modern processors, using a simple method that checks loads using only the value loaded, reducing the extra cache misses caused by the checking code, and combining the checks for multiple loads and stores. To characterize the effect of these techniques, we present detailed performance results for the SPLASH-2 applications running on an Alpha processor. Without our optimizations, the checking overheads are excessively high, exceeding 100% for several applications. However, our techniques are effective in reducing these overheads to a range of 5% to 35% for almost all of the applications. We also describe our coherence protocol and present some preliminary results on the parallel performance of several applications running on our workstation cluster. Our experience so far indicates that once the cost of checking memory accesses is reduced using our techniques, the Shasta approach is an attractive software solution for supporting a shared address space with fine-grain access to data.
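The inline checks described above can be illustrated with a small simulation. This is a minimal sketch, not Shasta's code: Shasta inserts Alpha machine instructions into the executable, whereas this is Python, and every name and parameter here is a hypothetical choice made for illustration (the 8-word line size, the `FLAG` marker value, the three-state table `INVALID`/`SHARED`/`EXCLUSIVE`, the `Node` class, and the `fetch_line` callback standing in for the coherence protocol). It reads "checks loads using only the value loaded" as a flag-value scheme: invalidated lines are filled with `FLAG`, so a load that returns any other value needs no table lookup, while stores, and loads that do return `FLAG`, consult a per-line state table. `batch_store` shows, under the same assumptions, one check covering several stores to the same line.

```python
from enum import Enum


class State(Enum):
    INVALID = 0
    SHARED = 1      # read-only copy
    EXCLUSIVE = 2   # read-write copy


# Hypothetical parameters chosen for illustration, not Shasta's values.
LINE_WORDS = 8                       # words per coherence line
FLAG = 0x7FFF_FFFF_DEAD_BEEF         # marker written into invalid lines


class Node:
    """One processor's simulated view of a shared region (hypothetical)."""

    def __init__(self, num_lines, fetch_line):
        # Every line starts invalid, so every word holds FLAG.
        self.memory = [FLAG] * (num_lines * LINE_WORDS)
        self.state = [State.INVALID] * num_lines
        self.fetch_line = fetch_line  # hypothetical stand-in for the protocol
        self.misses = 0

    def _line(self, addr):
        return addr // LINE_WORDS

    def _miss(self, line, want):
        # Simulated miss handler: copy the line's data in from fetch_line
        # and record the new state. A real protocol would message other nodes.
        self.misses += 1
        base = line * LINE_WORDS
        self.memory[base:base + LINE_WORDS] = self.fetch_line(line)
        self.state[line] = want

    def invalidate(self, line):
        # Overwrite the line with FLAG so later loads fail the fast check.
        base = line * LINE_WORDS
        self.memory[base:base + LINE_WORDS] = [FLAG] * LINE_WORDS
        self.state[line] = State.INVALID

    def load(self, addr):
        value = self.memory[addr]
        if value != FLAG:
            # Fast path: only the loaded value is examined.
            return value
        # Slow path: FLAG may be legitimate data, so consult the state table.
        line = self._line(addr)
        if self.state[line] is State.INVALID:
            self._miss(line, State.SHARED)
        return self.memory[addr]

    def store(self, addr, value):
        # Stores consult the state table: the line must be held exclusively.
        line = self._line(addr)
        if self.state[line] is not State.EXCLUSIVE:
            self._miss(line, State.EXCLUSIVE)
        self.memory[addr] = value

    def batch_store(self, addr, values):
        # One state check covers several stores, provided they share a line.
        line = self._line(addr)
        if self._line(addr + len(values) - 1) != line:
            raise ValueError("batched stores must fall within one line")
        if self.state[line] is not State.EXCLUSIVE:
            self._miss(line, State.EXCLUSIVE)
        self.memory[addr:addr + len(values)] = values
```

With a `fetch_line` that returns each line's home contents, the first load of a line misses, later loads of non-`FLAG` words are satisfied by the single value compare, and a store to a `SHARED` line triggers an upgrade miss. Because a legitimate datum can equal `FLAG`, the slow path must still check the state table rather than assume a miss.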

Published In

ACM SIGOPS Operating Systems Review, Volume 30, Issue 5
Dec. 1996
273 pages
ISSN:0163-5980
DOI:10.1145/248208

ASPLOS VII: Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
October 1996
290 pages
ISBN:0897917677
DOI:10.1145/237090

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published in SIGOPS Volume 30, Issue 5
