DOI: 10.5555/645606.661333

Dag-Consistent Distributed Shared Memory

Published: 15 April 1996

Abstract

We introduce dag consistency, a relaxed consistency model for distributed shared memory that is suitable for multithreaded programming. We have implemented dag consistency in software for the Cilk multithreaded runtime system running on a Connection Machine CM5. Our implementation includes a dag-consistent distributed cactus stack for storage allocation. We provide empirical evidence of the flexibility and efficiency of dag consistency for applications that include blocked matrix multiplication, Strassen's matrix multiplication algorithm, and a Barnes-Hut code. Although Cilk schedules the execution of these programs dynamically, their performance is competitive with that of statically scheduled implementations in the literature. We also prove that the number $F_P$ of page faults incurred by a user program running on $P$ processors is bounded in terms of the number $F_1$ of page faults incurred when it runs serially: $F_P \leq F_1 + 2Cs$, where $C$ is the cache size and $s$ is the number of thread migrations executed by Cilk's scheduler.
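
To make the page-fault bound concrete, the following minimal sketch evaluates the right-hand side of the inequality for a few migration counts. It is illustrative only: the function name `page_fault_bound` and the sample numbers are hypothetical, and the sketch assumes the cache size C is measured in pages, which the abstract does not state explicitly.

```python
def page_fault_bound(f1: int, cache_size: int, migrations: int) -> int:
    """Return F_1 + 2*C*s, the stated upper bound on parallel page faults.

    f1         -- page faults of the serial execution (F_1)
    cache_size -- cache size C (assumed here to be counted in pages)
    migrations -- number of thread migrations s performed by the scheduler
    """
    if min(f1, cache_size, migrations) < 0:
        raise ValueError("all quantities must be non-negative")
    return f1 + 2 * cache_size * migrations


# Hypothetical inputs, chosen only to show how the bound scales with s.
serial_faults = 1000
cache_pages = 64
for s in (0, 10, 100):
    print(s, page_fault_bound(serial_faults, cache_pages, s))
```

With no migrations the bound reduces to the serial fault count, and each additional migration adds at most 2C faults to it.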

Published In

IPPS '96: Proceedings of the 10th International Parallel Processing Symposium
April 1996
851 pages
ISBN: 0818672552

Publisher

IEEE Computer Society

United States

Author Tags

  1. cactus stack
  2. dag consistency
  3. distributed shared memory
  4. dynamic scheduling
  5. memory model
  6. multithreading
  7. page faults

Qualifiers

  • Article

Cited By

  • (2019) Degree-of-Node Task Scheduling of Fine-Grained Parallel Programs on Heterogeneous Systems. Journal of Computer Science and Technology 34:5 (1096-1108). DOI: 10.1007/s11390-019-1962-4. Online publication date: 1-Sep-2019
  • (2017) Designing Scalable Distributed Memory Models. Proceedings of the Computing Frontiers Conference (174-182). DOI: 10.1145/3075564.3077425. Online publication date: 15-May-2017
  • (2014) O-structures. Proceedings of the workshop on Memory Systems Performance and Correctness (1-8). DOI: 10.1145/2618128.2618130. Online publication date: 13-Jun-2014
  • (2012) Automatic generation of software pipelines for heterogeneous parallel systems. Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (1-12). DOI: 10.5555/2388996.2389029. Online publication date: 10-Nov-2012
  • (2011) Scheduling irregular parallel computations on hierarchical caches. Proceedings of the twenty-third annual ACM symposium on Parallelism in algorithms and architectures (355-366). DOI: 10.1145/1989493.1989553. Online publication date: 4-Jun-2011
  • (2011) Brief announcement. Proceedings of the twenty-third annual ACM symposium on Parallelism in algorithms and architectures (257-258). DOI: 10.1145/1989493.1989531. Online publication date: 4-Jun-2011
  • (2010) Low depth cache-oblivious algorithms. Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures (189-199). DOI: 10.1145/1810479.1810519. Online publication date: 13-Jun-2010
  • (2009) Brief announcement. Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures (121-123). DOI: 10.1145/1583991.1584024. Online publication date: 11-Aug-2009
  • (2007) R-Kleene. Algorithmica 47:2 (203-213). DOI: 10.5555/3118786.3119254. Online publication date: 1-Feb-2007
  • (2001) Y-Invalidate. International Journal of Parallel Programming 29:6 (583-606). DOI: 10.1023/A:1013190403622. Online publication date: 1-Dec-2001