Article


Distributed filaments: efficient fine-grain parallelism on a cluster of workstations

Authors:

Vincent W. Freeh,

David K. Lowenthal,

Gregory R. Andrews

OSDI '94: Proceedings of the 1st USENIX conference on Operating Systems Design and Implementation

Pages 15 - es

Published: 14 November 1994


Abstract

A fine-grain parallel program is one in which processes are typically small, ranging from a few to a few hundred instructions. Fine-grain parallelism arises naturally in many situations, such as iterative grid computations, recursive fork/join programs, the bodies of parallel FOR loops, and the implicit parallelism in functional or dataflow languages. Fine-grain parallelism is useful both for describing massively parallel computations and as a target for code generation by compilers. However, it has long been thought to be inefficient due to the overheads of process creation, context switching, and synchronization. This paper describes a software kernel, Distributed Filaments (DF), that implements fine-grain parallelism both portably and efficiently on a workstation cluster. DF runs on existing, off-the-shelf hardware and software, and it has a simple interface, so it is easy to use. DF achieves efficiency by using stateless threads on each node, overlapping communication and computation, employing a new reliable datagram communication protocol, and automatically balancing the work generated by fork/join computations.





Published In


OSDI '94: Proceedings of the 1st USENIX conference on Operating Systems Design and Implementation

November 1994

228 pages

Sponsors

SIGOPS: ACM Special Interest Group on Operating Systems
USENIX Assoc: USENIX Assoc
IEEE Technical Committee on Operating Systems (TCOS)

Publisher

USENIX Association

United States

Publication History

Published: 14 November 1994


Qualifiers

Article


Article Metrics

Total Citations: 21
Total Downloads: 176

Downloads (Last 12 months): 38
Downloads (Last 6 weeks): 8

Reflects downloads up to 27 Jan 2025.


Citations

Cited By

Agrawal K, Li J, Lu K, Moseley B, Scheideler C, Gilbert S (2016). Scheduling Parallelizable Jobs Online to Minimize the Maximum Flow Time. Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures, pp. 195-205. doi:10.1145/2935764.2935782. Online publication date: 11-Jul-2016.
https://dl.acm.org/doi/10.1145/2935764.2935782
Lee I, Boyd-Wickizer S, Huang Z, Leiserson C, Salapura V, Gschwind M, Knoop J (2010). Using memory mapping to support cactus stacks in work-stealing runtime systems. Proceedings of the 19th international conference on Parallel architectures and compilation techniques, pp. 411-420. doi:10.1145/1854273.1854324. Online publication date: 11-Sep-2010.
https://dl.acm.org/doi/10.1145/1854273.1854324
Agrawal K, Lee I, Sukha J, auf der Heide F, Phillips C (2010). Brief announcement. Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures, pp. 186-188. doi:10.1145/1810479.1810517. Online publication date: 13-Jun-2010.
https://dl.acm.org/doi/10.1145/1810479.1810517
Balasubramanian K, Lowenthal D (2003). Efficient support for pipelining in software distributed shared memory systems. Real-time system security, pp. 95-121. doi:10.5555/903866.903874. Online publication date: 1-Jan-2003.
https://dl.acm.org/doi/10.5555/903866.903874
Price G, Lowenthal D (2003). A comparative analysis of fine-grain threads packages. Journal of Parallel and Distributed Computing, 63(11), pp. 1050-1063. doi:10.1016/j.jpdc.2003.06.001. Online publication date: 1-Nov-2003.
https://dl.acm.org/doi/10.1016/j.jpdc.2003.06.001
Narlikar G, Blelloch G (1999). Space-efficient scheduling of nested parallelism. ACM Transactions on Programming Languages and Systems, 21(1), pp. 138-173. doi:10.1145/314602.314607. Online publication date: 1-Jan-1999.
https://dl.acm.org/doi/10.1145/314602.314607
Arora N, Blumofe R, Plaxton C, Miller G, Gibbons P (1998). Thread scheduling for multiprogrammed multiprocessors. Proceedings of the tenth annual ACM symposium on Parallel algorithms and architectures, pp. 119-129. doi:10.1145/277651.277678. Online publication date: 1-Jun-1998.
https://dl.acm.org/doi/10.1145/277651.277678
Thitikamol K, Keleher P (1998). Per-Node Multithreading and Remote Latency. IEEE Transactions on Computers, 47(4), pp. 414-426. doi:10.1109/12.675711. Online publication date: 1-Apr-1998.
https://dl.acm.org/doi/10.1109/12.675711
Blumofe R, Lisiecki P (1997). Adaptive and reliable parallel computing on networks of workstations. Proceedings of the annual conference on USENIX Annual Technical Conference, p. 10. doi:10.5555/1268680.1268690. Online publication date: 6-Jan-1997.
https://dl.acm.org/doi/10.5555/1268680.1268690
Narlikar G, Blelloch G (1997). Space-efficient implementation of nested parallelism. ACM SIGPLAN Notices, 32(7), pp. 25-36. doi:10.1145/263767.263770. Online publication date: 21-Jun-1997.
https://dl.acm.org/doi/10.1145/263767.263770
