DOI: 10.1145/263764.263770

Space-efficient implementation of nested parallelism

Published: 21 June 1997

    Abstract

    Many of today's high-level parallel languages support dynamic, fine-grained parallelism. These languages allow the user to expose all the parallelism in the program, which is typically of a much higher degree than the number of processors. Hence an efficient scheduling algorithm is required to assign computations to processors at runtime. Besides having low overheads and good load balancing, it is important for the scheduling algorithm to minimize the space usage of the parallel program. This paper presents a scheduling algorithm that is provably space-efficient and time-efficient for nested parallel languages. In addition to proving the space and time bounds of the parallel schedule generated by the algorithm, we demonstrate that it is efficient in practice. We have implemented a runtime system that uses our algorithm to schedule parallel threads. The results of executing parallel programs on this system show that our scheduling algorithm significantly reduces memory usage compared to previous techniques, without compromising performance.

    Published In

    PPOPP '97: Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
    June 1997
    287 pages
    ISBN:0897919068
    DOI:10.1145/263764
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. dynamic scheduling
    2. language implementation
    3. multithreading
    4. nested parallelism
    5. space efficiency

    Qualifiers

    • Article

    Conference

    PPoPP97: Principles & Practices of Parallel Programming
    June 18 - 21, 1997
    Las Vegas, Nevada, USA

    Acceptance Rates

    PPOPP '97 Paper Acceptance Rate 26 of 86 submissions, 30%;
    Overall Acceptance Rate 230 of 1,014 submissions, 23%

    Bibliometrics & Citations

    Article Metrics

    • Downloads (last 12 months): 56
    • Downloads (last 6 weeks): 8
    Reflects downloads up to 10 Aug 2024

    Cited By

    • (2009) Concurrent programming method for digital signal processing. 2009 7th International Symposium on Intelligent Systems and Informatics, pp. 267-271. DOI: 10.1109/SISY.2009.5291151. Online publication date: Sep-2009
    • (2008) Abstractions for concurrent programming in embedded systems. 2008 6th International Symposium on Intelligent Systems and Informatics, pp. 1-4. DOI: 10.1109/SISY.2008.4664930. Online publication date: Sep-2008
    • (2003) Cited References. Computer algebra handbook, pp. 493-622. DOI: 10.5555/940131.940137. Online publication date: 1-Jan-2003
    • (2002) Expressing Irregular Computations in Modern Fortran Dialects. Languages, Compilers, and Run-Time Systems for Scalable Computers, pp. 1-16. DOI: 10.1007/3-540-49530-4_1. Online publication date: 24-Sep-2002
    • (2001) Low-contention depth-first scheduling of parallel computations with write-once synchronization variables. Proceedings of the thirteenth annual ACM symposium on Parallel algorithms and architectures, pp. 189-198. DOI: 10.1145/378580.378639. Online publication date: 3-Jul-2001
    • (2001) A general scheduling framework for parallel execution environments. Proceedings First IEEE/ACM International Symposium on Cluster Computing and the Grid, pp. 680-687. DOI: 10.1109/CCGRID.2001.923260. Online publication date: 2001
    • (2000) More types for nested data parallel programming. ACM SIGPLAN Notices 35:9, pp. 94-105. DOI: 10.1145/357766.351249. Online publication date: 1-Sep-2000
    • (2000) More types for nested data parallel programming. Proceedings of the fifth ACM SIGPLAN international conference on Functional programming, pp. 94-105. DOI: 10.1145/351240.351249. Online publication date: 1-Sep-2000
    • (2000) Memory requirements for parallel programs. Parallel Computing 26:13-14, pp. 1739-1763. DOI: 10.1016/S0167-8191(00)00053-3. Online publication date: 1-Dec-2000
    • (1999) Irregular computations in Fortran - expression and implementation strategies. Scientific Programming 7:3-4, pp. 313-326. DOI: 10.1155/1999/607659. Online publication date: 1-Aug-1999
