The fuzzy barrier: a mechanism for high speed synchronization of processors

Published: 01 April 1989

    Abstract

    Parallel programs are commonly written using barriers to synchronize parallel processes. Upon reaching a barrier, a processor must stall until all participating processors reach the barrier. A software implementation of the barrier mechanism using shared variables has two major drawbacks. First, executing the barrier may be slow, because it may not only require the execution of several instructions but also result in hot-spot accesses. Second, processors that are stalled waiting for other processors to reach the barrier are essentially idle and cannot do any useful work. In this paper, the notion of the fuzzy barrier, which avoids both drawbacks, is presented. The first problem is avoided by implementing the mechanism in hardware. The second problem is solved by extending the barrier concept to include a region of statements that a processor can execute while it awaits synchronization. The barrier regions are constructed by a compiler and consist of several instructions, such that a processor is ready to synchronize upon reaching the first instruction in the region and must synchronize before exiting the region. When synchronization does occur, the processors could be executing at any point in their respective barrier regions. The larger the barrier region, the more likely it is that none of the processors will have to stall. Preliminary investigations show that barrier regions can be large and that program transformations can significantly increase their size. Examples of situations where such a mechanism can improve performance are presented. Results based on a software implementation of the fuzzy barrier on the Encore multiprocessor indicate that the mechanism can greatly reduce synchronization overhead.
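
The abstract defines a barrier region only by its two boundaries: a processor becomes ready to synchronize at the region's first instruction and must have synchronized before leaving it. A minimal software analogue of those semantics, assuming Python threads on a shared-memory machine, is a split-phase barrier in which a non-blocking `arrive()` marks entry into the region and a `wait()` marks its exit. This is a sketch, not the paper's hardware mechanism or its Encore implementation. The names `FuzzyBarrier`, `arrive`, `wait`, and `run_demo` are hypothetical, and the single lock-protected counter is precisely the kind of shared hot spot the hardware version is meant to avoid.

```python
import threading


class FuzzyBarrier:
    """Hypothetical software split-phase ("fuzzy") barrier sketch.

    arrive() marks entry into a barrier region and never blocks;
    wait(token) marks exit from the region and blocks only if some
    participant has not yet arrived for the same episode.
    """

    def __init__(self, parties: int) -> None:
        if parties < 1:
            raise ValueError("parties must be >= 1")
        self._parties = parties
        self._count = 0        # arrivals so far in the current episode
        self._episode = 0      # incremented when the last participant arrives
        self._cond = threading.Condition()

    def arrive(self) -> int:
        """Enter the barrier region; return the episode token for wait()."""
        with self._cond:
            token = self._episode
            self._count += 1
            if self._count == self._parties:
                # Last arrival completes the episode and releases any waiters.
                self._count = 0
                self._episode += 1
                self._cond.notify_all()
            return token

    def wait(self, token: int) -> None:
        """Exit the barrier region; stall only if the episode is incomplete."""
        with self._cond:
            while self._episode == token:
                self._cond.wait()


def run_demo(n_threads: int = 4, n_phases: int = 3) -> list[list[int]]:
    """Each thread publishes a value, overlaps independent work with the
    synchronization, then reads the values published by all threads."""
    barrier = FuzzyBarrier(n_threads)
    shared = [0] * n_threads
    observed: list[list[int]] = [[] for _ in range(n_threads)]

    def worker(tid: int) -> None:
        for phase in range(1, n_phases + 1):
            shared[tid] = phase                # result other threads depend on
            token = barrier.arrive()           # first instruction of the region
            _ = sum(range(1_000))              # stand-in for independent work
            barrier.wait(token)                # region exit: must synchronize
            observed[tid].append(min(shared))  # every slot holds phase or phase + 1

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return observed


if __name__ == "__main__":
    print(run_demo())
```

Calling `wait()` immediately after `arrive()` degenerates into an ordinary barrier. The more independent work placed between the two calls, the more likely it is that every other thread has already arrived by the time `wait()` runs, so it returns without stalling; this mirrors the abstract's point that larger barrier regions reduce stalls.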

    Published In

    ACM SIGARCH Computer Architecture News, Volume 17, Issue 2
    Special issue: Proceedings of ASPLOS-III: the third international conference on architectural support for programming languages and operating systems
    April 1989
    291 pages
    ISSN:0163-5964
    DOI:10.1145/68182
    • ACM Conferences
      ASPLOS III: Proceedings of the third international conference on Architectural support for programming languages and operating systems
      April 1989
      303 pages
      ISBN:0897913000
      DOI:10.1145/70082
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 01 April 1989
    Published in SIGARCH Volume 17, Issue 2

    Qualifiers

    • Article

    Bibliometrics & Citations

    Article Metrics

    • Downloads (last 12 months): 91
    • Downloads (last 6 weeks): 14

    Cited By

    • (2024) Polynima: Practical Hybrid Recompilation for Multithreaded Binaries. Proceedings of the Nineteenth European Conference on Computer Systems, pp. 1126-1141. DOI: 10.1145/3627703.3650065. Online publication date: 22-Apr-2024.
    • (2019) Rethinking Incremental and Parallel Pointer Analysis. ACM Transactions on Programming Languages and Systems, 41:1, pp. 1-31. DOI: 10.1145/3293606. Online publication date: 1-Mar-2019.
    • (2018) Feature-Specific Profiling. ACM Transactions on Programming Languages and Systems, 41:1, pp. 1-34. DOI: 10.1145/3275519. Online publication date: 19-Dec-2018.
    • (2018) Dynamic Deadlock Verification for General Barrier Synchronisation. ACM Transactions on Programming Languages and Systems, 41:1, pp. 1-38. DOI: 10.1145/3229060. Online publication date: 11-Dec-2018.
    • (2014) Library-Independent Data Race Detection. IEEE Transactions on Parallel and Distributed Systems, 25:10, pp. 2606-2616. DOI: 10.1109/TPDS.2013.209. Online publication date: Oct-2014.
    • (2013) An OpenMP* Barrier Using SIMD Instructions for Intel® Xeon Phi™ Coprocessor. OpenMP in the Era of Low Power Devices and Accelerators, pp. 99-113. DOI: 10.1007/978-3-642-40698-0_8. Online publication date: 2013.
    • (2011) Types for X10 Clocks. Electronic Proceedings in Theoretical Computer Science, 69, pp. 111-129. DOI: 10.4204/EPTCS.69.8. Online publication date: 18-Oct-2011.
    • (2010) Identifying ad-hoc synchronization for enhanced race detection. 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), pp. 1-10. DOI: 10.1109/IPDPS.2010.5470343. Online publication date: Apr-2010.
    • (2009) Programming with intervals. Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing, pp. 203-217. DOI: 10.1007/978-3-642-13374-9_14. Online publication date: 8-Oct-2009.
    • (1996) An incremental algorithm for satisfying hierarchies of multiway dataflow constraints. ACM Transactions on Programming Languages and Systems, 18:1, pp. 30-72. DOI: 10.1145/225540.225543. Online publication date: 1-Jan-1996.
