The fuzzy barrier: a mechanism for high speed synchronization of processors

Author:

Rajiv Gupta

ACM SIGARCH Computer Architecture News, Volume 17, Issue 2

Pages 54 - 63

https://doi.org/10.1145/68182.68187

Published: 01 April 1989

Abstract

Parallel programs are commonly written using barriers to synchronize parallel processes. Upon reaching a barrier, a processor must stall until all participating processors reach the barrier. A software implementation of the barrier mechanism using shared variables has two major drawbacks. First, executing the barrier may be slow, since it may not only require several instructions but also result in hot-spot accesses. Second, processors that are stalled waiting for other processors to reach the barrier are essentially idling and cannot do any useful work. This paper presents the notion of the fuzzy barrier, which avoids both drawbacks. The first problem is avoided by implementing the mechanism in hardware. The second problem is solved by extending the barrier concept to include a region of statements that a processor can execute while it awaits synchronization. These barrier regions are constructed by a compiler and consist of several instructions, such that a processor is ready to synchronize upon reaching the first instruction in the region and must synchronize before exiting the region. When synchronization does occur, the processors may be executing at any point in their respective barrier regions. The larger the barrier region, the more likely it is that none of the processors will have to stall. Preliminary investigations show that barrier regions can be large and that program transformations can significantly increase their size. Examples of situations where such a mechanism can improve performance are presented. Results based on a software implementation of the fuzzy barrier on the Encore multiprocessor indicate that the mechanism can greatly reduce synchronization overhead.
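The barrier-region idea in the abstract can be illustrated in software with a split-phase barrier. The following is a minimal sketch only, not the paper's hardware mechanism and not its Encore implementation, neither of which is described in the abstract. The names `FuzzyBarrier`, `enter`, `leave`, and `run_demo` are hypothetical. A thread calls `enter` when it is ready to synchronize, runs its barrier region, and calls `leave` before it touches data produced by other threads; only `leave` can block.

```python
import threading


class FuzzyBarrier:
    """Hypothetical split-phase barrier: enter() never blocks, leave() may."""

    def __init__(self, parties):
        self._parties = parties
        self._arrived = 0
        self._phase = 0
        self._cond = threading.Condition()

    def enter(self):
        """Mark this thread as ready to synchronize; return the current phase."""
        with self._cond:
            ticket = self._phase
            self._arrived += 1
            if self._arrived == self._parties:
                # The last arrival completes the phase and wakes any waiters.
                self._arrived = 0
                self._phase += 1
                self._cond.notify_all()
            return ticket

    def leave(self, ticket):
        """Block only while some thread has not yet entered phase `ticket`."""
        with self._cond:
            while self._phase == ticket:
                self._cond.wait()


def run_demo(n_workers=4, n_steps=3):
    """Each worker publishes a value, then reads its neighbour's value."""
    barrier = FuzzyBarrier(n_workers)
    shared = [[None] * n_workers for _ in range(n_steps)]
    totals = [0] * n_workers

    def worker(i):
        for step in range(n_steps):
            shared[step][i] = i + step      # value other workers will read
            ticket = barrier.enter()        # ready to synchronize
            _ = sum(range(1000))            # barrier region: no shared data touched
            barrier.leave(ticket)           # synchronization must be complete here
            totals[i] += shared[step][(i + 1) % n_workers]

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return totals


if __name__ == "__main__":
    print(run_demo())  # prints [6, 9, 12, 3]
```

In this sketch, the more work a thread can place between `enter` and `leave`, the more likely it is that all other threads have already arrived and that `leave` returns without waiting, which mirrors the abstract's point about larger barrier regions.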

Published In

ACM SIGARCH Computer Architecture News Volume 17, Issue 2

Special issue: Proceedings of ASPLOS-III: the third international conference on architecture support for programming languages and operating systems

April 1989

291 pages

ISSN:0163-5964

DOI:10.1145/68182

Editor:
Joel Emer

ASPLOS III: Proceedings of the third international conference on Architectural support for programming languages and operating systems
April 1989
303 pages
ISBN:0897913000
DOI:10.1145/70082
Chairman: Joel Emer
General Chair: John Hennessy, Stanford University

Copyright © 1989 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 April 1989

Published in SIGARCH Volume 17, Issue 2

Qualifiers

Article

Bibliometrics & Citations

Article Metrics

Total Citations: 167
Total Downloads: 1,109
Downloads (Last 12 months): 91
Downloads (Last 6 weeks): 14

Cited By

Deshpande C, Parzefall F, Hetzelt F, Franz M (2024). Polynima: Practical Hybrid Recompilation for Multithreaded Binaries. Proceedings of the Nineteenth European Conference on Computer Systems, pp. 1126-1141. Online publication date: 22-Apr-2024.
https://dl.acm.org/doi/10.1145/3627703.3650065
Liu B, Huang J, Rauchwerger L (2019). Rethinking Incremental and Parallel Pointer Analysis. ACM Transactions on Programming Languages and Systems, 41:1, pp. 1-31. Online publication date: 1-Mar-2019.
https://dl.acm.org/doi/10.1145/3293606
Andersen L, St-Amour V, Vitek J, Felleisen M (2018). Feature-Specific Profiling. ACM Transactions on Programming Languages and Systems, 41:1, pp. 1-34. Online publication date: 19-Dec-2018.
https://dl.acm.org/doi/10.1145/3275519
Cogumbreiro T, Hu R, Martins F, Yoshida N (2018). Dynamic Deadlock Verification for General Barrier Synchronisation. ACM Transactions on Programming Languages and Systems, 41:1, pp. 1-38. Online publication date: 11-Dec-2018.
https://dl.acm.org/doi/10.1145/3229060
Jannesari A, Tichy W (2014). Library-Independent Data Race Detection. IEEE Transactions on Parallel and Distributed Systems, 25:10, pp. 2606-2616. Online publication date: Oct-2014.
https://doi.org/10.1109/TPDS.2013.209
Caballero D, Duran A, Martorell X (2013). An OpenMP* Barrier Using SIMD Instructions for Intel® Xeon Phi™ Coprocessor. OpenMP in the Era of Low Power Devices and Accelerators, pp. 99-113. Online publication date: 2013.
https://doi.org/10.1007/978-3-642-40698-0_8
Martins F, T. Vasconcelos V, Cogumbreiro T (2011). Types for X10 Clocks. Electronic Proceedings in Theoretical Computer Science, 69, pp. 111-129. Online publication date: 18-Oct-2011.
https://doi.org/10.4204/EPTCS.69.8
Jannesari A, Tichy W (2010). Identifying ad-hoc synchronization for enhanced race detection. 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), pp. 1-10. Online publication date: Apr-2010.
https://doi.org/10.1109/IPDPS.2010.5470343
Matsakis N, Gross T (2009). Programming with intervals. Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing, pp. 203-217. Online publication date: 8-Oct-2009.
https://dl.acm.org/doi/10.1007/978-3-642-13374-9_14
Vander Zanden B (1996). An incremental algorithm for satisfying hierarchies of multiway dataflow constraints. ACM Transactions on Programming Languages and Systems, 18:1, pp. 30-72. Online publication date: 1-Jan-1996.
https://dl.acm.org/doi/10.1145/225540.225543
