Article


Efficient synchronization primitives for large-scale cache-coherent multiprocessors

Authors:

James R. Goodman,

Mary K. Vernon, and

Philip J. Woest

ASPLOS III: Proceedings of the third international conference on Architectural support for programming languages and operating systems

April 1989

Pages 64 - 75

https://doi.org/10.1145/70082.68188

Published: 01 April 1989

Abstract

This paper proposes a set of efficient primitives for process synchronization in multiprocessors. The only assumptions made in developing the set of primitives are that hardware combining is not implemented in the interconnect and (in one case) that the interconnect supports broadcast.

The primitives make use of synchronization bits (syncbits) to provide a simple mechanism for mutual exclusion. The proposed implementation of the primitives includes efficient (i.e., local) busy-waiting for syncbits. In addition, a hardware-supported mechanism for maintaining a first-come, first-served queue of requests for a syncbit is proposed. This queueing mechanism allows for a very efficient implementation of, as well as fair access to, binary semaphores. We also propose implementing Fetch-and-Add with combining in software rather than in hardware. This allows an architecture to scale to a large number of processors while avoiding the cost of hardware combining.
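The abstract describes a hardware-maintained first-come, first-served queue of requests for a syncbit, with each waiter busy-waiting locally. The hardware mechanism itself is not described on this page, so the following is only a software analog under stated assumptions: a queue lock in the style of Mellor-Crummey and Scott (MCS), a later software technique and not the authors' design. Each waiter appends its own record to a queue and waits only on a flag in that record, so the lock is granted in arrival order and waiters do not contend on one shared location. The names `QueueLock`, `_QNode`, and `demo_counter` are hypothetical; `threading.Event` stands in for spinning on a locally cached flag, and `_atomic` emulates the atomic swap and compare-and-swap that real hardware would provide.

```python
import threading


class _QNode:
    """Per-waiter queue record; a waiter blocks only on its own events."""

    def __init__(self) -> None:
        self.next = None                  # successor in the FCFS queue
        self.linked = threading.Event()   # set once a successor has linked in
        self.granted = threading.Event()  # set by the predecessor on hand-off


class QueueLock:
    """FCFS queue lock in the style of MCS (a software analog, not the paper's hardware queue).

    `_atomic` is an assumption: it emulates atomic swap / compare-and-swap
    on the queue tail, which real hardware would supply directly.
    """

    def __init__(self) -> None:
        self._tail = None
        self._atomic = threading.Lock()

    def _swap_tail(self, node):
        with self._atomic:
            prev, self._tail = self._tail, node
            return prev

    def _cas_tail(self, expected, new) -> bool:
        with self._atomic:
            if self._tail is expected:
                self._tail = new
                return True
            return False

    def acquire(self) -> _QNode:
        node = _QNode()
        pred = self._swap_tail(node)      # append this request to the queue tail
        if pred is not None:
            pred.next = node              # link behind the predecessor
            pred.linked.set()
            node.granted.wait()           # wait on our own flag only
        return node

    def release(self, node: _QNode) -> None:
        if node.next is None:
            if self._cas_tail(node, None):
                return                    # no waiters: the queue is now empty
            node.linked.wait()            # a successor is mid-enqueue; let it link
        node.next.granted.set()           # hand the lock to the next in line


def demo_counter(num_threads: int = 4, iters: int = 200) -> int:
    """Increment a shared counter under the lock; returns the final count."""
    lock = QueueLock()
    total = 0

    def worker() -> None:
        nonlocal total
        for _ in range(iters):
            node = lock.acquire()
            total += 1                    # critical section
            lock.release(node)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total
```

Release hands the lock directly to the successor, which is what makes access fair; the caller must pass the record returned by `acquire` back to `release`.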

Scenarios for common synchronization events such as work queues and barriers are presented to demonstrate the generality and ease of use of the proposed primitives. The efficient implementation of the primitives is simpler if the multiprocessor has a hardware cache-consistency protocol. To illustrate this point, we outline how the primitives would be implemented in the Multicube multiprocessor [GoWo88].
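The abstract proposes combining Fetch-and-Add in software rather than in the interconnect and names barriers as one demonstration scenario. The paper's own barrier code is not reproduced on this page; as an illustration of the general idea only, here is a generic software combining-tree barrier, a well-known technique in the spirit of Yew, Tzeng, and Lawrie [17]. Arrivals are counted at small tree nodes, and only the last arriver at each node carries one combined increment to its parent, so no counter receives more than `fan_in` updates per episode. The names `CombiningTreeBarrier`, `_TreeNode`, and `demo_barrier` are hypothetical, each node's `threading.Lock` emulates an atomic fetch-and-increment, and for brevity the release uses a single shared sense flag guarded by a condition variable rather than per-processor local spinning.

```python
import threading


class _TreeNode:
    """One combining node: counts arrivals from at most `fan_in` children."""

    def __init__(self, expected: int) -> None:
        self.expected = expected        # arrivals needed to complete this node
        self.count = 0
        self.parent = None
        self.lock = threading.Lock()    # emulates atomic fetch-and-increment


class CombiningTreeBarrier:
    def __init__(self, num_threads: int, fan_in: int = 2) -> None:
        self._fan_in = fan_in
        self._sense = False
        self._cv = threading.Condition()
        # Leaf level: thread i arrives at leaf i // fan_in.
        level = [_TreeNode(min(fan_in, num_threads - k))
                 for k in range(0, num_threads, fan_in)]
        self._leaves = level
        # Add parent levels until a single root remains.
        while len(level) > 1:
            parents = [_TreeNode(min(fan_in, len(level) - k))
                       for k in range(0, len(level), fan_in)]
            for i, child in enumerate(level):
                child.parent = parents[i // fan_in]
            level = parents

    def _arrive(self, node: _TreeNode) -> None:
        with node.lock:
            node.count += 1
            last = node.count == node.expected
            if last:
                node.count = 0          # reset for the next episode
        if not last:
            return                      # the last arriver carries the combined update
        if node.parent is not None:
            self._arrive(node.parent)   # one combined increment moves up the tree
        else:
            with self._cv:              # root complete: release every waiter
                self._sense = not self._sense
                self._cv.notify_all()

    def wait(self, thread_id: int) -> None:
        with self._cv:
            target = not self._sense    # sense value that ends this episode
        self._arrive(self._leaves[thread_id // self._fan_in])
        with self._cv:
            while self._sense != target:
                self._cv.wait()


def demo_barrier(num_threads: int = 5, rounds: int = 3) -> list:
    """Log round numbers; a correct barrier keeps all of round r before r + 1."""
    barrier = CombiningTreeBarrier(num_threads, fan_in=2)
    log = []
    log_lock = threading.Lock()

    def worker(tid: int) -> None:
        for r in range(rounds):
            with log_lock:
                log.append(r)
            barrier.wait(tid)

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Each thread passes its own `thread_id`, so it always arrives at the same leaf; the sense flag flips once per episode, which lets the same barrier object be reused across rounds without a reset step.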

References

[1]

Archibald, J., and J. L. Baer, "Cache Coherence Protocols: Evaluation Using a Multiprocessor Simulation Model," ACM Transactions on Computer Systems, November 1986, pp. 273-298.

[2]

Baer, J. L., and W. H. Wang, "Architectural Choices for Multilevel Cache Hierarchies," Proceedings of the 1987 International Conference on Parallel Processing, August 1987, pp. 258-261.

[3]

Bell, C. G., "Multis: A New Class of Multiprocessor Computers," Science, April 26, 1985, pp. 462-467.

[4]

Bitar, P., and A. M. Despain, "Multiprocessor Cache Synchronization Issues, Innovations, Evolution," Proceedings of the 13th Annual International Symposium on Computer Architecture, June 1986, pp. 424-433.

[5]

Brantley, W. C., K. P. McAuliffe, and J. Weiss, "RP3 Processor-Memory Element," Proceedings of the 1985 International Conference on Parallel Processing, August 1985, pp. 782-789.

[6]

Brooks, E. D., "The Butterfly Barrier," International Journal of Parallel Programming, August 1986, pp. 295-307.

[7]

Goodman, J. R., M. D. Hill, and P. J. Woest, "Scalability and Its Application to Multicube," submitted to the 16th Annual International Symposium on Computer Architecture, May 1989.

[8]

Goodman, J. R., and P. J. Woest, "The Wisconsin Multicube: A New Large-Scale Cache-Coherent Multiprocessor," Proceedings of the 15th Annual International Symposium on Computer Architecture, June 1988, pp. 422-431.

[9]

Gottlieb, A., B. D. Lubachevsky, and L. Rudolph, "Basic Techniques for the Efficient Coordination of Very Large Numbers of Cooperating Sequential Processors," ACM Transactions on Programming Languages and Systems, April 1983, pp. 164-189.

[10]

Gottlieb, A., R. Grishman, C. P. Kruskal, K. P. McAuliffe, L. Rudolph, and M. Snir, "The NYU Ultracomputer -- Designing an MIMD, Shared Memory Parallel Machine," IEEE Transactions on Computers, February 1983, pp. 175-189.

[11]

Jordan, H. F., "Performance Measurements on HEP -- a Pipelined MIMD Computer," Proceedings of the 10th Annual International Symposium on Computer Architecture, June 1983, pp. 207-212.

[12]

Leutenegger, S. T., and M. K. Vernon, "A Mean-Value Performance Analysis of a New Multiprocessor Architecture," Proceedings of the 1988 ACM SIGMETRICS Conference, May 1988, pp. 167-176.

[13]

Lundstrom, S. F., "Applications Considerations in the System Design of Highly Concurrent Multiprocessors," IEEE Transactions on Computers, November 1987, pp. 1292-1309.

[14]

Osterhaug, A., Guide to Parallel Programming on Sequent Computer Systems, 2nd ed., Sequent Computer Systems, Inc., Beaverton, Oregon, 1987.

[15]

Pfister, G. F., and V. A. Norton, "Hot Spot Contention and Combining in Multistage Interconnection Networks," Proceedings of the 1985 International Conference on Parallel Processing, August 1985, pp. 790-797.

[16]

Rudolph, L., and Z. Segall, "Dynamic Decentralized Cache Schemes for MIMD Parallel Processors," Proceedings of the 11th Annual International Symposium on Computer Architecture, June 1984, pp. 340-347.

[17]

Yew, P. C., N. F. Tzeng, and D. H. Lawrie, "Distributing Hot-Spot Addressing in Large-Scale Multiprocessors," IEEE Transactions on Computers, April 1987, pp. 388-395.

[18]

Zhu, C. Q., and P. C. Yew, "A Scheme to Enforce Data Dependence on Large Multiprocessor Systems," IEEE Transactions on Software Engineering, June 1987, pp. 726-739.

Information

Published In

ASPLOS III: Proceedings of the third international conference on Architectural support for programming languages and operating systems

April 1989

303 pages

ISBN:0897913000

DOI:10.1145/70082

Chairman:
Joel Emer
General Chair:
John Hennessy
Stanford University

ACM SIGARCH Computer Architecture News Volume 17, Issue 2
Special issue: Proceedings of ASPLOS-III: the third international conference on architecture support for programming languages and operating systems
April 1989
291 pages
ISSN:0163-5964
DOI:10.1145/68182
Editor:
Joel Emer

Copyright © 1989 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 April 1989

Qualifiers

Article

Conference

ASPLOS89

Sponsor:

ASPLOS89: Int'l Conference on Architecture Support for Programming Lang & Operating Systems

April 3 - 6, 1989

Boston, Massachusetts, USA

Acceptance Rates

Overall Acceptance Rate 535 of 2,713 submissions, 20%

Bibliometrics & Citations

Article Metrics

Total citations: 236

Total downloads: 1,540

Downloads (last 12 months): 113

Downloads (last 6 weeks): 9

Citations

Cited By

Cho K, Jeon S, Raad A, Kang J (2023). Memento: A Framework for Detectable Recoverability in Persistent Memory. Proceedings of the ACM on Programming Languages 7:PLDI (292-317). DOI: 10.1145/3591232. Online publication date: 6-Jun-2023.
https://dl.acm.org/doi/10.1145/3591232
Milman-Sela G, Kogan A, Lev Y, Luchangco V, Petrank E (2022). BQ: A Lock-Free Queue with Batching. ACM Transactions on Parallel Computing 9:1 (1-49). DOI: 10.1145/3512757. Online publication date: 23-Mar-2022.
https://dl.acm.org/doi/10.1145/3512757
Shen Z, Wan Z, Gu Y, Sun Y, Agrawal K, Lee I (2022). Many Sequential Iterative Algorithms Can Be Parallel and (Nearly) Work-efficient. Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures (273-286). DOI: 10.1145/3490148.3538574. Online publication date: 11-Jul-2022.
https://dl.acm.org/doi/10.1145/3490148.3538574
Giannoula C, Vijaykumar N, Papadopoulou N, Karakostas V, Fernandez I, Gomez-Luna J, Orosa L, Koziris N, Goumas G, Mutlu O (2021). SynCron: Efficient Synchronization Support for Near-Data-Processing Architectures. 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA) (263-276). DOI: 10.1109/HPCA51647.2021.00031. Online publication date: Feb-2021.
https://doi.org/10.1109/HPCA51647.2021.00031
Ekemark P, Yao Y, Ros A, Sagonas K, Kaxiras S (2021). TSOPER: Efficient Coherence-Based Strict Persistency. 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA) (125-138). DOI: 10.1109/HPCA51647.2021.00021. Online publication date: Feb-2021.
https://doi.org/10.1109/HPCA51647.2021.00021
Duţu A, Sinclair M, Beckmann B, Wood D, Chow M, Martínez J, Duato J, Eeckhout L (2020). Independent forward progress of work-groups. Proceedings of the ACM/IEEE 47th Annual International Symposium on Computer Architecture (1022-1035). DOI: 10.1109/ISCA45697.2020.00087. Online publication date: 30-May-2020.
https://dl.acm.org/doi/10.1109/ISCA45697.2020.00087
Steil J, Tonsen M, Sugano Y, Bulling A (2019). InvisibleEye. GetMobile: Mobile Computing and Communications 23:2 (30-34). DOI: 10.1145/3372300.3372307. Online publication date: 14-Nov-2019.
https://dl.acm.org/doi/10.1145/3372300.3372307
Tang X, Zhai J, Qian X, Chen W, Bahar I, Herlihy M, Witchel E, Lebeck A (2019). pLock. Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (765-778). DOI: 10.1145/3297858.3304030. Online publication date: 4-Apr-2019.
https://dl.acm.org/doi/10.1145/3297858.3304030
M. Yaghini P, Michelogiannakis G, V. Gratz P (2019). SpecLock: Speculative Lock Forwarding. 2019 IEEE 37th International Conference on Computer Design (ICCD) (273-282). DOI: 10.1109/ICCD46524.2019.00041. Online publication date: Nov-2019.
https://doi.org/10.1109/ICCD46524.2019.00041
Almeida P, Baquero C (2019). Scalable eventually consistent counters over unreliable networks. Distributed Computing 32:1 (69-89). DOI: 10.1007/s00446-017-0322-2. Online publication date: 1-Feb-2019.
https://dl.acm.org/doi/10.1007/s00446-017-0322-2