The scalable coherent interface (SCI), a local or extended computer backplane interface being defined by an IEEE standard project (P1596), is discussed. The interconnection is scalable, meaning that up to 64 K processor, memory, or I/O nodes can effectively interface to a shared SCI interconnection. The SCI sharing-list structures are described, and sharing-list addition and removal are examined. Optimizations being considered to improve the performance of large system configurations are discussed. Request combining, a useful feature of linked-list coherence, is described. SCI's optional extensions, including synchronization using a queued-on-lock bit, are considered.
Fernández-Pascual R, Ros A, Acacio M (2017). To be silent or not. The Journal of Supercomputing 73:10 (4428-4443). doi:10.1007/s11227-017-2026-6. Online publication date: 1-Oct-2017
Fernández-Pascual R, Ros A, Acacio M (2016). Are distributed sharing codes a solution to the scalability problem of coherence directories in manycores? An evaluation study. The Journal of Supercomputing 72:2 (612-638). doi:10.1007/s11227-015-1596-4. Online publication date: 1-Feb-2016
LCN '03: Proceedings of the 28th Annual IEEE International Conference on Local Computer Networks
This paper compares and evaluates the multicast performance of two of the most widely deployed System-Area Networks (SANs), Dolphin's Scalable Coherent Interface (SCI) and Myricom's Myrinet. Both networks deliver low latency and high bandwidth to ...
MASCOTS '95: Proceedings of the 3rd International Workshop on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems
The IEEE Std. 1596-1992 for the Scalable Coherent Interface (SCI) specifies a topology-independent communication protocol with the possibility of connecting up to 64 K nodes. SCILab is a collection of tools to simulate the behavior of SCI based ...
Thakkar, Dubois, Laundrie, and Sohi
The authors briefly survey shared-memory multiprocessor hardware architectures, emphasizing the current main directions of research. They do not discuss distributed multiprocessor architectures such as NCube or iPSC. For shared-memory architectures, the authors mention network switching-based architectures (such as BBN's Butterfly) and bus-based architectures (such as Sequent's Symmetry). They say very little about network switching-based architectures, however, and instead focus on directory-based and bus-based schemes for maintaining coherence (providing for the integrity of data shared among the processors during computation).
After quickly reviewing four coherence properties that are incorporated in most protocols for maintaining coherence in shared-memory architectures, the authors summarize the use of presence flags, B pointers, and linked lists as bases for alternative protocols. As an example of the linked list approach, they mention the IEEE Scalable Coherent Interface project.
Protocols for maintaining coherence become more complex and voluminous as the number of processors (and their associated ports and cache memories) increases. The reason for being concerned with coherence is to attempt to avoid a more-than-proportional increase in the complexity and volume of the protocol as the number of processors increases (the “scale” relationship). Another approach to seeking a favorable scale relationship is to modify the hardware for the processor connections. In this area, the authors review bus-based schemes, emphasizing multiple-bus and hierarchical-bus systems. They briefly mention various proposals for differing topologies and roles for the processor connections for enabling access to shared memory.
While the authors profess to have had a lot of help in preparing this survey, the result is not well-balanced. The purpose was to provide a context for three subsequent short papers, one on an example of a bus-based scheme (the Aquarius multiple-bus multiprocessor architecture) and two on the linked-list variety of directory-based schemes (the SCI at the Universities of Oslo and Wisconsin and the SDD protocol at Stanford University). The context could have been better set by leaving fewer loose ends, by being more consistent in the use of terminology, and by being more direct about the complexity supposedly being mitigated.
From the terminology and the references, the multiprocessor hardware and protocol people are clearly not talking with the software database people. While significant parallels exist in the situations and problems they face, as well as in the general character of the resources they can marshal, each group seems to be trying to proceed as though the other had little to offer. I see very capable people in both groups, but they are not in touch with each other.
James, Laundrie, Gjessing, and Sohi
The aim of the Scalable Coherent Interface (SCI), IEEE standards project P1596, is to define an extended computer backplane enabling access to a shared memory, scalable up to 64K nodes with a transfer rate of one gigabyte per second per node. Nodes may be processors, memories, or input-output ports in any mix.
The approach taken thus far is to use a distributed directory; linked lists; cache memory; point-to-point unidirectional connections for the communication of packets; and techniques emphasizing reliability, fault recovery, and optimization for high-frequency transactions. The definition work is being done by simulation, with participation by a group at the University of Oslo and a group at the University of Wisconsin. The SCI-P1596 chair is David B. Gustavson of Stanford University.
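To make the distributed directory and linked lists mentioned above more concrete, here is a minimal single-process sketch of a sharing list for one cache line. It assumes a doubly linked list in which the memory directory stores only a head pointer and a new sharer is prepended at the head. The names `Memory`, `Sharer`, `attach`, `detach`, and `sharers` are hypothetical and chosen for illustration; in SCI itself these steps are carried out by packet exchanges between nodes, which this sketch does not model.

```python
class Memory:
    """Directory entry for one cache line: holds only a pointer to the list head."""

    def __init__(self):
        self.head = None


class Sharer:
    """A node's cache entry for the line, linked to its neighbors in the sharing list."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.forward = None   # next entry, toward the tail
        self.backward = None  # previous entry, toward the head


def attach(memory, new):
    """Add a sharer by prepending it: the new entry becomes the head."""
    old_head = memory.head
    new.forward = old_head
    new.backward = None
    if old_head is not None:
        old_head.backward = new
    memory.head = new


def detach(memory, node):
    """Remove a sharer by linking its two neighbors to each other."""
    prev, nxt = node.backward, node.forward
    if prev is None:
        memory.head = nxt      # the departing entry was the head
    else:
        prev.forward = nxt
    if nxt is not None:
        nxt.backward = prev
    node.forward = node.backward = None


def sharers(memory):
    """Walk the list from head to tail and return the node ids in order."""
    ids = []
    entry = memory.head
    while entry is not None:
        ids.append(entry.node_id)
        entry = entry.forward
    return ids
```

Under these assumptions the directory cost per line stays constant (one head pointer) regardless of how many nodes share the line, while each sharer pays for two pointers of its own; the price is that visiting all sharers requires walking the list.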
The bulk of the paper discusses some of the list handling done for common anticipated situations. The discussion of how the proposed list handling differs from the usual bidirectionally linked list handling for queues and stacks seems weak. The bibliography is disappointingly skimpy.
Thapar and Delagi
The authors report on their work on a distributed-directory scheme for shared-memory multiprocessors. Singly-linked lists are the fundamental data structures used to help provide coherence in the access to shared data. The authors use most of the paper to describe basic list operations performed on the distributed queues; they also contrast their work with the Wisconsin-Oslo SCI work.
While this work appears to be more complex than the SCI work, the authors also apparently assume fewer restrictions on the hardware configuration. While they offer some words of contrast, I would have liked to read how they see their list operations as differing from the usual and what they see as the tradeoffs on coherence for their proposed protocols.
Carlton and Despain
In the Aquarius scalable multiple-bus shared-memory hardware architecture, each node has access to two or more buses arranged in a multidimensional array and serving as a network. Access to the network is provided only for nodes; each node has memory, a cache, and a processor. Part of the memory is used for a portion of a distributed directory to provide coherence in processing shared data. Shared data are held in cache, except at the “root node” for the data. The root node can have private (unshared) data. Nodes can share data most quickly when they are on the same bus. Cache states and directory states are distributed, with each node showing the states only for the data it has.
The authors give a clear summary of their proposed “multi-multi” architecture and protocol, including a few rough quantitative measures of scalability. They give no feel for the tradeoffs and compromises, and the list of references is helpful only on history. The authors only briefly discuss how they visualize the protocol working for widely shared data, a point I would have liked to read more about.
Fernández-Pascual R, Ros A, Acacio M (2016). Optimization of a Linked Cache Coherence Protocol for Scalable Manycore Coherence. Proceedings of the 29th International Conference on Architecture of Computing Systems -- ARCS 2016 - Volume 9637 (100-112). doi:10.1007/978-3-319-30695-7_8. Online publication date: 4-Apr-2016
Attiya H, Gramoli V, Milani A (2010). A provably starvation-free distributed directory protocol. Proceedings of the 12th international conference on Stabilization, safety, and security of distributed systems (405-419). doi:10.5555/1926829.1926864. Online publication date: 20-Sep-2010
Barrow-Williams N, Fensch C, Moore S, Salapura V, Gschwind M, Knoop J (2010). Proximity coherence for chip multiprocessors. Proceedings of the 19th international conference on Parallel architectures and compilation techniques (123-134). doi:10.1145/1854273.1854293. Online publication date: 11-Sep-2010
Kunz R, Horowitz M, Berger E, Chen B (2008). The case for simple, visible cache coherency. Proceedings of the 2008 ACM SIGPLAN workshop on Memory systems performance and correctness: held in conjunction with the Thirteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '08) (31-35). doi:10.1145/1353522.1353532. Online publication date: 2-Mar-2008
de Dios A, Sahelices B, Ibáñez P, Viñals V, Llabería J (2006). Speeding-up synchronizations in DSM multiprocessors. Proceedings of the 12th international conference on Parallel Processing (473-484). doi:10.1007/11823285_49. Online publication date: 28-Aug-2006