Interconnection Networks

Howard Jay Siegel and William Tsun-yuk Hsu

Some of the material in this chapter is summarized from Interconnection Networks for Large-Scale Parallel Processing, by H. J. Siegel, Lexington Books, D. C. Heath and Company, Lexington, MA, copyright 1985. This project was supported by the Rome Air Development Center under contract number F30602-83-K-0119, the Institute for Defense Analyses Supercomputing Research Center under contract number MDA 904-85-C-5027, and the Purdue Research Foundation David Ross Grant 1985/86 number 0857.

6.1. Introduction

Many tasks require the computational power made possible by parallel processing. The demand for fast computation is usually due to a desire for real-time response and/or the need to process immense data sets. These types of tasks include aerodynamic simulations, air traffic control, chemical reaction simulations, seismic data processing, satellite-collected imagery analysis, missile guidance, ballistic missile defense, weather forecasting, map making, robot vision, and speech understanding. Systems comprising a multitude of tightly coupled, cooperating processors can help provide the computational performance required by these tasks. STARAN (7) and MPP (8) are examples of existing systems with 2° and $2^{14}$ simple processors, respectively. Ultracomputer (28) is a proposed design for a system consisting of $2^{12}$ complex processors. This chapter examines methods to provide communications among the processors and memories of such large-scale parallel/distributed systems.

Two models of interprocessor communication networks were introduced in Chapter 5. The processor-to-memory model assumes N processors on one side of a bidirectional network and N memory modules on the other side. It is also possible to organize processors and memory modules into processor/memory pairs, or processing elements (PEs).
In the PE-to-PE model, PE i is connected to input i and output i of a unidirectional interconnection network. In this chapter, the PE-to-PE model will be used; however, the material presented is also applicable to processor-to-memory systems.

The taxonomy originated by Flynn (26) to describe parallel processors has already been described in Chapter 5. Two of the modes of parallelism described by Flynn are the SIMD and MIMD modes. SIMD stands for single instruction stream-multiple data stream. An SIMD machine may consist of N PEs, an interconnection network that provides communications between PEs, and a single control unit. The control unit broadcasts instructions to all the PEs, and all enabled PEs execute the same instructions simultaneously, hence forming a single instruction stream. Each PE operates on its own data from its memory; hence, there are multiple data streams. MIMD stands for multiple instruction stream-multiple data stream. An MIMD machine may consist of N PEs linked by an interconnection network. Each PE stores and executes its own instructions and operates on its own data; therefore, there are multiple instruction streams and multiple data streams. In addition, there are MSIMD (multiple-SIMD) machines and partitionable SIMD/MIMD machines. MSIMD machines are systems that can be reconfigured into a number of smaller, independent SIMD machines. Partitionable SIMD/MIMD machines can be partitioned into smaller virtual machines working in SIMD or MIMD mode. These have been covered in Chapter 5.

The task of interconnecting N processors and N memory modules, where N may be in the range $2^6$ to $2^{16}$, is a nontrivial one. The interconnection scheme must provide fast and flexible communications without unreasonable cost. A single shared bus, as shown in Figure 6.1, is not sufficient because it is often desirable to allow all processors to send data to other processors simultaneously (e.g., from processor $i$ to processor $i - 1$, $1 \le i < N$). The ideal situation would be to link each processor directly to every other processor so that the system is completely connected. This is shown for N = 8 in Figure 6.2, where one could assume, for example, that each node is a processor with its own memory. Unfortunately, this is highly impractical for large N because it requires $N - 1$ unidirectional lines for each processor, or $N(N - 1)$ in total. For example, if $N = 2^9$, then $2^9 \times (2^9 - 1) = 261{,}632$ links would be needed.

Figure 6.1. A single shared bus used to provide communications for N devices.
Figure 6.2. A completely connected system for N = 8.

An alternative interconnection scheme that allows all processors to communicate simultaneously is the crossbar switch, shown in Figure 6.3. In this example, the processors communicate through the memories. The network can be viewed as a set of intersecting lines, where interconnections between processors and memories are specified by the crosspoint switches at each line intersection (75). The difficulty with crossbar networks is that the cost of the network (the number of crosspoint switches) grows with $N^2$, which, given current technology, makes it infeasible for large systems.

Figure 6.3. A crossbar switch connecting N processors to N memories.

In order to solve the problem of providing fast, efficient communications at a reasonable cost, many different networks between the extremes of the single bus and the completely connected scheme have been proposed in the literature. No single network is generally considered “best.” The cost-effectiveness of a particular network design depends on such factors as the computational tasks for which it will be used, the desired speed of interprocessor data transfers, the actual hardware implementation of the network, the number of processors in the system, and any cost constraints on the construction.
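The cost comparison between the two extremes is simple arithmetic. The minimal Python sketch below tabulates the $N(N - 1)$ unidirectional links of a completely connected system against the $N^2$ crosspoint switches of an $N \times N$ crossbar for a few powers of two. The function names are illustrative only and do not come from the chapter.

```python
# Illustrative cost counts for two interconnection extremes (hypothetical helpers).

def completely_connected_links(n_pes: int) -> int:
    """Unidirectional links when each of N processors links directly to the other N - 1."""
    return n_pes * (n_pes - 1)


def crossbar_crosspoints(n_pes: int) -> int:
    """Crosspoint switches in an N x N crossbar (N processors, N memories)."""
    return n_pes * n_pes


if __name__ == "__main__":
    for m in (6, 9, 12, 16):
        n = 2 ** m
        print(f"N = 2^{m:<2}  complete: {completely_connected_links(n):>13,}  "
              f"crossbar: {crossbar_crosspoints(n):>13,}")
```

For $N = 2^9$ the first count is the 261,632 links cited in the text. Both counts grow quadratically in N, which is why neither scheme scales to the system sizes considered here.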
A variety of networks that have been proposed are overviewed in numerous survey articles and books, e.g., (4, 12, 21, 32, 34, 37, 42, 62, 74, 81).

This chapter is a study of an important collection of network designs that can be used to support large-scale parallelism; i.e., these networks can provide the communications needed in a parallel processing system consisting of a large number of processors (e.g., $2^6$ to $2^{16}$) that are working together to perform a single overall task. Many of these networks can be used in dynamically reconfigurable machines that can perform independent multiple tasks, where each task is processed using parallelism. The networks examined here are based on the “Shuffle-Exchange,” “Cube,” “PM2I” (plus-minus $2^i$), and “Illiac” (nearest neighbor) interconnection patterns. These networks and their single-stage implementations are explored in Section 6.2. Section 6.3 is a study of the multistage Cube/Shuffle-Exchange class of networks; the Generalized Cube network will be discussed as an example of this type of network. A fault-tolerant version of the Generalized Cube network, called the Extra Stage Cube network, is the subject of Section 6.4. Data manipulator type networks, which are multistage implementations of the PM2I connection patterns, will be discussed in Section 6.5.

6.2. Interconnection Functions and Single Stage Networks

6.2.1. Introduction

Assume a parallel system with $N = 2^m$ PEs, numbered (addressed) from 0 to $N - 1$. An interconnection network can be described by a set of interconnection functions. Each interconnection function is a bijection (permutation) on the set of PE addresses. Interconnection functions represent inter-PE data transfers using mathematical mappings. When an interconnection function f is executed, PE i sends data to PE f(i). If a system is operating in SIMD mode, this means that every PE sends data to exactly one PE, and every PE receives data from exactly one PE (assuming all PEs are active).
Otherwise, the data transfer from PE i to PE f(i) may occur only for a subset of the PEs in the system. Four types of interconnection networks will be discussed: the Cube, the Illiac, the PM2I, and the Shuffle-Exchange. Interconnection networks can be constructed from a single stage of switches or multiple stages of switches. In a single-stage network, data items may have to be passed through the switches several times before reaching their final destinations. In a multistage network, generally one pass through the multiple (usually m) stages of switches is sufficient to transfer the data items to their final destinations.

An important consideration in the selection of an interconnection network for a system is the partitionability of the network. The partitionability of an interconnection network is the ability to divide the network into independent subnetworks of different sizes (60). Each subnetwork of size $N' < N$ must have all of the interconnection capabilities of a complete network of that same type built to be of size $N'$. Multiple-SIMD systems use partitionable interconnection networks to dynamically reconfigure the system into independent SIMD machines of varying sizes. The multiple-SIMD model will be used as a framework for the partitioning analyses in this chapter. However, the results can also be used to partition MIMD and partitionable SIMD/MIMD machines.

The subject of this section is the single-stage implementation of the Cube, Illiac, PM2I, and Shuffle-Exchange interconnection networks. Each of these networks will be defined, and examples of their operation in both the SIMD and MIMD modes of parallelism will be given. The partitionability of these single-stage networks will also be discussed. Further information about these topics is in (59-61, 69).
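The view of an interconnection function as a permutation of PE addresses can be made concrete with a short simulation. The Python sketch below is a minimal model under stated assumptions, not the chapter's hardware. `is_permutation`, `simd_transfer`, and the example function `shift` are hypothetical names. An optional `active` mask models the case where only a subset of PEs transfers, and a PE that receives nothing is assumed to keep its old value.

```python
from typing import Callable, List, Optional, Sequence


def is_permutation(f: Callable[[int], int], n_pes: int) -> bool:
    """True if f maps the PE addresses 0..N-1 one-to-one onto themselves."""
    return sorted(f(p) for p in range(n_pes)) == list(range(n_pes))


def simd_transfer(data: Sequence[int], f: Callable[[int], int],
                  active: Optional[Sequence[bool]] = None) -> List[int]:
    """Execute interconnection function f once: each active PE i sends
    data[i] to PE f(i).

    Assumption for this sketch: a PE that receives nothing keeps its old value.
    """
    n = len(data)
    if not is_permutation(f, n):
        raise ValueError("an interconnection function must be a bijection")
    if active is None:
        active = [True] * n  # all PEs enabled
    result = list(data)
    for i in range(n):
        if active[i]:
            result[f(i)] = data[i]
    return result


def shift(p: int) -> int:
    """Hypothetical example function for N = 8: PE p sends to PE p + 1 modulo 8."""
    return (p + 1) % 8


if __name__ == "__main__":
    print(simd_transfer(list(range(8)), shift))
    # Only even-numbered PEs enabled: odd PEs receive, even PEs keep old data.
    print(simd_transfer(list(range(8)), shift, [p % 2 == 0 for p in range(8)]))
```

With all PEs active, every PE both sends and receives exactly once, as in the SIMD case described above. The mask illustrates a transfer that occurs only for a subset of the PEs.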
The following notation will be used: let the binary representation of an arbitrary PE address P be $p_{m-1} p_{m-2} \cdots p_1 p_0$, let $\bar{p}_i$ be the complement of $p_i$, and let the integer n be the square root of N. It is assumed throughout this chapter that $-j$ modulo $N = N - j$ modulo N, for $j > 0$; e.g., $-4$ modulo $16 = 12$ modulo 16.

6.2.2. The Cube Network

The Cube network consists of m interconnection functions defined by

$$\mathrm{cube}_i(p_{m-1} \cdots p_{i+1} p_i p_{i-1} \cdots p_0) = p_{m-1} \cdots p_{i+1} \bar{p}_i p_{i-1} \cdots p_0$$

for $0 \le i < m$. […]

Because $P + 2^{m-1} = P - 2^{m-1}$ modulo N, $\mathrm{PM2}_{+(m-1)}$ and $\mathrm{PM2}_{-(m-1)}$ are equivalent. Figure 6.7 shows the $\mathrm{PM2}_{+i}$ interconnections for N = 8; $\mathrm{PM2}_{-i}$ is the same as $\mathrm{PM2}_{+i}$ except that the direction is reversed. This network is called the Plus-Minus $2^i$ because, in terms of mapping source addresses to destinations, it can add or subtract $2^i$ from the PE addresses; i.e., it allows PE P to send data to any one of PE $P + 2^i$ or PE $P - 2^i$, arithmetic modulo N, $0 \le i < m$.

Figure 6.6. Partitioning a size-eight Cube network. (A) Physical cube; (logical cubes). (B) Physical cube; (logical cubes).
Figure 6.7. PM2I network for N = 8. (A) $\mathrm{PM2}_{+0}$ connections. (B) $\mathrm{PM2}_{+1}$ connections. (C) $\mathrm{PM2}_{+2}$ connections.

A network similar to the PM2I is used in the “Novel Multiprocessor Array” (50) and is included in the network of the Omen computer (31). The interconnection network of the SIMDA machine is similar in concept to that of the PM2I (78). The data manipulator (20), ADM (66), IADM (63), and gamma (52) multistage networks are based on the PM2I connection pattern. Various properties of the single-stage PM2I network are discussed in (24, 56, 58, 67, 70).

Network control in SIMD mode can be achieved by means of a system control unit, as in the Cube network. Suppose the PM2I network is implemented in hardware, and a $\mathrm{cube}_i$ transfer is needed.
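The Cube and PM2I functions defined above act directly on integer PE addresses. The Python sketch below is a minimal illustration, assuming addresses are the integers 0 to N − 1 with $N = 2^m$. All function names are hypothetical. `cube_via_pm2` shows one way, grounded only in binary arithmetic, to obtain $\mathrm{cube}_i$ from PM2I functions. It is not claimed to be the chapter's procedure.

```python
# Hypothetical helper names; PE addresses are integers 0..N-1 with N = 2**m.

def cube(i: int, p: int) -> int:
    """cube_i: complement bit p_i of PE address p, leaving the other bits unchanged."""
    return p ^ (1 << i)


def pm2_plus(i: int, p: int, n_pes: int) -> int:
    """PM2_{+i}: PE p sends to PE p + 2^i, arithmetic modulo N."""
    return (p + (1 << i)) % n_pes


def pm2_minus(i: int, p: int, n_pes: int) -> int:
    """PM2_{-i}: PE p sends to PE p - 2^i, arithmetic modulo N.

    Python's % already follows the convention -j modulo N = N - j modulo N.
    """
    return (p - (1 << i)) % n_pes


def cube_via_pm2(i: int, p: int, n_pes: int) -> int:
    """Obtain cube_i(p) from PM2I functions: a PE whose bit i is 0 adds 2^i,
    and a PE whose bit i is 1 subtracts 2^i. Neither case causes a carry or
    borrow, so only bit i changes. In SIMD mode the two groups would need
    separate, masked transfers, since all active PEs apply the same function."""
    if (p >> i) & 1 == 0:
        return pm2_plus(i, p, n_pes)
    return pm2_minus(i, p, n_pes)


if __name__ == "__main__":
    N, m = 8, 3
    print((-4) % 16)  # 12, matching -4 modulo 16 = 12 modulo 16
    # PM2_{+(m-1)} and PM2_{-(m-1)} coincide because P + 2^(m-1) = P - 2^(m-1) mod N.
    print(all(pm2_plus(m - 1, p, N) == pm2_minus(m - 1, p, N) for p in range(N)))  # True
    print(all(cube_via_pm2(i, p, N) == cube(i, p)
              for i in range(m) for p in range(N)))  # True
```

The decomposition works because adding $2^i$ to an address whose bit i is 0 only sets that bit, and subtracting $2^i$ from an address whose bit i is 1 only clears it.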
Mathematically, this means that the i-th bit of each PE address would have to be complemented using PM2I functions; i.e., data needs to be moved from PE P to PE $\mathrm{cube}_i(P)$, $0 \le P$
