In this paper we describe paradigms for designing and building parallel computing machines. First we elaborate on the suitability of the MIMD model for executing diverse applications. We then compare general-purpose parallel computer architectures with special-purpose parallel computer architectures in terms of cost, throughput and efficiency, and describe how parallel computer architecture exploits parallelism and concurrency through pipelining. Since pipelining improves the performance of a machine by dividing instruction execution into a number of stages, we describe how the performance of a vector processor is enhanced by employing multiple pipelines among its processing elements. We also elaborate on the RISC architecture and pipelining in RISC machines. After comparing RISC computers with CISC computers, we observe that although the high speed of RISC computers is very desirable, the significance of a computer's speed depends on implementation strategies. CPU clock speed is not the only parameter to consider when moving system software from CISC to RISC computers; other parameters should also be weighed, such as instruction size and format, addressing modes, the complexity of instructions, and the machine cycles required by instructions. Only by considering all of these parameters is a performance gain achieved. We discuss multiprocessor and data flow machines in a concise manner, and then discuss three SIMD (Single Instruction stream, Multiple Data stream) machines: the DEC/MasPar MP-1, systolic processors and wavefront array processors. The DEC/MasPar MP-1 is a massively parallel SIMD array processor. A wide variety of number representations and arithmetic systems for computers can be implemented easily on the DEC/MasPar MP-1 system; the principal advantage of using such a 64×64 SIMD array of 4-bit processors to implement a computer arithmetic laboratory is its flexibility.
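The pipelining claim in the abstract above can be illustrated with the standard ideal-pipeline model (a sketch under textbook assumptions, not a result taken from the paper): executing n instructions on a k-stage pipeline takes k + (n - 1) cycles instead of n * k cycles, so speedup approaches k for large n.

```python
def pipeline_speedup(n, k):
    """Ideal speedup of a k-stage pipeline over an unpipelined machine
    for n instructions, ignoring hazards and stalls.

    Unpipelined time: n * k cycles.
    Pipelined time:   k + (n - 1) cycles (first result after k cycles,
                      then one result per cycle).
    """
    return (n * k) / (k + n - 1)
```

For a single instruction the speedup is 1 (the pipeline gives no benefit), and as n grows the speedup asymptotically approaches the stage count k, which is why deeply pipelined vector processors favor long streams of homogeneous operations.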
After comparing systolic processors with wavefront processors, we find that both are fast and are implemented in VLSI. The major drawback of systolic processors is ensuring the availability of inputs at each clock tick, owing to propagation delays in the connection buses. Wavefront processors combine the systolic processor architecture with the data flow machine architecture. Because wavefront processors use an asynchronous data flow computing structure, timing in the interconnection buses, at the input and at the output, is not problematic.
Contemporary computer systems are multiprocessor or multicomputer machines. Their efficiency depends on good methods of administering the work they execute: fast processing of a parallel application is possible only when its parts are appropriately ordered in time and space. This calls for efficient scheduling policies in parallel computer systems. In this work, deterministic scheduling problems are considered. The classical scheduling theory assumed that at any moment in time an application is executed by only one processor. This assumption has been weakened recently, especially in the context of parallel and distributed computer systems. This monograph is devoted to problems of deterministically scheduling applications (or tasks, in scheduling terminology) that require more than one processor simultaneously. We call such applications multiprocessor tasks. In this work the complexity of open multiprocessor task scheduling problems is established, and algorithms for scheduling multiprocessor tasks on parallel and dedicated processors are proposed. For the special case of applications with a regular structure that can be divided into parts of arbitrary size processed independently in parallel, a method of finding the optimal scattering of work in a distributed computer system is proposed. Applications with such regular characteristics are called divisible tasks. The concept of a divisible task enables the creation of tractable computation models for a wide class of computer architectures, such as chains, stars, meshes, hypercubes and multistage networks. The divisible task method also supports the evaluation of computer system performance, and examples of such performance evaluation are presented. This work summarizes earlier works of the author as well as containing new original results.
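The optimal scattering of a divisible task mentioned above can be sketched for the simplest setting (a minimal illustration under assumed conditions: heterogeneous processors, negligible communication cost, and the usual requirement that all parts finish simultaneously; this is not the monograph's general method):

```python
def divisible_split(rates):
    """Split a unit divisible load among processors so all finish together.

    rates[i] is the time processor i needs to process one unit of load.
    With no communication cost, simultaneous completion requires
    alpha_i * rates[i] to be the same constant for every i, so each
    fraction is proportional to the processor's speed 1 / rates[i].
    """
    speeds = [1.0 / w for w in rates]
    total = sum(speeds)
    return [s / total for s in speeds]
```

For example, three processors that take 1, 2 and 4 time units per unit of load receive fractions 4/7, 2/7 and 1/7 respectively, and each then finishes after exactly 4/7 time units. Nonzero communication delays, which the divisible-task literature does model, change these fractions and make the distribution order matter.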
Mukul Varshney | Jyotsna | Abhakiran Rajpoot | Shivani Garg, "Problems in Task Scheduling in Multiprocessor System", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-1, Issue-4, June 2017. URL: http://www.ijtsrd.com/papers/ijtsrd2198.pdf Article URL: http://www.ijtsrd.com/computer-science/computer-architecture/2198/problems-in-task-scheduling-in-multiprocessor-system/mukul-varshney
... The simulation tools presented in this paper use the process-oriented approach, with the activity of the processes defined by the programmer. ... This is important for the simulation of MIMD architectures, where processes execute different tasks simultaneously. ...
This paper presents a compiling technique to generate parallel code with explicit local communications for a mesh-connected, distributed-memory MIMD architecture. Our compiling technique works for the geometric paradigm of parallel computation, i.e., a data-parallel paradigm in which array data structures are partitioned and assigned to a set of processing nodes, which, to perform their identical tasks, need to exchange some of the data allocated to them. This means that some data dependencies exist between computations ...
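The local exchanges described in that abstract are typically halo (border) exchanges between neighbouring nodes. A minimal sketch, assuming a 1-D block decomposition of an array across a chain of nodes (the function name and data layout are illustrative, not taken from the paper's compiler):

```python
def exchange_halos(blocks):
    """Simulate the local communication step of a 1-D block decomposition.

    blocks[i] is the contiguous slice of the array owned by node i.
    Each node needs its left neighbour's last element and its right
    neighbour's first element; boundary nodes receive None on the
    outer side. Returns one (left_halo, right_halo) pair per node.
    """
    halos = []
    for i in range(len(blocks)):
        left = blocks[i - 1][-1] if i > 0 else None
        right = blocks[i + 1][0] if i < len(blocks) - 1 else None
        halos.append((left, right))
    return halos
```

On a real distributed-memory machine each pair would correspond to a send/receive with the neighbouring node; the compiling technique's job is to generate exactly these explicit local messages from the data-dependence analysis.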
Reconstruction of phylogenetic trees for very large datasets is a known example of a computationally hard problem. In this paper, we present a parallel computing model for the widely used Multiple Instruction Multiple Data (MIMD) architecture. Following the idea of divide-and-conquer, our model adapts the Recursive-DCM3 decomposition method (Roshan et al., 2004) to divide datasets into smaller subproblems. It distributes
The performance of conjugate gradient (CG) algorithms for the solution of the system of linear equations that results from the finite-differencing of the neutron diffusion equation was analyzed on SIMD, MIMD, and mixed-mode parallel machines. A block preconditioner based on the incomplete Cholesky factorization was used to accelerate the conjugate gradient search. The issues involved in mapping both the unpreconditioned
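For reference, the conjugate gradient iteration that the abstract above parallelizes can be sketched serially as follows (a minimal unpreconditioned version in pure Python for a small dense symmetric positive definite system; the analyzed codes use preconditioning and distributed sparse matrices):

```python
def conjugate_gradient(A, b, tol=1e-10, max_iter=100):
    """Solve A x = b for symmetric positive definite A by conjugate gradients.

    A is a dense n x n list of lists, b a length-n list. Each iteration
    performs one matrix-vector product and two inner products, which are
    the main communication kernels when the method is distributed.
    """
    n = len(b)
    x = [0.0] * n
    r = b[:]          # residual b - A x, with x = 0 initially
    p = r[:]          # first search direction
    rs_old = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rs_old / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        rs_new = sum(ri * ri for ri in r)
        if rs_new ** 0.5 < tol:
            break
        beta = rs_new / rs_old
        p = [r[i] + beta * p[i] for i in range(n)]
        rs_old = rs_new
    return x
```

The inner products require global reductions across all processors, while the matrix-vector product needs only the neighbour communication induced by the finite-difference stencil; this asymmetry is what makes the mapping to SIMD, MIMD and mixed-mode machines interesting.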
The Morero supercomputer is the first of its kind designed and constructed at an educational institute in Pakistan. Morero is a MIMD multiprocessor cluster composed of off-the-shelf commodity personal computers, connected through a dedicated network so that they act as a single supercomputer. Morero is based on the Beowulf class of supercomputer architecture; a Beowulf cluster is defined as a high-performance parallel system built from a networked cluster
A parallel unstructured finite element (FE) reacting flow solver designed for message passing MIMD computers is described. This implementation employs automated partitioning algorithms for load balancing unstructured grids, a distributed sparse matrix representation of the global FE equations, and parallel Krylov subspace iterative solvers. In this paper, a number of issues related to the efficient implementation of parallel unstructured mesh applications are presented. These issues include the differences between structured and unstructured mesh parallel applications, major communication kernels for unstructured Krylov iterative solvers, automatic mesh partitioning algorithms, and the influence of mesh partitioning metrics and single-node CPU performance on parallel performance. Results are presented for example FE heat transfer, fluid flow and full reacting flow applications on a 1024 processor nCUBE 2 hypercube and a 1904 processor Intel Paragon. Results indicate that very high computational rates and high scaled efficiencies can be achieved for large problems despite the use of sparse matrix data structures and the required unstructured data communication.