Assignment of Algorithm
So far, the study of algorithms has been confined to single-processor computers. In PRAM algorithms
we study algorithms for parallel machines (computers with more than one processor). Many
applications in day-to-day life demand real-time solutions to problems. For example, weather
forecasting has to be done in a timely fashion. In the case of severe hurricanes or snowstorms,
evacuation has to be done in a short period of time. If an expert system is used to aid a physician in
surgical procedures, decisions have to be made within seconds. Programs written for such
applications have to perform an enormous amount of computation. In the forecasting example, large
matrices have to be operated on. In the medical example, thousands of rules have to be tried.
Even the fastest single-processor machines may not be able to come up with solutions within
tolerable time limits. Parallel machines offer the potential of decreasing the solution time enormously.
COMPUTATIONAL MODEL :
The sequential computational model we have employed so far is the RAM (random access machine).
In the RAM model we assume that any of the following operations can be performed in one unit of
time: addition, subtraction, multiplication, division, comparison, memory access, assignment and so
on. This model has been widely accepted as a valid sequential model. On the other hand, when it
comes to parallel computing, numerous models have been proposed and algorithms have been
designed for each such model.
An important feature of parallel computing that is absent in sequential computing is the need for
interprocessor communication. For example, given any problem, the processors have to communicate
among themselves and agree on the subproblems each will work on. They also need to communicate
to see whether every processor has finished its task. Each machine or processor in a parallel computer
can be assumed to be a RAM. Various parallel models differ in the way they support interprocessor
communication. Parallel models can be divided into two categories: Fixed Connection Machines and
Shared Memory Machines.
A fixed connection network is a graph G(V,E) whose nodes represent processors and whose edges
represent communication links between processors.
In Shared Memory models, also called PRAMs (Parallel Random Access Machines), a number of
processors work synchronously. They communicate with each other using a common block of global
memory that is accessible by all. This global memory is called common or shared memory.
Each processor in a PRAM is a RAM with some local memory. A single step of a PRAM algorithm can
be one of the following: an arithmetic operation (addition, division and so on), a comparison, a memory
access (local or global), an assignment, etc. The number of cells in the global memory is typically
assumed to be the same as p, but this need not always be the case.
EREW (Exclusive Read and Exclusive Write): This PRAM is the shared memory model in which no
concurrent read or write is allowed on any cell of the global memory. For example, at a given time
step, processor one might access cell five and at the same time processor two might access cell 12,
but the two processors cannot access the same cell, say cell five, at the same time.
CREW (Concurrent Read and Exclusive Write): All the processors may read from the same
memory location at the same time, but they may not write into the same memory location at the same time.
ERCW (Exclusive Read and Concurrent Write): All the processors may write to the same
memory location at the same time, but they may not read the same memory location at the same time.
CRCW (Concurrent Read and Concurrent Write): All the processors may read from and write to
the same memory location in parallel.
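The four variants differ only in whether two processors may touch the same global cell in one time step. The following is a minimal sketch, not part of the original notes, that labels a single step by the access pattern it contains. The function name `classify_step` and the list encoding of accesses are assumptions made for illustration, and the sketch ignores the case of a read and a write hitting the same cell.

```python
from collections import Counter

def classify_step(reads, writes):
    """Name the PRAM variant whose rules match one time step.

    reads / writes: global-memory cell indices touched in this step,
    one entry per processor that reads or writes (hypothetical encoding).
    """
    # A cell that appears more than once is accessed concurrently.
    concurrent_read = any(count > 1 for count in Counter(reads).values())
    concurrent_write = any(count > 1 for count in Counter(writes).values())
    return (("C" if concurrent_read else "E") + "R"
            + ("C" if concurrent_write else "E") + "W")

# Processor one reads cell 5 while processor two reads cell 12.
print(classify_step([5, 12], []))
# Both processors read cell 5 in the same step.
print(classify_step([5, 5], []))
```

A step labelled EREW is legal on all four models, while a step labelled CRCW is legal only on a CRCW PRAM.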
FUNDAMENTAL TECHNIQUES AND ALGORITHM :
Two basic problems arise in the parallel solution of numerous problems. The first is
known as the Prefix Computation Problem and the second is the List Ranking Problem.
1. Prefix Computation: Let ∑ be any domain in which the binary associative operator Ꚛ is defined.
An operator Ꚛ is said to be associative if for any three elements x, y and z from ∑,
((xꚚy)Ꚛz) = (xꚚ(yꚚz)), that is, the order in which the operation Ꚛ is performed does not matter.
It is also assumed that Ꚛ is unit-time computable and that ∑ is closed under this operation, that is,
for any x, y ϵ ∑, xꚚy ϵ ∑. The prefix computation problem on ∑ has as input n elements from ∑,
say x1, x2, x3, ..., xn. The problem is to compute the n elements
x1, x1Ꚛx2, x1Ꚛx2Ꚛx3, ..., x1Ꚛx2Ꚛ...Ꚛxn. The output elements are often referred to as the prefixes.
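As a small illustration of the definition, the prefixes can be computed sequentially with Python's `itertools.accumulate`. The helper name `prefixes` is an assumption for this sketch, and any associative operator can be passed in place of addition.

```python
from itertools import accumulate
import operator

def prefixes(xs, op=operator.add):
    """Return x1, x1 op x2, ..., x1 op x2 op ... op xn for an associative op."""
    return list(accumulate(xs, op))

print(prefixes([5, 12, 8, 6]))        # prefix sums
print(prefixes([3, 1, 4, 1, 5], max)) # running maximum; max is also associative
```

This single left-to-right pass takes n - 1 applications of the operator.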
The prefix computation problem can be solved in O(n) time sequentially, and any sequential algorithm
for this problem needs Ω(n) time. A CREW PRAM algorithm exists that uses n/log n processors
and runs in O(log n) time. The work done by such an algorithm is O(n), so its efficiency is Ɵ(1)
and it is work optimal. The speedup of this algorithm is Ɵ(n/log n). We apply the
divide-and-conquer approach to devise the prefix algorithm; the basic version below uses n processors.
Step 1: Let the first n/2 processors recursively compute the prefixes of x1, x2, ..., xn/2 and let
y1, y2, ..., yn/2 be the result. At the same time, let the rest of the processors recursively compute the
prefixes of xn/2+1, xn/2+2, ..., xn and let yn/2+1, yn/2+2, ..., yn be the output.
Step 2: Note that the first half of the final answer is the same as y1, y2, ..., yn/2. The second half of
the final answer is yn/2Ꚛyn/2+1, yn/2Ꚛyn/2+2, ..., yn/2Ꚛyn.
Let the second half of the processors read yn/2 concurrently from the global memory and update
their answers. This step takes O(1) time.
Let T(n) be the run time of the above algorithm on any input of size n using n processors. Step 1 takes
T(n/2) time and step 2 takes O(1) time, so
T(n)=T(n/2)+O(1) , T(1)=1
This solves to T(n)=O(log n). Note that in defining the run time of a parallel divide-and-conquer
algorithm, it is essential to state the number of processors used.
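The divide-and-conquer scheme can be simulated sequentially as a rough sketch. The names `parallel_prefix` and `merge_rounds` are illustrative assumptions; on a real CREW PRAM the two recursive calls run at the same time and the final update is one parallel step, which this simulation only imitates with a loop.

```python
import operator

def parallel_prefix(xs, op=operator.add):
    """Sequential simulation of the n-processor divide-and-conquer prefix scheme."""
    n = len(xs)
    if n <= 1:
        return list(xs)
    half = n // 2
    # On a PRAM these two recursive calls would run simultaneously.
    left = parallel_prefix(xs[:half], op)
    right = parallel_prefix(xs[half:], op)
    # Every processor of the second half reads the last prefix of the
    # first half concurrently and updates its own answer in O(1) time.
    carry = left[-1]
    return left + [op(carry, y) for y in right]

def merge_rounds(n):
    """Number of O(1) update rounds on the critical path: T(n) = T(n/2) + 1."""
    return 0 if n <= 1 else 1 + merge_rounds((n + 1) // 2)

print(parallel_prefix([5, 12, 8, 6, 3, 9, 11, 12]))
print(merge_rounds(16))  # grows like log n
```

The round count mirrors the recurrence for T(n): each level of recursion adds one constant-time update step.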
Example:
Let the input to the prefix computation be 5,12,8,6,3,9,11,12,1,5,6,7,10,4,3,5 and let Ꚛ stand for
addition. Here n=16 and log n=4, so the work-optimal algorithm uses n/log n = 4 processors. In step 1,
each of the four processors computes prefix sums on four numbers. In step 2, prefix sums on the
local sums are computed, and in step 3, the locally computed results are updated.
(Prefix Computation - An Example)
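The three steps of this example can be traced with a short sketch. It assumes that n is a power of two, that each simulated processor receives a block of log n consecutive inputs, and that the operator is addition as in the example. The function name `work_optimal_prefix` is hypothetical.

```python
from itertools import accumulate
from math import log2

def work_optimal_prefix(xs):
    """Simulate the three-step prefix-sum scheme with n / log n blocks."""
    n = len(xs)
    block = int(log2(n))  # each processor handles log n inputs (assumes n = 2^m, n >= 2)
    blocks = [xs[i:i + block] for i in range(0, n, block)]
    # Step 1: every processor computes the prefix sums of its own block.
    local = [list(accumulate(b)) for b in blocks]
    # Step 2: prefix sums over the block totals (the last local prefix of each block).
    totals = list(accumulate(l[-1] for l in local))
    # Step 3: every processor adds the total of all preceding blocks.
    result = list(local[0])
    for k in range(1, len(local)):
        result.extend(totals[k - 1] + v for v in local[k])
    return local, totals, result

data = [5, 12, 8, 6, 3, 9, 11, 12, 1, 5, 6, 7, 10, 4, 3, 5]
local, totals, result = work_optimal_prefix(data)
print(local)   # step 1: four local prefix-sum lists
print(totals)  # step 2: [31, 66, 85, 107]
print(result)  # step 3: the 16 prefix sums, ending in 107
```

Each of the three steps touches O(log n) values per processor or O(n / log n) values in total, which is where the O(n) total work comes from.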
2. List Ranking: List ranking plays a vital role in the parallel solution of several graph problems.
The input to the problem is a list given in the form of an array of nodes. A node consists of some data
and a pointer to its right neighbour in the list. The nodes themselves need not occur in any order in
the input. The problem is to compute, for each node in the list, the number of nodes to its right (also
called the rank of the node). Since the data contained in a node is irrelevant to the list ranking
problem, we assume that each node contains only a pointer to its right neighbour. The rightmost
node's pointer field is zero.
Consider the input A[1:6]. The right neighbour of node A[2] is A[4] and so on. Node A[4] is the
rightmost node and its rank is zero. Node A[2] has rank 1 since the only node to its right is A[4]. Node
A[5] has rank 3, since the nodes A[3],A[2] and A[4] are to its right. In this example, the left-to-right
order of the list nodes is given by A[6],A[1],A[5],A[3],A[2],A[4].
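The notes above do not give a ranking algorithm. One standard technique is pointer jumping, in which every node repeatedly adds its neighbour's rank to its own and then skips over that neighbour. The sketch below simulates it sequentially. The pointer values are inferred from the left-to-right order stated above, and the name `list_rank` is an assumption.

```python
def list_rank(next_ptr):
    """Rank list nodes by pointer jumping.

    next_ptr[i] is the 1-based index of node i's right neighbour,
    or 0 for the rightmost node. Index 0 is an unused placeholder.
    Returns (ranks of nodes 1..n, number of parallel rounds used).
    """
    n = len(next_ptr) - 1
    nxt = list(next_ptr)
    rank = [0] + [0 if nxt[i] == 0 else 1 for i in range(1, n + 1)]
    rounds = 0
    while any(nxt[i] != 0 for i in range(1, n + 1)):
        # All nodes update at once: read the old arrays, write new ones.
        new_rank, new_nxt = list(rank), list(nxt)
        for i in range(1, n + 1):
            if nxt[i] != 0:
                new_rank[i] = rank[i] + rank[nxt[i]]
                new_nxt[i] = nxt[nxt[i]]
        rank, nxt = new_rank, new_nxt
        rounds += 1
    return rank[1:], rounds

# Order A[6], A[1], A[5], A[3], A[2], A[4] gives these pointer fields.
A = [0, 5, 4, 2, 0, 3, 1]
ranks, rounds = list_rank(A)
print(ranks)   # ranks of A[1] .. A[6]
print(rounds)  # number of jumping rounds
```

Because each round doubles the distance a pointer spans, the number of rounds grows like log n.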
SELECTION:
In selection, the input is a sequence of n keys and an integer i, 1 <= i <= n, and the output is
the ith smallest key from the sequence.
• Algorithm : O(1) CRCW algorithm for finding the maximum; use n² CPUs; assume all numbers are distinct
Step 2: For each CPU (i,j), 1 <= i,j <= n, in parallel: M i,j = 1 if xi < xj, and M i,j = 0 otherwise.
Step 3: The n² processors are grouped into n groups G1,G2,G3,...,Gn, where Gi (1<=i<=n) consists of
the processors pi1,pi2,...,pin. For each row i, group Gi uses its n CPUs to compute the OR of the n
elements M i,1, ..., M i,n. The key xi whose row OR is 0 is the maximum, since no key is larger than it.
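A sequential simulation of the comparison matrix makes the idea concrete. This is only a sketch: `crcw_max` is a hypothetical name, and the nested comprehension stands in for the n² processors that would each perform one comparison in a single step.

```python
def crcw_max(xs):
    """Simulate the n^2-processor maximum algorithm (keys assumed distinct)."""
    n = len(xs)
    # "Processor (i, j)" sets M[i][j] = 1 exactly when x_i < x_j.
    M = [[1 if xs[i] < xs[j] else 0 for j in range(n)] for i in range(n)]
    # Group G_i computes the Boolean OR of row i.
    row_or = [int(any(row)) for row in M]
    # The row whose OR is 0 belongs to the key that no other key exceeds.
    return xs[row_or.index(0)]

print(crcw_max([5, 12, 8, 6]))
```

On a CRCW PRAM the OR of a row takes constant time because every processor holding a 1 can write to the same result cell concurrently.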
• Algorithm : O(log log n) CRCW algorithm for finding the maximum; use n CPUs
Step 1: If n = 1, return x1
Step 2: Partition the n elements and n processors into k groups, say, G1, G2, …, Gk (assume k² = n). In
parallel, call the algorithm recursively to find the maximum element mi of each group Gi.
Step 3: Use the previous algorithm with n CPUs to find the maximum of m1, m2, …, mk.
• Analysis:
This algorithm uses the divide-and-conquer strategy. In step 2, each subproblem has size k = n^(1/2)
(n^(1/2) processors and n^(1/2) elements). In step 3, the running time is O(1). We assume that
n = 2^(2^q) and T(2) = O(1). The total running time T(n) satisfies the recurrence
T(n) = T(n^(1/2)) + O(1)
     = ...
     = T(n^(1/2^i)) + i*O(1)   (after i steps)
     = ...
     = T(n^(1/2^q)) + q*O(1)
     = T(2) + q*O(1)
     = O(1) + O(q)
     = O(q)
     = O(log log n)
Note: n = 2^(2^q), so 2^q = log n and q = log log n.
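The recursion depth can be checked with a small simulation. In this sketch `loglog_max` is a hypothetical name, the groups are processed one after another rather than in parallel, and Python's built-in `max` stands in for the constant-time step 3.

```python
from math import isqrt

def loglog_max(xs):
    """Return (maximum, recursion depth) for the square-root decomposition."""
    n = len(xs)
    if n <= 2:
        return max(xs), 0          # base case, T(2) = O(1)
    k = isqrt(n)                   # group size n^(1/2); assumes n = 2^(2^q)
    groups = [xs[i:i + k] for i in range(0, n, k)]
    # On a PRAM all groups recurse at the same time.
    results = [loglog_max(g) for g in groups]
    maxima = [m for m, _ in results]
    depth = 1 + max(d for _, d in results)
    # Stand-in for the O(1) maximum of the k group maxima.
    return max(maxima), depth

print(loglog_max(list(range(16))))   # n = 2^(2^2), so the depth is 2
```

For n = 256 = 2^(2^3) the same function reports depth 3, matching q = log log n.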
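If each key is an integer in the range [0, n^c], where c is a constant, maximal selection can be done
optimally in O(1) time. Each key is viewed as 2c parts of (log n)/2 bits each, numbered from the most
significant part, and all keys are alive at the start. The speedup of this algorithm is Ɵ(n) and its
efficiency is Ɵ(1).
Algorithm:
for i = 1 to 2c do
Step 1: Find the maximum of all the alive keys with respect to their ith parts. Let M be the
maximum.
Step 2: Delete each alive key whose ith part is less than M.
Output one of the alive keys.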
EXAMPLE:
Consider the problem of finding the maximum of the following four-bit keys: k1=1010, k2=1101,
k3=0110, k4=1100. Here n=4, c=2 and log n=2, so each part is a single bit. In the first basic step, the
maximum of the four numbers with respect to their MSB is 1. Thus k3 gets eliminated. In the second
basic step, the maximum of k1, k2 and k4 with respect to their second part is found. As a result k1 is
dropped. In the third basic step, no key gets eliminated. Finally, in the fourth basic step, k4 is
deleted and k2 is output as the maximum.
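The elimination steps of this example can be replayed with a short sketch. The name `max_by_parts` and its parameters are assumptions; each loop iteration stands for one basic step, and the built-in `max` stands in for the constant-time parallel maximum over the alive keys.

```python
def max_by_parts(keys, part_bits, num_parts):
    """Find the maximum key by comparing one part at a time, MSB part first.

    Returns (maximum key, list of alive 1-based key indices after each step).
    """
    alive = list(range(len(keys)))
    mask = (1 << part_bits) - 1
    trace = []
    for i in range(num_parts):
        shift = part_bits * (num_parts - 1 - i)
        parts = {k: (keys[k] >> shift) & mask for k in alive}
        M = max(parts.values())                      # Step 1: maximum i-th part
        alive = [k for k in alive if parts[k] == M]  # drop keys below M
        trace.append([k + 1 for k in alive])
    return keys[alive[0]], trace

keys = [0b1010, 0b1101, 0b0110, 0b1100]  # k1, k2, k3, k4
best, trace = max_by_parts(keys, part_bits=1, num_parts=4)
print(format(best, "04b"))  # the surviving key in binary
print(trace)                # alive keys after each of the four basic steps
```

The trace shows k3 leaving after the first step, k1 after the second, nothing changing in the third, and k4 leaving in the fourth.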