Grid Computing: Techniques and Applications
Barry Wilkinson
CONTENTS
9.1 Introduction
Definition of Grid Enabling
Types of Jobs to Grid-Enable
9.2 Implementing Parameter Sweep
9.3 Using an Existing Program on Multiple Grid Computers
Data Partitioning
Deploying Legacy Code
9.4 Message-Passing Programming
Background
Brief Overview of Message-Passing Interface (MPI)
Grid-Enabled MPI
Grid-Enabling MPI Programs
9.5 More Advanced Grid-Enabling
Exposing an Application as a Service
Using Grid Middleware APIs
Higher Level Middleware Independent APIs
9.6 Summary
Further Reading
Bibliography
Self-Assessment Questions
Problems
CHAPTER 9
Grid-Enabling Applications
In previous chapters, we have described how to execute jobs through command-line interfaces and GUI interfaces, but for the most part, the actual structure of the jobs being submitted has been assumed to be ready for the Grid platform. In this chapter, we explore what jobs and applications are suitable, how to modify them if they are not, and how to execute them on a Grid platform.
9.1 INTRODUCTION
However, even that simple definition is not agreed upon by everyone! Sometimes,
definitions have specific constraints. For example, one definition in the literature is
(Sanjeepan et al. 2005):
... In the context of this paper Grid enabling means turning an existing application, installed on a Grid resource, into a service and generating the application-specific user interfaces to use that application through a web portal.
This definition assumes a portal interface and the use of services. Using services as a front-end is one approach to distributed computing that we will look at later, but it is only one way. It is particularly relevant for porting legacy code onto the Grid.
A broader definition that matches our view of Grid-enabling applications is given by (Nolan 2008). How one Grid-enables an application is still an open question in the research domain, without a standard approach. Here we will describe various approaches. First, let us look at the types of applications that are best suited to a Grid computing platform.
Clearly, one can run simple batch jobs, and even interactive programs, on a Grid resource using commands such as the Globus command globusrun-ws, as described in Chapter 6 and Chapter 7. But all this would do is instruct the scheduler to schedule the job, which would be executed in due course on a Grid resource (assuming a scheduler). Although this is perhaps the most common way of using a Grid platform, it does not make the most of the possibilities that exist in a distributed computing platform, nor of the potential collaborative nature of computing that Grid computing can offer. A distributed computing platform such as a Grid offers the possibility of executing multiple programs concurrently and possibly collectively. The programs can be executed on separate computers in the Grid, with the potential of reducing the overall execution time.
We can identify several applications for Grid computing:
Parameter sweep applications
Using multiple computers together
Applications with physically distributed components
Legacy code
In a parameter sweep, a scientist sweeps across a parameter space with different values of the input parameters in search of a solution. There are several reasons for doing a parameter sweep, but in many cases there is no closed mathematical solution or readily computed answer, and human intervention is involved in searching a design space. Examples appear in many areas of science and engineering, especially simulation studies, such as simulations in electrical, mechanical and civil engineering, biology, chemistry, and physics. For example, a scientist might wish to search for a new drug and need to try different formulations that might best fit with a particular protein. In engineering, a design engineer might be studying the effects of different aerodynamic designs on the performance of an aircraft. Sometimes it is an aesthetic design process in which there are many possible alternative designs and a human has to choose. Sometimes it is a learning process in which the design engineer wishes to understand the effects of changing various parameters. Typically, there will be many parameters that can be altered and there might be a vast number of combinations of parameter values. Ideally, some automated way of doing a parameter sweep is needed that includes both specifying the parameter sweep and a way of scheduling the individual sweeps across the Grid platform.
To benefit from using multiple computers together, the computation should be divisible into parts that are each computationally intensive. The ideal parallel program is one that has the least communication. The perfect parallel algorithm would be one with no inter-process communication at all. Such applications are rare, if not impossible, as at least the result of each process needs to be collected, but there are real and important examples that come very close. Monte Carlo algorithms, for example, use random sampling to obtain a solution to difficult mathematical problems such as the integration of high-order functions. The Monte Carlo random samples are not related to each other; the final solution is obtained by combining the results of all the random samples. Parallel algorithms that are easy and obvious to divide into concurrent parts were called embarrassingly parallel by Geoffrey Fox (Wilson, 1995), or are perhaps more aptly called naturally parallel. The implication of embarrassingly parallel computations is that they have minimal inter-process communication, although the actual amount is subjective.
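As a concrete illustration, the following minimal sequential sketch (not taken from the text; the sample count and seed are arbitrary) estimates pi by random sampling. Every sample is independent of every other sample, so the samples could be divided among separate computers and only the per-computer counts combined at the end.

#include <stdio.h>
#include <stdlib.h>

/* Estimate pi by testing whether random points fall inside the unit circle.
   Each sample is independent, which is what makes the method embarrassingly
   (naturally) parallel. */
int main(void)
{
   const long samples = 10000000;
   long i, inside = 0;
   double x, y;

   srand(12345);                          /* fixed seed so the run is repeatable */
   for (i = 0; i < samples; i++) {
      x = (double) rand() / RAND_MAX;
      y = (double) rand() / RAND_MAX;
      if (x * x + y * y <= 1.0)
         inside++;
   }
   /* the only "communication" needed is combining the counts at the end */
   printf("pi is approximately %f\n", 4.0 * (double) inside / samples);
   return 0;
}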
Parallel programming techniques are fully described elsewhere in the context of parallel computers and clusters of locally connected computers, for example in (Wilkinson and Allen, 2005). However, most of these techniques do not map well onto a Grid computing platform without significant modification because of the distributed nature of the Grid. For example, some very important and common application areas have communications that are limited to local inter-process communication, which would appear to be a desirable feature. There are many important examples in most scientific domains, especially in simulations of the physical world. In such problems, the solution space is divided into a mesh of two-dimensional or three-dimensional regions, and communication is usually only required between adjacent regions. In weather forecasting, the atmosphere is divided into cubes and physical factors such as temperature, pressure, and humidity are computed using values at adjacent cells, repeatedly, to simulate the passage of time. Data is only passed between adjacent cells. Unfortunately, although this communication pattern might suit a closely coupled parallel computer, the communications need to be synchronized and happen very frequently. Being local to adjacent regions does not help if the regions are mapped onto widely distributed computers on the Internet that need to operate in synchronism.
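To illustrate the communication pattern (a sketch, not from the text; the mesh size, variable names, and averaging update rule are arbitrary), one iteration of a simple two-dimensional mesh computation updates each interior point from its four neighbors only, which is why a partitioned mesh needs communication only between adjacent regions:

#define NX 100
#define NY 100

/* One iteration of a simple stencil update: each interior point becomes the
   average of its four neighbors, so a region of the mesh only needs values
   held by adjacent regions. */
void update(double newgrid[NX][NY], double oldgrid[NX][NY])
{
   int i, j;
   for (i = 1; i < NX - 1; i++)
      for (j = 1; j < NY - 1; j++)
         newgrid[i][j] = 0.25 * (oldgrid[i-1][j] + oldgrid[i+1][j] +
                                 oldgrid[i][j-1] + oldgrid[i][j+1]);
}

When such a mesh is partitioned across computers, every iteration requires an exchange of boundary values between neighboring partitions, and on a Grid each exchange incurs wide-area latency, which is the difficulty described above.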
Later we will describe the message-passing library standard called MPI, as MPI can be ported onto a Grid for creating the communicating parts of a program. We will also explore how to use a Grid platform in this environment.
One approach is to use Web services as front-ends for both remote resources, such as hardware accelerators, and software components of remote applications.
We shall look at each of the above areas in more detail in subsequent sections.
9.2 IMPLEMENTING PARAMETER SWEEP

Parameter sweep can be achieved simply by submitting multiple job description files, one for each set of parameters, but that is not very efficient. Parameter sweep applications are so important that many research projects on Grid tools have been devoted to making them efficient on a Grid and to providing for them in job description languages. The concept appears explicitly in the RSL-1/RSL-2/JDD job description languages described in Chapter 6, Section 6.2.2, with the count element. For example, inserting:

<count> 5 </count>

in an RSL-2/JDD job description file would cause five instances of the job to be submitted. This in itself would simply cause five identical executables to be submitted as jobs. Four of the five would be pointless unless either the code itself selected actions for each instance, or different input and output files were selected for each instance in the job description file. Job description elements usually can be modified to include a variable that changes for each instance.
Originally, JSDL (version 1) did not have any accommodation for parameter sweep, but JSDL version 1 has since been extended to incorporate features for parameter sweep. The JSDL 1.0 specification allows for extensions, although such extensions limit interoperability. The extensions to JSDL 1.0 were discussed at OGF 19 in 2007 and received further consideration at several subsequent OGF meetings (OGF 23). Two forms of parameter sweep creation were identified: enumeration of values in a list, and numerically related arguments. The following structure may become standardized later.
[Figure: Structure of a JSDL job description extended for parameter sweep. The <JobDefinition> element contains the usual <JobDescription> (with <Application>, <Executable>, and <Argument> elements), followed by a sweep section in which an <Assignment> element holds a <Parameter> selection expression that selects an element of the job description and a list of <Value> elements. The substitution values are applied in sequence to obtain multiple job descriptions, which are then submitted as jobs (namespaces omitted).]
The selection expression is an XPath expression, which provides a way to select an element in an XML document. XPath was mentioned in Chapter 3, Section 3.3.5, for selecting an XML element from an index service XML document, but was not fully explained there. Briefly, suppose the XML document has the form:
<a>
<b>
<c>
</c>
</b>
</a>
the XPath expression to identify the element <c> ... </c> would be /a/b/c.
XPath allows much more expressive forms for selecting an element. For example, suppose there are multiple elements called <c>:
<a>
<b>
<c>
</c>
.
.
.
<c>
</c>
</b>
</a>
The XPath expression to select the third <c> element would be: /a/b/c[3].
To take an example for a parameter sweep, consider the JSDL job:
<jsdl:JobDefinition>
<jsdl:JobDescription>
<jsdl:Application>
<jsdl-posix:POSIXApplication>
<jsdl-posix:Executable>/bin/echo</jsdl-posix:Executable>
<jsdl-posix:Argument>Hello</jsdl-posix:Argument>
<jsdl-posix:Argument>Fred</jsdl-posix:Argument>
</jsdl-posix:POSIXApplication>
</jsdl:Application>
</jsdl:JobDescription>
</jsdl:JobDefinition>
To alter the second argument to be Bob, Alice, and Tom (three sweeps), the code
might be:
<jsdl:JobDefinition>
<jsdl:JobDescription>
<jsdl:Application>
<jsdl-posix:POSIXApplication>
<jsdl-posix:Executable>/bin/echo</jsdl-posix:Executable>
<jsdl-posix:Argument>Hello</jsdl-posix:Argument>
<jsdl-posix:Argument>Fred</jsdl-posix:Argument>
</jsdl-posix:POSIXApplication>
</jsdl:Application>
</jsdl:JobDescription>
<sweep:Sweep>
<sweep:Assignment>
<sweep:Parameter>//jsdl-posix:Argument[2]</sweep:Parameter>
<sweepfunc:Values>
<sweepfunc:Value>Bob</sweepfunc:Value>
<sweepfunc:Value>Alice</sweepfunc:Value>
<sweepfunc:Value>Tom</sweepfunc:Value>
</sweepfunc:Values>
</sweep:Assignment>
</sweep:Sweep>
</jsdl:JobDefinition>
This would create three jobs whose outputs would be:
Hello Bob
Hello Alice
Hello Tom
(or possibly in a different order depending upon how the jobs are scheduled on the machines that run the jobs).
Multiple assignment elements can be used to alter more than one argument in
each sweep. <Assignment> elements can also be nested to create combinations of
arguments. It is left as an exercise to produce such variations (Assignment ??).
9.3 USING AN EXISTING PROGRAM ON MULTIPLE GRID COMPUTERS

9.3.1 Data Partitioning

Perhaps the easiest way to use multiple computers together to solve a problem is to divide its data into parts and have each computer work on a different part, so-called data partitioning. There are problems that particularly lend themselves to this approach, for example the BLAST algorithm used in bioinformatics to make statistical matches between gene sequences (BLAST home page). A BLAST user might submit a sequence query that is then compared to a very large database of known sequences in order to discover relationships or match the sequence to a gene family. The databases can be extremely large (hundreds of megabytes). If there is just one sequence from the user, the database might be partitioned into parts and different computers work on different parts of the database with the same sequence, as illustrated in Figure 9.3. Of course, this assumes that the database can be easily split into parts. There will be a significant cost in sending a large database partition to a different site, but once done the partition can be reused. If the database partitions do not overlap, there will also be communication at the edges of two partitions to match the sequence across the partitions.

[Figure 9.3: Data partitioning for BLAST. The user's sequence is copied to each computer, and each computer compares the sequence with a different partition of the database.]
An alternative approach, if the user or users are submitting many queries, is to submit each query to a different computer holding, or having access to, the whole database, as illustrated in Figure 9.4. This approach is in essence a form of the parameter sweep described earlier, where each input sequence is one sweep. In a distributed system, significant network communication to access the database would result unless the whole database is already present at each site.

[Figure 9.4: Submitting each query sequence to a different computer, each of which has access to the whole database.]
The two simple approaches described for speeding up a BLAST search on a multiprocessor or distributed system do not require rewriting the BLAST application itself. BLAST has also been ported onto Grid platforms, for example Dynamic BLAST (Afgan and Bangalore 2008), which provides a graphical user interface and is integrated with the Globus environment. Another example is GridBlast (Krishnan 2005).
9.3.2 Deploying Legacy Code

For the most part, Grid users want to re-use their existing code, which may be programs originally written for a single computer in C, C++, or even Fortran if really old. The best one could hope for with such programs, without extensive rewriting, is to execute them on a remote compute resource. Some applications may not be available as documented source code and may be pre-packaged by the manufacturer, so rewriting may not even be an option.
One project that addresses porting legacy code onto a Grid is GriddLeS: Grid Enabling Legacy Software (GriddLeS). That project focusses on file handling: it overloads the existing file handling routines and redirects the requests to remote locations if required, as illustrated in Figure 9.5.
[Figure 9.5 GriddLeS: Grid Enabling Legacy Software approach to file handling. Calls such as Read(), Write(), and Close() pass through a file multiplexer, which directs them to the local file system or to remote locations as required. (GriddLeS)]
9.4 MESSAGE-PASSING PROGRAMMING

9.4.1 Background
Computers on a Grid are connected through a wide area network, usually the Internet,
or in the case of very high-performance Grid infrastructures such as the TeraGrid,
through extremely high bandwidth dedicated networks. So to use these computers
collectively suggests a programming model similar to that often used in clusters, i.e.
have processes on different computers communicate through message passing. This
approach is central to cluster computing when the interconnected computers are
physically close and grouped into a cluster as one compute resource. An extension of
this idea is to have each computer execute more than one program or process concur-
rently, and these processes communicate between themselves using message passing.
We usually talk about communicating processes rather than communicating
programs as it is possible to have multiple communicating processes within a single
program.
Message passing is generally done using library routines. The programmer
explicitly inserts these message-passing routines into their code. It is extremely error
prone and requires great care to achieve the desired results. The full treatment of
message-passing programming is outside the scope of this book, but we will
introduce the essential concepts and then describe how they might extend to a Grid
computing platform, which has some superficial similarities to a cluster but has some
significant differences.
First, it is necessary to have a suitable message-passing software environment and libraries available. Perhaps the first highly successful message-passing suite of libraries was PVM (Parallel Virtual Machine), developed at Oak Ridge National Laboratory in the late 1980s by Sunderam (Sunderam 1990) (Geist et al. 1994). PVM became widely used in the early 1990s. It provided a complete, fully implemented set of library routines for message passing.
Subsequently, a standard specification for a very large set of message-passing APIs was developed, called MPI (Message Passing Interface) (Snir et al. 1998), (Gropp, Lusk, and Skjellum 1999), which replaced PVM for message-passing parallel programming in the mid 1990s. Whereas PVM was an implementation of message-passing libraries, MPI only specified the API (routine names, arguments, required actions, and return values) for C, C++, and Fortran. MPI was developed by the MPI Forum, a collective of more than 40 industrial and research organizations, and it has been very widely accepted. As with all standards, the actual implementation was left to others and there are now many implementations of MPI, most of them free. The first version of the MPI standard was finalized in 1994. Version 2 of MPI was introduced in 1997 with a greatly expanded scope, most notably new operations intended to be efficient such as one-sided message passing. MPI version 1 has about 126 routines and MPI version 2 increased the number to about 156. MPI-2 did deprecate some MPI-1 routines and provided better versions, but even so, the standard is daunting. However, in most applications only a few MPI routines are actually needed. It has been suggested that many MPI programs can be written using only about six different routines (see later).
In the following, we will describe MPI, but only in sufficient detail to explain the characteristics and factors important for Grid enabling. Readers who know MPI can skip the following review of the basic features of MPI. Figure 9.6, Figure 9.7, Figure 9.8, Figure 9.9, Figure 9.11, and Figure 9.12 are based upon (Wilkinson and Allen 2005).

9.4.2 Brief Overview of Message-Passing Interface (MPI)
[Figure 9.6 MPI version 1 programming model. A single source file is compiled to suit each processor, and the resulting executables are run as processes 0 to p-1 on the processors. Based upon (Wilkinson and Allen 2005).]
The computers to be used are commonly listed in a so-called machines file, for example:

coit-grid01.uncc.edu
coit-grid02.uncc.edu
coit-grid03.uncc.edu
coit-grid04.uncc.edu
coit-grid05.uncc.edu
Usually the specified machines are connected together in a cluster through a local network. If a machines file is not specified, a default machines file will be used, or it may be that the program will only run on a single computer.
Typically, the MPI implementation uses daemons running on each computer to manage the processes and effect the message passing. These daemons may need to be started before the MPI programs. In LAM-MPI, a specific command called lamboot must be issued by the user. In MPICH-2, the MPI daemons should already be running after the MPI installation, but they can be re-started by the user with a similar command, mpdboot.
MPI programs written in C are typically compiled with a command such as mpicc provided by the implementation. mpicc typically uses the cc compiler, and the corresponding flags (options) are available. Once compiled, the program can be executed. The MPI-1 standard does not specify implementation details at all and left them to the implementer. The actual commands depend upon the implementation of MPI, but the common command is mpirun, where the -np option specifies the number of processes. Hence, a program myProg would be run with four processes using the command:

mpirun -machinefile <machinefilename> -np 4 myProg

if a machines file is specified. Command-line arguments for myProg can be added after myProg.
The MPI-2 standard did make certain recommendations regarding the implementation and introduced the mpiexec command with certain flags. The -n option specifies the number of processes. Hence, the corresponding commands would be:
mpiexec -n 4 myProg
or
mpiexec -machinefile <machinefilename> -n 4 myProg
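As an illustration, a minimal MPI program in which each process reports its position among the processes might be written as follows (a sketch; the variable names are illustrative, and 1 is added to the rank because ranks are numbered from 0):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
   int rank, numprocs;

   MPI_Init(&argc, &argv);                    /* initialize the MPI environment */
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);      /* rank of this process (0 .. p-1) */
   MPI_Comm_size(MPI_COMM_WORLD, &numprocs);  /* total number of processes p */

   printf("I am %d of %d\n", rank + 1, numprocs);

   MPI_Finalize();
   return 0;
}

Run with four processes, the output might be: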
I am 3 of 4
I am 1 of 4
I am 4 of 4
I am 2 of 4
2 Technically, there are two types of MPI communicator, an intracommunicator and an intercommunicator, and a process could have a rank in multiple communicators. More details can be found in (Gropp, Lusk, and Skjellum 1999).
MPI_Finalize();
}
where master() and slave() are procedures to be executed by the master process
and slave process, respectively.
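A minimal sketch of the complete structure being described, in which the rank obtained from MPI_Comm_rank() selects between the two procedures (the stub bodies are placeholders), is:

#include <mpi.h>

void master(void) { /* work performed by the single master process */ }
void slave(void)  { /* work performed by each slave process */ }

int main(int argc, char *argv[])
{
   int myrank;

   MPI_Init(&argc, &argv);
   MPI_Comm_rank(MPI_COMM_WORLD, &myrank);   /* rank within MPI_COMM_WORLD */

   if (myrank == 0)
      master();                              /* process 0 acts as the master */
   else
      slave();                               /* all other processes act as slaves */

   MPI_Finalize();
   return 0;
}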
Message-passing routines.
Data transfer. One of the principal purposes of message passing is to send data from one process to another process to continue the distributed computation. The simplest and most fundamental activity is point-to-point message passing, in which data is carried in a message that is sent from a single process and received at one process. This requires an MPI_Send() routine at the source and an MPI_Recv() routine at the destination, as illustrated in Figure 9.7. In this example, the value contained in the variable x is sent from process 1 to process 2 and y is assigned that value. At the very least, the arguments in the send and receive routines must identify the value being sent and the other process in order to implement the transfer. Usually, the arguments in the send and receive include more details such as the datatype. In MPI, there are six arguments (parameters) defined in the basic MPI send routine and seven arguments in the basic receive routine:
MPI_Send(*buf, count, datatype, dest, tag, comm)
MPI_Recv(*buf, count, datatype, source, tag, comm, *status)
[Figure 9.7 Generic point-to-point message passing using send() and recv() library calls. Process 1 executes MPI_Send(&x, ..., 2, ...) and process 2 executes MPI_Recv(&y, ..., 1, ...); the value of x in process 1 is transferred to y in process 2. Based upon (Wilkinson and Allen 2005).]
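A short, self-contained sketch of the transfer shown in Figure 9.7 (not from the text; it assumes the program is started with at least three processes so that ranks 1 and 2 exist) is:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
   int rank, x, y;
   MPI_Status status;

   MPI_Init(&argc, &argv);
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);

   if (rank == 1) {
      x = 42;                                            /* value to be transferred */
      MPI_Send(&x, 1, MPI_INT, 2, 0, MPI_COMM_WORLD);    /* one int to process 2, tag 0 */
   } else if (rank == 2) {
      MPI_Recv(&y, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &status);  /* from process 1, tag 0 */
      printf("Process 2 received %d\n", y);
   }

   MPI_Finalize();
   return 0;
}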
Broadcast, in which the same data is sent from a root process to all processes in the communicator, is shown in Figure 9.8. The arguments of the MPI broadcast routine are:

MPI_Bcast(*buf, count, datatype, root, comm)

where buf is the address of the send buffer, count the number of items to send, datatype the datatype of each item, root the rank of the root process (the source of the broadcast), and comm the communicator.

[Figure 9.8 Broadcast operation. Based upon (Wilkinson and Allen 2005).]
Scatter, in which different parts of an array held by the root process are distributed to the processes, is shown in Figure 9.9. The arguments for the MPI scatter routine are given below:

MPI_Scatter(*sendbuf,sendcount,sendtype,*recvbuf,recvcount,recvtype,root,comm)

The sendcount refers to the number of items sent to each process. One can see that we now have a large number of arguments. Somewhat strangely, the send and receive types and counts are specified separately and could be different, but almost always sendcount is the same as recvcount and sendtype is the same as recvtype.
The simplest scatter is as illustrated in Figure 9.9, in which one element of an array is sent to each process. A simple extension is to send a fixed number of contiguous elements to each process. In the code below, the size of the send buffer is given by 100 * <number of processes> and 100 contiguous elements are sent to each process:
[Figure 9.9 Simple scatter operation; each process, including the root, calls MPI_Scatter() and receives one element of the root's array. Based upon (Wilkinson and Allen 2005).]
MPI_Scatter(sendbuf,100,MPI_INT,recvbuf,100,MPI_INT,0,MPI_COMM_WORLD);
.
MPI_Finalize(); /* terminate MPI */
}
which is illustrated in Figure 9.10.
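A self-contained sketch of the scatter code being described (the buffer contents and variable names are assumptions) might be:

#include <stdlib.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
   int rank, numprocs, i;
   int *sendbuf = NULL;
   int recvbuf[100];

   MPI_Init(&argc, &argv);
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
   MPI_Comm_size(MPI_COMM_WORLD, &numprocs);

   if (rank == 0) {
      /* the root allocates and fills the send buffer: 100 ints per process */
      sendbuf = (int *) malloc(100 * numprocs * sizeof(int));
      for (i = 0; i < 100 * numprocs; i++)
         sendbuf[i] = i;
   }

   /* every process, including the root, receives its own block of 100 ints */
   MPI_Scatter(sendbuf, 100, MPI_INT, recvbuf, 100, MPI_INT, 0, MPI_COMM_WORLD);

   if (rank == 0)
      free(sendbuf);
   MPI_Finalize();              /* terminate MPI */
   return 0;
}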
Gather is shown in Figure 9.11 and is essentially the opposite of scatter. The arguments for the MPI gather routine are similar to those of the scatter, i.e.:

MPI_Gather(*sendbuf,sendcount,sendtype,*recvbuf,recvcount,recvtype,root,comm)
[Figure 9.11 Gather operation; each process calls MPI_Gather() and the root process collects the send buffers of all processes. Based upon (Wilkinson and Allen 2005).]
The reduce operation combines values contributed by all processes using an operation such as addition, as shown in Figure 9.12.

[Figure 9.12 Reduce operation (addition); each process calls MPI_Reduce() and the values are combined at the root process. Based upon (Wilkinson and Allen 2005).]
MPI also provides explicit synchronization with MPI_Barrier(), in which each process waits at the barrier until all processes have reached it before continuing, as illustrated below.

[Figure: Barrier synchronization; each process calls MPI_Barrier(), waits until all processes have reached the barrier, and then continues.]
Many message-passing routines send data without synchronizing the source and destination processes. As mentioned, MPI_Send() does not wait for the message to be received before the sending process can continue. A synchronous version called MPI_Ssend() exists, which does not return until the matching receive has started and hence combines message passing with synchronization. Collective operations such as MPI_Bcast(), MPI_Gather(), MPI_Scatter(), and MPI_Reduce() have innate synchronization; in general, none of the processes can proceed until all have participated in the operation. Technically, it might be possible to allow sources in a collective operation to send their data and continue, although MPI collective operations do not provide that facility.
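A small sketch (not from the text, and assuming at least two processes) contrasting a synchronous send with explicit barrier synchronization follows; MPI_Ssend() returns only once the matching receive has started, and MPI_Barrier() holds every process until all processes have reached it:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
   int rank, value = 0;
   MPI_Status status;

   MPI_Init(&argc, &argv);
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);

   if (rank == 0) {
      value = 99;
      MPI_Ssend(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);  /* returns only after the matching receive has started */
      printf("Process 0: the matching receive has started\n");
   } else if (rank == 1) {
      MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
   }

   MPI_Barrier(MPI_COMM_WORLD);   /* all processes wait here before continuing */
   MPI_Finalize();
   return 0;
}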
9.4.3 Grid-Enabled MPI

[Figure: Layered structure of MPICH. The MPI interface is implemented on top of an Abstract Device Interface containing the routines that move data; different devices implement this interface for different systems, e.g. the ch_p4 device for workstation clusters and the ch_shmem device for a shared-memory system. The device layer carries out the communication between the processes.]
MPICH-G is a Grid-enabled implementation of MPI built upon MPICH, using Globus services for security, job submission, and data transfer (including file staging). It used GIS (Globus Information System) and MDS (Monitoring and Discovery Service) to obtain information about remote resources. It used GRAM to submit MPI jobs through Globus to remote resources, interfacing with local schedulers. In that regard, MPICH-G created the appropriate job description file using RSL-1, the job description language used at the time. As pointed out earlier, it is necessary for the communicating processes to be running at the same time. To achieve this, the Globus version 2 component called DUROC (Dynamically Updated Request Online Coallocator) was used to coordinate the start-up of processes and to coordinate processes to operate within the single default communicator MPI_COMM_WORLD.
Figure 9.15 shows a conceptual comparison of using MPI and using MPICH-G. In both cases, the same mpirun command is invoked and a machines file is used to identify the machines. MPICH-G may also need additional information, for example the specific job manager name and port number, which could be specified in the machines file or found from a Globus information service (MDS). Other Globus services are also needed as described before. Finally, GRAM is used to submit the job, using a job description file created by MPICH-G from the user input with mpirun. Notice that GSI is used to ensure a secure environment. Originally, MPICH employed the insecure rsh, where information is passed in plain text, although it can be reconfigured to use secure ssh. MPICH-G used a communication component called Nexus, which can manage different communication methods. For improved performance, MPICH-G2, a complete re-implementation of MPICH-G, replaced Nexus with its own communication components (Karonis, Toonen, and Foster 2003).
MPICH-G2 implements MPI version 1.1, which for many applications is sufficient, but there are other issues in its use. It is usually necessary to work with firewalls.
[Figure 9.15 Conceptual comparison of running an MPI program directly and with MPICH-G. With MPI alone, mpirun looks up the machines in a machines file and starts the processes on the compute resources. With MPICH-G, mpirun creates a job description and submits it through Globus services: MDS is used to look up machines and get contact details, GASS stages input files and retrieves output files, GSI provides security, DUROC co-allocates the resources, and GRAM submits the job to the local scheduler of each compute resource.]
Specific ports must be open. Each site will have a separate, independent local job scheduler with many users, and it may be difficult to guarantee that all the MPI processes at different sites will be running at the same time so that they can communicate. The delays of messages in transit between sites are much larger and more variable, so often the programmer will have to redesign the program to accommodate the indeterminate latencies.
Apart from the early MPICH-G/MPICH-G2 project, there have been other projects providing tools to run MPI programs across a geographically distributed Grid that do not necessarily use Globus software, for example the Grid-computing library PACX-MPI (Parallel Computer Extension - MPI) and the open-source GridMPI project. PACX-MPI can encrypt messages using SSL. The GridMPI project avoided a security layer altogether for performance. This project found that MPI programs scale up to communication latencies of about 20 ms, which represents about 500 miles on 1-10 Gbit/s networks. They argue that 500 miles covers the distance between major sites in Japan (where the project was developed), that geographically distributed high-performance computing sites often use their own networks on which security is less of an issue, and that MPI programming is then feasible.
In both GridMPI and PACX-MPI, the idea is not to have to alter the MPI programs at all and simply to run MPI programs that would run on a local cluster. That does not preclude modifying the programs to take into account the geographically distributed nature of the compute resources, and there have been projects to explore how to modify MPI programs.

9.4.4 Grid-Enabling MPI Programs

The most notable issues in running MPI programs on a Grid are the much larger and more variable communication latency, the possibility of failures during long-running jobs, and the heterogeneity of the compute resources.
Mostly, the focus is on ameliorating the effects of latency. One basic concept is to overlap communication with computation while waiting for the communication to complete, which comes directly from regular parallel programming and is illustrated in Figure 9.16. In this example, process n needs to send data in a message to process m. The MPI_Send() routine in process n occurs before the MPI_Recv() routine in process m in time. After the send routine completes all its local actions, it returns and the process starts on subsequent computations before the message has been received. The standard MPI_Send() routine is allowed to return in this manner. (Other MPI routines such as MPI_Ssend() only return after the message is being received and would force both processes to wait for each other.) At some point, it is usually necessary to synchronize the two processes with, say, an MPI_Barrier() routine. One should avoid such routines if at all possible, as they delay processes until all processes reach that point; there are algorithms in parallel programming that try to minimize the synchronization points. In Figure 9.16, it is also possible for the MPI_Recv() to be reached before the MPI_Send() routine, in which case the MPI_Recv() routine waits for the MPI_Send() routine.

[Figure 9.16 Overlapping communication with computation: process n continues computing after MPI_Send() returns, while the message is communicated to the MPI_Recv() in process m.]
The concept can be extended to allow the processes to be decoupled further. For example, in an algorithm that needs to iterate towards a solution and has to send values obtained at each iteration to neighboring processes, it is possible to buffer the data of multiple messages, allowing the source to continue with subsequent iterations. More details of this approach can be found in (Villalobos and Wilkinson, 2008).
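A minimal sketch of overlapping communication with computation using the nonblocking routines MPI_Isend() and MPI_Irecv() (this example is not from the text; the message size, the local_work() function, and the use of exactly two processes are assumptions):

#include <stdio.h>
#include <mpi.h>

#define N 1000

/* some local computation that does not depend on the message in transit */
static double local_work(void)
{
   double sum = 0.0;
   int i;
   for (i = 0; i < 1000000; i++)
      sum += (double) i;
   return sum;
}

int main(int argc, char *argv[])
{
   int rank, i;
   double data[N], result = 0.0;
   MPI_Request request;
   MPI_Status status;

   MPI_Init(&argc, &argv);
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);

   if (rank == 0) {
      for (i = 0; i < N; i++)
         data[i] = (double) i;
      MPI_Isend(data, N, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, &request); /* start the send and return immediately */
      result = local_work();                  /* overlap: compute while the message is in transit */
      MPI_Wait(&request, &status);            /* complete the send before data[] is reused */
   } else if (rank == 1) {
      MPI_Irecv(data, N, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &request); /* post the receive early */
      result = local_work();                  /* overlap on the receiving side too */
      MPI_Wait(&request, &status);            /* data[] is valid only after the wait completes */
   }

   printf("Process %d: local result %f\n", rank, result);
   MPI_Finalize();
   return 0;
}

The nonblocking calls return immediately; the send buffer may only be reused, and the received data only read, after the corresponding MPI_Wait() has completed.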
Other issues in using a Grid include the effects of failures, especially on long-running jobs. Most, if not all, MPI implementations do not handle system failures; if a process fails, normally the whole MPI job will crash. Dealing with faults at the programming level would be very cumbersome, and ideally the system should be fault tolerant. Finally, systems in a Grid environment are very likely to be heterogeneous. Computers of different types are to be expected in a Grid environment, more so than in a local cluster set up at one time. The seemingly trivial factor that some computers store their data in little-endian order (least significant byte first) and others in big-endian order (most significant byte first) needs to be handled.
9.5 MORE ADVANCED GRID-ENABLING

In this final section, we will mention more advanced approaches that are available for deploying an application on a Grid platform.

9.5.1 Exposing an Application as a Service
Grid computing has embraced Web service technology for its implementation, and it is natural to use Web services for the applications also, to make them accessible on a Grid platform. One approach is to wrap the code to produce a Web service, as illustrated in Figure 9.17. Then, the application can be accessed through its URL from anywhere. The service might be accessed from another service or application, or by the user, perhaps through a customized portlet in a portal.
The Web service approach described so far only provides an interface for an application. The application still has to be capable of executing fully in the environment in which it is placed. All input files need to be available, and input and output staging may be needed. In a Grid service environment, it would be possible for services to call other services in a workflow, as described in Chapter 8.
[Figure 9.17 Exposing an application as a service. The application is wrapped to form a Grid service; its Grid service interface can be accessed from a portal or from other services and applications.]
9.5.2 Using Grid Middleware APIs

Some basic actions an application might wish to perform in a Grid environment are to transfer files to and from locations, read from and write to files, and perform standard input and output. More advanced activities might take advantage of Grid middleware operations such as starting and monitoring jobs, and monitoring and discovering Grid resources. One approach is to incorporate Grid middleware APIs into the code for external operations such as file input/output and even for starting new jobs from within the application.
Using Globus APIs. Globus does provide middleware APIs for invoking Globus services and routines, but there is an extremely steep learning curve, even for users with advanced knowledge. There are literally hundreds, if not thousands, of C and Java routines listed at the Globus site, but with no tutorial help or sample usage. As an example, Figure 9.18 shows the code needed just to copy a file from one location to another using Globus APIs.
Using CoG kit APIs. Using CoG kit APIs is at a slightly higher level, as mentioned in Chapter 8. Figure 8.13 in Chapter 8 shows the code to transfer files using CoG kit routines, which is not too difficult but still requires setting up the Globus context.
9.5.3 Higher Level Middleware Independent APIs

Clearly, it is desirable to have a higher level of abstraction than using Globus middleware APIs, not only because of the complexity of these routines but also because Grid middleware changes very often. Globus typically is revised every couple of months and has major changes every year or two. Sometimes, very significant changes occur; Globus version 3 was abandoned after only about a year, and it is extremely difficult to keep up with the changes.
globus_gass_copy_handleattr_set_ftp_attr(&gass_copy_handleattr,&ftp_handleattr);
globus_gass_copy_handle_init (&gass_copy_handle,&gass_copy_handleattr);
if (source_url.scheme_type == GLOBUS_URL_SCHEME_GSIFTP ||
source_url.scheme_type == GLOBUS_URL_SCHEME_FTP ) {
globus_ftp_client_operationattr_init (&source_ftp_attr);
globus_gass_copy_attr_set_ftp (&source_gass_copy_attr,&source_ftp_attr);
}
else {
globus_gass_transfer_requestattr_init (&source_gass_attr,
source_url.scheme);
globus_gass_copy_attr_set_gass(&source_gass_copy_attr,&source_gass_attr);
}
output_file = globus_libc_open ((char*) target,O_WRONLY | O_TRUNC | O_CREAT,
S_IRUSR | S_IWUSR | S_IRGRP |S_IWGRP);
if ( output_file == -1 ) {
printf ("could not open the file \"%s\"\n", target);
return (-1);
}
/* convert stdout to be a globus_io_handle */
if ( globus_io_file_posix_convert
(output_file,0,&dest_io_handle)!=GLOBUS_SUCCESS){
printf ("Error converting the file handle\n");
return (-1);
}
result = globus_gass_copy_register_url_to_handle (&gass_copy_handle,
(char*)source_URL,
&source_gass_copy_attr, &dest_io_handle,my_callback, NULL);
if ( result != GLOBUS_SUCCESS ) {
printf ("error: %s\n", globus_object_printable_to_string(globus_error_get
(result)));
return (-1);
}
globus_url_destroy (&source_url);
return (0);
}
Figure 9.18 Code using Globus APIs to copy a file (C++). Directly from (van Nieuwpoort); also in (Kaiser 2004) and (Kaiser 2005).
Also, Globus is not the only Grid middleware. We have focussed on Globus, but there are efforts around the world to provide Grid middleware, including UNICORE (Uniform Interface to Computing Resources), which started in Germany at about the same time as Globus and continues to be developed, and gLite (Lightweight middleware for Grid computing), which is part of the EGEE (Enabling Grids for E-sciencE) collaboration. To give an indication of the rapid changes that occur, gLite 3.0.2 Update 43 was released May 22, 2008; gLite 3.1 Update 27 was released July 3, 2008, six weeks later.
The concept of higher-level APIs above the Grid middleware is illustrated in Figure 9.19. These higher-level APIs should expose a simple interface that is not tied to a specific version of Grid middleware, or even to a particular Grid middleware family at all. The GAT (Grid Application Toolkit), developed in the 2003-2005 time frame (Kaiser 2004, 2005), followed this approach. As an example, code to copy a file with GAT is shown in Figure 9.20. Subsequently, an effort was made by the Grid community to standardize these APIs, leading to the SAGA (Simple API for Grid Applications) proposal (Kielmann 2006). Reading a file in SAGA is shown in Figure 9.21. One can see that in this code there is no mention of the underlying Grid middleware and the code is applicable to any such middleware.
[Figure 9.19 Higher-level APIs above the Grid middleware. Users and applications call a middleware-independent API layer, which in turn invokes the Grid middleware services (e.g. Globus).]
#include <GAT++.hpp>
GAT::Result RemoteFile::GetFile (GAT::Context context,
std::string source_url, std::string target_url)
{
try {
GAT::File file (context, source_url);
file.Copy (target_url);
}
catch (GAT::Exception const &e) {
std::cerr << e.what() << std::endl;
return e.Result();
}
return GAT_SUCCESS;
}

Figure 9.20 Code to copy a file using the GAT API (C++).
#include <string>
#include <iostream>
#include <saga.h>
int main () {
try {
// open a remote file
saga::file f ("gsiftp://ftp.university.edu/pub/INDEX");
// read data, 100 characters at a time
std::string s;
while ( !(s = f.read (100)).empty() ) {
std::cout << s;
}
} catch (saga::exception const &e) {
std::cerr << e.what() << std::endl;
}
}

Figure 9.21 Reading a file using the SAGA API (C++).
9.6 SUMMARY
This chapter introduced the types of jobs and applications amenable to porting onto a Grid platform and techniques for porting them. The following topics were introduced:
Parameter sweep
Partitioning a problem for execution on multiple computers
Parallel programming with MPI
Grid-enabling MPI
Exposing an application as a Web service
Using higher-level APIs to call Grid middleware services and routines
FURTHER READING
There are many tutorials on MPI. The primary source material can be found in the MPI reference manual (Message Passing Interface Forum 2008) and the Message Passing Interface (MPI) standard home page. For more information on parallel programming, see (Wilkinson and Allen 2005). Grid-enabling applications is still a research topic with no common agreement on approach. There have been workshops focussing on this topic, such as the Workshop on Grid-Enabling Applications at the 15th ACM Mardi Gras Conference, 2008.
BIBLIOGRAPHY
SELF-ASSESSMENT QUESTIONS
The following questions are multiple choice. Unless otherwise noted, there is only one correct answer for each question.
1. Question
a) Answer
PROGRAMMING ASSIGNMENTS
PROBLEMS