A computer cluster consists of a set of loosely connected computers that work together so that in many respects they can be viewed as a single system. The components of a cluster are usually connected to each other through fast local area networks, each node running its own instance of an operating system. Computer clusters emerged as a result of the convergence of a number of computing trends, including the availability of low cost microprocessors, high speed networks, and software for high performance distributed computing. Parallel computing has seen many changes since the days of the highly expensive and proprietary supercomputers. Changes and improvements in performance have also been seen in the area of mainframe computing for many environments. But these compute environments may not be the most cost effective and flexible solution for a problem. Over the past decade, cluster technologies have been developed that allow multiple low cost computers to work in a coordinated fashion to process applications. The economics, performance and flexibility of compute clusters make cluster computing an attractive alternative to centralized computing models and the attendant cost, inflexibility, and scalability issues inherent to these models.

Many enterprises are now looking at clusters of high-performance, low cost computers to provide increased application performance, high availability, and ease of scaling within the data center. Interest in and deployment of computer clusters has largely been driven by the increase in the performance of off-the-shelf commodity computers, high-speed, low-latency network switches and the maturity of the software components. Application performance continues to be of significant concern for various entities including governments, military, education, scientific and now enterprise organizations. This document provides a review of cluster computing, the various types of clusters and their associated applications. This document is a high-level informational document; it does not provide details about various cluster implementations and applications.

1.2 STATEMENT OF PROBLEM
The desire to get more computing horsepower and better reliability by orchestrating a number of low cost commercial off-the-shelf computers has given rise to a variety of architectures and configurations. In light of this, it is necessary to explain the role of cluster computing and its full potential.

1.3 AIM AND OBJECTIVES
The aim and objectives of this study are as follows:
1. To describe the role of cluster computing as a form of computing that delivers high performance and reliability.
2. To explain how cluster computing is designed and implemented to achieve a wide range of applicability and deployment.

1.4 SIGNIFICANCE OF STUDY
This study was carried out to investigate and provide information on the role of cluster computing as a means of fulfilling the desire to get more computing horsepower and better reliability by orchestrating a number of low cost commercial off-the-shelf computers.

CHAPTER TWO LITERATURE REVIEW

2.1 BASIC CONCEPTS
Greg Pfister has stated that clusters were not invented by any specific vendor but by customers who could not fit all their work on one computer, or needed a backup. Pfister estimates the date as some time in the 1960s. The formal engineering basis of cluster computing as a means of doing parallel work of any sort was arguably invented by Gene Amdahl of IBM, who in 1967 published what has come to be regarded as the seminal paper on parallel processing. The history of early computer clusters is more or less directly tied to the history of early networks, as one of the primary motivations for the development of a network was to link computing resources, creating a de facto computer cluster. The computer clustering approach usually (but not always) connects a number of readily available computing nodes (e.g. personal computers used as servers) via a fast local area network. The activities of the computing nodes are orchestrated by "clustering middleware", a software layer that sits atop the nodes and allows the users to treat the cluster as by and large one cohesive computing unit, e.g. via a single system image concept.

2.2 CLUSTER COMPUTING
Cluster computing is best characterized as a number of off-the-shelf commodity computers and resources integrated through hardware, networks, and software to behave as a single computer. Initially, the terms cluster computing and high performance computing were viewed as one and the same. However, the technologies available today have redefined the term cluster computing to extend beyond parallel computing to incorporate load-balancing clusters (for example, web clusters) and high availability clusters. Clusters may also be deployed to address load balancing, parallel processing, systems management, and scalability. Today, clusters are made up of commodity computers usually restricted to a single switch or group of interconnected switches operating at Layer 2 and within a single virtual local-area network (VLAN). Each compute node (computer) may have different characteristics such as single processor or symmetric multiprocessor design, and access to various types of storage devices. The underlying network is a dedicated network made up of high-speed, low-latency switches that may consist of a single switch or a hierarchy of

multiple switches. A growing range of possibilities exists for a cluster interconnection technology. Different variables will determine the network hardware for the cluster. Price-per-port, bandwidth, latency, and throughput are key variables. The choice of network technology depends on a number of factors, including price, performance, and compatibility with other cluster hardware and system software as well as communication characteristics of the applications that will use the cluster. Clusters are not commodities in themselves, although they may be based on commodity hardware. A number of decisions need to be made (for example, what type of hardware the nodes run on, which interconnect to use, and which type of switching architecture to build on) before assembling a cluster range. Each decision will affect the others, and some will probably be dictated by the intended use of the cluster. Selecting the right cluster elements involves an understanding of the application and the necessary resources that include, but are not limited to, storage, throughput, latency, and number of nodes. When considering a cluster implementation, there are some basic questions that can help determine the cluster attributes such that technology options can be evaluated:

1. Will the application be primarily processing a single dataset?
2. Will the application be passing data around or will it generate real-time information?
3. Is the application 32- or 64-bit?

The answers to these questions will influence the type of CPU, memory architecture, storage, cluster interconnect, and cluster network design. Cluster applications are often CPU-bound so that interconnect and storage bandwidth are not limiting factors, although this is not always the case.

2.3 CLUSTER APPROACH
Computer clustering relies on a centralized management approach which makes the nodes available as orchestrated shared servers. It is distinct from other approaches such as peer-to-peer or grid computing, which also use many nodes but with a far more distributed nature. A computer cluster may be a simple two-node system which just connects two personal computers, or may be a very fast supercomputer. A basic approach to building a cluster is that of a Beowulf cluster, which may be built with a few personal computers to produce a cost-effective alternative to traditional high performance computing. An early project that showed the viability of the concept was the 133-node Stone Soupercomputer. The developers used Linux, the Parallel Virtual Machine toolkit and the Message Passing Interface library to achieve high performance at a relatively low cost. Although a cluster may consist of just a few personal computers connected by a simple network, the cluster architecture may also be used to achieve very high levels of performance. The TOP500 organization's semiannual list of the 500 fastest supercomputers often includes many clusters; for example, the world's fastest machine in 2011 was the K computer, which has a distributed-memory cluster architecture.

3.1 CLUSTER SYSTEMS
Computer clusters may be configured for different purposes ranging from general purpose business needs such as web-service support, to computation-intensive scientific calculations. In either case, the cluster may use a high-availability approach. Note that the attributes described below are not exclusive and a "compute cluster" may also use a high-availability approach, etc. "Load-balancing" clusters are configurations in which cluster nodes share computational workload to provide better overall performance. For example, a web server cluster may assign different queries to different nodes, so the overall response time will be optimized. However, approaches to load-balancing may significantly differ among applications, e.g. a high-performance cluster used for scientific computations would balance load with different algorithms from a web-server cluster, which may just use a simple round-robin method by assigning each new request to a different node. "Compute clusters" are used for computation-intensive purposes, rather than handling I/O-oriented operations such as web service or databases. For instance, a compute cluster might support computational simulations of weather or vehicle crashes. Very tightly coupled compute clusters are designed for work that may approach "supercomputing". "High-availability clusters" (also known as failover clusters, or HA clusters) improve the availability of the cluster approach. They operate by having redundant nodes, which are

then used to provide service when system components fail. HA cluster implementations attempt to use redundancy of cluster components to eliminate single points of failure. There are commercial implementations of high-availability clusters for many operating systems. The Linux-HA project is one commonly used free software HA package for the Linux operating system.

3.2 CLUSTER BENEFITS
The main benefits of clusters are scalability, availability, and performance. For scalability, a cluster uses the combined processing power of compute nodes to run cluster-enabled applications such as a parallel database server at a higher performance than a single machine can provide. Scaling the cluster's processing power is achieved by simply adding additional nodes to the cluster. Availability within the cluster is assured as nodes within the cluster provide backup to each other in the event of a failure. In high-availability clusters, if a node is taken out of service or fails, the load is transferred to another node (or nodes) within the cluster. To the user, this operation is transparent as the applications and data running are also available on the failover nodes. An additional benefit comes with the existence of a single system image and the ease of manageability of the cluster. From the user's perspective, the user sees an application resource as the provider of services and applications. The user does not know or care if this resource is a single server, a cluster, or even which node within the cluster is providing services.

These benefits map to the needs of today's enterprise business, education, military and scientific community infrastructures. In summary, clusters provide: scalable capacity for compute, data, and transaction intensive applications, including support of mixed workloads; horizontal and vertical scalability without downtime; the ability to handle unexpected peaks in workload; central system management of a single system image; and 24 x 7 availability.

3.3 DESIGN AND CONFIGURATION One of the issues in designing a cluster is how tightly-coupled the individual nodes may be. For instance, a single computer job may require frequent communication among nodes: this implies that the cluster shares a dedicated network, is densely located, and probably has homogenous nodes. The other extreme is where a computer job uses one or few nodes, and needs little or no inter-node communication, approaching grid computing. In a Beowulf system, the application programs never see the computational nodes (also called slave computers) but only interact with the "Master" which is a specific computer handling the scheduling and management of the slaves. In a typical implementation the Master has two network interfaces, one that communicates with the private Beowulf network for the slaves, the other for the

general purpose network of the organization. The slave computers typically have their own version of the same operating system, and local memory and disk space. However, the private slave network may also have a large and shared file server that stores global persistent data, accessed by the slaves as needed. By contrast, the special purpose 144-node DEGIMA cluster is tuned to running astrophysical N-body simulations using the Multiple-Walk parallel treecode, rather than general purpose scientific computations. Due to the increasing computing power of each generation of game consoles, a novel use has emerged where they are repurposed into high-performance computing (HPC) clusters. Some examples of game console clusters are Sony PlayStation clusters and Microsoft Xbox clusters. Another example of a consumer gaming product is the Nvidia Tesla Personal Supercomputer workstation, which uses multiple graphics accelerator processor chips. Computer clusters have historically run on separate physical computers with the same operating system. With the advent of virtualization, the cluster nodes may run on separate physical computers with different operating systems, which are overlaid with a virtual layer so that they look similar. The cluster may also be virtualized on various configurations as maintenance takes place. An example implementation is Xen as the virtualization manager with Linux-HA.


Task scheduling
When a large multi-user cluster needs to access very large amounts of data, task scheduling becomes a challenge. The MapReduce approach was suggested by Google in 2004, and implementations such as Hadoop have followed. However, given that in a complex application environment the performance of each job depends on the characteristics of the underlying cluster, mapping tasks onto CPU cores and GPU devices presents significant challenges.

Node failure management
When a node in a cluster fails, strategies such as "fencing" may be employed to keep the rest of the system operational. Fencing is the process of isolating a node or protecting shared resources when a node appears to be malfunctioning. There are two classes of fencing methods: one disables the node itself, and the other disallows access to resources such as shared disks.

TYPES OF CLUSTERS
There are several types of clusters, each with specific design goals and functionality. These clusters range from distributed or parallel clusters for

computation intensive or data intensive applications that are used for protein, seismic, or nuclear modeling to simple load-balanced clusters.

High Availability or Failover Clusters
These clusters are designed to provide uninterrupted availability of data or services (typically web services) to the end-user community. The purpose of these clusters is to ensure that a single instance of an application is only ever running on one cluster member at a time, but if and when that cluster member is no longer available, the application will fail over to another cluster member. With a high-availability cluster, nodes can be taken out of service for maintenance or repairs. Additionally, if a node fails, the service can be restored without affecting the availability of the services provided by the cluster. While the application will still be available, there will be a performance drop due to the missing node.
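
The failover behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a real HA stack: the class name `FailoverCluster`, the node names, and the simple health flags are all hypothetical, and a production system would add heartbeats, fencing, and shared storage.

```python
class FailoverCluster:
    """Toy model: one service instance runs on exactly one healthy node."""

    def __init__(self, nodes):
        # Hypothetical node names, each mapped to a health flag (True = up).
        self.order = list(nodes)
        self.health = {name: True for name in self.order}
        self.active = self.order[0]  # node currently running the service

    def mark_failed(self, name):
        """Record a node failure; fail over if that node hosted the service."""
        self.health[name] = False
        if name == self.active:
            self.active = self._next_healthy()

    def mark_repaired(self, name):
        """Return a node to the pool (this sketch does not fail back)."""
        self.health[name] = True

    def _next_healthy(self):
        # Pick the first healthy node in the configured order.
        for name in self.order:
            if self.health[name]:
                return name
        raise RuntimeError("no healthy node left to host the service")
```

In this sketch the service only moves when the node hosting it fails, so the failure of a standby node leaves the active instance untouched.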

Unlike distributed or parallel processing clusters, high-availability clusters seamlessly and transparently integrate existing standalone, non-cluster-aware applications into a single virtual machine, which allows the network to grow effortlessly to meet increased business demands.

Cluster-Aware and Cluster-Unaware Applications
Cluster-aware applications are designed specifically for use in a clustered environment. They know about the existence of other nodes and are able to communicate with them. A clustered database is one example of such an application. Instances of a clustered database run on different nodes and have to notify other instances if they need to lock or modify some data. Cluster-unaware applications do not know if they are running in a cluster or on a single node. The existence of a cluster is completely transparent for such applications, and some additional software is usually needed to set up a cluster. A web server is a typical cluster-unaware application. All servers in the cluster have the same content, and the client does not care which server provides the requested content.

Load Balancing Cluster
This type of cluster distributes incoming requests for resources or content among multiple nodes running the same programs or having the same content. Every node in the cluster is able to handle requests for the same content or application. If a

node fails, requests are redistributed between the remaining available nodes. This type of distribution is typically seen in a web-hosting environment.
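
As a rough sketch of the request distribution just described, the Python below rotates requests across nodes in round-robin order and skips nodes that have been marked as failed. The class name `RoundRobinBalancer` and the node names are hypothetical, and real load balancers usually add health checks and weighting that are left out here.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Assign each new request to the next available node in turn."""

    def __init__(self, nodes):
        self.nodes = list(nodes)        # hypothetical node names
        self.failed = set()             # nodes currently out of service
        self._ring = cycle(self.nodes)  # endless rotation over the nodes

    def mark_failed(self, node):
        self.failed.add(node)

    def mark_available(self, node):
        self.failed.discard(node)

    def pick(self):
        """Return the node that should serve the next request."""
        # Look at each node at most once so the loop always terminates.
        for _ in range(len(self.nodes)):
            node = next(self._ring)
            if node not in self.failed:
                return node
        raise RuntimeError("no available nodes")


balancer = RoundRobinBalancer(["web1", "web2", "web3"])
first_four = [balancer.pick() for _ in range(4)]
balancer.mark_failed("web2")  # later requests go only to web1 and web3
after_failure = [balancer.pick() for _ in range(3)]
```

Because every node holds the same content, the balancer needs no knowledge of the request itself; it only tracks which nodes are still available.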

Both the high availability and load-balancing cluster technologies can be combined to increase the reliability, availability, and scalability of application and data resources that are widely deployed for web, mail, news, or FTP services.

Parallel/Distributed Processing Clusters
Traditionally, parallel processing was performed by multiple processors in a specially designed parallel computer. These are systems in which multiple processors share a single memory and bus interface within a single computer. With the advent of high speed, low-latency switching technology, computers can be interconnected to form a parallel-processing cluster. These types of cluster increase availability, performance, and scalability for applications, particularly computationally or data intensive tasks. A parallel cluster is a system that uses a number of nodes to simultaneously solve a specific computational or data-mining task. Unlike the load balancing or high-availability cluster that distributes requests/tasks to nodes where a node processes the entire request, a parallel environment will divide the request into multiple subtasks that are distributed to multiple nodes within the cluster for processing. Parallel clusters are typically used for CPU-intensive analytical applications, such as mathematical computation, scientific analysis (weather forecasting, seismic analysis, etc.), and financial data analysis. One of the more common cluster designs is the Beowulf class of clusters. A Beowulf cluster can be defined as a number of systems whose collective processing capabilities are simultaneously applied to a specific technical, scientific, or business application. Each individual computer is referred to as a "node" and each node communicates with other nodes within a cluster across standard Ethernet technologies (10/100 Mbps, GigE, or 10GbE). Other high-speed interconnects such as Myrinet, Infiniband, or Quadrics may also be used.
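
The idea of dividing one request into subtasks that are processed in parallel and then combined can be illustrated with a short Python sketch. It rests on several assumptions: worker threads on one machine stand in for cluster nodes, the function names are hypothetical, and a real parallel cluster would typically distribute the chunks with a message passing library such as MPI or PVM.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, parts):
    """Divide data into `parts` nearly equal contiguous chunks."""
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks receive one additional element.
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def sum_of_squares(chunk):
    """The subtask each worker (stand-in for a node) runs on its chunk."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, workers=4):
    """Split the job, run the subtasks concurrently, combine the partials."""
    chunks = split(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(sum_of_squares, chunks))
    return sum(partials)


total = parallel_sum_of_squares(list(range(10)), workers=3)
```

The final `sum(partials)` step corresponds to gathering partial results from the nodes; for jobs with more communication between subtasks, that exchange is where the interconnect starts to matter.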

3.5 CLUSTER COMPONENTS
The basic building blocks of clusters are broken down into multiple categories: the cluster nodes, cluster operating system, network switching hardware and the node/switch interconnect. Significant advances have been accomplished over the past five years to improve the performance of both the compute nodes and the underlying switching infrastructure.

3.6 CLUSTER NODES
Node technology has migrated from the conventional tower cases to single rack-unit multiprocessor systems and blade servers that provide a much higher processor density within a decreased area. Processor speeds and server architectures have increased in performance, and solutions now provide options for either 32-bit or 64-bit processor systems. Additionally, memory performance as well as hard-disk access speeds and storage capacities have also increased. It is interesting to note that even though performance is growing exponentially in some cases, the cost of these technologies has dropped considerably. As shown in the Figure below, node participation in the cluster falls into one of two responsibilities: master (or head) node and compute (or slave) nodes. The master node is the unique server in cluster systems. It is responsible for running the file system and also serves as the key system for clustering middleware to route

processes, duties, and monitor the health and status of each slave node. A compute (or slave) node within a cluster provides the cluster a computing and data storage capability. These nodes are derived from fully operational, standalone computers that are typically marketed as desktop or server systems that, as such, are off-the-shelf commodity systems.

Figure 3 Cluster Nodes

3.7 CLUSTER APPLICATION Parallel applications exhibit a wide range of communication behaviors and impose various requirements on the underlying network. These may be unique to a specific application, or an application category depending on the requirements of the computational processes.

Some problems require the high bandwidth and low-latency capabilities of today's low-latency, high throughput switches using 10GbE, Infiniband or Myrinet. Other application classes perform effectively on commodity clusters and will not push the bounds of the bandwidth and resources of these same switches. Many applications and the messaging algorithms used fall in between these two ends of the spectrum. Currently, there are three primary categories of applications that use parallel clusters: compute intensive, data or input/output (I/O) intensive, and transaction intensive. Each of these has its own set of characteristics and associated network requirements. Each has a different impact on the network and is in turn impacted differently by the architectural characteristics of the underlying network. The following subsections describe each application type.

Compute Intensive Application
Compute intensive is a term that applies to any computer application that demands a lot of computation cycles (for example, scientific applications such as meteorological prediction). These types of applications are very sensitive to end-to-end message latency. This latency sensitivity arises either because the processors have to wait for instruction messages or because transmitting results data between nodes takes a long time. In general, the more time spent idle waiting for an instruction or for results data, the longer it takes to complete the application. Some compute-intensive applications may also be graphic intensive. Graphic intensive is a term that applies to any application that demands a lot of computational cycles where the end result is the delivery of significant information for

the development of graphical output such as ray-tracing applications. These types of applications are also sensitive to end-to-end message latency. The longer the processors have to wait for instruction messages or the longer it takes to send resulting data, the longer it takes to present the graphical representation of the resulting data.

Data or I/O Intensive Applications
Data intensive is a term that applies to any application that places high demands on attached storage facilities. Performance of many of these applications is impacted by the quality of the I/O mechanisms supported by current cluster architectures, the bandwidth available for network attached storage, and, in some cases, the performance of the underlying network components at both Layer 2 and 3. Data-intensive applications can be found in the area of data mining, image processing, and genome and protein science applications. The movement to parallel I/O systems continues in order to improve the I/O performance of many of these applications.

Transaction Intensive Applications
Transaction intensive is a term that applies to any application that has a high level of interactive transactions between an application resource and the cluster resources.

There are three main concerns for cluster applications: message latency, CPU utilization, and throughput. Each of these plays an important part in improving or impeding application performance. This section describes each of these issues and their associated impact on application performance.

MESSAGE LATENCY
Message latency is defined as the time it takes to send a zero-length message from one processor to another (measured in microseconds). The lower the latency for some application types, the better. Message latency is made up of the aggregate latency incurred at each element within the cluster network, including within the cluster nodes themselves. Although network latency is often the focus, the protocol processing latency of message passing interface (MPI) and TCP processes within the host itself is typically larger. Throughput of today's cluster nodes is impacted by protocol processing, both for TCP/IP processing and the MPI. To maintain cluster stability, node synchronization, and data sharing, the cluster uses message passing technologies such as Parallel Virtual Machine (PVM) or MPI. TCP/IP stack processing is a CPU-intensive task that limits performance within high speed networks. As CPU performance has increased and new techniques such as TCP offload engines (TOE) have been introduced, PCs are now able to drive the bandwidth levels higher to a point where we see traffic levels reaching near theoretical maximum for TCP/IP on Gigabit Ethernet and near bus speeds for PCI-X based systems when using 10 Gigabit Ethernet. These high-bandwidth capabilities will continue to grow as processor speeds increase and more vendors build network adapters to the PCI-Express specification. Host stack latency has been reduced somewhat through the implementation of TOE, and further development of combined TOE and Remote Direct Memory Access (RDMA) technologies is under way that will significantly reduce the protocol processing in the host.
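
A simple first-order model can make the relationship between latency and bandwidth concrete. The Python sketch below assumes that end-to-end latency is the sum of the latency at each element on the path, and that the time to deliver a message is that latency plus the payload size divided by the bandwidth. The additive model, the function names, and the numeric values are illustrative assumptions rather than measurements of any real cluster.

```python
def path_latency_us(element_latencies_us):
    """Aggregate latency: the sum of the latency at each element (in µs)."""
    return sum(element_latencies_us)


def transfer_time_us(size_bytes, latency_us, bandwidth_gbps):
    """Estimated delivery time in µs: latency plus serialization time.

    For a zero-length message this reduces to the message latency itself,
    matching the definition given in the text.
    """
    bits = size_bytes * 8
    # bits / (Gbit/s * 1e9) gives seconds; multiplying by 1e6 gives µs.
    serialization_us = bits / (bandwidth_gbps * 1e3)
    return latency_us + serialization_us


# Hypothetical path: sending host stack, one switch, receiving host stack.
latency = path_latency_us([20.0, 5.0, 20.0])
small = transfer_time_us(64, latency, bandwidth_gbps=1.0)
large = transfer_time_us(1_000_000, latency, bandwidth_gbps=1.0)
```

With these assumed numbers the small message is dominated by latency, while the large one is dominated by bandwidth, which is one way to see why compute-intensive applications that exchange many short messages are so sensitive to latency.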


High-performance cluster computing is enabling a new class of computationally intensive applications that are solving problems that were previously cost prohibitive for many enterprises. The use of commodity computers collaborating to resolve highly complex, computationally intensive tasks has broad application across several industry verticals such as chemistry or biology, quantum physics, petroleum exploration, crash test simulation, CG rendering, and financial risk analysis. However, cluster computing pushes the limits of server architectures, computing, and network performance. Due to the economics of cluster computing and the flexibility and high performance offered, cluster computing has made its way into mainstream enterprise data centers using clusters of various sizes. As clusters become more popular and more pervasive, careful consideration of the application requirements and what they translate to in terms of network characteristics becomes critical to the design and delivery of an optimal and reliable solution. Knowledge of how the application uses the cluster nodes and how the characteristics of the application impact and are impacted by the underlying network is critically important. The selection of the node interconnects and underlying cluster network switching technologies is as critical as the selection of the cluster nodes and operating system. A scalable and modular networking solution is critical, not only to provide incremental connectivity but also to provide incremental bandwidth options as the cluster grows. The ability to use advanced technologies within the same networking platform, such as 10 Gigabit Ethernet, provides new connectivity options and increases bandwidth while providing investment protection. The technologies associated with cluster computing, including host protocol stack-processing and interconnect technologies, are rapidly evolving to meet the demands of current, new, and emerging applications. Much progress has been made in the development of low-latency switches, protocols, and standards that efficiently and effectively use network hardware components.

