ABSTRACT

The RAIN project is a research collaboration between Caltech and NASA-JPL on distributed computing and data storage systems for future space-borne missions. The goal of the project is to identify and develop key building blocks for reliable distributed systems built with inexpensive off-the-shelf components.

The RAIN platform consists of a heterogeneous cluster of computing and/or storage nodes connected via multiple interfaces to networks configured in fault-tolerant topologies. The RAIN software components run in conjunction with operating system services and standard network protocols. Through software-implemented fault tolerance, the system tolerates multiple node, link, and switch failures, with no single point of failure.

The RAIN technology has been transferred to Rainfinity, a start-up company focusing on creating clustered solutions for improving the performance and availability of Internet data centers.

INTRODUCTION

RAIN technology originated in a research project at the California Institute of Technology (Caltech), in collaboration with NASA's Jet Propulsion Laboratory and the Defense Advanced Research Projects Agency (DARPA). The name of the original research project was RAIN, which stands for Reliable Array of Independent Nodes. The main purpose of the RAIN project was to identify key software building blocks for creating reliable distributed applications using off-the-shelf hardware. The focus of the research was on high-performance, fault-tolerant and portable clustering technology for space-borne computing. Led by Caltech professor Shuki Bruck, the RAIN research team in 1998 formed a company called Rainfinity. Rainfinity, located in Mountain View, Calif., is already shipping its first commercial software package derived from the RAIN technology, and company officials plan to release several other Internet-oriented applications.

The RAIN project was started four years ago at Caltech to create an alternative to the expensive, special-purpose computer systems used in space missions. The Caltech researchers wanted to put together a highly reliable and available computer system by distributing processing across many low-cost commercial hardware and software components. To tie these components together, the researchers created RAIN software, which has three components:

1. A component that stores data across distributed processors and retrieves it even if some of the processors fail.

2. A communications component that creates a redundant network between multiple processors and supports a single, uniform way of connecting to any of the processors.

3. A computing component that automatically recovers and restarts applications if a processor fails.
Diagram

Myrinet switches provide the high-speed cluster message-passing network used for passing messages between compute nodes and for I/O. The Myrinet switches have a few counters that can be accessed over an Ethernet connection to the switch. These counters can be read to monitor the health of the connections, cables, and so on. The following information refers to the 16-port switches, the Clos-64 switches, and the Myrinet-2000 switches.
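As a purely hypothetical illustration of how these counters could be used, the sketch below polls a status page over the switch's Ethernet management connection and flags ports whose error counters grew between two samples. The URL and the counter format are placeholders invented for this sketch, not the actual Myrinet interface.

import time
import urllib.request

SWITCH_STATUS_URL = "http://myrinet-switch.local/counters"   # placeholder address

def read_counters():
    # Fetch and parse per-port error counters; the "port<N> badCrc=<count>"
    # line format is an assumption made for this sketch.
    with urllib.request.urlopen(SWITCH_STATUS_URL, timeout=5) as resp:
        text = resp.read().decode()
    counters = {}
    for line in text.splitlines():
        if "badCrc=" in line:
            port = line.split()[0]
            counters[port] = int(line.split("badCrc=")[1])
    return counters

baseline = read_counters()
time.sleep(60)                        # sample again one minute later
for port, errors in read_counters().items():
    if errors > baseline.get(port, 0):
        print(f"{port}: CRC error count rose to {errors}; check the cable or connection")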

ServerNet is a switched-fabric communications link primarily used in proprietary computers made by Tandem Computers, Compaq, and HP. Its features include good scalability, clean fault containment, error detection, and failover.

The ServerNet architecture specification defines a connection between nodes, either processor nodes or high-performance I/O nodes such as storage devices. Tandem Computers developed the original ServerNet architecture and protocols for use in its own proprietary computer systems starting in 1992, and released the first ServerNet systems in 1995.

Early attempts to license the technology and interface chips to other companies failed, due in part to a disconnect between the culture of selling complete hardware/software/middleware computer systems and that needed for selling and supporting chips and licensing technology.

A follow-on development effort ported the Virtual Interface Architecture to ServerNet with PCI interface boards connecting personal computers. InfiniBand directly inherited many ServerNet features. After 25 years, systems based on the ServerNet architecture still ship today.

ORIGIN

1. RAIN technology was developed by the California Institute of Technology, in collaboration with NASA's Jet Propulsion Laboratory and DARPA.

2. The name of the original research project was RAIN, which stands for Reliable Array of Independent Nodes.

3. The RAIN research team in 1998 formed a company called Rainfinity.

ARCHITECTURE

The RAIN technology incorporates a number of unique innovations as its core modules:

Reliable transport ensures reliable communication between the nodes in the cluster. This transport has a built-in acknowledgement scheme that ensures reliable packet delivery. It transparently uses all available network links to reach the destination. When it fails to do so, it alerts the upper layer, therefore functioning as a failure detector. This module is portable to different computer platforms, operating systems, and networking environments.

Consistent global state sharing protocol provides consistent group membership, optimized information distribution, and distributed group decision making for a RAIN cluster. This module is at the core of a RAIN cluster. It enables efficient group communication among the computing nodes and ensures that they operate together without conflict.
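As an illustration of the reliable transport module described above, the following sketch retries a packet over every available link with per-packet acknowledgements and reports a suspected peer failure to the upper layer once all links are exhausted. The class and method names are illustrative rather than Rainfinity's API, and the lossy Link class merely stands in for a real network interface.

import random

class Link:
    # Toy stand-in for one network interface that reaches the peer node.
    def __init__(self, name, loss=0.3):
        self.name, self.loss = name, loss

    def send_and_wait_ack(self, packet, timeout=0.5):
        # A real implementation would transmit and wait up to `timeout` for an
        # acknowledgement; here we simply simulate a lossy link.
        return random.random() > self.loss

class ReliableTransport:
    def __init__(self, links, on_peer_failure, retries_per_link=3):
        self.links = links                      # every interface that reaches the peer
        self.on_peer_failure = on_peer_failure  # upper-layer failure-detector hook
        self.retries_per_link = retries_per_link

    def send(self, packet):
        for link in self.links:                 # transparently try all available links
            for _ in range(self.retries_per_link):
                if link.send_and_wait_ack(packet):
                    return True                 # delivery acknowledged
        self.on_peer_failure()                  # all links failed: alert the upper layer
        return False

transport = ReliableTransport(
    links=[Link("eth0"), Link("eth1")],
    on_peer_failure=lambda: print("peer suspected down"),
)
print("delivered:", transport.send(b"hello"))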
Always On IP maintains pools of "always-available" virtual IPs. These virtual IPs are simply logical addresses that can move from one node to another for load sharing or fail-over. Usually a pool of virtual IPs is created for each subnet that the RAIN cluster is connected to. A pool can consist of one or more virtual IPs. Always On IP guarantees that all virtual IP addresses representing the cluster are available as long as at least one node in the cluster is operational. In other words, when a physical node fails in the cluster, its virtual IP is taken over by another healthy node in the cluster.
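A minimal sketch of this fail-over behaviour is shown below, assuming a simple round-robin assignment of virtual IPs to nodes. The node and address names are made up, and a real implementation would also announce each move on the network (for example with gratuitous ARP), which is omitted here.

class VirtualIPPool:
    def __init__(self, nodes, virtual_ips):
        self.nodes = list(nodes)                # currently healthy cluster nodes
        self.assignment = {}                    # virtual IP -> owning node
        for i, vip in enumerate(virtual_ips):
            self.assignment[vip] = self.nodes[i % len(self.nodes)]

    def node_failed(self, failed):
        # Reassign every virtual IP owned by the failed node to a survivor,
        # so all virtual IPs stay reachable while any node remains healthy.
        self.nodes.remove(failed)
        if not self.nodes:
            raise RuntimeError("no healthy node left to host the virtual IPs")
        for j, (vip, owner) in enumerate(sorted(self.assignment.items())):
            if owner == failed:
                self.assignment[vip] = self.nodes[j % len(self.nodes)]

pool = VirtualIPPool(["node-a", "node-b", "node-c"],
                     ["10.0.0.10", "10.0.0.11", "10.0.0.12"])
pool.node_failed("node-b")                      # node-b's virtual IP moves to a survivor
print(pool.assignment)                          # every virtual IP still has an owner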

Local and global fault monitors watch, on a continuous or event-driven basis, the critical resources within and around the cluster: network connections, Rainfinity or other applications residing on the nodes, and remote nodes or applications. Fault monitoring is an integral part of the RAIN technology, guaranteeing the healthy operation of the cluster.

Diagram

FEATURES OF RAIN

1. Communication
   i) Bundled interfaces
   ii) Link monitoring
   iii) Fault-tolerant interconnect topologies
        - The Problem
        - A Naïve Approach
        - Diameter Construction (dc = 2)

2. Data Storage

3. Group Membership
   - Token Mechanism
   - Aggressive Failure Detection
   - Conservative Failure Detection
   - Uniqueness of Tokens
   - 911 Mechanism
   - Token Regeneration
   - Dynamic Scalability
   - Link Failures and Transient Failures

1 - Communication

As the network is frequently a single point of failure, RAIN provides fault tolerance in the network through the following mechanisms:

i) Bundled interfaces: Nodes are permitted to have multiple interface cards. This not only adds fault tolerance to the network, but also gives improved bandwidth.

ii) Link monitoring: To correctly use multiple paths between nodes in the presence of faults, we have developed a link-state monitoring protocol that provides a consistent history of the link state at each endpoint (a sketch of this bookkeeping follows the list).

iii) Fault-tolerant interconnect topologies: Network partitioning is always a problem when a cluster of computers must act as a whole. We have designed network topologies that are resistant to partitioning as network elements fail.
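The sketch below shows the kind of bookkeeping behind link monitoring as described in (ii): each endpoint watches heartbeats on every link and records a history of up/down transitions. The heartbeat-and-timeout scheme is an assumption made for illustration; the actual RAIN protocol additionally ensures that both endpoints of a link agree on this history.

class LinkMonitor:
    def __init__(self, timeout=1.0):
        self.timeout = timeout
        self.last_heartbeat = None
        self.up = False
        self.history = []                 # list of (time, "UP" | "DOWN") transitions

    def heartbeat(self, now):
        # Call whenever a heartbeat arrives on the link.
        if not self.up:
            self.up = True
            self.history.append((now, "UP"))
        self.last_heartbeat = now

    def poll(self, now):
        # Call periodically; marks the link down once heartbeats stop arriving.
        if self.up and now - self.last_heartbeat > self.timeout:
            self.up = False
            self.history.append((now, "DOWN"))

mon = LinkMonitor(timeout=1.0)
for t in (0.0, 0.5, 1.0):                 # heartbeats arriving regularly
    mon.heartbeat(t)
mon.poll(2.5)                             # silence longer than the timeout
print(mon.history)                        # [(0.0, 'UP'), (2.5, 'DOWN')]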

The Problem:

We look at the following problem: given n switches of degree ds connected in a ring, what is the best way to connect n compute nodes of degree dc to the switches so as to minimize the possibility of partitioning the compute nodes when switch failures occur? Figure 3 illustrates the problem.

Figure 3
A Naïve Approach:

At first glance, Figure 4a may seem to be a solution to our problem. In this construction we simply connect the compute nodes to the nearest switches in a regular fashion. If we use this approach, we are relying entirely on the fault tolerance of the switching network. A ring is 1-fault-tolerant for connectivity, so we can lose one switch without upset. A second switch failure can partition the switches, and thus the compute nodes, as in Figure 4b. This prompts the study of whether we can use the multiple connections of the compute nodes to make them more resistant to partitioning. In other words, we want a construction where the connectivity of the nodes is maintained even after the switch network has become partitioned.

Diameter Construction (dc = 2):
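The original text gives only the heading for this construction, so the following sketch is an illustration under an assumption: each compute node is attached to two roughly diametrically opposite switches on the ring (the exact attachment rule used in the RAIN work may differ). For every pair of switch failures, the script checks whether the compute nodes that still have a live switch remain connected, and compares this with the naive nearest-switch attachment.

from itertools import combinations

def build_graph(n, attach):
    # Switches s0..s(n-1) form a ring; compute node ci attaches to attach(i, n).
    adj = {f"s{i}": set() for i in range(n)}
    adj.update({f"c{i}": set() for i in range(n)})
    for i in range(n):
        a, b = f"s{i}", f"s{(i + 1) % n}"
        adj[a].add(b)
        adj[b].add(a)
    for i in range(n):
        for s in attach(i, n):
            adj[f"c{i}"].add(f"s{s}")
            adj[f"s{s}"].add(f"c{i}")
    return adj

def partitioned(adj, failed):
    # True if the compute nodes that still have a live switch are split into
    # more than one connected component after the switches in `failed` die.
    dead = {f"s{i}" for i in failed}
    live = set(adj) - dead
    survivors = {v for v in live if v.startswith("c") and adj[v] - dead}
    if not survivors:
        return False
    seen, stack = set(), [next(iter(survivors))]
    while stack:
        v = stack.pop()
        if v in seen:
            continue
        seen.add(v)
        stack.extend(w for w in adj[v] if w in live)
    return not survivors <= seen

naive = lambda i, n: (i, (i + 1) % n)            # nearest-switch attachment
diameter = lambda i, n: (i, (i + n // 2) % n)    # diametrically opposite switches

n = 8
pairs = list(combinations(range(n), 2))
for name, attach in (("naive", naive), ("diameter", diameter)):
    adj = build_graph(n, attach)
    bad = sum(partitioned(adj, pair) for pair in pairs)
    print(f"{name}: {bad} of {len(pairs)} double switch failures partition the compute nodes")

In this simulation with n = 8, the naive attachment is partitioned by every pair of non-adjacent switch failures (20 of 28 cases), while the diameter-style attachment keeps the surviving compute nodes connected in all 28 cases.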

2 - Data Storage

Fault tolerance in data storage over multiple disks is achieved through redundant storage schemes. Novel error-correcting codes have been developed for this purpose. These are array codes that encode and decode using simple XOR operations. Traditional RAID codes generally allow only mirroring or parity as options. Array codes exhibit optimality in the storage requirements as well as in the number of update operations needed. Although some of the original motivations for these codes come from traditional RAID systems, these schemes apply equally well to partitioning data over disks on distinct nodes, or even partitioning data over remote geographic locations.
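To illustrate the XOR-only encoding and decoding, here is a minimal sketch of the single-parity special case: one parity block protects the data blocks spread across nodes, and any one lost block can be rebuilt by XOR-ing the remaining blocks. The actual RAIN array codes are more elaborate and tolerate multiple simultaneous erasures.

from functools import reduce

def xor_blocks(blocks):
    # Byte-wise XOR of equal-length blocks.
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

def encode(data_blocks):
    # Return the data blocks plus one XOR parity block.
    return list(data_blocks) + [xor_blocks(data_blocks)]

def recover(stored, lost_index):
    # Rebuild the block at lost_index from all the other surviving blocks.
    others = [b for i, b in enumerate(stored) if i != lost_index]
    return xor_blocks(others)

blocks = [b"node", b"rain", b"data"]      # equal-sized blocks, one per storage node
stored = encode(blocks)                   # three data blocks plus one parity block
assert recover(stored, 1) == b"rain"      # any single lost block can be rebuilt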

3 - Group Membership
