ABSTRACT

The RAIN project is a research collaboration between Caltech and NASA-JPL on distributed computing and data storage systems for future space-borne missions. The goal of the project is to identify and develop key building blocks for reliable distributed systems built with inexpensive off-the-shelf components.

The RAIN platform consists of a heterogeneous cluster of computing and/or storage nodes connected via multiple interfaces to networks configured in fault-tolerant topologies. The RAIN software components run in conjunction with operating system services and standard network protocols. Through software-implemented fault tolerance, the system tolerates multiple node, link, and switch failures, with no single point of failure.

The RAIN technology has been transferred to Rainfinity, a start-up company focusing on creating clustered solutions for improving the performance and availability of Internet data centers.

INTRODUCTION

RAIN technology originated in a research project at the California Institute of Technology (Caltech), in collaboration with NASA's Jet Propulsion Laboratory and the Defense Advanced Research Projects Agency (DARPA). The name of the original research project was RAIN, which stands for Reliable Array of Independent Nodes. The main purpose of the RAIN project was to identify key software building blocks for creating reliable distributed applications using off-the-shelf hardware. The focus of the research was on high-performance, fault-tolerant and portable clustering technology for space-borne computing. Led by Caltech professor Shuki Bruck, the RAIN research team in 1998 formed a company called Rainfinity. Rainfinity, located in Mountain View, Calif., is already shipping its first commercial software package derived from the RAIN technology, and company officials plan to release several other Internet-oriented applications.

The RAIN project was started four years ago at Caltech to create an alternative to the expensive, special-purpose computer systems used in space missions. The Caltech researchers wanted to put together a highly reliable and available computer system by distributing processing across many low-cost commercial hardware and software components. To tie these components together, the researchers created RAIN software, which has three components:

1. A component that stores data across distributed processors and retrieves it even if some of the processors fail.

2. A communications component that creates a redundant network between multiple processors and supports a single, uniform way of connecting to any of the processors.

3. A computing component that automatically recovers and restarts applications if a processor fails.
Diagram

Myrinet switches provide the high-speed cluster message-passing network used for passing messages between compute nodes and for I/O. The Myrinet switches have a few counters that can be accessed over an Ethernet connection to the switch. These counters can be read to monitor the health of the connections, cables, and so on. The following information refers to the 16-port switches, the Clos-64 switches, and the Myrinet-2000 switches.
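As a purely hypothetical illustration of how these counters could be used, the sketch below polls a status page over the switch's Ethernet management connection and flags ports whose error counters grew between two samples. The URL and the counter format are placeholders invented for this sketch, not the actual Myrinet interface.

import time
import urllib.request

SWITCH_STATUS_URL = "http://myrinet-switch.local/counters"   # placeholder address

def read_counters():
    # Fetch and parse per-port error counters; the "port<N> badCrc=<count>"
    # line format is an assumption made for this sketch.
    with urllib.request.urlopen(SWITCH_STATUS_URL, timeout=5) as resp:
        text = resp.read().decode()
    counters = {}
    for line in text.splitlines():
        if "badCrc=" in line:
            port = line.split()[0]
            counters[port] = int(line.split("badCrc=")[1])
    return counters

baseline = read_counters()
time.sleep(60)                        # sample again one minute later
for port, errors in read_counters().items():
    if errors > baseline.get(port, 0):
        print(f"{port}: CRC error count rose to {errors}; check the cable or connection")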

ServerNet is a switched-fabric communications link primarily used in proprietary computers made by Tandem Computers, Compaq, and HP. Its features include good scalability, clean fault containment, error detection, and failover.

The ServerNet architecture specification defines a connection between nodes, either processor nodes or high-performance I/O nodes such as storage devices. Tandem Computers developed the original ServerNet architecture and protocols for use in its own proprietary computer systems starting in 1992, and released the first ServerNet systems in 1995.

Early attempts to license the technology and interface chips to other companies failed, due in part to a disconnect between the culture of selling complete hardware/software/middleware computer systems and that needed for selling and supporting chips and licensing technology.

A follow-on development effort ported the Virtual Interface Architecture to ServerNet with PCI interface boards connecting personal computers. InfiniBand directly inherited many ServerNet features. After 25 years, systems based on the ServerNet architecture still ship today.

ORIGIN

1. RAIN technology was developed by the California Institute of Technology, in collaboration with NASA's Jet Propulsion Laboratory and DARPA.

2. The name of the original research project was RAIN, which stands for Reliable Array of Independent Nodes.

3. The RAIN research team in 1998 formed a company called Rainfinity.

ARCHITECTURE

The RAIN technology incorporates a number of unique innovations as its core modules:

Reliable transport ensures reliable communication between the nodes in the cluster. This transport has a built-in acknowledgement scheme that ensures reliable packet delivery. It transparently uses all available network links to reach the destination. When it fails to do so, it alerts the upper layer, therefore functioning as a failure detector. This module is portable to different computer platforms, operating systems, and networking environments.

Consistent global state sharing protocol provides consistent group membership, optimized information distribution, and distributed group decision making for a RAIN cluster. This module is at the core of a RAIN cluster. It enables efficient group communication among the computing nodes and ensures that they operate together without conflict.
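As an illustration of the reliable transport module described above, the following sketch retries a packet over every available link with per-packet acknowledgements and reports a suspected peer failure to the upper layer once all links are exhausted. The class and method names are illustrative rather than Rainfinity's API, and the lossy Link class merely stands in for a real network interface.

import random

class Link:
    # Toy stand-in for one network interface that reaches the peer node.
    def __init__(self, name, loss=0.3):
        self.name, self.loss = name, loss

    def send_and_wait_ack(self, packet, timeout=0.5):
        # A real implementation would transmit and wait up to `timeout` for an
        # acknowledgement; here we simply simulate a lossy link.
        return random.random() > self.loss

class ReliableTransport:
    def __init__(self, links, on_peer_failure, retries_per_link=3):
        self.links = links                      # every interface that reaches the peer
        self.on_peer_failure = on_peer_failure  # upper-layer failure-detector hook
        self.retries_per_link = retries_per_link

    def send(self, packet):
        for link in self.links:                 # transparently try all available links
            for _ in range(self.retries_per_link):
                if link.send_and_wait_ack(packet):
                    return True                 # delivery acknowledged
        self.on_peer_failure()                  # all links failed: alert the upper layer
        return False

transport = ReliableTransport(
    links=[Link("eth0"), Link("eth1")],
    on_peer_failure=lambda: print("peer suspected down"),
)
print("delivered:", transport.send(b"hello"))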
Always On IP maintains pools of "always-available" virtual IPs. These virtual IPs are simply logical addresses that can move from one node to another for load sharing or fail-over. Usually a pool of virtual IPs is created for each subnet that the RAIN cluster is connected to. A pool can consist of one or more virtual IPs. Always On IP guarantees that all virtual IP addresses representing the cluster are available as long as at least one node in the cluster is operational. In other words, when a physical node fails in the cluster, its virtual IP is taken over by another healthy node in the cluster.
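A minimal sketch of this fail-over behaviour is shown below, assuming a simple round-robin assignment of virtual IPs to nodes. The node and address names are made up, and a real implementation would also announce each move on the network (for example with gratuitous ARP), which is omitted here.

class VirtualIPPool:
    def __init__(self, nodes, virtual_ips):
        self.nodes = list(nodes)                # currently healthy cluster nodes
        self.assignment = {}                    # virtual IP -> owning node
        for i, vip in enumerate(virtual_ips):
            self.assignment[vip] = self.nodes[i % len(self.nodes)]

    def node_failed(self, failed):
        # Reassign every virtual IP owned by the failed node to a survivor,
        # so all virtual IPs stay reachable while any node remains healthy.
        self.nodes.remove(failed)
        if not self.nodes:
            raise RuntimeError("no healthy node left to host the virtual IPs")
        for j, (vip, owner) in enumerate(sorted(self.assignment.items())):
            if owner == failed:
                self.assignment[vip] = self.nodes[j % len(self.nodes)]

pool = VirtualIPPool(["node-a", "node-b", "node-c"],
                     ["10.0.0.10", "10.0.0.11", "10.0.0.12"])
pool.node_failed("node-b")                      # node-b's virtual IP moves to a survivor
print(pool.assignment)                          # every virtual IP still has an owner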

Local and global fault monitors watch, on a continuous or event-driven basis, the critical resources within and around the cluster: network connections, Rainfinity or other applications residing on the nodes, and remote nodes or applications. Fault monitoring is an integral part of the RAIN technology, guaranteeing the healthy operation of the cluster.

Diagram

FEATURES OF RAIN

1. Communication
   i) Bundled interfaces
   ii) Link monitoring
   iii) Fault-tolerant interconnect topologies
        - The Problem
        - A Naïve Approach
        - Diameter Construction (dc = 2)

2. Data Storage

3. Group Membership
   - Token Mechanism
   - Aggressive Failure Detection
   - Conservative Failure Detection
   - Uniqueness of Tokens
   - 911 Mechanism
   - Token Regeneration
   - Dynamic Scalability
   - Link Failures and Transient Failures

1 - Communication

As the network is frequently a single point of failure, RAIN provides fault tolerance in the network through the following mechanisms:

i) Bundled interfaces: Nodes are permitted to have multiple interface cards. This not only adds fault tolerance to the network, but also gives improved bandwidth.

ii) Link monitoring: To correctly use multiple paths between nodes in the presence of faults, we have developed a link-state monitoring protocol that provides a consistent history of the link state at each endpoint (a sketch of this bookkeeping follows the list).

iii) Fault-tolerant interconnect topologies: Network partitioning is always a problem when a cluster of computers must act as a whole. We have designed network topologies that are resistant to partitioning as network elements fail.
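The sketch below shows the kind of bookkeeping behind link monitoring as described in (ii): each endpoint watches heartbeats on every link and records a history of up/down transitions. The heartbeat-and-timeout scheme is an assumption made for illustration; the actual RAIN protocol additionally ensures that both endpoints of a link agree on this history.

class LinkMonitor:
    def __init__(self, timeout=1.0):
        self.timeout = timeout
        self.last_heartbeat = None
        self.up = False
        self.history = []                 # list of (time, "UP" | "DOWN") transitions

    def heartbeat(self, now):
        # Call whenever a heartbeat arrives on the link.
        if not self.up:
            self.up = True
            self.history.append((now, "UP"))
        self.last_heartbeat = now

    def poll(self, now):
        # Call periodically; marks the link down once heartbeats stop arriving.
        if self.up and now - self.last_heartbeat > self.timeout:
            self.up = False
            self.history.append((now, "DOWN"))

mon = LinkMonitor(timeout=1.0)
for t in (0.0, 0.5, 1.0):                 # heartbeats arriving regularly
    mon.heartbeat(t)
mon.poll(2.5)                             # silence longer than the timeout
print(mon.history)                        # [(0.0, 'UP'), (2.5, 'DOWN')]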

The Problem:

We look at the following problem: given n switches of degree ds connected in a ring, what is the best way to connect n compute nodes of degree dc to the switches so as to minimize the possibility of partitioning the compute nodes when switch failures occur? Figure 3 illustrates the problem.

Figure 3
A Naïve Approach:

At first glance, Figure 4a may seem to be a solution to our problem. In this construction we simply connect the compute nodes to the nearest switches in a regular fashion. If we use this approach, we are relying entirely on the fault tolerance of the switching network. A ring is 1-fault-tolerant for connectivity, so we can lose one switch without upset. A second switch failure can partition the switches, and thus the compute nodes, as in Figure 4b. This prompts the study of whether we can use the multiple connections of the compute nodes to make them more resistant to partitioning. In other words, we want a construction where the connectivity of the nodes is maintained even after the switch network has become partitioned.

Diameter Construction (dc = 2):
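The original text gives only the heading for this construction, so the following sketch is an illustration under an assumption: each compute node is attached to two roughly diametrically opposite switches on the ring (the exact attachment rule used in the RAIN work may differ). For every pair of switch failures, the script checks whether the compute nodes that still have a live switch remain connected, and compares this with the naive nearest-switch attachment.

from itertools import combinations

def build_graph(n, attach):
    # Switches s0..s(n-1) form a ring; compute node ci attaches to attach(i, n).
    adj = {f"s{i}": set() for i in range(n)}
    adj.update({f"c{i}": set() for i in range(n)})
    for i in range(n):
        a, b = f"s{i}", f"s{(i + 1) % n}"
        adj[a].add(b)
        adj[b].add(a)
    for i in range(n):
        for s in attach(i, n):
            adj[f"c{i}"].add(f"s{s}")
            adj[f"s{s}"].add(f"c{i}")
    return adj

def partitioned(adj, failed):
    # True if the compute nodes that still have a live switch are split into
    # more than one connected component after the switches in `failed` die.
    dead = {f"s{i}" for i in failed}
    live = set(adj) - dead
    survivors = {v for v in live if v.startswith("c") and adj[v] - dead}
    if not survivors:
        return False
    seen, stack = set(), [next(iter(survivors))]
    while stack:
        v = stack.pop()
        if v in seen:
            continue
        seen.add(v)
        stack.extend(w for w in adj[v] if w in live)
    return not survivors <= seen

naive = lambda i, n: (i, (i + 1) % n)            # nearest-switch attachment
diameter = lambda i, n: (i, (i + n // 2) % n)    # diametrically opposite switches

n = 8
pairs = list(combinations(range(n), 2))
for name, attach in (("naive", naive), ("diameter", diameter)):
    adj = build_graph(n, attach)
    bad = sum(partitioned(adj, pair) for pair in pairs)
    print(f"{name}: {bad} of {len(pairs)} double switch failures partition the compute nodes")

In this simulation with n = 8, the naive attachment is partitioned by every pair of non-adjacent switch failures (20 of 28 cases), while the diameter-style attachment keeps the surviving compute nodes connected in all 28 cases.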

2 - Data Storage

Fault tolerance in data storage over multiple disks is achieved through redundant storage schemes. Novel error-correcting codes have been developed for this purpose. These are array codes that encode and decode using simple XOR operations. Traditional RAID codes generally allow only mirroring or parity as options. Array codes exhibit optimality in the storage requirements as well as in the number of update operations needed. Although some of the original motivations for these codes come from traditional RAID systems, these schemes apply equally well to partitioning data over disks on distinct nodes, or even partitioning data over remote geographic locations.
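To illustrate the XOR-only encoding and decoding, here is a minimal sketch of the single-parity special case: one parity block protects the data blocks spread across nodes, and any one lost block can be rebuilt by XOR-ing the remaining blocks. The actual RAIN array codes are more elaborate and tolerate multiple simultaneous erasures.

from functools import reduce

def xor_blocks(blocks):
    # Byte-wise XOR of equal-length blocks.
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

def encode(data_blocks):
    # Return the data blocks plus one XOR parity block.
    return list(data_blocks) + [xor_blocks(data_blocks)]

def recover(stored, lost_index):
    # Rebuild the block at lost_index from all the other surviving blocks.
    others = [b for i, b in enumerate(stored) if i != lost_index]
    return xor_blocks(others)

blocks = [b"node", b"rain", b"data"]      # equal-sized blocks, one per storage node
stored = encode(blocks)                   # three data blocks plus one parity block
assert recover(stored, 1) == b"rain"      # any single lost block can be rebuilt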

3 - Group Membership
