ABSTRACT
project is to identify and develop key building blocks for reliable distributed
systems built with
INTRODUCTION
Advanced Research Projects Agency (DARPA). The name of the original research
project was
RAIN, which stands for Reliable Array of Independent Nodes. The main purpose of
the RAIN
project was to identify key software building blocks for creating reliable
distributed applications
Bruck, the RAIN research team in 1998 formed a company called Rainfinity.
Rainfinity, located
in Mountain View, Calif., is already shipping its first commercial software package
derived from
the RAIN technology, and company officials plan to release several other
Internet-oriented
applications. The RAIN project was started four years ago at Caltech to create an
alternative to
components:
Myrinet switches provide the high speed cluster message passing network for
passing messages
between compute nodes and for I/O. The Myrinet switches have a few counters
that can be
the health of the connections, cables, etc. The following information refers to the
16-port, the
made by Tandem Computers, Compaq, and HP. Its features include good
scalability, clean fault
ServerNet architecture and protocols for use in its own proprietary computer
systems starting in
Early attempts to license the technology and interface chips to other companies
failed, due in part
computer systems and that needed for selling and supporting chips and licensing
technology.
features. After 25 years, systems still ship today based on the ServerNet
architecture.
ORIGIN
2. The name of the original research project was RAIN, which stands for Reliable
Array of
Independent Nodes.
ARCHITECTURE
When it fails to do so, it alerts the upper layer, therefore functioning as a failure
detector.
Virtual IPs are logical addresses that can move from one node to another for
load sharing or fail-over. Usually a pool of virtual IPs is created for each subnet
that the RAIN cluster is connected to. A pool can consist of one or more virtual IPs.
Always on IP guarantees that all virtual IP addresses representing the cluster are
available as
long as at least one node in the cluster is operational. In other words, when a
physical node fails
in the cluster, its virtual IP will be taken over by another healthy node in the
cluster.
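The takeover behaviour described above can be illustrated with a small sketch. It is not Rainfinity's implementation. The function name reassign_virtual_ips, the node names, the addresses, and the least-loaded placement rule are all assumptions made for illustration. The only property taken from the text is that every virtual IP stays assigned as long as at least one node is healthy.

```python
from typing import Dict, Set


def reassign_virtual_ips(owners: Dict[str, str], healthy: Set[str]) -> Dict[str, str]:
    """Return a virtual IP -> node mapping in which every IP sits on a healthy node.

    `owners` is the current mapping; `healthy` is the set of nodes still running.
    The least-loaded placement rule is an assumption, not part of the source.
    """
    if not healthy:
        raise RuntimeError("no healthy node left to host the virtual IPs")

    # Keep every virtual IP whose current owner is still healthy.
    new_owners = {vip: node for vip, node in owners.items() if node in healthy}

    # Count how many virtual IPs each healthy node already holds.
    load = {node: 0 for node in healthy}
    for node in new_owners.values():
        load[node] += 1

    # Hand each orphaned virtual IP to the least-loaded healthy node.
    for vip in sorted(owners):
        if vip not in new_owners:
            target = min(sorted(healthy), key=lambda n: load[n])
            new_owners[vip] = target
            load[target] += 1
    return new_owners


# Hypothetical pool of three virtual IPs on one subnet.
pool = {"10.0.0.101": "node-a", "10.0.0.102": "node-b", "10.0.0.103": "node-c"}

# node-b fails; its virtual IP moves to a surviving node.
after_failure = reassign_virtual_ips(pool, healthy={"node-a", "node-c"})

# Only one node is left; it ends up hosting the whole pool.
last_node = reassign_virtual_ips(after_failure, healthy={"node-c"})
```

In a real cluster the healthy set would come from the failure detector, and moving an address would also involve updating the network (for example, announcing the new owner), which this sketch leaves out.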
Diagram
FEATURES OF RAIN
1. Communication.
i) Bundled interface.
A Naïve Approach
2. Data Storage.
3. Group Membership.
Token Mechanism.
Uniqueness of Tokens.
911 Mechanisms
Token Regeneration
Dynamic Scalability
1 - Communication
i) Bundled interfaces: Nodes are permitted to have multiple interface cards. This
not only
adds fault tolerance to the network, but also gives improved bandwidth.
ii) Link monitoring: To correctly use multiple paths between nodes in the presence
of
The Problem:
a ring, what is the best way to connect n compute nodes of degree dc to the
Figure 3
A Naïve Approach:
upset. A second switch failure can partition the switches and thus the compute
nodes, as in Figure 4b. This prompts the study of whether we can use the
multiple connections of the compute nodes to make the compute nodes more
connectivity of the nodes is maintained even after the switch network has
become partitioned.
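The partitioning effect can be checked with a small simulation. This is only a sketch under stated assumptions: the switches form a ring, each compute node has degree 2 and is attached to two adjacent switches (one plausible reading of the naive approach), and compute nodes may relay traffic. The function name compute_node_groups and the ring size of 8 are hypothetical.

```python
from typing import Dict, Hashable, List, Set, Tuple


def compute_node_groups(
    n_switches: int,
    attachments: Dict[int, Tuple[int, ...]],
    failed: Set[int],
) -> List[Set[int]]:
    """Group compute nodes that can still reach each other after switch failures."""
    parent: Dict[Hashable, Hashable] = {}

    def find(x: Hashable) -> Hashable:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a: Hashable, b: Hashable) -> None:
        parent[find(a)] = find(b)

    # Ring links between neighbouring switches that are both alive.
    for s in range(n_switches):
        nxt = (s + 1) % n_switches
        if s not in failed and nxt not in failed:
            union(("switch", s), ("switch", nxt))

    # Links from each compute node to its surviving switches.
    for node, switches in attachments.items():
        find(("node", node))  # register the node even if all its switches failed
        for s in switches:
            if s not in failed:
                union(("node", node), ("switch", s))

    groups: Dict[Hashable, Set[int]] = {}
    for node in attachments:
        groups.setdefault(find(("node", node)), set()).add(node)
    return sorted(groups.values(), key=min)


N = 8  # hypothetical ring size
# Assumed naive attachment: compute node i uses switches i and i+1.
naive = {i: (i, (i + 1) % N) for i in range(N)}

one_fault = compute_node_groups(N, naive, failed={0})
two_faults = compute_node_groups(N, naive, failed={0, 4})
```

Under these assumptions a single switch failure leaves all eight compute nodes in one group, while failing switches 0 and 4 splits them into two groups of four, matching the partition described in the text.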
2 - Data Storage
Fault tolerance in data storage over multiple disks is achieved through redundant
storage
schemes. Novel error-correcting codes have been developed for this purpose.
These are array
codes that encode and decode using simple XOR operations. Traditional RAID
codes generally
allow mirroring or parity as options. Array codes exhibit optimality in the storage
requirements
for these codes come from traditional RAID systems, these schemes apply equally
well to
partitioning data over disks on distinct nodes or even partitioning data over
remote geographic
locations.
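To show what encoding and decoding with simple XOR operations looks like, here is a minimal sketch of single-parity redundancy, the traditional RAID parity option mentioned above. It is not one of the RAIN array codes, which the text describes only in general terms. The function names and sample data are hypothetical, and all blocks are assumed to have equal length.

```python
from functools import reduce
from typing import List, Optional, Sequence


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data_blocks: Sequence[bytes]) -> List[bytes]:
    """Append one parity block so that any single lost block can be rebuilt."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def recover(stripe: Sequence[Optional[bytes]]) -> List[bytes]:
    """Rebuild a stripe in which at most one block (marked None) is missing."""
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        raise ValueError("single parity can repair only one missing block")
    repaired = list(stripe)
    if missing:
        survivors = [block for block in stripe if block is not None]
        # The XOR of all surviving blocks equals the missing block.
        repaired[missing[0]] = xor_blocks(survivors)
    return repaired


# Hypothetical data blocks, each stored on a different disk or node.
data = [b"rain", b"node", b"disk"]
stripe = encode(data)

# Simulate losing the second disk and rebuilding its contents.
damaged: List[Optional[bytes]] = list(stripe)
damaged[1] = None
restored = recover(damaged)
```

Because each block could just as well live on a different node or site, the same recovery step applies when the failed unit is a whole node rather than a single disk.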
3 - Group Membership