Improving Availability with Adaptive Roaming
Replicas in Presence of Determined DoS Attacks
Chin-Tser Huang, Prasanth Kalakota, Alexander B. Alexandrov
Department of Computer Science and Engineering
University of South Carolina
{huangct, kalakota, alexand2}@cse.sc.edu
Abstract— Static replicas have been proven useful in providing
fault tolerance and load balancing, but they may not provide
enough assurance on the continuous availability of missioncritical data in face of a determined denial-of-service (DoS)
attacker. A roaming replica scheme can provide higher
availability assurance, but the overhead associated with replica
movement and lookup is high. In this paper, we propose ARRP,
an adaptive roaming replication protocol in which static replicas
are used normally but if a certain percentage of static replicas has
already been shut down, then a small number of roaming replicas
will be added and stored in randomly selected hosts that are
changed periodically. In particular, we analyze the appropriate
threshold when the roaming replica scheme should be enabled by
empirically investigating the tradeoff between availability,
performance, and overhead. Simulation results show that ARRP
can effectively mitigate the impacts of DoS attacks and host
failures to ensure continuous availability of critical data, with
better performance and reasonable overhead compared to only
using static replicas.
Index Terms—Data Replication, Assurance, Availability,
Denial-of-Service Attacks.
I. INTRODUCTION
Data replication on the Internet is becoming more and more
common. By placing replicas at multiple locations people can
access the data more quickly and reliably. By replicating data
single point of failure can be avoided, as this is a particularly
desirable property for mission-critical applications.
Consider a military operation scenario where the soldiers
are the clients, base station serves as the server with battlefield
information being the critical data exchanged between them.
Assume that the soldiers periodically contact the base station
and get critical data. If the base station is shut down by a DoS
attack or due to some other problem, the soldiers can no longer
access this data. However, if the same data were replicated at
different places, then the soldiers can access the critical data
via the replicas.
There are mainly three objectives with such a data
replication strategy. First, as explained in the previous
example, if the main server is down due to host failures or
attacks, the clients can still have access to the critical data
from the replicas. Second, replicas may provide faster access
to clients. Third, by allowing clients to access different
replicas, Load balancing can be achieved.
One of the common ways to replicate data is by using static
copies. In this approach the server designates some nodes as
replicas and transfers data to them. To maintain the
consistency of replicated data, the server periodically updates
the replicas. This approach achieves the second and third
objectives. One problem with static replicas is that if an
adversary locates all the static replicas, they can launch
targeted DoS attacks as discussed in [9] to shut down these
replicas. For greater assurance more replicas can be stored in
the network, but this may incur too much overhead as all
replicas have to be updated when the original copy is updated.
To mitigate this problem, a roaming replica method is
proposed in our previous paper [4]. In this approach, instead
of using a large number of static replicas, only a small number
of roaming replicas are used. The server periodically moves
the replicas to different hosts, and the clients use a discovery
protocol to find the location of replicas in a secure manner. It
is demonstrated in [4] that by moving data periodically greater
assurance on data availability can be achieved.
The main problem with the roaming replica approach is the
necessity to move replicas periodically and to discover the
current location of a replica. Even during periods when there
are no attacks or attack level is low, data still needs to be
moved periodically. Clients have to run the discovery protocol
every time to find the location of replicas. In this paper, we
propose an Adaptive Roaming Replication Protocol (ARRP)
which aims to integrate both schemes of static replicas and
roaming replicas. In this approach roaming replicas are
included in addition to static replicas only if the attack level is
above the threshold, otherwise only static replicas are used. In
this way the survivability of replicas is increased and the
overhead of roaming replicas is minimized. The main question
that remains to be determined is: what is the most appropriate
threshold needed to enable the roaming replica scheme.
In an attempt to answer this question, we analyze the
tradeoff relationship between Availability, Performance, and
Overhead in the presence of attacks. Availability is a measure
of how survivable the replicas are in face of DoS attacks. If
either the server of at least one of the replicas is available, then
the critical data is available. Performance is a measure of how
long it would take for a client to get the critical data from the
server or an available replica. If more replicas are deployed,
there is a better chance that a client can quickly find a replica
close to it, thereby improving the performance. On the other
hand, in the presence of DoS attacks, as it takes more time to
find an available replica performance decreases. Overhead is
composed of four components, namely movement overhead,
discovery overhead, storage overhead, and update overhead.
Movement overhead is about message transmissions needed to
move roaming replicas to a different location. Discovery
overhead is about message transmissions needed to find an
available replica. Storage and update overheads are
proportional to the number of replicas used in the system. In
the simulation and evaluation, we create a random network
topology with 300 nodes and try to find the performance and
overhead for various client loads in presence of various
degrees of replica failures.
The remainder of this paper is organized as follows. In
Section II, we discuss the related work. In Sections III and IV,
roaming data redundancy scheme and ARRP scheme are
discussed. In Section V we describe the experimental setup
and simulation and discuss the evaluation on the tradeoff
between availability, performance, and overhead. Finally, we
conclude and discuss future work in Section VI.
II. RELATED WORKS
Most of the work related to replica placement focuses on
providing Quality of Service (QoS) to clients. A method using
Tapestry method was proposed by Chen et al. [3]. In this work
the network is treated as a dissemination tree and the method
addresses the placement of replicas in order to achieve load
balancing. In [1], Bartolini et al. propose a method for placing
replicas based on the traffic estimations and current replica
location. In [11], Szymaniak et al. propose HotZone, an
algorithm to place replicas in wide-area network such that the
client to replica latency is minimized. This algorithm work
very well in terms of performance, but they do not address the
issue about the availability of critical data.
In [12], Tang and Xu discuss the problem of placing
replicas of an object in content distributed systems to meet
QoS requirements while minimizing the replicating cost. The
authors used replica-aware model and replica-blind model for
different problem specifications. In the replica-aware model
the server knows the location of the replicas and the user and
redirects the user request to the replica which is close to the
user, which is NP-complete problem. In the replica-blind
model, the server randomly selects one replica and sends the
user request to that replica, which can be solved in polynomial
time. Still, this work does not address replica availability.
In [5], Khattab et al. propose a method in which n out of m
servers are selected to be active servers, rendering the
remaining m - n servers acting as honeypots. To mitigate DoS
attacks a different set of n servers are randomly selected to be
active after some time. In [6], the same authors propose a
method in which the server roams among a pool of servers.
This method requires modifications to TCP connection state as
servers are moved physically. Only legitimate clients can
follow the location of the server. Although this approach
achieves fault tolerance and increases availability, the
overhead of moving servers can be high. Their results show an
increase of about 14% in average response time when there are
no attacks. In contrast, our approach adds a small number of
roaming replicas only when replica failure rate is high.
In [9], Srivatsa and Liu discuss the targeted file attacks and
propose a LocationGuard scheme to counter the attacks. In this
approach, each client uses a lookup guard which takes a stored
location key to securely calculate the location of the target file
or its replica in an overlay network. The adversary is hidden
from the target location because it does not know the
corresponding location key. However, this scheme may not
tolerate a determined attacker who launches multiple DoS
attacks by guessing and gradually taking down more replicas.
As for the discovery of available replicas, there have been
several distributed hash table (DHT) based replica lookup
protocols, e.g. Chord [10], CAN [7], Pastry [8], and Tapestry
[3]. These schemes allow for lookup in a small and bounded
number of hops. However, in presence of determined DoS
attacks, these schemes will also require a lot of retries.
III. ROAMING DATA REDUNDANCY
The Roaming Data Redundancy Scheme was proposed in
our previous paper [4]. This scheme basically consists of two
protocols, a Redundant Data Moving Protocol (RDMP) and a
Redundant Data Discovery Protocol (RDDP). RDMP allows
host to move replicas of the critical data periodically to
different hosts. RDDP allows roaming replicas of critical data
to be discovered by clients. Both protocols are designed to
incorporate multiple types of critical data, with each type of
critical data maintained by a different host.
A. Assumptions
Before presenting the scheme, we discuss the assumptions
that we make about the critical data service and the adversary.
We assume that there are multiple types of critical data present
in the network. All legitimate clients are aware of which host is
the main server of which type of critical data. To protect the
privacy and integrity of critical data and location of redundant
copies we assume all the communication is encrypted. For
broadcast messages we assume that the messages are
encrypted with the shared key between the main server and
client hosts, whereas all the unicast messages are encrypted
using public key encryption. And also we assume that all the
hosts in the network can be trusted. They do not collude with
adversary by leaking the private keys.
Even though adversary cannot decrypt the messages, we
assume that the adversary can do traffic analysis and can also
perform replay attacks. We also assume that the adversary is
aware of the location of all the hosts in the network. The
adversary can also attack the m hosts simultaneously and can
shut them down all at once.
B. Redundant Data Moving Protocol
Redundant Data Moving Protocol consists of n processes
rdm[0..n-1]. Each host participating in the protocol has an
input cd, which represents the critical data maintained by the
host, which is the owner of the critical data of the host. The
owner has the authority to manage the roaming replicas of the
data. Each rdm[i] also maintains an array rd[0..n-1] which
represents the replicas of the other hosts’ critical data currently
kept by this host. Each host also maintains an array sq[0..n-1]
that represents the next sequence number to be used by each
process to send the next request message to move the critical
data. Periodically rdm[i] selects the next keeper of its critical
data, broadcasts the dlt message, to notify the keeper to delete
the outdated message and sends unicast message to transfer the
critical data to the next keeper. If process rdm[j] keeps a
roaming replica, then rdm[j] sends a dltack message to rdm[i]
to acknowledge the deletion. If rdm[j] is the next keeper of the
roaming replica then rdm[j] sends a movack message to
acknowledge the reception of the critical data.
C. Redundant Data Discovery Protocol
The Redundant Data Discovery Protocol consists of n
processes rdd[0..n-1]. Each process rdd[i] maintains an input
array rd[0..n-1] which is provided by rdm[i] in the redundant
data moving protocol and represents the replicas of the other
hosts’ critical data currently kept by this host. Each process
rdd[i] also maintains an array sq[0..n-1] that represents the
next sequence number to be used by each process to send the
next query. Each process rdd[i] in the RDDP can send to every
other process a drqst(sq[i], tgt, i) request message, where sq[i]
is the sequence number of the drqst message sent by rdd[i], tgt
is the index of the target critical data and i is the index of
rdd[i]. Every time rdd[i] sends out a drqst message, sq[i]
needs to be incremented by 1 in every process in order to keep
consistency. If process rdd[j] currently keeps a roaming
replica, then rdd[j] will send a drply(sq[i], tgt, j) message to
rdd[i], where sq[i] is the corresponding sequence number of
rdd[i], tgt is the index of the target critical data, and j is the
index of rdd[j]. The other processes that do not keep track of
the critical data will discard the message. Figure 1 illustrates
the basic operations in RDMP and RDDP.
mov
dlt
dlt
dlt
server
drqst
old
replica
old
replica
client
mov
new
replica
new
replica
drply
drqst
drqst
replica
drply
replica
Figure 1: Basic operations of Redundant Data Moving Protocol (RDMP) and
Redundant Data Discovery Protocol (RDDP).
IV. ADAPTIVE ROAMING REPLICATION PROTOCOL
It has been shown in [4] that the roaming replica scheme
effectively mitigates the impacts of DoS attacks and host
failures and provides higher assurance on the continuous
availability of critical data. However, the main disadvantage of
the pure roaming replica scheme is about its overheads due to
the movement and discovery of roaming replicas. These
overheads remain even when the level of DoS attack is very
low, and as a result, they may cancel the benefit of availability
guarantee and discourage the adoption of this scheme.
To address this problem, we propose the Adaptive Roaming
Replication Protocol that integrates both schemes of static and
roaming replicas. The static replica scheme is still used at all
times. Each client caches the addresses of a few static replicas
that are close to it, such that the client can access the closest
available replica first and shorten the latency. If none of the
cached addresses of static replicas is reachable, then the client
will use the RDDP protocol to find an available replica, either
static or roaming. The server is able to derive an estimate of
the percentage of failed static replicas when it periodically
updates the replicas. When the percentage of failed replicas
exceeds a certain threshold th, the roaming replica scheme is
enabled to add a small number of roaming replicas to the
network. Since there are still some available static replicas in
the network, the client can attempt to access the cached static
replica positions to see if they are still available. If so, then the
client will access the data from the available static replica.
Only when these attempts fail the client will resort to RDDP
protocol to find a roaming replica. Later if some failed replicas
recover and the server detects that the percentage of failed
replicas falls below th, it turns off the roaming replica scheme.
This is easily achieved by requesting the current holders of a
roaming replica to remove it from their storage without
designating the next roaming replica holders.
V. EXPERIMENTAL SETUP AND RESULTS
A. Simulation Model
We have developed a simplified model of our roaming
replica scheme in C++ and conducted a number of experiments
to study the effect of different parameters. We first created 300
node network topology using BRITE [13]. To evaluate the
effectiveness of our simulation we tested our results on
different topological networks sparse, medium and dense
networks whose nodes have average degrees of 2, 5 and 8
respectively. We assume that these three different topologies
cover various network densities. In addition we assume that all
links in the network are 10Mbps unless specified otherwise.
Initially, the server selects 20 nodes as static replicas and
transfers critical data to them. In each time unit clients try to
contact their closest available replica and gets critical data.
Replicas can go down due to the presence of DoS attacks and
node failures. To implement the node failures and DoS attacks
on replicas, we used a probabilistic method in which a random
number is generated to determine whether a replica is up or
down. We ran the simulations under replica failure
probabilities of 25%, 50% and 75% respectively.
In addition to the static replicas we used 3 replicas as
roaming replicas. In each time period the server selects 3 new
replicas and uses RDMP protocol to send critical data to them.
Even if the attacker is able to find a subset of the roaming
replicas, the attack is successful only during the time period
because after each time period the roaming replicas change
their location.
We assume that the clients cache the location of some
closest static replicas. The client first checks the cached
locations before applying RDDP protocol to find an available
replica. In the simulation we assume that the client caches 1, 2
or 3 closest static replicas respectively and the results are
compared with the basic static replica placement method
where a legitimate client knows the location of all static
replicas (e.g. by looking up some public directory).
We ran the simulation for 100 time units with server
updating the replica position for every time unit. The
simulation is run for 20 times and each point in the graph
represents the average over 20 runs. All the client requests are
distributed uniformly throughout the time period.
B. Evaluation
We use the above model to conduct various simulations in
order to evaluate the availability, performance, and overhead
n−r
a−r
n
a
With the parameters used in our simulation (300 nodes, 3
roaming replicas, and 30 attacks at the same time), the
probability is just 0.09%, which is very low.
In order to quantify the availability more concretely, we
design the following experiment. In the roaming replica
scheme, every time unit we let the original source server
randomly choose 1, 2, 3, or 4 servers out of 300 total servers
to keep the roaming replicas. Every time unit the adversary
randomly chooses 30 servers to attack simultaneously. If all
the current roaming redundant copies are hit by the DoS
attacks, then the attack is regarded successful and we measure
the time elapsed in time units. Otherwise, in the next time unit
the original source server again randomly chooses 1, 2, or 3
servers and the adversary again randomly chooses 30 servers
to attack. The longer the elapsed time before the attack
succeeds the better the availability is, since the network proves
to be more survivable to the attack. The results are compared
with the static replica scheme in which 10, 20, and 30 static
replicas are stored in a total of 300 servers. At the beginning of
the simulation the source chooses 10, 20, or 30 servers to keep
the redundant copies and the attacker randomly selects 30
servers to attack. If the adversary hits a server that keeps a
redundant copy, the adversary shuts it down. The adversary
uses its remaining attacks to keep attacking until it locates and
shuts down all the servers that keep a redundant copy, and we
measure the time elapsed in time units.
Figure 2 shows the statistics of 1000 runs for our roaming
data redundancy model – 1, 2, 3, and 4 roaming copies in 300
total servers under 30 attacks. As we increase the number of
roaming copies, the time needed for the adversary to succeed
increases exponentially. Therefore by increasing the number of
roaming copies by just one, we can achieve exponential
increase in the difficulty for the adversary.
Figure 3 shows the statistics for our comparison model –
10, 20, 30 static copies distributed in 300 total servers. Note
that the number of simultaneous attacks is 30 so that the
adversary is able to successfully shut down all the static
replicas. While the increase of the number of static replicas
increases the time necessary for a successful attack, the
increase is smooth and the average time needed for a
successful attack is apparently shorter than when 2 or 3
roaming copies are used.
From the figures it is clear that the ARRP scheme provides
higher availability than using only static replicas, and Figure 3
shows that the benefits of using our approach increase when
the number of roaming copies increases, as the average time
needed for the attack to succeed increases by around 10 times
100000
11194.7
10000
1045.31
Time
1000
109.96
100
11.08
10
1
1
2
3
4
Number of roaming replicas
Figure 2: Time for successful DoS attacks, with 300 nodes, 30
simultaneous attacks, and 1, 2, 3, 4 roaming replicas respectively.
50
46.4
45
40
35
33.4
30
Time
of the ARRP scheme. In particular, we analyze the tradeoff
between the three aspects.
1) Availability
We first analyze how survivable the roaming replicas are
against a determined attacker. In theory, if the number of hosts
in the service network is n, with r roaming replicas among the
n hosts, and the attacker is able to launch an attack in parallel,
then the probability that the attacker hits all the r roaming
replicas at the same time is given by
25
24.7
20
15
10
5
0
10
20
30
Number of static replicas
Figure 3: Time for successful DoS attacks, with 300 nodes, 30
simultaneous attacks, and 10, 20, 30 static replicas respectively.
with every additional roaming copy. The results also indicate
that in this specific setup adding 3 roaming replicas to the
service network can already achieve high availability; adding
more replicas will just add to the overhead.
2) Performance
We evaluate the performance by calculating the average
amount of time it takes to for clients to get data from replicas.
Due to space limit we only show the simulation results for
medium networks (with average node degree of 5). Figures 4
and 5 show the amount of time it takes to get data from
replicas under different replica failure rates (25%, 50%, 75%)
in case of medium (with 50 clients) and high loads (with 100
clients) respectively. The client requests are distributed
uniformly over the time period.
In the base case in which there is no roaming replica and
the clients know the location of all static replicas, the clients
will check the availability of replicas one by one without
applying RDDP protocol. However as the replica failure
increases there is a lower chance to find an available replica.
So it takes more time to find the closest available replicas. In
the other three cases the clients cache 1, 2, and 3 closest static
replica locations respectively. Using RDDP protocol it is
possible to find the closest available replica fast by sending
broadcast request, because it does not require a lot of retries
when the replica failure rate is high.
From the figures we can see that as the replica failure
increases, the average amount of time to get data also
increases. Moreover, as the load increases the time taken to
process the requests also increases, thereby each client request
has to wait for longer time on average. This explains the
increase in average amount of time to get data as the load
increases.
VI. CONCLUDING REMARKS
6
Only check static replicas
Check 1 known static replica first
Average amt. of time to get data
5
Check 2 known static replicas first
Check 3 known static replicas first
4
3
2
1
0
0
10
20
30
40
50
60
70
80
% fo replica failures
Figure 4: Performance against the percentage of replica failures for 50 clients
in medium network.
16
Only check static replicas
Average amt. of time to get data
14
Check 1 known static replica first
Check 2 known static replicas first
12
Check 3 known static replicas first
10
8
6
4
2
0
0
10
20
30
40
50
60
70
80
% of replica failures
In this paper, we point out the need for greater assurance of
the continuous availability of critical data services, and show
that current solutions are not sufficient to provide the desired
level of assurance under determined DoS attacks. We then
introduce a novel adaptive roaming replica scheme called
ARRP that aims to ensure constant availability of critical data
by adding a small number of roaming replicas when the
percentage of static replica failure is higher than a threshold.
Simulation results show that ARRP can effectively mitigate the
impacts of DoS attacks and host failures to ensure continuous
availability of critical data, with better performance and
reasonable overhead compared to only using static replicas.
In the future work, we will implement a prototype of ARRP
and evaluate it with client access sequences recorded from a
real service network and synthetic attack traffic data.
Moreover, we will investigate how frequently the roaming
replicas should be moved so that they can survive the attacks
with less overhead. Furthermore, we will study the impacts
that the topology of the network and the routing algorithm has
on the overall performance and overhead of ARRP.
Figure 5: Performance against the percentage of replica failures for 100
clients in medium network.
3) Overhead
We want to determine how much overhead the proposed
approach incurs when different threshold is used. As discussed
in Section I, the overhead is composed of movement overhead,
discovery overhead, storage overhead and update overhead.
Among these four components, movement overhead, storage
overhead and update overhead are proportional to the number
of roaming replicas, and we have shown that only a very small
number of roaming replicas are needed to achieve high
availability. The discovery overhead, however, is dependent
on how many accesses end up using RDDP to find an available
replica. To estimate this value, we use the same simulation
model to develop a random sequence of 2000 client accesses
and experiment it with different percentage of static replica
failures. The number of cached static replica addresses is
assumed to be 3.
The results are shown in Figure 6. When the percentage of
replica failure is below 65%, less than 10% of all accesses will
use RDDP. This figure provides us two insights. First, the
discovery overhead due to RDDP is small when the replica
failure percentage is low. Second, the appropriate threshold
should be higher than 65%, because when the replica failure
percentage is less than 65% most clients can still find an
available replica close to it without resorting to RDDP.
REFERENCES
[1]
[2]
[3]
[4]
[5]
[6]
[7]
[8]
[9]
% of accesses that use RDDP
protocol
60
[10]
50
40
30
[11]
20
10
[12]
0
30
35
40
45
50
55
60
65
70
75
80
% of static replicas that are down
Figure 6: The percentage of total accesses that use RDDP protocol to find a
replica under different percentage of static replica failure.
[13]
N. Bartolini, F. L. Presti and C. Petrioli, “Dynamic Replica Placement
and user request Redirection in Content Delivery Networks,” IEEE
International Conference on Communications, ICC 2005.
Y. Chen, A. Bargteil, D. Bindel, R. Katz, J. Kubiatowicz, “Quantifying
Network Denial of Service: A Location Service Case Study,”
Proceedings of Third International Conference on Information and
Communications Security (ICICS 2001), November 2001.
Y. Chen, R. H. Katz, J. D. Kubiatowicz, “Dynamic Replica Placement
for Scalable Content Delivery,” Proceedings of First International
Workshop on Peer-to-Peer Systems (IPTPS 2002), Cambridge, MA,
March 2002.
C.-T. Huang, A. B. Alexandrov, P. Kalakota, “Roaming Data
Redundancy for Assurance in Critical Data Services,” Proceedings of
2006 High Availability and Performance Computing Workshop
(HAPCW 2006), October 2006.
S. M. Khattab, C. Sangpachatanaruk, D. Mossé, R. Melhem, T. Znati,
“Roaming Honeypots for Mitigating Service-Level Denial-of-Service
Attacks,” Proceedings of 24th International Conference on Distributed
Computing Systems, March 2004.
S. M. Khattab, C. Sangpachatanaruk, D. Mossé, T. Znati, “Proactive
Server Roaming for Mitigating Denial-of-Service Attacks”, Annual
Simulation Symposium, 2003.
S. Ratnasamy, P. Francis, M. Handley, R. Karp, S. Shenker, “A Scalable
Content-Addressable Network”, Proceedings of ACM SIGCOMM
Conference, August 2001.
A. Rowstron and P. Druschel, “Pastry: Scalable, decentralized object
location and routing for large-scale peer-to-peer systems”, Proceedings
of the 18th IFIP/ACM International Conference on Distributed Systems
Platforms (Middleware), November 2001.
M. Srivatsa, L. Liu, “Countering Targeted File Attacks using
LocationGuard,” Proceedings of 14th USENIX Security Symposium
(USENIX Security 2005).
I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, H. Balakrishnan,
“Chord: A Scalable Peer-to-Peer Lookup Service for Internet
Applications”, Proceedings of ACM SIGCOMM Conference, August
2001.
M. Szymaniak, G. Pierre, M. V. Steen, “Latency-Driven Replica
Placement,” Proceedings of the 2005 IEEE International Symposium on
Applications and the Internet, February 2005.
X. Tang and J. Xu, “On replica placement for QoS-aware content
distribution,” Proceedings of IEEE INFOCOM’2004, March 2004.
BRITE. Boston University Representative Internet Topology Generator.
Available at http://www.cs.bu.edu/brite/