FRSN

Fault Repair Framework for Mobile Sensor Networks
Conference Paper · January 2006

DOI: 10.1109/COMSWA.2006.1665200 · Source: IEEE Xplore

Fault Repair Framework for Mobile Sensor Networks
Tuan Le 1,2, Nadeem Ahmed 1,2, Nandan Parameswaran 1, Sanjay Jha 1
tuan.le@nicta.com.au nahmed@cse.unsw.edu.au paramesh@cse.unsw.edu.au sjha@cse.unsw.edu.au
1 University of NSW, Sydney, Australia.
2 National ICT Australia (NICTA), Australian Technology Park, Sydney, Australia.
Abstract: In this paper, we propose a framework for fault repair in mobile sensor networks. A hierarchical structure consisting of a replacement module, a management policy module, a knowledge module, a decision making module, and an evaluation module is adopted. We also propose a solution to the faulty sensor replacement problem. Through numerical results, we show that our algorithm is more efficient and achieves higher energy savings than the greedy approach to sensor replacement. We believe that the problem of faulty sensor nodes can be solved efficiently through cooperation and communication across the different modules, such as evaluation, decision making, knowledge management, and replacement.

I. INTRODUCTION

In general, a fault is the incorrect state of hardware or a program as a consequence of the failure of a component [1]. Permanent faults are those resulting from system or communication hardware failure. For example, a node may die due to battery depletion. An intermittent fault is one that appears only occasionally due to unstable characteristics of the hardware. A transient fault is the consequence of a temporary environmental impact on otherwise correct hardware. For example, a change in the environment may cause an incorrect sensor reading. In this paper we consider only permanent faults, which once activated persist until they are detected and repaired.

One common characteristic of nodes in wireless sensor networks is that they are prone to failure. Sensor nodes carry limited, generally irreplaceable, power sources. Nodes in sensor networks can fail for several reasons: their batteries may be depleted, they may be accidentally destroyed, or a malicious adversary may deliberately incapacitate them. As time progresses, faults will occur more often in sensor networks. A sensor network can continue to operate and provide services even with the loss of some of its sensor nodes. However, the quality of the offered services, such as coverage, is greatly degraded upon the loss of a few core nodes. The network loses utility when it does not provide the required coverage. Moreover, sensor failure may cause network topology changes and, in extreme cases, network partitioning. Messages may still flow through the network despite these partitions. However, the resulting paths may have a longer delay, which is unacceptable in some applications. Therefore, to overcome sensor node failure and to guarantee system reliability, faulty nodes should be detected and repaired promptly.

On the other hand, in most cases faulty sensors cannot be easily replaced manually. In particular, in cases involving a polluted area or a hazardous chemical leak in a building [2], it is too dangerous for a human to access the site for sensor replacement. Other applications, such as military surveillance and smart homes, may require not only maintaining the original sensing topology but also extending the existing sensing coverage. In such cases, mobile sensors equipped with movement capabilities are a potential solution. For example, sensor nodes may be placed at the entrance of a building and allowed to proceed inside and find the desired position. However, the energy consumption of movement itself is costly. Hence, a method that minimizes this cost is needed to improve system utility.

This paper presents our preliminary ideas on a fault repair framework for mobile wireless sensor networks. We introduce policy based management in conjunction with learning techniques. We also describe an off-line algorithm for the replacement module of our architecture using an Integer Linear Programming formulation, with numerical results to support our intuition.

The rest of the paper is organized as follows. We propose a fault repair architecture for sensor networks in Section II. In Section III, we describe an initial solution for the replacement module of our proposed architecture, and Section IV shows some numerical results. We discuss related work in Section V. Section VI discusses future work and concludes the paper.

II. FAULT REPAIR ARCHITECTURE

The objective of fault repair is to maintain the overall health of a sensor network. The health of the network here is the current sensing coverage. Assume that a sensor network is deployed to monitor a certain target area, which is divided into different sections. It is likely that the coverage requirement differs between sections of the area. There are major sections which may require high coverage, while other sections may accept lower coverage. As a sensor network is prone to failure, the major reason for coverage loss is faulty nodes. Therefore,
given a coverage distribution over the entire area, we want to maintain adequate coverage of the network.

Faults in sensor networks have different characteristics than in traditional networks. We discuss some of the distinguishing characteristics of fault repair for wireless sensor networks here.

1) Resource Limitation

Limited resources are a major issue for fault repair in sensor networks. As the lifetime of a sensor node is restricted by its limited battery power, an excessive communication burden on nodes to locate a faulty sensor is unacceptable. Fault detection and repair should therefore spend as little energy as possible.

2) Response Time

To assure an application’s reliability, a faulty node may need to be fixed quickly. For example, faulty sensors may reduce the coverage. As a result, the monitoring task may become unreliable. Faulty sensors should be replaced as soon as possible to guarantee continuing network performance. The response time for repair is thus also an important factor that should be considered.

3) Flexibility

As sensor networks contain a large number of nodes, the overall system behavior may not be affected considerably by the presence of a few faulty nodes. Thus, faults may not need to be repaired unless they cause a problem. If faulty sensors are located in an unimportant region, they can be ignored. Similarly, if faulty sensors have a minor impact on coverage, they can also be ignored. Fault repair should therefore be smart enough to decide which nodes are eligible for replacement.

4) Adaptive

The fault repair framework should be adaptive to its operational surroundings; e.g., the coverage requirement in a section may change for a variety of reasons. For instance, if the number of events increases suddenly in a section, that section needs to adjust its coverage resolution suitably. The framework should observe the dynamic changes, learn dynamic behavior (training), and make predictions. Adaptability is thus a desirable feature in fault repair for sensor networks.

5) Scalability

Scalability is a significant issue since a sensor network is usually deployed on a large scale.

Therefore, fault repair in sensor networks is more complicated and has different characteristics than in traditional telecommunication networks.

A. Network Topology

We consider a randomly deployed sensor network that consists of a set of sensors $S$ in a two dimensional area $A$.

$S = \{S_1, S_2, \ldots, S_n\}$; (1)

Each sensor $S_i$ is located at coordinate $(x_i, y_i)$ inside $A$. Let the sensors be grouped into $m$ clusters

$C = \{c_1, c_2, \ldots, c_m\}$; (2)

each consisting of at most $c$ sensors.

The sensor network is hybrid, consisting of both static and mobile nodes. Mobile nodes are initially considered redundant nodes that do not participate in the sensing/communication operation.

Each cluster maintains coverage above a certain coverage threshold $TC$ with tolerance rate $tr$. The coverage in a section is thus to be maintained between the bounds given by Equation 3.

$CoverageBounds = [TC - tr,\ TC + tr]$ (3)

In Figure 1, region $A$ may be divided into four clusters, and each cluster may maintain a different sensing coverage. For instance, cluster 1 may have $TC = 70\%$ and $tr = 5\%$, while cluster 2 may require $TC = 30\%$ and $tr = 10\%$. Coverage of cluster 1 is to be maintained between 65% and 75%, and that of cluster 2 between 20% and 40%. If the sensing coverage requirement changes, e.g., due to a change in the environment or in the application requirements, the sensing coverage threshold $TC$ is re-evaluated.

Fig. 1. Sensing Coverage

A set of sensors called monitors is deployed to observe the health of the sensor network (Figure 1). Each cluster contains local monitor sensor(s) (LMS) that observe its local health. It is assumed that the LMS has information about the current location of the redundant sensors. The LMSs thus form an overlay network on top of the sensing layer (Figure 2).

There is also a special monitor sensor, the global monitor sensor (GMS), that looks after the health of the entire network. The GMS monitors the health of the entire network by accumulating the health of sub-regions reported by the various LMSs. The GMS may be a micro server that is not resource constrained.

In a cluster, if the current sensing coverage is lower than a defined threshold due to faulty sensors, the LMS(s) of the cluster will take reasonable actions to recover the coverage. For example, mobile sensors in the network may be asked to relocate themselves in order to achieve a desired configuration. Since mobility is usually expensive, the total distance of sensor movement in the network should be minimized. Alternatively, a nearby redundant static sensor can be tasked to substitute for a faulty sensor, if this sensor can help in maintaining the health of the network. The LMS may also request a low energy sensor to reduce the number of communications, in order to extend this sensor’s life. We discuss the fault repair actions in the next part.
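As an illustration of Equation 3, the sketch below computes the coverage bounds of a cluster and flags a cluster whose measured coverage has dropped below the lower bound. The function names, the use of percentages as plain numbers, and the choice of the lower bound as the repair trigger are assumptions made for this example; the text only states that an LMS acts when coverage falls below a defined threshold.

```python
def coverage_bounds(tc, tr):
    """Return (lower, upper) = (TC - tr, TC + tr), as in Equation 3.

    tc and tr are given in percent, e.g. tc=70, tr=5.
    """
    return tc - tr, tc + tr


def needs_repair(current_coverage, tc, tr):
    """Assumed trigger: repair is considered once coverage is below TC - tr."""
    lower, _upper = coverage_bounds(tc, tr)
    return current_coverage < lower


# Values taken from the cluster examples in Section II-A.
cluster1 = coverage_bounds(70, 5)    # (65, 75)
cluster2 = coverage_bounds(30, 10)   # (20, 40)
```

With these bounds, a measured coverage of 60% in cluster 1 would be flagged, while 68% would not.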
Fig. 2. a. Monitoring Layer b. Sensing Layer

B. The Architecture

A hierarchical structure is adopted for fault repair, consisting of an evaluation module, a decision making module, a replacement module, a management policy module, and a knowledge module (see Figure 3). The cooperation among these modules makes the framework efficient and adaptive. The following subsections discuss the individual role played by each of these modules in the fault repair framework.

Fig. 3. Fault Repair Architecture

1) Evaluation Module: The evaluation module is responsible for evaluating the health, i.e., the sensing coverage, of the network. Based on the current status, it reports the coverage to the decision making module, in which an appropriate action to be performed is decided. Also, after a recovery action takes place, the final coverage is evaluated and validated in this module.

2) Decision Making Module: The decision making module is the central layer of the fault repair architecture. Based on the report received from the evaluation module and the policies from the management policy module, the scenario is analyzed and an appropriate action for the replacement module is decided.

3) Replacement Module: As the name suggests, the replacement module is responsible for carrying out the sensor relocation that is decided in the decision making module. We define four main actions for the replacement module: ignore, re-assignment, relocation, and replacement. The first action is ignore. Obviously, if the coverage is still satisfied in the presence of a few faulty nodes, they can be ignored. The second action is re-assignment, which requests a static sensor to take over a faulty node’s role. Some sensors may be either in the idle state or doing little work, e.g., on a low duty cycle. In this case, these sensors may be re-assigned new tasks so that they can substitute for the faulty sensors. Thirdly, the relocation action is responsible for moving a set of sensors to extend the existing coverage. However, as movement itself consumes energy in the system, a movement schedule should be carefully planned such that the energy consumption is minimized and the total energy remaining is maximized. Finally, replacement replaces a faulty node by a redundant sensor. Obviously, if a faulty sensor is one of the constituent sensors in the network, it should be replaced promptly. Therefore, based on the estimated impact, an appropriate action will be determined and performed.

Replacing a faulty sensor by a redundant sensor is intuitively simple. However, the replacement should be energy efficient. Each faulty sensor can have different preferences for its replacement, e.g., response time, remaining energy, etc. The most suitable redundant sensor is thus the one that satisfies its criteria. The cost of the movement depends on the distance between the faulty sensor and the replacement sensor and on the energy consumed per unit of distance. Intuitively, the redundant sensor located closest to the faulty node is preferred. However, this is not always the best choice. For example, a faulty sensor should not be replaced by a low energy sensor, no matter how close and free it is. Instead, a faulty sensor would rather be replaced by a more distant sensor with adequate remaining energy.

4) Management Policy Module: The management policy module contains a set of rules that the decision making module and other modules should comply with in order
to achieve common objectives, e.g., energy saving, coverage resolution, etc.

A policy is a set of rules or a set of actions governing decisions that will be implemented to achieve the objectives [3]. Policy is not only defined by the administrator, but also obtained and updated from the knowledge module, in which learning on the network behavior is performed.

A policy contains four components: event, condition, action, and scope. Whenever an event occurs, the policy condition is evaluated. If the specified condition is true, the corresponding action is executed. The scope of a policy indicates its targets, i.e., the nodes at which it should be enforced. Policy supports the decision making module in selecting an appropriate action to perform when a fault occurs. There are many factors that should be considered, such as the probability of faulty sensors, the battery status, the coverage problem, and the communication situation. For example, the policy may define the importance level of a region. If faults occur in an important region, the repair algorithm may replace the faulty sensors as soon as possible. Otherwise, faults may be ignored. Using management policy enhances network performance by reducing the energy spent on unnecessary movements.

Since resources are limited in a sensor network, the management policy should be lightweight. In our architecture, three main classes of management policies are defined: coverage policies, resource policies, and performance policies. Each policy contains a set of rules that support its objective. Coverage policies contain rules related to the degree of coverage over a region in the network. These rules may, for example, require every location in a region to be monitored by one node. They also allow some regions to be maintained at a low degree of coverage. Resource policies are related to energy consumption issues. Energy is a paramount concern in a sensor network that needs to operate for a long time on battery power. To reduce energy consumption, resource rules may be defined to allow a certain number of nodes to be inactive, while the remaining active nodes still provide continuous services. Performance policies are policies about the delay, priority, or resolution of a region. Again, a fundamental problem is to define the number of nodes that remain active while still achieving an acceptable degree of coverage for applications.

There are several policy languages for networks. For example, we can use the Policy Framework Definition Language (PFDL) [4] to express various kinds of network policies. PFDL can simply express lists of IF <condition> THEN <action> rules. A list of such rules forms a policy. The <condition> above is in fact a disjunctive normal form of single condition expressions, and <action> is a list of single action statements. If the evaluation of the condition expression succeeds, the action list can be performed.

5) Knowledge Module: The knowledge module represents the intelligence of the network. It is motivated by the consequences of environmental changes and resource limitations. To obtain knowledge about faults, monitored sensors need to participate in accumulating knowledge, learn what occurs in the surroundings, and predict what will happen next. Fault repair is used not only for current actions, but also for improving the ability to perform optimally in the future to achieve the objective. Prediction of faulty sensors saves energy spent on movement. For example, a redundant sensor may reject a request to replace a distant faulty sensor if it knows in advance that a nearby sensor may soon become faulty.

Reinforcement learning and supervised learning are the two main classes of learning that can be applied to this environment. A redundant sensor node would attempt to decide on an appropriate movement based on its current fault location information. In reinforcement learning, the machine can produce actions which affect the state of the world, and receives rewards (or punishments). Its goal is to maximize the rewards (or minimize the punishments) in the long term [5]. Reinforcement learning determines actions to take as well as the possible outcomes. From the actions and outcomes, it tries to learn how to behave successfully to achieve a goal while interacting with an external environment. In other words, reinforcement learning learns via experience. Reinforcement learning may be suitable in the case of a soft fault, in which the sensor node is still active but gives inaccurate readings. However, it may require many message exchanges, resulting in a high overhead. As we only consider permanent faults in our model, we prefer a simpler model, supervised learning.

Supervised learning consists of one set of observations, called inputs, and another set of observations, called outputs. Supervised learning tries to determine the function that maps any input to an output such that disagreement with future input-output pairs is minimized [4]. In our case, the supervised learner uses the fault location information and target values as training data. After learners are trained, they are able to make decisions based on system sensor readings. For instance, with such a model, sensor nodes could compute the fault distribution based on faulty sensor information; i.e., if the readings of a sensor are unusually different from those of its neighbors, monitor sensors can predict that this sensor may be in a faulty state. Further testing can be carried out to check the hypothesis.

However, supervised learning is not flexible, as the system will encounter states that have not been previously observed. In this case, additional rules may need to be included to adapt the action to changes in the environment.

III. REPLACEMENT MODULE – INITIAL SOLUTION

We presented a preliminary framework for a mobility based fault repair architecture in Section II. This work is at an early stage. Here we provide our initial results, for which we developed an offline algorithm for the replacement module of the architecture. Our ambition is to develop a distributed, decentralized algorithm for the replacement module. We also provide examples of basic policies for the management module that can help improve the performance of the fault repair system.
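The four policy components described for the management policy module in Section II-B (event, condition, action, scope) can be sketched as a small rule table. This is an illustrative sketch only, not PFDL syntax and not the authors' implementation: the `Policy` class, the `decide` function, the state keys `coverage` and `lower_bound`, and the default of returning `ignore` when no rule fires are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List


@dataclass(frozen=True)
class Policy:
    event: str                                     # event that triggers evaluation
    condition: Callable[[Dict[str, float]], bool]  # IF part of the rule
    action: str                                    # THEN part of the rule
    scope: FrozenSet[str]                          # clusters where the rule is enforced


def decide(policies: List[Policy], event: str, cluster: str,
           state: Dict[str, float]) -> str:
    """Return the action of the first matching rule, or 'ignore' if none fires."""
    for policy in policies:
        if (policy.event == event and cluster in policy.scope
                and policy.condition(state)):
            return policy.action
    return "ignore"  # assumed default, following "otherwise, faults may be ignored"


# Hypothetical coverage rule: replace a faulty sensor in cluster1 only when
# the measured coverage has dropped below the cluster's lower bound.
policies = [
    Policy(event="fault",
           condition=lambda s: s["coverage"] < s["lower_bound"],
           action="replacement",
           scope=frozenset({"cluster1"})),
]
```

For example, `decide(policies, "fault", "cluster1", {"coverage": 60, "lower_bound": 65})` selects `replacement`, while the same fault in a cluster outside the scope is ignored.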
A. Problem statement

Given a collection of sensors and a monitor sensor LMS, together with their locations and the energy of each sensor, find a replacement schedule with maximum energy remaining.

We assume the network topology discussed in Section II-A. All the data in a cluster is reported to its LMS. When the energy of a sensor falls below a threshold $TE_{low}$, it reports its status to the LMS. Periodically the LMS checks the health of the sensors in its cluster. If there are some dying sensors, the LMS needs to make a replacement schedule. Thus, at the LMS, the problem can be formulated as follows:

There are $a$ redundant sensors $Sr$, $(a \le n)$:

$Sr = \{Sr_1, Sr_2, \ldots, Sr_a\}$; (4)

There are $b$ faulty sensors $Sf$, $(b \le a \le n)$:

$Sf = \{Sf_1, Sf_2, \ldots, Sf_b\}$; (5)

Objective function:
”What is a replacement schedule in order to minimize energy consumption?”

This problem is similar to the bipartite matching problem, which can be represented as a bipartite graph where one side is the redundant sensors and the other is the faulty sensors. Hence, the problem of fault replacement is transformed into the problem of finding a perfect matching in a bipartite graph. The bipartite graph consists of two sets of nodes, $Sr$ and $Sf$, representing redundant sensors and faulty sensors respectively. There is an edge from $Sr_j$ to $Sf_i$, and the weight of the edge is the actual energy remaining after movement.

B. Problem formulation

Let us call the matrix $ER_{b \times a}$ the energy remaining matrix, for it specifies the remaining energy for the network.

$$ER_{b \times a} = \begin{pmatrix} er_{11} & er_{12} & \ldots & er_{1a} \\ er_{21} & er_{22} & \ldots & er_{2a} \\ \vdots & \vdots & \ddots & \vdots \\ er_{b1} & er_{b2} & \ldots & er_{ba} \end{pmatrix}$$

where $er_{ij}$ is the total energy remaining at redundant sensor $j$ if faulty sensor $i$ is replaced by sensor $j$. Thus,

$er_{ij} = E_j - E_{move}$; (6)

where $E_j$ is the initial energy of sensor $j$ and $E_{move}$ is the energy consumed for movement.

$X = \{x_{ij}\}$; $i = 1, \ldots, b$; $j = 1, \ldots, a$ (7)

where $x_{ij} = 1$ if faulty sensor $i$ is replaced by redundant sensor $j$, and $x_{ij} = 0$ otherwise.

The replacement problem can be formulated as an Integer Linear Program as follows:

Objective: Maximize $TR = \sum_{i=1}^{b} \sum_{j=1}^{a} er_{ij} x_{ij}$
Constraints:
1) $x_{ij} \le 1$; $i = 1, \ldots, b$; $j = 1, \ldots, a$
2) $\sum_{i=1}^{b} x_{ij} \le 1$; $j = 1, \ldots, a$
3) $\sum_{j=1}^{a} x_{ij} \le 1$; $i = 1, \ldots, b$

An optimal replacement can be found using an integer program with linear constraints. The integer program computes $TR$ subject to constraint (1) and the additional linear constraints (2) and (3). Constraint (2) ensures that any redundant sensor $j$ can only replace one faulty sensor. Similarly, constraint (3) guarantees that any faulty sensor $i$ can be replaced by only one sensor.

IV. NUMERICAL RESULTS

A. An example

In Figure 4, we randomly place 50 working sensors (filled circles) and 20 redundant sensors (hollow circles) in an area of 100x100 meters. Among the 50 working sensors, 10 faulty sensors (cross-marked) are selected arbitrarily. Redundant sensors are assigned different amounts of remaining energy, and all faulty sensors are assumed to require a constant response time.

Fig. 4. Initial Network Topology

TABLE I
REDUNDANT SENSORS WITH DIFFERENT REMAINING ENERGY
Redundant Node  Energy  Redundant Node  Energy
1               55      11              77
2               32      12              79
3               95      13              50
4               64      14              59
5               31      15              73
6               106     16              114
7               95      17              29
8               56      18              105
9               77      19              46
10              91      20              101

Firstly, the energy remaining matrix (redundant sensor $j$ replaces faulty sensor $i$) is calculated (Table II). Then, using ILP, we calculate the maximum total remaining energy, which comes out to be 1228 units of energy.

TABLE II
ENERGY REMAINING MATRIX
j1 j2 j3 j4 j5 j6 j7 j8 j9 j10 j11 j12 j13 j14 j15 j16 j17 j18 j19 j20
i1 0 5 51.9 20 0 42 84.7 11.3 63.3 66.2 40.9 26.6 0 33.3 25 99.2 0 44.8 0 82.6
i2 13.4 0 14.8 0 0 68 40.9 50.6 35.8 37.2 3.7 33.1 0 19.1 0 75.5 0 37.9 3.8 37.9
i3 15.3 0 18.9 0 0 60.3 0 0 4.7 0 0 42.8 3.4 0 2.7 19.9 0 76.3 7.8 12.5
i4 35.3 0 24.1 0 0 80 14.3 0 19.6 0 0 60.4 6.9 0 6.3 36.8 0 79.8 27.9 24.1
i5 1.5 0 71.7 35.3 0 47.9 62 0 58.6 39.7 19.3 40.4 25.4 7 48.2 73.2 13.4 67.9 0 77.5
i6 11.9 0 50.2 11.1 0 56 25.9 0 28.1 4.8 0 50.6 34.7 0 34 41.3 6.8 99.1 6.7 41
i7 41.9 0 26.5 0 0 92.5 34.1 20.9 38.1 21.6 0 61.5 0.3 0 5.6 61.6 0 64.7 33.6 38.4
i8 36.9 0 19 0 0 91.6 31.4 26.8 33.7 21.3 0 52.3 0 1 0 61.3 0 55.5 26.7 33.6
i9 18.8 0 57 18.8 0 64.2 46.1 0 49.7 26.1 2.1 58.9 31.5 0 37.3 63.1 22.9 8.5 13.9 58.4
i10 0 0 58.7 31.8 0 18.4 76.9 0 41.5 58.9 51 6.7 0 15.9 29.4 78.1 0 32 0 88.1
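For a very small instance, the integer program of Section III-B can be checked by exhaustive search, and compared with the greedy approach mentioned in the abstract. The sketch below is an illustration under stated assumptions, not the authors' implementation: the sensor positions and energies are invented, the movement energy is assumed to be a constant cost per unit of distance (`COST_PER_UNIT`), sensors are indexed from 0, and greedy is taken to mean that each faulty sensor in turn picks the closest redundant sensor that is still free.

```python
from itertools import permutations
from math import hypot

COST_PER_UNIT = 1.0  # assumed energy spent per unit of distance (hypothetical)


def distance(p, q):
    """Euclidean distance between two (x, y) positions."""
    return hypot(p[0] - q[0], p[1] - q[1])


def energy_remaining(faulty, redundant, energy):
    """er[i][j] = E_j - E_move (Equation 6), with E_move = COST_PER_UNIT * d_ij."""
    return [[energy[j] - COST_PER_UNIT * distance(f, r)
             for j, r in enumerate(redundant)] for f in faulty]


def total_remaining(er, schedule):
    """TR of a schedule, where schedule[i] is the redundant sensor replacing i."""
    return sum(er[i][j] for i, j in enumerate(schedule))


def optimal_schedule(er):
    """Exhaustive search over one-to-one schedules (needs b <= a); maximizes TR."""
    b, a = len(er), len(er[0])
    return list(max(permutations(range(a), b),
                    key=lambda s: total_remaining(er, s)))


def greedy_schedule(faulty, redundant):
    """Each faulty sensor in turn takes the closest redundant sensor still free."""
    free, schedule = list(range(len(redundant))), []
    for f in faulty:
        j = min(free, key=lambda k: distance(f, redundant[k]))
        free.remove(j)
        schedule.append(j)
    return schedule


# Toy instance (all values invented for illustration).
faulty = [(0, 0), (10, 0)]
redundant = [(2, 0), (8, 0), (14, 0)]
energy = [20, 9, 30]

er = energy_remaining(faulty, redundant, energy)
best = optimal_schedule(er)
greedy = greedy_schedule(faulty, redundant)
print(best, total_remaining(er, best))      # [0, 2] 44.0
print(greedy, total_remaining(er, greedy))  # [0, 1] 25.0
```

In this toy instance the greedy choice sends the nearby but low energy sensor 1 to the second faulty sensor, while the exhaustive search prefers the more distant sensor 2 with more energy left. Exhaustive search grows factorially with the number of sensors, so it only serves as a check on tiny inputs; an ILP or assignment solver would be needed for instances of the size used in this section.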
The replacement schedule is as follows:

Assignment = [ 7 8 12 1 3 18 6 16 9 20 ]

This assignment vector means that faulty sensor 1 will be replaced by redundant sensor 7, faulty sensor 2 by redundant sensor 8, and so on. The following figure describes the movement schedule:

Fig. 5. Movement Schedule Without Management Policy

B. With simple management policy

We provide a simple example of a resource rule and a performance rule that can be defined in the management policy module of the fault repair architecture:

1) Resource rule: A redundant sensor should only be used for replacement if it still has a minimum desired level of energy after the movement. This desired level of energy, $Th_{accept}$, guarantees that the sensor continues working at the new location after the movement. Therefore, a simple resource rule for the management policy follows:

A redundant sensor can only be chosen for replacement if the remaining energy of the redundant sensor after movement is at least $Th_{accept}$.

$(E_j - E_{move}) \ge Th_{accept}$ (8)

2) Performance rule: Each location usually has a different working load. While there are major sections in which the sensing tasks occur continuously, in some other sections sensors may work only occasionally. If a sensor fault occurs in an important section, it should be repaired soon. Therefore, each faulty sensor has a different time delay preference for replacement. The management policy for this problem may be as follows:

If the redundant sensor response time is greater than the response time acceptable for a faulty sensor, then do not move that redundant sensor.

$\frac{d_{ij}}{v_j} \le T_i$ (9)

where $d_{ij}$ is the distance between redundant sensor $j$ and faulty sensor $i$, $v_j$ is the velocity of redundant sensor $j$, and $T_i$ is the response time required for faulty sensor $i$.

For our given network topology, we applied these management policy rules and re-calculated the movement schedule and the cost of movement (see Figure 6).

Fig. 6. Movement Schedule With Simple Management Policy

The total energy remaining in the network with the simple management policy (1321 units) is better than with no management policy defined (1228 units of energy). This is because the resource rule of the management policy helps the replacement plan avoid long distance movements. For example, without the management policy, a monitor sensor will ask redundant sensor 16 to replace faulty sensor 8, and the energy remaining in redundant sensor 16 after movement is 61 (Figure 5). However, this is not an optimal schedule, since sensor 16 travels a long distance and drains its energy considerably. With the inclusion of the management
policy, instead of replacing faulty sensor 8 by redundant sensor 16, it asks redundant sensor 6 to replace 8 (Figure 6), and the remaining energy of node 6 after the movement is fairly high (92 units). Also, redundant sensor 16 is scheduled to replace faulty sensor 5, resulting in 99 units of energy after the movement. As a result, the total energy remaining improves. Therefore, the use of management policy enhances the network performance.

C. Replacement algorithm performance

We want to investigate the performance of the replacement algorithm.

1) Greedy Heuristics: To measure the performance of the replacement algorithm, we implemented our algorithm and compared it to the greedy algorithm (a heuristic algorithm). The idea of the greedy algorithm is that each faulty sensor selects the redundant sensor that has the minimum cost of movement (e.g., the physically closest one). In Figure 7, there are 90 working sensors and 10 randomly selected faulty sensors, and the number of redundant sensors is increased from 10 to 30 nodes. In Figure 8, there are 100 working sensors, from which we selected the faulty sensors (from 1 to 20). We kept the number of redundant sensors constant at 30.

Fig. 7. Increasing the No of Redundant Sensors (plot of Total Energy Remaining against Number of redundant sensor nodes, for our algorithm and the greedy algorithm)

Fig. 8. Increasing the No of Faulty Sensors (plot of Total Energy Remaining against Number of faulty sensor nodes, for our algorithm and the greedy algorithm)

Figure 7 shows that the greedy algorithm comes close to the ILP results for a range of different numbers of redundant nodes as well as when the number of faulty nodes increases.

V. RELATED WORK

The concept of fault repair has been used widely for almost half a century in many areas. In computer systems, proactive fault repair has also received attention. For example, Moore and Shannon [6] and von Neumann [7] use redundancy to enhance the reliability of networks built from unreliable components. More recently, fault tolerance in the Internet, such as network availability and performance, has been discussed in [8]. Recently, the wireless sensor community has focused on a related research topic called fault tolerance. A reliable routing protocol for sensor networks with an arbitrary network topology has been discussed in [9]. In [10], a distributed routing algorithm is proposed which is able to deal with faults or holes present in a sensor network. Moreover, due to harsh environmental conditions, the majority of measurements in sensor networks are usually subject to errors. Techniques for measuring and adjusting uncertain values are presented in [11] and [12]. They guarantee reliable and accurate output when a large number of sensor measurement faults occur. In [13], [14], and [15], several localized threshold based decision schemes to detect faulty sensors are proposed. The most recent readings of sensors are stored and statistically analyzed. Faults are detected based on any abnormal readings which are beyond an application-specific threshold. By exploring the correlation among neighboring sensor readings, faulty readings are separated from event readings. The intuition behind these approaches is that event readings are likely to be spatially correlated. The confidence is computed statistically based on the decision predicates from neighboring sensors. A different approach, which detects failed nodes through route discovery and update, is presented in [16]. A watchdog mechanism is used to identify misbehaving nodes, and a path navigator is used to help routing protocols avoid them. In general, sensor readings are collected at a base station. An algorithm that is able to trace faulty nodes once these readings are received at the base station is proposed in [17]. However, there has been limited research on fault repair. While there are several works on energy replacement using mobile robots to recharge sensor nodes [18], [19], these new technologies have not been implemented yet. The author in [20] proposes a framework for replacing faulty sensor nodes by relocating mobile sensors. The framework consists of two phases: a Grid Quorum solution that locates the closest redundant sensor, and the calculation of an efficient route for the relocation of mobile sensors. Cascaded movement is used to achieve a good balance between energy efficiency and response time when determining a sensor relocation path. In [21], an algorithm called the Coverage Fidelity maintenance algorithm (Co-Fi) uses the mobility of sensor nodes for automated deployment and for repairing coverage loss in the monitoring area. One of the limitations of these algorithms is that they are not able to
replace multiple faulty sensor nodes at a time. Hence, they are not suitable for long-term maintenance, where the number of faulty sensors can be significantly large. To the best of our knowledge, there is no existing fault repair architecture in sensor networks in the presence of numerous faulty sensor nodes. Our proposed architecture provides robustness, adaptivity, and scalability.

VI. CONCLUSION AND FUTURE WORK

In this paper, we proposed a fault repair architecture. We introduced policy-based management in conjunction with learning techniques. As a starting point, we provided some examples of the management policies and an offline algorithm for the replacement module of our architecture using an Integer Linear Programming (ILP) formulation. We compared our ILP results with greedy heuristics. Our results suggest that the greedy algorithm performs very close to the optimal ILP results in terms of energy remaining. In its current stage, our algorithm is intended to be implemented at the Base Station and cluster heads. We are also working on a distributed version of it. Moreover, in this work we presented the empirical evaluation of the replacement action in the replacement module of the architecture. We are currently extending our work to include numerical and experimental evaluation of the entire system. Our future work will develop all other modules of the framework and provide experimental evaluation of a prototype system.

REFERENCES

[1] F. Koushanfar, M. Potkonjak, and A. Sangiovanni-Vincentelli, “Fault tolerance in wireless ad hoc sensor networks,” IEEE Sensors, vol. 2, pp. 1491–1496, June 2002.
[2] A. Howard, M. J. Mataric, and G. S. Sukhatme, “Mobile sensor network deployment using potential fields: A distributed, scalable solution to the area coverage problem,” in 6th International Symposium on Distributed Autonomous Robotics Systems (DARS02), June 2002.
[3] M. Blaze, J. Feigenbaum, and J. Lacy, “Decentralized trust management,” AT&T Research, 1996.
[4] P. F. D. Language, “draft-ietf-policy-framework-pfdl-00.txt, http://www.ietf.org/proceedings/98dec/i-d/draft-ietf-policy-framework-pfdl-00.txt.”
[5] T. Runarsson and S. Sigurdsson, “The learning methodology, http://cerium.raunvis.hi.is/ tpr/courseware/svm/notes/chapter1.pdf.”
[6] E. Moore and C. Shannon, “Reliable circuits using less reliable relays,” Franklin Institute, vol. 262, pp. 191–208, 1956.
[7] J. Neumann, “Probabilistic logics and the synthesis of reliable organisms from unreliable components,” Automata Studies, pp. 43–98, 1956.
[8] D. Medhi, “Network reliability and fault tolerance,” in Wiley Encyclopedia of Electrical & Electronics Engineering, University of Missouri, 1999.
[9] S. Iyengar, M. Sharma, and R. Kashyap, “Information routing and reliability issues in distributed sensor network,” IEEE Transactions on Signal Processing, vol. 40, pp. 3012–3021, 1992.
[10] Q. Fang, J. Gao, and L. J. Guibas, “Locating and bypassing routing holes in sensor networks,” in IEEE INFOCOM 2004, June 2004.
[11] V. Bychkovskiy, S. Megerian, D. Estrin, and M. Potkonjak, “A collaborative approach to in-place sensor calibration,” in 2nd International Workshop on Information Processing in Sensor Networks IPSN ’03, University of California, 2003.
[12] K. Whitehouse and D. Culler, “Calibration as parameter estimation in sensor networks,” in ACM Workshop on Wireless Sensor Networks and Applications WSNA ’02, September 2002.
[13] M. Ding, D. Chen, K. Xing, and X. Cheng, “Localized fault-tolerant event boundary detection in sensor networks,” in Infocom 05, March 2005.
[14] B. Krishnamachari and S. Iyengar, “Distributed Bayesian algorithms for fault-tolerant event region detection in wireless sensor networks,” IEEE Transactions on Computers, vol. 53, pp. 241–250, March 2004.
[15] M. Alanyali, S. Venkatesh, O. Savas, and S. Aeron, “Distributed Bayesian hypothesis testing in sensor networks,” in American Control Conference, 2004.
[16] S. Marti, T. Giuli, K. Lai, and M. Baker, “Mitigating routing misbehavior in mobile ad hoc networks,” in Mobicom, 2000.
[17] J. Staddon, D. Balfanz, and G. Durfee, “Efficient tracing of failed nodes in sensor networks,” in ACM Workshop on Wireless Sensor Networks and Applications WSNA ’02, September 2002.
[18] M. Rahimi, H. Shah, G. Sukhatme, J. Heidemann, and D. Estrin, “Studying the feasibility of energy harvesting in a mobile sensor network,” in IEEE International Conference on Robotics and Automation (ICRA), 2003.
[19] A. LaMarca, D. Koizumi, M. Lease, S. Sigurdsson, G. Borriello, W. Brunette, K. Sikorski, and D. Fox, “Plantcare: An investigation in practical ubiquitous systems,” Intel Research, vol. IRS-TR-02-007, 2002.
[20] G. Wang, G. Cao, T. Porta, and W. Zhang, “Sensor relocation in mobile sensor networks,” in Infocom 05, March 2005.
[21] S. Ganeriwal, A. Kansal, and M. B. Srivastava, “Self aware actuation for fault repair in sensor networks,” in IEEE International Conference on Robotics and Automation (ICRA), May 2004.