Adaptive Reinforcement Learning-Based Routing Protocol For Wireless Multihop Networks

Abstract – This paper presents research on the development of an adaptive packet routing scheme for wireless multihop networks, based on a reinforcement learning optimization algorithm. A brief overview of classical approaches to data routing in multihop networks is provided, emphasizing the main drawbacks of such algorithms, caused by the ineffective hop-count routing metric used in traditional multihop routing. Then, an approach based on reinforcement learning theory is presented, which has the potential to select more effective routes by relying on feedback information from neighboring nodes. An algorithm based on a reinforcement learning optimization function is proposed, and additional functions are introduced for initial route weight distribution and dynamic route selection probability, depending on the current packet loss ratio (PLR) and received signal strength indicator (RSSI) factors. The elaborated adaptive routing scheme has been tested in a real wireless multihop topology, where a programming implementation of the proposed algorithm – the RLRP protocol – showed better routing performance in terms of PLR and RRT (Route Recovery Time), compared to a traditional improved proactive scheme of wireless multihop routing implemented in the widely used B.A.T.M.A.N. (Better Approach to Mobile Ad hoc Networking) protocol.

Index Terms – RL-routing, adaptive routing, RLRP routing protocol, ad hoc networks, MANET, wireless multihop networks, wireless mesh networks.

I. INTRODUCTION

THE TOPIC of wireless multihop networks is becoming more and more important in the telecommunication industry as well as among researchers, especially in the context of the Internet of Things (IoT) and Industry 4.0 (Industrial IoT) paradigms. These kinds of data transmission networks show a huge potential to fulfill the demand for many industrial and consumer services, such as the FireChat messenger [1] or an intelligent street illumination system – the SmartLighting project [2], [3].

Wireless multihop networks are often grouped under the general name of MANET, which stands for Mobile Ad hoc Networks and includes many variations – e.g. Wireless Sensor Networks (WSNs), Wireless Mesh Networks (WMNs), etc.

Currently, real implementations of MANET networks have some performance drawbacks, primarily caused by the unreliable transmission medium under decentralized and sporadic traffic profiles. For instance, medium access control mechanisms, usually located on the L2 layer of a MANET protocol implementation, have to handle a considerable number of issues coming from radio resource allocation algorithms, high bit- and packet-error probabilities, collisions and interference. All of these input conditions are handled on the L2 layer of IEEE 802.11, 802.15.4, 802.15.1 and many other communication standards, or by custom L2 MANET implementations.

This paper focuses on effective data routing algorithms, which are built on the L3 layer, on top of the already chosen and implemented L2 standards. Widely used implementations of routing algorithms for wireless multihop networks can be found in the OLSR [4] and B.A.T.M.A.N. [5] routing protocols. Some classical routing approaches for MANETs are also described in the AODV, DSDV [5] and DSR [7] algorithms. These algorithms are briefly overviewed in the corresponding section of this article.

It should also be mentioned that various techniques and algorithms from the Machine Learning and Artificial Intelligence fields are being applied more and more commonly to specific telecommunication tasks, including the problem of effective data routing in wireless multihop networks. For instance, a number of research works cover this topic by introducing Reinforcement Learning-based and other Machine Learning-based concepts to the routing problem in WSNs, WMNs and MANETs in general; they are described in the Related Work section. Most of them concentrate on providing a generalized concept of adaptive routing with machine learning algorithms, skipping the important implementation and testing phases of real network protocol development.

The adaptive Reinforcement Learning-based routing algorithm presented in this paper is focused on embedding into real-life MANET applications. It introduces a reward distribution function, a feedback scheme for adjusting the reward value of the chosen next-hop neighboring node, as well as functions for dynamic neighbor selection. It also proposes a new, combined routing metric based on the classical hop-count value, the current PLR value and the RSSI (Received Signal Strength Indicator) of an incoming packet.

A well-tested programming implementation of the proposed adaptive routing algorithm is presented as well under the name of RLRP – Reinforcement Learning Routing Protocol. It has been extensively tested in a real wireless multihop network and showed better performance results in terms of PLR (Packet Loss Rate) and RRT (Route Recovery Time) per selected route, compared to the traditional multihop routing approach realized in the widely used B.A.T.M.A.N. protocol.
II. RELATED WORK

…

…network nodes increases, therefore lowering the overall useful network throughput. Examples of proactive routing schemes are realized in DSDV [5]. Improved versions of proactive schemes were successfully implemented in the OLSR [4] and B.A.T.M.A.N. routing protocols [5].
III. OVERVIEW OF REINFORCEMENT LEARNING ALGORITHM

Reinforcement Learning (RL) is a part of machine learning theory that introduces the notions of an agent, an environment and a reward, which are meant to optimize a given task within certain criteria by actively interacting with the environment through the given actions.

In the fundamental work of R. Sutton devoted to RL theory [15], a generalized process of RL interaction is described (Fig. 3): an agent has a set of actions A, from which it chooses in order to interact with the given environment. The environment reacts to the action performed by the agent and sends back feedback in the form of a reward; thus, the environment reinforces the agent with additional knowledge about itself. This new knowledge is then used by the agent to adapt its future action selection towards the environment, which is usually controlled by the introduced estimation value Q, updated on every step:

$Q_t = Q_{t-1} + \alpha \cdot (r_t - Q_{t-1})$,

where:
$Q_{t-1}$ – reward estimation value on the previous step;
$Q_t$ – reward estimation value on the current step;
$r_t$ – reward value for an action taken on the current step;
$\alpha$ – step size parameter;
$t$ – current step number.

Generalizing the problems that RL theory solves, two main RL tasks can be highlighted:

1) How to update the set of estimation values Q after the reward is received?

The simplest way is to update the current estimation value according to the sample-average method – i.e., by keeping the running arithmetic average of the estimation value and updating it with each received reward value.

2) How to select an action from the set A, given the sets P and S?
The most common methods for action selection are the greedy, ε-greedy and softmax methods. The greedy method always selects the action with the maximum estimation value. The ε-greedy method introduces a small probability ε with which the selected action differs from the one with the maximum estimation value. The softmax method dynamically changes the selection probabilities of the actions according to a pre-defined probability function – e.g. a Gibbs-Boltzmann distribution, as mentioned in [15].
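To make task 1 concrete, the following minimal Python sketch keeps a running sample-average estimate and updates it with each received reward (an illustration only; the function and variable names are not taken from the RLRP sources):

def update_estimate(q_prev, reward, step):
    # Sample-average update: Q_t = Q_{t-1} + (1/t) * (r_t - Q_{t-1}).
    # A constant step size alpha instead of 1/step would weight recent
    # rewards more heavily, which suits non-stationary network conditions.
    return q_prev + (1.0 / step) * (reward - q_prev)

q = 0.0
for t, r in enumerate([50.0, 40.0, 60.0], start=1):  # example reward samples
    q = update_estimate(q, r, t)
print(q)  # 50.0 – the arithmetic average of the received rewards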
…

$T_{wait} = T_{RTT} + T_{ACK} + \Delta T$,   (5)

where:
$T_{RTT}$ – estimated round trip time value along a direct link between sender and receiver;
$T_{ACK}$ – ACK generation delay on the receiver side;
$\Delta T$ – transmission time variation over a direct link, which depends on the lower L2 protocol implementation – i.e., CSMA/CA timeouts, L2 ARQ timeouts, etc.

Fig. 5. Packet forwarding scheme with ACK delay timeout.

In the developed packet forwarding scheme, at Stage 2 the following reward generation rule at the receiver side is proposed:

$r = \bar{Q} = \frac{\sum_{i} Q_i}{|N|}, \quad \bar{Q} \neq 0$,
$r = 10 \cdot (\ldots), \quad \bar{Q} = 0$,   (6)

where:
$\bar{Q}$ – average estimation value at the receiver node towards the given destination address, calculated from the current route table information of the receiver node;
$Q_i$ – estimation value towards the given destination IP via the i-th route in the table;
$N$ – total number of estimation values.

The reward values range from 0 to 100. The initial negative reward value is equal to -1.
The proposed method for next-hop neighbor selection uses the softmax method, based on the Gibbs-Boltzmann distribution [15]:

$P_t(a) = \frac{e^{Q_t(a)/\tau}}{\sum_{b} e^{Q_t(b)/\tau}}$,   (7)

where:
$P_t(a)$ – selection probability of action a on step t;
$Q_t(a)$ – estimated reward value for selecting action a on step t;
$Q_t(b)$ – estimated reward value for selecting an alternative action b on step t;
$\tau$ – positive temperature parameter.

In the context of the given packet forwarding task, the temperature parameter τ defines how likely the neighbor with the maximal estimated reward value is chosen, i.e. it defines the selection probability of the most attractive node according to its estimation value. The parameter τ is proposed to be changed dynamically, depending on the current packet loss rate (PLR) on the selected route, introducing the following τ(PLR) function:

$\tau(x) = 10, \quad x \le 1$,
$\tau(x) = 10 \cdot k \cdot (x - 1) + 10, \quad x > 1$,   (8)

where:
$x = \frac{N_{lost}}{N_{total}} \cdot 100$ – packet loss rate, varying in the range [0, 100];
$\tau$ – temperature parameter from the Gibbs-Boltzmann distribution;
$k$ – growth coefficient, equal to 0.5 by default.
An example of the behavior of the route selection probability function, depending on τ(x), is presented in Fig. 6. In the given example, a source node has 5 direct neighbors and has to forward an incoming packet towards the given destination node.

After the path discovery stage, when the initial weights towards the source and destination nodes are established, the source node has a list of estimated rewards to all direct neighbors towards the destination – Q(n). In the given example with 5 direct neighbors, the list size is equal to 5. Assume that after the initialization the list of weights contains the following values, in the dictionary data format of the Python programming language:

Q_n = {n1: 50.0, n2: 33.3, n3: 11.1, n4: 44.0, n5: 51.0}

Using the mentioned Gibbs-Boltzmann distribution, the following action selection probabilities list P_t is calculated. At step 0, the temperature parameter τ(x) is equal to 10, assuming that the initial PLR value is 0:

P_0 = {n1: 0.35, n2: 0.07, n3: 0.01, n4: 0.2, n5: 0.37}
Fig. 6. Selection probability distribution of next-hop neighbors, depending on PLR.

Thus, neighbor 5 will be selected most of the time – with 37 % selection probability at the initial step 0.

With the ongoing process of neighbor selections and subsequent packet forwarding, the PLR value is assumed to vary in an unpredictable pattern, thus affecting the reliability of the established routes. In such scenarios, the temperature parameter τ is modified according to formula (8). E.g., during the packet forwarding phase, at step t, the estimated PLR value changes from 0 to 20 percent. Accordingly, the τ parameter takes the new value:

τ = τ(PLR) = 10 * 0.5 * (20 - 1) + 10 = 105

The new selection probabilities list P_t(a) at step t is then updated:

P_t = {n1: 0.22, n2: 0.19, n3: 0.15, n4: 0.21, n5: 0.22}

It can be noticed that at the new step t the selection probabilities of neighbors 1 and 5 have decreased, while the chances of selecting neighbors 2 and 3 have increased significantly. This modification of the selection probabilities implies a selection of previously less attractive routes, since the overall channel reliability has decreased drastically (from 0 to 20 %). This allows a much more flexible route selection process under unreliable communication conditions, making sure that the alternative routes are explored more frequently.

In the given example, with the initial estimated reward values equal to Q(n), the dependency between the neighbor selection probability and the PLR value has the form presented in Fig. 4.
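The example above can be reproduced with a short Python sketch of formulas (7) and (8) (an illustration only; the function names are not taken from the RLRP sources). It yields the P_t list above exactly and values very close to P_0, the small deviations coming from rounding:

import math

def tau(plr, k=0.5):
    # Formula (8): the temperature grows linearly once PLR exceeds 1 %.
    return 10.0 if plr <= 1 else 10.0 * k * (plr - 1) + 10.0

def selection_probabilities(q_values, t):
    # Formula (7): Gibbs-Boltzmann (softmax) selection probabilities.
    weights = {n: math.exp(q / t) for n, q in q_values.items()}
    total = sum(weights.values())
    return {n: round(w / total, 2) for n, w in weights.items()}

Q_n = {'n1': 50.0, 'n2': 33.3, 'n3': 11.1, 'n4': 44.0, 'n5': 51.0}
print(selection_probabilities(Q_n, tau(0)))   # close to P_0 above
print(selection_probabilities(Q_n, tau(20)))  # matches P_t above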
For the packet forwarding task, the set of states S and the set of actions A are defined as:

$S = \{s_{del}, s_{lost}\}$,   (9)

where:
$s_{del}$ – the packet is delivered successfully to a next hop;
$s_{lost}$ – the packet is lost during transmission;

$A = \{a_{fwd}, a_{drop}\}$,   (10)

where:
$a_{fwd}$ – select a next-hop node and forward the packet there;
$a_{drop}$ – remove the packet.

Corresponding to the sets S and A, the following state transition probabilities are defined:
$\alpha$ – transition probability from $s_{del}$ to $s_{del}$;
$1 - \alpha$ – transition probability from $s_{del}$ to $s_{lost}$;
$\beta$ – transition probability from $s_{lost}$ to $s_{lost}$;
$1 - \beta$ – transition probability from $s_{lost}$ to $s_{del}$.

The set R has the following entries:

$r \in \{r_{s}, r_{f}, r_{fn}, r_{r}\}$,   (11)

where:
$r_{s}$ – reward for a successful packet transmission, calculated by formula (6);
$r_{f}$ – reward for a single unsuccessful packet transmission, which has a fixed value: $r_{f} = -1$, $n = 1$;
$r_{fn}$ – reward for subsequent unsuccessful packet transmissions, calculated by an inverted exponential law, which allows faster switching to alternative neighbors, according to the following formula:

$r_{fn} = f(n) = -1 \cdot e^{n}, \quad n \ge 2$;   (12)

$r_{r}$ – reward for a successful packet transmission after a previous failure. This reward value is equal to the positive reward $r_{s}$.

Fig. 7 demonstrates the state transition probabilities graph, depending on the given selected actions. The circles correspond to the states, while the arrows relate to the selected actions leading to the end state. Each arrow has a caption in the format {A, s', P}, where s' is the next state. The sum of transition probabilities for each action is equal to 1.
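A minimal Python sketch of the reward assignment implied by (11) and (12) is given below (an illustration only: the positive reward comes from the receiver-side estimate of formula (6), and the exact exponential growth law of the repeated-failure penalty is an assumption here):

import math

def negative_reward(n_fails):
    # A single failure is penalized with the fixed value -1 (n = 1);
    # subsequent failures are penalized by an exponentially growing value
    # (assumed e-based), so repeatedly failing neighbors are abandoned faster.
    if n_fails <= 1:
        return -1.0
    return -1.0 * math.exp(n_fails)

def reward_for_attempt(delivered, n_fails, receiver_estimate):
    # Successful transmissions (including the first one after a failure)
    # receive the positive reward reported by the receiver, formula (6).
    if delivered:
        return receiver_estimate
    return negative_reward(n_fails)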
$R(s, a) = \begin{pmatrix} 0 & -1 \\ \ldots & 0 \end{pmatrix}$   (15)
The values of the defined state transition probabilities α and β can be estimated experimentally, using the following relations:

$\alpha = \frac{N_{del}}{N_{total}}$,   (16)

$\beta = \frac{N_{lost}}{N_{total}}$,   (17)

where:
$N_{del}$ – number of subsequent packets successfully delivered;
$N_{lost}$ – number of subsequently lost packets;
$N_{total}$ – total number of transmitted packets.
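As an illustration of (16) and (17), the following short sketch estimates the transition probabilities from observed delivery statistics (the variable names are illustrative, not taken from the RLRP sources):

def estimate_transition_probabilities(n_delivered, n_lost, n_total):
    # Formulas (16) and (17): alpha and beta are estimated experimentally as
    # the fractions of delivered and lost packets among all transmissions.
    alpha = n_delivered / n_total
    beta = n_lost / n_total
    return alpha, beta

print(estimate_transition_probabilities(90, 10, 100))  # (0.9, 0.1)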
VI. PROGRAMMING IMPLEMENTATION

A programming implementation of the developed RL-based adaptive routing scheme has been created in the form of an independent routing protocol, RLRP – Reinforcement Learning Routing Protocol [16].

The developed protocol implementation is based on the standard Linux TCP/IP stack with IPv4 and IPv6 addressing support. Moreover, the RLRP protocol is independent from the L4 and L2/L1 layers of the OSI model, making it universal to employ in wireless multihop networks based on Linux SoC hardware [18].

The protocol provides two networking interfaces – one for communication with the upper application layer, based on TCP/UDP transport, and the other one for physical communication with the neighboring network nodes. At the start-up of the RLRP routing daemon under a Linux OS environment, a virtual tun [19] interface is created under the name adhoc0. After the corresponding IPv4/IPv6 addresses are assigned to the adhoc0 interface by means of the Linux OS network stack (usually initiated by the upper network application), the corresponding network application can start communication over the network using the standard socket interfaces [20] of Linux OS. As for the real physical interface used for physical transmission of the generated packets towards the wireless neighbors, any available interface can be used, such as wlan0, bt0, eth0, etc.
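The creation of the virtual interface can be illustrated with the following Python sketch, which opens a TUN device through the standard Linux TUN/TAP driver [19] (a simplified illustration requiring root privileges; the actual RLRP start-up code in [16] may differ, and the interface still has to be brought up and assigned an address by the network stack):

import fcntl
import os
import struct

TUNSETIFF = 0x400454ca   # ioctl request for configuring a tun/tap device
IFF_TUN = 0x0001         # TUN mode: raw L3 packets, no Ethernet header
IFF_NO_PI = 0x1000       # do not prepend extra protocol information bytes

# Create the virtual adhoc0 interface used by the upper-layer applications.
tun_fd = os.open('/dev/net/tun', os.O_RDWR)
ifreq = struct.pack('16sH', b'adhoc0', IFF_TUN | IFF_NO_PI)
fcntl.ioctl(tun_fd, TUNSETIFF, ifreq)

# Every read returns one raw IP packet generated by the applications; the
# routing daemon processes it and forwards it via the chosen physical
# interface (wlan0, bt0, eth0, ...).
packet = os.read(tun_fd, 2048)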
Fig. 8 shows the location of the RLRP protocol header in the OSI model. The protocol receives incoming data from the upper application layer (including the TCP/UDP fields) via the adhoc0 interface, processes it (determines the destination IP address) and inserts its own RLRP header in between the L3 and L2 headers.

Fig. 8. RLRP protocol header in TCP/IP stack.

The size of the RLRP header depends on the received packet type – e.g. TCP/UDP unicast, broadcast, or an RLRP service message. The complete list of RLRP headers is described in more detail in [16].

The generalized structure of the RLRP protocol implementation, as well as a more detailed description of each programming module, can also be found in [16].

VII. EXPERIMENTAL SETUP

The experimental setup included a real wireless multi-hop network topology, built for testing the developed routing scheme in the form of the RLRP program implementation. As a reference protocol, used for comparing the performance results with RLRP under the same testing conditions, the B.A.T.M.A.N. routing protocol was used.

The testing topology is illustrated in Fig. 9. The source node is indirectly connected to the destination node via a set of intermediate nodes, providing the initial conditions for the routing forwarding task. Each node in the testing environment consisted of a Linux SoC device [18], equipped with a wireless network interface under the 802.11 standard, working in Ad-hoc mode.

Fig. 9. Experimental network topology.
Fig. 11. Route Recovery Time values of RLRP and B.A.T.M.A.N. (in seconds).

…failure event.

These characteristics were measured using the ping network utility [21], which was the upper-level application for the tested routing protocols, generating the ICMP traffic towards the destination. The first two characteristics (RTT and PLR) were automatically measured by the ping utility, while the RRT value was measured separately via a dedicated application script, which had the following logic. When the tested routing protocol had established a route towards the destination via one of the intermediate nodes, ICMP request/reply traffic started to flow through the route.
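The core of that measurement logic can be sketched in Python as follows (a hypothetical reconstruction, not the actual test script; the destination address and timing parameters are illustrative). The script keeps probing the destination with single ICMP echo requests and reports the time between the first lost reply after a failure event and the first reply received over the recovered route:

import subprocess
import time

def measure_rrt(destination, probe_interval=0.2):
    # Route Recovery Time: elapsed time from the first lost echo reply
    # after a failure event until replies start arriving again.
    failure_start = None
    while True:
        ok = subprocess.call(['ping', '-c', '1', '-W', '1', destination],
                             stdout=subprocess.DEVNULL) == 0
        if not ok and failure_start is None:
            failure_start = time.time()          # route has just failed
        elif ok and failure_start is not None:
            return time.time() - failure_start   # route has recovered
        time.sleep(probe_interval)

print(measure_rrt('10.0.0.2'))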
…decrease the time processing overhead and, therefore, lower the average RTT value, making it closer to the other routing protocol implementations written in the C programming language, such as B.A.T.M.A.N.

Moreover, the feedback mechanism should also be a point of optimization, making it flexible with respect to the incoming user traffic intensity. Overall, these optimization actions will improve the RTT and throughput characteristics of the routed connection.

It should also be noticed that the developed adaptive scheme can be used as a more generalized approach for solving other tasks close to the wireless multi-hop routing task. For example, the proposed solution might be applied in conjunction with algorithms for available bandwidth estimation for more effective route selection across fat-pipe WAN networks [22]. Another possible application of the developed routing scheme is in conjunction with stationary mobile networks, for purposes of remote mobile monitoring [23].

X. CONCLUSION

This paper presented an application of RL-based algorithms to the routing task in wireless multihop topologies. As a result, a flexible, reliable, adaptive packet forwarding scheme has been developed, which showed significantly better results in PLR and RRT values compared to the classical routing approach widely used in current ad hoc multi-hop networks.

The developed RL-based adaptive routing scheme also proposes additional mechanisms for initial route weight distribution, as well as functions for dynamic variation of the negative and positive rewards generated for the chosen packet forwarding action. Based on the softmax selection method, a next-hop node selection probability function has been elaborated as well.

The developed programming implementation of the proposed scheme – the RLRP routing protocol [16] – has also been tested under real wireless multihop topologies [2], [3].

REFERENCES

…

[8] C. Perkins, E. Belding-Royer, and S. Das, "Ad hoc On-Demand Distance Vector (AODV) Routing," IETF RFC 3561, Jul. 2002.
[9] IEEE 802.11s, IEEE Standard for Information Technology – Telecommunications and Information Exchange Between Systems – Local and Metropolitan Area Networks – Specific Requirements – Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications – Amendment 10: Mesh Networking, IEEE Std., 2011.
[10] L. Peshkin and V. Savova, "Reinforcement learning for adaptive routing," in Proc. 2002 International Joint Conference on Neural Networks (IJCNN'02), vol. 2, IEEE, 2002, pp. 1825–1830.
[11] J. Dowling, E. Curran, R. Cunningham, and V. Cahill, "Using feedback in collaborative reinforcement learning to adaptively optimize MANET routing," IEEE Transactions on Systems, Man and Cybernetics, Part A, vol. 35, no. 3, pp. 360–372, 2005.
[12] Z. Qin, Z. Jia, and X. Chen, "Fuzzy Dynamic Programming based Trusted Routing Decision in Mobile Ad Hoc Networks," in The Fifth IEEE International Symposium on Embedded Computing (SEC '08), Beijing, China, 2008, pp. 180–185.
[13] P. Nurmi, "Reinforcement Learning for Routing in Ad Hoc Networks," in Proc. 5th Intl. Symposium on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks, IEEE Computer Society, Los Alamitos, 2007.
[14] R. Desai and B. Patil, "Cooperative reinforcement learning approach for routing in ad hoc networks," in Pervasive Computing (ICPC), 2015 International Conference on, IEEE, 2015, pp. 1–5.
[15] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, Cambridge, MA, 1998.
[16] https://github.com/dugdmitry/adhoc_routing/wiki.
[17] J. N. Davies, V. Grout, and R. Picking, "Prediction of Wireless Network Signal Strength within a Building," in Proceedings of the Seventh International Network Conference (INC), pp. 193–207, 2008.
[18] BeagleBoard.org, Beagleboard-black. [Online]. Available: http://beagleboard.org/black.
[19] Universal TUN/TAP device driver, Copyright (C) 1999-2000 Maxim Krasnyansky. https://www.kernel.org/doc/Documentation/networking/tuntap.txt.
[20] Linux Programmer's Manual, socket(2). http://man7.org/linux/man-pages/man2/socket.2.html.
[21] M. Muuss, "The Story of the PING Program," U.S. Army Research Laboratory. Archived from the original on 8 September 2010. Retrieved 8 September 2010.
[22] D. Dugaev, D. Kachan, and I. Fedotova, "Concept of traffic routing in mobile ad-hoc networks based on highly accurate available bandwidth estimations," Vestnik SibSUTIS, no. 4, 2015.
[23] D. Dugaev, "Evaluating Uplink Connection Establishing Time in LTE Networks," T-COMM – Telecommunication and Transport, Moscow, no. 5, November 2013 (in Russian).