Article:
Koh, S., Zhou, B., Fang, H. et al. (5 more authors) (2020) Real-time deep reinforcement
learning based vehicle routing and navigation. Applied Soft Computing, 96. 106694. ISSN
1568-4946
https://doi.org/10.1016/j.asoc.2020.106694
Reuse
This article is distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs
(CC BY-NC-ND) licence. This licence only allows you to download this work and share it with others as long
as you credit the authors, but you can’t change the article in any way or use it commercially. More
information and the full terms of the licence here: https://creativecommons.org/licenses/
Takedown
If you consider content in White Rose Research Online to be in breach of UK law, please notify us by
emailing eprints@whiterose.ac.uk including the URL of the record and the reason for the withdrawal request.
Real-time Deep Reinforcement Learning based Vehicle
Routing and Navigation
Songsang Koh^a, Bo Zhou^a, Hui Fang^b,∗, Po Yang^d, Zaili Yang^a, Qiang Yang^c, Lin Guan^b, Zhigang Ji^e,∗

a Department of Computer Science, Liverpool John Moores University, Liverpool, United Kingdom L3 3AF
b Department of Computer Science, Loughborough University, Loughborough, United Kingdom LE11 3TU
c College of Electrical Engineering, Zhejiang University, Hangzhou, PRC 310027
d Department of Computer Science, Sheffield University, Sheffield, United Kingdom S10 2TN
e National Key Laboratory of Science and Technology on Micro/Nano Fabrication, Shanghai Jiaotong University, Shanghai, PRC 200240
Abstract
Traffic congestion has become one of the most serious contemporary urban issues, as it leads to unnecessarily high energy consumption, air pollution and extra traveling time. During the past decade, many optimization algorithms have been designed to make optimal use of existing roadway capacity in cities and thus alleviate the problem. However, it remains challenging for vehicles to interact with the complex city environment in real time. In this paper, we propose a deep reinforcement learning (DRL) method to build a real-time intelligent vehicle routing and navigation system by formulating the task as a sequence of decisions. In addition, we provide an integrated framework to facilitate intelligent vehicle navigation research by embedding smart agents into the SUMO simulator. Nine realistic traffic scenarios are simulated to test the proposed navigation method. The experimental results demonstrate the efficient convergence of the vehicle navigation agents and their effectiveness in making optimal decisions under volatile traffic conditions. The results also show that the proposed method provides a better navigation solution compared with the benchmark routing optimization algorithms. The performance has been further validated by using the Wilcoxon test. The improvement achieved by our proposed method becomes more significant on maps with more edges (roads) and more complicated traffic, compared with the state-of-the-art navigation methods.
Keywords: Routing and navigation optimization; Deep reinforcement
learning; Deep-Q learning; SUMO; Intelligent vehicle.
1. Introduction
In recent years, traffic congestion in urban areas has become a serious problem due to rapid urbanization. It has a major impact on urban transportation networks, leading to extra traveling hours, increased fuel consumption and air pollution. According to the study in McGroarty (2010), congestion can be categorized into recurring congestion (RC) and non-recurring congestion (NRC). NRC is defined as congestion caused by unexpected events, such as construction work, inclement weather, accidents, and special events Hall (1993). Unsurprisingly, NRC accounts for a larger proportion of traffic delays in urban areas than RC due to its unpredictable nature Sun et al. (2017). Three categories of methods have been proposed to tackle the NRC problem: (1) detecting and predicting traffic congestion by utilizing both historical and real-time sensor data Zygouras et al. (2015); Ghafouri et al. (2017); (2) optimizing traffic signal control and management Wen (2008); Mousavi et al. (2017a); and (3) vehicle routing and navigation optimization Ritzinger et al. (2016); Jabbarpour et al. (2018); Okulewicz and Mańdziuk (2017); Abdulkader et al. (2015). Among these, vehicle routing and navigation, as the most promising solution, has been investigated extensively during the past decades.
The classical vehicle routing problem (VRP) is defined as finding the minimum-cost combination of routes for a given number of vehicles m to serve n customers. A typical example is path planning for collecting and delivering packages for a delivery company. The traditional VRP is classified as an NP-hard problem Jabbarpour et al. (2018). Many optimization algorithms have been proposed to find sub-optimal solutions under different
∗Corresponding author
Email addresses: S.S.Koh@2014.ljmu.ac.uk (Songsang Koh), b.zhou@ljmu.ac.uk
(Bo Zhou), H.Fang@lboro.ac.uk (Hui Fang), po.yang@sheffield.ac.uk (Po Yang),
z.yang@ljmu.ac.uk (Zaili Yang), qyang@zju.edu.cn (Qiang Yang),
L.Guan@lboro.ac.uk (Lin Guan), zhigangji@sjtu.edu.cn (Zhigang Ji)
navigation task; (2) several advanced schemes, including double DQN, dueling DQN and prioritized experience replay, are integrated into the framework to achieve more reliable convergence of the network; and (3) a distance-ranking sampling strategy is proposed to accelerate convergence.
In our work, a traffic simulator, SUMO, is seamlessly connected to smart navigation agents to provide an integrated experimental framework. It can simulate real-world traffic conditions as well as embed the agents' decisions into the traffic simulator. Whenever the agents are required to make navigation decisions, observations generated from the simulated traffic environment are fed into the navigation agents as the current traffic state representations. The agents can therefore make automated real-time navigation decisions that minimize the travel time to reach their destinations. The proposed DRL framework provides an appealing and innovative solution to the vehicle routing and navigation problem, as the learning process is fully automated and requires no labeling or guidance. Furthermore, the agents can automatically adapt their policy networks by analyzing the global environment as context, and eventually decrease the travel time to the destination as new data is generated.
The main contributions of our research can be highlighted as follows: (1) this work proposes a novel DRL algorithm to achieve an effective real-time vehicle routing and navigation system; (2) the DRL agents are embedded into the traffic simulator SUMO to form an integrated framework that facilitates intelligent vehicle navigation research in the context of a dynamic urban transportation system; and (3) the potential practical usage is tested under nine combined road and traffic conditions. The efficiency has been validated by using the Wilcoxon test.
The remainder of the paper is organized as follows: The background of our research is described in Section 2 to clarify the motivation and aim of the work. Section 3 provides an overview of the proposed framework and introduces the main components of our system. Section 4 and Section 5 explain the details of the two main components, i.e. the traffic simulator and the DRL smart agents. In Section 6, experimental results are presented to demonstrate the convergence of the DRL agents, how the agents navigate vehicles to make routing choices, and their performance compared with the SUMO built-in route optimization algorithms. Finally, Section 7 presents the concluding remarks and suggests future work.
2. Background
The original idea for solving the traffic congestion problem was traffic control and optimization, on which a significant amount of research has been conducted. In particular, many of these works focus on path planning, directing vehicles to their destinations as soon as possible under static conditions, e.g. travel distance and speed limits.
The earliest solution to the vehicle navigation problem was the shortest path algorithm, which aims to find a path between two nodes with the minimum traveling distance. Dijkstra proposed a static algorithm to find the shortest path without considering any external factors such as congestion, accidents, or average vehicle speed Dijkstra (1959). This solution alone is no longer adequate for today's traffic congestion problem, given the fast development of modern transportation networks and the constraints they bring. Although common GPS applications, e.g. Google Maps or Waze, still rely on shortest path algorithms Lanning et al. (2014), they also allow drivers to access certain real-time traffic information, e.g. accidents, construction and road blocking, which may be biased and inaccurate because the information relies heavily on human input.
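As an illustration of this classic approach (not code from the paper), Dijkstra's algorithm over a small road graph with static edge costs can be sketched in Python; the graph and its costs are invented for the example:

```python
import heapq

def shortest_path(graph, src, dst):
    """Dijkstra's algorithm over a weighted road graph.

    graph: {node: [(neighbour, edge_cost), ...]} with static costs
    (e.g. road length). Returns (total_cost, path)."""
    dist, prev = {src: 0.0}, {}
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(pq, (nd, v))
    path, node = [dst], dst
    while node != src:
        node = prev[node]
        path.append(node)
    return dist[dst], path[::-1]

# Toy road network: A->C->B->D costs 2+1+5 = 8, shorter than A->B->D = 9
roads = {"A": [("B", 4), ("C", 2)], "B": [("D", 5)],
         "C": [("B", 1), ("D", 8)], "D": []}
```

Because the costs are static, the route never reacts to congestion, which is exactly the limitation the paper discusses.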
Another direction for vehicle navigation path planning is the ant colony algorithm, first proposed by Dorigo Dorigo et al. (1996). It was inspired by the natural behavior of ants when finding food sources. Previous experiments have proven that ants can find the shortest route between two locations by leaving pheromone trails that allow other ants to track the path. Nahar and Hashim Nahar and Hashim (2011) proposed a traffic congestion control method based on different preferences to create an optimal traffic system and reduce the average traveling time by adjusting the ant colony variables. Their experiment showed that the number of ants is directly correlated with the algorithm's performance; therefore, the method does not perform well when there are only a limited number of agents in the network. A multi-agent evacuation model was introduced by Zong et al. Zong et al. (2010) to minimize the total evacuation time for vehicles and balance the traffic load. Ants belonging to one colony find routes with the same properties, and during the route-searching process the two ant colonies interact and share information. Experiments have shown that the multi-ant-colony system is more effective than a single-agent system.
Kponyo et al. Kponyo et al. (2012) also proposed a distributed intelligent traffic system that uses average vehicle speed as a parameter to determine the traffic condition. The system guides vehicles to paths with less traffic, and therefore selects the best path more efficiently than selecting a path at random. More recently, additional objectives such as workload balancing (in terms of time and distance) or minimizing vehicle emissions have been considered. Galindres-Guancha et al. proposed a multi-objective multi-depot vehicle routing problem (MOMDVRP) with the aim of minimizing the total distance traveled while balancing the routes. They developed a three-stage solution approach using a constructive heuristic, iterated local search multi-objective meta-heuristics (ILSMO) and dominance concepts Galindres-Guancha et al. (2018). Weiheng Zhang et al. presented a Multi-Depot Green Vehicle Routing Problem (MDGVRP) that applies a Two-stage Ant Colony System (TSACS) Zhang et al. (2020) to find a feasible and acceptable solution minimizing total carbon emissions. Although these ant colony methods have achieved promising results, they do not perform well in more realistic, complex and dynamic transportation systems and cannot deal with unexpected events instantly.
With the rapid development and recent success of machine learning technologies, many researchers have started to tackle the traffic congestion problem with deep learning based methods. Karlaftis et al. conducted an overview comparing statistical methods with neural networks in transportation-related research and demonstrated that solutions based on deep reinforcement learning are very promising Karlaftis and Vlahogianni (2011). Most research projects in this area apply deep learning to traffic prediction Lv et al. (2015); Polson and Sokolov (2016) or accident prediction Ren et al. (2018); Sun et al. (2017) in order to detect traffic congestion in advance. For traffic prediction, Lv et al. proposed a deep learning based traffic flow prediction method using a stacked auto-encoder model to learn generic traffic flow features Lv et al. (2015). Polson et al. presented a deep learning predictor for the spatial-temporal relations present in traffic speed measurements, focusing on forecasting traffic flows that occur unexpectedly and are hard to predict, such as those caused by special events or extreme weather Polson and Sokolov (2016). Both of these approaches provide relatively accurate traffic prediction and allow the user to foresee potential upcoming traffic congestion. Nevertheless, they do not provide a decision-making method for users to plan their routes to avoid the congestion. Ren et al.
Ren et al. (2018) collected traffic accident data and analyzed the spatial and temporal patterns of traffic accident frequency for an accident risk predictor. Sun et al. Sun et al. (2017) proposed a deep neural network, DxNAT, to identify non-recurring traffic congestion by converting traffic data in Traffic Message Channel (TMC) format into an image and using a convolutional neural network (CNN) to identify non-recurring traffic anomalies. Again, although the papers above can identify non-recurring traffic congestion, a decision-making method that helps the user avoid it is missing. Furthermore, several works have attempted to handle traffic at intersections, for instance by controlling the traffic light signals Van der Pol and Oliehoek (2016); Mousavi et al. (2017b); Genders and Razavi (2016) or navigating vehicles at occluded intersections Isele et al. (2017, 2018). Although these approaches demonstrated promising results for reducing traffic congestion in certain respects, a real traffic network relies heavily on route planning and navigation, which are not covered by the studies above.
Traffic optimization can potentially be more efficient if it is combined with an intelligent vehicle navigation system in a complex traffic network via DRL methods. After identifying the challenges of traffic optimization and discussing the strengths and weaknesses of the related works, our proposed framework that uses a DRL method for intelligent vehicle navigation is presented in Section 3.
3. The Framework
Our framework encapsulates the use of SUMO (Simulation of Urban Mobility) together with the Traffic Control Interface (TraCI), which allows us to connect to the DRL agent. In this study, an improved Deep Q-Learning Network (DQN) method Mnih et al. (2015) is adopted to train intelligent agents that navigate vehicles to their destinations while avoiding congestion. As shown in Figure 1, the designed framework consists of three parts: the first part is SUMO, the environment simulator for creating realistic traffic scenarios; the second part is the middleware that connects the SUMO environment with the DRL agents; and the third part is the DRL agents, which maintain and update the navigation policies and provide commands for the navigation simulation.
Our training framework coordinates the environment simulator SUMO with the observations, actions, and rewards needed and produced by the DRL agent. After the training is initialized, we can load, progress and reset the simulation in SUMO with the required information, such as the transportation network and vehicles, via TraCI. More details about the simulator and TraCI can be found in Section 4.

Figure 1: The framework consists of the SUMO simulator, the middleware and the DRL agent for the vehicle navigation task
Our objective is to train a policy network that can navigate a vehicle along the best route to its destination while avoiding congestion. To justify the feasibility of the proposed training framework, several experiments with different transportation maps are carried out, and the policy network is trained under different levels of traffic for performance comparison. In every training step, the framework obtains the environmental observations in SUMO and sends them to the DRL agent via the middleware. Based on the observations, the DRL agent evaluates the current traffic environment and assigns an action based on the policy neural network. The framework then applies the action to update the state and move to the next step in SUMO until the simulation is completed. The reward is calculated and passed to the DRL agent for optimization at the end of each simulation run.
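The per-step interaction described above can be sketched as follows; the toy environment, random agent, and per-step reward are illustrative stand-ins for SUMO/TraCI and the DQN policy, not the paper's implementation:

```python
import random

class ToyTrafficEnv:
    """Illustrative stand-in for the SUMO simulator plus middleware:
    emits observation vectors, applies a discrete routing action,
    and signals the end of a simulation run."""
    def __init__(self, horizon=10):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return [random.random() for _ in range(5)]

    def step(self, action):
        self.t += 1
        obs = [random.random() for _ in range(5)]
        reward = -1.0                    # e.g. a penalty per elapsed time step
        done = self.t >= self.horizon
        return obs, reward, done

class RandomAgent:
    """Placeholder for the trained DQN policy network."""
    def act(self, obs, n_actions=3):
        return random.randrange(n_actions)

env, agent = ToyTrafficEnv(), RandomAgent()
obs, done, total_reward = env.reset(), False, 0.0
while not done:
    action = agent.act(obs)               # agent picks a routing action
    obs, reward, done = env.step(action)  # framework applies it in the simulator
    total_reward += reward                # reward is returned for optimization
```

In the real framework the environment's `reset`/`step` calls would be backed by TraCI commands against a running SUMO instance.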
4. Traffic Simulator
Simulation is considered an efficient approach in computer science for investigating different scientific problems. Leveraging the increasing processing power of computers, simulation allows complex scientific models to be tested in a reasonable time at minimum cost. Traffic simulators are especially widely used in transportation research, as running experiments with vehicles in the real world is simply not practical Kotusevski and Hawick (2009). There are several widely used traffic simulators, including Quadstone Paramics Cameron and Duncan (1996), VISSIM Fellendorf (1994), AIMSUN Casas et al. (2010), MATSIM Axhausen et al. (2016) and SUMO Krajzewicz et al. (2012). These simulators provide different features and models for commercial and research purposes. Kotusevski et al. carried out a comprehensive comparison of these simulators covering their features, characteristics and limitations Kotusevski and Hawick (2009). Among them, SUMO stands out for its ability to simulate very large and complex transportation networks of up to 10,000 edges (roads).
As SUMO is an open-source, microscopic, multi-modal and extensible traffic simulator, it has been widely used in research projects with worldwide community support. It allows users to simulate specific traffic scenarios on given road maps. In our experiments, SUMO is used as the traffic simulator because: (i) it performs an optimized traffic distribution method based on vehicle types or driver behaviors to maximize the capacity of the urban transportation network; (ii) it provides flexibility and scalability in creating scenario maps; and (iii) it supports TraCI, a Python-based API that connects the traffic simulation with the controls from the smart agents.
A SUMO map can be generated either manually from simple XML data containing nodes and edges, or from third-party sources such as Open Street Map. To import external maps, SUMO provides an additional tool called “netconvert” that converts maps from other formats into a SUMO-compatible version for traffic simulation. There are two main sources for map generation. Firstly, netconvert can read maps from other traffic simulators such as VISUM, Vissim, or MATSim, and compute the input needed for SUMO to generate its maps in the compatible XML format. Secondly, it can also import common maps such as those from Open Street Map, a valuable source of real-world map data that is free to view and enhance. Figure 2 shows an example of converting an Open Street Map to a SUMO map.
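For instance, an Open Street Map export can be converted into a SUMO network with netconvert; the file names below are placeholders:

```shell
# Convert an OSM export (map.osm) into a SUMO network file (map.net.xml)
netconvert --osm-files map.osm -o map.net.xml
```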
in the network between roads is a node in our graph, and an edge is defined if there is a road segment that connects the two corresponding intersections. As shown in Figure 3, the left sub-figure is a normal urban road traffic network running in the SUMO simulator and the right sub-figure is its graph representation.
Figure 4: Problem statement of the DRL based multi-agents navigation
deterministic policy π : S → A so that the expected traveling time under similar travel conditions can be reduced significantly.
Action refers to the navigation decision made by a vehicle agent. The actions are discrete values corresponding to the decisions of navigating the vehicle to one of the m edges connected to the current edge. For example, the actions include turn-left, turn-right and go-straight, as illustrated in Figure 4 where m = 3. In our experiments with realistic city maps, m is set to the maximum number of connected edges on the map.
State is an efficient representation of the current traffic condition. The representation variables contain multiple parameters reflecting the circumstances in the global urban transportation network, to precisely describe the complexity of its dynamics. Here, the state is defined as a vector [ne , ve , le , xv , yv ], where ne is the number of vehicles on each edge, ve is the average driving speed on each edge, le is the road length of each edge, and xv and yv represent the locations of the agent and its destination. Figure 5 illustrates an example of the state representation in a sample network and how the traffic conditions are observed and extracted. In the example, there are 8 roads in the network, E ∈ {AC, CA, BC, CB, CD, DC, CE, EC}. Road DC has three vehicles (nDC = 3); roads CA and EC have two vehicles (nCA , nEC = 2); roads AC, BC and CD each have one vehicle (nAC , nBC , nCD = 1); and roads CB and CE have none (nCB , nCE = 0). Meanwhile, to reduce the dimensionality of the state space, feature construction is applied to compute the expected traveling time on each road, obtained from several features in the network. The calculation is as follows:
$$
t_e =
\begin{cases}
l_e / v_e, & \text{if } n_e > 0 \\
l_e / m_e, & \text{if } n_e = 0
\end{cases}
$$
where le is the length of the road, ve is the average driving speed on the road, me is the speed limit on the road and ne is the number of vehicles on the road. In the example, the vehicle position xv is the coordinates of the middle point of road AC, where the vehicle agent (the red vehicle on the left) is at the current time stamp, while the destination yv is the coordinates of road CE. As the vehicle agent only requires the latest state to make a decision, we compartmentalise a segment of each edge, named the Decision Zone, to indicate the best timing for obtaining a state. The decision zone dv is expressed in terms of ledge , the length of the edge, vmax , the maximum speed of the individual vehicle v, bv , the deceleration function of vehicle v, and τ , the driver's reaction time. The decision zone is determined by both the vehicle type and the driver's reaction time, for the safety of the urban transportation network.
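The piecewise travel-time feature defined above translates directly into a small function (a sketch; the names are illustrative): when vehicles are present on an edge, the observed average speed is used, and otherwise the speed limit serves as a fallback.

```python
def expected_travel_time(length, avg_speed, n_vehicles, speed_limit):
    """Expected travel time t_e of one edge.

    Uses the observed average driving speed when vehicles are present
    (n_e > 0); falls back to the speed limit on an empty edge (n_e = 0)."""
    if n_vehicles > 0:
        return length / avg_speed
    return length / speed_limit

# A 100 m edge with 3 vehicles averaging 10 m/s takes 10 s;
# the same edge when empty is traversed at the 20 m/s limit in 5 s.
```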
$$
R = \sum_{k=0}^{T} \gamma^k \, r_{s+k}
$$
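The discounted return above can be computed with a one-line helper (an illustrative sketch):

```python
def discounted_return(rewards, gamma=0.99):
    """R = sum_{k=0}^{T} gamma^k * r_{s+k} over one episode's rewards."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))
```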
of epochs. The loss of an individual experience used to train the online network is defined in Eqn. 3:
Figure 6: Comparison of different DQN methods
lower than those of the DQN and Combo-DQN. Furthermore, it shows that the cumulative travel time of the proposed method during training is also the best among the three methods. The pseudocode of our algorithm is presented in Algorithm 1 to clarify the details of the method. In addition, the next section explains all the implementation details for reproducibility.
6. Experiment Preparation
This section presents the preparation of the experiments for our proposed method, including building the simulation environment, generating demand traffic and training the smart agent.
6.1. Building a Simulation with SUMO
This subsection presents how a simulation is built for the experiment. Figure 8 illustrates the SUMO traffic simulation process diagram.
NetEdit. Eventually, the SUMO networks are ready for microscopic road traffic simulation and the routing navigation task.

Demand Traffic: Each vehicle in a SUMO simulation is defined explicitly, since SUMO is a microscopic traffic simulator. A unique identifier, the departure time, and the vehicle's route are provided via the SUMO network. Here, a route is the complete list of connected edges between an origin/destination pair, and a trip is the trajectory of a single vehicle, containing the origin/destination pair and the departure time. The trip data is stored in a .trips.xml file.

Moreover, vehicle properties can be further categorized by vehicle type. The properties considered for the description of a vehicle type in this experiment are as follows:
• color: The color for this vehicle type (only applied in SUMO-GUI).
Two vehicle types are defined in this experiment: “normal car” and “truck”. The definitions of these two vehicle types are displayed in Table 1 and stored in a .add.xml file. As shown in Table 1, each vehicle type has its own attributes, i.e. length, acceleration, deceleration, sigma, maximum speed, color and probability. Here the color is only for visualization purposes. The .trips.xml and .add.xml files are supplied to the route generation method to generate the route file .rou.xml for the traffic simulation.
Vehicle Type | Length | Accel. | Decel. | Sigma | Max Speed | Color  | Prob.
Normal Car   | 5.0    | 2.0    | 5.0    | 0.5   | 20.0      | yellow | 0.8
Truck        | 8.0    | 1.0    | 5.0    | 0.5   | 5.0       | green  | 0.2

Table 1: Definition of vehicle types
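Table 1 corresponds to vehicle type definitions in the .add.xml file roughly as follows (a sketch based on SUMO's standard vType attributes; the id values are placeholders):

```xml
<additional>
    <!-- Vehicle type definitions stored in the .add.xml additional file -->
    <vType id="normal_car" length="5.0" accel="2.0" decel="5.0"
           sigma="0.5" maxSpeed="20.0" color="yellow" probability="0.8"/>
    <vType id="truck" length="8.0" accel="1.0" decel="5.0"
           sigma="0.5" maxSpeed="5.0" color="green" probability="0.2"/>
</additional>
```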
and extracts the data during the simulation. Table 2 shows the TraCI methods used to retrieve features from the network, i.e. the number of vehicles on each road ne , the expected travel time of each road te , the current position of the agent cv and its destination dv .
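The retrieval and state-assembly step might look like the following sketch; the traci calls mentioned in the docstring are real TraCI methods, while the assembler function and its parameter names are illustrative:

```python
def build_state(n_vehicles, travel_times, agent_pos, dest_pos):
    """Concatenate per-edge features and agent/destination positions into
    the flat state vector fed to the DRL agent.

    In the live system the inputs would come from TraCI, e.g.:
      n  = traci.edge.getLastStepVehicleNumber(edge_id)
      t  = traci.edge.getTraveltime(edge_id)
      xy = traci.vehicle.getPosition(vehicle_id)
    """
    state = []
    state.extend(n_vehicles)     # n_e for every edge
    state.extend(travel_times)   # t_e for every edge
    state.extend(agent_pos)      # (x, y) of the agent
    state.extend(dest_pos)       # (x, y) of the destination
    return state
```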
and the prioritisation importance-sampling exponent is increased from 0.4 to 1.0.

Parameter | Value
Episodes | 10000
Learning Rate | 0.001
Exploration | 1.0 → 0.05
Target Network Update (learning steps) | 3000
Discount Factor | 0.99
Replay Memory Size | 10000
Mini Batch for Update | 32
Prioritisation Exponent | 0.6
Prioritisation Importance Sampling | 0.4 → 1.0

Table 3: Vehicle agent hyperparameters for intelligent navigation
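Table 3's two annealed quantities (exploration 1.0 → 0.05 and the importance-sampling exponent 0.4 → 1.0) can be produced with a schedule like the following; the linear shape is an assumption, as the paper does not state the schedule's form:

```python
def linear_anneal(start, end, step, total_steps):
    """Linearly move a hyperparameter from `start` to `end` over
    `total_steps` steps, then hold it at `end`."""
    frac = min(step / total_steps, 1.0)
    return start + frac * (end - start)

EPISODES = 10000  # from Table 3
# Exploration rate: 1.0 -> 0.05; importance-sampling exponent: 0.4 -> 1.0
epsilon = [linear_anneal(1.0, 0.05, e, EPISODES) for e in range(EPISODES + 1)]
beta = [linear_anneal(0.4, 1.0, e, EPISODES) for e in range(EPISODES + 1)]
```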
7. Experiment Evaluation
There are two parts to the experimental evaluation. Firstly, two toy maps are generated for testing the convergence of the intelligent navigation agents; the toy simulations also provide a tool to gain insight into the decisions made by the intelligent agent during navigation. Secondly, nine traffic conditions based on three regions of Liverpool city center are simulated to demonstrate the efficiency of the DRL agents.
To further demonstrate the performance of the proposed method, we compare it with five algorithms: GDUE-Dijkstra, GDUE-A*, Dynamic-Dijkstra, Dynamic-A* and Ant-Colony. GDUE-Dijkstra and GDUE-A* are the default traffic assignment algorithms in SUMO, where GDUE stands for Gawron's Dynamic User Equilibrium Gawron (1998). GDUE uses Dynamic Traffic Assignment (DTA) to model the traffic via a discrete time-dependent network. It assigns routes for all trips using a shortest path algorithm (e.g., Dijkstra's algorithm or the A* algorithm) as an initialization step, taking the edge length as the edge cost. After running the traffic simulation, it records the actual traveling time on each edge and uses the same shortest path algorithm to re-assign the routes. This step is repeated iteratively until the edge costs for all roads have converged. Dynamic-Dijkstra and Dynamic-A* Kaparias and Bell (2010) use a dynamic vehicle route planning method. These routing approaches re-compute their routes periodically, or at specific times, using one of the shortest path methods. The routing takes into account the current and recent state of traffic in the network and thus adapts to traffic jams and other changes; vehicles can be re-routed dynamically while a simulation is running. Furthermore, we also implemented the ant colony algorithm of Kponyo et al. (2012) to find the shortest traveling time for a vehicle to its destination. Its core idea is to let the vehicle choose, probabilistically, the path marked by stronger pheromone concentrations.
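The pheromone-proportional path choice at the core of the ant colony baseline can be sketched as roulette-wheel selection (an illustration, not the paper's implementation; the pheromone values are invented):

```python
import random

def choose_edge(pheromone, alpha=1.0, rng=random):
    """Pick the next edge with probability proportional to
    pheromone_i ** alpha (roulette-wheel selection).

    pheromone: {edge_id: pheromone_level}"""
    edges = list(pheromone.keys())
    weights = [pheromone[e] ** alpha for e in edges]
    r = rng.random() * sum(weights)
    acc = 0.0
    for e, w in zip(edges, weights):
        acc += w
        if r <= acc:
            return e
    return edges[-1]  # numerical safety fallback
```

With pheromone levels {"a": 9.0, "b": 1.0}, edge "a" is chosen roughly 90% of the time, mirroring how ants gravitate toward well-travelled, faster routes.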
to the volatile traffic states. In contrast, the DRL based agent (illustrated in Figure 10 (b) to Figure 10 (e)) makes flexible routing decisions based on its observation when it approaches each decision zone. In Figure 10 (b), the Q value of the decision to travel straight is much higher than the Q value of the decision to turn left. This is consistent with the intuitive observation that a slower truck is on the left edge. In Figure 10 (c), the selection of the left edge is reasonable, as there are many more vehicles on the right edge than on the left edge. The route selected by the intelligent agent is illustrated in Figure 10 (f). In this demonstration, it takes 39 time steps to reach the destination using the proposed algorithm, compared with 56 time steps when using the Dijkstra and A* routing algorithms. In other words, the proposed navigation method improves traveling time by 30.4%.
Figure 10: Simulation Result
time in the first map is the shortest. Another finding is that the convergence is relatively slower when there are more vehicles in the simulation, due to the more volatile road conditions.
Figure 12: Convergence of the smart agent in (a) City Map 1 (b) City Map 2 (c) City
Map 3
Map 1 (20 / 30 / 50 vehicles):
GDUE-Dijkstra: 81.43 (±3.31) / 81.78 (±3.61) / 92.66 (±14.47)
GDUE-A*: 82.48 (±4.34) / 82.96 (±3.87) / 95.54 (±13.82)
Dynamic-Dijkstra: 79.26 (±4.94) / 79.70 (±2.98) / 90.84 (±11.99)
Dynamic-A*: 81.25 (±6.40) / 80.26 (±4.01) / 91.18 (±13.36)
Ant-Colony: 79.56 (±4.79) / 79.84 (±4.72) / 90.98 (±12.64)
RL Agent: 78.10 (±2.66) / 78.72 (±4.19) / 84.06 (±5.29)

Map 2 (30 / 50 / 80 vehicles):
GDUE-Dijkstra: 109.48 (±5.74) / 118.23 (±11.59) / 131.26 (±19.03)
GDUE-A*: 110.78 (±8.62) / 119.14 (±11.76) / 131.77 (±19.48)
Dynamic-Dijkstra: 105.74 (±4.68) / 110.96 (±8.82) / 126.42 (±9.46)
Dynamic-A*: 107.98 (±5.90) / 111.36 (±8.62) / 127.76 (±9.82)
Ant-Colony: 106.90 (±4.63) / 113.36 (±11.82) / 127.98 (±12.44)
RL Agent: 103.44 (±2.48) / 107.66 (±4.43) / 117.00 (±6.85)

Map 3 (50 / 80 / 120 vehicles):
GDUE-Dijkstra: 162.94 (±12.75) / 180.04 (±16.72) / 206.82 (±25.37)
GDUE-A*: 163.86 (±13.06) / 181.70 (±17.44) / 208.51 (±28.26)
Dynamic-Dijkstra: 148.98 (±11.50) / 161.34 (±11.07) / 194.32 (±9.80)
Dynamic-A*: 149.16 (±11.65) / 162.12 (±9.91) / 195.96 (±8.92)
Ant-Colony: 151.08 (±9.07) / 176.80 (±18.00) / 205.12 (±30.97)
RL Agent: 143.64 (±4.84) / 147.82 (±5.60) / 159.02 (±8.73)
where the data population is not required to follow a normal distribution. According to the significance analysis presented in Table 6, the significance levels of all comparisons between the proposed method and the individual benchmark algorithms are smaller than 0.05, showing that the performance of our proposed method is superior to that of the state-of-the-art methods. Furthermore, the improvement becomes more significant when the city map is larger and the demand traffic is higher.
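The Wilcoxon signed-rank test used here is available off the shelf as scipy.stats.wilcoxon; a self-contained version using the normal approximation (a sketch that discards zero differences and average-ranks ties, which matches one common convention but is not necessarily the paper's exact procedure) looks like:

```python
import math

def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test with a normal approximation.

    Returns (W, p). Zero differences are discarded; tied |d| values
    receive average ranks."""
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    ranked = sorted(diffs, key=abs)
    ranks = [0.0] * n
    i = 0
    while i < n:                      # assign average ranks for ties in |d|
        j = i
        while j + 1 < n and abs(ranked[j + 1]) == abs(ranked[i]):
            j += 1
        avg = (i + j) / 2 + 1         # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = avg
        i = j + 1
    w_plus = sum(r for d, r in zip(ranked, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(ranked, ranks) if d < 0)
    w = min(w_plus, w_minus)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mean) / sd               # z <= 0 since w is the smaller sum
    p = 2 * 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return w, min(p, 1.0)
```

Paired travel times from two methods over the same scenarios would be fed in as `x` and `y`; a small p rejects the hypothesis that the two methods perform the same.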
changing environment. Consequently, this causes serious traffic congestion and leads to environmental damage. Therefore, in this paper we proposed a novel DRL based vehicle route optimization approach that routes vehicles to their destinations while adapting to the complexity of the urban transportation network so as to avoid traffic congestion. The proposed method not only provides a case study of designing a DRL framework to address one of the most challenging contemporary issues, but also shows how statistical theory can be connected with DRL to improve the efficiency of the convergence process.
The major contributions of this paper are: 1) We designed a novel framework
to facilitate vehicle route optimization research in a complex urban
transportation context. It provides an accessible way to tackle the vehicle
route planning problem using DRL methods, and it enhances the SUMO simulator
to make it more suitable for optimizing vehicle route selection with DRL
algorithms. 2) We designed effective observations, a reward scheme and DRL
algorithms to achieve efficient convergence of the DRL training. This paper
describes an effective observation as the representation of current traffic
conditions within a specific area of the urban network; the representation
variables contain multiple parameters reflecting the circumstances of the
global urban transportation network, precisely describing the complexity of
its dynamics. 3) We integrated the proposed vehicle route optimization
approach with a real urban map to achieve real-time intelligent vehicle
navigation. 4) We carried out an objective comparison against existing
routing methods and analyzed the significance levels to demonstrate the
potential of the proposed system.
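The paper does not reproduce its observation and reward code here; as a purely illustrative sketch (function names, the normalization choices and the terminal-bonus value are our assumptions, not the authors' design), an observation vector built from per-edge congestion measures and a reward that penalizes accumulated travel time might look like:

```python
def build_observation(edge_occupancy, edge_mean_speed, edge_speed_limit):
    """Hypothetical global traffic state: for every road edge in the
    observed area, emit its occupancy in [0, 1] and its mean speed
    normalized by the edge speed limit."""
    obs = []
    for occ, v, vmax in zip(edge_occupancy, edge_mean_speed, edge_speed_limit):
        obs.append(occ)
        obs.append(v / vmax if vmax > 0 else 0.0)
    return obs

def step_reward(elapsed_travel_time, reached_destination, arrival_bonus=100.0):
    """Hypothetical reward: penalize the travel time accumulated since
    the last routing decision, with a bonus on reaching the destination."""
    return -elapsed_travel_time + (arrival_bonus if reached_destination else 0.0)
```

In a SUMO-backed setup the occupancy and speed figures would be read from the simulator at each decision step; the key design point the paper stresses is that the observation covers the global network state, not just the ego vehicle's surroundings.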
Unlike existing navigation systems that focus on individual vehicles, our
solution takes the complex traffic conditions across the observed area into
account, guiding all the vehicles involved to maximize the efficiency of the
transportation network. The improved design of the DQN architecture makes
our solution well suited to real-time smart vehicle navigation. The method
is intended to be deployed on affordable hardware devices so that it can be
embedded into the next generation of smart vehicle navigation systems.
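The DQN machinery underlying the agent is standard; for completeness, the temporal-difference target and the epsilon-greedy action selection used in DQN-style training can be sketched as follows (a generic illustration of the technique, not the authors' exact architecture):

```python
import random

def dqn_target(reward, done, gamma, next_q_values):
    """Bellman target for one transition: r + gamma * max_a' Q(s', a'),
    with bootstrapping disabled at terminal states."""
    return reward if done else reward + gamma * max(next_q_values)

def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon explore a random action (here, e.g., a
    random next road edge); otherwise exploit the highest-valued action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

At deployment time only the greedy forward pass is needed, which is what makes a trained DQN agent cheap enough to run on the affordable hardware envisaged above.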
Nonetheless, although this paper shows a significant reduction in vehicle
traveling time when the DRL method is applied to train the vehicle agent's
model, the performance is expected to improve further if more informative
features can be extracted from the environment. Furthermore, the proposed
framework should consider not only traveling time but also vehicle emissions
to achieve a more sustainable transportation network. A third limitation is
that the model is trained on simulated traffic data; it therefore needs to
be fine-tuned when deployed on real traffic data.
To further improve the proposed vehicle route optimization for urban
transportation networks, several directions for future work are worth
mentioning: 1) optimization of the proposed framework towards a more
sustainable urban transportation network. A method that can index the impact
of vehicle emissions for vehicle navigation is needed in the next stage. In
addition, abnormal driving behaviors of emergency vehicles, such as
exceeding the speed limit, overtaking on the right or driving in the
opposite direction, should be considered by the DRL agent. 2) Investigation
of more informative features to represent realistic urban traffic conditions
and vehicle behavior. The features in the proposed approach have confirmed
the effectiveness of optimizing vehicle routes in the urban transportation
network; however, in practice, more factors could affect urban traffic
conditions and should be investigated in the future. 3) Collection of real
traffic data to narrow the gap between real data and the data generated by
the simulator. Another potential direction is to use transfer learning and
domain adaptation techniques to bridge this gap so that the concept can be
commercialized.