


Autonomous Vehicle for Obstacle Detection and
Avoidance Using Reinforcement Learning
C. S. Arvind 1, J. Senthilnath 2

1 Department of Computer Science, Dr. Ambedkar Institute of Technology, Bengaluru, India
2 School of Electrical and Electronic Engineering, Nanyang Technological University,
  Singapore, Singapore

{csarvind2000@gmail.com, senthil.iiscb@gmail.com}

Abstract- Obstacle detection and avoidance during navigation is one of the challenging
problems for an autonomous vehicle. Different sensors such as RGB cameras, radar and
lidar are presently used to analyze the environment around the vehicle for obstacle
detection. Analyzing the environment using supervised learning techniques has proven to
be an expensive process because different obstacles must be trained for different
scenarios. To overcome this difficulty, in this paper reinforcement learning (RL)
techniques are used to understand the uncertain environment and take decisions based on
sensor information. An off-policy, model-free Q-learning based RL algorithm with a
multi-layer perceptron neural network (MLP-NN) is applied and trained to predict the
optimal future action of the vehicle based on its current state. Further, the proposed
Q-learning with MLP-NN approach is compared with the state-of-the-art, namely
conventional Q-learning. A simulated urban obstacle scenario is considered with
different numbers of ultrasonic radar sensors for detecting obstacles. The experimental
results show that Q-learning with MLP-NN together with the ultrasonic sensors is more
accurate than the conventional Q-learning technique with the same sensors. Hence it is
demonstrated that combining Q-learning with MLP-NN improves obstacle prediction for
autonomous vehicle navigation.

Keywords: Autonomous Vehicle, Ultrasonic Radar, Reinforcement Learning, Q-learning,
Multi-Layer Perceptron Neural Network.

1 Introduction

Technological advancement in advanced driver assistance systems (ADAS) has emphasized
passenger and vehicle safety. Autonomous vehicle navigation in urban situations is
highly complex because of the static and dynamic obstacles present in the environment
around the vehicle. Detection and avoidance of these obstacles are among the most
important aspects of autonomous vehicle navigation. State-of-the-art data analysis using
different sensors such as camera, radar and lidar can effectively detect obstacles using
computer vision and machine learning, in particular reinforcement techniques [1]. At
present in ADAS, obstacle detection is conducted by processing radar and visual inputs,
which requires a lot of training data and ground truth for early detection [2]. To
overcome the limitations of the above methods, researchers are working on early
detection of obstacles using reinforcement techniques, which reduces the effort of
generating the ground truth of obstacles manually.
Reinforcement learning (RL) is a machine learning technique in which an agent, here the
autonomous vehicle, learns its environment based on the action, state and reward it
obtains from the previous action [3]. There are two modes of learning: (i) model-based
and (ii) model-free. In model-based learning, the dynamics of the environment are
simulated, such that the model learns the transition probability T(s1 | s0, a) from the
pair of current state s0 and action a to the next state s1. The main disadvantage of
this method is that as the state space and action space grow, storing this information
becomes impractical [4]. To overcome this drawback, model-free algorithms rely on trial
and error to update their knowledge. As a result, they do not require space to store
every combination of state and action in order to learn about the environment.
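As a minimal illustration of this distinction (not taken from the paper), a model-based
learner must maintain a transition model T, whereas a model-free learner such as
Q-learning only keeps value estimates Q(s, a) that are refined from sampled transitions;
the Python sketch below uses an assumed discount factor of 0.9:

# Minimal sketch contrasting the two modes of learning (illustrative only).
from collections import defaultdict

# Model-based: stores transition probabilities T(s1 | s0, a); grows with |S|^2 * |A|.
T = defaultdict(float)          # T[(s0, a, s1)] = estimated probability

# Model-free: only value estimates Q(s, a) are kept and refined by trial and error.
Q = defaultdict(float)          # Q[(s, a)] = estimated return

def model_free_update(s, a, r, s1, actions, gamma=0.9):
    """Update Q from one sampled transition; no transition model is required."""
    Q[(s, a)] = r + gamma * max(Q[(s1, b)] for b in actions)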
Babu et al. [5] used the Q-learning algorithm to predict the future action to be taken
for autonomous motion of a robot based on obstacle detection from a vision-based sensor.
Chu et al. [6] successfully demonstrated that, using reinforcement learning, a robot
becomes intelligent and navigates automatically by avoiding collisions with static and
dynamic obstacles. The main disadvantages of Q-learning based obstacle detection are:
(i) learning is slow because of the iterative method of finding the optimal Q-value of
the next state s and greedy action a using an off-policy model; (ii) a large amount of
memory is required to store the Q-values of all possible state and action combinations.
To overcome these drawbacks, Bing-Qiang et al. [7] approached robot obstacle avoidance
using a model-free, off-policy Q-learning reinforcement technique along with a neural
network, and reported that the simulated robot can learn obstacles even in a complex
environment. Mihai et al. [8] proposed a new path planning algorithm based on Q-learning
and an artificial neural network for mobile robots and analyzed the simulated obstacle
avoidance using virtual reality for easier visualization and safer testing activities.
Chen Xia et al. [9] also used Q-learning reinforcement learning with a neural network
for obstacle avoidance for industrial mobile vehicles in an unknown environment.
The research work on obstacle detection to date has been carried out in controlled
uncertain environments for robotic tasks. Two major challenges make the autonomous
vehicle setting different from such robotic tasks: (i) for precise vehicle control, the
action space must be continuous, which cannot be handled by the traditional Q-learning
algorithm; (ii) an autonomous vehicle must satisfy various constraints such as traffic
rules and vehicle dynamics.
In this paper, we propose static obstacle detection using reinforcement learning for
autonomous vehicle navigation in a simulated environment. The distance measured by
ultrasonic radar sensors is used to determine the obstacle ahead. The simulation is
performed for a dynamic urban scenario. The MLP-NN addresses the continuous action space
problem by optimally predicting the next action based on vehicle acceleration, heading
angle and the distance measures from the ultrasonic sensors. A comparative study is
carried out using two reinforcement learning methods, namely conventional Q-learning and
Q-learning with a multi-layer perceptron neural network, for static obstacle detection.
Static obstacles are detected with different numbers of ultrasonic sensors connected to
the vehicle in order to determine the accuracy of the continuous action state. Also, a
'stop-go' collision avoidance mechanism is tested on an autonomous car prototype
hardware model developed for this research work.
The paper is organized as follows: Section 2 describes the reinforcement learning
methodology for the detection of static obstacles. Section 3 presents the details of the
experiments and results. Section 4 discusses the conclusion and future work.

2 Methodology

In our research, the reinforcement learning agent is an autonomous vehicle and the
surrounding area is its environment. The autonomous vehicle performs actions
A(t) ∈ {Turn Left, Turn Right, Move Forward, Move Reverse}. The environment returns the
future state of the vehicle S(t+1) and the reward R(t+1). Fig. 1 represents the agent
"vehicle" performing actions and obtaining the future state S(t+1) and reward R(t+1).

Fig. 1: (a) Representation of agent-environment interaction using reinforcement
learning. (b) Obstacle detection scenario in an urban situation.

Fig. 1(b) represents an actual urban scenario of an autonomous vehicle with sensors
navigating on narrow roads with vehicles parked on the road (static obstacles); the
circle indicates a road junction that supports both turning and maintaining a one-way
circular roadway.


2.1 Q-learning Algorithm

Detection of the obstacle is based on the distance information from ultrasonic sensors.
To obtain wide-angle information, one or multiple sensors are placed in front of the
vehicle. The agent (vehicle) performs an action A such as move forward, move reverse,
turn left or turn right. The state X signifies the present state of the vehicle based on
action A, with X ∈ {1, ..., Nx}. Q-learning is a model-free, off-policy reinforcement
learning algorithm [10]. An initial arbitrary state and action help in calculating the
policy (Π). Based on the policy (Π), the future action is determined using a trial and
error method. Using reward points R, the Q-learning model iteratively understands the
environment. Positive reward points are awarded for a correct action, and negative
reward points penalize a wrong action. The optimal state and action at every step are
calculated using

Q(S, A) = R + γ · max_{A'} Q(S', A')                                           (1)

where Q(S, A) is the current value of taking action A from state S, R is the reward for
the action, max_{A'} Q(S', A') defines the maximum future reward, and γ is the discount
factor varying from 0 to 1 which determines the significance of future rewards. A factor
of 0 makes the agent consider only current rewards, while a factor nearing 1 makes it
strive for long-term high reward. This helps accelerate learning.
The predicted optimal new state and action are stored in a Q-table. The agent navigates
the environment using the Q-table information, which helps in detecting obstacles and
avoiding them. Algorithm 1 explains the steps taken to detect obstacles using
Q-learning.
__________________________________________________________________
Algorithm 1: Obstacle detection using Q-learning

Input:  Action A = {Move Forward, Move Reverse, Turn Left, Turn Right}
        State  X = {1, ..., Nx}
Output: Q(X, A), the optimal state and action
Let γ ∈ [0, 1] → discount factor
Let α ∈ (0, 1] → learning rate = 0.1
Let R → reward
Q-learning parameters: (X, A, R, T, α, γ)
Initialize s = random state, a = random action, R arbitrarily
Q: X × A → R
for ii = 1 to Iterations do
    start in state s ∈ X
    while s is not terminal do
        policy Π(s) ← arg max_a Q(s, a)
        a ← Π(s)
        if collision with obstacle then
            reward ← −500
        else
            reward ← R(s, a)
        endif
        s' ← T(s, a)                 // receive new state
        Q(s, a) ← Equation (1)       // update with reward and max_a' Q(s', a')
        s ← s'
    endwhile
endfor
return Q
__________________________________________________________________
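A hedged Python sketch of Algorithm 1 follows. The environment interface (reset/step
reporting collisions) and the exploration rate are assumptions made for illustration;
the action set, the collision penalty of −500 and the update of Equation (1) follow the
text.

# Illustrative tabular Q-learning loop for the obstacle-avoidance agent (a sketch,
# not the authors' code). The Environment object is a hypothetical wrapper around
# the simulator: reset() -> state, step(action) -> (next_state, reward, done).
import random
from collections import defaultdict

ACTIONS = ["forward", "reverse", "left", "right"]
GAMMA = 0.1        # discount factor as listed in Table 1
EPSILON = 0.1      # exploration rate (assumed; not specified in the paper)

Q = defaultdict(float)                       # Q[(state, action)]

def greedy_action(state):
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def q_learning(env, iterations=100_000):
    state = env.reset()
    for _ in range(iterations):
        # epsilon-greedy trial-and-error choice of the next action
        action = random.choice(ACTIONS) if random.random() < EPSILON else greedy_action(state)
        next_state, reward, done = env.step(action)   # reward: -500 on crash, +5 otherwise
        # Equation (1): immediate reward plus discounted best future value
        Q[(state, action)] = reward + GAMMA * max(Q[(next_state, a)] for a in ACTIONS)
        state = env.reset() if done else next_state
    return Q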

2.2 Q-Learning with MLP-NN

The main drawback of obstacle detection using Q-learning is that it cannot be applied to
solve complex problems because of the sparse nature of the Q-table needed to store large
amounts of data. To solve complex problems such as obstacle detection, a multi-layer
perceptron neural network algorithm [11] combined with Q-learning can predict the
optimal state value. The Q-learning output is two-dimensional (state, action). Combined
with the MLP, the output is reduced to an optimal single dimension (new state) from the
present state, action, reward and new state input values. The optimal Q(S, A) is
predicted using the MLP-NN loss function of Equation (2):

Loss(Q(S', A')) = (1/n) Σ_i |x_i − Q(S, A)|                                     (2)

where x_i is the input (state, reward, action, new state) and Q(S, A) is the Q-value
from the previous iteration. In our research, we have used a single hidden layer MLP
with one epoch and a varying number of hidden units. Input data is processed in
mini-batches using the ReLU activation function [12]. The loss is minimized using
gradient descent optimization to obtain the optimal Q(S, A). The steps taken to detect
obstacles using the hybrid Q-learning with MLP algorithm are discussed in Algorithm 2.

__________________________________________________________________
Algorithm 2: Obstacle detection using Q-learning with MLP-NN

Input:  Action A = {Move Forward, Move Reverse, Turn Left, Turn Right}
        State  X = {1, ..., Nx}
Output: Q(X, A), the optimal state and action
Let γ ∈ [0, 1] → discount factor
Let α ∈ (0, 1] → learning rate
Let R → reward
Let E → epochs = 5
Let minibatch = 64
Q-learning parameters: (X, A, R, T, α, γ)
Initialize s = random state, a = random action, R arbitrarily
Q: X × A → R
for ii = 1 to E do
    for jj = 1 to Iterations do
        start in state s ∈ X
        while s is not terminal do
            policy Π(s) ← arg max_a Q(s, a)
            a ← Π(s)
            if collision with obstacle then
                reward ← −500
            else
                reward ← R(s, a)
            endif
            s' ← T(s, a)                         // receive new state
            for kk = 1 to minibatch do
                find max_a' Q(s', a') using MLP()
                Q(s, a) ← append max_a' Q(s', a')
            endfor
            s ← s'
        endwhile
    endfor
endfor
return Q
__________________________________________________________________
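The following is a hedged TensorFlow sketch of how the MLP in Algorithm 2 can
approximate Q(s, a): the network takes the sensor distances as the state, outputs one
Q-value per action, and is fitted on mini-batches drawn from a buffer of stored
transitions. The hidden layer size, buffer size, learning rate, discount factor and
mini-batch size follow Table 1 and Algorithm 2; the exact input/output layout and the
replay-style training loop are assumptions.

# Sketch of Q-learning with an MLP function approximator (illustrative, not the
# authors' code). State = vector of ultrasonic distances; output = one Q-value
# per action, trained with the mean absolute error of Equation (2).
import random
from collections import deque
import numpy as np
import tensorflow as tf

N_SENSORS, N_ACTIONS = 3, 4
GAMMA = 0.1                                    # discount factor (Table 1)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu", input_shape=(N_SENSORS,)),
    tf.keras.layers.Dense(N_ACTIONS),          # Q-value for each of the four actions
])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.9),  # rate from Table 1
              loss="mae")

replay = deque(maxlen=50_000)                  # buffer of (state, action, reward, next_state)

def train_step(batch_size=64):                 # mini-batch size from Algorithm 2
    """Fit the network on one mini-batch sampled from the stored transitions."""
    if len(replay) < batch_size:
        return
    batch = random.sample(list(replay), batch_size)
    states = np.array([s for s, _, _, _ in batch], dtype=np.float32)
    next_states = np.array([s2 for _, _, _, s2 in batch], dtype=np.float32)
    targets = model.predict(states, verbose=0)
    next_q = model.predict(next_states, verbose=0)
    for i, (_, a, r, _) in enumerate(batch):
        targets[i, a] = r + GAMMA * np.max(next_q[i])   # target from Equation (1)
    model.train_on_batch(states, targets)

During training, each environment step appends its (state, action, reward, next state)
tuple to the buffer and then calls train_step().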

3 Simulation Results

Obstacle detection is conducted in a simulation environment considering an urban
scenario built using pygame [13]. The TensorFlow [14] library is used for building the
MLP-NN model. Figure 3 represents the simulated urban scenario setup, where the blue
moving object is the autonomous vehicle with ultrasonic radar of maximum 30 m range. Red
static objects represent road obstacles. The green object represents a road circle at
the road intersection. Black lines represent the road borders/lanes. The velocity of the
vehicle is fixed at 10 m/s such that the autonomous vehicle moves at the same speed
within the environment. The reinforcement learning algorithm learns about the
environment from sensor input values based on actions and rewards. Without rewards it
would be difficult to know that crashing is bad and that not crashing for a long time is
good. In our experiment, using a trial and error approach, a negative reward of −500 is
given each time the car crashes into an obstacle or a road border, and a reward of +5 is
given for each step without a crash. Different numbers of ultrasonic sensors with
varying hyperparameters of the MLP-NN are simulated.
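A minimal sketch of this reward scheme is given below; the crash flags are hypothetical
placeholders for what the pygame simulation reports each step.

# Reward scheme from the text: -500 for crashing into an obstacle or a road
# border, +5 for every step without a crash (illustrative helper only).
CRASH_REWARD = -500
STEP_REWARD = 5

def compute_reward(hit_obstacle: bool, hit_border: bool) -> int:
    return CRASH_REWARD if (hit_obstacle or hit_border) else STEP_REWARD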

Fig 3: Obstacle detection simulation training setup of urban scenario.

Table 1 depicts the inputs and hyperparameters used in training the reinforcement model
to detect obstacles using the Q-learning and Q-learning with MLP methodologies.

TABLE 1: Training parameters for Q-learning with MLP-NN model for obstacle detection.

Parameters Value
Number of Ultra-Sonic Sensors 3,5,7
Ultra-Sonic Sensor Placement Angle 0°, 45°, 60°, 75°
Neural Network Hidden Units [256]
Number of epochs 1
Number of iteration per epoch 100,000
Batch Size [40, 100, 400]
Buffer Size for Q-Values [10000, 50000]
Learning Rate α 0.9
Discount Factor γ 0.1

Ultrasonic sensors are placed on the front bonnet of the vehicle at a height of 2 feet
from the ground. We have experimented with obstacle detection and avoidance using three,
five and seven ultrasonic sensors at different angles. The angles at which the
ultrasonic sensors are placed on the bonnet of the vehicle are given in Table 1.
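Purely as an illustration (the paper does not specify this mapping), the readings from
sensors mounted at the angles of Table 1 can be clipped to the 30 m range and normalized
into the state vector fed to the learner; the assignment of angles to the 3-, 5- and
7-sensor layouts below is an assumption.

# Illustrative construction of the state vector from ultrasonic readings.
MAX_RANGE_M = 30.0                              # maximum sensor range in the simulation

# Assumed layouts built from the placement angles in Table 1 (0, 45, 60, 75 degrees).
SENSOR_LAYOUTS = {
    3: [-60, 0, 60],
    5: [-60, -45, 0, 45, 60],
    7: [-75, -60, -45, 0, 45, 60, 75],
}

def sensor_state(distances_m):
    """Clip raw readings to the sensor range and normalize them to [0, 1]."""
    return [min(d, MAX_RANGE_M) / MAX_RANGE_M for d in distances_m]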

3.2 Training Q-learning with MLP-NN Model

To find the optimal obstacle detection, the MLP-NN is tested with different values of
hyperparameters such as the number of hidden units, batch size, learning rate, discount
factor and buffer size. Table 1 shows the different hyperparameter values used in
building the MLP-NN model. Figure 4 shows the model training loss value against the
number of iterations for [256] hidden units, a batch size of 40 and a learning rate of
0.9. In Figure 4(a) the red line, representing the model built using 7 sensor inputs,
converges faster within fewer iterations in understanding the environment for obstacle
detection compared to 5 and 3 sensors. This figure clearly illustrates that more sensors
help in faster learning about the environment.

Fig. 4: (a) Moving average of the MLP-NN training loss for the three sensor
configurations with [256] hidden units and a batch size of 100 for 100,000 iterations.
(b) The red line (7 sensors) converges in fewer iteration frames than the green
(5 sensors) and blue (3 sensors) lines in understanding the environment
(20,000 iterations).

3.3 Assessment of Q-learning and Q-learning MLP-NN

Q-learning and Q-learning with MLP-NN are evaluated on a different testing simulation
environment with a larger number of static obstacles of different sizes. Figure 5 shows
the simulated testing environment setup used for evaluating the two algorithms for
obstacle detection. The red objects represent static obstacles and the blue circular
object represents the autonomous vehicle navigating the urban scenario. The performance
of obstacle detection using Q-learning and Q-learning with MLP is evaluated using ROC
analysis [15, 16, 17]. True positives (TP) are the number of correct actions predicted;
in our experiment a correct action means detecting an obstacle and avoiding it. False
positives (FP) are the number of false actions; in our experiment a false action means
that an obstacle is not detected and the vehicle collides with it. If the vehicle
collides with a road border/lane edge, that action is considered a false negative (FN).
Table 2 shows the performance of Q-learning and Q-learning with MLP. The results show
that Q-learning can detect and avoid obstacles with more sensors (7 sensors) but with
high false positives and false negatives, which indicates that Q-learning alone cannot
learn complex scenarios. Combining Q-learning with MLP-NN helps in understanding complex
scenarios with fewer sensor inputs (3 sensors). In our experiment, with 7 sensor inputs
and 256 hidden units with a batch size of 40, obstacles are detected and avoided with an
F1 score of 0.9995.
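The precision, recall and F1 values in Table 2 follow directly from these counts; a
short sketch of the computation:

# Precision, recall and F1 score from TP, FP and FN counts (as used in Table 2).
def precision_recall_f1(tp: int, fp: int, fn: int):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Example: the 3-sensor Q-learning row of Table 2 (reproduces the reported values
# up to rounding).
print(precision_recall_f1(17131, 361, 399))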


Fig. 5: Simulation testing environment setup for evaluating Q-learning and Q-learning
with MLP-NN for obstacle detection.

TABLE 2: Comparative study results of Q-learning and Q-learning with MLP for obstacle
detection using different sensor inputs.

Obstacle detection using Q-Learning

Sensors   Inputs   TP      FP    FN    Precision   Recall   F1 Score
3         17492    17131   361   399   0.9793      0.9772   0.9782
5         17492    17268   224   227   0.9871      0.9870   0.9871
7         17492    17330   162   171   0.9907      0.9902   0.9904

Obstacle detection using Q-Learning with MLP (hidden layer [256], batch size = 100)

Sensors   Inputs   TP      FP    FN    Precision   Recall   F1 Score
3         17492    17451   36    46    0.9979      0.9973   0.9976
5         17492    17462   30    32    0.9982      0.9981   0.9982
7         17492    17486   8     4     0.9995      0.9996   0.9995


3.4 Simulation to Hardware Prototype

We have developed a self-driving car prototype that can detect an obstacle using an
ultrasonic sensor. Figure 5 represents the complete hardware design and measurements of
the self-driving car prototype and the customized track used to test obstacle detection
with the 'stop-go' mechanism using the ultrasonic radar distance and camera input. The
vehicle automatically stops if any obstacle is detected and gradually navigates forward
when the distance from the obstacle to the vehicle reaches the safe threshold; for this
experiment 20 cm is set as the safe threshold. Figure 5(a) shows the hardware blueprint
of the vehicle design, with a length of 25 cm and a height of 5 cm from the ground. The
ultrasonic radar is mounted 2 cm above ground level and the camera is placed 23 cm above
the ground at a tilt angle of 20 degrees. Figure 5(b) shows the detailed hardware
circuit connections, where the motors "M" of the vehicle are connected to an L298N
H-bridge [18], which acts as an inverter controlling the motor spin. A pulse-width
modulation (PWM) signal at 50 Hz is used to keep the motors spinning at constant speed.
The camera and ultrasonic radar are connected to a Raspberry Pi 3, the on-board
electronic processing unit of the vehicle on which all the algorithms run. Figure 5(c)
shows the built hardware setup of the autonomous vehicle used in this work, and
Figure 5(d) shows the customized test track setup used to test obstacle detection for
the urban situation.
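A hedged sketch of the 'stop-go' logic running on the Raspberry Pi 3 is shown below. The
gpiozero-based wiring and pin numbers are assumptions made for illustration; only the
20 cm safe threshold and the stop/resume behaviour come from the text.

# 'Stop-go' obstacle avoidance sketch for the prototype (illustrative only;
# pin numbers and the gpiozero-based wiring are hypothetical).
from time import sleep
from gpiozero import DistanceSensor, Robot

SAFE_THRESHOLD_M = 0.20                         # 20 cm safe distance from the text

sensor = DistanceSensor(echo=24, trigger=23, max_distance=4)   # hypothetical GPIO pins
car = Robot(left=(5, 6), right=(13, 19))                       # motors via the L298N H-bridge

while True:
    if sensor.distance < SAFE_THRESHOLD_M:
        car.stop()                              # obstacle ahead: stop
    else:
        car.forward(speed=0.5)                  # path clear: go
    sleep(0.05)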

Fig. 5: (a) Hardware design and measurements of the self-driving car prototype,
(b) complete hardware design of the robotic self-driving car prototype, (c) self-driving
robotic car prototype built for this research work, (d) customized test track to test
obstacle detection.


Figure 6 shows frame-by-frame output of obstacle detection and avoidance using the
'stop-go' mechanism on the hardware prototype. In Figure 6(a-c) the vehicle is
autonomously navigating the test track based on the perception algorithms. In 6(d) the
vehicle automatically stops as it has detected an obstacle ahead; in 6(e) the vehicle
autonomously navigates forward when the navigation path is free from obstacles; and 6(f)
shows the experimental result of the 'stop-go' mechanism where an obstacle ahead is
detected and a collision is avoided.

Fig 6: Experimental result of “stop-go” mechanism on hardware prototype.

4 Conclusions and Future Work

In this paper, we developed a reinforcement learning technique, namely Q-learning with
MLP-NN, for obstacle detection and avoidance. The proposed method proves to be an
effective and efficient way of achieving collision avoidance for autonomous vehicle
navigation. Simulation experiment results show that Q-learning with MLP-NN is able to
understand complex urban scenarios and detect static obstacles and road borders/lane
markings better than the conventional Q-learning technique. We also developed a hardware
prototype and tested the 'stop-go' behaviour when the sensor detects an obstacle.
Although the results are very promising, some challenges were not considered as part of
this study, for example dynamic moving obstacles on the road. Detecting and avoiding
them can be accomplished using fused inputs from multiple sensors. Learning such a
complex dynamic environment can be addressed using deep reinforcement learning methods,
which is of future interest.


References
1. Pendleton, S.D.; Andersen, H.; Du, X.; Shen, X.; Meghjani, M.; Eng, Y.H.; Rus, D.;
Ang, M.H., "Perception, Planning, Control, and Coordination for Autonomous Vehicles,"
Machines 2017, 5, 6.
2. Fernando Garcia, David Martin, Arturo De La Escalera, Jose Maria Armingol, "Sensor
fusion methodology for vehicle detection", IEEE Trans. Intell. Transp. Syst. Maga-
zine, vol. 9, pp. 123-133, Jan. 2017.
3. http://outlace.com/rlpart1.html
4. J.-F. Qiao, Z.-J. Hou, X.-G. Ruan, "Application of reinforcement learning based on
neural network to dynamic obstacle avoidance", Proceedings of the 2008 IEEE Inter-
national Conference on Information and Automation, pp. 784-788, 2008.
5. V. M. Babu, U. V. Krishna, S. K. Shahensha, "An autonomous path finding robot us-
ing Q-learning", IEEE International Conference on Intelligent Systems and Control,
2016.
6. Chu P., Vu H., Yeo D., Lee B., Um K., Cho K. (2015) Robot Reinforcement Learning
for Automatically Avoiding a Dynamic Obstacle in a Virtual Environment. In: Park J.,
Chao HC., Arabnia H., Yen N. (eds) Advanced Multimedia and Ubiquitous Engineer-
ing. Lecture Notes in Electrical Engineering, vol 352. Springer, Berlin, Heidelberg
7. Bing-Qiang Huang, Guang-Yi Cao and Min Guo, "Reinforcement Learning Neural
Network to the Problem of Autonomous Mobile Robot Obstacle Avoidance," 2005 In-
ternational Conference on Machine Learning and Cybernetics, Guangzhou, China,
2005, pp. 85-89.
8. Mihai Duguleana, "Neural networks based reinforcement learning for mobile robots
obstacle avoidance," Expert Systems with Applications, Volume 62, Issue C, November
2016, Pages 104-115.
9. XIA C., EL KAMEL A. (2015) A Reinforcement Learning Method of Obstacle
Avoidance for Industrial Mobile Vehicles in Unknown Environments Using Neural
Network. In: Qi E., Shen J., Dou R. (eds) Proceedings of the 21st International Con-
ference on Industrial Engineering and Engineering Management 2014. Proceedings of
the International Conference on Industrial Engineering and Engineering Management.
Atlantis Press, Paris.
10. http://outlace.com/rlpart3.html
11. C. E. Thorpe, "Neural network based autonomous navigation", Vision and Navigation:
The Carnegie Mellon Navlab, Kluwer, 1990.
12. Hidenori Ide, Takio Kurita, “Improvement of learning for CNN with ReLU activation
by sparse regularization”, International Joint Conference on Neural Networks
(IJCNN), 2017.
13. https://www.pygame.org.
14. https://www.tensorflow.org/.
15. Zeng-Chang Qin, “ROC analysis for predictions made by probabilistic classifiers”, In-
ternational Conference on Machine Learning and Cybernetics,2005.
16. Senthilnath, J., Kulkarni, S., Benediktsson, J.A. and Yang, X.S., 2016. A novel ap-
proach for multispectral satellite image classification based on the bat algorithm. IEEE
Geoscience and Remote Sensing Letters, 13(4), pp.599-603.
17. Senthilnath, J., Bajpai, S., Omkar, S.N., Diwakar, P.G. and Mani, V., 2012. An ap-
proach to multi-temporal MODIS image analysis using image classification and seg-
mentation. Advances in Space Research, 50(9), pp.1274-1287.
18. Abu Tayab Noman, M A Mahmud Chowdhury, Humayun Rashid, "Design and implementation of
microcontroller based assistive robot for person with blind autism and visual
impairment," 2017 20th International Conference of Computer and Information Technology
(ICCIT).
