a Department of Computer Science, Dr. Ambedkar Institute of Technology, Bengaluru, India
b School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, Singapore
{1csarvind2000@gmail.com, 2senthil.iiscb@gmail.com}
1 Introduction
Autonomous vehicles detect obstacles using computer vision and machine learning, in particular reinforcement techniques [1]. At present in ADAS, obstacle detection is conducted by processing radar and visual inputs, which requires a large amount of training data and ground truth for early detection [2]. To overcome the limitations of these methods, researchers are working on early detection of obstacles using reinforcement techniques, which reduces the effort of generating ground truth for obstacles manually.
Reinforcement learning (RL) is a machine learning technique in which an agent, here an autonomous vehicle, learns its environment based on the action, state and reward obtained from the previous action [3]. There are two modes of learning: (i) model-based and (ii) model-free. In model-based learning, the dynamics of the environment are simulated, such that the model learns the transition probability T(s1|(s0,a)) from the pair of current state s0 and action a to the next state s1. The main disadvantage of this method is that as the state space and action space grow, storing this information becomes impractical [4]. To overcome this drawback, model-free algorithms rely on trial and error to update their knowledge. As a result, they do not require space to store every combination of state and action to learn about the environment.
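To illustrate why a full transition model quickly becomes impractical, the short Python sketch below counts the entries a dense table T(s1|(s0,a)) would need for a coarsely discretized driving state; the discretization sizes are our own illustrative assumptions, not values from this work.
__________________________________________________________________
Illustrative sketch: size of a dense transition model T(s1|(s0,a))

# Rough size of a tabular transition model. The discretization below
# (grid cells, headings, speeds) is an arbitrary assumption chosen only
# to show how quickly the table grows.
n_positions = 100 * 100      # 100 x 100 grid cells
n_headings  = 36             # heading angle in 10-degree steps
n_speeds    = 20             # discretized speed levels
n_actions   = 4              # {left, right, forward, reverse}

n_states  = n_positions * n_headings * n_speeds
n_entries = n_states * n_actions * n_states   # one probability per (s0, a, s1)
print(f"states: {n_states:,}  transition entries: {n_entries:,}")
# states: 7,200,000  transition entries: 207,360,000,000,000
__________________________________________________________________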
Babu et al. [5] used the Q-learning algorithm to predict the future action to be taken for autonomous motion of a robot based on obstacle detection from a vision-based sensor. Chu et al. [6] successfully demonstrated that, using reinforcement learning, a robot becomes intelligent and navigates automatically by avoiding collisions with static and dynamic obstacles. The main disadvantages of Q-learning based obstacle detection are (i) learning is slow because of the iterative method of finding the optimal Q-value for the next state s and greedy action a using an off-policy model, and (ii) a large amount of memory is required to store the Q-values of all possible state and action pairs. To overcome these drawbacks, Huang et al. [7] approached robot obstacle avoidance using a model-free, off-policy Q-learning reinforcement technique along with a neural network and stated that the simulated robot can learn obstacles even in a complex environment. Duguleana [8] proposed a new path planning algorithm based on Q-learning and an artificial neural network for mobile robots and analyzed the simulated obstacle avoidance using virtual reality for easier visualization and safer testing activities. Xia and El Kamel [9] also used Q-learning for obstacle avoidance for industrial mobile vehicles in an unknown environment using a neural network.
The existing research on obstacle detection has been carried out in controlled, uncertain environments for robotic tasks. Two major challenges make the autonomous vehicle different from the robotic task: (i) for precise vehicle control, the action space must be continuous, which cannot be handled by the traditional Q-learning algorithm, and (ii) an autonomous vehicle must satisfy various constraints such as traffic rules and vehicle dynamics.
In this paper, we propose static obstacle detection using reinforcement learning for autonomous vehicle navigation in a simulated environment. The distance measured by ultrasonic radar sensors is used to determine obstacles ahead. The simulation is performed for a dynamic urban scenario. An MLP-NN addresses the continuous action space problem by optimally predicting the next action based on the vehicle acceleration, heading angle, and distance measures from the ultrasonic sensors. A comparative study is carried out using two reinforcement learning approaches, namely conventional Q-learning and Q-learning with a multi-layer perceptron neural network, for static obstacle detection. The static obstacle is detected with a varying number of ultrasonic sensors connected to the vehicle in order to determine the accuracy of the continuous action state. In addition, a 'stop-go' collision avoidance mechanism is tested on an autonomous car prototype hardware model developed for this research work.
The paper is organized as follows: Section 2 describes the reinforcement learning methodology for the detection of static obstacles. Section 3 presents the details of the experiments and results. Section 4 discusses the conclusion and future work.
2 Methodology
In our research, the reinforcement learning agent is an autonomous vehicle and the surrounding area is its environment. The autonomous vehicle performs the following actions A(t): {Turn Left, Turn Right, Move Forward, Move Reverse}. The environment returns the future state of the vehicle S(t+1) and the reward R(t+1). Fig. 1(a) shows the agent "vehicle" performing actions and obtaining the future state S(t+1) and reward R(t+1). Fig. 1(b) shows the actual urban scenario of the autonomous vehicle with sensors navigating on narrow roads with vehicles parked on the road (static obstacles); the circle indicates the road junction, which helps both in turning and in maintaining a one-way circular roadway.
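To make this interaction concrete, a minimal Python sketch of the agent-environment loop is given below. The integer state encoding, the fixed obstacle cells and the toy transition rule are illustrative assumptions only; they stand in for the actual simulator described in Section 3.
__________________________________________________________________
Illustrative sketch: agent-environment interface

from dataclasses import dataclass
import random

ACTIONS = ["Turn Left", "Turn Right", "Move Forward", "Move Reverse"]

@dataclass
class StepResult:
    next_state: int      # S(t+1), an index into the discretized state set
    reward: float        # R(t+1)
    collided: bool       # whether the action led to a collision

class ToyDrivingEnv:
    """Toy stand-in environment: states are indices, obstacles are fixed cells."""
    def __init__(self, n_states=100, obstacle_states=(17, 42, 73)):
        self.n_states, self.obstacles = n_states, set(obstacle_states)
        self.state = 0

    def reset(self) -> int:
        self.state = random.randrange(self.n_states)
        return self.state

    def step(self, action: int) -> StepResult:
        # Toy transition rule: each action shifts the state index.
        delta = {0: -1, 1: +1, 2: +10, 3: -10}[action]
        nxt = (self.state + delta) % self.n_states
        collided = nxt in self.obstacles
        reward = -500.0 if collided else 5.0   # collision penalty (Algorithm 1), safe-step reward (Sect. 3)
        self.state = nxt
        return StepResult(nxt, reward, collided)
__________________________________________________________________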
Detection of the obstacle is based on the distance information from the ultrasonic sensors. To get wide-angle information, one or multiple sensors are placed at the front of the vehicle. The agent (vehicle) performs an action 'A' such as move forward, move reverse, turn left or turn right. State 'X' signifies the present state of the vehicle based on action 'A', {1,…,Nx}. Q-learning is a model-free, off-policy reinforcement learning algorithm [10]. An initial arbitrary state and action help in calculating the policy (Π). Based on the policy (Π), the future action is determined using a trial and error method. Using reward points 'R', the Q-learning model iteratively understands the environment: positive reward points are awarded for corrective actions, and negative reward points penalize wrong actions. The optimal state and action for every step are calculated using Eq. (1):

Q(S,A) ← R + γ · max(Q(S',A'))                                    (1)

where Q(S,A) is the current estimate for action 'A' from state 'S', R is the reward for the action, max(Q(S',A')) defines the maximum future reward, and γ is the discount factor varying from 0 to 1, which determines the significance of future rewards. A factor of 0 makes the agent consider only current rewards, while a factor nearing 1 makes it strive for long-term high reward. This helps accelerate learning.
The predicted optimal new state and action are stored in a Q-table. The agent navigates in the environment using the Q-table information, which helps in detecting obstacles and avoiding them. Algorithm 1 explains the steps taken to detect obstacles using Q-learning.
__________________________________________________________________
Algorithm 1: Obstacle detection using Q-learning
Input:  Action A = {Move Forward, Move Reverse, Turn Left, Turn Right}
        State  X = {1, …, Ns}
Output: Q(X,A), the optimal state and action
Let γ ∈ [0,1] → discount factor
Let α ∈ [0,1] → learning rate = 0.1
Let R → reward
Q-learning parameters (X, A, R, T, α, γ)
Initialize S = random state, A = random action, R arbitrarily
Q : S × A → R
for i = 1 to Iterations do
    start in state s ∈ S
    while s is not terminal do
        Policy Π(x) ← arg maxa Q(x,a)
        action ← Π(state)
        if CollisionWithObstacle == True then
            reward ← −500
        else
            reward ← R(state, action)
            update(reward)
        endif
        s' ← T(s,a)            // receive new state
        Q(s',a) ← Equation (1)
        s ← s'
    endwhile
endfor
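A minimal Python version of the tabular loop in Algorithm 1 is sketched below. It assumes an environment object with reset() and step() such as the toy ToyDrivingEnv sketched above; the table sizes and the episode-termination rule (stop on collision) are illustrative choices rather than the exact simulator settings.
__________________________________________________________________
Illustrative sketch: tabular Q-learning loop (Algorithm 1)

import numpy as np

N_STATES, N_ACTIONS = 100, 4
ALPHA, GAMMA = 0.1, 0.9                  # learning rate and discount factor
Q = np.zeros((N_STATES, N_ACTIONS))      # Q-table, arbitrary (zero) initialisation

def run_episode(env, max_steps=500):
    s = env.reset()                      # start in a random state
    for _ in range(max_steps):
        a = int(np.argmax(Q[s]))         # greedy policy: pi(x) = argmax_a Q(x, a)
        res = env.step(a)                # environment returns S(t+1) and R(t+1)
        # Q-learning update; with ALPHA = 1 this reduces exactly to Eq. (1).
        target = res.reward + GAMMA * np.max(Q[res.next_state])
        Q[s, a] += ALPHA * (target - Q[s, a])
        s = res.next_state
        if res.collided:                 # treat a crash as the end of the episode
            break
    return Q
__________________________________________________________________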
The main drawback of obstacle detection using Q-learning is that it cannot be applied to complex problems because of the sparse nature of the Q-table required to store large amounts of data. To solve complex problems like obstacle detection, a multi-layer perceptron neural network [11] combined with Q-learning can predict the optimal state value. The Q-learning output is of two dimensions (state, action). Combined with the MLP, the output is reduced to an optimal single dimension (new state) from the present state, action, reward, and new state input values. The optimal Q(S,A) is predicted by minimizing the MLP-NN loss function in Eq. (2):

Loss = sqrt( (1/N) · Σ_i ( Q(S,A) − Q_pred(x_i) )² )              (2)

where x_i is the input (state, reward, action, new state), Q_pred(x_i) is the network prediction, and Q(S,A) is the previous iteration's Q-value. In our research, we have used a single hidden-layer MLP trained for one epoch with varying numbers of hidden units. The input data are processed in mini-batches using the ReLU activation function [12]. The RMSE is minimized using gradient descent optimization to obtain the optimal Q(S,A). The steps taken to detect obstacles using the hybrid Q-learning with MLP algorithm are presented in Algorithm 2.
__________________________________________________________________
Algorithm 2: Obstacle detection using Q-learning with MLP-NN
Input:  Action A = {Move Forward, Move Reverse, Turn Left, Turn Right}
        State  X = {1, …, Ns}
Output: Q(X,A), the optimal state and action
Let γ ∈ [0,1] → discount factor
endFor
endFor
endFor
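As a hedged illustration of the Q-learning with MLP-NN combination, the sketch below shows a one-hidden-layer MLP with ReLU units regressing Q-values from (state, action, reward, new state) features, trained by mini-batch gradient descent on a squared-error (RMSE) loss. The layer sizes, feature layout and synthetic data are illustrative assumptions rather than the exact network trained in this work.
__________________________________________________________________
Illustrative sketch: MLP-NN Q-value regression

import numpy as np

rng = np.random.default_rng(0)
IN_DIM, HIDDEN = 8, 256                     # feature vector -> [256] hidden units (Table 1)
W1 = rng.normal(0, 0.05, (IN_DIM, HIDDEN)); b1 = np.zeros(HIDDEN)
W2 = rng.normal(0, 0.05, (HIDDEN, 1));      b2 = np.zeros(1)

def forward(x):
    h = np.maximum(0.0, x @ W1 + b1)        # ReLU activation
    return h, h @ W2 + b2                   # hidden activations, predicted Q-value

def train_batch(x, q_target, lr=1e-3):
    """One mini-batch gradient-descent step minimising the RMSE loss via its squared-error gradient."""
    global W1, b1, W2, b2
    h, q_pred = forward(x)
    err = q_pred - q_target                 # shape (batch, 1)
    n = x.shape[0]
    # Backpropagation (gradient of the mean squared error, constant factor folded into lr)
    dW2 = h.T @ err / n;  db2 = err.mean(axis=0)
    dh  = (err @ W2.T) * (h > 0)            # ReLU derivative
    dW1 = x.T @ dh / n;   db1 = dh.mean(axis=0)
    W1 -= lr * dW1; b1 -= lr * db1; W2 -= lr * dW2; b2 -= lr * db2
    return float(np.sqrt((err ** 2).mean()))  # RMSE, the quantity being minimised

# Usage: one batch of 40 samples (one of the batch sizes in Table 1) with Eq. (1) targets.
x = rng.normal(size=(40, IN_DIM))
q_target = rng.normal(size=(40, 1))
rmse = train_batch(x, q_target)
__________________________________________________________________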
3 Simulation Results
a reward of +5 is given for each step without a crash. Different numbers of ultrasonic sensors with varying hyperparameters of the MLP-NN are simulated.
TABLE 1: Training parameters for the Q-learning with MLP-NN model for obstacle detection.

Parameter                              Value
Number of ultrasonic sensors           3, 5, 7
Ultrasonic sensor placement angles     0°, 45°, 60°, 75°
Neural network hidden units            [256]
Number of epochs                       1
Number of iterations per epoch         100,000
Batch size                             [40, 100, 400]
Buffer size for Q-values               [10000, 50000]
Learning rate α                        0.9
Discount factor γ                      0.1
Ultrasonic sensors are placed on the front bonnet of the vehicle at a height of 2 feet from the ground. We have experimented with obstacle detection and avoidance using three, five and seven ultrasonic sensors at different angles. The angles at which the ultrasonic sensors are placed on the bonnet of the vehicle are given in Table 1.
The models are trained with a learning rate of 0.9. In Fig. 4(a), the red line represents the model built using 7 sensor inputs; it converges faster, within fewer iterations, in understanding the environment for obstacle detection compared to the 5- and 3-sensor models. This figure clearly illustrates that more sensors help in faster learning about the environment.
Fig. 4: (a) Moving-average MLP-NN training loss for the three sensor configurations with [256] hidden units and a batch size of 100 over 100,000 iterations. (b) The red line (7 sensors) converges in fewer iteration frames than the green (5 sensors) and blue (3 sensors) lines in understanding the environment (20,000 iterations).
Fig. 5: Simulation testing environment setup for evaluating Q-learning and Q-learning with MLP-NN for obstacle detection.
TABLE 2: Comparative study of Q-learning and Q-learning with MLP-NN for obstacle detection using different sensor inputs.

Q-learning
Sensors   Input   TP      FP    FN    Precision   Recall   F1 Score
3         17492   17131   361   399   0.9793      0.9772   0.9782
5         17492   17268   224   227   0.9871      0.9870   0.9871
7         17492   17330   162   171   0.9907      0.9902   0.9904

Q-learning with MLP-NN
Sensors   Input   TP      FP    FN    Precision   Recall   F1 Score
3         17492   17451   36    46    0.9979      0.9973   0.9976
5         17492   17462   30    32    0.9982      0.9981   0.9982
7         17492   17486   8     4     0.9995      0.9996   0.9995
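For reference, the precision, recall and F1 score values in Table 2 follow the standard definitions precision = TP/(TP+FP), recall = TP/(TP+FN) and F1 = 2 · precision · recall / (precision + recall). For example, for the first 3-sensor row:

precision = 17131 / (17131 + 361) = 17131 / 17492 ≈ 0.979
recall    = 17131 / (17131 + 399) = 17131 / 17530 ≈ 0.977
F1        = 2 × 0.979 × 0.977 / (0.979 + 0.977) ≈ 0.978

which agrees with the tabulated values to rounding.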
We have developed a self-driving car prototype model that can detect an obstacle using an ultrasonic sensor. Figure 5 shows the complete hardware design and measurements of the self-driving car prototype and the customized track used to test obstacle detection with the 'STOP-GO' mechanism using ultrasonic radar distance and camera input. The vehicle automatically stops if any obstacle is detected and gradually navigates forward when the distance from the obstacle to the vehicle reaches the safe threshold; for the experiment, 20 centimeters is set as the safe threshold. Figure 5(a) shows the hardware blueprint of the vehicle design, with a length of 25 cm and a height of 5 cm from the ground. The ultrasonic radar is mounted at 2 cm from ground level and the camera is placed at 23 cm from the ground at a tilt angle of 20 degrees. Figure 5(b) shows the detailed hardware circuit connections, where the motors 'M' of the vehicle are connected to an L298N H-bridge [18], which acts as an inverter controlling the motor spin. Pulse Width Modulation (PWM) at a frequency of 50 Hz is used to spin the motors at constant speed. The camera and ultrasonic radar are connected to a Raspberry Pi 3, the on-board electronic processing unit on the vehicle where all the algorithms run. Figure 5(c) shows the hardware setup of the autonomous vehicle used in this work, and Figure 5(d) shows the customized test track set up to test obstacle detection for the urban situation.
Fig. 5: (a) Hardware design and measurements of the self-driving car prototype, (b) complete hardware design of the robotic self-driving car prototype, (c) self-driving robotic car prototype built for this research work, (d) customized test track to test obstacle detection.
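A hedged Python sketch of the 'STOP-GO' control loop on the prototype is given below. The GPIO pin numbers, the trigger/echo ultrasonic ranging and the forward-drive duty cycle are assumptions for illustration; only the 20 cm threshold, the 50 Hz PWM and the L298N H-bridge come from the hardware description above.
__________________________________________________________________
Illustrative sketch: 'STOP-GO' loop on the Raspberry Pi 3 prototype

import time
import RPi.GPIO as GPIO

TRIG, ECHO = 23, 24            # ultrasonic trigger/echo pins (assumed)
ENA, IN1, IN2 = 18, 17, 27     # L298N enable and direction pins (assumed)
SAFE_DISTANCE_CM = 20.0        # safe threshold used in the experiment

GPIO.setmode(GPIO.BCM)
GPIO.setup(TRIG, GPIO.OUT)
GPIO.setup(ECHO, GPIO.IN)
for pin in (ENA, IN1, IN2):
    GPIO.setup(pin, GPIO.OUT)
pwm = GPIO.PWM(ENA, 50)        # 50 Hz PWM on the H-bridge enable pin
pwm.start(0)

def distance_cm():
    """Time an ultrasonic echo and convert it to a distance in centimetres."""
    GPIO.output(TRIG, True); time.sleep(1e-5); GPIO.output(TRIG, False)
    start = stop = time.time()
    while GPIO.input(ECHO) == 0:
        start = time.time()
    while GPIO.input(ECHO) == 1:
        stop = time.time()
    return (stop - start) * 34300 / 2.0   # speed of sound ~343 m/s, out and back

try:
    while True:
        if distance_cm() < SAFE_DISTANCE_CM:
            pwm.ChangeDutyCycle(0)         # STOP: obstacle closer than the safe threshold
        else:
            GPIO.output(IN1, True); GPIO.output(IN2, False)
            pwm.ChangeDutyCycle(60)        # GO: drive forward at constant speed
        time.sleep(0.1)
finally:
    pwm.stop()
    GPIO.cleanup()
__________________________________________________________________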
Figure 6 shows the frame-by-frame output of obstacle detection and avoidance using the 'stop-go' mechanism on the hardware prototype. In Figure 6(a-c) the vehicle navigates autonomously on the test track based on the perception algorithms. In Figure 6(d) the vehicle automatically stops as it has detected an obstacle ahead, in Figure 6(e) the vehicle autonomously navigates forward when the navigation path is free from obstacles, and Figure 6(f) shows the experimental result of the 'stop-go' mechanism where an obstacle ahead is detected and a collision is avoided.
References
1. Pendleton, S.D.; Andersen, H.; Du, X.; Shen, X.; Meghjani, M.; Eng, Y.H.; Rus, D.; Ang, M.H. "Perception, Planning, Control, and Coordination for Autonomous Vehicles." Machines 2017, 5, 6.
2. Fernando Garcia, David Martin, Arturo De La Escalera, Jose Maria Armingol, "Sensor
fusion methodology for vehicle detection", IEEE Trans. Intell. Transp. Syst. Maga-
zine, vol. 9, pp. 123-133, Jan. 2017.
3. http://outlace.com/rlpart1.html
4. J.-F. Qiao, Z.-J. Hou, X.-G. Ruan, "Application of reinforcement learning based on
neural network to dynamic obstacle avoidance", Proceedings of the 2008 IEEE Inter-
national Conference on Information and Automation, pp. 784-788, 2008.
5. V. M. Babu, U. V. Krishna, S. K. Shahensha, "An autonomous path finding robot us-
ing Q-learning", IEEE International Conference on Intelligent Systems and Control,
2016.
6. Chu P., Vu H., Yeo D., Lee B., Um K., Cho K. (2015) Robot Reinforcement Learning
for Automatically Avoiding a Dynamic Obstacle in a Virtual Environment. In: Park J.,
Chao HC., Arabnia H., Yen N. (eds) Advanced Multimedia and Ubiquitous Engineer-
ing. Lecture Notes in Electrical Engineering, vol 352. Springer, Berlin, Heidelberg
7. Bing-Qiang Huang, Guang-Yi Cao and Min Guo, "Reinforcement Learning Neural
Network to the Problem of Autonomous Mobile Robot Obstacle Avoidance," 2005 In-
ternational Conference on Machine Learning and Cybernetics, Guangzhou, China,
2005, pp. 85-89.
8. Mihai Duguleana, "Neural networks based reinforcement learning for mobile robots obstacle avoidance," Expert Systems with Applications, Volume 62, Issue C, November 2016, Pages 104-115.
9. XIA C., EL KAMEL A. (2015) A Reinforcement Learning Method of Obstacle
Avoidance for Industrial Mobile Vehicles in Unknown Environments Using Neural
Network. In: Qi E., Shen J., Dou R. (eds) Proceedings of the 21st International Con-
ference on Industrial Engineering and Engineering Management 2014. Proceedings of
the International Conference on Industrial Engineering and Engineering Management.
Atlantis Press, Paris.
10. http://outlace.com/rlpart3.html
11. C. E. Thorpe, "Neural network based autonomous navigation", Vision and Navigation:
The Carnegie Mellon Navlab, Kluwer, 1990.
12. Hidenori Ide, Takio Kurita, “Improvement of learning for CNN with ReLU activation
by sparse regularization”, International Joint Conference on Neural Networks
(IJCNN), 2017.
13. https://www.pygame.org.
14. https://www.tensorflow.org/.
15. Zeng-Chang Qin, "ROC analysis for predictions made by probabilistic classifiers," International Conference on Machine Learning and Cybernetics, 2005.
16. Senthilnath, J., Kulkarni, S., Benediktsson, J.A. and Yang, X.S., 2016. A novel ap-
proach for multispectral satellite image classification based on the bat algorithm. IEEE
Geoscience and Remote Sensing Letters, 13(4), pp.599-603.
17. Senthilnath, J., Bajpai, S., Omkar, S.N., Diwakar, P.G. and Mani, V., 2012. An ap-
proach to multi-temporal MODIS image analysis using image classification and seg-
mentation. Advances in Space Research, 50(9), pp.1274-1287.
18. Abu Tayab Noman, M A Mahmud Chowdhury, Humayun Rashid, "Design and implementation of microcontroller based assistive robot for person with blind autism and visual impairment," 2017 20th International Conference of Computer and Information Technology (ICCIT), 2017.