Real-Time Collision Warning System Based On Computer Vision Using Mono Camera
Abstract– This paper aims to help self-driving car and autonomous vehicle systems merge safely with the road environment and to ensure the reliability of these systems in real life. Crash avoidance is a complex system that depends on many parameters; here, the forward-collision warning system is simplified into four main objectives: detecting cars, depth estimation, assigning cars into lanes (lane assignment), and tracking. The presented work targets the software approach by using YOLO (You Only Look Once), a deep-learning object detection network, to detect cars with an accuracy of up to 93%. A depth estimation algorithm then uses the dimensions (width and height) of the boundary boxes output by YOLO to estimate distance with an accuracy of 80.4%. A real-time computer vision algorithm is also applied to assign cars into lanes, and a proposed tracking algorithm is applied to evaluate the speed limit that keeps the vehicle safe. Finally, the complete real-time system runs all algorithms at a streaming speed of 23 FPS (frames per second).

Keywords: Self-driving Cars, Computer Vision, YOLO, Crash Avoidance, Depth Estimation, Real-time System.

I. INTRODUCTION

Nowadays, cars are the most common method of transport. They are comfortable and safe, yet they carry a slim but non-zero risk of an accident. Cars remain one of the deadliest transport systems; only motorcycles are deadlier. According to the World Health Organization, road accidents are also an unsustainable economic burden, costing between 1 and 3 percent of each country's GDP [1]. The safety element in cars is therefore an essential pillar in reducing accidents around the world.
Self-driving vehicles have five core components [2]: computer
vision, sensor fusion, localization, optimal path, and control unit.
Regarding computer vision, a camera gives the car a vision system similar to human sight, so the computer can extract information and features from the captured images much as humans do.
There are many hardware approaches to providing cars with a vision system, such as radar, lidar, a stereo vision system, or a single camera. The stereo vision system consists of two cameras embedded together to provide depth information [3], while the single (mono) camera has been used in many computer vision applications. Regarding the software approach, many networks and algorithms can achieve object detection and depth estimation efficiently.
This paper focuses on a mono camera and a pre-trained model using the YOLO network with the COCO dataset from darknet [4]. This model is trained on a local machine with additional datasets taken from a mono camera so that it detects cars on the road in different environments. The distance of the detected cars is then calculated with the depth equation used in [5]. The main weak point of that equation was that the depth estimation algorithm in [5] was built around an iPhone camera, which is not a realistic solution in the automation field, and the equation depends on the camera specifications. Therefore, the first step was to modify the depth estimation equation to fit the new camera specifications while keeping the depth estimation accuracy of 80.4% [5].

On the other hand, the distance of the cars is not the only parameter to rely on in the crash-avoidance problem. Other information must also be determined, such as the speed of the other cars and the region in which each car resides. One of the proposed solutions therefore divides the road into three virtual regions: the Emergency region (E), the Left region (L), and the Right region (R), as shown in Fig. 1. More attention is given to cars in the emergency region than to cars in the other regions. The emergency region is defined as the region directly in front of the vehicle, which has to be kept clear and safe, while the other two regions represent the second priority after the emergency lane. The closer a car approaches the emergency lane, the higher its priority. That is why the proposed system defines a threshold on the relative speed of cars in the emergency region only.

Fig. 1. Three regions of the road

A tracking technique should also be created to give each car an index and keep that index for the same car through all subsequent frames. The relative speed can then be calculated as the difference between the distances of the same car in two different frames, divided by the time between those frames. Finally, the speed limit is calculated; it represents the speed that keeps our vehicle in a safe situation. A minimal sketch of this step is given below.
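To make this step concrete, the following minimal sketch (not the paper's actual implementation) shows one way the indexing and relative-speed computation could be coded; the nearest-centroid matching, the 23 FPS figure from the abstract, and all names here are illustrative assumptions.

```python
import math

class SimpleTracker:
    """Minimal nearest-centroid tracker: gives each car an index and keeps
    that index across frames (a sketch, not the paper's algorithm)."""

    def __init__(self, max_match_dist=75.0):
        self.next_id = 0
        self.tracks = {}  # car id -> last known centroid (cx, cy)
        self.max_match_dist = max_match_dist

    def update(self, centroids):
        assigned = {}
        for cx, cy in centroids:
            # Match each detection to the closest unclaimed existing track.
            best_id, best_d = None, self.max_match_dist
            for tid, (tx, ty) in self.tracks.items():
                d = math.hypot(cx - tx, cy - ty)
                if d < best_d and tid not in assigned.values():
                    best_id, best_d = tid, d
            if best_id is None:  # no close track -> assign a new index
                best_id = self.next_id
                self.next_id += 1
            self.tracks[best_id] = (cx, cy)
            assigned[(cx, cy)] = best_id
        return assigned

def relative_speed(dist_now, dist_prev, fps=23.0):
    """Relative speed = change in estimated distance between two frames
    divided by the time between them (the paper streams at 23 FPS)."""
    return (dist_now - dist_prev) * fps  # metres per second
```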
The remainder of the paper is structured as follows: Section II discusses the proposed anti-collision algorithm, Section III presents the simulation results, and Section IV summarizes the conclusions.
II. PROPOSED ANTI-COLLISION ALGORITHM
A) Object Detection

In this section, a multi-task algorithm is proposed to extract all needed information from each frame while maintaining real-time principles. YOLO is an object detection tool that divides the input image into an S×S grid, as shown in Fig. 2. Each grid cell predicts only one object; for example, the yellow grid cell in Fig. 2 tries to predict the “person” object whose center (the blue dot) falls inside that cell. Each grid cell also predicts a fixed number of boundary boxes; in this example, the yellow grid cell makes two boundary-box predictions (the blue boxes) to locate where the object is. There are several versions of YOLO; version 3 was chosen according to its speed and accuracy [9], and V3 is also suitable for the hardware used in processing. A hedged sketch of running such a detector on a frame is given below.
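As an illustration only (the paper does not list its inference code), a pre-trained Darknet YOLOv3 model [4] can be run for car detection through OpenCV's DNN module roughly as follows; the file names, input size, and thresholds are assumptions.

```python
import cv2
import numpy as np

# Assumed file names for the pre-trained COCO model from darknet [4].
net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")

def detect_cars(frame, conf_thresh=0.5, nms_thresh=0.4):
    h, w = frame.shape[:2]
    # YOLO expects a normalized, resized, RGB input blob.
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416),
                                 swapRB=True, crop=False)
    net.setInput(blob)
    outputs = net.forward(net.getUnconnectedOutLayersNames())

    boxes, scores = [], []
    for out in outputs:
        for det in out:            # det = [cx, cy, bw, bh, obj, class scores...]
            class_scores = det[5:]
            cls = int(np.argmax(class_scores))
            score = float(class_scores[cls])
            if cls == 2 and score > conf_thresh:   # class 2 = "car" in COCO
                cx, cy, bw, bh = det[:4] * np.array([w, h, w, h])
                boxes.append([int(cx - bw / 2), int(cy - bh / 2),
                              int(bw), int(bh)])
                scores.append(score)

    # Non-maximum suppression removes overlapping duplicate boxes.
    keep = cv2.dnn.NMSBoxes(boxes, scores, conf_thresh, nms_thresh)
    return [boxes[i] for i in np.array(keep).flatten()]
```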
B) Depth Estimation

Instead of the regression techniques mentioned in [5-6], the depth estimation equation (Eq. 1) is modified using a trial-and-error algorithm that tries different values of a single parameter α from 0 to 1 (this parameter comes out of the regression algorithm and has no physical meaning). The algorithm then compares the estimated distance at each value of α with the real distance from a real dataset. Table I shows the three most precise values found by this self-generating algorithm; the most accurate value was α = 0.00606. A sketch of Eq. (1) and this sweep is given after Table I.

Dist = α [2021.256 − 1.276714 * height − 0.6042361 * width + 0.0004751 * height * width]   (1)

Table I. Comparison between correction values of the depth estimation equation

#  α        Equation                                                                                  Front Acc.
1  0.00740  Dist = (0.00740)[2021.256 − 1.276714*height − 0.6042361*width + 0.0004751*height*width]   78%
2  0.00606  Dist = (0.00606)[2021.256 − 1.276714*height − 0.6042361*width + 0.0004751*height*width]   80.4%
3  0.00540  Dist = (0.00540)[2021.256 − 1.276714*height − 0.6042361*width + 0.0004751*height*width]   80%
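Eq. (1) and the trial-and-error search translate directly into code. The sketch below is a reconstruction under assumptions: the calibration samples, the step size, and the absolute-error criterion are illustrative; only the coefficients and the winning α = 0.00606 come from the paper.

```python
def estimate_distance(height, width, alpha=0.00606):
    """Eq. (1): depth from the YOLO bounding-box dimensions (pixels).
    alpha = 0.00606 gave the best front accuracy (80.4%) in Table I."""
    return alpha * (2021.256
                    - 1.276714 * height
                    - 0.6042361 * width
                    + 0.0004751 * height * width)

def search_alpha(samples, step=1e-5):
    """Trial-and-error sweep of alpha over (0, 1]: keep the value whose
    estimates best match the measured distances. `samples` is a list of
    (height, width, true_distance) tuples from a real dataset."""
    best_alpha, best_err = None, float("inf")
    alpha = step
    while alpha <= 1.0:
        err = sum(abs(estimate_distance(h, w, alpha) - d)
                  for h, w, d in samples)
        if err < best_err:
            best_alpha, best_err = alpha, err
        alpha += step
    return best_alpha
```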
C) Lane Assignation & Tracking Algorithm

After applying depth estimation, two further algorithms were developed to complete the task: lane assignation and tracking [9]. For the lane assignation algorithm, geometric principles were used to assign cars into lanes depending on each vehicle's bounding box; one plausible construction is sketched below.
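The paper does not spell out the geometric construction, so the following is only a plausible sketch: it assumes the two ego-lane boundaries are straight lines running from fixed points at the bottom of the frame to a vanishing point on the horizon (values that would come from calibration, not from the paper), and classifies each bounding box by where its bottom-center falls.

```python
def side_of_line(px, py, x0, y0, x1, y1):
    """Sign of the cross product: which side of the line (x0,y0)->(x1,y1)
    the point (px,py) lies on."""
    return (x1 - x0) * (py - y0) - (y1 - y0) * (px - x0)

def assign_region(box, frame_w, frame_h, horizon_y, vp_x):
    """Classify a detection box into Emergency (E), Left (L) or Right (R).
    Assumes hypothetical ego-lane boundaries drawn from the bottom
    quarter-points of the frame up to a vanishing point (vp_x, horizon_y)."""
    x, y, bw, bh = box
    px, py = x + bw / 2, y + bh  # bottom-center of the bounding box
    left = side_of_line(px, py, 0.25 * frame_w, frame_h, vp_x, horizon_y)
    right = side_of_line(px, py, 0.75 * frame_w, frame_h, vp_x, horizon_y)
    if left >= 0 and right <= 0:
        return "E"   # inside the ego lane: emergency region
    return "L" if left < 0 else "R"
```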
III. SIMULATION RESULTS

First, the depth estimation results are illustrated in Fig. 7.

Fig. 7. Depth estimation results

Secondly, the lane assignment algorithm reached an accuracy of up to 93%, which is the accuracy of the object detection, since the detection accuracy was improved by fine-tuning the convolutional neural network (CNN) with more images from environment conditions close to those in which the system will operate, as shown in Fig. 8. The algorithm assigns each vehicle to its lane: the letter (E) refers to the emergency lane, and the algorithm automatically colors any detection box labeled (E) red, while the right and left lanes take the letters (R) and (L) respectively, and both are colored green.

Finally, the relative speed of the front vehicle and the speed limit are evaluated with the same 80.4% accuracy as the depth estimation, so the main target is achieved in real time with noticeable accuracy, as shown in Fig. 9. A sketch of the resulting warning logic is given below.
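Tying the pieces together, the warning rule implied above (a relative-speed threshold applied only to cars in the emergency region) might look like the following sketch; the threshold value and the sign convention for closing speed are assumptions, not figures from the paper.

```python
def collision_warning(cars, closing_speed_threshold=-5.0):
    """Raise a warning for any tracked car in the emergency region whose
    relative speed shows it closing faster than the threshold.
    `cars` maps car id -> {"region": "E"/"L"/"R", "rel_speed": m/s};
    a negative rel_speed means the gap is shrinking (assumption)."""
    warnings = []
    for car_id, info in cars.items():
        # Only cars in the emergency region are checked against the threshold.
        if info["region"] == "E" and info["rel_speed"] < closing_speed_threshold:
            warnings.append(car_id)
    return warnings

# Example: car 3 ahead in region E, closing at 8 m/s -> warn.
print(collision_warning({3: {"region": "E", "rel_speed": -8.0},
                         7: {"region": "L", "rel_speed": -9.0}}))  # -> [3]
```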
IV. CONCLUSION
In this paper, the main objective was to design a crash-avoidance system that achieves four main objectives: car detection, depth estimation, lane assignation, and car tracking. These four objectives are achieved using only a mono camera. The object detection system was built with a pre-trained YOLO network. After that, a depth estimation algorithm is applied, and each frame processed by YOLO is passed to a real-time computer vision technique that assigns cars into lanes. Finally, the tracking algorithm is applied to evaluate the relative speed and the speed limit in order to avoid crashes. Using YOLO on a local machine does not achieve acceptable real-time results, so a Jetson TX2 kit was used to reach real-time performance. The depth estimation algorithm taken from [5] had a critical missing parameter related to the camera used, so it was modified to reach the desired performance. For lane assignment, a computer vision technique applying geometric principles was used; it guarantees accuracy while presenting a light processing load, so the system achieves the real-time concept with accuracy up to 93% at 23 FPS.
V. REFERENCES
[1] “Road Traffic Deaths,” World Health Organization. [Online]. Available: http://www.who.int/gho/road_safety/mortality/en/
[2] T. Litman, “Autonomous Vehicle Implementation Predictions:
Implications for Transport Planning,” in Victoria Transport Policy
Institute, vol. 28, 2017.
[3] Y. M. Mustafah, R. Noor, H. Hasbi and A. W. Azma, “Stereo vision
images processing for real-time object distance and size
measurements,” in International Conference on Computer and
Communication Engineering (ICCCE), Kuala Lumpur, pp. 659-663,
2012.
[4] “YOLO: Real-Time Object Detection,” [Online]. Available: https://pjreddie.com/darknet/yolo/, September 2019.
[5] M. I. Elzayat, A. M. Saad, M. M. Mostafa, M. R. Hassan, M. S. Darweesh, H. Abdelmunim, and H. Mostafa, “Real-Time Car
Detection-Based Depth Estimation Using Mono Camera,” in IEEE
International Conference on Microelectronics (ICM 2018), Sousse,
Tunisia, 2018.
[6] L. Ott, M. Longnecker, and J. D. Draper, “An Introduction to Statistical Methods and Data Analysis,” MA: Cengage Learning, 2016.
[7] J. Redmon, S. Divvala, R. Girshick and A. Farhadi, “You Only Look
Once: Unified, Real-Time Object Detection,” in IEEE Conference on
Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV,
pp. 779-788, 2016.
[8] L. Bertinetto, J. Valmadre, J. F. Henriques, A. Vedaldi, and P. H. S. Torr, “Fully-Convolutional Siamese Networks for Object
Tracking,” in European Conference on Computer Vision (ECCV), pp.
850-865, 2016.
[9] M. Danelljan, G. Bhat, F. S. Khan, and M. Felsberg, “ECO: Efficient Convolution Operators for Tracking,” in
IEEE Conference on Computer Vision and Pattern Recognition
(CVPR), 2017.
[10] M. U. Akhlaq, U. Izhar and U. Shahbaz, “Depth estimation from a
single camera image using power fit,” in International Conference on
Robotics and Emerging Allied Technologies in Engineering
(iCREATE), Islamabad, pp. 221-227, 2014.