


Real-Time Collision Warning System Based on
Computer Vision Using Mono Camera
Ahmed M. Ibrahim1, Rania M. Hassan2, Andrew E. Tawfiles1, M. Saeed Darweesh3, and Tawfik Ismail3,4
1Institute of Aviation Engineering and Technology, Egypt
2Valeo Incorporation, Egypt
3WINC: School of Engineering and Applied Sciences, Nile University, Egypt
4National Institute of Laser Enhanced Sciences, Cairo University, Egypt

Abstract– This paper aims to help self-driving cars and autonomous vehicle systems merge safely into the road environment and to ensure the reliability of these systems in real life. Crash avoidance is a complex task that depends on many parameters; here, the forward-collision warning system is simplified into four main objectives: detecting cars, depth estimation, assigning cars into lanes (lane assignment), and tracking. The presented work targets the software approach by using YOLO (You Only Look Once), a deep learning object detection network, to detect cars with an accuracy of up to 93%. A depth estimation algorithm is then applied that uses the dimensions (width and height) of the bounding boxes output by YOLO; these dimensions are used to estimate the distance with an accuracy of 80.4%. In addition, a real-time computer vision algorithm is applied to assign cars into lanes, and a proposed tracking algorithm evaluates the speed limit that keeps the vehicle safe. Finally, the complete real-time system runs at a streaming speed of 23 FPS (frames per second).

Keywords: Self-driving Cars, Computer Vision, YOLO, Crash Avoidance, Depth Estimation, Real-time System.

I. INTRODUCTION

Nowadays, cars are the most common method of transport. They are comfortable and safe, yet they carry a slim but non-zero risk of an accident; cars remain one of the deadliest transport systems, overshadowed only by motorcycles. According to the World Health Organization, road accidents are also an unsustainable economic burden, costing between 1 and 3 percent of each country's GDP [1]. The safety element in cars is therefore an essential pillar in the reduction of accidents around the world.

Self-driving vehicles have five core components [2]: computer vision, sensor fusion, localization, optimal path, and the control unit. Computer vision uses a camera to give the car a vision system like that of humans, so the computer can extract information and features from images as humans do.

There are many hardware approaches to providing a vision system to cars, such as radar, lidar, a stereo vision system, or a single camera. A stereo vision system consists of two cameras embedded together to provide depth information [3], while the single (mono) camera has been used in many computer vision applications. Regarding the software approach, many networks and algorithms can achieve object detection and depth estimation efficiently.

This paper focuses on a mono camera and a model pre-trained using the YOLO network with the COCO dataset from darknet [4]. This model is trained on a local machine with additional datasets taken from a mono camera so that it detects cars on the road in different environments. The distance to the cars is then calculated with the depth equation used in [5]. The main weak point of that equation was that the underlying depth estimation algorithm was built using an iPhone camera, which is not a realistic solution in the automation field and ties the equation to the camera specifications [5]. Therefore, the first step was to modify the depth estimation equation to fit the new camera specifications while keeping the depth estimation accuracy of 80.4% reported in [5].

On the other hand, the distance to the cars is not the only parameter to rely on in the crash avoidance problem; other information must also be determined, such as the speed of the other cars and the region in which each car resides. One of the proposed solutions divides the road into three virtual regions: the Emergency region (E), the Left region (L), and the Right region (R), as shown in Fig. 1. More attention is given to the cars in the emergency region than to those in the other regions. The emergency region is defined as the region in front of the vehicle, which has to be clear and safe, while the other two regions represent the second priority. The closer a car comes to the emergency lane, the higher its priority. That is why the proposed system defines a relative speed threshold only for cars in the emergency region.

Fig. 1. Three regions of the road

A tracking technique should be created to give each car an index and keep the same index for that car through all subsequent frames. The relative speed can then be calculated as the difference between the distances of the same car in two different frames, divided by the time between those frames. Finally, the speed limit is calculated, which represents the speed that keeps our vehicle in a safe situation.

The remainder of the paper is structured as follows: Section II discusses the proposed anti-collision algorithm. Section III presents the simulation results. Finally, Section IV summarizes the conclusions.
II. PROPOSED ANTI-COLLISION ALGORITHM

A) Object Detection

In this section, a multi-task algorithm is proposed to extract all the needed information from each frame while maintaining real-time operation. YOLO is an object detection network that divides the input image into an S×S grid, as shown in Fig. 2. Each grid cell predicts only one object; for example, the yellow grid cell in Fig. 2 tries to predict the "person" object whose center (the blue dot) falls inside that cell. Each grid cell also predicts a fixed number of bounding boxes; in this example, the yellow grid cell makes two bounding box predictions (the blue boxes) to locate the object. There are several versions of YOLO; version 3 was chosen for its speed and accuracy [9], and it is also suitable for the hardware used for processing.

Fig. 2. Grid cells [4]
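To make the grid idea concrete, the following minimal Python sketch (an illustration only; the grid size S and the function name grid_cell_for_center are assumptions, not part of the paper or of YOLO's implementation) computes which grid cell is responsible for an object from its center coordinates:

def grid_cell_for_center(cx, cy, img_w, img_h, S=13):
    # Return the (row, col) of the S x S grid cell that contains the
    # object's center point (cx, cy), given in pixel coordinates.
    col = min(int(cx / img_w * S), S - 1)  # clamp to the last cell
    row = min(int(cy / img_h * S), S - 1)
    return row, col

# Example: a 416x416 frame with a car centered at (208, 104)
print(grid_cell_for_center(208, 104, 416, 416))  # -> (3, 6)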
YOLO is combined with several algorithms, such as depth estimation and computer vision, in a single pipeline to achieve the desired real-time collision warning system. This algorithm is intended for autonomous cars, which is why latency is a critical parameter that should be kept as low as possible. YOLO is therefore used with a reduced set of classes: the fewer classes it detects, the less processing is required and the higher the performance.

B) Depth Estimation

First, the depth estimation algorithm takes the bounding box dimensions (width and height) from YOLO and obtains the distance using multiple regression techniques, as mentioned in [5-6]. The depth estimation equation (Eq. 1) is modified using a trial-and-error algorithm that tries different values of a single parameter α between 0 and 1 (this parameter comes out of the regression algorithm and has no physical meaning) and compares the estimated distance at each value of α with the real distance from a real dataset. Table I shows the three most precise values found by this self-generating algorithm; the most accurate value was α = 0.00606, at 80.4%.

Dist = α [2021.256 − 1.276714*height − 0.6042361*width + 0.0004751*height*width]   (1)

Table I. Comparison between correction values of the depth estimation equation

#   α        Equation                                                                                Front Acc.
1   0.00740  Dist = 0.00740 [2021.256 − 1.276714*height − 0.6042361*width + 0.0004751*height*width]  78%
2   0.00606  Dist = 0.00606 [2021.256 − 1.276714*height − 0.6042361*width + 0.0004751*height*width]  80.4%
3   0.00540  Dist = 0.00540 [2021.256 − 1.276714*height − 0.6042361*width + 0.0004751*height*width]  80%
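As a worked illustration of Eq. (1) and the trial-and-error search for α, consider the Python sketch below; the function names and the sample (width, height, distance) triples are assumptions made for demonstration, not values from the paper's dataset:

def estimate_distance(width, height, alpha):
    # Eq. (1): distance from the bounding-box size in pixels.
    return alpha * (2021.256 - 1.276714 * height
                    - 0.6042361 * width
                    + 0.0004751 * height * width)

def best_alpha(samples, steps=10000):
    # Scan alpha over (0, 1] and keep the value with the lowest
    # mean absolute error against the measured distances.
    best, best_err = None, float("inf")
    for i in range(1, steps + 1):
        alpha = i / steps
        err = sum(abs(estimate_distance(w, h, alpha) - d)
                  for (w, h, d) in samples) / len(samples)
        if err < best_err:
            best, best_err = alpha, err
    return best

# samples = [(width_px, height_px, measured_distance_m), ...]
samples = [(120, 90, 11.0), (60, 45, 11.6), (240, 180, 10.0)]
print(best_alpha(samples))  # -> 0.006 for these made-up pairs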
C) Lane Assignation & Tracking Algorithm

After applying depth estimation, two algorithms were developed to complete the task: lane assignation and tracking [9]. For the lane assignation algorithm, geometric principles are used to assign cars into lanes depending on each vehicle's bounding box.

Fig. 3. Road regions

Fig. 3 shows the three regions under the horizon line. The horizon line depends on the camera position; in the presented work, it is placed at 55% of the image height. As shown in Fig. 3, the image is divided virtually by two significant lines into four quarters: the upper half is unused, as it represents the sky or far-away vehicles, while the lower half represents the road (in the presented work), which is divided into the three regions. Lane assignation is a critical task, as it determines which vehicles receive attention and which do not, so the performance of this task must be optimal [9].

The problem was which point of the bounding box should represent the entire box (the car). The lane assigning algorithm is divided into two layers (phases). Layer 1 determines in which quarter the vehicle is moving; in this layer, the center of the bounding box is used as the reference point, as shown in Fig. 4. The primary role of this layer is to determine whether the vehicle is above the horizon (and therefore of no interest right now) or below the horizon, in the area of interest, and in the latter case to determine initially whether it is on the right side or the left side [6].

For layer 2, two virtual lane lines are created, which represent the width of the vehicle; this layer is more critical than the first, so its results are expected to be precisely accurate, especially on busy roads. In this layer, two different reference points are used depending on the output of the first layer, as shown in Fig. 4. If the vehicle was detected in the right half, the bottom-left point of its bounding box is used as the reference point; this point is substituted into the right virtual lane equation to determine whether the vehicle is in the emergency lane, and if not, the car is automatically assigned to the right lane. If the car was detected in the left half, the bottom-right point of the bounding box is the reference point, used to determine whether the car is in the emergency lane; if not, the car is automatically assigned to the left lane. Fig. 4 shows how the input image is converted after object detection into a geometric problem that assigns vehicles to their actual lanes [11]; a minimal sketch of this two-layer test is given after Fig. 4.

Fig. 4. The assigning technique illustration
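The Python sketch below illustrates the two-layer test described above, modeling each virtual lane line as x = m·y + b in normalized image coordinates; the line coefficients, the 55% horizon constant, and all names are illustrative assumptions rather than the paper's calibration:

HORIZON_FRAC = 0.55  # horizon placed at 55% of the image height

def assign_region(box, img_w, img_h,
                  left_line=(-0.45, 0.75), right_line=(0.45, 0.25)):
    # box = (x1, y1, x2, y2) in pixels, with y growing downward.
    # Returns 'ignore', 'E', 'L', or 'R'.
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    # Layer 1: a center above the horizon is of no interest.
    if cy < HORIZON_FRAC * img_h:
        return "ignore"
    if cx >= img_w / 2:
        # Layer 2, right half: test the bottom-left corner against
        # the right virtual lane line.
        m, b = right_line
        x_line = (m * y2 / img_h + b) * img_w
        return "E" if x1 < x_line else "R"
    else:
        # Layer 2, left half: test the bottom-right corner against
        # the left virtual lane line.
        m, b = left_line
        x_line = (m * y2 / img_h + b) * img_w
        return "E" if x2 > x_line else "L"

print(assign_region((300, 400, 500, 560), 640, 576))  # -> 'E'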

Finally, the tracking algorithm is responsible for following the same vehicles through all the frames, so that the change in each vehicle's distance can be evaluated and its relative speed calculated. Tracking is a complex problem, especially on a road crowded with vehicles, since every vehicle in the current frame must be compared with every vehicle in the previous frame; for this comparison, a suitable threshold (the error circle shown in Fig. 5) was estimated. If the new position differs by at most 5% from the last position, it is considered the same vehicle. Tracking the same vehicle through the frames is thus achieved, but it is heavy work and increases the delay of the system.

Here lies the second benefit of the lane assigning algorithm: once the vehicles are divided into the three categories, the tracking algorithm is applied only to the vehicles in the emergency lane. The algorithm thus becomes simpler and faster, and only the relative speed of the cars in front of our vehicle needs to be monitored.

Fig. 5 shows the concept of the tracking system. After the first detection, the positions of the vehicles in the emergency lane are saved; then, at every frame, the new positions of the vehicles in the emergency lane are compared with the last saved positions. If the difference between a vehicle's positions is within the threshold (difference < 3%), it is considered the same vehicle and tracking is achieved; the threshold is represented by the blue circle around the box's center shown in Fig. 5. A short sketch of this matching step follows Fig. 5.

Fig. 5. Tracking algorithm methodology
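As a rough illustration of the matching step, the sketch below compares each new detection in the emergency lane against the saved positions, with the threshold expressed as a fraction of the image diagonal; the data layout and helper names are assumptions, and only the threshold idea comes from the paper:

import math

def match_tracks(saved, detections, img_w, img_h, frac=0.03):
    # saved: {track_id: (x, y)}; detections: list of (x, y) box centers.
    # A detection inside a saved track's error circle keeps that id;
    # anything unmatched starts a new track.
    radius = frac * math.hypot(img_w, img_h)  # error-circle radius
    updated, next_id = {}, max(saved, default=0) + 1
    for (x, y) in detections:
        best_id, best_d = None, radius
        for tid, (sx, sy) in saved.items():
            d = math.hypot(x - sx, y - sy)
            if d <= best_d and tid not in updated:
                best_id, best_d = tid, d
        if best_id is None:  # nothing close enough: open a new track
            best_id, next_id = next_id, next_id + 1
        updated[best_id] = (x, y)
    return updated

tracks = {1: (320, 400)}
tracks = match_tracks(tracks, [(324, 405), (100, 420)], 640, 480)
print(tracks)  # -> {1: (324, 405), 2: (100, 420)}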
D) Relative Speed & Speed Limit

After tracking is accomplished, the relative speed is calculated by subtracting the old distance from the new one and dividing the result by the time between the two frames. A complication here is that the time between frames is not deterministic, since it depends on the detection time taken by YOLO and on the processing time, which in turn depends on the computational power. Therefore, the time between frames has to be measured separately every time.

Fig. 6. Relative speed calculation flow chart

As shown in Fig. 6, before the car detection starts, a timer begins counting; once the distance is obtained, the timer stops, and the difference between the two timer values is used in the relative speed equation. The speed limit is the last quantity the proposed algorithm calculates, and the driver is warned immediately if any emergency case occurs.

To assess the criticality of the situation, many parameters should be taken into consideration. The most important one is the speed limit: the relative speed should first be kept equal to zero, and the situation of the leading vehicle is then assessed, so the speed limit is the speed that makes the relative speed equal to zero.
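A minimal sketch of this step is given below, mirroring the per-frame timing of Fig. 6; the helper names are hypothetical, and the fixed numbers in the worked example are made up for illustration:

import time

def relative_speed(d_old, d_new, dt):
    # (d_new - d_old) / dt: negative while we are closing on the car.
    return (d_new - d_old) / dt

def speed_limit(ego_speed, rel_speed):
    # The speed that makes the relative speed zero: when closing,
    # slow down by the closing rate; otherwise keep the current speed.
    return ego_speed + min(rel_speed, 0.0)

# Per-frame timing, as in the Fig. 6 flow: start the timer before
# detection, stop it once the distance is obtained.
t0 = time.perf_counter()
# d_new = detect_and_estimate(frame)  # hypothetical YOLO + Eq. (1) call
dt = time.perf_counter() - t0         # would replace 0.5 in a live loop

# Worked example: the lead car went from 20.0 m to 19.2 m in 0.5 s,
# i.e. it is closing at 1.6 m/s.
v = relative_speed(20.0, 19.2, 0.5)
print(v)                     # -> -1.6
print(speed_limit(25.0, v))  # -> 23.4, the speed that zeroes the closing rate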

III. SIMULATION RESULTS

The performance of the proposed algorithm depends on real-time detection, computer vision, and the computational power of the hardware used. In the presented work, the NVIDIA Jetson TX2 kit is used, a power-efficient embedded AI (Artificial Intelligence) computing device built around an NVIDIA Pascal-family GPU with 8 GB of memory and 59.7 GB/s of memory bandwidth, as a trial for reaching 23 frames per second.

To study the performance of the proposed algorithm, there was no access to a real car for implementation, so the algorithm was tested in-lab using the NVIDIA TX2, a 15" LCD display, and a recorded video taken from a mono camera on a real vehicle. The results and performance evaluation demonstrated the capability of the proposed algorithm by calculating the separating distance to other vehicles through the depth estimation algorithm with an accuracy of up to 80.4%, as shown in Fig. 7.

Fig. 7. Depth estimation results

Secondly, the lane assignment algorithm was performed with an accuracy of up to 93%, which is the accuracy of the object detection, since the detection accuracy was improved by fine-tuning the convolutional neural network (CNN) with more images from environmental conditions close to those in which the system will operate, as shown in Fig. 8. The algorithm assigns each vehicle to its lane: the letter (E) refers to the emergency lane, and the algorithm automatically colors any detection box labeled (E) red, while the right and left lanes take the letters (R) and (L) respectively, both colored green.

Fig. 8. Lane assign results

Finally, the relative speed of the front vehicle and the speed limit are evaluated with the same 80.4% accuracy as the depth estimation, so the main target is achieved in real time with noticeable accuracy, as shown in Fig. 9.

Fig. 9. The speed limit warning in a real-time environment

IV. CONCLUSION

In this paper, the main objective is to design a crash avoidance system that achieves four main goals: car detection, depth estimation, lane assignation, and car tracking, all using only a mono camera. The object detection system was built with a pre-trained YOLO network. A depth estimation algorithm is then applied, and the frame processed by YOLO is passed to a real-time computer vision technique that assigns cars into lanes. Finally, the tracking algorithm is applied to evaluate the relative speed and the speed limit needed to avoid crashes. Running YOLO on a local machine does not achieve acceptable real-time results, so the Jetson TX2 kit was used to reach real-time performance. The depth estimation algorithm taken from [5] had a critical missing parameter related to the camera used, so it was modified to reach the desired performance. For lane assignment, a computer vision technique applying geometric principles was used, which guarantees accuracy while presenting a light processing load, so the system achieves the real-time concept with an accuracy of up to 93% at 23 FPS.
V. REFERENCES
[1] “Road Traffic Deaths,” in World Health Organization, [online]
Available: http://www.who.int/gho/road_safety/mortality/en/
[2] T. Litman, “Autonomous Vehicle Implementation Predictions:
Implications for Transport Planning,” in Victoria Transport Policy
Institute, vol. 28, 2017.
[3] Y. M. Mustafah, R. Noor, H. Hasbi and A. W. Azma, “Stereo vision
images processing for real-time object distance and size
measurements,” in International Conference on Computer and
Communication Engineering (ICCCE), Kuala Lumpur, pp. 659-663,
2012.
[4] "YOLO: Real-Time Object Detection," [online] Available: https://pjreddie.com/darknet/yolo/, accessed September 2019.
[5] M. I. Elzayat, A. M. Saad, M. M. Mostafa, M. R. Hassan, M. S. Darweesh, H. Abdelmunim, and H. Mostafa, "Real-Time Car Detection-Based Depth Estimation Using Mono Camera," in IEEE International Conference on Microelectronics (ICM 2018), Sousse, Tunisia, 2018.
[6] R. L. Ott, M. Longnecker, and J. D. Draper, "An Introduction to Statistical Methods and Data Analysis," MA: Cengage Learning, 2016.
[7] J. Redmon, S. Divvala, R. Girshick and A. Farhadi, “You Only Look
Once: Unified, Real-Time Object Detection,” in IEEE Conference on
Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV,
pp. 779-788, 2016.
[8] L. Bertinetto, J. Valmadre, J. F. Henriques, A. Vedaldi, and P. H. S. Torr, "Fully-Convolutional Siamese Networks for Object Tracking," in European Conference on Computer Vision (ECCV), pp. 850-865, 2016.
[9] M. Danelljan, G. Bhat, F. S. Khan, and M. Felsberg, "ECO: Efficient Convolution Operators for Tracking," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[10] M. U. Akhlaq, U. Izhar and U. Shahbaz, “Depth estimation from a
single camera image using power fit,” in International Conference on
Robotics and Emerging Allied Technologies in Engineering
(iCREATE), Islamabad, pp. 221-227, 2014.
