Detection of Falling Objects in Tracks by Using Deep Learning
Abstract
If tools, bolts, or other objects are left on railroad tracks after maintenance or improvement work is completed, there is a serious risk of a dangerous accident. To eliminate accidents caused by such fallen objects, the tracks are checked for objects immediately after the work ends. To automate this check, we developed a technology that detects fallen objects on railroad tracks by processing camera images.
For the detection of fallen objects, we adopted deep learning, which is noted for its high performance on prediction problems and its attention to detail. Deep learning is used to detect the presence of fallen objects in the camera image. With this approach, the identification accuracy for fallen objects and debris is raised to 99.3%. If this technology is adopted for the dedicated inspection vehicle that checks for fallen objects and debris on the tracks after maintenance and/or improvement work, it will contribute considerably to the automation of fallen object detection for railroad facilities.
input image. In this approach, an image that shows only the ballast (gravel laid on roads and tracks), sleepers, and rails is regarded as a normal image. An image in which any fallen object appears is regarded as an abnormal image. In this approach, deep learning is used only for two-class recognition: normal image or abnormal image. Deep learning is a machine learning approach that employs a large-scale neural network with a deep structure. Recently, it has been recognized worldwide for its outstanding performance on identification and prediction problems.
This approach employs the network structure proposed by Alex et al. (AlexNet hereafter). Among the many deep learning network structures, AlexNet has a proven track record in image object recognition. Since AlexNet classifies images into 1000 classes, the output of its fully connected layer also has 1000 classes. In contrast, the output of our approach has only two classes: normal image or abnormal image. For this reason, fine-tuning is needed beforehand to adapt the fully connected layer of AlexNet to our approach. Fine-tuning is a process in which a network that has finished learning one problem is made to relearn the last part of its weights so that it adapts to another problem. The convolution layer of a deep learning network can be used as a feature extractor. In addition, it extracts features that are common to a variety of problems in the image recognition field. As such, fine-tuning is believed to be effective. Moreover, fine-tuning has the advantage of learning efficiently from a small amount of learning data. In this approach, the fully connected part, consisting of three layers, is newly trained on a 4096-dimensional vector input obtained from the AlexNet convolution layer. Fig. 2 shows the network configuration used for automatic detection. Using the network obtained from this learning, two-class recognition of normal and abnormal images is carried out.

3 Accuracy Verification
We verified the accuracy of this technology using normal and abnormal images photographed in the daytime on the railroad track. The data set contained 26,562 images in total, normal and abnormal combined. For each of the normal and abnormal image sets, 87.5% of the data was used as learning data and the remaining 12.5% as evaluation data. Fig. 3 shows the relationship between the number of learning iterations and identification accuracy. The identification accuracy converged at around 300 learning iterations. The obtained identification accuracy was 99.7% for the learning data and 99.3% for the evaluation data. Fig. 4 shows an example of successful fallen object detection. Input images are shown on the left, and the right side shows images that visualize the degree of influence on the recognition.
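To give a sense of scale for the figures reported in the accuracy verification, the following sketch works through the split arithmetic. It is an approximation under stated assumptions: the text applies the 87.5% / 12.5% split to the normal and abnormal sets separately and does not give the per-class counts, so the split is applied here to the combined total of 26,562 images, and the actual counts may differ slightly. The function name `split_counts` is introduced only for illustration.

```python
# Rough arithmetic for the data split and the reported accuracies.
# Assumption: the split is applied to the combined total, because the
# per-class image counts are not stated in the text.

TOTAL_IMAGES = 26_562
TRAIN_FRACTION = 0.875   # share used as learning data
TRAIN_ACCURACY = 0.997   # reported accuracy on learning data
EVAL_ACCURACY = 0.993    # reported accuracy on evaluation data


def split_counts(total, train_fraction):
    """Return (n_train, n_eval), rounding the training share to a whole image."""
    n_train = round(total * train_fraction)
    return n_train, total - n_train


n_train, n_eval = split_counts(TOTAL_IMAGES, TRAIN_FRACTION)

# Approximate number of misidentified images implied by each accuracy.
train_errors = round(n_train * (1 - TRAIN_ACCURACY))
eval_errors = round(n_eval * (1 - EVAL_ACCURACY))

print(f"learning images:   {n_train} (about {train_errors} misidentified)")
print(f"evaluation images: {n_eval} (about {eval_errors} misidentified)")
```

Under these assumptions, the evaluation set holds roughly 3,300 images, so an accuracy of 99.3% corresponds to a few dozen misidentified evaluation images.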
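The fine-tuning setup described earlier (a frozen convolutional feature extractor followed by a newly trained three-layer fully connected part with two outputs) can be sketched in plain Python as a forward pass. This is a minimal illustration, not the authors' implementation: the hidden widths (64 and 16), the ReLU and softmax functions, the random initialization, and all function names are assumptions, and the random feature vector merely stands in for the 4096-dimensional output that the text attributes to the AlexNet convolution layer. Training of the weights is omitted.

```python
import math
import random

CLASS_NAMES = ("normal", "abnormal")


def make_layer(n_in, n_out, rng):
    """Create one fully connected layer as (weights, biases) with small random weights."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(x, layer):
    """Apply y = W x + b for one layer."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(x):
    """Numerically stable softmax: shift by the maximum before exponentiating."""
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def build_head(n_features=4096, hidden=(64, 16), n_classes=2, seed=0):
    """Build the three-layer head; the hidden widths are assumed, not from the text."""
    rng = random.Random(seed)
    sizes = [n_features, *hidden, n_classes]
    return [make_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def classify(features, head):
    """Run the feature vector through the head and return class probabilities."""
    x = features
    for layer in head[:-1]:
        x = relu(linear(x, layer))
    return softmax(linear(x, head[-1]))


head = build_head()
feature_rng = random.Random(1)
# Stand-in for the feature vector produced by the frozen convolution layers.
features = [feature_rng.random() for _ in range(4096)]
probs = classify(features, head)
label = CLASS_NAMES[probs.index(max(probs))]
print(label, probs)
```

Because the weights here are random, the printed label is meaningless. The point is the shape of the computation: only the small head would be trained, while the feature extractor stays fixed, which is why a limited amount of learning data can suffice.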