A Deep Learning Based Assistant For The Visually Impaired
ABSTRACT
For visually impaired people, basic tasks such as recognizing objects and people in their surroundings are very challenging.
This paper presents work on object detection for visually impaired people. It focuses on detecting sharp and
dangerous objects such as a fork, knife, gas stove, stairs, and microwave using a pretrained model. Google’s Open Images Dataset V6
is used for training the R-CNN.
Keywords: Gas stove, Knife, Stair detection, Deep learning, Dataset, R-CNN.
1. INTRODUCTION
According to the World Health Organization (WHO), in 2012 about 285 million people worldwide were visually
impaired. Roughly 36 million of them are blind, and 217 million have other vision
impairments [1].
Detecting dangerous and sharp objects is an important research problem. New technology arrives
every day and makes our lives more comfortable, but life for visually impaired people remains difficult, and
they need more help in their daily activities. New technologies and models can be developed to assist
them. Some models have already been developed to make them more independent: for example, to address
the travel problem of visually impaired people, staircase detection has been done using a pretrained
model and sensors [1]. Smart Cap was developed to let visually impaired people interact with others using a set of
commands. It includes features such as face recognition, text recognition, and image captioning [12].
In our day-to-day lives we come across many dangerous and sharp objects, even at home. Such objects
are found mainly in the kitchen. In this paper, dangerous objects such as a gas stove, a sharp
knife, and a staircase are used to design an object detection model that will help visually impaired people.
Dangerous and sharp object detection is done using a pretrained model, which is trained on specific object
classes: gas stove, knife, and staircase. Images of these three classes are taken from Google’s
Open Images Dataset V6 and cover different types of gas stoves and staircases. We detect the ON
and OFF condition of a gas stove, upward and downward stairs, and knives with a high confidence score. The images are used to
train a regions with convolutional neural networks (R-CNN) detector, a well-established object detection model.
2. LITERATURE REVIEW
Several methods have been proposed in recent years for application areas such as object detection, face recognition, text
recognition, and human activity recognition [11]. The literature review is summarized in Table 1.
Table 1: Literature review.
3. PROPOSED WORK
The block diagram of the proposed system is shown in Figure 1.
An image dataset is given as input to the system. Images are labeled using the Image Labeler
app in MATLAB. A pretrained network is used for training the R-CNN. The inputs required for training the R-CNN
object detector are:
Training data: The labeled image dataset is the input to the detector. The dataset is in table format and contains grayscale or
true color images. The table contains two or more columns: the first column must be the image
filename, and each remaining column holds the labeled bounding boxes of a single object class.
Network: Specifies which network is used to train the detector. Some valid networks are
'alexnet', 'vgg16', 'vgg19', 'resnet18', 'resnet50', 'resnet101', 'inceptionv3', 'googlenet', 'inceptionresnetv2', 'squeezenet', and
'mobilenetv2'. The AlexNet network is used in our system.
Options: The training parameters of the neural network are defined in the options.
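As a rough illustration of the training-table layout described above, the following Python sketch checks that each row starts with an image filename and that every class column holds boxes of the form [x, y, width, height]. The paper's own pipeline runs in MATLAB; the column names, file names, and box values here are hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# Hypothetical column names for the three classes used in the paper.
CLASSES = ["gas_stove", "knife", "stair"]


def validate_row(row: Dict[str, object], classes: List[str]) -> List[str]:
    """Return a list of problems found in one training-table row."""
    problems = []
    filename = row.get("imageFilename")
    if not isinstance(filename, str) or not filename:
        problems.append("first column must be a non-empty image filename")
    for name in classes:
        for box in row.get(name, []):
            if len(box) != 4:
                problems.append(f"{name}: box {list(box)} needs 4 values")
            elif box[2] <= 0 or box[3] <= 0:
                problems.append(f"{name}: box {list(box)} has non-positive size")
    return problems


# Illustrative rows only; these are not taken from the paper's dataset.
table = [
    {"imageFilename": "img_001.jpg", "knife": [[40, 60, 120, 30]], "gas_stove": [], "stair": []},
    {"imageFilename": "img_002.jpg", "knife": [], "gas_stove": [[10, 20, 200, 150]], "stair": []},
    {"imageFilename": "", "knife": [[5, 5, 0, 10]], "gas_stove": [], "stair": []},
]

for row in table:
    print(row["imageFilename"] or "<missing>", validate_row(row, CLASSES))
```

The third row is deliberately malformed (missing filename, zero-width box) to show what the check reports.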
Volume 10, Issue 5, May 2021 Page 103
International Journal of Application or Innovation in Engineering & Management (IJAIEM)
Web Site: www.ijaiem.org Email: editor@ijaiem.org
Volume 10, Issue 5, May 2021 ISSN 2319 - 4847
After training the detector, testing is done. For testing, one image is given to the network to verify whether the
system detects the correct object in that image. The output stage gives the final detected object.
3.1 Dataset
The dataset includes a number of images for each of a number of classes. In this project the Common Objects in Context (COCO 2017)
dataset is used, since most of the dangerous objects considered are found in it. Some object classes are
downloaded from Google’s Open Images Dataset V6, which comes with labels and annotations for each image. The
list of dangerous and sharp objects is shown in Table 2.
Table 2: Dangerous object list.
Dangerous Object List
Fork                            Bicycles
Knife                           Car
Microwave oven                  Bus
Edges and corners of tables     Train
Edges and corners of chairs     Traffic
Door                            Wild animals
Scissors                        Trees
Broken glass                    Staircases
Fungal food                     Drainage
Footpaths                       Gas cylinder
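Selecting only the classes of interest from an annotated dataset can be sketched as follows. This is a minimal Python illustration, not the authors' code: it assumes each annotation carries a class label and a box given as normalized corner coordinates, and it converts the kept boxes to pixel [x, y, width, height]. The field names, label spellings, and sample values are hypothetical.

```python
# Assumed label spellings for the three classes used in the paper.
TARGET_CLASSES = {"Knife", "Gas stove", "Stairs"}


def to_pixel_box(ann, width, height):
    """Convert normalized corner coordinates to [x, y, w, h] in pixels."""
    x = ann["xmin"] * width
    y = ann["ymin"] * height
    w = (ann["xmax"] - ann["xmin"]) * width
    h = (ann["ymax"] - ann["ymin"]) * height
    return [round(x), round(y), round(w), round(h)]


def select_boxes(annotations, width, height):
    """Keep only target-class annotations, grouped by label."""
    selected = {}
    for ann in annotations:
        if ann["label"] in TARGET_CLASSES:
            box = to_pixel_box(ann, width, height)
            selected.setdefault(ann["label"], []).append(box)
    return selected


# Illustrative annotations for one 640x480 image (hypothetical values).
annotations = [
    {"label": "Knife", "xmin": 0.25, "xmax": 0.75, "ymin": 0.5, "ymax": 0.75},
    {"label": "Car", "xmin": 0.0, "xmax": 0.5, "ymin": 0.0, "ymax": 0.5},
]

boxes = select_boxes(annotations, width=640, height=480)
print(boxes)
```

Annotations of other classes, such as the car in this example, are simply dropped.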
Figure 3(a). ON
The R-CNN detector generates region proposals using the Edge Boxes algorithm. These region
proposals are cropped from the image and resized, and the CNN then classifies them. A support vector machine (SVM) is used to refine the bounding boxes of the region proposals.
The function 'trainRCNNObjectDetector' is used to train the detector. The output of the detector is the object detected in an image.
Figure 4 shows the R-CNN detector.
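The crop-and-resize step applied to each region proposal can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the image is a small nested list of pixel values, the proposal box is hypothetical, and nearest-neighbor resizing is used for simplicity (the paper does not state the interpolation method). The 227x227 target matches the AlexNet input size used in this work.

```python
def crop(image, box):
    """Crop a region given as (x, y, width, height); image is a list of rows."""
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]


def resize_nearest(image, out_h, out_w):
    """Resize a list-of-rows image with nearest-neighbor sampling."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


# Toy 4x6 image whose pixel value encodes its position (row * 10 + column).
image = [[r * 10 + c for c in range(6)] for r in range(4)]

proposal = (1, 1, 2, 2)                      # hypothetical region proposal
patch = crop(image, proposal)                # 2x2 region
upscaled = resize_nearest(patch, 4, 4)       # small size, easy to inspect
net_input = resize_nearest(patch, 227, 227)  # size expected by AlexNet

print(patch)
print(upscaled)
print(len(net_input), len(net_input[0]))
```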
AlexNet is used as the network in our system. Using AlexNet, we carried out a comparative analysis of the training
parameters, summarized in Table 3. Based on this analysis, we use the parameter set of Sr. No. 1
in our system, as it has the shortest training time and the best accuracy of the three.
Table 3: R-CNN network parameters.
Sr. No   Parameters               Time Elapsed   Mini-batch Accuracy   Mini-batch Loss
1        Epochs: 10               00:31:41       96.88%                0.2390
         Mini-batch size: 32
         Learn rate: 1e-4
2        Epochs: 10               00:33:20       95.31%                0.1139
         Mini-batch size: 64
         Learn rate: 1e-6
3        Epochs: 10               00:33:00       94.53%                0.1738
         Mini-batch size: 128
         Learn rate: 0.01
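The comparison in Table 3 can be reproduced with a short Python sketch. It assumes the elapsed times are in hours, minutes, and seconds and that the learn rates of the first two runs are 1e-4 and 1e-6; the selection rule (highest mini-batch accuracy, ties broken by shorter time) is an illustrative choice and is not stated in the paper.

```python
def to_seconds(hms):
    """Convert an 'hh:mm:ss' string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Values transcribed from Table 3 (10 epochs in every run).
RUNS = [
    {"sr_no": 1, "batch": 32, "learn_rate": 1e-4, "time": "00:31:41", "accuracy": 96.88, "loss": 0.2390},
    {"sr_no": 2, "batch": 64, "learn_rate": 1e-6, "time": "00:33:20", "accuracy": 95.31, "loss": 0.1139},
    {"sr_no": 3, "batch": 128, "learn_rate": 0.01, "time": "00:33:00", "accuracy": 94.53, "loss": 0.1738},
]


def best_run(runs):
    """Pick the run with the highest accuracy, then the shortest time."""
    return max(runs, key=lambda r: (r["accuracy"], -to_seconds(r["time"])))


chosen = best_run(RUNS)
lowest_loss = min(RUNS, key=lambda r: r["loss"])
print("chosen:", chosen["sr_no"], "time (s):", to_seconds(chosen["time"]))
print("lowest mini-batch loss:", lowest_loss["sr_no"])
```

Under this rule the first run is selected. The second run has the lowest mini-batch loss, so a different criterion could lead to a different choice.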
The training progress for the knife object class is shown in Figure 5, which plots the training loss against the
number of iterations.
Input Data: Load the image data. The image datastore automatically labels the images and stores them in an image datastore
object. Split the dataset into training and validation sets. In our system, 60% of the images are used for training and
40% for validation. Store the split images in two new datastores. Figure 8 shows some sample images
from the dataset.
Load Pretrained Network: Load the pretrained AlexNet network. The Deep Learning Toolbox Model for
AlexNet Network must be installed. The first layer of this network is the data (image input) layer, which requires input images of size 227x227x3.
The final layers are replaced to fine-tune the network for the new classification task: they are replaced with a fully
connected layer, a softmax layer, and a classification output layer. The output size of the fully connected layer equals the
number of object classes. This model uses three object classes, so the output size is set to three.
Train Network: The network is trained with the given training options. The training options used in this system are shown in
Table 5. An augmented image datastore is used, which resizes images during training. The training progress of the
network is shown in Figure 9.
Classify the validation images: “Fine-tuned networks are used to classify the validation images” [17]. The classified
images are shown in Figure 10.
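Two of the steps above lend themselves to a small Python sketch: the 60/40 split into training and validation sets, and the three-output softmax that turns the final layer's scores into class probabilities. This is a sketch under assumptions rather than the authors' MATLAB code: the split is done separately per class, and the file names, class names, and scores are hypothetical.

```python
import math
import random
from collections import defaultdict


def split_each_label(items, train_fraction=0.6, seed=0):
    """Split (filename, label) pairs per label into train and validation lists."""
    by_label = defaultdict(list)
    for filename, label in items:
        by_label[label].append(filename)
    rng = random.Random(seed)
    train, validation = [], []
    for label, files in sorted(by_label.items()):
        files = files[:]          # copy so the caller's data is not reordered
        rng.shuffle(files)
        cut = round(len(files) * train_fraction)
        train += [(f, label) for f in files[:cut]]
        validation += [(f, label) for f in files[cut:]]
    return train, validation


def softmax(scores):
    """Convert raw class scores into probabilities that sum to one."""
    peak = max(scores)            # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


CLASSES = ["gas_stove", "knife", "stair"]   # three classes, so three outputs
items = [(f"{name}_{i:02d}.jpg", name) for name in CLASSES for i in range(10)]

train, validation = split_each_label(items)
print(len(train), len(validation))

scores = [2.0, 0.5, 0.1]                    # hypothetical fully connected outputs
probs = softmax(scores)
predicted = CLASSES[probs.index(max(probs))]
print(predicted, [round(p, 3) for p in probs])
```

With ten hypothetical images per class, the split yields 18 training and 12 validation images.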
4. RESULTS
Objects of the knife, gas stove, and stair classes detected with a high confidence score are shown in Table 6. For the gas stove, the ‘ON’
and ‘OFF’ conditions are detected. For the staircase, the ‘Upstair’ and ‘Downstair’ conditions are detected. Classification of these
three object classes is done using the pretrained AlexNet network. We achieved an accuracy of 91.8% for
the classification of the images. The confusion matrix is shown in Figure 11.
Table 6: Results. A) Test images B) Detected images
Figure 11: Confusion matrix for the gas stove, knife, and stair object classes.
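The relation between a confusion matrix and the overall accuracy can be illustrated with a short Python sketch. The counts below are hypothetical and are not the values behind Figure 11; they only show that accuracy is the sum of the diagonal divided by the total number of validation images.

```python
CLASSES = ["gas_stove", "knife", "stair"]

# Hypothetical counts: rows are true classes, columns are predicted classes.
CONFUSION = [
    [18, 1, 1],
    [0, 19, 1],
    [2, 0, 18],
]


def accuracy(matrix):
    """Overall accuracy: correctly classified images over all images."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def per_class_recall(matrix):
    """Fraction of each true class that was classified correctly."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


overall = accuracy(CONFUSION)
recalls = per_class_recall(CONFUSION)
print(f"accuracy: {overall:.3f}")
for name, value in zip(CLASSES, recalls):
    print(f"{name}: {value:.2f}")
```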
5. CONCLUSION
In this paper, dangerous object detection is done using a pretrained R-CNN model. The gas stove is detected in both ON
and OFF conditions, and the staircase in both Upstair and Downstair conditions. The knife, a sharp object, is also
detected with a high confidence score. An alert system can be developed on top of this detection. For better run-time performance,
a Faster R-CNN network can be used.
References
[1] Md. Ahsan Habib, Md. Milon Islam, Mahmudul Hasan, “Staircase Detection to Guide Visually
Impaired People: A Hybrid Approach”, International Journal of Computer Science and Information Security
(IJCSIS), Vol. 16, No. 12, December 2018.
[2] S. Prabakaran, Samanvya Tripathi, and Utkarsh Nagpal, “Navigational Aid for the Blind Using Deep Learning on
Edge Device”, International Journal of Advanced Science and Technology, Vol. 29, No. 3, 2020, pp. 11421-
11433.
[3] Junlong Zhou, Jianming Yan, Tongquan Wei, Kaijie Wu, Xiaodao Chen, and Shiyan Hu, “Sharp Corner/Edge
Recognition in Domestic Environments Using RGB-D Camera Systems”, IEEE Transactions on Circuits
and Systems II, Vol. 62, No. 10, October 2015.
[4] Yulong Wang, Hang Su, Bo Zhang, and Xiaolin Hu, “Learning Reliable Visual Saliency
For Model Explanations”, IEEE Transactions on Multimedia, Vol. 22, No. 7, July 2020.
[5] Salma Kammoun Jarraya, Wafa Saad Al-Shehri, and Manar Salamah Ali, “Deep Multi-Layer Perceptron-Based
Obstacle Classification Method From Partial Visual Information: Application to the Assistance of Visually
Impaired People”, IEEE Access, Vol. 8, 2020.
[6] Chanhum Park, Se Woon Cho, Na Rae Baek, Jiho Choi, and Kang Ryoung Park, “Deep Feature-
Based Three-Stage Detection of Banknotes and Coins for Assisting Visually Impaired People”, IEEE Access,
Vol. 8, 2020.
[7] Wan-Jung Chang, Liang-Bi Chen, Chia-Hao Hsu, Jheng-Hao Chen, Tzu-Chin
Yang, and Cheng-Pei Lin, “MedGlasses: A Wearable Smart-Glasses-Based Drug Pill Recognition System Using
Deep Learning for Visually Impaired Chronic Patients”, IEEE Access, Vol. 8, 2020.
[8] Abdullah Asim Yilmaz, Mehmet Serdar Guzel, Erkan Bostanci, and Iman Askerzade, “A Novel Action Recognition
Framework Based on Deep-Learning and Genetic Algorithms”, IEEE Access, Vol. 8, 2020.
[9] Hardik Gupta, Dhruv Dahiya, Malay Kishore Dutta, Carlos M. Travieso, and Jose Luis Vásquez-Nuñez,
“Real Time Surrounding Identification for Visually Impaired using Deep Learning Technique”, IEEE
International Work Conference on Bioinspired Intelligence, July 3-5, 2019.
[10] Ashwani Kumar, S S Sai Satyanarayana Reddy, Vivek Kulkarni, “An Object Detection Technique For Blind
People in Real-Time Using Deep Neural Network”, 2019 Fifth International Conference on Image Information
Processing (ICIIP).
[11] Jinesh A. Shah, Aashreen Raorane, Akash Ramani, Hitanshu Rami, Narendra Shekokar, “EYERIS: A Virtual Eye
to Aid the Visually Impaired”, 3rd International Conference on Communication System, Computing and IT
Applications (CSCITA), 2020.
[12] Amey Hengle, Atharva Kulkarni, Nachiket Bavadekar, Niraj Kulkarni, Rutuja Udyawar, “Smart Cap: A Deep
Learning and IoT Based Assistant for the Visually Impaired”, Proceedings of the Third International Conference
on Smart Systems and Inventive Technology (ICSSIT 2020).
[13] Ashwani Kumar, S S Sai Satyanarayana Reddy, Vivek Kulkarni, “An Object Detection Technique For Blind
People in Real-Time Using Deep Neural Network”, 2019 Fifth International Conference on Image Information
Processing (ICIIP).
[14] Mouna Afif, Riadh Ayachi, Yahia Said, Edwige Pissaloux, Mohamed Atri, “Recognizing signs and
doors for Indoor Wayfinding for Blind and Visually Impaired Persons”, 5th International Conference on Advanced
Technologies For Signal and Image Processing (ATSIP 2020), September 02-05, 2020, Sfax, Tunisia.
[15] Saleh Shadi, Saleh Hadi, Mohammad Amin Nazari, Wolfram Hardt, “Outdoor Navigation for Visually Impaired
based on Deep Learning”.
[16] https://in.mathworks.com/help/vision/ug/getting-started-with-r-cnn-fast-r-cnn-and-faster-r-cnn.html#mw_5ad75928-8822-4277-a1f6-6a762a5bda32.
[17] https://in.mathworks.com/help/deeplearning/ref/alexnet.html.