Scene Description for Visually Impaired People with Multi-Label Convolutional SVM Networks
Abstract
:1. Introduction
2. Proposed Methodology
2.1. A. Convolution Layer
Algorithm 1 Convolution layer |
Input: Training images ; SVM filters: filter parameters (width , and stride);
size of sampled training set for generating a single feature map. Output: Feature maps: 1: ; 2: 3: ; 4: |
2.2. B. Fusion Layer
Algorithm 2 Pooling and fusion |
Input: Feature maps: = 1,…, K produced by the ith CSVM branch Output: Fusion result: 1: Apply an average pooling to each to generate a set of activation maps of reduced spatial size. 2: Fuse the resulting activation using max rule to generate the feature map . These maps will be used as a common input to the next convolution layers in each CSVM branch. |
2.3. C. Feature Generation and Classification
3. Experimental Results
3.1. Dataset Description
3.2. Results
4. Discussion
5. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
- Blindness and Vision Impairment. Available online: https://www.who.int/news-room/fact-sheets/detail/blindness-and-visual-impairment (accessed on 12 September 2019).
- Ulrich, I.; Borenstein, J. The GuideCane-applying mobile robot technologies to assist the visually impaired. IEEE Trans. Syst. Man Cybern. Part A Syst. Hum. 2001, 31, 131–136. [Google Scholar] [CrossRef]
- Shoval, S.; Borenstein, J.; Koren, Y. The NavBelt-a computerized travel aid for the blind based on mobile robotics technology. IEEE Trans. Biomed. Eng. 1998, 45, 1376–1386. [Google Scholar] [CrossRef]
- Bahadir, S.K.; Koncar, V.; Kalaoglu, F. Wearable obstacle detection system fully integrated to textile structures for visually impaired people. Sens. Actuators A Phys. 2012, 179, 297–311. [Google Scholar] [CrossRef]
- Shin, B.-S.; Lim, C.-S. Obstacle Detection and Avoidance System for Visually Impaired People. In Proceedings of the Haptic and Audio Interaction Design, Seoul, Korea, 29–30 November 2007; Oakley, I., Brewster, S., Eds.; Springer: Berlin/Heidelberg, Germany, 2007; pp. 78–85. [Google Scholar]
- Bousbia-Salah, M.; Bettayeb, M.; Larbi, A. A Navigation Aid for Blind People. J. Intell. Robot. Syst. 2011, 64, 387–400. [Google Scholar] [CrossRef]
- Brilhault, A.; Kammoun, S.; Gutierrez, O.; Truillet, P.; Jouffrais, C. Fusion of Artificial Vision and GPS to Improve Blind Pedestrian Positioning. In Proceedings of the 2011 4th IFIP International Conference on New Technologies, Mobility and Security, Paris, France, 7–10 February 2011; pp. 1–5. [Google Scholar]
- Hasanuzzaman, F.M.; Yang, X.; Tian, Y. Robust and Effective Component-Based Banknote Recognition for the Blind. IEEE Trans. Syst. Man Cybern. Part C (Appl. Rev.) 2012, 42, 1021–1030. [Google Scholar] [CrossRef]
- López-de-Ipiña, D.; Lorido, T.; López, U. BlindShopping: Enabling Accessible Shopping for Visually Impaired People through Mobile Technologies. In Proceedings of the Toward Useful Services for Elderly and People with Disabilities, Montreal, QC, Canada, 20–22 June 2011; Abdulrazak, B., Giroux, S., Bouchard, B., Pigot, H., Mokhtari, M., Eds.; Springer: Berlin/Heidelberg, Germany, 2011; pp. 266–270. [Google Scholar]
- Tekin, E.; Coughlan, J.M. An Algorithm Enabling Blind Users to Find and Read Barcodes. Proc. IEEE Workshop Appl. Comput. Vis. 2009, 2009, 1–8. [Google Scholar]
- Pan, H.; Yi, C.; Tian, Y. A primary travelling assistant system of bus detection and recognition for visually impaired people. In Proceedings of the 2013 IEEE International Conference on Multimedia and Expo Workshops (ICMEW), San Jose, CA, USA, 15–19 July 2013; pp. 1–6. [Google Scholar]
- Jia, T.; Tang, J.; Lik, W.; Lui, D.; Li, W.H. Plane-based detection of staircases using inverse depth. In Proceedings of the Australasian Conference on Robotics and Automation (ACRA), Wellington, New Zealand, 3–5 December 2012. [Google Scholar]
- Chen, X.R.; Yuille, A.L. Detecting and reading text in natural scenes. In Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Washington, DC, USA, 27 June–2 July 2004; Volume 2, p. II. [Google Scholar]
- Yang, X.; Tian, Y. Robust door detection in unfamiliar environments by combining edge and corner features. In Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition-Workshops, San Francisco, CA, USA, 13–18 June 2010; pp. 57–64. [Google Scholar]
- Wang, S.; Tian, Y. Camera-Based Signage Detection and Recognition for Blind Persons. In Proceedings of the Computers Helping People with Special Needs, Linz, Austria, 11–13 July 2012; Miesenberger, K., Karshmer, A., Penaz, P., Zagler, W., Eds.; Springer: Berlin/Heidelberg, Germany, 2012; pp. 17–24. [Google Scholar]
- Fernandes, H.; Costa, P.; Paredes, H.; Filipe, V.; Barroso, J. Integrating Computer Vision Object Recognition with Location Based Services for the Blind. In Proceedings of the Universal Access in Human-Computer Interaction. Aging and Assistive Environments, Heraklion, Greece, 22–27 June 2014; Stephanidis, C., Antona, M., Eds.; Springer International Publishing: Cham, Switzerland, 2014; pp. 493–500. [Google Scholar]
- Tian, Y.; Yang, X.; Yi, C.; Arditi, A. Toward a computer vision-based wayfinding aid for blind persons to access unfamiliar indoor environments. Mach. Vis. Appl. 2013, 24, 521–535. [Google Scholar] [CrossRef]
- Malek, S.; Melgani, F.; Mekhalfi, M.L.; Bazi, Y. Real-Time Indoor Scene Description for the Visually Impaired Using Autoencoder Fusion Strategies with Visible Cameras. Sensors 2017, 17, 2641. [Google Scholar] [CrossRef]
- Mekhalfi, M.L.; Melgani, F.; Bazi, Y.; Alajlan, N. Fast indoor scene description for blind people with multiresolution random projections. J. Vis. Commun. Image Represent. 2017, 44, 95–105. [Google Scholar] [CrossRef]
- Moranduzzo, T.; Mekhalfi, M.L.; Melgani, F. LBP-based multiclass classification method for UAV imagery. In Proceedings of the 2015 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Milan, Italy, 26–31 July 2015; pp. 2362–2365. [Google Scholar]
- Mariolis, I.; Peleka, G.; Kargakos, A.; Malassiotis, S. Pose and category recognition of highly deformable objects using deep learning. In Proceedings of the 2015 International Conference on Advanced Robotics (ICAR), Istanbul, Turkey, 27–31 July 2015; pp. 655–662. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. In Proceedings of the Computer Vision–ECCV 2014, Zurich, Switzerland, 6–12 September 2014; Springer: Cham, Switzerland, 2014; pp. 346–361. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1–9. [Google Scholar]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. In Proceedings of the 25th International Conference on Neural Information Processing Systems-Volume 1, Lake Tahoe, Nevada, 3–6 December 2012; Curran Associates Inc.: Brooklyn, NY, USA, 2012; pp. 1097–1105. [Google Scholar]
- Ren, S.; He, K.; Girshick, R.; Zhang, X.; Sun, J. Object Detection Networks on Convolutional Feature Maps. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1476–1481. [Google Scholar] [CrossRef] [PubMed]
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Region-Based Convolutional Networks for Accurate Object Detection and Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 38, 142–158. [Google Scholar] [CrossRef] [PubMed]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef] [PubMed]
- Ouyang, W.; Zeng, X.; Wang, X.; Qiu, S.; Luo, P.; Tian, Y.; Li, H.; Yang, S.; Wang, Z.; Li, H.; et al. DeepID-Net: Object Detection with Deformable Part Based Convolutional Neural Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1320–1334. [Google Scholar] [CrossRef] [PubMed]
- Farabet, C.; Couprie, C.; Najman, L.; LeCun, Y. Learning Hierarchical Features for Scene Labeling. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 1915–1929. [Google Scholar] [CrossRef] [PubMed]
- Sun, W.; Wang, R. Fully Convolutional Networks for Semantic Segmentation of Very High Resolution Remotely Sensed Images Combined With DSM. IEEE Geosci. Remote Sens. Lett. 2018, 15, 474–478. [Google Scholar] [CrossRef]
- Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A Simple Way to Prevent Neural Networks from Overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958. [Google Scholar]
- Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
- Chatfield, K.; Simonyan, K.; Vedaldi, A.; Zisserman, A. Return of the Devil in the Details: Delving Deep into Convolutional Nets. arXiv 2014, arXiv:1405.3531. [Google Scholar]
- Azizpour, H.; Razavian, A.S.; Sullivan, J.; Maki, A.; Carlsson, S. Factors of Transferability for a Generic ConvNet Representation. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 38, 1790–1802. [Google Scholar] [CrossRef]
- Nogueira, R.F.; de Alencar Lotufo, R.; Machado, R.C. Fingerprint Liveness Detection Using Convolutional Neural Networks. IEEE Trans. Inf. Forensics Secur. 2016, 11, 1206–1213. [Google Scholar] [CrossRef]
- Gao, C.; Li, P.; Zhang, Y.; Liu, J.; Wang, L. People counting based on head detection combining Adaboost and CNN in crowded surveillance environment. Neurocomputing 2016, 208, 108–116. [Google Scholar] [CrossRef]
- Bazi, Y.; Melgani, F. Convolutional SVM Networks for Object Detection in UAV Imagery. IEEE Trans. Geosci. Remote Sens. 2018, 56, 3107–3118. [Google Scholar] [CrossRef]
- Fan, R.-E.; Chang, K.-W.; Hsieh, C.-J.; Wang, X.-R.; Lin, C.-J. Liblinear: A Library for Large Linear Classification. J. Mach. Learn. Res. 2008, 9, 1871–1874. [Google Scholar]
- Chang, K.; Lin, C. Coordinate Descent Method for Large-scale L2-loss Linear Support Vector Machines. J. Mach. Learn. Res. 2008, 9, 1369–1398. [Google Scholar]
Dataset | Layer 1 | Layer 2 | ||
---|---|---|---|---|
Number of SVMs | Spatial Size | Number of SVMs | Spatial Size | |
Dataset 1 | 512 | 7 | / | / |
Dataset 2 | 32 | 7 | 256 | 3 |
Dataset 3 | 1 | 3 | 2 | 3 |
(a) | |||
Method | SEN (%) | SPE (%) | AVG (%) |
ResNet | 71.16 | 93.84 | 82.50 |
GoogLeNet | 78.65 | 94.34 | 86.49 |
VGG16 | 74.53 | 94.58 | 84.55 |
Ours | 89.14 | 84.26 | 86.70 |
(b) | |||
Methods | SEN (%) | SPE (%) | AVG (%) |
ResNet | 89.54 | 96.38 | 92.96 |
GoogLeNet | 83.63 | 96.86 | 90.25 |
VGG16 | 81.81 | 96.14 | 88.98 |
Ours | 93.64 | 92.17 | 92.90 |
(c) | |||
Methods | SEN (%) | SPE (%) | AVG (%) |
ResNet | 64.17 | 92.40 | 78.29 |
GoogLeNet | 62.50 | 93.27 | 77.88 |
VGG16 | 64.32 | 93.98 | 79.15 |
Ours | 80.79 | 82.27 | 81.53 |
Dataset | Runtime (s) |
---|---|
Dataset 1 | 76 |
Dataset 2 | 42 |
Dataset 3 | 8 |
(a) | |
Method | Runtime (s) |
ResNet | 0.207 |
GoogLeNet | 0.141 |
VGG16 | 0.291 |
Ours | 0.206 |
(b) | |
Method | Runtime (s) |
ResNet | 0.208 |
GoogLeNet | 0.144 |
VGG16 | 0.291 |
Ours | 0.115 |
(c) | |
Method | Runtime (s) |
ResNet | 0.207 |
GoogLeNet | 0.145 |
VGG16 | 0.300 |
Ours | 0.002 |
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Bazi, Y.; Alhichri, H.; Alajlan, N.; Melgani, F. Scene Description for Visually Impaired People with Multi-Label Convolutional SVM Networks. Appl. Sci. 2019, 9, 5062. https://doi.org/10.3390/app9235062
Bazi Y, Alhichri H, Alajlan N, Melgani F. Scene Description for Visually Impaired People with Multi-Label Convolutional SVM Networks. Applied Sciences. 2019; 9(23):5062. https://doi.org/10.3390/app9235062
Chicago/Turabian StyleBazi, Yakoub, Haikel Alhichri, Naif Alajlan, and Farid Melgani. 2019. "Scene Description for Visually Impaired People with Multi-Label Convolutional SVM Networks" Applied Sciences 9, no. 23: 5062. https://doi.org/10.3390/app9235062