Visual Image Caption Generator Using Deep Learning

Sharma, Grishma; Kalena, Priyanka; Malde, Nishi; Nair, Aromal; Parkar, Saurabh

doi:10.2139/ssrn.3368837

2nd International Conference on Advances in Science & Technology (ICAST) 2019 on 8th, 9th April 2019 by K J Somaiya Institute of Engineering & Information Technology, Mumbai, India

6 pages. Posted: 12 Apr 2019. Last revised: 16 May 2019

Image caption generation has long been of great interest to researchers in artificial intelligence. Programming a machine to describe an image or an environment as accurately as an average human would has major applications in robotic vision, business, and many other fields, and it has remained a challenging task over the years. In this paper, we present several image caption generation models based on deep neural networks, focusing on the various RNN techniques and analyzing their influence on sentence generation. We also generate captions for sample images and compare the different feature extraction and encoder models to determine which gives better accuracy and produces the desired results.

Keywords: CNN, RNN, LSTM, VGG, GRU, Encoder-Decoder
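
The abstract compares RNN variants as caption decoders. One simple way to see how such variants differ is the number of trainable parameters in a single recurrent layer. The sketch below is illustrative only and is not taken from the paper: the function name `recurrent_param_count` and the example sizes (256-dimensional input, 512 hidden units) are assumptions. It uses the textbook formulation with one bias vector per weight block; some library implementations add a second bias vector, so their counts can be slightly higher.

```python
def recurrent_param_count(cell: str, input_dim: int, hidden_dim: int) -> int:
    """Count trainable parameters in one recurrent layer (textbook formulation).

    Hypothetical helper for illustration; not part of the paper or any library.
    """
    # Number of weight blocks per cell type:
    #   plain RNN: 1 (the hidden-state update)
    #   GRU:       3 (update gate, reset gate, candidate state)
    #   LSTM:      4 (input gate, forget gate, output gate, candidate cell state)
    blocks = {"rnn": 1, "gru": 3, "lstm": 4}
    if cell not in blocks:
        raise ValueError(f"unknown cell type: {cell!r}")

    # Each block has an input-to-hidden matrix, a hidden-to-hidden matrix,
    # and one bias vector.
    per_block = hidden_dim * (input_dim + hidden_dim) + hidden_dim
    return blocks[cell] * per_block


if __name__ == "__main__":
    # Assumed sizes: 256-dimensional word embeddings, 512 hidden units.
    for cell in ("rnn", "gru", "lstm"):
        print(cell, recurrent_param_count(cell, input_dim=256, hidden_dim=512))
```

Under these assumptions a GRU layer has three quarters of the parameters of an LSTM layer of the same size, which is one reason the two are often compared when trading accuracy against model size.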

Suggested Citation:

Sharma, Grishma and Kalena, Priyanka and Malde, Nishi and Nair, Aromal and Parkar, Saurabh, Visual Image Caption Generator Using Deep Learning (April 8, 2019). 2nd International Conference on Advances in Science & Technology (ICAST) 2019 on 8th, 9th April 2019 by K J Somaiya Institute of Engineering & Information Technology, Mumbai, India, Available at SSRN: https://ssrn.com/abstract=3368837 or http://dx.doi.org/10.2139/ssrn.3368837