Image Caption Generator
By:-
Somya Yadav (1783910053)
Nitesh Singh (1783910038)
Kaustubh Rajput (1783910027)
Vedbrat Dwivedi (1783910057)
Under Supervision of:-
Naveen Tiwari
Overview
About
CNN(Convolutional Neural Network)
LSTM(Long Short-Term Memory)
Model
Data Set
Requirements
Application
What is Image Caption Generator?
Image caption generation is a task that combines computer vision
and natural language processing to recognize the context of an
image and describe it in a natural language such as English.
Example:-
In this project, we will implement the caption generator using a CNN and an
LSTM. The image features will be extracted with Xception, a CNN model
trained on the ImageNet dataset, and then fed into the LSTM model,
which is responsible for generating the image caption.
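The generation step can be sketched as a loop that asks the model for one word at a time until it signals the end of the caption. This is a minimal sketch of greedy decoding, not the project's actual code: the "startseq"/"endseq" markers, the function names, and the toy predictor that stands in for the trained CNN + LSTM model are all assumptions made for illustration.

```python
def generate_caption(predict_next_word, image_features, max_length=20):
    """Greedy decoding: keep appending the most likely next word."""
    words = ["startseq"]  # hypothetical start-of-caption marker
    for _ in range(max_length):
        next_word = predict_next_word(image_features, words)
        if next_word == "endseq":  # hypothetical end-of-caption marker
            break
        words.append(next_word)
    # Drop the start marker before returning the caption text
    return " ".join(words[1:])


def toy_predictor(image_features, words_so_far):
    """Hypothetical stand-in for the trained model: replays a fixed script."""
    script = ["a", "dog", "runs", "on", "grass", "endseq"]
    return script[len(words_so_far) - 1]


caption = generate_caption(toy_predictor, image_features=None)
print(caption)
```

In a real system, predict_next_word would run the LSTM on the image features plus the words generated so far and return the highest-probability word.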
What is CNN?
Convolutional Neural Networks are specialized deep neural
networks that process data whose input is shaped like a
2D matrix. Images are easily represented as 2D matrices, so
CNNs are very useful for working with images.
CNNs are mainly used for image classification, for example identifying
whether an image shows a bird, a plane, etc.
A CNN scans the image from left to right and top to bottom to pull out important
features, and combines those features to classify the image.
It can handle images that have been translated, rotated, scaled, or
changed in perspective.
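The left-to-right, top-to-bottom scan can be illustrated with a small NumPy sketch. It slides a single hand-picked kernel over a toy image; the image, the kernel values, and the function name conv2d are assumptions for illustration. A real CNN learns its kernels during training and stacks many such layers.

```python
import numpy as np


def conv2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1).

    As in most deep learning libraries, the kernel is not flipped,
    so strictly speaking this is a cross-correlation.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    output = np.zeros((out_h, out_w))
    for row in range(out_h):          # top to bottom
        for col in range(out_w):      # left to right
            patch = image[row:row + kh, col:col + kw]
            output[row, col] = np.sum(patch * kernel)
    return output


# Toy 4x4 image: dark left half (0), bright right half (1)
image = np.array([[0, 0, 1, 1]] * 4, dtype=float)

# Kernel that responds where brightness increases from left to right
kernel = np.array([[-1, 1],
                   [-1, 1]], dtype=float)

feature_map = conv2d(image, kernel)
print(feature_map)
```

The feature map is non-zero only in the middle column, where the dark-to-bright edge lies; this is the kind of feature a CNN combines to classify an image.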
What is LSTM?
LSTM stands for Long Short-Term Memory. LSTMs are a type of RNN
that is well suited to sequence prediction problems: based on
the previous text, we can predict what the next word will be. The LSTM has
proven more effective than the traditional RNN by overcoming the
RNN's limitation of short-term memory. An LSTM can carry
relevant information throughout the processing of its inputs and,
with a forget gate, it discards non-relevant information.
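One time step of an LSTM cell can be written in a few lines of NumPy. This is a sketch of the standard LSTM equations with random, untrained weights; the variable names, the gate ordering inside the weight matrices, and the sizes D and H are assumptions chosen for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step. Shapes: W (4H, D), U (4H, H), b (4H,)."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    f = sigmoid(z[0:H])            # forget gate: how much old memory to keep
    i = sigmoid(z[H:2 * H])        # input gate: how much new information to store
    o = sigmoid(z[2 * H:3 * H])    # output gate: how much memory to expose
    g = np.tanh(z[3 * H:4 * H])    # candidate values for the cell state
    c = f * c_prev + i * g         # updated long-term memory (cell state)
    h = o * np.tanh(c)             # updated hidden state (output)
    return h, c


rng = np.random.default_rng(0)
D, H = 3, 4                        # input size and hidden size (arbitrary)
W = rng.normal(size=(4 * H, D))
U = rng.normal(size=(4 * H, H))
b = np.zeros(4 * H)

h, c = np.zeros(H), np.zeros(H)
for x in rng.normal(size=(5, D)):  # a sequence of 5 input vectors
    h, c = lstm_step(x, h, c, W, U, b)
print(h)
```

When the forget gate f is close to 0, the previous cell state c_prev is wiped out; when it is close to 1, the memory is carried forward unchanged.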
Image Caption Generator Model:-
To build our image caption generator model, we will
merge these two architectures. The result is also called a CNN-RNN
model.
• The CNN is used for extracting features from the image. We will use the
pre-trained model Xception.
• The LSTM will use the information from the CNN to help generate a description
of the image.
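One common way to combine the two parts in Keras is a "merge" model with two inputs: the image feature vector and the caption generated so far. The sketch below is an assumption about how such a model could look, not the project's confirmed architecture: the layer sizes (256), the dropout rate, merging by addition, and the function name build_caption_model are illustrative choices. The feature size 2048 matches the output of Xception with global average pooling.

```python
from tensorflow.keras.layers import Input, Dense, Dropout, Embedding, LSTM, add
from tensorflow.keras.models import Model


def build_caption_model(vocab_size, max_length, feature_dim=2048):
    # Image branch: compress the CNN feature vector
    image_input = Input(shape=(feature_dim,))
    image_branch = Dropout(0.5)(image_input)
    image_branch = Dense(256, activation="relu")(image_branch)

    # Text branch: embed the partial caption and summarize it with an LSTM
    text_input = Input(shape=(max_length,))
    text_branch = Embedding(vocab_size, 256, mask_zero=True)(text_input)
    text_branch = Dropout(0.5)(text_branch)
    text_branch = LSTM(256)(text_branch)

    # Merge both branches and predict the next word over the vocabulary
    merged = add([image_branch, text_branch])
    merged = Dense(256, activation="relu")(merged)
    output = Dense(vocab_size, activation="softmax")(merged)

    model = Model(inputs=[image_input, text_input], outputs=output)
    model.compile(loss="categorical_crossentropy", optimizer="adam")
    return model


# Hypothetical sizes, for illustration only
model = build_caption_model(vocab_size=50, max_length=10)
model.summary()
```

The model outputs a probability for every word in the vocabulary; the caption is produced by calling it repeatedly, one word at a time.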
Model - Image Caption Generator:-
Data Sets
Flickr8k
• 8000 images, each annotated with 5 sentences via Amazon Mechanical Turk (AMT)
• 1000 for validation, testing
Flickr30k
• 30k images
• 1000 validation, 1000 testing
MSCOCO
• 123,000 images
• 5000 for validation, testing
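Since each image comes with several captions, a typical first step is to group the captions by image and clean the text. The sketch below assumes one line per caption in the form "image_name#index<TAB>caption"; that format, the sample lines, the "startseq"/"endseq" markers, and the function name load_captions are assumptions for illustration and should be checked against the actual dataset files.

```python
import string
from collections import defaultdict


def load_captions(lines):
    """Group cleaned captions by image name."""
    captions = defaultdict(list)
    strip_punctuation = str.maketrans("", "", string.punctuation)
    for line in lines:
        image_ref, text = line.split("\t", 1)
        image_id = image_ref.split("#")[0]      # drop the caption index
        words = text.lower().translate(strip_punctuation).split()
        words = [w for w in words if w.isalpha()]  # drop numbers and stray tokens
        # Hypothetical markers that tell the model where a caption starts and ends
        captions[image_id].append("startseq " + " ".join(words) + " endseq")
    return dict(captions)


# Made-up sample lines, not taken from any dataset
sample_lines = [
    "dog.jpg#0\tA dog runs on the grass.",
    "dog.jpg#1\tA brown dog is running!",
    "cat.jpg#0\tTwo cats sit on a sofa.",
]

captions = load_captions(sample_lines)
vocabulary = sorted({w for caps in captions.values() for c in caps for w in c.split()})
print(captions["dog.jpg"])
print(len(vocabulary), "distinct words")
```

The vocabulary built here determines the size of the model's output layer, and the longest caption determines the input sequence length.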
Requirements:-
1. Deep Learning.
2. Python.
3. Jupyter Notebook.
4. Keras library.
5. NumPy.
6. Natural Language Processing.
Application:-
1. Image Searching Tool.
2. Self-driving cars.
3. Google Photos.
4. Skin Vision.
Thank You