
Image Caption Generator

This document summarizes an image caption generator project built by students. It uses a CNN-RNN model with an Xception CNN to extract image features and an LSTM to generate captions. The model is trained on datasets like Flickr8k and MSCOCO. Requirements include Python, Keras, and NLP libraries. Applications include image search tools, self-driving cars, Google Photos, and medical imaging analysis.

Uploaded by

Samrat Singh
Copyright
© All Rights Reserved

Image Caption Generator

Department of Computer Science and Engineering
Rajkiya Engineering College Kannauj

By:-
Somya Yadav (1783910053)
Nitesh Singh (1783910038)
Kaustubh Rajput (1783910027)
Vedbrat Dwivedi (1783910057)

Under Supervision of:-
Naveen Tiwari

Overview
• About
• CNN (Convolutional Neural Network)
• LSTM (Long Short-Term Memory)
• Model
• Data Set
• Requirements
• Application

What is Image Caption Generator?
Image caption generation is a task that combines computer vision
and natural language processing concepts to recognize the
context of an image and describe it in a natural language
like English.
Example:-

There is a very colorful bus coming on the street.

A dog is running on a beach.


About The Project:-
The objective of our project is to learn the concepts of CNN
and LSTM models and to build a working image caption
generator.

In this project, we implement the caption generator using a CNN and an
LSTM. Image features are extracted with Xception, a CNN model
trained on the ImageNet dataset, and then fed into the LSTM model,
which is responsible for generating the image caption.

What is CNN?
A Convolutional Neural Network (CNN) is a specialized deep neural
network that processes data shaped like a 2D matrix. Images are
easily represented as 2D matrices, which makes CNNs very useful
for working with images.
CNNs are commonly used for image classification, for example identifying
whether an image shows a bird, a plane, etc.

A CNN scans an image from left to right and top to bottom to pull out important
features, then combines those features to classify the image.
It copes well with images that have been translated and, when trained on
varied data, with rotation, scaling, and changes in perspective.
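
The left-to-right, top-to-bottom scan described above can be sketched as a single filter sliding over a small grayscale image. This is a minimal illustration in plain Python, not the Xception network; the function name convolve2d, the sample image, and the edge filter are assumptions chosen for the example.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map.

    Like most deep learning libraries, this does not flip the kernel,
    so strictly speaking it computes a cross-correlation.
    """
    img_h, img_w = len(image), len(image[0])
    k_h, k_w = len(kernel), len(kernel[0])
    out = []
    for top in range(img_h - k_h + 1):        # top to bottom
        row = []
        for left in range(img_w - k_w + 1):   # left to right
            total = 0
            for i in range(k_h):
                for j in range(k_w):
                    total += image[top + i][left + j] * kernel[i][j]
            row.append(total)
        out.append(row)
    return out


# Hypothetical 4x4 image: dark left half (0), bright right half (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hypothetical 2x2 filter that responds to a dark-to-bright vertical edge.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, vertical_edge)
print(feature_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The non-zero column in the feature map marks where the edge sits. A real CNN learns many such filters and stacks them in layers.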
What is LSTM?
LSTM stands for Long Short-Term Memory. LSTMs are a type of RNN
well suited to sequence prediction problems: based on
the previous text, we can predict what the next word will be. They
improve on the traditional RNN by overcoming its main
limitation, short-term memory. An LSTM can carry
relevant information throughout the processing of inputs, and
with a forget gate it discards non-relevant information.
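
How the forget gate keeps or discards stored information can be sketched with a single LSTM time step. This is a simplified illustration with scalar input and state, assuming the standard LSTM gate equations; the function lstm_step and all weight values are hypothetical and are not taken from the project.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, w):
    """One LSTM time step for scalar input, hidden state, and cell state.

    `w` maps each gate name to (input weight, recurrent weight, bias).
    """
    def gate(name, activation):
        w_x, w_h, b = w[name]
        return activation(w_x * x + w_h * h_prev + b)

    f = gate("forget", sigmoid)       # how much of the old cell state to keep
    i = gate("input", sigmoid)        # how much new information to write
    g = gate("candidate", math.tanh)  # the new information itself
    o = gate("output", sigmoid)       # how much of the cell state to expose

    c = f * c_prev + i * g            # the forget gate scales the old memory
    h = o * math.tanh(c)
    return h, c


# Hypothetical weights: only the forget-gate bias differs between the two runs.
base = {
    "input": (0.0, 0.0, -20.0),      # input gate almost closed
    "candidate": (0.0, 0.0, 0.0),
    "output": (0.0, 0.0, 20.0),      # output gate almost fully open
}
keep = dict(base, forget=(0.0, 0.0, 20.0))    # forget gate close to 1
drop = dict(base, forget=(0.0, 0.0, -20.0))   # forget gate close to 0

_, c_keep = lstm_step(x=1.0, h_prev=0.0, c_prev=0.8, w=keep)
_, c_drop = lstm_step(x=1.0, h_prev=0.0, c_prev=0.8, w=drop)
```

With the forget gate near 1, the cell state stays close to its previous value of 0.8; with the gate near 0, it is almost entirely erased.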
Image Caption Generator Model:-
To build our image caption generator model, we merge the CNN
and LSTM architectures. The result is also called a CNN-RNN
model.
• The CNN is used for extracting features from the image. We use the
pre-trained model Xception.

• The LSTM uses the information from the CNN to help generate a description
of the image.
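
How the LSTM turns image features into a description one word at a time can be sketched as a greedy decoding loop. This is a sketch under assumptions, not the project's code: predict_next_word is a hypothetical stand-in for the trained CNN-RNN model, and the startseq/endseq marker tokens and max_length limit are common conventions that the slides do not specify.

```python
def generate_caption(image_features, predict_next_word, max_length=20):
    """Greedy decoding: repeatedly ask the model for the next word."""
    words = ["startseq"]                      # assumed start-of-caption marker
    for _ in range(max_length):
        next_word = predict_next_word(image_features, words)
        if next_word == "endseq":             # assumed end-of-caption marker
            break
        words.append(next_word)
    return " ".join(words[1:])                # drop the start marker


def toy_predictor(image_features, words_so_far):
    """Stand-in for the trained model: replays a fixed caption word by word."""
    canned = ["a", "dog", "is", "running", "on", "a", "beach", "endseq"]
    return canned[len(words_so_far) - 1]


# Hypothetical feature vector; a real one would come from the CNN.
caption = generate_caption(image_features=[0.1, 0.7, 0.2],
                           predict_next_word=toy_predictor)
print(caption)  # a dog is running on a beach
```

In a trained model, predict_next_word would combine the CNN features with the words generated so far and return the most probable next word.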
Model - Image Caption Generator:-
Data Sets
• Flickr8k
  8,000 images, each annotated with 5 sentences via AMT (Amazon Mechanical Turk)
  1,000 images each for validation and testing
• Flickr30k
  30k images
  1,000 for validation, 1,000 for testing
• MSCOCO
  123,000 images
  5,000 images each for validation and testing
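
The split sizes listed above can be turned into a train/validation/test partition as follows. This is a minimal sketch that assumes 1,000 images each for validation and testing on Flickr8k and a simple random split; the image IDs are hypothetical placeholders, and the published splits of these datasets may assign images differently.

```python
import random


def split_dataset(image_ids, n_val, n_test, seed=0):
    """Shuffle image IDs reproducibly and split them into train/val/test lists."""
    if n_val + n_test > len(image_ids):
        raise ValueError("validation and test sets exceed dataset size")
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)          # fixed seed keeps the split repeatable
    val = ids[:n_val]
    test = ids[n_val:n_val + n_test]
    train = ids[n_val + n_test:]
    return train, val, test


# Hypothetical IDs standing in for the 8,000 Flickr8k image files.
flickr8k_ids = [f"img_{n:04d}" for n in range(8000)]
train, val, test = split_dataset(flickr8k_ids, n_val=1000, n_test=1000)

captions_per_image = 5
n_train_pairs = len(train) * captions_per_image   # image-caption training pairs
print(len(train), len(val), len(test), n_train_pairs)  # 6000 1000 1000 30000
```

Under these assumptions Flickr8k leaves 6,000 images, or 30,000 image-caption pairs, for training.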
Requirements:-
1. Deep Learning.
2. Python.
3. Jupyter Notebook.
4. Keras library.
5. NumPy.
6. Natural Language Processing.
Application:-
1. Image search tools.
2. Self-driving cars.
3. Google Photos.
4. SkinVision (medical image analysis).
Thank You
