Image Captioning Using CNN & LSTM: Digital Signal Processing Laboratory (EEE-316)

This document describes a model for image captioning using a convolutional neural network (CNN) and long short-term memory (LSTM) network. It first outlines the problem and overall model, then discusses the key building blocks: CNNs for image feature extraction, transfer learning with Inception V3, RNNs/LSTMs for generating captions, and word embeddings. It shows how these pieces are connected in the final model and provides examples of model performance on test and real data, as well as potential applications.

Digital Signal Processing Laboratory (EEE-316)

Image Captioning Using CNN & LSTM

Uday Kamal (Id: 1406041), Hasib Amin (Id: 1406045), Rajib Al-Sabah (Id: 1406035), Abhishek Shushil (Id: 1406034)

Bangladesh University of Engineering and Technology (BUET)


Presentation Outline

● Problem Statement
● Basic building blocks for the network
  - CNN
  - Transfer Learning
  - RNN
  - LSTM
● How do we wire them together?
● Code
● Other places this can be implemented
● Interaction & Questions
Problem Overview

Overall Model:

Building Blocks for the Network: CNN

A convolution layer is a feature detector that automatically learns to filter unneeded information out of its input using convolution kernels.
Pooling layers compute the max or average value of a particular feature over a region of the input (downsampling the input images). Pooling also helps detect objects in unusual positions and reduces memory size.
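As a concrete illustration, one convolution + pooling stage takes only a few lines of Keras. This is a minimal sketch; the filter count, kernel size, and input shape are illustrative assumptions, not values from the slides:

```python
# Minimal sketch: one convolution + pooling stage in Keras.
from tensorflow.keras import layers, models

model = models.Sequential([
    # 32 learned 3x3 kernels act as feature detectors over the image.
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(299, 299, 3)),
    # Max pooling keeps the strongest response in each 2x2 region,
    # halving the spatial size (downsampling) and saving memory.
    layers.MaxPooling2D((2, 2)),
])
model.summary()
```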
Building Blocks for the Network: Transfer Learning

Building Blocks for the Network: Inception V3
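A minimal Keras sketch of this transfer-learning step, assuming ImageNet weights and Inception V3's standard 299x299 input (the global-average-pooling choice is an assumption):

```python
# Minimal sketch: use a pre-trained Inception V3 as an image encoder.
from tensorflow.keras.applications import InceptionV3
from tensorflow.keras.applications.inception_v3 import preprocess_input
from tensorflow.keras.models import Model

# Load ImageNet weights and drop the classification head; global average
# pooling turns the final feature maps into a single 2048-d vector.
base = InceptionV3(weights='imagenet', include_top=False, pooling='avg')
encoder = Model(inputs=base.input, outputs=base.output)
encoder.trainable = False  # transfer learning: keep the learned filters fixed

# images: a batch of 299x299 RGB images, scaled with preprocess_input.
# features = encoder.predict(preprocess_input(images))  # shape (N, 2048)
```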
Building Blocks for the Network: RNN
● As humans, we understand context
● We don't reset our understanding from scratch every single time
● Our thoughts have persistence
● Traditional NNs like CNNs don't have persistence
● Speech recognition, language modeling, and translation require this persistence

RNNs are general computers which can learn algorithms to map input sequences to output sequences (flexible-sized vectors). The output vector's contents are influenced by the entire history of inputs.
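Concretely, this persistence comes from a hidden state carried from one time step to the next; the standard vanilla-RNN recurrence is

$$h_t = \tanh(W_{hh}\,h_{t-1} + W_{xh}\,x_t + b_h), \qquad y_t = W_{hy}\,h_t + b_y$$

so each output depends, through $h_t$, on the entire history of inputs.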
Building Blocks for the Network: RNN

Building Blocks for the Network: LSTM
The LSTM units give the network memory cells with read, write and reset
operations. During training, the network can learn when it should remember data
and when it should throw it away.
Building Blocks for the Network: LSTM

C_t is the cell state, which flows through the entire chain.
Building Blocks for the Network: LSTM

Forget Gate: the previous output h_(t-1) is concatenated with the current input x_t and passed through a sigmoid layer, which decides what to throw away from the cell state.
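The slide's diagram is not reproduced here; in the standard LSTM formulation this gate is

$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right)$$

where a value of $f_t$ near 0 means "forget" and near 1 means "keep".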
Building Blocks for the Network: LSTM

Input Gate Layer: a sigmoid layer decides which values to update, while a tanh layer (a classic neuron) computes the new contribution to the cell state.
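In the standard formulation these two parts are

$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right), \qquad \tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right)$$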
Building Blocks for the Network: LSTM

Update Cell State (memory):
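In the standard formulation, the old state is scaled by the forget gate and the input-gated candidate values are added in:

$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$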


Building Blocks for the Network: LSTM

Output Gate Layer: a sigmoid layer selects which parts of the cell state to expose; the state is squashed through tanh and multiplied by this gate to form the output to the next layer.
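In the standard formulation,

$$o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right), \qquad h_t = o_t \odot \tanh(C_t)$$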


Building Blocks for the Network: Word Embedding

Embeddings turn textual data (words, sentences, paragraphs) into high-dimensional vector representations and group them with semantically similar data in a vector space. The computer can thereby detect similarities mathematically.
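A minimal Keras sketch of the idea; the vocabulary size and embedding dimension here are illustrative assumptions:

```python
# Minimal sketch: map integer word indices to dense vectors.
import numpy as np
from tensorflow.keras.layers import Embedding

vocab_size = 8000   # illustrative vocabulary size
embed_dim = 256     # illustrative embedding dimension

embedding = Embedding(input_dim=vocab_size, output_dim=embed_dim)

# A toy "caption" of 5 word indices -> 5 vectors of length 256.
word_ids = np.array([[12, 47, 5, 901, 3]])
vectors = embedding(word_ids)
print(vectors.shape)  # (1, 5, 256)
```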
Final Model:
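The slide shows the wiring as a diagram. Below is a minimal Keras sketch of one common way to combine the pieces, the "merge" architecture, where the Inception V3 feature vector and the embedded partial caption jointly predict the next word. The layer sizes, maximum caption length, and merge choice are assumptions, not values from the slides:

```python
# Minimal sketch of a CNN + LSTM captioning model ("merge" style).
from tensorflow.keras.layers import Input, Dense, Embedding, LSTM, add
from tensorflow.keras.models import Model

vocab_size, embed_dim, max_len = 8000, 256, 34  # illustrative values

# Image branch: the 2048-d Inception V3 feature vector.
img_in = Input(shape=(2048,))
img_vec = Dense(256, activation='relu')(img_in)

# Text branch: the caption generated so far, embedded then encoded by an LSTM.
txt_in = Input(shape=(max_len,))
txt_emb = Embedding(vocab_size, embed_dim, mask_zero=True)(txt_in)
txt_vec = LSTM(256)(txt_emb)

# Merge both branches and predict a distribution over the next word.
merged = add([img_vec, txt_vec])
out = Dense(vocab_size, activation='softmax')(Dense(256, activation='relu')(merged))

model = Model(inputs=[img_in, txt_in], outputs=out)
model.compile(loss='categorical_crossentropy', optimizer='adam')
```

At inference time the model runs word by word: feed the image features plus the caption so far, take the predicted next word, append it, and repeat until an end token.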
Training Data:

Flickr8k Dataset:
The dataset contains 8000 different images, each with 5 different human-labelled captions.
The example image is given 5 different captions:

1) A boy runs as others play on a home-made slip and slide.
2) Children in swimming clothes in a field.
3) Little kids are playing outside with a water hose and are sliding down a water slide.
4) Several children are playing outside with a wet tarp on the ground.
5) Several children playing on a homemade water slide.


Training History:
Model’s Performance on Test Data:
Model’s Performance on Real Data:

Generated captions (left to right, top to bottom):

● Three people are on a boat in the water
● Three people pose for a picture together
● One man is sitting at a table in front of a restaurant
● A soccer player prepares to kick the ball
● A group of kids play in the water
● A boy hits the ball at a baseball game
Applications:

● Visual-to-text systems for blind people
● Search engines that search medical records by content-based captions
● Auto-tagging of different imaging data
● Auto video tagging and summary generation
