Classification Using Deep Learning Networks
Presented By: Vishakha Lall, Chitransh Gaurav, Rishabh Khanna, Waheed Nangigadda
Mentored By: Dr. S. J. Nanda
Outline
● Basics of Machine Learning
● Neural Networks
● Deep Learning Networks
● Semantic Segmentation
● Core DCNN Based Semantic Segmentation
● Traffic Scene Segmentation
Neural Networks
● Neural networks are machine learning models that build complex models by
connecting simpler units (called neurons).
● They can represent complex nonlinear hypothesis functions.
● Each neuron computes a linear combination of its input signals and applies a
sigmoid activation function to the result.
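As a minimal sketch of the neuron described above (using NumPy; the weights, bias, and inputs are illustrative values, not from any trained network):

```python
import numpy as np

def sigmoid(z):
    # squashes any real value into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b):
    # linear combination of the inputs, followed by the sigmoid activation
    return sigmoid(np.dot(w, x) + b)

# example: a neuron with 3 inputs and arbitrary weights and bias
x = np.array([1.0, 0.5, -0.5])
w = np.array([0.2, -0.4, 0.1])
b = 0.0
print(neuron(x, w, b))  # ≈ 0.4875
```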
Forward Propagation (Vectorized Implementation)
● Moving from input layer towards output layer while propagating through the
hidden layers.
● The activation values of one layer are computed from the activation values of
the previous layer, allowing each layer to learn progressively more complex features.
● Forward propagation is used to compute the activation values for all of the
neurons in the network.
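The vectorized forward pass can be sketched in NumPy as follows (the 3-4-2 layer layout and random weights are purely illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    # a holds the activation vector of the current layer, starting at the input
    a = x
    for W, b in zip(weights, biases):
        # one vectorized layer: z = W a + b, then the sigmoid activation
        a = sigmoid(W @ a + b)
    return a

rng = np.random.default_rng(0)
# an illustrative 3-4-2 network: 3 inputs, 4 hidden units, 2 outputs
weights = [rng.standard_normal((4, 3)), rng.standard_normal((2, 4))]
biases = [np.zeros(4), np.zeros(2)]
print(forward(np.array([1.0, 2.0, 3.0]), weights, biases))
```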
Backward Propagation
● An error value is calculated for each of the neurons in the output layer and is
propagated backwards.
● Backpropagation uses these error values to calculate the gradient of the loss
function. This gradient is fed to the optimization method, which in turn uses it to
update the weights in an attempt to minimize the loss function.
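A minimal NumPy sketch of this error/gradient/update loop, assuming a tiny 2-2-1 network with squared-error loss and sigmoid activations (the XOR data, seed, and learning rate are illustrative choices, not from the slides):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)      # XOR targets
W1, b1 = rng.standard_normal((2, 2)), np.zeros(2)
W2, b2 = rng.standard_normal((1, 2)), np.zeros(1)
lr = 1.0
losses = []
for _ in range(5000):
    # forward pass
    a1 = sigmoid(X @ W1.T + b1)              # hidden activations
    a2 = sigmoid(a1 @ W2.T + b2)             # output activations
    losses.append(float(((a2 - y) ** 2).mean()))
    # backward pass: error at the output layer, propagated backwards
    d2 = (a2 - y) * a2 * (1 - a2)            # output-layer error
    d1 = (d2 @ W2) * a1 * (1 - a1)           # error propagated to hidden layer
    # gradient step: update every weight to reduce the loss
    W2 -= lr * d2.T @ a1;  b2 -= lr * d2.sum(0)
    W1 -= lr * d1.T @ X;   b1 -= lr * d1.sum(0)
print(losses[0], losses[-1])   # the loss decreases over training
```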
Deep Convolutional Neural Networks (DCNN)
● Composed of one or more convolutional layers followed by one or more fully
connected layers, as in a standard multilayer neural network.
● Easier to train and have fewer parameters than fully connected networks with
the same number of hidden units.
● The limitations of old-fashioned neural networks were overcome by the
availability of large volumes of training data and accelerated GPU computing.
● This made it practical to add more and more layers, leading to deep
convolutional neural networks.
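A quick parameter count illustrates why convolutional layers are cheaper than fully connected ones (the layer sizes below are illustrative, not taken from any particular network):

```python
# parameter count: a 3x3 convolution vs. a fully connected layer,
# both mapping a 32x32x3 image to 32x32x16 feature maps
in_c, out_c, k = 3, 16, 3   # input channels, output channels, kernel size
h = w = 32                  # spatial dimensions

# convolution: each of the 16 filters has k*k*in_c weights plus one bias,
# shared across every spatial position
conv_params = out_c * (k * k * in_c + 1)

# fully connected: every output unit sees every input value (plus a bias)
fc_params = (h * w * out_c) * (h * w * in_c + 1)

print(conv_params)  # 448
print(fc_params)    # 50348032
```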
Neural networks vs Deep Learning Networks
● Most deep learning methods use neural network architectures, which is why
deep learning models are often referred to as deep neural networks.
● The term “deep” usually refers to the number of hidden layers in the neural
network. Traditional neural networks contain 2-3 hidden layers, while deep
networks can have as many as 150.
● Deep learning models are trained by using large sets of labeled data and
neural network architectures that learn features directly from the data without
the need for manual feature extraction.
● Deep learning networks learn to detect different features of an image using
tens or hundreds of hidden layers, which in turn increases model complexity.
Semantic Segmentation
● Semantic segmentation is the task of recognising and delineating objects in an
image by classifying each pixel.
● Deep convolutional neural networks are successful at learning a good
representation of the visual inputs.
● Conditional Random Field (CRF) post-processing is often used to improve the
segmentation. CRFs are graphical models that ‘smooth’ the segmentation based on
the underlying image intensities, exploiting the observation that pixels with
similar intensities tend to belong to the same class.
Conditional Random Fields (CRFs)
Each random variable Xi takes a value from the label set; connecting the random
variables over the pixel grid forms a random field.
Unary Cost:
If a pixel’s label disagrees with the initial classifier’s prediction, a penalty is paid.
Pairwise Cost:
Pairwise energies are defined for every pair of pixels in the image.
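As a sketch of these two cost terms, here is a simple Potts-style CRF energy. Note the simplification: the slide describes pairwise energies over every pixel pair (a dense CRF), whereas this illustrative `crf_energy` only penalises disagreeing 4-neighbours; the function name and array layout are assumptions, not from the slides:

```python
import numpy as np

def crf_energy(labels, unary, pairwise_weight):
    """Total CRF energy: unary cost plus pairwise (smoothness) cost.

    labels: HxW array of class indices
    unary:  HxWxC array; unary[i, j, c] = penalty for assigning class c at (i, j)
    """
    h, w = labels.shape
    # unary term: penalty for disagreeing with the initial classifier
    e = sum(unary[i, j, labels[i, j]] for i in range(h) for j in range(w))
    # pairwise term (Potts model on a 4-neighbourhood): a fixed penalty
    # whenever two neighbouring pixels take different labels
    for i in range(h):
        for j in range(w):
            if i + 1 < h and labels[i, j] != labels[i + 1, j]:
                e += pairwise_weight
            if j + 1 < w and labels[i, j] != labels[i, j + 1]:
                e += pairwise_weight
    return float(e)

# 2x2 example: one pixel disagrees slightly with the classifier,
# and its label also differs from both of its neighbours
labels = np.array([[0, 0], [0, 1]])
unary = np.zeros((2, 2, 2))
unary[1, 1, 1] = 0.5
print(crf_energy(labels, unary, pairwise_weight=1.0))  # 0.5 + 2*1.0 = 2.5
```

Inference then searches for the labelling that minimises this energy, which is how the ‘smoothing’ effect arises.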
Simple to Complex Framework for Weakly-Supervised Semantic Segmentation
● Pixel-level classification problem, i.e. labelled segmentation.
● Other existing methods are either too expensive or inaccurate, since pixel-level
annotation is time-consuming and expensive.
● The STC framework instead uses only image-level annotations.
Procedure
1. Train I-DCNN
2. Train E-DCNN
3. Train P-DCNN
The technique makes use of only image-level annotations for simple images and
progressively learns to perform on complex images.
The basic process of annotating pixels is thereby automated, reducing cost.
I-DCNN
● An initial segmentation network, termed Initial-DCNN (I-DCNN), is learnt from
generated saliency maps (explained later).
The proposed multi-label cross-entropy loss function gives better results than
other algorithms such as SaliencyCut.
E-DCNN
Makes use of the segmentation masks predicted by the I-DCNN.
A single-label cross-entropy loss function is used to train the E-DCNN, since only a
single label can be assigned to each pixel.
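A minimal NumPy sketch of a per-pixel single-label cross-entropy loss (the function name and array layout are illustrative assumptions, not the paper's exact formulation):

```python
import numpy as np

def pixel_cross_entropy(probs, labels):
    """Mean single-label cross-entropy over all pixels.

    probs:  HxWxC softmax output, one class distribution per pixel
    labels: HxW ground-truth class index per pixel
    """
    h, w = labels.shape
    # pick out the predicted probability of the true class at every pixel
    p_true = probs[np.arange(h)[:, None], np.arange(w)[None, :], labels]
    # small epsilon avoids log(0)
    return float(-np.log(p_true + 1e-12).mean())

# a uniform prediction over 3 classes gives a loss of ln(3) per pixel
probs = np.full((2, 2, 3), 1.0 / 3.0)
labels = np.zeros((2, 2), dtype=int)
print(pixel_cross_entropy(probs, labels))  # ≈ 1.0986
```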
P-DCNN
The E-DCNN is used to predict segmentation masks, which supervise the training of
the P-DCNN.
Implementation
Using the C++-based Caffe framework, built at UC Berkeley with an interface to
Python, together with the DeepLab and VGG-16 models.
Expected Applications
● Object Identification
● Speed Adjustment: Adapt speed of the vehicle based on position and speed of
other vehicles, lane departure frequencies of vehicles on the same route, traffic
along the route etc.
● Accident Detection: Prompt response to emergency department and
automatically reroute other passengers.
● Smart Traffic Lights: Real-time dynamic traffic light sequence.
● Smart Speed Limit Signs
Semantic Segmentation of Traffic Scene
● Input: RGB-D image obtained from Kinect hardware, multiple cameras or depth
sensors
● Goals:
○ Improve segmentation accuracy
○ Real-time performance
○ Competitive accuracy over other techniques
● Challenges:
○ Traffic scenes are complex because they are dynamic rather than still
○ Depth information is required over greater distances
● Output: Segmented image
Cityscapes Dataset
● It contains traffic scene images captured with stereo cameras from 50 different
cities.
● High quality pixel annotations of RGB-D images.
● 5000 colour images, divided into training, validation and test sets.
● The dataset is divided into 12 dominant classes (road, sidewalk, building, pole,
traffic sign, tree, lawn, sky, person, vehicle, two-wheeler, others), labelled 0-11.
Disparity Map
● Object information like depth, edges etc.
● Extracting rich features for CNN.
● A larger grey value implies a larger disparity.
● Brighter regions imply the object is closer to the camera.
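This inverse disparity/depth relation follows the standard pinhole stereo formula, depth = focal length × baseline / disparity. A small sketch (the focal length and baseline below are illustrative values, not Cityscapes calibration data):

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    # standard pinhole stereo relation: depth = f * B / d
    # disparity in pixels, focal length in pixels, baseline in metres
    return focal_px * baseline_m / disparity_px

# illustrative rig: 700 px focal length, 0.22 m baseline
print(depth_from_disparity(64.0, 700.0, 0.22))   # 2.40625 m
print(depth_from_disparity(128.0, 700.0, 0.22))  # larger disparity -> closer object
```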
Obtaining the Disparity Map
Stereo vision matching algorithms trade off accuracy against time complexity:
● Matching accuracy
● Time consumed
● Requirement of global image smoothing
● Issues with Semi-Global Matching
References
8. Yunchao Wei, Xiaodan Liang, Yunpeng Chen, Xiaohui Shen, Ming-Ming Cheng,
Jiashi Feng, Yao Zhao, and Shuicheng Yan, “STC: A Simple to Complex Framework
for Weakly-Supervised Semantic Segmentation”, IEEE Transactions on Pattern
Analysis and Machine Intelligence, vol. 39, no. 11.
Thank you