(Based on numerous articles and experimental results published by Stanford University.)
Deep Learning - A Literature survey
1. A Technical seminar on
“DEEP LEARNING”
Student: Akshay N. Hegde
1RV12SIT02
M.Tech (IT), 1st semester
Department of ISE, RVCE
3. INTRODUCTION
• What is deep learning?
• Some success stories
• Examples of deep learning
• Learning and training of objects
• Conclusion & future scope
4. What is Deep learning?
• “Automatically learning multiple levels of representations of the underlying distribution of the data to be modelled.”
• Deep learning algorithms have shown superior learning and classification performance in areas such as transfer learning, speech and handwritten character recognition, and face recognition, among others.
5. • A deep learning algorithm automatically extracts the low- and high-level features necessary for classification.
• By high-level features, one means features that hierarchically depend on other features.
• “Automatic representation learning” is the key point of interest of this kind of approach, as it eliminates the need for potentially time-consuming handcrafted feature design. A minimal sketch of one such feature-learning layer follows.
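As an illustration of unsupervised feature learning, here is a minimal NumPy sketch of a single denoising-autoencoder layer in the spirit of reference [2]. The toy data, layer sizes, and hyperparameters are illustrative assumptions, not values from any of the cited papers; stacking several such layers, each trained on the activations of the previous one, would yield the low-to-high feature hierarchy described above.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: 200 samples of 64 inputs (e.g. flattened 8x8 image patches).
X = rng.random((200, 64))
n_hidden = 25                               # number of learned features

W = rng.normal(0.0, 0.1, (64, n_hidden))    # tied encoder/decoder weights
b, c = np.zeros(n_hidden), np.zeros(64)     # hidden / visible biases

lr, noise = 0.1, 0.3
for epoch in range(50):
    X_noisy = X * (rng.random(X.shape) > noise)  # masking corruption
    H = sigmoid(X_noisy @ W + b)                 # encode: feature activations
    X_hat = sigmoid(H @ W.T + c)                 # decode: reconstruct clean X
    delta = X_hat - X                            # cross-entropy output error
    dH = (delta @ W) * H * (1.0 - H)             # backprop into the encoder
    W -= lr * (X_noisy.T @ dH + delta.T @ H) / len(X)
    b -= lr * dH.mean(axis=0)
    c -= lr * delta.mean(axis=0)

# Each column of W is one learned low-level feature detector.
```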
7. Hierarchies in Vision
• Lampert et al. (CVPR 2009): learn attributes first, then classes as combinations of attributes (see the sketch below).
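A minimal sketch of that idea, in the style of Lampert et al.'s direct attribute prediction: per-attribute classifiers produce attribute probabilities, and a class score is assembled from them. The class names, attribute list, and class-attribute signature table below are made-up illustrations, not the paper's actual data.

```python
import numpy as np

# Hypothetical class-attribute signatures (rows: classes; columns: the
# attributes "striped", "four-legged", "hooved").
classes = ["zebra", "horse", "tiger"]
signatures = np.array([
    [1, 1, 1],   # zebra:  striped, four-legged, hooved
    [0, 1, 1],   # horse:  not striped, four-legged, hooved
    [1, 1, 0],   # tiger:  striped, four-legged, not hooved
])

def classify(attr_probs):
    """attr_probs[m] = P(attribute m present | image), from per-attribute
    classifiers. A class's score multiplies the probabilities of its
    present attributes and the complements of its absent ones."""
    attr_probs = np.asarray(attr_probs)
    scores = np.prod(np.where(signatures == 1, attr_probs, 1 - attr_probs),
                     axis=1)
    return classes[int(np.argmax(scores))]

print(classify([0.9, 0.8, 0.7]))  # strongly striped and hooved -> "zebra"
```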
8. What we can do? (With the right dataset)
• Recognize faces
• Categorize scenes
• Detect, segment, and track objects
• Reconstruct 3D from multiple images or stereo
• Classify actions
9. What we can do...
[Figure: example outputs: scene categorization ("BEACH"), object detection and localization, and face detection and recognition.]
10. Why Deep Learning?
• Data mining: using historical data to improve decisions
– medical records ⇒ medical knowledge
– log data to model users
• Software applications we can't program by hand
– autonomous driving
– speech recognition
• Self-customizing programs
– a newsreader that learns user interests
11. Some success stories
• Data mining
• Analysis of astronomical data
• Human speech recognition
• Handwriting recognition
• Face recognition
• Detecting fraudulent use of credit cards
• Driving autonomous vehicles
• Predicting stock rates
• Intelligent elevator control
• DNA classification
14. Probabilistic max pooling
• Convolutional neural net: a pooling unit computes max{x1, x2, x3, x4}, where the xi are real numbers.
• Convolutional DBN: the xi are binary ({0, 1}) and mutually exclusive, so a pooling block has only 5 possible cases (all four detection units off, or exactly one on).
• This collapses the 2^n configurations of n detection units into n + 1 configurations, and permits both bottom-up and top-down inference. A sketch of the computation follows.
[Figure: the five admissible states of a pooling block and its four detection units.]
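A minimal sketch of this computation, following the formulation of Lee et al. [5]: the n + 1 admissible states share a softmax, with the all-off state contributing an unnormalized weight of exp(0) = 1. The function and variable names are mine.

```python
import numpy as np

def probabilistic_max_pool(bottom_up):
    """Probabilistic max pooling over one block of detection units.
    `bottom_up` holds the n bottom-up inputs I(h_k) to the block's
    detection units. At most one unit may be on, so the block has
    n + 1 states; we softmax over them, with the all-off state
    carrying energy 0 (unnormalized weight exp(0) = 1)."""
    I = np.asarray(bottom_up, dtype=float)
    m = max(0.0, I.max())            # shift for numerical stability
    w_on = np.exp(I - m)             # one weight per "unit k is on" state
    w_off = np.exp(-m)               # the all-off state
    z = w_off + w_on.sum()
    p_on = w_on / z                  # P(h_k = 1) for each detection unit
    p_pool_on = 1.0 - w_off / z      # P(pooling unit = 1)
    return p_on, p_pool_on

p_on, p_pool = probabilistic_max_pool([0.5, 2.0, -1.0, 0.3])
```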
15. Convolutional DBN for audio
[Figure: a first CDBN layer (detection units followed by max pooling), with a second CDBN layer (again detection units followed by max pooling) stacked on top.]
16. Convolutional DBN for Images
[Figure: one CDBN layer: input data V; shared "filter" weights Wk; detection layer H of binary hidden nodes; max-pooling layer P of binary pooling nodes.] A sketch of the layer's data flow follows.
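To make the V → H → P data flow concrete, here is an illustrative sketch of one detection layer and its pooling layer. For brevity it uses a plain deterministic max rather than the probabilistic pooling of slide 14, and the function names and toy sizes are mine.

```python
import numpy as np

def detection_and_pool(V, Wk, pool=2):
    """One illustrative CDBN layer: slide the shared 'filter' weights Wk
    over the input V to get the detection map H, then max-pool H in
    non-overlapping pool x pool blocks to get the pooling map P."""
    fh, fw = Wk.shape
    H = np.zeros((V.shape[0] - fh + 1, V.shape[1] - fw + 1))
    for i in range(H.shape[0]):          # filter response at each position
        for j in range(H.shape[1]):
            H[i, j] = np.sum(V[i:i + fh, j:j + fw] * Wk)
    ph, pw = H.shape[0] // pool, H.shape[1] // pool
    P = H[:ph * pool, :pw * pool].reshape(ph, pool, pw, pool).max(axis=(1, 3))
    return H, P

V = np.random.default_rng(0).random((16, 16))         # toy input image
H, P = detection_and_pool(V, Wk=np.ones((3, 3)) / 9)  # H: 14x14, P: 7x7
```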
17. Convolutional DBN on face images
• Learned feature hierarchy: pixels → edges → object parts (combinations of edges) → object models.
18. Learning of object parts
• Examples of learned object parts from four object categories: faces, cars, elephants, and chairs.
19. Training on multiple objects
• Plot of the conditional entropy H(class | neuron active), trained on 4 classes (cars, faces, motorbikes, airplanes).
• Second layer: shared features and object-specific features.
• Third layer: more specific features.
22. State-of-the-art task performance
Audio
• TIMIT phone classification: prior art (Clarkson et al., 1999) 79.6%; Stanford feature learning 80.3%
• TIMIT speaker identification: prior art (Reynolds, 1995) 99.7%; Stanford feature learning 100.0%
Images
• CIFAR object classification: prior art (Yu and Zhang, 2010) 74.5%; Stanford feature learning 75.5%
• NORB object classification: prior art (Ranzato et al., 2009) 94.4%; Stanford feature learning 96.2%
Multimodal (audio/video)
• AVLetters lip reading: prior art (Zhao et al., 2009) 58.9%; Stanford feature learning 63.1%
Video
• UCF activity classification: prior art (Kläser et al., 2008) 86%; Stanford feature learning 87%
• Hollywood2 classification: prior art (Laptev, 2004) 47%; Stanford feature learning 50%
23. • Fig. 1: DeSTIN (Deep SpatioTemporal Inference Network) hierarchy for the MNIST dataset studies. Four layers are used, with 64, 16, 4, and 1 nodes per layer, arranged hierarchically.
• At each temporal step, each node feeds its output belief b(s) to its parent node.
• At each temporal step, the parent receives input beliefs from its four child nodes, generates its own belief (fed to its parent in turn), and produces an advice value a that is fed back to the child nodes. A toy sketch of the bottom-up pass follows.
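A toy sketch of the bottom-up pass through such a 64-16-4-1 hierarchy, assuming each parent simply forms its belief from the concatenated beliefs of its four children. The projection matrices below are random placeholders standing in for the learned per-node models, and the top-down advice signal is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hierarchy of Fig. 1: 64 -> 16 -> 4 -> 1 nodes; every parent has 4 children.
OBS_DIM, N_BELIEFS = 16, 8        # e.g. 4x4 pixel patches, 8 belief states
W_leaf = rng.normal(size=(N_BELIEFS, OBS_DIM))           # placeholder model
W_parent = rng.normal(size=(N_BELIEFS, 4 * N_BELIEFS))   # placeholder model

def bottom_up_pass(patches):
    """One temporal step: each of the 64 leaf nodes maps its input patch
    to a belief b(s); each parent then combines the beliefs of its four
    children into its own belief, which is fed upward in turn."""
    beliefs = [softmax(W_leaf @ p) for p in patches]      # 64 leaf beliefs
    while len(beliefs) > 1:                               # 64 -> 16 -> 4 -> 1
        beliefs = [softmax(W_parent @ np.concatenate(beliefs[i:i + 4]))
                   for i in range(0, len(beliefs), 4)]
    return beliefs[0]                                     # root belief

root_belief = bottom_up_pass([rng.random(OBS_DIM) for _ in range(64)])
```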
25. Named-entity recognition (NER)
• Also known as entity identification or entity extraction, NER is a subtask of information extraction that seeks to locate and classify atomic elements in text into predefined categories such as the names of persons, organizations, and locations, and expressions of times, quantities, monetary values, percentages, etc.
• Most research on NER systems has been structured as taking an unannotated block of text, such as this one:
“Jim bought 300 shares of Acme Corp. in 2006.”
26. • ...and producing an annotated block of text, such as this one:
<ENAMEX TYPE="PERSON">Jim</ENAMEX> bought <NUMEX TYPE="QUANTITY">300</NUMEX> shares of <ENAMEX TYPE="ORGANIZATION">Acme Corp.</ENAMEX> in <TIMEX TYPE="DATE">2006</TIMEX>
• State-of-the-art NER systems for English produce near-human performance. For example, the best system entering MUC-7 achieved an F-measure of 93.39%, while human annotators scored 97.60% and 96.95%.
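The annotation step itself is mechanical once a tagger has labelled the tokens. A minimal sketch, assuming hypothetical (token, tag) tagger output for the example sentence and the MUC-style element conventions shown above:

```python
# Hypothetical (token, tag) output from an NER tagger for the example above;
# None marks tokens outside any entity.
tagged = [("Jim", "PERSON"), ("bought", None), ("300", "QUANTITY"),
          ("shares", None), ("of", None), ("Acme Corp.", "ORGANIZATION"),
          ("in", None), ("2006", "DATE")]

# MUC-style element name for each entity-type family.
ELEMENT = {"PERSON": "ENAMEX", "ORGANIZATION": "ENAMEX", "LOCATION": "ENAMEX",
           "QUANTITY": "NUMEX", "MONEY": "NUMEX", "PERCENT": "NUMEX",
           "DATE": "TIMEX", "TIME": "TIMEX"}

def annotate(tokens):
    """Wrap each tagged token span in its MUC-style element."""
    parts = []
    for text, tag in tokens:
        if tag is None:
            parts.append(text)
        else:
            elem = ELEMENT[tag]
            parts.append(f'<{elem} TYPE="{tag}">{text}</{elem}>')
    return " ".join(parts)

print(annotate(tagged))
# <ENAMEX TYPE="PERSON">Jim</ENAMEX> bought <NUMEX TYPE="QUANTITY">300</NUMEX> ...
```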
27. CONCLUSION & FUTURE WORK
• Test results show that a deep learning approach allows better classification than popular classifiers trained on the handcrafted features chosen in this work.
• This is a significant advantage over the typical classification approach, which requires careful (and possibly time-consuming) selection of features.
• Instead of hand-tuning features, use unsupervised feature learning.
• Advanced topics:
o Self-taught learning
o Scaling up
28. • More practical implementations remain to be done.
• Research on these methods is ongoing at Stanford University.
29. REFERENCES
• [1] D. Erhan, Y. Bengio, A. Courville, P.-A. Manzagol, P. Vincent, and S. Bengio, "Why does unsupervised pre-training help deep learning?," Journal of Machine Learning Research, vol. 11, pp. 625–660, Feb. 2010.
• [2] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol, "Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion," Journal of Machine Learning Research, vol. 11, pp. 3371–3408, 2010.
• [3] G. Hinton, S. Osindero, and Y.-W. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, no. 7, pp. 1527–1554, 2006.
• [4] D. Keysers, "Comparison and combination of state-of-the-art techniques for handwritten character recognition: Topping the MNIST benchmark," arXiv preprint arXiv:0710.2231, 2007.
• [5] H. Lee, Y. Largman, P. Pham, and A. Y. Ng, "Unsupervised feature learning for audio classification using convolutional deep belief networks," in Advances in Neural Information Processing Systems, vol. 22, pp. 1096–1104, 2009.
• [6] F. Q. Lauzon, "An introduction to deep learning," in Proc. 11th International Conference on Information Science, Signal Processing and their Applications (ISSPA), pp. 1438–1439, 2012.