INTRODUCTION TO
DEEP LEARNING
Zeynep Su Kurultay
Outline
■ Modeling humans in machines
■ Introduction to neural nets
■ What makes an algorithm intelligent?
■ Learning
– Supervised learning
■ Deep learning
– Neural nets in detail
■ Framework discussion & sample code
■ Future
Modeling humans in machines
Modeling humans in machines
But why?
Neural networks
■ The mammal brain is organized in a deep
architecture (Serre, Kreiman, Kouh, Cadieu,
Knoblich, & Poggio, 2007)
(E.g. visual system has 5 to 10 levels)
■ Very popular in the early 1990s, but fell out of
favor when they were found not to perform
well in practice
■ Why they are gaining power again now: deep
architectures may be able to represent some
functions that cannot be represented
efficiently otherwise. Breakthrough in
2006/2007 with the Hinton and Bengio papers
Examples around us
Examples around us
Date: November 2014
Examples around us
Examples around us
Examples around us
Examples around us
Image: NasenSpray/Imgur
Examples around us
Image: http://www.telegraph.co.uk/technology/google/11730050/deep-dream-best-images.html?frame=3370388
Examples around us
Image: drkaugumon/Imgur
What makes an algorithm intelligent?
Image courtesy of Toptal.com
What makes an algorithm intelligent?
Learning
■ Supervised machine learning: The program is “trained” on a pre-defined set of
“training examples”, which then facilitate its ability to reach an accurate conclusion
when given new data.
■ Semi-supervised machine learning: The program infers the unknown labels through
“label propagation”, utilizing similarities between different examples and inferring
non-existent labels from existent ones.
■ Unsupervised machine learning: The program is given a bunch of data and must find
patterns and relationships therein – e.g. clustering via the nearest neighbor algorithm.
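As a rough illustration of the supervised/unsupervised split, here is a minimal sketch using scikit-learn (a library not covered in this talk); the tiny dataset and its features are made up.

# Hypothetical sketch: supervised vs. unsupervised learning with scikit-learn.
from sklearn.neighbors import KNeighborsClassifier
from sklearn.cluster import KMeans

X = [[170, 65], [180, 80], [160, 55], [175, 75]]  # made-up features (height, weight)
y = [0, 1, 0, 1]                                  # known labels -> supervised setting

# Supervised: train on labeled examples, then predict the label of new data.
clf = KNeighborsClassifier(n_neighbors=1).fit(X, y)
print(clf.predict([[172, 70]]))

# Unsupervised: no labels given; the algorithm finds clusters on its own.
km = KMeans(n_clusters=2, n_init=10).fit(X)
print(km.labels_)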
Supervised Learning
■ Binary classification: Does this person have that disease?
■ Regression: What is the market value of this house?
■ Multiclass classification: Digit recognition, Face recognition
Supervised Learning
■ Goal: Given a number of features, try to make sense out of them!
■ Example: Employee satisfaction rate – what does it depend on? Given these
features in a dataset, try to predict the rate
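To make the example concrete, here is a minimal sketch with invented features and numbers (the slide's own figure is not reproduced here): fit a linear model that predicts the satisfaction rate from a feature matrix.

# Hypothetical sketch: predict employee satisfaction from made-up features.
import numpy as np

X = np.array([[50.0,  5.0],    # salary in $1000s, weekly overtime hours
              [65.0,  2.0],
              [40.0, 10.0],
              [80.0,  1.0]])
y = np.array([0.6, 0.8, 0.4, 0.9])        # satisfaction rate per employee

# Append a bias column and solve the least-squares problem y ~ Xb @ w.
Xb = np.hstack([X, np.ones((len(X), 1))])
w, *_ = np.linalg.lstsq(Xb, y, rcond=None)

new_employee = np.array([60.0, 4.0, 1.0])  # features plus the bias term
print(new_employee @ w)                    # predicted satisfaction rate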
Supervised Learning
Supervised Learning
Supervised Learning
Supervised Learning
Supervised Learning
■ But how do we adjust ourselves? How do we know at each step we are getting better?
■ Measurement of wrongness: Loss functions
Loss functions
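The slide's own formula is not reproduced here; as one common example of a loss function, a mean squared error sketch:

# Mean squared error: a standard measurement of "wrongness" for regression.
import numpy as np

def mse_loss(y_pred, y_true):
    # Average of squared differences; smaller means the model is less wrong.
    return np.mean((y_pred - y_true) ** 2)

print(mse_loss(np.array([0.7, 0.5]), np.array([0.6, 0.9])))  # 0.085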
Gradient descent
How do we know how to “roll down
the hill”?
The gradient (the derivatives of the
loss function with respect to each of
the individual feature weights, i.e. the
parameters) tells us “which way is down”.
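A minimal sketch of one such update, assuming a linear model and the mean-squared-error loss above (the data and learning rate are made up):

# Gradient descent: repeatedly step the weights in the downhill direction.
import numpy as np

def gradient_descent_step(w, X, y, lr=0.05):
    y_pred = X @ w
    grad = 2.0 * X.T @ (y_pred - y) / len(y)   # dLoss/dw: "which way is down"
    return w - lr * grad                       # take a small step down the hill

X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0]])
y = np.array([5.0, 4.0, 9.0])                  # generated by the weights [1, 2]
w = np.zeros(2)
for _ in range(500):
    w = gradient_descent_step(w, X, y)
print(w)                                       # approaches [1., 2.]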
What exactly is deep learning?
■ “a network would need more than one hidden layer to be a deep network, networks
with one or two hidden layers are traditional neural networks…….”
■ “in my experience, a network can be considered deep when there is at least one
hidden layer. Although the term deep learning can be fuzzy, …”
■ “in my own thinking, deep is not related to the number of layers, but it talks about
how hard the feature to be discovered is…….”
■ - a discussion from StackExchange
Deep learning
■ What is the difference? Remember the quote from Yann LeCun from before? It goes
on:
■ “A pattern recognition system is like a black box with a camera at one end, a green
light and a red light on top, and a whole bunch of knobs on the front…. Now, imagine a
box with 500 million knobs, 1,000 light bulbs, and 10 million images to train it with.
That’s what a typical Deep Learning system is.”
Introduction to deep learning
Aim: Learning features
■ Deep learning excels in tasks where the basic unit (a single
pixel, a single frequency, or a single word) has very little
meaning in and of itself, but the combination of such units
has a useful meaning. It can learn these useful combinations
of values without any human intervention.
Aim: Learning features
(convolutional neural networks)
Neural networks
■ An input, output, and one or more hidden layers
of units/neurons/perceptrons
■ Each connection between two neurons has a
weight w. Best weights can again be found with
gradient descent.
Image courtesy of
http://ljs.academicdirect.org/A15/053_070.htm
Neural networks
■ Example: The input vector [7, 1, 2] goes into the input
units
■ Forward propagation
■ Activation function
Image courtesy of
http://ljs.academicdirect.org/A15/053_070.htm
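A numpy sketch of what forward propagation might look like, assuming the figure's three-input, five-hidden, two-output layout (the weights here are random, not the figure's actual values):

# Forward propagation: weighted sums followed by an activation, layer by layer.
import numpy as np

rng = np.random.default_rng(0)
x = np.array([7.0, 1.0, 2.0])          # input layer: 3 units

W1 = rng.normal(size=(5, 3)) * 0.1     # hidden layer: 5 units, weights w
b1 = np.zeros(5)
W2 = rng.normal(size=(2, 5)) * 0.1     # output layer: 2 units
b2 = np.zeros(2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))    # activation function, squashes to (0, 1)

hidden = sigmoid(W1 @ x + b1)          # propagate input -> hidden
output = sigmoid(W2 @ hidden + b2)     # propagate hidden -> output
print(output)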
Neural networks
■ Why deep?
■ Depth is the number of parameterized
transformations a signal encounters as it
propagates from the input layer to the output
layer, where a parameterized transformation is
a processing unit with trainable parameters,
such as weights.
Image courtesy of
http://ljs.academicdirect.org/A15/053_070.htm
Aim: Learning features
■ The goal of deep learning methods is to learn higher-level
features from lower-level features.
Other important concepts
■ Overfitting – there is such a thing as learning too much (or too specifically)!
■ Regularization – a technique that prevents overfitting
Overfitting
■ Overfitting – there is such a thing as learning too much (or too specifically)!
■ Regularization – a technique that prevents overfitting
Overfitting
U.S. Census Population over Time
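Regularization, mentioned above, is easiest to see in the loss itself; a minimal L2 (weight decay) sketch, with the penalty strength chosen arbitrarily:

# L2 regularization: penalize large weights so the model cannot overfit as easily.
import numpy as np

def regularized_loss(w, X, y, lam=0.1):
    data_loss = np.mean((X @ w - y) ** 2)   # how wrong we are on the training data
    penalty = lam * np.sum(w ** 2)          # grows with the size of the weights
    return data_loss + penalty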
Different frameworks
■ Pylearn2, Lasagne, Caffe, Torch, Theano, Blocks, Plate, Crino, Theanet, DL4J, Keras, …
Different frameworks
■ Theano:
– A mathematical expression compiler, designed with machine learning in mind.
– Lets you define an objective and automatically produces the code that computes the
gradient of the objective.
– Good for experimenting with different loss functions
– Slightly lower level of abstraction, but more possibilities
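A minimal sketch of that workflow (Theano-0.x-era API; illustrative only, not the talk's demo code):

# Define a symbolic objective; Theano produces the gradient code automatically.
import theano
import theano.tensor as T

w = T.dvector('w')                    # parameters as a symbolic vector
loss = T.sum((w - 3) ** 2)            # a toy objective to minimize
grad = T.grad(loss, w)                # symbolic gradient of the objective

f_loss = theano.function([w], loss)
f_grad = theano.function([w], grad)
print(f_loss([1.0, 2.0]), f_grad([1.0, 2.0]))   # 5.0 and [-4., -2.]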
Different frameworks
■ Caffe:
– Developed by UC Berkeley
– Widely used machine-vision library that ported Matlab’s implementation of fast
convolutional nets to C and C++
– Not intended for other deep-learning applications such as text, sound or time series data
CORRECTION: There are new implementations of RNNs and LSTMs in Caffe, so it is not
only for images any more!
– Very fast: over 60M images per day with a single NVIDIA K40 GPU
Different frameworks
■ Torch:
– Written in Lua (a scripting language developed in Brazil in the early 1990s)
– A highly customized version of it is used by large tech companies such as Google and
Facebook
Different frameworks
■ Keras:
– Minimalist, highly modular neural network library in the spirit of Torch
– Written in Python
– Uses Theano under the hood for optimized tensor manipulation on GPU and CPU
– It was developed with a focus on enabling fast experimentation
– 60K images took 30 hours on Amazon g2.2xlarge
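A minimal Keras sketch in the spirit of the MNIST demo that follows (written against a later Keras API than the 2015 talk used, so names may differ slightly):

# A small multilayer perceptron for MNIST digits, defined in a few lines.
from keras.models import Sequential
from keras.layers import Dense, Activation

model = Sequential()
model.add(Dense(512, input_dim=784))   # 28x28 MNIST images flattened to 784 inputs
model.add(Activation('relu'))
model.add(Dense(10))                   # one output unit per digit class
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='sgd',
              metrics=['accuracy'])
# model.fit(X_train, Y_train, batch_size=128, epochs=20)  # then train on MNIST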
Comparing Keras and Theano
MNIST digits dataset
- serves as a benchmark to compare
results with as new articles come
out.
Multilayer Perceptron
- Basic feedforward neural network
Demo
Code snippets – inside the gradient descent
Output = Wx + b
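The demo's actual snippet is not reproduced on this slide; as a stand-in, a plain numpy version of that affine step:

# The linear part of a layer: Output = Wx + b (an activation is applied afterwards).
import numpy as np

W = np.array([[0.2, -0.5, 1.0],
              [0.7,  0.1, 0.3]])   # 2 output units, 3 input features (made-up values)
x = np.array([7.0, 1.0, 2.0])      # the example input vector from earlier
b = np.array([0.1, -0.2])

output = W @ x + b
print(output)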
Demo
Code snippets – inside the hidden layer
Demo
Code snippets – inside the hidden layer
Demo
Code snippets – inside the hidden layer
Demo
Code snippets – inside the network
Demo
■ https://algorithmia.com/demo/handwriting
Future of deep learning
■ Deep learning has a lot of hype right now, and it is apparent that it is very useful for
specific tasks.
■ What frontiers and challenges do you think are the most exciting for researchers in
the field of neural networks in the next ten years?
■ I cannot see ten years into the future. For me, the wall of fog starts at about 5 years. ...
I think that the most exciting areas over the next five years will be really understanding
videos and text. I will be disappointed if in five years time we do not have something
that can watch a YouTube video and tell a story about what happened. I have had a lot
of disappointments.
– From Geoffrey Hinton’s AMA on Reddit
Now & The Future
Facebook Deep Learning, March 26, 2015
Image courtesy of Venturebeat.com
Join us!
■ Open positions: https://angel.co/algorithmia/jobs/
– Algorithm Developer [this is me!]
– Backend Developer
– Product Manager
– Technical Evangelist
Further resources
■ Introductory:
■ Andrew Ng’s Machine Learning course on Coursera
■ Geoffrey Hinton’s Neural Networks course on Coursera
■ Advanced:
■ Stanford’s Convolutional Neural Networks for Visual Recognition http://cs231n.github.io/
■ Who is afraid of non-convex loss functions? By Yann LeCun http://videolectures.net/eml07_lecun_wia/
■ What is wrong with Deep Learning? By Yann LeCun http://techtalks.tv/talks/whats-wrong-with-deep-learning/61639/
■ For those who like papers, recent advances:
■ Playing Atari with Deep Reinforcement Learning - http://www.cs.toronto.edu/~vmnih/docs/dqn.pdf
■ Unsupervised Face Detection - http://cs.stanford.edu/~quocle/faces_full.pdf
■ Content:
■ Toptal.com, Deeplearning.net
■ http://www.computerworld.com/article/2918161/emerging-technology/the-ai-ecosystem.html
■ Introduction to Machine Learning CMU-10701 - Deep Learning slides
■ Images:
■ http://www.spyemporium.com/images/products/st-sc1720.jpg
■ http://stats.stackexchange.com/questions/128616/whats-a-real-world-example-of-overfitting
■ http://www.homedepot.com/catalog/productImages/1000/c4/c4c34d2e-56ce-4c11-94c0-67aa19b769fa_1000.jpg
■ http://www.bulborama.com/images/products/1933.jpg
■ https://xkcd.com/1122/, https://xkcd.com/1425/
■ www.deeplearning.net

Editor's Notes

  1. Sensing, reasoning and communicating; speech and image recognition; different flavors of reasoning, logic versus evidence-based. How do you tell a car from a dog as a human? Netflix, Amazon recommendations – how do they do it, how do we do it? Similar people (you liked The Matrix, I liked The Matrix, I also liked Terminator, maybe you’ll like it too?)
  2. WHY are we doing this? Siri, Cortana, Google Now -> we do it so you don’t have to
  3. Specialized brain cells: take signal, carry it. Shallow architectures took much less time to train and had comparable or even higher accuracies. 2006 paper – A fast learning algorithm for deep belief nets.
  4. How would you (as a human) describe this?
  5. Computer generated/dreamed images
  6. I will talk about ml first
  7. Challenge: teaching a machine to tell the difference between a dog and a car. Yann LeCun – knobs, parameters
  8. salary, happiness at a job
  9. Important thing vs non-important thing
  10. What do we do in this case, as humans? – Draw the best fit line
  11. Typically, the dataset is represented in a matrix where rows are examples and columns are features. The features generally have “coefficients” in the equation that we call weights.
  12. This coefficient for our feature becomes the weight – something that implies how important this is
  13. Grandmother example!
  14. Ask 0-1 error, gradient
  15. 3D, more complicated cases
  16. Units -> Very specialized little workers Specialized cells brain
  17. The figure above shows a network with a 3-unit input layer, a 5-unit hidden layer and an output layer with 2 units. Weight is what makes neurons specialized; each gets a different weight (toilet size less weight, salary more weight) – think of it as thicker/thinner connection lines in between.
  18. These values are then propagated forward to the hidden units using the weighted-sum transfer function for each hidden unit. Sigmoid, hyperbolic tangent -> helps with differentiation; rectifiers are becoming the norm in deep learning. Lots of small impulses: if below a threshold nothing happens, but if you cross the threshold, you get an action potential.
  19. Texture neurons, hair/face/body-part neurons, posture neurons
  20. White dog example
  21. Paul Allen AI Research Institute project: computer learning from text and graphics to pass the 4th grade exam on its own