Santiago Pascual de la Puente
PhD Candidate
Universitat Politecnica de Catalunya
Technical University of Catalonia
Deep Generative Models I
Variational Autoencoders


● What is in here?
● Introduction
● Taxonomy
● Variational Auto-Encoders (VAEs) (DGMs I)
● Generative Adversarial Networks (GANs)
● PixelCNN/Wavenet
● Normalizing Flows (and flow-based gen. models)
○ Real NVP
● Models comparison
● Conclusions


What is in here?


● We are going to make a shift in the modeling paradigm:
discriminative → generative.
● Is this a type of neural network? No → either fully connected
networks, convolutional networks or recurrent networks fit,
depending on how we condition data points (sequentially,
globally, etc.).
● This is a very fun topic with which we can make a network paint,
sing or write.


Planning for our generative trip
● 5/11/2018 → DGM I: Introduction to gen models + VAEs
○ (... in b/w ...) Methodology + RNN lessons
● 19/11/2018 → DGM II: GANs
● 19/11/2018 → DGM III: likelihood models (pixelCNN + flow models)
● 26/11/2018 → Practical lesson on DGMs (code day)




What we are used to doing with Neural Nets
Figure credit: Javier Ruiz
Discriminative model → aka. tell me the probability of some ‘Y’ responses given ‘X’ inputs. Here we
don’t care about the process that generated ‘X’; we just detect some patterns in the input to give
an answer. This gives us the probabilities of some outcomes given some ‘X’ data: P(Y | X).
P(Y = [0,1,0] | X = [pixel_1, pixel_2, …, pixel_784])


What is a generative model?
We have datapoints that emerge from some
generating process (landscapes in nature,
speech from people, etc.)
X = {x_1, x_2, …, x_N}
Having our dataset X with example datapoints
(images, waveforms, written text, etc.), each point x_n
lies in some M-dimensional space.
Each point x_n comes from an M-dimensional
probability distribution P(X) → Model it!


What is a generative model?
We have datapoints that emerge from some generating
process (landscapes in nature, speech from people, etc.)
X = {x_1, x_2, …, x_N}
Y = {y_1, y_2, …, y_N}
Having our dataset X with example datapoints (images,
waveforms, written text, etc.), each point x_n lies in some
M-dimensional space. Each point y_n lies in some
K-dimensional space. M can be different from K.
We can also model joint probabilities to apply
conditioning variables on the generative
process: P(X, Y)


What is a generative model?
1) We want our model with parameters θ = {weights, biases} to output samples
distributed as Pmodel, matching the distribution of our training data Pdata, i.e. P(X).
2) We can then sample points from Pmodel that plausibly look Pdata-distributed.


What is a generative model?
We have not mentioned any network structure
or model input data format, just a requirement
on what we want in the output of our model.
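As a minimal sketch of the two requirements above, assume 1-D data and a Gaussian Pmodel whose parameters θ = (mu, sigma) stand in for weights and biases. The toy data and the helper names `fit_gaussian` and `sample` are illustrative assumptions, not part of the lecture:

```python
import random
import statistics

def fit_gaussian(data):
    """Maximum-likelihood estimate of theta = (mu, sigma) for a 1-D Gaussian."""
    mu = statistics.fmean(data)
    sigma = statistics.pstdev(data, mu)  # population std is the ML estimate
    return mu, sigma

def sample(theta, n, rng):
    """Draw n new points from Pmodel = N(mu, sigma^2)."""
    mu, sigma = theta
    return [rng.gauss(mu, sigma) for _ in range(n)]

rng = random.Random(0)
# Assumption: a toy "training set"; the model never sees the true parameters.
x_train = [rng.gauss(5.0, 2.0) for _ in range(5000)]
theta = fit_gaussian(x_train)    # requirement 1: match Pdata
x_new = sample(theta, 5000, rng)  # requirement 2: sample new points
print(theta)
```

Real data such as images needs a far richer Pmodel than a single Gaussian, which is where neural networks come in.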


What is a generative model?
We can generate any type of data, like speech waveform samples or image pixels.
Waveform: M samples = M dimensions.
Image: scan and unroll the pixels, x3 channels → M = 32x32x3 = 3072 dimensions.
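The "scan and unroll" step can be sketched as follows, assuming the image is stored as a nested list of height x width x channels (the helper name `unroll` is hypothetical):

```python
def unroll(image):
    """Scan an H x W x C nested list row by row into one flat vector."""
    return [value for row in image for pixel in row for value in pixel]

height, width, channels = 32, 32, 3
# Assumption: an all-zero placeholder image, only the shape matters here.
image = [[[0.0] * channels for _ in range(width)] for _ in range(height)]
x = unroll(image)
print(len(x))  # 3072
```

Each image then becomes a single point x in an M-dimensional space with M = 3072.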


What is a generative model?
Our learned model should be able to make up new samples from the
distribution, not just copy and paste existing samples!
Figure from NIPS 2016 Tutorial: Generative Adversarial Networks (I. Goodfellow)


Why Generative Models?
● Model very complex and high-dimensional distributions.
● Be able to generate realistic synthetic samples
○ possibly perform data augmentation
○ simulate possible futures for learning algorithms
● Unsupervised learning: fill blanks in the data
● Manipulate real samples with the assistance of the generative model
○ Example: edit pictures with guidance (photoshop super pro level)


Motivating Applications


Image inpainting
Recover lost information/add enhancing details by learning the natural distribution of pixels.
Figure: original vs. enhanced


Speech Enhancement
Recover lost information/add enhancing details by learning the natural distribution of audio samples.


Speech Synthesis
Spontaneously generate new speech by learning its natural distribution over time.


Image Generation
Spontaneously generate new images by learning their spatial distribution.
Figure credit: I. Goodfellow


Generate a high-resolution version of an image, even introducing made-up but plausible details.
(Ledig et al. 2016)


Generative Models Taxonomy


We will see models that learn the probability density function:
● Explicitly (we impose a known loss function)
○ (1) With approximate density → Variational
Auto-Encoders (VAEs)
○ (2) With tractable density & likelihood-based:
■ PixelCNN, WaveNet.
■ Flow-based models: Real NVP, Glow.
● Implicitly (we “do not know” the loss function)
○ (3) Generative Adversarial Networks (GANs).


Variational Auto-Encoders


Auto-Encoder Neural Network
● Predict at the output the same input data.
● Does not need labels.
x → Encode → c → Decode → x̂


Auto-Encoder Neural Network
● Q: Is an AE a generative model? (How can we make it generate data?)
○ A: This “just” memorizes codes C from our training samples!
Encode Decode


Variational Auto-Encoder
VAE intuitively:
● Introduce a restriction in z, such that our data points x (e.g. images)
are distributed in a latent space (manifold) following a specified
probability density function Z (normally N(0, I)).
Encode → z ~ N(0, I) → Decode
We can then sample z to generate NEW x data points (e.g. images).


Variational Auto-Encoder
● VAE, aka. where Bayesian theory and deep learning collide, in depth:
Figure: sample z, N times → function → generate X, N times.


Variational Auto-Encoder
● VAE, aka. where Bayesian theory and deep learning collide, in depth:
Figure: sample z, N times → generate X, N times.
Maximize the probability of each X under the generative process
→ maximum likelihood framework.
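Written out, the quantity being maximized is the marginal likelihood of each training point. The notation below is an assumption: a decoder with parameters θ and a standard normal prior P(z), following the Doersch (2016) tutorial listed in the references.

```latex
P(X) = \int P(X \mid z; \theta)\, P(z)\, dz,
\qquad P(z) = \mathcal{N}(z;\, 0, I)
```

The integral runs over every possible z, which is what makes direct maximization hard in practice.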


Variational Auto-Encoder
Intuition behind normally distributed z vectors: any output distribution can be achieved from the simple
N(0, I) with powerful mappings.
Who’s the strongest non-linear and learnable mapper in the universe (so far)?
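Before handing the mapping to a neural network, a fixed hand-written function already shows the idea. The sketch below assumes the mapping g(z) = z/10 + z/||z||, in the spirit of the ring example in the Doersch (2016) tutorial; it turns 2-D samples of N(0, I) into points spread around a ring:

```python
import math
import random

def g(z):
    """Fixed non-linear mapping that pushes 2-D Gaussian samples onto a ring."""
    norm = math.hypot(z[0], z[1])
    return [zi / 10.0 + zi / norm for zi in z]

rng = random.Random(0)
z_samples = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(1000)]
x_samples = [g(z) for z in z_samples]

# Every mapped point sits at radius 1 + ||z|| / 10, i.e. just outside the unit circle.
radii = [math.hypot(x[0], x[1]) for x in x_samples]
print(min(radii), max(radii))
```

In a VAE the mapping is not hand-written: it is learned by the decoder network.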


Variational Auto-Encoder
Now to solve the maximum likelihood problem… We’d like to know P(z) and P(X | z). We
introduce P(z | X) as a key piece → sample values z likely to produce X, not just the whole z space.
But P(z | X) is unknown too! Variational inference comes in to play its role: approximate P(z | X)
with Q(z | X).
Key idea behind the variational inference application: find an approximation
function that is good enough to represent the real one → optimization problem.


Variational Auto-Encoder
Neural network perspective
The approximated function starts to shape up as a neural encoder, going from training datapoints x to
the likely z points following Q(z | X), which in turn is similar to the real P(z | X).
Credit: What is a variational autoencoder? (Altosaar 2017)


Variational Auto-Encoder
Neural network perspective
The (latent → data) mapping starts to shape up as a neural decoder, where we go from our sampled z
to the reconstruction, which can have a very complex distribution.
Credit: Altosaar


Variational Auto-Encoder
Continuing with the encoder approximation Q(z | X), we compute the KL divergence with the true P(z | X).
KL divergence
Credit: Kristiadi


Variational Auto-Encoder
Continuing with the encoder approximation Q(z | X), we compute the KL divergence with the true P(z | X).
Bayes rule
Credit: Variational autoencoder: Intuition and implementation (Kristiadi 2017)


Variational Auto-Encoder
Continuing with the encoder approximation Q(z | X), we compute the KL divergence with the true P(z | X).
log P(X) gets out of the expectation because it has no dependency on z.
Credit: Kristiadi


Variational Auto-Encoder
Continuing with the encoder approximation Q(z | X), we compute the KL divergence with the true P(z | X).
Credit: Kristiadi


Variational Auto-Encoder
Continuing with the encoder approximation Q(z | X), we compute the KL divergence with the true P(z | X).
A bit more rearranging with sign and grouping leads us to a new KL term between
Q(z | X) and P(z), i.e. the encoder's approximate distribution and our prior.
Credit: Kristiadi
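The slide equations did not survive extraction. The steps named above (KL divergence, Bayes rule, taking log P(X) out of the expectation, rearranging) can be reconstructed as follows. This is the standard derivation, written in the Q(z | X), P(z | X) notation assumed in these notes:

```latex
\begin{aligned}
D_{KL}\left[Q(z \mid X) \,\|\, P(z \mid X)\right]
  &= E_{z \sim Q}\left[\log Q(z \mid X) - \log P(z \mid X)\right] \\
  &= E_{z \sim Q}\left[\log Q(z \mid X) - \log P(X \mid z) - \log P(z) + \log P(X)\right] \\
  &= E_{z \sim Q}\left[\log Q(z \mid X) - \log P(X \mid z) - \log P(z)\right] + \log P(X) \\[4pt]
\log P(X) - D_{KL}\left[Q(z \mid X) \,\|\, P(z \mid X)\right]
  &= E_{z \sim Q}\left[\log P(X \mid z)\right] - D_{KL}\left[Q(z \mid X) \,\|\, P(z)\right]
\end{aligned}
```

The second line applies Bayes rule, P(z | X) = P(X | z) P(z) / P(X). The last line moves the expectation to the right-hand side and groups log Q(z | X) - log P(z) into the new KL term.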


Variational Auto-Encoder
We finally reach the Variational Auto-Encoder objective function:
● log P(X): log likelihood of our data.
● KL[Q(z | X) || P(z | X)]: not computable and non-negative (KL) approximation error.
● E[log P(X | z)]: reconstruction loss of our data given the latent space → NEURAL DECODER reconstruction loss!
● KL[Q(z | X) || P(z)]: regularization of our latent representation → NEURAL ENCODER projects over the prior.
Credit: Kristiadi
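The four terms can be checked numerically on a toy model where everything is computable. The sketch below assumes a discrete latent z with three values and made-up probabilities (all numbers are hypothetical). It verifies that log P(X) minus the approximation error equals reconstruction minus regularization:

```python
import math

# Assumption: hypothetical toy model, z in {0, 1, 2}, a single observed X.
prior = [0.5, 0.3, 0.2]        # P(z)
likelihood = [0.9, 0.2, 0.1]   # P(X | z) for the observed X
q = [0.6, 0.3, 0.1]            # Q(z | X), an arbitrary approximation

def kl(a, b):
    """KL divergence between two discrete distributions."""
    return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b))

p_x = sum(l * p for l, p in zip(likelihood, prior))           # P(X)
posterior = [l * p / p_x for l, p in zip(likelihood, prior)]  # P(z | X), Bayes rule

log_px = math.log(p_x)                                   # log likelihood of our data
approx_error = kl(q, posterior)                          # KL[Q(z|X) || P(z|X)]
reconstruction = sum(qi * math.log(l) for qi, l in zip(q, likelihood))  # E_Q[log P(X|z)]
regularization = kl(q, prior)                            # KL[Q(z|X) || P(z)]
lower_bound = reconstruction - regularization

print(log_px - approx_error, lower_bound)
```

Because the approximation error is non-negative, the right-hand side is a lower bound on log P(X), which is why maximizing it is a sensible training objective.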


Variational Auto-Encoder
Now, we have to define the shape of Q(z | X) to compute its divergence against the prior (i.e. to properly
condense x samples over the surface of z). Simplest way: distribute it as a normal distribution with
predicted moments: μ(X) and Σ(X).
This allows us to compute the KL divergence with P(z) = N(0, I) in closed form!
Credit: Kristiadi
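A minimal sketch of that closed form, assuming a diagonal covariance and an encoder that predicts the log-variance per latent dimension (a common convention, not confirmed by the slides; the function name is hypothetical):

```python
import math

def kl_to_standard_normal(mu, log_var):
    """KL[N(mu, diag(exp(log_var))) || N(0, I)] in closed form.

    Per dimension: 0.5 * (sigma^2 + mu^2 - 1 - log sigma^2).
    """
    return 0.5 * sum(
        math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var)
    )

print(kl_to_standard_normal([0.0, 0.0], [0.0, 0.0]))   # 0.0: Q already equals the prior
print(kl_to_standard_normal([1.0, -1.0], [0.0, 0.0]))  # 1.0: means pulled away from 0
```

The term is zero only when the predicted moments match the prior, so it penalizes encodings that drift away from N(0, I).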


Variational Auto-Encoder
Encode Decode
We can compose our encoder - decoder setup, and place our VAE losses to regularize and reconstruct.


Variational Auto-Encoder
Encode Decode
Reparameterization trick
But WAIT, how can we backprop through the sampling of z? Not differentiable!


Variational Auto-Encoder
Reparameterization trick
Sample ε ~ N(0, I) and operate with it, multiplying by the predicted standard deviation and summing the predicted mean: z = μ(X) + σ(X) · ε.
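The trick can be sketched in a few lines, assuming a diagonal Gaussian and a predicted log-variance (the function name and the example values are hypothetical):

```python
import math
import random

def reparameterize(mu, log_var, rng):
    """z = mu + sigma * eps with eps ~ N(0, I); the randomness lives only in eps."""
    eps = [rng.gauss(0.0, 1.0) for _ in mu]
    z = [m + math.exp(0.5 * lv) * e for m, lv, e in zip(mu, log_var, eps)]
    return z, eps

rng = random.Random(0)
# Assumption: example encoder outputs, sigma = [1.0, 0.5].
mu, log_var = [2.0, -1.0], [0.0, math.log(0.25)]
z, eps = reparameterize(mu, log_var, rng)

# For a fixed eps, z is a deterministic function of (mu, sigma):
# dz/dmu = 1 and dz/dsigma = eps, so gradients can reach the encoder outputs.
print(z, eps)
```

The sampling node is moved outside the path from the loss to the encoder, which is what makes backpropagation possible.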


Variational Auto-Encoder
Generative behavior
Q: How can we now generate new samples once the underlying generating distribution is learned?
A: We can sample from our prior, for example, discarding the encoder path.


Variational Auto-Encoder
Walking around the z manifold dimensions gives us spontaneous generation of samples with different
shapes, poses, identities, lighting, etc.
MNIST manifold: https://youtu.be/hgyB8RegAlQ
Face manifold: https://www.youtube.com/watch?v=XNZIN7Jh3Sg


Variational Auto-Encoder
Example with MNIST manifold


Variational Auto-Encoder
Example with Faces manifold
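One way such manifold pictures are typically produced is to decode a regular grid of latent points. The sketch below assumes a 2-D latent space and places the grid at quantiles of N(0, 1), so that it covers the region where the prior has mass. The `decode` function is a hypothetical stand-in for a trained decoder network:

```python
from statistics import NormalDist

def latent_grid(n):
    """n x n grid of 2-D latent points spaced by the quantiles of N(0, 1)."""
    std_normal = NormalDist(0.0, 1.0)
    # Equally spaced probabilities, kept away from 0 and 1 where inv_cdf diverges.
    quantiles = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return [[(zx, zy) for zx in quantiles] for zy in quantiles]

def decode(z):
    """Assumption: toy placeholder; a real VAE would run the decoder network here."""
    return [z[0] + z[1], z[0] - z[1]]

grid = latent_grid(5)
samples = [[decode(z) for z in row] for row in grid]
print(grid[2][2])  # centre of the grid, the mode of the prior
```

With a trained decoder, each grid cell would hold one generated image, and neighbouring cells would change smoothly.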


Variational Auto-Encoder
Code show with PyTorch on VAEs!


Variational Auto-Encoder
MNIST model: binary reconstruction case (BCE loss)
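The lecture's PyTorch code is not reproduced in these notes. As a dependency-free sketch of the loss it refers to, the snippet below assumes binarized pixels, a decoder output in (0, 1), and a diagonal Gaussian encoder with predicted log-variance (all names and values are hypothetical):

```python
import math

def bce(x, x_hat, eps=1e-7):
    """Binary cross-entropy summed over pixels (reconstruction term)."""
    total = 0.0
    for xi, pi in zip(x, x_hat):
        pi = min(max(pi, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= xi * math.log(pi) + (1.0 - xi) * math.log(1.0 - pi)
    return total

def vae_loss(x, x_hat, mu, log_var):
    """Reconstruction (BCE) plus KL regularization against N(0, I)."""
    kl = 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))
    return bce(x, x_hat) + kl

x = [1.0, 0.0, 1.0, 0.0]  # toy 4-pixel "image"
good = vae_loss(x, [0.9, 0.1, 0.9, 0.1], [0.0, 0.0], [0.0, 0.0])
bad = vae_loss(x, [0.5, 0.5, 0.5, 0.5], [0.0, 0.0], [0.0, 0.0])
print(good, bad)
```

A confident, correct reconstruction gets a lower loss than an uninformative one; the KL term is zero here because the example encoder output equals the prior.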


GANs are coming in DGMs II ...
The GAN epidemic


Thanks! Questions?


● NIPS 2016 Tutorial: Generative Adversarial Networks (Goodfellow 2016)
● Auto-Encoding Variational Bayes (Kingma & Welling 2013)
● https://wiseodd.github.io/techblog/2016/12/10/variational-autoencoder/
● https://jaan.io/what-is-variational-autoencoder-vae-tutorial/
● Tutorial on Variational Autoencoders (Doersch 2016)

