Week 6 Unsupervised Learning
Definitions
Unsupervised Learning
● Learning patterns from data without human annotations
● e.g., clustering, density estimation, dimensionality reduction
Self-supervised Learning
● Leverage the success of supervised learning without relying on human-provided
supervision (the supervision is generated automatically)
● e.g., mask part of the input and predict the masked information
Semi-supervised Learning
● Learning from data that mostly consists of unlabeled samples
● A small amount of human-labeled data is available as well
Autoencoders
Autoencoders
Find efficient representations of input data that could be used to reconstruct the
original input using two components:
● Encoder
○ Converts the inputs to an internal representation
○ Dimensionality reduction
● Decoder
○ Converts the internal representation to the outputs
○ Generative network
Autoencoders
The number of outputs is the same as the number of inputs
The hourglass shape creates a bottleneck layer: a lower-dimensional representation
Autoencoders
It is forced to learn the most important features in the input data and drop the
unimportant ones
Applications
● Feature Extraction
● Unsupervised Pre-training
● Dimensionality Reduction
● Generate new data
● Anomaly detection → autoencoders reconstruct outliers poorly, so a high reconstruction error flags an anomaly
PyTorch Implementation
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self):
        super(Autoencoder, self).__init__()
        encoding_dim = 32
        # Encoder: compress the flattened 28x28 image into a 32-dimensional code
        self.encoder = nn.Linear(28 * 28, encoding_dim)
        # Decoder: reconstruct the 784-dimensional image from the code
        self.decoder = nn.Linear(encoding_dim, 28 * 28)

    def forward(self, x):
        return self.decoder(self.encoder(x))

criterion = nn.MSELoss()  # reconstruction loss between output and input
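A minimal training sketch (train_loader, the Adam optimizer, and the epoch count are assumptions, not part of the original code):

import torch

model = Autoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(10):
    for images, _ in train_loader:           # labels are ignored (unsupervised)
        x = images.view(images.size(0), -1)  # flatten 28x28 images to 784-dim vectors
        x_hat = model(x)
        loss = criterion(x_hat, x)           # reconstruction error
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()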
Stacked Autoencoders
● Autoencoders can have multiple hidden layers: stacked (deep) autoencoders
● Typically symmetrical with regard to the central coding layer.
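A minimal stacked autoencoder sketch (the intermediate layer size of 128 is an illustrative assumption):

import torch.nn as nn

class StackedAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder and decoder are symmetrical around the 32-dimensional coding layer
        self.encoder = nn.Sequential(
            nn.Linear(28 * 28, 128), nn.ReLU(),
            nn.Linear(128, 32),
        )
        self.decoder = nn.Sequential(
            nn.Linear(32, 128), nn.ReLU(),
            nn.Linear(128, 28 * 28), nn.Sigmoid(),  # pixel values in [0, 1]
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))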
Visualizing Reconstructions
One way to ensure that an autoencoder is properly trained is to compare the inputs
and the outputs.
[Figure: original image, noisy input image, and reconstructed image]
Generating New Images
● Since we are drastically reducing the dimensionality of the image, there has to be
some kind of structure in the codings (i.e. embedding space).
● That is, the network should be able to save space by mapping similar images to
similar embeddings.
● Let’s see how we can exploit this to allow us to generate new types of images.
New Images with Interpolation
● First compute low-dimensional embeddings of two images.
● Then interpolate between the two embeddings and decode those as well!
● Interpolated codings result in new images that are somewhere in between the
two starting images.
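A minimal interpolation sketch, assuming a trained autoencoder as above and two flattened input images x1 and x2:

import torch

def interpolate(model, x1, x2, steps=8):
    # Decode codings linearly interpolated between the embeddings of two images
    with torch.no_grad():
        z1, z2 = model.encoder(x1), model.encoder(x2)
        images = []
        for alpha in torch.linspace(0, 1, steps):
            z = (1 - alpha) * z1 + alpha * z2   # interpolated coding
            images.append(model.decoder(z))     # decode into a new in-between image
    return images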
Plotting Interpolated Codings
We can do this for other image combinations
Plotting Interpolated Codings
What if we randomly select a coding?
The latent space in autoencoders can become disjoint and non-continuous
Variational AutoEncoders (VAE)
VAEs
They are quite different from the autoencoders we have discussed so far:
● Probabilistic → their outputs are partly determined by chance even after training
● Generative → they can generate new instances that look like they were sampled
from the training set.
They impose a distributional constraint on the latent space so that it stays smooth.
VAEs
The encoder outputs a normal distribution with mean µ and standard deviation σ
instead of a fixed embedding.
An embedding is sampled from this distribution, and the decoder decodes the sample to
reconstruct the input.
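A minimal sketch of the encoder side with the reparameterization trick (the 32-dimensional code and the use of log-variance instead of σ directly are common-practice assumptions):

import torch
import torch.nn as nn

class VAEEncoder(nn.Module):
    def __init__(self, encoding_dim=32):
        super().__init__()
        self.fc_mu = nn.Linear(28 * 28, encoding_dim)      # mean of the distribution
        self.fc_logvar = nn.Linear(28 * 28, encoding_dim)  # log-variance of the distribution

    def forward(self, x):
        mu, logvar = self.fc_mu(x), self.fc_logvar(x)
        sigma = torch.exp(0.5 * logvar)
        eps = torch.randn_like(sigma)   # noise sampled from N(0, I)
        z = mu + sigma * eps            # reparameterization trick: z ~ N(mu, sigma^2)
        return z, mu, logvar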
VAEs
We want the encoder distribution q(z|x) to be close to the prior p(z) = N(0, I).
We can use Kullback–Leibler (KL) divergence to measure the difference between two
distributions P(X) and Q(X):
D_KL(P || Q) = Σ_x P(x) log( P(x) / Q(x) )
If we plug the encoder distribution and the prior into the KL divergence of two
multivariate Gaussians, we get:
D_KL( N(µ, σ²) || N(0, I) ) = ½ Σ_i ( σ_i² + µ_i² − 1 − log σ_i² )
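In code, the closed-form KL term can be computed from mu and logvar as returned by the encoder sketch above; combining it with a reconstruction loss is shown only as an assumption:

# KL divergence between N(mu, sigma^2) and N(0, I), summed over latent dimensions
kl = 0.5 * torch.sum(logvar.exp() + mu.pow(2) - 1.0 - logvar, dim=1)
loss = reconstruction_loss + kl.mean()  # reconstruction_loss assumed defined elsewhere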
VAEs
[Diagram: during training, the embedding is sampled from the encoder's distribution; for generating, it is sampled from N(0, I)]
Generating Data
Generate images that look like handwritten digits by training a variational
autoencoder.
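A minimal generation sketch, assuming a trained VAE whose decoder maps a 32-dimensional coding back to a flattened 28×28 image (variable names are assumptions):

import torch

with torch.no_grad():
    z = torch.randn(16, 32)             # sample 16 codings from the prior N(0, I)
    samples = vae.decoder(z)            # decode them into 16 new digit-like images
    samples = samples.view(-1, 28, 28)  # reshape flat vectors back into images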
Intermission
(5 to 10 min break)
Convolutional Autoencoders
Convolutional Autoencoder
Convolutional autoencoders take advantage of spatial information.
● Encoder → Learns visual embedding using convolutional layers
● Decoder → Up-samples the learned visual embedding to match the original size
of the image.
Transposed Convolution
The opposite of a convolution is the transposed convolution (different from an
inverse convolution).
They work with filters, kernels, padding, and strides just like convolutional layers.
Instead of mapping K×K pixels to 1, they can map from 1 pixel to K×K pixels.
The kernels are learned just like normal convolutional kernels.
output dimension = (input dimension − 1) × stride − 2 × padding + kernel size + output padding (assuming dilation = 1)
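For example, an 8×8 input with stride 2, kernel size 3, padding 0, and output padding 0 gives (8 − 1) × 2 − 2 × 0 + 3 + 0 = 17, so the output is 17×17.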
Transposed Convolution
1. Take each pixel of your input image
2. Multiply each value of your kernel by the input pixel to get a weighted kernel
3. Insert it into the output, at the position given by the stride, to create an image
4. Where the outputs overlap, sum them (see the sketch below)
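A minimal sketch of these four steps for a single-channel image (no padding; the function name is an assumption):

import torch

def transposed_conv2d_naive(img, kernel, stride=1):
    H, W = img.shape
    K = kernel.shape[0]
    out = torch.zeros((H - 1) * stride + K, (W - 1) * stride + K)
    for i in range(H):                      # 1. take each input pixel
        for j in range(W):
            weighted = img[i, j] * kernel   # 2. weight the whole kernel by that pixel
            # 3. insert it into the output at the strided position
            # 4. overlapping regions are summed by "+="
            out[i * stride:i * stride + K, j * stride:j * stride + K] += weighted
    return out

This mirrors what nn.ConvTranspose2d does for a single channel with no padding and no bias.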
Padding
The effect is the opposite of what happens with the convolution layers:
1. Compute the output as normal
2. Remove rows and columns around the perimeter
Output padding
● When stride > 1, Conv2d maps multiple input shapes to the same output shape.
● E.g., inputs of size 7×7 and 8×8 both return a 3×3 output for a kernel of size
3×3 with stride=2 (see the check below).
● When applying the transposed convolution with stride=2, it is ambiguous which
output shape to return, 7×7 or 8×8.
● Output padding resolves this ambiguity by effectively increasing the calculated
output shape on one side.
● It is only used to compute the output shape; it does not actually add
zero-padding to the output.
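A quick check of the ambiguity (the layer sizes are only for illustration):

import torch
import torch.nn as nn

conv = nn.Conv2d(1, 1, kernel_size=3, stride=2)
conv(torch.randn(1, 1, 7, 7)).shape  # torch.Size([1, 1, 3, 3])
conv(torch.randn(1, 1, 8, 8)).shape  # torch.Size([1, 1, 3, 3]), same output shape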
Strides
The effect is also the opposite of what happens with the convolution layers:
increasing the stride increases the upsampling effect.
[Figure: transposed convolution with stride s=2]
PyTorch Implementation
A convolution transpose layer with the exact same specifications as the convolution
layer would have the reverse effect on the shape.
convt = nn.ConvTranspose2d(in_channels=16,
                           out_channels=8,
                           kernel_size=5,
                           padding=2)
x = torch.randn(32, 16, 64, 64)
y = convt(x)
y.shape  # torch.Size([32, 8, 64, 64]); stride 1 keeps the spatial size

# With stride=2 the spatial dimensions roughly double:
convt = nn.ConvTranspose2d(in_channels=16,
                           out_channels=8,
                           kernel_size=5,
                           stride=2,
                           padding=2)
y = convt(x)
y.shape  # torch.Size([32, 8, 127, 127])

# output_padding=1 grows the computed shape by one, giving an exact doubling:
convt = nn.ConvTranspose2d(in_channels=16,
                           out_channels=8,
                           kernel_size=5,
                           stride=2,
                           padding=2,
                           output_padding=1)
y = convt(x)
y.shape  # torch.Size([32, 8, 128, 128])
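Putting the pieces together, a minimal convolutional autoencoder sketch (the channel sizes and two-layer depth are illustrative assumptions):

import torch.nn as nn

class ConvAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: stride-2 convolutions halve the spatial size twice (28 -> 14 -> 7)
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        )
        # Decoder: transposed convolutions upsample back to 28x28
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, kernel_size=3, stride=2,
                               padding=1, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, kernel_size=3, stride=2,
                               padding=1, output_padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))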
An embed method exposes the learned representation, e.g. as input to a downstream classifier:

def embed(self, x):
    return self.encoder(x)
Self-Supervised Learning
Self-supervised learning with pretext tasks
What if we could cast unsupervised learning into a supervised setting?
Define proxy (pretext) supervised tasks such that:
● The labels are generated automatically, for free
● Solving the task requires the model to "understand" the content
The challenge is devising the tasks such that they force the model to learn robust
representations.
RotNet
Idea: Rotate images randomly by 0, 90, 180, or 270 degrees and make the model
predict the rotation angle.
If someone is not aware of the concepts of the objects depicted in the images, they
cannot recognize the rotation that was applied to them.
RotNet
The task is multiclass classification with 4 classes (cross-entropy loss), with the
labels generated automatically for free.
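A minimal sketch of generating the free rotation labels (batching details and the function name are assumptions):

import torch

def make_rotation_batch(images):
    # Each image of shape (C, H, W) is rotated by a random multiple of 90 degrees;
    # the multiple (0-3) is the free classification label.
    labels = torch.randint(0, 4, (images.size(0),))
    rotated = torch.stack([
        torch.rot90(img, k=int(k), dims=(1, 2))  # rotate in the spatial plane
        for img, k in zip(images, labels)
    ])
    return rotated, labels  # train with nn.CrossEntropyLoss on 4 classes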
Contrastive Learning
Autoencoding methods:
● Reconstruct the input
● Compute the loss in output space
● Compress all the details
Contrastive methods:
● Contrast pairs of positive/negative samples
● Compute the loss in embedding space
● Compress relevant information
● Require lots of negative examples
SimCLR
[Diagram: SimCLR architecture, a CNN encoder followed by an MLP projection head]
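A minimal sketch of a SimCLR-style contrastive (NT-Xent) loss, assuming z1 and z2 are the MLP-projected embeddings of two augmented views of the same batch of images; the temperature value is an assumption:

import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    # Positive pair: the two views of the same image; negatives: everything else in the batch
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # 2N normalized embeddings
    sim = z @ z.t() / temperature                       # pairwise cosine similarities
    sim.fill_diagonal_(float('-inf'))                   # a sample is never its own negative
    n = z1.size(0)
    # For row i, the positive is the other view: i + n (first half) or i - n (second half)
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)])
    return F.cross_entropy(sim, targets)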
Questions?