Uncertainty in Deep Learning

This post sheds light on uncertainty in deep learning models. We all realise that deep learning algorithms are growing in popularity and in use by the greater engineering community. Maybe a deep learning model recommended this post to you, or the Spotify song playing in the background. Soon, deep learning models may enter more sensitive domains. Autonomous driving, medical decision making or the legal system might adopt these models too, but what about the uncertainties that the models introduce in such applications?

Training a neural net gets you a point estimate of the weights. Testing a neural net gets you one sample of a softmax distribution. Think about the knowledge lost in those two steps. Moreover, we know that these softmax outputs can easily be fooled: an imperceptible change might flip a classification from school bus (0.95) to ostrich (0.95). This project will not focus on these adversarial examples, but their existence motivates us to take a more elaborate view of neural networks.

This project compares three approaches to uncertainty in neural networks. I talked to various researchers over the past months. There is no consensus on a single approach for obtaining uncertainties, but all researchers agreed that these three approaches point in the right direction.

Overview of bootstrapping, MCMC and variational inference

Our three approaches are bootstrapping, MCMC and variational inference. Before we dive into the details of each, this section will sketch an overarching structure in which to understand these approaches.

The bootstrap follows from the assumption that there is one correct parameter for our model and that we estimate it from a random data source. How can we use this assumption to compute the uncertainty in our parameter? We subsample the training set many times and estimate one parameter vector from each subsample. Because we are uncertain about our model, we maintain this whole set of estimated parameters. At test time, we average the outputs of the model under each parameter vector as our prediction; the variance in the outputs represents the uncertainty. Chapter 8 of The Elements of Statistical Learning explains the bootstrap in more detail.

Another way to view the learning process is that there is one dataset and we learn a distribution over our model parameters. Bayes' rule exemplifies this reasoning. We distill our knowledge of the world into the prior, and the likelihood updates the distribution according to the data. This gives rise to a posterior distribution over parameters. However, this distribution is intractable to compute, so we resort to two approximations.
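
In symbols, writing $D$ for the dataset and $\theta$ for the parameters, the posterior we are after is

$p(\theta \mid D) = \frac{p(D \mid \theta)\, p(\theta)}{\int p(D \mid \theta')\, p(\theta')\, d\theta'}$

and the integral in the denominator is exactly the part that is intractable for a neural network.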

  • Monte Carlo sampling: rather than evaluating the distribution, we draw samples from it. Then we can evaluate any function of the distribution via these samples.
  • Variational inference: rather than evaluating the distribution, we find a close approximation to it. This approximation has a form that we can easily perform calculations over.

Both approximations come with disadvantages. For Monte Carlo methods, our estimate may deviate from the expected value if we have few samples; more samples reduce this variance. For variational inference, we find the best approximation within the chosen family, but we will not know how far this approximate distribution is from the true distribution. In other words, Monte Carlo methods have variance, variational inference has bias.
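
As a toy illustration of the Monte Carlo half of that trade-off (this is not code from the repository), a short numpy sketch shows the estimator's spread shrinking as the number of samples grows, while its mean stays on target:

```python
import numpy as np

rng = np.random.default_rng(0)

# True quantity: E[x^2] for x ~ N(0, 1), which equals 1.
def mc_estimate(num_samples):
    x = rng.normal(size=num_samples)
    return np.mean(x ** 2)

for n in [10, 100, 10_000]:
    estimates = [mc_estimate(n) for _ in range(200)]
    # The mean stays near 1, while the spread shrinks roughly as 1/sqrt(n).
    print(n, np.mean(estimates), np.std(estimates))
```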

Sampling, averages and uncertainty

All three approaches result in multiple samples of the parameter vector. Our interest lies in the prediction for a test input and its uncertainty. How do we get these quantities from the parameter samples?

Our model outputs a softmax distribution. Therefore, we take the average over all these softmax distributions.

$\bar{f}(x) = \frac{1}{|\{\theta\}|} \sum_{\theta_i \in \{\theta\}} \mathrm{softmax}(x;\theta_i)$

For many applications, we need a decision. This will be the class with the highest averaged softmax value, $\delta(x) = \arg\max_k \bar{f}_k(x)$

Our estimate of the uncertainty is less clear. We are working with a softmax distribution, which has no single commonly agreed uncertainty number associated with it. In the literature, I came across three options:

  • Softmax value: the value of the averaged softmax at the decision is used to represent uncertainty, so $\gamma = \max_k \bar{f}_k(x)$
  • Variance in the softmax: the variance of the softmax values at the decision across the different parameter samples is used to represent uncertainty. Define the set of softmax values at the decision, $S = \{\mathrm{softmax}_j(x; \theta_i) \mid j=\delta(x), \theta_i \in \{\theta\} \}$. Then the uncertainty is the variance of this set, $\gamma = \mathrm{var}(S)$. See Section 4 of this paper
  • Entropy of the average softmax: the entropy of the averaged distribution represents the uncertainty, so $\gamma = -\sum_{k=1}^K \bar{f}_k(x) \log \bar{f}_k(x)$. See Section 5.3 of this paper

In this project, we implement all three of them.
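
To make the three metrics concrete, here is a minimal numpy sketch (not the repository code) that computes them from a stack of softmax outputs, one per parameter sample:

```python
import numpy as np

def uncertainty_metrics(softmaxes):
    """softmaxes: array of shape (num_samples, num_classes),
    one softmax vector per sampled parameter vector theta_i."""
    mean_softmax = softmaxes.mean(axis=0)            # \bar{f}(x)
    decision = int(np.argmax(mean_softmax))          # \delta(x)

    softmax_value = mean_softmax[decision]           # gamma = max_k \bar{f}_k(x)
    softmax_variance = softmaxes[:, decision].var()  # variance of the set S
    entropy = -np.sum(mean_softmax * np.log(mean_softmax + 1e-12))

    return decision, softmax_value, softmax_variance, entropy

# Example: five parameter samples, three classes
samples = np.array([[0.7, 0.2, 0.1],
                    [0.6, 0.3, 0.1],
                    [0.8, 0.1, 0.1],
                    [0.5, 0.4, 0.1],
                    [0.7, 0.2, 0.1]])
print(uncertainty_metrics(samples))
```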

Details on the implementations

Bootstrapping

In bootstrapping, we sample multiple datasets with replacement. The Dataloader object has a function bootstrap_yourself to resample the training set for a bootstrap. Then the model is trained num_runs times to obtain the set of parameters.
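
The resampling step itself amounts to drawing indices with replacement. A minimal sketch of this, assuming numpy arrays for the data (the repository's bootstrap_yourself will differ in its details, and train_model is a placeholder for the real training routine):

```python
import numpy as np

def bootstrap_indices(num_examples, rng=np.random.default_rng()):
    # Draw num_examples indices with replacement: some examples appear
    # multiple times, roughly 37% do not appear at all.
    return rng.integers(0, num_examples, size=num_examples)

def bootstrap_ensemble(x_train, y_train, train_model, num_runs):
    # Train num_runs models, one per resampled training set.
    parameters = []
    for _ in range(num_runs):
        idx = bootstrap_indices(len(x_train))
        parameters.append(train_model(x_train[idx], y_train[idx]))
    return parameters
```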

MCMC

We use Langevin dynamics to obtain samples from the posterior over parameters. This implementation follows this paper by Welling and Teh. After a burn_in period, it saves a parameter vector every steps_per_epoch steps.
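
For reference, the stochastic gradient Langevin dynamics update from that paper adds Gaussian noise, with variance equal to the step size, to a stochastic gradient step on the log posterior. A schematic numpy version (not the repository code) is:

```python
import numpy as np

def sgld_step(theta, grad_log_prior, grad_log_lik_minibatch,
              num_data, batch_size, step_size, rng=np.random.default_rng()):
    """One stochastic gradient Langevin dynamics update.

    grad_log_prior:         gradient of log p(theta)
    grad_log_lik_minibatch: sum of gradients of log p(x_i | theta) over the minibatch
    """
    # Scale the minibatch gradient up to the full dataset.
    grad_log_posterior = grad_log_prior + (num_data / batch_size) * grad_log_lik_minibatch
    # Gaussian noise with variance equal to the step size turns SGD into a posterior sampler.
    noise = rng.normal(scale=np.sqrt(step_size), size=theta.shape)
    return theta + 0.5 * step_size * grad_log_posterior + noise
```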

Variational Inference

Honestly, I currently lack some understanding of the variational approach. The implementation follows the papers here and here. At the moment, I understand this literature as a fundamental approach that leads to an intuitive implementation. We are all familiar with dropout and its dropping of weights in a neural network. We can interpret this as fitting a two-spike distribution to the parameter posterior (per weight), with one spike constrained to zero. We obtain samples from this distribution by sampling from these spikes, which amounts to running the model many times with different dropout masks. I hope to update this section as I gain more understanding. The researchers I chatted with on this project also pointed me to this paper
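
In practice, this "MC dropout" view boils down to keeping dropout active at test time and averaging over many stochastic forward passes. A minimal sketch under that assumption (the predict function and its dropout behaviour are placeholders, not the repository's API):

```python
import numpy as np

def mc_dropout_softmaxes(predict, x, num_passes):
    """predict(x) is assumed to run one forward pass with a fresh dropout mask
    and return a softmax vector; dropout stays enabled at test time."""
    return np.stack([predict(x) for _ in range(num_passes)])

# The resulting (num_passes, num_classes) array feeds directly into the
# uncertainty_metrics sketch above.
```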

Experiment

So how do we assess uncertainty in image classification? There is no calibrated ground-truth uncertainty for an arbitrary image, as that would assume a model of the full (history of the) world. However, we can construct images for which we know the uncertainty should increase. We take two approaches: injecting Gaussian noise and rotating the image.
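
Both mutilations are easy to reproduce; a sketch using numpy and scipy (not the repository's exact preprocessing) is:

```python
import numpy as np
from scipy.ndimage import rotate

def add_gaussian_noise(image, sigma, rng=np.random.default_rng()):
    # Inject zero-mean Gaussian noise with standard deviation sigma.
    return image + rng.normal(scale=sigma, size=image.shape)

def rotate_image(image, angle_degrees):
    # Rotate about the centre, keeping the original shape.
    return rotate(image, angle_degrees, reshape=False, order=1)
```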

Results

We experiment with different noise levels or angles of rotation and record the corresponding uncertainty metrics. For each perturbation method, we run num_experiments experiments on batch_size_test images.

[Figure: risks_experiments — uncertainty metrics plotted against the perturbation level]

These diagrams plot the risk numbers against the experiment variable. For different levels of injected noise and different rotation angles, we see the entropy, mean and standard deviation of the softmax. You can reproduce this diagram with plot_risk.py.
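
A minimal matplotlib sketch of this kind of plot (plot_risk.py itself will differ) could be:

```python
import matplotlib.pyplot as plt

def plot_risk(x_values, metrics, x_label):
    """metrics: dict mapping a metric name (e.g. 'entropy') to a list of values,
    one per perturbation level in x_values."""
    for name, values in metrics.items():
        plt.plot(x_values, values, label=name)
    plt.xlabel(x_label)
    plt.ylabel('uncertainty metric')
    plt.legend()
    plt.show()
```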

We also want intuition for the mutilation and its effect on the uncertainty. Therefore, we made these GIFs where the mutilations increase. Red and green titles indicate incorrect/correct classifications.

[GIFs: increasing noise and increasing rotation]

Observations

In these results, there are some interesting observations:

  • When rotating the images, the error quickly shoots up. At 90 degrees rotation the model misclassifies 80% of the images. It's interesting to see how the uncertainty numbers behave under such large error.
  • The entropy of mc_dropout is larger than the other two MC types. In parallel, we notice that its mean softmax value is lower.
  • Even though the entropy and mean softmax of Bootstrapping and Langevin samples are comparable, the standard deviation is lower.

Discussion

At this point, we leave many open ends in this project. No researcher I contacted expressed a firm conclusion on uncertainty in neural networks; much research remains to be done in this area. I hope these diagrams give you a starting point to think about these questions too.

As always, I am curious to hear any comments and questions. Reach me at romijndersrob@gmail.com

Further reading
