Abstract
Recent work has shown that Neural Ordinary Differential Equations (ODEs) can serve as generative models of images from the perspective of Continuous Normalizing Flows (CNFs). Such models offer exact likelihood calculation and invertible generation/density estimation. In this work we introduce a Multi-Resolution variant of such models (MRCNF), by characterizing the conditional distribution over the additional information required to generate a fine image that is consistent with the coarse image. We introduce a transformation between resolutions that leaves the log-likelihood unchanged. We show that this approach yields comparable likelihood values on various image datasets, with improved performance at higher resolutions, fewer parameters, and training on only one GPU. Further, we examine the out-of-distribution properties of MRCNFs, and find that they are similar to those of other likelihood-based generative models.
References
Dinh, L., Sohl-Dickstein, J., Bengio, S.: Density estimation using real nvp. In: International Conference on Learning Representations (2017)
Kingma, D.P., Dhariwal, P.: Glow: Generative flow with invertible 1x1 convolutions. In: Advances in Neural Information Processing Systems, pp. 10215–10224 (2018)
Ho, J., Chen, X., Srinivas, A., Duan, Y., Abbeel, P.: Flow++: Improving flow-based generative models with variational dequantization and architecture design. In: International Conference on Machine Learning (2019)
Yu, J., Derpanis, K., Brubaker, M.: Wavelet flow: Fast training of high resolution normalizing flows. In: Advances in Neural Information Processing Systems (2020)
Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press (2016)
Kingma, D.P., Welling, M.: Auto-encoding variational bayes. Preprint arXiv:1312.6114 (2013)
Chen, R.T.Q., Rubanova, Y., Bettencourt, J., Duvenaud, D.: Neural ordinary differential equations. Adv. Neural Inf. Process. Syst. (2018)
Grathwohl, W., Chen, R.T.Q., Bettencourt, J., Sutskever, I., Duvenaud, D.: Ffjord: Free-form continuous dynamics for scalable reversible generative models. International Conference on Learning Representations (2019)
Finlay, C., Jacobsen, J.-H., Nurbekyan, L., Oberman, A.: How to train your neural ode: the world of jacobian and kinetic regularization. International Conference on Machine Learning (2020)
Lin, Z., Khetan, A., Fanti, G., Oh, S.: Pacgan: The power of two samples in generative adversarial networks. In: Advances in Neural Information Processing Systems, pp. 1498–1507 (2018)
Arjovsky, M., Bottou, L.: Towards principled methods for training generative adversarial networks. Preprint arXiv:1701.04862 (2017)
Berard, H., Gidel, G., Almahairi, A., Vincent, P., Lacoste-Julien, S.: A closer look at the optimization landscapes of generative adversarial networks. In: International Conference on Machine Learning (2020)
Brock, A., Donahue, J., Simonyan, K.: Large scale GAN training for high fidelity natural image synthesis. In: International Conference on Learning Representations (2019)
Shaham, T.R., Dekel, T., Michaeli, T.: Singan: Learning a generative model from a single natural image. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 4570–4580 (2019)
Karras, T., Laine, S., Aittala, M., Hellsten, J., Lehtinen, J., Aila, T.: Analyzing and improving the image quality of stylegan. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8110–8119 (2020)
Vahdat, A., Kautz, J.: Nvae: A deep hierarchical variational autoencoder. In: Advances in Neural Information Processing Systems (2020)
Tabak, E.G., Turner, C.V.: A family of nonparametric density estimation algorithms. Commun. Pur. Appl. Math. 66(2), 145–164 (2013)
Jimenez Rezende, D., Mohamed, S.: Variational inference with normalizing flows. In: International Conference on Machine Learning, pp. 1530–1538 (2015)
Papamakarios, G., Nalisnick, E., Rezende, D.J., Mohamed, S., Lakshminarayanan, B.: Normalizing flows for probabilistic modeling and inference. Preprint arXiv:1912.02762 (2019)
Kobyzev, I., Prince, S., Brubaker, M.: Normalizing flows: An introduction and review of current methods. IEEE Trans. Pattern Anal. Mach. Intell. (2020)
Ghosh, A., Behl, H.S., Dupont, E., Torr, P.H., Namboodiri, V.: Steer: Simple temporal regularization for neural odes. In: Advances in Neural Information Processing Systems (2020)
Onken, D., Fung, S.W., Li, X., Ruthotto, L.: Ot-flow: Fast and accurate continuous normalizing flows via optimal transport. AAAI Conf. Artif. Intell. (2021)
Huang, H.-H., Yeh, M.-Y.: Accelerating continuous normalizing flow with trajectory polynomial regularization. AAAI Conf. Artif. Intell. (2021)
Burt, P.J.: Fast filter transform for image processing. Comput Graphics Image Process 16(1), 20–51 (1981)
Marr, D.: Vision: A computational investigation into the human representation and processing of visual information. (2010)
Witkin, A.P.: Scale-space filtering, pp. 329–332 (1987)
Burt, P., Adelson, E.: The laplacian pyramid as a compact image code. IEEE Trans. Commun. 31(4), 532–540 (1983)
Mallat, S.G.: A theory for multiresolution signal decomposition: the wavelet representation. IEEE Trans. Pattern Anal. Mach. Intell. 11(7), 674–693 (1989)
Lindeberg, T.: Scale-space for discrete signals. IEEE Trans. Pattern Anal. Mach. Intell. 12(3), 234–254 (1990)
Adelson, E.H., Anderson, C.H., Bergen, J.R., Burt, P.J., Ogden, J.M.: Pyramid methods in image processing. RCA Eng. 29(6), 33–41 (1984)
Mallat, S.G., Peyré, G.: A Wavelet Tour of Signal Processing: The Sparse Way (2009)
Yan, H., Du, J., Tan, V.Y.F., Feng, J.: On robustness of neural ordinary differential equations. International Conference on Learning Representations. (2020)
Denton, E.L., Chintala, S., Fergus, R., et al.: Deep generative image models using a laplacian pyramid of adversarial networks. In: Advances in Neural Information Processing Systems, pp. 1486–1494 (2015)
Karras, T., Aila, T., Laine, S., Lehtinen, J.: Progressive growing of gans for improved quality, stability, and variation. In: International Conference on Learning Representations (2018)
Karnewar, A., Wang, O.: Msg-gan: Multi-scale gradients for generative adversarial networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7799–7808 (2020)
Razavi, A., Oord, A., Vinyals, O.: Generating diverse high-fidelity images with vq-vae-2. In: Advances in Neural Information Processing Systems, pp. 14866–14876 (2019)
Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440 (2015)
Radford, A., Metz, L., Chintala, S.: Unsupervised representation learning with deep convolutional generative adversarial networks. Preprint arXiv:1511.06434 (2015)
Oord, A.v.d., Kalchbrenner, N., Kavukcuoglu, K.: Pixel recurrent neural networks. International Conference on Machine Learning. (2016)
Reed, S., Oord, A.v.d., Kalchbrenner, N., Colmenarejo, S.G., Wang, Z., Belov, D., De Freitas, N.: Parallel multiscale autoregressive density estimation. In: International Conference on Machine Learning (2017)
Menick, J., Kalchbrenner, N.: Generating high fidelity images with subscale pixel networks and multidimensional upscaling. In: International Conference on Learning Representations (2019)
Hoogeboom, E., Berg, R.v.d., Welling, M.: Emerging convolutions for generative normalizing flows. In: International Conference on Machine Learning (2019)
Hoogeboom, E., Peters, J., Berg, R., Welling, M.: Integer discrete flows and lossless compression. In: Advances in Neural Information Processing Systems, vol. 32, pp. 12134–12144 (2019). https://proceedings.neurips.cc/paper/2019/file/9e9a30b74c49d07d8150c8c83b1ccf07-Paper.pdf
Song, Y., Meng, C., Ermon, S.: Mintnet: Building invertible neural networks with masked convolutions. In: Advances in Neural Information Processing Systems, pp. 11004–11014 (2019)
Ma, X., Kong, X., Zhang, S., Hovy, E.: Macow: Masked convolutional generative flow. In: Advances in Neural Information Processing Systems, pp. 5893–5902 (2019)
Durkan, C., Bekasov, A., Murray, I., Papamakarios, G.: Neural spline flows. In: Advances in Neural Information Processing Systems, vol. 32, pp. 7511–7522 (2019). https://proceedings.neurips.cc/paper/2019/file/7ac71d433f282034e088473244df8c02-Paper.pdf
Chen, J., Lu, C., Chenli, B., Zhu, J., Tian, T.: Vflow: More expressive generative flows with variational data augmentation. In: International Conference on Machine Learning (2020)
Lee, S.-g., Kim, S., Yoon, S.: Nanoflow: Scalable normalizing flows with sublinear parameter complexity. In: Advances in Neural Information Processing Systems (2020)
Sohl-Dickstein, J., Weiss, E., Maheswaranathan, N., Ganguli, S.: Deep unsupervised learning using nonequilibrium thermodynamics. In: International Conference on Machine Learning, pp. 2256–2265 (2015). PMLR
Ho, J., Jain, A., Abbeel, P.: Denoising diffusion probabilistic models. Adv. Neural Inf. Process. Syst. (2020)
Song, J., Meng, C., Ermon, S.: Denoising diffusion implicit models. International Conference on Learning Representations. (2020)
Song, Y., Ermon, S.: Generative modeling by estimating gradients of the data distribution. Adv. Neural Inf. Process. Syst. (2019)
Song, Y., Ermon, S.: Improved techniques for training score-based generative models. Adv. Neural Inf. Process. Syst. (2020)
Jolicoeur-Martineau, A., Piché-Taillefer, R., Combes, R.T.d., Mitliagkas, I.: Adversarial score matching and improved sampling for image generation. International Conference on Learning Representations. (2021)
Song, Y., Sohl-Dickstein, J., Kingma, D.P., Kumar, A., Ermon, S., Poole, B.: Score-based generative modeling through stochastic differential equations. International Conference on Learning Representations. (2021)
Oord, A., Kalchbrenner, N., Espeholt, L., Vinyals, O., Graves, A., et al.: Conditional image generation with pixelcnn decoders. In: Advances in Neural Information Processing Systems, pp. 4790–4798 (2016)
Child, R., Gray, S., Radford, A., Sutskever, I.: Generating long sequences with sparse transformers. Preprint arXiv:1904.10509. (2019)
Jun, H., Child, R., Chen, M., Schulman, J., Ramesh, A., Radford, A., Sutskever, I.: Distribution augmentation for generative modeling. In: International Conference on Machine Learning, pp. 10563–10576 (2020)
Grcić, M., Grubišić, I., Šegvić, S.: Densely connected normalizing flows. Preprint (2021)
Chen, R.T., Behrmann, J., Duvenaud, D.K., Jacobsen, J.-H.: Residual flows for invertible generative modeling. In: Advances in Neural Information Processing Systems, pp. 9916–9926 (2019)
Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images. Technical Report, University of Toronto. (2009)
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., Fei-Fei, L.: Imagenet: A large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255 (2009). IEEE
Makhzani, A., Shlens, J., Jaitly, N., Goodfellow, I., Frey, B.: Adversarial autoencoders. Preprint arXiv:1511.05644. (2015)
Grover, A., Dhar, M., Ermon, S.: Flow-gan: Combining maximum likelihood and adversarial learning in generative models. In: AAAI Conference on Artificial Intelligence (2018)
Lee, A.X., Zhang, R., Ebert, F., Abbeel, P., Finn, C., Levine, S.: Stochastic adversarial video prediction. Preprint arXiv:1804.01523 (2018)
Beckham, C., Honari, S., Verma, V., Lamb, A.M., Ghadiri, F., Hjelm, R.D., Bengio, Y., Pal, C.: On adversarial mixup resynthesis. In: Advances in Neural Information Processing Systems, pp. 4346–4357 (2019)
Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., Hochreiter, S.: Gans trained by a two time-scale update rule converge to a local nash equilibrium. In: Advances in Neural Information Processing Systems, pp. 6626–6637 (2017)
Theis, L., Oord, A.v.d., Bethge, M.: A note on the evaluation of generative models. In: International Conference on Learning Representations (2016)
Nalisnick, E., Matsukawa, A., Teh, Y.W., Gorur, D., Lakshminarayanan, B.: Do deep generative models know what they don’t know? In: International Conference on Learning Representations (2019)
Serrà, J., Álvarez, D., Gómez, V., Slizovskaia, O., Núñez, J.F., Luque, J.: Input complexity and out-of-distribution detection with likelihood-based generative models. In: International Conference on Learning Representations (2020)
Nalisnick, E., Matsukawa, A., Teh, Y.W., Lakshminarayanan, B.: Detecting out-of-distribution inputs to deep generative models using a test for typicality. Preprint arXiv:1906.02994 (2019)
Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning. NIPS Workshop on Deep Learning and Unsupervised Feature Learning. (2011)
Choi, H., Jang, E., Alemi, A.A.: Waic, but why? generative ensembles for robust anomaly detection. Preprint arXiv:1810.01392 (2018)
Kirichenko, P., Izmailov, P., Wilson, A.G.: Why normalizing flows fail to detect out-of-distribution data. In: Advances in Neural Information Processing Systems, vol. 33 (2020)
Sneyers, J., Wuille, P.: Flif: Free lossless image format based on maniac compression. In: 2016 IEEE International Conference on Image Processing (ICIP), pp. 66–70 (2016). IEEE
Hendrycks, D., Mazeika, M., Dietterich, T.: Deep anomaly detection with outlier exposure. In: International Conference on Learning Representations (2019)
Hendrycks, D., Gimpel, K.: A baseline for detecting misclassified and out-of-distribution examples in neural networks. In: International Conference on Learning Representations (2017)
Liang, S., Li, Y., Srikant, R.: Enhancing the reliability of out-of-distribution image detection in neural networks. In: International Conference on Learning Representations (2018)
Lee, K., Lee, K., Lee, H., Shin, J.: A simple unified framework for detecting out-of-distribution samples and adversarial attacks. In: Advances in Neural Information Processing Systems, pp. 7167–7177 (2018)
Sabeti, E., Høst-Madsen, A.: Data discovery and anomaly detection using atypicality for real-valued data. Entropy 21(3), 219 (2019)
Høst-Madsen, A., Sabeti, E., Walton, C.: Data discovery and anomaly detection using atypicality: Theory. IEEE Trans. Inf. Theory 65(9), 5302–5322 (2019)
Parmar, N., Vaswani, A., Uszkoreit, J., Kaiser, Ł., Shazeer, N., Ku, A., Tran, D.: Image transformer. In: International Conference on Machine Learning (2018)
Chen, X., Mishra, N., Rohaninejad, M., Abbeel, P.: Pixelsnail: An improved autoregressive generative model. In: International Conference on Machine Learning, pp. 864–872 (2018). PMLR
Ho, J., Kalchbrenner, N., Weissenborn, D., Salimans, T.: Axial attention in multidimensional transformers. Preprint arXiv:1912.12180 (2019)
Nielsen, D., Winther, O.: Closing the dequantization gap: Pixelcnn as a single-layer flow. In: Advances in Neural Information Processing Systems (2020)
Kingma, D.P., Salimans, T., Jozefowicz, R., Chen, X., Sutskever, I., Welling, M.: Improving variational inference with inverse autoregressive flow. Preprint arXiv:1606.04934 (2016)
Behrmann, J., Grathwohl, W., Chen, R.T., Duvenaud, D., Jacobsen, J.-H.: Invertible residual networks. In: International Conference on Machine Learning, pp. 573–582 (2019)
Karami, M., Schuurmans, D., Sohl-Dickstein, J., Dinh, L., Duckworth, D.: Invertible convolutional flow. In: Advances in Neural Information Processing Systems, vol. 32, pp. 5635–5645 (2019). https://proceedings.neurips.cc/paper/2019/file/b1f62fa99de9f27a048344d55c5ef7a6-Paper.pdf
Huang, C.-W., Dinh, L., Courville, A.: Augmented normalizing flows: Bridging the gap between generative flows and latent variable models. Preprint arXiv:2002.07101 (2020)
Xiao, C., Liu, L.: Generative flows with matrix exponential. In: International Conference on Machine Learning (2020)
Lu, Y., Huang, B.: Woodbury transformations for deep generative flows. In: Advances in Neural Information Processing Systems (2020)
Hoogeboom, E., Satorras, V.G., Tomczak, J., Welling, M.: The convolution exponential and generalized sylvester flows. In: Advances in Neural Information Processing Systems (2020)
Kelly, J., Bettencourt, J., Johnson, M.J., Duvenaud, D.: Learning differential equations that are easy to solve. In: Advances in Neural Information Processing Systems (2020)
Pascanu, R., Mikolov, T., Bengio, Y.: On the difficulty of training recurrent neural networks. In: International Conference on Machine Learning, pp. 1310–1318 (2013)
Acknowledgements
Chris Finlay contributed to this paper while a postdoc at McGill University; he is now affiliated with Deep Render. His postdoc was funded in part by a Healthy Brains Healthy Lives Fellowship. Adam Oberman was supported by the Air Force Office of Scientific Research under award number FA9550-18-1-0167 and by IVADO. Christopher Pal is funded in part by CIFAR. We thank CIFAR for their support through the CIFAR AI Chairs program. We also thank Samsung for partially supporting Vikram Voleti for this work. We thank Adam Ibrahim, Etienne Denis, Gauthier Gidel, Ioannis Mitliagkas, and Roger Girgis for their valuable feedback.
Funding
Chris Finlay contributed to this paper while a postdoc at McGill University, funded in part by a Healthy Brains Healthy Lives Fellowship. Adam Oberman was supported by the Air Force Office of Scientific Research under award number FA9550-18-1-0167 and by IVADO. Christopher Pal is funded in part by CIFAR. We thank CIFAR for their support through the CIFAR AI Chairs program.
Author information
Authors and Affiliations
Contributions
Vikram Voleti and Chris Finlay brainstormed ideas for improving image generation using the continuous normalizing flows framework of Neural ODEs. Adam Oberman and Christopher Pal provided advice and guidance throughout the project and wrote parts of the paper. With help from Adam Oberman and Christopher Pal, Vikram derived the mathematical framework. With help from Chris Finlay, Vikram designed the experiments, wrote the code, ran the experiments, proposed and carried out the out-of-distribution analysis, and wrote the paper. All authors reviewed the manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Conflict of Interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A Full Table 1
Table 5 presents the full version of Table 1, including additional results that are relevant to the conclusions but were omitted from the main paper for brevity.
Appendix B Qualitative samples
Here we present qualitative samples from our method on the MNIST and CIFAR10 datasets.
Appendix C Simple example of density estimation
For example, if we use the Euler method as our ODE solver, then for density estimation (2) reduces to:
where \(f_s\) is a neural network, \(t_0\) is the "time" at which the state is the image \({\textbf{x}}\), and \(t_1\) is the time at which the state is the noise \({\textbf{z}}\). We start at scale S with an image sample \({\textbf{x}}_S\), and assume \(t_0\) and \(t_1\) are 0 and 1 respectively:
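As an illustration of such a step, the following is a minimal sketch of a single Euler step for density estimation under the FFJORD convention \(d\log p/dt = -\text{tr}(\partial f/\partial {\textbf{x}})\). The toy dynamics f, the exact trace computation, and the standard-normal base density are assumptions made for this sketch; they are not the paper's implementation, which conditions on the coarser image and uses a Hutchinson-style trace estimate.

```python
import torch

def f(x, t):
    # toy dynamics standing in for the trained network f_s
    return torch.tanh(x) * (1.0 - t)

def euler_density_estimate(x, t0=0.0, t1=1.0):
    dt = t1 - t0
    z = x + dt * f(x, t0)                                # one Euler step: image -> noise
    # exact trace of df/dx for this tiny example (FFJORD-style models estimate it instead)
    jac = torch.autograd.functional.jacobian(lambda y: f(y, t0), x)
    trace = torch.diagonal(jac.reshape(x.numel(), x.numel())).sum()
    # FFJORD convention d(log p)/dt = -tr(df/dx) gives log p(x) = log p(z) + dt * trace
    log_pz = torch.distributions.Normal(0.0, 1.0).log_prob(z).sum()
    log_px = log_pz + dt * trace
    return z, log_px

z, log_px = euler_density_estimate(torch.randn(4))
print(z.shape, log_px.item())
```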
Appendix D Simple example of generation
For example, if we use the Euler method as our ODE solver, then for generation (2) reduces to:
i.e., the state is integrated backwards from \(t_1\) (where the state is \({\textbf{z}}_s\)) to \(t_0\) (where it is \({\textbf{x}}_s\)). We start at scale 0 with a noise sample \({\textbf{z}}_0\), and assume \(t_0\) and \(t_1\) are 0 and 1 respectively:
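As a companion to the sketch in Appendix C, the following shows the corresponding single explicit Euler step taken backwards in time for generation; the toy dynamics f and all names are again illustrative assumptions rather than the paper's code.

```python
import torch

def f(x, t):
    # toy dynamics standing in for the trained network f_s
    return torch.tanh(x) * (1.0 - t)

def euler_generate(z, t0=0.0, t1=1.0):
    dt = t1 - t0
    return z - dt * f(z, t1)        # one explicit Euler step backwards in time: noise -> image

z0 = torch.randn(4)                 # noise sample at the coarsest scale
x0 = euler_generate(z0)
print(x0)
```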
Appendix E Models
We used the same neural network architecture as in RNODE [9]. The CNF at each resolution consists of a stack of bl blocks, each a 4-layer convolutional network with 3x3 kernels, softplus activation functions, 64 hidden dimensions, and the time t concatenated to the spatial input. In addition, except at the coarsest resolution, the immediately coarser image is also concatenated with the state. The integration time of each block is [0, 1]. The number of blocks bl and the corresponding total number of parameters are given in Table 6.
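A minimal sketch of this kind of per-resolution dynamics network, assuming a PyTorch-style implementation; the class name, layer wiring, and conditioning interface below are our assumptions, not the released code.

```python
import torch
import torch.nn as nn

class ODEDynamics(nn.Module):
    def __init__(self, in_channels, cond_channels=0, hidden=64):
        super().__init__()
        c = in_channels + cond_channels + 1              # +1 channel for the time t
        self.net = nn.Sequential(
            nn.Conv2d(c, hidden, 3, padding=1), nn.Softplus(),
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.Softplus(),
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.Softplus(),
            nn.Conv2d(hidden, in_channels, 3, padding=1),
        )

    def forward(self, t, x, coarse=None):
        b, _, h, w = x.shape
        t_map = torch.full((b, 1, h, w), float(t), device=x.device)
        inp = [x, t_map]
        if coarse is not None:                           # conditioning at finer resolutions
            inp.append(coarse)
        return self.net(torch.cat(inp, dim=1))

dyn = ODEDynamics(in_channels=3, cond_channels=3)
x = torch.randn(2, 3, 32, 32)
coarse = torch.randn(2, 3, 32, 32)                       # coarser image, upsampled to 32x32
print(dyn(0.5, x, coarse).shape)                         # torch.Size([2, 3, 32, 32])
```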
Appendix F Gradient norm
In order to avoid exploding gradients, we clipped the norm of the gradients [94] to a maximum value of 100.0. When using an adversarial loss, we first clip the gradients from the adversarial loss to a norm of 50.0, add the gradients from the log-likelihood loss, and then clip the summed gradients to a norm of 100.0.
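A minimal sketch of this two-stage clipping in PyTorch; the loss names and the retain_graph handling are assumptions for the sketch, not the paper's training code.

```python
import torch
from torch.nn.utils import clip_grad_norm_

def clipped_backward(model, nll_loss, adv_loss=None):
    if adv_loss is not None:
        adv_loss.backward(retain_graph=True)       # gradients from the adversarial loss
        clip_grad_norm_(model.parameters(), 50.0)  # clip their norm to 50.0
        nll_loss.backward()                        # accumulate the log-likelihood gradients
    else:
        nll_loss.backward()
    clip_grad_norm_(model.parameters(), 100.0)     # clip the summed gradients to 100.0

# toy usage: both losses come from the same forward pass, hence retain_graph=True
model = torch.nn.Linear(4, 1)
out = model(torch.randn(8, 4))
clipped_backward(model, nll_loss=out.pow(2).mean(), adv_loss=out.abs().mean())
```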
Appendix G 8-bit to uniform
The change-of-variables formula gives the change in probability density due to the transformation of \({\textbf{u}}\) to \({\textbf{v}}\):
Specifically, the change of variables from an 8-bit image to an image with pixel values in the range [0, 1] is:
where \(\text {bpd}({\textbf{x}})\) is given by (17).
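For reference, the standard form of this rescaling and the resulting bits per dimension is sketched below in generic notation, under the usual convention that the model density \(p_{\textbf{v}}\) is over images scaled to [0, 1]; this is the textbook conversion, not necessarily the exact statement of (17).

```latex
% Rescaling an 8-bit image u in [0, 256)^D to v = u/256 in [0, 1)^D
% (generic notation; a sketch of the standard conversion, not the paper's equation numbering)
\log p_{\mathbf{u}}(\mathbf{u})
  = \log p_{\mathbf{v}}(\mathbf{v})
    + \log\left|\det\frac{\partial \mathbf{v}}{\partial \mathbf{u}}\right|
  = \log p_{\mathbf{v}}(\mathbf{v}) - D\log 256,
\qquad
\text{bpd}(\mathbf{x}) = -\frac{\log p_{\mathbf{v}}(\mathbf{v})}{D\ln 2} + 8.
```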
Appendix H FID vs. Temperature
Table 7 lists the FID values of images generated by MRCNF models trained on CIFAR10, for different temperature settings of the Gaussian prior.
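A minimal sketch of temperature-scaled sampling, assuming the common convention of scaling the prior noise by the temperature \(T\) (i.e., drawing \({\textbf{z}} \sim \mathcal {N}(0, T^2 I)\)) before the reverse integration; the generate callable below is a placeholder for the reverse-time flow, not the paper's API.

```python
import torch

def sample_with_temperature(generate, shape, temperature=1.0):
    z = temperature * torch.randn(shape)     # scale the Gaussian prior noise by T
    return generate(z)                       # map noise -> image via the reverse flow

# usage with a placeholder generator
images = sample_with_temperature(lambda z: z, (4, 3, 32, 32), temperature=0.7)
print(images.shape)
```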
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Voleti, V., Finlay, C., Oberman, A. et al. Multi-resolution continuous normalizing flows. Ann Math Artif Intell 92, 1295–1317 (2024). https://doi.org/10.1007/s10472-024-09939-5