Abstract
Learning feature representations for images and generating high-quality images in an unsupervised setting remain challenging. One of the main difficulties in feature learning is posterior collapse in variational inference. This paper proposes a hierarchical aggregated vector-quantized variational autoencoder, called TransVQ-VAE. First, multi-scale feature information from a hierarchical Transformer is complementarily encoded to represent the global and structural dependencies of the input features. Then, the encoding is compared against the latent embedding space by linear difference to reduce the feature dimensionality. Finally, the decoder generates synthetic samples with higher diversity and fidelity than previous methods. In addition, we propose a dual self-attention module in the encoding process that exploits spatial and channel information to capture long-range texture correlations, improving the consistency and realism of the generated images. Experimental results on the MNIST, CIFAR-10, CelebA-HQ, and ImageNet datasets show that our approach significantly improves the diversity and visual quality of the generated images.
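The quantization step the abstract describes — mapping each encoded feature vector to its closest entry in a discrete latent embedding space — follows the standard VQ-VAE lookup. The sketch below is illustrative only (plain Python, tiny hand-made codebook); the function name `nearest_code` and the example vectors are ours, not from the paper.

```python
def nearest_code(z, codebook):
    """Return the index of the codebook vector closest to z
    (squared Euclidean distance), i.e. the VQ-VAE assignment step."""
    best_k, best_d = 0, float("inf")
    for k, c in enumerate(codebook):
        d = sum((zi - ci) ** 2 for zi, ci in zip(z, c))
        if d < best_d:
            best_k, best_d = k, d
    return best_k

# Toy codebook with K = 3 entries of dimension D = 2.
codebook = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]

idx = nearest_code([0.9, 1.2], codebook)  # closest to [1.0, 1.0]
z_q = codebook[idx]                        # quantized latent passed to the decoder
```

At training time the non-differentiable `argmin` is bypassed with a straight-through estimator, and codebook/commitment losses pull the embeddings and encoder outputs toward each other.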
Supported by the National Social Science Fund of China (21BTJ071).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Jin, C., Zheng, A., Wu, Z., Tong, C. (2023). TransVQ-VAE: Generating Diverse Images Using Hierarchical Representation Learning. In: Iliadis, L., Papaleonidas, A., Angelov, P., Jayne, C. (eds) Artificial Neural Networks and Machine Learning – ICANN 2023. ICANN 2023. Lecture Notes in Computer Science, vol 14256. Springer, Cham. https://doi.org/10.1007/978-3-031-44213-1_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-44212-4
Online ISBN: 978-3-031-44213-1