Abstract
In this paper, we focus on generating realistic images from an input semantic layout, a task also known as semantic image synthesis. Most previous methods are based on conditional generative adversarial networks built from stacks of convolution, normalization, and non-linearity layers. However, these methods tend to generate blurred regions and distorted structures. We identify two limitations: their normalization layers cannot strike a good balance between preserving semantic layout information and accommodating geometric changes, and they cannot effectively aggregate multi-level features. To address these problems, we propose a novel method that incorporates a multi-level gate feature aggregation mechanism (GFA) and spatially adaptive batch-instance normalization (SPAda-BIN) for semantic image synthesis. Experiments on several challenging datasets demonstrate the advantage of the proposed method over existing approaches in terms of both visual fidelity and quantitative metrics.
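The abstract names the two components but does not give their formulas, so the following NumPy sketch is only one plausible reading, not the authors' implementation. It assumes that SPAda-BIN blends batch-normalized and instance-normalized activations with a per-channel weight rho (as in batch-instance normalization) and then applies a scale and shift that vary per pixel according to the semantic layout, and that GFA sums multi-level features weighted by sigmoid gates. All function names, the per-class tables w_gamma and w_beta (a stand-in for a small learned network), and the gate inputs are hypothetical.

```python
import numpy as np

def batch_instance_norm(x, rho, eps=1e-5):
    """Blend batch- and instance-normalized activations.
    x: (N, C, H, W); rho: (C,) blend weights, clipped to [0, 1]."""
    # Batch statistics: one mean/variance per channel over N, H, W.
    mu_b = x.mean(axis=(0, 2, 3), keepdims=True)
    var_b = x.var(axis=(0, 2, 3), keepdims=True)
    x_b = (x - mu_b) / np.sqrt(var_b + eps)
    # Instance statistics: one mean/variance per sample and channel over H, W.
    mu_i = x.mean(axis=(2, 3), keepdims=True)
    var_i = x.var(axis=(2, 3), keepdims=True)
    x_i = (x - mu_i) / np.sqrt(var_i + eps)
    rho = np.clip(rho, 0.0, 1.0).reshape(1, -1, 1, 1)
    return rho * x_b + (1.0 - rho) * x_i

def spatially_adaptive_bin(x, layout, w_gamma, w_beta, rho):
    """Assumed SPAda-BIN form: normalize, then modulate per pixel.
    layout: (N, K, H, W) one-hot semantic map.
    w_gamma, w_beta: (K, C) hypothetical per-class scale/shift tables."""
    # Each pixel looks up the scale and shift of its semantic class.
    gamma = np.einsum("nkhw,kc->nchw", layout, w_gamma)
    beta = np.einsum("nkhw,kc->nchw", layout, w_beta)
    return batch_instance_norm(x, rho) * (1.0 + gamma) + beta

def gated_aggregate(features, gate_logits):
    """Assumed GFA form: sigmoid-gated sum of multi-level features.
    features: list of (N, C, H, W) arrays at a common resolution.
    gate_logits: list of (N, 1, H, W) arrays, one per level."""
    gates = [1.0 / (1.0 + np.exp(-g)) for g in gate_logits]
    return sum(g * f for g, f in zip(gates, features))
```

In this sketch, rho = 1 reduces a channel to batch normalization and rho = 0 to instance normalization, while the layout-dependent scale and shift let regions of different semantic classes be modulated differently after normalization.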
Acknowledgment
This paper is supported by NSFC (Nos. 61772330, 61533012, 61876109), China Next Generation Internet IPv6 project (Grant No. NGII20170609) and Shanghai authentication key Lab. (2017XCWZK01).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Long, J., Lu, H. (2021). Multi-level Gate Feature Aggregation with Spatially Adaptive Batch-Instance Normalization for Semantic Image Synthesis. In: Lokoč, J., et al. MultiMedia Modeling. MMM 2021. Lecture Notes in Computer Science, vol 12572. Springer, Cham. https://doi.org/10.1007/978-3-030-67832-6_31
DOI: https://doi.org/10.1007/978-3-030-67832-6_31
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-67831-9
Online ISBN: 978-3-030-67832-6
eBook Packages: Computer Science, Computer Science (R0)