Multi-level Gate Feature Aggregation with Spatially Adaptive Batch-Instance Normalization for Semantic Image Synthesis

Long, Jia; Lu, Hongtao

doi:10.1007/978-3-030-67832-6_31

Jia Long & Hongtao Lu

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12572)

Included in the following conference series:

International Conference on Multimedia Modeling

Abstract

In this paper, we focus on generating realistic images from an input semantic layout, a task known as semantic image synthesis. Most previous methods are based on conditional generative adversarial networks (cGANs) built from stacks of convolution, normalization, and non-linearity layers. However, these methods tend to produce blurry regions and distorted structures. They have two limitations: their normalization layers cannot strike a good balance between preserving semantic layout information and accommodating geometric changes, and they cannot effectively aggregate multi-level features. To address these problems, we propose a novel method that incorporates a multi-level gate feature aggregation (GFA) mechanism and spatially adaptive batch-instance normalization (SPAda-BIN) for semantic image synthesis. Experiments on several challenging datasets demonstrate the advantage of the proposed method over existing approaches in terms of both visual fidelity and quantitative metrics.
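The abstract names the two components without giving their equations, so the NumPy sketch below is only one plausible reading under stated assumptions, not the authors' implementation. `spada_bin` blends batch-normalized and instance-normalized activations with a per-channel gate `rho` (the mixing rule of Nam and Kim's batch-instance normalization), then applies a per-pixel scale and shift predicted from the one-hot layout in the style of SPADE, with a 1x1 linear map standing in for SPADE's small convolutional network. `gated_fusion` aggregates same-resolution features from several levels with per-level sigmoid gates, borrowing the gated fully fusion rule of Li et al. (GFF). All function names, shapes, and parameters here are hypothetical.

```python
import numpy as np


def spada_bin(x, seg, rho, w_gamma, w_beta, eps=1e-5):
    """Hypothetical spatially adaptive batch-instance normalization (sketch).

    x:       activations, shape (N, C, H, W)
    seg:     one-hot semantic layout, shape (N, K, H, W)
    rho:     per-channel BN/IN mixing gate, shape (C,), clipped to [0, 1]
    w_gamma: (C, K) per-pixel label-to-scale map (assumed stand-in for a conv net)
    w_beta:  (C, K) per-pixel label-to-shift map (assumed stand-in for a conv net)
    """
    # Batch statistics: one mean/variance per channel over (N, H, W).
    mu_b = x.mean(axis=(0, 2, 3), keepdims=True)
    var_b = x.var(axis=(0, 2, 3), keepdims=True)
    x_bn = (x - mu_b) / np.sqrt(var_b + eps)
    # Instance statistics: one mean/variance per sample and channel over (H, W).
    mu_i = x.mean(axis=(2, 3), keepdims=True)
    var_i = x.var(axis=(2, 3), keepdims=True)
    x_in = (x - mu_i) / np.sqrt(var_i + eps)
    # Blend the two normalizations channel by channel (rho=1: pure BN, rho=0: pure IN).
    r = np.clip(rho, 0.0, 1.0).reshape(1, -1, 1, 1)
    x_hat = r * x_bn + (1.0 - r) * x_in
    # Scale and shift vary per pixel because they are predicted from the layout.
    gamma = np.einsum("ck,nkhw->nchw", w_gamma, seg)
    beta = np.einsum("ck,nkhw->nchw", w_beta, seg)
    return x_hat * (1.0 + gamma) + beta


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def gated_fusion(feats, gate_w, gate_b):
    """Hypothetical gated multi-level aggregation using the GFF fusion rule (sketch).

    feats:  list of L feature maps, each (N, C, H, W), already resized to one shape
    gate_w: list of L weight vectors, each (C,): a 1x1 conv giving one gate map per level
    gate_b: list of L scalar biases
    """
    # One gate map with values in (0, 1) per level, shape (N, 1, H, W).
    gates = [sigmoid(np.einsum("nchw,c->nhw", f, w)[:, None] + b)
             for f, w, b in zip(feats, gate_w, gate_b)]
    fused = []
    for lvl, (f_l, g_l) in enumerate(zip(feats, gates)):
        # Messages from every other level, each weighted by its sender's gate.
        others = sum(g * f for i, (f, g) in enumerate(zip(feats, gates)) if i != lvl)
        # Keep this level's own signal where its gate is high; accept others where it is low.
        fused.append((1.0 + g_l) * f_l + (1.0 - g_l) * others)
    return fused


# Usage on random data: 2 images, 4 channels, 8x8 maps, 3 semantic classes.
rng = np.random.default_rng(0)
x = rng.normal(size=(2, 4, 8, 8))
seg = np.eye(3)[rng.integers(0, 3, size=(2, 8, 8))].transpose(0, 3, 1, 2)
y = spada_bin(x, seg, np.full(4, 0.5), 0.1 * rng.normal(size=(4, 3)), 0.1 * rng.normal(size=(4, 3)))
levels = gated_fusion([y, x, 2.0 * x], [rng.normal(size=4) for _ in range(3)], [0.0] * 3)
print(y.shape, len(levels), levels[0].shape)  # (2, 4, 8, 8) 3 (2, 4, 8, 8)
```

Clipping `rho` mirrors BIN's constraint that the gate stays in [0, 1], and the `1 + gamma` form follows SPADE's convention, so zero-initialized label maps leave the blended normalized activations unchanged. In a real generator the label-to-parameter maps and the gates would be learned convolutions rather than fixed arrays.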

Acknowledgment

This paper is supported by NSFC (Nos. 61772330, 61533012, 61876109), China Next Generation Internet IPv6 project (Grant No. NGII20170609) and Shanghai authentication key Lab. (2017XCWZK01).

Author information

Authors and Affiliations

Key Lab of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China
Jia Long & Hongtao Lu
MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, China
Hongtao Lu

Corresponding author

Correspondence to Hongtao Lu.

Editor information

Editors and Affiliations

Charles University, Prague, Czech Republic
Jakub Lokoč
Charles University, Prague, Czech Republic
Tomáš Skopal
Klagenfurt University, Klagenfurt, Austria
Klaus Schoeffmann
CERTH-ITI, Thessaloniki, Greece
Vasileios Mezaris
Renmin University of China, Beijing, China
Xirong Li
CERTH-ITI, Thessaloniki, Greece
Stefanos Vrochidis
Queen Mary University of London, London, UK
Ioannis Patras

About this paper

Cite this paper

Long, J., Lu, H. (2021). Multi-level Gate Feature Aggregation with Spatially Adaptive Batch-Instance Normalization for Semantic Image Synthesis. In: Lokoč, J., et al. MultiMedia Modeling. MMM 2021. Lecture Notes in Computer Science, vol. 12572. Springer, Cham. https://doi.org/10.1007/978-3-030-67832-6_31

DOI: https://doi.org/10.1007/978-3-030-67832-6_31
Published: 21 January 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-67831-9
Online ISBN: 978-3-030-67832-6
eBook Packages: Computer Science (R0)
