Panoptic-Level Image-to-Image Translation for Object Recognition and Visual Odometry Enhancement

Published: 22 June 2023

Abstract

Image-to-image translation methods have progressed from considering only image-level information to integrating global- and instance-level information. However, existing methods refine only the foreground instances and treat the background semantics as a single feature, which causes a substantial loss of semantic information during translation. The insufficient quality of the translated semantic regions in turn degrades the object recognition or visual odometry tasks in which the translated images/videos are further used. In this paper, we propose a novel generative adversarial network for panoptic-level image-to-image translation (PanopticGAN). The proposed method has three advantages: 1) the extracted panoptic perception (i.e., the foreground instances and background semantic regions), used as content codes, is aligned with sampled panoptic style codes; this exploits panoptic-level information to avoid semantic information loss, and the latent space of each object richly fuses content and style codes to generate higher-fidelity results; 2) a feature masking module extracts the representations within each object contour by masks, sharpening object boundaries; 3) the improved fidelity of the translated semantic regions further enhances the object recognition or visual odometry tasks in which the translated images/videos are used. We also annotate a compact panoptic segmentation dataset for the thermal-to-color translation task. Extensive experiments demonstrate the effectiveness of PanopticGAN over the latest methods.
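
To make the two mechanisms named in the abstract concrete, here is a minimal, hypothetical PyTorch-style sketch (not the authors' implementation): masking encoder features to the contour of each panoptic segment, and fusing the resulting per-object content code with a sampled style code via an AdaIN-style modulation. All function names, tensor shapes, and the choice of AdaIN-style fusion are illustrative assumptions.

```python
# Hypothetical sketch of the ideas described in the abstract; names and
# shapes are assumptions, not the authors' released code.
import torch
import torch.nn.functional as F


def masked_object_features(feature_map: torch.Tensor,
                           object_mask: torch.Tensor) -> torch.Tensor:
    """Isolate the representation inside one object contour.

    feature_map: (B, C, H, W) encoder features.
    object_mask: (B, 1, H0, W0) binary panoptic mask of one instance/region.
    """
    # Resize the panoptic mask to the feature resolution.
    mask = F.interpolate(object_mask.float(),
                         size=feature_map.shape[-2:], mode="nearest")
    # Zero out activations outside the object boundary so the per-object
    # content code does not bleed across contours.
    return feature_map * mask


def adain_fuse(content: torch.Tensor, style: torch.Tensor,
               eps: float = 1e-5) -> torch.Tensor:
    """Fuse a per-object content code with a sampled style code.

    content: (B, C, H, W) masked content features of one object.
    style:   (B, 2*C) sampled style code giving per-channel scale and bias.
    """
    b, c = content.shape[:2]
    flat = content.view(b, c, -1)
    mean = flat.mean(dim=2).view(b, c, 1, 1)
    std = flat.std(dim=2).view(b, c, 1, 1) + eps
    normalized = (content - mean) / std
    # Split the style code into a per-channel scale and bias.
    scale, bias = style.view(b, 2, c, 1, 1).unbind(dim=1)
    return normalized * scale + bias
```

In this reading, each panoptic segment (foreground instance or background stuff region) would get its own masked content code and its own style modulation, which is one plausible way to realize the "panoptic-level" content/style fusion the abstract describes.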

