research-article

Semi-supervised Video Object Segmentation Via an Edge Attention Gated Graph Convolutional Network

Authors:

Baocai Yin

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 20, Issue 1

Article No.: 24, Pages 1–23

https://doi.org/10.1145/3611389

Published: 18 September 2023

Abstract

Video object segmentation (VOS) must cope with heavy occlusions, large deformations, and severe motion blur. Although many convolutional neural networks have been proposed for the VOS task, they often misidentify background noise as the target or produce coarse object boundaries, because they fail to mine detail information and the high-order correlations among pixels across the whole video. In this work, we propose an edge attention gated graph convolutional network (GCN) for VOS. The seed point initialization and graph construction stages build a spatio-temporal graph of the video by exploring the spatial intra-frame correlation and the temporal inter-frame correlation of superpixels. The node classification stage identifies foreground superpixels with an edge attention gated GCN, which mines higher-order correlations between superpixels and propagates features among nodes. The segmentation optimization stage refines the classification of foreground superpixels and reduces segmentation errors by using a global appearance model that captures the long-term stable features of objects. In summary, the key contribution of our framework is twofold: (a) the spatio-temporal graph representation propagates the seed points of the first frame to subsequent frames, which makes the framework suitable for the semi-supervised VOS task; and (b) the edge attention gated GCN learns the importance of each node with respect to both its neighboring nodes and the whole task using only a small number of layers. Experiments on the DAVIS 2016 and DAVIS 2017 datasets show that our framework achieves excellent performance with only a small training set (45 video sequences).
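The graph construction and node classification stages described in the abstract can be illustrated with a minimal NumPy sketch. It assumes superpixels (for example from SLIC) have already been extracted and summarized by a frame index, a normalized centroid, and a mean-color feature. Spatial edges join superpixels of the same frame whose centroids lie within a hypothetical `radius`; temporal edges join each superpixel to its `k_temporal` most similar superpixels in the next frame. The layer follows the residual gated graph ConvNet form of Bresson and Laurent, with per-channel edge gates normalized over each node's neighbors as a stand-in for edge attention. The paper's exact formulation is not given here, so this gating, the seed channel, all parameter names, and the random (untrained) weights are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def build_spatiotemporal_graph(frames, centroids, feats, radius=0.25, k_temporal=2):
    """Return sorted undirected edges (i, j) with i < j over superpixel nodes.

    frames:    (N,) frame index of each superpixel
    centroids: (N, 2) normalized (x, y) centroid of each superpixel
    feats:     (N, F) appearance feature of each superpixel (e.g. mean color)
    radius and k_temporal are hypothetical hyperparameters.
    """
    edges = set()
    n = len(frames)
    for i in range(n):
        for j in range(i + 1, n):
            # Spatial (intra-frame) edge: same frame and nearby centroids.
            same_frame = frames[i] == frames[j]
            if same_frame and np.linalg.norm(centroids[i] - centroids[j]) <= radius:
                edges.add((i, j))
    for t in np.unique(frames):
        cur = np.flatnonzero(frames == t)
        nxt = np.flatnonzero(frames == t + 1)
        if len(nxt) == 0:
            continue
        for i in cur:
            # Temporal (inter-frame) edges: the k closest superpixels in frame t+1,
            # scored by position distance plus appearance distance.
            dist = (np.linalg.norm(centroids[nxt] - centroids[i], axis=1)
                    + np.linalg.norm(feats[nxt] - feats[i], axis=1))
            for j in nxt[np.argsort(dist)[:k_temporal]]:
                edges.add((int(min(i, j)), int(max(i, j))))
    return sorted(edges)


def edge_gated_gcn_layer(h, edges, params, eps=1e-6):
    """One residual edge-gated graph convolution over node features h (N, d).

    gate_ij = sigmoid(C h_i + D h_j)           per-channel edge gate
    att_ij  = gate_ij / (sum_k gate_ik + eps)  normalized over i's neighbors
    h_i'    = h_i + ReLU(A h_i + sum_j att_ij * (B h_j))
    """
    A, B, C, D = params
    n, d = h.shape
    # Use both directions of every undirected edge: messages flow src -> dst.
    src = np.array([i for i, _ in edges] + [j for _, j in edges], dtype=int)
    dst = np.array([j for _, j in edges] + [i for i, _ in edges], dtype=int)
    gate = sigmoid(h[dst] @ C.T + h[src] @ D.T)
    gate_sum = np.zeros((n, d))
    np.add.at(gate_sum, dst, gate)             # sum of gates entering each node
    att = gate / (gate_sum[dst] + eps)
    agg = np.zeros((n, d))
    np.add.at(agg, dst, att * (h[src] @ B.T))  # attention-weighted messages
    return h + np.maximum(0.0, h @ A.T + agg)


# Toy example: 2 frames x 3 superpixels with random (hypothetical) data.
rng = np.random.default_rng(0)
frames = np.array([0, 0, 0, 1, 1, 1])
centroids = rng.random((6, 2))
feats = rng.random((6, 3))
# Hypothetical seed channel: first-frame mask labels, 0.5 = unknown.
seed = np.array([1.0, 0.0, 0.0, 0.5, 0.5, 0.5])
h = np.hstack([feats, seed[:, None]])
edges = build_spatiotemporal_graph(frames, centroids, feats)

d = h.shape[1]
params = [rng.normal(scale=0.5, size=(d, d)) for _ in range(4)]  # untrained
for _ in range(2):  # a small number of layers
    h = edge_gated_gcn_layer(h, edges, params)
fg_prob = sigmoid(h @ rng.normal(size=d))  # per-superpixel foreground score
print(edges)
print(fg_prob.round(3))
```

Because every message is scaled by a gate computed from both endpoints, a node can down-weight neighbors that disagree with it, which is one plausible way a shallow GCN can weigh nodes relative to their neighbors. In a real pipeline the weights would be trained, and the resulting foreground scores would then be passed to the segmentation optimization stage.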

Index Terms

Computing methodologies → Artificial intelligence → Computer vision → Computer vision problems → Video segmentation

Information

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications Volume 20, Issue 1

January 2024

639 pages

EISSN: 1551-6865

DOI: 10.1145/3613542

Editor:
Abdulmotaleb El Saddik
Mohamed Bin Zayed University of Artificial Intelligence, UAE and University of Ottawa, Canada

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 18 September 2023

Online AM: 01 August 2023

Accepted: 17 July 2023

Revised: 11 July 2023

Received: 09 August 2022

Published in TOMM Volume 20, Issue 1

Funding Sources

National Key R&D Program of China
National Natural Science Foundation of China
Science and Technology Planning Project of Guangdong Province

Article Metrics

Total Citations: 0
Total Downloads: 265
Downloads (last 12 months): 196
Downloads (last 6 weeks): 7

Reflects downloads up to 04 Oct 2024.
