research-article

Semi-supervised Video Object Segmentation Via an Edge Attention Gated Graph Convolutional Network

Published: 18 September 2023

Abstract

Video object segmentation (VOS) must cope with heavy occlusion, large deformations, and severe motion blur. Although many convolutional neural networks have been proposed for VOS, they often misidentify background noise as the target or produce coarse object boundaries, because they fail to mine detail information and the high-order correlations among pixels across the whole video. In this work, we propose an edge attention gated graph convolutional network (GCN) for VOS. The seed point initialization and graph construction stages build a spatio-temporal graph of the video by exploring the spatial intra-frame correlation and the temporal inter-frame correlation of superpixels. The node classification stage identifies foreground superpixels with an edge attention gated GCN, which mines high-order correlations between superpixels and propagates features among nodes. The segmentation optimization stage refines the classification of foreground superpixels and reduces segmentation errors with a global appearance model that captures the long-term stable features of objects. In summary, the key contribution of our framework is twofold: (a) the spatio-temporal graph representation can propagate the seed points of the first frame to subsequent frames, which suits our framework to the semi-supervised VOS task; and (b) the edge attention gated GCN can learn the importance of each node with respect to both its neighboring nodes and the whole task using only a small number of layers. Experiments on the DAVIS 2016 and DAVIS 2017 datasets show that our framework achieves excellent performance with only a small training set (45 video sequences).
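
As a rough illustration of the graph construction and feature propagation described in the abstract, the NumPy sketch below builds a toy spatio-temporal superpixel graph and applies one gated graph convolution step. Everything in it is an assumption made for illustration: the function names, the pixel-overlap rule for temporal edges, the toy label maps, and the sigmoid edge gate (a generic gated graph convolution with a residual connection) are not taken from the paper and are not the authors' edge attention layer.

```python
import numpy as np

def spatial_edges(labels):
    """Undirected edges between superpixels that touch inside one frame."""
    edges = set()
    # Compare every pixel with its right neighbour, then its lower neighbour.
    for a, b in ((labels[:, :-1], labels[:, 1:]), (labels[:-1, :], labels[1:, :])):
        mask = a != b
        for u, v in zip(a[mask].tolist(), b[mask].tolist()):
            edges.add((min(u, v), max(u, v)))
    return edges

def temporal_edges(labels_t, labels_t1):
    """Link superpixels of consecutive frames that share a pixel location.
    Assumption: overlap is a crude stand-in for motion-based matching."""
    return set(zip(labels_t.ravel().tolist(), labels_t1.ravel().tolist()))

def build_adjacency(n_nodes, edges):
    """Symmetric 0/1 adjacency matrix from a collection of (u, v) pairs."""
    adj = np.zeros((n_nodes, n_nodes))
    for u, v in edges:
        adj[u, v] = adj[v, u] = 1.0
    return adj

def gated_graph_conv(H, adj, U, V, A, B):
    """One gated graph convolution step with a residual connection.
    gate[i, j] lies in (0, 1)^d and depends on both endpoint features."""
    gate = 1.0 / (1.0 + np.exp(-((H @ A)[:, None, :] + (H @ B)[None, :, :])))
    msg = (H @ V)[None, :, :] * gate            # message j -> i, shape (n, n, d)
    agg = (adj[:, :, None] * msg).sum(axis=1)   # sum over real neighbours only
    return H + np.maximum(H @ U + agg, 0.0)     # ReLU, then residual add

# Toy video: two 2x4 frames; superpixel ids are unique across frames.
frame0 = np.array([[0, 0, 1, 1],
                   [0, 0, 1, 1]])
frame1 = np.array([[2, 2, 2, 3],
                   [2, 2, 2, 3]])

edges = spatial_edges(frame0) | spatial_edges(frame1) | temporal_edges(frame0, frame1)
adj = build_adjacency(4, edges)

rng = np.random.default_rng(0)
d = 2                                            # feature size per superpixel
H = rng.normal(size=(4, d))                      # hypothetical node features
U, V, A, B = (rng.normal(size=(d, d)) for _ in range(4))
H_next = gated_graph_conv(H, adj, U, V, A, B)
```

In this toy setup a seed label attached to superpixel 0 in the first frame can influence superpixel 2 in the second frame after a single step, because the temporal edge connects them; stacking steps lets information reach nodes further away in the graph.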

    Published In

    ACM Transactions on Multimedia Computing, Communications, and Applications  Volume 20, Issue 1
    January 2024
    639 pages
    EISSN:1551-6865
    DOI:10.1145/3613542
    • Editor: Abdulmotaleb El Saddik

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 18 September 2023
    Online AM: 01 August 2023
    Accepted: 17 July 2023
    Revised: 11 July 2023
    Received: 09 August 2022
    Published in TOMM Volume 20, Issue 1

    Author Tags

    1. semi-supervised video object segmentation
    2. superpixel
    3. spatio-temporal graph model
    4. graph convolutional network

    Qualifiers

    • Research-article

    Funding Sources

    • National Key R&D Program of China
    • National Natural Science Foundation of China
    • Science and Technology Planning Project of Guangdong Province
