
Bidirectionally Learning Dense Spatio-temporal Feature Propagation Network for Unsupervised Video Object Segmentation

Authors: Jiaqing Fan, Tiankang Su, Kaihua Zhang, Qingshan Liu

MM '22: Proceedings of the 30th ACM International Conference on Multimedia

Pages 3646 - 3655

https://doi.org/10.1145/3503161.3548039

Published: 10 October 2022


Abstract

Spatio-temporal feature representation is essential for accurate unsupervised video object segmentation, which requires an effective propagation paradigm for both appearance and motion features that can fully exchange information across frames. However, existing solutions mainly focus on forward feature propagation from the preceding frame to the current one, using either the previous segmentation mask or frame-by-frame motion propagation. This ignores the bidirectional temporal feature interactions across all frames (including backward propagation from future frames to the current one), which can enhance the spatio-temporal feature representation for segmentation prediction. To this end, this paper presents a novel Dense Bidirectional Spatio-temporal feature propagation Network (DBSNet) that fully integrates forward and backward propagation across all frames. Specifically, a dense bi-ConvLSTM module is first developed to propagate features across all frames in both the forward and backward directions. This captures multi-level spatio-temporal contextual information across all frames, producing a feature representation discriminative enough to distinguish objects from noisy backgrounds. Next, a spatio-temporal Transformer refinement module further enhances the propagated features by capturing spatio-temporal long-range dependencies among all frames. Finally, a Co-operative Direction-aware Graph Attention (Co-DGA) module integrates the propagated appearance-motion cues, yielding a strong spatio-temporal feature representation for segmentation mask prediction. The Co-DGA assigns attentional weights to neighboring points along the coordinate axis, allowing the segmentation model to focus selectively on the most relevant neighbors. Extensive evaluations on four mainstream challenging benchmarks (DAVIS16, FBMS, DAVSOD, and MCL) demonstrate that the proposed DBSNet achieves favorable performance against state-of-the-art methods in terms of all evaluation metrics.
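The idea of propagating features in both temporal directions can be illustrated with a minimal sketch. This is not the authors' dense bi-ConvLSTM: it assumes one scalar feature per frame and swaps in a plain tanh recurrent cell, and the names `propagate`, `bidirectional_propagate`, `w_in`, and `w_rec` are hypothetical. It only shows why a backward pass lets early frames receive information from later ones.

```python
import math

def propagate(frames, w_in=0.8, w_rec=0.5, reverse=False):
    """Run a toy scalar recurrent cell over per-frame features in one direction.

    Assumption: a simple tanh cell stands in for a ConvLSTM cell.
    """
    indices = range(len(frames))
    order = reversed(indices) if reverse else indices
    states = [0.0] * len(frames)
    h = 0.0  # hidden state carried from one frame to the next
    for t in order:
        h = math.tanh(w_in * frames[t] + w_rec * h)
        states[t] = h
    return states

def bidirectional_propagate(frames):
    """Pair each frame's forward state with its backward state."""
    forward = propagate(frames)                 # past -> current
    backward = propagate(frames, reverse=True)  # future -> current
    return list(zip(forward, backward))

# Only the last frame carries a signal, so the first frame can see it
# solely through the backward pass.
features = bidirectional_propagate([0.0, 0.0, 1.0])
```

In this toy input the forward state of the first frame stays at zero while its backward state is positive, which is the cross-frame interaction that forward-only propagation misses.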

Supplementary Material

MP4 File (MM22-fp1209.mp4)

We propose the DBSNet framework for Unsupervised Video Object Segmentation (UVOS), which densely propagates cross-frame spatio-temporal features in both temporal directions. To capture long-range dependencies along both the spatial and temporal dimensions, we design a spatio-temporal Transformer refinement module that aggregates all positions over the current frame and its neighboring frames. Furthermore, we design a Co-DGA module to integrate the appearance and motion cues, so that the model learns mutual knowledge from static and dynamic contexts. The Co-DGA also extracts the implicit structural information of the foreground areas, contributing to a more reliable and fine-grained representation for UVOS. Extensive evaluations on four benchmark datasets demonstrate the advantage and effectiveness of the proposed approach, which substantially outperforms state-of-the-art methods.
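Aggregating all positions over the current and neighboring frames can be sketched as plain scaled dot-product self-attention over the pooled positions. This is an illustration under stated assumptions, not the paper's refinement module: there are no learned query, key, or value projections, each position is a short feature list, and the helper names `flatten_frames` and `attend` are hypothetical.

```python
import math

def flatten_frames(frames):
    """Pool the position features of several frames into one token list."""
    return [token for frame in frames for token in frame]

def attend(tokens):
    """Scaled dot-product self-attention with identity projections.

    Assumption: queries, keys, and values are the tokens themselves.
    """
    dim = len(tokens[0])
    refined = []
    for query in tokens:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        refined.append([
            sum(w * value[i] for w, value in zip(weights, tokens))
            for i in range(dim)
        ])
    return refined

# Two positions from the current frame and one from a neighboring frame.
frames = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0]]]
refined = attend(flatten_frames(frames))
```

Because every position attends to every other position regardless of which frame it came from, the first token here is pulled toward the matching token in the neighboring frame.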


