research-article

Blind 3D Video Stabilization with Spatio-Temporally Varying Motion Blur

Authors:

Xin Xu

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 20, Issue 11

Article No.: 351, Pages 1 - 23

https://doi.org/10.1145/3686159

Published: 13 November 2024

Abstract

Video stabilization is a challenging task that aims to compensate for the overall frame shake introduced during video acquisition. Existing three-dimensional video stabilization methods model the camera's perspective projection through either data-driven training or explicit motion estimation. However, these methods struggle with shaky videos that contain abrupt object movements, which produce local motion blur along the direction of movement. This phenomenon is prevalent in real-world scenarios featuring blind motion in the foreground. Directly combining stabilization and deblurring methods does not handle this situation well: the intensity of motion blur changes continuously throughout a video, and a direct combination underuses spatiotemporal information, providing insufficient cues for cross-frame compensation. To alleviate this problem, the Cross-frame-temporal Module framework is proposed to address blind motion blur induced by various conditions; it uses cross-frame temporal features to estimate depth maps and camera motion. Within this framework, a Blur Transform Network (BTNet) is designed to adapt to spatially varying (non-uniform) motion blur by transforming local regions according to their blur intensity, and a Temporal-Aware Network (TANet) further suppresses motion blur by leveraging cross-frame temporal features. In addition, the scarcity of paired training video data containing motion blur limits the practical application of this approach. The Cross-frame-temporal Module framework therefore adopts an in-test training strategy that requires no pre-training. Extensive experiments demonstrate that our method outperforms state-of-the-art methods.
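
To make the notion of spatially varying motion blur concrete, the following is a minimal illustrative sketch and not the BTNet or TANet of the article. It assumes a simplified model in which each pixel is averaged with its right-hand neighbours over a per-pixel window length; the names `blur_row`, `spatially_varying_blur`, `image`, and `length_map` are hypothetical and chosen only for this example.

```python
def blur_row(row, lengths):
    """Horizontally blur one row; lengths[x] is the window size at pixel x."""
    n = len(row)
    out = []
    for x in range(n):
        k = max(1, lengths[x])  # a window of 1 leaves the pixel unchanged
        # Clamp indices at the right border so the window stays inside the row.
        window = [row[min(x + i, n - 1)] for i in range(k)]
        out.append(sum(window) / k)
    return out


def spatially_varying_blur(image, length_map):
    """Apply a different blur length at every pixel (non-uniform blur)."""
    return [blur_row(row, lengths) for row, lengths in zip(image, length_map)]


# Hypothetical 2x8 grayscale image with a step edge in each row.
image = [
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1, 1],
]
# Assumed blur map: the first row is static, the second moves by 4 pixels.
length_map = [
    [1] * 8,
    [4] * 8,
]

blurred = spatially_varying_blur(image, length_map)
for row in blurred:
    print(row)
```

In this toy case the edge in the first row stays sharp while the edge in the second row is smeared into a ramp, which is why a single global deblurring kernel cannot fit both regions at once.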

Index Terms

Computing methodologies → Artificial intelligence → Computer vision → Image and video acquisition → Motion capture

Information & Contributors

Information

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications Volume 20, Issue 11

November 2024

702 pages

EISSN:1551-6865

DOI:10.1145/3613730

Editor:
Abuabdulmotaleb El Saddik
University of Ottawa

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 November 2024

Online AM: 08 August 2024

Accepted: 25 July 2024

Revised: 24 July 2024

Received: 24 January 2024

Published in TOMM Volume 20, Issue 11

Qualifiers

Research-article

Funding Sources

Natural Science Foundation of China
