Abstract
Point-of-view videos recorded by augmented reality glasses contain jitter because they are captured while the user moves through varying environments. Stabilizing such videos is difficult because conventional keypoint-based motion estimation is sensitive to environmental conditions: keypoint trackers are prone to failure in low-texture or dark scenes. To overcome this limitation, we propose a neural network-based motion estimation method for video stabilization. Our network predicts frame-to-frame motion with high accuracy by focusing on global camera motion while ignoring local motion caused by moving objects. Motion prediction takes at most 10 ms, which enables real-time stabilization on modern smartphone hardware. We demonstrate that our method outperforms keypoint-based motion estimation and that the quality of the estimated motion is sufficient for video stabilization. Our network is trainable without ground truth and scales easily to large datasets.
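To make the approach concrete, the sketch below shows one way such an unsupervised frame-to-frame global-motion estimator could be set up in PyTorch. This is our illustration under stated assumptions, not the authors' implementation: the names MotionNet and photometric_loss are hypothetical, the affine parameterization is an assumption (the paper's exact motion model is not given in the abstract), and a faithful version would additionally downweight pixels on moving objects so that only global camera motion drives the loss.

```python
# Minimal sketch (assumption, not the authors' code): a small CNN maps a pair
# of consecutive frames to a global inter-frame motion, trained without ground
# truth via a photometric warping loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MotionNet(nn.Module):
    """Predicts a 2-D affine motion (6 parameters) between two frames."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(6, 32, 7, stride=2, padding=3), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),  # global pooling favors global motion
        )
        self.head = nn.Linear(128, 6)  # affine parameters [a b tx; c d ty]

    def forward(self, prev_frame, cur_frame):
        x = torch.cat([prev_frame, cur_frame], dim=1)  # (B, 6, H, W)
        feat = self.encoder(x).flatten(1)              # (B, 128)
        theta = self.head(feat).view(-1, 2, 3)         # (B, 2, 3)
        # Offset by the identity so training starts from "no motion".
        identity = torch.tensor([[1., 0., 0.], [0., 1., 0.]],
                                device=theta.device)
        return theta + identity

def photometric_loss(prev_frame, cur_frame, theta):
    """Warp cur_frame with the predicted motion and penalize the per-pixel
    difference to prev_frame; no ground-truth motion labels are needed."""
    grid = F.affine_grid(theta, cur_frame.shape, align_corners=False)
    warped = F.grid_sample(cur_frame, grid, align_corners=False)
    return (warped - prev_frame).abs().mean()

# Usage: one unsupervised training step on a pair of frames.
net = MotionNet()
prev = torch.rand(2, 3, 128, 224)
cur = torch.rand(2, 3, 128, 224)
loss = photometric_loss(prev, cur, net(prev, cur))
loss.backward()
```

Because the loss is purely photometric, such a network can be trained on raw video at scale, which is consistent with the abstract's claim that no ground truth is required.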
Cite this paper
Lee, W., et al. (2021). Robust Camera Motion Estimation for Point-of-View Video Stabilization. In: Chen, J.Y.C., Fragomeni, G. (eds.) Virtual, Augmented and Mixed Reality. HCII 2021. Lecture Notes in Computer Science, vol. 12770. Springer, Cham. https://doi.org/10.1007/978-3-030-77599-5_25