Abstract
In this study, we propose a novel video retargeting approach that considers two essential factors of a video: the main object and its movement. Both factors are taken into account through the region of interest (ROI) of the target object. In our experiments, we set the main object to be a human so as to retain the interacting objects and the movement in each sequential frame. Our method aims to preserve the ROI to the maximum extent possible under the retargeting constraints imposed by the target resolution. To preserve the original main object, we rely on an object detection model to identify human-oriented objects; subsequently, we conduct a decision-making process to determine the suitability of our scheme. When the proposed method is applied, video frames are split into patches, and frames at the exact target resolution are then generated using a video super-resolution model. The retargeted frames are evaluated with the quality assessment metrics PSNR, SSIM, MS-SSIM, LPIPS, BMPRI, BRISQUE, PIQE, and NIQE. Comparative experiments confirm that the proposed approach maintains the original ratio of important objects and the content of the video. We experimentally demonstrate that the proposed approach can enhance video resolution while ensuring visually pleasing quality and preserving the original important objects.
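As an illustration of the first full-reference metric listed above, the following is a minimal sketch of PSNR using its standard definition, 10·log10(MAX² / MSE). It is not the authors' evaluation code: the function name `psnr`, the list-of-rows frame format, and the 8-bit peak value of 255 are assumptions made for this example, and both frames are assumed to be grayscale and of identical size.

```python
import math


def psnr(reference, test, max_value=255.0):
    """Return the PSNR in dB between two equally sized grayscale frames.

    Each frame is a list of rows of pixel intensities (a toy format
    assumed for this sketch, not the format used in the paper).
    """
    same_rows = len(reference) == len(test)
    if not same_rows or any(len(r) != len(t) for r, t in zip(reference, test)):
        raise ValueError("frames must have the same shape")

    pixel_count = sum(len(row) for row in reference)
    if pixel_count == 0:
        raise ValueError("frames must not be empty")

    # Mean squared error over all pixel pairs.
    squared_error = sum(
        (a - b) ** 2
        for ref_row, test_row in zip(reference, test)
        for a, b in zip(ref_row, test_row)
    )
    mse = squared_error / pixel_count

    # Identical frames have zero error, so PSNR is unbounded.
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    original = [[100, 100], [100, 100]]
    retargeted = [[110, 90], [110, 90]]
    print(f"PSNR: {psnr(original, retargeted):.2f} dB")
```

Higher values indicate a smaller pixel-wise difference from the reference. The no-reference metrics in the list (BMPRI, BRISQUE, PIQE, NIQE) do not need a reference frame and are not covered by this sketch.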








Data availability
Two datasets, DAVIS [19] and TVD [30], are used in this research. The DAVIS [19] dataset can be obtained at https://davischallenge.org/davis2017/code.html, and the TVD [30] dataset can be accessed at https://multimedia.tencent.com/resources/tvd. The retargeted video results can be accessed at http://gofile.me/7apVR/aNiuR7y1d.
References
Jocher G, Stoken A, Chaurasia A, Borovec J, Kwon Y, Michael K et al (2021) ultralytics/yolov5: v6.0 - YOLOv5n 'Nano' models, Roboflow integration, TensorFlow export, OpenCV DNN support. Zenodo
Avidan S, Shamir A (2007) Seam carving for content-aware image resizing. ACM Trans Graph 26(3):10
Bansal A, Ma S, Ramanan D, Sheikh Y (2018) Recycle-GAN: Unsupervised video retargeting. In: Proceedings of the European conference on computer vision (ECCV), pp 119–135
Cheng WH, Wang CW, Wu JL (2006) Video adaptation for small display based on content recomposition. IEEE Trans Circuits Syst Video Technol 17(1):43–58
Cho D, Park J, Oh TH, Tai YW, So Kweon I (2017) Weakly-and self-supervised learning for content-aware deep image retargeting. In: Proceedings of the IEEE international conference on computer vision, pp 4558–4567
Cho D, Jung Y, Rameau F, Kim D, Woo S, Kweon IS (2019) Video Retargeting: trade-off between content preservation and spatio-temporal consistency. In: Proceedings of the 27th ACM international conference on multimedia, pp 882–889
Chu M, Xie Y, Mayer J et al (2020) Learning temporal coherence via self-supervision for GAN-based video generation. ACM Trans Graph (TOG) 39(4):75–81
Dong C, Loy CC, He K et al (2015) Image super-resolution using deep convolutional networks. IEEE Trans Pattern Anal Mach Intell 38(2):295–307
Duchon CE (1979) Lanczos filtering in one and two dimensions. J Appl Meteorol Climatol 18(8):1016–1022
Imani H, Islam MB, Wong LK (2023) Saliency-aware stereoscopic video retargeting. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 1230–1239
Jin JG, Bae J, Baek HG, Park SH (2023) Object-ratio-preserving video retargeting framework based on segmentation and inpainting. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp 497–503
Karras T, Laine S, Aila T (2019) A style-based generator architecture for generative adversarial networks. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 4401–4410
Keys R (1981) Cubic convolution interpolation for digital image processing. IEEE Trans Acoust Speech Signal Process 29(6):1153–1160
Kiess J, Kopf S, Guthier B et al (2018) A survey on content-aware image and video retargeting. ACM Trans Multimed Comput Commun Appl 14(3):28. https://doi.org/10.1145/3231598
Ledig C, Theis L, Huszár F, Caballero J, Cunningham A, Acosta A et al (2017) Photo-realistic single image super-resolution using a generative adversarial network. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4681–4690
Min X, Zhai G, Gu K et al (2018) Blind image quality estimation via distortion aggravation. IEEE Trans Broadcast 64(2):508–517
Mittal A, Moorthy AK, Bovik AC (2011) Blind/referenceless image spatial quality evaluator. In: 2011 conference record of the forty fifth Asilomar conference on signals, systems and computers (ASILOMAR). IEEE, pp 723–727
Ni H, Liu Y, Huang SX, Xue Y (2023) Cross-identity video motion retargeting with joint transformation and synthesis. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp 412–422
Pont-Tuset J, Perazzi F, Caelles S, Arbeláez P, Sorkine-Hornung A, Van Gool L (2017) The 2017 DAVIS challenge on video object segmentation. arXiv preprint arXiv:1704.00675
Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: Unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 779–788
Rubinstein M, Shamir A, Avidan S (2008) Improved seam carving for video retargeting. ACM Trans Graph (TOG) 27(3):1–9
Shocher A, Bagon S, Isola P, Irani M (2019) InGAN: Capturing and retargeting the “DNA” of a natural image. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 4492–4501
Thévenaz P, Blu T, Unser M (2000) Image interpolation and resampling. Handbook Med Imaging, Process Anal 1(1):393–420
Tomar S (2006) Converting video formats with ffmpeg. Linux Journal 2006(146):10
Venkatanath N, Praneeth D, Bh MC, Channappayya SS, Medasani SS (2015) Blind image quality evaluation using perception based features. In: 2015 twenty first national conference on communications (NCC). IEEE, pp 1–6
Wang YS, Lin HC, Sorkine O, Lee TY (2010) Motion-based video retargeting with optimized crop-and-warp. In: ACM SIGGRAPH 2010 papers, pp 1–9
Wang YS, Hsiao JH, Sorkine O, Lee TY (2011) Scalable and coherent video resizing with per-frame optimization. ACM Trans Graph (TOG) 30(4):1–8
Wang Z, Simoncelli EP, Bovik AC (2003) Multiscale structural similarity for image quality assessment. In: The thirty-seventh Asilomar conference on signals, systems & computers, vol 2. IEEE, pp 1398–1402
Wang Z, Bovik AC, Sheikh HR et al (2004) Image quality assessment: from error visibility to structural similarity. IEEE Trans Image Process 13(4):600–612
Xu X, Liu S, Li Z (2021) Tencent video dataset (TVD): A video dataset for learning-based visual data compression and analysis. https://doi.org/10.48550/ARXIV.2105.05961, https://arxiv.org/abs/2105.05961
Yang Z, Zhu W, Wu W, Qian C, Zhou Q, Zhou B, Loy CC (2020) TransMoMo: Invariance-driven unsupervised video motion retargeting. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 5306–5315
Zhang R, Isola P, Efros AA, Shechtman E, Wang O (2018) The unreasonable effectiveness of deep features as a perceptual metric. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 586–595
Granot N, Feinstein B, Shocher A, Bagon S, Irani M (2022) Drop the GAN: In defense of patches nearest neighbors as single image generative models. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 13460–13469
Mittal A, Moorthy AK, Bovik AC (2012) No-reference image quality assessment in the spatial domain. IEEE Trans Image Process 21(12):4695–4708
Mittal A, Soundararajan R, Bovik AC (2012) Making a “completely blind” image quality analyzer. IEEE Signal Process Lett 20(3):209–212
Acknowledgements
This work was supported in part by the Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2021-0-00087, Development of high-quality conversion technology for SD/HD low-quality media) and in part by the BK21 FOUR project (AI-driven Convergence Software Education Research Program) funded by the Ministry of Education, School of Computer Science and Engineering, Kyungpook National University, Korea (4199990214394).
Ethics declarations
Conflict of interests
The authors declare that there are no conflicts of interest.
About this article
Cite this article
Kim, DH., Lee, S., Bae, J. et al. Human-oriented video retargeting via object detection and patch decision. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-18878-6
DOI: https://doi.org/10.1007/s11042-024-18878-6