Research article
DOI: 10.1145/3474085.3475635

Keyframe Extraction from Motion Capture Sequences with Graph based Deep Reinforcement Learning

Published: 17 October 2021

Abstract

Animation production workflows centred around motion capture often require animators to edit the captured motion for artistic and technical reasons, a process that generally relies on a set of keyframes. Unsupervised keyframe selection methods for motion capture sequences are in high demand because they reduce laborious manual annotation. However, most existing methods are optimization-based, which limits their flexibility and efficiency and ultimately constrains how animators can interact with and control the selection. To address these limitations, we propose a novel graph-based deep reinforcement learning method for efficient unsupervised keyframe selection. First, a reward function is devised in terms of the reconstruction difference between the original sequence and the sequence interpolated from the keyframes. The reward meets three requirements of the animation pipeline: 1) it is incremental, so that interpolated keyframes can be evaluated immediately; 2) it is order-insensitive, for consistent evaluation; and 3) it has non-diminishing returns, so that rewards for optimal and sub-optimal solutions are comparable. Then, by representing each skeleton frame as a graph, a graph-based deep agent is guided to heuristically select keyframes that maximize the reward. During inference, it is no longer necessary to estimate the reconstruction difference, so the evaluation time is reduced significantly. Experimental results on the CMU Mocap dataset demonstrate that the proposed method selects keyframes with high efficiency without clearly compromising quality in comparison with state-of-the-art methods.
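
To make the idea of a reconstruction difference concrete, the following is a minimal sketch and not the paper's actual reward. It assumes linear interpolation between keyframes, a mean squared error over all channels, and that the first and last frames are always kept. The function names (`interpolate_from_keyframes`, `reconstruction_error`, `incremental_reward`) are hypothetical, and the abstract does not specify the interpolation scheme or error metric the authors use.

```python
import numpy as np


def interpolate_from_keyframes(motion, keyframes):
    """Rebuild a sequence by linear interpolation between selected frames.

    Assumption: motion has shape (n_frames, ...) and linear interpolation
    stands in for whatever interpolation an animation pipeline would use.
    """
    motion = np.asarray(motion, dtype=float)
    n_frames = motion.shape[0]
    # Always keep the first and last frames so every frame lies between two keys.
    keys = sorted(set(keyframes) | {0, n_frames - 1})
    frames = np.arange(n_frames)
    flat = motion.reshape(n_frames, -1)
    rebuilt = np.empty_like(flat)
    for d in range(flat.shape[1]):
        # Interpolate each channel independently over the frame index.
        rebuilt[:, d] = np.interp(frames, keys, flat[keys, d])
    return rebuilt.reshape(motion.shape)


def reconstruction_error(motion, keyframes):
    """Mean squared difference between the original and interpolated sequence."""
    motion = np.asarray(motion, dtype=float)
    rebuilt = interpolate_from_keyframes(motion, keyframes)
    return float(np.mean((motion - rebuilt) ** 2))


def incremental_reward(motion, selected, new_keyframe):
    """Hypothetical per-step reward: the error removed by adding one keyframe."""
    before = reconstruction_error(motion, selected)
    after = reconstruction_error(motion, list(selected) + [new_keyframe])
    return before - after


if __name__ == "__main__":
    # Toy one-channel "motion" with a single peak at frame 2.
    toy = np.array([[0.0], [1.0], [2.0], [1.0], [0.0]])
    print(reconstruction_error(toy, []))
    print(incremental_reward(toy, [], 2))
```

In this sketch the error depends only on the set of selected frames, so evaluating the same keyframes in a different order gives the same value. Whether it also satisfies the non-diminishing-return requirement described in the abstract is not something the sketch attempts to show.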

Supplementary Material

ZIP File (mfp2537aux.zip)
Comparisons of our keyframe selection method with human demonstrations and the optimal directed path finding algorithm.


Published In

MM '21: Proceedings of the 29th ACM International Conference on Multimedia
October 2021
5796 pages
ISBN: 9781450386517
DOI: 10.1145/3474085

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. graph convolutional networks
  2. keyframe animation
  3. keyframe extraction
  4. keyframe selection
  5. motion capture
  6. reinforcement learning

Conference

MM '21: ACM Multimedia Conference
October 20 - 24, 2021
Virtual Event, China

Acceptance Rates

Overall Acceptance Rate 995 of 4,171 submissions, 24%
