research-article

Keyframe Extraction from Motion Capture Sequences with Graph based Deep Reinforcement Learning

Authors:

Zhiyong Wang

MM '21: Proceedings of the 29th ACM International Conference on Multimedia

Pages 5194-5202

https://doi.org/10.1145/3474085.3475635

Published: 17 October 2021

Abstract

Animation production workflows centred around motion capture techniques often require animators to edit the captured motion for various artistic and technical reasons. This process generally relies on a set of keyframes. Unsupervised keyframe selection methods for motion capture sequences are in high demand to reduce laborious manual annotation. However, most existing methods are optimization-based, which limits their flexibility and efficiency and ultimately constrains animators' interaction with and control over the results. To address these limitations, we propose a novel graph-based deep reinforcement learning method for efficient unsupervised keyframe selection. First, a reward function is devised in terms of reconstruction difference, comparing the original sequence with the sequence interpolated from the selected keyframes. The reward is designed to meet three requirements of the animation pipeline: 1) incremental reward, so that interpolated keyframes are evaluated immediately; 2) order insensitivity, for consistent evaluation; and 3) non-diminishing return, so that rewards remain comparable between optimal and sub-optimal solutions. Then, by representing each skeleton frame as a graph, a graph-based deep agent is guided to heuristically select keyframes that maximize the reward. During inference, the reconstruction difference no longer needs to be estimated, which significantly reduces evaluation time. Experimental results on the CMU Mocap dataset demonstrate that the proposed method selects keyframes with high efficiency without noticeably compromising quality compared with state-of-the-art methods.
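The abstract defines the reward through the reconstruction difference between the original sequence and the sequence interpolated from the chosen keyframes. Below is a minimal sketch of such a reward, under assumptions that are not taken from the paper: each frame is a flat list of floats (for example, joint positions), the first and last frames are always keyframes, frames are rebuilt by per-channel linear interpolation (a real pipeline would more likely interpolate rotations, e.g. with quaternion slerp), and the difference is the summed squared error. The names `interpolate`, `reconstruction_error`, `incremental_reward` and `greedy_select` are hypothetical. Because the error depends only on the set of keyframes, the incremental rewards telescope: their sum over any insertion order equals the total error reduction, which is one simple way to obtain order insensitivity.

```python
from typing import List, Sequence, Set

Frame = List[float]  # assumption: one frame = flat list of channel values


def interpolate(seq: Sequence[Frame], keys: Set[int]) -> List[Frame]:
    """Rebuild every frame by linear interpolation between neighbouring keyframes.

    Frames 0 and len(seq) - 1 are always treated as keyframes (assumption).
    """
    ks = sorted(keys | {0, len(seq) - 1})
    out: List[Frame] = []
    for a, b in zip(ks, ks[1:]):
        for t in range(a, b):
            w = (t - a) / (b - a)  # 0 at keyframe a, growing towards keyframe b
            out.append([(1 - w) * x + w * y for x, y in zip(seq[a], seq[b])])
    out.append(list(seq[-1]))
    return out


def reconstruction_error(seq: Sequence[Frame], keys: Set[int]) -> float:
    """Summed squared difference between the original and interpolated sequences."""
    rec = interpolate(seq, keys)
    return sum((x - y) ** 2 for f, g in zip(seq, rec) for x, y in zip(f, g))


def incremental_reward(seq: Sequence[Frame], keys: Set[int], new_key: int) -> float:
    """Reward for adding new_key to keys: the drop in reconstruction error.

    The error depends only on the keyframe set, so the rewards of successive
    additions telescope and their total does not depend on the insertion order.
    """
    return reconstruction_error(seq, keys) - reconstruction_error(seq, keys | {new_key})


def greedy_select(seq: Sequence[Frame], budget: int) -> Set[int]:
    """Hypothetical baseline: add the interior frame with the largest reward, budget times."""
    keys: Set[int] = set()
    candidates = set(range(1, len(seq) - 1))
    for _ in range(min(budget, len(candidates))):
        # sorted() makes ties resolve to the earliest frame deterministically
        best = max(sorted(candidates), key=lambda k: incremental_reward(seq, keys, k))
        keys.add(best)
        candidates.discard(best)
    return keys


if __name__ == "__main__":
    seq = [[float(t * t)] for t in range(5)]  # toy 1-D "motion": 0, 1, 4, 9, 16
    print(reconstruction_error(seq, set()))   # 34.0
    print(incremental_reward(seq, set(), 2))  # 32.0
    print(greedy_select(seq, 1))              # {2}
```

`greedy_select` queries the true reward at every step, so it only stands in for the selection loop; it is not the paper's learned graph-based agent, whose purpose is precisely to avoid computing the reconstruction difference at inference time.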

Supplementary Material

ZIP File (mfp2537aux.zip)

Comparisons of our keyframe selection method with human demonstrations and the optimal directed path finding algorithm.

Size: 16.97 MB
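The supplementary material compares the method with an optimal directed path finding algorithm. As a rough illustration of how keyframe selection can be posed as a path problem, the sketch below finds the k interior keyframes that minimize interpolation error by dynamic programming over a directed acyclic graph of frames. The frame representation, linear interpolation, squared-error cost and the function names `segment_cost` and `optimal_keyframes` are assumptions for illustration; this is not necessarily the algorithm used in the paper's comparison.

```python
from typing import List, Sequence, Tuple

Frame = List[float]  # assumption: one frame = flat list of channel values


def segment_cost(seq: Sequence[Frame], i: int, j: int) -> float:
    """Squared error of linearly interpolating frames i+1..j-1 from keyframes i and j."""
    cost = 0.0
    for t in range(i + 1, j):
        w = (t - i) / (j - i)
        cost += sum(((1 - w) * x + w * y - z) ** 2
                    for x, y, z in zip(seq[i], seq[j], seq[t]))
    return cost


def optimal_keyframes(seq: Sequence[Frame], k: int) -> Tuple[float, List[int]]:
    """Minimum-error choice of k interior keyframes; both endpoints are always kept.

    Frames are nodes of a DAG with an edge i -> j (i < j) weighted by
    segment_cost(seq, i, j). The answer is the cheapest path from frame 0 to
    frame n - 1 that uses exactly k + 1 edges, found by dynamic programming.
    """
    n = len(seq)
    if not 0 <= k <= n - 2:
        raise ValueError("k must be between 0 and len(seq) - 2")
    inf = float("inf")
    cost = [[segment_cost(seq, i, j) if i < j else inf for j in range(n)]
            for i in range(n)]
    # best[m][j]: lowest error reaching frame j with exactly m segments
    best = [[inf] * n for _ in range(k + 2)]
    parent = [[-1] * n for _ in range(k + 2)]
    best[0][0] = 0.0
    for m in range(1, k + 2):
        for j in range(1, n):
            for i in range(j):
                cand = best[m - 1][i] + cost[i][j]
                if cand < best[m][j]:
                    best[m][j], parent[m][j] = cand, i
    # walk the parent pointers back from the last frame
    path, j = [n - 1], n - 1
    for m in range(k + 1, 0, -1):
        j = parent[m][j]
        path.append(j)
    return best[k + 1][n - 1], path[::-1]


if __name__ == "__main__":
    seq = [[float(t * t)] for t in range(5)]  # toy 1-D "motion": 0, 1, 4, 9, 16
    print(optimal_keyframes(seq, 1))  # (2.0, [0, 2, 4])
```

Building the cost table takes O(n³·d) time for n frames with d channels, and the DP adds O(k·n²), so an exhaustive search of this kind becomes expensive on long captures. This illustrates the efficiency concern the abstract raises about optimization-based methods.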

Cited By

Liu A, Hu K, Yue W, Wu Q, Wang Z (2023). Material-Aware Self-Supervised Network for Dynamic 3D Garment Simulation. 2023 IEEE International Conference on Multimedia and Expo (ICME), pp. 630-635. Online publication date: Jul-2023.
https://doi.org/10.1109/ICME55011.2023.00114

Wu W, Hu K, Yue W, Li W, Simic M, Li C, Xiang W, Wang Z (2023). Self-Supervised Multimodal Fusion Network for Knee Osteoarthritis Severity Grading. 2023 International Conference on Digital Image Computing: Techniques and Applications (DICTA), pp. 57-64. Online publication date: 28-Nov-2023.
https://doi.org/10.1109/DICTA60407.2023.00017

Mo C, Hu K, Long C, Wang Z (2023). Continuous Intermediate Token Learning with Implicit Motion Manifold for Keyframe Based Motion Interpolation. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 13894-13903. Online publication date: Jun-2023.
https://doi.org/10.1109/CVPR52729.2023.01335

Weng J, Hu K, Yao T, Wang J, Wang Z (2023). Federated Unsupervised Cluster-Contrastive learning for person Re-identification. Computer Vision and Image Understanding, 237:C. DOI: 10.1016/j.cviu.2023.103831. Online publication date: 1-Dec-2023.
https://dl.acm.org/doi/10.1016/j.cviu.2023.103831

Chen S (2022). Multimedia Meets Deep Reinforcement Learning. IEEE MultiMedia, 29:3, pp. 5-7. DOI: 10.1109/MMUL.2022.3196479. Online publication date: 1-Jul-2022.
https://dl.acm.org/doi/10.1109/MMUL.2022.3196479

Index Terms

Keyframe Extraction from Motion Capture Sequences with Graph based Deep Reinforcement Learning
1. Computing methodologies

Recommendations

Optimal and interactive keyframe selection for motion capture
SA '18: SIGGRAPH Asia 2018 Technical Briefs

Motion capture is increasingly used in games and movies. However, it often requires editing before it can be used. Unfortunately, editing is laborious because of the low-level representation of the data. Existing motion editing methods accomplish modest ...
Multiscale motion saliency for keyframe extraction from motion capture sequences

Motion capture is an increasingly popular animation technique; however data acquired by motion capture can become substantial. This makes it difficult to use motion capture data in a number of applications, such as motion editing, motion understanding, ...
Using motion capture for interactive motion editing
VRCAI '14: Proceedings of the 13th ACM SIGGRAPH International Conference on Virtual-Reality Continuum and its Applications in Industry

Motion capture technology has been widely used for creating character motions. Motion editing is usually also required to adjust captured motions. Because character poses which include joint rotations, body positions, and orientations are high-...

Published In

MM '21: Proceedings of the 29th ACM International Conference on Multimedia

October 2021

5796 pages

ISBN: 9781450386517

DOI: 10.1145/3474085

General Chairs:
Heng Tao Shen, University of Electronic Science & Technology of China, China
Yueting Zhuang, Zhejiang University, China
John R. Smith, IBM, USA

Program Chairs:
Yang Yang, University of Electronic Science and Technology of China, China
Pablo Cesar, CWI & TU Delft, The Netherlands
Florian Metze, Facebook, Inc., USA
Balakrishnan Prabhakaran, University of Texas at Dallas, USA

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

MM '21

Sponsor:

SIGMM

MM '21: ACM Multimedia Conference

October 20 - 24, 2021

Virtual Event, China

Acceptance Rates

Overall Acceptance Rate 995 of 4,171 submissions, 24%

Article Metrics

Total Citations: 5
Total Downloads: 427
Downloads (last 12 months): 124
Downloads (last 6 weeks): 13

Reflects downloads up to 18 Aug 2024

