DOI: https://doi.org/10.1145/3641519.3657447

Iterative Motion Editing with Natural Language

Published: 13 July 2024

Abstract

Text-to-motion diffusion models can generate realistic animations from text prompts, but do not support fine-grained motion editing controls. In this paper, we present a method for using natural language to iteratively specify local edits to existing character animations, a common task in computer animation workflows. Our key idea is to represent a space of motion edits using a set of kinematic motion editing operators (MEOs) whose effects on the source motion are well aligned with user expectations. We provide an algorithm that leverages pre-existing language models to translate textual descriptions of motion edits into source code for programs that define and execute sequences of MEOs on a source animation. We execute MEOs by first translating them into keyframe constraints, and then using diffusion-based motion models to generate output motions that respect these constraints. Through a user study and quantitative evaluation, we demonstrate that our system can perform motion edits that respect the animator's editing intent, remain faithful to the original animation (it edits the original animation, but does not dramatically change it), and yield realistic character animation results.
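
To make the pipeline described above concrete, here is a minimal, hypothetical Python sketch of the flow the abstract names: an LLM-authored program of motion editing operators (MEOs) is lowered to sparse keyframe constraints, which a constraint-respecting motion diffusion model then satisfies. All names here (MEO, to_constraints, diffusion_infill) are illustrative stand-ins, not the authors' actual API; the diffusion step in particular is stubbed out with a trivial overwrite.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Pose = Dict[str, List[float]]   # joint name -> 3D position
Motion = Dict[int, Pose]        # frame index -> pose

@dataclass
class MEO:
    """One kinematic motion editing operator (illustrative stand-in)."""
    joint: str                                    # e.g. "right_wrist"
    frames: Tuple[int, int]                       # frame interval of the edit
    adjust: Callable[[List[float]], List[float]]  # local spatial adjustment

def to_constraints(meo: MEO, source: Motion) -> Motion:
    """Lower an MEO into sparse keyframe constraints on the source motion."""
    start, end = meo.frames
    return {
        frame: {meo.joint: meo.adjust(pose[meo.joint])}
        for frame, pose in source.items()
        if start <= frame <= end
    }

def diffusion_infill(source: Motion, constraints: Motion) -> Motion:
    """Stand-in for a diffusion-based motion model that regenerates the
    animation while honoring keyframe constraints; here it simply copies
    the source and overwrites the constrained joints."""
    edited = {frame: dict(pose) for frame, pose in source.items()}
    for frame, joints in constraints.items():
        edited[frame].update(joints)
    return edited

# A toy source motion and an "LLM-authored" edit program:
# raise the right wrist by 0.3 units over frames 10-20.
source: Motion = {f: {"right_wrist": [0.0, 1.0, 0.0]} for f in range(30)}
program = [MEO("right_wrist", (10, 20),
               lambda p: [p[0], p[1] + 0.3, p[2]])]

for meo in program:
    source = diffusion_infill(source, to_constraints(meo, source))

assert source[15]["right_wrist"][1] == 1.3  # edited inside the interval
assert source[5]["right_wrist"][1] == 1.0   # untouched outside it
```

In the paper's actual system, a real diffusion model would replace diffusion_infill, so frames near the constrained interval are regenerated plausibly rather than copied verbatim; the sketch only shows the data flow.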

Supplemental Material

• Presentation (MP4)
• Appendix (PDF)
• Video (MP4)
• Extra Video (MP4)


Cited By

• (2024) MotionFix: Text-Driven 3D Human Motion Editing. SIGGRAPH Asia 2024 Conference Papers, 1–11. https://doi.org/10.1145/3680528.3687559. Online publication date: 3 Dec 2024.
• (2024) CoMo: Controllable Motion Generation Through Language Guided Pose Code Editing. Computer Vision – ECCV 2024, 180–196. https://doi.org/10.1007/978-3-031-73397-0_11. Online publication date: 3 Nov 2024.
• (2024) BlenderAlchemy: Editing 3D Graphics with Vision-Language Models. Computer Vision – ECCV 2024, 297–314. https://doi.org/10.1007/978-3-031-73024-5_18. Online publication date: 24 Nov 2024.
• (2024) Large Motion Model for Unified Multi-modal Motion Generation. Computer Vision – ECCV 2024, 397–421. https://doi.org/10.1007/978-3-031-72624-8_23. Online publication date: 26 Oct 2024.

Published In

SIGGRAPH '24: ACM SIGGRAPH 2024 Conference Papers
July 2024, 1106 pages
ISBN: 9798400705250
DOI: https://doi.org/10.1145/3641519
Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

1. Character animation
2. large language models
3. motion diffusion
4. motion editing
Qualifiers

• Research-article
• Research
• Refereed limited

Conference

SIGGRAPH '24

Acceptance Rates

Overall Acceptance Rate: 1,822 of 8,601 submissions, 21%

Article Metrics

• Downloads (last 12 months): 393
• Downloads (last 6 weeks): 41

Reflects downloads up to 03 Feb 2025.
