DOI: 10.1145/3581641.3584069
Open access

SmartRecorder: An IMU-based Video Tutorial Creation by Demonstration System for Smartphone Interaction Tasks

Published: 27 March 2023


This work addresses an active topic in the HCI community: tutorial creation by demonstration. We present SmartRecorder, a novel tool that helps people without video editing skills create video tutorials for smartphone interaction tasks. Because automatic interaction trace extraction is a key component of tutorial generation, we tackle the challenge of automatically extracting user interaction traces on smartphones from screencasts. Unlike prior research in this field, we combine computer vision techniques with IMU-based sensing algorithms, and our technical evaluation shows the importance of smartphone IMU data in improving system performance. From the key information extracted for each step, SmartRecorder first generates initial instructional content. It then provides tutorial creators with a refinement editor for revising that content; the editor's design builds on the high recall (99.38%) of key steps. Finally, SmartRecorder generates video tutorials from the refined instructional content. A user study demonstrates that SmartRecorder lets non-experts create smartphone usage video tutorials in less time and with higher satisfaction from recipients.
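
To make the idea of pairing IMU sensing with screencast analysis more concrete, the following is a minimal sketch, not the authors' algorithm. It assumes that a tap shows up as a brief spike in accelerometer magnitude and that a separate vision stage has already produced timestamps of screen changes. All function names, thresholds, and time windows here are hypothetical choices for illustration only.

```python
from math import sqrt


def imu_tap_candidates(samples, threshold=1.5, refractory=0.2):
    """Return timestamps whose acceleration magnitude deviates from a
    median baseline by more than `threshold` (m/s^2).

    samples: list of (t_seconds, ax, ay, az) tuples (assumed format).
    refractory: minimum gap in seconds between two reported candidates.
    """
    if not samples:
        return []
    mags = [sqrt(ax * ax + ay * ay + az * az) for _, ax, ay, az in samples]
    baseline = sorted(mags)[len(mags) // 2]  # crude estimate of gravity
    candidates = []
    last = None
    for sample, mag in zip(samples, mags):
        t = sample[0]
        spiking = abs(mag - baseline) > threshold
        if spiking and (last is None or t - last >= refractory):
            candidates.append(t)
            last = t
    return candidates


def fuse_with_frame_changes(imu_times, frame_change_times, window=0.3):
    """Keep screen-change timestamps that have an IMU candidate within
    `window` seconds; these are treated as likely interaction steps."""
    return [
        f for f in frame_change_times
        if any(abs(f - t) <= window for t in imu_times)
    ]


def recall(detected, ground_truth, tolerance=0.3):
    """Fraction of ground-truth steps matched by some detection."""
    if not ground_truth:
        return 0.0
    hits = sum(
        1 for g in ground_truth
        if any(abs(g - d) <= tolerance for d in detected)
    )
    return hits / len(ground_truth)
```

In this sketch the IMU signal acts as a filter on vision-based candidates, so screen changes not accompanied by a physical touch (for example, animations) would be discarded. A recall figure such as the one reported in the abstract would be computed by comparing detected steps against annotated ground truth, as `recall` does here under an assumed matching tolerance.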

Supplementary Material

MP4 File (video_figure.mp4)
Video figure of SmartRecorder


Cited By

  • (2024) Exploring Large-Scale Language Models to Evaluate EEG-Based Multimodal Data for Mental Health. Companion of the 2024 on ACM International Joint Conference on Pervasive and Ubiquitous Computing, 412-417. DOI: 10.1145/3675094.3678494. Online publication date: 5-Oct-2024
  • (2024) DesignWatch: Analyzing Users' Operations of Mobile Apps Based on Screen Recordings. Adjunct Proceedings of the 26th International Conference on Mobile Human-Computer Interaction, 1-7. DOI: 10.1145/3640471.3680231. Online publication date: 21-Sep-2024
  • (2024) The future of PIM: pragmatics and potential. Human–Computer Interaction, 1-28. DOI: 10.1080/07370024.2024.2356155. Online publication date: 25-Jun-2024

    Information & Contributors


    Published In

    IUI '23: Proceedings of the 28th International Conference on Intelligent User Interfaces
    March 2023
    972 pages
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].



    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. IMU-based Interaction Trace Extraction
    2. Non-experts
    3. Smartphone Usage Video Tutorial
    4. Tutorial Creation by Demonstration


    • Research-article
    • Research
    • Refereed limited

    Funding Sources

    • Natural Science Foundation of China


    IUI '23

    Acceptance Rates

    Overall Acceptance Rate 746 of 2,811 submissions, 27%

