research-article

Open access

SmartRecorder: An IMU-based Video Tutorial Creation by Demonstration System for Smartphone Interaction Tasks

Authors:

Aaron J Quigley,

Yuanchun Shi

IUI '23: Proceedings of the 28th International Conference on Intelligent User Interfaces

Pages 278–293

https://doi.org/10.1145/3581641.3584069

Published: 27 March 2023

Abstract

This work focuses on an active topic in the HCI community: tutorial creation by demonstration. We present SmartRecorder, a novel tool that helps people without video editing skills create video tutorials for smartphone interaction tasks. Because automatic interaction trace extraction is a key component of tutorial generation, we tackle the challenges of automatically extracting users' interaction traces on smartphones from screencasts. Unlike prior research in this field, we combine computer vision techniques with IMU-based sensing algorithms, and the technical evaluation results show that smartphone IMU data is important for improving system performance. Using the key information extracted for each step, SmartRecorder first generates initial instructional content and then provides tutorial creators with a refinement editor, designed around the system's high recall of key steps (99.38%), for revising that content. Finally, SmartRecorder generates video tutorials from the refined instructional content. The results of the user study demonstrate that SmartRecorder allows non-experts to create smartphone usage video tutorials in less time and with higher satisfaction from recipients.
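The abstract does not say how the vision and IMU signals are combined. Purely as an illustration, and not the authors' algorithm, one simple fusion scheme treats a spike in accelerometer magnitude as a candidate touch and accepts it as a key step only if the screencast visibly changes shortly afterward. Every function name, threshold, unit, and the one-second matching window below is a hypothetical assumption chosen for the sketch.

```python
from statistics import median


def imu_tap_candidates(samples, threshold=0.5, refractory=0.15):
    """Hypothetical IMU step: flag likely touch events.

    samples: list of (t, ax, ay, az), t in seconds, acceleration in m/s^2 (assumed units).
    A sample is flagged when its acceleration magnitude deviates from the median
    magnitude by more than `threshold`; further flags are suppressed for
    `refractory` seconds so one touch yields one candidate.
    """
    mags = [(t, (ax * ax + ay * ay + az * az) ** 0.5) for t, ax, ay, az in samples]
    baseline = median(m for _, m in mags)
    taps = []
    for t, m in mags:
        if abs(m - baseline) > threshold and (not taps or t - taps[-1] >= refractory):
            taps.append(t)
    return taps


def screen_change_times(frames, threshold=10.0):
    """Hypothetical vision step: detect visible UI changes in a screencast.

    frames: list of (t, pixels), pixels being a non-empty flat list of grayscale values 0-255.
    Returns the timestamps whose mean absolute pixel difference from the
    previous frame exceeds `threshold`.
    """
    changes = []
    for (_, prev), (t, cur) in zip(frames, frames[1:]):
        diff = sum(abs(a - b) for a, b in zip(prev, cur)) / len(cur)
        if diff > threshold:
            changes.append(t)
    return changes


def fuse(taps, changes, window=1.0):
    """Pair each IMU tap with the first screen change within `window` seconds after it."""
    steps = []
    for tap in taps:
        match = next((c for c in changes if 0 <= c - tap <= window), None)
        if match is not None:
            steps.append((tap, match))
    return steps


if __name__ == "__main__":
    samples = [(0.00, 0, 0, 9.8), (0.01, 0, 0, 9.8), (0.02, 0.3, 0.2, 11.0),
               (0.03, 0, 0, 9.8), (0.50, 0, 0, 9.8)]
    frames = [(0.0, [0] * 4), (0.2, [0] * 4), (0.4, [200] * 4), (0.6, [200] * 4)]
    print(fuse(imu_tap_candidates(samples), screen_change_times(frames)))
```

Requiring both signals to agree favors precision over recall. A system that, as the abstract describes, pairs a high recall of key steps with a refinement editor would plausibly loosen such thresholds so that few steps are missed, leaving spurious ones for the creator to delete.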

Supplementary Material

MP4 File (video_figure.mp4, 238.02 MB): Video figure of SmartRecorder

Cited By

Hu Y, Zhang S, Dang T, Jia H, Salim F, Hu W, Quigley A, Kostakos V, Kay J, Hoang T (2024). Exploring Large-Scale Language Models to Evaluate EEG-Based Multimodal Data for Mental Health. Companion of the 2024 on ACM International Joint Conference on Pervasive and Ubiquitous Computing, 412–417. Online publication date: 5-Oct-2024.
https://dl.acm.org/doi/10.1145/3675094.3678494

Zhang X, Zeng Y, Li Q, Chen G, Xu Q, Hu X, Peng Z (2024). DesignWatch: Analyzing Users' Operations of Mobile Apps Based on Screen Recordings. Adjunct Proceedings of the 26th International Conference on Mobile Human-Computer Interaction, 1–7. Online publication date: 21-Sep-2024.
https://dl.acm.org/doi/10.1145/3640471.3680231

Dix A (2024). The future of PIM: pragmatics and potential. Human–Computer Interaction, 1–28. Online publication date: 25-Jun-2024.
https://doi.org/10.1080/07370024.2024.2356155

Index Terms

SmartRecorder: An IMU-based Video Tutorial Creation by Demonstration System for Smartphone Interaction Tasks
1. Human-centered computing
  1. Human computer interaction (HCI)
    1. Interactive systems and tools
      1. User interface toolkits

Information

Published In

ACM Conferences

IUI '23: Proceedings of the 28th International Conference on Intelligent User Interfaces

March 2023

972 pages

ISBN:9798400701061

DOI:10.1145/3581641

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 March 2023

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

Natural Science Foundation of China

Conference

IUI '23

IUI '23: 28th International Conference on Intelligent User Interfaces

March 27 - 31, 2023

Sydney, NSW, Australia

Acceptance Rates

Overall Acceptance Rate 746 of 2,811 submissions, 27%

Bibliometrics

Total Citations: 3
Total Downloads: 1,271
Downloads (Last 12 months): 780
Downloads (Last 6 weeks): 89

Reflects downloads up to 31 Jan 2025