DOI: 10.1145/3490099.3511132
Research article
Open access

VideoSticker: A Tool for Active Viewing and Visual Note-taking from Videos

Published: 22 March 2022

Abstract

Video is an effective medium for knowledge communication and learning. Yet active viewing and note-taking from videos remain a challenge. Specifically, during note-taking, viewers find it difficult to extract essential information such as the representation, composition, motion, and interactions of graphical objects and narration. Current approaches rely on static screenshots, manual clipping, manual annotation, and transcription. This is often done by repeatedly pausing and rewinding the video, thus disrupting the viewing experience. We propose VideoSticker, a tool designed to support visual note-taking by extracting expressive content and narratives from videos as ‘object stickers.’ VideoSticker implements automated object detection and tracking, links objects to the transcript, and supports rapid extraction of stickers across space, time, and events of interest. VideoSticker’s two-pass approach allows viewers to capture high-level information uninterrupted and later extract specific details. We demonstrate the usability of VideoSticker for a variety of videos and note-taking needs.
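The abstract describes linking tracked on-screen objects to the narration transcript. A minimal sketch of that linking step is shown below; the function name, segment format, and example data are all illustrative assumptions, not the authors' implementation or API.

```python
# Hypothetical sketch of one VideoSticker idea: associating a tracked object's
# on-screen time interval with overlapping transcript segments. All names and
# data shapes here are assumptions for illustration.

def link_object_to_transcript(obj_interval, transcript):
    """Return transcript segments whose time range overlaps the interval
    (start, end), in seconds, during which a tracked object is visible."""
    start, end = obj_interval
    return [seg for seg in transcript
            if seg["start"] < end and seg["end"] > start]

# Timestamped transcript segments (e.g., as parsed from caption files).
transcript = [
    {"start": 0.0, "end": 4.0, "text": "A neutron star forms when..."},
    {"start": 4.0, "end": 9.0, "text": "Its core collapses under gravity..."},
    {"start": 9.0, "end": 14.0, "text": "Meanwhile, lighter stars..."},
]

# Suppose an object tracker reports an object visible from t=3.0s to t=8.0s;
# the first two segments overlap that interval and become its narration.
linked = link_object_to_transcript((3.0, 8.0), transcript)
```

In a full system the interval would come from a video object tracker and the segments from the video's caption track; the overlap test itself is the part that attaches narration to an extracted "object sticker."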

Supplementary Material

MP4 File (videosticker.mp4)


Cited By

  • (2024) BNoteHelper: A Note-based Outline Generation Tool for Structured Learning on Video-sharing Platforms. ACM Transactions on the Web 18, 2 (2024), 1–30. https://doi.org/10.1145/3638775. Online publication date: 12 Mar 2024.
  • (2024) AQuA: Automated Question-Answering in Software Tutorial Videos with Visual Anchors. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1–19. https://doi.org/10.1145/3613904.3642752. Online publication date: 11 May 2024.
  • (2023) Bubbleu: Exploring Augmented Reality Game Design with Uncertain AI-based Interaction. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, 1–18. https://doi.org/10.1145/3544548.3581270. Online publication date: 19 Apr 2023.

    Published In

    IUI '22: Proceedings of the 27th International Conference on Intelligent User Interfaces
    March 2022
    888 pages
    ISBN:9781450391443
    DOI:10.1145/3490099

Publisher

Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. education technology
    2. video interaction
    3. video object detection
    4. visual note-taking


Conference

IUI '22

Acceptance Rates

Overall Acceptance Rate: 746 of 2,811 submissions, 27%


