DOI: 10.1145/3490099.3511132
Research article
Open access

VideoSticker: A Tool for Active Viewing and Visual Note-taking from Videos

Published: 22 March 2022

Abstract

Video is an effective medium for knowledge communication and learning. Yet active viewing and note-taking from videos remain a challenge. Specifically, during note-taking, viewers find it difficult to extract essential information such as the representation, composition, motion, and interactions of graphical objects and narration. Current approaches rely on static screenshots, manual clipping, manual annotation, and transcription. This is often done by repeatedly pausing and rewinding the video, thus disrupting the viewing experience. We propose VideoSticker, a tool designed to support visual note-taking by extracting expressive content and narratives from videos as ‘object stickers.’ VideoSticker implements automated object detection and tracking, links objects to the transcript, and supports rapid extraction of stickers across space, time, and events of interest. VideoSticker’s two-pass approach allows viewers to capture high-level information uninterrupted and later extract specific details. We demonstrate the usability of VideoSticker for a variety of videos and note-taking needs.
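The abstract describes linking tracked on-screen objects to the narration transcript. A minimal sketch of that linking step is shown below; the function name, segment format, and example data are all illustrative assumptions, not the authors' implementation or API.

```python
# Hypothetical sketch of one VideoSticker idea: associating a tracked object's
# on-screen time interval with overlapping transcript segments. All names and
# data shapes here are assumptions for illustration.

def link_object_to_transcript(obj_interval, transcript):
    """Return transcript segments whose time range overlaps the interval
    (start, end), in seconds, during which a tracked object is visible."""
    start, end = obj_interval
    return [seg for seg in transcript
            if seg["start"] < end and seg["end"] > start]

# Timestamped transcript segments (e.g., as parsed from caption files).
transcript = [
    {"start": 0.0, "end": 4.0, "text": "A neutron star forms when..."},
    {"start": 4.0, "end": 9.0, "text": "Its core collapses under gravity..."},
    {"start": 9.0, "end": 14.0, "text": "Meanwhile, lighter stars..."},
]

# Suppose an object tracker reports an object visible from t=3.0s to t=8.0s;
# the first two segments overlap that interval and become its narration.
linked = link_object_to_transcript((3.0, 8.0), transcript)
```

In a full system the interval would come from a video object tracker and the segments from the video's caption track; the overlap test itself is the part that attaches narration to an extracted "object sticker."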

Supplementary Material

MP4 File (videosticker.mp4)


Cited By

  • (2024) BNoteHelper: A Note-based Outline Generation Tool for Structured Learning on Video-sharing Platforms. ACM Transactions on the Web 18, 2 (2024), 1–30. https://doi.org/10.1145/3638775. Online publication date: 12 Mar 2024.
  • (2024) AQuA: Automated Question-Answering in Software Tutorial Videos with Visual Anchors. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1–19. https://doi.org/10.1145/3613904.3642752. Online publication date: 11 May 2024.
  • (2023) Bubbleu: Exploring Augmented Reality Game Design with Uncertain AI-based Interaction. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, 1–18. https://doi.org/10.1145/3544548.3581270. Online publication date: 19 Apr 2023.

    Published In

    IUI '22: Proceedings of the 27th International Conference on Intelligent User Interfaces
    March 2022
    888 pages
    ISBN:9781450391443
    DOI:10.1145/3490099

Publisher

Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. education technology
    2. video interaction
    3. video object detection
    4. visual note-taking


Conference

IUI '22

Acceptance Rates

Overall Acceptance Rate: 746 of 2,811 submissions, 27%


