DOI:10.1145/2642918.2647400
research-article
Open access

Video digests: a browsable, skimmable format for informational lecture videos

Published: 05 October 2014
    Abstract

    Increasingly, authors are publishing long informational talks, lectures, and distance-learning videos online. However, it is difficult to browse and skim the content of such videos using current timeline-based video players. Video digests are a new format for informational videos that afford browsing and skimming by segmenting videos into a chapter/section structure and providing short text summaries and thumbnails for each section. Viewers can navigate by reading the summaries and clicking on sections to access the corresponding point in the video. We present a set of tools to help authors create such digests using transcript-based interactions. With our tools, authors can manually create a video digest from scratch, or they can automatically generate a digest by applying a combination of algorithmic and crowdsourcing techniques and then manually refine it as needed. Feedback from first-time users suggests that our transcript-based authoring tools and automated techniques greatly facilitate video digest creation. In an evaluative crowdsourced study, we find that, given a short viewing time, video digests support browsing and skimming better than timeline-based or transcript-based video players.
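
The chapter/section structure described in the abstract can be pictured as a small data model. The sketch below is illustrative only and is not the authors' implementation: the class names (`Section`, `Chapter`, `VideoDigest`), the field names, and the example titles, summaries, and timestamps are all assumptions made here. It shows sections carrying a start time, a short summary, and a thumbnail, plus the two lookups a viewer-facing player would need: the time to seek to when a section is clicked, and the section that covers a given playback time.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Section:
    start_sec: float     # where the section begins in the video (seconds)
    summary: str         # short text summary shown to the viewer
    thumbnail: str = ""  # hypothetical path or URL of a thumbnail image


@dataclass
class Chapter:
    title: str
    sections: List[Section] = field(default_factory=list)


@dataclass
class VideoDigest:
    chapters: List[Chapter] = field(default_factory=list)

    def sections(self) -> List[Section]:
        """All sections in playback order (assumes chapters and sections are already sorted by time)."""
        return [s for ch in self.chapters for s in ch.sections]

    def seek_time(self, chapter_idx: int, section_idx: int) -> float:
        """Time to jump to when the viewer clicks a section."""
        return self.chapters[chapter_idx].sections[section_idx].start_sec

    def section_at(self, t: float) -> Optional[Section]:
        """Section covering playback time t, or None if t precedes the first section."""
        secs = self.sections()
        starts = [s.start_sec for s in secs]
        i = bisect_right(starts, t) - 1  # index of the last section starting at or before t
        return secs[i] if i >= 0 else None


# Made-up example content, for illustration only.
digest = VideoDigest([
    Chapter("Introduction", [
        Section(0.0, "Speaker introduces the topic"),
        Section(95.0, "Why long lectures are hard to skim"),
    ]),
    Chapter("Main argument", [
        Section(240.0, "First key idea"),
    ]),
])

clicked = digest.seek_time(1, 0)    # viewer clicks the first section of chapter 2
current = digest.section_at(100.0)  # section playing at 1 min 40 s
```

Keeping start times on sections rather than on chapters means a chapter's extent is implied by its first section, which avoids storing the same boundary twice.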

    Supplementary Material

    ZIP File (uistf3662-file5.zip)
    The supplementary pdf contains information on how we selected and tuned the segmentation algorithm.
    suppl.mov (uistf3662-file3.mp4)
    Supplemental video

      Published In

      UIST '14: Proceedings of the 27th annual ACM symposium on User interface software and technology
      October 2014
      722 pages
      ISBN:9781450330695
      DOI:10.1145/2642918
      Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. education
      2. video digests
      3. video presentation interfaces

      Qualifiers

      • Research-article

      Conference

      UIST '14

      Acceptance Rates

      UIST '14 Paper Acceptance Rate 74 of 333 submissions, 22%;
      Overall Acceptance Rate 842 of 3,967 submissions, 21%

      Cited By

      • (2024) SkillsInterpreter: A Case Study of Automatic Annotation of Flowcharts to Support Browsing Instructional Videos in Modern Martial Arts using Large Language Models. Proceedings of the Augmented Humans International Conference 2024, pp. 217-225. DOI:10.1145/3652920.3652942. Online publication date: 4-Apr-2024
      • (2024) FastPerson: Enhancing Video-Based Learning through Video Summarization that Preserves Linguistic and Visual Contexts. Proceedings of the Augmented Humans International Conference 2024, pp. 205-216. DOI:10.1145/3652920.3652922. Online publication date: 4-Apr-2024
      • (2024) PodReels: Human-AI Co-Creation of Video Podcast Teasers. Proceedings of the 2024 ACM Designing Interactive Systems Conference, pp. 958-974. DOI:10.1145/3643834.3661591. Online publication date: 1-Jul-2024
      • (2024) Tutorial mismatches: investigating the frictions due to interface differences when following software video tutorials. Proceedings of the 2024 ACM Designing Interactive Systems Conference, pp. 1942-1955. DOI:10.1145/3643834.3661511. Online publication date: 1-Jul-2024
      • (2024) ExpressEdit: Video Editing with Natural Language and Sketching. Proceedings of the 29th International Conference on Intelligent User Interfaces, pp. 515-536. DOI:10.1145/3640543.3645164. Online publication date: 18-Mar-2024
      • (2024) Temaneki: Map-Based Collaboration Tool for Consensus-Building in Student-Run Festival Management Teams. Extended Abstracts of the 2024 CHI Conference on Human Factors in Computing Systems, pp. 1-8. DOI:10.1145/3613905.3651013. Online publication date: 11-May-2024
      • (2024) Surgment: Segmentation-enabled Semantic Search and Creation of Visual Question and Feedback to Support Video-Based Surgery Learning. Proceedings of the CHI Conference on Human Factors in Computing Systems, pp. 1-18. DOI:10.1145/3613904.3642587. Online publication date: 11-May-2024
      • (2024) SwapVid: Integrating Video Viewing and Document Exploration with Direct Manipulation. Proceedings of the CHI Conference on Human Factors in Computing Systems, pp. 1-13. DOI:10.1145/3613904.3642515. Online publication date: 11-May-2024
      • (2024) Bridging the Literacy Gap for Adults: Streaming and Engaging in Adult Literacy Education through Livestreaming. Proceedings of the CHI Conference on Human Factors in Computing Systems, pp. 1-15. DOI:10.1145/3613904.3642423. Online publication date: 11-May-2024
      • (2023) BNoteHelper: A Note-Based Outline Generation Tool for Structured Learning on Video Sharing Platforms. ACM Transactions on the Web. DOI:10.1145/3638775. Online publication date: 27-Dec-2023
