Content-driven Multi-modal Techniques for Non-linear Video Navigation

Authors: Kundan Shrivastava, S. Mohana Prasad, Harish Arsikere, Om Deshmukh

IUI '15: Proceedings of the 20th International Conference on Intelligent User Interfaces

Pages 333 - 344

https://doi.org/10.1145/2678025.2701408

Published: 18 March 2015

Abstract

Massive Open Online Courses (MOOCs) have grown remarkably in the last few years. A significant amount of MOOC content is in the form of videos, and participants often browse a video non-linearly. This paper proposes the design of a system that supports non-linear navigation in educational videos using features derived from a combination of the audio and visual content of a video. It provides multiple dimensions for quickly navigating to a given point of interest in a video, namely a customized, dynamic, time-aware word-cloud, video pages, and a 2-D timeline. In the word-cloud, the relative placement of the words indicates their temporal ordering in the video, whereas color codes represent acoustic stress. When the user clicks a word in the word-cloud, the 2-D timeline presents the multiple occurrences of that keyword/concept in the video. Additionally, the visual content is analyzed to identify frames with "maximum written content", known as video pages. We conducted a user study with 20 users to evaluate the proposed system and compared it with the transcription-based interfaces used by major MOOC providers. Our findings suggest that the proposed system leads to statistically significant savings in navigation time, especially on multimodal navigation tasks.
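The time-aware word-cloud and the timeline lookup described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Occurrence` record, the `build_time_aware_cloud` helper, the stress score in [0, 1], the 0.5 threshold, and the two-color scheme are all assumptions made for illustration. The sketch orders words by their first mention, colors a word by its peak stress, and keeps every timestamp of a word so that a click can list all of its occurrences.

```python
from collections import defaultdict
from typing import NamedTuple


class Occurrence(NamedTuple):
    """One spoken keyword (hypothetical record, not from the paper)."""
    word: str
    time_s: float   # when the word is spoken, in seconds
    stress: float   # assumed acoustic-stress score in [0, 1]


def build_time_aware_cloud(occurrences, stress_threshold=0.5):
    """Return cloud entries ordered by first mention, plus per-word timestamps."""
    times = defaultdict(list)
    peak_stress = defaultdict(float)
    for occ in occurrences:
        times[occ.word].append(occ.time_s)
        peak_stress[occ.word] = max(peak_stress[occ.word], occ.stress)

    cloud = []
    # Placement follows temporal order: earliest first mention comes first.
    for word in sorted(times, key=lambda w: min(times[w])):
        cloud.append({
            "word": word,
            "count": len(times[word]),
            "first_time_s": min(times[word]),
            # Assumed color code: stressed words are highlighted.
            "color": "red" if peak_stress[word] >= stress_threshold else "gray",
        })

    # Clicking a word would look up all of its occurrences for the timeline.
    timeline = {word: sorted(ts) for word, ts in times.items()}
    return cloud, timeline


occurrences = [
    Occurrence("gradient", 42.0, 0.8),
    Occurrence("matrix", 10.5, 0.2),
    Occurrence("gradient", 95.0, 0.3),
    Occurrence("matrix", 130.0, 0.4),
]
cloud, timeline = build_time_aware_cloud(occurrences)
```

Here `timeline["gradient"]` holds the two timestamps at which "gradient" is spoken, which is the information a click on that word would hand to the timeline view.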

Index Terms

Content-driven Multi-modal Techniques for Non-linear Video Navigation
1. Information systems
  1. Information systems applications
    1. Multimedia information systems

Published In

IUI '15: Proceedings of the 20th International Conference on Intelligent User Interfaces

March 2015

480 pages

ISBN:9781450333061

DOI:10.1145/2678025

General Chairs: Oliver Brdiczka (Vectra Networks, Inc.), Polo Chau (Georgia Tech)

Program Chairs: Giuseppe Carenini (University of British Columbia), Shimei Pan (University of Maryland), Per Ola Kristensson (University of Cambridge)

Copyright © 2015 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

IUI'15: 20th International Conference on Intelligent User Interfaces

March 29 - April 1, 2015

Atlanta, Georgia, USA

Acceptance Rates

IUI '15 Paper Acceptance Rate 47 of 205 submissions, 23%;

Overall Acceptance Rate 746 of 2,811 submissions, 27%
