research-article

Fine-Grained Similarity Measurement between Educational Videos and Exercises

Authors:

Xue Wang

MM '20: Proceedings of the 28th ACM International Conference on Multimedia

Pages 331 - 339

https://doi.org/10.1145/3394171.3413783

Published: 12 October 2020

Abstract

In online learning systems, measuring the similarity between educational videos and exercises is a fundamental task with great application potential. In this paper, we explore measuring this similarity at a fine-grained level by leveraging multimodal information. The problem remains largely open because of several domain-specific characteristics. First, unlike general videos, educational videos contain not only graphics but also text and formulas, which have a fixed reading order. Both the spatial and the temporal information embedded in the frames should therefore be modeled. Second, there are semantic associations between adjacent video segments. These associations affect the similarity, and different exercises usually depend on related context of different ranges. Third, fine-grained labeled data for training the model is scarce and costly to obtain. To tackle these challenges, we propose VENet, which measures the similarity at both the video level and the segment level while exploiting only video-level labeled data. Extensive experimental results on real-world data demonstrate the effectiveness of VENet.
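To make the distinction between segment-level and video-level similarity concrete, the following is a minimal sketch and not the VENet model itself, whose architecture the abstract does not describe. It assumes that each video segment and each exercise has already been encoded as a fixed-length vector, scores each segment against the exercise with cosine similarity, and aggregates the segment scores with a maximum so that only a video-level score would need a label. The function names, the toy vectors, and the choice of max aggregation are all illustrative assumptions.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def segment_scores(segments: List[Vector], exercise: Vector) -> List[float]:
    """Segment-level similarity: one score per video segment (hypothetical helper)."""
    return [cosine(segment, exercise) for segment in segments]


def video_score(segments: List[Vector], exercise: Vector) -> float:
    """Video-level similarity as the maximum segment score.

    Max aggregation is an assumption made for this sketch; it is one common
    way to relate segment scores to a single video-level label.
    """
    scores = segment_scores(segments, exercise)
    return max(scores) if scores else 0.0


# Toy embeddings (made up for illustration): three segments and one exercise.
segments = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
exercise = [0.0, 2.0]

scores = segment_scores(segments, exercise)
best_segment = max(range(len(scores)), key=scores.__getitem__)
print("segment scores:", [round(s, 3) for s in scores])
print("most similar segment index:", best_segment)
print("video-level score:", round(video_score(segments, exercise), 3))
```

In this toy setup the video-level score is driven by the single best-matching segment, which is why a video-level label can still carry information about individual segments.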

Supplementary Material

MP4 File (3394171.3413783.mp4)

This video starts with the background of the problem, fine-grained similarity measurement. It then briefly introduces the three challenges and our solutions. Finally, extensive experimental results on real-world data demonstrate the effectiveness of our approach.

Published In

MM '20: Proceedings of the 28th ACM International Conference on Multimedia

October 2020

4889 pages

ISBN: 9781450379885

DOI: 10.1145/3394171

General Chairs: Chang Wen Chen (Chinese University of Hong Kong, Shenzhen, China); Rita Cucchiara (UNIMORE, Italy); Xian-Sheng Hua (Alibaba Group, China)

Program Chairs: Guo-Jun Qi (Futurewei Technologies, USA); Elisa Ricci (UNITN & Fondazione Bruno Kessler, Italy); Zhengyou Zhang (Tencent, China); Roger Zimmermann (National University of Singapore, Singapore)

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

MM '20: The 28th ACM International Conference on Multimedia

Sponsor: SIGMM

October 12 - 16, 2020

Seattle, WA, USA

Acceptance Rates

Overall Acceptance Rate 995 of 4,171 submissions, 24%

Citations

Cited By

Su Y, Yang X, Lu J, Liu Y, Han Z, Shen S, Huang Z, Liu Q (2024). Multi-task Information Enhancement Recommendation model for educational Self-Directed Learning System. Expert Systems with Applications, 124073. Online publication date: May 2024.
https://doi.org/10.1016/j.eswa.2024.124073
Yu J, Zheng Y, Ruan S, Liu Q, Cheng Z, Wu J, Elkind E (2023). Actor-multi-scale context bidirectional higher order interactive relation Network for spatial-temporal action localization. Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 1676-1685. Online publication date: 19 Aug 2023.
https://dl.acm.org/doi/10.24963/ijcai.2023/186
Zhang M, Zhu X, Zhang C, Qian W, Pan F, Zhao H, Frommholz I, Hopfgartner F, Lee M, Oakes M, Lalmas M, Zhang M, Santos R (2023). Counterfactual Monotonic Knowledge Tracing for Assessing Students' Dynamic Mastery of Knowledge Concepts. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, 3236-3246. Online publication date: 21 Oct 2023.
https://dl.acm.org/doi/10.1145/3583780.3614827
He L, Huang Z, Chen E, Liu Q, Tong S, Wang H, Lian D, Wang S (2023). An Efficient and Robust Semantic Hashing Framework for Similar Text Search. ACM Transactions on Information Systems, 41:4, 1-31. Online publication date: 30 Jan 2023.
https://dl.acm.org/doi/10.1145/3570725
Kaur P, Ragha L (2023). Audio de-noising and quality assessment for various noises in lecture videos. 2023 2nd International Conference on Paradigm Shifts in Communications Embedded Systems, Machine Learning and Signal Processing (PCEMS), 1-6. Online publication date: 5 Apr 2023.
https://doi.org/10.1109/PCEMS58491.2023.10136057
Shi J, Su W, Liu L, Xu S, Huang T, Liu J, Yue W, Li S (2023). A Deep Memory-Aware Attentive Model for Knowledge Tracing. 2023 IEEE International Conference on Data Mining Workshops (ICDMW), 1581-1590. Online publication date: 4 Dec 2023.
https://doi.org/10.1109/ICDMW60847.2023.00201
Meng J (2023). NeurReview: A Neural Architecture Based Conformity Prediction of Peer Reviews. IEEE Access, 11, 1407-1417. Online publication date: 2023.
https://doi.org/10.1109/ACCESS.2022.3224019
Huang W, Xiao T, Liu Q, Huang Z, Ma J, Chen E (2023). HMNet: a hierarchical multi-modal network for educational video concept prediction. International Journal of Machine Learning and Cybernetics, 14:9, 2913-2924. Online publication date: 19 Mar 2023.
https://doi.org/10.1007/s13042-023-01809-6
Zhou X, Song X, Wu H, Zhang J, Xu X, Magalhães J, del Bimbo A, Satoh S, Sebe N, Alameda-Pineda X, Jin Q, Oria V, Toni L (2022). MAVT-FG. Proceedings of the 30th ACM International Conference on Multimedia, 3811-3819. Online publication date: 10 Oct 2022.
https://dl.acm.org/doi/10.1145/3503161.3548383
Chen C, Zhang J, Song J, Gao L, Magalhães J, del Bimbo A, Satoh S, Sebe N, Alameda-Pineda X, Jin Q, Oria V, Toni L (2022). Class Gradient Projection For Continual Learning. Proceedings of the 30th ACM International Conference on Multimedia, 5575-5583. Online publication date: 10 Oct 2022.
https://dl.acm.org/doi/10.1145/3503161.3548054
