DOI: 10.1145/3269206.3271694

Mathematics Content Understanding for Cyberlearning via Formula Evolution Map

Published: 17 October 2018


Although scientific digital libraries are growing rapidly, scholars and students often find reading Science, Technology, Engineering, and Mathematics (STEM) literature daunting, especially its mathematical content and formulas. In this paper, we propose a novel problem, "mathematics content understanding", for cyberlearning and cyberreading. To address this problem, we create a Formula Evolution Map (FEM) offline and implement a novel online learning/reading environment, PDF Reader with Math-Assistant (PRMA), which incorporates innovative math-scaffolding methods. The proposed algorithm/system can automatically characterize a student's emerging math-information needs while they read a paper and enables students to readily explore formula evolution trajectories in the FEM. Given a math-information need, PRMA uses joint embedding, formula evolution mining, and heterogeneous graph mining algorithms to recommend high-quality Open Educational Resources (OERs), e.g., videos, Wikipedia pages, or slides, that help students better understand the mathematical content of the paper. Evaluation results and exit surveys show that the PRMA system and the proposed formula understanding algorithm can effectively help master's and PhD students better understand the complex mathematical content of their class readings.
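The abstract names heterogeneous graph mining as one ingredient of OER recommendation but does not describe the algorithm on this page. As a minimal sketch, assuming a toy graph in which each formula node is linked to the formula it evolved from and to the OERs that explain it, personalized PageRank (random walk with restart) started at the formula a student is reading can rank candidate OERs. The node names, the edge semantics, the undirected treatment of edges, and the restart probability of 0.15 are all illustrative assumptions; this is a generic graph-ranking technique, not PRMA's actual model.

```python
from collections import defaultdict

# Hypothetical toy graph: "formula:*" nodes are linked to the formula they
# evolved from (an FEM-style edge); "oer:*" nodes are resources that explain
# the formula they are attached to. Edges are treated as undirected.
EDGES = [
    ("formula:F3", "formula:F2"),  # F3 evolved from F2
    ("formula:F2", "formula:F1"),  # F2 evolved from F1
    ("formula:F3", "oer:video_A"),
    ("formula:F2", "oer:wiki_B"),
    ("formula:F1", "oer:slides_C"),
]


def random_walk_with_restart(edges, seed, restart=0.15, iters=100, tol=1e-12):
    """Return personalized PageRank scores for every node, restarting at `seed`."""
    neighbors = defaultdict(list)
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    if seed not in neighbors:
        raise ValueError(f"unknown seed node: {seed}")
    nodes = list(neighbors)
    scores = {n: 0.0 for n in nodes}
    scores[seed] = 1.0
    for _ in range(iters):
        # With probability `restart` the walker jumps back to the query formula.
        nxt = {n: 0.0 for n in nodes}
        nxt[seed] = restart
        # Otherwise it moves to a uniformly chosen neighbor of its current node.
        for n in nodes:
            share = (1.0 - restart) * scores[n] / len(neighbors[n])
            for m in neighbors[n]:
                nxt[m] += share
        delta = sum(abs(nxt[n] - scores[n]) for n in nodes)
        scores = nxt
        if delta < tol:
            break
    return scores


def recommend_oers(edges, query_formula, k=3):
    """Rank OER nodes by their random-walk proximity to the query formula."""
    scores = random_walk_with_restart(edges, query_formula)
    oers = [n for n in scores if n.startswith("oer:")]
    return sorted(oers, key=lambda n: scores[n], reverse=True)[:k]


if __name__ == "__main__":
    print(recommend_oers(EDGES, "formula:F3"))
    # -> ['oer:video_A', 'oer:wiki_B', 'oer:slides_C']
```

Restarting at the query formula keeps the ranking local: OERs attached to the formula itself or to its immediate predecessors outrank those several evolution steps away. A fuller system would likely weight edges by type (citation, evolution, explanation), which this sketch omits.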






Information & Contributors


Published In

CIKM '18: Proceedings of the 27th ACM International Conference on Information and Knowledge Management
October 2018
2362 pages
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.



Association for Computing Machinery

New York, NY, United States



Author Tags

  1. cyberlearning
  2. education
  3. formula evolution
  4. formula layout
  5. formula understanding


  • Research-article


CIKM '18

Acceptance Rates

CIKM '18 Paper Acceptance Rate: 147 of 826 submissions, 18%
Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%




Bibliometrics & Citations


Article Metrics

  • Downloads (last 12 months): 10
  • Downloads (last 6 weeks): 1
Reflects downloads up to 06 Feb 2025



Cited By

  • (2024) Explainable Notes: Examining How to Unlock Meaning in Medical Notes with Interactivity and Artificial Intelligence. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1-19. DOI: 10.1145/3613904.3642573. Online publication date: 11-May-2024.
  • (2024) EBERT: A lightweight expression-enhanced large-scale pre-trained language model for mathematics education. Knowledge-Based Systems, Vol. 300, 112118. DOI: 10.1016/j.knosys.2024.112118. Online publication date: Sep-2024.
  • (2021) Augmenting Scientific Papers with Just-in-Time, Position-Sensitive Definitions of Terms and Symbols. Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, 1-18. DOI: 10.1145/3411764.3445648. Online publication date: 6-May-2021.
  • (2021) Evaluating Methodologies on Deep Understanding of Mathematical Formulas in Technical Documents. 2021 IEEE 33rd International Conference on Tools with Artificial Intelligence (ICTAI), 920-926. DOI: 10.1109/ICTAI52525.2021.00148. Online publication date: Nov-2021.
  • (2021) Formula Citation Graph Based Mathematical Information Retrieval. Document Analysis and Recognition – ICDAR 2021, 631-647. DOI: 10.1007/978-3-030-86549-8_40. Online publication date: 2-Sep-2021.
  • (2021) Handwritten Mathematical Expression Recognition with Bidirectionally Trained Transformer. Document Analysis and Recognition – ICDAR 2021, 570-584. DOI: 10.1007/978-3-030-86331-9_37. Online publication date: 2-Sep-2021.
  • (2020) Task-Oriented Genetic Activation for Large-Scale Complex Heterogeneous Graph Embedding. Proceedings of The Web Conference 2020, 1581-1591. DOI: 10.1145/3366423.3380230. Online publication date: 20-Apr-2020.
  • (2019) Cross-domain Aspect Category Transfer and Detection via Traceable Heterogeneous Graph Representation Learning. Proceedings of the 28th ACM International Conference on Information and Knowledge Management, 289-298. DOI: 10.1145/3357384.3357989. Online publication date: 3-Nov-2019.
  • (2019) Finding Camouflaged Needle in a Haystack? Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, 365-374. DOI: 10.1145/3331184.3331197. Online publication date: 18-Jul-2019.
