

Automated discovery of symbolic laws governing skill acquisition from naturally occurring data

A preprint version of the article is available at arXiv.

Abstract

Skill acquisition is a key area of research in cognitive psychology because it engages multiple psychological processes. The laws discovered under experimental paradigms remain controversial and generalize poorly. This paper aims to uncover the laws of skill learning from large-scale training-log data. A two-stage algorithm was developed to address two obstacles: unobservable cognitive states and the combinatorial explosion of the symbolic search space. A deep learning model is first employed to estimate the learner’s cognitive state and assess feature importance. Symbolic regression algorithms are then used to distill the neural network model into algebraic equations. Experimental results show that the algorithm accurately recovers preset laws within a given noise range in continuous-feedback settings. When applied to Lumosity training data, the method outperforms traditional and recent models in terms of goodness of fit. The study reveals two new forms of skill acquisition laws and reaffirms several previous findings.
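The two-stage idea in the abstract can be illustrated with a minimal, self-contained sketch. This is not the authors' pipeline: the deep regressor is replaced by simple smoothing, and symbolic regression by a grid search over three hypothetical candidate law families (power, exponential, linear); all names and parameters are illustrative.

```python
# Toy sketch of the two-stage approach described in the abstract (NOT the
# authors' code): stage 1 produces a smooth estimate of the performance
# curve; stage 2 searches a small library of candidate symbolic laws and
# keeps the family that best fits the estimate.
import math

# Synthetic practice log: response time follows a power law of practice.
a_true, b_true = 5.0, 0.5
trials = list(range(1, 101))
observed = [a_true * t ** (-b_true) for t in trials]

def smooth(ys, window=5):
    """Stage 1 stand-in for the deep regressor: moving-average smoothing."""
    half = window // 2
    out = []
    for i in range(len(ys)):
        lo, hi = max(0, i - half), min(len(ys), i + half + 1)
        out.append(sum(ys[lo:hi]) / (hi - lo))
    return out

estimate = smooth(observed)

def sse(pred):
    """Sum of squared errors against the stage-1 estimate."""
    return sum((p - y) ** 2 for p, y in zip(pred, estimate))

def fit_family(name):
    """Stage 2 stand-in for symbolic regression: grid-search one family."""
    best_err = float("inf")
    grid = [i / 10 for i in range(1, 81)]  # coarse grid over a, b
    for a in grid:
        for b in grid:
            if name == "power":
                pred = [a * t ** (-b) for t in trials]
            elif name == "exponential":
                pred = [a * math.exp(-b * t) for t in trials]
            else:  # linear
                pred = [a - b * t for t in trials]
            best_err = min(best_err, sse(pred))
    return best_err

scores = {f: fit_family(f) for f in ("power", "exponential", "linear")}
best_form = min(scores, key=scores.get)
print(best_form)  # the power-law family wins on power-law-generated data
```

On data generated from a power law, the power family attains the lowest error, mirroring the paper's simulation check that preset laws can be recovered.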


Fig. 1: Overall model architecture diagram.
Fig. 2: Results of the simulated data experiment.
Fig. 3: The skill acquisition patterns discovered by the proposed method.


Data availability

All data generated or analyzed during this study are available via GitHub at https://github.com/ccnu-mathits/ADM (ref. 58) and Zenodo at https://doi.org/10.5281/zenodo.10938670 (ref. 59). Source Data are provided with this paper.

Code availability

The source code of this study is freely available on GitHub at https://github.com/ccnu-mathits/ADM (ref. 58) and via Zenodo at https://doi.org/10.5281/zenodo.10938670 (ref. 59).

References

  1. VanLehn, K. Cognitive skill acquisition. Annu. Rev. Psychol. 47, 513–539 (1996).

  2. DeKeyser, R. in Skill Acquisition Theory 83–104 (Routledge, 2020).

  3. Tabibian, B. et al. Enhancing human learning via spaced repetition optimization. Proc. Natl Acad. Sci. USA 116, 3988–3993 (2019).

  4. Evans, N. J., Brown, S. D., Mewhort, D. J. & Heathcote, A. Refining the law of practice. Psychol. Rev. 125, 592 (2018).

  5. Heathcote, A., Brown, S. & Mewhort, D. J. The power law repealed: the case for an exponential law of practice. Psychon. Bull. Rev. 7, 185–207 (2000).

  6. Shrager, J., Hogg, T. & Huberman, B. A. A graph-dynamic model of the power law of practice and the problem-solving fan-effect. Science 242, 414–416 (1988).

  7. Wixted, J. T. The enigma of forgetting. Proc. Natl Acad. Sci. USA 119, e2201332119 (2022).

  8. Averell, L. & Heathcote, A. The form of the forgetting curve and the fate of memories. J. Math. Psychol. 55, 25–35 (2011).

  9. Roediger III, H. L. Relativity of remembering: why the laws of memory vanished. Annu. Rev. Psychol. 59, 225–254 (2008).

  10. Chiaburu, D. S. & Marinova, S. V. What predicts skill transfer? An exploratory study of goal orientation, training self-efficacy and organizational supports. Int. J. Train. Dev. 9, 110–123 (2005).

  11. Sturm, L. P. et al. A systematic review of skills transfer after surgical simulation training. Ann. Surg. 248, 166–179 (2008).

  12. Logan, G. D. Toward an instance theory of automatization. Psychol. Rev. 95, 492 (1988).

  13. Logan, G. D. Shapes of reaction-time distributions and shapes of learning curves: a test of the instance theory of automaticity. J. Exp. Psychol. 18, 883 (1992).

  14. Anderson, J. R. Acquisition of cognitive skill. Psychol. Rev. 89, 369 (1982).

  15. Tenison, C. & Anderson, J. R. Modeling the distinct phases of skill acquisition. J. Exp. Psychol. 42, 749 (2016).

  16. Tenison, C., Fincham, J. M. & Anderson, J. R. Phases of learning: how skill acquisition impacts cognitive processing. Cogn. Psychol. 87, 1–28 (2016).

  17. Jordan, M. I. Serial order: a parallel distributed processing approach. Adv. Psychol. 121, 471–495 (1997).

  18. McClelland, J. L. et al. Parallel Distributed Processing Vol. 2 (MIT Press, 1986).

  19. Young, R. M. & Lewis, R. L. in Models of Working Memory: Mechanisms of Active Maintenance and Executive Control 224–256 (Cambridge Univ. Press, 1999).

  20. Anderson, J. R., Matessa, M. & Lebiere, C. in Human–Computer Interaction Vol. 12, 439–462 (Lawrence Erlbaum Associates, 1997).

  21. Ritter, F. E., Tehranchi, F. & Oury, J. D. ACT-R: a cognitive architecture for modeling cognition. WIREs Cogn. Sci. 10, e1488 (2019).

  22. Goldstone, R. L. & Lupyan, G. Discovering psychological principles by mining naturally occurring data sets. Top. Cogn. Sci. 8, 548–568 (2016).

  23. Jenkins, J. J., Cermak, L. & Craik, F. in Levels of Processing in Human Memory 429–446 (Psychology Press, 1979).

  24. Udrescu, S.-M. & Tegmark, M. AI Feynman: a physics-inspired method for symbolic regression. Sci. Adv. 6, eaay2631 (2020).

  25. Wang, H. et al. Scientific discovery in the age of artificial intelligence. Nature 620, 47–60 (2023).

  26. Schmidt, M. & Lipson, H. Distilling free-form natural laws from experimental data. Science 324, 81–85 (2009).

  27. Cranmer, M. et al. Discovering symbolic models from deep learning with inductive biases. In 34th Conference on Neural Information Processing Systems 17429–17442 (NeurIPS, 2020).

  28. Rudy, S. H., Brunton, S. L., Proctor, J. L. & Kutz, J. N. Data-driven discovery of partial differential equations. Sci. Adv. 3, e1602614 (2017).

  29. Chen, Z., Liu, Y. & Sun, H. Physics-informed learning of governing equations from scarce data. Nat. Commun. 12, 6136 (2021).

  30. Margraf, J. T., Jung, H., Scheurer, C. & Reuter, K. Exploring catalytic reaction networks with machine learning. Nat. Catal. 6, 112–121 (2023).

  31. Han, Z.-K. et al. Single-atom alloy catalysts designed by first-principles calculations and artificial intelligence. Nat. Commun. 12, 1833 (2021).

  32. Wang, Y., Wagner, N. & Rondinelli, J. M. Symbolic regression in materials science. MRS Commun. 9, 793–805 (2019).

  33. He, M. & Zhang, L. Machine learning and symbolic regression investigation on stability of MXene materials. Comput. Mater. Sci. 196, 110578 (2021).

  34. LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).

  35. Vaswani, A. et al. Attention is all you need. In Advances in Neural Information Processing Systems 30 (NIPS, 2017).

  36. Baker, F. B. The Basics of Item Response Theory (ERIC, 2001).

  37. Hambleton, R. K., Swaminathan, H. & Rogers, H. J. Fundamentals of Item Response Theory Vol. 2 (Sage, 1991).

  38. Swaminathan, H. & Gifford, J. A. Bayesian estimation in the three-parameter logistic model. Psychometrika 51, 589–601 (1986).

  39. Maris, G. & Bechger, T. On interpreting the model parameters for the three parameter logistic model. Measurement 7, 75–88 (2009).

  40. Hornik, K. Approximation capabilities of multilayer feedforward networks. Neural Netw. 4, 251–257 (1991).

  41. Raghu, M., Poole, B., Kleinberg, J., Ganguli, S. & Sohl-Dickstein, J. On the expressive power of deep neural networks. In Proc. 34th International Conference on Machine Learning 2847–2854 (PMLR, 2017).

  42. Cybenko, G. Approximation by superpositions of a sigmoidal function. Math. Control Signals Syst. 2, 303–314 (1989).

  43. Spearman, C. “General intelligence,” objectively determined and measured. Am. J. Psychol. 15, 201–293 (1904).

  44. Eid, M., Geiser, C., Koch, T. & Heene, M. Anomalous results in g-factor models: explanations and alternatives. Psychol. Methods 22, 541 (2017).

  45. Steyvers, M. & Schafer, R. J. Inferring latent learning factors in large-scale cognitive training data. Nat. Hum. Behav. 4, 1145–1155 (2020).

  46. Simons, D. J. et al. Do “brain-training” programs work? Psychol. Sci. Public Interest 17, 103–186 (2016).

  47. Kievit, R. A. et al. Mutualistic coupling between vocabulary and reasoning supports cognitive development during late adolescence and early adulthood. Psychol. Sci. 28, 1419–1431 (2017).

  48. Kievit, R. A., Hofman, A. D. & Nation, K. Mutualistic coupling between vocabulary and reasoning in young children: a replication and extension of the study by Kievit et al. (2017). Psychol. Sci. 30, 1245–1252 (2019).

  49. Fisher, R. A. Design of experiments. Br. Med. J. 1, 554 (1936).

  50. Kumar, A., Benjamin, A. S., Heathcote, A. & Steyvers, M. Comparing models of learning and relearning in large-scale cognitive training data sets. NPJ Sci. Learn. 7, 24 (2022).

  51. Liu, R. & Koedinger, K. R. Towards reliable and valid measurement of individualized student parameters. In Proc. 10th International Conference on Educational Data Mining 135–142 (International Educational Data Mining Society, 2017).

  52. Koedinger, K. R., Carvalho, P. F., Liu, R. & McLaughlin, E. A. An astonishing regularity in student learning rate. Proc. Natl Acad. Sci. USA 120, e2221311120 (2023).

  53. Neath, A. A. & Cavanaugh, J. E. The Bayesian information criterion: background, derivation, and applications. WIREs Comput. Stat. 4, 199–203 (2012).

  54. Vrieze, S. I. Model selection and psychological theory: a discussion of the differences between the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). Psychol. Methods 17, 228 (2012).

  55. Paszke, A. et al. PyTorch: an imperative style, high-performance deep learning library. In Proc. 33rd International Conference on Neural Information Processing Systems 8026–8037 (Curran Associates Inc., 2019).

  56. Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. Preprint at https://arxiv.org/abs/1412.6980 (2015).

  57. Steyvers, M. & Benjamin, A. S. The joint contribution of participation and performance to learning functions: exploring the effects of age in large-scale data sets. Behav. Res. Methods 51, 1531–1543 (2019).

  58. Liu, S. et al. ccnu-mathits/ADM. GitHub https://github.com/ccnu-mathits/ADM (2024).

  59. Liu, S. et al. ccnu-mathits/ADM: source code. Zenodo https://doi.org/10.5281/zenodo.10938670 (2024).


Acknowledgements

This work was jointly supported by the National Science and Technology Major Project (grant no. 2022ZD0117103 to J.S. and Z.Y.), the National Natural Science Foundation of China (grant no. 62293554 to J.S. and Z.Y., 62107017 to X.S. and 62077021 to J.S.), the Higher Education Science Research Program of China Association of Higher Education (grant no. 23XXK0301 to J.S.), the China Postdoctoral Science Foundation (grant no. 2023T160256 to X.S.), Hubei Provincial Natural Science Foundation of China (grant no. 2023AFA020 to S.L. and 2022CFB414 to J.S.), and Fundamental Research Funds for the Central Universities (grant no. CCNU23XJ007 to X.S. and CCNU22LJ005 to S.L.).

Author information

Authors and Affiliations

Authors

Contributions

S.L. and X.S. conceptualized the work. S.L. and X.S. designed the methodology. Q.L. performed investigations and curated the data, whereas S.L., Q.L. and Z.Y. performed the data validation. J.S. and Z.Y. administered the project. J.S. and Z.Y. acquired funding. X.S., J.S. and Z.Y. supervised the project. S.L. and X.S. wrote the original draft, whereas Q.L. and J.S. reviewed and edited it.

Corresponding authors

Correspondence to Xiaoxuan Shen, Jianwen Sun or Zongkai Yang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Computational Science thanks Martijn Meeter, Konstantina Sokratous and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Ananya Rastogi, in collaboration with the Nature Computational Science team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Partial results on the Lumosity dataset.

More specifically: (a) mean absolute fitting error of the deep learning regressor over training iterations; (b) value of the regularization term over the same iterations; (c) prediction-error distribution of 1,000 randomly selected records from the trained H1000+R1 model, where the box plot shows the interquartile range (IQR) with the median line, the whiskers extend to the minimum and maximum values or a multiple of the IQR from the quartiles, and outliers are depicted as individual points beyond the whiskers; (d) proportion of practice times for each skill relative to the total number of practice times; (e) distribution of feature importance for each skill in H1000+R1.

Source data

Extended Data Fig. 2 Schematic representation of the simulation experiment.

(a) Flowchart depicting the process of generating simulated data and the data format. (b) Schematic diagram illustrating the evaluation procedure of the simulation experiment. (c) Sample representation of the simulated data.

Source data

Extended Data Fig. 3 The feature set utilized in the real-world dataset (Lumosity) experiment.

The feature set comprises three components: user (U), exercise (E), and scheduling (S). The features consist of two categories: discrete and continuous. Discrete features are encoded through one-hot encoding, while continuous features are encoded using real values. Game features are two-dimensional, consisting of skill and subskill, both of which are discrete features and characterize the relationship between the game and cognitive skills. Subskill is a subdivision feature of skill.
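The encoding scheme described above (one-hot for discrete features, real values for continuous ones) can be sketched as follows. The feature names, category sets, and record fields here are hypothetical stand-ins, not the actual Lumosity feature set.

```python
# Minimal illustration of the encoding described above: discrete features
# become one-hot vectors, continuous features stay as real values.
# All feature names and category lists below are hypothetical.

def one_hot(value, categories):
    """Encode a discrete feature as a one-hot vector over its categories."""
    vec = [0.0] * len(categories)
    vec[categories.index(value)] = 1.0
    return vec

SKILLS = ["memory", "attention", "flexibility"]  # discrete game feature
SUBSKILLS = ["spatial", "verbal"]                # discrete subdivision of skill

# One hypothetical training-log record with a continuous scheduling feature.
record = {"skill": "attention", "subskill": "verbal", "days_since_last": 2.5}

encoded = (
    one_hot(record["skill"], SKILLS)
    + one_hot(record["subskill"], SUBSKILLS)
    + [record["days_since_last"]]                # continuous: real value
)
print(encoded)  # [0.0, 1.0, 0.0, 0.0, 1.0, 2.5]
```

The concatenated vector is the kind of input a downstream regressor would consume; the two-dimensional game feature (skill plus subskill) simply contributes two one-hot blocks.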

Source data

Extended Data Fig. 4 Analysis of the difference between the mastery level of skills calculated by the deep learning regressor and the discovered governing laws.

Here, we present the changes in skill mastery at each time point over 2,000 practice sessions for five learners. The blue line shows the mastery curve computed by the deep regressor; the red line shows the curve computed by the discovered governing law. The governing law is the same for all learners, but because learners differ in which exercises they choose and in what order they practice them, the independent variables of the governing law vary across learners.

Source data

Supplementary information

Supplementary Information

Supplementary Figs. 1 and 2.

Reporting Summary

Source data

Source Data Fig. 1

Image source data for Fig. 1.

Source Data Fig. 2

Numerical source data for Fig. 2.

Source Data Fig. 3

Numerical source data for Fig. 3.

Source Data Extended Data Fig. 1

Numerical source data for Extended Data Fig. 1.

Source Data Extended Data Fig. 2

Image source data for Extended Data Fig. 2.

Source Data Extended Data Fig. 3

Image source data for Extended Data Fig. 3.

Source Data Extended Data Fig. 4

Numerical source data for Extended Data Fig. 4.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article


Cite this article

Liu, S., Li, Q., Shen, X. et al. Automated discovery of symbolic laws governing skill acquisition from naturally occurring data. Nat Comput Sci 4, 334–345 (2024). https://doi.org/10.1038/s43588-024-00629-0

