A Dataset for Learning University STEM Courses at Scale and Generating Questions at a Human Level

Authors

  • Iddo Drori Massachusetts Institute of Technology, Columbia University, Boston University
  • Sarah Zhang Massachusetts Institute of Technology
  • Zad Chin Harvard University
  • Reece Shuttleworth Massachusetts Institute of Technology
  • Albert Lu Massachusetts Institute of Technology
  • Linda Chen Massachusetts Institute of Technology
  • Bereket Birbo Massachusetts Institute of Technology
  • Michele He Massachusetts Institute of Technology
  • Pedro Lantigua Massachusetts Institute of Technology
  • Sunny Tran Massachusetts Institute of Technology
  • Gregory Hunter Columbia University
  • Bo Feng Columbia University
  • Newman Cheng Columbia University
  • Roman Wang Columbia University
  • Yann Hicke Cornell University
  • Saisamrit Surbehera Columbia University
  • Arvind Raghavan Columbia University
  • Alexander Siemenn Massachusetts Institute of Technology
  • Nikhil Singh Massachusetts Institute of Technology
  • Jayson Lynch University of Waterloo
  • Avi Shporer Massachusetts Institute of Technology
  • Nakul Verma Columbia University
  • Tonio Buonassisi Massachusetts Institute of Technology
  • Armando Solar-Lezama Massachusetts Institute of Technology

DOI:

https://doi.org/10.1609/aaai.v37i13.27091

Keywords:

AI For Education, STEM Courses, Natural Language Processing

Abstract

We present a new dataset for learning to solve, explain, and generate university-level STEM questions from 27 courses across a dozen departments in seven universities. We scale up previous approaches to questions from courses in the departments of Mechanical Engineering; Materials Science and Engineering; Chemistry; Electrical Engineering; Computer Science; Physics; Earth, Atmospheric, and Planetary Sciences; Economics; Mathematics; Biological Engineering; Data, Systems, and Society; and Statistics. We visualize similarities and differences between questions across courses. We demonstrate that a large foundation model is able to generate questions that are as appropriate as human-written questions and at the same difficulty level.

Published

2024-07-15

How to Cite

Drori, I., Zhang, S., Chin, Z., Shuttleworth, R., Lu, A., Chen, L., Birbo, B., He, M., Lantigua, P., Tran, S., Hunter, G., Feng, B., Cheng, N., Wang, R., Hicke, Y., Surbehera, S., Raghavan, A., Siemenn, A., Singh, N., Lynch, J., Shporer, A., Verma, N., Buonassisi, T., & Solar-Lezama, A. (2024). A Dataset for Learning University STEM Courses at Scale and Generating Questions at a Human Level. Proceedings of the AAAI Conference on Artificial Intelligence, 37(13), 15921-15929. https://doi.org/10.1609/aaai.v37i13.27091