CSEPrompts: A Benchmark of Introductory Computer Science Prompts

Published: 17 June 2024

Abstract

Recent advances in AI, machine learning, and NLP have led to a new generation of Large Language Models (LLMs) that are trained on massive amounts of data and often have trillions of parameters. Commercial applications (e.g., ChatGPT) have made this technology available to the general public, making it possible to use LLMs to produce high-quality text for academic and professional purposes. Schools and universities are aware of students' increasing use of AI-generated content and have been studying the impact of this new technology and its potential for misuse. Educational programs in Computer Science (CS) and related fields are particularly affected because LLMs can also generate programming code in a variety of programming languages. To help understand the potential impact of publicly available LLMs on CS education, we introduce CSEPrompts (https://github.com/mraihan-gmu/CSEPrompts), a framework with hundreds of programming exercise prompts and multiple-choice questions retrieved from introductory CS and programming courses. We also provide experimental results on CSEPrompts that evaluate the performance of several LLMs at generating Python code and answering basic computer science and programming questions.
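To give a concrete sense of what evaluating an LLM on a CSEPrompts-style coding exercise involves, the snippet below is a minimal sketch only: the actual data format and evaluation harness are defined in the linked repository, and the names passes_tests, candidate, and tests are hypothetical. It runs an imagined model completion against an exercise's unit tests in a fresh subprocess and reports pass or fail.

    # Minimal sketch (hypothetical names; see the CSEPrompts repository
    # for the real data format and harness): execute an LLM-generated
    # solution plus its unit tests in a separate Python process.
    import subprocess
    import sys
    import tempfile
    import textwrap
    from pathlib import Path

    def passes_tests(candidate_code: str, test_code: str, timeout: int = 10) -> bool:
        """Return True only if every assertion passes within the time limit."""
        with tempfile.TemporaryDirectory() as tmp:
            script = Path(tmp) / "check.py"
            script.write_text(candidate_code + "\n\n" + test_code)
            try:
                result = subprocess.run([sys.executable, str(script)],
                                        capture_output=True, timeout=timeout)
            except subprocess.TimeoutExpired:
                return False  # a hanging solution counts as a failure
            return result.returncode == 0

    # An imagined introductory exercise and an imagined model completion;
    # real benchmark items come from the dataset itself.
    candidate = "def add(a, b):\n    return a + b"
    tests = textwrap.dedent("""
        assert add(2, 3) == 5
        assert add(-1, 1) == 0
    """)
    print("pass" if passes_tests(candidate, tests) else "fail")

Aggregating pass/fail outcomes of this kind over all coding prompts yields per-model scores of the sort the experiments report; multiple-choice questions can be scored by comparing the selected option against the answer key.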



Published In

Foundations of Intelligent Systems: 27th International Symposium, ISMIS 2024, Poitiers, France, June 17–19, 2024, Proceedings
Jun 2024
318 pages
ISBN: 978-3-031-62699-9
DOI: 10.1007/978-3-031-62700-2
Editors:
  • Annalisa Appice
  • Hanane Azzag
  • Mohand-Said Hacid
  • Allel Hadjali
  • Zbigniew Ras

Publisher

Springer-Verlag

Berlin, Heidelberg

Author Tags

  1. Benchmark Dataset
  2. Code LLM
  3. Prompting

Qualifiers

  • Article
