1. Introduction
Over the past few years, the advancement of natural language processing (NLP) has achieved significant success in many realistic applications such as speech and entity recognition, summarization, language translation, and text generation. Researchers have also used recurrent neural network (RNN) models in many applications because of their recurrent structure to remember dependencies in texts [
1,
2,
3]. However, an RNN model has limitations and cannot handle extremely long-range dependencies in natural language texts [
2]. Therefore, the Transformer architecture was introduced [
4] to alleviate this problem. It is an autoregressive (
https://en.wikipedia.org/wiki/Autoregressive_model, accessed on 10 February 2023) and self-supervised (
https://en.wikipedia.org/wiki/Self-supervised_learning, accessed on 10 February 2023) language model. Transformer also has a self-attention mechanism that determines the relationship and relevance of different parts of the input. This makes the model very robust for understanding the relationship between words in a sentence regardless of their position. Generative pretrained transformer (GPT)-3 is a large language model (LLM) based on the Transformer architecture [
5,
6] that has achieved significant success in NLP tasks. GPT-3 has approximately 175 billion trainable parameters and was trained on extensive text data (approximately 570 GB of text); it is capable of generating human-like text and performing other language-related tasks with high accuracy.
ChatGPT is an NLP model developed by OpenAI (
https://openai.com/, accessed on 5 January 2023) and was launched in November 2022. ChatGPT [
7] emerged as a breakthrough LLM that can generate text and maintain a human-like conversational style. As GPT-3 is trained on vast amounts of Internet data and is suitable for a wide range of downstream tasks, the ChatGPT model begins with the GPT-3 pretrained LLM. However, GPT-3 has the disadvantage of having “
poorly characterized behavior” [
8]. Therefore, to avoid toxic and untruthful outputs, ChatGPT relies on three different strategies: (i) supervised fine-tuning, (ii) reward modeling, and (iii) reinforcement learning. The process begins by collecting a dataset of labeler demonstrations used to fine-tune GPT-3 with supervised learning. Next, a dataset of rankings of model outputs is used to further fine-tune the supervised model with reinforcement learning from human feedback. The resulting model is called InstructGPT [
8]. Unlike other AI-based language models, ChatGPT generates and presents entirely new content in a real-time conversation with the user. Furthermore, ChatGPT can consistently maintain a style of dialogue that engages the user in a more realistic way, rather than providing irrelevant answers to each question. This distinguishes ChatGPT from other LLMs.
ChatGPT has exhibited top performance in many application domains, such as coherent content and essay generation, chatbot responses, language translation, question answering, and programming code [
9,
10]. In addition, research is underway to fine-tune such LLMs for specific tasks and apply transfer learning in new domains. In the context of education, both students and educators can use ChatGPT for many academic and research purposes. Educators can take advantage of ChatGPT to prepare an outline of a particular course, topic-related content for lectures, presentations on academic topics, questions, problem sets, etc. Similarly, students can be assisted by ChatGPT in solving complex problems and questions, writing essays, and explaining a specific topic to accelerate their learning [
10,
11]. Students can even receive programming-related support here to accelerate their learning of programming. ChatGPT has made significant progress; however, there are concerns about misuse [
12,
13,
14]. Therefore, it is important to consider the potential threats (e.g., the integrity of online exams and question answering) alongside the many good applications of ChatGPT for education. Some experts have expressed concern about the future of some common practices such as programming in the era of ChatGPT [
13]. Therefore, it is important to rationally evaluate the situation and prepare a suitable future educational plan in the presence of tools such as ChatGPT.
In recent decades, many technologies have emerged that have occasionally disrupted traditional practices. Therefore, people need to evaluate and consider the benefits and threats of such new technologies [
10]. In the past, questions have been raised about Google and how this tool will change the way people think, read, and memorize [
15]. Another example is the Massive Open Online Course (MOOC), which gained a significant amount of attention in the early 2010s but later declined because of its strategies and business models [
16]. These concerns also apply to ChatGPT, which has great potential but also poses significant dangers. Above all, ChatGPT can be used as an educational technology in many ways, including as a tutor, a language model, and a research and teaching assistant. Furthermore, ChatGPT is distinguished from other LLMs by some of its special characteristics such as accessibility, personalization, conversational format, and cost-effectiveness. Numerous studies have been conducted to explore the application of artificial intelligence (AI) in education, including chatbots [
17], programming support [
18,
19], language models [
2,
20], and NLP tools [
21]. However, ChatGPT has recently been launched and is also a relatively new technology in the educational domain. To the best of our knowledge, no research has (
i) conducted a comprehensive survey with students and teachers to find out how ChatGPT supports programming learning and teaching and (ii) addressed the opportunities, threats, and strategies of ChatGPT for education, research, and particularly programming education.
Aims and Contributions
In this study, we explore the opportunities, challenges, and strategies of using ChatGPT in education and research and identify strategies for potential threats. To demonstrate the effectiveness of ChatGPT in programming support, we performed coding-related experiments, such as code generation from problem descriptions, pseudocode generation of algorithms from texts, and code correction. These generated solution codes were validated using an online judge system. In addition, we conducted a comprehensive survey with students and teachers to find out how ChatGPT supports programming learning and teaching. The main contributions of this study are as follows:
Investigated the opportunities and possible threats of using ChatGPT in educational settings, particularly in programming education
Performed experiments with ChatGPT to illustrate how this tool can be used to support programming learning
Conducted a comprehensive survey with students and teachers to find out how ChatGPT supports programming learning and teaching
Presented threat-mitigation strategies in the presence of AI tools such as ChatGPT
Discussed future educational plans and curricula in light of such revolutionary AI tools
The rest of the paper is organized as follows:
Section 2 presents related work,
Section 3 explores the opportunities of ChatGPT from educator and student perspectives,
Section 4 presents programming learning with ChatGPT,
Section 5 presents the survey results and analysis,
Section 6 explores the potential threats of ChatGPT and strategies to address them,
Section 7 presents the limitations, and finally,
Section 8 concludes the study.
2. Related Literature
In this section, we present recent studies on education and research using ChatGPT from the perspective of educators and learners. We also summarize published studies that use ChatGPT for different educational fields, including science, medicine, and engineering. Due to the novelty of the topic, we found few peer-reviewed scholarly papers. However, we reviewed preprints (non-peer-reviewed academic papers) in different educational branches. Muneer [
22] presented the potential of AI and NLP to improve academic performance. The study used a case study with ChatGPT to show that it can significantly enhance academic research in economics and finance. Dowling and collaborators [
23] discovered the benefits of ChatGPT for their research in finance. They also mentioned the ethical implications of this revolutionary AI tool. Rudolph et al. [
24] discussed the opportunities and challenges of using ChatGPT in education. Moreover, some basic features and user interfaces of ChatGPT were presented in [
24]. Furthermore, several studies [
9,
10,
11,
12,
14] have published applications of ChatGPT in education.
Frieder et al. [
25] tested the mathematical capabilities of ChatGPT on GHOSTS and handcrafted datasets. GHOSTS is the first collection of data expressed in natural language that was created and maintained by active mathematicians. Their experimental results showed that “
the mathematical abilities of ChatGPT are significantly below those of an average mathematics student. ChatGPT often understands the question but does not provide correct solutions.” The performance of ChatGPT as an assistant in medical education is also significant. Kung et al. [
26] evaluated the performance of ChatGPT on the US Medical Licensing Exam, which includes three exams: Step 1, Step 2CK, and Step 3. Their experimental results show that ChatGPT performed near the minimum passing requirement on all three exams despite receiving no specific guidance or support. Gilson et al. [
27] investigated the performance of ChatGPT on the medical licensing exam. These results suggest that ChatGPT can support medical education. A summary of articles related to medical education using ChatGPT can be found in [
28].
Computer programming is a complex task requiring correct logical and syntactic implementation. AI-based models have achieved plausible success on a variety of programming tasks, including code repair, summarization, completion, correction, classification, and generation [
20,
29]. Alphacode [
30] is a state-of-the-art LLM developed and trained specifically to support competitive-level programming. Both ChatGPT and Alphacode perform coding-related tasks by digesting a large amount of human-generated text [
31]. ChatGPT is a more general conversation engine, whereas Alphacode is more specialized for programming [
31], even though these two systems use “
virtually the same architecture” [
31]. Moreover, although ChatGPT was not designed and developed for automatic code repair, this tool is still suitable for it. The performance of code debugging with ChatGPT has been presented in the study [
32]. Their experimental results show that the bug-fixing performance of ChatGPT is competitive with deep learning models such as Codex and CoCoNut. Moreover, ChatGPT significantly outperformed traditional code-repair approaches [
32]. Jalil et al. [
33] studied the performance of ChatGPT in solving software testing curriculum questions. ChatGPT could generate correct/partially correct answers and explanations in approximately 44% and 57% of cases, respectively.
Malinka and collaborators [
34] studied the impact of ChatGPT on higher (university) education, especially on subjects related to computer programming. They demonstrated the effectiveness and usability of ChatGPT for handling programming assignments, exams, and homework using collected data. They also presented the misuse and benefits of ChatGPT for computer science education. In the study [
35], the programming capabilities of ChatGPT for solving numerical problems were investigated. Generating code, debugging and improving code written by humans, completing code, and rewriting code in different programming languages were tested with ChatGPT for numerical algorithms. They considered several numerical problems, including the diffusion equation, Poisson equation, compressible inviscid flow, incompressible Navier–Stokes equations, solving linear systems of equations, eigenvalue problems, and storing sparse matrices. The experimental results showed that ChatGPT can successfully generate program codes for numerical problems, but there are limitations and challenges in solving problems. Moreover, ChatGPT has recently been applied to tasks in different domains such as software engineering [
36], medical education [
37], data augmentation [
38], fifth industrial revolution [
39], code generation [
40], intelligent vehicles [
41], and solving AI tasks [
42]. At the same time, researchers are concerned about how ChatGPT will affect general programming practices in the future [
13]. For the convenience of researchers, we have summarized some studies using ChatGPT in
Table 1.
3. Opportunities with ChatGPT
ChatGPT is a powerful LLM developed by OpenAI that has the potential to transform our technological interactions and lead to a significant paradigm shift. Many academic articles have been published on ChatGPT, but a review of the literature on the effects of ChatGPT revealed various viewpoints ranging from favorable to unfavorable. In mathematics and science education, calculators have become an inseparable part; similarly, ChatGPT will be an important tool for daily writing and work [
50]. Sharples [
51] encourages educators and learners to take advantage of the available capabilities of AI tools such as ChatGPT rather than forego their use. In this section, we present the prospects and opportunities of ChatGPT for education and research from the perspective of learners, educators, and researchers.
3.1. Opportunities for Learners
ChatGPT offers many possibilities, and this tool can be a good assistant for learners. Learners can use this tool to understand and solve complex problems. For learners who prefer experimental and hands-on learning, ChatGPT is an excellent platform to achieve this [
24]. One of the biggest advantages of ChatGPT is its ability to understand and respond to natural language queries. This allows learners to ask ChatGPT a question in the same way they would ask their tutors, which makes ChatGPT more intuitive and learner-friendly. It can be used at all levels of education, from elementary to higher education, and even for professional development. The ChatGPT model can help students develop their reading and writing skills by providing suggestions (e.g., syntactic and grammatical), and it can create practice exercises and quizzes for various subjects (e.g., mathematics, physics, language, and literature). Moreover, the model can create explanations and step-by-step solutions to a given problem, which can help learners develop problem-solving skills and analytical and out-of-the-box thinking.
Furthermore, ChatGPT can be used for group discussions and debates by providing personalized guidance to learners during the discussion, and it can support learners with disabilities by providing services such as speech-to-text and text-to-speech. The ChatGPT model can be a professional tutor for developing language skills, programming, report writing, project management, and technical (e.g., medical, legal, and IT) report writing. More interestingly, learners can argue with ChatGPT about the given explanations, solutions, and other suggestions. Therefore, learners receive interactive help from ChatGPT anytime and anywhere. In addition, we experimented with the ChatGPT model to find the derivatives of mathematical equations, and it solved all of the equations we tried correctly (100% correctness in this experiment).
Figure 1 shows the capabilities of ChatGPT for technical education.
3.2. Opportunities for Educators
As an LLM, ChatGPT can be a valuable tool for educators in many ways. Educators can take advantage of using ChatGPT for effective teaching and research. Here are some examples that can demonstrate the effectiveness of ChatGPT for teaching and research.
Lesson Planning: ChatGPT can be used to create lesson plans for specific courses, such as math, chemistry, physics, computer science, civil engineering, language, and literature. ChatGPT provides topic-specific illustrations, activities, and exercises to help educators better teach their students. ChatGPT can also be used to generate topic-specific quiz questions tailored to the subject matter and difficulty level. For example, we asked ChatGPT to “
prepare a detailed outline for the Algorithms and Data Structures course”, as depicted in
Figure 2. It creates a complete table of contents for the Algorithms and Data Structures course, showing topics and a breakdown of each topic, along with learning objectives.
Personalized Learning Support: Educators can use ChatGPT to provide personalized learning support for their students. Depending on a student’s needs and learning style, ChatGPT can suggest customized resources and learning activities. For instance, educators can use ChatGPT to analyze student performance data and identify areas where students are struggling with particular concepts or algorithms. An educator might notice that a particular student is struggling with sorting algorithms. In this case, the educator can take advantage of ChatGPT to generate customized resources based on that student’s learning style and abilities (e.g., a video tutorial on a specific sorting algorithm that the student is struggling with or a coding exercise to reinforce that concept).
Figure 3 depicts the personalized learning steps.
Answering Learners’ Queries: Educators can use ChatGPT to help answer learners’ questions. For example, if a learner asks the question “
Which sorting algorithm should we use for which data field?”, ChatGPT can provide summary information in this case (see
Figure 4), which can be useful for teachers. In addition, teachers can ask ChatGPT for explanations and examples on a complex topic to obtain accurate and tailored information.
Rapid Assessment and Evaluation: Educators can also leverage the power of the ChatGPT model to assess and evaluate learner assignments and quizzes. The model can be used to check submitted assignments for plagiarism. Interestingly, the model can generate questions/quizzes on the basis of different difficulty levels (e.g., high, medium, easy) on the same topic. For example, as depicted in
Figure 5, we asked ChatGPT to create a
high-difficulty quiz on “
Sorting and Searching Algorithms”. We also asked ChatGPT for quizzes at other levels, such as medium and easy, and we found that there was a significant difference between quizzes of different levels. In addition, the model can be used to grade assignments and quizzes [
9]. This can save educators a significant amount of valuable time.
Apart from the above benefits, educators can use ChatGPT for language learning support, personalized feedback, professional development, and research.
3.3. Opportunities for Researchers
The ChatGPT model offers many advantages to researchers. First, it can effectively support the writing process of research. At its most basic, it can improve writing by finding and correcting typographical errors, improving grammatical inconsistencies, providing advanced vocabulary, and recommending improvement strategies. This allows researchers to devote more time to experimentation and implementation. The model can also summarize published work on a particular topic, which helps researchers understand the work. It can also provide clues and research ideas by analyzing a specific topic. For example, we asked ChatGPT to provide an unexplored research idea on “
how to reduce errors related to resource constraints (time and memory limit exceeded) in code for programmers”. Based on this query, ChatGPT provided some interesting and promising research ideas, as illustrated in
Figure 6.
4. Programming Learning with ChatGPT
The importance of computer programming in professional and academic fields is significant. Experienced programmers have demonstrated improvement in both professional and academic fields [
52]. Programming skills are acquired through repeated practice. To assist programmers, deep-learning-based tools are introduced for code repair, completion, error detection, optimization, verification, and classification [
1,
2,
19,
20,
53,
54]. In recent years, LLMs based on the Transformer architecture (e.g., CodeBERT, Codex, and PyMT5) have achieved state-of-the-art results for various programming tasks [
55]. ChatGPT is an LLM based on the Transformer architecture and has received significant attention because of its human-like conversational style. The application of ChatGPT is not limited to language-related tasks but also finds use in programming learning applications (e.g., code suggestion, optimization, completion, and error detection). However, the quality and suitability of these applications for programming learning remain unclear. Therefore, we evaluate and analyze the performance of ChatGPT in various programming learning tasks as follows.
4.1. Conceptual Understanding
A clear concept is a basic requirement for improved programming performance. ChatGPT can provide explanations and examples of various programming concepts (e.g., data structures, algorithms, languages, and programming language syntax) in a concise, simple, and understandable manner. For example, when we asked ChatGPT to “
provide a pseudocode for a selection sort and an explanation”, it generated the pseudocode and explanation, as depicted in
Figure 7. The results demonstrate that ChatGPT can generate easy-to-understand explanations and pseudocode that are useful for learners to understand algorithmic concepts.
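For reference, the algorithm discussed here can be sketched in a few lines of Python. This is an illustrative sketch only, not the pseudocode produced by ChatGPT in Figure 7; the function name `selection_sort` is an assumption chosen for the example.

```python
def selection_sort(values):
    """Return a new list with the items of `values` in ascending order."""
    result = list(values)  # copy so the caller's list is not modified
    n = len(result)
    for i in range(n - 1):
        # Find the index of the smallest item in the unsorted tail result[i:].
        min_index = i
        for j in range(i + 1, n):
            if result[j] < result[min_index]:
                min_index = j
        # Move that item to the front of the unsorted part.
        if min_index != i:
            result[i], result[min_index] = result[min_index], result[i]
    return result


print(selection_sort([5, 2, 4, 6, 1, 3]))
```

Each pass selects the minimum of the remaining items, so the sorted prefix grows by one element per iteration of the outer loop.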
4.2. Solution Code Generation
Programmers can use ChatGPT to generate solution codes based on a problem description. Such solution codes can assist programmers in their programming learning phase. Therefore, we experimented to evaluate the performance of ChatGPT in generating code based on problem descriptions. In the experiment, we used the “
Algorithms and Data Structures” problem description from the Aizu Online Judge (AOJ) system [
56] and verified the correctness of the codes generated by ChatGPT on the same platform and in a basic compiler. We selected eight problems at random and generated code three times for each problem from the same description, giving 24 generated programs in total. We then validated the generated codes in both environments.
Table 2 shows the correctness of the generated codes based on the problem descriptions using ChatGPT.
In addition,
Figure 8 shows the comparative accuracy of the generated codes executed in the AOJ platform and a basic compiler. The following observations can be drawn: (i) the correctness rate of the generated code based on the basic compiler is approximately 95.83%; (ii) the correctness rate based on AOJ compilation is approximately 75%; (iii) when running a submitted code on the AOJ platform, various constraints (time, memory, etc.) and output formatting are taken into account, which may be a reason for the lower accuracy on the AOJ platform than the basic compiler; (iv) ChatGPT generates mostly correct code considering long problem descriptions, including constraints, algorithms, and input and output formatting.
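The reported rates are consistent with a simple count over the generated programs (eight problems, three generations each). The short sketch below is a worked check under the assumption that 23 programs were correct in the basic compiler and 18 were accepted on AOJ; these counts are inferred from the percentages and are not stated in the text, and all names in the snippet are hypothetical.

```python
PROBLEMS = 8
GENERATIONS_PER_PROBLEM = 3
TOTAL_RUNS = PROBLEMS * GENERATIONS_PER_PROBLEM  # 24 generated programs

# Assumed counts, inferred from the reported percentages (not given in the text).
ASSUMED_CORRECT = {"basic compiler": 23, "AOJ": 18}


def correctness_rate(correct, total):
    """Return the percentage of correct programs, rounded to two decimals."""
    return round(100.0 * correct / total, 2)


for environment, correct in ASSUMED_CORRECT.items():
    rate = correctness_rate(correct, TOTAL_RUNS)
    print(f"{environment}: {correct}/{TOTAL_RUNS} = {rate}%")
```

Under these assumed counts, a single failing program accounts for the gap between 100% and the basic-compiler rate, which illustrates how coarse the percentages are with only 24 runs.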
4.3. Error Checking and Debugging in Code
Error checking and debugging code is a tedious and time-consuming task for learners, requiring them to check code line-by-line to identify errors and their locations. ChatGPT can identify errors in code and provide potential suggestions and code snippets. We experimented with ChatGPT to debug erroneous codes. To do this, we collected erroneous codes from the AOJ system. We asked the questions “
Does this code have a bug? How can it be fixed?” to ChatGPT, and it responded with valuable suggestions, code snippets, and occasionally whole codes, as depicted in
Figure 9. The suggestions proved to be interesting to the programmers and helpful in resolving the code errors.
4.4. Solution Code Optimization
Code optimization is important in competitive programming, where all test cases and applied constraints must be met. ChatGPT can help optimize codes by suggesting ways to reduce the memory usage and time complexity. Moreover, the explanations provided by ChatGPT based on code reviews are valuable and help programmers better understand the problem and error in the code. We tested using ChatGPT for code optimization. For this experiment, we collected codes that received the time limit exceeded (TLE) or memory limit exceeded (MLE) decision from the AOJ platform, implying that optimization is required to reduce the memory usage and time complexity to be accepted. We asked the question “
The following code passed most of the test cases on the AOJ platform, but received a TLE error decision, how can I optimize the code?” to ChatGPT. In response, ChatGPT provided a useful explanation on the basis of a review of the code as well as optimized code, as depicted in
Figure 10. The generated optimized code was validated on the AOJ platform, and it passed all test cases (10/10) and was accepted for the
Bubble Sort problem (
https://onlinejudge.u-aizu.ac.jp/courses/lesson/1/ALDS1/4/ALDS1_2_A, accessed on 21 February 2023). This is a rare case (or an example of erroneous code), but ChatGPT optimized and fixed it. Note, however, that this problem does not require an advanced algorithm such as Merge Sort to solve it. It appears that ChatGPT can be a powerful tool to assist programmers in their programming-learning process.
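To make this kind of optimization concrete, the following minimal Python sketch shows one common Bubble Sort refinement: stopping as soon as a full pass makes no swaps. It is an illustration only and is not the code shown in Figure 10; the function name and the returned swap count are assumptions made for the example.

```python
def bubble_sort(values):
    """Sort a copy of `values` in ascending order; return (sorted_list, swap_count)."""
    result = list(values)
    swap_count = 0
    n = len(result)
    for i in range(n - 1):
        swapped = False
        # After i passes, the first i positions already hold their final values.
        for j in range(n - 1, i, -1):
            if result[j] < result[j - 1]:
                result[j], result[j - 1] = result[j - 1], result[j]
                swap_count += 1
                swapped = True
        if not swapped:
            break  # no swaps in a full pass: the list is already sorted
    return result, swap_count


print(bubble_sort([5, 3, 2, 4, 1]))
```

The early exit helps on inputs that are already sorted or nearly sorted; the worst case remains quadratic in the number of elements.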
Furthermore, the ChatGPT model can be used for daily practice, learning resources, and personalized programming support for both beginners and advanced programmers. This platform can be a significant assistant for programming learners to better develop concepts, logic, programming language, and coding skills.
5. Survey on ChatGPT Language Model for Programming Learning and Teaching Support
We conducted a comprehensive survey among undergraduate, Master’s, and doctoral students and teachers to learn how ChatGPT supports programming learning and teaching. Our survey included several questions, such as the respondents’ identification, current programming experience, whether they receive programming learning and teaching support from ChatGPT, how useful this support is in solving programming problems and teaching, and how satisfied they are with this support. The questionnaire for students concerned programming learning support, and the questionnaire for teachers concerned teaching support. In this section, we present the results of the survey in detail.
Of the participating students, 61.3% (35.5% first year, 16.1% second year, 6.5% third year, and 3.2% fourth year) were undergraduate students, 32.2% (16.1% first year and 16.1% second year) were Master’s students, and 6.5% were second-year doctoral students.
Figure 11 depicts the students’ participation in this survey.
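As a quick consistency check on the participation figures, the per-year percentages can be summed per group. The sketch below uses only the percentages reported above; the variable and function names are hypothetical.

```python
# Percentages as reported in the text.
undergraduate = {"first year": 35.5, "second year": 16.1, "third year": 6.5, "fourth year": 3.2}
masters = {"first year": 16.1, "second year": 16.1}
doctoral = {"second year": 6.5}


def group_total(group):
    """Sum the per-year percentages of one group, rounded to one decimal."""
    return round(sum(group.values()), 1)


totals = {
    "undergraduate": group_total(undergraduate),
    "master's": group_total(masters),
    "doctoral": group_total(doctoral),
}
overall = round(sum(totals.values()), 1)
print(totals, overall)
```

The group totals reproduce the reported 61.3%, 32.2%, and 6.5%, and together they account for all participating students.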
We asked students to rate their programming experiences on a scale of 1 to 5, and received a score of 1 from about 29% of students, 2 from 25.8%, 3 from 25.8%, 4 from 9.7%, and 5 from 9.7%. We found that most students rated their programming experiences between 1 and 3.
Figure 12 shows the programming skills of the participating students.
In response to the question “Have you taken help/support from ChatGPT for solving programming problems?”, 78.8% of the students answered “Yes” and 21.2% answered “No”. These results show that a large share of students have used ChatGPT to support their programming learning. With the question “Do ChatGPT’s suggestions help you in solving programming problems?”, we assessed how helpful students find ChatGPT’s suggestions; about 86.7% of the students answered “Yes”. Moreover, we asked students “What kind of problems have you solved with the help of ChatGPT?” and gave them two options, “simple problem” and “complex problem”. The answers were interesting: about 59.3% of students used ChatGPT to solve “complex” problems and about 40.7% used it to solve “simple” problems. When asked whether ChatGPT is useful for learning programming, 92.9% of the students answered “Yes”. Finally, we asked students, “How satisfied are you with the use of ChatGPT? Please give a rating on a scale 1–5.” and received a rating of 1 from 3.3% of students, 2 from 10%, 3 from 30%, 4 from 36.6%, and 5 from 20%. It can be seen that most students were satisfied with the programming support provided by ChatGPT.
A similar survey was conducted with teachers to evaluate how ChatGPT supports programming teaching and research. To the question “Have you taken help/support from ChatGPT for teaching programming?”, about 60% of the teachers answered “Yes”. For questions about programming teaching and research, most teachers were satisfied with the suggestions provided by ChatGPT.
6. Threats and Strategies
As an AI LLM, ChatGPT can play an important role in education and research. The capabilities of this powerful tool are not limited to these fields but also apply to many others. However, despite the many advantages of using ChatGPT, there are also challenges in using it for education and research, especially for technical education such as computer programming. Because ChatGPT can generate texts that are nearly indistinguishable from human-generated texts in high-level cognitive tasks, there are concerns about its potential misuse in education and research. In this section, we present the challenges and possible strategies for using ChatGPT.
Integrity of assignments and online exams: Online exams have become common in higher education. As ChatGPT can generate human-like text on academic topics, educators and institutions need to be aware of the possibility of cheating in online exams using ChatGPT. In short, ChatGPT threatens the fairness and validity of online exams and assignments. Educators and institutions can adopt several strategies to address these issues. Students can be given clear instructions on how to structure their assignments and answer questions in online exams [
57]. Students can send their assignments to teachers for review before final submission. An advanced plagiarism-detection tool can be used to detect AI-generated texts. Furthermore, advanced exam supervision/proctoring techniques could be effective for online exams [
14]. In this context, further research is required to fully understand the impact of AI LLMs such as ChatGPT and strategies for combating the misuse of ChatGPT.
Blind reliance on generative AI tools: A heavy reliance on generative AI tools such as ChatGPT can negatively affect education and research. This is because the ease of obtaining answers, problem-solving strategies, and scientific text generation can limit critical thinking and problem-solving skills. Recently, ChatGPT has been credited as an author on published papers and preprints [
58]. This also raises questions about writing essays and research articles. The CEO of OpenAI warns against blind reliance on ChatGPT, saying:
“ChatGPT is incredibly limited but good enough at some things to create a misleading impression of greatness. It’s a mistake to be relying on it for anything important but a preview of progress. We have lots of work to do on robustness and truthfulness.”
— Sam Altman, CEO of OpenAI
To address this issue, it is important for students, educators, and researchers to be aware of the limitations of LLMs and to use these tools only as supportive tools to enhance research and learning [
59]. Further research is required to design academic curricula, question-and-answer patterns, assignments, and exams to address the challenges raised.
Difficulty in evaluating the ChatGPT-generated answers and texts: As an AI LLM, ChatGPT uses complex algorithms and statistical models to generate answers and text on the basis of patterns learned from large amounts of text data. The answers and texts generated by ChatGPT are becoming indistinguishable from those written by humans. This poses a challenge to educators and researchers [57, 60, 61, 62]. Existing plagiarism-detection tools are finding it increasingly difficult to distinguish between AI- and human-generated texts. As a result, restrictions have been placed on the use of ChatGPT in educational institutions [63]. Cotton et al. [57] presented several indicators for recognizing text from LLMs such as ChatGPT, including language inconsistencies, a lack of proper citations, factual errors, ambiguity, and poor context awareness. Further research on the development of new technologies (e.g., AI-based plagiarism detectors) is needed to ensure the integrity of education and research.
Ethical implications and potential biases: The ethical implications and biases of using ChatGPT in education and research should be carefully considered. LLMs rely heavily on their training data, and when those data contain biases or anomalies, the results can be unfair. For example, if the training data are biased toward certain people or cultures, the model may produce unfair or discriminatory output. Therefore, it is imperative to ensure that the training data are diverse and well-balanced. ChatGPT and other AI language models can also be used to generate fake news, hate speech, and other harmful content, which can lead to social unrest, reputational damage, and even physical harm. Furthermore, the internal mechanisms and processes of these models are not sufficiently transparent to users, so it is important to ensure that their decision-making processes are made transparent. Because ChatGPT generates responses without human intervention, it can be difficult to hold anyone accountable for the responses generated, which in turn makes it difficult to address ethical concerns or biases. ChatGPT and other generative models involve the collection and processing of personal data, which raises concerns about privacy and data security. Appropriate measures should be taken to prevent unauthorized access to personal data.
Critical thinking and problem-solving skills: ChatGPT can generate nearly accurate answers to technical questions on a wide range of topics, as well as correct or partially correct programming code from problem descriptions, algorithm names, problem names, and similar prompts. Simply acquiring answers and code from ChatGPT can be a barrier to improving learners’ critical thinking and problem-solving skills. To the best of our knowledge, there are no tools that can recognize code generated by AI models; thus, solution codes generated by AI models can be submitted in academic coding exams and competitions. This poses a challenge to educators on how to deal with this new situation.
However, there are some strategies for determining whether responses and programming codes were generated by ChatGPT. Look for telltale signs: ChatGPT responses typically have certain characteristics, such as a lack of personalization or a rather generic tone. Furthermore, the generated programming code typically follows uniform syntax and formatting conventions. Check for coherence: ChatGPT responses may not have a consistent or logical flow, especially when it is generating answers to complex questions. If the answers appear disjointed or nonsensical, this may indicate that they were generated by ChatGPT or another AI model. Compare responses: We can compare a submitted response with responses generated by ChatGPT, by other language models, or by humans. If the answer is identical to answers generated by ChatGPT, it may be a sign that the answer was not written by a human. Use plagiarism-detection tools: We may also use plagiarism-detection tools to determine whether an answer contains programming code copied from somewhere else. This can help detect cases of fraud. In addition, if cheating with programming code is suspected, we can ask follow-up questions to determine the depth of the learner’s understanding of the answer.
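The "compare responses" strategy above can be sketched in code. The following is a minimal illustration only, not a method used in this study: it scores the textual similarity between a submission and one or more reference answers (e.g., answers obtained from ChatGPT for the same problem) using Python's standard `difflib` module. The function names and the threshold of 0.9 are hypothetical choices, and a high score is at most a reason for manual review, not proof of AI generation.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Collapse all whitespace so formatting differences do not affect the score."""
    return " ".join(text.split())


def similarity(submission: str, reference: str) -> float:
    """Return a ratio in [0, 1]; 1.0 means identical after normalization."""
    return SequenceMatcher(None, normalize(submission), normalize(reference)).ratio()


def flag_for_review(submission: str, references: list[str], threshold: float = 0.9) -> bool:
    """Flag a submission whose similarity to any reference reaches the threshold.

    The threshold is an assumed, illustrative value and would need tuning.
    """
    return any(similarity(submission, ref) >= threshold for ref in references)


if __name__ == "__main__":
    student_code = "a  =  1\nprint(a)"
    chatgpt_code = "a = 1 print(a)"
    print(similarity(student_code, chatgpt_code))
    print(flag_for_review(student_code, [chatgpt_code]))
```

Character-level similarity is easy to evade by renaming variables or reordering statements, so a check of this kind complements, rather than replaces, the follow-up questions described above.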
7. Limitations
It is worth noting that our experiments were conducted on ChatGPT, which is currently in an active development phase. During the experiments, we obtained significant results in code generation, error checking and debugging, and optimization of the solution code. The results may vary for the following reasons: (i) a new release of ChatGPT may produce different results; (ii) questions different from those presented in this study may be asked; (iii) problem descriptions may differ; and (iv) the solution codes submitted for optimization may differ.
Apart from that, the experimental results show that the accuracy of the generated code is 95.83% on the basic compiler and 75% on AOJ, giving an average accuracy of 85.42%. Although about 85.42% of the code generated by ChatGPT is correct, it still has limitations in generating code from a problem description. In some cases, adjustments to the generated code may be necessary to achieve error-free compilation and acceptance.
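The reported average is consistent with an unweighted mean of the two platform accuracies. The sketch below assumes that this is how the figure was obtained (the text does not state the method) and that the result is rounded half-up to two decimal places; the variable names are illustrative.

```python
from decimal import Decimal, ROUND_HALF_UP

# Accuracy figures reported in the text, in percent.
basic_compiler_accuracy = Decimal("95.83")
aoj_accuracy = Decimal("75")

# Assumption: the average is the unweighted mean of the two figures.
mean_accuracy = (basic_compiler_accuracy + aoj_accuracy) / 2

# Round half-up to two decimal places, matching the precision used in the text.
average_accuracy = mean_accuracy.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

print(average_accuracy)  # prints 85.42
```

`Decimal` is used instead of binary floating point so that the exact midpoint 85.415 rounds predictably.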
Furthermore, in the survey for teachers, we asked the question, “How satisfied are you with the use of ChatGPT? Please give a rating on a scale of 1-5”. About 50% of teachers gave a rating of . This result also indicates that ChatGPT has not yet gained enough trust in programming education. Some teachers shared their real-life experiences, such as “ChatGPT is still not perfect to answer some exams of my Java class”. In addition, we asked ChatGPT to solve an elementary school quiz that involved counting the number of occurrences of the digit “8” in the number “348789623489109823647864351672”. It failed every time to give the exact count, which elementary students can obtain easily. This shows a clear limitation of ChatGPT and the risk of trusting it blindly.
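For reference, the digit-counting quiz above can be checked deterministically with a few lines of code. The following is a minimal sketch; the function name `count_digit` is illustrative and does not come from the study.

```python
def count_digit(number_text: str, digit: str) -> int:
    """Count how many times a single digit character occurs in a numeric string."""
    if len(digit) != 1 or not digit.isdigit():
        raise ValueError("digit must be a single character from 0 to 9")
    return sum(1 for ch in number_text if ch == digit)


# The number used in the quiz described in the text.
number = "348789623489109823647864351672"

print(count_digit(number, "8"))  # prints 5
```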
8. Conclusions
ChatGPT and other AI LLMs have the potential to serve as supporting tools for educational and research work. ChatGPT is a revolutionary LLM that can maintain human-like conversations and, in response to natural language queries, generate text that is nearly indistinguishable from human-written text. The model can be used to answer questions, write essays, solve problems, explain complex topics, provide virtual tutoring, practice languages, learn programming, teach, and support research. Furthermore, the ChatGPT model can be used to solve technical (e.g., engineering and computer programming) and non-technical (e.g., language and literature) problems. Our surveys and experimental results show that ChatGPT is useful not only for programming education but also for education and research more broadly. However, although ChatGPT is a powerful tool that can generate impressive responses on a variety of topics, it still has certain limitations, such as a lack of common sense, potential bias, difficulty with complex reasoning, and inability to process visual information. It is important to keep these limitations in mind when using ChatGPT, and it should not be relied upon blindly. In addition, the ethical implications of ChatGPT (e.g., bias and discrimination, privacy and security, misuse of technology, accountability, transparency, and social impact) are complex and multifaceted and should be carefully considered.
Despite the various difficulties and challenges, we believe that the risks discussed can be effectively managed and must be addressed to provide reliable and equitable access to LLMs for educational and research purposes.