Why Is Programming So Difficult To Learn? Patterns of Difficulties Related To Programming Learning
1 http://mashable.com/2015/09/21/coding-schools-australia/?id=mash-com-fb-aus-link#Yv6gpyKnmGqh
2 http://www.npr.org/sections/ed/2016/01/12/462698966/the-president-wants-
ensino-de-programacao-a-escolas-do-brasil/
4 http://www.telegraph.co.uk/technology/news/10410036/Teaching-our-children-to-code-a-quiet-revolution.html
5 Site: https://www.youtube.com/watch?v=nKIu9yen5nc. What Most Schools
Copyright is held by the author.
ACM SIGSOFT Software Engineering Notes Page 2 November 2016 Volume 41 Number 6
Learning to program requires several skills, the most obvious being the ability to solve problems and a fundamental knowledge of math. Beyond these, Jenkins [15] states that it is necessary to know how to use a computer and how to create, compile, test, and debug a program, and that learning style and motivation influence the process of learning how to program.
Understanding the process of learning a first programming language can help in creating more effective learning environments [13], thereby reducing the difficulties encountered by beginners. Several researchers have sought information about these difficulties. Denny et al. [12] show that syntax errors are one of the barriers for programming novices, delaying the feedback provided to students about the logic of their code. Cechinel et al. reported that the most common problems are the inability to find errors, to develop a program that solves a task, and to modularize code using functions and procedures. The topics considered most difficult were functions and procedures, error handling, and arrays (vectors) [7].
Ribeiro et al. investigated the differences between textual and visual programming environments in introductory computer programming [20]. After analyzing data collected from NASA TLX, activity logs, and a survey, they concluded that visual programming is a good model for teaching algorithms and programming. Many other studies investigate whether a specific methodology, such as Agile [17], or code smells in novice programs [14] affect how people learn to program.
Lahtinen et al. conducted a survey at six universities in five countries and obtained responses from 559 students and 34 instructors. Answers were given on a scale from 1 (easy to learn) to 5 (very difficult). Regarding the educational content covered in the course, the students' average perception of difficulty (mean 2.8) is lower than the instructors' (mean 3.5). Students and instructors agree on the three topics considered most difficult, in this order: pointers, error handling, and recursion. Other topics also considered difficult were using language libraries and abstract data types. For both students and instructors, the three easiest topics were selection, repetition, and variables. However, neither students nor instructors consider learning the concepts to be the biggest problem for novice programmers; the biggest problem is applying them in practice [16].
This work contributes to the state of the art by identifying patterns of difficulties related to learning programming. As opposed to the traditional focus on syntax problems, our study focuses mainly on the semantic level in the procedural programming paradigm. Other studies cite difficulties, problems, and common errors but do not provide an in-depth understanding of the difficulties, their relations, and their relevance in multiple scenarios. Thus, knowledge about learning difficulties is spread thin across the literature, and there is little exploration of the problems faced by learners from outside the computer science area. Additionally, we observed that most related research relied on quantitative, questionnaire-based methodology. The studies that uncovered difficulties lacked research questions or objectives aimed at an in-depth understanding of the phenomena from the points of view of students and instructors. In this research, we systematically review the literature and collect data from students and instructors. Our study aims to provide this neglected in-depth understanding of the difficulties and to complement the dominant quantitative survey-based research on learning how to program.
3.1 Research Question 1
RQ1 – What is the unsuccessful rate in introductory programming courses?
First of all, it is important to explain the meaning of unsuccessful. In our research, unsuccessful means that the student did not complete the course or did not receive the grade necessary to pass it. To gather evidence about the problem we are dealing with, we will conduct a quantitative study of approvals and failures in introductory programming courses. Much of our data collection will be conducted at the University of São Paulo (USP), which annually offers Introduction to Programming courses to thousands of students from several different subject areas. Our goal is to discover the following: which courses are being offered, the profile of the students, the failure and drop-out rates, the profile of each instructor, and how these compare to the data obtained from the literature. We are also considering creating a survey and submitting it to several universities in different countries to seek information about the unsuccessful rate in introductory programming courses.
Methodology: The first step was to query the academic system for Introduction to Programming (IP) courses using three keywords: "programming," "algorithms," and "computing." The search returned a total of 207 courses. After analyzing the content of their programs, we selected a group of 31 courses for our research. Only 29 of these were considered, because two were new and their classes had not yet been completed. We obtained an anonymized database with the individual results of the 29 courses over the previous five years. We also obtained the school records of each student who attended one of the 29 courses. The preliminary results of the analysis of this database have been reported in two papers [5, 6]. We are currently analyzing the results over a longer period and cross-referencing additional data, such as the students' results in the university entrance exam and in other subjects. Our aim is to compare the results in the IP course with those in other courses and specific knowledge areas, such as Languages, Math, and Physics.
Validation test plans and publications: Some information has already been obtained, such as the percentage of failures, which corroborated the results obtained by Bennedsen and Caspersen [3]. We will also conduct a quantitative examination of the performance of students at the University of São Paulo, compare it with the results from the literature, and submit an article to a reputable journal.
Threats to validity and other challenges: Some factors may lead to errors in the data disclosed, such as errors in the extraction and compilation of the system data. To mitigate this, we selected a sample of the data for manual checking and compared it with data from other sources.
Timeline with Milestones from RQ1: Figure 1 below shows the timeline of RQ1.
[Figure 1: timeline of RQ1. Milestones shown: database analysis (5 years DB) and first paper; database analysis (5 years DB + questionnaire) and second paper; database analysis (10 years DB); submission of journal paper.]
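As an illustration of the RQ1 definition of "unsuccessful" (the student did not complete the course or did not obtain the required grade), the rate could be computed as in the minimal sketch below. The outcome labels and the sample data are assumptions made for illustration; the schema of the USP database is not described in this paper.

```python
from collections import Counter

# Hypothetical outcome labels: "failed" = grade below the required one,
# "dropped" = course not completed. The real database may use other codes.
UNSUCCESSFUL = {"failed", "dropped"}


def unsuccessful_rate(outcomes):
    """Return the fraction of enrollments that did not end in a pass."""
    if not outcomes:
        raise ValueError("no enrollments to analyze")
    counts = Counter(outcomes)
    unsuccessful = sum(counts[label] for label in UNSUCCESSFUL)
    return unsuccessful / len(outcomes)


# Made-up sample, not data from the study.
sample = ["passed", "failed", "dropped", "passed", "passed"]
rate = unsuccessful_rate(sample)
print(f"unsuccessful rate: {rate:.0%}")
```

Counting drop-outs together with failures follows the definition given for RQ1; reporting the two separately would only require returning the individual counts.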
3.2 Research Question 2
RQ2 – What difficulties have been reported in the literature with regard to learning how to program?
Methodology: A systematic literature review will be carried out to identify difficulties in learning how to program that have been reported and/or empirically investigated so far. We are currently refining the protocol, which includes a search string, databases, and inclusion/exclusion criteria. On this basis, the search will be carried out; this will involve identifying the primary studies, determining which difficulties have been reported and evaluated, and collating the results. A “model of difficulties” will be proposed, similarly to a previous study conducted by our research group [21].
Validation test plan and publication: The design of the model will be grounded in the data obtained from the primary studies. We will also compare our results with those of other literature reviews or catalogues, if available. The results will be written up in an article to be submitted to a Software Engineering journal.
Threats to validity and other challenges: One threat to validity that we have in mind is an improperly defined search string. To mitigate this threat, we will select articles that are well known in the area and require that the string return them in the search results.
Timeline with Milestones from RQ2: Figure 2 below shows the timeline of RQ2.
[Figure 2 content: timeline over Q3 2016, Q4 2016, Q1 2017, and Q2 2017 with the activities "Preparation - Systematic Review", "Papers selected with search string", "Papers selected with inclusion criteria", "Reading of selected papers", "Writing papers and qualification work", and "Submission paper of journal".]
Figure 2. Timeline with milestones from RQ2.
3.3 Research Question 3
RQ3 – What are the difficulties of learning how to program from the students’ perspective?
Methodology: We have been collecting information from students by means of three different methods. The first involves individual interviews based on the Think Aloud technique [19]. This technique consists of observing the way users perform specific tasks in controlled environments. The task assigned to the students consisted of four exercises with different levels of difficulty. They had to solve each problem using the C programming language in the Virtual Programming Lab - VPL 6. VPL is a plugin for Moodle developed by the University of Las Palmas de Gran Canaria (ULPGC) that offers information about the compilation of the code. In addition, through test cases set by the instructor, it gives feedback to the students about their code. During the interviews, the computer screen and audio were recorded for subsequent analysis. In the pilot study, six students who had failed the introductory programming course took part in the interviews at the end of 2015.
The second method is based on diaries [18]. This method was chosen because it enables information about events and experiences to be obtained from the perspective of the subject in a spontaneous way, reducing the time between the occurrence of an event and the moment it is reported to the researchers [4]. In the second half of 2015, students from six courses were invited to participate in our research project by filling out diaries during their studies. They were encouraged to report their experiences, their feelings, the difficulties encountered during their studies, and how they were resolved. Thirty-four students took part in the activity. Google Docs was used for the data collection, with an individual document for each student. Open coding and axial coding [8] were used for the data analysis. Our group has already used diaries and this kind of analysis in another context [22]. This method will be applied again in the second half of 2016 with students from other courses.
The third method will be a survey with specific questions about possible difficulties encountered in the introductory programming course. This survey aims to quantitatively confirm the observed patterns, expand the scope of the analysis, and collect more qualitative and quantitative data. It will be applied to students from several universities on two occasions, with classes in the first and second halves of 2016.
Validation test plan and publication: Data were collected in three different ways, and a joint analysis will be carried out to identify the patterns. The data collected in the Think Aloud interviews were described in a submitted paper; the data from the diaries (part of RQ3) and interviews (part of RQ4) were compiled, and a paper is being prepared.
Threats to validity and other challenges: The greatest challenge is to persuade the students to participate by filling in the diaries and answering the survey questions. We will go to some classrooms and collect the responses in person.
3.4 Research Question 4
RQ4 – What are the difficulties of learning how to program from the instructors’ perspective?
Methodology: Interviews were conducted in late 2015 with 16 instructors involved in the Introduction to Programming course. Ten instructors were randomly selected, and the other six were those teaching Introduction to Programming courses that semester. The purpose of the interviews was to find out what the students' difficulties are in the view of the instructors. The questions followed the syllabus of the subject to determine the difficulties observed by the instructors. The interviews were conducted individually, and their content was audio-recorded and transcribed. Currently, we are analyzing and formatting the data following the Grounded Theory methodology. We also aim to conduct surveys to collect additional data and confirm some hypotheses raised during the analysis.
Validation test plan and publication: Data from the diaries (part of RQ3) and interviews (part of RQ4) were formatted and will be presented in a paper. The data collected from students (RQ3), together with the data from the instructors (RQ4), will be analyzed and written up in a paper to be submitted to an international journal.
Timeline with Milestones from RQ3 and RQ4: Figure 3 below shows the timeline of RQ3 and RQ4.
Figure 3. Timeline with milestones from RQ3 and RQ4.
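The test-case feedback used in the Think Aloud sessions of RQ3 (instructor-defined test cases that tell students how their code behaves) can be pictured with the minimal sketch below. It is not VPL's implementation or API: the function `run_test_cases`, the exercise, and the test cases are hypothetical, and Python is used here for brevity although the students worked in C.

```python
def run_test_cases(solution, test_cases):
    """Run `solution` on each (args, expected) pair and collect feedback lines."""
    feedback = []
    for args, expected in test_cases:
        try:
            actual = solution(*args)
        except Exception as exc:  # a crash is reported to the student, not propagated
            feedback.append(f"{args}: raised {type(exc).__name__}")
            continue
        status = "ok" if actual == expected else f"expected {expected}, got {actual}"
        feedback.append(f"{args}: {status}")
    return feedback


# Hypothetical student submission: the average of two numbers,
# written with an integer-division bug.
def student_average(a, b):
    return (a + b) // 2


# Hypothetical instructor-defined test cases: (arguments, expected result).
cases = [((2, 4), 3), ((1, 2), 1.5)]
report = run_test_cases(student_average, cases)
for line in report:
    print(line)
```

Feedback of this kind reports that a result is wrong but not why, which is consistent with the paper's later remark that logic errors are harder to analyze automatically than syntax errors.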
3.5 Research Question 5
RQ5 – What errors in syntax and semantics are recurrently found in the code developed by the students?
Methodology: Code written by students during the semesters has been collected for analysis of error patterns. We will connect these patterns to those identified in the previous RQs. We will use mining software repositories techniques to collect, clean, and analyze the data, searching for the patterns.
Validation test plan and publication: A problem must be detected in at least three different situations in order to be considered a pattern. We plan to gather evidence of the reported difficulties and find new patterns by analyzing the source code produced by the learners.
Threats to validity and other challenges: The analysis of syntax errors can be done by a system that analyzes the code submitted by the students. The analysis of logic errors is more difficult to automate. We are still testing different ways of doing this.
Timeline with Milestones from RQ5: Figure 4 below shows the timeline of RQ5.
Figure 4. Timeline with milestones from RQ5.
3.6 Patterns Definition
Based on the results of RQ2 to RQ5, we will compile the observed difficulties into patterns. Each pattern will comprise a name, the situation in which it occurs, how to solve it, and examples. We will also categorize the patterns according to Bloom’s taxonomy. Bloom created categories for educational goals [1] (Figure 5). Each category has a set of action words that could be used to help identify the kind of knowledge related to the difficulties.
[...] them, we will apply the strategy. In the other one, we will analyze whether the students manifest the difficulty related to the pattern (Figure 6). At each stage of the action research, different elements of the model will be evaluated. We will triangulate the data to validate the results and improve accuracy [9, 10].
[Figure 6: action research cycle. 1. Set teaching strategy. 2. Apply the strategy in a group. 3. In another group, without the strategy, verify the presence of the pattern. 4. Compare the two groups to see if the difficulty decreased. 5. Analyze the results.]
Validation test plan and publication: An article describing the research and its results will be submitted for publication in an international journal.
Threats to validity and other challenges: One challenge will be to have enough classes and instructors to take part in this action research. Another will be to have enough time to complete all the validation.
Timeline with Milestones from RQ6: Figure 7 below shows the timeline of RQ6.
[Figure 7: timeline of RQ6. Milestones shown: pattern and strategy validation process; finished the pattern documentation.]
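The pattern structure from Section 3.6 (name, situation, how to solve it, examples, Bloom category) and the RQ5 rule that a problem must be detected in at least three different situations can be combined in a small record type, sketched below. The class name, the field names, the reading of "situation" as a distinct data source, and the sample values are all assumptions made for illustration, not the authors' model.

```python
from dataclasses import dataclass, field

MIN_SITUATIONS = 3  # threshold stated for RQ5: at least three different situations


@dataclass
class DifficultyPattern:
    """One candidate pattern with the parts listed in Section 3.6."""
    name: str
    situation: str
    solution: str
    examples: list = field(default_factory=list)
    bloom_category: str = ""  # assumption: stored as a free-text label
    observed_in: set = field(default_factory=set)  # assumption: one entry per situation

    def record(self, situation_id):
        """Register a situation in which the problem was detected."""
        self.observed_in.add(situation_id)

    def is_pattern(self):
        """A problem counts as a pattern once seen in enough distinct situations."""
        return len(self.observed_in) >= MIN_SITUATIONS


# Made-up illustration, not data from the study.
candidate = DifficultyPattern(
    name="Variable scope in functions",
    situation="Student expects a local assignment to change the caller's variable",
    solution="Trace the call step by step and return the new value",
    bloom_category="Understand",
)
candidate.record("diaries")
candidate.record("instructor interviews")
before = candidate.is_pattern()  # only two situations so far
candidate.record("source code")
after = candidate.is_pattern()   # third distinct situation reached
```

Using a set for `observed_in` means that repeated detections in the same situation do not count twice toward the threshold.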
7 https://cft.vanderbilt.edu/guides-sub-pages/blooms-taxonomy/
[...] or more times the course. This course is among the ones with the highest failure rates.
RQ3 – What are the difficulties of learning how to program from the students’ perspective?
In the second half of 2015, 34 students from six courses filled in diaries about their studies. They reported difficulties and some strategies they found to solve them. Below, we present some students' comments, identifying the author by a subscript "a" followed by a number. The data from these diaries were analyzed using Grounded Theory procedures and grouped by concept, forming four categories: Difficulties, Study Strategies, Preferences, and Self-assessments. Here we present some results from the first category. 'Syntax error', with 13 occurrences, was the problem most frequently reported by students, with comments like: “I still have a lot of errors in basic things like braces, parentheses, and semicolons” a1, “the program still didn't execute due to some syntax errors that I don't know how to solve” a20, and “It is returning syntax error all the time” a22. This type of error makes students return to the code often before they can check whether their logic is correct.
Problems with 'variables' were the second most cited, as in the comment “I had difficulty to understand what should be float and what should be int type, so I had to go testing to find” a1. The concept 'Language + IDE + Error Message' was also widely cited, with complaints such as: “initially, I had difficulty with the language, even with the complementary material, I had difficulty putting into practice” a5, “because the program's messages did not help at all” a20, and “I could not interpret the messages that the program showed, so I had to execute parts of the program separately in another window until I could identify the error” a20. In addition to these complaints about the language and the error messages, we received comments related to the IDE, such as “the instructor's site doesn't have the link to download the updated version of the IDE, and the available version doesn't work on Windows 8” a17.
In another study, aimed at getting more information about the students and their behavior while studying, we used the Think Aloud method to conduct interviews with six students, lasting about one hour each. These students did not pass during the semester and needed to take the final test to be approved. During the interviews, they were challenged to solve four exercises of increasing difficulty. Each interview session was recorded, including the computer screen and audio, for analysis.
One of the observed behaviors, adopted by two of the students (students 1 and 3), was taking notes while reading the statements. These two had no better results than the others, but one of them, when asked by the interview moderator, said that “annotating helps to remember what needs to be done because otherwise I cannot remember”. Analyzing the behavior of the respondents during the session, we observed that this annotation process helped, for example, in defining which and how many variables were required to solve the task. One difference between these students and the others is that they made fewer mistakes in declaring variables and setting their types; they practically did not need to go back to the code to change what they had written.
The interview moderator observed a reaction in two students while they read the statement. Student 6 had not read the entire statement when he stopped reading to comment “I get nervous when I see the word matrix”. Student 1, on starting to read the second question, instantly said “I do not like function” and “I have difficulties with function parameters”. Student 1 also said “At a first glance I dislike this exercise, I like exercises that have numbers”. In these three situations the students did not succeed in solving the exercise. This may be a sign that students create a barrier against the content they find more difficult.
We also noticed uncertainty in the students and some absence of analytical thinking. They are used to copying and pasting the code that reads matrix elements, but when faced with compilation errors they made comments like “We will see now. Must be something wrong. There is always something wrong.” The moderator noted that the commands to which they referred were correctly written, but with an undeclared name. Moreover, at some moments, the students had problems matching intention and practice: they verbalized one thing but wrote something different. This situation was detected during the interviews and can be observed in the comments “I do not know if it's like this to read an array, but okay” a1 and “I think something is missing in this print” a6.
Syntax errors were common in all the interviews and exercises, e.g., opening and closing structures with brackets, colons, and the correct spelling of commands, among others. Some errors are noteworthy, such as: (A) an attempt to read the data in the matrix; (B) creating an unnamed function, besides incorrectly declaring the variables that receive the parameters; and (C) a semicolon ending a repetition or selection structure that has not even started.
When semantic errors occurred, students usually became more disappointed than with syntax errors, to which they seemed more accustomed. Semantic errors made students give up on the exercise faster, because they realized they would need more time to fix them.
RQ4 – What are the difficulties of learning how to program from the instructors’ perspective?
We randomly selected 14 instructors from the Computer Science Department of the University of São Paulo who had taught introductory programming. Individual interviews were conducted with each of them. The interviews were recorded and are being analyzed using Grounded Theory procedures. The objective of this study is to identify the difficulties encountered by students, in the instructors’ view.
The main difficulty cited by instructors is 'logical reasoning'. They have tried pseudocode, but most have given it up. Some instructors use pseudocode only to quickly explain a concept and then go straight to the programming language. Others use pseudocode in parallel, i.e., they develop in pseudocode and then translate into the programming language: “...if you don't know where to start, writes in natural language a draft. After this, you go to the pseudocode and only at the end you go to Python” p3. They also reported that experience sometimes makes it difficult to teach: “I see a problem, it already is structured in my mind and I don't know how it happens” p1.
Regarding the 'syntactical issues of language', they all agreed that C syntax has many more details to be observed during programming. It was also commented that the 'choice of language' influences the development of the student.
They cited operators (arithmetic, logical, and relational) as sources of difficulty. Students get confused with precedence. There is also difficulty in differentiating the logical operators 'and' and 'or', and in doing arithmetic with variables of the same type that should yield a result of a different type. An example is when the division of two integers results in zero, as in the division of 1 by 2. To display the correct result, the type of the resulting value must be float, and “it is hard to them realize the error” p4.
Among the selection/decision and repetition/loop structures, most instructors start teaching the loop structure, more specifically with 'while'. It was often mentioned that 'while' gives the impression of having more control over the structure and that students prefer 'while' to 'for', which corroborates the diaries written by students. Students also have difficulty with nested loops and with setting the loop's exit condition. In the selection structure, the difficulty is to ‘see the if..else pairs’. In addition, students mix up the concepts of decision and loop structures.
For arrays, there is the 'forgotten to put the index' problem regarding the position, which is addressed with the strategy of 'intensive practice': “You have to do by repeating, which is a tiring business at the beginning” p1. Instructors also commented that students understand the concept but fail to apply it in practice, which reinforces what has already been published [23].
Regarding functions, the difficulty lies in understanding the scope of variables and the importance of the return value. Teachers believe that there are no major problems with parameter passing; however, when parameters are passed by reference and the language used is C, the difficulty increases.
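Three of the difficulties that instructors report for RQ4 (operator precedence with 'and' and 'or', integer division yielding zero, and variable scope with return values) can be made concrete in a few lines. The sketch below is illustrative only and is written in Python, which instructor p3 mentions, although most of the reported cases concern C. In C, `1 / 2` with two `int` operands evaluates to 0; Python 3 shows the same effect only with the `//` operator, so `//` stands in for C's integer division here. All variable and function names are made up.

```python
# 1. Precedence: `and` binds tighter than `or`.
a, b, c = True, False, False
without_parens = a or b and c   # parsed as a or (b and c)
with_parens = (a or b) and c    # explicit grouping changes the result

# 2. Integer division: two integers, result expected to be fractional.
truncated = 1 // 2   # integer division discards the fractional part
real = 1 / 2         # true division produces a float

# 3. Scope and return value: assigning to a parameter does not
#    change the caller's variable; the new value must be returned.
def add_bonus_no_return(grade):
    grade = grade + 1  # rebinds the local name only


def add_bonus(grade):
    return grade + 1


grade = 5
add_bonus_no_return(grade)
unchanged = grade            # still the original value
updated = add_bonus(grade)   # the caller receives the new value

print(without_parens, with_parens, truncated, real, unchanged, updated)
```

The same three checks translate directly to C, where `&&` likewise binds tighter than `||` and a cast such as `(float)1 / 2` is needed to obtain 0.5.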
Instructors comment that several factors make it harder to teach programming, such as: heterogeneity within and between groups, low participation in class, low attendance, very large classes, disinterest in learning, the programming language adopted, and the trauma of students who repeat the course, among others. They also expressed concern about motivating students, using strategies such as games, challenges, and competitions. Instructors complained about students trying to memorize instead of learning. This strategy was also mentioned by the students in the diaries.
5. NEXT STEPS
The next steps of the research are:
1. Complete the database analysis of the last 10 years of the Introduction to Programming course at USP (RQ1).
2. Perform the systematic literature review to find the reported difficulties (RQ2).
3. Apply the diary technique in more courses and run a confirmatory questionnaire with students (RQ3).
4. Complete the interviews with instructors from USP and apply an extended survey to instructors from outside USP. Analyze and tabulate the data, which will give us information about the students' difficulties as perceived by teachers (RQ4).
5. Consolidate the results from the systematic literature review and the data collection in a single model.
6. Analyze the source code produced by students (RQ5).
7. Validate the patterns (RQ6).
The specific timelines were presented in the method section.
6. CONCLUSION
So far, we do not have many results, but some patterns are already defined. One of these is the difficulty that students have in working with functions. Understanding the scope of variables and why it is necessary to pass parameters and return values is not easy for them. Some strategies used by instructors to mitigate this barrier were explained in the interviews with instructors.
We expect that the patterns of difficulty related to learning programming will help students in their studies, teachers in preparing their lessons, and researchers in developing new tools to support the teaching and learning of programming. This will help to train the next generation of software engineers.
7. ACKNOWLEDGMENTS
We would like to express our thanks to the instructors and students from the University of São Paulo for their valuable assistance with our research project.
8. REFERENCES
[1] Anderson, L.W. et al. 2001. A taxonomy for learning, teaching, and assessing: a revision of Bloom’s taxonomy of educational objectives. Longman.
[2] Beaubouef, T. and Mason, J. 2005. Why the high attrition rate for computer science students. ACM SIGCSE Bulletin. 37, 2 (2005), 103.
[3] Bennedsen, J. and Caspersen, M.E. 2007. Failure rates in Introductory Programming. ACM SIGCSE Bulletin. 39, 2 (2007), 32–36.
[4] Bolger, N. et al. 2003. Diary methods: Capturing life as it is lived. Annual Review of Psychology. 54, (2003), 579–616.
[5] Bosse, Y. and Gerosa, M.A. 2015. As Disciplinas de Introdução à Programação na USP: um Estudo Preliminar. WAlgProg - I Workshop de Ensino em Pensamento Computacional, Algoritmos e Programação. Cbie (2015), 1389.
[6] Bosse, Y. and Gerosa, M.A. 2015. Reprovações e Trancamentos nas Disciplinas de Introdução à Programação da Universidade de São Paulo: Um Estudo Preliminar. WEI - Workshop sobre Educação em Computação. (2015), 1–10.
[7] Cechinel, C. et al. 2008. Desenvolvimento de Objetos de Aprendizagem para o Apoio à Disciplina de Algoritmos e Programação. Simpósio Brasileiro de …. (2008).
[8] Corbin, J. and Strauss, A. 1990. Grounded theory research: Procedures, canons, and evaluative criteria. Qualitative Sociology. 13, (1990), 3–21.
[9] Creswell, J.W. 2013. Research design: Qualitative, quantitative, and mixed methods approaches.
[10] Creswell, J.W. and Clark, V.L.P. 2007. Designing and conducting mixed methods research.
[11] Crowne, M. 2002. Why software product startups fail and what to do about it. Evolution of software product development in startup companies. IEEE International Engineering Management Conference. 1, (2002), 338–343.
[12] Denny, P. et al. 2011. Understanding the syntax barrier for novices. Proceedings of the 16th ACM conference on Innovation and technology in computer science education - ITiCSE ’11. (2011), 208.
[13] Garner, S. et al. 2005. My program is correct but it doesn’t run: A preliminary investigation of novice programmers’ problems. Conferences in Research and Practice in Information Technology Series. 42, (2005), 173–180.
[14] Hermans, F. and Aivaloglou, E. 2016. Do Code Smells Hamper Novice Programming? (2016).
[15] Jenkins, T. 2002. On the Difficulty of Learning to Program. ICS - International Conference on Supercomputing. (2002).
[16] Lahtinen, E. et al. 2005. A study of the difficulties of novice programmers. ACM SIGCSE Bulletin. 37, 3 (2005), 14–18.
[17] Missiroli, M. et al. 2016. Learning Agile Software Development in High School: an Investigation. Proceedings of the 38th International Conference on Software Engineering (ICSE). (2016), 293–302.
[18] Reis, H.T. 1994. Domains of experience: investigating relationship processes from three perspectives. 87–110.
[19] Renzi, A.B. et al. 2012. Use of Think-Aloud Protocol to Verify Usability Problems and Flow During Use of Entertainment and Personal Journal. 12o Congresso Internacional de Ergonomia e Usabilidade de Interfaces Humano-Computador (Natal - Brasil, 2012), 7.
[20] Ribeiro, R. da S. et al. 2014. Programming web-course analysis: How to introduce computer programming? 2014 IEEE Frontiers in Education Conference (FIE) Proceedings. 2015–Febru, February (2014), 1–8.
[21] Steinmacher, I. et al. 2015. A systematic literature review on the barriers faced by newcomers to open source software projects. Information and Software Technology. 59, (2015), 67–85.
[22] Steinmacher, I. et al. 2016. Overcoming open source project entry barriers with a portal for newcomers. Proceedings of the 38th International Conference on Software Engineering (ICSE). (2016), 273–284.
[23] Winslow, L.E. 1996. Programming Pedagogy - A Psychological Overview. ACM SIGCSE Bulletin. 28, 3 (1996), 17–22.