The University of Nottingham
School of Computer Science and Information Technology
Formative Computer Based Assessment
in Diagram Based Domains
by Brett Bligh, BSc(Hons).
Thesis submitted to the University of Nottingham for
the degree of Doctor of Philosophy, September 2006
To my parents…
Abstract
This research argues that the formative assessment of student coursework in free-form, diagram-based domains can be automated using Computer Based Assessment (CBA) techniques in a way which is both feasible and useful. Formative assessment is that form of assessment in
which the objective is to assist the process of learning undertaken by the student. The
primary deliverable associated with formative assessment is feedback. CBA
courseware provides facilities to implement the full lifecycle of an exercise through
an integrated, online system. This research demonstrates that CBA offers unique
opportunities for student learning through formative assessment, including allowing students to correct their solutions over a larger number of submissions than would be feasible within traditional forms of assessment.
The research comprises two main phases. The first involves designing and implementing an assessment course using the CourseMarker /
DATsys CBA system. This system, in common with many other examples of CBA
courseware, was intended primarily to conduct summative assessment. The benefits
and limitations of the system are identified. The second phase identifies three
extensions to the architecture which encapsulate the difference in requirements
between summative assessment and formative assessment, presents a design for the
extensions, documents their implementation as extensions to the CourseMarker /
DATsys architecture and evaluates their contribution.
The three are novel extensions for free-form CBA which allow the assessment of the aesthetic layout of student diagrams, the marking of student solutions where multiple model solutions are acceptable, and the prioritisation and truncation of feedback prior to its presentation to the student.
Evaluation results indicate that the student learning process can be assisted through formative assessment which is automated using CBA courseware. Students learn through an iterative process in which feedback on a submitted coursework solution is used to improve that solution, after which they may re-submit and receive further feedback.
Acknowledgements
To attempt to acknowledge every person who has contributed to my intellectual
development, research and life during the four long years of my PhD is a fool’s
errand. I will not even make the attempt. Instead, I will single out a few notable
people and apologise to the rest. You know who you all are.
First of all, I would like to thank Colin Higgins, the leader of the Learning
Technology Research group and my supervisor throughout the PhD process. His
encouragement and advice were essential to the completion of the research.
Athanasios Tsintsifas provided the starting point for this research through his design
of the DATsys framework. He also provided a kick-start to the research in the form
of an imposing breeze-block of essential reference materials.
Pavlos Symeonidis provided crucial practical advice and encouragement in his
inimitable way and did his best to keep me on the straight and narrow.
The other members of the LTR group provided food for thought and a welcoming
environment, especially Marjahan Begum, Swe Myo Htwe (Shannon), Geoff Gray
and Nasirah Omar, who were a constant presence in the office and source of banter
throughout.
I have relaxed and socialised with a large number of people in Nottingham over the
past four years. However, Said Macharkah deserves a special mention for boosting
my morale at particularly important moments. My long-term friends from back
home have had to put up with much neglect during my stay in Nottingham and
deserve my gratitude. The gang from “The Welly” have provided many light-hearted
moments during a potentially dull writing up process.
Very special thanks are due to my fiancée, Marioara Urda. With the PhD complete,
my priorities lie with her.
Last but absolutely not least, I wish to thank those members of my family — parents,
grandparents and all — who have supported me in so many ways throughout my
long years of studentship. It would not have been possible without you.
Thank you all, Brett Bligh
Table of Contents
ABSTRACT ...........................................................................................................................................II
ACKNOWLEDGEMENTS ................................................................................................................ III
TABLE OF CONTENTS .................................................................................................................... IV
LIST OF FIGURES............................................................................................................................. IX
LIST OF TABLES................................................................................................................................. X
LIST OF EQUATIONS......................................................................................................................... X
CHAPTER 1 INTRODUCTION..........................................................................................................1
INTRODUCTION ..................................................................................................................................2
1.1 BACKGROUND...............................................................................................................................3
1.1.1 MOTIVATION ................................................................................................................................3
1.1.2 SCOPE ..........................................................................................................................................4
1.2 BRIEF OVERVIEW ........................................................................................................................5
1.2.1 GENERAL OBJECTIVES .................................................................................................................5
1.2.2 PROBLEMS AND SPECIFIC OBJECTIVES .........................................................................................7
1.2.3 APPROACH ...................................................................................................................................8
1.2.4 CONTRIBUTIONS ...........................................................................................................................9
1.3 SYNOPSIS OF THE THESIS .........................................................................................................9
CHAPTER 2 CBA, FORMATIVE ASSESSMENT AND DIAGRAMMING................................12
INTRODUCTION ................................................................................................................................13
2.1 COMPUTER BASED ASSESSMENT .........................................................................................13
2.1.1 DEFINITION ................................................................................................................................13
2.1.2 DEVELOPMENT OF AUTOMATED ASSESSMENT ...........................................................................16
2.1.3 MOTIVATION IN AUTOMATED ASSESSMENT...............................................................................17
2.1.4 BENEFITS OF COMPUTER BASED ASSESSMENT...........................................................................18
2.1.5 LIMITATIONS OF COMPUTER BASED ASSESSMENT .....................................................................22
2.1.6 A TAXONOMY FOR COMPUTER BASED ASSESSMENT .................................................................26
2.1.6.1 Fixed-response Automated Assessment .............................................................................27
2.1.6.2 Free-response Automated Assessment ...............................................................................28
2.1.7 ASSESSING HIGHER COGNITIVE LEVELS USING CBA.................................................................33
2.1.8 SUMMARY ..................................................................................................................................34
2.2 FORMATIVE ASSESSMENT ......................................................................................................35
2.2.1 DEFINITION ................................................................................................................................35
2.2.2 BENEFITS OF FORMATIVE ASSESSMENT .....................................................................................36
2.2.3 DRAWBACKS ASSOCIATED WITH FORMATIVE ASSESSMENT .......................................................38
2.2.4 MANAGING THE RESOURCE INTENSIVENESS OF FORMATIVE ASSESSMENT ................................40
2.2.5 EFFECTIVE FEEDBACK FOR FORMATIVE ASSESSMENT ...............................................................42
2.2.6 SUMMARY ..................................................................................................................................44
2.3 DIAGRAMS IN EDUCATION .....................................................................................................45
2.3.1 DEFINITION ................................................................................................................................45
2.3.2 HISTORY AND SCOPE ..................................................................................................................45
2.3.3 DIAGRAMS IN AUTOMATED ASSESSMENT ..................................................................................47
2.3.4 AESTHETICS OF EDUCATIONAL DIAGRAMS ................................................................................49
2.3.4.1 Aesthetic Criteria...............................................................................................................49
2.3.4.2 Criteria from Graph Layout...............................................................................................50
2.3.4.3 Criteria from User Interface Design..................................................................................51
2.3.4.4 Domain-specific Layout Criteria .......................................................................................53
2.3.6 SUMMARY ..................................................................................................................................53
2.4 CHAPTER SUMMARY ................................................................................................................54
CHAPTER 3 CBA APPROACHES FOR FORMATIVE ASSESSMENT AND DIAGRAMS....56
INTRODUCTION ................................................................................................................................57
3.1 USING CBA TECHNOLOGY TO PROVIDE FORMATIVE ASSESSMENT .......................57
3.1.1 FIXED-RESPONSE FORMATIVE CBA: A REVIEW ..........................................................................59
3.1.1.1 Using existing platforms ....................................................................................................59
3.1.1.2 In-house fixed-response CBA systems................................................................................61
3.1.1.3 Implications for formative assessment using CBA.............................................................65
3.1.2 FREE-RESPONSE FORMATIVE CBA: A REVIEW ............................................................................66
3.1.2.1 Formative assessment capabilities of free-response CBA systems ....................................67
3.1.2.2 Implications for formative assessment using CBA.............................................................71
3.1.3 SUMMARY ..................................................................................................................................72
3.2 CBA APPROACHES IN DIAGRAMMATIC DOMAINS .........................................................72
3.2.1 TRAKLA2: A REVIEW ...............................................................................................................73
3.2.2 PILOT: A REVIEW ......................................................................................................................75
3.2.3 DIAGRAM COMPARISON SYSTEM: A REVIEW ..............................................................................77
3.2.4 AUTOMATIC MARKER FOR ENTITY RELATIONSHIP DIAGRAMS: A REVIEW .................................79
3.2.5 SUMMARY ..................................................................................................................................82
3.3 CEILIDH, COURSEMARKER AND DATSYS..........................................................................82
3.3.1 CEILIDH......................................................................................................................................83
3.3.1.1 Ceilidh’s Architecture........................................................................................................83
3.3.1.2 Ceilidh’s Course Structure ................................................................................................84
3.3.1.3 Ceilidh’s User Views .........................................................................................................85
3.3.1.4 Ceilidh’s Marking Tools ....................................................................................................85
3.3.1.5 Review of Ceilidh ...............................................................................................................86
3.3.2 COURSEMARKER .......................................................................................................................88
3.3.2.1 CourseMarker’s Development Overview...........................................................................89
3.3.2.2 CourseMarker’s Architecture ............................................................................................89
3.3.2.3 CourseMarker’s Course Structure.....................................................................................90
3.3.2.4 CourseMarker’s User Views..............................................................................................90
3.3.2.5 CourseMarker’s Marking Tools and the Generic Marking System ...................................91
3.3.2.6 Experiences with CourseMarker........................................................................................92
3.3.3 DATSYS .....................................................................................................................................94
3.3.3.1 Daidalos.............................................................................................................................95
3.3.3.2 Ariadne ..............................................................................................................................96
3.3.3.3 Theseus ..............................................................................................................................97
3.3.3.4 Integration of DATsys with CBA courseware ....................................................................97
3.3.3.5 Experiences with DATsys...................................................................................................98
3.3.4 SUMMARY ..................................................................................................................................98
3.4 SUMMARY.....................................................................................................................................99
CHAPTER 4 PROBLEMS IN CBA APPLIED TO FREE-RESPONSE FORMATIVE
ASSESSMENT.................................................................................................................................... 101
INTRODUCTION .............................................................................................................................. 102
4.1 ASSESSMENT BACKGROUND................................................................................................ 102
4.2 ASSESSMENT CONSTRUCTION AND METHODOLOGY................................................. 104
4.2.1 ASSESSMENT CONSTRUCTION .................................................................................................. 104
4.2.2 METHODOLOGY ....................................................................................................................... 110
4.3 RESULTS AND ANALYSIS ....................................................................................................... 111
4.3.1 GENERAL IMPRESSIONS ............................................................................................................ 111
4.3.2 PROBLEMS................................................................................................................................ 111
4.3.3 MARKING DATA ....................................................................................................................... 112
4.3.4 PERFORMANCE AS FORMATIVE ASSESSMENT........................................................................... 114
4.4 CONCLUSIONS........................................................................................................................... 115
4.5 SUMMARY................................................................................................................................... 116
CHAPTER 5 PROVIDING A SPECIFICATION FOR FORMATIVE CBA IN DIAGRAM-BASED DOMAINS ............................................................................................................................ 118
INTRODUCTION .............................................................................................................................. 119
5.1 OBJECTIVES............................................................................................................................... 119
5.1.1 DEFINITIONS ............................................................................................................................ 120
5.1.2 IDENTIFYING THE NECESSARY EXTENSIONS............................................................................. 121
5.1.2.1 Fulfilling Computer Based Assessment criteria............................................................... 122
5.1.2.2 Fulfilling Formative Assessment criteria......................................................................... 124
5.1.2.3 Fulfilling Educational Diagramming criteria...................................................... 126
5.1.2.4 Summary .......................................................................................................................... 126
5.1.3 AIMS AND MOTIVATION ........................................................................................................... 127
5.1.4 SUMMARY ................................................................................................................................ 130
5.2 DETAILED REQUIREMENTS.................................................................................................. 130
5.2.1 REQUIREMENTS FOR ASSESSING THE AESTHETICS OF STUDENT DIAGRAMS .............................. 131
5.2.2 REQUIREMENTS FOR ASSESSING SOLUTIONS WITH MUTUALLY EXCLUSIVE ALTERNATE SOLUTION
CASES ................................................................................................................................................ 132
5.2.3 REQUIREMENTS FOR PRIORITISING AND TRUNCATING FEEDBACK TO STUDENTS ...................... 135
5.2.4 SCOPE OF GUIDANCE NEEDED FOR EDUCATORS AND DEVELOPERS ........................................... 136
5.2.5 SUMMARY ................................................................................................................................ 138
5.3 SUMMARY................................................................................................................................... 138
CHAPTER 6 DESIGNING THE EXTENSIONS........................................................................... 140
INTRODUCTION .............................................................................................................................. 141
6.1 HIGH LEVEL OVERVIEW ....................................................................................................... 142
6.1.1 REQUIREMENTS ........................................................................................................................ 143
6.1.2 HIGH LEVEL DESIGN ................................................................................................................ 143
6.1.2.1 Assessing the aesthetics of student diagrams................................................................... 143
6.1.2.2 Assessing solutions with mutually exclusive alternate solution cases.............................. 146
6.1.2.3 Prioritising and truncating feedback to students ............................................................. 149
6.1.3 EXTENSION INTEGRATION ........................................................................................................ 151
6.1.4 SUMMARY ................................................................................................................................ 154
6.2 ASSESSING THE AESTHETIC LAYOUT OF STUDENT DIAGRAMS: RESOLVING THE
DESIGN ISSUES ................................................................................................................................ 154
6.2.1 LINKING THE DESIGN TO THE REQUIREMENTS .......................................................................... 155
6.2.2 HIERARCHY .............................................................................................................................. 156
6.2.3 INTERFACE ............................................................................................................................... 157
6.2.4 SCALING................................................................................................................................... 158
6.2.5 AESTHETIC MEASURES ............................................................................................................. 159
6.2.5.1 The aesthetic measures for non-interception and non-intersection ................................. 160
6.2.5.2 The aesthetic measure for equilibrium............................................................................. 161
6.2.5.3 The aesthetic measures for balance, unity, proportion, simplicity, density, economy,
homogeneity and cohesion........................................................................................................... 162
6.2.5.4 The need for students to adapt their solutions ................................................................. 163
6.2.6 STRUCTURAL MEASURES .......................................................................................................... 165
6.2.7 SUMMARY ................................................................................................................................ 166
6.3 ASSESSING SOLUTIONS WITH MUTUALLY EXCLUSIVE ALTERNATE SOLUTION
CASES: RESOLVING THE DESIGN ISSUES............................................................................... 166
6.3.1 LINKING THE DESIGN TO THE REQUIREMENTS .......................................................................... 167
6.3.2 A TOOL FOR GENERIC FEATURES TESTING OF DIAGRAMS .......................................................... 168
6.3.3 DESIGNING THE PROCESS OF ASSESSMENT FOR MUTUALLY EXCLUSIVE SOLUTION CASES ........ 170
6.3.4 HARBINGERS AND THE DISTINCTION TEST ................................................................................ 171
6.3.5 STRATEGIES FOR DISTINGUISHING BETWEEN MUTUALLY EXCLUSIVE SOLUTION CASES ........... 172
6.3.6 SUMMARY ................................................................................................................................ 173
6.4 PRIORITISING AND TRUNCATING STUDENT FEEDBACK: RESOLVING THE
DESIGN ISSUES ................................................................................................................................ 173
6.4.1 LINKING THE DESIGN TO THE REQUIREMENTS .......................................................................... 174
6.4.2 THE PRIORITISETRUNCATETOOL ............................................................................................. 175
6.4.3 THE STRATEGY INTERFACES AND ABSTRACT CLASSES ............................................................. 175
6.4.4 PROVIDING A BASIS .................................................................................................................. 177
6.4.5 SUMMARY ................................................................................................................................ 178
6.5 SUMMARY................................................................................................................................... 178
CHAPTER 7 ISSUES IN IMPLEMENTATION AND ADVICE FOR EDUCATORS AND
DEVELOPERS ................................................................................................................................... 179
INTRODUCTION .............................................................................................................................. 180
7.1 IMPLEMENTATION ISSUES ................................................................................................... 180
7.1.1 OBJECTIVES ............................................................................................................................. 181
7.1.2 INTEGRATION INTO COURSEMARKER ...................................................................................... 182
7.1.3 ASSESSING THE AESTHETIC LAYOUT OF STUDENT DIAGRAMS: IMPLEMENTING THE DESIGN ..... 182
7.1.4 ASSESSING SOLUTIONS WITH MUTUALLY EXCLUSIVE SOLUTION CASES: IMPLEMENTING THE
DESIGN .............................................................................................................................................. 184
7.1.5 PRIORITISING AND TRUNCATING STUDENT FEEDBACK: IMPLEMENTING THE DESIGN ................ 184
7.1.6 SUMMARY ................................................................................................................................ 186
7.2 ADVICE FOR DEVELOPERS AND EDUCATORS ............................................................... 186
7.2.1 GUIDANCE FOR DEVELOPERS.................................................................................................... 187
7.2.1.1 Prerequisites .................................................................................................................... 187
7.2.1.2 Expressing features testing regimes to assess mutually exclusive solution cases............ 188
7.2.1.3 Layout tools ..................................................................................................................... 189
7.2.1.4 Prioritisation and truncation strategies........................................................................... 190
7.2.1.5 The marking scheme ........................................................................................................ 191
7.2.2 GUIDANCE FOR EDUCATORS .................................................................................... 193
7.2.2.1 Prerequisites .................................................................................................................... 193
7.2.2.2 Identifying harbingers and specifying distinction tests.................................................... 194
7.2.2.3 The weighting system ....................................................................................................... 195
7.2.2.4 Configuring and specifying aesthetic and structural measures ....................................... 196
7.2.2.5 Specifying and configuring prioritisation and truncation strategies ............................... 197
7.2.2.6 Writing good feedback comments .................................................................................... 198
7.2.3 SUMMARY ................................................................................................................................ 199
7.3 SUMMARY................................................................................................................................... 199
CHAPTER 8 USE AND EVALUATION ........................................................................................ 200
INTRODUCTION .............................................................................................................................. 201
8.1 OBJECTIVES............................................................................................................................... 201
8.2 EXAMPLES OF FORMATIVE, COMPUTER-BASED ASSESSMENT EXERCISES IN
DIAGRAM-BASED DOMAINS ....................................................................................................... 202
8.2.1 THE PROCESS OF EXERCISE CREATION ...................................................................................... 202
8.2.2 EXERCISE DOMAINS AND METHODOLOGY ................................................................................ 205
8.2.2.1 UML Use Case Diagram exercises.................................................................................. 205
8.2.2.2 UML Class Diagram exercises ........................................................................................ 207
8.2.2.3 Methodology .................................................................................................................... 209
8.2.3 USE AND EVALUATION OF THE PROTOTYPICAL EXERCISES ....................................................... 210
8.2.3.1 Constructing and running the exercises........................................................................... 210
8.2.3.2 Evaluation of the exercises .............................................................................................. 212
8.3 ASSESSING THE AESTHETIC LAYOUT OF STUDENT DIAGRAMS: EVALUATING
PERFORMANCE............................................................................................................................... 215
8.3.1 EVALUATING THE EXTENSION AS CBA .................................................................................... 215
8.3.2 EVALUATING THE EXTENSION AS FORMATIVE ASSESSMENT ..................................................... 216
8.3.3 EVALUATING THE EXTENSION AS EDUCATIONAL DIAGRAMMING ............................................. 216
8.4 ASSESSING SOLUTIONS WITH MUTUALLY EXCLUSIVE SOLUTION CASES:
EVALUATING PERFORMANCE................................................................................................... 217
8.4.1 EVALUATING THE EXTENSION AS CBA .................................................................................... 217
8.4.2 EVALUATING THE EXTENSION AS FORMATIVE ASSESSMENT ..................................................... 218
8.4.3 EVALUATING THE EXTENSION AS EDUCATIONAL DIAGRAMMING ............................................. 219
8.5 PRIORITISING AND TRUNCATING THE FEEDBACK: EVALUATING
PERFORMANCE............................................................................................................................... 219
8.5.1 EVALUATING THE EXTENSION AS CBA .................................................................................... 219
8.5.2 EVALUATING THE EXTENSION AS FORMATIVE ASSESSMENT ..................................................... 220
8.6 CONCLUSIONS........................................................................................................................... 222
CHAPTER 9 CONCLUSIONS ........................................................................................................ 225
INTRODUCTION .............................................................................................................................. 226
9.1 MEETING THE OBJECTIVES ................................................................................................. 226
9.1.1 ASSESSING THE AESTHETIC LAYOUT OF STUDENT DIAGRAMS................................................... 226
9.1.2 ASSESSING SOLUTIONS WITH MUTUALLY EXCLUSIVE SOLUTION CASES ................................... 228
9.1.3 PRIORITISING AND TRUNCATING STUDENT FEEDBACK .............................................................. 230
9.2 CONTRIBUTIONS ...................................................................................................................... 231
9.2.1 CBA......................................................................................................................................... 231
9.2.2 FORMATIVE ASSESSMENT ......................................................................................................... 232
9.2.3 EDUCATIONAL DIAGRAMMING ................................................................................................. 232
9.3 FUTURE WORK.......................................................................................................................... 233
9.3.1 CBA......................................................................................................................................... 233
9.3.2 FORMATIVE ASSESSMENT ......................................................................................................... 234
9.3.3 EDUCATIONAL DIAGRAMS ........................................................................................................ 235
9.4 EPILOGUE ................................................................................................................................... 235
BIBLIOGRAPHY............................................................................................................................... 238
List of Figures
Figure 1.1: The thesis scope at a high level ...............................................................................................4
Figure 2.1: Relationships between CAL, CBL, CAA and CBA ...........................................................14
Figure 3.1: TRAKLA2’s student applet and model solution window [MK04] .................................74
Figure 3.2: Example exercise and student solution using PILOT [BGK+00] ..................................76
Figure 3.3: The student revision tool [TWS05] ......................................................................................81
Figure 3.5: Ceilidh’s dumb terminal interface [Sp06].............................................................................88
Figure 3.6: The Java CourseMarker client [Sp06]...................................................................................92
Figure 3.7: A range of diagram notations expressed within DATsys .................................................95
Figure 4.1: Uneditable nodes and distracters in Tsintsifas’ OO exercise ........................................ 104
Figure 4.2: Generic nodes in the E-R exercises with editable text ................................................... 104
Figure 4.3: An illustrative student ER diagram solution .................................................................... 106
Figure 4.4: First nine submissions of students who submitted 12 times or less ............................ 113
Figure 4.5: Submissions 15 to 30 for those students who submitted more than 12 times........... 113
Figure 5.1: Two mutually exclusive model solutions.......................................................................... 133
Figure 6.1: A high-level view of the relationships between the extensions .................................... 152
Figure 6.2: The hierarchy of the aesthetic layout extension .............................................................. 156
Figure 6.3: Aesthetic and structural measures implement LayoutToolInterface............................ 157
Figure 6.4: The LayoutToolInterface interface.................................................................................... 157
Figure 6.5: The relationship between the raw score and the scaled mark....................................... 158
Figure 6.6: The design of the non-interception tool........................................................................... 160
Figure 6.7: The design of the non-intersection tool ........................................................................... 161
Figure 6.8: The co-ordinate system in DATsys diagram editors ...................................................... 163
Figure 6.9: Original student solution and student solution with modification............................... 164
Figure 6.10: The DiagramFeaturesTool................................................................................................ 169
Figure 6.11: Marking multiple features test cases................................................................................ 170
Figure 6.12: The PrioritiseTruncateTool .............................................................................................. 175
Figure 6.13: Strategy interfaces for the four sub-problems ............................................................... 176
Figure 7.1: Features tests organised into cases .................................................................................... 188
Figure 7.2: A simple marking scheme for a formative exercise ........................................................ 192
Figure 8.1: The tool library for UML use case diagrams.................................................................... 206
Figure 8.2: A simple use case diagram using the tool library............................................................. 206
Figure 8.3: The tool library for UML class diagrams.......................................................................... 208
Figure 8.4: A simple class diagram using the tool library................................................................... 208
List of Tables
Table 2.1: CBA provides concrete pedagogical benefits.......................................................................21
Table 2.2: Bloom’s levels of cognitive learning ......................................................................................24
Table 2.3: Fourteen aesthetic measures from Ngo et al [NTB00].......................................................52
Table 4.1: Features expressions for the ER exercises......................................................................... 108
Table 8.1: Average submission numbers for the prototype exercises.............................................. 213
Table 8.2: Results of the student questionnaire................................................................................... 214
List of Equations
Equation 6.1: The non-interception measure ...................................................................................... 160
Equation 6.2: The non-intersection measure....................................................................................... 161
Equation 6.3: Equilibrium....................................................................................................................... 162
Equation 6.4: x-axis equilibrium component....................................................................................... 162
Equation 6.5: y-axis equilibrium component ....................................................................................... 162
Equation 6.6: Calculating the priority of a MarkingLeafResult......................................................... 177
Chapter 1
Introduction
Introduction
Higher education institutions are confronted with the challenge of providing
academic courses to a higher number of students without the benefit of a
proportionate increase in teaching staff. Student-to-staff ratios (SSRs) are less
favourable to academic staff than in the past [Dfes05, Ml97] and SSR increases of
150% since the 1970s have been reported [AUT05]. The delivery of course materials,
assessment of student work, detection of plagiarism and administration of course
data are but a few of the academic tasks affected by the situation.
Formative assessment is that form of assessment in which the primary aim is to assist
the process of learning [Kp01]. Formative assessment should occur throughout the
learning process and have the primary aim of providing useful feedback to students
[JMM+04]; it stands opposed to summative assessment, in which the primary aim is
to provide an indicator of progress at the end of a particular learning process.
Formative assessment has considerable pedagogic advantages over summative
assessment: it encourages active student learning, can assess a wider range of
learning outcomes, can help in the avoidance of mark aggregation and discourages
plagiarism [Kp01]. However, formative assessment is more resource-intensive than
summative assessment due to its frequency and the detail of the feedback to be
provided to the student. Furthermore, summative assessment may be prioritised
institutionally due to the need to indicate student achievement externally at the end
of an academic course. Therefore, as SSRs become less favourable, the amount of
formative assessment from which students can benefit has tended to be reduced.
Computer Based Assessment (CBA) refers to the delivery of materials for teaching and
assessment, the input of solutions by the students, an automated assessment process
and the delivery of feedback, all achieved through an integrated, coherent online
system: every stage, from coursework delivery to the return of feedback, occurs
online at the computer terminal [CE98a, SM97]. A prime motivator in the
development of CBA technology was to reduce marking time in response to
changing SSRs and this is a key benefit of the technology. The reasons for the
development of CBA systems are therefore analogous to the reasons for the decline
in formative assessment usage.
The CourseMarker CBA system [FHH+01, FHS+01] provides functionality for the
authoring, running, marking, and administering of CBA exercises. It is the successor
to the widely-used Ceilidh system [FHT+99] but has better performance, scalability,
extensibility and maintainability. CourseMarker can accommodate diagram-based
CBA through an integrated system known as DATsys [Ta02].
CourseMarker and DATsys represent a powerful mechanism for conducting
diagram-based CBA in a summative context. However, the system is not suitable for
purely formative assessment. Both the mechanism for the marking of diagrams and
the feedback facilities are, while powerful, insufficiently flexible to provide formative
assessment courses. Furthermore, there is no mechanism for the marking of diagram
layout, which is essential to formative, diagram-based CBA.
The work in this dissertation presents research, design, implementation and
evaluation of techniques that facilitate the construction of formative, diagram-based
CBA exercises which are unique in the literature.
This chapter presents the motivation for the work and the scope of the thesis and
highlights the key contributions novel to this work. The chapter ends with a chapter-by-chapter synopsis of the thesis.
1.1 Background
1.1.1 Motivation
A key objective of this research was to “close the gap” between the pedagogical
practices of formative assessment and the field of CBA. Strategies in the literature
which attempt to bolster the position of formative assessment within contemporary
higher education range from a managed reduction of the assessment load to its mechanisation; however, this
mechanisation rarely extends beyond ideas such as paper tick-sheets and pre-written
feedback statement banks [Rc01]. CBA systems, on the other hand, are often derived
from earlier ad-hoc marking scripts informally developed to aid individual lecturers
(usually in Computer Science departments); such a description fits Ceilidh precisely.
Even when a system is formally designed, as with CourseMarker, the effectiveness of
material delivery and feedback is often measured in terms of student questionnaires
rather than by reference to formal teaching principles.
The motivation for this research was therefore to prove that the delivery of a CBA
course whose primary aim was for formative assessment could be feasibly achieved,
could adhere to principles of good formative assessment and would be useful in
practice. Diagram-based domains were used as a vehicle for the research due to the
prevalence of diagram-based coursework in many academic disciplines and the free-response nature of the student submissions.
No earlier work describes the formative CBA of diagram-based domains. This is
because only recently, with the completion of DATsys, has an extensible, scalable
and maintainable framework for diagram-based CBA exercises been available.
1.1.2 Scope
Figure 1.1: The thesis scope at a high level
The formative CBA of exercises in diagram-based domains requires theory and
techniques from the disciplines of learning technology, education and diagramming.
Figure 1.1 illustrates the relationships between these disciplines in the context of this
work.
From a Learning Technology perspective, this research investigates the area of free-response CBA exercises. CBA involves the delivery of course materials, the input of
student solutions, the marking and the returning of feedback to the student
automatically within an integrated online system. Free-response CBA allows
students to construct a solution within an online environment rather than simply
selecting one or more options from among distracters.
From the field of education research, this work focuses on formative assessment.
Formative assessment involves the provision of feedback to the student in order to
enhance learning. Good feedback should motivate learning and opportunities for the
student to redeem a poor solution should be provided.
From the field of diagramming, this work investigates diagram layout. The function
of a diagram is to convey meaning to the observer. For this to occur successfully a
diagram must have an aesthetically acceptable layout as well as the correct diagram
elements, connected in the correct manner.
1.2 Brief Overview
1.2.1 General Objectives
This research aims to investigate, propose, design, implement and evaluate
techniques which allow formative assessment of diagram-based coursework
exercises to be conducted through a CBA system with a practical amount of effort
required by those responsible for setting and administering the course. These
techniques are illustrated through deliverables which demonstrate the practical
benefit of the work.
Two central questions formed the inspiration for this work:
• To what extent can CBA techniques be used to reduce the resources involved in setting a formatively assessed coursework in a diagram-based domain, marking student submissions and returning feedback, while still adhering to good formative assessment principles?
• To what extent would current, successful CBA practices need to be changed to conform to formal formative assessment guidelines?
DATsys provides a simple interactive interface for the authoring of new diagram
domains. Marking, however, is achieved through the creation of marking tools. A
more detailed set of questions arises from the need for generality across diagram
domains:
• To what extent will it be possible for the educator to provide formative feedback in many diagram-based domains by configuring the system and writing feedback comments rather than through impractically complex programming?
• To what extent can standardisation of CBA processes occur without the assessment failing to meet the standards of formative assessment guidelines?
The next set of questions arises from the specific need to mark aesthetic diagram
layout:
• To what extent can an automated system for the marking of diagram aesthetics generate useful results within a multitude of diagram domains?
• To what extent will domain-specific layout rules be required, and can these be specified by the educator with a practical level of effort?
• What trade-offs are required in a system to mark diagram aesthetics to ensure generality across domains whilst at the same time allowing specialisation when necessary?
Finally, it is necessary to ask questions regarding the performance of a CBA system
in conducting formative assessment within diagram-based domains:
• Can formative assessment be rendered less resource-intensive through the use of CBA technology? Conversely, can CBA technology be used to deliver good formative assessment?
• Can a formative assessment process automated using CBA technology enhance student learning?
1.2.2 Problems and Specific Objectives
The general objectives resulted in four major problem areas being examined.
The initial problem area concerns the concrete identification of the problems arising
from applying CBA techniques to a formative assessment course. An initial period of
research, documented in chapter 4, involved the running of a live formative
assessment course for entity-relationship diagrams within a Database Systems
module. This ensured that solutions proposed by the work were relevant to
formative assessment practice and suggested the three subsequent problem areas for
further research.
Further research concentrates on three main problem areas arising out of the
practical experience gained. The first subsequent problem area is concerned with the
aesthetic appearance of diagrams. Work in the field of graph layout is extensive
[BET+94, Sk02] and work on aesthetics has been conducted in the context of user
interface design [SP04, NTB00]. However, diagram-based CBA research to date
[Ta02, HL98] has not attempted to assess diagram appearance. The objective within
this problem area is to design and implement a flexible framework for the
assessment of diagram appearance which takes into account general aesthetic
principles and the layout rules of specific diagram domains. The approach must be
general since multiple diagram domains need to be assessed.
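To give a flavour of what a generic, domain-independent layout measure might look like, the sketch below scores a diagram by the fraction of link pairs that do not cross. This is only an illustrative Python sketch: the segment representation, the function names and the exact formula are assumptions for illustration, not the measures defined later in this thesis.

```python
# Illustrative sketch of one generic aesthetic measure: the fraction of
# link pairs in a diagram that do not cross one another.
from itertools import combinations

def ccw(a, b, c):
    # True when the points a, b, c make a counter-clockwise turn.
    return (c[1] - a[1]) * (b[0] - a[0]) > (b[1] - a[1]) * (c[0] - a[0])

def segments_cross(s, t):
    """Proper intersection test for two line segments s and t."""
    (p1, p2), (p3, p4) = s, t
    return (ccw(p1, p3, p4) != ccw(p2, p3, p4) and
            ccw(p1, p2, p3) != ccw(p1, p2, p4))

def non_intersection_measure(links):
    """links: list of ((x1, y1), (x2, y2)) segments; returns a value in [0, 1]."""
    pairs = list(combinations(links, 2))
    if not pairs:
        return 1.0
    crossings = sum(segments_cross(s, t) for s, t in pairs)
    return 1.0 - crossings / len(pairs)

tidy = [((0, 0), (1, 0)), ((0, 1), (1, 1))]   # parallel links, no crossing
messy = [((0, 0), (1, 1)), ((0, 1), (1, 0))]  # links that cross
print(non_intersection_measure(tidy))   # 1.0
print(non_intersection_measure(messy))  # 0.0
```

Because such a measure inspects only geometry, never the meaning of the diagram elements, it applies unchanged across diagram domains; domain-specific layout rules must be layered on separately.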
The second subsequent problem area is concerned with the marking process.
Diagram-based coursework may have one or more mutually exclusive solutions.
CBA software, including CourseMarker, sometimes penalises solutions which
deviate from the model solution [Fj01] and has no mechanism for accommodating
multiple model solutions to a problem. Good formative assessment should be able to
assist students attempting to solve the coursework in different ways. The objective
within this problem area is to design and implement techniques to allow variation
within model solutions to a problem in order that any one of several mutually
exclusive solution elements may be considered a correct solution.
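As a minimal illustration of this objective (the case representation and the similarity function below are invented for the sketch, not the expressive notation developed later in this thesis), marking against mutually exclusive cases can credit only the best-matching case, so that choosing one acceptable modelling approach never penalises the student for omitting the other:

```python
# Illustrative sketch: mark a solution against mutually exclusive model
# cases and credit the best-matching case only.

def similarity(solution, model):
    """Fraction of the model's elements present in the student solution."""
    model = set(model)
    return len(set(solution) & model) / len(model)

def mark_with_cases(solution, cases):
    """Each case is one acceptable model solution; only the best case
    contributes to the mark, so alternatives never penalise each other."""
    best = max(cases, key=lambda case: similarity(solution, case))
    return best, similarity(solution, best)

# Two mutually exclusive ways to model the same requirement (assumed example):
case_a = ["Customer", "Order", "places"]            # binary relationship
case_b = ["Customer", "Order", "Placement", "via"]  # associative entity
student = ["Customer", "Order", "places"]

case, score = mark_with_cases(student, [case_a, case_b])
print(score)  # 1.0 -- the solution matches case A exactly
```

A real diagram-matching tool would of course compare structure rather than flat element lists, but the principle of scoring each case independently and keeping the best is the same.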
The third subsequent problem area is concerned with the provision of good
formative feedback to the student. Feedback is the primary deliverable associated
with formative assessment [Kp01] and detailed advice is available on the principles
of good formative assessment feedback [JMM+04]. CBA software can return detailed
feedback to the student [HST02], however this feedback is often indiscriminate since
it was developed within the context of summative assessment where the emphasis
lies in providing a detailed breakdown of the mark obtained. In formative
assessment it is more important that feedback is motivational, prioritised and limited
in scope to focus student attention on the most serious weaknesses. The objective
within this problem area is to design and implement techniques to provide targeted,
motivational feedback within a formative context where the student can
incrementally improve their solution using multiple submissions.
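The principle can be sketched as follows. The ranking mirrors the weight-times-deficiency scheme outlined later in this chapter, but the function names, weights and feedback comments are invented for illustration:

```python
# Illustrative sketch of prioritised, truncated feedback: each marking
# factor is ranked by (weight of the factor) * (deficiency of the student
# solution in that factor), and only the top few comments are returned.

def prioritise_and_truncate(results, limit=3):
    """results: list of (comment, weight, score) with score in [0, 1].
    Deficiency is 1 - score, so heavily weighted weaknesses come first."""
    ranked = sorted(results,
                    key=lambda r: r[1] * (1.0 - r[2]),  # weight * deficiency
                    reverse=True)
    return [comment for comment, weight, score in ranked[:limit]]

results = [
    ("Entities are largely correct", 30, 0.9),
    ("Several relationships are missing", 40, 0.2),
    ("Cardinalities are incomplete", 20, 0.5),
    ("Layout could be tidier", 10, 0.7),
]
print(prioritise_and_truncate(results, limit=2))
# The missing relationships (priority 40 * 0.8 = 32) outrank
# the incomplete cardinalities (20 * 0.5 = 10).
```

Truncation keeps the feedback digestible between submissions: as the student fixes the most serious weaknesses, lower-priority comments surface on subsequent attempts.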
1.2.3 Approach
Initially, a live experiment was implemented using CourseMarker / DATsys.
Coursework involving the construction of entity-relationship diagrams was assessed
as part of an undergraduate module in Database Systems. A new marking tool for
assessing entity-relationship diagrams within CourseMarker was developed.
Problems arising from this live experiment were used to determine which aspects of
CBA practice would need to be augmented to conduct formative assessment
successfully, a general objective of the work as discussed in section 1.2.1. The results
failed to achieve good formative assessment practice and the system did not meet
CBA criteria in several key aspects, but the experiment was crucial in identifying the
shortcomings of current practices. The subsequent design stage of the thesis could
then confidently be targeted on real problems arising from CBA of formative
assessment. This work is documented further in Chapter 4.
Subsequent work in the marking of diagram layout, handling of mutually exclusive
solution cases and the delivery of truncated, prioritised feedback aimed for a high
level of generality. The number of diagram-based domains used in coursework
across multiple academic disciplines is large. The approach taken is to determine
those factors common across domains and to encapsulate them systemically. The
differences between domains are then specified on a per-domain basis through
parameterisation and extensions.
To facilitate the marking of diagram layout, a distinction is drawn between aesthetic
measures, which denote the commonality across domains, and structural measures,
which represent the differences between domains. Layout marking of a specific
domain is achieved through configuration of the aesthetic measures to indicate the
relative importance of the factors, together with specification of the structural
measures on a per-domain basis, if required. In the handling of mutually exclusive
solution cases, a high level of generality is achieved through authoring an expressive
notation for the specification of solution cases and their relationships. To generate
prioritised feedback, a system of prioritisation of marking factors is developed based
upon the relative weight of the factor and the deficiency of the student solution
within the factor. Categorisation of factors helps the feedback to be balanced.
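The configuration-driven layout marking described above might be sketched as below. The measure names, weights and the linear raw-score-to-mark scaling are assumptions chosen for the sketch, not the configuration format used by the extensions themselves:

```python
# Illustrative sketch of per-domain layout marking by configuration:
# generic aesthetic measures are weighted to suit the domain, combined
# into a raw score, and the raw score is scaled to a mark.

def layout_mark(measures, weights, scale=(0.4, 1.0)):
    """measures: {name: value in [0, 1]}; weights: {name: importance}.
    Raw scores at or below scale[0] map to zero and scale[1] maps to full
    marks, so imperfect but acceptable layout is not over-penalised."""
    total = sum(weights.values())
    raw = sum(weights[name] * measures[name] for name in weights) / total
    low, high = scale
    return max(0.0, min(1.0, (raw - low) / (high - low)))

# Assumed domain configuration: crossings matter most for this domain.
weights = {"non_intersection": 3, "equilibrium": 1, "non_interception": 2}
measures = {"non_intersection": 0.9, "equilibrium": 0.6, "non_interception": 1.0}
print(round(layout_mark(measures, weights), 2))  # 0.81
```

Marking a new domain then reduces to choosing weights (and, where needed, adding structural measures) rather than writing new marking code.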
1.2.4 Contributions
The primary contribution of this research is in the area of CBA. The combination of
the handling of mutually exclusive solution cases and provision of truncated,
prioritised feedback is new to CBA and aids the construction of formatively assessed
courses in free-response domains using CBA software. CBA in diagram-based
domains is also enhanced through the marking of the layout of student solutions.
Coursework has been constructed in the domains of entity-relationship diagrams
and object oriented design diagrams. The initial work also contributes to the
understanding of the problems associated with the use of CBA software in
unfamiliar (and unanticipated) contexts.
A second contribution is in the area of diagramming. A flexible and powerful
platform for the generic assessment of diagram layout has been provided.
1.3 Synopsis of the Thesis
Chapter 1 outlines the background of the research by showing its motivation and its
scope. It gives a brief overview in terms of the general objectives and approach of the
work and explains the contributions made by the work. Theory and techniques from
the fields of Learning Technology, Education and Diagramming are combined. The
research aims to present techniques to achieve the formative CBA of student
coursework in diagram-based domains. To achieve this, initial research is presented
to demonstrate the problem areas: the marking of aesthetic diagram layout, the
accommodation of mutually exclusive solution cases and the delivery of truncated,
prioritised feedback to the student. The work presents novel solutions within each of
these specific problem areas.
Chapter 2 introduces the key concepts from the areas of CBA, formative assessment
and diagramming. The focus of the work in CBA is in free-response exercises within
a formative assessment context. The focus of the work in formative assessment is in
providing feedback to the student to assist the process of further learning. The focus
within a diagramming context centres on the perception of diagrams from the fields
of graph layout and aesthetics. This chapter presents the background and main
problems within each area, in order that chapter 3 can concentrate on the most
relevant approaches found in the literature.
Chapter 3 presents a critical analysis of the work upon which this research is based,
together with other relevant work from the literature. Existing work in free response
CBA and diagram editing is documented and other approaches to cope with the
resource-intensiveness of formative assessment are examined.
Chapter 4 presents the initial practical research conducted. Coursework involving
the construction of entity-relationship diagrams was assessed using CourseMarker /
DATsys as part of an undergraduate module in Database Systems. A new marking
tool for assessing entity-relationship diagrams within CourseMarker was developed.
The experiment is described, key results are presented and conclusions are drawn
which feed into the subsequent design and implementation chapters.
Chapter 5 examines the provision of formative CBA within diagram-based domains
and outlines the problems which must be overcome in light of the conclusions drawn
by the preliminary work in chapter 4. The three identified problem areas are
concerned with the assessment of aesthetic diagram layout, the handling of mutually
exclusive sections of solutions and the provision of concise, motivational feedback to
the student in line with formative assessment principles. The chapter also examines
the trade-off between simplicity of configuration, so that the creation of formative
assessment coursework by the educator remains practical, and expressiveness, so that
many diagrammatic domains can be assessed.
Chapter 6 documents the design decisions made in creating subsystems to augment
CourseMarker and shows how these decisions satisfy the objectives identified in
chapter 5. A generic framework for marking diagram layout is designed that can be
customised to mark individual diagrammatic domains. An expressive notation for
the specification of solution cases and their relationships is developed to facilitate the
marking of coursework where the solution has one or more mutually exclusive
elements. To generate prioritised feedback, a system of prioritisation of marking
factors is developed based upon the relative weight of the factor and the deficiency
of the student solution within the factor. Categorisation of factors helps the feedback
to be balanced.
Chapter 7 reports on a prototype system which was developed as an extension to
CourseMarker. It documents the three main subsystems developed in response to the
three identified problems and shows how they interact with the existing
CourseMarker CBA system and the integrated DATsys environment.
Chapter 8 evaluates the prototype system and documents the success of the approach
taken by the research. Formative assessment can be conducted in multiple diagram-based domains using CBA techniques. The evaluation considers the success of the
design from the point of view of educators and documents results with students in
example domains.
Chapter 9 reviews the thesis’ key points and shows how the evaluation of the system
in chapter 8 relates to the general objectives for research stated in chapter 1. The
contributions of this research to the fields of CBA, formative assessment and
diagramming are discussed. Areas for future work are considered. The thesis shows
that formative assessment within diagram-based domains can be feasibly conducted
using CBA techniques and is useful in the practical context of higher education.
Chapter 2
CBA, Formative Assessment and Diagramming
Introduction
This chapter provides the research background in the fields of Computer Based
Assessment, formative assessment and diagramming. A section is presented for each
field.
Section 2.1 introduces Computer Based Assessment (CBA). CBA is defined
specifically in terms of its relationships with other areas of learning technology. A
brief historical overview of automatic assessment is provided and the motivation for
the development of the technology is explained. The advantages and limitations of
CBA techniques are considered. Methods to minimise the limitations associated with
CBA usage are documented.
Section 2.2 provides an overview of formative assessment. Formative assessment is
defined and its differences to other forms of assessment emphasised. The merits of
formative assessment are considered and the decline in formative assessment usage
within education institutions explained. Strategies to overcome those drawbacks of
formative assessment which are responsible for this decline are considered. The
provision of feedback as the primary aim of formative assessment is explained and
criteria for good formative feedback are presented.
Section 2.3 examines the concept of diagrams. The role of diagrams in education is
examined and the presence of diagrams in a wide number of academic disciplines is
demonstrated. An overview of the academic study of diagrams is provided and the
concept of aesthetics in diagramming is introduced.
2.1 Computer Based Assessment
2.1.1 Definition
As institutions seek to maintain teaching and assessment standards with decreasing
unit-resource, attempts are being made to automate some or all of those processes
necessary for conducting teaching, learning and assessment — such as authoring of
course and assessment material and mark schemes, distribution of material and
questions to the learner, development and submission of student solutions, course
administration and marking [CE98a]. The collection of processes necessary for
conducting a piece of assessment is known as the lifecycle of the assessment.
Computer Based Assessment (CBA) constitutes a section of learning technologies
distinguished from others by the number and types of processes that are automated
within the lifecycle. The relationships between CBA and Computer Assisted
Assessment (CAA), Computer Assisted Learning (CAL) and Computer Based
Learning (CBL) can be represented as shown in Figure 2.1 [HB06].
[Figure 2.1 depicts CAL and CAA, which automate some stages within the lifecycle, specialising into CBL and CBA respectively, which automate the full lifecycle. CAL and CBL focus on the delivery of course material to learners, while CAA and CBA focus on assessment but may also involve the delivery of course materials.]
Figure 2.1: Relationships between CAL, CBL, CAA and CBA
Computer Assisted Learning (CAL) is a generalised term which refers to the use of
technology to ease the learning process in virtually any way. Only tasks of teaching
and learning may be automated and even these in a superficial way with little coordination between the automation of separate tasks. Thus, the delivery of lecture
materials using software packages such as Microsoft PowerPoint or allowing students
to print out lecture notes from a centrally available resource would constitute a basic
form of CAL.
Computer Based Learning (CBL) is defined as that subset of CAL in which the
learning materials must be presented to the student online via a computer terminal
in a coherent system; the implication is that the student is primarily responsible for
navigating through the course materials available online and structuring their
learning at their own pace. Only tasks of teaching and learning need to be
automated, but the intention is to create a coherent system which can be utilised by
the student with little need for input from a teacher.
An example of CBL is MacCycle [Bj93], used at St. Andrews University to teach
second-year medical undergraduates about the menstrual cycle as a replacement for
lectures on the topic. The students are able to work through the material, which
includes text, images, video and interactive sections showing dynamic changes in
hormone levels, at their own pace and are then asked to write and electronically
submit an essay based upon what they have learned from the system. However, the
assessment, an essay, is later printed out by the tutor and hand-marked. CBL
can therefore be seen as a specialisation of CAL in which an entire process of learning
is conducted online through a computer terminal.
Computer Assisted Assessment (CAA) refers to the use of technology to deliver
coursework to the students, mark student responses and conduct analysis of
submitted coursework. In CAA the automation of certain tasks of assessment,
teaching and learning are likely to be present, but some stage of the process (often
the development of solutions by candidates) is still accomplished using mechanisms
outside the system.
A common practice which constitutes CAA involves using Optical Mark Reader
(OMR) technology to read student responses to an assessment from paper into an
assessment system which compares the responses against a set of model answers.
The teaching and assessment material would likely have been created using
computerised means as well. In this way, CAA can be seen as that
specialisation of CAL which involves automated marking and analysis of student
submissions as well as the delivery of materials.
Computer Based Assessment (CBA) therefore refers to the delivery of materials for
teaching and assessment, the input of solutions by the students, an automated
assessment process and the delivery of feedback, all achieved through an integrated,
coherent online system. It can therefore be seen as that specialisation of CAA in
which the entire process (including the development of solutions by candidates)
occurs online at a computer terminal and also as that specialisation of CBL in which
assessment must occur as part of the system, as well as the delivery of teaching and
learning materials. CBA is the most specialised form of learning technology to be
considered because it provides for the highest level of automation within a coherent
system; this, in turn, means that CBA has more potential in terms of time saving than
the other forms of learning technology. This model and definition are consistent with
those prevailing in the literature [CE98a, Ta02, SM97].
2.1.2 Development of Automated Assessment
Early automated assessment systems began to appear towards the beginning of the
1960s. The use of computers to automate simple, repetitive processes was already
appreciated and educators within fields such as computer science, physics and
mathematics were eager to take advantage of the time-saving potential offered by
automating the assessment process. The first automated assessment systems were
characterised by the use of simple marking mechanisms to assess simple question
types. Hollingsworth [Hj59, Hj60] describes a system used as early as 1959 to assess
a student machine language course which, despite problems of unreliability and lack
of security, was seen to be clearly justified on “economic grounds”.
The system described by Forsythe and Wirth [FW65] is similar to that of
Hollingsworth in that a simple matching mechanism was used to assess carefully
simplified exercises, in this case in the programming language Balgol, a variation on
Algol-58. Students submitted their work on punched cards and the marking was
done as a batch after the deadline for the exercise had passed. Subsequent systems
tended to improve the level of automation achieved and the number of exercise
domains which could be assessed.
Hext and Winings [HW69] describe a system which could assess three domains (two
variants of Algol and an assembly language) and whose batch processing was
entirely automated. Later papers by Taylor and Deever [TD76], Rottman and Hudson
[RH83] and Myers [Mr86] expanded automated assessment usage into domains
outside Computer Science, in physics, mathematics and chemistry respectively.
Despite a historical progression charting an increased level of automation in the
assessment process and a gradual widening of the domains covered by automated
assessment, all of the systems described above involve simple assessment
mechanisms being used to assess exercises which were carefully constructed by the
educator to be “assessable”. Automated assessment was seen not as a means of improving the pedagogic quality of assessment, but as a mechanism for increasing the speed with which assessment could be carried out, thereby making it feasible to assess growing numbers of students. As student
numbers increase further, this factor is likely to continue to be a motivator in the
development of automated assessment systems.
2.1.3 Motivation in Automated Assessment
Higher education institutions are confronted with the challenge of providing
academic courses to a higher number of students without the benefit of a
proportionate increase in teaching staff. Student-to-staff ratios (SSRs) are less
favourable to academic staff than in the past [Dfes05, Ml97], and SSR increases of
150% since the 1970s have been reported [AUT05]. The delivery of course materials,
assessment of student work, detection of plagiarism and administration of course
data are but a few of the academic tasks affected by this situation. Section 2.1.2 showed
that, historically, automated assessment research has been motivated by a desire to
assess more students with increased speed; this motivation would appear set to
increase given current conditions.
In the systems described in section 2.1.2 the automation often led to a change in the
presentation of the assessment process itself, which was constrained by the
limitations of the technology available. Students were obliged to submit their
solutions in a simplified form which could be assessed by the system; this introduced
notations for representing solutions which were far removed from the coursework
problems themselves. Only in more recent times have the pedagogic implications of
automated assessment come to be acknowledged. An evaluation of prominent
contemporary automated assessment systems is provided in Chapter 3. However, it
is clear that pedagogic principles cannot be ignored when the development of the
automated assessment field is considered.
Automated assessment is an inter-disciplinary topic and contributors to the field
often maintain the perspective of their original discipline when making
contributions. Section 2.1.1 defined CBA technology in terms of the number and
types of processes automated. Often, the extent to which the assessment process is
automated is determined by educator preconception rather than the practical limits
of the technology available. Canup and Shackelford [CS98], for example, argue that
final marking must always be performed by human graders using automation as a
simple aid. Mason and Woit [MW98, MW99] propose approaches which involve an
online system for the presentation of examination materials and collection of student
submissions but only a limited role for automatic marking. Joy and Griffiths [JG04]
describe a system which facilitates online student submission and allows both
students and graders to run automatic tests on submissions, but which provides
neither learning materials nor an integrated, fully automated marking process by
design. None of these systems, therefore, constitute CBA technology as defined in
section 2.1.1. This work will utilise a CBA approach based upon the full automation
of the lifecycle of an exercise in order to maximise the resource-saving potential
offered by the research. Furthermore, an integrated CBA system which allows
student solutions to be developed in a naturalistic, intuitive way within an
interactive environment will be used to minimise abstract representations which are
removed from the student learning process.
2.1.4 Benefits of Computer Based Assessment
The benefits of CBA technology fit into two broad categories: the practical and the
pedagogical.
The practical reasons were the motivation for the development of CBA in the first
place and were the focus of section 2.1.2. Charman and Elmes [CE98a] acknowledge
that CBA develops out of the desire to automate an increasingly large workload of
assessment, within the context of providing higher education to a larger proportion
of the population without proportionately higher resources. They consider that such
a scenario often leads to the following assessment strategies being adopted:
• Reducing the assessment loading for students;
• Evaluation of the function of each piece of assessment;
• Diversification of the assessment portfolio.
Adoption of CBA techniques often results from a decision to diversify the assessment portfolio, since CBA can be used to save time in the assessment process,
rather than abandoning sections of assessment altogether. Rust [Rc01] suggests six
methods for confronting the same issues: front ending, doing the assessment in class,
use of self and peer assessment, group assessment techniques, assessment
mechanisation, and strategic reduction strategies. CBA would clearly constitute a
mechanisation strategy, although it is worth noting that Rust’s own suggested
mechanisation strategies are confined to the use of paper tick-sheets and statement
banks to aid in the traditional feedback process.
Charman and Elmes [CE98a] emphasise, however, that the resource-saving potential
of CBA technology is often manifest in the long term. CBA systems are non-trivial to
develop and maintain, and commercial systems may be costly to purchase. A cost-benefit analysis of CBA technology used in a limited context over a short timescale
can easily be negative; six strategies are, therefore, suggested to maximise the
practical benefits of CBA technology.
Firstly, the advantages of CBA technology are maximised over a long timescale.
CBA technology is often difficult to develop or learn to use. Writing questions and
feedback can often be more time-consuming, early in the process, than the traditional
assessment methods being replaced would have been in their entirety. However,
once the technology is in place the resource-savings can be utilised repeatedly since
the process is automated and the total time costs compared with tutor based
assessment will often show considerable benefit.
Secondly, additional resources for development may be available. Many institutions
have funds available for technological development. Linking CBA deployment with
research may also allow further resources to be allocated.
Thirdly, introducing CBA technology may allow module delivery to be restructured,
allowing teaching assistants to assist in the administration of the CBA system.
Fourthly, wider departmental, or even institutional, benefits should be considered. If
a strategy for using the CBA technology across several modules or even departments
can be developed then the benefits of the technology are increased and the resource
burden shared.
Fifthly, existing assessments in the subject area should be considered. Many projects
in educational technology have been developed in recent years. Making direct use of
the experiences and even technological infrastructure developed by those who have
already experimented with automated assessment in the subject area can save
considerable resources. To this end, Charman and Elmes [CE98b] provide a useful
handbook which documents several existing academic CBA systems.
Sixthly, existing question banks should be utilised if they exist. Some institutions
have existing question banks whose questions could be automatically assessed, and
the establishment of national and international question banks is now underway
[SP03]. Re-usable digital resources which can be used to support learning are
available in Learning Object Repositories (LORs), which are often web-based. A
prominent example is MERLOT [Mf04]. Neven and Duval [ND02] provide an
overview of pertinent issues associated with LOR use.
The pedagogical benefits of CBA technology are often overlooked. Experience [CE98a,
BBF+93] has shown that CBA software:
• Increases assistance to weaker students because problems in learning can be immediately traced and teaching strategies adapted accordingly;
• Provides immediate feedback which ensures that the student can internalise the submission and feedback as one entity while both are fresh in the mind;
• Increases student consciousness about the assessment process since students are more willing to contest automatically generated results and, therefore, become interested in determining what the assessor is looking for in a model solution;
• Increases student confidence by allowing easy early exercises and by demonstrating to students that they are performing well;
• Encourages students to effectively manage their own workload since students can increase their mark through multiple submissions if they begin to submit before the deadline;
• Provides an opportunity to put the information learned in a course immediately into effect in the next piece of work.

Criterion | Meaning | Application to CBA
Valid | The assessment should measure what you want to measure and not depend on other qualities | Will measure specified coursework aspects assuming good initial assessment design
Reliable | The assessment should be consistent between assessors and for the same assessor on different occasions | The same assessment process will run for each submission; consistency is absolute
Fair | Assessment should provide equal opportunity to succeed; students should perceive the assessment as fair | Design-dependent; CBA has no inherent advantages
Equitable | Assessment should not discriminate between students other than by ability. Particular talents (e.g. exam technique) should not be disproportionately favoured | The same assessment process will run for each submission; discrimination is non-existent
Formative | See section 2.2 for a full definition | CBA provides a good opportunity to run assessment frequently throughout the learning process, and to provide multiple submissions with full feedback each time
Timely | Assessment should occur throughout the learning programme | CBA provides a good opportunity to run assessment frequently throughout the learning process
Incremental | Assessment should be a gradual process allowing achievement to be ‘built up’ | Design-dependent; CBA has no inherent advantages
Redeemable | Initial failure should not be absolute and students should have a second chance | CBA is suited to allowing multiple submissions should the designer wish this
Demanding | Assessments should be pitched at the right level of achievement and not be easy | Design-dependent; CBA has no inherent advantages
Efficient | Assessment should make efficient use of available resources; over-assessment should be avoided | Considerable time and other resource savings to be made; originally a motivator for CBA’s inception

Table 2.1: CBA provides concrete pedagogical benefits
Brown et al [BRS96] argue that good assessment should be valid, reliable, fair,
equitable, formative, timely, incremental, redeemable, demanding and efficient.
Table 2.1 provides an explanation of each of these terms and then considers whether
CBA meets the criterion in question. The definitions are consistent with those in
[CE98a]. It can be seen that in 7 of the 10 criteria CBA is likely to present a distinct
pedagogic advantage over traditional assessment, while in the remaining 3 criteria
CBA has no negative effect. Hence CBA can be said to have concrete pedagogic
benefits.
2.1.5 Limitations of Computer Based Assessment
Like the benefits of CBA which were the focus of section 2.1.4, the limitations of CBA
can be separated into two large categories: practical and pedagogical. Awareness of
these issues, together with careful design and prior planning, can help to minimise
the problems encountered during the assessment process.
A survey of teaching staff with technical backgrounds by Inoue [Iy01] concluded
that, given institutional support for the use of technology in education, the following
six practical factors have the greatest influence on the success or failure of
educational technology in general:
• Teachers’ knowledge and skills in technology. Training programs for educators are useful to ensure their awareness of available technology, their ability to choose appropriate technology and their ability to use the technology correctly.
• Availability of hardware and software. Research into educational software must be encouraged and appropriate hardware developed.
• Commitment by involved parties. Educators must have the determination to persevere in solving problems associated with implementing educational technology, rather than returning to traditional assessment forms.
• Availability of time. Educators must be aware that implementing educational technology can be an initially time-consuming process and allow time for this in their schedule.
• Availability of technical support. Educational technology can be difficult to implement. Developers must provide appropriate technical support to educators to overcome technical obstacles.
• Cost of hardware. The hardware on which the education software runs must not be prohibitively expensive for an institution to purchase and install.
Inoue emphasises that his findings are consistent with those of other studies, and
furthermore identifies the latter three limitations as oft-stated inhibitors for computer
use generally. Charman and Elmes [CE98a] focus on three central problems in this
area: the availability of equipment for writing CBA material, the availability of
equipment for the delivery of CBA material to students and the existence of an
infrastructure to implement CBA delivery on a suitably large scale.
In order to define the pedagogic limitations of CBA techniques it is necessary to define
precisely the type of learning which is to be assessed. Bloom’s Taxonomy of learning
objectives [BEF+56] is the learning model most cited by the automated assessment
community and classifies learning into three categories: cognitive, affective and
psychomotor. Most assessment is an attempt to evaluate cognitive learning. Bloom’s
Taxonomy further divides the cognitive learning domain into six levels of increasing
cognitive complexity as illustrated in table 2.2. Each cognitive level is assumed to
encompass those below it; for example, Comprehension cannot occur without
Knowledge. Bloom’s Taxonomy is both simple and easy to apply.
As a result of research conducted during the 1990s a revised version of Bloom’s
Taxonomy has been proposed by a group of researchers led by Anderson and
Krathwohl [AK01] in which the six cognitive levels are renamed Remembering,
Understanding, Applying, Analysing, Evaluating and Creating and form the
Cognitive Process Dimension of a two-dimensional taxonomy. The second
dimension is the Knowledge Dimension, comprising Factual Knowledge, Conceptual
Knowledge, Procedural Knowledge and Meta-Cognitive Knowledge. In the revised
Bloom’s taxonomy, curricular standards are aligned with both the Cognitive Process
Dimension and the Knowledge Dimension according to their position in a two-dimensional table.
Cognitive Level | Meaning
1. Knowledge | To recall information approximately as it was learned
2. Comprehension | To interpret information based upon prior learning
3. Application | To select data and principles to solve a problem with minimum outside assistance
4. Analysis | To distinguish and relate the assumptions, structure or hypotheses of a statement
5. Synthesis | To originate and integrate ideas into a proposal that is new to the student
6. Evaluation | To critique on the basis of explicit standards

Table 2.2: Bloom’s levels of cognitive learning
While many educationalists agree with Bloom’s general approach, some cognitive
psychologists, who doubt either the ordering or the distinction between the cognitive
levels, debate Bloom’s explicitly tiered architecture. For example, the alternative
RECAP taxonomy [Ib84, Ib95] advocates the view that the three highest cognitive
levels in Bloom’s taxonomy, Analysis, Synthesis and Evaluation, cannot be robustly
distinguished and hence presents a general category called “problem-solving” which
represents the three combined. Assessors who simply wish to ensure that surface learning is avoided have successfully used both taxonomies [DK01, BEH+05], and it is clear that the implications of this debate are smaller for assessment design than for research into cognitive psychology.
The radically different SOLO taxonomy [BC82] is notable because it deviates from
Bloom’s approach. SOLO is based upon the evaluation of student responses to
assessment rather than the design of the assessment itself. Stephens and Percik
[SP03], however, argue that SOLO sacrifices validity for increased reliability. Since
CBA has already been shown to be reliable in section 2.1.4, automated assessment is
overwhelmingly devised using either Bloom’s or Bloom-like taxonomies.
The pedagogic limitations of CBA techniques are related to the type of student
response which can be assessed by the system. Fixed-response assessment is a term
used to refer to assessment modes in which students must choose their answer from
a pre-designated selection of alternatives including distracters. Fixed-response
assessment modes, such as multiple-choice questions, are often criticised for simply
assessing the Knowledge of a student (the lowest level in Bloom’s taxonomy) and
therefore encouraging surface-learning strategies [JA00]. Although this problem
occurs with traditional assessment methods too, it is exacerbated in CBA because of the prevalence of fixed-response
assessment in CBA systems. Johnstone and Ambusaidi [JA00] note that such
methods of assessment have the following disadvantages:
• Students can guess the correct solution from the alternatives offered;
• Awarding negative marks to incorrect answers to discourage guessing discourages even “educated” guesses and, furthermore, can be shown to be statistically futile [BM00];
• Students can usually eliminate many distracters from common sense, leading to a situation sometimes called “multiple true-false”;
• It is often unclear to the educator why a student chose an answer — students can get the correct answer for the wrong reason;
• Negative discrimination can occur in which more knowledgeable students are disproportionately tempted by incorrect distracters;
• Students can be disproportionately affected by precise wording: the success rate of a question can realistically be changed by 20% by the simple use of an unfamiliar word.
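The futility of negative marking mentioned above can be illustrated with a simple expected-value calculation. This is an illustrative sketch of the standard argument, not the analysis of [BM00]; the function name is hypothetical.

```python
# Expected mark for a student guessing blindly on an MCQ with n
# alternatives and exactly one key. A guess hits the key with
# probability 1/n and a distracter with probability (n-1)/n.
def expected_guess_mark(n: int, penalty: float) -> float:
    return (1 / n) * 1.0 + (1 - 1 / n) * (-penalty)

# Four alternatives, no penalty: blind guessing still earns marks on average.
print(expected_guess_mark(4, 0.0))              # 0.25
# The "balanced" penalty 1/(n-1) merely pushes the expectation to zero;
# it cannot separate educated guesses from blind ones, and it also deters
# risk-averse students from attempting educated guesses at all.
print(round(expected_guess_mark(4, 1 / 3), 6))  # 0.0
```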
In their overview of the automated assessment field, Bull and Danson [BD04] note
that “CAA is most commonly associated with multiple-choice questions” while Culwin
[Cf98] notes that the majority of CBA systems to date consider only fixed
responses. Fixed-response assessment has the practical advantage to the CBA
designer that the assessment algorithm can be kept simple, since it is necessary only
to check whether the correct response has been provided. It is for this same reason
that other mechanised learning systems, such as Optical Mark Recognition, are based
upon fixed-response questions. However, this has led to CBA suffering from many of
the perceived disadvantages of exclusively fixed-response assessment in the minds of
educators, since the two are seen to be synonymous; in a web survey conducted by
Carter et al [CDE+03], only 36% of academics agreed or strongly agreed with the
statement “It is possible to test high-order learning using CAA”.
It is, however, possible to take action to minimise the pedagogic limitations of CBA.
Charman and Elmes [CE98a] suggest the careful construction of assertion-reason
multiple-choice questions as a technique for assessing deeper student understanding
of material, while Duke-Williams and King [DK01] present research into question
design techniques for use with multiple-choice and graphical hotspot questions to
ensure assessment of higher learning outcomes. Furthermore, it is possible to
conduct free-response CBA where the pedagogic limitations are reduced, though this
is less common in the literature due to the complexity of the marking algorithms
required. Section 2.1.6 presents a taxonomy for CBA in which the types of fixed-response and free-response questions are considered. Their advantages and
limitations are then examined.
Bull and Danson [BD04] argue that the prevalence of CBA which tests only basic
knowledge is a result of misconception, lack of pedagogic understanding and poor
question design rather than inherent limitations of the technology. They counter the
problem in “cultural acceptance” by drawing attention to the existence of automated
assessment systems which “draw on an extensive range of sophisticated question types by
using computers to create questions which would not be possible using the medium of paper.”
They continue that: “CAA offers the opportunity to creatively extend the range and type of
assessment methods used [to] support and enhance student learning in ways which are not
possible with paper-based assessments.”
2.1.6 A Taxonomy for Computer Based Assessment
Section 2.1.5 introduced the concepts of fixed-response and free-response automated
assessment. The key difference between the two is the process by which the learner
constructs their solution to the problem.
In fixed-response systems the learner chooses a solution from a fixed number of
clearly defined alternatives. One or more of the alternatives is the correct solution;
the other alternatives are incorrect and serve as distracters. The automated
assessment system is required to record whether the solution submitted was correct
or a distracter; no unanticipated solutions are permitted. Fixed-response assessment
may also be referred to as objective assessment.
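The simplicity of the required marking algorithm can be sketched in a few lines. The names below are hypothetical and are not drawn from any particular CBA system.

```python
# Fixed-response marking: the designer enumerates the keys in advance;
# every other alternative is a distracter, and no unanticipated
# solution can occur.
def mark_fixed_response(selected: set, keys: set) -> bool:
    """Correct only if exactly the keys were chosen (covers both
    single-key MCQs and multiple-response questions)."""
    return selected == keys

print(mark_fixed_response({2}, {2}))      # True: the single key was chosen
print(mark_fixed_response({1}, {1, 3}))   # False: only one of two keys found
```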
In free-response systems the student is presented with an environment within which
a solution can be constructed in a freeform way. Free-response assessment requires a
more complex marking algorithm since the student solution cannot be precisely
anticipated. Free-response automated assessment constitutes only a small minority of
the platforms in existence because the complexity of the development process acts as
a deterrent. Culwin [Cf98] notes that the development of free-response automated
assessment is, in comparison with its fixed-response counterpart, “much harder or even
impossible.”
2.1.6.1 Fixed-response Automated Assessment
Fixed-response automated assessment includes the assessment of multiple-choice
questions (MCQs), short response exercises or graphical hotspot exercises.
Multiple Choice Questions (MCQs) present the user with a statement or stem,
followed by a series of choices from which a selection must be made [JA00].
Traditionally, one choice was the correct answer, often referred to as “the key”. The
remaining choices, “the distracters”, were incorrect. A variation in which the student
must identify more than one key is often referred to as a “multiple response
question”. Frosini et al [FLM98] provide a list of MCQ variants: simple true/false,
item order, multiple choice, multiple response, combination, gap filling and best
answer. Modern automated assessment systems which provide a platform for MCQs
do not restrict either the stem or the choices to text; images, sounds or video footage
are all examples of credible MCQ components.
Many commercial systems offer platforms for conducting large scale automated
assessment of MCQs. QuestionMark [BSP+03], which claims to be the world-leader,
allows for the creation of multiple-choice questions within an interactive authoring
environment and includes facilities for the presentation of questions involving a
wide variety of colours, fonts and pictures, as well as the option to automatically
open other applications (for example, a spreadsheet package or calculator) during the
assessment. The associated program Perception [BFK+04] allows for assessment to be
distributed over the Web. Commercial competitors with comparable facilities include
EQL Interactive Assessor [Mp95], LXR*TEST [GW01] and QuizIt [TBF97]. A
comparison of the features of these tools is provided by Baklavas et al [BER+99].
Short response exercises require the student to provide an answer in the form of a
word, short phrase or number. Students typically present their answer in response to
a stem question, as for MCQs, or may be asked to complete a sentence or phrase. The
assessment system compares the answer provided against one or more correct
answers, or “keys”. The system may expect an exact response, or a degree of
flexibility may be introduced through the use of regular expression-like notations
such as Oracles [ZF92] or allowance of rounding errors for numerical answers.
Academic systems which support short response exercises include TRIADS [Md99]
and Ceilidh [BBF+93].
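The flexible matching described above can be sketched as follows. This is a hypothetical illustration in the spirit of regular-expression-like notations such as Oracles, not their actual syntax.

```python
import re

# Hypothetical short-response marker: a key is either a regular
# expression (tolerating variations in wording and case) or a target
# number with a rounding tolerance.
def mark_short_response(answer, pattern=None, target=None, tolerance=0.0):
    if pattern is not None:
        return re.fullmatch(pattern, answer.strip(), re.IGNORECASE) is not None
    return abs(float(answer) - target) <= tolerance

print(mark_short_response("The Menstrual Cycle",
                          pattern=r"(the )?menstrual cycle"))         # True
print(mark_short_response("3.141", target=3.14159, tolerance=0.001))  # True
```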
Graphical Hotspot Exercises require the student to select an area on a graphic, to
connect two or more graphics together using a connection line or to arrange graphics
on a canvas containing pre-defined positions. The correct response is defined in
advance, usually as a sequence of “target areas” within which an answer is deemed
to be correct. Later versions of the commercial QuestionMark software [BSP+03] allow
this type of exercise while, previously, many domain-specific systems were created
using multimedia authoring packages such as Asymetrix Toolbook [Asy94] and
Macromedia Authorware [Mac95].
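Marking a hotspot response reduces to a containment test against the pre-defined target areas. The sketch below is minimal and the rectangle representation is an assumption, not QuestionMark's actual target format.

```python
# A target area is a rectangle (left, top, right, bottom) in canvas
# coordinates; a selected point is correct if it lies inside any target.
def in_target(x, y, targets):
    return any(l <= x <= r and t <= y <= b for (l, t, r, b) in targets)

targets = [(40, 40, 90, 80)]         # one pre-defined target area
print(in_target(55, 60, targets))    # True  (inside the target)
print(in_target(10, 10, targets))    # False (outside every target)
```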
2.1.6.2 Free-response Automated Assessment
Free-response automated assessment includes the assessment of programming
assignments, essay exercises and diagrammatic exercises.
Programming assignments occur frequently within Computer Science education.
Efforts to automate them result from the historically increasing popularity of
technology courses, in terms of student numbers, and the background of many
educational technology pioneers within the Computer Science field itself.
The Ceilidh system [BBF+93] was an important pioneer in demonstrating the
feasibility and usefulness of automatically assessing programming assignments.
Ceilidh was also one of the first systems to cater for the full lifecycle of a CBA
exercise. Ceilidh checked for the presence of designated tokens in a student’s
program text and simulated output using an extended regular expression notation
called “oracles” [ZF92]. Through its multi-layered architecture, Ceilidh was able to mark assignments in many domains, including several programming languages, and it introduced several key CBA concepts such as marking tools, multiple user views and a logical course structure. Ceilidh was an important
influence on subsequent CBA development and is considered in more detail in
section 3.3.1.
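The token-checking idea can be caricatured as weighted pattern searches over the program text. The sketch below is illustrative only; the token set, weights and function name are assumptions and do not reproduce Ceilidh's actual marking tools.

```python
import re

# Award a weighted mark for the presence of required tokens in a
# student's program text.
def token_mark(program_text, tokens):
    found = sum(weight for pattern, weight in tokens.items()
                if re.search(pattern, program_text))
    return found / sum(tokens.values())

student = "while (i < n) { total += a[i]; i++; }"
required = {r"\bwhile\b": 2.0, r"\+=": 1.0, r"\bfor\b": 1.0}
print(token_mark(student, required))   # 0.75 (the 'for' token is missing)
```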
The direct successor to the Ceilidh system is CourseMarker [FHH+01, FHS+01].
CourseMarker takes its inspiration from the most successful aspects of Ceilidh but
benefits from an improved, object-oriented design and a platform-independent
implementation in Java. The result was a system which built upon the success of
Ceilidh, but which had increased usability, maintainability and extensibility. The
CourseMarker system was used as a platform for the work described in this thesis
and is considered in more detail in section 3.3.2.
Another system influenced by the example of Ceilidh is ASSYST [Jd00, JU97]. Like
Ceilidh, ASSYST caters for the full lifecycle of a CBA exercise and aims to be a
complete “grading support system” rather than a simple assessment tool. ASSYST is
used to analyse programming assignments in the C language according to
correctness, style, complexity and run-time efficiency; like Ceilidh, it is possible to
allocate proportional weightings to the tests. Unlike Ceilidh, ASSYST takes a
“hybrid” approach between CBA and manual marking. Jackson and Usher claim that
this enables the system to benefit from the marking consistency and speed associated
with CBA while still maintaining fine control over student results. This approach
does, however, negate some of the benefits of CBA, such as the ability to allow great
numbers of submissions and the immediate return of full feedback, since teacher
intervention in the assessment process is required for each submission.
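The proportional weightings mentioned above amount to a weighted sum of per-category scores; a minimal sketch (the weights and scores here are invented for illustration):

```python
# Sketch of proportional test weightings as used by systems such as ASSYST
# and Ceilidh (values invented for the example): each marking category
# contributes its score scaled by its weight, and the weights sum to 1.
weights = {"correctness": 0.5, "style": 0.2, "complexity": 0.15, "efficiency": 0.15}
scores = {"correctness": 80, "style": 70, "complexity": 90, "efficiency": 60}  # out of 100

# Overall mark: 0.5*80 + 0.2*70 + 0.15*90 + 0.15*60 = 76.5
overall = sum(weights[c] * scores[c] for c in weights)
```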
Another hybrid approach is taken by the BOSS system [JG04, JL98], which facilitates
online student submission and allows both students and teachers to run automatic
tests on submissions, but which provides neither learning materials nor an
integrated, fully automated marking process by design. The student can run
automated tests as an aid to constructing and evaluating their solution and is then
able to submit their solution online. The educators can then run automated tests on
the solution, invoke plagiarism detection mechanisms, mark the solution and return
feedback online. BOSS does not operate as a fully automated, coherent CBA system
as a design decision. Joy and Griffiths [JG04] acknowledge that an integrated CBA
approach can act as a formative process; however, they state that the aim of BOSS is
to concentrate “on the process, and measuring the correctness of students’ code” and argue
that CourseMarker and Ceilidh “prescribe” a style of programming through their
frequent, automated checking.
The Kassandra system described by von Matt [Mu94] conducts automatic testing of
programs written in the Matlab, Maple and Oberon languages. These languages are
mathematically based and correctness is determined by matching output data with
defined test data. Little infrastructure is provided for automatic testing and test
software is developed by the exercise developer. Kassandra does, however, support
more than one type of user: the “student” and the “assistant”.
The RoboProf system described by Daly [Dc99] also uses output checking. RoboProf
concentrates on assessing the syntax and structure of programming languages rather
than program correctness. RoboProf is influenced by the architecture of Ceilidh and
is used for formative assessment purposes. RoboProf is considered in more detail in
section 3.1.2.1 on CBA systems in a formative assessment context.
The TRAKLA system described by Korhonen and Malmi [KM00] is primarily a
Computer Based Learning system which presents a visual environment to teach
students the concepts of algorithms and data structures through the use of diagrams
and animations. However, a formative assessment component used to test student
understanding has been introduced based upon the Ceilidh model. The latest
version, TRAKLA2, is considered in more detail in section 3.2.1.
The ASAP system described by Douce et al. [DLO+05] utilises “test classes” which
can be seen as analogous to Ceilidh’s marking tools. Each test class must provide
objective criteria for evaluation, feedback for the test and a single mark to evaluate
the submission. Standardisation of test classes is accomplished through a template
superclass which all test classes must extend, as in CourseMarker. ASAP was
developed to be integrated into institution-wide e-learning frameworks such as
Blackboard or WebCT using VLE standards. This allows ASAP to take advantage of
institutional infrastructure and represents an advance on the typical CBA approach
of developing standalone systems.
The JEWL system described by English [Ej02] assesses student programs which
involve a Graphical User Interface (GUI). English believes that this increases student
motivation since programs with GUIs are seen by students as “real programs” rather
than toys. The JEWL system is an object-oriented GUI toolkit; for assessment, the
toolkit is replaced by a “test harness” which interprets those instructions which the
student program executes. Further research into the assessment of student GUIs is also being
carried out using CourseMarker as a platform [GH06].
Essay exercises are popular as an assessment tool since they are seen as a proven
way to test higher-order cognitive skills such as synthesis and analysis.
Landauer [LD97, LHL98] described an approach based upon Latent Semantic
Analysis (LSA). LSA emphasises essay content by analysing word co-occurrences
while ignoring the linguistic and structural features of an essay. LSA scores typically
correlate as well with human graders as different graders do with each other [CO97].
A variant of LSA was used as the basis of a prototype system for assessing essays
using Ceilidh as a platform [FL94].
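The co-occurrence idea behind LSA can be illustrated with a toy example (not Landauer's formulation; the texts and the choice of rank are invented): a term-document matrix is factorised with SVD and each essay is compared to a reference answer by cosine similarity in the reduced space.

```python
import numpy as np

# Toy LSA-style sketch: the first document is the reference answer, the
# remainder are "essays" to be scored against it.
reference = "photosynthesis converts light energy into chemical energy"
essays = [
    "plants use photosynthesis to turn light into chemical energy",
    "the battle of hastings took place in 1066",
]

def term_doc_matrix(texts):
    """Build a raw term-by-document count matrix from whitespace tokens."""
    vocab = sorted({w for t in texts for w in t.split()})
    index = {w: i for i, w in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(texts)))
    for j, text in enumerate(texts):
        for word in text.split():
            matrix[index[word], j] += 1
    return matrix

M = term_doc_matrix([reference] + essays)
U, s, Vt = np.linalg.svd(M, full_matrices=False)
k = 2                                # keep the k largest latent dimensions
docs = (np.diag(s[:k]) @ Vt[:k]).T  # document vectors in the reduced space

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Higher score = essay content closer to the reference answer.
lsa_scores = [cosine(docs[0], d) for d in docs[1:]]
```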
Page [Pe94] describes the Project Essay Grade (PEG) system, which uses a model
with the essay’s surface features, including document length, word length and
punctuation features, as independent variables and the essay score as the dependent
variable. PEG scores have been found to correlate better with human graders than
the graders correlate with each other [PPK97].
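PEG's approach amounts to multiple linear regression of human-assigned scores on surface features; a minimal sketch (the feature set and training data are invented for illustration):

```python
import numpy as np

# Hypothetical illustration of PEG-style scoring: fit essay scores as a
# linear function of surface features. Rows are essays; columns are
# [word count, mean word length, comma count] (an invented feature set).
features = np.array([
    [250, 4.1, 12],
    [480, 4.8, 30],
    [520, 5.0, 28],
    [130, 3.9, 5],
], dtype=float)
human_scores = np.array([55.0, 72.0, 75.0, 40.0])  # training scores

# Add an intercept column and fit the regression by least squares.
X = np.hstack([np.ones((len(features), 1)), features])
coeffs, *_ = np.linalg.lstsq(X, human_scores, rcond=None)

# Predict a score for a new essay from its surface features alone.
new_essay = np.array([1.0, 300, 4.3, 15])
predicted = float(new_essay @ coeffs)
```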
Rudner and Liang [RL02] describe a statistical approach based upon Bayesian
networks which is simple to implement and can be used on short essays. A student
response is classified into one of three grades (complete, partially complete or
incomplete) according to probabilities which have been associated with features of
the essay as likely to be appropriate, partial or inappropriate. Rudner and Liang
argue that their approach can incorporate the best features of earlier methodologies.
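The classification step can be sketched as a simple Bayesian update, assuming the feature likelihoods have been calibrated in advance (all features and probabilities here are invented for the example):

```python
# Sketch of Bayesian essay grading in the spirit of Rudner and Liang:
# each observed feature of the response updates the probability of the
# three grade categories via its pre-calibrated likelihood.
GRADES = ("complete", "partial", "incomplete")

# P(feature observed | grade), assumed estimated beforehand (values invented).
likelihoods = {
    "mentions_key_concept": {"complete": 0.9, "partial": 0.6, "incomplete": 0.2},
    "gives_example":        {"complete": 0.8, "partial": 0.4, "incomplete": 0.1},
}

def classify(observed_features, prior=None):
    """Return the posterior distribution over grades given observed features."""
    posterior = dict(prior or {g: 1 / len(GRADES) for g in GRADES})
    for feature in observed_features:
        for grade in GRADES:
            posterior[grade] *= likelihoods[feature][grade]
    total = sum(posterior.values())
    return {g: p / total for g, p in posterior.items()}

result = classify(["mentions_key_concept", "gives_example"])
```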
Diagrammatic exercises are the least common of the free-response CBA exercise
types examined here. Hirmanpour [Hi88] describes an automated diagramming tool
which allows students to develop Data Flow Diagrams, Entity-Relationship diagrams
and Structure charts without using licensed software. The assessment, however, is
traditional; automated assessment of diagrammatic exercises is regarded as difficult.
Power [Pc99] presents a development environment called Designer which allows the
student to interactively design structure diagrams. Designer can analyse the
diagrams and present programs and control structures. Designer allows the student
to interactively “walk-through” the program represented by the structure diagram
they have designed. The commercial object-oriented design package IBM Rational
Rose [Qt99] allows code templates to be generated from object-oriented class
diagrams in the C++ and Java programming languages, although this functionality is
primarily aimed at software developers.
Hoggarth and Lockyer [HL98] developed a system in response to problems in
teaching systems analysis and design. Computer Aided Software Engineering
(CASE) tools are often used in teaching this subject to allow students to apply the
basic theory and concepts, but most CASE tools are intended for commercial use and
do not cater for students who require assistance in underlying concepts. Hoggarth
and Lockyer describe an interactive CBA learning system which embeds a Computer
Assisted Learning (CAL) system within an existing CASE system to assist student
understanding. A verification mechanism for student diagrams relies on the student
manually matching the meanings of ‘tokens’ (the names of diagram components) in
their solution with the corresponding tokens in the model solution. The verification
mechanism then compares the diagrams as two directional “flows” of nodes and
connections and notes the differences in ordering between the two. Specific feedback
is then provided to the student, which can be used to improve the solution in an
iterative, formative process. Hoggarth and Lockyer’s diagram comparison system
and the feedback provided are reviewed in more detail in section 3.2.3.
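The ordering comparison can be sketched as a sequence diff over the matched node flows (an illustrative reconstruction, not Hoggarth and Lockyer's actual algorithm; the diagrams and feedback wording are invented):

```python
import difflib

# Toy example: each diagram is reduced to an ordered "flow" of node tokens,
# after the student has matched their token names against the model solution.
student_flow = ["Start", "Read input", "Validate", "Write report", "End"]
model_flow = ["Start", "Read input", "Validate", "Process", "Write report", "End"]

def flow_feedback(student, model):
    """Yield feedback lines describing differences between the two flows."""
    matcher = difflib.SequenceMatcher(None, student, model)
    for op, s1, s2, m1, m2 in matcher.get_opcodes():
        if op == "delete":
            yield f"Unexpected element(s): {student[s1:s2]}"
        elif op == "insert":
            yield f"Missing element(s): {model[m1:m2]}"
        elif op == "replace":
            yield f"Expected {model[m1:m2]} but found {student[s1:s2]}"

feedback = list(flow_feedback(student_flow, model_flow))
# The student's diagram lacks the "Process" step, so one feedback line results.
```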
DATsys [Ta02] is a framework for conducting diagram-based CBA. DATsys caters
for the full lifecycle of automated assessment exercises through integration with the
CourseMarker CBA system. Diagrammatic domains can be defined by exercise
developers without programming, using the diagram editor component Daidalos.
Marking tools for diagrams are defined on a per-domain basis as CourseMarker
marking tools. Tsintsifas reports on the use of DATsys to assess student coursework
in logic design, flowchart and object-oriented design as part of a first-year
undergraduate course in Software Tools. Logic design exercises involved the student
drawing a logic circuit diagram, which was assessed by a marking tool which
simulated a circuit based upon the student diagram, provided test data and checked
output properties. Flowcharts were translated into programs and marked as such
using tools developed in CourseMarker for the assessment of programming
exercises. Object-oriented design diagrams were marked using a tool which tested
the features of the diagram, such as the presence of nodes and the connections between
them. DATsys was used as a platform for this work and its architecture and
infrastructure are detailed further in section 3.3.3.
Thomas [Tp04] reports on the use of a simple drawing tool to allow students to draw
diagrams as part of an online examination. The drawing tool used a diagram
representation consisting of simple nodes and links and allowed students to arrange
the diagram elements and to define the text labels which were associated with them.
Most students were able to use the drawing tool even though they were under exam
conditions and unfamiliar with the tool itself. Later research [TWS05] investigated
the assessment of student entity-relationship diagrams using a system which
compared the features of a student solution with those of a model in a similar way to
DATsys’ assessment of object-oriented design diagrams. This system, together with
the feedback provided to students, is further reviewed in section 3.2.4.
2.1.7 Assessing Higher Cognitive Levels using CBA
Automated assessment has traditionally been associated with testing only the lower
levels of Bloom’s cognitive taxonomy due to the most common, fixed-response
question types being dismissed as “mere” objective testing. It is still common for
automated assessment to be regarded as having an “inability” to test higher skills due
to its reliance on “simple techniques such as pattern-matching of input” [RJE02].
Research into assessing higher levels of Bloom’s cognitive taxonomy fits into two
broad categories. The first approach is to carefully design objective test questions
according to criteria designed to force students to demonstrate higher order
cognitive abilities such as analysis, synthesis and evaluation. The second approach is
to utilise more complex automated assessment mechanisms to enable CBA to assess
question types which would traditionally be used to assess higher order cognitive
abilities.
McKenna and Bull [MB99] present an overview of the techniques often used to
design effective objective test questions. The techniques focus primarily upon the
construction of multiple choice questions, and a series of weak questions with
improved counterparts is presented. Techniques discussed include: constructing
the question stem as a definite statement, avoiding irrelevant material, constructing
the stem to test a student’s understanding of the domain rather than reading
comprehension, concentrating material in the stem and avoiding option duplication.
Techniques for extending MCQs to the preferred variants, such as multiple true /
false questions, assertion-reason items, multiple response questions, matching test
items and text match response problems are then considered.
Duke-Williams and King [DK01] set out an explicit approach to question design
using a revised version of Bloom’s taxonomy. The authors note the limitations of
traditional approaches, such as constructing questions using verbs known to be
associated with higher-order learning outcomes, and demonstrate a system of
question design which makes use of both MCQs and graphical hotspot questions.
Stephens and Percik [SP03] document the procedure of creating questions based
upon Bloom’s taxonomy through a process of concept mapping.
This research forms part of the second strand: that of automating the assessment of
more complex question types through the use of more complex assessment
mechanisms. Essays [RL02], programming assignments and diagrammatic questions
[Ta02] are suitable for assessing the higher cognitive levels of Bloom’s taxonomy and
can therefore be used to reduce the pedagogical drawbacks associated with CBA
usage.
2.1.8 Summary
Section 2.1 defined Computer Based Assessment in relation to other areas of learning
technology in terms of the number and types of processes that are automated. The
motivations for the development of CBA technology were considered, and a brief
history of CBA development was provided. CBA has practical advantages, such as
the saving of resources (time), and pedagogical advantages in terms of reliability and
other assessment criteria. CBA’s practical limitations are primarily infrastructural,
since introducing CBA into an academic environment is a resource-intensive process.
CBA’s pedagogical limitations relate to its perceived inability to assess the higher
cognitive levels as defined in taxonomies such as Bloom’s. Attempts to minimise
these pedagogical limitations may include the careful construction of objective
questions or the automation of question types traditionally used to assess higher
cognitive levels. An overview of the fixed-response and free-response CBA question
types was provided. This research forms part of the free-response strand of CBA into
the automated assessment of diagrammatic exercises. Section 2.2 will introduce the
key concepts associated with formative assessment.
2.2 Formative Assessment
2.2.1 Definition
Formative assessment is typically defined thus: “Formative assessment involves methods
designed to establish what progress a student is making during learning and to enable giving
feedback on it” [Bj93].
Such a definition carries two implications: firstly, that formative assessment must
occur during the process of learning and, secondly, that the most important
deliverable associated with formative assessment is feedback. These two implications
are complementary; the aim of the feedback is to improve the learning of the student
whilst that learning is still ongoing. Thus, formative assessment stands opposed to
summative assessment, whose central function is to provide an indicator of
achievement (e.g. in the form of a grade) at the conclusion of a unit of learning, rather
than feedback. Knight [Kp01] goes further in arguing that only formative assessment
truly provides feedback and that the results of summative assessment merely
constitute “feedout” since they may have little impact on the subsequent learning
process.
Assessment is sometimes divided into the four categories of formative
assessment, summative assessment, diagnostic assessment and self-assessment. In
this model, summative assessment remains opposed to the other three types in both
form and function. Self and diagnostic assessments constitute further specialisations
of formative assessment with specific forms and purposes [Mm02].
The working definition of formative assessment adopted here is as follows:
“Formative assessment is assessment conducted throughout the learning process, as an
integral part of that process, where the central aim is to provide feedback to enable the
enhancement of learning.”
It is inferred, firstly, that feedback includes that given to both student and educator
to enhance the process of learning and, secondly, that the provision of feedback takes
priority over the generation of “feedout” such as summary grades or marks.
2.2.2 Benefits of Formative Assessment
There are many pedagogic advantages associated with formative assessment [Kp01].
Formative assessment:
• Encourages openness among students;
• Can be used to assess a great scope of learning outcomes;
• Can help in avoiding mark aggregation;
• Discourages plagiarism.
Rowntree [Rd87] considers the relationship between an education system’s learning
objectives and its assessment mechanisms thus: “If we wish to discover the truth about
an education system, we must look into its assessment procedures. What student qualities
and achievements are actively valued and rewarded by the system? How are its purposes and
intentions realised? To what extent are the ideals, aims and objectives professed by the system
ever truly perceived, valued and striven for by those who make their way within it? The
answers to such questions are to be found in what the system requires the students to do in
order to survive and prosper. The spirit and style of student assessment defines the de facto
curriculum.”
Assessment, therefore, defines what students learn. Few students will expend effort
in trying to acquire those skills and knowledge which are not rewarded by the
system, irrespective of the stated aims and objectives of the course on paper. Brown
[Bg01] notes that, “assessment shapes learning, so if you want to change learning change
the assessment method.” Students increasingly learn how “to play the exam game”,
engaging in surface learning strategies at the expense of genuinely broad and deep
learning.
By its nature, summative assessment invites deceit since the informed student is
aware that it is their work that is being assessed, not themselves. The student has an
interest in emphasising their knowledge and hiding their ignorance in any work
which is summatively assessed in the hope of attaining the greatest possible final
grade. Similarly, the student has an interest in focusing only on those sections of a
syllabus which are likely to be directly assessed and ignoring all others.
Knight [Kp01] notes that good formative assessment encourages disclosure rather
than deceit. The student is much more likely to admit an area of ignorance if the
consequence is further assistance, rather than a lower mark. Given Rowntree’s
assertion above, this implies that formative assessment encourages a more general
programme of learning, reduces the impetus for students to “play the exam game”
and allows insight by the teacher into the effectiveness of the syllabus and the
teaching methods employed.
Similarly, formative assessment discourages plagiarism. Stefani and Carroll [SC01]
argue that plagiarism is difficult to define precisely and that much plagiarism may be
unintentional and could be solved by better education of students on academic
standards. However, plagiarism is still an increasingly high-priority concern within
higher education. Since formative assessment encourages disclosure, and students
feel it is in their best interests to admit to weakness, then plagiarists harm only
themselves [Kp01]. In a formative assessment environment, therefore, plagiarism is
less likely.
Section 2.1.4 introduced the concept of reliability in relation to assessment. Modern
higher education learning outcomes are complex and demand assessment of what
are often referred to as “soft skills”. Knight presents extracts from a course handbook
in order to argue that many learning outcomes cannot be assessed reliably as part of
a summative assessment; Knight argues that formative assessment is the only
authentic method of providing feedback on these outcomes. The feedback is as fuzzy
as the learning outcomes, but the impact of this disadvantage can be minimised so
long as students are warned in advance and the assessment is formative. Thus, it is
often impossible to assess many fuzzy outcomes reliably and affordably other than
with formative assessment since it is less constrained by reliability concerns.
Furthermore, Knight argues that an increase in the proportion of formative
assessment can be utilised (as part of a strategy which also incorporates a
proportionally smaller summative assessment element) as a solution to the noted
problem of mark aggregation within increasingly modular higher education courses.
Black and Wiliam [BW98] conducted a survey of 681 research publications on
formative assessment. They concluded that formative assessment acts to improve the
student learning process to an extent that, “if best practices were achieved in mathematics
on a nationwide scale that would raise ‘average’ countries such as England and the USA into
the top five”.
Despite the pedagogical advantages associated with formative assessment, recent
times have seen a marked decline in its use on higher education courses. The next
section examines this phenomenon and explains why it has occurred.
2.2.3 Drawbacks associated with Formative Assessment
The drawbacks associated with formative assessment can be grouped into two broad
categories: the pedagogic and the practical. Many of the pedagogic drawbacks can be
ameliorated using known techniques. The practical drawbacks are traditionally seen
as more implacable.
Yorke [Ym01] identifies four main academic problems associated with high
formative assessment use:
• A student who wished to terminate a module which traversed multiple semesters before the end of the final semester would have difficulty obtaining credit if the assessment conducted up until that point was formative;
• Again, in a multi-semester module, it is difficult to ensure equity between semesters if formative and summative assessment is unevenly distributed across the semesters;
• Students may under-prioritise formative assessment when under pressure to complete summative assessments (and, possibly, to engage in paid employment);
• Problems occur in ensuring that formative assessment is maximally effective — the student must be provided with good feedback and make good use of the feedback to improve their future learning.
The first two issues cannot be decisively addressed within the scope of assessment
design. They are systemic and can only be dealt with at the programme or even
institutional level.
A common suggestion which attempts to address the third problem is the use of a
two-part assessment strategy [Kp01]. Here, formative assessment is used initially, with
a summative element introduced later as a motivator. So long as the summative
component is in some way based upon the formative — for example, an assignment
could be marked formatively, to be followed by a question based upon the
assignment which is assessed summatively — then the assessment as a whole is
viewed with higher priority by the student. Hence the pedagogic advantages of
formative assessment can still be retained. Depending upon the timing of the
summatively assessed components, this could, additionally, go some way towards
addressing the first two problems. Unfortunately, adopting a two-part assessment
strategy exacerbates the practical problems associated with conducting formative
assessment still further.
The problem of designing useful, effective formative feedback is examined in section
2.2.5.
The practical problems associated with formative assessment are simpler yet more
consequential than the pedagogic problems. Effectively, formative assessment is
viewed as being costly to undertake, especially in terms of time to mark assessments.
The creation of rich and meaningful feedback to the student is more involved than
simply assigning a grade or mark. In the past this was tolerated, but as Chapter 1
pointed out, recent years have seen a marked decline in staff-to-student ratios and
educators are expected to teach students with ever-decreasing unit-resource. Given
this deterioration, many staff simply believe that they do not have the time or other
resources which would be necessary to undertake formative assessment to the levels
used in the past.
Furthermore, the two-part assessment strategy proposed as a solution to some of the
pedagogic problems associated with formative assessment would seem to indicate
that one should implement formative assessment and then do the summative assessment
as well anyway. Comprehensive formative assessment thus becomes viewed as a pipe dream.
Several strategies have been proposed to overcome this practical difficulty. The next
section will briefly outline these strategies and place this work in context among
them.
2.2.4 Managing the Resource Intensiveness of Formative Assessment
Rust [Rc01] presents an overview of the assessment issues associated with teaching
large groups. Rust argues that there are six main methods which can be used to
maintain the quality of assessment within these difficult conditions:
• “Front-ending” the assessment;
• Conducting assessment in class;
• Conducting self and peer assessments;
• Conducting group assessments;
• “Mechanising” the assessment;
• Strategically reducing the amount of assessment conducted.
“Front-ending” the assessment refers to a strategy concentrating educator and
student effort at the beginning of the course. The purpose is to “set up” the students
for the work they are going to have to complete. An example is the creation and
dissemination of very detailed instructions or checklists, including examples, of what
is expected from the course’s assessment. Rust argues that this reduces the marking
time associated with misinterpretation of work and results in fewer student requests
for guidance.
Conducting assessment in class refers to a strategy of performing assessment
alongside the presentation of teaching materials, within allocated class time.
Examples include setting assignments which can be undertaken and / or marked in
class or alternatively giving general feedback in class rather than individual feedback
to students after marking is complete.
Self and peer-assessment can be used as a technique for generating feedback for
students which would have been too time-consuming for staff to write. Group
assessment can also be used as a useful device for solving common student
problems. Self, peer and group assessment constitute active areas of research in their
own right; a general introduction is provided by Race [Rp01].
In its most general form, the strategy of mechanising the assessment refers to
standardising the assessment process in order to save time. Rust, in common with
many outside the CBA field, restricts ambition in this area to speeding up feedback
using statement banks and feedback tick-sheets. Statement banks are pre-written
archives of feedback statements from which the teacher chooses the most
appropriate; this saves time because the teacher does not have to consider the
wording of the feedback provided to the student. Feedback tick-sheets contain grids
of tick-boxes aligned with both scores and feedback statements. Feedback is returned
to the student by ticking the appropriate boxes and handing back the sheet itself as a
statement of feedback. Rust also promotes the use of objective tests such as MCQs to
ease the marking workload. Section 2.1 outlined the approach taken by Computer
Based Assessment technology to further mechanise the assessment and feedback
process.
A strategic reduction strategy can be split into two distinct approaches: reducing the
amount of assessment conducted, or reducing the amount of time spent providing
feedback. This is the least preferred option available, although it is often viewed as
practical. Rust argues that if assessment reduction is carefully considered then the
effect on students need not be hugely detrimental in all cases.
This work falls within the boundaries of the fifth strategy for teaching large groups, that
of mechanising the assessment. The benefits and drawbacks of mechanising
assessment using CBA software have been examined in sections 2.1.4 and 2.1.5
respectively.
Front-ending has been criticised for resulting in student work of high conformity and
little originality. Performing the assessment in class involves allocating time
previously used for teaching to assessment, hence simply “moving” the problem and
restricting teaching opportunities. A strategic reduction strategy in assessment
would view formative assessment as more “expendable” than the summative
components; formative assessment would, therefore, be disproportionately reduced.
Self, peer and group assessment are assessment forms which offer considerable
opportunities for future research. However, the validity and reliability of these forms
is as yet unproven. Furthermore, the use of these assessment forms as part of a two-part assessment strategy would require considerable problems to be overcome; these
forms may, therefore, be susceptible to low student take-up or effort.
The next section focuses on criteria for designing effective feedback for formative
assessment.
2.2.5 Effective Feedback for Formative Assessment
The central aim of formative assessment has been defined previously in section 2.2.1
as “to provide feedback to enable the enhancement of learning”. In order to claim
that good formative assessment has occurred, therefore, it is necessary to show that
good formative feedback has been produced. Nicol and Macfarlane-Dick have
proposed seven principles of formative feedback practice as a result of their
conceptual model based upon student-centred learning methodologies [JMM+04].
These criteria will be used to judge the effectiveness of formative assessment
conducted using CBA techniques and are briefly explained here.
A good feedback framework for formative assessment should:
1. Facilitate the development of self-assessment (reflection) in learning
An assessment programme which is overly educator-led will produce students
dependent on others for instruction. A student should be expected to monitor the
divergence between their internal perception of the task and the outcomes being
produced. Critical self-assessment is a good technique for allowing students to
evaluate the weaknesses of their own work and thus to incrementally improve their
standards and develop their evaluative skills.
2. Encourage teacher and peer dialogue around learning
Educator feedback serves as a valuable objective yardstick against which students
can evaluate their work, but there is considerable evidence that students find this
difficult to internalise and respond to productively. Feedback from peers may be
useful because another novice experiencing the same learning curve may have
experienced similar problems and is likely to be able to communicate compatibly.
Furthermore, the educator should be willing to respond to queries relating to
feedback. In this way, feedback should be viewed as a continuous dialogue rather
than as simple information transmission.
3. Clarify what constitutes good performance
A student is attempting to close the gap between their own internal perception of a
task and their current outcomes. Thus, the degree of overlap between the student’s
internal perception of the task and the actual goals of the educator in setting the
exercise is important and should be maximised. Poor performance by students may
be related to a misinterpretation of what is required. Hence feedback should include
mechanisms to clarify task requirements if students are to improve their performance
in future.
4. Provide opportunities to improve performance
Feedback should change subsequent student behaviour. The impact of feedback is
reduced if students receive feedback which points out the errors in the specific
solution, only to move on to a different assignment. In such a situation, a student
may regard the feedback as irrelevant. Students should therefore make a response to
feedback soon after it is delivered; ideally an opportunity should be provided to
repeat the task-performance-feedback cycle. A good method for achieving this is by
allowing resubmission. Furthermore, feedback should support students in producing
a piece of work by offering constructive advice rather than mechanically listing
errors.
5. Deliver information focused on student learning
Feedback should focus on the objectives of the task being attempted rather than
providing a list of unrelated strengths and weaknesses on a per-solution basis.
Feedback should be delivered in good time and not be overwhelming in quantity,
utilising a manageable number of prioritised comments in order to maximise the
likelihood of corrective action. Feedback sheets with lengthy criteria lists and marks
discourage a view of the exercise as a coherent whole. Hence the number of criteria
about which feedback is given should be controlled. Feedback should be available for
the student to consult in the future.
6. Encourage positive motivational beliefs and self-esteem
Student motivation is related to the type of external feedback the student receives.
Frequent high stakes assessment involving marks or grades lowers motivation and
leads to students concentrating on passing the test rather than learning. A mixture of
grades and feedback comments leads to students concentrating on the former at the
expense of the latter. Therefore frequent assessment in which only feedback
comments are provided is recommended. Feedback should also concentrate on
achieving future learning goals rather than current failure in order to lead to a
motivational ‘incremental view’ of learning.
7. Provide information to educators that can be used to help shape the teaching
If feedback to the students is to be of a high standard then the assessment process
should include a mechanism for feeding back good information to teachers.
2.2.6 Summary
Section 2.2 introduced the concepts of formative assessment. Formative assessment is
assessment conducted throughout the learning process, as an integral part of that
process, where the central aim is to provide feedback to enable the enhancement of
learning. Formative assessment has concrete pedagogic benefits and has been shown
to improve student learning. The pedagogic drawbacks associated with formative
assessment are not the main reason for its decline in usage in higher education
courses and can be minimised through institutional planning and the adoption of a
two-part assessment strategy. Formative assessment has declined in use because it is
seen as a resource-intensive assessment mode. Strategies for reducing resource-intensive assessment include front ending, doing the assessment in class, use of self
and peer assessment, group assessment techniques, use of a mechanisation strategy,
and strategic reduction strategies; this work constitutes an example of the use of a
mechanisation strategy. Criteria were presented by which the effectiveness of
formative assessment feedback can be analysed. Section 2.3 introduces the concepts
associated with diagrams, examines their role in learning and assessment and
provides an overview of criteria used to assess good diagram layout.
2.3 Diagrams in Education
2.3.1 Definition
In 1911 James Clerk Maxwell [eb11] defined a diagram as “a figure drawn in such a manner
that the geometrical relations between the parts of the figure illustrate relations between other
objects”. This definition of diagrams as an abstract representation is general enough
to encompass the many forms that diagrams have assumed throughout history but it
is insufficiently concrete to be used as the basis for research.
This section will concentrate on the role of diagrams in education. Although
educational diagrams take many forms across multiple educational domains, it is still
possible to present a general definition which is specific enough to serve as the
starting point for research.
Dodson [Dd99] noted that a diagram typically comprises two types of
components: nodes and lines. Many common types of educational diagrams consist of
nodes linked by lines, but diagrams can alternatively consist of combinations of lines,
nodes overlapping, nodes ‘labelled’ by other nodes and nodes whose meaning is
determined by colour or other distinguishing feature. For a diagram to be
comprehensible, its notation must conform to a convention of meaning which describes
how the elements of the notation are combined to indicate a relationship or function.
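This node-and-line view can be captured in a minimal data structure. The following sketch is illustrative only (the class and method names are invented, not drawn from any system discussed in this chapter); it represents a diagram as a collection of typed nodes and the lines connecting them, with a trivial well-formedness check standing in for a convention of meaning:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    ident: str          # unique identifier within the diagram
    kind: str           # notation-specific type, e.g. "entity" or "state"
    label: str = ""     # optional textual label

@dataclass
class Line:
    source: str         # ident of the node the line leaves
    target: str         # ident of the node the line enters
    kind: str = "link"  # notation-specific meaning of the connection

@dataclass
class Diagram:
    nodes: dict = field(default_factory=dict)   # ident -> Node
    lines: list = field(default_factory=list)

    def add_node(self, node):
        self.nodes[node.ident] = node

    def connect(self, source, target, kind="link"):
        # A line is only meaningful if both endpoints exist: the
        # simplest possible check against the notation's convention.
        if source not in self.nodes or target not in self.nodes:
            raise ValueError("line endpoints must be existing nodes")
        self.lines.append(Line(source, target, kind))
```

A richer notation would extend the `kind` fields and the connectivity check with domain-specific constraints.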
2.3.2 History and Scope
Tsintsifas [Ta02] provides an overview of the historical role of diagrams. The word
“diagram” is Greek in origin and means literally to express using lines. The earliest
diagrams were land maps, which demonstrate a low level of abstraction because of
the direct relationship between the diagram and the terrain it represents. Later
diagrams demonstrate an increased level of abstraction: examples include ancient
Greek illustrations of geometric concepts, philosophical representations found in
European texts from the Middle Ages and family trees of genetic lineage. The level of
abstraction apparent in diagram representations began to increase markedly in the
17th century with the development of the Cartesian co-ordinate system. Later stages
of progress include the development of Calculus and set diagrams.
This work will concentrate on modern diagrams used in education. Diagrams are
used in a wide variety of disciplines spanning science, the humanities and art.
In computer science, diagrams are used as an aid to visualisation, for solving
computational problems and for design purposes. Von Neumann [GV47] created the
“flowchart” notation to aid in the visualisation of algorithms. Statecharts [Hd88],
Petri nets [Pc65] and state transition diagrams [BGK+96] assist in the solving of
computational problems. Diagrams to assist in the object-oriented design of software
projects include UML diagrams [JBR98] and the earlier competitors such as Booch
diagrams [Bg93]. Lohse et al [LBW+94] present illustrations of sixty graphical
representations, including frequently used notations such as entity-relationship
diagrams, data-flow diagrams, Nassi-Shneiderman diagrams and PERT charts.
Diagrams are used extensively across engineering disciplines. For example, circuit
diagrams for analogue and digital components are used in electrical engineering,
while manufacturing blueprints are used in mechanical engineering. Standards
databases exist to document the conventions of meaning of these notations [ISO05,
ANSI05].
In sports science, articulated body schematics are used to illustrate ideal athletic
body positions, examine body stresses, determine ranges of motion and calculate
acceleration [MLC03]. A wide variety of diagrams are used across biology to describe
biological processes and structures; famously, Watson and Crick [WC53]
accompanied their discovery of the DNA double-helix structure with a “purely
diagrammatic” representation. Interdisciplinary diagram notations with less formal
conventions include concept maps [GS95] and mindmaps [Tb93]. More generally,
Blackwell and Engelhardt [BE98] have proposed a “taxonomy of taxonomies” for
diagrams. Blackwell and Engelhardt present six taxonomic dimensions for diagrams,
each of which represents a category of interest in research: representation, message,
relation between representation and message, task and process, context and
convention and mental representation.
2.3.3 Diagrams in Automated Assessment
Tsintsifas [Ta02] conducted research into diagrams within the context of automated
assessment. He noted that approaches to developing diagram editors could be
grouped into three broad categories: multi-domain diagram editors, frameworks and
diagram editor generators. A multi-domain diagram editor aims to address the
editing of a group of related diagram domain notations and is usually specialised to
provide editing of a group of related notations in a subject area, for example software
engineering. A framework allows developers to create new editors by extending the
framework, taking advantage of existing design and implementation; this approach
allows great freedom in editor customisation but requires more effort from the
developer. A diagram editor generator requires the developer to provide a
specification for a diagram notation in a customised grammar. The program then
generates a software implementation of a new diagram editor from the specification
provided.
Multi-domain diagram editors include Thinglab [Ba79], which allows a description
to be followed by the runtime execution of constraints by using the interpreted
language Smalltalk. The visual and non-visual attributes of the diagram editors can
be related based upon formulas described in Smalltalk. The Templa/Graphica
system [Hs90] uses a “design template” to customise a graphical editor, while
MetaBuilder [FWW00] allows specifications for a new editor to be provided in the
form of a “meta-diagram” which describes the elements, relationships and
constraints of the diagram domain. The commercial software package Microsoft Visio
[En01] also conforms to the multi-domain diagram editor pattern.
Framework approaches include MacApp [App89], ET++ [GMW88], Unidraw [VL89],
HotDraw [Tek87] and JHotDraw [BG97]. In each case, to create an editor for a new
diagram domain the developer must create specialised classes for each abstraction,
building on top of the existing implemented architecture.
Diagram Editor Generators include Minas [Vg95], which relies on the construction of
a hypergraph grammar to define a new editor based upon an archive library of
graphical components. GenEd [HW96] allows editors for “visual languages” (of
which educational diagrams are a subset) to be defined using algebraic
specifications. The Penguins system [CM03] allows the realtime creation of editors
for a wide variety of diagram domains using a constraint multiset grammar as the
specification for the language. Penguins also allows editors to be created from
malformed or incomplete grammars through a system of incremental parsing.
Tsintsifas concluded that existing multi-domain diagram editors, frameworks and
diagram editor generators were generally unsuited for use in an assessment context.
Multi-domain graphical editors are constrained in scope and lack the features
required for automated assessment. Existing frameworks were aimed at
programmers and developers, had many extraneous features and were considered
overwhelming in an assessment context due to their complex architectures. Similarly,
diagram generators were unsuitable because a deep understanding of the mechanics
of the generator was required to specify a new domain; non-programming users such
as assessment developers could not be expected to attain such specialist domain
knowledge simply to set a new exercise domain.
Tsintsifas developed DATsys, an object-oriented framework whose classes make up
a reusable design for CBA-oriented diagram editors. Daidalos, Ariadne and Theseus
are presented as concrete subclasses for such diagram editors, intended for use by
domain developers, exercise developers and students respectively. Representations
for diagram domains are specified using Daidalos, which provides facilities for the
creation of figures, diagram elements, tools and commands, as well as a selection
editor which allows domain libraries of diagram notations to be managed.
Developers using Daidalos to author diagram domain notations can define diagram
elements in terms of their graphical view, underlying data model and connectivity
constraints.
DATsys is fully integrated into the CourseMarker CBA system and makes use of
CourseMarker’s system of marking tools to provide a generic marking mechanism
which will allow any diagram notation to be marked. A drawback of this generality
is that the development of marking tools, which are necessary to assess diagram
domains, is left to the developer, who must have knowledge of both the domain to be
marked and the system of marking tools. DATsys was used as a platform for this
research, and a more detailed overview is provided in section 3.3.3.
Thomas et al [TWS05, STW04] concentrated their attention on the “network-like
domains” which are common in computer-science education, such as entity-
relationship diagrams and pipelines. The smallest meaningful unit, an “association”,
is defined as two nodes connected by a line. Student diagrams are assumed to be
“imprecise” compared with a model solution, since required features may be missing
or incorrectly presented and extraneous features may also be included. Thomas et al
concentrate on a tool which conducts a comparison of the associations found in both
the model solution and the student solution. The research of Thomas et al is
reviewed in more detail in section 3.2.4.
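The comparison performed by such a tool can be sketched as follows. This is a deliberate simplification, not the actual algorithm of Thomas et al: an association is reduced to an unordered pair of node labels, and the marker simply partitions associations into matched, missing and extraneous sets:

```python
def associations(edges):
    """Reduce a diagram to its set of associations: unordered pairs
    of node labels connected by a line."""
    return {frozenset(pair) for pair in edges}

def compare(model_edges, student_edges):
    """Count matched, missing and extraneous associations in the
    student solution relative to the model solution."""
    model = associations(model_edges)
    student = associations(student_edges)
    return {
        "matched": len(model & student),
        "missing": len(model - student),     # required but absent
        "extraneous": len(student - model),  # present but not required
    }
```

A real marker must also cope with imprecision in node labels themselves (misspellings, synonyms), which this sketch ignores.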
2.3.4 Aesthetics of Educational Diagrams
2.3.4.1 Aesthetic Criteria
Section 2.3.1 noted that most educational diagrams consist of a collection of nodes
with lines linking the nodes. A convention of meaning applies to define how the
nodes and lines logically interact to relate meaning to the reader. The reader’s
interpretation of the diagram is also influenced by the aesthetic layout of the diagram,
such as the physical relationships between the diagram elements. By implementing
consistent diagramming practices including clear layouts, reader confusion can be
minimised [Dfa04].
Aesthetic principles have been proposed in fields as disparate as fine art and
architecture. For the purposes of considering the physical relationships between
nodes and lines in educational diagrams, generalised approaches can be found in the
fields of graph layout and user interface design. The field of graph layout suggests
precise attributes for graphs which can be assessed mathematically, but doubts
remain over the merits of the resulting automatic layout algorithms [EG03]. User
interface design principles consider the layout of user interface primitives, which can
be considered as nodes, on a computer display, but in this context the nodes are not
connected by lines. Both of these approaches are examined in more detail below. It is
also necessary to consider the domain-dependent layout rules which exist for many
common educational diagram types.
When considering aesthetic criteria it is important to bear in mind the point made by
Purchase et al [PAC02] that not all criteria are of equal importance. Purchase et al
conducted a study into how the aesthetics of UML class and collaboration diagrams,
common educational domains, are perceived by readers. “Preference tests”, based
upon a user’s instinctive preferences between a set of diagrams placed before them,
were conducted using sets of technically literate (though not necessarily UML-conversant) volunteers. Volunteers were provided with pairs of diagrams in which
one of the pair emphasized a given feature while another did not and were asked to
indicate their preferred diagram with reasons. Quantitative data were collated using
a points system for volunteer responses while qualitative data, in the form of the
stated reasons, were used to search for confounding factors. In this way, the relative
importance of six aesthetic measures commonly suggested in the literature, as well as
two additional domain-specific layout rules for each of the two types of diagram,
were ascertained.
For UML class diagrams the most popular measure, concerned with the
minimisation of edge crossings, had a 93% preference level while the least popular
measure, concerned with directional indicators on diagram arcs, rated 60%. The most
and least popular measures for UML collaboration diagrams were spaced even
further apart. Although all measures scored more than 50% preference and hence
would appear to be valid, not all the measures are of equal value. Therefore, any
educational system which aims to take into account the aesthetic merit of a diagram
layout must not only determine the specific criteria to be measured but also the
relative importance, or weighting, of the criteria.
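The role of weighting can be made concrete with a small sketch. Given per-criterion scores in [0, 1] and weights reflecting relative importance, an overall layout mark is a weighted mean; the criterion names below are invented, and the weights in the test merely echo the preference levels reported above for illustration:

```python
def weighted_layout_score(scores, weights):
    """Combine per-criterion aesthetic scores (each in [0, 1]) into a
    single mark using importance weights. Criteria without a weight
    default to weight 0, i.e. they are ignored."""
    total_weight = sum(weights.get(name, 0) for name in scores)
    if total_weight == 0:
        raise ValueError("no weighted criteria supplied")
    return sum(score * weights.get(name, 0)
               for name, score in scores.items()) / total_weight
```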
2.3.4.2 Criteria from Graph Layout
The field of graph layout classifies graphs into two broad groups: syntactic graphs,
which are abstract and have no real-world meaning, and semantic graphs, which have
a real-world meaning and are usually used to convey information within a domain.
Petre [Pm95] notes that semantic graphs are subject to “additional secondary notations”
which tend not to be defined within the formal syntax. Layout features associated
with syntactic graphs cannot be transferred ad hoc to semantic graphs, including most
educational diagrams; to be relevant, a syntactic layout measure must have a real-world application.
The two most commonly cited criteria in graph layout, and two of the easiest to
calculate, are the number of “bends” in connection lines and the number of
connection lines that cross or overlap other connection lines. Tamassia [Tr87]
proposes that an optimal graph has nodes exclusively connected by straight
connection lines and states that curved and segmented connection lines should be
minimised. Tamassia et al [TTV00] later presented an algorithm to this end within an
automated layout context. Diagramming guidelines support the minimisation of
bends in such real-world domains as entity-relationship diagrams [Dfa04].
Reingold and Tilford [RT81] state that trees should avoid the “overlapping” of
connection lines, while Stedile [Sa01] acknowledges this as a principle for drawing all
graphs. Sugiyama [Sk02] concentrates on minimising the overlapping of both
connection lines and nodes. The study by Purchase et al [PAC02] concluded that this
was the single most important aesthetic consideration in the presentation of UML
diagrams.
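For straight connection lines, counting crossings reduces to pairwise segment-intersection tests. The sketch below uses the standard orientation (cross-product sign) test and deliberately ignores degenerate collinear cases and shared endpoints:

```python
def orientation(p, q, r):
    """Sign of the cross product (q - p) x (r - p): positive for an
    anticlockwise turn, negative for clockwise, zero for collinear."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])

def segments_cross(a, b, c, d):
    """True if segments ab and cd properly intersect (touching
    endpoints and collinear overlaps are not counted)."""
    o1, o2 = orientation(a, b, c), orientation(a, b, d)
    o3, o4 = orientation(c, d, a), orientation(c, d, b)
    return o1 * o2 < 0 and o3 * o4 < 0

def crossing_count(edges):
    """Number of crossing pairs among straight-line edges, where each
    edge is a pair of (x, y) endpoints."""
    count = 0
    for i in range(len(edges)):
        for j in range(i + 1, len(edges)):
            (a, b), (c, d) = edges[i], edges[j]
            if segments_cross(a, b, c, d):
                count += 1
    return count
```

Curved or segmented connection lines would first need to be decomposed into straight segments before applying the same test.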
Papakostas and Tollis [PT00] outline the concept of graph orthogonality. An optimal
orthogonal graph has the nodes and connection lines aligned to a regular grid
pattern. Nodes should be aligned with grid intersections, while connection lines
should lie along the gridlines. Tamassia [Tr87] proposes a similar aesthetic measure,
while studies in the real-world domain of user interface design have produced
similar measures [NTB00].
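An orthogonality check of this kind can be computed directly: nodes score by whether they lie on grid intersections, and connection lines by whether they run along gridlines (i.e. are axis-aligned). The averaging used here is an illustrative simplification rather than any published measure:

```python
def on_grid(point, spacing):
    """True if the point sits on a grid intersection."""
    x, y = point
    return x % spacing == 0 and y % spacing == 0

def orthogonality_score(nodes, edges, spacing=10):
    """Average of the fraction of nodes on grid intersections and the
    fraction of edges that are axis-aligned. Returns a value in [0, 1]."""
    node_score = sum(on_grid(n, spacing) for n in nodes) / len(nodes)
    axis_aligned = sum(a[0] == b[0] or a[1] == b[1] for a, b in edges)
    edge_score = axis_aligned / len(edges) if edges else 1.0
    return (node_score + edge_score) / 2
```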
Coleman and Stott Parker [CS96] propose a measure which seeks to minimise the
physical width of a drawing. Other measures concentrate on the text labels which
accompany a graph: text direction should be uniform and the font typeface should be
consistent throughout [Pm95, Dfa04].
2.3.4.3 Criteria from User Interface Design
Ngo et al [NTB00, NB01] provide an overview of aesthetic measures from the field of
user interface design, presenting fourteen measures which constitute a “theoretical
approach to capture the essence of artists’ insights”. The authors acknowledge the criteria
as being applicable outside the field of user interface design; their aim is to provide a
mathematical model in which the insight of artists is represented by a series of
mathematical formulae which assume values between 0 and 1.
The fourteen aesthetic measures proposed by Ngo et al are briefly summarised in
Table 2.3. It can be seen that a subset of the aesthetic criteria overlap with the criteria
from the graph layout literature outlined in section 2.3.4.2.
Balance: The distribution of “optical weight”, which is calculated from an element’s area, colour and shape. Balance is considered both vertically and horizontally.
Equilibrium: The difference between the centre of mass of the elements and the physical centre of the screen or canvas.
Symmetry: The level of axial duplication. Symmetry is measured horizontally (about the horizontal axis), vertically (about the vertical axis) and radially (about two or more axes which intersect at a central point).
Sequence: The arrangement of objects in a way that facilitates the movement of the eye through the information displayed. In Western culture the eye is trained to move in horizontal lines from top-left to bottom-right. The eye moves most easily from big to small, bright to subdued, colour to black-and-white and irregular to regular objects.
Cohesion: The degree of use of similar aspect ratios (the ratio of width to height) in multiple-window systems.
Unity: The appearance of the elements as a visual totality. Elements should be similar in terms of size, shape and colour. The distance between elements and the distance at the margins of the figure should be similar.
Proportion: The comparative relationship between the dimensions of elements and those of five aesthetically pleasing proportional shapes: the square, square root of two, golden rectangle, square root of three and double square.
Simplicity: Directness and singleness of form, achieved by optimising the number of elements and minimising visual alignment points.
Density: The proportion of the screen or canvas covered by objects. Screen density levels should be reasonably minimised.
Regularity: The uniformity of elements. Horizontal and vertical alignment points should be standard and consistently spaced. The number of alignment points should be minimised.
Economy: The careful use of elements to get the message across as simply as possible. As few styles, displays, techniques and colours as possible should be used.
Homogeneity: The evenness of distribution of elements across the four quadrants of the screen or canvas. Evenness means that each quadrant should contain nearly equal numbers of elements.
Rhythm: Regular variation; the extent to which elements are systematically ordered. Rhythm is determined by variations in the arrangement, dimension, number and form of the elements.
Order / Complexity: The sum of the previous thirteen measures for layout. Complexity refers to a lack of order; extreme complexity and total order may thus be considered opposite ends of the same scale.
Table 2.3: Fourteen aesthetic measures from Ngo et al [NTB00]
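Each measure in Table 2.3 is defined by Ngo et al as a formula yielding a value between 0 and 1. As an indicative sketch of an equilibrium-style measure (a simplification, not Ngo et al’s exact formula), the area-weighted centre of the elements can be compared with the centre of the canvas:

```python
def equilibrium(elements, canvas_w, canvas_h):
    """Elements are (x, y, w, h) rectangles. Returns 1.0 when the
    area-weighted centre of the elements coincides with the canvas
    centre, falling towards 0.0 as the offset grows."""
    total_area = sum(w * h for _, _, w, h in elements)
    cx = sum((x + w / 2) * w * h for x, y, w, h in elements) / total_area
    cy = sum((y + h / 2) * w * h for x, y, w, h in elements) / total_area
    # Normalise the offset by the half-extent of the canvas in each axis.
    dx = abs(cx - canvas_w / 2) / (canvas_w / 2)
    dy = abs(cy - canvas_h / 2) / (canvas_h / 2)
    return 1.0 - (dx + dy) / 2
```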
2.3.4.4 Domain-specific Layout Criteria
Section 2.3.2 noted the high number of diagram types across a variety of domains.
Furthermore, it is a well-known problem that multiple, competing notations may
exist for the same purpose, within the same domain. Ambler [As04] argues that well-known notation should always be preferred over esoteric notation. Ambler notes that
even within a specific diagramming standard, a “kernel notation” of the most well-known features, often consisting of no more than 20% of the available specification,
can be used to accomplish a majority of communication. Ambler argues that this
kernel notation should be used whenever possible and less-known features avoided.
It is not feasible to provide an overview of all diagrammatic guidelines. Overviews of
databases of diagram standards are provided in [ANSI05] and [ISO05]. The research
of Purchase et al [PAC02] into the layout of UML diagrams was summarised in
section 2.3.4.1; Eichelberger and von Gudenberg [EG03] provide further insight into
this domain.
The U.S. government Defense Finance and Accounting Service document provides a
set of typical Diagramming Guidelines [Dfa04] which specify standard practices and
aim to accomplish increased consistency, improved readability and improved
pattern recognition through consistency. Introductory comments concentrate on
deletion and consolidation of diagrams and standard formats for diagram legends.
Subsequent sections provide domain-specific guidelines for Business Process Model
(BPM) diagrams, Function Hierarchy Diagrams (FHD), Entity Relationship Diagrams
(ERD) and Server Model Diagrams (SMD). Entity Relationship Diagrams are the
most frequently encountered of these diagram types within an academic context.
2.3.6 Summary
Section 2.3 introduced the concepts associated with educational diagrams. A diagram
can be viewed as a collection of nodes connected by lines. Diagrams have been used
as abstractions to represent information for several thousand years and are currently
used to illustrate concepts and assist design processes in a large number of academic
disciplines. Categories of interest in diagramming research are representation,
message, relation between representation and message, task and process, context and
convention and mental representation. Systems for creating diagramming editors can
be categorised into the three approaches of multi-domain diagram editors,
frameworks and diagram editor generators. In assessment, approaches to
diagramming have included a CBA-integrated framework for diagram-based CBA
and research into the assessment of “imprecise” student diagrams. Diagrams with an
aesthetically good physical layout can help to reduce reader confusion. Criteria
approaches applicable to diagramming layouts can be drawn from the fields of graph
layout and user interface design aesthetics.
2.4 Chapter Summary
This chapter introduced the areas of CBA, formative assessment and diagramming.
Section 2.1 defined CBA in relation to other areas of learning technology in terms of
the number and types of processes that are automated. CBA was defined, the
motivations for the development of CBA technology were considered, and a brief
history of CBA development was provided. CBA has both practical and pedagogical
advantages, while its practical limitations are primarily infrastructural. CBA’s
pedagogical limitations relate to its perceived inability to assess the higher cognitive
levels as defined in taxonomies such as Bloom’s. Attempts to minimise these
pedagogical limitations may include the careful construction of objective questions or
the automation of question types traditionally used to assess higher cognitive levels.
An overview of the fixed-response and free-response CBA question types was
provided.
Section 2.2 introduced formative assessment. Formative assessment has concrete
pedagogic benefits and has been shown to improve student learning but it is seen as
a resource-intensive assessment mode. Strategies for reducing resource-intensive
assessment, including mechanisation strategies, were considered. Criteria were
presented by which the effectiveness of formative assessment feedback can be
analysed.
Section 2.3 introduced diagrams in education. Diagrams are currently used to
illustrate concepts and assist design processes in a large number of academic
disciplines. Systems for creating diagramming editors can be categorised into the
three approaches of multi-domain diagram editors, frameworks and diagram editor
generators. In assessment, approaches to diagramming have included a CBA-integrated framework for diagram-based CBA and research into the assessment of
“imprecise” student diagrams. Diagrams with an aesthetically good physical layout
can help to reduce reader confusion. Criteria approaches applicable to diagramming
layouts were drawn from the fields of graph layout and user interface design
aesthetics.
Chapter 3
CBA approaches for formative assessment and diagrams
Introduction
Previous work on formative assessment using free-response CBA across multiple
diagrammatic domains in a generic, extendable way is undocumented in the
literature. Formative assessment using CBA techniques has hitherto been largely
conducted within fixed-response domains such as multiple-choice questions.
Formative assessment in free-response domains is less common in the literature but
some work has been documented in domains such as technical essays. Diagrammatic
CBA is relatively uncommon in the literature but several systems have been
documented. A commonly cited CBA system is CourseMarker, which incorporates
the DATsys framework for diagrammatic exercises. CourseMarker is the successor to
the successful system Ceilidh and is used as a platform for this work.
Section 3.1 outlines the approaches used in the literature to provide formative
assessment using CBA. Most examples in the literature use multiple-choice questions
as the domain and can be categorised into those systems which are based around
pre-existing software, usually commercial systems, and those systems which were
developed entirely by the educators themselves. The approaches are reviewed and
the feedback mechanisms are examined in light of the formative feedback guidelines
provided in section 2.2.5.
Section 3.2 outlines documented approaches to conducting CBA in diagrammatic
domains. Approaches are compared in terms of their flexibility (which is sometimes
carefully restricted in terms of student interaction) and their marking mechanisms.
Section 3.3 provides an in-depth examination of the CourseMarker and DATsys
systems which were used as a platform for this work. The Ceilidh system is
described since it provides important historical and theoretical background.
CourseMarker and DATsys are reviewed within the context of providing formative
CBA in diagrammatic domains and their current advantages and limitations are
discussed.
3.1 Using CBA technology to provide formative assessment
It is neither possible nor necessary to provide an exhaustive catalogue of all examples
in which automated assessment has been used for formative purposes. Stephens and
Mascia [SM97] noted the high usage of automated assessment technologies in 1997
and the trend has been for increased use of such technologies in a process which has
been described as “inexorable” [Br02]. Instead it is useful to compare examples of
various approaches taken to try to automate the process of formative assessment and
to contrast their relative merits. Some advantages, such as time-saving, are common
to nearly all automated assessment systems and so it is not useful to concentrate on
them here. Denton [Dp03], for example, reports considerable time-savings simply by
automating the process of returning feedback to students using an email system
based upon Microsoft Word.
A naïve comparison of CBA assessments might categorise examples based merely
upon whether the assessments used a commercially available package or had been
developed ‘from scratch’ by the academic staff. Considerations such as affordability
are important in the implementation of a CBA system and authors of in-house
systems routinely bemoan the level of resources required in the development of their
systems. However, as section 2.2.1 has previously argued, the primary deliverable of
formative assessment is feedback, and the examples considered here demonstrate
widely varying levels of feedback using the same system as a base for different
assessments. Therefore it is clear that any useful review of a formative CBA example
must consider both the practical system and the pedagogical approach: that is to say,
the technical abilities of the system and the level of feedback which is actually
provided to students to assist learning. In order to accomplish this, the framework
for effective formative feedback outlined in section 2.2.5 will be used as a benchmark
when assessing the CBA examples.
Overviews of prominent automated assessment systems already exist. Charman and
Elmes [CE98b] provide examples intended to be used by educators wishing to
develop their own systems. Rawles et al [RJE02] provide a review of systems in terms
of technical capability and ease of incorporation into teaching structures. Symeonidis
[Sp06] provides an overview of prominent CBA systems in terms of developmental
history, system requirements and automatic marking capability. The aim is not to
repeat this material here, but to review the systems in terms of their formative
assessment potential.
Section 2.1.6 defined and contrasted fixed-response and free-response CBA.
Fixed-response CBA is the easier to develop and therefore examples are more common.
Sections 3.1.1 and 3.1.2 consider, respectively, fixed-response and free-response CBA
used for formative assessment purposes.
3.1.1 Fixed-response formative CBA: a review
Due to the relative ease of implementation associated with fixed-response CBA,
together with the potential for standardised requirements across multiple
assessments, fixed-response formative CBA is often accomplished using
commercially available software. A brief overview of several such systems was
provided in section 2.1; the most commonly documented platform for CBAs is
QuestionMark [BSP+03], which has been referred to as “something of an industry
standard” [RJE02] in overviews of the field. Several CBAs based upon commercial
platforms such as QuestionMark are considered, with emphasis upon the difference in
formative feedback delivery. The section then concludes with a review of several
systems developed by educational institutions themselves; the advantages and
drawbacks of this approach are also briefly considered.
3.1.1.1 Using existing platforms
Charman and Elmes [CE98c] describe the use of QuestionMark to conduct formative
assessment on a first-year undergraduate module on data analysis in a Geography
degree. Short tests are conducted using questions selected randomly from a stratified
data bank. The tests typically take 10-15 minutes to complete, but no strict time limit
needs to be imposed since the assessment is formative. To motivate students the tests
are integrated into the teaching of the course and a two-part assessment strategy is
used: several questions typically relate to the results of earlier practical experiments
and an end-of-module summative examination is conducted which is similar to the
earlier formative tests.
Charman and Elmes note that development costs included employing a research
assistant for 2 months to write the questions and feedback for the data bank. Since a
commercial package was used there was also an initial outlay to purchase the
software. Feedback from the system has allowed teachers to monitor student
progress. Charman and Elmes note that there has been little improvement in the very
poorest and the most able students, but that a central group of students labelled
“struggling” could be seen to be helped by the system considerably. Student
response to the CBA was generally positive: 64% agreed that the system constituted a
good way of learning while 56% agreed that the CBA was an improvement over the
previous assessment forms.
Opportunities to improve performance are constrained by this system since the tests
cannot be retaken, although Charman and Elmes do address this problem partially
by allowing students to review their results as a revision resource. Student feedback
is returned on a per-question basis and is specific to the student response; this system
is not optimal in focusing on student learning and encouraging motivational beliefs;
however, it provides a concise method for clarifying good performance within the
context of the MCQ test. Reflection in learning would seem to benefit strongly from
this approach, however, since Charman and Elmes report the “unexpected” benefit
that students now found “stimulating and interesting” a module centred around data
analysis material which was traditionally regarded as dry.
Greenhow [Gm00] provides an overview of the Mathletics system, again built using
QuestionMark as a platform. Mathletics is used for both formative and summative
assessment. Mathletics employs both MCQ and hotspot graphical questions. The
approach outlined by Greenhow does constitute a two-part assessment strategy since
a summative assessment is conducted using the software after the formative
assessments have taken place. Furthermore, Greenhow emphasises that student
problems are a focal point for discussion in subsequent student tutorials, facilitating
teacher and peer dialogue around learning. However, the key difference between the
“formative” assessments conducted using Mathletics and their summative
counterpart seems to be the suppression of feedback in the summative tests. The
formative assessments are still conducted in formal examination sessions to avoid
“cheating” for example, and Greenhow admits that extensive use of Mathletics
within a module may result in students having little experience in problem solving;
paper-based tests and worksheets are used in conjunction with Mathletics to provide
this experience.
Wybrew [Wl98] describes the use of Question Mark to conduct CBA in Health Science
modules. The CBA replaces existing MCQ tests which were previously marked using
OMR technology and conducted throughout the module. A two-part assessment
strategy is used in which strictly voluntary formative assessments precede a
compulsory summative assessment at the end of the module. A comparison of marks
between the CBA assessments and the previous OMR assessments shows no
difference. Take-up of the formative assessments is low, but this is likely to be as a
result of limited access to the Faculty computing facilities on which the courseware is
available. A further drawback of the current approach is the fact that the feedback
mainly tells the student which answers were right and which were wrong. A major positive
feature of this example of CBA is that academic staff are simply required to write the
assessment itself; technical issues such as converting the questions into the
proprietary Question Mark format are the responsibility of a separate Unit for
Learning, Technology, Research and Assessment within the university.
Other examples of formative Computer Based Assessments based around
commercial systems such as QuestionMark occur frequently in the literature. Hawkes
[Ht98] describes a course of automated assessment in an undergraduate Number
Theory course in which the assessment is linked with a sequence of workbooks
written specifically for the course; questions are variations of exemplars in the
workbooks and only feedback on the exemplar questions is provided.
3.1.1.2 In-house fixed-response CBA systems
Buchanan [Bt00] reports on the use of the web-based formative assessment package
PsyCAL to assess undergraduates in the first year of a degree in psychology.
PsyCAL’s infrastructure is composed of CGI scripts written in Perl; students access
the system through a web browser. Students are allocated three specific weeks
within a 15-week course to assess themselves using the system, although students are
free to use the system outside these weeks and many take advantage of this. PsyCAL
assesses MCQs exclusively, which are presented to students in short, informal tests
typically numbering 20 questions. At the end of the test the student is presented with
a list of those questions they answered incorrectly, together with formal references to
documents which would help the student to answer those questions. The student is
not presented with the correct answers to the questions since this may act as a
disincentive to further student research.
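As a sketch, the PsyCAL style of feedback might be expressed as follows; the question identifiers, answer key and document references are invented for illustration and are not taken from Buchanan's system.

```python
# Illustrative sketch of the PsyCAL feedback style described by Buchanan: for
# each incorrectly answered question, return a reading reference rather than
# the correct answer. Question ids, answers and references are invented.

REFERENCES = {
    "q1": "Gross (1996), ch. 4, 'Memory'",
    "q2": "Eysenck & Keane (1995), pp. 210-215",
    "q3": "Gross (1996), ch. 7, 'Perception'",
}

ANSWER_KEY = {"q1": "b", "q2": "d", "q3": "a"}

def feedback(responses):
    """Map each wrongly answered question to a document reference,
    without revealing the correct answer."""
    return {q: REFERENCES[q]
            for q, answer in responses.items()
            if answer != ANSWER_KEY[q]}

result = feedback({"q1": "b", "q2": "a", "q3": "a"})   # only q2 is wrong
```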
Buchanan conducted two studies using PsyCAL in which the level of integration
with the module was varied. Both studies demonstrated that the formative
assessment tool was useful to student learning. Buchanan notes that the package
operates with a test-study-retest cycle and emphasises the importance of “repeated”
automated assessment for formative purposes. Buchanan also notes that the system
was difficult to develop and cautions against developing a system from scratch if an
existing system can be used.
The feedback mechanism described by Buchanan is effective. Feedback is short, and
student independence is promoted by the encouragement of further research.
Opportunities to improve performance are provided since the student can repeat the
test any number of times. It is unclear if the system is inherently motivational but
Buchanan reports that 97% of respondents to a questionnaire would be willing to use
the package in other modules.
Amelung et al [APR06] describe the LlsMultipleChoice extension module for the
open-source Plone content management system used to assess MCQs in Computer Science
modules. LlsMultipleChoice allows the grouping of questions into ‘units’ and
provides for the provision of instant feedback while allowing multiple submissions
by students; perhaps the greatest novelty of the system lies in its “localization
facility” which allows feedback to be provided in multiple languages (English,
German and French).
Hall et al [HRT+98] describe The Human Brain CD-ROM, a multimedia tutorial on the
human nervous system for use by undergraduates. Much of the project focuses on
the delivery of learning materials to students, which can be navigated in non-linear
pathways of the student’s choosing. Two assessment components are provided: a
“quick-test” option, which provides MCQs with typical feedback for incorrect
responses and an accumulated score, and the “concept test” component. Learning
objectives for each section of the teaching material are defined in terms of “concepts”,
and each test in the concept test is designed to assess understanding of a particular
concept. At the end of a concept test, students are advised of the concepts of which
they are perceived to have a poor understanding and are referred to the appropriate teaching
materials. The Human Brain CD-ROM links assessment and learning in a similar
way to PsyCAL, with similar advantages: feedback is short and student
independence is promoted by the encouragement of further research. Unlike
PsyCAL, The Human Brain CD-ROM takes on the responsibilities of providing all
teaching materials within the system with a consequent increase in development
resources. The Human Brain CD-ROM is based upon the Scholar’s Desktop CBL
platform [BS95], but due to the modified assessment component and the need to
generate the teaching materials the authors note that constructing the CD-ROM
involved “a huge amount of resources (both money and academic time)”.
Culverhouse and Burton [CB98] report on the use of Mastertutor to assess
undergraduates studying Electronics and Electrical Engineering. Mastertutor is based
upon the format of a “music master-class”: the system sets a problem, provides
information resources and accepts the student’s solution in the form of a
questionnaire. The student is then presented with a mark and shown a valid solution
in order to define what constituted good performance for the assessment.
The system is used as part of a “feedback cycle” where the solutions are discussed
while the problem is still fresh in the student’s mind. However, the student is not
typically allowed the opportunity to improve their performance through
resubmission; the impact of this would be limited in any case since the student has
already seen the optimal solution. It is doubtful how much reflection in learning
occurs, although dialogue around the assessment is clearly prioritised. The provision
of marks to the students raises questions about the extent to which the system is
motivational, but it is clear that feedback is delivered in good time and the objectives
of the task are exposed.
Paul and Boyle [PB98] describe a CBA system used to assess second year
palaeontology undergraduates. The assessment is designed to be simultaneously
formative and summative. Frequent assessments are conducted throughout the
module, feedback is given and students can be assessed by the system several times.
However, marks awarded by the system count summatively within the module; this
means that care must be taken to generate different tests each time the student
repeats the assessment to avoid students simply repeating the assessment to gain
higher marks. This is achieved by selecting questions from a large question bank. To
prevent students repeating assessments to re-attain a previous (high) mark, the
highest mark across all submissions for the assessment is counted for summative
purposes.
Paul and Boyle note that time is saved by both teaching staff and students and
acknowledge the importance of providing feedback in CBAs so that “students can
learn from them”. However, the mix of summative elements with the formative
assessment results in some rather awkward compromises. Feedback cannot be acted
upon because the next submission involves a different test; furthermore, students
know this in advance, so it is unclear how much attention they are likely to pay to
the feedback provided to them. Experience with conducting frequent assessments in
other domains with a dual formative-summative purpose has shown that students
are likely to sit repeated assessments merely as part of a “gambling” strategy to
chance on higher marks [BBF+93, Or98]. Paul and Boyle note that marks are not
consistently better than before CBA was introduced.
Vendlinski and Stevens [VS02] outline an approach which uses CBA technology to
provide information about student learning to educators, relating to the seventh
criterion for good formative assessment feedback outlined in section 2.2.5. The Hazmat
system is based upon courseware called IMMEX, a web-based CBA tool which
allows teachers to present domain-specific “simulations” to students. Hazmat is used
to assess high-school chemistry students, using a simulation in which students are
expected to guess the identity of a succession of chemicals by accessing information
presented by the system. Feedback to the student informs them of the correctness of
their choices. Feedback to educators is more complex. The system keeps a record of
the information accessed by each student before they attempted to guess a chemical
identity. Vendlinski and Stevens use an artificial neural network to identify groups of
similar performances from the data and then further analyse the features of the
performances in each group to identify the strategy represented by each cluster. The
distribution of students across clusters can be calculated and mathematical Markov
models used to determine the distribution of students after each student attempts to
solve a given number of successive cases within the problem set. The effectiveness of
student strategies was determined by the probability of producing a correct answer, and
Vendlinski and Stevens developed a model to allow the probability of a student
changing strategy to be determined based upon the information accessed by the
student.
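The Markov projection step described above can be sketched as follows; the strategy clusters, transition probabilities and initial distribution are illustrative assumptions rather than data from the Hazmat study.

```python
# Hypothetical sketch of the Markov-model step described by Vendlinski and
# Stevens: given a distribution of students across strategy clusters and a
# matrix of strategy-change probabilities, project the distribution forward
# over several successive cases. All numbers here are illustrative.

def step(distribution, transition):
    """Apply one Markov step: new[j] = sum over i of old[i] * P(i -> j)."""
    n = len(distribution)
    return [sum(distribution[i] * transition[i][j] for i in range(n))
            for j in range(n)]

def project(distribution, transition, cases):
    """Distribution over strategy clusters after `cases` successive cases."""
    for _ in range(cases):
        distribution = step(distribution, transition)
    return distribution

# Three illustrative clusters: guessing, partial-information, systematic.
initial = [0.5, 0.3, 0.2]
transition = [
    [0.6, 0.3, 0.1],   # guessing students mostly stay, some improve
    [0.1, 0.6, 0.3],
    [0.0, 0.1, 0.9],   # systematic students rarely regress
]
after_five = project(initial, transition, 5)
```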
Vendlinski and Stevens’ research provides a credible strategy for allowing student
understanding of individual concepts within a course to be analysed using data
generated by fixed-response CBA. This information can then be fed back to educators
in order to improve course teaching; Vendlinski and Stevens’ research serves as a
reminder that teachers should benefit from good formative feedback in addition to
students (the 7th criterion of the framework for good formative assessment feedback
provided in section 2.2.5). The properties of the Hazmat system, however, would
appear to be difficult to generalise across many assessment and teaching domains
and the construction of a multiple-choice testing environment to use the same
techniques would probably appear contrived to students. Furthermore, the amount
of effort expended in creating the assessment is very large since not only the
assessment but also the teaching materials must be authored specifically for the
individual assessment.
Many more attempts to automate formative assessment using bespoke CBA systems
have been documented in the literature. Bull [Bj93] provides several useful case
studies including the CALM Project, which is used to teach mathematics, primarily
calculus, to engineering undergraduates. Students can progress through tutorials at
their own pace and access formative MCQs throughout. A summative assessment at
the conclusion of the module is conducted in a conventional manner.
3.1.1.3 Implications for formative assessment using CBA
The examples reviewed here provide several key lessons when attempting to
automate the formative assessment process using CBA software:
• The pedagogic approach to assessment, especially the design of feedback, can
be at least as important as the technical capabilities of the CBA system in
determining the success of the formative assessment in assisting student
learning;
• A two-part assessment strategy, in which a summative assessment component
acts as a motivator for the formative assessment exercises, may increase
student participation in the formative assessment process;
• Restricting student access to the CBA system will result in poor student
attendance, limiting the formative impact of the assessment;
• Mixing the formative assessment with a simultaneous summative element,
rather than using a two-part assessment approach, can confuse the pedagogic
approach of the assessment and limit student learning opportunities;
Axelsson et al [AMW06] discuss the difficulties of mixing formative and
summative assessment aims even in seminar sessions benefiting from high
educator participation;
• Constructing CBA systems “from scratch” is time-consuming and expensive
and should only be undertaken if commercial systems or existing academic
CBA platforms cannot demonstrate the required functionality; for fixed-response
assessment designs this is now unlikely;
• Formative assessment using CBA often benefits certain student profile groups
more than others;
• Feedback should be linked to learning materials;
• Constructing in-house learning materials may be prohibitively
resource-intensive; linking feedback to papers, textbooks and website references
is an acceptable substitute which can encourage student research;
• Students should be allowed to repeat assessment questions in order to correct
mistakes;
• Linking sections of the assessment to specific concepts can assist students in
identifying and correcting shortcomings in their understanding;
• Formative assessment should provide feedback to educators as well as
students in order that subsequent teaching processes can be improved.
3.1.2 Free-response formative CBA: a review
Section 2.1.6 outlined the reasons for the relative difficulty in developing
free-response CBA. This difficulty has resulted in free-response CBA systems being much
fewer in number than their fixed-response counterparts and also explains why those
free-response systems which do exist have been created by educational institutions
themselves since no major commercial software is available. Section 3.1.2.1 provides
an overview of the available systems, while section 3.1.2.2 provides a brief look at the
conclusions which can be drawn from the examples. Diagrammatic CBA systems are
not featured here since they will be examined in more detail in section 3.2.
CourseMarker and DATsys are not featured here since they will be examined in
detail in section 3.3.
3.1.2.1 Formative assessment capabilities of free-response CBA systems
Joy et al describe the BOSS system [JG04, JL98], which allows online submission
and automated testing of programming assignments using a hybrid CBA
approach: student submission is automated and student programs can be
compared against test data, but the assessment and feedback processes are carried out
manually by the lecturer. Checking the program against test data can be
accomplished either by the student to assist in the development of the program or by
the educator to assist the assessment process. However, Joy et al argue that a full
CBA approach is unable to award fair credit to novel solutions and thus teaches
students in a “prescriptive” way. Originally developed, like Ceilidh, as a command
line environment, BOSS has been updated several times and now has a client-server
architecture with a relational database to store data and a choice of either Java or
web-based client for student interaction. BOSS has been used to assess courses in
Pascal, UNIX shell programming and C++. Joy et al report considerable practical
advantages, especially in administrative matters, a positive student response and a
reduced marking time.
BOSS has certain pedagogical advantages. The system of allowing students to run
automated tests on their programs before submission allows improvements to be
made to the solution and promotes introspective self-assessment. Certain other
factors, such as the level of motivation encouraged by the feedback, the clarification
of good performance and the concentration on student learning, are unchanged from
traditional assessment precisely because they are conducted using traditional means.
Furthermore, BOSS has proved useful in information collation and is able to provide
accurate information to educators. Problems associated with BOSS from a formative
assessment standpoint centre around the intervention by the educator at the point of
assessment. Timeliness of feedback to the student is entirely dependent upon the
individual educator, rather than guaranteed, and in any case will never successfully
rival the near-instant feedback times associated with full CBA systems. Furthermore,
given that each submission must be assessed manually, it is unlikely that multiple
student submissions on a large scale could ever be feasibly allowed. The implications
of this are that feedback might be returned to the student at a time when their
solution is no longer fresh in the mind, and that few or no opportunities to repeat the
assessment, for the purposes of acting upon feedback, are likely to be allowed.
Fundamentally, conducting formative assessment as an iterative cycle in which the
student can improve their solution over the course of multiple submissions while
receiving several sets of motivational feedback, is not feasible with the BOSS
approach. This is a disadvantage recognised by Joy et al [JG04] where they note the
formative potential of a fully automated approach such as that taken by
CourseMarker.
Jackson and Usher [JU97, Jd00] describe the ASSYST system. Like BOSS, the
UNIX-based ASSYST uses an approach which is a hybrid of CBA and traditional
assessment. The pedagogical advantages and drawbacks in terms of formative
assessment potential are therefore similar to those of BOSS.
Daly [Dc99] describes RoboProf, an online teaching system structured as a formative
coursebook which presents students with information on programming topics and
then automatically assesses student exercises that cover those topics. RoboProf runs
under UNIX and interacts with students via a Java applet, is designed to be modular
and scalable, allows any number of submissions and does not penalise failure. Daly
reports on the use of RoboProf to assess a C++ programming course. RoboProf’s
assessment mechanism is based upon running student solutions against test data and
examining the results in a similar way to that of Ceilidh. Student feedback is based
around revealing the model solution to the student.
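This marking style can be illustrated with a minimal sketch that runs a student program over test inputs and compares normalised outputs; the function names and the whitespace-normalisation policy are assumptions of the sketch, not documented details of RoboProf or Ceilidh.

```python
# A minimal sketch of test-data marking: run the student program on each
# test input, then compare its output with the expected output after
# normalising whitespace so cosmetic differences are not penalised.

def normalise(text):
    """Collapse runs of whitespace into single spaces."""
    return " ".join(text.split())

def mark_against_tests(run_student, tests):
    """`run_student` maps an input string to the program's output string;
    `tests` is a list of (input, expected_output) pairs. Returns the
    fraction of tests passed."""
    passed = sum(1 for given, expected in tests
                 if normalise(run_student(given)) == normalise(expected))
    return passed / len(tests)

# Illustrative use with a toy 'student program' that doubles an integer.
student = lambda s: str(2 * int(s)) + "\n"
tests = [("2", "4"), ("10", "20"), ("-3", "-6")]
score = mark_against_tests(student, tests)   # 1.0 for this solution
```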
RoboProf clarifies good performance through providing students with the model
solution, provides ample opportunities to improve performance through allowing
unlimited submissions and provides positive, motivational beliefs through its policy
of not penalising failure. On the other hand, reflection and dialogue around learning
are likely to be minimised since the student is presented with the model solution.
Furthermore, feedback information is not focused on the student learning process
since it is overly problem-specific and provides no motivation to conduct further
research. Spacco et al [SHP+06] describe the Marmoset system’s attempt to overcome
these limitations. Sections of the instructor’s private test cases are released to the
student using a system of time-based tokens. In standard configuration, only three
tokens may be redeemed per day. Spacco et al argue that this motivates students to
begin work early since more help will be available.
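The token mechanism described by Spacco et al might be sketched as follows; the three-token allowance follows the standard configuration mentioned above, while the class design and the 24-hour regeneration delay are illustrative assumptions.

```python
# Hypothetical sketch of Marmoset-style token-based release of instructor
# test results: a student holds a small pool of tokens, each redemption
# reveals hidden test results, and a spent token returns after a delay.

from collections import deque

class TokenPool:
    def __init__(self, tokens=3, regen_hours=24):
        self.free = tokens
        self.regen = regen_hours
        self.pending = deque()          # hours at which spent tokens return

    def redeem(self, now_hours):
        # First reclaim any tokens whose delay has elapsed.
        while self.pending and self.pending[0] <= now_hours:
            self.pending.popleft()
            self.free += 1
        if self.free == 0:
            return False                # no token: hidden results withheld
        self.free -= 1
        self.pending.append(now_hours + self.regen)
        return True                     # token spent: reveal test results

pool = TokenPool()
uses = [pool.redeem(t) for t in (0, 1, 2, 3)]   # fourth redemption fails
```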
Von Matt [Mu94] describes the Kassandra system which is used to assess student
programming assignments in Maple and Matlab. Kassandra’s primary focus is on
summative assessment but the assessment is conducted throughout the course and
feedback is provided. Due to the summative impact of the assessment, security is a
key feature of Kassandra. Kassandra requires students to modify their code prior to
submission, although Winters [Wt04] argues that this is an acceptable requirement in
a domain such as computing. Kassandra requires very precise output from student
programs.
Kassandra promotes dialogue around learning since students must be aware of the
Kassandra system while developing their solutions if they are to modify their code to
conform to Kassandra specifications. However, Kassandra fails to provide
opportunities to improve performance, to adequately clarify good performance, to
encourage positive motivational beliefs or to keep feedback information focused on
student learning, all because of the constraints imposed by the summative nature of
the assessment process.
A similar approach to Kassandra is documented by Oliver [Or98] and by Douce et al
[DLO+05] although not all of the approaches require students to tailor their solutions
to the same extent; these approaches are heavily influenced by the Ceilidh system
which is examined in detail in section 3.3.1.
Another system heavily influenced by Ceilidh and CourseMarker is the
EduComponents system described by Amelung et al [APR06]. Amelung et al
acknowledge their experience with systems such as CourseMarker. The
EduComponents ECAutoAssessmentBox system is written as an extension of the
open-source content management system Plone; a central priority in the development of
the system was achieving integration with the departmental system already used for
the delivery of materials to students. The EduComponents assessment technique was
based upon work by Saikkonen et al [SMK01], which demonstrated that automated
assessment of functional programming languages can be undertaken by directly
comparing the values of functions in the student and model solutions.
ECAutoAssessmentBox assesses student programming exercises in Python, Haskell,
Scheme, CommonLisp and Prolog. Amelung et al discuss the vital role of formative
assessment and report that use of the EduComponents ECAutoAssessmentBox has
increased student motivation in programming exercises. However, the system of
feedback rests upon awarding the student one of two states, Accepted and Rejected.
Such a system of classification is not motivational to struggling students and,
furthermore, the system does not provide the opportunity to resubmit after one of
the states has been awarded. Grading and brief feedback are also provided, but since
the student cannot resubmit, opportunities to improve, the promotion of student
dialogue and the encouragement of self-assessment are not provided.
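The function-value comparison demonstrated by Saikkonen et al, on which the EduComponents marking technique is based, can be sketched as follows; the factorial exercise, function names and inputs are invented for illustration.

```python
# Sketch of the marking principle attributed to Saikkonen et al: evaluate the
# student's function and the model function on the same inputs and compare
# the returned values directly.

def compare_functions(student_fn, model_fn, inputs):
    """Return the list of inputs on which the two functions disagree."""
    return [x for x in inputs if student_fn(x) != model_fn(x)]

# Model solution and a (buggy) student attempt for a factorial exercise.
def model_factorial(n):
    return 1 if n <= 1 else n * model_factorial(n - 1)

def student_factorial(n):      # buggy base case: returns 0 for n == 0
    return 0 if n == 0 else (1 if n == 1 else n * student_factorial(n - 1))

failures = compare_functions(student_factorial, model_factorial, range(6))
# The two functions disagree only at n == 0, so failures == [0]
```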
English [Ej04] describes the process of automated assessment of student GUI-based
programs in Java using JEWL, a set of Java packages created with the aim of allowing
novice programmers to construct GUI programs “from the ‘Hello world’ stage onwards”.
A “test harness” is used to generate sequences of events which the JEWL event loop
allows to be processed as a stream of characters. English reports that the assessment
process is as yet in the early stages of development, but that students seem to be
motivated by their ability to create Java programs as opposed to command line based
programs (which are seen as unrelated to real-world Java programming and
therefore dismissed as toys). English reports that only interface functionality can be
assessed and considers the drawbacks of being unable to assess the layout of the user
interface. The feedback regime is not explicitly described in the literature, although it
is implied that students are provided with a report of those features in regard of
which their program failed to conform to the program specification. This has obvious
drawbacks in terms of motivation; English also reports frequent, infuriated student
complaints to the system administrators but dismisses most concerns as arising from
misread questions. Dialogue around learning is obviously generated, however,
and opportunities to improve are provided. Gray and Higgins [GH06] describe a
system for the assessment of GUI-based student Java programs using CourseMarker
which makes use of the standard CourseMarker feedback mechanism.
3.1.2.2 Implications for formative assessment using CBA
Section 2.1.7 noted the advantages of free-response CBA in terms of the opportunities
provided for assessing higher cognitive learning levels. The review of the formative
potential of existing free-response CBA systems raises several key implications with
relevance to this work:
• Fully automated CBA systems may not account for particularly novel
solutions and can be criticised for pedagogic “prescriptiveness”;
• A trade-off exists between fully automated CBA approaches and
human-assisted approaches which, while more able to cope with student novelty,
may result in less timely feedback and fewer chances for the student to
improve;
• Requiring precise input from students encourages awareness of the
assessment process and has fewer disadvantages in scientific disciplines
where questions can be worded as ‘specifications’;
• Free-response CBA systems can allow the student to construct solutions
which they feel are relevant to the real world—this acts as a motivator;
• CBA can be helpful to educators in collating often complex feedback results
derived from free-response exercise submissions;
• CBA can provide timely feedback;
• CBA can assist formative assessment through an iterative cycle: test–
feedback–retest;
• A system which does not penalise failure can successfully motivate learning;
• Providing model solutions to students may fail to encourage student research
— a more effective method is to construct feedback;
• CBA can be used to construct feedback which is not failure-specific;
• Allocating simple states or grades focuses on failure and may de-motivate
students;
• Questions should be carefully phrased to avoid confusion, since misread
questions often infuriate students.
3.1.3 Summary
Section 3.1 reviewed existing approaches to automating the assessment process. Section 3.1.1 reviewed examples of providing formative assessment
using fixed-response CBA systems. The design of the assessment itself is more
important to the success of the formative assessment than the technical capability of
the system. Some systems are built from scratch by the educator, allowing a flexible
and highly targeted approach but at the expense of high development costs. Systems
built on top of commercial platforms such as QuestionMark require fewer resources and
can provide successful formative assessment so long as the assessment itself, and
especially the feedback, is carefully designed by the educator. Section 3.1.2 reviewed
free-response CBA systems. Free-response CBA can be used to provide timely
feedback and encourage discussion around the learning and assessment processes,
but sometimes at the expense of tolerance in marking. Human-assisted approaches
are more flexible but do not allow the same potential for timely feedback or multiple
student submissions.
3.2 CBA approaches in diagrammatic domains
Tsintsifas’ [Ta02] approach to conducting CBA within diagram-based domains
aspired to be generic, as opposed to domain-dependent. The DATsys object-oriented framework was developed with the intention that it could be extended to produce diagram editors for any conceivable domain within a CBA context. CourseMarker’s Generic
Marking Mechanism was, similarly, designed to be flexible enough to allow any
domain to be assessed for which specific marking tools could be constructed.
CourseMarker and DATsys are described in detail in section 3.3. Since section 2.3
illustrated that one of the key advantages in assessing diagram domains is their
interdisciplinary potential, this work will continue Tsintsifas’ generic approach: the
aim will be to provide a framework for the formative CBA of diagram-based
domains, within which assessment for individual domains can be conducted through
extension and parameterisation.
This section considers other approaches to the automated assessment of diagrams
which are described in the literature, of which there are four: the TRAKLA2 system
described by Malmi and Korhonen [MK04], the PILOT system described by
Bridgeman et al [BGK+00], the diagram comparison system described by Hoggarth
and Lockyer [HL98] and a more recent body of work developed by Thomas, Waugh
and Smith at the Open University, UK [TWS05], which was published as this work
was being undertaken. Each review considers the level to which the work is domain-specific, the user-interactivity of the student interface and the ability of the approach
to provide feedback to students. The reviews are in order of generality, with the most
domain-specific system first.
3.2.1 TRAKLA2: a review
Malmi and Korhonen [MK04] describe the TRAKLA2 system, the successor to the
earlier TRAKLA [HM93]. TRAKLA2 is a domain-specific CBA system used to
distribute visual algorithm simulation exercises to Computer Science undergraduates
on a Data Structures and Algorithms course. Distribution of the exercises is
accomplished over the web as a Java applet; the interactive environment (Figure 3.1)
is used by the students to solve the exercise by manipulating the available tools.
Exercises cover such topics as binary search tree insertion and deletion, and insertion
into AVL-trees, red-black trees, digital search trees and radix search trees. Typically,
a student drags and drops graphical entities (keys, nodes and references) onto the
drawing canvas in an attempt to simulate the operations performed by the algorithm
defined in the question. Therefore, the user interface can be viewed as a more
complex variation on standard graphical hotspot interaction techniques as described
in section 2.1.6.1. Malmi and Korhonen are keen to emphasise that the student
exercise is individually tailored after each submission: in practice, certain exercise parameters are randomised in order to alter the required student solution.
This strategy has been adopted for use with TRAKLA2 to prevent plagiarism.
Student feedback is returned as a mark: the number of correct steps is given as the
mark, while the total number of correct steps is given as the highest possible mark. A
standardisation system is used for inter-exercise consistency. The student can also
request to see the model solution. If this occurs then the model solution is displayed, but grading is suspended until the exercise has been re-initialised with different data.
The design of TRAKLA2 emphasises the collection of data regarding student
performance. Data is logged every time a student initialises the exercise, asks to be
graded, requests the model solution or submits an exercise. The number of
submissions allowed is unlimited.
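The grading scheme just described can be sketched as follows. The function name and the linear scaling that stands in for TRAKLA2’s (unspecified) standardisation system are assumptions for illustration, not details of the actual implementation:

```python
def trakla2_style_mark(correct_steps, total_steps, scale=100):
    """Sketch of TRAKLA2-style grading: the number of correct simulation
    steps is the mark and the total number of steps is the highest possible
    mark.  The linear scaling below merely illustrates the kind of
    standardisation used for inter-exercise consistency."""
    return correct_steps, total_steps, round(scale * correct_steps / total_steps)
```

A student who correctly reproduces 7 of an exercise’s 10 algorithm steps would thus receive 7 out of 10, standardised to 70 on a 100-point scale.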
Figure 3.1: TRAKLA2’s student applet and model solution window [MK04]
TRAKLA2 provides in-depth information to educators; Malmi and Korhonen’s paper
[MK04] concentrates substantially on student mark analysis. TRAKLA2 also provides
ample opportunities to improve since submissions are unlimited, and clarifies good
performance through allowing the student to request the model solution. Malmi and
Korhonen emphasise their belief that the randomised exercise elements result in
sustained student interest and increased learning. Unfortunately, this randomisation is only feasible because of the domain and user-interaction restrictions of the system.
The motivational potential of the feedback, in concentrating upon the proportion of
the model solution correctly identified by the student, is ambiguous at best. For
similar reasons, the feedback fails to provide information focused upon student
learning, instead focusing on student mistakes. Malmi and Korhonen fail to
document the potential for dialogue around learning in terms of student feedback to
educators, although their insistence on guarding against plagiarism seems
superfluous for formative assessment. A final examination seems to act as a
successful motivator for student participation. Finally, although TRAKLA2 was
popular with students and encouraged learning, its context is entirely domain-specific.
3.2.2 PILOT: a review
The PILOT system described by Bridgeman et al [BGK+00] was designed to
accomplish three goals: use in class by educators as a demonstration tool to aid
exposition, use by students in entering online solutions to randomly generated
instances of questions and as a grading tool for formative purposes. PILOT has a
degree of commonality with the TRAKLA2 system described in section 3.2.1:
PILOT’s user client is distributed as a Java applet and the system is used to conduct
formative assessment of graph problems such as minimum spanning trees, shortest path algorithms and breadth- and depth-first searches. Furthermore, PILOT
aims to reduce plagiarism by only allowing students to solve problems generated
“on the spot”; this is an attempt to prevent students from using the system to help
them solve their “homework” problems, which are summatively assessed.
Bridgeman et al note the proven usefulness of graph and algorithm visualisation
systems in learning; a recent such system is described by Brusilovsky and Loboda
[BL06]. To begin an assessment, a student chooses a problem type and a random
instance of the problem type is generated. The graph of the problem is then drawn on
the screen: this necessitates the use of a Graph Generator system to conduct
automatic layout of the graph using algorithms derived from Di Battista et al
[BGL+97]. To indicate their solution to the system, the student clicks on the edges, in
order. This generates a list of edges which constitutes the solution. An illustration of
an exercise and the student solution is shown in figure 3.2.
Figure 3.2: Example exercise and student solution using PILOT [BGK+00]
Bridgeman et al emphasise the need to provide partial credit to a student for an
“almost” correct solution. The solution checker operates slightly differently depending upon the exercise type: checking may vary from simply comparing the list of edges in the solution tree to verifying the order of the list of edges. In the case of
non-unique solutions, the checker must evaluate if a student solution which differs
from the generated model solution is equally valid — this is accomplished by
running a simulation of the graph and comparing the weights of the edges denoted
by the two. Feedback is provided in the form of brief comments denoting precise
student errors, for example “Edge (a,c) should be replaced by the lower-weight edge (a,b)”.
PILOT uses a penalty marking scheme in which one mark is deducted from the total for each incorrect edge.
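This checking and penalty-marking behaviour can be sketched as below. The function and its two checking modes are illustrative assumptions in the spirit of the description above, not PILOT’s actual code:

```python
def penalty_mark(student_edges, model_edges, ordered=False):
    """Illustrative penalty-marking checker in the style described for
    PILOT: one mark is deducted from the highest possible mark for each
    incorrect edge.  All names here are hypothetical."""
    total = len(model_edges)
    if ordered:
        # e.g. search-order exercises: edges must appear in the same sequence
        incorrect = sum(1 for s, m in zip(student_edges, model_edges) if s != m)
        incorrect += abs(len(student_edges) - len(model_edges))
    else:
        # e.g. spanning-tree exercises: only edge membership matters
        student, model = set(student_edges), set(model_edges)
        incorrect = len(student - model) + len(model - student)
    return max(total - incorrect, 0)
```

For a three-edge model solution, a student list containing one spurious edge and missing one required edge would lose two marks, scoring 1 out of 3.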
Like TRAKLA2, some of PILOT’s most central features are based around domain-specific properties of the exercises, together with limitations to the user interaction.
Bridgeman et al’s reference to graph layout within a CBA context is interesting, but it
is important to note that the context is in the construction by the program of
aesthetically acceptable graphs which are then manipulated, in a limited way, by the
user. PILOT does not assess the aesthetics of student diagrams.
PILOT clarifies good performance to the students by suggesting specific changes to
their solutions in terms of edges. While this information concentrates on improving
the specific student solution, there is no attempt to motivate students to research on
their own by referencing external material or by explaining the reasons why the
modification to their solution is necessary. By concentrating on student errors, and
by adopting a penalty marking scheme, positive motivational beliefs in students are
not encouraged. Dialogue around learning and self-assessment is promoted by the
tool since students are allowed to discuss their solutions with each other and since
PILOT is used as a demonstration tool — again, however, it is unclear how much
student improvement is gained through increased understanding as opposed to
blindly adopting the system’s corrections. Bridgeman et al’s insistence that the
system not be used to provide help with homework assignments implies worry on
their part that students may not learn through PILOT’s feedback process.
Opportunities to improve are provided through allowing multiple submissions.
3.2.3 Diagram Comparison System: a review
Hoggarth and Lockyer [HL98] developed their diagram comparison system, for use
with systems analysis and design diagramming methods, due to their perception that
existing CASE tools did not cater for academic users, who would require assistance
with the underlying methodology of developing their solutions as well as with the specific usage of the CASE tool. Hoggarth and Lockyer documented a tool which
embedded Computer Aided Learning features within a CASE tool; they argued that,
“as a CASE tool fully recognises the content of a software diagram it can provide feedback
based on its actual diagrams”. By the definitions outlined in section 2.1.1, the features
which are described are actually CBA rather than CAL. The system documented by
Hoggarth and Lockyer is domain-specific: it is used to assess student systems
analysis and design diagrams. However, the fact that it is embedded within a CASE
tool suggests at least the potential for implementation in further domains, and the
comparison mechanism itself is not domain-specific.
As noted in section 2.1.6.2, the verification mechanism involves the student manually
tailoring their diagram to match the requirements of the system. This notion of the
student labelling their solution to assist the automated assessment process is
reminiscent of the Kassandra system described in section 3.1.2.1. The student must
specify ‘tokens’ (the names of diagram components) in their solution, which are
matched with tokens in the model solution in order that the system can identify the
equivalent diagram components in the two diagrams. The verification mechanism
then compares the diagrams as two directional “flows” of nodes and connections
and notes the differences in ordering between the two. Formative feedback is
generated according to three specification criteria: outlining specific student errors,
for example the absence of necessary flows in the student diagram; outlining student
inconsistencies, such as different symbol order or connections with incorrect directionality; and listing the symbol selections between the two diagrams.
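The token-matching comparison can be sketched as follows, under the assumption that each diagram is reduced to a mapping from component tokens to the set of tokens they connect to. The data shapes and names are hypothetical, not those of Hoggarth and Lockyer’s system:

```python
def compare_by_tokens(student, model):
    """Hedged sketch of token-based diagram comparison: components are
    matched by student-supplied tokens (component names), then missing,
    extraneous and inconsistently connected components are reported."""
    missing, extra, inconsistent = [], [], []
    for tok, targets in model.items():
        if tok not in student:
            missing.append(tok)            # required component absent
        elif student[tok] != targets:
            inconsistent.append(tok)       # present but wired differently
    for tok in student:
        if tok not in model:
            extra.append(tok)              # component with no model equivalent
    return {"missing": missing, "extra": extra, "inconsistent": inconsistent}
```

The three result lists correspond loosely to the three feedback criteria described above: specific errors (missing flows), inconsistencies, and the component selections that differ between the two diagrams.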
The feedback system described by Hoggarth and Lockyer promotes dialogue around
learning since students must be aware of the assessment process in order to tailor
their solutions to the system, provides opportunities to improve through multiple
submissions and provides timely feedback. It could also be argued that comparing
the student solution to the model solution helps to clarify good performance.
However, the danger inherent in such a feedback mechanism for formative purposes
is that the information provided to students is exclusively based around differences
between the student’s solution and the model solution at the expense of focus on the
student learning process itself. Furthermore, the feedback is not inherently
motivational since it concentrates mainly on the level of student failure. Such a
feedback framework may encourage students to blindly minimise the differences
highlighted by the system, at the expense of promoting dialogue around learning.
Listing the symbol selection between diagrams is, however, a useful tool in
providing a psychological link between the student’s submission and the feedback,
especially if the student chooses to view their feedback at a later date.
3.2.4 Automatic Marker for Entity Relationship Diagrams: a review
This section examines a body of work undertaken by Thomas, Waugh and Smith at
the Open University, UK. Much of the work reviewed in this section was undertaken
concurrently with the research outlined by this thesis and it must be emphasised
that, for this reason, the strategies and results achieved impacted little on the work
outlined in the remainder of this thesis. Significant differences between the two
approaches exist: the work described in this thesis provides a theoretical framework
for the formative assessment of many diagram domains. Marking of individual
domains is achieved through extension and parameterisation. The work described in
this section is primarily domain-specific and aims for a deeper understanding of the
structure of entity-relationship diagrams for use in a simple marking tool. This
section will examine the work undertaken and review its potential for providing
formative assessment.
Thomas [Tp04] describes the process of creating a tool to allow students to draw
diagrams in an online examination. The drawing tool could be launched simply by
the students from within the online examination and the interface was simple
(consisting of only “boxes” and “links”). Some students were dissatisfied by the
interface (particularly the level of screen scrolling routinely necessitated by the tool)
and there was also a reluctance by students to use the system, with which they were
not previously familiar, under examination conditions. Thomas concluded that the
situation would have been improved had the students been more familiar with the
system before the exam. An interesting feature of the results was that students
tended to use spatial correlation between boxes, rather than direct links, to indicate
intent. For this reason alone it was fortunate that no automated assessment process
was used in this trial: it is likely that many students would have received very low
marks indeed.
Smith et al [STW04] describe an initial approach to the assessment of imprecise
diagrams. Student diagrams are said to be imprecise because required features are
incorrectly defined or missing, or because extraneous features have been introduced
by the student. Smith et al outline a general approach to “interpreting” imprecise
diagrams which does not attempt to identify semantic structures, but instead
searches for associations between nodes in the student diagram and conducts a
comparison with equivalent associations in a model solution. A mock exam was
conducted in which students were asked a simple question whose answer was a
pipeline diagram. Similarity measures, generating a value between 0 and 1 for each
association, were used along with a system of weighting to generate a final mark.
Student solutions were also hand marked and the marks compared. Smith et al noted
that the results were encouraging but that it could not be proved that the automatic
marker was not significantly different from human markers. This approach is similar
to that used by Tsintsifas [Ta02] in the marking of UML diagrams. Thomas et al
[TWS05] describe a similar approach to the assessment of entity-relationship
diagrams. A tool was used to identify “minimum meaningful units” (MMUs) in a
diagram: in entity-relationship diagrams an MMU is a connection between two
nodes. Once all MMUs are identified an aggregation stage combines MMUs into
higher level features which are then compared with a model solution. Two exercises
were presented to participating students, who drew their solution online. Official
marking was undertaken manually by tutors. The tool was used afterwards to
generate an alternative set of marks which were then compared with the manual
marks. In both exercises the correlation between marks was good. Furthermore,
correlation for the second, more complex, exercise was later improved by the
introduction of a feature to consider synonymous entity names.
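The mark computation described by Smith et al can be sketched as a weighted average of per-association similarities. The precise similarity measures and weights are not reproduced in this thesis, so the form below is an assumed illustration:

```python
def weighted_mark(similarities, weights, max_mark=10):
    """Sketch of the marking scheme described by Smith et al [STW04]:
    each association between student and model diagram yields a
    similarity in [0, 1]; a weighted combination of these values
    generates the final mark.  The weighting scheme here is assumed."""
    assert len(similarities) == len(weights)
    score = sum(s * w for s, w in zip(similarities, weights)) / sum(weights)
    return round(score * max_mark, 1)
```

For example, two associations scoring 1.0 and 0.5 with weights 2 and 1 give a combined score of 2.5/3, or 8.3 out of 10.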
Later work [TWS06] attempted to identify higher-order semantic structures through
the use of a cliché library. The work was domain-specific, again concentrating on
entity-relationship diagrams. A pattern was defined as a sub-diagram with some of
its details omitted (i.e. made generic). Some patterns are considered equivalent, such
as a many-to-many relationship and two one-to-many relationships, and it would be
useful to be able to substitute these patterns as part of the assessment process. Reusable patterns were stored for reference and referred to as clichés.
Thomas et al [TWS05, TWS06] describe the construction of a student revision tool.
Students are presented with a collection of typical assessment questions in the
domain of entity-relationship diagrams. Students draw their answers online; the user
interface is shown in figure 3.3. Feedback is provided in terms of a mark and a series
of entity-relationship diagrams that form the MMUs of the model solution. The
student is also able to display an “interactive” version of the specimen solution. The
tool was later modified to allow the addition of patterns from the student diagram to
the cliché library [TWS06] and to suggest pattern substitutions.
Figure 3.3: The student revision tool [TWS05]
The identification of MMUs and the creation of a cliché library have clear advantages
for the automated marking process. Clichés may be equivalent but they are not
always equally preferable; therefore, it is possible to award differential marks across
equivalent clichés and to suggest substitutions to the student. Good student
performance is thus clarified. The identification of MMUs makes understanding the structure of a student diagram possible and allows feedback to be presented as a direct comparison of student and model solution MMUs, with individual sub-diagrams presented to the student as tailored feedback. Information is, therefore, focused on student learning. Good information is also provided to educators, and further clichés can be added to the library.
The promotion of self-assessment and dialogue around learning may be difficult to
achieve since the information provided to students centres directly upon the model
solution, allowing them to tailor their solution directly. Little information is provided
to motivate student research. Opportunities to improve may therefore not be
maximally effective. The system also concentrates on the student’s differences from the model solution, which is not motivational.
These minor criticisms aside, the work that this section has described provides a rare
example of truly free-form diagram-based CBA which provides a deep
understanding of the diagram domain under consideration and allows tailored
feedback to be provided to the student. The authors concede that their approach has
been domain-specific, as opposed to this work which aims to provide a generic
framework for formative CBA in diagram-based domains. The authors do express
the hope [TWS06] that a similar approach can be focused upon object-oriented design
diagrams at a later date.
3.2.5 Summary
Section 3.2 provided an overview of four diagram-based CBA systems. The
TRAKLA2 and PILOT systems limit the free-form nature of the CBA by providing a
limited number of interactions between the student and the onscreen diagrams.
Conversely, the diagram comparison system described in section 3.2.3 and the entity-relationship diagram tool described in section 3.2.4 allow true, free-form CBA to
occur. All four described systems are largely domain-specific and all four provide
specific comments on student errors as feedback. As the framework for good
formative feedback, presented in section 2.2.5, suggests, this may not provide
optimal student motivation and discourages student research. The work into
patterns and clichés described in section 3.2.4 represents an attempt at deep
understanding of a limited diagram domain. Conversely, the DATsys framework,
and the CourseMarker CBA system into which it is integrated, provide a framework
for generic marking and allow marking tools to be configured by domain experts at a
later date. The influential Ceilidh system, its successor CourseMarker and the
DATsys framework for diagram editors in a CBA context are reviewed in depth in
section 3.3.
3.3 Ceilidh, CourseMarker and DATsys
Section 2.2 emphasised the key differences between formative and summative
assessment. From a CBA perspective there are, however, elements which must
necessarily be present in both formative and summative assessment, for example
course management facilities. This section will give an overview of automatic
assessment systems already developed at the University of Nottingham, primarily
for the automated assessment of coursework in programming. The aim is to
demonstrate that formative CBA can best be achieved through the extension of an
existing CBA platform and that development from scratch would constitute a reinvention of the wheel.
Section 3.3.1 begins with an examination of Ceilidh, a system now of interest to CBA researchers largely for historical reasons; however, many key concepts in the later CourseMarker system were originally implemented in Ceilidh and their development can best be understood in that context. Section 3.3.2 then provides an
overview of the CourseMarker system, while section 3.3.3 looks at DATsys, the object
oriented framework for CBA-based diagram editors.
3.3.1 Ceilidh
Ceilidh was a general-purpose courseware system whose main use in practice was in supporting courses in student programming [BBF+95]. It was originally
developed not as an academic research project but as an ad hoc practical aid for the
teaching of programming to large groups of undergraduates; Ceilidh was developed
and added to while in actual use and this ensured that its features were designed
with practical teaching needs in mind [BBF+93]. Ceilidh supported features to address several problem areas: presentation of course materials, course administration and assessment of submitted student coursework. Ceilidh therefore
constitutes a true CBA system according to the definitions outlined in section
2.1.1.
3.3.1.1 Ceilidh’s Architecture
In CBA, Ceilidh pioneered the concept of separating the system itself from the
courses it administered. The user interface was also separated, the result being that
Ceilidh’s architecture was a three-layer model as shown in fig 3.4. Foxley et al
[FHT+99] emphasise that the interfaces between layers were well-defined, meaning
that developers could modify one layer without any of the others being affected. The
database layer was responsible for storing course information for both the running and administration of courses, including exercise data, student submissions and marks. The
tools layer, which operated on the database layer, comprised a wide variety of external marking tools. The client layer provided the user interface to the system.
Multiple user interfaces were implemented, including a dumb terminal user
interface, a command line interface, an X-window interface and a web interface.
Ceilidh’s concept of User Views meant that each interface was capable of providing
different views depending upon the registered type of the user.
The separation of system and data was central to the development of Ceilidh as a
general purpose courseware system. Ceilidh was originally developed to assist in the
administration of a C programming course; if the C exercises and assessment code
had been hard-coded into the system itself, the task of adding new courses later
would have been rendered difficult. Instead, the separated architecture allowed
courses to be later developed for the assessment of SML, FORTRAN, Pascal,
Modula2, SQL, Prolog, Z and UNIX-based software tools. It should be noted that
Ceilidh’s architecture allowed external developers to feasibly produce their own
courses; the result was that a substantial number of courses were developed outside
the University of Nottingham.
Fig 3.4: architectural overview of the Ceilidh system (client layer, comprising the dumb terminal menu, command line, X-Window and World Wide Web interfaces; tools layer; database layer)
3.3.1.2 Ceilidh’s Course Structure
Ceilidh was capable of hosting multiple courses simultaneously. Each course had a
hierarchical directory structure: within each course directory was a series of
subdirectories representing exercises and units (collections of exercises). Ceilidh
specified the files which must be present at each level in the directory hierarchy
[BBF96], including files for publishing information (such as unit notes or questions),
and skeleton files for student solutions. The flexibility of structure allowed the type
of a course to be specified within the course directory and for each exercise to specify the information it would collect upon student submission and the marking tools called upon to assess the submission.
3.3.1.3 Ceilidh’s User Views
Users of Ceilidh were split into five groups: students, tutors, teachers, developers
and system administrators. Each of these users had different main duties and were
consequently presented, upon login, with a different User View of the system.
System administrators, for example, had access to every aspect of the system with
permissions to modify any file within it while a student would be presented only
with those options relevant to the viewing of exercise questions and skeleton files
and the submission of solutions, whereupon they would be provided with feedback.
In general, for every user interface provided by the system, course-, unit- and exercise-level functionality specific to the user type was presented through the client layer. This approach had obvious advantages for those using the system. For
convenient use, it is necessary that teachers, for example, be provided with the
facilities to develop and modify exercises. However, students should not be
presented with these facilities upon logging in to the system.
3.3.1.4 Ceilidh’s Marking Tools
Marking of student submissions is a complex task comprising many sub-sections
related to marking criteria of different types; furthermore, marking coursework
within different domains will necessarily involve different operations being
performed. Ceilidh’s solution to this problem was to encapsulate separate
components of the marking process in what were known as marking tools. To mark an
exercise Ceilidh called all necessary marking tools for the exercise and assigned
weights to the numeric values returned. The overall mark assigned by Ceilidh was a
composite of the weighted marks. Invocation of marking tools occurred through a
marking action, a configuration file in which the marking tools to be called, together
with the corresponding weight for each, were defined [Ta02]. Marking actions were
created for each exercise depending upon which marking tools were required. This
innovation was crucial in the development of Ceilidh as an environment in which
multiple exercise types could be assessed. New exercise domains could be assessed
providing that marking tools could be written to achieve the task. Marking tools
were free to make use of pre-existing software, including UNIX tools; this proved to
be of major practical benefit since common operations did not have to be coded from
scratch.
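The role of a marking action can be illustrated with a minimal sketch, assuming each marking tool returns a mark out of 100 and the marking action maps tool names to weights. The tool names and function are hypothetical, not Ceilidh’s actual configuration format:

```python
def composite_mark(marking_action, results):
    """Minimal sketch of Ceilidh's weighted composite marking: the
    marking action defines which tools are called and the weight of
    each; the overall mark is the weighted composite of the numeric
    values the tools return.  Names here are illustrative only."""
    total_weight = sum(marking_action.values())
    return sum(results[tool] * w for tool, w in marking_action.items()) / total_weight

# Hypothetical marking action and tool results for one exercise:
action = {"typography": 1, "dynamic_correctness": 3}
marks = {"typography": 80, "dynamic_correctness": 60}
```

With these assumed values, the composite is (80 × 1 + 60 × 3) / 4 = 65, so the heavier weighting on dynamic correctness pulls the overall mark towards that tool’s result.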
Another key abstraction within Ceilidh was the distinction made between dynamic
and static marking tools [FHT+99]. Static metrics were responsible for analysing
student source code and examining, for example, typography, complexity and
program structure. Dynamic metrics executed the student program and used predefined test data, examining program output and allocating marks based on such
criteria as dynamic correctness and dynamic efficiency. Both kinds of marking tools
made use of an expression recogniser known as Oracles [ZF92], which used an
expanded Regular Expression notation to check for the presence (or absence) of
defined tokens.
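The kind of presence/absence check performed by Oracles can be sketched with ordinary Python regular expressions standing in for its expanded notation. This is a minimal illustration under that substitution, not Oracles’ actual interface:

```python
import re

def oracle_check(output, must_match=(), must_not_match=()):
    """Hedged sketch of an Oracles-style check: the submission passes
    if every required pattern is present in the program output and
    every forbidden pattern is absent.  Plain regular expressions
    stand in for Oracles' expanded notation."""
    ok = all(re.search(p, output) for p in must_match)
    ok = ok and not any(re.search(p, output) for p in must_not_match)
    return ok
```

A dynamic marking tool might, for instance, require the pattern `Result:\s*\d+` in the output of a test run while forbidding any occurrence of `Error`.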
Although these ideas were originally put to the use of assessing coursework in
imperative languages, similar ideas were successfully applied in other areas, such as
assessing Prolog [MGH98], Z [FSZ97] and UNIX software tools [FHG96]. Apart from
these courses and the central ones for imperative programming (C, C++), other
courses created for Ceilidh assessed exercises in Pascal, SML and SQL [FHT+99].
Work was also undertaken on the assessment of Object Oriented Analysis and
Design in the guise of the TOAD subsystem [FHT+99]; however, Ceilidh was
superseded soon afterwards.
3.3.1.5 Review of Ceilidh
Benford et al [BBF+93] provide an overview of their experiences using the Ceilidh
system, whilst Tsintsifas [Ta02] documents the necessity of Ceilidh’s supersession by CourseMarker. The Ceilidh system was built as a necessary response to practical
circumstances and not as a research project; the authors argue that the result of this is
that Ceilidh evolved to meet the actual needs of teachers and students rather than
being a hollow prototype. In terms of its effect on students, Ceilidh was seen to have advantages in: confidence building, since early simple exercises result in positive feedback which boosts the confidence of students, especially the less able; providing assistance to weaker students, since Ceilidh’s statistics packages can be used to spot struggling students earlier than would otherwise be the case; and consciousness raising, since immediate automated feedback brought about an increased willingness in students to question the marks they were given and the criteria being applied to mark their work, hence improving both the quantity and quality of discussion with students. Ceilidh was also seen to encourage students to manage their workload.
Ceilidh’s disadvantages for students were in encouraging the phenomena of
perfectionists (those who would continue to submit even after a satisfactory mark had
been achieved in an attempt to gain a mark even closer to 100%, hardly an optimum
use of time) and gamblers (those who would submit many times with varying
modifications in an attempt to ‘stumble’ upon a good mark rather than considering
the problem logically). To combat these trends, a feature was introduced allowing the
teacher to define a minimum delay between submissions for each student.
Positive effects on teaching were the most predictable: Ceilidh resulted in a reduction
in marking time and proved to be an efficient course administration system.
Negative effects on teaching were the very high raw marks which resulted from the
combination of continuous assessment and multiple submissions; this posed
problems in differentiating between candidates since the assessment served a
summative purpose. Marks were also found to be tightly grouped. The authors
argued that this means that Ceilidh’s use presents fewer problems where criterion
assessment [Kp01] is to be used, i.e. in the first, qualifying year of an undergraduate
degree. Where normative assessment is to be used, Ceilidh usage might present
additional problems. It is worth noting that in the case of formative assessment, high
raw marks indicate success rather than a problem.
Typical usage at other institutions was to use the system but to modify the structure
of the courses provided to meet institution-based practices. Marking with Ceilidh is
seen to be equitable, incremental and redeemable, while Ceilidh offers facilities to
detect plagiarism.
Ceilidh was, however, eventually deemed to have considerable limitations. Foxley et
al [FHH+01] point to the fact that Ceilidh was difficult to install and maintain as
considerable knowledge of the UNIX operating system was required. The fact that
Ceilidh was based in UNIX limited the number of possible installation bases. Also,
while Ceilidh’s assessment mechanisms were seen as powerful, its level of feedback
to students was limited. Popularity of the system among students was also
hampered by the fact that, for many years, Ceilidh’s interface was based upon ASCII
character terminals (fig 3.5).
Figure 3.5: Ceilidh’s dumb terminal interface [Sp06]
Furthermore, Tsintsifas [Ta02] saw Ceilidh as unsuited to the integration of diagram-based assessment into CBA. Ceilidh’s system dependencies were seen as too
constricting to accommodate the range of exercise types which could be constructed
within the domain of diagrams. The Ceilidh system had performance, scalability,
extensibility and maintainability problems due to its lack of initial design; its
architectural limitations in particular were considered serious enough to decrease the
feasibility of diagram-based assessment.
As a result of these identified weaknesses, a complete redesign of the Ceilidh system
was undertaken. The result was the CourseMarker system, which is examined in
detail in section 3.3.2.
3.3.2 CourseMarker
The shortcomings of Ceilidh, as outlined in section 3.3.1.5, eventually led to the
creation of a new system to support the full lifecycle of CBA. This system was
originally entitled the Ceilidh CourseMaster System [FHH+01], often shortened to
CourseMaster, and was later renamed simply CourseMarker. This section will briefly
describe CourseMarker and explain why it provides a suitable platform for the
implementation of diagram-based CBA, including for formative assessment. Many of
the ideas in the system were based upon the most successful aspects of Ceilidh and
so this section will concentrate on those aspects of CourseMarker which are different
from, or expanded upon, ideas from Ceilidh rather than re-stating those ideas
already considered in section 3.3.1. Furthermore, this section will not concentrate on
the DATsys diagramming system which is integrated into CourseMarker; this will be
covered in section 3.3.3.
3.3.2.1 CourseMarker’s Development Overview
Ceilidh’s creation as a direct response to the needs of programming had some
advantages, but ultimately the lack of design and coherent planning were major
factors in the need to replace the system. It was therefore decided from the outset
that CourseMarker would be conceived using object-oriented (OO) methods,
drawing on theory from OO frameworks and design patterns, to maximise usability,
maintainability and extensibility [FHH+01]. Furthermore, it was decided to develop
CourseMarker in the Java object-oriented programming language as this rendered
the system platform-neutral, hence increasing the range of potential installation bases
over Ceilidh. Certain aspects of CourseMarker rely on UNIX-like “tools”; however,
these can be simulated in Windows systems through the use of the freely-distributed
Cygwin packages [Cyg98]. The vast majority of Ceilidh functionality was duplicated
within CourseMarker. A more extensive comparison of Ceilidh functionality with
CourseMarker functionality is provided within [FHH+01].
3.3.2.2 CourseMarker’s Architecture
CourseMarker’s architecture was expanded from that of Ceilidh in an attempt to
reorganise functionality in a more extensible way. Commonalities and variations
between the tools and data layer were identified [Ta02], with the commonalities
abstracted into class hierarchies and the variation represented by extension points
and parameterisation. Seven logical parts were identified and are represented within
CourseMarker as servers. The client layer now communicates with Login, Ceilidh,
Course and Submission servers, whilst Auditing, Marking and Archiving servers
provide more specific functionality.
The Login Server is responsible for registering users, validating sessions and student
logins, and registering when a student has logged out; the Ceilidh Server returns the
structure of a course, manages the servers and can be used to reload servers at
runtime; the Course Server returns the list of modules available to the user together
with the module information and setup exercises; the Submission Server is responsible
for submission attempts and receipts, together with the submission of exercises; it
communicates with the Marking Server for the marking of exercises and the Archiving
Server, which maintains audit trails, after exercises have been marked.
Throughout the CourseMarker design, considerable effort has been expended to
ensure that the system, and communications within it, are secure; this is considered
especially necessary because CourseMarker is used for summative as well as
formative assessment. RMI is used for convenient distribution. CourseMarker
supports a range of auditing facilities, generates unique session keys for clients
which are validated on every transaction and can use DES password encryption for
the transmission of passwords between clients and servers. In their overview of the
CAA field, Rawles et al [RJE02] single out CourseMarker as an example of a CAA
system which addresses security “unusually” well. CourseMarker security is
considered in more detail in [HGS+06].
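The session-key scheme described above can be sketched in Java. This is a minimal illustration only: the class and method names (SessionRegistry, login, validate) are assumptions made for the sketch, not CourseMarker’s actual API, and a real deployment would additionally encrypt passwords and audit every transaction as described.

```java
import java.security.SecureRandom;
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch (not the actual CourseMarker API) of issuing a
// unique session key per client login and validating it on every
// subsequent transaction.
class SessionRegistry {
    private final Map<String, String> keysByUser = new HashMap<>();
    private final SecureRandom random = new SecureRandom();

    // Issue a fresh, unpredictable key when a user logs in.
    String login(String user) {
        String key = Long.toHexString(random.nextLong());
        keysByUser.put(user, key);
        return key;
    }

    // Every transaction presents its key; anything else is rejected.
    boolean validate(String user, String key) {
        return key != null && key.equals(keysByUser.get(user));
    }

    // Logging out invalidates the key immediately.
    void logout(String user) {
        keysByUser.remove(user);
    }
}
```

A client presenting a stale or fabricated key would thus fail validation on its next transaction.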
3.3.2.3 CourseMarker’s Course Structure
The logical abstraction of a Course as being subdivided into Units, each of which is,
in turn, subdivided into Exercises remains unaltered in CourseMarker. Data is
organized according to a hierarchical directory structure analogous to this logical
abstraction. Course directories contain a subdirectory for each Unit as well as
information files such as course notes. Unit directories contain subdirectories for
Exercises and other information files. Exercise directories contain exercise files.
Symeonidis [Sp02] provides a complete specification of this structure, including the
files which must be present at Course, Unit and Exercise level.
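As a purely illustrative example of this hierarchy (the directory and file names below are invented; [Sp02] specifies the files actually required at each level), a course area might be laid out as:

```text
CourseArea/
└── JavaProgramming/          Course directory
    ├── notes.txt             course-level information file
    ├── Unit1/                Unit directory
    │   ├── unit.txt          unit-level information file
    │   ├── Exercise1/        Exercise directory: exercise files
    │   └── Exercise2/
    └── Unit2/
        └── Exercise1/
```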
3.3.2.4 CourseMarker’s User Views
The CourseMarker system has five types of users: students, tutors, teachers,
developers and system administrators. Whilst direct access to CourseMarker server
commands can still occur through a command-line interface, most users will never
see this in their use of the system. CourseMarker clients have a GUI for use by
students [FHH+01], and a web interface has also been developed. A further
web interface has been developed for the use of system administrators [FHS+01].
3.3.2.5 CourseMarker’s Marking Tools and the Generic Marking System
CourseMarker was created concurrently with the diagramming subsystem DATsys,
described in section 3.3.3, and so the marking system was developed with the full
knowledge that a potentially very large number of domains would need to be
marked. Tsintsifas states of the system [Ta02] that: “Devising a prototype mechanism
that allows experimentation and creation of novel automatically assessable and across
domains diagram CBA is an important deliverable. By using this, metric research for the
evaluation of diagram-based coursework could be realistically tested in the context of the
classroom.” With CourseMarker many more domains would have to be marked than
with Ceilidh, and the potential exists for the marking of new domains to become a
semi-regular occurrence if diagram-based CBA becomes widely used. Therefore a
marking mechanism had to be designed which would be both extensible and
expressive. Furthermore, the marking mechanism had to enable the creation of
detailed feedback since this was a perceived weakness of Ceilidh. It was therefore
necessary that marking be more flexible and generic than in Ceilidh, and able to be
configured to mark a large number of domains [Ta02]. Integration of external tools
was key to the marking success of Ceilidh and had to be supported here too.
The design of the marking mechanism was based upon Ceilidh’s system of marking
tools. A Marking Scheme is used to describe the marking of an exercise, calling upon
Marking Commands, which in turn use Marking Tools to mark aspects of the
solution, return marks and generate a Marking Result which contains feedback to the
user which is richer than that returned by Ceilidh. Marking Tool Configurations,
which are exercise-specific, exist to specialise the marking tool to the requirements of
the exercise. A full conceptual overview of each of these components is provided by
[Ta02]. The appearance of rich feedback, based upon the Marking Result, was
completed by the use of a GUI representation within the student CourseMarker
client [FHS+01], as illustrated in figure 3.6.
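The relationship between these components can be sketched as follows. The types, the equal-weighting scheme and the string-based configuration are illustrative assumptions for the sketch, not CourseMarker’s actual classes.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the marking pipeline: a MarkingScheme runs
// MarkingCommands; each command applies a MarkingTool, specialised by an
// exercise-specific configuration, and yields a MarkingResult carrying
// both a mark and feedback for the student.
interface MarkingTool {
    MarkingResult mark(String solution, String config);
}

class MarkingResult {
    final int mark;          // percentage awarded by this tool
    final String feedback;   // feedback text returned to the student
    MarkingResult(int mark, String feedback) {
        this.mark = mark;
        this.feedback = feedback;
    }
}

class MarkingCommand {
    private final MarkingTool tool;
    private final String config;  // exercise-specific tool configuration
    private final double weight;  // contribution to the overall mark
    MarkingCommand(MarkingTool tool, String config, double weight) {
        this.tool = tool;
        this.config = config;
        this.weight = weight;
    }
    MarkingResult run(String solution) { return tool.mark(solution, config); }
    double weightedMark(String solution) { return weight * run(solution).mark; }
}

class MarkingScheme {
    private final List<MarkingCommand> commands = new ArrayList<>();
    void add(MarkingCommand c) { commands.add(c); }
    // The overall mark is the weighted sum over all commands.
    double markAll(String solution) {
        double total = 0;
        for (MarkingCommand c : commands) total += c.weightedMark(solution);
        return total;
    }
}
```

Because each tool is reached only through the MarkingTool interface and its configuration, new domains can be marked by supplying new tool implementations without altering the scheme itself.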
Figure 3.6: The Java CourseMarker client [Sp06]
3.3.2.6 Experiences with CourseMarker
CourseMarker, which is commercially distributed, has been purchased by more than
15 institutions and used in classes of up to 1500 students [Ta02]. This section will
consider experiences with CourseMarker at the University of Nottingham, its
development base, before considering comments made by those from other
institutions.
CourseMarker was first used at Nottingham as a replacement for Ceilidh during the
academic year 1998-99. Tsintsifas [Ta02] provides a general evaluation of
CourseMarker in March 2002 including an examination of technical improvements
over Ceilidh. CourseMarker’s primary use at the University of Nottingham is with
the assessment of first year courses in Java programming; to this end two courses
have been created with exercises regularly updated year-on-year. Courses assessed
by CourseMarker involved 150 students at the University in 1998-99; this had risen to
310 students in 2001-02. During 1998-99 (the transition year) some students had used
both CourseMarker and Ceilidh depending upon their year of entry and their
selected courses. These students were asked to compare their experiences of the two
systems, and resoundingly preferred CourseMarker to Ceilidh, mainly due to its
Graphical User Interface and its expanded range of feedback [FHH+01]. Tsintsifas
reports on students who were asked simply to evaluate CourseMarker (rather than to
compare it with Ceilidh), and states that returned questionnaires indicate that
students were largely in favour of the system, especially due to the immediate
feedback, availability of multiple submissions and the ability to submit the
coursework at their own pace within allotted deadlines.
Teachers and administrators view the system favourably. Teachers “appreciate the fact
that they no longer have to mark hundreds of exercise solutions. Because course
administration and monitoring are very effective, even less time is spent on these activities”
[Ta02], whilst administrators find that the system is easier to set up and run than
Ceilidh, especially when use is made of the administrator’s web interface.
Outside Nottingham, the majority of sites are old Ceilidh users who made the
transition to CourseMarker. Tsintsifas [Ta02] quotes positive reports from academic
users in Singapore and Glamorgan, UK, illustrating that CourseMarker is viewed as
a success outside Nottingham. An evaluation of the usefulness of CourseMarker
when compared to other automatic assessment methods is also made by Foster
[Fj01]. Foster argues that the benefits of the system are strong; especially praised,
once again, is the fully automated, fast marking which makes multiple student
submissions feasible. Foster further states that CourseMarker is cheap
given the amount of marking time saved. Foster’s reservations about the system
include: that novel solutions may be penalised by the system; that extra functionality
above the question specification will not be rewarded by the system; that the system
documentation is regarded as “patchy” and that, as a commercial product, the
system is distributed as an executable only, meaning that inspection of the system
source or further modifications cannot be undertaken by those purchasing the
system. Foster argues that most of the problems he outlines are unsurprising given
CourseMarker’s nature as a research project which was later simply distributed
commercially, and concludes that “[t]he fact that we are still using CourseMaster, and
will continue to do so, is a tribute to the considerable benefit that it does have.”
To summarise, therefore, CourseMarker has been successfully introduced at the
University of Nottingham and at a number of external institutions. The amount of
workload it carries has increased year-on-year at Nottingham, and reports on the
system by students (both those who had previously used Ceilidh and those who had
not) are generally of a positive nature. Teachers and administrators find the system
easier to use than Ceilidh, and the system has considerable advantages in terms of
marking time saved.
3.3.3 DATsys
DATsys was developed as the main deliverable of Tsintsifas’ PhD [Ta02]. Tsintsifas
identified the need for diagram-based CBA. CBA applications developed thus far did
not address the assessment of diagram-based domains, while existing diagramming
packages had not been designed with CBA in mind. This section will first consider
the requirements for diagram-based CBA which Tsintsifas identified and which
formed the basis for his approach before looking at the main deliverables produced.
These deliverables include the DATsys framework for diagram-based CBA, the
Daidalos environment for authoring diagram notations, the Ariadne environment for
exercise authoring and the Theseus customisable student diagram editor [Bb03]. The
final deliverable, the Generic Marking Mechanism, has already been examined in
section 3.3.2.5. This is an indication of a key point in the development of DATsys,
namely that DATsys was not conceived as an addition to CourseMarker to be simply
“bolted-on” afterwards; instead, CourseMarker and DATsys were developed in coordination with each other. As a result, the need for a Generic Marking Mechanism
for the successful CBA of diagrams was identified and implemented in
CourseMarker from the beginning and is now used in the assessment of
CourseMarker’s other CBA domains (primarily programming exercises) as well as
for diagram-based CBA. CourseMarker and DATsys are indelibly interlinked, and no
true understanding of the one can be achieved without an appreciation of the context
of the other.
Tsintsifas identified three major requirements to solve the problem of developing
useful diagram-based CBA. These were: the ability to author the editor used by the
student to develop solutions in an exercise-specific way during the authoring of the
exercise; development of a Generic Marking Mechanism which can be suitably
customised to enable the marking of a wide range of diagram types; and the
integration of a system which meets the previous two requirements into a system
which can support the full lifecycle of Computer Based Assessment. The Generic
Marking System and CourseMarker CBA system, both described in section 3.3.2,
were developed to fulfil the second and third requirements respectively.
DATsys was created as an object oriented framework, defined by Gamma et al as “a
set of cooperating classes that make up a reusable design for a specific class of software”
[GHJ+94]. Daidalos, Ariadne and Theseus, examined in the next three subsections
respectively, are implemented as concrete subclasses which enable the functionality
of the DATsys framework to be used in a CBA context.
3.3.3.1 Daidalos
Daidalos allows the authoring of specifications for diagram notations. It is therefore
used by developers to author diagram domains before they can be assessed. Daidalos
defines tools for the creation of figures, diagram elements, tools and commands, as
well as a selection editor which allows domain libraries of diagram notations to be
managed. Tsintsifas argues that Daidalos “could be considered a meta-diagrammer, as it
provides a graphical process for making parts of new diagram editors” [Ta02].
Figure 3.7: A range of diagram notations expressed within DATsys
Developers using Daidalos to author diagram domain notations can define diagram
elements (in terms of their graphical view, underlying data model and connectivity
constraints), tools and their interaction with the diagram elements, and menu options
and the commands they execute. Developers create libraries of tools which are stored
in diagram library files (with a .dlib extension). Library management functions
allow library files to be arranged into groups, with each group representing a
diagram domain. A tool’s graphical view is created by the grouping of graphical
primitives on the drawing canvas. The data model for the tool is specified by the
addition of typed data fields. The associated connectivity constraints are specified by
choosing either perimeter-based connections or pin-based connections (which can
themselves be further specialised through specification of connection lines). In this
way the representation of a diagram domain is constructed interactively and in an
intuitive, graphically-based environment. Exercises within the domain can then be
developed within the Ariadne exercise authoring environment. Daidalos is
effectively a standalone application with no integration with CourseMarker; it can be
used to author a wide variety of diagram notations as evidenced by figure 3.7.
3.3.3.2 Ariadne
Ariadne allows the authoring of specifications for CBA exercises within a diagram
domain previously specified within Daidalos. Within Ariadne the student diagram
editor, exercise properties and marking scheme can all be specified. Ariadne loads a
default group of tool libraries and the existing exercises. If these exercises have
already been deployed within CourseMarker then they will be loaded from the
course area.
Teachers use Ariadne for the specification of exercises. The student diagram editor is
specified in terms of its available tools (taken from tool library files) and available
options. Authors select the correct tools from the tool libraries. The marking scheme
and marking tool configuration can be edited within text-based windows and
configuration for the exercise can be specified. It is possible to draw a model solution
upon Ariadne’s drawing canvas. Again, therefore, exercises can be developed within
an interactive, intuitive graphical environment. Student solutions are then entered
online using the configured version of Theseus and then marked by
CourseMarker according to the specification defined by the teacher within Ariadne.
3.3.3.3 Theseus
Theseus is the configurable student diagram editor within which students develop
their solution before submission. All of Theseus’ features, including the tools
available and available options, are defined through configuration. Theseus relies for
its configuration upon three configuration files: the first provides the exercise-specific
tool library, the second provides the tools to be placed on the toolbar and the
third provides configuration for Theseus’ execution parameters and working paths.
The tool library, as developed within Daidalos, contains all the tools available to
solve the exercise. The students thus place tools on the canvas and attempt to connect
them, interacting with the tools, diagram elements, menu options and canvas in the
process. Upon completion, the students save the solution as a diagram file (with a
.draw extension). The students then submit their solution through CourseMarker,
which is responsible for marking the solution and returning appropriate feedback.
3.3.3.4 Integration of DATsys with CBA courseware
The level of integration of DATsys with CourseMarker differs across the different
sections of DATsys. The Generic Marking Mechanism is fully a part of CourseMarker
and is used for the marking of all exercise types, even those with no diagrammatic
content.
Conversely, Daidalos has been designed to operate completely independently of
CourseMarker. Ariadne requires access to CourseMarker’s
‘CourseArea’ directories, where exercise configuration files are stored. Ariadne has
knowledge of the files that describe this exercise configuration for exercises in
diagram-based domains.
CourseMarker’s exercises have a designated ‘type’ within their configuration, each
type being associated, within CourseMarker, with an editor suitable to the exercise
domain. Diagramming exercise types have Theseus as their registered editor within
CourseMarker; consequently, when students elect to develop their solutions from
within CourseMarker the configured Theseus student diagram editor for the exercise
is loaded, using the configuration files generated by Ariadne.
When a student solution is saved within Theseus, all of its diagram details are
saved. These details can be processed by the diagrammatic marking tools which are
accessible to the Generic Marking Mechanism. Traversing, translating, converting
and understanding the diagram can all be achieved by these tools. Translating the
structure of the diagram involves associating identifiers with each of the nodes
within the diagram and the relationships between them. Thus the contents of the full
diagram objects are available to the marking tools.
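A hedged sketch of this translation step, assuming a simple labelled-graph representation (the types below are illustrative, not DATsys’ actual diagram model): each node is assigned an identifier, each link records the identifiers it connects, and a traversal primitive lets a marking tool walk the structure.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative translation of a saved diagram into identifier-keyed
// nodes and links, so marking tools can traverse the structure.
class DiagramGraph {
    private final Map<Integer, String> nodes = new HashMap<>(); // id -> node label
    private final List<int[]> links = new ArrayList<>();        // {fromId, toId}
    private int nextId = 0;

    // Associate a fresh identifier with each node as it is read in.
    int addNode(String label) {
        int id = nextId++;
        nodes.put(id, label);
        return id;
    }

    void addLink(int from, int to) { links.add(new int[] { from, to }); }

    // Nodes reachable from a given node: a minimal traversal primitive.
    Set<Integer> neighbours(int id) {
        Set<Integer> out = new HashSet<>();
        for (int[] l : links) if (l[0] == id) out.add(l[1]);
        return out;
    }

    String label(int id) { return nodes.get(id); }
}
```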
3.3.3.5 Experiences with DATsys
Diagram-based CBA exercises were developed in three domains: logic design,
flowchart design and object-oriented design. These exercises were deployed in 1999
to a group of 167 first year computer science undergraduates as part of a module on
“Software Tools”. Experience has shown that the process of developing exercises is
relatively lengthy but straightforward. Tsintsifas describes the complete process of
developing the exercises in each of these domains in his thesis [Ta02]. These exercises
proved popular with students since the development environment was intuitive and
they had been given a brief demonstration of Theseus in a lecture prior to attempting
the exercises. The most complex of the exercises within the object oriented design
diagram-domain had the unexpected side-effect of causing some students to draw
pen-and-paper solutions before transferring them online, but in most cases
students worked out their solutions and entered them directly into Theseus.
Performance at both the client and server level was good; the server was seen to
mark up to 15 submissions simultaneously. As a result of this live experience
Tsintsifas was able to conclude that diagram-based CBA was both feasible and
useful.
Unfortunately, this is the only documented example of diagram-based CBA being
used in a live environment prior to this work. Despite the number of institutions
which have taken CourseMarker (and previously Ceilidh), none outside Nottingham
have yet used diagram-based CBA. It is therefore hoped that the implementation of
formative diagram-based CBA as a result of this work will stimulate a wider usage.
A key deliverable of this work, indeed, will be to produce examples of working
formative diagram-based CBA and test their use in a live environment.
3.3.4 Summary
Ceilidh was an influential CBA system which was responsible for several key
innovations. The first was its multi-layer architecture, separating the client layer and
database of course information and submissions away from the architecture of the
system and hence allowing multiple courses in disparate domains to be housed
simultaneously in a hierarchical structure. The second was its use of marking tools to
allow marking of different domains to be specified on a per-domain basis. Ceilidh
was widely used and liked, but became difficult to maintain and was eventually
superseded by CourseMarker. CourseMarker offers a stable, reliable and secure
platform on which to provide CBA courses across multiple domains, and its user
interface is attractive and intuitive to students.
CourseMarker’s feedback mechanism allows feedback comments to be specified for
each test conducted in the automated assessment. The DATsys framework for
diagram editors in a CBA context provides a flexible platform for the authoring of
new diagram notations, the authoring of exercises across multiple domains and the
presentation of a configurable development environment for students. Prior to this
work, exercises had been developed in three domains but usage had not been
extensively reviewed or tested.
3.4 Summary
This chapter provided an overview of CBA systems used for formative assessment
and CBA systems used to assess diagram-based domains. Most CBA systems are
fixed-response. Section 3.1 first examined a cross-section of fixed response CBA
examples, both those built upon established platforms and those built “in-house”
from scratch and concluded that the design of the feedback provided to the student is
at least as important as the platform upon which the assessment is based. Systems
built upon the same platform (e.g. QuestionMark) were found to exhibit variation in
the quality of formative feedback they provided. Linking to reference materials in
feedback was found to motivate subsequent student research. Student motivation
could be encouraged further by the use of a two-part assessment strategy, allowing
re-submissions and ensuring easy student access to the CBA system. A cross-section
of free-response CBA was then examined. A trade-off was noted between fully
automating the assessment process and allowing human marker input into stages of
the process. Fully automated assessment allows multiple submissions, the fast
provision of student feedback and formative assessment opportunities at the expense
of academic “prescriptiveness”, whereas systems involving human intervention are
able to cope better with novel solutions at the expense of speed and formative
assessment opportunity in a test-feedback-retest situation. Section 3.2 examined CBA
systems based around diagrammatic domains. All of the systems considered focused
on a specialist domain or set of domains. Two of the systems restricted student
interaction to the point where their status as free-response CBA systems was
debatable. Two of the systems allowed free responses by students. Section 3.3 began
by providing an overview of the historically important CBA system Ceilidh,
documenting the continuing advantages and influence of its multi-tier architecture
and devolved marking tools. Section 3.3 then provided an in-depth summary of the
CourseMarker CBA system and DATsys framework for diagram editors in a CBA
context, which will be used as the basis of this work. Chapter 4 documents an
attempt to conduct formative assessment using CourseMarker / DATsys, specifically
with the intention of identifying the limitations of existing CBA techniques in
relation to formative assessment.
Chapter 4
Problems in CBA applied to free-response formative assessment
Introduction
This chapter presents the initial practical research experiment conducted in order to
identify those aspects of existing CBA techniques which would need to be extended
or adapted in order to meet the criteria of good formative assessment. Section 3.3
described the CourseMarker and DATsys systems. DATsys is a flexible, object-oriented
framework for CBA-related diagram editors. CourseMarker is a reliable
platform for conducting CBA across a variety of domains. Previous courses assessed
using CourseMarker were generally for summative assessment purposes, or else had
a dual purpose. This research aimed to conduct formative assessment using
CourseMarker / DATsys as a model CBA system. Conclusions could be drawn in
terms of positive and negative experiences. The drawbacks would be used to identify
those aspects of existing CBA techniques which need to be extended or adapted in
order to meet the criteria of good formative assessment, as outlined in Chapter 2.
Coursework involving the construction of entity-relationship diagrams was assessed
using CourseMarker / DATsys as part of an undergraduate module in Database
Systems and a new marking tool for assessing entity-relationship diagrams within
CourseMarker was developed [HB06]. This work aims to develop a framework of
best practice for formative assessment across a variety of diagram-based domains
rather than be restricted to domain-specific instances. Therefore, an attempt was
made to keep the tools constructed as generic as possible to maximise inter-domain
potential. Entity-relationship diagrams were assumed to be part of that large section
of educational diagrams which are constructed from nodes and the links between
them, as per the definition in section 2.3.1. These are the domains which Thomas et al
[TWS05] have labelled the “network-like domains”. Marking, therefore, consisted of
features testing in which the nodes and links of the student diagram were assessed
according to feature criteria defined by a domain expert. Results were collected and
conclusions drawn.
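A minimal sketch of such features testing, assuming that node and link features are represented as sets of labels and weighted equally (both assumptions are illustrative simplifications, not the behaviour of the marking tool developed here):

```java
import java.util.Set;

// Illustrative features testing for a node-and-link diagram: the
// student's node and link labels are checked against required features
// defined by a domain expert, and a proportional mark is awarded.
class FeatureMarker {
    static double mark(Set<String> studentNodes, Set<String> studentLinks,
                       Set<String> requiredNodes, Set<String> requiredLinks) {
        int found = 0;
        int total = requiredNodes.size() + requiredLinks.size();
        for (String n : requiredNodes) if (studentNodes.contains(n)) found++;
        for (String l : requiredLinks) if (studentLinks.contains(l)) found++;
        // Award the fraction of required features present, as a percentage.
        return total == 0 ? 100.0 : 100.0 * found / total;
    }
}
```

A real ER marking tool must of course do more than exact label matching (synonyms, cardinalities and attribute placement all matter), which is part of what the rest of this chapter investigates.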
4.1 Assessment Background
Since the experiment aimed to implement a positive formative assessment experience
using CBA techniques, it is evident that the assessment should constitute a Computer
Based Assessment strategy and adhere to formative assessment practice.
4. Problems in CBA applied to free-response formative assessment
When defining CBA in relation to other areas of learning technology in section 2.1.1,
it was emphasised that CBA is the most specialised of the areas considered. The full
lifecycle of a CBA exercise includes stages such as authoring the exercise, presenting
it to the student, accepting submissions, returning marks and managing the data
generated by the system [Ta02]. Since a true CBA system should be committed to
automating the entirety of this lifecycle, it is clear that a CBA system is, by definition,
non-trivial. CourseMarker and DATsys, described in section 3.3, were developed to
manage the full lifecycle of Computer Based Assessment and to allow the CBA of
diagram-based domains. Section 2.2.5 outlined a strategy for good formative
feedback. It is against these criteria that the CBA assessment experiment feedback
and student experience will be measured.
The assessment was conducted as part of a compulsory course in Database Systems
taken by second year Computer Science undergraduates at the University of
Nottingham. The system attempted to formatively assess student Entity-Relationship
diagrams as part of coursework in which students developed their diagrams at an
early stage, before moving on to successive tasks involving the construction of SQL
query statements (which were further assessed by CourseMarker using methods
unrelated to this work).
The coursework constituted a two-part assessment. As defined in section 2.2.3, this
means formative assessment with a linked summative element added at the final
stage to act as a motivator. An initial problem was presented under purely formative
conditions. The students were allowed an unlimited number of submissions and
were provided with unlimited help from lab assistants in weekly lab sessions. A
second problem was then presented to the students with a summative element:
although unlimited submissions to CourseMarker were still allowed, help from lab
assistants was limited and students were expected to copy the final diagram into a
pre-designated submission form for final, summative marking. This structure was
agreed with, and influenced by, the module lecturer. The question texts were also
developed by the module lecturer. This ensured, firstly, that the exercises were
useful since they had been set by a subject specialist and, secondly, that they did not
unconsciously play to the strengths of the system whilst hiding weaknesses.
For this experiment, informal student questionnaires were distributed and tutor
observations noted.
4.2 Assessment Construction and Methodology
4.2.1 Assessment Construction
As described in section 3.3.2.5, in CourseMarker a Marking Scheme is used to
describe the marking of an exercise. This calls upon Marking Commands, which in
turn use Marking Tools to mark aspects of the solution, return marks and generate a
Marking Result. For the marking of student submissions a new Marking Command,
the EntityRelationshipCMD, was created together with a new Marking Tool, the
EntityRelationshipTool.
The approach to marking was based upon an assessment of
diagram features, in which the types of nodes and their connections were assessed
against criteria provided by the exercise developer. Tsintsifas [Ta02] had developed a
similar system for the marking of his trial OO diagramming course; however the
approach here was considerably extended to allow the student increased flexibility.
Figure 4.1: Uneditable nodes and distracters in Tsintsifas’ OO exercise
Figure 4.2: Generic nodes in the E-R exercises with editable text
In Tsintsifas’ course all possible diagram elements were provided as complete,
uneditable entities with incorrect entities also included as distracters, as shown in
Figure 4.1. It was felt that in the context of the Entity Relationship diagrams such a
method, even with distracters, would serve to provide the students with too much
help, especially in light of the fact that an initial problem for the students in the set
coursework was to correctly identify Entities, Attributes and Relationships from the
problem description.
Instead, the students were provided with generic diagram elements for Entities,
Relationships and Attributes, together with a tool to edit the text within each element
on the canvas to the string of their choice. The tool library for these exercises was
constructed in Daidalos, and is illustrated in figure 4.2.
Within DATsys, a figure can be composed of any number of primitives (such as lines)
together with any number of figures, recursively. Each figure has an attribute Name
which can be used to distinguish it from other figures. In figure 4.2, from the left, are
the standard pointer used to highlight elements on the drawing canvas, followed by
figures representing the text tool, Entity figures, Relationship figures, Attribute
figures, one-to-one connection lines, one-to-many connection lines and many-to-many connection lines. Each type of figure was given a different Name, and the latter three figures were each defined as connection lines.
The EntityRelationshipTool worked from the assumption that each diagram node
was a composite figure whose members included a text field called TextElement;
indeed, this would always be true since the original elements were authored this way
in Daidalos. Each node could, therefore, be identified in terms of two attributes:
• Name: the name of the node, for example Relation, Entity or Attribute;
• Text Content: the contents of the editable TextElement.
Hence an Entity containing the text “Artist” could be distinguished from both an Entity containing “Album” and an Attribute containing “Artist”.
Connection lines, by contrast, were identifiable in terms of three attributes:
• Name: the name of the connection, for example Onetoone;
• Start node: the node connected to the start of the connection line;
• End node: the node connected to the end of the connection line.
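The two-attribute identification of nodes and the three-attribute identification of connection lines can be sketched as a small data model. This is an illustrative Python sketch only (the actual tool was implemented in Java within CourseMarker), and the class names are assumptions, not DATsys identifiers:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DiagramNode:
    name: str          # figure Name, e.g. "Entity", "Relationship", "Attribute"
    text_content: str  # contents of the embedded, editable TextElement

@dataclass(frozen=True)
class ConnectionLine:
    name: str           # e.g. "Onetoone"
    start: DiagramNode  # node at the start of the line (direction-specific)
    end: DiagramNode    # node at the end of the line

# The two attributes together distinguish otherwise similar elements:
artist_entity = DiagramNode("Entity", "Artist")
artist_attr = DiagramNode("Attribute", "Artist")
assert artist_entity != artist_attr
assert artist_entity != DiagramNode("Entity", "Album")
```

Because each connection line records distinct start and end nodes, the directionality noted below falls out of the representation for free.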
The procedure for connecting two nodes on the canvas is simple. First, a node is
selected from the library and positioned on the canvas with a single click. The text
can be edited using the Text tool and the node can be further repositioned by
highlighting and dragging. This procedure is repeated for the second node. A
connection line is drawn by selecting the line type in the library, and then “dragging”
the line between the two nodes by depressing the mouse button on the first node,
moving the cursor to the second node and releasing the button. It is important to
note in this context that the start and end nodes for each connection line are clearly
defined and each connection line is effectively directional for marking purposes.
Figure 4.3: An illustrative student ER diagram solution
Potentially, multiple strings could constitute an acceptable Text Content in a student node: an entity may be deemed to be acceptable, for instance, if it contained any of the strings “Artist”, “artist”, “Artiste” or “artiste”. Therefore, the EntityRelationshipTool allowed the mark scheme to specify desired text in terms of
Oracles [ZF92], an extended notation based upon regular expressions which had
already been used successfully in the assessment of programming coursework. A
further keyword, “owt”, was introduced to indicate that any text would be accepted
in a given instance.
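This matching behaviour can be sketched by treating an oracle as an ordinary regular expression, with “owt” as a wildcard. This is a deliberate simplification — the Oracles notation of [ZF92] is an extended notation beyond plain regular expressions — and the function name is hypothetical:

```python
import re

def oracle_matches(oracle: str, text: str) -> bool:
    """Simplified oracle matcher: 'owt' accepts any text; otherwise the
    oracle is treated as a regular expression that must match the whole
    string."""
    if oracle == "owt":
        return True
    return re.fullmatch(oracle, text) is not None

# A single oracle accepts all four spellings discussed above:
assert oracle_matches("[Aa]rtiste?", "artist")
assert oracle_matches("[Aa]rtiste?", "Artiste")
assert not oracle_matches("[Aa]rtiste?", "Album")
assert oracle_matches("owt", "anything at all")
```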
A key aim of the design of the tool library was for the drawing of the solution to be
intuitive to the student. For this reason the unadorned, straight line connector which
officially represented one-to-one connections was also allowed to represent the
connection from an Entity to an Attribute, since the two connections are both
represented by a simple, straight line. An illustrative example of a student solution is
shown in figure 4.3.
The marking tool was invoked by a customised marking scheme (expressed in Java
as described by Symeonidis [Sp02]). Both the submitted student diagram and the
features specification file, mark.er, were passed to the command by the marking
scheme. Within the specification file each line represented an individual features test.
Each individual features test is represented as follows:
• Mark weight : Feature expression : Description : Positive feedback : Negative feedback
The mark weight is an integer denoting the relative significance of the test. The
description is a string containing a description of the test which is available to the
student at the point of feedback. The student receives either the positive feedback or
the negative feedback depending on the outcome of the test. Feature expressions are
the most complex component of the features test and may take one of the formats
described in Table 4.1.
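Assuming the colon-separated line format just described (and that the feedback strings themselves contain no colons), a line of the specification file might be parsed along the following lines. The names are illustrative, not CourseMarker's own:

```python
from dataclasses import dataclass

@dataclass
class FeaturesTest:
    weight: int
    expression: str
    description: str
    positive_feedback: str
    negative_feedback: str

def parse_test(line: str) -> FeaturesTest:
    # Each line of mark.er holds one colon-separated features test;
    # surrounding whitespace around each field is insignificant.
    weight, expr, desc, pos, neg = [f.strip() for f in line.split(":")[:5]]
    return FeaturesTest(int(weight), expr, desc, pos, neg)

t = parse_test("1 : exist Entity : Check for entities in your diagram : "
               "Found : You have no entities at all in your diagram!")
assert t.weight == 1 and t.expression == "exist Entity"
```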
With the exception of compositeRelationship, these feature expressions are generic. To
author other compatible domain notations in Daidalos one would simply ensure that
each diagram element type has a unique name, and that all text which is to be
editable by the student is contained within an embedded TextElement. The
compositeRelationship expression is domain specific because it has knowledge, firstly,
of which connection lines across a Relationship element constitute a complete
relationship and, secondly, the types of the relationships. Further parameterisation of
feature expressions is possible to indicate the number of matches within the diagram
as illustrated in the examples. If no such parameter is provided then the default is to
declare success if one or more matches are found.
exist
Format: exist [Name] — Checks that an element with a given Name exists.
Examples:
exist Relationship — Checks that at least one relationship exists.
exist onetoone — Checks that at least one onetoone connection exists.
exact
Format: exact [Name] [Text] — Checks for an element with given Name and Text Content.
Example:
exact Entity CD — Checks for at least one Entity with Text Content “CD”.
connection
Format: connection [ConnectionLine] [Name1] [Name2] — Checks that a direction-specific Connection Line exists from a node with Name Name1 to a node with Name Name2.
Example:
connection onetoone Entity Relationship — Checks that at least one onetoone connection line exists from an Entity element to a Relationship element.
exactConnection
Format: exactConnection [u|d] [ConnectionLineName] [Name1] [TextContent1] [Name2] [TextContent2] — Checks for a Connection Line from one element with specified text to another. Direction-specific from element1 to element2 if the parameter is “d”; “u” indicates that the connection may be in either direction.
Example:
exactConnection u onetoone Entity CD Relationship Produces — Checks that at least one onetoone connection line joins an Entity displaying the text “CD” to a Relationship displaying the text “Produces”.
compositeRelationship
Format: compositeRelationship [u|d] [RelationshipType] [Name1] [TextContent1] [Relationship] [RelationshipTextContent] [Name2] [TextContent2] — Checks for a full E-R relationship, with a connection across a relationship between two entities. It does not need the individual ConnectionLines to be specified, just the entire relationship type. If the first parameter is “d” the connection must go from element1 to element2; if “u” it may go either way.
Example:
compositeRelationship d onetomany Entity CD Relationship Have Entity Track — Checks for a directional onetomany relationship from an Entity CD, across a Relationship Have, to an Entity Track.
Table 4.1: Features expressions for the ER exercises
The comparison operators accepted in such match-count parameters are ==, >, <, <= and >=.
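The default-versus-parameterised behaviour might be sketched as follows, assuming the count parameter is appended to the expression as in “exact Entity [Aa]rtist==1”; the helper function is hypothetical:

```python
import operator
import re

# Map the accepted comparison operators onto Python comparisons.
OPS = {"==": operator.eq, ">=": operator.ge, "<=": operator.le,
       ">": operator.gt, "<": operator.lt}

def count_satisfies(expr_tail: str, n_matches: int) -> bool:
    """True if the number of matches found in the diagram satisfies the
    trailing count parameter; with no parameter, one or more matches
    suffice (the default described in the text)."""
    m = re.search(r"(==|>=|<=|>|<)\s*(\d+)\s*$", expr_tail)
    if m is None:
        return n_matches >= 1
    return OPS[m.group(1)](n_matches, int(m.group(2)))

assert count_satisfies("[Aa]rtist==1", 1)
assert not count_satisfies("[Aa]rtist==1", 2)
assert count_satisfies("Relationship", 3)   # no parameter: at least one
```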
The following are sample features tests from the exercises, of gradually increasing
complexity.
Example 1
• 1 : exist Entity : Check for entities in your diagram : Found : You have no entities at all in your diagram! :
This test checks that a student solution contains at least 1 figure with the Name
“Entity”. If this is found then 1 mark is awarded and “Found” is given as feedback; if
this is not found then 0 marks are awarded and “You have no entities at all in your
diagram!” is given as feedback.
Example 2
• 1 : exact Entity [Aa]rtist==1 : Check for an Artist entity : Found : NOT found! :
This test checks that exactly one Entity figure exists whose text is either “Artist” or
“artist”.
Example 3
• 3 : exactConnection u onetoone Entity (CD|cd) Attribute [Pp]rice : Checking an attribute of CD : Found correctly : CD does not possess an essential attribute :
This test checks whether an undirected one-to-one connection line joins an Entity figure whose text is either “CD” or “cd” with an Attribute figure whose text content is either “Price” or “price”. 3 marks are awarded if the desired connection is found.
Example 4
• 5 : compositeRelationship d onetomany Entity [Aa]rtist Relationship owt Entity (CD|cd) : Check relationship between Artist and CD : Correct : The type of relationship between Artist and CD is incorrect! :
This test checks if a directional one-to-many relationship exists which links an Entity
figure whose text content is either “Artist” or “artist” with another Entity figure
whose text content is either “CD” or “cd” via a Relationship figure whose text is not
examined.
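The examples above can be tied to concrete checks with a toy evaluator for the two simplest expressions, exist and exact, run against a minimal diagram represented as (Name, Text Content) pairs. This is an illustrative sketch, not the EntityRelationshipTool's own code:

```python
import re

# Toy diagram: each element as a (Name, Text Content) pair.
diagram = [("Entity", "Artist"), ("Entity", "CD"), ("Attribute", "Price")]

def exist(name):
    """Count elements whose Name matches (cf. Example 1)."""
    return sum(1 for n, _ in diagram if n == name)

def exact(name, oracle):
    """Count elements matching both Name and a text oracle, treated here
    as a plain regular expression (cf. Example 2)."""
    return sum(1 for n, t in diagram
               if n == name and re.fullmatch(oracle, t))

assert exist("Entity") == 2                # Example 1 would award its mark
assert exact("Entity", "[Aa]rtist") == 1   # Example 2: exactly one match
assert exact("Attribute", "[Aa]rtist") == 0
```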
4.2.2 Methodology
In this experiment, a student cohort of 141 undergraduate Computer Science
students in their second year was invited to attempt the formative exercises prior to
undertaking associated summatively assessed exercises, which were compulsory.
The formative exercises were, therefore, the initial stage in a two-part assessment
strategy whose purpose was to motivate students. A smaller initial problem set
comprised 3 exercises. The first exercise was trivial and designed to allow students to
learn to use the system while the other two exercises were progressively more
complex. Subsequently, a more substantial problem required students to draw a
diagram which would be used as the basis for further questions in the summative
assessment.
Data was collected in several ways. Quantitative data was collected using
CourseMarker’s Archiving Server and by using Likert scale questions in student
surveys. The student solution at every submission was captured using CourseMarker’s
Archiving Server, together with the associated marks, which were hidden from the
student, and the feedback, which was returned to the student. Thus, for each student,
it was possible to track changes made between submissions. It was also possible to
access information based upon the number of submissions made by each student and
how the marks changed as the number of submissions increased.
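This per-submission tracking can be sketched as a simple aggregation over archived mark sequences. The data below is invented purely for illustration; the real figures are those reported in section 4.3:

```python
from statistics import mean

# Hypothetical archive: for each student, the percentage mark awarded at
# each successive submission, as captured by the Archiving Server.
archive = {
    "student_a": [45, 60, 70, 72],
    "student_b": [55, 55, 68],
    "student_c": [40, 52, 64, 70, 71],
}

def average_at_submission(archive, k):
    """Mean mark at submission k over all students who made at least k
    submissions (the basis of plots such as figure 4.4)."""
    marks = [subs[k - 1] for subs in archive.values() if len(subs) >= k]
    return mean(marks) if marks else None

assert average_at_submission(archive, 1) == mean([45, 55, 40])
assert average_at_submission(archive, 4) == mean([72, 70])
```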
Likert scale questions within student questionnaires were designed to assess how
useful the exercises had been to the student learning process and how enjoyable the
experience of using the CBA system had been to the users. Student questionnaires
were distributed to students by lab tutors in the final lab session; unfortunately,
attendance at this particular weekly session was low.
Qualitative data was collected through the use of open-ended questions at the end of
the student surveys and through conducting brief, informal interviews with the lab
tutors. The qualitative data was used to explain trends which could be observed in
the quantitative data in context. Much of the quantitative data was taken from the
marking audits which, by definition, can take into account only those factors which
have been marked. Since a fundamental requirement of the experiment was to
determine the success of the automated assessment, it was necessary to consider the
observations and experience of the tutors who had led the laboratory sessions.
4.3 Results and Analysis
4.3.1 General Impressions
The two-part assessment strategy successfully ensured a high student motivation: of
141 active students registered on the Database Systems course, 130 (92%) attempted
the formative diagramming exercises [HB06]. Although the students themselves
were presented with text feedback rather than their percentage scores, the following
results provide a good indication of the level of assistance provided by the system.
For the smaller initial problem set students made an average of 5 submissions, with
first submissions being awarded an average of 49.2% and final submissions an
average of 75.1%. For the larger second problem linked to the summative
assignment, students made an average of 9 submissions (with 8 students making
more than 25 submissions and one student a total of 72!), with initial submissions
being awarded an average of 50.7% and final submissions an average of 70.1%.
Completed questionnaires showed that students were pleased by the parameterised
Theseus development environment in which they were asked to develop their
solutions. Although it was effectively optional, most students chose to directly
develop their solutions online — the main exceptions were those most conscientious
students who had started to develop their solutions on paper at home as soon as the
coursework was announced, and even many of those were persuaded to copy their
solutions from paper into Theseus in order to gain feedback. Lab assistants quickly corrected the most common student misunderstandings; these can be reduced in future by taking care with the wording of questions.
students found the instructions clear and the exercises straightforward.
4.3.2 Problems
A major problem from a marking point of view occurred because of the way in
which features tests are specified in CourseMarker. Each features test is assessed
exactly once and a mark assigned for each submission. Although this had previously
seemed adequate for features testing of both programming coursework and the
summative CBA of diagrams by Tsintsifas [Ta02], in the coursework being assessed
here there were several equally valid model solutions with slightly differing,
mutually exclusive, features. As a result, features tests could only be constructed to
search for that subset of features which were common to all model solutions. This
scenario is clearly unacceptable. In earlier programming exercises, such cases were
rare and were often solved by careful wording of the question specification (or,
sometimes, explicit instructions to students).
A second problem was in the lack of marking for diagram appearance. Since the
features marking system utilised considered only the diagram elements and the
connections between them, it was possible to attain good feedback with a diagram of
very poor layout. Indeed, many students took full advantage of this fact, meaning
that when unexpected feedback was received it was sometimes difficult for a lab
assistant to determine what was wrong with a student diagram due to its poor
layout. In fact, the importance of marking diagram appearance had been identified
before deployment of the course and was not implemented simply because of time
constraints. In the event, experience has confirmed that this is a major issue to be
addressed.
The third major problem was in the expressiveness of the feedback. Although
considerable effort was undertaken to provide useful feedback for each features test
— especially the feedback for the ‘negative’ case where the student had failed the
feature test and assistance was required — it is clear that the feedback did not fully
constitute effective formative feedback as defined in section 2.2.5. Specifically, the
feedback tended to be too lengthy, since feedback was returned for every features
test, and too focused on particular student weaknesses due to its link to a specific
features test failure. The feedback will be scrutinised more closely in section 4.3.4.
4.3.3 Marking data
An analysis of results from the course shows that, for the largest exercise, the
difference in marks between the earlier submissions is substantially larger than that
between later submissions. Figure 4.4 shows how the underlying average student
mark improved over the first 9 submissions for those whose total submissions were
12 (the average) or fewer. On average, over the first 9 submissions a gradually
improving underlying student average mark converges around the 70% mark.
At this stage the improvement in student marks becomes negligible; this may
account for the average number of submissions being 12 since the feedback to the
student would have changed little for 2 or 3 consecutive submissions. Since 70% is
considered a first-class mark, the feedback to the student was generally positive at
the 70% level and so the student would consider their solution adequate.
Figure 4.4: First nine submissions of students who submitted 12 times or less (underlying mark, in per cent, plotted against submission number)
Figure 4.5: Submissions 15 to 30 for those students who submitted more than 12 times (underlying mark, in per cent, plotted against submission number)
Those students who submitted a great number of times failed to acquire
proportionally higher marks. Indeed, those students who submitted greatly more
than the average produced a widely fluctuating average mark as shown in figure 4.5.
It is likely that these students are the “gamblers and perfectionists” constituency
identified by Benford et al [BBF92]. Gamblers randomly submit altered solutions in
the hope of chancing on a higher mark and include those students who are interested
in the mechanism behind the automated assessment (who were here provided with
ample opportunity to experiment). Perfectionists tend to never be satisfied with their
feedback and submit more times in the hope of achieving a slightly higher result. It
should be noted, however, that the proportion of students falling into these
categories in this course was lower than that reported by Benford, and that the
nature of this course as a formative assessment rendered such practices academically
harmless, if a waste of the student’s own time.
4.3.4 Performance as Formative Assessment
The effectiveness of formative assessment can be measured in terms of its primary
deliverable: feedback. Section 2.2.5 proposed a good feedback framework for
formative assessment and here the system is judged by those criteria.
The interactive CBA system encourages independent learning; although 2 hours of
tutor assistance was provided in laboratory sessions weekly, attendance was on a
voluntary basis and most students chose to attempt the exercises at times convenient
to them as the laboratory was open during daytime hours. The students were free to
work independently or could discuss their work with peers, and the feedback
provided by the system allowed gradual improvements in the quality of
submissions. Tutors were allowed to help the students for the early exercises but
such help was discouraged later. Students also found the system itself easy to use
and tutor advice was required rarely. As a result, the system fulfilled the objectives
of criterion 1 successfully.
Peer dialogue was encouraged in all the formative assessment exercises. A lot of
peer-to-peer discussion occurred during lab sessions and likely at other times as well.
This has already been documented before in relation to CBA systems in a process
which Benford et al [BBF92] labelled consciousness raising. CBA encourages debate on
the assessment process and students will contest perceived injustice in their feedback
more vociferously than when a human marker is involved. The system therefore
fulfilled the objectives of criterion 2 successfully.
The decision to display only feedback to students and to withhold marks caused
some initial confusion as to what constituted good performance as some students
were used to the convention of grades or percentages. Tutor reassurance overcame
this problem, however, and eventually most students came to recognise from the
phrasing of the feedback alone that they had submitted a good solution. It is clear,
however, that improvement could be achieved in relation to criterion 3 by providing
a structured set of feedback examples in the online exercise text alongside a
corresponding set of illustrative examples.
Marking is conducted and feedback returned on a timescale which is, in practical
terms, instantaneous. Therefore the feedback can be viewed by the student and
related to a submission which is still fresh in the mind. Students are allowed
unlimited submissions and are therefore provided ample opportunity to act upon
their feedback. A student can later choose to consult their feedback for any exercise
for which they have a submitted solution. Criteria 4 and 5 are therefore adhered to
but the problem of considerable feedback unwieldiness still needs to be addressed, as
discussed in section 5.2.
Students were assessed frequently but since this was on a formative basis students
were not under pressure to attain high marks immediately. Thus criterion 6 was
adhered to. Finally, since CourseMarker has good statistics facilities already in place
[FHH+01], student progress was monitored with ease, allowing conclusions to be
drawn about future teaching of the material and fulfilling criterion 7.
4.4 Conclusions
The process of conducting formative assessment within diagram-based domains
using CBA courseware in this experiment was encouraging, but the adherence to
good formative feedback practice and Computer Based Assessment principles was
incomplete.
Computer Based Assessment principles were breached because the marking system
of CourseMarker was not sufficiently flexible to assess the mutually exclusive
solution cases which arose in the more complex later exercises. This necessitated
tutor involvement at the marking stage, since CourseMarker only marked (and
therefore could only provide feedback upon) the common subset features.
Formative assessment best practice was not achieved for two distinct reasons. Firstly,
the feedback, distributed through a feedback mechanism designed with summative
assessment in mind, was lengthy. Many comments provided as feedback related to
aspects of the exercise which the student had already successfully completed, while
those relating to areas of improvement tended to simply state the student failure in a
way which was both too specific and overly negative. It is clear that more targeted
and motivational feedback is required for the formative assessment process.
Secondly, the features marking system assessed only the semantics of the diagram.
The aesthetic appearance of the diagram was effectively ignored by the system.
Section 2.3 outlined the purpose of diagrams: to convey information. Even tutors
familiar with the entity-relationship diagram domain often found difficulty in
comprehending student diagrams due to poor layout. This fact alone confirms that
student diagrams often failed to achieve good practice in this area. The purpose of
formative assessment is to aid in the learning process; it is clear that, here, a part of
that learning process had been excluded.
4.5 Summary
This chapter described a practical experiment in the automated assessment of entity-relationship diagrams, conducted with the intention of determining the suitability of
the CourseMarker / DATsys platform, which section 3.3 had presented as a model
CBA architecture, to conduct formative assessment within free-form, diagram-based
domains. Overall results were encouraging. The system was popular with students
since they appreciated the interactive and intuitive user interface and immediate
feedback. Students were provided only with feedback, but the underlying marks
increased steadily over multiple submissions, demonstrating student learning.
However, the system demonstrated problems in several key areas. Firstly, the
marking scheme was insufficiently expressive to allow for the mutually exclusive
solution cases which arose in more complex questions: this resulted in the CBA
system being able to mark only common subset features. Secondly, the feedback was
lengthy, insufficiently targeted on the learning process and was not motivational.
Thirdly, a key component of learning to draw educational diagrams, i.e. aesthetic
considerations, was not addressed by the system with negative consequences in the
resultant student diagrams.
Chapter 5 examines the provision of formative CBA within diagram-based domains
and outlines the problems which must be overcome in light of the conclusions drawn
by the preliminary work in chapter 4.
Chapter 5
Providing a specification for formative CBA
in diagram-based domains
Introduction
The process of automating the formative assessment process in free-response
diagram-based domains using Computer Based Assessment technology is both
feasible and useful. Chapter 4 described an initial research experiment conducted to
identify the shortcomings in current CBA techniques, exemplified by the
CourseMarker / DATsys system, when related to formative assessment. Proceeding
from this point, this chapter identifies those features which must be present for
formative, diagram-based CBA to be successful, considers the extensions needed to
facilitate that success and outlines a series of specific requirements in each of the
identified problem areas. The objective is to argue that CourseMarker / DATsys are a
suitable platform for conducting CBA formatively in diagram-based domains and
that the system can cater for the full lifecycle of formative CBA if the identified
extensions are implemented.
Section 5.1 states the requirements of formative, diagram-based CBA arising from its
definition, states the motivation and aims of the work and argues that the
requirements can be feasibly achieved by extending the CourseMarker / DATsys
CBA system. Current capabilities of the CourseMarker / DATsys system allow it to
fulfil some requirements, primarily those shared with summative CBA, while the
extensibility of the system makes it a suitable platform for the necessary extensions.
The extension requirements are discussed: an extensible system of marking tools to
allow the marking of the aesthetic appearance of diagrams; a more flexible features
marking system able to consider mutually exclusive alternative solution cases; a
system to provide truncated, prioritised feedback; and a system of guidance for the
construction of formative, free-response exercises, particularly in terms of the
creation of feedback.
Section 5.2 outlines measurable requirements which must be fulfilled in order to
render successful the extensions in each of the problem areas.
5.1 Objectives
Section 5.1.1 outlines those criteria which must be fulfilled in order to satisfy the
definitions of Computer Based Assessment and formative assessment. Section 5.1.2
assesses which of these criteria are already fulfilled by the existing CourseMarker /
DATsys system and which criteria necessitate the extension or change of the existing
architecture. Section 5.1.3 explains the overall motivation in terms of the resultant
questions which must be answered in the context of applying a CBA approach to
conducting formative assessment in free-response diagrammatic domains.
5.1.1 Definitions
Figure 1.1 illustrated the scope of this project as the intersection between free-response CBA, formative assessment and diagramming. Section 2.1.1 defined CBA as
“the delivery of materials for teaching and assessment, the input of solutions by the students,
an automated assessment process and the delivery of feedback, all achieved through an
integrated, coherent online system”. Section 2.2.1 defined formative assessment as
“assessment conducted throughout the learning process, as an integral part of that process,
where the central aim is to provide feedback to enable the enhancement of learning” while
section 2.3.1 notes that an educational diagram is a collection of nodes and lines
constrained by a convention of meaning whose purpose is to convey information.
Section 3.1 noted that a fully automated CBA approach, when compared with other
approaches, provides the best potential for time-saving in conducting formative
assessment and also allows a realistic test-feedback-retest cycle of iterative learning.
Almond et al. [ASM02] summarise the four basic processes present in an assessment
cycle. The Activity Selection Process selects and sequences tasks with an assessment or
instructional focus, including administrative duties. The Presentation Process presents
the task to the student and captures their response. Response Processing identifies and
evaluates essential features in the response and records a series of “Observations”.
Finally, the Summary Scoring Process uses the Observations to update the “Scoring
Record”. Since a CBA approach attempts to automate the entire assessment process it
is clear that these processes constitute a minimum programme of automation objectives
for any CBA system.
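The four-process cycle can be sketched as a minimal pipeline. This is a hypothetical illustration only; none of the function names, the toy task or the feature test below come from CourseMarker or from Almond et al.'s work.

```python
# Hypothetical sketch of the four-process assessment cycle.
# All names and the toy task are invented for illustration.

def activity_selection(tasks):
    """Select and sequence the next task (administrative duties elided)."""
    return tasks.pop(0)

def presentation(task, respond):
    """Present the task to the student and capture their response."""
    return respond(task)

def response_processing(response, features):
    """Identify essential features in the response, recording Observations."""
    return {name: test(response) for name, test in features.items()}

def summary_scoring(scoring_record, observations):
    """Use the Observations to update the Scoring Record."""
    scoring_record.update(observations)
    return scoring_record

# One pass through the cycle for a toy "task".
tasks = ["Draw an entity-relationship diagram for a library."]
features = {"mentions_entity": lambda r: "entity" in r}
task = activity_selection(tasks)
response = presentation(task, respond=lambda t: "entity: Book")
record = summary_scoring({}, response_processing(response, features))
print(record)  # {'mentions_entity': True}
```

A full CBA system automates each of these four stages; the sketch only makes the data flow between them explicit.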
As the primary deliverable of the formative assessment process, feedback provides a
key to measuring the success of a formative assessment system. Section 2.2.5 outlined
a framework for effective feedback within a formative assessment context. Formative
assessment should: facilitate the development of self-assessment (reflection) in
learning; encourage teacher and peer dialogue around learning; clarify what
constitutes good performance; provide opportunities to improve performance;
deliver information focused on student learning; encourage positive motivational
beliefs and self-esteem and provide information to educators to shape future
teaching.
Based upon the definition of diagrams as a collection of nodes and connection lines,
Tsintsifas [Ta02] concluded that a key element of diagram-based CBA was the ability
to provide domain coverage while allowing users to manipulate a standard set of
tools. Requirements for diagram editors included the level of Human-Computer Interaction and the simplicity, intuitiveness and usability of the editors. If a
diagram has as its purpose the aim of communicating information, then it is
necessary to assess a diagram in terms of two criteria:
• The information provided by the diagram to the recipient must be correct;
• The diagram must be displayed in a way that is aesthetically pleasing, to avoid recipient confusion.
Section 5.1.2 assesses which of these criteria are already fulfilled by the existing
CourseMarker / DATsys system and which criteria necessitate the extension or
change of the existing architecture.
5.1.2 Identifying the Necessary Extensions
To conduct formative computer-based assessment in diagram-based domains, it is
necessary to adhere to requirements and best practice in three areas: CBA, which
must fully automate the assessment process in a coherent online system; formative
assessment, whose primary purpose is to assist learning; and educational diagrams,
which constitute a wide variety of node-and-link based domains. This section
considers each set of criteria and outlines systematically the shortcomings of
CourseMarker / DATsys in fulfilling the requirements.
In chapter 4, the shortcomings of CourseMarker / DATsys were summarised, as the
result of a practical experiment, in terms of the ability to encompass mutually
exclusive solution cases in marking, the ability to assess the aesthetics of a student
diagram, and the provision of concise, prioritised feedback. This section argues that
these shortcomings form the core extensions needed to accommodate formative
assessment within diagrammatic domains using CourseMarker / DATsys and are not specific to a particular experiment case. The section also argues that further, domain-specific experiments would not identify additional shortcomings of a generic nature. The approach taken is to consider how CourseMarker / DATsys can
fulfil each of the criteria arising from the basic definitions, as outlined in section 5.1.1.
Formative and summative assessment stand opposed in many respects, as outlined
in section 2.2. However, the underlying architecture required for the assessment
process is similar in many respects; it is this similarity that enhances the feasibility of
this project by enabling a system for formative assessment to be built by extending
the CourseMarker / DATsys system which was intended for summative assessment
purposes. Such similarities were not emphasised by Tsintsifas [Ta02], who stressed the practical benefits gained by replacing summative assessment with CBA, without considering that formative assessment was the assessment form most in need of replacement. Tsintsifas believed that security, performance and
administration were important only in summative assessment. While student
plagiarism is not an issue in formative assessment, it is plain that unauthorised
tampering with the system would affect the learning process of others. Furthermore,
effective performance and administration are required to provide timely feedback to
the students and useful feedback to educators to improve future learning. Tsintsifas
further argued that formative assessment could be implemented by discarding the
summative marks — disregarding the differences in feedback required, as
demonstrated by the experiment outlined in Chapter 4.
5.1.2.1 Fulfilling Computer Based Assessment criteria
To successfully apply CBA technology, it is necessary that the entire assessment
process be automated within an integrated system. It is necessary, therefore, to
examine the success of CourseMarker in automating the basic assessment process as
described in section 5.1.1.
The CourseMarker architecture was described in section 3.3.2. Sequencing of
assessment tasks can be specified exactly using a Marking Scheme. Presentation of
teaching materials can be achieved through the user clients, for example the Java
CourseMarker client illustrated in figure 3.6. The Administrator has a defined role as
a User within CourseMarker and administrative tasks are split between the Login
Server, which registers users and validates sessions, the Course Server, which
controls module information and sets up exercises, the Submission Server, which
receives submissions and issues receipts, the Archiving Server, which maintains
audit trails and the Ceilidh server, which manages the other servers and can reload
them at runtime. The DATsys architecture, described in section 3.3.3, describes how
new domains can be authored with Daidalos and new exercises with Ariadne. The
student launches the configured Theseus editor from within the CourseMarker client
to draw their solution and then submits through CourseMarker. Hence, the Activity
Selection Process, involving the sequencing of tasks and the administrative duties, is
well defined in CourseMarker / DATsys.
The Presentation Process in assessment systems presents the problem to the student
and then captures the student’s response. The student is presented with a problem
specification in the CourseMarker client. Upon setting up the exercise, the student
can develop their solution within a parameterised Theseus client. Theseus allows the
student to interactively “draw” their diagram upon a development canvas. The
student can save their drawing by selecting the save function within Theseus. Their
drawing is stored in a .draw file as a collection of objects, each representing a node or
connection, which can later be traversed as an enumeration by the marking tools.
Hence the problem is concisely presented to the student and the solution captured in
a usable way.
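As a rough illustration, a stored drawing can be modelled as a flat collection of node and connection objects that marking tools iterate over. The class and field names below are invented stand-ins, not the actual .draw format, which is internal to Theseus.

```python
from dataclasses import dataclass

# Invented stand-ins for the objects a .draw file might contain;
# the real serialisation format is internal to Theseus.

@dataclass(frozen=True)
class Node:
    label: str

@dataclass(frozen=True)
class Connection:
    source: str
    target: str

drawing = [Node("A"), Node("B"), Connection("A", "B")]

# A marking tool can traverse the collection as an enumeration,
# filtering for the element kinds it cares about.
nodes = [e.label for e in drawing if isinstance(e, Node)]
links = [(e.source, e.target) for e in drawing if isinstance(e, Connection)]
print(nodes, links)  # ['A', 'B'] [('A', 'B')]
```

The point of such a flat, enumerable representation is that marking tools need no knowledge of how the diagram was drawn, only of the node and connection objects it contains.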
Response Processing examines the features of the solution and catalogues them.
Features testing for an exercise is specified on a feature-by-feature basis. The relative
weight of the feature, the definition of the feature being sought and feedback for
positive and negative results, are specified line by line. The manner of features
specification has survived since Ceilidh with little change. In programming exercises,
feature definitions are regular expressions which match the feature being sought. In
diagrammatic exercises the feature being sought is defined as in section 4.2. One
major problem, identified as a result of the experiment summarised in Chapter 4, is
that the features are assumed to be mutually supportive. Marking is therefore
accumulated across all features. Previously, in programming exercises using Ceilidh
and CourseMarker, students have been “shepherded” into using one particular
feature over an alternative option through careful question wording or the threat of
being awarded 0 marks overall if a given token is identified. Within formative,
diagram-based exercises, this situation is unsatisfactory. Therefore, it is necessary to
enable the marking system to consider mutually exclusive solution cases before
cataloguing for the purposes of response.
The Summary Scoring Process uses the Observations recorded by Response
Processing to update the score of the exercise. CourseMarker assigns marks based
upon a weighted summary of the tests it has carried out, and stores these marks in a
structured Marking Result. For the purposes of general assessment, the
CourseMarker marking system is logical and efficient. Therefore, the Summary
Scoring Process is successfully automated within CourseMarker.
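The weighted accumulation described above can be sketched as follows. The feature definitions, weights and feedback strings are invented examples; they are not CourseMarker's actual specification syntax, only an illustration of how a weighted summary over feature tests yields a mark and a feedback list.

```python
# Hypothetical sketch of feature-by-feature marking with weights.
# Each entry: (weight, predicate over the solution, positive feedback,
# negative feedback). All definitions here are invented examples.

features = [
    (3, lambda s: "Book" in s["nodes"],
     "Book entity present.", "Book entity missing."),
    (1, lambda s: ("Book", "Author") in s["links"],
     "Link present.", "Link missing."),
]

def mark(solution):
    """Accumulate a weighted score across all features."""
    total_weight = sum(w for w, *_ in features)
    score, feedback = 0, []
    for weight, test, pos, neg in features:
        passed = test(solution)
        score += weight if passed else 0
        feedback.append(pos if passed else neg)
    return 100 * score / total_weight, feedback

solution = {"nodes": {"Book"}, "links": set()}
print(mark(solution))  # (75.0, ['Book entity present.', 'Link missing.'])
```

Note how the score is accumulated across all features unconditionally: this is precisely the assumption of mutually supportive features that breaks down when mutually exclusive solution cases are allowed.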
5.1.2.2 Fulfilling Formative Assessment criteria
To successfully facilitate formative assessment, it is necessary to examine how the
feedback system of CourseMarker conforms to the framework for formative feedback
originally summarised in section 2.2.5.
The feedback mechanism of CourseMarker facilitates student reflection in learning
because the feedback process is automated. Students can learn at their own pace and
submit at their own convenience since the CourseMarker client is widely accessible
to students. Previous courses in programming and diagram-based domains,
including the entity-relationship course described in Chapter 4, adopted an approach
whereby students were free to choose to attend weekly laboratory sessions, where
help from tutors was available, or to attempt the exercises on their own. Experience
has shown that tutor advice is required rarely and on peripheral issues; there is no
evidence to demonstrate that a course involving the student working through a set of
exercises at their own pace without tutor input would be impractical, provided help
could be provided (perhaps in the form of a technical support email address) for
unforeseen technical issues or comments. Thus, the first criterion outlined in section
2.2.5 can be fulfilled using CourseMarker’s present capabilities.
CourseMarker encourages teacher and peer dialogue around learning. Section 2.1.4
outlined student willingness to question CBA marking results. Students are likely to
collaborate if working in unsupervised laboratory sessions; if the assessment being
conducted is formative, then this can be viewed as a helpful part of the learning
process. Section 3.3.1.5 outlined the observation that Ceilidh, the forerunner to
CourseMarker, markedly increased student consciousness of the learning and
assessment process. Therefore, the second criterion outlined in section 2.2.5 can be
fulfilled using CourseMarker’s present capabilities.
The practical experiment in Chapter 4 highlighted problems in clarifying to the
student what constitutes good performance. In previous summative assessment,
CourseMarker provided feedback as an expandable tree of grades. For formative
assessment purposes the grades were removed, but this caused confusion among
students as to how much improvement their solution required. To improve this
situation will require presenting the questions, solutions and feedback in a clear way
to the student. Section 3.1 highlighted that the design of the assessment problem and
feedback is as important to the success of the formative assessment as the technical
capability of the system used. Therefore, it will be necessary to produce a set of
guidelines for exercise developers and teachers to promote good practice in this area.
CourseMarker provides opportunities to improve performance through allowing
multiple submissions and providing feedback quickly to students. The fact that the
assessment process is entirely automated is key to CourseMarker’s innate ability in
this area. Therefore, the fourth criterion outlined in section 2.2.5 can be fulfilled using
CourseMarker’s present capabilities.
CourseMarker fails to deliver information which is sufficiently focused on student
learning. CourseMarker’s feedback is an exhaustive summary of the feedback
comments from each test undertaken during the assessment process, which is often
overwhelming in quantity and discourages the student from viewing the exercise as
a holistic entity. It is clear that the feedback mechanism of CourseMarker must be
modified to provide a smaller number of prioritised comments focused on the
student learning process in a motivational way.
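A minimal sketch of such a mechanism is given below. The priority scheme, the comment texts and the truncation limit are all invented for illustration; the actual design is the subject of section 5.2.3.

```python
# Hypothetical sketch of prioritised, truncated feedback: from the full
# list of generated comments, only the few most relevant are returned.

def truncate_feedback(comments, limit=3):
    """comments: list of (priority, text); lower numbers are more urgent."""
    ranked = sorted(comments, key=lambda c: c[0])
    return [text for _, text in ranked[:limit]]

# Invented example comments from a marking run.
comments = [
    (2, "Label the relationship between Book and Author."),
    (1, "The Author entity is missing."),
    (5, "Consider aligning the entities horizontally."),
    (1, "The Loan entity is missing."),
]
print(truncate_feedback(comments))
# ['The Author entity is missing.', 'The Loan entity is missing.',
#  'Label the relationship between Book and Author.']
```

Because only the highest-priority comments are shown, the set of comments a student sees changes between submissions as earlier problems are corrected, supporting an iterative improve-and-resubmit cycle.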
Formative assessment using CourseMarker can encourage positive motivational
beliefs and self-esteem through providing frequent low-stakes assessment. For this to
be successful the feedback itself must be phrased by the exercise developer in a
motivational way; this is a focus of the guidelines discussed above. Furthermore, the
feedback must be short and prioritised, as discussed within the context of providing
feedback focused on student learning.
Lastly, feedback should provide information to educators. This requirement is met
fully by the existing CourseMarker system, which allows all submissions by all
students, together with associated marks and feedback, to be examined by educators.
5.1.2.3 Fulfilling Educational Diagram criteria
Representations for new diagram domains can be authored, for use in DATsys
diagram editors, using the Daidalos authoring environment. Section 2.3 explained
that educational diagrams are commonly a collection of nodes and links. Nodes and
links, and the connectivity between them, can be specified in Daidalos on a domain
specific basis. Furthermore, the Generic Marking Mechanism is extendable and
designed to accommodate the features marking of new exercise domains, which
includes new diagram-based domains.
However, the effectiveness of a diagram in conveying meaning is affected by its
aesthetic appearance. A diagram whose physical layout is confusing to the reader is
poorer at conveying information than a diagram with identical nodes and
connections but a less confusing layout. The assessment of diagram aesthetics in a
CBA context is not catered for by CourseMarker / DATsys; indeed, it is
undocumented in the literature. This is, therefore, a requirement for formative
assessment in this field.
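To make the notion of aesthetic assessment concrete, one layout criterion widely discussed in the graph-drawing literature is the number of edge crossings. The sketch below counts crossings for straight-line edges using a standard orientation test; it is an illustrative metric chosen by the author of this edit, not the aesthetic marking tool the thesis goes on to develop.

```python
# Counting edge crossings for a straight-line layout: one illustrative
# aesthetic criterion, not the thesis's actual aesthetic marker.

def orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p)."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)

def segments_cross(a, b, c, d):
    """True if segments ab and cd properly intersect."""
    return (orient(a, b, c) * orient(a, b, d) < 0 and
            orient(c, d, a) * orient(c, d, b) < 0)

def crossing_count(positions, edges):
    """positions: node -> (x, y); edges: list of (u, v) pairs."""
    count = 0
    for i in range(len(edges)):
        for j in range(i + 1, len(edges)):
            (u1, v1), (u2, v2) = edges[i], edges[j]
            if {u1, v1} & {u2, v2}:
                continue  # edges sharing an endpoint do not "cross"
            if segments_cross(positions[u1], positions[v1],
                              positions[u2], positions[v2]):
                count += 1
    return count

pos = {"A": (0, 0), "B": (2, 2), "C": (0, 2), "D": (2, 0)}
print(crossing_count(pos, [("A", "B"), ("C", "D")]))  # 1
```

Two diagrams with identical nodes and connections can differ on such a metric, which is exactly the distinction the proposed extension must be able to draw.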
5.1.2.4 Summary
Section 5.1.2 outlined the requirements for CBA, formative assessment and
educational diagrams based upon the definitions provided in section 5.1.1.
CourseMarker / DATsys constitute a platform which is already able to cater for
many of the outlined requirements; it was argued that this was because of the
overlap between formative and summative assessment requirements in CBA terms.
Outstanding requirements which must be met in order to demonstrate that formative
CBA within diagram-based domains can be useful and feasible include the three
requirements identified as a result of practical experiments:
• The ability to distinguish between student diagrams of differing aesthetic appearance;
• The ability to consider solutions in which multiple, mutually exclusive cases may be acceptable;
• The ability to truncate the feedback provided to students to reduce confusion — students must be provided with the comments most relevant to their solution.
Furthermore, to accommodate formative assessment criteria it is necessary to provide
a set of brief guidelines to educators to assist in presenting materials within a free-response CBA context. These guidelines should outline methods for creating
positive, motivational feedback and for clearly specifying good practice to students
in the specification text.
5.1.3 Aims and Motivation
The aim of this work is to demonstrate that the automation of the formative
assessment of diagram-based coursework using CBA courseware is both feasible and
useful.
Section 2.2 outlined the numerous pedagogical benefits of formative assessment.
Formative assessment encourages openness among students, can be used to assess a
great scope of learning outcomes, can help in avoiding mark aggregation and
discourages plagiarism. Despite this, formative assessment is in decline because it is seen as resource-intensive. Section 2.1 pointed out that CBA approaches routinely demonstrate great resource savings, whilst free-response domains, such as diagrams, offer the greatest scope for assessing a wide range of cognitive learning levels. The most basic motivation of this work, therefore, is to answer the question:
• To what extent can CBA techniques be used to reduce the resource required in setting a formatively assessed coursework in a diagram-based domain, marking student submissions and returning feedback, while still adhering to good formative assessment principles?
This question could be alternately phrased thus:
• To what extent would current, successful CBA practices need to be changed to conform to formal formative assessment guidelines?
To answer these questions this work used an initial phase of research to identify
shortcomings in the existing courseware. A practical experiment and a consideration
of the requirements implied by definition highlighted that three extensions to the
courseware must be developed. Since the approach taken is to develop a generic
approach to the problem in order to maximise potential usefulness over multiple
domains, a guide to educators in presenting domain-specific questions to students
using the courseware is also required. A plan is feasible if it can be implemented
such that the requirements are fulfilled. Usefulness is achieved when a system
provides results which are of benefit to practitioners.
To prove that the automation of the formative assessment of diagram-based
coursework using CBA courseware is both feasible and useful, it is necessary to
design and implement the extensions, deploy a course of exercises and analyse the
results.
The three identified areas of extension are as follows:
• Extending the marking system to assess the aesthetics of student diagrams;
• Extending the marking system to allow mutually exclusive solution cases;
• Changing the system of feedback to provide only the highest-priority comments to students.
The marking system should allow the aesthetic appearance of student diagrams to be
assessed. The assessment of aesthetic appearance should accommodate a wide range
of diagram domains but be able to provide useful insight into the diagram on a
domain-specific basis. The aesthetic assessment should provide an analysis of the
diagram appearance as a coherent whole. Feedback should be provided to students
to indicate aesthetic improvements which would benefit the diagram.
The marking system should be able to accommodate solutions where several,
mutually exclusive alternatives are available to the student. A student solution may
provide an incomplete attempt to fulfil the solution using one of the solution cases.
The marking system should identify which solution case the student is attempting to
attain and provide useful feedback which would benefit the diagram. Mutually
exclusive alternatives may constitute multiple nodes and connections which differ
between alternate versions of the model solution.
Feedback provided to students should be a truncated version of the feedback
generated by the marking system. Feedback comments should be prioritised and the
highest priority feedback comments should be returned to students. The idea is to
induce an iterative process whereby students are encouraged to successively
improve their solution and then re-submit to receive further feedback.
The idea of assessing the aesthetic appearance of student diagrams is novel. It has not previously been documented in the literature, presumably because diagram-based CBA is a research area in its infancy. Research interest lies in developing a
mechanism flexible enough to encompass multiple diagram domains whilst
providing meaningful feedback to students. The feedback provided should be
acceptable to human markers as a measure of validity. It should be noted that,
although the assessment of diagram aesthetics has been identified as a requirement
of formative assessment in this context, aesthetic assessment is likely to be of interest
more widely, including in summative assessment using CBA.
The idea of a system of mutually exclusive solution cases is also novel to CBA.
Previous CBA research with CourseMarker has relied upon checking for the presence
or absence of defined tokens. If multiple model solutions are feasible then the
wording of the question has been changed to “force” the student to adhere to one
version. This has included explicitly “banning” the use of certain constructions in
the question specification. Question marking involving simulation may provide a
solution to this problem, but the approach is necessarily domain-specific. Fixed-response CBA does not encounter this problem, by virtue of the restrictive nature of its design.
The idea of presenting truncated feedback to students within the context of free-response CBA is also novel. Fixed-response CBA such as MCQs often involves only
one piece of feedback provided to the student. In free-response exercises such as
diagram-based coursework a range of feedback can be generated based upon tests
conducted upon the student answer. However, feedback is presented as a concise list
of all tests conducted upon the solution. The solution here differs because only highpriority feedback is provided to the student, which can change upon each
submission as the student improves the solution; hybrid systems involving human
marking can be used to accomplish the former objective, but it is unlikely that
multiple submissions could be assessed due to resource constraints.
5.1.4 Summary
Section 5.1.1 defined the requirements for CBA, formative assessment and
educational diagrams, based upon research previously summarised in chapters 2, 3
and 4 of the thesis. Section 5.1.2 assessed whether these requirements were met
within the existing CourseMarker / DATsys system and placed the outstanding
requirements — the assessment of diagram aesthetics, the assessment of solutions
containing mutually exclusive solution cases and the presentation of prioritised,
truncated feedback — within context in the general requirements for the areas of
CBA, formative assessment and educational diagrams. Section 5.1.3 demonstrated
how the proposed extensions relate to the aim and motivation of the thesis as
outlined in Chapter 1 and outlined their novelty to the CBA field. Section 5.2 will
consider each of the proposed extensions in turn and outline the detailed
requirements which each extension must achieve. The scope of guidance required for
educators and developers is also examined.
5.2 Detailed Requirements
This work proposes three extensions to the existing CourseMarker / DATsys system, together with guidance for developers and educators to ensure the successful development of domains and exercises, as solutions to the problem of enabling the formative assessment of diagram-based student coursework to be successfully automated using CBA courseware. Section 5.2.1 outlines detailed requirements
necessary to allow student diagrams to be assessed in terms of their aesthetic
properties. Section 5.2.2 outlines detailed requirements necessary to extend the
system to flexibly mark coursework where mutually exclusive alternate solution
cases are allowed. Section 5.2.3 details requirements to allow prioritised, truncated
feedback to be delivered to students. Section 5.2.4 outlines the scope of advice
needed if the resulting CBA courseware is to be used successfully by developers and
educators.
5.2.1 Requirements for assessing the aesthetics of student diagrams
This research aims to enhance the formative assessment of diagrams in a domain
independent way. Therefore, it is clear that the layout of diagrams in many domains
will need to be assessed in a flexible manner.
Any approach to the assessment of diagram layout which aims to provide one
concrete mechanism for the assessment of all diagram domains in a general way will
result in assessment of only the most superficial aspects of diagram layout due to the
conflicting requirements of different domains. Conversely, a ‘blank slate’ approach
based upon applying an entirely different set of rules on a per-domain or even per-exercise basis will result in an unacceptable level of difficulty to the developer
whenever the assessment of a new diagram domain is required.
Within a CBA context it is necessary to enable the marking of aesthetics in new
diagram domains with an acceptable amount of development effort, while still
providing capability to apply different conventions across disparate domains as
required. Concretely, it is necessary to:
• Minimise the effort required to assess the aesthetics of a new diagram domain (to provide a basis for aesthetic assessment across common domains);
• Allow domain disparities to be accommodated through conducting a different aesthetic assessment (to make the system extensible);
• Allow educator preferences and priorities to be reflected (through parameterisation).
Since the extensions will be made to the existing CourseMarker / DATsys
architecture, it is necessary to ensure compatibility and transparency to users.
Specifically, the extensions should be:
• Integrated into the marking system;
• Integrated into the feedback system;
• Able to recognise existing conventions for specifying diagram formats;
• Transparent to students.
Within a formative assessment context the central requirement is feedback. Students
should be provided with motivational feedback which is relevant to the shortcoming
identified by the assessment procedure. Feedback comments should be short and
prioritised; this requirement is further elaborated in section 5.2.3. The central
requirement for assessing the aesthetics of student diagrams within the context of
formative assessment is, therefore, that feedback is provided which is integrated with
the extensions proposed for delivering truncated, prioritised feedback to students.
Within the context of educational diagrams, several requirements are essential to the
approach. It is necessary to:
• Provide a basis for assessing educational diagrams generically;
• Provide a platform for extension to accommodate new domains;
• Assess diagrams based upon justified criteria;
• Allow the relative importance of criteria to be specified, to take into account the fact that not all criteria contribute equally to the general aesthetic of the diagram, as outlined in section 2.3.4.1.
5.2.2 Requirements for assessing solutions with mutually exclusive
alternate solution cases
Once again, the requirements must allow mutually exclusive alternate solution cases
to be assessed by the system in a general, domain-independent way while
minimising the effort required on the part of the exercise developer. Mutually
exclusive alternate solution cases constitute alternate subsets of the model solution. It
is necessary to distinguish between those parts of the solution which are common to
all versions of the model solution and those parts which differ. Consider the simple
example in figure 5.1, which shows two versions of a model solution.
Let M be the set of all model solutions. In the simple problem in figure 5.1 there are two model solutions, M1 and M2. Let I be the set of features common to all model solutions in M, and let Dx = Mx − I, the set of features in model solution Mx which are not common.
[Figure: two node-and-link diagrams; the first comprises nodes A, B and C, the second nodes A, B, E, F and G.]
Figure 5.1: Two mutually exclusive model solutions
In figure 5.1, therefore:
M1 = {A, B, C, (a, b), (b, c)}
M2 = {A, B, E, F, G, (a, b), (b, e), (b, f), (f, g)}
I = {A, B, (a, b)}
D1 = {C, (b, c)}
D2 = {E, F, G, (b, e), (b, f), (f, g)}
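These set relationships can be computed directly with ordinary set operations. In the sketch below, node labels are strings and connections are represented as ordered pairs; this is a notational convenience for illustration, not the system's internal representation.

```python
# Computing the common set I and difference sets Dx for the model
# solutions of figure 5.1, using plain set operations.

M1 = {"A", "B", "C", ("a", "b"), ("b", "c")}
M2 = {"A", "B", "E", "F", "G", ("a", "b"), ("b", "e"), ("b", "f"), ("f", "g")}

I = M1 & M2              # features common to all model solutions
D1, D2 = M1 - I, M2 - I  # Dx = Mx - I

print(I == {"A", "B", ("a", "b")})                                # True
print(D1 == {"C", ("b", "c")})                                    # True
print(D2 == {"E", "F", "G", ("b", "e"), ("b", "f"), ("f", "g")})  # True
```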
If both the commonality and difference across model solutions are to be assessed
successfully, a central requirement within the context of CBA is that the exercise
developer be allowed to specify which features are common to all model solutions
and then to specify the mutually exclusive features. Within this, the former
requirement can be satisfied with relative ease since this effectively duplicates
current features testing within CourseMarker. Within a real world context, however,
the specification of the mutually exclusive solution cases requires flexibility.
In figure 5.1, D1 ∩ D2 = {}. That is to say, the mutually exclusive solution cases contain no common elements. In practice, it cannot be guaranteed that a feature either appears in one and only one mutually exclusive solution case or is common to all. Furthermore, the marking system of CourseMarker acknowledges, through
its system of weighting, that not all features are of equal importance. Similarly,
features within a mutually exclusive solution case may vary in importance.
In an educational context, a mutually exclusive solution case occurs because more than one solution is plausible to the educator. This, in turn, occurs because more than one line of reasoning may be employed to develop a solution. Therefore, it is helpful to determine which features within each mutually exclusive solution case denote the reasoning responsible for the solution case, and which features are dependent upon
the original reasoning by virtue of being a logical continuation.
From this, the requirements for assessing solutions with mutually exclusive alternate
solution cases flow. From a CBA perspective it is necessary to allow the specification
by the exercise developer of the different solution cases. The exercise developer
should be able to specify the common cases and each of the mutually exclusive
solution cases, including which features within the solution case denote the
difference in reasoning, in a way which can be used by the marking mechanism to
assess the student solution appropriately. This is necessary to accommodate the
generic approach which forms the foundation for this work. An exercise developer
should be able to specify assessment criteria across domains with the minimum of
development effort and maximum consistency, but the assessment of domain-specific diagrams must still allow sufficient scope to be meaningful.
Furthermore, from a CBA perspective, the proposed modifications must:
• Be integrated into the marking and feedback system;
• Be seen to maximise transparency to students.
In fact, the first of these objectives leads to the second, since facilitating conventional
operation of the marking system and allowing feedback to be delivered consistently
with conventional CourseMarker CBA exercises will ensure that the student
experience is consistent across CBA domains.
From the perspective of educational diagrams, the extensions must:
• Provide a basis for assessment of a wide variety of educational diagram domains;
• Allow educators to specify criteria for assessment and minimise constraints;
• Allow criterion weighting to account for features of unequal importance.
Specification of common and mutually exclusive features should be consistent across
domains. The system should not prevent educators from applying any set of criteria of
their choice to the exercise; the approach is to supply the exercise developer, who is
assumed to be a domain expert, with the facility for CBA rather than to act in a
prescriptive manner.
Within a formative assessment context the central requirement is feedback. Students
should be provided with motivational feedback which is relevant to the shortcoming
identified by the assessment procedure. It is necessary for the assessment process to
be able to determine which version of the model solution the student is attempting to
attain and to give tailored, motivational feedback based upon the correct, mutually
exclusive solution case which will improve the solution. Again, this extension must
be integrated into the existing feedback mechanism and be compatible with the
mechanism for providing truncated, prioritised feedback outlined in section 5.2.3.
5.2.3 Requirements for prioritising and truncating feedback to
students
In order to fulfil formative assessment criteria, the feedback generated by the
CourseMarker marking system must be truncated so as not to overwhelm the
student. The most relevant information at each submission should be presented to
the student while less relevant comments are omitted (possibly to be presented to the
student after a later submission). Section 2.2.5 outlined criteria by which formative
assessment feedback can be assessed. The primary requirement is to modify the
feedback so that these criteria are met.
For this to occur, it is necessary to define a mechanism whereby feedback comments
can be prioritised. If the relative priority of feedback comments after each submission
can be successfully defined then the task of delivering only the highest priority
comments can be managed. Once the comments are prioritised, the extension must
allow the comments to be delivered in a way which meets truncation criteria suited
to the individual exercise.
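The core of this requirement can be sketched in outline: each feedback comment carries a numeric priority, and only the highest-priority comments are delivered after a submission, with the remainder held back for later submissions. The types and names below are hypothetical illustrations, not part of CourseMarker:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of the requirement: feedback comments carry a numeric
// priority; only the k highest-priority comments are delivered after a
// submission, the rest being withheld until a later submission.
public class FeedbackTruncation {

    public record Comment(String text, double priority) {}

    // Returns the k highest-priority comments, in descending priority order.
    public static List<Comment> topK(List<Comment> comments, int k) {
        List<Comment> sorted = new ArrayList<>(comments);
        sorted.sort(Comparator.comparingDouble(Comment::priority).reversed());
        return sorted.subList(0, Math.min(k, sorted.size()));
    }
}
```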
Consequently, from a formative assessment perspective, the central requirements are
to allow the prioritisation of comments and the definable truncation of the feedback.
Flexibility for the exercise developer is key: it should be possible to define the
truncation preferences to be applied by the marking and feedback delivery systems
when developing the exercise. It is, furthermore, necessary to ensure that the
feedback itself is motivational to students. This last issue will be addressed in the
exercise developer guidelines, the scope of which is outlined in section 5.2.4.
Since this last extension is concerned with prioritising and truncating feedback which
has already been generated, no direct requirements exist for this extension within the
context of educational diagramming.
Within the context of CBA, the central requirements are the integration of the
mechanism for prioritising and truncating feedback into the existing CourseMarker
marking and feedback systems. To allow the process of assessment and feedback
delivery to be automated and online, the mechanism for determining comment
priority must operate without human intervention. This implies that comment
priority must be determinable from the information provided by the exercise
developer and the results generated by the assessment process from the student
submission. Truncation must also be performed using criteria which are specified
during the process of developing the exercise.
5.2.4 Scope of guidance needed for educators and developers
Section 3.1 provided a review of attempts to provide formative assessment
capabilities using automated assessment courseware. Although the capability of the
CBA courseware to provide formative assessment to students is crucial to success by
definition, a central conclusion was that the process of developing the exercises
themselves, most especially their feedback, was also essential if the process of formative
assessment was to contribute maximally to the learning process.
It would be neither necessary nor possible to provide educators and developers with
a complete guide to the development of exercises in all domains. The pedagogical
development of exercises is a research field in its own right and the subject of active
debate. Chapter 2 briefly cited examples of research which aimed to develop
assessment directly from cognitive taxonomies such as Bloom’s taxonomy, but such
methods are unproven even at the domain-specific level.
Conversely, a minimal guide for educators and developers which merely defined
the capabilities of the system and the necessary file formats would fail to provide
educators with sufficient information to maximise the formative assessment potential
of the system. It is necessary to find a “middle-ground”, whereby practical advice
can be provided to educators and developers within the context of developing
formative assessment using the courseware while leaving domain-specific decision-making to be implemented by a specialist.
Chapter 3 noted several examples of the delivery of good, motivational
formative feedback to students using CBA courseware. Phrasing feedback
motivationally and referring to learning material or providing research references
can encourage further student research and motivate further, useful re-submission.
From an educational point of view, it is important to illustrate how good formative
feedback can be developed within the context of CBA. It is, furthermore, important to
emphasise that CBA feedback comments are linked with assessment criteria, with the
result that prioritisation of the underlying assessment criteria must be defined in
such a way that the feedback comments are encountered by the student in a useful
order which reflects the learning curve of the domain materials.
For developers it is necessary to make explicit the relationships between the
pedagogic priorities of the formative assessment and the functions of the courseware
responsible for their implementation. Specifically, guidance must be given to allow
the developer to successfully:
• Indicate the relative priorities of assessment criteria within the exercise;
• Specify and integrate new aesthetic measures or configure existing measures;
• Facilitate the assessment of multiple model solutions by defining their commonality and variation points;
• Specify the method of feedback truncation to be used for the exercise.
5.2.5 Summary
Section 5.2 provided detailed specifications of the requirements which each of the
proposed extensions must attain if the existing drawbacks outlined in section 5.1 are
to be overcome. Section 5.2.1 outlined detailed requirements for diagram aesthetics
to be assessed by courseware, including the need for a flexible and extensible system
which, nevertheless, provides a basis for assessing the aesthetic properties of a wide
range of educational diagram types. Section 5.2.2 considered the requirements for
marking exercises with multiple model solutions through considering the
commonality across and variation between solutions. Requirements for educators to
define those key features which denote the different reasoning between model
solutions were outlined. Section 5.2.3 considered the provision of useful formative
feedback in terms of the necessity to allow the prioritisation of comments and a
system of defining the level of truncation applied to the features-linked feedback
comments. Section 5.2.4 considered the scope of advice needed for educators and
developers to develop formative exercises using the courseware: practical advice on
the development of motivational feedback, plus an explanation of the relationships
between the pedagogic priorities of the formative assessment and the functions of the
courseware responsible for their implementation, are required.
5.3 Summary
Chapter 5 built upon the shortcomings of the CourseMarker / DATsys CBA system
which were identified in Chapter 4. Section 5.1 discussed the requirements inherent
in conducting formative, diagram-based CBA, demonstrated that the courseware can
accommodate some of the requirements, especially those shared with conducting
summative CBA, and placed the proposed extensions to the CBA courseware within
context. The topics of fulfilling the criteria linked with CBA, formative assessment
and educational diagrams were each considered in turn.
Section 5.2 developed detailed requirements for each of the extensions, considered
within the context of CBA, formative assessment and educational diagrams. The
requirements for assessing aesthetic diagram criteria, accommodating mutually
exclusive solution cases and prioritising and truncating the feedback of the system
were each considered in turn. The scope of guidance to be provided to educators and
developers was also discussed. Based upon the requirements outlined within this
chapter, Chapter 6 documents the design decisions made in the context of each of the
extensions and their integration into the existing courseware.
Chapter 6
Designing the extensions
Introduction
This chapter describes a solution to the problem of automating the formative
assessment of diagram-based coursework using CBA courseware. It describes the
process of designing the extensions to the courseware which were identified as
necessary in Chapter 5: assessing the aesthetics of student diagrams, considering
solutions with mutually exclusive alternate solution cases and prioritising and
truncating the student feedback. The design meets the detailed requirements
identified in section 5.2.
Section 6.1 presents a high-level overview of the design. The issues of ensuring that
the design meets the identified requirements and that the components are
successfully integrated are discussed and a brief overview of the approach in each
area is provided. Section 6.2 describes the design of the extension which enables the
assessment of the aesthetics of student diagrams. A series of aesthetic measures are
chosen to represent commonality across educational diagram domains, while
domain-specific structural measures can be implemented through extension. The
hierarchy and weighting system is described, together with the individual aesthetic
measures. Section 6.3 outlines the extension which deals with solutions which
possess mutually exclusive alternate solution cases. The approach is based upon the
notion that some uncommon features, designated harbingers, define the difference in
reasoning between the mutually exclusive cases. Other features, which are present in
all model solutions, are designated common. Responsibility for defining the solution
is defined and the integration into the features testing system is discussed. Section 6.4
presents the extension responsible for prioritising and truncating the feedback
provided to students. The system of prioritisation is described and responsibilities
for users defined. A configurable system for truncating the prioritised results is
presented.
6.1 High Level Overview
The central purpose which motivates the design of the extensions described in this
chapter is to allow research to be conducted with the aim of proving that formative
CBA in diagram-based domains is feasible and useful. This assessment is feasible if
the extensions can be successfully designed and implemented and if exercises in
educational diagram domains can be developed with realistic levels of time and
effort. The assessment is useful if the exercises assist the process of student learning.
The design assumes that the eventual courseware will be used by three distinct
categories of users:
• Developers, who are responsible for developing new diagram domains and carry responsibility for configuring the marking tools and specifying the tests to be conducted;
• Teachers, who are domain experts who can design exercises, including exercise specifications to be presented to students, possible model solutions and useful feedback;
• Students, whose learning process is the focus of the formative assessment.
The approach is intended to facilitate a domain-independent environment where
new domains can be assessed through specification and extension, carried out by
developers. Chapter 4 outlined the reasons for using the CourseMarker / DATsys
system as a development base. The design is, therefore, able to take advantage of
considerable existing design infrastructure. It is incumbent upon the extensions,
however, to integrate with the existing architecture in order to provide a smooth,
coherent experience for the student users.
The design must consider the trade-off between, on the one hand, providing a
realistic basis for formative assessment without overwhelming developmental
requirements and, on the other hand, restricting the cross-domain potential for
assessment through allowing insufficient flexibility in extension. At each stage, the
intention is to provide a concrete basis useful for common, node-link type,
educational diagram domains whilst specifying flexibility through extension points
and parameterisation.
6.1.1 Requirements
Section 5.2 outlined a set of detailed requirements which must be fulfilled by each of
the extensions if success in formative assessment is to be achieved. The design must
be shown to explicitly meet each of these requirements in turn. Generally, the
requirements aim to ensure that the resultant courseware achieves an optimal trade-off between flexibility and developmental effort, integrates seamlessly into the
existing architecture and provides a clearly defined role for each of the system’s
users.
6.1.2 High Level Design
This section aims to provide a brief overview of the design strategy for each of the
extensions. The aim is to provide an introduction of the strategy used to ensure that
the extensions are effective and meet the requirements set out in section 5.2. A high-level overview of the integration between the extensions is then considered in section
6.1.3 before the detailed design decisions are discussed in sections 6.2 to 6.4.
6.1.2.1 Assessing the aesthetics of student diagrams
The design of the extension to allow the assessment of the aesthetic properties of
student diagrams is based around the aggregation of input from a series of measures
which each examine distinct aesthetic properties of the student diagram. Each
measure is applied to the diagram, returning a scaled numeric mark and a piece of
motivational feedback. Some diagram domains are subject to domain-specific
aesthetic rules. For this reason, a key distinction is made between aesthetic measures
and structural measures. These are defined as follows:
• Aesthetic measures are domain-independent and based upon the relationships between the nodes and links within the diagram and the drawing canvas on which the diagram has been created;
• Structural measures are domain-specific and based upon knowledge of the rules governing the relationships between types of links and nodes, as defined by the convention of meaning associated with the educational diagram domain.
These two distinct types of measures encapsulate the commonalities across and
differences between domains of educational diagrams.
Aesthetic measures provide a basis for the marking of general diagram layout across
a range of educational diagram domains. Existing aesthetic measures are based upon
mathematical graph layout criteria and studies of aesthetics in graphical user
interface design. New aesthetic measures may be added by developers upon
discovery, but this process is likely to be irregular. The task of the educator with
regard to aesthetic measures is to specify the relative importance of the aesthetic
criteria through the allocation of weights to the aesthetic measures. This process
reflects the fact that, although certain measures of aesthetics are applicable across
educational diagram domains, their importance varies across domains.
Structural measures provide the means by which the marking of the layout of
student diagrams can be extended to accommodate domain-specific requirements
specified within the convention of meaning of an educational diagram domain.
Structural measures are identified by the educator when a new domain is to be
assessed. The educator is also responsible for defining the relative importance of the
new structural measure through the allocation of a numeric weight. Developers
create new structural measures as required, each time a new educational diagram
domain is to be assessed.
Aesthetic and structural measures both constitute marking tools with similar aims.
The distinction between the two is pedagogical. The way in which the measures are
implemented in each case is similar, although a hierarchy is used to keep the
distinction between the two clear.
The marking scheme is responsible for calling marking tools and providing
parameters. Both aesthetic and structural measures must accept three parameters:
• The student diagram to be assessed;
• The relative weight of the measure;
• A leniency value.
The student diagram to be assessed is represented within DATsys as a Diagram
object. The relative weight of the measure is provided as a real number. The leniency
value is used for linear scaling purposes, based upon the maximum value for the
measure which the educator can reasonably expect the student to obtain.
The criteria against which student diagrams are judged may be derived from
theoretical formulae which assume an ideal diagram scenario. Due to circumstances
beyond the control of the student, it may therefore be impossible to obtain a score of
100% from one or more measures: the nature of the nodes and links (which are
defined by the developer) or the circumstances of the model solution (which is
defined by the educator) may preclude it. If the assessment is to be valid (as defined in table 2.1) then
these external assessment qualities, which are not based upon a reflection of the
ability of the student, should be maximally excluded. Therefore, it is incumbent upon
the developer to determine, by considering the model solutions, the base level in
each criterion which it is reasonable to expect the student to attain. This base level is
then, through linear scaling, "scaled up" to 100%, with other values scaled
proportionately.
Since developers may develop new measures, both aesthetic and structural, to extend
the functionality of the CBA courseware further, it is necessary to define a standard
to which all measures must conform. An interface is used, therefore, which all
aesthetic and structural measures must implement. The interface enforces the
acceptance of the three parameters and the return of a MarkingResult object to
enable integration with the CourseMarker marking system.
Therefore, the architecture of the extension consists of a package layout, which is
located in the CourseMarker marking system and which contains the interface
LayoutToolInterface. The package layout.aesthetic contains classes which
implement the aesthetic measures, while the package layout.structural is
provided to developers to add domain-specific structural measures.
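A sketch of what this arrangement might look like follows. The LayoutToolInterface name and the three parameters are taken from the text, but the exact signature, the MarkingResult fields, the stand-in Diagram type and the example measure are assumptions made for illustration:

```java
// Sketch only: the real LayoutToolInterface, MarkingResult and Diagram are
// CourseMarker / DATsys types; the shapes below are illustrative assumptions.
interface Diagram {}            // stands in for the DATsys Diagram object

class MarkingResult {
    final double mark;          // weighted mark returned by the measure
    final String feedback;      // motivational feedback comment
    MarkingResult(double mark, String feedback) {
        this.mark = mark;
        this.feedback = feedback;
    }
}

// All aesthetic and structural measures implement this interface so that the
// marking scheme can invoke them uniformly with the three parameters.
interface LayoutToolInterface {
    MarkingResult mark(Diagram student, double weight, double leniency);
}

// A trivial example measure (hypothetical): awards the full weight.
class ConstantMeasure implements LayoutToolInterface {
    public MarkingResult mark(Diagram student, double weight, double leniency) {
        return new MarkingResult(weight, "Layout accepted.");
    }
}
```

A real aesthetic measure would compute a raw score from the diagram, scale it against the leniency value and weight the result before constructing the MarkingResult.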
The exercise marking scheme is used to call the aesthetic measures. If structural
measures are present then they too are invoked by the marking scheme. The student
drawing, the relative weight of the measure and the leniency value are passed as
parameters to the measures, which each return a MarkingResult containing the
weighted mark and motivational feedback.
Section 6.2 describes in more detail the concrete design of this extension. The linear
scaling system is described and the LayoutToolInterface interface is fully
defined. Suitable aesthetic measures are chosen from criteria in the fields of graph
layout and GUI design, and their transformation into marking tools in the
layout.aesthetic package is outlined. The commonality in design and
implementation between aesthetic and structural measures is discussed and their
different usage explained. Finally, the integration into the existing CourseMarker
architecture is described and the intersection points made clear. The design is shown
to arise from the detailed requirements listed in section 5.2.1.
6.1.2.2 Assessing solutions with mutually exclusive alternate solution cases
The design of the extension to allow the assessment of solutions with mutually
exclusive alternate solution cases arising from the acceptability of multiple model
solutions is based upon the notion of allowing the definition of features set cases. The
features set cases are derived from the acceptable model solutions. The educator
defines the acceptable model solutions for the exercise as part of the process of
creating the exercise and outlines, to the extent possible, those features which denote
the difference in reasoning which resulted in the model solution case. The developer
takes the model solutions and identifies those elements (nodes and links) which are
common to all model solutions, defined in section 5.2.2 as I . Features tests which
search for the elements in I and the relationships between them are constructed by
the developer. These tests constitute the first features set case, FT0 .
Subsequent features set cases, FT1 ...FTx , contain features tests whose success
depends on the presence of the elements and links present within a mutually
exclusive solution case, defined in section 5.2.2 as D x . These features tests are, by
definition, uncommon features which are not present in all model solutions. The first
features test within each features set case FT1 ...FTx ideally denotes a feature test
whose search criteria check for an element or combination of elements which is
unique to the specific mutually exclusive alternate solution case. This feature test is
known as the distinction test since it is used to distinguish between alternate mutually
exclusive solution cases. The element or combination of elements in the model
solution case which is used to distinguish the case from all others is called the
harbinger.
In the ideal situation an element or combination of elements which represent the line
of reasoning that resulted in the student arriving at that particular version of the
model solution is identified by the educator and used by the developer for the
distinction test; in this case a perfect harbinger has been found. This situation is ideal
since the feedback can be focused on the specific line of reasoning associated with the
specific model solution.
It is possible that the educator is unable to describe a precise line of distinct
reasoning which leads to each version of the model solution. The design can still be
used to assess student solutions where multiple model solutions are plausible
without it. The minimum requirement for the distinction test is that it searches for an
element or combination of elements which is unique to the specific model solution;
such an element or combination of elements must exist or, by definition, the features
set case is not assessing a mutually exclusive alternate solution case. Such an element
or combination of elements is an imperfect harbinger.
After creating the teaching materials, exercises and one or more model solutions, the
educator is responsible for highlighting those unique elements within each model
solution which will be used to determine the distinction test. The educator also
prioritises the features and is responsible for generating positive, motivational
feedback. This responsibility lies with the educator across a wide variety of
automated assessment cases; the task is onerous, but since the exercises may be
repeatedly re-used in a formative assessment context the time can be justified.
Guidance for generating positive, motivational feedback is outlined in chapter 7.
Features tests associated with common elements, FT0 , are placed into the first
features tests file, [exercisename].ft0. Features test cases FT1 ...FTx are placed
into features test files [exercisename].ft[x] with the features tests representing
the distinction test defined as the first features test in each file. The structure of
individual features tests remains, as previously defined in chapter 4, as follows:
• Mark weight : Feature expression : Description : Positive feedback : Negative feedback
The DiagramFeaturesTool marking tool is based upon the same principles as the
EntityRelationshipTool tool described in section 4.2. Nodes are identifiable by
their Name and Text Content while links are identifiable by their Name, Start Node
and End Node. The features expression types exist, exact, connection and
exactConnection have usage and meaning consistent with EntityRelationshipTool. The
feature expression type compositeRelationship is included for backward compatibility
purposes but its usage is discouraged due to its domain-specific nature.
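Assuming the five fields are separated by ':' characters and that no field contains a ':' itself, a parser for one line of a features test file might be sketched as follows (the class is hypothetical, and the example expression syntax in the test is invented for illustration):

```java
// Hypothetical sketch of parsing one line of a features test file, assuming
// the five fields are separated by ':' as described above:
//   Mark weight : Feature expression : Description : Positive : Negative
public class FeaturesTestLine {

    public final double weight;
    public final String expression, description, positive, negative;

    public FeaturesTestLine(String line) {
        String[] f = line.split(":");
        if (f.length != 5) {
            throw new IllegalArgumentException("expected 5 ':'-separated fields");
        }
        this.weight = Double.parseDouble(f[0].trim());
        this.expression = f[1].trim();
        this.description = f[2].trim();
        this.positive = f[3].trim();
        this.negative = f[4].trim();
    }
}
```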
The exercise marking scheme is used to call the DiagramFeaturesTool once for
each features test file. The student drawing file and the correct features test file are
passed to the DiagramFeaturesTool as parameters as each call is made. The
DiagramFeaturesTool returns a MarkingCompositeResult at each call. The
remaining task is to parse the MarkingCompositeResult tree to determine which
of the mutually exclusive solution cases the student solution attempts to emulate.
The remaining MarkingCompositeResult objects can be pruned from the tree
accordingly.
The process of parsing the marking tree to determine the best solution case to
consider and then truncating the tree to remove the other cases is accomplished as
part of the responsibility of the PrioritiseTruncateTool described in section
6.1.2.3. Since much of the functionality of truncating and prioritising the tree
duplicates the functionality required to prioritise and truncate the student feedback
in general, the construction of a separate tool was not justified. The allocation of
responsibility for prioritising the alternate mutually exclusive solution cases to the
PrioritiseTruncateTool constitutes the primary relationship and interdependency
between the extension to enable the assessment of mutually exclusive solution cases
and the extension to prioritise and truncate student
feedback. The integration of the extensions is summarised more generally in section
6.1.3.
Section 6.3 describes in more detail the concrete design of this extension. The
responsibility of users, the defining of features and the use of the marking scheme to
search for features are described. Finally, the integration into the existing
CourseMarker architecture is described and the intersection points made clear. The
design is shown to arise from the detailed requirements listed in section 5.2.2.
6.1.2.3 Prioritising and truncating feedback to students
The design of the extension to allow the delivery of prioritised and truncated
feedback to students is based around providing an extensible, flexible and
configurable mechanism for the developer to encapsulate different methods of
prioritisation and truncation, based upon the wishes of the educator.
The problem of prioritising and truncating the feedback to students can be divided
into four smaller tasks, each of which may be accomplished in a number of different
ways. The four smaller tasks are:
1. Establishing which of the competing mutually exclusive solution cases has
the highest priority and deciding what course of action to take with regard to
the feedback generated by the other mutually exclusive solution cases;
2. Prioritising the feedback provided by all features tests;
3. Prioritising the feedback generated by the aesthetic and structural measures;
4. Truncating the feedback.
Multiple strategies could be applied to the solving of each of these smaller tasks,
depending upon context-dependent factors such as the nature of the domain, the
details of the assessment, the type of the students and the preferences of the
educator.
A strategy for solving the first problem could be to examine the distinction test for
each mutually exclusive solution case, determining the distinction test with the
highest score and pruning all other mutually exclusive solution cases from the
feedback tree. An alternative strategy could examine other features tests within the
mutually exclusive solution case to determine if features from multiple cases were
being confused by the student.
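The first of these strategies might be sketched as follows, with each mutually exclusive solution case summarised by the score its distinction test achieved (the types and names are hypothetical illustrations):

```java
import java.util.List;

// Sketch of the first strategy described above (hypothetical types): each
// mutually exclusive solution case is summarised by the score its distinction
// test achieved; the case whose distinction test scored highest is retained,
// and all other cases would then be pruned from the feedback tree.
public class DistinctionSelector {

    public record SolutionCase(String name, double distinctionScore) {}

    // Returns the index of the case the student solution most plausibly targets.
    public static int bestCase(List<SolutionCase> cases) {
        int best = 0;
        for (int i = 1; i < cases.size(); i++) {
            if (cases.get(i).distinctionScore() > cases.get(best).distinctionScore()) {
                best = i;
            }
        }
        return best;
    }
}
```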
A strategy to solve the second problem could sort the feedback from the common
elements features tests FT0 together with the remaining mutually exclusive solution
cases to determine the most important feedback overall. Alternatively, a strategy could
prioritise comments from FT0 in the event of a low overall mark and introduce
feedback from mutually exclusive solution cases only when the student solution
passes a given threshold.
A strategy to solve the third problem could inter-mingle aesthetic and structural
measures and determine the highest priority comments overall, or try to prioritise
structural measures in the early stages to ensure domain correctness before
emphasising feedback comments from aesthetic measures later once a threshold is
reached.
Truncating the feedback could involve retaining a specific number of features
comments and a specific number of layout comments, both specified by the educator.
Alternatively, the topmost percentage of comments above a threshold could be
provided to the student.
The design makes use of the object-oriented Design Pattern known as Strategy
[GHJ+94]. The intent of the Strategy pattern is to define a family of interchangeable
algorithms, allowing the algorithm to vary independently from the clients which
make use of it.
The Strategy pattern has three participants: the Strategy, the Concrete Strategy and the
Context. The Strategy defines an interface common to all supported algorithms, while
a Concrete Strategy implements a specific algorithm whilst conforming to the Strategy
interface. The Context is configured with a Concrete Strategy object, maintains a
reference to a Strategy object and may define an interface which lets Strategy access its
data.
In this case the four tasks to be completed in the prioritisation and truncation of
student feedback are translated into four Strategy interfaces. These are, in order of the
problems listed above, the SolutionCaseStrategy, FeaturesSortStrategy,
AestheticsSortStrategy and TruncationStrategy interfaces.
The PrioritiseTruncateTool acts as the Context to all four strategies. The tool is
configured with four objects representing the configured algorithm to be employed
at each of the four stages of the prioritisation and truncation process.
The educator specifies the methodology to be used at each stage of the prioritisation
and truncation process. The developer then develops a Concrete Strategy for each
stage containing an algorithm which encapsulates the methodology defined by the
educator, and which conforms to the Strategy interface responsible for the specific
stage of the process. The educator can then specify parameterisation on a course-specific (or, if desired, exercise-specific) basis.
For example, an educator could decide on a truncation strategy for stage 4 of the
process which involves retaining a specified number of feature-related feedback
comments and another specified number of layout-related feedback comments. This
methodology could be used by the developer to develop a Concrete Strategy called
OrdinateTruncationConcreteStrategy which conforms to the
TruncationStrategy interface. For a given exercise, the educator could decide to
retain precisely the 2 most relevant feature-related feedback comments and the 2
most relevant layout-related feedback comments. This information is used, in the
form of parameters, in the construction of a new
OrdinateTruncationConcreteStrategy object which, in turn, is used as one of
the four Concrete Strategies necessary to configure the PrioritiseTruncateTool.
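A concrete strategy of this kind might be sketched as follows. The sketch is illustrative only: the plain comment lists, the demo class and the exact method signature are assumptions made for brevity, since the real strategy operates on Composite Marking Result trees within CourseMarker.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: the real TruncationStrategy operates on Composite Marking
// Result trees; plain comment lists are used here for illustration.
interface TruncationStrategy {
    List<String> truncate(List<String> featureComments, List<String> layoutComments);
}

// Retains a fixed number of feature-related and layout-related comments,
// both counts being supplied by the educator as constructor parameters.
class OrdinateTruncationConcreteStrategy implements TruncationStrategy {
    private final int featureCount;
    private final int layoutCount;

    OrdinateTruncationConcreteStrategy(int featureCount, int layoutCount) {
        this.featureCount = featureCount;
        this.layoutCount = layoutCount;
    }

    @Override
    public List<String> truncate(List<String> featureComments, List<String> layoutComments) {
        List<String> kept = new ArrayList<>();
        // The comment lists are assumed to arrive already sorted by priority.
        kept.addAll(featureComments.subList(0, Math.min(featureCount, featureComments.size())));
        kept.addAll(layoutComments.subList(0, Math.min(layoutCount, layoutComments.size())));
        return kept;
    }
}

public class TruncationDemo {
    public static List<String> demo() {
        // Educator parameterisation: keep the 2 top feature comments and
        // the 2 top layout comments.
        TruncationStrategy strategy = new OrdinateTruncationConcreteStrategy(2, 2);
        List<String> features = List.of("missing node A", "wrong link A-B", "extra node C");
        List<String> layout = List.of("nodes overlap", "lines cross", "poor balance");
        return strategy.truncate(features, layout);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // four comments retained: two per category
    }
}
```

Because the strategy object carries its own parameters, the PrioritiseTruncateTool needs no knowledge of how truncation is performed.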
Section 6.4 describes in more detail the concrete design of this extension. The
PrioritiseTruncateTool is defined, along with the four Strategies used for
configuration. The responsibility of users is made explicit, the integration into the
existing CourseMarker architecture is described and the intersection points are made
clear. The design is shown to arise from the detailed requirements listed in section
5.2.3.
6.1.3 Extension Integration
Figure 6.1 illustrates a high-level view of the relationships between the extensions
discussed in this section. The student solution is marked through a Marking Scheme
which invokes configured instances of each extension in turn.
[Figure: the student solution and the exercise-specific configuration feed into mutually exclusive features marking (common features tests; mutually exclusive features tests) and aesthetic layout marking (aesthetic measures configuration; structural measures configuration); the full feedback then passes through prioritisation and truncation (1. solution case priority and pruning; 2. features prioritisation; 3. aesthetics prioritisation; 4. truncation) to produce the prioritised, truncated student feedback.]
Figure 6.1: A high-level view of the relationships between the extensions
A Composite Marking Result is created by the Marking Scheme to hold the feedback
returned by each extension. A Composite Marking Result operates using a tree-like
structure designed to be intuitive to the student when feedback is presented. Each
node on the Composite Marking Result may be either a Marking Leaf Result, which
contains a mark value, weight value, description and feedback comment or,
recursively, another Composite Marking Result.
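A minimal sketch of such a tree, using the Composite pattern, might look like the following. The leafCount method and the demo scenario are illustrative assumptions; only the field names of the Marking Leaf Result follow the description in the text.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: the tree-walking method is illustrative, not CourseMarker's API.
interface MarkingResult {
    int leafCount();
}

class MarkingLeafResult implements MarkingResult {
    final int markValue;      // final, scaled mark returned by the test
    final int weight;         // relative weight of the feedback result
    final String description; // description of the test
    final String feedback;    // feedback comment returned to the student

    MarkingLeafResult(int markValue, int weight, String description, String feedback) {
        this.markValue = markValue;
        this.weight = weight;
        this.description = description;
        this.feedback = feedback;
    }

    public int leafCount() { return 1; }
}

class CompositeMarkingResult implements MarkingResult {
    private final List<MarkingResult> children = new ArrayList<>();

    void add(MarkingResult child) { children.add(child); }

    // Each child is either a leaf or, recursively, another composite.
    public int leafCount() {
        int n = 0;
        for (MarkingResult child : children) n += child.leafCount();
        return n;
    }
}

public class CompositeDemo {
    public static int demo() {
        CompositeMarkingResult root = new CompositeMarkingResult();
        CompositeMarkingResult aesthetics = new CompositeMarkingResult();
        aesthetics.add(new MarkingLeafResult(80, 2, "Equilibrium", "Centre your diagram"));
        aesthetics.add(new MarkingLeafResult(100, 1, "Non-interception", "No lines cross"));
        root.add(aesthetics);
        root.add(new MarkingLeafResult(60, 3, "Features", "Node X is missing"));
        return root.leafCount();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // three leaves across two levels
    }
}
```

The recursive structure is what later allows whole branches (for example, a pruned solution case) to be removed in one operation.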
The first two extensions may be invoked in either order. The extension which solves
the problem of assessing multiple, mutually exclusive solution cases within the
student solution runs features tests for the common features and each case of the
mutually exclusive features identified by the marking scheme. A Composite Marking
Result is generated upon each run-through of the features test tool
DiagramFeaturesTool. Each of the Composite Marking Results contains a
Marking Leaf Result for each features test specified in the test case. If there are no
mutually exclusive solution cases then only one Composite Marking Result is
generated, that for the common features.
The extension for aesthetic layout marking runs the aesthetic measures and the
structural measures specified in the configuration. Two Composite Marking Results
are added to the feedback: the first contains the Marking Leaf Result feedback
generated by the aesthetic measures, whilst the second contains that generated by the
structural measures. If no structural measures are invoked then this Composite
Marking Result is empty.
The extension for the prioritisation and truncation of student feedback is invoked
last. The PrioritiseTruncateTool is configured using four legal Concrete
Strategies, each of which is applied in order. The first strategy prioritises the mutually
exclusive solution cases to decide which mutually exclusive solution case is most
relevant to the student solution. Depending upon the strategy, all other mutually
exclusive solution cases may be subsequently ignored by pruning the appropriate
Composite Marking Result branches. The second and third strategies prioritise the
feedback branches provided by the features tests and the aesthetic layout tests
respectively. The final strategy truncates the feedback.
The resultant Composite Marking Result, modified at each stage, is presented to the
student using CourseMarker’s existing feedback mechanism.
6.1.4 Summary
This section provided a prose overview of the design strategy for each of the
three courseware extensions proposed within this work. The assessment of the
aesthetic layout of student diagrams is accomplished through distinguishing
between aesthetic measures, which are domain-independent and structural
measures, which are domain-specific. An interface is defined to ensure compatibility
between the measures. The assessment of student solutions where multiple model
solutions are viable, containing mutually exclusive solution cases, is accomplished by
identifying the commonality and variation between model solutions. Common
features are assessed using a first solution case, while subsequent solution cases
assess the remaining features within each model solution. The fundamental
difference between solution cases is identified by a harbinger. Prioritisation and
truncation of feedback is accomplished through specifying four sub-tasks. The
algorithm for each sub-task can be specified differently within context but must
implement an interface for compatibility. The design for this third extension was
based upon the Strategy design pattern.
Finally, the section provided an overview of the relationships between the three
extensions.
This section aimed to provide an overview and a sense of context to each of the
extensions. The subsequent sections within this chapter offer further detail, as
required, on the specific design decisions taken within the context of each extension,
for the purposes of ensuring that the specific requirements identified in chapter 5 are
fulfilled.
6.2 Assessing the aesthetic layout of student diagrams: resolving
the design issues
Section 6.1.2.1 outlined the approach to assessing the layout of student diagrams. The
approach is based upon implementing marking tools to assess a wide variety of
marking criteria. Broadly, these criteria can be divided into aesthetic measures,
which are domain-independent, and structural measures, which are domain-specific.
A hierarchy was described which separates the aesthetic and structural measures
into separate packages. The marking tools for each marking criterion must implement
an interface and are invoked by the exercise marking scheme. This section
demonstrates the link between the design decisions and the detailed criteria for the
extension provided in section 5.2.1. Concrete design decisions for the class hierarchy,
top-level interface, scaling mechanism, aesthetic measures and structural measures
are made explicit, such that implementation may be achieved.
6.2.1 Linking the design to the requirements
Section 5.2.1 identified the key requirement in assessing the aesthetics of student
diagrams as the task of ensuring that diagrams in many domains can be assessed in a
flexible manner. The system of aesthetic and structural measures achieves this
through considering the domain context of each marking criterion. Aesthetic
measures are domain-independent and can be called upon to assess diagrams from
many educational domains, while the educator is able to specify further, domain-specific, criteria to be implemented as structural measures.
The design also minimises the effort required to assess the aesthetics of a new
diagram domain. Only the unique features of a diagram domain need be adapted into
structural measures. Most common attributes can be assessed by the existing
aesthetic measures, which provide a basis for all domains. It is plausible that, in
many cases, no unique layout rules exist for a new educational domain, in which
case the sole task is to specify the relative weights of the existing aesthetic measures.
Structural measures should not be regarded as indispensable for every diagram
domain — they simply act as an extension point through which the layout rules
unique to particular domains can be accommodated. Educator priorities may be expressed through changing the
relative weights of the aesthetic and structural measures. There is also the further
option of defining prioritisation strategies between aesthetic and structural measures,
as discussed in section 6.1.2.3.
The extension is integrated into the existing marking and feedback system. Aesthetic
and structural measures are CourseMarker marking tools which can be called from
within the exercise marking scheme. The measures return their results as a
CourseMarker standard marking result which can be returned to the student using
the existing feedback mechanism. In practice, the marking result will be subject to
modification by the extension to enable the delivery of prioritised, truncated
feedback before it is presented to the student, but the process is still transparent from
the student’s point of view. The aesthetic and structural measures recognise the
existing conventions for specifying diagram formats (in Daidalos) through the
mechanism of enumerating the diagram components, in the same way as existing
diagram features marking tools.
The extension provides a basis for assessing educational diagrams generically
through the implementation of aesthetic measures, while structural measures
constitute the platform for extension. Aesthetic measures for diagrams are based
upon justified criteria. Aesthetic measures are drawn from documented aesthetic
criteria in the fields of graph layout and user interface design with demonstrated
real-world application. Structural measures can be specified on a domain-specific
basis by a domain expert. The relative importance of the criteria, based upon either
research, anecdotal evidence or simply the “gut feeling” of the educator, can be
specified through the system of weighting.
6.2.2 Hierarchy
Figure 6.2 illustrates the hierarchy of the extension. The layout package is
positioned at the top-level of the hierarchy, while both the aesthetic and
structural package occupy one level beneath the layout package. The
LayoutToolInterface interface is located within the layout package.
[Figure: the layout package, containing the <<interface>> LayoutToolInterface, with the aesthetic and structural packages beneath it.]
Figure 6.2: The hierarchy of the aesthetic layout extension
Figure 6.3 illustrates the locations of Marking Tools representing both aesthetic and
structural measures. Marking tools for aesthetic measures are located within the
aesthetic package whilst marking tools for structural measures are located within
the structural package. Both aesthetic and structural measures must implement
the LayoutToolInterface interface.
[Figure: marking tools such as BalanceTool, EquilibriumTool, …, CohesionTool in the aesthetic package, and StructuralToolA, StructuralToolB, …, StructuralToolX in the structural package, all implementing the <<interface>> LayoutToolInterface.]
Figure 6.3: Aesthetic and structural measures implement LayoutToolInterface
6.2.3 Interface
The LayoutToolInterface interface is shown in figure 6.4. The interface contains
one method which must be implemented: mark.
<<interface>> LayoutToolInterface
+mark( Drawing, int, double) : MarkingLeafResult
Figure 6.4: The LayoutToolInterface interface
The mark method requires three parameters: the student diagram to be assessed, the
relative weight and the leniency value for scaling purposes. The method returns a
MarkingLeafResult. The MarkingLeafResult is defined within the
CourseMarker marking system as an extension of TMarkingResult which
encapsulates the following data:
• markvalue, an integer representing the final, scaled mark returned by the test;
• weight, an integer representing the relative weight of the feedback result;
• description, a String holding the description of the test;
• feedback, a String holding the feedback returned by the test.
6.2.4 Scaling
Aesthetic and structural measures use scaling to translate the raw score into a
suitable mark to return embedded in the MarkingLeafResult. The linear scaling
mechanism, which has been in use since the time of Ceilidh [ZF94], requires but a
trivial modification to take into account the fact that aesthetic and structural
measures return scores between 0 and 1. Figure 6.5 illustrates the simple relationship
between the raw score and the scaled mark.
[Figure: the scaled mark plotted against the raw score, rising linearly from zero until the leniency factor a is reached and remaining constant at 100% thereafter.]
Figure 6.5: The relationship between the raw score and the scaled mark
The value a represents the leniency factor, the maximum raw score which the
educator feels it is reasonable to expect the student to achieve for this measure. All
raw scores above the leniency factor are scaled to 100%, while scores below the
leniency factor are scaled to a percentage of a.
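The scaling relationship in figure 6.5 can be expressed directly. The class and method names below are illustrative; only the behaviour — raw scores at or above the leniency factor a map to 100%, lower scores to the proportion raw / a — follows the text.

```java
public class Scaling {
    // Linear scaling with leniency factor a: raw scores at or above a map to
    // 100%, scores below a map to the proportion raw / a of the full mark.
    public static double scale(double raw, double a) {
        if (raw >= a) return 100.0;
        return (raw / a) * 100.0;
    }

    public static void main(String[] args) {
        System.out.println(scale(0.9, 0.8)); // above leniency: full marks
        System.out.println(scale(0.4, 0.8)); // half of leniency: 50%
    }
}
```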
6.2.5 Aesthetic measures
Section 2.3.4 introduced the concept of educational diagram aesthetics and examined
aesthetic criteria from the fields of graph layout and user interface design.
Furthermore, section 2.3.4 explained that graph layout criteria from syntactic graphs
must be shown to be justified in a real-world context before being used to assess
educational diagrams. The process of choosing aesthetic criteria, from which to
develop aesthetic measures, was based upon the notion that several requirements
must be considered:
•
The criterion must not be domain-specific and must have the potential for
relevance across a wide variety of educational diagram domains;
•
The criterion must be able to be expressed algorithmically such that a
numeric value in the range 0 to 1 can be assigned to a student diagram to
indicate student compliance;
•
The criterion must not require the student to conduct complex modification to
their solution for the purposes of “assisting” the algorithm to be successfully
applied.
On the basis of these requirements, eleven criteria were chosen to be implemented as
aesthetic measures. Two of these, the principles of non-intersection and non-interception, were taken from the field of graph layout. The remaining nine —
balance, equilibrium, unity, proportion, simplicity, density, economy, homogeneity
and cohesion — were taken from the field of user interface design. The criteria fulfil
the first two requirements completely. The third requirement is not fulfilled by
several of the criteria taken from the field of user interface design, which require
the student to modify their solution slightly prior to submission.
Subsequent sub-sections document the process of transforming these criteria into
aesthetic measures. Section 6.2.5.1 documents the process of designing the aesthetic
measures to assess non-intersection and non-interception, including the need to
identify a suitable formula, provide a clear design for the class, scale the raw score
and conform to the LayoutToolInterface interface. Formulae already exist for
calculating compliance with the criteria taken from graphical user interface design.
Section 6.2.5.2 documents the process of creating an aesthetic measure to assess the
property of equilibrium, while section 6.2.5.3 provides an overview of creating the
other, similar aesthetic measures. Finally, section 6.2.5.4 describes and justifies the
student modifications which must be made if several of the measures inspired by
graphical user interface design are to be effectively measured and demonstrates that
the required modification is insufficiently major to disrupt student learning.
6.2.5.1 The aesthetic measures for non-interception and non-intersection
The first necessary step in the process of designing an aesthetic measure is the
identification of an appropriate method to determine compliance with the criterion
numerically. Non-interception, as discussed in section 2.3.4.2, refers to minimising
the number of lines in the diagram which cross over other lines. Non-intersection
refers to minimising the number of nodes which intersect other nodes. In this case
equation 6.1 is sufficient to determine non-interception, where c is the number of
valid lines that intercept another and l is the number of valid lines on the canvas.
M_non-interception = 1 − c / l
Equation 6.1: The non-interception measure
NonInterceptionTool
+mark( Drawing, int, double ) : MarkingLeafResult
- noninterception( Drawing ) : double
- lineCrossed( Figure, Drawing ) : boolean
- linesCross( Figure, Figure ) : boolean
<<interface>>
LayoutToolInterface
Figure 6.6: The design of the non-interception tool
The design for the NonInterceptionTool is simple. The linesCross method
returns true if two line figures cross each other. The lineCrossed method returns
true if a line is crossed by any other line in the drawing by repeatedly invoking
linesCross. The noninterception method counts the number of lines and the
number of lines which are crossed before applying equation 6.1 to obtain the raw
score. The mark method, which must be defined in order to implement the
LayoutToolInterface interface, invokes noninterception to obtain the raw
score, applies scaling to obtain the mark and calls the MarkingLeafResult
constructor, supplying the scaled mark and weight, the internal description and the
associated feedback before returning the MarkingLeafResult. Figure 6.6 presents
an overview of the NonInterceptionTool using this design.
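A sketch of this calculation, reducing each line figure to a pair of endpoints, might look as follows. The Segment type, the demo drawing and the demo method are assumptions; the real tool walks DATsys Figure objects and restricts itself to valid lines.

```java
import java.awt.geom.Line2D;
import java.util.List;

public class NonInterception {
    // Illustrative stand-in for a DATsys line Figure: just two endpoints.
    static final class Segment {
        final double x1, y1, x2, y2;
        Segment(double x1, double y1, double x2, double y2) {
            this.x1 = x1; this.y1 = y1; this.x2 = x2; this.y2 = y2;
        }
    }

    // True if two line figures cross each other.
    static boolean linesCross(Segment a, Segment b) {
        return Line2D.linesIntersect(a.x1, a.y1, a.x2, a.y2, b.x1, b.y1, b.x2, b.y2);
    }

    // True if a line is crossed by any other line in the drawing.
    static boolean lineCrossed(Segment line, List<Segment> drawing) {
        for (Segment other : drawing) {
            if (other != line && linesCross(line, other)) return true;
        }
        return false;
    }

    // Equation 6.1: M = 1 - c / l, with c crossed lines out of l lines.
    static double noninterception(List<Segment> drawing) {
        if (drawing.isEmpty()) return 1.0;
        int crossed = 0;
        for (Segment line : drawing) {
            if (lineCrossed(line, drawing)) crossed++;
        }
        return 1.0 - (double) crossed / drawing.size();
    }

    public static double demo() {
        List<Segment> drawing = List.of(
            new Segment(0, 0, 10, 10),   // crosses the second line
            new Segment(0, 10, 10, 0),   // crosses the first line
            new Segment(20, 0, 30, 0));  // crosses nothing
        return noninterception(drawing); // 2 of 3 lines crossed
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Note that Line2D.linesIntersect also reports segments that merely touch at an endpoint, so a production version would need to decide how connected lines are treated.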
Similarly, the design of the non-intersection tool is based upon equation 6.2, in which
o is the number of nodes that intersect at least one other node, while t is the total
number of valid nodes on the canvas. Non-intersection refers to minimising the
number of valid nodes in the diagram which overlap. The NonIntersectionTool
operates similarly to the NonInterceptionTool, and is summarised in figure 6.7.
M_non-intersection = 1 − o / t
Equation 6.2: The non-intersection measure
NonIntersectionTool
+mark( Drawing, int, double ) : MarkingLeafResult
- nonintersection( Drawing ) : double
- figOverlaps( Figure, Drawing ) : boolean
- figsOverlap( Figure, Figure ) : boolean
<<interface>>
LayoutToolInterface
Figure 6.7: The design of the non-intersection tool
6.2.5.2 The aesthetic measure for equilibrium
This section outlines the process of designing the aesthetic measure for equilibrium.
The process of designing the aesthetic measures for balance, unity, proportion,
simplicity, density, economy, homogeneity and cohesion was very similar to the
process of designing the aesthetic measure for equilibrium; section 6.2.5.3 discusses
the design process of these aesthetic measures, based upon the process described
here.
Table 2.3 has previously given a brief description of the equilibrium criterion as “The
difference between the centre of mass of the elements and the physical centre of the screen /
canvas”. Ngo et al [NTB00] provide an extended definition of equilibrium, together
with formulae to enable equilibrium to be calculated. These formulae are reproduced
here as equations 6.3, 6.4 and 6.5.
EM = 1 − ( |EM_x| + |EM_y| ) / 2
Equation 6.3: Equilibrium

EM_x = 2 Σ_i a_i (x_i − x_c) / ( n · b_frame · Σ_i a_i )
Equation 6.4: x-axis equilibrium component

EM_y = 2 Σ_i a_i (y_i − y_c) / ( n · h_frame · Σ_i a_i )
Equation 6.5: y-axis equilibrium component
In equations 6.3, 6.4 and 6.5, EM is the equilibrium measure, EM_x is the x-axis
equilibrium component, EM_y is the y-axis equilibrium component, (x_i, y_i) and
(x_c, y_c) are the co-ordinates of the centres of object i and the frame respectively,
a_i is the area of object i, b_frame and h_frame are the width and height of the frame
and n is the number of objects on the frame.
The design process for the aesthetic measure for equilibrium is similar to that for the
non-interception and non-intersection measures discussed in section 6.2.5.1, but the
need to develop mathematical formulae to enable the calculation of the measure
numerically is obviated by the existence of such formulae in the existing literature. A
mark method calls a method equilibrium which invokes methods to calculate the
x- and y-axis equilibrium components, and so on. The design of the aesthetic
measures for criteria based upon user interface design principles is thus rendered a
straightforward, if laborious, process.
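Under the assumption that each object is reduced to its centre co-ordinates and area, equations 6.3 to 6.5 might be computed as follows. The method signature and the array-based object representation are illustrative, not CourseMarker's API.

```java
public class Equilibrium {
    // Equations 6.3-6.5 [NTB00]: each row of objs is {xi, yi, ai}, the centre
    // co-ordinates and area of object i; bFrame and hFrame are the frame's
    // width and height.
    public static double measure(double[][] objs, double bFrame, double hFrame) {
        int n = objs.length;
        double xc = bFrame / 2.0, yc = hFrame / 2.0; // frame centre
        double sumA = 0, sumX = 0, sumY = 0;
        for (double[] o : objs) {
            sumA += o[2];
            sumX += o[2] * (o[0] - xc);
            sumY += o[2] * (o[1] - yc);
        }
        double emX = (2.0 * sumX) / (n * bFrame * sumA); // equation 6.4
        double emY = (2.0 * sumY) / (n * hFrame * sumA); // equation 6.5
        return 1.0 - (Math.abs(emX) + Math.abs(emY)) / 2.0; // equation 6.3
    }

    public static double demo() {
        // Two equal objects placed symmetrically about the frame centre:
        // the centre of mass coincides with the frame centre, so EM = 1.
        double[][] objs = { {25, 50, 100}, {75, 50, 100} };
        return measure(objs, 100, 100);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```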
The Figure objects embedded in each Drawing object can be accessed by means of
a FigureEnumeration. Each Figure contains ‘getter’ methods for attributes
including its centre and size. Consequently, the only modification required to
complete the required calculation is the specification of the diagram’s border. This
procedure, and the pedagogical issues surrounding it, are outlined in section 6.2.5.4.
6.2.5.3 The aesthetic measures for balance, unity, proportion, simplicity,
density, economy, homogeneity and cohesion
The aesthetic measures for balance, unity, proportion, simplicity, density, economy,
homogeneity and cohesion were developed in the same way as the aesthetic measure
for equilibrium. Mathematical formulae allowing these measures to be determined
and compliance expressed numerically are already available in the literature. Ngo et
al [NTB00] provide an overview of interface aesthetics. The aesthetic measures for
balance, unity, proportion, simplicity, homogeneity and cohesion were based around
the equations presented in [NTB00]. The aesthetic measures for density and economy
were based upon the equations published in [NB00]. Section 6.2.5.4 justifies the
diagram modifications which must be made by the student if several of the measures
inspired by graphical user interface design are to be effectively measured.
6.2.5.4 The need for students to adapt their solutions
Section 3.3 emphasised that a key benefit of the DATsys framework and the Theseus
student diagram editor was the ability to allow the student to draw their solution
onto a canvas in an interactive and intuitive way. It would have been possible for
students to specify their diagram solution using other means, for example a
proprietary text-based notation entered through a text editor, but this would have
created an extra layer of abstraction between the student and the solution, hence
unnecessarily hindering the learning process. Section 3.1.2.1 described the Kassandra
system, where students were indeed expected to adapt their solutions to the
requirements of the marking system.
It is clear that the level of disruption to the student learning process is related to the
amount of modification which the student is required to perform. The student
learning process will also be impacted less if the modification can be understood
easily by the student, rather than involving requirements which are not understood
by the student and are viewed as “abstract”.
[Figure: a drawing canvas with x and y co-ordinate axes.]
Figure 6.8: The co-ordinate system in DATsys diagram editors
In order for the aesthetic measures based upon graphical user interface design
principles to be successfully calculated, it is necessary to define the boundaries of the
student diagram. This is necessary if certain properties of the diagram, such as its
centre, are to be calculated. Within the DATsys framework, drawings are allowed an
unbounded size, based upon a grid system of co-ordinates, as illustrated in figure
6.8. Students can use both scroll and zoom facilities to traverse large diagrams. One
possible solution was to impose a canvas size upon the student for each exercise —
this solution, however, is prescriptive to the student and fails to take into account
that different diagram sizes may be required for different model solutions. The
solution adopted, therefore, was to allow the student to specify the boundaries of
their own diagrams by drawing a BorderRectangle around their solution, prior to
submission. The extent of the student modifications is illustrated in figure 6.9.
Figure 6.9: Original student solution and student solution with modification
In figure 6.9 an illustrative student solution is shown both before and after
modification. For the student to make the modification, they must select the
BorderRectangle tool from the library and use it to draw the rectangular
boundaries of their solution. Only figures within the boundaries of the
BorderRectangle will be considered by the marking process. This feature also
allows students to leave reminder notes for their own purposes (for example, to
remind them of why they chose features in their solution should they choose to view
their solution again at a later date), by simply placing the comments or other objects
outside the boundaries of the BorderRectangle, where they will be ignored for
marking purposes. Since this modification is simple, easy to understand theoretically
and can be carried out within the student environment Theseus, it is clear that it is
unlikely to impact upon the student learning process.
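The effect of the BorderRectangle on marking can be sketched by reducing each figure to its bounding rectangle; the class and helper names below are illustrative assumptions, since the real process walks DATsys Figure objects inside the student Drawing.

```java
import java.awt.Rectangle;
import java.util.ArrayList;
import java.util.List;

public class BorderFilter {
    // Only figures lying within the BorderRectangle are considered for
    // marking; anything outside (e.g. a student's reminder note) is ignored.
    public static List<Rectangle> figuresToMark(Rectangle border, List<Rectangle> figures) {
        List<Rectangle> inside = new ArrayList<>();
        for (Rectangle figure : figures) {
            if (border.contains(figure)) inside.add(figure);
        }
        return inside;
    }

    public static int demo() {
        Rectangle border = new Rectangle(0, 0, 100, 100);
        List<Rectangle> figures = List.of(
            new Rectangle(10, 10, 20, 20),   // inside: marked
            new Rectangle(40, 40, 20, 20),   // inside: marked
            new Rectangle(150, 10, 20, 20)); // outside: a reminder note
        return figuresToMark(border, figures).size();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 2 figures marked, 1 ignored
    }
}
```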
6.2.6 Structural measures
The design process for structural measures is precisely the same as for aesthetic
measures. Structural measures must implement the LayoutToolInterface
interface, must be based upon a criterion that can be expressed algorithmically such
that a numeric value in the range 0 to 1 can be assigned to a student diagram to
indicate student compliance and must not require the student to conduct complex
modification to their solution for the purposes of “assisting” the algorithm to be
successfully applied. Structural measures are invoked by the exercise marking
scheme in the same way as aesthetic measures.
The primary difference between structural and aesthetic measures is pedagogic.
Structural measures should measure some criterion that is domain-specific. Only
exercises within the domain associated with the structural measure would invoke the
measure in their marking scheme. The MarkingLeafResult returned by a
structural measure is assigned to a different MarkingCompositeResult to those
returned by aesthetic measures to allow the PrioritiseTruncateTool to
distinguish between the two when prioritising and truncating student feedback.
Many domains will not require structural measures to be designed and implemented
since there may be no domain-specific layout rules, allowing the layout of student
diagrams within the educational domain to be assessed adequately by the aesthetic
measures alone. Facility for structural measures is provided as an extension point to
allow the layout of student diagrams in non-typical educational diagram domains to
be addressed by educators and developers at a later date. If a new educational
diagram domain does not require structural measures to be assessed, then
development effort has been successfully minimised. If an educational domain
requires structural measures, then the task of the educator is to specify one or more
structural measures which can be implemented by the developer. The developer
creates one class for each structural measure, which must be located in the
layout.structural package and which must implement the
LayoutToolInterface interface. The mark method of the class is then invoked
within the marking scheme of exercises within the educational diagram domain.
6.2.7 Summary
This section outlined the specific design decisions made to allow the implementation
of the design outlined in section 6.1.2.1 to occur. The design decisions were linked to
the detailed requirements and a hierarchy of packages and classes was defined. The
interface which all aesthetic and structural measures must implement was defined
and the design of the aesthetic measures was illustrated at length. Finally, the design
similarity of structural and aesthetic measures was explained and the difference in
usage emphasised. Section 6.3 outlines the specific design decisions made to allow
the assessment of solutions with mutually exclusive alternate solution cases to occur.
6.3 Assessing solutions with mutually exclusive alternate solution
cases: resolving the design issues
Section 6.1.2.2 outlined the approach to assessing solutions with mutually exclusive
alternate solution cases. The approach is based upon identifying the common and
uncommon elements within the acceptable model solutions. Features tests based
around these elements are then constructed in features test cases. The 0th features test
case contains all features tests based around the common elements, while
subsequent features test cases are based around those uncommon elements present
in a model solution. Therefore, if x model solutions have been designated acceptable
by the educator, then x + 1 features test cases will be required.
Although the functionality introduced by this extension is key to allowing the
formative assessment of student coursework in free-form, diagram-based domains,
the design and implementation process for this extension was the least demanding of
the three extensions discussed in this work. The design is able to build upon existing
functionality within the CourseMarker marking system. This section demonstrates
the link between the design decisions and the detailed requirements for the extension
presented in section 5.2.2. The process of implementing the features test cases using
the generic DiagramFeaturesTool and invoking the test cases from within the
marking scheme is described. The key process of identifying suitable harbingers
within each alternate model solution, and defining a distinction test based upon each
harbinger, is outlined. Finally, possible methods of distinguishing between solution
cases in order to prioritise feedback, after the marking process has been undertaken,
are proposed and the decision to incorporate this stage of the marking process as a
strategy within the PrioritiseTruncateTool is justified.
6.3.1 Linking the design to the requirements
Section 5.2.2 outlined the requirements for assessing solutions with mutually
exclusive alternate solution cases. The way in which the alternate solution cases arise
from the acceptability of multiple model solutions was outlined, the need to
specify the common and uncommon features across the model solutions was
explained, and the requirements in the areas of CBA, educational diagrams and
formative assessment were shown to arise from this situation.
The requirement for the exercise developer to specify the different solution cases is
accomplished by allowing the specification of the common features, and the
uncommon features tests associated with each model solution, to be achieved
through the use of separate features test files. The specification of features tests has
been common to CBA since the days of Ceilidh. Section 4.2 provided a description of
features testing within the context of diagram-based CBA using CourseMarker. The
features tests can be specified using the four generic features expressions exist, exact,
connection and exactConnection which are implemented within the generic
DiagramFeaturesTool. It is possible to extend the tool to introduce new features
expressions as required, but this does not constitute part of the extension since the
ability to create new marking tools is a historic ability of Ceilidh and CourseMarker
[Sp06]. The novelty of the extension, and the focus here, lies in the ability to allow
alternate cases of feature tests to be assessed, rather than in the specification of the
features tests themselves. The DiagramFeaturesTool allows generic features tests,
such as checking for the existence of nodes and the links between them, to be
conducted in order to fulfil the requirement that the exercise developer should be
able to specify assessment criteria across domains with a minimum of development
effort and maximum consistency.
The extension is integrated into the marking and feedback system. Marking tools can
be invoked by the exercise marking scheme. They return feedback by returning
marking results. Transparency from the student perspective is achieved by returning
the marking result using the existing feedback delivery system. Integration with the
PrioritiseTruncateTool allows raw feedback to be modified and truncated
through the parameterisation of one tool, minimising development effort for the
exercise developer and reducing the need for parallel tools and development
hierarchies.
The extensions provide a basis for the assessment of a wide variety of educational
diagram domains through the use of a generic mechanism to allow marking tools,
such as the domain-independent DiagramFeaturesTool, to be executed on
multiple occasions. This allows the features tests to be assessed independently of
context and MarkingLeafResult objects to be composed for later examination
within the context of the PrioritiseTruncateTool. The DiagramFeaturesTool
allows educators to specify criteria for assessment as features tests. The system for
specifying weighting to account for measures of unequal importance survives intact
from previous CourseMarker features testing standards, but the usage is changed to
reflect formative, rather than summative, assessment priorities.
The task of determining which model solution the student is attempting to attain is
solved by the first strategy of the PrioritiseTruncateTool. This strategy is
examined in section 6.4. Again, the design decision to integrate this functionality into
the PrioritiseTruncateTool minimises development effort on the part of the
exercise developer. It also facilitates the requirement to achieve compatibility
between the feedback provided by this extension and the mechanism to allow the
prioritisation and truncation of student feedback outlined in section 6.4.
6.3.2 A tool for generic features testing of diagrams
Section 6.3.1 explained that the DiagramFeaturesTool is not a key component of
the extension and, furthermore, pointed out that any other CourseMarker marking
tool which supported features-based testing in a manner compliant with the domain
to be assessed could be substituted in place of the DiagramFeaturesTool. It is still
necessary to provide a brief overview of the DiagramFeaturesTool, however, for
several key reasons. Firstly, the DiagramFeaturesTool is generic. Therefore, it
provides a basis for features testing to be conducted across a wide variety of
educational diagram domains. It is in keeping with the design decisions taken
throughout this work that the commonality across domains be used to provide a basis
for formative assessment across domains, while still allowing future flexibility by
allowing extensions to be made by developers. Secondly, providing a platform for
features testing is an essential prerequisite if the assessment of alternate features
testing cases is to occur.
The DiagramFeaturesTool is based closely upon the EntityRelationshipTool used in the initial experiment described in Chapter 4.
However, where the EntityRelationshipTool was constructed on an ad hoc
basis, within a strict time frame, the DiagramFeaturesTool benefits from a clear
design perspective which is intended to guide developers in the process of creating
their own features testing tools, should this be required in the future.
DiagramFeaturesTool (extends the abstract class TMarkingTool)
+ mark( String, String ) : TMarkingResult
- exist( Drawing, String, String ) : int
- exact( Drawing, String, String ) : int
- connection( Drawing, String, String ) : int
- exactConnection( Drawing, String, String ) : int
- getFigure( Drawing, String ) : Figure
- getExactFigure( Drawing, String, String ) : Figure
- findFigure( Drawing, String ) : int
- findExactFigure( Drawing, String, String ) : int
- findConnection( Drawing, String, String, String ) : int
- findExactConnection( Drawing, String, String, String, String, String ) : int
Figure 6.10: The DiagramFeaturesTool
The DiagramFeaturesTool extends the abstract class TMarkingTool. Several
utility methods are present: findFigure returns the number of figures within a drawing which match the given figure name; getFigure returns the first figure which matches the given figure name; findExactFigure and getExactFigure perform analogous functions based upon both a figure name and displayed text; findConnection and findExactConnection return the connection lines specified by name, and by name and display text, respectively.
Methods exist, exact, connection and exactConnection generate an
enumeration of figures based upon the student diagram and return the number of
times the specified condition was matched. Finally, the public method mark acts to
draw the functionality together. The String of the features test is parsed, with
relevant information stored in variables. A case statement is used to call exist,
exact, connection and exactConnection, dependent upon context and then to
check if the feature test has been met, based upon the five accepted operators
described in section 4.2. Based upon this, a new TMarkingResult is created and
returned. Figure 6.10 shows the design of the DiagramFeaturesTool.
The DiagramFeaturesTool is invoked by the exercise marking scheme. The
features tests are as described in section 6.1.2.2.
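The dispatch performed by mark can be sketched as follows. This is a hypothetical simplification: the parsing of the features test String, the Drawing enumeration and the operator checks are not reproduced, and every name other than the four expression keywords is invented for illustration.

```java
// Hypothetical sketch of the mark() dispatch described above; only the
// case statement over the four generic features expressions is shown.
public class FeaturesMarkSketch {

    // Stand-in for exist: counts figures whose name matches the wanted name.
    static int exist(String[] figureNames, String wanted) {
        int count = 0;
        for (String name : figureNames) {
            if (name.equals(wanted)) count++;
        }
        return count;
    }

    // After parsing the features test String, mark dispatches on the
    // expression keyword, as sketched here.
    static int evaluate(String keyword, String[] figureNames, String wanted) {
        switch (keyword) {
            case "exist":
                return exist(figureNames, wanted);
            case "exact":
            case "connection":
            case "exactConnection":
                // handled analogously, additionally matching displayed text
                // and connection endpoints; omitted from this sketch
                throw new UnsupportedOperationException(keyword + " not sketched");
            default:
                throw new IllegalArgumentException("unknown expression: " + keyword);
        }
    }
}
```

The returned count would then be compared against the target value using one of the accepted operators before a TMarkingResult is composed.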
6.3.3 Designing the process of assessment for mutually exclusive
solution cases
The assessment of student solutions in which multiple model solutions are deemed
acceptable by the educator is a two stage process. The first stage involves assessing
each of the features test cases using a suitable marking tool such as the
DiagramFeaturesTool. The second stage involves using a strategy to decide
which of the model solutions the student is attempting to achieve and modifying the
feedback accordingly. This section examines the first stage of this process. The
second stage of the process is discussed in section 6.3.4, with the resultant design
decisions being used as a SolutionCaseStrategy in section 6.4.
[Figure 6.11 depicts the marking tool (e.g. the DiagramFeaturesTool) being invoked once per features test case, producing a MarkingCompositeResult (FeaturesFeedback0, FeaturesFeedback1, …, FeaturesFeedbackx) for each test case FT0, FT1, …, FTx.]
Figure 6.11: Marking multiple features test cases
Figure 6.11 illustrates the process of assessing multiple feature test cases. The
marking tool, for example the DiagramFeaturesTool, is invoked repeatedly by
the exercise marking scheme. A new MarkingCompositeResult is generated for each features test case. Within each MarkingCompositeResult, a MarkingLeafResult node is used to store the mark, weight, description and feedback returned by the marking tool for each individual features test within the test case. The description of each MarkingCompositeResult is set to allow the common features test set, and each of the mutually exclusive solution cases, to be identified when the process of comparing the results generated by each of the solution cases is undertaken by the PrioritiseTruncateTool.
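The per-test-case composition can be sketched as follows. The two result classes are reduced to minimal stand-ins, and the invocation of the real marking tool is replaced by a placeholder leaf; only the one-composite-per-case structure is illustrated.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal stand-ins for the CourseMarker result classes, used only to
// illustrate the per-test-case composition of figure 6.11.
public class MultiCaseMarkingSketch {

    static class MarkingLeafResult {
        final int mark;
        final String feedback;
        MarkingLeafResult(int mark, String feedback) {
            this.mark = mark;
            this.feedback = feedback;
        }
    }

    static class MarkingCompositeResult {
        final String description;           // identifies FT0, FT1, ... FTx
        final List<MarkingLeafResult> leaves = new ArrayList<>();
        MarkingCompositeResult(String description) {
            this.description = description;
        }
    }

    // One composite per features test case: FT0 (common) then each
    // mutually exclusive case FT1..FTx.
    static List<MarkingCompositeResult> markAllCases(int caseCount) {
        List<MarkingCompositeResult> results = new ArrayList<>();
        for (int i = 0; i < caseCount; i++) {
            MarkingCompositeResult composite = new MarkingCompositeResult("FT" + i);
            // The marking tool would be invoked here for test case i; a
            // placeholder leaf stands in for its real feedback.
            composite.leaves.add(new MarkingLeafResult(0, "placeholder"));
            results.add(composite);
        }
        return results;
    }
}
```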
6.3.4 Harbingers and the distinction test
Section 6.1.2.2 has already discussed the role of harbingers, solution elements or
combinations of elements which exist in only one model solution. Harbingers are used
to construct the distinction test for each mutually exclusive solution case. The
distinction test is a features test which should only succeed if the harbinger elements
are found. This technique provides valuable assistance in the process of ascertaining
which of the model solutions the student is attempting to attain. Indeed, some
strategies to distinguish between solution cases may rely entirely upon the detection
of harbingers through the distinction test.
The relative importance of features tests is generally indicated through the system of weighting. The distinction test, however, may or may not be educationally important to the student learning process. It is, instead, valuable because it assists the features marking process within the context of assessing mutually exclusive solution cases. For this reason, the identification of the distinction test is carried out using a mechanism unconnected with the weighting system. In each mutually exclusive solution case, the distinction test is always the first features test within the test set. This convention is carried through into the marking results; the feedback generated by the distinction test will always be held by the first MarkingLeafResult within the MarkingCompositeResult for the mutually exclusive test case. Concrete strategies which implement the SolutionCaseStrategy may choose to use this information when deciding between solution cases. The common features tests, FT0, do not have associated harbinger elements and so a distinction test is inappropriate.
6.3.5 Strategies for distinguishing between mutually exclusive
solution cases
Given the disparity between educational diagram domains, it is unrealistic to expect
that any one strategy can be successful in a generic way with regard to the
assessment of mutually exclusive solution cases. This work has assumed that the fact
that multiple model solutions may be feasible as a response to a given problem
specification indicates that multiple conventions within the domain may be applied
to solve the problem in varying ways, or that conventions may be inconsistently
understood or applied within the domain in general. Indeed, since educational
diagram domains attempt to teach principles for approaching design problems to which there exists no single, deterministic solution, this problem is likely to be permanent. If
the occurrence of multiple model solutions is due to disparity within a domain, then,
similarly, it is unrealistic to expect that similarity in the learning process across
domains can be achieved.
Expert educators rely on domain knowledge to ascertain which model solution a
student was attempting to construct. Within a CBA context, therefore, it is necessary
to allow the strategy to distinguish between mutually exclusive solution cases to be
determined by an expert.
However, the approach of this work has been to provide a basis for assessment, while
allowing extension by subsequent developers. In this case, therefore, the following,
simple, algorithm will be used as the basis for distinction:
• IF one distinction test succeeds (returns a mark > 0) AND all others fail (return 0), then identify the solution case associated with the successful distinction test;
• ELSE identify the solution case whose features tests have the highest average mark overall.
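Assuming that the per-case distinction marks and average features-test marks have already been extracted from the marking results (an assumption of this sketch, not a documented API), the default algorithm can be expressed as:

```java
// Hypothetical sketch of the default distinction strategy; all names are
// invented, and only the IF/ELSE logic of the algorithm is illustrated.
public class SolutionCaseChoiceSketch {

    /**
     * Returns the index of the chosen solution case.
     * distinctionMarks[i] holds the mark of case i's distinction test;
     * averageMarks[i] holds the average mark over all of case i's features tests.
     */
    static int choose(int[] distinctionMarks, double[] averageMarks) {
        int successes = 0;
        int successIndex = -1;
        for (int i = 0; i < distinctionMarks.length; i++) {
            if (distinctionMarks[i] > 0) {
                successes++;
                successIndex = i;
            }
        }
        // IF exactly one distinction test succeeded, that case is chosen.
        if (successes == 1) {
            return successIndex;
        }
        // ELSE fall back to the case with the highest average features mark.
        int best = 0;
        for (int i = 1; i < averageMarks.length; i++) {
            if (averageMarks[i] > averageMarks[best]) {
                best = i;
            }
        }
        return best;
    }
}
```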
The concrete design of this strategy will be expanded in section 6.4, within the
context of designing the PrioritiseTruncateTool.
6.3.6 Summary
This section outlined the specific design decisions made to allow the implementation
of the design outlined in section 6.1.2.2 to occur. The design decisions were linked to
the detailed requirements. The design is able to build upon existing functionality
within the CourseMarker courseware. The design of the DiagramFeaturesTool, for the features testing of diagrams in a domain-independent way, was demonstrated, and the process of repeatedly invoking the tool through the exercise marking scheme, subsequently storing the feedback results in a separate MarkingCompositeResult for each features test case, was outlined. The decision
to incorporate the process of distinguishing between mutually exclusive solution
cases into the PrioritiseTruncateTool was justified and the importance of
harbingers and the distinction test to the approach was emphasised. It is unrealistic to
expect any one strategy for distinguishing between mutually exclusive solution cases
to be successful across educational diagram domains, so a basic strategy was
proposed to provide a default basis on which to operate, while the potential for later
expansion was emphasised. The process of implementing such a strategy is outlined
in section 6.4, which documents the design of the PrioritiseTruncateTool for
the prioritisation and truncation of student feedback.
6.4 Prioritising and truncating student feedback: resolving the
design issues
Section 6.1.2.3 outlined the approach to the prioritisation and truncation of student
feedback. This process was divided into four sub-tasks: examining the features
feedback and deciding what course of action to take with regard to that provided by
the mutually exclusive solution cases, prioritising all features test feedback,
prioritising all layout feedback generated by the aesthetic and structural measures
and, finally, truncating the feedback prior to its delivery to the student.
The design of the PrioritiseTruncateTool is based upon the Strategy design
pattern. The tool acts as a context to four strategies, one for each of the sub-problems.
Each strategy acts as an interface; concrete strategies for solving each of the four sub-problems must implement the interface associated with the sub-problem. The
interfaces are used to define the rules associated with concrete strategies so that the
PrioritiseTruncateTool can operate smoothly using a variety of implemented
strategies.
This section begins by linking the design decision to the detailed requirements for
the extension set out in section 5.2.3. The section then establishes the design of the
PrioritiseTruncateTool, followed by the design for each of the interfaces
responsible for regulating the strategies for each sub-problem: respectively, those interfaces representing the SolutionCaseStrategy, FeaturesSortStrategy, AestheticsSortStrategy and TruncationStrategy requirements.
6.4.1 Linking the design to the requirements
Section 5.2.3 outlined the requirements for prioritising and truncating feedback to
students. The central criterion was to define a mechanism whereby the prioritisation followed by the truncation of the feedback could be achieved. Furthermore,
flexibility for the educator and developer was required such that prioritisation and
truncation could be configured to preference. The design achieves these
requirements by outlining four sub-problems and allowing the strategy used to solve
the sub-problems to be defined by the educator and implemented by the developer.
The first three of the sub-problems are associated with the prioritisation of feedback
comments, whilst the fourth sub-problem is associated with the truncation of the
comments based upon prioritisation.
Requirements from the field of CBA were identified in terms of ensuring
compatibility with the existing CourseMarker marking and feedback systems. In fact,
the tool does not affect the functioning of the CourseMarker marking system in any
way since it is designed to operate upon the feedback generated by the marking tools
after their operation has been completed. The PrioritiseTruncateTool, which acts as the context for the design, operates transparently within the process of providing
feedback. Within CourseMarker, the generation of feedback is associated with
marking tools which are invoked through the exercise marking scheme. The delivery
of the generated feedback to the student also occurs through the marking scheme.
The PrioritiseTruncateTool is invoked between these two actions. The new course of events sees the feedback being generated, modified by the PrioritiseTruncateTool and then returned to the student by CourseMarker in the conventional manner.
6.4.2 The PrioritiseTruncateTool
The PrioritiseTruncateTool acts as the context for each of the strategies which
represent the four sub-problems. The PrioritiseTruncateTool contains one
method, streamline, which accesses the student feedback object together with
objects representing concrete implementations of each of the four strategies, through
parameterisation. The design uses the approach of having the context (the PrioritiseTruncateTool) pass the data as parameters to each strategy operation, since this keeps the context decoupled from each of the strategies. Concrete strategies extend the SolutionCaseTool, FeaturesSortTool, AestheticsSortTool and TruncationTool respectively. The relationships between these tools and the SolutionCaseStrategy, FeaturesSortStrategy, AestheticsSortStrategy and TruncationStrategy interfaces are outlined in
section 6.4.3. The streamline method passes the feedback object to each concrete
strategy in turn before finally returning the student feedback. The design for the
PrioritiseTruncateTool is summarised in figure 6.12.
PrioritiseTruncateTool
+streamline( MarkingCompositeResult, SolutionCaseTool,
FeaturesSortTool, AestheticsSortTool,
TruncationTool ) : MarkingCompositeResult
Figure 6.12: The PrioritiseTruncateTool
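The control flow of streamline can be sketched as follows. The result and tool types are reduced to a single stand-in interface; only the sequencing of the four strategies, as described above, is illustrated.

```java
// Sketch of the streamline method's control flow; a minimal Strategy
// interface stands in for the four concrete tool types and the feedback
// object, which in CourseMarker is a MarkingCompositeResult.
public class StreamlineSketch {

    interface Strategy {
        String modify(String feedback);   // stand-in for modify(MarkingCompositeResult)
    }

    // Passes the feedback through each strategy in turn, as streamline does,
    // before returning the modified feedback.
    static String streamline(String feedback, Strategy solutionCase, Strategy featuresSort,
                             Strategy aestheticsSort, Strategy truncation) {
        feedback = solutionCase.modify(feedback);
        feedback = featuresSort.modify(feedback);
        feedback = aestheticsSort.modify(feedback);
        feedback = truncation.modify(feedback);
        return feedback;
    }
}
```

Because the feedback is passed as a parameter at each step, the context remains decoupled from the individual strategies.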
6.4.3 The strategy interfaces and abstract classes
The design of the strategy interfaces remains simple to allow maximum flexibility to
the educator and developer. One design option might have been to have the interfaces define specific methods for each of the scenarios which a strategy might reasonably be expected to handle (for example, the SolutionCaseTool might be expected to define methods to operate on marking results with only one solution case). However, this would necessitate an inflexible design for the PrioritiseTruncateTool, which would be responsible for distinguishing between each plausible scenario based upon an examination of the marking result, and would introduce unnecessary dependency between the PrioritiseTruncateTool, which acts as a context for the strategies, and the individual strategies themselves. The result would be an inflexible system with limited scope for future development.
To maximise flexibility, each interface requires that only one method be
implemented: modify. The modify method is parameterised by the current feedback result, the MarkingCompositeResult, and returns a new MarkingCompositeResult representing the feedback after the concrete strategy has been applied. Each interface has an associated abstract class. These abstract classes must be extended by the concrete strategies in order that parameterisation of the PrioritiseTruncateTool may occur, thus enforcing implementation of the interfaces. The strategy interfaces, together with the associated
abstract classes, are illustrated in figure 6.13.
[Figure 6.13 shows four interface / abstract class pairs: <<interface>> SolutionCaseStrategy with <<abstract class>> SolutionCaseTool, FeaturesSortStrategy with FeaturesSortTool, AestheticsSortStrategy with AestheticsSortTool, and TruncationStrategy with TruncationTool. Each pair exposes the single method +modify( MarkingCompositeResult ) : MarkingCompositeResult.]
Figure 6.13: Strategy interfaces for the four sub-problems
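The pairing of interface and abstract class can be sketched as follows for the solution-case sub-problem; the feedback type is a stand-in, and the concrete strategy shown is an invented placeholder.

```java
// Sketch of the interface / abstract-class pairing: a concrete strategy
// extends the abstract tool, which in turn enforces the strategy interface.
public class StrategyPairSketch {

    interface SolutionCaseStrategy {
        String modify(String result);     // stand-in for the MarkingCompositeResult type
    }

    // Extending the abstract tool is what permits parameterisation of the
    // PrioritiseTruncateTool while enforcing implementation of modify.
    static abstract class SolutionCaseTool implements SolutionCaseStrategy { }

    // A placeholder concrete strategy: must implement modify to compile.
    static class DefaultSolutionCaseTool extends SolutionCaseTool {
        public String modify(String result) {
            return result;                // identity placeholder
        }
    }
}
```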
6.4.4 Providing a basis
This work has consistently argued that, while flexibility in extension for educators
and developers must be a high priority in the design of the extensions proposed by
this work, it is also necessary to provide a basis for assessment to occur through the
implementation of default assessment behaviour. Section 6.3.5 outlined a concrete
strategy which could be used as the basis for solving the first sub-problem in the
prioritisation and truncation of student feedback. It is, however, necessary to propose
approaches to solving each of the three remaining sub-problems which can be used
in implementation.
A concrete features sort strategy will be implemented which ignores the distinction
between the common features test feedback and the feedback from the remaining
mutually exclusive solution case. The two MarkingCompositeResult branches
will be merged into one and sorted according to the prioritisation equation presented
as equation 6.6.
A concrete aesthetics sort strategy will be implemented similarly. Feedback from
aesthetic measures and structural measures will be combined and sorted by priority
according to equation 6.6.
A concrete truncation strategy will be implemented where the highest priority
n features feedback comments will be retained along with the highest priority
m aesthetic layout feedback comments. The values n and m can be specified
through parameterisation within the exercise marking scheme. All other feedback
results will be pruned from the feedback tree.
Equation 6.6 relates the priority Px of a MarkingLeafResult x to its weight wx and percentage mark mx. Priority increases in proportion to both the weight of the comment and the level of error of the student.

Px = wx (100 − mx)

Equation 6.6: Calculating the priority of a MarkingLeafResult
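Equation 6.6 and the subsequent truncation to the n highest-priority comments can be sketched as follows; the array-based representation of weights and marks is an assumption of this sketch, not the CourseMarker data model.

```java
import java.util.Arrays;

// Sketch of prioritisation (equation 6.6) followed by truncation:
// priority = weight * (100 - percentage mark); the highest-priority
// comments are retained and the remainder pruned.
public class PrioritySketch {

    // Equation 6.6: Px = wx (100 - mx)
    static double priority(double weight, double percentMark) {
        return weight * (100.0 - percentMark);
    }

    // Returns the indices of the n highest-priority comments, given
    // parallel arrays of weights and percentage marks.
    static Integer[] topN(double[] weights, double[] marks, int n) {
        Integer[] idx = new Integer[weights.length];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        // sort indices by descending priority
        Arrays.sort(idx, (a, b) ->
            Double.compare(priority(weights[b], marks[b]), priority(weights[a], marks[a])));
        return Arrays.copyOf(idx, Math.min(n, idx.length));
    }
}
```

A heavily weighted comment on which the student scored badly thus surfaces first, matching the formative priority of directing attention to the most serious errors.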
6.4.5 Summary
This section outlined the specific design decisions made to allow the implementation
of the design outlined in section 6.1.2.3 to occur. The design decisions were linked to
the detailed requirements. The design of the PrioritiseTruncateTool was
defined, and a simple definition was provided and justified for each of the four sub-problems associated with the process of prioritising and truncating the feedback
provided to students. A basis for implementing examples of concrete strategies to
solve each of the four sub-problems was described.
6.5 Summary
This chapter outlined the design for the three extensions to the CBA courseware
proposed by this work. Section 6.1 provided a high-level overview of the design for
each of the extensions in words, and illustrated the high-level integration between
the extensions. Section 6.2 outlined the design for the extension to allow the aesthetic
layout of student diagrams to be assessed. A package structure was introduced for
aesthetic and structural measures and the design of the measures themselves
explained. A number of criteria from the fields of user interface design and graph
layout were chosen to be implemented as aesthetic measures. Section 6.3 outlined the
design for the extension to allow mutually exclusive solution cases to be assessed. A
generic diagram features tool was described and the process of repeatedly invoking
the features tool for each features test case was described. The features test cases
themselves were discussed and the importance of defining a distinction test for
mutually exclusive solution cases based upon harbingers in the different model
solution versions was emphasised. Section 6.4 outlined the design for the extension
to undertake the prioritisation and truncation of student feedback. The process was
divided into four sub-problems and a strategy interface was presented for each sub-problem. Concrete strategies for each of the four sub-problems were presented.
Chapter 7 provides an overview of implementation issues based upon the design
decisions presented in this chapter, together with a summary of guidance for educators and developers, whose scope was identified in section 5.2.4.
Chapter 7
Issues in implementation and
advice for educators and developers
Introduction
This chapter provides an overview of the issues arising from the implementation of
the extensions and their integration into the CourseMarker architecture. The chapter
also presents advice useful for developers and educators in the formative assessment
of new domains and the setting of exercises, which builds upon existing theory and
the documentation available for CourseMarker / DATsys.
Section 7.1 considers the implementation issues. The objectives of the implementation are outlined, software quality is considered and issues arising from
integrating the extensions into the existing CourseMarker architecture are explained.
A brief overview of the implementation of each of the extensions is provided. For
each extension, the point of integration into CourseMarker is defined and reference is
made to the design described in chapter 6.
Section 7.2 presents advice for developers and educators in developing CBA for
formative assessment in new diagram-based domains and the setting of exercises.
References to existing development materials, such as CourseMarker documentation,
are provided and the differences between developing traditional materials for
CourseMarker for summative assessment purposes and developing formative
assessment materials for CourseMarker / DATsys which use the extensions provided
as a result of this work are explained.
7.1 Implementation Issues
Chapter 6 documented the design process for each of the extensions and linked the
design to the detailed requirements specification. This section outlines the key
elements in the implementation of the design. Section 7.1.1 outlines the objectives in
terms of the requirements which the implementation must fulfil. Section 7.1.2
discusses the issues arising from implementing the system as a set of extensions
integrated into the CourseMarker architecture. Finally, sections 7.1.3 to 7.1.5 describe
how the designs for each of the three extensions were implemented and the points of
integration into CourseMarker.
7.1.1 Objectives
The main objective of this research is to investigate the feasibility and usefulness of
automating the process of providing formative assessment in free-form, diagram-based domains using CBA courseware. Implementing the design is fundamental if
the feasibility and usefulness is to be evaluated. The purpose of the implementation
is to meet the following goals:
• To implement the extension to allow student diagrams to be assessed in terms of their aesthetic layout;
• To implement the extension to allow features testing to accommodate mutually exclusive solution features;
• To implement the extension to allow the prioritisation and truncation of student feedback;
• To address software quality issues;
• To integrate the extensions into CourseMarker, which provides a realistic, extensible framework for the full lifecycle of CBA.
The first three goals will be addressed in sections 7.1.3 to 7.1.5 respectively.
The existing CourseMarker infrastructure has proven reliability, maintainability,
portability and extensibility. The marking mechanism for CourseMarker is stable and
integrated with course management, assessment material delivery, feedback and
other facilities. Section 7.1.2 argues that integration into the CourseMarker
architecture ensures that key software quality issues such as reliability, robustness,
maintainability and portability are addressed automatically, as long as the
integration is successful and the design of the extensions themselves is sound. For
each of the extensions, therefore, it is necessary to demonstrate that the required
functionality has been implemented, that the extension can be maintained and
extended in accordance with software quality principles.
7.1.2 Integration into CourseMarker
CourseMarker provides an existing architecture, with a design emphasising explicit
extension points, interfaces and standards to which components integrated into the
architecture must conform. CourseMarker was implemented in Java 2, a language which is simple, object-oriented, distributed, interpreted, robust, secure, architecture-neutral, portable, high-performance, multi-threaded and dynamic [GJS97]. Tsintsifas [Ta02] argued that choosing Java 2 as the implementation
language would allow the development of a “better deliverable”.
The design for the extensions outlined in chapter 6 made explicit its intention to
integrate the extensions into the existing courseware architecture. Therefore, the
design process was influenced by the need for integration from the outset.
Integrating the extensions into CourseMarker has several major advantages. Key
CBA concepts such as the storage of administration data, security of access and the
delivery of materials and feedback, are already implemented as part of a proven
design. Since these components have been successfully tested and repeatedly used in
a live situation already, the design effort could be confined to the extensions themselves and their integration with the surrounding courseware. The
other requirements of a CBA system, which were already implemented, could be
removed from consideration. This is an advantage that carries through to the
implementation stage.
Designing extensions to existing systems can be restrictive to the design process. In
this case, however, the effect was minimised due to the fact that CourseMarker was
designed with extensibility as a primary requirement.
For these reasons, the implementation of the extensions from their design was
straightforward. Where implementation problems did occur, they were usually
trivial.
7.1.3 Assessing the aesthetic layout of student diagrams: implementing the design
The hierarchy of packages described in section 6.2.2 is implemented within the package of marking tools, com.ltr.cm.marking.tool. The LayoutToolInterface therefore assumes the position at:
• com.ltr.cm.marking.tool.layout.LayoutToolInterface
The aesthetic and structural packages reside within the layout package. The
11 aesthetic measures, such as NonInterceptionTool and EquilibriumTool,
are placed within the aesthetic package. The structural package is initially
empty. The Figure interface, in com.ltr.daidalos.framework, imposes
methods for all Figure objects which return the “size”, “center” and other attributes
of the object. These method calls are the basis behind the algorithms within the
aesthetic measures, which are therefore implemented simply. Implementation closely
follows design. Only two noteworthy issues arise.
Firstly, for the marking tools to be integrated successfully into CourseMarker, an
associated marking command must be created for each. The marking command is
responsible for retrieving the user’s solution based upon the filename specified in the
marking scheme and the project code of the user. The marking command is also
responsible for dealing with errors, for example if the user’s file cannot be found. The
marking command calls the marking tool and returns the result to the feedback
system. It is necessary to create a marking command for each marking tool.
Implementation thus requires that 11 marking commands are created and housed in
the com.ltr.cm.marking.cmd package. By convention, a marking command has
a similar name to the associated marking tool. For example, the EquilibriumTool
has the associated marking command EquilibriumCMD. The creation of marking
commands is a standard process requiring no further design exertion; it can be
accomplished most simply by making a copy of an existing marking command and
editing both the name of the command and the name of the marking tool referenced
within.
Secondly, the BorderRectangle necessary for the operation of many of the
aesthetic measures, as illustrated in figure 6.9, is assumed to have been defined
within Daidalos for each educational diagram domain to be assessed.
BorderRectangle is treated as a reserved keyword. A simple utility method
isValidFigure is introduced to the LayoutToolInterface which returns true if
a figure is entirely contained within the BorderRectangle and false otherwise.
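A minimal sketch of such a containment check follows. The Rect record and its accessor names are simplifying assumptions for illustration; the real LayoutToolInterface obtains the equivalent attributes through the DATsys Figure interface.

```java
// Hypothetical, simplified geometry for the containment test; stand-in for
// the isValidFigure utility described above.
public class ContainmentSketch {
    // Axis-aligned rectangle given by its top-left corner and size (assumed).
    record Rect(double x, double y, double w, double h) {}

    // True only if fig lies entirely inside the border rectangle.
    static boolean isValidFigure(Rect fig, Rect border) {
        return fig.x() >= border.x()
            && fig.y() >= border.y()
            && fig.x() + fig.w() <= border.x() + border.w()
            && fig.y() + fig.h() <= border.y() + border.h();
    }

    public static void main(String[] args) {
        Rect border = new Rect(0, 0, 100, 100);
        System.out.println(isValidFigure(new Rect(10, 10, 20, 20), border)); // inside
        System.out.println(isValidFigure(new Rect(90, 90, 20, 20), border)); // spills out
    }
}
```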
The process of invoking and parameterising the aesthetic and structural measures
and storing the results within a MarkingCompositeResult object is illustrated in
section 7.2.
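As an illustrative sketch of how such aesthetic measures reduce to the geometric attributes exposed by Figure, an equilibrium-style measure might score how closely the centroid of all figure centres sits to the centre of the drawing area. The Fig record and the linear normalisation used here are simplifying assumptions, not the DATsys API or the thesis's exact formula.

```java
import java.util.List;

// Simplified stand-in for the DATsys Figure interface (hypothetical).
record Fig(double cx, double cy) {}

public class EquilibriumSketch {
    // Returns a mark in [0, 100]: 100 when the centroid of the figures
    // coincides with the centre of the border rectangle, falling off
    // linearly with the centroid's offset (normalised by the half-diagonal).
    public static double mark(List<Fig> figs, double w, double h) {
        double sx = 0, sy = 0;
        for (Fig f : figs) { sx += f.cx(); sy += f.cy(); }
        double cx = sx / figs.size(), cy = sy / figs.size();
        double offset = Math.hypot(cx - w / 2, cy - h / 2);
        double halfDiag = Math.hypot(w / 2, h / 2);
        return Math.max(0, 100 * (1 - offset / halfDiag));
    }

    public static void main(String[] args) {
        // Two figures placed symmetrically about the centre score full marks.
        System.out.println(mark(List.of(new Fig(10, 10), new Fig(90, 90)), 100, 100));
    }
}
```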
7.1.4 Assessing solutions with mutually exclusive solution cases:
implementing the design
The DiagramFeaturesTool for the generic features testing of diagrams is
implemented within the package com.ltr.cm.marking.tool. An associated
marking command, DiagramFeaturesCMD, is implemented within the package
com.ltr.cm.marking.cmd. Implementation of the DiagramFeaturesTool is
straightforward because the functionality was based upon the earlier
EntityRelationshipTool, which had already been implemented. A clearer
method structure was, however, imposed by the design of the tool.
Methods to facilitate a traversable enumeration of all figures within a drawing are
imposed by the Drawing interface within the com.ltr.daidalos.framework
package.
The process of repeatedly assessing the mutually exclusive solution cases is based
upon repeatedly invoking the DiagramFeaturesTool, through the
DiagramFeaturesCMD, within the exercise marking scheme. Therefore, since the
distinction between solution cases is achieved by the PrioritiseTruncateTool,
this stage of the implementation is the least demanding.
The process of repeatedly invoking and parameterising the DiagramFeaturesTool
and storing the results within MarkingCompositeResult objects is illustrated in
section 7.2.
7.1.5 Prioritising and truncating student feedback: implementing the
design
The PrioritiseTruncateTool marking tool is implemented within a package,
prioritisetruncate, located with the other marking tools at
com.ltr.cm.marking.tool. The four simple interfaces
SolutionCaseStrategy, FeaturesSortStrategy,
AestheticsSortStrategy and TruncationStrategy, together with their
associated abstract classes, are also implemented within
com.ltr.cm.marking.tool.prioritisetruncate; four further sub-packages,
solutioncasestrategies, featuressortstrategies,
aestheticssortstrategies and truncationstrategies, are introduced for
the purpose of grouping together the implemented concrete strategies in a consistent
way which is convenient for the domain developer.
Four concrete strategies are implemented. The reasons for this are three-fold. Firstly,
the implementation is consistent with the approach of the work, which attempts to
provide a basis for formative assessment through implementing appropriate default
behaviour as well as providing a foundation for future extension. Secondly, the
implementation of concrete strategies facilitates an analysis of the usefulness of the
extension, a key requirement of the implementation. Thirdly, within a CBA context
the most suitable way for a developer to create new components is by modifying
existing ones. Therefore, the implemented concrete strategies provide a useful
template for future expansion by developers.
The DistinctionFirstSolutionCaseTool extends SolutionCaseTool and is
located within the solutioncasestrategies package. It is based upon the
algorithm outlined in section 6.3.5.
The MergeEqualFeaturesSortTool extends FeaturesSortTool and is located
within the featuressortstrategies package. The
MergeEqualAestheticsSortTool extends AestheticsSortTool and is
located within the aestheticssortstrategies package. The
PriorityBothTruncationTool extends TruncationTool and is located within
the truncationstrategies package. These three concrete strategies are based
upon the algorithms described in section 6.4.4.
Associated marking commands were implemented for each of the concrete marking
tools in the same way as in sections 7.1.2 and 7.1.3.
The process of invoking and parameterising the PrioritiseTruncateTool and
delivering the resulting feedback, which has been prioritised and truncated, to the
student, is described in section 7.2.
7.1.6 Summary
Section 7.1 outlined the key elements in the implementation of the design. The
objectives of the implementation were explained. The integration into the existing
CourseMarker architecture was discussed and its implications for the
implementation outlined. The design of the three extensions was influenced from the
very beginning by the need to be integrated into the CourseMarker architecture. This
facilitated a smooth implementation process, an overview of which was provided for
each of the three extensions. Section 7.2 presents advice for developers and educators
in developing CBA for formative assessment in new diagram-based domains and the
setting of exercises.
7.2 Advice for developers and educators
Section 5.2.4 argued that guidance for developers and educators was an essential
requirement if the implementation of the extensions was to result in successful
formative assessment being carried out. Designing the assessment format to take full
advantage of the capabilities of an automated assessment system is a prerequisite for
successful assessment, a fact illustrated in section 3.1, where formative assessment
examples utilising the same courseware were shown to vary in their level of success.
This section presents essential guidance for educators and developers. In doing so,
the section achieves several objectives. Firstly, it demonstrates the mechanisms which
allow educators and developers to use the extensions to deploy formative assessment
exercises in a feasible and useful way. Secondly, it provides a documentation
overview which illustrates the practical implementation of exercises using the
extensions. In doing so, the practical integration between the extensions themselves,
and between the extensions and the existing courseware infrastructure, is made
explicit. Thirdly, it provides a useful overview to allow existing CourseMarker users
to appreciate the new functionality available for formative assessment, for the
purposes of migrating their exercises to make use of the extensions, where
appropriate.
Section 7.2.1 provides guidance for developers. Documentation for the development of
CourseMarker / DATsys exercises is already available and reference to this is made.
The guidance to developers concentrates on those aspects of domain and exercise
development which are either changed or completely new when setting formative
exercises using the extensions outlined within this work. Section 7.2.2 provides
guidance for educators. The literature on formative assessment is plentiful and
references to relevant introductory texts are provided. The section concentrates on
the development of assessment materials which utilise the CourseMarker extensions
to facilitate the achievement of formative assessment best practice. The section also
notes the differences in conceptual assessment design between the formative
assessment exercises and the previous, summative, CourseMarker exercise
assessment schemes.
The roles of educators and developers have been previously defined. Conceptually,
the educator develops assessment materials while the developer facilitates the
necessary extensions to the courseware and may be responsible for translating the
assessment materials into correct CourseMarker exercises. However, it is certainly
useful for those in both roles to be aware of the material in its entirety, since coordination and mutual understanding across roles will best facilitate exercises which
make the most effective use of the courseware environment.
7.2.1 Guidance for developers
7.2.1.1 Prerequisites
The process of setting up an exercise using CourseMarker is outlined in [Sp02]. The
document provides an overview of the directory structure for CourseMarker
exercises and summarises the files required at the course, unit and exercise levels of
the structure. The structure of the necessary administration files is specified, such as
save.txt, which is responsible for defining the student files retrieved by the server
prior to the marking process, setup.txt, which is responsible for defining the files
placed in the student directory when the exercise is set up, and so on. The definition
of features for features testing is covered, including the Oracles notation for the
features expressions used to define the search. The use of marking schemes
expressed in Java is explained and a sample mark.java file is listed in full. The
function of the various marking commands is explained. Finally the batch file
mrun.bat, used to compile the marking scheme, is explained.
The use of Daidalos to author new diagram notations is a simple, intuitive process
described in [Ta02]. It would be helpful for the developer to familiarise themselves
with CourseMarker conventions before attempting to set exercises. This section
explains the differences and extensions to the exercise format necessary to implement
formative assessment exercises in diagram-based domains.
7.2.1.2 Expressing features testing regimes to assess mutually exclusive
solution cases
Features marking continues to use the same format as in [Sp02]. Feature expressions
vary depending upon the marking tool called; the simple, generic
DiagramFeaturesTool supports four types of features expression (exist, exact,
connection and exactConnection) with the same parameterisation as for common
features tests. Features test cases are expressed in separate features marking files. The
first stage of the process is to express the common features tests in the file
[ExerciseName].ft0. The second stage is to express the mutually exclusive
alternate cases in subsequently numbered features files. The first features test within
each file should be the distinction test for that case. The third stage is to assess all cases in
turn by invoking the marking tool within the exercise mark scheme.
mark.ft0:
5 : exact CircleNode A : Feedback1 : Feedback2
4 : exact CircleNode B : Feedback1 : Feedback2
4 : exactConnection Link CircleNode A CircleNode B : Feedback1 : Feedback2
mark.ft1:
5 : exact CircleNode C : Feedback1 : Feedback2
7 : exist SquareNode==0 : Feedback1 : Feedback2
mark.ft2:
5 : exact SquareNode E : Feedback1 : Feedback2
5 : exactConnection Link CircleNode B SquareNode E : Feedback1 : Feedback2
4 : exact SquareNode F : Feedback1 : Feedback2
3 : exact SquareNode G : Feedback1 : Feedback2
Figure 7.1: Features tests organised into cases
Figure 7.1 outlines a very simple example of three features files which might be used
to assess the exemplar problem in figure 5.1. The domain has three types of figure:
CircleNode, SquareNode and Link.
The mark.ft0 features file contains features tests examining those elements
common to all model solutions, while the mark.ft1 and mark.ft2 files represent
mutually exclusive solution cases. The first features tests within mark.ft1 and
mark.ft2 — exact CircleNode C and exact SquareNode E respectively —
denote the distinction test for each case.
The weight and feedback should be determined by the educator according to the
guidance presented in section 7.2.2. Feedback assists the student learning process
within a domain and therefore, by definition, relies upon domain knowledge to be
useful to the student. As an example, the unsuccessful feedback (Feedback2) from
mark.ft1 could explain to the student why the type of solution represented by that
case precludes the existence of any SquareNode nodes.
The way the features test cases are assessed through invocation of the marking tool
by the marking scheme is examined in section 7.2.1.5.
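The line format shown in figure 7.1 — a weight, a features expression and two feedback strings, separated by colons — can be parsed in a few lines. The sketch below is illustrative only; the actual Oracles features parser in CourseMarker is more elaborate, and the FeatureTest record is an assumption.

```java
public class FeatureLineSketch {
    // One features test: "weight : expression : Feedback1 : Feedback2" (assumed model).
    record FeatureTest(int weight, String expr, String pass, String fail) {}

    static FeatureTest parse(String line) {
        // Split on colons, tolerating optional surrounding whitespace;
        // limit 4 keeps any further colons inside the final feedback string.
        String[] parts = line.split("\\s*:\\s*", 4);
        if (parts.length != 4)
            throw new IllegalArgumentException("Malformed features test: " + line);
        return new FeatureTest(Integer.parseInt(parts[0].trim()),
                               parts[1], parts[2], parts[3]);
    }

    public static void main(String[] args) {
        FeatureTest t = parse("5 : exact CircleNode A : Feedback1 : Feedback2");
        System.out.println(t.weight() + " / " + t.expr());
    }
}
```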
7.2.1.3 Layout tools
All layout tools must implement the LayoutToolInterface interface and are
placed in either the package aesthetic or structural depending upon the
nature of the tool. Layout tools must implement the method mark, which takes the
student drawing, relative weight and leniency value as arguments and returns a
MarkingLeafResult. The majority of methods related to the calculations of layout
tools are located in the Drawing and Figure interfaces of DATsys. Method calls can
return the co-ordinates of the location of the centre, width, height etc. of a figure.
Enumerating the figures within a drawing object for the purposes of traversal is a
repetitive task which varies little between diagram marking tools. As with all
CourseMarker exercise components, the best way to implement a new layout tool is
to copy an existing tool and make necessary modifications, rather than attempting to
implement “from scratch”. In most cases, the mathematical formulae for layout
measures can be translated directly into algorithms for implementation into layout
marking tools.
Once the raw score from the algorithm is returned, scaling should be undertaken
using the leniency value. The new mark value is used to parameterise the constructor
of MarkingLeafResult, along with the description and feedback returned by the
tool, and the weight, which is unchanged during the marking process.
The process of invoking and parameterising the layout tools is examined further in
section 7.2.1.5.
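The scaling step can be sketched as follows. The formula here — treating the leniency as a fraction by which the raw score's shortfall is forgiven — is an illustrative assumption, not the exact CourseMarker computation.

```java
public class LeniencySketch {
    // Scale a raw score in [0, 1] to a mark in [0, 100], boosted towards
    // full marks by the leniency fraction (assumed semantics).
    static double scale(double raw, double leniency) {
        double boosted = raw + (1.0 - raw) * leniency;
        return Math.round(boosted * 100.0);
    }

    public static void main(String[] args) {
        // raw 0.8 with leniency 0.5: half the shortfall is forgiven.
        System.out.println(scale(0.8, 0.5));
    }
}
```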
7.2.1.4 Prioritisation and truncation strategies
Like layout tools, prioritisation and truncation strategy tools implement specific
interface methods, deduce their input data in a standard way from the provided
parameters and return an object, in this case a MarkingCompositeResult, to
conform to the interface. Similarly, the most straightforward way to implement a
new prioritisation or truncation strategy tool is to copy an existing tool of the correct
type and modify the central algorithm to conform to the strategy outlined by the
educator.
There are four types of prioritisation and truncation strategy tools which may be
implemented.
Solution Case Strategy marking tools extend the SolutionCaseTool class and
must decide how to distinguish between mutually exclusive alternate solution case
feedback and prune the feedback tree accordingly. MarkingCompositeResult
objects containing feedback from the common feature case will contain the substring
“common” in their description, whilst those representing mutually exclusive
alternate solution cases will have the substring “exclusivex”, where x is the
solution case number.
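A Solution Case Strategy can recognise the two kinds of node by these description substrings. A minimal sketch, with plain strings standing in for MarkingCompositeResult descriptions and the chosen case supplied externally:

```java
import java.util.List;
import java.util.stream.Collectors;

public class SolutionCaseSketch {
    // Keep the common-features node and exactly one exclusive case,
    // identified by the "common" / "exclusiveN" description convention.
    static List<String> prune(List<String> descriptions, int chosenCase) {
        return descriptions.stream()
                .filter(d -> d.contains("common")
                          || d.contains("exclusive" + chosenCase))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Case 2 was judged most relevant; case 1 feedback is discarded.
        System.out.println(prune(
            List.of("common", "exclusive1", "exclusive2"), 2));
    }
}
```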
Features Sort Strategy marking tools extend the FeaturesSortTool class and must
implement an algorithm to apply criteria for sorting to features test feedback for the
purposes of prioritising feedback leaf nodes, and deciding how to prioritise features
results from the common and mutually exclusive cases relative to each other.
Aesthetics Sort Strategy marking tools must apply an algorithm for sorting and
criteria for prioritisation, this time between the feedback from aesthetic and
structural marking tools. Aesthetics Sort Strategy marking tools extend the
AestheticsSortTool class.
Truncation Strategy marking tools must implement an algorithm to truncate the
overall MarkingCompositeResult in order that feedback can be returned to the
user. Truncation Strategy marking tools extend the TruncationTool class.
Prioritisation and truncation strategy tools must be placed in the appropriate
package within com.ltr.cm.marking.tool.prioritisetruncate. Solution
Case Strategy marking tools are located within the solutioncasestrategies
package, Features Sort Strategy marking tools within the
featuressortstrategies package, Aesthetics Sort Strategy marking tools within
the aestheticssortstrategies package and Truncation Strategy marking tools
within the truncationstrategies package.
The process of invoking and parameterising the PrioritiseTruncateTool with
marking tools from each strategy area is examined further in section 7.2.1.5.
7.2.1.5 The marking scheme
Figure 7.2 shows an example of an exercise marking scheme. Twelve points of
interest are noted on the figure for reference.
The CourseMarker course structure hierarchically stores exercises in directories
which conceptually represent courses, units and exercises. The package structure
reflects the directory structure of the course. Marking commands and tools must be
imported to make them available to the marking scheme class (point 1). The
markExercise() method (point 2) returns a TMarkingResult object; this is the
point of integration to CourseMarker’s feedback delivery facilities, since it is the
TMarkingResult object which is used to populate the graphical feedback tree
representation. String and integer range boundaries can be specified to configure the
look of the feedback tree to the student.
package MyCourse.MyUnitOfExercises.SimpleExercise;

import com.ltr.cm.marking.*;
import com.ltr.cm.marking.cmd.*;
import com.ltr.cm.marking.tool.prioritisetruncate.solutioncasestrategies.*;
import com.ltr.cm.marking.tool.prioritisetruncate.featuressortstrategies.*;
import com.ltr.cm.marking.tool.prioritisetruncate.aestheticssortstrategies.*;
import com.ltr.cm.marking.tool.prioritisetruncate.truncationstrategies.*;
import com.ltr.cm.marking.tool.layout.aesthetics.*;                          // (1)

public class mark extends TBaseMarkScheme {

    public TMarkingResult markExercise() {                                   // (2)
        String[] strRange = {"Rotten", "Poor", "Good", "Excellent"};
        int[]    intRange = { 40, 50, 80, 100 };

        TMarkingResult am1 = execute(new NonIntersectionCMD("Simple.draw", 4, 0.5)); // (3)
        am1.setWeight(4);
        am1.setFeedbackRange(strRange, intRange);

        TMarkingResult am2 = execute(new EquilibriumCMD("Simple.draw", 3, 0.21));    // (4)
        am2.setWeight(6);
        am2.setFeedbackRange(strRange, intRange);

        MarkingCompositeResult amcr = new MarkingCompositeResult("Aesthetics Measures"); // (5)
        amcr.addChild(am1);
        amcr.addChild(am2);

        TMarkingResult feat0 = execute(new DiagramFeaturesCMD("Simple.draw",
                                                              "SimpleExercise.ft0")); // (6)
        feat0.setFeedbackRange(strRange, intRange);

        MarkingCompositeResult f0mcr = new MarkingCompositeResult("Common Features"); // (7)
        f0mcr.addChild(feat0);

        TMarkingResult feat1 = execute(new DiagramFeaturesCMD("Simple.draw",
                                                              "SimpleExercise.ft1")); // (8)
        feat1.setFeedbackRange(strRange, intRange);

        MarkingCompositeResult f1mcr = new MarkingCompositeResult("Exclusive1");      // (9)
        f1mcr.addChild(feat1);

        MarkingCompositeResult rawtree = new MarkingCompositeResult("Main");          // (10)
        rawtree.addChild(amcr);
        rawtree.addChild(f0mcr);
        rawtree.addChild(f1mcr);

        // (11) rawtree is prioritised and truncated by the four concrete strategies
        MarkingCompositeResult feedback = new PrioritiseTruncateTool(
                new DistinctionFirstSolutionCaseTool(), new MergeEqualFeaturesSortTool(),
                new MergeEqualAestheticsSortTool(), new PriorityBothTruncationTool());

        return feedback;                                                              // (12)
    }
}
Figure 7.2: A simple marking scheme for a formative exercise
The NonIntersectionCMD is called (point 3) and the feedback generated by the
associated NonIntersectionTool is given the variable name am1. The
NonIntersectionTool returns a MarkingCompositeResult, which is an
implementation of the TMarkingResult interface. Various properties of the
TMarkingResult, such as the weight, can still be manually set, if required.
The EquilibriumCMD is invoked using the same mechanism (point 4); in a
complete marking scheme all eleven aesthetic measures, plus any required structural
measures, would be invoked in this way. Here they are omitted for the sake of
brevity. Next, a new MarkingCompositeResult is generated to store the feedback
generated by all aesthetic measures (point 5). Structural measures are dealt with in
the same way as the aesthetic measures. Each structural measure is invoked and its
marking result is stored as a TMarkingResult. Once all structural measures have
returned their results, they are all added as child nodes to a
MarkingCompositeResult for later prioritisation and truncation.
The next stages demonstrate the marking of features test cases. The
DiagramFeaturesTool is invoked (point 6) to carry out the features assessment
upon the common features tests expressed in the file SimpleExercise.ft0. A
composite marking result for common features is created with common features
feedback as its child nodes (point 7). The process is repeated for the first mutually
exclusive features case (points 8 and 9). Generally, a marking scheme would invoke
at least two mutually exclusive features cases; if only one model solution is acceptable,
then the use of mutually exclusive features testing is superfluous.
A composite marking result to encompass all feedback is generated (point 10), with
the composite marking results for the aesthetic and structural measures (in this case,
there are no structural measures) and the features test cases being added as children.
The PrioritiseTruncateTool is then invoked, parameterised by four concrete
strategies (point 11). The PrioritiseTruncateTool utilises each of the strategies
in turn to generate a new MarkingCompositeResult. Finally (point 12), the new
MarkingCompositeResult is returned as feedback.
7.2.2 Guidance for educators
7.2.2.1 Prerequisites
Conceiving and constructing assessment materials is a non-trivial task. Assessment
materials for formative assessment benefit from a great potential for re-use across
many academic sessions since plagiarism between students is not an issue.
Unfortunately, due to the care which must be taken in defining the feedback and the
amount of time necessary to consider alternative model solutions, formative
assessment materials require considerable development time. Resource-savings are
therefore optimised by creating exercises which can be re-used.
Formative assessment is intended to assist student learning. Therefore, exercises
based upon logical application of domain principles are to be preferred over
deliberately misleading questions, especially in the early stage of a course of
exercises. A briefing on formative assessment principles is provided in [Kp01]. The
primary deliverable associated with formative assessment is feedback, rather than
assessment marks or grades. For this reason, great care must be taken in constructing
the feedback for the exercises. The nature of feedback for formative assessment is
discussed in [JMM+04]. Good formative assessment using CBA courseware depends
upon a successful interaction between the assessment materials and the courseware
itself. Thus, an examination of CBA exercises using CourseMarker and the guidance
for developers provided in section 7.2.1 is likely to prove useful. Knowledge of the
way in which features testing operates in existing CBA exercises is a pre-requisite to
specifying features testing using the mutually exclusive features test cases allowed
by the extensions.
This section outlines the issues in constructing assessment materials to utilise the
formative assessment potential offered by the new extensions described in this work.
7.2.2.2 Identifying harbingers and specifying distinction tests
Given an assessment specification to which there is more than one possible model
solution, the first task is to identify those elements which are common to all model
solutions and to construct features tests which assess the solution based upon those
elements, or combinations of those elements, alone.
For each model solution, it is next necessary to consider those elements which are
uncommon. It is necessary to emphasise that, although features tests search for
features expressions which, in turn, are developed around the idea of searching for
desired elements, the precise nature of the relationship between features tests and
elements varies across both domains and the preferences of educators.
Features tests within the mutually exclusive solution cases will, therefore, be based
upon testing for the presence (or absence) of combinations of both common and
uncommon elements in the model solutions which allows the pedagogic
understanding of the student to be assessed and meaningful advice, in the form of
feedback, to be given.
An important features test, which must be defined for each mutually exclusive
alternate solution case, is the distinction test, which may be used to determine which
of the model solutions is most related to the attempt of the student.
The initial effort of the educator should be directed towards identifying a perfect
harbinger within each of the model solutions. A perfect harbinger is an element (or,
likely, a combination of elements) which defines the key difference which
distinguishes the model solution. Although other elements within the model solution
may be uncommon, it is likely that they could have occurred as a consequence of the
choice of elements in the perfect harbinger. The task of the educator is, then, to create a
distinction test based upon the perfect harbinger which returns helpful, domain-specific feedback based upon the reasons why the model solution is distinguished.
All mutually exclusive solution cases must, by definition, contain an uncommon
element (or combination of elements). If a model solution contains no perfect
harbinger, then an element or combination of elements must be used as the basis for
the distinction test which fulfils the minimum criterion of being unique to the model
solution. This still allows the system to make a definite distinction between
alternative model solutions. It is, however, less ideal from a pedagogic standpoint and
is thus referred to as an imperfect harbinger. The construction of useful feedback may
prove a more difficult task for the educator when using imperfect harbingers.
7.2.2.3 The weighting system
Unlike most of the advice which is summarised in this section, the system of
weighting might be most easily understood by those with the least experience in
setting CourseMarker exercises. Weights are attached to features tests using the same
mechanism as for standard CourseMarker exercises. However, their meaning within
formative assessment exercises changes.
In summative assessment exercises, weight was assigned to features tests to
represent the relative weight of the features test in assigning grades. Therefore, the
highest weights were awarded to the most difficult features tests in order to
designate credit fairly to the more able students. In formative assessment exercises,
however, the weights refer to the priority of the feedback. Therefore, the highest
weights are awarded to the most fundamental (usually the easiest) features tests,
since the most fundamental aspects of a student solution must be corrected first,
before moving on to the more advanced features of the student solution at
subsequent stages, when the student has successfully attained the basics.
7.2.2.4 Configuring and specifying aesthetic and structural measures
The key difference between aesthetic and structural measures is that the former are
domain-independent, while the latter are domain-specific. New aesthetic measures
will rarely need to be implemented, and then only where a measurable criterion can
be demonstrated to have domain-independent assessment validity.
must, however, be examined when each new domain is to be assessed for the first
time. Many domains will require the specification of no structural measures. In this
case only the prioritisation of the aesthetic measures will need to be considered. If
structural measures are required then these, together with their priority relative to
the aesthetic measures, will need to be specified.
A measure is based upon any algorithm which, when applied to a student drawing,
produces a numeric value to indicate success (or compliance with the criterion). A
variety of aesthetic measures have already been designed and implemented. Suitable
structural measures depend upon the properties of the domain to be assessed, and
their representation as an algorithm. For example, given a domain in which all nodes
of type b must be located exactly vertically underneath a corresponding node of type
a, a structural measure could be defined as the proportion of nodes of type b which
do, in fact, reside vertically underneath an a node. Suitable properties to be examined
include the positions of nodes, since their centres and dimensions can be determined
by the marking tools.
Configuring the marking tools involves first specifying the leniency of the tool. The
leniency lowers the threshold at which good feedback is returned to the student by a
measure. It is represented as a percentage. A good method of determining suitable
leniency for an exercise is to use the measures, configured to have no leniency, to
assess the model solutions, in order to develop an idea of what is realistically
possible within context.
Prioritising the marking tools is a straightforward concept, but great care must be
taken to achieve a useful balance. Priority is determined by using an integer.
Priorities are relative; the numbers could represent percentages in the mind of the
educator, but any system may be used so long as consistency is maintained
throughout. It is especially important not to weight measures too disproportionately.
If disproportionate weighting is applied, then certain measures may never qualify to
return feedback to the student since their priority would be constantly overridden.
7.2.2.5 Specifying and configuring prioritisation and truncation strategies
Once the assessment of aesthetic and structural measures, together with that of the
features cases, has been achieved, a raw tree of all feedback comments is generated
by the system. At this point it is necessary to prioritise the feedback and truncate the
tree to leave only that feedback which is most important for the purposes of the
student. This process is accomplished through a four-stage process. In the first stage,
the most relevant mutually exclusive solution case is determined. The feedback
associated with all other cases may be discarded at this stage. In the second stage, the
priority of the feedback generated by the features testing is determined and sorting
carried out. The feedback nodes from the common features test case may be either
merged with those from the most relevant mutually exclusive solution case, or kept
separate, dependent upon context. In the third stage, priority of aesthetic and
structural measure feedback is determined and the feedback nodes sorted. In the
fourth and final stage, the resultant feedback tree is pruned according to a method of
truncation.
At each stage, it is necessary to visualise the process of prioritising and truncating the
tree as if completing the task by hand. Careful examination of the processes used,
within the context of the domain, to choose feedback to return to the student in a
manual process may result in the visualisation of a suitable algorithm. Examination
of previously implemented algorithms may be a further source of inspiration (or
even reveal suitability for simple re-use).
7. Issues in implementation and advice for educators and developers
198
When specifying an algorithm it will reduce future development effort if algorithms
are specified generally and allowed to be parameterised for the purposes of
configuration. For example, consider a simple algorithm to remove all feedback
except the 2 highest priority features feedback nodes. It would be better to specify an
algorithm which removes all feedback except the n highest priority features feedback
nodes, and specify n = 2 through parameterisation. This increases the scope for reuse in future contexts and maximises the resource-savings associated with the
courseware.
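Such parameterisation can be sketched as follows (illustrative names only; the courseware's actual strategy classes are not reproduced here):

```python
# A generally specified truncation algorithm: keep the n highest-priority
# feedback nodes, with n supplied through configuration rather than fixed.

def truncate_top_n(feedback_nodes, n=2):
    """feedback_nodes: (priority, comment) pairs; returns the n highest."""
    return sorted(feedback_nodes, key=lambda node: node[0], reverse=True)[:n]

# n = 2 reproduces the fixed two-comment algorithm described above,
# while other exercises may configure a different n.
kept = truncate_top_n([(1, "minor"), (5, "major"), (3, "moderate")], n=2)
```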
7.2.2.6 Writing good feedback comments
Specifying good feedback comments is a non-trivial undertaking and likely to
consume a large proportion of an educator’s development time. [JMM+04] provides
a useful overview of feedback comments and their relationship to conceptual
frameworks of student-centred learning. In general, good CBA practice encourages
student research after each submission. Such research can be encouraged by
linking feedback comments to assessment materials. One approach is to develop
extensive teaching materials which the feedback can reference directly; its
disadvantage is that very large amounts of time and resources are required to
develop the materials. A more common, and successful, mechanism for encouraging
student research is to provide references to appropriate texts which are
available to the student online or through institutional infrastructure such as
library facilities.
The feedback comment itself should be of a positive, motivational nature. The
feedback comment should emphasise good practice related to the shortcoming
within the student solution which caused the comment to be returned, rather than
stating the failing of the student solution directly. An example of a scenario would be
a student diagram in which a required connection line between two existing nodes is
absent from the student solution. Suitable feedback would explain the options for
connecting nodes of the type in question and provide a suitable reference to further,
relevant, information. Less suitable feedback would simply state to the student
that the link is missing.
7.2.3 Summary
Section 7.2 provided essential guidance for both educators and developers. Section
7.2.1 provided an overview of the issues arising in the development of CBA
assessment materials for formative assessment using the courseware and
demonstrated the way in which implementation of the exercises would occur in
practice. Useful references were provided to existing documentation. The
implementation and configuration of features testing regimes, layout tools and
prioritisation and truncation strategies was examined. Finally, a simple marking
scheme was presented with a step-by-step explanation attached.
Section 7.2.2 provided an overview from the point of view of the educator.
Knowledge prerequisites were indicated, and the topics of identifying harbingers and
distinction tests, configuring and specifying aesthetic and structural measures and
prioritisation and truncation strategies and the writing of good feedback comments
were discussed.
7.3 Summary
This chapter provided an overview of the issues arising from the implementation of
the extensions and their integration into the CourseMarker architecture, together
with useful advice for developers and educators in the formative assessment of new
domains and the setting of exercises. The aim of the implementation, to facilitate
research into the feasibility and usefulness of automating the formative assessment
process, within diagram-based domains, using CBA courseware was discussed. To
this end, the implementation itself was described, and the way in which the
implementation can be used by a combination of educators and developers to
produce assessable course exercises was discussed in detail.
Chapter 8
Use and evaluation
Introduction
This chapter argues that the development of the extensions and their integration into
the existing CourseMarker courseware has resulted in a system which allows the
formative assessment of diagram-based domains to be automated in a manner which
is both feasible and useful.
Following the implementation overview presented in chapter 7, the purpose of this
chapter is to illustrate the use of the system, discuss initial results in the development
of formative, diagram-based CBA, evaluate the courseware from the perspectives of
CBA, formative assessment and educational diagramming, evaluate the integration
between the extensions and the existing architecture and discuss general conclusions
with regard to the three research areas related to the work.
Formative exercises, utilising the implemented extensions, have been implemented
in two domains. These exercises were evaluated through their use by students.
Results from the exercises were available for scrutiny and great attention was paid to
comments from students and to the responses to questionnaires.
Section 8.1 outlines the objectives of the chapter. Section 8.2 provides an overview of
the exercises in terms of their development process, use by students and evaluation.
Sections 8.3 to 8.5 evaluate each of the extensions in turn with respect to CBA,
formative assessment and educational diagramming considerations. Section 8.6
draws together general conclusions in order to argue that the central objective of the
work has been met.
8.1 Objectives
This chapter has two main objectives:
• To evaluate the implemented extensions in terms of criteria defined by the
three research areas of CBA, formative assessment and educational
diagramming and to determine the effectiveness of their integration into
existing courseware;
• To test formative, diagram-based CBA in practice and draw initial
conclusions about the benefits such an approach brings.
A further objective is to reflect upon the feasibility and usefulness of conducting
formative computer-based assessment in diagram-based domains.
The objective of the three extensions is to enhance the functionality of the
CourseMarker / DATsys courseware to take into account the shortcomings of the
existing system with regard to conducting formative exercises, which were identified
during the initial phase of research, summarised in chapter 4, and by applying
detailed consideration of the requirements, as demonstrated in chapter 5. By
integrating the extensions into the existing CourseMarker / DATsys courseware it is
possible to take advantage of existing features, such as the ability to define
representations for new diagram domains without programming, the ability to
specify customised student diagram editors and a stable, reliable platform for
delivering CBA across a departmental network and collecting administrative data.
8.2 Examples of formative, computer-based assessment exercises in
diagram-based domains
8.2.1 The process of exercise creation
The authoring of a formative, diagram-based CBA exercise, based upon a problem
specification, involves a series of stages.
Firstly, the Daidalos editor must be used to build a tool library which represents the
domain notation, including the nodes and connection lines associated with the
domain and the Border Tool. This stage must be undertaken once for each new
domain.
Secondly, the marking tools must be developed and configured on a domain-specific
basis. The DiagramFeaturesTool can be used to assess diagram features using
several generic operators, but if more specific functionality is required then this tool
must be extended or a new, suitable tool developed. Evaluation of the domain
requirements must be used to indicate whether domain-specific structural measures
need to be developed. In all cases, the relative weighting of the aesthetic layout
measures must be considered. Finally, the prioritisation and truncation strategies
must be decided. If existing prioritisation and truncation strategies are suitable
within context, then simple parameterisation occurs. Otherwise, new prioritisation
and truncation strategies must be developed.
Thirdly, the Ariadne editor must be used to build the individual exercises. A subset
of the tools from the tool library is defined, application features are selected and
configuration of exercise options is undertaken. Model solutions for the exercise are
drawn on Ariadne’s drawing canvas for later reference. Configuration of the
marking tools on a per-exercise basis can be achieved through the text editors
associated with Ariadne (or with a simple editor such as Notepad).
Once a domain has been defined and exercises developed, CourseMarker can be
used to manage the full lifecycle of the CBA exercises. This involves the same stages
for formative exercises developed using the extensions described in this work as for
previous CourseMarker exercises, namely:
• The testing and deployment of the exercise using CourseMarker;
• The running of the exercise and the marking of student solutions;
• Exercise administration.
The administering of the exercise involves the collecting of student solutions,
marking results and feedback for the purposes of evaluation of the results.
Formative assessment exercises in two domains were assessed using the courseware.
The exercises were offered to students on a voluntary basis only for reasons of
institutional administration. The authored exercise domains were in UML Class
Diagrams and in UML Use Case Diagrams.
The task of authoring new exercise domains is very lengthy, but straightforward. The
outcome benefits both students and educators and, furthermore, each domain need
only be developed once and added to the repertoire of the system. Both implemented
domains are features marked by the DiagramFeaturesTool and had no special
layout requirements, meaning that only configuration of the existing aesthetic
measures was required.
The use of Daidalos to create tool libraries involves drawing the diagram elements on
the canvas, selecting the elements and defining the connectivity properties (if
required). The data model of the elements is specified, including whether the
elements are editable and their Names. The new, composite, element is then placed
into the tool library to be used repeatedly, at will.
The use of Ariadne to author exercises in those domains which have already been
developed involves several operations. Parameters for Theseus must be specified to
configure the menus, toolbar and other options which are available to the student in the
course of developing the exercise. Ariadne can be used to develop the marking
scheme, including specifying the features test cases. In order to accomplish this,
Ariadne invokes its own text editor. The exercise specification can also be input in
this way, along with the editing of the properties files which are required for all
exercises within CourseMarker.
Deployment and testing of the exercise through CourseMarker is then undertaken.
Theseus is invoked as the student diagram editor from within CourseMarker by
clicking the ‘Develop’ button after exercise set up. The exercise model solutions can
be pasted into Theseus from Ariadne in turn, and used for the purposes of testing the
marking and feedback results of the exercise and tweaking any problems.
Specifically, a recommended way to determine the leniency value for the aesthetic
and structural measures is to submit the model solutions, with no leniency applied,
and examine the raw scores awarded to get an idea of what it is reasonable for the
student to achieve within the constraints of the exercise. To access all the marking
data a temporary concrete truncation strategy can be used which performs no
truncation and changes the feedback from the measures to a string containing the
raw score. Care should be taken to remove this strategy and replace it prior to
making the exercise available to students.
Evaluation of the exercises was accomplished through two means. Firstly, the results
of the exercises were stored by CourseMarker and made available for analysis.
Secondly, questionnaires were distributed to students containing two types of
questions. The majority of the questions asked the student to agree with a series of
statements which were then scored on a five-point Likert scale [Lr32], from 1
(disagree) to 5 (agree). Finally, the questionnaire contained some open-ended
questions where students could make further, free-form comments. The
questionnaires will be
examined in further detail, along with the results obtained from students, in section
8.2.3.
8.2.2 Exercise domains and methodology
Prototype exercises were developed in two domains: UML Class Diagrams and UML
Use Case Diagrams. UML Class Diagrams are used in the design process of object-oriented systems to describe the classes within the system and their relationships to
each other. UML Use Case Diagrams describe sets of scenarios which capture
interactions between external actors and the system.
Section 8.2.2.1 provides a brief outline of the UML Use Case Diagram exercises,
whilst section 8.2.2.2 provides an outline of the UML Class Diagram exercises.
Section 8.2.2.3 outlines the methodology.
8.2.2.1 UML Use Case Diagram exercises
UML Use Case Diagrams are used to describe interactions between users and the
system. They are conceptually easy for many students to understand and so were a
suitable choice for the first domain to be implemented.
The authoring of prototypical exercises in the UML Use Case Diagram domain
involves first specifying the nodes and connections to be used by students to
construct their solutions. Two types of domain nodes are available to the student,
namely actors and use cases, while one type of domain connection, interaction, is
available. The final tool which is made available is the Border Tool object, which is
used by the students to describe the physical borders of their diagrams.
Figure 8.1 demonstrates the tool library which is developed for UML Use Case
Diagram exercises, while figure 8.2 shows a simple diagram constructed using the
tool library.
The task of using Daidalos to create the tool library did not require much effort. The
tool bar components are composed of groups of standard shapes, graphical
primitives and text elements. Each tool has a data model which defines the name of
the tool (for example, Actor), while each of the sub-components has also been named
for reference purposes. The naming of these tools facilitates the basic mechanism for
features marking of the resulting diagrams in the same way as in the earlier
entity-relationship diagram coursework.
Figure 8.1: The tool library for UML use case diagrams
Figure 8.2: A simple use case diagram using the tool library
With the tool library complete, Ariadne is used to configure the features available to
the student with the Theseus student diagram editor. Ariadne is then used to
develop the marking scheme, configure the marking tools and the CBA exercise.
The DiagramFeaturesTool was used for features testing of the UML use case
diagram solutions. The exist, exact, connection and exactConnection operators can be
applied by stating the Name and, if required, the Text Content of the nodes (Actor
and UseCase) and the Name, Start Node and End Node of the connection (Interaction)
in the same way as for entity-relationship diagrams.
The process of developing the marking tools, exercises and feedback has been
described in section 8.2.1. The application of this process to the UML Use Case
Diagram Exercises is evaluated in section 8.2.3.
8.2.2.2 UML Class Diagram exercises
UML Class Diagrams are used in the design process of object-oriented systems to
describe the classes within the system and their relationships to each other. The
authoring of prototypical exercises in the UML Class diagram domain involves,
again, first specifying the nodes and connections to be used by students to construct
their solutions.
Two types of domain nodes are available to the student, both of which represent
classes. The first simply allows the class name to be defined, while the second has
editable text components for the class name, attributes and operations. Four types of
domain connections are available: one-way associations, two-way associations,
generalisation and implementation. Furthermore, generalisation must be configured
as an “elbow-type” connection line. The final tool which is made available is the
Border Tool object, which is used by the students, again, to define the physical
borders of their diagrams.
Figure 8.3 demonstrates the tool library which is developed for UML Class
Diagram exercises, while figure 8.4 shows a simple diagram constructed using the
tool library.
Once again, the task of using Daidalos to create the tool library is straightforward.
The tool bar components are composed of groups of standard shapes, graphical
primitives and text elements. Each tool’s data model is defined by naming the object
after inserting it into the tool library, while each of the sub-components is named for
reference purposes. Ariadne is again used to configure the features available to the
student with the Theseus student diagram editor. Ariadne is then used to develop
the marking scheme, configure the marking tools and the CBA exercise.
Figure 8.3: The tool library for UML class diagrams
Figure 8.4: A simple class diagram using the tool library
The DiagramFeaturesTool was used for features testing of the UML Class
Diagram solutions. The operators can be applied by stating the Name and, if
required, the Text Content of the nodes and the Name, Start Node and End Node of
the connection in the same way as for the use case diagrams.
The general process of developing the marking tools, exercises and feedback has
been outlined in section 8.2.1. The application of this process to the UML Class
Diagram Exercises is evaluated in section 8.2.3.
8.2.2.3 Methodology
30 undergraduate Computer Science students in their second year undertook the Use
Case Diagram exercises, whilst 28 students (a subset of the 30) undertook the Class
Diagram exercises. The students were volunteers; unlike the previous experiment
described in chapter 4, there was no attached summative element to the exercises.
Three exercises were set within each diagram domain, of gradually increasing
complexity.
Quantitative data was collected, as before, by using CourseMarker’s Archiving
Server and by using Likert scale questions in student surveys. The student solution at
each submission was captured using CourseMarker’s Archiving Server, together
with the hidden marks and the feedback.
In the previous experiment the questionnaires received a poor response because their
dissemination had been inadequately planned. For the Use Case and Class Diagram
exercises the students were provided with the questionnaire soon after they were
registered to take the exercises and were asked to complete and return the
questionnaire once they had finished the course. The response rate was improved in
comparison with the previous experiment.
In the questionnaires, the students were asked to indicate their agreement with a
series of statements by choosing their level of agreement on a 5-point Likert
scale. The statements were designed to assess whether the students had
found the exercises easy to comprehend, whether the feedback provided by the
system was considered useful, whether the students thought the system had assisted
their learning process and whether the exercises were motivational, provoked the
students to conduct further research to improve their answers and were considered a
good use of time. The questionnaires were kept brief (11 statements) to try to
minimise the extent to which the students found them tiresome. The 11 statements
are presented in table 8.2.
Again, qualitative data was collected through the use of open-ended questions at the
end of the student surveys. For these exercises there were no formal laboratory
sessions and no paid lab tutors. Instead, students engaged with the course at their
own pace and at their own time and asked for assistance by contacting a course email
address. Further qualitative data was obtained by keeping records of the emails sent
by the students as a result of the course. This substituted for the tutor interviews.
8.2.3 Use and evaluation of the prototypical exercises
8.2.3.1 Constructing and running the exercises
Three use case diagram exercises and three class diagram exercises were set as
coursework at the University of Nottingham. For logistical reasons, the students
were essentially ‘volunteers’: no means of compelling them to register for the
formative exercises was available. Students were sent information about
the formative exercises and asked to register by email to receive access to the courses.
Once a student had been added to the course list then viewing the course material
involved loading the CourseMarker client and entering the standard username and
password. No problems were encountered with this initial stage of the process for
two reasons:
•
Access to the CourseMarker client is available in all terminal rooms within
the Computer Science building;
•
The students had already used CourseMarker for previous exercises,
especially the Java programming exercises which are compulsory for all
Computer Science first year undergraduates, and were therefore familiar with
the principles of logging on to the system, choosing the course to view and
setting up their exercises.
The use case diagram exercises are attempted by students first, since they are the
easiest to understand conceptually and, moreover, since the exercise model solutions
constitute simpler diagrams than those for the class diagram exercises. The unit
specification gives reference to a small example exercise specification and model
solution. This is for the purposes of demonstrating good practice to the students.
Furthermore, the first exercise varies only slightly from the example and has only
one, simple model solution. The idea is to allow the student to concentrate on
becoming comfortable with the Theseus student diagram editor and to provide an
initial “confidence boost” before the second and third exercises present the student
with more substantial domain problems.
Since the first exercise does not require mutually exclusive solution cases, the
DiagramFeaturesTool is invoked only once by the exercise marking scheme. For
the subsequent exercises, mutually exclusive solution cases are identified and the
marking tool is invoked repeatedly using the method demonstrated in figure 7.2. The
identification of mutually exclusive solution cases is straightforward and the
repeated invocation of the marking tool is rendered a trivial task. The creation of
suitable feedback content is a non-trivial and very time-consuming process.
Traditional CBA feedback comments such as “connection x absent” are scrupulously
avoided but at extreme cost in exercise development time.
Model solutions are initially submitted to reveal the raw scores allocated by the
aesthetic measures. The aesthetic measures’ leniency is subsequently set equal to
the initial raw scores so that unrealistic layout expectations are avoided. This
process is straightforward, and the relative weighting of the aesthetic measures
is set equal for each measure.
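This procedure amounts to the sketch below. The helper name and the treatment of multiple model solutions (taking the minimum raw score per measure, so that matching any model solution is never penalised) are assumptions for illustration, as are the measure names and scores:

```python
# Sketch: derive per-measure leniency values from the raw aesthetic scores
# achieved by the model solutions when submitted with no leniency applied.

def leniency_from_models(raw_scores_per_model):
    """raw_scores_per_model: one dict per model solution, mapping measure
    name to the raw score (0-100) that solution achieved."""
    measures = raw_scores_per_model[0].keys()
    # Take the lowest raw score any model solution achieves for each measure.
    return {m: min(scores[m] for scores in raw_scores_per_model)
            for m in measures}

leniency = leniency_from_models([{"overlap": 92, "spacing": 85},
                                 {"overlap": 88, "spacing": 90}])
```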
The prioritisation and truncation of student feedback is achieved through the use of
four concrete strategies. The strategy for distinguishing between mutually exclusive
alternate solution cases was as discussed in section 6.3.5. Sorting the feedback for all
features tests was achieved as described in section 6.4.4. Since no structural measures
were deployed, comments were prioritised according to equation 6.6.
The truncation strategy utilised the concrete strategy outlined in section 6.4.4,
parameterised such that the 2 highest priority features comments were retained
along with the 1 highest priority aesthetic layout comment.
The UML class diagram exercises were attempted by students after the UML use case
exercises had been completed. Again, the unit specification gives reference to a small
example exercise specification and model solution for the purposes of clarifying good
practice. The first exercise varies only slightly from the example and has only one,
simple model solution, for motivational purposes.
The DiagramFeaturesTool is again used for features marking. Many of the
features of developing UML Class Diagram exercises are similar to those for the
development of the earlier, UML Use Case diagram exercises. The features tool is
repeatedly invoked in marking exercises 2 and 3, with the effort required to develop
feedback being the most arduous stage of the exercise development.
For the UML Class Diagram exercises, the relative weighting of the non-interception
aesthetic measure is reduced relative to the other aesthetic measures because the
Generalisation connection routinely intercepts other generalisation connections in a
manner which is not detrimental to the aesthetic layout of the diagram. Such an
interception is present, for example, in figure 8.4. The other aesthetic measures are
set to be equal for the purposes of the exercise, with leniency values which differ
from those of the UML Use Case diagram exercise but which are initially defined in
the same, standard manner.
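The effect of down-weighting one measure can be sketched as a weighted average. The measure names, scores and weights below are illustrative only; the actual values used in the exercises are not reproduced here:

```python
# Sketch: combine per-measure aesthetic scores (0-100) using relative weights.

def weighted_aesthetic_score(scores, weights):
    total_weight = sum(weights.values())
    return sum(scores[m] * w for m, w in weights.items()) / total_weight

# A low non-interception score (generalisation lines routinely crossing)
# drags the overall mark down less once its weight is reduced.
scores = {"non_interception": 60, "spacing": 90, "overlap": 90}
equal = weighted_aesthetic_score(
    scores, {"non_interception": 1, "spacing": 1, "overlap": 1})
reduced = weighted_aesthetic_score(
    scores, {"non_interception": 0.5, "spacing": 1, "overlap": 1})
```

With equal weights the overall score is 80; halving the non-interception weight raises it to 84, reflecting the educator's judgement that such crossings are not detrimental in this domain.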
Again, the prioritisation and truncation of student feedback is achieved through the
use of four concrete strategies. The concrete strategies utilised are the same as for the
Use Case Diagram exercises.
8.2.3.2 Evaluation of the exercises
A total of 36 students, all Computer Science undergraduates, volunteered and had
their usernames added to the course list. Of these students, 6 subsequently failed to
set up any exercises while 30 attempted the UML Use Case Diagram exercises. Of
these 30 students, 28 continued on to attempt the UML Class Diagram exercises.
Student marks for all exercises were consistently high. In all exercises, nearly all
students achieved “effective” full marks by the time of their final submission. The
term “effective” full marks is used here to indicate that the student submissions were
of a very high standard but did not achieve exactly 100% in features and aesthetics
marking. This situation resulted because:
• Students were presented only with feedback, and not marks. Consequently, so
long as the feedback was good then the student would not even realise that a
mark of 100% had not been achieved and would move on to the next exercise.
• Aesthetics measures did not always return 100% due to the nature of the
exercises.
The fact that students were presented with only feedback, with the underlying marks
withheld, combined with the fact that the assessment was purely formative, seemed
to reduce the temptation for gambling and perfectionism. The maximum number of
submissions made for any exercise in either domain was 17. The average number of
submissions for each of the exercises is shown in table 8.1.
Exercise                 UC1   UC2   UC3   Class1   Class2   Class3
Average submissions        3     4     5        1        5        3

Table 8.1: Average submission numbers for the prototype exercises
Students were asked to complete a brief questionnaire, in order to summarise their
experience of using the exercises and learning from the feedback. The majority of the
questions asked the student to agree with a series of statements which were then
scored on a five-point Likert scale [Lr32], from 1 (disagree) to 5 (agree). Table 8.2 shows
the statements and the mean score. 22 completed questionnaires were returned.
Students found the courseware easy to use. The questions were regarded as easy to
comprehend in both domains. Generally, students felt that the feedback was useful
in improving their diagram and re-submitting a better version and many students
were motivated to further research between submissions to find information which
elaborated on the feedback comments. This was helped, no doubt, by the references
included in the feedback comments themselves. Also, in general, students thought
the exercises helped their learning process and were a good use of their time.
A notable trend in the questionnaire results, however, is that the UML Use Case
Diagram exercises were more popular with students than the UML Class Diagram
exercises. Several students noted, in response to the request for free-form comments
at the end of the questionnaire, that the Class elements were difficult to edit so that
the result was aesthetically pleasing due to the poor flexibility of the text elements
holding the attributes and operations. It is clear that the authoring of a new domain
notation for class diagrams might improve student response in future.
Statement                                                    Mean score (N = 22)
The system was easy to use                                                   4.2
The Use Case Diagram questions were easy to comprehend                       4.4
The feedback provided to my submitted Use Case Diagram
coursework helped me to improve my diagram                                   4.0
The Use Case Diagram questions helped my learning process                    4.0
The Class Diagram questions were easy to comprehend                          4.1
The feedback given when I submitted the Class Diagram
coursework helped me to improve my diagram                                   3.5
The Class Diagram questions helped my learning process                       3.5
The feedback comments on appearance helped me to lay out my
diagrams more clearly                                                        3.8
The feedback I received for my submissions motivated me to
research further                                                             3.5
I made improvements to my solution as a result of the feedback I
received and re-submitted the improved version                               4.0
The exercises were a good use of my time                                     3.8

Table 8.2: Results of the student questionnaire
A notable trend across questionnaires was that those who responded more
favourably to the statement questions tended to leave no further comments, while
those who had responded less favourably were more likely to leave (critical)
comments. The most common critical comment was that there were not enough
exercises. It seemed that many students had hoped for a comprehensive series of
courses to assist them up to modular examination level. Unfortunately, the
development of such courses was not feasible due to time constraints and the
difficulty encountered in constructing good formative feedback comments for the
features tests. However critical, these comments do imply that the students wanted
more formative assessment using these methods. This is a desire which only the
continued development of the exercises would help to fulfil.
This section has described the process of constructing and running the prototypical
exercises. Subsequent sections will apply more rigorous examination of the
performance of each of the extensions in turn, according to the criteria laid down by
the three research areas of CBA, formative assessment and educational
diagramming.
8.3 Assessing the aesthetic layout of student diagrams: evaluating
performance
To assess the performance of the extension to allow the assessment of the aesthetic
layout of student diagrams to occur, it is necessary to link the experience of
designing exercise domains, authoring exercises, running the exercises and
generating feedback to students to the requirements from each of the disciplines of
CBA, formative assessment and educational diagramming identified in section 5.2.1.
Section 6.2.1 linked the design of the extension to its requirements. Here, we relate
the experience in use to those requirements.
8.3.1 Evaluating the extension as CBA
The prototypical exercises constitute good examples of CBA. The
domain notations are designed online, the exercises are developed online and the
management of the full lifecycle of a CBA exercise is achieved through an integrated,
online system.
The three main requirements of the extension in relation to CBA are met. The
aesthetic measures successfully provide a basis for the assessment of diagram
aesthetics in both assessed domains. The system of aesthetic measures is
sufficiently comprehensive that the implementation of structural measures has not
been required. However, given that structural measures operate in precisely the same
way as aesthetic measures, there is no reason to believe that they would have been
any less successful had additional layout criteria proved necessary for the domains.
It proved possible to take educator preferences and differences between domains
into account when assessing aesthetic layout. The system of
weighting allows the non-interception aesthetic measure to be “downgraded” in
relative importance for the UML Class Diagrams with a trivial amount of effort.
It has become obvious, however, that the system of weighting would be more
effective if the relative weights were defined based upon research outcomes rather
than what amounts to carefully considered guesswork. It should be noted, though,
that increased usage of the system would itself provide a good platform for that
research, with weights improving in accuracy year upon year.
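The weighting mechanism described above can be sketched as follows. This is an illustrative Python fragment, not CourseMarker code; the measure names, weight values and function are hypothetical. Each aesthetic measure yields a score, the overall layout mark is a weighted average, and downgrading the non-interception measure for a domain is simply a matter of lowering its relative weight.

```python
# Illustrative sketch (not CourseMarker code): combining per-measure
# aesthetic scores into a single layout mark via educator-defined weights.
def weighted_layout_mark(scores, weights):
    """scores and weights map measure name -> value; weights need not sum to 1."""
    total_weight = sum(weights[m] for m in scores)
    return sum(scores[m] * weights[m] for m in scores) / total_weight

# Hypothetical per-measure scores for one student diagram.
scores = {"non_interception": 0.6, "symmetry": 0.9, "orthogonality": 0.8}

# "Downgrading" the non-interception measure for UML Class Diagrams is
# just a matter of lowering its relative weight in the configuration.
uml_weights = {"non_interception": 0.5, "symmetry": 1.0, "orthogonality": 1.0}
print(round(weighted_layout_mark(scores, uml_weights), 3))  # 0.8
```

Because the weights are plain configuration values, revising them year on year, as the text suggests, requires no change to the marking code itself.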
The other requirements are met in their entirety. The extension is successfully integrated
into the marking and feedback systems and is transparent to students, who showed
no awareness that any of the “behind the scenes” processes had changed. Indeed, the
students seemed to regard the prototype exercises in much the same way as they had
the compulsory Java programming exercises they had completed previously, since
the exercise specifications were delivered and feedback was returned using a
consistent format for both.
8.3.2 Evaluating the extension as formative assessment
Most students agreed that the feedback comments relating to the aesthetic layout of
their solutions had helped to improve the appearance of their diagrams. The
feedback was motivational: many students were inspired by it to improve their
diagrams and re-submit better versions. Section 8.3.1 has discussed
the integration of the extensions into the feedback system, and the evaluation of the
extension for prioritising and truncating feedback will consider this issue further.
8.3.3 Evaluating the extension as educational diagramming
Within the context of educational diagramming, the key requirements have been
met. The system provides a basis for assessing diagrams generically through the
application of aesthetic criteria, which have successfully assessed diagrams in two
domains based upon different weighting and leniency configurations. A platform for
extension to accommodate new domains has been provided by allowing future
developers to develop further layout criteria, on a domain specific basis, which
operate according to the same design principles. Currently, the aesthetic measures
operate on a domain-independent basis apart from exercise-specific configuration,
and are based upon established aesthetic principles from research fields. The relative
importance of criteria can be taken into account through the system of weighting,
although section 8.3.1 has noted already that further research into precise weighting
values on a per-domain basis would be useful. The system of leniency values, by
contrast, benefits from being configurable through a defined mechanism.
Fundamentally, the system has succeeded in improving the appearance of student
diagrams and in assessing the clarity of the diagram as well as its features-based
correctness. This, therefore, represents a positive achievement in educational
diagramming.
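As a rough illustration of how a leniency value might operate, consider the sketch below. The text does not specify the mechanism's internals, so this is one plausible scheme with a hypothetical function name: a measure's raw deviation is forgiven up to the configured leniency threshold before it begins to reduce the score.

```python
# Hypothetical leniency scheme (the thesis's actual mechanism is not
# specified here): deviations within `leniency` are forgiven entirely,
# and only the excess reduces the measure's score.
def apply_leniency(deviation, leniency):
    """Return a score in [0, 1]; deviations within `leniency` score full marks."""
    excess = max(0.0, deviation - leniency)
    return max(0.0, 1.0 - excess)

print(apply_leniency(0.1, 0.2))  # within leniency: full marks
print(round(apply_leniency(0.5, 0.2), 2))  # excess deviation reduces the score
```

A per-exercise leniency value of this kind lets the educator tolerate minor imprecision in student layouts without rewriting the underlying measures.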
8.4 Assessing solutions with mutually exclusive solution cases:
evaluating performance
This section considers how the extension that allows solutions with more than one
acceptable model solution to be assessed, through mutually exclusive alternate
solution cases, has performed relative to the requirements in CBA,
formative assessment and educational diagramming which were established in
section 5.2.2. Section 6.2.2 previously linked the design of the extension to its
requirements. Here, we consider the experience in running prototypical exercises.
8.4.1 Evaluating the extension as CBA
The requirements for the extension from a CBA perspective have been met. Defining
mutually exclusive solution cases is a straightforward, repetitive process once the
multiple model solutions have been developed. It is necessary to consider all possible
model solutions — barring the minor variations in labelling which can be taken into
account by defining flexible regular expressions as Oracles — which raises the
possibility that a student with a particularly novel solution might not receive a
suitable response. However, given that exercises constructed using the extension are
intended for formative assessment purposes, there is no possibility of a student
losing credit through originality, and, furthermore, this issue did not arise during the
running of the prototypical coursework.
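The use of flexible regular expressions as Oracles to absorb minor labelling variation might look something like the sketch below. The case names and patterns are hypothetical, and this is not the actual CourseMarker Oracle syntax; it simply illustrates how a distinguishing pattern can select which mutually exclusive solution case a student diagram should be marked against.

```python
import re

# Hypothetical sketch: each mutually exclusive solution case carries a
# distinguishing "oracle" regex; the first case whose oracle matches an
# element label in the student diagram is the one marked against.
cases = {
    "inheritance": re.compile(r"extends|is[- ]a", re.IGNORECASE),
    "composition": re.compile(r"has[- ]a|contains", re.IGNORECASE),
}

def select_case(labels):
    """Return the name of the matching solution case, or None for a novel solution."""
    for name, oracle in cases.items():
        if any(oracle.search(label) for label in labels):
            return name
    return None

print(select_case(["Car has-a Engine"]))  # composition
```

A solution matching no oracle would fall through to `None`, which corresponds to the "particularly novel solution" scenario discussed above.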
The exercise developer is able to specify the common case and each of the mutually
exclusive solution cases using a consistent notation, with little more effort than for
conventional CBA exercises using CourseMarker. Furthermore, the extension is
integrated into the marking and feedback system in such a way that complete
transparency is achieved from the point-of-view of the student.
8.4.2 Evaluating the extension as formative assessment
From the perspective of formative assessment, the central requirement is feedback.
The difficulty of developing good formative feedback comments for CBA exercises
must not be underestimated when the process of exercise development begins.
Developing good feedback is a time-consuming process requiring the construction of
carefully phrased, motivational comments relating to the principles tested by the
associated features expressions within the features test. Furthermore, it is necessary
to link the comments to learning materials. This could be achieved, firstly, by
locating good reference material for the various feedback comments — trying not to
reference the same text repeatedly if student research is to be nurtured — or,
secondly, by developing a wide selection of bespoke research material for integration
into the CBA course. The first option is feasible, if sufficient priority is attached to the
exercise feedback by the educator. The second option may only be feasible in
extraordinary cases. A third solution which could be considered in the future is the
integration of courseware into content management systems, with active linking
between assessment feedback and teaching materials within the CMS.
The assessment process was able to determine which version of the model solution
the student was able to attain and to tailor the feedback accordingly. This process
occurred smoothly across both prototypical domains. Again, with reference to the
creation of feedback by the educator, most identifiable distinction tests were based
upon imperfect indicators, since it proved difficult to identify precise pedagogical
reasons which encapsulated the difference between different model solutions in
many cases. However, with effort, it proved possible to construct useful,
motivational feedback.
The success of the feedback when the formative assessment framework criteria,
summarised in section 2.2.5, are applied is considered in section 8.5.2.
8.4.3 Evaluating the extension as educational diagramming
The requirements within the context of educational diagramming are similar to those
considered in section 8.3.3. The system of aesthetic and structural measures, further
parameterised by relative weighting and leniency values, allowed a basis for
assessment in a wide variety of educational diagram domains to be provided.
The specification of common and mutually exclusive features is a standard process
involving the determination of common and uncommon solution elements in the
various model solutions. So long as the DiagramFeaturesTool is used to assess
diagram features in a generic way, consistency across domains is also achieved,
even down to the level of the features expressions which are evaluated by the
marking tools. Any operator available within the DiagramFeaturesTool may be
applied freely to any aspect of the student diagram, and this, indeed, occurred
successfully within the prototypical exercises. Further potential in allowing
educators to develop their own criteria is presented by the decoupling of the
extension from the features tool used. Although the DiagramFeaturesTool was
developed to provide generic functionality, new marking tools may be developed
and substituted for the DiagramFeaturesTool by merely changing the invocation
in the exercise marking scheme.
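The decoupling described above can be illustrated with a minimal sketch. These are hypothetical Python stand-ins for the marking tools (only `DiagramFeaturesTool` is named in the text; `CustomDomainTool` and the registry are invented for illustration): because the marking scheme names the tool it invokes, substituting a new tool changes only that invocation.

```python
# Sketch of the decoupling described above (hypothetical API, not the
# real CourseMarker interfaces): the marking scheme names a tool, so a
# new tool replaces DiagramFeaturesTool by changing only the invocation.
class DiagramFeaturesTool:
    def mark(self, diagram):
        # Generic features testing; here reduced to a single check.
        return 1.0 if "required_feature" in diagram else 0.0

class CustomDomainTool:
    def mark(self, diagram):
        # Domain-specific marking logic would go here.
        return 0.5

TOOLS = {"DiagramFeaturesTool": DiagramFeaturesTool,
         "CustomDomainTool": CustomDomainTool}

def run_scheme(tool_name, diagram):
    """The marking scheme's invocation: look the tool up by name and run it."""
    return TOOLS[tool_name]().mark(diagram)

print(run_scheme("DiagramFeaturesTool", ["required_feature"]))  # 1.0
```

Swapping `"DiagramFeaturesTool"` for `"CustomDomainTool"` in the scheme is the only change needed, which mirrors the single-invocation substitution the text describes.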
8.5 Prioritising and truncating the feedback: evaluating performance
The extension responsible for the prioritisation and truncation of feedback has
requirements from the perspective of CBA and formative assessment. This section
provides an overview of how the experience gained from the prototypical exercises
demonstrates that the requirements from these areas, defined in section 5.2.3, have
been fulfilled.
8.5.1 Evaluating the extension as CBA
The central requirement from the perspective of CBA was the integration of the
mechanism for prioritising and truncating feedback into the architecture, which was
achieved. Prioritisation and truncation successfully occur for each submission
without human marker intervention. Comment priority can be specified by the
exercise developer in terms of the system of weights; although some care must be
taken when defining the weights, the system was shown to operate successfully for
the prototypical exercises and the students were motivated by the resulting feedback.
8.5.2 Evaluating the extension as formative assessment
From a formative assessment perspective, the central requirement was to provide
only the most relevant feedback comments in order to allow the student to
concentrate on the key improvements required to their solutions, without
being “overloaded” by irrelevant comments. Flexibility for the exercise developer has
been achieved through the implementation of the system of Strategies, in which each
Strategy represents a sub-problem within the prioritisation and truncation of
feedback.
The purpose of the prioritisation and truncation strategy was to adapt the feedback
provided by automated assessment into a form which would, if combined with a
flexible marking system, motivational feedback and a CBA platform, provide a
framework for effective feedback for formative assessment to be delivered.
Section 2.2.5 outlined the properties to which a framework for effective formative
feedback should conform. Firstly, formative assessment should facilitate the
development of self-assessment, or reflection, in learning. The assessment programme
was not educator-led in any sense. The students were free to work through the
exercises at their own pace and relied for feedback upon the courseware assessment.
Feedback comments had been engineered to be motivational, to encourage further
research and to provide good references as a starting point for that research. The
students were therefore compelled to reflect upon how the information they had
been directed to would help to improve their coursework solution, a task involving
both critical self-assessment and the gradual improvement of a student’s internal
perception of what is required from the coursework.
Secondly, formative assessment should encourage teacher and peer dialogue around
learning. Students were aware that they could confer and collaborate on exercises to
the extent that they wished. Indeed, it would have been impossible to prevent
students from acting in this way due to the lack of restrictions on the availability of
the exercises through CourseMarker. Students could collaborate and assist each
other during exercises. Furthermore, several students sent emails containing queries
about coursework issues.
Thirdly, formative assessment should clarify what constitutes good performance.
The unit specification within each exercise domain was authored to include a simple
specification of an example exercise, together with links to diagram solutions which
would satisfy the requirements of the specification. Including the example in the unit
specification allowed students to grasp the goals of the unit at an early stage, so that
the correlation between the internal perception of the student and the actual goals of
the educator was maximised. Questionnaire results demonstrate both that students
found the exercises easy to comprehend and that students felt their coursework
solutions had been improved as a result of the feedback to earlier submissions.
Fourthly, formative assessment should provide opportunities to improve
performance. Table 8.1 demonstrates that students made multiple submissions for
exercises in the majority of cases. Students agreed that they utilised the feedback
provided to early submissions in order to improve their performance and re-submit.
The fact that the assessment process is fully automated allows students to submit
solutions several times with no increased workload for the educator.
Fifthly, formative assessment should deliver information focused on student
learning. The feedback was always delivered in good time due to the nature of the
automated assessment process. In practical terms, feedback is delivered
instantaneously in all cases. The prioritisation and truncation extension is a key
component in the process of delivering feedback which is not overwhelming in
quantity. A large number of features and aesthetic layout comments are sorted by
priority, with truncation allowed according to the specification of the educator, who
can control the number of criteria about which feedback is given. The student
perceives the feedback as targeted and hence does not lose the view of the exercise as
a holistic entity.
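The sorting and truncation step described above amounts to the following sketch. The data shapes are hypothetical (the real system operates on a feedback tree rather than a flat list, and the comment texts here are invented): comments carry developer-assigned weights, are sorted by weight, and the list is cut to the educator's limit.

```python
# Sketch of prioritisation and truncation (hypothetical data shapes):
# each feedback comment carries a weight set by the exercise developer;
# comments are sorted by weight and truncated to the educator's limit.
def prioritise_and_truncate(comments, limit):
    """comments: list of (weight, text); keep the `limit` highest-weighted."""
    ranked = sorted(comments, key=lambda c: c[0], reverse=True)
    return [text for _, text in ranked[:limit]]

feedback = [
    (0.9, "A class is missing from your diagram."),
    (0.2, "Two labels overlap slightly."),
    (0.6, "Several connectors cross unnecessarily."),
]
print(prioritise_and_truncate(feedback, 2))
```

With a limit of 2, only the missing-class and connector-crossing comments survive, so the student sees the most important improvements first rather than an exhaustive list.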
Sixthly, formative assessment should encourage positive motivational beliefs and
self-esteem. Students were aware that the prototypical exercises carried no
summative assessment weight or course credit. Therefore, students were able to relax
and enjoy the process of learning rather than concentrating upon the achievement of
good marks. Feedback concentrated on learning goals by specifying good practice
and referring to the educational literature. Questionnaire results demonstrate that,
overall, students thought that the exercises were a good use of their time, even
though no course credit was gained from them.
Seventhly, formative feedback should provide information to educators that can be
used to help shape the teaching. CourseMarker archives all submissions across a
course. Administrators can retrieve submissions, precise marks and the feedback
provided for each submission. Even details such as the time of the submissions are
stored. Substantial material for further research can be generated by the
implementation of exercises within such a CBA context.
8.6 Conclusions
Chapter 9 will review the key points of the thesis to show how the evaluation of the
system relates to the general objectives for research stated at the beginning. The
general objectives of this work asked several specific questions, which can now be
answered. The purpose of this section is to discuss each of these questions in turn
and to argue that the formative CBA of diagram-based domains is both feasible and
useful.
This work has demonstrated that the formative CBA of diagram-based domains is
certainly possible. Feasibility is assessed by determining whether the level of
difficulty encountered in developing and deploying the exercises would render the
process impractical for educators. Usefulness, on the other hand, is
primarily assessed by determining whether the exercises were useful to students and
enhanced their learning process.
The automated system for the marking of the aesthetic layout of student diagrams
was thought by students to have provided useful feedback which improved the
layout of their diagrams. Prototypical exercises have demonstrated that domain-specific layout rules are not required for each domain. The trade-offs required to
ensure generality across domains while at the same time allowing specialisation
involve the need to specify general functionality while also allowing that
functionality to be extended by future developers in a carefully defined way. The
system of aesthetic and structural measures accomplishes this through defining a
basic range of domain-independent functionality which can be invoked in many
domains, while allowing similar marking tools to be developed around domain-specific criteria in the form of structural measures. Furthermore, the distinction
between aesthetic and structural measures allows clarity when considering those
measures necessary to assess each new domain. Aesthetic measures should be used
by default and only sidelined with justification. Conversely, structural measures are
intended to be domain-specific, and therefore need not be considered for inclusion
when new domains are developed unless the educator has a specific reason to do so.
The extent to which it is possible for the educator to provide formative feedback in
many diagram-based domains by configuring the system and writing feedback
comments is constrained by the similarity of the domains. In the worst case, the
development of a marking tool, structural measures and new concrete prioritisation
and truncation strategies would have to be carried out once for each domain.
Exercises within the domain can then be created through configuration and the
writing of feedback content. However, such a worst-case scenario is not inevitable or
even common. The development of those components required for a new domain
can often take advantage of the similarity between many educational diagram
domains. Adaptation or even complete re-use of components designed for use with a
previous domain is plausible in many cases and the probability of existing
components being useful increases as new domains are developed for assessment
and more components are created. Furthermore, the extensions were created to
provide a generic basis for formative assessment, including a domain-independent
tool for features marking, a general suite of aesthetic measures and example concrete
strategies to solve the problem of prioritising and truncating student feedback.
The main area where standardisation of CBA processes has failed to occur is in the
creation of the formative feedback. This chapter has shown, through the
documentation of prototypical exercises, that CBA can be used to deliver good
formative assessment. It must be emphasised, however, that the effort involved in
creating the exercises was great due to the feedback requirements. With this in mind,
it must be understood that formative assessment can be rendered less resource-intensive through the use of CBA technology, so long as the potential for long-term
re-use of the exercises is considered. In fact, re-use of formative assessment diagrams
does not pose problems of question security in the same way that re-using
summative questions does. Furthermore, courses can be incrementally improved,
year-on-year, both by adding new exercises each year to increase the coverage of the
domain, and by taking into account student comments to improve existing exercises.
A formative assessment process which is automated using CBA technology has been
shown to enhance student learning. The exercises were popular with those students
who enrolled and can be shown, as in section 8.5.2, to conform to the framework for
good formative assessment practice.
Chapter 9 builds upon the experience documented within this chapter, and the
evaluation which followed, to initiate a discussion surrounding the two fundamental
questions which formed the basis of this work. Furthermore, the contributions of the
work are examined and future work is proposed.
Chapter 9
Conclusions
Introduction
This chapter reviews the key points of the thesis to show how the evaluation of the
system relates to the general objectives for research stated at the outset. The
contributions of the research are discussed and areas for further research to be
carried out in the future are considered.
Section 9.1 discusses the way in which the research has approached the problem of
conducting formative, computer-based assessment in diagram-based domains. A
summary is provided of how the work has met its general and specific objectives
according to the requirements set out in chapter 5. Section 9.2 provides a summary of
the contributions of this work while section 9.3 outlines areas for future research in
the key topic areas of the thesis. Finally, section 9.4 concludes with an epilogue on
CBA, formative assessment and educational diagramming.
9.1 Meeting the objectives
Chapter 5 demonstrated that, in order to prove that the automation of the formative
assessment of diagram-based coursework using CBA courseware is both feasible and
useful, the design, implementation and integration into the existing courseware of
three key extensions was necessary. The three identified areas of extension are:
• Extending the marking system to assess the aesthetics of student diagrams;
• Extending the marking system to allow the assessment of mutually exclusive solution cases;
• Changing the system of feedback to provide only the highest priority comments to students.
This section revisits these objectives and demonstrates that the key objectives have
been accomplished.
9.1.1 Assessing the aesthetic layout of student diagrams
Allowing the assessment of the aesthetic layout of student diagrams is necessary if
the formative assessment is to assist student learning within a diagram domain.
Educational diagrams convey domain-specific information through their convention
of meaning, but if the aesthetic layout of the diagram is poor then the meaning of the
diagram may be poorly understood. Initial experimentation, described in chapter 4,
showed that students often produced diagrams of poor aesthetic appearance which
conveyed information in an unclear way. If the aesthetic appearance of student
diagrams is not assessed, and no feedback to the student provided, then the student
will not be given the incentive to improve their diagram-based solution in this key
aspect.
The design and implementation of the extensible mechanism to allow the aesthetic
layout of student diagrams to be assessed has been successful. The extension was
realised and integrated successfully into the existing CourseMarker architecture. The
potential for domain coverage is large. Representations for new domains can be
authored easily using the Theseus diagram editor. The only constraint, if the
aesthetic layout of the domain is to be assessed, is the necessity of including within
the domain notation a Border Tool which can be used by the student to indicate the
boundaries of their diagram; this is a trivial task. The system of aesthetic and
structural measures allows the aesthetic layout of any diagram domain to be
assessed. Aesthetic measures are built around domain-independent, general purpose
aesthetic layout criteria which can be used to assess the diagram appearance of a
large number of educational diagram domains. Structural measures can be used to
extend the layout marking to incorporate any domain-specific criterion that may
arise when assessing the aesthetics of a new domain.
The construction of new structural measures requires programming from the
developer and, as such, is non-trivial. However, layout tools do not require extensive
coding and, furthermore, are not required at all for many educational diagram
domains which may be assessed successfully using the provided aesthetic measures
alone. Once a structural measure has been created, it can be used for all exercises of
the same type.
The extension to allow the assessment of the aesthetic layout of student diagrams
was designed with the requirements to provide a basis for assessment of aesthetic
layout, to allow extension to support future aesthetic layout requirements and to
integrate into existing courseware architecture as prime concerns. The first two
requirements were met by considering the commonality and variation across layout
criteria for diagram domains. Commonality was represented by aesthetic measures,
whilst variation was represented through allowing extension as structural measures.
The distinction between aesthetic and structural measures was essential in clarifying
cross-domain requirements to the educator. A design in which layout measures were
applied generally to each domain would result in greater effort on the part of
educators to consider suitability on a domain-specific basis. Inevitably, this would
result in the number of applied measures for each domain being reduced due to the
amount of time required to determine suitability. With the design described by this
work, the educator need only consider the special requirements of the domain in
order to determine the necessary layout tools.
The extension has been successfully integrated into the CourseMarker marking
system. It is implemented as a series of marking tools and interfaces and provides
feedback in a form which can be utilised by the CourseMarker feedback system. This
final requirement allows the extension to be deployed as part of CourseMarker’s
high-performing, platform-neutral architecture, allowing aesthetic-based feedback
comments to be created and delivered to the student.
9.1.2 Assessing solutions with mutually exclusive solution cases
The system of features marking within CourseMarker had been developed as part of
the Generic Marking Mechanism as a general tool to allow marking to be
accomplished across domains with widely varying notions of quality [Ta02]. In the
process of summative assessment to which CourseMarker was applied, students
were often tested using very specific problem specifications which designated
one technique of solving a problem to be used and often forbade all others. This was
especially true in programming exercises where students were expected to
concentrate on a new programming construct, or a new set of programming
constructs, in each week’s assessment.
Within the context of the formative assessment of diagrams this method was
demonstrated to be insufficiently flexible. Chapter 4 outlined the problems
associated with the approach. Initial, simple exercises could be appropriately marked
in many cases, but as the exercise specifications grew more complex the possibility of
multiple model solutions being acceptable meant that features testing was markedly
less comprehensive.
It became clear that, if mutually exclusive solution cases could not be assessed, then
features marking would be reduced either to marking the common subset of features,
or to restricting learning through precise problem specifications. Both of these
possibilities would have a negative effect on student learning in diagram-based
domains and, hence, on the formative assessment process.
The design and implementation of the mechanism to allow the assessment of
solutions with mutually exclusive solution cases has been successful, with the
extension integrated successfully into the existing CourseMarker architecture. The
mechanism is generic, allowing any marking tool which has been defined to be used.
A DiagramFeaturesTool is implemented to allow generic features testing in
common educational diagram domains, but the functionality can be extended to
cover domain-specific features tests by designing and implementing a new marking
tool and invoking it within the marking scheme in place of the
DiagramFeaturesTool. This approach is consistent with the notion, applied
throughout this work, that a basis of existing functionality should be supplemented
by the possibility of expansion by developers to accommodate new domains, which
may raise previously unforeseen requirements, in the future.
Defining the feedback comments which will be delivered to students according to the
evaluation of each individual features expression is a non-trivial task which takes
much time. Despite this, the fact that the nature of formative assessment allows great
potential for exercise re-use means that resource-savings can be made over the
medium and long-term, resulting in a set of consistently marked exercises which can
be incrementally improved or added to over time.
The extension has been successfully integrated into the CourseMarker marking
system. The DiagramFeaturesTool is implemented as a CourseMarker marking
tool with generic functionality while the design takes into account the flexibility of
the exercise marking schemes to repeatedly invoke the marking tool to assess each
mutually exclusive solution case in turn. Feedback is returned in the
MarkingLeafResult and MarkingCompositeResult structure which can be
utilised by the CourseMarker feedback system, thus allowing the extension to be
deployed as an integrated part of CourseMarker’s high-performing, platform-neutral
infrastructure.
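The leaf/composite result structure named above suggests a Composite pattern along the following lines. This is a Python sketch: the class names follow the text's `MarkingLeafResult` and `MarkingCompositeResult`, but their internals are not documented here and are guessed for illustration.

```python
# Illustrative Composite sketch of the result structure named in the
# text; the real Java classes' internals are assumptions.
class MarkingLeafResult:
    """A single mark with its feedback comment."""
    def __init__(self, mark, comment):
        self.mark, self.comment = mark, comment
    def total(self):
        return self.mark

class MarkingCompositeResult:
    """A node aggregating child results, leaf or composite."""
    def __init__(self, children):
        self.children = children
    def total(self):
        return sum(child.total() for child in self.children)

tree = MarkingCompositeResult([
    MarkingLeafResult(3, "Features correct for the chosen solution case."),
    MarkingCompositeResult([MarkingLeafResult(2, "Good symmetry."),
                            MarkingLeafResult(1, "Minor edge crossings.")]),
])
print(tree.total())  # 6
```

Because composites and leaves share the same interface, the feedback system can walk or total any marking tree without knowing which tools produced it, which is what makes the transparent integration described above possible.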
9.1.3 Prioritising and truncating student feedback
CourseMarker’s feedback system was developed for the purposes of providing a
concise, expandable representation of the feedback generated for each stage in the
marking process. For summative assessment purposes, this representation provided
a useful breakdown of the grades awarded for the exercise, maximised for ease of
referral. For the purposes of formative assessment, however, the feedback was
unwieldy, unfocused, un-motivational and contained many feedback comments
which were unhelpful and irrelevant. The extension allows the feedback to be
prioritised according to defined criteria and strategies before being truncated
according to educator preferences.
The design of the mechanism for prioritisation and truncation of feedback divides
the task into four sub-problems: distinguishing between mutually exclusive solution
cases, prioritising features feedback, prioritising aesthetic layout feedback and
truncating the feedback tree. The implementation and integration into the existing
CourseMarker infrastructure have been successful. The potential for domain
coverage is large: any feedback can be prioritised and truncated so long as the raw
feedback tree can be generated using marking tools and appropriate concrete
strategies to solve each of the four sub-problems are defined. A basis for
prioritisation and truncation has been provided through the implementation of
example concrete strategies, while the interfaces and abstract classes defined offer a
precise extension point for developers in future.
Implementing and integrating new truncation strategies in the future, based upon
educator preferences, is a straightforward task. So long as the new concrete strategy
extends the correct abstract class according to the sub-problem it is developed to
solve, only the construction of a simple algorithm to encapsulate the strategy is left to
the developer.
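The extension step just described can be sketched as follows. The class names are hypothetical (only the general Strategy idea comes from the text): a new concrete strategy extends the abstract class for its sub-problem, leaving the developer to supply only the truncation algorithm itself.

```python
from abc import ABC, abstractmethod

# Sketch of adding a new truncation strategy (hypothetical class names):
# the developer extends the abstract class for the relevant sub-problem
# and writes only the simple algorithm that encapsulates the strategy.
class TruncationStrategy(ABC):
    @abstractmethod
    def truncate(self, comments):
        """Return the subset of comments to deliver to the student."""

class KeepTopThree(TruncationStrategy):
    """Concrete strategy: keep only the three highest-priority comments,
    assuming the list is already sorted by priority."""
    def truncate(self, comments):
        return comments[:3]

print(KeepTopThree().truncate(["a", "b", "c", "d"]))  # ['a', 'b', 'c']
```

A different educator preference, say keeping all comments above a weight threshold, would be another small subclass, with no change to the surrounding prioritisation machinery.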
Invocation in the exercise marking scheme represents the point of direct integration
into CourseMarker. The process of invoking the PrioritiseTruncateTool and
parameterising it using the correct concrete strategy objects is logical and
straightforward. The PrioritiseTruncateTool returns the truncated feedback as
a new composite marking tree, which is returned transparently to the student.
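The strategy-based design described above can be sketched in outline. The following Java sketch is illustrative only: apart from PrioritiseTruncateTool, the class and method names are hypothetical stand-ins for the actual CourseMarker interfaces, and the concrete strategy shown is merely one example of the kind of simple algorithm a developer would supply.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** One node in the composite feedback tree built by the marking tools. */
class FeedbackNode {
    final String comment;
    double priority;                         // assigned during prioritisation
    final List<FeedbackNode> children = new ArrayList<>();
    FeedbackNode(String comment) { this.comment = comment; }
}

/** Abstract strategy for one sub-problem; concrete strategies extend this. */
abstract class TruncationStrategy {
    abstract void truncate(FeedbackNode root);
}

/** Example concrete strategy: keep only the n highest-priority comments. */
class KeepTopNStrategy extends TruncationStrategy {
    private final int n;
    KeepTopNStrategy(int n) { this.n = n; }

    @Override
    void truncate(FeedbackNode root) {
        // Rank sibling comments by descending priority, then prune the tail.
        root.children.sort(
            Comparator.comparingDouble((FeedbackNode f) -> f.priority).reversed());
        while (root.children.size() > n) {
            root.children.remove(root.children.size() - 1);
        }
        for (FeedbackNode child : root.children) {
            truncate(child);
        }
    }
}

/** Parameterised with a concrete strategy; returns the truncated tree. */
class PrioritiseTruncateTool {
    private final TruncationStrategy strategy;
    PrioritiseTruncateTool(TruncationStrategy strategy) { this.strategy = strategy; }

    FeedbackNode apply(FeedbackNode rawFeedback) {
        strategy.truncate(rawFeedback);      // prune in place...
        return rawFeedback;                  // ...and return the resulting tree
    }
}
```

A new educator-preferred truncation policy would then require only a further subclass encapsulating its algorithm, mirroring the extension point described above.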
9. Conclusions
9.2 Contributions
The main contributions of this work are in the area of CBA, but advances can also be
demonstrated in the fields of formative assessment and educational diagramming.
Novel experience in these fields has been gained through the design,
implementation, integration into courseware and evaluation of the extensions to
allow aesthetic layout marking, assessment of multiple model solutions through
mutually exclusive solution cases and the prioritisation and truncation of student
feedback. The following sections summarise the key contributions in each of the
three areas of research.
9.2.1 CBA
The most obvious contribution to CBA is in the development and deployment of a
new type of CBA. Formative assessment of a free-response domain such as
educational diagrams has not been attempted prior to this work in a manner that
takes into account factors such as mutually exclusive features correctness, aesthetic
layout and configuration of feedback. Free-response CBA still constitutes a minority
of systems in the field because of its perceived “difficulty”; this work demonstrates
that useful, formative assessment can be deployed within a free-response domain
such as educational diagrams and provides a platform for deployment across many
domains and an example of the incorporation of such features into an existing CBA
architecture. Another contribution to CBA lies in advancing understanding within
the CBA community as to what constitutes good formative assessment; previous
CBA work, including that of Tsintsifas [Ta02], has argued that formative assessment
is merely summative assessment with the marks discarded. This work has developed
the understanding, within a CBA context, that this is not the case.
A deeper contribution to CBA is that the software deliverable can be used as the
basis for further research. Although prototypical exercises have been deployed for
the purposes of assessing the feasibility and usefulness of the concepts, it is clear that
each of the extensions provides clear potential for further work by CBA developers
and researchers into areas such as the automated marking of diagram layout, the
automated marking of coursework with multiple valid model solutions and the
truncation and prioritisation of student feedback. Section 9.3 considers the potential
for future research based upon the research described here.
9.2.2 Formative assessment
The contribution to formative assessment is in the application of CBA to the problem
of formative assessment decline. Chapter 2 highlighted the solutions suggested in the
literature to adapt formative assessment to a changing educational climate
characterised by less favourable staff to student ratios. The literature highlights that
“mechanisation” may be an option but the scope of ambition is fairly modest. CBA
courseware offers potential for considerable resource-savings through a total
automation of the formative processes for assessment and the return of feedback,
provided that the infrastructure is sufficiently flexible and has been adapted to
formative assessment requirements. This work has demonstrated that such an
approach is feasible and useful, provided advice for educators related to the issues
involved and provided useful experience for developers.
Moreover, the integration of the extensions into CourseMarker provides a basis for
impact upon formative assessment automation within higher education institutions.
CourseMarker has been successfully deployed at at least 15 other higher education
institutions and is prominently cited within the literature. The contributions made by
this work therefore have the potential to impact upon future research in formative
assessment with a view to changing the common perception that the assessment
form is necessarily resource-intensive.
9.2.3 Educational diagramming
The contribution to the field of educational diagramming lies in the creation of an
extensible theoretical framework for the assessment of educational diagram
aesthetics, the implementation of the framework and its deployment as part of a CBA
courseware system. Research into educational diagrams has previously taken into
account automated layout algorithms, whereby an algorithm is used to place nodes
and edges in such a way that the result is pleasing to a human eye. This work
provides a basis for the inverse process — that of assessing the aesthetics of a
diagram which has been drawn by a student. Again, the contribution further lies in
the use of the deliverables for the purposes of further research. Research into the
relative importance of aesthetic layout criteria can be facilitated through the system.
Section 9.3 considers the potential for future research based upon the research
described here.
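As an illustration of the kind of measure such a framework assesses, one classic layout aesthetic is the number of pairwise edge crossings in a drawn diagram: fewer crossings generally reads as a tidier layout. The sketch below is a minimal, hypothetical example assuming straight-line edges; it is not the actual measure implementation from this work.

```java
/** A point in the diagram's coordinate space. */
class Point {
    final double x, y;
    Point(double x, double y) { this.x = x; this.y = y; }
}

/** A straight-line edge between two diagram nodes. */
class Edge {
    final Point a, b;
    Edge(Point a, Point b) { this.a = a; this.b = b; }
}

class CrossingMeasure {
    /** Counts how many pairs of edges properly intersect. */
    static int crossings(Edge[] edges) {
        int count = 0;
        for (int i = 0; i < edges.length; i++)
            for (int j = i + 1; j < edges.length; j++)
                if (intersects(edges[i], edges[j])) count++;
        return count;
    }

    /** Two segments properly cross iff each straddles the other's line. */
    private static boolean intersects(Edge e1, Edge e2) {
        return side(e1.a, e1.b, e2.a) * side(e1.a, e1.b, e2.b) < 0
            && side(e2.a, e2.b, e1.a) * side(e2.a, e2.b, e1.b) < 0;
    }

    /** Sign of the cross product: which side of line pq the point r lies on. */
    private static int side(Point p, Point q, Point r) {
        double cross = (q.x - p.x) * (r.y - p.y) - (q.y - p.y) * (r.x - p.x);
        return cross > 0 ? 1 : cross < 0 ? -1 : 0;
    }
}
```

A marking scheme could then map the raw crossing count onto a score, weighted against the other aesthetic and structural measures.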
9.3 Future Work
A central contribution of this work is the development of a solid basis for future
extension by researchers. The work described here can be used to facilitate future
research in numerous ways which span various research areas. This section considers
the potential for future research in the areas of CBA, formative assessment and
educational diagramming.
9.3.1 CBA
As well as being deployed within several educational institutions as a live CBA
system, CourseMarker has been, and continues to be, an active research platform for
academic researchers and students. Interesting student projects would include the
development of courses, including exercises and concrete designs for further
structural measures, diagram marking tools to be incorporated into the mutually
exclusive features case extension, and concrete strategies.
An interesting piece of future research would be to investigate the extent to which
the extension described here for mutually exclusive solution cases could be usefully
deployed in other domains, such as programming exercises. It has been noted that,
currently, programming exercise specifications are used to “shepherd” students into
making use of an exact programming construct, which is subsequently the subject of
features testing. Research could be conducted to determine the extent to which
mutually exclusive solution cases could be used to “relax” specifications in existing
programming courses with a view to providing increased flexibility to students in
problem-solving and a less rigid problem specification.
Further research could be conducted which uses the aesthetic layout marking tools as
a starting point. To what extent could the layout marking mechanism be adapted to a
summative assessment scenario in which the raison d’être is the exact summative
mark returned by the system, rather than the feedback? This requirement would
place greater emphasis on the need for accurate grading, both between the various
aesthetic and structural measures and between the layout measures and other grade-contributing factors from other marking tools. A special emphasis would need to be
placed upon finding a mechanism to adequately aggregate marks to produce a fair
overall summative result.
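As a minimal illustration of the aggregation problem described above, a weighted mean over per-measure scores is one obvious candidate mechanism. The names below are hypothetical and do not reproduce CourseMarker's actual marking-scheme weighting.

```java
import java.util.Map;

class WeightedAggregator {
    /**
     * Returns the weighted mean of per-measure scores (each in 0..100),
     * so that an educator can tune the relative influence of, say,
     * aesthetic layout against structural correctness.
     */
    static double aggregate(Map<String, Double> scores, Map<String, Double> weights) {
        double total = 0.0, weightSum = 0.0;
        for (Map.Entry<String, Double> e : scores.entrySet()) {
            double w = weights.getOrDefault(e.getKey(), 0.0);
            total += w * e.getValue();
            weightSum += w;
        }
        // Degenerate case: no weighted measures contribute.
        return weightSum == 0.0 ? 0.0 : total / weightSum;
    }
}
```

Fairness of the overall summative result would then hinge on how the weights are chosen and validated, which is precisely the open research question noted above.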
Further research could also aim to further augment the usefulness of the feedback.
Currently, the feedback is modified based upon only the current submission. Further
research could aim to take previous submissions’ feedback into account, developing
for the student a “satisfying” sense that their individual progress is being aided. The
idea of using AI agents to monitor student progress and tailor feedback through
gathering data from both submissions and administrative data has been proposed
previously [Ta02], but has yet to be realised.
Furthermore, there is a need to address the interoperability of CBA exercises. At the
time of writing, research is already underway within the LTR research group into
defining free-response CBA questions in an interoperable way. It would be useful if
formative assessment exercises within diagram-based domains were to be
approached using such a methodology so that, when further courseware
incorporates functionality similar to that described by this work, exercises could be
created and used on a cross-platform basis.
9.3.2 Formative assessment
Several future research projects within the area of formative assessment could use
this work as a departure point. This work examines the formative assessment of
diagram-based domains using CBA courseware. Diagram-based domains were chosen
because of their free-response nature, which allows higher-order cognitive levels to
be more easily assessed, and their wide potential for cross-disciplinary application.
This research, however, raises questions as to whether CBA automation could be
applied to formative assessment in other domains such as programming using the
approach described by this work.
Further research could seek to investigate whether different methodologies in
feedback construction for the formative exercises resulted in better assistance for the
learning process of the student. This research provided brief guidance to
educators which touched upon the subject of feedback construction. However, this
area could be the subject of considerable further research. As well as the direct
methods of constructing and phrasing feedback comments, projects could also
investigate strategies for providing extensive teaching material content through the
courseware and providing direct links to this content from the feedback area.
9.3.3 Educational diagrams
The extensions described by this work, and their integration into courseware,
facilitate research into several areas of educational diagramming.
This research provides a platform to determine whether theoretical criteria for the
assessment of educational diagram aesthetics will accurately provide an indicator of
the appearance of a student diagram. Research can now be conducted to determine
the perception of diagram layout among domain novices rather than experts, as is
usually the case. Methods could also be investigated to optimise the priority
accorded to the various layout measures through interactive modification of the
weight values assigned to the measures in the marking scheme.
Furthermore, in the general area of diagramming, this research offers the potential to
carry out research which reverses the traditional methodologies of those such as
Purchase et al [PAC02]. Perception of aesthetic layout criteria could be tested by
asking volunteers to interactively construct diagrams with priority placed upon a
defined aesthetic layout criterion. The drawing could then be assessed objectively by
the marking system to see the actual level, as opposed to the level perceived by the
volunteer, that the drawing had achieved according to the criteria. Similarly,
interactions between criteria when combined could be investigated. This reverses the
traditional methodology in which diagrams are prepared as part of the research
materials, with volunteers assessing the layout of the pre-drawn diagrams.
9.4 Epilogue
This research investigated the feasibility and usefulness of the idea of automating
formative assessment coursework using CBA courseware, in free-response diagram-based domains. Work on free-response, diagram-based CBA is sparse. It is for this
reason that formative assessment using CBA in such domains has not been reported
prior to this work.
An initial research phase involved the construction of CBA exercises using existing
courseware with only minor, obvious, modification. The initial phase of the research
pointed to the fact that CBA courseware certainly had the potential to deliver
formative assessment courses in free-response domains, but that the current
techniques and methodology associated with the assessment required augmentation.
Nevertheless, the initial research did indicate that several key concepts, such as the
use of two-part assessment as a motivator, held true in real world courses.
The design of extensions to available CBA courseware capability and the integration
of the extensions into an existing CBA courseware architecture subsequently formed
the core of this work. The deliverables of the work have been used to demonstrate
that the assessment could be achieved feasibly and could be useful for students, in a
formative assessment context, by aiding their learning process.
The first extension, to allow the aesthetic layout of student diagram solutions to be
assessed, resulted from the principle that formative assessment, in being responsible
for the learning process, must take great care to teach good practice. The second
extension, to allow assessment to occur in situations where multiple model solutions
are plausible, took into account the idea that formative assessment must attempt to
allow scope for learning and student variation. The third extension, to prioritise and
truncate student feedback, resulted from the notion that large lists of marks and
feedback, incorporating many irrelevant feedback items, may result in student
‘overload’ and a subsequent failure to apply the feedback to further learning. In all
cases, the extensions were designed to be extensible to adapt to subsequent
developments, and were successfully integrated into the CourseMarker courseware
architecture.
Two central questions formed the inspiration for this work. Firstly, to what extent
can CBA techniques be used to reduce the resource required in setting a formatively
assessed coursework in a diagram-based domain, marking student submissions and
returning feedback, while still adhering to good formative assessment principles?
Secondly, to what extent would current, successful CBA practices need to be changed
to conform to formal formative assessment guidelines?
This research has provided answers to both of these questions. CBA techniques can
be used extensively to reduce the resource-intensiveness of formative assessment
while conforming to good formative assessment principles, but the processes of
determining requirements for new domains and authoring exercises can be lengthy
and involved. The resource saving of CBA exercises is derived from the fact that,
once designed and deployed, they can be used repeatedly over many academic
years. Indeed, assessment materials can be gathered over the course of several
academic years, with the amount of formative assessment conducted through
traditional methods gradually reduced in parallel as both educator and student
confidence with the new technology increases. Furthermore, the marking of
submissions and returning of feedback opens up new avenues of formative
assessment, such as the potential for repeated re-submission, which would never
have been plausible using traditional marking methods.
Conversely, CBA techniques can benefit greatly through being subject to the scrutiny
of assessment guidelines like those available for formative assessment. Technology-led solutions such as CBA courseware are often motivated by the desire to automate
those types of assessment, and feedback, which are the easiest to achieve in practice.
Studying the CourseMarker CBA system through “the lens” of formative assessment
criteria has opened up new lines of research enquiry and resulted in considerable
functionality being added to the system of assessment and feedback. It is only when
rigorous scrutiny is applied across the board that CBA techniques will attain a true
level of acceptance within higher education and finally convince their many
detractors of their merit.
Bibliography
All URLs were last accessed on 31 March 2007.
AK01
Anderson L.W. and Krathwohl D.R. (eds.), A taxonomy for learning,
teaching, and assessing: A revision of Bloom's taxonomy of educational
objectives, Longman, 2001, ISBN 0321084055.
AMW06
Axelsson K., Melin U. and Wedland T., Student Activity in Seminars —
Designing Multi-functional Assessment Events, Proceedings of the 11th
Annual SIGCSE Conference on Innovation and Technology in Computer
Science Education (ITiCSE 06), Bologna, Italy, p93-97, June 26-28, 2006.
ANSI05
American National Standards Institute, Overview of the U.S.
Standardization System: Understanding the U.S. Voluntary Consensus
Standardization and Conformity Assessment Infrastructure, ANSI, July,
2005. Available from: http://www.ansi.org/
App89
Apple Computer Inc., MacAppII Programmer’s Guide, Apple Computer
Inc., 1989.
APR06
Amelung M., Piotrowski M. and Rösner D., EduComponents: Experiences in
E-Assessment in Computer Science Education, Proceedings of the 11th
Annual SIGCSE Conference on Innovation and Technology in Computer
Science Education (ITiCSE 06), Bologna, Italy, p88-92, June 26-28, 2006.
As04
Ambler S.W., General Diagramming Guidelines, The Official Agile Modeling
(AM) Site, 2004. Available from: http://www.agilemodeling.com/style/general.htm
ASM02
Almond R.G., Steinberg L.S. and Mislevy R.J., Enhancing the Design and
Delivery of Assessment Systems: A Four Process Architecture, Journal of
Technology, Learning and Assessment 1(5), 2002. Available from
http://www.jtla.org/
Asy94
Asymetrix Inc, Toolbook 3.0 User's Manual, Asymetrix Incorporated, 1994.
AUT05
AUT Research, Packing them in: The student-to-staff ratio in UK Higher
Education, Association of University Teachers Research, October 2005.
Available from: http://www.aut.org.uk/media/pdf/c/j/ssr_packingthemin.pdf
Ba79
Borning A., Thinglab — A Constraint-Oriented Simulation Laboratory,
Technical Report STAN-CS-79-746, Stanford University, USA, March, 1979.
Bb03
Bligh B., CourseMarker and DATsys: next generation automated assessment
systems, talk delivered to Use of CAA in ICS Education, LTSN conference,
University of Brighton, UK, July 16, 2003.
BBF+93
Benford S., Burke E., Foxley E., Gutteridge N. and Zin A.M., Experiences
with the Ceilidh System, Proceedings of the 1st International Conference on
Computer Based Learning in Science (CBLIS’93), Vienna, Austria, 1993.
BBF+95
Benford S., Burke E., Foxley E. and Higgins C., The Ceilidh System for the
Automatic Grading of Students on Programming Courses, ACM Press,
Proceedings of the 33rd Annual ACM Southeast Conference, Clemson, South
Carolina, March 1995.
BBF96
Benford S., Burke E. and Foxley E., Developer’s Guide to Ceilidh, LTR
Report, Computer Science Department, The University of Nottingham, UK,
1996.
BC82
Biggs J.B. and Collis K.F., Evaluating the Quality of Learning: the SOLO
Taxonomy, Academic Press, 1982.
BD04
Bull J. and Danson M., Computer-assisted Assessment (CAA), LTSN Generic
Centre: Assessment Series No. 14, January 2004.
BE98
Blackwell A. and Engelhardt Y., A taxonomy of diagram taxonomies,
Proceedings of Thinking With Diagrams 98: Is there a Science of Diagrams?,
p60-70, 1998. Available from: http://www.cl.cam.ac.uk/~afb21/publications/TwD98.html
BEF+56
Bloom B.S., Englehart M.D., Furst E.J., Hill W.H. and Krathwohl, D.R.,
Taxonomy of educational objectives: the classification of educational goals,
Handbook 1: Cognitive Domain, Longmans, Green, Co., 1956.
BEH+05
Burrow M., Evdorides H., Hallam B. and Freer-Hewish R., Developing
formative assessments for postgraduate students in engineering, European
Journal of Engineering Education 30(2), p255-263, May 2005.
BER+99
Baklavas G., Economides A.A. and Roumeliotis M., Evaluation and
Comparison of Web-based testing tools, Proceedings of the World
Conference on the WWW and Internet (WebNet-99), Association for the
Advancement of Computing in Education, Honolulu HI, USA, October 25-28, 1999.
BET+94
Di Battista G., Eades P., Tamassia R. and Tollis I., Annotated bibliography on
graph drawing algorithms, Computational Geometry: Theory and
Applications, Vol 4, p235-282, 1994.
BFK+04
Belton M., Fair K., Kleeman J., Phaup J. and Shepherd E., Perception to Go:
Empowering Disconnected Delivery of Assessments, Questionmark White
Paper, 2004. Available from: http://www.questionmark.com/
Bg01
Brown G., Assessment: A Guide for Lecturers, LTSN Generic Centre:
Assessment Series, November 2001.
Bg93
Booch G., Object-oriented Analysis and Design with Applications, 2nd
Edition, Benjamin Cummings, 1993. ISBN 0-8053-5340-2.
BG97
Beck K. and Gamma E., Advanced Design with Patterns in Java, Object-Oriented Programming Systems, Languages and Applications (OOPSLA’97),
Tutorial 30.
BGK+00
Bridgeman S., Goodrich M.T., Kobourov S.G. and Tamassia R., PILOT: An
Interactive Tool for Learning and Grading, Proceedings of SIGCSE 2000,
Austin, TX, USA, p139-143, March 8-12, 2000.
BGK+96
Broy M., Grosu R., Klein C. and Rumpe B., State Transition Diagrams,
Technical Report TUM-I-9630, Technical University of Munich, 1996.
BGL+97
Di Battista G., Garg A., Liotta G, Tamassia R., Tassinari E. and Vargiu F., An
experimental comparison of four graph drawing algorithms, Computational
Geometry 7(5-6), p303-325, April, 1997.
Bj93
Bull J., Using Technology to Assess Student Learning, TLTP Project Alter,
December 1993, ISBN 1 85889 091 8.
BL06
Brusilovsky P. and Loboda T.D., WADEIn II: A Case for Adaptive
Explanatory Visualization, Proceedings of the 11th Annual SIGCSE
Conference on Innovation and Technology in Computer Science Education
(ITiCSE 06), Bologna, Italy, p48-52, June 26-28, 2006.
Br02
Bennett R.E., Inexorable and inevitable: The continuing story of technology
and assessment, Journal of Technology, Learning and Assessment 1(1), 2002.
Available from http://www.jtla.org/
BRS96
Brown S., Race P. and Smith B., 500 Tips on Assessment, Kogan Page, 1996,
ISBN 0749419415.
BSP+03
Belton M., Shephard E., Phaup J., Fair K. and Kleeman J., Questionmark’s
Holistic Approach: Assessments systems for the enterprise, Questionmark
White Paper, 2003. Available from: http://www.questionmark.com/
Bt00
Buchanan T., The efficacy of a World-Wide Web mediated formative
assessment, Journal of Computer Assisted Learning 16(3), p193-200, 2000.
BW98
Black P. and Wiliam D., Assessment and classroom learning, Assessment in
Education 5(1), p7-74, 1998.
CB98
Culverhouse P.F. and Burton C.J., Mastertutor: a tutorial shell for the
support of problem solving skill acquisition, Bringing Information
Technology to Education (BITE): Integrating Information & Communication
Technology in Higher Education, Maastricht, p433-443, March 25-27, 1998.
Available from: http://www.cis.plym.ac.uk/cis/publications/BITE1998.pdf
CDE+03
Carter J., Dick M., English J., Ala-Mukta K., Fone W., Fuller U., Sheard J.,
How Shall We Assess This?, Proceedings of the 8th Annual Joint Conference
Integrating Technology into Computer Science Education, Thessaloniki,
Greece, p107-123, June 30 to July 2, 2003. ISSN 0097-8418.
CE98a
Charman D. and Elmes A., Computer Based Assessment (Volume 1): A
guide to good practice, SEED Publications, University of Plymouth, 1998,
ISBN 1-84102-024-9.
CE98b
Charman D. and Elmes A., Computer Based Assessment (Volume 2): Case
studies in Science and Computing, SEED Publications, University of
Plymouth, 1998, ISBN 1-84102-02-7.
CE98c
Charman D. and Elmes A., A Computer-based Formative Assessment
Strategy for a Basic Statistics Module in Geography, Journal of Geography in
Higher Education 22(3), p381-385, November, 1998.
Cf98
Culwin F., Web hosted assessment: possibilities and policy, Proceedings of
the 6th Annual Conference on the Teaching of Computing/3rd Annual ITiCSE
Conference on Changing the Delivery of Computer Science Education, p55-58, 1998.
CM03
Chok S.S. and Marriott K., Automatic generation of intelligent diagram
editors, ACM Transactions on Computer-Human Interaction (TOCHI) 10(3),
p244-276, September, 2003.
CO97
Chung G.K.W.K. and O’Neil H.F. Jnr, Methodological approaches to online
scoring of essays, Document Reproduction Service, Educational Resources
Information Center (ERIC), U.S. Department of Education Office of
Educational Research and Improvement, ED-418-101, 1997.
CS96
Coleman M. and Stott Parker D., Aesthetics-based graph layout for human
consumption, Software — Practice and Experience 26(12), p1415-1438,
December, 1996. Available from: http://www.cs.ucla.edu/~stott/aglo/
CS98
Canup M. and Shackelford R., Using software to solve problems in large
computing courses, Proceedings of the 29th SIGCSE, Technical Symposium
on Computer Science Education, Atlanta GA, USA, February 26 to March 1,
1998, p135-139.
Cyg98
Cygwin.com, Cygwin User’s Guide. Available from: http://cygwin.com/cygwin-ug-net/
Dc99
Daly C., RoboProf and an Introductory Computer Programming Course,
Proceedings of the 4th Annual SIGCSE / SIGCUE on Innovation and
Technology in Computer Science Education, Krakow, Poland, p155-158, June
27-30, 1999.
Dd99
Dodson D., Diagrammatic Interaction, Tutorial, Computer Science
Department, City University, London, 12 February, 1999.
Dfa04
Defense Finance and Accounting Service, Diagramming Guidelines,
DFAS/DCII Development Standards and Guidelines, 2004.
Dfes05
Department for Education and Skills, Trends in Education and Skills,
Learning & Skills Gateway, DfES website. Updated frequently, accessed on 3
January 2006. Available from: http://www.dfes.gov.uk/trends/
DK01
Duke-Williams E. and King T., Testing Higher Learning Outcomes with
CBA, Handbook of ILTAC 2001, The Institute for Learning and Teaching in
Higher Education Conference: Professionalism in Practice, University of
York, Session 42, 4-6 July, 2001. Available from:
DLO+05
Douce C., Livingstone D., Orwell J., Grindle S. and Cobb J., A Technical
Perspective on ASAP — Automated System for Assessment of
Programming, Proceedings of the 9th International Conference on Computer
Aided Assessment, Loughborough, July, 2005. Available from:
http://dircweb.king.ac.uk/Ris/Queries/Pages/home_page.asp?authorID=4
Dp03
Denton P., Returning Feedback to Students via Email Using Electronic
Feedback 9, Learning and Teaching in Action 2(1), Manchester Metropolitan
University: Learning and Teaching Unit, February 2003, ISSN 1477-1241.
Available from http://www.ltu.mmu.ac.uk/ltia/
eb11
Encyclopaedia Britannica 11th Edition, Cambridge University Press, 1911.
EG03
Eichelberger H. and von Gudenberg J.W., UML Class Diagrams — State of
the Art in Layout Techniques, Proceedings of the 2nd Annual “Designfest” on
Visualizing Software for Understanding and Analysis (VISSOFT 2003),
Amsterdam, Netherlands, September 22, 2003. Available from:
http://www.cs.uvic.ca/~mstorey/vissoft2003/
Ej02
English J., Experience with a computer-assisted formal programming
examination, ACM SIGCSE Bulletin, Proceedings of the 7th Annual
Conference on Innovation and Technology in Computer Science Education,
p51-54, Aarhus, Denmark, June 24-26, 2002.
Ej04
English J., Automated Assessment of GUI Programs using JEWL,
Proceedings of the 9th Annual SIGCSE Conference on Innovation and
Technology in Computer Science Education (ITiCSE 04), Leeds, UK, p137-141, June 25-27, 2004.
En01
Eaton N., Microsoft Visio Version 2002 Inside Out, Microsoft Press, June,
2001. ISBN 0735612854.
FHG96
Foxley E., Higgins C. and Gibbon C., The Ceilidh System: A general
overview, LTR Report, Department of Computer Science, University of
Nottingham, UK, 1996.
FHH+01
Foxley E., Higgins C., Hegazy T., Symeonidis P. and Tsintsifas A., The
CourseMaster CBA System: Improvements over Ceilidh, Proceedings of the
5th Annual Computer Assisted Assessment Conference, Loughborough, UK,
2-4 July 2001, p189-201, ISBN 0-9539572-0-9.
FHS+01
Foxley E., Higgins C., Symeonidis P. and Tsintsifas A., The CourseMaster
Automated Assessment System — a next generation Ceilidh, Computer
Assisted Assessment Workshop, Warwick, UK, 5-6 April 2001.
FHT+99
Foxley E., Higgins C., Tsintsifas A. and Symeonidis P., Ceilidh: A System for
the Automatic Evaluation of Student Programming Work, Proceedings of
the 4th International Conference on Computer Based Learning in Science
(CBLIS’99), University of Twente, Netherlands, July 2-6 1999.
Fj01
Foster J., Improved coursework assessment for undergraduate and
postgraduate courses, Teaching and Learning Innovation Fund: Final
Report, Department of Electronic Systems Engineering, University of Essex,
UK, April 2001. Available from: http://www.essex.ac.uk/innovations/Foster.htm
FL94
Foxley E. and Lou B., A Simple Text Automatic Marking System, Artificial
Intelligence and Simulation of Behaviour 94 Conference for Computational
Linguistics for Speech and Handwriting Recognition Workshop, Leeds, UK,
April 12, 1994.
FLM98
Frosini G., Lazzerini B., Marcelloni F., Performing automatic exams,
Computers & Education 31, 1998.
FSZ97
Foxley E., Salman O. and Shukur Z., The automatic assessment of Z
specifications, Working group reports and supplemental proceedings,
Uppsala, Sweden, p129-131 June 1-5, 1997.
FW65
Forsythe G. and Wirth N., Automatic grading programs, Communications of
ACM 8(5), May 1965, p275-278.
FWW00
Ferguson R., Hunter A. and Hardy C., MetaBuilder: The diagrammer’s
diagrammer, First International Conference on Theory and Applications of
Diagrams, Lecture Notes in Artificial Intelligence, Springer Verlag,
Edinburgh, p407-421, September, 2000.
Gc97
Gibbon C.A., Heuristics for Object-Oriented Design, Ph.D. thesis, University
of Nottingham, October 1997.
GH06
Gray G.R. and Higgins C.A., An Introspective Approach to Marking
Graphical User Interfaces, Proceedings of the 11th Annual SIGCSE
Conference on Innovation and Technology in Computer Science Education
(ITiCSE 06), Bologna, Italy, p43-47, June 26-28, 2006.
GHJ+94
Gamma E., Helm R., Johnson R. and Vlissides J., Design Patterns: Elements
of Reusable Object-Oriented Software, Addison-Wesley, 1994.
GJS97
Gosling J., Joy B. and Steele G., The Java Language Specification, Addison
Wesley, 1997.
Gm00
Greenhow M., Setting objective tests in mathematics with QM Designer,
MSOR Connections 0(1), p21-26, Learning Technology Support Network,
February, 2000.
GMW88
Gamma E., Marty R. and Weinand A., ET++ — an object oriented application
framework for C++, Proceedings of Object-Oriented Programming Systems,
Languages and Applications (OOPSLA’88), San Diego CA, USA, p46-57,
September 25-30, 1988.
GS95
Gaines B. and Shaw M., Concept maps as hypermedia components,
Knowledge Science Institute, University of Calgary, 1995.
GV47
Goldstine H. and von Neumann J., Planning and Coding Problems for an
Electronic Computing Instrument, Volume 1, Van Nostrand, 1947.
GW01
Lee G. and Weerakoon P., The role of computer aided assessment in health
professional education: A comparison of student performance in computer
based and paper and pen multiple choice tests, The Vice-Chancellor's
Showcase of Scholarly Inquiry in Teaching and Learning, Institute for
Teaching and Learning, The University of Sydney, Australia, 2001. Available
from: http://www.itl.usyd.edu.au/itl/Showcase2001/
HB06
Higgins C.A. and Bligh B., Formative Computer Based Assessment in
Diagram Based Domains, Proceedings of the 11th Annual SIGCSE
Conference on Innovation and Technology in Computer Science Education
(ITiCSE 06), Bologna, Italy, p98-102, June 26-28, 2006.
Hd88
Harel D., On visual formalisms, Communications of the ACM 31(5), p514-530, May, 1988.
HGS+06
Higgins C.A., Gray G., Symeonidis P. and Tsintsifas A., Automated
Assessment and Experiences of Teaching Programming, ACM Journal on
Educational Resources in Computing (JERIC) 5, Special Issue on Automated
Assessment of Programming Assignments, ISSN 1531-4278. To appear.
Hi88
Hirmanpour I., A student system development diagrammer, Proceedings of
the 19th SIGCSE Technical Symposium on Computer Science Education,
Atlanta GA, USA, p104-108, 25-26 February, 1988.
Hj59
Hollingsworth J., An educational program in computing, Communications
of the ACM 2(8), August 1959, p6.
Hj60
Hollingsworth J., Automatic graders for programming classes,
Communications of ACM 3(10), October 1960, p528-529.
HL98
Hoggarth G. and Lockyer M., An automated student diagram assessment
system, Proceedings of the 6th Annual Conference on the Teaching of
Computing / 3rd Annual Conference on Integrating Technology into
Computer Science Education: Changing the Delivery of Computer Science
Education, Dublin, Ireland, 18-21 August 1998, p122-124, ISSN 0097-8418.
HM93
Hyvönen J. and Malmi L., TRAKLA — A system for teaching algorithms
using email and a graphical editor, Proceedings of HYPERMEDIA in
Vaasa’93, Vaasa, Finland, p141-147, 1993.
HRT+98
Hall M.J., Robinson D.J., Tucknott G. and Carlton T., A multimedia tutorial
shell with qualitative assessment in biology, in Charman D. and Elmes A.
(eds), Computer Based Assessment (Volume 2): Case studies in Science and
Computing, p33-38, SEED Publications, University of Plymouth, 1998, ISBN
1-84102-02-7.
Hs90
Hekmatpour S., Templa and Graphica: A Generic Graphical Editor for the
Macintosh, Prentice Hall, 1990.
HST02
Higgins C., Symeonidis P. and Tsintsifas A., The Marking System for
CourseMaster, Proceedings of the 7th Annual Conference on Innovation and
Technology in Computer Science, University of Aarhus, Denmark, June 24-26, 2002, p46-50. ISBN 1-58113-499-1.
Ht98
Hawkes T., An Experiment in Computer-Assisted Assessment, Interactions
2(3), Educational Technology Service, University of Warwick, 1998.
Available from: http://www.warwick.ac.uk/ETS/interactions/vol2no3/index.htm
HW69
Hext J. and Winings J., An automatic grading scheme for simple
programming exercises, Communications of the ACM 12(5), May 1969, p272-275.
HW96
Haarslev V. and Wessel M., GenEd: an editor with generic semantics for
formal reasoning about visual notations, IEEE Symposium on Visual
Languages 1996, Boulder CO, USA, September 3-6, 1996. Available from:
http://citeseer.ist.psu.edu/haarslev96gened.html
Ib84
Imrie B.W., In search of academic excellence: samples of experience,
Proceedings of the Tenth International Conference on Improving University
Experience, University of Maryland, University College, p160-183, 1984.
Ib95
Imrie B.W., Assessment for Learning: quality and taxonomies, Assessment &
Evaluation in Higher Education, 20(2), p175-189, 1995.
ISO05
International Standards Organisation, ISO in brief: International Standards
for a sustainable world, ISO, March, 2005. ISBN: 92-67-10401-2. Available
from: http://www.iso.org/
Iy01
Inoue Y., Questionnaire Surveys: Four Survey Instruments in Educational
Research, Educational Resources Information Center (ERIC), U.S.
Department of Education Office of Educational Research and Improvement,
2001. Available from: http://eric.ed.gov/
JA00
Johnstone A.H. and Ambusaidi A., Fixed Response: What are we testing?,
Chemistry Education: Research and Practice in Europe 1(3), p323-328, 2000.
JBR98
Jacobson I., Booch G. and Rumbaugh J., The Unified Software Development
Process, Addison Wesley Longman, 1998. ISBN 0-201-57169-2.
Jd00
Jackson D., A semi-automated approach to online assessment, Proceedings
of the 5th Annual SIGCSE / SIGCUE Conference on Innovation and
Technology in Computer Science Education, Helsinki, Finland, p164-167, 11-13 July, 2000.
JG04
Joy M. and Griffiths N., Online Submission of Coursework — a
Technological Perspective, Proceedings of the 4th IEEE International
Conference on Advanced Learning Technologies (ICALT2004), Joensuu,
Finland, 2004, p430-434. Available from:
http://www.dcs.warwick.ac.uk/research/edtech/
JL98
Joy M. and Luck M., Effective electronic marking for on-line assessment,
Proceedings of the 6th Annual Conference on the Teaching of Computing /
3rd Annual Conference on Integrating Technology into Computer Science
Education: Changing the Delivery of Computer Science Education, Dublin,
Ireland, 18-21 August 1998, p134-138, ISSN 0097-8418.
JMM+04
Juwah C., Macfarlane-Dick D., Matthew B., Nicol D., Ross D. and Smith B.,
Enhancing student learning through effective formative feedback, The
Higher Education Academy (Generic Centre), June, 2004. ISBN 1-904190-58-8.
JU97
Jackson D. and Usher M., Grading student programs using ASSYST,
Proceedings of the 28th SIGCSE technical symposium on Computer Science
Education, San Jose CA, USA, p335-339, February 27 to March 1, 1997.
KM00
Korhonen A. and Malmi L., Algorithm simulation with automatic
assessment, Proceedings of the 5th Annual SIGCSE / SIGCUE Conference on
Innovation and Technology in Computer Science Education, Helsinki,
Finland, p160-163, 11-13 July, 2000.
Kp01
Knight P., A Briefing on Key Concepts: Formative and summative, criterion
and norm-referenced assessment, LTSN Generic Centre: Assessment Series,
November 2001.
LBW+94
Lohse G., Biolsi K., Walker N. and Rueter H., A classification of visual
representations, Communications of the ACM 37(12), p36-49, 1994.
LD97
Landauer T.K. and Dumais S.T., A solution to Plato’s problem: The Latent
Semantic Analysis theory of the acquisition, induction, and representation of
knowledge, Psychological Review 104, p211-240, 1997.
LHL98
Landauer T.K., Holtz P.W. and Laham D., Introduction to Latent Semantic
Analysis, Discourse Processes 25, p259-284, 1998.
Lr32
Likert R., A Technique for the Measurement of Attitudes, Archives of
Psychology 140, p55, 1932.
Mac95
Macromedia Inc, Macromedia Authorware 7.0. Available from:
http://www.macromedia.com/software/authorware/
MB99
McKenna C. and Bull J., Designing effective objective test questions: an
introductory workshop, CAA Centre, June 17, 1999. Available from:
http://www.caacentre.ac.uk/dldocs/otghdout.pdf
Md99
Mackenzie D., Recent developments in the Tripartite Interactive Assessment
Delivery System (TRIADS), Proceedings of the 3rd Annual Computer
Assisted Assessment Conference, Loughborough, UK, June 16-17, 1999.
Mf04
McMartin F., MERLOT: A Model for User Involvement in Digital Library
Design and Implementation, Journal of Digital Information 5(3), September
2004.
Available from: http://jodi.ecs.soton.ac.uk/Articles/v05/i03/McMartin/
MGH98
Mansouri F.Z., Gibbon C.A. and Higgins C.A., PRAM: PRolog Automatic
Marker, Proceedings of the 6th Annual Conference on the Teaching of
Computing / 3rd Annual Conference on Integrating Technology Into
Computer Science Education: Changing the delivery of computer science
education (ITiCSE 98), Dublin City University, Ireland, p166-170, 1998.
MK04
Malmi L. and Korhonen A., Automatic Feedback and Resubmissions as
Learning Aid, Proceedings of the IEEE International Conference on
Advanced Learning Technologies (ICALT 04), Joensuu, Finland, p186-190,
August 30 to September 1, 2004.
Ml97
Markham L., Staff-Student Ratios in Commonwealth Countries,
Commonwealth Higher Education Management Service (CHEMS) Paper 16,
Commonwealth Higher Education Support Scheme, 1997. Available from
http://www.acu.ac.uk/chems/
MLC03
Murphy A.J., Lockie R.G. and Coutts A. J., Kinematic determinants of early
acceleration in field sport athletes, Journal of Sports Science and Medicine
2(4), p144-150, 2003.
Mm02
McAlpine M., Principles of Assessment, Bluepaper Number 1, CAA Centre,
University of Luton, February, 2002. ISBN 1-904020-01-1. Available from:
http://www.caacentre.ac.uk/dldocs/Bluepaper1.pdf
Mp95
Martin P., EQL International's Interactive Assessor for Windows reviewed,
Monitor 6, CTI Computing, University of Ulster, 1995.
Mr86
Myers R., Computerized Grading of Freshman Chemistry Laboratory
Experiments, Journal of Chemical Education 63, 1986, p507-509.
Mu94
von Matt U., Kassandra: The automatic grading system, Technical Report
UMIACS-TR-94-59, Institute for Advanced Computer Studies, Department
of Computer Science, University of Maryland, USA, 1994.
MW98
Mason D. and Woit D., Integrating technology into computer science
examinations, Proceedings of the 29th SIGCSE Technical Symposium on
Computer Science Education, Atlanta GA, USA, February 26 to March 1,
1998, p140-144.
MW99
Mason D. and Woit D., Providing mark-up and feedback to students with
online marking, Proceedings of the 30th SIGCSE Technical Symposium on
Computer Science Education, New Orleans LA, USA, March 24-28, 1999, p140-144.
NB01
Ngo D.C.L. and Byrne J.G., Another Look at a Model for Evaluating Interface
Aesthetics, International Journal of Applied Mathematical Computer Science
11(2), p515-535, 2001.
ND02
Neven F. and Duval E., Reusable Learning Objects: a survey of LOM-based
repositories, Proceedings of the 10th ACM international conference on
Multimedia, Juan-les-Pins, France, p291-294, December 1-6, 2002.
NTB00
Ngo D.C.L., Teo L.S. and Byrne J.G., A Mathematical Theory of Interface
Aesthetics, Visual Mathematics 2(4), Serbian Academy of Sciences and Arts:
Mathematical Institute, 2000. Available from:
http://www.mi.sanu.ac.yu/vismath/
Or98
Oliver R., Experiences of assessing programming assignments by computer,
in Charman D. and Elmes A. (eds), Computer Based Assessment (Volume 2):
Case studies in Science and Computing, p47-49, SEED Publications,
University of Plymouth, 1998, ISBN 1-84102-02-7.
PAC02
Purchase H.C., Allder J. and Carrington D., Graph Layout Aesthetics in
UML Diagrams: User Preferences, Journal of Graph Algorithms and
Applications 6(3), p255-279, World Scientific Publishing, 2002. ISSN 1526-1719. Available from: http://www.cs.brown.edu/publications/jgaa/
PB98
Paul C.R.C. and Boyle A.P., Computer-based assessment in palaeontology,
in Charman D. and Elmes A. (eds), Computer Based Assessment (Volume 2):
Case studies in Science and Computing, p51-56, SEED Publications,
University of Plymouth, 1998, ISBN 1-84102-02-7.
Pc65
Petri C., Kommunikation mit Automaten, Ph.D. thesis, Translation by
Greene C.F., Supplement to Technical Report RADC-TR-65-337, Volume 1,
Rome Labs, Griffiss Air Force Base, New York, USA, 1965.
Pc99
Power C., Designer — a logic diagram design tool, Proceedings of the 4th
Annual SIGCSE / SIGCUE on Innovation and Technology in Computer
Science Education, Krakow, Poland, p211, June 27-30, 1999.
Pe94
Page E.B., Computer grading of student prose: Using modern concepts and
software, Journal of Experimental Education 62(2), p127-142, 1994.
Pm95
Petre M., Why looking isn’t always seeing: readership skills and graphical
programming, Communications of the ACM 38(6), p33-44, June, 1995.
PPK97
Page E.B., Poggio J.P. and Keith T.Z., Computer analysis of student essays:
Finding trait differences in the student profile, Proceedings of the AERA /
NCME Symposium on Grading Essays by Computer, 1997.
PT00
Papakostas A. and Tollis I., Efficient orthogonal drawings of high degree
graphs, Algorithmica 26(1), p100-125, Springer-Verlag, January, 2000.
Qt99
Quatrani T., Visual Modeling with Rational Rose 2000 and UML, Addison-Wesley Professional, 1999. ISBN 0-201-69961-3.
Rc01
Rust C., A Briefing on Assessment of Large Groups, LTSN Generic Centre:
Assessment Series, November 2001.
Rd87
Rowntree D., Assessing Students: How shall we know them?, London:
Kogan Page, 1987.
RH83
Rottmann R.M. and Hudson H.T., Computer Grading as an Instructional
Tool, Journal of College Science Teaching 12, 1983, p152-156.
RJE02
Rawles S., Joy M. and Evans M., Computer Assisted Assessment Review
Exercise, LTSN Centre for Information and Computer Sciences, February 4,
2002.
RL02
Rudner L.M. and Liang T., Automated essay scoring using Bayes’ theorem,
Journal of Technology, Learning and Assessment 1(2), 2002. Available from
http://www.jtla.org/
Rp01
Race P., A Briefing on Self, Peer and Group Assessment, LTSN Generic
Centre: Assessment Series, November 2001.
RT81
Reingold E. and Tilford J., Tidier Drawings of Trees, IEEE Transactions on
Software Engineering 7(2), p223-228, IEEE Computer Society Press, March,
1981.
Sa01
Stedile A., JMFGraph — A Modular Framework for Drawing Graphs in Java,
M.Sc. Thesis, Institute for Information Processing and Computer Supported
New Media (IICM), Graz University of Technology, Austria, November 18th,
2001. Available from: http://www.iicm.edu/thesis/
SC01
Stefani L. and Carroll J., A Briefing on Plagiarism, LTSN Generic Centre:
Assessment Series, November 2001.
SHP+06
Spacco J., Hovemeyer D., Pugh W., Emad F., Hollingsworth J.K. and Padua-Perez N., Experiences with Marmoset: Designing and Using an Advanced
Submission and Testing System for Programming Courses, Proceedings of
the 11th Annual SIGCSE Conference on Innovation and Technology in
Computer Science Education (ITiCSE 06), Bologna, Italy, p13-17, June 26-28,
2006.
Sk02
Sugiyama K., Graph Drawing and Applications for Software and Knowledge
Engineers, Series on Software Engineering and Knowledge Engineering
Volume 11, World Scientific, March 2002, ISBN 981-02-4879-2.
SM97
Stephens D. and Mascia J., Results of a Survey into the use of Computer-Assisted Assessment in Institutions of Higher Education in the UK, DILS,
Loughborough University, January 1997.
SMK01
Saikkonen R., Malmi L. and Korhonen A., Fully automatic assessment of
programming exercises, Proceedings of the 6th Annual SIGCSE Conference
on Innovation and Technology in Computer Science Education (ITiCSE 01),
Canterbury, UK, p133-136, June 25-27, 2001.
Sp02
Symeonidis P., Setting Up Exercises Within CourseMaster, LTR Report,
School of Computer Science and IT, University of Nottingham, UK, 2002.
SP03
Stephens D. and Percik D., Constructing a Test Bank for Information Science
based upon Bloom’s Principles, Innovations in Teaching and Learning in
Information and Computer Sciences (ITALICS) 2(1), July 2003.
SP04
Shneiderman B. and Plaisant C., Designing the User Interface (fourth
edition), Addison Wesley, 2004. ISBN 0-321-19786-0.
Sp06
Symeonidis P., Automated Assessment of Java Programming Coursework
for Computer Science Education, Ph.D. thesis (unpublished manuscript),
University of Nottingham, February 2006.
STW04
Smith N., Thomas P.G. and Waugh K., Interpreting Imprecise Diagrams,
Proceedings of the Third International Conference on the Theory and
Application of Diagrams, Cambridge, UK, p239-241, March 22-24, 2004.
Available from: http://mcs.open.ac.uk/ns938/publications/diagrams-04-poster.pdf
Ta02
Tsintsifas A., A Framework for the Computer Based Assessment of Diagram
Based Coursework, Ph.D. thesis, University of Nottingham, March 2002.
Available from http://www.cs.nott.ac.uk/~azt/research.htm
Tb93
Buzan T., The Mind Map Book: Radiant Thinking — the major evolution in
human thought, BBC Publications, 1993.
TBF97
Tinoco L., Barnette D. and Fox E., Online evaluation in WWW-based
courseware, Proceedings of the 28th SIGCSE technical symposium on
Computer Science Education, San Jose CA, USA, p194-198, February 27 to
March 1, 1997.
TD76
Taylor J. and Deever D., Constructed-response, computer-graded
homework, American Journal of Physics 44(6), June 1976, p598-599.
Tek87
Tektronix Computer Research Laboratory, Semantic Drawing with HotDraw,
Technical Report CR-87-34, April, 1987.
Tp04
Thomas P.G., Drawing Diagrams in an Online Examination, Technical
Report 2004/14, Department of Computing, The Open University, Milton
Keynes, UK, April 23, 2004. Available from:
http://computing-reports.open.ac.uk/index.php/2004/200414
Tr87
Tamassia R., On embedding a graph in the grid with the minimum number
of bends, SIAM Journal on Computing 16(3), p421-444, Society for Industrial
and Applied Mathematics, June, 1987.
TTV00
Tamassia R., Tollis I.G. and Vitter J.S., A Parallel Algorithm for Planar
Orthogonal Grid Drawings, Parallel Processing Letters, March 2000.
Available from: http://www.cs.duke.edu/~jsv/Papers/catalog/node140.html
TWS05
Thomas P.G., Waugh K. and Smith N., Experiments in the Automatic
Marking of ER-Diagrams, Proceedings of the 10th annual SIGCSE conference
on Innovation and Technology in Computer Science Education (ITiCSE 05),
Monte de Caparica, Portugal, p158-162, June 27-29, 2005.
TWS06
Thomas P., Waugh K. and Smith N., Using Patterns in the Automatic
Marking of ER-Diagrams, Proceedings of the 11th Annual SIGCSE
Conference on Innovation and Technology in Computer Science Education
(ITiCSE 06), Bologna, Italy, p83-87, June 26-28, 2006.
Vg95
Viehstaedt G., A Generator for Diagram Editors, Ph.D. thesis, University of
Erlangen-Nürnberg, 1995.
VL89
Vlissides J. and Linton M., Unidraw: A Framework for Building Domain-Specific Graphical Editors, Technical Report CSL-TR-89-380, Stanford
University, July, 1989.
VS02
Vendlinski T. and Stevens R., Assessing Student Problem-Solving Skills With
Complex Computer-Based Tasks, Journal of Technology, Learning and
Assessment 1(3), 2002. Available from http://www.jtla.org/
WC53
Watson J.D. and Crick F.H.C., A Structure for Deoxyribose Nucleic Acid,
Nature 171, p737-738, April 25, 1953. Available from:
http://www.nature.com/nature/dna50/watsoncrick.pdf
Wl98
Wybrew L., The use of computerised assessment in Health Science modules,
in Charman D. and Elmes A. (eds), Computer Based Assessment (Volume 2):
Case studies in Science and Computing, p61-65, SEED Publications,
University of Plymouth, 1998, ISBN 1-84102-02-7.
Wt04
Winters T.D., Analysis, Design, Development, and Deployment of a
Generalized Framework for Computer-Aided Assessment, M.Sc. thesis,
University of California Riverside, June 2004.
Ym01
Yorke M., Assessment: A Guide for Senior Managers, LTSN Generic Centre:
Assessment Series, November 2001.
ZF92
Zin A. M. and Foxley E., The “oracle” program, LTR Report, Dept. of
Computer Science, University of Nottingham, UK, 1992. Available from:
http://www.cs.nott.ac.uk/~ceilidh/papers.html
ZF94
Zin A.M. and Foxley E., Analyse: An Automatic Program Assessment
System, Malaysian Journal of Computer Science 7, 1994. Available from:
http://www.cs.nott.ac.uk/CourseMarker/more_info/html/ASQA.HTM