CHAPTERS 3 & 4
OBJECTIVES
By the end of this chapter, you should be able to:
o identify key concepts that ensure assessment and reporting focus on learning
o evaluate the importance and relevance of those concepts to classroom contexts
o identify a variety of approaches that will promote assessment for learning
o appreciate the external contexts in which the assessment and reporting of student learning are constructed.

Assessment can be used
for many purposes— to select students for higher education; to rank students, or even
countries, in order to establish an order from highest to lowest; to decide who will
progress from one stage of schooling to another; or to monitor student progress at
different stages of schooling. In all of these cases, assessment of learning takes place.
It is summative in nature, it provides no opportunities to try again if the first attempt
is not successful, and it is 'high stakes', as the outcomes have very significant
implications for students, especially for those who do not do well. Yet,
assessment of learning serves very particular purposes, and these are usually related to
the broader social, political and economic contexts that have come to the fore in many
countries (Métais and Tabberer 1997). These contexts will undoubtedly continue to be
important as the pace of change continues, and in particular as technological
innovation drives change in almost every aspect of life. Such changes have the
potential to alter radically the role and function of schools, including approaches to
assessment and reporting. But assessment can achieve more than these broad
purposes: assessment can also be used to improve learning. When it is properly
designed, assessment provides feedback to students, leads to re-teaching and provides
extra opportunities for practice. Assessment of this kind— assessment for learning—
can contribute towards a supportive school environment and can help to create a
culture of learning. If schools are to be ‘social anchors’ (Kennedy 1999), providing
students with safe and secure environments, focusing on assessment for learning is an
important objective.
Not all assessment tasks can demonstrate all aspects of validity. Messick (1989) has
argued that construct validity is at the heart of assessment. Gipps (1994) has pointed
out that ‘construct validity is needed not only to support test interpretation, but also to
justify test use’. She also argued strongly for the importance of consequential validity,
which focuses attention on the social and educational consequences of assessment.
Construct validity needs to be built into an assessment task from the very beginning.
If the purpose of the assessment is to gain some understanding of students’
mathematical reasoning or language comprehension, then specific tasks must be
designed to elicit responses that will clearly demonstrate student understanding within
those broad domains. Tasks that cannot be related to the broad domain being assessed
are not valid tasks.
Another approach that will assist both construct and content validity is to make the
selection of tasks a collegial decision. A moderation process can be used to reach
agreement on whether tasks are related to a broad domain or to specific curriculum
objectives, and how they might be improved to make them relate better. Collegial
decisions, sometimes referred to as panel judgments, are well-recognised processes
for determining content validation (Rudner and Farris 1992). There are other
processes for task validation, but the collegial approach is probably the most useful
and efficient for schools.
Authentic assessment, as referred to throughout this book, places particular emphasis
on construct validity, although of a different type. Assessment tasks, if they are to be
considered ‘authentic’, must be challenging, relevant and engaging for students. As
Wiggins (1990) has pointed out, they must reflect real-world contexts and situations
that will confront students outside of school. The reason these can be considered
criteria for valid assessment tasks is that student performance on decontextualised,
abstract and theoretical assessment tasks may well be a reflection of the nature of the
tasks themselves and not a true reflection of student understanding. If learning is to be
the focus of assessment, then it is important that assessment tasks provide the
opportunity for students to demonstrate what they know and are able to do. Authentic
assessment, more than any other form of assessment, seeks to ensure that the task
itself engages students and provides the best opportunity to demonstrate the kind of
learning that has taken place.
A second issue to be discussed is the requirement that assessment tasks produce
reliable or consistent results. Traditionally, assessment tasks are considered to be
reliable when they get the same results from students irrespective of when they are
administered. Such approaches to determining reliability have been outlined by Gipps
(1994, p. 67). These include: multiple administrations of the same test (test/retest);
administration of multiple forms of tests designed to measure the same underlying
construct (parallel forms); and comparing performance on two halves of a test
(split-half procedure). In addition, consistency in marking can be addressed by having
multiple markers for a single assessment task (inter-rater reliability) or having an
assessment task marked on two different occasions by the same marker (intra-rater
reliability).
The underlying similarity in all of the above approaches is that they assume
consistency of task performance by students. Each approach is designed to produce a
‘measure’ of consistency, and in traditional measurement theory this measure is
usually represented by a statistical correlation. The higher the correlation between the
results of successive administrations of the test, performance on parallel forms of the
test, performance in two halves of the test and so on, the more reliable the items are
said to be.
This is a very traditional way of regarding reliability, but it cannot be applied to all
forms of assessment— especially those used in classrooms on a day-to-day basis. In
addition, the technical requirements for determining reliability in this way are often
beyond the resources of schools. Nevertheless, reliability and consistency are
important if confidence is to be maintained in the outcomes of assessment. There are
some quite specific efforts that can be made to ensure greater consistency in
assessment, and these represent an alternative to the more measurement-oriented
approaches described above.
Masters and Forster (1996, p. 6), for example, relate reliability to the amount of
evidence or information that is used to make a judgment about student learning: ‘in
general, the greater the amount of evidence used in making an estimate, the more
reliable that estimate’. This approach calls for multiple measures rather than reliance
on a single assessment task. Gipps (1994) has suggested a wholesale shift in thinking
about reliability, away from traditional approaches based on statistical assumptions
towards approaches based on the conditions surrounding assessment. She talks of
consistency in task administration, in the interpretation of assessment criteria by
markers, and in the application of the same standards to judging performance.
Together, these help to indicate that results reflect what students actually know,
rather than some other variable in the external environment. In the
same way, processes such as moderation can be used to help markers understand the
requirements for assessment and to make them aware of how other markers have
approached the assessment process. These have the effect of reducing any variance in
marking that might come from extraneous factors.
In relation to moderation, there is now considerable evidence that teachers who
engage in consensus moderation processes are able to reach agreement very quickly
on such things as assessment criteria and their application to specific tasks. Where
worked samples are used as the basis for discussion, teachers can explain why they
made particular decisions and can modify those decisions if necessary after the
discussion. Such approaches work well for high-stakes assessment, where it is
necessary to establish some comparability of marking across schools, but they can
also be used within a school when different teachers assess groups doing the same
task. The purpose of consensus moderation is to have a group of markers reach
agreement about criteria and standards for marking, so that a piece of assessment
work will receive the same mark irrespective of who marks it or where it is marked.
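One simple way to put a number on the marker agreement that moderation aims to improve is Cohen's kappa, which discounts the agreement two markers would reach by chance. This is a general statistic, not a procedure prescribed in the text, and the grades below are invented for illustration.

```python
# Agreement between two markers on the same ten scripts (invented grades).
from collections import Counter

marker_a = ["A", "B", "B", "C", "A", "B", "C", "C", "B", "A"]
marker_b = ["A", "B", "C", "C", "A", "B", "C", "B", "B", "A"]

n = len(marker_a)
# Observed agreement: proportion of scripts given the same grade.
p_o = sum(a == b for a, b in zip(marker_a, marker_b)) / n

# Expected chance agreement, from each marker's grade frequencies.
freq_a, freq_b = Counter(marker_a), Counter(marker_b)
p_e = sum(freq_a[g] * freq_b[g] for g in freq_a) / n**2

# Cohen's kappa: agreement above chance, scaled to the maximum possible.
kappa = (p_o - p_e) / (1 - p_e)
print(f"observed agreement {p_o:.2f}, kappa {kappa:.2f}")
```

Raw percentage agreement overstates consistency when a few grades dominate; kappa corrects for that, which is why it is often preferred for checking inter-rater reliability.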
Valid and reliable assessment tasks will ensure that there is a solid foundation for the
assessment process. Yet, valid and reliable tasks also need to be fair. Not all students
come to assessment tasks with the same attributes, life opportunities, values and
predispositions. As far as possible, assessment tasks should not be based on
requirements that will favour one individual or group over another. In particular, tasks
can be biased in favour of boys rather than girls, one ethnic group over another,
European Australians over Indigenous Australians, students from high-income groups
over students from low-income groups, and students whose first language is English
over students whose first language is not English. Childs (1990) provides a good
definition of bias in assessment and, while it refers specifically to gender bias, the
general principles it outlines can be applied to all forms of structural bias in
assessment:
A test is biased if men and women with the same ability levels tend to obtain different
scores. The conditions under which a test is administered, the wording of individual
items, and even a student’s attitude toward the test will affect test results. These
factors may change with time as tests are administered differently, as items are
revised, and as students feel more or less comfortable taking the test. The error caused
by these factors will randomly affect both men and women.
Another type of error is caused by factors which do not change. Known as systematic
error, it is the result of characteristics of the examinees that are stable (such as gender
or race) and that are characteristics other than those the test is intended to measure.
Gender bias in testing is often the result of such systematic error. (Childs 1990, p. 1)
Teachers can be alert to this kind of bias by paying attention to the results of
assessment. Where results seem to be favouring groups or individuals (as outlined
above), then close attention needs to be paid to the assessment tasks to see how these
might be biased. It might be the form of those tasks (e.g. multiple-choice questions,
essays, demonstrations), the content of the tasks (e.g. the content might assume
certain knowledge that has nothing to do with what is being assessed and is more
likely to be familiar to one particular group) or the marking of the completed tasks,
where judgments are called for on the part of the marker and there has been
insufficient marker training to ensure high inter-rater reliability. In this situation,
markers can be guided by their own perceptions of what is required, rather than by the
need to make an assessment of what students know and are able to do.
The detection of bias of any kind requires that teachers record their results in ways
that enable them to examine easily the results for different groups. The easiest way to
examine differences is to split the results for boys and girls. Depending on the class,
however, there may be other groups to examine as well: English as a first or second
language, ethnic groups where these are large enough to warrant it, European
Australians and Indigenous Australians, or any other significant groups where bias
might affect results. The reason for doing this is to ensure that the results of
assessment genuinely reflect what students know and are able to do and are not the
result, even in part, of systematic error.
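The record-keeping described above can be sketched in a few lines: group the class's marks, then compare group summaries to see whether any group is being systematically favoured. All names, groupings and marks here are invented for illustration.

```python
# Hypothetical mark book: splitting assessment results by group
# to look for possible systematic error (all data invented).
from statistics import mean

results = [
    ("Amy",   "girl", 78), ("Ben",  "boy", 62),
    ("Chloe", "girl", 81), ("Dev",  "boy", 58),
    ("Ella",  "girl", 74), ("Finn", "boy", 65),
]

# Collect marks for each group.
by_group = {}
for name, group, mark in results:
    by_group.setdefault(group, []).append(mark)

# Compare group summaries; a persistent gap is a prompt to re-examine
# the task's form, content and marking, not proof of bias in itself.
for group, marks in sorted(by_group.items()):
    print(f"{group}: mean {mean(marks):.1f} (n={len(marks)})")
```

The same split works for any grouping large enough to warrant it (first language, ethnic group, income group), and a gap that recurs across tasks is the signal to examine the tasks themselves.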
For Broadfoot, one way of advancing this cause is through better recording and
reporting processes. She was a strong advocate of records of achievement, which she saw as
one way of recognising ‘the whole range of student achievement and assert[ing] the
need to motivate and encourage students’ development’ (Broadfoot 1991, p. 13).
Assessment from this perspective is not so much about ‘sifting and sorting’ as about
‘celebrating achievement’ and recognising it publicly. There is also a strong indication
that, for Broadfoot, learning is for all students and not just some. This means that
assessment is primarily concerned with individuals and their progress, rather than
with ranking and rating individuals against one another.
More recently, Broadfoot’s approach has been taken up with an increased emphasis
on what has been called ‘assessment for learning’. There are different ways to express
what is meant by assessment for learning:
Assessment for learning . . . acknowledges that assessment should occur as a regular
part of teaching and learning and that the information gained from assessment
activities can be used to shape the teaching and learning process. (Curriculum
Corporation n.d.)
and
Assessment for Learning is the process of seeking and interpreting evidence for use
by learners and their teachers to decide where the learners are in their learning, where
they need to go and how best to get there. (Assessment Reform Group 2002)
REPORTING
As shown in Chapter 1, schools are a part of the broad social, political and economic
structures of society. As such, schools both respond to and influence that society.
Assessment and reporting, as core functions of schools, quite naturally come under
the scrutiny of groups and individuals who are external to schools but who see
themselves as having a stake in the outcomes of schooling. This scrutiny creates a
number of important issues that schools and their communities have to learn to
negotiate. In this section, three of those issues are discussed: accountability, school
reform, and the social purposes of schooling.
SUMMARY
Assessment tasks that can promote learning must have a number of characteristics
to ensure that they have the confidence of students, parents and the community.
They need to be valid: they must be challenging and demanding tasks with some
relevance to real-world contexts; they must yield information that can be used to
make judgments about student performance in broad domains of knowledge and
skills, as well as in relation to specific curriculum objectives; and the social
consequences of the assessment activity should not be negative. They need to be
reliable: the results should be based on achievement on multiple tasks; tasks
should be administered in a consistent way; and there should be agreement on the
interpretation of assessment criteria, and on any reference to performance
standards or benchmarks. They also need to be fair: the requirements of the task
should not favour one individual or group over another. Assessment tasks that
lack these qualities will always be suspect and questionable.
Resolving the conceptual issues associated with assessment does not resolve
broader issues, such as what kind of assessment for what purposes. Educators
have consistently argued for assessment practices that are integrated with the
processes of teaching and learning and that are underpinned by agreed
educational values. Not all forms of assessment to which students are subject
meet these criteria. There are some encouraging trends in large-scale assessments
that support such an approach, although there is by no means consensus within
the community.
Schools are under continuous scrutiny, and there are constant demands for
schools to deliver outcomes that will meet community expectations. Inevitably,
this means that students will at times be subject to assessment practices endorsed
by those external to schools but that do not always reflect agreed educational
values.
Schools serve broad social purposes, and it is important that assessment and
reporting reflect these purposes. Assessment should be concerned with significant
content and learning that can help students to become effective and productive
citizens.
CHAPTER 4: