High Stakes Assessment in Reading
A Position Statement of the International Reading Association
The Board of Directors of the International Reading Association is opposed to high-stakes testing. High-stakes testing means that one test is used to make important decisions about students, teachers, and schools. In a high-stakes testing situation, if students score high on a single test they could be placed in honors classes or a gifted program. On the other hand, if students score low on a high-stakes test, it could mean that they will be rejected by a particular college, and it could affect their teacher’s salary and the rating of the school district as compared with others where the same test was given.
In the United States in recent years there has been an increase in policy makers’ and educators’ reliance on high-stakes testing in which single test scores are used to make important educational decisions. The International Reading Association is deeply concerned about this trend. The Board of Directors offers this position statement as a call for the evaluation of the impact of current types and levels of testing on teaching quality, student motivation, educational policy making, and the public’s perception of the quality of schooling. Our central concern is that testing has become a means of controlling instruction as opposed to a way of gathering information to help students become better readers. To guide educators who must use tests as a key element in the information base used to make decisions about the progress of individual children and the quality of instructional programs, we offer this position in the form of a question and answer dialogue. This format is intended to ensure that important conceptual, practical, and ethical issues are considered by those responsible for designing and implementing testing programs.
What does the term high-stakes testing mean?

High-stakes testing means that the consequences for good (high) or poor (low) performance on a test are substantial. In other words, some very important decisions, such as promotion or retention, entrance into an educational institution, teacher salary, or a school district’s autonomy depend on a single test score.

High-stakes tests have been a part of education for some time. Perhaps the most conspicuous form of high-stakes testing, historically speaking, was in the British educational system. National exams in England and in other countries that adopted the British system separated students into different educational tracks. In the United States, tests such as the Medical College Admission Test and Law School Admission Test, as well as professional certification examinations (for example, state bar examinations, medical board examinations, state teacher examinations) all represent high-stakes tests.

The meaning of high stakes can be confusing at times. Tests that have no specific decision tied to them can become high stakes to teachers and school administrators when they must face public pressure after scores are made public. In other cases, a low-stakes state test can be transformed into a high-stakes test at a school district level if a local school board decides to make educational or personnel decisions based on the test results.

Why are we concerned with high-stakes testing?

Although high-stakes testing has been and probably will continue to be part of the educational landscape, there has been an increase in such testing in recent years, particularly at the state level. More children are being tested at younger ages, and states and local school districts are using these tests to make a greater variety of important decisions than ever before. Increased frustration with lack of achievement has led to a greater reliance on testing. In response to these frustrations many states have adopted educational standards and assessments of those standards. The logic is that tests of standards accompanied by a reward and penalty structure will improve children’s achievement. In too many cases the assessment is a single multiple-choice test, which would be considered high stakes and would not yield enough information to make an important instructional decision.

Is testing an important part of good educational design?

Yes, testing students’ skills and knowledge is certainly an important part of education, but it is only one type of educational assessment. Assessment involves the systematic and purposeful collection of data to inform actions. From the viewpoint of educators, the primary purpose of assessment is to help students by providing information about how instruction can be improved. Assessment has an important role to play in decision making beyond the classroom level, however. Administrators, school board members, policy makers, and parents make significant decisions that impact students. The needs of many audiences must be considered in building a quality assessment plan.

Testing is a form of assessment that involves the systematic sampling of behavior under controlled conditions. Testing can provide quick, reliable data on student performance. Single tests might be used to make decisions that do not have major long-term consequences, or used to supplement other forms of assessment such as focused interviews, classroom observations and anecdotal records, analysis of work samples, and work inventories.

Different kinds of assessment produce different kinds of information. If a teacher needs to know whether a student can read a particular textbook, there are many sources of information available to her. She can consult districtwide achievement tests in reading, estimate the level of the textbook, determine what score a student would need to read the textbook effectively, and then make a decision. However, it might be simpler for the teacher to ask the student to read a section of the text and then talk with the student about the text. This would probably be faster and more accurate than looking up test scores and conducting studies to see what kind of a test score is needed to comprehend the textbook. In general, teachers need information specific to the content and strategies they are teaching, and they can best get that information through assessments built around their daily educational tasks.

The public and policy makers have needs different from those of teachers. In general they need to know whether the school, school district, and state are effectively educating the students in their charge. For this purpose they need to collect information about many students and they need to know how those students stand in relation to other students across the United States or in relation to some specific standards set by the state. For these purposes, standardized norm-referenced or criterion-referenced tests are efficient and can give a broad picture of achievement on certain kinds of tasks. These kinds of tests are used most commonly for high-stakes decisions regarding schools and school districts.

Why does using tests for high-stakes decisions cause problems?

There are several possible problematic outcomes of high-stakes testing. These include making bad decisions, narrowing the curriculum, focusing exclusively on certain segments of students, losing instructional time, and moving decision making to central authorities and away from local personnel.

Tests are imperfect. Basing important decisions on limited and imperfect information can lead to bad decisions: decisions that can do harm to students and teachers and that sometimes have unfortunate legal and economic consequences for the schools. Decision makers reduce the chance of making a bad decision by seeking information from multiple sources. However, the information from norm-referenced and criterion-referenced tests is inexpensive to collect, easy to aggregate, and usually is highly reliable; for those reasons it is tempting to try to use this information alone to make major decisions.

Another problem is that high-stakes tests have a tendency to narrow the curriculum and inflate the importance of the test. Schools should address a broad range of student learning needs, not just the subjects or parts of subject areas covered on a particular test. As the consequences for low performance are raised, teachers feel pressured to raise scores at all costs. This means they will focus their efforts on activities that they think will improve the single important score. Time spent focusing on those activities will come from other activities in the curriculum and will consequently narrow the curriculum. Most state assessments tend to focus on reading, writing, and mathematics. Too much attention to these basic subjects will marginalize the fine arts, physical education, social studies, and the sciences.

Narrowing of the curriculum is most likely to occur in high-poverty schools that tend to have the lowest test scores. Compared to students in schools in affluent communities, students in high-poverty schools receive teaching with a greater emphasis on lower-level skills, and they have limited access to instruction focusing on higher-level thinking. A recent survey in one state that uses high-stakes assessments found that 75% of classroom teachers surveyed thought the state assessment had a negative impact on their teaching (Hoffman et al., in press).

Another way that educators sometimes respond to test pressure is to focus their attention on particular students. Sometimes this means that only low-performing readers get the instructional resources they need, and those doing only slightly better are ignored. Sometimes there is an attempt to raise test scores by focusing instructional initiatives on those students scoring just below cutoff points, and ignoring those either above or far below cutoff points. And sometimes schools place children in expensive special education programs they do not need, discourage particular children
from attending school on testing days, or encourage low-achieving students to drop out of school altogether, all in the name of getting higher test scores.

The loss of instructional time also is a negative result of high-stakes tests. The time for preparing for and taking tests is time taken away from basic instruction. The consequences of lost instructional time, particularly for low-performing students, are too great for information that can be gathered more efficiently.

Finally, we are concerned that instructional decision making in high-stakes testing situations is diverted from local teachers and is concentrated in a central authority far away from the school. The further decision making is removed from the local level of implementation, the less adaptive the system becomes to individual needs. High-stakes assessment shifts decisions from teachers and principals to bureaucrats and politicians and consequently may diminish the quality of educational services provided to students.

Do test scores improve when high-stakes assessment is mandated?

Test scores in the states with high-stakes assessment plans have often shown improvement. This could be because high-stakes pressure and competition lead teachers to teach reading more effectively. An alternative interpretation is that gains in test scores are the result of “teaching to the test” even when reading does not improve. Analyses of national reading scores do not show the substantial gains claimed by state reading assessments. Studies of norm-referenced tests in states with sustained patterns of growth in state skill assessments (for example, Texas and Kentucky) show no comparable patterns of gain. Although Texas showed steady improvement on state tests, its National Assessment of Educational Progress (NAEP) reading scores are not among the highest, and the scores did not show significant improvement between 1992 and 1998 (U.S. Department of Education, 1999). This may be the result of high-stakes assessments that tend to narrow the curriculum and emphasize only parts of what students need to learn to become successful readers.

Why don’t we just end high-stakes assessment?

It is unlikely that states using these assessments will abandon them. Indeed, the most likely scenario is for an increasing number of states to develop and adopt similar assessment plans. Tests can be useful for making state-level educational decisions, and they provide the public with at least a partial understanding of how well schools are doing. Less positively, politicians, bureaucrats, and test publishers have discovered that they can influence classroom instruction through the use of high-stakes tests. Tests allow these outside parties to take control away from local educational authorities without assuming the responsibilities of educating the students.

Is there a way to help states monitor student success in the curriculum?

If the intent of state assessments is to measure how well students are learning the outcomes identified in the state curriculum framework, then one way students’ success can be monitored is by following the NAEP model with selective sampling across student populations and across content areas on a systematic basis. This model monitors achievement without encouraging high-stakes testing. The tests are directed toward particular grade levels and are not given every year. A sampling procedure is used so very few students actually participate in testing. NAEP is designed to give a report card on general achievement levels in the basic subject areas over time.

Many aspects of the NAEP assessment in reading are commendable. The NAEP sampling strategy has been useful in keeping efficiency high and maintaining a focus on the questions that the national assessment is designed to address. Sampling also has provided NAEP with an opportunity to experiment with a wide variety of testing formats and conditions. Such a strategy would avoid most of the problems associated with teaching to the test. This type of plan would reflect sound principles of instructional design and assessment.

In the book High Stakes: Testing for Tracking, Promotion, and Graduation (Heubert & Hauser, 1999), the following basic principles for test use are presented:
• The important thing about a test is not its validity in general, but its validity when used for a specific purpose. Thus, tests that are valid for influencing classroom practice, “leading” the curriculum, or holding schools accountable are not appropriate for making high-stakes decisions about individual student mastery unless the curriculum, the teaching, and the tests are aligned.
• Tests are not perfect. Test questions are a sample of possible questions that could be asked in a given area. Moreover, a test score is not an exact measure of a student’s knowledge or skills. A student’s score can be expected to vary across different versions of a test—within a margin of error determined by the reliability of the test—as a function of the particular sample of questions asked and/or transitory factors, such as the student’s health on the day of the test. Thus, no single test score can be considered a definitive measure of a student’s knowledge.
• An educational decision that will have a major impact on a test taker should not be made solely or automatically on the basis of a single test score. Other relevant information about the student’s knowledge and skills should also be taken into account.
• Neither a test score nor any other kind of information can justify a bad decision. Research shows that students are typically hurt by a simple retention and repetition of a grade in school without remedial and other instructional support services. In the absence of effective services better tests will not lead to better educational outcomes. (p. 3)

State testing programs should respect these basic principles.

What are the recommendations of the

[...] position statement is not to blame policy makers for the current dilemma with high-stakes testing.

Our recommendations begin with a consideration of teachers and their responsibility to create rich assessment environments in their classrooms and schools. Next, we suggest that researchers must continue to investigate how assessment can better serve our educational goals. Third, we stress the importance of parents and community members in bringing balance to the assessment design. Finally, we offer recommendations to policy makers for developing a plan of action.

Recommendations to teachers:
• Construct more systematic and rigorous assessments for classrooms, so that external audiences will gain confidence in the measures that are being used and their inherent value to inform decisions.
• Take responsibility to educate parents, community members, and policy makers about the forms of classroom-based assessment, used in addition to standardized tests, that can improve instruction and
© 1999 International Reading Association
Cover and inside photo 3: Little Bighorn Photos
Inside photo 1, 2, & 4: Robert Finken
International Reading Association
800 Barksdale Road
PO Box 8139
Newark, Delaware 19714-8139, USA
Phone: 302-731-1600
Fax: 302-731-1057
Web site: www.reading.org
99-19 pub 6/99 1035 7/99