High Stakes Assessment in Reading

High-Stakes Assessments in Reading
A Position Statement of the International Reading Association

The Board of Directors of the International Reading Association is opposed to high-stakes testing. High-stakes testing means that one test is used to make important decisions about students, teachers, and schools. In a high-stakes testing situation, if students score high on a single test they could be placed in honors classes or a gifted program. On the other hand, if students score low on a high-stakes test, it could mean that they will be rejected by a particular college, and it could affect their teacher’s salary and the rating of the school district as compared with others where the same test was given.

In the United States in recent years there has been an increase in policy makers’ and educators’ reliance on high-stakes testing in which single test scores are used to make important educational decisions. The International Reading Association is deeply concerned about this trend. The Board of Directors offers this position statement as a call for the evaluation of the impact of current types and levels of testing on teaching quality, student motivation, educational policy making, and the public’s perception of the quality of schooling. Our central concern is that testing has become a means of controlling instruction as opposed to a way of gathering information to help students become better readers. To guide educators who must use tests as a key element in the information base used to make decisions about the progress of individual children and the quality of instructional programs, we offer this position in the form of a question and answer dialogue. This format is intended to ensure that important conceptual, practical, and ethical issues are considered by those responsible for designing and implementing testing programs.

What does the term high-stakes testing mean?

High-stakes testing means that the consequences for good (high) or poor (low) performance on a test are substantial. In other words, some very important decisions, such as promotion or retention, entrance into an educational institution, teacher salary, or a school district’s autonomy depend on a single test score.

High-stakes tests have been a part of education for some time. Perhaps the most conspicuous form of high-stakes testing, historically speaking, was in the British educational system. National exams in England and in other countries that adopted the British system separated students into different educational tracks. In the United States, tests such as the Medical College Admission Test and Law School Admission Test, as well as professional certification examinations (for example, state bar examinations, medical board examinations, state teacher examinations) all represent high-stakes tests.

The meaning of high stakes can be confusing at times. Tests that have no specific decision tied to them can become high stakes to teachers and school administrators when they must face public pressure after scores are made public. In other cases, a low-stakes state test can be transformed into a high-stakes test at a school district level if a local school board decides to make educational or personnel decisions based on the test results.

Why are we concerned with high-stakes testing?

Although high-stakes testing has been and probably will continue to be part of the educational landscape, there has been an increase in such testing in recent years, particularly at the state level. More children are being tested at younger ages, and states and local school districts are using these tests to make a greater variety of important decisions than ever before. Increased frustration with lack of achievement has led to a greater reliance on testing. In response to these frustrations many states have adopted educational standards and assessments of those standards. The logic is that tests of standards accompanied by a reward and penalty structure will improve children’s achievement. In too many cases the assessment is a single multiple-choice test, which would be considered high stakes and would not yield enough information to make an important instructional decision.

Is testing an important part of good educational design?

Yes, testing students’ skills and knowledge is certainly an important part of education, but it is only one type of educational assessment. Assessment involves the systematic and purposeful collection of data to inform actions. From the viewpoint of educators, the primary purpose of assessment is to help students by providing information about how instruction can be improved. Assessment has an important role to play in decision making beyond the classroom level, however. Administrators, school board members, policy makers, and parents make significant decisions that impact students. The needs of many audiences must be considered in building a quality assessment plan.

Testing is a form of assessment that involves the systematic sampling of behavior under controlled conditions. Testing can provide quick, reliable data on student performance. Single tests might be used to make decisions that do not have major long-term consequences, or used to supplement other forms of assessment such as focused interviews, classroom observations and anecdotal records, analysis of work samples, and work inventories.

Different kinds of assessment produce different kinds of information. If a teacher needs to know whether a student can read a particular textbook, there are many sources of information available to her. She can consult districtwide achievement tests in reading, estimate the level of the textbook, determine what score a student would need to read the textbook effectively, and then make a decision. However, it might be simpler for the teacher to ask the student to read a section of the text and then talk with the student about the text. This would probably be faster and more accurate than looking up test scores and conducting studies to see what kind of a test score is needed to comprehend the textbook. In general, teachers need information specific to the content and strategies they are teaching, and they can best get that information through assessments built around their daily educational tasks.

The public and policy makers have different needs from teachers. In general they need to know whether the school, school district, and state are effectively educating the students in their charge. For this purpose they need to collect information about many students and they need to know how those students stand in relation to other students across the United States or in relation to some specific standards set by the state. For these purposes, standardized norm-referenced or criterion-referenced tests are efficient and can give a broad picture of achievement on certain kinds of tasks. These kinds of tests are used most commonly for high-stakes decisions regarding schools and school districts.

Why does using tests for high-stakes decisions cause problems?

There are several possible problematic outcomes of high-stakes testing. These include making bad decisions, narrowing the curriculum, focusing exclusively on certain segments of students, losing instructional time, and moving decision making to central authorities and away from local personnel.

Tests are imperfect. Basing important decisions on limited and imperfect information can lead to bad decisions, decisions that can do harm to students and teachers and that sometimes have unfortunate legal and economic consequences for the schools. Decision makers reduce the chance of making a bad decision by seeking information from multiple sources. However, the information from norm-referenced and criterion-referenced tests is inexpensive to collect, easy to aggregate, and usually is highly reliable; for those reasons it is tempting to try to use this information alone to make major decisions.

Another problem is that high-stakes tests have a tendency to narrow the curriculum and inflate the importance of the test. Schools should address a broad range of student learning needs, not just the subjects or parts of subject areas covered on a particular test. As the consequences for low performance are raised, teachers feel pressured to raise scores at all costs. This means they will focus their efforts on activities that they think will improve the single important score. Time spent focusing on those activities will come from other activities in the curriculum and will consequently narrow the curriculum. Most state assessments tend to focus on reading, writing, and mathematics. Too much attention to these basic subjects will marginalize the fine arts, physical education, social studies, and the sciences.

Narrowing of the curriculum is most likely to occur in high-poverty schools that tend to have the lowest test scores. Compared to students in schools in affluent communities, students in high-poverty schools receive teaching with a greater emphasis on lower-level skills, and they have limited access to instruction focusing on higher-level thinking. A recent survey in one state that uses high-stakes assessments found that 75% of classroom teachers surveyed thought the state assessment had a negative impact on their teaching (Hoffman et al., in press).

Another way that educators sometimes respond to test pressure is to focus their attention on particular students. Sometimes this means that only low-performing readers get the instructional resources they need, and those doing only slightly better are ignored. Sometimes there is an attempt to raise test scores by focusing instructional initiatives on those students scoring just below cutoff points, and ignoring those above or far below the cutoff points. And sometimes schools place children in expensive special education programs they do not need, discourage particular children from attending school on testing days, or encourage low-achieving students to drop out of school altogether, all in the name of getting higher test scores.

The loss of instructional time also is a negative result of high-stakes tests. The time for preparing for and taking tests is time taken away from basic instruction. The consequences of lost instructional time, particularly for low-performing students, are too great for information that can be gathered more efficiently.

Finally, we are concerned that instructional decision making in high-stakes testing situations is diverted from local teachers and is concentrated in a central authority far away from the school. The further decision making is removed from the local level of implementation, the less adaptive the system becomes to individual needs. High-stakes assessment shifts decisions from teachers and principals to bureaucrats and politicians and consequently may diminish the quality of educational services provided to students.

Do test scores improve when high-stakes assessment is mandated?

Test scores in the states with high-stakes assessment plans have often shown improvement. This could be because high-stakes pressure and competition lead teachers to teach reading more effectively. An alternative interpretation is that gains in test scores are the result of “teaching to the test” even when reading does not improve. Analyses of national reading scores do not show the substantial gains claimed by state reading assessments. Studies of norm-referenced tests in states with sustained patterns of growth in state skill assessments (for example, Texas and Kentucky) show no comparable patterns of gain. Although Texas showed steady improvement on state tests, its National Assessment of Educational Progress (NAEP) reading scores are not among the highest, and the scores did not show significant improvement between 1992 and 1998 (U.S. Department of Education, 1999). This may be the result of high-stakes assessments that tend to narrow the curriculum and emphasize only parts of what students need to learn to become successful readers.

Why don’t we just end high-stakes assessment?

It is unlikely that states using these assessments will abandon them. Indeed, the most likely scenario is for an increasing number of states to develop and adopt similar assessment plans. Tests can be useful for making state-level educational decisions, and they provide the public with at least a partial understanding of how well schools are doing. Less positively, politicians, bureaucrats, and test publishers have discovered that they can influence classroom instruction through the use of high-stakes tests. Tests allow these outside parties to take control away from local educational authorities without assuming the responsibilities of educating the students.

Is there a way to help states monitor student success in the curriculum?

If the intent of state assessments is to measure how well students are learning the outcomes identified in the state curriculum framework, then one way students’ success can be monitored is by following the NAEP model with selective sampling across student populations and across content areas on a systematic basis. This model monitors achievement without encouraging high-stakes testing. The tests are directed toward particular grade levels and are not given every year. A sampling procedure is used so very few students actually participate in testing. NAEP is designed to give a report card on general achievement levels in the basic subject areas over time.

Many aspects of the NAEP assessment in reading are commendable. The NAEP sampling strategy has been useful in keeping efficiency high and maintaining a focus on the questions that the national assessment is designed to address. Sampling also has provided NAEP with an opportunity to experiment with a wide variety of testing formats and conditions. Such a strategy would avoid most of the problems associated with teaching to the test. This type of plan would reflect sound principles of instructional design and assessment.

In the book High Stakes: Testing for Tracking, Promotion, and Graduation (Heubert & Hauser, 1999), the following basic principles for test use are presented:
• The important thing about a test is not its validity in general, but its validity when used for a specific purpose. Thus, tests that are valid for influencing classroom practice, “leading” the curriculum, or holding schools accountable are not appropriate for making high-stakes decisions about individual student mastery unless the curriculum, the teaching, and the tests are aligned.
• Tests are not perfect. Test questions are a sample of possible questions that could be asked in a given area. Moreover, a test score is not an exact measure of a student’s knowledge or skills. A student’s score can be expected to vary across different versions of a test (within a margin of error determined by the reliability of the test) as a function of the particular sample of questions asked and/or transitory factors, such as the student’s health on the day of the test. Thus, no single test score can be considered a definitive measure of a student’s knowledge.
• An educational decision that will have a major impact on a test taker should not be made solely or automatically on the basis of a single test score. Other relevant information about the student’s knowledge and skills should also be taken into account.
• Neither a test score nor any other kind of information can justify a bad decision. Research shows that students are typically hurt by a simple retention and repetition of a grade in school without remedial and other instructional support services. In the absence of effective services better tests will not lead to better educational outcomes. (p. 3)

State testing programs should respect these basic principles.

What are the recommendations of the International Reading Association regarding high-stakes reading assessments?

In framing our recommendations the Association would like to stress two points. First, we recognize accountability is a necessary part of education. Concerns over high-stakes tests should not be interpreted as fear of or disregard for professional accountability. Second, the intent in this position statement is not to blame policy makers for the current dilemma with high-stakes testing.

Our recommendations begin with a consideration of teachers and their responsibility to create rich assessment environments in their classrooms and schools. Next, we suggest that researchers must continue to investigate how assessment can better serve our educational goals. Third, we stress the importance of parents and community members in bringing balance to the assessment design. Finally, we offer recommendations to policy makers for developing a plan of action.

Recommendations to teachers:
• Construct more systematic and rigorous assessments for classrooms, so that external audiences will gain confidence in the measures that are being used and their inherent value to inform decisions.
• Take responsibility to educate parents, community members, and policy makers about the forms of classroom-based assessment, used in addition to standardized tests, that can improve instruction and benefit students learning to read.
• Understand the difference between ethical and unethical practices when teaching to the test. It is ethical to familiarize students with the format of the test so they are familiar with the types of questions and responses required. Spending time on this type of instruction is helpful to all and can be supportive of the regular curriculum. It is not ethical to devote substantial instructional time teaching to the test, and it is not ethical to focus instructional time on particular students who are most likely to raise test scores while ignoring groups unlikely to improve.
• Inform parents and the public about tests and their results.
• Resist the temptation to take actions to improve test scores that are not based on the idea of teaching students to read better.

Recommendations to researchers:
• Conduct ongoing evaluations of high-stakes tests. These studies should include but not be limited to teacher use of results, impact on the curriculum focus, time in testing and test preparation, the costs of the test (both direct and hidden), parent and community communication, and effects on teacher and student motivations. There are few data on the impact of tests on instruction. Good baseline data and follow-up studies will help in monitoring the situation. These studies should not be left to those who design, develop, and implement tests; they should be conducted by independent researchers.
• Find ways to link performance assessment alternatives to questions that external audiences must address on a regular basis. Researchers must continue to offer demonstrations of ways that data from performance assessments can be aggregated meaningfully. This strategy will allow them to build trustworthy informal assessments.

Recommendations to parents, parent groups, and child advocacy groups:
• Be vigilant regarding the costs of high-stakes tests on students. Parents must ask questions about what tests are doing to their children and their schools. They cannot simply accept the “we’re just holding the school accountable” response as satisfactory. They must consider cost, time, alternative methods, and emotional impact on students as a result of these tests.
• Lobby for the development of classroom-based forms of assessment that provide useful, understandable information, improve instruction, and help children become better readers.

Recommendations to policy makers:
• Design an assessment plan that is considerate of the complexity of reading, learning to read, and the teaching of reading. A strong assessment plan is the best ally of teachers and administrators because it supports good instructional decision making and good instructional design. Consider the features of good assessment as outlined in Standards for the Assessment of Reading and Writing (International Reading Association & National Council of Teachers of English, 1994) in designing an assessment plan. Be aware of the pressures to use tests to make high-stakes decisions.
• When decisions about students must be made that involve high-stakes outcomes (e.g., graduation, matriculation, awards), rely on multiple measures rather than just performance on a single test. The experiences in England with high-stakes assessment have been instructive. England has moved to an assessment system that values teacher informal assessments, ongoing performance assessments, portfolios, teacher recommendations, and standardized testing. The triangulation of data sources leads to more valid decision making.
• Use sampling strategies when assessments do not involve decisions related to the performance of individual students (e.g., program evaluation). Sampling is less intrusive, less costly, and just as reliable as full-scale assessment plans. Sampling strategies also provide an opportunity to design alternate forms and types of assessments. Such a variety of assessments encourages careful inspection of issues of validity and reliability.
• Do not use incentives, resources, money, or recognition of test scores to reward or punish schools or teachers. Neither the awards (e.g., blue ribbon schools) nor the punishing labels (e.g., low-performing schools) are in the interest of students or teachers. The consequences of achieving or not achieving in schools are real enough. Well-intentioned efforts to recognize achievement often become disincentives to those who need the most help.
• Do not attempt to manipulate instruction through assessments. In other words, do not initiate, design, or implement high-stakes tests when the primary goal is to affect instructional practices. Ask the question, “Is the primary goal of the assessment to collect data that will be used to make better decisions that impact the individual students taking the test?” If the answer is “no,” high-stakes tests are inappropriate.

The pattern of testing as the preferred tool to manipulate teaching continues to expand. We call on educators, policy makers, community leaders, and parents to take a common-sense look at the testing in schools today. Visit classrooms. Talk to teachers. Listen to teachers talk about the curriculum and the decisions they are making. Talk to the teachers about the kinds of assessments they use in the classroom and how they use collected data. To be opposed to large-scale, high-stakes testing is not to be opposed to assessment or accountability. It is to affirm the necessity of aligning our purposes and goals with our methods.

References

Heubert, J.P., & Hauser, R.M. (1999). High stakes: Testing for tracking, promotion, and graduation. Washington, DC: National Academy Press.
Hoffman, J., Paris, S., Patterson, E.U., Pennington, J., & Assaf, L.C. (in press). High-stakes assessment in the language arts: The piper plays, the players dance, but who pays the price? In J. Flood, J.M. Jensen, D. Lapp, & J. Squire (Eds.), Handbook of research on teaching the English language arts (2nd ed.).
International Reading Association & National Council of Teachers of English. (1994). Standards for the assessment of reading and writing. Newark, DE: International Reading Association; Urbana, IL: National Council of Teachers of English.
U.S. Department of Education. (1999). The NAEP 1998 reading report card for the nation and the states (NCES 1999-459). Washington, DC: Author.

Suggested Readings

1. What does the term high-stakes testing mean?
Downing, S., & Haladyna, T. (1996). A model for evaluation of high-stakes testing programs: Why the fox should not guard the chicken coop. Educational Measurement: Issues and Practice, 5(1), 5–12.
Popham, W. (1987). Can high-stakes tests be developed at the local level? NASSP Bulletin, 71(496), 77–84.

2. Why are we concerned with high-stakes testing?
Pipho, C., & Hadley, C. (1985). State activity: Minimum competence testing as of January 1985 (Clearinghouse notes). Denver, CO: Education Commission of the States.

3. Is testing an important part of good educational design?
International Reading Association. (1995). Reading assessment in practice. Newark, DE: Author.

4. Why does using tests for high-stakes decisions cause problems?
Allington, R.L., & McGill-Franzen, A. (1992). Unintended effects of educational reform in New York State. Educational Policy, 6(4), 396–413.
Madaus, G.F. (1985). Test scores as administrative mechanisms in educational policy. Phi Delta Kappan, 66(9), 611–617.
Mathison, S. (1989). The perceived effects of standardized testing on teaching and curriculum. Paper presented at the annual meeting of the American Educational Research Association, San Francisco, CA.
McGill-Franzen, A., & Allington, R.L. (1993). Flunk ’em or get them classified: The contamination of primary grade accountability data. Educational Researcher, 22(1), 19–22, 34.
Paris, S.G. (1998). Why learner-centered assessment is better than high-stakes testing. In N.M. Lambert & B.L. McCombs (Eds.), How students learn: Reforming schools through learner-centered education (pp. 189–209). Washington, DC: American Psychological Association.

5. Do test scores improve when high-stakes assessment is mandated?
Cornet, H.D., & Wilson, B.L. (1991). Testing reform and rebellion. Norwood, NJ: Ablex.
Resnick, D.P., & Resnick, L.B. (1985). Standards, curriculum, and performance: A historical and comparative perspective. Educational Researcher, 14(4), 5–20.
Wise, A.E. (1990, January 10). A look ahead: Education and the new decade. Education Week, p. 30.

6. Why don’t we just end high-stakes assessment?
Madaus, G.F. (1985). Test scores as administrative mechanisms in educational policy. Phi Delta Kappan, 66(9), 611–617.

7. Is there a way to help states monitor student success in the curriculum?
Linn, R.L. (1993). Educational assessment: Expanded expectations and challenges. Educational Evaluation and Policy Analysis, 15(1), 1–16.
Messick, S. (1993). Validity. In R.L. Linn (Ed.), Educational measurement (3rd ed., pp. 13–103). Washington, DC: American Council on Education.
Moss, P. (1998). The role of consequences in validity theory. Educational Measurement: Issues and Practice, 17(2), 6–12.
Adopted by the Board of Directors, May 1999
Board of Directors at Time of Adoption
Kathryn A. Ransom, President
Carol Minnick Santa, President-Elect
Carmelita K. Williams, Vice President
Alan E. Farstrup, Executive Director
Kathryn H. Au
Betsy M. Baker
Patricia A. Edwards
James V. Hoffman
Adria F. Klein
Diane L. Larson
John W. Logan
Lesley Mandel Morrow
Timothy Shanahan
This brochure may be purchased from the International Reading Association in quantities of 10, prepaid only. (Please contact the Association for pricing information.) Single copies are free upon request by sending a self-addressed, stamped envelope. Requests from outside the United States should include an envelope, but postage is not required.
© 1999 International Reading Association
Cover and inside photo 3: Little Bighorn Photos
Inside photo 1, 2, & 4: Robert Finken
International Reading Association
800 Barksdale Road
PO Box 8139
Newark, Delaware 19714-8139, USA
Phone: 302-731-1600
Fax: 302-731-1057
Web site: www.reading.org
99-19 pub 6/99 1035 7/99
