TEACHERS OF ENGLISH AS A SECOND LANGUAGE OF ONTARIO
TOEFL REVIEW
TOEFL iBT Review: The Reading Section
By Kimberley Hindy & Derek Martin
Since its inception, the TOEFL has evolved
from a paper-based test (PBT) to an Internet-based test, TOEFL iBT (Wall & Horák,
2008, p. ii). Early publications of the TOEFL
Monograph series set out a preliminary working framework for the development of the iBT,
stating that the goals of the test development
program were to design a test that “was more
reflective of communicative competence models, included tasks that integrated the language
modalities tested, [and] provided more information than current TOEFL scores did about
international students’ ability to use English in
an academic environment” (Jamieson et al.,
2000, p. 3). Chapelle et al. provide a detailed
history of the iBT’s distribution worldwide
(2008, pp. 359–361).
The TOEFL website
(www.toefl.org/) contains pertinent information for test takers, academic institutions, and
English language teachers, as well as the TOEFL
iBT Research series, TOEFL Research Reports,
and Monograph series reports.
This review of the TOEFL iBT Reading Section describes the section’s educational purposes and highlights critical areas of validity research, along with their implications and limitations. It is our hope that this review will be a useful reference for teachers of English for Academic Purposes (EAP).
Educational Purposes
The purpose of the TOEFL iBT is to assess English proficiency for academic purposes: “the TOEFL® test measures [the test
taker’s] ability to communicate in English in
colleges and universities” (www.toefl.org/).
TOEFL scores are accepted by more than 6000
Contact
volume 36, issue 3
colleges, universities, licensing agencies, and
immigration authorities in 136 countries
(Alderson, 2009, p. 621).
ETS asserts that the
“test is scored using methods
that ensure unbiased results
and a quality control process
that meets the highest standards of fairness and objectivity” (ETS, 2009a). ETS aims
to simulate tasks that are typical of university settings,
which “ensures applicants
are equipped with the skills
students need in a higher
education classroom” (ETS,
2009b).
The Reading section in particular assesses the test taker’s ability to understand and perform university-level academic reading tasks. The specific purposes for academic reading that the TOEFL iBT aims to address are as follows: reading to find information (i.e., effectively scanning text for key facts and important information), basic comprehension (i.e., understanding the general topic or main idea, major points, important facts and details, and vocabulary in context), and reading to learn (i.e., recognizing the organization and purpose of a passage and inferring how ideas throughout the passage connect) (Wall & Horák, 2008, p. 1). There are fewer but longer passages than in previous versions of the TOEFL (500-700 vs. 300-400 words; 3-5 reading passages with 12-14 questions per passage) on a variety of topics, a change that was made in order to more authentically reproduce the reading tasks test takers can expect to experience in university. The 60 to 100 minutes allotted for this section includes time for reading the passages (categorized as Exposition, Argumentation and Historical) and answering the questions (ETS, 2006, p. 19). From these alterations, EAP instructors may infer that students could benefit from strategy use through exposure to longer reading passages in order to more effectively prepare them for reading tasks at the tertiary level.
TOEFL iBT Reading Question Formats
The TOEFL iBT Reading
section is scored out of 30; ETS
has published score conversion
tables for the PBT and Computer-Based Test (CBT), which offer useful interpretation guidelines for
the new iBT scores (ETS, 2005a).
There are three question formats
in the Reading section: those with
four choices and a single answer
in traditional multiple-choice format, those with four choices and a
single answer that requires test
takers to “insert a sentence” where it best fits,
and “reading to learn” questions with more than
four choices and more than one possible correct answer (ETS, 2008b, pp. 9–10). The innovative “reading to learn” questions test the test taker’s
ability to recognize paragraph organization and
relationships among facts and ideas in different
parts of the passage. The glossary feature of the
iBT allows test takers to select “special purpose
words and phrases” to view a definition or explanation of the term (ibid, p. 10). By including
this glossary feature, the interactive nature of
the iBT allows for more vocabulary support than
the PBT and may strengthen the validity of
“reading to learn.”
Issues of Content and Construct Validity
Validity is “the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests” (AERA, APA, & NCME, 1999, p. 9). The validity claims of the Reading section of the TOEFL iBT have been the subject of much research; in fact, there is an annual call for research proposals by researchers outside of ETS.
As mentioned above, the change to longer reading passages in the iBT means that there are also fewer passages. A concern arising from this change is whether the decrease in topic variety increases the likelihood that an examinee’s familiarity with the particular content of the passages will influence the examinee’s reading performance. Having identified this concern, Liu et al. (2009) used differential item functioning (DIF) and differential bundle functioning (DBF) to investigate the impact of outside knowledge on TOEFL iBT reading performance. The rationale for this research was that the TOEFL iBT is “a test of communicative language skills rather than of specific content knowledge, and therefore the test results should not be affected by test takers’ major field of study or cultural background” (Liu et al., 2009, p. vi). The researchers found very little effect of this kind, which supports the claim by ETS that “test takers should not be concerned if they are unfamiliar with a topic. The passage contains all the information needed to answer the questions” (ETS, 2008b, p. 8). In a high-stakes testing situation such as the TOEFL iBT, it is crucial to eliminate unwarranted item advantages for certain test takers, in order to ensure the validity of the test scores (Liu et al., 2009, p. 4). Of course, as the iBT continues to evolve, more studies will be required to maintain content validity. For their part, EAP instructors must be aware that the iBT does not test specific knowledge of particular subjects and should expose their students to a variety of reading topics, perhaps taking into consideration factors such as the interests and intended academic specialties of their students, but not the anticipated topics of TOEFL reading passages.
A construct validity concern of
the Reading section is the possibility that test takers may use strategies other
than the reading strategies that the test is intended to assess. These have been referred to as
“test wiseness” strategies and may include
various ways of selecting answers without properly reading and comprehending a text passage (Cohen & Upton, 2006, p. 4). More generally, test takers may “find themselves using
strategies that they would not use under pretest conditions. It is for this reason that during
the pilot phase, it is crucial for test constructors
to find out what their tests are actually measuring” (ibid, p. 5). Cohen and Upton analyzed students’ verbal reports to determine their reading
and test-taking strategies to answer reading
comprehension questions. Data
were collected from a sample
group of 32 students, from four
language groups (Chinese,
Japanese, Korean, and other
languages), as they responded
to prototype TOEFL reading
comprehension tasks mimicking
those of the TOEFL iBT test. It
was noted that test takers did
not rely on “test wiseness”
strategies, but that their strategies:
reflect the fact that respondents were in actuality engaged with the reading test tasks in the manner desired by the test designers... respondents were actively working to understand the text, to understand the expectations of the questions, to understand the meaning and implications of the different options in light of the text, and to select and discard options based on what they understood about the text (p. 105).
These findings suggest that test takers might achieve high scores on iBT reading comprehension tasks by using reading strategies or appropriate test management strategies. This indicates that EAP instructors should focus on helping learners improve their reading strategies. If preparation materials or programs focus on test taking ‘tricks,’ ‘tips,’ ‘strategies,’ or ‘cracking the TOEFL’ (which a quick Internet search will reveal), they should be viewed as suspect. Of course, as students become more familiar with the TOEFL iBT through such practice materials and preparatory courses, construct validity may be weakened, since what they learn may be not only relevant reading skills but also more sophisticated “test wiseness” skills. Worthy of note is that in the case of the Cohen and Upton study, the sample was taken from an East Asian context. More research to verify their findings against other cultural and linguistic contexts is needed.
In addition to the isolated Reading section, the iBT
aims to measure how well a
test taker is able to use integrated language skills in the
university classroom. Thus, it
contains integrated sections
which model academic requirements of “combining information they have heard in
class lectures with what they
have read in textbooks or
other materials” (ETS, 2008b, p. 22) by incorporating information from a reading passage into
their spoken or written responses. As preliminary versions of integrated tasks were contemplated, Cumming et al. (2006) supported the
inclusion of integrated reading-writing and/or
listening-writing tasks as measures of English
writing proficiency in the TOEFL [iBT]. These prototype tasks elicited written discourse that differed significantly in a variety of ways from that produced in the independent essay on the TOEFL, providing an additional measure of writing ability that can be scored reliably and that interconnects English language comprehension purposefully with text production (Cumming et al., 2006, p. 46). Additionally, the
new scores of the TOEFL iBT
come with “helpful performance feedback on their score
reports” (ETS, 2006, p. 5; see
Table 1). ETS provides comprehensive scoring information that
includes scores for the four
skills and a total score. An understanding of the quality of
feedback given by ETS may be
beneficial for EAP instructors to
consider as they provide their
own feedback; since this feedback portion is a new development, it likely is evidence of
influence from extensive external research. By including integrated tasks and skill-specific
feedback, teachers, learners,
and test-takers may experience
further positive washback.
Further research on this washback would
offer useful insights.
Limitations
A cautionary note relates to how the
TOEFL is used. Some institutions set section or
skill score requirements either
by themselves or in combination with a total score, while
others plan to set a total score
standard in order to make use
of the score information in
ways they deem best suited to
their purposes (ETS, 2005b).
This application of TOEFL iBT scores could affect the test’s reliability for academic application, since test takers may shift their focus on the actual exam to match the score requirements of their preferred universities. Teachers and students
should be aware of this form of
washback, which derives from
the use of the test, rather than
from the test itself.
Another concern relates to how test reliability contributes to test validity. “The more reliable the
scores are, the more confidence score users
have in using the scores for making important
decisions about test takers” (ETS, 2008a). In
Zhang’s (2008) analyses of repeater performance, the “high to moderate correlations between the two test scores indicated a high degree of consistency in repeaters’ rank orders of their scores” (Zhang, 2008, p. 10). This suggests
there is a high degree of reliability for the
TOEFL iBT, even for people who take the test
more than once.
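For readers who want to see what "consistency in rank orders" means in practice, the following is a minimal sketch of a Spearman rank correlation between first and second attempts. It is an illustration only: the scores are invented, the function names are hypothetical, and it is not a reproduction of the analysis reported by Zhang (2008), whose exact method is not described here.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(first, second):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(first), average_ranks(second))


# Hypothetical Reading scores (0-30) for five repeaters; NOT real TOEFL data.
first_attempt = [18, 22, 25, 15, 28]
second_attempt = [20, 19, 27, 17, 29]

rho = spearman(first_attempt, second_attempt)
print(round(rho, 2))  # 0.9 for this made-up data
```

A value near 1 means that repeaters kept roughly the same position relative to one another on both attempts, even if everyone's raw score rose; a value near 0 would mean the second attempt reshuffled the ordering.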
Of course, there are multifaceted variables at play in tertiary studies – linguistic, as well
as non-linguistic factors. The
TOEFL iBT does not measure
non-linguistic factors; stakeholders should be
sure to recognize this limitation and refrain
from inferring the predictive validity of iBT
scores. In fact, a study was undertaken at the
University of Western Ontario and Brescia University College to support their practice of using TOEFL scores as part of the overall academic profile of applicants, “but with no explicit cutoff score” (Simner & Mitchell, 2007),
meaning an iBT score is used to inform, but not
to dictate, acceptance. This approach utilizes
the strength of the TOEFL to measure the language skills that are important to academic success, but it avoids overdependence on a TOEFL
score as a predictor of overall academic success.
Still, a reason given for taking the
TOEFL iBT is that a successful test taker “will be
able to ...read textbooks, perform online research, speak with professors and other students, write academic papers, reports, e-mails
and more” (ETS, 2009c). Additionally, many ETS
publications are equipped with slogans such as,
“TOEFL scores open more doors” (ETS, 2007).
However, there appears to be inadequate research to support this claim. A study done in the
late 1990s at the University of Bahrain suggests
that success on the TOEFL is a poor predictor of
academic success (Al-Musawi & Al-Ansari,
1999). As this research was done on a pre-iBT
version, interpretations must not be overgeneralized. More longitudinal studies that
track the academic success of TOEFL iBT test
takers would be useful.
Conclusion
The TOEFL iBT is recognized for its reliability and validity as a large-scale standardized proficiency test of English for Academic
Purposes. Within this context, the Reading section of the test has evolved to better simulate
the reading task conditions faced by students in
university settings. The incorporation of integrated reading, listening, and writing tasks
highlights the importance of simulating authentic reading tasks for tertiary studies and indicates that recent non-ETS research on validity
has been carefully considered. In recent years, ESL professionals have carried out extensive research projects on the TOEFL iBT, funded both by ETS and by outside parties. Such research from those who would be more likely to critically analyze the test and ultimately improve its validity is welcomed by ETS and is most useful for EAP instructors who may not understand the constructs measured by this preparatory tertiary exam. Although there are a
number of issues discussed in this review of
which EAP instructors need to be critically
aware, the iBT Reading test remains at the cutting edge of language testing in terms of validity and reliability.
References
Al-Musawi, N. M. & Al-Ansari, S. H. (1999). Test
of English as a Foreign Language and First
Certificate of English tests as Predictors of
Academic Success for Undergraduate Students at the University of Bahrain. System:
An International Journal of Educational Technology and Applied Linguistics, 27(3), 389–399.
Alderson, J. C. (2009). Test of English as a Foreign Language™: Internet-based Test (TOEFL iBT®). Language Testing, 26(4), 621–631.
American Educational Research Association,
American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and
psychological testing. Washington, DC:
American Educational Research Association.
Chapelle, C. A., Enright, M. K., & Jamieson, J. M.
(Eds). (2008). Building a validity argument
for the Test of English as a Foreign Language.
New York: Routledge.
Cohen, A., & Upton, T. (2006). Strategies in responding to the New TOEFL reading tasks
(TOEFL Monograph No. MS-33). Princeton,
NJ: Educational Testing Service.
Cumming, A., Kantor, R., Baba, K., Eouanzoui,
K., Erdosy, U., & James, M. (2006). Analysis
of discourse features and verification of scoring levels for independent and integrated
prototype writing tasks for new TOEFL
(TOEFL Monograph MS-30). Princeton, NJ:
Educational Testing Service.
ETS. (2005a). TOEFL iBT Score Comparison Tables. Princeton, NJ: Educational Testing Service.
ETS. (2005b). TOEFL iBT Scores Set by Universities and Other Score Users. Princeton, NJ:
Educational Testing Service.
ETS. (2006). The Official Guide to the new TOEFL
iBT. Princeton, NJ: Educational Testing Service.
ETS. (2007). TOEFL® iBT Score Reliability and
Generalizability. Princeton, NJ: Educational
Testing Service.
ETS. (2008a). Reliability and Comparability of
TOEFL® iBT Scores. Princeton, NJ: Educational Testing Service.
ETS. (2008b). TOEFL iBT tips – How to prepare
for the TOEFL iBT. Princeton, NJ: Educational Testing Service.
ETS. (2009a). For Academic Institutions. Retrieved from: http://www.ets.org/portal/site/ets/menuitem.Fab2360b1645a1de9b3a0779f1751509/?vgnextoid=6be9d898c84f4010VgnVCM10000022f95190RCRD
ETS. (2009b). Test Content. Retrieved from: http://www.ets.org/portal/site/ets/menuitem.1488512ecfd5b8849a77b13bc3921509/?vgnextoid=ff58af5e44df4010VgnVCM10000022f95190RCRD&vgnextchannel=ab16197a484f4010VgnVCM10000022f95190RCRD
ETS. (2009c). About the TOEFL iBT. Retrieved from: http://www.ets.org/portal/site/ets/menuitem.1488512ecfd5b8849a77b13bc3921509/?vgnextoid=f138af5e44df4010VgnVCM10000022f95190RCRD&vgnextchannel=b5f5197a484f4010VgnVCM10000022f95190RCRD
Gomez, P. G., Noah, A., Schedl, M., Wright, C., &
Yolkut, A. (2007). Proficiency descriptors
based on a scale-anchoring study of the new
TOEFL iBT reading test. Language Testing,
24(3), 417–444.
Jamieson, J., Jones, S., Kirsch, I., Mosenthal, P., &
Taylor, C. (2000). TOEFL 2000 framework: A
working paper (TOEFL Monograph MS-16).
Princeton, NJ: Educational Testing Service.
Liu, O. L., Schedl, M., Malloy, J., & Kong, N.
(2009). Does content knowledge affect TOEFL
iBT reading performance? A confirmatory
approach to differential item functioning.
TOEFL iBT Research Report. Princeton, NJ:
Educational Testing Service.
Simner, M. L., & Mitchell, J. B. (2007).
Validation of the TOEFL as a Canadian University Admissions Requirement. Canadian
Journal of School Psychology, 22, 182-190.
TOEFL website: www.toefl.org/
Wall, D. & Horák, T. (2006). The impact of
changes in the TOEFL examination on teaching and learning in Central and Eastern
Europe: Phase 1, The baseline study (TOEFL
Monograph No. MS-34). Princeton, NJ: Educational Testing Service.
Zhang, Y. (2008). Repeater analyses for TOEFL
iBT. (ETS Research Memorandum 08-05).
Princeton, NJ: Educational Testing Service.