
TOEFL iBT Review: The Reading Section

This paper discusses the educational purposes of the TOEFL iBT Reading section, important research into its validity, and the implications and limitations of its use.

TEACHERS OF ENGLISH AS A SECOND LANGUAGE OF ONTARIO, Contact, volume 36, issue 3

By Kimberley Hindy & Derek Martin

Since its inception, the TOEFL has evolved from a paper-based test (PBT) to an Internet-based test, the TOEFL iBT (Wall & Horák, 2008, p. ii). Early publications in the TOEFL Monograph series set out a preliminary working framework for the development of the iBT, stating that the goals of the test development program were to design a test that “was more reflective of communicative competence models, included tasks that integrated the language modalities tested, [and] provided more information than current TOEFL scores did about international students’ ability to use English in an academic environment” (Jamieson et al., 2000, p. 3). Chapelle et al. provide a detailed history of the iBT’s distribution worldwide (2008, pp. 359–361). The TOEFL website (www.toefl.org/) contains pertinent information for test takers, academic institutions, and English language teachers, as well as the TOEFL iBT Research series, TOEFL Research Reports, and Monograph series reports. This review of the TOEFL iBT Reading section describes its educational purposes and highlights critical areas of validity research, along with implications and limitations of the test’s use. It is our hope that this review will be a useful reference for teachers of English for Academic Purposes (EAP).

Educational Purposes

The purpose of the TOEFL iBT is to assess English proficiency for academic purposes: “the TOEFL® test measures [the test taker’s] ability to communicate in English in colleges and universities” (www.toefl.org/). TOEFL scores are accepted by more than 6,000 colleges, universities, licensing agencies, and immigration authorities in 136 countries (Alderson, 2009, p. 621).
ETS asserts that the “test is scored using methods that ensure unbiased results and a quality control process that meets the highest standards of fairness and objectivity” (ETS, 2009a). ETS aims to simulate tasks that are typical of university settings, which “ensures applicants are equipped with the skills students need in a higher education classroom” (ETS, 2009b).

The Reading section in particular assesses the test taker’s ability to understand and perform university-level academic reading tasks. The specific purposes for academic reading that the TOEFL iBT aims to address are as follows: reading to find information (i.e., effectively scanning text for key facts and important information), basic comprehension (i.e., understanding the general topic or main idea, major points, important facts and details, and vocabulary in context), and reading to learn (i.e., recognizing the organization and purpose of a passage and inferring how ideas throughout the passage connect) (Wall & Horák, 2008, p. 1).

There are fewer but longer passages than in previous versions of the TOEFL (500–700 vs. 300–400 words; 3–5 reading passages with 12–14 questions per passage) on a variety of topics, a change made in order to more authentically reproduce the reading tasks test takers can expect to experience in university. The 60 to 100 minutes allotted for this section includes time for reading the passages (categorized as Exposition, Argumentation, and Historical) and answering the questions (ETS, 2006, p. 19). From these alterations, EAP instructors may infer that students could benefit from strategy use through exposure to longer reading passages in order to more effectively prepare them for reading tasks at the tertiary level.

TOEFL iBT Reading Question Formats

The TOEFL iBT Reading section is scored out of 30; ETS has published score conversion tables for the PBT and Computer-Based Test (CBT) which offer useful interpretation guidelines for the new iBT scores (ETS, 2005a). There are three question formats in the Reading section: those with four choices and a single answer in traditional multiple-choice format, those with four choices and a single answer that require test takers to “insert a sentence” where it best fits, and “reading to learn” questions with more than four choices and more than one possible correct answer (ETS, 2008b, pp. 9–10). The innovative “reading to learn” questions test the taker’s ability to recognize paragraph organization and relationships among facts and ideas in different parts of the passage. The glossary feature of the iBT allows test takers to select “special purpose words and phrases” to view a definition or explanation of the term (ibid, p. 10). By including this glossary feature, the interactive nature of the iBT allows for more vocabulary support than the PBT and may strengthen the validity of “reading to learn.”

Issues of Content and Construct Validity

Validity is “the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests” (AERA, APA, & NCME, 1999, p. 9). The validity claims of the Reading section of the TOEFL iBT have been the subject of much research; in fact, there is an annual call for research proposals from researchers outside of ETS.

As mentioned above, the change to longer reading passages in the iBT means that there are also fewer passages. A concern arising from this change is whether the decrease in topic variety increases the likelihood that an examinee’s familiarity with the particular content of the passages will influence the examinee’s reading performance. Having identified this concern, Liu et al. used differential item functioning (DIF) and differential bundle functioning (DBF) analyses to investigate the impact of outside knowledge on TOEFL iBT reading performance. The rationale for this research was that the TOEFL iBT is “a test of communicative language skills rather than of specific content knowledge, and therefore the test results should not be affected by test takers’ major field of study or cultural background” (Liu et al., 2009, p. vi). The researchers found very little effect of this kind, which supports the claim by ETS that “test takers should not be concerned if they are unfamiliar with a topic. The passage contains all the information needed to answer the questions” (ETS, 2008b, p. 8). In a high-stakes testing situation such as the TOEFL iBT, it is crucial to eliminate unwarranted item advantages for certain test takers in order to ensure the validity of the test scores (Liu et al., 2009, p. 4).

Of course, as the iBT continues to evolve, more studies will be required to maintain content validity. For their part, EAP instructors must be aware that the iBT does not test specific knowledge of particular subjects and should expose their students to a variety of reading topics, perhaps taking into consideration factors such as the interests and intended academic specialties of their students, but not the anticipated topics of TOEFL reading passages.

A construct validity concern of the Reading section is the possibility that test takers may use strategies other than the reading strategies that the test is intended to assess.
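For readers curious about the mechanics of DIF screening mentioned in connection with the Liu et al. study, the sketch below illustrates one widely used method, the Mantel-Haenszel procedure. (Liu et al. used a confirmatory approach, so this is an illustration of the general idea rather than their exact method, and the function name and data layout are our own inventions.) Test takers are matched on total score; at each score level a 2x2 table of group membership against item correctness is tallied, and the pooled odds ratio is mapped onto the ETS delta scale, where values near zero indicate negligible DIF.

```python
from collections import defaultdict
from math import log

def mh_ddif(responses):
    """Mantel-Haenszel D-DIF for a single test item.

    responses: iterable of (group, matching_score, correct) tuples,
    where group is "ref" or "focal" and correct is a bool.
    """
    # One 2x2 table per matching-score level:
    # [ref correct, ref incorrect, focal correct, focal incorrect]
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for group, score, correct in responses:
        cell = (0 if group == "ref" else 2) + (0 if correct else 1)
        tables[score][cell] += 1
    num = den = 0.0
    for a, b, c, d in tables.values():
        total = a + b + c + d
        if total:
            num += a * d / total  # ref-correct x focal-incorrect
            den += b * c / total  # ref-incorrect x focal-correct
    alpha = num / den             # pooled odds ratio across score levels
    return -2.35 * log(alpha)     # ETS delta scale; |D| < 1.0 is negligible
```

If, say, examinees majoring in a passage’s topic (the “reference” group here) answered an item correctly far more often than equally proficient examinees from other fields, the D-DIF value would move well away from zero, flagging the item for review.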
Such strategies have been referred to as “test wiseness” strategies and may include various ways of selecting answers without properly reading and comprehending a text passage (Cohen & Upton, 2006, p. 4). More generally, test takers may “find themselves using strategies that they would not use under pretest conditions. It is for this reason that during the pilot phase, it is crucial for test constructors to find out what their tests are actually measuring” (ibid, p. 5). Cohen and Upton analyzed students’ verbal reports to determine the reading and test-taking strategies they used to answer reading comprehension questions. Data were collected from a sample group of 32 students, from four language groups (Chinese, Japanese, Korean, and other languages), as they responded to prototype reading comprehension tasks mimicking those of the TOEFL iBT test. It was noted that test takers did not rely on “test wiseness” strategies, but that their strategies:

reflect the fact that respondents were in actuality engaged with the reading test tasks in the manner desired by the test designers... respondents were actively working to understand the text, to understand the expectations of the questions, to understand the meaning and implications of the different options in light of the text, and to select and discard options based on what they understood about the text (p. 105).

These findings suggest that test takers might achieve high scores on iBT reading comprehension tasks by using reading strategies or appropriate test management strategies. This indicates that EAP instructors should be cognizant of helping learners improve their reading strategies. If preparation materials or programs focus on test taking ‘tricks,’ ‘tips,’ ‘strategies,’ or ‘cracking the TOEFL’ (which a quick Internet search will reveal), they should be viewed as suspect. Of course, as students become more familiar with the TOEFL iBT through such practice materials and preparatory courses, construct validity may be weakened, since what they learn may include not only relevant reading skills but also more sophisticated “test wiseness” skills. Worthy of note is that in the case of the Cohen and Upton study, the sample was taken from an East Asian context. More research to verify their findings against other cultural and linguistic contexts is needed.

In addition to the isolated Reading section, the iBT aims to measure how well a test taker is able to use integrated language skills in the university classroom. Thus, it contains integrated sections which model academic requirements of “combining information they have heard in class lectures with what they have read in textbooks or other materials” (ETS, 2008b, p. 22) by having test takers incorporate information from a reading passage into their spoken or written responses. As preliminary versions of integrated tasks were contemplated, Cumming et al. (2006) supported the inclusion of integrated reading-writing and/or listening-writing tasks as measures of English writing proficiency in the TOEFL iBT. These prototype tasks allowed written discourse that differed significantly, in a variety of ways, from that produced in the independent essay on the TOEFL, providing an additional measure of writing ability that can be scored reliably and that interconnects English language comprehension purposefully with text production (Cumming et al., 2006, p. 46). Additionally, the new scores of the TOEFL iBT come with “helpful performance feedback on their score reports” (ETS, 2006, p. 5; see Table 1).
ETS provides comprehensive scoring information that includes scores for the four skills and a total score. An understanding of the quality of feedback given by ETS may be beneficial for EAP instructors to consider as they provide their own feedback; since this feedback component is a new development, it is likely evidence of the influence of extensive external research. By including integrated tasks and skill-specific feedback, teachers, learners, and test takers may experience further positive washback. More research would surely offer interesting insights.

Limitations

A cautionary note relates to how the TOEFL is used. Some institutions set section or skill score requirements, either on their own or in combination with a total score, while others set a total score standard, making use of the score information in the ways they deem best suited to their purposes (ETS, 2005b). This application of TOEFL iBT scores could affect the test’s reliability for academic application, since test takers may adjust their focus on the exam according to the score requirements of their preferred universities. Teachers and students should be aware of this form of washback, which derives from the use of the test rather than from the test itself.

Another concern relates to how test reliability contributes to test validity. “The more reliable the scores are, the more confidence score users have in using the scores for making important decisions about test takers” (ETS, 2008a). In Zhang’s (2008) analyses of repeater performance, the “high to moderate correlations between the two test scores indicated a high degree of consistency in repeaters’ rank orders of their scores” (Zhang, 2008, p. 10). This suggests there is a high degree of reliability for the TOEFL iBT, even for people who take the test more than once.
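To make this rank-order claim concrete, the consistency Zhang describes is essentially a correlation between repeaters’ first and second scores. The short sketch below computes a Pearson correlation on invented scores (the numbers are hypothetical, purely for illustration); a value near 1 means repeaters keep roughly the same rank order across sittings.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical Reading scores (out of 30) for five repeat test takers:
first_sitting = [18, 22, 25, 15, 28]
second_sitting = [20, 23, 26, 17, 29]
r = pearson(first_sitting, second_sitting)  # close to 1: stable rank order
```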
Of course, there are multifaceted variables at play in tertiary studies: linguistic as well as non-linguistic factors. The TOEFL iBT does not measure non-linguistic factors; stakeholders should be sure to recognize this limitation and refrain from inferring the predictive validity of iBT scores. A study undertaken at the University of Western Ontario and Brescia University College supports their practice of using TOEFL scores as part of the overall academic profile of applicants, “but with no explicit cutoff score” (Simner & Mitchell, 2007), meaning an iBT score is used to inform, but not to dictate, acceptance. This approach utilizes the strength of the TOEFL to measure the language skills that are important to academic success, but it avoids overdependence on a TOEFL score as a predictor of overall academic success.

Still, a reason given for taking the TOEFL iBT is that a successful test taker “will be able to ...read textbooks, perform online research, speak with professors and other students, write academic papers, reports, e-mails and more” (ETS, 2009c). Additionally, many ETS publications carry slogans such as “TOEFL scores open more doors” (ETS, 2007). However, there appears to be inadequate research to support this claim. A study done in the late 1990s at the University of Bahrain suggests that success on the TOEFL is a poor predictor of academic success (Al-Musawi & Al-Ansari, 1999). As this research was done on a pre-iBT version, interpretations must not be overgeneralized. More longitudinal studies that track the academic success of TOEFL iBT test takers would be useful.

Conclusion

The TOEFL iBT is recognized for its reliability and validity as a large-scale standardized proficiency test of English for Academic Purposes.
Within this context, the Reading section of the test has evolved to better simulate the reading task conditions faced by students in university settings. The incorporation of integrated reading, listening, and writing tasks highlights the importance of simulating authentic reading tasks for tertiary studies and indicates that recent non-ETS research on validity has been carefully considered. In recent years, ESL professionals have carried out extensive research on the TOEFL iBT, funded both by ETS and by outside parties. Such research, coming from those most likely to analyze the test critically and ultimately improve its validity, is welcomed by ETS and is especially useful for EAP instructors who may not be familiar with the constructs this preparatory tertiary exam measures. Although there are a number of issues discussed in this review of which EAP instructors need to be critically aware, the iBT Reading test remains at the cutting edge of language testing in terms of validity and reliability.

References

Al-Musawi, N. M., & Al-Ansari, S. H. (1999). Test of English as a Foreign Language and First Certificate of English tests as predictors of academic success for undergraduate students at the University of Bahrain. System: An International Journal of Educational Technology and Applied Linguistics, 27(3), 389-399.

Alderson, J. C. (2009). Test of English as a Foreign Language™: Internet-based Test (TOEFL iBT®). Language Testing, 26(4), 621-631.

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.

Chapelle, C. A., Enright, M. K., & Jamieson, J. M. (Eds.). (2008). Building a validity argument for the Test of English as a Foreign Language. New York: Routledge.

Cohen, A., & Upton, T. (2006).
Strategies in responding to the new TOEFL reading tasks (TOEFL Monograph No. MS-33). Princeton, NJ: Educational Testing Service.

Cumming, A., Kantor, R., Baba, K., Eouanzoui, K., Erdosy, U., & James, M. (2006). Analysis of discourse features and verification of scoring levels for independent and integrated prototype writing tasks for new TOEFL (TOEFL Monograph No. MS-30). Princeton, NJ: Educational Testing Service.

ETS. (2005a). TOEFL iBT score comparison tables. Princeton, NJ: Educational Testing Service.

ETS. (2005b). TOEFL iBT scores set by universities and other score users. Princeton, NJ: Educational Testing Service.

ETS. (2006). The official guide to the new TOEFL iBT. Princeton, NJ: Educational Testing Service.

ETS. (2007). TOEFL® iBT score reliability and generalizability. Princeton, NJ: Educational Testing Service.

ETS. (2008a). Reliability and comparability of TOEFL® iBT scores. Princeton, NJ: Educational Testing Service.

ETS. (2008b). TOEFL iBT tips: How to prepare for the TOEFL iBT. Princeton, NJ: Educational Testing Service.

ETS. (2009a). For academic institutions. Retrieved from http://www.ets.org/portal/site/ets/menuitem.Fab2360b1645a1de9b3a0779f1751509/?vgnextoid=6be9d898c84f4010VgnVCM10000022f95190RCRD

ETS. (2009b). Test content. Retrieved from http://www.ets.org/portal/site/ets/menuitem.1488512ecfd5b8849a77b13bc3921509/?vgnextoid=ff58af5e44df4010VgnVCM10000022f95190RCRD&vgnextchannel=ab16197a484f4010VgnVCM10000022f95190RCRD

ETS. (2009c). About the TOEFL iBT. Retrieved from http://www.ets.org/portal/site/ets/menuitem.1488512ecfd5b8849a77b13bc3921509/?vgnextoid=f138af5e44df4010VgnVCM10000022f95190RCRD&vgnextchannel=b5f5197a484f4010VgnVCM10000022f95190RCRD

Gomez, P. G., Noah, A., Schedl, M., Wright, C., & Yolkut, A. (2007).
Proficiency descriptors based on a scale-anchoring study of the new TOEFL iBT reading test. Language Testing, 24(3), 417-444.

Jamieson, J., Jones, S., Kirsch, I., Mosenthal, P., & Taylor, C. (2000). TOEFL 2000 framework: A working paper (TOEFL Monograph No. MS-16). Princeton, NJ: Educational Testing Service.

Liu, O. L., Schedl, M., Malloy, J., & Kong, N. (2009). Does content knowledge affect TOEFL iBT reading performance? A confirmatory approach to differential item functioning (TOEFL iBT Research Report). Princeton, NJ: Educational Testing Service.

Simner, M. L., & Mitchell, J. B. (2007). Validation of the TOEFL as a Canadian university admissions requirement. Canadian Journal of School Psychology, 22, 182-190.

TOEFL website: www.toefl.org/

Wall, D., & Horák, T. (2006). The impact of changes in the TOEFL examination on teaching and learning in Central and Eastern Europe: Phase 1, the baseline study (TOEFL Monograph No. MS-34). Princeton, NJ: Educational Testing Service.

Zhang, Y. (2008). Repeater analyses for TOEFL iBT (ETS Research Memorandum 08-05). Princeton, NJ: Educational Testing Service.