This study reviews research into the IELTS Speaking and Listening modules to build a validity argument for them. Based on Kane’s (1992, 2001, 2004) validity argument framework, the researcher postulates seven assumptions to examine the two modules’ interpretive arguments, as well as the sufficiency and efficacy of research conducted on them. The Speaking Module has been thoroughly studied in many respects, but its validity argument is nevertheless seriously compromised because IELTS has yet to articulate a constituent theory of second-language speech on which the module’s analytic scoring system is based, and because a number of studies have shown very limited correlations between performance on the module and performance in target language domains. The Listening Module is the least-researched module of the test, and is in urgent need of investigation before a validity argument can even be attempted.
At the top of the assessment pyramid are multinational testing corporations, best known by the names of their standardized tests, such as IELTS, TOEIC, TOEFL, BULATS, TKT, the Cambridge ESOL main suite, or G-TELP (there are many other aspirants). In some ways these testing companies can be thought of as the Big Pharma corporations (i.e., drug companies) of the educational world. Like Big Pharma, they face constant challenges to their ethics and reliability from within and without, and like Big Pharma they are rather prone to corrupting the very activities they were designed to support. The possible corruption of language learning by the requirements of testing is known as wash-back. Wash-back is not always malignant. The analysis in this paper is a tentative attempt to manipulate the wash-back from an international test in a manner that actually assists genuine language acquisition. This material is drawn from master's degree work (2005) and comes to 138 pages.
Advances in technology have paved the way for the inclusion of videos in L2 listening comprehension tests, and the video listening test format is becoming more popular in various contexts. As existing research shows, however, the practice of video-based listening testing is still debated. Taking an argument-based approach (Chapelle, Enright, & Jamieson, 2008), this paper focuses on the issues of construct definition and test authenticity in video-based listening tests. The inferences of Domain definition and Explanation are introduced to help contextualize the issues. Empirical studies suggest that the controversial role of visual-related skills in the construct of video-based listening tests is not well recognized either in theory or in practice. The commonly held assumption that introducing videos into listening tests boosts authenticity becomes questionable on closer examination of the two aspects of authenticity, namely situational and interactional authenticity. More empirical research and theoretical work are therefore needed to warrant the use of videos in listening tests. Some suggestions concerning video-based listening test development and validation studies are made at the end of the paper.
This article develops a validity inquiry heuristic from several Elder Sophists’ positions on the nomos–physis controversy of the fifth and fourth centuries B.C.E. in Greece. The nomos–physis debate concerned the nature and existence of knowledge and virtue, and maps well onto current discussion of validity inquiry in writing assessment. Beyond rearticulating validity as a reflexive, agency-constructing, rhetorical act, this article attempts to bridge disciplines by articulating validity in terms of rhetorical theory, and understanding ancient sophistic rhetorical positions as validity theory.
Differential item functioning (DIF) exists when examinees of equal ability from different groups have different probabilities of answering a given item correctly. This study examined gender DIF in the PhD Entrance Exam of TEFL (PEET) in Iran, using both logistic regression (LR) and one-parameter item response theory (1-p IRT) models. The PEET is a national, centralized written examination designed to provide information on the eligibility of TEFL applicants for PhD programs. The 2013 administration of the test provided score data for a sample of 999 Iranian PhD applicants (397 males and 602 females). First, the data were subjected to DIF analysis using an LR model. Then, to triangulate the findings, a 1-p IRT procedure was applied. The results indicated (1) more items flagged for DIF by LR than by 1-p IRT; (2) DIF cancellation (the number of DIF items was equal for males and females), as revealed through LR; (3) an equal number of uniform and non-uniform DIF items, as tracked via LR; and (4) female superiority in test performance, as revealed via IRT analysis. Overall, the findings indicated that the PEET suffers from DIF. Test developers and policymakers (such as NOET and MSRT) are therefore advised to take these findings into serious consideration and to promote fair testing practice by devoting effort to less biased test development and decision making.
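The logistic regression approach to DIF named in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the study's actual procedure or data: it uses the common three-model comparison (matching variable only, plus group, plus the matching-by-group interaction), fits each model with a hand-written Newton-Raphson routine, and runs on simulated responses. The function names (`max_loglik`, `dif_lr_stats`, `simulate_item`) are hypothetical, and the simulated ability value stands in for the total-score matching variable that an operational analysis would normally use.

```python
import math
import random


def _solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [a[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def _prob(beta, x):
    """Logistic probability of a correct response (linear predictor clipped for safety)."""
    eta = sum(b * v for b, v in zip(beta, x))
    eta = max(-30.0, min(30.0, eta))
    return 1.0 / (1.0 + math.exp(-eta))


def max_loglik(rows, y, iters=50):
    """Fit a logistic regression by Newton-Raphson and return its maximized log-likelihood."""
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for x, yi in zip(rows, y):
            p = _prob(beta, x)
            w = p * (1.0 - p)
            for i in range(k):
                grad[i] += (yi - p) * x[i]
                for j in range(k):
                    hess[i][j] += w * x[i] * x[j]
        step = _solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-8:
            break
    return sum(
        math.log(_prob(beta, x)) if yi else math.log(1.0 - _prob(beta, x))
        for x, yi in zip(rows, y)
    )


def dif_lr_stats(match, group, y):
    """Return likelihood-ratio statistics for uniform and non-uniform DIF on one item."""
    ll_base = max_loglik([[1.0, s] for s in match], y)
    ll_group = max_loglik([[1.0, s, g] for s, g in zip(match, group)], y)
    ll_full = max_loglik([[1.0, s, g, s * g] for s, g in zip(match, group)], y)
    uniform = 2.0 * (ll_group - ll_base)      # group main effect, 1 degree of freedom
    nonuniform = 2.0 * (ll_full - ll_group)   # matching-by-group interaction, 1 degree of freedom
    return uniform, nonuniform


def simulate_item(n, group_effect, seed):
    """Simulate one item; group 1 gets `group_effect` extra logits (hypothetical data)."""
    rng = random.Random(seed)
    ability = [rng.gauss(0.0, 1.0) for _ in range(n)]
    group = [float(i % 2) for i in range(n)]
    y = [
        1 if rng.random() < _prob([0.0, 1.0, group_effect], [1.0, a, g]) else 0
        for a, g in zip(ability, group)
    ]
    return ability, group, y


CHI2_CRIT_1DF = 3.84  # 5% critical value of the chi-square distribution with 1 df

ability, group, y = simulate_item(n=1000, group_effect=1.0, seed=0)
uniform, nonuniform = dif_lr_stats(ability, group, y)
print("uniform DIF statistic:", round(uniform, 2), "flagged:", uniform > CHI2_CRIT_1DF)
print("non-uniform DIF statistic:", round(nonuniform, 2), "flagged:", nonuniform > CHI2_CRIT_1DF)
```

An item is flagged for uniform DIF when adding the group term improves fit significantly, and for non-uniform DIF when the interaction term does. In practice a statistics package would replace the hand-written fitting routine, and an effect-size criterion is often applied alongside the significance test.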
It is commonly claimed that the conclusion of a valid deductive argument is contained in its premises and says nothing new. In 'Deduction and Novelty,' in The Reasoner 5 (4), pp. 56-57, I refuted that claim. In The Reasoner, 8 (3), pp. 24-25, David McBride criticised my refutation. I show that McBride’s arguments are unsound.
5245 North Backer Ave. M/S PB 98, Fresno, CA 93740-8001, ainoue@csufresno.edu Asao B. Inoue is an assistant professor and Assessment Expert for the College of Arts and Humanities at California State University, Fresno, where he teaches graduate courses in composition pedagogy, writing assessment, and the rhetoric of racism. His most recent article, “Community-Based Assessment Pedagogy,” focused on migrating assessment theory, such as validity theory and fourth generation evaluation, into the classroom, and was published in Assessing Writing (2004). A related exchange with Peter Elbow is planned in forthcoming issues of the same journal (11.2 and 11.3). Journal of Writing Assessment Vol. 3, No. 1, pp. 31-54
A few computer-assisted language learning (CALL) instruments have been developed in Iran to measure EFL (English as a foreign language) learners’ attitude toward CALL. However, these instruments have no solid validity argument and accordingly would be unable to provide a reliable measurement of attitude. The present study aimed to develop a CALL attitude instrument (CALLAI) to be used in the Iranian EFL context.
A pool of 633 survey items was developed and 27 items were judged to be appropriate for measuring CALL attitude. The chosen items were translated and back-translated by experts and were administered to 1001 Iranian EFL learners. The psychometric features of the items were examined using three primary data analysis techniques: principal component analysis (PCA), confirmatory factor analysis (CFA), and the Rasch-Andrich rating scale model. Finally, a validity argument for CALLAI was developed which comprised five primary inferences. The findings from the psychometric analysis were mapped onto the validity framework. The validity framework is generally well supported, although adding a few items could yield higher reliability coefficients.
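The closing remark that a few more items could raise reliability can be illustrated with the Spearman-Brown prophecy formula from classical test theory. This is an assumption made for illustration only: the abstract does not say which reliability coefficient was reported (a Rasch analysis may report separation reliability instead), the starting value of 0.80 is hypothetical, and the formula presumes that the added items are comparable in quality to the existing ones.

```latex
% Spearman-Brown: predicted reliability after changing test length by a factor k
\rho^{*} = \frac{k\,\rho}{1 + (k - 1)\,\rho},
\qquad k = \frac{\text{new number of items}}{\text{original number of items}}

% Hypothetical example: 27 items with \rho = 0.80, lengthened to 33 items
k = \frac{33}{27} \approx 1.22,
\qquad \rho^{*} \approx \frac{1.22 \times 0.80}{1 + 0.22 \times 0.80} \approx 0.83
```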