Papers by Farshad Effatpanah
International Journal of Testing, 2024
This study applied the Mixed Rasch Model (MRM) to the listening comprehension section of the International English Language Testing System (IELTS) to detect latent class differential item functioning (DIF) by exploring multiple profiles of second/foreign language listeners. Item responses of 462 examinees to an IELTS listening test were subjected to MRM analysis. Three classes emerged: (1) 'Medium-level Stimulus Processors' who can somewhat synchronize top-down and bottom-up processing, handle multitasking to a certain extent, comprehend moderately complex items, and manage input delivered at a relatively fast pace; (2) 'High-level Stimulus Processors' who have greater abilities in synchronizing top-down and bottom-up processing, multitasking, understanding complex items, and handling fast delivery input and more paraphrased content; and (3) 'Low-level Stimulus Processors' who rely more on bottom-up processing, have limited lexico-grammatical knowledge, struggle with multitasking and complex items, and find fast delivery input and paraphrased content challenging. Differences across the classes were further explained.
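The core of the MRM is that item difficulties differ across latent classes, and each examinee's class membership is inferred from their response pattern. A minimal sketch of that idea follows; the two classes, difficulty values, and mixing weights are hypothetical illustrations, not estimates from the study:

```python
import math

def rasch_p(theta, b):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def mrm_posterior(responses, theta, class_difficulties, class_weights):
    """Marginal likelihood of a response pattern under a mixed Rasch model,
    where item difficulties differ by latent class, plus the posterior
    probability of membership in each class."""
    joint = []
    for w, difficulties in zip(class_weights, class_difficulties):
        lik = w
        for x, b in zip(responses, difficulties):
            p = rasch_p(theta, b)
            lik *= p if x == 1 else 1.0 - p
        joint.append(lik)
    total = sum(joint)
    return total, [j / total for j in joint]

# Hypothetical 2-class example: class A finds item 3 hard, class B items 1-2.
total, post = mrm_posterior([1, 1, 0], theta=0.0,
                            class_difficulties=[[-1.0, -1.0, 2.0],
                                                [2.0, 2.0, -1.0]],
                            class_weights=[0.5, 0.5])
```

The response pattern (items 1-2 right, item 3 wrong) matches class A's difficulty profile, so the posterior concentrates on class A.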
Language Testing, 2024
This study aimed to propose a new method for scoring C-Tests as measures of general language proficiency. In this approach, the unit of analysis is sentences rather than gaps or passages. That is, the gaps correctly reformulated in each sentence are aggregated as the sentence score, and then each sentence is entered into the analysis as a polytomous item. The Rasch partial credit model is applied to analyze the test. To investigate the effectiveness of the new method, the results of this new strategy were compared with those derived from the passage and gap levels, as well as from combining locally dependent items, using the dichotomous Rasch model, the rating scale model, and the partial credit model. The scores obtained from the administration of a C-Test comprising four English passages, each containing 25 gaps, to 160 participants were subjected to dichotomous and polytomous Rasch model analyses. The models were compared regarding individual item fit, person/item separation and reliability, global fit (e.g., deviance), unidimensionality, local item dependence (LID), and person parameters. Results showed the effectiveness of the new scoring method. The suggested strategy increases the number of items compared to the super-item approach, provides information at the sentence level, and reduces the impact of LID.
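The sentence-level scoring step and the partial credit model it feeds into can be sketched in a few lines; the passage layout below is a hypothetical example, and the category probabilities follow the standard PCM formula:

```python
import math

def sentence_scores(gap_scores, sentence_spans):
    """Aggregate dichotomous gap scores (0/1) into one polytomous score
    per sentence; sentence_spans lists the gap indices of each sentence."""
    return [sum(gap_scores[i] for i in span) for span in sentence_spans]

def pcm_probs(theta, thresholds):
    """Category probabilities for one polytomous item under the partial
    credit model, given step difficulties delta_1..delta_m."""
    logits = [0.0]
    for d in thresholds:
        logits.append(logits[-1] + (theta - d))
    exps = [math.exp(l) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

# Hypothetical passage: 7 gaps falling into 3 sentences.
gaps = [1, 0, 1, 1, 1, 0, 1]
spans = [[0, 1], [2, 3, 4], [5, 6]]
scores = sentence_scores(gaps, spans)  # each sentence becomes one polytomous item
```

Each sentence's maximum score equals its number of gaps, so sentences enter the PCM as items with differing numbers of ordered categories.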
Psych, 2024
Likert scales are the most common psychometric response scales in the social and behavioral sciences. Likert items are typically used to measure individuals' attitudes, perceptions, knowledge, and behavioral changes. To analyze the psychometric properties of individual Likert-type items and overall Likert scales, mostly methods based on classical test theory (CTT) are used, including corrected item-total correlations and reliability indices. CTT methods rely heavily on the total scale scores, making it challenging to directly examine the performance of items and response options across varying levels of the trait. In this study, Kernel Smoothing Item Response Theory (KS-IRT) is introduced as a graphical nonparametric IRT approach for the evaluation of Likert items. Unlike parametric IRT models, nonparametric IRT models do not involve strong assumptions regarding the form of item response functions (IRFs). KS-IRT provides graphics for detecting peculiar patterns in items across different levels of a latent trait. Differential item functioning (DIF) can also be examined by applying KS-IRT. Using empirical data, we illustrate the application of KS-IRT to the examination of Likert items on a psychological scale.
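As a rough sketch of the kernel smoothing idea (not the KS-IRT implementation used in the paper), a Nadaraya-Watson regression of item scores on a trait proxy, such as standardized total scores, yields a nonparametric item response function; the data points below are toy values:

```python
import math

def kernel_smooth_irf(trait, scores, grid, bandwidth=0.5):
    """Nadaraya-Watson kernel regression of item scores on a trait proxy:
    a nonparametric item response function evaluated on a grid of trait
    values, with a Gaussian kernel and fixed bandwidth."""
    def gauss(u):
        return math.exp(-0.5 * u * u)
    irf = []
    for t in grid:
        w = [gauss((t - x) / bandwidth) for x in trait]
        irf.append(sum(wi * s for wi, s in zip(w, scores)) / sum(w))
    return irf

# Toy monotone data: low-trait examinees fail the item, high-trait pass.
trait = [-2.0, -1.0, 0.0, 1.0, 2.0]
scores = [0, 0, 1, 1, 1]
irf = kernel_smooth_irf(trait, scores, grid=[-2.0, 0.0, 2.0])
```

Because no response function is imposed, a non-monotone or flat smoothed curve is itself the diagnostic signal, which is what the graphical checks in the paper inspect.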
Educational Methods & Practice, 2024
It has been acknowledged that second/foreign language (L2) writing is a complex and multi-dimensional cognitive process, and linguistic knowledge is the foremost predictor of L2 writing. Previous research on developing models and orientations for characterizing L2 writing and its linguistic features is based on methods rooted in classical test theory (CTT), which mostly tend to overlook qualitative differences among writers. The use of item response theory (IRT) and Rasch models has been disregarded in L2 writing research. This study aimed to psychometrically investigate the dimensionality of linguistic features in L2 writing using the Rasch model. To achieve this, 500 Iranian English as a foreign language (EFL) students wrote an essay marked by four experienced raters using an empirically-derived descriptor-based diagnostic checklist. The scores derived from the marking of the essays were subjected to Rasch model analysis. Individual item/descriptor fit, separation and reliability, unidimensionality, and local item dependence (LID) were examined. The results provided evidence for the multidimensionality of linguistic features in L2 writing. The analysis of the positive and negative item loadings on Factor 1, extracted from the Rasch model residuals, revealed the presence of two sets of descriptors that contribute to the definition of two groups of L2 writers. The first set comprises descriptors with positive loadings mostly related to higher-level linguistic features of L2 writing, including content fulfillment (CON) and organizational effectiveness (ORG). The second set includes descriptors with negative loadings chiefly related to lower-level linguistic features, such as vocabulary use (VOC), grammatical knowledge (GRM), and mechanics (MCH). Implications and suggestions for further research are discussed.
Assessing Writing (Elsevier), 2024
The present study used the Mixed Rasch Model (MRM) to identify multiple profiles in L2 students' writing with regard to several linguistic features, including content, organization, grammar, vocabulary, and mechanics. To this end, a pool of 500 essays written by English as a foreign language (EFL) students was rated by four experienced EFL teachers using the Empirically-derived Descriptor-based Diagnostic (EDD) checklist. The ratings were subjected to MRM analysis. Two distinct profiles of L2 writers emerged from the analyzed sample: (a) Sentence-Oriented and (b) Paragraph-Oriented L2 Writers. Sentence-Oriented L2 Writers tend to focus more on linguistic features, such as grammar, vocabulary, and mechanics, at the sentence level and try to utilize these subskills to generate a written text. In contrast, Paragraph-Oriented Writers are inclined to move beyond the boundaries of a sentence and attend to the structure of a whole paragraph using higher-order features such as content and organization subskills. The two profiles were further examined to capture their unique features. Finally, the theoretical and pedagogical implications of the identification of L2 writing profiles and suggestions for further research are discussed.
Practical Assessment, Research, and Evaluation, 2023
Item response theory (IRT) refers to a family of mathematical models which describe the relationship between latent continuous variables (attributes or characteristics) and their manifestations (dichotomous/polytomous observed outcomes or responses) with regard to a set of item characteristics. Researchers typically use parametric IRT (PIRT) models to measure educational and psychological latent variables. However, PIRT models are based on a set of strong assumptions that often are not satisfied. For this reason, non-parametric IRT (NIRT) models can be more desirable. An exploratory NIRT approach is kernel smoothing IRT (KS-IRT; Ramsay, 1991), which estimates option characteristic curves by a non-parametric kernel smoothing technique. This approach gives only graphical representations of item characteristics in a measure and provides preliminary feedback about the performance of items and measures. Although KS-IRT is not a new approach, its application is far from widespread, and it has seen limited use in psychological and educational testing. The purpose of the present paper is to give a reader-friendly introduction to KS-IRT and then use the KernSmoothIRT package (Mazza et al., 2014, 2022) in R to straightforwardly demonstrate the application of the approach using data from the Children's Test Anxiety Scale.
The Quantitative Methods for Psychology, 2022
Cloze-elide tests are overall measures of both first (L1) and second language (L2) reading comprehension and communicative skills. Research has shown that a time constraint is an effective method to understand individual differences and increase the reliability and validity of tests. The purpose of this study is to investigate the psychometric quality of a speeded cloze-elide test using a polytomous Rasch model, the partial credit model (PCM), by inspecting the fit of four different scoring techniques. To this end, responses of 150 English as a foreign language (EFL) students to a speeded cloze-elide test were analyzed. The comparison of different scoring techniques revealed that scoring based on wrong scores can better explain variability in the data. The results of the PCM indicated that the assumption of unidimensionality holds for the speeded cloze-elide test. However, the partial credit analysis of the data structure revealed that a number of categories do not increase with category values. Finally, suggestions for further research, to better take advantage of the flexibilities of item response theory and Rasch models for explaining count data, are presented.
Psychological Test and Assessment Modeling, 2022
A large number of researchers have explored the use of non-parametric item response theory (IRT) models, including Mokken scale analysis (Mokken, 1971), for inspecting rating quality in the context of performance assessment. Unlike parametric IRT models, such as the Many-Facet Rasch Model (Linacre, 1989), non-parametric IRT models neither entail logistic transformations of ordinal ratings into interval scales nor impose any constraints on the form of item response functions. A disregarded method for examining raters' scoring patterns is non-parametric item characteristic curve estimation using the kernel smoothing approach (Ramsay, 1991), which provides, without giving numerical values, graphical representations for identifying any unsystematic patterns across various levels of the latent trait. The purpose of this study is to use the non-parametric item characteristic curve estimation method for modeling and examining the scoring patterns of raters. To this end, the writing performance of 217 English as a foreign language (EFL) examinees was analyzed. The results of rater characteristic curves, tetrahedron simplex plots, QQ-plots, and kernel density functions across gender subgroups showed that different exploratory plots derived from the non-parametric estimation of item characteristic curves using the kernel smoothing approach can identify various rater effects and provide valuable diagnostic information for examining rating quality and exploring rating patterns, although the interpretation of some graphs is subjective. The implications of the findings for rater training and monitoring are discussed.
International Journal of Language Testing, 2019
The purpose of the present study was twofold: (a) to compare the performance of six cognitive diagnostic models, including a general model (G-DINA), two non-compensatory models (DINA and NC-RUM), and three compensatory models (ACDM, DINO, and C-RUM), at the test level to find the best model for describing the underlying interaction among the listening attributes of the IELTS exam; and (b) to diagnose the performance of Iranian candidates in the listening section of the IELTS. To accomplish these aims, item responses of 310 Iranian test takers to the Listening Sub-test of the IELTS exam were analyzed. The models were first compared in terms of absolute and relative fit indices to select the optimal model. The results showed that the G-DINA model was the best model with regard to all fit indices among the competing models, followed by the C-RUM, ACDM, NC-RUM, DINO, and DINA. Then, the C-RUM, as the best specific CDM, was selected for the second phase of the study. It was found that making inferences and comprehending vocabulary and syntax are the most difficult listening constituents for Iranian IELTS candidates.
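The compensatory/non-compensatory contrast among the compared CDMs can be sketched with the two simplest members, DINA and DINO; the Q-matrix row, attribute profiles, and slip/guess values below are arbitrary illustrations, not parameters from the study:

```python
def dina_prob(alpha, q_row, slip, guess):
    """DINA (non-compensatory): the success probability is 1 - slip only
    when ALL attributes required by the item's Q-matrix row are mastered;
    otherwise the examinee succeeds only by guessing."""
    mastered_all = all(a >= q for a, q in zip(alpha, q_row))
    return 1.0 - slip if mastered_all else guess

def dino_prob(alpha, q_row, slip, guess):
    """DINO (compensatory): mastering AT LEAST ONE required attribute is
    enough to earn the 1 - slip success probability."""
    mastered_any = any(a == 1 and q == 1 for a, q in zip(alpha, q_row))
    return 1.0 - slip if mastered_any else guess

# Hypothetical item requiring the first and third attributes.
q_row = [1, 0, 1]
partial_mastery = [1, 0, 0]  # masters only one of the two required attributes
```

With `partial_mastery`, DINA treats the examinee as a non-master (probability = guess) while DINO credits the single mastered attribute (probability = 1 - slip), which is exactly the compensation the abstract refers to.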
Psychological Test and Assessment Modeling, 2021
Writing in a second/foreign language (L2) is a demanding task for L2 writers because it calls for multiple language abilities and (meta)cognitive knowledge. Research investigating the (meta)cognitive processes involved in composing in L2 has emphasized the complex and multidimensional nature of L2 writing, with many underlying (meta)cognitive components. However, it is still unclear what factors or components are involved in composing in L2. Employing correlational and qualitative approaches and through the modeling of L2 writing proficiency, previous studies could not offer adequate evidence for the exact nature of such components. This study aimed at examining the underlying cognitive operations of L2 writing performance using an IRT-based cognitive processing model known as the linear logistic test model (LLTM). To achieve this, the performance of 500 English as a foreign language (EFL) students on a writing task was analyzed. Five cognitive processes underlying L2 writing were postulated: content fulfillment, organizational effectiveness, grammatical knowledge, vocabulary use, and mechanics. The results of the likelihood ratio test showed that the Rasch model fits significantly better than the LLTM. The correlation coefficient between LLTM and Rasch model item parameters was .85, indicating that about 72% of the variance in item difficulties can be explained by the five postulated cognitive operations. LLTM analyses also revealed that vocabulary and content are the most difficult processes to use and grammar is the easiest. More importantly, the results showed that it is possible to envisage a model for L2 writing with reference to a set of subskills or attributes.
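The LLTM's core idea, decomposing each item difficulty into a weighted sum of basic-operation difficulties (beta_i = sum_k q_ik * eta_k), can be sketched as follows; the eta values and the weight row are hypothetical, not the study's estimates:

```python
def lltm_difficulty(weights, etas):
    """LLTM: each Rasch item difficulty is reconstructed as a linear
    combination of basic-operation difficulties, beta_i = sum_k q_ik * eta_k,
    where q_ik says how much item i involves operation k."""
    return sum(q * e for q, e in zip(weights, etas))

# Hypothetical operation difficulties for the five postulated processes
# (content, organization, grammar, vocabulary, mechanics) -- illustrative only.
etas = [0.8, 0.4, -0.6, 0.9, -0.3]
q_row = [1, 0, 1, 1, 0]  # item taps content, grammar, and vocabulary
beta = lltm_difficulty(q_row, etas)
```

Because every item difficulty is forced through this linear structure, the LLTM is more restrictive than the Rasch model, which is why the likelihood ratio test in the study can favor the unrestricted Rasch model while the decomposition still explains most of the difficulty variance.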
Language Testing in Asia
Cognitive diagnostic models (CDMs) have recently received a surge of interest in the field of second language assessment due to their promise of providing fine-grained information about the strengths and weaknesses of test takers. For this reason, the present study used the additive CDM (ACDM) as a compensatory and additive model to diagnose Iranian English as a foreign language (EFL) university students' L2 writing ability. To this end, the performance of 500 university students on a writing task was marked by four EFL teachers using the Empirically-derived Descriptor-based Diagnostic (EDD) checklist. Teachers, as content experts, also specified the relationships among the checklist items and five writing sub-skills. The initial Q-matrix was empirically refined and validated with the GDINA package. Then, the resultant ratings were analyzed with the ACDM in the CDM package. The estimation of the skill profiles of the test takers showed that vocabulary use and content fulfillment are the most difficult attributes for the students. Finally, the study found that the skills diagnosis approach can provide informative and valid information about the learning status of students.
Conference Presentations by Farshad Effatpanah
International Meeting of the Psychometric Society (IMPS), Prague University, Prague, Czech Republic, 2024
Measurement invariance is a crucial consideration in psychological and educational measurement which evaluates the psychometric equivalence of a latent trait across different groups or time points. This property is typically assessed via differential item functioning (DIF). Numerous statistical techniques have been proposed to investigate DIF, including Mantel-Haenszel, logistic regression, the likelihood ratio test, multiple-group factor analysis, multiple indicators multiple causes, item response theory (IRT)-/Rasch-based analytical methods, and multidimensional IRT. However, almost all of these methods require a priori specification of two or more groups. This study aims to apply a tree-based global model test for polytomous Rasch models, built on a model-based recursive partitioning algorithm (Komboz et al., 2018), to the simplified Beck Depression Inventory (BDI-S) to investigate DIF across age and gender. Unlike the conventional methods, the model does not require a priori specification of groups for detecting DIF. The model splits the sample by subjecting the data to iterative nonlinear partitioning and estimates item difficulties for each split. To explore possible breaches of measurement consistency in the BDI-S, the responses of 4521 German respondents (both clinical and nonclinical) were analyzed using the psychotree package in R. After checking the fit of the data to the Rasch model, the rating scale tree model was estimated. The analysis generated 19 non-predefined DIF nodes with varying patterns of item difficulties. The results also indicated that age and gender affect the manifestation of depression. Overall, the findings suggest that the model can effectively capture the underlying interaction between the covariates and the BDI items.
The 19th International TELLSI Conference, University of Birjand, Birjand, Iran, 2022
C-tests are measures of general language proficiency. A typical C-test consists of several passages in which the second half of every second word is deleted, and respondents have to restore the deleted parts. Each gap is typically considered an item, and the score on a passage is based on the number of gaps correctly reformulated. However, this increases the number of item parameters to be estimated and violates the local independence assumption. Researchers have therefore proposed another approach in which each passage is viewed as a polytomous item with different ordered categories. Although this method is more effective than the gap level, it results in information loss at the item level, which, in turn, affects the accuracy of ability parameters. In this study, we propose another strategy in which each sentence is considered a polytomous item. That is, the scores on the sentences are aggregated, and then polytomous Rasch models such as the Rasch partial credit model are applied. The results of this new strategy are compared with those derived from the passage and gap levels. To achieve this, data were collected from a sample of 160 English language learners in Iran. Participants completed a C-test comprising four English passages; each text contained 25 gaps. The test was administered as a mid-term reading comprehension test. Using the WINSTEPS computer program, the obtained scores were subjected to dichotomous and polytomous Rasch model analyses. The results of unidimensionality checks, infit and outfit mean squares, and reliability showed that the new strategy has a performance comparable to the other methods, supporting its effectiveness. The findings indicate that the suggested strategy is more efficient than the gap- and passage-level approaches. It also increases the number of items and provides information at the sentence level.
4th Conference on Interdisciplinary Approaches to Language Teaching, Literature, and Translation Studies, Ferdowsi University, Mashhad, Iran, 2022
Books by Farshad Effatpanah
Peter Lang, 2023
One of the known problems with statistically modeling C-tests is the dependency between the different gaps within one text, also called local item dependence (LID). We suggest a new modeling strategy to circumvent LID in C-Tests, and we compare the results with those obtained from a dichotomous Rasch model where LID is ignored. The new strategy entails combining only the items which are identified to be locally dependent instead of combining all the items nested within a passage. Our findings show that the new modeling approach has a better fit compared to the dichotomous model where LID is ignored. Further examinations show that the dichotomous model overestimates person parameters and test reliability, compared to our new approach. We can also show that when LID is accounted for, the data are closer to unidimensionality.
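The selective super-item strategy can be sketched in a few lines, assuming the LID diagnostics have already flagged which gaps are dependent; the passage and the flagged pairs below are hypothetical:

```python
def combine_dependent(responses, dependent_groups):
    """Combine only the items flagged as locally dependent into polytomous
    'super-items' (summed scores); all other items stay dichotomous."""
    merged = {i for group in dependent_groups for i in group}
    items = [sum(responses[i] for i in group) for group in dependent_groups]
    items += [responses[i] for i in range(len(responses)) if i not in merged]
    return items

# Hypothetical 6-gap passage where gaps 0-1 and 3-4 show LID.
resp = [1, 0, 1, 1, 1, 0]
items = combine_dependent(resp, [(0, 1), (3, 4)])
```

Compared with collapsing a whole passage into one super-item, this keeps the independent gaps as separate dichotomous items, so less item-level information is sacrificed.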
Sokhan Gostar Publishing, 2022