- Vahid ARYADOUST (Dr) | Assistant Professor | English Language and Literature |
National Institute of Education
NIE3-03-97, 1 Nanyang Walk, Singapore 637616
Tel: (65) 6790-3475 GMT+8h | Fax: (65) 6896-9149 | Email: vahid.aryadoust@nie.edu.sg |
Web: http://www.nie.edu.sg/profile/aryadoust-vahid
- Language Testing, Listening, Psychology, Emotion, Psychometrics, Personality, Languages and Linguistics, Academic Writing, Reading Comprehension, Structural Equation Modeling, Listening Comprehension (Psychology of Language), Rasch Models, Language Testing and Assessment, Effective Listening, Second Language Writing, and Psychological and Educational Testing
- Vahid Aryadoust, PhD, is Assistant Professor in the English Language and Literature Academic Group. He is the Associate Director of the Global Listening Center, and a member of international associations such as the International Listening Association, the American Association for Applied Linguistics, and the Cognitive Science Society. He has provided consultation on language assessment projects to, for example, Paragon Testing Enterprises (Canada), the DELTA Project of Hong Kong, and the Learning Resource Network (London). Vahid has led multiple language assessment projects funded by, for example, Cambridge-Michigan Language Assessment (CaMLA) in 2013 and 2010, and has published his research in Language Testing, Language Assessment Quarterly, Assessing Writing, Educational Assessment, Educational Psychology, and Computer Assisted Language Learning, among others. He has also (co)authored multiple book chapters and books published by Routledge, Cambridge University Press, Springer, Cambridge Scholars Publishing, and Wiley Blackwell. In addition, Vahid has served as the Principal Guest Editor of a special issue on assessing writing published in Educational Psychology (2017) and the Co-Principal Guest Editor of a special issue on learners' listening published in The International Journal of Listening (2016). He is a member of the advisory boards of several international journals, including Language Testing (UK, Sage) and Language Assessment Quarterly (USA, Taylor & Francis).
The computerization of reading assessments has presented a set of new challenges to test designers. From the vantage point of measurement invariance, test designers must investigate whether the traditionally recognized causes for violating invariance are still a concern in computer-mediated assessments. In addition, it is necessary to understand the technology-related causes of measurement invariance among test-taking populations. In this study, we used the available data (n = 800) from the previous administrations of the Pearson Test of English Academic (PTE Academic) reading, an international test of English comprising 10 test items, to investigate measurement invariance across gender and the Information and Communication Technology Development index (IDI). We conducted a multi-group confirmatory factor analysis (CFA) to assess invariance at four levels: configural, metric, scalar, and structural. Overall, we were able to confirm structural invariance for the PTE Academic, which is a necessary condition for conducting fair assessments. Implications for computer-based education and the assessment of reading are discussed.
https://www.sciencedirect.com/science/article/pii/S0191491X19301452
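The four invariance levels named in the abstract (configural, metric, scalar, structural) are normally tested with dedicated SEM software. As a loose, purely illustrative sketch of the metric-invariance idea, namely that items should relate to the latent trait similarly across groups, the simulated example below compares item-total correlations, a crude stand-in for factor loadings, across two artificial groups. None of this is the study's actual data or model; all names and numbers are invented.

```python
import random
import statistics

def pearson(x, y):
    """Plain Pearson correlation (no external libraries)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def simulate_group(n, loadings, rng):
    """Continuous item scores from a one-factor model (hypothetical data)."""
    rows = []
    for _ in range(n):
        theta = rng.gauss(0, 1)  # latent reading ability
        rows.append([l * theta + rng.gauss(0, 0.5) for l in loadings])
    return rows

rng = random.Random(42)
loadings = [0.8, 0.7, 0.6, 0.75]  # identical generating loadings, so invariance holds
groups = {g: simulate_group(400, loadings, rng) for g in ("A", "B")}

proxies_by_group = {}
for g, rows in groups.items():
    totals = [sum(r) for r in rows]
    proxies_by_group[g] = [
        pearson([r[i] for r in rows], totals) for i in range(len(loadings))
    ]
    print(g, [round(p, 2) for p in proxies_by_group[g]])
# Because both groups share the same generating loadings, the per-group
# correlation profiles come out close, the pattern metric invariance predicts.
```

A real invariance analysis would instead fit nested multi-group CFA models and compare their fit at each of the four levels.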
Volume I of Quantitative Data Analysis for Language Assessment is a resource book that presents the most fundamental techniques of quantitative data analysis in the field of language assessment. Each chapter provides an accessible explanation of the selected technique, a review of language assessment studies that have used the technique, and finally, an example of an authentic study that uses the technique. Readers also get a taste of how to apply each technique through the help of supplementary online resources that include sample data sets and guided instructions. Language assessment students, test designers, and researchers should find this a unique reference as it consolidates theory and application of quantitative data analysis in language assessment.
The purpose of the present study was twofold: (a) it examined the relationship between peer-rated likeability and peer-rated oral presentation skills of 96 student presenters enrolled in a science communication course, and (b) it investigated the relationship between student raters’ severity in rating presenters’ likeability and their severity in evaluating presenters’ skills. Students delivered an academic presentation and then changed roles to rate their peers’ performance and likeability, using an 18-item oral presentation scale and a 10- item likeability questionnaire, respectively. Many-facet Rasch measurement was used to validate the data, and structural equation modeling (SEM) was used to examine the research questions. At an aggregate level, likeability explained 19.5% of the variance of the oral presentation ratings and 8.4% of rater severity. At an item-level, multiple cause-effect relationships were detected, with the likeability items explaining 6–30% of the variance in the oral presentation items. Implications of the study are discussed.
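Statements such as "likeability explained 19.5% of the variance of the oral presentation ratings" are R-squared-type claims. The study derived them from a full SEM model; the minimal sketch below shows only what "variance explained" means for a single predictor, using made-up ratings rather than the study's data.

```python
def r_squared(x, y):
    """R^2 of a simple least-squares regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1 - ss_res / ss_tot

# Hypothetical peer ratings (not from the study): likeability as predictor,
# oral presentation score as outcome.
likeability = [3.0, 4.5, 2.0, 5.0, 3.5, 4.0]
presentation = [3.2, 4.0, 2.5, 4.8, 3.0, 4.2]
print(round(r_squared(likeability, presentation), 3))
```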
This chapter describes the listening section of the Internet-Based Test of English as a Foreign Language (TOEFL iBT) which was designed by Educational Testing Service (ETS). The TOEFL iBT is administered in many testing centers around the world and is used to measure academic English language proficiency of test candidates who are applying to universities whose primary language of instruction and research is English.
This chapter aims to demonstrate how peer assessment can be used to generate information in support of teaching and learning in Singapore and other educational settings. The chapter reports on the development of the tertiary-level English oral presentation scale (TEOPS) which is used in a science communication module in a major Singaporean university. A survey of peer assessment and oral presentations is conducted and the multicomponential model of TEOPS is presented. In addition, the importance of the assessment of oracy and presentation skills in Singapore is discussed and a narration of the validation studies of TEOPS, which use many-facet Rasch measurement (MFRM) and students' perception, is presented. The author elaborates on how this scale can be used for peer assessment and provides directions for future research on the peer assessment of oral presentations.
This entry seeks to examine second language (L2) listening comprehension from a subskill-based approach. It provides an overview of two models of listening comprehension, that is, the default listening construct and the listening-response model, and delineates listening subskills. It also proposes a list of the subskills that have been identified and validated through empirical research. The entry concludes by discussing the potential relationships between the subskills and the limitations of listening comprehension research. APA citation: Aryadoust, V. (2017). Taxonomies of listening skills. In J. I. Liontas & M. DelliCarpini (Eds.), The TESOL encyclopedia of English language teaching. John Wiley in partnership with TESOL International.
Two models of listening comprehension are presented: a cognitive model for non-assessment settings and a language proficiency model which has been applied extensively to the assessment of listening. The similarities of the models are then discussed, and a general framework for communicative assessment of listening is proposed. The framework considers socio-cognitive aspects of listening assessment and lends itself to both in-class and beyond-class assessment situations.
Coh-Metrix has emerged as a promising psycholinguistic tool in writing and reading research. Researchers have used Coh-Metrix to predict English proficiency of first and second language learners. The common statistical method used in predictive modeling research is a multiple linear regression model, which has achieved varying degrees of success. This chapter examines the relative merits of the learning/validation method applied in previous Coh-Metrix studies and then proposes genetic algorithm-based symbolic regression as an alternative and efficient approach which provides robust evidence for the predictive power of some of the Coh-Metrix indices. Using a sample of papers written by university students (n = 450), the author demonstrates that genetic algorithm-based symbolic regression is capable of significantly minimizing the error of measurement and providing a much clearer understanding of the data.
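Genetic algorithm-based symbolic regression searches over a space of candidate formulas rather than fitting fixed coefficients. The chapter's actual implementation is not reproduced here; the toy sketch below evolves, by mutation-only random search, small expression trees over two hypothetical predictor variables against synthetic data with a known rule.

```python
import random

OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}

def random_expr(rng, depth=2):
    """A random expression tree over x1, x2 and small constants."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(["x1", "x2", round(rng.uniform(-2, 2), 2)])
    op = rng.choice(list(OPS))
    return (op, random_expr(rng, depth - 1), random_expr(rng, depth - 1))

def evaluate(expr, x1, x2):
    if expr == "x1":
        return x1
    if expr == "x2":
        return x2
    if isinstance(expr, (int, float)):
        return expr
    op, left, right = expr
    return OPS[op](evaluate(left, x1, x2), evaluate(right, x1, x2))

def mse(expr, data):
    """Fitness: mean squared error of the formula on the dataset."""
    return sum((evaluate(expr, x1, x2) - y) ** 2 for x1, x2, y in data) / len(data)

rng = random.Random(7)
# Synthetic data generated by a known rule: y = 2*x1 + x2.
data = [(x1, x2, 2 * x1 + x2) for x1 in range(-3, 4) for x2 in range(-3, 4)]

population = [random_expr(rng) for _ in range(200)]
best = min(population, key=lambda e: mse(e, data))
for _ in range(30):  # crude mutation-only "generations"
    candidate = random_expr(rng, depth=3)
    if mse(candidate, data) < mse(best, data):
        best = candidate
print("best MSE:", round(mse(best, data), 3))
```

A production system would add crossover, subtree mutation, and parsimony pressure; the point here is only the search-over-formulas idea.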
Over the past few decades, the field of language assessment has grown in importance, sophistication, and scope. The increasing internationalization of educational and work contexts, heightened global understanding of the role of assessment in learning (e.g., Black & Wiliam, 2001; Fox, 2014; Rea-Dickens, 2001), greater emphasis on the assessment of educational outcomes (e.g., Biggs & Tang, 2007), and the concomitant expansion of the language testing industry (e.g., Alderson, 2009) have led to unprecedented changes in assessment practices and approaches. These advancements, spurred on by technological innovation and a burgeoning array of new data analysis techniques, have prompted some to suggest (e.g., McNamara, 2014) that language assessment is on the verge of a revolution....
http://www.cambridgescholars.com/trends-in-language-assessment-research-and-practice
Despite prodigious developments in the field of language assessment in the Middle East and the Pacific Rim, research and practice in these areas have been underrepresented in mainstream literature. This volume takes a fresh look at language assessment in these regions, and provides a unique overview of contemporary language assessment research. In compiling this book, the editors have tapped into the knowledge of language and educational assessment experts whose diversity of perspectives and experience has enriched the focus and scope of language and educational assessment in general, and the present volume in particular. The six ‘trends’ addressed in the 26 chapters that comprise this title consider such contemporary topics as data mining, in-class assessment, and washback. The contributors explore new approaches and techniques in language assessment including advances resulting from multidisciplinary collaboration with researchers in computer science, genetics, and neuroscience. The current trends and promising new directions identified in this volume and the research reported here suggest that researchers across the Middle East and the Pacific Rim are playing—and will continue to play—an important role in advancing the quality, utility, and fairness of language testing and assessment practices.
Our interest in putting together the present volume grew out of a burgeoning stream of research into language assessment in the Middle East and the Pacific Rim. As the focus on education and the role of English language teaching continues to intensify across these regions at an unprecedented rate, assessing communication skills becomes an increasingly significant field. Some of the major universities in these regions have had a long history in teaching and assessing English and other languages, and researchers, practitioners, and scholars alike have attempted to develop innovative assessment approaches and techniques to address the pressing needs of language test developers and test takers. At the same time, multiple annual conferences, such as Pacific Rim Objective Measurement Symposium (PROMS) and the Asian Association for Language Assessment (AALA) conference, have been launched to bring scholars together and keep them updated about the latest developments in language and educational assessment in these regions.
Bagheri, M. S., Nikpoor, S., & Aryadoust, S. V. (2007). Crack IELTS in a flash. Shiraz: Sandbad Publication.
Aryadoust, V., Akbarzadeh, S., & Afarinesh, A. (2008). A guidebook to passages 2. Shiraz: Sandbad Publication.
Aryadoust, V., Akbarzadeh, S., & Nasiri, E. (2007). IELTS writing tutor, writing task 2, general and academic. Tehran: Jungle Publication.
Aryadoust, V., Akbarzadeh, S., & Nasiri, E. (2007). IELTS writing tutor, writing task 1, academic module. Tehran: Jungle Publication.
Aryadoust, V., Akbarzadeh, S., & Nasiri, E. (2007). IELTS writing tutor, writing task 1, general module. Tehran: Jungle Publication.
Aryadoust, V. (2007). A dictionary of sociolinguistics, plus pragmatics and languages. Shiraz: Faramatn Publication.
Aryadoust, V. (2006). A guidebook to passages 1. Shiraz: Sandbad Publication.
A number of scaling models—developed originally for psychological studies—have been adapted into language assessment. Although their application has been promising, they have not yet been validated in language assessment contexts. This study discusses the relative merits of two such models in the context of second language (L2) listening comprehension tests: confirmatory factor analysis (CFA) and cognitive diagnostic models (CDMs). Both CFA and CDMs model multidimensionality in assessment tools, whereas other models force the data to be statistically unidimensional. The two models were applied to the listening test of the Michigan English Language Assessment Battery (MELAB). CFA was found to impose more restrictions on the data than CDM. It is suggested that CFA might not be suitable for modelling dichotomously scored data of L2 listening tests, whereas the CDM used in the study (the Fusion Model) appeared to successfully portray the listening sub-skills tapped by the MELAB listening test. The paper concludes with recommendations about how to use each of these models in modelling L2 listening.
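The Fusion Model used in the study is mathematically involved; as a hedged illustration of what a cognitive diagnostic model does, the sketch below implements the much simpler DINA model, in which a correct answer (apart from slips and guesses) requires mastery of every subskill the item taps. The item and subskill names are invented.

```python
def dina_p(mastered, required, slip=0.1, guess=0.2):
    """P(correct) under the DINA model, given sets of subskills."""
    return (1 - slip) if required <= mastered else guess

# A hypothetical listening item requiring two subskills.
item_subskills = {"inference", "vocabulary"}
print(dina_p({"inference", "vocabulary", "gist"}, item_subskills))  # masters all: 0.9
print(dina_p({"inference"}, item_subskills))                        # missing one: 0.2
```

The Fusion Model relaxes DINA's all-or-nothing assumption with item-level discrimination parameters, but the basic subskill-mastery logic is the same.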
Although second language (L2) listening assessment has been the subject of much research interest in the past few decades, there remain a multitude of challenges facing the definition and operationalization of the L2 listening construct(s). Notably, the majority of L2 listening assessment studies are based upon the (implicit) assumption that listening is reducible to cognition and metacognition. This approach ignores emotional, neurophysiological, and sociocultural mechanisms underlying L2 listening. In this paper, the role of these mechanisms in L2 listening assessment is discussed and four gaps in understanding are explored: the nature of L2 listening, the interaction between listeners and the stimuli, the role of visuals, and authenticity in L2 listening assessments. Finally, a review of the papers published in the special issue is presented and recommendations for further research on L2 listening assessments are provided.
This study sought to examine research trends in computer-assisted language learning (CALL) using a retrospective scientometric approach. Scopus was used to search for relevant publications on the topic and generate a dataset consisting of 3,697 studies published in 11 journals between 1977 and 2020. A document co-citation analysis method was adopted to identify the main research clusters in the dataset. The impact of each publication on the field was measured using the burst index and betweenness centrality, and the content of influential publications was closely analysed to determine the focus of each cluster and the key themes of the studies in focus. Overall, we identified seven major clusters. We further found that leveraging synchronous computer-mediated communication and negotiated interaction, multimedia, telecollaboration or e-mail exchanges, blogs, digital games, Wikis, and podcasts to support language learning was probably beneficial, although the degree of support varied across studies: stronger support was found for synchronous computer-mediated communication and negotiated interaction, multimedia, telecollaboration or e-mail exchanges, and digital games, whereas support for blogs, Wikis, and podcasts was weaker and some major drawbacks were observed. The findings should be helpful for teachers and instructors deciding whether to use technology in the classroom for instructional purposes, as well as for researchers and graduate students seeking a research topic for a thesis or dissertation.
The present study conducted a systematic review of the item response theory (IRT) literature in language assessment to investigate the conceptualization and operationalization of the dimensionality of language ability. Sixty-two IRT-based studies published between 1985 and 2020 in language assessment and educational measurement journals were first classified into two categories based on a unidimensional and multidimensional research framework, and then reviewed to examine language dimensionality from technical and substantive perspectives. It was found that 12 quantitative techniques were adopted to assess language dimensionality. Exploratory factor analysis was the primary method of dimensionality analysis in papers that had applied unidimensional IRT models, whereas the comparison modeling approach was dominant in the multidimensional framework. In addition, there was converging evidence within the two streams of research supporting the role of a number of factors such as testlets, language skills, subskills, and linguistic elements as sources of multidimensionality, while mixed findings were reported for the role of item formats across research streams. The assessment of reading, listening, speaking, and writing skills was grounded in both unidimensional and multidimensional frameworks. By contrast, vocabulary and grammar knowledge was mainly conceptualized as unidimensional. Directions for continued inquiry and application of IRT in language assessment are provided.
This is the second neurocognitive study of language assessments produced in our lab. In addition to the experiment, we have proposed the concept of neurocognitive validity in language assessment. We are working towards expanding on this framework. We believe neurocognitive approaches to learning and assessment will be the future of education, and it is best that pertinent frameworks be proposed and tested now.
<Abstract>
With the advent of new technologies, assessment research has adopted technology- based methods to investigate test validity. This study investigated the neurocognitive processes involved in an academic listening comprehension test, using a biometric technique called functional near-infrared spectroscopy (fNIRS). Sixteen right-handed university students completed two tasks: (1) a linguistic task that involved listening to a mini-lecture (i.e., Listening condition) and answering of questions (i.e., Questions condition) and (2) a nonlinguistic task that involved listening to a variety of natural sounds and animal vocalizations (i.e., Sounds condition). The hemodynamic activity in three left brain regions was measured: the inferior frontal gyrus (IFG), dorsomedial prefrontal cortex (dmPFC), and posterior middle temporal gyrus (pMTG). The Listening condition induced higher activity in the IFG and pMTG than the Sounds condition. Although not statistically significant, the activity in the dmPFC was higher during the Listening condition than in the Sounds conditions. The IFG was also significantly more active during the Listening condition than in the Questions condition. Although a significant gender difference was observed in listening comprehension test scores, there was no difference in brain activity (across the IFG, dmPFC, and pMTG) between male and female participants. The implications for test validity are discussed.
Over the past decades, the application of Rasch measurement in language assessment has gradually increased. In the present study, we reviewed and coded 215 papers using Rasch measurement published in 21 applied linguistics journals for multiple features. We found that seven Rasch models and 23 software packages were adopted in these papers, with many-facet Rasch measurement (n = 100) and Facets (n = 113) being the most frequently used Rasch model and software, respectively. Significant differences were detected between the number of papers that applied Rasch measurement to different language skills and components, with writing (n = 63) and grammar (n = 12) being the most and least frequently investigated, respectively. In addition, significant differences were found between the number of papers reporting person separation (n = 73, not reported: n = 142) and item separation (n = 59, not reported: n = 156) and those that did not. An alarming finding was how few papers reported unidimensionality checks (n = 57 vs. 158) and local independence checks (n = 19 vs. 196). Finally, a multilayer network analysis revealed that research involving Rasch measurement has created two major discrete communities of practice (clusters), which can be characterized by features such as language skills, the Rasch models used, and the reporting of item reliability/separation vs person reliability/separation. Cluster 1 was accordingly labelled the production and performance cluster, whereas cluster 2 was labelled the perception and language elements cluster. Guidelines and recommendations for analyzing unidimensionality, local independence, data-to-model fit, and reliability in Rasch model analysis are proposed.
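One practical consequence of the reporting gaps noted above: person/item separation and Rasch reliability are deterministic transforms of each other, so a reader can recover whichever statistic a paper omitted from the one it reported. A minimal sketch of the standard conversion:

```python
import math

def separation_from_reliability(r):
    """Rasch separation index G from reliability R: G = sqrt(R / (1 - R))."""
    return math.sqrt(r / (1 - r))

def reliability_from_separation(g):
    """Rasch reliability R from separation G: R = G^2 / (1 + G^2)."""
    return g * g / (1 + g * g)

# A reported person reliability of 0.80 implies a separation of 2.0, and
# vice versa.
print(round(separation_from_reliability(0.8), 2))  # 2.0
print(round(reliability_from_separation(2.0), 2))  # 0.8
```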
This is the first study to investigate the effects of test methods (while-listening performance and post-listening performance) and gender on measured listening ability and brain activation under test conditions. Functional near-infrared spectroscopy (fNIRS) was used to examine three brain regions associated with listening comprehension: the inferior frontal gyrus and posterior middle temporal gyrus, which subserve bottom-up processing in comprehension, and the dorsomedial prefrontal cortex, which mediates top-down processing. A Rasch model reliability analysis showed that listeners were homogeneous in their listening ability. Additionally, there were no significant differences in test scores across test methods and genders. The fNIRS data, however, revealed significantly different activation of the investigated brain regions across test methods, genders, and listening abilities. Together, these findings indicated that the listening test was not sensitive to differences in the neurocognitive processes underlying listening comprehension under test conditions. The implications of these findings for assessing listening and suggestions for future research are discussed.
Even though the field of linguistics has witnessed a growth of research in the areas of comprehension (listening and reading) subskills, there is currently no universally accepted taxonomy for categorizing them. Using a dataset of 192 publications, a document co-citation analysis was conducted. Eighteen discrete research clusters were identified, comprising 73 empirically investigated comprehension subskills, of which 55 were related to first language (L1) comprehension and 18 were associated with second language (L2) comprehension. Fifteen research clusters (83.33%) were focused on lower-order L1 processing abilities in reading such as orthographic processing and speeded word reading. The remaining three clusters were relatively small, and focused on L2 comprehension subskills. The list of subskills was visualized in the form of a codex that serves as the first integrative framework for empirically investigated comprehension subskills and processing abilities. The need for conducting experimental investigations to improve the understanding of L2 comprehension subskills was highlighted.
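The counting step behind document co-citation analysis is straightforward: two references are "co-cited" each time they appear together in one citing publication's reference list, and clustering is then performed on the resulting weighted co-citation network. A minimal sketch with invented reference keys:

```python
from collections import Counter
from itertools import combinations

# Hypothetical reference lists of four citing publications (keys invented).
ref_lists = [
    ["Hoover1990", "Perfetti2001", "Field2008"],
    ["Hoover1990", "Perfetti2001"],
    ["Field2008", "Buck2001"],
    ["Hoover1990", "Perfetti2001", "Buck2001"],
]

# Count every unordered pair of references that co-occurs in a reference list.
cocitations = Counter()
for refs in ref_lists:
    for a, b in combinations(sorted(set(refs)), 2):
        cocitations[(a, b)] += 1

print(cocitations.most_common(2))
```

Tools such as CiteSpace build on exactly this pairwise count, then apply clustering and metrics like betweenness centrality to find and label research clusters.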
A recent review of the literature concluded that Rasch measurement is an influential approach in psychometric modeling. Despite the major contributions of Rasch measurement to the growth of scientific research across various fields, there is currently no research on the trends and evolution of Rasch measurement research. The present study used co-citation techniques and a multiple perspectives approach to investigate 5,365 publications on Rasch measurement between 01 January 1972 and 03 May 2019, together with their 108,339 unique references downloaded from the Web of Science (WoS). Several methods of network development involving visualization and text-mining were used to analyze these data: author co-citation analysis (ACA), document co-citation analysis (DCA), journal co-citation analysis (JCA), and keyword analysis. In addition, to investigate the inter-domain trends that link the Rasch measurement specialty to other specialties, we used a dual-map overlay to examine specialty-to-specialty connections. Influential authors, publications, journals, and keywords were identified. Multiple research frontiers or sub-specialties were detected, and the major ones were reviewed, including “visual function questionnaires”, “non-parametric item response theory”, “valid measures (validity)”, “latent class models”, and “many-facet Rasch model”. One of the outstanding patterns identified was the dominance and impact of publications written for general groups of practitioners and researchers. In personal communications, the authors of these publications stressed their mission as being “teachers” who aim to promote Rasch measurement as a conceptual model with real-world applications. Based on these findings, we propose that sociocultural and ethnographic factors have a considerable capacity to influence fields of science and should be considered in future investigations of psychometrics and measurement. As the first scientometric review of the Rasch measurement specialty, this study will be of interest to researchers, graduate students, and professors seeking to identify research trends, topics, major publications, and influential scholars.
Eye tracking technology has become an increasingly popular methodology in language studies. Using data from 27 journals in language sciences indexed in the Social Science Citation Index and/or Scopus, we conducted an in-depth scientometric analysis of 341 research publications together with their 14,866 references between 1994 and 2018. We identified a number of countries, researchers, universities, and institutes with large numbers of publications in eye tracking research in language studies. We further discovered a mixed multitude of connected research trends that have shaped the nature and development of eye tracking research. Specifically, a document co-citation analysis revealed a number of major research clusters, their key topics, connections, and bursts (sudden citation surges). For example, the foci of clusters #0 through #5 were found to be perceptual learning, regressive eye movement(s), attributive adjective(s), stereotypical gender, discourse processing, and bilingual adult(s). The content of all the major clusters was closely examined and synthesized in the form of an in-depth review. Finally, we grounded the findings within a data-driven theory of scientific revolution and discussed how the observed patterns have contributed to the emergence of new trends. As the first scientometric investigation of eye tracking research in language studies, the present study offers several implications for future research that are discussed.
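Several of the scientometric studies above rest on document co-citation analysis (DCA), in which two references are treated as related whenever the same citing publication lists them both, and cluster structure is then derived from the resulting pair counts. The counting step can be sketched as follows; the corpus and reference labels are hypothetical, for illustration only.

```python
from collections import Counter
from itertools import combinations

def cocitation_counts(reference_lists):
    """Count how often each pair of references is cited together.

    reference_lists: one list of cited references per citing publication.
    Returns a Counter mapping sorted (ref_a, ref_b) pairs to co-citation counts.
    """
    pairs = Counter()
    for refs in reference_lists:
        # Each unordered pair of distinct references in one paper
        # contributes one co-citation.
        for a, b in combinations(sorted(set(refs)), 2):
            pairs[(a, b)] += 1
    return pairs

# Hypothetical citing publications and their reference lists
corpus = [
    ["Rayner1998", "Just1980", "Frazier1982"],
    ["Rayner1998", "Just1980"],
    ["Rayner1998", "Frazier1982"],
]
counts = cocitation_counts(corpus)
print(counts[("Just1980", "Rayner1998")])  # 2
```

In a full DCA, these pair counts would feed a similarity network on which clustering and burst detection are run.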
The aim of the present study is two-fold. Firstly, it uses eye tracking to investigate the dynamics of item reading, in both multiple-choice (MCQ) and matching items, before and during two hearings of listening passages in a computerized while-listening performance (WLP) test. Secondly, it investigates answer changing during the two hearings, which included four rounds of item reading: pre-listening in hearing 1, while-listening in hearing 1, pre-listening in hearing 2, and while-listening in hearing 2. The listening test was completed by 28 secondary school students in different sessions. Using time series, cross-correlation functions, and multivariate data analyses, we found that listeners tended to quickly skim the test items, distractors, and answers during pre-listening in hearings 1 and 2. By contrast, during while-listening in hearings 1 and 2, significantly more attention was paid to the written stems, distractors, and options. The increase in attention to the written stems, distractors, and options was greater for the matching items, and interactions between item format and item reading were also detected. Additionally, we observed a mixed answer-changing pattern (i.e., incorrect-to-correct and correct-to-incorrect), although the dominant pattern for both item formats (67%) was incorrect-to-correct. Implications of the findings for language research are discussed.
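The cross-correlation functions used here relate two gaze time series at a range of lags, identifying how far one signal leads or trails the other. A minimal numpy sketch with synthetic per-second fixation counts (the data and the 2-second shift are invented for illustration):

```python
import numpy as np

def cross_correlation(x, y, max_lag):
    """Normalized cross-correlation of two equal-length series.
    A positive lag pairs x[t] with y[t + lag]."""
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    n = len(x)
    return {lag: (np.mean(x[:n - lag] * y[lag:]) if lag >= 0
                  else np.mean(x[-lag:] * y[:n + lag]))
            for lag in range(-max_lag, max_lag + 1)}

# Hypothetical per-second fixation counts on an item stem, and the same
# signal shifted by 2 s, standing in for a second gaze measure.
rng = np.random.default_rng(0)
x = rng.poisson(3, 60).astype(float)
y = np.roll(x, 2)  # y trails x by 2 seconds
ccf = cross_correlation(x, y, max_lag=5)
best_lag = max(ccf, key=ccf.get)
print(best_lag)  # 2: the lag with the peak correlation recovers the shift
```

The lag at which the correlation peaks is the estimated lead/lag between the two signals.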
This study investigates the dimensions of visual mental imagery (VMI) in aural discourse comprehension. We introduce a new approach to inspecting VMIs which integrates forensic arts and latent class analysis. Thirty participants listened to three descriptive oral excerpts and then verbalized what they had seen in their mind’s eye. The verbalized descriptions were simultaneously illustrated by two trained artists using Adobe Photoshop and digital drawing tablets with electromagnetic induction technology, generating approximations of the VMIs. Next, a code sheet was developed to examine the illustrated VMIs on 16 dimensions. Latent class analysis identified three classes of VMI imaginers with nine discriminating dimensions: clarity, completeness of figures, details, shape crowdedness, shape-added features, texture, space, time and motion, and flamboyance. The classes were further differentiated by significant differences in their listening abilities. We also identified an individual lacking the ability to imagine (a condition called aphantasia) and found some evidence that VMIs in listening are both symbolic and depictive.
This study investigates the underlying structure of the listening test of the Singapore-Cambridge General Certificate of Education (GCE) exam, comparing the fit of five cognitive diagnostic assessment models comprising the deterministic input noisy “and” gate (DINA), generalized DINA (G-DINA), deterministic input noisy “or” gate (DINO), higher-order DINA (HO-DINA), and the reduced reparameterized unified model (RRUM). Through model-comparisons, a nine-subskill RRUM model was found to possess the optimal fit. The study shows that students’ listening test performance depends on an array of test-specific facets, such as the ability to eliminate distractors in multiple-choice questions alongside listening-specific subskills such as the ability to make inferences. The validated list of the listening subskills can be employed as a useful guideline to prepare students for the GCE listening test at schools.
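The DINA model compared above is conjunctive: a test taker answers an item correctly with probability 1 − s (slip) only if they master every subskill the item requires in the Q-matrix, and with probability g (guess) otherwise. A minimal sketch of the item response function, with an entirely hypothetical two-skill Q-matrix and parameter values:

```python
import numpy as np

def dina_prob(alpha, q, guess, slip):
    """DINA item response probabilities.

    alpha: (n_persons, n_skills) 0/1 skill-mastery profiles.
    q:     (n_items, n_skills) 0/1 Q-matrix (skills each item requires).
    guess, slip: per-item guessing and slip parameters.
    Returns (n_persons, n_items) probabilities of a correct response.
    """
    # eta[n, i] = 1 iff person n masters every skill item i requires
    eta = (alpha[:, None, :] >= q[None, :, :]).all(axis=2).astype(float)
    return (1 - slip) ** eta * guess ** (1 - eta)

# Hypothetical example: item 0 needs skill A, item 1 needs skills A and B.
q = np.array([[1, 0], [1, 1]])
guess = np.array([0.2, 0.1])
slip = np.array([0.1, 0.1])
alpha = np.array([[1, 0],   # masters A only
                  [1, 1]])  # masters both
p = dina_prob(alpha, q, guess, slip)
print(p)  # [[0.9, 0.1], [0.9, 0.9]]
```

G-DINA, DINO, HO-DINA, and RRUM relax or reparameterize this conjunctive assumption in different ways; model comparison then asks which structure best fits the response data.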
The present study applied recursive partitioning Rasch trees to a large-scale reading comprehension test (n = 1550) to identify sources of differential item functioning (DIF). Rasch trees divide the sample by subjecting the data to recursive non-linear partitioning and estimate item difficulty per partition. The variables used in the recursive partitioning of the data were the test takers' vocabulary knowledge, grammar knowledge, and gender. This generated 11 non-pre-specified DIF groups, for which the item difficulty parameters varied significantly. The study is grounded within the third generation of DIF analysis, and it is argued that DIF induced by the readers' vocabulary and grammar knowledge is not construct-irrelevant. In addition, only 204 (13.16%) test takers, who had significantly high grammar scores, were affected by gender DIF. This suggests that DIF caused by manifest variables only influences certain subgroups of test takers with specific ability profiles, thus creating a complex network of relationships between construct-relevant and construct-irrelevant variables.
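The core step of a Rasch tree is to estimate item difficulties separately within candidate partitions of the sample and test whether they differ. The sketch below illustrates only that mechanic, using simulated data and a crude log-odds (PROX-style) difficulty approximation rather than the full conditional maximum likelihood and structural-change tests a real Rasch tree uses; the group split and effect size are hypothetical.

```python
import numpy as np

def prox_difficulty(responses):
    """Crude Rasch item difficulties via the log-odds of an incorrect
    response, centred at zero (a PROX-style approximation)."""
    p = responses.mean(axis=0).clip(0.01, 0.99)  # proportion correct per item
    d = np.log((1 - p) / p)
    return d - d.mean()

rng = np.random.default_rng(1)
n, items = 400, 5
theta = rng.normal(0, 1, n)        # person abilities
group = np.arange(n) % 2           # 0 = group A, 1 = group B
# Simulated DIF: item 0 is 1.5 logits harder for group B.
diff = np.zeros((n, items))
diff[:, 0] = np.where(group == 1, 1.5, 0.0)
p = 1 / (1 + np.exp(-(theta[:, None] - diff)))
x = (rng.uniform(size=(n, items)) < p).astype(int)

# Estimate difficulties per partition and compare.
gap = np.abs(prox_difficulty(x[group == 0]) - prox_difficulty(x[group == 1]))
print(gap.argmax())  # item 0 shows the largest difficulty shift
```

A Rasch tree automates this comparison over many candidate split variables and cut points, keeping only splits with statistically significant parameter instability.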
This article proposes an integrated cognitive theory of reading and listening that draws on a maximalist account of comprehension and emphasizes the role of bottom-up and top-down processing. The theoretical framework draws on the findings of previous research and integrates them into a coherent and plausible narrative to explain and predict the comprehension of written and auditory inputs. The theory is accompanied by a model that schematically represents the fundamental components of the theory and the comprehension mechanisms described. The theory further highlights the role of perception and word recognition (under-researched in reading research), situation models (missing in listening research), mental imagery (missing in both streams), and inferencing. The robustness of the theory is discussed in light of the principles of scientific theories adopted from Popper (1959).
To cite this article: Vahid Aryadoust & Mehdi Riazi (2017) Future directions for assessing for learning in second language writing research: epilogue to the special issue, Educational Psychology, 37:1, 82-89,
This study adapts Levels 1 and 2 of Kirkpatrick’s model of training evaluation to evaluate the learning outcomes of an English as a second language (ESL) paragraph writing course offered by a major Asian university. The study uses a combination of surveys and writing tests administered at the beginning and end of the course. The survey evaluated changes in students’ perceptions of their skills, attitude, and knowledge (SAK), and the writing tests measured their writing ability. Rasch measurement was applied to examine the psychometric validity of the instruments. The measured abilities were then subjected to path modeling to evaluate Levels 1 and 2 of the model. The students reported that the module was enjoyable and useful. In addition, their self-perceived levels of skills and knowledge developed over time alongside their writing scores, but their attitude remained unchanged. Limitations of Kirkpatrick’s model, as well as the lack of solid frameworks for evaluating educational effectiveness in applied linguistics, are discussed.
The fairness and precision of peer assessment have been questioned by educators and academics. Of particular interest, yet poorly understood, are the factors underlying the biases that cause unfair and imprecise peer assessments. To shed light on this issue, I investigated gender and academic major biases in peer assessments of oral presentations. The study sample comprised 66 science students enrolled in a formative assessment-based communication module at an Asian university. Each student delivered an oral presentation in English and also evaluated 10–14 of their classmates’ oral presentations. The students’ evaluations were anchored by the instructor’s evaluation of each oral presentation. I performed many-facet Rasch measurement (MFRM) for two purposes: (a) to examine the effect of multiple facets on the student and teacher ratings of oral presentations and (b) to adjust the ratings of oral presentations for gender and academic major biases. The scores assigned by student raters had good fit to the MFRM; however, when students evaluated oral presentations by peers of the opposite sex, the scores were overestimated. An academic major bias was also observed, whereby students consistently underestimated the scores of same-major peers. After adjusting for biases, it was concluded that peer assessment can be a reliable and useful form of formative assessment.
This study applies evolutionary algorithm-based (EA-based) symbolic regression to assess the ability of metacognitive strategy use, tested by the metacognitive awareness listening questionnaire (MALQ), and lexico-grammatical knowledge to predict listening comprehension proficiency among English learners. Initially, the psychometric validity of the MALQ subscales, the lexico-grammatical test, and the listening test was examined using the logistic Rasch model and the Rasch-Andrich rating scale model. Next, linear regression found both sets of predictors to have weak or inconclusive effects on listening comprehension; however, the results of EA-based symbolic regression suggested that both lexico-grammatical knowledge and two of the five metacognitive strategies tested strongly and nonlinearly predicted listening proficiency (R² = .64). Constraining prediction modeling to linear relationships is argued to jeopardize the validity of language assessment studies, potentially leading them to inaccurately contradict otherwise well-established language assessment hypotheses and theories.
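Symbolic regression searches a space of candidate mathematical expressions, rather than coefficients of a fixed linear form, which is why it can recover nonlinear predictor-outcome relationships that linear regression misses. The sketch below is a deliberately simplified stand-in: random search over a toy expression grammar instead of a full evolutionary algorithm, on synthetic data with an invented nonlinear rule.

```python
import random
import numpy as np

random.seed(3)
rng = np.random.default_rng(3)

# A toy grammar: candidate expressions of the form b(u1(x1), u2(x2))
UNARY = {"id": lambda z: z, "square": np.square, "tanh": np.tanh}
BINARY = {"+": np.add, "*": np.multiply}

def sample_expr():
    """Draw one random candidate expression and its printable name."""
    u1 = random.choice(list(UNARY))
    u2 = random.choice(list(UNARY))
    b = random.choice(list(BINARY))
    f = lambda x1, x2, u1=u1, u2=u2, b=b: BINARY[b](UNARY[u1](x1), UNARY[u2](x2))
    return f"{b}({u1}(x1), {u2}(x2))", f

def r2(y, yhat):
    return 1 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)

# Synthetic data generated by a nonlinear rule: y = x1^2 + tanh(x2)
x1, x2 = rng.normal(size=200), rng.normal(size=200)
y = x1 ** 2 + np.tanh(x2)

# Search: keep the sampled expression with the highest R^2
best = max((sample_expr() for _ in range(500)),
           key=lambda nf: r2(y, nf[1](x1, x2)))
print(best[0])  # the search recovers +(square(x1), tanh(x2))
```

A genuine EA-based system (e.g., genetic programming) replaces the random draws with crossover and mutation over expression trees, but the search target (expression structure, scored by fit) is the same.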
It has been argued that item difficulty can affect the fit of a confirmatory factor analysis (CFA) model (McLeod, Swygert, & Thissen, 2001; Sawaki, Stricker, & Andreas, 2009). We explored the effect of items with outlying difficulty measures on the CFA model of the listening module of the International English Language Testing System (IELTS). The test has four sections comprising 40 items altogether (10 items in each section). Each section measures a different listening skill, making the test a conceptually four-dimensional assessment instrument.
Forty science students received training for 12 weeks on delivering effective presentations and using a tertiary-level English oral presentation scale comprising three subscales (Verbal Communication, Nonverbal Communication, and Content and Organization) measured by 18 items. For their final project, each student was given 10 to 12 min to present on one of the five compulsory science books for the module and was rated by the tutor, peers, and himself/herself. Many-facet Rasch measurement, correlation, and analysis of variance were performed to analyze the data. The results show that the student raters, tutor, items, and rating scales achieved high psychometric quality, though a small number of assessments exhibited bias. Although all of the biased self-assessments were underestimations of presentation skills, the peer and tutor assessment biases showed a mixed pattern. In addition, self-, peer, and tutor assessments had low to medium correlations on the subscales, and a significant difference was found between the assessments. Implications are discussed.
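Many-facet Rasch measurement models each observed rating as a function of (at least) presenter ability, item difficulty, and rater severity, which is what lets it separate a severe rater from a weak presenter. A minimal sketch of a rating-scale MFRM, computing category probabilities and the expected rating for a lenient versus a severe rater; every parameter value here is hypothetical.

```python
import numpy as np

def mfrm_probs(theta, item_diff, rater_sev, thresholds):
    """Category probabilities under a rating-scale many-facet Rasch model:
    log P(x) / P(x-1) = theta - item_diff - rater_sev - thresholds[x-1]."""
    steps = theta - item_diff - rater_sev - np.asarray(thresholds)
    logits = np.concatenate(([0.0], np.cumsum(steps)))  # categories 0..m
    p = np.exp(logits - logits.max())                   # numerically stable
    return p / p.sum()

# Hypothetical ability, item difficulty, two raters of different severity,
# and thresholds for a 0-4 rating scale.
thresholds = [-1.5, -0.5, 0.5, 1.5]
lenient = mfrm_probs(theta=1.0, item_diff=0.0, rater_sev=-0.5,
                     thresholds=thresholds)
severe = mfrm_probs(theta=1.0, item_diff=0.0, rater_sev=0.5,
                    thresholds=thresholds)
expected = lambda p: np.dot(np.arange(len(p)), p)
# The lenient rater yields the higher expected rating for the same ability.
print(round(expected(lenient), 2), round(expected(severe), 2))
```

Estimating rater severities from data and subtracting them out is what produces the "fair scores" and bias adjustments described in the abstracts above.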
The present study uses a mixture Rasch model (MRM) to examine latent differential item functioning in English as a foreign language listening tests. Participants (n = 250) took a listening and lexico-grammatical test and completed the metacognitive awareness listening questionnaire comprising problem solving (PS), planning and evaluation (PE), mental translation (MT), person knowledge (PK), and directed attention (DA). The listening test was subjected to MRM analysis, where a two-latent-class model had a sufficient fit. Next, an artificial neural network and a chi-square test were used to examine the nature of the latent classes. Class 1 comprised high-ability listeners capable of multitasking, who obtained high PS, PE, and lexico-grammatical test scores but low DA, PK, and MT scores. Class 2 comprised low-ability listeners with limited multitasking skills, who obtained high DA, PK, and MT scores but low scores on PS, PE, and the lexico-grammatical test. Finally, a model of listening comprehension is postulated and discussed.
Keywords: artificial neural network, gender, item response theory, lexico-grammatical knowledge, listening comprehension, metacognitive strategy awareness, mixture Rasch measurement
Research shows that test method can exert a significant impact on test takers’ performance and thereby contaminate test scores. We argue that common test method can exert the same effect as common stimuli and violate the conditional independence assumption of item response theory models because, in general, subsets of items which have a shared feature are a source of response dependence (Marais & Andrich, 2008). In this study, we use the Rasch testlet model (Wang & Wilson, 2005a) to examine the effect of test method on violating the unidimensionality assumption of the Rasch model. Results show that test formats can introduce small to large construct-irrelevant variance, contaminate test scores, and lead to the violation of the conditional independence assumption. Our findings further suggest that the degree of construct-irrelevant variance exerted by test method could be a function of test format familiarity.
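The conditional-dependence mechanism described here can be illustrated by simulation: when items share a testlet (or a common test method), a person-specific effect enters all of their response probabilities, inflating the correlation between those items beyond what ability alone explains. A minimal sketch under the Rasch testlet parameterization (logit = ability − difficulty + person-specific testlet effect); all values are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5000
theta = rng.normal(0, 1, n)   # person ability
gamma = rng.normal(0, 1, n)   # person-specific testlet (method) effect
b = np.zeros(4)               # four items of equal difficulty

# Items 0-1 share the testlet; items 2-3 do not.
testlet = np.array([1, 1, 0, 0])
logit = theta[:, None] - b + gamma[:, None] * testlet
x = (rng.uniform(size=(n, 4)) < 1 / (1 + np.exp(-logit))).astype(int)

within = np.corrcoef(x[:, 0], x[:, 1])[0, 1]   # same testlet
across = np.corrcoef(x[:, 0], x[:, 2])[0, 1]   # different testlets
# The shared effect inflates within-testlet dependence beyond what
# ability alone produces, violating conditional independence.
print(within > across)  # True
```

A standard Rasch model attributes all dependence to theta; the testlet model adds the gamma term so that the within-testlet surplus is absorbed rather than contaminating ability estimates.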
Keywords: conditional independence, Rasch testlet model, test method
The testing and teaching of listening has been partially guided by the notion of subskills, a set of listening abilities that are needed for achieving successful comprehension and utilization of the information in listening texts. Although this notion came about mainly through applications of theoretical perspectives from psychology and communication studies, the actual divisibility of the subskills has rarely been examined. This article reports an attempt to do so using the answers of 916 test takers of a retired version of the Michigan English Language Assessment Battery listening test. First, an iterative content analysis of the items was carried out, identifying five key subskills. Next, the discriminability of the subskills was examined through confirmatory factor analysis (CFA). Five independent measurement models representing the subskills were evaluated. The overall CFA model comprising the measurement models showed excessively high correlations among factors. Further tests through CFA resolved the inadmissible correlations, though the high correlations persisted. Finally, we created 23 aggregate-level items and used them in a higher-order model, which yielded the best fit indices and resolved the inadmissible estimates. The results show that the subskills in the test were empirically divisible, lending support to scholarly attempts to identify components of the listening construct for the purposes of teaching and assessment.
This study investigates the development in paragraph writing ability of 116 undergraduate English as a second language (ESL) students enrolled in a paragraph writing course. Students wrote sample paragraphs before, during, and after the course, and these were marked on an analytical scale by multiple expert raters. The results were first subjected to many-facet Rasch model (MFRM) analysis to measure differences in rater severity and identify rater misfits; raters’ scores were anchored to these initial results to generate fair scores for students. Next, a curve-of-factors latent growth model was fitted to the scores. The results showed that students’ ability in multiple writing skills grew gradually and linearly from the beginning of the course. This progress was found to be independent of the writing prompts. Students’ development is attributed to a variety of facilitative factors, including explicit lessons and frequent practice, regular feedback through a continuous assessment (CA) approach, various opportunities to engage with class tutors, and the use of online technology in the course.
This study sought to examine the development of the paragraph writing skills of 116 English as a second language university students over the course of 12 weeks, and the relationship between the linguistic features of students’ written texts as measured by Coh-Metrix (a computational system for estimating textual features such as cohesion and coherence) and the scores assigned by human raters. The raters’ reliability was investigated using many-facet Rasch measurement (MFRM); the growth of students’ paragraph writing skills was explored using a factor-of-curves latent growth model (LGM); and the relationships between changes in linguistic features and writing scores across time were examined by path modelling. The MFRM analysis indicates that, despite several misfits, students’ and raters’ performances and the scale’s functionality conformed to the expectations of MFRM, thus providing evidence of psychometric validity for the assessments. The LGM shows that students’ paragraph writing skills developed steadily during the course. The Coh-Metrix indices have more predictive power before and after the course than during it, suggesting that Coh-Metrix may struggle to discriminate between some ability levels. Whether a Coh-Metrix index gains or loses predictive power over time is argued to be partly a function of whether raters maintain or lose sensitivity to the linguistic feature measured by that index in their own assessments as the course progresses.
Keywords: Coh-Metrix; factor-of-curves latent growth model; linguistic features; many-facet Rasch measurement; paragraph writing
This article reports the development of the Test Takers’ Metacognitive Awareness Reading Questionnaire (TMARQ), which measures test takers’ metacognition in reading comprehension tests. The TMARQ comprises seven subscales: planning strategies, evaluating strategies, monitoring strategies, strategies for identifying important information, inference-making strategies, integrating strategies, and supporting strategies. In this article, a validity argument is laid out for the questionnaire by presenting content-referenced, substantive, and structural evidence of validity, yielded primarily through Rasch measurement and structural equation modeling.
http://link.springer.com/article/10.1007%2Fs40299-013-0083-z
Keywords: metacognitive awareness; reading test; Rasch measurement; structural equation modeling; validity
The purpose of this paper is to examine the psychometric features of the International English Language Competency Assessment (IELCA) listening test. Specifically, it explores the reliability and underlying structure of the test and sheds light on test method effects.
This study reports a novel application of Adaptive Neuro-Fuzzy Inference Systems (ANFIS) to a second language listening test and compares it with path modeling of observed variables. Seven variables were defined and hypothesized to influence the primary dependent variable, test item difficulty. Next, a matrix of these eight variables was developed and subjected to ANFIS and path modeling. The ANFIS analysis found stronger effects for several of the seven explanatory variables. Path modeling captured some of the same effects through a mediating variable, test section, which captures aggregate differences across different subsections of the test. In general, neuro-fuzzy models (NFMs) appear to be a promising tool in language and educational assessment.
Keywords: Adaptive Neuro-Fuzzy Inference Systems (ANFIS); item difficulty; listening test
"This article reports on the development and administration of the Academic Listening Self-Assessment Questionnaire (ALSA). The ALSA was developed on the basis of a proposed model of academic listening comprising six related components.... more
"This article reports on the development and administration of the Academic Listening Self-Assessment Questionnaire (ALSA). The ALSA was developed on the basis of a proposed model of academic listening comprising six related components. The researchers operationalized the model, subjected items to iterative rounds of content analysis, and administered the finalized questionnaire to international ESL (English as a second language) students in Malaysian and Australian universities. Structural equation modeling and Rasch rating scale modeling of data provided content-related, substantive, and structural validity evidence for the instrument. The researchers explain the utility of the questionnaire for educational and assessment purposes.
Keywords: academic listening, language testing, Rasch Rating Scale model self-assessment, structural equation modeling"
Keywords: academic listening, language testing, Rasch Rating Scale model self-assessment, structural equation modeling"
Language self-appraisal (or self-assessment) is a process by which students evaluate their own language competence. This article describes the relationship between students’ self-appraisals and their performance on a measure of academic listening (AL). Following Aryadoust and Goh (2011), AL was defined as a multi-componential construct including cognitive processing skills, linguistic components and prosody, note-taking, relating input to other materials, knowledge of lecture structure, and memory and concentration. Participants (n = 63) completed a self-assessment questionnaire founded on the components of AL presented by Aryadoust and Goh, and a test of academic listening developed by Educational Testing Service (ETS); their performance on the two measures was then correlated. Significant correlations emerged, indicating that learners assessed their listening skills fairly accurately and precisely. Pedagogical implications and applications of self-assessment are discussed in this paper.
http://blog.nus.edu.sg/eltwo/2012/05/29/reliability-of-second-language-listening-self-assessments-implications-for-pedagogy/
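The study above rests on correlating self-assessment scores with test scores. As a reminder of what that computation involves, here is a minimal Pearson correlation sketch in plain Python; the sample data are hypothetical, not the study's (n = 63) data:

```python
def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical self-ratings (1-7 scale) against listening test scores
self_ratings = [2, 3, 3, 4, 5, 5, 6, 7]
test_scores  = [10, 12, 14, 15, 18, 17, 20, 22]
print(round(pearson_r(self_ratings, test_scores), 2))  # 0.99
```

A strong positive r, as in this toy example, is the kind of evidence the study cites for learners assessing their own listening fairly accurately.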
SEM applied to language assessment tools
In the first installment of this article, I reviewed cognitive diagnostic assessment (CDA) and noted its advantages over other latent trait methods. I argued that the difficulty of a task can be accounted for by multiple factors, or attributes. Conventional unidimensional item response theory (IRT) models do not provide information about the factors contributing to task difficulty. The fusion model, a CDA model, by contrast partitions the difficulty parameter to furnish fine-grained information about the tasks and test takers’ ability levels. I further argued that the granularity of the attributes is determined by researchers. In this installment, the application of the fusion model to a while-listening performance (WLP) test is described.
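The fusion model itself (a reduced reparameterized unified model) has more parameters than fit a short sketch, but the core CDA idea can be illustrated with the simpler DINA model from the same family: an examinee's response probability depends on whether they have mastered every attribute the item requires. The attribute names and parameter values below are hypothetical:

```python
def dina_probability(mastered, required, guess, slip):
    """DINA model: if the examinee has mastered every attribute the item
    requires, success probability is 1 - slip; otherwise it is guess."""
    has_all = all(a in mastered for a in required)
    return 1 - slip if has_all else guess

# Hypothetical listening item requiring two subskills
required = {"explicit_info", "paraphrase"}
print(dina_probability({"explicit_info", "paraphrase"}, required, 0.2, 0.1))  # 0.9
print(dina_probability({"explicit_info"}, required, 0.2, 0.1))               # 0.2
```

Fitting such a model to response data yields a mastery profile per test taker, which is the fine-grained diagnostic information the article attributes to CDA.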
The most important property of a measurement tool is the validity of the uses and interpretations of its scores. Test developers attempt to establish validity using different techniques. Conventionally, validity has subsumed content, criterion, predictive, and construct classes. The newer argument-based approach, however, centers on the uses and interpretations of test scores. The argument-based approach to validity was introduced to the field by Kane (1992, 2001, 2002, 2004, 2006) (see also Mislevy, Steinberg, & Almond, 2003; Kane, Crooks, & Cohen, 1999; Mislevy, 2003; Koenig & Bachman, 2004; Bachman, 2005).
This study investigates the psychometric quality of a placement tool to assess English as a second language (ESL) writing. The author proposes an ESL writing model comprising four major facets: examinees, raters, tasks, and scoring criteria. The model has five scoring criteria: relevance and adequacy of content, compositional organization, cohesion, adequacy of vocabulary, and grammar; these are evaluated using a seven-point scoring rubric. The data were subjected to many-facets Rasch analysis which showed that the facets adapted in the present study functioned according to the expectations of the Rasch model in a few cases, but further studies should address the psychometric properties of the rating scale which might be a major cause of central tendency error.
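In many-facet Rasch analysis, as used in the study above, each facet (examinee, rater, task, scoring criterion) contributes additively to the log-odds of a higher score. A minimal, dichotomized sketch (real rating-scale data would add category thresholds; all measures below are hypothetical):

```python
import math

def mfrm_probability(ability, rater_severity, task_difficulty, criterion_difficulty):
    """Many-facet Rasch sketch: the log-odds of success are the examinee's
    ability minus the measures of the other facets."""
    logit = ability - rater_severity - task_difficulty - criterion_difficulty
    return 1.0 / (1.0 + math.exp(-logit))

# A severe rater (+0.5 logits) lowers the expected probability relative
# to a lenient one (-0.5 logits), holding the other facets constant.
lenient = mfrm_probability(1.0, -0.5, 0.2, 0.1)
severe  = mfrm_probability(1.0,  0.5, 0.2, 0.1)
print(round(lenient, 2), round(severe, 2))  # 0.77 0.55
```

Separating facets this way is what lets the analysis attribute score variance to raters or criteria rather than to examinees alone.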
Several studies have evaluated sentence structure and vocabulary (SSV) as a scoring criterion in assessing writing, but no consensus on its functionality has been reached. The present study presents evidence that this scoring criterion may not be appropriate in writing assessment. Scripts by 182 ESL students at two language centers were analyzed with the Rasch partial credit model. Although other scoring criteria functioned satisfactorily, SSV scores did not fit the Rasch model, and analysis of residuals showed SSV scoring on most test prompts loaded on a benign secondary dimension. The study proposes that a lexico-grammatical scoring criterion has potentially conflicting properties, and therefore recommends considering separate vocabulary and grammar criteria in writing assessment.
This paper integrates the Rasch validity model (Wright & Stone, 1988, 1999) into the argument-based validity framework (Kane, 1992, 2004). Rasch validity subsumes fit validity and order validity. Order validity has two subcategories: meaning validity (originating in the calibration of test variables) and utility validity (based on the calibration of persons to implement criterion validity). Fit validity concerns the consistency of response patterns. From 1) analysis of residuals, i.e., the differences between the Rasch model’s expectations and the observed responses; 2) analysis of item fit, which can help revise the test; and 3) analysis of person fit, which can help diagnose test takers whose performance does not fit expectations, we obtain response validity, item function validity, and person performance validity, respectively.
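The residual-based fit analysis described above is commonly summarized with infit and outfit mean-square statistics. A minimal sketch of how they are computed from model probabilities and observed 0/1 responses (the data below are hypothetical):

```python
def fit_statistics(responses, probabilities):
    """Rasch residual fit: outfit is the mean squared standardized residual;
    infit is information-weighted (squared residuals over summed variances)."""
    sq_resid = [(x - p) ** 2 for x, p in zip(responses, probabilities)]
    variance = [p * (1 - p) for p in probabilities]
    outfit = sum(r / v for r, v in zip(sq_resid, variance)) / len(responses)
    infit = sum(sq_resid) / sum(variance)
    return infit, outfit

# A person whose responses follow the model closely (Guttman-like pattern)
probs = [0.9, 0.8, 0.6, 0.4, 0.2]   # modelled success probabilities
obs   = [1, 1, 1, 0, 0]             # observed responses
infit, outfit = fit_statistics(obs, probs)
print(round(infit, 2), round(outfit, 2))  # 0.46 0.39
```

Values near 1.0 indicate fit to the model; values well below 1 (as in this overly deterministic toy pattern) indicate overfit, and values well above 1 indicate misfit, i.e., the inconsistent response patterns the paper's fit validity targets.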
In a series of YouTube videos, I provide systematic guidelines for using SPSS, WINSTEPS, and other statistical software and interpreting their output. Please subscribe to receive notifications when new videos are released:
https://www.youtube.com/channel/UCfu2GCdjq50W-kL-cv3rcLw?view_as=subscriber
Research Interests: Multivariate Statistics, Structural Equation Modeling, Testing, Rasch Models, Latent Class Models, YouTube, Statistics with SPSS and Excel, Factor Analysis, Correlation, ANOVA, Latent Gold, Multiple Linear Regression, ANCOVA, MANOVA, Repeated Measures, Normal Distribution, MANCOVA, and SPM 2014
Special Issue on Research into Learner Listening
Guest Editors: Christine C. M. Goh and Vahid Aryadoust
Special Issue on Using Assessment Tasks for Improving Second Language Writing
EDUCATIONAL PSYCHOLOGY
AN INTERNATIONAL JOURNAL OF EXPERIMENTAL EDUCATIONAL PSYCHOLOGY
It has been argued that item difficulty can affect the fit of a confirmatory factor analysis (CFA) model (McLeod, Swygert, & Thissen, 2001; Sawaki, Stricker, & Oranje, 2009). We explored the effect of items with outlying difficulty measures on the CFA model of the listening module of the International English Language Testing System (IELTS). The test has four sections comprising 40 items altogether (10 items per section). Each section measures a different listening skill, making the test a conceptually four-dimensional assessment instrument...
Paper presented at the fourth ALTE, Krakow, Poland.
Research into the psychological and cognitive aspects of language learning, and second language (L2) learning in particular, demands new measurement tools that provide highly detailed information about language learners’ progress and proficiency. A new development in measurement models is Cognitive Diagnostic Assessment (CDA), which helps language assessment researchers evaluate students’ mastery of specific language sub-skills with greater specificity than other item response theory models. This paper discusses the tenets of CDA models in general and the fusion model (FM) in particular, and reports the results of a study applying the FM to the lecture-comprehension section of the International English Language Testing System (IELTS) listening module. The FM separates only two major listening sub-skills (i.e., the ability to understand explicitly stated information and the ability to make close paraphrases), likely indicating construct under-representation. It also provides a mastery/non-mastery profile of test takers. Implications for assessing listening comprehension and the IELTS are discussed.
The application of MFRM to writing tests of English.
factor structure of the IELTS listening test
This report reviews three prominent conceptualizations of validity (i.e., Embretson, 1983; Kane, 2002; Messick, 1989) to lay out a validity argument (VA) for the International English Proficiency Test (IEPT). To build and support the VA for the IEPT, we endorse Kane’s (2004, 2006, 2012) conceptualization which defines validity as a two-stage undertaking: making claims about the uses and interpretations of the scores (or interpretive argument) and evaluating the claims (or VA). The report further proposes several rigorous research methods and psychometric models to support the VA. The document, however, does not compare these concepts. For further information, readers are referred to Aryadoust (forthcoming).
Although language assessment and testing can be viewed as having a much longer history (Spolsky, 2017; Farhady, 2018), its genesis as a research field is often attributed to Carroll’s (1961) and Lado’s (1961) publications. Over the past decades, the field has gradually grown in scope and sophistication as researchers have adopted various interdisciplinary approaches to problematize and address old and new issues in language assessment as well as learning. The assessment and validation of reading, listening, speaking, and writing, as well as language elements such as vocabulary and grammar have formed the basis of extensive studies (e.g., Chapelle, 2008). Emergent research areas in the field include the assessment of sign languages (Kotowicz et al., 2021). In addition, researchers have employed a variety of psychometric and statistical methods to investigate research questions and hypotheses (see chapters in Aryadoust and Raquel, 2019, 2020). The present special issue entitled “Front...
Test fairness has been recognised as a fundamental requirement of test validation. Two quantitative approaches to investigate test fairness, the Rasch-based differential item functioning (DIF) detection method and a measurement invariance technique called multiple indicators, multiple causes (MIMIC), were adopted and compared in a test fairness study of the Pearson Test of English (PTE) Academic Reading test (n = 783). The Rasch partial credit model (PCM) showed no statistically significant uniform DIF across gender and, similarly, the MIMIC analysis showed that measurement invariance was maintained in the test. However, six pairs of significant non-uniform DIF (p < 0.05) were found in the DIF analysis. A discussion of the results and post-hoc content analysis is presented and the theoretical and practical implications of the study for test developers and language assessment are discussed.
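The DIF analysis described above compares how difficult an item is for different groups after accounting for ability. A deliberately crude sketch of the idea, estimating an item's difficulty within each group from observed proportions correct; a real Rasch DIF analysis would anchor person abilities across groups, which this illustration omits, and the response data are hypothetical:

```python
import math

def item_difficulty_logits(responses):
    """Crude difficulty estimate: minus the log-odds of the observed
    proportion correct (harder items have higher logits)."""
    p = sum(responses) / len(responses)
    return -math.log(p / (1 - p))

def dif_contrast(group_a, group_b):
    """DIF contrast: difference between the item's difficulty as
    estimated separately within each group."""
    return item_difficulty_logits(group_a) - item_difficulty_logits(group_b)

# Hypothetical 0/1 responses to one item from two gender groups
female = [1, 1, 1, 0, 1, 1, 0, 1]   # 75% correct
male   = [1, 0, 1, 0, 1, 0, 1, 0]   # 50% correct
print(round(dif_contrast(female, male), 2))  # -1.1
```

In operational Rasch DIF studies, a contrast of roughly half a logit or more, combined with statistical significance, is typically flagged for content review, which is what the post-hoc analysis in the study does.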
This study evaluated the validity of the Michigan English Test (MET) Listening Section by investigating its underlying factor structure and the replicability of its factor structure across multiple...
Social interactions accompany individuals throughout their lives. When examining the underlying mechanisms of social processes, dynamics of synchrony, coordination, or attunement emerge between individuals at multiple levels. To identify the impactful publications that studied such mechanisms and to establish the trends from which the available literature dynamically originated, the current study adopted a scientometric approach. A sample of 543 documents dated from 1971 to 2021 was derived from Scopus. Subsequently, a document co-citation analysis was conducted on 29,183 cited references to examine the patterns of co-citation among the documents. The resulting network consisted of 1,759 documents connected to each other by 5,011 links. Within the network, five major clusters were identified. The analysis of the content of the three major clusters, namely "Behavioral synchrony," "Towards bio-behavioral synchrony," and "Neural attunement," suggests an interest in studying attunement in...
This study investigates the underlying structure of the listening test of the Singapore–Cambridge General Certificate of Education (GCE) exam, comparing the fit of five cognitive diagnostic assessment models comprising the deterministic input noisy “and” gate (DINA), generalized DINA (G-DINA), deterministic input noisy “or” gate (DINO), higher-order DINA (HO-DINA), and the reduced reparameterized unified model (RRUM). Through model-comparisons, a nine-subskill RRUM model was found to possess the optimal fit. This study shows that students’ listening test performance depends on an array of test-specific facets, such as the ability to eliminate distractors in multiple-choice questions alongside listening-specific subskills such as the ability to make inferences. The validated list of the listening subskills can be employed as a useful guideline to prepare students for the GCE listening test at schools.
This article proposes an integrated cognitive theory of reading and listening that draws on a maximalist account of comprehension and emphasizes the role of bottom-up and top-down processing. The theoretical framework draws on the findings of previous research and integrates them into a coherent and plausible narrative to explain and predict the comprehension of written and auditory inputs. The theory is accompanied by a model that schematically represents the fundamental components of the theory and the comprehension mechanisms described. The theory further highlights the role of perception and word recognition (underresearched in reading research), situation models (missing in listening research), mental imagery (missing in both streams), and inferencing. The robustness of the theory is discussed in light of the principles of scientific theories adopted from Popper (1959).
The effectiveness of a language test to meaningfully diagnose a learner’s language proficiency remains in some doubt. Alderson (2005) claims that diagnostic tests are superficial because they do not inform learners what they need to do in order to develop; “they just identify strengths and weaknesses and their remediation” (p. 1). In other words, a test cannot claim to be diagnostic unless it facilitates language development in the learner. In response to the perceived need for a mechanism to both provide diagnostic information and specific language support, four Hong Kong universities have developed the Diagnostic English Language Tracking Assessment (DELTA), which could be said to be meaningfully diagnostic because it is both integrated into the English language learning curriculum and used in combination with follow-up learning resources to guide independent learning.
The purpose of the present study was twofold: (a) it examined the relationship between peer-rated likeability and peer-rated oral presentation skills of 96 student presenters enrolled in a science communication course, and (b) it investigated the relationship between student raters’ severity in rating presenters’ likeability and their severity in evaluating presenters’ skills. Students delivered an academic presentation and then changed roles to rate their peers’ performance and likeability, using an 18-item oral presentation scale and a 10-item likeability questionnaire, respectively. Many-facet Rasch measurement was used to validate the data, and structural equation modeling (SEM) was used to examine the research questions. At an aggregate level, likeability explained 19.5% of the variance of the oral presentation ratings and 8.4% of rater severity. At an item-level, multiple cause-effect relationships were detected, with the likeability items explaining 6–30% of the variance in the oral presentation items. Implications of the study are discussed.
Validity evidence is provided for a Persian blog attitude questionnaire (P-BAQ). The P-BAQ was administered to 565 Iranians, and factor analysis and a rating scale model identified affective, behavioral, perseverance, and confidence dimensions underlying the data. The P-BAQ’s validity argument was supported by theoretical and psychometric evidence, although adding a few items to the instrument would improve its construct representativeness.
Recommended Citation
Aryadoust, Vahid and Shahsavar, Zahra (2016) "Validity of the Persian Blog Attitude Questionnaire: An Evidence-Based Approach," Journal of Modern Applied Statistical Methods: Vol. 15: Iss. 1, Article 22.
Available at: http://digitalcommons.wayne.edu/jmasm/vol15/iss1/22
The Kurdish language is mainly spoken in Iran, Iraq, Turkey, and Syria. Some dialects of this language still retain features it has had in the past, among them Hawrami, which is mainly spoken in western Iran (along with other areas ...
Modelling listening item difficulty remains a challenge to this day. Latent trait models such as the Rasch model, used to predict the outcomes of test takers’ performance on test items, have been criticized as “thin on substantive theory” (Stenner, Stone, & Burdick, 2011, p. 3). The use of regression models to predict item difficulty also has its limitations, because linear regression assumes linearity and normality of the data which, if violated, result in a lack of fit. In addition, classification and regression trees (CART), despite their rigorous algorithm, do not always yield a stable tree structure (Breiman, 2001). Another problem pertains to the operationalization of dependent variables. Researchers have relied on content specialists or verbal protocols elicited from test takers to determine the variables predicting item difficulty. However, even though content specialists are highly competent, they may not be able to determine precisely the lower-level comprehension processes used ...
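The regression approach criticized above can be made concrete with a minimal least-squares sketch: one text-based feature predicting item difficulty in logits. The feature, its values, and the difficulty measures below are all hypothetical, and a real study would use multiple predictors and check the linearity and normality assumptions the abstract flags:

```python
def ols_slope_intercept(x, y):
    """Simple least-squares fit of y = a + b*x (one predictor for illustration)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    return a, b

# Hypothetical feature: speech rate (words/sec) predicting item difficulty (logits)
speech_rate = [2.0, 2.5, 3.0, 3.5, 4.0]
difficulty  = [-1.0, -0.4, 0.1, 0.6, 1.1]
a, b = ols_slope_intercept(speech_rate, difficulty)
print(round(a, 2), round(b, 2))  # -3.04 1.04
```

A positive slope here would mean faster speech predicts harder items; the abstract's point is that such linear fits break down when the feature-difficulty relationship is nonlinear, which motivates the alternative models it considers.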
This article reports on the development and administration of the Academic Listening Self-rating Questionnaire (ALSA). The ALSA was developed on the basis of a proposed model of academic listening comprising six related components. The researchers operationalized the model, subjected items to iterative rounds of content analysis, and administered the finalized questionnaire to international ESL (English as a second language) students in Malaysian and Australian universities. Structural equation modeling and rating scale modeling of data provided content-related, substantive, and structural validity evidence for the instrument. The researchers explain the utility of the questionnaire for educational and assessment purposes.
Research into the psychological and cognitive aspects of language learning, and second language (L2) learning in particular, demands new measurement tools that provide highly detailed information about language learners’ progress and proficiency. A new development in measurement models is Cognitive Diagnostic Assessment (CDA), which helps language assessment researchers evaluate students’ mastery of specific language sub-skills with greater specificity than other item response theory models. This paper discusses the tenets of CDA models in general and the fusion model (FM) in particular, and reports the results of a study applying the FM to the lecture-comprehension section of a practice version of the International English Language Testing System (IELTS) listening module. The FM separates only two major listening sub-skills (i.e., the ability to understand explicitly stated information and the ability to make close paraphrases), likely indicating construct under-representation. It also provides a mastery/non-mastery profile of test takers. Implications for assessing listening comprehension and the IELTS are discussed.