Hongli Li
Georgia State University, Educational Policy Studies, Faculty Member
- Language Testing, English Language Learners, Educational Measurement, Cognitive Diagnostic Assessment, Structural Equation Modeling, Hierarchical Linear Modeling, Quantitative Methods, Formative Assessment, Psychometrics, Language Assessment, Peer Assessment, Measurement and Evaluation, Multilevel Modeling of Educational Data, Assessment and Measurement, Assessment, Language Testing and Assessment, Educational measurement/assessment, Psychometrics (Research Methodology), Item Response Theory, and Psychological and Educational Testing
- Hongli Li is a professor in the Research, Measurement and Statistics (RMS) program of the Educational Policy Studies Department at Georgia State University.
She graduated from the Pennsylvania State University in 2011 with a Ph.D. in educational psychology specializing in educational measurement.
Her primary research areas are applied measurement and quantitative methods in education. In particular, she is interested in how testing influences teaching and learning (cognitive diagnostic modeling and formative assessment), peer assessment, and measurement issues in the online learning environment.
The purpose of this study is to review the status of differential item functioning (DIF) research in language testing, particularly as it relates to the investigation of sources (or causes) of DIF, which is a defining characteristic of the third generation DIF. This review included 110 DIF studies of language tests dated from 1985 to 2019. We found that DIF researchers did not address sources of DIF more frequently in recent years than in earlier years. Nevertheless, DIF research in language testing has expanded with new DIF analysis procedures, more grouping variables, and more diversified methods for investigating sources of DIF. In addition, in the early years of DIF research, methods to identify sources of DIF relied heavily on content analysis. This review showed that while more sophisticated statistical procedures have been adopted in recent years to address sources of DIF, understanding sources of DIF still remains a challenging task. We also discuss the pros and cons of existing methods to detect sources of DIF and implications for future investigations.
Peer assessment is increasingly being used as a pedagogical tool in classrooms. Participating in peer assessment may enhance student learning in both cognitive and non-cognitive aspects. In this study, we focused on non-cognitive aspects by performing a meta-analysis to synthesize the effect of peer assessment on students’ non-cognitive learning outcomes. After a systematic search, we included 43 effect sizes from 19 studies, which mostly involved learning strategies and academic mind-sets as non-cognitive outcomes. Using a random effects model, we found that students who had participated in peer assessment showed a 0.289 standard deviation unit improvement in non-cognitive outcomes as compared to students who had not participated in peer assessment. Further, we found that the effect of peer assessment on non-cognitive outcomes was significantly larger when both scores and comments were provided to students or when assessors and assessees were matched at random. Our findings can be used as a basis for further investigation into how best to use peer assessment as a learning tool, especially to promote non-cognitive development.
Background: The aim of this study was to investigate the relationship between executive function (EF), stuttering, and comorbidity by examining children who stutter (CWS) and children who do not stutter (CWNS) with and without comorbid conditions. Data from the National Health Interview Survey were used to examine behavioral manifestations of EF, such as inattention and self-regulation, in CWS and CWNS. Methods: The sample included 2258 CWS (girls = 638, boys = 1620) and 117,725 CWNS (girls = 57,512; boys = 60,213). EF and the presence of stuttering and comorbid conditions were based on parent report. Descriptive statistics were used to describe the distribution of stuttering and comorbidity across group and sex. Regression analyses were used to determine the effects of stuttering and comorbidity on EF, and the relationship between EF and socioemotional competence. Results: Results point to weaker EF in CWS compared to CWNS. Having comorbid conditions was also associated with weaker EF. CWS with comorbidity showed the weakest EF compared to CWNS with and without comorbidity, and CWS without comorbidity. Children with stronger EF showed higher socioemotional competence. A majority (60.32%) of CWS had at least one other comorbid condition in addition to stuttering. Boys who stutter were more likely to have comorbid conditions compared to girls who stutter. Conclusion: Present findings suggest that comorbidity is a common feature in CWS. Stuttering and comorbid conditions negatively impact EF.
With online course delivery on the rise, it is essential to understand the preparedness of students attending traditional universities. Prior research has found that some students struggle in online courses, which leads to a quest to better understand the reason why. Studies of self-regulated learning (SRL) in online and blended courses have added to our understanding. However, few studies have used a person-centered approach to study profiles of SRL in fully online courses, and none have done so with a population of students attending a traditional university. This is of importance, especially at a time when traditional universities are increasingly providing online courses. To address the gaps in previous SRL profile research, the current study examined individual differences in SRL profiles of 477 students attending online courses in a traditional university setting, using the Online Self-regulated Learning Questionnaire (OSLQ). Using latent profile analysis, we found four different profiles, with a majority of the students falling in groups representing lower levels of SRL skills. We also explored the possible relationship of experience in online learning, online comfort, age, and gender with the identified self-regulated learning profiles. Relationships were found between the profiles and comfort level as well as with gender.
Despite increasing pressure for children to learn to write at younger ages, there are many unanswered questions about composition skills in early elementary school. The goal of this research was to examine the dimensionality of composition skills in kindergarten children, thereby adding to current knowledge about the measurement of young children’s writing and its component skills. The writing of 282 kindergarten children was assessed using three different scoring methods. Confirmatory factor analyses were used to investigate the dimensionality of various methods of scoring. Results indicated that a qualitative scoring system and a productivity scoring system capture distinct dimensions of kindergartners’ compositions. A scoring system for curriculum-based measurement (CBM) could not attain acceptable fit, which may suggest that CBM is ill-suited for capturing the important components of composition for kindergartners. This study indicated that the measurement and components of composition in kindergarten may be qualitatively different from the compositions of older children.
A major challenge in research with struggling adult readers is their heterogeneity in reading-related competencies and demographic characteristics. The purpose of this investigation was to identify unique profiles of skill sets among struggling adult readers and explore informative demographic differences between profiles. Using latent class analysis with a sample of 542 struggling adult readers, we uncovered four empirically distinct classes of readers based on their performance on ten assessments of lower-level and higher-level competencies. On all measured competencies, globally impaired readers (n = 123) demonstrated the largest deficits and globally better readers (n = 86) outperformed all other classes. Two intermediate profiles, weak decoders (n = 144) and weak language comprehenders (n = 189), exhibited complementary patterns of strengths and weaknesses on lower-level and higher-level competencies. One-way ANOVA and chi-square tests of difference indicated that the classes differed significantly in terms of reading comprehension performance, age, and language background but not high school completion. Implications for instruction and future research are discussed.
Researchers have been interested in classifying massive open online course (MOOC) students based on their learning behaviors. However, less attention has been paid to the cognitive attributes associated with various learning behaviors. In this study, we propose a conceptual model that links MOOC students’ observable learning behaviors to their latent attributes (i.e., individual learning versus interactive learning). Using students’ behavior data from a MOOC, we performed a cognitive diagnostic analysis to identify the students’ learning profiles and to determine how these profiles related to their course achievement. We found that a large portion of the students performed individual learning whereas only a very small portion of them overtly performed interactive learning. In addition, the students who performed interactive learning were more likely to pass the course with distinction than the students who did not show this attribute. The results of this study have important implications for improving students’ learning in MOOCs. Further, the study provides a good demonstration of how to use clickstream process data for psychometric analysis.
The Classroom Assessment Scoring System (CLASS) has been used extensively to measure teacher-student interactions and classroom quality. With a theoretical foundation rooted in the developmental theory of learning, CLASS has three primary domains—Emotional Support, Classroom Organization, and Instructional Support. In this study, we performed a meta-analysis of the factor structure of CLASS using Cheung’s two-stage structural equation modeling (TSSEM) approach. After searching the literature, we obtained 26 correlation matrices of the 10 dimensions shared by multiple versions of CLASS. This meta-analysis supports the three-factor model initially proposed by CLASS developers. The finding of this meta-analysis provides important evidence pertinent to the CLASS factor structure and has significant implications regarding the interpretation and use of CLASS scores.
This study explored the relations between reading comprehension and two memory capacities, short‐term memory (STM) and working memory (WM), for adults who read between the third and eighth grade levels. With a sample of 407 adults from two countries, we computed correlations among measures and conducted hierarchical regression and commonality analyses for reading comprehension. Reading comprehension had moderate positive correlations with STM and WM. Additionally, STM and WM jointly accounted for approximately 19% of the reading comprehension variance and uniquely contributed approximately 4% and 7% of the variance, respectively. The predictive utility of memory to reading comprehension was greatly reduced after controlling for age, word reading, fluency and oral vocabulary. WM appears to be a slightly stronger predictor of reading comprehension than STM for struggling adult readers. However, the overall contributions of memory capacities to reading comprehension are much smaller than those of reading‐related skills.
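The variance figures in this abstract can be read as a two-predictor commonality partition. The sketch below is only an arithmetic illustration under that assumption: the function name is hypothetical, the inputs are the rounded percentages quoted above, and the shared component it derives is not a figure reported in the abstract.

```python
def commonality_two_predictors(r2_full, unique_a, unique_b):
    """Split the R^2 of a two-predictor model into unique and shared parts.

    With two predictors, the full-model R^2 equals the sum of each
    predictor's unique contribution plus the variance they explain jointly.
    """
    common = r2_full - unique_a - unique_b
    return {"unique_a": unique_a, "unique_b": unique_b, "common": common}


# Rounded values taken from the abstract: STM and WM together explain
# about 19% of the variance; STM uniquely about 4%, WM uniquely about 7%.
parts = commonality_two_predictors(r2_full=0.19, unique_a=0.04, unique_b=0.07)
print({name: round(value, 2) for name, value in parts.items()})
```

Because the inputs are approximate, the shared component obtained this way (roughly 8% of the variance) should be treated as indicative only.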
Measuring functional impairment holds an important place in research, clinical practice, and service provision for children and adolescents. Responding to the growing need to measure serious emotional disturbances at the local, state, and national level, the Columbia Impairment Scale (CIS) was developed in the early 1990s and has remained one of several popular scales for assessing functional impairment. However, despite the growing popularity of the instrument in research and practice, only a few studies to date have specifically examined the psychometric properties of the CIS. In this article, we describe the results of the first item response theory analysis of the CIS utilizing nationally representative data from the Medical Expenditure Panel Survey (N = 69,966). The results of our analysis lend support to the essential unidimensionality of the CIS and demonstrate that the scale is most reliable for those who exhibit high levels of functional impairment. Given the psychometric properties of the scale identified by our analysis, we contend that the CIS is a viable measure in the ongoing efforts to establish a national epidemiologic surveillance system to track the prevalence and impact of serious emotional disturbances in children and adolescents.
In recent years, students’ test scores have been used to evaluate teachers’ performance. The assumption underlying this practice is that students’ test performance reflects teachers’ instruction. However, this assumption is generally not empirically tested. In this study, we attempted to examine the effect of teachers’ instruction on test performance at the item level. Using the U.S. TIMSS 2011 4th-grade math assessment data, we examined the instructional sensitivity of the items using a hierarchical differential item functioning (DIF) approach. Specifically, we tested whether students who had received instruction on a given item showed significantly better performance on the item than students who had not received such instruction when their overall math ability was controlled for, both with and without controlling for student-level and class-level covariates. The study provided preliminary findings regarding why some items showed instructional sensitivity and shed some light on how to develop instructionally sensitive items. Implications and directions for further research were also discussed.
Drawing on the PISA 2009 US dataset, this study examines the relationship between formative assessment and students’ reading achievement using a structural equation modeling approach. We find that formative assessment is positively related to students’ reading achievement directly and indirectly (through teacher–student relationship and attitude toward reading) for all students. The direct relationship between formative assessment and reading achievement is significantly stronger for Black students than for White students, whether or not student SES, gender, and school mean SES are controlled for. The total relationship (the direct plus the indirect relationship) between formative assessment and reading achievement also appears to be stronger for Black students than for White students; however, the difference is not statistically significant whether or not we control for covariates. No significant difference is found between White and Hispanic students in terms of the direct and the total relationship between formative assessment and reading achievement. Using a nationally representative dataset, this study provides empirical evidence that formative assessment is positively related to students’ reading achievement in general. In addition, this study provides preliminary evidence to show the potential of formative assessment to help reduce achievement gaps between Black and White students. The implications and limitations of the study are also discussed.
Cognitive diagnostic models (CDMs) have great promise for providing diagnostic information to aid learning and instruction, and a large number of CDMs have been proposed. However, the assumptions and performances of different CDMs and their applications in regard to reading comprehension tests are not fully understood. In the present study, we compared the performance of a saturated model (G-DINA), two compensatory models (DINO, ACDM), and two non-compensatory models (DINA, RRUM) with the Michigan English Language Assessment Battery (MELAB) reading test. Compared to the saturated G-DINA model, the ACDM showed comparable model fit and similar skill classification results. The RRUM was slightly worse than the ACDM and G-DINA in terms of model fit and classification results, whereas the more restrictive DINA and DINO performed much worse than the other three models. The findings of this study highlighted the process and considerations pertinent to model selection in applications of CDMs with reading tests.
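To make the contrast between the most restrictive models concrete, the following sketch computes the probability of a correct response under the standard DINA (all required skills needed) and DINO (any one required skill suffices) formulations. It is a minimal illustration only: the function names, the Q-matrix row, and the slip and guess values are hypothetical and are not taken from the MELAB analysis.

```python
from math import prod


def dina_prob(alpha, q, slip, guess):
    """DINA: success is likely only if ALL skills required by the item are mastered."""
    # eta is 1 when every required skill (q_k = 1) is mastered (alpha_k = 1).
    eta = prod(a ** qk for a, qk in zip(alpha, q))
    return 1 - slip if eta == 1 else guess


def dino_prob(alpha, q, slip, guess):
    """DINO: success is likely if AT LEAST ONE required skill is mastered."""
    # omega is 1 when any required skill is mastered.
    omega = 1 - prod((1 - a) ** qk for a, qk in zip(alpha, q))
    return 1 - slip if omega == 1 else guess


# Hypothetical item requiring skills 1 and 2 (but not 3), answered by an
# examinee who has mastered skills 1 and 3 only.
q_row = [1, 1, 0]
profile = [1, 0, 1]
print("DINA:", dina_prob(profile, q_row, slip=0.1, guess=0.2))
print("DINO:", dino_prob(profile, q_row, slip=0.1, guess=0.2))
```

For this examinee the two models disagree: DINA treats the missing second skill as decisive, whereas DINO lets mastery of the first skill compensate. Less restrictive models such as ACDM and G-DINA allow intermediate probabilities for partial mastery.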
In this study, we examined relationships between the use of test results and U.S. students’ math, reading, and science performance in the Programme for International Student Assessment (PISA) 2009. Based on a literature review, we hypothesized that the 16 items in the PISA school questionnaire that relate to the use of test results can be categorized according to four factors. We validated this hypothesized factor structure using a confirmatory factor analysis and then obtained composite scores for each factor. As revealed by a multilevel analysis, when student and school demographic variables were controlled for, using test results to hold schools accountable to authority and the public was significantly positively related to students’ performance across all three subjects. No statistically significant relationship, however, was detected between students’ performance and the following uses of test scores: informing parents of their children’s performance, providing information for instructional purposes, and evaluating teachers and principals.
Read-aloud accommodations have been proposed as a way to help remove barriers faced by students with disabilities in reading comprehension. Many empirical studies have examined the effects of read-aloud accommodations; however, the results are mixed. With a variance-known hierarchical linear modeling approach, based on 114 effect sizes from 23 studies, a meta-analysis was conducted to examine the effects of read-aloud accommodations for students with and without disabilities. In general, both students with disabilities and students without disabilities benefited from the read-aloud accommodations, and the accommodation effect size for students with disabilities was significantly larger than the effect size for students without disabilities. Further, this meta-analysis reveals important factors that influence the effects of read-aloud accommodations. For instance, the accommodation effect was significantly stronger when the subject area was reading than when the subject area was math. The effect of read-aloud accommodations was also significantly stronger when the test was read by human proctors than when it was read by video/audio players or computers. Finally, the implications, limitations, and directions for future research are discussed.
As engagement with science, technology, engineering, and mathematics (STEM) increases in after-school programs (ASPs), it is important to examine the impact of this engagement on students' academic achievement, STEM participation, and affinity toward STEM. Results of these examinations can offer insights into both best practices that could be replicated and possible poor practices that could be avoided in ASP sites. This study describes the validation process that was undertaken on an instrument developed to measure science-related attitudes, and education and career trajectories of students participating in a STEM-focused ASP. We then use the validated instrument to draw certain conclusions about the impact of the ASP program on the participants. We propose a model for predicting students' notions about the importance of science for their future and a model for predicting students' enactment of science agency. The study and the derived instrument may be useful for those interested in examining the impact of STEM-focused ASPs on students' attitudes and proclivities toward science.
With cognitive diagnostic analysis, each examinee receives a multidimensional skill profile expressing whether he/she is a master or nonmaster of each skill measured by the test. Fine-grained diagnostic feedback that facilitates teaching and learning can thus be provided to teachers and students. This study investigated cognitive diagnostic analysis as applied to the Michigan English Language Assessment Battery (MELAB) reading test. The Fusion Model (Hartz, 2002) was used to estimate examinee profiles on each reading subskill underlying the MELAB reading test. With data collected from multiple sources, such as the think-aloud protocol and expert rating, a tentative Q-matrix was initially developed to indicate the subskills required by each item. This Q-matrix was then validated via an application of the Fusion Model using data from the MELAB reading test. Four subskills were found to underlie the test: vocabulary, syntax, extracting explicit information, and understanding implicit information. Examinee skill mastery profiles were produced as the result of the cognitive diagnostic analysis. Finally, issues involved in the cognitive diagnostic analysis of reading tests were discussed, and areas for future research were also suggested.
In the present study we examined the ability of American and Chinese undergraduate students to calibrate their understanding of textbook passages translated into their native languages. Students read a series of texts and made predictions of their understanding of each text as well as the number of questions they would be able to answer correctly. Students also made postdictions of their test performance. Chinese students were significantly better than American students in calibrating their understanding of passages and predicting how many comprehension items they would answer correctly. Chinese students also outperformed American students on comprehension tests. All students were able to make more accurate postdictions of comprehension test scores than predictions. Results are related to possible instructional differences between American and Chinese students. Several possible directions for future research are discussed.
The College English Test (CET) in China is a high-stakes standardized test to assess college students' English ability. One frequent claim against this test is that teachers may teach to the test, which could narrow the curriculum and turn regular English classes into CET coaching. This study aims to find out whether teachers are truly teaching to the test and the potential reasons involved. In order to gain deeper and more focused insight into the influence of the CET on classroom teaching, only its writing section was examined. Based on data collected from some students and teachers at a university in Beijing, China, it was found that the overall influence of the CET writing section was not as substantial as has been claimed. Due to different stakeholders' perceptions of the CET, the influence on teachers was weak and indirect compared to a stronger and more direct influence on students. Also, teachers did not teach to the test due to the lower priority of writing among the four language skills. The relatively low requirements of the CET writing section and its restrictive testing format also prevented the teachers from teaching to the test. Finally, the teachers' lack of professional training and some logistic factors outweighed the influence of the CET writing section. The study concludes that teacher factors may outweigh the influence of the CET, and thus rigorous teacher training should be provided to improve the efficiency of classroom teaching.