Online calibration is a technology-enhanced architecture for item calibration in computerized adaptive tests (CATs). Many CATs are administered continuously over long periods and rely on large item banks. To ensure test validity, these item banks need to be replenished frequently with new items, which must be pretested before they are used operationally. Online calibration dynamically embeds pretest items in operational tests and calibrates their parameters as response data accumulate through continuous test administration. This study extends existing formulas, procedures, and algorithms for dichotomous item response theory models to the generalized partial credit model, a popular model for items scored in more than two categories. A simulation study was conducted to investigate the developed algorithms and procedures under a variety of conditions, including two estimation algorithms, three pretest item selection methods, three seeding locations, two num...
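The generalized partial credit model (GPCM) at the center of this study can be made concrete with a short Python sketch. It computes GPCM category probabilities and the log-likelihood of one pretest item's responses when each examinee's operational ability estimate is treated as fixed, which is one common way to frame online calibration. The function names, the omission of a scaling constant D, and the fixed-ability framing are assumptions for illustration; this is not the study's estimation algorithms.

```python
import math
from typing import List, Sequence


def gpcm_probs(theta: float, a: float, steps: Sequence[float]) -> List[float]:
    """Category probabilities P(X = k | theta), k = 0..m, under the GPCM.

    `a` is the slope and `steps` holds the step parameters b_1..b_m.
    Category 0 has an empty sum by convention. No scaling constant D is
    used (an assumption; some programs multiply a by 1.7).
    """
    # Cumulative sums z_k = sum_{v=1..k} a * (theta - b_v), with z_0 = 0.
    z = [0.0]
    for b in steps:
        z.append(z[-1] + a * (theta - b))
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(z)
    expz = [math.exp(v - top) for v in z]
    total = sum(expz)
    return [e / total for e in expz]


def pretest_loglik(thetas, responses, a, steps) -> float:
    """Log-likelihood of one pretest item's responses, treating each
    examinee's operational theta estimate as known (assumed framing)."""
    return sum(
        math.log(gpcm_probs(t, a, steps)[x]) for t, x in zip(thetas, responses)
    )


# Three-category item (two steps) evaluated at theta = 0.
print([round(p, 4) for p in gpcm_probs(0.0, 1.0, [-1.0, 1.0])])
# prints [0.2119, 0.5761, 0.2119]
```

As new responses to a pretest item arrive, an online calibration routine could re-maximize `pretest_loglik` over the slope and step parameters, for example by passing its negative to a general-purpose optimizer such as `scipy.optimize.minimize`.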
This paper discusses automated top-down heuristic assembly of multistage testing (MST). As a balanced compromise between linear tests and computerized adaptive testing, MST has gained increasing popularity in recent years. Automated test assembly (ATA) is one of the most important components of MST development. In this study, we developed a top-down heuristic design to assemble a classification MST. Heuristic ATA methods have great application potential because of their simplicity and feasibility. Top-down assembly is also a useful strategy for assembling short MSTs or migrating existing linear tests to MST. However, the MST literature has discussed both topics only to a limited extent. This paper gives full details of the proposed assembly method. A simulation study evaluated the method's performance under conditions formed by crossing five factors: (a) number of stages, (b) stage length assignment, (c) source of the target information, (d) assembly order among stages, and (e) level of item overlap control. This paper can guide practitioners and researchers in implementing the method in practice. The results also shed light on how to choose among test design options according to specific test conditions and requirements.
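To illustrate what a top-down heuristic can look like, the Python sketch below splits a whole-test target information curve among stages in proportion to stage length, then fills each stage greedily, at each step adding the unused item that leaves the smallest total absolute deviation from the stage target. The 2PL item model, the proportional split, the greedy rule, and all names and item parameters are hypothetical choices for illustration, not the assembly method described in the paper.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    item_id: str
    a: float  # 2PL discrimination (the 2PL model is an assumption here)
    b: float  # 2PL difficulty


def info_2pl(item: Item, theta: float) -> float:
    """Fisher information of a 2PL item at theta: a^2 * P * (1 - P)."""
    p = 1.0 / (1.0 + math.exp(-item.a * (theta - item.b)))
    return item.a ** 2 * p * (1.0 - p)


def split_target(total_target, stage_lengths):
    """Top-down step: divide a whole-test target information curve among
    stages in proportion to stage length (one possible assignment rule)."""
    total_len = sum(stage_lengths)
    return [[t * n / total_len for t in total_target] for n in stage_lengths]


def greedy_module(pool, target, thetas, length, used):
    """Add items one at a time, each time taking the unused item that leaves
    the smallest total absolute deviation from `target` at the `thetas`."""
    chosen, current = [], [0.0] * len(thetas)
    for _ in range(length):
        best, best_gap = None, math.inf
        for item in pool:
            if item.item_id in used:
                continue
            gap = sum(abs(t - c - info_2pl(item, th))
                      for t, c, th in zip(target, current, thetas))
            if gap < best_gap:
                best, best_gap = item, gap
        if best is None:  # pool exhausted
            break
        chosen.append(best)
        used.add(best.item_id)  # strict overlap control: no item reuse
        current = [c + info_2pl(best, th) for c, th in zip(current, thetas)]
    return chosen


# Hypothetical pool and a two-stage route assembled stage by stage.
pool = [Item(f"i{k}", 0.8 + 0.1 * (k % 5), -2.0 + 0.2 * k) for k in range(20)]
thetas = [-1.0, 0.0, 1.0]
stage_lengths = [4, 6]
used = set()
route = [greedy_module(pool, tgt, thetas, n, used)
         for tgt, n in zip(split_target([6.0, 8.0, 6.0], stage_lengths),
                           stage_lengths)]
print([[it.item_id for it in module] for module in route])
```

The shared `used` set implements the strictest form of overlap control (no item appears twice). A classification MST would normally assemble several modules per stage, each aimed at different ability points, which this single-route sketch omits.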
Purpose. Most multidimensional patient-reported outcome (PRO) measures are lengthy to complete. Computerized adaptive testing (CAT), which selects the most informative items, can potentially reduce respondent burden without sacrificing measurement accuracy. The commonly used maximum Fisher information item selection method has been reported to produce highly unbalanced item bank usage and potentially imprecise trait estimation. This study applies the content-balancing strategy to item selection in a bifactor-modeled CAT and examines its impact on measurement accuracy and item bank usage.
Methods. Item responses from a population-based SF-36 survey were first calibrated using the bifactor graded response model. Four post hoc CATs were then created using items and responses from the SF-36 data set. The content-balancing strategy was adopted in the item selection procedure of the bifactor-modeled CAT. The measurement accuracy and item usage of the CATs were compared between tests with and without the content-balancing strategy.
Results. The results indicate that the CAT implemented with the content-balancing strategy offers better overall measurement accuracy for both the general health status and the two health domains (physical and mental) of the SF-36.
Conclusions. The content-balancing strategy helps the CAT–PRO balance the selection of items and achieve improved measurement accuracy. Its implementation in real-time CAT administration to measure multidimensional PRO traits merits further study.
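The contrast between plain maximum-information selection and content balancing can be sketched in Python. The function below first picks the content domain whose share of administered items falls furthest below its target proportion, then returns the most informative unused item within that domain. This is a simplified constrained-CAT-style rule that uses unidimensional 2PL information as a stand-in for the bifactor graded response model used in the study; the item bank, parameters, and target proportions are hypothetical.

```python
import math


def info_2pl(a, b, theta):
    """Fisher information of a 2PL item (stand-in for bifactor GRM info)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def select_item(bank, administered, theta, target_props):
    """Content-balanced maximum-information selection.

    bank: dict item_id -> (domain, a, b)
    administered: list of item ids already given
    target_props: dict domain -> desired proportion of the test
    Returns the next item id, or None if the bank is exhausted.
    """
    counts = {d: 0 for d in target_props}
    for iid in administered:
        d = bank[iid][0]
        counts[d] = counts.get(d, 0) + 1
    n = max(len(administered), 1)
    # Domain with the largest shortfall relative to its target proportion.
    domain = max(target_props, key=lambda d: target_props[d] - counts[d] / n)
    candidates = [iid for iid, (d, _, _) in bank.items()
                  if d == domain and iid not in administered]
    if not candidates:  # domain exhausted: fall back to the whole bank
        candidates = [iid for iid in bank if iid not in administered]
    if not candidates:
        return None
    return max(candidates,
               key=lambda iid: info_2pl(bank[iid][1], bank[iid][2], theta))


bank = {
    "PF1": ("physical", 1.5, 0.0),
    "PF2": ("physical", 1.0, 0.0),
    "MH1": ("mental", 2.0, 0.0),
    "MH2": ("mental", 1.2, 0.5),
}
targets = {"physical": 0.5, "mental": 0.5}
given = []
for _ in range(3):
    given.append(select_item(bank, given, 0.0, targets))  # theta held at 0
print(given)  # prints ['PF1', 'MH1', 'PF2']
```

With equal targets the rule alternates between domains, whereas unconstrained maximum information at this ability level would take the most discriminating item (`MH1`) first regardless of domain. A real CAT would also update the ability estimate after each response instead of holding it at 0.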
Recently, multistage testing (MST) has been adopted by several important large-scale testing programs and has become popular among practitioners and researchers. Stemming from decades of computerized adaptive testing (CAT) history, the rapidly growing MST alleviates several major problems of earlier CAT applications. Nevertheless, MST is only one of many possible solutions to these problems. This article presents a new adaptive testing design, “on-the-fly assembled multistage adaptive testing” (OMST), which combines the benefits of CAT and MST and offsets their limitations. Moreover, OMST provides some unique advantages over both CAT and MST. A simulation study was conducted to compare OMST with MST and CAT, and the results demonstrated the promising features of OMST. Finally, the “Discussion” section suggests possible future adaptive testing designs based on the OMST framework, which could provide great flexibility for adaptive tests in the digital future and open an avenue for all types of hybrid designs tailored to the needs of specific tests.
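The core OMST idea, assembling each stage at test time around the examinee's current ability estimate, can be sketched in Python. Between stages the sketch updates an expected a posteriori (EAP) estimate, then builds the next stage from the unused items that are most informative at that estimate; within a stage no re-estimation occurs, which distinguishes it from item-by-item CAT. The 2PL model, the standard normal prior, the simple top-k assembly rule with no content constraints, and the simulated bank are assumptions for illustration, not the design evaluated in the article.

```python
import math
import random


def p_2pl(a, b, theta):
    """Probability of a correct response under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def info_2pl(a, b, theta):
    """Fisher information of a 2PL item at theta."""
    p = p_2pl(a, b, theta)
    return a * a * p * (1.0 - p)


def eap(items, responses):
    """EAP ability estimate on a grid from -4 to 4 with a N(0, 1) prior."""
    grid = [g / 10 for g in range(-40, 41)]
    weights = []
    for t in grid:
        w = math.exp(-0.5 * t * t)  # unnormalized standard normal prior
        for (a, b), x in zip(items, responses):
            p = p_2pl(a, b, t)
            w *= p if x else 1.0 - p
        weights.append(w)
    return sum(t * w for t, w in zip(grid, weights)) / sum(weights)


def omst(bank, true_theta, n_stages, stage_len, rng):
    """Simulate one OMST administration; returns (theta estimate, items given)."""
    used, given, responses = set(), [], []
    theta_hat = 0.0  # start at the prior mean
    for _ in range(n_stages):
        free = [i for i in range(len(bank)) if i not in used]
        # Assemble the next stage on the fly: top-k information at theta_hat.
        stage = sorted(free, key=lambda i: info_2pl(*bank[i], theta_hat),
                       reverse=True)[:stage_len]
        for i in stage:
            used.add(i)
            given.append(bank[i])
            responses.append(rng.random() < p_2pl(*bank[i], true_theta))
        theta_hat = eap(given, responses)  # update only between stages
    return theta_hat, given


# Hypothetical bank of (a, b) pairs and one simulated examinee.
bank = [(0.8 + 0.1 * (k % 7), -2.5 + 0.1 * k) for k in range(50)]
theta_hat, given = omst(bank, true_theta=0.5, n_stages=3, stage_len=5,
                        rng=random.Random(2024))
print(round(theta_hat, 2), len(given))
```

Replacing the top-k rule with a constrained assembler (content, exposure, or overlap constraints) is where the hybrid designs mentioned above would differ from this minimal version.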
Papers by Yi Zheng