An important consideration of any computer adaptive testing (CAT) program is the criterion used for ending item administration: the stopping rule, which ensures that all examinees are assessed to the same standard. Although various stopping rules exist, none of them have been compared under the generalized partial-credit model (Muraki in Applied Psychological Measurement, 16, 159-176, 1992). In this simulation study we compared the performance of three variable-length stopping rules, namely standard error (SE), minimum information (MI), and change in theta (CT), both in isolation and in combination with requirements of minimum and maximum numbers of items, as well as a fixed-length stopping rule. Each stopping rule was examined under two termination criteria: one more lenient (SE = 0.35, MI = 0.56, CT = 0.05) and one more stringent (SE = 0.30, MI = 0.42, CT = 0.02). The simulation design also included content-balancing and exposure controls, aspects of CAT that have been exclude...
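As a rough illustration of how the three variable-length rules described above might be checked after each administered item, here is a minimal Python sketch. The class and function names, the direction and strictness of each inequality, and the way the minimum and maximum item counts override the rule are assumptions made for illustration; the abstract does not specify them, and this is not the study's implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoppingRule:
    """Hypothetical container for one stopping-rule condition."""
    kind: str                        # "SE", "MI", or "CT"
    threshold: float                 # e.g. 0.35 for the lenient SE criterion
    min_items: int = 0               # assumed: never stop before this many items
    max_items: Optional[int] = None  # assumed: always stop at this many items


def should_stop(rule: StoppingRule, n_administered: int, se: float,
                max_remaining_info: float, theta_change: float) -> bool:
    """Return True if the test should end under the given rule.

    se                 -- current standard error of the ability estimate
    max_remaining_info -- largest item information among unused items
    theta_change       -- change in the ability estimate after the last item
    """
    if n_administered < rule.min_items:
        return False
    if rule.max_items is not None and n_administered >= rule.max_items:
        return True
    if rule.kind == "SE":
        # Assumed: stop once the estimate is precise enough.
        return se <= rule.threshold
    if rule.kind == "MI":
        # Assumed: stop once no remaining item is informative enough.
        return max_remaining_info < rule.threshold
    if rule.kind == "CT":
        # Assumed: stop once the estimate has stabilized.
        return abs(theta_change) < rule.threshold
    raise ValueError(f"unknown stopping rule: {rule.kind}")
```

Under these assumptions, `StoppingRule("SE", 0.35, min_items=5, max_items=20)` would correspond to the lenient SE criterion combined with hypothetical minimum and maximum test lengths of 5 and 20 items.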
The current study proposes a novel method to predict multistage testing (MST) performance without conducting simulations. This method, called MST test information, is based on analytic derivation of standard errors of ability estimates across theta levels. We compared standard errors derived analytically to the simulation results to demonstrate the validity of the proposed method in both measurement precision and classification accuracy. The results indicate that the MST test information effectively predicted the performance of MST. In addition, the results of the current study highlighted the relationship among the test construction, MST design factors, and MST performance.
... Research questions were grouped by the two phases of the study. In Phase 1, for each of five high school graduate cohorts (1998–2002) and for each of seven AP Exams studied separately—English Language and Composition, English Literature and Composition, Calculus ...
... Research Questions ... Descriptive Statistics and Planned Comparison Results for AP vs. Non-AP English Literature and Composition ...
Exposure control research with polytomous item pools has determined that randomization procedures can be very effective for controlling test security in computerized adaptive testing (CAT). The current study investigated the performance of four procedures for controlling item exposure in a CAT under the partial credit model. In addition to a baseline condition with no exposure control, the Kingsbury-Zara, modified-within-.10-logits, Sympson-Hetter, and conditional Sympson-Hetter procedures were implemented to control exposure rates. The Kingsbury-Zara and the modified-within-.10-logits procedures were implemented with 3 and 6 item candidate conditions. The results show that the Kingsbury-Zara and modified-within-.10-logits procedures with 6 item candidates performed as well as the conditional Sympson-Hetter in terms of exposure rates, overlap rates, and pool utilization. These two procedures are strongly recommended for use with partial credit CATs due to their simplicity and the strength of their results.
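To make the idea of a randomization procedure with a fixed number of item candidates concrete, the following Python sketch draws the next item at random from the most informative unused items. It assumes the candidate set is simply the top `n_candidates` items ranked by information at the current ability estimate; the function name, the precomputed `item_info` mapping, and the item identifiers are hypothetical, and the abstract does not give the exact procedure used in the study.

```python
import random
from typing import Dict, Set


def randomesque_select(item_info: Dict[str, float], administered: Set[str],
                       n_candidates: int, rng: random.Random) -> str:
    """Pick the next item at random from the most informative unused items.

    item_info    -- hypothetical mapping of item id to its information
                    at the current ability estimate (computed elsewhere)
    administered -- ids of items already given to this examinee
    n_candidates -- size of the candidate set (e.g. 3 or 6)
    rng          -- random number generator, passed in for reproducibility
    """
    available = {item: info for item, info in item_info.items()
                 if item not in administered}
    if not available:
        raise ValueError("no items left in the pool")
    # Rank unused items from most to least informative.
    ranked = sorted(available, key=available.get, reverse=True)
    # Draw uniformly from the top candidates rather than always taking the best.
    return rng.choice(ranked[:n_candidates])
```

With `n_candidates=1` this reduces to ordinary maximum-information selection, which corresponds to having no exposure control; larger candidate sets trade a little information per item for lower exposure of the best items.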
Marginal maximum likelihood estimation based on the expectation-maximization algorithm (MML/EM) is developed for the one-parameter logistic item response theory (IRT) model with ability-based guessing (1PL-AG). The use of the MML/EM estimator is cross-validated with estimates from the NLMIXED procedure (PROC NLMIXED) in the Statistical Analysis System. Numerical data are provided for comparisons of results from MML/EM and PROC NLMIXED.
This study investigated parameter recovery for the partial credit model using the MULTILOG computer program. Factors studied were the sample size and the number of item parameters, which were manipulated by systematically varying the number of steps per item and the number of items. The findings suggest that the ratio of sample size to number of item parameters being estimated as a "rule of thumb" can be a more complete guideline when the number of steps per item is taken into account. Accurate estimation of ability can be obtained across all conditions, even with sample sizes as small as 250. With regard to estimation of step values, however, more caution is warranted. Accurate estimation of the step values of items which have more categories requires larger sample sizes for a given number of total parameters to be estimated.
In the development of health outcome measures, the pool of candidate items may be divided into multiple forms, thus "spreading" response burden over two or more study samples. Item responses collected using this approach result in two or more forms whose scores are not equivalent. Therefore, the item responses must be equated (adjusted) to a common mathematical metric. The purpose of this study was to examine the effect of sample size, test size, and selection of item response theory model in equating three forms of a health status measure. Each of the forms consisted of a set of items unique to it and a set of anchor items common across forms. The study was a secondary data analysis of patients' responses to the developmental item pool for the Health of Seniors Survey. A completely crossed design was used with 25 replications per study cell. We found that the quality of equatings was affected greatly by sample size. Its effect was far more substantial than choice ...
Papers by Barbara Dodd